linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "J. R. Okajima" <hooanon05@yahoo.co.jp>
To: Valerie Aurora <vaurora@redhat.com>
Cc: Neil Brown <neilb@suse.de>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Miklos Szeredi <miklos@szeredi.hu>, Jan Blunck <jblunck@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 14/39] union-mount: Union mounts documentation
Date: Wed, 25 Aug 2010 14:03:50 +0900	[thread overview]
Message-ID: <6246.1282712630@jrobl> (raw)
In-Reply-To: <20100824204839.GB28718@shell>


Valerie Aurora:
> No, that's not a sufficient description and leaves open questions
> about all sorts of deadlocks and race conditions.  For example,
> inotify events occur while holding locks only on one layer.  You
> obviously need to lock the top layer to update the inheritance and
> parent-child relationships.  Now you are locking the lower layer first
> and the top layer second, which is the reverse of the usual order.

I don't agree about deadlock and race condition.
When user modifies the dir hierarchy on the layer directly during
aufs_rename() is running, aufs will detect it after lock_rename().
It behaves like this.
- decide the layer where actual rename operates. create the dir
  hierarchy on it if necessary.
- lock_rename() for the layer
- calls ->rename()
or
- if the renaming file exists on the lower readonly layer, aufs will
  copyup it to the upper writable layer as the rename target name.
  In this case, ->rename() is not called.

If a user changes the dir hierarchy directly on the layer before
aufs_rename(), then the notify event tells aufs it and aufs gets the
latetst hierarchy.

If it happens before lock_rename() in aufs_rename(), aufs verifies the
relationship between the target child and the locked dir. if it differs,
return EBUSY. Of course, lock_rename() follows the "ancestors first"
order described in Documentation/filesystem/directory-locking.


> around on the lower layer is safe.  In general, your first task is to
> show a global lock ordering to prove lack of deadlocks (which I don't
> think you should spend time on because most VFS experts think it is
> impossible to do with two read-write layers).

Since you may not read this anymore and other people doesn't seem to
be intrested in aufs, it may not be meaningful to write down about
locking in aufs. But I will try.

At first,
- since aufs is FS, it has its own super_block, dentry and inode.
- super_block, dentry and inode in aufs have private data which contains
  rwsem.
- the locking order for these rwsem is child-first.
- aufs specifies FS_RENAME_DOES_D_MOVE.

locking order in aufs_rename
+ down_read() for aufs sb
  protects sb from branch-add, delete.
+ two down_write()s for src and dest child
  protects them from other processes in aufs.
+ down_write() for the dst_parent.
+ decide the layer where we will operate, by comparing the index of
  layers where the targets exist and the layer attribute (ro, rw).
+ copyup the dest dir hierarchy if necessary, by repeating
  - dget_parent(), down/up_read() for the parent (in aufs)
  - mutex_lock() for the dir (on the layer) to mkdir the non-existing
    child dir on the layer and verify the parent-child relationship.
  - mkdir and setattr on the layer.
  - mutex_unlock() the dir on the layer.
+ test they are rename-able
  if it is a dir, it must be empty (logically) or must not have children
  on the multiple branches.
+ if src_parent and dst_parent differ, down_write both. up_write for
  dst_parent may be necessary to keep the "child-first" rule in aufs.

(from here the "sub-VFS" characteristic of aufs appears)
+ lock_rename() on the layer
  and verify the every relationships between child and parent.
+ test the src_child is deletable.
+ test the dst_child is add-able or deletable if it exists.
+ vfs_rename() on the layer or copyup src_child as a dst_child name.
+ unlock_rename() on the layer

(return to aufs world)
+ d_drop() dst_child if necessary.
+ d_move()
+ up_write() for src_parent and dst_parent
+ up_write() fot src_child and dst_child
+ up_read() for aufs sb

Strictly speaking, there are more things which aufs_rename() handles
such as inode attributes, whiteout, opaque-dir, internal pointers to the
object on the layer, temporary dir-name. But they are unrelated to the
locking order essentially. So I didn't describe about them.


Thank you reading this long mail.


J. R. Okajima

  parent reply	other threads:[~2010-08-25  5:03 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-08 15:52 [PATCH 00/39] Union mounts - return d_ino from lower fs Valerie Aurora
2010-08-08 15:52 ` [PATCH 01/39] VFS: Comment follow_mount() and friends Valerie Aurora
2010-08-08 15:52 ` [PATCH 02/39] VFS: Make lookup_hash() return a struct path Valerie Aurora
2010-08-08 15:52 ` [PATCH 03/39] VFS: Add read-only users count to superblock Valerie Aurora
2010-08-08 15:52 ` [PATCH 04/39] autofs4: Save autofs trigger's vfsmount in super block info Valerie Aurora
2010-08-08 15:52 ` [PATCH 05/39] whiteout/NFSD: Don't return information about whiteouts to userspace Valerie Aurora
2010-08-08 15:52 ` [PATCH 06/39] whiteout: Add vfs_whiteout() and whiteout inode operation Valerie Aurora
2010-08-08 15:52 ` [PATCH 07/39] whiteout: Set opaque flag if new directory was previously a whiteout Valerie Aurora
2010-08-08 15:52 ` [PATCH 08/39] whiteout: Allow removal of a directory with whiteouts Valerie Aurora
2010-08-08 15:52 ` [PATCH 09/39] whiteout: tmpfs whiteout support Valerie Aurora
2010-08-08 15:52 ` [PATCH 10/39] whiteout: Split of ext2_append_link() from ext2_add_link() Valerie Aurora
2010-08-08 15:52 ` [PATCH 11/39] whiteout: ext2 whiteout support Valerie Aurora
2010-08-08 15:52 ` [PATCH 12/39] whiteout: jffs2 " Valerie Aurora
2010-08-08 15:52 ` [PATCH 13/39] fallthru: Basic fallthru definitions Valerie Aurora
2010-08-08 15:52 ` [PATCH 14/39] union-mount: Union mounts documentation Valerie Aurora
2010-08-09 22:56   ` Neil Brown
2010-08-11  1:51     ` J. R. Okajima
2010-08-17 20:44     ` Valerie Aurora
2010-08-17 22:53       ` Neil Brown
2010-08-18  0:15         ` Luca Barbieri
2010-08-18 19:04         ` Valerie Aurora
2010-08-18  1:23       ` J. R. Okajima
2010-08-18 18:55         ` Valerie Aurora
2010-08-19  1:34           ` J. R. Okajima
2010-08-24  0:05             ` Valerie Aurora
2010-08-24  2:28               ` J. R. Okajima
2010-08-24 20:48                 ` Valerie Aurora
2010-08-25  2:59                   ` Christian Stroetmann
2010-08-25  5:03                   ` J. R. Okajima [this message]
2010-08-08 15:52 ` [PATCH 15/39] union-mount: Introduce MNT_UNION and MS_UNION flags Valerie Aurora
2010-08-08 15:52 ` [PATCH 16/39] union-mount: Introduce union_dir structure and basic operations Valerie Aurora
2010-08-08 15:52 ` [PATCH 17/39] union-mount: Free union dirs on removal from dcache Valerie Aurora
2010-08-08 15:52 ` [PATCH 18/39] union-mount: Support for union mounting file systems Valerie Aurora
2010-08-08 15:52 ` [PATCH 19/39] union-mount: Implement union lookup Valerie Aurora
2010-08-13 13:49   ` Miklos Szeredi
2010-08-17 21:44     ` Valerie Aurora
2010-08-18  8:11       ` Miklos Szeredi
2010-08-08 15:52 ` [PATCH 20/39] union-mount: Call do_whiteout() on unlink and rmdir in unions Valerie Aurora
2010-08-08 15:52 ` [PATCH 21/39] union-mount: Copy up directory entries on first readdir() Valerie Aurora
2010-08-08 15:52 ` [PATCH 22/39] union-mount: Add generic_readdir_fallthru() helper Valerie Aurora
2010-08-08 15:52 ` [PATCH 23/39] fallthru: ext2 fallthru support Valerie Aurora
2010-08-13 13:52   ` Miklos Szeredi
2010-08-17 21:08     ` Valerie Aurora
2010-08-17 22:28     ` Valerie Aurora
2010-08-08 15:52 ` [PATCH 24/39] fallthru: jffs2 " Valerie Aurora
2010-08-08 15:52 ` [PATCH 25/39] fallthru: tmpfs " Valerie Aurora
2010-08-08 15:52 ` [PATCH 26/39] VFS: Split inode_permission() and create path_permission() Valerie Aurora
2010-08-08 15:52 ` [PATCH 27/39] VFS: Create user_path_nd() to lookup both parent and target Valerie Aurora
2010-08-08 15:52 ` [PATCH 28/39] union-mount: In-kernel file copyup routines Valerie Aurora
2010-08-08 15:52 ` [PATCH 29/39] union-mount: Implement union-aware access()/faccessat() Valerie Aurora
2010-08-08 15:52 ` [PATCH 30/39] union-mount: Implement union-aware link() Valerie Aurora
2010-08-08 15:52 ` [PATCH 31/39] union-mount: Implement union-aware rename() Valerie Aurora
2010-08-08 15:52 ` [PATCH 32/39] union-mount: Implement union-aware writable open() Valerie Aurora
2010-08-08 15:52 ` [PATCH 33/39] union-mount: Implement union-aware chown() Valerie Aurora
2010-08-08 15:52 ` [PATCH 34/39] union-mount: Implement union-aware truncate() Valerie Aurora
2010-08-08 15:52 ` [PATCH 35/39] union-mount: Implement union-aware chmod()/fchmodat() Valerie Aurora
2010-08-08 15:52 ` [PATCH 36/39] union-mount: Implement union-aware lchown() Valerie Aurora
2010-08-08 15:52 ` [PATCH 37/39] union-mount: Implement union-aware utimensat() Valerie Aurora
2010-08-08 15:52 ` [PATCH 38/39] union-mount: Implement union-aware setxattr() Valerie Aurora
2010-08-08 15:52 ` [PATCH 39/39] union-mount: Implement union-aware lsetxattr() Valerie Aurora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6246.1282712630@jrobl \
    --to=hooanon05@yahoo.co.jp \
    --cc=hch@infradead.org \
    --cc=jblunck@suse.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=neilb@suse.de \
    --cc=vaurora@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).