* A missing i_mutex in rename? (Linux kernel 2.6.latest) @ 2006-04-19 10:50 Anton Altaparmakov 2006-04-19 12:18 ` Matthew Wilcox 0 siblings, 1 reply; 5+ messages in thread From: Anton Altaparmakov @ 2006-04-19 10:50 UTC (permalink / raw) To: Al Viro; +Cc: linux-fsdevel, linux-kernel Hi Al and other fs developers, Both sys_unlink()/sys_rmdir() and sys_link() all end up taking the i_mutex on all parent directories and source/destination inodes before calling into the file system inode operations. sys_rename() OTOH, does not take i_mutex on the old inode. It only takes i_mutex on the two parent directories and on the target inode if it exists. Why is this? To me it seems that either sys_rename() must take i_mutex on the old inode or sys_unlink()/sys_rmdir(), sys_link(), etc do not need to hold the i_mutex. What am I missing? ps. I verified my reading of the code by inserting a mutex_is_locked(old_dent->d_inode) in ->rename in ntfs and it returns negative no matter how I invoke the rename (i.e. it does not matter if source is a file or directory or whether a target exists, etc). pps. If indeed sys_rename() is correct in not needing the mutex and sys_unlink()/sys_rmdir(), sys_link(), etc are correct in needing the mutex, would it be safe if I just take old_dentry->d_inode->i_mutex on entry to ntfs_rename()? I would assume that there is no deadlock risk because the parent is already locked, correct? Thanks a lot in advance! Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer, http://www.linux-ntfs.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest) 2006-04-19 10:50 A missing i_mutex in rename? (Linux kernel 2.6.latest) Anton Altaparmakov @ 2006-04-19 12:18 ` Matthew Wilcox 2006-04-19 12:51 ` Anton Altaparmakov 0 siblings, 1 reply; 5+ messages in thread From: Matthew Wilcox @ 2006-04-19 12:18 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Al Viro, linux-fsdevel, linux-kernel On Wed, Apr 19, 2006 at 11:50:00AM +0100, Anton Altaparmakov wrote: > Both sys_unlink()/sys_rmdir() and sys_link() all end up taking the i_mutex > on all parent directories and source/destination inodes before calling > into the file system inode operations. > > sys_rename() OTOH, does not take i_mutex on the old inode. It only takes > i_mutex on the two parent directories and on the target inode if it > exists. > > Why is this? To me it seems that either sys_rename() must take i_mutex on > the old inode or sys_unlink()/sys_rmdir(), sys_link(), etc do not need to > hold the i_mutex. > > What am I missing? I believe the current locking scheme to be correct. Reading Documentation/filesystems/directory-locking and pondering for a few minutes leads me to the following conclusions: - sys_rmdir() must take the lock on the parent directory and on the victim. If a different process is trying to create a file in the victim, sys_rmdir() mustn't race with it. - I don't immediately see a race that taking the lock on the victim of sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems desirable. - sys_link() needs to lock the target to be sure it isn't removed and replaced with a directory in the meantime. - sys_rename() does not need to lock the old inode. Since the parent is already locked, the old inode can't be removed/renamed by a racing process. It doesn't matter if something's created or deleted from within the old inode (if it's a directory), unlike rmdir(). It doesn't need to be protected from a sys_link() race. If you need to lock the old inode inside ntfs for your own consistency purposes, that looks like it should be fine, but the VFS doesn't need to lock it for you. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest) 2006-04-19 12:18 ` Matthew Wilcox @ 2006-04-19 12:51 ` Anton Altaparmakov 2006-04-20 10:59 ` Al Viro 0 siblings, 1 reply; 5+ messages in thread From: Anton Altaparmakov @ 2006-04-19 12:51 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Al Viro, linux-fsdevel, linux-kernel Hi Matthew, Thanks for the quick reply. On Wed, 19 Apr 2006, Matthew Wilcox wrote: > On Wed, Apr 19, 2006 at 11:50:00AM +0100, Anton Altaparmakov wrote: > > Both sys_unlink()/sys_rmdir() and sys_link() all end up taking the i_mutex > > on all parent directories and source/destination inodes before calling > > into the file system inode operations. > > > > sys_rename() OTOH, does not take i_mutex on the old inode. It only takes > > i_mutex on the two parent directories and on the target inode if it > > exists. > > > > Why is this? To me it seems that either sys_rename() must take i_mutex on > > the old inode or sys_unlink()/sys_rmdir(), sys_link(), etc do not need to > > hold the i_mutex. > > > > What am I missing? > > I believe the current locking scheme to be correct. Reading > Documentation/filesystems/directory-locking and pondering for a few > minutes leads me to the following conclusions: > > - sys_rmdir() must take the lock on the parent directory and on the > victim. If a different process is trying to create a file in the > victim, sys_rmdir() mustn't race with it. Agreed. > - I don't immediately see a race that taking the lock on the victim of > sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems > desirable. I guess the symmetry thing is fair enough. > - sys_link() needs to lock the target to be sure it isn't removed and > replaced with a directory in the meantime. Agreed. > - sys_rename() does not need to lock the old inode. Since the parent > is already locked, the old inode can't be removed/renamed by a racing > process. It doesn't matter if something's created or deleted from > within the old inode (if it's a directory), unlike rmdir(). It > doesn't need to be protected from a sys_link() race. Agreed. > If you need to lock the old inode inside ntfs for your own consistency > purposes, that looks like it should be fine, but the VFS doesn't need to > lock it for you. Great, thanks. That was my own conclusion also but it never hurts to be sure. (-: ntfs_rename() at the moment looks roughly like this: if (target_inode) { if (S_ISDIR(target_inode->i_mode) ntfs_rmdir(target_dir_inode, target_dentry); else ntfs_unlink(target_dir_inode, target_dentry); } mutex_lock(&old_inode->i_mutex); ntfs_link(old_dentry, target_dir_inode, target_dentry); ntfs_unlink(old_dir_inode, old_dentry); mutex_unlock(&old_inode->i_mutex); Which is incredibly inefficient but very simple and works (with minimal special casing in ntfs_link() and ntfs_unlink() mostly so if old_inode is a directory we never get a link count greater one on the VFS inode) and I doubt sys_rename() is a very often invoked system call. Normally, a sys_link() and sys_unlink() would take i_mutex on old_inode as shown in above code which is why I was wondering whether I should take it as shown above or whether I can just not worry and not take yet another lock in a code path where tons of locks are already being taken and released. My conclusion is that the above code is safe even if I remove the mutex_lock()/mutex_unlock() around the ntfs_link()/ntfs_unlink() given that sys_unlink() only takes the lock for symmetry reasons and sys_link()'s need for locking is taken care of by the fact that sys_rename() has the lock on both parent directory inodes. Would you agree? Thanks a lot in advance! Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer, http://www.linux-ntfs.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest) 2006-04-19 12:51 ` Anton Altaparmakov @ 2006-04-20 10:59 ` Al Viro 2006-04-20 12:24 ` Anton Altaparmakov 0 siblings, 1 reply; 5+ messages in thread From: Al Viro @ 2006-04-20 10:59 UTC (permalink / raw) To: Anton Altaparmakov; +Cc: Matthew Wilcox, linux-fsdevel, linux-kernel On Wed, Apr 19, 2006 at 01:51:21PM +0100, Anton Altaparmakov wrote: > > - I don't immediately see a race that taking the lock on the victim of > > sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems > > desirable. > > I guess the symmetry thing is fair enough. Not only; it does, among other things, guarantee that fs can assume that ->link() won't race with unlink() (and that link count is protected by ->i_mutex, while we are at it). > > - sys_link() needs to lock the target to be sure it isn't removed and > > replaced with a directory in the meantime. > > Agreed. "Replaced" part is bogus - we'd done lookup, so we won't get anything new. > > If you need to lock the old inode inside ntfs for your own consistency > > purposes, that looks like it should be fine, but the VFS doesn't need to > > lock it for you. > > Great, thanks. That was my own conclusion also but it never hurts to be > sure. (-: > > ntfs_rename() at the moment looks roughly like this: > > if (target_inode) { > if (S_ISDIR(target_inode->i_mode) > ntfs_rmdir(target_dir_inode, target_dentry); > else > ntfs_unlink(target_dir_inode, target_dentry); > } > mutex_lock(&old_inode->i_mutex); > ntfs_link(old_dentry, target_dir_inode, target_dentry); > ntfs_unlink(old_dir_inode, old_dentry); > mutex_unlock(&old_inode->i_mutex); Have fun dealing with error handling in the above... Note that failing rename() should _NOT_ lead to target disappearing. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest) 2006-04-20 10:59 ` Al Viro @ 2006-04-20 12:24 ` Anton Altaparmakov 0 siblings, 0 replies; 5+ messages in thread From: Anton Altaparmakov @ 2006-04-20 12:24 UTC (permalink / raw) To: Al Viro; +Cc: Matthew Wilcox, linux-fsdevel, linux-kernel On Thu, 20 Apr 2006, Al Viro wrote: > On Wed, Apr 19, 2006 at 01:51:21PM +0100, Anton Altaparmakov wrote: > > > - I don't immediately see a race that taking the lock on the victim of > > > sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems > > > desirable. > > > > I guess the symmetry thing is fair enough. > > Not only; it does, among other things, guarantee that fs can assume that > ->link() won't race with unlink() (and that link count is protected by > ->i_mutex, while we are at it). Ok. > > ntfs_rename() at the moment looks roughly like this: > > > > if (target_inode) { > > if (S_ISDIR(target_inode->i_mode) > > ntfs_rmdir(target_dir_inode, target_dentry); > > else > > ntfs_unlink(target_dir_inode, target_dentry); > > } > > mutex_lock(&old_inode->i_mutex); > > ntfs_link(old_dentry, target_dir_inode, target_dentry); > > ntfs_unlink(old_dir_inode, old_dentry); > > mutex_unlock(&old_inode->i_mutex); > > Have fun dealing with error handling in the above... Note that failing > rename() should _NOT_ lead to target disappearing. Indeed. But if I do not define a rename operation at all then the mv command looses the target if the move fails, too. So I don't see how the native rename needs to be any better other than for standards compliance. Error recovery is actually pretty easy as if the second ntfs_unlink() fails I simply ntfs_unlink() the link I just created with ntfs_link(). That already works fine. But yes at the moment I do lose the target if either the ntfs_link() or the second ntfs_unlink() fails. I am planning to fix this though. It is not too hard because I can more or less just ntfs_link() it back in (with a few special casings). Even if I implement are more "native" rename not using ntfs_unlink()/ntfs_link() I still have to remove the target first otherwise I cannot insert the new target because you cannot have two entries in the B+tree that have identical index keys (the index key is the filename itself so the two would be equal). And I cannot simply replace the B+tree index entry because NTFS is not case sensitive so the two names (old and new) may not be in the same place in the B+tree. The only thing I can do is to remove the old B+tree index entry and add the new one after that fact. So no matter what I do, I have to be able to recreate the old index entry and that is just what ntfs_link() does which is why I chose my "simplified rename" approach in the first place. Thanks a lot for you comments about the locking! Best regards, Anton -- Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer, http://www.linux-ntfs.org/ ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2006-04-20 12:24 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-04-19 10:50 A missing i_mutex in rename? (Linux kernel 2.6.latest) Anton Altaparmakov 2006-04-19 12:18 ` Matthew Wilcox 2006-04-19 12:51 ` Anton Altaparmakov 2006-04-20 10:59 ` Al Viro 2006-04-20 12:24 ` Anton Altaparmakov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).