* A missing i_mutex in rename? (Linux kernel 2.6.latest)
@ 2006-04-19 10:50 Anton Altaparmakov
2006-04-19 12:18 ` Matthew Wilcox
0 siblings, 1 reply; 5+ messages in thread
From: Anton Altaparmakov @ 2006-04-19 10:50 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
Hi Al and other fs developers,
Both sys_unlink()/sys_rmdir() and sys_link() all end up taking the i_mutex
on all parent directories and source/destination inodes before calling
into the file system inode operations.
sys_rename() OTOH, does not take i_mutex on the old inode. It only takes
i_mutex on the two parent directories and on the target inode if it
exists.
Why is this? To me it seems that either sys_rename() must take i_mutex on
the old inode or sys_unlink()/sys_rmdir(), sys_link(), etc do not need to
hold the i_mutex.
What am I missing?
ps. I verified my reading of the code by inserting a
mutex_is_locked(old_dent->d_inode) in ->rename in ntfs and it returns
negative no matter how I invoke the rename (i.e. it does not matter if
source is a file or directory or whether a target exists, etc).
pps. If indeed sys_rename() is correct in not needing the mutex and
sys_unlink()/sys_rmdir(), sys_link(), etc are correct in needing the
mutex, would it be safe if I just take old_dentry->d_inode->i_mutex on
entry to ntfs_rename()? I would assume that there is no deadlock risk
because the parent is already locked, correct?
Thanks a lot in advance!
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest)
2006-04-19 10:50 A missing i_mutex in rename? (Linux kernel 2.6.latest) Anton Altaparmakov
@ 2006-04-19 12:18 ` Matthew Wilcox
2006-04-19 12:51 ` Anton Altaparmakov
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2006-04-19 12:18 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Al Viro, linux-fsdevel, linux-kernel
On Wed, Apr 19, 2006 at 11:50:00AM +0100, Anton Altaparmakov wrote:
> Both sys_unlink()/sys_rmdir() and sys_link() all end up taking the i_mutex
> on all parent directories and source/destination inodes before calling
> into the file system inode operations.
>
> sys_rename() OTOH, does not take i_mutex on the old inode. It only takes
> i_mutex on the two parent directories and on the target inode if it
> exists.
>
> Why is this? To me it seems that either sys_rename() must take i_mutex on
> the old inode or sys_unlink()/sys_rmdir(), sys_link(), etc do not need to
> hold the i_mutex.
>
> What am I missing?
I believe the current locking scheme to be correct. Reading
Documentation/filesystems/directory-locking and pondering for a few
minutes leads me to the following conclusions:
- sys_rmdir() must take the lock on the parent directory and on the
victim. If a different process is trying to create a file in the
victim, sys_rmdir() mustn't race with it.
- I don't immediately see a race that taking the lock on the victim of
sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems
desirable.
- sys_link() needs to lock the target to be sure it isn't removed and
replaced with a directory in the meantime.
- sys_rename() does not need to lock the old inode. Since the parent
is already locked, the old inode can't be removed/renamed by a racing
process. It doesn't matter if something's created or deleted from
within the old inode (if it's a directory), unlike rmdir(). It
doesn't need to be protected from a sys_link() race.
If you need to lock the old inode inside ntfs for your own consistency
purposes, that looks like it should be fine, but the VFS doesn't need to
lock it for you.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest)
2006-04-19 12:18 ` Matthew Wilcox
@ 2006-04-19 12:51 ` Anton Altaparmakov
2006-04-20 10:59 ` Al Viro
0 siblings, 1 reply; 5+ messages in thread
From: Anton Altaparmakov @ 2006-04-19 12:51 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Al Viro, linux-fsdevel, linux-kernel
Hi Matthew,
Thanks for the quick reply.
On Wed, 19 Apr 2006, Matthew Wilcox wrote:
> On Wed, Apr 19, 2006 at 11:50:00AM +0100, Anton Altaparmakov wrote:
> > Both sys_unlink()/sys_rmdir() and sys_link() all end up taking the i_mutex
> > on all parent directories and source/destination inodes before calling
> > into the file system inode operations.
> >
> > sys_rename() OTOH, does not take i_mutex on the old inode. It only takes
> > i_mutex on the two parent directories and on the target inode if it
> > exists.
> >
> > Why is this? To me it seems that either sys_rename() must take i_mutex on
> > the old inode or sys_unlink()/sys_rmdir(), sys_link(), etc do not need to
> > hold the i_mutex.
> >
> > What am I missing?
>
> I believe the current locking scheme to be correct. Reading
> Documentation/filesystems/directory-locking and pondering for a few
> minutes leads me to the following conclusions:
>
> - sys_rmdir() must take the lock on the parent directory and on the
> victim. If a different process is trying to create a file in the
> victim, sys_rmdir() mustn't race with it.
Agreed.
> - I don't immediately see a race that taking the lock on the victim of
> sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems
> desirable.
I guess the symmetry thing is fair enough.
> - sys_link() needs to lock the target to be sure it isn't removed and
> replaced with a directory in the meantime.
Agreed.
> - sys_rename() does not need to lock the old inode. Since the parent
> is already locked, the old inode can't be removed/renamed by a racing
> process. It doesn't matter if something's created or deleted from
> within the old inode (if it's a directory), unlike rmdir(). It
> doesn't need to be protected from a sys_link() race.
Agreed.
> If you need to lock the old inode inside ntfs for your own consistency
> purposes, that looks like it should be fine, but the VFS doesn't need to
> lock it for you.
Great, thanks. That was my own conclusion also but it never hurts to be
sure. (-:
ntfs_rename() at the moment looks roughly like this:
if (target_inode) {
if (S_ISDIR(target_inode->i_mode)
ntfs_rmdir(target_dir_inode, target_dentry);
else
ntfs_unlink(target_dir_inode, target_dentry);
}
mutex_lock(&old_inode->i_mutex);
ntfs_link(old_dentry, target_dir_inode, target_dentry);
ntfs_unlink(old_dir_inode, old_dentry);
mutex_unlock(&old_inode->i_mutex);
Which is incredibly inefficient but very simple and works (with minimal
special casing in ntfs_link() and ntfs_unlink() mostly so if old_inode is
a directory we never get a link count greater one on the VFS inode) and I
doubt sys_rename() is a very often invoked system call. Normally, a
sys_link() and sys_unlink() would take i_mutex on old_inode as shown in
above code which is why I was wondering whether I should take it as shown
above or whether I can just not worry and not take yet another lock in a
code path where tons of locks are already being taken and released.
My conclusion is that the above code is safe even if I remove the
mutex_lock()/mutex_unlock() around the ntfs_link()/ntfs_unlink() given
that sys_unlink() only takes the lock for symmetry reasons and
sys_link()'s need for locking is taken care of by the fact that
sys_rename() has the lock on both parent directory inodes.
Would you agree?
Thanks a lot in advance!
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest)
2006-04-19 12:51 ` Anton Altaparmakov
@ 2006-04-20 10:59 ` Al Viro
2006-04-20 12:24 ` Anton Altaparmakov
0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2006-04-20 10:59 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Matthew Wilcox, linux-fsdevel, linux-kernel
On Wed, Apr 19, 2006 at 01:51:21PM +0100, Anton Altaparmakov wrote:
> > - I don't immediately see a race that taking the lock on the victim of
> > sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems
> > desirable.
>
> I guess the symmetry thing is fair enough.
Not only; it does, among other things, guarantee that fs can assume that
->link() won't race with unlink() (and that link count is protected by
->i_mutex, while we are at it).
> > - sys_link() needs to lock the target to be sure it isn't removed and
> > replaced with a directory in the meantime.
>
> Agreed.
"Replaced" part is bogus - we'd done lookup, so we won't get anything new.
> > If you need to lock the old inode inside ntfs for your own consistency
> > purposes, that looks like it should be fine, but the VFS doesn't need to
> > lock it for you.
>
> Great, thanks. That was my own conclusion also but it never hurts to be
> sure. (-:
>
> ntfs_rename() at the moment looks roughly like this:
>
> if (target_inode) {
> if (S_ISDIR(target_inode->i_mode)
> ntfs_rmdir(target_dir_inode, target_dentry);
> else
> ntfs_unlink(target_dir_inode, target_dentry);
> }
> mutex_lock(&old_inode->i_mutex);
> ntfs_link(old_dentry, target_dir_inode, target_dentry);
> ntfs_unlink(old_dir_inode, old_dentry);
> mutex_unlock(&old_inode->i_mutex);
Have fun dealing with error handling in the above... Note that failing
rename() should _NOT_ lead to target disappearing.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A missing i_mutex in rename? (Linux kernel 2.6.latest)
2006-04-20 10:59 ` Al Viro
@ 2006-04-20 12:24 ` Anton Altaparmakov
0 siblings, 0 replies; 5+ messages in thread
From: Anton Altaparmakov @ 2006-04-20 12:24 UTC (permalink / raw)
To: Al Viro; +Cc: Matthew Wilcox, linux-fsdevel, linux-kernel
On Thu, 20 Apr 2006, Al Viro wrote:
> On Wed, Apr 19, 2006 at 01:51:21PM +0100, Anton Altaparmakov wrote:
> > > - I don't immediately see a race that taking the lock on the victim of
> > > sys_unlink() solves; however, for symmetry with sys_rmdir(), it seems
> > > desirable.
> >
> > I guess the symmetry thing is fair enough.
>
> Not only; it does, among other things, guarantee that fs can assume that
> ->link() won't race with unlink() (and that link count is protected by
> ->i_mutex, while we are at it).
Ok.
> > ntfs_rename() at the moment looks roughly like this:
> >
> > if (target_inode) {
> > if (S_ISDIR(target_inode->i_mode)
> > ntfs_rmdir(target_dir_inode, target_dentry);
> > else
> > ntfs_unlink(target_dir_inode, target_dentry);
> > }
> > mutex_lock(&old_inode->i_mutex);
> > ntfs_link(old_dentry, target_dir_inode, target_dentry);
> > ntfs_unlink(old_dir_inode, old_dentry);
> > mutex_unlock(&old_inode->i_mutex);
>
> Have fun dealing with error handling in the above... Note that failing
> rename() should _NOT_ lead to target disappearing.
Indeed. But if I do not define a rename operation at all then the mv
command looses the target if the move fails, too. So I don't see how the
native rename needs to be any better other than for standards compliance.
Error recovery is actually pretty easy as if the second ntfs_unlink()
fails I simply ntfs_unlink() the link I just created with ntfs_link().
That already works fine. But yes at the moment I do lose the target if
either the ntfs_link() or the second ntfs_unlink() fails.
I am planning to fix this though. It is not too hard because I can more
or less just ntfs_link() it back in (with a few special casings).
Even if I implement are more "native" rename not using
ntfs_unlink()/ntfs_link() I still have to remove the target first
otherwise I cannot insert the new target because you cannot have two
entries in the B+tree that have identical index keys (the index key is the
filename itself so the two would be equal). And I cannot simply replace
the B+tree index entry because NTFS is not case sensitive so the two
names (old and new) may not be in the same place in the B+tree. The only
thing I can do is to remove the old B+tree index entry and add the new one
after that fact. So no matter what I do, I have to be able to recreate
the old index entry and that is just what ntfs_link() does which is why
I chose my "simplified rename" approach in the first place.
Thanks a lot for you comments about the locking!
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer, http://www.linux-ntfs.org/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2006-04-20 12:24 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-19 10:50 A missing i_mutex in rename? (Linux kernel 2.6.latest) Anton Altaparmakov
2006-04-19 12:18 ` Matthew Wilcox
2006-04-19 12:51 ` Anton Altaparmakov
2006-04-20 10:59 ` Al Viro
2006-04-20 12:24 ` Anton Altaparmakov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).