* SMP race in VFS.
@ 2001-12-13 20:14 Michael Gaughen
0 siblings, 0 replies; only message in thread
From: Michael Gaughen @ 2001-12-13 20:14 UTC (permalink / raw)
To: linux-kernel; +Cc: mgaughen
Hello,
I have a question about a potential race, on SMP machines,
between the creat(2) and rename(2) operations. Rename has the
semantic that if the "newpath" exists, then it will be atomically
replaced. Now I have two processes: one is renaming a directory
"a" to directory "b" (which _is_ empty), while the second is
trying to create a file "c" within directory "b." In that test, the
directory "b" should _always_ exist, therefore, the create should
always succeed. However, there exists a window of opportunity
in which the create can return ENOENT.
Upon further investigation of the VFS code, I found:
do_rename():
...
lock_kernel();
error = vfs_rename(old_dir->d_inode, old_dentry,
new_dir->d_inode, new_dentry);
unlock_kernel();
...
that eventually calls down to:
vfs_rename_dir():
if (target) { /* Hastur! Hastur! Hastur! */
triple_down(&old_dir->i_zombie,
&new_dir->i_zombie,
&target->i_zombie);
d_unhash(new_dentry);
} else
double_down(&old_dir->i_zombie,
&new_dir->i_zombie);
...
if (target) {
if (!error)
target->i_flags |= S_DEAD;
triple_up(&old_dir->i_zombie,
&new_dir->i_zombie,
&target->i_zombie);
if (d_unhashed(new_dentry))
d_rehash(new_dentry);
dput(new_dentry);
} else
double_up(&old_dir->i_zombie,
&new_dir->i_zombie);
if (!error)
d_move(old_dentry,new_dentry);
...
vfs_create():
...
down(&dir->i_zombie);
error = may_create(dir, dentry);
if (error)
goto exit_locks;
...
lock_kernel();
error = dir->i_op->create(dir, dentry, mode);
unlock_kernel();
...
Now, in this case, both processes have performed their
path walks, and are now referencing the same dentry/inode
corresponding to the directory "b." The rename process is
able to acquire the BKL, and the necessary i_zombies first.
The create process must then block, within vfs_create,
waiting for directory "b's" i_zombie. The rename operation
completes, sets S_DEAD within the target inode (in this case
the target inode is the directory "b"), and releases all of its
i_zombies. At this point, before vfs_rename_dir is able to
call d_move, the process blocked in vfs_create is able to
acquire the i_zombie, and call may_create - which finds that
the directory "b's" inode is now dead and returns ENOENT.
The problem boils down to the fact that the d_move was not
called before the may_create. I am not sure of the appropriate
fix for this race, but I see at least two options. One is to move
the call to may_create within the lock_kernel() - however, that
may not be a good long term solution if the BKL is going away.
Another possibility is to move the call to d_move within the
protection of the i_zombie (within vfs_rename_dir).
The test was run on an SMP machine running Linux kernel2.4.7.
Upon inspection, I found that the race still exists in the latest stable
release: 2.4.16. And this race _only_ exists_ on SMP machines.
Comments? Ideas? Flames?
Thanks,
-Mike Gaughen
ps. Please CC me on any replies, as I am not subscribed to the list.
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2001-12-13 20:17 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-12-13 20:14 SMP race in VFS Michael Gaughen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox