From: Michael Gaughen <mgaughen@polyserve.com>
To: linux-kernel@vger.kernel.org
Cc: mgaughen@polyserve.com
Subject: SMP race in VFS.
Date: Thu, 13 Dec 2001 12:14:44 -0800 [thread overview]
Message-ID: <3C190C34.50008@polyserve.com> (raw)
Hello,
I have a question about a potential race, on SMP machines,
between the creat(2) and rename(2) operations. Rename has the
semantic that if the "newpath" exists, then it will be atomically
replaced. Now I have two processes: one is renaming a directory
"a" to directory "b" (which _is_ empty), while the second is
trying to create a file "c" within directory "b." In that test, the
directory "b" should _always_ exist, therefore, the create should
always succeed. However, there exists a window of opportunity
in which the create can return ENOENT.
Upon further investigation of the VFS code, I found:
do_rename():
...
lock_kernel();
error = vfs_rename(old_dir->d_inode, old_dentry,
new_dir->d_inode, new_dentry);
unlock_kernel();
...
that eventually calls down to:
vfs_rename_dir():
if (target) { /* Hastur! Hastur! Hastur! */
triple_down(&old_dir->i_zombie,
&new_dir->i_zombie,
&target->i_zombie);
d_unhash(new_dentry);
} else
double_down(&old_dir->i_zombie,
&new_dir->i_zombie);
...
if (target) {
if (!error)
target->i_flags |= S_DEAD;
triple_up(&old_dir->i_zombie,
&new_dir->i_zombie,
&target->i_zombie);
if (d_unhashed(new_dentry))
d_rehash(new_dentry);
dput(new_dentry);
} else
double_up(&old_dir->i_zombie,
&new_dir->i_zombie);
if (!error)
d_move(old_dentry,new_dentry);
...
vfs_create():
...
down(&dir->i_zombie);
error = may_create(dir, dentry);
if (error)
goto exit_locks;
...
lock_kernel();
error = dir->i_op->create(dir, dentry, mode);
unlock_kernel();
...
Now, in this case, both processes have performed their
path walks, and are now referencing the same dentry/inode
corresponding to the directory "b." The rename process is
able to acquire the BKL, and the necessary i_zombies first.
The create process must then block, within vfs_create,
waiting for directory "b's" i_zombie. The rename operation
completes, sets S_DEAD within the target inode (in this case
the target inode is the directory "b"), and releases all of its
i_zombies. At this point, before vfs_rename_dir is able to
call d_move, the process blocked in vfs_create is able to
acquire the i_zombie, and call may_create - which finds that
the directory "b's" inode is now dead and returns ENOENT.
The problem boils down to the fact that the d_move was not
called before the may_create. I am not sure of the appropriate
fix for this race, but I see at least two options. One is to move
the call to may_create within the lock_kernel() - however, that
may not be a good long term solution if the BKL is going away.
Another possibility is to move the call to d_move within the
protection of the i_zombie (within vfs_rename_dir).
The test was run on an SMP machine running Linux kernel2.4.7.
Upon inspection, I found that the race still exists in the latest stable
release: 2.4.16. And this race _only_ exists_ on SMP machines.
Comments? Ideas? Flames?
Thanks,
-Mike Gaughen
ps. Please CC me on any replies, as I am not subscribed to the list.
reply other threads:[~2001-12-13 20:17 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3C190C34.50008@polyserve.com \
--to=mgaughen@polyserve.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.