public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michael Gaughen <mgaughen@polyserve.com>
To: linux-kernel@vger.kernel.org
Cc: mgaughen@polyserve.com
Subject: SMP race in VFS.
Date: Thu, 13 Dec 2001 12:14:44 -0800	[thread overview]
Message-ID: <3C190C34.50008@polyserve.com> (raw)

Hello,

  I have a question about a potential race, on SMP machines,
between the creat(2) and rename(2) operations.  Rename has the
semantic that if the "newpath" exists, then it will be atomically
replaced.  Now I have two processes: one is renaming a directory
"a" to directory "b" (which _is_ empty), while the second is
trying to create a file "c" within directory "b."  In that test, the
directory "b" should _always_ exist, therefore, the create should
always succeed.  However, there exists a window of opportunity
in which the create can return ENOENT.

  Upon further investigation of the VFS code, I found:

     do_rename():
        ...
        lock_kernel();
        error = vfs_rename(old_dir->d_inode, old_dentry,
                                   new_dir->d_inode, new_dentry);
        unlock_kernel();
        ...

  that eventually calls down to:

      vfs_rename_dir():
        if (target) { /* Hastur! Hastur! Hastur! */
                triple_down(&old_dir->i_zombie,
                            &new_dir->i_zombie,
                            &target->i_zombie);
                d_unhash(new_dentry);
        } else
                double_down(&old_dir->i_zombie,
                            &new_dir->i_zombie);
        ...
        if (target) {
                if (!error)
                        target->i_flags |= S_DEAD;
                triple_up(&old_dir->i_zombie,
                          &new_dir->i_zombie,
                          &target->i_zombie);
                if (d_unhashed(new_dentry))
                        d_rehash(new_dentry);
                dput(new_dentry);
        } else
                double_up(&old_dir->i_zombie,
                          &new_dir->i_zombie);

        if (!error)
                d_move(old_dentry,new_dentry);
         ...

     vfs_create():
         ...
        down(&dir->i_zombie);
        error = may_create(dir, dentry);
        if (error)
                goto exit_locks;
        ...
        lock_kernel();
        error = dir->i_op->create(dir, dentry, mode);
        unlock_kernel();
        ...

Now, in this case, both processes have performed their
path walks, and are now referencing the same dentry/inode
corresponding to the directory "b."  The rename process is
able to acquire the BKL, and the necessary i_zombies first.
The create process must then block, within vfs_create,
waiting for directory "b's" i_zombie.  The rename operation
completes, sets S_DEAD within the target inode (in this case
the target inode is the directory "b"), and releases all of its
i_zombies.  At this point, before vfs_rename_dir is able to
call d_move, the process blocked in vfs_create is able to
acquire the i_zombie, and call may_create - which finds that
the directory "b's" inode is now dead and returns ENOENT.

The problem boils down to the fact that the d_move was not
called before the may_create.  I am not sure of the appropriate
fix for this race, but I see at least two options.  One is to move
the call to may_create within the lock_kernel() - however, that
may not be a good long term solution if the BKL is going away.  
Another possibility is to move the call to d_move within the
protection of the i_zombie (within vfs_rename_dir).

The test was run on an SMP machine running Linux kernel2.4.7.  
Upon inspection, I found that the race still exists in the latest stable
release: 2.4.16.  And this race _only_ exists_ on SMP machines.

Comments?  Ideas?  Flames?

Thanks,
-Mike Gaughen

ps.  Please CC me on any replies, as I am not subscribed to the list.


                 reply	other threads:[~2001-12-13 20:17 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3C190C34.50008@polyserve.com \
    --to=mgaughen@polyserve.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox