From: Dave Hansen <dave@linux.vnet.ibm.com>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>, Nick Piggin <npiggin@suse.de>,
linux-kernel@vger.kernel.org
Subject: Re: mnt_want_write_file() has problem?
Date: Mon, 03 Aug 2009 13:37:00 -0700
Message-ID: <1249331820.26977.1746.camel@nimitz>
In-Reply-To: <8763d4ivi5.fsf@devron.myhome.or.jp>

On Tue, 2009-08-04 at 03:48 +0900, OGAWA Hirofumi wrote:
> Dave Hansen <dave@linux.vnet.ibm.com> writes:
>
> > On Mon, 2009-08-03 at 06:36 +0900, OGAWA Hirofumi wrote:
> >> While I'm reading some code, I suspected that mnt_want_write_file() may
> >> have wrong assumption. I think mnt_want_write_file() is assuming it
> >> increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's
> >> special_file(), it is false?
> >>
> >> Sorry, I'm still not checking all of those though. E.g. I'm thinking the
> >> below.
> >>
> >> static inline int __get_file_write_access(struct inode *inode,
> >> struct vfsmount *mnt)
> >> {
> >> [...]
> >> if (!special_file(inode->i_mode)) {
> >> /*
> >> * Balanced in __fput()
> >> */
> >> error = mnt_want_write(mnt);
> >> if (error)
> >> put_write_access(inode);
> >> }
> >> return error;
> >> }
> >
> > In practice I don't think this is an issue. We were never supposed to
> > do mnt_want_write(mnt) for any 'struct file' that was a special_file(),
> > specifically because of what you mention.
> >
> > Nick's use of mnt_want_write_file() was a 1:1 drop-in for
> > mnt_want_write(). So, if all is well in the world, there should not be
> > any call sites where mnt_want_write_file() gets called on a
> > special_file().
>
> void file_update_time(struct file *file)
> sys_fchmod()
> sys_fchown()
> sys_fsetxattr()
> sys_fremovexattr()
>
> Um..., the users of mnt_want_write_file() seems to be those. I think
> all of those filp can be special file?

OK, I see where you're going now. I think the race goes like this:

Let's say we have a process with /dev/null opened with FMODE_WRITE. It
is the only file open on the filesystem, and since opening a special
file never does mnt_want_write(), the /dev mount has an mnt_writers
count of 0. That process then calls fchmod() on its /dev/null fd.
The code checks and notices that (file->f_mode & FMODE_WRITE), and goes
to mnt_clone_write().

At the same time, another process tries to 'mount -o remount,ro /dev'.
That process never sees mnt_clone_write()'s mnt_writers bump and allows
the remount,ro, even though there's an elevated mnt_writers count.
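
To make that interleaving concrete, here is a rough userspace model of
it. Nothing in it is kernel code: the toy_* types and helpers are
made up for illustration, and it simply assumes that the slow
mnt_want_write() path is ordered against remount while the
mnt_clone_write() path is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are not kernel types. */
struct toy_mount {
	int writers;		/* models the summed mnt_writers count */
	bool readonly;		/* models the mount being read-only */
};

struct toy_file {
	bool fmode_write;	/* models file->f_mode & FMODE_WRITE */
	bool special;		/* models special_file(inode->i_mode) */
};

/* Opening for write skips the writer count for special files. */
static void toy_open(struct toy_mount *m, const struct toy_file *f)
{
	if (f->fmode_write && !f->special)
		m->writers++;
}

/* Path selection before the patch: FMODE_WRITE alone picks the clone. */
static bool old_uses_clone(const struct toy_file *f)
{
	return f->fmode_write;
}

/* Path selection after the patch: special files take the slow path. */
static bool new_uses_clone(const struct toy_file *f)
{
	return f->fmode_write && !f->special;
}

/* remount,ro goes ahead only if the count it sampled was zero. */
static bool toy_remount_ro(struct toy_mount *m, int writers_seen)
{
	if (writers_seen > 0)
		return false;
	m->readonly = true;
	return true;
}

/*
 * Replays one interleaving: remount samples the count, the fchmod()
 * caller takes write access, then remount decides. Returns true if the
 * mount ends up read-only while a writer is still counted.
 */
static bool toy_race_leaves_ro_with_writer(const struct toy_file *f,
					   bool use_clone)
{
	struct toy_mount m = { 0, false };
	int seen;

	toy_open(&m, f);
	seen = m.writers;		/* remount samples the count */

	m.writers++;			/* fchmod() caller takes write access */
	if (!use_clone)
		seen = m.writers;	/* assumed: slow path is visible to remount */

	toy_remount_ro(&m, seen);
	return m.readonly && m.writers > 0;
}
```

With a writable special file, the old selection lets the remount go
through with a writer still counted; the new selection does not. A
regular file opened for write already holds a count, so the clone path
stays safe for it.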

Here's a completely untested/uncompiled patch. I'll see if I can find a
test case that triggers this bug with the BUG_ON() in this patch.

diff --git a/fs/namespace.c b/fs/namespace.c
index 277c28a..a4714c4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -294,9 +294,17 @@ EXPORT_SYMBOL_GPL(mnt_want_write);
  *
  * After finished, mnt_drop_write must be called as usual to
  * drop the reference.
+ *
+ * Be very careful using this. You must *guarantee* that
+ * this vfsmount has at least one existing, persistent writer
+ * that can not possibly go away, before calling this.
  */
 int mnt_clone_write(struct vfsmount *mnt)
 {
+	/* This would kill the performance
+	 * optimization in this function
+	BUG_ON(count_mnt_writers(mnt) == 0);
+	 */
 	/* superblock may be r/o */
 	if (__mnt_is_readonly(mnt))
 		return -EROFS;
@@ -312,14 +320,20 @@ EXPORT_SYMBOL_GPL(mnt_clone_write);
  * @file: the file who's mount on which to take a write
  *
  * This is like mnt_want_write, but it takes a file and can
- * do some optimisations if the file is open for write already
+ * do some optimisations if the file is open for write already.
+ * We do not do mnt_want_write() on read-only or special files,
+ * so we can not use mnt_clone_write() for them.
  */
 int mnt_want_write_file(struct file *file)
 {
-	if (!(file->f_mode & FMODE_WRITE))
-		return mnt_want_write(file->f_path.mnt);
-	else
-		return mnt_clone_write(file->f_path.mnt);
+	struct path *path = &file->f_path;
+	struct inode *inode = path->dentry->d_inode;
+
+	if ((file->f_mode & FMODE_WRITE) &&
+	    !special_file(inode->i_mode))
+		return mnt_clone_write(path->mnt);
+
+	return mnt_want_write(path->mnt);
 }
 EXPORT_SYMBOL_GPL(mnt_want_write_file);

-- Dave