From: Dave Hansen <haveblue@us.ibm.com>
To: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org, miklos@szeredi.hu,
hch@infradead.org
Subject: Re: [PATCH] track number of mnts writing to superblocks
Date: Thu, 03 Jan 2008 11:37:49 -0800 [thread overview]
Message-ID: <1199389069.13731.72.camel@localhost> (raw)
In-Reply-To: <20080103040258.GB14117@sergelap.austin.ibm.com>
On Wed, 2008-01-02 at 22:02 -0600, Serge E. Hallyn wrote:
> Ok I'm blabbing quite a bit here while trying to figure out
> the patch, and maybe there are some useful hints for where more
> comments would be useful. But other than the fact that
> mark_mnt_has_writer() needs to the atomic_inc() even if
> cpu_writer was passed in as NULL, the patch seems good.
Yeah, I screwed that up. Should be fixed now.
> > Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
> > ---
> >
> > linux-2.6.git-dave/fs/file_table.c | 24 -----
> > linux-2.6.git-dave/fs/namespace.c | 134 +++++++++++++++++++++++++------
> > linux-2.6.git-dave/fs/super.c | 61 +++++++++++---
> > linux-2.6.git-dave/include/linux/fs.h | 5 -
> > linux-2.6.git-dave/include/linux/mount.h | 3
> > 5 files changed, 163 insertions(+), 64 deletions(-)
> >
> > diff -puN fs/file_table.c~track_sb_mnt_writers fs/file_table.c
> > --- linux-2.6.git/fs/file_table.c~track_sb_mnt_writers 2008-01-02 10:49:11.000000000 -0800
> > +++ linux-2.6.git-dave/fs/file_table.c 2008-01-02 10:49:11.000000000 -0800
> > @@ -374,30 +374,6 @@ void file_kill(struct file *file)
> > }
> > }
> >
> > -int fs_may_remount_ro(struct super_block *sb)
> > -{
> > - struct file *file;
> > -
> > - /* Check that no files are currently opened for writing. */
> > - file_list_lock();
> > - list_for_each_entry(file, &sb->s_files, f_u.fu_list) {
> > - struct inode *inode = file->f_path.dentry->d_inode;
> > -
> > - /* File with pending delete? */
> > - if (inode->i_nlink == 0)
> > - goto too_bad;
> > -
> > - /* Writeable file? */
> > - if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
>
> (why did this originally skip directories?)
I think it's more to skip device files and named pipes than directories.
I don't even know what happens offhand if you try a plain write() to an
open directory.
> > diff -puN fs/file_table.c.orig~track_sb_mnt_writers fs/file_table.c.orig
> > diff -puN fs/namespace.c~track_sb_mnt_writers fs/namespace.c
> > --- linux-2.6.git/fs/namespace.c~track_sb_mnt_writers 2008-01-02 10:49:11.000000000 -0800
> > +++ linux-2.6.git-dave/fs/namespace.c 2008-01-02 13:39:52.000000000 -0800
> > @@ -118,7 +118,7 @@ struct mnt_writer {
> > * If holding multiple instances of this lock, they
> > * must be ordered by cpu number.
> > */
> > - spinlock_t lock;
> > + struct mutex lock;
> > struct lock_class_key lock_class; /* compiles out with !lockdep */
> > unsigned long count;
> > struct vfsmount *mnt;
> > @@ -130,7 +130,7 @@ static int __init init_mnt_writers(void)
> > int cpu;
> > for_each_possible_cpu(cpu) {
> > struct mnt_writer *writer = &per_cpu(mnt_writers, cpu);
> > - spin_lock_init(&writer->lock);
> > + mutex_init(&writer->lock);
> > lockdep_set_class(&writer->lock, &writer->lock_class);
> > writer->count = 0;
> > }
> > @@ -145,11 +145,52 @@ static void mnt_unlock_cpus(void)
> >
> > for_each_possible_cpu(cpu) {
> > cpu_writer = &per_cpu(mnt_writers, cpu);
> > - spin_unlock(&cpu_writer->lock);
> > + mutex_unlock(&cpu_writer->lock);
> > }
> > }
> >
> > -static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
> > +static int mark_mnt_has_writer(struct vfsmount *mnt,
> > + struct mnt_writer *cpu_writer)
> > +{
> > + /*
> > + * Ensure that if there are people racing to set
> > + * the bit that only one of them succeeds and can
> > + * increment the sb count.
> > + */
> > + if (test_and_set_bit(ilog2(MNT_MAY_HAVE_WRITERS), &mnt->mnt_flags))
> > + return 0;
>
> Comment isn't entirely clear, but you're returning 0 here because
> someone else has already set the flag and incremented
> sb->__s_mnt_writers so you don't have to and you're all set to go on
> with writing?
Yeah. This function is all about making sure that the sb is marked
properly because the mnt is writable. If it's already marked, then
we're good to go.
> > + if (cpu_writer == NULL)
> > + return 0;
> > +
> > + /*
> > + * Our goal here is to get exclusion from a superblock
> > + * remount operation (fs_may_remount_ro()). This is
> > + * effectively a slow path that we must go through the
> > + * first time we set the bit on each mount, but never
> > + * again unless the writer counts get coalesced.
> > + */
> > +
> > + mutex_unlock(&cpu_writer->lock);
> > + lock_super(mnt->mnt_sb);
> > +
> > + atomic_inc(&mnt->mnt_sb->__s_mnt_writers);
>
> The atomic_inc of course should be done even if cpu_writer was passed
> in as NULL, you just don't need to do the locking then, and can
> return 0 in that case?
Yep. I'll fix that.
> > +
> > + unlock_super(mnt->mnt_sb);
> > + mutex_lock(&cpu_writer->lock);
> > + return -EAGAIN;
> > +}
> > +
> > +static void __mark_sb_if_writable(struct vfsmount *mnt)
>
> This function is taking the writable state from a mnt and marking it in
> the sb. So the name should be a shorter verison of something like
> "commit_mnt_writable_state_to_sb"
Here's what I have now:
static void __sync_mnt_writable_to_sb(struct vfsmount *mnt)
> > +{
> > + int bitnr = ilog2(MNT_MAY_HAVE_WRITERS);
> > +
> > + if (atomic_read(&mnt->__mnt_writers))
> > + mark_mnt_has_writer(mnt, NULL);
> > + else if (test_and_clear_bit(bitnr, &mnt->mnt_flags))
> > + atomic_dec(&mnt->mnt_sb->__s_mnt_writers);
>
> And after staring at this code for awhile it did make sense, but a
> comment above it would help saying that
>
> /* If mnt has writers, mark_mnt_has_writer() will make
> sure that is marked in the sb. If mnt has no writers,
> then if mnt->mnt_flags was previously set, that means
> that mnt->mnt_sb->__s_mnt_writers reflects our having
> writers, so we decrement it
> */
>
> Ok, maybe it's clearer to other people and the comment isn't needed...
You couldn't figure it out at first glance, and I tend to think that
most people will have the same experience. I'll see if I can add a
better comment.
I'll send out an updated patch in a bit along with a simpler alternative
patch.
-- Dave
next prev parent reply other threads:[~2008-01-03 19:38 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-02 21:51 [PATCH] track number of mnts writing to superblocks Dave Hansen
2008-01-03 4:02 ` Serge E. Hallyn
2008-01-03 19:37 ` Dave Hansen [this message]
2008-01-10 7:42 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1199389069.13731.72.camel@localhost \
--to=haveblue@us.ibm.com \
--cc=akpm@osdl.org \
--cc=hch@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=serue@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.