Re: [rfc][patch 1/2] mnt_want_write speedup 1


From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [rfc][patch 1/2] mnt_want_write speedup 1
Date: Fri, 19 Dec 2008 07:32:01 -0800
Message-ID: <1229700721.17206.634.camel@nimitz>
In-Reply-To: <20081219070311.GA26419@wotan.suse.de>

On Fri, 2008-12-19 at 08:03 +0100, Nick Piggin wrote:
> On Thu, Dec 18, 2008 at 10:54:57PM -0800, Dave Hansen wrote:
> > On Fri, 2008-12-19 at 07:19 +0100, Nick Piggin wrote:
> > > @@ -369,24 +283,34 @@ static int mnt_make_readonly(struct vfsm
> > >  {
> > >         int ret = 0;
> > > 
> > > -       lock_mnt_writers();
> > > +       spin_lock(&vfsmount_lock);
> > > +       mnt->mnt_flags |= MNT_WRITE_HOLD;
> > >         /*
> > > -        * With all the locks held, this value is stable
> > > +        * After storing MNT_WRITE_HOLD, we'll read the counters. This store
> > > +        * should be visible before we do.
> > >          */
> > > -       if (atomic_read(&mnt->__mnt_writers) > 0) {
> > > +       smp_mb();
> > > +
> > > +       /*
> > > +        * With writers on hold, if this value is zero, then there are definitely
> > > +        * no active writers (although held writers may subsequently increment
> > > +        * the count, they'll have to wait, and decrement it after seeing
> > > +        * MNT_READONLY).
> > > +        */
> > > +       if (count_mnt_writers(mnt) > 0) {
> > >                 ret = -EBUSY;
> > 
> > OK, I think this is one of the big races inherent with this approach.
> > There's nothing in here to ensure that no one is in the middle of an
> > update during this code.  The preempt_disable() will, of course, reduce
> > the window, but I think there's still a race here.
> 
> MNT_WRITE_HOLD is set, so any writer that has already made it past
> the MNT_WANT_WRITE loop will have its count visible here. Any writer
> that has not made it past that loop will wait until the slowpath
> completes and then the fastpath will go on to check whether the
> mount is still writeable.

Ahh, got it.  I'm slowly absorbing the barriers.  Not the normal way I
code.
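
To check my own understanding of the hold/counter handshake described
above, here is a rough userspace model using C11 atomics. It is only a
sketch under assumptions, not the patch: the writer side
(model_want_write, model_drop_write) is reconstructed from the
description in this thread rather than copied from the diff, the flag
values and MODEL_NR_CPUS are made up, the names prefixed with model_ are
hypothetical, and the tail of model_make_readonly (setting MNT_READONLY
and clearing MNT_WRITE_HOLD) is my guess at what follows the quoted
hunk. atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(),
and the vfsmount_lock is left out because the model assumes a single
remounter.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS  4     /* assumption: fixed CPU count for the model */
#define MNT_READONLY   0x01  /* illustrative values, not the kernel's */
#define MNT_WRITE_HOLD 0x02

struct mnt_model {
	atomic_int flags;                  /* stands in for mnt->mnt_flags */
	atomic_int writers[MODEL_NR_CPUS]; /* stands in for per-cpu counters */
};

/* Sum the per-"cpu" writer counts; only meaningful while writers are held. */
static int count_mnt_writers(struct mnt_model *mnt)
{
	int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += atomic_load(&mnt->writers[cpu]);
	return sum;
}

/* Writer fast path: publish the count first, then wait out any hold. */
static int model_want_write(struct mnt_model *mnt, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	atomic_fetch_add(&mnt->writers[cpu], 1);
	/* Count must be visible before we look at MNT_WRITE_HOLD. */
	atomic_thread_fence(memory_order_seq_cst);

	while (atomic_load(&mnt->flags) & MNT_WRITE_HOLD)
		; /* spin until the remount slow path finishes */

	if (atomic_load(&mnt->flags) & MNT_READONLY) {
		/* Lost the race with a remount: back out the count. */
		atomic_fetch_sub(&mnt->writers[cpu], 1);
		return -EROFS;
	}
	return 0;
}

static void model_drop_write(struct mnt_model *mnt, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	atomic_fetch_sub(&mnt->writers[cpu], 1);
}

/* Remount slow path: hold writers, then decide based on the summed count. */
static int model_make_readonly(struct mnt_model *mnt)
{
	int ret = 0;

	atomic_fetch_or(&mnt->flags, MNT_WRITE_HOLD);
	/* Hold must be visible before we read the counters. */
	atomic_thread_fence(memory_order_seq_cst);

	if (count_mnt_writers(mnt) > 0)
		ret = -EBUSY;
	else
		atomic_fetch_or(&mnt->flags, MNT_READONLY);

	/* Assumed tail: release held writers after the decision is made. */
	atomic_fetch_and(&mnt->flags, ~MNT_WRITE_HOLD);
	return ret;
}
```

Walking it single-threaded: with one writer active, model_make_readonly()
returns -EBUSY and drops the hold; once that writer calls
model_drop_write(), the remount succeeds and a later model_want_write()
returns -EROFS with the summed count back at zero. The two fences are
the point of the exercise: each side stores its own state, then reads
the other's, so at least one of them has to see the other.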

I thought there was another race with MNT_WRITE_HOLD since mnt_flags
isn't really managed atomically.  But, by only modifying with the
vfsmount_lock, I think it is OK.

I also wondered if there was a possibility of getting a spurious -EBUSY
when remounting r/w->r/o.  But, that turned out to just happen when the
fs was *already* r/o.  So that looks good.

While this has cleared out a huge amount of complexity, I can't stop
wondering if this could be done with a wee bit more "normal" operations.
I'm pretty sure I couldn't have come up with this by myself, and I'm a
bit worried that I wouldn't be able to find a race in it if one reared
its ugly head.  

Is there a real good reason to allocate the percpu counters dynamically?
Might as well stick them in the vfsmount and let the one
kmem_cache_zalloc() in alloc_vfsmnt() do a bit larger of an allocation.
Did you think that was going to bloat it to a compound allocation or
something?  I hate the #ifdefs. :)

-- Dave


