Re: [rfc][patch 1/2] mnt_want_write speedup 1

From: Nick Piggin <npiggin@suse.de>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [rfc][patch 1/2] mnt_want_write speedup 1
Date: Fri, 19 Dec 2008 07:52:42 +0100
Message-ID: <20081219065242.GD16268@wotan.suse.de>
In-Reply-To: <1229668492.17206.594.camel@nimitz>

On Thu, Dec 18, 2008 at 10:34:52PM -0800, Dave Hansen wrote:
> On Fri, 2008-12-19 at 07:19 +0100, Nick Piggin wrote:
> > Hi. Fun, chasing down performance regressions.... I wonder what people think
> > about these patches? Is it OK to bloat struct vfsmount? Any races?
> 
> Very cool stuff, Nick.  I especially like how much it simplifies things
> and removes *SO* much code.

Thanks.

> Bloating the vfsmount was one of the things that I really, really tried to
> avoid.  When I start to think about the SGI machines, it gets me really
> worried.  I went to a lot of trouble to make sure that the per-vfsmount
> memory overhead didn't scale with the number of cpus.

Well, OTOH, the SGI machines have a lot of memory ;) I *think* that
not many systems have thousands of mounts (given that the
mount hashtable is a fixed-size single page), but I might be wrong,
which is why I ask here.

Let's say a 4096 CPU machine with one mount for each CPU (4096 mounts):
I think it should only use about 128MB total for the counters. OK, yes,
that is a lot ;) but not exactly insane for a machine of that size.

Say for a 32 CPU system with 10,000 mounts, it's 9MB.
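
The estimate is just CPUs times mounts times the size of one per-CPU
counter. The message does not state the per-counter size; the 8 bytes
used below is an assumption, chosen because it reproduces the 128MB
figure for the 4096 x 4096 case. A rough sketch, with a hypothetical
helper name:

```c
#include <stdint.h>

/*
 * Total bytes consumed by per-CPU write counters across all mounts.
 * counter_bytes is an assumption supplied by the caller; the original
 * message does not say how large each per-CPU counter is.
 */
uint64_t mnt_counter_bytes(uint64_t cpus, uint64_t mounts,
                           uint64_t counter_bytes)
{
    return cpus * mounts * counter_bytes;
}
```

With 4096 CPUs, 4096 mounts and 8 bytes per counter this comes to
134,217,728 bytes, i.e. 128MB. The point of the formula is that the
cost grows with the product of CPUs and mounts, which is the scaling
Dave was trying to avoid.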

> > This could
> > be made even faster if mnt_make_readonly could tolerate a really high latency
> > synchronize_rcu()... can it?)
> 
> Yes, I think it can tolerate it.  There's a lot of work to do, and we
> already have to go touch all the other per-cpu objects.  There also
> tends to be writeout when this happens, so I don't think a few seconds,
> even, will be noticed.

That would be good. After the first patch, mnt_want_write still shows up
on profiles and almost all the hits come right after the msync from
the smp_mb there.

It would be really nice to use RCU here. I think it might allow us to
eliminate the memory barriers.
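
To show where that barrier sits, here is a minimal user-space sketch of
a per-CPU write-counter scheme of the kind discussed in this thread. It
is not the code from the patch: the struct, the function names and
NR_CPUS are all hypothetical, and a C11 seq_cst fence stands in for
smp_mb(). The fast path pays for a full barrier on every call; the idea
floated above is that an RCU grace period on the (rare) read-only
transition might let the fast path drop it.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4   /* hypothetical; small so the sketch is easy to follow */

struct mount_sketch {
    atomic_int  writers[NR_CPUS];   /* one write counter per CPU */
    atomic_bool readonly;           /* set while the mount is read-only */
};

/* Fast path: bump this CPU's counter, then check the read-only flag. */
int sketch_want_write(struct mount_sketch *m, int cpu)
{
    atomic_fetch_add_explicit(&m->writers[cpu], 1, memory_order_relaxed);
    /* Stands in for smp_mb(): order the increment before the flag read. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&m->readonly, memory_order_relaxed)) {
        /* Lost the race with a read-only transition: back out. */
        atomic_fetch_sub_explicit(&m->writers[cpu], 1, memory_order_relaxed);
        return -1;  /* the kernel would return -EROFS here */
    }
    return 0;
}

/* Release a write reference taken with sketch_want_write(). */
void sketch_drop_write(struct mount_sketch *m, int cpu)
{
    atomic_fetch_sub_explicit(&m->writers[cpu], 1, memory_order_relaxed);
}

/* Slow path: set the flag, then sum every CPU's counter. */
int sketch_make_readonly(struct mount_sketch *m)
{
    int total = 0;

    atomic_store_explicit(&m->readonly, true, memory_order_relaxed);
    /* Pairs with the fence in the fast path. */
    atomic_thread_fence(memory_order_seq_cst);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        total += atomic_load_explicit(&m->writers[cpu], memory_order_relaxed);
    if (total > 0) {
        /* Writers still active: undo and report busy. */
        atomic_store_explicit(&m->readonly, false, memory_order_relaxed);
        return -1;  /* the kernel would return -EBUSY here */
    }
    return 0;
}
```

The two fences pair up so that either the writer sees the read-only
flag or the read-only path sees the writer's increment. The slow path
has to touch every CPU's counter, which is why a long-latency
synchronize_rcu() there would be tolerable.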

> > This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
> > basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
> > A microbenchmark yes, but it exercises some important paths in the mm.
> 
> Do you know where the overhead actually came from?  Was it the
> spinlocks?  Was removing all the atomic ops what really helped?

I think about 95% of the unhalted cycles were hit against the two
instructions after the call to spin_lock. It wasn't actually flipping
the write counter per-cpu cache as far as I could see. I didn't save
the instruction level profiles, but I'll do another run if people
think it will be sane to use RCU here.

> I'll take a more in-depth look at your code tomorrow and see if I see
> any races.

Thanks.
