Re: + fs-bufferc-make-bh_lru_install-more-efficient.patch added to -mm tree

linux-fsdevel.vger.kernel.org archive mirror

* Re: + fs-bufferc-make-bh_lru_install-more-efficient.patch added to -mm tree
       [not found] ` <alpine.DEB.2.20.1706011050590.8835@east.gentwo.org>
@ 2017-06-03  5:44   ` Eric Biggers
  2017-06-05 13:51     ` Christoph Lameter
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Biggers @ 2017-06-03  5:44 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, viro, mm-commits, linux-fsdevel

+Cc linux-fsdevel

On Thu, Jun 01, 2017 at 11:07:19AM -0500, Christoph Lameter wrote:
> On Wed, 31 May 2017, akpm@linux-foundation.org wrote:
> 
> > +	struct buffer_head *evictee = bh;
> > +	struct bh_lru *b;
> > +	int i;
> > +	b = this_cpu_ptr(&bh_lrus);
> > +	for (i = 0; i < BH_LRU_SIZE; i++) {
> > +		swap(evictee, b->bhs[i]);
> 
> Could you try to use this_cpu_xchg here to see if it reduces latency
> further?
> 
> for (i = 0; i < BH_LRU_SIZE; i++) {
> 	__this_cpu_xchg(bh_lrus->bhs[i], evictee)
> 
> ...
> 

I tried --- actually, 'evictee = __this_cpu_xchg(bh_lrus.bhs[i], evictee)'.  But
it's much slower, nearly as slow as the original --- which perhaps is not
surprising since __this_cpu_xchg() is a cmpxchg rather than a simple load and
store.  It may be even worse on non-x86 architectures.  Also note that we still
have to disable IRQs because we need to stay on the same CPU throughout so that
only a single queue is operated on.
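
For readers following along, here is a minimal userspace sketch of the swap loop quoted above. It only illustrates the plain load-and-store version and is not the kernel code: `struct lru`, `LRU_SIZE`, and `lru_install()` are hypothetical stand-ins for the per-CPU `bh_lrus`, `BH_LRU_SIZE`, and `bh_lru_install()`, and the IRQ disabling, per-CPU access, and reference counting are all left out.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_SIZE 4	/* hypothetical; stands in for BH_LRU_SIZE */

/* Hypothetical stand-in for one CPU's struct bh_lru. */
struct lru {
	void *slots[LRU_SIZE];	/* slots[0] is the most recently used */
};

/*
 * Put item at the front and shift older entries down one slot each.
 * Returns the entry pushed off the end, or NULL if nothing was evicted
 * (item was already in the list, or a free slot absorbed the shift).
 */
static void *lru_install(struct lru *l, void *item)
{
	void *evictee = item;
	int i;

	assert(item != NULL);
	for (i = 0; i < LRU_SIZE; i++) {
		/* swap(evictee, slots[i]) written as a plain load and store */
		void *tmp = l->slots[i];

		l->slots[i] = evictee;
		evictee = tmp;

		/* Reached item's old slot: the list is reordered, stop early. */
		if (evictee == item)
			return NULL;
	}
	return evictee;
}
```

The variant I measured replaces the load/store pair in the loop body with 'evictee = __this_cpu_xchg(bh_lrus.bhs[i], evictee)'; the control flow is otherwise the same.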

Eric


* Re: + fs-bufferc-make-bh_lru_install-more-efficient.patch added to -mm tree
  2017-06-03  5:44   ` + fs-bufferc-make-bh_lru_install-more-efficient.patch added to -mm tree Eric Biggers
@ 2017-06-05 13:51     ` Christoph Lameter
  0 siblings, 0 replies; 2+ messages in thread
From: Christoph Lameter @ 2017-06-05 13:51 UTC (permalink / raw)
  To: Eric Biggers; +Cc: akpm, viro, mm-commits, linux-fsdevel

On Fri, 2 Jun 2017, Eric Biggers wrote:

> I tried --- actually, 'evictee = __this_cpu_xchg(bh_lrus.bhs[i], evictee)'.  But
> it's much slower, nearly as slow as the original --- which perhaps is not
> surprising since __this_cpu_xchg() is a cmpxchg rather than a simple load and
> store.  It may be even worse on non-x86 architectures.  Also note that we still

It's a local cmpxchg, which should only take a few cycles.

> have to disable IRQs because we need to stay on the same CPU throughout so that
> only a single queue is operated on.

Ah, OK, that would kill it.
