public inbox for linux-kernel@vger.kernel.org
* Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
@ 2009-01-30  1:23 Jan Kara
  2009-02-03  1:24 ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2009-01-30  1:23 UTC (permalink / raw)
  To: LKML; +Cc: npiggin

  Hi,

  today I found that commit 31a12666d8f0c22235297e1c1575f82061480029 (mm:
write_cache_pages cyclic fix) slows down operations over Berkeley DB.
Without this "fix", I can add 100k entries in about 5 minutes 30s; with
that change it takes about 20 minutes.
  What I believe is happening is that previously we scanned to the end of the
file, left writeback_index at the end of the file, and went on to write the
next file. With the fix, we wrap around (a seek) and, after writing some more,
go to the next file (another seek).
  Anyway, I think the original semantics of "cyclic" makes more sense; just
the name was chosen poorly. What we should do is really scan to the end of the
file, reset the index to start from the beginning next time, and go on to the
next file.
  I can write a patch to introduce this semantics but I'd like to hear
opinions of other people before I do so.
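The two index policies under discussion can be sketched as a simplified
user-space model (illustrative only: function names and the page-bitmap
representation are made up here; the real logic lives in write_cache_pages()
in mm/page-writeback.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: dirty[] marks which pages of a file are dirty; *index is
 * the mapping's writeback_index carried between writeback passes. */

/* Post-31a12666 "cyclic" behaviour: scan from *index to EOF, then wrap
 * to page 0 and continue up to the starting index (the extra seek
 * described above). */
int writeback_cyclic(bool dirty[], size_t npages, size_t *index)
{
	int written = 0;
	size_t start = *index;

	for (size_t i = start; i < npages; i++)
		if (dirty[i]) { dirty[i] = false; written++; }
	for (size_t i = 0; i < start; i++)	/* wrap-around pass */
		if (dirty[i]) { dirty[i] = false; written++; }
	*index = start;	/* simplified; the real code records where it stopped */
	return written;
}

/* The proposed semantics: scan from *index to EOF only, then reset the
 * index so the next pass starts from the beginning of the file. */
int writeback_scan_then_reset(bool dirty[], size_t npages, size_t *index)
{
	int written = 0;

	for (size_t i = *index; i < npages; i++)
		if (dirty[i]) { dirty[i] = false; written++; }
	*index = 0;
	return written;
}
```

With the cyclic variant a single pass touches both ends of the file, which on
disk means a seek back to the start before moving on to the next file; with
the proposed variant, any backlog before the starting index is simply picked
up at the start of the next pass.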

									Honza

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
  2009-01-30  1:23 Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB Jan Kara
@ 2009-02-03  1:24 ` Nick Piggin
  2009-02-03  1:54   ` Zhang, Yanmin
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2009-02-03  1:24 UTC (permalink / raw)
  To: Jan Kara, Andrew Morton, linux-fsdevel; +Cc: LKML, npiggin

On Friday 30 January 2009 12:23:15 Jan Kara wrote:
>   Hi,
>
>   today I found that commit 31a12666d8f0c22235297e1c1575f82061480029 (mm:
> write_cache_pages cyclic fix) slows down operations over Berkeley DB.
> Without this "fix", I can add 100k entries in about 5 minutes 30s, with
> that change it takes about 20 minutes.
>   What is IMO happening is that previously we scanned to the end of file,
> we left writeback_index at the end of file and went to write next file.
> With the fix, we wrap around (seek) and after writing some more we go
> to next file (seek again).

Hmm, but isn't that what pdflush has asked for? It is wanting to flush
some of the dirty data out of this file, and hence it wants to start
from where it last flushed out and then cycle back and flush more?


>   Anyway, I think the original semantics of "cyclic" makes more sense, just
> the name was chosen poorly. What we should do is really scan to the end of
> file, reset index to start from the beginning next time and go for the next
> file.

Well, if we think of a file as containing a set of dirty pages (as it
appears to the high-level mm), rather than a sequence, then the behaviour
of my patch is correct (ie. there should be no distinction between dirty
pages at different offsets in the file).

However, clearly there is some problem with that assumption if you're
seeing a 4x slowdown :P I'd really like to know how it messes up the IO
patterns. How many files in the BDB workload? Are filesystem blocks
being allocated at the end of the file while writeout is happening?
Delayed allocation?


>   I can write a patch to introduce this semantics but I'd like to hear
> opinions of other people before I do so.

I like dirty page cleaning to be offset agnostic as far as possible,
but I can't argue with numbers like that. Though maybe it would be
possible to solve it some other way.



* Re: Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
  2009-02-03  1:24 ` Nick Piggin
@ 2009-02-03  1:54   ` Zhang, Yanmin
  2009-02-03  2:11     ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Zhang, Yanmin @ 2009-02-03  1:54 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jan Kara, Andrew Morton, linux-fsdevel, LKML, npiggin

On Tue, 2009-02-03 at 12:24 +1100, Nick Piggin wrote:
> On Friday 30 January 2009 12:23:15 Jan Kara wrote:
> >   Hi,
> >
> >   today I found that commit 31a12666d8f0c22235297e1c1575f82061480029 (mm:
> > write_cache_pages cyclic fix) slows down operations over Berkeley DB.
> > Without this "fix", I can add 100k entries in about 5 minutes 30s, with
> > that change it takes about 20 minutes.
> >   What is IMO happening is that previously we scanned to the end of file,
> > we left writeback_index at the end of file and went to write next file.
> > With the fix, we wrap around (seek) and after writing some more we go
> > to next file (seek again).
We also found this commit causes about a 40-50% regression with iozone mmap-rand-write.
#iozone -B -r 4k -s 64k -s 512m -s 1200m

My machine has 8GB memory.

> Hmm, but isn't that what pdflush has asked for? It is wanting to flush
> some of the dirty data out of this file, and hence it wants to start
> from where it last flushed out and then cycle back and flush more?
> 
> 
> >   Anyway, I think the original semantics of "cyclic" makes more sense, just
> > the name was chosen poorly. What we should do is really scan to the end of
> > file, reset index to start from the beginning next time and go for the next
> > file.
> 
> Well, if we think of a file as containing a set of dirty pages (as it
> appears to the high level mm), rather than a sequence, then behaviour
> of my patch is correct (ie. there should be no distinction between dirty
> pages at different offsets in the file).
> 
> However, clearly there is some problem with that assumption if you're
> seeing a 4x slowdown :P I'd really like to know how it messes up the IO
> patterns. How many files in the BDB workload? Are filesystem blocks
> being allocated at the end of the file while writeout is happening?
> Delayed allocation?
> 
> 
> >   I can write a patch to introduce this semantics but I'd like to hear
> > opinions of other people before I do so.
> 
> I like dirty page cleaning to be offset agnostic as far as possible,
> but I can't argue with numbers like that. Though maybe it would be
> possible to solve it some other way.




* Re: Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
  2009-02-03  1:54   ` Zhang, Yanmin
@ 2009-02-03  2:11     ` Nick Piggin
  2009-02-03 16:12       ` Chris Mason
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2009-02-03  2:11 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: Jan Kara, Andrew Morton, linux-fsdevel, LKML, npiggin

On Tuesday 03 February 2009 12:54:26 Zhang, Yanmin wrote:
> On Tue, 2009-02-03 at 12:24 +1100, Nick Piggin wrote:
> > On Friday 30 January 2009 12:23:15 Jan Kara wrote:
> > >   Hi,
> > >
> > >   today I found that commit 31a12666d8f0c22235297e1c1575f82061480029
> > > (mm: write_cache_pages cyclic fix) slows down operations over Berkeley
> > > DB. Without this "fix", I can add 100k entries in about 5 minutes 30s,
> > > with that change it takes about 20 minutes.
> > >   What is IMO happening is that previously we scanned to the end of
> > > file, we left writeback_index at the end of file and went to write next
> > > file. With the fix, we wrap around (seek) and after writing some more
> > > we go to next file (seek again).
>
> We also found this commit causes about 40~50% regression with iozone
> mmap-rand-write. #iozone -B -r 4k -s 64k -s 512m -s 1200m
>
> My machine has 8GB memory.

Ah, thanks. Yes BDB I believe is basically just doing an mmap-rand-write,
so maybe this is a good test case.

The interesting thing is why is this causing such a slowdown. If there is
only a single main file active in the workload, then I don't see why this
patch should make such a big difference. In either case, wouldn't pdflush
come back and just start writing out from the start of the file anyway?
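For context, an mmap random-write workload of the kind iozone's
mmap-rand-write (and, per the discussion, BDB) generates looks roughly like
this user-space sketch (illustrative only; the path, fill pattern, and sizes
are made up):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Dirty `count` random pages of `path` through a shared mapping.  The
 * kernel is then left with a sparse set of dirty pages to write back,
 * which is exactly where write_cache_pages()' index policy matters. */
int mmap_rand_write(const char *path, size_t size, int count)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	for (int i = 0; i < count; i++) {
		size_t off = ((size_t)rand() % (size / (size_t)page)) * (size_t)page;
		memset(map + off, 0xab, (size_t)page);	/* dirty one page */
	}
	munmap(map, size);
	close(fd);
	return 0;
}
```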



* Re: Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
  2009-02-03  2:11     ` Nick Piggin
@ 2009-02-03 16:12       ` Chris Mason
  2009-02-05  3:40         ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Mason @ 2009-02-03 16:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Zhang, Yanmin, Jan Kara, Andrew Morton, linux-fsdevel, LKML,
	npiggin

On Tue, 2009-02-03 at 13:11 +1100, Nick Piggin wrote:
> On Tuesday 03 February 2009 12:54:26 Zhang, Yanmin wrote:
> > On Tue, 2009-02-03 at 12:24 +1100, Nick Piggin wrote:
> > > On Friday 30 January 2009 12:23:15 Jan Kara wrote:
> > > >   Hi,
> > > >
> > > >   today I found that commit 31a12666d8f0c22235297e1c1575f82061480029
> > > > (mm: write_cache_pages cyclic fix) slows down operations over Berkeley
> > > > DB. Without this "fix", I can add 100k entries in about 5 minutes 30s,
> > > > with that change it takes about 20 minutes.
> > > >   What is IMO happening is that previously we scanned to the end of
> > > > file, we left writeback_index at the end of file and went to write next
> > > > file. With the fix, we wrap around (seek) and after writing some more
> > > > we go to next file (seek again).
> >
> > We also found this commit causes about 40~50% regression with iozone
> > mmap-rand-write. #iozone -B -r 4k -s 64k -s 512m -s 1200m
> >
> > My machine has 8GB memory.
> 
> Ah, thanks. Yes BDB I believe is basically just doing an mmap-rand-write,
> so maybe this is a good test case.
> 
> The interesting thing is why is this causing such a slowdown. If there is
> only a single main file active in the workload, then I don't see why this
> patch should make such a big difference. In either case, wouldn't pdflush
> come back and just start writing out from the start of the file anyway?

Perhaps the difference is that without the patch, pdflush will return
after running congestion_wait()?  This would give bdb and iozone a
chance to fill in more pages, and increase the chances we'll do
sequential IO.
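A toy way to see why that would matter (illustrative only, not kernel code):
on rotating media the cost of a writeback pass is roughly proportional to the
number of discontiguous dirty ranges, so letting the application fill in
neighbouring pages before flushing can turn many small IOs into one
sequential one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count contiguous runs of dirty pages: each run is roughly one
 * sequential IO, and each gap between runs costs a seek. */
int count_io_segments(const bool dirty[], size_t npages)
{
	int segments = 0;

	for (size_t i = 0; i < npages; i++)
		if (dirty[i] && (i == 0 || !dirty[i - 1]))
			segments++;
	return segments;
}
```

Flushing pages {1,3,5} immediately is three segments; waiting until pages 2
and 4 are also dirtied makes the same region a single segment.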

-chris




* Re: Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
  2009-02-03 16:12       ` Chris Mason
@ 2009-02-05  3:40         ` Nick Piggin
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2009-02-05  3:40 UTC (permalink / raw)
  To: Chris Mason
  Cc: Zhang, Yanmin, Jan Kara, Andrew Morton, linux-fsdevel, LKML,
	npiggin

On Wednesday 04 February 2009 03:12:45 Chris Mason wrote:
> On Tue, 2009-02-03 at 13:11 +1100, Nick Piggin wrote:

> > The interesting thing is why is this causing such a slowdown. If there is
> > only a single main file active in the workload, then I don't see why this
> > patch should make such a big difference. In either case, wouldn't pdflush
> > come back and just start writing out from the start of the file anyway?
>
> Perhaps the difference is that without the patch, pdflush will return
> after running congestion_wait()?  This would give bdb and iozone a
> chance to fill in more pages, and increases the chances we'll do
> sequential IO.

That's a possible explanation. I'll do some iozone runs and see if I can
work out what's going on. If we can distill the "goodness" of the previous
offset-wrap behaviour without reverting to make writeout more offset
dependent, that would be nice -- it might even benefit other cases that
weren't previously hitting the offset wrap case.

Worst case, if we can't come up with something simple and obvious for 2.6.29,
we can restore the old behaviour until we can come up with something better.

Thanks,
Nick


