* Re: Writeback efficiency -- proposal
2017-09-20 8:01 Writeback efficiency -- proposal Michael Lyle
@ 2017-09-20 8:08 ` Vojtech Pavlik
2017-09-20 8:51 ` Michael Lyle
2017-09-20 8:20 ` Coly Li
2017-09-20 14:06 ` Kent Overstreet
2 siblings, 1 reply; 6+ messages in thread
From: Vojtech Pavlik @ 2017-09-20 8:08 UTC (permalink / raw)
To: Michael Lyle; +Cc: linux-bcache, Kent Overstreet
On Wed, Sep 20, 2017 at 01:01:47AM -0700, Michael Lyle wrote:
> Hey everyone---
>
> Right now writeback is pretty inefficient. It lowers the seek
> workload some on the disk by doing things in ascending-LBA order, but
> there is no prioritization of writing back larger blocks (that is,
> doing larger sequential IOs).
On RAID devices, bcache attempts writing out full RAID stripes, avoiding
the issue you describe.
It might make sense to extend that logic to non-striped devices, too.
> At the same time, there is no on-disk index that makes it easy to find
> larger sequential pieces. However, I think it's possible to take a
> heuristic approach to make this better.
>
> Proposal--- When gathering dirty chunks--- I would like to track the
> median size written back in the last batch of writebacks, and then
> skip the first 500 things smaller than the median size. This still
> has the effect of putting all of our writes in LBA order, and has a
> relatively minimal cost (having to scan through 1000 dirty things
> instead of 500 in the worst case). Upon reaching the end of the btree
> we can revert to accepting all blocks.
>
> Taking a trivial case-- If half of the things to write back are 4k,
> and half are 8k, this will make us favor / almost entirely do
> writeback of 8k chunks, and will demand 25% fewer seeks to do an
> equivalent amount of writeback, in exchange for a small amount of
> additional CPU. (To an extent even this will be mitigated, because we
> won't have to scan to find dirty blocks as often).
>
> Does this sound reasonable?
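[Editorial sketch of the proposed heuristic. All names, the array-based scan, and making the 500-entry budget a constant are illustrative; this is not bcache's actual code, which walks the btree.]

```c
#include <stddef.h>

#define SKIP_BUDGET 500  /* max smaller-than-median extents to pass over */

/* Illustrative dirty-extent record; bcache's real keys live in the
 * btree and are already visited in ascending-LBA order. */
struct dirty_key {
    unsigned long long lba;
    unsigned int sectors;
};

/* One selection pass over 'keys' (already LBA-sorted): prefer extents
 * at least as large as 'median' (the median size of the previous
 * writeback batch), skipping at most SKIP_BUDGET smaller ones.  Once
 * the budget is spent, accept everything that remains -- so the worst
 * case scans SKIP_BUDGET extra entries to fill a batch, and writes
 * still go out in LBA order. */
static size_t select_writeback(const struct dirty_key *keys, size_t n,
                               unsigned int median,
                               const struct dirty_key **out, size_t max_out)
{
    size_t skipped = 0, picked = 0;

    for (size_t i = 0; i < n && picked < max_out; i++) {
        if (keys[i].sectors < median && skipped < SKIP_BUDGET) {
            skipped++;          /* small extent: leave it for later */
            continue;
        }
        out[picked++] = &keys[i];
    }
    return picked;
}
```

The seek arithmetic in the 4k/8k example: an unbiased pass moves 6k per seek on average, while a pass that takes only the 8k extents moves 8k per seek, so the same volume of writeback needs 6/8 = 75% of the seeks, i.e. 25% fewer.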
It doesn't sound wrong. :)
Vojtech
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Writeback efficiency -- proposal
2017-09-20 8:08 ` Vojtech Pavlik
@ 2017-09-20 8:51 ` Michael Lyle
0 siblings, 0 replies; 6+ messages in thread
From: Michael Lyle @ 2017-09-20 8:51 UTC (permalink / raw)
To: Vojtech Pavlik; +Cc: linux-bcache, Kent Overstreet
Vojtech & Coly---
Thanks for feedback!
On Wed, Sep 20, 2017 at 1:08 AM, Vojtech Pavlik <vojtech@suse.com> wrote:
> On RAID devices, bcache attempts writing out full RAID stripes, avoiding
> the issue you describe.
>
> It might make sense to extend that logic to non-striped devices, too.
Yes-- it tries to write full stripes. OTOH it doesn't favor
contiguous sets of full stripes, and if there aren't a lot of
full stripes available (500 of them-- fairly unlikely, especially with
sequential I/O bypassing the cache) it falls back to the other
behavior and will happily pick the smallest blocks. So it's still not
seek-minimized in either case.
The data structure used for full stripes wouldn't be too bad to scan
when looking for very large chunks. I am not sure if we should,
though, as it is likely to destroy the LBA-ordering properties. That
is, there's a definite tradeoff between trying to do the I/O in order
and trying to do the biggest I/Os.
For now I think I am going to put the heuristic in the fallback
(not-striped) case-- so on a striped array, first we'll search the
entire disk for full-stripes, then try to get bigger-than-average
reads, and finally fall back to anything we find on the way.
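[Editorial sketch of the three-tier fallback described above, with stand-in names and an array standing in for the btree / stripe map; only the tiering order is the point.]

```c
#include <stddef.h>

struct dirty_key {
    unsigned long long lba;
    unsigned int sectors;
};

/* Three-tier selection: sweep the candidates once per tier, taking
 * full stripes first, then bigger-than-average extents, then whatever
 * is left.  Each sweep runs in ascending-LBA order, so writes stay
 * sorted within a tier. */
static size_t pick_tiered(const struct dirty_key *keys, size_t n,
                          unsigned int stripe_sectors,
                          unsigned int avg_sectors,
                          const struct dirty_key **out, size_t max_out)
{
    size_t picked = 0;

    for (int tier = 0; tier < 3 && picked < max_out; tier++) {
        for (size_t i = 0; i < n && picked < max_out; i++) {
            unsigned int s = keys[i].sectors;
            int already = 0;

            /* don't pick the same extent in a later tier */
            for (size_t j = 0; j < picked; j++)
                if (out[j] == &keys[i])
                    already = 1;
            if (already)
                continue;
            if ((tier == 0 && s >= stripe_sectors) ||
                (tier == 1 && s >= avg_sectors) ||
                tier == 2)
                out[picked++] = &keys[i];
        }
    }
    return picked;
}
```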
>> [snip]
> It doesn't sound wrong. :)
>
> Vojtech
Awesome. :D I have some preliminary measurements showing that this is
a win-- I'll work on getting a good patchset together some time in the
next week.
Thanks,
Mike
* Re: Writeback efficiency -- proposal
2017-09-20 8:01 Writeback efficiency -- proposal Michael Lyle
2017-09-20 8:08 ` Vojtech Pavlik
@ 2017-09-20 8:20 ` Coly Li
2017-09-20 14:06 ` Kent Overstreet
2 siblings, 0 replies; 6+ messages in thread
From: Coly Li @ 2017-09-20 8:20 UTC (permalink / raw)
To: Michael Lyle, linux-bcache; +Cc: Kent Overstreet
On 2017/9/20 10:01 AM, Michael Lyle wrote:
> Hey everyone---
>
> Right now writeback is pretty inefficient. It lowers the seek
> workload some on the disk by doing things in ascending-LBA order, but
> there is no prioritization of writing back larger blocks (that is,
> doing larger sequential IOs).
>
> At the same time, there is no on-disk index that makes it easy to find
> larger sequential pieces. However, I think it's possible to take a
> heuristic approach to make this better.
>
> Proposal--- When gathering dirty chunks--- I would like to track the
> median size written back in the last batch of writebacks, and then
> skip the first 500 things smaller than the median size. This still
> has the effect of putting all of our writes in LBA order, and has a
> relatively minimal cost (having to scan through 1000 dirty things
> instead of 500 in the worst case). Upon reaching the end of the btree
> we can revert to accepting all blocks.
>
> Taking a trivial case-- If half of the things to write back are 4k,
> and half are 8k, this will make us favor / almost entirely do
> writeback of 8k chunks, and will demand 25% fewer seeks to do an
> equivalent amount of writeback, in exchange for a small amount of
> additional CPU. (To an extent even this will be mitigated, because we
> won't have to scan to find dirty blocks as often).
>
> Does this sound reasonable?
Hi Mike,
It sounds reasonable; let's see how it works out in practice :-)
Thanks.
--
Coly Li
* Re: Writeback efficiency -- proposal
2017-09-20 8:01 Writeback efficiency -- proposal Michael Lyle
2017-09-20 8:08 ` Vojtech Pavlik
2017-09-20 8:20 ` Coly Li
@ 2017-09-20 14:06 ` Kent Overstreet
2017-09-20 15:42 ` Michael Lyle
2 siblings, 1 reply; 6+ messages in thread
From: Kent Overstreet @ 2017-09-20 14:06 UTC (permalink / raw)
To: Michael Lyle; +Cc: linux-bcache
On Wed, Sep 20, 2017 at 01:01:47AM -0700, Michael Lyle wrote:
> Hey everyone---
>
> Right now writeback is pretty inefficient. It lowers the seek
> workload some on the disk by doing things in ascending-LBA order, but
> there is no prioritization of writing back larger blocks (that is,
> doing larger sequential IOs).
>
> At the same time, there is no on-disk index that makes it easy to find
> larger sequential pieces. However, I think it's possible to take a
> heuristic approach to make this better.
>
> Proposal--- When gathering dirty chunks--- I would like to track the
> median size written back in the last batch of writebacks, and then
> skip the first 500 things smaller than the median size. This still
> has the effect of putting all of our writes in LBA order, and has a
> relatively minimal cost (having to scan through 1000 dirty things
> instead of 500 in the worst case). Upon reaching the end of the btree
> we can revert to accepting all blocks.
>
> Taking a trivial case-- If half of the things to write back are 4k,
> and half are 8k, this will make us favor / almost entirely do
> writeback of 8k chunks, and will demand 25% fewer seeks to do an
> equivalent amount of writeback, in exchange for a small amount of
> additional CPU. (To an extent even this will be mitigated, because we
> won't have to scan to find dirty blocks as often).
>
> Does this sound reasonable?
The main thing to be careful about is that anything you do that
increases scanning for dirty data has the potential to starve
foreground writes via the writeback lock.
If you or others are going to be working on this code, trying to improve that
locking would probably be very worthwhile...
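[Editorial sketch of one standard way to keep a longer scan from starving writers behind a lock: take it in bounded chunks and drop it between them. The lock here is a stand-in counter, not bcache's actual writeback lock.]

```c
#include <stddef.h>

/* Stand-in lock that just counts acquisitions, so the chunking
 * behavior is observable without real threads. */
struct fake_lock {
    int held;
    int acquisitions;
};

static void lock_wb(struct fake_lock *l)   { l->held = 1; l->acquisitions++; }
static void unlock_wb(struct fake_lock *l) { l->held = 0; }

/* Scan 'n' keys in chunks of 'chunk' entries, dropping the lock
 * between chunks so a blocked foreground write can slip in.  Returns
 * the number of lock acquisitions (= number of chunks scanned). */
static int chunked_scan(size_t n, size_t chunk, struct fake_lock *l)
{
    for (size_t i = 0; i < n; i += chunk) {
        lock_wb(l);
        /* ... examine keys [i, min(i + chunk, n)) for dirty data ... */
        unlock_wb(l);   /* foreground writers get a turn here */
    }
    return l->acquisitions;
}
```

The tradeoff is more lock traffic in exchange for a bounded worst-case hold time per acquisition.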
* Re: Writeback efficiency -- proposal
2017-09-20 14:06 ` Kent Overstreet
@ 2017-09-20 15:42 ` Michael Lyle
0 siblings, 0 replies; 6+ messages in thread
From: Michael Lyle @ 2017-09-20 15:42 UTC (permalink / raw)
To: Kent Overstreet; +Cc: linux-bcache
On Wed, Sep 20, 2017 at 7:06 AM, Kent Overstreet
<kent.overstreet@gmail.com> wrote:
> The main thing to be careful about is that anything you do that
> increases scanning for dirty data has the potential to starve
> foreground writes via the writeback lock.
>
> If you or others are going to be working on this code, trying to improve that
> locking would probably be very worthwhile...
Kent--
This is a very good point. Hopefully the approach I've got where at
most the work is doubled (and happens less frequently) isn't too bad
on this scale. I will look at making that locking more granular, too,
though. :D
Thanks,
Mike