public inbox for linux-bcache@vger.kernel.org
 help / color / mirror / Atom feed
From: Eric Wheeler <linux-bcache@vger.kernel.org>
To: Pavel Goran <via-bcache@pvgoran.name>
Cc: linux-bcache@vger.kernel.org
Subject: Re[2]: [PATCH 1/2] bcache: introduce ioprio-based bypass/writeback hints
Date: Thu, 29 Sep 2016 11:31:44 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LRH.2.11.1609291125260.5355@mail.ewheeler.net> (raw)
In-Reply-To: <1515763256.20160929135700@pvgoran.name>

> Hello Eric,
> 
> Thursday, September 29, 2016, 1:21:53 AM, you wrote:
> 
> > It respects policy first of course.  You can see the top of
> > should_writeback() in the patch as context:
> >         if (cache_mode != CACHE_MODE_WRITEBACK || [...]
> 
> So the priority-dependent logic is only applied when cache mode is
> "writeback", right? 

Correct. It might be better to say that the new priority-dependent logic 
_for writes_ is only applied when cache mode is "writeback" via the new 
syfs entry ioprio_writeback.

For reads, ioprio_bypass is hinted in both writeback and writethrough 
modes.

> Then, what does "Original bcache logic" from the table in the original 
> patch description mean? That is, how does it differ from "Writeback" 
> from the same table?

There are two hints in `struct data_insert_op`:
	iop.bypass, iop.writeback

bcache has an existing function `should_writeback()` which is a hint on 
the cached_dev_write() path.  I've added the writeback portion of the 
ioprio logic there, so if it doesn't match the ioprio set in 
ioprio_writeback, it continues as if my code hadn't been added.

> Does "Writeback" there mean that bcache would write data to the cache
> *always*, even in case of linear write, or in other situations where the
> original writeback logic would bypass cache? If so, the table should perhaps
> be adjusted to say "Always writeback" instead of "Writeback", and "Original
> writeback logic" instead of "Original bcache logic".

Similarly, the function `should_bypass()` sets the iop.bypass hint.  This 
is where the logic as to whether or not the write looks "sequential" is 
held.

The IO is then passed to cached_dev_write() which checks 
should_writeback().  If should_writeback() returns true, it overrides the 
bypass flag.  Thus, iop.bypass is weaker than iop.writeback.  Bcache makes 
other decisions which may override bypass, too, like overlapping of dirty 
data and perhaps other reasons.  

Keep in mind that SSDs are still better at seq-writes than random writes 
because of erase block rewrites handled by the SSD's FTL.  For realtime 
applications 100% writeback might still be preferred, even if the IOs are 
sequential.  This will wear your SSD faster, of course, so set your 
io priorities appropriately.

Pseudo-code call-stack:
	cached_dev_make_request():
	   s->iop.bypass = check_should_bypass()
	   cached_dev_write():
	         if (should_writeback()) {
                    s->iop.bypass = false;
                    s->iop.writeback = true;
                 }
		 submit-based-on-hint( s )

> > If not already in cache, it will read from the backing device and will not
> > promote into (pollute) your cache. Having idle IOs bypass the cache can 
> > increase performance everywhere else since you probably don't care about 
> > the performance of idle IOs.
> 
> Ok, this sounds convincing.
> 
> Ideally this would be handled by some kind of priority semantics within the
> cache, so that "high-priority" data would always replace "low-priority" data,
> and "low-priority" data would only replace "high-priority" data if it's really
> old. It's entirely feasible to implement within bcache, but probably it
> wouldn't be easy, and might even mean changing cache data format.
> 
> Your SSD wearout problem would (again, ideally) be handled by some kind of
> custom process attribute (no idea if the kernel even has this kind of concept,
> probably not) that would signal bcache to bypass caching for IO that
> originates from this process. The additional "bulk" priority class proposed by
> Kai would also be a good solution.

Some SSDs have a SMART attribute for LBAs written---which is a good idea 
to monitor if your SSD supports it---but its not consistent enough to 
build a framework around.  It varies from mfg to mfg.
 
> However, given that these two approaches are supposedly unavailable, using
> existing IO priorities looks like an acceptable solution.

Agreed.  

> > If you really want idle IO that can writeback, then I can set the default
> > for ioprio_writeback and/or ioprio_bypass to be disabled until a user 
> > explicitly sets them.
> 
> Are you talking about bcache options available via sysfs? It would be good to
> have control over the priority-dependent logic in this way (or it's already
> implemented?). 

This patch. It just adds sysfs files to configure ioprio_bypass and 
ioprio_writeback with flags that hook into the existing 
should_bypass()/should_writeback() functions.  

I'm planning a v2 patch to update documentation and allow 0,0 to disable 
either or both features.

> Not sure what the defaults should be, though. :) 

Does anyone else have a suggestion on best defaults? As currently written, 
it is like so:
	writeback for higher priority than `ionice -c2 -n0`
	bypass    for lower  priority than `ionice -c2 -n7`

Users who do not use `ionice` are unaffected by this default since 
unprioritized pids are either "ioprio_invalid()" or have an ioprio of '-c2 -n4'.

> It would be even better to have the possibility to save these flags to 
> the superblock, if it's something that can be easily implemented.

Thats definitely a neat idea.  It would be great to put other flags like 
sequential_cutoff, writeback_*, and perhaps a few other sysfs 
configurations into the superblock so I needn't set them at boot time.  
A superblock patch for another time :)

--
Eric Wheeler

  reply	other threads:[~2016-09-29 18:31 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-26 23:17 [PATCH 1/2] bcache: introduce ioprio-based bypass/writeback hints Eric Wheeler
2016-09-27  1:43 ` Pavel Goran
2016-09-27  2:43   ` Kai Krakow
2016-09-28 18:21     ` Eric Wheeler
2016-09-29  6:57       ` Re[2]: " Pavel Goran
2016-09-29 18:31         ` Eric Wheeler [this message]
2016-09-29 18:34         ` Kai Krakow
2016-09-29 22:50           ` Eric Wheeler
2016-09-28 19:26 ` Kai Krakow
2016-09-28 21:38   ` Eric Wheeler
2016-09-29  2:19     ` Kai Krakow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LRH.2.11.1609291125260.5355@mail.ewheeler.net \
    --to=linux-bcache@vger.kernel.org \
    --cc=via-bcache@pvgoran.name \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox