From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "g.btrfs@cobb.uk.net" <g.btrfs@cobb.uk.net>,
Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
"dsterba@suse.cz" <dsterba@suse.cz>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"realwakka@gmail.com" <realwakka@gmail.com>
Subject: Re: [PATCH] btrfs-progs: scrub: warn if scrub started on a device has mq-deadline
Date: Sat, 12 Dec 2020 11:05:04 +0000 [thread overview]
Message-ID: <d3afc9edd13c47b33cf3152e27f10b701a97b77a.camel@wdc.com> (raw)
In-Reply-To: <SN4PR0401MB3598EA952D5F942C165422819BC90@SN4PR0401MB3598.namprd04.prod.outlook.com>
On Sat, 2020-12-12 at 10:34 +0000, Johannes Thumshirn wrote:
> On 11/12/2020 18:05, Graham Cobb wrote:
> > On 11/12/2020 16:42, Johannes Thumshirn wrote:
> > > On 11/12/2020 17:02, Johannes Thumshirn wrote:
> > > > [+Cc Damien ]
> > > > On 11/12/2020 16:55, David Sterba wrote:
> > > > > On Fri, Dec 11, 2020 at 06:50:23AM +0000, Johannes Thumshirn wrote:
> > > > > > On 10/12/2020 21:22, David Sterba wrote:
> > > > > > > On Mon, Dec 07, 2020 at 07:23:03AM +0000, Johannes Thumshirn wrote:
> > > > > > > > On 05/12/2020 19:51, Sidong Yang wrote:
> > > > > > > > > Warn if scurb stared on a device that has mq-deadline as io-scheduler
> > > > > > > > > and point documentation. mq-deadline doesn't work with ionice value and
> > > > > > > > > it results performance loss. This warning helps users figure out the
> > > > > > > > > situation. This patch implements the function that gets io-scheduler
> > > > > > > > > from sysfs and check when scrub stars with the function.
> > > > > > > >
> > > > > > > > From a quick grep it seems to me that only bfq is supporting ioprio settings.
> > > > > > >
> > > > > > > Yeah it's only BFQ.
> > > > > > >
> > > > > > > > Also there's some features like write ordering guarantees that currently
> > > > > > > > only mq-deadline provides.
> > > > > > > >
> > > > > > > > This warning will trigger a lot once the zoned patchset for btrfs is merged,
> > > > > > > > as for example SMR drives need this ordering guarantees and therefore select
> > > > > > > > mq-deadline (via the ELEVATOR_F_ZBD_SEQ_WRITE elevator feature).
> > > > > > >
> > > > > > > This won't affect the default case and for zoned fs we can't simply use
> > > > > > > BFQ and thus the ionice interface. Which should be IMHO acceptable.
> > > > > >
> > > > > > But then the patch must check if bfq is set and otherwise warn. My only fear
> > > > > > is though, people will blindly select bfq then and get all kinds of
> > > > > > penalties/problems.
> > > > >
> > > > > Hm right, and we know that once such recommendations stick, it's very
> > > > > hard to get rid of them even if there's a proper solution implemented.
> > > > >
> > > > > > And it's not only mq-deadline and bfq here, there are also
> > > > > > kyber and none. On a decent nvme I'd like to have none instead of bfq, otherwise
> > > > > > performance goes down the drain. If my workload is sensitive to buffer bloat, I'd
> > > > > > use kyber not bfq.
> > > > >
> > > > > I'm not sure about high performance devices if the scrub io load on the
> > > > > normal priority is the same problem as with spinning devices. Assuming
> > > > > it is, we need some low priority/idle class for all schedulers at least.
> > > > >
> > > > > > So IMHO this patch just makes things worse. But who am I to judge.
> > > > >
> > > > > In this situation we need broader perspective, thanks for the input,
> > > > > we'll hopefully come to some conclusion. We don't want to make things
> > > > > worse, now we're left with workarounds or warnings until some brave soul
> > > > > implements the missing io scheduler functionality.
> > > > >
> > > >
> > > > I think that's all we can do now.
> > > >
> > > > Damien (CCed) did some work on mq-deadline, maybe he has an idea if this is
> > > > possible/doable.
> > > >
> > >
> > > Ha I have a stop gap solution, we could temporarily change the io scheduler to bfq
> > > while the scrub job is running and then back to whatever was configured.
> > >
> > > Of cause only if bfq supports the elevator_features the block device requires.
> > >
> > > Thoughts?
> > >
> >
> > I would prefer you didn't mess with the scheduler I have chosen (or only
> > if asked to with a new option). My suggestion:
> >
> > 1. If -c or -n have been specified explicitly, check the scheduler and
> > error if it is known that it is incompatible.
Elevator not compatible with zoned block devices will not even be shown in
sysfs. E.g. bfq is not even listed for SMR drives. So the user/FS cannot select
a wrong scheduler beside "none". For SMR, that currently means deadline only.
For ZNS, the ELEVATOR_F_ZBD_SEQ_WRITE feature is ignored, so any scheduler can
be used. Using one would not be a good idea in any case, but if bfq is needed
it can be used: we do not have the sequential write constraint thanks to the
use of zone append writes only for sequential zones.
> > 2. If -c/-n have not been specified, just warn (not fail) if the
> > scheduler is known to be incompatible with the default idle class request.
> >
> > 3. Provide a way to suppress the warning in step 2 (is -q enough, or
> > does that also suppress other useful/important messages?).
> >
> > 4. Expand the description in the man page which currently says "Note
> > that not all IO schedulers honor the ionice settings." It could read
> > something like:
> >
> > "Note that not all IO schedulers honor the ionice settings. At time of
> > writing, only bfq honors the ionice settings although newer kernels may
> > change that. A warning will be shown if the currently selected IO
> > scheduler is known to not support ionice."
> >
> >
>
>
> Works for me as well. My only concern is, people will default to bfq and then
> complain for XY because of the change to bfq. Note, I'm not against bfq at all,
> I'm just arguing it's not a one size fits all decision.
>
> I'd also like to see if scrub on a nvme really is beneficial with bfq instead of
> none. I have my doubt's but that's a gut feeling and I have no numbers to back up
> this statement.
>
> Nice weekend,
> Johannes
>
--
Damien Le Moal
Western Digital
next prev parent reply other threads:[~2020-12-12 11:06 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-05 18:49 [PATCH] btrfs-progs: scrub: warn if scrub started on a device has mq-deadline Sidong Yang
2020-12-07 7:23 ` Johannes Thumshirn
2020-12-10 20:20 ` David Sterba
2020-12-11 6:50 ` Johannes Thumshirn
2020-12-11 15:53 ` David Sterba
2020-12-11 16:02 ` Johannes Thumshirn
2020-12-11 16:42 ` Johannes Thumshirn
2020-12-11 17:04 ` Graham Cobb
2020-12-12 10:34 ` Johannes Thumshirn
2020-12-12 11:05 ` Damien Le Moal [this message]
2020-12-12 16:44 ` Sidong Yang
2020-12-14 7:11 ` Johannes Thumshirn
2020-12-07 8:00 ` Nikolay Borisov
2020-12-10 20:16 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d3afc9edd13c47b33cf3152e27f10b701a97b77a.camel@wdc.com \
--to=damien.lemoal@wdc.com \
--cc=Johannes.Thumshirn@wdc.com \
--cc=dsterba@suse.cz \
--cc=g.btrfs@cobb.uk.net \
--cc=linux-btrfs@vger.kernel.org \
--cc=realwakka@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox