From: Jens Axboe <axboe@kernel.dk>
To: colyli <colyli@suse.de>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
Date: Fri, 27 May 2022 18:00:30 -0600 [thread overview]
Message-ID: <1a3a151e-a8f9-7f53-f9fc-e7eaea42a462@kernel.dk> (raw)
In-Reply-To: <8251ee2fab43b59ecd5a6140655eeb47@suse.de>
On 5/27/22 11:05 AM, colyli wrote:
> ? 2022-05-27 23:49?Jens Axboe ???
>> On 5/27/22 9:28 AM, Coly Li wrote:
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index d138a2d73240..c51671abe74e 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct work_struct *work)
>>> struct cached_dev,
>>> writeback_rate_update);
>>> struct cache_set *c = dc->disk.c;
>>> + bool contention = false;
>>>
>>> /*
>>> * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct work_struct *work)
>>> * in maximum writeback rate number(s).
>>> */
>>> if (!set_at_max_writeback_rate(c, dc)) {
>>> - down_read(&dc->writeback_lock);
>>> - __update_writeback_rate(dc);
>>> - update_gc_after_writeback(c);
>>> - up_read(&dc->writeback_lock);
>>> + /*
>>> + * When contention happens on dc->writeback_lock with
>>> + * the writeback thread, this kwork may be blocked for
>>> + * very long time if there are too many dirty data to
>>> + * writeback, and kerne message will complain a (bogus)
>>> + * software lockup kernel message. To avoid potential
>>> + * starving, if down_read_trylock() fails, writeback
>>> + * rate updating will be skipped for dc->retry_max times
>>> + * at most while delay this worker a bit longer time.
>>> + * If dc->retry_max times are tried and the trylock
>>> + * still fails, then call down_read() to wait for
>>> + * dc->writeback_lock.
>>> + */
>>> + if (!down_read_trylock((&dc->writeback_lock))) {
>>> + contention = true;
>>> + dc->retry_nr++;
>>> + if (dc->retry_nr > dc->retry_max)
>>> + down_read(&dc->writeback_lock);
>>> + }
>>> +
>>> + if (!contention || dc->retry_nr > dc->retry_max) {
>>> + __update_writeback_rate(dc);
>>> + update_gc_after_writeback(c);
>>> + up_read(&dc->writeback_lock);
>>> + dc->retry_nr = 0;
>>> + }
>>> }
>>> }
>>
>
> Hi Jens,
>
> Thanks for looking into this :-)
>
>> This is really not very pretty. First of all, why bother with storing a
>> max retry value in there? Doesn't seem like it'd ever be different per
>
> It is because the probability of the lock contention on
> dc->writeback_lock depends on the I/O speed backing device. From my
> observation during the tests, for fast backing device with larger
> cache device, its writeback thread may work harder to flush more dirty
> data to backing device, the lock contention happens more and longer,
> so the writeback rate update kworker has to wait longer time before
> acquires dc->writeback_lock. So its dc->retry_max should be larger
> then slow backing device.
>
> Therefore I'd like to have a tunable per-backing-device retry_max. And
> the syses interface will be added when users/customers want it. The
> use case is from SAP HANA users, I have report that they observe the
> soft lockup warning for dc->writeback_lock contention and worry about
> whether data is corrupted (indeed, of course not).
The initial patch has 5 as the default, and there are no sysfs knobs. If
you ever need a sysfs knob, by all means make it a variable and make it
configurable too. But don't do it upfront where the '5' suitabled named
would do the job.
>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>> index 9ee0005874cd..cbc01372c7a1 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct
>> work_struct *work)
>> return;
>> }
>>
>> - if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
>> + if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
>> + !set_at_max_writeback_rate(c, dc)) {
>> /*
>> * If the whole cache set is idle, set_at_max_writeback_rate()
>> * will set writeback rate to a max number. Then it is
>> * unncessary to update writeback rate for an idle cache set
>> * in maximum writeback rate number(s).
>> */
>> - if (!set_at_max_writeback_rate(c, dc)) {
>
> The reason I didn't place '!set_at_max_writeback_rate' with other items in
> previous if() was for the above code comment. If I moved it to previous
> if() without other items, I was not comfortable to place the code comments
> neither before or after the if() check. So I used a separated if() check for
> '!set_at_max_writeback_rate'.
>
> From your change, it seems placing the code comments behind is fine (or
> better), can I understand in this way? I try to learn and follow your way
> to handle such code comments situation.
Just put it higher up if you want, it doesn't really matter, or leave it
where it is.
>> __update_writeback_rate(dc);
>> update_gc_after_writeback(c);
>> up_read(&dc->writeback_lock);
>> - }
>> + } while (0);
>
> Aha, this is cool! I never though of using do{}while(0) and break in
> such a genius way! Sure I will use this, thanks for the hint :-)
>
> After you reply my defense of dc->retry_max, and the question of code
> comments location, I will update and test the patch again, and
> re-sbumit to you.
>
> Thanks for your constructive suggestion, especially the do{}while(0)
> part!
I would do something similar to my change and drop the 'dc' addition for
the max retries as it, by definition, can only be one value right now.
For all I know, you'll never need to change it again, and then you're
just wasting memory and making the code harder to read by putting it in
dc instead of just having this define.
--
Jens Axboe
next prev parent reply other threads:[~2022-05-28 0:00 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
2022-05-27 15:49 ` Jens Axboe
2022-05-27 17:05 ` colyli
2022-05-28 0:00 ` Jens Axboe [this message]
2022-05-28 2:20 ` colyli
2022-05-27 15:28 ` [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request() Coly Li
2022-05-27 15:50 ` (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1a3a151e-a8f9-7f53-f9fc-e7eaea42a462@kernel.dk \
--to=axboe@kernel.dk \
--cc=colyli@suse.de \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).