Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jens Axboe <axboe@kernel.dk>
To: colyli <colyli@suse.de>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
Date: Fri, 27 May 2022 18:00:30 -0600	[thread overview]
Message-ID: <1a3a151e-a8f9-7f53-f9fc-e7eaea42a462@kernel.dk> (raw)
In-Reply-To: <8251ee2fab43b59ecd5a6140655eeb47@suse.de>

On 5/27/22 11:05 AM, colyli wrote:
> ? 2022-05-27 23:49?Jens Axboe ???
>> On 5/27/22 9:28 AM, Coly Li wrote:
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index d138a2d73240..c51671abe74e 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -214,6 +214,7 @@ static void update_writeback_rate(struct work_struct *work)
>>>                           struct cached_dev,
>>>                           writeback_rate_update);
>>>      struct cache_set *c = dc->disk.c;
>>> +    bool contention = false;
>>>
>>>      /*
>>>       * should check BCACHE_DEV_RATE_DW_RUNNING before calling
>>> @@ -243,13 +244,41 @@ static void update_writeback_rate(struct work_struct *work)
>>>           * in maximum writeback rate number(s).
>>>           */
>>>          if (!set_at_max_writeback_rate(c, dc)) {
>>> -            down_read(&dc->writeback_lock);
>>> -            __update_writeback_rate(dc);
>>> -            update_gc_after_writeback(c);
>>> -            up_read(&dc->writeback_lock);
>>> +            /*
>>> +             * When contention happens on dc->writeback_lock with
>>> +             * the writeback thread, this kwork may be blocked for
>>> +             * very long time if there are too many dirty data to
>>> +             * writeback, and kerne message will complain a (bogus)
>>> +             * software lockup kernel message. To avoid potential
>>> +             * starving, if down_read_trylock() fails, writeback
>>> +             * rate updating will be skipped for dc->retry_max times
>>> +             * at most while delay this worker a bit longer time.
>>> +             * If dc->retry_max times are tried and the trylock
>>> +             * still fails, then call down_read() to wait for
>>> +             * dc->writeback_lock.
>>> +             */
>>> +            if (!down_read_trylock((&dc->writeback_lock))) {
>>> +                contention = true;
>>> +                dc->retry_nr++;
>>> +                if (dc->retry_nr > dc->retry_max)
>>> +                    down_read(&dc->writeback_lock);
>>> +            }
>>> +
>>> +            if (!contention || dc->retry_nr > dc->retry_max) {
>>> +                __update_writeback_rate(dc);
>>> +                update_gc_after_writeback(c);
>>> +                up_read(&dc->writeback_lock);
>>> +                dc->retry_nr = 0;
>>> +            }
>>>          }
>>>      }
>>
> 
> Hi Jens,
> 
> Thanks for looking into this :-)
> 
>> This is really not very pretty. First of all, why bother with storing a
>> max retry value in there? Doesn't seem like it'd ever be different per
> 
> It is because the probability of the lock contention on
> dc->writeback_lock depends on the I/O speed backing device. From my
> observation during the tests, for fast backing device with larger
> cache device, its writeback thread may work harder to flush more dirty
> data to backing device, the lock contention happens more and longer,
> so the writeback rate update kworker has to wait longer time before
> acquires dc->writeback_lock. So its dc->retry_max should be larger
> then slow backing device.
> 
> Therefore I'd like to have a tunable per-backing-device retry_max. And
> the syses interface will be added when users/customers want it. The
> use case is from SAP HANA users, I have report that they observe the
> soft lockup warning for dc->writeback_lock contention and worry about
> whether data is corrupted (indeed, of course not).

The initial patch has 5 as the default, and there are no sysfs knobs. If
you ever need a sysfs knob, by all means make it a variable and make it
configurable too. But don't do it upfront where the '5' suitabled named
would do the job.

>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>> index 9ee0005874cd..cbc01372c7a1 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -235,19 +235,27 @@ static void update_writeback_rate(struct
>> work_struct *work)
>>          return;
>>      }
>>
>> -    if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
>> +    if (atomic_read(&dc->has_dirty) && dc->writeback_percent &&
>> +        !set_at_max_writeback_rate(c, dc)) {
>>          /*
>>           * If the whole cache set is idle, set_at_max_writeback_rate()
>>           * will set writeback rate to a max number. Then it is
>>           * unncessary to update writeback rate for an idle cache set
>>           * in maximum writeback rate number(s).
>>           */
>> -        if (!set_at_max_writeback_rate(c, dc)) {
> 
> The reason I didn't place '!set_at_max_writeback_rate' with other items in
> previous if() was for the above code comment. If I moved it to previous
> if() without other items, I was not comfortable to place the code comments
> neither before or after the if() check. So I used a separated if() check for
> '!set_at_max_writeback_rate'.
> 
> From your change, it seems placing the code comments behind is fine (or
> better), can I understand in this way? I try to learn and follow your way
> to handle such code comments situation.

Just put it higher up if you want, it doesn't really matter, or leave it
where it is.

>>              __update_writeback_rate(dc);
>>              update_gc_after_writeback(c);
>>              up_read(&dc->writeback_lock);
>> -        }
>> +        } while (0);
> 
> Aha, this is cool! I never though of using do{}while(0) and break in
> such a genius way! Sure I will use this, thanks for the hint :-)
> 
> After you reply my defense of dc->retry_max, and the question of code
> comments location, I will update and test the patch again, and
> re-sbumit to you.
> 
> Thanks for your constructive suggestion, especially the do{}while(0)
> part!

I would do something similar to my change and drop the 'dc' addition for
the max retries as it, by definition, can only be one value right now.
For all I know, you'll never need to change it again, and then you're
just wasting memory and making the code harder to read by putting it in
dc instead of just having this define.

-- 
Jens Axboe

next prev parent reply	other threads:[~2022-05-28  0:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-27 15:28 [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Coly Li
2022-05-27 15:28 ` [PATCH 1/3] bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() Coly Li
2022-05-27 15:28 ` [PATCH 2/3] bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() Coly Li
2022-05-27 15:49   ` Jens Axboe
2022-05-27 17:05     ` colyli
2022-05-28  0:00       ` Jens Axboe [this message]
2022-05-28  2:20         ` colyli
2022-05-27 15:28 ` [PATCH 3/3] md: bcache: check the return value of kzalloc() in detached_dev_do_request() Coly Li
2022-05-27 15:50 ` (subset) [PATCH 0/3] bcache fixes for Linux v5.19 (2nd wave) Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1a3a151e-a8f9-7f53-f9fc-e7eaea42a462@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=colyli@suse.de \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).