From: Coly Li <i@coly.li>
To: Robert Pang <robertpang@google.com>
Cc: Coly Li <colyli@kernel.org>,
Kent Overstreet <kent.overstreet@linux.dev>,
linux-bcache@vger.kernel.org,
Mingzhe Zou <mingzhe.zou@easystack.cn>
Subject: Re: [PATCH 1/1] bcache: process fewer btree nodes in incremental GC cycles
Date: Wed, 16 Apr 2025 17:44:13 +0800 [thread overview]
Message-ID: <F3A62312-F568-499C-BAB4-017CEC11CDB5@coly.li> (raw)
In-Reply-To: <CAJhEC05LSFsDSKAfY9PPHz2zHYxu2geoZ6NO_umv3v9uJEEyZg@mail.gmail.com>
> On Apr 16, 2025, at 01:25, Robert Pang <robertpang@google.com> wrote:
>
> Thank you for your prompt feedback, Coly.
>
> On Mon, Apr 14, 2025 at 7:08 PM Coly Li <colyli@kernel.org> wrote:
>>
>> On Mon, Apr 14, 2025 at 03:44:04PM +0800, Robert Pang wrote:
>>> Current incremental GC processes a minimum of 100 btree nodes per cycle,
>>> followed by a 100ms sleep. For NVMe cache devices, where the average node
>>> processing time is ~1ms, this leads to front-side I/O latency potentially
>>> reaching tens or hundreds of milliseconds during GC execution.
>>>
>>> This commit resolves this latency issue by reducing the minimum node processing
>>> count per cycle to 10 and the inter-cycle sleep duration to 10ms. It also
>>> integrates a check of existing GC statistics to re-scale the number of nodes
>>> processed per sleep interval when needed, ensuring GC finishes well before the
>>> next GC is due.
>>>
>>> Signed-off-by: Robert Pang <robertpang@google.com>
>>> ---
>>> drivers/md/bcache/btree.c | 38 +++++++++++++++++---------------------
>>> drivers/md/bcache/util.h | 3 +++
>>> 2 files changed, 20 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>> index ed40d8600656..093e1edcaa53 100644
>>> --- a/drivers/md/bcache/btree.c
>>> +++ b/drivers/md/bcache/btree.c
>>> @@ -88,11 +88,8 @@
>>> * Test module load/unload
>>> */
>>>
>>> -#define MAX_NEED_GC 64
>>> -#define MAX_SAVE_PRIO 72
>>
>> You may compose another patch for the above changes, to separate them
>> from the main idea of this patch.
>
> Certainly. I just sent this as a separate patch.
>
>>> -#define MAX_GC_TIMES 100
>>> -#define MIN_GC_NODES 100
>>> -#define GC_SLEEP_MS 100
>>> +#define GC_MIN_NODES 10
>>> +#define GC_SLEEP_MS 10
>>>
>>> #define PTR_DIRTY_BIT (((uint64_t) 1 << 36))
>>>
>>> @@ -1585,25 +1582,24 @@ static unsigned int btree_gc_count_keys(struct btree *b)
>>>
>>> static size_t btree_gc_min_nodes(struct cache_set *c)
>>> {
>>> - size_t min_nodes;
>>> + size_t min_nodes = GC_MIN_NODES;
>>> + uint64_t gc_max_ms = time_stat_average(&c->btree_gc_time, frequency, ms) / 2;
>>>
>>> /*
>>> - * Since incremental GC would stop 100ms when front
>>> - * side I/O comes, so when there are many btree nodes,
>>> - * if GC only processes constant (100) nodes each time,
>>> - * GC would last a long time, and the front side I/Os
>>> - * would run out of the buckets (since no new bucket
>>> - * can be allocated during GC), and be blocked again.
>>> - * So GC should not process constant nodes, but varied
>>> - * nodes according to the number of btree nodes, which
>>> - * realized by dividing GC into constant(100) times,
>>> - * so when there are many btree nodes, GC can process
>>> - * more nodes each time, otherwise, GC will process less
>>> - * nodes each time (but no less than MIN_GC_NODES)
>>> + * The incremental garbage collector operates by processing
>>> + * GC_MIN_NODES at a time, pausing for GC_SLEEP_MS between
>>> + * each interval. If historical garbage collection statistics
>>> + * (btree_gc_time) is available, the maximum allowable GC
>>> + * duration is set to half of this observed frequency. To
>>> + * prevent exceeding this maximum duration, the number of
>>> + * nodes processed in the current step may be increased if
>>> + * the projected completion time based on the current pace
>>> + * extends beyond the allowed limit. This ensures timely GC
>>> + * completion before the next GC is due.
>>> */
>>> - min_nodes = c->gc_stats.nodes / MAX_GC_TIMES;
>>> - if (min_nodes < MIN_GC_NODES)
>>> - min_nodes = MIN_GC_NODES;
>>> + if ((gc_max_ms >= GC_SLEEP_MS) &&
>>> + (GC_SLEEP_MS * (c->gc_stats.nodes / min_nodes)) > gc_max_ms)
>>> + min_nodes = c->gc_stats.nodes / (gc_max_ms / GC_SLEEP_MS);
>>>
>>
>> Is it possible that gc_max_ms becomes 0?
>
> Yes, gc_max_ms can be 0 initially when the cache is set up and stats are not
> collected yet. In that case, the check "gc_max_ms >= GC_SLEEP_MS" fails and
> we process at the default rate of 10 nodes per cycle.
>
> Importantly, this same check also serves to prevent a division-by-zero error
> when the number of nodes is re-scaled using the following calculation:
>
> min_nodes = c->gc_stats.nodes / (gc_max_ms / GC_SLEEP_MS);
>
>>
Thanks. I have no further comments on the code itself; I just look forward to more testing and benchmark results.
Thanks.
Coly Li
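
For reference, the guard Robert describes can be modeled as a stand-alone sketch (the names below are illustrative, not from the patch; the constants and the formula are copied from the quoted hunk). The point is that the first clause of the `if` doubles as the division-by-zero guard:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GC_MIN_NODES 10
#define GC_SLEEP_MS  10

/*
 * Hypothetical stand-alone model of btree_gc_min_nodes():
 * total_nodes mirrors c->gc_stats.nodes, gc_max_ms is half the
 * observed GC frequency (0 until stats have been collected).
 */
static size_t gc_min_nodes_model(size_t total_nodes, uint64_t gc_max_ms)
{
	size_t min_nodes = GC_MIN_NODES;

	/*
	 * gc_max_ms >= GC_SLEEP_MS both rejects the "no stats yet"
	 * case (gc_max_ms == 0) and guarantees gc_max_ms / GC_SLEEP_MS
	 * below is >= 1, so that division can never be by zero.
	 */
	if ((gc_max_ms >= GC_SLEEP_MS) &&
	    (GC_SLEEP_MS * (total_nodes / min_nodes)) > gc_max_ms)
		min_nodes = total_nodes / (gc_max_ms / GC_SLEEP_MS);

	return min_nodes;
}
```

With no stats yet (gc_max_ms == 0) this falls through to the default of 10 nodes per cycle; with 10000 nodes and a 1000 ms budget it scales up to 100 nodes per cycle, exactly as described above.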
>>> return min_nodes;
>>> }
>>> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
>>> index 539454d8e2d0..21a370f444b7 100644
>>> --- a/drivers/md/bcache/util.h
>>> +++ b/drivers/md/bcache/util.h
>>> @@ -305,6 +305,9 @@ static inline unsigned int local_clock_us(void)
>>> #define NSEC_PER_ms NSEC_PER_MSEC
>>> #define NSEC_PER_sec NSEC_PER_SEC
>>>
>>> +#define time_stat_average(stats, stat, units) \
>>> + div_u64((stats)->average_ ## stat >> 8, NSEC_PER_ ## units)
>>> +
>>
>> Could you please add a few code comments here to explain what does
>> time_stat_average() do?
>
> Will add the code comments with explanation in a new version promptly.
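
As a starting point for those comments, a hedged model of what the macro computes (illustrative names, not kernel code): per the quoted util.h, time-stat averages are stored in nanoseconds with 8 fractional fixed-point bits, which the `>> 8` discards before the unit conversion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model of time_stat_average(stats, frequency, ms):
 * the raw average_frequency field is (nanoseconds << 8), i.e. a
 * fixed-point value with 8 fractional bits. Shift the fractional
 * bits out, then divide by NSEC_PER_MSEC to convert ns -> ms.
 */
static uint64_t time_stat_avg_ms_model(uint64_t average_frequency_raw)
{
	return (average_frequency_raw >> 8) / 1000000ULL; /* NSEC_PER_MSEC */
}
```

So a stored average of (200 ms in ns) << 8 comes back as 200, which is the value btree_gc_min_nodes() halves to get gc_max_ms.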
>
>>
>> Thanks in advance.
>>
>>> #define __print_time_stat(stats, name, stat, units) \
>>> sysfs_print(name ## _ ## stat ## _ ## units, \
>>> div_u64((stats)->stat >> 8, NSEC_PER_ ## units))
>>> --
>>> 2.49.0.604.gff1f9ca942-goog
>>>
>>
>> --
>> Coly Li
Thread overview: 27+ messages
2025-04-14 22:44 [PATCH 0/1] bcache: reduce front IO latency during GC Robert Pang
2025-04-14 22:44 ` [PATCH 1/1] bcache: process fewer btree nodes in incremental GC cycles Robert Pang
2025-04-15 2:08 ` Coly Li
2025-04-15 17:25 ` Robert Pang
2025-04-16 9:44 ` Coly Li [this message]
2025-04-17 5:44 ` Robert Pang
2025-04-15 1:57 ` [PATCH 0/1] bcache: reduce front IO latency during GC Coly Li
2025-04-22 1:44 ` Robert Pang
2025-05-02 1:01 ` Robert Pang
2025-05-03 17:33 ` Coly Li
2025-05-15 0:58 ` Robert Pang
2025-05-15 1:04 ` Robert Pang
2025-05-15 7:12 ` Kuan-Wei Chiu
2025-05-16 3:58 ` Robert Pang
2025-05-16 16:14 ` Kuan-Wei Chiu
2025-05-17 11:02 ` Coly Li
2025-05-20 11:51 ` Kuan-Wei Chiu
2025-05-20 12:13 ` Coly Li
2025-05-20 12:26 ` Kuan-Wei Chiu
2025-05-20 13:13 ` Coly Li
2025-05-21 14:40 ` Kuan-Wei Chiu
2025-06-06 7:39 ` Robert Pang
2025-06-06 12:37 ` Kuan-Wei Chiu
2025-07-03 15:28 ` Robert Pang
[not found] ` <10F3457F-FC71-4D21-B2CA-05346B68D873@coly.li>
[not found] ` <A52B16E2-DF9C-4C8A-9853-464385D1A7FA@coly.li>
2025-07-17 22:52 ` Robert Pang
2025-07-18 6:09 ` Coly Li
2025-05-16 17:10 ` Kent Overstreet