From: Coly Li <i@coly.li>
To: Robert Pang <robertpang@google.com>
Cc: Coly Li <colyli@kernel.org>,
Kent Overstreet <kent.overstreet@linux.dev>,
linux-bcache@vger.kernel.org,
Mingzhe Zou <mingzhe.zou@easystack.cn>
Subject: Re: [PATCH 1/1] bcache: process fewer btree nodes in incremental GC cycles
Date: Wed, 16 Apr 2025 17:44:13 +0800 [thread overview]
Message-ID: <F3A62312-F568-499C-BAB4-017CEC11CDB5@coly.li> (raw)
In-Reply-To: <CAJhEC05LSFsDSKAfY9PPHz2zHYxu2geoZ6NO_umv3v9uJEEyZg@mail.gmail.com>
> On Apr 16, 2025, at 01:25, Robert Pang <robertpang@google.com> wrote:
>
> Thank you for your prompt feedback, Coly.
>
> On Mon, Apr 14, 2025 at 7:08 PM Coly Li <colyli@kernel.org> wrote:
>>
>> On Mon, Apr 14, 2025 at 03:44:04PM +0800, Robert Pang wrote:
>>> Current incremental GC processes a minimum of 100 btree nodes per cycle,
>>> followed by a 100ms sleep. For NVMe cache devices, where the average node
>>> processing time is ~1ms, this leads to front-side I/O latency potentially
>>> reaching tens or hundreds of milliseconds during GC execution.
>>>
>>> This commit resolves this latency issue by reducing the minimum node processing
>>> count per cycle to 10 and the inter-cycle sleep duration to 10ms. It also
>>> integrates a check of existing GC statistics to re-scale the number of nodes
>>> processed per sleep interval when needed, ensuring GC finishes well before the
>>> next GC is due.
>>>
>>> Signed-off-by: Robert Pang <robertpang@google.com>
>>> ---
>>> drivers/md/bcache/btree.c | 38 +++++++++++++++++---------------------
>>> drivers/md/bcache/util.h | 3 +++
>>> 2 files changed, 20 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
>>> index ed40d8600656..093e1edcaa53 100644
>>> --- a/drivers/md/bcache/btree.c
>>> +++ b/drivers/md/bcache/btree.c
>>> @@ -88,11 +88,8 @@
>>> * Test module load/unload
>>> */
>>>
>>> -#define MAX_NEED_GC 64
>>> -#define MAX_SAVE_PRIO 72
>>
>> You may compose another patch for the above changes, to separate them
>> from the main idea of this patch.
>
> Certainly. I just sent this as a separate patch.
>
>>> -#define MAX_GC_TIMES 100
>>> -#define MIN_GC_NODES 100
>>> -#define GC_SLEEP_MS 100
>>> +#define GC_MIN_NODES 10
>>> +#define GC_SLEEP_MS 10
>>>
>>> #define PTR_DIRTY_BIT (((uint64_t) 1 << 36))
>>>
>>> @@ -1585,25 +1582,24 @@ static unsigned int btree_gc_count_keys(struct btree *b)
>>>
>>> static size_t btree_gc_min_nodes(struct cache_set *c)
>>> {
>>> - size_t min_nodes;
>>> + size_t min_nodes = GC_MIN_NODES;
>>> + uint64_t gc_max_ms = time_stat_average(&c->btree_gc_time, frequency, ms) / 2;
>>>
>>> /*
>>> - * Since incremental GC would stop 100ms when front
>>> - * side I/O comes, so when there are many btree nodes,
>>> - * if GC only processes constant (100) nodes each time,
>>> - * GC would last a long time, and the front side I/Os
>>> - * would run out of the buckets (since no new bucket
>>> - * can be allocated during GC), and be blocked again.
>>> - * So GC should not process constant nodes, but varied
>>> - * nodes according to the number of btree nodes, which
>>> - * realized by dividing GC into constant(100) times,
>>> - * so when there are many btree nodes, GC can process
>>> - * more nodes each time, otherwise, GC will process less
>>> - * nodes each time (but no less than MIN_GC_NODES)
>>> + * The incremental garbage collector operates by processing
>>> + * GC_MIN_NODES at a time, pausing for GC_SLEEP_MS between
>>> + * each interval. If historical garbage collection statistics
>>> + * (btree_gc_time) is available, the maximum allowable GC
>>> + * duration is set to half of this observed frequency. To
>>> + * prevent exceeding this maximum duration, the number of
>>> + * nodes processed in the current step may be increased if
>>> + * the projected completion time based on the current pace
>>> + * extends beyond the allowed limit. This ensures timely GC
>>> + * completion before the next GC is due.
>>> */
>>> - min_nodes = c->gc_stats.nodes / MAX_GC_TIMES;
>>> - if (min_nodes < MIN_GC_NODES)
>>> - min_nodes = MIN_GC_NODES;
>>> + if ((gc_max_ms >= GC_SLEEP_MS) &&
>>> + (GC_SLEEP_MS * (c->gc_stats.nodes / min_nodes)) > gc_max_ms)
>>> + min_nodes = c->gc_stats.nodes / (gc_max_ms / GC_SLEEP_MS);
>>>
>>
>> Is it possible that gc_max_ms becomes 0?
>
> Yes, gc_max_ms can be 0 initially when the cache is set up and stats are not
> collected yet. In that case, the check "gc_max_ms >= GC_SLEEP_MS" fails and
> we process at the default rate of 10 nodes per cycle.
>
> Importantly, this same check also serves to prevent a division-by-zero error
> when the number of nodes is re-scaled using the following calculation:
>
> min_nodes = c->gc_stats.nodes / (gc_max_ms / GC_SLEEP_MS);
>
>>
Thanks. I have no further comments on the code itself; I just look forward to more testing and benchmark results.
Thanks.
Coly Li
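
For reference, the guard Robert describes can be modeled as a stand-alone sketch (the names below are illustrative, not from the patch; the constants and the formula are copied from the quoted hunk). The point is that the first clause of the `if` doubles as the division-by-zero guard:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GC_MIN_NODES 10
#define GC_SLEEP_MS  10

/*
 * Hypothetical stand-alone model of btree_gc_min_nodes():
 * total_nodes mirrors c->gc_stats.nodes, gc_max_ms is half the
 * observed GC frequency (0 until stats have been collected).
 */
static size_t gc_min_nodes_model(size_t total_nodes, uint64_t gc_max_ms)
{
	size_t min_nodes = GC_MIN_NODES;

	/*
	 * gc_max_ms >= GC_SLEEP_MS both rejects the "no stats yet"
	 * case (gc_max_ms == 0) and guarantees gc_max_ms / GC_SLEEP_MS
	 * below is >= 1, so that division can never be by zero.
	 */
	if ((gc_max_ms >= GC_SLEEP_MS) &&
	    (GC_SLEEP_MS * (total_nodes / min_nodes)) > gc_max_ms)
		min_nodes = total_nodes / (gc_max_ms / GC_SLEEP_MS);

	return min_nodes;
}
```

With no stats yet (gc_max_ms == 0) this falls through to the default of 10 nodes per cycle; with 10000 nodes and a 1000 ms budget it scales up to 100 nodes per cycle, exactly as described above.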
>>> return min_nodes;
>>> }
>>> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
>>> index 539454d8e2d0..21a370f444b7 100644
>>> --- a/drivers/md/bcache/util.h
>>> +++ b/drivers/md/bcache/util.h
>>> @@ -305,6 +305,9 @@ static inline unsigned int local_clock_us(void)
>>> #define NSEC_PER_ms NSEC_PER_MSEC
>>> #define NSEC_PER_sec NSEC_PER_SEC
>>>
>>> +#define time_stat_average(stats, stat, units) \
>>> + div_u64((stats)->average_ ## stat >> 8, NSEC_PER_ ## units)
>>> +
>>
>> Could you please add a few code comments here to explain what does
>> time_stat_average() do?
>
> Will add the code comments with explanation in a new version promptly.
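
As a starting point for those comments, a hedged model of what the macro computes (illustrative names, not kernel code): per the quoted util.h, time-stat averages are stored in nanoseconds with 8 fractional fixed-point bits, which the `>> 8` discards before the unit conversion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model of time_stat_average(stats, frequency, ms):
 * the raw average_frequency field is (nanoseconds << 8), i.e. a
 * fixed-point value with 8 fractional bits. Shift the fractional
 * bits out, then divide by NSEC_PER_MSEC to convert ns -> ms.
 */
static uint64_t time_stat_avg_ms_model(uint64_t average_frequency_raw)
{
	return (average_frequency_raw >> 8) / 1000000ULL; /* NSEC_PER_MSEC */
}
```

So a stored average of (200 ms in ns) << 8 comes back as 200, which is the value btree_gc_min_nodes() halves to get gc_max_ms.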
>
>>
>> Thanks in advance.
>>
>>> #define __print_time_stat(stats, name, stat, units) \
>>> sysfs_print(name ## _ ## stat ## _ ## units, \
>>> div_u64((stats)->stat >> 8, NSEC_PER_ ## units))
>>> --
>>> 2.49.0.604.gff1f9ca942-goog
>>>
>>
>> --
>> Coly Li
Thread overview: 27+ messages
2025-04-14 22:44 [PATCH 0/1] bcache: reduce front IO latency during GC Robert Pang
2025-04-14 22:44 ` [PATCH 1/1] bcache: process fewer btree nodes in incremental GC cycles Robert Pang
2025-04-15 2:08 ` Coly Li
2025-04-15 17:25 ` Robert Pang
2025-04-16 9:44 ` Coly Li [this message]
2025-04-17 5:44 ` Robert Pang
2025-04-15 1:57 ` [PATCH 0/1] bcache: reduce front IO latency during GC Coly Li
2025-04-22 1:44 ` Robert Pang
2025-05-02 1:01 ` Robert Pang
2025-05-03 17:33 ` Coly Li
2025-05-15 0:58 ` Robert Pang
2025-05-15 1:04 ` Robert Pang
2025-05-15 7:12 ` Kuan-Wei Chiu
2025-05-16 3:58 ` Robert Pang
2025-05-16 16:14 ` Kuan-Wei Chiu
2025-05-17 11:02 ` Coly Li
2025-05-20 11:51 ` Kuan-Wei Chiu
2025-05-20 12:13 ` Coly Li
2025-05-20 12:26 ` Kuan-Wei Chiu
2025-05-20 13:13 ` Coly Li
2025-05-21 14:40 ` Kuan-Wei Chiu
2025-06-06 7:39 ` Robert Pang
2025-06-06 12:37 ` Kuan-Wei Chiu
2025-07-03 15:28 ` Robert Pang
[not found] ` <10F3457F-FC71-4D21-B2CA-05346B68D873@coly.li>
[not found] ` <A52B16E2-DF9C-4C8A-9853-464385D1A7FA@coly.li>
2025-07-17 22:52 ` Robert Pang
2025-07-18 6:09 ` Coly Li
2025-05-16 17:10 ` Kent Overstreet