* [PATCH v2 0/1] bcache: reduce front IO latency during GC
@ 2025-04-15 17:39 Robert Pang
2025-04-15 17:39 ` [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles Robert Pang
0 siblings, 1 reply; 6+ messages in thread
From: Robert Pang @ 2025-04-15 17:39 UTC (permalink / raw)
To: Coly Li, Kent Overstreet, linux-bcache; +Cc: Robert Pang, Mingzhe Zou
In performance benchmarks on disks with bcache using the Linux 6.6 kernel, we
observe noticeable IO latency increase during btree garbage collection. The
increase ranges from high tens to hundreds of milliseconds, depending on the
size of the cache device. Further investigation reveals that it is the same
issue reported in [1], where the large number of nodes processed in each
incremental GC cycle causes the front IO latency.
Building upon the approach suggested in [1], this patch decomposes the
incremental GC process into more but smaller cycles. In contrast to [1], this
implementation adopts a simpler strategy by setting a lower limit of 10 nodes
per cycle to reduce front IO delay and introducing a fixed 10ms sleep per cycle
when front IO is in progress. Furthermore, when garbage collection statistics
are available, the number of nodes processed per cycle is dynamically rescaled
based on the average GC frequency to ensure GC completes well within the next
subsequent scheduled interval.
Testing with a 750GB NVMe cache and 256KB bucket size using the following fio
configuration demonstrates that our patch reduces front IO latency during GC
without significantly increasing GC duration.
ioengine=libaio
direct=1
bs=4k
size=900G
iodepth=10
readwrite=randwrite
log_avg_msec=10
Before:
time-ms,latency-ns,,,
12170, 285016, 1, 0, 0
12183, 296581, 1, 0, 0
12207, 6542725, 1, 0, 0
12242, 24483604, 1, 0, 0
12250, 1895628, 1, 0, 0
12260, 284854, 1, 0, 0
12270, 275513, 1, 0, 0
/sys/block/bcache0/bcache/cache/internal/btree_gc_average_duration_ms:2880
/sys/block/bcache0/bcache/cache/internal/btree_gc_average_frequency_sec:133
/sys/block/bcache0/bcache/cache/internal/btree_gc_last_sec:121
/sys/block/bcache0/bcache/cache/internal/btree_gc_max_duration_ms:3456
After:
12690, 378494, 1, 0, 0
12700, 413934, 1, 0, 0
12710, 661217, 1, 0, 0
12727, 354510, 1, 0, 0
12730, 1100768, 1, 0, 0
12742, 382484, 1, 0, 0
12750, 532679, 1, 0, 0
12760, 572758, 1, 0, 0
12773, 283416, 1, 0, 0
/sys/block/bcache0/bcache/cache/internal/btree_gc_average_duration_ms:3619
/sys/block/bcache0/bcache/cache/internal/btree_gc_average_frequency_sec:58
/sys/block/bcache0/bcache/cache/internal/btree_gc_last_sec:23
/sys/block/bcache0/bcache/cache/internal/btree_gc_max_duration_ms:3866
[1] https://lore.kernel.org/all/20220511073903.13568-1-mingzhe.zou@easystack.cn/
---
Changes in v2:
- Move the deletion of unused MAX_NEED_GC and MAX_SAVE_PRIO to a separate patch
- Add code comments to explain time_stat_average().
Robert Pang (1):
bcache: process fewer btree nodes in incremental GC cycles
drivers/md/bcache/btree.c | 36 +++++++++++++++++-------------------
drivers/md/bcache/util.h | 7 +++++++
2 files changed, 24 insertions(+), 19 deletions(-)
--
2.49.0.805.g082f7c87e0-goog
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
2025-04-15 17:39 [PATCH v2 0/1] bcache: reduce front IO latency during GC Robert Pang
@ 2025-04-15 17:39 ` Robert Pang
2025-04-17 2:09 ` kernel test robot
2025-04-17 2:31 ` kernel test robot
0 siblings, 2 replies; 6+ messages in thread
From: Robert Pang @ 2025-04-15 17:39 UTC (permalink / raw)
To: Coly Li, Kent Overstreet, linux-bcache; +Cc: Robert Pang, Mingzhe Zou
Current incremental GC processes a minimum of 100 btree nodes per cycle,
followed by a 100ms sleep. For NVMe cache devices, where the average node
processing time is ~1ms, this leads to front-side I/O latency potentially
reaching tens or hundreds of milliseconds during GC execution.
This commit resolves this latency issue by reducing the minimum node processing
count per cycle to 10 and the inter-cycle sleep duration to 10ms. It also
integrates a check of existing GC statistics to re-scale the number of nodes
processed per sleep interval when needed, ensuring GC finishes well before the
next GC is due.
Signed-off-by: Robert Pang <robertpang@google.com>
---
drivers/md/bcache/btree.c | 36 +++++++++++++++++-------------------
drivers/md/bcache/util.h | 7 +++++++
2 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ed40d8600656..447c79157d0e 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -90,9 +90,8 @@
#define MAX_NEED_GC 64
#define MAX_SAVE_PRIO 72
-#define MAX_GC_TIMES 100
-#define MIN_GC_NODES 100
-#define GC_SLEEP_MS 100
+#define GC_MIN_NODES 10
+#define GC_SLEEP_MS 10
#define PTR_DIRTY_BIT (((uint64_t) 1 << 36))
@@ -1585,25 +1584,24 @@ static unsigned int btree_gc_count_keys(struct btree *b)
static size_t btree_gc_min_nodes(struct cache_set *c)
{
- size_t min_nodes;
+ size_t min_nodes = GC_MIN_NODES;
+ uint64_t gc_max_ms = time_stat_average(&c->btree_gc_time, frequency, ms) / 2;
/*
- * Since incremental GC would stop 100ms when front
- * side I/O comes, so when there are many btree nodes,
- * if GC only processes constant (100) nodes each time,
- * GC would last a long time, and the front side I/Os
- * would run out of the buckets (since no new bucket
- * can be allocated during GC), and be blocked again.
- * So GC should not process constant nodes, but varied
- * nodes according to the number of btree nodes, which
- * realized by dividing GC into constant(100) times,
- * so when there are many btree nodes, GC can process
- * more nodes each time, otherwise, GC will process less
- * nodes each time (but no less than MIN_GC_NODES)
+ * The incremental garbage collector operates by processing
+ * GC_MIN_NODES at a time, pausing for GC_SLEEP_MS between
+ * each interval. If historical garbage collection statistics
+ * (btree_gc_time) is available, the maximum allowable GC
+ * duration is set to half of this observed frequency. To
+ * prevent exceeding this maximum duration, the number of
+ * nodes processed in the current step may be increased if
+ * the projected completion time based on the current pace
+ * extends beyond the allowed limit. This ensures timely GC
+ * completion before the next GC is due.
*/
- min_nodes = c->gc_stats.nodes / MAX_GC_TIMES;
- if (min_nodes < MIN_GC_NODES)
- min_nodes = MIN_GC_NODES;
+ if ((gc_max_ms >= GC_SLEEP_MS) &&
+ (GC_SLEEP_MS * (c->gc_stats.nodes / min_nodes)) > gc_max_ms)
+ min_nodes = c->gc_stats.nodes / (gc_max_ms / GC_SLEEP_MS);
return min_nodes;
}
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index 539454d8e2d0..ba2e6de5ec71 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -305,6 +305,13 @@ static inline unsigned int local_clock_us(void)
#define NSEC_PER_ms NSEC_PER_MSEC
#define NSEC_PER_sec NSEC_PER_SEC
+/*
+ * time_stat_average - return the average duration/frequency stat of a time
+ * stats in the desired unit.
+ */
+#define time_stat_average(stats, stat, units) \
+ div_u64((stats)->average_ ## stat >> 8, NSEC_PER_ ## units)
+
#define __print_time_stat(stats, name, stat, units) \
sysfs_print(name ## _ ## stat ## _ ## units, \
div_u64((stats)->stat >> 8, NSEC_PER_ ## units))
--
2.49.0.805.g082f7c87e0-goog
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
2025-04-15 17:39 ` [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles Robert Pang
@ 2025-04-17 2:09 ` kernel test robot
2025-04-17 2:31 ` kernel test robot
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2025-04-17 2:09 UTC (permalink / raw)
To: Robert Pang, Coly Li, Kent Overstreet, linux-bcache
Cc: oe-kbuild-all, Robert Pang, Mingzhe Zou
Hi Robert,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.15-rc2 next-20250416]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Robert-Pang/bcache-process-fewer-btree-nodes-in-incremental-GC-cycles/20250416-133615
base: linus/master
patch link: https://lore.kernel.org/r/20250415174145.346121-2-robertpang%40google.com
patch subject: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
config: i386-buildonly-randconfig-004-20250417 (https://download.01.org/0day-ci/archive/20250417/202504170950.T7aXwrDK-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250417/202504170950.T7aXwrDK-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504170950.T7aXwrDK-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: drivers/md/bcache/btree.o: in function `btree_gc_recurse':
>> btree.c:(.text+0x4c82): undefined reference to `__udivdi3'
>> ld: btree.c:(.text+0x4c91): undefined reference to `__udivdi3'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
2025-04-15 17:39 ` [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles Robert Pang
2025-04-17 2:09 ` kernel test robot
@ 2025-04-17 2:31 ` kernel test robot
2025-04-17 3:13 ` Robert Pang
1 sibling, 1 reply; 6+ messages in thread
From: kernel test robot @ 2025-04-17 2:31 UTC (permalink / raw)
To: Robert Pang, Coly Li, Kent Overstreet, linux-bcache
Cc: llvm, oe-kbuild-all, Robert Pang, Mingzhe Zou
Hi Robert,
kernel test robot noticed the following build errors:
[auto build test ERROR on linus/master]
[also build test ERROR on v6.15-rc2 next-20250416]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Robert-Pang/bcache-process-fewer-btree-nodes-in-incremental-GC-cycles/20250416-133615
base: linus/master
patch link: https://lore.kernel.org/r/20250415174145.346121-2-robertpang%40google.com
patch subject: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
config: i386-buildonly-randconfig-001-20250417 (https://download.01.org/0day-ci/archive/20250417/202504171044.TFdtTfEh-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250417/202504171044.TFdtTfEh-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504171044.TFdtTfEh-lkp@intel.com/
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "__udivdi3" [drivers/md/bcache/bcache.ko] undefined!
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
2025-04-17 2:31 ` kernel test robot
@ 2025-04-17 3:13 ` Robert Pang
2025-04-17 5:27 ` Robert Pang
0 siblings, 1 reply; 6+ messages in thread
From: Robert Pang @ 2025-04-17 3:13 UTC (permalink / raw)
To: kernel test robot
Cc: Coly Li, Kent Overstreet, linux-bcache, llvm, oe-kbuild-all,
Mingzhe Zou
Please accept my apologies for the build error. It appears that the current
patch is missing a macro for 64-bit integer division needed for the i386
architecture. I will address this issue and submit a revised version promptly.
On Wed, Apr 16, 2025 at 7:31 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Robert,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on linus/master]
> [also build test ERROR on v6.15-rc2 next-20250416]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Robert-Pang/bcache-process-fewer-btree-nodes-in-incremental-GC-cycles/20250416-133615
> base: linus/master
> patch link: https://lore.kernel.org/r/20250415174145.346121-2-robertpang%40google.com
> patch subject: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
> config: i386-buildonly-randconfig-001-20250417 (https://download.01.org/0day-ci/archive/20250417/202504171044.TFdtTfEh-lkp@intel.com/config)
> compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250417/202504171044.TFdtTfEh-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202504171044.TFdtTfEh-lkp@intel.com/
>
> All errors (new ones prefixed by >>, old ones prefixed by <<):
>
> >> ERROR: modpost: "__udivdi3" [drivers/md/bcache/bcache.ko] undefined!
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
2025-04-17 3:13 ` Robert Pang
@ 2025-04-17 5:27 ` Robert Pang
0 siblings, 0 replies; 6+ messages in thread
From: Robert Pang @ 2025-04-17 5:27 UTC (permalink / raw)
To: kernel test robot
Cc: Coly Li, Kent Overstreet, linux-bcache, llvm, oe-kbuild-all,
Mingzhe Zou
Revised version sent.
On Wed, Apr 16, 2025 at 8:13 PM Robert Pang <robertpang@google.com> wrote:
>
> Please accept my apologies for the build error. It appears that the current
> patch is missing a macro for 64-bit integer division needed for the i386
> architecture. I will address this issue and submit a revised version promptly.
>
> On Wed, Apr 16, 2025 at 7:31 PM kernel test robot <lkp@intel.com> wrote:
> >
> > Hi Robert,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on linus/master]
> > [also build test ERROR on v6.15-rc2 next-20250416]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Robert-Pang/bcache-process-fewer-btree-nodes-in-incremental-GC-cycles/20250416-133615
> > base: linus/master
> > patch link: https://lore.kernel.org/r/20250415174145.346121-2-robertpang%40google.com
> > patch subject: [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles
> > config: i386-buildonly-randconfig-001-20250417 (https://download.01.org/0day-ci/archive/20250417/202504171044.TFdtTfEh-lkp@intel.com/config)
> > compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250417/202504171044.TFdtTfEh-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202504171044.TFdtTfEh-lkp@intel.com/
> >
> > All errors (new ones prefixed by >>, old ones prefixed by <<):
> >
> > >> ERROR: modpost: "__udivdi3" [drivers/md/bcache/bcache.ko] undefined!
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-04-17 5:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-15 17:39 [PATCH v2 0/1] bcache: reduce front IO latency during GC Robert Pang
2025-04-15 17:39 ` [PATCH v2 1/1] bcache: process fewer btree nodes in incremental GC cycles Robert Pang
2025-04-17 2:09 ` kernel test robot
2025-04-17 2:31 ` kernel test robot
2025-04-17 3:13 ` Robert Pang
2025-04-17 5:27 ` Robert Pang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).