* [PATCH 0/2] block/blk-throttle: Fix throttle slice time for SSDs
From: Guenter Roeck @ 2025-07-30 16:48 UTC
To: Tejun Heo
Cc: Josef Bacik, Jens Axboe, Yu Kuai, cgroups, linux-block, linux-kernel, Guenter Roeck

Since bf20ab538c81 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW"),
the throttle slice time differs between SSD and non-SSD devices. This
causes test failures with slow throttle speeds on SSD devices.

The first patch in the series fixes the problem by restoring the throttle
slice time to a fixed value, matching the behavior seen prior to the
above-mentioned commit. The second patch is optional and replaces the
throtl_slice variable with a constant.

----------------------------------------------------------------
Guenter Roeck (2):
      block/blk-throttle: Fix throttle slice time for SSDs
      block/blk-throttle: Remove throtl_slice from struct throtl_data

 block/blk-throttle.c | 45 ++++++++++++++-------------------------------
 1 file changed, 14 insertions(+), 31 deletions(-)
* [PATCH 1/2] block/blk-throttle: Fix throttle slice time for SSDs
From: Guenter Roeck @ 2025-07-30 16:48 UTC
To: Tejun Heo
Cc: Josef Bacik, Jens Axboe, Yu Kuai, cgroups, linux-block, linux-kernel, Guenter Roeck

Commit d61fcfa4bb18 ("blk-throttle: choose a small throtl_slice for SSD")
introduced device type specific throttle slices if BLK_DEV_THROTTLING_LOW
was enabled. Commit bf20ab538c81 ("blk-throttle: remove
CONFIG_BLK_DEV_THROTTLING_LOW") removed support for BLK_DEV_THROTTLING_LOW,
but left the device type specific throttle slices in place. This
effectively changed throttling behavior on systems with SSD which now use
a different and non-configurable slice time compared to non-SSD devices.
Practical impact is that throughput tests with low configured throttle
values (65536 bps) experience less than expected throughput on SSDs,
presumably due to rounding errors associated with the small throttle slice
time used for those devices. The same tests pass when setting the throttle
values to 65536 * 4 = 262144 bps.

The original code sets the throttle slice time to DFL_THROTL_SLICE_HD if
CONFIG_BLK_DEV_THROTTLING_LOW is disabled. Restore that code to fix the
problem. With that, DFL_THROTL_SLICE_SSD is no longer necessary. Revert to
the original code and re-introduce DFL_THROTL_SLICE to replace both
DFL_THROTL_SLICE_HD and DFL_THROTL_SLICE_SSD. This effectively reverts
commit d61fcfa4bb18 ("blk-throttle: choose a small throtl_slice for SSD").

After the removal of CONFIG_BLK_DEV_THROTTLING_LOW, it is no longer
necessary to enable block accounting, so remove the call to
blk_stat_enable_accounting(). With that, the track_bio_latency variable
is no longer used and can be deleted from struct throtl_data. Also,
including blk-stat.h is no longer necessary.

While at it, also remove MAX_THROTL_SLICE since it is not used anymore.

Fixes: bf20ab538c81 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW")
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 block/blk-throttle.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 397b6a410f9e..924d09b51b69 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -12,7 +12,6 @@
 #include <linux/blktrace_api.h>
 #include "blk.h"
 #include "blk-cgroup-rwstat.h"
-#include "blk-stat.h"
 #include "blk-throttle.h"
 
 /* Max dispatch from a group in 1 round */
@@ -22,9 +21,7 @@
 #define THROTL_QUANTUM 32
 
 /* Throttling is performed over a slice and after that slice is renewed */
-#define DFL_THROTL_SLICE_HD (HZ / 10)
-#define DFL_THROTL_SLICE_SSD (HZ / 50)
-#define MAX_THROTL_SLICE (HZ)
+#define DFL_THROTL_SLICE (HZ / 10)
 
 /* A workqueue to queue throttle related work */
 static struct workqueue_struct *kthrotld_workqueue;
@@ -45,8 +42,6 @@ struct throtl_data
 
 	/* Work for dispatching throttled bios */
 	struct work_struct dispatch_work;
-
-	bool track_bio_latency;
 };
 
 static void throtl_pending_timer_fn(struct timer_list *t);
@@ -1345,13 +1340,7 @@ static int blk_throtl_init(struct gendisk *disk)
 		goto out;
 	}
 
-	if (blk_queue_nonrot(q))
-		td->throtl_slice = DFL_THROTL_SLICE_SSD;
-	else
-		td->throtl_slice = DFL_THROTL_SLICE_HD;
-	td->track_bio_latency = !queue_is_mq(q);
-	if (!td->track_bio_latency)
-		blk_stat_enable_accounting(q);
+	td->throtl_slice = DFL_THROTL_SLICE;
 
 out:
 	blk_mq_unquiesce_queue(disk->queue);
-- 
2.45.2
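For readers following the numbers in the commit message of patch 1/2, here is a rough sketch of the per-slice byte budget at the quoted throttle values. It is a simplified model, not the kernel code: it assumes the budget is computed as bps * jiffies / HZ with truncating integer division, that HZ is 1000 (the names EXAMPLE_HZ, SLICE_HD, SLICE_SSD, bytes_allowed and bio_fits_one_slice are made up for this illustration), and that the test issues 4096-byte I/O, which the thread does not state.

```c
#include <stdint.h>

/* Hypothetical tick rate for illustration; real kernels use CONFIG_HZ. */
#define EXAMPLE_HZ 1000u

/* Slice lengths in jiffies, mirroring the two definitions removed by the patch. */
#define SLICE_HD  (EXAMPLE_HZ / 10)	/* 100 jiffies at HZ=1000 */
#define SLICE_SSD (EXAMPLE_HZ / 50)	/* 20 jiffies at HZ=1000 */

/*
 * Simplified model (an assumption, not the kernel function): number of
 * bytes a group may dispatch within 'jiffies' ticks at 'bps_limit'
 * bytes per second. Integer division truncates the fractional byte.
 */
static uint64_t bytes_allowed(uint64_t bps_limit, uint64_t jiffies)
{
	return bps_limit * jiffies / EXAMPLE_HZ;
}

/* Nonzero if a single bio of 'bio_size' bytes fits the budget of one slice. */
static int bio_fits_one_slice(uint64_t bps_limit, uint64_t slice,
			      uint64_t bio_size)
{
	return bio_size <= bytes_allowed(bps_limit, slice);
}
```

Under these assumptions, 65536 bps yields a budget of 1310 bytes per SSD slice and 6553 bytes per HD slice, so a 4096-byte bio fits into one HD slice but not into one SSD slice. At 262144 bps the SSD slice budget grows to 5242 bytes and the same bio fits. This only shows how much smaller the per-slice budget becomes with the short slice; it does not pin down the exact mechanism behind the test failure, which the commit message itself only presumes to be rounding.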
* Re: [PATCH 1/2] block/blk-throttle: Fix throttle slice time for SSDs
From: Yu Kuai @ 2025-07-30 18:30 UTC
To: Guenter Roeck, Tejun Heo
Cc: Josef Bacik, Jens Axboe, Yu Kuai, cgroups, linux-block, linux-kernel

Hi,

On 2025/7/31 0:48, Guenter Roeck wrote:
> Commit d61fcfa4bb18 ("blk-throttle: choose a small throtl_slice for SSD")
> introduced device type specific throttle slices if BLK_DEV_THROTTLING_LOW
> was enabled. Commit bf20ab538c81 ("blk-throttle: remove
> CONFIG_BLK_DEV_THROTTLING_LOW") removed support for BLK_DEV_THROTTLING_LOW,
> but left the device type specific throttle slices in place. This
> effectively changed throttling behavior on systems with SSD which now use
> a different and non-configurable slice time compared to non-SSD devices.
> Practical impact is that throughput tests with low configured throttle
> values (65536 bps) experience less than expected throughput on SSDs,
> presumably due to rounding errors associated with the small throttle slice
> time used for those devices. The same tests pass when setting the throttle
> values to 65536 * 4 = 262144 bps.
>
> The original code sets the throttle slice time to DFL_THROTL_SLICE_HD if
> CONFIG_BLK_DEV_THROTTLING_LOW is disabled. Restore that code to fix the
> problem. With that, DFL_THROTL_SLICE_SSD is no longer necessary. Revert to
> the original code and re-introduce DFL_THROTL_SLICE to replace both
> DFL_THROTL_SLICE_HD and DFL_THROTL_SLICE_SSD. This effectively reverts
> commit d61fcfa4bb18 ("blk-throttle: choose a small throtl_slice for SSD").
>
> After the removal of CONFIG_BLK_DEV_THROTTLING_LOW, it is no longer
> necessary to enable block accounting, so remove the call to
> blk_stat_enable_accounting(). With that, the track_bio_latency variable
> is no longer used and can be deleted from struct throtl_data. Also,
> including blk-stat.h is no longer necessary.
>
> While at it, also remove MAX_THROTL_SLICE since it is not used anymore.
>
> Fixes: bf20ab538c81 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW")
> Cc: Yu Kuai <yukuai3@huawei.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  block/blk-throttle.c | 15 ++-------------
>  1 file changed, 2 insertions(+), 13 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 397b6a410f9e..924d09b51b69 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -12,7 +12,6 @@
>  #include <linux/blktrace_api.h>
>  #include "blk.h"
>  #include "blk-cgroup-rwstat.h"
> -#include "blk-stat.h"
>  #include "blk-throttle.h"
>
>  /* Max dispatch from a group in 1 round */
> @@ -22,9 +21,7 @@
>  #define THROTL_QUANTUM 32
>
>  /* Throttling is performed over a slice and after that slice is renewed */
> -#define DFL_THROTL_SLICE_HD (HZ / 10)
> -#define DFL_THROTL_SLICE_SSD (HZ / 50)
> -#define MAX_THROTL_SLICE (HZ)
> +#define DFL_THROTL_SLICE (HZ / 10)
>
>  /* A workqueue to queue throttle related work */
>  static struct workqueue_struct *kthrotld_workqueue;
> @@ -45,8 +42,6 @@ struct throtl_data
>
>  	/* Work for dispatching throttled bios */
>  	struct work_struct dispatch_work;
> -
> -	bool track_bio_latency;
>  };
>
>  static void throtl_pending_timer_fn(struct timer_list *t);
> @@ -1345,13 +1340,7 @@ static int blk_throtl_init(struct gendisk *disk)
>  		goto out;
>  	}
>
> -	if (blk_queue_nonrot(q))
> -		td->throtl_slice = DFL_THROTL_SLICE_SSD;
> -	else
> -		td->throtl_slice = DFL_THROTL_SLICE_HD;
> -	td->track_bio_latency = !queue_is_mq(q);
> -	if (!td->track_bio_latency)
> -		blk_stat_enable_accounting(q);
> +	td->throtl_slice = DFL_THROTL_SLICE;
>
>  out:
>  	blk_mq_unquiesce_queue(disk->queue);

This looks correct; I did miss that the throtl_slice for SSD is only used
with BLK_DEV_THROTTLING_LOW. However, I think it's better to factor the
track_bio_latency changes into a separate patch.

Thanks,
Kuai
* Re: [PATCH 1/2] block/blk-throttle: Fix throttle slice time for SSDs
From: Guenter Roeck @ 2025-07-30 23:19 UTC
To: yukuai, Tejun Heo
Cc: Josef Bacik, Jens Axboe, Yu Kuai, cgroups, linux-block, linux-kernel

On 7/30/25 11:30, Yu Kuai wrote:
> Hi,
>
> On 2025/7/31 0:48, Guenter Roeck wrote:
>> Commit d61fcfa4bb18 ("blk-throttle: choose a small throtl_slice for SSD")
>> introduced device type specific throttle slices if BLK_DEV_THROTTLING_LOW
>> was enabled. Commit bf20ab538c81 ("blk-throttle: remove
>> CONFIG_BLK_DEV_THROTTLING_LOW") removed support for BLK_DEV_THROTTLING_LOW,
>> but left the device type specific throttle slices in place. This
>> effectively changed throttling behavior on systems with SSD which now use
>> a different and non-configurable slice time compared to non-SSD devices.
>> Practical impact is that throughput tests with low configured throttle
>> values (65536 bps) experience less than expected throughput on SSDs,
>> presumably due to rounding errors associated with the small throttle slice
>> time used for those devices. The same tests pass when setting the throttle
>> values to 65536 * 4 = 262144 bps.
>>
>> The original code sets the throttle slice time to DFL_THROTL_SLICE_HD if
>> CONFIG_BLK_DEV_THROTTLING_LOW is disabled. Restore that code to fix the
>> problem. With that, DFL_THROTL_SLICE_SSD is no longer necessary. Revert to
>> the original code and re-introduce DFL_THROTL_SLICE to replace both
>> DFL_THROTL_SLICE_HD and DFL_THROTL_SLICE_SSD. This effectively reverts
>> commit d61fcfa4bb18 ("blk-throttle: choose a small throtl_slice for SSD").
>>
>> After the removal of CONFIG_BLK_DEV_THROTTLING_LOW, it is no longer
>> necessary to enable block accounting, so remove the call to
>> blk_stat_enable_accounting(). With that, the track_bio_latency variable
>> is no longer used and can be deleted from struct throtl_data. Also,
>> including blk-stat.h is no longer necessary.
>>
>> While at it, also remove MAX_THROTL_SLICE since it is not used anymore.
>>
>> Fixes: bf20ab538c81 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW")
>> Cc: Yu Kuai <yukuai3@huawei.com>
>> Cc: Tejun Heo <tj@kernel.org>
>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> ---
>>  block/blk-throttle.c | 15 ++-------------
>>  1 file changed, 2 insertions(+), 13 deletions(-)
>>
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index 397b6a410f9e..924d09b51b69 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -12,7 +12,6 @@
>>  #include <linux/blktrace_api.h>
>>  #include "blk.h"
>>  #include "blk-cgroup-rwstat.h"
>> -#include "blk-stat.h"
>>  #include "blk-throttle.h"
>>  /* Max dispatch from a group in 1 round */
>> @@ -22,9 +21,7 @@
>>  #define THROTL_QUANTUM 32
>>  /* Throttling is performed over a slice and after that slice is renewed */
>> -#define DFL_THROTL_SLICE_HD (HZ / 10)
>> -#define DFL_THROTL_SLICE_SSD (HZ / 50)
>> -#define MAX_THROTL_SLICE (HZ)
>> +#define DFL_THROTL_SLICE (HZ / 10)
>>  /* A workqueue to queue throttle related work */
>>  static struct workqueue_struct *kthrotld_workqueue;
>> @@ -45,8 +42,6 @@ struct throtl_data
>>  	/* Work for dispatching throttled bios */
>>  	struct work_struct dispatch_work;
>> -
>> -	bool track_bio_latency;
>>  };
>>  static void throtl_pending_timer_fn(struct timer_list *t);
>> @@ -1345,13 +1340,7 @@ static int blk_throtl_init(struct gendisk *disk)
>>  		goto out;
>>  	}
>> -	if (blk_queue_nonrot(q))
>> -		td->throtl_slice = DFL_THROTL_SLICE_SSD;
>> -	else
>> -		td->throtl_slice = DFL_THROTL_SLICE_HD;
>> -	td->track_bio_latency = !queue_is_mq(q);
>> -	if (!td->track_bio_latency)
>> -		blk_stat_enable_accounting(q);
>> +	td->throtl_slice = DFL_THROTL_SLICE;
>>  out:
>>  	blk_mq_unquiesce_queue(disk->queue);
> This looks correct; I did miss that the throtl_slice for SSD is only used
> with BLK_DEV_THROTTLING_LOW. However, I think it's better to factor the
> track_bio_latency changes into a separate patch.
>

I had combined it because it is another left-over from bf20ab538c81 and
I don't know if enabling statistics has other side effects. But, sure,
I can split it out if that is preferred. Let's wait for feedback from
Jens and/or Tejun; I'll follow their guidance.

Thanks,
Guenter
* [PATCH 2/2] block/blk-throttle: Remove throtl_slice from struct throtl_data
From: Guenter Roeck @ 2025-07-30 16:48 UTC
To: Tejun Heo
Cc: Josef Bacik, Jens Axboe, Yu Kuai, cgroups, linux-block, linux-kernel, Guenter Roeck

throtl_slice is now a constant. Remove the variable and use the constant
directly where needed.

Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 block/blk-throttle.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 924d09b51b69..7756e6c8338d 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -38,8 +38,6 @@ struct throtl_data
 	/* Total Number of queued bios on READ and WRITE lists */
 	unsigned int nr_queued[2];
 
-	unsigned int throtl_slice;
-
 	/* Work for dispatching throttled bios */
 	struct work_struct dispatch_work;
 };
@@ -446,7 +444,7 @@ static void throtl_dequeue_tg(struct throtl_grp *tg)
 static void throtl_schedule_pending_timer(struct throtl_service_queue *sq,
 					  unsigned long expires)
 {
-	unsigned long max_expire = jiffies + 8 * sq_to_td(sq)->throtl_slice;
+	unsigned long max_expire = jiffies + 8 * DFL_THROTL_SLICE;
 
 	/*
 	 * Since we are adjusting the throttle limit dynamically, the sleep
@@ -514,7 +512,7 @@ static inline void throtl_start_new_slice_with_credit(struct throtl_grp *tg,
 	if (time_after(start, tg->slice_start[rw]))
 		tg->slice_start[rw] = start;
 
-	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+	tg->slice_end[rw] = jiffies + DFL_THROTL_SLICE;
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice with credit start=%lu end=%lu jiffies=%lu",
 		   rw == READ ? 'R' : 'W', tg->slice_start[rw],
@@ -529,7 +527,7 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
 		tg->io_disp[rw] = 0;
 	}
 	tg->slice_start[rw] = jiffies;
-	tg->slice_end[rw] = jiffies + tg->td->throtl_slice;
+	tg->slice_end[rw] = jiffies + DFL_THROTL_SLICE;
 
 	throtl_log(&tg->service_queue,
 		   "[%c] new slice start=%lu end=%lu jiffies=%lu",
@@ -540,7 +538,7 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw,
 static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
 					unsigned long jiffy_end)
 {
-	tg->slice_end[rw] = roundup(jiffy_end, tg->td->throtl_slice);
+	tg->slice_end[rw] = roundup(jiffy_end, DFL_THROTL_SLICE);
 }
 
 static inline void throtl_extend_slice(struct throtl_grp *tg, bool rw,
@@ -671,12 +669,12 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 	 * sooner, then we need to reduce slice_end. A high bogus slice_end
 	 * is bad because it does not allow new slice to start.
 	 */
-	throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
+	throtl_set_slice_end(tg, rw, jiffies + DFL_THROTL_SLICE);
 
 	time_elapsed = rounddown(jiffies - tg->slice_start[rw],
-				 tg->td->throtl_slice);
+				 DFL_THROTL_SLICE);
 	/* Don't trim slice until at least 2 slices are used */
-	if (time_elapsed < tg->td->throtl_slice * 2)
+	if (time_elapsed < DFL_THROTL_SLICE * 2)
 		return;
 
 	/*
@@ -687,7 +685,7 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 	 * lower rate than expected. Therefore, other than the above rounddown,
 	 * one extra slice is preserved for deviation.
 	 */
-	time_elapsed -= tg->td->throtl_slice;
+	time_elapsed -= DFL_THROTL_SLICE;
 	bytes_trim = throtl_trim_bps(tg, rw, time_elapsed);
 	io_trim = throtl_trim_iops(tg, rw, time_elapsed);
 	if (!bytes_trim && !io_trim)
@@ -697,7 +695,7 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 
 	throtl_log(&tg->service_queue,
 		   "[%c] trim slice nr=%lu bytes=%lld io=%d start=%lu end=%lu jiffies=%lu",
-		   rw == READ ? 'R' : 'W', time_elapsed / tg->td->throtl_slice,
+		   rw == READ ? 'R' : 'W', time_elapsed / DFL_THROTL_SLICE,
 		   bytes_trim, io_trim, tg->slice_start[rw], tg->slice_end[rw],
 		   jiffies);
 }
@@ -768,7 +766,7 @@ static unsigned long tg_within_iops_limit(struct throtl_grp *tg, struct bio *bio
 	jiffy_elapsed = jiffies - tg->slice_start[rw];
 
 	/* Round up to the next throttle slice, wait time must be nonzero */
-	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, tg->td->throtl_slice);
+	jiffy_elapsed_rnd = roundup(jiffy_elapsed + 1, DFL_THROTL_SLICE);
 	io_allowed = calculate_io_allowed(iops_limit, jiffy_elapsed_rnd);
 	if (io_allowed > 0 && tg->io_disp[rw] + 1 <= io_allowed)
 		return 0;
@@ -794,9 +792,9 @@ static unsigned long tg_within_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	/* Slice has just started. Consider one slice interval */
 	if (!jiffy_elapsed)
-		jiffy_elapsed_rnd = tg->td->throtl_slice;
+		jiffy_elapsed_rnd = DFL_THROTL_SLICE;
 
-	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
+	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, DFL_THROTL_SLICE);
 	bytes_allowed = calculate_bytes_allowed(bps_limit, jiffy_elapsed_rnd);
 	/* Need to consider the case of bytes_allowed overflow. */
 	if ((bytes_allowed > 0 && tg->bytes_disp[rw] + bio_size <= bytes_allowed)
@@ -848,7 +846,7 @@ static void tg_update_slice(struct throtl_grp *tg, bool rw)
 	    sq_queued(&tg->service_queue, rw) == 0)
 		throtl_start_new_slice(tg, rw, true);
 	else
-		throtl_extend_slice(tg, rw, jiffies + tg->td->throtl_slice);
+		throtl_extend_slice(tg, rw, jiffies + DFL_THROTL_SLICE);
 }
 
 static unsigned long tg_dispatch_bps_time(struct throtl_grp *tg, struct bio *bio)
@@ -1337,12 +1335,8 @@ static int blk_throtl_init(struct gendisk *disk)
 	if (ret) {
 		q->td = NULL;
 		kfree(td);
-		goto out;
 	}
 
-	td->throtl_slice = DFL_THROTL_SLICE;
-
-out:
 	blk_mq_unquiesce_queue(disk->queue);
 	blk_mq_unfreeze_queue(disk->queue, memflags);
 
-- 
2.45.2
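Several hunks in patch 2/2 pass the slice length to roundup() and rounddown(). As a rough userspace sketch of what that arithmetic does once the slice is a compile-time constant, the helpers below reimplement the rounding for positive integers. HZ is assumed to be 1000, and all names (EXAMPLE_HZ, EXAMPLE_SLICE, round_up_to, round_down_to, slice_end_for, may_trim) are hypothetical, chosen for this illustration only.

```c
/* Hypothetical tick rate and slice length; the kernel uses HZ and DFL_THROTL_SLICE. */
#define EXAMPLE_HZ    1000ul
#define EXAMPLE_SLICE (EXAMPLE_HZ / 10)	/* 100 jiffies */

/* Round x up to the next multiple of y (y > 0). */
static unsigned long round_up_to(unsigned long x, unsigned long y)
{
	return ((x + y - 1) / y) * y;
}

/* Round x down to the previous multiple of y (y > 0). */
static unsigned long round_down_to(unsigned long x, unsigned long y)
{
	return x - (x % y);
}

/* Slice end aligned to a slice boundary, as in the throtl_set_slice_end() hunk. */
static unsigned long slice_end_for(unsigned long jiffy_end)
{
	return round_up_to(jiffy_end, EXAMPLE_SLICE);
}

/*
 * Mirrors only the early-return check in the throtl_trim_slice() hunk:
 * trimming is considered once at least two whole slices have elapsed.
 */
static int may_trim(unsigned long now, unsigned long slice_start)
{
	unsigned long elapsed = round_down_to(now - slice_start, EXAMPLE_SLICE);

	return elapsed >= 2 * EXAMPLE_SLICE;
}
```

With these values, a requested end of 1234 jiffies is aligned to 1300, and a slice that started at jiffy 1000 becomes eligible for trimming at jiffy 1200 but not at 1199. Because the divisor is a constant, the compiler can fold the division, which is one plausible side benefit of the change, although the patch description does not claim it.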