linux-block.vger.kernel.org archive mirror
* [PATCH] blk-wbt: Fix io starvation in wbt_rqw_done()
@ 2025-07-31 12:33 Julian Sun
  2025-07-31 15:40 ` Yizhou Tang
  0 siblings, 1 reply; 4+ messages in thread
From: Julian Sun @ 2025-07-31 12:33 UTC (permalink / raw)
  To: linux-block; +Cc: axboe, stable, Julian Sun

Recently, we encountered the following hung task:

INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/11:2    D    0 2981147      2 0x80004000
Workqueue: cgroup_destroy css_free_rwork_fn
Call Trace:
 __schedule+0x934/0xe10
 schedule+0x40/0xb0
 wb_wait_for_completion+0x52/0x80
 ? finish_wait+0x80/0x80
 mem_cgroup_css_free+0x3a/0x1b0
 css_free_rwork_fn+0x42/0x380
 process_one_work+0x1a2/0x360
 worker_thread+0x30/0x390
 ? create_worker+0x1a0/0x1a0
 kthread+0x110/0x130
 ? __kthread_cancel_work+0x40/0x40
 ret_from_fork+0x1f/0x30

This happens because the writeback thread is continuously and repeatedly
throttled by wbt, while at the same time writes from another thread
proceed quite smoothly.
After debugging, I believe the cause is the following.

When thread A is blocked by wbt, the I/O issued by thread B is
allowed a deeper queue depth (rwb->rq_depth.max_depth) because it
meets the wb_recent_wait() condition in get_limit(). Thread B's
I/O is therefore issued smoothly, and the inflight I/O count of wbt
stays relatively high.

However, when an I/O completes, the inflight count of wbt is so high
that the condition "limit - inflight >= rwb->wb_background / 2"
in wbt_rqw_done() can never be satisfied, so thread A is never
woken up.

Some on-site information:

>>> rwb.rq_depth.max_depth
(unsigned int)48
>>> rqw.inflight.counter.value_()
44
>>> rqw.inflight.counter.value_()
35
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)3
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)2
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)20
>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)12

cat wb_normal
24
cat wb_background
12
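
To make the failure mode concrete, here is a small stand-alone sketch
(my own illustration; the wake-up check is paraphrased from
wbt_rqw_done(), not quoted verbatim) with the values above plugged in:

#include <stdbool.h>
#include <stdio.h>

/* On-site values from the debug session above. */
static const int wb_normal     = 24;   /* limit picked by wbt_rqw_done() here */
static const int wb_background = 12;

/* Paraphrase (not verbatim) of the wake-up decision in wbt_rqw_done(). */
static bool would_wake(int inflight, int limit)
{
	if (inflight && inflight >= limit)
		return false;   /* above the limit: bail out, wake nobody */
	return !inflight || limit - inflight >= wb_background / 2;
}

int main(void)
{
	/*
	 * Observed inflight values. get_limit() keeps handing out
	 * max_depth (48) to issuers because wb_recent_wait() is true,
	 * so inflight stays far above wb_normal (24).
	 */
	int samples[] = { 44, 35 };

	for (int i = 0; i < 2; i++)
		printf("inflight=%d -> wake sleepers? %s\n", samples[i],
		       would_wake(samples[i], wb_normal) ? "yes" : "no");
	return 0;
}

With these numbers the waiter is never woken, while issuers keep
running against the deeper max_depth limit.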

To fix this issue, use max_depth as the limit in wbt_rqw_done() when
wb_recent_wait() is true, so that wbt_rqw_done() and get_limit() handle
wb_recent_wait() consistently, which is more reasonable.

Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
---
 block/blk-wbt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a50d4cd55f41..d6a2782d442f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
 	else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
 		 !wb_recent_wait(rwb))
 		limit = 0;
+	else if (wb_recent_wait(rwb))
+		limit = rwb->rq_depth.max_depth;
 	else
 		limit = rwb->wb_normal;
 
-- 
2.20.1



* Re: [PATCH] blk-wbt: Fix io starvation in wbt_rqw_done()
  2025-07-31 12:33 [PATCH] blk-wbt: Fix io starvation in wbt_rqw_done() Julian Sun
@ 2025-07-31 15:40 ` Yizhou Tang
  2025-07-31 17:12   ` Yu Kuai
  0 siblings, 1 reply; 4+ messages in thread
From: Yizhou Tang @ 2025-07-31 15:40 UTC (permalink / raw)
  To: Julian Sun; +Cc: linux-block, axboe, stable, Julian Sun

Hi Julian,

On Thu, Jul 31, 2025 at 8:33 PM Julian Sun <sunjunchao2870@gmail.com> wrote:
>
> Recently, we encountered the following hungtask:
>
> INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/11:2    D    0 2981147      2 0x80004000
> Workqueue: cgroup_destroy css_free_rwork_fn
> Call Trace:
>  __schedule+0x934/0xe10
>  schedule+0x40/0xb0
>  wb_wait_for_completion+0x52/0x80

I don’t see __wbt_wait() or rq_qos_wait() here, so I suspect this call
stack is not directly related to wbt.


>  ? finish_wait+0x80/0x80
>  mem_cgroup_css_free+0x3a/0x1b0
>  css_free_rwork_fn+0x42/0x380
>  process_one_work+0x1a2/0x360
>  worker_thread+0x30/0x390
>  ? create_worker+0x1a0/0x1a0
>  kthread+0x110/0x130
>  ? __kthread_cancel_work+0x40/0x40
>  ret_from_fork+0x1f/0x30
>
> This is because the writeback thread has been continuously and repeatedly
> throttled by wbt, but at the same time, the writes of another thread
> proceed quite smoothly.
> After debugging, I believe it is caused by the following reasons.
>
> When thread A is blocked by wbt, the I/O issued by thread B will
> use a deeper queue depth(rwb->rq_depth.max_depth) because it
> meets the conditions of wb_recent_wait(), thus allowing thread B's
> I/O to be issued smoothly and resulting in the inflight I/O of wbt
> remaining relatively high.
>
> However, when I/O completes, due to the high inflight I/O of wbt,
> the condition "limit - inflight >= rwb->wb_background / 2"
> in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
> to remain unable to be woken up.

From your description above, it seems you're suggesting that if A is
throttled by wbt, then a writer B on the same device could
continuously starve A.
This situation is not possible — please refer to rq_qos_wait(): if A
is already sleeping, then when B calls wq_has_sleeper(), it will
detect A’s presence, meaning B will also be throttled.
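
Roughly, the entry check looks like this (a simplified sketch from
memory, not the exact upstream code; the exclusive-wait loop and token
handling are omitted):

	if (!wq_has_sleeper(&rqw->wait) &&
	    acquire_inflight_cb(rqw, private_data))
		return;		/* fast path: nobody is sleeping, take a slot */

	/* Otherwise the caller queues up behind the existing sleepers on
	 * rqw->wait and blocks until wbt_rqw_done() wakes the queue, so B
	 * cannot keep overtaking A on the same rq_wait. */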

Thanks,
Yi

>
> Some on-site information:
>
> >>> rwb.rq_depth.max_depth
> (unsigned int)48
> >>> rqw.inflight.counter.value_()
> 44
> >>> rqw.inflight.counter.value_()
> 35
> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> (unsigned long)3
> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> (unsigned long)2
> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> (unsigned long)20
> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> (unsigned long)12
>
> cat wb_normal
> 24
> cat wb_background
> 12
>
> To fix this issue, we can use max_depth in wbt_rqw_done(), so that
> the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
> will also be consistent, which is more reasonable.
>
> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> ---
>  block/blk-wbt.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index a50d4cd55f41..d6a2782d442f 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
>         else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
>                  !wb_recent_wait(rwb))
>                 limit = 0;
> +       else if (wb_recent_wait(rwb))
> +               limit = rwb->rq_depth.max_depth;
>         else
>                 limit = rwb->wb_normal;
>
> --
> 2.20.1
>
>


* Re: [PATCH] blk-wbt: Fix io starvation in wbt_rqw_done()
  2025-07-31 15:40 ` Yizhou Tang
@ 2025-07-31 17:12   ` Yu Kuai
  2025-08-06  7:52     ` Julian Sun
  0 siblings, 1 reply; 4+ messages in thread
From: Yu Kuai @ 2025-07-31 17:12 UTC (permalink / raw)
  To: Yizhou Tang, Julian Sun; +Cc: linux-block, axboe, stable, Julian Sun

Hi,

On 2025/7/31 23:40, Yizhou Tang wrote:
> Hi Julian,
>
> On Thu, Jul 31, 2025 at 8:33 PM Julian Sun <sunjunchao2870@gmail.com> wrote:
>> Recently, we encountered the following hungtask:
>>
>> INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> kworker/11:2    D    0 2981147      2 0x80004000
>> Workqueue: cgroup_destroy css_free_rwork_fn
>> Call Trace:
>>   __schedule+0x934/0xe10
>>   schedule+0x40/0xb0
>>   wb_wait_for_completion+0x52/0x80
> I don’t see __wbt_wait() or rq_qos_wait() here, so I suspect this call
> stack is not directly related to wbt.
>
>
>>   ? finish_wait+0x80/0x80
>>   mem_cgroup_css_free+0x3a/0x1b0
>>   css_free_rwork_fn+0x42/0x380
>>   process_one_work+0x1a2/0x360
>>   worker_thread+0x30/0x390
>>   ? create_worker+0x1a0/0x1a0
>>   kthread+0x110/0x130
>>   ? __kthread_cancel_work+0x40/0x40
>>   ret_from_fork+0x1f/0x30
This is the writeback cgroup waiting for writeback to be done. If you
figured out that it is being throttled by wbt, you need to explain that
clearly, and it is very important to provide evidence to support your
analysis. However, the following analysis is a mess :(
>>
>> This is because the writeback thread has been continuously and repeatedly
>> throttled by wbt, but at the same time, the writes of another thread
>> proceed quite smoothly.
>> After debugging, I believe it is caused by the following reasons.
>>
>> When thread A is blocked by wbt, the I/O issued by thread B will
>> use a deeper queue depth(rwb->rq_depth.max_depth) because it
>> meets the conditions of wb_recent_wait(), thus allowing thread B's
>> I/O to be issued smoothly and resulting in the inflight I/O of wbt
>> remaining relatively high.
>>
>> However, when I/O completes, due to the high inflight I/O of wbt,
>> the condition "limit - inflight >= rwb->wb_background / 2"
>> in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
>> to remain unable to be woken up.
>  From your description above, it seems you're suggesting that if A is
> throttled by wbt, then a writer B on the same device could
> continuously starve A.
> This situation is not possible — please refer to rq_qos_wait(): if A
> is already sleeping, then when B calls wq_has_sleeper(), it will
> detect A’s presence, meaning B will also be throttled.
Yes, there are three rq_wait queues in wbt, and each one is FIFO. It is
possible if A is background writeback and B is swap.
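
For reference, the per-class queue selection is roughly the following
(paraphrased from get_rq_wait() in blk-wbt.c; the exact enum names may
differ between kernel versions):

	static struct rq_wait *get_rq_wait(struct rq_wb *rwb, enum wbt_flags wb_acct)
	{
		if (wb_acct & WBT_KSWAPD)
			return &rwb->rq_wait[WBT_RWQ_KSWAPD];	/* swap/reclaim writes */
		if (wb_acct & WBT_DISCARD)
			return &rwb->rq_wait[WBT_RWQ_DISCARD];	/* discards */
		return &rwb->rq_wait[WBT_RWQ_BG];		/* background writeback */
	}

A (background) and B (swap) therefore sleep on different rq_waits, and
the FIFO ordering inside one queue does not order A against B.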
>
> Thanks,
> Yi
>
>> Some on-site information:
>>
>>>>> rwb.rq_depth.max_depth
>> (unsigned int)48
>>>>> rqw.inflight.counter.value_()
>> 44
>>>>> rqw.inflight.counter.value_()
>> 35
>>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
>> (unsigned long)3
>>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
>> (unsigned long)2
>>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
>> (unsigned long)20
>>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
>> (unsigned long)12
>>
>> cat wb_normal
>> 24
>> cat wb_background
>> 12
>>
>> To fix this issue, we can use max_depth in wbt_rqw_done(), so that
>> the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
>> will also be consistent, which is more reasonable.
Are you able to reproduce this problem, and did you test this patch
before sending it?

Thanks,
Kuai
>>
>> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
>> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
>> ---
>>   block/blk-wbt.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>> index a50d4cd55f41..d6a2782d442f 100644
>> --- a/block/blk-wbt.c
>> +++ b/block/blk-wbt.c
>> @@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
>>          else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
>>                   !wb_recent_wait(rwb))
>>                  limit = 0;
>> +       else if (wb_recent_wait(rwb))
>> +               limit = rwb->rq_depth.max_depth;
>>          else
>>                  limit = rwb->wb_normal;
>>
>> --
>> 2.20.1
>>
>>



* Re: [PATCH] blk-wbt: Fix io starvation in wbt_rqw_done()
  2025-07-31 17:12   ` Yu Kuai
@ 2025-08-06  7:52     ` Julian Sun
  0 siblings, 0 replies; 4+ messages in thread
From: Julian Sun @ 2025-08-06  7:52 UTC (permalink / raw)
  To: yukuai; +Cc: Yizhou Tang, linux-block, axboe, stable, Julian Sun

Hi,

On Fri, Aug 1, 2025 at 1:13 AM Yu Kuai <yukuai@kernel.org> wrote:
>
> Hi,
>
> On 2025/7/31 23:40, Yizhou Tang wrote:
> > Hi Julian,
> >
> > On Thu, Jul 31, 2025 at 8:33 PM Julian Sun <sunjunchao2870@gmail.com> wrote:
> >> Recently, we encountered the following hungtask:
> >>
> >> INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> kworker/11:2    D    0 2981147      2 0x80004000
> >> Workqueue: cgroup_destroy css_free_rwork_fn
> >> Call Trace:
> >>   __schedule+0x934/0xe10
> >>   schedule+0x40/0xb0
> >>   wb_wait_for_completion+0x52/0x80
> > I don’t see __wbt_wait() or rq_qos_wait() here, so I suspect this call
> > stack is not directly related to wbt.
> >
> >
> >>   ? finish_wait+0x80/0x80
> >>   mem_cgroup_css_free+0x3a/0x1b0
> >>   css_free_rwork_fn+0x42/0x380
> >>   process_one_work+0x1a2/0x360
> >>   worker_thread+0x30/0x390
> >>   ? create_worker+0x1a0/0x1a0
> >>   kthread+0x110/0x130
> >>   ? __kthread_cancel_work+0x40/0x40
> >>   ret_from_fork+0x1f/0x30
> This is writeback cgroup is waiting for writeback to be done, if you
> figured out
> they are throttled by wbt, you need to explain clearly, and it's very
> important to
> provide evidence to support your analysis. However, the following
> analysis is
> a mess :(
Thanks for the detailed review.
Yes, the description is a bit confusing. I will take a more detailed
look at the on-site information.
> >>
> >> This is because the writeback thread has been continuously and repeatedly
> >> throttled by wbt, but at the same time, the writes of another thread
> >> proceed quite smoothly.
> >> After debugging, I believe it is caused by the following reasons.
> >>
> >> When thread A is blocked by wbt, the I/O issued by thread B will
> >> use a deeper queue depth(rwb->rq_depth.max_depth) because it
> >> meets the conditions of wb_recent_wait(), thus allowing thread B's
> >> I/O to be issued smoothly and resulting in the inflight I/O of wbt
> >> remaining relatively high.
> >>
> >> However, when I/O completes, due to the high inflight I/O of wbt,
> >> the condition "limit - inflight >= rwb->wb_background / 2"
> >> in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
> >> to remain unable to be woken up.
> >  From your description above, it seems you're suggesting that if A is
> > throttled by wbt, then a writer B on the same device could
> > continuously starve A.
> > This situation is not possible — please refer to rq_qos_wait(): if A
> > is already sleeping, then when B calls wq_has_sleeper(), it will
> > detect A’s presence, meaning B will also be throttled.
> Yes, there are three rq_wait in wbt, and each one is FIFO. It will be
> possible
> if  A is backgroup, and B is swap.
> >
> > Thanks,
> > Yi
> >
> >> Some on-site information:
> >>
> >>>>> rwb.rq_depth.max_depth
> >> (unsigned int)48
> >>>>> rqw.inflight.counter.value_()
> >> 44
> >>>>> rqw.inflight.counter.value_()
> >> 35
> >>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)3
> >>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)2
> >>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)20
> >>>>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)12
> >>
> >> cat wb_normal
> >> 24
> >> cat wb_background
> >> 12
> >>
> >> To fix this issue, we can use max_depth in wbt_rqw_done(), so that
> >> the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
> >> will also be consistent, which is more reasonable.
> Are you able to reproduce this problem, and give this patch a test before
> you send it?
>
> Thanks,
> Kuai
> >>
> >> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> >> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> >> ---
> >>   block/blk-wbt.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> >> index a50d4cd55f41..d6a2782d442f 100644
> >> --- a/block/blk-wbt.c
> >> +++ b/block/blk-wbt.c
> >> @@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
> >>          else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
> >>                   !wb_recent_wait(rwb))
> >>                  limit = 0;
> >> +       else if (wb_recent_wait(rwb))
> >> +               limit = rwb->rq_depth.max_depth;
> >>          else
> >>                  limit = rwb->wb_normal;
> >>
> >> --
> >> 2.20.1
> >>
> >>
>

Thanks,
-- 
Julian Sun <sunjunchao2870@gmail.com>


