[PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
@ 2025-09-25 13:22 Julian Sun
  2025-09-25 16:31 ` Jan Kara
  2025-09-25 17:25 ` Mateusz Guzik
  0 siblings, 2 replies; 11+ messages in thread
From: Julian Sun @ 2025-09-25 13:22 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: jack, brauner, viro, peterz

Writing back a large number of pages can take a lots of time.
This issue is exacerbated when the underlying device is slow or
subject to block layer rate limiting, which in turn triggers
unexpected hung task warnings.

We can trigger a wake-up once a chunk has been written back and the
waiting time for writeback exceeds half of
sysctl_hung_task_timeout_secs.
This action allows the hung task detector to be aware of the writeback
progress, thereby eliminating these unexpected hung task warnings.

This patch has passed the xfstests 'check -g quick' test based on ext4,
with no additional failures introduced.

Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
---
 fs/fs-writeback.c                | 13 +++++++++++--
 include/linux/backing-dev-defs.h |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a07b8cf73ae2..475d52abfb3e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -14,6 +14,7 @@
  *		Additions for address_space-based writeback
  */
 
+#include <linux/sched/sysctl.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/spinlock.h>
@@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
 		kfree(work);
 	if (done) {
 		wait_queue_head_t *waitq = done->waitq;
+		/* Report progress to inform the hung task detector of the progress. */
+		bool force_wake = (jiffies - done->stamp) >
+				   sysctl_hung_task_timeout_secs * HZ / 2;
 
 		/* @done can't be accessed after the following dec */
-		if (atomic_dec_and_test(&done->cnt))
+		if (atomic_dec_and_test(&done->cnt) || force_wake)
 			wake_up_all(waitq);
 	}
 }
@@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
 void wb_wait_for_completion(struct wb_completion *done)
 {
 	atomic_dec(&done->cnt);		/* put down the initial count */
-	wait_event(*done->waitq, !atomic_read(&done->cnt));
+	wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
 }
 
 #ifdef CONFIG_CGROUP_WRITEBACK
@@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
 		 */
 		__writeback_single_inode(inode, &wbc);
 
+		/* Report progress to inform the hung task detector of the progress. */
+		if (work->done && (jiffies - work->done->stamp) >
+		    HZ * sysctl_hung_task_timeout_secs / 2)
+			wake_up_all(work->done->waitq);
+
 		wbc_detach_inode(&wbc);
 		work->nr_pages -= write_chunk - wbc.nr_to_write;
 		wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 2ad261082bba..c37c6bd5ef5c 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -63,6 +63,7 @@ enum wb_reason {
 struct wb_completion {
 	atomic_t		cnt;
 	wait_queue_head_t	*waitq;
+	unsigned long stamp;
 };
 
 #define __WB_COMPLETION_INIT(_waitq)	\
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-25 13:22 [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk Julian Sun
@ 2025-09-25 16:31 ` Jan Kara
  2025-09-25 17:25 ` Mateusz Guzik
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Kara @ 2025-09-25 16:31 UTC (permalink / raw)
  To: Julian Sun; +Cc: linux-fsdevel, jack, brauner, viro, peterz

On Thu 25-09-25 21:22:39, Julian Sun wrote:
> Writing back a large number of pages can take a lots of time.
> This issue is exacerbated when the underlying device is slow or
> subject to block layer rate limiting, which in turn triggers
> unexpected hung task warnings.
> 
> We can trigger a wake-up once a chunk has been written back and the
> waiting time for writeback exceeds half of
> sysctl_hung_task_timeout_secs.
> This action allows the hung task detector to be aware of the writeback
> progress, thereby eliminating these unexpected hung task warnings.
> 
> This patch has passed the xfstests 'check -g quick' test based on ext4,
> with no additional failures introduced.
> 
> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/fs-writeback.c                | 13 +++++++++++--
>  include/linux/backing-dev-defs.h |  1 +
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a07b8cf73ae2..475d52abfb3e 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -14,6 +14,7 @@
>   *		Additions for address_space-based writeback
>   */
>  
> +#include <linux/sched/sysctl.h>
>  #include <linux/kernel.h>
>  #include <linux/export.h>
>  #include <linux/spinlock.h>
> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
>  		kfree(work);
>  	if (done) {
>  		wait_queue_head_t *waitq = done->waitq;
> +		/* Report progress to inform the hung task detector of the progress. */
> +		bool force_wake = (jiffies - done->stamp) >
> +				   sysctl_hung_task_timeout_secs * HZ / 2;
>  
>  		/* @done can't be accessed after the following dec */
> -		if (atomic_dec_and_test(&done->cnt))
> +		if (atomic_dec_and_test(&done->cnt) || force_wake)
>  			wake_up_all(waitq);
>  	}
>  }
> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>  void wb_wait_for_completion(struct wb_completion *done)
>  {
>  	atomic_dec(&done->cnt);		/* put down the initial count */
> -	wait_event(*done->waitq, !atomic_read(&done->cnt));
> +	wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
>  }
>  
>  #ifdef CONFIG_CGROUP_WRITEBACK
> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		 */
>  		__writeback_single_inode(inode, &wbc);
>  
> +		/* Report progress to inform the hung task detector of the progress. */
> +		if (work->done && (jiffies - work->done->stamp) >
> +		    HZ * sysctl_hung_task_timeout_secs / 2)
> +			wake_up_all(work->done->waitq);
> +
>  		wbc_detach_inode(&wbc);
>  		work->nr_pages -= write_chunk - wbc.nr_to_write;
>  		wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 2ad261082bba..c37c6bd5ef5c 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -63,6 +63,7 @@ enum wb_reason {
>  struct wb_completion {
>  	atomic_t		cnt;
>  	wait_queue_head_t	*waitq;
> +	unsigned long stamp;
>  };
>  
>  #define __WB_COMPLETION_INIT(_waitq)	\
> -- 
> 2.39.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-25 13:22 [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk Julian Sun
  2025-09-25 16:31 ` Jan Kara
@ 2025-09-25 17:25 ` Mateusz Guzik
  2025-09-26  2:26   ` Julian Sun
  1 sibling, 1 reply; 11+ messages in thread
From: Mateusz Guzik @ 2025-09-25 17:25 UTC (permalink / raw)
  To: Julian Sun; +Cc: linux-fsdevel, jack, brauner, viro, peterz

On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> Writing back a large number of pages can take a lots of time.
> This issue is exacerbated when the underlying device is slow or
> subject to block layer rate limiting, which in turn triggers
> unexpected hung task warnings.
> 
> We can trigger a wake-up once a chunk has been written back and the
> waiting time for writeback exceeds half of
> sysctl_hung_task_timeout_secs.
> This action allows the hung task detector to be aware of the writeback
> progress, thereby eliminating these unexpected hung task warnings.
> 

If I'm reading correctly this is also messing with stats how long the
thread was stuck to begin with.

Perhaps it would be better to have a var in task_struct which would
serve as an indicator of progress being made (e.g., last time stamp of
said progress).

task_struct already has numerous holes so this would not have to grow it
above what it is now.


> This patch has passed the xfstests 'check -g quick' test based on ext4,
> with no additional failures introduced.
> 
> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> ---
>  fs/fs-writeback.c                | 13 +++++++++++--
>  include/linux/backing-dev-defs.h |  1 +
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a07b8cf73ae2..475d52abfb3e 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -14,6 +14,7 @@
>   *		Additions for address_space-based writeback
>   */
>  
> +#include <linux/sched/sysctl.h>
>  #include <linux/kernel.h>
>  #include <linux/export.h>
>  #include <linux/spinlock.h>
> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
>  		kfree(work);
>  	if (done) {
>  		wait_queue_head_t *waitq = done->waitq;
> +		/* Report progress to inform the hung task detector of the progress. */
> +		bool force_wake = (jiffies - done->stamp) >
> +				   sysctl_hung_task_timeout_secs * HZ / 2;
>  
>  		/* @done can't be accessed after the following dec */
> -		if (atomic_dec_and_test(&done->cnt))
> +		if (atomic_dec_and_test(&done->cnt) || force_wake)
>  			wake_up_all(waitq);
>  	}
>  }
> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>  void wb_wait_for_completion(struct wb_completion *done)
>  {
>  	atomic_dec(&done->cnt);		/* put down the initial count */
> -	wait_event(*done->waitq, !atomic_read(&done->cnt));
> +	wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
>  }
>  
>  #ifdef CONFIG_CGROUP_WRITEBACK
> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
>  		 */
>  		__writeback_single_inode(inode, &wbc);
>  
> +		/* Report progress to inform the hung task detector of the progress. */
> +		if (work->done && (jiffies - work->done->stamp) >
> +		    HZ * sysctl_hung_task_timeout_secs / 2)
> +			wake_up_all(work->done->waitq);
> +
>  		wbc_detach_inode(&wbc);
>  		work->nr_pages -= write_chunk - wbc.nr_to_write;
>  		wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 2ad261082bba..c37c6bd5ef5c 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -63,6 +63,7 @@ enum wb_reason {
>  struct wb_completion {
>  	atomic_t		cnt;
>  	wait_queue_head_t	*waitq;
> +	unsigned long stamp;
>  };
>  
>  #define __WB_COMPLETION_INIT(_waitq)	\
> -- 
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-25 17:25 ` Mateusz Guzik
@ 2025-09-26  2:26   ` Julian Sun
  2025-09-26 11:17     ` Mateusz Guzik
  0 siblings, 1 reply; 11+ messages in thread
From: Julian Sun @ 2025-09-26  2:26 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: linux-fsdevel, jack, brauner, viro, peterz, akpm, Lance Yang

On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
>> Writing back a large number of pages can take a lots of time.
>> This issue is exacerbated when the underlying device is slow or
>> subject to block layer rate limiting, which in turn triggers
>> unexpected hung task warnings.
>>
>> We can trigger a wake-up once a chunk has been written back and the
>> waiting time for writeback exceeds half of
>> sysctl_hung_task_timeout_secs.
>> This action allows the hung task detector to be aware of the writeback
>> progress, thereby eliminating these unexpected hung task warnings.
>>
> 
> If I'm reading correctly this is also messing with stats how long the
> thread was stuck to begin with.

IMO, it will not mess up the time. Since it only updates the time when 
we can see progress (which is not a hang). If the task really hangs for 
a long time, then we can't perform the time update—so it will not mess 
up the time.

cc Lance and Andrew.
> 
> Perhaps it would be better to have a var in task_struct which would
> serve as an indicator of progress being made (e.g., last time stamp of
> said progress).
> 
> task_struct already has numerous holes so this would not have to grow it
> above what it is now.
> 
> 
>> This patch has passed the xfstests 'check -g quick' test based on ext4,
>> with no additional failures introduced.
>>
>> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> ---
>>   fs/fs-writeback.c                | 13 +++++++++++--
>>   include/linux/backing-dev-defs.h |  1 +
>>   2 files changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index a07b8cf73ae2..475d52abfb3e 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -14,6 +14,7 @@
>>    *		Additions for address_space-based writeback
>>    */
>>   
>> +#include <linux/sched/sysctl.h>
>>   #include <linux/kernel.h>
>>   #include <linux/export.h>
>>   #include <linux/spinlock.h>
>> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
>>   		kfree(work);
>>   	if (done) {
>>   		wait_queue_head_t *waitq = done->waitq;
>> +		/* Report progress to inform the hung task detector of the progress. */
>> +		bool force_wake = (jiffies - done->stamp) >
>> +				   sysctl_hung_task_timeout_secs * HZ / 2;
>>   
>>   		/* @done can't be accessed after the following dec */
>> -		if (atomic_dec_and_test(&done->cnt))
>> +		if (atomic_dec_and_test(&done->cnt) || force_wake)
>>   			wake_up_all(waitq);
>>   	}
>>   }
>> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>>   void wb_wait_for_completion(struct wb_completion *done)
>>   {
>>   	atomic_dec(&done->cnt);		/* put down the initial count */
>> -	wait_event(*done->waitq, !atomic_read(&done->cnt));
>> +	wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
>>   }
>>   
>>   #ifdef CONFIG_CGROUP_WRITEBACK
>> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
>>   		 */
>>   		__writeback_single_inode(inode, &wbc);
>>   
>> +		/* Report progress to inform the hung task detector of the progress. */
>> +		if (work->done && (jiffies - work->done->stamp) >
>> +		    HZ * sysctl_hung_task_timeout_secs / 2)
>> +			wake_up_all(work->done->waitq);
>> +
>>   		wbc_detach_inode(&wbc);
>>   		work->nr_pages -= write_chunk - wbc.nr_to_write;
>>   		wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
>> index 2ad261082bba..c37c6bd5ef5c 100644
>> --- a/include/linux/backing-dev-defs.h
>> +++ b/include/linux/backing-dev-defs.h
>> @@ -63,6 +63,7 @@ enum wb_reason {
>>   struct wb_completion {
>>   	atomic_t		cnt;
>>   	wait_queue_head_t	*waitq;
>> +	unsigned long stamp;
>>   };
>>   
>>   #define __WB_COMPLETION_INIT(_waitq)	\
>> -- 
>> 2.39.5
>>

Thanks,
-- 
Julian Sun <sunjunchao@bytedance.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26  2:26   ` Julian Sun
@ 2025-09-26 11:17     ` Mateusz Guzik
  2025-09-26 11:43       ` Julian Sun
  0 siblings, 1 reply; 11+ messages in thread
From: Mateusz Guzik @ 2025-09-26 11:17 UTC (permalink / raw)
  To: Julian Sun; +Cc: linux-fsdevel, jack, brauner, viro, peterz, akpm, Lance Yang

On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
>
> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> > On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> >> Writing back a large number of pages can take a lots of time.
> >> This issue is exacerbated when the underlying device is slow or
> >> subject to block layer rate limiting, which in turn triggers
> >> unexpected hung task warnings.
> >>
> >> We can trigger a wake-up once a chunk has been written back and the
> >> waiting time for writeback exceeds half of
> >> sysctl_hung_task_timeout_secs.
> >> This action allows the hung task detector to be aware of the writeback
> >> progress, thereby eliminating these unexpected hung task warnings.
> >>
> >
> > If I'm reading correctly this is also messing with stats how long the
> > thread was stuck to begin with.
>
> IMO, it will not mess up the time. Since it only updates the time when
> we can see progress (which is not a hang). If the task really hangs for
> a long time, then we can't perform the time update—so it will not mess
> up the time.
>

My point is that if you are stuck in the kernel for so long for the
hung task detector to take notice, that's still something worth
reporting in some way, even if you are making progress. I presume with
the patch at hand this information is lost.

For example the detector could be extended to drop a one-liner about
encountering a thread which was unable to leave the kernel for a long
time, even though it is making progress. Bonus points if the message
contained info this is i/o and for which device.

> cc Lance and Andrew.
> >
> > Perhaps it would be better to have a var in task_struct which would
> > serve as an indicator of progress being made (e.g., last time stamp of
> > said progress).
> >
> > task_struct already has numerous holes so this would not have to grow it
> > above what it is now.
> >
> >
> >> This patch has passed the xfstests 'check -g quick' test based on ext4,
> >> with no additional failures introduced.
> >>
> >> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> >> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> >> ---
> >>   fs/fs-writeback.c                | 13 +++++++++++--
> >>   include/linux/backing-dev-defs.h |  1 +
> >>   2 files changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index a07b8cf73ae2..475d52abfb3e 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -14,6 +14,7 @@
> >>    *         Additions for address_space-based writeback
> >>    */
> >>
> >> +#include <linux/sched/sysctl.h>
> >>   #include <linux/kernel.h>
> >>   #include <linux/export.h>
> >>   #include <linux/spinlock.h>
> >> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
> >>              kfree(work);
> >>      if (done) {
> >>              wait_queue_head_t *waitq = done->waitq;
> >> +            /* Report progress to inform the hung task detector of the progress. */
> >> +            bool force_wake = (jiffies - done->stamp) >
> >> +                               sysctl_hung_task_timeout_secs * HZ / 2;
> >>
> >>              /* @done can't be accessed after the following dec */
> >> -            if (atomic_dec_and_test(&done->cnt))
> >> +            if (atomic_dec_and_test(&done->cnt) || force_wake)
> >>                      wake_up_all(waitq);
> >>      }
> >>   }
> >> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
> >>   void wb_wait_for_completion(struct wb_completion *done)
> >>   {
> >>      atomic_dec(&done->cnt);         /* put down the initial count */
> >> -    wait_event(*done->waitq, !atomic_read(&done->cnt));
> >> +    wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
> >>   }
> >>
> >>   #ifdef CONFIG_CGROUP_WRITEBACK
> >> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
> >>               */
> >>              __writeback_single_inode(inode, &wbc);
> >>
> >> +            /* Report progress to inform the hung task detector of the progress. */
> >> +            if (work->done && (jiffies - work->done->stamp) >
> >> +                HZ * sysctl_hung_task_timeout_secs / 2)
> >> +                    wake_up_all(work->done->waitq);
> >> +
> >>              wbc_detach_inode(&wbc);
> >>              work->nr_pages -= write_chunk - wbc.nr_to_write;
> >>              wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
> >> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> >> index 2ad261082bba..c37c6bd5ef5c 100644
> >> --- a/include/linux/backing-dev-defs.h
> >> +++ b/include/linux/backing-dev-defs.h
> >> @@ -63,6 +63,7 @@ enum wb_reason {
> >>   struct wb_completion {
> >>      atomic_t                cnt;
> >>      wait_queue_head_t       *waitq;
> >> +    unsigned long stamp;
> >>   };
> >>
> >>   #define __WB_COMPLETION_INIT(_waitq)       \
> >> --
> >> 2.39.5
> >>
>
> Thanks,
> --
> Julian Sun <sunjunchao@bytedance.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26 11:17     ` Mateusz Guzik
@ 2025-09-26 11:43       ` Julian Sun
  2025-09-26 12:05         ` Mateusz Guzik
  0 siblings, 1 reply; 11+ messages in thread
From: Julian Sun @ 2025-09-26 11:43 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: linux-fsdevel, jack, brauner, viro, peterz, akpm, Lance Yang

On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
>>
>> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
>>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
>>>> Writing back a large number of pages can take a lots of time.
>>>> This issue is exacerbated when the underlying device is slow or
>>>> subject to block layer rate limiting, which in turn triggers
>>>> unexpected hung task warnings.
>>>>
>>>> We can trigger a wake-up once a chunk has been written back and the
>>>> waiting time for writeback exceeds half of
>>>> sysctl_hung_task_timeout_secs.
>>>> This action allows the hung task detector to be aware of the writeback
>>>> progress, thereby eliminating these unexpected hung task warnings.
>>>>
>>>
>>> If I'm reading correctly this is also messing with stats how long the
>>> thread was stuck to begin with.
>>
>> IMO, it will not mess up the time. Since it only updates the time when
>> we can see progress (which is not a hang). If the task really hangs for
>> a long time, then we can't perform the time update—so it will not mess
>> up the time.
>>
> 
> My point is that if you are stuck in the kernel for so long for the
> hung task detector to take notice, that's still something worth
> reporting in some way, even if you are making progress. I presume with
> the patch at hand this information is lost.
> 
> For example the detector could be extended to drop a one-liner about
> encountering a thread which was unable to leave the kernel for a long
> time, even though it is making progress. Bonus points if the message
> contained info this is i/o and for which device.

Let me understand: you want to print logs when writeback is making 
progress but is so slow that the task can't exit, correct?
I see this as a new requirement different from the existing hung task 
detector: needing to print info when writeback is slow.
Indeed, the existing detector prints warnings in two cases: 1) no 
writeback progress; 2) progress is made but writeback is so slow it will 
take too long.
This patch eliminates warnings for case 2, but only to make hung task 
warnings more accurate.
Case 2 is essentially a performance issue. Increasing 
sysctl_hung_task_timeout_secs may make case 2 warnings disappear, but 
not case 1.
So we should consider: do we need to print info when slow writeback 
keeps a task from exiting beyond a certain time?
This deserves a new patch: record a timestamp at writeback start, 
compare it with the initial timestamp after each chunk is written back, 
and decide whether to print. This should meet your needs.
If fs maintainers find this reasonable, I'm willing to contribute the 
code.>
>> cc Lance and Andrew.
>>>
>>> Perhaps it would be better to have a var in task_struct which would
>>> serve as an indicator of progress being made (e.g., last time stamp of
>>> said progress).
>>>
>>> task_struct already has numerous holes so this would not have to grow it
>>> above what it is now.
>>>
>>>
>>>> This patch has passed the xfstests 'check -g quick' test based on ext4,
>>>> with no additional failures introduced.
>>>>
>>>> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
>>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>>> ---
>>>>    fs/fs-writeback.c                | 13 +++++++++++--
>>>>    include/linux/backing-dev-defs.h |  1 +
>>>>    2 files changed, 12 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>>>> index a07b8cf73ae2..475d52abfb3e 100644
>>>> --- a/fs/fs-writeback.c
>>>> +++ b/fs/fs-writeback.c
>>>> @@ -14,6 +14,7 @@
>>>>     *         Additions for address_space-based writeback
>>>>     */
>>>>
>>>> +#include <linux/sched/sysctl.h>
>>>>    #include <linux/kernel.h>
>>>>    #include <linux/export.h>
>>>>    #include <linux/spinlock.h>
>>>> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
>>>>               kfree(work);
>>>>       if (done) {
>>>>               wait_queue_head_t *waitq = done->waitq;
>>>> +            /* Report progress to inform the hung task detector of the progress. */
>>>> +            bool force_wake = (jiffies - done->stamp) >
>>>> +                               sysctl_hung_task_timeout_secs * HZ / 2;
>>>>
>>>>               /* @done can't be accessed after the following dec */
>>>> -            if (atomic_dec_and_test(&done->cnt))
>>>> +            if (atomic_dec_and_test(&done->cnt) || force_wake)
>>>>                       wake_up_all(waitq);
>>>>       }
>>>>    }
>>>> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>>>>    void wb_wait_for_completion(struct wb_completion *done)
>>>>    {
>>>>       atomic_dec(&done->cnt);         /* put down the initial count */
>>>> -    wait_event(*done->waitq, !atomic_read(&done->cnt));
>>>> +    wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
>>>>    }
>>>>
>>>>    #ifdef CONFIG_CGROUP_WRITEBACK
>>>> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
>>>>                */
>>>>               __writeback_single_inode(inode, &wbc);
>>>>
>>>> +            /* Report progress to inform the hung task detector of the progress. */
>>>> +            if (work->done && (jiffies - work->done->stamp) >
>>>> +                HZ * sysctl_hung_task_timeout_secs / 2)
>>>> +                    wake_up_all(work->done->waitq);
>>>> +
>>>>               wbc_detach_inode(&wbc);
>>>>               work->nr_pages -= write_chunk - wbc.nr_to_write;
>>>>               wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
>>>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
>>>> index 2ad261082bba..c37c6bd5ef5c 100644
>>>> --- a/include/linux/backing-dev-defs.h
>>>> +++ b/include/linux/backing-dev-defs.h
>>>> @@ -63,6 +63,7 @@ enum wb_reason {
>>>>    struct wb_completion {
>>>>       atomic_t                cnt;
>>>>       wait_queue_head_t       *waitq;
>>>> +    unsigned long stamp;
>>>>    };
>>>>
>>>>    #define __WB_COMPLETION_INIT(_waitq)       \
>>>> --
>>>> 2.39.5
>>>>
>>
>> Thanks,
>> --
>> Julian Sun <sunjunchao@bytedance.com>

Thanks,
-- 
Julian Sun <sunjunchao@bytedance.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26 11:43       ` Julian Sun
@ 2025-09-26 12:05         ` Mateusz Guzik
  2025-09-26 15:48           ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Mateusz Guzik @ 2025-09-26 12:05 UTC (permalink / raw)
  To: Julian Sun; +Cc: linux-fsdevel, jack, brauner, viro, peterz, akpm, Lance Yang

On Fri, Sep 26, 2025 at 1:43 PM Julian Sun <sunjunchao@bytedance.com> wrote:
>
> On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> > On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
> >>
> >> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> >>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> >>>> Writing back a large number of pages can take a lots of time.
> >>>> This issue is exacerbated when the underlying device is slow or
> >>>> subject to block layer rate limiting, which in turn triggers
> >>>> unexpected hung task warnings.
> >>>>
> >>>> We can trigger a wake-up once a chunk has been written back and the
> >>>> waiting time for writeback exceeds half of
> >>>> sysctl_hung_task_timeout_secs.
> >>>> This action allows the hung task detector to be aware of the writeback
> >>>> progress, thereby eliminating these unexpected hung task warnings.
> >>>>
> >>>
> >>> If I'm reading correctly this is also messing with stats how long the
> >>> thread was stuck to begin with.
> >>
> >> IMO, it will not mess up the time. Since it only updates the time when
> >> we can see progress (which is not a hang). If the task really hangs for
> >> a long time, then we can't perform the time update—so it will not mess
> >> up the time.
> >>
> >
> > My point is that if you are stuck in the kernel for so long for the
> > hung task detector to take notice, that's still something worth
> > reporting in some way, even if you are making progress. I presume with
> > the patch at hand this information is lost.
> >
> > For example the detector could be extended to drop a one-liner about
> > encountering a thread which was unable to leave the kernel for a long
> > time, even though it is making progress. Bonus points if the message
> > contained info this is i/o and for which device.
>
> Let me understand: you want to print logs when writeback is making
> progress but is so slow that the task can't exit, correct?
> I see this as a new requirement different from the existing hung task
> detector: needing to print info when writeback is slow.
> Indeed, the existing detector prints warnings in two cases: 1) no
> writeback progress; 2) progress is made but writeback is so slow it will
> take too long.

I am saying it would be a nice improvement to extend the htd like that.

And that your patch as proposed would avoidably make it harder -- you
can still get what you are aiming for without the wakeups.

Also note that when looking at a kernel crashdump it may be beneficial
to know when a particular thread got first stuck in the kernel, which
is again gone with your patch.

> This patch eliminates warnings for case 2, but only to make hung task
> warnings more accurate.
> Case 2 is essentially a performance issue. Increasing
> sysctl_hung_task_timeout_secs may make case 2 warnings disappear, but
> not case 1.
> So we should consider: do we need to print info when slow writeback
> keeps a task from exiting beyond a certain time?

I would wager that helps figure out why there is no apparent progress.

> This deserves a new patch: record a timestamp at writeback start,
> compare it with the initial timestamp after each chunk is written back,
> and decide whether to print. This should meet your needs.
> If fs maintainers find this reasonable, I'm willing to contribute the
> code.

That sounds great, let's see what the fs folk think.

> >> cc Lance and Andrew.
> >>>
> >>> Perhaps it would be better to have a var in task_struct which would
> >>> serve as an indicator of progress being made (e.g., last time stamp of
> >>> said progress).
> >>>
> >>> task_struct already has numerous holes so this would not have to grow it
> >>> above what it is now.
> >>>
> >>>
> >>>> This patch has passed the xfstests 'check -g quick' test based on ext4,
> >>>> with no additional failures introduced.
> >>>>
> >>>> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> >>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> >>>> ---
> >>>>    fs/fs-writeback.c                | 13 +++++++++++--
> >>>>    include/linux/backing-dev-defs.h |  1 +
> >>>>    2 files changed, 12 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >>>> index a07b8cf73ae2..475d52abfb3e 100644
> >>>> --- a/fs/fs-writeback.c
> >>>> +++ b/fs/fs-writeback.c
> >>>> @@ -14,6 +14,7 @@
> >>>>     *         Additions for address_space-based writeback
> >>>>     */
> >>>>
> >>>> +#include <linux/sched/sysctl.h>
> >>>>    #include <linux/kernel.h>
> >>>>    #include <linux/export.h>
> >>>>    #include <linux/spinlock.h>
> >>>> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
> >>>>               kfree(work);
> >>>>       if (done) {
> >>>>               wait_queue_head_t *waitq = done->waitq;
> >>>> +            /* Report progress to inform the hung task detector of the progress. */
> >>>> +            bool force_wake = (jiffies - done->stamp) >
> >>>> +                               sysctl_hung_task_timeout_secs * HZ / 2;
> >>>>
> >>>>               /* @done can't be accessed after the following dec */
> >>>> -            if (atomic_dec_and_test(&done->cnt))
> >>>> +            if (atomic_dec_and_test(&done->cnt) || force_wake)
> >>>>                       wake_up_all(waitq);
> >>>>       }
> >>>>    }
> >>>> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
> >>>>    void wb_wait_for_completion(struct wb_completion *done)
> >>>>    {
> >>>>       atomic_dec(&done->cnt);         /* put down the initial count */
> >>>> -    wait_event(*done->waitq, !atomic_read(&done->cnt));
> >>>> +    wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
> >>>>    }
> >>>>
> >>>>    #ifdef CONFIG_CGROUP_WRITEBACK
> >>>> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
> >>>>                */
> >>>>               __writeback_single_inode(inode, &wbc);
> >>>>
> >>>> +            /* Report progress to inform the hung task detector of the progress. */
> >>>> +            if (work->done && (jiffies - work->done->stamp) >
> >>>> +                HZ * sysctl_hung_task_timeout_secs / 2)
> >>>> +                    wake_up_all(work->done->waitq);
> >>>> +
> >>>>               wbc_detach_inode(&wbc);
> >>>>               work->nr_pages -= write_chunk - wbc.nr_to_write;
> >>>>               wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
> >>>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> >>>> index 2ad261082bba..c37c6bd5ef5c 100644
> >>>> --- a/include/linux/backing-dev-defs.h
> >>>> +++ b/include/linux/backing-dev-defs.h
> >>>> @@ -63,6 +63,7 @@ enum wb_reason {
> >>>>    struct wb_completion {
> >>>>       atomic_t                cnt;
> >>>>       wait_queue_head_t       *waitq;
> >>>> +    unsigned long stamp;
> >>>>    };
> >>>>
> >>>>    #define __WB_COMPLETION_INIT(_waitq)       \
> >>>> --
> >>>> 2.39.5
> >>>>
> >>
> >> Thanks,
> >> --
> >> Julian Sun <sunjunchao@bytedance.com>
>
> Thanks,
> --
> Julian Sun <sunjunchao@bytedance.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26 12:05         ` Mateusz Guzik
@ 2025-09-26 15:48           ` Jan Kara
  2025-09-26 15:50             ` Mateusz Guzik
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2025-09-26 15:48 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Julian Sun, linux-fsdevel, jack, brauner, viro, peterz, akpm,
	Lance Yang

On Fri 26-09-25 14:05:59, Mateusz Guzik wrote:
> On Fri, Sep 26, 2025 at 1:43 PM Julian Sun <sunjunchao@bytedance.com> wrote:
> >
> > On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> > > On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
> > >>
> > >> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> > >>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> > >>>> Writing back a large number of pages can take a lots of time.
> > >>>> This issue is exacerbated when the underlying device is slow or
> > >>>> subject to block layer rate limiting, which in turn triggers
> > >>>> unexpected hung task warnings.
> > >>>>
> > >>>> We can trigger a wake-up once a chunk has been written back and the
> > >>>> waiting time for writeback exceeds half of
> > >>>> sysctl_hung_task_timeout_secs.
> > >>>> This action allows the hung task detector to be aware of the writeback
> > >>>> progress, thereby eliminating these unexpected hung task warnings.
> > >>>>
> > >>>
> > >>> If I'm reading correctly this is also messing with stats how long the
> > >>> thread was stuck to begin with.
> > >>
> > >> IMO, it will not mess up the time. Since it only updates the time when
> > >> we can see progress (which is not a hang). If the task really hangs for
> > >> a long time, then we can't perform the time update—so it will not mess
> > >> up the time.
> > >>
> > >
> > > My point is that if you are stuck in the kernel for so long for the
> > > hung task detector to take notice, that's still something worth
> > > reporting in some way, even if you are making progress. I presume with
> > > the patch at hand this information is lost.
> > >
> > > For example the detector could be extended to drop a one-liner about
> > > encountering a thread which was unable to leave the kernel for a long
> > > time, even though it is making progress. Bonus points if the message
> > > contained info this is i/o and for which device.
> >
> > Let me understand: you want to print logs when writeback is making
> > progress but is so slow that the task can't exit, correct?
> > I see this as a new requirement different from the existing hung task
> > detector: needing to print info when writeback is slow.
> > Indeed, the existing detector prints warnings in two cases: 1) no
> > writeback progress; 2) progress is made but writeback is so slow it will
> > take too long.
> 
> I am saying it would be a nice improvement to extend the htd like that.
> 
> And that your patch as proposed would avoidably make it harder -- you
> can still get what you are aiming for without the wakeups.
> 
> Also note that when looking at a kernel crashdump it may be beneficial
> to know when a particular thread got first stuck in the kernel, which
> is again gone with your patch.

I understand your concerns but I think it's stretching the goals for this
patch a bit too much.  I'm fine with the patch going in as is and if Julian
is willing to work on this additional debug features, then great!

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26 15:48           ` Jan Kara
@ 2025-09-26 15:50             ` Mateusz Guzik
  2025-09-26 15:55               ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Mateusz Guzik @ 2025-09-26 15:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Julian Sun, linux-fsdevel, brauner, viro, peterz, akpm,
	Lance Yang

On Fri, Sep 26, 2025 at 5:48 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 26-09-25 14:05:59, Mateusz Guzik wrote:
> > On Fri, Sep 26, 2025 at 1:43 PM Julian Sun <sunjunchao@bytedance.com> wrote:
> > >
> > > On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> > > > On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
> > > >>
> > > >> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> > > >>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> > > >>>> Writing back a large number of pages can take a lots of time.
> > > >>>> This issue is exacerbated when the underlying device is slow or
> > > >>>> subject to block layer rate limiting, which in turn triggers
> > > >>>> unexpected hung task warnings.
> > > >>>>
> > > >>>> We can trigger a wake-up once a chunk has been written back and the
> > > >>>> waiting time for writeback exceeds half of
> > > >>>> sysctl_hung_task_timeout_secs.
> > > >>>> This action allows the hung task detector to be aware of the writeback
> > > >>>> progress, thereby eliminating these unexpected hung task warnings.
> > > >>>>
> > > >>>
> > > >>> If I'm reading correctly this is also messing with stats how long the
> > > >>> thread was stuck to begin with.
> > > >>
> > > >> IMO, it will not mess up the time. Since it only updates the time when
> > > >> we can see progress (which is not a hang). If the task really hangs for
> > > >> a long time, then we can't perform the time update—so it will not mess
> > > >> up the time.
> > > >>
> > > >
> > > > My point is that if you are stuck in the kernel for so long for the
> > > > hung task detector to take notice, that's still something worth
> > > > reporting in some way, even if you are making progress. I presume with
> > > > the patch at hand this information is lost.
> > > >
> > > > For example the detector could be extended to drop a one-liner about
> > > > encountering a thread which was unable to leave the kernel for a long
> > > > time, even though it is making progress. Bonus points if the message
> > > > contained info this is i/o and for which device.
> > >
> > > Let me understand: you want to print logs when writeback is making
> > > progress but is so slow that the task can't exit, correct?
> > > I see this as a new requirement different from the existing hung task
> > > detector: needing to print info when writeback is slow.
> > > Indeed, the existing detector prints warnings in two cases: 1) no
> > > writeback progress; 2) progress is made but writeback is so slow it will
> > > take too long.
> >
> > I am saying it would be a nice improvement to extend the htd like that.
> >
> > And that your patch as proposed would avoidably make it harder -- you
> > can still get what you are aiming for without the wakeups.
> >
> > Also note that when looking at a kernel crashdump it may be beneficial
> > to know when a particular thread got first stuck in the kernel, which
> > is again gone with your patch.
>
> I understand your concerns but I think it's stretching the goals for this
> patch a bit too much.  I'm fine with the patch going in as is and if Julian
> is willing to work on this additional debug features, then great!
>

I am not asking the patch does all that work, merely that it gets
implemented in a way which wont require a rewrite should the above
work get done. Which boils down to storing the timestamp somewhere in
task_struct.

> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26 15:50             ` Mateusz Guzik
@ 2025-09-26 15:55               ` Jan Kara
  2025-09-26 16:14                 ` Mateusz Guzik
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2025-09-26 15:55 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Jan Kara, Julian Sun, linux-fsdevel, brauner, viro, peterz, akpm,
	Lance Yang

On Fri 26-09-25 17:50:43, Mateusz Guzik wrote:
> On Fri, Sep 26, 2025 at 5:48 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Fri 26-09-25 14:05:59, Mateusz Guzik wrote:
> > > On Fri, Sep 26, 2025 at 1:43 PM Julian Sun <sunjunchao@bytedance.com> wrote:
> > > >
> > > > On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> > > > > On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
> > > > >>
> > > > >> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> > > > >>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> > > > >>>> Writing back a large number of pages can take a lots of time.
> > > > >>>> This issue is exacerbated when the underlying device is slow or
> > > > >>>> subject to block layer rate limiting, which in turn triggers
> > > > >>>> unexpected hung task warnings.
> > > > >>>>
> > > > >>>> We can trigger a wake-up once a chunk has been written back and the
> > > > >>>> waiting time for writeback exceeds half of
> > > > >>>> sysctl_hung_task_timeout_secs.
> > > > >>>> This action allows the hung task detector to be aware of the writeback
> > > > >>>> progress, thereby eliminating these unexpected hung task warnings.
> > > > >>>>
> > > > >>>
> > > > >>> If I'm reading correctly this is also messing with stats how long the
> > > > >>> thread was stuck to begin with.
> > > > >>
> > > > >> IMO, it will not mess up the time. Since it only updates the time when
> > > > >> we can see progress (which is not a hang). If the task really hangs for
> > > > >> a long time, then we can't perform the time update—so it will not mess
> > > > >> up the time.
> > > > >>
> > > > >
> > > > > My point is that if you are stuck in the kernel for so long for the
> > > > > hung task detector to take notice, that's still something worth
> > > > > reporting in some way, even if you are making progress. I presume with
> > > > > the patch at hand this information is lost.
> > > > >
> > > > > For example the detector could be extended to drop a one-liner about
> > > > > encountering a thread which was unable to leave the kernel for a long
> > > > > time, even though it is making progress. Bonus points if the message
> > > > > contained info this is i/o and for which device.
> > > >
> > > > Let me understand: you want to print logs when writeback is making
> > > > progress but is so slow that the task can't exit, correct?
> > > > I see this as a new requirement different from the existing hung task
> > > > detector: needing to print info when writeback is slow.
> > > > Indeed, the existing detector prints warnings in two cases: 1) no
> > > > writeback progress; 2) progress is made but writeback is so slow it will
> > > > take too long.
> > >
> > > I am saying it would be a nice improvement to extend the htd like that.
> > >
> > > And that your patch as proposed would avoidably make it harder -- you
> > > can still get what you are aiming for without the wakeups.
> > >
> > > Also note that when looking at a kernel crashdump it may be beneficial
> > > to know when a particular thread got first stuck in the kernel, which
> > > is again gone with your patch.
> >
> > I understand your concerns but I think it's stretching the goals for this
> > patch a bit too much.  I'm fine with the patch going in as is and if Julian
> > is willing to work on this additional debug features, then great!
> >
> 
> I am not asking the patch does all that work, merely that it gets
> implemented in a way which wont require a rewrite should the above
> work get done. Which boils down to storing the timestamp somewhere in
> task_struct.

Well, but that doesn't really make much sense without the debug patch
itself, does it? And it could be a potential discussion point. So I think
the debug patch should just move the timestamp from the wb completion to
task_struct if that's needed...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
  2025-09-26 15:55               ` Jan Kara
@ 2025-09-26 16:14                 ` Mateusz Guzik
  0 siblings, 0 replies; 11+ messages in thread
From: Mateusz Guzik @ 2025-09-26 16:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: Julian Sun, linux-fsdevel, brauner, viro, peterz, akpm,
	Lance Yang

On Fri, Sep 26, 2025 at 5:55 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 26-09-25 17:50:43, Mateusz Guzik wrote:
> > On Fri, Sep 26, 2025 at 5:48 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Fri 26-09-25 14:05:59, Mateusz Guzik wrote:
> > > > On Fri, Sep 26, 2025 at 1:43 PM Julian Sun <sunjunchao@bytedance.com> wrote:
> > > > >
> > > > > On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> > > > > > On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
> > > > > >>
> > > > > >> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> > > > > >>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> > > > > >>>> Writing back a large number of pages can take a lots of time.
> > > > > >>>> This issue is exacerbated when the underlying device is slow or
> > > > > >>>> subject to block layer rate limiting, which in turn triggers
> > > > > >>>> unexpected hung task warnings.
> > > > > >>>>
> > > > > >>>> We can trigger a wake-up once a chunk has been written back and the
> > > > > >>>> waiting time for writeback exceeds half of
> > > > > >>>> sysctl_hung_task_timeout_secs.
> > > > > >>>> This action allows the hung task detector to be aware of the writeback
> > > > > >>>> progress, thereby eliminating these unexpected hung task warnings.
> > > > > >>>>
> > > > > >>>
> > > > > >>> If I'm reading correctly this is also messing with stats how long the
> > > > > >>> thread was stuck to begin with.
> > > > > >>
> > > > > >> IMO, it will not mess up the time. Since it only updates the time when
> > > > > >> we can see progress (which is not a hang). If the task really hangs for
> > > > > >> a long time, then we can't perform the time update—so it will not mess
> > > > > >> up the time.
> > > > > >>
> > > > > >
> > > > > > My point is that if you are stuck in the kernel for so long for the
> > > > > > hung task detector to take notice, that's still something worth
> > > > > > reporting in some way, even if you are making progress. I presume with
> > > > > > the patch at hand this information is lost.
> > > > > >
> > > > > > For example the detector could be extended to drop a one-liner about
> > > > > > encountering a thread which was unable to leave the kernel for a long
> > > > > > time, even though it is making progress. Bonus points if the message
> > > > > > contained info this is i/o and for which device.
> > > > >
> > > > > Let me understand: you want to print logs when writeback is making
> > > > > progress but is so slow that the task can't exit, correct?
> > > > > I see this as a new requirement different from the existing hung task
> > > > > detector: needing to print info when writeback is slow.
> > > > > Indeed, the existing detector prints warnings in two cases: 1) no
> > > > > writeback progress; 2) progress is made but writeback is so slow it will
> > > > > take too long.
> > > >
> > > > I am saying it would be a nice improvement to extend the htd like that.
> > > >
> > > > And that your patch as proposed would avoidably make it harder -- you
> > > > can still get what you are aiming for without the wakeups.
> > > >
> > > > Also note that when looking at a kernel crashdump it may be beneficial
> > > > to know when a particular thread got first stuck in the kernel, which
> > > > is again gone with your patch.
> > >
> > > I understand your concerns but I think it's stretching the goals for this
> > > patch a bit too much.  I'm fine with the patch going in as is and if Julian
> > > is willing to work on this additional debug features, then great!
> > >
> >
> > I am not asking the patch does all that work, merely that it gets
> > implemented in a way which wont require a rewrite should the above
> > work get done. Which boils down to storing the timestamp somewhere in
> > task_struct.
>
> Well, but that doesn't really make much sense without the debug patch
> itself, does it? And it could be a potential discussion point. So I think
> the debug patch should just move the timestamp from the wb completion to
> task_struct if that's needed...
>

I don't follow your e-mail, so I'm going to restate how I see this.

There is a timestamp stored somewhere indicating when was the last
time the thread went off CPU.

There are cases where it went off CPU for a long time, despite the
thing it is waiting for making progress.

This can trigger false-positives in hung task detector.

The proposed patch forces wake ups before this happens. This works
around the problem, but also perturbs a datum which would be useful in
debugging. It also seems weirdly disruptive for what it is aiming for.

When looking at a kernel crashdump it may be useful to know how long
the particular thread was off CPU. This will no longer be accessible.

Even for cases where progress is being made, but takes a long time, it
may still be useful to report it as this does not look like the
expected condition. With the change at hand there is no way to do it.

The reported issue can be worked around by adding a timestamp of 'last
known progress' to one of the holes in task_struct. Then hung task
detector can be trivially patched to bugger off based on it.

Later an interested party can extend the current functionality with
the thing I mentioned.

>                                                                 Honza
>
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-09-26 16:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-25 13:22 [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk Julian Sun
2025-09-25 16:31 ` Jan Kara
2025-09-25 17:25 ` Mateusz Guzik
2025-09-26  2:26   ` Julian Sun
2025-09-26 11:17     ` Mateusz Guzik
2025-09-26 11:43       ` Julian Sun
2025-09-26 12:05         ` Mateusz Guzik
2025-09-26 15:48           ` Jan Kara
2025-09-26 15:50             ` Mateusz Guzik
2025-09-26 15:55               ` Jan Kara
2025-09-26 16:14                 ` Mateusz Guzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).