From: Mateusz Guzik <mjguzik@gmail.com>
To: Julian Sun <sunjunchao@bytedance.com>
Cc: linux-fsdevel@vger.kernel.org, jack@suse.cz, brauner@kernel.org,
viro@zeniv.linux.org.uk, peterz@infradead.org,
akpm@linux-foundation.org, Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk.
Date: Fri, 26 Sep 2025 14:05:59 +0200 [thread overview]
Message-ID: <CAGudoHEkJfenk7ePETr3PCCqb9AYo7F4Ha754EjV4rT+U6_qoQ@mail.gmail.com> (raw)
In-Reply-To: <14ee6648-1878-4b46-9e46-d275cc50bf0a@bytedance.com>
On Fri, Sep 26, 2025 at 1:43 PM Julian Sun <sunjunchao@bytedance.com> wrote:
>
> On 9/26/25 7:17 PM, Mateusz Guzik wrote:
> > On Fri, Sep 26, 2025 at 4:26 AM Julian Sun <sunjunchao@bytedance.com> wrote:
> >>
> >> On 9/26/25 1:25 AM, Mateusz Guzik wrote:
> >>> On Thu, Sep 25, 2025 at 09:22:39PM +0800, Julian Sun wrote:
> >>>> Writing back a large number of pages can take a lots of time.
> >>>> This issue is exacerbated when the underlying device is slow or
> >>>> subject to block layer rate limiting, which in turn triggers
> >>>> unexpected hung task warnings.
> >>>>
> >>>> We can trigger a wake-up once a chunk has been written back and the
> >>>> waiting time for writeback exceeds half of
> >>>> sysctl_hung_task_timeout_secs.
> >>>> This action allows the hung task detector to be aware of the writeback
> >>>> progress, thereby eliminating these unexpected hung task warnings.
> >>>>
> >>>
> >>> If I'm reading correctly this is also messing with stats how long the
> >>> thread was stuck to begin with.
> >>
> >> IMO, it will not mess up the time. Since it only updates the time when
> >> we can see progress (which is not a hang). If the task really hangs for
> >> a long time, then we can't perform the time update—so it will not mess
> >> up the time.
> >>
> >
> > My point is that if you are stuck in the kernel for so long for the
> > hung task detector to take notice, that's still something worth
> > reporting in some way, even if you are making progress. I presume with
> > the patch at hand this information is lost.
> >
> > For example the detector could be extended to drop a one-liner about
> > encountering a thread which was unable to leave the kernel for a long
> > time, even though it is making progress. Bonus points if the message
> > contained info this is i/o and for which device.
>
> Let me understand: you want to print logs when writeback is making
> progress but is so slow that the task can't exit, correct?
> I see this as a new requirement different from the existing hung task
> detector: needing to print info when writeback is slow.
> Indeed, the existing detector prints warnings in two cases: 1) no
> writeback progress; 2) progress is made but writeback is so slow it will
> take too long.
I am saying it would be a nice improvement to extend the htd like that.
And that your patch as proposed would avoidably make it harder -- you
can still get what you are aiming for without the wakeups.
Also note that when looking at a kernel crashdump it may be beneficial
to know when a particular thread got first stuck in the kernel, which
is again gone with your patch.
> This patch eliminates warnings for case 2, but only to make hung task
> warnings more accurate.
> Case 2 is essentially a performance issue. Increasing
> sysctl_hung_task_timeout_secs may make case 2 warnings disappear, but
> not case 1.
> So we should consider: do we need to print info when slow writeback
> keeps a task from exiting beyond a certain time?
I would wager that helps figure out why there is no apparent progress.
> This deserves a new patch: record a timestamp at writeback start,
> compare it with the initial timestamp after each chunk is written back,
> and decide whether to print. This should meet your needs.
> If fs maintainers find this reasonable, I'm willing to contribute the
> code.
That sounds great, let's see what the fs folk think.
> >> cc Lance and Andrew.
> >>>
> >>> Perhaps it would be better to have a var in task_struct which would
> >>> serve as an indicator of progress being made (e.g., last time stamp of
> >>> said progress).
> >>>
> >>> task_struct already has numerous holes so this would not have to grow it
> >>> above what it is now.
> >>>
> >>>
> >>>> This patch has passed the xfstests 'check -g quick' test based on ext4,
> >>>> with no additional failures introduced.
> >>>>
> >>>> Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
> >>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> >>>> ---
> >>>> fs/fs-writeback.c | 13 +++++++++++--
> >>>> include/linux/backing-dev-defs.h | 1 +
> >>>> 2 files changed, 12 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >>>> index a07b8cf73ae2..475d52abfb3e 100644
> >>>> --- a/fs/fs-writeback.c
> >>>> +++ b/fs/fs-writeback.c
> >>>> @@ -14,6 +14,7 @@
> >>>> * Additions for address_space-based writeback
> >>>> */
> >>>>
> >>>> +#include <linux/sched/sysctl.h>
> >>>> #include <linux/kernel.h>
> >>>> #include <linux/export.h>
> >>>> #include <linux/spinlock.h>
> >>>> @@ -174,9 +175,12 @@ static void finish_writeback_work(struct wb_writeback_work *work)
> >>>> kfree(work);
> >>>> if (done) {
> >>>> wait_queue_head_t *waitq = done->waitq;
> >>>> + /* Report progress to inform the hung task detector of the progress. */
> >>>> + bool force_wake = (jiffies - done->stamp) >
> >>>> + sysctl_hung_task_timeout_secs * HZ / 2;
> >>>>
> >>>> /* @done can't be accessed after the following dec */
> >>>> - if (atomic_dec_and_test(&done->cnt))
> >>>> + if (atomic_dec_and_test(&done->cnt) || force_wake)
> >>>> wake_up_all(waitq);
> >>>> }
> >>>> }
> >>>> @@ -213,7 +217,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
> >>>> void wb_wait_for_completion(struct wb_completion *done)
> >>>> {
> >>>> atomic_dec(&done->cnt); /* put down the initial count */
> >>>> - wait_event(*done->waitq, !atomic_read(&done->cnt));
> >>>> + wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
> >>>> }
> >>>>
> >>>> #ifdef CONFIG_CGROUP_WRITEBACK
> >>>> @@ -1975,6 +1979,11 @@ static long writeback_sb_inodes(struct super_block *sb,
> >>>> */
> >>>> __writeback_single_inode(inode, &wbc);
> >>>>
> >>>> + /* Report progress to inform the hung task detector of the progress. */
> >>>> + if (work->done && (jiffies - work->done->stamp) >
> >>>> + HZ * sysctl_hung_task_timeout_secs / 2)
> >>>> + wake_up_all(work->done->waitq);
> >>>> +
> >>>> wbc_detach_inode(&wbc);
> >>>> work->nr_pages -= write_chunk - wbc.nr_to_write;
> >>>> wrote = write_chunk - wbc.nr_to_write - wbc.pages_skipped;
> >>>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> >>>> index 2ad261082bba..c37c6bd5ef5c 100644
> >>>> --- a/include/linux/backing-dev-defs.h
> >>>> +++ b/include/linux/backing-dev-defs.h
> >>>> @@ -63,6 +63,7 @@ enum wb_reason {
> >>>> struct wb_completion {
> >>>> atomic_t cnt;
> >>>> wait_queue_head_t *waitq;
> >>>> + unsigned long stamp;
> >>>> };
> >>>>
> >>>> #define __WB_COMPLETION_INIT(_waitq) \
> >>>> --
> >>>> 2.39.5
> >>>>
> >>
> >> Thanks,
> >> --
> >> Julian Sun <sunjunchao@bytedance.com>
>
> Thanks,
> --
> Julian Sun <sunjunchao@bytedance.com>
next prev parent reply other threads:[~2025-09-26 12:06 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-25 13:22 [PATCH] write-back: Wake up waiting tasks when finishing the writeback of a chunk Julian Sun
2025-09-25 16:31 ` Jan Kara
2025-09-25 17:25 ` Mateusz Guzik
2025-09-26 2:26 ` Julian Sun
2025-09-26 11:17 ` Mateusz Guzik
2025-09-26 11:43 ` Julian Sun
2025-09-26 12:05 ` Mateusz Guzik [this message]
2025-09-26 15:48 ` Jan Kara
2025-09-26 15:50 ` Mateusz Guzik
2025-09-26 15:55 ` Jan Kara
2025-09-26 16:14 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGudoHEkJfenk7ePETr3PCCqb9AYo7F4Ha754EjV4rT+U6_qoQ@mail.gmail.com \
--to=mjguzik@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=lance.yang@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=sunjunchao@bytedance.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).