From: Fengguang Wu <fengguang.wu@intel.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
Gavin Shan <shangw@linux.vnet.ibm.com>
Subject: Re: [PATCH V2] writeback: fix hung_task alarm when sync block
Date: Thu, 14 Jun 2012 21:36:00 +0800
Message-ID: <20120614133600.GB14883@localhost>
In-Reply-To: <x49lijr8bur.fsf@segfault.boston.devel.redhat.com>
Hi Jeff,
On Wed, Jun 13, 2012 at 11:34:20AM -0400, Jeff Moyer wrote:
> Fengguang Wu <fengguang.wu@intel.com> writes:
>
> > Hi Jeff,
> >
> > On Wed, Jun 13, 2012 at 10:27:50AM -0400, Jeff Moyer wrote:
> >> Wanpeng Li <liwp.linux@gmail.com> writes:
> >>
> >> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> > index f2d0109..df879ee 100644
> >> > --- a/fs/fs-writeback.c
> >> > +++ b/fs/fs-writeback.c
> >> > @@ -1311,7 +1311,11 @@ void writeback_inodes_sb_nr(struct super_block *sb,
> >> >
> >> > WARN_ON(!rwsem_is_locked(&sb->s_umount));
> >> > bdi_queue_work(sb->s_bdi, &work);
> >> > - wait_for_completion(&done);
> >> > + if (sysctl_hung_task_timeout_secs)
> >> > + while (!wait_for_completion_timeout(&done, HZ/2))
> >> > + ;
> >> > + else
> >> > + wait_for_completion(&done);
> >> > }
> >> > EXPORT_SYMBOL(writeback_inodes_sb_nr);
> >>
> >> Is it really expected that writeback_inodes_sb_nr will routinely queue
> >> up more than 2 seconds worth of I/O (Yes, I understand that it isn't the
> >> only entity issuing I/O)?
> >
> > Yes, in the case of syncing the whole superblock.
> > Basically sync() does its job in two steps:
> >
> > for all sb:
> > writeback_inodes_sb_nr() # WB_SYNC_NONE
> > sync_inodes_sb() # WB_SYNC_ALL
> >
> >> For devices that are really slow, it may make
> >> more sense to tune the system so that you don't have too much writeback
> >> I/O submitted at once. Dropping nr_requests for the given queue should
> >> fix this situation, I would think.
> >
> > The worried case is about sync() waiting
> >
> > (nr_dirty + nr_writeback) / write_bandwidth
> >
> > time, where it is nr_dirty that could grow rather large.
> >
> > For example, if dirty threshold is 1GB and write_bandwidth is 10MB/s,
> > the sync() will have to wait for 100 seconds. If there are heavy
> > dirtiers running during the sync, it will typically take several
> > hundreds of seconds (which looks not that good, but still much better
> > than being livelocked in some old kernels).
> >
> >> This really feels like we're papering over the problem.
> >
> > That's true. The majority of users probably don't want to cache 100s
> > worth of data in memory. It may be worthwhile to add a new per-bdi
> > limit whose unit is number-of-seconds (of dirty data).
>
> Hi, Fengguang,
>
> Another option is to limit the amount of time we wait to the amount of
> time we expect to have to wait. IOW, if we can estimate the amount of
> time we think the I/O will take to complete, we can set the
> hung_task_timeout[1] to *that* (with some fudge factor). Do you have a
> mechanism in place today to make such an estimate? The benefit of this
> solution is obvious: you still get notified when tasks are actually
> hung, but you don't get false warnings.
Good idea! Yes, we can do some estimation and adaptively extend the
hang timeout for the current writeback_inodes_sb_nr()/sync_inodes_sb()
call.
Note that this won't reliably get rid of false warnings, because of
estimation errors, which can be pretty large and are unavoidable when
the workload changes. Still, it would be a net improvement and
perhaps enough to get rid of most false warnings, while still being
able to catch livelocks or other kinds of task hangs.
Thanks,
Fengguang