From: Vivek Goyal <vgoyal@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Josh Hunt <joshhunt00@gmail.com>,
linux-kernel@vger.kernel.org, tj@kernel.org
Subject: Re: multi-second application stall in open()
Date: Wed, 7 Mar 2012 14:56:38 -0500 [thread overview]
Message-ID: <20120307195638.GI13430@redhat.com> (raw)
In-Reply-To: <4F57AF4A.6080703@kernel.dk>
On Wed, Mar 07, 2012 at 07:56:10PM +0100, Jens Axboe wrote:
[..]
> >
> > blktrace of cfq look odd. I see that some IO (async writes) are being
> > submitted but CFQ did not dispatch it for a long time. Even some unplugs
> > came in still nothing happened. Also no completions are happening during
> > that window. Not sure why CFQ refuses to dispatch queued writes.
> >
> > Request added by flusher.
> >
> > 8,0 1 36926 5028.546000122 2846 A W 20147012 + 8 <- (8,3)
> > 3375152
> > 8,0 1 36927 5028.546001798 2846 Q W 20147012 + 8 [flush-8:0]
> > 8,0 1 36928 5028.546009900 2846 G W 20147012 + 8 [flush-8:0]
> > 8,0 1 36929 5028.546014649 2846 I W 20147012 + 8 ( 4749)
> > [flush-8:0]
> >
> > And this request is dispatched after 22 seconds.
> >
> > 8,0 1 37056 5050.117337221 162 D W 20147012 + 16 (21571322572) [sync_supers]
> >
> >
> > And it completes fairly fast.
> >
> > 8,0 0 36522 5050.117686149 9657 C W 20147012 + 16 ( 348928)
> > [0]
> >
> > So not sure why CFQ will hold that request for so long when other IO is
> > not happening.
> >
> > Please try latest kernels and see if deadline has the same issue. If not,
> > then we know somehow CFQ is related. If it still happens on latest
> > kernels, can you try capturing blktrace again when you are experiencing
> > the delays.
>
> I'm seeing something very similar here. While testing the gtk fio
> client, I ran a job that issued a lot of random reads to my primary
> disk. 64 ios in flight, direct, libaio, 512b random reads. Firefox
> essentially froze, windows starting freezing up around me.
>
> I'll try and reproduce, but a quick guess would be that things starting
> piling up in fsync() or stalling on writes in general, since we are
> heavily starving those.
Quite possible. Other people also had reported write starvation issues. I
have got reports of "hung task timeout of 120 seconds" reports in presence
of sync IO happening on same disk/partition.
We probably need to do something about write starvation. I had posted one
patch to make sure we dispatch atleast one WRITE after we were waiting for
pending sync requests to finish.
https://lkml.org/lkml/2011/6/10/326
This might help a bit but might not prevent servere delays in dispatching
async writes as things are so heavily loaded in favor or sync IO.
BTW, in this case, I did not see any sync IO completions happening while
async was not being dispatched. That's little odd.
Thanks
Vivek
next prev parent reply other threads:[~2012-03-07 19:56 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-06 21:56 multi-second application stall in open() Josh Hunt
2012-03-07 13:43 ` Josh Hunt
2012-03-07 16:28 ` Vivek Goyal
2012-03-07 18:56 ` Jens Axboe
2012-03-07 19:56 ` Vivek Goyal [this message]
2012-03-07 21:08 ` Josh Hunt
2012-03-08 22:22 ` Josh Hunt
2012-03-08 23:40 ` Vivek Goyal
[not found] ` <CAKA=qzbsL9UVYLZ3=hoT-1jfp=v=_Sr=h+YeHu0qAA=Ko_7P6w@mail.gmail.com>
2012-06-21 19:26 ` Josh Hunt
2012-06-21 20:32 ` Vivek Goyal
2012-06-21 20:36 ` Tejun Heo
2012-06-21 21:28 ` Josh Hunt
2012-06-21 21:32 ` Tejun Heo
2012-06-21 21:48 ` Rakesh Iyer
[not found] ` <CAOT6A4-a49wLHcQepUxJCDxOxfnSTEWa72OweLsmrea85OyrCg@mail.gmail.com>
2012-06-22 14:15 ` Vivek Goyal
2012-06-21 21:11 ` Josh Hunt
2012-06-22 14:12 ` Vivek Goyal
2012-06-22 20:05 ` Josh Hunt
2012-06-22 20:22 ` Josh Hunt
2012-06-22 20:42 ` Vivek Goyal
2012-06-22 20:53 ` Josh Hunt
2012-06-22 20:57 ` Josh Hunt
2012-06-22 21:34 ` Josh Hunt
2012-06-25 13:30 ` Vivek Goyal
2012-06-25 16:22 ` Josh Hunt
2012-06-25 21:18 ` Vivek Goyal
2012-06-25 23:05 ` Josh Hunt
2012-06-26 4:01 ` Josh Hunt
2012-06-26 12:59 ` Vivek Goyal
2012-06-26 15:18 ` Josh Hunt
2012-06-26 15:53 ` Vivek Goyal
2012-06-26 20:37 ` Josh Hunt
2012-06-26 20:56 ` Tejun Heo
[not found] ` <CAKA=qzbBtteDjHiPogCvN5jOSiPrDxx=vn96p02bXUy=6=jAgA@mail.gmail.com>
2012-06-26 23:44 ` Josh Hunt
2012-06-27 17:21 ` Josh Hunt
2012-06-27 17:38 ` Tejun Heo
2012-06-27 17:44 ` Josh Hunt
2012-06-27 17:54 ` Tejun Heo
2012-06-27 17:59 ` Josh Hunt
2012-06-29 23:02 ` Tejun Heo
2012-06-30 0:37 ` Josh Hunt
2012-07-04 1:12 ` Tejun Heo
2012-07-18 17:48 ` Tejun Heo
2012-06-26 20:43 ` Tejun Heo
2012-06-25 17:26 ` Tejun Heo
2012-03-07 19:47 ` Josh Hunt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120307195638.GI13430@redhat.com \
--to=vgoyal@redhat.com \
--cc=axboe@kernel.dk \
--cc=joshhunt00@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).