From: Vivek Goyal <vgoyal@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Josh Hunt <joshhunt00@gmail.com>,
linux-kernel@vger.kernel.org, tj@kernel.org
Subject: Re: multi-second application stall in open()
Date: Wed, 7 Mar 2012 14:56:38 -0500
Message-ID: <20120307195638.GI13430@redhat.com>
In-Reply-To: <4F57AF4A.6080703@kernel.dk>
On Wed, Mar 07, 2012 at 07:56:10PM +0100, Jens Axboe wrote:
[..]
> >
> > blktrace of cfq look odd. I see that some IO (async writes) are being
> > submitted but CFQ did not dispatch it for a long time. Even some unplugs
> > came in still nothing happened. Also no completions are happening during
> > that window. Not sure why CFQ refuses to dispatch queued writes.
> >
> > Request added by flusher.
> >
> > 8,0 1 36926 5028.546000122 2846 A W 20147012 + 8 <- (8,3) 3375152
> > 8,0 1 36927 5028.546001798 2846 Q W 20147012 + 8 [flush-8:0]
> > 8,0 1 36928 5028.546009900 2846 G W 20147012 + 8 [flush-8:0]
> > 8,0 1 36929 5028.546014649 2846 I W 20147012 + 8 ( 4749) [flush-8:0]
> >
> > And this request is dispatched after 22 seconds.
> >
> > 8,0 1 37056 5050.117337221 162 D W 20147012 + 16 (21571322572) [sync_supers]
> >
> >
> > And it completes fairly fast.
> >
> > 8,0 0 36522 5050.117686149 9657 C W 20147012 + 16 ( 348928) [0]
> >
> > So not sure why CFQ will hold that request for so long when other IO is
> > not happening.
> >
> > Please try latest kernels and see if deadline has the same issue. If not,
> > then we know somehow CFQ is related. If it still happens on latest
> > kernels, can you try capturing blktrace again when you are experiencing
> > the delays.
>
> I'm seeing something very similar here. While testing the gtk fio
> client, I ran a job that issued a lot of random reads to my primary
> disk. 64 ios in flight, direct, libaio, 512b random reads. Firefox
> essentially froze, windows starting freezing up around me.
>
> I'll try and reproduce, but a quick guess would be that things starting
> piling up in fsync() or stalling on writes in general, since we are
> heavily starving those.
Quite possible. Other people have also reported write starvation issues. I
have received reports of "hung task timeout of 120 seconds" messages in the
presence of sync IO happening on the same disk/partition.

We probably need to do something about write starvation. I had posted a
patch to make sure we dispatch at least one WRITE after we have been waiting
for pending sync requests to finish.

https://lkml.org/lkml/2011/6/10/326

This might help a bit but might not prevent severe delays in dispatching
async writes, as things are so heavily loaded in favor of sync IO.

BTW, in this case, I did not see any sync IO completions happening while
async was not being dispatched. That's a little odd.
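
For anyone who wants to look for similar gaps in their own traces, here is a
rough sketch (not a tested tool) that pulls insert-to-dispatch delays out of
blkparse text. It assumes the default blkparse line layout seen in the quoted
trace (device, cpu, sequence, timestamp, pid, action, rwbs, sector) and pairs
I and D events by start sector only, so a front merge that changes the start
sector would not be matched. The function name and the SAMPLE list are made up
for illustration; the two sample lines are copied from the trace above.

```python
from decimal import Decimal, InvalidOperation


def insert_to_dispatch_delays(lines):
    """Return [(start_sector, delay_in_seconds)] for I -> D event pairs.

    Assumes default blkparse text output; requests are matched purely by
    start sector, which is a simplification.
    """
    inserted = {}  # start sector -> timestamp of first unmatched insert
    delays = []
    for line in lines:
        fields = line.split()
        # Skip anything that does not look like a blkparse event line.
        if len(fields) < 8 or "," not in fields[0]:
            continue
        action = fields[5]
        if action not in ("I", "D"):
            continue
        try:
            timestamp = Decimal(fields[3])
            sector = int(fields[7])
        except (InvalidOperation, ValueError):
            continue
        if action == "I":
            inserted.setdefault(sector, timestamp)
        elif sector in inserted:
            delays.append((sector, timestamp - inserted.pop(sector)))
    return delays


# Two events copied from the quoted trace.
SAMPLE = [
    "8,0 1 36929 5028.546014649 2846 I W 20147012 + 8 ( 4749) [flush-8:0]",
    "8,0 1 37056 5050.117337221 162 D W 20147012 + 16 (21571322572) [sync_supers]",
]

delays = insert_to_dispatch_delays(SAMPLE)
for sector, delay in delays:
    print(f"sector {sector}: waited {delay} s between insert and dispatch")
```

On the two sample lines this gives a delay of 21.571322572 seconds, which
agrees with the (21571322572) nanosecond figure blkparse itself prints on the
D line.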
Thanks
Vivek