From: Jeff Moyer <jmoyer@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Theodore Ts'o <tytso@mit.edu>, Dave Chinner <david@fromorbit.com>,
Jan Kara <jack@suse.cz>,
linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, Jiri Slaby <jslaby@suse.cz>
Subject: Re: Excessive stall times on ext4 in 3.9-rc2
Date: Wed, 24 Apr 2013 15:09:13 -0400
Message-ID: <x49ppxjeofa.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <20130423140134.GA2108@suse.de> (Mel Gorman's message of "Tue, 23 Apr 2013 15:01:34 +0100")
Mel Gorman <mgorman@suse.de> writes:
>> I'll also note that even though your I/O is going all over the place
>> (D2C is pretty bad, 14ms), most of the time is spent waiting for a
>> struct request allocation or between Queue and Merge:
>>
>> ==================== All Devices ====================
>>
>> ALL MIN AVG MAX N
>> --------------- ------------- ------------- ------------- -----------
>>
>> Q2Q 0.000000001 0.000992259 8.898375882 2300861
>> Q2G 0.000000843 10.193261239 2064.079501935 1016463 <====
>
> This is not normally my sandbox so do you mind spelling this out?
>
> IIUC, the time to allocate the struct request from the slab cache is just a
> small portion of this time. The bulk of the time is spent in get_request()
> waiting for congestion to clear on the request list for either the sync or
> async queue. Once a process goes to sleep on that waitqueue, it has to wait
> until enough requests on that queue have been serviced before it gets woken
> again at which point it gets priority access to prevent further starvation.
> This is the Queue To Get Request (Q2G) delay. What we may be seeing here
> is that the async queue was congested and on average, we are waiting for
> 10 seconds for it to clear. The maximum value may be bogus for reasons
> explained later.
>
> Is that accurate?
Yes, without getting into excruciating detail.
>> G2I 0.000000461 0.000044702 3.237065090 1015803
>> Q2M 0.000000101 8.203147238 2064.079367557 1311662
>> I2D 0.000002012 1.476824812 2064.089774419 1014890
>> M2D 0.000003283 6.994306138 283.573348664 1284872
>> D2C 0.000061889 0.014438316 0.857811758 2291996
>> Q2C 0.000072284 13.363007244 2064.092228625 2292191
>>
>> ==================== Device Overhead ====================
>>
>> DEV | Q2G G2I Q2M I2D D2C
>> ---------- | --------- --------- --------- --------- ---------
>> ( 8, 0) | 33.8259% 0.0001% 35.1275% 4.8932% 0.1080%
>> ---------- | --------- --------- --------- --------- ---------
>> Overall | 33.8259% 0.0001% 35.1275% 4.8932% 0.1080%
>>
>> I'm not sure I believe that max value. 2064 seconds seems a bit high.
>
> It is, so I looked closer at the timestamps and there is a one-hour
> correction about 4400 seconds into the test. Daylight saving time kicked
> in on March 31st and the machine is rarely rebooted until this test case
> came along. It looks like there is a timezone or time misconfiguration
> on the laptop that starts the machine with the wrong time. NTP must have
> corrected the time which skewed the readings in that window severely :(
Not sure I'm buying that argument, as there are no gaps in the blkparse
output. The logging is not done using wallclock time. I still haven't
had sufficient time to dig into these numbers.
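
A minimal sketch of how the "no gaps" claim could be checked against the
text output of blkparse. This is not the code used in this thread. The
function name find_gaps, the 60-second threshold, and the assumed default
line layout (dev, cpu, seq, timestamp, pid, action, rwbs, ...) are all
illustrative assumptions:

```python
import re

# Assumed blkparse default line layout (an assumption, not from this thread):
#   dev cpu seq timestamp pid action rwbs sector + nblocks [process]
LINE_RE = re.compile(
    r"^\s*(\d+),(\d+)\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\S+)\s+(\S+)"
)


def find_gaps(lines, threshold=60.0):
    """Return (prev_ts, ts) pairs where consecutive trace timestamps
    differ by more than `threshold` seconds or go backwards."""
    gaps = []
    prev = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip summary lines and anything unparseable
        ts = float(m.group(5))
        if prev is not None and (ts - prev > threshold or ts < prev):
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

An empty result for the whole trace would be consistent with the
timestamps being continuous; a one-hour clock correction would be
expected to show up as a single large pair, if it affected them at all.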
>> Also, Q2M should not be anywhere near that big, so more investigation is
>> required there. A quick look over the data doesn't show any such delays
>> (making me question the tools), but I'll write some code tomorrow to
>> verify the btt output.
>>
>
> It might be a single set of readings during a time correction that
> screwed it.
Again, I don't think so.
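
For cross-checking the Q2G row of the btt output, something along these
lines could work. It is only a sketch under stated assumptions: it
assumes the default blkparse text layout, pairs each Q event with the
next G event for the same (device, sector), and drops a pending Q when a
merge (M or F) is logged for that sector. btt's own matching rules may
differ, and the name q2g_stats is made up for illustration:

```python
import re

# Assumed blkparse default layout (an assumption, not from this thread):
#   dev cpu seq timestamp pid action rwbs sector + nblocks [process]
LINE_RE = re.compile(
    r"^\s*(\d+,\d+)\s+\d+\s+\d+\s+(\d+\.\d+)\s+\d+\s+([A-Z])\s+\S+"
    r"\s+(\d+)\s+\+\s+\d+"
)


def q2g_stats(lines):
    """Compute min/avg/max/n of Queue-to-Get-request delays in seconds."""
    pending = {}  # (dev, sector) -> timestamp of the Q event
    deltas = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a per-I/O event line
        dev, action = m.group(1), m.group(3)
        ts, sector = float(m.group(2)), int(m.group(4))
        key = (dev, sector)
        if action == "Q":
            pending[key] = ts
        elif action == "G":
            start = pending.pop(key, None)
            if start is not None:
                deltas.append(ts - start)
        elif action in ("M", "F"):
            # Merged bios never get their own request, so no Q2G sample.
            pending.pop(key, None)
    if not deltas:
        return None
    return {
        "min": min(deltas),
        "avg": sum(deltas) / len(deltas),
        "max": max(deltas),
        "n": len(deltas),
    }
```

If the maximum computed this way is nowhere near 2064 seconds, that
would point at the tooling rather than at the trace data.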
> I can reproduce it at will. Due to the nature of the test, the test
> results are variable and unfortunately it is one of the trickier mmtests
> configurations to set up.
>
> 1. Get access to a webserver
> 2. Clone mmtests to your test machine
> git clone https://github.com/gormanm/mmtests.git
> 3. Edit shellpacks/common-config.sh and set WEBROOT to a webserver path
> 4. Create a tar.gz of a large git tree and place it at $WEBROOT/linux-2.6.tar.gz
> Alternatively place a compressed git tree anywhere and edit
> configs/config-global-dhp__io-multiple-source-latency
> and update GITCHECKOUT_SOURCETAR
> 5. Create a tar.gz of a large maildir directory and place it at
> $WEBROOT/maildir.tar.gz
> Alternatively, use an existing maildir folder and set
> MONITOR_INBOX_OPEN_MAILDIR in
> configs/config-global-dhp__io-multiple-source-latency
>
> It's awkward but it's not like there are standard benchmarks lying around
> and it seemed the best way to reproduce the problems I typically see early
> in the lifetime of a system or when running a git checkout when the tree
> has not been used in a few hours. Run the actual test with
>
> ./run-mmtests.sh --config configs/config-global-dhp__io-multiple-source-latency --run-monitor test-name-of-your-choice
>
> Results will be in work/log. You'll need to run this as root so it
> can run blktrace and so it can drop_caches between git checkouts
> (to force disk IO). If systemtap craps out on you, then edit
> configs/config-global-dhp__io-multiple-source-latency and remove dstate
> from MONITORS_GZIP
And how do I determine whether I've hit the problem?
> If you have trouble getting this running, ping me on IRC.
Yes, I'm having issues getting things to go, but you didn't provide me a
time zone, an IRC server or a nick to help me find you. Was that
intentional? ;-)
Cheers,
Jeff