linux-mm.kvack.org archive mirror
* Something amiss with IO throttling
@ 2013-02-26  0:52 Phillip Susi
From: Phillip Susi @ 2013-02-26  0:52 UTC
  To: linux-mm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

There seems to be something wrong with IO throttling in recent
kernels.  I have noticed two problems recently:

1)  I noticed that the man page for readahead() states that it blocks
until the data has been read, which should not be the case.  The whole
point of the call is that it initiates a background read in the hope
that it will finish before the data is actually needed, just like
posix_fadvise(POSIX_FADV_WILLNEED).  Testing, however, indicates that
both readahead() and posix_fadvise() with POSIX_FADV_WILLNEED do
appear to block for most of the time it takes to read a large file.
In my case I'm reading a 400 MB file after dropping the page cache,
with 2+ GB of RAM free, and the readahead/fadvise call blocks for
nearly the full 5 seconds it takes to read the file.
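
One way to reproduce this kind of measurement from user space is to
time the advisory call separately from the read that follows it.  The
sketch below is only an illustration, not the test that was actually
run: it uses Python's os.posix_fadvise() ( the os module has no
readahead() wrapper ), and the helper name time_willneed is made up.
It assumes the page cache has been dropped beforehand, as in the test
described above.

```python
import os
import time


def time_willneed(path):
    """Return (advise_seconds, read_seconds) for one file.

    Hypothetical helper: times POSIX_FADV_WILLNEED over the whole file,
    then times reading the file sequentially.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size

        start = time.monotonic()
        # Ask the kernel to start reading the whole file into the page cache.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        advise_seconds = time.monotonic() - start

        start = time.monotonic()
        # Read the file in 1 MiB chunks until EOF.
        while os.read(fd, 1 << 20):
            pass
        read_seconds = time.monotonic() - start
    finally:
        os.close(fd)
    return advise_seconds, read_seconds


if __name__ == "__main__":
    import sys

    advise, read = time_willneed(sys.argv[1])
    print(f"fadvise: {advise:.3f}s  read: {read:.3f}s")
```

If the advisory call were truly asynchronous, the first number should
be small compared to the second; an advise time close to the total
read time would match the blocking described above.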

I made sure that the request queue is not filling up by increasing
both max_sectors_kb and nr_requests to 4096.  I also verified that the
file is in a single contiguous extent using ext4 extents, so there is
no need to read indirect blocks ( which was the cause of the
readahead() blocking I ran into a few years ago ).
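
For anyone wanting to confirm the queue settings on their own system,
here is a minimal sketch that reads the two tunables back from sysfs.
The device name "sda" is a placeholder ( the message does not say
which device was used ), and read_queue_settings is a made-up helper
name.

```python
import os


def read_queue_settings(device, sysfs_root="/sys/block"):
    """Return max_sectors_kb and nr_requests for a block device as ints."""
    settings = {}
    for name in ("max_sectors_kb", "nr_requests"):
        path = os.path.join(sysfs_root, device, "queue", name)
        with open(path) as f:
            settings[name] = int(f.read().strip())
    return settings


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the device holding the test file.
    print(read_queue_settings("sda"))
```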

2)  I have been trying to figure out why restoring a level 0 dump is
only getting half or less of the throughput it should.  Inspection of
blktrace data seemed to show that there were a fair number of seeks
back and forth between the file data, which was largely contiguous
thanks to the ext4 multiblock allocator, and metadata ( inodes,
directories, bitmaps ).  Even after implementing POSIX_FADV_NOREUSE in
the hopes that this would allow the file data to migrate out first and
the metadata to remain cached until later so these seeks could be
avoided, there was little to no improvement.  Prior to my recent pair
of patches implementing NOREUSE and fixing DONTNEED, using DONTNEED
and/or sync_file_range(SYNC_FILE_RANGE_WRITE) actually made things
*worse*, indicating that they too were being unnecessarily throttled.
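
To make the access pattern concrete, here is a rough sketch of a
streaming writer that issues the two fadvise hints mentioned above.
It is not the restore code itself: write_streaming is a hypothetical
helper, sync_file_range() is left out because Python's os module does
not wrap it, and how much effect either hint has depends on the
kernel in use.

```python
import os


def write_streaming(path, chunks):
    """Write chunks to path, hinting that the data will not be reused.

    Returns the total number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    try:
        # Hint: data in this file is accessed once (length 0 = to end of file).
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NOREUSE)
        for chunk in chunks:
            view = memoryview(chunk)
            # os.write() may write fewer bytes than asked; loop until done.
            while view:
                n = os.write(fd, view)
                view = view[n:]
            # Hint that the range just written is no longer needed in cache.
            os.posix_fadvise(fd, written, len(chunk), os.POSIX_FADV_DONTNEED)
            written += len(chunk)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    total = write_streaming("out.bin", [b"a" * 4096, b"b" * 4096])
    print(total, "bytes written")
```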

I noticed that during the restore the write queues are not full, dirty
pages are nowhere near the dirty_ratio limit, and pages in Writeback
amount to only 1-3 MB, yet the restore process is being throttled.  I
also disabled dirty_writeback_centisecs.
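
The numbers above can be sampled while a restore is running with
something like the following sketch.  It reads the Dirty and Writeback
counters from /proc/meminfo and the vm.dirty_ratio sysctl, which is
assumed here to be the dirty limit meant above; the function names are
made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: int}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def writeback_snapshot(meminfo_path="/proc/meminfo",
                       ratio_path="/proc/sys/vm/dirty_ratio"):
    """Return the current Dirty/Writeback sizes (kB) and dirty_ratio (%)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    with open(ratio_path) as f:
        ratio = int(f.read().strip())
    return {
        "dirty_kb": info.get("Dirty"),
        "writeback_kb": info.get("Writeback"),
        "dirty_ratio_percent": ratio,
    }


if __name__ == "__main__":
    print(writeback_snapshot())
```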

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJRLAdGAAoJEJrBOlT6nu75954IAMaYCgimvT7ftDb8a4QGdYaD
mvXIErscjEoZeke/CufYuckKyziYkhjD3yrZmBBPpM3+EZQJHynVb/c9bh0IB64C
P/1uxOW5Si6q5u/qchIA7kZo4PHfyNjSNehtLG3urK7/8XvSMxZbE1tTX5BSKjCJ
edpwpYXie6HiKCKRnbiYGSD8wnFDOvygE+KRQ37DoYyP+UVaPTiWwOc9RFGKgAgX
ET14fHHD72t7BXtpAsErMwRJhE1Aw8L23mP20O4yN2rY3maIMuQzv1wFeRzUDLqn
TE6+4OAZRFkJpBjHXr3RUme7jq300a8YsbaHGCjx0XA0N1uxjLVxQRLc2blGOso=
=eNWA
-----END PGP SIGNATURE-----

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: readahead blocking on ext4 ( was: Something amiss with IO throttling )
@ 2013-02-26  3:31 Phillip Susi
From: Phillip Susi @ 2013-02-26  3:31 UTC
  To: linux-mm; +Cc: ext4 development

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/25/2013 07:52 PM, Phillip Susi wrote:
> There seems to be something wrong with IO throttling in recent 
> kernels.  I have noticed two problems recently:
> 
> 1)  I noticed that the man page for readahead() states that it
> blocks until the data has been read, which is incorrect.  The whole
> point of the call is that it initiates background read(ahead), in
> the hopes that it will finish before the data is actually read,
> just like posix_fadvise(POSIX_FADV_WILLNEED).  Testing however,
> indicates that indeed, both readahead() and fadvise with
> POSIX_FADV_WILLNEED does appear to be blocking at least for most of
> the time it takes to read a large file.  In my case I'm reading a
> 400mb file after dropping cache and having 2+gb of free ram, and
> the readahead/fadvise blocks for nearly the 5 seconds it takes to
> read that.

Well, I solved this part and no longer think it is related to the
second problem.  It turns out that the file I was using for testing
had been downloaded via BitTorrent, and while the data blocks were
100% contiguous, the extent tree was slightly fragmented.  The file
used 3 level 0 extent tree blocks, so it seems that readahead blocked
until most of the file had been read.  Only then could the third
extent tree block be read and the remaining blocks mapped by it be
queued, after which readahead returned.

I'm adding linux-ext4 to Cc, and I'm hoping we can come up with some
ideas to improve this.  It seems to me that the extent tree blocks
should be read first, then all of the data blocks queued up so
readahead only has to block once and shortly before returning.  What
do you think?

Also it may be interesting to note that the three extent tree blocks
were allocated seemingly at random:

EXTENTS:
(ETB0):33795, (0-30975):370688-401663, (30976-40191):401664-410879,
(ETB0):483328, (40192-72575):410880-443263,
(72576-81919):443264-452607, (ETB0):483329, (81920-111578):452608-482266
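
The listing above can be checked mechanically.  The sketch below
parses it with two regular expressions written for this exact output
format ( an assumption; other tool versions may print extents
differently ) and confirms that the data extents follow one another
both logically and physically, while the extent tree blocks sit
elsewhere.  All helper names are made up.

```python
import re

EXTENTS = """\
(ETB0):33795, (0-30975):370688-401663, (30976-40191):401664-410879,
(ETB0):483328, (40192-72575):410880-443263,
(72576-81919):443264-452607, (ETB0):483329, (81920-111578):452608-482266
"""

# (logical_start-logical_end):physical_start-physical_end
DATA_RE = re.compile(r"\((\d+)-(\d+)\):(\d+)-(\d+)")
# (ETB<level>):physical_block
TREE_RE = re.compile(r"\(ETB(\d+)\):(\d+)")


def parse_extents(text):
    """Return (data_extents, tree_blocks) parsed from the listing."""
    data = [tuple(int(g) for g in m.groups()) for m in DATA_RE.finditer(text)]
    tree = [int(m.group(2)) for m in TREE_RE.finditer(text)]
    return data, tree


def is_contiguous(data):
    """True if every extent starts right after the previous one,
    in both logical and physical block numbers."""
    return all(
        nxt[0] == cur[1] + 1 and nxt[2] == cur[3] + 1
        for cur, nxt in zip(data, data[1:])
    )


def total_blocks(data):
    """Total number of logical blocks covered by the data extents."""
    return sum(end - start + 1 for start, end, _, _ in data)


if __name__ == "__main__":
    data, tree = parse_extents(EXTENTS)
    print("data extents:", len(data))
    print("contiguous:", is_contiguous(data))
    print("total blocks:", total_blocks(data))
    print("extent tree blocks:", tree)
```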

Shouldn't the extent tree blocks be clustered a little better?  And
shouldn't the actual extents be merged a little better?  With 111579
logical blocks ( 0-111578 ) and a max of 32767 ( or was it 32768
initialized blocks, and 32767 uninit? ) per extent, this whole file
should be able to fit within the 4 extents embedded in the inode, with
no need for any extent tree blocks.
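
The arithmetic behind that claim is easy to check.  The sketch below
takes the per-extent limits quoted above as given ( 32768 blocks for
an initialized extent, 32767 for an uninitialized one ) and assumes
the data could be laid out in maximum-length extents; min_extents is
a made-up helper name.

```python
MAX_INIT_EXTENT_LEN = 32768    # blocks per initialized extent (as quoted above)
MAX_UNINIT_EXTENT_LEN = 32767  # blocks per uninitialized extent (as quoted above)
INODE_EXTENT_SLOTS = 4         # extents that fit directly in the inode

TOTAL_BLOCKS = 111579          # logical blocks 0 through 111578


def min_extents(total_blocks, max_len):
    """Smallest number of extents of at most max_len blocks that can
    cover total_blocks blocks (ceiling division)."""
    return -(-total_blocks // max_len)


if __name__ == "__main__":
    for limit in (MAX_INIT_EXTENT_LEN, MAX_UNINIT_EXTENT_LEN):
        needed = min_extents(TOTAL_BLOCKS, limit)
        fits = needed <= INODE_EXTENT_SLOTS
        print(f"limit {limit}: {needed} extents, fits in inode: {fits}")
```

With either limit the file needs 4 extents, which is exactly the
number of slots available in the inode.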

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJRLCyQAAoJEJrBOlT6nu75PEkIAK+SWfo6ugAV6cvgqGWiNkuz
eAn6IfaGxdg/6TLH+xsnQbI8KTqe8Zg9VXTZyenfwdC2lsgFz2WOI19yzMbR6eWi
fddzadKuvPbKN8BgpmF3I61wAOG1YdbpEcvJRHhyFU211+I10shOeowGnyIuj7II
uoyGeBaWN1MpIuTzLySFVLAYdbZOofYlTbrxrjiCavAHhjG91WfZExMVR60OkY0z
dxBQ0NF5B+YIx+egOx1fVbstXxfrFTIIqVSLpK6fF44DUN/rHWIRivT3vARJt1Pa
ZxFmqyGMLUgqaq71n9B4L5FmfQ+anYM33plufm4cUjFfoEqBbowZvSnwN2p1mQk=
=H8i2
-----END PGP SIGNATURE-----
