From: Wu Fengguang <fengguang.wu@intel.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Frantisek Rysanek <Frantisek.Rysanek@post.cz>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: Disk IO, "Paralllel sequential" load: read-ahead inefficient? FS tuning?
Date: Fri, 10 Apr 2009 10:35:47 +0800
Message-ID: <20090410023547.GB6831@localhost>
In-Reply-To: <87prfl8zzk.fsf@basil.nowhere.org>
On Fri, Apr 10, 2009 at 02:05:03AM +0800, Andi Kleen wrote:
> "Frantisek Rysanek" <Frantisek.Rysanek@post.cz> writes:
>
> > I don't understand all the tweakable knobs of mkfs.xfs - not well
> > enough to match the 4MB RAID chunk size somewhere in the internal
> > structure of XFS.
>
> If it's software RAID recent mkfs.xfs should be able to figure
> it out the stripe sizes on its own.
A side note on Frantisek's "perfectly aligned 4MB readahead on 4MB
file allocation on 4MB RAID chunk size" proposal:

- A 4MB IO size may be good for _disk_ bandwidth, but not necessarily
  for the actual throughput of your applications, because of latency
  issues.

- A quick (and dirty) solution for your big-file servers is to use a
  16MB chunk size for software RAID together with a 2MB readahead
  size. It won't suffer much from RAID5's partial-write inefficiency,
  because
      - the write ratio is small
      - the writes are mostly sequential and can be written back in
        bursts

The benefit for reads is that, as long as XFS keeps the file blocks
contiguous, only 1 out of 8 readahead IOs will involve two disks :-)
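
To make the "1 out of 8" figure concrete, here is a rough sketch of the
arithmetic. It assumes a simplified model that is not spelled out above:
the file is laid out contiguously, readahead windows are fixed-size and
issued back to back, and chunk boundaries fall at multiples of the chunk
size (parity rotation is ignored). The function name and parameters are
made up for illustration.

```python
MiB = 1 << 20


def crossing_fraction(chunk_size, readahead_size, start_offset, n_windows):
    """Fraction of back-to-back readahead windows that span two RAID chunks.

    Simplified model (an assumption): window i covers the byte range
    [start_offset + i * readahead_size, start_offset + (i + 1) * readahead_size).
    """
    crossings = 0
    for i in range(n_windows):
        begin = start_offset + i * readahead_size
        end = begin + readahead_size  # exclusive end of the window
        # The window touches two chunks if its first and last byte
        # fall into different chunk numbers.
        if begin // chunk_size != (end - 1) // chunk_size:
            crossings += 1
    return crossings / n_windows


if __name__ == "__main__":
    # 16MB chunks, 2MB readahead, stream starting 512KB into a chunk.
    print(crossing_fraction(16 * MiB, 2 * MiB, 512 * 1024, 800))
    # 4MB chunks, 4MB readahead, same misalignment.
    print(crossing_fraction(4 * MiB, 4 * MiB, 512 * 1024, 800))
```

Under this model a misaligned 2MB window crosses a 16MB chunk boundary
once per 8 windows, whereas a misaligned 4MB window on 4MB chunks
crosses a boundary every time, which is why the larger chunk size does
not need perfect alignment.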
> > Another problem is, that there seems to be a single tweakable knob to
> > read-ahead in Linux 2.6, accessible in several ways:
> > /sys/block/<dev>/queue/max_sectors_kb
> > /sbin/blockdev --setra
> > /sbin/blockdev --setfra
>
> unsigned long max_sane_readahead(unsigned long nr)
> {
> return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
> + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
> }
>
> So you can affect it indirectly by keeping a lot of memory free
> with vm.min_free_kbytes. Probably not an optimal solution.
Of course, not even viable ;)
Here is the memory demand of concurrent readahead. With a 1MB readahead
size, each stream requires about 2MB of memory to keep it safe from
readahead thrashing. So for a server with 1000 streams, 2GB is enough
for readahead.
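
The estimate is simple multiplication. A back-of-the-envelope sketch,
where the factor of 2 is taken from the "about 2MB per 1MB readahead"
rule of thumb above and the function name is hypothetical:

```python
MiB = 1 << 20
GiB = 1 << 30


def readahead_memory(streams, readahead_size, safety_factor=2):
    """Rough page-cache demand (in bytes) to keep readahead from thrashing.

    safety_factor=2 encodes the rule of thumb of about twice the
    readahead size per stream; it is an estimate, not a kernel constant.
    """
    return streams * readahead_size * safety_factor


if __name__ == "__main__":
    need = readahead_memory(streams=1000, readahead_size=1 * MiB)
    print(f"{need / GiB:.2f} GiB")
```

For 1000 streams at 1MB readahead this gives 2000MB, i.e. roughly the
2GB quoted.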
My old adaptive readahead patches can significantly reduce this
requirement - e.g. cut that 2GB down to 500MB. However, who cares
(please speak out!)? Servers seem to have plenty of memory nowadays..
> >
> > Based on some manpages on the madvise() and fadvise() functions, I'd
> > say that the level of read-ahead corresponding to MADV_SEQUENTIAL and
> > FADV_SEQUENTIAL is still decimal orders less than the desired figure.
>
> Wu Fengguang (cc'ed) is doing a lot of work on the MADV_* readahead
> algorithms. There was a recent new patchkit from him on linux-kernel
> that you might try. It still uses strict limits, but it's better
> at figuring out specific patterns.
>
> But then if you really know very well what kind of readahead
> is needed it might be best to just implement it directly in the
> applications than to rely on kernel heuristics.
File download servers typically do sequential reads/writes, which can
be served well by the kernel readahead logic.
Apache/lighttpd have the option to do mmap reads. These new patches are
expected to serve such sequential mmap read workloads well:
http://lwn.net/Articles/327647/
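
For reference, a sequential mmap read looks roughly like the sketch
below. It is only an illustration of the access pattern, not how
Apache or lighttpd implement it; the function name and chunk size are
made up. The MADV_SEQUENTIAL hint is optional and is only applied where
Python exposes it (mmap.madvise() needs Python 3.8+).

```python
import mmap


def mmap_sequential_read(path, chunk_size=1 << 20):
    """Walk a file front to back through a read-only mapping.

    Returns the number of bytes touched. The file must be non-empty,
    since an empty file cannot be mapped.
    """
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Optional hint that the mapping will be read sequentially.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            # Slicing copies the pages out, which faults them in order.
            for off in range(0, len(m), chunk_size):
                total += len(m[off:off + chunk_size])
    return total
```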
> For example for faster booting sys_readahead() is widely used
> now.
And the more portable/versatile posix_fadvise() advice :-)
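
A minimal sketch of what that looks like from an application, using
Python's os.posix_fadvise() wrapper (available on Unix). The helper
name and the 1MB chunk size are arbitrary choices for illustration; how
much the kernel actually reads ahead in response to the hint is up to
the kernel.

```python
import os


def read_sequentially(path, chunk_size=1 << 20):
    """Read a whole file after hinting sequential access to the kernel.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 with len=0 applies the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            total += len(data)
    finally:
        os.close(fd)
    return total
```

The advice is only a hint: the read loop behaves the same with or
without it, so it is safe to add to existing code.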
Thanks,
Fengguang