linux-fsdevel.vger.kernel.org archive mirror
From: Andi Kleen <andi@firstfloor.org>
To: "Frantisek Rysanek" <Frantisek.Rysanek@post.cz>
Cc: linux-fsdevel@vger.kernel.org, "Wu Fengguang" <fengguang.wu@intel.com>
Subject: Re: Disk IO, "Paralllel sequential" load: read-ahead inefficient? FS tuning?
Date: Thu, 09 Apr 2009 20:05:03 +0200
Message-ID: <87prfl8zzk.fsf@basil.nowhere.org>
In-Reply-To: <49DCCF55.32475.8FFB6157@localhost> (Frantisek Rysanek's message of "Wed, 08 Apr 2009 16:22:45 +0200")

"Frantisek Rysanek" <Frantisek.Rysanek@post.cz> writes:

> I don't understand all the tweakable knobs of mkfs.xfs - not well
> enough to match the 4MB RAID chunk size somewhere in the internal
> structure of XFS.

If it's software RAID, recent mkfs.xfs should be able to figure
out the stripe sizes on its own.

> Another problem is, that there seems to be a single tweakable knob to 
> read-ahead in Linux 2.6, accessible in several ways:
>   /sys/block/<dev>/queue/max_sectors_kb
>   /sbin/blockdev --setra
>   /sbin/blockdev --setfra

unsigned long max_sane_readahead(unsigned long nr)
{
        return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
                + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
}

So you can affect it indirectly by keeping a lot of memory free
with vm.min_free_kbytes. Probably not an optimal solution.
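
To make the arithmetic of that cap concrete, here is a minimal userspace
sketch of the same min() expression. The function name and its parameters
are illustrative assumptions: the two page counts stand in for the
node_page_state() values the kernel reads, and nothing here is kernel code.

```c
/*
 * Userspace restatement of the cap quoted above (illustrative only):
 * a request of nr pages is limited to half of the sum of inactive
 * file pages and free pages on the local node.
 */
unsigned long sane_readahead_cap(unsigned long nr,
                                 unsigned long inactive_file_pages,
                                 unsigned long free_pages)
{
        unsigned long limit = (inactive_file_pages + free_pages) / 2;

        return nr < limit ? nr : limit;
}
```

With 4 KiB pages, a 4 MB request is 1024 pages. If the node had only
300 inactive file pages and 200 free pages, the request would be cut to
250 pages; a 100-page request would pass through unchanged.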

>
> Based on some manpages on the madvise() and fadvise() functions, I'd 
> say that the level of read-ahead corresponding to MADV_SEQUENTIAL and 
> FADV_SEQUENTIAL is still decimal orders less than the desired figure.

Wu Fengguang (cc'ed) is doing a lot of work on the MADV_* readahead
algorithms. He recently posted a new patchkit on linux-kernel
that you might try. It still uses strict limits, but it's better
at figuring out specific patterns.

But then, if you know very well what kind of readahead
is needed, it might be best to implement it directly in the
applications rather than rely on kernel heuristics.

For example, sys_readahead() is now widely used for faster
booting.
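
A rough sketch of what application-driven readahead could look like,
assuming Linux and glibc's readahead(2) wrapper. The helper name
read_with_prefetch, the 64 KiB read buffer and the window parameter are
made up for illustration; the right window size depends on the workload.

```c
#define _GNU_SOURCE             /* readahead(2) is Linux-specific */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this thread): read fd sequentially from
 * offset 0 while asking the kernel to prefetch `window` bytes ahead of
 * the current position. Returns total bytes read, or -1 on a read error.
 */
long long read_with_prefetch(int fd, size_t window)
{
        static char buf[1 << 16];
        long long total = 0;
        off_t prefetched_to = 0;        /* end of the range requested so far */

        for (;;) {
                /* Keep the requested range at least one window ahead. */
                if (prefetched_to < (off_t)(total + (long long)window)) {
                        /* Only a hint: if it fails, we just read without it. */
                        (void)readahead(fd, prefetched_to, window);
                        prefetched_to += (off_t)window;
                }

                ssize_t n = pread(fd, buf, sizeof buf, (off_t)total);
                if (n < 0)
                        return -1;
                if (n == 0)
                        return total;   /* end of file */
                total += n;
        }
}
```

Calling read_with_prefetch(fd, 4 << 20) on each of the parallel streams
would request 4 MB chunks ahead of every reader. Whether the block layer
actually issues requests that large still depends on its own limits.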

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
