From: Jens Axboe <jens.axboe@oracle.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>,
"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"Wu, Fengguang" <fengguang.wu@intel.com>
Subject: Re: btrfs: why default 4M readahead size?
Date: Fri, 19 Mar 2010 13:57:20 +0100
Message-ID: <20100319125719.GQ5768@kernel.dk>
In-Reply-To: <20100319092900.GA28071@sli10-desk.sh.intel.com>
On Fri, Mar 19 2010, Shaohua Li wrote:
> On Fri, Mar 19, 2010 at 04:22:11PM +0800, Jens Axboe wrote:
> > On Fri, Mar 19 2010, Shaohua Li wrote:
> > > On Fri, Mar 19, 2010 at 08:59:48AM +0800, Shaohua Li wrote:
> > > > On Thu, Mar 18, 2010 at 08:53:13PM +0800, Chris Mason wrote:
> > > > > On Thu, Mar 18, 2010 at 09:42:57AM +0800, Shaohua Li wrote:
> > > > > > Btrfs uses the equation below to calculate ra_pages:
> > > > > > fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
> > > > > > 4 * 1024 * 1024 / PAGE_CACHE_SIZE);
> > > > > > Is the max() a typo for min()? This makes the readahead size 4M by default,
> > > > > > which is too big.
> > > > >
> > > > > Looks like things have changed since I tuned that number. Fengguang has
> > > > > been busy ;)
> > > > >
> > > > > > I have a system with 16 CPUs, 6G of memory and 12 SATA disks. I created a btrfs
> > > > > > filesystem on each disk, so this isn't a RAID setup. The test is fio with 12 tasks
> > > > > > accessing 12 files, one for each disk. The fio test is an mmap sequential read. I
> > > > > > measured the performance with different readahead sizes:
> > > > > > ra size   io throughput
> > > > > > 4M        268288 k/s
> > > > > > 2M        367616 k/s
> > > > > > 1M        431104 k/s
> > > > > > 512K      474112 k/s
> > > > > > 256K      512000 k/s
> > > > > > 128K      538624 k/s
> > > > > > The 4M default readahead size has poor performance.
> > > > > > I also did a sync sequential read test. The difference there isn't that big, but
> > > > > > the 4M case still shows about a 10% drop compared to the 512k case.
> > > > >
> > > > > I'm surprised the 4M is so much slower. At any rate, the larger size
> > > > > was selected because btrfs checksumming means we need a bigger buffer to
> > > > > keep the disks saturated. Were you on a fancy intel box with hardware
> > > > > crc32c enabled?
> > > > Yes, this machine supports the SSE4.2 instructions. Let me check the result with
> > > > checksums disabled.
> > > There seems to be no big difference with checksums disabled. I reformatted the
> > > disks and redid the test:
> > > 128k ra: 539648 k/s
> > > 4m ra: 285696 k/s
> >
> > 4MB is definitely a huge read-ahead size, but I do wonder why it would
> > perform that much worse than a 128KB window. If you narrow your test
> > down to a single disk (or something simpler, at least), how does 4MB
> > compare to 128KB? With 6GB of memory, you should not run into read-ahead
> > memory thrashing.
> Test data for a single disk (just one run so far):
> 128k ra: 88513 k/s
> 4m ra: 87630 k/s
> So no big difference.
That looks pretty much as expected. Unless you hit some sort of memory
thrashing, a huge read-ahead window should not cause a performance
degradation, at least not of this magnitude. I would expect performance
to reach a stable threshold once requests are large enough to utilize
the full device bandwidth on their own, and then remain at that
plateau.
Any chance you could capture blktrace data for a run with 128KB and one
with 4MB so we could inspect the disk IO pattern?
--
Jens Axboe
Thread overview: 7+ messages
2010-03-18 1:42 btrfs: why default 4M readahead size? Shaohua Li
2010-03-18 12:53 ` Chris Mason
2010-03-19 0:59 ` Shaohua Li
2010-03-19 2:56 ` Shaohua Li
2010-03-19 8:22 ` Jens Axboe
2010-03-19 9:29 ` Shaohua Li
2010-03-19 12:57 ` Jens Axboe [this message]