From: Chris Mason <chris.mason@oracle.com>
To: Freek Dijkstra <Freek.Dijkstra@sara.nl>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: Poor read performance on high-end server
Date: Fri, 6 Aug 2010 07:41:44 -0400
Message-ID: <20100806114144.GB29846@think>
In-Reply-To: <4C5B2B42.3030407@sara.nl>
On Thu, Aug 05, 2010 at 11:21:06PM +0200, Freek Dijkstra wrote:
> Chris Mason wrote:
>
> > Basically we have two different things to tune. First the block layer
> > and then btrfs.
>
>
> > And then we need to setup a fio job file that hammers on all the ssds at
> > once. I'd have it use aio/dio and talk directly to the drives.
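For reference, the kind of job file I mean looks roughly like this (a
sketch only; device paths and sizes are illustrative):

    [global]
    rw=read
    bs=20m
    size=32g
    # O_DIRECT, straight at the block devices
    direct=1
    ioengine=libaio
    iodepth=1

    [f1]
    filename=/dev/sdd
    # one [fN] stanza per drive: /dev/sde, /dev/sdf, ...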
>
> Thanks. First one disk:
>
> > f1: (groupid=0, jobs=1): err= 0: pid=6273
> > read : io=32780MB, bw=260964KB/s, iops=12, runt=128626msec
> > clat (usec): min=74940, max=80721, avg=78449.61, stdev=923.24
> > bw (KB/s) : min=240469, max=269981, per=100.10%, avg=261214.77, stdev=2765.91
> > cpu : usr=0.01%, sys=2.69%, ctx=1747, majf=0, minf=5153
> > IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> > issued r/w: total=1639/0, short=0/0
> >
> > lat (msec): 100=100.00%
> >
> > Run status group 0 (all jobs):
> > READ: io=32780MB, aggrb=260963KB/s, minb=267226KB/s, maxb=267226KB/s, mint=128626msec, maxt=128626msec
> >
> > Disk stats (read/write):
> > sdd: ios=261901/0, merge=0/0, ticks=10135270/0, in_queue=10136460, util=99.30%
>
> So 255 MiByte/s.
> Out of curiosity, what is the distinction between the reported figures
> of 260964 kiB/s, 261214.77 kiB/s, 267226 kiB/s and 260963 kiB/s?
When there is only one job, they should all be the same. aggrb is the
total bandwidth seen across all the jobs, minb is the lowest per-job
bandwidth, and maxb is the highest.
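In the 16-disk run below, for instance, aggrb=3301MB/s is the sum over
all 16 jobs, while minb=216323KB/s and maxb=219763KB/s are the slowest
and fastest individual jobs in the group.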
>
>
> Now 16 disks (abbreviated):
>
> > ~/fio# ./fio ssd.fio
> > Starting 16 processes
> > f1: (groupid=0, jobs=1): err= 0: pid=4756
> > read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
> > clat (msec): min=75, max=138, avg=96.15, stdev= 4.47
> > lat (msec): min=75, max=138, avg=96.15, stdev= 4.47
> > bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15, stdev=9052.26
> > cpu : usr=0.00%, sys=1.71%, ctx=2737, majf=0, minf=5153
> > IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> > issued r/w: total=1639/0, short=0/0
> >
> > lat (msec): 100=97.99%, 250=2.01%
> > Run status group 0 (all jobs):
> > READ: io=524480MB, aggrb=3301MB/s, minb=216323KB/s, maxb=219763KB/s, mint=156406msec, maxt=158893msec
> So, the aggregate throughput for these 16 disks is 3301 MiByte/s.
>
> I also tried hardware RAID (2 sets of 8 disks), and got a similar result:
>
> > Run status group 0 (all jobs):
> > READ: io=65560MB, aggrb=3024MB/s, minb=1548MB/s, maxb=1550MB/s, mint=21650msec, maxt=21681msec
Great, so we know the drives are fast.
>
>
>
> > fio should be able to push these devices up to the line speed. If it
> > doesn't I would suggest changing elevators (deadline, cfq, noop) and
> > bumping the max request size to the max supported by the device.
>
> 3301 MiByte/s seems like a reasonable number, given the theoretical
> maximum of 16 times the single-disk performance: 16 * 256 MiByte/s =
> 4096 MiByte/s.
>
> Based on this, I have not looked at tuning. Would you recommend that I do?
>
> Our minimal goal is 2500 MiByte/s; that seems achievable as ZFS was able
> to reach 2750 MiByte/s without tuning.
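If you do want to experiment with the elevator and request-size knobs I
mentioned above, they live in sysfs; a sketch (sdd is just an example
device):

    # switch the I/O scheduler for one device
    echo deadline > /sys/block/sdd/queue/scheduler
    # check what the hardware supports, then raise the request size
    cat /sys/block/sdd/queue/max_hw_sectors_kb
    echo 1024 > /sys/block/sdd/queue/max_sectors_kb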
>
> > When we have a config that does so, we can tune the btrfs side of things
> > as well.
>
> Some files are created in the root folder of the mount point, but I get
> errors instead of results:
>
Someone else mentioned that btrfs only gained DIO reads in 2.6.35. I
think you'll get the best results with that kernel if you can find an
update.
If not, you can change the fio job file to remove direct=1 and increase the
bs flag up to 20M.
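Roughly like this (a sketch; the mount point and job count are
illustrative):

    [global]
    rw=read
    # buffered reads: no direct=1, so the page cache and readahead do the work
    bs=20m
    size=32g
    ioengine=sync
    numjobs=16
    # fio creates its own test files under the btrfs mount
    directory=/mnt/btrfs

    [bufread]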
I'd also suggest changing /sys/class/bdi/btrfs-1/read_ahead_kb to a
bigger number. Try 20480.
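That is:

    echo 20480 > /sys/class/bdi/btrfs-1/read_ahead_kb

(the exact btrfs-N instance name can vary per mount; check
ls /sys/class/bdi/ to find it).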
-chris