From: Mark Nelson <mark.nelson@inktank.com>
To: Li Wang <liwang@ubuntukylin.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
Sage Weil <sage@inktank.com>
Subject: Re: Read ahead affect Ceph read performance much
Date: Mon, 29 Jul 2013 09:48:47 -0500
Message-ID: <51F680CF.4050903@inktank.com>
In-Reply-To: <51F642E2.3090201@ubuntukylin.com>
On 07/29/2013 05:24 AM, Li Wang wrote:
> We performed an Iozone read test on a 32-node HPC server. Each node has
> a very powerful CPU, a network with more than 1.5 GB/s of bandwidth,
> and 64GB of memory, but the local IO is relatively slow: the throughput
> measured locally with ‘dd’ is around 70MB/s. We configured a Ceph
> cluster with 24 OSDs on 24 nodes, one MDS, and one to four clients, one
> client per node. The performance is as follows:
>
> Iozone sequential read throughput (MB/s)
>
> Number of clients    1          2          4
> Default resize       180.0954   324.4836   591.5851
> Resize: 256MB        645.3347   1022.998   1267.631
>
> The complete iozone command line for one client is
>
>   iozone -t 1 -+m /tmp/iozone.nodelist.50305030 -s 64G -r 4M -i 0 -+n -w \
>     -c -e -b /tmp/iozone.nodelist.50305030.output
>
> Only one thread is started on each client node. For two clients, it is
>
>   iozone -t 2 -+m /tmp/iozone.nodelist.50305030 -s 32G -r 4M -i 0 -+n -w \
>     -c -e -b /tmp/iozone.nodelist.50305030.output
>
> As the data show, a larger read-ahead window can make sequential reads
> more than three times faster (about 3.6x with one client)!
Very interesting! I've done some similar tests and saw somewhat
different results (in some cases I actually saw improvement with lower
readahead!). I suspect that this may be very hardware dependent. Were
you using RBD or CephFS? In either case, was it the kernel client or
userland (i.e. QEMU/KVM or FUSE)? Also, where did you adjust readahead?
Was this on the client volume or under the OSDs?

I've got to prepare for the talk later this week, but I will try to get
my readahead test results out soon as well.
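
For anyone comparing notes on where readahead was adjusted: for block
devices (the disks under the OSDs, or a kernel RBD device) the window is
exposed in sysfs as /sys/block/<dev>/queue/read_ahead_kb, while the
kernel CephFS client takes an rasize mount option instead. Below is a
minimal C sketch for reading the sysfs value. The helper name
read_ahead_kb() is made up for illustration, and which device to pass
depends on the setup in question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of any Ceph or kernel API): returns the
 * readahead window in KiB stored in a sysfs-style file such as
 * /sys/block/<dev>/queue/read_ahead_kb, or -1 if the file cannot be
 * opened or does not start with a number. */
long read_ahead_kb(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long kb = -1;
    if (fscanf(f, "%ld", &kb) != 1)
        kb = -1;

    fclose(f);
    return kb;
}
```

Calling read_ahead_kb("/sys/block/sda/queue/read_ahead_kb") on a client
and on an OSD node (sda is only an example device name) would show which
side the setting was changed on.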
>
> Besides, since the backend of Ceph is not a traditional hard disk, it
> is beneficial to capture stride-read prefetching. To prove this, we
> tested stride reads with the following program. As we know, the
> generic read-ahead algorithm of the Linux kernel does not detect
> stride reads, so we use posix_fadvise() to force prefetching manually.
> The record size is 4MB. The result is even more surprising:
>
> Stride read throughput (MB/s)
>
> Number of records prefetched   0       1        4        16       64       128
> Throughput                     42.82   100.74   217.41   497.73   854.48   950.18
>
> As the data show, with a read-ahead size of 128*4MB, the speedup over
> no read ahead is up to 950/42, i.e. more than 22x (over 2000%)!
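
To make the arithmetic behind these figures explicit, the ratios can be
recomputed from the quoted tables. A small C sketch follows; the function
names speedup() and percent_gain() are made up here and are not from the
test program.

```c
#include <assert.h>

/* Ratio of tuned throughput to baseline throughput (e.g. 2.0 means
 * twice as fast). Both values must use the same unit, here MB/s. */
double speedup(double baseline, double tuned)
{
    assert(baseline > 0.0);
    return tuned / baseline;
}

/* The same comparison expressed as a percentage increase over the
 * baseline (e.g. 100.0 means twice as fast). */
double percent_gain(double baseline, double tuned)
{
    return (speedup(baseline, tuned) - 1.0) * 100.0;
}
```

With the quoted numbers, speedup(42.82, 950.18) is about 22.2 for the
stride test, and speedup(180.0954, 645.3347) is about 3.58 for the
single-client sequential test. The sequential gain shrinks as clients
are added: roughly 3.15x with two clients and 2.14x with four.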
>
> The core logic of the test program is below,
>
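> /* fd, buf, pos, total and count (records to prefetch) are set up earlier */
> const int stride = 17;                     /* read 1 record out of every 17 */
> const long long block = 4LL * 1024 * 1024; /* record size: 4MB */
> for (;;) {
>     /* hint the next `count` records that the strided reads will touch */
>     for (i = 0; i < count; ++i) {
>         long long start = pos + (i + 1) * stride * block;
>         printf("PRE READ %lld %lld\n", start, start + block);
>         posix_fadvise(fd, start, block, POSIX_FADV_WILLNEED);
>     }
>     len = read(fd, buf, block);
>     if (len <= 0)
>         break;                             /* EOF or error */
>     total += len;
>     printf("READ %lld %lld\n", pos, (pos + len));
>     pos += len;
>     /* skip the remaining stride - 1 records */
>     lseek(fd, (stride - 1) * block, SEEK_CUR);
>     pos += (stride - 1) * block;
> }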
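
For anyone who wants to try this locally, the quoted logic can be wrapped
into a self-contained function roughly as follows. This is only a sketch
under a few assumptions, not the original test program: the name
stride_read() and its parameters are invented here, pread() replaces the
read()/lseek() pair, the loop stops at end of file, and the printf()
tracing is dropped.

```c
#define _XOPEN_SOURCE 700   /* exposes posix_fadvise() and pread() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads one record of `block` bytes out of every `stride` records in
 * the file at `path`. Before each read, the next `count` strided
 * records are hinted with POSIX_FADV_WILLNEED.
 * Returns the total number of bytes read, or -1 on error. */
long long stride_read(const char *path, size_t block, int stride, int count)
{
    assert(block > 0 && stride > 0 && count >= 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(block);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long total = 0;
    off_t pos = 0;
    for (;;) {
        /* Hint the records that the following iterations will read.
         * The hint is advisory, so its return value is ignored. */
        for (int i = 0; i < count; ++i) {
            off_t start = pos + (off_t)(i + 1) * stride * (off_t)block;
            posix_fadvise(fd, start, (off_t)block, POSIX_FADV_WILLNEED);
        }

        ssize_t len = pread(fd, buf, block, pos);
        if (len < 0) {
            total = -1;         /* read error */
            break;
        }
        if (len == 0)
            break;              /* end of file */
        total += len;

        /* Jump to the start of the next strided record. */
        pos += (off_t)stride * (off_t)block;
    }

    free(buf);
    close(fd);
    return total;
}
```

With a stride of 17 and 4MB records as in the quoted test, a call such as
stride_read(path, 4 << 20, 17, 128) on a large file (the path is up to
the tester) would correspond to the last column of the table above.
Passing count = 0 gives the no-prefetch baseline.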
>
> Given the above results and some more, we plan to submit a blueprint to
> discuss prefetching optimizations for Ceph.
Cool!
>
> Cheers,
> Li Wang
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html