* Read ahead affect Ceph read performance much
@ 2013-07-29 10:24 Li Wang
2013-07-29 13:00 ` Andrey Korolyov
2013-07-29 14:48 ` Mark Nelson
0 siblings, 2 replies; 6+ messages in thread
From: Li Wang @ 2013-07-29 10:24 UTC
To: ceph-devel@vger.kernel.org; +Cc: Sage Weil
We ran an Iozone read test on a 32-node HPC cluster. Each node has a
very powerful CPU, a fast network (bandwidth > 1.5 GB/s) and 64GB of
memory, but relatively slow I/O: the throughput measured locally with
'dd' is around 70MB/s. We configured a Ceph cluster with 24 OSDs on 24
nodes, one MDS, and one to four clients (one client per node). The
performance is as follows:
Iozone sequential read throughput (MB/s)

Number of clients          1         2         4
Default rasize      180.0954  324.4836  591.5851
rasize = 256MB      645.3347  1022.998  1267.631
The complete iozone command line for one client is:

iozone -t 1 -+m /tmp/iozone.nodelist.50305030 -s 64G -r 4M -i 0 -+n -w -c -e -b /tmp/iozone.nodelist.50305030.output

Only one thread is started on each client node. For two clients, it is:

iozone -t 2 -+m /tmp/iozone.nodelist.50305030 -s 32G -r 4M -i 0 -+n -w -c -e -b /tmp/iozone.nodelist.50305030.output
As the data show, a larger readahead window yields a speedup of more
than 3x with one and two clients (about 3.6x and 3.2x), and about 2.1x
with four clients!
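As a rough check of these ratios (a sketch added for illustration, not part of the original test; the helper names are hypothetical), the C below computes the per-client speedups from the table. It also computes a back-of-envelope aggregate disk ceiling of 24 x 70 = 1680MB/s, assuming each of the 24 OSDs sits on one disk delivering the ~70MB/s measured with dd and ignoring any caching:

```c
#include <stdio.h>

/* Iozone sequential-read throughput (MB/s), copied from the table. */
static const int clients[3] = {1, 2, 4};
static const double default_ra[3] = {180.0954, 324.4836, 591.5851};
static const double ra_256mb[3] = {645.3347, 1022.998, 1267.631};

/* Ratio of 256MB-rasize throughput to default-rasize throughput. */
double speedup(int i)
{
    return ra_256mb[i] / default_ra[i];
}

/* ASSUMPTION: one ~70 MB/s disk per OSD, nothing served from cache. */
double aggregate_disk_ceiling(int osds, double per_disk_mbs)
{
    return osds * per_disk_mbs;
}

void print_summary(void)
{
    double ceiling = aggregate_disk_ceiling(24, 70.0);

    for (int i = 0; i < 3; ++i)
        printf("%d client(s): %.2fx speedup, %.0f%% of %.0f MB/s ceiling\n",
               clients[i], speedup(i), 100.0 * ra_256mb[i] / ceiling,
               ceiling);
}
```

Under that assumption, the four-client result with a 256MB window is about 75% of the disk ceiling.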
Besides, since the Ceph backend is not a traditional local hard disk, it
is also beneficial to prefetch for strided reads. To demonstrate this, we
tested strided reads with the program below. As far as we know, the
generic readahead algorithm of the Linux kernel does not detect strided
access, so we use fadvise() to force prefetching manually. The record
size is 4MB. The result is even more surprising:
Stride read throughput (MB/s)

Records prefetched      0       1       4      16      64     128
Throughput          42.82  100.74  217.41  497.73  854.48  950.18
As the data show, with a readahead size of 128 x 4MB (512MB), throughput
is about 950/42.8 ≈ 22 times that without readahead (a speedup of more
than 2000%)!
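As a rough check of the window sizes involved (a sketch with hypothetical helper names, assuming the stride of 17 and the prefetch offsets pos + (i + 1) * stride * record used in the test program), the C below computes how many bytes are hinted per step, how far ahead the furthest hint reaches, and the 128-record speedup:

```c
/* Sizes in bytes; 1 MB = 2^20 bytes here. */
#define MB (1024LL * 1024LL)

/* Bytes hinted per step: `count` records of `record` bytes each. */
long long prefetched_bytes(long long count, long long record)
{
    return count * record;
}

/* Distance from the current position to the end of the furthest hinted
 * record: the last hint (i = count - 1) starts count * stride * record
 * bytes ahead and is one record long. */
long long lookahead_bytes(long long count, long long stride, long long record)
{
    return (count * stride + 1) * record;
}

/* Throughput ratio with prefetching versus without. */
double stride_speedup(double with_prefetch, double without_prefetch)
{
    return with_prefetch / without_prefetch;
}
```

With 128 records prefetched, 512MB of hints are spread over the next 8708MB (about 8.5GB) of the file.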
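The core logic of the test program is below:

    const long long stride = 17;      /* read 1 record, then skip 16 */
    const long long block = 4LL << 20; /* record size: 4MB */
    /* count: number of records prefetched (0, 1, 4, ... 128) */
    for (;;) {
        /* hint the kernel to prefetch the next `count` strided records */
        for (i = 0; i < count; ++i) {
            long long start = pos + (i + 1) * stride * block;
            printf("PRE READ %lld %lld\n", start, start + block);
            posix_fadvise(fd, start, block, POSIX_FADV_WILLNEED);
        }
        len = read(fd, buf, block);
        if (len <= 0)                  /* EOF or error */
            break;
        total += len;
        printf("READ %lld %lld\n", pos, pos + len);
        pos += len;
        /* skip the (stride - 1) records in between */
        lseek(fd, (stride - 1) * block, SEEK_CUR);
        pos += (stride - 1) * block;
    }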
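For anyone wanting to reproduce the strided test, here is a self-contained sketch of the loop above as a C function (the name stride_read() is hypothetical, and this is an illustration, not the original test program). It uses pread() with an explicit offset instead of the read()/lseek() pair and drops the logging:

```c
#define _POSIX_C_SOURCE 200809L /* posix_fadvise, pread, mkstemp */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one `block`-byte record, skip (stride - 1) records, repeat until
 * EOF. Before each read, hint the kernel to prefetch the next `count`
 * records on the stride. Returns total bytes read, or -1 on error. */
long long stride_read(int fd, long long block, long long stride, int count)
{
    char *buf = malloc((size_t)block);
    long long pos = 0, total = 0;

    if (buf == NULL)
        return -1;
    for (;;) {
        for (int i = 0; i < count; ++i) {
            off_t start = (off_t)(pos + (i + 1) * stride * block);
            /* Advisory only: a failed hint does not affect correctness. */
            (void)posix_fadvise(fd, start, (off_t)block, POSIX_FADV_WILLNEED);
        }
        ssize_t len = pread(fd, buf, (size_t)block, (off_t)pos);
        if (len < 0) {          /* I/O error */
            total = -1;
            break;
        }
        if (len == 0)           /* end of file */
            break;
        total += len;
        pos += stride * block;  /* jump to the next record on the stride */
    }
    free(buf);
    return total;
}
```

Because each pread() names its own offset, a short read cannot push later reads off the stride; the fadvise hints stay advisory, so their return values are ignored.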
Given these results and some others, we plan to submit a blueprint to
discuss prefetching optimizations for Ceph.
Cheers,
Li Wang
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: Read ahead affect Ceph read performance much
From: Andrey Korolyov @ 2013-07-29 13:00 UTC
To: Li Wang; +Cc: ceph-devel@vger.kernel.org, Sage Weil

Wow, very glad to hear that. I tried the regular FS tunable and there
was almost no effect on the regular test, so I thought reads could not
be improved at all in this direction.

On Mon, Jul 29, 2013 at 2:24 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> [...]

* Re: Read ahead affect Ceph read performance much
From: Mark Nelson @ 2013-07-29 14:48 UTC
To: Li Wang; +Cc: ceph-devel@vger.kernel.org, Sage Weil

On 07/29/2013 05:24 AM, Li Wang wrote:
> [...]
> As the data shown, a larger read ahead window could result in >300%
> speedup!

Very interesting! I've done some similar tests and saw somewhat
different results (in some cases I actually saw improvement with lower
readahead!). I suspect that this may be very hardware dependent. Were
you using RBD or CephFS? In either case, was it the kernel client or
userland (i.e. QEMU/KVM or FUSE)? Also, where did you adjust readahead?
Was this on the client volume or under the OSDs?

I've got to prepare for the talk later this week, but I will try to get
my readahead test results out soon as well.

> [...]
> Given the above results and some more, We plan to submit a blue print to
> discuss the prefetching optimization of Ceph.

Cool!

* RE: Read ahead affect Ceph read performance much
From: Chen, Xiaoxi @ 2013-07-31 4:42 UTC
To: Mark Nelson, Li Wang; +Cc: ceph-devel@vger.kernel.org, Sage Weil

My $0.02: we have done some readahead tuning tests on the server (Ceph
OSD) side, and the results show that we get the maximum read throughput
when readahead = 0.5 * object_size (object_size is 4MB by default, so
2MB). Readahead values larger than this generally do not help, but they
do not harm performance either.

In your case, it seems your (HPC) workload is fully sequential, so a
larger readahead and prefetching should help, but for RBD it is a bit
harder to do such tuning.

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Mark Nelson
Sent: Monday, July 29, 2013 10:49 PM
To: Li Wang
Cc: ceph-devel@vger.kernel.org; Sage Weil
Subject: Re: Read ahead affect Ceph read performance much
[...]
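A minimal sketch of applying that rule on an OSD host, assuming the readahead being tuned is the Linux block-device readahead of the OSD data disk, exposed as /sys/block/<dev>/queue/read_ahead_kb; the helper names and any device name are hypothetical:

```c
#include <stdio.h>

/* Readahead in KiB following the rule of thumb above:
 * readahead = 0.5 * object_size. */
long osd_readahead_kb(long object_size_bytes)
{
    return object_size_bytes / 2 / 1024;
}

/* Write a readahead value to a sysfs-style file such as
 * /sys/block/<dev>/queue/read_ahead_kb (needs root on a real system).
 * Returns 0 on success, -1 on error. */
int set_read_ahead_kb(const char *path, long kb)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", kb) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With the default 4MB object size this gives 2048 KiB (2MB).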

* Re: Read ahead affect Ceph read performance much
From: Li Wang @ 2013-07-31 15:27 UTC
To: Chen, Xiaoxi; +Cc: Mark Nelson, ceph-devel@vger.kernel.org, Sage Weil

We are tuning the prefetching window on the client side by specifying a
different 'rasize' at mount time. The workload we are using is iozone;
only the hardware is, to some extent, HPC-oriented.

We think the number of OSDs a file is stored across also affects
performance, since that determines how much room for optimization there
is: the more OSDs, the more performance potential to exploit. So maybe
you could try more OSDs. We would like to hear your further test
results.

Cheers,
Li Wang

On 07/31/2013 12:42 PM, Chen, Xiaoxi wrote:
> [...]
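To make the striping argument concrete, here is a sketch that counts how many RADOS objects a readahead window covers. It assumes a CephFS file layout with stripe_unit equal to object_size and stripe_count = 1, so object index = offset / object_size; which OSDs those objects land on is decided by CRUSH and is not modeled. The helper names are hypothetical:

```c
/* Objects touched by the byte range [off, off + len), ASSUMING
 * stripe_unit == object_size and stripe_count == 1. */
long long objects_in_range(long long off, long long len, long long object_size)
{
    if (len <= 0)
        return 0;
    long long first = off / object_size;
    long long last = (off + len - 1) / object_size;
    return last - first + 1;
}

/* Upper bound on distinct primary OSDs such a window can hit: each
 * object is placed independently, so at most min(objects, osds). The
 * actual number depends on CRUSH placement. */
long long max_osds_touched(long long objects, long long osds)
{
    return objects < osds ? objects : osds;
}
```

Under these assumptions a 256MB window at an aligned offset covers 64 objects and can reach all 24 OSDs, while a 4MB window touches at most two objects.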

* RE: Read ahead affect Ceph read performance much
From: Chen, Xiaoxi @ 2013-07-31 15:48 UTC
To: Li Wang; +Cc: Mark Nelson, ceph-devel@vger.kernel.org, Sage Weil

Hi Li Wang,

> We are tuning the prefetching window from the client side by specifying
> a different 'rasize' at mount time.

By "client", do you mean CephFS, or your own self-written client?

> We think how many OSDs is a file stored across also impact the
> performance, since that somehow determines how much optimization space
> are there. More OSDs, More performance potential to exploit, so maybe
> you could try more OSDs.

Not really, in general. In the HPC case you have few clients (4) but a
much larger backend (24 OSDs), so striping a file across many OSDs
generally does help. But on the RBD/EBS side, with enough concurrency
(#RBDs = #OSDs is common for this use case), storing a file across many
OSDs does not help nearly as much.

Actually, we get even higher throughput (normalized by the number of
disks) than yours: 1267/24 = 53MB/s for yours versus 2472/40 = 62MB/s
for mine, even with a much less sequential pattern. We are doing 64K
sequential reads from VMs; although the reads are sequential within any
single VM, with that much concurrency the access pattern on the OSD side
is not very sequential.

Thanks again for the information.

Xiaoxi

-----Original Message-----
From: Li Wang [mailto:liwang@ubuntukylin.com]
Sent: Wednesday, July 31, 2013 11:27 PM
To: Chen, Xiaoxi
Cc: Mark Nelson; ceph-devel@vger.kernel.org; Sage Weil
Subject: Re: Read ahead affect Ceph read performance much
[...]
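To illustrate both points, here is a sketch with hypothetical helper names: the per-disk normalization quoted above, and a toy model (an assumption, not Ceph's real request path) of why per-VM sequential reads look far less sequential at the OSD once many VMs are interleaved:

```c
/* Throughput normalized by the number of OSD disks. */
double per_disk_mbs(double total_mbs, int disks)
{
    return total_mbs / disks;
}

/* Toy model: `streams` VMs each read their own `region`-byte area
 * sequentially in `req`-byte requests, and the OSD receives the requests
 * interleaved round-robin. Returns the fraction of back-to-back requests
 * at the OSD that are contiguous (next starts where previous ended). */
double contiguous_fraction(int streams, int reqs_per_stream,
                           long long req, long long region)
{
    long long prev_end = -1;
    int contiguous = 0, total = 0;

    for (int r = 0; r < reqs_per_stream; ++r) {
        for (int s = 0; s < streams; ++s) {
            long long off = s * region + (long long)r * req;
            if (total > 0 && off == prev_end)
                ++contiguous;
            prev_end = off + req;
            ++total;
        }
    }
    return total > 1 ? (double)contiguous / (total - 1) : 1.0;
}
```

In this model a single 64K stream is fully contiguous while four interleaved streams are never contiguous; real OSD-side patterns would likely fall somewhere in between.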