* dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Richard Sharpe @ 2012-01-27 1:06 UTC (permalink / raw)
To: dm-devel
Hi,
Perhaps I am doing something stupid, but I would like to understand
why there is a difference in the following situation.
I have defined a stripe device thusly:
"echo 0 17560535040 striped 9 8 /dev/sdd 0 /dev/sde 0 /dev/sdf 0
/dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0 /dev/sdk 0 /dev/sdl 0 |
dmsetup create stripe_dev"
Then I did the following:
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
and I got 880 MB/s
However, when I changed that command to:
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
oflag=direct
I get 210 MB/s reliably.
The system in question is a 16-core (probably two CPUs) Intel Xeon
E5620 @ 2.40GHz with 64GB of memory and 12 7200 RPM SATA drives
connected to an LSI SAS controller but set up as a JBOD of 12 drives.
Why do I see such a big performance difference? Does writing to the
device also use the page cache if I don't specify DIRECT IO?
--
Regards,
Richard Sharpe
* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Hannes Reinecke @ 2012-01-27 6:54 UTC (permalink / raw)
To: device-mapper development

On 01/27/2012 02:06 AM, Richard Sharpe wrote:
> Hi,
>
> Perhaps I am doing something stupid, but I would like to understand
> why there is a difference in the following situation.
>
> I have defined a stripe device thusly:
>
> "echo 0 17560535040 striped 9 8 /dev/sdd 0 /dev/sde 0 /dev/sdf 0
> /dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0 /dev/sdk 0 /dev/sdl 0 |
> dmsetup create stripe_dev"
>
> Then I did the following:
>
> dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
>
> and I got 880 MB/s
>
> However, when I changed that command to:
>
> dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
> oflag=direct
>
> I get 210 MB/s reliably.
>
> The system in question is a 16-core (probably two CPUs) Intel Xeon
> E5620 @ 2.40GHz with 64GB of memory and 12 7200 RPM SATA drives
> connected to an LSI SAS controller but set up as a JBOD of 12 drives.
>
> Why do I see such a big performance difference? Does writing to the
> device also use the page cache if I don't specify DIRECT IO?
>
Yes. All I/O issued through plain read/write calls goes via the page
cache. The only way to circumvent this is to use direct I/O (O_DIRECT).

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

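A rough way to see the page cache at work here (a sketch, assuming the sysstat package is installed for iostat; device and sizes as in the original mail):

  watch -n1 "grep -E 'Dirty|Writeback' /proc/meminfo"   # dirty pages balloon during the buffered run
  iostat -xm 2                                          # per-disk write throughput while each dd runs

With the buffered run, dd can report 880 MB/s while much of the data is still sitting dirty in memory; with oflag=direct each 256 KiB write must reach the disks before dd issues the next one, so there is never a deep queue of I/O to keep all nine spindles streaming.
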
* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Christoph Hellwig @ 2012-01-27 8:52 UTC (permalink / raw)
To: device-mapper development

On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
> Why do I see such a big performance difference? Does writing to the
> device also use the page cache if I don't specify DIRECT IO?

Yes. Try adding conv=fdatasync to both versions to get more
realistic results.

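Spelled out, that would be something like (same device and sizes as in the original mail):

  dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000 conv=fdatasync
  dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000 oflag=direct conv=fdatasync

conv=fdatasync makes dd call fdatasync() once at the end and include it in the reported time, so the buffered figure no longer gets credit for data still sitting in the page cache.
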
* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Richard Sharpe @ 2012-01-27 15:03 UTC (permalink / raw)
To: device-mapper development

On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>> Why do I see such a big performance difference? Does writing to the
>> device also use the page cache if I don't specify DIRECT IO?
>
> Yes. Try adding conv=fdatasync to both versions to get more
> realistic results.

Thank you for that advice. I am comparing btrfs against rolling my own
thing using the new dm thin-provisioning approach to get something
with resilient metadata, but I need to support two different types of
I/O: one that uses direct I/O and one that can take advantage of the
page cache.

So far, btrfs gives me around 800 MB/s with a similar setup (I can't
get exactly the same setup) without direct I/O and 450 MB/s with
direct I/O. A dm striped setup is giving me about 10% better
throughput without direct I/O but only about 45% of the performance
with direct I/O.

Anyway, I now understand. I will run my scripts with conv=fdatasync as
well.

--
Regards,
Richard Sharpe

* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Zdenek Kabelac @ 2012-01-27 15:16 UTC (permalink / raw)
To: dm-devel

On 27.1.2012 16:03, Richard Sharpe wrote:
> On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>>> Why do I see such a big performance difference? Does writing to the
>>> device also use the page cache if I don't specify DIRECT IO?
>>
>> Yes. Try adding conv=fdatasync to both versions to get more
>> realistic results.
>
> Thank you for that advice. I am comparing btrfs against rolling my own
> thing using the new dm thin-provisioning approach to get something
> with resilient metadata, but I need to support two different types of
> I/O: one that uses direct I/O and one that can take advantage of the
> page cache.
>
> So far, btrfs gives me around 800 MB/s with a similar setup (I can't
> get exactly the same setup) without direct I/O and 450 MB/s with
> direct I/O. A dm striped setup is giving me about 10% better
> throughput without direct I/O but only about 45% of the performance
> with direct I/O.
>

You mentioned you are using a thinp device with striping - do you have
the stripes properly aligned to the data-block-size of the thinp
device? (I think 9 disks are probably quite hard to align on a 3.2
kernel, since the data block size needs to be a power of 2 - I think
3.3 will have this relaxed to a page-size boundary.)

Zdenek

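One way to check what alignment the kernel is actually advertising (a sketch using stock sysfs/util-linux interfaces; it assumes the usual /dev/mapper symlink to /dev/dm-N):

  lsblk -t /dev/mapper/stripe_dev     # MIN-IO / OPT-IO columns: chunk size and full-stripe size
  dev=$(basename $(readlink -f /dev/mapper/stripe_dev))
  cat /sys/block/$dev/queue/minimum_io_size /sys/block/$dev/queue/optimal_io_size

For the 9-disk, 8-sector-chunk table above this should report a 4 KiB minimum and a 36 KiB optimal I/O size - and no power-of-2 thinp data block size can be a whole multiple of 36 KiB.
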
* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Richard Sharpe @ 2012-01-27 15:28 UTC (permalink / raw)
To: device-mapper development

On Fri, Jan 27, 2012 at 7:16 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> On 27.1.2012 16:03, Richard Sharpe wrote:
>> On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
>>> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>>>> Why do I see such a big performance difference? Does writing to the
>>>> device also use the page cache if I don't specify DIRECT IO?
>>>
>>> Yes. Try adding conv=fdatasync to both versions to get more
>>> realistic results.
>>
>> Thank you for that advice. I am comparing btrfs against rolling my own
>> thing using the new dm thin-provisioning approach to get something
>> with resilient metadata, but I need to support two different types of
>> I/O: one that uses direct I/O and one that can take advantage of the
>> page cache.
>>
>> So far, btrfs gives me around 800 MB/s with a similar setup (I can't
>> get exactly the same setup) without direct I/O and 450 MB/s with
>> direct I/O. A dm striped setup is giving me about 10% better
>> throughput without direct I/O but only about 45% of the performance
>> with direct I/O.
>
> You mentioned you are using a thinp device with striping - do you have
> the stripes properly aligned to the data-block-size of the thinp
> device? (I think 9 disks are probably quite hard to align on a 3.2
> kernel, since the data block size needs to be a power of 2 - I think
> 3.3 will have this relaxed to a page-size boundary.)

Actually, so far I have not used any thinp devices, since from reading
the documentation it seemed that, for what I am doing, I need to give
thinp a mirrored device for its metadata and a striped device for its
data, so I thought I would try just a striped device.

Actually, I can cut that back to 8 devices in the stripe. I am using
4kiB block sizes and writing 256kiB blocks in the dd requests, and
there is no parity involved, so there should be no read-modify-write
cycles.

I imagine that if I push the write sizes up to a MB or more at a time,
throughput will get better, because at the moment each device is being
given 32kiB or 16kiB (a few devices) with direct I/O, and with a larger
write size they will get more data at a time.

--
Regards,
Richard Sharpe

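A quick way to test that larger-write hunch directly (just a sketch; the count is arbitrary, and the block size is picked as a multiple of the full stripe, 9 disks x 4 KiB chunk = 36 KiB, so alignment does not muddy the comparison):

  dd if=/dev/zero of=/dev/mapper/stripe_dev bs=1179648 count=20000 oflag=direct   # 1152 KiB = 32 full 36 KiB stripes
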
* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Zdenek Kabelac @ 2012-01-27 17:24 UTC (permalink / raw)
To: device-mapper development

On 27.1.2012 16:28, Richard Sharpe wrote:
> On Fri, Jan 27, 2012 at 7:16 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>> On 27.1.2012 16:03, Richard Sharpe wrote:
>>> On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
>>>> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>>>>> Why do I see such a big performance difference? Does writing to the
>>>>> device also use the page cache if I don't specify DIRECT IO?
>>>>
>>>> Yes. Try adding conv=fdatasync to both versions to get more
>>>> realistic results.
>>>
>>> Thank you for that advice. I am comparing btrfs against rolling my own
>>> thing using the new dm thin-provisioning approach to get something
>>> with resilient metadata, but I need to support two different types of
>>> I/O: one that uses direct I/O and one that can take advantage of the
>>> page cache.
>>>
>>> So far, btrfs gives me around 800 MB/s with a similar setup (I can't
>>> get exactly the same setup) without direct I/O and 450 MB/s with
>>> direct I/O. A dm striped setup is giving me about 10% better
>>> throughput without direct I/O but only about 45% of the performance
>>> with direct I/O.
>>
>> You mentioned you are using a thinp device with striping - do you have
>> the stripes properly aligned to the data-block-size of the thinp
>> device? (I think 9 disks are probably quite hard to align on a 3.2
>> kernel, since the data block size needs to be a power of 2 - I think
>> 3.3 will have this relaxed to a page-size boundary.)
>
> Actually, so far I have not used any thinp devices, since from reading
> the documentation it seemed that, for what I am doing, I need to give
> thinp a mirrored device for its metadata and a striped device for its
> data, so I thought I would try just a striped device.
>
> Actually, I can cut that back to 8 devices in the stripe. I am using
> 4kiB block sizes and writing 256kiB blocks in the dd requests, and
> there is no parity involved, so there should be no read-modify-write
> cycles.
>
> I imagine that if I push the write sizes up to a MB or more at a time,
> throughput will get better, because at the moment each device is being
> given 32kiB or 16kiB (a few devices) with direct I/O, and with a larger
> write size they will get more data at a time.
>

Well, I cannot tell how big an influence proper alignment has in your
case, but it would be good to measure it. Do you use a data_block_size
equal to the stripe size (256KiB, i.e. 512 sectors)?

Zdenek

* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Richard Sharpe @ 2012-01-27 17:48 UTC (permalink / raw)
To: device-mapper development

On Fri, Jan 27, 2012 at 9:24 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> On 27.1.2012 16:28, Richard Sharpe wrote:
>> Actually, so far I have not used any thinp devices, since from reading
>> the documentation it seemed that, for what I am doing, I need to give
>> thinp a mirrored device for its metadata and a striped device for its
>> data, so I thought I would try just a striped device.
>>
>> Actually, I can cut that back to 8 devices in the stripe. I am using
>> 4kiB block sizes and writing 256kiB blocks in the dd requests, and
>> there is no parity involved, so there should be no read-modify-write
>> cycles.
>>
>> I imagine that if I push the write sizes up to a MB or more at a time,
>> throughput will get better, because at the moment each device is being
>> given 32kiB or 16kiB (a few devices) with direct I/O, and with a larger
>> write size they will get more data at a time.
>
> Well, I cannot tell how big an influence proper alignment has in your
> case, but it would be good to measure it. Do you use a data_block_size
> equal to the stripe size (256KiB, i.e. 512 sectors)?

I suspect not :-) However, I am not sure what you are asking. I believe
that the stripe size is 9 * 8 * 512B, or 36kiB, because I think I told
it to use 8 sectors per device. This might be sub-optimal.

Based on that, I think it will take my write blocks of 256kiB and write
sectors that are (offset/512 + 256) mod 9 = {0, 1, 2, ... 8} to
{disk 0, disk 1, disk 2, ... disk 8}.

If I wanted perfectly stripe-aligned writes then I think I should write
something like 32*9kiB rather than the 32*8kiB I am currently writing.

Is that what you are asking me?

--
Regards,
Richard Sharpe

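For what it's worth, the arithmetic from the numbers in this thread (8-sector chunks, 9 stripes) works out as:

  chunk        = 8 x 512 B            = 4 KiB per disk
  full stripe  = 9 x 4 KiB            = 36 KiB
  current bs   = 262144 B (256 KiB)   = 7 full stripes + 4 KiB left over
  aligned bs   = 32 x 9 KiB (288 KiB) = exactly 8 full stripes

so a block size that is a multiple of 36 KiB (288 KiB, 1152 KiB, ...) would make every dd write start and end on a stripe boundary.
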
* Re: dd to a striped device with 9 disks gets much lower throughput when oflag=direct used
From: Zdenek Kabelac @ 2012-01-27 18:06 UTC (permalink / raw)
To: dm-devel

On 27.1.2012 18:48, Richard Sharpe wrote:
> On Fri, Jan 27, 2012 at 9:24 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>> On 27.1.2012 16:28, Richard Sharpe wrote:
>>> Actually, so far I have not used any thinp devices, since from reading
>>> the documentation it seemed that, for what I am doing, I need to give
>>> thinp a mirrored device for its metadata and a striped device for its
>>> data, so I thought I would try just a striped device.
>>>
>>> Actually, I can cut that back to 8 devices in the stripe. I am using
>>> 4kiB block sizes and writing 256kiB blocks in the dd requests, and
>>> there is no parity involved, so there should be no read-modify-write
>>> cycles.
>>>
>>> I imagine that if I push the write sizes up to a MB or more at a time,
>>> throughput will get better, because at the moment each device is being
>>> given 32kiB or 16kiB (a few devices) with direct I/O, and with a larger
>>> write size they will get more data at a time.
>>
>> Well, I cannot tell how big an influence proper alignment has in your
>> case, but it would be good to measure it. Do you use a data_block_size
>> equal to the stripe size (256KiB, i.e. 512 sectors)?
>
> I suspect not :-) However, I am not sure what you are asking. I believe
> that the stripe size is 9 * 8 * 512B, or 36kiB, because I think I told
> it to use 8 sectors per device. This might be sub-optimal.
>
> Based on that, I think it will take my write blocks of 256kiB and write
> sectors that are (offset/512 + 256) mod 9 = {0, 1, 2, ... 8} to
> {disk 0, disk 1, disk 2, ... disk 8}.
>
> If I wanted perfectly stripe-aligned writes then I think I should write
> something like 32*9kiB rather than the 32*8kiB I am currently writing.
>
> Is that what you are asking me?
>

There are surely a number of things to test to get optimal performance
from a striped array, and you will probably need to run several
experiments yourself to figure out the best settings.

I'd suggest using 32KiB on each disk and combining them (8 x 32KiB) into
a 256KiB stripe, then using a data_block_size of 512 for the thinp
creation.

You may as well try just 4KiB on each drive, giving a 64KiB stripe, and
use 128 blocks as the data_block_size for thinp.

For 9 disks it's hard to say what the 'optimal' number is with a 3.2
kernel and thinp - so it will need some playtime. Maybe 32KiB on each
disk, with a 128KiB data_block_size on the 288KiB stripe. (Though the
data block size heavily depends on the use case.)

Zdenek

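Roughly what the first suggestion would look like as dmsetup tables - a sketch only: the length simply scales the original 9-disk figure down to 8 disks (8/9 x 17560535040 = 15609364480 sectors), meta_dev stands for whatever mirrored metadata device gets used, and the authoritative table syntax is in Documentation/device-mapper/thin-provisioning.txt:

  # 8-way stripe, 32 KiB (64-sector) chunks -> 256 KiB full stripe
  echo "0 15609364480 striped 8 64 /dev/sdd 0 /dev/sde 0 /dev/sdf 0 /dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0 /dev/sdk 0" | dmsetup create stripe_dev

  # thin pool on top: data_block_size 512 sectors = 256 KiB, matching the full stripe;
  # 32768 is an arbitrary low-water mark, counted in data blocks
  echo "0 15609364480 thin-pool /dev/mapper/meta_dev /dev/mapper/stripe_dev 512 32768" | dmsetup create pool_dev
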