* AWFUL reshape speed with raid5.
@ 2008-07-28 17:39 Jon Nelson
2008-07-28 18:14 ` Justin Piszcz
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Jon Nelson @ 2008-07-28 17:39 UTC (permalink / raw)
To: LinuxRaid
I built a raid5 with 2 devices (and --assume-clean) using 2x 4GB
partitions (not logical volumes).
I then grew it to 3 devices.
The reshape speed is really really slow.
vmstat shows I/O like this:
0 0 212 25844 141160 497484 0 0 0 612 673 1284 0 6 93 0
0 0 212 25164 141160 497748 0 0 0 19 594 1253 1 4 95 0
0 0 212 25044 141160 498004 0 0 0 0 374 445 0 1 99 0
1 0 212 25220 141164 498000 0 0 0 23 506 1149 0 3 96 1
0 0 212 25500 141164 498004 0 0 0 3 546 1416 0 5 95 0
The min/max is 1000/200000.
What might be going on here?
Kernel is 2.6.25.11 (openSUSE 11.0 x86-64 stock)
/proc/mdstat for this entry:
md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[=>...................] reshape = 8.2% (324224/3903744)
finish=43.3min speed=1373K/sec
This is on a set of devices capable of 70+ MB/s.
No meaningful change if I start with 3 disks and grow to 4, with or
without bitmap.
--
Jon
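
For reference, the sequence Jon describes boils down to something like the following sketch (device and array names are the ones that appear later in the thread; --metadata and --chunk options can be added as in the later tests):

mdadm --create /dev/md99 --level=raid5 --raid-devices=2 --assume-clean /dev/sdb3 /dev/sdc3
mdadm --add /dev/md99 /dev/sdd3
mdadm --grow /dev/md99 --raid-devices=3
cat /proc/mdstat        # watch the reshape progress and reported speed
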
* Re: AWFUL reshape speed with raid5.
2008-07-28 17:39 AWFUL reshape speed with raid5 Jon Nelson
@ 2008-07-28 18:14 ` Justin Piszcz
2008-07-28 18:24 ` Jon Nelson
2008-08-01 1:26 ` Neil Brown
2008-08-21 2:58 ` Jon Nelson
2 siblings, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2008-07-28 18:14 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

What happens if you use 0.90 superblocks?

Also, are these sata ports on the mobo or sata ports on a pci-based mobo?

On Mon, 28 Jul 2008, Jon Nelson wrote:

> I built a raid5 with 2 devices (and --assume-clean) using 2x 4GB
> partitions (not logical volumes).
> I then grew it to 3 devices.
> The reshape speed is really really slow.
>
> vmstat shows I/O like this:
>
> 0 0 212 25844 141160 497484 0 0 0 612 673 1284 0 6 93 0
> 0 0 212 25164 141160 497748 0 0 0 19 594 1253 1 4 95 0
> 0 0 212 25044 141160 498004 0 0 0 0 374 445 0 1 99 0
> 1 0 212 25220 141164 498000 0 0 0 23 506 1149 0 3 96 1
> 0 0 212 25500 141164 498004 0 0 0 3 546 1416 0 5 95 0
>
> The min/max is 1000/200000.
> What might be going on here?
>
> Kernel is 2.6.25.11 (openSUSE 11.0 x86-64 stock)
>
> /proc/mdstat for this entry:
>
> md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
> 3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> [=>...................] reshape = 8.2% (324224/3903744)
> finish=43.3min speed=1373K/sec
>
> This is on a set of devices capable of 70+ MB/s.
>
> No meaningful change if I start with 3 disks and grow to 4, with or
> without bitmap.
>
> --
> Jon

* Re: AWFUL reshape speed with raid5.
2008-07-28 18:14 ` Justin Piszcz
@ 2008-07-28 18:24 ` Jon Nelson
2008-07-28 18:55 ` Jon Nelson
0 siblings, 1 reply; 20+ messages in thread
From: Jon Nelson @ 2008-07-28 18:24 UTC (permalink / raw)
To: Justin Piszcz; +Cc: LinuxRaid

On Mon, Jul 28, 2008 at 1:14 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> What happens if you use 0.90 superblocks?

An even more spectacular explosion. Really quite impressive, actually.
Just a small sample of the >800 lines:

Jul 28 13:18:54 turnip kernel: itmap 9: invalid bitmap page 9: invalid bitmap page 9: invalid bitmap p9: invalid bitmap pag9: invalid bitmap 9: invalid bitmap page9: invalid bitmap pa9: invalid bitmap pag9: invalid bitmap 9: invalid bitmap 9: invalid bitmap pag9: invalid bitmap page9: invalid bitmap 9: invalid bitmap 9: invalid bitmap page9: invalid bitmap pa9: invalid bitmap pag9: invalid bitmap pa9: invalid bitmap p9: invalid bitmap page 9: invalid bitmap page 9: invalid bitmap p9: invalid bitmap 9: invalid bitmap page9: invalid bitmap pag9: invalid bitmap pag9: invalid bitmap page9: invalid bitmap 9: invalid bitmap pag9: invalid bitmap page9: invalid bitmap pag9: invalid bitmap9: invalid bitmap page9: invalid bitmap pag9: invalid bitmap pa9: invalid bitmap page9: invalid bitmap pa9: invalid bitmap p9: invalid bitmap page 9: invalid bitmap page 9: invalid bit9: invalid bitmap pa9: invalid bitmap page 9: invalid bitmap pag9: invalid bitmap page9: invalid bitmap 9: invalid bitmap pa9: invalid bitmap page9: invalid

(Note the page -> pag -> p ... variations/corruption/whatever).

and

Jul 28 13:18:55 turnip kernel: i9: i9: 9: 9: i9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: i9: i9: 9: i9: 9: i9: i9: 9: 9: i9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: i9: i9: 9: in9: 9: 9: 9: 9: 9: i9: 9: i9: 9: i9: 9: i9: i9: 9: 9: 9: 9: 9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: i9: 9: 9: 9: 9: 9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9:
Jul 28 13:18:55 turnip kernel: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: i9: 9: 9: 9: 9: 9: 9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: i9: 9: i9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: 9: invalid bitmap page request: 222 (> 122)

> Also, are these sata ports on the mobo or sata ports on a pci-based
> mobo?

The ports are on a PCI-E motherboard, MCP55 chipset. The
motherboard/chipset/whatever is *not* the problem.

--
Jon

* Re: AWFUL reshape speed with raid5.
2008-07-28 18:24 ` Jon Nelson
@ 2008-07-28 18:55 ` Jon Nelson
2008-07-28 19:17 ` Roger Heflin
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Jon Nelson @ 2008-07-28 18:55 UTC (permalink / raw)
To: Justin Piszcz; +Cc: LinuxRaid

Some more data points, observations, and questions.

For each test, I'd --create the array, drop the caches, --grow, and
then watch vmstat and also record the time between

kernel: md: resuming resync of md99 from checkpoint.
and
kernel: md: md99: resync done.

I found two things:

1. metadata version matters. Why?
2. VERY LITTLE I/O takes place (between 0 and 100KB/s, typically no
I/O at all) according to vmstat. Why? If it takes 1m34s to "grow" the
array, but no I/O is taking place, then what is actually taking so
long?
3. I removed the bitmap for these tests. Having a bitmap meant that
the overall speed was REALLY HORRIBLE.

The results:

metadata: time taken

0.9: 27s
1.0: 27s
1.1: 37s
1.2: 1m34s

Questions (repeated):

1. Why does the metadata version matter so much?
2. If no I/O is taking place, why does it take so long? [ NOTE: I/O
must be taking place but why doesn't vmstat show it? ]

--
Jon

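A rough sketch of how the timing comparison above could be scripted is shown here. The array size, device names, and the use of --grow --size for the test are assumptions based on later messages in the thread, so treat this as an outline rather than the exact procedure used:

for meta in 0.90 1.0 1.1 1.2; do
    mdadm --create /dev/md99 --level=raid5 --raid-devices=3 --assume-clean \
          --metadata=$meta --size=2000000 /dev/sdb3 /dev/sdc3 /dev/sdd3
    echo 3 > /proc/sys/vm/drop_caches
    mdadm --grow /dev/md99 --size=max
    # wait for the resync to finish (watch /proc/mdstat), then time the gap
    # between "md: resuming resync of md99 from checkpoint" and
    # "md: md99: resync done." in the kernel log
    dmesg | grep md99
    mdadm --stop /dev/md99
done
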
* Re: AWFUL reshape speed with raid5.
2008-07-28 18:55 ` Jon Nelson
@ 2008-07-28 19:17 ` Roger Heflin
2008-07-28 19:43 ` Justin Piszcz
2008-07-30 16:23 ` Bill Davidsen
2 siblings, 0 replies; 20+ messages in thread
From: Roger Heflin @ 2008-07-28 19:17 UTC (permalink / raw)
To: Jon Nelson; +Cc: Justin Piszcz, LinuxRaid

Jon Nelson wrote:
> Some more data points, observations, and questions.
>
> For each test, I'd --create the array, drop the caches, --grow, and
> then watch vmstat and also record the time between
>
> kernel: md: resuming resync of md99 from checkpoint.
> and
> kernel: md: md99: resync done.
>
> I found two things:
>
> 1. metadata version matters. Why?
> 2. VERY LITTLE I/O takes place (between 0 and 100KB/s, typically no
> I/O at all) according to vmstat. Why? If it takes 1m34s to "grow" the
> array, but no I/O is taking place, then what is actually taking so
> long?

I *think* that internal md io is not being shown.

I know I can tell an array to check itself, and have mdstat indicate a
speed of 35MB/second while vmstat indicates no IO was happening. The
same happens when an array is rebuilding, vmstat indicates no IO. If
you do IO to the md device from outside it does show that. And in both
cases visually checking the drives confirms that quite a lot appears
to be going on.

Roger

* Re: AWFUL reshape speed with raid5.
2008-07-28 18:55 ` Jon Nelson
2008-07-28 19:17 ` Roger Heflin
@ 2008-07-28 19:43 ` Justin Piszcz
2008-07-28 19:59 ` David Lethe
2008-07-30 16:23 ` Bill Davidsen
2 siblings, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2008-07-28 19:43 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

There once was a bug in an earlier kernel, in which the min_speed is what
the rebuild ran at if you had a specific chunk size, have you tried to
echo 30000 > to min_speed? Does it increase it to 30mb/s for the rebuild?

On Mon, 28 Jul 2008, Jon Nelson wrote:

> Some more data points, observations, and questions.
>
> For each test, I'd --create the array, drop the caches, --grow, and
> then watch vmstat and also record the time between
>
> kernel: md: resuming resync of md99 from checkpoint.
> and
> kernel: md: md99: resync done.
>
> I found two things:
>
> 1. metadata version matters. Why?
> 2. VERY LITTLE I/O takes place (between 0 and 100KB/s, typically no
> I/O at all) according to vmstat. Why? If it takes 1m34s to "grow" the
> array, but no I/O is taking place, then what is actually taking so
> long?
> 3. I removed the bitmap for these tests. Having a bitmap meant that
> the overall speed was REALLY HORRIBLE.
>
> The results:
>
> metadata: time taken
>
> 0.9: 27s
> 1.0: 27s
> 1.1: 37s
> 1.2: 1m34s
>
> Questions (repeated):
>
> 1. Why does the metadata version matter so much?
> 2. If no I/O is taking place, why does it take so long? [ NOTE: I/O
> must be taking place but why doesn't vmstat show it? ]
>
> --
> Jon

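The knobs Justin is referring to are the md rebuild speed limits (values in KB/s); the per-array sysfs name appears later in the thread. A quick sketch of how to raise the minimum and check the effect:

echo 30000 > /proc/sys/dev/raid/speed_limit_min
echo 30000 > /sys/block/md99/md/sync_speed_min
cat /proc/mdstat        # see whether the reported rebuild speed rises to ~30 MB/s
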
* RE: AWFUL reshape speed with raid5.
2008-07-28 19:43 ` Justin Piszcz
@ 2008-07-28 19:59 ` David Lethe
2008-07-28 20:56 ` Jon Nelson
0 siblings, 1 reply; 20+ messages in thread
From: David Lethe @ 2008-07-28 19:59 UTC (permalink / raw)
To: Justin Piszcz, Jon Nelson; +Cc: LinuxRaid

>-----Original Message-----
>From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Justin Piszcz
>Sent: Monday, July 28, 2008 2:44 PM
>To: Jon Nelson
>Cc: LinuxRaid
>Subject: Re: AWFUL reshape speed with raid5.
>
>There once was a bug in an earlier kernel, in which the min_speed is what
>the rebuild ran at if you had a specific chunk size, have you tried to
>echo 30000 > to min_speed? Does it increase it to 30mb/s for the rebuild?
>
>On Mon, 28 Jul 2008, Jon Nelson wrote:
>
>> Some more data points, observations, and questions.
>>
>> For each test, I'd --create the array, drop the caches, --grow, and
>> then watch vmstat and also record the time between
>>
>> kernel: md: resuming resync of md99 from checkpoint.
>> and
>> kernel: md: md99: resync done.
>>
>> I found two things:
>>
>> 1. metadata version matters. Why?
>> 2. VERY LITTLE I/O takes place (between 0 and 100KB/s, typically no
>> I/O at all) according to vmstat. Why? If it takes 1m34s to "grow" the
>> array, but no I/O is taking place, then what is actually taking so
>> long?
>> 3. I removed the bitmap for these tests. Having a bitmap meant that
>> the overall speed was REALLY HORRIBLE.
>>
>> The results:
>>
>> metadata: time taken
>>
>> 0.9: 27s
>> 1.0: 27s
>> 1.1: 37s
>> 1.2: 1m34s
>>
>> Questions (repeated):
>>
>> 1. Why does the metadata version matter so much?
>> 2. If no I/O is taking place, why does it take so long? [ NOTE: I/O
>> must be taking place but why doesn't vmstat show it? ]
>>
>> --
>> Jon
>>

You are incorrectly working from the premise that vmstat measures disk
activity. It does not. Vmstat has no idea how many actual bytes get
sent to, or received from, disk drives.

Why not do a real test and hook up a pair of SAS, SCSI, or FC disks,
then issue some LOG SENSE commands to report the actual number of bytes
read & written to each disk during the rebuild? If the disks are
FibreChannel, then you have even more ways to measure true throughput
in bytes. It will not be an estimate, it will be a real count of
cumulative bytes read, written, re-read/re-written, recovered, etc.,
for any instant in time. Heck, if you have Seagate and some other
disks, then you can even see detailed information for cached reads so
you can see if any particular md configuration results in a higher
number of cached I/Os, meaning greater efficiency and smaller overall
latency.

David @ santools dot com

David

* Re: AWFUL reshape speed with raid5.
2008-07-28 19:59 ` David Lethe
@ 2008-07-28 20:56 ` Jon Nelson
0 siblings, 0 replies; 20+ messages in thread
From: Jon Nelson @ 2008-07-28 20:56 UTC (permalink / raw)
To: David Lethe; +Cc: Justin Piszcz, LinuxRaid

On Mon, Jul 28, 2008 at 2:59 PM, David Lethe <david@santools.com> wrote:
>>-----Original Message-----
>>From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Justin Piszcz
>>Sent: Monday, July 28, 2008 2:44 PM
>>To: Jon Nelson
>>Cc: LinuxRaid
>>Subject: Re: AWFUL reshape speed with raid5.
>>
>>There once was a bug in an earlier kernel, in which the min_speed is
> what
>>the rebuild ran at if you had a specific chunk size, have you tried to
>>echo 30000 > to min_speed? Does it increase it to 30mb/s for the
> rebuild?

As I said in my original post, I'm running 2.6.25.11

> You are incorrectly working from the premise that vmstat measures
> disk activity.

Fair enough.

> Why not do a real test and hook up a pair of SAS, SCSI, or FC disks,
...

Cost.

--
Jon

* Re: AWFUL reshape speed with raid5.
2008-07-28 18:55 ` Jon Nelson
2008-07-28 19:17 ` Roger Heflin
2008-07-28 19:43 ` Justin Piszcz
@ 2008-07-30 16:23 ` Bill Davidsen
2008-07-30 16:31 ` Jon Nelson
2008-07-30 16:50 ` David Greaves
2 siblings, 2 replies; 20+ messages in thread
From: Bill Davidsen @ 2008-07-30 16:23 UTC (permalink / raw)
To: Jon Nelson; +Cc: Justin Piszcz, LinuxRaid

Jon Nelson wrote:
> 1. metadata version matters. Why?
> 2. VERY LITTLE I/O takes place (between 0 and 100KB/s, typically no
> I/O at all) according to vmstat. Why? If it takes 1m34s to "grow" the
> array, but no I/O is taking place, then what is actually taking so
> long?
> 3. I removed the bitmap for these tests. Having a bitmap meant that
> the overall speed was REALLY HORRIBLE.
>
> The results:
>
> metadata: time taken
>
> 0.9: 27s
> 1.0: 27s
> 1.1: 37s
> 1.2: 1m34s
>
> Questions (repeated):
>
> 1. Why does the metadata version matter so much?
>
I have no idea.

> 2. If no I/O is taking place, why does it take so long? [ NOTE: I/O
> must be taking place but why doesn't vmstat show it? ]
>
vmstat doesn't tell you enough, you need a tool to show per-device and
per-partition io, which will give you what you need. I can't put a
finger on the one I wrote, but there are others.

--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark

* Re: AWFUL reshape speed with raid5.
2008-07-30 16:23 ` Bill Davidsen
@ 2008-07-30 16:31 ` Jon Nelson
2008-07-30 17:08 ` Justin Piszcz
2008-07-30 16:50 ` David Greaves
1 sibling, 1 reply; 20+ messages in thread
From: Jon Nelson @ 2008-07-30 16:31 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Justin Piszcz, LinuxRaid

On Wed, Jul 30, 2008 at 11:23 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> Jon Nelson wrote:
>> 2. If no I/O is taking place, why does it take so long? [ NOTE: I/O
>> must be taking place but why doesn't vmstat show it? ]
>>
>
> vmstat doesn't tell you enough, you need a tool to show per-device and
> per-partition io, which will give you what you need. I can't put a finger on
> the one I wrote, but there are others.

I gave dstat a try (actually, I rather prefer dstat over vmstat...):

This is right before, during, and after the --grow operation.

--dsk/sdb-----dsk/sdc-----dsk/sdd-----dsk/sde----dsk/total-
read writ: read writ: read writ: read writ: read writ
0 44k: 0 24k: 0 24k: 0 0 : 0 184k
0 0 : 0 0 : 0 0 : 0 0 : 0 0
0 24k: 0 0 : 0 0 : 0 0 : 0 48k
32M 14k: 32M 2048B: 32M 2048B: 0 0 : 191M 36k
63M 0 : 63M 0 : 63M 0 : 0 0 : 377M 0
65M 0 : 65M 0 : 65M 0 : 0 0 : 391M 0
72M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
74M 0 : 73M 0 : 74M 0 : 0 0 : 441M 0
70M 48k: 70M 0 : 70M 0 : 0 0 : 419M 96k
61M 44k: 61M 16k: 62M 32k: 0 0 : 368M 184k
71M 0 : 72M 0 : 71M 0 : 0 0 : 429M 0
74M 0 : 73M 0 : 73M 0 : 0 0 : 439M 0
73M 0 : 73M 0 : 73M 0 : 0 0 : 439M 0
71M 20k: 71M 0 : 71M 0 : 0 0 : 426M 40k
72M 0 : 72M 0 : 73M 0 : 0 0 : 434M 0
73M 0 : 74M 0 : 73M 0 : 0 0 : 442M 0
60M 40k: 59M 16k: 59M 28k: 0 0 : 356M 168k
73M 0 : 73M 0 : 73M 0 : 0 0 : 438M 0
70M 24k: 69M 0 : 70M 0 : 0 0 : 418M 48k
72M 0 : 71M 0 : 72M 0 : 0 0 : 430M 0
73M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
71M 0 : 71M 0 : 71M 0 : 0 0 : 427M 0
73M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
71M 24k: 71M 0 : 71M 0 : 0 0 : 428M 48k
72M 0 : 72M 0 : 72M 0 : 0 0 : 432M 0
73M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
71M 4096B: 70M 0 : 70M 0 : 0 0 : 422M 8192B
58M 0 : 58M 0 : 58M 0 : 0 0 : 350M 0
59M 24k: 60M 0 : 59M 0 : 0 0 : 357M 48k
74M 0 : 73M 0 : 74M 0 : 0 0 : 441M 0
19M 8192B: 19M 8192B: 19M 4096B: 0 0 : 114M 40k
0 0 : 0 0 : 0 0 : 0 0 : 0 0
0 0 : 0 0 : 0 0 : 0 0 : 0 0
0 160k: 0 16k: 0 20k: 0 0 : 0 392k

So. Clearly, lots of I/O. 440MB/s total. Almost entirely reads.

Question: to --grow --size the array, clearly we see lots of reads.
Why aren't we seeing any (meaningful) writes? If there are no writes,
then what purpose do the reads serve?

* Re: AWFUL reshape speed with raid5.
2008-07-30 16:31 ` Jon Nelson
@ 2008-07-30 17:08 ` Justin Piszcz
2008-07-30 17:48 ` Jon Nelson
0 siblings, 1 reply; 20+ messages in thread
From: Justin Piszcz @ 2008-07-30 17:08 UTC (permalink / raw)
To: Jon Nelson; +Cc: Bill Davidsen, LinuxRaid

On Wed, 30 Jul 2008, Jon Nelson wrote:

> On Wed, Jul 30, 2008 at 11:23 AM, Bill Davidsen <davidsen@tmr.com> wrote:
>> Jon Nelson wrote:
>>> 2. If no I/O is taking place, why does it take so long? [ NOTE: I/O
>>> must be taking place but why doesn't vmstat show it? ]
>>>
>>
>> vmstat doesn't tell you enough, you need a tool to show per-device and
>> per-partition io, which will give you what you need. I can't put a finger on
>> the one I wrote, but there are others.
>
> I gave dstat a try (actually, I rather prefer dstat over vmstat...):
>
> This is right before, during, and after the --grow operation.
>
> --dsk/sdb-----dsk/sdc-----dsk/sdd-----dsk/sde----dsk/total-
> read writ: read writ: read writ: read writ: read writ
> 0 44k: 0 24k: 0 24k: 0 0 : 0 184k
> 0 0 : 0 0 : 0 0 : 0 0 : 0 0
> 0 24k: 0 0 : 0 0 : 0 0 : 0 48k
> 32M 14k: 32M 2048B: 32M 2048B: 0 0 : 191M 36k
> 63M 0 : 63M 0 : 63M 0 : 0 0 : 377M 0
> 65M 0 : 65M 0 : 65M 0 : 0 0 : 391M 0
> 72M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
> 74M 0 : 73M 0 : 74M 0 : 0 0 : 441M 0
> 70M 48k: 70M 0 : 70M 0 : 0 0 : 419M 96k
> 61M 44k: 61M 16k: 62M 32k: 0 0 : 368M 184k
> 71M 0 : 72M 0 : 71M 0 : 0 0 : 429M 0
> 74M 0 : 73M 0 : 73M 0 : 0 0 : 439M 0
> 73M 0 : 73M 0 : 73M 0 : 0 0 : 439M 0
> 71M 20k: 71M 0 : 71M 0 : 0 0 : 426M 40k
> 72M 0 : 72M 0 : 73M 0 : 0 0 : 434M 0
> 73M 0 : 74M 0 : 73M 0 : 0 0 : 442M 0
> 60M 40k: 59M 16k: 59M 28k: 0 0 : 356M 168k
> 73M 0 : 73M 0 : 73M 0 : 0 0 : 438M 0
> 70M 24k: 69M 0 : 70M 0 : 0 0 : 418M 48k
> 72M 0 : 71M 0 : 72M 0 : 0 0 : 430M 0
> 73M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
> 71M 0 : 71M 0 : 71M 0 : 0 0 : 427M 0
> 73M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
> 71M 24k: 71M 0 : 71M 0 : 0 0 : 428M 48k
> 72M 0 : 72M 0 : 72M 0 : 0 0 : 432M 0
> 73M 0 : 73M 0 : 73M 0 : 0 0 : 437M 0
> 71M 4096B: 70M 0 : 70M 0 : 0 0 : 422M 8192B
> 58M 0 : 58M 0 : 58M 0 : 0 0 : 350M 0
> 59M 24k: 60M 0 : 59M 0 : 0 0 : 357M 48k
> 74M 0 : 73M 0 : 74M 0 : 0 0 : 441M 0
> 19M 8192B: 19M 8192B: 19M 4096B: 0 0 : 114M 40k
> 0 0 : 0 0 : 0 0 : 0 0 : 0 0
> 0 0 : 0 0 : 0 0 : 0 0 : 0 0
> 0 160k: 0 16k: 0 20k: 0 0 : 0 392k
>
> So. Clearly, lots of I/O. 440MB/s total. Almost entirely reads.
>
> Question: to --grow --size the array, clearly we see lots of reads.
> Why aren't we seeing any (meaningful) writes? If there are no writes,
> then what purpose do the reads serve?
>

In dstat, the speed is doubled for the total for some reason, divide by 2
(if you compare with iostat -x -k 1) you should see the difference.

As far as the grow itself, the last time I did it was 2-3 years ago but
if I recall it ran for 24hrs+ (IDE system, PCI, etc) between 5-15MiB/s.

Justin.

* Re: AWFUL reshape speed with raid5.
2008-07-30 17:08 ` Justin Piszcz
@ 2008-07-30 17:48 ` Jon Nelson
2008-08-01 1:43 ` Neil Brown
0 siblings, 1 reply; 20+ messages in thread
From: Jon Nelson @ 2008-07-30 17:48 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Bill Davidsen, LinuxRaid

On Wed, Jul 30, 2008 at 12:08 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> In dstat, the speed is doubled for the total for some reason, divide by 2
> (if you compare with iostat -x -k 1) you should see the difference.

Ah, nice catch. It still doesn't change my question.
Upon further reflection, I believe I know what has caused the
read/write disparity:

1. I've done so much testing with these devices that the contents have
been zeroed many times.
2. I am GUESSING that if the raid recovery code reads from drives A,
B, and C, builds the appropriate checksum and verifies it, if the
checksum matches it skips the write. I believe that this is what is
happening. To confirm, I wrote a few gigs of /dev/urandom to one of
the devices and re-tested. Indeed, this time around I saw plenty of
writing. One mystery solved.

Remaining questions:

1. Why does the version of metadata matter so much in a --grow --size operation?
2. There appear to be bugs when a bitmap is used. Can somebody else confirm?
3. I'll look into the awful speed thing later as that doesn't seem to
be an issue.

--
Jon

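One way to make the parity genuinely stale before re-testing, as Jon describes doing, is to scribble over part of one member partition. This is destructive and only suitable for scratch devices; the particular device and amount are just illustrative:

dd if=/dev/urandom of=/dev/sdd3 bs=1M count=2048
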
* Re: AWFUL reshape speed with raid5.
2008-07-30 17:48 ` Jon Nelson
@ 2008-08-01 1:43 ` Neil Brown
2008-08-01 13:23 ` Jon Nelson
0 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2008-08-01 1:43 UTC (permalink / raw)
To: Jon Nelson; +Cc: Justin Piszcz, Bill Davidsen, LinuxRaid

On Wednesday July 30, jnelson-linux-raid@jamponi.net wrote:
> On Wed, Jul 30, 2008 at 12:08 PM, Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > In dstat, the speed is doubled for the total for some reason, divide by 2
> > (if you compare with iostat -x -k 1) you should see the difference.
>
> Ah, nice catch. It still doesn't change my question.
> Upon further reflection, I believe I know what has caused the
> read/write disparity:
>
> 1. I've done so much testing with these devices that the contents have
> been zeroed many times.
> 2. I am GUESSING that if the raid recovery code reads from drives A,
> B, and C, builds the appropriate checksum and verifies it, if the
> checksum matches it skips the write. I believe that this is what is
> happening. To confirm, I wrote a few gigs of /dev/urandom to one of
> the devices and re-tested. Indeed, this time around I saw plenty of
> writing. One mystery solved.

I note that while in the original mail in this thread you were talking
about growing an array by adding drives, you are now talking about
growing an array by using more space on each drive. This change threw
me at first...

You are correct. When raid5 is repairing parity, it reads everything
and only writes if something is found to be wrong. This is in-general
fast.

When you grow the --size of a raid5 it repairs the parity on the newly
added space. If this already has correct parity, nothing will be
written.

>
> Remaining questions:
>
> 1. Why does the version of metadata matter so much in a --grow --size operation?

I cannot measure any significant different. Could you give some
precise details of the tests you run and the results you get ?

> 2. There appear to be bugs when a bitmap is used. Can somebody else confirm?

Confirmed. If you --grow an array with a bitmap, you will hit
problems as there is no mechanism to grow the bitmap.
What you need to do is to remove the bitmap, do the 'grow', then
re-add the bitmap.
I thought I had arranged that a grow would fail if there was a bitmap
in place, but I guess not.
I'll have a look into this.

Thanks.
NeilBrown

> 3. I'll look into the awful speed thing later as that doesn't seem to
> be an issue.
>
> --
> Jon

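Neil's remove/grow/re-add workaround translates to roughly the following sketch (the md device name is the one used in this thread; whether you grow by --size or --raid-devices depends on which operation you are doing):

mdadm --grow /dev/md99 --bitmap=none        # remove the internal bitmap
mdadm --grow /dev/md99 --size=max           # or --raid-devices=N for a reshape
mdadm --grow /dev/md99 --bitmap=internal    # re-add the bitmap afterwards
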
* Re: AWFUL reshape speed with raid5.
2008-08-01 1:43 ` Neil Brown
@ 2008-08-01 13:23 ` Jon Nelson
2008-08-01 15:57 ` Jon Nelson
0 siblings, 1 reply; 20+ messages in thread
From: Jon Nelson @ 2008-08-01 13:23 UTC (permalink / raw)
To: Neil Brown; +Cc: Justin Piszcz, Bill Davidsen, LinuxRaid

On Thu, Jul 31, 2008 at 8:43 PM, Neil Brown <neilb@suse.de> wrote:
> I note that while in the original mail in this thread you were talking
> about growing an array by adding drives, you are now talking about
> growing an array by using more space on each drive. This change threw
> me at first...

True. Mea Culpa.

>> Remaining questions:
>>
>> 1. Why does the version of metadata matter so much in a --grow --size operation?
>
> I cannot measure any significant different. Could you give some
> precise details of the tests you run and the results you get ?

I'll try to throw some stuff together soon.

>> 2. There appear to be bugs when a bitmap is used. Can somebody else confirm?
>
> Confirmed. If you --grow an array with a bitmap, you will hit
> problems as there is no mechanism to grow the bitmap.
> What you need to do is to remove the bitmap, do the 'grow', then
> re-add the bitmap.
> I thought I had arranged that a grow would fail if there was a bitmap
> in place, but I guess not.
> I'll have a look into this.

A small suggestion: to avoid trepidation, perhaps a small note like
"you may re-add the bitmap while the array is still
rebuilding/growing/whatever" would help to avoid some worry. There are
two other solutions: Have the underlying code grow the bitmap (probably
hard), or have it automatically remove+re-add the bitmap.

--
Jon

* Re: AWFUL reshape speed with raid5.
2008-08-01 13:23 ` Jon Nelson
@ 2008-08-01 15:57 ` Jon Nelson
0 siblings, 0 replies; 20+ messages in thread
From: Jon Nelson @ 2008-08-01 15:57 UTC (permalink / raw)
Cc: LinuxRaid

>>> 1. Why does the version of metadata matter so much in a --grow --size operation?
>>
>> I cannot measure any significant different. Could you give some
>> precise details of the tests you run and the results you get ?
>
> I'll try to throw some stuff together soon.

I was unable to replicate the difference in bitmap speeds.

--
Jon

* Re: AWFUL reshape speed with raid5.
2008-07-30 16:23 ` Bill Davidsen
2008-07-30 16:31 ` Jon Nelson
@ 2008-07-30 16:50 ` David Greaves
2008-07-30 17:24 ` Bill Davidsen
1 sibling, 1 reply; 20+ messages in thread
From: David Greaves @ 2008-07-30 16:50 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Jon Nelson, Justin Piszcz, LinuxRaid

Bill Davidsen wrote:
> vmstat doesn't tell you enough, you need a tool to show per-device and
> per-partition io, which will give you what you need. I can't put a
> finger on the one I wrote, but there are others.

iostat?

David

* Re: AWFUL reshape speed with raid5.
2008-07-30 16:50 ` David Greaves
@ 2008-07-30 17:24 ` Bill Davidsen
0 siblings, 0 replies; 20+ messages in thread
From: Bill Davidsen @ 2008-07-30 17:24 UTC (permalink / raw)
To: David Greaves; +Cc: Jon Nelson, Justin Piszcz, LinuxRaid

David Greaves wrote:
> Bill Davidsen wrote:
>
>> vmstat doesn't tell you enough, you need a tool to show per-device and
>> per-partition io, which will give you what you need. I can't put a
>> finger on the one I wrote, but there are others.
>>
>
> iostat?
>

dstat seems to do what he wants, the one I wrote produced a file which
could be used by gnuplot to generate neat graphical output to make
problems glaringly obvious.

--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark

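Either tool mentioned here will show the per-device traffic that vmstat hides. For example, using the invocations referenced in this thread:

iostat -x -k 1                  # extended per-device stats, from the sysstat package
dstat -D sdb,sdc,sdd            # per-disk read/write columns, as shown earlier
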
* Re: AWFUL reshape speed with raid5.
2008-07-28 17:39 AWFUL reshape speed with raid5 Jon Nelson
2008-07-28 18:14 ` Justin Piszcz
@ 2008-08-01 1:26 ` Neil Brown
2008-08-01 13:14 ` Jon Nelson
2008-08-21 2:58 ` Jon Nelson
2 siblings, 1 reply; 20+ messages in thread
From: Neil Brown @ 2008-08-01 1:26 UTC (permalink / raw)
To: Jon Nelson; +Cc: LinuxRaid

On Monday July 28, jnelson-linux-raid@jamponi.net wrote:
> I built a raid5 with 2 devices (and --assume-clean) using 2x 4GB
> partitions (not logical volumes).
> I then grew it to 3 devices.
> The reshape speed is really really slow.
...
>
> Kernel is 2.6.25.11 (openSUSE 11.0 x86-64 stock)
>
> /proc/mdstat for this entry:
>
> md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
> 3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> [=>...................] reshape = 8.2% (324224/3903744)
> finish=43.3min speed=1373K/sec
>

1.3MB/sec is certainly slow. On my test system (which is just a bunch
of fairly ordinary SATA drives in a cheap controller) I get about 10
times this - 13MB/sec.

>
> This is on a set of devices capable of 70+ MB/s.

The 70MB/s is streaming IO. When doing a reshape like this, md/raid5
needs to read some data, then go back and write it somewhere else.
So there is lots of seeking backwards and forwards.

You can possibly increase the speed somewhat by increasing the buffer
space that is used, thus allowing larger reads followed by larger
writes. This is done by increasing
/sys/block/mdXX/md/stripe_cache_size

Still, 1373K/sec is very slow. I cannot explain that.

NeilBrown

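The tunable Neil mentions lives in sysfs per array; checking the default and raising it looks like this (md99 is the array name used throughout this thread, and the value 4096 is the one Jon tries in the next message):

cat /sys/block/md99/md/stripe_cache_size          # default is 256
echo 4096 > /sys/block/md99/md/stripe_cache_size
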
* Re: AWFUL reshape speed with raid5.
2008-08-01 1:26 ` Neil Brown
@ 2008-08-01 13:14 ` Jon Nelson
0 siblings, 0 replies; 20+ messages in thread
From: Jon Nelson @ 2008-08-01 13:14 UTC (permalink / raw)
To: Neil Brown; +Cc: LinuxRaid

On Thu, Jul 31, 2008 at 8:26 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday July 28, jnelson-linux-raid@jamponi.net wrote:
>> I built a raid5 with 2 devices (and --assume-clean) using 2x 4GB
>> partitions (not logical volumes).
>> I then grew it to 3 devices.
>> The reshape speed is really really slow.
> ...
>>
>> Kernel is 2.6.25.11 (openSUSE 11.0 x86-64 stock)
>>
>> /proc/mdstat for this entry:
>>
>> md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
>> 3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>> [=>...................] reshape = 8.2% (324224/3903744)
>> finish=43.3min speed=1373K/sec
>>
>
> 1.3MB/sec is certainly slow. On my test system (which is just a bunch
> of fairly ordinary SATA drives in a cheap controller) I get about 10
> times this - 13MB/sec.

I was able to sorta replicate it. The exact sequence of commands;

mdadm --create /dev/md99 --level=raid5 --raid-devices=2
--spare-devices=0 --assume-clean --metadata=1.0 /dev/sdb3 /dev/sdc3
echo 2000 > /sys/block/md99/md/sync_speed_min
mdadm --add /dev/md99 /dev/sdd3
mdadm --grow --raid-devices=3 /dev/md99

cat /proc/mdstat

md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[=>...................] reshape = 5.1% (202240/3903744)
finish=28.5min speed=2161K/sec

dstat shows this (note that the total is doubled for some reason...):

--dsk/sdb-----dsk/sdc-----dsk/sdd-----dsk/sde----dsk/total-
read writ: read writ: read writ: read writ: read writ
549k 78k: 542k 68k: 534k 72k: 174B 11k:3333k 1097k
7936k 3970k:7936k 3970k: 0 3970k: 0 0 : 31M 23M
4096k 2050k:4096k 2050k: 0 2050k: 0 0 : 16M 12M
2816k 1282k:2816k 1282k: 0 1410k: 0 0 : 11M 7948k
1280k 836k:1280k 768k: 0 640k: 0 0 :5120k 4488k
7936k 3970k:7936k 3970k: 0 3970k: 0 0 : 31M 23M
5120k 2562k:5120k 2562k: 0 2562k: 0 0 : 20M 15M
3456k 1728k:3456k 1728k: 0 1728k: 0 0 : 14M 10M
1664k 834k:1664k 834k: 0 834k: 0 0 :6656k 5004k
3072k 1560k:3072k 1536k: 0 1536k: 0 0 : 12M 9264k
9216k 4612k:9216k 4612k: 0 4612k: 0 0 : 36M 27M

Which clearly shows not a great deal of I/O: 5-18MB/s *total*.

..

> You can possibly increase the speed somewhat by increasing the buffer
> space that is used, thus allowing larger reads followed by larger
> writes. This is done by increasing
> /sys/block/mdXX/md/stripe_cache_size

turnip:~ # cat /sys/block/md99/md/stripe_cache_size
256
turnip:~ # cat /sys/block/md99/md/stripe_cache_active
0

Increasing that to 4096 moves the rebuild speed to between 3 and 4MB/s.

Any ideas? This appears to happen with all metadata versions. 100%
reproduceable.

--
Jon

* Re: AWFUL reshape speed with raid5.
2008-07-28 17:39 AWFUL reshape speed with raid5 Jon Nelson
2008-07-28 18:14 ` Justin Piszcz
2008-08-01 1:26 ` Neil Brown
@ 2008-08-21 2:58 ` Jon Nelson
2 siblings, 0 replies; 20+ messages in thread
From: Jon Nelson @ 2008-08-21 2:58 UTC (permalink / raw)
To: LinuxRaid

On Mon, Jul 28, 2008 at 12:39 PM, Jon Nelson
<jnelson-linux-raid@jamponi.net> wrote:
> I built a raid5 with 2 devices (and --assume-clean) using 2x 4GB
> partitions (not logical volumes).
> I then grew it to 3 devices.
> The reshape speed is really really slow.
>
> vmstat shows I/O like this:
>
> 0 0 212 25844 141160 497484 0 0 0 612 673 1284 0 6 93 0
> 0 0 212 25164 141160 497748 0 0 0 19 594 1253 1 4 95 0
> 0 0 212 25044 141160 498004 0 0 0 0 374 445 0 1 99 0
> 1 0 212 25220 141164 498000 0 0 0 23 506 1149 0 3 96 1
> 0 0 212 25500 141164 498004 0 0 0 3 546 1416 0 5 95 0
>
> The min/max is 1000/200000.
> What might be going on here?
>
> Kernel is 2.6.25.11 (openSUSE 11.0 x86-64 stock)
>
> /proc/mdstat for this entry:
>
> md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
> 3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> [=>...................] reshape = 8.2% (324224/3903744)
> finish=43.3min speed=1373K/sec
>
>
> This is on a set of devices capable of 70+ MB/s.

I found some time to give this another shot. It's still true!

Here is how I built the array:

mdadm --create /dev/md99 --level=raid5 --raid-devices=2
--spare-devices=0 --assume-clean --metadata=1.0 --chunk=64 /dev/sdb3
/dev/sdc3

and then I added a drive:

mdadm --add /dev/md99 /dev/sdd3

and then I grew the array to 3 devices:

mdadm --grow /dev/md99 --raid-devices=3

This is what the relevant portion of /proc/mdstat looks like:

md99 : active raid5 sdd3[2] sdc3[1] sdb3[0]
3903744 blocks super 1.0 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
[=>...................] reshape = 6.1% (241920/3903744)
finish=43.0min speed=1415K/sec

The 1000/200000 min/max defaults are being used. If I bump up the min
to, say, 30000, the rebuild speed does grow to hover around 30000.

As Justin Piszcz said:

There once was a bug in an earlier kernel, in which the min_speed is
what the rebuild ran at if you had a specific chunk size, have you
tried to echo 30000 > to min_speed? Does it increase it to 30mb/s for
the rebuild?

Yes, apparently, it does. However, 'git log drivers/md' in the
linux-2.6 tree doesn't show anything obvious for me. Can somebody
point me to a specific commit, patch, etc... because as of 2.6.25.11
it's apparently still a problem (on an otherwise idle system, too).

>
> No meaningful change if I start with 3 disks and grow to 4, with or
> without bitmap.
>
> --
> Jon

--
Jon
