* Why are reads not balanced across my RAID-1?
From: George Spelvin @ 2014-01-24 10:47 UTC
To: linux-raid; +Cc: linux
I was doing some bulk reads on an ext4 file system on
a simple mirrored device (3.13 x86_64 kernel):
md2 : active raid1 sdd3[0] sde3[1]
1932514496 blocks [2/2] [UU]
bitmap: 0/15 pages [0KB], 65536KB chunk
And I noticed that the reads were all hitting the first disk.
The second was basically idle. Here's "dstat -d -D md2,sdd,sde":
--dsk/md2-----dsk/sdd-----dsk/sde--
read writ: read writ: read writ
510k 55k: 484k 58k: 27k 58k
129M 0 : 129M 0 : 48k 0
108M 48k: 108M 60k: 576k 60k
123M 0 : 123M 8192B: 0 8192B
133M 0 : 133M 0 : 360k 0
132M 0 : 132M 0 : 20k 0
138M 0 : 138M 0 : 304k 0
128M 0 : 128M 0 : 896k 0
129M 0 : 129M 4096B: 64k 4096B
135M 0 : 135M 0 : 36k 0
116M 12k: 116M 24k: 36k 28k
117M 0 : 116M 4096B: 632k 0
127M 0 : 127M 0 : 288k 0
130M 0 : 130M 0 : 336k 0
133M 0 : 133M 0 : 212k 0
134M 0 : 134M 0 : 304k 0
130M 0 : 129M 4096B: 100k 4096B
128M 0 : 127M 0 : 280k 0
106M 12k: 106M 28k: 372k 28k
129M 0 : 129M 0 : 344k 0
134M 0 : 134M 0 : 196k 0
134M 0 : 134M 0 : 384k 0
129M 0 : 129M 0 : 304k 0
I thought (drivers/md/raid1.c:read_balance()) the driver was
supposed to do some striping on large reads.
While 125M/s is nice, more than that would be nicer.
The drives are identical, but are plugged in to different controllers.
sdd is on an AMD SB600 controller, while sde is on a PDC42819.
Is there some knob I need to adjust to make read balancing happen?

* Re: Why are reads not balanced across my RAID-1?
From: keld @ 2014-01-24 12:04 UTC
To: George Spelvin; +Cc: linux-raid

On Fri, Jan 24, 2014 at 05:47:57AM -0500, George Spelvin wrote:
> I was doing some bulk reads on an ext4 file system on
> a simple mirrored device (3.13 x86_64 kernel):
>
> md2 : active raid1 sdd3[0] sde3[1]
> 1932514496 blocks [2/2] [UU]
> bitmap: 0/15 pages [0KB], 65536KB chunk
>
> And I noticed that the reads were all hitting the first disk.
> The second was basically idle. Here's "dstat -d -D md2,sdd,sde":
>
> --dsk/md2-----dsk/sdd-----dsk/sde--
> read writ: read writ: read writ
> 510k 55k: 484k 58k: 27k 58k
> 129M 0 : 129M 0 : 48k 0
> 108M 48k: 108M 60k: 576k 60k
> 123M 0 : 123M 8192B: 0 8192B
> 133M 0 : 133M 0 : 360k 0
> 132M 0 : 132M 0 : 20k 0
> 138M 0 : 138M 0 : 304k 0
> 128M 0 : 128M 0 : 896k 0
> 129M 0 : 129M 4096B: 64k 4096B
> 135M 0 : 135M 0 : 36k 0
> 116M 12k: 116M 24k: 36k 28k
> 117M 0 : 116M 4096B: 632k 0
> 127M 0 : 127M 0 : 288k 0
> 130M 0 : 130M 0 : 336k 0
> 133M 0 : 133M 0 : 212k 0
> 134M 0 : 134M 0 : 304k 0
> 130M 0 : 129M 4096B: 100k 4096B
> 128M 0 : 127M 0 : 280k 0
> 106M 12k: 106M 28k: 372k 28k
> 129M 0 : 129M 0 : 344k 0
> 134M 0 : 134M 0 : 196k 0
> 134M 0 : 134M 0 : 384k 0
> 129M 0 : 129M 0 : 304k 0
>
> I thought (drivers/md/raid1.c:read_balance()) the driver was
> supposed to do some striping on large reads.
>
> While 125M/s is nice, more than that would be nicer.
>
> The drives are identical, but are plugged in to different controllers.
> sdd is on an AMD SB600 controller, while sde is on a PDC42819.
>
> Is there some knob I need to adjust to make read balancing happen?

The reading is not balanced because it does not make sense to do balanced
reads for sequential reading. In RAID-1 the disk sectors are consecutive.
So if you would read one sector from one disk, and the following sector
from the other disk, then the next read from disk 1 would need to skip
a full revolution of the disk, which may cost something like 8 ms.
So better read contiguously from the same disk, and hope for some other
IO request that can use disk 2.

For sequential reading, RAID-10 in the "far" layout makes for a much more
balanced reading scheme. You will get something like RAID-0 reading
speeds here.

Best regards
Keld
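
The "something like 8 ms" figure follows from the spindle speed alone. A minimal sketch of that arithmetic, assuming a 7200 RPM drive (the speed discussed elsewhere in this thread); the function names are made up for illustration:

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a revolution away."""
    return revolution_time_ms(rpm) / 2.0


if __name__ == "__main__":
    rpm = 7200  # assumption: a typical desktop drive
    print(f"full revolution: {revolution_time_ms(rpm):.2f} ms")
    print(f"average latency: {average_rotational_latency_ms(rpm):.2f} ms")
```

A full revolution at 7200 RPM comes out at a little over 8 ms, which is the penalty described above for a read that has just missed its sector.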

* Re: Why are reads not balanced across my RAID-1?
From: George Spelvin @ 2014-01-24 13:33 UTC
To: keld, linux; +Cc: linux-raid

> The reading is not balanced because it does not make sense to do balanced
> reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> So if you would read one sector from one disk, and the following sector
> from the other disk, then the next read from disk 1 would need to skip
> a full revolution of the disk, which may cost something like 8 ms.
> So better read contiguously from the same disk, and hope for some other
> IO request that can use disk 2.

Actually I don't think that's true once reads get big enough.
7200 RPM is 120 rotations per second, so there is about 1 MB
of data per track. At the end of that, the drive has to switch
heads or do a track-to-track seek to get more. If we knew where
the track boundaries were, we could interleave reads on those
boundaries and get good speedup.

But it's also possible to take advantage if the reads are
sufficiently larger than this 1 MB threshold. Alternating
at 8 MB boundaries would probably be a speedup.

Actually, I should do some timing to find out....

Reading every odd block from the first 8 GiB of sdd (4 GiB of data read)
using a block size of:
256 MiB: 25.917s (165.7 MB/s)
128 MiB: 25.293s (169.8 MB/s)
 64 MiB: 26.341s (163.1 MB/s)
 32 MiB: 26.029s (165.0 MB/s)
 16 MiB: 27.327s (157.2 MB/s)
  8 MiB: 28.210s (152.2 MB/s)
  4 MiB: 31.371s (136.9 MB/s)
  2 MiB: 36.560s (117.5 MB/s)
  1 MiB: 51.325s ( 83.7 MB/s)

So 1 MB striping would be barely faster than single-drive reading,
but 2MB offers a speedup, and 8MB should actually be quite nice.
(Reading the first 4 GiB of the drive, with no seeking, also takes
25.something seconds.)

... but even ignoring that, shouldn't reads change drives when there's a
jump in the sequential read? ext4 (with flex_bg) divides the disk into
2 GB "block groups", with reserved inode space at the start of each one,
and can't allocate contiguous data larger than that. And the files
I was reading were all in the 50-150 MB range anyway.

Should the md driver switch drives when there is a jump in the address
being fetched? But for minutes, I *never* saw any sde activity.
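
The message does not say how the alternating-block timing was taken, so the following is only a sketch of one way such a test could be written. Assumptions: blocks are numbered from 0 and the odd-numbered ones are read; the function name and parameters are invented for illustration; and reads go through the page cache, so a real measurement on a device would need cold caches (or direct I/O) to mean anything.

```python
import os
import time


def read_alternate_blocks(path: str, block_size: int, span: int) -> tuple:
    """Read blocks 1, 3, 5, ... of `block_size` bytes within the first
    `span` bytes of `path`. Returns (bytes_read, elapsed_seconds)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.perf_counter()
    try:
        offset = block_size  # skip block 0, begin at the first odd block
        while offset < span:
            want = min(block_size, span - offset)
            got = 0
            # Issue the block as a series of reads of at most 1 MiB each.
            while got < want:
                chunk = os.pread(fd, min(want - got, 1 << 20), offset + got)
                if not chunk:  # end of file or device
                    break
                got += len(chunk)
            total += got
            if got < want:
                break
            offset += 2 * block_size  # jump over the next (even) block
    finally:
        os.close(fd)
    return total, time.perf_counter() - start
```

A call such as `read_alternate_blocks("/dev/sdd", 8 << 20, 8 << 30)` would correspond to the 8 MiB row of the table; it needs read permission on the device, and the result should be treated as indicative only.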

* Re: Why are reads not balanced across my RAID-1?
From: keld @ 2014-01-25 1:48 UTC
To: George Spelvin; +Cc: linux-raid

On Fri, Jan 24, 2014 at 08:33:26AM -0500, George Spelvin wrote:
> > The reading is not balanced because it does not make sense to do balanced
> > reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> > So if you would read one sector from one disk, and the following sector
> > from the other disk, then the next read from disk 1 would need to skip
> > a full revolution of the disk, which may cost something like 8 ms.
> > So better read contiguously from the same disk, and hope for some other
> > IO request that can use disk 2.
>
> Actually I don't think that's true once reads get big enough.
> 7200 RPM is 120 rotations per second, so there is about 1 MB
> of data per track. At the end of that, the drive has to switch
> heads or do a track-to-track seek to get more. If we knew where
> the track boundaries were, we could interleave reads on those
> boundaries and get good speedup.

I am not sure that there is 1 MB per track, or per rotation, but on
7200 RPM disks a rotation takes about 8 milliseconds. The drive you
describe below reads something like 150 MB/s, or 150 kB/ms, so in 8 ms
you can read about 1.2 MB.

As you say, with big block sizes you can minimize the impact of this
loss, e.g. using 10 MB block sizes. But for databases it would be quite
a waste to read 10 MB per record access. Raid10,far then gives good
performance both for sequential reading and for database access, with
block sizes of only about 1 MB.

> But it's also possible to take advantage if the reads are
> sufficiently larger than this 1 MB threshold. Alternating
> at 8 MB boundaries would probably be a speedup.
>
> Actually, I should do some timing to find out....
>
> Reading every odd block from the first 8 GiB of sdd (4 GiB of data read)
> using a block size of:
> 256 MiB: 25.917s (165.7 MB/s)
> 128 MiB: 25.293s (169.8 MB/s)
>  64 MiB: 26.341s (163.1 MB/s)
>  32 MiB: 26.029s (165.0 MB/s)
>  16 MiB: 27.327s (157.2 MB/s)
>   8 MiB: 28.210s (152.2 MB/s)
>   4 MiB: 31.371s (136.9 MB/s)
>   2 MiB: 36.560s (117.5 MB/s)
>   1 MiB: 51.325s ( 83.7 MB/s)
>
> So 1 MB striping would be barely faster than single-drive reading,
> but 2MB offers a speedup, and 8MB should actually be quite nice.
> (Reading the first 4 GiB of the drive, with no seeking, also takes
> 25.something seconds.)

With raid10,far you should get the full speed of about 160 MB/s
with a block size of just 1 MB, for sequential reading.

> ... but even ignoring that, shouldn't reads change drives when there's a
> jump in the sequential read? ext4 (with flex_bg) divides the disk into
> 2 GB "block groups", with reserved inode space at the start of each one,
> and can't allocate contiguous data larger than that. And the files
> I was reading were all in the 50-150 MB range anyway.
>
> Should the md driver switch drives when there is a jump in the address
> being fetched? But for minutes, I *never* saw any sde activity.

Well, it could. I did find some differences between raid1 and raid10,n2
in some tests I did, which could be due to different balancing in the
raid1 and raid10 driver.

best regards
keld
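
The figures traded in this exchange can be checked with a little arithmetic. The sketch below recomputes the per-drive rates from the quoted timings, the amount of data passing under the head in one revolution, and a naive two-drive estimate. The doubling is an assumption (both mirrors sustaining the alternating-block rate at once, with no other overhead), not a measurement, and the helper names are invented.

```python
GIB = 1 << 30  # bytes in one GiB

# (block size in MiB, seconds to read 4 GiB in alternating blocks),
# copied from the timing table quoted in the thread.
TIMINGS = [
    (256, 25.917), (128, 25.293), (64, 26.341), (32, 26.029), (16, 27.327),
    (8, 28.210), (4, 31.371), (2, 36.560), (1, 51.325),
]


def mb_per_s(nbytes: int, seconds: float) -> float:
    """Throughput in decimal megabytes per second."""
    return nbytes / seconds / 1e6


def data_per_revolution_mb(rate_mb_s: float, rpm: float) -> float:
    """Megabytes passing under the head during one revolution."""
    return rate_mb_s * 60.0 / rpm


def two_drive_estimate(per_drive_mb_s: float) -> float:
    """Assumption: both mirrors sustain the alternating-block rate at once."""
    return 2.0 * per_drive_mb_s


if __name__ == "__main__":
    per_rev = data_per_revolution_mb(150, 7200)
    print(f"per revolution at 150 MB/s: {per_rev:.2f} MB")
    for mib, secs in TIMINGS:
        rate = mb_per_s(4 * GIB, secs)
        print(f"{mib:4d} MiB: {rate:6.1f} MB/s per drive, "
              f"{two_drive_estimate(rate):6.1f} MB/s if two drives alternate")
```

With an exact revolution time (60/7200 s) the per-revolution figure is 1.25 MB; the 1.2 MB in the message comes from rounding the revolution to 8 ms.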

* Re: Why are reads not balanced across my RAID-1?
From: Matt Garman @ 2014-01-24 14:56 UTC
To: keld; +Cc: George Spelvin, Mdadm

On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
> The reading is not balanced because it does not make sense to do balanced
> reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> So if you would read one sector from one disk, and the following sector
> from the other disk, then the next read from disk 1 would need to skip
> a full revolution of the disk, which may cost something like 8 ms.
> So better read contiguously from the same disk, and hope for some other
> IO request that can use disk 2.

Does that rationale hold for SSDs?

* Re: Why are reads not balanced across my RAID-1?
From: Mathias Burén @ 2014-01-24 16:50 UTC
To: Matt Garman; +Cc: Keld Jørn Simonsen, George Spelvin, Mdadm

On 24 January 2014 14:56, Matt Garman <matthew.garman@gmail.com> wrote:
> On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
>> The reading is not balanced because it does not make sense to do balanced
>> reads for sequential reading. In RAID-1 the disk sectors are consecutive.
>> So if you would read one sector from one disk, and the following sector
>> from the other disk, then the next read from disk 1 would need to skip
>> a full revolution of the disk, which may cost something like 8 ms.
>> So better read contiguously from the same disk, and hope for some other
>> IO request that can use disk 2.
>
> Does that rationale hold for SSDs?

Note: MD can do RAID10 with 2 devices, right?

Mathias

* Re: Why are reads not balanced across my RAID-1?
From: keld @ 2014-01-24 17:49 UTC
To: Matt Garman; +Cc: George Spelvin, Mdadm

On Fri, Jan 24, 2014 at 08:56:02AM -0600, Matt Garman wrote:
> On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
> > The reading is not balanced because it does not make sense to do balanced
> > reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> > So if you would read one sector from one disk, and the following sector
> > from the other disk, then the next read from disk 1 would need to skip
> > a full revolution of the disk, which may cost something like 8 ms.
> > So better read contiguously from the same disk, and hope for some other
> > IO request that can use disk 2.
>
> Does that rationale hold for SSDs?

No, this is not relevant for SSDs as access time is almost zero.

best regards
keld

* Re: Why are reads not balanced across my RAID-1?
From: Roberto Spadim @ 2014-01-24 18:22 UTC
To: Keld Jørn Simonsen; +Cc: Matt Garman, George Spelvin, Mdadm

Hi guys,

I did some tests some time ago, and this is what I can tell you.

Raid1 uses a nearest-disk algorithm to select which disk will execute
a read (read balance). I think it is very good. It does not consider
different disk speeds (access time and continuous read rate), but it
does a very good job.

I don't know if there is room for improvement here. I got a 1% speedup
with a mix of disk speeds (7.2k rpm, 5k rpm and 15k rpm disks and an SSD)
by changing the read balance algorithm. I think it is a solid 1% speedup,
but the parameters are confusing to configure. They are based on access
time (the time for the head to move from position 0 to the end of the
disk, or 0 for an SSD) and continuous read rate (MB/second in a continuous
read, or derived from rpm and track size).

The best advice I have is:
If you want raid1, you should be running something like databases or
multi-threaded systems, with each thread operating on a disk.
If you have a system with continuous read/write (DVR, streaming, etc.),
raid10 far is the best.

You can maybe do more tuning with alignment, read-ahead, and other
filesystem options/parameters and cache. A RAID card with flash backup
or a battery is nice too.

Good luck

2014/1/24 <keld@keldix.com>
>
> On Fri, Jan 24, 2014 at 08:56:02AM -0600, Matt Garman wrote:
> > On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
> > > The reading is not balanced because it does not make sense to do balanced
> > > reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> > > So if you would read one sector from one disk, and the following sector
> > > from the other disk, then the next read from disk 1 would need to skip
> > > a full revolution of the disk, which may cost something like 8 ms.
> > > So better read contiguously from the same disk, and hope for some other
> > > IO request that can use disk 2.
> >
> > Does that rationale hold for SSDs?
>
> No, this is not relevant for SSDs as access time is almost zero.
>
> best regards
> keld

--
Roberto Spadim
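
To make the "nearest disk" idea concrete, here is a toy model of it. This is NOT the kernel's read_balance() code, only a simplified illustration under stated assumptions: each mirror remembers where its last read ended, a new read goes to the mirror whose remembered position is closest to the requested sector, and ties go to the lowest-numbered disk. The class and method names are invented.

```python
class ToyMirror:
    """Toy 'nearest head' read balancer for an N-way mirror.

    A simplified illustration only; it ignores queue depth, idle disks,
    SSDs and everything else a real driver has to consider.
    """

    def __init__(self, n_disks: int = 2):
        # Sector just past the last read served by each disk.
        self.head = [0] * n_disks

    def pick_disk(self, sector: int) -> int:
        # Smallest head distance wins; ties go to the lowest-numbered disk.
        return min(range(len(self.head)),
                   key=lambda d: (abs(self.head[d] - sector), d))

    def read(self, sector: int, count: int) -> int:
        """Serve a read of `count` sectors and return the disk chosen."""
        disk = self.pick_disk(sector)
        self.head[disk] = sector + count
        return disk


if __name__ == "__main__":
    mirror = ToyMirror()
    sequential = [mirror.read(s, 8) for s in range(0, 800, 8)]
    print("sequential reads used disks:", sorted(set(sequential)))
    print("forward jump goes to disk:", mirror.read(1_000_000, 8))
    print("backward jump goes to disk:", mirror.read(16, 8))
```

In this toy model a purely sequential stream never leaves disk 0, and even a forward jump stays on disk 0 because its head is still the closer one; only a jump back towards the idle disk's position switches over. Whether the real driver behaves the same way in a given kernel version is not something this sketch can show.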

* Re: Why are reads not balanced across my RAID-1?
From: George Spelvin @ 2014-01-25 1:18 UTC
To: keld, rspadim; +Cc: linux-raid, linux, matthew.garman

> The best advice I have is:
> If you want raid1, you should be running something like databases or
> multi-threaded systems, with each thread operating on a disk.
> If you have a system with continuous read/write (DVR, streaming, etc.),
> raid10 far is the best.

Thank you for the response, but I'm a little confused. Can you clarify?
I'm not sure which of three things you're suggesting. Are you talking
about the actual data layout, or the Linux driver?

Obviously, if I added disks and striped across them, sequential
performance would go up. This is an actual RAID-10 layout. But
it's not useful unless I buy more disks.

But there's also the case that I can use the kernel raid10 driver with
an n2 layout. This has the same disk layout as raid1, but might have
different performance. Is that what you mean? In that case, it would be
nice to either copy the improvements to the raid1 driver, or make the
raid10 driver able to handle raid1 arrays and throw out the raid1 driver.

Finally, a third option is that you're talking about the raid10 driver,
but with a far or offset layout. I haven't played with those.

* Re: Why are reads not balanced across my RAID-1?
From: Phillip Susi @ 2014-01-30 20:05 UTC
To: George Spelvin, keld, rspadim; +Cc: linux-raid, matthew.garman

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 1/24/2014 8:18 PM, George Spelvin wrote:
> Obviously, if I added disks and striped across them, sequential
> performance would go up. This is an actual RAID-10 layout. But
> it's not useful unless I buy more disks.

No, md can do raid10 on two disks, just not in the near layout. The
offset layout with a sufficiently large chunk size mostly does what
you want. I recently set myself up with a 3 disk array like that and
it pushes 500 MB/s using cheap, run of the mill 1 TB WD Blue drives.

The offset layout gives slightly better performance than far (as long
as you use a sufficiently large chunk size), and can be resized,
unlike far.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJS6rB2AAoJEI5FoCIzSKrwCgUH/2ObIJbpgipuSpf1JsTLyMJk
zJnRkKISS6/1MiEtiBLRDYHhi2yL/B0S/J5jRSmgeLbAuM/q1cfF7BSOKfP52qUS
frDIRO0AtWybE/8NNiLMe6dBG1Zkfn/P+atMQwGGfy5wMWAU1DcCzq/qlv+dVkP7
VHaGuEKm/A1ySzwxnKdPTbAfe1/wRrBDeQg4leZRP9nBLA+jDWmw3oGlHW/7Aeeb
DCKKzU6+V1Hqrk8kmayh6A6D5Dp8AdPoMEj7q/I8edNX/Zp8NI/yH2wOrEkXI/xK
kiE2E0kRHyTFQ2VxU0rGKzYzE4fcjtoaDdQxHhc58/5LkUE3PTDsTyBxKn0XsHU=
=G5bT
-----END PGP SIGNATURE-----
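
The three raid10 layouts being discussed are easier to compare side by side. The sketch below prints where each chunk lands for two copies of the data, following my reading of the layout descriptions in the md(4) manual page; it is an illustration, not the driver's actual mapping code, and the function names are invented. Chunks are labelled A, B, C, ... and each row is one chunk-sized slice across the devices.

```python
from string import ascii_uppercase


def stripes(n_dev: int, n_chunks: int) -> list:
    """Plain RAID-0 style striping: one row per stripe."""
    labels = ascii_uppercase[:n_chunks]
    return [list(labels[i:i + n_dev]) for i in range(0, n_chunks, n_dev)]


def rotate(row: list) -> list:
    """Shift a stripe by one device so copies land on different disks."""
    return row[-1:] + row[:-1]


def near2(n_dev: int, n_chunks: int) -> list:
    """'near': both copies of a chunk sit next to each other in a stripe."""
    flat = [c for c in ascii_uppercase[:n_chunks] for _ in range(2)]
    return [flat[i:i + n_dev] for i in range(0, len(flat), n_dev)]


def offset2(n_dev: int, n_chunks: int) -> list:
    """'offset': every stripe is followed at once by its rotated copy."""
    rows = []
    for stripe in stripes(n_dev, n_chunks):
        rows.append(stripe)
        rows.append(rotate(stripe))
    return rows


def far2(n_dev: int, n_chunks: int) -> list:
    """'far': all stripes first, then all rotated copies further in."""
    first = stripes(n_dev, n_chunks)
    return first + [rotate(stripe) for stripe in first]


if __name__ == "__main__":
    for name, layout in (("near", near2), ("offset", offset2), ("far", far2)):
        print(name)
        for row in layout(2, 4):
            print("   ", "  ".join(row))
```

For two devices the near rows read "A A", "B B", and so on, which is the same on-disk arrangement as raid1. The offset and far layouts both begin with the striped row "A B", so a sequential reader has a reason to touch both disks.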

* Re: Why are reads not balanced across my RAID-1?
From: Robert L Mathews @ 2014-01-25 17:52 UTC
To: Mdadm

On 1/24/14, 10:22 AM, Roberto Spadim wrote:
> i don't know there's space to improvement here, i got 1% of speed up
> doing a mix of disk speed (7.2k rpm 5k rpm and 15k rpm and a ssd) and
> changing the read balance algorithm

If your RAID1 array contains both spinning disks and SSDs, you can (and
should) simply set the spinning disks as "write-mostly":

http://tansi.info/hybrid/

This causes all reads to come from the SSD if possible. After doing
this, all reads should be at the SSD speed, although all writes will
still be at spinning disk speeds.

We have been doing this on all our database servers for years with zero
problems (although we finally now trust SSDs enough to switch all array
members to SSDs, so we're phasing it out).

If your workload consists of lots of scattered reads, you'll get far
more than a 1% read performance increase from this, with no tweaking of
read balancing algorithm necessary.

--
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/
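
For an array that already exists, the write-mostly flag can, as far as I know, be changed at run time through the md sysfs interface by writing "writemostly" (or "-writemostly" to clear it) to the member's state file. The sketch below assumes that interface and the /sys/block/<array>/md/dev-<member>/state path described in the kernel's md documentation; the function name, the sysfs_root parameter (there so the code can be tried against a scratch directory) and the device names in the usage note are made up.

```python
from pathlib import Path


def set_write_mostly(array: str, member: str, enable: bool = True,
                     sysfs_root: str = "/sys") -> Path:
    """Mark (or unmark) one member of an md array as write-mostly.

    Assumes the md sysfs layout <root>/block/<array>/md/dev-<member>/state.
    Returns the path that was written.
    """
    state = Path(sysfs_root) / "block" / array / "md" / f"dev-{member}" / "state"
    if not state.exists():
        raise FileNotFoundError(f"no such array member: {state}")
    # "writemostly" sets the flag, "-writemostly" clears it.
    state.write_text("writemostly" if enable else "-writemostly")
    return state
```

On a real system this needs root, and a call would look like `set_write_mostly("md0", "sdb1")` with your own array and member names substituted. When creating or extending an array, mdadm's --write-mostly option should achieve the same thing for the devices listed after it.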