* Why are reads not balanced across my RAID-1?
From: George Spelvin @ 2014-01-24 10:47 UTC
To: linux-raid; +Cc: linux
I was doing some bulk reads on an ext4 file system on
a simple mirrored device (3.13 x86_64 kernel):
md2 : active raid1 sdd3[0] sde3[1]
      1932514496 blocks [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk
And I noticed that the reads were all hitting the first disk.
The second was basically idle. Here's "dstat -d -D md2,sdd,sde":
--dsk/md2-----dsk/sdd-----dsk/sde--
 read  writ: read  writ: read  writ
 510k   55k: 484k   58k:  27k   58k
 129M     0: 129M     0:  48k     0
 108M   48k: 108M   60k: 576k   60k
 123M     0: 123M 8192B:    0 8192B
 133M     0: 133M     0: 360k     0
 132M     0: 132M     0:  20k     0
 138M     0: 138M     0: 304k     0
 128M     0: 128M     0: 896k     0
 129M     0: 129M 4096B:  64k 4096B
 135M     0: 135M     0:  36k     0
 116M   12k: 116M   24k:  36k   28k
 117M     0: 116M 4096B: 632k     0
 127M     0: 127M     0: 288k     0
 130M     0: 130M     0: 336k     0
 133M     0: 133M     0: 212k     0
 134M     0: 134M     0: 304k     0
 130M     0: 129M 4096B: 100k 4096B
 128M     0: 127M     0: 280k     0
 106M   12k: 106M   28k: 372k   28k
 129M     0: 129M     0: 344k     0
 134M     0: 134M     0: 196k     0
 134M     0: 134M     0: 384k     0
 129M     0: 129M     0: 304k     0
I thought (drivers/md/raid1.c:read_balance()) the driver was
supposed to do some striping on large reads.
While 125M/s is nice, more than that would be nicer.
The drives are identical, but are plugged into different controllers.
sdd is on an AMD SB600 controller, while sde is on a PDC42819.
Is there some knob I need to adjust to make read balancing happen?
* Re: Why are reads not balanced across my RAID-1?
From: keld @ 2014-01-24 12:04 UTC
To: George Spelvin; +Cc: linux-raid
On Fri, Jan 24, 2014 at 05:47:57AM -0500, George Spelvin wrote:
> Is there some knob I need to adjust to make read balancing happen?
The reading is not balanced because it does not make sense to do balanced
reads for sequential reading. In RAID-1 the disk sectors are consecutive.
So if you read one sector from one disk and the following sector from the other disk,
then the next read from disk 1 would need to skip a full revolution of the disk,
which may cost something like 8 ms. So it is better to read contiguously from the same disk, and hope
for some other I/O request that can use disk 2.
For sequential reading, RAID-10 in the "far" layout makes for a much more balanced
reading scheme. You will get something like RAID-0 read speeds there.
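A minimal sketch of why the far layout helps sequential reads, assuming a simplified "far 2" layout in which the first copy of every chunk is striped RAID-0 style across the first half of the devices and the second copy sits in the second half, rotated by one device. The function `far2_locations` and its `half_chunks` parameter are hypothetical illustrations, not md's actual mapping code.

```python
def far2_locations(chunk, n_devices, half_chunks):
    """Return [(device, chunk_slot), ...] for both copies of logical `chunk`.

    Simplified 'far 2' model (an assumption, not md's code): copy 1 is
    striped RAID-0 style over slots [0, half_chunks) of each device, copy 2
    over slots [half_chunks, 2 * half_chunks), rotated by one device.
    """
    row, dev = divmod(chunk, n_devices)
    if row >= half_chunks:
        raise ValueError("chunk beyond array size")
    first = (dev, row)
    second = ((dev + 1) % n_devices, half_chunks + row)
    return [first, second]


# Reading chunks 0..5 sequentially from the first copies alternates devices,
# and each device reads its own slots contiguously, as in RAID-0.
print([far2_locations(c, 2, 1000)[0] for c in range(6)])
# -> [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Because every device holds a contiguous run of first copies, a long sequential read can keep both heads streaming, which is the RAID-0-like behaviour described above.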
Best regards
Keld
* Re: Why are reads not balanced across my RAID-1?
From: George Spelvin @ 2014-01-24 13:33 UTC
To: keld, linux; +Cc: linux-raid
> The reading is not balanced because it does not make sense to do balanced
> reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> So if you would read one sector from one disk, and the following sector
> from the other disk, then the next read from disk 1 would need to skip
> a full revolution of the disk, which may cost something like 8 ms.
> So better read contiguously from the same disk, and hope for some other
> IO request that can use disk 2.
Actually I don't think that's true once reads get big enough.
7200 RPM is 120 rotations per second, so there is about 1 MB
of data per track. At the end of that, the drive has to switch
heads or do a track-to-track seek to get more. If we knew where
the track boundaries were, we could interleave reads on those
boundaries and get good speedup.
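The per-track estimate is simple arithmetic. A quick check, assuming the roughly 125 MB/s sustained rate seen above (real drives vary across the platter, and the helper names are illustrative):

```python
def revolution_ms(rpm):
    """Time for one platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def mb_per_revolution(rpm, sustained_mb_per_s):
    """Data passing under the head in one revolution at a given sustained rate."""
    return sustained_mb_per_s / (rpm / 60.0)


print(round(revolution_ms(7200), 2))            # 8.33 ms per revolution
print(round(mb_per_revolution(7200, 125), 2))   # about 1.04 MB per revolution
```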
But it's also possible to take advantage if the reads are
sufficiently larger than this 1 MB threshold. Alternating
at 8 MB boundaries would probably be a speedup.
Actually, I should do some timing to find out....
Reading every odd block from the first 8 GiB of sdd (4 GiB of data read)
using a block size of:
256 MiB: 25.917s (165.7 MB/s)
128 MiB: 25.293s (169.8 MB/s)
64 MiB: 26.341s (163.1 MB/s)
32 MiB: 26.029s (165.0 MB/s)
16 MiB: 27.327s (157.2 MB/s)
8 MiB: 28.210s (152.2 MB/s)
4 MiB: 31.371s (136.9 MB/s)
2 MiB: 36.560s (117.5 MB/s)
1 MiB: 51.325s ( 83.7 MB/s)
So 1 MB striping would be barely faster than single-drive reading,
but 2 MB offers a speedup, and 8 MB should actually be quite nice.
(Reading the first 4 GiB of the drive, with no seeking, also takes
25.something seconds.)
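A minimal sketch of this kind of measurement, assuming a Linux/Unix host where `os.pread` is available. The function names are hypothetical and the exact tool used for the timings above is not stated. To measure the disk rather than the page cache, run it against the raw device (as root) after dropping caches, or use O_DIRECT, which needs aligned buffers and is not shown here.

```python
import os
import time


def read_odd_blocks(path, region_bytes, block_size):
    """Read every odd-numbered block in the first region_bytes of path.

    Returns (bytes_read, elapsed_seconds). Blocks 1, 3, 5, ... are read;
    the even blocks in between are skipped.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for offset in range(block_size, region_bytes, 2 * block_size):
            total += len(os.pread(fd, block_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, elapsed


def mb_per_s(nbytes, seconds):
    """Throughput in decimal megabytes per second, as in the table above."""
    return nbytes / 1e6 / seconds


# Sanity check against the first table row: 4 GiB read in 25.917 s.
print(round(mb_per_s(4 * 2**30, 25.917), 1))   # 165.7
```

For example, `read_odd_blocks("/dev/sdd", 8 * 2**30, 8 * 2**20)` would correspond to the 8 MiB row.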
... but even ignoring that, shouldn't reads change drives when there's a
jump in the sequential read? ext4 (with flex_bg) divides the disk into
2 GB "block groups", with reserved inode space at the start of each one,
and can't allocate contiguous data larger than that. And the files
I was reading were all in the 50-150 MB range anyway.
Should the md driver switch drives when there is a jump in the address
being fetched? But for minutes, I *never* saw any sde activity.
* Re: Why are reads not balanced across my RAID-1?
From: Matt Garman @ 2014-01-24 14:56 UTC
To: keld; +Cc: George Spelvin, Mdadm
On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
> The reading is not balanced because it does not make sense to do balanced
> reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> So if you would read one sector from one disk, and the following sector from the other disk,
> then the next read from disk 1 would need to skip a full revolution of the disk,
> which may cost something like 8 ms. So better read contiguously from the same disk, and hope
> for some other IO request that can use disk 2.
Does that rationale hold for SSDs?
* Re: Why are reads not balanced across my RAID-1?
From: Mathias Burén @ 2014-01-24 16:50 UTC
To: Matt Garman; +Cc: Keld Jørn Simonsen, George Spelvin, Mdadm
On 24 January 2014 14:56, Matt Garman <matthew.garman@gmail.com> wrote:
> On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
>> The reading is not balanced because it does not make sense to do balanced
>> reads for sequential reading. In RAID-1 the disk sectors are consecutive.
>> So if you would read one sector from one disk, and the following sector from the other disk,
>> then the next read from disk 1 would need to skip a full revolution of the disk,
>> which may cost something like 8 ms. So better read contiguously from the same disk, and hope
>> for some other IO request that can use disk 2.
>
>
> Does that rationale hold for SSDs?
Note: MD can do RAID10 with 2 devices, right?
Mathias
* Re: Why are reads not balanced across my RAID-1?
From: keld @ 2014-01-24 17:49 UTC
To: Matt Garman; +Cc: George Spelvin, Mdadm
On Fri, Jan 24, 2014 at 08:56:02AM -0600, Matt Garman wrote:
> On Fri, Jan 24, 2014 at 6:04 AM, <keld@keldix.com> wrote:
> > The reading is not balanced because it does not make sense to do balanced
> > reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> > So if you would read one sector from one disk, and the following sector from the other disk,
> > then the next read from disk 1 would need to skip a full revolution of the disk,
> > which may cost something like 8 ms. So better read contiguously from the same disk, and hope
> > for some other IO request that can use disk 2.
>
>
> Does that rationale hold for SSDs?
No, this is not relevant for SSDs as access time is almost zero.
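With access time close to zero, seek distance stops being a useful signal, and a balancer could spread reads by load instead. A minimal sketch of one such policy (pick the mirror with the fewest in-flight reads). This is an illustration with hypothetical names, not a description of what the 3.13 raid1 driver does.

```python
def pick_least_busy(pending):
    """pending maps mirror name -> number of in-flight reads.

    Returns the least-loaded mirror; ties go to the alphabetically first name.
    """
    return min(sorted(pending), key=lambda name: pending[name])


def distribute(n_requests, names):
    """Issue n_requests reads with no completions and count where they go."""
    pending = {name: 0 for name in names}
    for _ in range(n_requests):
        pending[pick_least_busy(pending)] += 1
    return pending


print(distribute(10, ["sda", "sdb"]))   # {'sda': 5, 'sdb': 5}
```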
best regards
keld
* Re: Why are reads not balanced across my RAID-1?
From: Roberto Spadim @ 2014-01-24 18:22 UTC
To: Keld Jørn Simonsen; +Cc: Matt Garman, George Spelvin, Mdadm
Hi guys, I did some tests some time ago, and what I can tell is that
RAID-1 uses a nearest-disk algorithm to select which disk will execute
each read (read balancing).
I think it's very good. It doesn't consider different disk speeds (access
time and sequential read rate), but it does a very good job.
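A simplified model of a nearest-head policy, written to show how it can leave one mirror idle during forward-moving scans like the one in the original post. It assumes that a read continuing where a mirror's last read ended goes to that mirror, and that otherwise the mirror whose last head position is closest wins. The names are hypothetical, and this is not the kernel's read_balance() code, which handles more cases.

```python
def nearest_head(heads, start):
    """Pick a mirror for a read beginning at sector `start`.

    heads[i] is where mirror i's last read ended. An exact match is a
    sequential continuation; otherwise choose the smallest seek distance
    (ties go to the lower index).
    """
    for i, pos in enumerate(heads):
        if pos == start:
            return i
    return min(range(len(heads)), key=lambda i: abs(heads[i] - start))


def run(requests, n_mirrors=2):
    """Replay (start, length) reads and count how many each mirror serves."""
    heads = [0] * n_mirrors
    hits = [0] * n_mirrors
    for start, length in requests:
        i = nearest_head(heads, start)
        heads[i] = start + length
        hits[i] += 1
    return hits


# Three files read front to back, each as 100 reads of 256 sectors,
# with forward jumps of one million sectors between files.
reqs = [(f * 1_000_000 + k * 256, 256) for f in range(3) for k in range(100)]
print(run(reqs))   # [300, 0]: mirror 1's head stays at 0, so it never wins
```

In this model, a jump forward is always closer to the busy mirror's head than to the idle one parked behind it, so the idle mirror only gets work when another stream starts elsewhere.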
I don't know if there's room for improvement here. I got a 1% speedup
with a mix of disk speeds (7.2k RPM, 5k RPM, 15k RPM, and an SSD) by
changing the read balance algorithm.
I think it's a solid 1% speedup, but the parameters are confusing to
configure. They are based on access time (the time for the head to move
from position 0 to the end of the disk, or 0 for an SSD) and sequential
read rate (MB/s in a continuous read, or derived from RPM and track size).
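A minimal sketch of the kind of cost-based choice described here, assuming a linear seek model (seek time proportional to head travel, scaled by the full-stroke time) plus transfer time at the sustained rate. The class, fields, and numbers are hypothetical; this is not the actual modified algorithm from these tests.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    full_stroke_ms: float   # head travel time from sector 0 to the end (0 for an SSD)
    mb_per_s: float         # sustained sequential read rate
    size_sectors: int
    head: int = 0           # sector where the last read ended


def estimated_ms(d, start, n_sectors, sector_bytes=512):
    """Linear seek estimate plus transfer time for a read of n_sectors."""
    seek = d.full_stroke_ms * abs(d.head - start) / d.size_sectors
    transfer = n_sectors * sector_bytes / (d.mb_per_s * 1e6) * 1e3
    return seek + transfer


def pick_fastest(disks, start, n_sectors):
    """Choose the disk with the lowest estimated service time."""
    return min(disks, key=lambda d: estimated_ms(d, start, n_sectors))


hdd = Disk("hdd", full_stroke_ms=15.0, mb_per_s=150.0, size_sectors=1_000_000_000)
ssd = Disk("ssd", full_stroke_ms=0.0, mb_per_s=500.0, size_sectors=1_000_000_000)
print(pick_fastest([hdd, ssd], 500_000_000, 256).name)   # ssd
```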
The best I got is:
RAID-1 suits workloads like databases or multithreaded systems, with
each thread operating on a disk;
if you have a system with continuous reads/writes (DVR, streaming, etc.),
raid10 far is the best.
You can maybe tune more with alignment, read-ahead, other
filesystem options/parameters, and caching; a RAID card with flash or
battery backup is nice too.
Good luck
--
Roberto Spadim
* Re: Why are reads not balanced across my RAID-1?
From: George Spelvin @ 2014-01-25 1:18 UTC
To: keld, rspadim; +Cc: linux-raid, linux, matthew.garman
> The best I got is:
> RAID-1 suits workloads like databases or multithreaded systems, with
> each thread operating on a disk;
> if you have a system with continuous reads/writes (DVR, streaming, etc.),
> raid10 far is the best.
Thank you for the response, but I'm a little confused. Can you clarify?
I'm not sure which of three things you're suggesting.
Are you talking about the actual data layout, or the Linux driver?
Obviously, if I added disks and striped across them, sequential
performance would go up. This is an actual RAID-10 layout.
But it's not useful unless I buy more disks.
But there's also the case that I can use the kernel raid10 driver with
an n2 layout. This has the same disk layout as raid1, but might have
different preformance. Is that what you mean? In that case, it would be
nice to either copy the improvements to the raid1 driver, or make the
raid10 driver able to handle raid1 arrays and throw out the raid1 driver.
Finally, a third option is that you're talking about the raid10
driver, but with a far or offset layout. I haven't played with those.
* Re: Why are reads not balanced across my RAID-1?
From: keld @ 2014-01-25 1:48 UTC
To: George Spelvin; +Cc: linux-raid
On Fri, Jan 24, 2014 at 08:33:26AM -0500, George Spelvin wrote:
> > The reading is not balanced because it does not make sense to do balanced
> > reads for sequential reading. In RAID-1 the disk sectors are consecutive.
> > So if you would read one sector from one disk, and the following sector
> > from the other disk, then the next read from disk 1 would need to skip
> > a full revolution of the disk, which may cost something like 8 ms.
> > So better read contiguously from the same disk, and hope for some other
> > IO request that can use disk 2.
>
> Actually I don't think that's true once reads get big enough.
> 7200 RPM is 120 rotations per second, so there is about 1 MB
> of data per track. At the end of that, the drive has to switch
> heads or do a track-to-track seek to get more. If we knew where
> the track boundaries were, we could interleave reads on those
> boundaries and get good speedup.
I am not sure that there is 1 MB per track, or per rotation, but on 7200 RPM disks a rotation takes
about 8 milliseconds. On the drive you describe below you can read something like
150 MB/s, i.e. 150 kB/ms, so in 8 ms you can read about 1.2 MB. As you say, with big
block sizes you can minimize the impact of this loss, e.g. using 10 MB block sizes.
But for databases it would be quite a waste to read 10 MB per record access.
raid10,far then gives good performance both for sequential reading and for database access,
with block sizes of only about 1 MB.
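The argument above can be turned into a crude per-disk model: each block transfers at the sustained rate, then roughly one revolution is lost before the next block. This sketch ignores seeks and track switches, and the function name is illustrative, so treat its numbers as rough estimates rather than predictions.

```python
def effective_mb_per_s(block_mb, sustained_mb_per_s, rpm=7200):
    """Per-disk throughput if every block costs one extra revolution."""
    transfer_s = block_mb / sustained_mb_per_s
    revolution_s = 60.0 / rpm
    return block_mb / (transfer_s + revolution_s)


for block_mb in (1, 2, 8, 10):
    print(block_mb, "MB:", round(effective_mb_per_s(block_mb, 150), 1), "MB/s")
# 1 MB gives 66.7 MB/s; 10 MB gives 133.3 MB/s
```

Larger blocks amortize the lost revolution, which is the trade-off against small, database-sized record reads.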
> But it's also possible to take advantage if the reads are
> sufficiently larger than this 1 MB threshold. Alternating
> at 8 MB boundaries would probably be a speedup.
>
> Actually, I should do some timing to find out....
>
> Reading every odd block from the first 8 GiB of sdd (4 GiB of data read)
> using a block size of:
> 256 MiB: 25.917s (165.7 MB/s)
> 128 MiB: 25.293s (169.8 MB/s)
> 64 MiB: 26.341s (163.1 MB/s)
> 32 MiB: 26.029s (165.0 MB/s)
> 16 MiB: 27.327s (157.2 MB/s)
> 8 MiB: 28.210s (152.2 MB/s)
> 4 MiB: 31.371s (136.9 MB/s)
> 2 MiB: 36.560s (117.5 MB/s)
> 1 MiB: 51.325s ( 83.7 MB/s)
>
> So 1 MB striping would be barely faster than single-drive reading,
> but 2 MB offers a speedup, and 8 MB should actually be quite nice.
> (Reading the first 4 GiB of the drive, with no seeking, also takes
> 25.something seconds.)
With raid10,far you should get the full speed of about 160 MB/s with
a block size of just 1 MB, for sequential reading.
> ... but even ignoring that, shouldn't reads change drives when there's a
> jump in the sequential read? ext4 (with flex_bg) divides the disk into
> 2 GB "block groups", with reserved inode space at the start of each one,
> and can't allocate contiguous data larger than that. And the files
> I was reading were all in the 50-150 MB range anyway.
>
> Should the md driver switch drives when there is a jump in the address
> being fetched? But for minutes, I *never* saw any sde activity.
Well, it could. I did find some differences between raid1 and raid10,n2
in some tests I did, which could be due to different balancing in
the raid1 and raid10 driver.
best regards
keld
* Re: Why are reads not balanced across my RAID-1?
From: Robert L Mathews @ 2014-01-25 17:52 UTC
To: Mdadm
On 1/24/14, 10:22 AM, Roberto Spadim wrote:
> I don't know if there's room for improvement here. I got a 1% speedup
> with a mix of disk speeds (7.2k RPM, 5k RPM, 15k RPM, and an SSD) by
> changing the read balance algorithm.
If your RAID1 array contains both spinning disks and SSDs, you can (and
should) simply set the spinning disks as "write-mostly":
http://tansi.info/hybrid/
This causes all reads to come from the SSD if possible.
After doing this, all reads should be at the SSD speed, although all
writes will still be at spinning disk speeds. We have been doing this on
all our database servers for years with zero problems (although we
finally now trust SSDs enough to switch all array members to SSDs, so
we're phasing it out).
If your workload consists of lots of scattered reads, you'll get far
more than a 1% read performance increase from this, with no tweaking of
read balancing algorithm necessary.
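The effect described here can be sketched as a read-selection rule: skip members flagged write-mostly whenever an in-sync alternative exists. This is a simplified illustration with hypothetical names; in practice the flag is set with mdadm's --write-mostly option, and the driver's real logic handles more cases.

```python
def pick_for_read(members):
    """members: list of (name, write_mostly, in_sync) tuples.

    Prefer the first in-sync member that is not write-mostly; fall back to an
    in-sync write-mostly member; return None if nothing is readable.
    """
    for name, write_mostly, in_sync in members:
        if in_sync and not write_mostly:
            return name
    for name, _, in_sync in members:
        if in_sync:
            return name
    return None


print(pick_for_read([("sda", True, True), ("ssd", False, True)]))   # ssd
print(pick_for_read([("sda", True, True), ("ssd", False, False)]))  # sda
```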
--
Robert L Mathews, Tiger Technologies, http://www.tigertech.net/
* Re: Why are reads not balanced across my RAID-1?
From: Phillip Susi @ 2014-01-30 20:05 UTC
To: George Spelvin, keld, rspadim; +Cc: linux-raid, matthew.garman
On 1/24/2014 8:18 PM, George Spelvin wrote:
> Obviously, if I added disks and striped across them, sequential
> performance would go up. This is an actual RAID-10 layout. But
> it's not useful unless I buy more disks.
No, md can do raid10 on two disks; it just doesn't help in the near layout,
which on two disks is the same as raid1. The
offset layout with a sufficiently large chunk size mostly does what
you want. I recently set myself up with a 3-disk array like that and
it pushes 500 MB/s using cheap, run-of-the-mill 1 TB WD Blue drives.
The offset layout gives slightly better performance than far (as long
as you use a sufficiently large chunk size) and can be resized,
unlike far.
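For comparison with the far layout, a minimal sketch of an "offset 2" mapping, following the md(4) description that each stripe is duplicated and the copy is offset by one device. The function name and slot numbering are illustrative assumptions, not md's code.

```python
def offset2_locations(chunk, n_devices):
    """Return [(device, chunk_slot), ...] for both copies of logical `chunk`.

    Simplified 'offset 2' model: stripe row r occupies slot 2r on every
    device, and its duplicate occupies slot 2r + 1, shifted by one device.
    """
    row, dev = divmod(chunk, n_devices)
    return [(dev, 2 * row), ((dev + 1) % n_devices, 2 * row + 1)]


for c in range(4):
    print(c, offset2_locations(c, 2))
```

Because each duplicate sits in the very next slot rather than half a disk away, writes need only short seeks. Reading only the first copies makes each device read every other slot, the same pattern as the odd-block timing earlier in the thread, which is one way to see why a large chunk size matters here.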