* RAID1 write-behind not working?
From: David Rees @ 2009-03-26 0:39 UTC
To: linux-raid
I picked up an SSD (30GB OCZ Vertex if you're curious) and am planning
to set up a RAID1 with it and a SATA disk with the SATA disk in
write-mostly mode along with write-behind enabled.
So I went to test it to make sure I had all the details correct on
aligning the partition to a 128kB offset and the correct mdadm
commands to create the array in write-mostly and write-behind, but I
found that I couldn't get writes to "write-behind". Essentially all
writes would complete at the speed of the rotating disk. Reads
however worked great - they all appeared to be coming from the SSD.
I created the array with the following command:
    mdadm --create /dev/md0 --raid-devices=2 --level=1 \
        --bitmap=internal --write-behind /dev/sdb1 --write-mostly /dev/sda1
/dev/sda is the SATA disk, /dev/sdb is the SSD.
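[Editor's sketch] Since --write-mostly applies to the devices listed after it, it is worth confirming that the flag landed on the SATA partition and not the SSD. A minimal sketch, assuming the kernel marks write-mostly members with "(W)" in /proc/mdstat; the function name and the optional second argument (an alternative mdstat file, handy for testing) are illustrative, not part of mdadm:

```shell
#!/bin/sh
# List the members of an md array that /proc/mdstat flags as write-mostly.
# Assumption: write-mostly members appear as e.g. "sda1[1](W)".
write_mostly_members() {
    array=${1:-md0}
    mdstat=${2:-/proc/mdstat}
    # Take the array's status line, split it into words, keep the
    # words carrying the (W) marker, and strip the "[n](W)" suffix.
    grep "^$array :" "$mdstat" | tr ' ' '\n' | grep '(W)' | sed 's/\[.*//'
}
```

With the array created as above, `write_mostly_members md0` would be expected to print only the SATA partition.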
So then I created an ext3 filesystem on the array (mkfs -t ext3 -E
stride=32 /dev/md0), mounted it and proceeded to run some quick dd
tests to verify that things were working as expected.
I simply wrote a 1kB file to disk like this:
    dd if=/dev/zero of=/mnt/disk/tmpfile bs=1k count=1 conv=fdatasync
I tried various file sizes, and even tried reconfiguring the bitmap
with a write-behind setting of 10000.
I confirmed speeds by failing and removing either the SSD or SATA disk
- with both disks active, writes were completing at the speed of the
SATA disk, which was 2-5 times slower depending on the size of the
file.
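[Editor's sketch] A single dd run is noisy, so the comparison above can be made more repeatable by averaging several small synced writes. This is only a sketch, assuming GNU dd (conv=fdatasync) and GNU date (%N for nanoseconds); the function name is made up for illustration:

```shell
#!/bin/sh
# Time COUNT 1 kB writes, each followed by fdatasync, and print the
# mean latency per write in microseconds.
time_sync_writes() {
    dir=$1
    count=${2:-20}
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if=/dev/zero of="$dir/tmpfile" bs=1k count=1 conv=fdatasync 2>/dev/null
        i=$((i + 1))
    done
    end=$(date +%s%N)
    # Nanoseconds elapsed, divided by the number of writes, in microseconds.
    echo $(( (end - start) / count / 1000 ))
}
```

Running `time_sync_writes /mnt/disk 20` once with both members active and once with each member failed and removed gives three figures to compare.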
My only guess is that specifying fdatasync on the dd command forces
the data to be written out to all members of the array. That is not
quite intuitive if it is by design - because otherwise, when are you
able to take advantage of the write-behind feature?
The system I was testing on was running off a Fedora 10 live CD, so
it's not the most recent software, but it's not that old, either:
(kernel 2.6.27.5-117.fc10, mdadm 2.6.7.1).
Can anyone confirm that write-behind is working for them, and a quick
test to show that it's working?
-Dave
* Re: RAID1 write-behind not working?
From: David Rees @ 2009-03-26 17:28 UTC
To: Bill Davidsen; +Cc: linux-raid
(re-adding linux-raid to Cc)
On Thu, Mar 26, 2009 at 6:01 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> David Rees wrote:
>> My only guess is that specifying fdatasync on the dd command forces
>> the data to be written out to all members of the array. That is not
>> quite intuitive if it is by design - because otherwise, when are you
>> able to take advantage of the write-behind feature?
>>
>> The system I was testing on was running off a Fedora 10 live CD, so
>> it's not the most recent software, but it's not that old, either:
>> (kernel 2.6.27.5-117.fc10, mdadm 2.6.7.1).
>>
>> Can anyone confirm that write-behind is working for them, and a quick
>> test to show that it's working?
>
> I agree that it's not obvious that fdatasync would work that way, but
> thinking about it, that's correct behavior. You would use fdatasync when you
> want to be really sure the data is on the device. This may be hard to
> measure, but perhaps using "iostat" with a 1 sec sample would at least let
> you see the i/o rates and convince yourself that it is working as expected.
Yeah, it does kind of make sense in theory, but it seems it would be
useful to relax it, since it would largely negate the benefits of
write-behind, no?
After all, in most cases data doesn't start getting written to disk
until you really want it there, either because userspace called fsync
or because the kernel has decided it's time to start flushing data to
disk as it has reached its dirty limits.
In this case, all I really care about is that the data has been written to
the non-write-mostly disks and that the write-mostly disks can lag by
the number of IOs I've specified.
-Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RAID1 write-behind not working?
From: Neil Brown @ 2009-03-27 4:25 UTC
To: David Rees; +Cc: Bill Davidsen, linux-raid
On Thursday March 26, drees76@gmail.com wrote:
> (re-adding linux-raid to Cc)
>
> On Thu, Mar 26, 2009 at 6:01 AM, Bill Davidsen <davidsen@tmr.com> wrote:
> > David Rees wrote:
> >> My only guess is that specifying fdatasync on the dd command forces
> >> the data to be written out to all members of the array. That is not
> >> quite intuitive if it is by design - because otherwise, when are you
> >> able to take advantage of the write-behind feature?
That is not the design. md/raid doesn't know anything about
fdatasync.
When a write request arrives at a raid1 with write-behind, a copy is
made of the data, and that copy is written to the "write-mostly"
devices while the original is written to the other devices (or
something like that). When the writes to the non-write-mostly devices
complete, the filesystem is told that the write is complete.
The write-behind devices can only cause a delay if:
 - the number of outstanding writes exceeds the maximum given
   with --write-behind= to mdadm
 - memory gets tight and we fail to malloc space to make a copy of
   the data.
So if your write-behind device has high latency, that should be
masked. If however it has low throughput, that cannot be masked. The
throughput of the raid1 will always be limited by the throughput of
the slowest component device.
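[Editor's sketch] One way to check whether the first condition above is being hit is to compare the configured write-behind limit with the high-water mark the kernel records. This assumes the md sysfs attributes bitmap/backlog and bitmap/max_backlog_used, which are documented for later kernels and may not exist on an older one such as 2.6.27; the function name and its path argument are illustrative:

```shell
#!/bin/sh
# Print the write-behind limit and the largest number of concurrent
# write-behind requests seen so far for one md array.
# Assumption: the sysfs files bitmap/backlog and bitmap/max_backlog_used
# exist under the array's md directory.
show_write_behind() {
    md=${1:-/sys/block/md0/md}
    if [ -r "$md/bitmap/backlog" ]; then
        echo "backlog limit: $(cat "$md/bitmap/backlog")"
        echo "max backlog used: $(cat "$md/bitmap/max_backlog_used")"
    else
        echo "no bitmap/backlog under $md"
    fi
}
```

If the high-water mark sits at the limit during a test, writes are probably being forced synchronous because too many are outstanding, rather than because of latency on the write-mostly device.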
>
> In this case, all I really care about is that the data has been written to
> the non-write-mostly disks and that the write-mostly disks can lag by
> the number of IOs I've specified.
That is exactly how it is designed to work (providing malloc doesn't
fail).
If you have numerical evidence that it is behaving otherwise, I'd be
keen to see it.
NeilBrown
* Re: RAID1 write-behind not working?
From: Neil Brown @ 2009-03-27 4:29 UTC
To: David Rees; +Cc: linux-raid
On Wednesday March 25, drees76@gmail.com wrote:
> I picked up an SSD (30GB OCZ Vertex if you're curious) and am planning
> to set up a RAID1 with it and a SATA disk with the SATA disk in
> write-mostly mode along with write-behind enabled.
>
> So I went to test it to make sure I had all the details correct on
> aligning the partition to a 128kB offset and the correct mdadm
> commands to create the array in write-mostly and write-behind, but I
> found that I couldn't get writes to "write-behind". Essentially all
> writes would complete at the speed of the rotating disk. Reads
> however worked great - they all appeared to be coming from the SSD.
>
> I created the array with the following command:
>
> mdadm --create /dev/md0 --raid-devices=2 --level=1 \
> --bitmap=internal --write-behind /dev/sdb1 --write-mostly /dev/sda1
Hmmm... try with an external bitmap stored elsewhere on the SSD.
When you have bitmap=internal, bitmap updates are synchronous to all
devices. Maybe that is causing the loss of speed that you see.
NeilBrown
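[Editor's sketch] The suggestion above could look roughly like the following. It is a dry run that only prints the command. BITMAP_FILE is a hypothetical path on a separate SSD partition (the bitmap file must not live on the array itself), and 256 is just an example limit. mdadm's documentation has cautioned that file-backed bitmaps are only known to work on ext2 and ext3, so check that your mdadm and kernel still support them before relying on this:

```shell
#!/bin/sh
# Dry-run sketch: prints the mdadm command instead of running it.
# BITMAP_FILE is a hypothetical path on a filesystem that is NOT on
# /dev/md0 (for example, another partition of the SSD).
BITMAP_FILE=${BITMAP_FILE:-/mnt/ssd-extra/md0.bitmap}
WRITE_BEHIND=${WRITE_BEHIND:-256}   # max outstanding write-behind requests

print_create_cmd() {
    echo "mdadm --create /dev/md0 --raid-devices=2 --level=1" \
         "--bitmap=$BITMAP_FILE --write-behind=$WRITE_BEHIND" \
         "/dev/sdb1 --write-mostly /dev/sda1"
}

print_create_cmd
```

As in the original command, /dev/sdb1 (the SSD) comes before --write-mostly so that only /dev/sda1 (the SATA disk) is flagged.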
* Re: RAID1 write-behind not working?
From: David Rees @ 2009-03-27 5:48 UTC
To: Neil Brown; +Cc: linux-raid
On Thu, Mar 26, 2009 at 9:29 PM, Neil Brown <neilb@suse.de> wrote:
> On Wednesday March 25, drees76@gmail.com wrote:
>> I picked up an SSD (30GB OCZ Vertex if you're curious) and am planning
>> to set up a RAID1 with it and a SATA disk with the SATA disk in
>> write-mostly mode along with write-behind enabled.
>>
>> So I went to test it to make sure I had all the details correct on
>> aligning the partition to a 128kB offset and the correct mdadm
>> commands to create the array in write-mostly and write-behind, but I
>> found that I couldn't get writes to "write-behind". Essentially all
>> writes would complete at the speed of the rotating disk. Reads
>> however worked great - they all appeared to be coming from the SSD.
>>
>> I created the array with the following command:
>>
>> mdadm --create /dev/md0 --raid-devices=2 --level=1 \
>> --bitmap=internal --write-behind /dev/sdb1 --write-mostly /dev/sda1
>
> Hmmm... try with an external bitmap stored elsewhere on the SSD.
> When you have bitmap=internal, bitmap updates are synchronous to all
> devices. Maybe that is causing the loss of speed that you see.
Darn, I gave that a brief thought when I was setting it up but later
forgot about it. I'll have to give it another test. If that turns out
to be the cause, we should probably add a note to the write-behind
documentation that the bitmap needs to be external to see any benefit.
Hopefully I'll have time to get this set up and tested over the next
couple days.
-Dave