linux-raid.vger.kernel.org archive mirror
* Raid 0+1
@ 2012-11-30 13:25 Oguz Yilmaz
  2012-11-30 14:27 ` Sebastian Riemer
  0 siblings, 1 reply; 7+ messages in thread
From: Oguz Yilmaz @ 2012-11-30 13:25 UTC (permalink / raw)
  To: linux-raid

Hello list members,

What is the suggested way for making Raid 0+1 (not 1+0)?
Is it possible to make it without LVM?

Regards,

--
Oguz YILMAZ

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Raid 0+1
  2012-11-30 13:25 Raid 0+1 Oguz Yilmaz
@ 2012-11-30 14:27 ` Sebastian Riemer
  2012-11-30 14:40   ` Sebastian Riemer
  2012-11-30 15:29   ` Lars Marowsky-Bree
  0 siblings, 2 replies; 7+ messages in thread
From: Sebastian Riemer @ 2012-11-30 14:27 UTC (permalink / raw)
  To: Oguz Yilmaz; +Cc: linux-raid

On 30.11.2012 14:25, Oguz Yilmaz wrote:
> What is the suggested way for making Raid 0+1 (not 1+0)?
> Is it possible to make it without LVM?

Yes, it is possible, but it only makes sense if you want to mirror to
another server, since the alternative, DRBD, is known to be too slow
for serious storage requirements.

Create the RAID-0 first, then take your RAID-0 device and e.g. an iSCSI
device from another storage server with the same setup and create a
RAID-1 over them. Then, you've got your stacked MD layers.

With the write-mostly flag you can even tell the read balancing code
that the remote device is slower than the local one.
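A minimal sketch of that stacking with mdadm (the device names and the
imported iSCSI disk are assumptions, not from this thread):

```shell
# Local RAID-0 stripe over two example disks
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda /dev/sdb

# /dev/sdx stands for the iSCSI disk imported from the remote storage
# server (which runs the same RAID-0 setup locally).  Mirror over the
# local stripe and the remote device, marking the remote one
# write-mostly so read balancing prefers the local stripe:
mdadm --create /dev/md1 --level=1 --raid-devices=2 \
      /dev/md0 --write-mostly /dev/sdx
```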

Cheers,
Sebastian


* Re: Raid 0+1
  2012-11-30 14:27 ` Sebastian Riemer
@ 2012-11-30 14:40   ` Sebastian Riemer
  2012-12-01 13:11     ` Oguz Yilmaz
  2012-11-30 15:29   ` Lars Marowsky-Bree
  1 sibling, 1 reply; 7+ messages in thread
From: Sebastian Riemer @ 2012-11-30 14:40 UTC (permalink / raw)
  To: Oguz Yilmaz; +Cc: linux-raid

On 30.11.2012 15:27, Sebastian Riemer wrote:
> On 30.11.2012 14:25, Oguz Yilmaz wrote:
>> What is the suggested way for making Raid 0+1 (not 1+0)?
>> Is it possible to make it without LVM?
> 
> Yes, it is possible, but it only makes sense if you want to mirror to
> another server, since the alternative, DRBD, is known to be too slow
> for serious storage requirements.
> 
> Create the RAID-0 first, then take your RAID-0 device and e.g. an iSCSI
> device from another storage server with the same setup and create a
> RAID-1 over them. Then, you've got your stacked MD layers.
> 
> With the write-mostly flag you can even tell the read balancing code
> that the remote device is slower than the local one.
> 

I've forgotten to mention: You need a kernel >= 3.4.2 for this.

Earlier kernels don't support bvec merging and therefore every IO is a
slow 4 KiB IO in that RAID 0+1 setup.

Btw.: LVM also supports striping, but letting LVM do the striping is
only useful if you want to build RAID 1+0. The speed is the same as
"RAID 1+0 + LVM". The only caveat is that the raid10 driver doesn't
scale well for >= 24 HDDs.
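A sketch of that LVM-striping variant (RAID 1+0 with LVM on top; device
names and sizes are assumptions):

```shell
# Two MD RAID-1 mirrors
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdd

# Stripe across the mirrors with LVM instead of MD RAID-0
pvcreate /dev/md0 /dev/md1
vgcreate vg0 /dev/md0 /dev/md1
lvcreate --type striped -i 2 -L 100G -n lv0 vg0
```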


* Re: Raid 0+1
  2012-11-30 14:27 ` Sebastian Riemer
  2012-11-30 14:40   ` Sebastian Riemer
@ 2012-11-30 15:29   ` Lars Marowsky-Bree
  2012-11-30 16:25     ` Sebastian Riemer
  1 sibling, 1 reply; 7+ messages in thread
From: Lars Marowsky-Bree @ 2012-11-30 15:29 UTC (permalink / raw)
  To: linux-raid

On 2012-11-30T15:27:56, Sebastian Riemer <sebastian.riemer@profitbricks.com> wrote:

> Yes, it is possible, but it only makes sense if you want to mirror to
> another server, since the alternative, DRBD, is known to be too slow
> for serious storage requirements.
> 
> Create the RAID-0 first, then take your RAID-0 device and e.g. an iSCSI
> device from another storage server with the same setup and create a
> RAID-1 over them. Then, you've got your stacked MD layers.
> 
> With the write-mostly flag you can even tell the read balancing code
> that the remote device is slower than the local one.

That is somewhat orthogonal to the original discussion, but in which
benchmarks is this approach faster than DRBD - aren't the bottlenecks
still the spindle and the network IO?



Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Raid 0+1
  2012-11-30 15:29   ` Lars Marowsky-Bree
@ 2012-11-30 16:25     ` Sebastian Riemer
  0 siblings, 0 replies; 7+ messages in thread
From: Sebastian Riemer @ 2012-11-30 16:25 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-raid

On 30.11.2012 16:29, Lars Marowsky-Bree wrote:
> On 2012-11-30T15:27:56, Sebastian Riemer <sebastian.riemer@profitbricks.com> wrote:
> 
>> Yes, it is possible, but it only makes sense if you want to mirror to
>> another server, since the alternative, DRBD, is known to be too slow
>> for serious storage requirements.
>>
>> Create the RAID-0 first, then take your RAID-0 device and e.g. an iSCSI
>> device from another storage server with the same setup and create a
>> RAID-1 over them. Then, you've got your stacked MD layers.
>>
>> With the write-mostly flag you can even tell the read balancing code
>> that the remote device is slower than the local one.
> 
> That is somewhat orthogonal to the original discussion, but in which
> benchmarks is this approach faster than DRBD - aren't the bottlenecks
> still the spindle and the network IO?
> 

Hi Lars,

Just run blktrace on the DRBD device while doing a file copy with at
least 512 KiB of read-ahead. Then power off the secondary and run
blktrace again.

Here is what you'll see:
DRBD uses 128 KiB hashing functions. You can never get bigger IOs than
that - bad for big sequential stuff.

In the second test you'll see that DRBD has dynamic IO request size
detection. It always starts with 4 KiB limits. If you lose the
connection to the other host, even your local IO is limited to 4 KiB.
Sorry, but this is crap.
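One possible way to run the trace described above (the device node,
mount point, and file name are assumptions):

```shell
# Set read-ahead on the DRBD device to 512 KiB (1024 sectors of 512 B)
blockdev --setra 1024 /dev/drbd0

# Trace request sizes on the device while copying a big file
blktrace -d /dev/drbd0 -o - | blkparse -i - > /tmp/drbd-trace.txt &
cp /mnt/drbd/bigfile /dev/null
wait
```

The request sizes show up in the blkparse output; with DRBD connected
you should never see them exceed 128 KiB.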

There are lots of other performance related bugs in DRBD. If you run it
in a virtual data center, then you'll see 4 KiB IOs while syncing
because they use the blk limits as signed instead of unsigned and KVM
initializes them as "-1U". They've fixed that one in 8.3.14 and 8.4.2.

Furthermore, there are lots of performance issues that show up clearly
when you use a fast transport like QDR InfiniBand. We saw ridiculously
bad performance with that. DRBD introduces lots of latency.

With SRP transport things are much better. Put MD RAID-1 on top and this
is nice! If you've got both rdevs as remote storage you can even have
symmetric (both rdevs the same) latency with MD RAID-1.

The write-intent bitmap of MD is really sophisticated!

Cheers,
Sebastian


-- 
Sebastian Riemer
Linux Kernel Developer - Storage

We are looking for (SENIOR) LINUX KERNEL DEVELOPERS!

ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.riemer@profitbricks.com

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Andreas Gauger, Achim Weiss


* Re: Raid 0+1
  2012-11-30 14:40   ` Sebastian Riemer
@ 2012-12-01 13:11     ` Oguz Yilmaz
  2012-12-02  2:10       ` Stan Hoeppner
  0 siblings, 1 reply; 7+ messages in thread
From: Oguz Yilmaz @ 2012-12-01 13:11 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: linux-raid

On Fri, Nov 30, 2012 at 4:40 PM, Sebastian Riemer
<sebastian.riemer@profitbricks.com> wrote:
> On 30.11.2012 15:27, Sebastian Riemer wrote:
>> On 30.11.2012 14:25, Oguz Yilmaz wrote:
>>> What is the suggested way for making Raid 0+1 (not 1+0)?
>>> Is it possible to make it without LVM?
>>
>> Yes, it is possible, but it only makes sense if you want to mirror to
>> another server, since the alternative, DRBD, is known to be too slow
>> for serious storage requirements.
>>
>> Create the RAID-0 first, then take your RAID-0 device and e.g. an iSCSI
>> device from another storage server with the same setup and create a
>> RAID-1 over them. Then, you've got your stacked MD layers.
>>


Why do we need another storage server?

If I create
md0 RAID-0 (sda1 + sda2)
md1 RAID-0 (sdc1 + sdc2)

is it then possible to create
md2 RAID-1 (md0 + md1)

with md?

Regards,

>> With the write-mostly flag you can even tell the read balancing code
>> that the remote device is slower than the local one.
>>
>
> I've forgotten to mention: You need a kernel >= 3.4.2 for this.
>
> Earlier kernels don't support bvec merging and therefore every IO is a
> slow 4 KiB IO in that RAID 0+1 setup.
>
> Btw.: LVM also supports striping, but letting LVM do the striping is
> only useful if you want to build RAID 1+0. The speed is the same as
> "RAID 1+0 + LVM". The only caveat is that the raid10 driver doesn't
> scale well for >= 24 HDDs.


* Re: Raid 0+1
  2012-12-01 13:11     ` Oguz Yilmaz
@ 2012-12-02  2:10       ` Stan Hoeppner
  0 siblings, 0 replies; 7+ messages in thread
From: Stan Hoeppner @ 2012-12-02  2:10 UTC (permalink / raw)
  To: Oguz Yilmaz; +Cc: Sebastian Riemer, linux-raid

On 12/1/2012 7:11 AM, Oguz Yilmaz wrote:

> Why do we need another storage server?

You don't.

> If I create
> md0 RAID-0 (sda1 + sda2)
> md1 RAID-0 (sdc1 + sdc2)
> 
> is it then possible to create
> md2 RAID-1 (md0 + md1)
> 
> with md?

Sure, you can do this.  One downside is that you can never expand it
with respect to capacity or effective spindles.  The only way to get
there is to put the RAID-1 device in a linear device and grow more of
these 4-device RAID 0+1 arrays into the linear device.  This setup
requires an allocation-group-based filesystem, i.e. XFS, to get
anything near linear scaling across the drives.  But for that your
application must exhibit file-level parallelism, i.e. reading/writing
many dozens of files in parallel, and with the inode64 mount option
they must be in different directories.  Otherwise your IO won't scale
across your disks.
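For completeness, the local layout from the question could be created
along these lines (a sketch; the partition names are taken from the
question and the mkfs/mount details are assumptions):

```shell
# Two RAID-0 stripes
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sda1 /dev/sda2
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc1 /dev/sdc2

# RAID-1 mirror over the two stripes
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/md0 /dev/md1

# XFS with the inode64 mount option, as discussed above
mkfs.xfs /dev/md2
mount -o inode64 /dev/md2 /mnt/data
```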

-- 
Stan



