* Partitioning on top of raid mirror device questions.
@ 2014-07-10 11:24 Wilson Jonathan
  2014-07-10 13:17 ` Phil Turmel
  2014-07-11 16:29 ` Chris Murphy
  0 siblings, 2 replies; 4+ messages in thread
From: Wilson Jonathan @ 2014-07-10 11:24 UTC
  To: linux-raid@vger.kernel.org

In my system I am running a few virtual systems; currently they reside
in raw files.

Initially I created a raid1 partition, with an ext4 file system on top
of the whole partition.

/dev/md/md85 from /dev/sda5 /dev/sdb5 mirror 120G

However I know from experience that raw files can be a bit slow (totally
different setup, raid6), so I wondered about the possibility of creating
3 individual partitions on top of the raid and if this would improve
performance.

Having read the man page, it seems that partitions on top of raid are
fine, and no special options are required in the raid creation.

Now the questions.

Alignment... 

Now I understand that the base disk partitions require alignment based
on the drive... and I assume mdadm then creates its internal structure
so that it is also aligned, or does it? 

My wondering here is that I know mdadm has an area that holds data about
the raid, then another area that holds the data... if the data area
(chunks? I may have the wrong term) was not aligned to the underlying
drives then would a write of "chunkX" potentially partially write to
disk area62 and disk area63 (for example), causing the underlying disk
to do a read-modify-write (RMW)?

If we assume that raid/base disk is all hunky dory alignment wise, this
then brings me on to partitions on top of the raid...

A raid, when partitioned, pretends to be a block disk device: when I
used gdisk to look at it, without doing anything except looking at its
layout, it reported a normal disk, 512-byte sectors, first usable
sector 34, and partitions aligned on 2048-sector boundaries.

So my question is am I correct in thinking that "md85 partition 01" will
align to (an imaginary) 2048 boundary on "md85" which will align to the
real 2048 boundary on "sda5/sdb5"?

I may just stick with raw files but as I am in the process of upgrading
it piqued my interest and might be worth converting to partitions, or
possibly LVM which seems the preferred or most documented option (but
I'm not sure I want to add a whole new set of skills and learning curve
at the moment). 

My intention is to add 2 more disks to the mirror raid, which while not
changing the write performance I believe will improve the read
performance... at least as far as I can tell, again is this assumption
correct?

Thanks in advance. 

Jon.



* Re: Partitioning on top of raid mirror device questions.
  2014-07-10 11:24 Partitioning on top of raid mirror device questions Wilson Jonathan
@ 2014-07-10 13:17 ` Phil Turmel
  2014-07-12 10:47   ` Wilson, Jonathan
  2014-07-11 16:29 ` Chris Murphy
  1 sibling, 1 reply; 4+ messages in thread
From: Phil Turmel @ 2014-07-10 13:17 UTC
  To: Wilson Jonathan, linux-raid@vger.kernel.org

Good morning Jonathan,

On 07/10/2014 07:24 AM, Wilson Jonathan wrote:

[trim /]

> However I know from experience that raw files can be a bit slow (totally
> different setup, raid6), so I wondered about the possibility of creating
> 3 individual partitions on top of the raid and if this would improve
> performance.

Yes, likely.  There would be no filesystem overhead.  I do this all the
time with md raid and LVM.

> Having read the man page, it seems that partitions on top of raid are
> fine, and no special options are required in the raid creation.

Correct.  However, if you use metadata v0.90 or v1.0, the raid data area
starts at the very start of the underlying device.  It's then possible
for the kernel to "see" the partitions as if they are on the underlying
device instead of in the array.  This is actually quite handy for /boot,
allowing a BIOS to boot from any of several identical mirrors.  But it
is hazardous for pretty much everything else.

Modern mdadm defaults to v1.2, so not a problem.
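
For reference, the superblock locations behind that hazard can be
written down as a small lookup.  This is only an illustrative Python
sketch: the table follows the layouts described in the md and mdadm man
pages, while the names SUPERBLOCK_LOCATION and
data_starts_at_device_start are made up for this example.

```python
# Where each md metadata version keeps its superblock (per the md/mdadm man pages).
SUPERBLOCK_LOCATION = {
    "0.90": "end of device",
    "1.0": "end of device",
    "1.1": "start of device",
    "1.2": "4K from start of device",
}


def data_starts_at_device_start(version):
    """True when the array data begins at offset 0 of the member device,
    so a partition table written inside the array is also visible on the
    bare member device."""
    return SUPERBLOCK_LOCATION[version] == "end of device"


for version in SUPERBLOCK_LOCATION:
    print(version, data_starts_at_device_start(version))
```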

> Now the questions.
> 
> Alignment... 
> 
> Now I understand that the base disk partitions require alignment based
> on the drive... and I assume mdadm then creates its internal structure
> so that it is also aligned, or does it? 

MD raid simply accepts whatever underlying alignment is present, and by
default places the data area at an offset that is a multiple of at
least 64k (early versions), and typically 1M (later versions).  So if
the underlying partitions are aligned, MD's structures will be aligned,
and any partitions created within the array that are aligned to the
array will also be aligned to the disks.
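
As a rough illustration of why the alignment carries through, here is a
minimal Python sketch that adds up the nested start offsets.  All three
offsets are hypothetical example values, not figures from this thread;
the real data offset of an array member is reported by
"mdadm --examine".

```python
SECTOR = 512  # bytes per logical sector


def absolute_byte_offset(*sector_offsets):
    """Sum nested start offsets (in sectors) and return the resulting
    byte offset on the raw disk."""
    return sum(sector_offsets) * SECTOR


def is_aligned(byte_offset, boundary):
    """True when byte_offset is an exact multiple of boundary (in bytes)."""
    return byte_offset % boundary == 0


# Hypothetical example values, all in 512-byte sectors.
disk_partition_start = 4096  # where sda5 starts on sda
md_data_offset = 2048        # where the md data area starts inside sda5
md_partition_start = 2048    # where the first partition starts inside md85

offset = absolute_byte_offset(
    disk_partition_start, md_data_offset, md_partition_start
)
print(offset, is_aligned(offset, 4096), is_aligned(offset, 1024 * 1024))
```

Because every layer starts on a multiple of 2048 sectors (1M), the sum
is also a multiple of 1M, and therefore of any smaller power-of-two
block size such as 4K.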

> My wondering here is that I know mdadm has an area that holds data about
> the raid, then another area that holds the data... if the data area
> (chunks? I may have the wrong term) was not aligned to the underlying
> drives then would a write of "chunkX" potentially partially write to
> disk area62 and disk area63 (for example), causing the underlying disk
> to do a read-modify-write (RMW)?

MD reserves space on the devices for *metadata*, which includes the
*superblock*.  There are various versions and layouts, all reasonably
well documented in the various man pages.

Where the raid level needs it, MD breaks the data area down into
*chunks* to create the boundaries for spreading the array data among the
multiple underlying devices.  The chunk size is configurable, but the
defaults are also alignment-friendly.  Some *filesystems* are smart
enough to take this into account, but I'm not an expert on that.
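
To make the idea of chunks concrete, here is a simplified Python sketch
of how a striped layout could map a logical byte offset onto a member
device.  It assumes a plain RAID0-style layout with equal-sized members;
the function name locate and the 512K chunk size are illustrative
choices, and the real md layouts (and raid1, which does not stripe at
all) differ in detail.

```python
def locate(logical_byte, chunk_size, n_devices):
    """Map a logical byte offset in a simple striped array to
    (device index, byte offset within that device's data area)."""
    chunk_no, within_chunk = divmod(logical_byte, chunk_size)
    stripe_no, device = divmod(chunk_no, n_devices)
    return device, stripe_no * chunk_size + within_chunk


CHUNK = 512 * 1024  # example chunk size in bytes

for logical in (0, CHUNK, 3 * CHUNK + 10):
    print(logical, locate(logical, CHUNK, 3))
```

As long as the chunk size is a multiple of the drive's physical block
size and the data area itself is aligned, every chunk boundary lands on
a block boundary of the member device.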

> If we assume that raid/base disk is all hunky dory alignment wise, this
> then brings me on to partitions on top of the raid...
> 
> A raid, when partitioned, pretends to be a block disk device: when I
> used gdisk to look at it, without doing anything except looking at its
> layout, it reported a normal disk, 512-byte sectors, first usable
> sector 34, and partitions aligned on 2048-sector boundaries.
> 
> So my question is am I correct in thinking that "md85 partition 01" will
> align to (an imaginary) 2048 boundary on "md85" which will align to the
> real 2048 boundary on "sda5/sdb5"?

Yes.

> I may just stick with raw files but as I am in the process of upgrading
> it piqued my interest and might be worth converting to partitions, or
> possibly LVM which seems the preferred or most documented option (but
> I'm not sure I want to add a whole new set of skills and learning curve
> at the moment). 

I always use LVM on top of my arrays.  It is also alignment-friendly,
and is *very* handy when you need to rearrange a machine's storage
without downtime.  I prefer it over partitions within the raid.

> My intention is to add 2 more disks to the mirror raid, which while not
> changing the write performance I believe will improve the read
> performance... at least as far as I can tell, again is this assumption
> correct?

It will improve multi-threaded reads, or multiple simultaneous
programs' reads.  It will not improve single-threaded streaming reads.
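
A toy model of that behaviour, as a Python sketch.  It assumes each
sequential read stream is served by one mirror at a time and ignores
seeks, caching and md's actual read-balancing heuristics, so the numbers
are only indicative; the function name and the 100 MB/s figure are
hypothetical.

```python
def aggregate_read_throughput(streams, mirrors, per_disk_mb_s):
    """Toy estimate: each concurrent stream keeps at most one mirror
    busy, so total throughput scales with min(streams, mirrors)."""
    busy_disks = min(streams, mirrors)
    return busy_disks * per_disk_mb_s


for streams in (1, 2, 4, 8):
    print(streams, aggregate_read_throughput(streams, 4, 100))
```

In this model a single stream sees the same throughput on a 2-way and a
4-way mirror, while four concurrent streams can use all four disks.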

HTH,

Phil


* Re: Partitioning on top of raid mirror device questions.
  2014-07-10 11:24 Partitioning on top of raid mirror device questions Wilson Jonathan
  2014-07-10 13:17 ` Phil Turmel
@ 2014-07-11 16:29 ` Chris Murphy
  1 sibling, 0 replies; 4+ messages in thread
From: Chris Murphy @ 2014-07-11 16:29 UTC
  To: Wilson Jonathan; +Cc: linux-raid@vger.kernel.org


On Jul 10, 2014, at 5:24 AM, Wilson Jonathan <i400s@hotmail.com> wrote:
> 
> If we assume that raid/base disk is all hunky dory alignment wise, this
> then brings me on to partitions on top of the raid…

Since you're going to use LVM anyway, why not use LVM raid? Make each whole disk (or partition) a PV. Then at the time you create an LV, choose the raid level. This consolidates the additional partitioning, mdadm, and lvm steps all into a single lvm step.


> I may just stick with raw files but as I am in the process of upgrading
> it piqued my interest and might be worth converting to partitions, or
> possibly LVM which seems the preferred or most documented option (but
> I'm not sure I want to add a whole new set of skills and learning curve
> at the moment). 

I found qcow2 files had the best performance. And if your use case even remotely benefits from snapshotting, qcow2 performs much better than (conventional) LVM snapshotting. The new LVM thin provisioning snapshotting makes snapshots perform the same as the original LV, but the libvirt integration work to use thin pools isn't done yet, I think.


Chris Murphy


* Re: Partitioning on top of raid mirror device questions.
  2014-07-10 13:17 ` Phil Turmel
@ 2014-07-12 10:47   ` Wilson, Jonathan
  0 siblings, 0 replies; 4+ messages in thread
From: Wilson, Jonathan @ 2014-07-12 10:47 UTC
  To: Phil Turmel; +Cc: linux-raid@vger.kernel.org

On Thu, 2014-07-10 at 09:17 -0400, Phil Turmel wrote:
> Good morning Jonathan,
> 
> On 07/10/2014 07:24 AM, Wilson Jonathan wrote:
> 
> [trim /]
> 
[snipped]
> 
> > I may just stick with raw files but as I am in the process of upgrading
> > it piqued my interest and might be worth converting to partitions, or
> > possibly LVM which seems the preferred or most documented option (but
> > I'm not sure I want to add a whole new set of skills and learning curve
> > at the moment). 
> 
> I always use LVM on top of my arrays.  It is also alignment-friendly,
> and is *very* handy when you need to rearrange a machine's storage
> without downtime.  I prefer it over partitions within the raid.

In the end I decided plain partitions would be enough, although there
were a couple of gotchas...

The first is that I have to manage them manually, which is not really a
problem.  I found this out when trying to clone my working env from P1
to P2: "block devices to clone must be libvirt managed storage volumes".
It is easy enough to do a dd, or I could probably use "qemu-img
convert" with P1 and P2 as the source and destination (not tested, on
my todo list).

The second was that moving from a wheezy system to a new jessie one
(although there were other possible causes in my initial misconfiguring
of "import from existing") seemed to be enough to trigger a "hardware
has changed" within the XP virtual machine, and then one too many
activations caused a "this product has been installed too many
times, use the phone..." (thankfully the MS automated phone system
still works even tho' MS no longer supports it).

> 
> > My intention is to add 2 more disks to the mirror raid, which while not
> > changing the write performance I believe will improve the read
> > performance... at least as far as I can tell, again is this assumption
> > correct?
> 
> It will improve multi-threaded reads, or multiple simultaneous
> programs' reads.  It will not improve single-threaded streaming reads.
> 

Interesting to know... and my initial observation is that the
boot-to-idle time "feels" faster.

There was one final problem that bit me, but with hindsight should have
been obvious...

My on-disk partition(s) was 120G, with raid on top, then what should
have been 4 partitions of 30G (the original raw file virtual size).  But
the 4th partition was fractionally smaller than 30G (about 29.99G)
because I forgot to take into account that the on-disk raid metadata
would take up space.  Obvious when you think about it, DOH!  But as I
only need 2 working envs, and one "this copy works and is set up how I
like it" for backup re-replication, I can live with it and use the 4th
partition as a scratch test bed :-)
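
The arithmetic behind that shortfall, as a small Python sketch.  The
reserved figure is a hypothetical example (1M for the md data offset
plus 1M before the first aligned partition), not a measurement from my
system, and it ignores the backup partition table at the end of the
device.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def last_partition_size(device_bytes, reserved_bytes, full_partitions,
                        partition_bytes):
    """Bytes left for the final partition once the reserved space and
    the full-size partitions have been taken out of the device."""
    usable = device_bytes - reserved_bytes
    return usable - full_partitions * partition_bytes


# Hypothetical overhead: md data offset plus the gap before partition 1.
reserved = 2 * MIB
left = last_partition_size(120 * GIB, reserved, 3, 30 * GIB)
print(left / GIB)
```

Whatever the exact overhead is, three full 30G partitions leave the
fourth short by that amount.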


> HTH,
> 
> Phil


