* How to format RAID1 correctly
@ 2014-09-24 0:46 Helmut Tessarek
2014-09-24 2:05 ` Helmut Tessarek
2014-09-24 2:07 ` Eric Sandeen
0 siblings, 2 replies; 12+ messages in thread
From: Helmut Tessarek @ 2014-09-24 0:46 UTC (permalink / raw)
To: xfs
The information provided in the FAQ and on several web sites is not really
useful regarding RAID1.
According to the FAQ (entry 35):
The correct options to format a RAID1 (2 disks) with a 64k chunk size are:
mkfs.xfs -d su=64k -d sw=1 /dev/mapper/data
But it also states that it would be automatically detected and used correctly,
yet
mkfs.xfs /dev/mapper/data
yields a different result:
[root@atvie01s ~]# mkfs.xfs -f /dev/mapper/data
meta-data=/dev/mapper/data       isize=256    agcount=4, agsize=244173876 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=976695504, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=476902, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Formula 1:
[root@atvie01s ~]# mkfs.xfs -f -d su=64k -d sw=1 /dev/mapper/data
meta-data=/dev/mapper/data       isize=256    agcount=32, agsize=30521728 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=976695296, imaxpct=5
         =                       sunit=16     swidth=16 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=476902, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Another inconsistency is that RAID1 doesn't use striping, so the chunk size
should be irrelevant in the first place.
So what is ultimately the correct way to format a RAID1?
Cheers,
K. C.
--
regards Helmut K. C. Tessarek
lookup http://sks.pkqs.net for KeyID 0xC11F128D
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: How to format RAID1 correctly
2014-09-24 0:46 How to format RAID1 correctly Helmut Tessarek
@ 2014-09-24 2:05 ` Helmut Tessarek
2014-09-24 2:07 ` Eric Sandeen
1 sibling, 0 replies; 12+ messages in thread
From: Helmut Tessarek @ 2014-09-24 2:05 UTC (permalink / raw)
To: xfs
On 23.09.14 20:46, Helmut Tessarek wrote:
> So what is ultimately the correct way to format a RAID1?
According to sandeen in #xfs the correct way is:
mkfs.xfs /dev/md0
Makes sense, since no striping is involved.
Just wanted to clarify and I did.
Cheers,
K. C.
--
regards Helmut K. C. Tessarek
lookup http://sks.pkqs.net for KeyID 0xC11F128D
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
* Re: How to format RAID1 correctly
2014-09-24 0:46 How to format RAID1 correctly Helmut Tessarek
2014-09-24 2:05 ` Helmut Tessarek
@ 2014-09-24 2:07 ` Eric Sandeen
2014-09-24 2:11 ` Helmut Tessarek
1 sibling, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2014-09-24 2:07 UTC (permalink / raw)
To: Helmut Tessarek, xfs
On 9/23/14 7:46 PM, Helmut Tessarek wrote:
> The information provided in the FAQ and on several web sites is not really
> useful regarding RAID1.
>
> According to the FAQ (entry 35):
> The correct options to format a RAID1 (2 disks) with a 64k chunk size are:
> mkfs.xfs -d su=64k -d sw=1 /dev/mapper/data
I don't see that text in the faq... where is it? I see:
> So if your RAID controller has a stripe size of 64KB, and you have a RAID-6 with 8 disks, use
>
> su = 64k
> sw = 6 (RAID-6 of 8 disks has 6 data disks)
but! your raid doesn't have a 64k stripe, so that doesn't apply.
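For concreteness, the FAQ's arithmetic can be sketched like this (its hypothetical 8-disk RAID-6 with a 64 KiB chunk, which does not apply to a RAID1; the command is printed, not executed):

```shell
chunk_kib=64                  # hardware RAID chunk (stripe unit) size
disks=8
parity=2                      # RAID-6 spends two disks' capacity on parity
sw=$((disks - parity))        # stripe width = number of data disks
echo "mkfs.xfs -d su=${chunk_kib}k,sw=${sw} /dev/sdX"
```

With su=64k and sw=6, the filesystem's allocation geometry matches one full 384 KiB data stripe.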
> But it also states that it would be automatically detected and used correctly,
> yet
> mkfs.xfs /dev/mapper/data
> yields a different result:
>
> [root@atvie01s ~]# mkfs.xfs -f /dev/mapper/data
>
> meta-data=/dev/mapper/data       isize=256    agcount=4, agsize=244173876 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=0        finobt=0
> data     =                       bsize=4096   blocks=976695504, imaxpct=5
>          =                       sunit=0      swidth=0 blks
no geometry because md0 raid1 doesn't export any stripe geometry.
> Formula 1:
>
> [root@atvie01s ~]# mkfs.xfs -f -d su=64k -d sw=1 /dev/mapper/data
>
> meta-data=/dev/mapper/data       isize=256    agcount=32, agsize=30521728 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=0        finobt=0
> data     =                       bsize=4096   blocks=976695296, imaxpct=5
>          =                       sunit=16     swidth=16 blks
You specified it, so mkfs obeyed.
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal log           bsize=4096   blocks=476902, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
>
> Another inconsistency is that RAID1 doesn't use striping, so the chunk size
> should be irrelevant in the first place.
agreed - but I don't see anything in the faq about raid1 stripes.
Am I missing something?
> So what is ultimately the correct way to format a RAID1?
for software md raid over individual disks, bare mkfs should do the right
thing.
-Eric
* Re: How to format RAID1 correctly
2014-09-24 2:07 ` Eric Sandeen
@ 2014-09-24 2:11 ` Helmut Tessarek
2014-09-24 2:21 ` Eric Sandeen
2014-09-24 3:05 ` stan hoeppner
0 siblings, 2 replies; 12+ messages in thread
From: Helmut Tessarek @ 2014-09-24 2:11 UTC (permalink / raw)
To: Eric Sandeen, xfs
On 23.09.14 22:07, Eric Sandeen wrote:
> but! your raid doesn't have a 64k stripe, so that doesn't apply.
Yep, that's true, but see below.
> no geometry because md0 raid1 doesn't export any stripe geometry.
[root@atvie01s ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0] sdd1[1]
      3906784064 blocks super 1.2 [2/2] [UU]
      bitmap: 0/30 pages [0KB], 65536KB chunk
So for some reason it shows a 64k chunk size even for RAID1.
That was what got me confused.
--
regards Helmut K. C. Tessarek
lookup http://sks.pkqs.net for KeyID 0xC11F128D
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
* Re: How to format RAID1 correctly
2014-09-24 2:11 ` Helmut Tessarek
@ 2014-09-24 2:21 ` Eric Sandeen
2014-09-24 2:30 ` Helmut Tessarek
2014-09-24 3:05 ` stan hoeppner
1 sibling, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2014-09-24 2:21 UTC (permalink / raw)
To: Helmut Tessarek, xfs
On 9/23/14 9:11 PM, Helmut Tessarek wrote:
> On 23.09.14 22:07, Eric Sandeen wrote:
>> but! your raid doesn't have a 64k stripe, so that doesn't apply.
>
> Yep, that's true, but see below.
>
>> no geometry because md0 raid1 doesn't export any stripe geometry.
>
> [root@atvie01s ~]# cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb1[0] sdd1[1]
> 3906784064 blocks super 1.2 [2/2] [UU]
> bitmap: 0/30 pages [0KB], 65536KB chunk
>
> So for some reason it shows a 64k chunk size even for RAID1.
>
> That was what got me confused.
Hm, 65536KB sounds like 64MB...
Anyway, mkfs.xfs picks up the queue's minimum IO size for
sunit, and optimal io size for swidth.
So:
# blockdev --getiomin --getioopt /dev/md0

(which here returns:
512
0
)
will show you what your raid's queue is actually reporting,
and what mkfs.xfs will pick up.
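The arithmetic behind that pickup can be sketched as follows. The 64 KiB queue values are hypothetical, as if for a striped device; a RAID1 md device typically reports ioopt=0, i.e. no stripe geometry:

```shell
iomin=65536               # minimum_io_size as reported by the queue
ioopt=65536               # optimal_io_size as reported by the queue
bsize=4096                # XFS data block size
sunit=$((iomin / bsize))  # stripe unit in filesystem blocks
swidth=$((ioopt / bsize)) # stripe width in filesystem blocks
echo "sunit=${sunit} swidth=${swidth} blks"
```

An ioopt of 0, as on this RAID1, leaves mkfs.xfs with sunit=0 swidth=0, exactly what the first bare mkfs run showed.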
-Eric
* Re: How to format RAID1 correctly
2014-09-24 2:21 ` Eric Sandeen
@ 2014-09-24 2:30 ` Helmut Tessarek
0 siblings, 0 replies; 12+ messages in thread
From: Helmut Tessarek @ 2014-09-24 2:30 UTC (permalink / raw)
To: Eric Sandeen, xfs
On 23.09.14 22:21, Eric Sandeen wrote:
> blkid # blockdev --getiomin --getioopt /dev/md0
>
> (which here returns:
> 512
> 0
> here)
>
> will show you what your raid's queue is actually reporting,
> and what mkfs.xfs will pick up.
[root@atvie01s ~]# blockdev --getiomin --getioopt /dev/md0
4096
0
--
regards Helmut K. C. Tessarek
lookup http://sks.pkqs.net for KeyID 0xC11F128D
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
* Re: How to format RAID1 correctly
2014-09-24 2:11 ` Helmut Tessarek
2014-09-24 2:21 ` Eric Sandeen
@ 2014-09-24 3:05 ` stan hoeppner
2014-09-24 3:15 ` Helmut Tessarek
1 sibling, 1 reply; 12+ messages in thread
From: stan hoeppner @ 2014-09-24 3:05 UTC (permalink / raw)
To: Helmut Tessarek, Eric Sandeen, xfs
On 09/23/2014 09:11 PM, Helmut Tessarek wrote:
> On 23.09.14 22:07, Eric Sandeen wrote:
>> but! your raid doesn't have a 64k stripe, so that doesn't apply.
>
> Yep, that's true, but see below.
>
>> no geometry because md0 raid1 doesn't export any stripe geometry.
>
> [root@atvie01s ~]# cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb1[0] sdd1[1]
> 3906784064 blocks super 1.2 [2/2] [UU]
> bitmap: 0/30 pages [0KB], 65536KB chunk
>
> So for some reason it shows a 64k chunk size even for RAID1.
>
> That was what got me confused.
It confuses many people who are new to md RAID1. The above is the
*bitmap* chunk size, not the array chunk size. There is no array chunk
size for RAID1 as there is no striping. You must have striping to have
chunks. With md RAID1 every 4KB page write is simply mirrored to each
physical disk.
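A quick way to keep the two figures apart (the mdadm invocations are shown as comments since they need the live devices; the arithmetic just converts the /proc/mdstat figure):

```shell
# Array chunk size (striped levels only; RAID1 prints none):
#   mdadm --detail /dev/md0 | grep -i 'chunk size'
# Bitmap chunk size (the figure seen in /proc/mdstat):
#   mdadm --examine-bitmap /dev/sdb1 | grep -i chunksize
bitmap_chunk_kb=65536
echo "bitmap chunk = $((bitmap_chunk_kb / 1024)) MB"  # 64 MB, not a 64 KB stripe
```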
Cheers,
Stan
* Re: How to format RAID1 correctly
2014-09-24 3:05 ` stan hoeppner
@ 2014-09-24 3:15 ` Helmut Tessarek
2014-09-24 4:09 ` Stan Hoeppner
0 siblings, 1 reply; 12+ messages in thread
From: Helmut Tessarek @ 2014-09-24 3:15 UTC (permalink / raw)
To: stan hoeppner, Eric Sandeen, xfs
On 23.09.14 23:05, stan hoeppner wrote:
> It confuses many people who are new to md RAID1. The above is the
> *bitmap* chunk size, not the array chunk size. There is no array chunk
> size for RAID1 as there is no striping. You must have striping to have
> chunks. With md RAID1 every 4KB page write is simply mirrored to each
> physical disk.
Thanks for the info. I'm used to big ass storage subsystems, but new to SW
RAID. It seems I have some catching up to do.
--
regards Helmut K. C. Tessarek
lookup http://sks.pkqs.net for KeyID 0xC11F128D
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
* Re: How to format RAID1 correctly
2014-09-24 3:15 ` Helmut Tessarek
@ 2014-09-24 4:09 ` Stan Hoeppner
2014-09-24 15:53 ` Helmut Tessarek
0 siblings, 1 reply; 12+ messages in thread
From: Stan Hoeppner @ 2014-09-24 4:09 UTC (permalink / raw)
To: Helmut Tessarek, Eric Sandeen, xfs
On 09/23/2014 10:15 PM, Helmut Tessarek wrote:
> On 23.09.14 23:05, stan hoeppner wrote:
>> It confuses many people who are new to md RAID1. The above is the
>> *bitmap* chunk size, not the array chunk size. There is no array chunk
>> size for RAID1 as there is no striping. You must have striping to have
>> chunks. With md RAID1 every 4KB page write is simply mirrored to each
>> physical disk.
>
> Thanks for the info. I'm used to big ass storage subsystems, but new to SW
> RAID. It seems I have some catching up to do.
If you create any striped arrays, especially parity arrays, with md make
sure to manually specify chunk size and match it to your workload. The
current default is 512KB. This is too large for a great many workloads,
specifically those that are metadata heavy or manipulate many small
files. 512KB wastes space and with parity arrays causes RMW, hammering
throughput and increasing latency.
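As a sketch (hypothetical device names; the command is printed rather than run, since creating an array destroys existing data), picking the chunk size at creation time looks like:

```shell
chunk=64    # KiB; matched to the workload instead of md's 512 KiB default
echo mdadm --create /dev/md1 --level=5 --raid-devices=4 \
    --chunk="${chunk}" /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
```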
Cheers,
Stan
* Re: How to format RAID1 correctly
2014-09-24 4:09 ` Stan Hoeppner
@ 2014-09-24 15:53 ` Helmut Tessarek
2014-09-24 16:18 ` Eric Sandeen
2014-09-24 19:06 ` Stan Hoeppner
0 siblings, 2 replies; 12+ messages in thread
From: Helmut Tessarek @ 2014-09-24 15:53 UTC (permalink / raw)
To: Stan Hoeppner, Eric Sandeen, xfs
On 2014-09-24 0:09, Stan Hoeppner wrote:
> If you create any striped arrays, especially parity arrays, with md make
> sure to manually specify chunk size and match it to your workload. The
> current default is 512KB. This is too large for a great many workloads,
> specifically those that are metadata heavy or manipulate many small
> files. 512KB wastes space and with parity arrays causes RMW, hammering
> throughput and increasing latency.
Thanks again for the valuable information.
I used to work with databases on storage subsystems, so placing GBs of
database containers for tablespaces on arrays with a larger stripe size
was actually beneficial.
For log files and other data I usually used different cache settings and
strip sizes.
So how does this work with SW RAID?
Does the XFS chunk size equal the amount of data touched by a single r/w
operation?
I'm asking because data is usually written in page/extent sizes for
databases. Even if I have a container with 50GB, I might only have to
read/write a 4k page.
Cheers,
K. C.
--
regards Helmut K. C. Tessarek
lookup http://sks.pkqs.net for KeyID 0xC11F128D
/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
* Re: How to format RAID1 correctly
2014-09-24 15:53 ` Helmut Tessarek
@ 2014-09-24 16:18 ` Eric Sandeen
2014-09-24 19:06 ` Stan Hoeppner
1 sibling, 0 replies; 12+ messages in thread
From: Eric Sandeen @ 2014-09-24 16:18 UTC (permalink / raw)
To: Helmut Tessarek, Stan Hoeppner, xfs
On 9/24/14 10:53 AM, Helmut Tessarek wrote:
> On 2014-09-24 0:09, Stan Hoeppner wrote:
>> If you create any striped arrays, especially parity arrays, with md make
>> sure to manually specify chunk size and match it to your workload. The
>> current default is 512KB. This is too large for a great many workloads,
>> specifically those that are metadata heavy or manipulate many small
>> files. 512KB wastes space and with parity arrays causes RMW, hammering
>> throughput and increasing latency.
>
> Thanks again for the valuable information.
>
> I used to work with databases on storage subsystems, so placing GBs of
> database containers for tablespaces on arrays with a larger stripe size
> was actually beneficial.
> For log files and other data I usually used different cache settings and
> strip sizes.
>
> So how does this work with SW RAID?
>
> Does the XFS chunk size equal the amount of data touched by a single r/w
> operation?
It has more to do with where allocations start, so that allocations
don't cross stripe boundaries if possible.
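A toy sketch of that point, assuming a hypothetical 64 KiB stripe unit: an allocation that starts on a stripe-unit boundary keeps a chunk-sized write inside one chunk, while an unaligned start straddles two:

```shell
su=$((64 * 1024))                 # stripe unit in bytes (hypothetical)
io=$((64 * 1024))                 # one chunk-sized write
aligned_start=$((2 * su))         # allocation begins on a su boundary
unaligned_start=$((2 * su + 4096))
spans() { echo $(( ($2 / su) - ($1 / su) + 1 )); }   # chunks touched by [start, end]
spans "$aligned_start"   "$((aligned_start + io - 1))"     # prints 1
spans "$unaligned_start" "$((unaligned_start + io - 1))"   # prints 2
```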
-Eric
* Re: How to format RAID1 correctly
2014-09-24 15:53 ` Helmut Tessarek
2014-09-24 16:18 ` Eric Sandeen
@ 2014-09-24 19:06 ` Stan Hoeppner
1 sibling, 0 replies; 12+ messages in thread
From: Stan Hoeppner @ 2014-09-24 19:06 UTC (permalink / raw)
To: Helmut Tessarek, Eric Sandeen, xfs
On 09/24/2014 10:53 AM, Helmut Tessarek wrote:
> On 2014-09-24 0:09, Stan Hoeppner wrote:
>> If you create any striped arrays, especially parity arrays, with md make
>> sure to manually specify chunk size and match it to your workload. The
>> current default is 512KB. This is too large for a great many workloads,
>> specifically those that are metadata heavy or manipulate many small
>> files. 512KB wastes space and with parity arrays causes RMW, hammering
>> throughput and increasing latency.
>
> Thanks again for the valuable information.
>
> I used to work with databases on storage subsystems, so placing GBs of
> database containers for tablespaces on arrays with a larger stripe size
> was actually beneficial.
> For log files and other data I usually used different cache settings and
> strip sizes.
>
> So how does this work with SW RAID?
>
> Does the XFS chunk size equal the amount of data touched by a single r/w
> operation?
No. The XFS stripe unit size (chunk size) must/should equal the
underlying RAID stripe unit size (chunk). As Eric said, all this does
is help XFS align allocations to the underlying RAID geometry in an
attempt to get more full chunk/stripe writes on the disks.
Note we both said allocation. The XFS sunit/swidth settings have no effect on file appends or writes into preallocated files.
> I'm asking because data is usually written in page/extent sizes for
> databases. Even if I have a container with 50GB, I might only have to
> read/write a 4k page.
db operations are going to be append or modify-in-place. Neither will
benefit from aligning XFS to the RAID geometry. Appending the db logs
is also not allocation. So in practical terms, you simply would not use
the "-d" striping options during mkfs.xfs. Use the defaults.
Cheers,
Stan