From: Dave Chinner <david@fromorbit.com>
To: Ingo Jürgensmann
Cc: xfs@oss.sgi.com
Date: Sun, 24 Jun 2012 09:44:45 +1000
Subject: Re: mkfs.xfs states log stripe unit is too large

On Sat, Jun 23, 2012 at 02:50:49PM +0200, Ingo Jürgensmann wrote:
> muaddib:~# cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md7 : active raid5 sdf4[3] sdd4[1] sde4[0]
>       7811261440 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
.....
> The RAID devices /dev/md0 to /dev/md4 are on my old 3x 1 TB
> Seagate disks. Anyway, to finally come to the problem, when I try
> to create a filesystem on the new RAID5 I get the following:
>
> muaddib:~# mkfs.xfs /dev/lv/usr
> log stripe unit (524288 bytes) is too large (maximum is 256KiB)
> log stripe unit adjusted to 32KiB
> meta-data=/dev/lv/usr            isize=256    agcount=16, agsize=327552 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=5240832, imaxpct=25
>          =                       sunit=128    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> As you can see, I followed the "mkfs.xfs knows best, don't fiddle
> around with options unless you know what you're doing!" advice.
> But apparently mkfs.xfs wanted to create a log stripe unit of 512
> kiB, most likely because it's the same chunk size as the
> underlying RAID device.

Exactly. The best thing, in general, is to align all log writes to
the underlying stripe unit of the array. That way, as multiple
frequent log writes occur, they are guaranteed to form full stripe
writes and have essentially no RMW overhead. 32k is chosen by
default because that's the default log buffer size and hence the
typical size of log writes.

If you increase the log stripe unit, you also increase the minimum
log buffer size that the filesystem supports. The filesystem can
support up to 256k log buffers, and that is where the limit on
maximum log stripe alignment comes from.

> The problem seems to be related to RAID5, because when I try to
> make a filesystem on /dev/md6 (RAID1), there's no error message:

RAID1 devices don't have a stripe unit/stripe width, so no
alignment is needed or configured.

> So, the question is:
> - is this a bug somewhere in XFS, LVM or Linux's software RAID
>   implementation?

Not a bug at all.

> - will performance suffer from log stripe size adjusted to just 32
>   kiB? Some of my logical volumes will just store data, but one or
>   the other will have some workload acting as storage for BackupPC.

For data volumes, no. For BackupPC, it depends on whether the MD
RAID stripe cache can turn all the sequential log writes into a
full stripe write. In general, this is not a problem, and it is
almost never a problem for HW RAID with BBWC....

> - would it be worth the effort to raise the log stripe to at least
>   256 kiB?

Depends on your workload. If it is fsync heavy, I'd advise against
it, as every log write will be padded out to 256k, even if you only
write 500 bytes worth of transaction data....
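If you want to test it anyway, it's a two-step thing: the log
stripe unit is set at mkfs time, and the log buffer size at mount
time. Roughly like this - a sketch, not a recommendation, and I'm
assuming that LV really is mounted at /usr:

  mkfs.xfs -l su=256k /dev/lv/usr            # 256k log stripe unit
  mount -o logbsize=256k /dev/lv/usr /usr    # matching 256k log buffers

The logbsize mount option matters because, as above, a 256k log
stripe unit raises the minimum supported log buffer size to 256k.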
> - or would it be better to run with an external log on the old
>   1 TB RAID?

External logs provide much less benefit with delayed logging than
they used to. As it is, your external log needs to have the same
reliability characteristics as the main volume - lose the log,
corrupt the filesystem. Hence for RAID5 volumes you need a RAID1
log, and for RAID6 you need either RAID6 or a 3-way mirror to
provide the same reliability....

> End note: the 4 TB disks are not yet "in production", so I can run
> tests with both the RAID setup as well as mkfs.xfs. Reshaping the
> RAID will take up to 10 hours, though...

IMO, RAID reshaping is just a bad idea - it changes the alignment
characteristics of the volume, so everything that the filesystem
laid down in an aligned fashion is now unaligned, and you have to
tell the filesystem the new alignment before new files will be
correctly aligned. Also, it's usually faster to back up, recreate
and restore than to reshape, and that puts a lot less load on your
disks, too...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com