* Alignment size?
From: Michael Tokarev @ 2010-08-12 22:10 UTC
To: xfs

Hello.

I used XFS for a long time on many different
servers, and it works well. But now I encountered
an.. unexpected problem.

The question is: on one of our servers, XFS requires
different alignment size for O_DIRECT operations than
on others. Usually it's 512 bytes, but on this server
it is 4096 - both min_io and alignment (this is from
XFS_IOC_DIOINFO ioctl).

I'm not sure what the reason for this is.
On this server, the underlying block device is raid5
(linux sw raid), but we had other machines with raid5
which didn't have that alignment requirement.

The problem with that is that Oracle db, which we use
with XFS a lot, refuses to work on this machine, or,
rather, XFS refuses to process I/O in 512-byte chunks
from oracle (control files and redolog files).

I know it is a frequent combination which is used in
production in many places, and is used here a lot too,
but I haven't seen anyone mentioning this issue we have
now, with "larger than usual" alignment size requirements.

Is there a way to remedy this somehow, without
reformatting whole 600+ gb?

Thank you!

/mjt

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Alignment size?
From: Dave Chinner @ 2010-08-12 23:49 UTC
To: Michael Tokarev; +Cc: xfs

On Fri, Aug 13, 2010 at 02:10:39AM +0400, Michael Tokarev wrote:
> Hello.
>
> I used XFS for a long time on many different
> servers, and it works well. But now I encountered
> an.. unexpected problem.
>
> The question is: on one of our servers, XFS requires
> different alignment size for O_DIRECT operations than
> on others. Usually it's 512 bytes, but on this server
> it is 4096 - both min_io and alignment (this is from
> XFS_IOC_DIOINFO ioctl).

It'll be a filesystem set up with a 4k sector size, then. Check the
output of xfs_info.

> I'm not sure what the reason for this is.
> On this server, the underlying block device is raid5
> (linux sw raid), but we had other machines with raid5
> which didn't have that alignment requirement.
>
> The problem with that is that Oracle db, which we use
> with XFS a lot, refuses to work on this machine, or,
> rather, XFS refuses to process I/O in 512-byte chunks
> from oracle (control files and redolog files).

A clear case of application failure. I guess Oracle have some work
to do to support 4k sector drives where they won't be able to do 512
byte direct IOs at all....

> Is there a way to remedy this somehow, without
> reformatting whole 600+ gb?

Not really. If it is 4k sector size, then there is some extremely
dangerous voodoo that you could do to realign and resize the AG
headers, followed by a full xfs_repair run to fix up all the block
accounting. This is not something I'd recommend anyone ever does,
and for only 600GB of data it would probably take more time to work
out how to do it correctly (using disposable filesystem images) than
it would to dump, mkfs and restore...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
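
The constraint described in this reply can be sketched in C. This is only an illustration, not XFS code: the helper name `dio_request_ok` is hypothetical, and its `mem_align` and `min_io` parameters stand in for the `d_mem` and `d_miniosz` values that XFS_IOC_DIOINFO reports. It assumes the usual O_DIRECT rule that the buffer address must be a multiple of the memory alignment, and that the file offset and length must be multiples of the minimum I/O size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a direct I/O request satisfies
 * the given constraints. mem_align and min_io stand in for d_mem and
 * d_miniosz from XFS_IOC_DIOINFO; both are assumed to be non-zero.
 */
bool dio_request_ok(const void *buf, uint64_t offset, size_t len,
                    uint32_t mem_align, uint32_t min_io)
{
    if ((uintptr_t)buf % mem_align != 0)
        return false;   /* buffer address not aligned */
    if (offset % min_io != 0)
        return false;   /* file offset not aligned */
    if (len == 0 || len % min_io != 0)
        return false;   /* length not a multiple of the minimum I/O size */
    return true;
}
```

With the numbers from this thread, a 512-byte request checked against 4096 for both values fails the test, which is consistent with the 512-byte control file and redo log I/O being refused.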

* Re: Alignment size?
From: Michael Tokarev @ 2010-08-13 6:24 UTC
To: Dave Chinner; +Cc: xfs

13.08.2010 03:49, Dave Chinner wrote:
> On Fri, Aug 13, 2010 at 02:10:39AM +0400, Michael Tokarev wrote:
>> Hello.
>>
>> I used XFS for a long time on many different
>> servers, and it works well. But now I encountered
>> an.. unexpected problem.
>>
>> The question is: on one of our servers, XFS requires
>> different alignment size for O_DIRECT operations than
>> on others. Usually it's 512 bytes, but on this server
>> it is 4096 - both min_io and alignment (this is from
>> XFS_IOC_DIOINFO ioctl).
>
> It'll be a filesystem set up with a 4k sector size, then. Check the
> output of xfs_info.

yes, xfs_info reports sectsz=4096, I noticed this yesterday.

>> I'm not sure what the reason for this is.
>> On this server, the underlying block device is raid5
>> (linux sw raid), but we had other machines with raid5
>> which didn't have that alignment requirement.
>>
>> The problem with that is that Oracle db, which we use
>> with XFS a lot, refuses to work on this machine, or,
>> rather, XFS refuses to process I/O in 512-byte chunks
>> from oracle (control files and redolog files).
>
> A clear case of application failure. I guess Oracle have some work
> to do to support 4k sector drives where they won't be able to do 512
> byte direct IOs at all....

Sure thing, that's oracle10, and at least at that time there
was no way to determine the size of I/O in a generic way.
Now there is, and I hope in oracle12 there will be support
for various different sectors. But this is not the point...

>> Is there a way to remedy this somehow, without
>> reformatting whole 600+ gb?
>
> Not really. If it is 4k sector size, then there is some extremely
> dangerous voodoo that you could do to realign and resize the AG
> headers, followed by a full xfs_repair run to fix up all the block
> accounting. This is not something I'd recommend anyone ever does,
> and for only 600GB of data it would probably take more time to work
> out how to do it correctly (using disposable filesystem images) than
> it would to dump, mkfs and restore...

Ugh. I see. Well, I was afraid of that, but I'm already
sorta-prepared for that, after "sleeping with this idea"... ;)
It'll take ages for sure, but there's no other choice for now.

So the question that remains is: why?

It's an old machine (PIV era), with old scsi disks (74Gb
non-hotswap), -- the same disks as used on numerous other
machines out there, where there's no such issue. Plain
old linux software raid array, also as used on many other
systems. At that time, everything was in 512 bytes for sure.
The array and filesystem were re-created last year (we
added another drive to it), but I don't think at that
time there was a kernel version that supported >512
sector sizes either (it was 2.6.27 I think).

So why did xfs decide the block size is 4K??

And a related question, -- is there a way to create
xfs fs with the right sector size? The filesystem
has been ok for years, not only on this machine, and I'm
quite afraid to replace it with something else (e.g.
ext4) in a hurry without good prior testing.

By the way, how can one check the "sector size" of a
block device nowadays? I think I saw something about
sysfs, but I see nothing of that sort in 2.6.32 kernel
(which is used on this and other systems).

Thanks!

/mjt

* Re: Alignment size?
From: Stan Hoeppner @ 2010-08-13 10:27 UTC
To: xfs

Michael Tokarev put forth on 8/13/2010 1:24 AM:

> So the question that remains is: why?

4096 is the default block size and has been since at least 2.6.26 when I
started using XFS. From "man mkfs.xfs":

OPTIONS
       -b block_size_options
              This option specifies the fundamental block size of the
              filesystem. The valid block_size_options are: log=value or
              size=value and only one can be supplied. The block size is
              specified either as a base two logarithm value with log=,
              or in bytes with size=. The default value is 4096 bytes
              (4 KiB), the minimum is 512, and the maximum is 65536
              (64 KiB). XFS on Linux currently only supports pagesize
              or smaller blocks.

> So why did xfs decide the block size is 4K??

See above. It's the default. Dave, Eric, Alex and others may be able to
explain why 4096 was chosen as the default. I'm guessing it has to do with
the best all around performance across a wide variety of storage systems.

> And a related question, -- is there a way to create
> xfs fs with the right sector size?

Yes.

       -s sector_size
              This option specifies the fundamental sector size of the
              filesystem. The sector_size is specified either as a value
              in bytes with size=value or as a base two logarithm value
              with log=value. The default sector_size is 512 bytes. The
              minimum value for sector size is 512; the maximum is 32768
              (32 KiB). The sector_size must be a power of 2 size and
              cannot be made larger than the filesystem block size.

Note that the default is 512. This would lead me to believe that whoever
created this 600GB XFS filesystem manually specified "-s 4096" on the
command line when creating it.

> The filesystem
> has been ok for years, not only on this machine, and I'm
> quite afraid to replace it with something else (e.g.
> ext4) in a hurry without good prior testing.
>
> By the way, how can one check the "sector size" of a
> block device nowadays?

cat /sys/block/[device]/queue/hw_sector_size

That will give you the hardware sector size. As mentioned above, the XFS
sector size can be manually specified during FS creation. Thus they may not
match, which is likely the case with the 600GB FS you're having the
problems with.

--
Stan
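
The sector size limits in the man page excerpt above can be written down as a small check. This is a sketch built only from the quoted text (power of two, 512 to 32768 bytes, no larger than the block size); the function names are hypothetical and this is not how mkfs.xfs itself is implemented.

```c
#include <stdbool.h>

/*
 * Hypothetical check mirroring the limits in the mkfs.xfs man page
 * excerpt: a power of two, between 512 and 32768 bytes, and no larger
 * than the filesystem block size.
 */
bool sector_size_valid(unsigned int sectsz, unsigned int blocksz)
{
    if (sectsz < 512 || sectsz > 32768)
        return false;
    if (sectsz & (sectsz - 1))
        return false;   /* not a power of two */
    return sectsz <= blocksz;
}

/*
 * "-s log=N" is the base two logarithm form of "-s size=(1 << N)".
 * log_value is assumed to be smaller than the bit width of unsigned int.
 */
unsigned int sector_size_from_log(unsigned int log_value)
{
    return 1u << log_value;
}
```

Under these rules both 512 and 4096 are acceptable sector sizes for the default 4096-byte block size, so the man page limits alone do not explain which of the two a given filesystem ends up with.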

* Re: Alignment size?
From: Michael Tokarev @ 2010-08-13 11:00 UTC
To: Stan Hoeppner; +Cc: xfs

13.08.2010 14:27, Stan Hoeppner wrote:
> Michael Tokarev put forth on 8/13/2010 1:24 AM:
>
>> So the question that remains is: why?
>
> 4096 is the default block size and has been since at least 2.6.26 when I
> started using XFS. From "man mkfs.xfs":
>
> OPTIONS
> -b block_size_options

This is block size. But XFS_IOC_DIOINFO returns _sector_
size. All other XFS filesystems we have are made with
the same 4096 _block_ size.

> [..] The default value is 4096 bytes (4 KiB)
>
>> So why did xfs decide the block size is 4K??

That was the wrong question. The right one is about
_sector_ size, not _block_ size. The filesystem in
question has _sector_ size =4096, all the rest have
it =512.

>> And a related question, -- is there a way to create
>> xfs fs with the right sector size?
>
> Yes.
>
> -s sector_size
> This option specifies the fundamental sector size of the filesystem. The
> sector_size is specified either as a value in bytes with size=value or as a
> base two logarithm value with log=value. The default sector_size is 512

Yeah, the default is 512, my manpage agrees. But yet
I've a filesystem that has it =4096... But maybe it
was specified during filesystem creation. I re-read the
mkfs.xfs manpage yesterday, but somehow missed the sector
size option (!), which you quoted above. Maybe we used
it a year ago when creating the filesystem, for a yet to be
determined reason... ;)

I just tried to create an xfs filesystem on this machine
(on a small reserved partition) - it uses sector size = 512
as expected.

>> By the way, how can one check the "sector size" of a
>> block device nowadays?
>
> cat /sys/block/[device]/queue/hw_sector_size

And it shows 512 even for the md array in question.

> That will give you the hardware sector size. As mentioned above, the XFS
> sector size can be manually specified during FS creation. Thus they may not
> match, which is likely the case with the 600GB FS you're having the
> problems with.

Yup!

Thank you all for the information, and please excuse me
for the noise - just too much stuff at once... ;)

/mjt

* Re: Alignment size?
From: Roger Willcocks @ 2010-08-13 11:36 UTC
To: Michael Tokarev; +Cc: xfs

On Fri, 2010-08-13 at 10:24 +0400, Michael Tokarev wrote:

> So why did xfs decide the block size is 4K??

We had a similar problem with direct io here, with 2.6.9; I quote from
the Bugzilla:

"mkfs.xfs has md-specific code (!) that looks at the raid flavour to
figure stripe parameters, alignment requirements, etc.

"Raid flavours 4,5 and 6 force the alignment to be the same as the file
system block size (which is 4096 bytes)."

Here's a program to test the alignment requirements:

----
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>    /* struct dioattr, XFS_IOC_DIOINFO */

int main(int argc, char *argv[])
{
    struct dioattr dio;
    const char *path = (argc == 2) ? argv[1] : "/mnt/disk1";
    int tfd = open(path, O_RDONLY);

    if (tfd < 0) {
        perror("open");
        return 1;
    }

    /* ask XFS for the direct I/O constraints of this file */
    if (ioctl(tfd, XFS_IOC_DIOINFO, &dio) < 0)
        perror("ioctl");
    else {
        printf("min io size = %u\n", dio.d_miniosz);
        printf("max io size = %u\n", dio.d_maxiosz);
        printf("align = %u\n", dio.d_mem);
    }

    close(tfd);
    return 0;
}
----

The same disk set returned 'align = 4096' for raid 5, but 'align = 512'
for raid 0.

--
Roger
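
Once the values from XFS_IOC_DIOINFO are known, an application can size and align its buffers to match. The following is a minimal sketch under that assumption; `dio_round_up` and `dio_alloc` are hypothetical helper names, and `mem_align` and `min_io` stand in for the reported `d_mem` and `d_miniosz`. It uses the standard `posix_memalign` call and does not perform any I/O itself.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdlib.h>

/* Round len up to the next multiple of min_io (min_io must be non-zero). */
size_t dio_round_up(size_t len, size_t min_io)
{
    return (len + min_io - 1) / min_io * min_io;
}

/*
 * Hypothetical helper: allocate a buffer suitable for direct I/O.
 * mem_align and min_io stand in for d_mem and d_miniosz. On success the
 * padded size is stored in *out_len and the caller releases the buffer
 * with free(). Returns NULL on failure.
 */
void *dio_alloc(size_t len, size_t mem_align, size_t min_io, size_t *out_len)
{
    void *buf = NULL;
    size_t padded = dio_round_up(len, min_io);

    /* posix_memalign needs a power-of-two multiple of sizeof(void *) */
    if (posix_memalign(&buf, mem_align, padded) != 0)
        return NULL;
    *out_len = padded;
    return buf;
}
```

An application written this way would issue a 4096-byte request where it previously issued a 512-byte one on the filesystem discussed here, instead of assuming 512 everywhere.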

* Re: Alignment size?
From: Dave Chinner @ 2010-08-13 11:39 UTC
To: Michael Tokarev; +Cc: xfs

On Fri, Aug 13, 2010 at 10:24:46AM +0400, Michael Tokarev wrote:
> 13.08.2010 03:49, Dave Chinner wrote:
> > On Fri, Aug 13, 2010 at 02:10:39AM +0400, Michael Tokarev wrote:
> >> Hello.
> >>
> >> I used XFS for a long time on many different
> >> servers, and it works well. But now I encountered
> >> an.. unexpected problem.
> >>
> >> The question is: on one of our servers, XFS requires
> >> different alignment size for O_DIRECT operations than
> >> on others. Usually it's 512 bytes, but on this server
> >> it is 4096 - both min_io and alignment (this is from
> >> XFS_IOC_DIOINFO ioctl).
> >
> > It'll be a filesystem set up with a 4k sector size, then. Check the
> > output of xfs_info.
>
> yes, xfs_info reports sectsz=4096, I noticed this yesterday.

....

> So the question that remains is: why?
>
> It's an old machine (PIV era), with old scsi disks (74Gb
> non-hotswap), -- the same disks as used on numerous other
> machines out there, where there's no such issue.

If the software was as old as the machine, then that's the likely
reason. The old md raid5 implementation did not handle sub-page size
aligned IO very well - a change of IO alignment would cause the
stripe cache to be purged and cause performance to be terrible.
Hence every time XFS wrote the superblock or an AG header it would
purge the stripe cache.

The workaround old versions of mkfs.xfs used was to create the fs
with a sector size of 4k when it detected md raid5 underneath it so
the sb and ag headers were all 4k aligned and sized, just like the
rest of the filesystem....

> And a related question, -- is there a way to create
> xfs fs with the right sector size? The filesystem
> has been ok for years, not only on this machine, and I'm
> quite afraid to replace it with something else (e.g.
> ext4) in a hurry without good prior testing.

# mkfs.xfs -s <size> ....

if you want to set it manually. You shouldn't need to with any
relatively recent mkfs.xfs...

> By the way, how can one check the "sector size" of a
> block device nowadays? I think I saw something about
> sysfs, but I see nothing of that sort in 2.6.32 kernel
> (which is used on this and other systems).

/sys/block/<dev>/queue/hw_sector_size

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
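
For scripts or programs that want this value without shelling out to cat, the sysfs file can be read directly. A minimal sketch follows; the function name `read_sysfs_value` is hypothetical, and the only assumption is that the file holds a single decimal number followed by a newline, which is what the cat output in this thread suggests.

```c
#include <stdio.h>

/*
 * Read a single decimal value such as the contents of
 * /sys/block/<dev>/queue/hw_sector_size. Returns the value, or -1 if
 * the file cannot be opened or does not start with a number.
 */
long read_sysfs_value(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A caller would pass the full path for the device of interest, for example a path ending in `md0/queue/hw_sector_size` (the device name md0 is only an example, not taken from the thread).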

* Re: Alignment size?
From: Christoph Hellwig @ 2010-08-13 15:15 UTC
To: Dave Chinner; +Cc: Michael Tokarev, xfs

On Fri, Aug 13, 2010 at 09:39:15PM +1000, Dave Chinner wrote:
> The workaround old versions of mkfs.xfs used was to create the fs
> with a sector size of 4k when it detected md raid5 underneath it so
> the sb and ag headers were all 4k aligned and sized, just like the
> rest of the filesystem....

That workaround is still in the latest mkfs.xfs if you build against the
internal libdisk instead of libblkid. And that's the case at least
for Debian, and probably a few other distributions as well.
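
The workaround discussed in the messages above can be summarised as a tiny decision function. This is a sketch of the behaviour as described in this thread (md raid levels 4, 5 and 6 force the sector size to the filesystem block size), not the actual mkfs.xfs or libdisk source; both function names are hypothetical.

```c
#include <stdbool.h>

/* md RAID levels with parity, per the Bugzilla text quoted in the thread */
bool md_level_has_parity(int level)
{
    return level == 4 || level == 5 || level == 6;
}

/*
 * Hypothetical sketch of the old workaround: on parity RAID the sector
 * size follows the filesystem block size; otherwise the device's own
 * sector size is used.
 */
unsigned int legacy_sector_size(int md_level, unsigned int dev_sectsz,
                                unsigned int fs_blocksz)
{
    return md_level_has_parity(md_level) ? fs_blocksz : dev_sectsz;
}
```

This matches the observation reported earlier for one disk set: 4096 on raid 5 and 512 on raid 0, with a 4096-byte block size and 512-byte device sectors.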

* Re: Alignment size?
From: Michael Tokarev @ 2010-08-17 0:18 UTC
To: Dave Chinner; +Cc: xfs

13.08.2010 15:39, Dave Chinner wrote:
> On Fri, Aug 13, 2010 at 10:24:46AM +0400, Michael Tokarev wrote:
[]
>> And a related question, -- is there a way to create
>> xfs fs with the right sector size? The filesystem
>> has been ok for years, not only on this machine, and I'm
>> quite afraid to replace it with something else (e.g.
>> ext4) in a hurry without good prior testing.
>
> # mkfs.xfs -s <size> ....
>
> if you want to set it manually. You shouldn't need to with any
> relatively recent mkfs.xfs...

Um. It appears that mkfs.xfs ignores -s size=512 on this
raid5 array, and silently creates a filesystem with 4096
sector size, regardless of various -s size=nn and -s log=mm
options.

This is xfsprogs 3.1.2-1 (debian squeeze package).

So the question stands...

Thanks!

/mjt

* Re: Alignment size?
From: Michael Tokarev @ 2010-08-17 0:30 UTC
To: Dave Chinner; +Cc: xfs

17.08.2010 04:18, Michael Tokarev wrote:
>> # mkfs.xfs -s <size> ....
>>
>> if you want to set it manually. You shouldn't need to with any
>> relatively recent mkfs.xfs...
>
> Um. It appears that mkfs.xfs ignores -s size=512 on this
> raid5 array, and silently creates a filesystem with 4096
> sector size, regardless of various -s size=nn and -s log=mm
> options.
>
> This is xfsprogs 3.1.2-1 (debian squeeze package).
>
> So the question stands...

Debian builds it with the internal libdisk. Rebuilding it
with libblkid fixes that.

Thanks!

/mjt

* Re: Alignment size?
From: Dave Chinner @ 2010-08-17 0:31 UTC
To: Michael Tokarev; +Cc: xfs

On Tue, Aug 17, 2010 at 04:18:28AM +0400, Michael Tokarev wrote:
> 13.08.2010 15:39, Dave Chinner wrote:
> > On Fri, Aug 13, 2010 at 10:24:46AM +0400, Michael Tokarev wrote:
> []
> >> And a related question, -- is there a way to create
> >> xfs fs with the right sector size? The filesystem
> >> has been ok for years, not only on this machine, and I'm
> >> quite afraid to replace it with something else (e.g.
> >> ext4) in a hurry without good prior testing.
> >
> > # mkfs.xfs -s <size> ....
> >
> > if you want to set it manually. You shouldn't need to with any
> > relatively recent mkfs.xfs...
>
> Um. It appears that mkfs.xfs ignores -s size=512 on this
> raid5 array, and silently creates a filesystem with 4096
> sector size, regardless of various -s size=nn and -s log=mm
> options.
>
> This is xfsprogs 3.1.2-1 (debian squeeze package).
>
> So the question stands...

IIRC, the current debian xfsprogs package is still being built with
the old detection library. If you install libblkid and build
xfsprogs yourself, it should use the newer detection code and behave
as expected.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com