* Re: xfs: very slow after mount, very slow at umount
From: Dave Chinner @ 2011-01-27 3:30 UTC
To: Mark Lord; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder

[Please cc xfs@oss.sgi.com on XFS bug reports. Added.]

On Wed, Jan 26, 2011 at 08:22:25PM -0500, Mark Lord wrote:
> Alex / Christoph,
>
> My mythtv box here uses XFS on a 2TB drive for storing recordings and videos.
> It is behaving rather strangely though, and has gotten worse recently.
> Here is what I see happening:
>
> The drive mounts fine at boot, but the very first attempt to write a new file
> to the filesystem suffers from a very long pause, 30-60 seconds, during which
> time the disk activity light is fully "on".

Please post the output of xfs_info <mtpt> so we can see what your
filesystem configuration is.

> This happens only on the first new file write after mounting.
> From then on, the filesystem is fast and responsive as expected.
> If I umount the filesystem, and then mount it again,
> the exact same behaviour can be observed.

I can't say I've seen this. Can you capture a blktrace of the IO so
we can see what IO is actually being done, and perhaps also record
an XFS event trace as well (i.e. of all the events in
/sys/kernel/debug/tracing/events/xfs)?

> This of course screws up mythtv, as it causes me to lose the first 30-60
> seconds of the first recording it attempts after booting. So as a workaround
> I now have a startup script to create, sync, and delete a 64MB file before
> starting mythtv. This still takes 30-60 seconds, but it all happens and
> finishes before mythtv has a real-time need to write to the filesystem.
>
> The 2TB drive is fine -- zero errors, no events in the SMART logs,
> and I've disabled the silly WD head-unload logic on it.
>
> What's happening here? Why the big long burst of activity?
> I've only just noticed this behaviour in the past few weeks,
> running 2.6.35 and more recently 2.6.37.

Can you be a bit more precise? What were you running before 2.6.35,
when you didn't notice this?

> * * *
>
> The other issue is something I notice at umount time.
> I have a second big drive used as a backup device for the drive discussed above.
> I use "mirrordir" (similar to rsync) to clone directories/files from the main
> drive to the backup drive. After mirrordir finishes, I then "umount /backup".
> The umount promptly hangs, disk light on solid, for 30-60 seconds, then finishes.

Same again - blktrace and event traces for the different cases.
Also, how many files are you syncing? How much data, how many
inodes, etc.?

> If I type "sync" just before doing the umount, sync takes about 1 second,
> and the umount finishes instantly.
>
> Huh? What's happening there?

Sounds like something is broken w.r.t. writeback during unmount.
Perhaps also adding the writeback events to the trace would help
understand what is happening here....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs: very slow after mount, very slow at umount
From: Mark Lord @ 2011-01-27 3:49 UTC
To: Dave Chinner; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder

On 11-01-26 10:30 PM, Dave Chinner wrote:
> On Wed, Jan 26, 2011 at 08:22:25PM -0500, Mark Lord wrote:
..
>> The drive mounts fine at boot, but the very first attempt to write a new file
>> to the filesystem suffers from a very long pause, 30-60 seconds, during which
>> time the disk activity light is fully "on".
>
> Please post the output of xfs_info <mtpt> so we can see what your
> filesystem configuration is.

/dev/sdb1 on /var/lib/mythtv type xfs (rw,noatime,allocsize=64M,logbufs=8,largeio)

[~] xfs_info /var/lib/mythtv
meta-data=/dev/sdb1              isize=256    agcount=7453, agsize=65536 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=488378638, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

>> This happens only on the first new file write after mounting.
>> From then on, the filesystem is fast and responsive as expected.
>> If I umount the filesystem, and then mount it again,
>> the exact same behaviour can be observed.
>
> I can't say I've seen this. Can you capture a blktrace of the IO so
> we can see what IO is actually being done, and perhaps also record
> an XFS event trace as well (i.e. of all the events in
> /sys/kernel/debug/tracing/events/xfs).

I'll have to reconfig/rebuild the kernel to include support for blktrace first.
Can you specify the exact commands/args you'd like for running blktrace etc.?

>> I've only just noticed this behaviour in the past few weeks,
>> running 2.6.35 and more recently 2.6.37.
>
> Can you be a bit more precise? what were you running before 2.6.35
> when you didn't notice this?

Those details are in my earlier follow-up posting.

>> The other issue is something I notice at umount time.

I'm going to let that issue rest for now, until we figure out the first issue.
Heck, they might even be the exact same thing.. :)

Thanks!
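For reference, the capture Dave describes can be scripted roughly as follows. This is a sketch only: the device name, output filenames, and the reproducer step are assumptions, and blktrace needs a kernel with CONFIG_BLK_DEV_IO_TRACE plus a mounted debugfs.

```shell
# Start block-level tracing on the suspect device (run as root).
mount -t debugfs none /sys/kernel/debug 2>/dev/null
blktrace -d /dev/sdb -o firstwrite & BTPID=$!

# Capture all XFS tracepoint events alongside the block trace.
echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
cat /sys/kernel/debug/tracing/trace_pipe > xfs-events.txt & CATPID=$!

# ... reproduce the stall here, e.g. create a new file on the filesystem ...

# Stop both traces and render the blktrace data to text.
kill "$BTPID" "$CATPID"
echo 0 > /sys/kernel/debug/tracing/events/xfs/enable
blkparse -i firstwrite -o firstwrite.txt
```

The resulting firstwrite.txt and xfs-events.txt are what would be attached to a report like this one.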
* Re: xfs: very slow after mount, very slow at umount
From: Stan Hoeppner @ 2011-01-27 5:17 UTC
To: xfs

Mark Lord put forth on 1/26/2011 9:49 PM:

> agcount=7453

That's probably a bit high, Mark, and very possibly the cause of your
problems. :) Unless the disk array backing this filesystem has something
like 400-800 striped disk drives. You said it's a single 2TB drive, right?

The default agcount for a single-drive filesystem is 4 allocation groups.
For mdraid (of any number of disks/configuration) it's 16 allocation groups.

Why/how did you end up with 7453 allocation groups? That can definitely
cause some performance issues due to massively excessive head seeking, and
possibly all manner of weirdness.

--
Stan
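The suspicious agcount follows directly from the agsize used at mkfs time: 488378638 4 KiB blocks carved into 65536-block (256 MiB) allocation groups. A quick check of the arithmetic, using only the numbers from the xfs_info output earlier in the thread:

```shell
blocks=488378638   # data blocks from xfs_info (4 KiB each)
agsize=65536       # blocks per AG from xfs_info (= 256 MiB)
# Round up: the final AG is allowed to be smaller than agsize.
agcount=$(( (blocks + agsize - 1) / agsize ))
echo "$agcount"    # prints 7453, matching agcount=7453 in xfs_info
```

So a hand-picked 256 MiB agsize on a ~1.8 TiB filesystem is exactly what produces thousands of AGs.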
* Re: xfs: very slow after mount, very slow at umount
From: Mark Lord @ 2011-01-27 15:12 UTC
To: Stan Hoeppner; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder

On 11-01-27 12:30 AM, Stan Hoeppner wrote:
> Mark Lord put forth on 1/26/2011 9:49 PM:
>
>> agcount=7453
>
> That's probably a bit high Mark, and very possibly the cause of your problems.
> :) Unless the disk array backing this filesystem has something like 400-800
> striped disk drives. You said it's a single 2TB drive right?
>
> The default agcount for a single drive filesystem is 4 allocation groups. For
> mdraid (of any number of disks/configuration) it's 16 allocation groups.
>
> Why/how did you end up with 7453 allocation groups? That can definitely cause
> some performance issues due to massively excessive head seeking, and possibly
> all manner of weirdness.

This is great info, exactly the kind of feedback I was hoping for!

The filesystem is about a year old now, and I probably used agsize=nnnnn
when creating it or something.

So if this resulted in what you consider to be many, MANY too many AGs,
then I can imagine the first new file write wanting to go out and read in
all of the AG data to determine the "best fit" or something.
Which might explain some of the delay.

Once I get the new 2TB drive, I'll re-run mkfs.xfs and then copy everything
over onto a fresh xfs filesystem.

Can you recommend a good set of mkfs.xfs parameters to suit the
characteristics of this system? E.g. only a few thousand active inodes, and
nearly all files in the 600MB -> 20GB size range. The usage pattern it must
handle is up to six concurrent streaming writes at the same time as up to
three streaming reads, with no significant delays permitted on the reads.

That's the kind of workload that I find XFS handles nicely,
and EXT4 has given me trouble with in the past.

Thanks
-ml
* Re: xfs: very slow after mount, very slow at umount
From: Justin Piszcz @ 2011-01-27 15:40 UTC
To: Mark Lord; +Cc: Christoph Hellwig, Alex Elder, Linux Kernel, Stan Hoeppner, xfs

On Thu, 27 Jan 2011, Mark Lord wrote:
..
> Can you recommend a good set of mkfs.xfs parameters to suit the
> characteristics of this system? Eg. Only a few thousand active inodes, and
> nearly all files are in the 600MB -> 20GB size range. The usage pattern it
> must handle is up to six concurrent streaming writes at the same time as up
> to three streaming reads, with no significant delays permitted on the reads.
>
> That's the kind of workload that I find XFS handles nicely,
> and EXT4 has given me trouble with in the past.

Hi Mark,

I did a load of benchmarks a long time ago testing every mkfs.xfs option
there was, and I found that most of the time (if not all), the defaults were
the best.

Justin.
* Re: xfs: very slow after mount, very slow at umount
From: Mark Lord @ 2011-01-27 16:03 UTC
To: Justin Piszcz; +Cc: Christoph Hellwig, Alex Elder, Linux Kernel, Stan Hoeppner, xfs

On 11-01-27 10:40 AM, Justin Piszcz wrote:
> On Thu, 27 Jan 2011, Mark Lord wrote:
..
>> Can you recommend a good set of mkfs.xfs parameters to suit the
>> characteristics of this system? Eg. Only a few thousand active inodes, and
>> nearly all files are in the 600MB -> 20GB size range. The usage pattern it
>> must handle is up to six concurrent streaming writes at the same time as up
>> to three streaming reads, with no significant delays permitted on the reads.
>>
>> That's the kind of workload that I find XFS handles nicely,
>> and EXT4 has given me trouble with in the past.
..
> I did a load of benchmarks a long time ago testing every mkfs.xfs option
> there was, and I found that most of the time (if not all), the defaults were
> the best.
..

I am concerned with fragmentation on the very special workload in this case.
I'd really like the 20GB files, written over a 1-2 hour period, to consist
of a very few very large extents, as much as possible.
Rather than hundreds or thousands of "tiny" MB-sized extents.
I wonder what the best mkfs.xfs parameters might be to encourage that?

Cheers
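As an aside for readers chasing the same goal: besides the allocsize=64M mount option already in use on this filesystem, XFS supports per-file and per-directory extent size hints, which bias the allocator toward large extents for streaming writes. A sketch follows; the directory path and the 1 GiB hint value are illustrative assumptions, not something from this thread.

```shell
# Set a 1 GiB extent size hint on the recordings directory; files created
# inside it inherit the hint (path and hint size are placeholders).
xfs_io -c "extsize 1g" /var/lib/mythtv/recordings

# Read the hint back, and later check how fragmented a finished
# recording actually ended up:
xfs_io -c "extsize" /var/lib/mythtv/recordings
xfs_bmap -v /var/lib/mythtv/recordings/example-show.mpg
```

xfs_io and xfs_bmap ship with xfsprogs; both require an XFS filesystem to operate on.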
* Re: xfs: very slow after mount, very slow at umount
From: Stan Hoeppner @ 2011-01-27 19:40 UTC
To: Mark Lord; +Cc: Christoph Hellwig, Linux Kernel, xfs, Justin Piszcz, Alex Elder

Mark Lord put forth on 1/27/2011 10:03 AM:
..
> I am concerned with fragmentation on the very special workload in this case.
> I'd really like the 20GB files, written over a 1-2 hour period, to consist
> of a very few very large extents, as much as possible.

For XFS that's actually not a special-case workload but an average one. XFS
was conceived at SGI for use on large supercomputers, where typical single-file
datasets are extremely large, i.e. hundreds of GB. Also note that the realtime
subvolume feature was created for almost exactly your purpose: streaming
record/playback of raw A/V data for broadcast (i.e. television). In your case
it's compressed, not raw, A/V data. I'm not recommending you use the realtime
feature, however, as it's overkill for MythTV and not necessary.

> Rather than hundreds or thousands of "tiny" MB sized extents.
> I wonder what the best mkfs.xfs parameters might be to encourage that?

You need to use the mkfs.xfs defaults for any single-drive filesystem, and
trust the allocator to do the right thing. XFS uses variable-size extents and
the size is chosen dynamically -- you don't have direct or indirect control of
the extent size chosen for a given file or set of files, AFAIK.

As Dave Chinner is fond of pointing out, it's those who don't know enough
about XFS and choose custom settings that most often get themselves into
trouble (as you've already done once). :)

The defaults exist for a reason, and they weren't chosen willy-nilly. The
vast bulk of XFS' configurability exists for tuning maximum performance on
large to very large RAID arrays. There isn't much, if any, additional
performance to be gained with parameter tweaks on a single-drive XFS
filesystem.

A brief explanation of agcount: the filesystem is divided into agcount
regions called allocation groups, or AGs. The allocator writes to all AGs in
parallel to increase performance. With extremely fast storage (SSD, large
high-RPM RAID) this increases throughput, as the storage can often sink
writes faster than a serial writer can push data. In your case, you have a
single slow spindle with over 7,000 AGs. Thus, the allocator is writing to
over 7,000 locations on that single disk simultaneously, or at least it's
trying to. The poor head on that drive is being whipped all over the place
without actually getting much writing done. To add insult to injury, this is
one of those low-RPM, low-head-performance "green" drives, correct?

Trust the defaults. If they give you problems (unlikely), then we can talk. ;)

--
Stan
* Re: xfs: very slow after mount, very slow at umount
From: david @ 2011-01-27 20:11 UTC
To: Stan Hoeppner; +Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord

On Thu, 27 Jan 2011, Stan Hoeppner wrote:
>> Rather than hundreds or thousands of "tiny" MB sized extents.
>> I wonder what the best mkfs.xfs parameters might be to encourage that?
>
> You need to use the mkfs.xfs defaults for any single drive filesystem, and
> trust the allocator to do the right thing. XFS uses variable size extents
> and the size is chosen dynamically--you don't have direct or indirect
> control of the extent size chosen for a given file or set of files AFAIK.
>
> As Dave Chinner is fond of pointing out, it's those who don't know enough
> about XFS and choose custom settings that most often get themselves into
> trouble (as you've already done once). :)
>
> The defaults exist for a reason, and they weren't chosen willy nilly. The
> vast bulk of XFS' configurability exists for tuning maximum performance on
> large to very large RAID arrays. There isn't much, if any, additional
> performance to be gained with parameter tweaks on a single drive XFS
> filesystem.

how do I understand how to setup things on multi-disk systems? the
documentation I've found online is not that helpful, and in some ways
contradictory.

If there really are good rules for how to do this, it would be very helpful
if you could just give mkfs.xfs the information about your system (this
partition is on a 16 drive raid6 array) and have it do the right thing.

David Lang
* Re: xfs: very slow after mount, very slow at umount
From: Stan Hoeppner @ 2011-01-27 23:53 UTC
To: david; +Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord

david@lang.hm put forth on 1/27/2011 2:11 PM:

> how do I understand how to setup things on multi-disk systems? the
> documentation I've found online is not that helpful, and in some ways
> contradictory.

Visit http://xfs.org There you will find:

Users guide:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/index.html

File system structure:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/index.html

Training labs:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_Labs/tmp/en-US/html/index.html

> If there really are good rules for how to do this, it would be very helpful
> if you could just give mkfs.xfs the information about your system (this
> partition is on a 16 drive raid6 array) and have it do the right thing.

If your disk array is built upon Linux mdraid, recent versions of mkfs.xfs
will read the parameters and automatically make the filesystem accordingly,
properly.

mkfs.xfs will not do this for PCIe/x hardware RAID arrays or external
FC/iSCSI-based SAN arrays, as there is no standard place to acquire the RAID
configuration information for such systems. For these you will need to
configure mkfs.xfs manually.

At minimum you will want to specify stripe width (sw), which needs to match
the hardware stripe width. For RAID0, sw=[#of_disks]. For RAID10,
sw=[#disks/2]. For RAID5, sw=[#disks-1]. For RAID6, sw=[#disks-2].

You'll want at minimum agcount=16 for striped hardware arrays. Depending on
the number and spindle speed of the disks, the total size of the array, and
the characteristics of the RAID controller (big or small cache), you may
want to increase agcount. Experimentation may be required to find the
optimum parameters for a given hardware RAID array. Typically all other
parameters may be left at defaults.

Picking the perfect mkfs.xfs parameters for a hardware RAID array can be
somewhat of a black art, mainly because no two vendor arrays act or perform
identically. Systems of a caliber requiring XFS should be thoroughly tested
before going into production. Testing _with your workload_ of multiple
parameters should be performed to identify those yielding best performance.

--
Stan
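Stan's per-level rules for sw reduce to simple arithmetic over the number of data-bearing spindles. A sketch for a hypothetical 16-drive array; the drive count, the 64k chunk size, and the device name are assumptions, and su must match the array's real chunk size:

```shell
disks=16                      # total drives in the hypothetical array

sw_raid0=$disks               # RAID0:  every disk carries data
sw_raid10=$(( disks / 2 ))    # RAID10: half the disks are mirrors
sw_raid5=$(( disks - 1 ))     # RAID5:  one disk's worth of parity
sw_raid6=$(( disks - 2 ))     # RAID6:  two disks' worth of parity

# Example invocation for the RAID6 case (printed rather than run;
# /dev/sdX and su=64k are placeholders):
echo "mkfs.xfs -d su=64k,sw=${sw_raid6} /dev/sdX"
```

For the 16-drive RAID6 in david's example this yields sw=14.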
* Re: xfs: very slow after mount, very slow at umount
From: david @ 2011-01-28 2:09 UTC
To: Stan Hoeppner; +Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord

On Thu, 27 Jan 2011, Stan Hoeppner wrote:
> david@lang.hm put forth on 1/27/2011 2:11 PM:
>
>> how do I understand how to setup things on multi-disk systems? the
>> documentation I've found online is not that helpful, and in some ways
>> contradictory.
>
> Visit http://xfs.org There you will find:
..

thanks for the pointers.

>> If there really are good rules for how to do this, it would be very helpful
>> if you could just give mkfs.xfs the information about your system (this
>> partition is on a 16 drive raid6 array) and have it do the right thing.
>
> If your disk array is built upon Linux mdraid, recent versions of mkfs.xfs
> will read the parameters and automatically make the filesystem accordingly,
> properly.
..
> You'll want at minimum agcount=16 for striped hardware arrays. Depending on
> the number and spindle speed of the disks, the total size of the array, the
> characteristics of the RAID controller (big or small cache), you may want
> to increase agcount. Experimentation may be required to find the optimum
> parameters for a given hardware RAID array. Typically all other parameters
> may be left at defaults.

does this value change depending on the number of disks in the array?

> Picking the perfect mkfs.xfs parameters for a hardware RAID array can be
> somewhat of a black art, mainly because no two vendor arrays act or perform
> identically.

if mkfs.xfs can figure out how to do the 'right thing' for md raid arrays,
can there be a mode where it asks the users for the same information that it
gets from the kernel?

> Systems of a caliber requiring XFS should be thoroughly tested before going
> into production. Testing _with your workload_ of multiple parameters should
> be performed to identify those yielding best performance.

<rant>
the problem with this is that for large arrays, formatting the array and
loading it with data can take a day or more, even before you start running
the test. This is made even worse if you are scaling up an existing system a
couple of orders of magnitude, because you may not have the full workload
available to you.

Saying that you should test out every option before going into production is
a cop-out. The better you can test it, the better off you are, but without
knowing what the knobs do, just doing a test and twiddling the knobs to do
another test isn't very useful. If there is a way to set the knobs in the
general ballpark, then you can test and see if the performance seems
adequate; if not, you can try tweaking one of the knobs a little bit and see
if it helps or hurts. But if the knobs aren't even in the ballpark when you
start, this doesn't help much.
</rant>

David Lang
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 2:09 ` david @ 2011-01-28 13:56 ` Dave Chinner 2011-01-28 19:26 ` david 0 siblings, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-28 13:56 UTC (permalink / raw) To: david Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Thu, Jan 27, 2011 at 06:09:58PM -0800, david@lang.hm wrote: > On Thu, 27 Jan 2011, Stan Hoeppner wrote: > >david@lang.hm put forth on 1/27/2011 2:11 PM: > > > >>how do I understand how to setup things on multi-disk systems? the documentation > >>I've found online is not that helpful, and in some ways contradictory. > > > >Visit http://xfs.org There you will find: > > > >Users guide: > >http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/index.html > > > >File system structure: > >http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/index.html > > > >Training labs: > >http://xfs.org/docs/xfsdocs-xml-dev/XFS_Labs/tmp/en-US/html/index.html > > thanks for the pointers. > > >>If there really are good rules for how to do this, it would be very helpful if > >>you could just give mkfs.xfs the information about your system (this partition > >>is on a 16 drive raid6 array) and have it do the right thing. > > > >If your disk array is built upon Linux mdraid, recent versions of mkfs.xfs will > >read the parameters and automatically make the filesystem accordingly, properly. > > > >mxfs.fxs will not do this for PCIe/x hardware RAID arrays or external FC/iSCSI > >based SAN arrays as there is no standard place to acquire the RAID configuration > >information for such systems. For these you will need to configure mkfs.xfs > >manually. > > > >At minimum you will want to specify stripe width (sw) which needs to match the > >hardware stripe width. For RAID0 sw=[#of_disks]. For RAID 10, sw=[#disks/2]. > >For RAID5 sw=[#disks-1]. For RAID6 sw=[#disks-2]. 
> > > >You'll want at minimum agcount=16 for striped hardware arrays. Depending on the > >number and spindle speed of the disks, the total size of the array, the > >characteristics of the RAID controller (big or small cache), you may want to > >increase agcount. Experimentation may be required to find the optimum > >parameters for a given hardware RAID array. Typically all other parameters may > >be left at defaults. > > does this value change depending on the number of disks in the array? Only depending on block device capacity. Once at the maximum AG size (1TB), mkfs has to add more AGs. So once above 4TB for hardware RAID LUNs and 16TB for md/dm devices, you will get an AG per TB of storage by default. As it is, the optimal number and size of AGs will depend on many geometry factors as workload factors, such as the size of the luns, the way they are striped, whether you are using linear concatenation of luns or striping them or a combination of both, the amount of allocation concurrency you require, etc. In these sorts of situations, mkfs can only make a best guess - to do better you really need someone proficient in the dark arts to configure the storage and filesystem optimally. > >Picking the perfect mkfs.xfs parameters for a hardware RAID array can be > >somewhat of a black art, mainly because no two vendor arrays act or perform > >identically. > > if mkfs.xfs can figure out how to do the 'right thing' for md raid > arrays, can there be a mode where it asks the users for the same > information that it gets from the kernel? mkfs.xfs can get the information it needs directly from dm and md devices. However, when hardware RAID luns present themselves to the OS in an identical manner to single drives, how does mkfs tell the difference between a 2TB hardware RAID lun made up of 30x73GB drives and a single 2TB SATA drive? The person running mkfs should already know this little detail.... 
> >Systems of a caliber requiring XFS should be thoroughly tested before going into > >production. Testing _with your workload_ of multiple parameters should be > >performed to identify those yielding best performance. > > <rant> > the problem with this is that for large arrays, formatting the array > and loading it with data can take a day or more, even before you > start running the test. This is made even worse if you are scaling > up an existing system a couple orders of magnatude, because you may > not have the full workload available to you. If your hardware procurement-to-production process doesn't include testing performance of potential equipment on a representative workload, then I'd say you have a process problem that we can't help you solve.... > Saying that you should > test out every option before going into production is a cop-out. I never test every option. I know what the options do, so to decide what to tweak (if anything) what I first need to know is how a workload performs on a given storage layout with default options. I need to have: a) some idea of the expected performance of the workload b) a baseline performance characterisation of the underlying block devices c) a set of baseline performance metrics from a representative workload on a default filesystem d) spent some time analysing the baseline metrics for evidence of sub-optimal performance characteristics. Once I have that information, I can suggest meaningful ways (if any) to change the storage and filesystem configuration that may improve the performance of the workload. BTW, if you ask me how to optimise an ext4 filesystem for the same workload, I'll tell you straight up that I have no idea and that you should ask an ext4 expert.... > The better you can test it, the better off you are, but without > knowing what the knobs do, just doing a test and twiddling the > knobs to do another test isn't very useful. Well, yes, that is precisely the reason you should use the defaults. 
It's also the reason we have experts - they know what knob to twiddle to fix specific problems. If you prefer to twiddle knobs like Blind Freddy, then you should expect things to go wrong.... > If there is a way to > set the knobs in the general ballpark, Have you ever considered that this is exactly what mkfs does when you use the defaults? And that this is the fundamental reason we keep saying "use the defaults"? > then you can test and see > if the performance seems adequate, if not you can try tweaking one > of the knobs a little bit and see if it helps or hurts. but if the > knobs aren't even in the ballpark when you start, this doesn't > help much. The thread has now come full circle - you're ranting about not knowing what knobs do or how to set reasonable values, so you want to twiddle random knobs to see if they do anything as the basis of your optimisation process. This is the exact process that led to the bug report that started this thread - a tweak-without-understanding configuration leading to undesirable behavioural characteristics from the filesystem..... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
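[Editorial note: Dave's rule of thumb above - an AG per TB once the 1TB AG size cap is hit, from different starting points for plain/hardware-RAID devices versus md/dm - can be sketched as simple arithmetic. This is an illustration of the heuristic as described in the thread, not the actual mkfs.xfs code, and the baseline counts (4 and 16) are inferred from the "above 4TB / above 16TB" thresholds mentioned.]

```python
import math

TB = 1 << 40
MAX_AG_SIZE = TB  # an allocation group cannot exceed 1TB

def default_agcount(capacity_bytes, is_md_or_dm=False):
    """Rough sketch of the default AG count described in the thread.

    Baselines (4 for plain devices / HW RAID LUNs, 16 for md/dm) are
    inferred from the discussion; the real mkfs.xfs heuristic has
    more cases than this.
    """
    base = 16 if is_md_or_dm else 4
    # Above base * 1TB of capacity, the 1TB AG size cap forces
    # roughly one AG per TB of storage.
    return max(base, math.ceil(capacity_bytes / MAX_AG_SIZE))
```

For example, under this sketch a 2TB single drive gets 4 AGs, while a 20TB md array gets 20.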
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 13:56 ` Dave Chinner @ 2011-01-28 19:26 ` david 2011-01-29 5:40 ` Dave Chinner 0 siblings, 1 reply; 34+ messages in thread From: david @ 2011-01-28 19:26 UTC (permalink / raw) To: Dave Chinner Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Sat, 29 Jan 2011, Dave Chinner wrote: > On Thu, Jan 27, 2011 at 06:09:58PM -0800, david@lang.hm wrote: >> On Thu, 27 Jan 2011, Stan Hoeppner wrote: >>> david@lang.hm put forth on 1/27/2011 2:11 PM: >>> >>> Picking the perfect mkfs.xfs parameters for a hardware RAID array can be >>> somewhat of a black art, mainly because no two vendor arrays act or perform >>> identically. >> >> if mkfs.xfs can figure out how to do the 'right thing' for md raid >> arrays, can there be a mode where it asks the users for the same >> information that it gets from the kernel? > > mkfs.xfs can get the information it needs directly from dm and md > devices. However, when hardware RAID luns present themselves to the > OS in an identical manner to single drives, how does mkfs tell the > difference between a 2TB hardware RAID lun made up of 30x73GB drives > and a single 2TB SATA drive? The person running mkfs should already > know this little detail.... that's my point, the person running mkfs knows this information, and can easily answer questions that mkfs asks (or provide this information on the command line). but mkfs doesn't ask for this information, instead it asks the user to define a whole bunch of parameters that are not well understood. An XFS guru can tell you how to configure these parameters based on different hardware layouts, but as long as it remains a 'black art' getting new people up to speed is really hard.
If this can be reduced down to

  is this a hardware raid device
    if yes
      how many drives are there
      what raid type is used (linear, raid 0, 1, 5, 6, 10)

and whatever questions are needed, it would _greatly_ improve the quality of the settings that non-guru people end up using. David Lang _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
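[Editorial note: for what it's worth, the question-driven mapping David proposes above can be sketched in a few lines. Everything here is hypothetical - the helper name and the fixed RAID-type table are made up - and it only illustrates that the answers translate mechanically into mkfs.xfs stripe options:]

```python
def mkfs_stripe_opts(raid_type, n_drives, chunk_kib):
    """Map the proposed questions onto mkfs.xfs -d su=X,sw=Y options.

    Hypothetical sketch: uses the usual data-disk counts per RAID
    level; a real tool would also have to handle nested/layered
    arrays like the one described later in the thread.
    """
    data_disks = {
        "raid0": n_drives,
        "raid5": n_drives - 1,
        "raid6": n_drives - 2,
        "raid10": n_drives // 2,
    }.get(raid_type)
    if data_disks is None:
        return ""  # linear or raid1: no stripe geometry to align to
    # su = stripe unit (one chunk), sw = number of data-bearing disks
    return f"-d su={chunk_kib}k,sw={data_disks}"
```

For a 12-drive RAID6 with 64KiB chunks this yields "-d su=64k,sw=10".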
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 19:26 ` david @ 2011-01-29 5:40 ` Dave Chinner 2011-01-29 6:08 ` david 0 siblings, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-29 5:40 UTC (permalink / raw) To: david Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Fri, Jan 28, 2011 at 11:26:00AM -0800, david@lang.hm wrote: > On Sat, 29 Jan 2011, Dave Chinner wrote: > > >On Thu, Jan 27, 2011 at 06:09:58PM -0800, david@lang.hm wrote: > >>On Thu, 27 Jan 2011, Stan Hoeppner wrote: > >>>david@lang.hm put forth on 1/27/2011 2:11 PM: > >>> > >>>Picking the perfect mkfs.xfs parameters for a hardware RAID array can be > >>>somewhat of a black art, mainly because no two vendor arrays act or perform > >>>identically. > >> > >>if mkfs.xfs can figure out how to do the 'right thing' for md raid > >>arrays, can there be a mode where it asks the users for the same > >>information that it gets from the kernel? > > > >mkfs.xfs can get the information it needs directly from dm and md > >devices. However, when hardware RAID luns present themselves to the > >OS in an identical manner to single drives, how does mkfs tell the > >difference between a 2TB hardware RAID lun made up of 30x73GB drives > >and a single 2TB SATA drive? The person running mkfs should already > >know this little detail.... > > that's my point, the person running mkfs knows this information, and > can easily answer questions that mkfs asks (or provide this > information on the command line). but mkfs doesn't ask for this > infomation, instead it asks the user to define a whole bunch of > parameters that are not well understood. I'm going to be blunt - XFS is not a filesystem suited to use by clueless noobs. XFS is a highly complex filesystem designed for high end, high performance storage and therefore has the configurability and flexibility required by such environments. 
Hence I expect that anyone configuring an XFS filesystem for a production environment is a professional and has, at minimum, done their homework before they go fiddling with knobs. And we have a FAQ for a reason. ;) > An XFS guru can tell you > how to configure these parameters based on different hardware > layouts, but as long as it remains a 'black art' getting new people > up to speed is really hard. If this can be reduced down to > > is this a hardware raid device > if yes > how many drives are there > what raid type is used (linear, raid 0, 1, 5, 6, 10) > > and whatever questions are needed, it would _greatly_ improve the > quality of the settings that non-guru people end up using. As opposed to just making mkfs DTRT without needing to ask questions? If you really think an interactive mkfs-for-dummies script is necessary, then go ahead and write one - you don't need to modify mkfs at all to do it..... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-29 5:40 ` Dave Chinner @ 2011-01-29 6:08 ` david 2011-01-29 7:35 ` Dave Chinner 0 siblings, 1 reply; 34+ messages in thread From: david @ 2011-01-29 6:08 UTC (permalink / raw) To: Dave Chinner Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Sat, 29 Jan 2011, Dave Chinner wrote: > On Fri, Jan 28, 2011 at 11:26:00AM -0800, david@lang.hm wrote: >> On Sat, 29 Jan 2011, Dave Chinner wrote: >> >>> On Thu, Jan 27, 2011 at 06:09:58PM -0800, david@lang.hm wrote: >>>> On Thu, 27 Jan 2011, Stan Hoeppner wrote: >>>>> david@lang.hm put forth on 1/27/2011 2:11 PM: >>>>> >>>>> Picking the perfect mkfs.xfs parameters for a hardware RAID array can be >>>>> somewhat of a black art, mainly because no two vendor arrays act or perform >>>>> identically. >>>> >>>> if mkfs.xfs can figure out how to do the 'right thing' for md raid >>>> arrays, can there be a mode where it asks the users for the same >>>> information that it gets from the kernel? >>> >>> mkfs.xfs can get the information it needs directly from dm and md >>> devices. However, when hardware RAID luns present themselves to the >>> OS in an identical manner to single drives, how does mkfs tell the >>> difference between a 2TB hardware RAID lun made up of 30x73GB drives >>> and a single 2TB SATA drive? The person running mkfs should already >>> know this little detail.... >> >> that's my point, the person running mkfs knows this information, and >> can easily answer questions that mkfs asks (or provide this >> information on the command line). but mkfs doesn't ask for this >> infomation, instead it asks the user to define a whole bunch of >> parameters that are not well understood. > > I'm going to be blunt - XFS is not a filesystem suited to use by > clueless noobs. 
XFS is a highly complex filesystem designed for high > end, high performance storage and therefore has the configurability > and flexibility required by such environments. Hence I expect that > anyone configuring an XFS filesystem for a production environment > is a professional and has, at minimum, done their homework before > they go fiddling with knobs. And we have a FAQ for a reason. ;) > >> An XFS guru can tell you >> how to configure these parameters based on different hardware >> layouts, but as long as it remains a 'black art' getting new people >> up to speed is really hard. If this can be reduced down to >> >> is this a hardware raid device >> if yes >> how many drives are there >> what raid type is used (linear, raid 0, 1, 5, 6, 10) >> >> and whatever questions are needed, it would _greatly_ improve the >> quality of the settings that non-guru people end up using. > > As opposed to just making mkfs DTRT without needing to ask > questions? but you just said that mkfs couldn't do this with hardware raid because it can't "tell the difference between a 2TB hardware RAID lun made up of 30x73GB drives and a single 2TB SATA drive" if it could tell the difference, it should just do the right thing, but if it can't tell the difference, it should ask the user who can give it the answer. also, keep in mind that what it learns about the 'disks' from md and dm may not be the complete picture. I have one system that thinks it's doing a raid0 across 10 drives, but it's really 160 drives, grouped into 10 raid6 sets by hardware raid, that then gets combined by md. I am all for the defaults and auto-config being as good as possible (one of my biggest gripes about postgres is how bad its defaults are), but when you can't tell what reality is, ask the admin who knows (or at least have the option of asking the admin) > If you really think an interactive mkfs-for-dummies script is > necessary, then go ahead and write one - you don't need to modify > mkfs at all to do it.....
it doesn't have to be interactive, the answers to the questions could be command-line options. as for the reason that I don't do this, that's simple. I don't know enough of the black arts to know what the logic is to convert from knowing the disk layout to setting the existing parameters. David Lang _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-29 6:08 ` david @ 2011-01-29 7:35 ` Dave Chinner 2011-01-31 19:17 ` Christoph Hellwig 0 siblings, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-29 7:35 UTC (permalink / raw) To: david Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Fri, Jan 28, 2011 at 10:08:42PM -0800, david@lang.hm wrote: > On Sat, 29 Jan 2011, Dave Chinner wrote: > > >On Fri, Jan 28, 2011 at 11:26:00AM -0800, david@lang.hm wrote: > >>On Sat, 29 Jan 2011, Dave Chinner wrote: > >> > >>>On Thu, Jan 27, 2011 at 06:09:58PM -0800, david@lang.hm wrote: > >>>>On Thu, 27 Jan 2011, Stan Hoeppner wrote: > >>>>>david@lang.hm put forth on 1/27/2011 2:11 PM: > >>>>> > >>>>>Picking the perfect mkfs.xfs parameters for a hardware RAID array can be > >>>>>somewhat of a black art, mainly because no two vendor arrays act or perform > >>>>>identically. > >>>> > >>>>if mkfs.xfs can figure out how to do the 'right thing' for md raid > >>>>arrays, can there be a mode where it asks the users for the same > >>>>information that it gets from the kernel? > >>> > >>>mkfs.xfs can get the information it needs directly from dm and md > >>>devices. However, when hardware RAID luns present themselves to the > >>>OS in an identical manner to single drives, how does mkfs tell the > >>>difference between a 2TB hardware RAID lun made up of 30x73GB drives > >>>and a single 2TB SATA drive? The person running mkfs should already > >>>know this little detail.... > >> > >>that's my point, the person running mkfs knows this information, and > >>can easily answer questions that mkfs asks (or provide this > >>information on the command line). but mkfs doesn't ask for this > >>infomation, instead it asks the user to define a whole bunch of > >>parameters that are not well understood. > > > >I'm going to be blunt - XFS is not a filesystem suited to use by > >clueless noobs. 
XFS is a highly complex filesystem designed for high > >end, high performance storage and therefore has the configurability > >and flexibility required by such environments. Hence I expect that > >anyone configuring an XFS filesystem for a production environments > >is a professional and has, at minimum, done their homework before > >they go fiddling with knobs. And we have a FAQ for a reason. ;) > > > >>An XFS guru can tell you > >>how to configure these parameters based on different hardware > >>layouts, but as long as it remains a 'back art' getting new people > >>up to speed is really hard. If this can be reduced down to > >> > >>is this a hardware raid device > >> if yes > >> how many drives are there > >> what raid type is used (linear, raid 0, 1, 5, 6, 10) > >> > >>and whatever questions are needed, it would _greatly_ improve the > >>quality of the settings that non-guru people end up using. > > > >As opposed to just making mkfs DTRT without needing to ask > >questions? > > but you just said that mkfs couldn't do this with hardware raid > because it can't "tell the difference between a 2TB hardware RAID > lun made up of 30x73GB drives and a single 2TB SATA drive" if it > could tell the difference, it should just do the right thing, but if > it can't tell the difference, it should ask the user who can give it > the answer. Just because we can't do it right now doesn't mean it is not possible. Array/raid controller vendors need to implement the SCSI block limit VPD page, and if they do then stripe unit/stripe width may be exposed for the device in sysfs. 
However, I haven't seen any devices except for md and dm that actually export values that reflect sunit/swidth in the files:

  /sys/block/<dev>/queue/minimum_io_size
  /sys/block/<dev>/queue/optimal_io_size

There's information about it here: http://www.kernel.org/doc/ols/2009/ols2009-pages-235-238.pdf But what we really need here is for RAID vendors to implement the part of the SCSI protocol that gives us the necessary information. > also, keep in mind that what it learns about the 'disks' from md and > dm may not be the complete picture. I have one system that thinks > it's doing a raid0 across 10 drives, but it's really 160 drives, > grouped into 10 raid6 sets by hardware raid, that then gets combined > by md. MD doesn't care whether the block devices are single disks or RAID LUNS. In this case, it's up to you to configure the md chunk size appropriately for those devices. i.e. the MD chunk size needs to be the RAID6 lun stripe width. If you get the MD config right, then mkfs will do exactly the right thing without needing to be tweaked. The same goes for any sort of hierarchical aggregation of storage - if you don't get the geometry right at each level, then performance will suck. FWIW, SGI has been using XFS in complex, multilayer, multipath, hierarchical configurations like this for 15 years. What you describe is a typical, everyday configuration that XFS is used on and it is this sort of configuration we tend to optimise the default behaviour for.... > I am all for the defaults and auto-config being as good as possible > (one of my biggest gripes about postgres is how bad its defaults > are), but when you can't tell what reality is, ask the admin who > knows (or at least have the option of asking the admin) > > >If you really think an interactive mkfs-for-dummies script is > >necessary, then go ahead and write one - you don't need to modify > >mkfs at all to do it.....
> > it doesn't have to be interactive, the answers to the questions > could be command-line options. Which means you're assuming a competent admin is running the tool, in which case they could just run mkfs directly. Anyway, it still doesn't need mkfs changes. > as for the reason that I don't do this, that's simple. I don't know > enough of the black arts to know what the logic is to convert from > knowing the disk layout to setting the existing parameters. Writing such a script would be a good way to learn the art and document the information that people are complaining is lacking. I don't have the time (or need) to write such a script, but I can answer questions when they arise should someone decide to do it.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
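[Editorial note: when a device does export the sysfs values Dave mentions, converting them into the sunit/swidth that mkfs.xfs expects (512-byte sectors) is mechanical. A sketch, assuming the usual convention that minimum_io_size reports the stripe chunk and optimal_io_size the full stripe width:]

```python
def stripe_geometry(minimum_io_size, optimal_io_size):
    """Derive (sunit, swidth) in 512-byte sectors from the byte
    values found in /sys/block/<dev>/queue/{minimum,optimal}_io_size.

    Returns None when the device exports no stripe geometry
    (an optimal_io_size of 0, as single disks typically report).
    """
    if optimal_io_size == 0:
        return None
    return minimum_io_size // 512, optimal_io_size // 512
```

For instance, a 10-disk md RAID0 with a 64KiB chunk would report 65536 and 655360 bytes, giving sunit=128 and swidth=1280 sectors.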
* Re: xfs: very slow after mount, very slow at umount 2011-01-29 7:35 ` Dave Chinner @ 2011-01-31 19:17 ` Christoph Hellwig 0 siblings, 0 replies; 34+ messages in thread From: Christoph Hellwig @ 2011-01-31 19:17 UTC (permalink / raw) To: Dave Chinner Cc: david, Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Sat, Jan 29, 2011 at 06:35:54PM +1100, Dave Chinner wrote: > Just because we can't do it right now doesn't mean it is not > possible. Array/raid controller vendors need to implement the SCSI > block limit VPD page, and if they do then stripe unit/stripe width > may be exposed for the device in sysfs. However, I haven't seen any > devices except for md and dm that actually export values that > reflect sunit/swidth in the files: I have access to a few big vendor arrays that export it, but I think they are still running beta firmware versions. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 19:40 ` Stan Hoeppner 2011-01-27 20:11 ` david @ 2011-01-27 21:56 ` Mark Lord 2011-01-28 0:17 ` Dave Chinner 2011-01-28 19:18 ` Martin Steigerwald 2 siblings, 1 reply; 34+ messages in thread From: Mark Lord @ 2011-01-27 21:56 UTC (permalink / raw) To: Stan Hoeppner Cc: Christoph Hellwig, Linux Kernel, xfs, Justin Piszcz, Alex Elder On 11-01-27 02:40 PM, Stan Hoeppner wrote: .. > You need to use the mkfs.xfs defaults for any single drive filesystem, and trust > the allocator to do the right thing. But it did not do the right thing when I used the defaults. Big files ended up with tons of (exactly) 64MB extents, ISTR. With the increased number of ags, I saw much less fragmentation, and the drive was still very light on I/O despite multiple simultaneous recordings, commflaggers, and playback at once. The only, ONLY, glitch, was this recent "first write takes 45 seconds" glitch. After that initial write after boot, throughput was normal (great). Thus the attempts to tweak. > Trust the defaults. I imagine the defaults are designed to handle a typical Linux install, with 100,000 to 1,000,000 files varying from a few bytes to a few megabytes. That's not what this filesystem will have. It will have only a few thousand (max) inodes at any given time, but each file will be HUGE. XFS is fantastic at adapting to the workload, but I'd like to try and have it tuned more closely for the known workload this system is throwing at it. I'm now trying again, but with 8 ags instead of 8000+. Thanks! _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 21:56 ` Mark Lord @ 2011-01-28 0:17 ` Dave Chinner 2011-01-28 1:22 ` Mark Lord 0 siblings, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-28 0:17 UTC (permalink / raw) To: Mark Lord Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On Thu, Jan 27, 2011 at 04:56:20PM -0500, Mark Lord wrote: > On 11-01-27 02:40 PM, Stan Hoeppner wrote: > .. > > You need to use the mkfs.xfs defaults for any single drive filesystem, and trust > > the allocator to do the right thing. > > But it did not do the right thing when I used the defaults. > Big files ended up with tons of (exactly) 64MB extents, ISTR. Because your AG size is 64MB. An extent can't be larger than an AG. Hence you are fragmenting your large files unnecessarily, as extents can be up to 8GB in size on a 4k block size filesystem. > With the increased number of ags, I saw much less fragmentation, > and the drive was still very light on I/O despite multiple simultaneous > recordings, commflaggers, and playback at once. The allocsize mount option is the preferred method of keeping fragmentation down for DVR-style workloads. > > Trust the defaults. > > I imagine the defaults are designed to handle a typical Linux install, > with 100,000 to 1,000,000 files varying from a few bytes to a few megabytes. Why would we optimise a filesystem designed for use on high end storage and large amounts of IO concurrency for what a typical Linux desktop needs? For such storage (i.e. single spindle) mkfs optimises the layout for minimal seeks and relatively low amounts of concurrency. This gives _adequate_ performance on desktop machines without compromising scalability on high end storage. In my experience with XFS, most people who tweak mkfs parameters end up with some kind of problem they can't explain and don't know how to solve.
And they are typically problems that would not have occurred had they simply used the defaults in the first place. What you've done is a perfect example of this. Yes, I know we are taking the fun out of tweaking knobs so you can say it's 1% faster than the default, but that's our job: to determine the right default settings so the filesystem works as well as possible out of the box with no tweaking for most workloads on a wide range of storage.... > That's not what this filesystem will have. It will have only a few thousand > (max) inodes at any given time, but each file will be HUGE. Which is exactly the use case XFS was designed for, and.... > XFS is fantastic at adapting to the workload, but I'd like to try and have > it tuned more closely for the known workload this system is throwing at it. .... as such the mkfs defaults are already tuned as well as they can be for such usage. > I'm now trying again, but with 8 ags instead of 8000+. Why 8 AGs and not the default? Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
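[Editorial note: the extent-size ceiling Dave describes above is easy to check with arithmetic: an extent cannot cross an AG boundary, and on a 4k-block filesystem the on-disk extent length field itself tops out around 8GB (2^21 blocks; the sketch below ignores the off-by-one in the real field). Illustration only:]

```python
MB, GB, TB = 1 << 20, 1 << 30, 1 << 40

def max_extent_bytes(capacity_bytes, agcount, block_size=4096):
    """Upper bound on a single extent: the smaller of the AG size
    and the ~8GB cap from the 21-bit extent length field."""
    ag_size = capacity_bytes // agcount
    field_cap = (1 << 21) * block_size  # 2^21 blocks per extent record
    return min(ag_size, field_cap)
```

This reproduces Mark's symptom: a 2TB filesystem carved into 32768 AGs caps every extent at 64MB, while the same disk with 4 AGs allows the full ~8GB.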
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 0:17 ` Dave Chinner @ 2011-01-28 1:22 ` Mark Lord 2011-01-28 1:36 ` Mark Lord ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Mark Lord @ 2011-01-28 1:22 UTC (permalink / raw) To: Dave Chinner Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On 11-01-27 07:17 PM, Dave Chinner wrote: > > In my experience with XFS, most people who tweak mkfs parameters end > up with some kind of problem they can't explain and don't know how > to solve. And they are typically problems that would not have > occurred had they simply used the defaults in the first place. What > you've done is a perfect example of this. Maybe. But what I read from the paragraph above, is that the documentation could perhaps explain things better, and then people other than the coders might understand how best to tweak it. > Why 8 AGs and not the default? How AGs are used is not really explained anywhere I've looked, so I am guessing at what they do and how the system might respond to different values there (that documentation thing again). Lacking documentation, my earlier experiences suggest that more AGs gives me less fragmentation when multiple simultaneous recording streams are active. I got higher fragmentation with the defaults than with the tweaked value. Now, that might be due to differences in kernel versions too, as things in XFS are continuously getting even better (thanks!), and the original "defaults" assessment was with the kernel-of-the-day back in early 2010 (2.6.34?), and now the system is using 2.6.37. But I just don't know. My working theory, likely entirely wrong, is that if I have N streams active, odds are that each of those streams might get assigned to different AGs, given sufficient AGs >= N. Since the box often has 3-7 recording streams active, I'm trying it out with 8 AGs now. 
Cheers _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 1:22 ` Mark Lord @ 2011-01-28 1:36 ` Mark Lord 2011-01-28 4:14 ` David Rees 2011-01-28 7:31 ` Dave Chinner 2 siblings, 0 replies; 34+ messages in thread From: Mark Lord @ 2011-01-28 1:36 UTC (permalink / raw) To: Dave Chinner Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On 11-01-27 08:22 PM, Mark Lord wrote: > On 11-01-27 07:17 PM, Dave Chinner wrote: >> >> In my experience with XFS, most people who tweak mkfs parameters end >> up with some kind of problem they can't explain and don't know how >> to solve. And they are typically problems that would not have >> occurred had they simply used the defaults in the first place. What >> you've done is a perfect example of this. > > Maybe. But what I read from the paragraph above, > is that the documentation could perhaps explain things better, > and then people other than the coders might understand how > best to tweak it. By the way, the documentation is excellent, for a developer who wants to work on the codebase. It describes the data structures and layouts etc.. better than perhaps any other Linux filesystem. But it doesn't seem to describe the algorithms, such as how it decides where to store a recording stream. I'm not complaining, far from it. XFS is simply wonderful, and my DVR literally couldn't work without it. But I am as technical as you are, and I like to experiment and understand the technology I use. That's partly why we both work on the Linux kernel. Cheers _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 1:22 ` Mark Lord 2011-01-28 1:36 ` Mark Lord @ 2011-01-28 4:14 ` David Rees 2011-01-28 14:22 ` Mark Lord 2011-01-28 7:31 ` Dave Chinner 2 siblings, 1 reply; 34+ messages in thread From: David Rees @ 2011-01-28 4:14 UTC (permalink / raw) To: Mark Lord Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On Thu, Jan 27, 2011 at 5:22 PM, Mark Lord <kernel@teksavvy.com> wrote: > But I just don't know. My working theory, likely entirely wrong, > is that if I have N streams active, odds are that each of those > streams might get assigned to different AGs, given sufficient AGs >= N. > > Since the box often has 3-7 recording streams active, > I'm trying it out with 8 AGs now. As suggested before - why are you messing with AGs instead of allocsize? I suspect that with the default configuration, XFS was trying to maximize throughput by reducing seeks with multiple processes writing streams. But now, you're telling XFS that it's OK to write in up to 8 different locations on the disk without worrying about seek performance. I think this is likely to result in overall worse performance at the worst time - under write load. If you are trying to optimize single thread read performance by minimizing file fragments, why don't you simply figure out at what point increasing allocsize stops increasing read performance? I suspect that the defaults do a good job because even if your files are fragmented in 64MB chunks because you have multiple streams writing, those chunks are very likely to be very close together so there isn't much of a seek penalty. -Dave _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 4:14 ` David Rees @ 2011-01-28 14:22 ` Mark Lord 0 siblings, 0 replies; 34+ messages in thread From: Mark Lord @ 2011-01-28 14:22 UTC (permalink / raw) To: David Rees Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On 11-01-27 11:14 PM, David Rees wrote: > > As suggested before - why are you messing with AGs instead of allocsize? Who said "instead of"? I'm using both. Cheers _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 1:22 ` Mark Lord 2011-01-28 1:36 ` Mark Lord 2011-01-28 4:14 ` David Rees @ 2011-01-28 7:31 ` Dave Chinner 2011-01-28 14:33 ` Mark Lord 2 siblings, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-28 7:31 UTC (permalink / raw) To: Mark Lord Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On Thu, Jan 27, 2011 at 08:22:48PM -0500, Mark Lord wrote: > On 11-01-27 07:17 PM, Dave Chinner wrote: > > > > In my experience with XFS, most people who tweak mkfs parameters end > > up with some kind of problem they can't explain and don't know how > > to solve. And they are typically problems that would not have > > occurred had they simply used the defaults in the first place. What > > you've done is a perfect example of this. > > Maybe. But what I read from the paragraph above, > is that the documentation could perhaps explain things better, > and then people other than the coders might understand how > best to tweak it. A simple google search turns up discussions like this: http://oss.sgi.com/archives/xfs/2009-01/msg01161.html Where someone reads the docco and asks questions to fill in gaps in their knowledge that the docco didn't explain fully before they try to twiddle knobs. Configuring XFS filesystems for optimal performance has always been a black art because it requires you to understand your storage, your application workload(s) and XFS from the ground up. Most people can't even tick one of those boxes, let alone all three.... > > Why 8 AGs and not the default? > > How AGs are used is not really explained anywhere I've looked, > so I am guessing at what they do and how the system might respond > to different values there (that documentation thing again). 
Section 5.1 of this 1996 whitepaper tells you what allocation groups are and the general allocation strategy around them: http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html > Lacking documentation, my earlier experiences suggest that more AGs > gives me less fragmentation when multiple simultaneous recording streams > are active. I got higher fragmentation with the defaults than with > the tweaked value. Fragmentation is not a big problem if you've got extents larger than a typical IO. Once extents get to a few megabytes in size, it just doesn't matter if they are any bigger for small DVR workloads because the seek cost between streams is sufficiently amortised with a few MB of sequential access per stream.... > Now, that might be due to differences in kernel versions too, > as things in XFS are continuously getting even better (thanks!), > and the original "defaults" assessment was with the kernel-of-the-day > back in early 2010 (2.6.34?), and now the system is using 2.6.37. > > But I just don't know. My working theory, likely entirely wrong, > is that if I have N streams active, odds are that each of those > streams might get assigned to different AGs, given sufficient AGs >= N. Streaming into different AGs is not necessarily the right solution; it causes seeks between every stream, and the stream in AG 0 will be able to read/write faster than the stream in AG 7 because of their locations on disk. IOWs, interleaving streams within an AG might give better IO patterns, lower latency and better throughput. Of course, it depends on the storage subsystem, the application, etc. And yes, you can change this sort of allocation behaviour by fiddling with XFS knobs in the right way - start to see what I mean about tuning XFS really being a "black art"? > Since the box often has 3-7 recording streams active, > I'm trying it out with 8 AGs now. Seems like a reasonable decision. Good luck.
> > Cheers > > -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 7:31 ` Dave Chinner @ 2011-01-28 14:33 ` Mark Lord 2011-01-28 23:58 ` Dave Chinner 0 siblings, 1 reply; 34+ messages in thread From: Mark Lord @ 2011-01-28 14:33 UTC (permalink / raw) To: Dave Chinner Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On 11-01-28 02:31 AM, Dave Chinner wrote: > > A simple google search turns up discussions like this: > > http://oss.sgi.com/archives/xfs/2009-01/msg01161.html "in the long term we still expect fragmentation to degrade the performance of XFS file systems" Other than that, no hints there about how changing agcount affects things. > Configuring XFS filesystems for optimal performance has always been > a black art because it requires you to understand your storage, your > application workload(s) and XFS from the ground up. Most people > can't even tick one of those boxes, let alone all three.... Well, I've got 2/3 of those down just fine, thanks. But it's the "XFS" part that is still the "black art" part, because so little is written about *how* it works (as opposed to how it is laid out on disk). Again, that's only a minor complaint -- XFS is way better documented than the alternatives, and also works way better than the others I've tried here on this workload. >>> Why 8 AGs and not the default? >> >> How AGs are used is not really explained anywhere I've looked, >> so I am guessing at what they do and how the system might respond >> to different values there (that documentation thing again). > > Section 5.1 of this 1996 whitepaper tells you what allocation groups > are and the general allocation strategy around them: > > http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html Looks a bit dated: "Allocation groups are typically 0.5 to 4 gigabytes in size." But it does suggest that "processes running concurrently can allocate space in the file system concurrently without interfering with each other". 
Dunno if that's still true today, but it sounds pretty close to what I was theorizing about how it might work. > start to see what I mean about tuning XFS really being a "black art"? No, I've seen that "black" (aka. undefined, undocumented) part from the start. :) Thanks for chipping in here, though -- it's been really useful. Cheers!
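[Editor's note: for scale, here is the whitepaper's 0.5-4 GB guideline applied to the drive in this thread. This is a back-of-envelope Python sketch; the block count comes from the xfs_info output quoted elsewhere in the thread, and 4 AGs is the single-drive mkfs.xfs default cited elsewhere in the thread.]

```python
blocks = 488378638        # 4 KiB filesystem blocks, from xfs_info
fs_bytes = blocks * 4096  # roughly 1.82 TiB of data space
default_ags = 4           # single-drive mkfs.xfs default cited in this thread

# Per-AG size with the default layout on this disk
ag_gib = fs_bytes / default_ags / 1024**3
print(round(ag_gib))      # ~466 GiB per AG -- far beyond the 0.5-4 GB
                          # sizes that were typical when the paper was written
```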
* Re: xfs: very slow after mount, very slow at umount 2011-01-28 14:33 ` Mark Lord @ 2011-01-28 23:58 ` Dave Chinner 0 siblings, 0 replies; 34+ messages in thread From: Dave Chinner @ 2011-01-28 23:58 UTC (permalink / raw) To: Mark Lord Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner On Fri, Jan 28, 2011 at 09:33:02AM -0500, Mark Lord wrote: > On 11-01-28 02:31 AM, Dave Chinner wrote: > > > > A simple google search turns up discussions like this: > > > > http://oss.sgi.com/archives/xfs/2009-01/msg01161.html > > "in the long term we still expect fragmentation to degrade the performance of > XFS file systems" "so we intend to add an on-line file system defragmentation utility to optimize the file system in the future" You are quoting from the wrong link - that's from the 1996 whitepaper. And sure, at the time that was written, nobody had any real experience with long term aging of XFS filesystems so it was still a guess at that point. XFS has had that online defragmentation utility since 1998, IIRC, even though in most cases it is unnecessary to use it. > Other than that, no hints there about how changing agcount affects things. If the reason given in the whitepaper for multiple AGs (i.e. they are for increasing the concurrency of allocation) doesn't help you understand why you'd want to increase the number of AGs in the filesystem, then you haven't really thought about what you read. As it is, from the same google search that found the above link as #1 hit, this was #6: http://oss.sgi.com/archives/xfs/2010-11/msg00497.html | > AG count has a | > direct relationship to the storage hardware, not the number of CPUs | > (cores) in the system | | Actually, I used 16 AGs because it's twice the number of CPU cores | and I want to make sure that CPU parallel workloads (e.g. make -j 8) | don't serialise on AG locks during allocation. IOWs, I laid it out | that way precisely because of the number of CPUs in the system... 
| | And to point out the not-so-obvious, this is the _default layout_ | that mkfs.xfs in the debian squeeze installer came up with. IOWs, | mkfs.xfs did exactly what I wanted without me having to tweak | _anything_." | [...] | | In that case, you are right. Single spindle SRDs go backwards in | performance pretty quickly once you go over 4 AGs... It seems to me that you haven't really done much looking for information; there's lots of relevant advice in the xfs mailing list archives... (and before you ask - SRD == Spinning Rust Disk) > > Configuring XFS filesystems for optimal performance has always been > > a black art because it requires you to understand your storage, your > > application workload(s) and XFS from the ground up. Most people > > can't even tick one of those boxes, let alone all three.... > > Well, I've got 2/3 of those down just fine, thanks. > But it's the "XFS" part that is still the "black art" part, > because so little is written about *how* it works > (as opposed to how it is laid out on disk). If you want to know exactly how it works, there's plenty of code to read. I know, you're going to call that a cop out, but I've got more important things to do than document 20,000 lines of allocation code just for you. In a world of infinite resources everything would be documented just the way you want, but we don't have infinite resources, so it remains documented by the code that implements it. However, if you want to go and understand it and document it all for us, then we'll happily take the patches. :) Cheers, Dave. -- Dave Chinner david@fromorbit.com
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 19:40 ` Stan Hoeppner 2011-01-27 20:11 ` david 2011-01-27 21:56 ` Mark Lord @ 2011-01-28 19:18 ` Martin Steigerwald 2 siblings, 0 replies; 34+ messages in thread From: Martin Steigerwald @ 2011-01-28 19:18 UTC (permalink / raw) To: xfs Cc: Linux Kernel, Christoph Hellwig, Justin Piszcz, Alex Elder, Mark Lord, Stan Hoeppner On Thursday 27 January 2011, Stan Hoeppner wrote: > Trust the defaults. If they give you problems (unlikely) then we can't > talk. ;) With one addition: Use a recent xfsprogs! ;) Earlier ones created more AGs, didn't activate the lazy superblock counter (likely no issue here) and whatnot... -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 16:03 ` Mark Lord 2011-01-27 19:40 ` Stan Hoeppner @ 2011-01-27 20:24 ` John Stoffel 1 sibling, 0 replies; 34+ messages in thread From: John Stoffel @ 2011-01-27 20:24 UTC (permalink / raw) To: Mark Lord Cc: Linux Kernel, xfs, Christoph Hellwig, Justin Piszcz, Alex Elder, Stan Hoeppner >>>>> "Mark" == Mark Lord <kernel@teksavvy.com> writes: Mark> On 11-01-27 10:40 AM, Justin Piszcz wrote: >> >> >> On Thu, 27 Jan 2011, Mark Lord wrote: Mark> .. >>> Can you recommend a good set of mkfs.xfs parameters to suit the characteristics >>> of this system? Eg. Only a few thousand active inodes, and nearly all files are >>> in the 600MB -> 20GB size range. The usage pattern it must handle is up to >>> six concurrent streaming writes at the same time as up to three streaming reads, >>> with no significant delays permitted on the reads. >>> >>> That's the kind of workload that I find XFS handles nicely, >>> and EXT4 has given me trouble with in the past. Mark> .. >> I did a load of benchmarks a long time ago testing every mkfs.xfs option there >> was, and I found that most of the time (if not all), the defaults were the best. Mark> .. Mark> I am concerned with fragmentation on the very special workload Mark> in this case. I'd really like the 20GB files, written over a Mark> 1-2 hour period, to consist of a very few very large extents, as Mark> much as possible. Mark> Rather than hundreds or thousands of "tiny" MB sized extents. I Mark> wonder what the best mkfs.xfs parameters might be to encourage Mark> that? Hmmm, should the application be pre-allocating the disk space then, so that the writes get into nice large extents automatically? Isn't this what the fallocate() system call is for? Doesn't MythTV use this? I don't use XFS, or MythTV, but I like keeping track of this stuff. 
John
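[Editor's note: the preallocation John suggests can be sketched with the POSIX preallocation call. This is a hypothetical illustration, not MythTV's actual code; the 64 MiB reservation size is an assumption. On XFS the reservation lets the allocator hand out large contiguous extents up front instead of growing the file piecemeal.]

```python
import os
import tempfile

size = 64 * 1024 * 1024              # hypothetical 64 MiB reservation

fd, path = tempfile.mkstemp()
try:
    # Reserve [0, size) on disk before any data is written; subsequent
    # writes into the reserved range land in already-allocated blocks.
    os.posix_fallocate(fd, 0, size)
    print(os.fstat(fd).st_size)      # 67108864: file extended to the reservation
finally:
    os.close(fd)
    os.unlink(path)
```

An application that cannot preallocate can get a similar effect from XFS's allocsize mount option, as discussed elsewhere in this thread.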
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 15:12 ` Mark Lord 2011-01-27 15:40 ` Justin Piszcz @ 2011-01-27 23:41 ` Dave Chinner 2011-01-28 0:59 ` Mark Lord 1 sibling, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-27 23:41 UTC (permalink / raw) To: Mark Lord; +Cc: Christoph Hellwig, xfs, Linux Kernel, Stan Hoeppner, Alex Elder On Thu, Jan 27, 2011 at 10:12:23AM -0500, Mark Lord wrote: > On 11-01-27 12:30 AM, Stan Hoeppner wrote: > > Mark Lord put forth on 1/26/2011 9:49 PM: > > > >> agcount=7453 > > > > That's probably a bit high Mark, and very possibly the cause of your problems. > > :) Unless the disk array backing this filesystem has something like 400-800 > > striped disk drives. You said it's a single 2TB drive right? > > > > The default agcount for a single drive filesystem is 4 allocation groups. For > > mdraid (of any number of disks/configuration) it's 16 allocation groups. > > > > Why/how did you end up with 7452 allocation groups? That can definitely cause > > some performance issues due to massively excessive head seeking, and possibly > > all manner of weirdness. > > This is great info, exactly the kind of feedback I was hoping for! > > The filesystem is about a year old now, and I probably used agsize=nnnnn > when creating it or something. > > So if this resulted in what you consider to be many MANY too MANY ags, > then I can imagine the first new file write wanting to go out and read > in all of the ag data to determine the "best fit" or something. > Which might explain some of the delay. > > Once I get the new 2TB drive, I'll re-run mkfs.xfs and then copy everything > over onto a fresh xfs filesystem. > > Can you recommend a good set of mkfs.xfs parameters to suit the characteristics > of this system? http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E And perhaps you want to consider the allocsize mount option, though that shouldn't be necessary for 2.6.38+... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 23:41 ` Dave Chinner @ 2011-01-28 0:59 ` Mark Lord 0 siblings, 0 replies; 34+ messages in thread From: Mark Lord @ 2011-01-28 0:59 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, xfs, Linux Kernel, Stan Hoeppner, Alex Elder On 11-01-27 06:41 PM, Dave Chinner wrote: > On Thu, Jan 27, 2011 at 10:12:23AM -0500, Mark Lord wrote: .. >> Can you recommend a good set of mkfs.xfs parameters to suit the characteristics >> of this system? > > http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E That entry says little beyond "blindly trust the defaults". But thanks anyway (really). > And perhaps you want to consider the allocsize mount option, though > that shouldn't be necessary for 2.6.38+... That's a good tip, thanks. From my earlier posting: > /dev/sdb1 on /var/lib/mythtv type xfs > (rw,noatime,allocsize=64M,logbufs=8,largeio) Maybe that allocsize value could be increased though. Perhaps something on the order of 256MB might do it. Thanks again!
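[Editor's note: a back-of-envelope way to compare those allocsize values. If concurrent streams force a file to be allocated one allocsize-sized chunk at a time (the worst case; real allocation is usually far better), the chunk size bounds the extent count of a finished recording. A Python sketch, assuming the 20 GB recordings discussed earlier in the thread.]

```python
file_mib = 20 * 1024               # a 20 GB recording, roughly

# Worst case: every speculative preallocation chunk becomes its own extent
for allocsize_mib in (64, 256):
    worst_case_extents = file_mib // allocsize_mib
    print(f"allocsize={allocsize_mib}M -> at most ~{worst_case_extents} extents")
    # 64M -> 320 extents, 256M -> 80 extents
```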
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 3:49 ` Mark Lord 2011-01-27 5:17 ` Stan Hoeppner 2011-01-27 15:12 ` Mark Lord @ 2011-01-27 23:39 ` Dave Chinner 2 siblings, 0 replies; 34+ messages in thread From: Dave Chinner @ 2011-01-27 23:39 UTC (permalink / raw) To: Mark Lord; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder On Wed, Jan 26, 2011 at 10:49:03PM -0500, Mark Lord wrote: > On 11-01-26 10:30 PM, Dave Chinner wrote: > > [Please cc xfs@oss.sgi.com on XFS bug reports. Added.] > > > > On Wed, Jan 26, 2011 at 08:22:25PM -0500, Mark Lord wrote: > >> Alex / Christoph, > >> > >> My mythtv box here uses XFS on a 2TB drive for storing recordings and videos. > >> It is behaving rather strangely though, and has gotten worse recently. > >> Here is what I see happening: > >> > >> The drive mounts fine at boot, but the very first attempt to write a new file > >> to the filesystem suffers from a very very long pause, 30-60 seconds, during which > >> time the disk activity light is fully "on". > > > > Please post the output of xfs_info <mtpt> so we can see what your > > filesystem configuration is. > > /dev/sdb1 on /var/lib/mythtv type xfs > (rw,noatime,allocsize=64M,logbufs=8,largeio) > > [~] xfs_info /var/lib/mythtv > meta-data=/dev/sdb1 isize=256 agcount=7453, agsize=65536 blks > = sectsz=512 attr=2 > data = bsize=4096 blocks=488378638, imaxpct=5 > = sunit=0 swidth=0 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal bsize=4096 blocks=32768, version=2 > = sectsz=512 sunit=0 blks, lazy-count=0 > realtime =none extsz=4096 blocks=0, rtextents=0 7453 AGs means that the first write could cause up to ~7500 disk reads to occur as the AGF headers are read in to find where the best free space extent for allocation lies. That'll be your problem. Cheers, Dave. 
-- Dave Chinner david@fromorbit.com
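[Editor's note: the figure Dave quotes can be reproduced from the xfs_info geometry above. A quick Python check; bsize=4096 in that output gives the filesystem block size.]

```python
import math

blocks = 488378638      # data blocks, from xfs_info
agsize = 65536          # blocks per allocation group, from xfs_info

# Number of AGs needed to cover the filesystem at that AG size
agcount = math.ceil(blocks / agsize)
print(agcount)                  # 7453, matching agcount in xfs_info
print(agsize * 4096 // 2**20)   # 256: each AG covers only 256 MiB
```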
[parent not found: <4D40CDCF.4010301@teksavvy.com>]
* Re: xfs: very slow after mount, very slow at umount [not found] ` <4D40CDCF.4010301@teksavvy.com> @ 2011-01-27 3:43 ` Dave Chinner 2011-01-27 3:53 ` Mark Lord 0 siblings, 1 reply; 34+ messages in thread From: Dave Chinner @ 2011-01-27 3:43 UTC (permalink / raw) To: Mark Lord; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder On Wed, Jan 26, 2011 at 08:43:43PM -0500, Mark Lord wrote: > On 11-01-26 08:22 PM, Mark Lord wrote: > > Alex / Christoph, > > > > My mythtv box here uses XFS on a 2TB drive for storing recordings and videos. > > It is behaving rather strangely though, and has gotten worse recently. > > Here is what I see happening: > > > > The drive mounts fine at boot, but the very first attempt to write a new file > > to the filesystem suffers from a very very long pause, 30-60 seconds, during which > > time the disk activity light is fully "on". > > > > This happens only on the first new file write after mounting. > >>From then on, the filesystem is fast and responsive as expected. > > If I umount the filesystem, and then mount it again, > > the exact same behaviour can be observed. > > > > This of course screws up mythtv, as it causes me to lose the first 30-60 > > seconds of the first recording it attempts after booting. So as a workaround > > I now have a startup script to create, sync, and delete a 64MB file before > > starting mythtv. This still takes 30-60 seconds, but it all happens and > > finishes before mythtv has a real-time need to write to the filesystem. > > > > The 2TB drive is fine -- zero errors, no events in the SMART logs, > > and I've disabled the silly WD head-unload logic on it. > > > > What's happening here? Why the big long burst of activity? > > I've only just noticed this behaviour in the past few weeks, > > running 2.6.35 and more recently 2.6.37. > > > > * * * > > > > The other issue is something I notice at umount time. > > I have a second big drive used as a backup device for the drive discussed above. 
> > I use "mirrordir" (similar to rsync) to clone directories/files from the main > > drive to the backup drive. After mirrordir finishes, I then "umount /backup". > > The umount promptly hangs, disk light on solid, for 30-60 seconds, then finishes. > > > > If I type "sync" just before doing the umount, sync takes about 1 second, > > and the umount finishes instantly. > > > > Huh? What's happening there? > > > > System is running 2.6.37 from kernel.org, but similar behaviour > > has been there under 2.6.35 and 2.6.34. Dunno about earlier. > > > > I can query any info you need from the filesystem. > > > Thinking about it some more: the first problem very much appears as if > it is due to a filesystem check happening on the already-mounted filesystem, > if that makes any kind of sense (?). Not to me. You can check this simply by looking at the output of top while the problem is occurring... > Because.. running xfs_check on the umounted drive takes about the same 30-60 > seconds, > with the disk activity light fully "on". Well, yeah - XFS check reads all the metadata in the filesystem, so of course it's going to thrash your disk when it is run. The fact it takes the same length of time as whatever problem you are having is likely to be coincidental. > The other thought that came to mind: this behaviour has only been > noticed recently, probably because I have recently added about > 1000 new files (hundreds of MB each) to the videos/ directory on > that filesystem. Whereas before, it had fewer than 500 (multi-GB) > files in total. > > So if it really is doing some kind of internal filesystem check, > then the time required has only recently become 3X larger than > before.. so the behaviour may not be new/recent, but now is very > noticeable. Where does that 3x figure come from? Have you measured it? If so, what are the numbers? Cheers, Dave. 
-- Dave Chinner david@fromorbit.com
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 3:43 ` Dave Chinner @ 2011-01-27 3:53 ` Mark Lord 2011-01-27 4:54 ` Mark Lord 2011-01-27 23:34 ` Dave Chinner 0 siblings, 2 replies; 34+ messages in thread From: Mark Lord @ 2011-01-27 3:53 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder On 11-01-26 10:43 PM, Dave Chinner wrote: > On Wed, Jan 26, 2011 at 08:43:43PM -0500, Mark Lord wrote: >> On 11-01-26 08:22 PM, Mark Lord wrote: .. >> Thinking about it some more: the first problem very much appears as if >> it is due to a filesystem check happening on the already-mounted filesystem, >> if that makes any kind of sense (?). > > Not to me. You can check this simply by looking at the output of > top while the problem is occurring... Top doesn't show anything interesting, since disk I/O uses practically zero CPU. >> running xfs_check on the umounted drive takes about the same 30-60 seconds, >> with the disk activity light fully "on". > > Well, yeah - XFS check reads all the metadata in the filesystem, so > of course it's going to thrash your disk when it is run. The fact it > takes the same length of time as whatever problem you are having is > likely to be coincidental. I find it interesting that the mount takes zero-time, as if it never actually reads much from the filesystem. Something has to eventually read the metadata etc. >> The other thought that came to mind: this behaviour has only been >> noticed recently, probably because I have recently added about >> 1000 new files (hundreds of MB each) to the videos/ directory on >> that filesystem. Whereas before, it had fewer than 500 (multi-GB) >> files in total. >> >> So if it really is doing some kind of internal filesystem check, >> then the time required has only recently become 3X larger than >> before.. so the behaviour may not be new/recent, but now is very >> noticeable. > > Where does that 3x figure come from? 
Well, it used to have about 500 files/subdirs on it, and now it has somewhat over 1500 files/subdirs. That's a ballpark estimate of 3X the amount of metadata. All of these files are at least large (hundreds of MB), and a lot are huge (many GB) in size. Cheers
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 3:53 ` Mark Lord @ 2011-01-27 4:54 ` Mark Lord 2011-01-27 23:34 ` Dave Chinner 1 sibling, 0 replies; 34+ messages in thread From: Mark Lord @ 2011-01-27 4:54 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder On 11-01-26 10:53 PM, Mark Lord wrote: > On 11-01-26 10:43 PM, Dave Chinner wrote: >> On Wed, Jan 26, 2011 at 08:43:43PM -0500, Mark Lord wrote: >>> On 11-01-26 08:22 PM, Mark Lord wrote: > .. >>> Thinking about it some more: the first problem very much appears as if >>> it is due to a filesystem check happening on the already-mounted filesystem, >>> if that makes any kind of sense (?). >> >> Not to me. You can check this simply by looking at the output of >> top while the problem is occurring... > > Top doesn't show anything interesting, since disk I/O uses practically zero CPU. > >>> running xfs_check on the umounted drive takes about the same 30-60 seconds, >>> with the disk activity light fully "on". >> >> Well, yeah - XFS check reads all the metadata in the filesystem, so >> of course it's going to thrash your disk when it is run. The fact it >> takes the same length of time as whatever problem you are having is >> likely to be coincidental. > > I find it interesting that the mount takes zero-time, > as if it never actually reads much from the filesystem. > Something has to eventually read the metadata etc. > >>> The other thought that came to mind: this behaviour has only been >>> noticed recently, probably because I have recently added about >>> 1000 new files (hundreds of MB each) to the videos/ directory on >>> that filesystem. Whereas before, it had fewer than 500 (multi-GB) >>> files in total. >>> >>> So if it really is doing some kind of internal filesystem check, >>> then the time required has only recently become 3X larger than >>> before.. so the behaviour may not be new/recent, but now is very >>> noticeable. 
>> > > Where does that 3x figure come from? > > Well, it used to have about 500 files/subdirs on it, > and now it has somewhat over 1500 files/subdirs. > That's a ballpark estimate of 3X the amount of meta data. > > All of these files are at least large (hundreds of MB), > and a lot are huge (many GB) in size. I've rebuilt the kernel with the various config options to enable blktrace and XFS_DEBUG, but in the meanwhile we have also watched and deleted a few GB of recordings. The result is that the mysterious first-write delay has vanished, for now, so there's nothing to trace. I think I'll pick up an extra 2TB drive, so that next time it surfaces I can simply bit-clone the filesystem or something, to preserve the buggered state for further examination. The second issue is probably still there, and I'll blktrace that instead. But it will have to wait a spell -- I've run out of time here right now. Cheers
* Re: xfs: very slow after mount, very slow at umount 2011-01-27 3:53 ` Mark Lord 2011-01-27 4:54 ` Mark Lord @ 2011-01-27 23:34 ` Dave Chinner 1 sibling, 0 replies; 34+ messages in thread From: Dave Chinner @ 2011-01-27 23:34 UTC (permalink / raw) To: Mark Lord; +Cc: Christoph Hellwig, xfs, Linux Kernel, Alex Elder On Wed, Jan 26, 2011 at 10:53:17PM -0500, Mark Lord wrote: > On 11-01-26 10:43 PM, Dave Chinner wrote: > > On Wed, Jan 26, 2011 at 08:43:43PM -0500, Mark Lord wrote: > >> On 11-01-26 08:22 PM, Mark Lord wrote: > .. > >> Thinking about it some more: the first problem very much appears as if > >> it is due to a filesystem check happening on the already-mounted filesystem, > >> if that makes any kind of sense (?). > > > > Not to me. You can check this simply by looking at the output of > > top while the problem is occurring... > > Top doesn't show anything interesting, since disk I/O uses practically zero CPU. My point is that xfs_check doesn't use zero cpu or memory - it uses quite a lot of both, so if it is not present in top output while the disk is being thrashed, it ain't running... > > >> running xfs_check on the umounted drive takes about the same 30-60 seconds, > >> with the disk activity light fully "on". > > > > Well, yeah - XFS check reads all the metadata in the filesystem, so > > of course it's going to thrash your disk when it is run. The fact it > > takes the same length of time as whatever problem you are having is > > likely to be coincidental. > > I find it interesting that the mount takes zero-time, > as if it never actually reads much from the filesystem. > Something has to eventually read the metadata etc. Sure, for a clean log it has basically nothing to do - a few disk reads to read the superblock, find the head/tail of the log, and little else needs doing. Only when log recovery needs to be done does mount do any significant IO. Cheers, Dave. 
-- Dave Chinner david@fromorbit.com
Thread overview: 34+ messages
[not found] <4D40C8D1.8090202@teksavvy.com>
2011-01-27 3:30 ` xfs: very slow after mount, very slow at umount Dave Chinner
2011-01-27 3:49 ` Mark Lord
2011-01-27 5:17 ` Stan Hoeppner
2011-01-27 15:12 ` Mark Lord
2011-01-27 15:40 ` Justin Piszcz
2011-01-27 16:03 ` Mark Lord
2011-01-27 19:40 ` Stan Hoeppner
2011-01-27 20:11 ` david
2011-01-27 23:53 ` Stan Hoeppner
2011-01-28 2:09 ` david
2011-01-28 13:56 ` Dave Chinner
2011-01-28 19:26 ` david
2011-01-29 5:40 ` Dave Chinner
2011-01-29 6:08 ` david
2011-01-29 7:35 ` Dave Chinner
2011-01-31 19:17 ` Christoph Hellwig
2011-01-27 21:56 ` Mark Lord
2011-01-28 0:17 ` Dave Chinner
2011-01-28 1:22 ` Mark Lord
2011-01-28 1:36 ` Mark Lord
2011-01-28 4:14 ` David Rees
2011-01-28 14:22 ` Mark Lord
2011-01-28 7:31 ` Dave Chinner
2011-01-28 14:33 ` Mark Lord
2011-01-28 23:58 ` Dave Chinner
2011-01-28 19:18 ` Martin Steigerwald
2011-01-27 20:24 ` John Stoffel
2011-01-27 23:41 ` Dave Chinner
2011-01-28 0:59 ` Mark Lord
2011-01-27 23:39 ` Dave Chinner
[not found] ` <4D40CDCF.4010301@teksavvy.com>
2011-01-27 3:43 ` Dave Chinner
2011-01-27 3:53 ` Mark Lord
2011-01-27 4:54 ` Mark Lord
2011-01-27 23:34 ` Dave Chinner