* automatically running fstrim
From: Phil Karn @ 2011-05-24 16:53 UTC
To: xfs

Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
run fstrim out of my crontab? It doesn't seem to slow down my system
significantly while it runs.

As I understand fstrim, it walks through the file system free list
issuing TRIMs for each entry, and except for whatever load the TRIM
commands themselves generate (which is drive dependent) it shouldn't
interfere that much with system operation. Correct? Is there any
mechanism to issue these commands at a lower priority than regular disk I/O?

Thanks for all the work you guys do on XFS. It is much appreciated.

Phil

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: automatically running fstrim
From: Lukas Czerner @ 2011-05-25 10:06 UTC
To: Phil Karn; +Cc: xfs

On Tue, 24 May 2011, Phil Karn wrote:

> Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
> run fstrim out of my crontab? It doesn't seem to slow down my system
> significantly while it runs.
>
> As I understand fstrim, it walks through the file system free list
> issuing TRIMs for each entry, and except for whatever load the TRIM
> commands themselves generate (which is drive dependent) it shouldn't
> interfere that much with system operation. Correct? Is there any
> mechanism to issue these commands at a lower priority than regular disk I/O?

No, not that I know of. But why not run fstrim from cron, let's say, every
day? Note that you do not necessarily need to run it "all the time",
because if the drive firmware has a lot of space for doing
wear-leveling, there is no point in sending TRIM.

Also keep in mind that a lot of newer SSDs have some "hidden" space just
for wear-leveling, so getting to the point where the firmware has a hard
time doing it and the drive actually gets slower takes even more writes
than just filling your drive up to max.

So doing fstrim once or twice a day (it really depends on your
workload) is more than enough.

Also, since we have all this in place, we might talk to distributions
about adding the infrastructure to actually recognise "discard enabled"
devices and add fstrim to a cron job automatically.

Or, since the filesystem should know best when the "right" time to do
this is, we might try to figure out some kernel logic to trigger it.
However, it might be a little bit tricky, since every drive behaves
differently...

Thanks!
-Lukas

>
> Thanks for all the work you guys do on XFS. It is much appreciated.
>
> Phil

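The idea of recognising "discard enabled" devices before scheduling fstrim can be sketched roughly as follows. This is only an illustration, not anything a distribution ships: it assumes the block layer exposes a `discard_max_bytes` attribute under `/sys/block/<device>/queue/` and treats a nonzero value as "discard supported". The function names, the device name `sda` and the mount point `/` are hypothetical; `-v` is fstrim's verbose flag.

```python
from pathlib import Path


def supports_discard(device, sysfs_root="/sys/block"):
    """Return True if sysfs reports a nonzero discard_max_bytes for `device`.

    Assumption: a value of 0 (or a missing attribute) means the device
    does not accept discard requests.
    """
    attr = Path(sysfs_root) / device / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip()) > 0
    except (OSError, ValueError):
        return False


def fstrim_command(mountpoint, verbose=True):
    """Build the fstrim argument list for one mounted filesystem."""
    cmd = ["fstrim"]
    if verbose:
        cmd.append("-v")  # ask fstrim to report how many bytes were trimmed
    cmd.append(mountpoint)
    return cmd


def plan_trim(device, mountpoint, sysfs_root="/sys/block"):
    """Return the command to run, or None if the device lacks discard support."""
    if supports_discard(device, sysfs_root):
        return fstrim_command(mountpoint)
    return None
```

A small wrapper called from a crontab entry could pass the result of `plan_trim("sda", "/")` to `subprocess.run` once or twice a day, as suggested above; how often is worth tuning to the workload.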
* Re: automatically running fstrim
From: Phil Karn @ 2011-05-25 11:20 UTC
To: Lukas Czerner; +Cc: xfs

Thanks. My problem is that I've been running some workloads that can gobble
up the SSD erased page pool rather quickly. It's a Perl script feeding a
large number of email messages to procmail, one at a time. I think this
creates and deletes a lot of temporary files. While XFS delayed allocation
normally keeps such files from going to disk, I think procmail defeats this
with fsync() to keep mail from ever being lost.

So I've simply been running fstrim by hand a lot so I don't have a repeat of
the system lockup I had a few days ago that I am pretty sure was due to my
OCZ Revo drive not handling garbage collection very gracefully.

Phil

* Re: automatically running fstrim
From: Lukas Czerner @ 2011-05-25 11:47 UTC
To: karn; +Cc: Lukas Czerner, xfs

On Wed, 25 May 2011, Phil Karn wrote:

> Thanks. My problem is that I've been running some workloads that can gobble
> up the SSD erased page pool rather quickly. It's a Perl script feeding a
> large number of email messages to procmail, one at a time. I think this
> creates and deletes a lot of temporary files. While XFS delayed allocation
> normally keeps such files from going to disk, I think procmail defeats this
> with fsync() to keep mail from ever being lost.
>
> So I've simply been running fstrim by hand a lot so I don't have a repeat of
> the system lockup I had a few days ago that I am pretty sure was due to my
> OCZ Revo drive not handling garbage collection very gracefully.
>
> Phil
>

Interesting, a system lockup, really? I have never seen problems like this
and I have been doing a lot of SSD testing. It looks like the drive is
really crappy :). Have you tried looking for a firmware update?

Anyway, if running fstrim more often solves the problem, it is fine. But
I wonder if the other approach (online discard) would do better in
this case (it might not, since the files are really small and are
unlinked often). Unfortunately xfs does not have this support yet, but
other filesystems do (ext4, btrfs, ...), so if you like you might try one
of those and mount it with the -o discard mount option. What it does is
send a TRIM for every range of freed filesystem blocks.

Generally, in its current state it comes with quite a big performance
loss (that's why we have fstrim), but in your case it might actually be
more convenient than running fstrim all the time. Also, it is handled
automatically and the only thing needed is to pass the "-o discard"
mount option.

Thanks!
-Lukas

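To check whether a filesystem is actually mounted with the discard option described above, one could inspect the options column of `/proc/mounts`. The following is a minimal sketch under that assumption; the function names and the sample mount lines are made up for illustration.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list for `mountpoint`, or None if it is not mounted.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    The last matching line wins, since a later mount hides an earlier one.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3].split(",")
    return found


def has_online_discard(mounts_text, mountpoint):
    """True if the filesystem at `mountpoint` was mounted with -o discard."""
    options = mount_options(mounts_text, mountpoint)
    return options is not None and "discard" in options


# Hypothetical sample; on a real system read the text from /proc/mounts.
SAMPLE_MOUNTS = (
    "/dev/sda1 / xfs rw,relatime,attr2,noquota 0 0\n"
    "/dev/sdb1 /mail ext4 rw,relatime,discard 0 0\n"
)
```

With the sample above, `has_online_discard(SAMPLE_MOUNTS, "/mail")` is true and the XFS root is not, which matches the situation in this thread where only the other filesystems offer the option.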
* Re: automatically running fstrim
From: Phil Karn @ 2011-05-25 22:36 UTC
To: Lukas Czerner; +Cc: xfs

On 5/25/11 4:47 AM, Lukas Czerner wrote:

> unlinked often). Unfortunately xfs does not have this support yet, but
> other filesystems do (ext4, btrfs, ...), so if you like you might try one
> of those and mount it with the -o discard mount option. What it does is
> send a TRIM for every range of freed filesystem blocks.
>
> Generally, in its current state it comes with quite a big performance
> loss (that's why we have fstrim), but in your case it might actually be
> more convenient than running fstrim all the time. Also, it is handled
> automatically and the only thing needed is to pass the "-o discard"
> mount option.

I have thought of using ext4 with the discard option on that device for
just this reason. But this OCZ Revo SSD seems to execute TRIM rather
slowly. I just timed it at 7 minutes 38 seconds to trim 46 GB of free
space on a 90 GB SSD. I wouldn't want that to occur in the foreground
while I'm running a program that's generating a lot of garbage blocks.

Intel drives, at least, seem to execute TRIM much faster; I think they
can take more blocks in each operation, and I conjecture that the drive
controller simply adds them to a "to do" list for later erasure in the
background. So there should probably be an option for "real-time" TRIM
on those SSDs that can do it without a performance penalty.

It would be nice if the FITRIM ioctl were to issue TRIM commands only
for newly created garbage blocks that haven't already been trimmed. But
I guess that would require some major changes to the file system data
structures. At the least, it would require some special record-keeping
to keep track of this information. The Intel drive shows it's possible
to implement a very speedy TRIM, so maybe it won't be such a bad thing
to just trim the whole free list every time.

Phil

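The "special record-keeping" mentioned above could look something like the toy model below: remember which free blocks have not yet been reported to the drive, and only discard those. This is purely a sketch of the idea, not how XFS, ext4 or the FITRIM ioctl work; the class and method names are hypothetical and blocks are modelled as plain integers.

```python
class TrimTracker:
    """Toy bookkeeping for 'trim only what was freed since the last trim'."""

    def __init__(self):
        # Free blocks that have not been reported to the drive yet.
        self.untrimmed = set()

    def free(self, start, length):
        """Record that blocks [start, start + length) were just freed."""
        self.untrimmed.update(range(start, start + length))

    def allocate(self, start, length):
        """Blocks that are reused no longer need a discard."""
        self.untrimmed.difference_update(range(start, start + length))

    def trim(self):
        """Return contiguous (start, length) ranges to discard, then forget them."""
        ranges = []
        for block in sorted(self.untrimmed):
            if ranges and ranges[-1][0] + ranges[-1][1] == block:
                # Block extends the previous range by one.
                ranges[-1] = (ranges[-1][0], ranges[-1][1] + 1)
            else:
                ranges.append((block, 1))
        self.untrimmed.clear()
        return ranges
```

A second call to `trim()` with no intervening `free()` returns an empty list, which is the behaviour asked for: already-trimmed free space is not sent to the drive again. A real filesystem would also have to persist or rebuild this state across mounts, which this sketch ignores.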
* Re: automatically running fstrim
From: Lukas Czerner @ 2011-05-26 7:56 UTC
To: Phil Karn; +Cc: Lukas Czerner, xfs

On Wed, 25 May 2011, Phil Karn wrote:

> On 5/25/11 4:47 AM, Lukas Czerner wrote:
>
> > unlinked often). Unfortunately xfs does not have this support yet, but
> > other filesystems do (ext4, btrfs, ...), so if you like you might try one
> > of those and mount it with the -o discard mount option. What it does is
> > send a TRIM for every range of freed filesystem blocks.
> >
> > Generally, in its current state it comes with quite a big performance
> > loss (that's why we have fstrim), but in your case it might actually be
> > more convenient than running fstrim all the time. Also, it is handled
> > automatically and the only thing needed is to pass the "-o discard"
> > mount option.
>
> I have thought of using ext4 with the discard option on that device for
> just this reason. But this OCZ Revo SSD seems to execute TRIM rather
> slowly. I just timed it at 7 minutes 38 seconds to trim 46 GB of free
> space on a 90 GB SSD. I wouldn't want that to occur in the foreground
> while I'm running a program that's generating a lot of garbage blocks.
>
> Intel drives, at least, seem to execute TRIM much faster; I think they
> can take more blocks in each operation, and I conjecture that the drive
> controller simply adds them to a "to do" list for later erasure in the
> background. So there should probably be an option for "real-time" TRIM
> on those SSDs that can do it without a performance penalty.

Well, this is a bit tricky. I have had a chance to test a drive like this
and I realized that the drive seemed to perform slower as more and more
TRIMs were sent to it. It did eventually recover; however, it took about
half a minute to get performance back. Well, it is still a fairly young
technology.

If you want to see some of my results, look here:

http://people.redhat.com/lczerner/discard/

There is also a tool available to do the testing.

>
> It would be nice if the FITRIM ioctl were to issue TRIM commands only
> for newly created garbage blocks that haven't already been trimmed. But
> I guess that would require some major changes to the file system data
> structures. At the least, it would require some special record-keeping
> to keep track of this information.

There are some patches for ext4 to do something like this; however, they
are still not finished.

> The Intel drive shows it's possible
> to implement a very speedy TRIM, so maybe it won't be such a bad thing
> to just trim the whole free list every time.
>
> Phil

Thanks!
-Lukas

* Re: automatically running fstrim
From: Dave Chinner @ 2011-05-26 9:11 UTC
To: Lukas Czerner; +Cc: Phil Karn, xfs

On Wed, May 25, 2011 at 12:06:32PM +0200, Lukas Czerner wrote:
> On Tue, 24 May 2011, Phil Karn wrote:
>
> > Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
> > run fstrim out of my crontab? It doesn't seem to slow down my system
> > significantly while it runs.
> >
> > As I understand fstrim, it walks through the file system free list
> > issuing TRIMs for each entry, and except for whatever load the TRIM
> > commands themselves generate (which is drive dependent) it shouldn't
> > interfere that much with system operation. Correct? Is there any
> > mechanism to issue these commands at a lower priority than regular disk I/O?
>
> No, not that I know of. But why not run fstrim from cron, let's say, every
> day? Note that you do not necessarily need to run it "all the time",
> because if the drive firmware has a lot of space for doing
> wear-leveling, there is no point in sending TRIM.
>
> Also keep in mind that a lot of newer SSDs have some "hidden" space just
> for wear-leveling, so getting to the point where the firmware has a hard
> time doing it and the drive actually gets slower takes even more writes
> than just filling your drive up to max.
>
> So doing fstrim once or twice a day (it really depends on your
> workload) is more than enough.
>
> Also, since we have all this in place, we might talk to distributions
> about adding the infrastructure to actually recognise "discard enabled"
> devices and add fstrim to a cron job automatically.

History suggests regularly scheduled preventative maintenance like
this can have unintended consequences that don't show up for some
time.

When XFS first got its online defrag tool (xfs_fsr) back on Irix in
the late 90s, it was considered a good idea to run it once a week to
quickly detect and fix fragmentation problems before they got out of
hand.

That seems like a good idea, but then 6-12 months later people
started reporting XFS filesystems with really severe fragmentation,
worse than before xfs_fsr was being run regularly. The majority of
the files that had been in the filesystem for some time were not
fragmented, but any new file would be badly fragmented and could
not be fixed.

It was then discovered that the act of defragmenting files caused
the fragmentation of free space. That is, for every file with 2
extents that was defragmented into 1 extent, we now have two
freespace extents instead of 1. So, the more files you defragment,
the more free space fragments you create. If you don't delete files
regularly, then eventually you run out of large free space extents.
Then you can't defragment files any more, nor can you create
unfragmented files.

So, xfs_fsr was then removed from the system weekly cron job, and
filesystems that suffered from this went through a dump-mkfs-restore
process to defragment them. From that time, xfs_fsr has been
recommended as a "run only when fragmentation is causing perf
problems" type of tool...

The moral of this story is that running trim as a preventative
maintenance tool could have the same sort of unintended long-term
consequences. That is, it may look like a good idea to run it often
to keep things clean and neat, but we just don't know what it is
doing to the underlying device's algorithms and it may take months
for such problems to show up, e.g. as a device whose performance
cannot be restored except via a secure erase....

> Or, since the filesystem should know best when the "right" time to do
> this is, we might try to figure out some kernel logic to trigger it.
> However, it might be a little bit tricky, since every drive behaves
> differently...

And that makes it much more likely that it will cause some kind of
unintended problem.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

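The free-space fragmentation effect described above can be reproduced in a toy model. The sketch below is only an illustration and has nothing to do with how xfs_fsr or the XFS allocator are implemented: the disk is a list of blocks, each holding an owner name or `None` for free, and "defragmenting" a file means copying it into the first free extent that is large enough and releasing the old blocks. All names are hypothetical.

```python
def free_extents(disk):
    """Return (start, length) for every maximal run of free (None) blocks."""
    extents = []
    start = None
    for i, owner in enumerate(disk):
        if owner is None and start is None:
            start = i
        elif owner is not None and start is not None:
            extents.append((start, i - start))
            start = None
    if start is not None:
        extents.append((start, len(disk) - start))
    return extents


def defragment(disk, name):
    """Move file `name` into one contiguous free extent, freeing its old blocks.

    Returns the new layout, or None if no free extent is large enough.
    """
    size = sum(1 for owner in disk if owner == name)
    for start, length in free_extents(disk):
        if length >= size:
            new_disk = [None if owner == name else owner for owner in disk]
            new_disk[start:start + size] = [name] * size
            return new_disk
    return None


# File A has two extents (blocks 0-1 and 4-5) separated by file X.
before = ["A", "A", "X", "X", "A", "A", None, None, None, None]
after = defragment(before, "A")
```

Here `free_extents(before)` is a single extent of four blocks, while `free_extents(after)` is two extents of two blocks each: file A is now contiguous, but the free space is not. Trying to defragment another four-block file in the `after` layout would fail, which is the end state described in the message.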
* Re: automatically running fstrim
From: Lukas Czerner @ 2011-05-26 9:57 UTC
To: Dave Chinner; +Cc: Lukas Czerner, Phil Karn, xfs

On Thu, 26 May 2011, Dave Chinner wrote:

> On Wed, May 25, 2011 at 12:06:32PM +0200, Lukas Czerner wrote:
> > On Tue, 24 May 2011, Phil Karn wrote:
> >
> > > Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
> > > run fstrim out of my crontab? It doesn't seem to slow down my system
> > > significantly while it runs.
> > >
> > > As I understand fstrim, it walks through the file system free list
> > > issuing TRIMs for each entry, and except for whatever load the TRIM
> > > commands themselves generate (which is drive dependent) it shouldn't
> > > interfere that much with system operation. Correct? Is there any
> > > mechanism to issue these commands at a lower priority than regular disk I/O?
> >
> > No, not that I know of. But why not run fstrim from cron, let's say, every
> > day? Note that you do not necessarily need to run it "all the time",
> > because if the drive firmware has a lot of space for doing
> > wear-leveling, there is no point in sending TRIM.
> >
> > Also keep in mind that a lot of newer SSDs have some "hidden" space just
> > for wear-leveling, so getting to the point where the firmware has a hard
> > time doing it and the drive actually gets slower takes even more writes
> > than just filling your drive up to max.
> >
> > So doing fstrim once or twice a day (it really depends on your
> > workload) is more than enough.
> >
> > Also, since we have all this in place, we might talk to distributions
> > about adding the infrastructure to actually recognise "discard enabled"
> > devices and add fstrim to a cron job automatically.
>
> History suggests regularly scheduled preventative maintenance like
> this can have unintended consequences that don't show up for some
> time.
>
> When XFS first got its online defrag tool (xfs_fsr) back on Irix in
> the late 90s, it was considered a good idea to run it once a week to
> quickly detect and fix fragmentation problems before they got out of
> hand.
>
> That seems like a good idea, but then 6-12 months later people
> started reporting XFS filesystems with really severe fragmentation,
> worse than before xfs_fsr was being run regularly. The majority of
> the files that had been in the filesystem for some time were not
> fragmented, but any new file would be badly fragmented and could
> not be fixed.
>
> It was then discovered that the act of defragmenting files caused
> the fragmentation of free space. That is, for every file with 2
> extents that was defragmented into 1 extent, we now have two
> freespace extents instead of 1. So, the more files you defragment,
> the more free space fragments you create. If you don't delete files
> regularly, then eventually you run out of large free space extents.
> Then you can't defragment files any more, nor can you create
> unfragmented files.
>
> So, xfs_fsr was then removed from the system weekly cron job, and
> filesystems that suffered from this went through a dump-mkfs-restore
> process to defragment them. From that time, xfs_fsr has been
> recommended as a "run only when fragmentation is causing perf
> problems" type of tool...
>
> The moral of this story is that running trim as a preventative
> maintenance tool could have the same sort of unintended long-term
> consequences. That is, it may look like a good idea to run it often
> to keep things clean and neat, but we just don't know what it is
> doing to the underlying device's algorithms and it may take months
> for such problems to show up, e.g. as a device whose performance
> cannot be restored except via a secure erase....

Hi Dave,

An interesting story, really; so what you got from this experience is a
"lesson learned". I would not be too keen on avoiding this next logical
step, because otherwise we'll never learn the lesson; things might still
be wrong, but quiet enough that no one notices. It is the same as
enabling virtually any feature: unless you enable it by default, it gets
very little testing and you'll never find out whether there is anything
deeply wrong with it.

But I agree that we have to be careful with enabling something to do its
job periodically. So now (I hope) people will use it, possibly create
their own cron jobs, and if there is any problem, we'll notice. And after
six months or so, when a new Fedora comes out (hypothetically with the
mentioned infrastructure), it should be relatively safe.

But still, this is something to discuss.

>
> > Or, since the filesystem should know best when the "right" time to do
> > this is, we might try to figure out some kernel logic to trigger it.
> > However, it might be a little bit tricky, since every drive behaves
> > differently...
>
> And that makes it much more likely that it will cause some kind of
> unintended problem.

I agree, that's why I like the first approach better.

>
> Cheers,
>
> Dave.

Thanks!
-Lukas

end of thread

Thread overview: 8+ messages
2011-05-24 16:53 automatically running fstrim Phil Karn
2011-05-25 10:06 ` Lukas Czerner
2011-05-25 11:20   ` Phil Karn
2011-05-25 11:47     ` Lukas Czerner
2011-05-25 22:36       ` Phil Karn
2011-05-26  7:56         ` Lukas Czerner
2011-05-26  9:11   ` Dave Chinner
2011-05-26  9:57     ` Lukas Czerner