* Options for SSD - autodefrag etc? @ 2014-01-23 22:23 KC 2014-01-24 6:54 ` Duncan 2014-01-24 20:14 ` Kai Krakow 0 siblings, 2 replies; 15+ messages in thread From: KC @ 2014-01-23 22:23 UTC (permalink / raw) To: linux-btrfs I was wondering whether to use options like "autodefrag" and "inode_cache" on SSDs. On one hand, one always hears that defragmentation of an SSD is a no-no; does that apply to BTRFS's autodefrag? Also, just recently, I heard something similar about "inode_cache". On the other hand, the Arch BTRFS wiki recommends using both options on SSDs: http://wiki.archlinux.org/index.php/Btrfs#Mount_options So to clear things up, I ask at the source, where people should know best. Does using those options on SSDs give any benefit, and does it cause a non-negligible increase in SSD wear? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-23 22:23 Options for SSD - autodefrag etc? KC @ 2014-01-24 6:54 ` Duncan 2014-01-25 12:54 ` Martin Steigerwald 2014-01-24 20:14 ` Kai Krakow 1 sibling, 1 reply; 15+ messages in thread From: Duncan @ 2014-01-24 6:54 UTC (permalink / raw) To: linux-btrfs KC posted on Thu, 23 Jan 2014 23:23:35 +0100 as excerpted: > I was wondering about whether using options like "autodefrag" and > "inode_cache" on SSDs. > > On one hand, one always hears that defragmentation of SSD is a no-no, > does that apply to BTRFS's autodefrag? > Also, just recently, I heard something similar about "inode_cache". > > On the other hand, Arch BTRFS wiki recommends to use both options on > SSDs http://wiki.archlinux.org/index.php/Btrfs#Mount_options > > So to clear things up, I ask at the source where people should know > best. > > Does using those options on SSDs gives any benefits and causes > non-negligible increase in SSD wear? inode_cache is not recommended for general use, tho it can make sense for use-cases such as busy maildir based email servers where there's a lot of small files being constantly written and erased. Additionally, since btrfs is not yet fully stable (tho with kernel 3.13 the kconfig warning for btrfs was officially decreased in severity), my thought is if it's disabled, that's one less feature I have to worry about bugs in, for my filesystems. =:^) So don't enable inode_cache unless you know you need it. autodefrag is an interesting one, and I asked about it too when I was setting up my ssd-backed btrfs filesystems, so good question! =:^) Yes, autodefrag does use up somewhat limited on SSD write cycles, and yes, there's no seek time to worry about on SSDs so fragmentation doesn't hurt as badly as it does on spinning rust. There's still some cost to fragmentation, however -- each file fragment is an IOPS count on access, and while modern SSDs are rated pretty high IOPS, copy-on-write (COW) based filesystems like btrfs can heavily fragment "internally rewritten" (as opposed to written once and never changed, or new data always appended at the end like a log file or streaming media recording) files. We've seen worst-cases internal- rewritten files such as multi-gig VMs reported here, with 100K extents or more! That *WILL* eat up IOPS, even on SSDs, and there's other serious issues with that heavily fragmented a file as well, not least the additional chance of damage to it given all the work btrfs has to do tracking all those extents! But for that large a file, autodefrag isn't really the best option. See a couple paragraphs down for a better one for such large files. There are several COW-triggered fragmentation worst-cases. Perhaps the most common one on a typical desktop is small database files such as the sqlite files used for firefox history, cookies, etc, and this is where the autodefrag mount option really shines and what it was designed for. Larger internal-write files (say half a gig or bigger), particularly highly active ones where file updates may come fast enough rewriting the whole file slows things down, like big active database files, pre- allocated bittorrent download files, or multi-gig VM images, are a rather different problem, and autodefrag doesn't work as well with them. For these, the NOCOW file attribute (set with chattr +C, see the chattr manpage), which with btrfs must be set before data is written into the file, works rather better. 
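As a minimal sketch of that constraint (the path is only an example; the point is that the file must still be empty when the attribute is applied):

touch empty-image.raw          # create the file first, with no data in it yet
chattr +C empty-image.raw      # mark it NOCOW while it is still empty
lsattr empty-image.raw         # the C flag should now show up

Setting +C on a file that already contains data appears to succeed, but the existing extents stay COW, which is why the directory approach described next is usually the more practical route.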
The easiest way to set the attribute before the file is written into is to set it on the containing directory so new files created in it inherit the attribute automatically. So setup your database, VMs, or torrent client to use the same dir for everything, then set +C/NOCOW on that dir before the files are downloaded/created/copied- into-it/whatever. That way, rewrites happen in-place instead of creating a new extent every time some bit of the file changes. Of course another alternative is to use an entirely separate filesystem for your big internal-write files, either something like ext4 that's not COW-based, or btrfs with the NODATACOW mount option set (tho you'd definitely not want to use that for a general purpose btrfs). But back to autodefrag. It's also worth noting that actually doing the install with this option enabled can make a difference too, as apparently a number of popular distro installers trigger fragmentation during their work, leaving even brand new installations heavily fragmented if the install is to btrfs mounted without autodefrag. One more note on fragmentation. filefrag doesn't yet understand btrfs compression, and reports each compression block (128 KiB IIRC) as a separate extent. So if you use compression (I use compress=lzo, here), don't be surprised to see larger files reported as several hundred extents, perhaps a few thousand on gigabyte sized files. If you're worried about it, (manually, btrfs fi defrag) defrag the file and see if the number of reported extents goes down significantly. If it does, the file was fragmented and defragmenting helped. If not, defragmenting didn't help. Anyway, yes, I turned autodefrag on for my SSDs, here, but there are arguments to be made in either direction, so I can understand people choosing not to do that. One not-btrfs specific mount option that's very useful for btrfs, particularly if you're using btrfs snapshotting features, SSD or not, is noatime. While admins have been disabling atime updates for years to get better performance and that's recommended in general unless you run mutt (with other than mbox files) or something else that requires it, given that the exclusive size of a snapshot is the size of the filesystem changes written between it and the previous snapshot, with atime updates on and not a lot of other writes, those atime updates can be a big part of the exclusive size of that snapshot! So disabling them means smaller and more efficient snapshots, particularly if there isn't that much other write activity going on either. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 15+ messages in thread
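Pulling the mount options discussed in this message together, a purely hypothetical /etc/fstab line for an SSD-backed btrfs volume might look like the following (device and mount point are placeholders, and whether to include autodefrag is exactly the judgement call this thread is about):

/dev/sdaX  /home  btrfs  ssd,noatime,compress=lzo,autodefrag  0  0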
* Re: Options for SSD - autodefrag etc? 2014-01-24 6:54 ` Duncan @ 2014-01-25 12:54 ` Martin Steigerwald 2014-01-26 21:44 ` Duncan 0 siblings, 1 reply; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 12:54 UTC (permalink / raw) To: Duncan; +Cc: linux-btrfs Hi Duncan, Am Freitag, 24. Januar 2014, 06:54:31 schrieb Duncan: > Anyway, yes, I turned autodefrag on for my SSDs, here, but there are > arguments to be made in either direction, so I can understand people > choosing not to do that. Do you have numbers to back up that this gives any advantage? I have it disabled and yet I have things like: Oh, this is insane. This filefrags runs for over a minute already. And hogging on one core eating almost 100% of its processing power. merkaba:/home/martin/.kde/share/apps/nepomuk/repository/main/data/virtuosobackend> /usr/bin/time -v filefrag soprano-virtuoso.db Wow, this still didn´t complete yet – even after 5 minutes. Well I have some files with several ten thousands extent. But first, this is mounted with compress=lzo, so 128k is the largest extent size as far as I know, and second: I did manual btrfs filesystem defragment on files like those and and never ever perceived any noticable difference in performance. Thus I just gave up on trying to defragment stuff on the SSD. Well, now that command completed: soprano-virtuoso.db: 93807 extents found Command being timed: "filefrag soprano-virtuoso.db" User time (seconds): 0.00 System time (seconds): 338.77 Percent of CPU this job got: 98% Elapsed (wall clock) time (h:mm:ss or m:ss): 5:42.81 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 520 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 181 Voluntary context switches: 9978 Involuntary context switches: 1216 Swaps: 0 File system inputs: 150160 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 And this is really quite high. But… I think I have a more pressing issue with that BTRFS /home on an Intel SSD 320 and that is that it is almost full: merkaba:~> LANG=C df -hT /home Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/merkaba-home btrfs 254G 241G 8.5G 97% /home merkaba:~> btrfs filesystem show […] Label: home uuid: […] Total devices 1 FS bytes used 238.99GiB devid 1 size 253.52GiB used 253.52GiB path /dev/mapper/merkaba-home Btrfs v3.12 merkaba:~> btrfs filesystem df /home Data, single: total=245.49GiB, used=237.07GiB System, DUP: total=8.00MiB, used=48.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=4.00GiB, used=1.92GiB Metadata, single: total=8.00MiB, used=0.00 Okay, I could probably get back 1,5 GiB on metadata, but whenever I tried a btrfs filesystem balance on any of the BTRFS filesystems on my SSD I usually got the following unpleasant result: Halve of the performance. Like double boot times on / and such. So I have the following thoughts: 1) I am not yet clear whether defragmenting files on SSD will really bring a benefit. 2) On my /home problem is more that it is almost full and free space appears to be highly fragmented. 
Long fstrim times tend to agree with that: merkaba:~> /usr/bin/time fstrim -v /home /home: 13494484992 bytes were trimmed 0.00user 12.64system 1:02.93elapsed 20%CPU (0avgtext+0avgdata 768maxresident)k 192inputs+0outputs (0major+243minor)pagefaults 0swaps 3) Turning autodefrag on might fragment free space even more. 4) I have no clear conclusion on what maintenance other than scrubbing makes sense for BTRFS filesystems on SSDs at all. Everything I tried either did not have any perceivable effect or made things worse. Thus for SSDs, except for scrubbing and the occasional fstrim, I am done with it. For harddisks I enable autodefrag. But still, for now this is only guesswork. I don't have much of a clue about BTRFS filesystem maintenance yet, and I just remember the slogan on the xfs.org wiki: "Use the defaults." With a quote from Donald Knuth: "Premature optimization is the root of all evil." http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E I would love to hear some more or less official words from the BTRFS developers on that. But for now I think one of the best optimizations would be to complement that 300 GB Intel SSD 320 with a 512 GB Crucial m5 mSATA SSD or some Intel mSATA SSDs (but these cost twice as much), and make more free space on /home again. For data that is critical regarding safety and amount of accesses I could even use BTRFS RAID 1 then. All those MP3s and photos I could place on the bigger mSATA SSD. Granted, an SSD is definitely not needed for those, but it is just more silent. I never realized how loud even a tiny 2.5 inch laptop drive is until I switched an external one on while using this ThinkPad T520 with its SSD. For the first time I heard the harddisk clearly. Thus I'd prefer an SSD anyway. Still, even with that highly full filesystem, performance is pretty nice here, except for some bursts of btrfs-delalloc kernel thread activity once in a while. Especially when I fill it even a bit more, BTRFS has trouble finding free space on this partition. I saw that thread being active for half a minute without much else happening on BTRFS. Thus I really think it's good to get it back to at least 20-30 GiB free. Well, I could still add about 13 GiB to it if I get rid of a 10 GiB volume used for testing out SSD caching. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
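For the "occasional fstrim", one low-effort approach is a weekly cron script; a rough sketch, assuming /home is the btrfs mount in question:

#!/bin/sh
# /etc/cron.weekly/fstrim-home -- hypothetical path and name
fstrim -v /home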
* Re: Options for SSD - autodefrag etc? 2014-01-25 12:54 ` Martin Steigerwald @ 2014-01-26 21:44 ` Duncan 0 siblings, 0 replies; 15+ messages in thread From: Duncan @ 2014-01-26 21:44 UTC (permalink / raw) To: linux-btrfs Martin Steigerwald posted on Sat, 25 Jan 2014 13:54:40 +0100 as excerpted: > Hi Duncan, > > Am Freitag, 24. Januar 2014, 06:54:31 schrieb Duncan: >> Anyway, yes, I turned autodefrag on for my SSDs, here, but there are >> arguments to be made in either direction, so I can understand people >> choosing not to do that. > > Do you have numbers to back up that this gives any advantage? Your post (like some of mine) reads like a stream of consciousness more than a well organized post, making it somewhat difficult to reply to (I guess I'm now experiencing the pain others sometimes mention when trying to reply to some of mine). However, I'll try... I haven't done benchmarks, etc, nor do I have them at hand to quote, if that's what you're asking for. But of course I did say I understand the arguments made by both sides, and just gave the reasons why I made the choice I did, here. What I /do/ have is the multiple post here on this list from people complaining about pathologic[1] performance issues due to large-internal- written-file fragmentation even on SSDs, particularly so when interacting with non-trivial numbers of snapshots as well. That's a case that at present simply Does. Not. Scale. Period! Of course the multi-gig internal-rewritten-file case is better suited to the NOCOW extended attribute than to autodefrag, but anyway... > I have it disabled and yet I have things like: > > Oh, this is insane. This filefrags runs for over [five minutes] > already. And hogging on one core eating almost 100% of its processing > power. > /usr/bin/time -v filefrag soprano-virtuoso.db > Well, now that command completed: > > soprano-virtuoso.db: 93807 extents found > Command being timed: "filefrag soprano-virtuoso.db" > User time (seconds): 0.00 > System time (seconds): 338.77 > Percent of CPU this job got: 98% > Elapsed (wall clock) time (h:mm:ss or m:ss): 5:42.81 I don't see any mention of the file size. I'm (informally) collecting data on that sort of thing ATM, since it's exactly the sort of thing I was referring to, and I've seen enough posts on the list about it to have caught my interest. FWIW I'll guess something over a gig, perhaps 2-3 gigs... Also FWIW, while my desktop of choice is indeed KDE, I'm running gentoo, and turned off USE=semantic-desktop and related flags some time ago (early kde 4.7, so over 2.5 years ago now), entirely purging nepomuk, virtuoso, etc, from my system. That was well before I switched to btrfs, but the performance improvement from not just turning it off at runtime (I already had it off at runtime) but entirely purging it from my system was HUGE, I mean like clean all the malware off an MS Windows machine and see how much faster it runs HUGE, *WELL* more than I expected! (I had /expected/ just to get rid of a few packages that I'd no longer have to update, little or no performance improvement at all, since I already the data indexing, etc, turned off to the extent that I could, at runtime. Boy was I surprised, but in a GOOD way! =:^) Anyway, because I have that stuff not only disabled at runtime but entirely turned off at build time and purged from the system as well, I don't have such a database file available here to compare with yours. 
But I'd certainly be interested in knowing how big yours actually was, since I already have both the filefrag report on it, and your complaint about how long it took filefrag to compile that information and report back. > Well I have some files with several ten thousands extent. But first, > this is mounted with compress=lzo, so 128k is the largest extent size as > far as I know Well, you're mounting with compress=lzo (which I'm using too, FWIW), not compress-force=lzo, so btrfs won't try to compress it if it thinks it's already compressed. Unfortunately, I believe there's no tool to report on whether btrfs has actually compressed the file or not, and as you imply filefrag doesn't know about btrfs compression yet, so just running the filefrag on a file on a compress=lzo btrfs doesn't really tell you a whole lot. =:^( What you /could/ do (well, after you've freed some space given your filesystem usage information below, or perhaps to a different filesystem) would be copy the file elsewhere, using reflink=no just to be sure it's actually copied, and see what filefrag reports on the new copy. Assuming enough free space btrfs should write the new file as a single extent, so if filefrag reports a similar number of extents on the new copy, you'll know it's compression related, while if it reports only one or a small handful of extents, you'll know the original wasn't compressed and it's real fragmentation. It would also be interesting to know how long a filefrag on the new file takes, as compared to the original, but in ordered to get an apples to apples comparison, you'd have to either drop-caches before doing the filefrag on the new one, or reboot, since after the copy it'd be cached, while the 5+ minute time on the original above was presumably with very little of the file actually cached. And of course you could temporarily mount without the compress=lzo option and do the copy, if you find it is the compression triggering the extents report from filefrag, just to see the difference compression makes. Or similarly, you could mount with compress-force=lzo and try it, if you find btrfs isn't compressing the file with ordinary compress=lzo, again to see the difference that makes. > and second: I did manual btrfs filesystem defragment on > files like those and and never ever perceived any noticable difference > in performance. > > Thus I just gave up on trying to defragment stuff on the SSD. I still say it'd be interesting to see the (from cold-cache) filefrag report and timing on a fresh copy, compared to the 5 minute plus timing above. > And this is really quite high. > But… I think I have a more pressing issue with that BTRFS /home > on an Intel SSD 320 and that is that it is almost full: > > merkaba:~> LANG=C df -hT /home > Filesystem Type Size Used Avail Use% Mounted on > /dev/mapper/merkaba-home btrfs 254G 241G 8.5G 97% /home Yeah, that's uncomfortably close to full... (FWIW, it's also interesting comparing that to a df on my /home... $>> df . Filesystem 2M-blocks Used Available Use% Mounted on /dev/sda6 20480 12104 7988 61% /h As you can see I'm using 2M blocks (alias df=df -B2M), but the filesystem is raid1 both data and metadata, so the numbers would be double and the 2M blocks are thus 1M block equivalent. (You can also see that I've actually mounted it on /h, not /home. /home is actually a symlink to /h just in case, but I export HOME=/h/whatever, and most programs honor that.) So the partition size is 20480 MiB or 20.0 GiB, with ~12+ GiB used, just under 8 GiB available. 
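A rough sketch of the copy-and-compare test described a few paragraphs up (the target path is only an example; it should sit on a btrfs mounted with the same compress option and have enough free space for a full copy, and drop_caches needs root):

cp --reflink=never soprano-virtuoso.db /mnt/scratch/copy.db   # force a real data copy; an older cp without --reflink=never copies the data anyway
sync
echo 3 > /proc/sys/vm/drop_caches                             # cold cache, for a fair comparison
/usr/bin/time -v filefrag /mnt/scratch/copy.db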
It can be and is so small because I have a dedicated media partition with all the big stuff located elsewhere (still on reiserfs on spinning rust, as a matter of fact). Just interesting to see how people setup their systems differently, is all, thus the "FWIW". But the small independent partitions do make for much shorter balance times, etc! =:^) > merkaba:~> btrfs filesystem show […] > Label: home uuid: […] > Total devices 1 FS bytes used 238.99GiB > devid 1 size 253.52GiB used 253.52GiB path [...] > > Btrfs v3.12 > > merkaba:~> btrfs filesystem df /home > Data, single: total=245.49GiB, used=237.07GiB > System, DUP: total=8.00MiB, used=48.00KiB > System, single: total=4.00MiB, used=0.00 > Metadata, DUP: total=4.00GiB, used=1.92GiB > Metadata, single: total=8.00MiB, used=0.00 It has come up before on this list and doesn't hurt anything, but those extra system-single and metadata-single chunks can be removed. A balance with a zero usage filter should do it. Something like this: btrfs balance start -musage=0 That will act on metadata chunks with usage=0 only. It may or may not act on the system chunk. Here it does, and metadata implies system also, but someone reported it didn't, for them. If it doesn't... btrfs balance start -f -susage=0 ... should do it. (-f=force, needed if acting on system chunk only.) https://btrfs.wiki.kernel.org/index.php/Balance_Filters (That's for the filter info, not well documented in the manpage yet. The manpage documents btrfs balance fairly well tho, other than that.) Anyway... 252 gigs used of 252 total in filesystem show. That's full enough you may not even be able to balance as there's no unallocated blocks left to allocate for the balance. But the usage=0 thing may get you a bit of room, after which you can try usage=1, etc, to hopefully recover a bit more, until you get at least /some/ unallocated space as a buffer to work with. Right now, you're risking being unable to allocate anything more when data or metadata runs out, and I'd be worried about that. > Okay, I could probably get back 1,5 GiB on metadata, but whenever I > tried a btrfs filesystem balance on any of the BTRFS filesystems on my > SSD I usually got the following unpleasant result: > > Halve of the performance. Like double boot times on / and such. That's weird. I wonder why/how, unless it's simply so full an SSD that the firmware's having serious trouble doing its thing. I know I've seen nothing like that on my SSDs. But then again, my usage is WILDLY different, with my largest partition 24 gigs, and only about 60% of the SSD even partitioned at all because I keep the big stuff like media files on spinning rust (and reiserfs, not btrfs), so the firmware has *LOTS* of room to shuffle blocks around for write-cycle balancing, etc. And of course I'm using a different brand SSD. (FWIW, Corsair Neutron 256 GB, 238 GiB, *NOT* the Neutron GTX.) But if anything, Intel SSDs have a better rep than my Corsair Neutrons do, so I doubt that has anything to do with it. > So I have the following thoughts: > > 1) I am not yet clear whether defragmenting files on SSD will really > bring a benefit. Of course that's the question of the entire thread. As I said, I have it turned on here, but I understand the arguments for both sides, and from here that question does appear to remain open for debate. One other related critical point while we're on the subject. A number of people have reported that at least for some distros installed to btrfs, brand new installs are coming up significantly fragmented. 
Apparently some distros do their install to btrfs mounted without autodefrag turned on. And once there's existing fragmentation, turning on autodefrag /then/ results in a slowdown for several boot cycles, as normal usage detects and queues for defrag, then defrags, all those already fragmented files. There's an eventual speedup (at least on spinning rust, SSDs of course are open to question, thus this thread), but the system has to work thru the existing backlog of fragmentation before you'll see it. Of course one way out of that (temporary but sometimes several days) pain is to deliberately run a btrfs defrag recursive (new enough btrfs has a recursive flag, previous to that, one had to play some tricks with find, as documented on the wiki) on the entire filesystem. That will be more intense pain, but it'll be over faster! =:^) The point being, if a reader is considering autodefrag, be SURE and turn it on BEFORE there's a whole bunch of already fragmented data on the filesystem. Ideally, turn it on for the first mount after the mkfs.btrfs, and never mount without it. That ensures there's never a chance for fragmentation to get out of hand in the first place. =:^) (Well, with the additional caveat that the NOCOW extended attribute is used appropriately on internal-rewrite files such as VM images, databases, bittorrent preallocations, etc, when said file approaches a gig or larger. But that is discussed elsewhere.) > 2) On my /home problem is more that it is almost full and free space > appears to be highly fragmented. Long fstrim times speak tend to agree > with it: > > merkaba:~> /usr/bin/time fstrim -v /home > /home: 13494484992 bytes were trimmed > 0.00user 12.64system 1:02.93elapsed 20%CPU Some people wouldn't call a minute "long", but yeah, on an SSD, even at several hundred gig, that's definitely not "short". It's not well comparable because as I explained, my partition sizes are so much smaller, but for reference, a trim on my 20-gig /home took a bit over a second. Doing the math, that'd be 10-20 seconds for 200+ gigs. That you're seeing a minute, does indeed seem to indicate high free-space fragmentation. But again, I'm at under 60% SSD space even partitioned, so there's LOTS of space for the firmware to do its management thing. If your SSD is 256 gig as mine, with 253+ gigs used (well, I see below it's 300 gig, but still...) ... especially if you're not running with the discard mount option (which could be an entire thread of its own, but at least there's some official guidance on it), that firmware could be working pretty hard indeed with the resources it has at its disposal! I expect you'd see quite a difference if you could reduce that to say 80% partitioned and trim the other 20%, giving the firmware a solid 20% extra space to work with. If you could then give btrfs some headroom on the reduced size partition as well, well... > 3) Turning autodefrag on might fragment free space even more. Now, yes. As I stressed above, turn it on when the filesystem's new, before you start loading it with content, and the story should be quite different. Don't give it a chance to fragment in the first place. =:^) > 4) I have no clear conclusion on what maintenance other than scrubbing > might make sense for BTRFS filesystems on SSDs at all. Everything I > tried either did not have any perceivable effect or made things worse. Well, of course there's backups. Given that btrfs isn't fully stabilized yet and there are still bugs being worked out, those are *VITAL* maintenance! 
=:^) Also, for the same reason (btrfs isn't yet fully stable), I recently refreshed and double-checked my backups, then blew away the existing btrfs with a fresh mkfs.btrfs and restored from backup. The brand new filesystems now make use of several features that the older ones didn't have, including the new 16k nodesize default. =:^) For anyone who has been running btrfs for awhile, that's potentially a nice improvement. I expect to do the same thing at least once more, later on after btrfs has settled down to more or less routine stability, just to clear out any remaining not-fully-stable-yet corner-cases that may eventually come back to haunt me if I don't, as well as to update the filesystem to take advantage of any further format updates between now and then. That's useful btrfs maintenance, SSD or no SSD. =:^) > Thus for SSD except for the scrubbing and the occasional fstrim I be > done with it. > > For harddisks I enable autodefrag. > > But still for now this is only guess work. I don´t have much clue on > BTRFS filesystems maintenance yet and I just remember the slogan on > xfs.org wiki: > > "Use the defaults." =:^) > I would love to hear some more or less official words from BTRFS > filesystem developers on that. But for know I think one of the best > optimizations would be to complement that 300 GB Intel SSD 320 with a > 512 GB Crucial m5 mSATA SSD or some Intel mSATA SSDs (but these cost > twice as much), and make more free space on /home again. For criticial > data regarding data safety and amount of accesses I could even use BTRFS > RAID 1 then. Indeed. I'm running btrfs raid1 mode with my ssds (except for /boot, where I have a separate one configured on each drive, so I can grub install update one and test it before doing the other, without endangering my ability to boot off the other should something go wrong). > All those MPEG3 and photos I could place on the bigger > mSATA SSD. Granted a SSD is definately not needed for those, but it is > just more silent. I never got how loud even a tiny 2,5 inch laptop drive > is, unless I switched one external on while using this ThinkPad T520 > with SSD. For the first time I heard the harddisk clearly. Thus I´d > prefer a SSD anyway. Well, yes. But SSDs cost money. And at least here, while I could justify two SSDs in raid1 mode for my critical data, and even overprovision such that I have nearly 50% available space entirely unpartitioned, I really couldn't justify spending SSD money on gigs of media files. But as they say, YMMV... --- [1] Pathologic: THAT is the word I was looking for in several recent posts, but couldn't remember, not "pathetic", "pathologic"! But all I could think of was pathetic, and I knew /that/ wasn't what I wanted, so explained using other words instead. So if you see any of my other recent posts on the issue and think I'm describing a pathologic case using other words, it's because I AM! -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 15+ messages in thread
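As a sketch of that backup-and-recreate cycle (device names, label and backup location are placeholders, and mkfs.btrfs destroys whatever is on the target, so verify the backup first):

rsync -aHAX --delete /home/ /mnt/backup/home/      # refresh the backup
umount /home
mkfs.btrfs -f -L home /dev/sdaX                    # recreate with current format defaults (e.g. the 16k nodesize)
mount -o ssd,noatime,compress=lzo /dev/sdaX /home
rsync -aHAX /mnt/backup/home/ /home/               # restore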
* Re: Options for SSD - autodefrag etc? 2014-01-23 22:23 Options for SSD - autodefrag etc? KC 2014-01-24 6:54 ` Duncan @ 2014-01-24 20:14 ` Kai Krakow 2014-01-25 13:11 ` Martin Steigerwald 1 sibling, 1 reply; 15+ messages in thread From: Kai Krakow @ 2014-01-24 20:14 UTC (permalink / raw) To: linux-btrfs KC <conrad.francois.artus@googlemail.com> schrieb: > I was wondering about whether using options like "autodefrag" and > "inode_cache" on SSDs. > > On one hand, one always hears that defragmentation of SSD is a no-no, > does that apply to BTRFS's autodefrag? > Also, just recently, I heard something similar about "inode_cache". > > On the other hand, Arch BTRFS wiki recommends to use both options on SSDs > http://wiki.archlinux.org/index.php/Btrfs#Mount_options > > So to clear things up, I ask at the source where people should know best. > > Does using those options on SSDs gives any benefits and causes > non-negligible increase in SSD wear? I'm not an expert, but I wondered myself. And while I still have not SSD yet I would prefer turning autodefrag on even for SSD - at least when I have no big write-intensive files on the device (but you should plan your FS to not have those on a SSD anyways) because btrfs may rewrite large files on SSD just for the purpose of autodefragging. I hope that will improve soon, maybe by only defragging parts of the file given some sane thresholds. Why I decided I would turn it on? Well, heavily fragmented files give a performance overhead, and btrfs tends to fragment files fast (except for the nodatacow mount flag with its own downsides). An adaptive online defrag ensures you gain no performance loss due to very scattert extents. And: Fragmented files (or let's better say fragmented free space) increases write-amplification (at least for long-living filesystems) because when small amounts of free space are randomly scattered all over the device the filesystem has to fill these holes at some point in time. This decreases performance because it has to find these holes and possibly split batched write requests, and it potentially decreases life-time of your SSD because the read-modify-write-erase cycle takes action in more places than what would be needed if the free space hole had just been big enough. I don't know how big erase blocks [*] are - but think about it. You will come to the conclusion that it will reduce life-time. So it is generally recommended to defragment heavily fragmented files, leave alone the not-so-heavily fragmented files and coalesce free space holes into bigger free space areas on a regular basis. I think, an effective algorithm could coalesce free space into bigger areas of freespace and as a side effect simply defragment those files whose parts had to be moved anyways to merge free space. During this process, a trim should be applied. I wonder if btrfs will optimize for this use case in the future... All in all, I'd say: Defragmenting a SSD is not that bad if done right, and if done right it will even improve life-time and performance. And I believe this is why the wiki recommends it. I'd recommend combining it with compress=lzo or maybe even compress-force=lzo (unless your SSD firmware does compression) - it should give a performance boost and reduces writes to your SSD. YMMV - so do your (long-term) benchmarking. If performance and life-time is a really big concern then only partition and ever use 75% of your device and leave the rest of it untouched so it can be used as spare area for wear-levelling [**]. 
It will give you a good long- term performance and should increase life-time. [*] Erase blocks are usually much much bigger than the block size you can read and write data at. Flash memory cannot be overwritten, it is essentially write-once-read-many, so it needs to be erased. This is where the read-modify-write-erase cycle comes from and why wear-leveling is needed: Read the whole erase block, modify it with your data block, write it to a new location, erase and free the old block. So you see: Writing just 4k can result in (128k-4k) read, 128k written, 128k erased (so something like a write-amplification factor of 64), given an erase block size of 128k. Do this a lot and randomly scattered, and performance and life-time will suffer a lot. The SSD firmware will try to buffer as much data as possible before the read-modify-write-erase-cycle kicks in to decrease the bad effects of random writes. So a block-sorting scheduler (deadline instead of noop) and increasing nr_requests may be a good idea. This is also why you may want to look into filesystems that turn random writes into sequential writes like f2fs or why you may want to use bcache which also turns random writes into sequential writes for the cache device (your SSD). [**] This ([*]) is why you should keep a spare area... These are just my humble thoughts. You see: The topic may be a lot more complex than just saying "use noop scheduler" and "SSD needs no defragmentation". I think those statements are just plain wrong. -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 15+ messages in thread
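A sketch of the scheduler and nr_requests tuning mentioned above (assuming the SSD shows up as sda; this needs root, does not persist across reboots, and whether it actually helps depends on the workload):

echo deadline > /sys/block/sda/queue/scheduler
echo 512 > /sys/block/sda/queue/nr_requests
cat /sys/block/sda/queue/scheduler     # the active scheduler is shown in brackets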
* Re: Options for SSD - autodefrag etc? 2014-01-24 20:14 ` Kai Krakow @ 2014-01-25 13:11 ` Martin Steigerwald 2014-01-25 14:06 ` Kai Krakow 0 siblings, 1 reply; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 13:11 UTC (permalink / raw) To: Kai Krakow; +Cc: linux-btrfs Am Freitag, 24. Januar 2014, 21:14:21 schrieben Sie: > KC <conrad.francois.artus@googlemail.com> schrieb: > > I was wondering about whether using options like "autodefrag" and > > "inode_cache" on SSDs. > > > > > > > > On one hand, one always hears that defragmentation of SSD is a no-no, > > does that apply to BTRFS's autodefrag? > > Also, just recently, I heard something similar about "inode_cache". > > > > > > > > On the other hand, Arch BTRFS wiki recommends to use both options on SSDs > > > > http://wiki.archlinux.org/index.php/Btrfs#Mount_options > > > > > > So to clear things up, I ask at the source where people should know best. > > > > > > > > Does using those options on SSDs gives any benefits and causes > > non-negligible increase in SSD wear? > > I'm not an expert, but I wondered myself. And while I still have not SSD > yet I would prefer turning autodefrag on even for SSD - at least when I > have no big write-intensive files on the device (but you should plan your > FS to not have those on a SSD anyways) because btrfs may rewrite large > files on SSD just for the purpose of autodefragging. I hope that will > improve soon, maybe by only defragging parts of the file given some sane > thresholds. > > Why I decided I would turn it on? Well, heavily fragmented files give a > performance overhead, and btrfs tends to fragment files fast (except for > the nodatacow mount flag with its own downsides). An adaptive online > defrag ensures you gain no performance loss due to very scattert extents. > And: Fragmented files (or let's better say fragmented free space) increases > write-amplification (at least for long-living filesystems) because when > small amounts of free space are randomly scattered all over the device the > filesystem has to fill these holes at some point in time. This decreases > performance because it has to find these holes and possibly split batched > write requests, and it potentially decreases life-time of your SSD because > the read-modify-write-erase cycle takes action in more places than what > would be needed if the free space hole had just been big enough. I don't > know how big erase blocks [*] are - but think about it. You will come to > the conclusion that it will reduce life-time. Do you have any numbers to back your claim? I just demonstrated that >90000 extent Nepomuk database file. And still I do not see any serious performance degradation in KDE´s desktop search. For example I just entered nodatacow in Alt-F2 krunner text input and it presented me some indexed mails in an instant. I tried to defrag the file, but frankly even though numbers of extent decreased I never perceived any difference in performance whatsoever. I am just not convinced that autodefrag will give me any noticeable benefit for this Intel SSD 320 based /home. For seeing any visible difference I think you need to have an I/O pattern that generated lots of IOPS due to the fragmented file, i.e. 
one that continuously reads and writes large amounts of the fragmented data, yet despite those >90000 extents I get: merkaba:/home/martin/.kde/share/apps/nepomuk/repository/main/data/virtuosobackend> echo 3 > /proc/sys/vm/drop_caches ; /usr/bin/time -v dd if=soprano-virtuoso.db of=/dev/null bs=1M 2418+0 records in 2418+0 records out 2535456768 bytes (2.5 GB) copied, 13.9546 s, 182 MB/s Command being timed: "dd if=soprano-virtuoso.db of=/dev/null bs=1M" User time (seconds): 0.00 System time (seconds): 2.77 Percent of CPU this job got: 19% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.96 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2000 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 2 Minor (reclaiming a frame) page faults: 549 Voluntary context switches: 9369 Involuntary context switches: 57 Swaps: 0 File system inputs: 5102304 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 So even if I read in the full 2.4 GiB, where BTRFS has to look up all the >90000 extents, I get 182 MB/s. (I disabled Nepomuk during that test.) Okay, I have seen 260 MB/s. But frankly I am pretty sure that Virtuoso isn't doing this kind of large scale I/O on a highly fragmented file. It's a database. It's random access. My opinion is that Virtuoso couldn't care less about the fragmentation of the file, as long as it is stored on the SSD. Well… take this with a caveat. This is LZO compressed, so those 2.4 GiB / 128 KiB already give at least about 20000 extents, provided that my calculation is correct. And these extents could be sequential (I doubt it though, also given the high free space fragmentation I suspect on this FS). Anyway: I do not perceive any noticeable performance issues due to file fragmentation on SSD, and I think that at least on a highly filled BTRFS filesystem autodefrag may do more harm than good (like fragmenting free space and then letting btrfs-delalloc go crazy on new allocations). I know xfs_fsr for defragmenting XFS in the background, even via cron job. And I think I remember Dave Chinner saying in some post that even for harddisks it may not be a very wise idea to run this frequently, due to the risk of fragmenting free space. There are several kinds of fragmentation, and defragmenting files may increase free space fragmentation. Thus, I am not yet convinced regarding autodefrag on SSDs. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-25 13:11 ` Martin Steigerwald @ 2014-01-25 14:06 ` Kai Krakow 2014-01-25 16:19 ` Martin Steigerwald 0 siblings, 1 reply; 15+ messages in thread From: Kai Krakow @ 2014-01-25 14:06 UTC (permalink / raw) To: linux-btrfs Martin Steigerwald <Martin@lichtvoll.de> schrieb: > Okay, I have seen 260 MB/s. But frankly I am pretty sure that Virtuoso > isn´t doing this kind of large scale I/O on a highly fragmented file. Its > a database. Its random access. My oppinion is that Virtuoso couldn´t care > less about the fragmentation of the file. As long as it is stored on the > SSD. I think it makes no real difference here since access to virtuoso is random anyway. And if I got you right you run it nocow, so upon writes you aren't introducing more fragmentation to the file. All is good... It probably would even be good with cow as virtuoso is read-most, so rarely written to. For VM images it might be a whole different story. The guest system sees a block device and expects it to be continuous. All optimizations for access patterns cannot work right if btrfs is constantly moving parts of the file around for doing cow. So make it nocow and all should be as good as it can get. > Well… take this with caveat. This is LZO compressed, those 2,4 GiB / 128 > KiB gives at least about 20000 extents already provided that my > calculation is correct. And these extents could be sequential (I doubt it > tough also give the high free space fragmention I suspect to be on this > FS). Your CPU is more mighty than the flash chips. LZO improves read performance. But does it make sense on Intel drives? I think they already do compression. > Anyway: I do not perceive any noticable performance issues due to file > fragmentation on SSD and think that at least on highly filled BTRFS > filesystem autodefrag may do more harm than good (like fragment free space > and then let btrfs-delalloc go crazy on new allocations). I know xfs_fsr > for defragmenting XFS in the background, even via cron job. And I think I > remember Dave Chinner telling in some post that even for harddisks it may > not be a very wise idea to run this frequently due to the risk to fragment > free space. > > There are several kinds of fragmentations. And defragmenting files may > increase freespace fragmentation. This is why I wondered if btrfs will be optimized for keeping free space together in the future for SSD. But it's not as simple as this. It should not scatter file blocks all over the disk just to fill tiny holes. It should try to keep file blocks together so the read-modify-write-erase cycle of SSDs can work optimally. > Thus, I am not yet convinced regarding autodefrag on SSDs. I think everything would be easier if btrfs exposed some stats about what the autodefrag thread is really doing... -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-25 14:06 ` Kai Krakow @ 2014-01-25 16:19 ` Martin Steigerwald 0 siblings, 0 replies; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 16:19 UTC (permalink / raw) To: Kai Krakow; +Cc: linux-btrfs Am Samstag, 25. Januar 2014, 15:06:24 schrieb Kai Krakow: > Martin Steigerwald <Martin@lichtvoll.de> schrieb: > > Okay, I have seen 260 MB/s. But frankly I am pretty sure that Virtuoso > > isn´t doing this kind of large scale I/O on a highly fragmented file. Its > > a database. Its random access. My oppinion is that Virtuoso couldn´t care > > less about the fragmentation of the file. As long as it is stored on the > > SSD. > > I think it makes no real difference here since access to virtuoso is random > anyway. And if I got you right you run it nocow, so upon writes you aren't > introducing more fragmentation to the file. All is good... It probably would > even be good with cow as virtuoso is read-most, so rarely written to. No, its not nocow. > For VM images it might be a whole different story. The guest system sees a > block device and expects it to be continuous. All optimizations for access > patterns cannot work right if btrfs is constantly moving parts of the file > around for doing cow. So make it nocow and all should be as good as it can > get. I have some VirtualBox based VMs. I never see any issue with that. They are just fast. But then, for write based workloads I read hints that Virtualbox may not honor fsync() that closely. > > Well… take this with caveat. This is LZO compressed, those 2,4 GiB / 128 > > KiB gives at least about 20000 extents already provided that my > > calculation is correct. And these extents could be sequential (I doubt it > > tough also give the high free space fragmention I suspect to be on this > > FS). > > Your CPU is more mighty than the flash chips. LZO improves read performance. > But does it make sense on Intel drives? I think they already do > compression. Not the Intel SSD 320 to my knowledge. > > Anyway: I do not perceive any noticable performance issues due to file > > fragmentation on SSD and think that at least on highly filled BTRFS > > filesystem autodefrag may do more harm than good (like fragment free space > > and then let btrfs-delalloc go crazy on new allocations). I know xfs_fsr > > for defragmenting XFS in the background, even via cron job. And I think I > > remember Dave Chinner telling in some post that even for harddisks it may > > not be a very wise idea to run this frequently due to the risk to fragment > > free space. > > > > There are several kinds of fragmentations. And defragmenting files may > > increase freespace fragmentation. > > This is why I wondered if btrfs will be optimized for keeping free space > together in the future for SSD. But it's not as simple as this. It should > not scatter file blocks all over the disk just to fill tiny holes. It should > try to keep file blocks together so the read-modify-write-erase cycle of > SSDs can work optimally. I am reluctant about conclusions about the behavior or SSDs. I am not sure whether a modern SSDs cares that much about scattering file blocks all over the disk. AFAIK all modern SSDs don´t tell the OS a thing about in which erase block they store something and all SSDs use some caching. So a modern SSD may just sort several write accesses even if there are at different ends of the block device together into adjacent erase blocks. Well, actually I think thats the whole point of SSD firmwares. 
I am pretty much sure that the blocks of the block device that Linux sees are not mapped sequentially to flash chips by the SSD firmware. AFAIK all SSDs have some internal mapping. So I wonder whether it even matters… Heck a SSD firmware even copies over stuff to distribute erase cycles evenly onto all flash chips in the background and whatnot. > > Thus, I am not yet convinced regarding autodefrag on SSDs. > > I think everything would be easier if btrfs exposed some stats about what > the autodefrag thread is really doing... … and if we actually knew how SSD firmwares really behave. But well… regarding autodefrag… I don´t know… my gut feeling is to disable it for SSDs for the reasons I outlined. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? @ 2014-01-24 18:55 KC 2014-01-24 20:27 ` Kai Krakow 0 siblings, 1 reply; 15+ messages in thread From: KC @ 2014-01-24 18:55 UTC (permalink / raw) To: linux-btrfs >>> From: Duncan <1i5t5.duncan <at> cox.net> >>> Subject: Re: Options for SSD - autodefrag etc? >>> Newsgroups: gmane.comp.file-systems.btrfs >>> Date: 2014-01-24 06:54:31 GMT (11 hours and 44 minutes ago) >>> KC posted on Thu, 23 Jan 2014 23:23:35 +0100 as excerpted: Duncan, thank you for this outstanding explanation. It was very informative and helpful. I only have one follow-up question. I followed your advice on NOCOW for virtualbox images and torrents like so: chattr -v /home/juha/VirtualBox\ VMs/ chattr -RC /home/juha/Downloads/torrent/#unfinished As you can see, i used the recursive flag. However, I do not know whether this will automatically apply to files that will be created in the future in subfolders that do not yet exist. Also, how can I confirm whether a file/folder has a NOCOW attribute set on it? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-24 18:55 KC @ 2014-01-24 20:27 ` Kai Krakow 2014-01-25 5:09 ` Duncan 2014-01-25 13:33 ` Imran Geriskovan 0 siblings, 2 replies; 15+ messages in thread From: Kai Krakow @ 2014-01-24 20:27 UTC (permalink / raw) To: linux-btrfs KC <conrad.francois.artus@googlemail.com> schrieb: > I followed your advice on NOCOW for virtualbox images and torrents like > so: chattr -v /home/juha/VirtualBox\ VMs/ > chattr -RC /home/juha/Downloads/torrent/#unfinished > > As you can see, i used the recursive flag. However, I do not know > whether this will automatically apply to files that will be created in > the future in subfolders that do not yet exist. > > Also, how can I confirm whether a file/folder has a NOCOW attribute set > on it? The C attribute is also inherited by newly created directories. But keep in mind that, at the time applied, it only has effects on existing files if they are empty (read: never written to yet). Newly created files will inherit the attribute from its directory and then behave as expected. You can use lsattr to confirm the C attribute was set. But again keep in mind: it does not reflect the file is actually nocow because of the above caveat. So in your use-case you may want to be sure by doing this (quit all VirtualBox instances beforehand): # mkdir "VirtualBox VMs.new" # chattr +C "VirtualBox VMs.new" # rsync -aSv "VirtualBox VMs"/. "VirtualBox VMs.new"/. # mv "VirtualBox VMs" "VirtualBox VMs.bak" # mv "VirtualBox VMs.new" "VirtualBox VMs" Then ensure everything is working, you can use lsattr to see the C attribute has been inherited. You should immediatly notice the effects of this by seeing better performing IO in VirtualBox (at least this was what I noticed). If everything was copied correctly, you can delete the backups. You could compare md5sums to be sure, of course before running a VM. ;-) -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 15+ messages in thread
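One possible way to do the md5sum comparison mentioned at the end (run it before starting any VM again, so neither copy changes in between):

cd "VirtualBox VMs.bak" && find . -type f -exec md5sum {} + > /tmp/vms.md5
cd "../VirtualBox VMs" && md5sum -c /tmp/vms.md5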
* Re: Options for SSD - autodefrag etc? 2014-01-24 20:27 ` Kai Krakow @ 2014-01-25 5:09 ` Duncan 2014-01-25 13:33 ` Imran Geriskovan 1 sibling, 0 replies; 15+ messages in thread From: Duncan @ 2014-01-25 5:09 UTC (permalink / raw) To: linux-btrfs Kai Krakow posted on Fri, 24 Jan 2014 21:27:19 +0100 as excerpted: > KC <conrad.francois.artus@googlemail.com> schrieb: > >> I followed your advice on NOCOW for virtualbox images and torrents >> [...] >> >> As you can see, i used the recursive flag. However, I do not know >> whether this will automatically apply to files that will be created in >> the future in subfolders that do not yet exist. >> >> Also, how can I confirm whether a file/folder has a NOCOW attribute set >> on it? > > The C attribute is also inherited by newly created directories. But keep > in mind that, at the time applied, it only has effects on existing files > if they are empty (read: never written to yet). Newly created files will > inherit the attribute from its directory and then behave as expected. > > You can use lsattr to confirm the C attribute was set. But again keep in > mind: it does not reflect the file is actually nocow because of the > above caveat. Excellent reply (including what I snipped). I don't actually work with VMs or other huge internal-write files much here, and don't otherwise work with extended attributes much, so would have had to lookup lsattr, and wasn't actually sure on the nested subdirs inheritance point myself tho I thought it /should/ work that way. And your chattr/rsync routine ensures all data will be newly copied in AFTER the chattr on the dir, thus nicely addressing the very critical point about NEW DATA ONLY coverage I was most worried about communicating correctly. =:^) Which is why I so enjoy mailing lists and newsgroups. Time and again I've seen one person's answer simply not getting the whole job done no matter how mightily they struggle to do so, but because it's a public list/group, someone else steps in with a followup that addresses the gaps left by the first answer. It nicely takes the pressure off any one person to have the "perfect" reply "every" time, as well as benefiting untold numbers of lurkers who now understand something they didn't know before, but may have never thought to ask themselves. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-24 20:27 ` Kai Krakow 2014-01-25 5:09 ` Duncan @ 2014-01-25 13:33 ` Imran Geriskovan 2014-01-25 14:01 ` Martin Steigerwald 1 sibling, 1 reply; 15+ messages in thread From: Imran Geriskovan @ 2014-01-25 13:33 UTC (permalink / raw) To: Kai Krakow; +Cc: linux-btrfs Every write to an SSD block reduces its data retention capability. There are no concrete figures, but it is assumed to be: - 10 years for new devices - 1 year at rated usage (there are much lower figures around). Hence, I would not trade retention time and wear for autodefrag's negligible or minor benefits on an SSD (defragmenting means at least 2x write amplification on the affected fragments). On hard disks, we've experienced temporary freezes (about 10 seconds to 3 minutes) during background autodefrag. Regards, Imran ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-25 13:33 ` Imran Geriskovan @ 2014-01-25 14:01 ` Martin Steigerwald 2014-01-26 17:18 ` Duncan 0 siblings, 1 reply; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 14:01 UTC (permalink / raw) To: Imran Geriskovan; +Cc: Kai Krakow, linux-btrfs Am Samstag, 25. Januar 2014, 15:33:08 schrieb Imran Geriskovan: > Every write to an SSD block reduces its data retention capability. > > There are no concrete figures, but it is assumed to be: > - 10 years for new devices > - 1 year at rated usage (there are much lower figures around). Where do you have these figures from? For the Intel SSD 320 in this ThinkPad T520 I read in the tech specs about a minimum useful life of 5 years at 20 GB of host writes each day. That's 7300 GB a year, or 7.3 TB (I assume the metric system here). According to smartctl it has written 241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 360158 360158 * 32 MiB (hmmm, according to the smartctl output this is MiB), which gives almost 11 TiB (10.99). The SSD is over 2.5 years old, so that's less than 5 TiB a year. That would lie within the range you quote, although the Intel SSD 320 isn't exactly a new device in my eyes. That's with a KDE session with Akonadi and desktop search, sometimes even two KDE sessions, and a load of applications running at times. Anyway, that SSD still thinks it is well *new*: 233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0 That's the same media wearout indicator (which takes into account the number of erase cycles, according to Intel docs) it had on the first day I used it. So I am basically not concerned. While autodefrag may cause additional writes… that would not even be the main reason for me not to use it at the moment. I am just not convinced that it gives any noticeable benefit. And given that, of course it doesn't make sense to me to have it cause additional writes to the SSD. But avoiding those additional writes is not the main reason I leave it off. My most important recommendation regarding SSDs is still: keep some space free. Yes, SSD manufacturers already do this, but in another Intel SSD PDF I saw some graphs that convinced me in an instant that leaving about 20% free is a good idea. But heck, due to the current fill status of this SSD I do not even adhere to my own recommendation at the moment. Then an occasional fstrim, and maybe mount with noatime (because who cares about atime at all?)… Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
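For reference, attribute readouts like the ones quoted above come from smartmontools; something along these lines should reproduce them, assuming the SSD is /dev/sda:

smartctl -A /dev/sda | grep -E 'Host_Writes|Media_Wearout'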
* Re: Options for SSD - autodefrag etc? 2014-01-25 14:01 ` Martin Steigerwald @ 2014-01-26 17:18 ` Duncan [not found] ` <KA9w1n01A0tVtje01A9yLn> 0 siblings, 1 reply; 15+ messages in thread From: Duncan @ 2014-01-26 17:18 UTC (permalink / raw) To: linux-btrfs

Martin Steigerwald posted on Sat, 25 Jan 2014 15:01:13 +0100 as excerpted:

> On Saturday, 25 January 2014, 15:33:08, Imran Geriskovan wrote:
>> Every write to an SSD block reduces its data retention capability.
>>
>> There are no concrete figures, but it is assumed to be
>> - 10 years for new devices
>> - 1 year at rated usage. (There are much lower figures around.)
>
> Where did you get these figures from?
>
> For the Intel SSD 320 in this ThinkPad T520 I read in the tech specs
> about a minimum usable life of 5 years with 20 GB of host writes each
> day. That's 7300 GB a year, or 7.3 TB. I assume the metric system here.

The two of you are talking about two entirely different things.

1) The SSD's limited write-cycle count, which you're talking about, is widely known and must be considered, but with modern wear-leveling it's not a /horrible/ concern under /reasonable/ usage (that is, not constant write/erase as if to benchmark or prove the point). While it's a real issue, I think it has been blown out of proportion, potentially by old-style spinning-rust manufacturers in order to maintain a market when it looked like SSD prices were going to drop to and below spinning-rust prices within a few years (which they didn't do).

I don't remember the exact numbers I saw given at one point, but they were in the context of worry over using an SSD for swap. Suffice it to say that the level of constant writing needed to blow through the write-cycle rating within a feasible swap-usage lifetime of 5 years was well beyond anything most people, even with low memory, would be doing. Once I saw those numbers, I more or less quit /worrying/ about it and started /considering/ it, but in a far more "yes, this is practical to use without excessive worry" context.

2) What (I think) Imran was talking about was something very different, altho somewhat related, which seems to get far less attention: the actual memory-cell on-the-shelf archival data retention lifetime.

For comparison, and to make crystal clear that we're not talking about rewriting: it's well known that commercially pressed CDs have a useful lifetime of perhaps a few decades (15-25 years is what I've seen quoted) if treated /well/ (practically "well" -- still actually using them, not the atmosphere-controlled, file-away-for-a-decade-and-read-once kind of data-archiving "well"), while CD-Rs burnt at their full rated 24x speed may retain their data for only perhaps 2-5 years. Reducing the write speed to, say, 4x can often double or triple that, yielding a very reasonable decade or so of retention at the midline, approaching commercial-press lifetimes of a quarter century or so on the long end.

With the MLC flash memory technology in common current use in SSDs, the cell-data-retention lifetime numbers I've seen are as Imran said: perhaps 10 years powered-off when new, a year at rated write-cycles, and down as far as days or even hours past the rating shortly before cell write failure.

*HOWEVER*, that's *UNPOWERED* data retention time. Flash technology, like DRAM but on a timescale of hours/days/years instead of milliseconds, requires refreshing the cell charge occasionally to maintain state.

Plug in that USB thumbdrive that you've written to a couple of times then forgotten until you find it again several years later, and it'll probably still work.

If the same thumbdrive was used as swap (impractical perhaps, but this is just a thought-experiment example) on a low-memory machine for a year, such that it reached its lifetime write rating, then unplugged and lost for a few years, then found and plugged in to see what's on it, very likely it'd be unreadable.

OTOH, plug that same thumbdrive into an internal USB connector on a regularly used machine, use it as swap for a year, then reconfigure so it's no longer used as swap, but keep it in the machine and keep using the machine regularly, so the thumbdrive continues to receive power but isn't actually used to store anything for a few years. When that machine dies and you're salvaging it before throwing out the dead hulk, and you find that forgotten thumbdrive still plugged into its internal slot, the data from its last use may very well still be readable, because the thumbdrive was regularly powered and the cells recharged the whole time it wasn't otherwise used.

Now apply that same idea to a standard SSD instead of a thumbdrive. But with SSDs still relatively expensive compared to spinning rust, usage where they sit around unpowered for years, or even weeks, just isn't that common. And if the flash (in SSD or thumbdrive form) is regularly powered, the cells recharge and the data should be retained.

So again, as long as SSDs remain more expensive and lower capacity than spinning rust (and as long as capacity doesn't reach petabytes for under $100 at near current data usage, such that the difference in cost is so trivial it ceases to be a factor), they're relatively unlikely to be used for archival storage, where unpowered data retention of under, say, a year would be much of a factor. Sure, if unpowered retention life drops to weeks, someone might go on vacation and not power their work laptop for long enough to be a problem, but as long as unpowered retention remains a year or so at minimum, the issue isn't likely to hit the common person often enough to hit the radar.

Still, as can be seen from Imran's post, it's a real concern for some, perhaps because the technology is new enough and unproven enough that they're worried the numbers aren't actually that good, and that they'll find themselves on the wrong end of an outlier, dead in the water after taking a week off.

But to quote you, admittedly now out of context (since I happened to glance down and see your sentence, just waiting to be quoted in my new context! =:^) :

> So I am basically not concerned.

Particularly since I still have bootable spinning-rust backups at this point in any case. I might lose a few months of work as I'm not exactly keeping those backups current, but the risk is low enough, and the work I'd lose uncritical enough, that it's a risk I'm willing to take...

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <KA9w1n01A0tVtje01A9yLn>]
* Re: Options for SSD - autodefrag etc? [not found] ` <KA9w1n01A0tVtje01A9yLn> @ 2014-01-28 11:41 ` Duncan 0 siblings, 0 replies; 15+ messages in thread From: Duncan @ 2014-01-28 11:41 UTC (permalink / raw) To: KC, linux-btrfs

On Mon, 27 Jan 2014 23:09:55 +0100 KC <impactoria@googlemail.com> wrote:

> I forgot to ask about space_cache. Should it be off on SSD (i.e.
> nospace_cache)?

[I don't see this on the list (which I read and reply to using nntp via gmane.org's list2news service) yet, so I'll reply both to the list and to you directly via mail...]

The default is now space_cache. Formerly a btrfs needed to be mounted with it once, after which the option was "sticky" and applied by default so it didn't need to be given again, and I believe the wiki mount-option documentation at least still says that, but for at least several kernels now the option seems to be on by default -- I never specifically mounted with space_cache here, yet all my btrfs filesystems have it listed in /proc/self/mounts.

So space_cache is now the default unless specifically turned off. And while I don't have a specific reason to use it on SSD, I don't have a good reason not to either, so, not knowing anything specific, I figured I'd be best off sticking with the defaults. So on my btrfs filesystems on SSDs, space_cache is on simply because it's the default and I know no good reason to mess with the defaults in this case.

Actually, I hadn't even thought of it as yet another thing that gets recorded and thus contributes to write cycles. If there were no real benefit to it on SSDs otherwise, I'd guess it would default to off when an SSD is detected, just as the ssd option is automatically turned on in that case -- so presumably there is some benefit.

(In general, btrfs should in most cases be able to detect an SSD if it's on the "bare metal" physical device or a partition on it. If the btrfs is on top of lvm or mdraid, however, or on some other mid-layer virtual device, it's less likely to properly detect the SSD, and you'd likely need to turn that option on manually.)

-- Duncan - No HTML messages please, as they are filtered as spam. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 15+ messages in thread
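A couple of quick checks related to the points above; a minimal sketch, with hypothetical device and mount-point names:

    # which options the mounted btrfs is actually using (space_cache, ssd, ...)
    grep btrfs /proc/self/mounts
    # how the kernel classifies the underlying device (0 = non-rotational, i.e. SSD)
    cat /sys/block/sda/queue/rotational
    # if btrfs sits on top of lvm or mdraid and autodetection fails,
    # the options can be given explicitly at mount time
    mount -o ssd,space_cache /dev/mapper/vg-btrfs /mnt/data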
end of thread, newest message: 2014-01-28 11:41 UTC
Thread overview: 15+ messages
2014-01-23 22:23 Options for SSD - autodefrag etc? KC
2014-01-24 6:54 ` Duncan
2014-01-25 12:54 ` Martin Steigerwald
2014-01-26 21:44 ` Duncan
2014-01-24 20:14 ` Kai Krakow
2014-01-25 13:11 ` Martin Steigerwald
2014-01-25 14:06 ` Kai Krakow
2014-01-25 16:19 ` Martin Steigerwald
-- strict thread matches above, loose matches on Subject: below --
2014-01-24 18:55 KC
2014-01-24 20:27 ` Kai Krakow
2014-01-25 5:09 ` Duncan
2014-01-25 13:33 ` Imran Geriskovan
2014-01-25 14:01 ` Martin Steigerwald
2014-01-26 17:18 ` Duncan
[not found] ` <KA9w1n01A0tVtje01A9yLn>
2014-01-28 11:41 ` Duncan