* Options for SSD - autodefrag etc? @ 2014-01-23 22:23 KC 2014-01-24 6:54 ` Duncan 2014-01-24 20:14 ` Kai Krakow 0 siblings, 2 replies; 15+ messages in thread From: KC @ 2014-01-23 22:23 UTC (permalink / raw) To: linux-btrfs I was wondering whether to use options like "autodefrag" and "inode_cache" on SSDs. On one hand, one always hears that defragmentation of an SSD is a no-no; does that apply to BTRFS's autodefrag? Also, just recently, I heard something similar about "inode_cache". On the other hand, the Arch BTRFS wiki recommends using both options on SSDs: http://wiki.archlinux.org/index.php/Btrfs#Mount_options So to clear things up, I ask at the source, where people should know best. Does using those options on SSDs give any benefit, and does it cause a non-negligible increase in SSD wear? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-23 22:23 Options for SSD - autodefrag etc? KC @ 2014-01-24 6:54 ` Duncan 2014-01-25 12:54 ` Martin Steigerwald 2014-01-24 20:14 ` Kai Krakow 1 sibling, 1 reply; 15+ messages in thread From: Duncan @ 2014-01-24 6:54 UTC (permalink / raw) To: linux-btrfs KC posted on Thu, 23 Jan 2014 23:23:35 +0100 as excerpted: > I was wondering about whether using options like "autodefrag" and > "inode_cache" on SSDs. > > On one hand, one always hears that defragmentation of SSD is a no-no, > does that apply to BTRFS's autodefrag? > Also, just recently, I heard something similar about "inode_cache". > > On the other hand, Arch BTRFS wiki recommends to use both options on > SSDs http://wiki.archlinux.org/index.php/Btrfs#Mount_options > > So to clear things up, I ask at the source where people should know > best. > > Does using those options on SSDs gives any benefits and causes > non-negligible increase in SSD wear? inode_cache is not recommended for general use, tho it can make sense for use-cases such as busy maildir based email servers where there's a lot of small files being constantly written and erased. Additionally, since btrfs is not yet fully stable (tho with kernel 3.13 the kconfig warning for btrfs was officially decreased in severity), my thought is if it's disabled, that's one less feature I have to worry about bugs in, for my filesystems. =:^) So don't enable inode_cache unless you know you need it. autodefrag is an interesting one, and I asked about it too when I was setting up my ssd-backed btrfs filesystems, so good question! =:^) Yes, autodefrag does use up somewhat limited on SSD write cycles, and yes, there's no seek time to worry about on SSDs so fragmentation doesn't hurt as badly as it does on spinning rust. There's still some cost to fragmentation, however -- each file fragment is an IOPS count on access, and while modern SSDs are rated pretty high IOPS, copy-on-write (COW) based filesystems like btrfs can heavily fragment "internally rewritten" (as opposed to written once and never changed, or new data always appended at the end like a log file or streaming media recording) files. We've seen worst-cases internal- rewritten files such as multi-gig VMs reported here, with 100K extents or more! That *WILL* eat up IOPS, even on SSDs, and there's other serious issues with that heavily fragmented a file as well, not least the additional chance of damage to it given all the work btrfs has to do tracking all those extents! But for that large a file, autodefrag isn't really the best option. See a couple paragraphs down for a better one for such large files. There are several COW-triggered fragmentation worst-cases. Perhaps the most common one on a typical desktop is small database files such as the sqlite files used for firefox history, cookies, etc, and this is where the autodefrag mount option really shines and what it was designed for. Larger internal-write files (say half a gig or bigger), particularly highly active ones where file updates may come fast enough rewriting the whole file slows things down, like big active database files, pre- allocated bittorrent download files, or multi-gig VM images, are a rather different problem, and autodefrag doesn't work as well with them. For these, the NOCOW file attribute (set with chattr +C, see the chattr manpage), which with btrfs must be set before data is written into the file, works rather better. 
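As a minimal sketch of that constraint (the path is only an example; the point is that the file must still be empty when the attribute is applied):

touch empty-image.raw          # create the file first, with no data in it yet
chattr +C empty-image.raw      # mark it NOCOW while it is still empty
lsattr empty-image.raw         # the C flag should now show up

Setting +C on a file that already contains data appears to succeed, but the existing extents stay COW, which is why the directory approach described next is usually the more practical route.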
The easiest way to set the attribute before the file is written into is to set it on the containing directory so new files created in it inherit the attribute automatically. So setup your database, VMs, or torrent client to use the same dir for everything, then set +C/NOCOW on that dir before the files are downloaded/created/copied- into-it/whatever. That way, rewrites happen in-place instead of creating a new extent every time some bit of the file changes. Of course another alternative is to use an entirely separate filesystem for your big internal-write files, either something like ext4 that's not COW-based, or btrfs with the NODATACOW mount option set (tho you'd definitely not want to use that for a general purpose btrfs). But back to autodefrag. It's also worth noting that actually doing the install with this option enabled can make a difference too, as apparently a number of popular distro installers trigger fragmentation during their work, leaving even brand new installations heavily fragmented if the install is to btrfs mounted without autodefrag. One more note on fragmentation. filefrag doesn't yet understand btrfs compression, and reports each compression block (128 KiB IIRC) as a separate extent. So if you use compression (I use compress=lzo, here), don't be surprised to see larger files reported as several hundred extents, perhaps a few thousand on gigabyte sized files. If you're worried about it, (manually, btrfs fi defrag) defrag the file and see if the number of reported extents goes down significantly. If it does, the file was fragmented and defragmenting helped. If not, defragmenting didn't help. Anyway, yes, I turned autodefrag on for my SSDs, here, but there are arguments to be made in either direction, so I can understand people choosing not to do that. One not-btrfs specific mount option that's very useful for btrfs, particularly if you're using btrfs snapshotting features, SSD or not, is noatime. While admins have been disabling atime updates for years to get better performance and that's recommended in general unless you run mutt (with other than mbox files) or something else that requires it, given that the exclusive size of a snapshot is the size of the filesystem changes written between it and the previous snapshot, with atime updates on and not a lot of other writes, those atime updates can be a big part of the exclusive size of that snapshot! So disabling them means smaller and more efficient snapshots, particularly if there isn't that much other write activity going on either. -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 15+ messages in thread
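Pulling the mount options discussed in this message together, a purely hypothetical /etc/fstab line for an SSD-backed btrfs volume might look like the following (device and mount point are placeholders, and whether to include autodefrag is exactly the judgement call this thread is about):

/dev/sdaX  /home  btrfs  ssd,noatime,compress=lzo,autodefrag  0  0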
* Re: Options for SSD - autodefrag etc? 2014-01-24 6:54 ` Duncan @ 2014-01-25 12:54 ` Martin Steigerwald 2014-01-26 21:44 ` Duncan 0 siblings, 1 reply; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 12:54 UTC (permalink / raw) To: Duncan; +Cc: linux-btrfs Hi Duncan, Am Freitag, 24. Januar 2014, 06:54:31 schrieb Duncan: > Anyway, yes, I turned autodefrag on for my SSDs, here, but there are > arguments to be made in either direction, so I can understand people > choosing not to do that. Do you have numbers to back up that this gives any advantage? I have it disabled and yet I have things like: Oh, this is insane. This filefrags runs for over a minute already. And hogging on one core eating almost 100% of its processing power. merkaba:/home/martin/.kde/share/apps/nepomuk/repository/main/data/virtuosobackend> /usr/bin/time -v filefrag soprano-virtuoso.db Wow, this still didn´t complete yet – even after 5 minutes. Well I have some files with several ten thousands extent. But first, this is mounted with compress=lzo, so 128k is the largest extent size as far as I know, and second: I did manual btrfs filesystem defragment on files like those and and never ever perceived any noticable difference in performance. Thus I just gave up on trying to defragment stuff on the SSD. Well, now that command completed: soprano-virtuoso.db: 93807 extents found Command being timed: "filefrag soprano-virtuoso.db" User time (seconds): 0.00 System time (seconds): 338.77 Percent of CPU this job got: 98% Elapsed (wall clock) time (h:mm:ss or m:ss): 5:42.81 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 520 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 181 Voluntary context switches: 9978 Involuntary context switches: 1216 Swaps: 0 File system inputs: 150160 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 And this is really quite high. But… I think I have a more pressing issue with that BTRFS /home on an Intel SSD 320 and that is that it is almost full: merkaba:~> LANG=C df -hT /home Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/merkaba-home btrfs 254G 241G 8.5G 97% /home merkaba:~> btrfs filesystem show […] Label: home uuid: […] Total devices 1 FS bytes used 238.99GiB devid 1 size 253.52GiB used 253.52GiB path /dev/mapper/merkaba-home Btrfs v3.12 merkaba:~> btrfs filesystem df /home Data, single: total=245.49GiB, used=237.07GiB System, DUP: total=8.00MiB, used=48.00KiB System, single: total=4.00MiB, used=0.00 Metadata, DUP: total=4.00GiB, used=1.92GiB Metadata, single: total=8.00MiB, used=0.00 Okay, I could probably get back 1,5 GiB on metadata, but whenever I tried a btrfs filesystem balance on any of the BTRFS filesystems on my SSD I usually got the following unpleasant result: Halve of the performance. Like double boot times on / and such. So I have the following thoughts: 1) I am not yet clear whether defragmenting files on SSD will really bring a benefit. 2) On my /home problem is more that it is almost full and free space appears to be highly fragmented. 
Long fstrim times tend to agree with that: merkaba:~> /usr/bin/time fstrim -v /home /home: 13494484992 bytes were trimmed 0.00user 12.64system 1:02.93elapsed 20%CPU (0avgtext+0avgdata 768maxresident)k 192inputs+0outputs (0major+243minor)pagefaults 0swaps 3) Turning autodefrag on might fragment free space even more. 4) I have no clear conclusion on what maintenance other than scrubbing makes sense for BTRFS filesystems on SSDs at all. Everything I tried either did not have any perceivable effect or made things worse. Thus for SSDs, except for scrubbing and the occasional fstrim, I am done with it. For harddisks I enable autodefrag. But still, for now this is only guesswork. I don't have much of a clue about BTRFS filesystem maintenance yet, and I just remember the slogan on the xfs.org wiki: "Use the defaults." With a quote from Donald Knuth: "Premature optimization is the root of all evil." http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E I would love to hear some more or less official words from the BTRFS developers on that. But for now I think one of the best optimizations would be to complement that 300 GB Intel SSD 320 with a 512 GB Crucial m5 mSATA SSD or some Intel mSATA SSDs (but these cost twice as much), and make more free space on /home again. For data that is critical regarding safety and amount of accesses I could even use BTRFS RAID 1 then. All those MP3s and photos I could place on the bigger mSATA SSD. Granted, an SSD is definitely not needed for those, but it is just more silent. I never realized how loud even a tiny 2.5 inch laptop drive is until I switched an external one on while using this ThinkPad T520 with its SSD. For the first time I heard the harddisk clearly. Thus I'd prefer an SSD anyway. Still, even with that highly full filesystem, performance is pretty nice here, except for some bursts of btrfs-delalloc kernel thread activity once in a while. Especially when I fill it even a bit more, BTRFS has trouble finding free space on this partition. I saw that thread being active for half a minute without much else happening on BTRFS. Thus I really think it's good to get it back to at least 20-30 GiB free. Well, I could still add about 13 GiB to it if I get rid of a 10 GiB volume used for testing out SSD caching. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
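For the "occasional fstrim", one low-effort approach is a weekly cron script; a rough sketch, assuming /home is the btrfs mount in question:

#!/bin/sh
# /etc/cron.weekly/fstrim-home -- hypothetical path and name
fstrim -v /home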
* Re: Options for SSD - autodefrag etc? 2014-01-25 12:54 ` Martin Steigerwald @ 2014-01-26 21:44 ` Duncan 0 siblings, 0 replies; 15+ messages in thread From: Duncan @ 2014-01-26 21:44 UTC (permalink / raw) To: linux-btrfs Martin Steigerwald posted on Sat, 25 Jan 2014 13:54:40 +0100 as excerpted: > Hi Duncan, > > Am Freitag, 24. Januar 2014, 06:54:31 schrieb Duncan: >> Anyway, yes, I turned autodefrag on for my SSDs, here, but there are >> arguments to be made in either direction, so I can understand people >> choosing not to do that. > > Do you have numbers to back up that this gives any advantage? Your post (like some of mine) reads like a stream of consciousness more than a well organized post, making it somewhat difficult to reply to (I guess I'm now experiencing the pain others sometimes mention when trying to reply to some of mine). However, I'll try... I haven't done benchmarks, etc, nor do I have them at hand to quote, if that's what you're asking for. But of course I did say I understand the arguments made by both sides, and just gave the reasons why I made the choice I did, here. What I /do/ have is the multiple post here on this list from people complaining about pathologic[1] performance issues due to large-internal- written-file fragmentation even on SSDs, particularly so when interacting with non-trivial numbers of snapshots as well. That's a case that at present simply Does. Not. Scale. Period! Of course the multi-gig internal-rewritten-file case is better suited to the NOCOW extended attribute than to autodefrag, but anyway... > I have it disabled and yet I have things like: > > Oh, this is insane. This filefrags runs for over [five minutes] > already. And hogging on one core eating almost 100% of its processing > power. > /usr/bin/time -v filefrag soprano-virtuoso.db > Well, now that command completed: > > soprano-virtuoso.db: 93807 extents found > Command being timed: "filefrag soprano-virtuoso.db" > User time (seconds): 0.00 > System time (seconds): 338.77 > Percent of CPU this job got: 98% > Elapsed (wall clock) time (h:mm:ss or m:ss): 5:42.81 I don't see any mention of the file size. I'm (informally) collecting data on that sort of thing ATM, since it's exactly the sort of thing I was referring to, and I've seen enough posts on the list about it to have caught my interest. FWIW I'll guess something over a gig, perhaps 2-3 gigs... Also FWIW, while my desktop of choice is indeed KDE, I'm running gentoo, and turned off USE=semantic-desktop and related flags some time ago (early kde 4.7, so over 2.5 years ago now), entirely purging nepomuk, virtuoso, etc, from my system. That was well before I switched to btrfs, but the performance improvement from not just turning it off at runtime (I already had it off at runtime) but entirely purging it from my system was HUGE, I mean like clean all the malware off an MS Windows machine and see how much faster it runs HUGE, *WELL* more than I expected! (I had /expected/ just to get rid of a few packages that I'd no longer have to update, little or no performance improvement at all, since I already the data indexing, etc, turned off to the extent that I could, at runtime. Boy was I surprised, but in a GOOD way! =:^) Anyway, because I have that stuff not only disabled at runtime but entirely turned off at build time and purged from the system as well, I don't have such a database file available here to compare with yours. 
But I'd certainly be interested in knowing how big yours actually was, since I already have both the filefrag report on it, and your complaint about how long it took filefrag to compile that information and report back. > Well I have some files with several ten thousands extent. But first, > this is mounted with compress=lzo, so 128k is the largest extent size as > far as I know Well, you're mounting with compress=lzo (which I'm using too, FWIW), not compress-force=lzo, so btrfs won't try to compress it if it thinks it's already compressed. Unfortunately, I believe there's no tool to report on whether btrfs has actually compressed the file or not, and as you imply filefrag doesn't know about btrfs compression yet, so just running the filefrag on a file on a compress=lzo btrfs doesn't really tell you a whole lot. =:^( What you /could/ do (well, after you've freed some space given your filesystem usage information below, or perhaps to a different filesystem) would be copy the file elsewhere, using reflink=no just to be sure it's actually copied, and see what filefrag reports on the new copy. Assuming enough free space btrfs should write the new file as a single extent, so if filefrag reports a similar number of extents on the new copy, you'll know it's compression related, while if it reports only one or a small handful of extents, you'll know the original wasn't compressed and it's real fragmentation. It would also be interesting to know how long a filefrag on the new file takes, as compared to the original, but in ordered to get an apples to apples comparison, you'd have to either drop-caches before doing the filefrag on the new one, or reboot, since after the copy it'd be cached, while the 5+ minute time on the original above was presumably with very little of the file actually cached. And of course you could temporarily mount without the compress=lzo option and do the copy, if you find it is the compression triggering the extents report from filefrag, just to see the difference compression makes. Or similarly, you could mount with compress-force=lzo and try it, if you find btrfs isn't compressing the file with ordinary compress=lzo, again to see the difference that makes. > and second: I did manual btrfs filesystem defragment on > files like those and and never ever perceived any noticable difference > in performance. > > Thus I just gave up on trying to defragment stuff on the SSD. I still say it'd be interesting to see the (from cold-cache) filefrag report and timing on a fresh copy, compared to the 5 minute plus timing above. > And this is really quite high. > But… I think I have a more pressing issue with that BTRFS /home > on an Intel SSD 320 and that is that it is almost full: > > merkaba:~> LANG=C df -hT /home > Filesystem Type Size Used Avail Use% Mounted on > /dev/mapper/merkaba-home btrfs 254G 241G 8.5G 97% /home Yeah, that's uncomfortably close to full... (FWIW, it's also interesting comparing that to a df on my /home... $>> df . Filesystem 2M-blocks Used Available Use% Mounted on /dev/sda6 20480 12104 7988 61% /h As you can see I'm using 2M blocks (alias df=df -B2M), but the filesystem is raid1 both data and metadata, so the numbers would be double and the 2M blocks are thus 1M block equivalent. (You can also see that I've actually mounted it on /h, not /home. /home is actually a symlink to /h just in case, but I export HOME=/h/whatever, and most programs honor that.) So the partition size is 20480 MiB or 20.0 GiB, with ~12+ GiB used, just under 8 GiB available. 
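A rough sketch of the copy-and-compare test described a few paragraphs up (the target path is only an example; it should sit on a btrfs mounted with the same compress option and have enough free space for a full copy, and drop_caches needs root):

cp --reflink=never soprano-virtuoso.db /mnt/scratch/copy.db   # force a real data copy; an older cp without --reflink=never copies the data anyway
sync
echo 3 > /proc/sys/vm/drop_caches                             # cold cache, for a fair comparison
/usr/bin/time -v filefrag /mnt/scratch/copy.db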
It can be and is so small because I have a dedicated media partition with all the big stuff located elsewhere (still on reiserfs on spinning rust, as a matter of fact). Just interesting to see how people setup their systems differently, is all, thus the "FWIW". But the small independent partitions do make for much shorter balance times, etc! =:^) > merkaba:~> btrfs filesystem show […] > Label: home uuid: […] > Total devices 1 FS bytes used 238.99GiB > devid 1 size 253.52GiB used 253.52GiB path [...] > > Btrfs v3.12 > > merkaba:~> btrfs filesystem df /home > Data, single: total=245.49GiB, used=237.07GiB > System, DUP: total=8.00MiB, used=48.00KiB > System, single: total=4.00MiB, used=0.00 > Metadata, DUP: total=4.00GiB, used=1.92GiB > Metadata, single: total=8.00MiB, used=0.00 It has come up before on this list and doesn't hurt anything, but those extra system-single and metadata-single chunks can be removed. A balance with a zero usage filter should do it. Something like this: btrfs balance start -musage=0 That will act on metadata chunks with usage=0 only. It may or may not act on the system chunk. Here it does, and metadata implies system also, but someone reported it didn't, for them. If it doesn't... btrfs balance start -f -susage=0 ... should do it. (-f=force, needed if acting on system chunk only.) https://btrfs.wiki.kernel.org/index.php/Balance_Filters (That's for the filter info, not well documented in the manpage yet. The manpage documents btrfs balance fairly well tho, other than that.) Anyway... 252 gigs used of 252 total in filesystem show. That's full enough you may not even be able to balance as there's no unallocated blocks left to allocate for the balance. But the usage=0 thing may get you a bit of room, after which you can try usage=1, etc, to hopefully recover a bit more, until you get at least /some/ unallocated space as a buffer to work with. Right now, you're risking being unable to allocate anything more when data or metadata runs out, and I'd be worried about that. > Okay, I could probably get back 1,5 GiB on metadata, but whenever I > tried a btrfs filesystem balance on any of the BTRFS filesystems on my > SSD I usually got the following unpleasant result: > > Halve of the performance. Like double boot times on / and such. That's weird. I wonder why/how, unless it's simply so full an SSD that the firmware's having serious trouble doing its thing. I know I've seen nothing like that on my SSDs. But then again, my usage is WILDLY different, with my largest partition 24 gigs, and only about 60% of the SSD even partitioned at all because I keep the big stuff like media files on spinning rust (and reiserfs, not btrfs), so the firmware has *LOTS* of room to shuffle blocks around for write-cycle balancing, etc. And of course I'm using a different brand SSD. (FWIW, Corsair Neutron 256 GB, 238 GiB, *NOT* the Neutron GTX.) But if anything, Intel SSDs have a better rep than my Corsair Neutrons do, so I doubt that has anything to do with it. > So I have the following thoughts: > > 1) I am not yet clear whether defragmenting files on SSD will really > bring a benefit. Of course that's the question of the entire thread. As I said, I have it turned on here, but I understand the arguments for both sides, and from here that question does appear to remain open for debate. One other related critical point while we're on the subject. A number of people have reported that at least for some distros installed to btrfs, brand new installs are coming up significantly fragmented. 
Apparently some distros do their install to btrfs mounted without autodefrag turned on. And once there's existing fragmentation, turning on autodefrag /then/ results in a slowdown for several boot cycles, as normal usage detects and queues for defrag, then defrags, all those already fragmented files. There's an eventual speedup (at least on spinning rust, SSDs of course are open to question, thus this thread), but the system has to work thru the existing backlog of fragmentation before you'll see it. Of course one way out of that (temporary but sometimes several days) pain is to deliberately run a btrfs defrag recursive (new enough btrfs has a recursive flag, previous to that, one had to play some tricks with find, as documented on the wiki) on the entire filesystem. That will be more intense pain, but it'll be over faster! =:^) The point being, if a reader is considering autodefrag, be SURE and turn it on BEFORE there's a whole bunch of already fragmented data on the filesystem. Ideally, turn it on for the first mount after the mkfs.btrfs, and never mount without it. That ensures there's never a chance for fragmentation to get out of hand in the first place. =:^) (Well, with the additional caveat that the NOCOW extended attribute is used appropriately on internal-rewrite files such as VM images, databases, bittorrent preallocations, etc, when said file approaches a gig or larger. But that is discussed elsewhere.) > 2) On my /home problem is more that it is almost full and free space > appears to be highly fragmented. Long fstrim times speak tend to agree > with it: > > merkaba:~> /usr/bin/time fstrim -v /home > /home: 13494484992 bytes were trimmed > 0.00user 12.64system 1:02.93elapsed 20%CPU Some people wouldn't call a minute "long", but yeah, on an SSD, even at several hundred gig, that's definitely not "short". It's not well comparable because as I explained, my partition sizes are so much smaller, but for reference, a trim on my 20-gig /home took a bit over a second. Doing the math, that'd be 10-20 seconds for 200+ gigs. That you're seeing a minute, does indeed seem to indicate high free-space fragmentation. But again, I'm at under 60% SSD space even partitioned, so there's LOTS of space for the firmware to do its management thing. If your SSD is 256 gig as mine, with 253+ gigs used (well, I see below it's 300 gig, but still...) ... especially if you're not running with the discard mount option (which could be an entire thread of its own, but at least there's some official guidance on it), that firmware could be working pretty hard indeed with the resources it has at its disposal! I expect you'd see quite a difference if you could reduce that to say 80% partitioned and trim the other 20%, giving the firmware a solid 20% extra space to work with. If you could then give btrfs some headroom on the reduced size partition as well, well... > 3) Turning autodefrag on might fragment free space even more. Now, yes. As I stressed above, turn it on when the filesystem's new, before you start loading it with content, and the story should be quite different. Don't give it a chance to fragment in the first place. =:^) > 4) I have no clear conclusion on what maintenance other than scrubbing > might make sense for BTRFS filesystems on SSDs at all. Everything I > tried either did not have any perceivable effect or made things worse. Well, of course there's backups. Given that btrfs isn't fully stabilized yet and there are still bugs being worked out, those are *VITAL* maintenance! 
=:^) Also, for the same reason (btrfs isn't yet fully stable), I recently refreshed and double-checked my backups, then blew away the existing btrfs with a fresh mkfs.btrfs and restored from backup. The brand new filesystems now make use of several features that the older ones didn't have, including the new 16k nodesize default. =:^) For anyone who has been running btrfs for awhile, that's potentially a nice improvement. I expect to do the same thing at least once more, later on after btrfs has settled down to more or less routine stability, just to clear out any remaining not-fully-stable-yet corner-cases that may eventually come back to haunt me if I don't, as well as to update the filesystem to take advantage of any further format updates between now and then. That's useful btrfs maintenance, SSD or no SSD. =:^) > Thus for SSD except for the scrubbing and the occasional fstrim I be > done with it. > > For harddisks I enable autodefrag. > > But still for now this is only guess work. I don´t have much clue on > BTRFS filesystems maintenance yet and I just remember the slogan on > xfs.org wiki: > > "Use the defaults." =:^) > I would love to hear some more or less official words from BTRFS > filesystem developers on that. But for know I think one of the best > optimizations would be to complement that 300 GB Intel SSD 320 with a > 512 GB Crucial m5 mSATA SSD or some Intel mSATA SSDs (but these cost > twice as much), and make more free space on /home again. For criticial > data regarding data safety and amount of accesses I could even use BTRFS > RAID 1 then. Indeed. I'm running btrfs raid1 mode with my ssds (except for /boot, where I have a separate one configured on each drive, so I can grub install update one and test it before doing the other, without endangering my ability to boot off the other should something go wrong). > All those MPEG3 and photos I could place on the bigger > mSATA SSD. Granted a SSD is definately not needed for those, but it is > just more silent. I never got how loud even a tiny 2,5 inch laptop drive > is, unless I switched one external on while using this ThinkPad T520 > with SSD. For the first time I heard the harddisk clearly. Thus I´d > prefer a SSD anyway. Well, yes. But SSDs cost money. And at least here, while I could justify two SSDs in raid1 mode for my critical data, and even overprovision such that I have nearly 50% available space entirely unpartitioned, I really couldn't justify spending SSD money on gigs of media files. But as they say, YMMV... --- [1] Pathologic: THAT is the word I was looking for in several recent posts, but couldn't remember, not "pathetic", "pathologic"! But all I could think of was pathetic, and I knew /that/ wasn't what I wanted, so explained using other words instead. So if you see any of my other recent posts on the issue and think I'm describing a pathologic case using other words, it's because I AM! -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 15+ messages in thread
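As a sketch of that backup-and-recreate cycle (device names, label and backup location are placeholders, and mkfs.btrfs destroys whatever is on the target, so verify the backup first):

rsync -aHAX --delete /home/ /mnt/backup/home/      # refresh the backup
umount /home
mkfs.btrfs -f -L home /dev/sdaX                    # recreate with current format defaults (e.g. the 16k nodesize)
mount -o ssd,noatime,compress=lzo /dev/sdaX /home
rsync -aHAX /mnt/backup/home/ /home/               # restore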
* Re: Options for SSD - autodefrag etc? 2014-01-23 22:23 Options for SSD - autodefrag etc? KC 2014-01-24 6:54 ` Duncan @ 2014-01-24 20:14 ` Kai Krakow 2014-01-25 13:11 ` Martin Steigerwald 1 sibling, 1 reply; 15+ messages in thread From: Kai Krakow @ 2014-01-24 20:14 UTC (permalink / raw) To: linux-btrfs KC <conrad.francois.artus@googlemail.com> schrieb: > I was wondering about whether using options like "autodefrag" and > "inode_cache" on SSDs. > > On one hand, one always hears that defragmentation of SSD is a no-no, > does that apply to BTRFS's autodefrag? > Also, just recently, I heard something similar about "inode_cache". > > On the other hand, Arch BTRFS wiki recommends to use both options on SSDs > http://wiki.archlinux.org/index.php/Btrfs#Mount_options > > So to clear things up, I ask at the source where people should know best. > > Does using those options on SSDs gives any benefits and causes > non-negligible increase in SSD wear? I'm not an expert, but I wondered myself. And while I still have not SSD yet I would prefer turning autodefrag on even for SSD - at least when I have no big write-intensive files on the device (but you should plan your FS to not have those on a SSD anyways) because btrfs may rewrite large files on SSD just for the purpose of autodefragging. I hope that will improve soon, maybe by only defragging parts of the file given some sane thresholds. Why I decided I would turn it on? Well, heavily fragmented files give a performance overhead, and btrfs tends to fragment files fast (except for the nodatacow mount flag with its own downsides). An adaptive online defrag ensures you gain no performance loss due to very scattert extents. And: Fragmented files (or let's better say fragmented free space) increases write-amplification (at least for long-living filesystems) because when small amounts of free space are randomly scattered all over the device the filesystem has to fill these holes at some point in time. This decreases performance because it has to find these holes and possibly split batched write requests, and it potentially decreases life-time of your SSD because the read-modify-write-erase cycle takes action in more places than what would be needed if the free space hole had just been big enough. I don't know how big erase blocks [*] are - but think about it. You will come to the conclusion that it will reduce life-time. So it is generally recommended to defragment heavily fragmented files, leave alone the not-so-heavily fragmented files and coalesce free space holes into bigger free space areas on a regular basis. I think, an effective algorithm could coalesce free space into bigger areas of freespace and as a side effect simply defragment those files whose parts had to be moved anyways to merge free space. During this process, a trim should be applied. I wonder if btrfs will optimize for this use case in the future... All in all, I'd say: Defragmenting a SSD is not that bad if done right, and if done right it will even improve life-time and performance. And I believe this is why the wiki recommends it. I'd recommend combining it with compress=lzo or maybe even compress-force=lzo (unless your SSD firmware does compression) - it should give a performance boost and reduces writes to your SSD. YMMV - so do your (long-term) benchmarking. If performance and life-time is a really big concern then only partition and ever use 75% of your device and leave the rest of it untouched so it can be used as spare area for wear-levelling [**]. 
It will give you a good long- term performance and should increase life-time. [*] Erase blocks are usually much much bigger than the block size you can read and write data at. Flash memory cannot be overwritten, it is essentially write-once-read-many, so it needs to be erased. This is where the read-modify-write-erase cycle comes from and why wear-leveling is needed: Read the whole erase block, modify it with your data block, write it to a new location, erase and free the old block. So you see: Writing just 4k can result in (128k-4k) read, 128k written, 128k erased (so something like a write-amplification factor of 64), given an erase block size of 128k. Do this a lot and randomly scattered, and performance and life-time will suffer a lot. The SSD firmware will try to buffer as much data as possible before the read-modify-write-erase-cycle kicks in to decrease the bad effects of random writes. So a block-sorting scheduler (deadline instead of noop) and increasing nr_requests may be a good idea. This is also why you may want to look into filesystems that turn random writes into sequential writes like f2fs or why you may want to use bcache which also turns random writes into sequential writes for the cache device (your SSD). [**] This ([*]) is why you should keep a spare area... These are just my humble thoughts. You see: The topic may be a lot more complex than just saying "use noop scheduler" and "SSD needs no defragmentation". I think those statements are just plain wrong. -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 15+ messages in thread
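A sketch of the scheduler and nr_requests tuning mentioned above (assuming the SSD shows up as sda; this needs root, does not persist across reboots, and whether it actually helps depends on the workload):

echo deadline > /sys/block/sda/queue/scheduler
echo 512 > /sys/block/sda/queue/nr_requests
cat /sys/block/sda/queue/scheduler     # the active scheduler is shown in brackets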
* Re: Options for SSD - autodefrag etc? 2014-01-24 20:14 ` Kai Krakow @ 2014-01-25 13:11 ` Martin Steigerwald 2014-01-25 14:06 ` Kai Krakow 0 siblings, 1 reply; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 13:11 UTC (permalink / raw) To: Kai Krakow; +Cc: linux-btrfs Am Freitag, 24. Januar 2014, 21:14:21 schrieben Sie: > KC <conrad.francois.artus@googlemail.com> schrieb: > > I was wondering about whether using options like "autodefrag" and > > "inode_cache" on SSDs. > > > > > > > > On one hand, one always hears that defragmentation of SSD is a no-no, > > does that apply to BTRFS's autodefrag? > > Also, just recently, I heard something similar about "inode_cache". > > > > > > > > On the other hand, Arch BTRFS wiki recommends to use both options on SSDs > > > > http://wiki.archlinux.org/index.php/Btrfs#Mount_options > > > > > > So to clear things up, I ask at the source where people should know best. > > > > > > > > Does using those options on SSDs gives any benefits and causes > > non-negligible increase in SSD wear? > > I'm not an expert, but I wondered myself. And while I still have not SSD > yet I would prefer turning autodefrag on even for SSD - at least when I > have no big write-intensive files on the device (but you should plan your > FS to not have those on a SSD anyways) because btrfs may rewrite large > files on SSD just for the purpose of autodefragging. I hope that will > improve soon, maybe by only defragging parts of the file given some sane > thresholds. > > Why I decided I would turn it on? Well, heavily fragmented files give a > performance overhead, and btrfs tends to fragment files fast (except for > the nodatacow mount flag with its own downsides). An adaptive online > defrag ensures you gain no performance loss due to very scattert extents. > And: Fragmented files (or let's better say fragmented free space) increases > write-amplification (at least for long-living filesystems) because when > small amounts of free space are randomly scattered all over the device the > filesystem has to fill these holes at some point in time. This decreases > performance because it has to find these holes and possibly split batched > write requests, and it potentially decreases life-time of your SSD because > the read-modify-write-erase cycle takes action in more places than what > would be needed if the free space hole had just been big enough. I don't > know how big erase blocks [*] are - but think about it. You will come to > the conclusion that it will reduce life-time. Do you have any numbers to back your claim? I just demonstrated that >90000 extent Nepomuk database file. And still I do not see any serious performance degradation in KDE´s desktop search. For example I just entered nodatacow in Alt-F2 krunner text input and it presented me some indexed mails in an instant. I tried to defrag the file, but frankly even though numbers of extent decreased I never perceived any difference in performance whatsoever. I am just not convinced that autodefrag will give me any noticeable benefit for this Intel SSD 320 based /home. For seeing any visible difference I think you need to have an I/O pattern that generated lots of IOPS due to the fragmented file, i.e. 
one that continuously reads and writes large amounts of the fragmented data, yet despite those >90000 extents I get: merkaba:/home/martin/.kde/share/apps/nepomuk/repository/main/data/virtuosobackend> echo 3 > /proc/sys/vm/drop_caches ; /usr/bin/time -v dd if=soprano-virtuoso.db of=/dev/null bs=1M 2418+0 records in 2418+0 records out 2535456768 bytes (2.5 GB) copied, 13.9546 s, 182 MB/s Command being timed: "dd if=soprano-virtuoso.db of=/dev/null bs=1M" User time (seconds): 0.00 System time (seconds): 2.77 Percent of CPU this job got: 19% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.96 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2000 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 2 Minor (reclaiming a frame) page faults: 549 Voluntary context switches: 9369 Involuntary context switches: 57 Swaps: 0 File system inputs: 5102304 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 So even if I read in the full 2.4 GiB, where BTRFS has to look up all the >90000 extents, I get 182 MB/s. (I disabled Nepomuk during that test.) Okay, I have seen 260 MB/s. But frankly I am pretty sure that Virtuoso isn't doing this kind of large scale I/O on a highly fragmented file. It's a database. It's random access. My opinion is that Virtuoso couldn't care less about the fragmentation of the file, as long as it is stored on the SSD. Well… take this with a caveat. This is LZO compressed, so those 2.4 GiB / 128 KiB already give at least about 20000 extents, provided that my calculation is correct. And these extents could be sequential (I doubt it though, also given the high free space fragmentation I suspect on this FS). Anyway: I do not perceive any noticeable performance issues due to file fragmentation on SSD, and I think that at least on a highly filled BTRFS filesystem autodefrag may do more harm than good (like fragmenting free space and then letting btrfs-delalloc go crazy on new allocations). I know xfs_fsr for defragmenting XFS in the background, even via cron job. And I think I remember Dave Chinner saying in some post that even for harddisks it may not be a very wise idea to run this frequently, due to the risk of fragmenting free space. There are several kinds of fragmentation, and defragmenting files may increase free space fragmentation. Thus, I am not yet convinced regarding autodefrag on SSDs. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-25 13:11 ` Martin Steigerwald @ 2014-01-25 14:06 ` Kai Krakow 2014-01-25 16:19 ` Martin Steigerwald 0 siblings, 1 reply; 15+ messages in thread From: Kai Krakow @ 2014-01-25 14:06 UTC (permalink / raw) To: linux-btrfs Martin Steigerwald <Martin@lichtvoll.de> schrieb: > Okay, I have seen 260 MB/s. But frankly I am pretty sure that Virtuoso > isn´t doing this kind of large scale I/O on a highly fragmented file. Its > a database. Its random access. My oppinion is that Virtuoso couldn´t care > less about the fragmentation of the file. As long as it is stored on the > SSD. I think it makes no real difference here since access to virtuoso is random anyway. And if I got you right you run it nocow, so upon writes you aren't introducing more fragmentation to the file. All is good... It probably would even be good with cow as virtuoso is read-most, so rarely written to. For VM images it might be a whole different story. The guest system sees a block device and expects it to be continuous. All optimizations for access patterns cannot work right if btrfs is constantly moving parts of the file around for doing cow. So make it nocow and all should be as good as it can get. > Well… take this with caveat. This is LZO compressed, those 2,4 GiB / 128 > KiB gives at least about 20000 extents already provided that my > calculation is correct. And these extents could be sequential (I doubt it > tough also give the high free space fragmention I suspect to be on this > FS). Your CPU is more mighty than the flash chips. LZO improves read performance. But does it make sense on Intel drives? I think they already do compression. > Anyway: I do not perceive any noticable performance issues due to file > fragmentation on SSD and think that at least on highly filled BTRFS > filesystem autodefrag may do more harm than good (like fragment free space > and then let btrfs-delalloc go crazy on new allocations). I know xfs_fsr > for defragmenting XFS in the background, even via cron job. And I think I > remember Dave Chinner telling in some post that even for harddisks it may > not be a very wise idea to run this frequently due to the risk to fragment > free space. > > There are several kinds of fragmentations. And defragmenting files may > increase freespace fragmentation. This is why I wondered if btrfs will be optimized for keeping free space together in the future for SSD. But it's not as simple as this. It should not scatter file blocks all over the disk just to fill tiny holes. It should try to keep file blocks together so the read-modify-write-erase cycle of SSDs can work optimally. > Thus, I am not yet convinced regarding autodefrag on SSDs. I think everything would be easier if btrfs exposed some stats about what the autodefrag thread is really doing... -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-25 14:06 ` Kai Krakow @ 2014-01-25 16:19 ` Martin Steigerwald 0 siblings, 0 replies; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 16:19 UTC (permalink / raw) To: Kai Krakow; +Cc: linux-btrfs Am Samstag, 25. Januar 2014, 15:06:24 schrieb Kai Krakow: > Martin Steigerwald <Martin@lichtvoll.de> schrieb: > > Okay, I have seen 260 MB/s. But frankly I am pretty sure that Virtuoso > > isn´t doing this kind of large scale I/O on a highly fragmented file. Its > > a database. Its random access. My oppinion is that Virtuoso couldn´t care > > less about the fragmentation of the file. As long as it is stored on the > > SSD. > > I think it makes no real difference here since access to virtuoso is random > anyway. And if I got you right you run it nocow, so upon writes you aren't > introducing more fragmentation to the file. All is good... It probably would > even be good with cow as virtuoso is read-most, so rarely written to. No, its not nocow. > For VM images it might be a whole different story. The guest system sees a > block device and expects it to be continuous. All optimizations for access > patterns cannot work right if btrfs is constantly moving parts of the file > around for doing cow. So make it nocow and all should be as good as it can > get. I have some VirtualBox based VMs. I never see any issue with that. They are just fast. But then, for write based workloads I read hints that Virtualbox may not honor fsync() that closely. > > Well… take this with caveat. This is LZO compressed, those 2,4 GiB / 128 > > KiB gives at least about 20000 extents already provided that my > > calculation is correct. And these extents could be sequential (I doubt it > > tough also give the high free space fragmention I suspect to be on this > > FS). > > Your CPU is more mighty than the flash chips. LZO improves read performance. > But does it make sense on Intel drives? I think they already do > compression. Not the Intel SSD 320 to my knowledge. > > Anyway: I do not perceive any noticable performance issues due to file > > fragmentation on SSD and think that at least on highly filled BTRFS > > filesystem autodefrag may do more harm than good (like fragment free space > > and then let btrfs-delalloc go crazy on new allocations). I know xfs_fsr > > for defragmenting XFS in the background, even via cron job. And I think I > > remember Dave Chinner telling in some post that even for harddisks it may > > not be a very wise idea to run this frequently due to the risk to fragment > > free space. > > > > There are several kinds of fragmentations. And defragmenting files may > > increase freespace fragmentation. > > This is why I wondered if btrfs will be optimized for keeping free space > together in the future for SSD. But it's not as simple as this. It should > not scatter file blocks all over the disk just to fill tiny holes. It should > try to keep file blocks together so the read-modify-write-erase cycle of > SSDs can work optimally. I am reluctant about conclusions about the behavior or SSDs. I am not sure whether a modern SSDs cares that much about scattering file blocks all over the disk. AFAIK all modern SSDs don´t tell the OS a thing about in which erase block they store something and all SSDs use some caching. So a modern SSD may just sort several write accesses even if there are at different ends of the block device together into adjacent erase blocks. Well, actually I think thats the whole point of SSD firmwares. 
I am pretty much sure that the blocks of the block device that Linux sees are not mapped sequentially to flash chips by the SSD firmware. AFAIK all SSDs have some internal mapping. So I wonder whether it even matters… Heck a SSD firmware even copies over stuff to distribute erase cycles evenly onto all flash chips in the background and whatnot. > > Thus, I am not yet convinced regarding autodefrag on SSDs. > > I think everything would be easier if btrfs exposed some stats about what > the autodefrag thread is really doing... … and if we actually knew how SSD firmwares really behave. But well… regarding autodefrag… I don´t know… my gut feeling is to disable it for SSDs for the reasons I outlined. -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? @ 2014-01-24 18:55 KC 2014-01-24 20:27 ` Kai Krakow 0 siblings, 1 reply; 15+ messages in thread From: KC @ 2014-01-24 18:55 UTC (permalink / raw) To: linux-btrfs >>> From: Duncan <1i5t5.duncan <at> cox.net> >>> Subject: Re: Options for SSD - autodefrag etc? >>> Newsgroups: gmane.comp.file-systems.btrfs >>> Date: 2014-01-24 06:54:31 GMT (11 hours and 44 minutes ago) >>> KC posted on Thu, 23 Jan 2014 23:23:35 +0100 as excerpted: Duncan, thank you for this outstanding explanation. It was very informative and helpful. I only have one follow-up question. I followed your advice on NOCOW for virtualbox images and torrents like so: chattr -v /home/juha/VirtualBox\ VMs/ chattr -RC /home/juha/Downloads/torrent/#unfinished As you can see, i used the recursive flag. However, I do not know whether this will automatically apply to files that will be created in the future in subfolders that do not yet exist. Also, how can I confirm whether a file/folder has a NOCOW attribute set on it? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-24 18:55 KC @ 2014-01-24 20:27 ` Kai Krakow 2014-01-25 5:09 ` Duncan 2014-01-25 13:33 ` Imran Geriskovan 0 siblings, 2 replies; 15+ messages in thread From: Kai Krakow @ 2014-01-24 20:27 UTC (permalink / raw) To: linux-btrfs KC <conrad.francois.artus@googlemail.com> schrieb: > I followed your advice on NOCOW for virtualbox images and torrents like > so: chattr -v /home/juha/VirtualBox\ VMs/ > chattr -RC /home/juha/Downloads/torrent/#unfinished > > As you can see, i used the recursive flag. However, I do not know > whether this will automatically apply to files that will be created in > the future in subfolders that do not yet exist. > > Also, how can I confirm whether a file/folder has a NOCOW attribute set > on it? The C attribute is also inherited by newly created directories. But keep in mind that, at the time applied, it only has effects on existing files if they are empty (read: never written to yet). Newly created files will inherit the attribute from its directory and then behave as expected. You can use lsattr to confirm the C attribute was set. But again keep in mind: it does not reflect the file is actually nocow because of the above caveat. So in your use-case you may want to be sure by doing this (quit all VirtualBox instances beforehand): # mkdir "VirtualBox VMs.new" # chattr +C "VirtualBox VMs.new" # rsync -aSv "VirtualBox VMs"/. "VirtualBox VMs.new"/. # mv "VirtualBox VMs" "VirtualBox VMs.bak" # mv "VirtualBox VMs.new" "VirtualBox VMs" Then ensure everything is working, you can use lsattr to see the C attribute has been inherited. You should immediatly notice the effects of this by seeing better performing IO in VirtualBox (at least this was what I noticed). If everything was copied correctly, you can delete the backups. You could compare md5sums to be sure, of course before running a VM. ;-) -- Replies to list only preferred. ^ permalink raw reply [flat|nested] 15+ messages in thread
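One possible way to do the md5sum comparison mentioned at the end (run it before starting any VM again, so neither copy changes in between):

cd "VirtualBox VMs.bak" && find . -type f -exec md5sum {} + > /tmp/vms.md5
cd "../VirtualBox VMs" && md5sum -c /tmp/vms.md5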
* Re: Options for SSD - autodefrag etc? 2014-01-24 20:27 ` Kai Krakow @ 2014-01-25 5:09 ` Duncan 2014-01-25 13:33 ` Imran Geriskovan 1 sibling, 0 replies; 15+ messages in thread From: Duncan @ 2014-01-25 5:09 UTC (permalink / raw) To: linux-btrfs Kai Krakow posted on Fri, 24 Jan 2014 21:27:19 +0100 as excerpted: > KC <conrad.francois.artus@googlemail.com> schrieb: > >> I followed your advice on NOCOW for virtualbox images and torrents >> [...] >> >> As you can see, i used the recursive flag. However, I do not know >> whether this will automatically apply to files that will be created in >> the future in subfolders that do not yet exist. >> >> Also, how can I confirm whether a file/folder has a NOCOW attribute set >> on it? > > The C attribute is also inherited by newly created directories. But keep > in mind that, at the time applied, it only has effects on existing files > if they are empty (read: never written to yet). Newly created files will > inherit the attribute from its directory and then behave as expected. > > You can use lsattr to confirm the C attribute was set. But again keep in > mind: it does not reflect the file is actually nocow because of the > above caveat. Excellent reply (including what I snipped). I don't actually work with VMs or other huge internal-write files much here, and don't otherwise work with extended attributes much, so would have had to lookup lsattr, and wasn't actually sure on the nested subdirs inheritance point myself tho I thought it /should/ work that way. And your chattr/rsync routine ensures all data will be newly copied in AFTER the chattr on the dir, thus nicely addressing the very critical point about NEW DATA ONLY coverage I was most worried about communicating correctly. =:^) Which is why I so enjoy mailing lists and newsgroups. Time and again I've seen one person's answer simply not getting the whole job done no matter how mightily they struggle to do so, but because it's a public list/group, someone else steps in with a followup that addresses the gaps left by the first answer. It nicely takes the pressure off any one person to have the "perfect" reply "every" time, as well as benefiting untold numbers of lurkers who now understand something they didn't know before, but may have never thought to ask themselves. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-24 20:27 ` Kai Krakow 2014-01-25 5:09 ` Duncan @ 2014-01-25 13:33 ` Imran Geriskovan 2014-01-25 14:01 ` Martin Steigerwald 1 sibling, 1 reply; 15+ messages in thread From: Imran Geriskovan @ 2014-01-25 13:33 UTC (permalink / raw) To: Kai Krakow; +Cc: linux-btrfs Every write to an SSD block reduces its data retention capability. There are no concrete figures, but it is assumed to be: - 10 years for new devices - 1 year at rated usage (there are much lower figures around). Hence, I would not trade retention time and wear for autodefrag's negligible or minor benefits on an SSD (defragmenting means at least 2x write amplification on the affected fragments). On hard disks, we've experienced temporary freezes (about 10 seconds to 3 minutes) during background autodefrag. Regards, Imran ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Options for SSD - autodefrag etc? 2014-01-25 13:33 ` Imran Geriskovan @ 2014-01-25 14:01 ` Martin Steigerwald 2014-01-26 17:18 ` Duncan 0 siblings, 1 reply; 15+ messages in thread From: Martin Steigerwald @ 2014-01-25 14:01 UTC (permalink / raw) To: Imran Geriskovan; +Cc: Kai Krakow, linux-btrfs Am Samstag, 25. Januar 2014, 15:33:08 schrieb Imran Geriskovan: > Every write to an SSD block reduces its data retention capability. > > There are no concrete figures, but it is assumed to be: > - 10 years for new devices > - 1 year at rated usage (there are much lower figures around). Where do you have these figures from? For the Intel SSD 320 in this ThinkPad T520 I read in the tech specs about a minimum useful life of 5 years at 20 GB of host writes each day. That's 7300 GB a year, or 7.3 TB (I assume the metric system here). According to smartctl it has written 241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 360158 360158 * 32 MiB (hmmm, according to the smartctl output this is MiB), which gives almost 11 TiB (10.99). The SSD is over 2.5 years old, so that's less than 5 TiB a year. That would lie within the range you quote, although the Intel SSD 320 isn't exactly a new device in my eyes. That's with a KDE session with Akonadi and desktop search, sometimes even two KDE sessions, and a load of applications running at times. Anyway, that SSD still thinks it is well *new*: 233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0 That's the same media wearout indicator (which takes into account the number of erase cycles, according to Intel docs) it had on the first day I used it. So I am basically not concerned. While autodefrag may cause additional writes… that would not even be the main reason for me not to use it at the moment. I am just not convinced that it gives any noticeable benefit. And given that, of course it doesn't make sense to me to have it cause additional writes to the SSD. But avoiding those additional writes is not the main reason I leave it off. My most important recommendation regarding SSDs is still: keep some space free. Yes, SSD manufacturers already do this, but in another Intel SSD PDF I saw some graphs that convinced me in an instant that leaving about 20% free is a good idea. But heck, due to the current fill status of this SSD I do not even adhere to my own recommendation at the moment. Then an occasional fstrim, and maybe mount with noatime (because who cares about atime at all?)… Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 ^ permalink raw reply [flat|nested] 15+ messages in thread
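For reference, attribute readouts like the ones quoted above come from smartmontools; something along these lines should reproduce them, assuming the SSD is /dev/sda:

smartctl -A /dev/sda | grep -E 'Host_Writes|Media_Wearout'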
* Re: Options for SSD - autodefrag etc? 2014-01-25 14:01 ` Martin Steigerwald @ 2014-01-26 17:18 ` Duncan [not found] ` <KA9w1n01A0tVtje01A9yLn> 0 siblings, 1 reply; 15+ messages in thread From: Duncan @ 2014-01-26 17:18 UTC (permalink / raw) To: linux-btrfs

Martin Steigerwald posted on Sat, 25 Jan 2014 15:01:13 +0100 as excerpted:

> On Saturday, 25 January 2014, 15:33:08, Imran Geriskovan wrote:
>> Every write to an SSD block reduces its data retention capability.
>>
>> There are no concrete figures, but it is assumed to be
>> - 10 years for new devices
>> - 1 year at rated usage. (There are much lower figures around.)
>
> Where did you get these figures from?
>
> For the Intel SSD 320 in this ThinkPad T520 I read in the tech specs
> about a minimum usable life of 5 years with 20 GB of host writes each
> day. That's 7300 GB a year, or 7.3 TB. I assume the metric system here.

The two of you are talking about two entirely different things.

1) The SSD's limited write-cycle count, which you're talking about, is widely known and must be considered, but with modern wear-leveling it's not a /horrible/ concern under /reasonable/ usage (that is, not constant write/erase as if to benchmark or prove the point). While it's a real issue, I think it has been blown out of proportion, potentially by old-style spinning-rust manufacturers in order to maintain a market when it looked like SSD prices were going to drop to and below spinning-rust prices within a few years (which they didn't do).

I don't remember the exact numbers I saw given at one point, but they were in the context of worry over using an SSD for swap. Suffice it to say that the level of constant writing needed to blow through the write-cycle rating within a feasible swap-usage lifetime of 5 years was well beyond anything most people, even with low memory, would be doing. Once I saw those numbers, I more or less quit /worrying/ about it and started /considering/ it, but in a far more "yes, this is practical to use without excessive worry" context.

2) What (I think) Imran was talking about was something very different, altho somewhat related, which seems to get far less attention: the actual memory-cell on-the-shelf archival data retention lifetime.

For comparison, and to make crystal clear that we're not talking about rewriting: it's well known that commercially pressed CDs have a useful lifetime of perhaps a few decades (15-25 years is what I've seen quoted) if treated /well/ (practically "well" -- still actually using them, not the atmosphere-controlled, file-away-for-a-decade-and-read-once kind of data-archiving "well"), while CD-Rs burnt at their full rated 24x speed may retain their data for only perhaps 2-5 years. Reducing the write speed to, say, 4x can often double or triple that, yielding a very reasonable decade or so of retention at the midline, approaching commercial-press lifetimes of a quarter century or so on the long end.

With the MLC flash memory technology in common current use in SSDs, the cell-data-retention lifetime numbers I've seen are as Imran said: perhaps 10 years powered-off when new, a year at rated write-cycles, and down as far as days or even hours past the rating shortly before cell write failure.

*HOWEVER*, that's *UNPOWERED* data retention time. Flash technology, like DRAM but on a timescale of hours/days/years instead of milliseconds, requires refreshing the cell charge occasionally to maintain state.

Plug in that USB thumbdrive that you've written to a couple of times then forgotten until you find it again several years later, and it'll probably still work.

If the same thumbdrive was used as swap (impractical perhaps, but this is just a thought-experiment example) on a low-memory machine for a year, such that it reached its lifetime write rating, then unplugged and lost for a few years, then found and plugged in to see what's on it, very likely it'd be unreadable.

OTOH, plug that same thumbdrive into an internal USB connector on a regularly used machine, use it as swap for a year, then reconfigure so it's no longer used as swap, but keep it in the machine and keep using the machine regularly, so the thumbdrive continues to receive power but isn't actually used to store anything for a few years. When that machine dies and you're salvaging it before throwing out the dead hulk, and you find that forgotten thumbdrive still plugged into its internal slot, the data from its last use may very well still be readable, because the thumbdrive was regularly powered and the cells recharged the whole time it wasn't otherwise used.

Now apply that same idea to a standard SSD instead of a thumbdrive. But with SSDs still relatively expensive compared to spinning rust, usage where they sit around unpowered for years, or even weeks, just isn't that common. And if the flash (in SSD or thumbdrive form) is regularly powered, the cells recharge and the data should be retained.

So again, as long as SSDs remain more expensive and lower capacity than spinning rust (and as long as capacity doesn't reach petabytes for under $100 at near current data usage, such that the difference in cost is so trivial it ceases to be a factor), they're relatively unlikely to be used for archival storage, where unpowered data retention of under, say, a year would be much of a factor. Sure, if unpowered retention life drops to weeks, someone might go on vacation and not power their work laptop for long enough to be a problem, but as long as unpowered retention remains a year or so at minimum, the issue isn't likely to hit the common person often enough to hit the radar.

Still, as can be seen from Imran's post, it's a real concern for some, perhaps because the technology is new enough and unproven enough that they're worried the numbers aren't actually that good, and that they'll find themselves on the wrong end of an outlier, dead in the water after taking a week off.

But to quote you, admittedly now out of context (since I happened to glance down and see your sentence, just waiting to be quoted in my new context! =:^) :

> So I am basically not concerned.

Particularly since I still have bootable spinning-rust backups at this point in any case. I might lose a few months of work as I'm not exactly keeping those backups current, but the risk is low enough, and the work I'd lose uncritical enough, that it's a risk I'm willing to take...

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <KA9w1n01A0tVtje01A9yLn>]
* Re: Options for SSD - autodefrag etc? [not found] ` <KA9w1n01A0tVtje01A9yLn> @ 2014-01-28 11:41 ` Duncan 0 siblings, 0 replies; 15+ messages in thread From: Duncan @ 2014-01-28 11:41 UTC (permalink / raw) To: KC, linux-btrfs

On Mon, 27 Jan 2014 23:09:55 +0100 KC <impactoria@googlemail.com> wrote:

> I forgot to ask about space_cache. Should it be off on SSD (i.e.
> nospace_cache)?

[I don't see this on the list (which I read and reply to using nntp via gmane.org's list2news service) yet, so I'll reply both to the list and to you directly via mail...]

The default is now space_cache. Formerly a btrfs needed to be mounted with it once, after which the option was "sticky" and applied by default so it didn't need to be given again, and I believe the wiki mount-option documentation at least still says that, but for at least several kernels now the option seems to be on by default -- I never specifically mounted with space_cache here, yet all my btrfs filesystems have it listed in /proc/self/mounts.

So space_cache is now the default unless specifically turned off. And while I don't have a specific reason to use it on SSD, I don't have a good reason not to either, so, not knowing anything specific, I figured I'd be best off sticking with the defaults. So on my btrfs filesystems on SSDs, space_cache is on simply because it's the default and I know no good reason to mess with the defaults in this case.

Actually, I hadn't even thought of it as yet another thing that gets recorded and thus contributes to write cycles. If there were no real benefit to it on SSDs otherwise, I'd guess it would default to off when an SSD is detected, just as the ssd option is automatically turned on in that case -- so presumably there is some benefit.

(In general, btrfs should in most cases be able to detect an SSD if it's on the "bare metal" physical device or a partition on it. If the btrfs is on top of lvm or mdraid, however, or on some other mid-layer virtual device, it's less likely to properly detect the SSD, and you'd likely need to turn that option on manually.)

-- Duncan - No HTML messages please, as they are filtered as spam. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 15+ messages in thread
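A couple of quick checks related to the points above; a minimal sketch, with hypothetical device and mount-point names:

    # which options the mounted btrfs is actually using (space_cache, ssd, ...)
    grep btrfs /proc/self/mounts
    # how the kernel classifies the underlying device (0 = non-rotational, i.e. SSD)
    cat /sys/block/sda/queue/rotational
    # if btrfs sits on top of lvm or mdraid and autodetection fails,
    # the options can be given explicitly at mount time
    mount -o ssd,space_cache /dev/mapper/vg-btrfs /mnt/data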
end of thread, newest message: 2014-01-28 11:41 UTC
Thread overview: 15+ messages
2014-01-23 22:23 Options for SSD - autodefrag etc? KC
2014-01-24 6:54 ` Duncan
2014-01-25 12:54 ` Martin Steigerwald
2014-01-26 21:44 ` Duncan
2014-01-24 20:14 ` Kai Krakow
2014-01-25 13:11 ` Martin Steigerwald
2014-01-25 14:06 ` Kai Krakow
2014-01-25 16:19 ` Martin Steigerwald
-- strict thread matches above, loose matches on Subject: below --
2014-01-24 18:55 KC
2014-01-24 20:27 ` Kai Krakow
2014-01-25 5:09 ` Duncan
2014-01-25 13:33 ` Imran Geriskovan
2014-01-25 14:01 ` Martin Steigerwald
2014-01-26 17:18 ` Duncan
[not found] ` <KA9w1n01A0tVtje01A9yLn>
2014-01-28 11:41 ` Duncan