Massive BTRFS performance degradation


* Massive BTRFS performance degradation
@ 2014-03-09  7:48 KC
  2014-03-09  8:17 ` Swâmi Petaramesh
  0 siblings, 1 reply; 32+ messages in thread
From: KC @ 2014-03-09  7:48 UTC (permalink / raw)
  To: linux-btrfs

I am experiencing massive performance degradation on my BTRFS root 
partition on SSD. Except for regular daily updates, nothing changed in 
the system. The mount options remained the same:

/  btrfs rw,noatime,compress=lzo,ssd,space_cache,autodefrag 0 0

but the performance dropped to less than 8% of normal.

Before:

# dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1.57307 s, 683 MB/s

Now:

# dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 26.4373 s, 40.6 MB/s

I created a new btrfs partition on the SSD with the same mount options 
and it is not being affected:

# dd if=/mnt/er/tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1.57634 s, 681 MB/s

I also did
btrfs filesystem balance start /
with no effect.

I tried changing mount options - still no effect.

I'd appreciate some suggestions.
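
For reference, the two dd summaries quoted above can be compared with a few lines of Python. This is only a sketch: the function name `dd_throughput_mb_s` is made up for illustration, and it assumes the GNU dd summary format shown in this message.

```python
import re

def dd_throughput_mb_s(summary: str) -> float:
    """Recompute MB/s (decimal megabytes) from a GNU dd summary line."""
    match = re.search(r"(\d+) bytes .* copied, ([\d.]+) s", summary)
    if match is None:
        raise ValueError("not a dd summary line")
    return int(match.group(1)) / float(match.group(2)) / 1e6

# Summary lines copied from the "Before" and "Now" runs above.
before = "1073741824 bytes (1.1 GB) copied, 1.57307 s, 683 MB/s"
now = "1073741824 bytes (1.1 GB) copied, 26.4373 s, 40.6 MB/s"

ratio = dd_throughput_mb_s(now) / dd_throughput_mb_s(before)
print(f"now/before throughput ratio: {ratio:.3f}")
```

Since both runs read the same number of bytes, the ratio is simply the inverse ratio of the elapsed times.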

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Massive BTRFS performance degradation
  2014-03-09  7:48 Massive BTRFS performance degradation KC
@ 2014-03-09  8:17 ` Swâmi Petaramesh
  2014-03-09 10:01   ` Martin Steigerwald
  2014-03-09 17:36   ` Massive BTRFS performance degradation Austin S Hemmelgarn
  0 siblings, 2 replies; 32+ messages in thread
From: Swâmi Petaramesh @ 2014-03-09  8:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: impactoria

On Sunday, 9 March 2014 08:48:20, KC wrote:
> I am experiencing massive performance degradation on my BTRFS root
> partition on SSD.

BTW, is BTRFS still a SSD-killer ? It had this reputation a while ago, and I'm 
not sure if this still is the case, but I don't dare (yet) converting to BTRFS 
one of my laptops that has a SSD...

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E

A bus station is where buses stop. A train station is where trains stop.
On my desk, there is a workstation...


* Re: Massive BTRFS performance degradation
  2014-03-09  8:17 ` Swâmi Petaramesh
@ 2014-03-09 10:01   ` Martin Steigerwald
  2014-03-09 10:23     ` Swâmi Petaramesh
  2014-03-09 17:36   ` Massive BTRFS performance degradation Austin S Hemmelgarn
  1 sibling, 1 reply; 32+ messages in thread
From: Martin Steigerwald @ 2014-03-09 10:01 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: linux-btrfs, impactoria

On Sunday, 9 March 2014, 09:17:24, Swâmi Petaramesh wrote:
> On Sunday, 9 March 2014 08:48:20, KC wrote:
> > I am experiencing massive performance degradation on my BTRFS root
> > partition on SSD.
> 
> BTW, is BTRFS still a SSD-killer ? It had this reputation a while ago, and
> I'm not sure if this still is the case, but I don't dare (yet) converting
> to BTRFS one of my laptops that has a SSD...

I never heard about this reputation and luckily the Intel SSD 320 didn't
either. It's almost three years old by now:

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9171
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2603
170 Reserve_Block_Count     0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
183 Runtime_Bad_Block       0x0030   100   100   000    Old_age   Offline      -       1
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       225
199 UDMA_CRC_Error_Count    0x0030   100   100   000    Old_age   Offline      -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       393645
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2204244
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       49
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       13145477
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       393645
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       1002465


Media wearout indicator basically says the SSD considers itself to be
"new". The value is the same 100 it had when the SSD was new. The raw value,
though, rose for the first time. On 2013-10-12 it was:

233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0

For more about this indicator, see the Intel PDF about it.


There are some Erase fails that happened, I think, in the first year of
SSD life, but that 169 raw value has so far never risen again.


There have been 393645 * 32 MiB = 12,01 TiB of writes. The SSD itself is
specified to be usable for at least 5 years with 20 GB of host writes each
day. That is about 7,3 TB a year. I assumed GB in the Intel specification
document; if it's GiB, then it's about 7,1 TiB a year.

Anyway with conservative 7 TiB a year or 21 TiB in three years of which
only 12 TiB are used up, I am quite confident that this SSD could last
longer than 5 years.
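
The arithmetic above can be re-checked with a short Python sketch. It assumes, as a labeled assumption, that the rating is 20 GB (decimal) of host writes per day, which is what gives the roughly 7,3 TB a year; the variable names are illustrative only.

```python
MIB = 2**20
TIB = 2**40

# Raw value of SMART attribute 225/241 (Host_Writes_32MiB) from the table above.
host_writes_raw = 393645
written_tib = host_writes_raw * 32 * MIB / TIB

# Assumption: rated for 20 GB (decimal gigabytes) of host writes per day.
rated_tb_per_year = 20e9 * 365 / 1e12
rated_tib_per_year = 20e9 * 365 / TIB

print(f"written so far: {written_tib:.2f} TiB")
print(f"rated per year: {rated_tb_per_year:.1f} TB = {rated_tib_per_year:.2f} TiB")
```

Under that assumption the decimal rating works out to somewhat less than 7 TiB a year, so "7 TiB a year" is a rounded-up figure; the conclusion that 12 TiB in three years is well within the rating is unchanged.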

This ThinkPad T520 has been with BTRFS since installation of the Debian
sid system on it with Kernel 2.6.39 or even 2.6.38 (where Sandybridge
graphics didn´t work so well as today yet).

So that much to any FUD about BTRFS and SSDs.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: Massive BTRFS performance degradation
  2014-03-09 10:01   ` Martin Steigerwald
@ 2014-03-09 10:23     ` Swâmi Petaramesh
  2014-03-09 11:33       ` Hugo Mills
  0 siblings, 1 reply; 32+ messages in thread
From: Swâmi Petaramesh @ 2014-03-09 10:23 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-btrfs

On Sunday, 9 March 2014 11:01:17, you wrote:
> This ThinkPad T520 has been with BTRFS since installation of the Debian
> sid system on it with Kernel 2.6.39 or even 2.6.38 (where Sandybridge
> graphics didn´t work so well as today yet).
>
> So that much to any FUD about BTRFS and SSDs.

Wow !

Thanks for this very interesting info. Would you tell me if you use any of the 
SSD optimisation mount options: discard, ssd or ssd_spread ?

Myself I've been moving back and forth between BTRFS / ZFS and ext4 over the 
past 2-3 years, each time giving a chance to BTRFS, then typically 3-4 months 
later switching back to either ext4 or ZFS after having either lost all of my 
data, or seen the filesystem slow down to the point it becomes unusable, beyond 
defragmentation or removing snapshots or whatever...

So my yo-yo-game is kind of "Is BTRFS now ready for use ?... Let's give it a 
chance... OMFG... Lost everything, unusable system... Never want to hear about 
BTRFS anymore... Well... Maybe will come back next year... etc"

I've been used to consider for 3 years that :

- Next kernel release will have a truly excellent and mature BTRFS support.

- Current kernel release has correct BTRFS support - but most mainline distros 
don't have it yet, maybe in 6 months ?

- Previous kernel release (the one that all current distros come with) have a 
completely broke BTRFS support...

#LOL

Well I hope it's quite not the case anymore for I just installed my neighbour, 
old lady's system with a Linux Mint 16 (kernel 3.11) on BTRFS with skinny 
extents...

But for myself running ArchLinux in kernel 3.13, I still find out that :

- "btrfs send" causes my kernel to BUG :-/ (the wiki says it's working 
stuff...)
- btrfs-defrag.sh hangs because of some glitch with "filefrag".
- bedup crashes badly and looks completely unmaintained as far as I can tell 
and nobody seems to care.

Soooo weeelllll... Looks like readiness for prime time is still ahead of us...

(But still my 2 main systems are now BTRFS, including my main storage machine 
running BTRFS RAID-1, so I hope it can be reliable, at least...)

Kind regards.

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E


* Re: Massive BTRFS performance degradation
  2014-03-09 10:23     ` Swâmi Petaramesh
@ 2014-03-09 11:33       ` Hugo Mills
  2014-03-09 11:54         ` Martin Steigerwald
                           ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Hugo Mills @ 2014-03-09 11:33 UTC (permalink / raw)
  To: Swâmi Petaramesh; +Cc: Martin Steigerwald, linux-btrfs


On Sun, Mar 09, 2014 at 11:23:29AM +0100, Swâmi Petaramesh wrote:
> On Sunday, 9 March 2014 11:01:17, you wrote:
> > This ThinkPad T520 has been with BTRFS since installation of the Debian
> > sid system on it with Kernel 2.6.39 or even 2.6.38 (where Sandybridge
> > graphics didn´t work so well as today yet).
> >
> > So that much to any FUD about BTRFS and SSDs.
> 
> Wow !
> 
> Thanks for this very interesting info. Would you tell me if you use any of the 
> SSD optimisation mount options: discard, ssd or ssd_spread ?

   I would recommend none of the three. :)

   ssd should be activated automatically on any non-rotational device.
ssd_spread is generally slower on modern SSDs than the ssd option.
discard is, except on the very latest hardware, a synchronous command
(it's a limitation of the SATA standard), and therefore results in
very very poor performance.
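
As a rough illustration of the advice above, here is a Python sketch that flags the three SSD-related options in an fstab-style option string. The function name `review_ssd_options` and the wording of the notes are made up for this example; it only inspects the string and does not query the kernel or the device.

```python
def review_ssd_options(options: str) -> list:
    """Return notes on SSD-related btrfs mount options in an fstab-style string."""
    # Strip "=value" parts so that e.g. "compress=lzo" becomes "compress".
    opts = {o.split("=", 1)[0] for o in options.split(",") if o}
    notes = []
    if "discard" in opts:
        notes.append("discard: synchronous on most SATA hardware, expect poor performance")
    if "ssd_spread" in opts:
        notes.append("ssd_spread: generally slower than ssd on modern SSDs")
    if "ssd" in opts:
        notes.append("ssd: should already be enabled automatically on non-rotational devices")
    return notes

# Option string from the first message in this thread.
for note in review_ssd_options("rw,noatime,compress=lzo,ssd,space_cache,autodefrag"):
    print(note)
```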

[snip]
> I've been used to consider for 3 years that :
> 
> - Next kernel release will have a truly excellent and mature BTRFS support.

   I don't think anyone's claimed that. The next version tends to fix
most of the *known* problems.

> - Current kernel release has correct BTRFS support - but most
> mainline distros don't have it yet, maybe in 6 months ?

   This is usually true -- but by the time the current kernels come
round, there's usually been another swathe of bugs uncovered, thus
falling into this problem:

> - Previous kernel release (the one that all current distros come
> with) have a completely broke BTRFS support...

   Not completely broken, but with known and identified bugs that have
been fixed in later versions.

> #LOL
> 
> Well I hope it's quite not the case anymore for I just installed my neighbour, 
> old lady's system with a Linux Mint 16 (kernel 3.11) on BTRFS with skinny 
> extents...

   There's one known and serious bug in 3.11 before 3.11.6 which
affects balances. Please make sure that you're running 3.11.6 or
later. There may be other bugs in there that have been fixed in later
kernel versions as well, but that's the "headline" one.

> But for myself running ArchLinux in kernel 3.13, I still find out that :
> 
> - "btrfs send" causes my kernel to BUG :-/ (the wiki says it's working 
> stuff...)

   We don't get many bug reports of kernel oopses in send. This may be
that we don't have many people trying to use it (it is, after all,
fairly deep and poorly explained magic at the moment). It may be that
you have some corruption that's gone undetected otherwise, and the
send code isn't handling it well. Or it may be an actual bug in send.
At least you've reported it. (It might also be worth putting a copy of
the report on bugzilla.kernel.org, because then it doesn't get
forgotten in the email noise here).

> - btrfs-defrag.sh hangs because of some glitch with "filefrag".

   Is that a btrfs problem, or a filefrag problem? btrfs-defrag.sh
isn't something I've heard of before, so I'd say it's unlikely to be
maintained by any of the main btrfs developers (and hence is much more
likely to be unmaintained or just plain broken in general).

> - bedup crashes badly and looks completely unmaintained as far as I can tell 
> and nobody seems to care.

   That's because nobody here is connected to bedup in any way. It was
a third-party piece of software written by someone (I don't even
recall who) who hasn't, as far as I know, engaged with the main btrfs
developers at all.

> Soooo weeelllll... Looks like readiness for prime time is still
> ahead of us...

   I think that's fair to say. However, it is noticeably improving
over time. The timescales are just quite long.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Well, you don't get to be a kernel hacker simply by looking ---   
                    good in Speedos. -- Rusty Russell                    


* Re: Massive BTRFS performance degradation
  2014-03-09 11:33       ` Hugo Mills
@ 2014-03-09 11:54         ` Martin Steigerwald
  2014-03-09 12:10         ` Swâmi Petaramesh
  2014-03-14  2:11         ` discard synchronous on most SSDs? Marc MERLIN
  2 siblings, 0 replies; 32+ messages in thread
From: Martin Steigerwald @ 2014-03-09 11:54 UTC (permalink / raw)
  To: Hugo Mills, Swâmi Petaramesh, linux-btrfs


On Sunday, 9 March 2014, 11:33:50, Hugo Mills wrote:
> On Sun, Mar 09, 2014 at 11:23:29AM +0100, Swâmi Petaramesh wrote:
> > On Sunday, 9 March 2014 11:01:17, you wrote:
> > > This ThinkPad T520 has been with BTRFS since installation of the Debian
> > > sid system on it with Kernel 2.6.39 or even 2.6.38 (where Sandybridge
> > > graphics didn´t work so well as today yet).
> > > 
> > > So that much to any FUD about BTRFS and SSDs.
> > 
> > Wow !
> > 
> > Thanks for this very interesting info. Would you tell me if you use any of
> > the SSD optimisation mount options: discard, ssd or ssd_spread ?
> 
>    I would recommend none of the three.
> 
>    ssd should be activated automatically on any non-rotational device.
> ssd_spread is generally slower on modern SSDs than the ssd option.
> discard is, except on the very latest hardware, a synchronous command
> (it's a limitation of the SATA standard), and therefore results in
> very very poor performance.

That's exactly how I use it. I just fstrim the partitions from time to time.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: Massive BTRFS performance degradation
  2014-03-09 11:33       ` Hugo Mills
  2014-03-09 11:54         ` Martin Steigerwald
@ 2014-03-09 12:10         ` Swâmi Petaramesh
  2014-03-09 17:14           ` boris
  2014-03-14  2:11         ` discard synchronous on most SSDs? Marc MERLIN
  2 siblings, 1 reply; 32+ messages in thread
From: Swâmi Petaramesh @ 2014-03-09 12:10 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs

On Sunday, 9 March 2014 11:33:50, Hugo Mills wrote:
> 
>    ssd should be activated automatically on any non-rotational device.
> ssd_spread is generally slower on modern SSDs than the ssd option.
> discard is, except on the very latest hardware, a synchronous command
> (it's a limitation of the SATA standard), and therefore results in
> very very poor performance.

Thanks for the info Hugo :-)

>    There's one known and serious bug in 3.11 before 3.11.6 which
> affects balances. Please make sure that you're running 3.11.6 or
> later. There may be other bugs in there that have been fixed in later
> kernel versions as well, but that's the "headline" one.

Latest Ubuntu / Mint now have 3.11.0-18. Anyway I don't think my "old lady 
neighbour" will ever hear about balance or care, and will ever try to run it 
on her laptop. She would first have to figure out what a terminal and command 
line are ;-)

>    We don't get many bug reports of kernel oopses in send. This may be
> that we don't have many people trying to use it (it is, after all,
> fairly deep and poorly explained magic at the moment). It may be that
> you have some corruption that's gone undetected otherwise, 

Well, that's a rather "young" BTRFS setup (less than a month) that passes 
scrub without detecting any error, and a plain "btrfs send" works, then an 
incremental one fails...

> send code isn't handling it well. Or it may be an actual bug in send.

I would tend to believe so ;-)

> At least you've reported it. (It might also be worth putting a copy of
> the report on bugzilla.kernel.org, because then it doesn't get
> forgotten in the email noise here).

> > - btrfs-defrag.sh hangs because of some glitch with "filefrag".
> 
>    Is that a btrfs problem, or a filefrag problem?

Looks like it's a filefrag problem. Looks like filefrag stalls forever trying to 
figure out the fragmentation status of some files...

> btrfs-defrag.sh isn't something I've heard of before, so I'd say it's
> unlikely to be maintained by any of the main btrfs developers (and hence is
> much more likely to be unmaintained or just plain broken in general).

It's a useful script that can be found there
https://gitorious.org/btrfs-defrag
...and it's maintained by Dmitry, who's a nice, responsive and helpful guy.

> > - bedup crashes badly and looks completely unmaintained as far as I can
> > tell and nobody seems to care.
> 
>    That's because nobody here is connected to bedup in any way. It was
> a third-party piece of software written by someone (I don't even
> recall who) who hasn't, as far as I know, engaged with the main btrfs
> developers at all.

bedup is mentioned on the BTRFS wiki 
https://btrfs.wiki.kernel.org/index.php/Deduplication
...as being the only current way to perform BTRFS deduplication. I found it in 
the wiki and believed/hoped it was something more "official and maintained" than 
what you seem to mean - alas...

Actually deduplication WAS the reason why I recently made the move to BTRFS 
again, for deduplication in ZFS is working, but *SO* memory hungry and 
performance killer unless you have *lots* of RAM...

So I wanted to give a try at BTRFS offline bedup.

> > Soooo weeelllll... Looks like readiness for prime time is still
> > ahead of us...
> 
>    I think that's fair to say. However, it is noticeably improving
> over time. The timescales are just quite long.

If the timescales become really too long, people will just end up sticking with 
the idea that BTRFS is not ready for production and won't be in the foreseeable 
future...

Kind regards.

-- 
Swâmi Petaramesh <swami@petaramesh.org> http://petaramesh.org PGP 9076E32E


* Re: Massive BTRFS performance degradation
  2014-03-09 12:10         ` Swâmi Petaramesh
@ 2014-03-09 17:14           ` boris
  0 siblings, 0 replies; 32+ messages in thread
From: boris @ 2014-03-09 17:14 UTC (permalink / raw)
  To: linux-btrfs

Swâmi Petaramesh <swami <at> petaramesh.org> writes:


> Actually deduplication WAS the reason why I recently made the move to BTRFS 
> again, for deduplication in ZFS is working, but *SO* memory hungry and 
> performance killer unless you have *lots* of RAM...
> 

If you think about what dedup has to do, it's going to be fairly memory
hungry; hopefully there are a few maths (yes, it's maths not math! Think of
the game dominoes :-D ) bods on the team.

<starts to get excited about how one would tackle it then decides he needs
to get out more>



* Re: Massive BTRFS performance degradation
  2014-03-09  8:17 ` Swâmi Petaramesh
  2014-03-09 10:01   ` Martin Steigerwald
@ 2014-03-09 17:36   ` Austin S Hemmelgarn
  2014-03-09 18:55     ` Tobias Holst
  1 sibling, 1 reply; 32+ messages in thread
From: Austin S Hemmelgarn @ 2014-03-09 17:36 UTC (permalink / raw)
  To: Swâmi Petaramesh, linux-btrfs; +Cc: impactoria

On 03/09/2014 04:17 AM, Swâmi Petaramesh wrote:
> On Sunday, 9 March 2014 08:48:20, KC wrote:
>> I am experiencing massive performance degradation on my BTRFS
>> root partition on SSD.
> 
> BTW, is BTRFS still a SSD-killer ? It had this reputation a while
> ago, and I'm not sure if this still is the case, but I don't dare
> (yet) converting to BTRFS one of my laptops that has a SSD...
> 
Actually, because of the COW nature of BTRFS, it should be better for
SSD's than stuff like ext4 (which DOES kill SSD's when journaling is
enabled because it ends up doing thousands of read-modify-write cycles
to the same 128k of the disk under just generic usage).  Just make
sure that you use the 'ssd' and 'discard' mount options.


* Re: Massive BTRFS performance degradation
  2014-03-09 17:36   ` Massive BTRFS performance degradation Austin S Hemmelgarn
@ 2014-03-09 18:55     ` Tobias Holst
  0 siblings, 0 replies; 32+ messages in thread
From: Tobias Holst @ 2014-03-09 18:55 UTC (permalink / raw)
  To: Austin S Hemmelgarn; +Cc: Swâmi Petaramesh, linux-btrfs, impactoria

2014-03-09 18:36 GMT+01:00 Austin S Hemmelgarn <ahferroin7@gmail.com>:
> On 03/09/2014 04:17 AM, Swâmi Petaramesh wrote:
>> On Sunday, 9 March 2014 08:48:20, KC wrote:
>>> I am experiencing massive performance degradation on my BTRFS
>>> root partition on SSD.
>>
>> BTW, is BTRFS still a SSD-killer ? It had this reputation a while
>> ago, and I'm not sure if this still is the case, but I don't dare
>> (yet) converting to BTRFS one of my laptops that has a SSD...
>>
> Actually, because of the COW nature of BTRFS, it should be better for
> SSD's than stuff like ext4 (which DOES kill SSD's when journaling is
> enabled because it ends up doing thousands of read-modify-write cycles
> to the same 128k of the disk under just generic usage).  Just make
> sure that you use the 'ssd' and 'discard' mount options.

Every modern SSD does "Wear Leveling". Doing a read-modify-write cycle
on the same block doesn't mean it writes to the same memory cell. The
SSD-controller distributes the write-cycles over all (empty) cells. So
in the best case every cell in the SSD is used equally, whether you do
random writes or write the same block over and over. This works
better with lots of empty space on the SSD, that's why you should
never use more than 90% of the space on a SSD. Garbage collection and
TRIM also help the SSD-controller to find empty cells.
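
To make the idea concrete, here is a deliberately simplified toy model in Python. It is not how any real SSD controller works: the function `simulate_rewrites`, the notion of a single "cell" per write and the least-worn-first policy are all assumptions chosen only to show why rewriting one logical block need not wear out one physical location.

```python
def simulate_rewrites(physical_cells: int, rewrites: int) -> list:
    """Toy model: every rewrite of ONE logical block lands on the least-worn other cell."""
    wear = [0] * physical_cells
    current = None  # physical cell that currently holds the logical block
    for _ in range(rewrites):
        # The cell holding the live data cannot be reused for the new copy.
        candidates = [c for c in range(physical_cells) if c != current]
        target = min(candidates, key=lambda c: wear[c])
        wear[target] += 1
        current = target
    return wear

wear = simulate_rewrites(physical_cells=10, rewrites=1000)
print("per-cell write counts:", wear)
print("spread between most and least worn cell:", max(wear) - min(wear))
```

In this model the writes end up spread evenly even though the same logical block is rewritten every time. With fewer spare cells to choose from there is less room to spread the wear, which is the intuition behind keeping some free space.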


* Re: discard synchronous on most SSDs?
  2014-03-09 11:33       ` Hugo Mills
  2014-03-09 11:54         ` Martin Steigerwald
  2014-03-09 12:10         ` Swâmi Petaramesh
@ 2014-03-14  2:11         ` Marc MERLIN
  2014-03-14  3:39           ` Chris Murphy
  2 siblings, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2014-03-14  2:11 UTC (permalink / raw)
  To: Hugo Mills, Swâmi Petaramesh, Martin Steigerwald,
	linux-btrfs


On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
> discard is, except on the very latest hardware, a synchronous command
> (it's a limitation of the SATA standard), and therefore results in
> very very poor performance.

Interesting. How do I know if a given SSD will hang on discard?
Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: discard synchronous on most SSDs?
  2014-03-14  2:11         ` discard synchronous on most SSDs? Marc MERLIN
@ 2014-03-14  3:39           ` Chris Murphy
  2014-03-14  5:17             ` Marc MERLIN
  2014-03-14  7:27             ` Chris Samuel
  0 siblings, 2 replies; 32+ messages in thread
From: Chris Murphy @ 2014-03-14  3:39 UTC (permalink / raw)
  To: Btrfs BTRFS


On Mar 13, 2014, at 8:11 PM, Marc MERLIN <marc@merlins.org> wrote:

> On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
>> discard is, except on the very latest hardware, a synchronous command
>> (it's a limitation of the SATA standard), and therefore results in
>> very very poor performance.
> 
> Interesting. How do I know if a given SSD will hang on discard?
> Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)

smartctl -a or -x will tell you what SATA revision is in place. The queued trim support is in SATA Rev 3.1. I'm not certain if this requires only the drive to support that revision level, or both controller and drive.

Chris Murphy


* Re: discard synchronous on most SSDs?
  2014-03-14  3:39           ` Chris Murphy
@ 2014-03-14  5:17             ` Marc MERLIN
  2014-03-14  7:33               ` Chris Samuel
                                 ` (2 more replies)
  2014-03-14  7:27             ` Chris Samuel
  1 sibling, 3 replies; 32+ messages in thread
From: Marc MERLIN @ 2014-03-14  5:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
> 
> On Mar 13, 2014, at 8:11 PM, Marc MERLIN <marc@merlins.org> wrote:
> 
> > On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
> >> discard is, except on the very latest hardware, a synchronous command
> >> (it's a limitation of the SATA standard), and therefore results in
> >> very very poor performance.
> > 
> > Interesting. How do I know if a given SSD will hang on discard?
> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
> 
> smartctl -a or -x will tell you what SATA revision is in place. The queued trim support is in SATA Rev 3.1. I'm not certain if this requires only the drive to support that revision level, or both controller and drive.

I'm not sure I'm seeing this, which field is that?

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 840 EVO 1TB
Serial Number:    S1D9NEAD934600N
LU WWN Device Id: 5 002538 85009a8ff
Firmware Version: EXT0BB0Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4c
Local Time is:    Thu Mar 13 22:15:14 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (15000) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 250) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   099   099   000    -    2219
 12 Power_Cycle_Count       -O--CK   099   099   000    -    659
177 Wear_Leveling_Count     PO--C-   099   099   000    -    3
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   100   100   010    -    0
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   100   100   010    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O--CK   054   041   000    -    46
195 Hardware_ECC_Recovered  -O-RC-   200   200   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    0
235 Unknown_Attribute       -O--C-   099   099   000    -    35
241 Total_LBAs_Written      -O--CK   099   099   000    -    12186944165
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: discard synchronous on most SSDs?
  2014-03-14  3:39           ` Chris Murphy
  2014-03-14  5:17             ` Marc MERLIN
@ 2014-03-14  7:27             ` Chris Samuel
  1 sibling, 0 replies; 32+ messages in thread
From: Chris Samuel @ 2014-03-14  7:27 UTC (permalink / raw)
  To: linux-btrfs


On Thu, 13 Mar 2014 09:39:02 PM Chris Murphy wrote:

> smartctl -a or -x will tell you what SATA revision is in place. The queued
> trim support is in SATA Rev 3.1. I'm not certain if this requires only the
> drive to support that revision level, or both controller and drive.

Both I'd say as I believe it's the controller that has to issue it to the 
drive, and the drive needs to understand it.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14  5:17             ` Marc MERLIN
@ 2014-03-14  7:33               ` Chris Samuel
  2014-03-14 19:26                 ` Marc MERLIN
  2014-03-15  4:06                 ` Chris Samuel
  2014-03-14 12:07               ` Duncan
  2014-03-14 21:44               ` Chris Murphy
  2 siblings, 2 replies; 32+ messages in thread
From: Chris Samuel @ 2014-03-14  7:33 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 464 bytes --]

Hi Marc,

On Thu, 13 Mar 2014 10:17:50 PM Marc MERLIN wrote:

> I'm not sure I'm seeing this, which field is that?

I *think* you want smartctl -i instead, and look for the field that says 
something like:

ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3

So if my understanding is correct that says it's just rev. 3.0 so TRIM for 
this is synchronous.

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14  5:17             ` Marc MERLIN
  2014-03-14  7:33               ` Chris Samuel
@ 2014-03-14 12:07               ` Duncan
  2014-03-14 21:44               ` Chris Murphy
  2 siblings, 0 replies; 32+ messages in thread
From: Duncan @ 2014-03-14 12:07 UTC (permalink / raw)
  To: linux-btrfs

Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:

> On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
>> 
>> On Mar 13, 2014, at 8:11 PM, Marc MERLIN <marc@merlins.org> wrote:
>> 
>> > On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
>> >> discard is, except on the very latest hardware, a synchronous
>> >> command (it's a limitation of the SATA standard), and therefore
>> >> results in very very poor performance.
>> > 
>> > Interesting. How do I know if a given SSD will hang on discard?
>> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
>> 
>> smartctl -a or -x will tell you what SATA revision is in place. The
>> queued trim support is in SATA Rev 3.1. I'm not certain if this
>> requires only the drive to support that revision level, or both
>> controller and drive.
> 
> I'm not sure I'm seeing this, which field is that?

> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4c

Your drive didn't report it, but here, I have SATA fields as well, in 
addition to the ATA fields:

Here's the fields from my Corsair Neutron SSDs:

ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 6.0 Gb/s

Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:

ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s

(More about that below.)

Smartctl version here is 6.2 2013-07-26 r3841, according to the output. 
(I'm running gentoo/~amd64 FWIW so it's a local-build). You snipped that 
bit of your output so I can't compare.

But it may also depend on whether smartctl auto-detected and used the ATA 
or the SCSI (or something else) command set and how your devices are 
actually connected, plus BIOS settings, etc.  See the manpage 
documentation for the -d TYPE (--device=TYPE) option and the ATA/SCSI/SAT 
discussion rather further down the manpage for more.

Here I have direct SATA connections with the BIOS set to AHCI mode and am 
thus using the kernel's AHCI drivers, since that's the most common SATA 
chipset standard these days, thus increasing portability given my 
monolithic kernel build.

smartctl's -d test reports an original guess of scsi, changed to sat 
after detection.

Of course connection via USB bridge or the like complicates things 
considerably.

Meanwhile, SATA 2.5, 6 Gb/s on the SSDs, SATA 2.6, 3 Gb/s on the spinning 
rust?  WTF?  The SSDs have SATA 2.5 but 6 Gb/s while the spinning rust 
has a later 2.6 but only 3 Gb/s (tho of course on a mechanical drive the 
bus speed won't be the bottleneck)?  Now I'm confused.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14  7:33               ` Chris Samuel
@ 2014-03-14 19:26                 ` Marc MERLIN
  2014-03-14 19:57                   ` Martin K. Petersen
  2014-03-15  4:06                 ` Chris Samuel
  1 sibling, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2014-03-14 19:26 UTC (permalink / raw)
  To: Chris Samuel, Duncan, Christopher Corsi; +Cc: linux-btrfs

On Fri, Mar 14, 2014 at 12:07:54PM +0000, Duncan wrote:
> Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:
> 
> > On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
> >> 
> >> On Mar 13, 2014, at 8:11 PM, Marc MERLIN <marc@merlins.org> wrote:
> >> 
> >> > On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
> >> >> discard is, except on the very latest hardware, a synchronous
> >> >> command (it's a limitation of the SATA standard), and therefore
> >> >> results in very very poor performance.
> >> > 
> >> > Interesting. How do I know if a given SSD will hang on discard?
> >> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
> >> 
> >> smartctl -a or -x will tell you what SATA revision is in place. The
> >> queued trim support is in SATA Rev 3.1. I'm not certain if this
> >> requires only the drive to support that revision level, or both
> >> controller and drive.
> > 
> > I'm not sure I'm seeing this, which field is that?
> 
> > ATA Version is:   8
> > ATA Standard is:  ATA-8-ACS revision 4c
> 
> Your drive didn't report it, but here, I have SATA fields as well, in 
> addition to the ATA fields:
> 
> Here's the fields from my Corsair Neutron SSDs:
> 
> ATA Version is:   ATA8-ACS (minor revision not indicated)
> SATA Version is:  SATA 2.5, 6.0 Gb/s
> 
> Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:
> 
> ATA Version is:   ATA8-ACS T13/1699-D revision 4
> SATA Version is:  SATA 2.6, 3.0 Gb/s

Ok, my smartmontools was too old. I got a newer one and now have proper
output:
Device Model:     Samsung SSD 840 EVO 1TB
Serial Number:    S1D9NEAD934600N
LU WWN Device Id: 5 002538 85009a8ff
Firmware Version: EXT0BB0Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 10:49:39 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

So I have Sata 3.1, that's great news, it means I can keep using discard
without worrying about performance and hangs

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14 19:26                 ` Marc MERLIN
@ 2014-03-14 19:57                   ` Martin K. Petersen
  2014-03-14 20:46                     ` Holger Hoffstätte
  2014-03-15  5:25                     ` Chris Samuel
  0 siblings, 2 replies; 32+ messages in thread
From: Martin K. Petersen @ 2014-03-14 19:57 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Chris Samuel, Duncan, Christopher Corsi, linux-btrfs

>>>>> "Marc" == Marc MERLIN <marc@merlins.org> writes:

Marc,

Marc> So I have Sata 3.1, that's great news, it means I can keep using
Marc> discard without worrying about performance and hangs

The fact that the drive reports compliance with a certain version of
SATA does not in any way imply that it implements all commands defined
in that specification.

The location where queued TRIM support is reported is somewhat unusual.
And last I looked hdparm -I had no infrastructure in place to report
stuff contained in log pages.

The kernel does look in the right place to determine whether to issue the
queued or the unqueued variant. But the information isn't exported to
userland.

So right now I'm afraid we don't have a good way for a user to determine
whether a device supports queued trims or not.

I guess we could consider either adding an ATA-specific "I don't suck"
flag in sysfs, add the missing code to hdparm, or both...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14 19:57                   ` Martin K. Petersen
@ 2014-03-14 20:46                     ` Holger Hoffstätte
  2014-03-15  4:21                       ` Marc MERLIN
  2014-03-15  5:25                     ` Chris Samuel
  1 sibling, 1 reply; 32+ messages in thread
From: Holger Hoffstätte @ 2014-03-14 20:46 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:

> So right now I'm afraid we don't have a good way for a user to determine
> whether a device supports queued trims or not.

Mount with discard, unpack kernel tree, sync, rm -rf tree.
If it takes several seconds, you have sync discard, no?

This changed somewhere around kernel 3.8.x; before that it used to be 
acceptably fast. Since then I only do batch trims, daily (server) or 
weekly (laptop).
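
A minimal sketch of the timing test described above, assuming a POSIX shell. The function name time_rm is made up for illustration, and the trailing sync is an addition to the recipe, on the assumption that discards may be issued at commit time rather than during the unlink itself. Resolution is whole seconds, which is enough to tell "about a second" from "a minute".

```shell
# time_rm DIR: remove DIR, flush with sync, and print the elapsed
# wall-clock time in whole seconds. (time_rm is a hypothetical helper,
# not an existing tool.)
time_rm() {
    target=$1
    start=$(date +%s)
    rm -rf -- "$target"
    sync            # assumption: forces any pending discards to be issued
    end=$(date +%s)
    echo $((end - start))
}

# Example (as root; mount point and tarball name are placeholders):
#   mount -o remount,discard /mnt/test
#   tar -C /mnt/test -xf linux.tar.xz && sync
#   time_rm /mnt/test/linux
```

Running it once with discard and once with nodiscard on the same filesystem gives the two numbers to compare.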

-h


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14  5:17             ` Marc MERLIN
  2014-03-14  7:33               ` Chris Samuel
  2014-03-14 12:07               ` Duncan
@ 2014-03-14 21:44               ` Chris Murphy
  2 siblings, 0 replies; 32+ messages in thread
From: Chris Murphy @ 2014-03-14 21:44 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Btrfs BTRFS

On Mar 13, 2014, at 11:17 PM, Marc MERLIN <marc@merlins.org> wrote:

> On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
>> 
>> On Mar 13, 2014, at 8:11 PM, Marc MERLIN <marc@merlins.org> wrote:
>> 
>>> On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
>>>> discard is, except on the very latest hardware, a synchronous command
>>>> (it's a limitation of the SATA standard), and therefore results in
>>>> very very poor performance.
>>> 
>>> Interesting. How do I know if a given SSD will hang on discard?
>>> Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
>> 
>> smartctl -a or -x will tell you what SATA revision is in place. The queued trim support is in SATA Rev 3.1. I'm not certain if this requires only the drive to support that revision level, or both controller and drive.
> 
> I'm not sure I'm seeing this, which field is that?
> 
> === START OF INFORMATION SECTION ===
> Device Model:     Samsung SSD 840 EVO 1TB
> Serial Number:    S1D9NEAD934600N
> LU WWN Device Id: 5 002538 85009a8ff
> Firmware Version: EXT0BB0Q
> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
> Sector Size:      512 bytes logical/physical
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4c
> Local Time is:    Thu Mar 13 22:15:14 2014 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled

After ATA Version for me.

$ smartctl -a /dev/disk0
smartctl 6.1 2013-03-16 r3800 [x86_64-apple-darwin12.3.0] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD 830 Series
Serial Number:    S0Z4NEAC933856
LU WWN Device Id: 5 002538 043584d30
Firmware Version: CXM03B1Q
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 2
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 15:37:07 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The Samsung hardware by and large is fairly well behaved with discard in my experience. But it does really depend a lot on the workload. I'd notice occasional random freezes for a couple of seconds when I had it enabled in OS X (totally different animal from the kernel up), nothing severe. But it was annoying enough that I disabled it, and the problem went away. Apple still doesn't enable trim by default on non-Apple SSDs, so the idea that "everyone else" is doing this isn't true. The Windows implementation is rather complex, and also isn't always used, contrary to what's been reported (on the everybody panic or get mad NOW type web sites).

If you want to be conservative about it, I'd say just manually run fstrim when the system is idle. Do that once every week or two. Set up a cron job for it if you want.
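
A rough sketch of what such a periodic job could call, assuming util-linux's fstrim is installed. The function name trim_all and the DRY_RUN variable are invented for this example; the mount points in the usage comment are placeholders.

```shell
# trim_all MOUNTPOINT...: run fstrim on each mount point given.
# With DRY_RUN=1 the commands are printed instead of executed.
# (trim_all and DRY_RUN are hypothetical names for this sketch.)
trim_all() {
    for mnt in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "fstrim -v $mnt"
        else
            fstrim -v "$mnt"
        fi
    done
}

# Example, e.g. from a weekly cron script run as root:
#   trim_all / /home
```

Batching the trim this way keeps it out of the normal I/O path, which is the point of the suggestion above.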

Chris Murphy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14  7:33               ` Chris Samuel
  2014-03-14 19:26                 ` Marc MERLIN
@ 2014-03-15  4:06                 ` Chris Samuel
  2014-03-16 16:07                   ` Martin K. Petersen
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Samuel @ 2014-03-15  4:06 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

On Fri, 14 Mar 2014 06:33:24 PM Chris Samuel wrote:

> I *think* you want smartctl -i instead, and look for the field that says 
> something like:
> 
> ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3

Late night, cut and pasted the wrong line of output, mine says:

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Of course that's what the drive is reporting it supports, I'm not sure whether 
that's the result of what has been negotiated between the controller and drive 
or purely what the drive supports.

To get more information from smartctl you can use the --identify=wb option 
instead of -i and that should give you a lot more detail about what the 
drive claims to (and not to) support.   On the version in Kubuntu 13.10 
(6.1+svn3812-1) it only reports 3 things regarding TRIM for my drives.

chris@quad:/tmp$ sudo smartctl --identify=wb -d sat /dev/sdb | egrep -i 'trim|discard'
  69     14          1   Deterministic data after trim supported
  69      5          0   Trimmed LBA range(s) returning zeroed data supported
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported

I'm currently doing a git clone of their SVN repo to see if there's any new 
functionality that will gather any more information.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14 20:46                     ` Holger Hoffstätte
@ 2014-03-15  4:21                       ` Marc MERLIN
  2014-03-15  9:38                         ` Holger Hoffstätte
  0 siblings, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2014-03-15  4:21 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: linux-btrfs

On Fri, Mar 14, 2014 at 08:46:09PM +0000, Holger Hoffstätte wrote:
> On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:
> 
> > So right now I'm afraid we don't have a good way for a user to determine
> > whether a device supports queued trims or not.
> 
> Mount with discard, unpack kernel tree, sync, rm -rf tree.
> If it takes several seconds, you have sync discard, no?

Mmmh, interesting point.

legolas:/usr/src# time rm -rf linux-3.14-rc5
real	0m1.584s
user	0m0.008s
sys	0m1.524s

I remounted my FS with remount,nodiscard, and the time was the same.

> This changed somewhere around kernel 3.8.x; before that it used to be 
> acceptably fast. Since then I only do batch trims, daily (server) or 
> weekly (laptop).

I've never really timed this before. Is it supposed to be faster than 1.5s on
a fast SSD?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-14 19:57                   ` Martin K. Petersen
  2014-03-14 20:46                     ` Holger Hoffstätte
@ 2014-03-15  5:25                     ` Chris Samuel
  2014-03-15  6:48                       ` Chris Samuel
  2014-03-16 16:22                       ` Martin K. Petersen
  1 sibling, 2 replies; 32+ messages in thread
From: Chris Samuel @ 2014-03-15  5:25 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 962 bytes --]

On Fri, 14 Mar 2014 03:57:41 PM Martin K. Petersen wrote:

> The fact that the drive reports compliance with a certain version of
> SATA does not in any way imply that it implements all commands defined
> in that specification.

It looks like drives that do support it can be detected with the kernel helper 
function ata_fpdma_dsm_supported() defined in include/linux/libata.h.

I wonder if it would be possible to use that knowledge to extend the 
smartctl's --identify functionality to report this?

Not even all drives that implement it do so correctly, the kernel has a 
blacklist of drives that don't and currently lists just two:

       /* devices that don't properly handle queued TRIM commands */
       { "Micron_M500*",               NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },
       { "Crucial_CT???M500SSD*",      NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15  5:25                     ` Chris Samuel
@ 2014-03-15  6:48                       ` Chris Samuel
  2014-03-15 11:26                         ` Duncan
  2014-03-16 16:22                       ` Martin K. Petersen
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Samuel @ 2014-03-15  6:48 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On Sat, 15 Mar 2014 04:25:05 PM Chris Samuel wrote:

> I wonder if it would be possible to use that knowledge to extend the 
> smartctl's --identify functionality to report this?

After reading the SATA 3.1 spec I believe that smartctl *can* indicate if a 
drive claims to support SATA 3.1 NCQ TRIM, thus:

$ sudo smartctl --identify /dev/sdb | fgrep 'Trim bit in DATA SET MANAGEMENT'
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported 
$

If that command returns nothing then it's not reported as supported (and I've 
tested that).  You can get the same info with hdparm -I.

Of course, as Martin said, that doesn't necessarily mean the kernel is using 
that reported ability.

My puzzle now is that I have two SSD drives that report supporting NCQ TRIM 
(one confirmed via product info) but report only supporting SATA 3.0 not 3.1.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15  4:21                       ` Marc MERLIN
@ 2014-03-15  9:38                         ` Holger Hoffstätte
  0 siblings, 0 replies; 32+ messages in thread
From: Holger Hoffstätte @ 2014-03-15  9:38 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 14 Mar 2014 21:21:16 -0700, Marc MERLIN wrote:

> On Fri, Mar 14, 2014 at 08:46:09PM +0000, Holger Hoffstätte wrote:
>> On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:
>> 
>> > So right now I'm afraid we don't have a good way for a user to
>> > determine whether a device supports queued trims or not.
>> 
>> Mount with discard, unpack kernel tree, sync, rm -rf tree.
>> If it takes several seconds, you have sync discard, no?
> 
> Mmmh, interesting point.
> 
> legolas:/usr/src# time rm -rf linux-3.14-rc5
> real	0m1.584s
> user	0m0.008s
> sys	0m1.524s
> 
> I remounted my FS with remount,nodiscard, and the time was the same.
> 
>> This changed somewhere around kernel 3.8.x; before that it used to be
>> acceptably fast. Since then I only do batch trims, daily (server) or
>> weekly (laptop).
> 
> I've never really timed this before. Is it supposed to be faster than
> 1.5s on a fast SSD?

No, ~1s + noise is OK and seems normal, depending on filesystem and
phase of the moon. To contrast here is the output from my laptop,
which has an old but still-going-strong Intel G2 with ext4:

$smartctl -i /dev/sda | grep ATA
ATA Version is:   ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s

without discard:
rm -rf linux-3.12.14  0.05s user 1.28s system 98% cpu 1.364 total

remounted with discard & after an initial manual fstrim:
rm -rf linux-3.12.14  1.90s user 0.02s system 2% cpu 1:07.45 total

I think these numbers speak for themselves. :)

It's really good to know that SATA 3.1 apparently fixed this.

cheers
Holger


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15  6:48                       ` Chris Samuel
@ 2014-03-15 11:26                         ` Duncan
  2014-03-15 22:48                           ` Chris Samuel
  2014-03-16  6:06                           ` Marc MERLIN
  0 siblings, 2 replies; 32+ messages in thread
From: Duncan @ 2014-03-15 11:26 UTC (permalink / raw)
  To: linux-btrfs

Chris Samuel posted on Sat, 15 Mar 2014 17:48:56 +1100 as excerpted:

> $ sudo smartctl --identify /dev/sdb | fgrep 'Trim bit in DATA SET MANAGEMENT'
>  169      0          1   Trim bit in DATA SET MANAGEMENT command supported
> $
> 
> If that command returns nothing then it's not reported as supported (and
> I've tested that).  You can get the same info with hdparm -I.

> My puzzle now is that I have two SSD drives that report supporting NCQ
> TRIM (one confirmed via product info) but report only supporting SATA
> 3.0 not 3.1.

My SATA 2.5 SSDs reported earlier, report support for it too, so it's 
apparently not SATA 3.1 limited.  (Note that I'm simply grepping word 
169, in the command below.  Since word 169 is trim support...)

sudo smartctl --identify /dev/sda | grep '^ 169'
 169      -     0x0001   Data Set Management support
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported

Either that or that feature bit simply indicates trim support, not NCQ 
trim support.

But it can be noted that if SATA 3.1 requires trim to be NCQ if its 
supported at all (spinning rust would thus get a pass), then claiming 3.1 
support as well as trim support should be the equivalent of claiming NCQ 
trim support, likely with no indicator of whether that trim support is NCQ 
or not, pre-3.1.

... Which would mean that my SATA 2.5 and your SATA 3.0 drives are simply 
indicating trim support, not specifically NCQ trim support.

I guess you'd have to check the SATA 2.5 and 3.0 specs to find that out.
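
For reference, the word 169 check above can be wrapped as a small filter. This is only a sketch: has_trim_bit is a made-up name, and it assumes the column layout of the `smartctl --identify` output quoted in this thread (word, bit, value, description). As discussed, a set bit only indicates plain TRIM support and says nothing about queued TRIM.

```shell
# has_trim_bit: read `smartctl --identify` output on stdin and succeed
# if IDENTIFY word 169, bit 0 is reported with value 1.
# (has_trim_bit is a hypothetical helper; the column layout is assumed
# from the output quoted in this thread.)
has_trim_bit() {
    awk '$1 == "169" && $2 == "0" && $3 == "1" { found = 1 }
         END { exit !found }'
}

# Example (as root; /dev/sda is a placeholder):
#   smartctl --identify /dev/sda | has_trim_bit && echo "TRIM reported"
```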

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15 11:26                         ` Duncan
@ 2014-03-15 22:48                           ` Chris Samuel
  2014-03-16  6:06                           ` Marc MERLIN
  1 sibling, 0 replies; 32+ messages in thread
From: Chris Samuel @ 2014-03-15 22:48 UTC (permalink / raw)
  To: linux-btrfs

On 15/03/14 22:26, Duncan wrote:

> Either that or that feature bit simply indicates trim support, not NCQ
> trim support.

You're quite right, I outsmarted myself by noticing the fact that the 
kernel tests for ATA_LOG_NCQ_SEND_RECV_DSM_TRIM and unsets that for 
drives that don't support NCQ DSM TRIM, and then seeing DSM TRIM in the 
SATA 3.1 spec and inferring they were the same thing.

Looking closer at the kernel code that tests for what trim to use with 
ATA_LOG_NCQ_SEND_RECV_DSM_TRIM it falls back to ATA_DSM_TRIM if it can't 
do the NCQ version.

Mea culpa!

-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15 11:26                         ` Duncan
  2014-03-15 22:48                           ` Chris Samuel
@ 2014-03-16  6:06                           ` Marc MERLIN
  2014-03-16 17:09                             ` Chris Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Marc MERLIN @ 2014-03-16  6:06 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Sat, Mar 15, 2014 at 11:26:27AM +0000, Duncan wrote:
> Chris Samuel posted on Sat, 15 Mar 2014 17:48:56 +1100 as excerpted:
> 
> > $ sudo smartctl --identify /dev/sdb | fgrep 'Trim bit in DATA SET MANAGEMENT'
> >  169      0          1   Trim bit in DATA SET MANAGEMENT command supported
> > $
> > 
> > If that command returns nothing then it's not reported as supported (and
> > I've tested that).  You can get the same info with hdparm -I.
> 
> > My puzzle now is that I have two SSD drives that report supporting NCQ
> > TRIM (one confirmed via product info) but report only supporting SATA
> > 3.0 not 3.1.
> 
> My SATA 2.5 SSDs reported earlier, report support for it too, so it's 
> apparently not SATA 3.1 limited.  (Note that I'm simply grepping word 
> 169, in the command below.  Since word 169 is trim support...)
> 
> sudo smartctl --identify /dev/sda | grep '^ 169'
>  169      -     0x0001   Data Set Management support
>  169      0          1   Trim bit in DATA SET MANAGEMENT command supported
> 
> Either that or that feature bit simply indicates trim support, not NCQ 
> trim support.

Mmmh, so now I'm confused.

See this:

=== START OF INFORMATION SECTION ===
Device Model:     INTEL SSDSC2BW180A3L
Serial Number:    CVCV215200XU180EGN
LU WWN Device Id: 5 001517 bb28c5317
Firmware Version: LE1i
User Capacity:    180,045,766,656 bytes [180 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Mar 15 15:49:06 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

polgara:/usr/src# smartctl --identify /dev/sda | grep '^ 169'
 169      -     0x0001   Data Set Management support
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported

This is a super old SSD from 3 years ago. Clearly it can't support
asynchronous discard, right?

Yet, deleting a kernel tree also takes 1.5 seconds:
polgara:/usr/src# time rm -rf linux-3.14-rc5/
real    0m1.441s
user    0m0.048s
sys     0m1.352s


So maybe it's not the data level, but just the value of 169?

Either way, this SSD is more than 2 years old, maybe 3 actually.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15  4:06                 ` Chris Samuel
@ 2014-03-16 16:07                   ` Martin K. Petersen
  0 siblings, 0 replies; 32+ messages in thread
From: Martin K. Petersen @ 2014-03-16 16:07 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

>>>>> "Chris" == Chris Samuel <chris@csamuel.org> writes:

Chris> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Chris> Of course that's what the drive is reporting it supports, I'm not
Chris> sure whether that's the result of what has been negotiated
Chris> between the controller and drive or purely what the drive
Chris> supports.

It's just what the drive reports. Often drives will implement features
before they are ratified in the spec and thus before they can claim
compliance with a specific version.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-15  5:25                     ` Chris Samuel
  2014-03-15  6:48                       ` Chris Samuel
@ 2014-03-16 16:22                       ` Martin K. Petersen
  2014-03-16 17:50                         ` Marc MERLIN
  1 sibling, 1 reply; 32+ messages in thread
From: Martin K. Petersen @ 2014-03-16 16:22 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

>>>>> "Chris" == Chris Samuel <chris@csamuel.org> writes:

Chris> It looks like drives that do support it can be detected with the
Chris> kernel helper function ata_fpdma_dsm_supported() defined in
Chris> include/linux/libata.h.

Chris> I wonder if it would be possible to use that knowledge to extend
Chris> the smartctl's --identify functionality to report this?

Queued trim support is indicated in a log page and not the identify
information. However, we can get to the information we want using
smartctl's ability to look at log pages.

I don't have a single drive from any vendor in the lab that supports
queued trim, not even a prototype. I went out and bought a 840 EVO this
morning because the general lazyweb opinion seemed to indicate that this
drive supports queued trim. Well, it doesn't. At least not in the 120GB
version:

# smartctl -l gplog,0x13 /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.14.0-rc6+] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

General Purpose Log 0x13 does not exist (override with '-T permissive' option)

If there's a drive with a working queued trim implementation out there,
I'd like to know about it...
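
The negative case above can be scripted as a small filter. This is a sketch only: ncq_log_missing is an invented name, and it matches nothing more than the exact "does not exist" message shown in the output above. It does not attempt to decode the log page when one is present, so a failure of this check is not proof that queued trim works.

```shell
# ncq_log_missing: read `smartctl -l gplog,0x13` output on stdin and
# succeed if smartctl reported that General Purpose Log 0x13 is absent.
# (ncq_log_missing is a hypothetical helper for this sketch.)
ncq_log_missing() {
    grep -q 'General Purpose Log 0x13 does not exist'
}

# Example (as root; /dev/sda is a placeholder):
#   smartctl -l gplog,0x13 /dev/sda | ncq_log_missing \
#       && echo "no log page 0x13, so no queued trim"
```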

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: discard synchronous on most SSDs?
  2014-03-16  6:06                           ` Marc MERLIN
@ 2014-03-16 17:09                             ` Chris Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Murphy @ 2014-03-16 17:09 UTC (permalink / raw)
  To: Btrfs

On Mar 16, 2014, at 12:06 AM, Marc MERLIN <marc@merlins.org> wrote:
> 
> Mmmh, so now I'm confused.
> 
> See this:
> 
> === START OF INFORMATION SECTION ===
> Device Model:     INTEL SSDSC2BW180A3L
> Serial Number:    CVCV215200XU180EGN
> LU WWN Device Id: 5 001517 bb28c5317
> Firmware Version: LE1i
> User Capacity:    180,045,766,656 bytes [180 GB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   ACS-2 (minor revision not indicated)
> SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
> Local Time is:    Sat Mar 15 15:49:06 2014 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> polgara:/usr/src# smartctl --identify /dev/sda | grep '^ 169'
> 169      -     0x0001   Data Set Management support
> 169      0          1   Trim bit in DATA SET MANAGEMENT command supported
> 
> This is a super old SSD from 3 years ago. Clearly it can't support
> synchronous discard, right?

No. The first signs I saw of queued trim appearing in the wild were in the third quarter of 2013. I'm pretty sure SAS SSDs have always had a queued trim command, so for workloads that demanded it and had the budget, this would never have been a problem.
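
Note that IDENTIFY word 169 bit 0, as quoted above, only says the drive accepts trim through DATA SET MANAGEMENT; it says nothing about whether trim can be queued. A small Python sketch of reading that bit from `smartctl --identify` output follows. The regular expression assumes the column layout shown in the quoted output, and the function name is hypothetical.

```python
import re

# Matches the raw-word line of `smartctl --identify`, for example:
# " 169      -     0x0001   Data Set Management support"
WORD_169 = re.compile(r"^\s*169\s+-\s+0x([0-9a-fA-F]{4})\b", re.MULTILINE)


def trim_bit_set(identify_output):
    """Return True/False for IDENTIFY word 169 bit 0, or None if absent."""
    match = WORD_169.search(identify_output)
    if match is None:
        return None  # word 169 not present in this output
    return bool(int(match.group(1), 16) & 0x0001)


if __name__ == "__main__":
    sample = (
        " 169      -     0x0001   Data Set Management support\n"
        " 169      0          1   Trim bit in DATA SET MANAGEMENT command supported\n"
    )
    print(trim_bit_set(sample))
```

A True result here is expected even on old drives like the one above, since plain (non-queued) trim has been around much longer than the queued variant.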

> Yet, deleting a kernel tree also takes 1.5 seconds:
> polgara:/usr/src# time rm -rf linux-3.14-rc5/
> real    0m1.441s
> user    0m0.048s
> sys     0m1.352s

I don't know that this is a good test, for two reasons. First, does rm always call trim before the rm command completes? If trim is batched or delayed, it could happen well after. Second, and more of a factor, the queue needs to have pending commands in it that the trim command will have to wait for. The problem with non-queued trim is that it requires the queue to be empty. So you'd need a test or workload that causes that behavior to be a problem.
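
To make the queue-draining point concrete, here is a deliberately simplified Python model. It is a toy under stated assumptions, not a description of any real drive: queued commands are assumed to be serviced concurrently, a free queue slot is assumed to be available, and all timings are made-up numbers.

```python
def read_completion_ms(inflight_ms, trim_ms, read_ms, queued_trim):
    """Toy model: when does a read submitted right after a trim complete?

    inflight_ms lists the remaining service times of commands already in
    the queue when the trim is issued. Real hardware is more complicated.
    """
    if queued_trim:
        # Trim is just another queue entry; the read does not wait for it.
        return read_ms
    # Non-queued trim: the queue must drain first, the trim runs alone,
    # and only afterwards is the read accepted.
    drain_ms = max(inflight_ms, default=0)
    return drain_ms + trim_ms + read_ms


if __name__ == "__main__":
    busy_queue = [2, 5, 3]   # hypothetical in-flight commands (ms remaining)
    print(read_completion_ms(busy_queue, trim_ms=20, read_ms=1, queued_trim=False))
    print(read_completion_ms(busy_queue, trim_ms=20, read_ms=1, queued_trim=True))
```

In this model the cost of a non-queued trim shows up as latency for other I/O, so a lone rm on an otherwise idle system would not be expected to reveal it.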

Yet another factor with trim is that some SSDs immediately and aggressively start garbage collection and become slow to respond to anything, while others are smarter about doing this when the drive isn't as busy.

Chris Murphy

* Re: discard synchronous on most SSDs?
  2014-03-16 16:22                       ` Martin K. Petersen
@ 2014-03-16 17:50                         ` Marc MERLIN
  0 siblings, 0 replies; 32+ messages in thread
From: Marc MERLIN @ 2014-03-16 17:50 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Chris Samuel, linux-btrfs

On Sun, Mar 16, 2014 at 12:22:05PM -0400, Martin K. Petersen wrote:
> queued trim, not even a prototype. I went out and bought an 840 EVO this
> morning because the general lazyweb opinion seemed to indicate that this
> drive supports queued trim. Well, it doesn't. At least not in the 120GB
> version:
> 
> # smartctl -l gplog,0x13 /dev/sda
> smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.14.0-rc6+] (local build)
> Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> General Purpose Log 0x13 does not exist (override with '-T permissive' option)
> 
> If there's a drive with a working queued trim implementation out there,
> I'd like to know about it...

I tried that for you on my 840 EVO 1TB and got the same as you:
legolas:/usr/src# smartctl -l gplog,0x13 /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.14.0-rc5-amd64-i915-preempt-20140216c] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

General Purpose Log 0x13 does not exist (override with '-T permissive' option)

Now, back to the fact that it takes 1.5 sec to delete a kernel tree with
discard on: it doesn't seem faster with discard off on either that
drive or my very old Intel SSD, so I'm starting to think that this is kind
of a non-problem and/or that something else makes rm of a kernel tree
take around 1.5 sec.

Is it much faster on an SSD for someone else?
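
One way to read the timing quoted earlier in this thread (real 0m1.441s, user 0m0.048s, sys 0m1.352s) is to split it into CPU time and time spent waiting. The Python sketch below does that arithmetic; the function names are made up, and the parser assumes bash's default `time` format. A small waiting share suggests the rm itself is bound by kernel CPU work rather than by the device, though it says nothing about trims issued after rm returns.

```python
import re

TIME_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s", re.MULTILINE)


def parse_time_output(text):
    """Parse bash `time` output (real/user/sys lines) into seconds."""
    times = {}
    for name, minutes, seconds in TIME_LINE.findall(text):
        times[name] = int(minutes) * 60 + float(seconds)
    return times


def off_cpu_seconds(times):
    """Wall-clock time not accounted for by user or system CPU time."""
    return times["real"] - times["user"] - times["sys"]


if __name__ == "__main__":
    # Figures copied from the rm -rf timing quoted in this thread.
    sample = "real    0m1.441s\nuser    0m0.048s\nsys     0m1.352s\n"
    times = parse_time_output(sample)
    print(f"waiting: {off_cpu_seconds(times):.3f} s")
    print(f"sys share of real: {times['sys'] / times['real']:.0%}")
```

With those figures, roughly 0.04 s of the 1.44 s is spent off-CPU, and the rest is almost entirely system time.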

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

Thread overview: 32+ messages
2014-03-09  7:48 Massive BTRFS performance degradation KC
2014-03-09  8:17 ` Swâmi Petaramesh
2014-03-09 10:01   ` Martin Steigerwald
2014-03-09 10:23     ` Swâmi Petaramesh
2014-03-09 11:33       ` Hugo Mills
2014-03-09 11:54         ` Martin Steigerwald
2014-03-09 12:10         ` Swâmi Petaramesh
2014-03-09 17:14           ` boris
2014-03-14  2:11         ` discard synchronous on most SSDs? Marc MERLIN
2014-03-14  3:39           ` Chris Murphy
2014-03-14  5:17             ` Marc MERLIN
2014-03-14  7:33               ` Chris Samuel
2014-03-14 19:26                 ` Marc MERLIN
2014-03-14 19:57                   ` Martin K. Petersen
2014-03-14 20:46                     ` Holger Hoffstätte
2014-03-15  4:21                       ` Marc MERLIN
2014-03-15  9:38                         ` Holger Hoffstätte
2014-03-15  5:25                     ` Chris Samuel
2014-03-15  6:48                       ` Chris Samuel
2014-03-15 11:26                         ` Duncan
2014-03-15 22:48                           ` Chris Samuel
2014-03-16  6:06                           ` Marc MERLIN
2014-03-16 17:09                             ` Chris Murphy
2014-03-16 16:22                       ` Martin K. Petersen
2014-03-16 17:50                         ` Marc MERLIN
2014-03-15  4:06                 ` Chris Samuel
2014-03-16 16:07                   ` Martin K. Petersen
2014-03-14 12:07               ` Duncan
2014-03-14 21:44               ` Chris Murphy
2014-03-14  7:27             ` Chris Samuel
2014-03-09 17:36   ` Massive BTRFS performance degradation Austin S Hemmelgarn
2014-03-09 18:55     ` Tobias Holst
