* How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-07-23 6:42 UTC
To: linux-btrfs
I just realized that the older thread got a bit confusing, so I'll keep
problems separate and make things simpler :)
On an _unencrypted_ partition on the SSD, running du -sh on a directory
with 15K files takes 23 seconds, versus 4 seconds on an encrypted
spinning drive, both with a similar btrfs filesystem and the same
kernel (3.4.4).
Unencrypted btrfs on SSD:
gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M src
real 0m22.667s
Encrypted btrfs on spinning drive of the same src directory:
gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M src
real 0m3.881s
I've run this many times and get the same numbers.
I've tried deadline and noop on /dev/sda (the SSD) and du is just as slow.
I also tried with:
- space_cache and nospace_cache
- ssd and nossd
- noatime didn't seem to help even though I was hopeful on this one.
In all cases, I get:
gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
514M src
real 0m22.537s
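For reference, the scheduler and mount-option variants above can be cycled
from a small loop; a minimal sketch (the paths and options are the ones
from this thread, the loop itself is just an assumption about how one
would script it):

for sched in noop deadline cfq; do
    echo $sched > /sys/block/sda/queue/scheduler
    echo 3 > /proc/sys/vm/drop_caches
    time du -sh /mnt/mnt2/src
done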
I'm having the same slow speed on 2 btrfs filesystems on the same SSD.
One is encrypted, the other one isn't:
Label: 'btrfs_pool1' uuid: d570c40a-4a0b-4d03-b1c9-cff319fc224d
Total devices 1 FS bytes used 144.74GB
devid 1 size 441.70GB used 195.04GB path /dev/dm-0
Label: 'boot' uuid: 84199644-3542-430a-8f18-a5aa58959662
Total devices 1 FS bytes used 2.33GB
devid 1 size 25.00GB used 5.04GB path /dev/sda2
If instead of stating a bunch of files, I try reading a big file, I do get speeds
that are quite fast (253MB/s and 423MB/s).
22 seconds for 15K files on an SSD is super slow, and 5 times slower
than a spinning disk with the same data.
What's going on?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-07-24 6:18 UTC
To: linux-btrfs
On Sun, Jul 22, 2012 at 11:42:03PM -0700, Marc MERLIN wrote:
> I just realized that the older thread got a bit confusing, so I'll keep
> problems separate and make things simpler :)
Since yesterday, I tried other kernels, including nopreempt, volpreempt
and preempt builds of 3.4.4.
I also tried a default 3.2.0 kernel from debian (all amd64), but that did
not help. I'm still seeing close to 25 seconds to scan 15K files.
How can it possibly be so slow?
More importantly, how can I provide useful debug information?
- I don't think it's a problem with the kernel since I tried 4 kernels,
including a default debian one.
- Alignment seems ok, I made sure the partition start sector was divisible by 512:
/dev/sda2 502272 52930559 26214144 83 Linux
- I tried another brand new btrfs, and things are even slower now.
gandalfthegreat:/mnt/mnt2# mount -o ssd,discard,noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:/mnt/mnt2# reset_cache
gandalfthegreat:/mnt/mnt2# time du -sh src/
514M src/
real 0m29.584s
gandalfthegreat:/mnt/mnt2# find src/| wc -l
15261
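(For reference: reset_cache is a local helper, not a standard tool; given
the explicit drop_caches runs earlier in the thread, it is presumably
something like:

reset_cache() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}
)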
This is bad enough that there ought to be a way to debug this, right?
Can you suggest something?
Thanks,
Marc
> On an _unencrypted_ partition on the SSD, running du -sh on a directory
> with 15K files takes 23 seconds, versus 4 seconds on an encrypted
> spinning drive, both with a similar btrfs filesystem and the same
> kernel (3.4.4).
>
> Unencrypted btrfs on SSD:
> gandalfthegreat:~# mount -o compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
> gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
> 514M src
> real 0m22.667s
>
> Encrypted btrfs on spinning drive of the same src directory:
> gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
> 514M src
> real 0m3.881s
>
> I've run this many times and get the same numbers.
> I've tried deadline and noop on /dev/sda (the SSD) and du is just as slow.
>
> I also tried with:
> - space_cache and nospace_cache
> - ssd and nossd
> - noatime didn't seem to help even though I was hopeful on this one.
>
> In all cases, I get:
> gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du -sh src
> 514M src
> real 0m22.537s
>
>
> I'm having the same slow speed on 2 btrfs filesystems on the same SSD.
> One is encrypted, the other one isn't:
> Label: 'btrfs_pool1' uuid: d570c40a-4a0b-4d03-b1c9-cff319fc224d
> Total devices 1 FS bytes used 144.74GB
> devid 1 size 441.70GB used 195.04GB path /dev/dm-0
>
> Label: 'boot' uuid: 84199644-3542-430a-8f18-a5aa58959662
> Total devices 1 FS bytes used 2.33GB
> devid 1 size 25.00GB used 5.04GB path /dev/sda2
>
> If instead of stating a bunch of files, I try reading a big file, I do get speeds
> that are quite fast (253MB/s and 423MB/s).
>
> 22 seconds for 15K files on an SSD is super slow, and 5 times slower
> than a spinning disk with the same data.
> What's going on?
>
> Thanks,
> Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Martin Steigerwald @ 2012-07-24 7:56 UTC
To: linux-btrfs; +Cc: Marc MERLIN
Am Montag, 23. Juli 2012 schrieb Marc MERLIN:
> I just realized that the older thread got a bit confusing, so I'll keep
> problems separate and make things simpler :)
>
> On an _unencrypted_ partition on the SSD, running du -sh on a directory
> with 15K files takes 23 seconds, versus 4 seconds on an encrypted
> spinning drive, both with a similar btrfs filesystem and the same
> kernel (3.4.4).
>
> Unencrypted btrfs on SSD:
> gandalfthegreat:~# mount -o
> compress=lzo,discard,nossd,space_cache,noatime /dev/sda2 /mnt/mnt2
> gandalfthegreat:/mnt/mnt2# echo 3 > /proc/sys/vm/drop_caches; time du
> -sh src 514M src
> real 0m22.667s
>
> Encrypted btrfs on spinning drive of the same src directory:
> gandalfthegreat:/var/local# echo 3 > /proc/sys/vm/drop_caches; time du
> -sh src 514M src
> real 0m3.881s
find is fast, du is much slower:
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
404166
( find /usr | wc -l; ) 0,03s user 0,07s system 1% cpu 9,212 total
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )
11G /usr
( du -sh /usr; ) 1,00s user 19,07s system 41% cpu 48,886 total
Now I try to find something with fewer files.
merkaba:~> find /usr/share/doc | wc -l
50715
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr/share/doc
| wc -l )
50715
( find /usr/share/doc | wc -l; ) 0,00s user 0,02s system 1% cpu 1,398
total
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh
/usr/share/doc )
606M /usr/share/doc
( du -sh /usr/share/doc; ) 0,20s user 3,63s system 35% cpu 10,691 total
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time du -sh /usr/share/doc
606M /usr/share/doc
du -sh /usr/share/doc 0,19s user 3,54s system 35% cpu 10,386 total
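The gap is plausible: find can usually get away with getdents() plus the
d_type hint, while du has to stat() every entry, and on a cold cache each
stat can mean another btree block read. A quick way to see the syscall
difference (a sketch, not from the thread; strace -c prints its summary
on stderr):

strace -c find /usr/share/doc > /dev/null
strace -c du -sh /usr/share/doc > /dev/null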
Anyway, that's still much faster than your measurements.
merkaba:~> df -hT /usr
Filesystem Type Size Used Avail Use% Mounted on
/dev/dm-0 btrfs 19G 11G 5,6G 67% /
merkaba:~> btrfs fi sh
failed to read /dev/sr0
Label: 'debian' uuid: […]
Total devices 1 FS bytes used 10.25GB
devid 1 size 18.62GB used 18.62GB path /dev/dm-0
Btrfs Btrfs v0.19
merkaba:~> btrfs fi df /
Data: total=15.10GB, used=9.59GB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.75GB, used=670.43MB
Metadata: total=8.00MB, used=0.00
merkaba:~> grep btrfs /proc/mounts
/dev/dm-0 / btrfs rw,noatime,compress=lzo,ssd,space_cache,inode_cache 0 0
Somewhat aged BTRFS filesystem on ThinkPad T520, Intel SSD 320, kernel
3.5.
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-07-27 4:40 UTC
To: Martin Steigerwald; +Cc: linux-btrfs
On Tue, Jul 24, 2012 at 09:56:26AM +0200, Martin Steigerwald wrote:
> find is fast, du is much slower:
>
> merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( find /usr | wc -l )
> 404166
> ( find /usr | wc -l; ) 0,03s user 0,07s system 1% cpu 9,212 total
> merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; time ( du -sh /usr )
> 11G /usr
> ( du -sh /usr; ) 1,00s user 19,07s system 41% cpu 48,886 total
You're right.
gandalfthegreat [mc]# time du -sh src
514M src
real 0m25.159s
gandalfthegreat [mc]# reset_cache
gandalfthegreat [mc]# time bash -c "find src | wc -l"
15261
real 0m7.614s
But find is still slower than du on my spinning disk for the same tree.
> Anyway thats still much faster than your measurements.
Right. Your numbers look reasonable for an SSD.
Since then, I did some more tests and I'm also getting slower than normal
speeds with ext4, an indication that it's a problem with the block layer.
I'm working with some folks to try to pin down the core problem, but it
looks like it's not an easy one.
Thanks for your data.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Chris Mason @ 2012-07-27 11:08 UTC
To: Marc MERLIN; +Cc: linux-btrfs@vger.kernel.org
On Mon, Jul 23, 2012 at 12:42:03AM -0600, Marc MERLIN wrote:
>
> 22 seconds for 15K files on an SSD is super slow, and 5 times slower
> than a spinning disk with the same data.
> What's going on?
Hi Marc,
The easiest way to figure this out is with latencytop. I'd either run the
latencytop GUI or use the latencytop -c patch, which sends a text dump to
the console.
This is assuming that you're not pegged at 100% CPU...
https://oss.oracle.com/~mason/latencytop.patch
-chris
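For reference, a plausible sequence for the suggested run (the build
steps are assumptions, only the -c text-dump mode is described above;
latencytop is built from source here):

wget https://oss.oracle.com/~mason/latencytop.patch
cd latencytop-*                       # unpacked latencytop source tree
patch -p1 < ../latencytop.patch       # patch level may differ
make
./latencytop -c > /tmp/latencytop.txt # capture while du -sh runs in another shell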
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-07-27 18:42 UTC
To: Chris Mason, linux-btrfs@vger.kernel.org
On Fri, Jul 27, 2012 at 07:08:35AM -0400, Chris Mason wrote:
> On Mon, Jul 23, 2012 at 12:42:03AM -0600, Marc MERLIN wrote:
> >
> > 22 seconds for 15K files on an SSD is super slow, and 5 times slower
> > than a spinning disk with the same data.
> > What's going on?
>
> Hi Marc,
>
> The easiest way to figure this out is with latencytop. I'd either run the
> latencytop GUI or use the latencytop -c patch, which sends a text dump to
> the console.
>
> This is assuming that you're not pegged at 100% CPU...
>
> https://oss.oracle.com/~mason/latencytop.patch
Thanks for the patch, and yes, I can confirm I'm definitely not pegged on
CPU (not even close, and I get the same problem with an unencrypted
filesystem; actually, du -sh is exactly the same speed on encrypted and
unencrypted).
Here's the result I think you were looking for. I'm not good at reading this,
but hopefully it tells you something useful :)
The full run is here if that helps:
http://marc.merlins.org/tmp/latencytop.txt
Process du (6748) Total: 4280.5 msec
Reading directory content 15.2 msec 11.0 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages
btree_read_extent_buffer_pages.constprop.110 read_tree_block
read_block_for_search.isra.32 btrfs_next_leaf
btrfs_real_readdir vfs_readdir sys_getdents system_call_fastpath
[sleep_on_page] 13.5 msec 88.2 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages
btree_read_extent_buffer_pages.constprop.110 read_tree_block
read_block_for_search.isra.32 btrfs_search_slot
btrfs_lookup_csum __btrfs_lookup_bio_sums
btrfs_lookup_bio_sums btrfs_submit_compressed_read btrfs_submit_bio_hook
Page fault 12.9 msec 0.6 %
sleep_on_page_killable wait_on_page_bit_killable
__lock_page_or_retry filemap_fault __do_fault handle_pte_fault
handle_mm_fault do_page_fault page_fault
Executing a program 7.1 msec 0.2 %
sleep_on_page_killable __lock_page_killable
generic_file_aio_read do_sync_read vfs_read kernel_read
prepare_binprm do_execve_common.isra.27 do_execve sys_execve
stub_execve
Process du (6748) Total: 9517.4 msec
[sleep_on_page] 23.0 msec 82.8 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages
btree_read_extent_buffer_pages.constprop.110 read_tree_block
read_block_for_search.isra.32 btrfs_search_slot
btrfs_lookup_inode btrfs_iget btrfs_lookup_dentry btrfs_lookup
__lookup_hash
Reading directory content 13.2 msec 17.2 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages
btree_read_extent_buffer_pages.constprop.110 read_tree_block
read_block_for_search.isra.32 btrfs_search_slot
btrfs_real_readdir vfs_readdir sys_getdents system_call_fastpath
Process du (6748) Total: 9524.0 msec
[sleep_on_page] 17.1 msec 88.5 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages
btree_read_extent_buffer_pages.constprop.110 read_tree_block
read_block_for_search.isra.32 btrfs_search_slot
btrfs_lookup_inode btrfs_iget btrfs_lookup_dentry btrfs_lookup
__lookup_hash
Reading directory content 16.0 msec 11.5 %
sleep_on_page wait_on_page_bit read_extent_buffer_pages
btree_read_extent_buffer_pages.constprop.110 read_tree_block
read_block_for_search.isra.32 btrfs_search_slot
btrfs_real_readdir vfs_readdir sys_getdents system_call_fastpath
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-08-01 6:01 UTC
To: Chris Mason, linux-btrfs@vger.kernel.org, Ted Ts'o
Cc: Lukáš Czerner, linux-ext4, axboe, Milan Broz
On Fri, Jul 27, 2012 at 11:42:39AM -0700, Marc MERLIN wrote:
> > https://oss.oracle.com/~mason/latencytop.patch
>
> Thanks for the patch, and yes, I can confirm I'm definitely not pegged on
> CPU (not even close, and I get the same problem with an unencrypted
> filesystem; actually, du -sh is exactly the same speed on encrypted and
> unencrypted).
>
> Here's the result I think you were looking for. I'm not good at reading this,
> but hopefully it tells you something useful :)
>
> The full run is here if that helps:
> http://marc.merlins.org/tmp/latencytop.txt
I did some other tests since last week; my laptop is hard to use
considering how slow the SSD is.
(TL;DR: ntfs on linux via fuse is 33% faster than ext4, which is 2x faster
than btrfs, but 3x slower than the same filesystem on spinning disk :( )
Ok, just to help with debugging this,
1) I put my samsung 830 SSD into another thinkpad and it wasn't faster or
slower.
2) Then I put in a Crucial C300 256GB SSD (the replacement for the one I
had that just died and killed all my data), and du took 0.3 seconds on
both my old and new thinkpads.
The old thinkpad is running ubuntu 32bit, the new one debian testing
64bit, both with kernel 3.4.4.
So, clearly, there is something wrong with the samsung 830 SSD with
linux, but I have no clue what :(
In raw speed (dd) the samsung is faster than the crucial (350MB/s vs
500MB/s).
If it were a random crappy SSD from a random vendor, I'd blame the SSD,
but I have a hard time believing that samsung is selling SSDs that are
slower than hard drives at random IO and 'seeks'.
3) I just got a 2nd ssd from samsung (same kind), just to make sure the one
I had wasn't bad. It's brand new, and I formatted it carefully on 512
boundaries:
/dev/sda1 2048 502271 250112 83 Linux
/dev/sda2 502272 52930559 26214144 7 HPFS/NTFS/exFAT
/dev/sda3 52930560 73902079 10485760 82 Linux swap / Solaris
/dev/sda4 73902080 1000215215 463156568 83 Linux
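For reference, that boundary can be double-checked via sysfs, which
reports partition starts in 512-byte sectors (502272 is the sda2 start
shown above):

cat /sys/block/sda/sda2/start    # prints 502272
echo $((502272 % 512))           # prints 0, i.e. on a 512-sector boundary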
I also upgraded to 3.5.0 in the meantime but unfortunately the results are
similar.
First: btrfs is the slowest:
gandalfthegreat:/mnt/ssd/var/local# time du -sh src/
514M src/
real 0m25.741s
gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
/dev/mapper/ssd /mnt/ssd/var btrfs rw,noatime,compress=lzo,ssd,discard,space_cache 0 0
Second: ext4 is 2x faster than btrfs with mkfs.ext4 -O extent -b 4096 /dev/sda3
gandalfthegreat:/mnt/mnt3# reset_cache
gandalfthegreat:/mnt/mnt3# time du -sh src/
519M src/
real 0m12.459s
gandalfthegreat:~# grep mnt3 /proc/mounts
/dev/sda3 /mnt/mnt3 ext4 rw,noatime,discard,data=ordered 0 0
Third: a freshly made ntfs filesystem through fuse is actually FASTER!
gandalfthegreat:/mnt/mnt2# reset_cache
gandalfthegreat:/mnt/mnt2# time du -sh src/
506M src/
real 0m8.928s
gandalfthegreat:/mnt/mnt2# grep mnt2 /proc/mounts
/dev/sda2 /mnt/mnt2 fuseblk rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 0 0
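(For reference, the ntfs filesystem was presumably created and mounted
with something like the following; the exact mkntfs options are not given
in the thread:

mkntfs -Q /dev/sda2              # -Q: quick format, skips zeroing
mount -t ntfs-3g /dev/sda2 /mnt/mnt2
)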
How can ntfs via fuse be the fastest and btrfs so slow?
Of course, all 3 are slower than the same filesystem on spinning disk too, but
I'm wondering if there is a scheduling issue that is somehow causing the
extreme slowness I'm seeing.
Did the latencytop trace I got help in any way?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Fajar A. Nugraha @ 2012-08-01 6:08 UTC
To: Marc MERLIN; +Cc: linux-btrfs@vger.kernel.org
On Wed, Aug 1, 2012 at 1:01 PM, Marc MERLIN <marc@merlins.org> wrote:
> So, clearly, there is something wrong with the samsung 830 SSD with linux
> If it were a random crappy SSD from a random vendor, I'd blame the SSD, but
> I have a hard time believing that samsung is selling SSDs that are slower
> than hard drives at random IO and 'seeks'.
You'd be surprised at how badly some vendors can screw up :)
> First: btrfs is the slowest:
> gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
> /dev/mapper/ssd /mnt/ssd/var btrfs rw,noatime,compress=lzo,ssd,discard,space_cache 0 0
Just checking: did you explicitly activate "discard"? Because on my
setup (with a corsair SSD) it made things MUCH slower. Also, try adding
"noatime" (just in case the slowdown was because "du" causes many
access time updates).
--
Fajar
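For reference, the usual alternative to the discard mount option is batch
trimming from userspace; a sketch (fstrim is in util-linux):

mount -o remount,nodiscard /mnt/mnt2
fstrim -v /mnt/mnt2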
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-08-01 6:21 UTC
To: Fajar A. Nugraha; +Cc: linux-btrfs@vger.kernel.org
On Wed, Aug 01, 2012 at 01:08:46PM +0700, Fajar A. Nugraha wrote:
> > If it were a random crappy SSD from a random vendor, I'd blame the SSD, but
> > I have a hard time believing that samsung is selling SSDs that are slower
> > than hard drives at random IO and 'seeks'.
>
> You'd be surprised at how badly some vendors can screw up :)
At some point, it may come down to that indeed :-/
I'm still hopeful that Samsung didn't, but we'll see.
> > First: btrfs is the slowest:
>
> > gandalfthegreat:/mnt/ssd/var/local# grep /mnt/ssd/var /proc/mounts
> > /dev/mapper/ssd /mnt/ssd/var btrfs rw,noatime,compress=lzo,ssd,discard,space_cache 0 0
>
> Just checking: did you explicitly activate "discard"? Because on my
Yes. Note that it should be a noop when all you're doing is stating
inodes and not writing (I'm using noatime).
> setup (with a corsair SSD) it made things MUCH slower. Also, try adding
> "noatime" (just in case the slowdown was because "du" causes many
> access time updates).
I have noatime in there already :)
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Chris Samuel @ 2012-08-01 6:36 UTC
To: Marc MERLIN
Cc: Chris Mason, linux-btrfs@vger.kernel.org, Ted Ts'o,
Lukáš Czerner, linux-ext4, axboe, Milan Broz
On 01/08/12 16:01, Marc MERLIN wrote:
> Third: a freshly made ntfs filesystem through fuse is actually FASTER!
Could it be that Samsung's FTL has optimisations in it for NTFS?
A horrible thought, but not impossible..
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-08-01 6:40 UTC
To: Chris Samuel
Cc: Chris Mason, linux-btrfs@vger.kernel.org, Ted Ts'o,
Lukáš Czerner, linux-ext4, axboe, Milan Broz
On Wed, Aug 01, 2012 at 04:36:22PM +1000, Chris Samuel wrote:
> On 01/08/12 16:01, Marc MERLIN wrote:
>
> > Third: a freshly made ntfs filesystem through fuse is actually FASTER!
>
> Could it be that Samsung's FTL has optimisations in it for NTFS?
>
> A horrible thought, but not impossible..
Not impossible, but it can't be the main reason, since it still clocks 2x
slower with ntfs than a spinning disk with encrypted btrfs.
Since SSDs should "seek" 10-100x faster than spinning disks, that can't be
the only reason.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Martin Steigerwald @ 2012-08-01 21:57 UTC
To: linux-btrfs; +Cc: Marc MERLIN, Fajar A. Nugraha
Hi Marc,
Am Mittwoch, 1. August 2012 schrieb Marc MERLIN:
> On Wed, Aug 01, 2012 at 01:08:46PM +0700, Fajar A. Nugraha wrote:
> > > If it were a random crappy SSD from a random vendor, I'd blame the
> > > SSD, but I have a hard time believing that samsung is selling SSDs
> > > that are slower than hard drives at random IO and 'seeks'.
> >
> > You'd be surprised at how badly some vendors can screw up :)
>
> At some point, it may come down to that indeed :-/
> I'm still hopefully that Samsung didn't, but we'll see.
It's getting quite strange.
I lost track of whether you did that already or not, but if you didn't,
please post some
vmstat 1
iostat -xd 1
on the device while it is being slow.
I am interested in wait I/O and latencies and disk utilization.
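A simple way to capture both for exactly the duration of the slow run (a
sketch; adjust device and path):

vmstat 1 > /tmp/vmstat.log &
iostat -xd 1 /dev/sda > /tmp/iostat.log &
echo 3 > /proc/sys/vm/drop_caches; time du -sh src
kill %1 %2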
Comparison data of Intel SSD 320 in ThinkPad T520 during
merkaba:~> echo 3 > /proc/sys/vm/drop_caches ; du -sch /usr
on BTRFS with Kernel 3.5:
martin@merkaba:~> vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 1 21556 4442668 2056 502352 0 0 194 85 247 120 11 2 87 0
1 2 21556 4408888 2448 514884 0 0 11684 328 4975 24585 5 16 65 14
1 0 21556 4389880 2448 528060 0 0 13400 0 4574 23452 2 16 68 14
3 1 21556 4370068 2448 545052 0 0 18132 0 5499 27220 1 18 64 16
1 0 21556 4350228 2448 555580 0 0 10856 0 4122 25339 3 16 67 14
1 1 21556 4315604 2448 569756 0 0 12648 0 4647 31153 5 14 66 15
0 1 21556 4295652 2456 581480 0 0 11548 56 4093 24618 2 13 69 16
0 1 21556 4286720 2456 591580 0 0 10824 0 3750 21445 1 12 71 16
0 1 21556 4266308 2456 603620 0 0 12932 0 4841 26447 4 12 68 17
1 0 21556 4248228 2456 613808 0 0 10264 4 3703 22108 1 13 71 15
5 1 21556 4231976 2456 624356 0 0 10540 0 3581 20436 1 10 72 17
0 1 21556 4197168 2456 639108 0 0 12952 0 4738 28223 4 15 66 15
4 1 21556 4178456 2456 650552 0 0 11656 0 4234 23480 2 14 68 16
0 1 21556 4163616 2456 662992 0 0 13652 0 4619 26580 1 16 70 13
4 1 21556 4138288 2456 675696 0 0 13352 0 4422 22254 1 16 70 13
1 0 21556 4113204 2456 689060 0 0 13232 0 4312 21936 1 15 70 14
0 1 21556 4085532 2456 704160 0 0 14972 0 4820 24238 1 16 69 14
2 0 21556 4055740 2456 719644 0 0 15736 0 5099 25513 3 17 66 14
0 1 21556 4028612 2456 734380 0 0 14504 0 4795 25052 3 15 68 14
2 0 21556 3999108 2456 749040 0 0 14656 16 4672 21878 1 17 69 13
1 1 21556 3972732 2456 762108 0 0 12972 0 4717 22411 1 17 70 13
5 0 21556 3949684 2584 773484 0 0 11528 52 4837 24107 3 15 67 15
1 0 21556 3912504 2584 787420 0 0 12156 0 4883 25201 4 15 67 14
martin@merkaba:~> iostat -xd 1 /dev/sda
Linux 3.5.0-tp520 (merkaba) 01.08.2012 _x86_64_ (4 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1,29 1,44 11,58 12,78 684,74 299,75 80,81 0,24 9,86 0,95 17,93 0,29 0,71
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2808,00 0,00 11232,00 0,00 8,00 0,57 0,21 0,21 0,00 0,19 54,50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2967,00 0,00 11868,00 0,00 8,00 0,63 0,21 0,21 0,00 0,21 60,90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 11,00 2992,00 4,00 11968,00 56,00 8,03 0,64 0,22 0,22 0,25 0,21 62,00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2680,00 0,00 10720,00 0,00 8,00 0,70 0,26 0,26 0,00 0,25 66,70
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3153,00 0,00 12612,00 0,00 8,00 0,72 0,23 0,23 0,00 0,22 69,30
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2769,00 0,00 11076,00 0,00 8,00 0,63 0,23 0,23 0,00 0,21 58,00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 2523,00 1,00 10092,00 4,00 8,00 0,74 0,29 0,29 0,00 0,28 71,30
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3026,00 0,00 12104,00 0,00 8,00 0,73 0,24 0,24 0,00 0,21 64,00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3069,00 0,00 12276,00 0,00 8,00 0,67 0,22 0,22 0,00 0,20 62,00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3346,00 0,00 13384,00 0,00 8,00 0,64 0,19 0,19 0,00 0,18 59,90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3188,00 0,00 12752,00 0,00 8,00 0,80 0,25 0,25 0,00 0,17 54,00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3433,00 0,00 13732,00 0,00 8,00 1,03 0,30 0,30 0,00 0,17 57,00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3565,00 0,00 14260,00 0,00 8,00 0,92 0,26 0,26 0,00 0,16 57,30
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3972,00 0,00 15888,00 0,00 8,00 1,13 0,29 0,29 0,00 0,16 62,90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3743,00 0,00 14972,00 0,00 8,00 1,03 0,28 0,28 0,00 0,16 59,40
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3408,00 0,00 13632,00 0,00 8,00 1,08 0,32 0,32 0,00 0,17 56,70
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 0,00 3730,00 3,00 14920,00 16,00 8,00 1,14 0,31 0,31 0,00 0,15 56,30
I also suggest using fio with the ssd-test example on the SSD. I
have some comparison data available for my setup. Heck, it should be
publicly available in my ADMIN magazine article about fio. I used slightly
different fio jobs with block sizes of 2k to 16k, but it's similar enough,
and I might even have some 4k examples at hand or can easily create one.
I also raised size and duration a bit.
An example based on what's in my article:
[global]
ioengine=libaio
direct=1
iodepth=64
runtime=60
filename=testfile
size=2G
bsrange=2k-16k
refill_buffers=1
[randomwrite]
stonewall
rw=randwrite
[sequentialwrite]
stonewall
rw=write
[randomread]
stonewall
rw=randread
[sequentialread]
stonewall
rw=read
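For reference, to run it, save the job above to a file (the name is
arbitrary) and point fio at it from a directory on the filesystem under
test:

fio ssd-test.job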
Remove bsrange if you want 4k blocks only. I put the writes above the
reads because the writes with refill_buffers=1 initialize the testfile
with random data; otherwise it would contain only zeros, which modern
SandForce controllers compress nicely. The above job is untested,
but it should do.
Please remove the test file, and fstrim your disk after having run any
write tests unless you have the discard mount option on. (Not necessarily
after each one ;)
Also I am interested in
merkaba:~> hdparm -I /dev/sda | grep -i queue
Queue depth: 32
* Native Command Queueing (NCQ)
output for your SSD.
Try to load the ssd with different iodepths up to twice the amount
displayed by hdparm.
But note: a du -sch will not reach that high an iodepth. I expect it to
go as low as an iodepth of one in that case. You can see in my comparison
output that a single du -sch is not able to load the Intel SSD 320 fully:
load is just about 50-70%, and iowait is just about 10-20%.
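To mimic a du-style load more closely, a single-depth random-read variant
can be run against the same testfile; a sketch (an assumption-based
example, not from the article: 4k blocks, iodepth=1):

cat > du-like.job <<'EOF'
[global]
ioengine=libaio
direct=1
iodepth=1
filename=testfile
size=2G
bs=4k
[du-like-randread]
rw=randread
runtime=60
EOF
fio du-like.job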
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Marc MERLIN @ 2012-08-02 5:07 UTC
To: Martin Steigerwald; +Cc: linux-btrfs, Fajar A. Nugraha
On Wed, Aug 01, 2012 at 11:57:39PM +0200, Martin Steigerwald wrote:
> It's getting quite strange.
I would agree :)
Before I paste a bunch of things, I wanted to thank you for not giving up
on me and offering your time to help me figure this out :)
> > I lost track of whether you did that already or not, but if you didn't,
> > please post some
>
> vmstat 1
> iostat -xd 1
> on the device while it is being slow.
Sure thing, here's the 24 second du -s run:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 1 0 2747264 44 348388 0 0 28 50 242 184 19 6 74 1
1 0 0 2744128 44 351700 0 0 144 0 2758 32115 30 5 61 4
2 1 0 2743100 44 351992 0 0 792 0 2616 30613 28 4 50 18
1 1 0 2741592 44 352668 0 0 776 0 2574 31551 29 4 45 21
1 1 0 2740720 44 353432 0 0 692 0 2734 32891 30 4 45 22
1 1 0 2740104 44 354284 0 0 460 0 2639 31585 30 4 45 21
3 1 0 2738520 44 354692 0 0 544 264 2834 30302 32 5 42 21
1 1 0 2736936 44 355476 0 0 1064 2012 2867 31172 28 4 45 23
3 1 0 2734892 44 356840 0 0 1236 0 2865 31100 30 3 46 21
3 1 0 2733840 44 357432 0 0 636 12 2930 31362 30 4 45 21
3 1 0 2735152 44 357892 0 0 612 0 2546 31641 28 4 46 22
3 1 0 2733608 44 358680 0 0 668 0 2846 29747 30 3 46 22
2 1 0 2731480 44 359556 0 0 712 248 2744 27939 31 3 44 22
2 1 0 2732688 44 360736 0 0 1204 0 2774 30979 30 3 46 21
4 1 0 2730276 44 361872 0 0 872 256 3174 31657 30 5 45 20
1 1 0 2728372 44 363008 0 0 912 132 2752 31811 30 3 46 22
1 1 0 2727060 44 363240 0 0 476 0 2595 30313 33 4 43 21
1 1 0 2725856 44 364000 0 0 580 0 2547 30803 29 4 46 21
1 1 0 2723628 44 364724 0 0 756 272 2766 30652 31 4 44 22
2 1 0 2722128 44 365616 0 0 744 8 2771 31760 30 4 45 21
2 1 0 2721024 44 366284 0 0 788 196 2695 31819 31 4 44 22
3 1 0 2719312 44 367084 0 0 732 0 2737 32038 30 5 45 20
1 1 0 2717436 44 367880 0 0 728 0 2570 30810 29 3 46 22
3 1 0 2716820 44 368500 0 0 608 0 2614 30949 29 4 45 22
2 1 0 2715576 44 369240 0 0 628 264 2715 31180 34 4 40 22
1 1 0 2714308 44 369952 0 0 624 0 2533 31155 28 4 46 22
Linux 3.5.0-amd64-preempt-noide-20120410 (gandalfthegreat) 08/01/2012 _x86_64_ (4 CPU)
rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2.18 0.68 6.45 17.77 78.12 153.39 19.12 0.18 7.51 8.52 7.15 4.46 10.81
0.00 0.00 118.00 0.00 540.00 0.00 9.15 1.18 9.93 9.93 0.00 4.98 58.80
0.00 0.00 217.00 0.00 868.00 0.00 8.00 1.90 8.77 8.77 0.00 4.44 96.40
0.00 0.00 192.00 0.00 768.00 0.00 8.00 1.63 8.44 8.44 0.00 5.10 98.00
0.00 0.00 119.00 0.00 476.00 0.00 8.00 1.06 9.01 9.01 0.00 8.20 97.60
0.00 0.00 125.00 0.00 500.00 0.00 8.00 1.08 8.67 8.67 0.00 7.55 94.40
0.00 0.00 165.00 0.00 660.00 0.00 8.00 1.50 9.12 9.12 0.00 5.87 96.80
0.00 0.00 195.00 13.00 780.00 272.00 10.12 1.68 8.10 7.94 10.46 4.65 96.80
0.00 0.00 173.00 0.00 692.00 0.00 8.00 1.72 9.87 9.87 0.00 5.71 98.80
0.00 0.00 171.00 0.00 684.00 0.00 8.00 1.62 9.33 9.33 0.00 5.75 98.40
0.00 0.00 161.00 0.00 644.00 0.00 8.00 1.52 9.57 9.57 0.00 6.14 98.80
0.00 0.00 136.00 0.00 544.00 0.00 8.00 1.26 9.29 9.29 0.00 7.24 98.40
0.00 0.00 199.00 0.00 796.00 0.00 8.00 1.94 9.73 9.73 0.00 4.94 98.40
0.00 0.00 201.00 0.00 804.00 0.00 8.00 1.70 8.54 8.54 0.00 4.80 96.40
0.00 0.00 272.00 15.00 1088.00 272.00 9.48 2.35 8.21 8.46 3.73 3.39 97.20
0.00 0.00 208.00 1.00 832.00 4.00 8.00 1.90 9.01 9.04 4.00 4.67 97.60
0.00 0.00 228.00 0.00 912.00 0.00 8.00 1.94 8.56 8.56 0.00 4.23 96.40
0.00 0.00 107.00 0.00 428.00 0.00 8.00 0.98 9.05 9.05 0.00 9.12 97.60
0.00 0.00 163.00 0.00 652.00 0.00 8.00 1.60 9.89 9.89 0.00 6.01 98.00
0.00 0.00 162.00 0.00 648.00 0.00 8.00 1.83 10.20 10.20 0.00 6.07 98.40
1.00 10.00 224.00 71.00 900.00 2157.50 20.73 2.82 10.14 9.50 12.17 3.35 98.80
41.00 8.00 166.00 176.00 832.00 333.50 6.82 2.48 7.24 8.48 6.07 2.92 100.00
18.00 0.00 78.00 72.00 388.00 36.00 5.65 2.30 15.28 16.56 13.89 6.67 100.00
13.00 0.00 190.00 25.00 824.00 12.50 7.78 1.95 9.17 9.43 7.20 4.58 98.40
0.00 0.00 156.00 0.00 624.00 0.00 8.00 1.36 8.72 8.72 0.00 6.26 97.60
0.00 0.00 158.00 19.00 632.00 64.00 7.86 1.50 8.45 8.61 7.16 5.60 99.20
0.00 0.00 176.00 0.00 704.00 0.00 8.00 1.65 9.39 9.39 0.00 5.61 98.80
0.00 0.00 44.00 0.00 176.00 0.00 8.00 0.38 8.55 8.55 0.00 8.27 36.40
> I am interested in wait I/O and latencies and disk utilization.
Cool tool, I didn't know about iostat.
My r_await numbers don't look good obviously and yet %util is pretty much
100% the entire time.
Does that show that it's indeed the device that is unable to deliver the requests any quicker, despite
being an ssd, or are you reading this differently?
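The numbers are at least internally consistent: by Little's law the
average queue equals arrival rate times latency, which matches iostat's
own avgqu-sz column above:

avg queue ≈ r/s × r_await ≈ 170/s × 0.009 s ≈ 1.5   (avgqu-sz shows ~1.0-1.9)

So 100% util here just means a single ~9 ms read is in flight nearly all
the time, which is what a one-request-at-a-time du produces; the anomaly
is the ~9 ms itself, versus the ~0.2 ms r_await in the Intel SSD 320
output earlier.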
> Also I am interested in
> merkaba:~> hdparm -I /dev/sda | grep -i queue
> Queue depth: 32
> * Native Command Queueing (NCQ)
> output for your SSD.
gandalfthegreat:/var/local# hdparm -I /dev/sda | grep -i queue
Queue depth: 32
* Native Command Queueing (NCQ)
gandalfthegreat:/var/local#
I've run the fio tests in:
/dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
(discard is there, so fstrim shouldn't be needed)
> I also suggest to use fio with with the ssd-test example on the SSD. I
> have some comparison data available for my setup. Heck it should be
> publicly available in my ADMIN magazine article about fio. I used a bit
> different fio jobs with block sizes of 2k to 16k, but its similar enough
> and I might even have some 4k examples at hand or can easily create one.
> I also raised size and duration a bit.
>
> An example based on whats in my article:
Thanks, here's the output.
I see bw jumping from bw=1700.8KB/s all the way to bw=474684KB/s depending
on the test. That's "interesting" to say the least.
So, doctor, is it bad? :)
randomwrite: (g=0): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentialwrite: (g=1): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
randomread: (g=2): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentialread: (g=3): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
2.0.8
Starting 4 processes
randomwrite: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [___R] [100.0% done] [558.8M/0K /s] [63.8K/0 iops] [eta 00m:00s]
randomwrite: (groupid=0, jobs=1): err= 0: pid=7193
write: io=102048KB, bw=1700.8KB/s, iops=189 , runt= 60003msec
slat (usec): min=21 , max=219834 , avg=5250.91, stdev=5936.55
clat (usec): min=25 , max=738932 , avg=329339.45, stdev=106004.63
lat (msec): min=4 , max=751 , avg=334.59, stdev=107.57
clat percentiles (msec):
| 1.00th=[ 225], 5.00th=[ 241], 10.00th=[ 247], 20.00th=[ 260],
| 30.00th=[ 269], 40.00th=[ 281], 50.00th=[ 293], 60.00th=[ 306],
| 70.00th=[ 322], 80.00th=[ 351], 90.00th=[ 545], 95.00th=[ 570],
| 99.00th=[ 627], 99.50th=[ 644], 99.90th=[ 709], 99.95th=[ 725],
| 99.99th=[ 742]
bw (KB/s) : min= 11, max= 2591, per=99.83%, avg=1697.13, stdev=491.48
lat (usec) : 50=0.01%
lat (msec) : 10=0.02%, 20=0.02%, 50=0.05%, 100=0.14%, 250=12.89%
lat (msec) : 500=72.44%, 750=14.43%
cpu : usr=0.30%, sys=2.65%, ctx=9699, majf=0, minf=19
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.4%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=11396/d=0, short=r=0/w=0/d=0
sequentialwrite: (groupid=1, jobs=1): err= 0: pid=7218
write: io=80994KB, bw=1349.7KB/s, iops=149 , runt= 60011msec
slat (usec): min=15 , max=296798 , avg=6658.06, stdev=7414.10
clat (usec): min=15 , max=962319 , avg=418984.76, stdev=128626.12
lat (msec): min=12 , max=967 , avg=425.64, stdev=130.36
clat percentiles (msec):
| 1.00th=[ 269], 5.00th=[ 306], 10.00th=[ 322], 20.00th=[ 338],
| 30.00th=[ 355], 40.00th=[ 371], 50.00th=[ 388], 60.00th=[ 404],
| 70.00th=[ 420], 80.00th=[ 445], 90.00th=[ 603], 95.00th=[ 766],
| 99.00th=[ 881], 99.50th=[ 906], 99.90th=[ 938], 99.95th=[ 947],
| 99.99th=[ 963]
bw (KB/s) : min= 418, max= 1952, per=99.31%, avg=1339.72, stdev=354.39
lat (usec) : 20=0.01%
lat (msec) : 20=0.01%, 50=0.11%, 100=0.27%, 250=0.29%, 500=86.93%
lat (msec) : 750=6.87%, 1000=5.51%
cpu : usr=0.25%, sys=2.32%, ctx=13426, majf=0, minf=21
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=8991/d=0, short=r=0/w=0/d=0
randomread: (groupid=2, jobs=1): err= 0: pid=7234
read : io=473982KB, bw=7899.5KB/s, iops=934 , runt= 60005msec
slat (usec): min=2 , max=31048 , avg=1065.57, stdev=3519.63
clat (usec): min=21 , max=246981 , avg=67367.84, stdev=33459.21
lat (msec): min=1 , max=263 , avg=68.43, stdev=33.81
clat percentiles (msec):
| 1.00th=[ 15], 5.00th=[ 25], 10.00th=[ 33], 20.00th=[ 39],
| 30.00th=[ 50], 40.00th=[ 57], 50.00th=[ 61], 60.00th=[ 70],
| 70.00th=[ 76], 80.00th=[ 89], 90.00th=[ 111], 95.00th=[ 139],
| 99.00th=[ 176], 99.50th=[ 190], 99.90th=[ 217], 99.95th=[ 229],
| 99.99th=[ 247]
bw (KB/s) : min= 2912, max=11909, per=100.00%, avg=7900.46, stdev=2255.68
lat (usec) : 50=0.01%
lat (msec) : 2=0.17%, 4=0.16%, 10=0.35%, 20=2.49%, 50=27.44%
lat (msec) : 100=55.05%, 250=14.34%
cpu : usr=0.38%, sys=2.89%, ctx=8425, majf=0, minf=276
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=56073/w=0/d=0, short=r=0/w=0/d=0
sequentialread: (groupid=3, jobs=1): err= 0: pid=7249
read : io=2048.0MB, bw=474684KB/s, iops=52754 , runt= 4418msec
slat (usec): min=1 , max=44869 , avg=15.35, stdev=320.96
clat (usec): min=0 , max=67787 , avg=1145.60, stdev=3487.25
lat (usec): min=2 , max=67801 , avg=1161.14, stdev=3508.06
clat percentiles (usec):
| 1.00th=[ 10], 5.00th=[ 187], 10.00th=[ 217], 20.00th=[ 249],
| 30.00th=[ 278], 40.00th=[ 302], 50.00th=[ 330], 60.00th=[ 362],
| 70.00th=[ 398], 80.00th=[ 450], 90.00th=[ 716], 95.00th=[ 6624],
| 99.00th=[16064], 99.50th=[20096], 99.90th=[44800], 99.95th=[61696],
| 99.99th=[67072]
bw (KB/s) : min=50063, max=635019, per=97.34%, avg=462072.50, stdev=202894.18
lat (usec) : 2=0.65%, 4=0.02%, 10=0.30%, 20=0.24%, 50=0.36%
lat (usec) : 100=0.52%, 250=18.08%, 500=65.03%, 750=4.88%, 1000=0.52%
lat (msec) : 2=0.81%, 4=0.95%, 10=4.72%, 20=2.43%, 50=0.45%
lat (msec) : 100=0.05%
cpu : usr=7.52%, sys=67.56%, ctx=1494, majf=0, minf=277
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=233071/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=102048KB, aggrb=1700KB/s, minb=1700KB/s, maxb=1700KB/s, mint=60003msec, maxt=60003msec
Run status group 1 (all jobs):
WRITE: io=80994KB, aggrb=1349KB/s, minb=1349KB/s, maxb=1349KB/s, mint=60011msec, maxt=60011msec
Run status group 2 (all jobs):
READ: io=473982KB, aggrb=7899KB/s, minb=7899KB/s, maxb=7899KB/s, mint=60005msec, maxt=60005msec
Run status group 3 (all jobs):
READ: io=2048.0MB, aggrb=474683KB/s, minb=474683KB/s, maxb=474683KB/s, mint=4418msec, maxt=4418msec
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
From: Martin Steigerwald @ 2012-08-02 11:18 UTC
To: Marc MERLIN; +Cc: linux-btrfs, Fajar A. Nugraha
Am Donnerstag, 2. August 2012 schrieb Marc MERLIN:
> On Wed, Aug 01, 2012 at 11:57:39PM +0200, Martin Steigerwald wrote:
> > It's getting quite strange.
>
> I would agree :)
>
> Before I paste a bunch of things, I wanted to thank you for not giving up
> on me and offering your time to help me figure this out :)
You are welcome.
Well, I hold Linux performance analysis & tuning trainings, and I am
really interested in issues like this ;)
I will take care of myself and take my time to respond, or even stop
responding, if I run out of ideas ;)
> > I lost track of whether you did that already or not, but if you didn't,
> > please post some
> >
> > vmstat 1
> > iostat -xd 1
> > on the device while it is being slow.
>
> Sure thing, here's the 24 second du -s run:
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 2 1 0 2747264 44 348388 0 0 28 50 242 184 19 6 74 1
> 1 0 0 2744128 44 351700 0 0 144 0 2758 32115 30 5 61 4
> 2 1 0 2743100 44 351992 0 0 792 0 2616 30613 28 4 50 18
> 1 1 0 2741592 44 352668 0 0 776 0 2574 31551 29 4 45 21
> 1 1 0 2740720 44 353432 0 0 692 0 2734 32891 30 4 45 22
> 1 1 0 2740104 44 354284 0 0 460 0 2639 31585 30 4 45 21
> 3 1 0 2738520 44 354692 0 0 544 264 2834 30302 32 5 42 21
> 1 1 0 2736936 44 355476 0 0 1064 2012 2867 31172 28 4 45 23
A bit more wait I/O with not even 10% of the throughput compared to
the Intel SSD 320 figures. Seems that the Intel SSD is running circles
around your Samsung SSD while not, as expected for that use case, being
fully utilized.
> Linux 3.5.0-amd64-preempt-noide-20120410 (gandalfthegreat) 08/01/2012 _x86_64_ (4 CPU)
> rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> 2.18 0.68 6.45 17.77 78.12 153.39 19.12 0.18 7.51 8.52 7.15 4.46 10.81
> 0.00 0.00 118.00 0.00 540.00 0.00 9.15 1.18 9.93 9.93 0.00 4.98 58.80
> 0.00 0.00 217.00 0.00 868.00 0.00 8.00 1.90 8.77 8.77 0.00 4.44 96.40
> 0.00 0.00 192.00 0.00 768.00 0.00 8.00 1.63 8.44 8.44 0.00 5.10 98.00
> 0.00 0.00 119.00 0.00 476.00 0.00 8.00 1.06 9.01 9.01 0.00 8.20 97.60
> 0.00 0.00 125.00 0.00 500.00 0.00 8.00 1.08 8.67 8.67 0.00 7.55 94.40
> 0.00 0.00 165.00 0.00 660.00 0.00 8.00 1.50 9.12 9.12 0.00 5.87 96.80
> 0.00 0.00 195.00 13.00 780.00 272.00 10.12 1.68 8.10 7.94 10.46 4.65 96.80
> 0.00 0.00 173.00 0.00 692.00 0.00 8.00 1.72 9.87 9.87 0.00 5.71 98.80
> 0.00 0.00 171.00 0.00 684.00 0.00 8.00 1.62 9.33 9.33 0.00 5.75 98.40
> 0.00 0.00 161.00 0.00 644.00 0.00 8.00 1.52 9.57 9.57 0.00 6.14 98.80
> 0.00 0.00 136.00 0.00 544.00 0.00 8.00 1.26 9.29 9.29 0.00 7.24 98.40
> 0.00 0.00 199.00 0.00 796.00 0.00 8.00 1.94 9.73 9.73 0.00 4.94 98.40
> 0.00 0.00 201.00 0.00 804.00 0.00 8.00 1.70 8.54 8.54 0.00 4.80 96.40
> 0.00 0.00 272.00 15.00 1088.00 272.00 9.48 2.35 8.21 8.46 3.73 3.39 97.20
[…]
> > I am interested in wait I/O and latencies and disk utilization.
>
> Cool tool, I didn't know about iostat.
> My r_await numbers don't look good obviously and yet %util is pretty much
> 100% the entire time.
>
> Does that show that it's indeed the device that is unable to deliver the requests any quicker, despite
> being an ssd, or are you reading this differently?
That, or…
> > Also I am interested in
> > merkaba:~> hdparm -I /dev/sda | grep -i queue
> > Queue depth: 32
> > * Native Command Queueing (NCQ)
> > output for your SSD.
>
> gandalfthegreat:/var/local# hdparm -I /dev/sda | grep -i queue
> Queue depth: 32
> * Native Command Queueing (NCQ)
> gandalfthegreat:/var/local#
>
> I've run the fio tests in:
> /dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
… you are still using dm_crypt?
Please test without dm_crypt. My figures are from within LVM, but no
dm_crypt. It's good to have a comparable base for the measurements.
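For reference, a minimal way to get that comparable base (a sketch;
/dev/sdXn stands for a hypothetical scratch partition, and mkfs destroys
its contents):

mkfs.btrfs /dev/sdXn
mount -o noatime /dev/sdXn /mnt/plain
cd /mnt/plain && fio /path/to/ssd-test.job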
> (discard is there, so fstrim shouldn't be needed)
I can't imagine why it should matter, but maybe it's worth having some
tests without "discard".
> > I also suggest using fio with the ssd-test example on the SSD. I
> > have some comparison data available for my setup. Heck, it should be
> > publicly available in my ADMIN magazine article about fio. I used slightly
> > different fio jobs with block sizes of 2k to 16k, but it's similar enough,
> > and I might even have some 4k examples at hand or can easily create one.
> > I also raised size and duration a bit.
> >
> > An example based on what's in my article:
>
> Thanks, here's the output.
> I see bw jumping from bw=1700.8KB/s all the way to bw=474684KB/s depending
> on the test. That's "interesting" to say the least.
>
> So, doctor, is it bad? :)
Well, I do not engineer those SSDs, but to me it looks like either dm_crypt
is wildly hogging performance or the SSD is way slower than what I would
expect. You did some tests without dm_crypt AFAIR, but it would still be
interesting to repeat these fio tests without dm_crypt as well.
> randomwrite: (g=0): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentialwrite: (g=1): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> randomread: (g=2): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentialread: (g=3): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8
> Starting 4 processes
> randomwrite: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [___R] [100.0% done] [558.8M/0K /s] [63.8K/0 iops] [eta 00m:00s]
> randomwrite: (groupid=0, jobs=1): err= 0: pid=7193
> write: io=102048KB, bw=1700.8KB/s, iops=189 , runt= 60003msec
> slat (usec): min=21 , max=219834 , avg=5250.91, stdev=5936.55
> clat (usec): min=25 , max=738932 , avg=329339.45, stdev=106004.63
> lat (msec): min=4 , max=751 , avg=334.59, stdev=107.57
> clat percentiles (msec):
> | 1.00th=[ 225], 5.00th=[ 241], 10.00th=[ 247], 20.00th=[ 260],
> | 30.00th=[ 269], 40.00th=[ 281], 50.00th=[ 293], 60.00th=[ 306],
> | 70.00th=[ 322], 80.00th=[ 351], 90.00th=[ 545], 95.00th=[ 570],
> | 99.00th=[ 627], 99.50th=[ 644], 99.90th=[ 709], 99.95th=[ 725],
> | 99.99th=[ 742]
> bw (KB/s) : min= 11, max= 2591, per=99.83%, avg=1697.13, stdev=491.48
> lat (usec) : 50=0.01%
> lat (msec) : 10=0.02%, 20=0.02%, 50=0.05%, 100=0.14%, 250=12.89%
> lat (msec) : 500=72.44%, 750=14.43%
Gosh, look at these latencies!
72.44% of all requests above 500 (in words: five hundred) milliseconds!
And 14.43% above 750 msecs. The percentage of requests served at 100 msecs
or less was below one percent! Hey, is this an SSD or what?
Please really test without dm_crypt so that we can see whether it
introduces any kind of latency and if so in what amount.
But then, let me compare. You are using iodepth=64, which fills the
device with requests at maximum speed, so this will likely increase
latencies.
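With the queue kept full, Little's law gives average completion latency
≈ iodepth / IOPS, and the random-write numbers fit it exactly:

64 / 189 IOPS   ≈ 340 ms    (Marc's reported avg clat ~329 ms)
64 / 14032 IOPS ≈ 4.6 ms    (the Intel SSD 320 run below, avg clat ~4.5 ms)

So the horrid latencies are just the ~75x IOPS gap seen from a full
queue; the open question is why the setup only sustains ~190 random-write
IOPS.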
martin@merkaba:~[…]> fio iops-iodepth64.job
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
fio 1.57
Starting 4 processes
[…]
zufälligschreiben: (groupid=2, jobs=1): err= 0: pid=13103
write: io=2048.0MB, bw=76940KB/s, iops=14032 , runt= 27257msec
slat (usec): min=3 , max=6426 , avg=14.35, stdev=22.33
clat (usec): min=82 , max=414190 , avg=4540.81, stdev=4996.47
lat (usec): min=97 , max=414205 , avg=4555.53, stdev=4996.42
bw (KB/s) : min=12890, max=112728, per=100.39%, avg=77240.48, stdev=20310.79
cpu : usr=9.67%, sys=29.47%, ctx=72053, majf=0, minf=532
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=0/382480/0, short=0/0/0
lat (usec): 100=0.01%, 250=0.14%, 500=0.66%, 750=0.82%, 1000=1.52%
lat (msec): 2=10.86%, 4=27.88%, 10=57.18%, 20=0.79%, 50=0.03%
lat (msec): 100=0.07%, 250=0.04%, 500=0.01%
Still, even with iodepth 64, a totally different picture. And look at the
IOPS and throughput.
For reference, this refers to
[global]
ioengine=libaio
direct=1
iodepth=64
# For random data over the complete length
# of the file, run the sequential job beforehand
filename=testdatei
size=2G
bsrange=2k-16k
refill_buffers=1
[zufälliglesen]
rw=randread
runtime=60
[sequentielllesen]
stonewall
rw=read
runtime=60
[zufälligschreiben]
stonewall
rw=randwrite
runtime=60
[sequentiellschreiben]
stonewall
rw=write
runtime=60
but on Ext4 instead of BTRFS.
This could be another good test: Ext4 on a plain logical volume
without dm_crypt.
> cpu : usr=0.30%, sys=2.65%, ctx=9699, majf=0, minf=19
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.4%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued : total=r=0/w=11396/d=0, short=r=0/w=0/d=0
> sequentialwrite: (groupid=1, jobs=1): err= 0: pid=7218
> write: io=80994KB, bw=1349.7KB/s, iops=149 , runt= 60011msec
> slat (usec): min=15 , max=296798 , avg=6658.06, stdev=7414.10
> clat (usec): min=15 , max=962319 , avg=418984.76, stdev=128626.12
> lat (msec): min=12 , max=967 , avg=425.64, stdev=130.36
> clat percentiles (msec):
> | 1.00th=[ 269], 5.00th=[ 306], 10.00th=[ 322], 20.00th=[ 338],
> | 30.00th=[ 355], 40.00th=[ 371], 50.00th=[ 388], 60.00th=[ 404],
> | 70.00th=[ 420], 80.00th=[ 445], 90.00th=[ 603], 95.00th=[ 766],
> | 99.00th=[ 881], 99.50th=[ 906], 99.90th=[ 938], 99.95th=[ 947],
> | 99.99th=[ 963]
> bw (KB/s) : min= 418, max= 1952, per=99.31%, avg=1339.72, stdev=354.39
> lat (usec) : 20=0.01%
> lat (msec) : 20=0.01%, 50=0.11%, 100=0.27%, 250=0.29%, 500=86.93%
> lat (msec) : 750=6.87%, 1000=5.51%
Sequential write latencies are abysmal as well.
sequentiellschreiben: (groupid=3, jobs=1): err= 0: pid=13105
write: io=2048.0MB, bw=155333KB/s, iops=17290 , runt= 13501msec
slat (usec): min=2 , max=4299 , avg=13.51, stdev=18.91
clat (usec): min=334 , max=201706 , avg=3680.94, stdev=2088.53
lat (usec): min=340 , max=201718 , avg=3694.83, stdev=2088.28
bw (KB/s) : min=144952, max=162856, per=100.04%, avg=155394.92, stdev=5720.43
cpu : usr=16.77%, sys=31.17%, ctx=47790, majf=0, minf=535
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=0/233444/0, short=0/0/0
lat (usec): 500=0.01%, 750=0.02%, 1000=0.03%
lat (msec): 2=0.55%, 4=77.52%, 10=21.86%, 20=0.01%, 100=0.01%
lat (msec): 250=0.01%
> cpu : usr=0.25%, sys=2.32%, ctx=13426, majf=0, minf=21
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued : total=r=0/w=8991/d=0, short=r=0/w=0/d=0
> randomread: (groupid=2, jobs=1): err= 0: pid=7234
> read : io=473982KB, bw=7899.5KB/s, iops=934 , runt= 60005msec
> slat (usec): min=2 , max=31048 , avg=1065.57, stdev=3519.63
> clat (usec): min=21 , max=246981 , avg=67367.84, stdev=33459.21
> lat (msec): min=1 , max=263 , avg=68.43, stdev=33.81
> clat percentiles (msec):
> | 1.00th=[ 15], 5.00th=[ 25], 10.00th=[ 33], 20.00th=[ 39],
> | 30.00th=[ 50], 40.00th=[ 57], 50.00th=[ 61], 60.00th=[ 70],
> | 70.00th=[ 76], 80.00th=[ 89], 90.00th=[ 111], 95.00th=[ 139],
> | 99.00th=[ 176], 99.50th=[ 190], 99.90th=[ 217], 99.95th=[ 229],
> | 99.99th=[ 247]
> bw (KB/s) : min= 2912, max=11909, per=100.00%, avg=7900.46, stdev=2255.68
> lat (usec) : 50=0.01%
> lat (msec) : 2=0.17%, 4=0.16%, 10=0.35%, 20=2.49%, 50=27.44%
> lat (msec) : 100=55.05%, 250=14.34%
Even read latencies are really high.
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=13101
read : io=2048.0MB, bw=162545KB/s, iops=29719 , runt= 12902msec
slat (usec): min=2 , max=1485 , avg=10.58, stdev=10.81
clat (usec): min=246 , max=18706 , avg=2140.05, stdev=1073.28
lat (usec): min=267 , max=18714 , avg=2150.97, stdev=1074.42
bw (KB/s) : min=108000, max=205060, per=101.08%, avg=164305.12, stdev=32494.26
cpu : usr=12.74%, sys=49.83%, ctx=82018, majf=0, minf=276
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=383439/0/0, short=0/0/0
lat (usec): 250=0.01%, 500=0.03%, 750=0.30%, 1000=3.21%
lat (msec): 2=48.26%, 4=44.68%, 10=3.35%, 20=0.18%
Here for comparison that 500 GB 2.5 inch eSATA Hitachi disk (5400rpm):
merkaba:[…]> fio --readonly iops-gerät-lesend-iodepth64.job
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
fio 1.57
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [79411K/0K /s] [8617 /0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=32578
read : io=58290KB, bw=984592 B/s, iops=106 , runt= 60623msec
slat (usec): min=3 , max=82 , avg=24.63, stdev= 6.64
clat (msec): min=40 , max=2825 , avg=602.78, stdev=374.10
lat (msec): min=40 , max=2825 , avg=602.80, stdev=374.10
bw (KB/s) : min= 0, max= 1172, per=65.52%, avg=629.66, stdev=466.50
cpu : usr=0.24%, sys=0.64%, ctx=6443, majf=0, minf=275
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.5%, >=64=99.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=6435/0/0, short=0/0/0
lat (msec): 50=0.20%, 100=2.61%, 250=15.37%, 500=26.64%, 750=25.07%
lat (msec): 1000=16.13%, 2000=13.44%, >=2000=0.54%
Okay, so at least your SSD has shorter random read latencies than this
slow harddisk. ;)
It has a queue depth of 32 as well.
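(As a quick check, the negotiated queue depth can be read from sysfs,
assuming the disk is sda:)

cat /sys/block/sda/device/queue_depth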
> cpu : usr=0.38%, sys=2.89%, ctx=8425, majf=0, minf=276
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued : total=r=56073/w=0/d=0, short=r=0/w=0/d=0
> sequentialread: (groupid=3, jobs=1): err= 0: pid=7249
> read : io=2048.0MB, bw=474684KB/s, iops=52754 , runt= 4418msec
> slat (usec): min=1 , max=44869 , avg=15.35, stdev=320.96
> clat (usec): min=0 , max=67787 , avg=1145.60, stdev=3487.25
> lat (usec): min=2 , max=67801 , avg=1161.14, stdev=3508.06
> clat percentiles (usec):
> | 1.00th=[ 10], 5.00th=[ 187], 10.00th=[ 217], 20.00th=[ 249],
> | 30.00th=[ 278], 40.00th=[ 302], 50.00th=[ 330], 60.00th=[ 362],
> | 70.00th=[ 398], 80.00th=[ 450], 90.00th=[ 716], 95.00th=[ 6624],
> | 99.00th=[16064], 99.50th=[20096], 99.90th=[44800], 99.95th=[61696],
> | 99.99th=[67072]
> bw (KB/s) : min=50063, max=635019, per=97.34%, avg=462072.50, stdev=202894.18
> lat (usec) : 2=0.65%, 4=0.02%, 10=0.30%, 20=0.24%, 50=0.36%
> lat (usec) : 100=0.52%, 250=18.08%, 500=65.03%, 750=4.88%, 1000=0.52%
> lat (msec) : 2=0.81%, 4=0.95%, 10=4.72%, 20=2.43%, 50=0.45%
> lat (msec) : 100=0.05%
> cpu : usr=7.52%, sys=67.56%, ctx=1494, majf=0, minf=277
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued : total=r=233071/w=0/d=0, short=r=0/w=0/d=0
Intel SSD 320:
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=13102
read : io=2048.0MB, bw=252699KB/s, iops=28043 , runt= 8299msec
slat (usec): min=2 , max=1416 , avg=10.75, stdev=10.95
clat (usec): min=305 , max=201105 , avg=2268.82, stdev=2066.69
lat (usec): min=319 , max=201114 , avg=2279.94, stdev=2066.61
bw (KB/s) : min=249424, max=254472, per=100.03%, avg=252776.50, stdev=1416.63
cpu : usr=12.29%, sys=43.62%, ctx=43913, majf=0, minf=278
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=232729/0/0, short=0/0/0
lat (usec): 500=0.01%, 750=0.06%, 1000=0.10%
lat (msec): 2=11.00%, 4=88.32%, 10=0.50%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%
Harddisk again:
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=32580
read : io=4510.3MB, bw=76964KB/s, iops=8551 , runt= 60008msec
slat (usec): min=1 , max=3373 , avg=10.03, stdev= 8.96
clat (msec): min=1 , max=49 , avg= 7.47, stdev= 1.14
lat (msec): min=1 , max=49 , avg= 7.48, stdev= 1.14
bw (KB/s) : min=72608, max=79588, per=100.05%, avg=77003.96, stdev=1626.66
cpu : usr=6.89%, sys=16.96%, ctx=229486, majf=0, minf=277
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=513184/0/0, short=0/0/0
lat (msec): 2=0.01%, 4=0.02%, 10=99.67%, 20=0.29%, 50=0.01%
Maximum latencies on our SSD are longer; minimum latencies are shorter.
Well, sequential data might not be located sequentially on the SSD: the SSD
controller might put the contents of a big file into totally different places.
> Run status group 0 (all jobs):
> WRITE: io=102048KB, aggrb=1700KB/s, minb=1700KB/s, maxb=1700KB/s, mint=60003msec, maxt=60003msec
>
> Run status group 1 (all jobs):
> WRITE: io=80994KB, aggrb=1349KB/s, minb=1349KB/s, maxb=1349KB/s, mint=60011msec, maxt=60011msec
>
> Run status group 2 (all jobs):
> READ: io=473982KB, aggrb=7899KB/s, minb=7899KB/s, maxb=7899KB/s, mint=60005msec, maxt=60005msec
>
> Run status group 3 (all jobs):
> READ: io=2048.0MB, aggrb=474683KB/s, minb=474683KB/s, maxb=474683KB/s, mint=4418msec, maxt=4418msec
That's pathetic, except for the last job group (each group has just one job
here because of the stonewall commands), which reads sequentially. Strangely,
sequential write is also abysmally slow.
This is how I expect it to look with iodepth 64:
Run status group 0 (all jobs):
READ: io=2048.0MB, aggrb=162544KB/s, minb=166445KB/s, maxb=166445KB/s, mint=12902msec, maxt=12902msec
Run status group 1 (all jobs):
READ: io=2048.0MB, aggrb=252699KB/s, minb=258764KB/s, maxb=258764KB/s, mint=8299msec, maxt=8299msec
Run status group 2 (all jobs):
WRITE: io=2048.0MB, aggrb=76939KB/s, minb=78786KB/s, maxb=78786KB/s, mint=27257msec, maxt=27257msec
Run status group 3 (all jobs):
WRITE: io=2048.0MB, aggrb=155333KB/s, minb=159061KB/s, maxb=159061KB/s, mint=13501msec, maxt=13501msec
Can you also post the last lines:
Disk stats (read/write):
dm-2: ios=616191/613142, merge=0/0, ticks=1300820/2565384, in_queue=3867448, util=98.81%, aggrios=504829/504643, aggrmerge=111362/111451, aggrticks=1058320/2164664, aggrin_queue=3223048, aggrutil=98.78%
sda: ios=504829/504643, merge=111362/111451, ticks=1058320/2164664, in_queue=3223048, util=98.78%
martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/Messungen/merkaba>
It gives information on how well the I/O scheduler was able to merge requests.
I didn't see much of a difference between CFQ and noop, so it may not
matter much, but since it also gives a number for total disk utilization
it's still quite nice to have.
So my recommendation as of now:
Remove as many factors as possible, and to compare results with
what I posted, try a plain logical volume with Ext4.
If the values are still quite slow, I think it's good to ask Linux
block layer experts – for example by posting on the fio mailing list, where
people are subscribed who may be able to provide other test
results – and SSD experts. There might be a Linux block layer mailing
list, or use libata or fsdevel, I don't know.
If your Samsung SSD turns out to be this slow or almost this slow on
such a basic level, I am out of ideas.
Except for one thing:
Have the IOPS test run on the device itself. That will remove any filesystem
layer. But run only the read-only tests; to be safe I suggest using fio
with the --readonly option as a safety guard. Unless you have a spare SSD
that you can afford to use for write testing, which will likely destroy
every filesystem on it. Or let it run on just one logical volume.
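A minimal command-line sketch of such a read-only raw-device run, assuming
the SSD is /dev/sda (--readonly makes fio refuse any job that would write):

fio --readonly --name=rawrandread --ioengine=libaio --direct=1 \
    --rw=randread --bsrange=2k-16k --runtime=60 --filename=/dev/sda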
If it's then slow, I'd tend to use a different SSD and test from
there. If it's then faster, I'd tend to believe that it's either a hardware
or, probably more likely, a compatibility issue.
What does your SSD report as its discard alignment? Well, even that should
not matter for reads:
merkaba:/sys/block/sda> cat discard_alignment
0
Seems Intel tells us not to care at all. Or the value cannot be read
for some reason.
merkaba:/sys/block/sda> cd queue
merkaba:/sys/block/sda/queue> grep . *
add_random:1
discard_granularity:512
discard_max_bytes:2147450880
discard_zeroes_data:1
hw_sector_size:512
I would be interested in whether the values above differ.
grep: iosched: Is a directory
iostats:1
logical_block_size:512
max_hw_sectors_kb:32767
max_integrity_segments:0
max_sectors_kb:512
max_segments:168
max_segment_size:65536
minimum_io_size:512
nomerges:0
nr_requests:128
optimal_io_size:0
physical_block_size:512
read_ahead_kb:128
rotational:0
rq_affinity:1
scheduler:noop deadline [cfq]
And whether there is any optimal io size.
One note regarding this: back then I set up the Ext4 filesystem I ran the
above tests on last year to be aligned to what I thought could be good for
the SSD. I do not know whether this matters much.
merkaba:~> tune2fs -l /dev/merkaba/home
RAID stripe width: 128
Which should be in blocks, and thus align to 512 KiB. Hmmm, I am a bit puzzled
about this value at the moment. Either 1 MiB or 128 KiB would make sense
to me. But then I do not know the erase block size of that SSD.
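(For reference, the arithmetic behind that figure, assuming the usual
4 KiB Ext4 block size: 128 blocks × 4 KiB/block = 512 KiB.)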
Then add BTRFS and then dm_crypt and look at how the numbers change.
I think this is a plan to find out whether it's really the hardware
or some weird behavior in the low-level parts of Linux, e.g. the block
layer, dm_crypt or the filesystem.
Reduce it to the most basic level and then work from there.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 5:07 ` Marc MERLIN
2012-08-02 11:18 ` Martin Steigerwald
@ 2012-08-02 11:25 ` Martin Steigerwald
1 sibling, 0 replies; 22+ messages in thread
From: Martin Steigerwald @ 2012-08-02 11:25 UTC (permalink / raw)
To: linux-btrfs; +Cc: Marc MERLIN, Fajar A. Nugraha
On Thursday, 2 August 2012, Marc MERLIN wrote:
> So, doctor, is it bad? :)
>
> randomwrite: (g=0): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentialwrite: (g=1): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> randomread: (g=2): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentialread: (g=3): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8
> Starting 4 processes
> randomwrite: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [___R] [100.0% done] [558.8M/0K /s] [63.8K/0 iops] [eta 00m:00s]
> randomwrite: (groupid=0, jobs=1): err= 0: pid=7193
> write: io=102048KB, bw=1700.8KB/s, iops=189 , runt= 60003msec
> slat (usec): min=21 , max=219834 , avg=5250.91, stdev=5936.55
> clat (usec): min=25 , max=738932 , avg=329339.45, stdev=106004.63
> lat (msec): min=4 , max=751 , avg=334.59, stdev=107.57
> clat percentiles (msec):
Heck, I didn't look at the IOPS figure!
189 IOPS for a SATA-600 SSD. That's pathetic.
So again, please test this without dm_crypt. I can't believe that this
is the maximum the hardware is able to achieve.
A really fast 15000 rpm SAS harddisk might top that.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 11:18 ` Martin Steigerwald
@ 2012-08-02 17:39 ` Marc MERLIN
2012-08-02 20:20 ` Martin Steigerwald
0 siblings, 1 reply; 22+ messages in thread
From: Marc MERLIN @ 2012-08-02 17:39 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs, Fajar A. Nugraha
On Thu, Aug 02, 2012 at 01:18:07PM +0200, Martin Steigerwald wrote:
> > I've run the fio tests in:
> > /dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
>
> … you are still using dm_crypt?
That was my biggest partition and so far I've found no performance impact
on file access between unencrypted and dm_crypt.
I just took out my swap partition and made a smaller btrfs there:
/dev/sda3 /mnt/mnt3 btrfs rw,noatime,ssd,space_cache 0 0
I mounted without discard.
> > lat (usec) : 50=0.01%
> > lat (msec) : 10=0.02%, 20=0.02%, 50=0.05%, 100=0.14%, 250=12.89%
> > lat (msec) : 500=72.44%, 750=14.43%
>
> Gosh, look at these latencies!
>
> 72.44% of all requests above 500 (in words: five hundred) milliseconds!
> And 14.43% above 750 msecs. The percentage of requests served at 100 msecs
> or less was below one percent! Hey, is this an SSD or what?
Yeah, that's kind of what I've been complaining about since the beginning :)
Once I'm reading sequentially, it goes fast, but random access/latency is
indeed abysmal.
> Still, even with iodepth 64, a totally different picture. And look at the IOPS
> and throughput.
Yep. I know mine are bad :(
> For reference, this refers to
>
> [global]
> ioengine=libaio
> direct=1
> iodepth=64
Since it's slightly different from the first job file you gave me, I re-ran
with this one this time.
gandalfthegreat:~# /sbin/mkfs.btrfs -L test /dev/sda2
gandalfthegreat:~# mount -o noatime /dev/sda2 /mnt/mnt2
gandalfthegreat:~# grep sda2 /proc/mounts
/dev/sda2 /mnt/mnt2 btrfs rw,noatime,ssd,space_cache 0 0
here's the btrfs test (ext4 is lower down):
gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
2.0.8
Starting 4 processes
zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [___W] [59.5% done] [0K/1800K /s] [0 /193 iops] [eta 02m:10s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30318
read : io=73682KB, bw=1227.1KB/s, iops=137 , runt= 60004msec
slat (usec): min=3 , max=37432 , avg=7252.52, stdev=5717.70
clat (usec): min=13 , max=981927 , avg=454046.13, stdev=110527.92
lat (msec): min=5 , max=999 , avg=461.30, stdev=112.00
clat percentiles (msec):
| 1.00th=[ 145], 5.00th=[ 269], 10.00th=[ 371], 20.00th=[ 408],
| 30.00th=[ 424], 40.00th=[ 437], 50.00th=[ 449], 60.00th=[ 457],
| 70.00th=[ 474], 80.00th=[ 490], 90.00th=[ 570], 95.00th=[ 644],
| 99.00th=[ 865], 99.50th=[ 922], 99.90th=[ 963], 99.95th=[ 979],
| 99.99th=[ 979]
bw (KB/s) : min= 8, max= 2807, per=100.00%, avg=1227.75, stdev=317.57
lat (usec) : 20=0.01%
lat (msec) : 10=0.01%, 20=0.02%, 50=0.04%, 100=0.46%, 250=3.82%
lat (msec) : 500=79.48%, 750=13.57%, 1000=2.58%
cpu : usr=0.12%, sys=1.13%, ctx=12186, majf=0, minf=276
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=8262/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=30340
read : io=2048.0MB, bw=211257KB/s, iops=23473 , runt= 9927msec
slat (usec): min=1 , max=56321 , avg=20.51, stdev=424.44
clat (usec): min=0 , max=57987 , avg=2695.98, stdev=6624.00
lat (usec): min=1 , max=58015 , avg=2716.75, stdev=6642.09
clat percentiles (usec):
| 1.00th=[ 1], 5.00th=[ 10], 10.00th=[ 30], 20.00th=[ 100],
| 30.00th=[ 217], 40.00th=[ 362], 50.00th=[ 494], 60.00th=[ 636],
| 70.00th=[ 892], 80.00th=[ 1656], 90.00th=[ 7392], 95.00th=[21632],
| 99.00th=[29056], 99.50th=[29568], 99.90th=[43776], 99.95th=[46848],
| 99.99th=[57600]
bw (KB/s) : min=166675, max=260984, per=99.83%, avg=210892.26, stdev=22433.65
lat (usec) : 2=2.16%, 4=0.43%, 10=2.33%, 20=2.80%, 50=5.72%
lat (usec) : 100=6.47%, 250=12.35%, 500=18.29%, 750=15.52%, 1000=5.44%
lat (msec) : 2=13.04%, 4=3.59%, 10=3.83%, 20=1.88%, 50=6.11%
lat (msec) : 100=0.04%
cpu : usr=4.51%, sys=35.70%, ctx=11480, majf=0, minf=278
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=233025/w=0/d=0, short=r=0/w=0/d=0
zufälligschreiben: (groupid=2, jobs=1): err= 0: pid=30348
write: io=110768KB, bw=1845.1KB/s, iops=208 , runt= 60007msec
slat (usec): min=26 , max=50160 , avg=4789.12, stdev=4968.75
clat (usec): min=29 , max=752494 , avg=301858.86, stdev=80422.24
lat (msec): min=12 , max=757 , avg=306.65, stdev=81.52
clat percentiles (msec):
| 1.00th=[ 208], 5.00th=[ 241], 10.00th=[ 245], 20.00th=[ 255],
| 30.00th=[ 262], 40.00th=[ 269], 50.00th=[ 277], 60.00th=[ 289],
| 70.00th=[ 306], 80.00th=[ 326], 90.00th=[ 363], 95.00th=[ 519],
| 99.00th=[ 627], 99.50th=[ 652], 99.90th=[ 734], 99.95th=[ 742],
| 99.99th=[ 750]
bw (KB/s) : min= 616, max= 2688, per=99.72%, avg=1839.80, stdev=399.60
lat (usec) : 50=0.01%
lat (msec) : 20=0.02%, 50=0.04%, 100=0.07%, 250=14.31%, 500=79.62%
lat (msec) : 750=5.91%, 1000=0.01%
cpu : usr=0.28%, sys=2.60%, ctx=10167, majf=0, minf=20
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=12495/d=0, short=r=0/w=0/d=0
sequentiellschreiben: (groupid=3, jobs=1): err= 0: pid=30364
write: io=84902KB, bw=1414.1KB/s, iops=156 , runt= 60005msec
slat (usec): min=18 , max=40825 , avg=6389.99, stdev=6072.56
clat (usec): min=22 , max=887097 , avg=401897.77, stdev=108015.23
lat (msec): min=11 , max=899 , avg=408.29, stdev=109.47
clat percentiles (msec):
| 1.00th=[ 262], 5.00th=[ 302], 10.00th=[ 318], 20.00th=[ 338],
| 30.00th=[ 351], 40.00th=[ 363], 50.00th=[ 375], 60.00th=[ 388],
| 70.00th=[ 404], 80.00th=[ 433], 90.00th=[ 502], 95.00th=[ 693],
| 99.00th=[ 783], 99.50th=[ 824], 99.90th=[ 873], 99.95th=[ 881],
| 99.99th=[ 889]
bw (KB/s) : min= 346, max= 2115, per=99.44%, avg=1406.11, stdev=333.58
lat (usec) : 50=0.01%
lat (msec) : 20=0.02%, 50=0.03%, 100=0.07%, 250=0.59%, 500=89.12%
lat (msec) : 750=7.91%, 1000=2.24%
cpu : usr=0.28%, sys=2.21%, ctx=13224, majf=0, minf=21
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.3%, >=64=99.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=9369/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=73682KB, aggrb=1227KB/s, minb=1227KB/s, maxb=1227KB/s, mint=60004msec, maxt=60004msec
Run status group 1 (all jobs):
READ: io=2048.0MB, aggrb=211257KB/s, minb=211257KB/s, maxb=211257KB/s, mint=9927msec, maxt=9927msec
Run status group 2 (all jobs):
WRITE: io=110768KB, aggrb=1845KB/s, minb=1845KB/s, maxb=1845KB/s, mint=60007msec, maxt=60007msec
Run status group 3 (all jobs):
WRITE: io=84902KB, aggrb=1414KB/s, minb=1414KB/s, maxb=1414KB/s, mint=60005msec, maxt=60005msec
gandalfthegreat:/mnt/mnt2#
(no lines beyond this, fio 2.0.8)
> This could be another good test: test with Ext4 on a plain logical volume
> without dm_crypt.
> Can you also post the last lines:
>
> Disk stats (read/write):
> dm-2: ios=616191/613142, merge=0/0, ticks=1300820/2565384, in_queue=3867448, util=98.81%, aggrios=504829/504643, aggrmerge=111362/111451, aggrticks=1058320/2164664, aggrin_queue=3223048, aggrutil=98.78%
> sda: ios=504829/504643, merge=111362/111451, ticks=1058320/2164664, in_queue=3223048, util=98.78%
> martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/Messungen/merkaba>
I didn't get these lines.
> It gives information on how well the I/O scheduler was able to merge requests.
>
> I didn't see much of a difference between CFQ and noop, so it may not
> matter much, but since it also gives a number for total disk utilization
> it's still quite nice to have.
I tried deadline and noop, and indeed I'm not seeing much of a difference for my basic tests.
For now I have deadline.
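(For reference, switching the elevator at runtime is just a sysfs write;
a sketch, assuming the SSD is sda:)

echo deadline > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler   # active scheduler is shown in brackets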
> So my recommendation as of now:
>
> Remove as many factors as possible, and to compare results with
> what I posted, try a plain logical volume with Ext4.
gandalfthegreat:~# mkfs.ext4 -O extent -b 4096 -E stride=128,stripe-width=128 /dev/sda2
/dev/sda2 /mnt/mnt2 ext4 rw,noatime,stripe=128,data=ordered 0 0
gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
2.0.8
Starting 4 processes
zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [___W] [63.8% done] [0K/2526K /s] [0 /280 iops] [eta 01m:21s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30077
read : io=2048.0MB, bw=276232KB/s, iops=50472 , runt= 7592msec
slat (usec): min=2 , max=2276 , avg= 6.87, stdev=12.01
clat (usec): min=249 , max=52128 , avg=1258.87, stdev=1714.63
lat (usec): min=260 , max=52134 , avg=1266.00, stdev=1715.36
clat percentiles (usec):
| 1.00th=[ 450], 5.00th=[ 548], 10.00th=[ 620], 20.00th=[ 724],
| 30.00th=[ 820], 40.00th=[ 908], 50.00th=[ 1004], 60.00th=[ 1096],
| 70.00th=[ 1208], 80.00th=[ 1368], 90.00th=[ 1640], 95.00th=[ 2040],
| 99.00th=[ 8256], 99.50th=[14912], 99.90th=[21120], 99.95th=[23168],
| 99.99th=[33024]
bw (KB/s) : min=76463, max=385328, per=100.00%, avg=277313.20, stdev=94661.29
lat (usec) : 250=0.01%, 500=2.46%, 750=19.82%, 1000=27.70%
lat (msec) : 2=44.79%, 4=3.55%, 10=0.72%, 20=0.78%, 50=0.17%
lat (msec) : 100=0.01%
cpu : usr=11.91%, sys=51.64%, ctx=91337, majf=0, minf=277
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=383190/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=30081
read : io=2048.0MB, bw=150043KB/s, iops=16641 , runt= 13977msec
slat (usec): min=1 , max=2134 , avg= 7.09, stdev=10.83
clat (usec): min=298 , max=16751 , avg=3836.41, stdev=755.26
lat (usec): min=304 , max=16771 , avg=3843.77, stdev=754.77
clat percentiles (usec):
| 1.00th=[ 2608], 5.00th=[ 2960], 10.00th=[ 3152], 20.00th=[ 3376],
| 30.00th=[ 3536], 40.00th=[ 3664], 50.00th=[ 3792], 60.00th=[ 3920],
| 70.00th=[ 4080], 80.00th=[ 4256], 90.00th=[ 4448], 95.00th=[ 4704],
| 99.00th=[ 5216], 99.50th=[ 5984], 99.90th=[13888], 99.95th=[15296],
| 99.99th=[16512]
bw (KB/s) : min=134280, max=169692, per=100.00%, avg=150111.30, stdev=7227.35
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.07%, 4=64.79%, 10=34.87%, 20=0.25%
cpu : usr=6.81%, sys=20.01%, ctx=77116, majf=0, minf=279
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=232601/w=0/d=0, short=r=0/w=0/d=0
zufälligschreiben: (groupid=2, jobs=1): err= 0: pid=30088
write: io=111828KB, bw=1863.7KB/s, iops=209 , runt= 60006msec
slat (usec): min=7 , max=51044 , avg=4757.60, stdev=4816.78
clat (usec): min=958 , max=582205 , avg=299854.73, stdev=56663.66
lat (msec): min=12 , max=588 , avg=304.61, stdev=57.22
clat percentiles (msec):
| 1.00th=[ 73], 5.00th=[ 225], 10.00th=[ 251], 20.00th=[ 273],
| 30.00th=[ 285], 40.00th=[ 297], 50.00th=[ 306], 60.00th=[ 314],
| 70.00th=[ 318], 80.00th=[ 330], 90.00th=[ 343], 95.00th=[ 359],
| 99.00th=[ 482], 99.50th=[ 519], 99.90th=[ 553], 99.95th=[ 570],
| 99.99th=[ 570]
bw (KB/s) : min= 1033, max= 3430, per=99.67%, avg=1856.90, stdev=307.24
lat (usec) : 1000=0.01%
lat (msec) : 20=0.06%, 50=0.72%, 100=0.44%, 250=8.50%, 500=89.52%
lat (msec) : 750=0.75%
cpu : usr=0.27%, sys=1.27%, ctx=9505, majf=0, minf=21
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=12580/d=0, short=r=0/w=0/d=0
sequentiellschreiben: (groupid=3, jobs=1): err= 0: pid=30102
write: io=137316KB, bw=2288.2KB/s, iops=254 , runt= 60013msec
slat (usec): min=5 , max=40817 , avg=3911.57, stdev=4785.23
clat (msec): min=4 , max=682 , avg=246.61, stdev=77.35
lat (msec): min=5 , max=688 , avg=250.53, stdev=78.39
clat percentiles (msec):
| 1.00th=[ 73], 5.00th=[ 143], 10.00th=[ 186], 20.00th=[ 206],
| 30.00th=[ 221], 40.00th=[ 229], 50.00th=[ 239], 60.00th=[ 245],
| 70.00th=[ 258], 80.00th=[ 273], 90.00th=[ 318], 95.00th=[ 416],
| 99.00th=[ 529], 99.50th=[ 562], 99.90th=[ 660], 99.95th=[ 668],
| 99.99th=[ 676]
bw (KB/s) : min= 811, max= 4243, per=99.65%, avg=2279.92, stdev=610.32
lat (msec) : 10=0.26%, 20=0.03%, 50=0.22%, 100=1.97%, 250=62.01%
lat (msec) : 500=33.43%, 750=2.07%
cpu : usr=0.35%, sys=1.46%, ctx=10993, majf=0, minf=21
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=15293/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=2048.0MB, aggrb=276231KB/s, minb=276231KB/s, maxb=276231KB/s, mint=7592msec, maxt=7592msec
Run status group 1 (all jobs):
READ: io=2048.0MB, aggrb=150043KB/s, minb=150043KB/s, maxb=150043KB/s, mint=13977msec, maxt=13977msec
Run status group 2 (all jobs):
WRITE: io=111828KB, aggrb=1863KB/s, minb=1863KB/s, maxb=1863KB/s, mint=60006msec, maxt=60006msec
Run status group 3 (all jobs):
WRITE: io=137316KB, aggrb=2288KB/s, minb=2288KB/s, maxb=2288KB/s, mint=60013msec, maxt=60013msec
Disk stats (read/write):
sda: ios=505649/30833, merge=110818/7835, ticks=917980/187892, in_queue=1106088, util=98.50%
gandalfthegreat:/mnt/mnt2#
> If the values are still quite slow, I think it's good to ask Linux
> block layer experts – for example by posting on the fio mailing list, where
> people are subscribed who may be able to provide other test
> results – and SSD experts. There might be a Linux block layer mailing
> list, or use libata or fsdevel, I don't know.
I'll try there next. I think after this mail, and your help which is very
much appreciated, it's fair to take things out of this list to stop spamming
the wrong people :)
> Have the IOPS test run on the device itself. That will remove any filesystem
> layer. But run only the read-only tests; to be safe I suggest using fio
> with the --readonly option as a safety guard. Unless you have a spare SSD
> that you can afford to use for write testing, which will likely destroy
> every filesystem on it. Or let it run on just one logical volume.
Can you send me a recommended job config you'd like me to run if the runs
above haven't already answered your questions?
> What does your SSD report as its discard alignment? Well, even that should
> not matter for reads:
>
> merkaba:/sys/block/sda> cat discard_alignment
> 0
gandalfthegreat:/sys/block/sda# cat discard_alignment
0
> Seems Intel tells us not to care at all. Or the value cannot be read
> for some reason.
>
> merkaba:/sys/block/sda> cd queue
> merkaba:/sys/block/sda/queue> grep . *
> add_random:1
> discard_granularity:512
> discard_max_bytes:2147450880
> discard_zeroes_data:1
> hw_sector_size:512
>
> I would be interested in whether the values above differ.
I have the exact same.
> And whether there is any optimal io size.
I also have
optimal_io_size:0
> One note regarding this: back then I set up the Ext4 filesystem I ran the
> above tests on last year to be aligned to what I thought could be good for
> the SSD. I do not know whether this matters much.
>
> merkaba:~> tune2fs -l /dev/merkaba/home
> RAID stripe width: 128
> I think this is a plan to find out whether it's really the hardware
> or some weird behavior in the low-level parts of Linux, e.g. the block
> layer, dm_crypt or the filesystem.
>
> Reduce it to the most basic level and then work from there.
So, I went back to basics.
On Thu, Aug 02, 2012 at 01:25:17PM +0200, Martin Steigerwald wrote:
> Heck, I didn't look at the IOPS figure!
>
> 189 IOPS for a SATA-600 SSD. That's pathetic.
Yeah, and the new tests above without dmcrypt are no better.
> A really fast 15000 rpm SAS harddisk might top that.
My slow 1TB 5400rpm laptop hard drive runs du -s 4x faster than
the SSD right now :)
I read your comments about the multiple things you had in mind. Given the
new data, should I just go to an io list now, or even save myself some time
and just return the damned SSDs and buy from another vendor? :)
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 17:39 ` Marc MERLIN
@ 2012-08-02 20:20 ` Martin Steigerwald
2012-08-02 20:44 ` Marc MERLIN
0 siblings, 1 reply; 22+ messages in thread
From: Martin Steigerwald @ 2012-08-02 20:20 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs, Fajar A. Nugraha
On Thursday, 2 August 2012, Marc MERLIN wrote:
> On Thu, Aug 02, 2012 at 01:18:07PM +0200, Martin Steigerwald wrote:
> > > I've run the fio tests in:
> > > /dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
> >
> > … you are still using dm_crypt?
[…]
> I just took out my swap partition and made a smaller btrfs there:
> /dev/sda3 /mnt/mnt3 btrfs rw,noatime,ssd,space_cache 0 0
>
> I mounted without discard.
[…]
> > For reference, this refers to
> >
> > [global]
> > ioengine=libaio
> > direct=1
> > iodepth=64
>
> Since it's slightly different than the first job file you gave me, I re-ran
> with this one this time.
>
> gandalfthegreat:~# /sbin/mkfs.btrfs -L test /dev/sda2
> gandalfthegreat:~# mount -o noatime /dev/sda2 /mnt/mnt2
> gandalfthegreat:~# grep sda2 /proc/mounts
> /dev/sda2 /mnt/mnt2 btrfs rw,noatime,ssd,space_cache 0 0
>
> here's the btrfs test (ext4 is lower down):
> gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
> zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8
Still abysmal except for sequential reads.
> Starting 4 processes
> zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
> sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
> zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
> sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [___W] [59.5% done] [0K/1800K /s] [0 /193 iops] [eta 02m:10s]
> zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30318
> read : io=73682KB, bw=1227.1KB/s, iops=137 , runt= 60004msec
[…]
> lat (usec) : 20=0.01%
> lat (msec) : 10=0.01%, 20=0.02%, 50=0.04%, 100=0.46%, 250=3.82%
> lat (msec) : 500=79.48%, 750=13.57%, 1000=2.58%
1 second latency? Oh well…
> Run status group 3 (all jobs):
> WRITE: io=84902KB, aggrb=1414KB/s, minb=1414KB/s, maxb=1414KB/s, mint=60005msec, maxt=60005msec
> gandalfthegreat:/mnt/mnt2#
> (no lines beyond this, fio 2.0.8)
[…]
> > Can you also post the last lines:
> >
> > Disk stats (read/write):
> > dm-2: ios=616191/613142, merge=0/0, ticks=1300820/2565384, in_queue=3867448, util=98.81%, aggrios=504829/504643, aggrmerge=111362/111451, aggrticks=1058320/2164664, aggrin_queue=3223048, aggrutil=98.78%
> > sda: ios=504829/504643, merge=111362/111451, ticks=1058320/2164664, in_queue=3223048, util=98.78%
> > martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/Messungen/merkaba>
>
> I didn't get these lines.
Hmmm, back then I guess this was fio 1.57 or so.
> I tried deadline and noop, and indeed I'm not seeing much of a difference for my basic tests.
> For now I have deadline.
>
> > So my recommendation as of now:
> >
> > Remove as many factors as possible, and to compare results with
> > what I posted, try a plain logical volume with Ext4.
>
> gandalfthegreat:~# mkfs.ext4 -O extent -b 4096 -E stride=128,stripe-width=128 /dev/sda2
> /dev/sda2 /mnt/mnt2 ext4 rw,noatime,stripe=128,data=ordered 0 0
>
> gandalfthegreat:/mnt/mnt2# fio ~/fio.job2
> zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
> 2.0.8
> Starting 4 processes
> zufälliglesen: Laying out IO file(s) (1 file(s) / 2048MB)
> sequentielllesen: Laying out IO file(s) (1 file(s) / 2048MB)
> zufälligschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
> sequentiellschreiben: Laying out IO file(s) (1 file(s) / 2048MB)
> Jobs: 1 (f=1): [___W] [63.8% done] [0K/2526K /s] [0 /280 iops] [eta 01m:21s]
> zufälliglesen: (groupid=0, jobs=1): err= 0: pid=30077
> read : io=2048.0MB, bw=276232KB/s, iops=50472 , runt= 7592msec
> slat (usec): min=2 , max=2276 , avg= 6.87, stdev=12.01
> clat (usec): min=249 , max=52128 , avg=1258.87, stdev=1714.63
> lat (usec): min=260 , max=52134 , avg=1266.00, stdev=1715.36
> clat percentiles (usec):
> | 1.00th=[ 450], 5.00th=[ 548], 10.00th=[ 620], 20.00th=[ 724],
> | 30.00th=[ 820], 40.00th=[ 908], 50.00th=[ 1004], 60.00th=[ 1096],
> | 70.00th=[ 1208], 80.00th=[ 1368], 90.00th=[ 1640], 95.00th=[ 2040],
> | 99.00th=[ 8256], 99.50th=[14912], 99.90th=[21120], 99.95th=[23168],
> | 99.99th=[33024]
> bw (KB/s) : min=76463, max=385328, per=100.00%, avg=277313.20, stdev=94661.29
> lat (usec) : 250=0.01%, 500=2.46%, 750=19.82%, 1000=27.70%
> lat (msec) : 2=44.79%, 4=3.55%, 10=0.72%, 20=0.78%, 50=0.17%
> lat (msec) : 100=0.01%
> cpu : usr=11.91%, sys=51.64%, ctx=91337, majf=0, minf=277
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued : total=r=383190/w=0/d=0, short=r=0/w=0/d=0
Hey, what's this? With Ext4 you have really good random read performance
now! Way better than the Intel SSD 320 and…
> sequentielllesen: (groupid=1, jobs=1): err= 0: pid=30081
> read : io=2048.0MB, bw=150043KB/s, iops=16641 , runt= 13977msec
> slat (usec): min=1 , max=2134 , avg= 7.09, stdev=10.83
> clat (usec): min=298 , max=16751 , avg=3836.41, stdev=755.26
> lat (usec): min=304 , max=16771 , avg=3843.77, stdev=754.77
> clat percentiles (usec):
> | 1.00th=[ 2608], 5.00th=[ 2960], 10.00th=[ 3152], 20.00th=[ 3376],
> | 30.00th=[ 3536], 40.00th=[ 3664], 50.00th=[ 3792], 60.00th=[ 3920],
> | 70.00th=[ 4080], 80.00th=[ 4256], 90.00th=[ 4448], 95.00th=[ 4704],
> | 99.00th=[ 5216], 99.50th=[ 5984], 99.90th=[13888], 99.95th=[15296],
> | 99.99th=[16512]
> bw (KB/s) : min=134280, max=169692, per=100.00%, avg=150111.30, stdev=7227.35
> lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
> lat (msec) : 2=0.07%, 4=64.79%, 10=34.87%, 20=0.25%
> cpu : usr=6.81%, sys=20.01%, ctx=77116, majf=0, minf=279
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
> issued : total=r=232601/w=0/d=0, short=r=0/w=0/d=0
… even faster than sequential reads. That just doesn't make any sense
to me.
The performance of your Samsung SSD seems to be quite erratic. It seems
that the device is capable of being fast, but only sometimes shows this
capability.
> zufälligschreiben: (groupid=2, jobs=1): err= 0: pid=30088
> write: io=111828KB, bw=1863.7KB/s, iops=209 , runt= 60006msec
[…]
> sequentiellschreiben: (groupid=3, jobs=1): err= 0: pid=30102
> write: io=137316KB, bw=2288.2KB/s, iops=254 , runt= 60013msec
[…]
> Run status group 0 (all jobs):
> READ: io=2048.0MB, aggrb=276231KB/s, minb=276231KB/s, maxb=276231KB/s, mint=7592msec, maxt=7592msec
>
> Run status group 1 (all jobs):
> READ: io=2048.0MB, aggrb=150043KB/s, minb=150043KB/s, maxb=150043KB/s, mint=13977msec, maxt=13977msec
>
> Run status group 2 (all jobs):
> WRITE: io=111828KB, aggrb=1863KB/s, minb=1863KB/s, maxb=1863KB/s, mint=60006msec, maxt=60006msec
>
> Run status group 3 (all jobs):
> WRITE: io=137316KB, aggrb=2288KB/s, minb=2288KB/s, maxb=2288KB/s, mint=60013msec, maxt=60013msec
Writes are slow again.
> Disk stats (read/write):
> sda: ios=505649/30833, merge=110818/7835, ticks=917980/187892, in_queue=1106088, util=98.50%
Well, there we have that line. You don't seem to have the other line,
possibly because you are not using LVM.
> gandalfthegreat:/mnt/mnt2#
>
> > If the values are still quite slow, I think its good to ask Linux
> > block layer experts – for example by posting on fio mailing list where
> > there are people subscribed that may be able to provide other test
> > results – and SSD experts. There might be a Linux block layer mailing
> > list or use libata or fsdevel, I don´t know.
>
> I'll try there next. I think after this mail, and your help which is very
> much appreciated, it's fair to take things out of this list to stop spamming
> the wrong people :)
Well, the only interesting thing regarding BTRFS is the whopping
difference in random read performance between BTRFS and Ext4 on this
SSD.
> > Have the IOPS test run on the device itself. That will remove any filesystem
> > layer. But run only the read-only tests; to be safe I suggest using fio
> > with the --readonly option as a safety guard. Unless you have a spare SSD
> > that you can afford to use for write testing, which will likely destroy
> > every filesystem on it. Or let it run on just one logical volume.
>
> Can you send me a recommended job config you'd like me to run if the runs
> above haven't already answered your questions?
[global]
ioengine=libaio
direct=1
filename=/dev/sdb
size=476940m
bsrange=2k-16k
[zufälliglesen]
rw=randread
runtime=60
[sequentielllesen]
stonewall
rw=read
runtime=60
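To run it read-only as before, save it, e.g. as iops-read.job (the file
name is just an example):

fio --readonly iops-read.job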
I don't expect much of a difference, but then the random read performance
is quite different between Ext4 and BTRFS on this disk. That would make
it interesting to test without any filesystem in between and over the
whole device.
It might make sense to experiment with different alignments; I think the
option is called blockalign or so…
> On Thu, Aug 02, 2012 at 01:25:17PM +0200, Martin Steigerwald wrote:
> > Heck, I didn't look at the IOPS figure!
> >
> > 189 IOPS for a SATA-600 SSD. That's pathetic.
>
> Yeah, and the new tests above without dmcrypt are no better.
>
> > A really fast 15000 rpm SAS harddisk might top that.
>
> My slow 1TB 5400rpm laptop hard drive runs du -s 4x faster than
> the SSD right now :)
>
> I read your comments about the multiple things you had in mind. Given the
> new data, should I just go to an io list now, or even save myself some time
> and just return the damned SSDs and buy from another vendor? :)
… or get yourself another SSD. It's your decision.
I admire your endurance. ;)
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 20:20 ` Martin Steigerwald
@ 2012-08-02 20:44 ` Marc MERLIN
2012-08-02 21:21 ` Martin Steigerwald
0 siblings, 1 reply; 22+ messages in thread
From: Marc MERLIN @ 2012-08-02 20:44 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs, Fajar A. Nugraha
On Thu, Aug 02, 2012 at 10:20:07PM +0200, Martin Steigerwald wrote:
> Hey, what's this? With Ext4 you have really good random read performance
> now! Way better than the Intel SSD 320 and…
Yep, my du -sh tests do show that ext4 is 2x faster than btrfs.
Obviously it's sending IO in a way that either the IO subsystem, Linux
driver, or drive prefers.
> The performance of your Samsung SSD seems to be quite erratic. It seems
> that the device is capable of being fast, but only sometimes shows this
> capability.
That's indeed exactly what I'm seeing in real life :)
> > > Have the IOPS test run on the device itself. That will remove any filesystem
> > > layer. But run only the read-only tests; to be safe I suggest using fio
> > > with the --readonly option as a safety guard. Unless you have a spare SSD
> > > that you can afford to use for write testing, which will likely destroy
> > > every filesystem on it. Or let it run on just one logical volume.
> >
> > Can you send me a recommended job config you'd like me to run if the runs
> > above haven't already answered your questions?
> [global]
(...)
I used this and just changed filename to /dev/sda. Since I'm reading
from the beginning of the drive, reads have to be aligned.
> I don't expect much of a difference, but then the random read performance
> is quite different between Ext4 and BTRFS on this disk. That would make
> it interesting to test without any filesystem in between and over the
> whole device.
Here is the output:
gandalfthegreat:~# fio --readonly ./fio.job3
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [966K/0K /s] [108 /0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=2172
read : io=59036KB, bw=983.93KB/s, iops=108 , runt= 60002msec
slat (usec): min=5 , max=158 , avg=27.62, stdev=10.64
clat (usec): min=45 , max=27348 , avg=9150.78, stdev=4452.66
lat (usec): min=53 , max=27370 , avg=9179.05, stdev=4454.88
clat percentiles (usec):
| 1.00th=[ 126], 5.00th=[ 235], 10.00th=[ 5216], 20.00th=[ 5920],
| 30.00th=[ 5920], 40.00th=[ 5984], 50.00th=[ 7712], 60.00th=[12480],
| 70.00th=[12608], 80.00th=[12736], 90.00th=[12864], 95.00th=[16768],
| 99.00th=[18560], 99.50th=[18816], 99.90th=[20352], 99.95th=[22656],
| 99.99th=[27264]
bw (KB/s) : min= 423, max= 5776, per=100.00%, avg=986.48, stdev=480.68
lat (usec) : 50=0.11%, 100=0.64%, 250=4.47%, 500=1.65%, 750=0.02%
lat (usec) : 1000=0.02%
lat (msec) : 2=0.06%, 4=0.03%, 10=43.31%, 20=49.51%, 50=0.18%
cpu : usr=0.17%, sys=0.45%, ctx=6534, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=6532/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=2199
read : io=54658KB, bw=932798 B/s, iops=101 , runt= 60002msec
slat (usec): min=5 , max=140 , avg=28.63, stdev= 9.91
clat (usec): min=39 , max=34210 , avg=9799.18, stdev=4471.32
lat (usec): min=45 , max=34228 , avg=9828.50, stdev=4472.06
clat percentiles (usec):
| 1.00th=[ 61], 5.00th=[ 5088], 10.00th=[ 5856], 20.00th=[ 5920],
| 30.00th=[ 5984], 40.00th=[ 6048], 50.00th=[11840], 60.00th=[12608],
| 70.00th=[12608], 80.00th=[12736], 90.00th=[16512], 95.00th=[17536],
| 99.00th=[18816], 99.50th=[19584], 99.90th=[24960], 99.95th=[29568],
| 99.99th=[34048]
bw (KB/s) : min= 405, max= 2680, per=100.00%, avg=912.92, stdev=261.62
lat (usec) : 50=0.41%, 100=1.77%, 250=1.20%, 500=0.23%, 750=0.02%
lat (usec) : 1000=0.03%
lat (msec) : 2=0.02%, 10=43.06%, 20=52.91%, 50=0.36%
cpu : usr=0.15%, sys=0.45%, ctx=6103, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=6101/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=59036KB, aggrb=983KB/s, minb=983KB/s, maxb=983KB/s, mint=60002msec, maxt=60002msec
Run status group 1 (all jobs):
READ: io=54658KB, aggrb=910KB/s, minb=910KB/s, maxb=910KB/s, mint=60002msec, maxt=60002msec
Disk stats (read/write):
sda: ios=12660/2072, merge=5/34, ticks=119452/22496, in_queue=141936, util=99.30%
> … or get yourself another SSD. It's your decision.
>
> I admire your endurance. ;)
Since I've gotten 2 SSDs to make sure I didn't get one bad one, and the
company is getting great reviews for them, I'm now pretty sure that
it's a problem with a Linux driver, which is interesting for us
all to debug :)
If I go buy another brand, the next guy will have the same problems as
me.
But yes, if we figure this out, Samsung owes you and me some money :)
I'll try plugging this SSD into a totally different PC and see what happens.
That may tell us whether it's an AHCI/Intel SATA driver problem.
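(One quick way to see which kernel driver is actually bound to the SATA
controller, for example:)

lspci -nnk | grep -A 3 -i sata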
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 20:44 ` Marc MERLIN
@ 2012-08-02 21:21 ` Martin Steigerwald
2012-08-02 21:49 ` Marc MERLIN
0 siblings, 1 reply; 22+ messages in thread
From: Martin Steigerwald @ 2012-08-02 21:21 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs, Fajar A. Nugraha
On Thursday, 2 August 2012, Marc MERLIN wrote:
> On Thu, Aug 02, 2012 at 10:20:07PM +0200, Martin Steigerwald wrote:
> > Hey, what's this? With Ext4 you have really good random read performance
> > now! Way better than the Intel SSD 320 and…
>
> Yep, my du -sh tests do show that ext4 is 2x faster than btrfs.
> Obviously it's sending IO in a way that either the IO subsystem, Linux
> driver, or drive prefers.
But only on reads.
> > > > Have the IOPS test run on the device itself. That will remove any filesystem
> > > > layer. But run only the read-only tests; to be safe I suggest using fio
> > > > with the --readonly option as a safety guard. Unless you have a spare SSD
> > > > that you can afford to use for write testing, which will likely destroy
> > > > every filesystem on it. Or let it run on just one logical volume.
> > >
> > > Can you send me a recommended job config you'd like me to run if the runs
> > > above haven't already answered your questions?
>
> > [global]
> (...)
>
> I used this and just changed filename to /dev/sda. Since I'm reading
> from the beginning of the drive, reads have to be aligned.
>
> > I don't expect much of a difference, but then the random read performance
> > is quite different between Ext4 and BTRFS on this disk. That would make
> > it interesting to test without any filesystem in between and over the
> > whole device.
>
> Here is the output:
> gandalfthegreat:~# fio --readonly ./fio.job3
> zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
> sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
> 2.0.8
> Starting 2 processes
> Jobs: 1 (f=1): [_R] [66.9% done] [966K/0K /s] [108 /0 iops] [eta 01m:00s]
> zufälliglesen: (groupid=0, jobs=1): err= 0: pid=2172
> read : io=59036KB, bw=983.93KB/s, iops=108 , runt= 60002msec
WTF?
Hey, did you adapt the size= keyword? It seems fio 2.0.8 can do completely
without it.
Also I noticed that I had iodepth 1 in there, to circumvent any in-drive
caching / optimization.
> slat (usec): min=5 , max=158 , avg=27.62, stdev=10.64
> clat (usec): min=45 , max=27348 , avg=9150.78, stdev=4452.66
> lat (usec): min=53 , max=27370 , avg=9179.05, stdev=4454.88
> clat percentiles (usec):
> | 1.00th=[ 126], 5.00th=[ 235], 10.00th=[ 5216], 20.00th=[ 5920],
> | 30.00th=[ 5920], 40.00th=[ 5984], 50.00th=[ 7712], 60.00th=[12480],
> | 70.00th=[12608], 80.00th=[12736], 90.00th=[12864], 95.00th=[16768],
> | 99.00th=[18560], 99.50th=[18816], 99.90th=[20352], 99.95th=[22656],
> | 99.99th=[27264]
> bw (KB/s) : min= 423, max= 5776, per=100.00%, avg=986.48, stdev=480.68
> lat (usec) : 50=0.11%, 100=0.64%, 250=4.47%, 500=1.65%, 750=0.02%
> lat (usec) : 1000=0.02%
> lat (msec) : 2=0.06%, 4=0.03%, 10=43.31%, 20=49.51%, 50=0.18%
> cpu : usr=0.17%, sys=0.45%, ctx=6534, majf=0, minf=26
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued : total=r=6532/w=0/d=0, short=r=0/w=0/d=0
Latency is still way too high even with iodepth 1: 10 milliseconds for
43% of requests. And the throughput and IOPS are still abysmal even
for iodepth 1 (see below for the Intel SSD 320 values).
Okay, one further idea: remove the bsrange to just test with 4k blocks.
Additionally, test that these are aligned, with
blockalign=int[,int], ba=int[,int]
At what boundary to align random IO offsets. Defaults to
the same as 'blocksize' the minimum blocksize given.
Minimum alignment is typically 512b for using direct IO,
though it usually depends on the hardware block size.
This option is mutually exclusive with using a random
map for files, so it will turn off that option.
I would first test with 4k blocks as is. And then do something like:
blocksize=4k
blockalign=4k
And then raise blockalign to some values that may matter, like 8k, 128k,
512k, 1m or so.
But that's just guesswork. I do not even know exactly whether it works this
way in fio.
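A sketch of what such an alignment experiment could look like as a job
file; untested, the device name is an example, and blockalign is the
value to vary between runs:

[global]
ioengine=libaio
direct=1
filename=/dev/sda
blocksize=4k
runtime=60

[ausgerichtetlesen]
rw=randread
blockalign=4k

Run it with fio --readonly as before, and raise blockalign step by step.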
There is something pretty weird going on. But I am not sure what it is.
Maybe an alignment issue, since Ext4 with stripe alignment was able to
read so much faster.
> sequentielllesen: (groupid=1, jobs=1): err= 0: pid=2199
> read : io=54658KB, bw=932798 B/s, iops=101 , runt= 60002msec
Hey, what's this?
> slat (usec): min=5 , max=140 , avg=28.63, stdev= 9.91
> clat (usec): min=39 , max=34210 , avg=9799.18, stdev=4471.32
> lat (usec): min=45 , max=34228 , avg=9828.50, stdev=4472.06
> clat percentiles (usec):
> | 1.00th=[ 61], 5.00th=[ 5088], 10.00th=[ 5856], 20.00th=[ 5920],
> | 30.00th=[ 5984], 40.00th=[ 6048], 50.00th=[11840], 60.00th=[12608],
> | 70.00th=[12608], 80.00th=[12736], 90.00th=[16512], 95.00th=[17536],
> | 99.00th=[18816], 99.50th=[19584], 99.90th=[24960], 99.95th=[29568],
> | 99.99th=[34048]
> bw (KB/s) : min= 405, max= 2680, per=100.00%, avg=912.92, stdev=261.62
> lat (usec) : 50=0.41%, 100=1.77%, 250=1.20%, 500=0.23%, 750=0.02%
> lat (usec) : 1000=0.03%
> lat (msec) : 2=0.02%, 10=43.06%, 20=52.91%, 50=0.36%
> cpu : usr=0.15%, sys=0.45%, ctx=6103, majf=0, minf=28
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued : total=r=6101/w=0/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
> READ: io=59036KB, aggrb=983KB/s, minb=983KB/s, maxb=983KB/s, mint=60002msec, maxt=60002msec
>
> Run status group 1 (all jobs):
> READ: io=54658KB, aggrb=910KB/s, minb=910KB/s, maxb=910KB/s, mint=60002msec, maxt=60002msec
>
> Disk stats (read/write):
> sda: ios=12660/2072, merge=5/34, ticks=119452/22496, in_queue=141936, util=99.30%
What on earth is this?
You are testing the raw device and are getting these fun values out of it?
Let me repeat that here:
merkaba:/tmp> cat iops-read.job
[global]
ioengine=libaio
direct=1
filename=/dev/sda
bsrange=2k-16k
[zufälliglesen]
rw=randread
runtime=60
[sequentielllesen]
stonewall
rw=read
runtime=60
merkaba:/tmp> fio iops-read.job
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [57784K/0K /s] [6471 /0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=31915
read : io=1681.2MB, bw=28692KB/s, iops=3193 , runt= 60001msec
slat (usec): min=5 , max=991 , avg=31.18, stdev=10.07
clat (usec): min=2 , max=3312 , avg=276.67, stdev=124.41
lat (usec): min=48 , max=3337 , avg=308.54, stdev=125.51
clat percentiles (usec):
| 1.00th=[ 61], 5.00th=[ 82], 10.00th=[ 103], 20.00th=[ 159],
| 30.00th=[ 201], 40.00th=[ 237], 50.00th=[ 270], 60.00th=[ 306],
| 70.00th=[ 354], 80.00th=[ 402], 90.00th=[ 446], 95.00th=[ 478],
| 99.00th=[ 524], 99.50th=[ 548], 99.90th=[ 644], 99.95th=[ 716],
| 99.99th=[ 1020]
bw (KB/s) : min=27760, max=31784, per=100.00%, avg=28694.49, stdev=625.77
lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.15%, 100=8.76%
lat (usec) : 250=34.69%, 500=54.13%, 750=2.23%, 1000=0.03%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=2.91%, sys=12.49%, ctx=199678, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=191587/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=31921
read : io=3895.9MB, bw=66487KB/s, iops=7384 , runt= 60001msec
slat (usec): min=5 , max=518 , avg=30.18, stdev= 8.16
clat (usec): min=1 , max=3394 , avg=100.21, stdev=83.51
lat (usec): min=44 , max=3429 , avg=131.69, stdev=84.44
clat percentiles (usec):
| 1.00th=[ 54], 5.00th=[ 60], 10.00th=[ 61], 20.00th=[ 68],
| 30.00th=[ 72], 40.00th=[ 79], 50.00th=[ 86], 60.00th=[ 92],
| 70.00th=[ 99], 80.00th=[ 105], 90.00th=[ 112], 95.00th=[ 163],
| 99.00th=[ 556], 99.50th=[ 716], 99.90th=[ 900], 99.95th=[ 940],
| 99.99th=[ 1080]
bw (KB/s) : min=30276, max=81176, per=99.95%, avg=66453.19, stdev=10893.90
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.57%
lat (usec) : 100=69.52%, 250=26.59%, 500=2.01%, 750=0.89%, 1000=0.39%
lat (msec) : 2=0.02%, 4=0.01%
cpu : usr=7.12%, sys=27.34%, ctx=461537, majf=0, minf=27
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=443106/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1681.2MB, aggrb=28691KB/s, minb=28691KB/s, maxb=28691KB/s, mint=60001msec, maxt=60001msec
Run status group 1 (all jobs):
READ: io=3895.9MB, aggrb=66487KB/s, minb=66487KB/s, maxb=66487KB/s, mint=60001msec, maxt=60001msec
Disk stats (read/write):
sda: ios=632851/228, merge=0/140, ticks=106392/351, in_queue=105682, util=87.78%
I have seen about 4000 IOPS with pure 4k blocks from that drive, so this
seems to be fine. And look at the latencies: barely any requests even in
the low millisecond range.
Now just for the sake of it with iodepth 64:
merkaba:/tmp> fio iops-read.job
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [217.8M/0K /s] [24.1K/0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=31945
read : io=12412MB, bw=211819KB/s, iops=23795 , runt= 60003msec
slat (usec): min=2 , max=2158 , avg=15.69, stdev= 8.96
clat (usec): min=138 , max=19349 , avg=2670.01, stdev=861.85
lat (usec): min=189 , max=19366 , avg=2686.29, stdev=861.67
clat percentiles (usec):
| 1.00th=[ 1256], 5.00th=[ 1448], 10.00th=[ 1592], 20.00th=[ 1864],
| 30.00th=[ 2128], 40.00th=[ 2384], 50.00th=[ 2640], 60.00th=[ 2896],
| 70.00th=[ 3152], 80.00th=[ 3408], 90.00th=[ 3728], 95.00th=[ 3952],
| 99.00th=[ 4448], 99.50th=[ 4896], 99.90th=[ 8768], 99.95th=[10048],
| 99.99th=[12096]
bw (KB/s) : min=116464, max=216400, per=100.00%, avg=211823.16, stdev=9787.89
lat (usec) : 250=0.01%, 500=0.01%, 750=0.02%, 1000=0.05%
lat (msec) : 2=25.18%, 4=70.40%, 10=4.28%, 20=0.05%
cpu : usr=14.39%, sys=44.38%, ctx=624900, majf=0, minf=278
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=1427811/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=32301
read : io=12499MB, bw=213307KB/s, iops=23691 , runt= 60002msec
slat (usec): min=1 , max=1509 , avg=12.42, stdev= 8.12
clat (usec): min=306 , max=201523 , avg=2685.75, stdev=1220.14
lat (usec): min=316 , max=201536 , avg=2698.70, stdev=1220.06
clat percentiles (usec):
| 1.00th=[ 1816], 5.00th=[ 2024], 10.00th=[ 2096], 20.00th=[ 2192],
| 30.00th=[ 2256], 40.00th=[ 2352], 50.00th=[ 2416], 60.00th=[ 2544],
| 70.00th=[ 2736], 80.00th=[ 2992], 90.00th=[ 3632], 95.00th=[ 4320],
| 99.00th=[ 5664], 99.50th=[ 6112], 99.90th=[ 7136], 99.95th=[ 7712],
| 99.99th=[26240]
bw (KB/s) : min=144720, max=256568, per=99.96%, avg=213210.08, stdev=23175.58
lat (usec) : 500=0.01%, 750=0.02%, 1000=0.06%
lat (msec) : 2=4.10%, 4=88.72%, 10=7.09%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=11.77%, sys=34.79%, ctx=440558, majf=0, minf=280
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=1421534/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=12412MB, aggrb=211818KB/s, minb=211818KB/s, maxb=211818KB/s, mint=60003msec, maxt=60003msec
Run status group 1 (all jobs):
READ: io=12499MB, aggrb=213306KB/s, minb=213306KB/s, maxb=213306KB/s, mint=60002msec, maxt=60002msec
Disk stats (read/write):
sda: ios=2151624/243, merge=692022/115, ticks=5719203/651132, in_queue=6372962, util=99.87%
(It seems that extra line is gone from fio 2; I dimly remember that
Jens Axboe removed some output he found to be superfluous, but the
disk stats are still there.)
In that case random versus sequential does not even seem to make much
of a difference.
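(The numbers are also internally consistent: with 64 requests in
flight and an average completion latency of about 2.69 ms, Little's
law gives 64 / 0.00269 s ≈ 23,800 IOPS – almost exactly the iops=23795
fio reports for the random job.)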
> > … or get yourself another SSD. It's your decision.
> >
> > I admire your endurance. ;)
>
> Since I've gotten 2 SSDs to make sure I didn't get one bad one, and since the
> company is getting great reviews for them, I'm now pretty sure that
> it's a problem with a Linux driver, which is interesting for us
> all to debug :)
Either that, or possibly there is some firmware update? But then those
reviews would mention it, I bet.
I'd still check whether there is some firmware update available labeled:
"This will raise performance on lots of Linux based workloads by factors
of 10x to 250x" ;)
Anyway, I think we are now pretty close to the hardware.
So the issue has to be somewhere in the block layer, the controller,
the SSD firmware, or the SSD itself, IMHO.
> If I go buy another brand, the next guy will have the same problems as
> me.
>
> But yes, if we figure this out, Samsung owes you and me some money :)
;)
> I'll try plugging this SSD into a totally different PC and see what happens.
> This may tell us whether it's an AHCI/Intel SATA driver problem.
Seems we will continue until someone starts to complain here. Maybe
another list would be more appropriate? But then this thread has it all
in one place ;). Adding a CC with some introductory note might be
appropriate. It's your problem, so you decide ;). I'd suggest the fio
mailing list; there are other performance people there who may want to
chime in.
Another idea: is there anything funny in the SMART values (smartctl -a
and possibly even -x)? Well, I gather these SSDs are new, but still.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 21:21 ` Martin Steigerwald
@ 2012-08-02 21:49 ` Marc MERLIN
2012-08-03 18:45 ` Martin Steigerwald
0 siblings, 1 reply; 22+ messages in thread
From: Marc MERLIN @ 2012-08-02 21:49 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs, Fajar A. Nugraha
On Thu, Aug 02, 2012 at 11:21:51PM +0200, Martin Steigerwald wrote:
> > Yep, my du -sh tests do show that ext4 is 2x faster than btrfs.
> > Obviously it's sending IO in a way that either the IO subsystem, Linux
> > driver, or drive prefers.
>
> But only on reads.
Yes. I was focusing on the simple case. SSDs can be slower than hard
drives on writes, and now you're getting into deep firmware magic for
corner cases, so I didn't want to touch that here :)
> WTF?
>
> Hey, did you adapt the size= keyword? It seems fio 2.0.8 can do
> without it completely.
I left it as is:
[global]
ioengine=libaio
direct=1
filename=/dev/sda
size=476940m
bsrange=2k-16k
[zufälliglesen]
rw=randread
runtime=60
[sequentielllesen]
stonewall
rw=read
runtime=60
I just removed it now.
> Okay, one further idea: Remove the bsrange to just test with 4k blocks.
Ok, there you go:
[global]
ioengine=libaio
direct=1
filename=/dev/sda
blocksize=4k
blockalign=4k
[zufälliglesen]
rw=randread
runtime=60
[sequentielllesen]
stonewall
rw=read
runtime=60
gandalfthegreat:~# fio --readonly ./fio.job4
zufälliglesen: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [1476K/0K /s] [369 /0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=3614
read : io=60832KB, bw=1013.7KB/s, iops=253 , runt= 60012msec
slat (usec): min=4 , max=892 , avg=24.36, stdev=16.05
clat (usec): min=2 , max=27575 , avg=3915.39, stdev=4990.69
lat (usec): min=44 , max=27591 , avg=3940.35, stdev=4991.99
clat percentiles (usec):
| 1.00th=[ 41], 5.00th=[ 46], 10.00th=[ 51], 20.00th=[ 66],
| 30.00th=[ 113], 40.00th=[ 163], 50.00th=[ 205], 60.00th=[ 4960],
| 70.00th=[ 5856], 80.00th=[11584], 90.00th=[12480], 95.00th=[12608],
| 99.00th=[14272], 99.50th=[17280], 99.90th=[18816], 99.95th=[21120],
| 99.99th=[22912]
bw (KB/s) : min= 253, max= 2520, per=100.00%, avg=1014.86, stdev=364.74
lat (usec) : 4=0.05%, 10=0.01%, 20=0.01%, 50=8.07%, 100=19.94%
lat (usec) : 250=26.26%, 500=2.78%, 750=0.30%, 1000=0.09%
lat (msec) : 2=0.12%, 4=0.07%, 10=20.75%, 20=21.51%, 50=0.05%
cpu : usr=0.31%, sys=0.88%, ctx=15250, majf=0, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=15208/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=3636
read : io=88468KB, bw=1474.3KB/s, iops=368 , runt= 60009msec
slat (usec): min=3 , max=340 , avg=16.46, stdev=10.23
clat (usec): min=1 , max=27356 , avg=2692.99, stdev=4564.76
lat (usec): min=30 , max=27370 , avg=2709.87, stdev=4566.31
clat percentiles (usec):
| 1.00th=[ 27], 5.00th=[ 29], 10.00th=[ 31], 20.00th=[ 35],
| 30.00th=[ 47], 40.00th=[ 61], 50.00th=[ 78], 60.00th=[ 107],
| 70.00th=[ 270], 80.00th=[ 5856], 90.00th=[11712], 95.00th=[12608],
| 99.00th=[16512], 99.50th=[17536], 99.90th=[18816], 99.95th=[19584],
| 99.99th=[21120]
bw (KB/s) : min= 282, max= 5096, per=100.00%, avg=1474.40, stdev=899.36
lat (usec) : 2=0.05%, 4=0.01%, 20=0.02%, 50=32.45%, 100=24.61%
lat (usec) : 250=12.52%, 500=1.40%, 750=0.03%, 1000=0.03%
lat (msec) : 2=0.02%, 4=0.03%, 10=13.95%, 20=14.85%, 50=0.03%
cpu : usr=0.33%, sys=0.90%, ctx=22176, majf=0, minf=25
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=22117/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=60832KB, aggrb=1013KB/s, minb=1013KB/s, maxb=1013KB/s, mint=60012msec, maxt=60012msec
Run status group 1 (all jobs):
READ: io=88468KB, aggrb=1474KB/s, minb=1474KB/s, maxb=1474KB/s, mint=60009msec, maxt=60009msec
Disk stats (read/write):
sda: ios=54013/1836, merge=291/17, ticks=251660/11052, in_queue=262668, util=99.51%
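(Note how completely latency-bound this is at iodepth=1: the random
job's average completion latency is ~3.94 ms, and 1 / 0.00394 s ≈ 254
requests per second – matching the reported iops=253 almost exactly.
Each request has to wait for the previous one to finish.)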
and the same test with blockalign=512k:
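(Same job file as above, with only the alignment line in [global]
changed:

blockalign=512k
)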
gandalfthegreat:~# fio --readonly ./fio.job4 # blockalign=512k
fio: Any use of blockalign= turns off randommap
zufälliglesen: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=1
2.0.8
Starting 2 processes
Jobs: 1 (f=1): [_R] [66.9% done] [4180K/0K /s] [1045 /0 iops] [eta 01m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=3679
read : io=56096KB, bw=957355 B/s, iops=233 , runt= 60001msec
slat (usec): min=4 , max=344 , avg=27.73, stdev=15.51
clat (usec): min=2 , max=38954 , avg=4244.27, stdev=5153.32
lat (usec): min=44 , max=38980 , avg=4272.70, stdev=5154.68
clat percentiles (usec):
| 1.00th=[ 42], 5.00th=[ 46], 10.00th=[ 53], 20.00th=[ 69],
| 30.00th=[ 120], 40.00th=[ 169], 50.00th=[ 211], 60.00th=[ 5024],
| 70.00th=[ 5920], 80.00th=[11584], 90.00th=[12480], 95.00th=[12608],
| 99.00th=[16768], 99.50th=[17536], 99.90th=[18560], 99.95th=[20096],
| 99.99th=[37632]
bw (KB/s) : min= 263, max= 3133, per=100.00%, avg=935.09, stdev=392.66
lat (usec) : 4=0.05%, 10=0.03%, 20=0.01%, 50=7.84%, 100=19.12%
lat (usec) : 250=25.84%, 500=1.61%, 750=0.08%, 1000=0.05%
lat (msec) : 2=0.06%, 4=0.08%, 10=21.99%, 20=23.17%, 50=0.06%
cpu : usr=0.29%, sys=0.94%, ctx=14069, majf=0, minf=25
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=14024/w=0/d=0, short=r=0/w=0/d=0
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=3703
read : io=99592KB, bw=1659.7KB/s, iops=414 , runt= 60008msec
slat (usec): min=4 , max=186 , avg=15.83, stdev= 9.95
clat (usec): min=1 , max=25530 , avg=2390.79, stdev=4340.79
lat (usec): min=28 , max=25539 , avg=2407.01, stdev=4342.90
clat percentiles (usec):
| 1.00th=[ 26], 5.00th=[ 28], 10.00th=[ 30], 20.00th=[ 34],
| 30.00th=[ 43], 40.00th=[ 57], 50.00th=[ 72], 60.00th=[ 91],
| 70.00th=[ 155], 80.00th=[ 5152], 90.00th=[11712], 95.00th=[12480],
| 99.00th=[12992], 99.50th=[16768], 99.90th=[18560], 99.95th=[18816],
| 99.99th=[21632]
bw (KB/s) : min= 269, max= 8465, per=100.00%, avg=1663.84, stdev=1367.82
lat (usec) : 2=0.01%, 4=0.02%, 20=0.01%, 50=35.42%, 100=26.00%
lat (usec) : 250=11.29%, 500=1.31%, 750=0.05%, 1000=0.01%
lat (msec) : 2=0.04%, 4=0.02%, 10=12.55%, 20=13.25%, 50=0.01%
cpu : usr=0.31%, sys=0.99%, ctx=24963, majf=0, minf=25
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=24898/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=56096KB, aggrb=934KB/s, minb=934KB/s, maxb=934KB/s, mint=60001msec, maxt=60001msec
Run status group 1 (all jobs):
READ: io=99592KB, aggrb=1659KB/s, minb=1659KB/s, maxb=1659KB/s, mint=60008msec, maxt=60008msec
Disk stats (read/write):
sda: ios=55021/1637, merge=72/39, ticks=240896/12080, in_queue=252912, util=99.59%
> > Since I've gotten 2 SSDs to make sure I didn't get one bad one, and since the
> > company is getting great reviews for them, I'm now pretty sure that
> > it's a problem with a Linux driver, which is interesting for us
> > all to debug :)
>
> Either that, or possibly there is some firmware update? But then those
> reviews would mention it, I bet.
So far, there are no firmware updates for mine; I checked right after
unboxing it :)
> > I'll try plugging this SSD into a totally different PC and see what happens.
> > This may tell us whether it's an AHCI/Intel SATA driver problem.
>
> Seems we will continue until someone starts to complain here. Maybe
> another list would be more appropriate? But then this thread has it all
> in one place ;). Adding a CC with some introductory note might be
> appropriate. It's your problem, so you decide ;). I'd suggest the fio
> mailing list; there are other performance people there who may want to
> chime in.
Actually you know the lists and people involved better than I do. I'd be
happy if you added a Cc to another list you think is best, and we can
move there.
> Another idea: is there anything funny in the SMART values (smartctl -a
> and possibly even -x)? Well, I gather these SSDs are new, but still.
smartctl -a /dev/sda
(snipped)
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 36
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 7
177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 9
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 058 040 000 Old_age Always - 42
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 253 253 000 Old_age Always - 0
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 4
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 433339112
smartctl -x /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.5.0-amd64-preempt-noide-20120410] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: SAMSUNG SSD 830 Series
Serial Number: S0WGNEAC400484
LU WWN Device Id: 5 002538 043584d30
Firmware Version: CXM03B1Q
User Capacity: 512,110,190,592 bytes [512 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ACS-2 revision 2
Local Time is: Thu Aug 2 14:42:08 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x83) Offline data collection activity
is in a Reserved state.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 1980) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 33) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
9 Power_On_Hours -O--CK 099 099 000 - 36
12 Power_Cycle_Count -O--CK 099 099 000 - 7
177 Wear_Leveling_Count PO--C- 099 099 000 - 9
179 Used_Rsvd_Blk_Cnt_Tot PO--C- 100 100 010 - 0
181 Program_Fail_Cnt_Total -O--CK 100 100 010 - 0
182 Erase_Fail_Count_Total -O--CK 100 100 010 - 0
183 Runtime_Bad_Block PO--C- 100 100 010 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
190 Airflow_Temperature_Cel -O--CK 058 040 000 - 42
195 Hardware_ECC_Recovered -O-RC- 200 200 000 - 0
199 UDMA_CRC_Error_Count -OSRCK 253 253 000 - 0
235 Unknown_Attribute -O--C- 099 099 000 - 4
241 Total_LBAs_Written -O--CK 099 099 000 - 433352008
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
GP/S Log at address 0x00 has 1 sectors [Log Directory]
GP/S Log at address 0x01 has 1 sectors [Summary SMART error log]
GP/S Log at address 0x02 has 2 sectors [Comprehensive SMART error log]
GP/S Log at address 0x03 has 2 sectors [Ext. Comprehensive SMART error log]
GP/S Log at address 0x06 has 1 sectors [SMART self-test log]
GP/S Log at address 0x07 has 2 sectors [Extended self-test log]
GP/S Log at address 0x09 has 1 sectors [Selective self-test log]
GP/S Log at address 0x10 has 1 sectors [NCQ Command Error]
GP/S Log at address 0x11 has 1 sectors [SATA Phy Event Counters]
GP/S Log at address 0x80 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x81 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x82 has 16 sectors [Host vendor specific log]
GP/S Log at address 0x83 has 16 sectors [Host vendor specific log]
(snip)
SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (2 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 27 -
# 2 Short offline Completed without error 00% 8 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Warning: device does not support SCT Commands
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 49152 R_ERR response for data FIS
0x0003 2 53248 R_ERR response for device-to-host data FIS
0x0004 2 57344 R_ERR response for host-to-device data FIS
0x0005 2 7952 R_ERR response for non-data FIS
0x0006 2 2562 R_ERR response for device-to-host non-data FIS
0x0007 2 4096 R_ERR response for host-to-device non-data FIS
0x0008 2 7952 Device-to-host non-data FIS retries
0x0009 2 7953 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 25 Device-to-host register FISes sent due to a COMRESET
0x000b 2 6145 CRC errors within host-to-device FIS
0x000d 2 7953 Non-CRC errors within host-to-device FIS
0x000f 2 8176 R_ERR response for host-to-device data FIS, CRC
0x0010 2 15 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 49153 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 12112 R_ERR response for host-to-device non-data FIS, non-CRC
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-02 21:49 ` Marc MERLIN
@ 2012-08-03 18:45 ` Martin Steigerwald
2012-08-16 7:45 ` Marc MERLIN
0 siblings, 1 reply; 22+ messages in thread
From: Martin Steigerwald @ 2012-08-03 18:45 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs, Fajar A. Nugraha
On Thursday, 2 August 2012, Marc MERLIN wrote:
> > > I'll try plugging this SSD into a totally different PC and see what
> > > happens. This may tell us whether it's an AHCI/Intel SATA driver problem.
> >
> >
> >
> > Seems we will continue until someone starts to complain here. Maybe
> > another list would be more appropriate? But then this thread has it all
> > in one place ;). Adding a CC with some introductory note might be
> > appropriate. It's your problem, so you decide ;). I'd suggest the fio
> > mailing list; there are other performance people there who may want to
> > chime in.
>
>
> Actually you know the lists and people involved better than I do. I'd be
> happy if you added a Cc to another list you think is best, and we can
> move there.
I have no more ideas for the moment.
I suggest you try the fio mailing list at fio (at) vger.kernel.org.
I currently have no ideas for other lists. I bet there is some block layer
/ libata / scsi list that might match. Just browse the lists on
kernel.org and pick the one that sounds most suitable to you¹. Probably
not fsdevel, since the thing does not seem to be filesystem related beyond
some alignment effect (stripe in Ext4). Hmmm, probably linux-scsi, but
look there first to see whether block layer and libata related things are
discussed there.
I would start a new thread. State the problem, summarize what you have
tried and what the outcome was, and ask for advice. Add a link to
the original thread here. (It might be a good idea to have one more post
here with a link to the new thread so that other curious people can follow
along – in case there are any. Otherwise I would end it on this thread.
The thread view already looks ridiculous in KMail here ;-))
And put me on Cc ;). I'd really like to know what's going on here. As
written, I have no more ideas right now. It seems to be getting too
low-level for me.
[1] http://vger.kernel.org/vger-lists.html
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: How can btrfs take 23sec to stat 23K files from an SSD?
2012-08-03 18:45 ` Martin Steigerwald
@ 2012-08-16 7:45 ` Marc MERLIN
0 siblings, 0 replies; 22+ messages in thread
From: Marc MERLIN @ 2012-08-16 7:45 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs, Fajar A. Nugraha
To close this thread.
I gave up on the drive after it showed abysmal benchmarking results in
Windows too.
I owed everyone an update, which I just finished typing:
http://marc.merlins.org/perso/linux/post_2012-08-15_The-tale-of-SSDs_-Crucial-C300-early-Death_-Samsung-830-extreme-random-IO-slowness_-and-settling-with-OCZ-Vertex-4.html
I'm not sure how I could have gotten 2 bad drives from Samsung in 2
different shipments, so I'm afraid the entire line may be bad. At least, it
was for me after extensive benchmarking, and even when using their own
Windows benchmarking tool.
In the end, I got an OCZ Vertex 4 and it's superfast, as per the benchmarks I
posted in the link above.
What is interesting is that ext4 is somewhat faster than btrfs in the tests
I posted above, but not in ways that should be truly worrisome.
I'm still happy to have COW and snapshots, even if that costs me
performance.
Sorry for spamming the list with what apparently just ended up being a
crappy drive (well, 2, since I got two just to make sure) from a vendor
who shouldn't have been selling crappy SSDs.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2012-08-16 7:45 UTC | newest]
Thread overview: 22+ messages
2012-07-24 6:18 How can btrfs take 23sec to stat 23K files from an SSD? Marc MERLIN
-- strict thread matches above, loose matches on Subject: below --
2012-02-02 12:42 brtfs on top of dmcrypt with SSD. No corruption iff write cache off? Chris Mason
2012-07-18 22:04 ` brtfs on top of dmcrypt with SSD -> Trim or no Trim Marc MERLIN
2012-07-22 18:58 ` brtfs on top of dmcrypt with SSD -> ssd or nossd + crypt performance? Marc MERLIN
2012-07-22 19:35 ` Martin Steigerwald
2012-07-22 20:44 ` Marc MERLIN
2012-07-22 22:41 ` brtfs on top of dmcrypt with SSD -> file access 5x slower than spinning disk Marc MERLIN
2012-07-23 6:42 ` How can btrfs take 23sec to stat 23K files from an SSD? Marc MERLIN
2012-07-24 7:56 ` Martin Steigerwald
2012-07-27 4:40 ` Marc MERLIN
2012-07-27 11:08 ` Chris Mason
2012-07-27 18:42 ` Marc MERLIN
[not found] ` <20120801053042.GG12695@merlins.org>
2012-08-01 6:01 ` Marc MERLIN
2012-08-01 6:08 ` Fajar A. Nugraha
2012-08-01 6:21 ` Marc MERLIN
2012-08-01 21:57 ` Martin Steigerwald
2012-08-02 5:07 ` Marc MERLIN
2012-08-02 11:18 ` Martin Steigerwald
2012-08-02 17:39 ` Marc MERLIN
2012-08-02 20:20 ` Martin Steigerwald
2012-08-02 20:44 ` Marc MERLIN
2012-08-02 21:21 ` Martin Steigerwald
2012-08-02 21:49 ` Marc MERLIN
2012-08-03 18:45 ` Martin Steigerwald
2012-08-16 7:45 ` Marc MERLIN
2012-08-02 11:25 ` Martin Steigerwald
2012-08-01 6:36 ` Chris Samuel
2012-08-01 6:40 ` Marc MERLIN