From: Marc MERLIN <marc@merlins.org>
To: Martin Steigerwald <Martin@lichtvoll.de>
Cc: linux-btrfs@vger.kernel.org, "Fajar A. Nugraha" <list@fajar.net>
Subject: Re: How can btrfs take 23sec to stat 23K files from an SSD?
Date: Wed, 1 Aug 2012 22:07:41 -0700 [thread overview]
Message-ID: <20120802050741.GA22772@merlins.org> (raw)
In-Reply-To: <201208012357.39786.Martin@lichtvoll.de>
On Wed, Aug 01, 2012 at 11:57:39PM +0200, Martin Steigerwald wrote:
> Its getting quite strange.
I would agree :)
Before I paste a bunch of thing, I wanted to thank you for not giving up on me
and offering your time to help me figure this out :)
> I lost track of whether you did that already or not, but if you didn´t
> please post some
>
> vmstat 1
> iostat -xd 1
> on the device while it is being slow.
Sure thing, here's the 24 second du -s run:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 1 0 2747264 44 348388 0 0 28 50 242 184 19 6 74 1
1 0 0 2744128 44 351700 0 0 144 0 2758 32115 30 5 61 4
2 1 0 2743100 44 351992 0 0 792 0 2616 30613 28 4 50 18
1 1 0 2741592 44 352668 0 0 776 0 2574 31551 29 4 45 21
1 1 0 2740720 44 353432 0 0 692 0 2734 32891 30 4 45 22
1 1 0 2740104 44 354284 0 0 460 0 2639 31585 30 4 45 21
3 1 0 2738520 44 354692 0 0 544 264 2834 30302 32 5 42 21
1 1 0 2736936 44 355476 0 0 1064 2012 2867 31172 28 4 45 23
3 1 0 2734892 44 356840 0 0 1236 0 2865 31100 30 3 46 21
3 1 0 2733840 44 357432 0 0 636 12 2930 31362 30 4 45 21
3 1 0 2735152 44 357892 0 0 612 0 2546 31641 28 4 46 22
3 1 0 2733608 44 358680 0 0 668 0 2846 29747 30 3 46 22
2 1 0 2731480 44 359556 0 0 712 248 2744 27939 31 3 44 22
2 1 0 2732688 44 360736 0 0 1204 0 2774 30979 30 3 46 21
4 1 0 2730276 44 361872 0 0 872 256 3174 31657 30 5 45 20
1 1 0 2728372 44 363008 0 0 912 132 2752 31811 30 3 46 22
1 1 0 2727060 44 363240 0 0 476 0 2595 30313 33 4 43 21
1 1 0 2725856 44 364000 0 0 580 0 2547 30803 29 4 46 21
1 1 0 2723628 44 364724 0 0 756 272 2766 30652 31 4 44 22
2 1 0 2722128 44 365616 0 0 744 8 2771 31760 30 4 45 21
2 1 0 2721024 44 366284 0 0 788 196 2695 31819 31 4 44 22
3 1 0 2719312 44 367084 0 0 732 0 2737 32038 30 5 45 20
1 1 0 2717436 44 367880 0 0 728 0 2570 30810 29 3 46 22
3 1 0 2716820 44 368500 0 0 608 0 2614 30949 29 4 45 22
2 1 0 2715576 44 369240 0 0 628 264 2715 31180 34 4 40 22
1 1 0 2714308 44 369952 0 0 624 0 2533 31155 28 4 46 22
Linux 3.5.0-amd64-preempt-noide-20120410 (gandalfthegreat) 08/01/2012 _x86_64_ (4 CPU)
rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2.18 0.68 6.45 17.77 78.12 153.39 19.12 0.18 7.51 8.52 7.15 4.46 10.81
0.00 0.00 118.00 0.00 540.00 0.00 9.15 1.18 9.93 9.93 0.00 4.98 58.80
0.00 0.00 217.00 0.00 868.00 0.00 8.00 1.90 8.77 8.77 0.00 4.44 96.40
0.00 0.00 192.00 0.00 768.00 0.00 8.00 1.63 8.44 8.44 0.00 5.10 98.00
0.00 0.00 119.00 0.00 476.00 0.00 8.00 1.06 9.01 9.01 0.00 8.20 97.60
0.00 0.00 125.00 0.00 500.00 0.00 8.00 1.08 8.67 8.67 0.00 7.55 94.40
0.00 0.00 165.00 0.00 660.00 0.00 8.00 1.50 9.12 9.12 0.00 5.87 96.80
0.00 0.00 195.00 13.00 780.00 272.00 10.12 1.68 8.10 7.94 10.46 4.65 96.80
0.00 0.00 173.00 0.00 692.00 0.00 8.00 1.72 9.87 9.87 0.00 5.71 98.80
0.00 0.00 171.00 0.00 684.00 0.00 8.00 1.62 9.33 9.33 0.00 5.75 98.40
0.00 0.00 161.00 0.00 644.00 0.00 8.00 1.52 9.57 9.57 0.00 6.14 98.80
0.00 0.00 136.00 0.00 544.00 0.00 8.00 1.26 9.29 9.29 0.00 7.24 98.40
0.00 0.00 199.00 0.00 796.00 0.00 8.00 1.94 9.73 9.73 0.00 4.94 98.40
0.00 0.00 201.00 0.00 804.00 0.00 8.00 1.70 8.54 8.54 0.00 4.80 96.40
0.00 0.00 272.00 15.00 1088.00 272.00 9.48 2.35 8.21 8.46 3.73 3.39 97.20
0.00 0.00 208.00 1.00 832.00 4.00 8.00 1.90 9.01 9.04 4.00 4.67 97.60
0.00 0.00 228.00 0.00 912.00 0.00 8.00 1.94 8.56 8.56 0.00 4.23 96.40
0.00 0.00 107.00 0.00 428.00 0.00 8.00 0.98 9.05 9.05 0.00 9.12 97.60
0.00 0.00 163.00 0.00 652.00 0.00 8.00 1.60 9.89 9.89 0.00 6.01 98.00
0.00 0.00 162.00 0.00 648.00 0.00 8.00 1.83 10.20 10.20 0.00 6.07 98.40
1.00 10.00 224.00 71.00 900.00 2157.50 20.73 2.82 10.14 9.50 12.17 3.35 98.80
41.00 8.00 166.00 176.00 832.00 333.50 6.82 2.48 7.24 8.48 6.07 2.92 100.00
18.00 0.00 78.00 72.00 388.00 36.00 5.65 2.30 15.28 16.56 13.89 6.67 100.00
13.00 0.00 190.00 25.00 824.00 12.50 7.78 1.95 9.17 9.43 7.20 4.58 98.40
0.00 0.00 156.00 0.00 624.00 0.00 8.00 1.36 8.72 8.72 0.00 6.26 97.60
0.00 0.00 158.00 19.00 632.00 64.00 7.86 1.50 8.45 8.61 7.16 5.60 99.20
0.00 0.00 176.00 0.00 704.00 0.00 8.00 1.65 9.39 9.39 0.00 5.61 98.80
0.00 0.00 44.00 0.00 176.00 0.00 8.00 0.38 8.55 8.55 0.00 8.27 36.40
> I am interested in wait I/O and latencies and disk utilization.
Cool tool, I didn't know about iostat.
My r_await numbers don't look good obviously and yet %util is pretty much
100% the entire time.
Does that show that it's indeed the device that is unable to deliver the requests any quicker, despite
being an ssd, or are you reading this differently?
> Also I am interested in
> merkaba:~> hdparm -I /dev/sda | grep -i queue
> Queue depth: 32
> * Native Command Queueing (NCQ)
> output for your SSD.
gandalfthegreat:/var/local# hdparm -I /dev/sda | grep -i queue
Queue depth: 32
* Native Command Queueing (NCQ)
gandalfthegreat:/var/local#
I've the the fio tests in:
/dev/mapper/cryptroot /var btrfs rw,noatime,compress=lzo,nossd,discard,space_cache 0 0
(discard is there, so fstrim shouldn't be needed)
> I also suggest to use fio with with the ssd-test example on the SSD. I
> have some comparison data available for my setup. Heck it should be
> publicly available in my ADMIN magazine article about fio. I used a bit
> different fio jobs with block sizes of 2k to 16k, but its similar enough
> and I might even have some 4k examples at hand or can easily create one.
> I also raised size and duration a bit.
>
> An example based on whats in my article:
Thanks, here's the output.
I see bw jumping from bw=1700.8KB/s all the way to bw=474684KB/s depending
on the io size. That's "intereesting" to say the least.
So, doctor, is it bad? :)
randomwrite: (g=0): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentialwrite: (g=1): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
randomread: (g=2): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
sequentialread: (g=3): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=64
2.0.8
Starting 4 processes
randomwrite: Laying out IO file(s) (1 file(s) / 2048MB)
Jobs: 1 (f=1): [___R] [100.0% done] [558.8M/0K /s] [63.8K/0 iops] [eta 00m:00s]
randomwrite: (groupid=0, jobs=1): err= 0: pid=7193
write: io=102048KB, bw=1700.8KB/s, iops=189 , runt= 60003msec
slat (usec): min=21 , max=219834 , avg=5250.91, stdev=5936.55
clat (usec): min=25 , max=738932 , avg=329339.45, stdev=106004.63
lat (msec): min=4 , max=751 , avg=334.59, stdev=107.57
clat percentiles (msec):
| 1.00th=[ 225], 5.00th=[ 241], 10.00th=[ 247], 20.00th=[ 260],
| 30.00th=[ 269], 40.00th=[ 281], 50.00th=[ 293], 60.00th=[ 306],
| 70.00th=[ 322], 80.00th=[ 351], 90.00th=[ 545], 95.00th=[ 570],
| 99.00th=[ 627], 99.50th=[ 644], 99.90th=[ 709], 99.95th=[ 725],
| 99.99th=[ 742]
bw (KB/s) : min= 11, max= 2591, per=99.83%, avg=1697.13, stdev=491.48
lat (usec) : 50=0.01%
lat (msec) : 10=0.02%, 20=0.02%, 50=0.05%, 100=0.14%, 250=12.89%
lat (msec) : 500=72.44%, 750=14.43%
cpu : usr=0.30%, sys=2.65%, ctx=9699, majf=0, minf=19
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.3%, >=64=99.4%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=11396/d=0, short=r=0/w=0/d=0
sequentialwrite: (groupid=1, jobs=1): err= 0: pid=7218
write: io=80994KB, bw=1349.7KB/s, iops=149 , runt= 60011msec
slat (usec): min=15 , max=296798 , avg=6658.06, stdev=7414.10
clat (usec): min=15 , max=962319 , avg=418984.76, stdev=128626.12
lat (msec): min=12 , max=967 , avg=425.64, stdev=130.36
clat percentiles (msec):
| 1.00th=[ 269], 5.00th=[ 306], 10.00th=[ 322], 20.00th=[ 338],
| 30.00th=[ 355], 40.00th=[ 371], 50.00th=[ 388], 60.00th=[ 404],
| 70.00th=[ 420], 80.00th=[ 445], 90.00th=[ 603], 95.00th=[ 766],
| 99.00th=[ 881], 99.50th=[ 906], 99.90th=[ 938], 99.95th=[ 947],
| 99.99th=[ 963]
bw (KB/s) : min= 418, max= 1952, per=99.31%, avg=1339.72, stdev=354.39
lat (usec) : 20=0.01%
lat (msec) : 20=0.01%, 50=0.11%, 100=0.27%, 250=0.29%, 500=86.93%
lat (msec) : 750=6.87%, 1000=5.51%
cpu : usr=0.25%, sys=2.32%, ctx=13426, majf=0, minf=21
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=8991/d=0, short=r=0/w=0/d=0
randomread: (groupid=2, jobs=1): err= 0: pid=7234
read : io=473982KB, bw=7899.5KB/s, iops=934 , runt= 60005msec
slat (usec): min=2 , max=31048 , avg=1065.57, stdev=3519.63
clat (usec): min=21 , max=246981 , avg=67367.84, stdev=33459.21
lat (msec): min=1 , max=263 , avg=68.43, stdev=33.81
clat percentiles (msec):
| 1.00th=[ 15], 5.00th=[ 25], 10.00th=[ 33], 20.00th=[ 39],
| 30.00th=[ 50], 40.00th=[ 57], 50.00th=[ 61], 60.00th=[ 70],
| 70.00th=[ 76], 80.00th=[ 89], 90.00th=[ 111], 95.00th=[ 139],
| 99.00th=[ 176], 99.50th=[ 190], 99.90th=[ 217], 99.95th=[ 229],
| 99.99th=[ 247]
bw (KB/s) : min= 2912, max=11909, per=100.00%, avg=7900.46, stdev=2255.68
lat (usec) : 50=0.01%
lat (msec) : 2=0.17%, 4=0.16%, 10=0.35%, 20=2.49%, 50=27.44%
lat (msec) : 100=55.05%, 250=14.34%
cpu : usr=0.38%, sys=2.89%, ctx=8425, majf=0, minf=276
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=56073/w=0/d=0, short=r=0/w=0/d=0
sequentialread: (groupid=3, jobs=1): err= 0: pid=7249
read : io=2048.0MB, bw=474684KB/s, iops=52754 , runt= 4418msec
slat (usec): min=1 , max=44869 , avg=15.35, stdev=320.96
clat (usec): min=0 , max=67787 , avg=1145.60, stdev=3487.25
lat (usec): min=2 , max=67801 , avg=1161.14, stdev=3508.06
clat percentiles (usec):
| 1.00th=[ 10], 5.00th=[ 187], 10.00th=[ 217], 20.00th=[ 249],
| 30.00th=[ 278], 40.00th=[ 302], 50.00th=[ 330], 60.00th=[ 362],
| 70.00th=[ 398], 80.00th=[ 450], 90.00th=[ 716], 95.00th=[ 6624],
| 99.00th=[16064], 99.50th=[20096], 99.90th=[44800], 99.95th=[61696],
| 99.99th=[67072]
bw (KB/s) : min=50063, max=635019, per=97.34%, avg=462072.50, stdev=202894.18
lat (usec) : 2=0.65%, 4=0.02%, 10=0.30%, 20=0.24%, 50=0.36%
lat (usec) : 100=0.52%, 250=18.08%, 500=65.03%, 750=4.88%, 1000=0.52%
lat (msec) : 2=0.81%, 4=0.95%, 10=4.72%, 20=2.43%, 50=0.45%
lat (msec) : 100=0.05%
cpu : usr=7.52%, sys=67.56%, ctx=1494, majf=0, minf=277
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=233071/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
WRITE: io=102048KB, aggrb=1700KB/s, minb=1700KB/s, maxb=1700KB/s, mint=60003msec, maxt=60003msec
Run status group 1 (all jobs):
WRITE: io=80994KB, aggrb=1349KB/s, minb=1349KB/s, maxb=1349KB/s, mint=60011msec, maxt=60011msec
Run status group 2 (all jobs):
READ: io=473982KB, aggrb=7899KB/s, minb=7899KB/s, maxb=7899KB/s, mint=60005msec, maxt=60005msec
Run status group 3 (all jobs):
READ: io=2048.0MB, aggrb=474683KB/s, minb=474683KB/s, maxb=474683KB/s, mint=4418msec, maxt=4418msec
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
next prev parent reply other threads:[~2012-08-02 5:07 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-30 0:37 brtfs on top of dmcrypt with SSD. No corruption iff write cache off? Marc MERLIN
2012-02-01 17:56 ` Chris Mason
2012-02-02 3:23 ` Marc MERLIN
2012-02-02 12:42 ` Chris Mason
[not found] ` <20120202152722.GI12429@merlins.org>
2012-02-12 22:32 ` Marc MERLIN
2012-02-12 23:47 ` Milan Broz
2012-02-13 0:14 ` Marc MERLIN
2012-02-15 15:42 ` Calvin Walton
2012-02-15 16:55 ` Marc MERLIN
2012-02-15 16:59 ` Hugo Mills
2012-02-22 10:28 ` Justin Ossevoort
2012-02-22 11:07 ` Hugo Mills
2012-02-16 6:33 ` Chris Samuel
2012-02-18 12:33 ` Martin Steigerwald
2012-02-18 12:39 ` Martin Steigerwald
2012-02-18 12:49 ` Martin Steigerwald
2012-07-18 18:13 ` brtfs on top of dmcrypt with SSD -> Trim or no Trim Marc MERLIN
2012-07-18 20:04 ` Fajar A. Nugraha
2012-07-18 20:37 ` Marc MERLIN
2012-07-18 21:34 ` Clemens Eisserer
2012-07-18 21:48 ` Marc MERLIN
2012-07-18 21:49 ` Martin Steigerwald
2012-07-18 22:04 ` Marc MERLIN
2012-07-19 10:40 ` Martin Steigerwald
2012-07-22 18:58 ` brtfs on top of dmcrypt with SSD -> ssd or nossd + crypt performance? Marc MERLIN
2012-07-22 19:35 ` Martin Steigerwald
2012-07-22 19:43 ` Martin Steigerwald
2012-07-22 20:44 ` Marc MERLIN
2012-07-22 22:41 ` brtfs on top of dmcrypt with SSD -> file access 5x slower than spinning disk Marc MERLIN
2012-07-23 6:42 ` How can btrfs take 23sec to stat 23K files from an SSD? Marc MERLIN
2012-07-24 7:56 ` Martin Steigerwald
2012-07-27 4:40 ` Marc MERLIN
2012-07-27 11:08 ` Chris Mason
2012-07-27 18:42 ` Marc MERLIN
[not found] ` <20120801053042.GG12695@merlins.org>
2012-08-01 6:01 ` Marc MERLIN
2012-08-01 6:08 ` Fajar A. Nugraha
2012-08-01 6:21 ` Marc MERLIN
2012-08-01 21:57 ` Martin Steigerwald
2012-08-02 5:07 ` Marc MERLIN [this message]
2012-08-02 11:18 ` Martin Steigerwald
2012-08-02 17:39 ` Marc MERLIN
2012-08-02 20:20 ` Martin Steigerwald
2012-08-02 20:44 ` Marc MERLIN
2012-08-02 21:21 ` Martin Steigerwald
2012-08-02 21:49 ` Marc MERLIN
2012-08-03 18:45 ` Martin Steigerwald
2012-08-16 7:45 ` Marc MERLIN
2012-08-02 11:25 ` Martin Steigerwald
2012-08-01 6:36 ` Chris Samuel
2012-08-01 6:40 ` Marc MERLIN
2012-02-18 16:07 ` brtfs on top of dmcrypt with SSD. No corruption iff write cache off? Marc MERLIN
2012-02-19 0:53 ` Clemens Eisserer
-- strict thread matches above, loose matches on Subject: below --
2012-07-24 6:18 How can btrfs take 23sec to stat 23K files from an SSD? Marc MERLIN
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120802050741.GA22772@merlins.org \
--to=marc@merlins.org \
--cc=Martin@lichtvoll.de \
--cc=linux-btrfs@vger.kernel.org \
--cc=list@fajar.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).