From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arnd Bergmann
Subject: Re: [PATCH 00/16 v3] f2fs: introduce flash-friendly file system
Date: Mon, 12 Nov 2012 16:57:02 +0000
Message-ID: <201211121657.03054.arnd@arndb.de>
References: <003d01cdb74b$0c3fa420$24beec60$%kim@samsung.com> <201211102149.48946.arnd@arndb.de> <201211121616.23616.Martin@lichtvoll.de>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: linux-kernel@vger.kernel.org, Kim Jaegeuk, Jaegeuk Kim, linux-fsdevel@vger.kernel.org, gregkh@linuxfoundation.org, viro@zeniv.linux.org.uk, tytso@mit.edu, chur.lee@samsung.com, cm224.lee@samsung.com, jooyoung.hwang@samsung.com
To: Martin Steigerwald
Return-path:
In-Reply-To: <201211121616.23616.Martin@lichtvoll.de>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Monday 12 November 2012, Martin Steigerwald wrote:
> On Saturday, 10 November 2012, Arnd Bergmann wrote:
> > I would also recommend using flashbench to find out the optimum parameters
> > for your device. You can download it from
> > git://git.linaro.org/people/arnd/flashbench.git
> > In the long run, we should automate those tests and make them part of
> > mkfs.f2fs, but for now, try to find out the erase block size and the number
> > of concurrently used erase blocks on your device using a timing attack
> > in flashbench. The README file in there explains how to interpret the
> > results from "./flashbench -a /dev/sdb --blocksize=1024" to guess
> > the erase block size, although that sometimes doesn't work.
>
> Why do I use a blocksize of 1024 if the kernel reports me 512 byte blocks?

The blocksize you pass here is the size of writes that flashbench sends to
the kernel. Because of the algorithm used by flashbench, two hardware blocks
is the smallest size you can use here, and larger blocks tend to be less
reliable for this test case. I should probably change the default.

> [ 3112.144086] scsi9 : usb-storage 1-1.1:1.0
> [ 3113.145968] scsi 9:0:0:0: Direct-Access TinyDisk 2007-05-12 0.00 PQ: 0 ANSI: 2
> [ 3113.146476] sd 9:0:0:0: Attached scsi generic sg2 type 0
> [ 3113.147935] sd 9:0:0:0: [sdb] 4095999 512-byte logical blocks: (2.09 GB/1.95 GiB)
> [ 3113.148935] sd 9:0:0:0: [sdb] Write Protect is off
>
>
> And how do reads give information about erase block size? Wouldn´t writes be
> more conclusive for that? (Having to erase one versus two erase blocks?)

The --open-au tests can be more reliable, but also take more time and are
harder to understand. Using this test is faster and often gives an easy
answer even without destroying data on the device.

> Hmmm, I get very varying results here with said USB stick:
>
> merkaba:~> /tmp/flashbench -a /dev/sdb
> align 536870912  pre 1.1ms   on 1.1ms   post 1.08ms  diff 13µs
> align 268435456  pre 1.2ms   on 1.19ms  post 1.16ms  diff 11.6µs
> align 134217728  pre 1.12ms  on 1.14ms  post 1.15ms  diff 9.51µs
> align 67108864   pre 1.12ms  on 1.15ms  post 1.12ms  diff 29.9µs
> align 33554432   pre 1.11ms  on 1.17ms  post 1.13ms  diff 49µs
> align 16777216   pre 1.14ms  on 1.16ms  post 1.15ms  diff 22.4µs
> align 8388608    pre 1.12ms  on 1.09ms  post 1.06ms  diff -2053ns
> align 4194304    pre 1.13ms  on 1.16ms  post 1.14ms  diff 21.7µs
> align 2097152    pre 1.11ms  on 1.08ms  post 1.1ms   diff -18488n
> align 1048576    pre 1.11ms  on 1.11ms  post 1.11ms  diff -2461ns
> align 524288     pre 1.15ms  on 1.17ms  post 1.1ms   diff 45.4µs
> align 262144     pre 1.11ms  on 1.13ms  post 1.13ms  diff 12µs
> align 131072     pre 1.1ms   on 1.09ms  post 1.16ms  diff -38025n
> align 65536      pre 1.09ms  on 1.08ms  post 1.11ms  diff -21353n
> align 32768      pre 1.1ms   on 1.08ms  post 1.11ms  diff -23854n
> merkaba:~> /tmp/flashbench -a /dev/sdb
> align 536870912  pre 1.11ms  on 1.13ms  post 1.13ms  diff 10.6µs
> align 268435456  pre 1.12ms  on 1.2ms   post 1.17ms  diff 61.4µs
> align 134217728  pre 1.14ms  on 1.19ms  post 1.15ms  diff 46.8µs
> align 67108864   pre 1.08ms  on 1.15ms  post 1.08ms  diff 63.8µs
> align 33554432   pre 1.09ms  on 1.08ms  post 1.09ms  diff -4761ns
> align 16777216   pre 1.12ms  on 1.14ms  post 1.07ms  diff 41.4µs
> align 8388608    pre 1.1ms   on 1.1ms   post 1.09ms  diff 7.48µs
> align 4194304    pre 1.08ms  on 1.1ms   post 1.1ms   diff 10.1µs
> align 2097152    pre 1.1ms   on 1.11ms  post 1.1ms   diff 16µs
> align 1048576    pre 1.09ms  on 1.1ms   post 1.07ms  diff 15.5µs
> align 524288     pre 1.12ms  on 1.12ms  post 1.1ms   diff 11µs
> align 262144     pre 1.13ms  on 1.13ms  post 1.1ms   diff 21.6µs
> align 131072     pre 1.11ms  on 1.13ms  post 1.12ms  diff 17.9µs
> align 65536      pre 1.07ms  on 1.1ms   post 1.1ms   diff 11.6µs
> align 32768      pre 1.09ms  on 1.11ms  post 1.13ms  diff -5131ns
> merkaba:~> /tmp/flashbench -a /dev/sdb
> align 536870912  pre 1.2ms   on 1.18ms  post 1.21ms  diff -27496n
> align 268435456  pre 1.22ms  on 1.21ms  post 1.24ms  diff -18972n
> align 134217728  pre 1.15ms  on 1.19ms  post 1.14ms  diff 42.5µs
> align 67108864   pre 1.08ms  on 1.09ms  post 1.08ms  diff 5.29µs
> align 33554432   pre 1.18ms  on 1.19ms  post 1.18ms  diff 9.25µs
> align 16777216   pre 1.18ms  on 1.22ms  post 1.17ms  diff 48.6µs
> align 8388608    pre 1.14ms  on 1.17ms  post 1.19ms  diff 4.36µs
> align 4194304    pre 1.16ms  on 1.2ms   post 1.11ms  diff 65.8µs
> align 2097152    pre 1.13ms  on 1.09ms  post 1.12ms  diff -37718n
> align 1048576    pre 1.15ms  on 1.2ms   post 1.18ms  diff 34.9µs
> align 524288     pre 1.14ms  on 1.19ms  post 1.16ms  diff 41.5µs
> align 262144     pre 1.19ms  on 1.12ms  post 1.15ms  diff -52725n
> align 131072     pre 1.21ms  on 1.11ms  post 1.14ms  diff -68522n
> align 65536      pre 1.21ms  on 1.13ms  post 1.18ms  diff -64248n
> align 32768      pre 1.14ms  on 1.25ms  post 1.12ms  diff 116µs
>
> Even when I apply the explanation of the README I do not seem to get a
> clear picture of the stick erase block size.
>
> The values above seem to indicate to me: I don´t care about alignment at all.

I think it's more a case of a device where reading does not easily reveal
the erase block boundaries, because the variance between multiple reads
is much higher than between different positions. You can try again using
"--blocksize=1024 --count=100", which will increase the accuracy of the
test.

On the other hand, the device size of "4095999 512-byte logical blocks"
is quite suspicious, because it's not an even number, where it should be
a multiple of erase blocks. It is one less sector than 1000 2MB blocks
(or 500 4MB blocks, for that matter), but it's not clear if that one
block is missing at the start or at the end of the drive.

> With another flash, likely slower Intenso 4GB stick I get:
>
> [ 3672.512143] scsi 10:0:0:0: Direct-Access Ut165 USB2FlashStorage 0.00 PQ: 0 ANSI: 2
> [ 3672.514469] sd 10:0:0:0: Attached scsi generic sg2 type 0
> [ 3672.514991] sd 10:0:0:0: [sdb] 7897088 512-byte logical blocks: (4.04 GB/3.76 GiB)
> […]

$ factor 7897088
7897088: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 241

Slightly more helpful, this one has 241 16MB-blocks (2^15 sectors of 512
bytes each), so at least we know that the erase block size is not larger
than 16MB and not a multiple of 3.

> align 16777216   pre 939µs   on 903µs   post 880µs   diff -5972ns
> align 8388608    pre 900µs   on 914µs   post 923µs   diff 2.42µs
> align 4194304    pre 894µs   on 886µs   post 882µs   diff -1563ns
>
> here?
>
> align 2097152    pre 829µs   on 890µs   post 874µs   diff 37.8µs
> align 1048576    pre 899µs   on 882µs   post 843µs   diff 11.1µs
> align 524288     pre 890µs   on 887µs   post 902µs   diff -9005ns
> align 262144     pre 887µs   on 887µs   post 898µs   diff -5474ns
> align 131072     pre 928µs   on 895µs   post 914µs   diff -26028n
> align 65536      pre 898µs   on 898µs   post 894µs   diff 2.59µs
> align 32768      pre 884µs   on 891µs   post 901µs   diff -1284ns
>
>
> Similar picture. The diffs seem to be mostly quite small with only some
> micro seconds. Or am I misreading something?

Same thing, try again with the options I listed above.

> Then with a quite fast one 16 GB Transcend.
>
> [ 4055.393399] sd 11:0:0:0: Attached scsi generic sg2 type 0
> [ 4055.394729] sd 11:0:0:0: [sdb] 31375360 512-byte logical blocks: (16.0 GB/14.9 GiB)
> [ 4055.395262] sd 11:0:0:0: [sdb] Write Protect is off

$ factor 31375360
31375360: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 5 383

That would be 5*383*8MB (2^14 sectors of 512 bytes each), so the erase
block size will be 8MB or a fraction of that.
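
As a rough cross-check of the factor arithmetic, here is a minimal shell
sketch that strips the factors of two from a sector count and scales the
result by the sector size. The function name pow2_bound is made up for this
mail and is not part of flashbench, and the 512-byte sector size is taken
from the dmesg output above. The result is only an upper bound, assuming the
device holds a whole number of power-of-two sized erase blocks.

```shell
#!/bin/bash
# Largest power-of-two size (in bytes) that evenly divides a device of
# $1 sectors. Assumes 512-byte logical sectors, as reported by the kernel
# in the dmesg output quoted in this thread.
# pow2_bound is a hypothetical helper name, not a flashbench feature.
pow2_bound() {
    local sectors=$1 bytes=512
    # Each factor of two in the sector count doubles the block size bound.
    while [ "$sectors" -gt 0 ] && [ $((sectors % 2)) -eq 0 ]; do
        sectors=$((sectors / 2))
        bytes=$((bytes * 2))
    done
    echo "$bytes"
}

pow2_bound 7897088    # Intenso stick:   2^15 * 241 sectors     -> 16777216
pow2_bound 31375360   # Transcend stick: 2^14 * 5 * 383 sectors -> 8388608
```

An odd sector count such as 4095999 simply returns 512, which is another
way of seeing why that device size looks suspicious.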

> merkaba:~> /tmp/flashbench -a /dev/sdb
> align 4294967296 pre 1.28ms  on 1.48ms  post 1.33ms  diff 179µs
> align 2147483648 pre 1.32ms  on 1.51ms  post 1.33ms  diff 181µs
> align 1073741824 pre 1.31ms  on 1.46ms  post 1.35ms  diff 132µs
> align 536870912  pre 1.27ms  on 1.52ms  post 1.33ms  diff 228µs
> align 268435456  pre 1.28ms  on 1.46ms  post 1.31ms  diff 161µs
> align 134217728  pre 1.28ms  on 1.44ms  post 1.37ms  diff 120µs
> align 67108864   pre 1.27ms  on 1.44ms  post 1.34ms  diff 133µs
> align 33554432   pre 1.24ms  on 1.42ms  post 1.31ms  diff 150µs
> align 16777216   pre 1.23ms  on 1.46ms  post 1.26ms  diff 218µs
> align 8388608    pre 1.31ms  on 1.5ms   post 1.33ms  diff 180µs
> align 4194304    pre 1.27ms  on 1.45ms  post 1.36ms  diff 135µs
> align 2097152    pre 1.29ms  on 1.37ms  post 1.39ms  diff 33.7µs
>
> here?
>
> align 1048576    pre 1.31ms  on 1.44ms  post 1.35ms  diff 115µs
> align 524288     pre 1.33ms  on 1.39ms  post 1.48ms  diff -12297n
> align 262144     pre 1.36ms  on 1.42ms  post 1.4ms   diff 45.6µs
> align 131072     pre 1.37ms  on 1.44ms  post 1.4ms   diff 57.7µs
> align 65536      pre 1.36ms  on 1.35ms  post 1.33ms  diff 4.67µs
> align 32768      pre 1.32ms  on 1.38ms  post 1.34ms  diff 44.1µs
> merkaba:~> /tmp/flashbench -a /dev/sdb
> align 4294967296 pre 1.36ms  on 1.49ms  post 1.34ms  diff 139µs
> align 2147483648 pre 1.26ms  on 1.48ms  post 1.27ms  diff 213µs
> align 1073741824 pre 1.26ms  on 1.45ms  post 1.33ms  diff 164µs
> align 536870912  pre 1.22ms  on 1.46ms  post 1.35ms  diff 173µs
> align 268435456  pre 1.34ms  on 1.5ms   post 1.31ms  diff 172µs
> align 134217728  pre 1.34ms  on 1.48ms  post 1.31ms  diff 157µs
> align 67108864   pre 1.29ms  on 1.46ms  post 1.34ms  diff 142µs
> align 33554432   pre 1.28ms  on 1.47ms  post 1.31ms  diff 173µs
> align 16777216   pre 1.26ms  on 1.48ms  post 1.37ms  diff 168µs
> align 8388608    pre 1.31ms  on 1.47ms  post 1.36ms  diff 139µs
> align 4194304    pre 1.26ms  on 1.53ms  post 1.33ms  diff 237µs
> align 2097152    pre 1.34ms  on 1.4ms   post 1.36ms  diff 56.4µs
> align 1048576    pre 1.32ms  on 1.35ms  post 1.37ms  diff 638ns
>
> here?
>
> align 524288     pre 1.29ms  on 1.47ms  post 1.45ms  diff 98.1µs
> align 262144     pre 1.35ms  on 1.38ms  post 1.42ms  diff -11916n
> align 131072     pre 1.32ms  on 1.46ms  post 1.4ms   diff 100µs
> align 65536      pre 1.35ms  on 1.42ms  post 1.43ms  diff 30.8µs
> align 32768      pre 1.31ms  on 1.37ms  post 1.33ms  diff 51µs
> merkaba:~> /tmp/flashbench -a /dev/sdb
> align 4294967296 pre 1.26ms  on 1.49ms  post 1.27ms  diff 222µs
> align 2147483648 pre 1.25ms  on 1.41ms  post 1.37ms  diff 97.3µs
> align 1073741824 pre 1.26ms  on 1.47ms  post 1.31ms  diff 186µs
> align 536870912  pre 1.25ms  on 1.42ms  post 1.32ms  diff 132µs
> align 268435456  pre 1.2ms   on 1.44ms  post 1.29ms  diff 195µs
> align 134217728  pre 1.27ms  on 1.43ms  post 1.34ms  diff 118µs
> align 67108864   pre 1.25ms  on 1.45ms  post 1.31ms  diff 165µs
> align 33554432   pre 1.22ms  on 1.36ms  post 1.25ms  diff 124µs
> align 16777216   pre 1.24ms  on 1.44ms  post 1.26ms  diff 191µs
> align 8388608    pre 1.22ms  on 1.39ms  post 1.23ms  diff 164µs
> align 4194304    pre 1.23ms  on 1.43ms  post 1.3ms   diff 171µs
> align 2097152    pre 1.26ms  on 1.3ms   post 1.32ms  diff 16.7µs
> align 1048576    pre 1.26ms  on 1.27ms  post 1.26ms  diff 7.91µs
>
> here?
>
> align 524288     pre 1.24ms  on 1.3ms   post 1.3ms   diff 29.2µs
> align 262144     pre 1.25ms  on 1.3ms   post 1.28ms  diff 28.2µs
> align 131072     pre 1.25ms  on 1.29ms  post 1.28ms  diff 24.8µs
> align 65536      pre 1.15ms  on 1.24ms  post 1.26ms  diff 34.5µs
> align 32768      pre 1.17ms  on 1.3ms   post 1.26ms  diff 82.6µs

This one is fairly deterministic, and I would assume it's 4MB, which always
has a much higher number in the last column than the 2MB one.
For a fast 16 GB stick, I also wouldn't expect smaller than 4 MB erase blocks.

> Thing is that my "here" is not always at the same place :)

If you add a '--count=N' argument, you can have flashbench run the test
more often and average between the runs. The default is 8.

> > With the correct guess, compare the performance you get using
> >
> > $ ERASESIZE=$[2*1024*1024] # replace with guess from flashbench -a
> > $ ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=${ERASESIZE}
> > $ ./flashbench /dev/sdb --open-au --open-au-nr=3 --blocksize=4096 --erasesize=${ERASESIZE}
> > $ ./flashbench /dev/sdb --open-au --open-au-nr=5 --blocksize=4096 --erasesize=${ERASESIZE}
> > $ ./flashbench /dev/sdb --open-au --open-au-nr=7 --blocksize=4096 --erasesize=${ERASESIZE}
> > $ ./flashbench /dev/sdb --open-au --open-au-nr=13 --blocksize=4096 --erasesize=${ERASESIZE}
>
> I omit this for now, cause I am not yet sure about the correct guess.

You can also try this test to find out the erase block size if the -a test
fails. Start with the largest possible value you'd expect (16 MB for a
modern and fast USB stick, less if it's older or smaller), and use
--open-au-nr=1 to get a baseline:

./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=$[16*1024*1024]

Every device should be able to handle this nicely with maximum throughput.
The default is to start the test at 16 MB into the device to get out of the
way of a potential FAT optimized area. You can change that offset to find
where an erase block boundary is. Adding '--offset=$[24*1024*1024]' will
still be fast if the erase block size is 8 MB, but get slower and have more
jitter if the size is actually 16 MB, because now we write a 16 MB section
of the drive with an 8 MB misalignment.
The next offsets to try after that would be 20, 18, 17, 16.5, etc. MB, which
will be slow for an 8, 4, 2, and 1 MB erase block size, respectively. You can
also reduce the --erasesize argument there and do

./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[16*1024*1024] --offset=$[24*1024*1024]
./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[8*1024*1024] --offset=$[20*1024*1024]
./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[4*1024*1024] --offset=$[18*1024*1024]
./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[2*1024*1024] --offset=$[17*1024*1024]
./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[1*1024*1024] --offset=$[33*512*1024]

If you have the result from the other test to figure out the maximum value
for '--open-au-nr=N', using that number here will make this test more
reliable as well.

	Arnd
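
P.S. The five command lines above follow one pattern: the offset is the
16 MB default plus half of the candidate erase block size. A minimal sketch
of a loop that prints them is below. It only echoes the commands instead of
running them, since the --open-au test writes to the device. DEV and the
function name sweep_commands are placeholders invented for this
illustration; the flashbench options are the ones already used in this mail.

```shell
#!/bin/bash
# Print (do not run) the misalignment sweep: for each candidate erase block
# size, start the test half a block past the 16 MiB default offset.
# DEV and sweep_commands are hypothetical names for illustration only.
DEV=${DEV:-/dev/sdX}
MB=$((1024 * 1024))

sweep_commands() {
    local size erasesize offset
    for size in 16 8 4 2 1; do
        erasesize=$((size * MB))
        # 16 MiB default start plus half an erase block of misalignment
        offset=$((16 * MB + erasesize / 2))
        echo "./flashbench $DEV --open-au --open-au-nr=1 --blocksize=65536" \
             "--erasesize=$erasesize --offset=$offset"
    done
}

sweep_commands
```

Review the printed lines and replace the placeholder device before running
any of them by hand.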