From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Steigerwald
Subject: Re: [PATCH 00/16 v3] f2fs: introduce flash-friendly file system
Date: Mon, 12 Nov 2012 16:16:23 +0100
Message-ID: <201211121616.23616.Martin@lichtvoll.de>
References: <003d01cdb74b$0c3fa420$24beec60$%kim@samsung.com>
 <201211101933.38434.Martin@lichtvoll.de>
 <201211102149.48946.arnd@arndb.de>
 (sfid-20121111_013506_013683_61F4C040)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=utf-8
Cc: linux-kernel@vger.kernel.org, Kim Jaegeuk, Jaegeuk Kim,
 linux-fsdevel@vger.kernel.org, gregkh@linuxfoundation.org,
 viro@zeniv.linux.org.uk, tytso@mit.edu, chur.lee@samsung.com,
 cm224.lee@samsung.com, jooyoung.hwang@samsung.com
To: Arnd Bergmann
In-Reply-To: <201211102149.48946.arnd@arndb.de>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Saturday, 10 November 2012, Arnd Bergmann wrote:
> On Saturday 10 November 2012, Martin Steigerwald wrote:
> > Command (m for help): n
> > Partition type:
> >    p   primary (0 primary, 0 extended, 4 free)
> >    e   extended
> > Select (default p): p
> > Partition number (1-4, default 1): 1
> > First sector (2048-4095998, default 2048):
> > Using default value 2048
> > Last sector, +sectors or +size{K,M,G} (2048-4095998, default 4095998):
> > Using default value 4095998
>
> This is almost certainly not the right setting for f2fs, which only works
> at its design point if the segments are aligned to erase blocks. All modern
> flash devices have erase blocks larger than 1 MB, so starting the partition
> at a 1 MB offset will cause it to be misaligned. Also, some USB sticks
> have an area optimized for random writes in the beginning of the drive
> where both FAT32 and f2fs store their metadata. It may be worth testing
> again without a partition table, using just the raw device.

Thank you for your hints, Arnd, much appreciated.
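The alignment argument in the quoted text can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original exchange: it assumes 512-byte logical sectors, and the helper name and the list of candidate erase block sizes are hypothetical, chosen only for illustration (they are not measured values for this stick).

```python
SECTOR_SIZE = 512      # assumption: 512-byte logical sectors
START_SECTOR = 2048    # fdisk default start sector from the session above
MIB = 1024 * 1024


def is_aligned(start_sector, erase_block_bytes, sector_size=SECTOR_SIZE):
    """Return True if the partition start falls on an erase block boundary."""
    return (start_sector * sector_size) % erase_block_bytes == 0


if __name__ == "__main__":
    offset = START_SECTOR * SECTOR_SIZE
    print(f"partition starts at byte {offset} ({offset // MIB} MiB)")
    # Hypothetical candidate erase block sizes, in MiB.
    for size_mib in (1, 2, 3, 4, 8):
        aligned = is_aligned(START_SECTOR, size_mib * MIB)
        print(f"{size_mib} MiB erase blocks: {'aligned' if aligned else 'misaligned'}")
```

Under these assumptions a start at sector 2048 is aligned only for erase blocks of 1 MiB or smaller (powers of two), which is the situation the quoted text warns about. A start at sector 0, that is the raw device, is aligned for any erase block size.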
I already suspected as much after reading some of the fine documents on
the Linaro website. As I want to write an article to give Linux users some
insight into Linux on "cheap" flash, I am willing to learn more.

> I would also recommend using flashbench to find out the optimum parameters
> for your device. You can download it from
> git://git.linaro.org/people/arnd/flashbench.git
> In the long run, we should automate those tests and make them part of
> mkfs.f2fs, but for now, try to find out the erase block size and the number
> of concurrently used erase blocks on your device using a timing attack
> in flashbench. The README file in there explains how to interpret the
> results from "./flashbench -a /dev/sdb --blocksize=1024" to guess
> the erase block size, although that sometimes doesn't work.

Why do I use a blocksize of 1024 if the kernel reports 512-byte blocks to me?

[ 3112.144086] scsi9 : usb-storage 1-1.1:1.0
[ 3113.145968] scsi 9:0:0:0: Direct-Access     TinyDisk 2007-05-12       0.00 PQ: 0 ANSI: 2
[ 3113.146476] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ 3113.147935] sd 9:0:0:0: [sdb] 4095999 512-byte logical blocks: (2.09 GB/1.95 GiB)
[ 3113.148935] sd 9:0:0:0: [sdb] Write Protect is off

And how do reads give information about the erase block size? Wouldn't
writes be more conclusive for that?
(Having to erase one versus two erase blocks?)

Hmmm, I get widely varying results here with said USB stick:

merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912   pre 1.1ms   on 1.1ms   post 1.08ms  diff 13µs
align 268435456   pre 1.2ms   on 1.19ms  post 1.16ms  diff 11.6µs
align 134217728   pre 1.12ms  on 1.14ms  post 1.15ms  diff 9.51µs
align 67108864    pre 1.12ms  on 1.15ms  post 1.12ms  diff 29.9µs
align 33554432    pre 1.11ms  on 1.17ms  post 1.13ms  diff 49µs
align 16777216    pre 1.14ms  on 1.16ms  post 1.15ms  diff 22.4µs
align 8388608     pre 1.12ms  on 1.09ms  post 1.06ms  diff -2053ns
align 4194304     pre 1.13ms  on 1.16ms  post 1.14ms  diff 21.7µs
align 2097152     pre 1.11ms  on 1.08ms  post 1.1ms   diff -18488n
align 1048576     pre 1.11ms  on 1.11ms  post 1.11ms  diff -2461ns
align 524288      pre 1.15ms  on 1.17ms  post 1.1ms   diff 45.4µs
align 262144      pre 1.11ms  on 1.13ms  post 1.13ms  diff 12µs
align 131072      pre 1.1ms   on 1.09ms  post 1.16ms  diff -38025n
align 65536       pre 1.09ms  on 1.08ms  post 1.11ms  diff -21353n
align 32768       pre 1.1ms   on 1.08ms  post 1.11ms  diff -23854n

merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912   pre 1.11ms  on 1.13ms  post 1.13ms  diff 10.6µs
align 268435456   pre 1.12ms  on 1.2ms   post 1.17ms  diff 61.4µs
align 134217728   pre 1.14ms  on 1.19ms  post 1.15ms  diff 46.8µs
align 67108864    pre 1.08ms  on 1.15ms  post 1.08ms  diff 63.8µs
align 33554432    pre 1.09ms  on 1.08ms  post 1.09ms  diff -4761ns
align 16777216    pre 1.12ms  on 1.14ms  post 1.07ms  diff 41.4µs
align 8388608     pre 1.1ms   on 1.1ms   post 1.09ms  diff 7.48µs
align 4194304     pre 1.08ms  on 1.1ms   post 1.1ms   diff 10.1µs
align 2097152     pre 1.1ms   on 1.11ms  post 1.1ms   diff 16µs
align 1048576     pre 1.09ms  on 1.1ms   post 1.07ms  diff 15.5µs
align 524288      pre 1.12ms  on 1.12ms  post 1.1ms   diff 11µs
align 262144      pre 1.13ms  on 1.13ms  post 1.1ms   diff 21.6µs
align 131072      pre 1.11ms  on 1.13ms  post 1.12ms  diff 17.9µs
align 65536       pre 1.07ms  on 1.1ms   post 1.1ms   diff 11.6µs
align 32768       pre 1.09ms  on 1.11ms  post 1.13ms  diff -5131ns

merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912   pre 1.2ms   on 1.18ms  post 1.21ms  diff -27496n
align 268435456   pre 1.22ms  on 1.21ms  post 1.24ms  diff -18972n
align 134217728   pre 1.15ms  on 1.19ms  post 1.14ms  diff 42.5µs
align 67108864    pre 1.08ms  on 1.09ms  post 1.08ms  diff 5.29µs
align 33554432    pre 1.18ms  on 1.19ms  post 1.18ms  diff 9.25µs
align 16777216    pre 1.18ms  on 1.22ms  post 1.17ms  diff 48.6µs
align 8388608     pre 1.14ms  on 1.17ms  post 1.19ms  diff 4.36µs
align 4194304     pre 1.16ms  on 1.2ms   post 1.11ms  diff 65.8µs
align 2097152     pre 1.13ms  on 1.09ms  post 1.12ms  diff -37718n
align 1048576     pre 1.15ms  on 1.2ms   post 1.18ms  diff 34.9µs
align 524288      pre 1.14ms  on 1.19ms  post 1.16ms  diff 41.5µs
align 262144      pre 1.19ms  on 1.12ms  post 1.15ms  diff -52725n
align 131072      pre 1.21ms  on 1.11ms  post 1.14ms  diff -68522n
align 65536       pre 1.21ms  on 1.13ms  post 1.18ms  diff -64248n
align 32768       pre 1.14ms  on 1.25ms  post 1.12ms  diff 116µs

Even when I apply the explanation from the README, I do not get a clear
picture of the stick's erase block size.

The values above seem to tell me: I don't care about alignment at all.
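For comparing several runs, the search for a boundary can be made mechanical. This is a minimal Python sketch under one stated assumption: that the erase block size shows up as the alignment at which the "diff" column falls most sharply towards the next smaller alignment. The function names, the regular expression and the reading of a trailing "n" as a cut-off "ns" are all assumptions of this sketch and not part of flashbench.

```python
import re

# Matches decoded rows such as
# "align 4194304 pre 1.13ms on 1.16ms post 1.14ms diff 21.7µs".
ROW = re.compile(
    r"align\s+(\d+)\s+pre\s+\S+\s+on\s+\S+\s+post\s+\S+\s+diff\s+(-?[\d.]+)(ns|n|µs|ms)"
)
# Assumption: a bare "n" is a truncated "ns", as in "-18488n".
TO_US = {"ns": 1e-3, "n": 1e-3, "µs": 1.0, "ms": 1e3}


def parse_diffs(text):
    """Return [(alignment_bytes, diff_in_microseconds), ...] in input order."""
    return [(int(a), float(v) * TO_US[u]) for a, v, u in ROW.findall(text)]


def largest_drop(rows):
    """Return (alignment, drop_us) for the sharpest fall between neighbours.

    rows must run from large to small alignment, as flashbench prints them.
    The alignment returned is the larger one of the pair, i.e. the candidate
    erase block size.
    """
    drops = [(rows[i][0], rows[i][1] - rows[i + 1][1]) for i in range(len(rows) - 1)]
    return max(drops, key=lambda d: d[1])


if __name__ == "__main__":
    # Three rows taken from the first run above.
    sample = """
    align 4194304 pre 1.13ms on 1.16ms post 1.14ms diff 21.7µs
    align 2097152 pre 1.11ms on 1.08ms post 1.1ms diff -18488n
    align 1048576 pre 1.11ms on 1.11ms post 1.11ms diff -2461ns
    """
    print(largest_drop(parse_diffs(sample)))
```

A drop of a few tens of microseconds, as in the three runs above, is no larger than the variation between runs, so a candidate is only worth trusting if the same alignment comes out of repeated runs.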
With another, likely slower flash stick, an Intenso 4 GB, I get:

[ 3672.512143] scsi 10:0:0:0: Direct-Access     Ut165    USB2FlashStorage 0.00 PQ: 0 ANSI: 2
[ 3672.514469] sd 10:0:0:0: Attached scsi generic sg2 type 0
[ 3672.514991] sd 10:0:0:0: [sdb] 7897088 512-byte logical blocks: (4.04 GB/3.76 GiB)
[…]

merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824  pre 1.06ms  on 1.03ms  post 951µs   diff 26.1µs
align 536870912   pre 1.06ms  on 1ms     post 941µs   diff 1.17µs
align 268435456   pre 995µs   on 957µs   post 887µs   diff 15.7µs
align 134217728   pre 994µs   on 951µs   post 883µs   diff 12.4µs
align 67108864    pre 994µs   on 989µs   post 1.02ms  diff -15104n
align 33554432    pre 934µs   on 974µs   post 1ms     diff 4.16µs
align 16777216    pre 946µs   on 916µs   post 900µs   diff -6588ns
align 8388608     pre 883µs   on 881µs   post 880µs   diff -1176ns
align 4194304     pre 884µs   on 884µs   post 885µs   diff -159ns  here?
align 2097152     pre 880µs   on 879µs   post 783µs   diff 47.6µs
align 1048576     pre 877µs   on 881µs   post 878µs   diff 3.92µs
align 524288      pre 869µs   on 870µs   post 875µs   diff -2101ns
align 262144      pre 871µs   on 875µs   post 885µs   diff -2539ns
align 131072      pre 878µs   on 893µs   post 900µs   diff 3.6µs
align 65536       pre 851µs   on 881µs   post 884µs   diff 13.7µs
align 32768       pre 836µs   on 833µs   post 880µs   diff -25556n

merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824  pre 1.07ms  on 1e+03µ  post 962µs   diff -14615n
align 536870912   pre 1.06ms  on 1.01ms  post 940µs   diff 12.2µs
align 268435456   pre 1ms     on 943µs   post 885µs   diff -1132ns
align 134217728   pre 995µs   on 982µs   post 909µs   diff 30µs
align 67108864    pre 999µs   on 995µs   post 1.01ms  diff -9707ns
align 33554432    pre 960µs   on 1.01ms  post 1.03ms  diff 15.2µs
align 16777216    pre 954µs   on 928µs   post 878µs   diff 12.1µs
align 8388608     pre 872µs   on 900µs   post 895µs   diff 16.5µs
align 4194304     pre 895µs   on 862µs   post 890µs   diff -30439n
align 2097152     pre 889µs   on 901µs   post 876µs   diff 18.7µs
align 1048576     pre 900µs   on 898µs   post 897µs   diff -708ns  here?
align 524288      pre 885µs   on 874µs   post 881µs   diff -8470ns
align 262144      pre 817µs   on 873µs   post 878µs   diff 25.6µs
align 131072      pre 882µs   on 854µs   post 881µs   diff -27423n
align 65536       pre 866µs   on 890µs   post 885µs   diff 14.3µs
align 32768       pre 900µs   on 881µs   post 893µs   diff -15412n

merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824  pre 1.12ms  on 1.02ms  post 949µs   diff -12574n
align 536870912   pre 1.07ms  on 1.03ms  post 948µs   diff 16.5µs
align 268435456   pre 1.01ms  on 958µs   post 883µs   diff 12.1µs
align 134217728   pre 994µs   on 946µs   post 879µs   diff 9.2µs
align 67108864    pre 1ms     on 1.05ms  post 1.03ms  diff 37.9µs
align 33554432    pre 942µs   on 1.01ms  post 1.03ms  diff 20.6µs
align 16777216    pre 939µs   on 903µs   post 880µs   diff -5972ns
align 8388608     pre 900µs   on 914µs   post 923µs   diff 2.42µs
align 4194304     pre 894µs   on 886µs   post 882µs   diff -1563ns  here?
align 2097152     pre 829µs   on 890µs   post 874µs   diff 37.8µs
align 1048576     pre 899µs   on 882µs   post 843µs   diff 11.1µs
align 524288      pre 890µs   on 887µs   post 902µs   diff -9005ns
align 262144      pre 887µs   on 887µs   post 898µs   diff -5474ns
align 131072      pre 928µs   on 895µs   post 914µs   diff -26028n
align 65536       pre 898µs   on 898µs   post 894µs   diff 2.59µs
align 32768       pre 884µs   on 891µs   post 901µs   diff -1284ns

Similar picture. The diffs mostly seem to be quite small, only a few
microseconds. Or am I misreading something?

Then with a quite fast one, a 16 GB Transcend:
[ 4055.393399] sd 11:0:0:0: Attached scsi generic sg2 type 0
[ 4055.394729] sd 11:0:0:0: [sdb] 31375360 512-byte logical blocks: (16.0 GB/14.9 GiB)
[ 4055.395262] sd 11:0:0:0: [sdb] Write Protect is off

merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296  pre 1.28ms  on 1.48ms  post 1.33ms  diff 179µs
align 2147483648  pre 1.32ms  on 1.51ms  post 1.33ms  diff 181µs
align 1073741824  pre 1.31ms  on 1.46ms  post 1.35ms  diff 132µs
align 536870912   pre 1.27ms  on 1.52ms  post 1.33ms  diff 228µs
align 268435456   pre 1.28ms  on 1.46ms  post 1.31ms  diff 161µs
align 134217728   pre 1.28ms  on 1.44ms  post 1.37ms  diff 120µs
align 67108864    pre 1.27ms  on 1.44ms  post 1.34ms  diff 133µs
align 33554432    pre 1.24ms  on 1.42ms  post 1.31ms  diff 150µs
align 16777216    pre 1.23ms  on 1.46ms  post 1.26ms  diff 218µs
align 8388608     pre 1.31ms  on 1.5ms   post 1.33ms  diff 180µs
align 4194304     pre 1.27ms  on 1.45ms  post 1.36ms  diff 135µs
align 2097152     pre 1.29ms  on 1.37ms  post 1.39ms  diff 33.7µs  here?
align 1048576     pre 1.31ms  on 1.44ms  post 1.35ms  diff 115µs
align 524288      pre 1.33ms  on 1.39ms  post 1.48ms  diff -12297n
align 262144      pre 1.36ms  on 1.42ms  post 1.4ms   diff 45.6µs
align 131072      pre 1.37ms  on 1.44ms  post 1.4ms   diff 57.7µs
align 65536       pre 1.36ms  on 1.35ms  post 1.33ms  diff 4.67µs
align 32768       pre 1.32ms  on 1.38ms  post 1.34ms  diff 44.1µs

merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296  pre 1.36ms  on 1.49ms  post 1.34ms  diff 139µs
align 2147483648  pre 1.26ms  on 1.48ms  post 1.27ms  diff 213µs
align 1073741824  pre 1.26ms  on 1.45ms  post 1.33ms  diff 164µs
align 536870912   pre 1.22ms  on 1.46ms  post 1.35ms  diff 173µs
align 268435456   pre 1.34ms  on 1.5ms   post 1.31ms  diff 172µs
align 134217728   pre 1.34ms  on 1.48ms  post 1.31ms  diff 157µs
align 67108864    pre 1.29ms  on 1.46ms  post 1.34ms  diff 142µs
align 33554432    pre 1.28ms  on 1.47ms  post 1.31ms  diff 173µs
align 16777216    pre 1.26ms  on 1.48ms  post 1.37ms  diff 168µs
align 8388608     pre 1.31ms  on 1.47ms  post 1.36ms  diff 139µs
align 4194304     pre 1.26ms  on 1.53ms  post 1.33ms  diff 237µs
align 2097152     pre 1.34ms  on 1.4ms   post 1.36ms  diff 56.4µs
align 1048576     pre 1.32ms  on 1.35ms  post 1.37ms  diff 638ns  here?
align 524288      pre 1.29ms  on 1.47ms  post 1.45ms  diff 98.1µs
align 262144      pre 1.35ms  on 1.38ms  post 1.42ms  diff -11916n
align 131072      pre 1.32ms  on 1.46ms  post 1.4ms   diff 100µs
align 65536       pre 1.35ms  on 1.42ms  post 1.43ms  diff 30.8µs
align 32768       pre 1.31ms  on 1.37ms  post 1.33ms  diff 51µs

merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296  pre 1.26ms  on 1.49ms  post 1.27ms  diff 222µs
align 2147483648  pre 1.25ms  on 1.41ms  post 1.37ms  diff 97.3µs
align 1073741824  pre 1.26ms  on 1.47ms  post 1.31ms  diff 186µs
align 536870912   pre 1.25ms  on 1.42ms  post 1.32ms  diff 132µs
align 268435456   pre 1.2ms   on 1.44ms  post 1.29ms  diff 195µs
align 134217728   pre 1.27ms  on 1.43ms  post 1.34ms  diff 118µs
align 67108864    pre 1.25ms  on 1.45ms  post 1.31ms  diff 165µs
align 33554432    pre 1.22ms  on 1.36ms  post 1.25ms  diff 124µs
align 16777216    pre 1.24ms  on 1.44ms  post 1.26ms  diff 191µs
align 8388608     pre 1.22ms  on 1.39ms  post 1.23ms  diff 164µs
align 4194304     pre 1.23ms  on 1.43ms  post 1.3ms   diff 171µs
align 2097152     pre 1.26ms  on 1.3ms   post 1.32ms  diff 16.7µs
align 1048576     pre 1.26ms  on 1.27ms  post 1.26ms  diff 7.91µs  here?
align 524288      pre 1.24ms  on 1.3ms   post 1.3ms   diff 29.2µs
align 262144      pre 1.25ms  on 1.3ms   post 1.28ms  diff 28.2µs
align 131072      pre 1.25ms  on 1.29ms  post 1.28ms  diff 24.8µs
align 65536       pre 1.15ms  on 1.24ms  post 1.26ms  diff 34.5µs
align 32768       pre 1.17ms  on 1.3ms   post 1.26ms  diff 82.6µs

The thing is that my "here" is not always in the same place :)

> With the correct guess, compare the performance you get using
>
> $ ERASESIZE=$[2*1024*1024] # replace with guess from flashbench -a
> $ ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=3 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=5 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=7 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=13 --blocksize=4096 --erasesize=${ERASESIZE}

I will skip this for now, because I am not yet sure about the correct guess.

> The first one of those should always be the fastest, hopefully followed by
> some that are equally fast and then some much slower ones (especially for the
> smaller block sizes). The "active_logs=N" mount option should be one less
> than the highest number above that is still "fast", and only "2", "4" and "6"
> are valid at the moment. If you are lucky, your device is still fast with
> "--open-au-nr=7" and slow only for higher numbers, then the default of "6"
> is ok.
>
> If the erase size is larger than 2 MB, then you have to use the "-s" option in
> mkfs.f2fs to configure how many 2 MB segments there are in one erase block.
> For a 2 GB USB stick, I would guess that the erase block size is 1, 2 or
> 4 MB. Newer (larger) sticks will have larger erase blocks that may also
> be a multiple of 3 MB (3, 6, 12, or 24).
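The two rules in the quoted text can be written down as small helpers. This is a minimal Python sketch, not part of the original exchange: the function names are hypothetical, and two behaviours are assumptions rather than anything the quoted text states, namely rounding down to the next valid "active_logs" value and falling back to "2" when even that would be too many. Treating erase blocks of up to 2 MB as needing one segment is likewise an assumption.

```python
SEGMENT_BYTES = 2 * 1024 * 1024   # 2 MB segment size named in the quoted text
VALID_ACTIVE_LOGS = (2, 4, 6)     # values the quoted text names as valid


def pick_active_logs(highest_fast_open_au_nr):
    """Largest valid active_logs not above (highest fast --open-au-nr) - 1.

    Assumption: round down to the next valid value, and fall back to the
    smallest valid value when none fits.
    """
    limit = highest_fast_open_au_nr - 1
    candidates = [n for n in VALID_ACTIVE_LOGS if n <= limit]
    return max(candidates) if candidates else min(VALID_ACTIVE_LOGS)


def segments_per_erase_block(erase_block_bytes):
    """Number of 2 MB segments in one erase block, for mkfs.f2fs -s."""
    if erase_block_bytes <= SEGMENT_BYTES:
        return 1  # assumption: nothing to configure up to 2 MB
    if erase_block_bytes % SEGMENT_BYTES:
        raise ValueError("erase block size is not a multiple of the 2 MB segment size")
    return erase_block_bytes // SEGMENT_BYTES


if __name__ == "__main__":
    mib = 1024 * 1024
    print("active_logs =", pick_active_logs(7))            # still fast with --open-au-nr=7
    print("mkfs.f2fs -s", segments_per_erase_block(4 * mib))
```

With a device that is still fast at "--open-au-nr=7" this gives the default of 6, matching the quoted text. An erase block of 3 MB is rejected here because it does not hold a whole number of 2 MB segments, while 6 MB yields 3.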
Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7