From: Arnd Bergmann
Subject: Re: [PATCH 00/16 v3] f2fs: introduce flash-friendly file system
Date: Fri, 16 Nov 2012 21:26:13 +0000
Message-ID: <201211162126.13940.arnd@arndb.de>
References: <003d01cdb74b$0c3fa420$24beec60$%kim@samsung.com> <201211121657.03054.arnd@arndb.de> <201211141657.39475.Martin@lichtvoll.de>
In-Reply-To: <201211141657.39475.Martin@lichtvoll.de>
To: Martin Steigerwald
Cc: linux-kernel@vger.kernel.org, Kim Jaegeuk, Jaegeuk Kim, linux-fsdevel@vger.kernel.org, gregkh@linuxfoundation.org, viro@zeniv.linux.org.uk, tytso@mit.edu, chur.lee@samsung.com, cm224.lee@samsung.com, jooyoung.hwang@samsung.com

On Wednesday 14 November 2012, Martin Steigerwald wrote:
> On Monday, 12 November 2012, Arnd Bergmann wrote:
> > On Monday 12 November 2012, Martin Steigerwald wrote:
> > > On Saturday, 10 November 2012, Arnd Bergmann wrote:
> > > Even when I apply the explanation of the README I do not seem to get a
> > > clear picture of the stick erase block size.
> > >
> > > The values above seem to indicate to me: I don't care about alignment at all.
> >
> > I think it's more a case of a device where reading does not easily reveal
> > the erase block boundaries, because the variance between multiple reads
> > is much higher than between different positions. You can try again using
> > "--blocksize=1024 --count=100", which will increase the accuracy of the
> > test.
> >
> > On the other hand, the device size of "4095999 512-byte logical blocks"
> > is quite suspicious, because it's not an even number, where it should
> > be a multiple of erase blocks.
> > It is one less sector than 1000 2MB blocks
> > (or 500 4MB blocks, for that matter), but it's not clear if that one
> > block is missing at the start or at the end of the drive.
>
> Just for this first flash drive, I think the erase block size is 4 MiB. The
> -a count=100/100 tests did not show any obvious results, but the
> --open-au ones did, I think. I would use two open allocation units (AUs).
>
> Maybe also 1 AU, cause 64 KiB sized accesses are faster that way?
>
> Well I tend to use one AU. So that device would be more suitable for FAT
> than for BTRFS. Or more suitable for F2FS that is.
>
> What do you think?
>
> Only thing that seems to contradict this is the test with different
> alignments below.
>
>
> merkaba:~#254> /tmp/flashbench -a /dev/sdb --count=100

You should really pass "--blocksize=1024" here, which makes the results much more
accurate. Still, there are some devices where the -a test doesn't give anything
useful at all.

> align 536870912  pre 1.06ms  on 1.07ms  post 1.04ms  diff 14.6µs
> align 268435456  pre 1.09ms  on 1.1ms   post 1.09ms  diff 11.3µs
> align 134217728  pre 1.09ms  on 1.09ms  post 1.1ms   diff -87ns
> align 67108864   pre 1.05ms  on 1.06ms  post 1.03ms  diff 15.9µs
> align 33554432   pre 1.06ms  on 1.06ms  post 1.03ms  diff 18.7µs
> align 16777216   pre 1.05ms  on 1.05ms  post 1.03ms  diff 13.3µs
> align 8388608    pre 1.05ms  on 1.06ms  post 1.04ms  diff 9.03µs
> align 4194304    pre 1.06ms  on 1.06ms  post 1.04ms  diff 8.56µs
> align 2097152    pre 1.06ms  on 1.05ms  post 1.05ms  diff 2.02µs
> align 1048576    pre 1.05ms  on 1.04ms  post 1.06ms  diff -11524n
> align 524288     pre 1.05ms  on 1.05ms  post 1.04ms  diff 642ns
> align 262144     pre 1.04ms  on 1.04ms  post 1.04ms  diff -604ns
> align 131072     pre 1.03ms  on 1.04ms  post 1.04ms  diff 2.79µs
> align 65536      pre 1.04ms  on 1.05ms  post 1.05ms  diff 7.2µs
> align 32768      pre 1.05ms  on 1.05ms  post 1.05ms  diff -4475ns

This
looks like a 4 MB size.

> merkaba:~> /tmp/flashbench -a /dev/sdb --count=1000
> align 536870912  pre 1.03ms  on 1.05ms  post 1.02ms  diff 20.3µs
> align 268435456  pre 1.06ms  on 1.05ms  post 1.04ms  diff 3.14µs
> align 134217728  pre 1.07ms  on 1.08ms  post 1.05ms  diff 16.1µs
> align 67108864   pre 1.03ms  on 1.03ms  post 1.02ms  diff 11µs
> align 33554432   pre 1.02ms  on 1.03ms  post 1.01ms  diff 10.3µs
> align 16777216   pre 1.03ms  on 1.04ms  post 1.02ms  diff 9.68µs
> align 8388608    pre 1.04ms  on 1.03ms  post 1.02ms  diff 6.45µs
> align 4194304    pre 1.03ms  on 1.04ms  post 1.02ms  diff 9.12µs
> align 2097152    pre 1.04ms  on 1.04ms  post 1.02ms  diff 15.4µs
> align 1048576    pre 1.03ms  on 1.03ms  post 1.03ms  diff -1590ns
> align 524288     pre 1.03ms  on 1.03ms  post 1.03ms  diff -835ns
> align 262144     pre 1.04ms  on 1.04ms  post 1.03ms  diff 1.25µs
> align 131072     pre 1.03ms  on 1.03ms  post 1.03ms  diff -3477ns
> align 65536      pre 1.03ms  on 1.03ms  post 1.03ms  diff 191ns
> align 32768      pre 1.03ms  on 1.04ms  post 1.03ms  diff 4.06µs

And this doesn't. I would guess 2 MB from the above.

> merkaba:~> /tmp/flashbench /dev/sdb --open-au --open-au-nr=2 --blocksize=4096 --erasesize=$[16*1024*1024]
> 16MiB   5.68M/s
> 8MiB    4.3M/s
> 4MiB    14.2M/s
> 2MiB    13.1M/s
> 1MiB    5.6M/s
> 512KiB  3.35M/s
> 256KiB  6.61M/s
> 128KiB  4.19M/s
> 64KiB   5.07M/s
> 32KiB   2.16M/s
> 16KiB   1.82M/s
> 8KiB    1.24M/s
> 4KiB    726K/s
> merkaba:~> /tmp/flashbench /dev/sdb --open-au --open-au-nr=3 --blocksize=4096 --erasesize=$[16*1024*1024]
> 16MiB   7.18M/s
> 8MiB    14.6M/s
> 4MiB    14.1M/s
> 2MiB    13M/s
> 1MiB    6.39M/s
> 512KiB  8.77M/s
> 256KiB  6.13M/s
> 128KiB  3.81M/s
> 64KiB   2.37M/s
> 32KiB   1.15M/s
> 16KiB   648K/s
> 8KiB    344K/s
> 4KiB    180K/s

This shows clearly how the device cannot handle more than 2 erase blocks, as you
correctly pointed out.
I'm guessing that it does have a FAT optimized area in the
front, so it should work fine if you mount f2fs with just two active logs.

> But then I tried with offset and get:
>
> > > > With the correct guess, compare the performance you get using
> > > >
> > > > $ ERASESIZE=$[2*1024*1024] # replace with guess from flashbench -a
> > > > $ ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=${ERASESIZE}
> > > > $ ./flashbench /dev/sdb --open-au --open-au-nr=3 --blocksize=4096 --erasesize=${ERASESIZE}
> > > > $ ./flashbench /dev/sdb --open-au --open-au-nr=5 --blocksize=4096 --erasesize=${ERASESIZE}
> > > > $ ./flashbench /dev/sdb --open-au --open-au-nr=7 --blocksize=4096 --erasesize=${ERASESIZE}
> > > > $ ./flashbench /dev/sdb --open-au --open-au-nr=13 --blocksize=4096 --erasesize=${ERASESIZE}
> > >
> > > I omit this for now, cause I am not yet sure about the correct guess.
> >
> > You can also try this test to find out the erase block size if the -a test fails.
> > Start with the largest possible value you'd expect (16 MB for a modern and fast
> > USB stick, less if it's older or smaller), and use --open-au-nr=1 to get a baseline:
> >
> > ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=$[16*1024*1024]
> >
> > Every device should be able to handle this nicely with maximum throughput. The default is
> > to start the test at 16 MB into the device to get out of the way of a potential FAT
> > optimized area. You can change that offset to find where an erase block boundary is.
> > Adding '--offset=$[24*1024*1024]' will still be fast if the erase block size is 8 MB,
> > but get slower and have more jitter if the size is actually 16 MB, because now we write
> > a 16 MB section of the drive with an 8 MB misalignment.
> > The next ones to try after that
> > would be 20, 18, 17, 16.5, etc. MB, which will be slow for an 8, 4, 2, and 1 MB erase
> > block size, respectively. You can also reduce the --erasesize argument there and do
> >
> > ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[16*1024*1024] --offset=$[24*1024*1024]
> > ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[8*1024*1024] --offset=$[20*1024*1024]
> > ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[4*1024*1024] --offset=$[18*1024*1024]
> > ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[2*1024*1024] --offset=$[17*1024*1024]
> > ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536 --erasesize=$[1*1024*1024] --offset=$[33*512*1024]
> >
> > If you have the result from the other test to figure out the maximum value for
> > '--open-au-nr=N', using that number here will make this test more reliable as well.
>
>
>
> merkaba:~> /tmp/flashbench /dev/sdb --open-au --open-au-nr=1 --offset $[8*1024*1024] --erasesize=$[16*1024*1024]
> 16MiB   15.1M/s
> 8MiB    3.45M/s
> 4MiB    14M/s
> 2MiB    13.1M/s
> 1MiB    15.2M/s
> 512KiB  3.31M/s
> 256KiB  6.55M/s
> 128KiB  4.18M/s
> 64KiB   13.4M/s
> 32KiB   2.14M/s
> 16KiB   1.81M/s
> merkaba:~> /tmp/flashbench /dev/sdb --open-au --open-au-nr=1 --offset $[1*1024*1024] --erasesize=$[4*1024*1024]
> 4MiB    14.1M/s
> 2MiB    13M/s
> 1MiB    14.9M/s
> 512KiB  3.25M/s
> 256KiB  6.56M/s
> 128KiB  4.16M/s
> 64KiB   13.4M/s
> 32KiB   2.13M/s
> 16KiB   1.81M/s

As I mentioned, the beginning of the drive is likely different from the rest, and deals differently
with random I/O to optimize for the FAT file system. That's why I suggested using a 17MB offset
rather than 1MB.
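
The offsets in the quoted list all follow one rule: the 16 MB default start plus half of the erase size under test, so that the write is misaligned by half a block if the guess is right. A minimal sketch of that rule, assuming a bash-like shell; probe_offset is a made-up helper name, not part of flashbench, and the loop only prints the command lines instead of touching the device:

```shell
#!/bin/bash
# Sketch only: print (not run) one flashbench probe per candidate erase size.
# probe_offset is a hypothetical helper, not part of flashbench.
MB=$((1024 * 1024))
BASE=$((16 * MB))   # default start offset, past a possible FAT optimized area

# Offset that misaligns a write of $1 bytes by half its size.
probe_offset() {
    echo $((BASE + $1 / 2))
}

for size_mb in 16 8 4 2 1; do
    erase=$((size_mb * MB))
    echo "./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=65536" \
         "--erasesize=$erase --offset=$(probe_offset "$erase")"
done
```

For a 2 MB guess this gives 17 MB, and for 1 MB it gives 16.5 MB, the same value as the 33*512*1024 in the last quoted command.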
> merkaba:~> /tmp/flashbench /dev/sdb --open-au --open-au-nr=1 --offset $[2*1024*1024] --erasesize=$[4*1024*1024]
> 4MiB    14M/s
> 2MiB    13M/s
> 1MiB    15.1M/s
> 512KiB  3.25M/s
> 256KiB  6.58M/s
> 128KiB  4.18M/s
> 64KiB   13.5M/s
> 32KiB   2.13M/s
> 16KiB   1.82M/s
>
>
> So this does seem to me that the device quite likes 4 MiB sized, but doesn't
> care too much about their alignment?

I think we can assume that any I/O over 1MB is fast within the first few MB
of the device based on these results.

> merkaba:~> /tmp/flashbench /dev/sdb --open-au --open-au-nr=1 --offset $[78*1024] --erasesize=$[4*1024*1024]
> 4MiB    14.2M/s
> 2MiB    13.3M/s
> 1MiB    15.1M/s
> 512KiB  3.42M/s
> 256KiB  6.6M/s
> 128KiB  4.22M/s
> 64KiB   13.5M/s
> 32KiB   2.17M/s
> 16KiB   1.84M/s
>
> It seems that's a kinda special USB stick.

Having fast 64KB I/O is also quite common, but I agree that it's not obviously
showing the patterns that we expect for f2fs. Especially the 2 erase block limit
is problematic. If this is still the 2GB stick, it may be more helpful to play
with a different one that is larger. Many manufacturers have changed their
underlying technology between 2GB and 4GB (even more so in SD cards), and the
newer devices are more interesting because any small ones will soon be
gone from the market.

	Arnd
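
For the odd device size of 4095999 512-byte sectors discussed near the top of this mail, the arithmetic can be checked with a few lines of shell. This is only a sketch under the assumption of 512-byte sectors; sectors_short is a made-up helper name for illustration:

```shell
#!/bin/bash
# Sketch only: sectors_short is a hypothetical helper.
# Prints how many sectors a device is short of the next erase block boundary.
# $1 = device size in sectors, $2 = erase block size in bytes,
# $3 = sector size in bytes (assumed 512 if omitted)
sectors_short() {
    sector_size=${3:-512}
    bytes=$(($1 * sector_size))
    echo $(( (($2 - bytes % $2) % $2) / sector_size ))
}

for mb in 2 4; do
    echo "${mb}MB erase blocks: $(sectors_short 4095999 $((mb * 1024 * 1024))) sector(s) short"
done
```

Both candidate sizes come out one sector short of a boundary, so the device size alone cannot tell 2 MB from 4 MB erase blocks.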