From: Marc Lehmann <schmorp@schmorp.de>
To: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Subject: Re: SMR drive test 2; 128GB partition; no obvious corruption, much more sane behaviour, weird overprovisioning
Date: Sat, 26 Sep 2015 15:53:53 +0200
Message-ID: <20150926135353.GA9860@schmorp.de>
In-Reply-To: <20150926073655.GB13619@jaegeuk-mac02.hsd1.ca.comcast.net>
On Sat, Sep 26, 2015 at 12:36:55AM -0700, Jaegeuk Kim <jaegeuk@kernel.org> wrote:
> > Care to share why? :)
>
> Mostly, in the flash storages, it is multiple 2MB normally. :)
Well, any value of -s gives me a multiple of 2MB, no? :)
> > Is there anything specially good for numbers of two? Or do you just want to
> > reduce the number of changed variables?
>
> IMO, likewise flash storages, it needs to investigate the raw device
> characteristics.
Keep in mind that I don't use it for flash, but smr drives.
We already know the raw device characteristics, basically, the zones are
between 15 and 40 or so MB in size (on the seagate 8tb drive), and they
likely don't have "even" sizes at all.
It's also far from easy to benchmark these things: the disks can
buffer up to 25GB of random writes (and then might need several hours of
cleanup). Failing a linear write incurs a 0.6-1.6s penalty, to be paid
much later. It's a shame that none of the drive companies actually release
any usable info on their drives.
These guys made a hole in the disk and devised a lot of benchmarks to
find out the characteristics of these drives:
https://www.usenix.org/system/files/conference/fast15/fast15-paper-aghayev.pdf
So, the strategy for a fs would be to write linearly, most of the time,
without any gaps. f2fs (at least in 3.18.x) manages to do that very
nicely, which is why I really try to get it working.
But for writing once, any value of -s would probably suffice. There are
two problems when the disk gets full:
a) ipu (in-place update) writes: the drive can't do these, so gc might be
cheaper.
b) reuse of sections: when a section gets freed and reused, it should be
large enough to guarantee large linear writes again.
b) is the reason behind me trying large values of -s.
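
To put rough numbers on b), here is a minimal sketch of the arithmetic. It
assumes the usual 2MB f2fs segment, that -s is the number of segments per
section, and (unlike the real drive) a uniform zone size picked from the
15-40MB range above. The function names are made up for illustration.

```python
SEGMENT_MB = 2  # f2fs segment size in MB (assumed fixed at 2MB)


def section_size_mb(segs_per_sec: int) -> int:
    """Section size in MB for a given mkfs.f2fs -s value."""
    return segs_per_sec * SEGMENT_MB


def worst_case_whole_zones(section_mb: int, zone_mb: int) -> int:
    """Complete zones a section is sure to cover, whatever its alignment.

    Simplification: all zones have the same size zone_mb. A section that
    starts just past a zone boundary wastes almost one zone at its start,
    hence the "- 1".
    """
    return max(0, section_mb // zone_mb - 1)


if __name__ == "__main__":
    for s in (1, 8, 64, 256):
        size = section_size_mb(s)
        print(
            f"-s {s}: {size} MB section, "
            f"covers at least {worst_case_whole_zones(size, 40)} whole 40MB zones, "
            f"at least {worst_case_whole_zones(size, 15)} whole 15MB zones"
        )
```

With small -s a freed section may not cover even one complete zone, so the
next write into it cannot be a full-zone linear write. That is only the
simplified model, of course, the real zones are not uniform.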
Since I know that f2fs is the only fs that I tested that can have a sustained
write performance on these drives that is near the physical drive
characteristics, all that needs to be done is to see how f2fs performs after
it starts gc'ing.
That's why I am so interested in disk full conditions - writing the disk
linearly once is easy, I can just write a tar to the device. Ensuring that
writes are large linear after deleting and cleaning up is harder.
nilfs is a good example - it should fit smr drives perfectly, until they
are nearly full, after which nilfs still matches smr drives perfectly,
but waiting for 8TB to be shuffled around to delete some files can take
days. More surprising is that nilfs phenomenally fails with these drives,
performance-wise, for reasons I haven't investigated (my guess is that
nilfs leaves gaps).
> I think this can be used for SMR too.
You can run any blockdevice operation on these drives, but the results
from flashbench will be close to meaningless for them. For example, you
can't distinguish between a nonaligned write causing a read-modify write
from an aligned large write, or a partial write, by access time, as they
will probably all have similar access times.
> I think there might be some hints for section size at first and performance
> variation as well.
I think you confuse these drives with flash drives - while they share some
characteristics, they are completely unlike flash. There is no translation
layer, there is no need for wear leveling, zones have widely varying
sizes, appending can be expensive or cheap, depending on the write size.
What these drives need is primarily large linear writes without gaps, and
secondarily any optimisations for rotational media apply. (And for that, f2fs
performs unexpectedly well, given it wasn't meant for rotational media).
Now, if f2fs can be made to (mostly) work bug-free, but with the
characteristics of 3.18.21, and the gc can ensure that reasonably big
areas spanning multiple zones will be reused, then f2fs will be the _ONLY_ fs
able to take care of drive managed smr disks efficiently.
Specifically, these filesystems do NOT work well with these drives:
nilfs, zfs, btrfs, ext4, xfs
And modifications for these filesystems are either far away in the
future, or not targeted at drive managed disks (ext4 already has some
modifications, but they are clearly not very suitable for actual drives,
since they assume these drives have a fast area near the start of the disk,
which isn't the case). But these disks are not uncommon (seagate is shipping
them by the millions), and will stay with us for quite a while.
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\