* [LSF/MM TOPIC] bcachefs - status update, upstreaming (!?)
@ 2018-02-07 10:26 Kent Overstreet
From: Kent Overstreet @ 2018-02-07 10:26 UTC
To: lsf-pc, linux-fsdevel
Hi, I'd like to talk a bit about what I've sunk the past few years of my life
into :)
For those who haven't heard, bcachefs started out as an extended version of
bcache, and eventually grew into a full posix filesystem. It's a long weird
story.
Today, it's a real filesystem with a small community of users and
testers, and the main focus has been on making it production quality and rock
solid - it's not a research project or a toy, it's meant to be used.
What's done:
- pretty much all the normal posix fs functionality - xattrs, acls, fallocate,
quotas.
- fsck
- full data checksumming
- compression
- encryption
- multiple devices (raid1 is done minus exposing a way to re-replicate
degraded data after device failure)
- caching (right now only writeback caching is exposed; a new more flexible
interface is being worked on for caching and other allocation policy stuff)
What's _not_ done:
- persistent allocation information; we still have to walk all our metadata on
every mount to see what disk space is in use (and for a few other relatively
minor reasons).
This is less of an issue than you'd think: bcachefs walks metadata _really_
fast, fast enough that nobody's complaining (even on multi-terabyte
filesystems; erasure coding is the most requested feature, "faster mounts"
never comes up). But of the remaining features to implement/things to deal
with, this is going to be one of the most complex.
One of the upsides though - because I've had to make walking metadata as fast
as possible, bcachefs fsck is also really, really fast (it's run by default
on every mount).
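To make the "no persistent allocation information" point concrete, here's a minimal sketch of the idea (hypothetical code, not bcachefs - the names rebuild_allocation and extents are made up): with nothing on disk recording which blocks are in use, the in-memory allocation state has to be rebuilt at mount time by walking every extent in the metadata.

```python
# Illustrative sketch only -- not bcachefs code. With no persistent
# allocation info on disk, "which blocks are in use" is reconstructed
# at mount time by walking all extent metadata.

def rebuild_allocation(extents, nr_blocks):
    """Walk every (start, length) extent and mark the blocks it covers."""
    used = [False] * nr_blocks
    for start, length in extents:
        for block in range(start, start + length):
            used[block] = True
    return used

# Pretend these (start, length) extents came from walking the btree:
extents = [(0, 2), (5, 3)]
used = rebuild_allocation(extents, 10)
free = [b for b, u in enumerate(used) if not u]
print("free blocks:", free)  # blocks 2-4 and 8-9 are unallocated
```

The cost of this approach is proportional to the total amount of metadata, which is why metadata walking speed matters so much here.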
Planned features:
- erasure coding (i.e. raid5/6)
- snapshots
I also want to come up with a plan for eventually upstreaming this damned thing :)
One of the reasons I haven't even talked about upstreaming before is that I
_really_ haven't wanted to freeze the on disk format before I was ready. This is
still a concern w.r.t. persistent allocation information and snapshots, but
overall there have been fewer and fewer reasons for on disk format changes;
things seem to be naturally stabilizing.
And I know there will be plenty of other people at LSF with recent
experience upstreaming new filesystems; right now I don't have any strong
ideas of my own and would welcome any input :)
Not sure what else I should talk about; I've been quiet for _way_ too long. I'd
welcome any questions or suggestions.
One other cool thing I've been doing lately is I finally rigged up some pure
btree performance/torture tests: I am _exceedingly_ proud of bcachefs's btree
(bcache's btree code is at best a prototype or a toy compared to bcachefs's).
The numbers are, I think, well worth showing off; I'd be curious if anyone knows
how other competing btree implementations (xfs's?) do in comparison:
These benchmarks are with 64 bit keys and 64 bit values: sequentially create,
iterate over, and delete 100M keys:
seq_insert: 100M with 1 threads in 104 sec, 998 nsec per iter, 978k per sec
seq_lookup: 100M with 1 threads in 1 sec, 10 nsec per iter, 90.8M per sec
seq_delete: 100M with 1 threads in 41 sec, 392 nsec per iter, 2.4M per sec
create 100M keys at random (64 bit random ints for the keys)
rand_insert: 100M with 1 threads in 227 sec, 2166 nsec per iter, 450k per sec
rand_insert: 100M with 6 threads in 106 sec, 6086 nsec per iter, 962k per sec
random lookups, over the 100M random keys we just created:
rand_lookup: 10M with 1 threads in 10 sec, 995 nsec per iter, 981k per sec
rand_lookup: 10M with 6 threads in 2 sec, 1223 nsec per iter, 4.6M per sec
mixed lookup/update: 75% lookup, 25% update:
rand_mixed: 10M with 1 threads in 16 sec, 1615 nsec per iter, 604k per sec
rand_mixed: 10M with 6 threads in 8 sec, 4614 nsec per iter, 1.2M per sec
This is on my ancient i7 Gulftown, using a Micron P320h (it's not a pure
in-memory test, we're actually writing out those random inserts!). Numbers are
slightly better on my Haswell :)
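For anyone curious how the two derived figures in those result lines relate, here's a minimal sketch of the accounting such a harness might do (hypothetical - this is not the actual bcachefs test code, and the dict insert is just a toy stand-in for a btree insert): time N operations, then report nanoseconds per iteration and operations per second.

```python
# Hypothetical sketch of benchmark accounting -- not the bcachefs harness.
import time

def bench(name, nr_iters, op):
    """Time nr_iters calls of op() and print ns/iter and ops/sec."""
    start = time.monotonic_ns()
    for i in range(nr_iters):
        op(i)
    elapsed_ns = time.monotonic_ns() - start
    ns_per_iter = elapsed_ns // nr_iters
    ops_per_sec = nr_iters * 1_000_000_000 // max(elapsed_ns, 1)
    print(f"{name}: {nr_iters} iters, {ns_per_iter} ns per iter, "
          f"{ops_per_sec} per sec")
    return ns_per_iter, ops_per_sec

# Toy workload standing in for btree inserts: build a map of 64-bit keys.
table = {}
bench("seq_insert", 100_000, lambda i: table.__setitem__(i, i))
```

The printed seconds are rounded, so the ns-per-iter and per-sec figures in the real output above are the more precise numbers.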