From: Boaz Harrosh <boaz@plexistor.com>
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Anna Schumaker <Anna.Schumaker@netapp.com>,
Al Viro <viro@zeniv.linux.org.uk>
Cc: Ric Wheeler <rwheeler@redhat.com>,
Miklos Szeredi <mszeredi@redhat.com>,
Steven Whitehouse <swhiteho@redhat.com>,
Jeff Moyer <jmoyer@redhat.com>,
Amir Goldstein <amir73il@gmail.com>,
Amit Golander <Amit.Golander@netapp.com>,
Sagi Manole <sagim@netapp.com>
Subject: [RFC PATCH 00/17] zuf: ZUFS Zero-copy User-mode FileSystem
Date: Tue, 19 Feb 2019 13:51:19 +0200 [thread overview]
Message-ID: <20190219115136.29952-1-boaz@plexistor.com> (raw)
From: Boaz Harrosh <boazh@netapp.com>
I would like to present the ZUFS filesystem, whose kernel part is in this
patchset.
The Kernel code presented here can be found at:
https://github.com/NetApp/zufs-zuf
And the User-mode Server + example FSs here:
https://github.com/NetApp/zufs-zus
ZUFS - stands for Zero-copy User-mode FS
* It is geared towards true end-to-end zero copy of both data and metadata.
* It is geared towards very *low latency*, very high CPU locality, and
  lockless parallelism.
* Synchronous operations (for low latency)
* NUMA awareness
Short description:
ZUFS is a from-scratch implementation of a filesystem-in-user-space which
tries to address the above goals. From the get-go it is aimed at pmem-based
FSs, but it can easily support other types of FSs that can utilize the ~10x
latency and parallelism improvements.
The novelty of this project is that the interface is designed with a modern
multi-core NUMA machine in mind, down to the ABI, so as to reach these goals.
Please see the first patch for the license of this project.
Current status: There are a couple of trivial open-source filesystem
implementations and a full-blown proprietary implementation from NetApp.
Together with the kernel module submitted here, the user-mode server, and
the zusFS user-mode plugins, this code passes NetApp QA, including xfstests
and internal QA tests, and was released to customers as Maxdata 1.2.
So it is very stable.
In the git repository above there is also a backport for RHEL 7.6,
including RPM packages for the kernel and server components.
(Evaluation licenses of Maxdata 1.2 are also available for developers.
Please contact Amit Golander <Amit.Golander@netapp.com> if you need one.)
Just to get some points across: as I said, this project is all about
performance and low latency. Below are some results I have run:
[fuse]
threads wr_iops wr_bw wr_lat
1 33606 134424 26.53226
2 57056 228224 30.38476
3 73142 292571 35.75727
4 88667 354668 40.12783
5 102280 409122 42.13261
6 110122 440488 48.29697
7 116561 466245 53.98572
8 129134 516539 55.6134
[fuse-splice]
threads wr_iops wr_bw wr_lat
1 39670 158682 21.8399
2 51100 204400 34.63294
3 62385 249542 39.28847
4 75220 300882 47.42344
5 84522 338088 52.97299
6 93042 372168 57.40804
7 97706 390825 63.04435
8 98034 392137 73.24263
[xfs-dax]
threads wr_iops wr_bw wr_lat
1 19449 77799 48.03282
2 37704 150819 37.2343
3 55415 221663 30.59375
4 72285 289142 26.08636
5 90348 361392 23.89037
6 103696 414787 22.38045
7 120638 482552 21.38869
8 134157 536630 21.1426
[Maxdata-1.2-zufs]
threads wr_iops wr_bw wr_lat
1 57506 230026 14.387113
2 98624 394498 16.790232
3 142276 569106 17.344622
4 187984 751936 17.527123
5 190304 761219 19.504314
6 221407 885628 20.862000
7 211579 846316 23.262040
8 246029 984116 24.630604
[*1]
These good results were obtained with an mm patch applied, which introduces
a VM_LOCAL_CPU flag that eliminates vm_zap_ptes scheduling on all
CPUs when creating a per-CPU VMA.
That patch was not accepted by the Linux kernel community and is not
presented in this patchset. (The patch is available for review on demand.)
But a few weeks from now I will submit some incremental changes to the
code which will return the numbers to the above, and even better for some
benchmarks, without the mm patch.
I used an 8-way KVM-qemu guest with 2 NUMA nodes,
running fio with 4k random writes, O_DIRECT | O_SYNC, to a DRAM-simulated
pmem (memmap=! at grub). The fuse FS was a memcpy same-4k null-FS.
fio was then run with more and more threads (see the threads column)
to test for scalability.
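For concreteness, a fio job file along these lines reproduces the workload
described above. (This is my reconstruction, not the job file actually used:
the mount point, runtime, and ioengine are assumptions; sweep numjobs from
1 to 8 via the THREADS environment variable.)

```ini
; 4k random writes, O_DIRECT | O_SYNC, against a pmem-backed mount
[global]
bs=4k
rw=randwrite
direct=1        ; O_DIRECT
sync=1          ; O_SYNC
ioengine=psync  ; assumption: synchronous pwrite()
time_based
runtime=60
directory=/mnt/zufs   ; assumption: FS under test mounted here

[writers]
numjobs=${THREADS}
group_reporting
```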
We are still more than 2x slower than I would like
(compared to an in-kernel pmem-based FS).
But I believe I can shave off another 1-2 us by further optimizing
the app-to-server thread switch, by developing a new scheduler object
so as to avoid going through the scheduler altogether (and its locks)
when switching VMs.
(Currently this uses a couple of wait_queue_head_t's with wait_event()
calls; see relay.h in the patches.)
Please review and ask any question, big or trivial. I would love to
iron out this code and submit it upstream.
Thank you for reading,
Boaz
~~~~~~~~~~~~~~~~~~
Boaz Harrosh (17):
fs: Add the ZUF filesystem to the build + License
zuf: Preliminary Documentation
zuf: zuf-rootfs
zuf: zuf-core The ZTs
zuf: Multy Devices
zuf: mounting
zuf: Namei and directory operations
zuf: readdir operation
zuf: symlink
zuf: More file operation
zuf: Write/Read implementation
zuf: mmap & sync
zuf: ioctl implementation
zuf: xattr implementation
zuf: ACL support
zuf: Special IOCTL fadvise (TODO)
zuf: Support for dynamic-debug of zusFSs
Documentation/filesystems/zufs.txt | 351 ++++++++
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/zuf/Kconfig | 23 +
fs/zuf/Makefile | 23 +
fs/zuf/_extern.h | 166 ++++
fs/zuf/_pr.h | 62 ++
fs/zuf/acl.c | 281 +++++++
fs/zuf/directory.c | 167 ++++
fs/zuf/file.c | 527 ++++++++++++
fs/zuf/inode.c | 648 ++++++++++++++
fs/zuf/ioctl.c | 306 +++++++
fs/zuf/md.c | 761 +++++++++++++++++
fs/zuf/md.h | 318 +++++++
fs/zuf/md_def.h | 145 ++++
fs/zuf/mmap.c | 336 ++++++++
fs/zuf/module.c | 28 +
fs/zuf/namei.c | 435 ++++++++++
fs/zuf/relay.h | 88 ++
fs/zuf/rw.c | 705 ++++++++++++++++
fs/zuf/super.c | 771 +++++++++++++++++
fs/zuf/symlink.c | 74 ++
fs/zuf/t1.c | 138 +++
fs/zuf/t2.c | 375 +++++++++
fs/zuf/t2.h | 68 ++
fs/zuf/xattr.c | 310 +++++++
fs/zuf/zuf-core.c | 1257 ++++++++++++++++++++++++++++
fs/zuf/zuf-root.c | 431 ++++++++++
fs/zuf/zuf.h | 414 +++++++++
fs/zuf/zus_api.h | 869 +++++++++++++++++++
30 files changed, 10079 insertions(+)
create mode 100644 Documentation/filesystems/zufs.txt
create mode 100644 fs/zuf/Kconfig
create mode 100644 fs/zuf/Makefile
create mode 100644 fs/zuf/_extern.h
create mode 100644 fs/zuf/_pr.h
create mode 100644 fs/zuf/acl.c
create mode 100644 fs/zuf/directory.c
create mode 100644 fs/zuf/file.c
create mode 100644 fs/zuf/inode.c
create mode 100644 fs/zuf/ioctl.c
create mode 100644 fs/zuf/md.c
create mode 100644 fs/zuf/md.h
create mode 100644 fs/zuf/md_def.h
create mode 100644 fs/zuf/mmap.c
create mode 100644 fs/zuf/module.c
create mode 100644 fs/zuf/namei.c
create mode 100644 fs/zuf/relay.h
create mode 100644 fs/zuf/rw.c
create mode 100644 fs/zuf/super.c
create mode 100644 fs/zuf/symlink.c
create mode 100644 fs/zuf/t1.c
create mode 100644 fs/zuf/t2.c
create mode 100644 fs/zuf/t2.h
create mode 100644 fs/zuf/xattr.c
create mode 100644 fs/zuf/zuf-core.c
create mode 100644 fs/zuf/zuf-root.c
create mode 100644 fs/zuf/zuf.h
create mode 100644 fs/zuf/zus_api.h
--
2.20.1