Linux block layer
* [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK
@ 2023-02-06 10:00 Hans Holmberg
  2023-02-06 12:49 ` Ming Lei
  2023-02-06 18:58 ` Bart Van Assche
From: Hans Holmberg @ 2023-02-06 10:00 UTC
  To: linux-block@vger.kernel.org
  Cc: ming.lei@redhat.com, Matias Bjørling, Damien Le Moal,
	Dennis Maisenbacher, Ajay Joshi, Jørgen Hansen,
	andreas@metaspace.dk, javier@javigon.com, slava@dubeyko.com,
	kbusch@kernel.org, hans@owltronix.com, mcgrof@kernel.org,
	guokuankuan@bytedance.com, viacheslav.dubeyko@bytedance.com,
	hch@lst.de

I think we're missing a flexible way of routing random-ish
write workloads onto zoned storage devices. Implementing a UBLK
target for this would be a great way to provide zoned storage
benefits to a range of use cases. Creating a UBLK target would
enable us to experiment and move fast, and when we arrive
at a common, reasonably stable solution we could move this into
the kernel.

We do have dm-zoned [3] in the kernel, but it requires a bounce
on conventional zones for non-sequential writes, resulting in a write
amplification of 2x (which is not optimal for flash).
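
To make the 2x figure concrete, here is a back-of-the-envelope sketch. It
assumes a simple single-bounce model: every non-sequential write lands once in
a conventional zone and is rewritten once into a sequential zone, and metadata
and reclaim traffic are ignored. The function name and its parameters are
illustrative only and are not taken from dm-zoned.

```python
def write_amplification(host_bytes: int, bounced_fraction: float) -> float:
    """Device bytes written per host byte under a single-bounce model.

    Assumption (not dm-zoned's actual accounting): each bounced byte is
    written twice, once to a conventional zone and once more when it is
    moved to a sequential zone. All other bytes are written once.
    """
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    if not 0.0 <= bounced_fraction <= 1.0:
        raise ValueError("bounced_fraction must be within [0, 1]")
    bounced = host_bytes * bounced_fraction
    device_bytes = (host_bytes - bounced) + 2 * bounced
    return device_bytes / host_bytes


if __name__ == "__main__":
    gib = 1 << 30
    for fraction in (0.0, 0.5, 1.0):
        wa = write_amplification(gib, fraction)
        print(f"bounced fraction {fraction:.1f}: write amplification {wa:.1f}x")
```

Under this model a workload where every write is bounced gives 2x, and a
purely sequential workload gives 1x.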

Fully random workloads make little sense to store on ZBDs as a
host FTL could not be expected to do better than what conventional block
devices do today. Fully sequential writes are also well taken care of
by conventional block devices.

The interesting stuff is what lies in between those extremes.

I would like to discuss how we could use UBLK to implement a
common FTL with the right knobs to cater for a wide range of workloads
that utilize raw block devices. We had some knobs in the now-dead pblk,
an FTL for open-channel devices, but I think we could do way better than that.
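
As a reference point for the discussion, the core idea of such a host FTL can
be sketched as a log-structured mapping: random logical writes are appended at
the write pointer of an open zone and a logical-to-physical map records where
each block ended up. This is a toy model under heavy assumptions (one open
zone, no garbage collection, no persistence, no ublk integration); the class
and method names are hypothetical.

```python
class ToyZonedFTL:
    """Toy host FTL: appends every write at the open zone's write pointer."""

    def __init__(self, num_zones: int, zone_blocks: int) -> None:
        self.zone_blocks = zone_blocks
        self.write_pointers = [0] * num_zones  # next free block in each zone
        self.valid = [0] * num_zones           # valid block count per zone
        self.l2p = {}                          # logical block -> (zone, offset)
        self.open_zone = 0

    def _next_empty_zone(self) -> int:
        for zone, wp in enumerate(self.write_pointers):
            if wp == 0:
                return zone
        # A real FTL would reclaim a zone here; this sketch does not.
        raise RuntimeError("no empty zone; garbage collection is not modelled")

    def write(self, lba: int) -> tuple:
        """Place one logical block sequentially and update the mapping."""
        if self.write_pointers[self.open_zone] == self.zone_blocks:
            self.open_zone = self._next_empty_zone()
        old = self.l2p.get(lba)
        if old is not None:
            self.valid[old[0]] -= 1  # the previous copy becomes stale
        zone = self.open_zone
        offset = self.write_pointers[zone]
        self.write_pointers[zone] += 1
        self.valid[zone] += 1
        self.l2p[lba] = (zone, offset)
        return zone, offset


if __name__ == "__main__":
    ftl = ToyZonedFTL(num_zones=2, zone_blocks=2)
    for lba in (7, 3, 7):
        print(lba, "->", ftl.write(lba))
```

The per-zone valid counts are what a garbage collection policy would consult
when picking a victim zone, which is where knobs such as over-provisioning
would come in.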

Pblk did not require bouncing writes and had knobs for over-provisioning and
workload isolation, both of which could be implemented here. We could also add
options for different garbage collection policies. In userspace it would also
be easy to support default block indirection sizes, reducing logical-physical
translation table memory overhead.
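
To illustrate how the indirection size affects translation table memory, here
is a rough estimate. It assumes a flat table with one fixed-size entry per
indirection unit; the 4-byte entry size and the 1 TiB capacity are
illustrative assumptions, not figures from pblk or any existing FTL.

```python
def l2p_table_bytes(capacity_bytes: int, indirection_bytes: int,
                    entry_bytes: int = 4) -> int:
    """Memory needed for a flat logical-to-physical table.

    Assumption: one entry of entry_bytes per indirection unit, no
    compression and no caching of partial tables.
    """
    if indirection_bytes <= 0 or capacity_bytes % indirection_bytes:
        raise ValueError("capacity must be a multiple of the indirection size")
    return (capacity_bytes // indirection_bytes) * entry_bytes


if __name__ == "__main__":
    KIB, MIB, TIB = 1 << 10, 1 << 20, 1 << 40
    for unit in (4 * KIB, 16 * KIB, 64 * KIB):
        table = l2p_table_bytes(1 * TIB, unit)
        print(f"{unit // KIB:>2} KiB indirection: {table // MIB} MiB table")
```

With these assumptions, going from 4 KiB to 64 KiB indirection shrinks the
table for a 1 TiB device from 1 GiB to 64 MiB, at the cost of read-modify-write
for writes smaller than the indirection unit.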

Use cases for such an FTL include SSD caching stores such as Apache
Traffic Server [1] and CacheLib [2]. CacheLib's block cache and the Apache
Traffic Server storage workloads are *almost* zoned block device compatible
and would need little translation overhead to perform very well on e.g.
ZNS SSDs.

There are probably more use cases that would benefit.

It would also be a great research vehicle for academia. We've used dm-zap [4]
for this purpose over the last couple of years, but it is not production-ready
and is cumbersome to improve and maintain as it is implemented as an
out-of-tree device mapper.

ublk adds a bit of latency overhead, but I think this is acceptable at least
until we have a great, proven solution, which could be turned into
an in-kernel FTL.

If there is interest in the community for a project like this, let's talk!

cc:ing the folks who participated in the discussions at ALPSS 2021 and last
year's Plumbers on this subject.

Thanks,
Hans

[1] https://trafficserver.apache.org/
[2] https://cachelib.org/
[3] https://docs.kernel.org/admin-guide/device-mapper/dm-zoned.html
[4] https://github.com/westerndigitalcorporation/dm-zap



Thread overview: 12+ messages
2023-02-06 10:00 [LSF/MM/BPF BoF]: A host FTL for zoned block devices using UBLK Hans Holmberg
2023-02-06 12:49 ` Ming Lei
2023-02-06 12:54   ` Ming Lei
2023-02-06 14:34   ` Matias Bjørling
2023-02-06 15:32     ` Ming Lei
2023-02-06 18:31     ` Bart Van Assche
2023-02-07  9:40       ` Matias Bjørling
2023-02-07  9:32     ` Hans Holmberg
2023-02-07 10:31   ` Nitesh Shetty
2023-02-07 12:49     ` Ming Lei
2023-02-06 18:58 ` Bart Van Assche
2023-02-07 12:11   ` Hans Holmberg
