From: Matias Bjorling <m@bjorling.me>
To: Christoph Hellwig <hch@infradead.org>
Cc: axboe@fb.com, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
Stephen.Bates@pmcs.com, keith.busch@intel.com,
javier@lightnvm.io
Subject: Re: [PATCH v4 0/8] Support for Open-Channel SSDs
Date: Wed, 10 Jun 2015 20:11:42 +0200
Message-ID: <55787DDE.7020801@bjorling.me>
In-Reply-To: <20150609074643.GA5707@infradead.org>
On 06/09/2015 09:46 AM, Christoph Hellwig wrote:
> Hi Matias,
>
> I've been looking over this and I really think it needs a fundamental
> rearchitecture still. The design of using a separate stacking
> block device and all kinds of private hooks does not look very
> maintainable.
>
> Here is my counter suggestion:
>
> - the stacking block device goes away
> - the nvm_target_type make_rq and prep_rq callbacks are combined
> into one and called from the nvme/null_blk ->queue_rq method
> early on to prepare the FTL state. The drivers that are LightNVM
> enabled reserve a pointer to it in their per request data, which
> the unprep_rq callback is called on during I/O completion.
>
I agree with this, if only a common FTL were to be implemented. That is
maybe where we start, but what I really want to enable are these two
use cases:
1. A get/put flash block API that user-space applications can use. That
will enable application-driven FTLs; e.g., RocksDB can be integrated
tightly with the SSD, allowing data placement and garbage collection to
be strictly controlled. Controlled placement reduces the need for
over-provisioning, as data that age at the same time are placed in the
same flash block, and garbage collection can be scheduled so that it
does not interfere with user requests. Together, this will
significantly reduce I/O outliers. (A rough sketch of such an API
follows below, after (2).)
2. Large drive arrays with a global FTL. The stacking block device
model enables this: it allows an FTL to span multiple devices and thus
perform data placement and garbage collection across tens to hundreds
of devices. That will greatly improve wear-leveling, as with more flash
there is a much higher probability of finding a fully inactive block.
Additionally, as the parallelism within the storage array grows, we can
slice and dice the devices using the get/put flash block API and give
applications predictable performance, while using large arrays that
present a single address space.
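To make (1) a bit more concrete, below is a rough sketch of what the
get/put interface could look like from user space, expressed here as
ioctls. The ioctl numbers, structure layout and names are purely
hypothetical; they only illustrate the idea, not a proposed ABI.

  /*
   * Hypothetical user-space view of a get/put flash block API. An
   * application-driven FTL (e.g. RocksDB) claims a block, writes it
   * sequentially, and returns it when its data has aged out.
   */
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/ioctl.h>

  struct nvm_block_req {
          uint32_t lun_id;        /* parallel unit to allocate from */
          uint64_t block_addr;    /* filled in by the kernel on "get" */
          uint64_t flags;
  };

  /* Illustrative ioctl numbers only. */
  #define NVM_IOCTL_GET_BLOCK  _IOWR('L', 0x20, struct nvm_block_req)
  #define NVM_IOCTL_PUT_BLOCK  _IOW('L', 0x21, struct nvm_block_req)

  /* Claim a free flash block from a given LUN. */
  static int get_block(int fd, uint32_t lun, struct nvm_block_req *req)
  {
          req->lun_id = lun;
          req->flags = 0;
          return ioctl(fd, NVM_IOCTL_GET_BLOCK, req);
  }

  /* Return the block so it can be erased and reused. */
  static int put_block(int fd, struct nvm_block_req *req)
  {
          return ioctl(fd, NVM_IOCTL_PUT_BLOCK, req);
  }

On an array with a single address space (2), the same calls would let
an application carve out its own parallel units across devices.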
If it is too much to get upstream for now, I can live with (2) being
removed, and then I will make the changes you proposed.
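Roughly, I picture the driver-side hook like this. It is only a sketch
to make sure we mean the same thing; nvme_nvm_enabled(), nvm_prep_rq(),
nvm_unprep_rq() and the iod->nvm_priv field are made-up names, not an
actual interface.

  /*
   * Sketch: an nvme driver that is LightNVM enabled calls the combined
   * prep callback early in ->queue_rq() and keeps a cookie in its
   * per-request data, so the unprep callback can be run on completion.
   */
  static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
                           const struct blk_mq_queue_data *bd)
  {
          struct request *req = bd->rq;
          struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

          if (nvme_nvm_enabled(req->q)) {
                  /* Combined make_rq/prep_rq: set up FTL state and
                   * remember it for completion time. */
                  iod->nvm_priv = nvm_prep_rq(req);
                  if (IS_ERR(iod->nvm_priv))
                          return BLK_MQ_RQ_QUEUE_ERROR;
          }

          /* ... normal command setup and submission ... */
          return BLK_MQ_RQ_QUEUE_OK;
  }

  static void nvme_end_rq(struct request *req, int error)
  {
          struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

          /* unprep_rq releases/updates FTL state on completion. */
          if (iod->nvm_priv)
                  nvm_unprep_rq(req, iod->nvm_priv);

          blk_mq_end_request(req, error);
  }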
What do you think?
Thanks
-Matias
Thread overview: 22+ messages
2015-06-05 12:54 [PATCH v4 0/8] Support for Open-Channel SSDs Matias Bjørling
2015-06-05 12:54 ` [PATCH v4 1/8] nvme: add special param for nvme_submit_sync_cmd Matias Bjørling
2015-06-05 12:54 ` [PATCH v4 2/8] nvme: don't overwrite req->cmd_flags on sync cmd Matias Bjørling
2015-06-09 7:31 ` Christoph Hellwig
2015-06-05 12:54 ` [PATCH v4 3/8] null_blk: wrong capacity when bs is not 512 bytes Matias Bjørling
2015-06-05 12:54 ` [PATCH v4 4/8] bio: Introduce LightNVM payload Matias Bjørling
2015-06-05 18:17 ` Matias Bjorling
2015-07-06 13:16 ` Pavel Machek
2015-06-05 12:54 ` [PATCH v4 5/8] lightnvm: Support for Open-Channel SSDs Matias Bjørling
2015-06-05 12:54 ` [PATCH v4 6/8] lightnvm: RRPC target Matias Bjørling
2015-06-05 12:54 ` [PATCH v4 7/8] null_blk: LightNVM support Matias Bjørling
2015-06-05 12:54 ` [PATCH v4 8/8] nvme: " Matias Bjørling
2015-06-08 14:48 ` [PATCH v4 0/8] Support for Open-Channel SSDs Stephen Bates
2015-06-09 7:46 ` Christoph Hellwig
2015-06-10 18:11 ` Matias Bjorling [this message]
2015-06-11 10:29 ` Christoph Hellwig
2015-06-13 16:17 ` Matias Bjorling
2015-06-17 13:59 ` Christoph Hellwig
2015-06-17 18:04 ` Matias Bjorling
2015-07-16 12:23 ` Matias Bjørling
2015-07-16 12:46 ` Christoph Hellwig
2015-07-16 13:06 ` Matias Bjørling