From mboxrd@z Thu Jan  1 00:00:00 1970
From: Matias Bjorling <m@bjorling.me>
Subject: Re: [PATCH v4 0/8] Support for Open-Channel SSDs
Date: Wed, 10 Jun 2015 20:11:42 +0200
Message-ID: <55787DDE.7020801@bjorling.me>
References: <1433508870-28251-1-git-send-email-m@bjorling.me> <20150609074643.GA5707@infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: axboe@fb.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	Stephen.Bates@pmcs.com, keith.busch@intel.com, javier@lightnvm.io
To: Christoph Hellwig <hch@infradead.org>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mail-wi0-f177.google.com ([209.85.212.177]:34010 "EHLO
	mail-wi0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933877AbbFJSLr (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 10 Jun 2015 14:11:47 -0400
Received: by wibut5 with SMTP id ut5so56575491wib.1
        for <linux-fsdevel@vger.kernel.org>; Wed, 10 Jun 2015 11:11:45 -0700 (PDT)
In-Reply-To: <20150609074643.GA5707@infradead.org>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 06/09/2015 09:46 AM, Christoph Hellwig wrote:
> Hi Matias,
> 
> I've been looking over this and I really think it needs a fundamental
> rearchitecture still.  The design of using a separate stacking
> block device and all kinds of private hooks does not look very
> maintainable.
> 
> Here is my counter suggestion:
> 
>  - the stacking block device goes away
>  - the nvm_target_type make_rq and prep_rq callbacks are combined
>    into one and called from the nvme/null_blk ->queue_rq method
>    early on to prepare the FTL state.  The drivers that are LightNVM
>    enabled reserve a pointer to it in their per request data, which
>    the unprep_rq callback is called on during I/O completion.
> 

I would agree with this if only a single, common FTL were going to be
implemented. That may be where we start, but what I really want to
enable are these two use cases:

1. A get/put flash block API that user-space applications can use.
That will enable application-driven FTLs. E.g., RocksDB can be
integrated tightly with the SSD, allowing data placement and garbage
collection to be strictly controlled. Data placement will reduce the
need for over-provisioning, as data with similar lifetimes is placed in
the same flash block, and garbage collection can be scheduled so that it
does not interfere with user requests. Together, these will
significantly reduce I/O outliers.

2. Large drive arrays with a global FTL. The stacking block device model
enables this. It allows an FTL to span multiple devices, and thus
perform data placement and garbage collection over tens to hundreds of
devices. That will greatly improve wear-leveling, as the probability of
finding a fully inactive block is much higher with more flash.
Additionally, as the parallelism grows within the storage array, we can
slice and dice the devices using the get/put flash block API and enable
applications to get predictable performance, while using large arrays
that have a single address space.
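
A minimal sketch of what a get/put flash block interface over a pool
spanning several devices could look like. Everything here is an
assumption made for illustration: the names (struct flash_block,
struct block_pool, pool_get_block, pool_put_block), the pool sizes, and
the in-memory erase counters are hypothetical and are not taken from the
patch series. The allocation policy (hand out the free block with the
lowest erase count) is just one simple way to show why a larger pool
helps wear-leveling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only to keep the example small. */
#define NR_DEVS      2
#define BLKS_PER_DEV 4
#define NR_BLKS      (NR_DEVS * BLKS_PER_DEV)

/* Hypothetical per-block state; a real device reports far more. */
struct flash_block {
	int dev;             /* index of the device owning the block */
	int id;              /* block index within that device */
	uint32_t erase_cnt;  /* simulated number of erases so far */
	int in_use;          /* nonzero while an application holds it */
};

/* One pool covering every block on every device (single address space). */
struct block_pool {
	struct flash_block blks[NR_BLKS];
};

void pool_init(struct block_pool *p)
{
	for (int i = 0; i < NR_BLKS; i++) {
		p->blks[i].dev = i / BLKS_PER_DEV;
		p->blks[i].id = i % BLKS_PER_DEV;
		p->blks[i].erase_cnt = 0;
		p->blks[i].in_use = 0;
	}
}

/*
 * "get": hand out the free block with the lowest erase count across all
 * devices. Returns NULL when every block is already held.
 */
struct flash_block *pool_get_block(struct block_pool *p)
{
	struct flash_block *best = NULL;

	for (int i = 0; i < NR_BLKS; i++) {
		struct flash_block *b = &p->blks[i];

		if (b->in_use)
			continue;
		if (!best || b->erase_cnt < best->erase_cnt)
			best = b;
	}
	if (best)
		best->in_use = 1;
	return best;
}

/*
 * "put": give the block back. The erase that must happen before reuse
 * is modelled by bumping the counter.
 */
void pool_put_block(struct flash_block *b)
{
	assert(b->in_use);
	b->erase_cnt++;
	b->in_use = 0;
}
```

An application-driven FTL would call pool_get_block() when it needs a
fresh block for data it wants grouped together, and pool_put_block()
once it has decided, on its own schedule, that the block can be
reclaimed.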

If that is too much to get upstream for now, I can live with (2) removed,
and then I will make the changes you proposed.

What do you think?

Thanks
-Matias