Subject: Re: [PATCH 0/11] Update version of write stream ID patchset
To: Jeff Moyer <jmoyer@redhat.com>
References: <1457107853-8689-1-git-send-email-axboe@fb.com>
 <x49a8meni6c.fsf@segfault.boston.devel.redhat.com>
CC: <linux-fsdevel@vger.kernel.org>, <linux-block@vger.kernel.org>,
	<calvinowens@fb.com>, <hch@lst.de>, <adilger@dilger.ca>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
From: Jens Axboe <axboe@fb.com>
Message-ID: <56D9F141.9070803@fb.com>
Date: Fri, 4 Mar 2016 13:34:09 -0700
MIME-Version: 1.0
In-Reply-To: <x49a8meni6c.fsf@segfault.boston.devel.redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 03/04/2016 12:42 PM, Jeff Moyer wrote:
> Jens Axboe <axboe@fb.com> writes:
>
>> It's been a while since I last posted the write stream ID patchset, but
>> here is an updated version.
>>
>> The original patchset was centered around the current NVMe streams
>> proposal, but there were a number of issues with that. It's now in a
>> much better state, and hopefully will make it into 1.3 of the spec
>> soon.
>
> But the spec is still not public.  The only documentation I can find on
> this stuff is from t10, dated May of last year.

That is correct, but the important changes are basically in the cover 
letter that I wrote for the patchset :-)

>> To quickly re-summarize the intent behind write stream IDs, it's to
>> be able to provide a hint to the underlying storage device on what
>> writes could feasibly be grouped together. If the device is able to
>> group writes of similar life times on media, then we can greatly reduce
>> the amount of data that needs to be copied around at garbage collection
>> time. This gives us a better write amplification factor, which leads
>> to better device life times and better (and more predictable)
>> performance at steady state.
>>
>> There have been a number of changes to this patchset since it was last
>> posted. In summary:
>>
>> 1) The bio parts have been bumped to carry 16 bits of stream data, up
>>     from 8 and 12 in the original series.
>>
>> 2) Since the interface grew some more options, I've moved away from
>>     fadvise and instead added a new system call. I don't feel strongly
>>     about what interface we use here, another option would be to have a
>>     (big) set of fcntl() commands instead.
>>
>> 3) The kernel now manages the ID space, since we have moved to a host
>>     assigned model. This is done on a backing_dev_info basis, and the
>>     btrfs patch has been updated to show how this can be used for nested
>>     devices on btrfs/md/dm/etc. This could be moved to the request queue
>>     as well, again I don't feel too strongly about this specific part.
>>
>> Those are the big changes.
>
> My main question is why expose this to userspace at all?  If we're
> keeping track of write streams per file, then why not implement that in
> the kernel, transparent to the application?  That would benefit all
> applications instead of requiring application developers to opt in.

Because lots of different files could share the same write stream ID.
It's not like we're going to have millions of streams available, so you
have to group them more wisely. Unless the policy is always one stream
per file, we can't put that sort of thing in the kernel. The kernel has
no way of knowing which files belong together.
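
As a rough illustration of that grouping (this is not code from the
patchset), here is a small userspace sketch. It assumes the application
can label each file with an expected data lifetime and that only a
handful of stream IDs are available. The names `assign_streams` and
`LIFETIME_CLASSES`, and the use of 0 to mean "no stream", are
hypothetical.

```python
# Hypothetical lifetime classes, ordered from shortest to longest lived.
LIFETIME_CLASSES = ("short", "medium", "long")


def assign_streams(files, max_streams):
    """Map each file to a stream ID based on its expected data lifetime.

    files: dict of filename -> lifetime class (application knowledge).
    max_streams: number of stream IDs available (assumed to be small).
    Returns a dict of filename -> stream ID. ID 0 means "no stream",
    which is an assumption made for this sketch.
    """
    if max_streams <= 0:
        return {name: 0 for name in files}

    # Only the classes actually in use consume a stream ID.
    used = sorted(set(files.values()), key=LIFETIME_CLASSES.index)

    stream_of_class = {}
    for i, cls in enumerate(used):
        # If there are more classes than IDs, fold the longest-lived
        # classes into the last available stream.
        stream_of_class[cls] = min(i + 1, max_streams)

    return {name: stream_of_class[cls] for name, cls in files.items()}


files = {
    "wal.log": "short",
    "tmp.0": "short",
    "manifest": "medium",
    "sst.1": "long",
    "sst.2": "long",
}
streams = assign_streams(files, max_streams=8)
```

Many files end up on the same ID, and the mapping depends entirely on
lifetime knowledge that only the application has.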

> I'm sure your argument will have something to do with how stream id's
> are allocated/freed (expensive/slow, limited resource, whatever), but
> that really just gets back to Martin's original questions about what we
> should expect from the hardware and what the programming model should
> look like (questions that are, afaik, still open).

That's orthogonal, really. The open/close might be expensive, or it
might not be; either way, it has no real bearing on how you assign
specific writes to specific stream IDs.

> I'm not against write streams, I think it's a neat idea.  I just think
> it will die on the vine if you require application developers to opt
> in.  Not all storage is SSDs, and I don't like that SSDs now have to be
> treated differently by the programmer.

But that's why it's kept really simple. There are people that want to
make this more involved, and tie QoS criteria to streams. My argument
there has been the same as yours: it would never be used or get
adopted. For streams in general, the wins are big enough that
applications will care. And it's not difficult to use at all...

It's not just SSDs, either. It could be used for tiered storage in
general. That would mostly require going a bit further and assigning
performance characteristics to specific stream IDs, but there's nothing
preventing that from being done down the road. For now, this is just a
basic interface with a kernel-managed stream ID space attached.
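
To make "kernel-managed stream ID space" a bit more concrete, here is a
toy model of a host-assigned allocator. It is only a sketch and not the
patchset's implementation: the class name `StreamIdSpace`, the
open/close method names, and the choice to reserve 0 for "no stream" are
all assumptions. The only detail taken from the cover letter is that
IDs have to fit in 16 bits.

```python
class StreamIdSpace:
    """Toy model of a per-device stream ID allocator (not kernel code)."""

    MAX_ID = 2**16 - 1  # the bio carries 16 bits of stream data

    def __init__(self, max_streams):
        # IDs run from 1 to max_streams; 0 is reserved here to mean
        # "no stream" (an assumption made for this sketch).
        if not 0 < max_streams <= self.MAX_ID:
            raise ValueError("max_streams must fit in 16 bits")
        # Keep the free list reversed so the lowest ID is handed out first.
        self._free = list(range(max_streams, 0, -1))
        self._open = set()

    def open(self):
        """Hand out a free stream ID, or 0 if none are left."""
        if not self._free:
            return 0
        sid = self._free.pop()
        self._open.add(sid)
        return sid

    def close(self, sid):
        """Return a previously opened stream ID to the free pool."""
        if sid not in self._open:
            raise KeyError(sid)
        self._open.remove(sid)
        self._free.append(sid)


space = StreamIdSpace(max_streams=2)
first = space.open()
second = space.open()
exhausted = space.open()   # no IDs left, falls back to "no stream"
space.close(first)
reused = space.open()      # the released ID becomes available again
```

The point is only that the host, not the device, decides which IDs are
in use, so callers never have to coordinate ID values among themselves.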

-- 
Jens Axboe