linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@fb.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>, <linux-fsdevel@vger.kernel.org>,
	<linux-block@vger.kernel.org>, <calvinowens@fb.com>, <hch@lst.de>,
	<adilger@dilger.ca>
Subject: Re: [PATCH 0/11] Update version of write stream ID patchset
Date: Tue, 8 Mar 2016 14:56:31 -0700
Message-ID: <56DF4A8F.8070103@fb.com>
In-Reply-To: <yq1r3fo4pob.fsf@sermon.lab.mkp.net>

On 03/05/2016 01:48 PM, Martin K. Petersen wrote:
>>>>>> "Jens" == Jens Axboe <axboe@fb.com> writes:
>
> Jens,
>
>>> OK.  I'm still of the opinion that we should try to make this
>>> transparent.  I could be swayed by workload descriptions and numbers
>>> comparing approaches, though.
>
> Jens> You can't just wave that flag and not have a solution. Any
> Jens> solution in that space would imply having policy in the kernel. A
> Jens> "just use a stream per file" is never going to work.
>
> I totally understand the desire to have explicit, long-lived
> "from-file-open to file-close" streams for things like database journals
> and whatnot.

That is an appealing use case.

> However, I think that you are dismissing the benefits of being able to
> group I/Os to disjoint LBA ranges within a brief period of time as
> belonging to a single file. It's something that we know works well on
> other types of storage. And it's also a much better heuristic for data
> placement on SSDs than just picking the next available bucket. It does
> require some pipelining on the drive but they will need some front end
> logic to handle the proposed stream ID separation in any case.

I'm not a huge fan of heuristics based exclusively on temporal and
spatial locality. Using that as a hint for a case where no stream ID
(or write tag) is given would be an improvement, though. And perhaps
parts of the space should be reserved for just that.

But I don't think that should exclude doing this in a much more managed
fashion. Personally, I find that a lot saner than adding this sort of
state tracking to the kernel.
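
To make the fallback idea concrete, here is a rough userspace sketch of
how the ID space could be split. It is not taken from the patchset:
every name and constant below (pick_stream, EXPLICIT_MAX, HEUR_BASE,
LOCALITY_SECTORS, the slot layout) is hypothetical, and the locality
rule is just one plausible choice. Explicitly tagged writes keep their
ID; untagged writes are placed in a small reserved range based on how
close they land to recently written LBAs, with LRU reuse of slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the stream ID space, for illustration only. */
#define STREAM_NONE      0u     /* caller supplied no stream ID */
#define EXPLICIT_MAX     7u     /* IDs 1..7: set explicitly by applications */
#define HEUR_BASE        8u     /* IDs 8..11: reserved for the heuristic */
#define HEUR_SLOTS       4u
#define LOCALITY_SECTORS 2048u  /* "near" = within 1 MiB of 512-byte sectors */

struct heur_slot {
	uint64_t next_lba;  /* LBA just past the last write placed here */
	uint64_t last_use;  /* logical timestamp, used for LRU reuse */
	int valid;
};

struct stream_picker {
	struct heur_slot slots[HEUR_SLOTS];
	uint64_t clock;
};

static uint64_t lba_distance(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/* Return the index of a slot whose recent writes are close to lba, or -1. */
static int find_nearby_slot(const struct stream_picker *p, uint64_t lba)
{
	unsigned i;

	for (i = 0; i < HEUR_SLOTS; i++) {
		const struct heur_slot *s = &p->slots[i];

		if (s->valid && lba_distance(lba, s->next_lba) <= LOCALITY_SECTORS)
			return (int)i;
	}
	return -1;
}

/* Return the first unused slot, or the least recently used one. */
static unsigned find_victim_slot(const struct stream_picker *p)
{
	unsigned i, victim = 0;

	for (i = 0; i < HEUR_SLOTS; i++) {
		if (!p->slots[i].valid)
			return i;
		if (p->slots[i].last_use < p->slots[victim].last_use)
			victim = i;
	}
	return victim;
}

/*
 * Pick the stream ID for one write. A valid explicit ID always wins;
 * anything else (STREAM_NONE or out of range) is treated as untagged
 * and mapped into the reserved heuristic range.
 */
unsigned pick_stream(struct stream_picker *p, unsigned explicit_id,
		     uint64_t lba, uint32_t nr_sectors)
{
	int idx;

	assert(p != NULL);

	if (explicit_id >= 1 && explicit_id <= EXPLICIT_MAX)
		return explicit_id;

	idx = find_nearby_slot(p, lba);
	if (idx < 0) {
		idx = (int)find_victim_slot(p);
		p->slots[idx].valid = 1;
	}
	p->slots[idx].next_lba = lba + nr_sectors;
	p->slots[idx].last_use = ++p->clock;
	return HEUR_BASE + (unsigned)idx;
}
```

With a zero-initialized picker, a write tagged 3 stays on stream 3,
while untagged writes at neighbouring LBAs share one reserved ID and a
far-away untagged write gets the next reserved one. Where this state
would live (kernel, driver, or device) is exactly the open question in
this thread; the sketch takes no position on that.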

> Also, in our experiments we essentially got the explicit stream ID for
> free by virtue of the journal being written often enough that it was
> rarely if ever evicted as an active stream by the device. With no
> changes whatsoever to any application.

Journal would be an easy one to guess, for sure.

> My gripe with the current stuff is the same as before: The protocol is
> squarely aimed at papering over issues with current flash technology. It
> kinda-sorta works for other types of devices but it is very limiting. I
> appreciate that it is a great fit for the "handful of apps sharing a
> COTS NVMe drive on a cloud server" use case. But I think it is horrible
> for NVMe over Fabrics and pretty much everything else. That wouldn't be
> a big deal if the traditional storage models were going away. But I
> don't think they are...

I don't think erase blocks are going to go away in the near future. 
We're going to have better media as well, that's a given, but cheaper 
TLC flash is just going to make the current problem much worse. The
patchset is really about tagging the writes with a stream ID, nothing
else. The ID could carry any type of hint; it is not tied to NVMe
write directives at all.


-- 
Jens Axboe

