From: Jens Axboe <axboe@kernel.dk>
To: Dave Chinner <david@fromorbit.com>, Jens Axboe <axboe@fb.com>
Cc: Ming Lin <mlin@kernel.org>, lkml <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, ming.l@ssi.samsung.com,
	"Kwan (Hingkwan) Huen-SSI" <kwan.huen@ssi.samsung.com>
Subject: Re: [PATCH 3/6] direct-io: add support for write stream IDs
Date: Fri, 17 Apr 2015 20:00:53 -0600
Message-ID: <5531BAD5.4030104@kernel.dk>
In-Reply-To: <20150417235142.GH15810@dastard>

On 04/17/2015 05:51 PM, Dave Chinner wrote:
> On Fri, Apr 17, 2015 at 05:11:40PM -0600, Jens Axboe wrote:
>> On 04/17/2015 05:06 PM, Dave Chinner wrote:
>>> On Thu, Apr 16, 2015 at 11:20:45PM -0700, Ming Lin wrote:
>>>> On Sat, Apr 11, 2015 at 4:59 AM, Dave Chinner <david@fromorbit.com> wrote:
>>>>> On Fri, Apr 10, 2015 at 04:50:05PM -0700, Ming Lin wrote:
>>>>>> On Wed, Mar 25, 2015 at 7:26 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>> If iocb->ki_filp->f_streamid is not set, then it should fall back to
>>>>>>>> whatever is set on the inode->i_streamid.
>>>>>>
>>>>>> Why should it fall back?
>>>>>
>>>>> Because then you have a method of using streams with applications
>>>>> that aren't aware of streams.
>>>>>
>>>>> Or perhaps you have a file you know has different access patterns to
>>>>> the rest of the files in a directory, and you don't want to have to
>>>>> set the stream on every process that opens and uses that file. e.g.
>>>>> database writeahead log files (sequential write, never read) vs
>>>>> database index/table files (random read/write).....
>>>>>
>>>>>>> Good point, agree. Will make that change.
>>>>>>
>>>>>> That change causes a problem for direct IO, for example:
>>>>>>
>>>>>> process 1:
>>>>>> fd = open("/dev/nvme0n1", O_DIRECT...);
>>>>>> //set stream_id 1
>>>>>> fadvise(fd, 1, 0, POSIX_FADV_STREAMID);
>>>>>> pwrite(fd, ....);
>>>>>>
>>>>>> process 2:
>>>>>> fd = open("/dev/nvme0n1", O_DIRECT...);
>>>>>> //should be legacy stream_id 0
>>>>>> pwrite(fd, ....);
>>>>>>
>>>>>> But now process 2 also sees stream_id 1, which is wrong.
>>>>>
>>>>> It's not wrong, your behaviour model is just different. You have
>>>>> defined a process/fd based stream model and not considered
>>>>> that admins and applications might want to use a file
>>>>> based stream model instead, so applications don't need to even be
>>>>> aware that write streams are in use...
>>>>
>>>> The stream must be opened, otherwise the device will return an error
>>>> if an application writes to a stream that has not been opened.
>>>
>>> That's an extremely device specific *implementation* of a write
>>> stream. The *concept* of a write stream being passed from userspace to
>>> the block layer doesn't have such constraints, and I get really
>>> concerned when implementations of a generic concept are so tightly
>>> focussed around one type of hardware implementation of the
>>> concept...
>>
>> Indeed, which is why the implementation posted cares ONLY about the
>> stream ID itself, and passing that through.
>>
>> But the point about fallback is valid, however, for some use cases
>> that will not be what you want. But we have to make some sort of
>> decision, and falling back to the inode set value (if one is set) is
>> probably the right thing to do in most use cases.
>
> Right, the question is then whether fadvise should set the value on
> the inode at all, because then the effect of setting it on a fd also
> changes the fallback. Perhaps we need a distinction between
> "setting the stream for this fd" which lasts as long as the fd is
> active, and "setting the default inode stream" which is potentially
> a persistent operation if the filesystem stores it on disk...

Yes, that might be a good compromise. The easiest would be to define a
second fadvise advice, where the stronger advice would set the stream on
both the file and the inode. Another option would be changing the file
approach to use fcntl(), and keeping the fadvise for the inode. I'll be
happy to take input on what people would prefer here.
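
As a rough illustration of the semantics being discussed (a userspace
model only, not code from the patch series): the f_streamid and
i_streamid field names come from the thread, while the model_* structs,
the stream_scope enum and the helper functions are hypothetical names
made up for this sketch. It assumes 0 means "no stream set".

```c
#include <assert.h>

/* Userspace model only; these are NOT the kernel's struct inode/file. */
struct model_inode {
	unsigned int i_streamid;	/* default stream for the file, 0 = unset */
};

struct model_file {
	struct model_inode *inode;
	unsigned int f_streamid;	/* per-open stream, 0 = unset */
};

/* Hypothetical: the two strengths of advice discussed above. */
enum stream_scope {
	STREAM_SCOPE_FD,		/* lasts as long as this open file */
	STREAM_SCOPE_FD_AND_INODE,	/* also becomes the inode default */
};

static void model_set_stream(struct model_file *f, unsigned int id,
			     enum stream_scope scope)
{
	f->f_streamid = id;
	if (scope == STREAM_SCOPE_FD_AND_INODE)
		f->inode->i_streamid = id;
}

/* Prefer the per-open value, fall back to the inode default. */
static unsigned int model_effective_stream(const struct model_file *f)
{
	return f->f_streamid ? f->f_streamid : f->inode->i_streamid;
}

/* Two opens of the same inode, as in the two-process example. */
void model_example(void)
{
	struct model_inode ino = { 0 };
	struct model_file writer = { &ino, 0 };
	struct model_file other = { &ino, 0 };

	/* Weaker advice: only the caller's open file is affected. */
	model_set_stream(&writer, 1, STREAM_SCOPE_FD);
	assert(model_effective_stream(&writer) == 1);
	assert(model_effective_stream(&other) == 0);

	/* Stronger advice: unaware openers inherit the inode default. */
	model_set_stream(&writer, 2, STREAM_SCOPE_FD_AND_INODE);
	assert(model_effective_stream(&other) == 2);
}
```

With the fcntl() variant the lookup would presumably stay the same; only
the call that selects the scope would differ.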

>>>> The device has a limited number of streams, for example, 16 streams.
>>>> There are 2 APIs to open/close the stream.
>>>
>>> What's to stop me writing something for DM-thinp that understands
>>> write streams in bios and uses it to separate out the write streams
>>> into different regions of the thinp device to improve locality of
>>> its data placement and hence reduce fragmentation?
>>
>> Absolutely nothing, in fact that's one of the use cases that I had
>> in mind. Or for caching software.
>
> *nod*. We are on the same page, then :)

Yes, completely. I basically just wanted to clarify that.

-- 
Jens Axboe


