From: Jens Axboe <axboe@fb.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "Jason B. Akers" <jason.b.akers@intel.com>,
	linux-ide@vger.kernel.org, dan.j.williams@intel.com,
	kapil.karkra@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives
Date: Thu, 30 Oct 2014 08:53:49 -0600
Message-ID: <545250FD.8030509@fb.com>
In-Reply-To: <yq1bnoukzhe.fsf@sermon.lab.mkp.net>

On 2014-10-29 21:28, Martin K. Petersen wrote:
>>>>>> "Jens" == Jens Axboe <axboe@fb.com> writes:
>
> Jens> The problem with xadvise() is that it handles only one part of
> Jens> this - it handles the case of tying some sort of IO related
> Jens> priority information to an inode. It does not handle the case of
> Jens> different parts of the file, at least not without adding specific
> Jens> extra tracking for this on the kernel side.
>
> Are there actually people asking for sub-file granularity? I didn't get
> any requests for that in the survey I did this summer.

Yeah, consider the case of using a raw block device for storing a 
database. That one is quite common. Or perhaps a setup with a single 
log, with data being appended to it. Some of that data would be marked 
as hot/willneed, some of it would be marked as cold/wontneed. This 
means that we cannot rely on per-inode hinting.
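
To make that concrete, the closest thing we have today is 
posix_fadvise(), which does take a byte range, but it only steers the 
page cache. Nothing carries the hint down to the device. A minimal 
sketch of the single-log case (the function name and log layout here 
are just for illustration):

#include <fcntl.h>

/*
 * Hint the two halves of an append-only log: the fresh tail is hot,
 * the older records are cold.  posix_fadvise() has the right shape
 * (fd + byte range + advice), but today it only affects the page
 * cache, not device-side placement or caching.
 */
static int hint_log(int fd, off_t tail_off, off_t tail_len,
                    off_t old_off, off_t old_len)
{
        int ret;

        /* Freshly appended records: likely to be read back soon. */
        ret = posix_fadvise(fd, tail_off, tail_len, POSIX_FADV_WILLNEED);
        if (ret)
                return ret;

        /* Older portion of the log: cold, reclaim aggressively. */
        return posix_fadvise(fd, old_off, old_len, POSIX_FADV_DONTNEED);
}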

> I talked to several application people about what they really needed and
> wanted. That turned into a huge twisted mess of a table with ponies of
> various sizes.

Who could have envisioned that :-)

> I condensed all those needs and desires into something like this:
>
> +-----------------+------------+----------+------------+
> | I/O Class       | Command    | Desired  | Predicted  |
> |                 | Completion | Future   | Future     |
> |                 | Urgency    | Access   | Access     |
> |                 |            | Latency  | Frequency  |
> +-----------------+------------+----------+------------+
> | Transaction     | High       | Low      | High       |
> +-----------------+------------+----------+------------+
> | Metadata        | High       | Low      | Normal     |
> +-----------------+------------+----------+------------+
> | Paging          | High       | Normal   | Normal     |
> +-----------------+------------+----------+------------+
> | Streaming       | High       | Normal   | Low        |
> +-----------------+------------+----------+------------+
> | Data            | Normal     | Normal   | Normal     |
> +-----------------+------------+----------+------------+
> | Background      | Low        | Normal*  | Low        |
> +-----------------+------------+----------+------------+
>
> Command completion urgency is really just the existing I/O priority.
> Desired future access latency affects data placement in a tiered
> device. Predicted future access frequency is essentially a caching hint.
>
> The names and I/O classes themselves are not really important. It's just
> a reduced version of all the things people asked for. Essentially:
> Relative priority, data placement and caching.
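
Of those three, relative priority at least has existing plumbing: the 
per-process I/O priority set through ioprio_set(). As a reminder, here 
is a minimal sketch of that interface as it stands (the IOPRIO_* values 
are defined by hand, matching the kernel's encoding, since there is no 
exported uapi header for them):

#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_WHO_PROCESS      1
#define IOPRIO_CLASS_BE         2       /* default, best-effort class */
#define IOPRIO_CLASS_SHIFT      13
#define IOPRIO_PRIO_VALUE(class, data) \
        (((class) << IOPRIO_CLASS_SHIFT) | (data))

/* Set the I/O priority of a process; level is 0 (high) to 7 (low). */
static int set_io_prio(pid_t pid, int level)
{
        return syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, pid,
                       IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, level));
}
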
>
> I had also asked why people wanted to specify any hints. And that boiled
> down to the I/O classes in the left column above. People wanted stuff on
> a low latency storage tier because it was a transactional or metadata
> type of I/O. Or to isolate production I/O from any side effects of a
> background scrub or backup run.
>
> Incidentally, the classes data, transaction and background covered
> almost all the use cases that people had asked for. The metadata class
> mostly came about from good results with REQ_META tagging in a previous
> prototype. A few vendors wanted to be able to identify swap to prevent
> platter spin-ups. Streaming was requested by a couple of video folks.
>
> The notion of telling the storage *why* you're doing I/O instead of
> telling it how to manage its cache and where to put stuff is closely
> aligned with our internal experiences with I/O hints over the last
> decade. But it's a bit of a departure from where things are going in the
> standards bodies. In any case I thought it was interesting that pretty
> much every use case that people came up with could be adequately
> described by a handful of I/O classes.

Definitely agree on this: it's about notifying storage of what type of 
IO this is, or why we are doing it. I'm just still worried that this 
will end up being unusable by applications, since they can't rely on 
anything. Say one vendor treats WONTNEED in a much colder fashion than 
others. The user/application will then complain about the access 
latencies for the next IO to that location ("yes it's cold, but I 
didn't expect it to be THAT cold") and conclude that they can't 
feasibly use these hints, as the hints don't do exactly what they want.

It'd be nice if we could augment this with a query interface of some 
sort that could give the application some idea of what happens for each 
of the passed-in hints. That would improve the situation from "let's 
set this hint and hope it does what we think it does" to a more 
predictable and robust environment.
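
To make the shape of that concrete, something like the below. This is 
entirely hypothetical, no such ioctl exists; the struct, the fields, 
and the number are made up purely to illustrate the idea of asking the 
device what a hint actually buys you before relying on it:

#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

struct blk_hint_info {
        __u32   hint;           /* hint being queried, e.g. WONTNEED */
        __u32   supported;      /* non-zero if the device acts on it */
        __u32   expected_lat_us;        /* rough access latency once applied */
};

/* Made-up ioctl number, for illustration only. */
#define BLKQUERYHINT    _IOWR(0x12, 126, struct blk_hint_info)

/* Ask the device what a given hint will actually do. */
static int query_hint(int fd, struct blk_hint_info *info)
{
        return ioctl(fd, BLKQUERYHINT, info);
}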


-- 
Jens Axboe


