From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Jens Axboe <axboe@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
"Jason B. Akers" <jason.b.akers@intel.com>,
<linux-ide@vger.kernel.org>, <dan.j.williams@intel.com>,
<kapil.karkra@intel.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives
Date: Wed, 29 Oct 2014 23:28:29 -0400
Message-ID: <yq1bnoukzhe.fsf@sermon.lab.mkp.net>
In-Reply-To: <5451A3F5.9020903@fb.com> (Jens Axboe's message of "Wed, 29 Oct 2014 20:35:33 -0600")

>>>>> "Jens" == Jens Axboe <axboe@fb.com> writes:

Jens> The problem with xadvise() is that it handles only one part of
Jens> this - it handles the case of tying some sort of IO related
Jens> priority information to an inode. It does not handle the case of
Jens> different parts of the file, at least not without adding specific
Jens> extra tracking for this on the kernel side.

Are there actually people asking for sub-file granularity? I didn't get
any requests for that in the survey I did this summer.

I talked to several application people about what they really needed and
wanted. That turned into a huge twisted mess of a table with ponies of
various sizes.

I condensed all those needs and desires into something like this:

+-----------------+------------+----------+------------+
| I/O Class | Command | Desired | Predicted |
| | Completion | Future | Future |
| | Urgency | Access | Access |
| | | Latency | Frequency |
+-----------------+------------+----------+------------+
| Transaction | High | Low | High |
+-----------------+------------+----------+------------+
| Metadata | High | Low | Normal |
+-----------------+------------+----------+------------+
| Paging | High | Normal | Normal |
+-----------------+------------+----------+------------+
| Streaming | High | Normal | Low |
+-----------------+------------+----------+------------+
| Data | Normal | Normal | Normal |
+-----------------+------------+----------+------------+
| Background | Low | Normal* | Low |
+-----------------+------------+----------+------------+
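
The table can be read as a lookup from an I/O class to three independent
hint dimensions. A minimal C sketch of that lookup follows, assuming a
simple three-level scale; enum io_class, struct io_hints and
io_class_to_hints() are hypothetical names for illustration, not an
existing kernel or libc interface.

```c
#include <assert.h>

/* Hypothetical three-level scale used by every column of the table. */
enum hint_level { HINT_LOW, HINT_NORMAL, HINT_HIGH };

/* Hypothetical I/O classes, one per table row. */
enum io_class {
	IO_CLASS_TRANSACTION,
	IO_CLASS_METADATA,
	IO_CLASS_PAGING,
	IO_CLASS_STREAMING,
	IO_CLASS_DATA,
	IO_CLASS_BACKGROUND,
	IO_CLASS_COUNT
};

struct io_hints {
	enum hint_level urgency;   /* command completion urgency */
	enum hint_level latency;   /* desired future access latency */
	enum hint_level frequency; /* predicted future access frequency */
};

/* The table, row for row. The "Normal*" marker on Background has no
 * footnote in the table, so it is stored as plain HINT_NORMAL. */
static const struct io_hints io_class_hints[IO_CLASS_COUNT] = {
	[IO_CLASS_TRANSACTION] = { HINT_HIGH,   HINT_LOW,    HINT_HIGH   },
	[IO_CLASS_METADATA]    = { HINT_HIGH,   HINT_LOW,    HINT_NORMAL },
	[IO_CLASS_PAGING]      = { HINT_HIGH,   HINT_NORMAL, HINT_NORMAL },
	[IO_CLASS_STREAMING]   = { HINT_HIGH,   HINT_NORMAL, HINT_LOW    },
	[IO_CLASS_DATA]        = { HINT_NORMAL, HINT_NORMAL, HINT_NORMAL },
	[IO_CLASS_BACKGROUND]  = { HINT_LOW,    HINT_NORMAL, HINT_LOW    },
};

/* Look up the hint tuple for a class; callers must pass a valid class. */
static inline struct io_hints io_class_to_hints(enum io_class c)
{
	assert((unsigned)c < IO_CLASS_COUNT);
	return io_class_hints[c];
}
```

For example, io_class_to_hints(IO_CLASS_BACKGROUND) yields low urgency,
normal latency and low frequency, matching the last row. The caller only
names the class; the three hints are derived from it.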

Command completion urgency is really just the existing I/O priority.
Desired future access latency affects data placement in a tiered
device. Predicted future access frequency is essentially a caching hint.

The names and I/O classes themselves are not really important. They are
just a reduced version of all the things people asked for, which boils
down to three things: relative priority, data placement and caching.

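Taken literally, two of those three dimensions already have Linux
interfaces: completion urgency maps onto the per-task I/O priority set
with ioprio_set(2), and predicted access frequency onto posix_fadvise(2)
page-cache advice. The C sketch below shows one possible translation.
The level-to-class and level-to-advice choices, and the names
enum hint_level, urgency_to_ioprio(), frequency_to_fadvise() and
apply_io_hints(), are assumptions for illustration only. Desired access
latency (placement) has no generic interface, so it is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* posix_fadvise(), POSIX_FADV_* */
#include <sys/syscall.h>  /* SYS_ioprio_set */
#include <unistd.h>       /* syscall() */

/* glibc has no ioprio wrapper; these values mirror the kernel's
 * include/uapi/linux/ioprio.h. */
#ifndef IOPRIO_CLASS_SHIFT
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_BE    2
#define IOPRIO_CLASS_IDLE  3
#define IOPRIO_WHO_PROCESS 1
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))
#endif

/* Hypothetical three-level hint scale. */
enum hint_level { HINT_LOW, HINT_NORMAL, HINT_HIGH };

/* Assumed mapping of completion urgency onto I/O priority: High and
 * Normal stay best-effort (level 0 is highest, 4 is the default), Low
 * becomes idle. The RT class is avoided because it needs privileges. */
static inline int urgency_to_ioprio(enum hint_level u)
{
	switch (u) {
	case HINT_HIGH: return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 0);
	case HINT_LOW:  return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0);
	default:        return IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4);
	}
}

/* Assumed mapping of predicted access frequency onto page-cache advice. */
static inline int frequency_to_fadvise(enum hint_level f)
{
	switch (f) {
	case HINT_HIGH: return POSIX_FADV_WILLNEED;
	case HINT_LOW:  return POSIX_FADV_NOREUSE;
	default:        return POSIX_FADV_NORMAL;
	}
}

/* Apply both hints for I/O the calling thread issues against fd.
 * Returns 0 on success or a negative errno value. */
static inline int apply_io_hints(int fd, enum hint_level urgency,
				 enum hint_level frequency)
{
	assert(fd >= 0);

	/* IOPRIO_WHO_PROCESS with who == 0 targets the calling thread:
	 * the priority is per task, not per file or per I/O. */
	if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
		    urgency_to_ioprio(urgency)) == -1)
		return -errno;

	/* offset 0, len 0 covers the whole file. posix_fadvise() returns
	 * the error number directly instead of setting errno. */
	int rc = posix_fadvise(fd, 0, 0, frequency_to_fadvise(frequency));
	return rc ? -rc : 0;
}
```

The sketch also shows the limits of today's interfaces: ioprio_set()
applies to a whole task rather than to individual I/Os, it only takes
effect with an I/O scheduler that honors it, and posix_fadvise() advises
the page cache rather than the device.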
I had also asked why people wanted to specify hints at all. The answers
boiled down to the I/O classes in the left column above. People wanted
data on a low-latency storage tier because it was transactional or
metadata I/O, or they wanted to isolate production I/O from the side
effects of a background scrub or backup run.

Incidentally, the data, transaction and background classes covered
almost all the use cases that people had asked for. The metadata class
mostly came about from good results with REQ_META tagging in a previous
prototype. A few vendors wanted to be able to identify swap to prevent
platter spin-ups. Streaming was requested by a couple of video folks.

The notion of telling the storage *why* you're doing I/O, instead of
telling it how to manage its cache and where to put stuff, is closely
aligned with our internal experience with I/O hints over the last
decade. But it's a bit of a departure from where things are going in the
standards bodies. In any case, I thought it was interesting that pretty
much every use case people came up with could be adequately described
by a handful of I/O classes.

The next step was trying to map these hints onto what was available in
xadvise(), NFS 4.2 and the recent T10/T13 efforts. That wasn't trivial,
and there really isn't a 1:1 mapping that works. So I went to T10 and
tried to nudge things in the same direction as NFS 4.2, mainly because
that's closer to what we already have in xadvise().

Jens> I think we've needed a proper API for passing in appropriate hints
Jens> on a per-io basis for a LONG time.

Yup.

Jens> That is the big challenge. We've tried (and failed) in the past to
Jens> define a set of hints that make sense. It'd be a shame to add
Jens> something that's specific to a given transport/technology.

Absolutely!

--
Martin K. Petersen Oracle Linux Engineering