All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Jason B. Akers" <jason.b.akers@intel.com>,
	linux-ide@vger.kernel.org, dan.j.williams@intel.com,
	kapil.karkra@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives
Date: Wed, 29 Oct 2014 16:49:41 -0600	[thread overview]
Message-ID: <54516F05.8050204@fb.com> (raw)
In-Reply-To: <20141029220905.GL16186@dastard>

On 10/29/2014 04:09 PM, Dave Chinner wrote:
> On Wed, Oct 29, 2014 at 03:10:51PM -0600, Jens Axboe wrote:
>> On 10/29/2014 02:14 PM, Dave Chinner wrote:
>>> On Wed, Oct 29, 2014 at 11:23:38AM -0700, Jason B. Akers wrote:
>>>> The following series enables the use of Solid State hybrid drives
>>>> ATA standard 3.2 defines the hybrid information feature, which provides a means for the host driver to provide hints to the SSHDs to guide what to place on the SSD/NAND portion and what to place on the magnetic media.
>>>>
>>>> This implementation allows user space applications to provide the cache hints to the kernel using the existing ionice syscall. 
>>>>
>>>> An application can pass a priority number coding up bits 11, 12, and 15 of the ionice command to form a 3 bit field that encodes the following priorities:
>>>> 	OPRIO_ADV_NONE,
>>>> 	IOPRIO_ADV_EVICT, /* actively discard cached data */
>>>> 	IOPRIO_ADV_DONTNEED, /* caching this data has little value */
>>>> 	IOPRIO_ADV_NORMAL, /* best-effort cache priority (default) */
>>>> 	IOPRIO_ADV_RESERVED1, /* reserved for future use */
>>>> 	IOPRIO_ADV_RESERVED2,
>>>> 	IOPRIO_ADV_RESERVED3,
>>>> 	IOPRIO_ADV_WILLNEED, /* high temporal locality */
>>>>
>>>> For example the following commands from the user space will make dd IOs to be generated with a hint of IOPRIO_ADV_DONTNEED assuming the SSHD is /dev/sdc.
>>>>
>>>> ionice -c2 -n4096 dd if=/dev/zero of=/dev/sdc bs=1M count=1024
>>>> ionice -c2 -n4096 dd if=/dev/sdc of=/dev/null bs=1M count=1024
>>>
>>> This looks to be the wrong way to implement per-IO priority
>>> information.
>>>
>>> How does a filesystem make use of this to make sure it's
>>> metadata ends up with IOPRIO_ADV_WILLNEED to store frequently
>>> accessed metadata in flash. Conversely, journal writes need to
>>> be issued with IOPRIO_ADV_DONTNEED so they don't unneceessarily
>>> consume flash space as they are never-read IOs...
>>
>> Not disagreeing that loading more into the io priority fields is a
>> bit... icky. I see why it's done, though, it requires the least amount
>> of plumbing.
> 
> Yeah, but we don't do things the easy way just because it's easy. We
> do things the right way. ;)

Still not disagreeing with you, merely stating that I can see why they
chose to do it this way. Still doesn't change the fact that it feels
like a hack, not a designed solution.

>> As for the fs accessing this, the io nice fields are readily exposed
>> through the ->bi_rw setting. So while the above example uses ionice to
>> set a task io priority (that a bio will then inherit), nothing prevents
>> you from passing it in directly from the kernel.
> 
> Right, but now the filesystem needs to provide that on a per-inode
> basis, not from the task structure as the task that is submitting
> the bio is not necesarily the task doing the read/write syscall.

Whomever submits the bio would need to provide it, yes. And with the
disconnect for async writes, that becomes... interesting.

> e.g. the write case above doesn't actually inherit the task priority
> at the bio level at all because the IO is being dispatched by a
> background flusher thread, not the ioniced task calling write(2). 

Oh yes, I realize that.

> IMO using ionice is a nice hack, but utimately it looks mostly useless
> from a user and application perspective as cache residency is a
> property of the data being read/written, not the task doing the IO.
> e.g. a database will want it's indexes in flash and bulk
> data in non-cached storage.
> 
> IOWs, to make effective use of this the task will need different
> cache hints for each different type of data needs to do IO on, and
> so overloading IO priorities just seems the wrong direction to be
> starting from.

Agree.

-- 
Jens Axboe


WARNING: multiple messages have this Message-ID (diff)
From: Jens Axboe <axboe@fb.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Jason B. Akers" <jason.b.akers@intel.com>,
	<linux-ide@vger.kernel.org>, <dan.j.williams@intel.com>,
	<kapil.karkra@intel.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives
Date: Wed, 29 Oct 2014 16:49:41 -0600	[thread overview]
Message-ID: <54516F05.8050204@fb.com> (raw)
In-Reply-To: <20141029220905.GL16186@dastard>

On 10/29/2014 04:09 PM, Dave Chinner wrote:
> On Wed, Oct 29, 2014 at 03:10:51PM -0600, Jens Axboe wrote:
>> On 10/29/2014 02:14 PM, Dave Chinner wrote:
>>> On Wed, Oct 29, 2014 at 11:23:38AM -0700, Jason B. Akers wrote:
>>>> The following series enables the use of Solid State hybrid drives
>>>> ATA standard 3.2 defines the hybrid information feature, which provides a means for the host driver to provide hints to the SSHDs to guide what to place on the SSD/NAND portion and what to place on the magnetic media.
>>>>
>>>> This implementation allows user space applications to provide the cache hints to the kernel using the existing ionice syscall. 
>>>>
>>>> An application can pass a priority number coding up bits 11, 12, and 15 of the ionice command to form a 3 bit field that encodes the following priorities:
>>>> 	OPRIO_ADV_NONE,
>>>> 	IOPRIO_ADV_EVICT, /* actively discard cached data */
>>>> 	IOPRIO_ADV_DONTNEED, /* caching this data has little value */
>>>> 	IOPRIO_ADV_NORMAL, /* best-effort cache priority (default) */
>>>> 	IOPRIO_ADV_RESERVED1, /* reserved for future use */
>>>> 	IOPRIO_ADV_RESERVED2,
>>>> 	IOPRIO_ADV_RESERVED3,
>>>> 	IOPRIO_ADV_WILLNEED, /* high temporal locality */
>>>>
>>>> For example the following commands from the user space will make dd IOs to be generated with a hint of IOPRIO_ADV_DONTNEED assuming the SSHD is /dev/sdc.
>>>>
>>>> ionice -c2 -n4096 dd if=/dev/zero of=/dev/sdc bs=1M count=1024
>>>> ionice -c2 -n4096 dd if=/dev/sdc of=/dev/null bs=1M count=1024
>>>
>>> This looks to be the wrong way to implement per-IO priority
>>> information.
>>>
>>> How does a filesystem make use of this to make sure it's
>>> metadata ends up with IOPRIO_ADV_WILLNEED to store frequently
>>> accessed metadata in flash. Conversely, journal writes need to
>>> be issued with IOPRIO_ADV_DONTNEED so they don't unneceessarily
>>> consume flash space as they are never-read IOs...
>>
>> Not disagreeing that loading more into the io priority fields is a
>> bit... icky. I see why it's done, though, it requires the least amount
>> of plumbing.
> 
> Yeah, but we don't do things the easy way just because it's easy. We
> do things the right way. ;)

Still not disagreeing with you, merely stating that I can see why they
chose to do it this way. Still doesn't change the fact that it feels
like a hack, not a designed solution.

>> As for the fs accessing this, the io nice fields are readily exposed
>> through the ->bi_rw setting. So while the above example uses ionice to
>> set a task io priority (that a bio will then inherit), nothing prevents
>> you from passing it in directly from the kernel.
> 
> Right, but now the filesystem needs to provide that on a per-inode
> basis, not from the task structure as the task that is submitting
> the bio is not necesarily the task doing the read/write syscall.

Whomever submits the bio would need to provide it, yes. And with the
disconnect for async writes, that becomes... interesting.

> e.g. the write case above doesn't actually inherit the task priority
> at the bio level at all because the IO is being dispatched by a
> background flusher thread, not the ioniced task calling write(2). 

Oh yes, I realize that.

> IMO using ionice is a nice hack, but utimately it looks mostly useless
> from a user and application perspective as cache residency is a
> property of the data being read/written, not the task doing the IO.
> e.g. a database will want it's indexes in flash and bulk
> data in non-cached storage.
> 
> IOWs, to make effective use of this the task will need different
> cache hints for each different type of data needs to do IO on, and
> so overloading IO priorities just seems the wrong direction to be
> starting from.

Agree.

-- 
Jens Axboe


  parent reply	other threads:[~2014-10-29 22:49 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-29 18:23 [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives Jason B. Akers
2014-10-29 18:23 ` [RFC PATCH 1/5] block, ioprio: include caching advice via ionice Jason B. Akers
2014-10-29 19:02   ` Jeff Moyer
2014-10-29 21:07     ` Dan Williams
2014-10-29 18:23 ` [RFC PATCH 2/5] block: ioprio hint to low-level device drivers Jason B. Akers
2014-10-29 18:23 ` [RFC PATCH 3/5] block: untangle ioprio from BLK_CGROUP and BLK_DEV_THROTTLING Jason B. Akers
2014-10-29 18:24 ` [RFC PATCH 4/5] block, mm: Added the necessary plumbing to take ioprio hints down to block layer Jason B. Akers
2014-10-29 18:24 ` [RFC PATCH 5/5] libata: Enabling Solid State Hybrid Drives (SSHDs) based on SATA 3.2 standard Jason B. Akers
2014-10-29 20:14 ` [RFC PATCH 0/5] Enable use of Solid State Hybrid Drives Dave Chinner
2014-10-29 21:10   ` Jens Axboe
2014-10-29 21:10     ` Jens Axboe
2014-10-29 22:09     ` Dave Chinner
2014-10-29 22:24       ` Dan Williams
2014-10-30  7:21         ` Dave Chinner
2014-10-30 14:15           ` Jens Axboe
2014-10-30 17:07           ` Dan Williams
2014-11-10  4:22             ` Dave Chinner
2014-11-12 16:47               ` Dan Williams
2014-10-29 22:49       ` Jens Axboe [this message]
2014-10-29 22:49         ` Jens Axboe
2014-10-29 21:11   ` Dan Williams
2014-12-03 15:25   ` Pavel Machek
2014-10-30  2:05 ` Martin K. Petersen
2014-10-30  2:35   ` Jens Axboe
2014-10-30  2:35     ` Jens Axboe
2014-10-30  3:28     ` Martin K. Petersen
2014-10-30  3:28       ` Martin K. Petersen
2014-10-30  4:19       ` Dan Williams
2014-10-30  4:19         ` Dan Williams
2014-10-30 14:17         ` Jens Axboe
2014-10-30 14:17           ` Jens Axboe
2014-10-30 14:17           ` Jens Axboe
2014-10-30 14:53       ` Jens Axboe
2014-10-30 14:53         ` Jens Axboe
2014-10-30 16:27         ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54516F05.8050204@fb.com \
    --to=axboe@fb.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=jason.b.akers@intel.com \
    --cc=kapil.karkra@intel.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.