linux-block.vger.kernel.org archive mirror
* [PATCHSET v4] Add support for write life time hints
@ 2017-06-15  3:45 Jens Axboe
  2017-06-15  3:45 ` [PATCH 01/11] block: add support for carrying stream information in a bio Jens Axboe
                   ` (11 more replies)
  0 siblings, 12 replies; 56+ messages in thread
From: Jens Axboe @ 2017-06-15  3:45 UTC
  To: linux-fsdevel, linux-block; +Cc: adilger, hch, martin.petersen

A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:

- With NVMe 1.3 compliant devices, the device can expose multiple
  streams. Separating data written into streams based on life time
  can drastically reduce the write amplification. This helps device
  endurance, and increases performance. Testing just performed
  internally at Facebook with these patches showed up to a 25%
  reduction in NAND writes in a RocksDB setup.

- Software caching solutions can make more intelligent decisions
  on how and where to place data.

Contrary to previous patches, we're not exposing numeric stream values anymore.
I've previously advocated for just doing a set of hints that makes sense
instead. See the coverage from the LSFMM summit this year:

https://lwn.net/Articles/717755/

This patchset attempts to do that. We define 4 flags for the pwritev2
system call:

RWF_WRITE_LIFE_SHORT	Data written with this flag is expected to have
			a high overwrite rate, i.e. a short life time.

RWF_WRITE_LIFE_MEDIUM	Longer life time than SHORT

RWF_WRITE_LIFE_LONG	Longer life time than MEDIUM

RWF_WRITE_LIFE_EXTREME	Longer life time than LONG

The idea is that these are relative values, so an application can
use them as they see fit. The underlying device can then place
data appropriately, or be free to ignore the hint. It's just a hint.
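
As a rough illustration of how an application might use these relative
hints, the sketch below buckets writes by the application's own estimate
of data life time. The enum values, pick_write_life(), outlives() and the
thresholds are all hypothetical. They are not the RWF_WRITE_LIFE_* bit
values from the patches; only the ordering SHORT < MEDIUM < LONG < EXTREME
is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the four hints; NOT the real RWF_* bits. */
enum write_life {
	WRITE_LIFE_NONE = 0,	/* no hint given */
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/*
 * Map an application's estimate of how long data will live (seconds)
 * to a relative hint. The cut-offs are arbitrary and application
 * chosen; only the relative ordering of the results matters.
 */
enum write_life pick_write_life(uint64_t expected_secs)
{
	if (expected_secs == 0)
		return WRITE_LIFE_NONE;		/* unknown: pass no hint */
	if (expected_secs < 60)
		return WRITE_LIFE_SHORT;
	if (expected_secs < 3600)
		return WRITE_LIFE_MEDIUM;
	if (expected_secs < 86400)
		return WRITE_LIFE_LONG;
	return WRITE_LIFE_EXTREME;
}

/* Nonzero if data hinted 'a' is expected to outlive data hinted 'b'. */
int outlives(enum write_life a, enum write_life b)
{
	/* "no hint" carries no ordering information */
	assert(a != WRITE_LIFE_NONE && b != WRITE_LIFE_NONE);
	return a > b;
}
```

An application would call pick_write_life() once per write (or per file)
and translate the result into whatever flag the kernel interface expects.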

A branch based on current master can be pulled
from here:

git://git.kernel.dk/linux-block write-stream.4

Changes since v3:

- Change any naming of stream ID to write hint.
- Various little API changes, suggested by Christoph
- Cleanup the NVMe bits, dump the debug info.
- Change NVMe to lazily allocate the streams.
- Various NVMe error handling improvements and command checking.

Changes since v2:

- Get rid of bio->bi_stream and replace with four request/bio flags.
  These map directly to the RWF_WRITE_* flags that the user passes in.
- Cleanup the NVMe stream setting.
- Drivers now responsible for updating the queue stream write counter,
  as they determine what stream to map a given flag to.

Changes since v1:

- Guard queue stream stats to ensure we don't mess up memory, if
  bio_stream() ever were to return a larger value than we support.
- NVMe: ensure we set the stream modulo the name space defined count.
- Cleanup the RWF_ and IOCB_ flags. Set aside 4 bits, and just store
  the stream value in there. This makes the passing of stream ID from
  RWF_ space to IOCB_ (and IOCB_ to bio) more efficient, and cleans it
  up in general.
- Kill the block internal definitions of the stream type, we don't need
  them anymore. See above.
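
The first two v1 items above can be pictured with a small user-space
sketch. This is not the kernel code: MAX_STREAMS, struct stream_stats,
account_stream_write() and map_stream() are made-up names, used only to
show a bounds-checked counter update and a modulo fold onto the number of
streams a name space reports.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STREAMS 4	/* hypothetical size of the stats array */

struct stream_stats {
	uint64_t bytes[MAX_STREAMS];	/* bytes written per stream */
};

/* Account a write; refuse stream values the stats array cannot hold. */
int account_stream_write(struct stream_stats *st, unsigned int stream,
			 uint64_t len)
{
	if (stream >= MAX_STREAMS)
		return -1;	/* out of range: skip, do not touch memory */
	st->bytes[stream] += len;
	return 0;
}

/* Fold a requested stream onto the count the name space reports. */
unsigned int map_stream(unsigned int stream, unsigned int ns_streams)
{
	assert(ns_streams > 0);	/* caller must not map onto zero streams */
	return stream % ns_streams;
}
```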

-- 
Jens Axboe

* [PATCHSET v7] Add support for write life time hints
@ 2017-06-17 19:59 Jens Axboe
  2017-06-17 19:59 ` [PATCH 04/11] fs: add support for allowing applications to pass in " Jens Axboe
  0 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2017-06-17 19:59 UTC
  To: linux-fsdevel, linux-block; +Cc: adilger, hch, martin.petersen

A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:

- For NVMe, this feature is ratified and released with the NVMe 1.3
  spec. Devices implementing Directives can expose multiple streams.
  Separating data written into streams based on life time can
  drastically reduce the write amplification. This helps device
  endurance, and increases performance. Testing just performed
  internally at Facebook with these patches showed up to a 25% reduction
  in NAND writes in a RocksDB setup.

- Software caching solutions can make more intelligent decisions
  on how and where to place data.

Contrary to previous patches, we're not exposing numeric stream values anymore.
I've previously advocated for just doing a set of hints that makes sense
instead. See the coverage from the LSFMM summit this year:

https://lwn.net/Articles/717755/

This patchset attempts to do that. We define 4 flags for the pwritev2
system call:

RWF_WRITE_LIFE_SHORT	Data written with this flag is expected to have
			a high overwrite rate, i.e. a short life time.

RWF_WRITE_LIFE_MEDIUM	Longer life time than SHORT

RWF_WRITE_LIFE_LONG	Longer life time than MEDIUM

RWF_WRITE_LIFE_EXTREME	Longer life time than LONG

The idea is that these are relative values, so an application can
use them as they see fit. The underlying device can then place
data appropriately, or be free to ignore the hint. It's just a hint.

Similarly, to query and set these values on the side, there's now
an fcntl based interface. This exposes the WRITE_LIFE_* values to
userspace, and defines F_{GET,SET}_RW_HINT commands to get and set
them as well.
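
For orientation, here is a hedged user-space sketch of an
F_{GET,SET}_RW_HINT call. It follows the form that was later merged into
mainline Linux (4.13), where the fcntl() argument is a pointer to a
uint64_t; the calling convention in this particular iteration of the
patches may differ. set_write_hint() and get_write_hint() are hypothetical
wrapper names, and the #ifndef fallbacks only matter when the libc headers
predate the interface.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/* Fallbacks for older headers; values as in mainline <linux/fcntl.h>. */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE	1024
#endif
#ifndef F_GET_RW_HINT
#define F_GET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 11)
#endif
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT	2
#define RWH_WRITE_LIFE_MEDIUM	3
#define RWH_WRITE_LIFE_LONG	4
#define RWH_WRITE_LIFE_EXTREME	5
#endif

/* Set the write life time hint on the file behind fd; 0 or -errno. */
int set_write_hint(int fd, uint64_t hint)
{
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		return -errno;
	return 0;
}

/* Read the current hint back into *hint; 0 or -errno. */
int get_write_hint(int fd, uint64_t *hint)
{
	assert(hint != NULL);
	if (fcntl(fd, F_GET_RW_HINT, hint) < 0)
		return -errno;
	return 0;
}
```

A caller would typically do set_write_hint(fd, RWH_WRITE_LIFE_SHORT) once
after open() and treat any error as "hint not supported", since the hint
is advisory anyway.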

A branch based on current master can be pulled
from here:

git://git.kernel.dk/linux-block write-stream.7

Changes since v6:

- Rewrite NVMe write stream assignment
- Change NVMe stream assignment to be per-controller, not per-ns. Then
  we can use the same IDs across name spaces, and we don't have to do
  lazy setup of streams.
- If streams are enabled on nvme, set io min/opt and discard
  granularity based on the stream params reported.
- Fixup F_SET_RW_HINT definition, it was 20, should have been 12.

Changes since v5:

- Change enum write_hint to enum rw_hint.
- Change fcntl() interface to be read/write generic
- Bring enum rw_hint all the way to bio/request
- Change references to streams in changelogs and debugfs interface
- Rebase to master to resolve blkdev.h conflict
- Reshuffle patches so the WRITE_LIFE_* hints and type come first. Allowed
  me to merge two block patches as well.

Changes since v4:

- Add enum write_hint and the WRITE_HINT_* values. This is what we
  use internally (until transformed to req/bio flags), and what is
  exposed to user space with the fcntl() interface. Maps directly
  to the RWF_WRITE_LIFE_* values.
- Add fcntl() interface for getting/setting hint values.
- Get rid of inode ->i_write_hint, encode the 3 bits of hint info
  in the inode flags instead.
- Allow a write with no hint to clear the old hint. Previously we
  only changed the hint if a new valid hint was given, not if no
  hint was passed in.
- Shrink flag space grabbed from 4 to 3 bits for RWF_* and the inode
  flags.
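
To make the 3-bit encoding concrete, here is a hedged user-space sketch of
packing a hint into a flags word. HINT_SHIFT, HINT_MASK and both helpers
are hypothetical; the bit position the patches actually use in the inode
flags is not given in this mail. It also shows the "no hint clears the old
hint" behaviour from the list above: storing 0 wipes the field.

```c
#include <assert.h>
#include <stdint.h>

#define HINT_SHIFT	16u	/* hypothetical bit position */
#define HINT_BITS	3u	/* hint field width, per the changelog */
#define HINT_MASK	(((1u << HINT_BITS) - 1u) << HINT_SHIFT)

/* Store 'hint' in the flags word; a hint of 0 clears any earlier hint. */
uint32_t flags_set_hint(uint32_t flags, unsigned int hint)
{
	assert(hint < (1u << HINT_BITS));	/* must fit in 3 bits */
	return (flags & ~HINT_MASK) | ((uint32_t)hint << HINT_SHIFT);
}

/* Extract the hint previously stored by flags_set_hint(). */
unsigned int flags_get_hint(uint32_t flags)
{
	return (flags & HINT_MASK) >> HINT_SHIFT;
}
```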

Changes since v3:

- Change any naming of stream ID to write hint.
- Various little API changes, suggested by Christoph
- Cleanup the NVMe bits, dump the debug info.
- Change NVMe to lazily allocate the streams.
- Various NVMe error handling improvements and command checking.

Changes since v2:

- Get rid of bio->bi_stream and replace with four request/bio flags.
  These map directly to the RWF_WRITE_* flags that the user passes in.
- Cleanup the NVMe stream setting.
- Drivers now responsible for updating the queue stream write counter,
  as they determine what stream to map a given flag to.

Changes since v1:

- Guard queue stream stats to ensure we don't mess up memory, if
  bio_stream() ever were to return a larger value than we support.
- NVMe: ensure we set the stream modulo the name space defined count.
- Cleanup the RWF_ and IOCB_ flags. Set aside 4 bits, and just store
  the stream value in there. This makes the passing of stream ID from
  RWF_ space to IOCB_ (and IOCB_ to bio) more efficient, and cleans it
  up in general.
- Kill the block internal definitions of the stream type, we don't need
  them anymore. See above.

-- 
Jens Axboe

* [PATCHSET v6] Add support for write life time hints
@ 2017-06-16 17:24 Jens Axboe
  2017-06-16 17:24 ` [PATCH 04/11] fs: add support for allowing applications to pass in " Jens Axboe
  0 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2017-06-16 17:24 UTC
  To: linux-fsdevel, linux-block; +Cc: adilger, hch, martin.petersen

A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:

- For NVMe, this feature is ratified and released with the NVMe 1.3
  spec. Devices implementing Directives can expose multiple streams.
  Separating data written into streams based on life time can
  drastically reduce the write amplification. This helps device
  endurance, and increases performance. Testing just performed
  internally at Facebook with these patches showed up to a 25% reduction
  in NAND writes in a RocksDB setup.

- Software caching solutions can make more intelligent decisions
  on how and where to place data.

Contrary to previous patches, we're not exposing numeric stream values anymore.
I've previously advocated for just doing a set of hints that makes sense
instead. See the coverage from the LSFMM summit this year:

https://lwn.net/Articles/717755/

This patchset attempts to do that. We define 4 flags for the pwritev2
system call:

RWF_WRITE_LIFE_SHORT	Data written with this flag is expected to have
			a high overwrite rate, i.e. a short life time.

RWF_WRITE_LIFE_MEDIUM	Longer life time than SHORT

RWF_WRITE_LIFE_LONG	Longer life time than MEDIUM

RWF_WRITE_LIFE_EXTREME	Longer life time than LONG

The idea is that these are relative values, so an application can
use them as they see fit. The underlying device can then place
data appropriately, or be free to ignore the hint. It's just a hint.

Similarly, to query and set these values on the side, there's now
an fcntl based interface. This exposes the WRITE_LIFE_* values to
userspace, and defines F_{GET,SET}_WRITE_LIFE commands to get and
set them as well.

A branch based on current master can be pulled
from here:

git://git.kernel.dk/linux-block write-stream.5

Changes since v5:

- Change enum write_hint to enum rw_hint.
- Change fcntl() interface to be read/write generic
- Bring enum rw_hint all the way to bio/request
- Change references to streams in changelogs and debugfs interface
- Rebase to master to resolve blkdev.h conflict
- Reshuffle patches so the WRITE_LIFE_* hints and type come first. Allowed
  me to merge two block patches as well.

Changes since v4:

- Add enum write_hint and the WRITE_HINT_* values. This is what we
  use internally (until transformed to req/bio flags), and what is
  exposed to user space with the fcntl() interface. Maps directly
  to the RWF_WRITE_LIFE_* values.
- Add fcntl() interface for getting/setting hint values.
- Get rid of inode ->i_write_hint, encode the 3 bits of hint info
  in the inode flags instead.
- Allow a write with no hint to clear the old hint. Previously we
  only changed the hint if a new valid hint was given, not if no
  hint was passed in.
- Shrink flag space grabbed from 4 to 3 bits for RWF_* and the inode
  flags.

Changes since v3:

- Change any naming of stream ID to write hint.
- Various little API changes, suggested by Christoph
- Cleanup the NVMe bits, dump the debug info.
- Change NVMe to lazily allocate the streams.
- Various NVMe error handling improvements and command checking.

Changes since v2:

- Get rid of bio->bi_stream and replace with four request/bio flags.
  These map directly to the RWF_WRITE_* flags that the user passes in.
- Cleanup the NVMe stream setting.
- Drivers now responsible for updating the queue stream write counter,
  as they determine what stream to map a given flag to.

Changes since v1:

- Guard queue stream stats to ensure we don't mess up memory, if
  bio_stream() ever were to return a larger value than we support.
- NVMe: ensure we set the stream modulo the name space defined count.
- Cleanup the RWF_ and IOCB_ flags. Set aside 4 bits, and just store
  the stream value in there. This makes the passing of stream ID from
  RWF_ space to IOCB_ (and IOCB_ to bio) more efficient, and cleans it
  up in general.
- Kill the block internal definitions of the stream type, we don't need
  them anymore. See above.

-- 
Jens Axboe

* [PATCHSET v3] Add support for write life time hints
@ 2017-06-14 19:05 Jens Axboe
  2017-06-14 19:05 ` [PATCH 04/11] fs: add support for allowing applications to pass in " Jens Axboe
  0 siblings, 1 reply; 56+ messages in thread
From: Jens Axboe @ 2017-06-14 19:05 UTC
  To: linux-fsdevel, linux-block; +Cc: adilger, hch, martin.petersen

A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:

- With NVMe 1.3 compliant devices, the device can expose multiple
  streams. Separating data written into streams based on life time
  can drastically reduce the write amplification. This helps device
  endurance, and increases performance. Testing just performed
  internally at Facebook with these patches showed up to a 25%
  reduction in NAND writes in a RocksDB setup.

- Software caching solutions can make more intelligent decisions
  on how and where to place data.

Contrary to previous patches, we're not exposing numeric stream values anymore.
I've previously advocated for just doing a set of hints that makes sense
instead. See the coverage from the LSFMM summit this year:

https://lwn.net/Articles/717755/

This patchset attempts to do that. We define 4 flags for the pwritev2
system call:

RWF_WRITE_LIFE_SHORT	Data written with this flag is expected to have
			a high overwrite rate, i.e. a short life time.

RWF_WRITE_LIFE_MEDIUM	Longer life time than SHORT

RWF_WRITE_LIFE_LONG	Longer life time than MEDIUM

RWF_WRITE_LIFE_EXTREME	Longer life time than LONG

The idea is that these are relative values, so an application can
use them as they see fit. The underlying device can then place
data appropriately, or be free to ignore the hint. It's just a hint.

A branch based on current master can be pulled
from here:

git://git.kernel.dk/linux-block write-stream.3

Changes since v2:

- Get rid of bio->bi_stream and replace with four request/bio flags.
  These map directly to the RWF_WRITE_* flags that the user passes in.
- Cleanup the NVMe stream setting.
- Drivers now responsible for updating the queue stream write counter,
  as they determine what stream to map a given flag to.

Changes since v1:

- Guard queue stream stats to ensure we don't mess up memory, if
  bio_stream() ever were to return a larger value than we support.
- NVMe: ensure we set the stream modulo the name space defined count.
- Cleanup the RWF_ and IOCB_ flags. Set aside 4 bits, and just store
  the stream value in there. This makes the passing of stream ID from
  RWF_ space to IOCB_ (and IOCB_ to bio) more efficient, and cleans it
  up in general.
- Kill the block internal definitions of the stream type, we don't need
  them anymore. See above.

-- 
Jens Axboe


end of thread, other threads: [~2017-06-20 13:00 UTC]

Thread overview: 56+ messages
2017-06-15  3:45 [PATCHSET v4] Add support for write life time hints Jens Axboe
2017-06-15  3:45 ` [PATCH 01/11] block: add support for carrying stream information in a bio Jens Axboe
2017-06-15  3:45 ` [PATCH 02/11] blk-mq: expose stream write stats through debugfs Jens Axboe
2017-06-15  8:16   ` Christoph Hellwig
2017-06-15 14:24     ` Jens Axboe
2017-06-15  3:45 ` [PATCH 03/11] fs: add support for an inode to carry stream related data Jens Axboe
2017-06-15  8:17   ` Christoph Hellwig
2017-06-15 14:22     ` Jens Axboe
2017-06-15  3:45 ` [PATCH 04/11] fs: add support for allowing applications to pass in write life time hints Jens Axboe
2017-06-15  4:15   ` Darrick J. Wong
2017-06-15  4:33     ` Jens Axboe
2017-06-15  8:19       ` Christoph Hellwig
2017-06-15 14:21         ` Jens Axboe
2017-06-15 15:23           ` Jens Axboe
2017-06-16  7:30             ` Christoph Hellwig
2017-06-16 14:35               ` Jens Axboe
2017-06-16  7:33           ` Christoph Hellwig
2017-06-16 14:35             ` Jens Axboe
2017-06-16 14:53               ` Jens Axboe
2017-06-16 15:52               ` Christoph Hellwig
2017-06-16 15:59                 ` Jens Axboe
2017-06-16 16:14                   ` Jens Axboe
2017-06-16 18:00                     ` Christoph Hellwig
2017-06-16 18:02                   ` Christoph Hellwig
2017-06-16 19:35                     ` Jens Axboe
2017-06-15 11:24     ` Al Viro
2017-06-15  3:45 ` [PATCH 05/11] block: add helpers for setting/checking write hint validity Jens Axboe
2017-06-15  3:45 ` [PATCH 06/11] fs: add O_DIRECT support for sending down bio stream information Jens Axboe
2017-06-15  3:45 ` [PATCH 07/11] fs: add support for buffered writeback to pass down write hints Jens Axboe
2017-06-15  3:45 ` [PATCH 08/11] ext4: add support for passing in write hints for buffered writes Jens Axboe
2017-06-15  3:45 ` [PATCH 09/11] xfs: " Jens Axboe
2017-06-15  3:45 ` [PATCH 10/11] btrfs: " Jens Axboe
2017-06-15  3:45 ` [PATCH 11/11] nvme: add support for streams and directives Jens Axboe
2017-06-15  8:12 ` [PATCHSET v4] Add support for write life time hints Christoph Hellwig
2017-06-15 14:23   ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2017-06-17 19:59 [PATCHSET v7] " Jens Axboe
2017-06-17 19:59 ` [PATCH 04/11] fs: add support for allowing applications to pass in " Jens Axboe
2017-06-19  6:27   ` Christoph Hellwig
2017-06-19 14:56     ` Jens Axboe
2017-06-19 16:02       ` Jens Axboe
2017-06-19 18:58         ` Christoph Hellwig
2017-06-19 19:00           ` Jens Axboe
2017-06-19 19:10             ` Jens Axboe
2017-06-19 20:33               ` Jens Axboe
2017-06-20  2:06                 ` Jens Axboe
2017-06-20  8:57                 ` Christoph Hellwig
2017-06-20 12:43                   ` Jens Axboe
2017-06-20 12:43                     ` Christoph Hellwig
2017-06-20 12:45                       ` Jens Axboe
2017-06-20 12:47                         ` Christoph Hellwig
2017-06-20 12:51                           ` Jens Axboe
2017-06-20 12:56                             ` Christoph Hellwig
2017-06-20 13:00                               ` Jens Axboe
2017-06-16 17:24 [PATCHSET v6] Add support for " Jens Axboe
2017-06-16 17:24 ` [PATCH 04/11] fs: add support for allowing applications to pass in " Jens Axboe
2017-06-14 19:05 [PATCHSET v3] Add support for " Jens Axboe
2017-06-14 19:05 ` [PATCH 04/11] fs: add support for allowing applications to pass in " Jens Axboe
2017-06-14 20:26   ` Christoph Hellwig
2017-06-14 20:37     ` Jens Axboe
