* [PATCH v4 0/5] Write-placement hints and FDP
[not found] <CGME20240826171409epcas5p306ba210a9815e202556778a4c105b440@epcas5p3.samsung.com>
@ 2024-08-26 17:06 ` Kanchan Joshi
[not found] ` <CGME20240826171413epcas5p3f62c2cc57b50d6df8fa66af5fe5996c5@epcas5p3.samsung.com>
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-26 17:06 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz, Kanchan Joshi
Current write-hint infrastructure supports 6 temperature-based data life
hints.
The series extends the infrastructure with a new temperature-agnostic
placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
send the hint type/value on file. See patch #3 commit description for
the details.
Overall this creates 128 placement hint values [*] that users can pass.
Patch #5 adds the ability to map these new hint values to nvme-specific
placement-identifiers.
Patch #4 restricts SCSI to use only life hint values.
Patch #1 and #2 are simple prep patches.
[*] While the user-interface can support more, this limit is due to the
in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
hole in the inode, but the code had this comment too:
/* 32-bit hole reserved for expanding i_fsnotify_mask */
Not must, but it will be good to know if a byte (or two) can be used
here.
Changes since v3:
- 4 new patches to introduce write-placement hints
- Make nvme patch use the placement hints rather than write-life hints
Changes since v2:
- Base it on nvme-6.11 and resolve a merge conflict
Changes since v1:
- Reduce the fetched plids from 128 to 6 (Keith)
- Use struct_size for a calculation (Keith)
- Handle robot/sparse warning
Kanchan Joshi (4):
fs, block: refactor enum rw_hint
fcntl: rename rw_hint_* to rw_life_hint_*
fcntl: add F_{SET/GET}_RW_HINT_EX
nvme: enable FDP support
Nitesh Shetty (1):
sd: limit to use write life hints
drivers/nvme/host/core.c | 81 ++++++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 4 ++
drivers/scsi/sd.c | 7 ++--
fs/buffer.c | 4 +-
fs/f2fs/f2fs.h | 4 +-
fs/f2fs/segment.c | 4 +-
fs/fcntl.c | 79 ++++++++++++++++++++++++++++++++++---
include/linux/blk-mq.h | 2 +-
include/linux/blk_types.h | 2 +-
include/linux/fs.h | 2 +-
include/linux/nvme.h | 19 +++++++++
include/linux/rw_hint.h | 20 +++++++---
include/uapi/linux/fcntl.h | 14 +++++++
13 files changed, 218 insertions(+), 24 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v4 1/5] fs, block: refactor enum rw_hint
[not found] ` <CGME20240826171413epcas5p3f62c2cc57b50d6df8fa66af5fe5996c5@epcas5p3.samsung.com>
@ 2024-08-26 17:06 ` Kanchan Joshi
2024-08-26 17:44 ` Bart Van Assche
2024-08-30 12:17 ` Bart Van Assche
0 siblings, 2 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-26 17:06 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz, Kanchan Joshi
Rename enum rw_hint to rw_life_hint.
Change i_write_hint (in inode), bi_write_hint (in bio) and write_hint
(in request) to use u8 data-type rather than this enum.
This is in preparation to introduce a new write hint type.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
fs/buffer.c | 4 ++--
fs/f2fs/f2fs.h | 4 ++--
fs/f2fs/segment.c | 4 ++--
include/linux/blk-mq.h | 2 +-
include/linux/blk_types.h | 2 +-
include/linux/fs.h | 2 +-
include/linux/rw_hint.h | 9 ++-------
7 files changed, 11 insertions(+), 16 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index e55ad471c530..0c6bc9b7d4ad 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -55,7 +55,7 @@
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
- enum rw_hint hint, struct writeback_control *wbc);
+ u8 hint, struct writeback_control *wbc);
#define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
@@ -2778,7 +2778,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
}
static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
- enum rw_hint write_hint,
+ u8 write_hint,
struct writeback_control *wbc)
{
const enum req_op op = opf & REQ_OP_MASK;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ac19c61f0c3e..b3022dc94a1f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3756,8 +3756,8 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
int __init f2fs_create_segment_manager_caches(void);
void f2fs_destroy_segment_manager_caches(void);
-int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_hint hint);
-enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
+int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_life_hint hint);
+enum rw_life_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
enum page_type type, enum temp_type temp);
unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
unsigned int segno);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 78c3198a6308..794050f4cdbf 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3381,7 +3381,7 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
return err;
}
-int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_hint hint)
+int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_life_hint hint)
{
if (F2FS_OPTION(sbi).active_logs == 2)
return CURSEG_HOT_DATA;
@@ -3425,7 +3425,7 @@ int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_hint hint)
* WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
* WRITE_LIFE_LONG " WRITE_LIFE_LONG
*/
-enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
+enum rw_life_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
enum page_type type, enum temp_type temp)
{
switch (type) {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8d304b1d16b1..1e5ce84ccf0a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -159,7 +159,7 @@ struct request {
struct blk_crypto_keyslot *crypt_keyslot;
#endif
- enum rw_hint write_hint;
+ u8 write_hint;
unsigned short ioprio;
enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 36ed96133217..446c847bb3b3 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -216,7 +216,7 @@ struct bio {
*/
unsigned short bi_flags; /* BIO_* below */
unsigned short bi_ioprio;
- enum rw_hint bi_write_hint;
+ u8 bi_write_hint;
blk_status_t bi_status;
atomic_t __bi_remaining;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fb0426f349fc..f9a7a2a80661 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -674,7 +674,7 @@ struct inode {
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
unsigned short i_bytes;
u8 i_blkbits;
- enum rw_hint i_write_hint;
+ u8 i_write_hint;
blkcnt_t i_blocks;
#ifdef __NEED_I_SIZE_ORDERED
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
index 309ca72f2dfb..e17fd9fa65d4 100644
--- a/include/linux/rw_hint.h
+++ b/include/linux/rw_hint.h
@@ -7,18 +7,13 @@
#include <uapi/linux/fcntl.h>
/* Block storage write lifetime hint values. */
-enum rw_hint {
+enum rw_life_hint {
WRITE_LIFE_NOT_SET = RWH_WRITE_LIFE_NOT_SET,
WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
-} __packed;
-
-/* Sparse ignores __packed annotations on enums, hence the #ifndef below. */
-#ifndef __CHECKER__
-static_assert(sizeof(enum rw_hint) == 1);
-#endif
+};
#endif /* _LINUX_RW_HINT_H */
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 2/5] fcntl: rename rw_hint_* to rw_life_hint_*
[not found] ` <CGME20240826171417epcas5p1c6dbe318c43324116647dae2129b7eb3@epcas5p1.samsung.com>
@ 2024-08-26 17:06 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-26 17:06 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz, Kanchan Joshi
F_GET/SET_RW_HINT fcntl handlers query/set write life hints.
Rename the handlers/helpers to be explicit that write life hints are
being handled.
This is in preparation to introduce a new interface that supports more
than one type of write hint.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
fs/fcntl.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 300e5d9ad913..46cf08f67278 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -269,7 +269,7 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
}
#endif
-static bool rw_hint_valid(u64 hint)
+static bool rw_life_hint_valid(u64 hint)
{
BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
@@ -291,7 +291,7 @@ static bool rw_hint_valid(u64 hint)
}
}
-static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
+static long fcntl_get_rw_life_hint(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct inode *inode = file_inode(file);
@@ -303,7 +303,7 @@ static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
return 0;
}
-static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
+static long fcntl_set_rw_life_hint(struct file *file, unsigned int cmd,
unsigned long arg)
{
struct inode *inode = file_inode(file);
@@ -312,7 +312,7 @@ static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
if (copy_from_user(&hint, argp, sizeof(hint)))
return -EFAULT;
- if (!rw_hint_valid(hint))
+ if (!rw_life_hint_valid(hint))
return -EINVAL;
WRITE_ONCE(inode->i_write_hint, hint);
@@ -449,10 +449,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
err = memfd_fcntl(filp, cmd, argi);
break;
case F_GET_RW_HINT:
- err = fcntl_get_rw_hint(filp, cmd, arg);
+ err = fcntl_get_rw_life_hint(filp, cmd, arg);
break;
case F_SET_RW_HINT:
- err = fcntl_set_rw_hint(filp, cmd, arg);
+ err = fcntl_set_rw_life_hint(filp, cmd, arg);
break;
default:
break;
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 3/5] fcntl: add F_{SET/GET}_RW_HINT_EX
[not found] ` <CGME20240826171422epcas5p2fa8f07dfee9395745f1833a17fd89ae0@epcas5p2.samsung.com>
@ 2024-08-26 17:06 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-26 17:06 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz, Kanchan Joshi,
Nitesh Shetty
This is similar to existing F_{SET/GET}_RW_HINT but more
generic/extensible.
F_SET/GET_RW_HINT_EX take a pointer to a struct rw_hint_ex as argument:
struct rw_hint_ex {
__u8 type;
__u8 pad[7];
__u64 val;
};
With F_SET_RW_HINT_EX, the user passes the hint type and its value.
Hint type can be either life hint (TYPE_RW_LIFE_HINT) or placement hint
(TYPE_RW_PLACEMENT_HINT). The interface allows to add more hint types in
future.
Valid values for life hints are same as enforced by existing
fcntl(F_SET_RW_HINT).
Valid values for placement hints are between 0 to 127, both inclusive.
The inode retains either the life hint or the placement hint, whichever
is set later. The set hint type and its value can be queried by
F_GET_RW_HINT_EX.
The i_write_hint field of the inode is a 1-byte field. Use the most
significant bit as the hint type. This bit is set for placement hint.
For life hint, this bit remains zero.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
fs/fcntl.c | 67 ++++++++++++++++++++++++++++++++++++++
include/linux/rw_hint.h | 13 ++++++++
include/uapi/linux/fcntl.h | 14 ++++++++
3 files changed, 94 insertions(+)
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 46cf08f67278..d82fd4142104 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -291,6 +291,14 @@ static bool rw_life_hint_valid(u64 hint)
}
}
+static inline bool rw_placement_hint_valid(u64 val)
+{
+ if (val <= MAX_PLACEMENT_HINT_VAL)
+ return true;
+
+ return false;
+}
+
static long fcntl_get_rw_life_hint(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -327,6 +335,59 @@ static long fcntl_set_rw_life_hint(struct file *file, unsigned int cmd,
return 0;
}
+static long fcntl_get_rw_hint_ex(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct rw_hint_ex __user *rw_hint_ex_p = (void __user *)arg;
+ struct rw_hint_ex rwh = {};
+ struct inode *inode = file_inode(file);
+ u8 hint = READ_ONCE(inode->i_write_hint);
+
+ rwh.type = WRITE_HINT_TYPE(hint);
+ rwh.val = WRITE_HINT_VAL(hint);
+
+ if (copy_to_user(rw_hint_ex_p, &rwh, sizeof(rwh)))
+ return -EFAULT;
+
+ return 0;
+}
+
+static long fcntl_set_rw_hint_ex(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct rw_hint_ex __user *rw_hint_ex_p = (void __user *)arg;
+ struct rw_hint_ex rwh;
+ struct inode *inode = file_inode(file);
+ u64 hint;
+ int i;
+
+ if (copy_from_user(&rwh, rw_hint_ex_p, sizeof(rwh)))
+ return -EFAULT;
+ for (i = 0; i < ARRAY_SIZE(rwh.pad); i++)
+ if (rwh.pad[i])
+ return -EINVAL;
+ switch (rwh.type) {
+ case TYPE_RW_LIFE_HINT:
+ if (!rw_life_hint_valid(rwh.val))
+ return -EINVAL;
+ hint = rwh.val;
+ break;
+ case TYPE_RW_PLACEMENT_HINT:
+ if (!rw_placement_hint_valid(rwh.val))
+ return -EINVAL;
+ hint = PLACEMENT_HINT_TYPE | rwh.val;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ WRITE_ONCE(inode->i_write_hint, hint);
+ if (file->f_mapping->host != inode)
+ WRITE_ONCE(file->f_mapping->host->i_write_hint, hint);
+
+ return 0;
+}
+
/* Is the file descriptor a dup of the file? */
static long f_dupfd_query(int fd, struct file *filp)
{
@@ -454,6 +515,12 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
case F_SET_RW_HINT:
err = fcntl_set_rw_life_hint(filp, cmd, arg);
break;
+ case F_GET_RW_HINT_EX:
+ err = fcntl_get_rw_hint_ex(filp, cmd, arg);
+ break;
+ case F_SET_RW_HINT_EX:
+ err = fcntl_set_rw_hint_ex(filp, cmd, arg);
+ break;
default:
break;
}
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
index e17fd9fa65d4..611c51d23d96 100644
--- a/include/linux/rw_hint.h
+++ b/include/linux/rw_hint.h
@@ -16,4 +16,17 @@ enum rw_life_hint {
WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
};
+#define WRITE_HINT_TYPE_BIT BIT(7)
+#define WRITE_HINT_VAL_MASK (WRITE_HINT_TYPE_BIT - 1)
+#define WRITE_HINT_TYPE(h) (((h) & WRITE_HINT_TYPE_BIT) ? \
+ TYPE_RW_PLACEMENT_HINT : TYPE_RW_LIFE_HINT)
+#define WRITE_HINT_VAL(h) ((h) & WRITE_HINT_VAL_MASK)
+
+#define WRITE_PLACEMENT_HINT(h) (((h) & WRITE_HINT_TYPE_BIT) ? \
+ WRITE_HINT_VAL(h) : 0)
+#define WRITE_LIFE_HINT(h) (((h) & WRITE_HINT_TYPE_BIT) ? \
+ 0 : WRITE_HINT_VAL(h))
+
+#define PLACEMENT_HINT_TYPE WRITE_HINT_TYPE_BIT
+#define MAX_PLACEMENT_HINT_VAL WRITE_HINT_VAL_MASK
#endif /* _LINUX_RW_HINT_H */
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index c0bcc185fa48..dfe77fa86776 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -57,6 +57,8 @@
#define F_SET_RW_HINT (F_LINUX_SPECIFIC_BASE + 12)
#define F_GET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 13)
#define F_SET_FILE_RW_HINT (F_LINUX_SPECIFIC_BASE + 14)
+#define F_GET_RW_HINT_EX (F_LINUX_SPECIFIC_BASE + 15)
+#define F_SET_RW_HINT_EX (F_LINUX_SPECIFIC_BASE + 16)
/*
* Valid hint values for F_{GET,SET}_RW_HINT. 0 is "not set", or can be
@@ -76,6 +78,18 @@
*/
#define RWF_WRITE_LIFE_NOT_SET RWH_WRITE_LIFE_NOT_SET
+enum rw_hint_type {
+ TYPE_RW_LIFE_HINT,
+ TYPE_RW_PLACEMENT_HINT
+};
+
+/* Exchange information with F_{GET/SET}_RW_HINT fcntl */
+struct rw_hint_ex {
+ __u8 type;
+ __u8 pad[7];
+ __u64 val;
+};
+
/*
* Types of directory notifications that may be requested.
*/
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 4/5] sd: limit to use write life hints
[not found] ` <CGME20240826171426epcas5p13c5ffabd6a05ee357bf4e9f78bc5de44@epcas5p1.samsung.com>
@ 2024-08-26 17:06 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-26 17:06 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz, Nitesh Shetty,
Kanchan Joshi
From: Nitesh Shetty <nj.shetty@samsung.com>
The incoming hint value maybe either life hint or placement hint.
Make SCSI interpret only temperature-based write life hints.
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
drivers/scsi/sd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 699f4f9674d9..32b8a841c497 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1191,8 +1191,8 @@ static u8 sd_group_number(struct scsi_cmnd *cmd)
if (!sdkp->rscs)
return 0;
- return min3((u32)rq->write_hint, (u32)sdkp->permanent_stream_count,
- 0x3fu);
+ return min3((u32)WRITE_LIFE_HINT(rq->write_hint),
+ (u32)sdkp->permanent_stream_count, 0x3fu);
}
static blk_status_t sd_setup_rw32_cmnd(struct scsi_cmnd *cmd, bool write,
@@ -1390,7 +1390,8 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
protect | fua, dld);
} else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
- sdp->use_10_for_rw || protect || rq->write_hint) {
+ sdp->use_10_for_rw || protect ||
+ WRITE_LIFE_HINT(rq->write_hint)) {
ret = sd_setup_rw10_cmnd(cmd, write, lba, nr_blocks,
protect | fua);
} else {
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 5/5] nvme: enable FDP support
[not found] ` <CGME20240826171430epcas5p3d8e34a266ced7b3ea0df2a11b83292ae@epcas5p3.samsung.com>
@ 2024-08-26 17:06 ` Kanchan Joshi
2024-09-06 16:04 ` Keith Busch
0 siblings, 1 reply; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-26 17:06 UTC (permalink / raw)
To: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz, Kanchan Joshi,
Nitesh Shetty, Hui Qi
Flexible Data Placement (FDP), as ratified in TP 4146a, allows the host
to control the placement of logical blocks so as to reduce the SSD WAF.
Userspace can send the data placement information using the write hints.
Fetch the placement-identifiers if the device supports FDP.
The incoming placement hint is mapped to a placement-identifier, which
in turn is set in the DSPEC field of the write command.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Hui Qi <hui81.qi@samsung.com>
---
drivers/nvme/host/core.c | 81 ++++++++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 4 ++
include/linux/nvme.h | 19 ++++++++++
3 files changed, 104 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 33fa01c599ad..f93abd7fb163 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -43,6 +43,20 @@ struct nvme_ns_info {
bool is_removed;
};
+struct nvme_fdp_ruh_status_desc {
+ u16 pid;
+ u16 ruhid;
+ u32 earutr;
+ u64 ruamw;
+ u8 rsvd16[16];
+};
+
+struct nvme_fdp_ruh_status {
+ u8 rsvd0[14];
+ __le16 nruhsd;
+ struct nvme_fdp_ruh_status_desc ruhsd[];
+};
+
unsigned int admin_timeout = 60;
module_param(admin_timeout, uint, 0644);
MODULE_PARM_DESC(admin_timeout, "timeout in seconds for admin commands");
@@ -656,6 +670,7 @@ static void nvme_free_ns_head(struct kref *ref)
ida_free(&head->subsys->ns_ida, head->instance);
cleanup_srcu_struct(&head->srcu);
nvme_put_subsystem(head->subsys);
+ kfree(head->plids);
kfree(head);
}
@@ -958,6 +973,17 @@ static bool nvme_valid_atomic_write(struct request *req)
return true;
}
+static inline void nvme_assign_placement_id(struct nvme_ns *ns,
+ struct request *req,
+ struct nvme_command *cmd)
+{
+ u8 h = umin(ns->head->nr_plids - 1,
+ WRITE_PLACEMENT_HINT(req->write_hint));
+
+ cmd->rw.control |= cpu_to_le16(NVME_RW_DTYPE_DPLCMT);
+ cmd->rw.dsmgmt |= cpu_to_le32(ns->head->plids[h] << 16);
+}
+
static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
struct request *req, struct nvme_command *cmnd,
enum nvme_opcode op)
@@ -1077,6 +1103,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
break;
case REQ_OP_WRITE:
ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
+ if (!ret && ns->head->nr_plids)
+ nvme_assign_placement_id(ns, req, cmd);
break;
case REQ_OP_ZONE_APPEND:
ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
@@ -2113,6 +2141,52 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
return ret;
}
+static int nvme_fetch_fdp_plids(struct nvme_ns *ns, u32 nsid)
+{
+ struct nvme_command c = {};
+ struct nvme_fdp_ruh_status *ruhs;
+ struct nvme_fdp_ruh_status_desc *ruhsd;
+ int size, ret, i;
+
+refetch_plids:
+ size = struct_size(ruhs, ruhsd, ns->head->nr_plids);
+ ruhs = kzalloc(size, GFP_KERNEL);
+ if (!ruhs)
+ return -ENOMEM;
+
+ c.imr.opcode = nvme_cmd_io_mgmt_recv;
+ c.imr.nsid = cpu_to_le32(nsid);
+ c.imr.mo = 0x1;
+ c.imr.numd = cpu_to_le32((size >> 2) - 1);
+
+ ret = nvme_submit_sync_cmd(ns->queue, &c, ruhs, size);
+ if (ret)
+ goto out;
+
+ if (!ns->head->nr_plids) {
+ ns->head->nr_plids = le16_to_cpu(ruhs->nruhsd);
+ ns->head->nr_plids =
+ min_t(u16, ns->head->nr_plids, NVME_MAX_PLIDS);
+
+ if (!ns->head->nr_plids)
+ goto out;
+
+ kfree(ruhs);
+ goto refetch_plids;
+ }
+ ns->head->plids = kzalloc(ns->head->nr_plids * sizeof(u16), GFP_KERNEL);
+ if (!ns->head->plids)
+ return -ENOMEM;
+
+ for (i = 0; i < ns->head->nr_plids; i++) {
+ ruhsd = &ruhs->ruhsd[i];
+ ns->head->plids[i] = le16_to_cpu(ruhsd->pid);
+ }
+out:
+ kfree(ruhs);
+ return ret;
+}
+
static int nvme_update_ns_info_block(struct nvme_ns *ns,
struct nvme_ns_info *info)
{
@@ -2204,6 +2278,13 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
if (ret && !nvme_first_scan(ns->disk))
goto out;
}
+ if (ns->ctrl->ctratt & NVME_CTRL_ATTR_FDPS) {
+ ret = nvme_fetch_fdp_plids(ns, info->nsid);
+ if (ret)
+ dev_warn(ns->ctrl->device,
+ "FDP failure status:0x%x\n", ret);
+ }
+
ret = 0;
out:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ae5314d32943..7516823ff8dd 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -450,6 +450,8 @@ struct nvme_ns_ids {
u8 csi;
};
+#define NVME_MAX_PLIDS (MAX_PLACEMENT_HINT_VAL + 1)
+
/*
* Anchor structure for namespaces. There is one for each namespace in a
* NVMe subsystem that any of our controllers can see, and the namespace
@@ -471,6 +473,8 @@ struct nvme_ns_head {
struct kref ref;
bool shared;
bool passthru_err_log_enabled;
+ u16 nr_plids;
+ u16 *plids;
struct nvme_effects_log *effects;
u64 nuse;
unsigned ns_id;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 7b2ae2e43544..12d8db13b66e 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -275,6 +275,7 @@ enum nvme_ctrl_attr {
NVME_CTRL_ATTR_HID_128_BIT = (1 << 0),
NVME_CTRL_ATTR_TBKAS = (1 << 6),
NVME_CTRL_ATTR_ELBAS = (1 << 15),
+ NVME_CTRL_ATTR_FDPS = (1 << 19),
};
struct nvme_id_ctrl {
@@ -843,6 +844,7 @@ enum nvme_opcode {
nvme_cmd_resv_register = 0x0d,
nvme_cmd_resv_report = 0x0e,
nvme_cmd_resv_acquire = 0x11,
+ nvme_cmd_io_mgmt_recv = 0x12,
nvme_cmd_resv_release = 0x15,
nvme_cmd_zone_mgmt_send = 0x79,
nvme_cmd_zone_mgmt_recv = 0x7a,
@@ -864,6 +866,7 @@ enum nvme_opcode {
nvme_opcode_name(nvme_cmd_resv_register), \
nvme_opcode_name(nvme_cmd_resv_report), \
nvme_opcode_name(nvme_cmd_resv_acquire), \
+ nvme_opcode_name(nvme_cmd_io_mgmt_recv), \
nvme_opcode_name(nvme_cmd_resv_release), \
nvme_opcode_name(nvme_cmd_zone_mgmt_send), \
nvme_opcode_name(nvme_cmd_zone_mgmt_recv), \
@@ -1015,6 +1018,7 @@ enum {
NVME_RW_PRINFO_PRCHK_GUARD = 1 << 12,
NVME_RW_PRINFO_PRACT = 1 << 13,
NVME_RW_DTYPE_STREAMS = 1 << 4,
+ NVME_RW_DTYPE_DPLCMT = 2 << 4,
NVME_WZ_DEAC = 1 << 9,
};
@@ -1102,6 +1106,20 @@ struct nvme_zone_mgmt_recv_cmd {
__le32 cdw14[2];
};
+struct nvme_io_mgmt_recv_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 nsid;
+ __le64 rsvd2[2];
+ union nvme_data_ptr dptr;
+ __u8 mo;
+ __u8 rsvd11;
+ __u16 mos;
+ __le32 numd;
+ __le32 cdw12[4];
+};
+
enum {
NVME_ZRA_ZONE_REPORT = 0,
NVME_ZRASF_ZONE_REPORT_ALL = 0,
@@ -1822,6 +1840,7 @@ struct nvme_command {
struct nvmf_auth_receive_command auth_receive;
struct nvme_dbbuf dbbuf;
struct nvme_directive_cmd directive;
+ struct nvme_io_mgmt_recv_cmd imr;
};
};
--
2.25.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/5] fs, block: refactor enum rw_hint
2024-08-26 17:06 ` [PATCH v4 1/5] fs, block: refactor enum rw_hint Kanchan Joshi
@ 2024-08-26 17:44 ` Bart Van Assche
2024-08-27 5:12 ` Kanchan Joshi
2024-08-30 12:17 ` Bart Van Assche
1 sibling, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2024-08-26 17:44 UTC (permalink / raw)
To: Kanchan Joshi, axboe, kbusch, hch, sagi, martin.petersen,
James.Bottomley, brauner, jack, jaegeuk, jlayton, chuck.lever
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
On 8/26/24 10:06 AM, Kanchan Joshi wrote:
> Change i_write_hint (in inode), bi_write_hint (in bio) and write_hint
> (in request) to use u8 data-type rather than this enum.
That sounds fishy to me. Why to increase the size of this enum? Why to
reduce the ability of the compiler to perform type checking? I think
this needs to be motivated clearly in the patch description.
Bart.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/5] fs, block: refactor enum rw_hint
2024-08-26 17:44 ` Bart Van Assche
@ 2024-08-27 5:12 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-08-27 5:12 UTC (permalink / raw)
To: Bart Van Assche, axboe, kbusch, hch, sagi, martin.petersen,
James.Bottomley, brauner, jack, jaegeuk, jlayton, chuck.lever
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
On 8/26/2024 11:14 PM, Bart Van Assche wrote:
> On 8/26/24 10:06 AM, Kanchan Joshi wrote:
>> Change i_write_hint (in inode), bi_write_hint (in bio) and write_hint
>> (in request) to use u8 data-type rather than this enum.
>
> That sounds fishy to me. Why to increase the size of this enum? Why to
> reduce the ability of the compiler to perform type checking? I think
> this needs to be motivated clearly in the patch description.
Since inode/bio/request stopped using this, the __packed annotation did
not seem to serve much purpose. But sure, I can retain the size/checks
on the renamed enum (rw_life_hint) too.
Motivation for keeping u8 in inode/bio/request is to represent another
hint type. This is similar to ioprio where multiple io priority
classes/values are expressed within an int type.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 0/5] Write-placement hints and FDP
2024-08-26 17:06 ` [PATCH v4 0/5] Write-placement hints and FDP Kanchan Joshi
` (4 preceding siblings ...)
[not found] ` <CGME20240826171430epcas5p3d8e34a266ced7b3ea0df2a11b83292ae@epcas5p3.samsung.com>
@ 2024-08-30 11:59 ` Javier González
2024-09-03 14:28 ` Kanchan Joshi
6 siblings, 0 replies; 16+ messages in thread
From: Javier González @ 2024-08-30 11:59 UTC (permalink / raw)
To: Kanchan Joshi
Cc: axboe, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche,
linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g
On 26.08.2024 22:36, Kanchan Joshi wrote:
>Current write-hint infrastructure supports 6 temperature-based data life
>hints.
>The series extends the infrastructure with a new temperature-agnostic
>placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
>send the hint type/value on file. See patch #3 commit description for
>the details.
>
>Overall this creates 128 placement hint values [*] that users can pass.
>Patch #5 adds the ability to map these new hint values to nvme-specific
>placement-identifiers.
>Patch #4 restricts SCSI to use only life hint values.
>Patch #1 and #2 are simple prep patches.
>
>[*] While the user-interface can support more, this limit is due to the
>in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
>hole in the inode, but the code had this comment too:
>
>/* 32-bit hole reserved for expanding i_fsnotify_mask */
>
>Not must, but it will be good to know if a byte (or two) can be used
>here.
>
>Changes since v3:
>- 4 new patches to introduce write-placement hints
>- Make nvme patch use the placement hints rather than write-life hints
>
>Changes since v2:
>- Base it on nvme-6.11 and resolve a merge conflict
>
>Changes since v1:
>- Reduce the fetched plids from 128 to 6 (Keith)
>- Use struct_size for a calculation (Keith)
>- Handle robot/sparse warning
>
>Kanchan Joshi (4):
> fs, block: refactor enum rw_hint
> fcntl: rename rw_hint_* to rw_life_hint_*
> fcntl: add F_{SET/GET}_RW_HINT_EX
> nvme: enable FDP support
>
>Nitesh Shetty (1):
> sd: limit to use write life hints
>
> drivers/nvme/host/core.c | 81 ++++++++++++++++++++++++++++++++++++++
> drivers/nvme/host/nvme.h | 4 ++
> drivers/scsi/sd.c | 7 ++--
> fs/buffer.c | 4 +-
> fs/f2fs/f2fs.h | 4 +-
> fs/f2fs/segment.c | 4 +-
> fs/fcntl.c | 79 ++++++++++++++++++++++++++++++++++---
> include/linux/blk-mq.h | 2 +-
> include/linux/blk_types.h | 2 +-
> include/linux/fs.h | 2 +-
> include/linux/nvme.h | 19 +++++++++
> include/linux/rw_hint.h | 20 +++++++---
> include/uapi/linux/fcntl.h | 14 +++++++
> 13 files changed, 218 insertions(+), 24 deletions(-)
>
>--
>2.25.1
>
Keith, Christoph, Martin
Does this approach align with the offline conversation we had arund FMS?
Comments on the list would help us move forward with this series.
We would like to move the folks that are using off-tree patches for FDP
to mainline support.
Thanks,
Javier
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/5] fs, block: refactor enum rw_hint
2024-08-26 17:06 ` [PATCH v4 1/5] fs, block: refactor enum rw_hint Kanchan Joshi
2024-08-26 17:44 ` Bart Van Assche
@ 2024-08-30 12:17 ` Bart Van Assche
2024-09-02 5:18 ` Kanchan Joshi
1 sibling, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2024-08-30 12:17 UTC (permalink / raw)
To: Kanchan Joshi, axboe, kbusch, hch, sagi, martin.petersen,
James.Bottomley, brauner, jack, jaegeuk, jlayton, chuck.lever
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
On 8/26/24 1:06 PM, Kanchan Joshi wrote:
> /* Block storage write lifetime hint values. */
> -enum rw_hint {
> +enum rw_life_hint {
The name "rw_life_hint" seems confusing to me. I think that the
name "rw_lifetime_hint" would be a better name.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/5] fs, block: refactor enum rw_hint
2024-08-30 12:17 ` Bart Van Assche
@ 2024-09-02 5:18 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-09-02 5:18 UTC (permalink / raw)
To: Bart Van Assche, axboe, kbusch, hch, sagi, martin.petersen,
James.Bottomley, brauner, jack, jaegeuk, jlayton, chuck.lever
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
On 8/30/2024 5:47 PM, Bart Van Assche wrote:
> On 8/26/24 1:06 PM, Kanchan Joshi wrote:
>> /* Block storage write lifetime hint values. */
>> -enum rw_hint {
>> +enum rw_life_hint {
>
> The name "rw_life_hint" seems confusing to me. I think that the
> name "rw_lifetime_hint" would be a better name.
>
I can change to that in next iteration.
This change needs to be consistent in all the places. But more important
in patch #3 (as we expose TYPE_RW_LIFE_HINT to userspace). Do you have
comments on the other parts?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 0/5] Write-placement hints and FDP
2024-08-26 17:06 ` [PATCH v4 0/5] Write-placement hints and FDP Kanchan Joshi
` (5 preceding siblings ...)
2024-08-30 11:59 ` [PATCH v4 0/5] Write-placement hints and FDP Javier González
@ 2024-09-03 14:28 ` Kanchan Joshi
2024-09-03 14:35 ` Christian Brauner
6 siblings, 1 reply; 16+ messages in thread
From: Kanchan Joshi @ 2024-09-03 14:28 UTC (permalink / raw)
To: amir73il, kbusch, hch, sagi, martin.petersen, James.Bottomley,
brauner, jack, jaegeuk, jlayton, chuck.lever, bvanassche,
axboe@kernel.dk
Cc: linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
Hi Amir,
On 8/26/2024 10:36 PM, Kanchan Joshi wrote:
> Current write-hint infrastructure supports 6 temperature-based data life
> hints.
> The series extends the infrastructure with a new temperature-agnostic
> placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
> send the hint type/value on file. See patch #3 commit description for
> the details.
>
> Overall this creates 128 placement hint values [*] that users can pass.
> Patch #5 adds the ability to map these new hint values to nvme-specific
> placement-identifiers.
> Patch #4 restricts SCSI to use only life hint values.
> Patch #1 and #2 are simple prep patches.
>
> [*] While the user-interface can support more, this limit is due to the
> in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
> hole in the inode, but the code had this comment too:
>
> /* 32-bit hole reserved for expanding i_fsnotify_mask */
>
> Not must, but it will be good to know if a byte (or two) can be used
> here.
Since having one extra byte will simplify things, I can't help but ask -
do you still have the plans to use this space (in entirety) within inode?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 0/5] Write-placement hints and FDP
2024-09-03 14:28 ` Kanchan Joshi
@ 2024-09-03 14:35 ` Christian Brauner
2024-09-04 14:57 ` Kanchan Joshi
0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2024-09-03 14:35 UTC (permalink / raw)
To: Kanchan Joshi
Cc: amir73il, kbusch, hch, sagi, martin.petersen, James.Bottomley,
jack, jaegeuk, jlayton, chuck.lever, bvanassche, axboe@kernel.dk,
linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
On Tue, Sep 03, 2024 at 07:58:46PM GMT, Kanchan Joshi wrote:
> Hi Amir,
>
>
> On 8/26/2024 10:36 PM, Kanchan Joshi wrote:
> > Current write-hint infrastructure supports 6 temperature-based data life
> > hints.
> > The series extends the infrastructure with a new temperature-agnostic
> > placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
> > send the hint type/value on file. See patch #3 commit description for
> > the details.
> >
> > Overall this creates 128 placement hint values [*] that users can pass.
> > Patch #5 adds the ability to map these new hint values to nvme-specific
> > placement-identifiers.
> > Patch #4 restricts SCSI to use only life hint values.
> > Patch #1 and #2 are simple prep patches.
> >
> > [*] While the user-interface can support more, this limit is due to the
> > in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
> > hole in the inode, but the code had this comment too:
> >
> > /* 32-bit hole reserved for expanding i_fsnotify_mask */
> >
> > Not must, but it will be good to know if a byte (or two) can be used
> > here.
>
> Since having one extra byte will simplify things, I can't help but ask -
> do you still have the plans to use this space (in entirety) within inode?
I just freed up 8 bytes in struct inode with what's currently in -next.
There will be no using up those 8 bytes unless it's for a good reason
and something that is very widely useful.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 0/5] Write-placement hints and FDP
2024-09-03 14:35 ` Christian Brauner
@ 2024-09-04 14:57 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-09-04 14:57 UTC (permalink / raw)
To: Christian Brauner
Cc: amir73il, kbusch, hch, sagi, martin.petersen, James.Bottomley,
jack, jaegeuk, jlayton, chuck.lever, bvanassche, axboe@kernel.dk,
linux-nvme, linux-fsdevel, linux-f2fs-devel, linux-block,
linux-scsi, gost.dev, vishak.g, javier.gonz
On 9/3/2024 8:05 PM, Christian Brauner wrote:
> On Tue, Sep 03, 2024 at 07:58:46PM GMT, Kanchan Joshi wrote:
>> Hi Amir,
>>
>>
>> On 8/26/2024 10:36 PM, Kanchan Joshi wrote:
>>> Current write-hint infrastructure supports 6 temperature-based data life
>>> hints.
>>> The series extends the infrastructure with a new temperature-agnostic
>>> placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
>>> send the hint type/value on file. See patch #3 commit description for
>>> the details.
>>>
>>> Overall this creates 128 placement hint values [*] that users can pass.
>>> Patch #5 adds the ability to map these new hint values to nvme-specific
>>> placement-identifiers.
>>> Patch #4 restricts SCSI to use only life hint values.
>>> Patch #1 and #2 are simple prep patches.
>>>
>>> [*] While the user-interface can support more, this limit is due to the
>>> in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
>>> hole in the inode, but the code had this comment too:
>>>
>>> /* 32-bit hole reserved for expanding i_fsnotify_mask */
>>>
>>> Not must, but it will be good to know if a byte (or two) can be used
>>> here.
>>
>> Since having one extra byte will simplify things, I can't help but ask -
>> do you still have the plans to use this space (in entirety) within inode?
>
> I just freed up 8 bytes in struct inode with what's currently in -next.
> There will be no using up those 8 bytes unless it's for a good reason
> and something that is very widely useful.
I see, so now there are two holes. Seems the plan is to co-locate these
and reduce the size by 8 bytes. Thanks for the pointer. Primary reason
is a bit cleaner plumbing, but I'll manage without extra space.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 5/5] nvme: enable FDP support
2024-08-26 17:06 ` [PATCH v4 5/5] nvme: enable FDP support Kanchan Joshi
@ 2024-09-06 16:04 ` Keith Busch
2024-09-10 9:25 ` Kanchan Joshi
0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2024-09-06 16:04 UTC (permalink / raw)
To: Kanchan Joshi
Cc: axboe, hch, sagi, martin.petersen, James.Bottomley, brauner, jack,
jaegeuk, jlayton, chuck.lever, bvanassche, linux-nvme,
linux-fsdevel, linux-f2fs-devel, linux-block, linux-scsi,
gost.dev, vishak.g, javier.gonz, Nitesh Shetty, Hui Qi
On Mon, Aug 26, 2024 at 10:36:06PM +0530, Kanchan Joshi wrote:
> Flexible Data Placement (FDP), as ratified in TP 4146a, allows the host
> to control the placement of logical blocks so as to reduce the SSD WAF.
>
> Userspace can send the data placement information using the write hints.
> Fetch the placement-identifiers if the device supports FDP.
>
> The incoming placement hint is mapped to a placement-identifier, which
> in turn is set in the DSPEC field of the write command.
>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Hui Qi <hui81.qi@samsung.com>
I'm still fine with this nvme implementation.
Acked-by: Keith Busch <kbusch@kernel.org>
The reporting via fcntl looks okay to me, but I've never added anything
to that interface, so not sure if there's any problem using it for this.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 5/5] nvme: enable FDP support
2024-09-06 16:04 ` Keith Busch
@ 2024-09-10 9:25 ` Kanchan Joshi
0 siblings, 0 replies; 16+ messages in thread
From: Kanchan Joshi @ 2024-09-10 9:25 UTC (permalink / raw)
To: Keith Busch
Cc: axboe, hch, sagi, martin.petersen, James.Bottomley, brauner, jack,
jaegeuk, jlayton, chuck.lever, bvanassche, linux-nvme,
linux-fsdevel, linux-f2fs-devel, linux-block, linux-scsi,
gost.dev, vishak.g, javier.gonz, Nitesh Shetty, Hui Qi
On 9/6/2024 9:34 PM, Keith Busch wrote:
> On Mon, Aug 26, 2024 at 10:36:06PM +0530, Kanchan Joshi wrote:
>> Flexible Data Placement (FDP), as ratified in TP 4146a, allows the host
>> to control the placement of logical blocks so as to reduce the SSD WAF.
>>
>> Userspace can send the data placement information using the write hints.
>> Fetch the placement-identifiers if the device supports FDP.
>>
>> The incoming placement hint is mapped to a placement-identifier, which
>> in turn is set in the DSPEC field of the write command.
>>
>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>> Signed-off-by: Hui Qi <hui81.qi@samsung.com>
>
> I'm still fine with this nvme implementation.
>
> Acked-by: Keith Busch <kbusch@kernel.org>
>
> The reporting via fcntl looks okay to me, but I've never added anything
> to that interface, so not sure if there's any problem using it for this.
>
The difference comes only in the fcntl interface (hint type/value pair
rather than just value), otherwise it piggybacks on the same kernel
infrastructure that ensures the hint is propagated fine. So I do not
foresee problems.
And FWIW, we have had precedents when a revamped fcntl was introduced to
do what was not possible with the existing fcntl. Like:
F_{GET/SET}OWN_EX over F_{GET/SET}OWN.
Per-file hinting has its uses, particularly for buffered IO. But the
current interface can only do data-lifetime hints. The revamped
interface may come handy for other things too (e.g., KPIO).
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2024-09-10 9:26 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CGME20240826171409epcas5p306ba210a9815e202556778a4c105b440@epcas5p3.samsung.com>
2024-08-26 17:06 ` [PATCH v4 0/5] Write-placement hints and FDP Kanchan Joshi
[not found] ` <CGME20240826171413epcas5p3f62c2cc57b50d6df8fa66af5fe5996c5@epcas5p3.samsung.com>
2024-08-26 17:06 ` [PATCH v4 1/5] fs, block: refactor enum rw_hint Kanchan Joshi
2024-08-26 17:44 ` Bart Van Assche
2024-08-27 5:12 ` Kanchan Joshi
2024-08-30 12:17 ` Bart Van Assche
2024-09-02 5:18 ` Kanchan Joshi
[not found] ` <CGME20240826171417epcas5p1c6dbe318c43324116647dae2129b7eb3@epcas5p1.samsung.com>
2024-08-26 17:06 ` [PATCH v4 2/5] fcntl: rename rw_hint_* to rw_life_hint_* Kanchan Joshi
[not found] ` <CGME20240826171422epcas5p2fa8f07dfee9395745f1833a17fd89ae0@epcas5p2.samsung.com>
2024-08-26 17:06 ` [PATCH v4 3/5] fcntl: add F_{SET/GET}_RW_HINT_EX Kanchan Joshi
[not found] ` <CGME20240826171426epcas5p13c5ffabd6a05ee357bf4e9f78bc5de44@epcas5p1.samsung.com>
2024-08-26 17:06 ` [PATCH v4 4/5] sd: limit to use write life hints Kanchan Joshi
[not found] ` <CGME20240826171430epcas5p3d8e34a266ced7b3ea0df2a11b83292ae@epcas5p3.samsung.com>
2024-08-26 17:06 ` [PATCH v4 5/5] nvme: enable FDP support Kanchan Joshi
2024-09-06 16:04 ` Keith Busch
2024-09-10 9:25 ` Kanchan Joshi
2024-08-30 11:59 ` [PATCH v4 0/5] Write-placement hints and FDP Javier González
2024-09-03 14:28 ` Kanchan Joshi
2024-09-03 14:35 ` Christian Brauner
2024-09-04 14:57 ` Kanchan Joshi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).