* [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA
@ 2026-02-26 19:04 Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 1/8] nvme: add preferred I/O size fields to struct nvme_id_ns_nvm Caleb Sander Mateos
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
NVMe block devices always report the logical block size for the
discard_granularity queue limit. However, more accurate values may be
available in the NPDG/NPDA fields of the Identify Namespace structure or
the NPDGL/NPDAL fields of the NVM Command Set Specific Identify
Namespace structure. So use these values to compute discard_granularity.
Also fix the use of the OPTPERF field to better comply with version 2.1
of the NVMe spec.
Update the target side to report NPDGL and NPDAL as well, in case the
discard granularity doesn't fit in the 16-bit NPDG and NPDA fields.
v4:
- Issue NVM Command Set Specific Identify Namespace regardless of
controller version (Keith)
v3:
- Introduce from0based() (Christoph)
- Fix merge conflict resolution (Keith)
- Add more comments (Christoph)
- Add Reviewed-by tags (Christoph)
v2:
- Only use low bit of OPTPERF on pre-2.1 controllers (Christoph)
- Add Reviewed-by tags (Christoph)
Caleb Sander Mateos (8):
nvme: add preferred I/O size fields to struct nvme_id_ns_nvm
nvme: fold nvme_config_discard() into nvme_update_disk_info()
nvme: update nvme_id_ns OPTPERF constants
nvme: always issue I/O Command Set specific Identify Namespace
nvme: add from0based() helper
nvme: set discard_granularity from NPDG/NPDA
nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT
nvmet: report NPDGL and NPDAL
drivers/nvme/host/core.c | 90 ++++++++++++++++++++-----------
drivers/nvme/host/nvme.h | 6 +++
drivers/nvme/target/admin-cmd.c | 2 +
drivers/nvme/target/io-cmd-bdev.c | 19 +++++--
drivers/nvme/target/nvmet.h | 2 +
include/linux/nvme.h | 15 +++++-
6 files changed, 96 insertions(+), 38 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v4 1/8] nvme: add preferred I/O size fields to struct nvme_id_ns_nvm
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 2/8] nvme: fold nvme_config_discard() into nvme_update_disk_info() Caleb Sander Mateos
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
A subsequent change will use the NPDGL and NPDAL fields of the NVM
Command Set Specific Identify Namespace structure, so add them (and the
handful of intervening fields) to struct nvme_id_ns_nvm. Add an
assertion that the size is still 4 KB.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
include/linux/nvme.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 655d194f8e72..1134e6bf2d5c 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -511,13 +511,20 @@ struct nvme_id_ctrl_zns {
struct nvme_id_ns_nvm {
__le64 lbstm;
__u8 pic;
__u8 rsvd9[3];
__le32 elbaf[64];
- __u8 rsvd268[3828];
+ __le32 npdgl;
+ __le32 nprg;
+ __le32 npra;
+ __le32 nors;
+ __le32 npdal;
+ __u8 rsvd288[3808];
};
+static_assert(sizeof(struct nvme_id_ns_nvm) == 4096);
+
enum {
NVME_ID_NS_NVM_STS_MASK = 0x7f,
NVME_ID_NS_NVM_GUARD_SHIFT = 7,
NVME_ID_NS_NVM_GUARD_MASK = 0x3,
NVME_ID_NS_NVM_QPIF_SHIFT = 9,
--
2.45.2
* [PATCH v4 2/8] nvme: fold nvme_config_discard() into nvme_update_disk_info()
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 1/8] nvme: add preferred I/O size fields to struct nvme_id_ns_nvm Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 3/8] nvme: update nvme_id_ns OPTPERF constants Caleb Sander Mateos
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
The choice of what queue limits are set in nvme_update_disk_info() vs.
nvme_config_discard() seems a bit arbitrary. A subsequent commit will
compute the discard_granularity limit using struct nvme_id_ns, which is
only passed to nvme_update_disk_info() currently. So move the logic in
nvme_config_discard() to nvme_update_disk_info(). Replace several
instances of ns->ctrl in nvme_update_disk_info() with the ctrl variable
brought from nvme_config_discard().
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/core.c | 43 ++++++++++++++++++----------------------
1 file changed, 19 insertions(+), 24 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3a2126584a23..8dda2fe69789 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1880,30 +1880,10 @@ static bool nvme_init_integrity(struct nvme_ns_head *head,
bi->pi_offset = info->pi_offset;
}
return true;
}
-static void nvme_config_discard(struct nvme_ns *ns, struct queue_limits *lim)
-{
- struct nvme_ctrl *ctrl = ns->ctrl;
-
- if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(ns->head, UINT_MAX))
- lim->max_hw_discard_sectors =
- nvme_lba_to_sect(ns->head, ctrl->dmrsl);
- else if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
- lim->max_hw_discard_sectors = UINT_MAX;
- else
- lim->max_hw_discard_sectors = 0;
-
- lim->discard_granularity = lim->logical_block_size;
-
- if (ctrl->dmrl)
- lim->max_discard_segments = ctrl->dmrl;
- else
- lim->max_discard_segments = NVME_DSM_MAX_RANGES;
-}
-
static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
{
return uuid_equal(&a->uuid, &b->uuid) &&
memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 &&
memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0 &&
@@ -2078,10 +2058,11 @@ static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
struct queue_limits *lim)
{
struct nvme_ns_head *head = ns->head;
+ struct nvme_ctrl *ctrl = ns->ctrl;
u32 bs = 1U << head->lba_shift;
u32 atomic_bs, phys_bs, io_opt = 0;
bool valid = true;
/*
@@ -2112,15 +2093,30 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
*/
lim->logical_block_size = bs;
lim->physical_block_size = min(phys_bs, atomic_bs);
lim->io_min = phys_bs;
lim->io_opt = io_opt;
- if ((ns->ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES) &&
- (ns->ctrl->oncs & NVME_CTRL_ONCS_DSM))
+ if ((ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES) &&
+ (ctrl->oncs & NVME_CTRL_ONCS_DSM))
lim->max_write_zeroes_sectors = UINT_MAX;
else
- lim->max_write_zeroes_sectors = ns->ctrl->max_zeroes_sectors;
+ lim->max_write_zeroes_sectors = ctrl->max_zeroes_sectors;
+
+ if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(ns->head, UINT_MAX))
+ lim->max_hw_discard_sectors =
+ nvme_lba_to_sect(ns->head, ctrl->dmrsl);
+ else if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
+ lim->max_hw_discard_sectors = UINT_MAX;
+ else
+ lim->max_hw_discard_sectors = 0;
+
+ lim->discard_granularity = lim->logical_block_size;
+
+ if (ctrl->dmrl)
+ lim->max_discard_segments = ctrl->dmrl;
+ else
+ lim->max_discard_segments = NVME_DSM_MAX_RANGES;
return valid;
}
static bool nvme_ns_is_readonly(struct nvme_ns *ns, struct nvme_ns_info *info)
{
@@ -2381,11 +2377,10 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
if (!nvme_update_disk_info(ns, id, &lim))
capacity = 0;
- nvme_config_discard(ns, &lim);
if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
ns->head->ids.csi == NVME_CSI_ZNS)
nvme_update_zone_info(ns, &lim, &zi);
if ((ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT) && !info->no_vwc)
--
2.45.2
* [PATCH v4 3/8] nvme: update nvme_id_ns OPTPERF constants
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 1/8] nvme: add preferred I/O size fields to struct nvme_id_ns_nvm Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 2/8] nvme: fold nvme_config_discard() into nvme_update_disk_info() Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 4/8] nvme: always issue I/O Command Set specific Identify Namespace Caleb Sander Mateos
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
In NVMe version 2.0 and below, OPTPERF comprises only bit 4 of NSFEAT in
the Identify Namespace structure. Since version 2.1, OPTPERF includes
both bits 4 and 5 of NSFEAT. Replace the NVME_NS_FEAT_IO_OPT constant
with NVME_NS_FEAT_OPTPERF_SHIFT, NVME_NS_FEAT_OPTPERF_MASK, and
NVME_NS_FEAT_OPTPERF_MASK_2_1, representing the first bit, pre-2.1 bit
width, and post-2.1 bit width of OPTPERF.
Update nvme_update_disk_info() to check both OPTPERF bits for
controllers that report version 2.1 or newer, as NPWG and NOWS are
supported even if only bit 5 is set.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
drivers/nvme/host/core.c | 8 +++++++-
include/linux/nvme.h | 6 +++++-
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8dda2fe69789..bff6f26d7bcf 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2062,10 +2062,11 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
struct nvme_ns_head *head = ns->head;
struct nvme_ctrl *ctrl = ns->ctrl;
u32 bs = 1U << head->lba_shift;
u32 atomic_bs, phys_bs, io_opt = 0;
bool valid = true;
+ u8 optperf;
/*
* The block layer can't support LBA sizes larger than the page size
* or smaller than a sector size yet, so catch this early and don't
* allow block I/O.
@@ -2076,11 +2077,16 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
}
phys_bs = bs;
atomic_bs = nvme_configure_atomic_write(ns, id, lim, bs);
- if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {
+ optperf = id->nsfeat >> NVME_NS_FEAT_OPTPERF_SHIFT;
+ if (ctrl->vs >= NVME_VS(2, 1, 0))
+ optperf &= NVME_NS_FEAT_OPTPERF_MASK_2_1;
+ else
+ optperf &= NVME_NS_FEAT_OPTPERF_MASK;
+ if (optperf) {
/* NPWG = Namespace Preferred Write Granularity */
phys_bs = bs * (1 + le16_to_cpu(id->npwg));
/* NOWS = Namespace Optimal Write Size */
if (id->nows)
io_opt = bs * (1 + le16_to_cpu(id->nows));
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 1134e6bf2d5c..d840f5fe79fa 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -595,11 +595,15 @@ enum {
};
enum {
NVME_NS_FEAT_THIN = 1 << 0,
NVME_NS_FEAT_ATOMICS = 1 << 1,
- NVME_NS_FEAT_IO_OPT = 1 << 4,
+ NVME_NS_FEAT_OPTPERF_SHIFT = 4,
+ /* In NVMe version 2.0 and below, OPTPERF is only bit 4 of NSFEAT */
+ NVME_NS_FEAT_OPTPERF_MASK = 0x1,
+ /* Since version 2.1, OPTPERF is bits 4 and 5 of NSFEAT */
+ NVME_NS_FEAT_OPTPERF_MASK_2_1 = 0x3,
NVME_NS_ATTR_RO = 1 << 0,
NVME_NS_FLBAS_LBA_MASK = 0xf,
NVME_NS_FLBAS_LBA_UMASK = 0x60,
NVME_NS_FLBAS_LBA_SHIFT = 1,
NVME_NS_FLBAS_META_EXT = 0x10,
--
2.45.2
* [PATCH v4 4/8] nvme: always issue I/O Command Set specific Identify Namespace
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
` (2 preceding siblings ...)
2026-02-26 19:04 ` [PATCH v4 3/8] nvme: update nvme_id_ns OPTPERF constants Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 21:02 ` Keith Busch
2026-02-26 19:04 ` [PATCH v4 5/8] nvme: add from0based() helper Caleb Sander Mateos
` (3 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
Currently, the I/O Command Set specific Identify Namespace structure is
only fetched for controllers that support extended LBA formats. This is
because struct nvme_id_ns_nvm is only used by nvme_configure_pi_elbas(),
which is only called when the ELBAS bit is set in the CTRATT field of
the Identify Controller structure.
However, the I/O Command Set specific Identify Namespace structure will
soon be used in nvme_update_disk_info(), so always try to obtain it in
nvme_update_ns_info_block(). This Identify structure is first defined in
NVMe spec version 2.0, but controllers reporting older versions could
still implement it.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/core.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bff6f26d7bcf..14e52b260f5d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2352,15 +2352,13 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
ret = -ENXIO;
goto out;
}
lbaf = nvme_lbaf_index(id->flbas);
- if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) {
- ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
- if (ret < 0)
- goto out;
- }
+ ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
+ if (ret < 0)
+ goto out;
if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
ns->head->ids.csi == NVME_CSI_ZNS) {
ret = nvme_query_zone_info(ns, lbaf, &zi);
if (ret < 0)
--
2.45.2
* [PATCH v4 5/8] nvme: add from0based() helper
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
` (3 preceding siblings ...)
2026-02-26 19:04 ` [PATCH v4 4/8] nvme: always issue I/O Command Set specific Identify Namespace Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 6/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
The NVMe specifications are big fans of "0's based"/"0-based" fields for
encoding values that must be positive. The encoded value is 1 less than
the value it represents. nvmet already provides a helper to0based() for
encoding 0's based values, so add a corresponding helper to decode these
fields on the host side.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
drivers/nvme/host/nvme.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9971045dbc05..ccd5e05dac98 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -760,10 +760,16 @@ static inline sector_t nvme_lba_to_sect(struct nvme_ns_head *head, u64 lba)
static inline u32 nvme_bytes_to_numd(size_t len)
{
return (len >> 2) - 1;
}
+/* Decode a 2-byte "0's based"/"0-based" field */
+static inline u32 from0based(__le16 value)
+{
+ return (u32)le16_to_cpu(value) + 1;
+}
+
static inline bool nvme_is_ana_error(u16 status)
{
switch (status & NVME_SCT_SC_MASK) {
case NVME_SC_ANA_TRANSITION:
case NVME_SC_ANA_INACCESSIBLE:
--
2.45.2
* [PATCH v4 6/8] nvme: set discard_granularity from NPDG/NPDA
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
` (4 preceding siblings ...)
2026-02-26 19:04 ` [PATCH v4 5/8] nvme: add from0based() helper Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 21:20 ` Keith Busch
2026-02-26 19:04 ` [PATCH v4 7/8] nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 8/8] nvmet: report NPDGL and NPDAL Caleb Sander Mateos
7 siblings, 1 reply; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
Currently, nvme_config_discard() always sets the discard_granularity
queue limit to the logical block size. However, NVMe namespaces can
advertise a larger preferred discard granularity in the NPDG or NPDA
field of the Identify Namespace structure or the NPDGL or NPDAL fields
of the I/O Command Set Specific Identify Namespace structure.
Use these fields to compute the discard_granularity limit. The logic is
somewhat involved. First, the fields are optional. NPDG is only reported
if the low bit of OPTPERF is set in NSFEAT. NPDA is reported if any bit
of OPTPERF is set. And NPDGL and NPDAL are reported if the high bit of
OPTPERF is set. NPDGL and NPDAL can also each be set to 0 to opt out of
reporting a limit. I/O Command Set Specific Identify Namespace may also
not be supported by older NVMe controllers. Another complication is that
multiple values may be reported among NPDG, NPDGL, NPDA, and NPDAL. The
spec says to prefer the values reported in the L variants. The spec says
NPDG should be a multiple of NPDA and NPDGL should be a multiple of
NPDAL, but it doesn't specify a relationship between NPDG and NPDAL or
NPDGL and NPDA. So use the maximum of the reported NPDG(L) and NPDA(L)
values as the discard_granularity.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
drivers/nvme/host/core.c | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 14e52b260f5d..20441985cad1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2055,16 +2055,17 @@ static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
lim->max_segment_size = UINT_MAX;
lim->dma_alignment = 3;
}
static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
- struct queue_limits *lim)
+ struct nvme_id_ns_nvm *nvm, struct queue_limits *lim)
{
struct nvme_ns_head *head = ns->head;
struct nvme_ctrl *ctrl = ns->ctrl;
u32 bs = 1U << head->lba_shift;
u32 atomic_bs, phys_bs, io_opt = 0;
+ u32 npdg = 1, npda = 1;
bool valid = true;
u8 optperf;
/*
* The block layer can't support LBA sizes larger than the page size
@@ -2113,11 +2114,37 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
else if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
lim->max_hw_discard_sectors = UINT_MAX;
else
lim->max_hw_discard_sectors = 0;
- lim->discard_granularity = lim->logical_block_size;
+ /*
+ * NVMe namespaces advertise both a preferred deallocate granularity
+ * (for a discard length) and alignment (for a discard starting offset).
+ * However, Linux block devices advertise a single discard_granularity.
+ * From NVM Command Set specification 1.1 section 5.2.2, the NPDGL/NPDAL
+ * fields in the NVM Command Set Specific Identify Namespace structure
+ * are preferred to NPDG/NPDA in the Identify Namespace structure since
+ * they can represent larger values. However, NPDGL or NPDAL may be 0 if
+ * unsupported. NPDG and NPDA are 0's based.
+ * From Figure 115 of NVM Command Set specification 1.1, NPDGL and NPDAL
+ * are supported if the high bit of OPTPERF is set. NPDG is supported if
+ * the low bit of OPTPERF is set. NPDA is supported if either is set.
+ * NPDG should be a multiple of NPDA, and likewise NPDGL should be a
+ * multiple of NPDAL, but the spec doesn't say anything about NPDG vs.
+ * NPDAL or NPDGL vs. NPDA. So compute the maximum instead of assuming
+ * NPDG(L) is the larger. If neither NPDG, NPDGL, NPDA, nor NPDAL are
+ * supported, default the discard_granularity to the logical block size.
+ */
+ if (optperf & 0x2 && nvm && nvm->npdgl)
+ npdg = le32_to_cpu(nvm->npdgl);
+ else if (optperf & 0x1)
+ npdg = from0based(id->npdg);
+ if (optperf & 0x2 && nvm && nvm->npdal)
+ npda = le32_to_cpu(nvm->npdal);
+ else if (optperf)
+ npda = from0based(id->npda);
+ lim->discard_granularity = max(npdg, npda) * lim->logical_block_size;
if (ctrl->dmrl)
lim->max_discard_segments = ctrl->dmrl;
else
lim->max_discard_segments = NVME_DSM_MAX_RANGES;
@@ -2378,11 +2405,11 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
ns->head->nuse = le64_to_cpu(id->nuse);
capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
nvme_set_ctrl_limits(ns->ctrl, &lim, false);
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
- if (!nvme_update_disk_info(ns, id, &lim))
+ if (!nvme_update_disk_info(ns, id, nvm, &lim))
capacity = 0;
if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
ns->head->ids.csi == NVME_CSI_ZNS)
nvme_update_zone_info(ns, &lim, &zi);
--
2.45.2
* [PATCH v4 7/8] nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
` (5 preceding siblings ...)
2026-02-26 19:04 ` [PATCH v4 6/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
2026-02-26 19:04 ` [PATCH v4 8/8] nvmet: report NPDGL and NPDAL Caleb Sander Mateos
7 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
Use the NVME_NS_FEAT_OPTPERF_SHIFT constant in nvmet_bdev_set_limits()
to set the OPTPERF bits of the nvme_id_ns NSFEAT field instead of the
magic number 4.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/target/io-cmd-bdev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 8d246b8ca604..d94f885a56d9 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -28,15 +28,15 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
id->nawun = lpp0b;
id->nawupf = lpp0b;
id->nacwu = lpp0b;
/*
- * Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
+ * OPTPERF = 01b indicates that the fields NPWG, NPWA, NPDG, NPDA, and
* NOWS are defined for this namespace and should be used by
* the host for I/O optimization.
*/
- id->nsfeat |= 1 << 4;
+ id->nsfeat |= 0x1 << NVME_NS_FEAT_OPTPERF_SHIFT;
/* NPWG = Namespace Preferred Write Granularity. 0's based */
id->npwg = to0based(bdev_io_min(bdev) / bdev_logical_block_size(bdev));
/* NPWA = Namespace Preferred Write Alignment. 0's based */
id->npwa = id->npwg;
/* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
--
2.45.2
* [PATCH v4 8/8] nvmet: report NPDGL and NPDAL
2026-02-26 19:04 [PATCH v4 0/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
` (6 preceding siblings ...)
2026-02-26 19:04 ` [PATCH v4 7/8] nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT Caleb Sander Mateos
@ 2026-02-26 19:04 ` Caleb Sander Mateos
7 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 19:04 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni
Cc: linux-nvme, linux-kernel, Caleb Sander Mateos
A block device with a very large discard_granularity queue limit may not
be able to report it in the 16-bit NPDG and NPDA fields in the Identify
Namespace data structure. For this reason, version 2.1 of the NVMe specs
added 32-bit fields NPDGL and NPDAL to the NVM Command Set Specific
Identify Namespace structure. So report the discard_granularity there
too and set OPTPERF to 11b to indicate those fields are supported.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/target/admin-cmd.c | 2 ++
drivers/nvme/target/io-cmd-bdev.c | 19 +++++++++++++++----
drivers/nvme/target/nvmet.h | 2 ++
3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 3da31bb1183e..72e733b62a2c 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -1056,10 +1056,12 @@ static void nvme_execute_identify_ns_nvm(struct nvmet_req *req)
id = kzalloc(sizeof(*id), GFP_KERNEL);
if (!id) {
status = NVME_SC_INTERNAL;
goto out;
}
+ if (req->ns->bdev)
+ nvmet_bdev_set_nvm_limits(req->ns->bdev, id);
status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
kfree(id);
out:
nvmet_req_complete(req, status);
}
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index d94f885a56d9..485b5cd42e4f 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -28,15 +28,15 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
id->nawun = lpp0b;
id->nawupf = lpp0b;
id->nacwu = lpp0b;
/*
- * OPTPERF = 01b indicates that the fields NPWG, NPWA, NPDG, NPDA, and
- * NOWS are defined for this namespace and should be used by
- * the host for I/O optimization.
+ * OPTPERF = 11b indicates that the fields NPWG, NPWA, NPDG, NPDA,
+ * NPDGL, NPDAL, and NOWS are defined for this namespace and should be
+ * used by the host for I/O optimization.
*/
- id->nsfeat |= 0x1 << NVME_NS_FEAT_OPTPERF_SHIFT;
+ id->nsfeat |= 0x3 << NVME_NS_FEAT_OPTPERF_SHIFT;
/* NPWG = Namespace Preferred Write Granularity. 0's based */
id->npwg = to0based(bdev_io_min(bdev) / bdev_logical_block_size(bdev));
/* NPWA = Namespace Preferred Write Alignment. 0's based */
id->npwa = id->npwg;
/* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
@@ -50,10 +50,21 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
/* Set WZDS and DRB if device supports unmapped write zeroes */
if (bdev_write_zeroes_unmap_sectors(bdev))
id->dlfeat = (1 << 3) | 0x1;
}
+void nvmet_bdev_set_nvm_limits(struct block_device *bdev,
+ struct nvme_id_ns_nvm *id)
+{
+ /*
+ * NPDGL = Namespace Preferred Deallocate Granularity Large
+ * NPDAL = Namespace Preferred Deallocate Alignment Large
+ */
+ id->npdgl = id->npdal = cpu_to_le32(bdev_discard_granularity(bdev) /
+ bdev_logical_block_size(bdev));
+}
+
void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
{
if (ns->bdev_file) {
fput(ns->bdev_file);
ns->bdev = NULL;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index b664b584fdc8..3a7efd9cb81a 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -547,10 +547,12 @@ void nvmet_start_keep_alive_timer(struct nvmet_ctrl *ctrl);
void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl);
u16 nvmet_parse_connect_cmd(struct nvmet_req *req);
u32 nvmet_connect_cmd_data_len(struct nvmet_req *req);
void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id);
+void nvmet_bdev_set_nvm_limits(struct block_device *bdev,
+ struct nvme_id_ns_nvm *id);
u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req);
u16 nvmet_file_parse_io_cmd(struct nvmet_req *req);
u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req);
u32 nvmet_admin_cmd_data_len(struct nvmet_req *req);
u16 nvmet_parse_admin_cmd(struct nvmet_req *req);
--
2.45.2
* Re: [PATCH v4 4/8] nvme: always issue I/O Command Set specific Identify Namespace
2026-02-26 19:04 ` [PATCH v4 4/8] nvme: always issue I/O Command Set specific Identify Namespace Caleb Sander Mateos
@ 2026-02-26 21:02 ` Keith Busch
2026-02-26 21:06 ` Caleb Sander Mateos
0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2026-02-26 21:02 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel
On Thu, Feb 26, 2026 at 12:04:11PM -0700, Caleb Sander Mateos wrote:
> - if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) {
> - ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
> - if (ret < 0)
> - goto out;
> - }
> + ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
> + if (ret < 0)
> + goto out;
I don't think we can do this identify unconditionally. The controller
has to at least pass nvme_id_cns_ok().
* Re: [PATCH v4 4/8] nvme: always issue I/O Command Set specific Identify Namespace
2026-02-26 21:02 ` Keith Busch
@ 2026-02-26 21:06 ` Caleb Sander Mateos
0 siblings, 0 replies; 12+ messages in thread
From: Caleb Sander Mateos @ 2026-02-26 21:06 UTC (permalink / raw)
To: Keith Busch
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel
On Thu, Feb 26, 2026 at 1:02 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Thu, Feb 26, 2026 at 12:04:11PM -0700, Caleb Sander Mateos wrote:
> > - if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) {
> > - ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
> > - if (ret < 0)
> > - goto out;
> > - }
> > + ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
> > + if (ret < 0)
> > + goto out;
>
> I don't think we can do this identify unconditionally. The controller
> has to at least pass nvme_id_cns_ok().
Good point.
Thanks,
Caleb
* Re: [PATCH v4 6/8] nvme: set discard_granularity from NPDG/NPDA
2026-02-26 19:04 ` [PATCH v4 6/8] nvme: set discard_granularity from NPDG/NPDA Caleb Sander Mateos
@ 2026-02-26 21:20 ` Keith Busch
0 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2026-02-26 21:20 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
linux-nvme, linux-kernel
On Thu, Feb 26, 2026 at 12:04:13PM -0700, Caleb Sander Mateos wrote:
> + if (optperf & 0x2 && nvm && nvm->npdgl)
> + npdg = le32_to_cpu(nvm->npdgl);
> + else if (optperf & 0x1)
> + npdg = from0based(id->npdg);
> + if (optperf & 0x2 && nvm && nvm->npdal)
> + npda = le32_to_cpu(nvm->npdal);
> + else if (optperf)
> + npda = from0based(id->npda);
> + lim->discard_granularity = max(npdg, npda) * lim->logical_block_size;
I suspect some controller could report a very large npdal/npdgl such
that the multiplication result overflows the discard_granularity type,
so let's check with a fallback:
if (check_mul_overflow(max(npdg, npda), lim->logical_block_size, &result))
lim->discard_granularity = U32_MAX & (lim->logical_block_size - 1);