* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
@ 2017-09-20 17:06 Andrzej Jakowski
2017-09-20 17:48 ` Keith Busch
0 siblings, 1 reply; 6+ messages in thread
From: Andrzej Jakowski @ 2017-09-20 17:06 UTC (permalink / raw)
In old implementation physical block size exported by NVMe driver defaulted to
logical block size. Existing NVMe SSDs very often
Musze naprawic zeby brac pod uwage relative performance.
Signed-off-by: Andrzej Jakowski <andrzej.jakowski at intel.com>
---
drivers/nvme/host/core.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37046ac2c441..e20c5c6517fc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1194,11 +1194,36 @@ static int nvme_revalidate_ns(struct nvme_ns *ns, struct nvme_id_ns **id)
return 0;
}
+static u32 calc_pbs(struct nvme_id_ns *id)
+{
+ struct nvme_lbaf *lbaf_used;
+ int i, max_id = 0;
+ u8 ds = 0;
+
+ lbaf_used = &id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK];
+ if (lbaf_used->rp == NVME_LBAF_RP_BEST)
+ return (((u32)1) << lbaf_used->ds);
+
+ for (i = 0; i < id->nlbaf; i++) {
+ if (id->lbaf[i].ms == 0 && id->lbaf[i].ds > ds &&
+ id->lbaf[i].rp < lbaf_used->rp && id->lbaf[i].ds <= PAGE_SHIFT)
+ max_id = i;
+
+ ds = id->lbaf[i].ds;
+ }
+
+ if (id->lbaf[max_id].ds)
+ return (((u32)1) << id->lbaf[max_id].ds);
+ else
+ return 0;
+}
+
static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
{
struct nvme_ns *ns = disk->private_data;
struct nvme_ctrl *ctrl = ns->ctrl;
u16 bs;
+ u32 pbs;
/*
* If identify namespace failed, use default 512 byte block size so
@@ -1208,6 +1233,9 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
if (ns->lba_shift == 0)
ns->lba_shift = 9;
bs = 1 << ns->lba_shift;
+ pbs = calc_pbs(id);
+ if (pbs == 0)
+ pbs = bs;
ns->noiob = le16_to_cpu(id->noiob);
blk_mq_freeze_queue(disk->queue);
@@ -1215,6 +1243,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
nvme_prep_integrity(disk, id, bs);
blk_queue_logical_block_size(ns->queue, bs);
+ blk_queue_physical_block_size(ns->queue, pbs);
+
if (ns->noiob)
nvme_set_chunk_size(ns);
if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
--
2.13.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
2017-09-20 17:48 ` Keith Busch
@ 2017-09-20 17:48 ` Christoph Hellwig
2017-09-20 19:07 ` Keith Busch
0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-09-20 17:48 UTC (permalink / raw)
On Wed, Sep 20, 2017@01:48:42PM -0400, Keith Busch wrote:
> I think what you're wanting to say is:
>
> The physical block size will default to the logical block size unless
> otherwise specified. While NVMe doesn't provide a way to discover the
> physical block size, the format with best relative performance is a
> good indicator as to the underlying block size.
I don't like this at all. It's a really nasty guesswork. If you need
this to get reasonable performance out of a specific device please
quirk it.
NVMe doesn't have the legacy issues that require 4k physical sectors
and 512 logical like ATA, so if a device really prefers 4k sectors
it should skip with 4k format by default instead of doing the logical /
physical dance. Especially that a "physical" blocksize that Linux
could actually deal with (<= 4k) is a complete lie for flash anyway.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
2017-09-20 17:06 [PATCH] nvme: set physical block size to value discovered in Identify Namespace Andrzej Jakowski
@ 2017-09-20 17:48 ` Keith Busch
2017-09-20 17:48 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2017-09-20 17:48 UTC (permalink / raw)
On Wed, Sep 20, 2017@10:06:05AM -0700, Andrzej Jakowski wrote:
> In old implementation physical block size exported by NVMe driver defaulted to
> logical block size. Existing NVMe SSDs very often
> Musze naprawic zeby brac pod uwage relative performance.
Dziekuje for the patch, but the change logs gotta be in English. :)
I think what you're wanting to say is:
The physical block size will default to the logical block size unless
otherwise specified. While NVMe doesn't provide a way to discover the
physical block size, the format with best relative performance is a
good indicator as to the underlying block size.
> +static u32 calc_pbs(struct nvme_id_ns *id)
> +{
> + struct nvme_lbaf *lbaf_used;
> + int i, max_id = 0;
> + u8 ds = 0;
> +
> + lbaf_used = &id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK];
> + if (lbaf_used->rp == NVME_LBAF_RP_BEST)
> + return (((u32)1) << lbaf_used->ds);
> +
> + for (i = 0; i < id->nlbaf; i++) {
> + if (id->lbaf[i].ms == 0 && id->lbaf[i].ds > ds &&
> + id->lbaf[i].rp < lbaf_used->rp && id->lbaf[i].ds <= PAGE_SHIFT)
> + max_id = i;
> +
> + ds = id->lbaf[i].ds;
> + }
> +
> + if (id->lbaf[max_id].ds)
> + return (((u32)1) << id->lbaf[max_id].ds);
> + else
> + return 0;
> +}
I think this can be a little clearner. How about:
static u32 calc_pbs(struct nvme_id_ns *id)
{
int i;
struct nvme_lbaf *best = &id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK];
for (i = 0; i < id->nlbaf; i++)
if (id->lbaf[i].rp <= best->rp &&
id->lbaf[i].ds > best->lbaf.ds &&
id->lbaf[i].ds <= PAGE_SHIFT)
best = &id->lbaf[i];
return 1 << best->ds;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
2017-09-20 17:48 ` Christoph Hellwig
@ 2017-09-20 19:07 ` Keith Busch
2017-09-20 20:01 ` Martin K. Petersen
2017-09-20 20:10 ` Christoph Hellwig
0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2017-09-20 19:07 UTC (permalink / raw)
On Wed, Sep 20, 2017@10:48:14AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017@01:48:42PM -0400, Keith Busch wrote:
> > I think what you're wanting to say is:
> >
> > The physical block size will default to the logical block size unless
> > otherwise specified. While NVMe doesn't provide a way to discover the
> > physical block size, the format with best relative performance is a
> > good indicator as to the underlying block size.
>
> I don't like this at all. It's a really nasty guesswork. If you need
> this to get reasonable performance out of a specific device please
> quirk it.
I don't think it's about "reasonable" performance; it's about getting
extra relative performance. What else can the best performing LBAF
indicate other than the device's preferred access alignment/granularity?
The spec provides this hint, so it's not really a guess, but maybe
there's a better way to make use of it instead of considering it to be
the physical block size? io_opt?
On a slightly related topic, I think we should fix the consistency
in what's reported in the queue's attributes after reformatting the
namespace. Check out the following for what happens today:
Start with a 512b format:
# cat /sys/block/nvme0n1/queue/{minimum_io_size,logical_block_size,physical_block_size}
512
512
512
Format it to 4k:
# cat /sys/block/nvme0n1/queue/{minimum_io_size,logical_block_size,physical_block_size}
4096
4096
4096
Format it back to 512b:
# cat /sys/block/nvme0n1/queue/{minimum_io_size,logical_block_size,physical_block_size}
4096
512
4096
The first and last are the exact same NVMe format, but they're reported
differently.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
2017-09-20 19:07 ` Keith Busch
@ 2017-09-20 20:01 ` Martin K. Petersen
2017-09-20 20:10 ` Christoph Hellwig
1 sibling, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2017-09-20 20:01 UTC (permalink / raw)
Keith,
> I don't think it's about "reasonable" performance; it's about getting
> extra relative performance. What else can the best performing LBAF
> indicate other than the device's preferred access alignment/granularity?
> The spec provides this hint, so it's not really a guess, but maybe
> there's a better way to make use of it instead of considering it to be
> the physical block size? io_opt?
It should really report io_min and not physical_block_size. pbs is a
special case for spinning media.
I agree it's a bit weird to walk the format list as a heuristic. Would
be nicer to have a smallest preferred I/O size in the protocol.
> The first and last are the exact same NVMe format, but they're
> reported differently.
Yeah, we should set pbs/io_min as well.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
2017-09-20 19:07 ` Keith Busch
2017-09-20 20:01 ` Martin K. Petersen
@ 2017-09-20 20:10 ` Christoph Hellwig
1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2017-09-20 20:10 UTC (permalink / raw)
On Wed, Sep 20, 2017@03:07:52PM -0400, Keith Busch wrote:
> I don't think it's about "reasonable" performance; it's about getting
> extra relative performance. What else can the best performing LBAF
> indicate other than the device's preferred access alignment/granularity?
> The spec provides this hint, so it's not really a guess, but maybe
> there's a better way to make use of it instead of considering it to be
> the physical block size? io_opt?
>From Documentation/ABI/testing/sysfs-block:
What: /sys/block/<disk>/queue/physical_block_size
Date: May 2009
Contact: Martin K. Petersen <martin.petersen at oracle.com>
Description:
This is the smallest unit a physical storage device can
write atomically. It is usually the same as the logical
block size but may be bigger. One example is SATA
drives with 4KB sectors that expose a 512-byte logical
block size to the operating system. For stacked block
devices the physical_block_size variable contains the
maximum physical_block_size of the component devices.
The best performing format certainly isn't related to that at all.
If we'd really want to set a physical_block_size we'd have to based
it on top of AWUN/AWUPF (although I still struggle how you define
a global value based on LBAs if the device supports different LBA
formats) or NAWUN/NABSN/NABSPF if supported.
The closest to what you seem to want above would be io_min:
What: /sys/block/<disk>/queue/minimum_io_size
Date: April 2009
Contact: Martin K. Petersen <martin.petersen at oracle.com>
Description:
Storage devices may report a granularity or preferred
minimum I/O size which is the smallest request the
device can perform without incurring a performance
penalty. For disk drives this is often the physical
block size. For RAID arrays it is often the stripe
chunk size. A properly aligned multiple of
minimum_io_size is the preferred request size for
workloads where a high number of I/O operations is
desired.
> On a slightly related topic, I think we should fix the consistency
> in what's reported in the queue's attributes after reformatting the
> namespace. Check out the following for what happens today:
I think we just need to opt into always getting the Namespace Attribute
Notices AER and this should be taken care of automatically? I was
going to send a patch for that for other reasons eventually.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-09-20 20:10 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-20 17:06 [PATCH] nvme: set physical block size to value discovered in Identify Namespace Andrzej Jakowski
2017-09-20 17:48 ` Keith Busch
2017-09-20 17:48 ` Christoph Hellwig
2017-09-20 19:07 ` Keith Busch
2017-09-20 20:01 ` Martin K. Petersen
2017-09-20 20:10 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.