All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
@ 2017-09-20 17:06 Andrzej Jakowski
  2017-09-20 17:48 ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Andrzej Jakowski @ 2017-09-20 17:06 UTC (permalink / raw)


In old implementation physical block size exported by NVMe driver defaulted to
logical block size. Existing NVMe SSDs very often
Musze naprawic zeby brac pod uwage relative performance.

Signed-off-by: Andrzej Jakowski <andrzej.jakowski at intel.com>
---
 drivers/nvme/host/core.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37046ac2c441..e20c5c6517fc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1194,11 +1194,36 @@ static int nvme_revalidate_ns(struct nvme_ns *ns, struct nvme_id_ns **id)
 	return 0;
 }
 
+static u32 calc_pbs(struct nvme_id_ns *id)
+{
+	struct nvme_lbaf *lbaf_used;
+	int i, max_id = 0;
+	u8 ds = 0;
+
+	lbaf_used = &id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK];
+	if (lbaf_used->rp == NVME_LBAF_RP_BEST)
+		return (((u32)1) << lbaf_used->ds);
+
+	for (i = 0; i < id->nlbaf; i++) {
+		if (id->lbaf[i].ms == 0 && id->lbaf[i].ds > ds &&
+			id->lbaf[i].rp < lbaf_used->rp && id->lbaf[i].ds <= PAGE_SHIFT)
+			max_id = i;
+
+		ds = id->lbaf[i].ds;
+	}
+
+	if (id->lbaf[max_id].ds)
+		return (((u32)1) << id->lbaf[max_id].ds);
+	else
+		return 0;
+}
+
 static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
 	struct nvme_ns *ns = disk->private_data;
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	u16 bs;
+	u32 pbs;
 
 	/*
 	 * If identify namespace failed, use default 512 byte block size so
@@ -1208,6 +1233,9 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
 	bs = 1 << ns->lba_shift;
+	pbs = calc_pbs(id);
+	if (pbs == 0)
+		pbs = bs;
 	ns->noiob = le16_to_cpu(id->noiob);
 
 	blk_mq_freeze_queue(disk->queue);
@@ -1215,6 +1243,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
 		nvme_prep_integrity(disk, id, bs);
 	blk_queue_logical_block_size(ns->queue, bs);
+	blk_queue_physical_block_size(ns->queue, pbs);
+
 	if (ns->noiob)
 		nvme_set_chunk_size(ns);
 	if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
  2017-09-20 17:48 ` Keith Busch
@ 2017-09-20 17:48   ` Christoph Hellwig
  2017-09-20 19:07     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-09-20 17:48 UTC (permalink / raw)


On Wed, Sep 20, 2017@01:48:42PM -0400, Keith Busch wrote:
> I think what you're wanting to say is:
> 
>   The physical block size will default to the logical block size unless
>   otherwise specified. While NVMe doesn't provide a way to discover the
>   physical block size, the format with best relative performance is a
>   good indicator as to the underlying block size.

I don't like this at all.  It's a really nasty guesswork.  If you need
this to get reasonable performance out of a specific device please
quirk it.

NVMe doesn't have the legacy issues that require 4k physical sectors
and 512 logical like ATA, so if a device really prefers 4k sectors
it should skip with 4k format by default instead of doing the logical /
physical dance.  Especially that a "physical" blocksize that Linux
could actually deal with (<= 4k) is a complete lie for flash anyway.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
  2017-09-20 17:06 [PATCH] nvme: set physical block size to value discovered in Identify Namespace Andrzej Jakowski
@ 2017-09-20 17:48 ` Keith Busch
  2017-09-20 17:48   ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2017-09-20 17:48 UTC (permalink / raw)


On Wed, Sep 20, 2017@10:06:05AM -0700, Andrzej Jakowski wrote:
> In old implementation physical block size exported by NVMe driver defaulted to
> logical block size. Existing NVMe SSDs very often
> Musze naprawic zeby brac pod uwage relative performance.

Dziekuje for the patch, but the change logs gotta be in English. :)

I think what you're wanting to say is:

  The physical block size will default to the logical block size unless
  otherwise specified. While NVMe doesn't provide a way to discover the
  physical block size, the format with best relative performance is a
  good indicator as to the underlying block size.

> +static u32 calc_pbs(struct nvme_id_ns *id)
> +{
> +	struct nvme_lbaf *lbaf_used;
> +	int i, max_id = 0;
> +	u8 ds = 0;
> +
> +	lbaf_used = &id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK];
> +	if (lbaf_used->rp == NVME_LBAF_RP_BEST)
> +		return (((u32)1) << lbaf_used->ds);
> +
> +	for (i = 0; i < id->nlbaf; i++) {
> +		if (id->lbaf[i].ms == 0 && id->lbaf[i].ds > ds &&
> +			id->lbaf[i].rp < lbaf_used->rp && id->lbaf[i].ds <= PAGE_SHIFT)
> +			max_id = i;
> +
> +		ds = id->lbaf[i].ds;
> +	}
> +
> +	if (id->lbaf[max_id].ds)
> +		return (((u32)1) << id->lbaf[max_id].ds);
> +	else
> +		return 0;
> +}

I think this can be a little clearner. How about:

static u32 calc_pbs(struct nvme_id_ns *id)
{
	int i;
	struct nvme_lbaf *best = &id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK];

	for (i = 0; i < id->nlbaf; i++)
		if (id->lbaf[i].rp <= best->rp &&
				id->lbaf[i].ds > best->lbaf.ds &&
				id->lbaf[i].ds <= PAGE_SHIFT)
			best = &id->lbaf[i];

	return 1 << best->ds;
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
  2017-09-20 17:48   ` Christoph Hellwig
@ 2017-09-20 19:07     ` Keith Busch
  2017-09-20 20:01       ` Martin K. Petersen
  2017-09-20 20:10       ` Christoph Hellwig
  0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2017-09-20 19:07 UTC (permalink / raw)


On Wed, Sep 20, 2017@10:48:14AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017@01:48:42PM -0400, Keith Busch wrote:
> > I think what you're wanting to say is:
> > 
> >   The physical block size will default to the logical block size unless
> >   otherwise specified. While NVMe doesn't provide a way to discover the
> >   physical block size, the format with best relative performance is a
> >   good indicator as to the underlying block size.
> 
> I don't like this at all.  It's a really nasty guesswork.  If you need
> this to get reasonable performance out of a specific device please
> quirk it.

I don't think it's about "reasonable" performance; it's about getting
extra relative performance. What else can the best performing LBAF
indicate other than the device's preferred access alignment/granularity?
The spec provides this hint, so it's not really a guess, but maybe
there's a better way to make use of it instead of  considering it to be
the physical block size? io_opt?

On a slightly related topic, I think we should fix the  consistency
in what's reported in the queue's attributes after reformatting the
namespace. Check out the following for what happens today:

Start with a 512b format:

# cat /sys/block/nvme0n1/queue/{minimum_io_size,logical_block_size,physical_block_size}
512
512
512

Format it to 4k:

# cat /sys/block/nvme0n1/queue/{minimum_io_size,logical_block_size,physical_block_size}
4096
4096
4096

Format it back to 512b:

# cat /sys/block/nvme0n1/queue/{minimum_io_size,logical_block_size,physical_block_size}
4096
512
4096

The first and last are the exact same NVMe format, but they're reported
differently.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
  2017-09-20 19:07     ` Keith Busch
@ 2017-09-20 20:01       ` Martin K. Petersen
  2017-09-20 20:10       ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2017-09-20 20:01 UTC (permalink / raw)



Keith,

> I don't think it's about "reasonable" performance; it's about getting
> extra relative performance. What else can the best performing LBAF
> indicate other than the device's preferred access alignment/granularity?
> The spec provides this hint, so it's not really a guess, but maybe
> there's a better way to make use of it instead of  considering it to be
> the physical block size? io_opt?

It should really report io_min and not physical_block_size. pbs is a
special case for spinning media.

I agree it's a bit weird to walk the format list as a heuristic. Would
be nicer to have a smallest preferred I/O size in the protocol.

> The first and last are the exact same NVMe format, but they're
> reported differently.

Yeah, we should set pbs/io_min as well.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] nvme: set physical block size to value discovered in Identify Namespace
  2017-09-20 19:07     ` Keith Busch
  2017-09-20 20:01       ` Martin K. Petersen
@ 2017-09-20 20:10       ` Christoph Hellwig
  1 sibling, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2017-09-20 20:10 UTC (permalink / raw)


On Wed, Sep 20, 2017@03:07:52PM -0400, Keith Busch wrote:
> I don't think it's about "reasonable" performance; it's about getting
> extra relative performance. What else can the best performing LBAF
> indicate other than the device's preferred access alignment/granularity?
> The spec provides this hint, so it's not really a guess, but maybe
> there's a better way to make use of it instead of  considering it to be
> the physical block size? io_opt?

>From Documentation/ABI/testing/sysfs-block:

What:           /sys/block/<disk>/queue/physical_block_size
Date:           May 2009
Contact:        Martin K. Petersen <martin.petersen at oracle.com>
Description:
                This is the smallest unit a physical storage device can
                write atomically.  It is usually the same as the logical
                block size but may be bigger.  One example is SATA
                drives with 4KB sectors that expose a 512-byte logical
                block size to the operating system.  For stacked block
                devices the physical_block_size variable contains the
                maximum physical_block_size of the component devices.

The best performing format certainly isn't related to that at all.
If we'd really want to set a physical_block_size we'd have to based
it on top of AWUN/AWUPF (although I still struggle how you define
a global value based on LBAs if the device supports different LBA
formats) or NAWUN/NABSN/NABSPF if supported.

The closest to what you seem to want above would be io_min:

What:           /sys/block/<disk>/queue/minimum_io_size
Date:           April 2009
Contact:        Martin K. Petersen <martin.petersen at oracle.com>
Description:
                Storage devices may report a granularity or preferred
                minimum I/O size which is the smallest request the
                device can perform without incurring a performance
                penalty.  For disk drives this is often the physical
                block size.  For RAID arrays it is often the stripe
                chunk size.  A properly aligned multiple of
                minimum_io_size is the preferred request size for
                workloads where a high number of I/O operations is
                desired.

> On a slightly related topic, I think we should fix the  consistency
> in what's reported in the queue's attributes after reformatting the
> namespace. Check out the following for what happens today:

I think we just need to opt into always getting the Namespace Attribute
Notices AER and this should be taken care of automatically?  I was
going to send a patch for that for other reasons eventually.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-09-20 20:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-20 17:06 [PATCH] nvme: set physical block size to value discovered in Identify Namespace Andrzej Jakowski
2017-09-20 17:48 ` Keith Busch
2017-09-20 17:48   ` Christoph Hellwig
2017-09-20 19:07     ` Keith Busch
2017-09-20 20:01       ` Martin K. Petersen
2017-09-20 20:10       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.