* [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
@ 2024-10-16 21:31 Abhishek Bapat
2024-10-16 21:54 ` Prashant Malani
2024-10-17 16:40 ` Keith Busch
0 siblings, 2 replies; 14+ messages in thread
From: Abhishek Bapat @ 2024-10-16 21:31 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: Prashant Malani, linux-nvme, linux-kernel, Abhishek
From: Abhishek <abhishekbapat@google.com>
The NVMe driver initializes the max_hw_sectors_kb value by issuing an
NVMe Identify Controller command and reading the MDTS (Max Data
Transfer Size) field from the result. Commit 3710e2b056cb ("nvme-pci:
clamp max_hw_sectors based on DMA optimized limitation") introduced a
limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
(MDTS = 5). This restriction was implemented to mitigate lockups
encountered on high-core-count AMD servers.
Currently, user space applications have two options for obtaining the
max_hw_sectors_kb value to determine the payload size of the NVMe
command they wish to issue. They can either execute the Identify
Controller command or query the kernel. In instances where the
underlying NVMe device supports MDTS > 5 (128KiB), the user space
application can potentially create an NVMe command with a payload size
greater than 128KiB, if it fetches the MDTS value through the Identify
Controller command. However, this would result in an Invalid Argument
(-EINVAL) error, preventing the application from issuing the required
command through any of the kernel-supported I/O APIs. Presently, the
kernel exposes the max_hw_sectors_kb value through a queue sysfs file.
However, this file is only present for an NVMe device if a namespace has
been created on the same NVMe device, necessitating the existence of a
namespace to query the value of max_hw_sectors_kb. This dependency is
semantically incorrect as MDTS is a controller-associated field (section
5.1.13, NVMe specification 2.1) and should be accessible regardless of
the presence of a namespace on the NVMe device.
Expose the value of max_hw_sectors_kb through NVMe sysfs to remove the
dependency of having a namespace on the device before accessing its
value.
Signed-off-by: Abhishek <abhishekbapat@google.com>
---
drivers/nvme/host/sysfs.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index b68a9e5f1ea3..1af2b2cf1a6c 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -546,6 +546,17 @@ static ssize_t dctype_show(struct device *dev,
}
static DEVICE_ATTR_RO(dctype);
+static ssize_t max_hw_sectors_kb_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+ u32 max_hw_sectors_kb = ctrl->max_hw_sectors >> 1;
+
+ return sysfs_emit(buf, "%u\n", max_hw_sectors_kb);
+}
+static DEVICE_ATTR_RO(max_hw_sectors_kb);
+
#ifdef CONFIG_NVME_HOST_AUTH
static ssize_t nvme_ctrl_dhchap_secret_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -687,6 +698,7 @@ static struct attribute *nvme_dev_attrs[] = {
&dev_attr_kato.attr,
&dev_attr_cntrltype.attr,
&dev_attr_dctype.attr,
+ &dev_attr_max_hw_sectors_kb.attr,
#ifdef CONFIG_NVME_HOST_AUTH
&dev_attr_dhchap_secret.attr,
&dev_attr_dhchap_ctrl_secret.attr,
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-16 21:31 [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces Abhishek Bapat
@ 2024-10-16 21:54 ` Prashant Malani
2024-10-17 21:09 ` Abhishek Bapat
2024-10-17 16:40 ` Keith Busch
1 sibling, 1 reply; 14+ messages in thread
From: Prashant Malani @ 2024-10-16 21:54 UTC (permalink / raw)
To: Abhishek Bapat
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme, linux-kernel
Hi Abhishek,
On Wed, 16 Oct 2024 at 14:31, Abhishek Bapat <abhishekbapat@google.com> wrote:
>
> From: Abhishek <abhishekbapat@google.com>
Here and in the S-o-b line: Please use your full legal name.
> diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
> index b68a9e5f1ea3..1af2b2cf1a6c 100644
> --- a/drivers/nvme/host/sysfs.c
> +++ b/drivers/nvme/host/sysfs.c
> @@ -546,6 +546,17 @@ static ssize_t dctype_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(dctype);
>
> +static ssize_t max_hw_sectors_kb_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> + u32 max_hw_sectors_kb = ctrl->max_hw_sectors >> 1;
In what unit is max_hw_sectors stored? If it's "number of sectors", is this
conversion to size correct, or should SECTOR_SHIFT be used?
Thanks,
--
-Prashant
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-16 21:54 ` Prashant Malani
@ 2024-10-17 21:09 ` Abhishek Bapat
0 siblings, 0 replies; 14+ messages in thread
From: Abhishek Bapat @ 2024-10-17 21:09 UTC (permalink / raw)
To: Prashant Malani
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme, linux-kernel
On Wed, Oct 16, 2024 at 2:54 PM Prashant Malani <pmalani@google.com> wrote:
>
> Hi Abhishek,
>
>
> On Wed, 16 Oct 2024 at 14:31, Abhishek Bapat <abhishekbapat@google.com> wrote:
> >
> > From: Abhishek <abhishekbapat@google.com>
>
> Here and in the S-o-b line: Please use your full legal name.
Acknowledged.
>
> > diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
> > index b68a9e5f1ea3..1af2b2cf1a6c 100644
> > --- a/drivers/nvme/host/sysfs.c
> > +++ b/drivers/nvme/host/sysfs.c
> > @@ -546,6 +546,17 @@ static ssize_t dctype_show(struct device *dev,
> > }
> > static DEVICE_ATTR_RO(dctype);
> >
> > +static ssize_t max_hw_sectors_kb_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> > + u32 max_hw_sectors_kb = ctrl->max_hw_sectors >> 1;
>
> In what unit is max_hw_sector stored? If it's "number of sectors", is this
> conversion to size correct, or should SECTOR_SHIFT be used?
The unit of max_hw_sectors is 512-byte sectors. Left-shifting by
SECTOR_SHIFT converts the value to bytes, after which we would right-shift
by 10 to convert it into KiB. Since SECTOR_SHIFT is 9, the net effect is a
right shift by one. The same programming pattern is used within the
block layer code as well.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-16 21:31 [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces Abhishek Bapat
2024-10-16 21:54 ` Prashant Malani
@ 2024-10-17 16:40 ` Keith Busch
2024-10-17 17:01 ` Caleb Sander
` (2 more replies)
1 sibling, 3 replies; 14+ messages in thread
From: Keith Busch @ 2024-10-17 16:40 UTC (permalink / raw)
To: Abhishek Bapat
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Prashant Malani,
linux-nvme, linux-kernel
On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
> max_hw_sectors based on DMA optimized limitation") introduced a
> limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
> (MDTS = 5). This restricion was implemented to mitigate lockups
> encountered in high-core count AMD servers.
There are other limits that can constrain transfer sizes below the
device's MDTS. For example, the driver can only preallocate so much
space for DMA and SGL descriptors, so 8MB is the current max transfer
sizes the driver can support, and a device's MDTS can be much bigger
than that.
Anyway, yeah, I guess having a controller generic way to export this
sounds like a good idea, but I wonder if the nvme driver is the right
place to do it. The request_queue has all the limits you need to know
about, but these are only exported if a gendisk is attached to it.
Maybe we can create a queue subdirectory to the char dev too.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-17 16:40 ` Keith Busch
@ 2024-10-17 17:01 ` Caleb Sander
2024-10-17 21:32 ` Abhishek Bapat
2024-10-18 5:14 ` Christoph Hellwig
2 siblings, 0 replies; 14+ messages in thread
From: Caleb Sander @ 2024-10-17 17:01 UTC (permalink / raw)
To: Keith Busch
Cc: Abhishek Bapat, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Prashant Malani, linux-nvme, linux-kernel
I agree it would be convenient for the kernel to expose a generic "max
data size" limit for the NVMe controller. For example, nvme-cli
currently pessimistically assumes a controller's maximum data transfer
size is 4 KB when sending Get Log Page commands:
https://github.com/linux-nvme/libnvme/blob/8cdd746b324bd84a0666e7a265aa253dbda9d932/src/nvme/ioctl.c#L330.
Fetching large log pages results in a lot of Get Log Page commands. If
nvme-cli could tell that the controller and kernel support larger data
transfers, it could fetch the entire log page (or a much larger chunk)
in a Get Log Page command.
Best,
Caleb
On Thu, Oct 17, 2024 at 9:46 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
> > max_hw_sectors based on DMA optimized limitation") introduced a
> > limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
> > (MDTS = 5). This restricion was implemented to mitigate lockups
> > encountered in high-core count AMD servers.
>
> There are other limits that can constrain transfer sizes below the
> device's MDTS. For example, the driver can only preallocate so much
> space for DMA and SGL descriptors, so 8MB is the current max transfer
> sizes the driver can support, and a device's MDTS can be much bigger
> than that.
>
> Anyway, yeah, I guess having a controller generic way to export this
> sounds like a good idea, but I wonder if the nvme driver is the right
> place to do it. The request_queue has all the limits you need to know
> about, but these are only exported if a gendisk is attached to it.
> Maybe we can create a queue subdirectory to the char dev too.
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-17 16:40 ` Keith Busch
2024-10-17 17:01 ` Caleb Sander
@ 2024-10-17 21:32 ` Abhishek Bapat
2024-10-22 14:53 ` Keith Busch
2024-10-18 5:14 ` Christoph Hellwig
2 siblings, 1 reply; 14+ messages in thread
From: Abhishek Bapat @ 2024-10-17 21:32 UTC (permalink / raw)
To: Keith Busch
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Prashant Malani,
linux-nvme, linux-kernel
On Thu, Oct 17, 2024 at 9:40 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
> > max_hw_sectors based on DMA optimized limitation") introduced a
> > limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
> > (MDTS = 5). This restricion was implemented to mitigate lockups
> > encountered in high-core count AMD servers.
>
> There are other limits that can constrain transfer sizes below the
> device's MDTS. For example, the driver can only preallocate so much
> space for DMA and SGL descriptors, so 8MB is the current max transfer
> sizes the driver can support, and a device's MDTS can be much bigger
> than that.
>
> Anyway, yeah, I guess having a controller generic way to export this
> sounds like a good idea, but I wonder if the nvme driver is the right
> place to do it. The request_queue has all the limits you need to know
> about, but these are only exported if a gendisk is attached to it.
> Maybe we can create a queue subdirectory to the char dev too.
Are you suggesting that all the files from the queue subdirectory should
be included in the char dev (/sys/class/nvme/nvmeX/queue/)? Or that
just the max_hw_sectors_kb value should be shared within the queue
subdirectory? And if not the nvme driver, where else can this be done
from?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-17 21:32 ` Abhishek Bapat
@ 2024-10-22 14:53 ` Keith Busch
2024-10-22 15:35 ` Sagi Grimberg
2024-10-23 5:24 ` Christoph Hellwig
0 siblings, 2 replies; 14+ messages in thread
From: Keith Busch @ 2024-10-22 14:53 UTC (permalink / raw)
To: Abhishek Bapat
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Prashant Malani,
linux-nvme, linux-kernel
On Thu, Oct 17, 2024 at 02:32:18PM -0700, Abhishek Bapat wrote:
> On Thu, Oct 17, 2024 at 9:40 AM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
> > > max_hw_sectors based on DMA optimized limitation") introduced a
> > > limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
> > > (MDTS = 5). This restricion was implemented to mitigate lockups
> > > encountered in high-core count AMD servers.
> >
> > There are other limits that can constrain transfer sizes below the
> > device's MDTS. For example, the driver can only preallocate so much
> > space for DMA and SGL descriptors, so 8MB is the current max transfer
> > sizes the driver can support, and a device's MDTS can be much bigger
> > than that.
> >
> > Anyway, yeah, I guess having a controller generic way to export this
> > sounds like a good idea, but I wonder if the nvme driver is the right
> > place to do it. The request_queue has all the limits you need to know
> > about, but these are only exported if a gendisk is attached to it.
> > Maybe we can create a queue subdirectory to the char dev too.
>
> Are you suggesting that all the files from the queue subdirectory should
> be included in the char dev (/sys/class/nvme/nvmeX/queue/)? Or that
> just the max_hw_sectors_kb value should be shared within the queue
> subdirectory? And if not the nvme driver, where else can this be done
> from?
You may want to know max_sectors_kb, dma_alignment, nr_requests,
virt_boundary_mask. Maybe some others.
The request_queue is owned by the block layer, so that seems like an
okay place to export it, but attached to some other device's sysfs
directory instead of a gendisk.
I'm just suggesting this because it doesn't sound like this is an nvme
specific problem.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-22 14:53 ` Keith Busch
@ 2024-10-22 15:35 ` Sagi Grimberg
2024-10-22 15:51 ` Keith Busch
2024-10-23 5:24 ` Christoph Hellwig
1 sibling, 1 reply; 14+ messages in thread
From: Sagi Grimberg @ 2024-10-22 15:35 UTC (permalink / raw)
To: Keith Busch, Abhishek Bapat
Cc: Jens Axboe, Christoph Hellwig, Prashant Malani, linux-nvme,
linux-kernel
On 22/10/2024 17:53, Keith Busch wrote:
> On Thu, Oct 17, 2024 at 02:32:18PM -0700, Abhishek Bapat wrote:
>> On Thu, Oct 17, 2024 at 9:40 AM Keith Busch <kbusch@kernel.org> wrote:
>>> On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
>>>> max_hw_sectors based on DMA optimized limitation") introduced a
>>>> limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
>>>> (MDTS = 5). This restricion was implemented to mitigate lockups
>>>> encountered in high-core count AMD servers.
>>> There are other limits that can constrain transfer sizes below the
>>> device's MDTS. For example, the driver can only preallocate so much
>>> space for DMA and SGL descriptors, so 8MB is the current max transfer
>>> sizes the driver can support, and a device's MDTS can be much bigger
>>> than that.
>>>
>>> Anyway, yeah, I guess having a controller generic way to export this
>>> sounds like a good idea, but I wonder if the nvme driver is the right
>>> place to do it. The request_queue has all the limits you need to know
>>> about, but these are only exported if a gendisk is attached to it.
>>> Maybe we can create a queue subdirectory to the char dev too.
>> Are you suggesting that all the files from the queue subdirectory should
>> be included in the char dev (/sys/class/nvme/nvmeX/queue/)? Or that
>> just the max_hw_sectors_kb value should be shared within the queue
>> subdirectory? And if not the nvme driver, where else can this be done
>> from?
> You'd may want to know max_sectors_kb, dma_alignment, nr_requests,
> virt_boundary_mask. Maybe some others.
>
> The request_queue is owned by the block layer, so that seems like an
> okay place to export it, but attached to some other device's sysfs
> directory instead of a gendisk.
>
> I'm just suggesting this because it doesn't sound like this is an nvme
> specific problem.
Won't it be confusing to find queue/ directory in controller nvmeX sysfs
entry?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-22 15:35 ` Sagi Grimberg
@ 2024-10-22 15:51 ` Keith Busch
2024-10-23 9:47 ` Sagi Grimberg
0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2024-10-22 15:51 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Abhishek Bapat, Jens Axboe, Christoph Hellwig, Prashant Malani,
linux-nvme, linux-kernel
On Tue, Oct 22, 2024 at 06:35:11PM +0300, Sagi Grimberg wrote:
> On 22/10/2024 17:53, Keith Busch wrote:
> > On Thu, Oct 17, 2024 at 02:32:18PM -0700, Abhishek Bapat wrote:
> >
> > The request_queue is owned by the block layer, so that seems like an
> > okay place to export it, but attached to some other device's sysfs
> > directory instead of a gendisk.
> >
> > I'm just suggesting this because it doesn't sound like this is an nvme
> > specific problem.
>
> Won't it be confusing to find queue/ directory in controller nvmeX sysfs
> entry?
It's the attributes of the request queue associated with that
controller, so I think a queue/ directory under it makes sense. That's
how it looks for gendisks, so why not for disk-less queues?
Many queue attributes only make sense for gendisks, though, so we may
need to tweak visibility if we decide to do it like this.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-22 15:51 ` Keith Busch
@ 2024-10-23 9:47 ` Sagi Grimberg
0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2024-10-23 9:47 UTC (permalink / raw)
To: Keith Busch
Cc: Abhishek Bapat, Jens Axboe, Christoph Hellwig, Prashant Malani,
linux-nvme, linux-kernel
On 22/10/2024 18:51, Keith Busch wrote:
> On Tue, Oct 22, 2024 at 06:35:11PM +0300, Sagi Grimberg wrote:
>> On 22/10/2024 17:53, Keith Busch wrote:
>>> On Thu, Oct 17, 2024 at 02:32:18PM -0700, Abhishek Bapat wrote:
>>>
>>> The request_queue is owned by the block layer, so that seems like an
>>> okay place to export it, but attached to some other device's sysfs
>>> directory instead of a gendisk.
>>>
>>> I'm just suggesting this because it doesn't sound like this is an nvme
>>> specific problem.
>> Won't it be confusing to find queue/ directory in controller nvmeX sysfs
>> entry?
> It's the attributes of the request queue associated with that
> controller, so I think a queue/ directory under it makes sense. That's
> how it looks for gendisks, so why not for disk-less queues?
>
> Many queue attributes only make sense for gendisks, though, so maybe
> need to tweak visibility if we decide to do it like this.
It'd be good to name it "admin_queue" to clarify a bit (although it's still
confusing).
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-22 14:53 ` Keith Busch
2024-10-22 15:35 ` Sagi Grimberg
@ 2024-10-23 5:24 ` Christoph Hellwig
2024-10-23 9:46 ` Sagi Grimberg
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-23 5:24 UTC (permalink / raw)
To: Keith Busch
Cc: Abhishek Bapat, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Prashant Malani, linux-nvme, linux-kernel
On Tue, Oct 22, 2024 at 08:53:47AM -0600, Keith Busch wrote:
> You'd may want to know max_sectors_kb, dma_alignment, nr_requests,
> virt_boundary_mask. Maybe some others.
>
> The request_queue is owned by the block layer, so that seems like an
> okay place to export it, but attached to some other device's sysfs
> directory instead of a gendisk.
>
> I'm just suggesting this because it doesn't sound like this is an nvme
> specific problem.
Well, it's a problem specific to passthrough without a gendisk, which is
the NVMe admin queue and the /dev/sg device. So it's common-ish :)
Note that for programs using passthrough, sysfs isn't actually a very
good interface, as finding the right directory is a pain, as is opening,
reading and parsing one ASCII file per limit.
One thing I've been wanting to do also for mkfs tools and similar is a
generic extensible ioctl to dump all the queue limits. That's a lot
easier and faster for the tools and would work very well here.
Note that we could still be adding new limits at any point of time
(although I have a hard time thinking what limit we don't have yet),
so we still can't guarantee that non-trivial I/O will always work.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-23 5:24 ` Christoph Hellwig
@ 2024-10-23 9:46 ` Sagi Grimberg
0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2024-10-23 9:46 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch
Cc: Abhishek Bapat, Jens Axboe, Prashant Malani, linux-nvme,
linux-kernel
On 23/10/2024 8:24, Christoph Hellwig wrote:
> On Tue, Oct 22, 2024 at 08:53:47AM -0600, Keith Busch wrote:
>> You'd may want to know max_sectors_kb, dma_alignment, nr_requests,
>> virt_boundary_mask. Maybe some others.
>>
>> The request_queue is owned by the block layer, so that seems like an
>> okay place to export it, but attached to some other device's sysfs
>> directory instead of a gendisk.
>>
>> I'm just suggesting this because it doesn't sound like this is an nvme
>> specific problem.
> Well, it's a problem specific to passthrough without a gendisk, which is
> the NVMe admin queue and the /dev/sg device. So it's common-ish :)
>
>
> Note that for the programs using passthrough sysfs isn't actually a very
> good interface, as finding the right directory is pain, as is opening,
> reading and parsing one ASCIII file per limit.
>
> One thing I've been wanting to do also for mkfs tools and similar is a
> generic extensible ioctl to dump all the queue limits. That's a lot
> easier and faster for the tools and would work very well here.
>
> Note that we could still be adding new limits at any point of time
> (although I have a hard time thinking what limit we don't have yet),
> so we still can't guarantee that non-trivial I/O will always work.
Makes sense to me. Although people would still like to be able to
see this value outside of an application context. We can probably
extend nvme-cli to display this info...
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-17 16:40 ` Keith Busch
2024-10-17 17:01 ` Caleb Sander
2024-10-17 21:32 ` Abhishek Bapat
@ 2024-10-18 5:14 ` Christoph Hellwig
2024-10-20 21:25 ` Sagi Grimberg
2 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2024-10-18 5:14 UTC (permalink / raw)
To: Keith Busch
Cc: Abhishek Bapat, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Prashant Malani, linux-nvme, linux-kernel
On Thu, Oct 17, 2024 at 10:40:36AM -0600, Keith Busch wrote:
> On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
> > max_hw_sectors based on DMA optimized limitation") introduced a
> > limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
> > (MDTS = 5). This restricion was implemented to mitigate lockups
> > encountered in high-core count AMD servers.
>
> There are other limits that can constrain transfer sizes below the
> device's MDTS. For example, the driver can only preallocate so much
> space for DMA and SGL descriptors, so 8MB is the current max transfer
> sizes the driver can support, and a device's MDTS can be much bigger
> than that.
Yes. Plus the virt boundary for PRPs, and for non-PCIe transfers
there are also plenty of other hardware limits due to e.g. the FC HBA
and the RDMA HCA limit. There's also been some talk of a new PCIe
SGL variant with hard limits.
So I agree that exposing limits on I/O would be very useful, but it's
also kinda non-trivial.
> Anyway, yeah, I guess having a controller generic way to export this
> sounds like a good idea, but I wonder if the nvme driver is the right
> place to do it. The request_queue has all the limits you need to know
> about, but these are only exported if a gendisk is attached to it.
> Maybe we can create a queue subdirectory to the char dev too.
If we want it controller wide to e.g. include the admin queue the
gendisk won't really help unfortunately.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces
2024-10-18 5:14 ` Christoph Hellwig
@ 2024-10-20 21:25 ` Sagi Grimberg
0 siblings, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2024-10-20 21:25 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch
Cc: Abhishek Bapat, Jens Axboe, Prashant Malani, linux-nvme,
linux-kernel
On 18/10/2024 8:14, Christoph Hellwig wrote:
> On Thu, Oct 17, 2024 at 10:40:36AM -0600, Keith Busch wrote:
>> On Wed, Oct 16, 2024 at 09:31:08PM +0000, Abhishek Bapat wrote:
>>> max_hw_sectors based on DMA optimized limitation") introduced a
>>> limitation on the value of max_hw_sectors_kb, restricting it to 128KiB
>>> (MDTS = 5). This restricion was implemented to mitigate lockups
>>> encountered in high-core count AMD servers.
>> There are other limits that can constrain transfer sizes below the
>> device's MDTS. For example, the driver can only preallocate so much
>> space for DMA and SGL descriptors, so 8MB is the current max transfer
>> sizes the driver can support, and a device's MDTS can be much bigger
>> than that.
> Yes. Plus the virt boundary for PRPs, and for non-PCIe tranfers
> there's also plenty of other hardware limits due to e.g. the FC HBA
> and the RDMA HCA limit. There's also been some talk of a new PCIe
> SGL variant with hard limits.
>
> So I agree that exposting limits on I/O would be very useful, but it's
> also kinda non-trivial.
I think the ctrl misc device attributes are fine to expose this and other
types of attributes (like we already do today).
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-10-23 9:47 UTC | newest]
Thread overview: 14+ messages
2024-10-16 21:31 [PATCH] nvme-sysfs: display max_hw_sectors_kb without requiring namespaces Abhishek Bapat
2024-10-16 21:54 ` Prashant Malani
2024-10-17 21:09 ` Abhishek Bapat
2024-10-17 16:40 ` Keith Busch
2024-10-17 17:01 ` Caleb Sander
2024-10-17 21:32 ` Abhishek Bapat
2024-10-22 14:53 ` Keith Busch
2024-10-22 15:35 ` Sagi Grimberg
2024-10-22 15:51 ` Keith Busch
2024-10-23 9:47 ` Sagi Grimberg
2024-10-23 5:24 ` Christoph Hellwig
2024-10-23 9:46 ` Sagi Grimberg
2024-10-18 5:14 ` Christoph Hellwig
2024-10-20 21:25 ` Sagi Grimberg