Linux-NVME Archive on lore.kernel.org
* NVMe CLI Invalid PRP Entry Status Failures
       [not found] <MW3PR16MB3897F0E659B34877C4D5197CF391A@MW3PR16MB3897.namprd16.prod.outlook.com>
@ 2025-05-15 14:01 ` Jeffrey Lien
  2025-05-15 14:18   ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Jeffrey Lien @ 2025-05-15 14:01 UTC (permalink / raw)
  To: linux-nvme@lists.infradead.org; +Cc: Avinash M N


We are seeing invalid PRP entry status failures with transfer lengths > 0x2000 on nvme-cli admin-passthru commands.  This started happening on the 6.10 Linux kernel; the same commands work on kernels <= 5.15.  See the example commands below.  I didn't find any obvious commits in that time frame that might be causing these failures.  Does anyone have insight into what changes might be causing these errors?

Example command: sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x2000 -r -b
 
-	Initially, any transfer length > 0x2000 fails with invalid PRP entry status:

$ sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x2000 -r -b
Admin Command Vendor Specific is Success and result: 0x00000000

$ sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x2001 -r -b
NVMe status: PRP Offset Invalid: The Offset field for a PRP entry is invalid(0x13)

-	After a few retries, even lower transfer lengths start failing with the same invalid PRP entry status:

$ sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x1000 -r -b
NVMe status: PRP Offset Invalid: The Offset field for a PRP entry is invalid(0x13)

$ sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 128 -r -b
NVMe status: PRP Offset Invalid: The Offset field for a PRP entry is invalid(0x13)


Jeff Lien
SW Tools Development
 

 
2900 37th St NW, Building 108, Rochester, MN 55901
Email: jeff.lien@sandisk.com
Phone: 507.322.2416
Mobile: 507.273.9124



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
  2025-05-15 14:01 ` NVMe CLI Invalid PRP Entry Status Failures Jeffrey Lien
@ 2025-05-15 14:18   ` Keith Busch
       [not found]     ` <LV3PR16MB606775F391112B2DADF64463B590A@LV3PR16MB6067.namprd16.prod.outlook.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-05-15 14:18 UTC (permalink / raw)
  To: Jeffrey Lien; +Cc: linux-nvme@lists.infradead.org, Avinash M N

On Thu, May 15, 2025 at 02:01:50PM +0000, Jeffrey Lien wrote:
> 
> We are seeing invalid PRP entry status failures with transfer
> lengths > 0x2000 on nvme-cli admin-passthru commands.  This started
> happening on the 6.10 Linux kernel; the same commands work on kernels
> <= 5.15.  See the example commands below.  I didn't find any obvious
> commits in that time frame that might be causing these failures.  Does
> anyone have insight into what changes might be causing these errors?
> 
> Example command: sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x2000 -r -b
>  
> -	Initially, any transfer length > 0x2000 fails with invalid PRP entry status:
> $ sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x2000 -r -b
> Admin Command Vendor Specific is Success and result: 0x00000000

The only thing that comes to mind is that we allow dword-aligned buffers
now. The driver used to require 4k-aligned buffers, but the NVMe spec
doesn't require that. My guess is your device is incorrectly rejecting
admin commands that have a dword offset in PRP1. Do you have any
visibility into the device's reasoning for this response?
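
A minimal sketch of that experiment, reusing the opcode and cdw values
from the commands above and /dev/nvme0: issue the same command once from
a 4 KiB-aligned buffer and once from a buffer deliberately offset by one
dword, and compare what comes back. Error handling is omitted, and
whether the kernel bounces the unaligned buffer depends on the kernel
version, which is exactly the behavior being probed:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

static int issue(int fd, void *buf, unsigned int len)
{
	struct nvme_admin_cmd cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode   = 0xD2;		/* vendor-specific opcode from the thread */
	cmd.cdw10    = 0x12600;
	cmd.cdw12    = 0x00010132;
	cmd.addr     = (uintptr_t)buf;
	cmd.data_len = len;

	/* 0 on success, positive NVMe status code if the device failed the
	 * command, -1 with errno set on a kernel-level error */
	return ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
}

int main(void)
{
	unsigned int len = 0x2000;
	void *buf;
	int fd;

	fd = open("/dev/nvme0", O_RDONLY);
	if (fd < 0 || posix_memalign(&buf, 4096, len + 4096))
		return 1;

	printf("4 KiB-aligned buffer:  status %d\n", issue(fd, buf, len));
	printf("dword-offset buffer:   status %d\n",
	       issue(fd, (char *)buf + 4, len));

	free(buf);
	close(fd);
	return 0;
}

If the aligned buffer succeeds where the offset one fails, that points
at the device rejecting the dword-offset PRP1 that newer kernels are
allowed to send.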


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
       [not found]     ` <LV3PR16MB606775F391112B2DADF64463B590A@LV3PR16MB6067.namprd16.prod.outlook.com>
@ 2025-05-16 12:33       ` Keith Busch
       [not found]         ` <LV3PR16MB6067E928D739DCB65F31FE46B593A@LV3PR16MB6067.namprd16.prod.outlook.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-05-16 12:33 UTC (permalink / raw)
  To: Avinash M N; +Cc: Jeffrey Lien, linux-nvme@lists.infradead.org

On Fri, May 16, 2025 at 12:16:36PM +0000, Avinash M N wrote:
> Hi Keith,
> 
> There are 2 issues with the same command.
> 
> 
>   1.  $ sudo nvme admin-passthru /dev/nvme0 --opcode=0xD2 --cdw10=0x12600 --cdw12=0x00010132 -l 0x4F000 -r -b
> passthru: Invalid argument

Usually that means you're transferring something too large. What is your
device's MDTS value?
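
For reference, MDTS is reported as a power of two in units of the
controller's minimum memory page size (CAP.MPSMIN), so turning it into a
byte limit looks roughly like the helper below (the function name is made
up for illustration):

/* Convert an Identify Controller MDTS field to bytes.  MDTS is
 * 2^mdts units of the minimum memory page size, which is itself
 * 2^(12 + CAP.MPSMIN) bytes; mdts == 0 means no limit is reported. */
static unsigned long long mdts_to_bytes(unsigned char mdts,
					unsigned char cap_mpsmin)
{
	unsigned long long min_page = 1ULL << (12 + cap_mpsmin);

	if (!mdts)
		return 0;	/* no reported limit */
	return min_page << mdts;
}

With MPSMIN = 0 (4 KiB pages), an MDTS of 9 works out to the 2 MB figure
reported later in this thread.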


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
       [not found]         ` <LV3PR16MB6067E928D739DCB65F31FE46B593A@LV3PR16MB6067.namprd16.prod.outlook.com>
@ 2025-05-16 14:41           ` Keith Busch
  2025-05-24  4:22             ` Avinash M N
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-05-16 14:41 UTC (permalink / raw)
  To: Avinash M N; +Cc: Jeffrey Lien, linux-nvme@lists.infradead.org

On Fri, May 16, 2025 at 12:49:07PM +0000, Avinash M N wrote:
> The device MDTS value is 2 MB. The same commands and device work fine on older kernels. The testing was on 5.14.

Not sure then. You may have to trace into the kernel to see where it's
getting the EINVAL error from.

And if you respond, please use plain text so that the mailing list
thread doesn't have gaps.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
  2025-05-16 14:41           ` Keith Busch
@ 2025-05-24  4:22             ` Avinash M N
  2025-05-24 14:26               ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Avinash M N @ 2025-05-24  4:22 UTC (permalink / raw)
  To: Keith Busch; +Cc: Jeffrey Lien, linux-nvme@lists.infradead.org, Rahul Jain

The function below is causing the EINVAL to be returned to userspace.

int blk_rq_append_bio(struct request *rq, struct bio *bio)
{
        const struct queue_limits *lim = &rq->q->limits;
        unsigned int max_bytes = lim->max_hw_sectors << SECTOR_SHIFT;
        unsigned int nr_segs = 0;
        int ret;

        /* check that the data layout matches the hardware restrictions */
        ret = bio_split_rw_at(bio, lim, &nr_segs, max_bytes);
        if (ret) {
                /* if we would have to split the bio, copy instead */
                if (ret > 0) {
                        ret = -EREMOTEIO;
                }
                return ret;
        }
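        /* ... rest of the function not shown ... */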

There is no issue if nvme-cli sends a transfer length of up to 128K. Anything more than 128K fails with EINVAL. I guess this comes from the BIO_MAX_VECS limit of 256. Since this was working on older kernels, did anything change in this regard?

The failing command was attempting to transfer a length of ~320K. It seems that nvme-cli does not split the transfers and sends the full transfer length to the kernel.

Thanks,
Avinash 

On 16/05/25, 8:12 PM, "Keith Busch" <kbusch@kernel.org> wrote:

> On Fri, May 16, 2025 at 12:49:07PM +0000, Avinash M N wrote:
> > The device MDTS value is 2 MB. The same commands and device work fine on older kernels. The testing was on 5.14.
>
> Not sure then. You may have to trace into the kernel to see where it's
> getting the EINVAL error from.
>
> And if you respond, please use plain text so that the mailing list
> thread doesn't have gaps.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
  2025-05-24  4:22             ` Avinash M N
@ 2025-05-24 14:26               ` Keith Busch
  2025-05-26  7:09                 ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2025-05-24 14:26 UTC (permalink / raw)
  To: Avinash M N; +Cc: Jeffrey Lien, linux-nvme@lists.infradead.org, Rahul Jain

On Sat, May 24, 2025 at 04:22:11AM +0000, Avinash M N wrote:
> The below function is causing the EINVAL to be returned to the userspace.
> 
> int blk_rq_append_bio(struct request *rq, struct bio *bio)
> {
>         const struct queue_limits *lim = &rq->q->limits;
>         unsigned int max_bytes = lim->max_hw_sectors << SECTOR_SHIFT;
>         unsigned int nr_segs = 0;
>         int ret;
> 
>         /* check that the data layout matches the hardware restrictions */
>         ret = bio_split_rw_at(bio, lim, &nr_segs, max_bytes);
>         if (ret) {
>                 /* if we would have to split the bio, copy instead */
>                 if (ret > 0) {
>                         ret = -EREMOTEIO;
>                 }
>                 return ret;
>         }
> 
> There is no issue if nvme-cli sends a transfer length of up to 128K.
> Anything more than 128K fails with EINVAL. I guess this comes from the
> BIO_MAX_VECS limit of 256. Since this was working on older kernels, did
> anything change in this regard?

Well, it's passing a virtually contiguous address, so if we assume your
page size is 4k, 256 vectors would allow up to 1MB without a problem.

But the NVMe pci driver has its own limit of 128 vectors, so 512k is the
largest you can safely go before you need to increase the size of pages
via hugetlbfs.
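
As a quick sanity check of those two limits, assuming 4 KiB pages and
taking the segment counts above at face value:

#include <stdio.h>

int main(void)
{
	unsigned int page_size = 4096;		/* assumed PAGE_SIZE */
	unsigned int bio_max_vecs = 256;	/* BIO_MAX_VECS */
	unsigned int nvme_pci_segs = 128;	/* NVMe PCI driver limit cited above */

	printf("bio limit:      %u KiB\n", bio_max_vecs * page_size / 1024);
	printf("nvme-pci limit: %u KiB\n", nvme_pci_segs * page_size / 1024);
	return 0;
}

which prints 1024 KiB and 512 KiB respectively.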

But you say you're doing something smaller than 512k, and your device's
MDTS is bigger than that, so what else is limiting your transfer size
here? Do you have some udev rule that is reducing your max_sectors
value?

Check the value of /sys/block/nvmeXnY/queue/max_sectors_kb

See if it matches /sys/block/nvmeXnY/queue/max_hw_sectors_kb

If not, echo the higher value into the "max_sectors_kb" attribute and
see if that fixes your problem.
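
A small sketch of that comparison (nvme0n1 is an assumed namespace name;
adjust the paths for your device):

#include <stdio.h>

static long read_kb(const char *path)
{
	FILE *f = fopen(path, "r");
	long v = -1;

	if (f) {
		if (fscanf(f, "%ld", &v) != 1)
			v = -1;
		fclose(f);
	}
	return v;
}

int main(void)
{
	long soft = read_kb("/sys/block/nvme0n1/queue/max_sectors_kb");
	long hard = read_kb("/sys/block/nvme0n1/queue/max_hw_sectors_kb");

	printf("max_sectors_kb    = %ld\n", soft);
	printf("max_hw_sectors_kb = %ld\n", hard);
	if (soft > 0 && hard > 0 && soft < hard)
		printf("soft limit is lower; something (e.g. a udev rule) capped it\n");
	return 0;
}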
 
> The failing command was attempting to transfer a length of ~320K. It
> seems that nvme-cli does not split the transfers and sends the full
> transfer length to the kernel.

Your command is vendor specific; nvme-cli has no idea what your command
does so it can't possibly know how to properly split it up. Try
something nvme-cli knows about, like a telemetry-log, and nvme-cli will
split it up for you.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
  2025-05-24 14:26               ` Keith Busch
@ 2025-05-26  7:09                 ` Christoph Hellwig
  2025-05-27 15:44                   ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2025-05-26  7:09 UTC (permalink / raw)
  To: Keith Busch
  Cc: Avinash M N, Jeffrey Lien, linux-nvme@lists.infradead.org,
	Rahul Jain

On Sat, May 24, 2025 at 08:26:17AM -0600, Keith Busch wrote:
> > There is no issue if nvme-cli sends a transfer length of up to 128K.
> > Anything more than 128K fails with EINVAL. I guess this comes from the
> > BIO_MAX_VECS limit of 256. Since this was working on older kernels, did
> > anything change in this regard?
> 
> Well, it's passing a virtually contiguous address, so if we assume your
> page size is 4k, 256 vectors would allow up to 1MB without a problem.
> 
> But the NVMe pci driver has its own limit of 128 vectors, so 512k is the
> largest you can safely go before you need to increase the size of pages
> via hugetlbfs.
> 
> But you say you're doing something smaller than 512k, and your device's
> MDTS is bigger than that, so what else is limiting your transfer size
> here? Do you have some udev rule that is reducing your max_sectors
> value?
> 
> Check the value of /sys/block/nvmeXnY/queue/max_sectors_kb
> 
> See if it matches /sys/block/nvmeXnY/queue/max_hw_sectors_kb

bio_split_rw_at doesn't look at max_sectors unless that is passed
in as the argument.

It would be good to just throw in debug printks to see which splitting
decision in bio_split_rw_at triggers, including checking the exact
condition in bvec_split_segs.
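
As a rough sketch of that kind of instrumentation, even a one-off printk
right after the bio_split_rw_at() call quoted earlier (using the locals
from that snippet) would narrow down which limit is being tripped before
instrumenting bio_split_rw_at and bvec_split_segs themselves:

	ret = bio_split_rw_at(bio, lim, &nr_segs, max_bytes);
	/* temporary debug aid, not for merging */
	if (ret)
		pr_info("blk_rq_append_bio: ret=%d bi_size=%u nr_segs=%u max_bytes=%u max_segments=%u virt_boundary=%#lx\n",
			ret, bio->bi_iter.bi_size, nr_segs, max_bytes,
			(unsigned int)lim->max_segments,
			(unsigned long)lim->virt_boundary_mask);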



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NVMe CLI Invalid PRP Entry Status Failures
  2025-05-26  7:09                 ` Christoph Hellwig
@ 2025-05-27 15:44                   ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2025-05-27 15:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Avinash M N, Jeffrey Lien, linux-nvme@lists.infradead.org,
	Rahul Jain

On Mon, May 26, 2025 at 12:09:21AM -0700, Christoph Hellwig wrote:
> On Sat, May 24, 2025 at 08:26:17AM -0600, Keith Busch wrote:
> 
> bio_split_rw_at doesn't look at max_sectors unless that is passed
> in as the argument.

Oh right, this passthrough path uses the max_hw_sectors limit instead.
 
> It would be good to just throw in debug printks to see which splitting
> decision in bio_split_rw_at triggers, including checking the exact
> condition in bvec_split_segs.

FWIW, I can't reproduce the issue on 6.10.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-27 15:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <MW3PR16MB3897F0E659B34877C4D5197CF391A@MW3PR16MB3897.namprd16.prod.outlook.com>
2025-05-15 14:01 ` NVMe CLI Invalid PRP Entry Status Failures Jeffrey Lien
2025-05-15 14:18   ` Keith Busch
     [not found]     ` <LV3PR16MB606775F391112B2DADF64463B590A@LV3PR16MB6067.namprd16.prod.outlook.com>
2025-05-16 12:33       ` Keith Busch
     [not found]         ` <LV3PR16MB6067E928D739DCB65F31FE46B593A@LV3PR16MB6067.namprd16.prod.outlook.com>
2025-05-16 14:41           ` Keith Busch
2025-05-24  4:22             ` Avinash M N
2025-05-24 14:26               ` Keith Busch
2025-05-26  7:09                 ` Christoph Hellwig
2025-05-27 15:44                   ` Keith Busch
