* stack smashing detected with 'nvme sanitize-log /dev/nvme0'
@ 2023-07-26 11:52 Daniel Wagner
2023-07-26 13:16 ` Christoph Hellwig
2023-07-27 1:30 ` Ming Lei
0 siblings, 2 replies; 16+ messages in thread
From: Daniel Wagner @ 2023-07-26 11:52 UTC (permalink / raw)
To: linux-nvme@lists.infradead.org
Cc: Guangwu Zhang, Ming Lei, Christoph Hellwig, Keith Busch
FYI, I got a bug report [1] with a 'stack smashing detected' error when running
'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
udisks. udisks recently added libnvme, which now issues a sanitize-log call, so
this problem might have existed for a while.
We figured out that an older kernel such as 4.19.289 works but newer ones do
not (it's a bit hard for the reporter to test all combinations on his setup
due to compiler changes etc.).
There was a bit of refactoring in v5.2 which could be the cause of the stack
smash, because I saw this recent fix:
b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
[1] https://github.com/storaged-project/udisks/issues/1152
^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-26 11:52 stack smashing detected with 'nvme sanitize-log /dev/nvme0' Daniel Wagner
@ 2023-07-26 13:16 ` Christoph Hellwig
  2023-08-21 13:37   ` Daniel Wagner
  2023-07-27  1:30 ` Ming Lei
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-07-26 13:16 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei,
	Christoph Hellwig, Keith Busch

On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> FYI, I got a a bug report [1] with a 'stack smashing detected' when running
> 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> udisk. udisk recently added libnvme which does now a sanitize-log call, so this
> problem might exists for a while.
>
> We figured out that an older kernel such as 4.19.289 work but newer not (it's a
> bit hard for the reporter to test all combinations on his setup due to compiler
> changes etc.).
>
> There was a bit of refactoring in v5.2 which could be the cause of the stack
> smash, because saw this recent fix:
>
> b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
>
> [1] https://github.com/storaged-project/udisks/issues/1152

If you think it is related to DMA, there are good ways to check:

 1) force that an IOMMU is used for this device
 2) hack nvme or the blk-map code so that we never do the direct mapping
    to user space but use the copy-based version, and then enable all
    kernel memory debugging helpers, most importantly KASAN

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-26 13:16 ` Christoph Hellwig
@ 2023-08-21 13:37   ` Daniel Wagner
  2023-08-21 15:11     ` Keith Busch
  2023-08-28  9:18     ` Christoph Hellwig
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Wagner @ 2023-08-21 13:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei,
	Keith Busch, Oleg Solovyov

On Wed, Jul 26, 2023 at 03:16:43PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> > FYI, I got a a bug report [1] with a 'stack smashing detected' when running
> > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> > udisk. udisk recently added libnvme which does now a sanitize-log call, so this
> > problem might exists for a while.
> >
> > We figured out that an older kernel such as 4.19.289 work but newer not (it's a
> > bit hard for the reporter to test all combinations on his setup due to compiler
> > changes etc.).
> >
> > There was a bit of refactoring in v5.2 which could be the cause of the stack
> > smash, because saw this recent fix:
> >
> > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
> >
> > [1] https://github.com/storaged-project/udisks/issues/1152
>
> If you think it is related to DMA, there are good ways to check for:
>
> 1) force that an IOMMU is used for this device
> 2) hack nvme or the blk-map code that we never do the direct mapping
>    to user space but do the copy based version, and then enable
>    all kernel memory debugging helpers, most importantly KASAN

Collected some info:

- this happens only with devices from MAXIO Technology

    vid       : 0x1e4b
    ssvid     : 0x1e4b

- Oleg bisected the kernel and landed on 3b2a1ebceba3 ("nvme: set
  dma alignment to qword").
  He hacked the kernel and this made it work again:

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
 		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
 	}
 	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
-	blk_queue_dma_alignment(q, 3);
 	blk_queue_write_cache(q, vwc, vwc);
 }

- modified the reproducer so that it allocates a 4k buffer with a well
  known pattern and checked the buffer after fetching the sanitize log
  [1]. The first invocation wrote 0x940 bytes and the second 0x5c0
  bytes. Note we are only asking for 0x200 bytes.

- modified blk_rq_map_user_iov so that it always uses the copy path.
  The problem went away. Though I forgot to ask to turn on KASAN; given
  that we know the device writes too much data, it is likely also
  overwriting some kernel memory.

So what's the best way forward from here? Introduce a quirk and always
use a bounce buffer?

[1] https://github.com/storaged-project/libblockdev/pull/951#issuecomment-1676276491

^ permalink raw reply	[flat|nested] 16+ messages in thread
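[Editor's note: the pattern check described above can be sketched in plain
user space. This is an illustrative sketch, not the actual reproducer: the
names (`pattern_fill`, `bytes_written`) are made up, and the get-log ioctl
against the device is omitted entirely.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATTERN 0xa5

/* Fill the oversized buffer with a known pattern before handing it
 * to the device. */
static void pattern_fill(unsigned char *buf, size_t len)
{
	memset(buf, PATTERN, len);
}

/* After the command completes, estimate how many bytes were written
 * by locating the last byte that no longer holds the pattern.
 * (If the device happens to write a PATTERN byte at the very end,
 * this undercounts -- good enough for a debugging aid.) */
static size_t bytes_written(const unsigned char *buf, size_t len)
{
	while (len > 0 && buf[len - 1] == PATTERN)
		len--;
	return len;
}
```

In the real reproducer the buffer is handed to the get-log call; on the
affected MAXIO devices one would then see far more than the requested
0x200 bytes disturbed.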
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-21 13:37 ` Daniel Wagner @ 2023-08-21 15:11 ` Keith Busch 2023-08-22 7:55 ` Daniel Wagner 2023-08-28 9:20 ` Christoph Hellwig 2023-08-28 9:18 ` Christoph Hellwig 1 sibling, 2 replies; 16+ messages in thread From: Keith Busch @ 2023-08-21 15:11 UTC (permalink / raw) To: Daniel Wagner Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Mon, Aug 21, 2023 at 03:37:55PM +0200, Daniel Wagner wrote: > On Wed, Jul 26, 2023 at 03:16:43PM +0200, Christoph Hellwig wrote: > > On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote: > > > FYI, I got a a bug report [1] with a 'stack smashing detected' when running > > > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against > > > udisk. udisk recently added libnvme which does now a sanitize-log call, so this > > > problem might exists for a while. > > > > > > We figured out that an older kernel such as 4.19.289 work but newer not (it's a > > > bit hard for the reporter to test all combinations on his setup due to compiler > > > changes etc.). > > > > > > There was a bit of refactoring in v5.2 which could be the cause of the stack > > > smash, because saw this recent fix: > > > > > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data") > > > > > > [1] https://github.com/storaged-project/udisks/issues/1152 > > > > If you think it is related to DMA, there are good ways to check for: > > > > 1) force that an IOMMU is used for this device > > 2) hack nvme or the blk-map code that we never do the direct mapping > > to user space but do the copy based version, and then enable > > all kernel memory debugging helpers, most importantly KASAN > > Collected some info: > > - this happens only with devices from MAXIO Technology > > vid : 0x1e4b > ssvid : 0x1e4b > > - Oleg bissect the kernel and he landed on 3b2a1ebceba3 ("nvme: set > dma alignment to qword"). 
> He hacked the kernel and this made it work again:
>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
>  		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
>  	}
>  	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
> -	blk_queue_dma_alignment(q, 3);
>  	blk_queue_write_cache(q, vwc, vwc);
>  }
>
> - modified the reproducer so that it allocates a 4k buffer with a well
>   known pattern and checked the buffer after fetching the sanitize log
>   [1]. The first invocation wrote 0x940 bytes and the second 0x5c0
>   bytes. Note we just asking for 0x200 bytes.
>
> - modified blk_rq_map_user_iov so that it always uses the copy path.
>   The problem went away. Though forgot to ask to turn on KASAN but
>   given that we know the device writes too much data it is likely also
>   overwriting some kernel memory.
>
> So what's the best way forward from here? Introduce a quirk and always
> use bounce buffer?

I don't think we want to bounce to kernel memory for the device to
overwrite it. I suggest just changing nvme-cli's stack-allocated sanitize
log to use a page-aligned and page-sized buffer.
---
diff --git a/nvme.c b/nvme.c
index 181141ad..7f98a506 100644
--- a/nvme.c
+++ b/nvme.c
@@ -2376,7 +2376,7 @@ ret:
 static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
 {
 	const char *desc = "Retrieve sanitize log and show it.";
-	struct nvme_sanitize_log_page sanitize_log;
+	struct nvme_sanitize_log_page *sanitize_log;
 	enum nvme_print_flags flags;
 	struct nvme_dev *dev;
 	int err;
@@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
 	if (cfg.human_readable)
 		flags |= VERBOSE;
 
-	err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
+	if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
+		err = -1;
+		goto close_dev;
+	}
+
+	err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
 	if (!err)
-		nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
+		nvme_show_sanitize_log(sanitize_log, dev->name, flags);
 	else if (err > 0)
 		nvme_show_status(err);
 	else
 		nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
+	free(sanitize_log);
 close_dev:
 	dev_close(dev);
 ret:
--

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-21 15:11 ` Keith Busch
@ 2023-08-22  7:55   ` Daniel Wagner
  2023-08-23 15:37     ` Keith Busch
  2023-08-28  9:20   ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2023-08-22 7:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote:
> I don't think we want to bounce to kernel memory for the device to
> overwrite it. I suggest just change nvme-cli's stack allocated santize
> log to a use page aligned and sized buffer.

Sure, we can add workarounds in userspace, though I think this is a
regression and should ideally be addressed in the kernel somehow.

> diff --git a/nvme.c b/nvme.c
> index 181141ad..7f98a506 100644
> --- a/nvme.c
> +++ b/nvme.c
> @@ -2376,7 +2376,7 @@ ret:
>  static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
>  {
>  	const char *desc = "Retrieve sanitize log and show it.";
> -	struct nvme_sanitize_log_page sanitize_log;
> +	struct nvme_sanitize_log_page *sanitize_log;
>  	enum nvme_print_flags flags;
>  	struct nvme_dev *dev;
>  	int err;
> @@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
>  	if (cfg.human_readable)
>  		flags |= VERBOSE;
>
> -	err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
> +	if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
> +		err = -1;
> +		goto close_dev;
> +	}
> +
> +	err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
>  	if (!err)
> -		nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
> +		nvme_show_sanitize_log(sanitize_log, dev->name, flags);
>  	else if (err > 0)
>  		nvme_show_status(err);
>  	else
>  		nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
> +	free(sanitize_log);
>  close_dev:
>  	dev_close(dev);
>  ret:

Anyway, I suppose we want to do this for all get-log commands; I'm not
sure if it is limited to the sanitize log alone.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-22  7:55 ` Daniel Wagner
@ 2023-08-23 15:37   ` Keith Busch
  2023-08-25  6:36     ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2023-08-23 15:37 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Tue, Aug 22, 2023 at 09:55:38AM +0200, Daniel Wagner wrote:
> On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote:
> > I don't think we want to bounce to kernel memory for the device to
> > overwrite it. I suggest just change nvme-cli's stack allocated santize
> > log to a use page aligned and sized buffer.
>
> Sure we can add workarounds in userspace, though I think this is a
> regression and should ideally address in the kernel somehow.

Not sure if this is a regression. Even prior to the DMA alignment
settings change, you'd still hit this if your stack buffer happened to
be 512b aligned. A kernel bounce buffer sounds like it would just
corrupt kernel memory instead, so we can't have the driver simply go
back to the way it was for these devices; someone has to over-allocate
the buffer.

A device overrunning its DMA buffer, though, sounds bad enough that a
firmware fix should happen instead.
> > diff --git a/nvme.c b/nvme.c
> > index 181141ad..7f98a506 100644
> > --- a/nvme.c
> > +++ b/nvme.c
> > @@ -2376,7 +2376,7 @@ ret:
> >  static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
> >  {
> >  	const char *desc = "Retrieve sanitize log and show it.";
> > -	struct nvme_sanitize_log_page sanitize_log;
> > +	struct nvme_sanitize_log_page *sanitize_log;
> >  	enum nvme_print_flags flags;
> >  	struct nvme_dev *dev;
> >  	int err;
> > @@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
> >  	if (cfg.human_readable)
> >  		flags |= VERBOSE;
> >
> > -	err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
> > +	if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
> > +		err = -1;
> > +		goto close_dev;
> > +	}
> > +
> > +	err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
> >  	if (!err)
> > -		nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
> > +		nvme_show_sanitize_log(sanitize_log, dev->name, flags);
> >  	else if (err > 0)
> >  		nvme_show_status(err);
> >  	else
> >  		nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
> > +	free(sanitize_log);
> >  close_dev:
> >  	dev_close(dev);
> >  ret:
>
> Anyway, I suppose we want to do this far all get log commands, not sure
> if it is limited to the get sanitize log alone.

Perhaps for safety we could do that for all logs < 4k in size. There
are not very many of those.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-23 15:37 ` Keith Busch
@ 2023-08-25  6:36   ` Daniel Wagner
  2023-08-28  9:21     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2023-08-25 6:36 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Wed, Aug 23, 2023 at 09:37:48AM -0600, Keith Busch wrote:
> On Tue, Aug 22, 2023 at 09:55:38AM +0200, Daniel Wagner wrote:
> > On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote:
> > > I don't think we want to bounce to kernel memory for the device to
> > > overwrite it. I suggest just change nvme-cli's stack allocated santize
> > > log to a use page aligned and sized buffer.
> >
> > Sure we can add workarounds in userspace, though I think this is a
> > regression and should ideally address in the kernel somehow.
>
> Not sure if this is a regression. Even prior to the DMA alignment
> settings change, you'd still hit this if your stack buffer aligned to
> 512b. The kernel bounce buffer sounds like it just corrupts kernel
> memory instead, so we can't have the driver just go back to the way it
> was for these devices; someone has to over allocate the buffer.

Okay, let's ignore the regression argument then. But what about the
fact that we are asking for 512 bytes via the kernel's API and get back
too much data? Isn't this something we should address? I mean, this
forces all users of this kernel API to allocate large enough buffers to
handle this device. Alternatively, we could add bounce buffers in
libnvme instead of in the kernel.

> A device over running their DMA buffer, though, sounds bad enough that a
> firmware fix should happen instead.

Oleg contacted the manufacturer, but I'm not getting my hopes up at all.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-25 6:36 ` Daniel Wagner @ 2023-08-28 9:21 ` Christoph Hellwig 2023-09-25 15:09 ` Daniel Wagner 0 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2023-08-28 9:21 UTC (permalink / raw) To: Daniel Wagner Cc: Keith Busch, Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Fri, Aug 25, 2023 at 08:36:50AM +0200, Daniel Wagner wrote: > Okay, let's ignore the regression argument then. But what about the fact > we are asking for 512 bytes via the kernels API and get too much data? > Isn't this something we should address? I mean this forces all users of > this kernel API allocate enough large buffers to handle this device. There isn't really much the kernel can do except for using an IOMMU when available to protect itself from this, but that will mean we're shutting down the device when it does that. The device just seems completely broken unfortunately. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-28  9:21 ` Christoph Hellwig
@ 2023-09-25 15:09   ` Daniel Wagner
  2023-09-25 15:19     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2023-09-25 15:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Mon, Aug 28, 2023 at 11:21:55AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 25, 2023 at 08:36:50AM +0200, Daniel Wagner wrote:
> > Okay, let's ignore the regression argument then. But what about the fact
> > we are asking for 512 bytes via the kernels API and get too much data?
> > Isn't this something we should address? I mean this forces all users of
> > this kernel API allocate enough large buffers to handle this device.
>
> There isn't really much the kernel can do except for using an IOMMU
> when available to protect itself from this, but that will mean we're
> shutting down the device when it does that.
>
> The device just seems completely broken unfortunately.

Just a follow-up on this. I've updated nvme-cli so that all payloads
are allocated via the nvme_alloc() helper, which ensures that each
payload starts at a 4k boundary and the buffer is a multiple of 4k.
This should address the issue.

As it turns out, more devices suffer from this problem: SK hynix PC611
NVMe 512GB SSD [1].

[1] https://github.com/storaged-project/udisks/issues/1193

^ permalink raw reply	[flat|nested] 16+ messages in thread
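[Editor's note: a minimal sketch of what such an allocator could look like.
The name `log_page_alloc` is hypothetical; the real `nvme_alloc()` helper in
nvme-cli may differ in signature and behaviour. The idea is just to round the
requested length up to a whole page and hand back page-aligned, zeroed memory,
so a device that DMAs past the requested 512 bytes only scribbles on padding
the caller owns.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate a zeroed, page-aligned buffer whose size is rounded up to
 * a multiple of the page size. Returns NULL on failure. */
static void *log_page_alloc(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	/* Page sizes are powers of two, so masking rounds up cleanly. */
	size_t rounded = (len + page - 1) & ~(page - 1);
	void *buf;

	/* posix_memalign() returns non-zero on failure; buf is then
	 * unspecified, so check before use. */
	if (posix_memalign(&buf, page, rounded))
		return NULL;
	memset(buf, 0, rounded);
	return buf;
}
```

A caller asking for the 512-byte sanitize log would then get (at least) a
full 4k page, absorbing the extra bytes a misbehaving device writes.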
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-09-25 15:09 ` Daniel Wagner
@ 2023-09-25 15:19   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2023-09-25 15:19 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme@lists.infradead.org,
	Guangwu Zhang, Ming Lei, Oleg Solovyov

On Mon, Sep 25, 2023 at 05:09:16PM +0200, Daniel Wagner wrote:
> > The device just seems completely broken unfortunately.
>
> Just a follow up on this. I've update nvme-cli so that all payloads are
> allocated via the nvme_alloc() helper which ensures that the payloads
> start at a 4k boundary and the buffer is multiple of 4k. This should
> address this issue.

It does not address the issue, it just works around it. I think we need
a kernel-level quirk to make sure we never issue commands to these
devices that trigger this broken behavior, as the stack smashing is a
security problem.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-21 15:11 ` Keith Busch 2023-08-22 7:55 ` Daniel Wagner @ 2023-08-28 9:20 ` Christoph Hellwig 2023-08-29 13:29 ` Keith Busch 1 sibling, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2023-08-28 9:20 UTC (permalink / raw) To: Keith Busch Cc: Daniel Wagner, Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote: > I don't think we want to bounce to kernel memory for the device to > overwrite it. I suggest just change nvme-cli's stack allocated santize > log to a use page aligned and sized buffer. That assumes it actually overwrites it in that case and doesn't just have a PRP parsing bug when there is not enough alignment. We should be able to find out by enabling KASAN and then requiring the larger alignment before re-running the reproducer. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-28 9:20 ` Christoph Hellwig @ 2023-08-29 13:29 ` Keith Busch 0 siblings, 0 replies; 16+ messages in thread From: Keith Busch @ 2023-08-29 13:29 UTC (permalink / raw) To: Christoph Hellwig Cc: Daniel Wagner, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Mon, Aug 28, 2023 at 11:20:38AM +0200, Christoph Hellwig wrote: > On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote: > > I don't think we want to bounce to kernel memory for the device to > > overwrite it. I suggest just change nvme-cli's stack allocated santize > > log to a use page aligned and sized buffer. > > That assumes it actually overwrites it in that case and doesn't just > have a PRP parsing bug when there is not enough alignment. > > We should be able to find out by enabling KASAN and then requiring the > larger alignment before re-running the reproducer. Good point. I assumed it was a simple buffer overrun regardless of the starting offset, but bad PRP parsing sounds plausible. If you don't want to enable a kernel with kasan, we can just align user space buffers with padding at different offsets and see what happens. ^ permalink raw reply [flat|nested] 16+ messages in thread
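[Editor's note: the offset sweep suggested above can report not just whether
memory was clobbered but where, which helps tell a plain overrun apart from
mis-parsed PRPs that write before or around the buffer. A hedged sketch of
just the scanning side -- `PATTERN`, `struct touched`, and `touched_range`
are illustrative names, and the actual get-log call against the device is
omitted.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATTERN 0x5a

struct touched {
	size_t first;	/* first modified byte, or len if untouched */
	size_t end;	/* one past the last modified byte */
};

/* Scan a pattern-filled page into which the 512-byte log buffer was
 * placed at some offset. Bytes modified before the offset, or past
 * offset + 512, indicate the device or driver wrote outside the
 * requested range. */
static struct touched touched_range(const unsigned char *page, size_t len)
{
	struct touched t = { len, 0 };

	for (size_t i = 0; i < len; i++) {
		if (page[i] != PATTERN) {
			if (t.first == len)
				t.first = i;
			t.end = i + 1;
		}
	}
	return t;
}
```

Running this for each offset in the sweep would show whether the clobbered
range shifts with the buffer (an overrun) or stays page-relative (a PRP
parsing bug).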
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-21 13:37 ` Daniel Wagner 2023-08-21 15:11 ` Keith Busch @ 2023-08-28 9:18 ` Christoph Hellwig 1 sibling, 0 replies; 16+ messages in thread From: Christoph Hellwig @ 2023-08-28 9:18 UTC (permalink / raw) To: Daniel Wagner Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Keith Busch, Oleg Solovyov On Mon, Aug 21, 2023 at 03:37:55PM +0200, Daniel Wagner wrote: > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl, > blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX)); > } > blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1); > - blk_queue_dma_alignment(q, 3); > blk_queue_write_cache(q, vwc, vwc); > } > > So what's the best way forward from here? Introduce a quirk and always > use bounce buffer? Add a quirk for the device so that we require 512 byte alignment for it. I suspect the same one will apply to this whole family of buggy MAXIO devices.. ^ permalink raw reply [flat|nested] 16+ messages in thread
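[Editor's note: the quirk Christoph describes would essentially widen the
queue's DMA alignment mask back to sector granularity for the affected
controllers. A hypothetical user-space sketch of just that policy -- the
quirk flag name and bit position are assumptions, not a real kernel flag;
only the mask semantics of blk_queue_dma_alignment() (low address bits that
must be zero) are taken from the thread.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quirk bit, not an existing kernel definition. */
#define NVME_QUIRK_DMA_ALIGNMENT	(1u << 20)

/* Mask 3 keeps the relaxed alignment set by 3b2a1ebceba3; mask 511
 * forces 512-byte-aligned user buffers for devices that mis-handle
 * anything smaller. */
static uint32_t nvme_dma_alignment(uint32_t quirks)
{
	return (quirks & NVME_QUIRK_DMA_ALIGNMENT) ? 511 : 3;
}
```

The driver would set the quirk in the PCI ID table entry for the affected
vendor/device IDs and pass the resulting mask to blk_queue_dma_alignment().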
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-07-26 11:52 stack smashing detected with 'nvme sanitize-log /dev/nvme0' Daniel Wagner 2023-07-26 13:16 ` Christoph Hellwig @ 2023-07-27 1:30 ` Ming Lei 2023-07-27 1:37 ` Guangwu Zhang 1 sibling, 1 reply; 16+ messages in thread From: Ming Lei @ 2023-07-27 1:30 UTC (permalink / raw) To: Daniel Wagner Cc: linux-nvme@lists.infradead.org, Guangwu Zhang, Christoph Hellwig, Keith Busch On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote: > FYI, I got a a bug report [1] with a 'stack smashing detected' when running > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against > udisk. udisk recently added libnvme which does now a sanitize-log call, so this > problem might exists for a while. > > We figured out that an older kernel such as 4.19.289 work but newer not (it's a > bit hard for the reporter to test all combinations on his setup due to compiler > changes etc.). > > There was a bit of refactoring in v5.2 which could be the cause of the stack > smash, because saw this recent fix: > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data") This commit only fixes DMA UNMAP direction for integrity data, but is there integrity data involved for 'nvme sanitize-log /dev/nvme0'? Thanks, Ming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-27  1:30 ` Ming Lei
@ 2023-07-27  1:37   ` Guangwu Zhang
  2023-07-27  7:23     ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Guangwu Zhang @ 2023-07-27 1:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Daniel Wagner, linux-nvme@lists.infradead.org, Christoph Hellwig,
	Keith Busch

Hi,

Cannot reproduce the bug with our test environment.

5.14.0-344.el9.x86_64
b0:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd
NVMe SSD Controller PM173X

# nvme sanitize-log /dev/nvme0
Sanitize Progress                      (SPROG) :  65535
Sanitize Status                        (SSTAT) :  0
Sanitize Command Dword 10 Information (SCDW10) :  0
Estimated Time For Overwrite                   :  4294967295 (No time period reported)
Estimated Time For Block Erase                 :  4294967295 (No time period reported)
Estimated Time For Crypto Erase                :  4294967295 (No time period reported)
Estimated Time For Overwrite (No-Deallocate)   :  0
Estimated Time For Block Erase (No-Deallocate) :  0
Estimated Time For Crypto Erase (No-Deallocate):  0

On Thu, Jul 27, 2023 at 09:30, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> > FYI, I got a a bug report [1] with a 'stack smashing detected' when running
> > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> > udisk. udisk recently added libnvme which does now a sanitize-log call, so this
> > problem might exists for a while.
> >
> > We figured out that an older kernel such as 4.19.289 work but newer not (it's a
> > bit hard for the reporter to test all combinations on his setup due to compiler
> > changes etc.).
> >
> > There was a bit of refactoring in v5.2 which could be the cause of the stack
> > smash, because saw this recent fix:
> >
> > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
>
> This commit only fixes DMA UNMAP direction for integrity data, but is
> there integrity data involved for 'nvme sanitize-log /dev/nvme0'?
>
>
> Thanks,
> Ming
>

--
Guangwu Zhang
Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-27  1:37 ` Guangwu Zhang
@ 2023-07-27  7:23   ` Daniel Wagner
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2023-07-27 7:23 UTC (permalink / raw)
  To: Guangwu Zhang
  Cc: Ming Lei, linux-nvme@lists.infradead.org, Christoph Hellwig,
	Keith Busch

On Thu, Jul 27, 2023 at 09:37:05AM +0800, Guangwu Zhang wrote:
> Can not reproduce the bug with our test environment.

Thanks for the quick test. It looks like it depends on the SSD
involved, in this case the ones from MAXIO:

# nvme id-ctrl -H /dev/nvme0
NVME Identify Controller:
vid       : 0x1e4b
ssvid     : 0x1e4b

Both reports have this SSD in common. Other SSDs in the same systems
did not cause the stack smash.

> > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
> >
> > This commit only fixes DMA UNMAP direction for integrity data, but is
> > there integrity data involved for 'nvme sanitize-log /dev/nvme0'?

I don't think so. As I said, I just saw this fix and was wondering if
the changes which came in v5.2 in this area uncovered a bug. As
Christoph suggested, I need to figure out whether the device behaves
correctly, for example by playing with the IOMMU.

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-09-25 15:20 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-26 11:52 stack smashing detected with 'nvme sanitize-log /dev/nvme0' Daniel Wagner 2023-07-26 13:16 ` Christoph Hellwig 2023-08-21 13:37 ` Daniel Wagner 2023-08-21 15:11 ` Keith Busch 2023-08-22 7:55 ` Daniel Wagner 2023-08-23 15:37 ` Keith Busch 2023-08-25 6:36 ` Daniel Wagner 2023-08-28 9:21 ` Christoph Hellwig 2023-09-25 15:09 ` Daniel Wagner 2023-09-25 15:19 ` Christoph Hellwig 2023-08-28 9:20 ` Christoph Hellwig 2023-08-29 13:29 ` Keith Busch 2023-08-28 9:18 ` Christoph Hellwig 2023-07-27 1:30 ` Ming Lei 2023-07-27 1:37 ` Guangwu Zhang 2023-07-27 7:23 ` Daniel Wagner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox