* stack smashing detected with 'nvme sanitize-log /dev/nvme0'
@ 2023-07-26 11:52 Daniel Wagner
2023-07-26 13:16 ` Christoph Hellwig
2023-07-27 1:30 ` Ming Lei
0 siblings, 2 replies; 16+ messages in thread
From: Daniel Wagner @ 2023-07-26 11:52 UTC (permalink / raw)
To: linux-nvme@lists.infradead.org
Cc: Guangwu Zhang, Ming Lei, Christoph Hellwig, Keith Busch
FYI, I got a bug report [1] with a 'stack smashing detected' error when running
'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
udisks. udisks recently added libnvme, which now issues a sanitize-log call, so
this problem might have existed for a while.
We figured out that an older kernel such as 4.19.289 works but newer ones do
not (it's a bit hard for the reporter to test all combinations on his setup
due to compiler changes etc.).
There was a bit of refactoring in v5.2 which could be the cause of the stack
smash, because I saw this recent fix:
b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
[1] https://github.com/storaged-project/udisks/issues/1152
^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-26 11:52 stack smashing detected with 'nvme sanitize-log /dev/nvme0' Daniel Wagner
@ 2023-07-26 13:16 ` Christoph Hellwig
  2023-08-21 13:37   ` Daniel Wagner
  2023-07-27  1:30 ` Ming Lei
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-07-26 13:16 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei,
	Christoph Hellwig, Keith Busch

On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> FYI, I got a a bug report [1] with a 'stack smashing detected' when running
> 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> udisk. udisk recently added libnvme which does now a sanitize-log call, so this
> problem might exists for a while.
>
> We figured out that an older kernel such as 4.19.289 work but newer not (it's a
> bit hard for the reporter to test all combinations on his setup due to compiler
> changes etc.).
>
> There was a bit of refactoring in v5.2 which could be the cause of the stack
> smash, because saw this recent fix:
>
> b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
>
> [1] https://github.com/storaged-project/udisks/issues/1152

If you think it is related to DMA, there are good ways to check:

 1) force that an IOMMU is used for this device
 2) hack nvme or the blk-map code so that we never do the direct mapping
    to user space but use the copy-based version, and then enable all
    kernel memory debugging helpers, most importantly KASAN

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-26 13:16 ` Christoph Hellwig
@ 2023-08-21 13:37   ` Daniel Wagner
  2023-08-21 15:11     ` Keith Busch
  2023-08-28  9:18     ` Christoph Hellwig
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Wagner @ 2023-08-21 13:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei,
	Keith Busch, Oleg Solovyov

On Wed, Jul 26, 2023 at 03:16:43PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> > FYI, I got a a bug report [1] with a 'stack smashing detected' when running
> > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> > udisk. udisk recently added libnvme which does now a sanitize-log call, so this
> > problem might exists for a while.
> >
> > We figured out that an older kernel such as 4.19.289 work but newer not (it's a
> > bit hard for the reporter to test all combinations on his setup due to compiler
> > changes etc.).
> >
> > There was a bit of refactoring in v5.2 which could be the cause of the stack
> > smash, because saw this recent fix:
> >
> > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
> >
> > [1] https://github.com/storaged-project/udisks/issues/1152
>
> If you think it is related to DMA, there are good ways to check for:
>
> 1) force that an IOMMU is used for this device
> 2) hack nvme or the blk-map code that we never do the direct mapping
>    to user space but do the copy based version, and then enable
>    all kernel memory debugging helpers, most importantly KASAN

Collected some info:

- this happens only with devices from MAXIO Technology

    vid       : 0x1e4b
    ssvid     : 0x1e4b

- Oleg bisected the kernel and landed on 3b2a1ebceba3 ("nvme: set
  dma alignment to qword").
  He hacked the kernel and this made it work again:

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
 		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
 	}
 	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
-	blk_queue_dma_alignment(q, 3);
 	blk_queue_write_cache(q, vwc, vwc);
 }

- modified the reproducer so that it allocates a 4k buffer with a well
  known pattern and checked the buffer after fetching the sanitize log
  [1]. The first invocation wrote 0x940 bytes and the second 0x5c0
  bytes. Note we are only asking for 0x200 bytes.

- modified blk_rq_map_user_iov so that it always uses the copy path.
  The problem went away. Though I forgot to ask to turn on KASAN; given
  that we know the device writes too much data, it is likely also
  overwriting some kernel memory.

So what's the best way forward from here? Introduce a quirk and always
use a bounce buffer?

[1] https://github.com/storaged-project/libblockdev/pull/951#issuecomment-1676276491

^ permalink raw reply	[flat|nested] 16+ messages in thread
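[Editor's note: the pattern check described above can be sketched in plain
user space. This is an illustrative sketch, not the actual reproducer: the
names (`pattern_fill`, `bytes_written`) are made up, and the get-log ioctl
against the device is omitted entirely.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATTERN 0xa5

/* Fill the oversized buffer with a known pattern before handing it
 * to the device. */
static void pattern_fill(unsigned char *buf, size_t len)
{
	memset(buf, PATTERN, len);
}

/* After the command completes, estimate how many bytes were written
 * by locating the last byte that no longer holds the pattern.
 * (If the device happens to write a PATTERN byte at the very end,
 * this undercounts -- good enough for a debugging aid.) */
static size_t bytes_written(const unsigned char *buf, size_t len)
{
	while (len > 0 && buf[len - 1] == PATTERN)
		len--;
	return len;
}
```

In the real reproducer the buffer is handed to the get-log call; on the
affected MAXIO devices one would then see far more than the requested
0x200 bytes disturbed.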
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-21 13:37 ` Daniel Wagner @ 2023-08-21 15:11 ` Keith Busch 2023-08-22 7:55 ` Daniel Wagner 2023-08-28 9:20 ` Christoph Hellwig 2023-08-28 9:18 ` Christoph Hellwig 1 sibling, 2 replies; 16+ messages in thread From: Keith Busch @ 2023-08-21 15:11 UTC (permalink / raw) To: Daniel Wagner Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Mon, Aug 21, 2023 at 03:37:55PM +0200, Daniel Wagner wrote: > On Wed, Jul 26, 2023 at 03:16:43PM +0200, Christoph Hellwig wrote: > > On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote: > > > FYI, I got a a bug report [1] with a 'stack smashing detected' when running > > > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against > > > udisk. udisk recently added libnvme which does now a sanitize-log call, so this > > > problem might exists for a while. > > > > > > We figured out that an older kernel such as 4.19.289 work but newer not (it's a > > > bit hard for the reporter to test all combinations on his setup due to compiler > > > changes etc.). > > > > > > There was a bit of refactoring in v5.2 which could be the cause of the stack > > > smash, because saw this recent fix: > > > > > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data") > > > > > > [1] https://github.com/storaged-project/udisks/issues/1152 > > > > If you think it is related to DMA, there are good ways to check for: > > > > 1) force that an IOMMU is used for this device > > 2) hack nvme or the blk-map code that we never do the direct mapping > > to user space but do the copy based version, and then enable > > all kernel memory debugging helpers, most importantly KASAN > > Collected some info: > > - this happens only with devices from MAXIO Technology > > vid : 0x1e4b > ssvid : 0x1e4b > > - Oleg bissect the kernel and he landed on 3b2a1ebceba3 ("nvme: set > dma alignment to qword"). 
> He hacked the kernel and this made it work again:
>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
>  		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
>  	}
>  	blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
> -	blk_queue_dma_alignment(q, 3);
>  	blk_queue_write_cache(q, vwc, vwc);
>  }
>
> - modified the reproducer so that it allocates a 4k buffer with a well
>   known pattern and checked the buffer after fetching the sanitize log
>   [1]. The first invocation wrote 0x940 bytes and the second 0x5c0
>   bytes. Note we just asking for 0x200 bytes.
>
> - modified blk_rq_map_user_iov so that it always uses the copy path.
>   The problem went away. Though forgot to ask to turn on KASAN but
>   given that we know the device writes too much data it is likely also
>   overwriting some kernel memory.
>
> So what's the best way forward from here? Introduce a quirk and always
> use bounce buffer?

I don't think we want to bounce to kernel memory for the device to
overwrite it. I suggest just changing nvme-cli's stack-allocated sanitize
log to use a page-aligned and page-sized buffer.
---
diff --git a/nvme.c b/nvme.c
index 181141ad..7f98a506 100644
--- a/nvme.c
+++ b/nvme.c
@@ -2376,7 +2376,7 @@ ret:
 static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
 {
 	const char *desc = "Retrieve sanitize log and show it.";
-	struct nvme_sanitize_log_page sanitize_log;
+	struct nvme_sanitize_log_page *sanitize_log;
 	enum nvme_print_flags flags;
 	struct nvme_dev *dev;
 	int err;
@@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
 	if (cfg.human_readable)
 		flags |= VERBOSE;
 
-	err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
+	if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
+		err = -1;
+		goto close_dev;
+	}
+
+	err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
 	if (!err)
-		nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
+		nvme_show_sanitize_log(sanitize_log, dev->name, flags);
 	else if (err > 0)
 		nvme_show_status(err);
 	else
 		nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
+	free(sanitize_log);
 close_dev:
 	dev_close(dev);
 ret:
--

^ permalink raw reply related	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-21 15:11 ` Keith Busch
@ 2023-08-22  7:55   ` Daniel Wagner
  2023-08-23 15:37     ` Keith Busch
  2023-08-28  9:20   ` Christoph Hellwig
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2023-08-22 7:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote:
> I don't think we want to bounce to kernel memory for the device to
> overwrite it. I suggest just change nvme-cli's stack allocated santize
> log to a use page aligned and sized buffer.

Sure, we can add workarounds in userspace, though I think this is a
regression and should ideally be addressed in the kernel somehow.

> diff --git a/nvme.c b/nvme.c
> index 181141ad..7f98a506 100644
> --- a/nvme.c
> +++ b/nvme.c
> @@ -2376,7 +2376,7 @@ ret:
>  static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
>  {
>  	const char *desc = "Retrieve sanitize log and show it.";
> -	struct nvme_sanitize_log_page sanitize_log;
> +	struct nvme_sanitize_log_page *sanitize_log;
>  	enum nvme_print_flags flags;
>  	struct nvme_dev *dev;
>  	int err;
> @@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
>  	if (cfg.human_readable)
>  		flags |= VERBOSE;
>
> -	err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
> +	if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
> +		err = -1;
> +		goto close_dev;
> +	}
> +
> +	err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
>  	if (!err)
> -		nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
> +		nvme_show_sanitize_log(sanitize_log, dev->name, flags);
>  	else if (err > 0)
>  		nvme_show_status(err);
>  	else
>  		nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
> +	free(sanitize_log);
>  close_dev:
>  	dev_close(dev);
>  ret:

Anyway, I suppose we want to do this for all get-log commands; I'm not
sure if it is limited to the sanitize log alone.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-22  7:55 ` Daniel Wagner
@ 2023-08-23 15:37   ` Keith Busch
  2023-08-25  6:36     ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2023-08-23 15:37 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Tue, Aug 22, 2023 at 09:55:38AM +0200, Daniel Wagner wrote:
> On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote:
> > I don't think we want to bounce to kernel memory for the device to
> > overwrite it. I suggest just change nvme-cli's stack allocated santize
> > log to a use page aligned and sized buffer.
>
> Sure we can add workarounds in userspace, though I think this is a
> regression and should ideally address in the kernel somehow.

Not sure if this is a regression. Even prior to the DMA alignment
settings change, you'd still hit this if your stack buffer happened to
be 512b aligned. A kernel bounce buffer sounds like it would just
corrupt kernel memory instead, so we can't have the driver simply go
back to the way it was for these devices; someone has to over-allocate
the buffer.

A device overrunning its DMA buffer, though, sounds bad enough that a
firmware fix should happen instead.
> > diff --git a/nvme.c b/nvme.c
> > index 181141ad..7f98a506 100644
> > --- a/nvme.c
> > +++ b/nvme.c
> > @@ -2376,7 +2376,7 @@ ret:
> >  static int sanitize_log(int argc, char **argv, struct command *command, struct plugin *plugin)
> >  {
> >  	const char *desc = "Retrieve sanitize log and show it.";
> > -	struct nvme_sanitize_log_page sanitize_log;
> > +	struct nvme_sanitize_log_page *sanitize_log;
> >  	enum nvme_print_flags flags;
> >  	struct nvme_dev *dev;
> >  	int err;
> > @@ -2419,13 +2419,19 @@ static int sanitize_log(int argc, char **argv, struct command *command, struct p
> >  	if (cfg.human_readable)
> >  		flags |= VERBOSE;
> >
> > -	err = nvme_cli_get_log_sanitize(dev, cfg.rae, &sanitize_log);
> > +	if (posix_memalign((void *)&sanitize_log, getpagesize(), 0x1000)) {
> > +		err = -1;
> > +		goto close_dev;
> > +	}
> > +
> > +	err = nvme_cli_get_log_sanitize(dev, cfg.rae, sanitize_log);
> >  	if (!err)
> > -		nvme_show_sanitize_log(&sanitize_log, dev->name, flags);
> > +		nvme_show_sanitize_log(sanitize_log, dev->name, flags);
> >  	else if (err > 0)
> >  		nvme_show_status(err);
> >  	else
> >  		nvme_show_error("sanitize status log: %s", nvme_strerror(errno));
> > +	free(sanitize_log);
> >  close_dev:
> >  	dev_close(dev);
> >  ret:
>
> Anyway, I suppose we want to do this far all get log commands, not sure
> if it is limited to the get sanitize log alone.

Perhaps for safety we could do that for all logs < 4k in size. There
are not very many of those.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-23 15:37 ` Keith Busch
@ 2023-08-25  6:36   ` Daniel Wagner
  2023-08-28  9:21     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2023-08-25 6:36 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Wed, Aug 23, 2023 at 09:37:48AM -0600, Keith Busch wrote:
> On Tue, Aug 22, 2023 at 09:55:38AM +0200, Daniel Wagner wrote:
> > On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote:
> > > I don't think we want to bounce to kernel memory for the device to
> > > overwrite it. I suggest just change nvme-cli's stack allocated santize
> > > log to a use page aligned and sized buffer.
> >
> > Sure we can add workarounds in userspace, though I think this is a
> > regression and should ideally address in the kernel somehow.
>
> Not sure if this is a regression. Even prior to the DMA alignment
> settings change, you'd still hit this if your stack buffer aligned to
> 512b. The kernel bounce buffer sounds like it just corrupts kernel
> memory instead, so we can't have the driver just go back to the way it
> was for these devices; someone has to over allocate the buffer.

Okay, let's ignore the regression argument then. But what about the
fact that we are asking for 512 bytes via the kernel's API and get back
too much data? Isn't this something we should address? I mean, this
forces all users of this kernel API to allocate large enough buffers to
handle this device. Alternatively, we could add bounce buffers in
libnvme instead of in the kernel.

> A device over running their DMA buffer, though, sounds bad enough that a
> firmware fix should happen instead.

Oleg contacted the manufacturer, but I'm not getting my hopes up at all.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-25 6:36 ` Daniel Wagner @ 2023-08-28 9:21 ` Christoph Hellwig 2023-09-25 15:09 ` Daniel Wagner 0 siblings, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2023-08-28 9:21 UTC (permalink / raw) To: Daniel Wagner Cc: Keith Busch, Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Fri, Aug 25, 2023 at 08:36:50AM +0200, Daniel Wagner wrote: > Okay, let's ignore the regression argument then. But what about the fact > we are asking for 512 bytes via the kernels API and get too much data? > Isn't this something we should address? I mean this forces all users of > this kernel API allocate enough large buffers to handle this device. There isn't really much the kernel can do except for using an IOMMU when available to protect itself from this, but that will mean we're shutting down the device when it does that. The device just seems completely broken unfortunately. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-08-28  9:21 ` Christoph Hellwig
@ 2023-09-25 15:09   ` Daniel Wagner
  2023-09-25 15:19     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Wagner @ 2023-09-25 15:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme@lists.infradead.org, Guangwu Zhang,
	Ming Lei, Oleg Solovyov

On Mon, Aug 28, 2023 at 11:21:55AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 25, 2023 at 08:36:50AM +0200, Daniel Wagner wrote:
> > Okay, let's ignore the regression argument then. But what about the fact
> > we are asking for 512 bytes via the kernels API and get too much data?
> > Isn't this something we should address? I mean this forces all users of
> > this kernel API allocate enough large buffers to handle this device.
>
> There isn't really much the kernel can do except for using an IOMMU
> when available to protect itself from this, but that will mean we're
> shutting down the device when it does that.
>
> The device just seems completely broken unfortunately.

Just a follow-up on this. I've updated nvme-cli so that all payloads
are allocated via the nvme_alloc() helper, which ensures that each
payload starts at a 4k boundary and the buffer is a multiple of 4k.
This should address the issue.

As it turns out, more devices suffer from this problem: SK hynix PC611
NVMe 512GB SSD [1].

[1] https://github.com/storaged-project/udisks/issues/1193

^ permalink raw reply	[flat|nested] 16+ messages in thread
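[Editor's note: a minimal sketch of what such an allocator could look like.
The name `log_page_alloc` is hypothetical; the real `nvme_alloc()` helper in
nvme-cli may differ in signature and behaviour. The idea is just to round the
requested length up to a whole page and hand back page-aligned, zeroed memory,
so a device that DMAs past the requested 512 bytes only scribbles on padding
the caller owns.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate a zeroed, page-aligned buffer whose size is rounded up to
 * a multiple of the page size. Returns NULL on failure. */
static void *log_page_alloc(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	/* Page sizes are powers of two, so masking rounds up cleanly. */
	size_t rounded = (len + page - 1) & ~(page - 1);
	void *buf;

	/* posix_memalign() returns non-zero on failure; buf is then
	 * unspecified, so check before use. */
	if (posix_memalign(&buf, page, rounded))
		return NULL;
	memset(buf, 0, rounded);
	return buf;
}
```

A caller asking for the 512-byte sanitize log would then get (at least) a
full 4k page, absorbing the extra bytes a misbehaving device writes.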
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-09-25 15:09 ` Daniel Wagner
@ 2023-09-25 15:19   ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2023-09-25 15:19 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Christoph Hellwig, Keith Busch, linux-nvme@lists.infradead.org,
	Guangwu Zhang, Ming Lei, Oleg Solovyov

On Mon, Sep 25, 2023 at 05:09:16PM +0200, Daniel Wagner wrote:
> > The device just seems completely broken unfortunately.
>
> Just a follow up on this. I've update nvme-cli so that all payloads are
> allocated via the nvme_alloc() helper which ensures that the payloads
> start at a 4k boundary and the buffer is multiple of 4k. This should
> address this issue.

It does not address the issue, it just works around it. I think we need
a kernel-level quirk to make sure we never issue commands to these
devices that trigger this broken behavior, as the stack smashing is a
security problem.

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-21 15:11 ` Keith Busch 2023-08-22 7:55 ` Daniel Wagner @ 2023-08-28 9:20 ` Christoph Hellwig 2023-08-29 13:29 ` Keith Busch 1 sibling, 1 reply; 16+ messages in thread From: Christoph Hellwig @ 2023-08-28 9:20 UTC (permalink / raw) To: Keith Busch Cc: Daniel Wagner, Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote: > I don't think we want to bounce to kernel memory for the device to > overwrite it. I suggest just change nvme-cli's stack allocated santize > log to a use page aligned and sized buffer. That assumes it actually overwrites it in that case and doesn't just have a PRP parsing bug when there is not enough alignment. We should be able to find out by enabling KASAN and then requiring the larger alignment before re-running the reproducer. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-28 9:20 ` Christoph Hellwig @ 2023-08-29 13:29 ` Keith Busch 0 siblings, 0 replies; 16+ messages in thread From: Keith Busch @ 2023-08-29 13:29 UTC (permalink / raw) To: Christoph Hellwig Cc: Daniel Wagner, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Oleg Solovyov On Mon, Aug 28, 2023 at 11:20:38AM +0200, Christoph Hellwig wrote: > On Mon, Aug 21, 2023 at 09:11:38AM -0600, Keith Busch wrote: > > I don't think we want to bounce to kernel memory for the device to > > overwrite it. I suggest just change nvme-cli's stack allocated santize > > log to a use page aligned and sized buffer. > > That assumes it actually overwrites it in that case and doesn't just > have a PRP parsing bug when there is not enough alignment. > > We should be able to find out by enabling KASAN and then requiring the > larger alignment before re-running the reproducer. Good point. I assumed it was a simple buffer overrun regardless of the starting offset, but bad PRP parsing sounds plausible. If you don't want to enable a kernel with kasan, we can just align user space buffers with padding at different offsets and see what happens. ^ permalink raw reply [flat|nested] 16+ messages in thread
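[Editor's note: the offset sweep suggested above can report not just whether
memory was clobbered but where, which helps tell a plain overrun apart from
mis-parsed PRPs that write before or around the buffer. A hedged sketch of
just the scanning side -- `PATTERN`, `struct touched`, and `touched_range`
are illustrative names, and the actual get-log call against the device is
omitted.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATTERN 0x5a

struct touched {
	size_t first;	/* first modified byte, or len if untouched */
	size_t end;	/* one past the last modified byte */
};

/* Scan a pattern-filled page into which the 512-byte log buffer was
 * placed at some offset. Bytes modified before the offset, or past
 * offset + 512, indicate the device or driver wrote outside the
 * requested range. */
static struct touched touched_range(const unsigned char *page, size_t len)
{
	struct touched t = { len, 0 };

	for (size_t i = 0; i < len; i++) {
		if (page[i] != PATTERN) {
			if (t.first == len)
				t.first = i;
			t.end = i + 1;
		}
	}
	return t;
}
```

Running this for each offset in the sweep would show whether the clobbered
range shifts with the buffer (an overrun) or stays page-relative (a PRP
parsing bug).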
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-08-21 13:37 ` Daniel Wagner 2023-08-21 15:11 ` Keith Busch @ 2023-08-28 9:18 ` Christoph Hellwig 1 sibling, 0 replies; 16+ messages in thread From: Christoph Hellwig @ 2023-08-28 9:18 UTC (permalink / raw) To: Daniel Wagner Cc: Christoph Hellwig, linux-nvme@lists.infradead.org, Guangwu Zhang, Ming Lei, Keith Busch, Oleg Solovyov On Mon, Aug 21, 2023 at 03:37:55PM +0200, Daniel Wagner wrote: > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1871,7 +1871,6 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl, > blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX)); > } > blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1); > - blk_queue_dma_alignment(q, 3); > blk_queue_write_cache(q, vwc, vwc); > } > > So what's the best way forward from here? Introduce a quirk and always > use bounce buffer? Add a quirk for the device so that we require 512 byte alignment for it. I suspect the same one will apply to this whole family of buggy MAXIO devices.. ^ permalink raw reply [flat|nested] 16+ messages in thread
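[Editor's note: the quirk Christoph describes would essentially widen the
queue's DMA alignment mask back to sector granularity for the affected
controllers. A hypothetical user-space sketch of just that policy -- the
quirk flag name and bit position are assumptions, not a real kernel flag;
only the mask semantics of blk_queue_dma_alignment() (low address bits that
must be zero) are taken from the thread.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quirk bit, not an existing kernel definition. */
#define NVME_QUIRK_DMA_ALIGNMENT	(1u << 20)

/* Mask 3 keeps the relaxed alignment set by 3b2a1ebceba3; mask 511
 * forces 512-byte-aligned user buffers for devices that mis-handle
 * anything smaller. */
static uint32_t nvme_dma_alignment(uint32_t quirks)
{
	return (quirks & NVME_QUIRK_DMA_ALIGNMENT) ? 511 : 3;
}
```

The driver would set the quirk in the PCI ID table entry for the affected
vendor/device IDs and pass the resulting mask to blk_queue_dma_alignment().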
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0' 2023-07-26 11:52 stack smashing detected with 'nvme sanitize-log /dev/nvme0' Daniel Wagner 2023-07-26 13:16 ` Christoph Hellwig @ 2023-07-27 1:30 ` Ming Lei 2023-07-27 1:37 ` Guangwu Zhang 1 sibling, 1 reply; 16+ messages in thread From: Ming Lei @ 2023-07-27 1:30 UTC (permalink / raw) To: Daniel Wagner Cc: linux-nvme@lists.infradead.org, Guangwu Zhang, Christoph Hellwig, Keith Busch On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote: > FYI, I got a a bug report [1] with a 'stack smashing detected' when running > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against > udisk. udisk recently added libnvme which does now a sanitize-log call, so this > problem might exists for a while. > > We figured out that an older kernel such as 4.19.289 work but newer not (it's a > bit hard for the reporter to test all combinations on his setup due to compiler > changes etc.). > > There was a bit of refactoring in v5.2 which could be the cause of the stack > smash, because saw this recent fix: > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data") This commit only fixes DMA UNMAP direction for integrity data, but is there integrity data involved for 'nvme sanitize-log /dev/nvme0'? Thanks, Ming ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-27  1:30 ` Ming Lei
@ 2023-07-27  1:37   ` Guangwu Zhang
  2023-07-27  7:23     ` Daniel Wagner
  0 siblings, 1 reply; 16+ messages in thread
From: Guangwu Zhang @ 2023-07-27 1:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Daniel Wagner, linux-nvme@lists.infradead.org, Christoph Hellwig,
	Keith Busch

Hi,

Cannot reproduce the bug with our test environment.

5.14.0-344.el9.x86_64
b0:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd
NVMe SSD Controller PM173X

# nvme sanitize-log /dev/nvme0
Sanitize Progress                      (SPROG) :  65535
Sanitize Status                        (SSTAT) :  0
Sanitize Command Dword 10 Information (SCDW10) :  0
Estimated Time For Overwrite                   :  4294967295 (No time period reported)
Estimated Time For Block Erase                 :  4294967295 (No time period reported)
Estimated Time For Crypto Erase                :  4294967295 (No time period reported)
Estimated Time For Overwrite (No-Deallocate)   :  0
Estimated Time For Block Erase (No-Deallocate) :  0
Estimated Time For Crypto Erase (No-Deallocate):  0

On Thu, Jul 27, 2023 at 09:30, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Jul 26, 2023 at 01:52:04PM +0200, Daniel Wagner wrote:
> > FYI, I got a a bug report [1] with a 'stack smashing detected' when running
> > 'nvme sanitize-log /dev/nvme0' on Debian. Originally, it was reported against
> > udisk. udisk recently added libnvme which does now a sanitize-log call, so this
> > problem might exists for a while.
> >
> > We figured out that an older kernel such as 4.19.289 work but newer not (it's a
> > bit hard for the reporter to test all combinations on his setup due to compiler
> > changes etc.).
> >
> > There was a bit of refactoring in v5.2 which could be the cause of the stack
> > smash, because saw this recent fix:
> >
> > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
>
> This commit only fixes DMA UNMAP direction for integrity data, but is
> there integrity data involved for 'nvme sanitize-log /dev/nvme0'?
>
>
> Thanks,
> Ming
>

--
Guangwu Zhang
Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: stack smashing detected with 'nvme sanitize-log /dev/nvme0'
  2023-07-27  1:37 ` Guangwu Zhang
@ 2023-07-27  7:23   ` Daniel Wagner
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Wagner @ 2023-07-27 7:23 UTC (permalink / raw)
  To: Guangwu Zhang
  Cc: Ming Lei, linux-nvme@lists.infradead.org, Christoph Hellwig,
	Keith Busch

On Thu, Jul 27, 2023 at 09:37:05AM +0800, Guangwu Zhang wrote:
> Can not reproduce the bug with our test environment.

Thanks for the quick test. It looks like it depends on the SSD
involved, in this case the ones from MAXIO:

# nvme id-ctrl -H /dev/nvme0
NVME Identify Controller:
vid       : 0x1e4b
ssvid     : 0x1e4b

Both reports have this SSD in common. Other SSDs in the same systems
did not cause the stack smash.

> > > b8f6446b6853 ("nvme-pci: fix DMA direction of unmapping integrity data")
> >
> > This commit only fixes DMA UNMAP direction for integrity data, but is
> > there integrity data involved for 'nvme sanitize-log /dev/nvme0'?

I don't think so. As I said, I just saw this fix and was wondering if
the changes which came in v5.2 in this area uncovered a bug. As
Christoph suggested, I need to figure out whether the device behaves
correctly, for example by playing with the IOMMU.

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-09-25 15:20 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-26 11:52 stack smashing detected with 'nvme sanitize-log /dev/nvme0' Daniel Wagner 2023-07-26 13:16 ` Christoph Hellwig 2023-08-21 13:37 ` Daniel Wagner 2023-08-21 15:11 ` Keith Busch 2023-08-22 7:55 ` Daniel Wagner 2023-08-23 15:37 ` Keith Busch 2023-08-25 6:36 ` Daniel Wagner 2023-08-28 9:21 ` Christoph Hellwig 2023-09-25 15:09 ` Daniel Wagner 2023-09-25 15:19 ` Christoph Hellwig 2023-08-28 9:20 ` Christoph Hellwig 2023-08-29 13:29 ` Keith Busch 2023-08-28 9:18 ` Christoph Hellwig 2023-07-27 1:30 ` Ming Lei 2023-07-27 1:37 ` Guangwu Zhang 2023-07-27 7:23 ` Daniel Wagner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox