From: Serge Semin <Sergey.Semin@baikalelectronics.ru>
To: Jonathan Derrick <jonathan.derrick@intel.com>,
Revanth Rajashekar <revanth.rajashekar@intel.com>,
Jens Axboe <axboe@kernel.dk>, Keith Busch <kbusch@kernel.org>,
Jens Axboe <axboe@fb.com>, Christoph Hellwig <hch@lst.de>,
Sagi Grimberg <sagi@grimberg.me>,
Guenter Roeck <linux@roeck-us.net>
Cc: Serge Semin <Sergey.Semin@baikalelectronics.ru>,
Serge Semin <fancer.lancer@gmail.com>,
Alexey Malahov <Alexey.Malahov@baikalelectronics.ru>,
Pavel Parkhomenko <Pavel.Parkhomenko@baikalelectronics.ru>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
<linux-nvme@lists.infradead.org>, <linux-block@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: [PATCH 1/2] nvme-hwmon: Cache-line-align the NVME SMART log-buffer
Date: Fri, 9 Sep 2022 22:19:15 +0300
Message-ID: <20220909191916.16013-2-Sergey.Semin@baikalelectronics.ru>
In-Reply-To: <20220909191916.16013-1-Sergey.Semin@baikalelectronics.ru>
Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has caused
a regression on our platform. It turned out that the nvme_get_log() method
invocation corrupted the nvme_hwmon_data structure instance. In
particular, the nvme_hwmon_data.ctrl pointer was overwritten either with
zeros or with garbage. After some investigation we discovered that the
problem happened even before the actual NVME DMA execution, namely during
the buffer mapping. Since our platform is DMA-noncoherent, the mapping
implies cache-line invalidations or write-backs depending on the
DMA-direction parameter. When the NVME SMART log is retrieved, the DMA is
performed from device to memory, thus the cache invalidation is
activated during the buffer mapping. Since the log buffer wasn't
cache-line aligned, the cache invalidation discarded the neighbouring
data, which turned out to be the fields surrounding the buffer within
the nvme_hwmon_data structure.
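To illustrate the effect described above, here is a minimal userspace sketch (not kernel code). It assumes a 64-byte cache line, a 512-byte stand-in for struct nvme_smart_log, a structure base that is itself cache-line aligned, and a made-up size for the lock; all old_* names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real value is platform-specific. */
#define OLD_CACHE_LINE ((size_t)64)

/* Hypothetical stand-in for struct nvme_smart_log, assumed to be 512 bytes. */
struct old_smart_log {
	uint8_t raw[512];
};

/*
 * Pre-patch member order. The pointer and the lock are modelled as plain
 * storage; the 32-byte lock size is made up for the sketch.
 */
struct old_hwmon_data {
	void *ctrl;
	struct old_smart_log log;
	uint8_t read_lock[32];
};

/* Round an offset down to the start of its cache line. */
static size_t old_line_down(size_t off)
{
	return off - off % OLD_CACHE_LINE;
}

/* Round an offset up to the next cache-line boundary. */
static size_t old_line_up(size_t off)
{
	return old_line_down(off + OLD_CACHE_LINE - 1);
}

/*
 * Non-zero if invalidating every cache line that holds a byte of
 * [buf, buf + len) also covers a byte of [other, other + other_len).
 * Offsets are relative to a cache-line-aligned structure base.
 */
static int old_invalidate_touches(size_t buf, size_t len,
				  size_t other, size_t other_len)
{
	size_t start = old_line_down(buf);
	size_t end = old_line_up(buf + len);

	return other < end && other + other_len > start;
}

/* Does mapping the log for device-to-memory DMA discard the ctrl pointer? */
static int old_layout_clobbers_ctrl(void)
{
	return old_invalidate_touches(offsetof(struct old_hwmon_data, log),
				      sizeof(struct old_smart_log),
				      offsetof(struct old_hwmon_data, ctrl),
				      sizeof(void *));
}
```

Under these assumptions old_layout_clobbers_ctrl() returns non-zero: the log starts right after the pointer, so both share the first cache line, and invalidating that line drops any not-yet-written-back update of ctrl.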
In order to fix that we need to make sure that the whole log buffer is
placed within a cache-line-aligned memory region, so that the
cache-invalidation procedure doesn't involve the adjacent data. By doing
so we not only get rid of the described problem but also fulfill the
requirement explicitly described in [1].
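The intended layout can be sketched in userspace C11 as follows. This is only an approximation of the idea: alignas() stands in for the kernel's ____cacheline_aligned attribute, the 64-byte line size and the 512-byte log size are assumptions, and all new_* names are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the kernel uses a per-architecture value. */
#define NEW_CACHE_LINE 64

/* Hypothetical stand-in for struct nvme_smart_log, assumed to be 512 bytes. */
struct new_smart_log {
	uint8_t raw[512];
};

/*
 * Post-patch member order: the DMA buffer comes first and is aligned to a
 * cache line. The lock is modelled as plain storage of a made-up size.
 */
struct new_hwmon_data {
	alignas(NEW_CACHE_LINE) struct new_smart_log log;
	void *ctrl;
	uint8_t read_lock[32];
};

/*
 * Non-zero if the log starts and ends on cache-line boundaries and the
 * ctrl pointer lives entirely after it, i.e. no cache line is shared.
 */
static int new_log_owns_its_lines(void)
{
	size_t start = offsetof(struct new_hwmon_data, log);
	size_t end = start + sizeof(struct new_smart_log);

	return start % NEW_CACHE_LINE == 0 &&
	       end % NEW_CACHE_LINE == 0 &&
	       offsetof(struct new_hwmon_data, ctrl) >= end;
}
```

Note that this only holds if the structure instance itself is placed at a cache-line-aligned address. In the sketch the compiler guarantees that for static and automatic objects because alignas() raises the alignment of the whole structure; a dynamically allocated instance depends on the allocator returning suitably aligned memory.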
After a deeper investigation we found out that the mentioned commit wasn't
the root cause of the problem. It just revealed the bug by enabling the
DMA-based NVME SMART log retrieval performed by the NVME hwmon driver.
The problem has been there since the initial commit of the driver.
[1] Documentation/core-api/dma-api.rst
Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
---
Folks, I've thoroughly studied the whole NVME subsystem looking for
similar problems. It turned out there is one more place which may cause
the same issue: the opal_dev.{cmd,resp} buffers passed to the
nvme_sec_submit() method. The rest of the buffers involved in the NVME
DMA are either allocated by kmalloc (must be cache-line-aligned by design)
or bounce-buffered if allocated on the stack (see the blk_rq_map_kern()
method implementation).
---
drivers/nvme/host/hwmon.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 0a586d712920..94192ab7a02d 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -10,9 +10,10 @@
 
 #include "nvme.h"
 
+/* DMA-noncoherent platforms require the cache-aligned buffers */
 struct nvme_hwmon_data {
+	struct nvme_smart_log log ____cacheline_aligned;
 	struct nvme_ctrl *ctrl;
-	struct nvme_smart_log log;
 	struct mutex read_lock;
 };
 
--
2.37.2