From: Kanchan Joshi <joshi.k@samsung.com>
To: Keith Busch <kbusch@kernel.org>
Cc: hch@lst.de, axboe@kernel.dk, sagi@grimberg.me,
linux-nvme@lists.infradead.org, vincentfu@gmail.com,
ankit.kumar@samsung.com, joshiiitr@gmail.com,
gost.dev@samsung.com, stable@vger.kernel.org,
Vincent Fu <vincent.fu@samsung.com>
Subject: Re: [PATCH v2 1/2] nvme: fix memory corruption for passthrough metadata
Date: Wed, 6 Sep 2023 21:18:15 +0530 [thread overview]
Message-ID: <20230906154815.GA23984@green245> (raw)
In-Reply-To: <ZPduqCASmcNxUUep@kbusch-mbp>
[-- Attachment #1: Type: text/plain, Size: 7171 bytes --]
On Tue, Sep 05, 2023 at 12:08:40PM -0600, Keith Busch wrote:
>On Tue, Sep 05, 2023 at 10:48:25AM +0530, Kanchan Joshi wrote:
>> On Fri, Sep 01, 2023 at 10:45:50AM -0400, Keith Busch wrote:
>> > And similiar to this problem, what if the metadata is extended rather
>> > than separate, and the user's buffer is too short? That will lead to the
>> > same type of problem you're trying to fix here?
>>
>> No.
>> For extended metadata, userspace is using its own buffer. Since
>> intermediate kernel buffer does not exist, I do not have a problem to
>> solve.
>
>We still use kernel memory if the user buffer is unaligned. If the user
>space provides an short unaligned buffer, the device will corrupt kernel
>memory.
Ah yes. blk_rq_map_user_iov() does make a copy of user-buffer in that
case.
>> > My main concern, though, is forward and backward compatibility. Even
>> > when metadata is enabled, there are IO commands that don't touch it, so
>> > some tool that erroneously requested it will stop working. Or perhaps
>> > some other future opcode will have some other metadata use that doesn't
>> > match up exactly with how read/write/compare/append use it. As much as
>> > I'd like to avoid bad user commands from crashing, these kinds of checks
>> > can become problematic for maintenance.
>>
>> For forward compatibility - if we have commands that need to specify
>> metadata in a different way (than what is possible from this interface),
>> we anyway need a new passthrough command structure.
>
>Not sure about that. The existing struct is flexible enough to describe
>any possible nvme command.
>
>More specifically about compatibility is that this patch assumes an
>"nlb" field exists inside an opaque structure at DW12 offset, and that
>field defines how large the metadata buffer needs to be. Some vendor
>specific or future opcode may have DW12 mean something completely
>different, but still need to access metadata this patch may prevent from
>working.
Right. It almost had me dropping the effort.
But given the horrible bug at hand, added an untested patch [1] that
handles all the shortcomings you mentioned. Please take a look.
>> Moreover, it's really about caring _only_ for cases when kernel
>> allocates
>> memory for metadata. And those cases are specific (i.e., when
>> metadata and metalen are not zero). We don't have to think in terms of
>> opcode (existing or future), no?
>
>It looks like a little work, but I don't see why blk-integrity must use
>kernel memory. Introducing an API like 'bio_integrity_map_user()' might
>also address your concern, as long as the user buffer is aligned. It
>sounds like we're assuming user buffers are aligned, at least.
Would you really prefer to have nvme_add_user_metadata() changed to do
away with allocation and use userspace meta-buffer directly?
Even with that route, extended-lba-with-short-unaligned-buffer remains
unhandled. That will still require similar checks that I would like
to avoid but cannnot.
So how about this -
[1]
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index d8ff796fd5f2..d09b5691da3e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -320,6 +320,67 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
meta_len, lower_32_bits(io.slba), NULL, 0, 0);
}
+static inline bool nvme_nlb_in_cdw12(u8 opcode)
+{
+ switch(opcode) {
+ case nvme_cmd_read:
+ case nvme_cmd_write:
+ case nvme_cmd_compare:
+ case nvme_cmd_zone_append:
+ return true;
+ }
+ return false;
+}
+
+static bool nvme_validate_passthru_meta(struct nvme_ctrl *ctrl,
+ struct nvme_ns *ns,
+ struct nvme_command *c,
+ __u64 meta, __u32 meta_len,
+ unsigned data_len)
+{
+ /*
+ * User may specify smaller meta-buffer with a larger data-buffer.
+ * Driver allocated meta buffer will also be small.
+ * Device can do larger dma into that, overwriting unrelated kernel
+ * memory.
+ */
+ if (ns && (meta_len || meta || ns->features & NVME_NS_EXT_LBAS)) {
+ u16 nlb, control;
+ unsigned dlen, mlen;
+
+ /* Exclude commands that do not have nlb in cdw12 */
+ if (!nvme_nlb_in_cdw12(c->common.opcode))
+ return true;
+
+ control = upper_16_bits(le32_to_cpu(c->common.cdw12));
+ /* Exclude when meta transfer from/to host is not done */
+ if (control & NVME_RW_PRINFO_PRACT && ns->ms == ns->pi_size)
+ return true;
+
+ nlb = lower_16_bits(le32_to_cpu(c->common.cdw12));
+ mlen = (nlb + 1) * ns->ms;
+
+ /* sanity for interleaved buffer */
+ if (ns->features & NVME_NS_EXT_LBAS) {
+ dlen = (nlb + 1) << ns->lba_shift;
+ if (data_len < (dlen + mlen))
+ goto out_false;
+ return true;
+ }
+ /* sanity for separate meta buffer */
+ if (meta_len < mlen)
+ goto out_false;
+
+ return true;
+out_false:
+ dev_err(ctrl->device,
+ "%s: metadata length is small!\n", current->comm);
+ return false;
+ }
+
+ return true;
+}
+
static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl,
struct nvme_ns *ns, __u32 nsid)
{
@@ -364,6 +425,10 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
c.common.cdw14 = cpu_to_le32(cmd.cdw14);
c.common.cdw15 = cpu_to_le32(cmd.cdw15);
+ if (!nvme_validate_passthru_meta(ctrl, ns, &c, cmd.metadata,
+ cmd.metadata_len, cmd.data_len))
+ return -EINVAL;
+
if (!nvme_cmd_allowed(ns, &c, 0, open_for_write))
return -EACCES;
@@ -411,6 +476,10 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
c.common.cdw14 = cpu_to_le32(cmd.cdw14);
c.common.cdw15 = cpu_to_le32(cmd.cdw15);
+ if (!nvme_validate_passthru_meta(ctrl, ns, &c, cmd.metadata,
+ cmd.metadata_len, cmd.data_len))
+ return -EINVAL;
+
if (!nvme_cmd_allowed(ns, &c, flags, open_for_write))
return -EACCES;
@@ -593,6 +662,10 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
d.metadata_len = READ_ONCE(cmd->metadata_len);
d.timeout_ms = READ_ONCE(cmd->timeout_ms);
+ if (!nvme_validate_passthru_meta(ctrl, ns, &c, d.metadata,
+ d.metadata_len, d.data_len))
+ return -EINVAL;
+
if (issue_flags & IO_URING_F_NONBLOCK) {
rq_flags |= REQ_NOWAIT;
blk_flags = BLK_MQ_REQ_NOWAIT;
--
2.25.1
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
next prev parent reply other threads:[~2023-09-06 15:52 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20230814070548epcas5p34eb8f36ab460ee2bf55030ce856844b9@epcas5p3.samsung.com>
[not found] ` <20230814070213.161033-1-joshi.k@samsung.com>
2023-08-14 7:02 ` [PATCH v2 1/2] nvme: fix memory corruption for passthrough metadata Kanchan Joshi
2023-08-31 14:09 ` Vincent Fu
2023-09-01 7:06 ` KANCHAN JOSHI/Host Software /SSIR/Staff Engineer/Samsung Electronics
2023-09-01 14:45 ` Keith Busch
2023-09-05 5:18 ` Kanchan Joshi
2023-09-05 18:08 ` Keith Busch
2023-09-06 15:48 ` Kanchan Joshi [this message]
2023-09-07 15:41 ` Keith Busch
2023-09-07 20:35 ` Kanchan Joshi
2023-08-14 7:02 ` [PATCH v2 2/2] nvme: avoid memory corruption for sync passthrough Kanchan Joshi
2023-08-31 14:09 ` Vincent Fu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230906154815.GA23984@green245 \
--to=joshi.k@samsung.com \
--cc=ankit.kumar@samsung.com \
--cc=axboe@kernel.dk \
--cc=gost.dev@samsung.com \
--cc=hch@lst.de \
--cc=joshiiitr@gmail.com \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
--cc=stable@vger.kernel.org \
--cc=vincent.fu@samsung.com \
--cc=vincentfu@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.