From: Max Gurtovoy <mgurtovoy@nvidia.com>
To: <martin.petersen@oracle.com>, <hch@lst.de>, <sagi@grimberg.me>,
<linux-nvme@lists.infradead.org>
Cc: <hare@suse.de>, <kbusch@kernel.org>, <axboe@kernel.dk>,
<linux-block@vger.kernel.org>, <oren@nvidia.com>,
<oevron@nvidia.com>, <israelr@nvidia.com>,
Max Gurtovoy <mgurtovoy@nvidia.com>
Subject: [PATCH v2 0/2] Fix failover to non integrity NVMe path
Date: Tue, 25 Apr 2023 01:54:40 +0300 [thread overview]
Message-ID: <20230424225442.18916-1-mgurtovoy@nvidia.com> (raw)
Hi Christoph/Sagi/Martin,
We encountered a crash while testing failover between NVMeF/RDMA
paths to a target that exposes a namespace with metadata. The scenario
is the following:
Configure one initiator/host path on a PI-offload-capable port (e.g. a
ConnectX-5 device) and a second initiator/host path on a
non-PI-offload-capable port (e.g. a ConnectX-3).
On failover, the original rq/bio is protected with an integrity
context, but the failover port is not integrity capable and did not
allocate the metadata resources for the request. Thus we get a NULL
dereference when blk_integrity_rq(rq) returns true while
req->metadata_sgl is NULL.
Below is a snippet of the trace:
[Tue Feb 14 18:48:25 2023] mlx5_core 0000:02:00.0 ens2f0np0: Link down
[Tue Feb 14 18:48:32 2023] nvme nvme0: starting error recovery
[Tue Feb 14 18:48:32 2023] BUG: kernel NULL pointer dereference, address: 0000000000000010
[Tue Feb 14 18:48:32 2023] #PF: supervisor read access in kernel mode
[Tue Feb 14 18:48:32 2023] #PF: error_code(0x0000) - not-present page
[Tue Feb 14 18:48:32 2023] PGD 0 P4D 0
[Tue Feb 14 18:48:32 2023] Oops: 0000 [#1] PREEMPT SMP PTI
[Tue Feb 14 18:48:32 2023] CPU: 17 PID: 518 Comm: kworker/17:1H Tainted: G S E 6.2.0-rc4+ #224
[Tue Feb 14 18:48:32 2023] Hardware name: Supermicro SYS-6018R-WTR/X10DRW-i, BIOS 2.0 12/17/2015
[Tue Feb 14 18:48:32 2023] Workqueue: kblockd nvme_requeue_work [nvme_core]
...
...
[Tue Feb 14 18:48:32 2023] Call Trace:
[Tue Feb 14 18:48:32 2023] <TASK>
[Tue Feb 14 18:48:32 2023] nvme_rdma_queue_rq+0x194/0xa20 [nvme_rdma]
[Tue Feb 14 18:48:32 2023] ? __blk_mq_try_issue_directly+0x13f/0x1a0
[Tue Feb 14 18:48:32 2023] __blk_mq_try_issue_directly+0x13f/0x1a0
[Tue Feb 14 18:48:32 2023] blk_mq_try_issue_directly+0x15/0x50
[Tue Feb 14 18:48:32 2023] blk_mq_submit_bio+0x539/0x580
[Tue Feb 14 18:48:32 2023] __submit_bio+0xfa/0x170
[Tue Feb 14 18:48:32 2023] submit_bio_noacct_nocheck+0xe1/0x2a0
[Tue Feb 14 18:48:32 2023] nvme_requeue_work+0x4e/0x60 [nvme_core]
To solve this, we expose an API to release the integrity context from
a bio, and call it for each bio on failover.
Another way to solve this would be to free the context during
bio_integrity_prep.
I chose the first option because I thought it better to avoid this
check in the fast path, but the price is exporting a new API from the
block layer.
In V1 there were some doubts regarding the setup configuration, but I
believe we can and should support it.
There are no provisions in the specification that prohibit it.
Coupling multipathing with integrity/metadata appears to be a matter
of implementation specifics rather than a requirement.
If the host path lacks the ability to add protection information, it is
acceptable to request that the controller take action by setting the
PRACT bit to 1 when the namespace is formatted with protection
information.
Changes:
V2:
- update cover letter with more motivation
- Fix build issue reported by kernel test robot
- Did not add Sagi's Reviewed-by tag for patch 2/2 since I think he
is still not sure about the series.
Max Gurtovoy (2):
block: bio-integrity: export bio_integrity_free func
nvme-multipath: fix path failover for integrity ns
block/bio-integrity.c | 1 +
block/blk.h | 4 ----
drivers/nvme/host/multipath.c | 9 +++++++++
include/linux/bio.h | 6 ++++++
4 files changed, 16 insertions(+), 4 deletions(-)
--
2.18.1