From: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>
To: linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@HansenPartnership.com>,
"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
Brian King <brking@linux.vnet.ibm.com>,
Ian Munsie <imunsie@au1.ibm.com>,
Daniel Axtens <dja@ozlabs.au.ibm.com>,
Andrew Donnellan <andrew.donnellan@au1.ibm.com>,
Tomas Henzl <thenzl@redhat.com>,
David Laight <David.Laight@ACULAB.COM>
Cc: Michael Neuling <mikey@neuling.org>,
"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>,
linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v6 37/37] cxlflash: Fix to avoid bypassing context cleanup
Date: Wed, 21 Oct 2015 15:16:32 -0500 [thread overview]
Message-ID: <1445458592-63806-1-git-send-email-mrochs@linux.vnet.ibm.com> (raw)
In-Reply-To: <1445458134-63197-1-git-send-email-mrochs@linux.vnet.ibm.com>
Contexts may be skipped over for cleanup in situations where contention
for the adapter's table-list mutex is experienced in the presence of a
signal during the execution of the release handler.
This can lead to two known issues:
- A hang condition on remove as that path tries to wait for users to
cleanup - something that will never complete should this scenario play
out as the user has already cleaned up from their perspective.
- An Oops in the unmap_mapping_range() call that is made as part of
the user waiting mechanism that is invoked on remove when contexts
are found to still exist.
The root cause of this issue can be found in get_context() and how the
table-list mutex is acquired. As this code path is shared by several
different access points within the driver, a decision was made during
the development cycle to acquire this mutex in this location using the
interruptible version of the mutex locking service. In almost all of
the use-cases and environmental scenarios this holds up, even when the
mutex is contended. However, for critical system threads (such as the
release handler), failing to acquire the mutex and bailing with the
intention of the user being able to try again later is unacceptable.
In such a scenario, the context _must_ be derived as it is on an
irreversible path to being freed. Without being able to derive the
context, the code mistakenly assumes that it has already been freed
and proceeds to free up the underlying CXL context resources. From
this point on, any usage of [the now stale] CXL context resources
will result in undefined behavior. This is root cause of the Oops
mentioned as the second known issue as the mapping passed to the
unmap_mapping_range() service is owned by the CXL context.
To fix this problem, acquisition of the table-list mutex within
get_context() is simply changed to use the uninterruptible version
of the mutex locking service. This is safe as the timing windows for
holding this mutex are short and also protected against blocking.
Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
---
drivers/scsi/cxlflash/superpipe.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
index 8af7cdc..34b21a0 100644
--- a/drivers/scsi/cxlflash/superpipe.c
+++ b/drivers/scsi/cxlflash/superpipe.c
@@ -162,10 +162,7 @@ struct ctx_info *get_context(struct cxlflash_cfg *cfg, u64 rctxid,
if (likely(ctxid < MAX_CONTEXT)) {
while (true) {
- rc = mutex_lock_interruptible(&cfg->ctx_tbl_list_mutex);
- if (rc)
- goto out;
-
+ mutex_lock(&cfg->ctx_tbl_list_mutex);
ctxi = cfg->ctx_tbl[ctxid];
if (ctxi)
if ((file && (ctxi->file != file)) ||
--
2.1.0
next prev parent reply other threads:[~2015-10-21 20:17 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-21 20:08 [PATCH v6 00/37] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
2015-10-21 20:10 ` [PATCH v6 01/37] cxlflash: Fix to avoid invalid port_sel value Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 02/37] cxlflash: Replace magic numbers with literals Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 03/37] cxlflash: Fix read capacity timeout Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 04/37] cxlflash: Fix potential oops following LUN removal Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 05/37] cxlflash: Fix data corruption when vLUN used over multiple cards Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 06/37] cxlflash: Fix to avoid sizeof(bool) Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 07/37] cxlflash: Fix context encode mask width Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 08/37] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 09/37] cxlflash: Correct naming of limbo state and waitq Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 10/37] cxlflash: Make functions static Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 11/37] cxlflash: Refine host/device attributes Matthew R. Ochs
2015-10-23 13:33 ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 12/37] cxlflash: Fix to avoid spamming the kernel log Matthew R. Ochs
2015-10-23 13:33 ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 13/37] cxlflash: Fix to avoid stall while waiting on TMF Matthew R. Ochs
2015-10-23 13:36 ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 14/37] cxlflash: Fix location of setting resid Matthew R. Ochs
2015-10-23 13:37 ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 15/37] cxlflash: Fix host link up event handling Matthew R. Ochs
2015-10-23 13:38 ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 16/37] cxlflash: Fix async interrupt bypass logic Matthew R. Ochs
2015-10-23 3:40 ` Andrew Donnellan
2015-10-23 13:39 ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 17/37] cxlflash: Remove dual port online dependency Matthew R. Ochs
2015-10-23 13:39 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 18/37] cxlflash: Fix AFU version access/storage and add check Matthew R. Ochs
2015-10-23 13:40 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 19/37] cxlflash: Correct usage of scsi_host_put() Matthew R. Ochs
2015-10-23 13:41 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 20/37] cxlflash: Fix to prevent workq from accessing freed memory Matthew R. Ochs
2015-10-23 13:41 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 21/37] cxlflash: Correct behavior in device reset handler following EEH Matthew R. Ochs
2015-10-23 13:42 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 22/37] cxlflash: Remove unnecessary scsi_block_requests Matthew R. Ochs
2015-10-23 13:42 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 23/37] cxlflash: Fix function prolog parameters and return codes Matthew R. Ochs
2015-10-23 13:45 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 24/37] cxlflash: Fix MMIO and endianness errors Matthew R. Ochs
2015-10-23 13:53 ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure Matthew R. Ochs
2015-10-23 13:54 ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 26/37] cxlflash: Correct spelling, grammar, and alignment mistakes Matthew R. Ochs
2015-10-23 13:54 ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 27/37] cxlflash: Fix to prevent stale AFU RRQ Matthew R. Ochs
2015-10-23 13:55 ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 28/37] MAINTAINERS: Add cxlflash driver Matthew R. Ochs
2015-10-21 20:15 ` [PATCH v6 29/37] cxlflash: Fix to double the delay each time Matthew R. Ochs
2015-10-23 13:57 ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 30/37] cxlflash: Fix to avoid corrupting adapter fops Matthew R. Ochs
2015-10-23 14:00 ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 31/37] cxlflash: Correct trace string Matthew R. Ochs
2015-10-23 14:00 ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 32/37] cxlflash: Fix to avoid potential deadlock on EEH Matthew R. Ochs
2015-10-23 14:01 ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 33/37] cxlflash: Fix to avoid leaving dangling interrupt resources Matthew R. Ochs
2015-10-23 14:01 ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 34/37] cxlflash: Fix to escalate to LINK_RESET on login timeout Matthew R. Ochs
2015-10-23 14:01 ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 35/37] cxlflash: Fix to avoid corrupting port selection mask Matthew R. Ochs
2015-10-22 17:17 ` Manoj Kumar
2015-10-23 3:52 ` Andrew Donnellan
2015-10-21 20:16 ` [PATCH v6 36/37] cxlflash: Fix to avoid lock instrumentation rejection Matthew R. Ochs
2015-10-22 17:34 ` Manoj Kumar
2015-10-23 3:22 ` Andrew Donnellan
2015-10-21 20:16 ` Matthew R. Ochs [this message]
2015-10-22 2:01 ` [PATCH v6 37/37] cxlflash: Fix to avoid bypassing context cleanup Andrew Donnellan
2015-10-22 18:05 ` Manoj Kumar
2015-10-27 23:30 ` [PATCH v6 00/37] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1445458592-63806-1-git-send-email-mrochs@linux.vnet.ibm.com \
--to=mrochs@linux.vnet.ibm.com \
--cc=David.Laight@ACULAB.COM \
--cc=James.Bottomley@HansenPartnership.com \
--cc=andrew.donnellan@au1.ibm.com \
--cc=brking@linux.vnet.ibm.com \
--cc=dja@ozlabs.au.ibm.com \
--cc=imunsie@au1.ibm.com \
--cc=linux-scsi@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=manoj@linux.vnet.ibm.com \
--cc=mikey@neuling.org \
--cc=nab@linux-iscsi.org \
--cc=thenzl@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).