linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
To: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>,
	linux-scsi@vger.kernel.org,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	Brian King <brking@linux.vnet.ibm.com>,
	Ian Munsie <imunsie@au1.ibm.com>,
	Daniel Axtens <dja@ozlabs.au.ibm.com>,
	Tomas Henzl <thenzl@redhat.com>,
	David Laight <David.Laight@ACULAB.COM>
Cc: Michael Neuling <mikey@neuling.org>,
	"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v6 37/37] cxlflash: Fix to avoid bypassing context cleanup
Date: Thu, 22 Oct 2015 13:01:57 +1100	[thread overview]
Message-ID: <56284395.1060100@au1.ibm.com> (raw)
In-Reply-To: <1445458592-63806-1-git-send-email-mrochs@linux.vnet.ibm.com>

On 22/10/15 07:16, Matthew R. Ochs wrote:
> Contexts may be skipped over for cleanup in situations where contention
> for the adapter's table-list mutex is experienced in the presence of a
> signal during the execution of the release handler.
>
> This can lead to two known issues:
>
>   - A hang condition on remove as that path tries to wait for users to
>     cleanup - something that will never complete should this scenario play
>     out as the user has already cleaned up from their perspective.
>
>   - An Oops in the unmap_mapping_range() call that is made as part of
>     the user waiting mechanism that is invoked on remove when contexts
>     are found to still exist.
>
> The root cause of this issue can be found in get_context() and how the
> table-list mutex is acquired. As this code path is shared by several
> different access points within the driver, a decision was made during
> the development cycle to acquire this mutex in this location using the
> interruptible version of the mutex locking service. In almost all of
> the use-cases and environmental scenarios this holds up, even when the
> mutex is contended. However, for critical system threads (such as the
> release handler), failing to acquire the mutex and bailing with the
> intention of the user being able to try again later is unacceptable.
>
> In such a scenario, the context _must_ be derived as it is on an
> irreversible path to being freed. Without being able to derive the
> context, the code mistakenly assumes that it has already been freed
> and proceeds to free up the underlying CXL context resources. From
> this point on, any usage of [the now stale] CXL context resources
> will result in undefined behavior. This is root cause of the Oops
> mentioned as the second known issue as the mapping passed to the
> unmap_mapping_range() service is owned by the CXL context.
>
> To fix this problem, acquisition of the table-list mutex within
> get_context() is simply changed to use the uninterruptible version
> of the mutex locking service. This is safe as the timing windows for
> holding this mutex are short and also protected against blocking.
>
> Signed-off-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>

-- 
Andrew Donnellan              Software Engineer, OzLabs
andrew.donnellan@au1.ibm.com  Australia Development Lab, Canberra
+61 2 6201 8874 (work)        IBM Australia Limited

  reply	other threads:[~2015-10-22  2:02 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-21 20:08 [PATCH v6 00/37] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
2015-10-21 20:10 ` [PATCH v6 01/37] cxlflash: Fix to avoid invalid port_sel value Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 02/37] cxlflash: Replace magic numbers with literals Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 03/37] cxlflash: Fix read capacity timeout Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 04/37] cxlflash: Fix potential oops following LUN removal Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 05/37] cxlflash: Fix data corruption when vLUN used over multiple cards Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 06/37] cxlflash: Fix to avoid sizeof(bool) Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 07/37] cxlflash: Fix context encode mask width Matthew R. Ochs
2015-10-21 20:11 ` [PATCH v6 08/37] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 09/37] cxlflash: Correct naming of limbo state and waitq Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 10/37] cxlflash: Make functions static Matthew R. Ochs
2015-10-21 20:12 ` [PATCH v6 11/37] cxlflash: Refine host/device attributes Matthew R. Ochs
2015-10-23 13:33   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 12/37] cxlflash: Fix to avoid spamming the kernel log Matthew R. Ochs
2015-10-23 13:33   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 13/37] cxlflash: Fix to avoid stall while waiting on TMF Matthew R. Ochs
2015-10-23 13:36   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 14/37] cxlflash: Fix location of setting resid Matthew R. Ochs
2015-10-23 13:37   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 15/37] cxlflash: Fix host link up event handling Matthew R. Ochs
2015-10-23 13:38   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 16/37] cxlflash: Fix async interrupt bypass logic Matthew R. Ochs
2015-10-23  3:40   ` Andrew Donnellan
2015-10-23 13:39   ` Tomas Henzl
2015-10-21 20:13 ` [PATCH v6 17/37] cxlflash: Remove dual port online dependency Matthew R. Ochs
2015-10-23 13:39   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 18/37] cxlflash: Fix AFU version access/storage and add check Matthew R. Ochs
2015-10-23 13:40   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 19/37] cxlflash: Correct usage of scsi_host_put() Matthew R. Ochs
2015-10-23 13:41   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 20/37] cxlflash: Fix to prevent workq from accessing freed memory Matthew R. Ochs
2015-10-23 13:41   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 21/37] cxlflash: Correct behavior in device reset handler following EEH Matthew R. Ochs
2015-10-23 13:42   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 22/37] cxlflash: Remove unnecessary scsi_block_requests Matthew R. Ochs
2015-10-23 13:42   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 23/37] cxlflash: Fix function prolog parameters and return codes Matthew R. Ochs
2015-10-23 13:45   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 24/37] cxlflash: Fix MMIO and endianness errors Matthew R. Ochs
2015-10-23 13:53   ` Tomas Henzl
2015-10-21 20:14 ` [PATCH v6 25/37] cxlflash: Fix to prevent EEH recovery failure Matthew R. Ochs
2015-10-23 13:54   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 26/37] cxlflash: Correct spelling, grammar, and alignment mistakes Matthew R. Ochs
2015-10-23 13:54   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 27/37] cxlflash: Fix to prevent stale AFU RRQ Matthew R. Ochs
2015-10-23 13:55   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 28/37] MAINTAINERS: Add cxlflash driver Matthew R. Ochs
2015-10-21 20:15 ` [PATCH v6 29/37] cxlflash: Fix to double the delay each time Matthew R. Ochs
2015-10-23 13:57   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 30/37] cxlflash: Fix to avoid corrupting adapter fops Matthew R. Ochs
2015-10-23 14:00   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 31/37] cxlflash: Correct trace string Matthew R. Ochs
2015-10-23 14:00   ` Tomas Henzl
2015-10-21 20:15 ` [PATCH v6 32/37] cxlflash: Fix to avoid potential deadlock on EEH Matthew R. Ochs
2015-10-23 14:01   ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 33/37] cxlflash: Fix to avoid leaving dangling interrupt resources Matthew R. Ochs
2015-10-23 14:01   ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 34/37] cxlflash: Fix to escalate to LINK_RESET on login timeout Matthew R. Ochs
2015-10-23 14:01   ` Tomas Henzl
2015-10-21 20:16 ` [PATCH v6 35/37] cxlflash: Fix to avoid corrupting port selection mask Matthew R. Ochs
2015-10-22 17:17   ` Manoj Kumar
2015-10-23  3:52   ` Andrew Donnellan
2015-10-21 20:16 ` [PATCH v6 36/37] cxlflash: Fix to avoid lock instrumentation rejection Matthew R. Ochs
2015-10-22 17:34   ` Manoj Kumar
2015-10-23  3:22   ` Andrew Donnellan
2015-10-21 20:16 ` [PATCH v6 37/37] cxlflash: Fix to avoid bypassing context cleanup Matthew R. Ochs
2015-10-22  2:01   ` Andrew Donnellan [this message]
2015-10-22 18:05   ` Manoj Kumar
2015-10-27 23:30 ` [PATCH v6 00/37] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56284395.1060100@au1.ibm.com \
    --to=andrew.donnellan@au1.ibm.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=brking@linux.vnet.ibm.com \
    --cc=dja@ozlabs.au.ibm.com \
    --cc=imunsie@au1.ibm.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=manoj@linux.vnet.ibm.com \
    --cc=mikey@neuling.org \
    --cc=mrochs@linux.vnet.ibm.com \
    --cc=nab@linux-iscsi.org \
    --cc=thenzl@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).