From: Brian King <brking@linux.vnet.ibm.com>
To: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>,
linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@HansenPartnership.com>,
"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
Ian Munsie <imunsie@au1.ibm.com>,
Daniel Axtens <dja@ozlabs.au.ibm.com>,
Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>,
linuxppc-dev@lists.ozlabs.org,
"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 08/30] cxlflash: Fix to avoid CXL services during EEH
Date: Fri, 18 Sep 2015 08:37:41 -0500 [thread overview]
Message-ID: <55FC13A5.1080301@linux.vnet.ibm.com> (raw)
In-Reply-To: <1442438877-49469-1-git-send-email-mrochs@linux.vnet.ibm.com>
On 09/16/2015 04:27 PM, Matthew R. Ochs wrote:
> --- a/drivers/scsi/cxlflash/main.c
> +++ b/drivers/scsi/cxlflash/main.c
> @@ -2311,6 +2311,7 @@ static int cxlflash_probe(struct pci_dev *pdev,
> cfg->lr_port = -1;
> mutex_init(&cfg->ctx_tbl_list_mutex);
> mutex_init(&cfg->ctx_recovery_mutex);
> + init_rwsem(&cfg->ioctl_rwsem);
> INIT_LIST_HEAD(&cfg->ctx_err_recovery);
> INIT_LIST_HEAD(&cfg->lluns);
>
> @@ -2365,6 +2366,19 @@ out_remove:
> }
>
> /**
> + * drain_ioctls() - wait until all currently executing ioctls have completed
> + * @cfg: Internal structure associated with the host.
> + *
> + * Obtain write access to read/write semaphore that wraps ioctl
> + * handling to 'drain' ioctls currently executing.
> + */
> +static void drain_ioctls(struct cxlflash_cfg *cfg)
> +{
> + down_write(&cfg->ioctl_rwsem);
> + up_write(&cfg->ioctl_rwsem);
> +}
> +
> +/**
> * cxlflash_pci_error_detected() - called when a PCI error is detected
> * @pdev: PCI device struct.
> * @state: PCI channel state.
> @@ -2383,16 +2397,14 @@ static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
> switch (state) {
> case pci_channel_io_frozen:
> cfg->state = STATE_LIMBO;
> -
> - /* Turn off legacy I/O */
> scsi_block_requests(cfg->host);
> + drain_ioctls(cfg);
So, what kicks any outstanding ioctls back? Let's assume you are in the middle of disk_attach
and you've sent the READ_CAP16 to the device. It appears as if what would happen here is we'd
sit here in cxlflash_pci_error_detected. Eventually, the READ_CAP16 would timeout. This would
wake the SCSI error handler, and end up calling your eh_device_reset handler, which would see that
we are in STATE_LIMBO, where it would then do a wait_event, waiting for us to get out of STATE_LIMBO,
and we would end up in a deadlock.
Rather than implementing a rw semaphore, would it be better to simply make the ioctls check the
state we are in and either wait to get out of EEH state or fail themselves?
> rc = cxlflash_mark_contexts_error(cfg);
> if (unlikely(rc))
> dev_err(dev, "%s: Failed to mark user contexts!(%d)\n",
> __func__, rc);
> term_mc(cfg, UNDO_START);
> stop_afu(cfg);
> -
> return PCI_ERS_RESULT_NEED_RESET;
> case pci_channel_io_perm_failure:
> cfg->state = STATE_FAILTERM;
> diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
> index cf2a85d..8a18230 100644
> --- a/drivers/scsi/cxlflash/superpipe.c
> +++ b/drivers/scsi/cxlflash/superpipe.c
> @@ -1214,6 +1214,48 @@ static const struct file_operations null_fops = {
> };
>
> /**
> + * check_state() - checks and responds to the current adapter state
> + * @cfg: Internal structure associated with the host.
> + * @ioctl: Indicates if on an ioctl thread.
> + *
> + * This routine can block and should only be used on process context.
> + * When blocking on an ioctl thread, the ioctl read semaphore should be
> + * let up to allow for draining actively running ioctls. Also note that
> + * when waking up from waiting in reset, the state is unknown and must
> + * be checked again before proceeding.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +static int check_state(struct cxlflash_cfg *cfg, bool ioctl)
All your callers appear to set the second parameter to true, so why bother having it?
> +{
> + struct device *dev = &cfg->dev->dev;
> + int rc = 0;
> +
> +retry:
> + switch (cfg->state) {
> + case STATE_LIMBO:
> + dev_dbg(dev, "%s: Limbo state, going to wait...\n", __func__);
> + if (ioctl)
> + up_read(&cfg->ioctl_rwsem);
> + rc = wait_event_interruptible(cfg->limbo_waitq,
> + cfg->state != STATE_LIMBO);
> + if (ioctl)
> + down_read(&cfg->ioctl_rwsem);
> + if (unlikely(rc))
> + break;
> + goto retry;
> + case STATE_FAILTERM:
> + dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
> + rc = -ENODEV;
> + break;
> + default:
> + break;
> + }
> +
> + return rc;
> +}
> +
> +/**
> * cxlflash_disk_attach() - attach a LUN to a context
> * @sdev: SCSI device associated with LUN.
> * @attach: Attach ioctl data structure.
--
Brian King
Power Linux I/O
IBM Linux Technology Center
next prev parent reply other threads:[~2015-09-18 13:38 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-16 21:23 [PATCH v2 00/30] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
2015-09-16 21:25 ` [PATCH v2 01/30] cxlflash: Fix to avoid invalid port_sel value Matthew R. Ochs
2015-09-18 1:16 ` Brian King
2015-09-16 21:26 ` [PATCH v2 02/30] cxlflash: Replace magic numbers with literals Matthew R. Ochs
2015-09-18 1:18 ` Brian King
2015-09-16 21:26 ` [PATCH v2 03/30] cxlflash: Fix read capacity timeout Matthew R. Ochs
2015-09-18 1:21 ` Brian King
2015-09-21 11:36 ` Tomas Henzl
2015-09-21 22:11 ` Matthew R. Ochs
2015-09-16 21:27 ` [PATCH v2 04/30] cxlflash: Fix potential oops following LUN removal Matthew R. Ochs
2015-09-18 1:26 ` Brian King
2015-09-18 23:18 ` Matthew R. Ochs
2015-09-21 12:11 ` Tomas Henzl
2015-09-21 22:32 ` Matthew R. Ochs
2015-09-16 21:27 ` [PATCH v2 05/30] cxlflash: Fix data corruption when vLUN used over multiple cards Matthew R. Ochs
2015-09-18 1:28 ` Brian King
2015-09-16 21:27 ` [PATCH v2 06/30] cxlflash: Fix to avoid sizeof(bool) Matthew R. Ochs
2015-09-18 1:29 ` Brian King
2015-09-16 21:27 ` [PATCH v2 07/30] cxlflash: Fix context encode mask width Matthew R. Ochs
2015-09-18 1:29 ` Brian King
2015-09-16 21:27 ` [PATCH v2 08/30] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
2015-09-18 13:37 ` Brian King [this message]
2015-09-18 23:54 ` Matthew R. Ochs
2015-09-16 21:28 ` [PATCH v2 09/30] cxlflash: Fix to stop interrupt processing on remove Matthew R. Ochs
2015-09-17 11:58 ` David Laight
2015-09-17 16:55 ` Matthew R. Ochs
2015-09-16 21:28 ` [PATCH v2 10/30] cxlflash: Correct naming of limbo state and waitq Matthew R. Ochs
2015-09-18 15:28 ` Brian King
2015-09-16 21:28 ` [PATCH v2 11/30] cxlflash: Make functions static Matthew R. Ochs
2015-09-18 15:34 ` Brian King
2015-09-21 12:18 ` Tomas Henzl
2015-09-21 22:36 ` Matthew R. Ochs
2015-09-16 21:29 ` [PATCH v2 12/30] cxlflash: Refine host/device attributes Matthew R. Ochs
2015-09-18 21:34 ` Brian King
2015-09-18 23:56 ` Matthew R. Ochs
2015-09-21 9:55 ` David Laight
2015-09-16 21:30 ` [PATCH v2 13/30] cxlflash: Fix to avoid spamming the kernel log Matthew R. Ochs
2015-09-18 21:39 ` Brian King
2015-09-16 21:30 ` [PATCH v2 14/30] cxlflash: Fix to avoid stall while waiting on TMF Matthew R. Ochs
2015-09-21 18:24 ` Brian King
2015-09-21 23:05 ` Matthew R. Ochs
2015-09-16 21:30 ` [PATCH v2 15/30] cxlflash: Fix location of setting resid Matthew R. Ochs
2015-09-21 18:28 ` Brian King
2015-09-16 21:30 ` [PATCH v2 16/30] cxlflash: Fix host link up event handling Matthew R. Ochs
2015-09-21 21:47 ` Brian King
2015-09-16 21:30 ` [PATCH v2 17/30] cxlflash: Fix async interrupt bypass logic Matthew R. Ochs
2015-09-21 21:48 ` Brian King
2015-09-16 21:30 ` [PATCH v2 18/30] cxlflash: Remove dual port online dependency Matthew R. Ochs
2015-09-21 22:02 ` Brian King
2015-09-22 20:44 ` Matthew R. Ochs
2015-09-22 20:50 ` Brian King
2015-09-16 21:30 ` [PATCH v2 19/30] cxlflash: Fix AFU version access/storage and add check Matthew R. Ochs
2015-09-22 20:47 ` Brian King
2015-09-16 21:30 ` [PATCH v2 20/30] cxlflash: Correct usage of scsi_host_put() Matthew R. Ochs
2015-09-22 20:53 ` Brian King
2015-09-22 21:49 ` Matthew R. Ochs
2015-09-16 21:31 ` [PATCH v2 21/30] cxlflash: Fix to prevent workq from accessing freed memory Matthew R. Ochs
2015-09-21 12:25 ` Tomas Henzl
2015-09-21 22:44 ` Matthew R. Ochs
2015-09-16 21:31 ` [PATCH v2 22/30] cxlflash: Correct behavior in device reset handler following EEH Matthew R. Ochs
2015-09-22 20:58 ` Brian King
2015-09-16 21:31 ` [PATCH v2 23/30] cxlflash: Remove unnecessary scsi_block_requests Matthew R. Ochs
2015-09-22 20:59 ` Brian King
2015-09-16 21:31 ` [PATCH v2 24/30] cxlflash: Fix function prolog parameters and return codes Matthew R. Ochs
2015-09-22 21:02 ` Brian King
2015-09-16 21:32 ` [PATCH v2 25/30] cxlflash: Fix MMIO and endianness errors Matthew R. Ochs
2015-09-23 15:03 ` Brian King
2015-09-16 21:32 ` [PATCH v2 26/30] cxlflash: Fix to prevent EEH recovery failure Matthew R. Ochs
2015-09-23 19:09 ` Brian King
2015-09-16 21:32 ` [PATCH v2 27/30] cxlflash: Correct spelling, grammar, and alignment mistakes Matthew R. Ochs
2015-09-23 19:13 ` Brian King
2015-09-16 21:32 ` [PATCH v2 28/30] cxlflash: Fix to prevent stale AFU RRQ Matthew R. Ochs
2015-09-23 19:18 ` Brian King
2015-09-16 21:32 ` [PATCH v2 29/30] cxlflash: Fix to avoid state change collision Matthew R. Ochs
2015-09-21 12:44 ` Tomas Henzl
2015-09-21 22:59 ` Matthew R. Ochs
2015-09-16 21:33 ` [PATCH v2 30/30] MAINTAINERS: Add cxlflash driver Matthew R. Ochs
2015-09-23 19:19 ` Brian King
-- strict thread matches above, loose matches on Subject: below --
2015-09-16 16:53 [PATCH v2 08/30] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55FC13A5.1080301@linux.vnet.ibm.com \
--to=brking@linux.vnet.ibm.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=andrew.donnellan@au1.ibm.com \
--cc=dja@ozlabs.au.ibm.com \
--cc=imunsie@au1.ibm.com \
--cc=linux-scsi@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=manoj@linux.vnet.ibm.com \
--cc=mikey@neuling.org \
--cc=mrochs@linux.vnet.ibm.com \
--cc=nab@linux-iscsi.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).