From: Brian King <brking@linux.vnet.ibm.com>
To: "Matthew R. Ochs" <mrochs@linux.vnet.ibm.com>,
linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@HansenPartnership.com>,
"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
Ian Munsie <imunsie@au1.ibm.com>,
Daniel Axtens <dja@ozlabs.au.ibm.com>,
Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>,
linuxppc-dev@lists.ozlabs.org,
"Manoj N. Kumar" <manoj@linux.vnet.ibm.com>
Subject: Re: [PATCH v2 08/30] cxlflash: Fix to avoid CXL services during EEH
Date: Fri, 18 Sep 2015 08:37:41 -0500 [thread overview]
Message-ID: <55FC13A5.1080301@linux.vnet.ibm.com> (raw)
In-Reply-To: <1442438877-49469-1-git-send-email-mrochs@linux.vnet.ibm.com>
On 09/16/2015 04:27 PM, Matthew R. Ochs wrote:
> --- a/drivers/scsi/cxlflash/main.c
> +++ b/drivers/scsi/cxlflash/main.c
> @@ -2311,6 +2311,7 @@ static int cxlflash_probe(struct pci_dev *pdev,
> cfg->lr_port = -1;
> mutex_init(&cfg->ctx_tbl_list_mutex);
> mutex_init(&cfg->ctx_recovery_mutex);
> + init_rwsem(&cfg->ioctl_rwsem);
> INIT_LIST_HEAD(&cfg->ctx_err_recovery);
> INIT_LIST_HEAD(&cfg->lluns);
>
> @@ -2365,6 +2366,19 @@ out_remove:
> }
>
> /**
> + * drain_ioctls() - wait until all currently executing ioctls have completed
> + * @cfg: Internal structure associated with the host.
> + *
> + * Obtain write access to read/write semaphore that wraps ioctl
> + * handling to 'drain' ioctls currently executing.
> + */
> +static void drain_ioctls(struct cxlflash_cfg *cfg)
> +{
> + down_write(&cfg->ioctl_rwsem);
> + up_write(&cfg->ioctl_rwsem);
> +}
> +
> +/**
> * cxlflash_pci_error_detected() - called when a PCI error is detected
> * @pdev: PCI device struct.
> * @state: PCI channel state.
> @@ -2383,16 +2397,14 @@ static pci_ers_result_t cxlflash_pci_error_detected(struct pci_dev *pdev,
> switch (state) {
> case pci_channel_io_frozen:
> cfg->state = STATE_LIMBO;
> -
> - /* Turn off legacy I/O */
> scsi_block_requests(cfg->host);
> + drain_ioctls(cfg);
So, what kicks any outstanding ioctls back? Let's assume you are in the middle of disk_attach
and you've sent the READ_CAP16 to the device. It appears as if what would happen here is we'd
sit here in cxlflash_pci_error_detected. Eventually, the READ_CAP16 would timeout. This would
wake the SCSI error handler, and end up calling your eh_device_reset handler, which would see that
we are in STATE_LIMBO, where it would then do a wait_event, waiting for us to get out of STATE_LIMBO,
and we would end up in a deadlock.
Rather than implementing a rw semaphore, would it be better to simply make the ioctls check the
state we are in and either wait to get out of EEH state or fail themselves?
> rc = cxlflash_mark_contexts_error(cfg);
> if (unlikely(rc))
> dev_err(dev, "%s: Failed to mark user contexts!(%d)\n",
> __func__, rc);
> term_mc(cfg, UNDO_START);
> stop_afu(cfg);
> -
> return PCI_ERS_RESULT_NEED_RESET;
> case pci_channel_io_perm_failure:
> cfg->state = STATE_FAILTERM;
> diff --git a/drivers/scsi/cxlflash/superpipe.c b/drivers/scsi/cxlflash/superpipe.c
> index cf2a85d..8a18230 100644
> --- a/drivers/scsi/cxlflash/superpipe.c
> +++ b/drivers/scsi/cxlflash/superpipe.c
> @@ -1214,6 +1214,48 @@ static const struct file_operations null_fops = {
> };
>
> /**
> + * check_state() - checks and responds to the current adapter state
> + * @cfg: Internal structure associated with the host.
> + * @ioctl: Indicates if on an ioctl thread.
> + *
> + * This routine can block and should only be used on process context.
> + * When blocking on an ioctl thread, the ioctl read semaphore should be
> + * let up to allow for draining actively running ioctls. Also note that
> + * when waking up from waiting in reset, the state is unknown and must
> + * be checked again before proceeding.
> + *
> + * Return: 0 on success, -errno on failure
> + */
> +static int check_state(struct cxlflash_cfg *cfg, bool ioctl)
All your callers appear to set the second parameter to true, so why bother having it?
> +{
> + struct device *dev = &cfg->dev->dev;
> + int rc = 0;
> +
> +retry:
> + switch (cfg->state) {
> + case STATE_LIMBO:
> + dev_dbg(dev, "%s: Limbo state, going to wait...\n", __func__);
> + if (ioctl)
> + up_read(&cfg->ioctl_rwsem);
> + rc = wait_event_interruptible(cfg->limbo_waitq,
> + cfg->state != STATE_LIMBO);
> + if (ioctl)
> + down_read(&cfg->ioctl_rwsem);
> + if (unlikely(rc))
> + break;
> + goto retry;
> + case STATE_FAILTERM:
> + dev_dbg(dev, "%s: Failed/Terminating!\n", __func__);
> + rc = -ENODEV;
> + break;
> + default:
> + break;
> + }
> +
> + return rc;
> +}
> +
> +/**
> * cxlflash_disk_attach() - attach a LUN to a context
> * @sdev: SCSI device associated with LUN.
> * @attach: Attach ioctl data structure.
--
Brian King
Power Linux I/O
IBM Linux Technology Center
next prev parent reply other threads:[~2015-09-18 13:38 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-16 21:23 [PATCH v2 00/30] cxlflash: Miscellaneous bug fixes and corrections Matthew R. Ochs
2015-09-16 21:25 ` [PATCH v2 01/30] cxlflash: Fix to avoid invalid port_sel value Matthew R. Ochs
2015-09-18 1:16 ` Brian King
2015-09-16 21:26 ` [PATCH v2 02/30] cxlflash: Replace magic numbers with literals Matthew R. Ochs
2015-09-18 1:18 ` Brian King
2015-09-16 21:26 ` [PATCH v2 03/30] cxlflash: Fix read capacity timeout Matthew R. Ochs
2015-09-18 1:21 ` Brian King
2015-09-21 11:36 ` Tomas Henzl
2015-09-21 22:11 ` Matthew R. Ochs
2015-09-21 22:11 ` Matthew R. Ochs
2015-09-16 21:27 ` [PATCH v2 04/30] cxlflash: Fix potential oops following LUN removal Matthew R. Ochs
2015-09-18 1:26 ` Brian King
2015-09-18 23:18 ` Matthew R. Ochs
2015-09-18 23:18 ` Matthew R. Ochs
2015-09-21 12:11 ` Tomas Henzl
2015-09-21 22:32 ` Matthew R. Ochs
2015-09-21 22:32 ` Matthew R. Ochs
2015-09-16 21:27 ` [PATCH v2 05/30] cxlflash: Fix data corruption when vLUN used over multiple cards Matthew R. Ochs
2015-09-18 1:28 ` Brian King
2015-09-16 21:27 ` [PATCH v2 06/30] cxlflash: Fix to avoid sizeof(bool) Matthew R. Ochs
2015-09-18 1:29 ` Brian King
2015-09-16 21:27 ` [PATCH v2 07/30] cxlflash: Fix context encode mask width Matthew R. Ochs
2015-09-18 1:29 ` Brian King
2015-09-16 21:27 ` [PATCH v2 08/30] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
2015-09-18 13:37 ` Brian King [this message]
2015-09-18 23:54 ` Matthew R. Ochs
2015-09-18 23:54 ` Matthew R. Ochs
2015-09-16 21:28 ` [PATCH v2 09/30] cxlflash: Fix to stop interrupt processing on remove Matthew R. Ochs
2015-09-17 11:58 ` David Laight
2015-09-17 11:58 ` David Laight
2015-09-17 16:55 ` Matthew R. Ochs
2015-09-17 16:55 ` Matthew R. Ochs
2015-09-16 21:28 ` [PATCH v2 10/30] cxlflash: Correct naming of limbo state and waitq Matthew R. Ochs
2015-09-18 15:28 ` Brian King
2015-09-16 21:28 ` [PATCH v2 11/30] cxlflash: Make functions static Matthew R. Ochs
2015-09-18 15:34 ` Brian King
2015-09-21 12:18 ` Tomas Henzl
2015-09-21 22:36 ` Matthew R. Ochs
2015-09-16 21:29 ` [PATCH v2 12/30] cxlflash: Refine host/device attributes Matthew R. Ochs
2015-09-18 21:34 ` Brian King
2015-09-18 23:56 ` Matthew R. Ochs
2015-09-18 23:56 ` Matthew R. Ochs
2015-09-21 9:55 ` David Laight
2015-09-21 9:55 ` David Laight
2015-09-16 21:30 ` [PATCH v2 13/30] cxlflash: Fix to avoid spamming the kernel log Matthew R. Ochs
2015-09-18 21:39 ` Brian King
2015-09-16 21:30 ` [PATCH v2 14/30] cxlflash: Fix to avoid stall while waiting on TMF Matthew R. Ochs
2015-09-21 18:24 ` Brian King
2015-09-21 23:05 ` Matthew R. Ochs
2015-09-16 21:30 ` [PATCH v2 15/30] cxlflash: Fix location of setting resid Matthew R. Ochs
2015-09-21 18:28 ` Brian King
2015-09-16 21:30 ` [PATCH v2 16/30] cxlflash: Fix host link up event handling Matthew R. Ochs
2015-09-21 21:47 ` Brian King
2015-09-16 21:30 ` [PATCH v2 17/30] cxlflash: Fix async interrupt bypass logic Matthew R. Ochs
2015-09-21 21:48 ` Brian King
2015-09-16 21:30 ` [PATCH v2 18/30] cxlflash: Remove dual port online dependency Matthew R. Ochs
2015-09-21 22:02 ` Brian King
2015-09-22 20:44 ` Matthew R. Ochs
2015-09-22 20:50 ` Brian King
2015-09-16 21:30 ` [PATCH v2 19/30] cxlflash: Fix AFU version access/storage and add check Matthew R. Ochs
2015-09-22 20:47 ` Brian King
2015-09-16 21:30 ` [PATCH v2 20/30] cxlflash: Correct usage of scsi_host_put() Matthew R. Ochs
2015-09-22 20:53 ` Brian King
2015-09-22 21:49 ` Matthew R. Ochs
2015-09-22 21:49 ` Matthew R. Ochs
2015-09-16 21:31 ` [PATCH v2 21/30] cxlflash: Fix to prevent workq from accessing freed memory Matthew R. Ochs
2015-09-21 12:25 ` Tomas Henzl
2015-09-21 22:44 ` Matthew R. Ochs
2015-09-16 21:31 ` [PATCH v2 22/30] cxlflash: Correct behavior in device reset handler following EEH Matthew R. Ochs
2015-09-22 20:58 ` Brian King
2015-09-16 21:31 ` [PATCH v2 23/30] cxlflash: Remove unnecessary scsi_block_requests Matthew R. Ochs
2015-09-22 20:59 ` Brian King
2015-09-16 21:31 ` [PATCH v2 24/30] cxlflash: Fix function prolog parameters and return codes Matthew R. Ochs
2015-09-22 21:02 ` Brian King
2015-09-16 21:32 ` [PATCH v2 25/30] cxlflash: Fix MMIO and endianness errors Matthew R. Ochs
2015-09-23 15:03 ` Brian King
2015-09-16 21:32 ` [PATCH v2 26/30] cxlflash: Fix to prevent EEH recovery failure Matthew R. Ochs
2015-09-23 19:09 ` Brian King
2015-09-16 21:32 ` [PATCH v2 27/30] cxlflash: Correct spelling, grammar, and alignment mistakes Matthew R. Ochs
2015-09-23 19:13 ` Brian King
2015-09-16 21:32 ` [PATCH v2 28/30] cxlflash: Fix to prevent stale AFU RRQ Matthew R. Ochs
2015-09-23 19:18 ` Brian King
2015-09-16 21:32 ` [PATCH v2 29/30] cxlflash: Fix to avoid state change collision Matthew R. Ochs
2015-09-21 12:44 ` Tomas Henzl
2015-09-21 22:59 ` Matthew R. Ochs
2015-09-16 21:33 ` [PATCH v2 30/30] MAINTAINERS: Add cxlflash driver Matthew R. Ochs
2015-09-23 19:19 ` Brian King
-- strict thread matches above, loose matches on Subject: below --
2015-09-16 16:53 [PATCH v2 08/30] cxlflash: Fix to avoid CXL services during EEH Matthew R. Ochs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55FC13A5.1080301@linux.vnet.ibm.com \
--to=brking@linux.vnet.ibm.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=andrew.donnellan@au1.ibm.com \
--cc=dja@ozlabs.au.ibm.com \
--cc=imunsie@au1.ibm.com \
--cc=linux-scsi@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=manoj@linux.vnet.ibm.com \
--cc=mikey@neuling.org \
--cc=mrochs@linux.vnet.ibm.com \
--cc=nab@linux-iscsi.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.