From: Hannes Reinecke <hare@suse.de>
To: Don Brace <don.brace@pmcs.com>,
scott.teel@pmcs.com, Kevin.Barnett@pmcs.com,
james.bottomley@parallels.com, hch@infradead.org,
Justin.Lindley@pmcs.combrace@pmcs.com
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH v4 03/43] hpsa: rework controller command submission
Date: Fri, 17 Apr 2015 14:57:57 +0200 [thread overview]
Message-ID: <55310355.8090701@suse.de> (raw)
In-Reply-To: <20150416134658.30238.76254.stgit@brunhilda>
Hi Don,
some comments inline.
On 04/16/2015 03:46 PM, Don Brace wrote:
> From: Webb Scales <webb.scales@hp.com>
>
> Allow driver initiated commands to have a timeout. It does not
> yet try to do anything with timeouts on such commands.
>
> We are sending a reset in order to get rid of a command we want to abort.
> If we make it return on the same reply queue as the command we want to abort,
> the completion of the aborted command will not race with the completion of
> the reset command.
>
> Rename hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd(), since
> this function is the interface for issuing commands to the controller and
> not the "core" of that implementation. Add a parameter to it which allows
> the caller to specify the reply queue to be used. Modify existing callers
> to specify the default reply queue.
>
> Rename __hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd_core(),
> since this routine is the "core" implementation of the "do simple command"
> function and there is no longer any other function with a similar name.
> Modify the existing callers of this routine (other than
> hpsa_scsi_do_simple_cmd()) to instead call hpsa_scsi_do_simple_cmd(), since
> it will now accept the reply_queue paramenter, and it provides a controller
> lock-up check. (Also, tweak two related message strings to make them
> distinct from each other.)
>
> Submitting a command to a locked up controller always results in a timeout,
> so check for controller lock-up before submitting.
>
> This is to enable fixing a race between command completions and
> abort completions on different reply queues in a subsequent patch.
> We want to be able to specify which reply queue an abort completion
> should occur on so that it cannot race the completion of the command
> it is trying to abort.
>
> The following race was possible in theory:
>
> 1. Abort command is sent to hardware.
> 2. Command to be aborted simultaneously completes on another
> reply queue.
> 3. Hardware receives abort command, decides command has already
> completed and indicates this to the driver via another different
> reply queue.
> 4. driver processes abort completion finds that the hardware does not know
> about the command, concludes that therefore the command cannot complete,
> returns SUCCESS indicating to the mid-layer that the scsi_cmnd may be
> re-used.
> 5. Command from step 2 is processed and completed back to scsi mid
> layer (after we already promised that would never happen.)
>
> Fix by forcing aborts to complete on the same reply queue as the command
> they are aborting.
>
> Piggybacking device rescanning functionality onto the lockup
> detection thread is not a good idea because if the controller
> locks up during device rescanning, then the thread could get
> stuck, then the lockup isn't detected. Use separate work
> queues for device rescanning and lockup detection.
>
> Detect controller lockup in abort handler.
>
> After a lockup is detected, return DO_NO_CONNECT which results in immediate
> termination of commands rather than DID_ERR which results in retries.
>
> Modify detect_controller_lockup() to return the result, to remove the need for a separate check.
>
> Reviewed-by: Scott Teel <scott.teel@pmcs.com>
> Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
> Signed-off-by: Webb Scales <webbnh@hp.com>
> Signed-off-by: Don Brace <don.brace@pmcs.com>
> ---
> drivers/scsi/hpsa.c | 329 ++++++++++++++++++++++++++++++++++++-----------
> drivers/scsi/hpsa_cmd.h | 5 +
> 2 files changed, 257 insertions(+), 77 deletions(-)
>
[ .. ]
> @@ -4375,13 +4469,46 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
> "device lookup failed.\n");
> return FAILED;
> }
> - dev_warn(&h->pdev->dev, "resetting device %d:%d:%d:%d\n",
> - h->scsi_host->host_no, dev->bus, dev->target, dev->lun);
> +
> + /* if controller locked up, we can guarantee command won't complete */
> + if (lockup_detected(h)) {
> + dev_warn(&h->pdev->dev,
> + "scsi %d:%d:%d:%d RESET FAILED, lockup detected\n",
> + h->scsi_host->host_no, dev->bus, dev->target,
> + dev->lun);
> + return FAILED;
> + }
> +
> + /* this reset request might be the result of a lockup; check */
> + if (detect_controller_lockup(h)) {
> + dev_warn(&h->pdev->dev,
> + "scsi %d:%d:%d:%d RESET FAILED, new lockup detected\n",
> + h->scsi_host->host_no, dev->bus, dev->target,
> + dev->lun);
> + return FAILED;
> + }
> +
> + dev_warn(&h->pdev->dev,
> + "scsi %d:%d:%d:%d: %s %.8s %.16s resetting RAID-%s SSDSmartPathCap%c En%c Exp=%d\n",
> + h->scsi_host->host_no, dev->bus, dev->target, dev->lun,
> + scsi_device_type(dev->devtype),
> + dev->vendor,
> + dev->model,
> + dev->raid_level > RAID_UNKNOWN ?
> + "RAID-?" : raid_label[dev->raid_level],
> + dev->offload_config ? '+' : '-',
> + dev->offload_enabled ? '+' : '-',
> + dev->expose_state);
> +
Didn't you just reworked the message logging in the previous patch?
Why can't you use it here?
> /* send a reset to the SCSI LUN which the command was sent to */
> - rc = hpsa_send_reset(h, dev->scsi3addr, HPSA_RESET_TYPE_LUN);
> + rc = hpsa_send_reset(h, dev->scsi3addr, HPSA_RESET_TYPE_LUN,
> + DEFAULT_REPLY_QUEUE);
> if (rc == 0 && wait_for_device_to_become_ready(h, dev->scsi3addr) == 0)
> return SUCCESS;
>
> + dev_warn(&h->pdev->dev,
> + "scsi %d:%d:%d:%d reset failed\n",
> + h->scsi_host->host_no, dev->bus, dev->target, dev->lun);
> return FAILED;
> }
>
> @@ -4426,7 +4553,7 @@ static void hpsa_get_tag(struct ctlr_info *h,
> }
>
> static int hpsa_send_abort(struct ctlr_info *h, unsigned char *scsi3addr,
> - struct CommandList *abort, int swizzle)
> + struct CommandList *abort, int swizzle, int reply_queue)
> {
> int rc = IO_OK;
> struct CommandList *c;
> @@ -4444,9 +4571,9 @@ static int hpsa_send_abort(struct ctlr_info *h, unsigned char *scsi3addr,
> 0, 0, scsi3addr, TYPE_MSG);
> if (swizzle)
> swizzle_abort_tag(&c->Request.CDB[4]);
> - hpsa_scsi_do_simple_cmd_core(h, c);
> + (void) hpsa_scsi_do_simple_cmd(h, c, reply_queue, NO_TIMEOUT);
> hpsa_get_tag(h, abort, &taglower, &tagupper);
> - dev_dbg(&h->pdev->dev, "%s: Tag:0x%08x:%08x: do_simple_cmd_core completed.\n",
> + dev_dbg(&h->pdev->dev, "%s: Tag:0x%08x:%08x: do_simple_cmd(abort) completed.\n",
> __func__, tagupper, taglower);
> /* no unmap needed here because no data xfer. */
>
> @@ -4478,7 +4605,7 @@ static int hpsa_send_abort(struct ctlr_info *h, unsigned char *scsi3addr,
> */
>
> static int hpsa_send_reset_as_abort_ioaccel2(struct ctlr_info *h,
> - unsigned char *scsi3addr, struct CommandList *abort)
> + unsigned char *scsi3addr, struct CommandList *abort, int reply_queue)
> {
> int rc = IO_OK;
> struct scsi_cmnd *scmd; /* scsi command within request being aborted */
> @@ -4521,7 +4648,7 @@ static int hpsa_send_reset_as_abort_ioaccel2(struct ctlr_info *h,
> "Reset as abort: Resetting physical device at scsi3addr 0x%02x%02x%02x%02x%02x%02x%02x%02x\n",
> psa[0], psa[1], psa[2], psa[3],
> psa[4], psa[5], psa[6], psa[7]);
> - rc = hpsa_send_reset(h, psa, HPSA_RESET_TYPE_TARGET);
> + rc = hpsa_send_reset(h, psa, HPSA_RESET_TYPE_TARGET, reply_queue);
> if (rc != 0) {
> dev_warn(&h->pdev->dev,
> "Reset as abort: Failed on physical device at scsi3addr 0x%02x%02x%02x%02x%02x%02x%02x%02x\n",
> @@ -4555,7 +4682,7 @@ static int hpsa_send_reset_as_abort_ioaccel2(struct ctlr_info *h,
> * make this true someday become false.
> */
> static int hpsa_send_abort_both_ways(struct ctlr_info *h,
> - unsigned char *scsi3addr, struct CommandList *abort)
> + unsigned char *scsi3addr, struct CommandList *abort, int reply_queue)
> {
> /* ioccelerator mode 2 commands should be aborted via the
> * accelerated path, since RAID path is unaware of these commands,
> @@ -4563,10 +4690,20 @@ static int hpsa_send_abort_both_ways(struct ctlr_info *h,
> * Change abort to physical device reset.
> */
> if (abort->cmd_type == CMD_IOACCEL2)
> - return hpsa_send_reset_as_abort_ioaccel2(h, scsi3addr, abort);
> + return hpsa_send_reset_as_abort_ioaccel2(h, scsi3addr,
> + abort, reply_queue);
> +
> + return hpsa_send_abort(h, scsi3addr, abort, 0, reply_queue) &&
> + hpsa_send_abort(h, scsi3addr, abort, 1, reply_queue);
> +}
>
> - return hpsa_send_abort(h, scsi3addr, abort, 0) &&
> - hpsa_send_abort(h, scsi3addr, abort, 1);
> +/* Find out which reply queue a command was meant to return on */
> +static int hpsa_extract_reply_queue(struct ctlr_info *h,
> + struct CommandList *c)
> +{
> + if (c->cmd_type == CMD_IOACCEL2)
> + return h->ioaccel2_cmd_pool[c->cmdindex].reply_queue;
> + return c->Header.ReplyQueue;
> }
>
> /* Send an abort for the specified command.
> @@ -4584,7 +4721,7 @@ static int hpsa_eh_abort_handler(struct scsi_cmnd *sc)
> char msg[256]; /* For debug messaging. */
> int ml = 0;
> __le32 tagupper, taglower;
> - int refcount;
> + int refcount, reply_queue;
>
> /* Find the controller of the command to be aborted */
> h = sdev_to_hba(sc->device);
> @@ -4592,8 +4729,23 @@ static int hpsa_eh_abort_handler(struct scsi_cmnd *sc)
> "ABORT REQUEST FAILED, Controller lookup failed.\n"))
> return FAILED;
>
> - if (lockup_detected(h))
> + /* If controller locked up, we can guarantee command won't complete */
> + if (lockup_detected(h)) {
> + dev_warn(&h->pdev->dev,
> + "scsi %d:%d:%d:%llu scmd %p ABORT FAILED, lockup detected\n",
> + h->scsi_host->host_no, sc->device->channel,
> + sc->device->id, sc->device->lun, sc);
> return FAILED;
> + }
> +
> + /* This is a good time to check if controller lockup has occurred */
> + if (detect_controller_lockup(h)) {
> + dev_warn(&h->pdev->dev,
> + "scsi %d:%d:%d:%llu scmd %p ABORT FAILED, new lockup detected\n",
> + h->scsi_host->host_no, sc->device->channel,
> + sc->device->id, sc->device->lun, sc);
> + return FAILED;
> + }
>
Same here.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2015-04-17 12:57 UTC|newest]
Thread overview: 131+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-16 13:46 [PATCH v4 00/43] hpsa update Don Brace
2015-04-16 13:46 ` [PATCH v4 01/43] hpsa: add masked physical devices into h->dev[] array Don Brace
2015-04-17 12:52 ` Hannes Reinecke
2015-04-17 13:15 ` Tomas Henzl
2015-04-16 13:46 ` [PATCH v4 02/43] hpsa: clean up host, channel, target, lun prints Don Brace
2015-04-17 12:53 ` Hannes Reinecke
2015-04-17 13:15 ` Tomas Henzl
2015-04-16 13:46 ` [PATCH v4 03/43] hpsa: rework controller command submission Don Brace
2015-04-17 12:57 ` Hannes Reinecke [this message]
2015-04-16 13:47 ` [PATCH v4 04/43] hpsa: clean up aborts Don Brace
2015-04-17 12:59 ` Hannes Reinecke
2015-04-17 13:19 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 05/43] hpsa: decrement h->commands_outstanding in fail_all_outstanding_cmds Don Brace
2015-04-17 13:00 ` Hannes Reinecke
2015-04-17 13:20 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 06/43] hpsa: hpsa decode sense data for io and tmf Don Brace
2015-04-17 13:03 ` Hannes Reinecke
2015-04-17 13:23 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 07/43] hpsa: allow lockup detected to be viewed via sysfs Don Brace
2015-04-17 13:04 ` Hannes Reinecke
2015-04-17 13:23 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 08/43] hpsa: make function names consistent Don Brace
2015-04-17 13:04 ` Hannes Reinecke
2015-04-17 13:24 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 09/43] hpsa: factor out hpsa_init_cmd function Don Brace
2015-04-17 13:05 ` Hannes Reinecke
2015-04-17 13:26 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 10/43] hpsa: do not ignore return value of hpsa_register_scsi Don Brace
2015-04-17 13:05 ` Hannes Reinecke
2015-04-17 13:26 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 11/43] hpsa: try resubmitting down raid path on task set full Don Brace
2015-04-17 13:06 ` Hannes Reinecke
2015-04-17 13:26 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 12/43] hpsa: factor out hpsa_ioaccel_submit function Don Brace
2015-04-17 13:07 ` Hannes Reinecke
2015-04-17 13:27 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 13/43] hpsa: print accurate SSD Smart Path Enabled status Don Brace
2015-04-17 13:07 ` Hannes Reinecke
2015-04-17 13:27 ` Tomas Henzl
2015-04-16 13:47 ` [PATCH v4 14/43] hpsa: use ioaccel2 path to submit IOs to physical drives in HBA mode Don Brace
2015-04-17 13:08 ` Hannes Reinecke
2015-04-17 13:28 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 15/43] hpsa: Get queue depth from identify physical bmic for physical disks Don Brace
2015-04-17 13:09 ` Hannes Reinecke
2015-04-17 13:28 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 16/43] hpsa: break hpsa_free_irqs_and_disable_msix into two functions Don Brace
2015-04-17 13:10 ` Hannes Reinecke
2015-04-17 13:28 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 17/43] hpsa: clean up error handling Don Brace
2015-04-17 13:11 ` Hannes Reinecke
2015-04-17 13:29 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 18/43] hpsa: refactor freeing of resources into more logical functions Don Brace
2015-04-17 13:12 ` Hannes Reinecke
2015-04-17 13:29 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 19/43] hpsa: add ioaccel sg chaining for the ioaccel2 path Don Brace
2015-04-17 13:14 ` Hannes Reinecke
2015-04-22 19:12 ` brace
2015-04-23 5:50 ` Hannes Reinecke
2015-04-23 7:39 ` hch
2015-04-16 13:48 ` [PATCH v4 20/43] hpsa: add more ioaccel2 error handling, including underrun statuses Don Brace
2015-04-17 13:15 ` Hannes Reinecke
2015-04-16 13:48 ` [PATCH v4 21/43] hpsa: do not check cmd_alloc return value - it cannnot return NULL Don Brace
2015-04-17 13:16 ` Hannes Reinecke
2015-04-17 13:32 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 22/43] hpsa: correct return values from driver functions Don Brace
2015-04-17 13:17 ` Hannes Reinecke
2015-04-17 13:33 ` Tomas Henzl
2015-04-17 13:39 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 23/43] hpsa: clean up driver init Don Brace
2015-04-17 13:19 ` Hannes Reinecke
2015-04-17 13:39 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 24/43] hpsa: clean up some error reporting output in abort handler Don Brace
2015-04-17 13:20 ` Hannes Reinecke
2015-04-17 13:39 ` Tomas Henzl
2015-04-16 13:48 ` [PATCH v4 25/43] hpsa: do not print ioaccel2 warning messages about unusual completions Don Brace
2015-04-17 13:21 ` Hannes Reinecke
2015-04-17 13:40 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 26/43] hpsa: add support sending aborts to physical devices via the ioaccel2 path Don Brace
2015-04-17 13:22 ` Hannes Reinecke
2015-04-17 13:40 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 27/43] hpsa: use helper routines for finishing commands Don Brace
2015-04-17 13:23 ` Hannes Reinecke
2015-04-17 13:42 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 28/43] hpsa: don't return abort request until target is complete Don Brace
2015-04-17 13:24 ` Hannes Reinecke
2015-04-17 13:43 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 29/43] hpsa: refactor and rework support for sending TEST_UNIT_READY Don Brace
2015-04-17 13:25 ` Hannes Reinecke
2015-04-17 13:43 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 30/43] hpsa: performance tweak for hpsa_scatter_gather() Don Brace
2015-04-17 13:26 ` Hannes Reinecke
2015-04-17 13:43 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 31/43] hpsa: call pci_release_regions after pci_disable_device Don Brace
2015-04-17 13:27 ` Hannes Reinecke
2015-04-17 13:44 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 32/43] hpsa: skip free_irq calls if irqs are not allocated Don Brace
2015-04-17 13:29 ` Hannes Reinecke
2015-04-17 13:44 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 33/43] hpsa: cleanup for init_one step 2 in kdump Don Brace
2015-04-17 13:29 ` Hannes Reinecke
2015-04-17 13:45 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 34/43] hpsa: fix try_soft_reset error handling Don Brace
2015-04-17 13:29 ` Hannes Reinecke
2015-04-17 13:45 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 35/43] hpsa: create workqueue after the driver is ready for use Don Brace
2015-04-17 13:30 ` Hannes Reinecke
2015-04-17 13:46 ` Tomas Henzl
2015-04-16 13:49 ` [PATCH v4 36/43] hpsa: add interrupt number to /proc/interrupts interrupt name Don Brace
2015-04-17 13:31 ` Hannes Reinecke
2015-04-17 13:47 ` Tomas Henzl
2015-04-16 13:50 ` [PATCH v4 37/43] hpsa: use block layer tag for command allocation Don Brace
2015-04-17 13:33 ` Hannes Reinecke
2015-04-17 13:50 ` Tomas Henzl
2015-04-16 13:50 ` [PATCH v4 38/43] hpsa: use scsi host_no as hpsa controller number Don Brace
2015-04-17 13:34 ` Hannes Reinecke
2015-04-17 13:50 ` Tomas Henzl
2015-04-16 13:50 ` [PATCH v4 39/43] hpsa: propagate the error code in hpsa_kdump_soft_reset Don Brace
2015-04-17 13:34 ` Hannes Reinecke
2015-04-17 13:51 ` Tomas Henzl
2015-04-16 13:50 ` [PATCH v4 40/43] hpsa: cleanup reset Don Brace
2015-04-17 13:36 ` Hannes Reinecke
2015-04-17 13:51 ` Tomas Henzl
2015-04-16 13:50 ` [PATCH v4 41/43] hpsa: change driver version Don Brace
2015-04-17 13:36 ` Hannes Reinecke
2015-04-17 13:52 ` Tomas Henzl
2015-04-16 13:50 ` [PATCH v4 42/43] hpsa: add PMC to copyright Don Brace
2015-04-17 1:00 ` Elliott, Robert (Server Storage)
2015-04-17 13:36 ` Hannes Reinecke
2015-04-16 13:50 ` [PATCH v4 43/43] hpsa: add in new controller id Don Brace
2015-04-17 13:37 ` Hannes Reinecke
2015-04-17 13:52 ` Tomas Henzl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55310355.8090701@suse.de \
--to=hare@suse.de \
--cc=Justin.Lindley@pmcs.combrace \
--cc=Kevin.Barnett@pmcs.com \
--cc=don.brace@pmcs.com \
--cc=hch@infradead.org \
--cc=james.bottomley@parallels.com \
--cc=linux-scsi@vger.kernel.org \
--cc=scott.teel@pmcs.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.