From: Paul Menzel <pmenzel@molgen.mpg.de>
To: Don Brace <don.brace@microchip.com>
Cc: Kevin.Barnett@microchip.com, scott.teel@microchip.com,
Justin.Lindley@microchip.com, scott.benesh@microchip.com,
gerry.morong@microchip.com, mahesh.rajashekhara@microchip.com,
mike.mcgowen@microchip.com, murthy.bhat@microchip.com,
balsundar.p@microchip.com, joseph.szczypek@hpe.com,
jeff@canonical.com, POSWALD@suse.com, john.p.donnelly@oracle.com,
mwilck@suse.com, linux-kernel@vger.kernel.org, hch@infradead.org,
martin.petersen@oracle.com, jejb@linux.vnet.ibm.com,
linux-scsi@vger.kernel.org
Subject: Re: [smartpqi updates PATCH V2 05/11] smartpqi: add tur check for sanitize operation
Date: Wed, 29 Sep 2021 09:56:34 +0200 [thread overview]
Message-ID: <c50efe26-2ee2-e880-3bb1-dd0ba60d2ec5@molgen.mpg.de> (raw)
In-Reply-To: <20210928235442.201875-6-don.brace@microchip.com>
Dear Don,
Thank you for the patch. Maybe rephrase the summary:
> Check TUR for sanitize operation
Am 29.09.21 um 01:54 schrieb Don Brace:
> Add in a TUR to HBA disks and do not present them to the OS if
Maybe add what TUR means: Test Unit Ready.
> 0x02/0x04/0x1b (sanitize in progress) is returned.
>
> During boot-up, some OSes appear to hang when there are one or
> more disks undergoing sanitize.
It’d be great, if you gave at least one concrete test setup, where the
hang occurred.
> According to SCSI SBC4 specification
> section 4.11.2 Commands allowed during sanitize,
> some SCSI commands are permitted, but read/write operations are not.
>
> When the OS attempts to read the disk partition table a
> CHECK CONDITION ASC 0x04 ASCQ 0x1b is returned which causes the OS
> to retry the read until sanitize has completed. This can take hours.
>
> Note: According to document HPE Smart Storage Administrator User Guide
> Link: https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=c03909334
>
> During the sanitize erase operation, the drive is unusable.
> I.E.
> The expected behavior for sanitize is the that disk remains
> offline even after sanitize has completed. The customer is
> expected to re-enable the disk using the management utility.
>
> Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
> Reviewed-by: Scott Teel <scott.teel@microchip.com>
> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
> Signed-off-by: Don Brace <don.brace@microchip.com>
> ---
> drivers/scsi/smartpqi/smartpqi_init.c | 87 +++++++++++++++++++++++++++
> 1 file changed, 87 insertions(+)
>
> diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
> index 01330fd67500..838274d8fadf 100644
> --- a/drivers/scsi/smartpqi/smartpqi_init.c
> +++ b/drivers/scsi/smartpqi/smartpqi_init.c
> @@ -555,6 +555,10 @@ static int pqi_build_raid_path_request(struct pqi_ctrl_info *ctrl_info,
> cdb = request->cdb;
>
> switch (cmd) {
> + case TEST_UNIT_READY:
> + request->data_direction = SOP_READ_FLAG;
> + cdb[0] = TEST_UNIT_READY;
> + break;
> case INQUIRY:
> request->data_direction = SOP_READ_FLAG;
> cdb[0] = INQUIRY;
> @@ -1575,6 +1579,85 @@ static int pqi_get_logical_device_info(struct pqi_ctrl_info *ctrl_info,
> return rc;
> }
>
> +/*
> + * Prevent adding drive to OS for some corner cases such as a drive
> + * undergoing a sanitize operation. Some OSes will continue to poll
> + * the drive until the sanitize completes, which can take hours,
> + * resulting in long bootup delays. Commands such as TUR, READ_CAP
> + * are allowed, but READ/WRITE cause check condition. So the OS
> + * cannot check/read the partition table.
> + * Note: devices that have completed sanitize must be re-enabled
> + * using the management utility.
> + */
> +static bool pqi_keep_device_offline(struct pqi_ctrl_info *ctrl_info,
> + struct pqi_scsi_dev *device)
> +{
> + u8 scsi_status;
> + int rc;
> + enum dma_data_direction dir;
> + char *buffer;
> + int buffer_length = 64;
Use size_t? And could be made const?
> + size_t sense_data_length;
> + struct scsi_sense_hdr sshdr;
> + struct pqi_raid_path_request request;
> + struct pqi_raid_error_info error_info;
> + bool offline = false; /* Assume keep online */
> +
> + /* Do not check controllers. */
I’d remove the dot/period at the end of the short comments.
> + if (pqi_is_hba_lunid(device->scsi3addr))
> + return false;
> +
> + /* Do not check LVs. */
> + if (pqi_is_logical_device(device))
> + return false;
> +
> + buffer = kmalloc(buffer_length, GFP_KERNEL);
> + if (!buffer)
> + return false; /* Assume not offline */
> +
> + /* Check for SANITIZE in progress using TUR */
> + rc = pqi_build_raid_path_request(ctrl_info, &request,
> + TEST_UNIT_READY, RAID_CTLR_LUNID, buffer,
> + buffer_length, 0, &dir);
> + if (rc)
> + goto out; /* Assume not offline */
> +
> + memcpy(request.lun_number, device->scsi3addr, sizeof(request.lun_number));
> +
> + rc = pqi_submit_raid_request_synchronous(ctrl_info, &request.header, 0, &error_info);
> +
> + if (rc)
> + goto out; /* Assume not offline */
> +
> + scsi_status = error_info.status;
> + sense_data_length = get_unaligned_le16(&error_info.sense_data_length);
> + if (sense_data_length == 0)
> + sense_data_length =
> + get_unaligned_le16(&error_info.response_data_length);
As the variable is named `sense_data_length`, for an outsider like me,
it’s suprising that `response_date_length` is stored in there.
> + if (sense_data_length) {
> + if (sense_data_length > sizeof(error_info.data))
> + sense_data_length = sizeof(error_info.data);
> +
> + /*
> + * Check for sanitize in progress: asc:0x04, ascq: 0x1b
Add a space after the second colon?
> + */
> + if (scsi_status == SAM_STAT_CHECK_CONDITION &&
> + scsi_normalize_sense(error_info.data,
> + sense_data_length, &sshdr) &&
> + sshdr.sense_key == NOT_READY &&
> + sshdr.asc == 0x04 &&
> + sshdr.ascq == 0x1b) {
> + device->device_offline = true;
> + offline = true;
> + goto out; /* Keep device offline */
> + }
> + }
Should a error, warning, or debug message be printed, when
`sense_data_length = 0` again?
> +
> +out:
> + kfree(buffer);
> + return offline;
> +}
> +
> static int pqi_get_device_info(struct pqi_ctrl_info *ctrl_info,
> struct pqi_scsi_dev *device,
> struct bmic_identify_physical_device *id_phys)
> @@ -2296,6 +2379,10 @@ static int pqi_update_scsi_devices(struct pqi_ctrl_info *ctrl_info)
> if (!pqi_is_supported_device(device))
> continue;
>
> + /* Do not present disks that the OS cannot fully probe */
> + if (pqi_keep_device_offline(ctrl_info, device))
I’d use the positive `!pqi_get_device_online()`, but it’s subjective.
> + continue;
> +
> /* Gather information about the device. */
> rc = pqi_get_device_info(ctrl_info, device, id_phys);
> if (rc == -ENOMEM) {
>
Kind regards,
Paul
next prev parent reply other threads:[~2021-09-29 7:57 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-28 23:54 [smartpqi updates PATCH V2 00/11] smartpqi updates Don Brace
2021-09-28 23:54 ` [smartpqi updates PATCH V2 01/11] smartpqi: update device removal management Don Brace
2021-09-30 18:21 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 02/11] smartpqi: add controller handshake during kdump Don Brace
2021-09-30 18:21 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 03/11] smartpqi: capture controller reason codes Don Brace
2021-09-30 18:22 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 04/11] smartpqi: update LUN reset handler Don Brace
2021-09-30 18:22 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 05/11] smartpqi: add tur check for sanitize operation Don Brace
2021-09-29 7:56 ` Paul Menzel [this message]
2021-09-30 18:23 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 06/11] smartpqi: avoid failing ios for offline devices Don Brace
2021-09-30 18:23 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 07/11] smartpqi: add extended report physical luns Don Brace
2021-09-30 18:23 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 08/11] smartpqi: fix boot failure during lun rebuild Don Brace
2021-09-30 18:24 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 09/11] smartpqi: fix duplicate device nodes for tape changers Don Brace
2021-09-30 18:24 ` john.p.donnelly
2021-10-01 8:26 ` Paul Menzel
2021-10-05 20:23 ` Don.Brace
2021-10-06 2:37 ` Martin K. Petersen
2021-10-06 14:28 ` Don.Brace
2021-10-07 9:38 ` Paul Menzel
2021-09-28 23:54 ` [smartpqi updates PATCH V2 10/11] smartpqi: add 3252-8i pci id Don Brace
2021-09-30 18:24 ` john.p.donnelly
2021-09-28 23:54 ` [smartpqi updates PATCH V2 11/11] smartpqi: update version to 2.1.12-055 Don Brace
2021-09-30 18:25 ` john.p.donnelly
2021-09-29 9:34 ` [smartpqi updates PATCH V2 00/11] smartpqi updates Paul Menzel
2021-09-29 14:08 ` Don.Brace
2021-09-29 14:12 ` Paul Menzel
2021-10-12 20:35 ` Martin K. Petersen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c50efe26-2ee2-e880-3bb1-dd0ba60d2ec5@molgen.mpg.de \
--to=pmenzel@molgen.mpg.de \
--cc=Justin.Lindley@microchip.com \
--cc=Kevin.Barnett@microchip.com \
--cc=POSWALD@suse.com \
--cc=balsundar.p@microchip.com \
--cc=don.brace@microchip.com \
--cc=gerry.morong@microchip.com \
--cc=hch@infradead.org \
--cc=jeff@canonical.com \
--cc=jejb@linux.vnet.ibm.com \
--cc=john.p.donnelly@oracle.com \
--cc=joseph.szczypek@hpe.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=mahesh.rajashekhara@microchip.com \
--cc=martin.petersen@oracle.com \
--cc=mike.mcgowen@microchip.com \
--cc=murthy.bhat@microchip.com \
--cc=mwilck@suse.com \
--cc=scott.benesh@microchip.com \
--cc=scott.teel@microchip.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox