* [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
@ 2024-11-25 14:02 Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 1/4] scsi: st: Restore some drive settings after reset Kai Mäkisara
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Kai Mäkisara @ 2024-11-25 14:02 UTC (permalink / raw)
To: linux-scsi, jmeneghi
Cc: martin.petersen, James.Bottomley, loberman, Kai Mäkisara
This set applies to 6.12 + the three patches accepted earlier (and in
linux-next).
The first patch re-applies after device reset some settings changed
by the user (partition, density, block size). This is the same as in v1.
The second and third patch address the case where more than one ULD
access the same device. The Unit Attention (UA) sense data is sent only
to one ULD and the others miss it. The st driver needs to find out
if device reset or media change has happened.
The second patch adds counters for New Media and Power On/Reset (POR)
Unit Attentions to the scsi_device struct. The third one changes st
so that these are used: if the value in the scsi_device struct does
not match the one stored locally, the corresponding UA has happened.
Use of the was_reset flag has been removed.
The fourth patch adds a file to sysfs to tell the user if reads/writes
to a tape have been blocked following a device reset.
---
Changes since V1:
- replace the patch removing was_reset handling with patches two and three
- add sysfs file reset_blocked
Kai Mäkisara (4):
scsi: st: Restore some drive settings after reset
scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT
ATTENTIONs
scsi: st: Modify st.c to use the new scsi_error counters
scsi: st: Add sysfs file reset_blocked
Documentation/scsi/st.rst | 5 +++
drivers/scsi/scsi_error.c | 12 +++++++
drivers/scsi/st.c | 73 +++++++++++++++++++++++++++++++++-----
drivers/scsi/st.h | 6 ++++
include/scsi/scsi_device.h | 9 +++++
5 files changed, 97 insertions(+), 8 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/4] scsi: st: Restore some drive settings after reset
2024-11-25 14:02 [PATCH v2 0/4] scsi: st: scsi_error: More reset patches Kai Mäkisara
@ 2024-11-25 14:02 ` Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs Kai Mäkisara
` (3 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Kai Mäkisara @ 2024-11-25 14:02 UTC (permalink / raw)
To: linux-scsi, jmeneghi
Cc: martin.petersen, James.Bottomley, loberman, Kai Mäkisara
Some of the allowed operations put the tape into a known position
to continue operation assuming only the tape position has changed.
But reset sets partition, density and block size to drive default
values. These should be restored to the values before reset.
Normally the current block size and density are stored by the drive.
If the settings have been changed, the changed values have to be
saved by the driver across reset.
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
---
drivers/scsi/st.c | 26 +++++++++++++++++++++-----
drivers/scsi/st.h | 2 ++
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index e8ef27d7ef61..a0667a0ae4c9 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -952,7 +952,6 @@ static void reset_state(struct scsi_tape *STp)
STp->partition = find_partition(STp);
if (STp->partition < 0)
STp->partition = 0;
- STp->new_partition = STp->partition;
}
}
\f
@@ -2925,14 +2924,17 @@ static int st_int_ioctl(struct scsi_tape *STp, unsigned int cmd_in, unsigned lon
if (cmd_in == MTSETDENSITY) {
(STp->buffer)->b_data[4] = arg;
STp->density_changed = 1; /* At least we tried ;-) */
+ STp->changed_density = arg;
} else if (cmd_in == SET_DENS_AND_BLK)
(STp->buffer)->b_data[4] = arg >> 24;
else
(STp->buffer)->b_data[4] = STp->density;
if (cmd_in == MTSETBLK || cmd_in == SET_DENS_AND_BLK) {
ltmp = arg & MT_ST_BLKSIZE_MASK;
- if (cmd_in == MTSETBLK)
+ if (cmd_in == MTSETBLK) {
STp->blksize_changed = 1; /* At least we tried ;-) */
+ STp->changed_blksize = arg;
+ }
} else
ltmp = STp->block_size;
(STp->buffer)->b_data[9] = (ltmp >> 16);
@@ -3631,9 +3633,23 @@ static long st_ioctl(struct file *file, unsigned int cmd_in, unsigned long arg)
retval = (-EIO);
goto out;
}
- reset_state(STp);
- /* remove this when the midlevel properly clears was_reset */
- STp->device->was_reset = 0;
+ reset_state(STp); /* Clears pos_unknown */
+
+ /* Fix the device settings after reset, ignore errors */
+ if (mtc.mt_op == MTREW || mtc.mt_op == MTSEEK ||
+ mtc.mt_op == MTEOM) {
+ if (STp->can_partitions) {
+ /* STp->new_partition contains the
+ * latest partition set
+ */
+ STp->partition = 0;
+ switch_partition(STp);
+ }
+ if (STp->density_changed)
+ st_int_ioctl(STp, MTSETDENSITY, STp->changed_density);
+ if (STp->blksize_changed)
+ st_int_ioctl(STp, MTSETBLK, STp->changed_blksize);
+ }
}
if (mtc.mt_op != MTNOP && mtc.mt_op != MTSETBLK &&
diff --git a/drivers/scsi/st.h b/drivers/scsi/st.h
index 7a68eaba7e81..2105c6a5b458 100644
--- a/drivers/scsi/st.h
+++ b/drivers/scsi/st.h
@@ -165,12 +165,14 @@ struct scsi_tape {
unsigned char compression_changed;
unsigned char drv_buffer;
unsigned char density;
+ unsigned char changed_density;
unsigned char door_locked;
unsigned char autorew_dev; /* auto-rewind device */
unsigned char rew_at_close; /* rewind necessary at close */
unsigned char inited;
unsigned char cleaning_req; /* cleaning requested? */
int block_size;
+ int changed_blksize;
int min_block;
int max_block;
int recover_count; /* From tape opening */
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs
2024-11-25 14:02 [PATCH v2 0/4] scsi: st: scsi_error: More reset patches Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 1/4] scsi: st: Restore some drive settings after reset Kai Mäkisara
@ 2024-11-25 14:02 ` Kai Mäkisara
2024-12-11 21:57 ` John Meneghini
2024-12-11 22:14 ` Bart Van Assche
2024-11-25 14:03 ` [PATCH v2 3/4] scsi: st: Modify st.c to use the new scsi_error counters Kai Mäkisara
` (2 subsequent siblings)
4 siblings, 2 replies; 19+ messages in thread
From: Kai Mäkisara @ 2024-11-25 14:02 UTC (permalink / raw)
To: linux-scsi, jmeneghi
Cc: martin.petersen, James.Bottomley, loberman, Kai Mäkisara
The purpose of the counters is to enable all ULDs attached to a
device to find out that a New Media or/and Power On/Reset Unit
Attentions has/have been set, even if another ULD catches the
Unit Attention as response to a SCSI command.
The ULDs can read the counters using the scsi_get_ua_new_media_ctr()
and scsi_get_ua_por_ctr() macros (argument pointer to the scsi_device
struct).
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
---
drivers/scsi/scsi_error.c | 12 ++++++++++++
include/scsi/scsi_device.h | 9 +++++++++
2 files changed, 21 insertions(+)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 10154d78e336..6ef0711c4ec3 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -547,6 +547,18 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
scsi_report_sense(sdev, &sshdr);
+ if (sshdr.sense_key == UNIT_ATTENTION) {
+ /*
+ * increment the counters for Power on/Reset or New Media so
+ * that all ULDs interested in these can see that those have
+ * happened, even if someone else gets the sense data.
+ */
+ if (sshdr.asc == 0x28)
+ scmd->device->ua_new_media_ctr++;
+ else if (sshdr.asc == 0x29)
+ scmd->device->ua_por_ctr++;
+ }
+
if (scsi_sense_is_deferred(&sshdr))
return NEEDS_RETRY;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9c540f5468eb..b184a5efc27e 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -247,6 +247,9 @@ struct scsi_device {
unsigned int queue_stopped; /* request queue is quiesced */
bool offline_already; /* Device offline message logged */
+ unsigned char ua_new_media_ctr; /* Counter for New Media UNIT ATTENTIONs */
+ unsigned char ua_por_ctr; /* Counter for Power On / Reset UAs */
+
atomic_t disk_events_disable_depth; /* disable depth for disk events */
DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
@@ -684,6 +687,12 @@ static inline int scsi_device_busy(struct scsi_device *sdev)
return sbitmap_weight(&sdev->budget_map);
}
+/* Macros to access the UNIT ATTENTION counters */
+#define scsi_get_ua_new_media_ctr(sdev) \
+ ((const unsigned int)(sdev->ua_new_media_ctr))
+#define scsi_get_ua_por_ctr(sdev) \
+ ((const unsigned int)(sdev->ua_por_ctr))
+
#define MODULE_ALIAS_SCSI_DEVICE(type) \
MODULE_ALIAS("scsi:t-" __stringify(type) "*")
#define SCSI_DEVICE_MODALIAS_FMT "scsi:t-0x%02x"
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 3/4] scsi: st: Modify st.c to use the new scsi_error counters
2024-11-25 14:02 [PATCH v2 0/4] scsi: st: scsi_error: More reset patches Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 1/4] scsi: st: Restore some drive settings after reset Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs Kai Mäkisara
@ 2024-11-25 14:03 ` Kai Mäkisara
2024-12-11 22:14 ` John Meneghini
2024-11-25 14:03 ` [PATCH v2 4/4] scsi: st: Add sysfs file reset_blocked Kai Mäkisara
2024-12-11 21:57 ` [PATCH v2 0/4] scsi: st: scsi_error: More reset patches John Meneghini
4 siblings, 1 reply; 19+ messages in thread
From: Kai Mäkisara @ 2024-11-25 14:03 UTC (permalink / raw)
To: linux-scsi, jmeneghi
Cc: martin.petersen, James.Bottomley, loberman, Kai Mäkisara
Compare the stored values of por_ctr and new_media_ctr against
the values in the device struct. In case of mismatch, the
Unit Attention corresponding to the counter has happened.
This is a safeguard against another ULD catching the
Unit Attention sense data.
Remove use of the was_reset flag in struct scsi_device.
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
---
drivers/scsi/st.c | 28 +++++++++++++++++++++++++---
drivers/scsi/st.h | 4 ++++
2 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index a0667a0ae4c9..ad86dfbc8919 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -163,9 +163,11 @@ static const char *st_formats[] = {
static int debugging = DEBUG;
+/* Setting these non-zero may risk recognizing resets */
#define MAX_RETRIES 0
#define MAX_WRITE_RETRIES 0
#define MAX_READY_RETRIES 0
+
#define NO_TAPE NOT_READY
#define ST_TIMEOUT (900 * HZ)
@@ -357,10 +359,18 @@ static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt)
{
int result = SRpnt->result;
u8 scode;
+ unsigned int ctr;
DEB(const char *stp;)
char *name = STp->name;
struct st_cmdstatus *cmdstatp;
+ ctr = scsi_get_ua_por_ctr(STp->device);
+ if (ctr != STp->por_ctr) {
+ STp->por_ctr = ctr;
+ STp->pos_unknown = 1; /* ASC => power on / reset */
+ st_printk(KERN_WARNING, STp, "Power on/reset recognized.");
+ }
+
if (!result)
return 0;
@@ -413,10 +423,11 @@ static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt)
if (cmdstatp->have_sense &&
cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x17)
STp->cleaning_req = 1; /* ASC and ASCQ => cleaning requested */
- if (cmdstatp->have_sense && scode == UNIT_ATTENTION && cmdstatp->sense_hdr.asc == 0x29)
+ if (cmdstatp->have_sense && scode == UNIT_ATTENTION &&
+ cmdstatp->sense_hdr.asc == 0x29 && !STp->pos_unknown) {
STp->pos_unknown = 1; /* ASC => power on / reset */
-
- STp->pos_unknown |= STp->device->was_reset;
+ st_printk(KERN_WARNING, STp, "Power on/reset recognized.");
+ }
if (cmdstatp->have_sense &&
scode == RECOVERED_ERROR
@@ -968,6 +979,7 @@ static int test_ready(struct scsi_tape *STp, int do_wait)
{
int attentions, waits, max_wait, scode;
int retval = CHKRES_READY, new_session = 0;
+ unsigned int ctr;
unsigned char cmd[MAX_COMMAND_SIZE];
struct st_request *SRpnt = NULL;
struct st_cmdstatus *cmdstatp = &STp->buffer->cmdstat;
@@ -1024,6 +1036,13 @@ static int test_ready(struct scsi_tape *STp, int do_wait)
}
}
+ ctr = scsi_get_ua_new_media_ctr(STp->device);
+ if (ctr != STp->new_media_ctr) {
+ STp->new_media_ctr = ctr;
+ new_session = 1;
+ DEBC_printk(STp, "New tape session.");
+ }
+
retval = (STp->buffer)->syscall_result;
if (!retval)
retval = new_session ? CHKRES_NEW_SESSION : CHKRES_READY;
@@ -4394,6 +4413,9 @@ static int st_probe(struct device *dev)
goto out_idr_remove;
}
+ tpnt->new_media_ctr = scsi_get_ua_new_media_ctr(SDp);
+ tpnt->por_ctr = scsi_get_ua_por_ctr(SDp);
+
dev_set_drvdata(dev, tpnt);
diff --git a/drivers/scsi/st.h b/drivers/scsi/st.h
index 2105c6a5b458..47b0e31b7828 100644
--- a/drivers/scsi/st.h
+++ b/drivers/scsi/st.h
@@ -178,6 +178,10 @@ struct scsi_tape {
int recover_count; /* From tape opening */
int recover_reg; /* From last status call */
+ /* The saved values of midlevel counters */
+ unsigned int new_media_ctr;
+ unsigned int por_ctr;
+
#if DEBUG
unsigned char write_pending;
int nbr_finished;
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v2 4/4] scsi: st: Add sysfs file reset_blocked
2024-11-25 14:02 [PATCH v2 0/4] scsi: st: scsi_error: More reset patches Kai Mäkisara
` (2 preceding siblings ...)
2024-11-25 14:03 ` [PATCH v2 3/4] scsi: st: Modify st.c to use the new scsi_error counters Kai Mäkisara
@ 2024-11-25 14:03 ` Kai Mäkisara
2024-12-11 21:57 ` John Meneghini
2024-12-11 21:57 ` [PATCH v2 0/4] scsi: st: scsi_error: More reset patches John Meneghini
4 siblings, 1 reply; 19+ messages in thread
From: Kai Mäkisara @ 2024-11-25 14:03 UTC (permalink / raw)
To: linux-scsi, jmeneghi
Cc: martin.petersen, James.Bottomley, loberman, Kai Mäkisara
If the value read from the file is 1, reads and writes from/to the
device are blocked because the tape position may not match user's
expectation (tape rewound after device reset).
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
---
Documentation/scsi/st.rst | 5 +++++
drivers/scsi/st.c | 19 +++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/Documentation/scsi/st.rst b/Documentation/scsi/st.rst
index d3b28c28d74c..2209f03faad3 100644
--- a/Documentation/scsi/st.rst
+++ b/Documentation/scsi/st.rst
@@ -157,6 +157,11 @@ enabled driver and mode options. The value in the file is a bit mask where the
bit definitions are the same as those used with MTSETDRVBUFFER in setting the
options.
+Each directory contains the entry 'reset_blocked'. If this value is one,
+reading and writing to the device is blocked after device reset. Most
+devices rewind the tape after reset and the writes/read don't access the
+tape position the user expects.
+
A link named 'tape' is made from the SCSI device directory to the class
directory corresponding to the mode 0 auto-rewind device (e.g., st0).
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index ad86dfbc8919..0e6a87f1f47f 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4697,6 +4697,24 @@ options_show(struct device *dev, struct device_attribute *attr, char *buf)
}
static DEVICE_ATTR_RO(options);
+/**
+ * reset_blocked_show - Value 1 indicates that reads, writes, etc. are blocked
+ * because a device reset has occurred and no operation positioning the tape
+ * has been issued.
+ * @dev: struct device
+ * @attr: attribute structure
+ * @buf: buffer to return formatted data in
+ */
+static ssize_t reset_blocked_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct st_modedef *STm = dev_get_drvdata(dev);
+ struct scsi_tape *STp = STm->tape;
+
+ return sprintf(buf, "%d", STp->pos_unknown);
+}
+static DEVICE_ATTR_RO(reset_blocked);
+
/* Support for tape stats */
/**
@@ -4881,6 +4899,7 @@ static struct attribute *st_dev_attrs[] = {
&dev_attr_default_density.attr,
&dev_attr_default_compression.attr,
&dev_attr_options.attr,
+ &dev_attr_reset_blocked.attr,
NULL,
};
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-11-25 14:02 [PATCH v2 0/4] scsi: st: scsi_error: More reset patches Kai Mäkisara
` (3 preceding siblings ...)
2024-11-25 14:03 ` [PATCH v2 4/4] scsi: st: Add sysfs file reset_blocked Kai Mäkisara
@ 2024-12-11 21:57 ` John Meneghini
2024-12-12 18:27 ` "Kai Mäkisara (Kolumbus)"
4 siblings, 1 reply; 19+ messages in thread
From: John Meneghini @ 2024-12-11 21:57 UTC (permalink / raw)
To: Kai Mäkisara, linux-scsi; +Cc: martin.petersen, James.Bottomley, loberman
Sorry it has taken me so long to get back to this....
I've tested these patches with both my tape drive and with scsi_debug tape emulation.
see:
https://github.com/johnmeneghini/tape_tests
All hardware tests are passing and everything is working as expected with the tape drive tests, but the power on reset behavior
of the scsi_debug test is still showing the some strangeness.
https://github.com/johnmeneghini/tape_tests/blob/master/tape_reset_debug.sh
Specifically, every time you reload the scsi_debug driver the SCSI mid layer clears the POR UA. If I am not mistaken, your
intention with adding the counters for ua_new_media_ctr and ua_por_ctr to the mid layer was to catch these events and report
them to the upper layer driver.
Here's what the scsi_debug test does:
[tape_tests]# ./tape_reset_debug.sh 1 3 1 1
modprobe -r scsi_debug
modprobe scsi_debug tur_ms_to_ready=10000 ptype=1 max_luns=1 dev_size_mb=10000
[Wed Dec 11 15:35:48 2024] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[Wed Dec 11 15:35:48 2024] scsi host8: scsi_debug: version 0191 [20210520]
dev_size_mb=10000, opts=0x0, submit_queues=1, statistics=0
[Wed Dec 11 15:35:48 2024] scsi 8:0:0:0: Sequential-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7
[Wed Dec 11 15:35:48 2024] scsi 8:0:0:0: Power-on or device reset occurred
^^^^^^^^^^^^^^^^^^^^^^
Here's where the scsi layer is clearing the POR UA.
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: Attached scsi tape st1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: st1: try direct i/o: yes (alignment 4 B)
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: Attached scsi generic sg3 type 1
[0:0:0:0] disk ATA Samsung SSD 840 4B0Q /dev/sda 3500253855022021d /dev/sg0
[7:0:0:0] tape QUANTUM ULTRIUM 4 U53F /dev/st0 - /dev/sg1
[7:0:1:0] enclosu LSI virtualSES 02 - - /dev/sg2
[8:0:0:0] tape Linux scsi_debug 0191 /dev/st1 - /dev/sg3
[N:0:0:1] disk INTEL SSDPEDMW400G4__1 /dev/nvme0n1 -
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 0/0 ready 0
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (10000):
IM_REP_EN
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] Error: 2, cmd: 0 0 0 0 0 0
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] Sense Key : Not Ready [current]
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] Add. Sense: Logical unit is in process of becoming ready
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
st_chk_result was run here... but it looks like scsi_get_ua_por_ctr(STp->device) didn't report the first POR UA.
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] check_tape: 1141: CHKRES_NOT_READY pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] st_flush: 1404: pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:35:48 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 0/0 ready 1
Sleeping for 20 seconds
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 0/0 ready 1
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 1026
[Wed Dec 11 15:36:08 2024] st 8:0:0:0: [st1] Can't read block limits.
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A second mt status command is done after the tape is ready...
So it looks like the initial POR UA is never recorded in ua_por_ctr.
Following this, resetting the device works.
sg_reset --target /dev/nst1
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 0/0 ready 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 1026
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Can't read block limits.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Error: 402, cmd: 1a 0 0 0 c 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Add. Sense: Invalid field in cdb
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 0 was_reset 0/0 ready 0, result 1026
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
scsi mid layer has NOT set was_reset.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] No Mode Sense.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] Block size: 0, buffer size: 4096 (1 blocks).
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] check_tape: 1282: CHKRES_READY pos_unknown 0 was_reset 0/0 ready 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 0/0 ready 0
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] st_flush: 1404: pos_unknown 0 was_reset 1/1 ready 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We see the scsi mid layer sets was_reset here.
[Wed Dec 11 15:36:09 2024] st 8:0:0:0: [st1] flush_buffer: 852: pos_unknown 0 was_reset 1/1 ready 0
Sleeping for 5 seconds
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: Power-on or device reset occurred
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Power on/reset recognized.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here's your code : scsi_get_ua_por_ctr(STp->device) found the reset here.
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Error: 2, cmd: 0 0 0 0 0 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Sense Key : Unit Attention [current]
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Add. Sense: Scsi bus reset occurred
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Position unknown is now set.
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 1026
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Can't read block limits.
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Error: 402, cmd: 1a 0 0 0 c 0
[Wed Dec 11 15:36:14 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
SCSI 2 tape drive:
All in all this is no different from what we have w/out your patches, so I have no problem approving this change.
One thing that did get fixed by this patch series is I can now use the sg device to reset the scsi_debug driver.
This is a good improvement. I can now run my new script.
https://github.com/johnmeneghini/tape_tests/blob/master/tape_reset_debug_sg.sh
sg_reset --target /dev/sg3
Sleeping for 5 seconds
Check the status
mt -f /dev/nst1 status
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: Power-on or device reset occurred
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Power on/reset recognized.
This didn't work before your change.
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Error: 2, cmd: 0 0 0 0 0 0
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Sense Key : Unit Attention [current]
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Add. Sense: Scsi bus reset occurred
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 2
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Error: 402, cmd: 5 0 0 0 0 0
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Sense Key : Illegal Request [current]
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Add. Sense: Invalid command operation code
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 1026
[Wed Dec 11 16:03:39 2024] st 8:0:0:0: [st1] Can't read block limits.
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
So all in all I think this is an improvement... I'd like to ask Martin to merge these changes in v6.13.
/John
P.S. There are still many issues with the scsi_debug tape emulation. See my test results for more information about how the
scsi_debug tape emulation test are failing at:
https://bugzilla.kernel.org/show_bug.cgi?id=219419#c21
On 11/25/24 09:02, Kai Mäkisara wrote:
> This set applies to 6.12 + the three patches accepted earlier (and in
> linux-next).
>
> The first patch re-applies after device reset some settings changed
> by the user (partition, density, block size). This is the same as in v1.
>
> The second and third patch address the case where more than one ULD
> access the same device. The Unit Attention (UA) sense data is sent only
> to one ULD and the others miss it. The st driver needs to find out
> if device reset or media change has happened.
>
> The second patch adds counters for New Media and Power On/Reset (POR)
> Unit Attentions to the scsi_device struct. The third one changes st
> so that these are used: if the value in the scsi_device struct does
> not match the one stored locally, the corresponding UA has happened.
> Use of the was_reset flag has been removed.
>
> The fourth patch adds a file to sysfs to tell the user if reads/writes
> to a tape have been blocked following a device reset.
> ---
> Changes since V1:
> - replace the patch removing was_reset handling with patches two and three
> - add sysfs file reset_blocked
>
> Kai Mäkisara (4):
> scsi: st: Restore some drive settings after reset
> scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT
> ATTENTIONs
> scsi: st: Modify st.c to use the new scsi_error counters
> scsi: st: Add sysfs file reset_blocked
>
> Documentation/scsi/st.rst | 5 +++
> drivers/scsi/scsi_error.c | 12 +++++++
> drivers/scsi/st.c | 73 +++++++++++++++++++++++++++++++++-----
> drivers/scsi/st.h | 6 ++++
> include/scsi/scsi_device.h | 9 +++++
> 5 files changed, 97 insertions(+), 8 deletions(-)
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs
2024-11-25 14:02 ` [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs Kai Mäkisara
@ 2024-12-11 21:57 ` John Meneghini
2024-12-11 22:14 ` Bart Van Assche
1 sibling, 0 replies; 19+ messages in thread
From: John Meneghini @ 2024-12-11 21:57 UTC (permalink / raw)
To: Kai Mäkisara, linux-scsi; +Cc: martin.petersen, James.Bottomley, loberman
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
On 11/25/24 09:02, Kai Mäkisara wrote:
> The purpose of the counters is to enable all ULDs attached to a
> device to find out that a New Media or/and Power On/Reset Unit
> Attentions has/have been set, even if another ULD catches the
> Unit Attention as response to a SCSI command.
>
> The ULDs can read the counters using the scsi_get_ua_new_media_ctr()
> and scsi_get_ua_por_ctr() macros (argument pointer to the scsi_device
> struct).
>
> Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
> ---
> drivers/scsi/scsi_error.c | 12 ++++++++++++
> include/scsi/scsi_device.h | 9 +++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 10154d78e336..6ef0711c4ec3 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -547,6 +547,18 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>
> scsi_report_sense(sdev, &sshdr);
>
> + if (sshdr.sense_key == UNIT_ATTENTION) {
> + /*
> + * increment the counters for Power on/Reset or New Media so
> + * that all ULDs interested in these can see that those have
> + * happened, even if someone else gets the sense data.
> + */
> + if (sshdr.asc == 0x28)
> + scmd->device->ua_new_media_ctr++;
> + else if (sshdr.asc == 0x29)
> + scmd->device->ua_por_ctr++;
> + }
> +
> if (scsi_sense_is_deferred(&sshdr))
> return NEEDS_RETRY;
>
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 9c540f5468eb..b184a5efc27e 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -247,6 +247,9 @@ struct scsi_device {
> unsigned int queue_stopped; /* request queue is quiesced */
> bool offline_already; /* Device offline message logged */
>
> + unsigned char ua_new_media_ctr; /* Counter for New Media UNIT ATTENTIONs */
> + unsigned char ua_por_ctr; /* Counter for Power On / Reset UAs */
> +
> atomic_t disk_events_disable_depth; /* disable depth for disk events */
>
> DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
> @@ -684,6 +687,12 @@ static inline int scsi_device_busy(struct scsi_device *sdev)
> return sbitmap_weight(&sdev->budget_map);
> }
>
> +/* Macros to access the UNIT ATTENTION counters */
> +#define scsi_get_ua_new_media_ctr(sdev) \
> + ((const unsigned int)(sdev->ua_new_media_ctr))
> +#define scsi_get_ua_por_ctr(sdev) \
> + ((const unsigned int)(sdev->ua_por_ctr))
> +
> #define MODULE_ALIAS_SCSI_DEVICE(type) \
> MODULE_ALIAS("scsi:t-" __stringify(type) "*")
> #define SCSI_DEVICE_MODALIAS_FMT "scsi:t-0x%02x"
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 4/4] scsi: st: Add sysfs file reset_blocked
2024-11-25 14:03 ` [PATCH v2 4/4] scsi: st: Add sysfs file reset_blocked Kai Mäkisara
@ 2024-12-11 21:57 ` John Meneghini
0 siblings, 0 replies; 19+ messages in thread
From: John Meneghini @ 2024-12-11 21:57 UTC (permalink / raw)
To: Kai Mäkisara, linux-scsi; +Cc: martin.petersen, James.Bottomley, loberman
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
On 11/25/24 09:03, Kai Mäkisara wrote:
> If the value read from the file is 1, reads and writes from/to the
> device are blocked because the tape position may not match user's
> expectation (tape rewound after device reset).
>
> Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
> ---
> Documentation/scsi/st.rst | 5 +++++
> drivers/scsi/st.c | 19 +++++++++++++++++++
> 2 files changed, 24 insertions(+)
>
> diff --git a/Documentation/scsi/st.rst b/Documentation/scsi/st.rst
> index d3b28c28d74c..2209f03faad3 100644
> --- a/Documentation/scsi/st.rst
> +++ b/Documentation/scsi/st.rst
> @@ -157,6 +157,11 @@ enabled driver and mode options. The value in the file is a bit mask where the
> bit definitions are the same as those used with MTSETDRVBUFFER in setting the
> options.
>
> +Each directory contains the entry 'reset_blocked'. If this value is one,
> +reading and writing to the device is blocked after device reset. Most
> +devices rewind the tape after reset and the writes/read don't access the
> +tape position the user expects.
> +
> A link named 'tape' is made from the SCSI device directory to the class
> directory corresponding to the mode 0 auto-rewind device (e.g., st0).
>
> diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> index ad86dfbc8919..0e6a87f1f47f 100644
> --- a/drivers/scsi/st.c
> +++ b/drivers/scsi/st.c
> @@ -4697,6 +4697,24 @@ options_show(struct device *dev, struct device_attribute *attr, char *buf)
> }
> static DEVICE_ATTR_RO(options);
>
> +/**
> + * reset_blocked_show - Value 1 indicates that reads, writes, etc. are blocked
> + * because a device reset has occurred and no operation positioning the tape
> + * has been issued.
> + * @dev: struct device
> + * @attr: attribute structure
> + * @buf: buffer to return formatted data in
> + */
> +static ssize_t reset_blocked_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct st_modedef *STm = dev_get_drvdata(dev);
> + struct scsi_tape *STp = STm->tape;
> +
> + return sprintf(buf, "%d", STp->pos_unknown);
> +}
> +static DEVICE_ATTR_RO(reset_blocked);
> +
> /* Support for tape stats */
>
> /**
> @@ -4881,6 +4899,7 @@ static struct attribute *st_dev_attrs[] = {
> &dev_attr_default_density.attr,
> &dev_attr_default_compression.attr,
> &dev_attr_options.attr,
> + &dev_attr_reset_blocked.attr,
> NULL,
> };
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 3/4] scsi: st: Modify st.c to use the new scsi_error counters
2024-11-25 14:03 ` [PATCH v2 3/4] scsi: st: Modify st.c to use the new scsi_error counters Kai Mäkisara
@ 2024-12-11 22:14 ` John Meneghini
0 siblings, 0 replies; 19+ messages in thread
From: John Meneghini @ 2024-12-11 22:14 UTC (permalink / raw)
To: Kai Mäkisara, linux-scsi; +Cc: martin.petersen, James.Bottomley, loberman
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
On 11/25/24 09:03, Kai Mäkisara wrote:
> Compare the stored values of por_ctr and new_media_ctr against
> the values in the device struct. In case of mismatch, the
> Unit Attention corresponding to the counter has happened.
> This is a safeguard against another ULD catching the
> Unit Attention sense data.
>
> Remove use of the was_reset flag in struct scsi_device.
>
> Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
> ---
> drivers/scsi/st.c | 28 +++++++++++++++++++++++++---
> drivers/scsi/st.h | 4 ++++
> 2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> index a0667a0ae4c9..ad86dfbc8919 100644
> --- a/drivers/scsi/st.c
> +++ b/drivers/scsi/st.c
> @@ -163,9 +163,11 @@ static const char *st_formats[] = {
>
> static int debugging = DEBUG;
>
> +/* Setting these non-zero may risk recognizing resets */
> #define MAX_RETRIES 0
> #define MAX_WRITE_RETRIES 0
> #define MAX_READY_RETRIES 0
> +
> #define NO_TAPE NOT_READY
>
> #define ST_TIMEOUT (900 * HZ)
> @@ -357,10 +359,18 @@ static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt)
> {
> int result = SRpnt->result;
> u8 scode;
> + unsigned int ctr;
> DEB(const char *stp;)
> char *name = STp->name;
> struct st_cmdstatus *cmdstatp;
>
> + ctr = scsi_get_ua_por_ctr(STp->device);
> + if (ctr != STp->por_ctr) {
> + STp->por_ctr = ctr;
> + STp->pos_unknown = 1; /* ASC => power on / reset */
> + st_printk(KERN_WARNING, STp, "Power on/reset recognized.");
> + }
> +
> if (!result)
> return 0;
>
> @@ -413,10 +423,11 @@ static int st_chk_result(struct scsi_tape *STp, struct st_request * SRpnt)
> if (cmdstatp->have_sense &&
> cmdstatp->sense_hdr.asc == 0 && cmdstatp->sense_hdr.ascq == 0x17)
> STp->cleaning_req = 1; /* ASC and ASCQ => cleaning requested */
> - if (cmdstatp->have_sense && scode == UNIT_ATTENTION && cmdstatp->sense_hdr.asc == 0x29)
> + if (cmdstatp->have_sense && scode == UNIT_ATTENTION &&
> + cmdstatp->sense_hdr.asc == 0x29 && !STp->pos_unknown) {
> STp->pos_unknown = 1; /* ASC => power on / reset */
> -
> - STp->pos_unknown |= STp->device->was_reset;
> + st_printk(KERN_WARNING, STp, "Power on/reset recognized.");
> + }
>
> if (cmdstatp->have_sense &&
> scode == RECOVERED_ERROR
> @@ -968,6 +979,7 @@ static int test_ready(struct scsi_tape *STp, int do_wait)
> {
> int attentions, waits, max_wait, scode;
> int retval = CHKRES_READY, new_session = 0;
> + unsigned int ctr;
> unsigned char cmd[MAX_COMMAND_SIZE];
> struct st_request *SRpnt = NULL;
> struct st_cmdstatus *cmdstatp = &STp->buffer->cmdstat;
> @@ -1024,6 +1036,13 @@ static int test_ready(struct scsi_tape *STp, int do_wait)
> }
> }
>
> + ctr = scsi_get_ua_new_media_ctr(STp->device);
> + if (ctr != STp->new_media_ctr) {
> + STp->new_media_ctr = ctr;
> + new_session = 1;
> + DEBC_printk(STp, "New tape session.");
> + }
> +
> retval = (STp->buffer)->syscall_result;
> if (!retval)
> retval = new_session ? CHKRES_NEW_SESSION : CHKRES_READY;
> @@ -4394,6 +4413,9 @@ static int st_probe(struct device *dev)
> goto out_idr_remove;
> }
>
> + tpnt->new_media_ctr = scsi_get_ua_new_media_ctr(SDp);
> + tpnt->por_ctr = scsi_get_ua_por_ctr(SDp);
> +
> dev_set_drvdata(dev, tpnt);
>
>
> diff --git a/drivers/scsi/st.h b/drivers/scsi/st.h
> index 2105c6a5b458..47b0e31b7828 100644
> --- a/drivers/scsi/st.h
> +++ b/drivers/scsi/st.h
> @@ -178,6 +178,10 @@ struct scsi_tape {
> int recover_count; /* From tape opening */
> int recover_reg; /* From last status call */
>
> + /* The saved values of midlevel counters */
> + unsigned int new_media_ctr;
> + unsigned int por_ctr;
> +
> #if DEBUG
> unsigned char write_pending;
> int nbr_finished;
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs
2024-11-25 14:02 ` [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs Kai Mäkisara
2024-12-11 21:57 ` John Meneghini
@ 2024-12-11 22:14 ` Bart Van Assche
2024-12-12 18:33 ` "Kai Mäkisara (Kolumbus)"
1 sibling, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2024-12-11 22:14 UTC (permalink / raw)
To: Kai Mäkisara, linux-scsi, jmeneghi
Cc: martin.petersen, James.Bottomley, loberman
On 11/25/24 6:02 AM, Kai Mäkisara wrote:
> + unsigned char ua_new_media_ctr; /* Counter for New Media UNIT ATTENTIONs */
> + unsigned char ua_por_ctr; /* Counter for Power On / Reset UAs */
Why unsigned char instead of e.g. u16 or u32? With one of the latter two
data types, no cast would be necessary in the macros below.
> +/* Macros to access the UNIT ATTENTION counters */
> +#define scsi_get_ua_new_media_ctr(sdev) \
> + ((const unsigned int)(sdev->ua_new_media_ctr))
> +#define scsi_get_ua_por_ctr(sdev) \
> + ((const unsigned int)(sdev->ua_por_ctr))
Please introduce macros in the patch that introduces the first user of
these macros. I don't see any users of these macros in this patch?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-11 21:57 ` [PATCH v2 0/4] scsi: st: scsi_error: More reset patches John Meneghini
@ 2024-12-12 18:27 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 13:09 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 15:09 ` John Meneghini
0 siblings, 2 replies; 19+ messages in thread
From: "Kai Mäkisara (Kolumbus)" @ 2024-12-12 18:27 UTC (permalink / raw)
To: John Meneghini
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman
While doing some detective work, I found a serious problem. So, please hold these patches again.
More about the reason below.
> On 11. Dec 2024, at 23.57, John Meneghini <jmeneghi@redhat.com> wrote:
>
> Sorry it has taken me so long to get back to this....
>
> I've tested these patches with both my tape drive and with scsi_debug tape emulation.
>
> see:
>
> https://github.com/johnmeneghini/tape_tests
>
> All hardware tests are passing and everything is working as expected with the tape drive tests, but the power on reset behavior of the scsi_debug test is still showing the some strangeness.
>
> https://github.com/johnmeneghini/tape_tests/blob/master/tape_reset_debug.sh
>
> Specifically, every time you reload the scsi_debug driver the SCSI mid layer clears the POR UA. If I am not mistaken, your intention with adding the counters for ua_new_media_ctr and ua_por_ctr to the mid layer was to catch these events and report them to the upper layer driver.
>
Well, the counters work as designed. The st driver stores reference values when the driver
probing the device. This means that the UAs before the probe are missed.
I previously suspected that the first POR UA is caught by scsi scanning when it issues
MAINTENANCE_IN to get the supported opcodes. This happens when scanning calls
scsi_cdl_check(). However, this function does not do anything if Scsi level is less than
SPC-5 (ANSi 7). Scsi_debug claims SPC-5 and so scsi_cdl_check() gets the UA
before the st device exists. Your drive probably is reporting a level less than SPC-5
and so the UA is not received by the scanning code.
I changed scsi_debug so that it reports SPC-4 (ANSI 6). After this change st received
the POR UA. But I had 'mt stsetoptions' in my script and it failed!
The problem is that no driver options for the device can be set before something has
been done to clear the blocking. For instance, the stinit tool is a recommended method
to set the options based on a configuration file, but it fails.
Note that this problem has existed since commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
(for version 6.6) that added recognition of POR UA as an additional method to detect
resets. Nobody seems to have noticed this problem in the "real world". (Using
was_reset was not problematic because it caught only resets initiated by the midlevel.)
A solution might be to add some more ioctls to the list of allowed commands.
But I must think about this a little more.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs
2024-12-11 22:14 ` Bart Van Assche
@ 2024-12-12 18:33 ` "Kai Mäkisara (Kolumbus)"
0 siblings, 0 replies; 19+ messages in thread
From: "Kai Mäkisara (Kolumbus)" @ 2024-12-12 18:33 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi, jmeneghi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman
> On 12. Dec 2024, at 0.14, Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 11/25/24 6:02 AM, Kai Mäkisara wrote:
>> + unsigned char ua_new_media_ctr; /* Counter for New Media UNIT ATTENTIONs */
>> + unsigned char ua_por_ctr; /* Counter for Power On / Reset UAs */
>
> Why unsigned char instead of e.g. u16 or u32? With one of the latter two data types, no cast would be necessary in the macros below.
The purpose was to save memory. (Because of the alignment rules, probably nothing is saved now compared to u16.)
I will change teh counters to u16.
The const casts are there to prevent the users to use the macros to change the counter values.
>
>> +/* Macros to access the UNIT ATTENTION counters */
>> +#define scsi_get_ua_new_media_ctr(sdev) \
>> + ((const unsigned int)(sdev->ua_new_media_ctr))
>> +#define scsi_get_ua_por_ctr(sdev) \
>> + ((const unsigned int)(sdev->ua_por_ctr))
>
> Please introduce macros in the patch that introduces the first user of these macros. I don't see any users of these macros in this patch?
The idea was to introduce the mechanism in one patch and the user in the next. But I will move the macros to the next patch.
Thanks,
Kai
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-12 18:27 ` "Kai Mäkisara (Kolumbus)"
@ 2024-12-13 13:09 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 17:32 ` John Meneghini
2024-12-13 15:09 ` John Meneghini
1 sibling, 1 reply; 19+ messages in thread
From: "Kai Mäkisara (Kolumbus)" @ 2024-12-13 13:09 UTC (permalink / raw)
To: John Meneghini
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman
> On 12. Dec 2024, at 20.27, Kai Mäkisara (Kolumbus) <kai.makisara@kolumbus.fi> wrote:
>
> While doing some detective work, I found a serious problem. So, please hold these patches again.
> More about the reason below.
...
> The problem is that no driver options for the device can be set before something has
> been done to clear the blocking. For instance, the stinit tool is a recommended method
> to set the options based on a configuration file, but it fails.
>
> Note that this problem has existed since commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
> (for version 6.6) that added recognition of POR UA as an additional method to detect
> resets. Nobody seems to have noticed this problem in the "real world". (Using
> was_reset was not problematic because it caught only resets initiated by the midlevel.)
>
> A solution might be to add some more ioctls to the list of allowed commands.
> But I must think about this a little more.
This does not seem to be a promising direction. I think it is better to see that the
first test_ready() (called from st_open()) does not set the pos_unknown flag.
If there are no objections, I will add this to the next version of the patches.
The justification for this solution is that just after the device is detected by st,
the position of the tape is known to the user and there is no need to prevent,
for instance, writing to the tape.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-12 18:27 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 13:09 ` "Kai Mäkisara (Kolumbus)"
@ 2024-12-13 15:09 ` John Meneghini
2024-12-13 15:28 ` John Meneghini
1 sibling, 1 reply; 19+ messages in thread
From: John Meneghini @ 2024-12-13 15:09 UTC (permalink / raw)
To: Kai Mäkisara (Kolumbus)
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman
On 12/12/24 13:27, "Kai Mäkisara (Kolumbus)" wrote:
> I previously suspected that the first POR UA is caught by scsi scanning when it issues
> MAINTENANCE_IN to get the supported opcodes. This happens when scanning calls
> scsi_cdl_check(). However, this function does not do anything if Scsi level is less than
> SPC-5 (ANSi 7). Scsi_debug claims SPC-5 and so scsi_cdl_check() gets the UA
> before the st device exists. Your drive probably is reporting a level less than SPC-5
> and so the UA is not received by the scanning code.
No, I don't have a problem with the tape drive. Everything works correctly with the tape drive. You can see the test results
log for my latest test with the tape drive at:
https://bugzilla.kernel.org/show_bug.cgi?id=219419#c22
This isn't the issue. The issue of the first POR/UA not being reported to the st driver is only seen when using the scsi_debug
driver with tape emulation.
/John
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-13 15:09 ` John Meneghini
@ 2024-12-13 15:28 ` John Meneghini
0 siblings, 0 replies; 19+ messages in thread
From: John Meneghini @ 2024-12-13 15:28 UTC (permalink / raw)
To: Kai Mäkisara (Kolumbus)
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman
On 12/13/24 10:09, John Meneghini wrote:
>> previously suspected that the first POR UA is caught by scsi scanning when it issues
>> MAINTENANCE_IN to get the supported opcodes. This happens when scanning calls
>> scsi_cdl_check(). However, this function does not do anything if Scsi level is less than
>> SPC-5 (ANSi 7). Scsi_debug claims SPC-5 and so scsi_cdl_check() gets the UA
>> before the st device exists. Your drive probably is reporting a level less than SPC-5
>> and so the UA is not received by the scanning code.
>
> No, I don't have a problem with the tape drive. Everything works correctly with the tape drive. You can see the test results
> log for my latest test with the tape drive at:
Sorry, I misunderstood your comment. Yes, the tape device I am using for testing is reporting SPC-3.
[root@to-be-determined tape_tests]# sg_inq /dev/nst0
standard INQUIRY:
PQual=0 PDT=1 RMB=1 LU_CONG=0 hot_pluggable=0 version=0x05 [SPC-3]
[AERC=0] [TrmTsk=0] NormACA=0 HiSUP=0 Resp_data_format=2
SCCS=0 ACC=0 TPGS=1 3PC=0 Protect=0 [BQue=0]
EncServ=0 MultiP=1 (VS=0) [MChngr=0] [ACKREQQ=0] Addr16=0
[RelAdr=0] WBus16=0 Sync=0 [Linked=0] [TranDis=0] CmdQue=1
[SPI: Clocking=0x0 QAS=0 IUS=0]
length=96 (0x60) Peripheral device type: tape
Vendor identification: QUANTUM
Product identification: ULTRIUM 4
Product revision level: U53F
So you are saying this this issue of the mid layer not reporting the first POR UA would be seen if my tape drive were in SPC-5
device.
/John
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-13 13:09 ` "Kai Mäkisara (Kolumbus)"
@ 2024-12-13 17:32 ` John Meneghini
2024-12-14 13:46 ` "Kai Mäkisara (Kolumbus)"
0 siblings, 1 reply; 19+ messages in thread
From: John Meneghini @ 2024-12-13 17:32 UTC (permalink / raw)
To: Kai Mäkisara (Kolumbus)
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman, Bryan Gurney
On 12/13/24 08:09, "Kai Mäkisara (Kolumbus)" wrote:
>
>> On 12. Dec 2024, at 20.27, Kai Mäkisara (Kolumbus) <kai.makisara@kolumbus.fi> wrote:
>>
>> While doing some detective work, I found a serious problem. So, please hold these patches again.
>> More about the reason below.
> ...
>> The problem is that no driver options for the device can be set before something has
>> been done to clear the blocking. For instance, the stinit tool is a recommended method
>> to set the options based on a configuration file, but it fails.
And "the blocking" is because pos_unknown is set?
>> Note that this problem has existed since commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
>> (for version 6.6) that added recognition of POR UA as an additional method to detect
>> resets. Nobody seems to have noticed this problem in the "real world". (Using
>> was_reset was not problematic because it caught only resets initiated by the midlevel.)
Just to be clear. People in the real world did notice this problem. We have multiple customers who have reported "regressions"
in the st driver, all of whom starting using a version of our distribution which had commit 9604eea5bd3a. The changes for
9604eea5bd3a (scsi: st: Add third party poweron reset handling) were necessary to fix a real customer reported problem, but
there were a number of regressions introduced by this change and it looks like we haven't gotten to the bottom of these
regressions. Basically, we had so many customer complaints about this that we reverted commit 9604eea5bd3a in rhel-8.
>> A solution might be to add some more ioctls to the list of allowed commands.
>> But I must think about this a little more.
I think that might be a good way to go.
> This does not seem to be a promising direction. I think it is better to see that the
> first test_ready() (called from st_open()) does not set the pos_unknown flag.
> If there are no objections, I will add this to the next version of the patches.
So I don't know about this. Any tur/open that detects a POR/UA should turn on pos_unknown. This is the third party reset
problem. If you ignore was_reset in the st driver then there's no way to tell if any particular POR/UA is a result of an
intended action (the user reset the device on purpose) or if the tape device, or scsi gateway, or scsi transport error, or
anything else actually reset the device.
> The justification for this solution is that just after the device is detected by st,
> the position of the tape is known to the user and there is no need to prevent,
> for instance, writing to the tape.
What do you mean by "just after the device is detected"? How can you accurately detect this meta-condition?
I'd say, any time the st driver is loaded (modprobe -r st; modprobe st) the position should probably be assumed at unknown. And
any time a POR, Not Ready or Media Change is detected, the position should DEFINITELY be unknown until the tape is
rewound/re-positioned.
I don't think we can or should assume anything about the position of the tape in the driver.
Here's a test that's currently passing with my spc-3 tape device.
[root@to-be-determined tape_tests]# lsscsi -ig
[0:0:0:0] disk ATA Samsung SSD 840 4B0Q /dev/sda 3500253855022021d /dev/sg0
[7:0:0:0] tape QUANTUM ULTRIUM 4 U53F /dev/st0 - /dev/sg1
[7:0:1:0] enclosu LSI virtualSES 02 - - /dev/sg2
[N:0:0:1] disk INTEL SSDPEDMW400G4__1 /dev/nvme0n1 -
[root@to-be-determined tape_tests]# modprobe -r st
[Fri Dec 13 11:44:40 2024] st: Unloaded.
[root@to-be-determined tape_tests]# lsscsi -ig
[0:0:0:0] disk ATA Samsung SSD 840 4B0Q /dev/sda 3500253855022021d /dev/sg0
[7:0:0:0] tape QUANTUM ULTRIUM 4 U53F - - /dev/sg1
[7:0:1:0] enclosu LSI virtualSES 02 - - /dev/sg2
[N:0:0:1] disk INTEL SSDPEDMW400G4__1 /dev/nvme0n1 -
[root@to-be-determined tape_tests]# sg_reset --target /dev/sg1
[Fri Dec 13 11:46:37 2024] scsi target7:0:0: attempting target reset! scmd(0x00000000b33bbaf4)
[Fri Dec 13 11:46:37 2024] scsi 7:0:0:0: CDB: Test Unit Ready
[Fri Dec 13 11:46:37 2024] scsi target7:0:0: handle(0x000a), sas_address(0x500110a0014e8914), phy(0)
[Fri Dec 13 11:46:37 2024] scsi target7:0:0: enclosure logical id(0x500605b00cf207c0), slot(4)
[Fri Dec 13 11:46:37 2024] scsi target7:0:0: enclosure level(0x0000), connector name( C1 )
[Fri Dec 13 11:46:37 2024] scsi target7:0:0: target reset: SUCCESS scmd(0x00000000b33bbaf4)
So now let's attach the st driver "for the fist time".
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[root@to-be-determined tape_tests]# modprobe st debug_flag=1
[Fri Dec 13 11:53:00 2024] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[Fri Dec 13 11:53:00 2024] st: Debugging enabled debug_flag = 1
[Fri Dec 13 11:53:00 2024] st 7:0:0:0: Attached scsi tape st0
[Fri Dec 13 11:53:00 2024] st 7:0:0:0: st0: try direct i/o: yes (alignment 4 B)
[root@to-be-determined tape_tests]# mt -f /dev/nst0 status
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: Power-on or device reset occurred
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Power on/reset recognized.
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Error: 2, cmd: 0 0 0 0 0 0
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Sense Key : Unit Attention [current]
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Add. Sense: Scsi bus reset occurred
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] st_chk_result: 432: pos_unknown 1 was_reset 1/1 ready 0, result 2
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Block limits 1 - 16777215 bytes.
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 11:53:17 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 1 was_reset 1/1 ready 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pos_unnkown is set, as it should be, because the device was reset between close/unload and load/open.
[root@to-be-determined tape_tests]# sg_inq /dev/nst0
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 1 was_reset 1/1 ready 0
sg_inq failed: Input/output error
close error: Input/output error
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 11:59:00 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 1 was_reset 1/1 ready 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the problem you're talking about. There are many ioctls that won't work while pos_unknown is set. They won't work
until a rewind/reposition is done... but I think think this is the design.
[root@to-be-determined tape_tests]# mt -f /dev/nst0 stshowoptions
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:24:51 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 1 was_reset 1/1 ready 0
The options set: buffer-writes async-writes read-ahead debug can-bsr
[root@to-be-determined tape_tests]# mt -f /dev/nst0 stsetoptions no-blklimits
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:26:12 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 1 was_reset 1/1 ready 0
/dev/nst0: Input/output error
[root@to-be-determined tape_tests]# mt -f /dev/nst0 rewind
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 1 was_reset 1/1 ready 0
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] Rewinding tape.
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:26:28 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 0 was_reset 1/1 ready 0
[root@to-be-determined tape_tests]# mt -f /dev/nst0 stsetoptions no-blklimits
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] Block limits 1 - 16777215 bytes.
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] Mode 0 options: buffer writes: 1, async writes: 1, read ahead: 1
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] can bsr: 1, two FMs: 0, fast mteom: 0, auto lock: 0,
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] defs for wr: 0, no block limits: 1, partitions: 0, s2 log: 0
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] sysv: 0 nowait: 0 sili: 0 nowait_filemark: 0
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] debugging: 1
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:26:31 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 0 was_reset 1/1 ready 0
[root@to-be-determined tape_tests]# mt -f /dev/nst0 stshowoptions
[Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] check_tape: 1082: pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] Mode sense. Length 11, medium 0, WBS 10, BLL 8
[Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] Density 46, tape length: 0, drv buffer: 1
[Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] Block size: 0, buffer size: 4096 (1 blocks).
[Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] check_tape: 1282: CHKRES_READY pos_unknown 0 was_reset 1/1 ready 0
[Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] st_flush: 1404: pos_unknown 0 was_reset 1/1 ready 0
[root@to-be-determined tape_tests]# [Fri Dec 13 12:29:12 2024] st 7:0:0:0: [st0] flush_buffer: 852: pos_unknown 0 was_reset 1/1
ready 0
The options set: buffer-writes async-writes read-ahead debug can-bsr no-blklimits
[root@to-be-determined tape_tests]# sg_inq /dev/sg1
standard INQUIRY:
PQual=0 PDT=1 RMB=1 LU_CONG=0 hot_pluggable=0 version=0x05 [SPC-3]
[AERC=0] [TrmTsk=0] NormACA=0 HiSUP=0 Resp_data_format=2
SCCS=0 ACC=0 TPGS=1 3PC=0 Protect=0 [BQue=0]
EncServ=0 MultiP=1 (VS=0) [MChngr=0] [ACKREQQ=0] Addr16=0
[RelAdr=0] WBus16=0 Sync=0 [Linked=0] [TranDis=0] CmdQue=1
[SPI: Clocking=0x0 QAS=0 IUS=0]
length=96 (0x60) Peripheral device type: tape
Vendor identification: QUANTUM
Product identification: ULTRIUM 4
Product revision level: U53F
[root@to-be-determined tape_tests]#
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-13 17:32 ` John Meneghini
@ 2024-12-14 13:46 ` "Kai Mäkisara (Kolumbus)"
2024-12-20 22:14 ` John Meneghini
0 siblings, 1 reply; 19+ messages in thread
From: "Kai Mäkisara (Kolumbus)" @ 2024-12-14 13:46 UTC (permalink / raw)
To: John Meneghini
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman, Bryan Gurney
> On 13. Dec 2024, at 19.32, John Meneghini <jmeneghi@redhat.com> wrote:
>
> On 12/13/24 08:09, "Kai Mäkisara (Kolumbus)" wrote:
>>> On 12. Dec 2024, at 20.27, Kai Mäkisara (Kolumbus) <kai.makisara@kolumbus.fi> wrote:
>>>
>>> While doing some detective work, I found a serious problem. So, please hold these patches again.
>>> More about the reason below.
>> ...
>>> The problem is that no driver options for the device can be set before something has
>>> been done to clear the blocking. For instance, the stinit tool is a recommended method
>>> to set the options based on a configuration file, but it fails.
>
> And "the blocking" is because pos_unknown is set?
>
>>> Note that this problem has existed since commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
>>> (for version 6.6) that added recognition of POR UA as an additional method to detect
>>> resets. Nobody seems to have noticed this problem in the "real world". (Using
>>> was_reset was not problematic because it caught only resets initiated by the midlevel.)
>
> Just to be clear. People in the real world did notice this problem. We have multiple customers who have reported "regressions" in the st driver, all of whom starting using a version of our distribution which had commit 9604eea5bd3a. The changes for
> 9604eea5bd3a (scsi: st: Add third party poweron reset handling) were necessary to fix a real customer reported problem, but there were a number of regressions introduced by this change and it looks like we haven't gotten to the bottom of these regressions. Basically, we had so many customer complaints about this that we reverted commit 9604eea5bd3a in rhel-8.
This sounds puzzling. The patch 9604eea5bd3a has been signed off by you. Now you say that
there were a number of regressions, so that you have reverted the commit in rhel-8. Yet, there
have been no reports of regressions in linux-scsi. Or have I missed something?
I have made some experiments with st.c from v6.4 (before the commit) and v6.7 after the
commit. My (slightly tuned) scsi_debug was started with option 'scsi_level=6'. The
test used the stinit tool that can be used to set st parameters after a drive has been
detected (using, e.g., udev). (And I think that any decently configured Linux system
with tape drives should set the proper parameters for the drives.)
The test uses modprobe to load scsi_debug (and this loads also st). After that
the tools mentioned above were tried:
st.c from v6.4:
- stinit succeeds
- 'dd if=/dev/nst0 of=/dev/null bs=10240 count=10' succeeds
st.c from v6.7:
- stinit fails
- dd fails
So, there is are clear regressions caused by commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
and this must be fixed. One method is, of course, to revert the commit. Another alternative is to do
something to solve the problems created by the commit.
Modifying st to accept what stinit uses even is pos_unknown is true fixes the stinit problem,
but dd still fails. Not setting pos_unknown after the initial POR UA fixes both problems.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-14 13:46 ` "Kai Mäkisara (Kolumbus)"
@ 2024-12-20 22:14 ` John Meneghini
2024-12-21 7:57 ` "Kai Mäkisara (Kolumbus)"
0 siblings, 1 reply; 19+ messages in thread
From: John Meneghini @ 2024-12-20 22:14 UTC (permalink / raw)
To: Kai Mäkisara (Kolumbus)
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman, Bryan Gurney,
Ewan Milne
On 12/14/24 08:46, "Kai Mäkisara (Kolumbus)" wrote:
>> On 13. Dec 2024, at 19.32, John Meneghini <jmeneghi@redhat.com> wrote:
>>
>> On 12/13/24 08:09, "Kai Mäkisara (Kolumbus)" wrote:
>>>> On 12. Dec 2024, at 20.27, Kai Mäkisara (Kolumbus) <kai.makisara@kolumbus.fi> wrote:
>>>> Note that this problem has existed since commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
>>>> (for version 6.6) that added recognition of POR UA as an additional method to detect
>>>> resets. Nobody seems to have noticed this problem in the "real world". (Using
>>>> was_reset was not problematic because it caught only resets initiated by the midlevel.)
>>
>> Just to be clear. People in the real world did notice this problem. We have multiple customers who have reported "regressions"
>> in the st driver, all of whom starting using a version of our distribution which had commit 9604eea5bd3a. The changes for
>> 9604eea5bd3a (scsi: st: Add third party poweron reset handling) were necessary to fix a real customer reported problem, but
>> there were a number of regressions introduced by this change and it looks like we haven't gotten to the bottom of these regressions.
>> Basically, we had so many customer complaints about this that we reverted commit 9604eea5bd3a in rhel-8.
>
> This sounds puzzling. The pa9604eea5bd3a (scsi: st: Add third party poweron reset handling)tch 9604eea5bd3a has been signed off by you. Now you say that
> there were a number of regressions, so that you have reverted the commit in rhel-8. Yet, there
> have been no reports of regressions in linux-scsi. Or have I missed something?
Yes, I signed-off on a patch that introduced a regression. Of course, I was unaware of this at the time.
Laurence and I worked on commit 9604eea5bd3a (scsi: st: Add third party poweron reset handling) and tested it extensively with a
scsi gateway using third party resets. The patch fixed the problem. The customer tested it. We told the customer they needed to
reposition/rewind the tape with mt rewind following any third party poweron/reset event - which apparently happens not
infrequently in their environment. This worked for the customer so we thought it was good. The st driver had never correctly
handled a POR UA before so we didn't think the fact that MTIOGET returned EIO following a third party reset was a problem.
However, the part that we were not aware of was: tape drives set a poweron/reset UA when the machine reboots. So we started
getting complaints from different customers about the fact that MTIOCGET suddenly fails with EIO after upgrade. The work around
was simple (issue an 'mt rewind') but customers did not like this, and when more than one or two customers started calling and
complaining about this, we reverted the patch to avoid generating more calls. This is when I opened Bug 219419 - Can not query
tape status - MTIOCGET fails with EIO.
https://bugzilla.kernel.org/show_bug.cgi?id=219419
We have fixed that problem now and the fix, including commit 9604eea5bd3a, is being disseminated in rhel-8 and rhel-9.
I see now that stinit and dd, and possibly other IOCTLS, are unexpectedly failing too. I'm not sure if we can call these
regressions or not. From what I can see the st driver never really handled POR UA's correctly. It only worked with sg_reset
(first party reset)... but I agree that customers probably will not like this.
> I have made some experiments with st.c from v6.4 (before the commit) and v6.7 after the
> commit. My (slightly tuned) scsi_debug was started with option 'scsi_level=6'. The
> test used the stinit tool that can be used to set st parameters after a drive has been
> detected (using, e.g., udev). (And I think that any decently configured Linux system
> with tape drives should set the proper parameters for the drives.)
Agreed.
> The test uses modprobe to load scsi_debug (and this loads also st). After that
> the tools mentioned above were tried:
If you want to share the details of exactly what your tests are doing (privately if you'd like) I will try to reproduce your
results. Obviously, one problem here is that our tape tests here at RH (and everywhere) are inadequate - at least w.r.t.
resets. I'm working to improve this.
> st.c from v6.4:
> - stinit succeeds
> - 'dd if=/dev/nst0 of=/dev/null bs=10240 count=10' succeeds
>
> st.c from v6.7:
> - stinit fails
> - dd fails
This is simple enough... I'll add this to my tests.
> So, there is are clear regressions caused by commit 9604eea5bd3ae1fa3c098294f4fc29ad687141ea
> and this must be fixed. One method is, of course, to revert the commit. Another alternative is to do
> something to solve the problems created by the commit.
I really don't want to revert commit 9604eea5bd3ae1. It actually fixes a real problem where the tape drive behind a gateway
device crashes and resets itself. Then, because the st driver ignores the POR/UA, it writes a file mark at the BOT. This
destroys the customer's backup. It is a serious problem and a day-one bug.
> Modifying st to accept what stinit uses even is pos_unknown is true fixes the stinit problem,
> but dd still fails.
OK, shouldn't dd fail if the pos_unknown is true? Basically, anytime the tape device reports that it has been reset the
application NEEDS to reposition the tape. And, for that matter, the application should also check and set any options that
might be wanted. The application can't just ignore these POR Unit Attentions.
> Not setting pos_unknown after the initial POR UA fixes both problems.
That's fine with me... I just don't understand how you can distinguish "the initial POR UA" from any POR UA. If you have a
patch that can figure out how to do this... that's great... let's try.
Thanks for all of your work on this Kai. I will continue to help as much as I can by testing any further patches you have.
However, that will have to be after Christmas. Things are shutting down here at RH until January 2.
Thanks again,
/John
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2 0/4] scsi: st: scsi_error: More reset patches
2024-12-20 22:14 ` John Meneghini
@ 2024-12-21 7:57 ` "Kai Mäkisara (Kolumbus)"
0 siblings, 0 replies; 19+ messages in thread
From: "Kai Mäkisara (Kolumbus)" @ 2024-12-21 7:57 UTC (permalink / raw)
To: John Meneghini
Cc: linux-scsi, martin.petersen,
James.Bottomley@hansenpartnership.com, loberman, Bryan Gurney,
Ewan Milne
On 21. Dec 2024, at 0.14, John Meneghini <jmeneghi@redhat.com> wrote:
>
...
> Yes, I signed-off on a patch that introduced a regression. Of course, I was unaware of this at the time.
>
> Laurence and I worked on commit 9604eea5bd3a (scsi: st: Add third party poweron reset handling) and tested it extensively with a scsi gateway using third party resets. The patch fixed the problem. The customer tested it. We told the customer they needed to reposition/rewind the tape with mt rewind following any third party poweron/reset event - which apparently happens not infrequently in their environment. This worked for the customer so we thought it was good. The st driver had never correctly handled a POR UA before so we didn't think the fact that MTIOGET returned EIO following a third party reset was a problem.
>
> However, the part that we were not aware of was: tape drives set a poweron/reset UA when the machine reboots. So we started getting complaints from different customers about the fact that MTIOCGET suddenly fails with EIO after upgrade. The work around was simple (issue an 'mt rewind') but customers did not like this, and when more than one or two customers started calling and complaining about this, we reverted the patch to avoid generating more calls. This is when I opened Bug 219419 - Can not query tape status - MTIOCGET fails with EIO.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=219419
>
> We have fixed that problem now and the fix, including commit 9604eea5bd3a, is being disseminated in rhel-8 and rhel-9.
>
> I see now that stinit and dd, and possibly other IOCTLS, are unexpectedly failing too. I'm not sure if we can call these regressions or not. From what I can see the st driver never really handled POR UA's correctly. It only worked with sg_reset (first party reset)... but I agree that customers probably will not like this.
It is a regression because it breaks user space.
First, I want to emphasise that your patch was correctly fixing the problem it was supposed
to fix. And I acked it. The problems after boot were far-fetched and not easy to see. This
happens.
My mild complaint is that there were no reports on linux-scsi about the problems after
boot. Someone might have seen these problems earlier if there were those reports.
(With hindsight, Rafal Rocha's patch dealt with this problem.)
But now we know where the problem is. Let'f concentrate on fixing it.
...
>> The test uses modprobe to load scsi_debug (and this loads also st). After that
>> the tools mentioned above were tried:
>
> If you want to share the details of exactly what your tests are doing (privately if you'd like) I will try to reproduce your results. Obviously, one problem here is that our tape tests here at RH (and everywhere) are inadequate - at least w.r.t. resets. I'm working to improve this.
I was experimenting with different things using (slightly enhanced) csi_debug and scripts
using snippets from your testing scripts. When I set scsi_level to 6 in scsi_debug, I noticed
that after modprobe, stinit and also dd failed. This lead me to think that this was because
now st caught the initial POR UA. The distribution I used for testing also had an udev rule
to run stinit when a drive was detected. The current stinit does return 0 also when setting
options fails, but I could see that the options did not get set.
You should see these things with a real tape drive, too.
...
> I really don't want to revert commit 9604eea5bd3ae1. It actually fixes a real problem where the tape drive behind a gateway device crashes and resets itself. Then, because the st driver ignores the POR/UA, it writes a file mark at the BOT. This destroys the customer's backup. It is a serious problem and a day-one bug.
I agree that the patch is useful. That is why it is better to fix the negative consequencies.
>> Modifying st to accept what stinit uses even is pos_unknown is true fixes the stinit problem,
>> but dd still fails.
>
> OK, shouldn't dd fail if the pos_unknown is true? Basically, anytime the tape device reports that it has been reset the application NEEDS to reposition the tape. And, for that matter, the application should also check and set any options that might be wanted. The application can't just ignore these POR Unit Attentions.
Yes. But I don't think pos_unknown should be set for the initial POR UA. I think it
is reasonable to assume that the user does not trust knowing the tape location
right after the device is added.
>
>> Not setting pos_unknown after the initial POR UA fixes both problems.
>
> That's fine with me... I just don't understand how you can distinguish "the initial POR UA" from any POR UA. If you have a patch that can figure out how to do this... that's great... let's try.
I sent the patch ([PATCH] scsi: st: Regression fix: Don't set pos_unknown just after device recognition) to linux-scsi on
Dec 16.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-12-21 7:57 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-25 14:02 [PATCH v2 0/4] scsi: st: scsi_error: More reset patches Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 1/4] scsi: st: Restore some drive settings after reset Kai Mäkisara
2024-11-25 14:02 ` [PATCH v2 2/4] scsi: scsi_error: Add counters for New Media and Power On/Reset UNIT ATTENTIONs Kai Mäkisara
2024-12-11 21:57 ` John Meneghini
2024-12-11 22:14 ` Bart Van Assche
2024-12-12 18:33 ` "Kai Mäkisara (Kolumbus)"
2024-11-25 14:03 ` [PATCH v2 3/4] scsi: st: Modify st.c to use the new scsi_error counters Kai Mäkisara
2024-12-11 22:14 ` John Meneghini
2024-11-25 14:03 ` [PATCH v2 4/4] scsi: st: Add sysfs file reset_blocked Kai Mäkisara
2024-12-11 21:57 ` John Meneghini
2024-12-11 21:57 ` [PATCH v2 0/4] scsi: st: scsi_error: More reset patches John Meneghini
2024-12-12 18:27 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 13:09 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 17:32 ` John Meneghini
2024-12-14 13:46 ` "Kai Mäkisara (Kolumbus)"
2024-12-20 22:14 ` John Meneghini
2024-12-21 7:57 ` "Kai Mäkisara (Kolumbus)"
2024-12-13 15:09 ` John Meneghini
2024-12-13 15:28 ` John Meneghini
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox