* [PATCH 1/4] s390/dasd: fix hanging device after quiesce/resume
2023-07-21 19:36 [PATCH 0/4] dasd fixes Stefan Haberland
@ 2023-07-21 19:36 ` Stefan Haberland
2023-07-21 19:36 ` [PATCH 2/4] s390/dasd: use correct number of retries for ERP requests Stefan Haberland
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Stefan Haberland @ 2023-07-21 19:36 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
Quiesce and resume are functions that tell the DASD driver to stop/resume
issuing I/Os to a specific DASD.
On resume dasd_schedule_block_bh() is called to kick handling of IO
requests again. This does unfortunately not cover internal requests which
are used for path verification for example.
This could lead to a hanging device when a path event or anything else
that triggers internal requests occurs on a quiesced device.
Fix by also calling dasd_schedule_device_bh() which triggers handling of
internal requests on resume.
Fixes: 8e09f21574ea ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
---
drivers/s390/block/dasd_ioctl.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 513a7e6eee63..d55862605b82 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -131,6 +131,7 @@ static int dasd_ioctl_resume(struct dasd_block *block)
spin_unlock_irqrestore(get_ccwdev_lock(base->cdev), flags);
dasd_schedule_block_bh(block);
+ dasd_schedule_device_bh(base);
return 0;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 2/4] s390/dasd: use correct number of retries for ERP requests
2023-07-21 19:36 [PATCH 0/4] dasd fixes Stefan Haberland
2023-07-21 19:36 ` [PATCH 1/4] s390/dasd: fix hanging device after quiesce/resume Stefan Haberland
@ 2023-07-21 19:36 ` Stefan Haberland
2023-07-21 19:36 ` [PATCH 3/4] s390/dasd: fix hanging device after request requeue Stefan Haberland
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Stefan Haberland @ 2023-07-21 19:36 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
If a DASD request fails an error recovery procedure (ERP) request might
be built as a copy of the original request to do error recovery.
The ERP request gets a number of retries assigned.
This number is always 256 no matter what other value might have been set
for the original request. This is not what is expected when a user
specifies a certain amount of retries for the device via sysfs.
Correctly use the number of retries of the original request for ERP
requests.
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
---
drivers/s390/block/dasd_3990_erp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/block/dasd_3990_erp.c b/drivers/s390/block/dasd_3990_erp.c
index 9fd36c468706..91e9a17b848e 100644
--- a/drivers/s390/block/dasd_3990_erp.c
+++ b/drivers/s390/block/dasd_3990_erp.c
@@ -2441,7 +2441,7 @@ static struct dasd_ccw_req *dasd_3990_erp_add_erp(struct dasd_ccw_req *cqr)
erp->block = cqr->block;
erp->magic = cqr->magic;
erp->expires = cqr->expires;
- erp->retries = 256;
+ erp->retries = device->default_retries;
erp->buildclk = get_tod_clock();
erp->status = DASD_CQR_FILLED;
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 3/4] s390/dasd: fix hanging device after request requeue
2023-07-21 19:36 [PATCH 0/4] dasd fixes Stefan Haberland
2023-07-21 19:36 ` [PATCH 1/4] s390/dasd: fix hanging device after quiesce/resume Stefan Haberland
2023-07-21 19:36 ` [PATCH 2/4] s390/dasd: use correct number of retries for ERP requests Stefan Haberland
@ 2023-07-21 19:36 ` Stefan Haberland
2023-07-21 19:36 ` [PATCH 4/4] s390/dasd: print copy pair message only for the correct error Stefan Haberland
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Stefan Haberland @ 2023-07-21 19:36 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
The DASD device driver has a function to requeue requests to the
blocklayer.
This function is used in various cases when basic settings for the device
have to be changed like High Performance Ficon related parameters or copy
pair settings.
The functions iterates over the device->ccw_queue and also removes the
requests from the block->ccw_queue.
In case the device is started on an alias device instead of the base
device it might be removed from the block->ccw_queue without having it
canceled properly before. This might lead to a hanging device since the
request is no longer on a queue and can not be handled properly.
Fix by iterating over the block->ccw_queue instead of the
device->ccw_queue. This will take care of all blocklayer related requests
and handle them on all associated DASD devices.
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
---
drivers/s390/block/dasd.c | 125 +++++++++++++++-----------------------
1 file changed, 48 insertions(+), 77 deletions(-)
diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c
index edcbf77852c3..50a5ff70814a 100644
--- a/drivers/s390/block/dasd.c
+++ b/drivers/s390/block/dasd.c
@@ -2943,41 +2943,32 @@ static void _dasd_wake_block_flush_cb(struct dasd_ccw_req *cqr, void *data)
* Requeue a request back to the block request queue
* only works for block requests
*/
-static int _dasd_requeue_request(struct dasd_ccw_req *cqr)
+static void _dasd_requeue_request(struct dasd_ccw_req *cqr)
{
- struct dasd_block *block = cqr->block;
struct request *req;
- if (!block)
- return -EINVAL;
/*
* If the request is an ERP request there is nothing to requeue.
* This will be done with the remaining original request.
*/
if (cqr->refers)
- return 0;
+ return;
spin_lock_irq(&cqr->dq->lock);
req = (struct request *) cqr->callback_data;
blk_mq_requeue_request(req, true);
spin_unlock_irq(&cqr->dq->lock);
- return 0;
+ return;
}
-/*
- * Go through all request on the dasd_block request queue, cancel them
- * on the respective dasd_device, and return them to the generic
- * block layer.
- */
-static int dasd_flush_block_queue(struct dasd_block *block)
+static int _dasd_requests_to_flushqueue(struct dasd_block *block,
+ struct list_head *flush_queue)
{
struct dasd_ccw_req *cqr, *n;
- int rc, i;
- struct list_head flush_queue;
unsigned long flags;
+ int rc, i;
- INIT_LIST_HEAD(&flush_queue);
- spin_lock_bh(&block->queue_lock);
+ spin_lock_irqsave(&block->queue_lock, flags);
rc = 0;
restart:
list_for_each_entry_safe(cqr, n, &block->ccw_queue, blocklist) {
@@ -2992,13 +2983,32 @@ static int dasd_flush_block_queue(struct dasd_block *block)
* is returned from the dasd_device layer.
*/
cqr->callback = _dasd_wake_block_flush_cb;
- for (i = 0; cqr != NULL; cqr = cqr->refers, i++)
- list_move_tail(&cqr->blocklist, &flush_queue);
+ for (i = 0; cqr; cqr = cqr->refers, i++)
+ list_move_tail(&cqr->blocklist, flush_queue);
if (i > 1)
/* moved more than one request - need to restart */
goto restart;
}
- spin_unlock_bh(&block->queue_lock);
+ spin_unlock_irqrestore(&block->queue_lock, flags);
+
+ return rc;
+}
+
+/*
+ * Go through all request on the dasd_block request queue, cancel them
+ * on the respective dasd_device, and return them to the generic
+ * block layer.
+ */
+static int dasd_flush_block_queue(struct dasd_block *block)
+{
+ struct dasd_ccw_req *cqr, *n;
+ struct list_head flush_queue;
+ unsigned long flags;
+ int rc;
+
+ INIT_LIST_HEAD(&flush_queue);
+ rc = _dasd_requests_to_flushqueue(block, &flush_queue);
+
/* Now call the callback function of flushed requests */
restart_cb:
list_for_each_entry_safe(cqr, n, &flush_queue, blocklist) {
@@ -3881,75 +3891,36 @@ EXPORT_SYMBOL_GPL(dasd_generic_space_avail);
*/
int dasd_generic_requeue_all_requests(struct dasd_device *device)
{
+ struct dasd_block *block = device->block;
struct list_head requeue_queue;
struct dasd_ccw_req *cqr, *n;
- struct dasd_ccw_req *refers;
int rc;
- INIT_LIST_HEAD(&requeue_queue);
- spin_lock_irq(get_ccwdev_lock(device->cdev));
- rc = 0;
- list_for_each_entry_safe(cqr, n, &device->ccw_queue, devlist) {
- /* Check status and move request to flush_queue */
- if (cqr->status == DASD_CQR_IN_IO) {
- rc = device->discipline->term_IO(cqr);
- if (rc) {
- /* unable to terminate requeust */
- dev_err(&device->cdev->dev,
- "Unable to terminate request %p "
- "on suspend\n", cqr);
- spin_unlock_irq(get_ccwdev_lock(device->cdev));
- dasd_put_device(device);
- return rc;
- }
- }
- list_move_tail(&cqr->devlist, &requeue_queue);
- }
- spin_unlock_irq(get_ccwdev_lock(device->cdev));
-
- list_for_each_entry_safe(cqr, n, &requeue_queue, devlist) {
- wait_event(dasd_flush_wq,
- (cqr->status != DASD_CQR_CLEAR_PENDING));
+ if (!block)
+ return 0;
- /*
- * requeue requests to blocklayer will only work
- * for block device requests
- */
- if (_dasd_requeue_request(cqr))
- continue;
+ INIT_LIST_HEAD(&requeue_queue);
+ rc = _dasd_requests_to_flushqueue(block, &requeue_queue);
- /* remove requests from device and block queue */
- list_del_init(&cqr->devlist);
- while (cqr->refers != NULL) {
- refers = cqr->refers;
- /* remove the request from the block queue */
- list_del(&cqr->blocklist);
- /* free the finished erp request */
- dasd_free_erp_request(cqr, cqr->memdev);
- cqr = refers;
+ /* Now call the callback function of flushed requests */
+restart_cb:
+ list_for_each_entry_safe(cqr, n, &requeue_queue, blocklist) {
+ wait_event(dasd_flush_wq, (cqr->status < DASD_CQR_QUEUED));
+ /* Process finished ERP request. */
+ if (cqr->refers) {
+ spin_lock_bh(&block->queue_lock);
+ __dasd_process_erp(block->base, cqr);
+ spin_unlock_bh(&block->queue_lock);
+ /* restart list_for_xx loop since dasd_process_erp
+ * might remove multiple elements
+ */
+ goto restart_cb;
}
-
- /*
- * _dasd_requeue_request already checked for a valid
- * blockdevice, no need to check again
- * all erp requests (cqr->refers) have a cqr->block
- * pointer copy from the original cqr
- */
+ _dasd_requeue_request(cqr);
list_del_init(&cqr->blocklist);
cqr->block->base->discipline->free_cp(
cqr, (struct request *) cqr->callback_data);
}
-
- /*
- * if requests remain then they are internal request
- * and go back to the device queue
- */
- if (!list_empty(&requeue_queue)) {
- /* move freeze_queue to start of the ccw_queue */
- spin_lock_irq(get_ccwdev_lock(device->cdev));
- list_splice_tail(&requeue_queue, &device->ccw_queue);
- spin_unlock_irq(get_ccwdev_lock(device->cdev));
- }
dasd_schedule_device_bh(device);
return rc;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH 4/4] s390/dasd: print copy pair message only for the correct error
2023-07-21 19:36 [PATCH 0/4] dasd fixes Stefan Haberland
` (2 preceding siblings ...)
2023-07-21 19:36 ` [PATCH 3/4] s390/dasd: fix hanging device after request requeue Stefan Haberland
@ 2023-07-21 19:36 ` Stefan Haberland
2023-07-21 19:44 ` [PATCH 0/4] dasd fixes Jens Axboe
2023-07-24 19:05 ` Jens Axboe
5 siblings, 0 replies; 8+ messages in thread
From: Stefan Haberland @ 2023-07-21 19:36 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
The DASD driver has certain types of requests that might be rejected by
the storage server or z/VM because they are not supported. Since the
missing support of the command is not a real issue there is no user
visible kernel error message for this.
For copy pair setups there is a specific error that IO is not allowed on
secondary devices. This error case is explicitly handled and an error
message is printed.
The code checking for the error did use a bitwise 'and' that is used to
check for specific bits. But in this case the whole sense byte has to
match.
This leads to the problem that the copy pair related error message is
erroneously printed for other error cases that are usually not reported.
This might heavily confuse users and lead to follow on actions that might
disrupt application processing.
Fix by checking the sense byte for the exact value and not single bits.
Cc: <stable@vger.kernel.org> # 6.1+
Fixes: 1fca631a1185 ("s390/dasd: suppress generic error messages for PPRC secondary devices")
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
---
drivers/s390/block/dasd_3990_erp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/block/dasd_3990_erp.c b/drivers/s390/block/dasd_3990_erp.c
index 91e9a17b848e..89957bb7244d 100644
--- a/drivers/s390/block/dasd_3990_erp.c
+++ b/drivers/s390/block/dasd_3990_erp.c
@@ -1050,7 +1050,7 @@ dasd_3990_erp_com_rej(struct dasd_ccw_req * erp, char *sense)
dev_err(&device->cdev->dev, "An I/O request was rejected"
" because writing is inhibited\n");
erp = dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
- } else if (sense[7] & SNS7_INVALID_ON_SEC) {
+ } else if (sense[7] == SNS7_INVALID_ON_SEC) {
dev_err(&device->cdev->dev, "An I/O request was rejected on a copy pair secondary device\n");
/* suppress dump of sense data for this error */
set_bit(DASD_CQR_SUPPRESS_CR, &erp->refers->flags);
--
2.39.2
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 0/4] dasd fixes
2023-07-21 19:36 [PATCH 0/4] dasd fixes Stefan Haberland
` (3 preceding siblings ...)
2023-07-21 19:36 ` [PATCH 4/4] s390/dasd: print copy pair message only for the correct error Stefan Haberland
@ 2023-07-21 19:44 ` Jens Axboe
2023-07-24 9:49 ` Stefan Haberland
2023-07-24 19:05 ` Jens Axboe
5 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-07-21 19:44 UTC (permalink / raw)
To: Stefan Haberland
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
On 7/21/23 1:36 PM, Stefan Haberland wrote:
> Hello Jens,
>
> please apply the following patches that fix some errors in the DASD device
> driver. Thanks.
6.6 fine, or were you targeting 6.5?
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH 0/4] dasd fixes
2023-07-21 19:44 ` [PATCH 0/4] dasd fixes Jens Axboe
@ 2023-07-24 9:49 ` Stefan Haberland
0 siblings, 0 replies; 8+ messages in thread
From: Stefan Haberland @ 2023-07-24 9:49 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
Am 21.07.23 um 21:44 schrieb Jens Axboe:
> On 7/21/23 1:36 PM, Stefan Haberland wrote:
>> Hello Jens,
>>
>> please apply the following patches that fix some errors in the DASD device
>> driver. Thanks.
> 6.6 fine, or were you targeting 6.5?
>
6.5 would be great if possible.
One of the fixes is for a customer issue which we should get into the
Distros asap.
Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/4] dasd fixes
2023-07-21 19:36 [PATCH 0/4] dasd fixes Stefan Haberland
` (4 preceding siblings ...)
2023-07-21 19:44 ` [PATCH 0/4] dasd fixes Jens Axboe
@ 2023-07-24 19:05 ` Jens Axboe
5 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2023-07-24 19:05 UTC (permalink / raw)
To: Stefan Haberland
Cc: linux-block, Jan Hoeppner, linux-s390, Heiko Carstens,
Vasily Gorbik, Christian Borntraeger
On Fri, 21 Jul 2023 21:36:43 +0200, Stefan Haberland wrote:
> please apply the following patches that fix some errors in the DASD device
> driver. Thanks.
>
> regards,
> Stefan
>
> Stefan Haberland (4):
> s390/dasd: fix hanging device after quiesce/resume
> s390/dasd: use correct number of retries for ERP requests
> s390/dasd: fix hanging device after request requeue
> s390/dasd: print copy pair message only for the correct error
>
> [...]
Applied, thanks!
[1/4] s390/dasd: fix hanging device after quiesce/resume
commit: 05f1d8ed03f547054efbc4d29bb7991c958ede95
[2/4] s390/dasd: use correct number of retries for ERP requests
commit: acea28a6b74f458defda7417d2217b051ba7d444
[3/4] s390/dasd: fix hanging device after request requeue
commit: 8a2278ce9c25048d999fe1a3561def75d963f471
[4/4] s390/dasd: print copy pair message only for the correct error
commit: 856d8e3c633b183df23549ce760ae84478a7098d
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread