* [PATCH] block: cleanup and fix batch completion adding conditions
@ 2025-02-13 15:18 Jens Axboe
2025-02-19 7:26 ` Shinichiro Kawasaki
2025-02-28 11:59 ` Ondrej Kozina
0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2025-02-13 15:18 UTC (permalink / raw)
To: linux-block@vger.kernel.org
The conditions for whether or not a request is allowed adding to a
completion batch are a bit hard to read, and they also have a few
issues. One is that ioerror may indeed be a random value on passthrough,
and it's being checked unconditionally of whether or not the given
request is a passthrough request or not.
Rewrite the conditions to be separate for easier reading, and only check
ioerror for non-passthrough requests. This fixes an issue with bio
unmapping on passthrough, where it fails getting added to a batch. This
both leads to suboptimal performance, and may trigger a potential
schedule-under-atomic condition for polled passthrough IO.
Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a0a9007cc1e3..cc1cb5de8fb6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -861,12 +861,22 @@ static inline bool blk_mq_add_to_batch(struct request *req,
void (*complete)(struct io_comp_batch *))
{
/*
- * blk_mq_end_request_batch() can't end request allocated from
- * sched tags
+ * Check various conditions that exclude batch processing:
+ * 1) No batch container
+ * 2) Has scheduler data attached
+ * 3) Not a passthrough request and end_io set
+ * 4) Not a passthrough request and an ioerror
*/
- if (!iob || (req->rq_flags & RQF_SCHED_TAGS) || ioerror ||
- (req->end_io && !blk_rq_is_passthrough(req)))
+ if (!iob)
return false;
+ if (req->rq_flags & RQF_SCHED_TAGS)
+ return false;
+ if (!blk_rq_is_passthrough(req)) {
+ if (req->end_io)
+ return false;
+ if (ioerror < 0)
+ return false;
+ }
if (!iob->complete)
iob->complete = complete;
--
Jens Axboe
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-13 15:18 [PATCH] block: cleanup and fix batch completion adding conditions Jens Axboe
@ 2025-02-19 7:26 ` Shinichiro Kawasaki
2025-02-19 19:17 ` alan.adamson
2025-02-28 11:59 ` Ondrej Kozina
1 sibling, 1 reply; 14+ messages in thread
From: Shinichiro Kawasaki @ 2025-02-19 7:26 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block@vger.kernel.org, alan.adamson@oracle.com
CC+: Alan,
On Feb 13, 2025 / 08:18, Jens Axboe wrote:
> The conditions for whether or not a request is allowed adding to a
> completion batch are a bit hard to read, and they also have a few
> issues. One is that ioerror may indeed be a random value on passthrough,
> and it's being checked unconditionally of whether or not the given
> request is a passthrough request or not.
>
> Rewrite the conditions to be separate for easier reading, and only check
> ioerror for non-passthrough requests. This fixes an issue with bio
> unmapping on passthrough, where it fails getting added to a batch. This
> both leads to suboptimal performance, and may trigger a potential
> schedule-under-atomic condition for polled passthrough IO.
>
> Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
I observed the blktests test case nvme/039 failure with v6.14-rc3 kernel. I
bisected and found that this patch in the v6.14-rc3 is the trigger.
The test run output is as follows:
nvme/039 => nvme0n1 (test error logging) [failed]
runtime 5.378s ... 5.354s
--- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900
+++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad 2025-02-19 16:13:05.061387179 +0900
@@ -1,7 +1,3 @@
Running nvme/039
- Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
- Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
- Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0
- Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0
Test complete
The test case does error injection. Test method requires reconsideration to
adjust to this kernel change, probably. Help for fix will be appreciated.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-19 7:26 ` Shinichiro Kawasaki
@ 2025-02-19 19:17 ` alan.adamson
2025-02-20 1:04 ` alan.adamson
0 siblings, 1 reply; 14+ messages in thread
From: alan.adamson @ 2025-02-19 19:17 UTC (permalink / raw)
To: Shinichiro Kawasaki, Jens Axboe; +Cc: linux-block@vger.kernel.org
On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote:
> CC+: Alan,
>
> On Feb 13, 2025 / 08:18, Jens Axboe wrote:
>> The conditions for whether or not a request is allowed adding to a
>> completion batch are a bit hard to read, and they also have a few
>> issues. One is that ioerror may indeed be a random value on passthrough,
>> and it's being checked unconditionally of whether or not the given
>> request is a passthrough request or not.
>>
>> Rewrite the conditions to be separate for easier reading, and only check
>> ioerror for non-passthrough requests. This fixes an issue with bio
>> unmapping on passthrough, where it fails getting added to a batch. This
>> both leads to suboptimal performance, and may trigger a potential
>> schedule-under-atomic condition for polled passthrough IO.
>>
>> Fixes: f794f3351f26 ("block: add support for blk_mq_end_request_batch()")
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> I observed the blktests test case nvme/039 failure with v6.14-rc3 kernel. I
> bisected and found that this patch in the v6.14-rc3 is the trigger.
>
> The test run output is as follows:
>
> nvme/039 => nvme0n1 (test error logging) [failed]
> runtime 5.378s ... 5.354s
> --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900
> +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad 2025-02-19 16:13:05.061387179 +0900
> @@ -1,7 +1,3 @@
> Running nvme/039
> - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) DNR
> - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
> - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
> Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0
> - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0
> Test complete
>
>
> The test case does error injection. Test method requires reconsideration to
> adjust to this kernel change, probably. Help for fix will be appreciated.
Not really an issue with the test, rather the error injector is broken.
I'll investigate.
Alan
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-19 19:17 ` alan.adamson
@ 2025-02-20 1:04 ` alan.adamson
2025-02-20 7:28 ` Shinichiro Kawasaki
0 siblings, 1 reply; 14+ messages in thread
From: alan.adamson @ 2025-02-20 1:04 UTC (permalink / raw)
To: Shinichiro Kawasaki, Jens Axboe; +Cc: linux-block@vger.kernel.org
On 2/19/25 11:17 AM, alan.adamson@oracle.com wrote:
>
> On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote:
>> CC+: Alan,
>>
>> On Feb 13, 2025 / 08:18, Jens Axboe wrote:
>>> The conditions for whether or not a request is allowed adding to a
>>> completion batch are a bit hard to read, and they also have a few
>>> issues. One is that ioerror may indeed be a random value on
>>> passthrough,
>>> and it's being checked unconditionally of whether or not the given
>>> request is a passthrough request or not.
>>>
>>> Rewrite the conditions to be separate for easier reading, and only
>>> check
>>> ioerror for non-passthrough requests. This fixes an issue with bio
>>> unmapping on passthrough, where it fails getting added to a batch. This
>>> both leads to suboptimal performance, and may trigger a potential
>>> schedule-under-atomic condition for polled passthrough IO.
>>>
>>> Fixes: f794f3351f26 ("block: add support for
>>> blk_mq_end_request_batch()")
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> I observed the blktests test case nvme/039 failure with v6.14-rc3
>> kernel. I
>> bisected and found that this patch in the v6.14-rc3 is the trigger.
>>
>> The test run output is as follows:
>>
>> nvme/039 => nvme0n1 (test error logging) [failed]
>> runtime 5.378s ... 5.354s
>> --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900
>> +++
>> /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad
>> 2025-02-19 16:13:05.061387179 +0900
>> @@ -1,7 +1,3 @@
>> Running nvme/039
>> - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2 /
>> sc 0x81) DNR
>> - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
>> - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
>> Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR cdw10=0x1
>> cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0
>> - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR
>> cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0
>> Test complete
>>
>>
>> The test case does error injection. Test method requires
>> reconsideration to
>> adjust to this kernel change, probably. Help for fix will be
>> appreciated.
>
> Not really an issue with the test, rather the error injector is
> broken. I'll investigate.
The following change allows the test to pass.
- The check of (ioerror < 0) should be (ioerror != 0)
- passthroughs can also have ioerror set, so false needs to be returned.
This needs to be resolved.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fa2a76cc2f73..b2bd2ac40441 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -865,17 +865,17 @@ static inline bool blk_mq_add_to_batch(struct
request *req,
* 1) No batch container
* 2) Has scheduler data attached
* 3) Not a passthrough request and end_io set
- * 4) Not a passthrough request and an ioerror
+ * 4) An ioerror
*/
if (!iob)
return false;
if (req->rq_flags & RQF_SCHED_TAGS)
return false;
+ if (ioerror)
+ return false;
if (!blk_rq_is_passthrough(req)) {
if (req->end_io)
return false;
- if (ioerror < 0)
- return false;
}
Alan
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-20 1:04 ` alan.adamson
@ 2025-02-20 7:28 ` Shinichiro Kawasaki
0 siblings, 0 replies; 14+ messages in thread
From: Shinichiro Kawasaki @ 2025-02-20 7:28 UTC (permalink / raw)
To: alan.adamson@oracle.com; +Cc: Jens Axboe, linux-block@vger.kernel.org
On Feb 19, 2025 / 17:04, alan.adamson@oracle.com wrote:
>
> On 2/19/25 11:17 AM, alan.adamson@oracle.com wrote:
> >
> > On 2/18/25 11:26 PM, Shinichiro Kawasaki wrote:
> > > CC+: Alan,
> > >
> > > On Feb 13, 2025 / 08:18, Jens Axboe wrote:
> > > > The conditions for whether or not a request is allowed adding to a
> > > > completion batch are a bit hard to read, and they also have a few
> > > > issues. One is that ioerror may indeed be a random value on
> > > > passthrough,
> > > > and it's being checked unconditionally of whether or not the given
> > > > request is a passthrough request or not.
> > > >
> > > > Rewrite the conditions to be separate for easier reading, and
> > > > only check
> > > > ioerror for non-passthrough requests. This fixes an issue with bio
> > > > unmapping on passthrough, where it fails getting added to a batch. This
> > > > both leads to suboptimal performance, and may trigger a potential
> > > > schedule-under-atomic condition for polled passthrough IO.
> > > >
> > > > Fixes: f794f3351f26 ("block: add support for
> > > > blk_mq_end_request_batch()")
> > > > Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > > I observed the blktests test case nvme/039 failure with v6.14-rc3
> > > kernel. I
> > > bisected and found that this patch in the v6.14-rc3 is the trigger.
> > >
> > > The test run output is as follows:
> > >
> > > nvme/039 => nvme0n1 (test error logging) [failed]
> > > runtime 5.378s ... 5.354s
> > > --- tests/nvme/039.out 2024-09-20 11:20:26.405380875 +0900
> > > +++
> > > /home/shin/Blktests/blktests/results/nvme0n1/nvme/039.out.bad
> > > 2025-02-19 16:13:05.061387179 +0900
> > > @@ -1,7 +1,3 @@
> > > Running nvme/039
> > > - Read(0x2) @ LBA 0, 1 blocks, Unrecovered Read Error (sct 0x2
> > > / sc 0x81) DNR
> > > - Read(0x2) @ LBA 0, 1 blocks, Unknown (sct 0x3 / sc 0x75) DNR
> > > - Write(0x1) @ LBA 0, 1 blocks, Write Fault (sct 0x2 / sc 0x80) DNR
> > > Identify(0x6), Access Denied (sct 0x2 / sc 0x86) DNR
> > > cdw10=0x1 cdw11=0x0 cdw12=0x0 cdw13=0x0 cdw14=0x0 cdw15=0x0
> > > - Read(0x2), Invalid Command Opcode (sct 0x0 / sc 0x1) DNR
> > > cdw10=0x0 cdw11=0x0 cdw12=0x1 cdw13=0x0 cdw14=0x0 cdw15=0x0
> > > Test complete
> > >
> > >
> > > The test case does error injection. Test method requires
> > > reconsideration to
> > > adjust to this kernel change, probably. Help for fix will be
> > > appreciated.
> >
> > Not really an issue with the test, rather the error injector is broken.
> > I'll investigate.
Alan, thank you for looking into this. Then it sounds that a fix is needed in
kernel side.
>
> The following change allows the test to pass.
>
> - The check of (ioerror < 0) should be (ioerror != 0)
>
> - passthroughs can also have ioerror set, so false needs to be returned.
> This needs to be resolved.
>
>
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index fa2a76cc2f73..b2bd2ac40441 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -865,17 +865,17 @@ static inline bool blk_mq_add_to_batch(struct request
> *req,
> * 1) No batch container
> * 2) Has scheduler data attached
> * 3) Not a passthrough request and end_io set
> - * 4) Not a passthrough request and an ioerror
> + * 4) An ioerror
> */
> if (!iob)
> return false;
> if (req->rq_flags & RQF_SCHED_TAGS)
> return false;
> + if (ioerror)
> + return false;
> if (!blk_rq_is_passthrough(req)) {
> if (req->end_io)
> return false;
> - if (ioerror < 0)
> - return false;
> }
I think the modification above essentially reverts the trigger commit, so it
means that we can not achieve what the commit aimed at.
I took a look in the call chain. Before the commit, the call chain was
like this for passthrough commands:
nvme_handle_cqe()
blk_mq_add_to_batch() ... false is returned, then call nvme_pci_complete_rq()
nvme_pci_complete_rq()
nvme_complete_rq()
nvme_end_req()
nvme_log_err_passthrough() ... log is printed
__nvme_end_req() ... the nvme command ends
After the commit, it looks like this:
nvme_handle_cqe()
blk_mq_add_to_batch() ... true is returned. Set nvme_pci_complete_batch()
nvme_pci_complete_batch()
nvme_complete_batch()
nvme_complete_batch()
nvme_complete_batch_req()
__nvme_end_req() ... log is not printed but the command ends,
then nvme/039 failed
Assuming the change by the trigger commit is required, I guess adding
nvme_log_err_passthrough() call in nvme_complete_batch_req() will restore
the log print feature for the passthrough case.
As a trial, I created a patch below [*]. I confirmed it avoids the nvme/039
failure. (As Alan noted, still I also find the check for "ioerror != 0" is
required for non-passthrough case, to make the test case pass.)
Jens, if you have comments, please share.
[*]
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 818d4e49aab5..33030dd9b76a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -496,6 +496,9 @@ void nvme_complete_batch_req(struct request *req)
{
trace_nvme_complete_rq(req);
nvme_cleanup_cmd(req);
+ if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET) &&
+ blk_rq_is_passthrough(req)))
+ nvme_log_err_passthru(req);
__nvme_end_req(req);
}
EXPORT_SYMBOL_GPL(nvme_complete_batch_req);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fa2a76cc2f73..b891ed113306 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -874,7 +874,7 @@ static inline bool blk_mq_add_to_batch(struct request *req,
if (!blk_rq_is_passthrough(req)) {
if (req->end_io)
return false;
- if (ioerror < 0)
+ if (ioerror)
return false;
}
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-13 15:18 [PATCH] block: cleanup and fix batch completion adding conditions Jens Axboe
2025-02-19 7:26 ` Shinichiro Kawasaki
@ 2025-02-28 11:59 ` Ondrej Kozina
2025-02-28 14:08 ` Jens Axboe
1 sibling, 1 reply; 14+ messages in thread
From: Ondrej Kozina @ 2025-02-28 11:59 UTC (permalink / raw)
To: linux-block; +Cc: axboe, gmazyland, regressions
Hi Jens,
this patch introduced regression to locked SED OPAL2 devices. The locked
region no longer returns -EIO upon IO. On read the caller receives block
of zeroes, on write it does not report any error either. In both cases,
previous to this patch, the caller would get IO error (expected) with
locked device.
It was discovered by cryptsetup testsuite specifically
https://gitlab.com/cryptsetup/cryptsetup/-/blob/main/tests/compat-test-opal
I've attached a simple patch that changes the ioerror condition and it
fixed the problem with SED OPAL2 devices for me.
#regzbot introduced: 1f47ed294a2bd577d5ae43e6e28e1c9a3be4a833
With kind regards
O.
------
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fa2a76cc2f73..b891ed113306 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -874,7 +874,7 @@ static inline bool blk_mq_add_to_batch(struct
request *req,
if (!blk_rq_is_passthrough(req)) {
if (req->end_io)
return false;
- if (ioerror < 0)
+ if (ioerror)
return false;
}
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-28 11:59 ` Ondrej Kozina
@ 2025-02-28 14:08 ` Jens Axboe
2025-02-28 14:32 ` Ondrej Kozina
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2025-02-28 14:08 UTC (permalink / raw)
To: Ondrej Kozina, linux-block; +Cc: gmazyland, regressions
On 2/28/25 4:59 AM, Ondrej Kozina wrote:
> Hi Jens,
>
> this patch introduced regression to locked SED OPAL2 devices. The locked region no longer returns -EIO upon IO. On read the caller receives block of zeroes, on write it does not report any error either. In both cases, previous to this patch, the caller would get IO error (expected) with locked device.
>
> It was discovered by cryptsetup testsuite specifically https://gitlab.com/cryptsetup/cryptsetup/-/blob/main/tests/compat-test-opal
>
> I've attached a simple patch that changes the ioerror condition and it fixed the problem with SED OPAL2 devices for me.
>
> #regzbot introduced: 1f47ed294a2bd577d5ae43e6e28e1c9a3be4a833
Oops thanks - does someone want to send a "real" patch with commit message
etc, or do you just want me to queue something up?
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-28 14:08 ` Jens Axboe
@ 2025-02-28 14:32 ` Ondrej Kozina
2025-03-04 9:25 ` Milan Broz
0 siblings, 1 reply; 14+ messages in thread
From: Ondrej Kozina @ 2025-02-28 14:32 UTC (permalink / raw)
To: Jens Axboe, linux-block; +Cc: gmazyland, regressions
On 28/02/2025 15:08, Jens Axboe wrote:
> On 2/28/25 4:59 AM, Ondrej Kozina wrote:
>> Hi Jens,
>>
>> this patch introduced regression to locked SED OPAL2 devices. The locked region no longer returns -EIO upon IO. On read the caller receives block of zeroes, on write it does not report any error either. In both cases, previous to this patch, the caller would get IO error (expected) with locked device.
>>
>> It was discovered by cryptsetup testsuite specifically https://gitlab.com/cryptsetup/cryptsetup/-/blob/main/tests/compat-test-opal
>>
>> I've attached a simple patch that changes the ioerror condition and it fixed the problem with SED OPAL2 devices for me.
>>
>> #regzbot introduced: 1f47ed294a2bd577d5ae43e6e28e1c9a3be4a833
>
> Oops thanks - does someone want to send a "real" patch with commit message
> etc, or do you just want me to queue something up?
>
To be honest, my patch was just a quick hack and I'm not sure if it is
the correct one in general. But I will test (and report) a fix when it
lands here.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-02-28 14:32 ` Ondrej Kozina
@ 2025-03-04 9:25 ` Milan Broz
2025-03-11 3:03 ` Shinichiro Kawasaki
0 siblings, 1 reply; 14+ messages in thread
From: Milan Broz @ 2025-03-04 9:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: regressions, Ondrej Kozina, linux-block
On 2/28/25 3:32 PM, Ondrej Kozina wrote:
> On 28/02/2025 15:08, Jens Axboe wrote:
>> On 2/28/25 4:59 AM, Ondrej Kozina wrote:
>>> Hi Jens,
>>>
>>> this patch introduced regression to locked SED OPAL2 devices. The locked region no longer returns -EIO upon IO. On read the caller receives block of zeroes, on write it does not report any error either. In both cases, previous to this patch, the caller would get IO error (expected) with locked device.
>>>
>>> It was discovered by cryptsetup testsuite specifically https://gitlab.com/cryptsetup/cryptsetup/-/blob/main/tests/compat-test-opal
>>>
>>> I've attached a simple patch that changes the ioerror condition and it fixed the problem with SED OPAL2 devices for me.
>>>
>>> #regzbot introduced: 1f47ed294a2bd577d5ae43e6e28e1c9a3be4a833
>>
>> Oops thanks - does someone want to send a "real" patch with commit message
>> etc, or do you just want me to queue something up?
>>
>
> To be honest, my patch was just a quick hack and I'm not sure if it is
> the correct one in general. But I will test (and report) a fix when it
> lands here.
Hi Jens,
do you have some fix for this issue anywhere?
We can easily test the patch (at least for Opal) as our CI is currently
completely red because of this issue :)
Thanks,
Milan
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-03-04 9:25 ` Milan Broz
@ 2025-03-11 3:03 ` Shinichiro Kawasaki
2025-03-11 14:59 ` Milan Broz
2025-03-12 11:17 ` Ondrej Kozina
0 siblings, 2 replies; 14+ messages in thread
From: Shinichiro Kawasaki @ 2025-03-11 3:03 UTC (permalink / raw)
To: Milan Broz
Cc: Jens Axboe, regressions@lists.linux.dev, Ondrej Kozina,
linux-block@vger.kernel.org, Alan Adamson
On Mar 04, 2025 / 10:25, Milan Broz wrote:
> On 2/28/25 3:32 PM, Ondrej Kozina wrote:
> > On 28/02/2025 15:08, Jens Axboe wrote:
> > > On 2/28/25 4:59 AM, Ondrej Kozina wrote:
> > > > Hi Jens,
> > > >
> > > > this patch introduced regression to locked SED OPAL2 devices. The locked region no longer returns -EIO upon IO. On read the caller receives block of zeroes, on write it does not report any error either. In both cases, previous to this patch, the caller would get IO error (expected) with locked device.
> > > >
> > > > It was discovered by cryptsetup testsuite specifically https://gitlab.com/cryptsetup/cryptsetup/-/blob/main/tests/compat-test-opal
> > > >
> > > > I've attached a simple patch that changes the ioerror condition and it fixed the problem with SED OPAL2 devices for me.
> > > >
> > > > #regzbot introduced: 1f47ed294a2bd577d5ae43e6e28e1c9a3be4a833
> > >
> > > Oops thanks - does someone want to send a "real" patch with commit message
> > > etc, or do you just want me to queue something up?
> > >
> >
> > To be honest, my patch was just a quick hack and I'm not sure if it is
> > the correct one in general. But I will test (and report) a fix when it
> > lands here.
>
> Hi Jens,
>
> do you have some fix for this issue anywhere?
>
> We can easily test the patch (at least for Opal) as our CI is currently
> completely red because of this issue :)
> Thanks,
> Milan
>
I created fix candidate patches to address the blktests nvme/039 failure [1].
This may work for the failures Ondrej and Milan observe too, hopefully.
Jens, Alan, could you take a look in the patches and see if they make sense?
[1] https://lkml.kernel.org/linux-block/20250311024144.1762333-1-shinichiro.kawasaki@wdc.com/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-03-11 3:03 ` Shinichiro Kawasaki
@ 2025-03-11 14:59 ` Milan Broz
2025-03-11 15:00 ` Jens Axboe
2025-03-12 11:17 ` Ondrej Kozina
1 sibling, 1 reply; 14+ messages in thread
From: Milan Broz @ 2025-03-11 14:59 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: Jens Axboe, regressions@lists.linux.dev, Ondrej Kozina,
linux-block@vger.kernel.org, Alan Adamson
On 3/11/25 4:03 AM, Shinichiro Kawasaki wrote:
>
> I created fix candidate patches to address the blktests nvme/039 failure [1].
> This may work for the failures Ondrej and Milan observe too, hopefully.
Hi,
I quickly tried to run the test with todays' mainline git and mentioned two patches:
https://lkml.kernel.org/linux-block/20250311024144.1762333-2-shinichiro.kawasaki@wdc.com/
https://lkml.kernel.org/linux-block/20250311024144.1762333-3-shinichiro.kawasaki@wdc.com/
and it looks like our SED Opal tests are fixed, no errors or warnings, thanks!
> Jens, Alan, could you take a look in the patches and see if they make sense?
Please, fix it before the 6.14 final, this could cause serious data corruption, at least
on systems using SED Opal drives.
Thanks,
Milan
>
> [1] https://lkml.kernel.org/linux-block/20250311024144.1762333-1-shinichiro.kawasaki@wdc.com/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-03-11 14:59 ` Milan Broz
@ 2025-03-11 15:00 ` Jens Axboe
2025-03-19 18:54 ` Milan Broz
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2025-03-11 15:00 UTC (permalink / raw)
To: Milan Broz, Shinichiro Kawasaki
Cc: regressions@lists.linux.dev, Ondrej Kozina,
linux-block@vger.kernel.org, Alan Adamson
On 3/11/25 8:59 AM, Milan Broz wrote:
> On 3/11/25 4:03 AM, Shinichiro Kawasaki wrote:
>>
>> I created fix candidate patches to address the blktests nvme/039 failure [1].
>> This may work for the failures Ondrej and Milan observe too, hopefully.
>
> Hi,
>
> I quickly tried to run the test with todays' mainline git and mentioned two patches:
> https://lkml.kernel.org/linux-block/20250311024144.1762333-2-shinichiro.kawasaki@wdc.com/
> https://lkml.kernel.org/linux-block/20250311024144.1762333-3-shinichiro.kawasaki@wdc.com/
> and it looks like our SED Opal tests are fixed, no errors or warnings, thanks!
>
>> Jens, Alan, could you take a look in the patches and see if they make sense?
>
> Please, fix it before the 6.14 final, this could cause serious data corruption, at least
> on systems using SED Opal drives.
Fix is already queued up.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-03-11 3:03 ` Shinichiro Kawasaki
2025-03-11 14:59 ` Milan Broz
@ 2025-03-12 11:17 ` Ondrej Kozina
1 sibling, 0 replies; 14+ messages in thread
From: Ondrej Kozina @ 2025-03-12 11:17 UTC (permalink / raw)
To: Shinichiro Kawasaki, Milan Broz
Cc: Jens Axboe, regressions@lists.linux.dev,
linux-block@vger.kernel.org, Alan Adamson
On 11/03/2025 04:03, Shinichiro Kawasaki wrote:
>
> I created fix candidate patches to address the blktests nvme/039 failure [1].
> This may work for the failures Ondrej and Milan observe too, hopefully.
>
> Jens, Alan, could you take a look in the patches and see if they make sense?
>
> [1] https://lkml.kernel.org/linux-block/20250311024144.1762333-1-shinichiro.kawasaki@wdc.com/
>
Hi,
I've tested with both Shinichiro's patches [1] applied and it fixed the
issue for me.
Thank you!
[1]
-
https://lkml.kernel.org/linux-block/20250311024144.1762333-2-shinichiro.kawasaki@wdc.com/
-
https://lkml.kernel.org/linux-block/20250311024144.1762333-3-shinichiro.kawasaki@wdc.com/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] block: cleanup and fix batch completion adding conditions
2025-03-11 15:00 ` Jens Axboe
@ 2025-03-19 18:54 ` Milan Broz
0 siblings, 0 replies; 14+ messages in thread
From: Milan Broz @ 2025-03-19 18:54 UTC (permalink / raw)
To: Jens Axboe, Shinichiro Kawasaki
Cc: regressions@lists.linux.dev, Ondrej Kozina,
linux-block@vger.kernel.org, Alan Adamson, stable
On 3/11/25 4:00 PM, Jens Axboe wrote:
> On 3/11/25 8:59 AM, Milan Broz wrote:
>> On 3/11/25 4:03 AM, Shinichiro Kawasaki wrote:
>>>
>>> I created fix candidate patches to address the blktests nvme/039 failure [1].
>>> This may work for the failures Ondrej and Milan observe too, hopefully.
>>
>> Hi,
>>
>> I quickly tried to run the test with todays' mainline git and mentioned two patches:
>> https://lkml.kernel.org/linux-block/20250311024144.1762333-2-shinichiro.kawasaki@wdc.com/
>> https://lkml.kernel.org/linux-block/20250311024144.1762333-3-shinichiro.kawasaki@wdc.com/
>> and it looks like our SED Opal tests are fixed, no errors or warnings, thanks!
>>
>>> Jens, Alan, could you take a look in the patches and see if they make sense?
>>
>> Please, fix it before the 6.14 final, this could cause serious data corruption, at least
>> on systems using SED Opal drives.
>
> Fix is already queued up.
Hi Jens,
it seems the bug was "successfully" backported to stable 6.13.4+ and I do not see any fix
in the stable queue yet.
We were hit by it today, as all locked ranges for Opal devices now returning empty data
instead of failing read.
Please could you check what need to be backported to 6.13 stable?
IMO these patches should be backported:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5c2bcc0cd47321d78bb4e865d7857304139f95d
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bce6b5f8987678b9c6c1fe433af6b5fe41feadc
Thanks,
Milan
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-03-19 18:54 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-13 15:18 [PATCH] block: cleanup and fix batch completion adding conditions Jens Axboe
2025-02-19 7:26 ` Shinichiro Kawasaki
2025-02-19 19:17 ` alan.adamson
2025-02-20 1:04 ` alan.adamson
2025-02-20 7:28 ` Shinichiro Kawasaki
2025-02-28 11:59 ` Ondrej Kozina
2025-02-28 14:08 ` Jens Axboe
2025-02-28 14:32 ` Ondrej Kozina
2025-03-04 9:25 ` Milan Broz
2025-03-11 3:03 ` Shinichiro Kawasaki
2025-03-11 14:59 ` Milan Broz
2025-03-11 15:00 ` Jens Axboe
2025-03-19 18:54 ` Milan Broz
2025-03-12 11:17 ` Ondrej Kozina
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox