* [PATCH] block: I/O error occurs during SATA disk stress test
@ 2022-08-24 11:36 Gu Mi
2022-08-24 16:51 ` Bart Van Assche
2022-08-24 23:33 ` Damien Le Moal
0 siblings, 2 replies; 5+ messages in thread
From: Gu Mi @ 2022-08-24 11:36 UTC
To: axboe; +Cc: linux-block, Gu Mi
The problem involves two contexts that run concurrently. One is the
submission path, where a new I/O calls blk_mq_start_request() to start
the request. The other is the block layer timeout handler, which calls
blk_mq_req_expired() to check whether an I/O has timed out.

Inside blk_mq_start_request(), the stores done by blk_add_timer() and
WRITE_ONCE(rq->state, MQ_RQ_IN_FLIGHT) can be reordered. If the timeout
handler inspects the new I/O in that window, it sees the request state
already set to MQ_RQ_IN_FLIGHT while req->deadline is still 0, so the
new I/O is misjudged as timed out.

The fix is to not treat a deadline of 0 as a timeout. In addition,
because jiffies on 32-bit systems wraps around shortly after boot, a
computed deadline could itself be 0, so on those systems we set the
lowest bit of the deadline to keep it nonzero.

Signed-off-by: Gu Mi <gumi@linux.alibaba.com>
---
block/blk-mq.c | 2 ++
block/blk-timeout.c | 4 ++++
2 files changed, 6 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4b90d2d..6defaa1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 		return false;
 
 	deadline = READ_ONCE(rq->deadline);
+	if (unlikely(deadline == 0))
+		return false;
 	if (time_after_eq(jiffies, deadline))
 		return true;
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 1b8de041..6fc5088 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -140,6 +140,10 @@ void blk_add_timer(struct request *req)
 	req->rq_flags &= ~RQF_TIMED_OUT;
 
 	expiry = jiffies + req->timeout;
+#ifndef CONFIG_64BIT
+/* In case INITIAL_JIFFIES wraps on 32-bit */
+	expiry |= 1UL;
+#endif
 	WRITE_ONCE(req->deadline, expiry);
 
 	/*
--
1.8.3.1
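
The misjudgment described in the commit message can be reproduced in a small userspace model. This is only a sketch: the `model_*` names are hypothetical, and the comparison macro is a simplified stand-in for the kernel's time_after_eq() (same signed-difference idea, without the kernel's type checks). It shows that a request already marked in flight with a deadline of 0 looks expired to the unpatched check, and not to the check with the proposed guard.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace stand-in for the kernel's time_after_eq(). */
#define model_time_after_eq(a, b) ((long)((a) - (b)) >= 0)

enum model_state { MODEL_IDLE, MODEL_IN_FLIGHT };

struct model_request {
	enum model_state state;
	unsigned long deadline;	/* 0 until the timer has been armed */
};

/* Expiry check without the guard: a deadline of 0 compares as "in the past". */
static bool model_expired_old(const struct model_request *rq, unsigned long now)
{
	if (rq->state != MODEL_IN_FLIGHT)
		return false;
	return model_time_after_eq(now, rq->deadline);
}

/* Expiry check with the proposed guard: deadline == 0 means "not armed yet". */
static bool model_expired_patched(const struct model_request *rq, unsigned long now)
{
	if (rq->state != MODEL_IN_FLIGHT)
		return false;
	if (rq->deadline == 0)
		return false;
	return model_time_after_eq(now, rq->deadline);
}

/* The window from the commit message: state is visible, deadline is not. */
static void model_demo(void)
{
	struct model_request rq = { MODEL_IN_FLIGHT, 0 };

	assert(model_expired_old(&rq, 1000UL));
	assert(!model_expired_patched(&rq, 1000UL));
}
```

The model is single-threaded on purpose: it constructs the intermediate state directly instead of relying on an actual reordering, which would be timing dependent.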
* Re: [PATCH] block: I/O error occurs during SATA disk stress test
From: Bart Van Assche @ 2022-08-24 16:51 UTC
To: Gu Mi, axboe; +Cc: linux-block
On 8/24/22 04:36, Gu Mi wrote:
> The problem occurs in two async processes, One is when a new IO
> calls the blk_mq_start_request() interface to start sending,The other
> is that the block layer timer process calls the blk_mq_req_expired
> interface to check whether there is an IO timeout.
>
> When an instruction out of sequence occurs between blk_add_timer
> and WRITE_ONCE(rq->state,MQ_RQ_IN_FLIGHT) in the interface
> blk_mq_start_request,at this time, the block timer is checking the
> new IO timeout, Since the req status has been set to MQ_RQ_IN_FLIGHT
> and req->deadline is 0 at this time, the new IO will be misjudged as
> a timeout.
>
> Our repair plan is for the deadline to be 0, and we do not think
> that a timeout occurs. At the same time, because the jiffies of the
> 32-bit system will be reversed shortly after the system is turned on,
> we will add 1 jiffies to the deadline at this time.
>
> Signed-off-by: Gu Mi <gumi@linux.alibaba.com>
> ---
> block/blk-mq.c | 2 ++
> block/blk-timeout.c | 4 ++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4b90d2d..6defaa1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
> return false;
>
> deadline = READ_ONCE(rq->deadline);
> + if (unlikely(deadline == 0))
> + return false;
> if (time_after_eq(jiffies, deadline))
> return true;
>
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index 1b8de041..6fc5088 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -140,6 +140,10 @@ void blk_add_timer(struct request *req)
> req->rq_flags &= ~RQF_TIMED_OUT;
>
> expiry = jiffies + req->timeout;
> +#ifndef CONFIG_64BIT
> +/* In case INITIAL_JIFFIES wraps on 32-bit */
> + expiry |= 1UL;
> +#endif
> WRITE_ONCE(req->deadline, expiry);
>
> /*
Shouldn't this be fixed by inserting a barrier inside
blk_mq_start_request() instead of a patch like the above?
Thanks,
Bart.
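
As an illustration of the ordering-based alternative raised above, here is a userspace analogue written with C11 atomics. It is a sketch under stated assumptions, not kernel code: the `sketch_*` names are hypothetical, and the thread does not say which kernel primitive (for example smp_wmb() or smp_store_release()) an actual fix would use. The point is only that publishing the state with release ordering after the deadline store, paired with an acquire load on the reader side, rules out seeing "in flight" together with a stale deadline.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { SKETCH_IDLE = 0, SKETCH_IN_FLIGHT = 1 };

struct sketch_request {
	atomic_ulong deadline;
	atomic_int state;
};

/* Writer: arm the deadline first, then publish the state with release
 * ordering so the deadline store cannot become visible after it. */
static void sketch_start_request(struct sketch_request *rq,
				 unsigned long now, unsigned long timeout)
{
	atomic_store_explicit(&rq->deadline, now + timeout, memory_order_relaxed);
	atomic_store_explicit(&rq->state, SKETCH_IN_FLIGHT, memory_order_release);
}

/* Reader: acquire-load the state; once it reads as in flight, the deadline
 * written before the release store is guaranteed to be visible. */
static bool sketch_req_expired(struct sketch_request *rq, unsigned long now)
{
	unsigned long deadline;

	if (atomic_load_explicit(&rq->state, memory_order_acquire) != SKETCH_IN_FLIGHT)
		return false;
	deadline = atomic_load_explicit(&rq->deadline, memory_order_relaxed);
	return (long)(now - deadline) >= 0;
}

static void sketch_demo(void)
{
	struct sketch_request rq;

	atomic_init(&rq.deadline, 0UL);
	atomic_init(&rq.state, SKETCH_IDLE);
	assert(!sketch_req_expired(&rq, 100));	/* not started yet */
	sketch_start_request(&rq, 100, 30);
	assert(!sketch_req_expired(&rq, 100));	/* deadline is 130 */
	assert(sketch_req_expired(&rq, 130));
}
```

Note that ordering has to be enforced on both sides: a release store in the writer does nothing for a reader that loads the state and the deadline without any ordering between them.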
* Re: [PATCH] block: I/O error occurs during SATA disk stress test
From: Damien Le Moal @ 2022-08-24 23:33 UTC
To: Gu Mi, axboe; +Cc: linux-block
On 2022/08/24 4:36, Gu Mi wrote:
> The problem occurs in two async processes, One is when a new IO
> calls the blk_mq_start_request() interface to start sending,The other
> is that the block layer timer process calls the blk_mq_req_expired
> interface to check whether there is an IO timeout.
>
> When an instruction out of sequence occurs between blk_add_timer
> and WRITE_ONCE(rq->state,MQ_RQ_IN_FLIGHT) in the interface
> blk_mq_start_request,at this time, the block timer is checking the
> new IO timeout, Since the req status has been set to MQ_RQ_IN_FLIGHT
> and req->deadline is 0 at this time, the new IO will be misjudged as
> a timeout.
>
> Our repair plan is for the deadline to be 0, and we do not think
> that a timeout occurs. At the same time, because the jiffies of the
> 32-bit system will be reversed shortly after the system is turned on,
> we will add 1 jiffies to the deadline at this time.
>
> Signed-off-by: Gu Mi <gumi@linux.alibaba.com>
> ---
> block/blk-mq.c | 2 ++
> block/blk-timeout.c | 4 ++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4b90d2d..6defaa1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
> return false;
>
> deadline = READ_ONCE(rq->deadline);
> + if (unlikely(deadline == 0))
> + return false;
> if (time_after_eq(jiffies, deadline))
Use time_after() instead of time_after_eq()? Then the above change would not be
needed.
> return true;
>
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index 1b8de041..6fc5088 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -140,6 +140,10 @@ void blk_add_timer(struct request *req)
> req->rq_flags &= ~RQF_TIMED_OUT;
>
> expiry = jiffies + req->timeout;
> +#ifndef CONFIG_64BIT
> +/* In case INITIAL_JIFFIES wraps on 32-bit */
> + expiry |= 1UL;
> +#endif
time_after() and friends should handle the overflow. Why is this change needed?
> WRITE_ONCE(req->deadline, expiry);
>
> /*
--
Damien Le Moal
Western Digital Research
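
For readers unfamiliar with the jiffies helpers mentioned in the comments above, the following userspace sketch shows how a signed-difference comparison copes with counter wraparound, and where time_after() and time_after_eq() differ. The `wrap_*` macros are simplified stand-ins for the kernel ones (an assumption: same arithmetic, no type checking), and the values are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's time_after() / time_after_eq(). */
#define wrap_time_after(a, b)    ((long)((b) - (a)) < 0)
#define wrap_time_after_eq(a, b) ((long)((a) - (b)) >= 0)

/* Unsigned addition wraps modulo 2^N, so the deadline may be numerically
 * smaller than "now" even though it lies in the future. */
static unsigned long wrap_deadline(unsigned long now, unsigned long timeout)
{
	return now + timeout;
}

static void wrap_demo(void)
{
	unsigned long now = ULONG_MAX - 10;
	unsigned long deadline = wrap_deadline(now, 30);

	assert(deadline == 19);				/* wrapped past zero */
	assert(!wrap_time_after_eq(now, deadline));	/* still in the future */
	assert(wrap_time_after_eq(now + 30, deadline));	/* exactly at deadline */
	assert(!wrap_time_after(now + 30, deadline));	/* strict variant: not yet */
	assert(wrap_time_after(now + 31, deadline));	/* one tick later */
}
```

The two helpers only disagree at the single tick where the current time equals the deadline.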
* [PATCH] block: I/O error occurs during SATA disk stress test
From: Gu Mi @ 2022-08-25 3:17 UTC
To: axboe; +Cc: linux-block, Gu Mi
The problem involves two contexts that run concurrently. One is the
submission path, where a new I/O calls blk_mq_start_request() to start
the request. The other is the block layer timeout handler, which calls
blk_mq_req_expired() to check whether an I/O has timed out.

Inside blk_mq_start_request(), the stores done by blk_add_timer() and
WRITE_ONCE(rq->state, MQ_RQ_IN_FLIGHT) can be reordered. If the timeout
handler inspects the new I/O in that window, it sees the request state
already set to MQ_RQ_IN_FLIGHT while req->deadline is still 0, so the
new I/O is misjudged as timed out.

The fix is to not treat a deadline of 0 as a timeout. In addition,
because jiffies on 32-bit systems wraps around shortly after boot, a
computed deadline could itself be 0, so on those systems we set the
lowest bit of the deadline to keep it nonzero.

Signed-off-by: Gu Mi <gumi@linux.alibaba.com>
---
block/blk-mq.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4b90d2d..6defaa1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
 		return false;
 
 	deadline = READ_ONCE(rq->deadline);
+	if (unlikely(deadline == 0))
+		return false;
 	if (time_after_eq(jiffies, deadline))
 		return true;
 
--
1.8.3.1
* Re: [PATCH] block: I/O error occurs during SATA disk stress test
From: gumi @ 2022-08-26 3:15 UTC
To: 'Damien Le Moal', axboe; +Cc: linux-block
> On 2022/08/24 4:36, Gu Mi wrote:
> > The problem occurs in two async processes, One is when a new IO calls
> > the blk_mq_start_request() interface to start sending,The other is
> > that the block layer timer process calls the blk_mq_req_expired
> > interface to check whether there is an IO timeout.
> >
> > When an instruction out of sequence occurs between blk_add_timer and
> > WRITE_ONCE(rq->state,MQ_RQ_IN_FLIGHT) in the interface
> > blk_mq_start_request,at this time, the block timer is checking the new
> > IO timeout, Since the req status has been set to MQ_RQ_IN_FLIGHT and
> > req->deadline is 0 at this time, the new IO will be misjudged as a
> > timeout.
> >
> > Our repair plan is for the deadline to be 0, and we do not think that
> > a timeout occurs. At the same time, because the jiffies of the 32-bit
> > system will be reversed shortly after the system is turned on, we will
> > add 1 jiffies to the deadline at this time.
> >
> > Signed-off-by: Gu Mi <gumi@linux.alibaba.com>
> > ---
> > block/blk-mq.c | 2 ++
> > block/blk-timeout.c | 4 ++++
> > 2 files changed, 6 insertions(+)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 4b90d2d..6defaa1 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1451,6 +1451,8 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
> > return false;
> >
> > deadline = READ_ONCE(rq->deadline);
> > + if (unlikely(deadline == 0))
> > + return false;
> > if (time_after_eq(jiffies, deadline))
> Use time_after() instead of time_after_eq() ? Then the above change would not be needed.
> > return true;
> >
> > diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> > index 1b8de041..6fc5088 100644
> > --- a/block/blk-timeout.c
> > +++ b/block/blk-timeout.c
> > @@ -140,6 +140,10 @@ void blk_add_timer(struct request *req)
> > req->rq_flags &= ~RQF_TIMED_OUT;
> >
> > expiry = jiffies + req->timeout;
> > +#ifndef CONFIG_64BIT
> > +/* In case INITIAL_JIFFIES wraps on 32-bit */
> > + expiry |= 1UL;
> > +#endif
> time_after() and friends should handle the overflow. Why is this change needed ?
> > WRITE_ONCE(req->deadline, expiry);
> >
> > /*
> --
> Damien Le Moal
> Western Digital Research

Sorry, my reply yesterday was wrong. Please allow me to explain again.
> +#ifndef CONFIG_64BIT
> +/* In case INITIAL_JIFFIES wraps on 32-bit */
> + expiry |= 1UL;
The purpose of this modification is not to handle overflow, but to keep a valid deadline distinct from the initial req->deadline value of 0.
This guarantees that when blk_mq_req_expired() sees req->deadline == 0, the deadline is still at its initial value and has not been set yet.
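
The explanation above can be made concrete with a small sketch. It models a 32-bit jiffies counter as `uint32_t` (an assumption for illustration; the `expiry_*` helpers are hypothetical names, not kernel functions) and shows that a plainly computed deadline can land exactly on 0 when the counter wraps, while OR-ing in the lowest bit keeps it nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Deadline computed as in the unpatched code: may wrap to exactly 0. */
static uint32_t expiry_plain(uint32_t now, uint32_t timeout)
{
	return now + timeout;
}

/* Deadline with the lowest bit forced on: can never be 0, so 0 stays
 * reserved for "deadline not set yet". */
static uint32_t expiry_nonzero(uint32_t now, uint32_t timeout)
{
	return (now + timeout) | 1u;
}

static void expiry_demo(void)
{
	uint32_t now = UINT32_MAX - 29;	/* 30 ticks before the counter wraps */

	assert(expiry_plain(now, 30) == 0);	/* collides with the initial value */
	assert(expiry_nonzero(now, 30) == 1);	/* distinguishable from "unset" */
	assert(expiry_nonzero(100, 30) == 131);	/* even result moves by one tick */
	assert(expiry_nonzero(101, 30) == 131);	/* odd result is unchanged */
}
```

The cost of this trick is that an even deadline is pushed back by one tick.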