* Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
[not found] ` <25c4c665-1a33-456c-93c7-8b7b56c0e6db@kernel.dk>
@ 2024-11-04 2:38 ` Jens Axboe
2024-11-04 4:25 ` Andrew Marshall
2024-11-06 6:05 ` Greg Kroah-Hartman
0 siblings, 2 replies; 6+ messages in thread
From: Jens Axboe @ 2024-11-04 2:38 UTC (permalink / raw)
To: Keith Busch; +Cc: Andrew Marshall, io-uring, Greg Kroah-Hartman, stable
[-- Attachment #1: Type: text/plain, Size: 1802 bytes --]
On 11/3/24 5:06 PM, Jens Axboe wrote:
> On 11/3/24 5:01 PM, Keith Busch wrote:
>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>> problematic commit simply by browsing git log. As indicated above;
>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>> minimal reproducer, but can provide my large one (it is testing a
>>>> larger build process in a VM) if needed?there are some additional
>>>> details in the above-linked downstream bug report, though. I hope that
>>>> having identified the problematic commit is enough for someone with
>>>> more context to go off of. Happy to provide more information if
>>>> needed.
>>>
>>> Don't worry about not having a reproducer, having the backport commit
>>> pin pointed will do just fine. I'll take a look at this.
>>
>> I think stable is missing:
>>
>> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
>
> I think you need to go back further than that, this one already
> unconditionally holds ->uring_lock around overflow flushing...
Took a look, it's this one:
commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
Author: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed Apr 10 02:26:54 2024 +0100
io_uring: always lock __io_cqring_overflow_flush
Greg/stable, can you pick this one for 6.6-stable? It picks
cleanly.
For 6.1, which is the other stable of that age that has the backport,
the attached patch will do the trick.
With that, I believe it should be sorted. Hopefully that can make
6.6.60 and 6.1.116.
--
Jens Axboe
[-- Attachment #2: 0001-io_uring-always-lock-__io_cqring_overflow_flush.patch --]
[-- Type: text/x-patch, Size: 1966 bytes --]
From 3f1c33f03386c481caf2044a836f3ca611094098 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed, 10 Apr 2024 02:26:54 +0100
Subject: [PATCH] io_uring: always lock __io_cqring_overflow_flush
Commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 upstream.
Conditional locking is never great, in case of
__io_cqring_overflow_flush(), which is a slow path, it's not justified.
Don't handle IOPOLL separately, always grab uring_lock for overflow
flushing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
io_uring/io_uring.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f902b161f02c..92c1aa8f3501 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -593,6 +593,8 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
bool all_flushed;
size_t cqe_size = sizeof(struct io_uring_cqe);
+ lockdep_assert_held(&ctx->uring_lock);
+
if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
return false;
@@ -647,12 +649,9 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
bool ret = true;
if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
- /* iopoll syncs against uring_lock, not completion_lock */
- if (ctx->flags & IORING_SETUP_IOPOLL)
- mutex_lock(&ctx->uring_lock);
+ mutex_lock(&ctx->uring_lock);
ret = __io_cqring_overflow_flush(ctx, false);
- if (ctx->flags & IORING_SETUP_IOPOLL)
- mutex_unlock(&ctx->uring_lock);
+ mutex_unlock(&ctx->uring_lock);
}
return ret;
@@ -1405,6 +1404,8 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
int ret = 0;
unsigned long check_cq;
+ lockdep_assert_held(&ctx->uring_lock);
+
if (!io_allowed_run_tw(ctx))
return -EEXIST;
--
2.45.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
2024-11-04 2:38 ` Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59") Jens Axboe
@ 2024-11-04 4:25 ` Andrew Marshall
2024-11-04 13:17 ` Andrew Marshall
2024-11-06 6:05 ` Greg Kroah-Hartman
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Marshall @ 2024-11-04 4:25 UTC (permalink / raw)
To: Jens Axboe, Keith Busch; +Cc: io-uring, Greg Kroah-Hartman, stable
On Sun, Nov 3, 2024, at 21:38, Jens Axboe wrote:
> On 11/3/24 5:06 PM, Jens Axboe wrote:
>> On 11/3/24 5:01 PM, Keith Busch wrote:
>>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>>> problematic commit simply by browsing git log. As indicated above;
>>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>>> minimal reproducer, but can provide my large one (it is testing a
>>>>> larger build process in a VM) if needed?there are some additional
>>>>> details in the above-linked downstream bug report, though. I hope that
>>>>> having identified the problematic commit is enough for someone with
>>>>> more context to go off of. Happy to provide more information if
>>>>> needed.
>>>>
>>>> Don't worry about not having a reproducer, having the backport commit
>>>> pin pointed will do just fine. I'll take a look at this.
>>>
>>> I think stable is missing:
>>>
>>> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
>>
>> I think you need to go back further than that, this one already
>> unconditionally holds ->uring_lock around overflow flushing...
>
> Took a look, it's this one:
>
> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
> Author: Pavel Begunkov <asml.silence@gmail.com>
> Date: Wed Apr 10 02:26:54 2024 +0100
>
> io_uring: always lock __io_cqring_overflow_flush
>
> Greg/stable, can you pick this one for 6.6-stable? It picks
> cleanly.
>
> For 6.1, which is the other stable of that age that has the backport,
> the attached patch will do the trick.
>
> With that, I believe it should be sorted. Hopefully that can make
> 6.6.60 and 6.1.116.
>
> --
> Jens Axboe
> Attachments:
> * 0001-io_uring-always-lock-__io_cqring_overflow_flush.patch
Cherry-picking 6b231248e97fc3 onto 6.6.59, I can confirm it passes my reproducer (run a few times). Your first quick patch also passed, for what it’s worth. Thanks for the quick responses!
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
2024-11-04 4:25 ` Andrew Marshall
@ 2024-11-04 13:17 ` Andrew Marshall
2024-11-04 15:58 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Marshall @ 2024-11-04 13:17 UTC (permalink / raw)
To: Jens Axboe, Keith Busch; +Cc: io-uring, Greg Kroah-Hartman, stable
On Sun, Nov 3, 2024, at 23:25, Andrew Marshall wrote:
> On Sun, Nov 3, 2024, at 21:38, Jens Axboe wrote:
>> On 11/3/24 5:06 PM, Jens Axboe wrote:
>>> On 11/3/24 5:01 PM, Keith Busch wrote:
>>>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>>>> problematic commit simply by browsing git log. As indicated above;
>>>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>>>> minimal reproducer, but can provide my large one (it is testing a
>>>>>> larger build process in a VM) if needed?there are some additional
>>>>>> details in the above-linked downstream bug report, though. I hope that
>>>>>> having identified the problematic commit is enough for someone with
>>>>>> more context to go off of. Happy to provide more information if
>>>>>> needed.
>>>>>
>>>>> Don't worry about not having a reproducer, having the backport commit
>>>>> pin pointed will do just fine. I'll take a look at this.
>>>>
>>>> I think stable is missing:
>>>>
>>>> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
>>>
>>> I think you need to go back further than that, this one already
>>> unconditionally holds ->uring_lock around overflow flushing...
>>
>> Took a look, it's this one:
>>
>> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
>> Author: Pavel Begunkov <asml.silence@gmail.com>
>> Date: Wed Apr 10 02:26:54 2024 +0100
>>
>> io_uring: always lock __io_cqring_overflow_flush
>>
>> Greg/stable, can you pick this one for 6.6-stable? It picks
>> cleanly.
>>
>> For 6.1, which is the other stable of that age that has the backport,
>> the attached patch will do the trick.
>>
>> With that, I believe it should be sorted. Hopefully that can make
>> 6.6.60 and 6.1.116.
>>
>> --
>> Jens Axboe
>> Attachments:
>> * 0001-io_uring-always-lock-__io_cqring_overflow_flush.patch
>
> Cherry-picking 6b231248e97fc3 onto 6.6.59, I can confirm it passes my
> reproducer (run a few times). Your first quick patch also passed, for
> what it’s worth. Thanks for the quick responses!
Correction: I cherry-picked and tested 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 (which was the change you identified), not 6b231248e97fc3. Apologies for any confusion.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
2024-11-04 13:17 ` Andrew Marshall
@ 2024-11-04 15:58 ` Jens Axboe
0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-11-04 15:58 UTC (permalink / raw)
To: Andrew Marshall, Keith Busch; +Cc: io-uring, Greg Kroah-Hartman, stable
On 11/4/24 6:17 AM, Andrew Marshall wrote:
> On Sun, Nov 3, 2024, at 23:25, Andrew Marshall wrote:
>> On Sun, Nov 3, 2024, at 21:38, Jens Axboe wrote:
>>> On 11/3/24 5:06 PM, Jens Axboe wrote:
>>>> On 11/3/24 5:01 PM, Keith Busch wrote:
>>>>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>>>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>>>>> problematic commit simply by browsing git log. As indicated above;
>>>>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>>>>> minimal reproducer, but can provide my large one (it is testing a
>>>>>>> larger build process in a VM) if needed?there are some additional
>>>>>>> details in the above-linked downstream bug report, though. I hope that
>>>>>>> having identified the problematic commit is enough for someone with
>>>>>>> more context to go off of. Happy to provide more information if
>>>>>>> needed.
>>>>>>
>>>>>> Don't worry about not having a reproducer, having the backport commit
>>>>>> pin pointed will do just fine. I'll take a look at this.
>>>>>
>>>>> I think stable is missing:
>>>>>
>>>>> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
>>>>
>>>> I think you need to go back further than that, this one already
>>>> unconditionally holds ->uring_lock around overflow flushing...
>>>
>>> Took a look, it's this one:
>>>
>>> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
>>> Author: Pavel Begunkov <asml.silence@gmail.com>
>>> Date: Wed Apr 10 02:26:54 2024 +0100
>>>
>>> io_uring: always lock __io_cqring_overflow_flush
>>>
>>> Greg/stable, can you pick this one for 6.6-stable? It picks
>>> cleanly.
>>>
>>> For 6.1, which is the other stable of that age that has the backport,
>>> the attached patch will do the trick.
>>>
>>> With that, I believe it should be sorted. Hopefully that can make
>>> 6.6.60 and 6.1.116.
>>>
>>> --
>>> Jens Axboe
>>> Attachments:
>>> * 0001-io_uring-always-lock-__io_cqring_overflow_flush.patch
>>
>> Cherry-picking 6b231248e97fc3 onto 6.6.59, I can confirm it passes my
>> reproducer (run a few times). Your first quick patch also passed, for
>> what it?s worth. Thanks for the quick responses!
>
> Correction: I cherry-picked and tested
> 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 (which was the change you
> identified), not 6b231248e97fc3. Apologies for any confusion.
Thanks for clarifying, so it's as expected. Hopefully -stable can pick
this backport up soonish, so the next stable release will be sorted.
Thanks for reporting the issue!
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
2024-11-04 2:38 ` Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59") Jens Axboe
2024-11-04 4:25 ` Andrew Marshall
@ 2024-11-06 6:05 ` Greg Kroah-Hartman
2024-11-06 14:11 ` Jens Axboe
1 sibling, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2024-11-06 6:05 UTC (permalink / raw)
To: Jens Axboe; +Cc: Keith Busch, Andrew Marshall, io-uring, stable
On Sun, Nov 03, 2024 at 07:38:30PM -0700, Jens Axboe wrote:
> On 11/3/24 5:06 PM, Jens Axboe wrote:
> > On 11/3/24 5:01 PM, Keith Busch wrote:
> >> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
> >>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
> >>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
> >>>> problematic commit simply by browsing git log. As indicated above;
> >>>> reverting that atop 6.6.59 results in success. Since it is passing on
> >>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
> >>>> other semantic merge conflict. Unfortunately I do not have a compact,
> >>>> minimal reproducer, but can provide my large one (it is testing a
> >>>> larger build process in a VM) if needed?there are some additional
> >>>> details in the above-linked downstream bug report, though. I hope that
> >>>> having identified the problematic commit is enough for someone with
> >>>> more context to go off of. Happy to provide more information if
> >>>> needed.
> >>>
> >>> Don't worry about not having a reproducer, having the backport commit
> >>> pin pointed will do just fine. I'll take a look at this.
> >>
> >> I think stable is missing:
> >>
> >> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
> >
> > I think you need to go back further than that, this one already
> > unconditionally holds ->uring_lock around overflow flushing...
>
> Took a look, it's this one:
>
> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
> Author: Pavel Begunkov <asml.silence@gmail.com>
> Date: Wed Apr 10 02:26:54 2024 +0100
>
> io_uring: always lock __io_cqring_overflow_flush
>
> Greg/stable, can you pick this one for 6.6-stable? It picks
> cleanly.
>
> For 6.1, which is the other stable of that age that has the backport,
> the attached patch will do the trick.
>
> With that, I believe it should be sorted. Hopefully that can make
> 6.6.60 and 6.1.116.
Now queued up, thanks.
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
2024-11-06 6:05 ` Greg Kroah-Hartman
@ 2024-11-06 14:11 ` Jens Axboe
0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2024-11-06 14:11 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: Keith Busch, Andrew Marshall, io-uring, stable
On 11/5/24 11:05 PM, Greg Kroah-Hartman wrote:
> On Sun, Nov 03, 2024 at 07:38:30PM -0700, Jens Axboe wrote:
>> On 11/3/24 5:06 PM, Jens Axboe wrote:
>>> On 11/3/24 5:01 PM, Keith Busch wrote:
>>>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>>>> problematic commit simply by browsing git log. As indicated above;
>>>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>>>> minimal reproducer, but can provide my large one (it is testing a
>>>>>> larger build process in a VM) if needed?there are some additional
>>>>>> details in the above-linked downstream bug report, though. I hope that
>>>>>> having identified the problematic commit is enough for someone with
>>>>>> more context to go off of. Happy to provide more information if
>>>>>> needed.
>>>>>
>>>>> Don't worry about not having a reproducer, having the backport commit
>>>>> pin pointed will do just fine. I'll take a look at this.
>>>>
>>>> I think stable is missing:
>>>>
>>>> 6b231248e97fc3 ("io_uring: consolidate overflow flushing")
>>>
>>> I think you need to go back further than that, this one already
>>> unconditionally holds ->uring_lock around overflow flushing...
>>
>> Took a look, it's this one:
>>
>> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
>> Author: Pavel Begunkov <asml.silence@gmail.com>
>> Date: Wed Apr 10 02:26:54 2024 +0100
>>
>> io_uring: always lock __io_cqring_overflow_flush
>>
>> Greg/stable, can you pick this one for 6.6-stable? It picks
>> cleanly.
>>
>> For 6.1, which is the other stable of that age that has the backport,
>> the attached patch will do the trick.
>>
>> With that, I believe it should be sorted. Hopefully that can make
>> 6.6.60 and 6.1.116.
>
> Now queued up, thanks.
Thanks Greg!
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-11-06 14:11 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <3d913aef-8c44-4f50-9bdf-7d9051b08941@app.fastmail.com>
[not found] ` <cc8b92ba-2daa-49e3-abe6-39e7d79f213d@kernel.dk>
[not found] ` <ZygO7O1Pm5lYbNkP@kbusch-mbp>
[not found] ` <25c4c665-1a33-456c-93c7-8b7b56c0e6db@kernel.dk>
2024-11-04 2:38 ` Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59") Jens Axboe
2024-11-04 4:25 ` Andrew Marshall
2024-11-04 13:17 ` Andrew Marshall
2024-11-04 15:58 ` Jens Axboe
2024-11-06 6:05 ` Greg Kroah-Hartman
2024-11-06 14:11 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox