Linux kernel -stable discussions
* Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
From: Jens Axboe @ 2024-11-04  2:38 UTC
  To: Keith Busch; +Cc: Andrew Marshall, io-uring, Greg Kroah-Hartman, stable

[-- Attachment #1: Type: text/plain, Size: 1802 bytes --]

On 11/3/24 5:06 PM, Jens Axboe wrote:
> On 11/3/24 5:01 PM, Keith Busch wrote:
>> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote:
>>> On 11/3/24 4:47 PM, Andrew Marshall wrote:
>>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely
>>>> problematic commit simply by browsing git log. As indicated above,
>>>> reverting that atop 6.6.59 results in success. Since it is passing on
>>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some
>>>> other semantic merge conflict. Unfortunately I do not have a compact,
>>>> minimal reproducer, but can provide my large one (it is testing a
>>>> larger build process in a VM) if needed; there are some additional
>>>> details in the above-linked downstream bug report, though. I hope that
>>>> having identified the problematic commit is enough for someone with
>>>> more context to go off of. Happy to provide more information if
>>>> needed.
>>>
>>> Don't worry about not having a reproducer; having the backport commit
>>> pinpointed will do just fine. I'll take a look at this.
>>
>> I think stable is missing:
>>
>>   6b231248e97fc3 ("io_uring: consolidate overflow flushing")
> 
> I think you need to go back further than that, this one already
> unconditionally holds ->uring_lock around overflow flushing...

Took a look, it's this one:

commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Wed Apr 10 02:26:54 2024 +0100

    io_uring: always lock __io_cqring_overflow_flush

Greg/stable, can you pick this one for 6.6-stable? It picks
cleanly.

For 6.1, which is the other stable of that age that has the backport,
the attached patch will do the trick.

With that, I believe it should be sorted. Hopefully that can make
6.6.60 and 6.1.116.

-- 
Jens Axboe

[-- Attachment #2: 0001-io_uring-always-lock-__io_cqring_overflow_flush.patch --]
[-- Type: text/x-patch, Size: 1966 bytes --]

From 3f1c33f03386c481caf2044a836f3ca611094098 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence@gmail.com>
Date: Wed, 10 Apr 2024 02:26:54 +0100
Subject: [PATCH] io_uring: always lock __io_cqring_overflow_flush

Commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 upstream.

Conditional locking is never great, in case of
__io_cqring_overflow_flush(), which is a slow path, it's not justified.
Don't handle IOPOLL separately, always grab uring_lock for overflow
flushing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f902b161f02c..92c1aa8f3501 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -593,6 +593,8 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
 	bool all_flushed;
 	size_t cqe_size = sizeof(struct io_uring_cqe);
 
+	lockdep_assert_held(&ctx->uring_lock);
+
 	if (!force && __io_cqring_events(ctx) == ctx->cq_entries)
 		return false;
 
@@ -647,12 +649,9 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx)
 	bool ret = true;
 
 	if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
-		/* iopoll syncs against uring_lock, not completion_lock */
-		if (ctx->flags & IORING_SETUP_IOPOLL)
-			mutex_lock(&ctx->uring_lock);
+		mutex_lock(&ctx->uring_lock);
 		ret = __io_cqring_overflow_flush(ctx, false);
-		if (ctx->flags & IORING_SETUP_IOPOLL)
-			mutex_unlock(&ctx->uring_lock);
+		mutex_unlock(&ctx->uring_lock);
 	}
 
 	return ret;
@@ -1405,6 +1404,8 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 	int ret = 0;
 	unsigned long check_cq;
 
+	lockdep_assert_held(&ctx->uring_lock);
+
 	if (!io_allowed_run_tw(ctx))
 		return -EEXIST;
 
-- 
2.45.2

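The locking rule this patch adopts (always take uring_lock around the overflow flush, and have the flush function assert that the lock is held) can be illustrated with a minimal userspace sketch. It assumes POSIX threads as a stand-in for the kernel mutex and a thread-local owner check in place of lockdep_assert_held(); struct ring_ctx, ctx_lock(), overflow_flush() and the other names are hypothetical and only mirror the shape of the io_uring code, not its real data structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_ring_ctx; not the kernel's layout. */
struct ring_ctx {
	pthread_mutex_t uring_lock;	/* plays the role of ctx->uring_lock */
	bool iopoll;			/* plays the role of IORING_SETUP_IOPOLL */
	unsigned overflowed;		/* CQEs parked on the overflow list */
	unsigned cq_events;		/* CQEs visible in the CQ ring */
};

/* Per-thread record of which ctx lock this thread holds (one at a time in this sketch). */
static _Thread_local const struct ring_ctx *held_ctx;

static void ctx_lock(struct ring_ctx *ctx)
{
	pthread_mutex_lock(&ctx->uring_lock);
	held_ctx = ctx;
}

static void ctx_unlock(struct ring_ctx *ctx)
{
	held_ctx = NULL;
	pthread_mutex_unlock(&ctx->uring_lock);
}

/* Rough userspace analogue of lockdep_assert_held(&ctx->uring_lock). */
static bool ctx_lock_held(const struct ring_ctx *ctx)
{
	return held_ctx == ctx;
}

/* Slow path: move overflowed CQEs into the ring. Caller must hold the lock. */
static bool overflow_flush_locked(struct ring_ctx *ctx)
{
	assert(ctx_lock_held(ctx));
	ctx->cq_events += ctx->overflowed;
	ctx->overflowed = 0;
	return true;
}

/* Post-patch shape: lock unconditionally, with no IOPOLL special case. */
static bool overflow_flush(struct ring_ctx *ctx)
{
	bool ret = true;

	if (ctx->overflowed) {
		ctx_lock(ctx);
		ret = overflow_flush_locked(ctx);
		ctx_unlock(ctx);
	}
	return ret;
}

/*
 * Pre-patch shape, for contrast: only IOPOLL rings take the lock here, so
 * calling this on a non-IOPOLL ring would trip the assertion in
 * overflow_flush_locked() in this sketch.
 */
static bool overflow_flush_conditional(struct ring_ctx *ctx)
{
	bool ret = true;

	if (ctx->overflowed) {
		if (ctx->iopoll)
			ctx_lock(ctx);
		ret = overflow_flush_locked(ctx);
		if (ctx->iopoll)
			ctx_unlock(ctx);
	}
	return ret;
}
```

In this sketch, overflow_flush() behaves the same for IOPOLL and non-IOPOLL rings, while overflow_flush_conditional() only satisfies its assertion when iopoll is set. That asymmetry is the kind of conditional locking the commit message calls not justified on a slow path; the assertion turns the locking contract into something a debug build can check.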


* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
From: Andrew Marshall @ 2024-11-04  4:25 UTC
  To: Jens Axboe, Keith Busch; +Cc: io-uring, Greg Kroah-Hartman, stable

On Sun, Nov 3, 2024, at 21:38, Jens Axboe wrote:
> Took a look, it's this one:
>
> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
> Author: Pavel Begunkov <asml.silence@gmail.com>
> Date:   Wed Apr 10 02:26:54 2024 +0100
>
>     io_uring: always lock __io_cqring_overflow_flush
>
> Greg/stable, can you pick this one for 6.6-stable? It picks
> cleanly.
>
> For 6.1, which is the other stable of that age that has the backport,
> the attached patch will do the trick.
>
> With that, I believe it should be sorted. Hopefully that can make
> 6.6.60 and 6.1.116.
>
> -- 
> Jens Axboe
> Attachments:
> * 0001-io_uring-always-lock-__io_cqring_overflow_flush.patch

Cherry-picking 6b231248e97fc3 onto 6.6.59, I can confirm it passes my reproducer (run a few times). Your first quick patch also passed, for what it’s worth. Thanks for the quick responses!


* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
From: Andrew Marshall @ 2024-11-04 13:17 UTC
  To: Jens Axboe, Keith Busch; +Cc: io-uring, Greg Kroah-Hartman, stable

On Sun, Nov 3, 2024, at 23:25, Andrew Marshall wrote:
> Cherry-picking 6b231248e97fc3 onto 6.6.59, I can confirm it passes my 
> reproducer (run a few times). Your first quick patch also passed, for 
> what it’s worth. Thanks for the quick responses!

Correction: I cherry-picked and tested 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 (which was the change you identified), not 6b231248e97fc3. Apologies for any confusion.


* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
From: Jens Axboe @ 2024-11-04 15:58 UTC
  To: Andrew Marshall, Keith Busch; +Cc: io-uring, Greg Kroah-Hartman, stable

On 11/4/24 6:17 AM, Andrew Marshall wrote:
> Correction: I cherry-picked and tested
> 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 (which was the change you
> identified), not 6b231248e97fc3. Apologies for any confusion.

Thanks for clarifying, so it's as expected. Hopefully -stable can pick
this backport up soonish, so the next stable release will be sorted.
Thanks for reporting the issue!

-- 
Jens Axboe


* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
From: Greg Kroah-Hartman @ 2024-11-06  6:05 UTC
  To: Jens Axboe; +Cc: Keith Busch, Andrew Marshall, io-uring, stable

On Sun, Nov 03, 2024 at 07:38:30PM -0700, Jens Axboe wrote:
> Took a look, it's this one:
> 
> commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998
> Author: Pavel Begunkov <asml.silence@gmail.com>
> Date:   Wed Apr 10 02:26:54 2024 +0100
> 
>     io_uring: always lock __io_cqring_overflow_flush
> 
> Greg/stable, can you pick this one for 6.6-stable? It picks
> cleanly.
> 
> For 6.1, which is the other stable of that age that has the backport,
> the attached patch will do the trick.
> 
> With that, I believe it should be sorted. Hopefully that can make
> 6.6.60 and 6.1.116.

Now queued up, thanks.

greg k-h


* Re: Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")
From: Jens Axboe @ 2024-11-06 14:11 UTC
  To: Greg Kroah-Hartman; +Cc: Keith Busch, Andrew Marshall, io-uring, stable

On 11/5/24 11:05 PM, Greg Kroah-Hartman wrote:
> Now queued up, thanks.

Thanks Greg!

-- 
Jens Axboe
