From: Jens Axboe <axboe@kernel.dk>
To: Andres Freund <andres@anarazel.de>
Cc: gregkh@linuxfoundation.org, asml.silence@gmail.com,
	stable@vger.kernel.org
Subject: Re: FAILED: patch "[PATCH] io_uring: Use io_schedule* in cqring wait" failed to apply to 6.1-stable tree
Date: Mon, 17 Jul 2023 10:32:18 -0600
Message-ID: <26f0740e-06a5-bbf1-e973-956f23f36cce@kernel.dk>
In-Reply-To: <46c1075b-0daf-14db-cf48-5a5105b996de@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 1871 bytes --]

On 7/16/23 1:19 PM, Jens Axboe wrote:
> On 7/16/23 1:11 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2023-07-16 12:13:45 -0600, Jens Axboe wrote:
>>> Here's one for 6.1-stable.
>>
>> Thanks for working on that!
>>
>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index cc35aba1e495..de117d3424b2 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -2346,7 +2346,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>>  					  struct io_wait_queue *iowq,
>>>  					  ktime_t *timeout)
>>>  {
>>> -	int ret;
>>> +	int token, ret;
>>>  	unsigned long check_cq;
>>>  
>>>  	/* make sure we run task_work before checking for signals */
>>> @@ -2362,9 +2362,18 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>>  		if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
>>>  			return -EBADR;
>>>  	}
>>> +
>>> +	/*
>>> +	 * Use io_schedule_prepare/finish, so cpufreq can take into account
>>> +	 * that the task is waiting for IO - turns out to be important for low
>>> +	 * QD IO.
>>> +	 */
>>> +	token = io_schedule_prepare();
>>> +	ret = 0;
>>>  	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
>>> -		return -ETIME;
>>> -	return 1;
>>> +		ret = -ETIME;
>>> +	io_schedule_finish(token);
>>> +	return ret;
>>>  }
>>
>> To me it looks like this might have changed more than intended? Previously,
>> io_cqring_wait_schedule() returned 1 when schedule_hrtimeout() returned
>> non-zero; now it returns 0 in that case. Am I missing something?
> 
> Ah shoot yes indeed. Greg, can you drop the 5.10/5.15/6.1 ones for now?
> I'll get it sorted tomorrow. Sorry about that, and thanks for catching
> that Andres!

Greg, can you pick up these two for 5.10-stable and 5.15-stable? While
running testing, I noticed another backport that was missing, so I added
that as well.
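
For clarity, here is a minimal sketch (not the final backport) of how the
tail of the corrected 6.1 hunk is expected to look: it keeps the original
"return 1" wake-up value and only adds the io_schedule bracketing, the same
way the 5.10/5.15 patch attached below does:

	/*
	 * Sketch only: 1 means we were woken before the timeout expired,
	 * -ETIME means schedule_hrtimeout() ran out. The only new part
	 * relative to the original code is the io_schedule_prepare()/
	 * io_schedule_finish() pair around the sleep.
	 */
	token = io_schedule_prepare();
	ret = 1;
	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
		ret = -ETIME;
	io_schedule_finish(token);
	return ret;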

-- 
Jens Axboe

[-- Attachment #2: 0002-io_uring-add-reschedule-point-to-handle_tw_list.patch --]
[-- Type: text/x-patch, Size: 1157 bytes --]

From 4e214e7e01158a87308a17766706159bca472855 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@kernel.dk>
Date: Mon, 17 Jul 2023 10:27:20 -0600
Subject: [PATCH 2/2] io_uring: add reschedule point to handle_tw_list()

Commit f58680085478dd292435727210122960d38e8014 upstream.

If CONFIG_PREEMPT_NONE is set and the task_work chains are long, we
could be running into issues blocking others for too long. Add a
reschedule check in handle_tw_list(), and flush the ctx if we need to
reschedule.

Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 33d4a2871dbb..eae7a3d89397 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2216,9 +2216,12 @@ static void tctx_task_work(struct callback_head *cb)
 			}
 			req->io_task_work.func(req, &locked);
 			node = next;
+			if (unlikely(need_resched())) {
+				ctx_flush_and_put(ctx, &locked);
+				ctx = NULL;
+				cond_resched();
+			}
 		} while (node);
-
-		cond_resched();
 	}
 
 	ctx_flush_and_put(ctx, &locked);
-- 
2.40.1


[-- Attachment #3: 0001-io_uring-Use-io_schedule-in-cqring-wait.patch --]
[-- Type: text/x-patch, Size: 2770 bytes --]

From c8c88d523c89e0ac8affbf2fd57def82e0d5d4bf Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 16 Jul 2023 12:07:03 -0600
Subject: [PATCH 1/2] io_uring: Use io_schedule* in cqring wait

Commit 8a796565cec3601071cbbd27d6304e202019d014 upstream.

I observed poor performance of io_uring compared to synchronous IO. That
turns out to be caused by deeper CPU idle states entered with io_uring,
due to io_uring using plain schedule(), whereas synchronous IO uses
io_schedule().

The losses due to this are substantial. On my cascade lake workstation,
t/io_uring from the fio repository e.g. yields regressions between 20%
and 40% with the following command:
./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0

This is repeatable with different filesystems, using raw block devices
and using different block devices.

Use io_schedule_prepare() / io_schedule_finish() in
io_cqring_wait_schedule() to address the difference.

After that using io_uring is on par or surpassing synchronous IO (using
registered files etc makes it reliably win, but arguably is a less fair
comparison).

There are other calls to schedule() in io_uring/, but none immediately
jump out to be similarly situated, so I did not touch them. Similarly,
it's possible that mutex_lock_io() should be used, but it's not clear if
there are cases where that matters.

Cc: stable@vger.kernel.org # 5.10+
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: io-uring@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andres Freund <andres@anarazel.de>
Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de
[axboe: minor style fixup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e633799c9cea..33d4a2871dbb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -7785,7 +7785,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq,
 					  ktime_t *timeout)
 {
-	int ret;
+	int token, ret;
 
 	/* make sure we run task_work before checking for signals */
 	ret = io_run_task_work_sig();
@@ -7795,9 +7795,17 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	if (test_bit(0, &ctx->check_cq_overflow))
 		return 1;
 
+	/*
+	 * Use io_schedule_prepare/finish, so cpufreq can take into account
+	 * that the task is waiting for IO - turns out to be important for low
+	 * QD IO.
+	 */
+	token = io_schedule_prepare();
+	ret = 1;
 	if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
-		return -ETIME;
-	return 1;
+		ret = -ETIME;
+	io_schedule_finish(token);
+	return ret;
 }
 
 /*
-- 
2.40.1


