All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Nadav Amit <nadav.amit@gmail.com>,
	Olivier Langlois <olivier@trillion01.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work
Date: Tue, 10 Aug 2021 22:32:59 +0100	[thread overview]
Message-ID: <bbd25a42-eac0-a8f9-0e54-3c8c8e9894fd@gmail.com> (raw)
In-Reply-To: <A4DC14BA-74CA-41DB-BE08-D7B693C11AE0@gmail.com>

On 8/10/21 9:28 AM, Nadav Amit wrote:
>> On Aug 9, 2021, at 2:48 PM, Olivier Langlois <olivier@trillion01.com> wrote:
>> On Sat, 2021-08-07 at 17:13 -0700, Nadav Amit wrote:
>>> From: Nadav Amit <namit@vmware.com>
>>>
>>> When using SQPOLL, the submission queue polling thread calls
>>> task_work_run() to run queued work. However, when work is added with
>>> TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
>>> set afterwards and is never cleared.
>>>
>>> Consequently, when the submission queue polling thread checks whether
>>> signal_pending(), it may always find a pending signal, if
>>> task_work_add() was ever called before.
>>>
>>> The impact of this bug might be different on different kernel versions.
>>> It appears that on 5.14 it would only cause unnecessary calculation and
>>> prevent the polling thread from sleeping. On 5.13, where the bug was
>>> found, it stops the polling thread from finding newly submitted work.
>>>
>>> Instead of task_work_run(), use tracehook_notify_signal() that clears
>>> TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
>>> current->task_works to avoid a race in which task_works is cleared but
>>> the TIF_NOTIFY_SIGNAL is set.
>>
>> thx a lot for this patch!
>>
>> This explains what I am seeing here:
>> https://lore.kernel.org/io-uring/4d93d0600e4a9590a48d320c5a7dd4c54d66f095.camel@trillion01.com/
>>
>> I was under the impression that task_work_run() was clearing
>> TIF_NOTIFY_SIGNAL.
>>
>> your patch made me realize that it does not…
> 
> Happy it could help.
> 
> Unfortunately, there seems to be yet another issue (unless my code
> somehow caused it). It seems that when SQPOLL is used, there are cases
> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
> never goes down to zero.
> 
> Debugging... (while also trying to make some progress with my code)

It's most likely because a request has been lost (mis-refcounted).
Let us know if you need any help. Would be great to solve it for 5.14.
quick tips: 

1) if not already, try out Jens' 5.14 branch
git://git.kernel.dk/linux-block io_uring-5.14

2) try to characterise the io_uring use pattern. Poll requests?
Read/write requests? Send/recv? Filesystem vs bdev vs sockets?

If easily reproducible, you can match io_alloc_req() with it
getting into io_dismantle_req();

-- 
Pavel Begunkov

  parent reply	other threads:[~2021-08-10 21:33 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-08  0:13 [PATCH 0/2] io_uring: bug fixes Nadav Amit
2021-08-08  0:13 ` [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work Nadav Amit
2021-08-08 12:55   ` Pavel Begunkov
2021-08-08 17:31     ` Nadav Amit
2021-08-09  4:07       ` Hao Xu
2021-08-09  4:50         ` Nadav Amit
2021-08-09 10:35           ` Pavel Begunkov
2021-08-09 10:18       ` Pavel Begunkov
2021-08-09 21:48   ` Olivier Langlois
2021-08-10  8:28     ` Nadav Amit
2021-08-10 13:33       ` Olivier Langlois
2021-08-10 21:32       ` Pavel Begunkov [this message]
2021-08-11  2:33         ` Nadav Amit
2021-08-11  2:51           ` Jens Axboe
2021-08-11  5:40             ` I/O cancellation in io-uring (was: io_uring: clear TIF_NOTIFY_SIGNAL ...) Nadav Amit
2021-08-08  0:13 ` [PATCH 2/2] io_uring: Use WRITE_ONCE() when writing to sq_flags Nadav Amit
2021-08-09 13:53 ` [PATCH 0/2] io_uring: bug fixes Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bbd25a42-eac0-a8f9-0e54-3c8c8e9894fd@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=olivier@trillion01.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.