Re: [syzbot] KCSAN: data-race in __io_uring_cancel / io_uring_try_cancel_requests

From: Marco Elver <elver@google.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	syzbot <syzbot+73554e2258b7b8bf0bbf@syzkaller.appspotmail.com>,
	io-uring@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Dmitry Vyukov <dvyukov@google.com>
Subject: Re: [syzbot] KCSAN: data-race in __io_uring_cancel / io_uring_try_cancel_requests
Date: Thu, 27 May 2021 11:32:34 +0200
Message-ID: <YK9nMgamPsr9YsoY@elver.google.com>
In-Reply-To: <5cf2250a-c580-4dbf-5997-e987c7b71086@gmail.com>

On Wed, May 26, 2021 at 09:31PM +0100, Pavel Begunkov wrote:
> On 5/26/21 5:36 PM, Marco Elver wrote:
> > On Wed, 26 May 2021 at 18:29, Pavel Begunkov <asml.silence@gmail.com> wrote:
> >> On 5/26/21 4:52 PM, Marco Elver wrote:
> >>> Due to some moving around of code, the patch lost the actual fix (using
> >>> atomically read io_wq) -- so here it is again ... hopefully as intended.
> >>> :-)
> >>
> >> "fortify" damn it... It was synchronised with &ctx->uring_lock
> >> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(),
> >> so should not clear before *del_tctx_node()
> > 
> > Ah, so if I understand right, the property stated by the comment in
> > io_uring_try_cancel_iowq() was broken, and your patch below would fix
> > that, right?
> 
> "io_uring: fortify tctx/io_wq cleanup" broke it and the diff
> should fix it.
> 
> >> The fix should just move it after this sync point. Will you send
> >> it out as a patch?
> > 
> > Do you mean your move of write to io_wq goes on top of the patch I
> > proposed? (If so, please also leave your Signed-off-by so I can squash
> > it.)
> 
> No, only my diff, but you hinted on what has happened, so I would
> prefer you to take care of patching. If you want of course.
> 
> To be entirely fair, assuming that aligned ptr
> reads can't be torn, I don't see any _real_ problem. But surely
> the report is very helpful and the current state is too wonky, so
> should be patched.

In the current version, it is a problem if we end up with a double read,
which is what the current C code does. The compiler might of course
optimize it into a single read into a register.
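
To make the double-read concern concrete, here is a minimal userspace
sketch. It is not the kernel code: the struct layouts, the function names
and the READ_ONCE() stand-in are all assumptions made for illustration.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(); an assumption here. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct wq   { int cancelled; };
struct tctx { struct wq *io_wq; };

/*
 * Racy pattern: two plain loads of t->io_wq. If another thread stores
 * NULL between the check and the use, the second load sees NULL.
 */
int cancel_double_read(struct tctx *t)
{
	if (!t->io_wq)			/* first load */
		return 0;
	t->io_wq->cancelled = 1;	/* second load */
	return 1;
}

/*
 * One marked load into a local: the check and the use are guaranteed
 * to see the same pointer value, whatever the compiler does.
 */
int cancel_single_read(struct tctx *t)
{
	struct wq *wq = READ_ONCE(t->io_wq);

	if (!wq)
		return 0;
	wq->cancelled = 1;
	return 1;
}
```

Note that the single marked read only removes the dependence on the
compiler merging the two loads; it says nothing about how long the
pointed-to object stays valid.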

Tangent: I avoid reasoning in terms of compiler optimizations where
I can. :-) It's a slippery slope if the code in question isn't
tolerant to data races by design (examples are stats counting, or other
heuristics -- which is certainly not the case here).
Therefore, my wish is that we resolve as many data races as we can
(+ mark intentional ones appropriately), so that we're left with only
the interesting cases like the one here.  (More background if you're
interested: https://lwn.net/Articles/816850/)

The problem here, however, has a nicer resolution as you suggested.

> TL;DR;
> The synchronisation goes as this: it's usually used by the owner
> task, and the owner task deletes it, so is mostly naturally
> synchronised. An exception is a worker (not only) that accesses
> it for cancellation purpose, but it uses it only under ->uring_lock,
> so if removal is also taking the lock it should be fine. see
> io_uring_del_tctx_node() locking.

Did you mean io_uring_del_task_file()? There is no
io_uring_del_tctx_node().
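
For reference, the locking rule quoted above can be sketched in userspace
roughly as follows. This only models the idea (workers reach io_wq solely
under the lock; the owner detaches under the same lock and clears the
pointer afterwards). The pthread mutex standing in for ->uring_lock, the
single "node" pointer standing in for the ctx's list of tctx nodes, and
all function names are assumptions, not the actual io_uring code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct wq   { int cancel_calls; };
struct tctx { struct wq *io_wq; };
struct ctx {
	pthread_mutex_t uring_lock;	/* stand-in for ctx->uring_lock */
	struct tctx *node;		/* stand-in for the ctx's tctx list */
};

/* Worker side: io_wq is only reached through the ctx, under the lock. */
int worker_try_cancel(struct ctx *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->uring_lock);
	if (c->node && c->node->io_wq) {
		c->node->io_wq->cancel_calls++;
		ret = 1;
	}
	pthread_mutex_unlock(&c->uring_lock);
	return ret;
}

/*
 * Owner side: unlink under the lock first (the sync point), and clear
 * io_wq only afterwards, when no worker can find the tctx any more.
 */
void owner_teardown(struct ctx *c, struct tctx *t)
{
	pthread_mutex_lock(&c->uring_lock);
	if (c->node == t)
		c->node = NULL;
	pthread_mutex_unlock(&c->uring_lock);

	t->io_wq = NULL;
}
```

Clearing io_wq before the unlink would reopen the window in which a
worker holding the lock still finds the tctx while the owner writes the
pointer without it.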

> > So if I understand right, we do in fact have 2 problems:
> > 1. the data race as I noted in my patch, and
> 
> Yes, and it deals with it
> 
> > 2. the fact that io_wq does not live long enough.
> 
> Nope, io_wq outlives them fine. 

I've sent:
https://lkml.kernel.org/r/20210527092547.2656514-1-elver@google.com

Thanks,
-- Marco