From: Breno Leitao <leitao@debian.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>,
leit@meta.com, "open list:IO_URING" <io-uring@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
Date: Tue, 7 May 2024 03:44:52 -0700 [thread overview]
Message-ID: <ZjoGJH1CEk+f+U7n@gmail.com> (raw)
In-Reply-To: <d05aa530-f0f5-4ec2-91ae-b193ae644395@kernel.dk>
On Fri, May 03, 2024 at 12:32:38PM -0600, Jens Axboe wrote:
> On 5/3/24 11:37 AM, Breno Leitao wrote:
> > Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq
> > to address potential data races.
> >
> > The structure io_worker->flags may be accessed through parallel data
> > paths, leading to concurrency issues. When KCSAN is enabled, it reveals
> > data races occurring in io_worker_handle_work and
> > io_wq_activate_free_worker functions.
> >
> > BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker
> > write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28:
> > io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569)
> > io_wq_worker (io_uring/io-wq.c:?)
> > <snip>
> >
> > read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5:
> > io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285)
> > io_wq_enqueue (io_uring/io-wq.c:947)
> > io_queue_iowq (io_uring/io_uring.c:524)
> > io_req_task_submit (io_uring/io_uring.c:1511)
> > io_handle_tw_list (io_uring/io_uring.c:1198)
> >
> > Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of
> > git://git.kernel.org/pub/scm/virt/kvm/kvm").
> >
> > These races involve writes and reads to the same memory location by
> > different tasks running on different CPUs. To mitigate this, refactor
> > the code to use atomic operations such as set_bit(), test_bit(), and
> > clear_bit() instead of basic "and" and "or" operations. This ensures
> > thread-safe manipulation of worker flags.
>
> Looks good, a few comments for v2:
>
> > diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
> > index 522196dfb0ff..6712d70d1f18 100644
> > --- a/io_uring/io-wq.c
> > +++ b/io_uring/io-wq.c
> > @@ -44,7 +44,7 @@ enum {
> > */
> > struct io_worker {
> > refcount_t ref;
> > - unsigned flags;
> > + unsigned long flags;
> > struct hlist_nulls_node nulls_node;
> > struct list_head all_list;
> > struct task_struct *task;
>
> This now creates a hole in the struct, maybe move 'lock' up after ref so
> that it gets filled and the current hole after 'lock' gets removed as
> well?
I am not sure I can see it. From my tests, we got the same hole, and the
struct size is the same. This is what I got with the change:
struct io_worker {
refcount_t ref; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
raw_spinlock_t lock; /* 8 64 */
/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
<snip>
/* size: 336, cachelines: 6, members: 14 */
/* sum members: 328, holes: 2, sum holes: 8 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
/* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));
This is what this current patch returns:
struct io_worker {
refcount_t ref; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
long unsigned int flags; /* 8 8 */
<snip>
/* size: 336, cachelines: 6, members: 14 */
/* sum members: 328, holes: 2, sum holes: 8 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
/* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));
A possible suggestion is to move `create_index` after `ref. Then we can
get a more packed structure:
struct io_worker {
refcount_t ref; /* 0 4 */
int create_index; /* 4 4 */
long unsigned int flags; /* 8 8 */
struct hlist_nulls_node nulls_node; /* 16 16 */
struct list_head all_list; /* 32 16 */
struct task_struct * task; /* 48 8 */
struct io_wq * wq; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct io_wq_work * cur_work; /* 64 8 */
struct io_wq_work * next_work; /* 72 8 */
raw_spinlock_t lock; /* 80 64 */
/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
struct completion ref_done; /* 144 88 */
/* --- cacheline 3 boundary (192 bytes) was 40 bytes ago --- */
long unsigned int create_state; /* 232 8 */
struct callback_head create_work __attribute__((__aligned__(8))); /* 240 16 */
/* --- cacheline 4 boundary (256 bytes) --- */
union {
struct callback_head rcu __attribute__((__aligned__(8))); /* 256 16 */
struct work_struct work; /* 256 72 */
} __attribute__((__aligned__(8))); /* 256 72 */
/* size: 328, cachelines: 6, members: 14 */
/* forced alignments: 2 */
/* last cacheline: 8 bytes */
} __attribute__((__aligned__(8)));
How does it sound?
> And then I'd renumber the flags, they take bit offsets, not
> masks/values. Otherwise it's a bit confusing for someone reading the
> code, using masks with test/set bit functions.
Good point. What about something like?
enum {
IO_WORKER_F_UP = 0, /* up and active */
IO_WORKER_F_RUNNING = 1, /* account as running */
IO_WORKER_F_FREE = 2, /* worker on free list */
IO_WORKER_F_BOUND = 3, /* is doing bounded work */
};
Since we are now using WRITE_ONCE() in io_wq_worker, I am wondering if
this is what we want to do?
WRITE_ONCE(worker->flags, (IO_WORKER_F_UP| IO_WORKER_F_RUNNING) << 1);
Thanks
next prev parent reply other threads:[~2024-05-07 10:44 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-03 17:37 [PATCH] io_uring/io-wq: Use set_bit() and test_bit() at worker->flags Breno Leitao
2024-05-03 18:32 ` Jens Axboe
2024-05-07 10:44 ` Breno Leitao [this message]
2024-05-07 11:02 ` Breno Leitao
2024-05-07 13:28 ` Jens Axboe
2024-05-03 18:41 ` Jens Axboe
2024-05-03 19:24 ` Christophe JAILLET
2024-05-03 19:36 ` Jens Axboe
2024-05-07 9:24 ` Breno Leitao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZjoGJH1CEk+f+U7n@gmail.com \
--to=leitao@debian.org \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=leit@meta.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.