* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
[not found] ` <20191210155742.5844-8-axboe@kernel.dk>
@ 2019-12-10 22:04 ` Jann Horn
2019-12-10 22:21 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Jann Horn @ 2019-12-10 22:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring, Will Deacon, Kees Cook, Kernel Hardening
[context preserved for additional CCs]
On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
> Recently had a regression that turned out to be because
> CONFIG_REFCOUNT_FULL was set.
I assume "regression" here refers to a performance regression? Do you
have more concrete numbers on this? Is one of the refcounting calls
particularly problematic compared to the others?
I really don't like it when raw atomic_t is used for refcounting
purposes - not only because that gets rid of the overflow checks, but
also because it is less clear semantically.
> Our ref count usage is really simple,
In my opinion, for a refcount to qualify as "really simple", it must
be possible to annotate each relevant struct member and local variable
with the (fixed) bias it carries when alive and non-NULL. This
refcount is more complicated than that.
> so let's just use atomic_t and get rid of the dependency on the full
> reference count checking being enabled or disabled.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> fs/io_uring.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 9a596b819334..05419a152b32 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -360,7 +360,7 @@ struct io_kiocb {
> };
> struct list_head link_list;
> unsigned int flags;
> - refcount_t refs;
> + atomic_t refs;
> #define REQ_F_NOWAIT 1 /* must not punt to workers */
> #define REQ_F_IOPOLL_COMPLETED 2 /* polled IO has completed */
> #define REQ_F_FIXED_FILE 4 /* ctx owns file */
> @@ -770,7 +770,7 @@ static void io_cqring_fill_event(struct io_kiocb *req, long res)
> WRITE_ONCE(ctx->rings->cq_overflow,
> atomic_inc_return(&ctx->cached_cq_overflow));
> } else {
> - refcount_inc(&req->refs);
> + atomic_inc(&req->refs);
> req->result = res;
> list_add_tail(&req->list, &ctx->cq_overflow_list);
> }
> @@ -852,7 +852,7 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
> req->ctx = ctx;
> req->flags = 0;
> /* one is dropped after submission, the other at completion */
> - refcount_set(&req->refs, 2);
> + atomic_set(&req->refs, 2);
> req->result = 0;
> INIT_IO_WORK(&req->work, io_wq_submit_work);
> return req;
> @@ -1035,13 +1035,13 @@ static void io_put_req_find_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
> {
> io_req_find_next(req, nxtptr);
>
> - if (refcount_dec_and_test(&req->refs))
> + if (atomic_dec_and_test(&req->refs))
> __io_free_req(req);
> }
>
> static void io_put_req(struct io_kiocb *req)
> {
> - if (refcount_dec_and_test(&req->refs))
> + if (atomic_dec_and_test(&req->refs))
> io_free_req(req);
> }
>
> @@ -1052,14 +1052,14 @@ static void io_put_req(struct io_kiocb *req)
> static void __io_double_put_req(struct io_kiocb *req)
> {
> /* drop both submit and complete references */
> - if (refcount_sub_and_test(2, &req->refs))
> + if (atomic_sub_and_test(2, &req->refs))
> __io_free_req(req);
> }
>
> static void io_double_put_req(struct io_kiocb *req)
> {
> /* drop both submit and complete references */
> - if (refcount_sub_and_test(2, &req->refs))
> + if (atomic_sub_and_test(2, &req->refs))
> io_free_req(req);
> }
>
> @@ -1108,7 +1108,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
> io_cqring_fill_event(req, req->result);
> (*nr_events)++;
>
> - if (refcount_dec_and_test(&req->refs)) {
> + if (atomic_dec_and_test(&req->refs)) {
> /* If we're not using fixed files, we have to pair the
> * completion part with the file put. Use regular
> * completions for those, only batch free for fixed
> @@ -3169,7 +3169,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
> if (!list_empty(&req->link_list)) {
> prev = list_entry(req->link_list.prev, struct io_kiocb,
> link_list);
> - if (refcount_inc_not_zero(&prev->refs)) {
> + if (atomic_inc_not_zero(&prev->refs)) {
> list_del_init(&req->link_list);
> prev->flags &= ~REQ_F_LINK_TIMEOUT;
> } else
> @@ -4237,7 +4237,7 @@ static void io_get_work(struct io_wq_work *work)
> {
> struct io_kiocb *req = container_of(work, struct io_kiocb, work);
>
> - refcount_inc(&req->refs);
> + atomic_inc(&req->refs);
> }
>
> static int io_sq_offload_start(struct io_ring_ctx *ctx,
> @@ -4722,7 +4722,7 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
> if (req->work.files != files)
> continue;
> /* req is being completed, ignore */
> - if (!refcount_inc_not_zero(&req->refs))
> + if (!atomic_inc_not_zero(&req->refs))
> continue;
> cancel_req = req;
> break;
> --
> 2.24.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
2019-12-10 22:04 ` [PATCH 07/11] io_uring: use atomic_t for refcounts Jann Horn
@ 2019-12-10 22:21 ` Jens Axboe
2019-12-10 22:46 ` Kees Cook
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2019-12-10 22:21 UTC (permalink / raw)
To: Jann Horn; +Cc: io-uring, Will Deacon, Kees Cook, Kernel Hardening
On 12/10/19 3:04 PM, Jann Horn wrote:
> [context preserved for additional CCs]
>
> On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
>> Recently had a regression that turned out to be because
>> CONFIG_REFCOUNT_FULL was set.
>
> I assume "regression" here refers to a performance regression? Do you
> have more concrete numbers on this? Is one of the refcounting calls
> particularly problematic compared to the others?
Yes, a performance regression. io_uring is using io-wq now, which does
an extra get/put on the work item to make it safe against async cancel.
That get/put translates into a refcount_inc and refcount_dec per work
item, and meant that we went from 0.5% refcount CPU in the test case to
1.5%. That's a pretty substantial increase.
> I really don't like it when raw atomic_t is used for refcounting
> purposes - not only because that gets rid of the overflow checks, but
> also because it is less clear semantically.
Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
that's what I should do. But I'd prefer to just drop the refcount on the
io_uring side and keep it on for other potentially useful cases.
>> Our ref count usage is really simple,
>
> In my opinion, for a refcount to qualify as "really simple", it must
> be possible to annotate each relevant struct member and local variable
> with the (fixed) bias it carries when alive and non-NULL. This
> refcount is more complicated than that.
:-(
--
Jens Axboe
* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
2019-12-10 22:21 ` Jens Axboe
@ 2019-12-10 22:46 ` Kees Cook
2019-12-10 22:55 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2019-12-10 22:46 UTC (permalink / raw)
To: Jens Axboe; +Cc: Jann Horn, io-uring, Will Deacon, Kernel Hardening
On Tue, Dec 10, 2019 at 03:21:04PM -0700, Jens Axboe wrote:
> On 12/10/19 3:04 PM, Jann Horn wrote:
> > [context preserved for additional CCs]
> >
> > On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> Recently had a regression that turned out to be because
> >> CONFIG_REFCOUNT_FULL was set.
> >
> > I assume "regression" here refers to a performance regression? Do you
> > have more concrete numbers on this? Is one of the refcounting calls
> > particularly problematic compared to the others?
>
> Yes, a performance regression. io_uring is using io-wq now, which does
> an extra get/put on the work item to make it safe against async cancel.
> That get/put translates into a refcount_inc and refcount_dec per work
> item, and meant that we went from 0.5% refcount CPU in the test case to
> 1.5%. That's a pretty substantial increase.
>
> > I really don't like it when raw atomic_t is used for refcounting
> > purposes - not only because that gets rid of the overflow checks, but
> > also because it is less clear semantically.
>
> Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
> could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
> that's what I should do. But I'd prefer to just drop the refcount on the
> io_uring side and keep it on for other potentially useful cases.
There is no CONFIG_REFCOUNT_FULL any more. Will Deacon's version came
out as nearly identical to the x86 asm version. Can you share the
workload where you saw this? We really don't want to regress refcount
protections, especially in the face of new APIs.
Will, do you have a moment to dig into this?
-Kees
>
> >> Our ref count usage is really simple,
> >
> > In my opinion, for a refcount to qualify as "really simple", it must
> > be possible to annotate each relevant struct member and local variable
> > with the (fixed) bias it carries when alive and non-NULL. This
> > refcount is more complicated than that.
>
> :-(
>
> --
> Jens Axboe
>
--
Kees Cook
* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
2019-12-10 22:46 ` Kees Cook
@ 2019-12-10 22:55 ` Jens Axboe
2019-12-11 10:20 ` Will Deacon
0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2019-12-10 22:55 UTC (permalink / raw)
To: Kees Cook; +Cc: Jann Horn, io-uring, Will Deacon, Kernel Hardening
On 12/10/19 3:46 PM, Kees Cook wrote:
> On Tue, Dec 10, 2019 at 03:21:04PM -0700, Jens Axboe wrote:
>> On 12/10/19 3:04 PM, Jann Horn wrote:
>>> [context preserved for additional CCs]
>>>
>>> On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> Recently had a regression that turned out to be because
>>>> CONFIG_REFCOUNT_FULL was set.
>>>
>>> I assume "regression" here refers to a performance regression? Do you
>>> have more concrete numbers on this? Is one of the refcounting calls
>>> particularly problematic compared to the others?
>>
>> Yes, a performance regression. io_uring is using io-wq now, which does
>> an extra get/put on the work item to make it safe against async cancel.
>> That get/put translates into a refcount_inc and refcount_dec per work
>> item, and meant that we went from 0.5% refcount CPU in the test case to
>> 1.5%. That's a pretty substantial increase.
>>
>>> I really don't like it when raw atomic_t is used for refcounting
>>> purposes - not only because that gets rid of the overflow checks, but
>>> also because it is less clear semantically.
>>
>> Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
>> could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
>> that's what I should do. But I'd prefer to just drop the refcount on the
>> io_uring side and keep it on for other potentially useful cases.
>
> There is no CONFIG_REFCOUNT_FULL any more. Will Deacon's version came
> out as nearly identical to the x86 asm version. Can you share the
> workload where you saw this? We really don't want to regress refcount
> protections, especially in the face of new APIs.
>
> Will, do you have a moment to dig into this?
Ah, hopefully it'll work out ok, then. The patch came from testing the
full backport on 5.2.
Do you have a link to the "nearly identical"? I can backport that
patch and try on 5.2.
--
Jens Axboe
* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
2019-12-10 22:55 ` Jens Axboe
@ 2019-12-11 10:20 ` Will Deacon
2019-12-11 16:56 ` Kees Cook
0 siblings, 1 reply; 7+ messages in thread
From: Will Deacon @ 2019-12-11 10:20 UTC (permalink / raw)
To: Jens Axboe; +Cc: Kees Cook, Jann Horn, io-uring, Kernel Hardening
On Tue, Dec 10, 2019 at 03:55:05PM -0700, Jens Axboe wrote:
> On 12/10/19 3:46 PM, Kees Cook wrote:
> > On Tue, Dec 10, 2019 at 03:21:04PM -0700, Jens Axboe wrote:
> >> On 12/10/19 3:04 PM, Jann Horn wrote:
> >>> [context preserved for additional CCs]
> >>>
> >>> On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>> Recently had a regression that turned out to be because
> >>>> CONFIG_REFCOUNT_FULL was set.
> >>>
> >>> I assume "regression" here refers to a performance regression? Do you
> >>> have more concrete numbers on this? Is one of the refcounting calls
> >>> particularly problematic compared to the others?
> >>
> >> Yes, a performance regression. io_uring is using io-wq now, which does
> >> an extra get/put on the work item to make it safe against async cancel.
> >> That get/put translates into a refcount_inc and refcount_dec per work
> >> item, and meant that we went from 0.5% refcount CPU in the test case to
> >> 1.5%. That's a pretty substantial increase.
> >>
> >>> I really don't like it when raw atomic_t is used for refcounting
> >>> purposes - not only because that gets rid of the overflow checks, but
> >>> also because it is less clear semantically.
> >>
> >> Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
> >> could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
> >> that's what I should do. But I'd prefer to just drop the refcount on the
> >> io_uring side and keep it on for other potentially useful cases.
> >
> > There is no CONFIG_REFCOUNT_FULL any more. Will Deacon's version came
> > out as nearly identical to the x86 asm version. Can you share the
> > workload where you saw this? We really don't want to regress refcount
> > protections, especially in the face of new APIs.
> >
> > Will, do you have a moment to dig into this?
>
> Ah, hopefully it'll work out ok, then. The patch came from testing the
> full backport on 5.2.
>
> Do you have a link to the "nearly identical"? I can backport that
> patch and try on 5.2.
You could try my refcount/full branch, which is what ended up getting merged
during the merge window:
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=refcount/full
Will
* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
2019-12-11 10:20 ` Will Deacon
@ 2019-12-11 16:56 ` Kees Cook
2019-12-11 17:00 ` Jens Axboe
0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2019-12-11 16:56 UTC (permalink / raw)
To: Will Deacon; +Cc: Jens Axboe, Jann Horn, io-uring, Kernel Hardening
On Wed, Dec 11, 2019 at 10:20:13AM +0000, Will Deacon wrote:
> On Tue, Dec 10, 2019 at 03:55:05PM -0700, Jens Axboe wrote:
> > On 12/10/19 3:46 PM, Kees Cook wrote:
> > > On Tue, Dec 10, 2019 at 03:21:04PM -0700, Jens Axboe wrote:
> > >> On 12/10/19 3:04 PM, Jann Horn wrote:
> > >>> [context preserved for additional CCs]
> > >>>
> > >>> On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
> > >>>> Recently had a regression that turned out to be because
> > >>>> CONFIG_REFCOUNT_FULL was set.
> > >>>
> > >>> I assume "regression" here refers to a performance regression? Do you
> > >>> have more concrete numbers on this? Is one of the refcounting calls
> > >>> particularly problematic compared to the others?
> > >>
> > >> Yes, a performance regression. io_uring is using io-wq now, which does
> > >> an extra get/put on the work item to make it safe against async cancel.
> > >> That get/put translates into a refcount_inc and refcount_dec per work
> > >> item, and meant that we went from 0.5% refcount CPU in the test case to
> > >> 1.5%. That's a pretty substantial increase.
> > >>
> > >>> I really don't like it when raw atomic_t is used for refcounting
> > >>> purposes - not only because that gets rid of the overflow checks, but
> > >>> also because it is less clear semantically.
> > >>
> > >> Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
> > >> could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
> > >> that's what I should do. But I'd prefer to just drop the refcount on the
> > >> io_uring side and keep it on for other potentially useful cases.
> > >
> > > There is no CONFIG_REFCOUNT_FULL any more. Will Deacon's version came
> > > out as nearly identical to the x86 asm version. Can you share the
> > > workload where you saw this? We really don't want to regress refcount
> > > protections, especially in the face of new APIs.
> > >
> > > Will, do you have a moment to dig into this?
> >
> > Ah, hopefully it'll work out ok, then. The patch came from testing the
> > full backport on 5.2.
Oh good! I thought we had some kind of impossible workload. :)
> > Do you have a link to the "nearly identical"? I can backport that
> > patch and try on 5.2.
>
> You could try my refcount/full branch, which is what ended up getting merged
> during the merge window:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=refcount/full
Yeah, as you can see in the measured tight-loop timings in
https://git.kernel.org/linus/dcb786493f3e48da3272b710028d42ec608cfda1
there was a 0.1% difference for Will's series compared to the x86 assembly
version, whereas the old FULL was almost 70%.
--
Kees Cook
* Re: [PATCH 07/11] io_uring: use atomic_t for refcounts
2019-12-11 16:56 ` Kees Cook
@ 2019-12-11 17:00 ` Jens Axboe
0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2019-12-11 17:00 UTC (permalink / raw)
To: Kees Cook, Will Deacon; +Cc: Jann Horn, io-uring, Kernel Hardening
On 12/11/19 9:56 AM, Kees Cook wrote:
> On Wed, Dec 11, 2019 at 10:20:13AM +0000, Will Deacon wrote:
>> On Tue, Dec 10, 2019 at 03:55:05PM -0700, Jens Axboe wrote:
>>> On 12/10/19 3:46 PM, Kees Cook wrote:
>>>> On Tue, Dec 10, 2019 at 03:21:04PM -0700, Jens Axboe wrote:
>>>>> On 12/10/19 3:04 PM, Jann Horn wrote:
>>>>>> [context preserved for additional CCs]
>>>>>>
>>>>>> On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>> Recently had a regression that turned out to be because
>>>>>>> CONFIG_REFCOUNT_FULL was set.
>>>>>>
>>>>>> I assume "regression" here refers to a performance regression? Do you
>>>>>> have more concrete numbers on this? Is one of the refcounting calls
>>>>>> particularly problematic compared to the others?
>>>>>
>>>>> Yes, a performance regression. io_uring is using io-wq now, which does
>>>>> an extra get/put on the work item to make it safe against async cancel.
>>>>> That get/put translates into a refcount_inc and refcount_dec per work
>>>>> item, and meant that we went from 0.5% refcount CPU in the test case to
>>>>> 1.5%. That's a pretty substantial increase.
>>>>>
>>>>>> I really don't like it when raw atomic_t is used for refcounting
>>>>>> purposes - not only because that gets rid of the overflow checks, but
>>>>>> also because it is less clear semantically.
>>>>>
>>>>> Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
>>>>> could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
>>>>> that's what I should do. But I'd prefer to just drop the refcount on the
>>>>> io_uring side and keep it on for other potentially useful cases.
>>>>
>>>> There is no CONFIG_REFCOUNT_FULL any more. Will Deacon's version came
>>>> out as nearly identical to the x86 asm version. Can you share the
>>>> workload where you saw this? We really don't want to regress refcount
>>>> protections, especially in the face of new APIs.
>>>>
>>>> Will, do you have a moment to dig into this?
>>>
>>> Ah, hopefully it'll work out ok, then. The patch came from testing the
>>> full backport on 5.2.
>
> Oh good! I thought we had some kind of impossible workload. :)
>
>>> Do you have a link to the "nearly identical"? I can backport that
>>> patch and try on 5.2.
>>
>> You could try my refcount/full branch, which is what ended up getting merged
>> during the merge window:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=refcount/full
>
> Yeah, as you can see in the measured tight-loop timings in
> https://git.kernel.org/linus/dcb786493f3e48da3272b710028d42ec608cfda1
> there was a 0.1% difference for Will's series compared to the x86 assembly
> version, whereas the old FULL was almost 70%.
That looks very promising! Hopefully the patch is moot at that point, I
dropped it from the series yesterday in any case. I'll revisit as soon
as I can and holler if there's an issue.
--
Jens Axboe