From: Jens Axboe <axboe@kernel.dk>
To: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH 3/3] io_uring/register: allow original task restrictions owner to unregister
Date: Tue, 13 Jan 2026 11:25:15 -0700 [thread overview]
Message-ID: <fa2d0d7f-adbc-4e5e-a9d8-9a170ade8eaa@kernel.dk> (raw)
In-Reply-To: <877btm4bko.fsf@mailhost.krisman.be>
On 1/12/26 5:10 PM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> Currently any attempt to register a set of task restrictions if an
>> existing set exists will fail with -EPERM. But it is feasible to let the
>> original creator/owner performance this operation. Either to remove
>> restrictions entirely, or to replace them with a new set.
>>
>> If an existing set exists and NULL is passed for the new set, the
>> current set is unregistered. If an existing set exists and a new set is
>> supplied, the old set is dropped and replaced with the new one.
>
> Feature-wise, I think this covers what I mentioned in the previous
> iteration. Even though this is an RFC, I think I found two bugs that
> allow the child to escape the restrictions:
>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>> include/linux/io_uring_types.h | 1 +
>> io_uring/register.c | 45 ++++++++++++++++++++++++++++------
>> 2 files changed, 38 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 196f41ec6d60..1ff7817b3535 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -222,6 +222,7 @@ struct io_rings {
>> struct io_restriction {
>> DECLARE_BITMAP(register_op, IORING_REGISTER_LAST);
>> DECLARE_BITMAP(sqe_op, IORING_OP_LAST);
>> + pid_t pid;
>> refcount_t refs;
>> u8 sqe_flags_allowed;
>> u8 sqe_flags_required;
>> diff --git a/io_uring/register.c b/io_uring/register.c
>> index 552b22f6b2dc..c8b8a9edbc65 100644
>> --- a/io_uring/register.c
>> +++ b/io_uring/register.c
>> @@ -189,12 +189,19 @@ static int io_register_restrictions_task(void __user *arg, unsigned int nr_args)
>> {
>> struct io_uring_task_restriction __user *ures = arg;
>> struct io_uring_task_restriction tres;
>> - struct io_restriction *res;
>> + struct io_restriction *old_res, *res;
>> int ret;
>>
>> if (nr_args != 1)
>> return -EINVAL;
>>
>> + res = current->io_uring_restrict;
>> + if (!ures) {
>> + if (!res)
>> + return -EFAULT;
>> + goto drop_set;
>> + }
>> +
>> if (copy_from_user(&tres, arg, sizeof(tres)))
>> return -EFAULT;
>>
>> @@ -207,13 +214,27 @@ static int io_register_restrictions_task(void __user *arg, unsigned int nr_args)
>> * Disallow if task already has registered restrictions, and we're
>> * not passing in further restrictions to add to an existing set.
>> */
>> - if (current->io_uring_restrict &&
>> - !(tres.flags & IORING_REG_RESTRICTIONS_MASK))
>> - return -EPERM;
>> + old_res = NULL;
>> + if (res && !(tres.flags & IORING_REG_RESTRICTIONS_MASK)) {
>> + /* Not owner, may only append further restrictions */
>> +drop_set:
>> + if (res->pid != current->pid)
>> + return -EPERM;
>
> This might be hard to exploit, but if the parent terminates, the pid
> can get reused. Then, if the child forks until it gets the same pid,
> it can unregister the filter. I suppose the fix would require holding
> a reference to the task, similar to what pidfd does. but perhaps just
> abandon the unregistering semantics? I'm not sure it is that
> useful...
I did ponder pid reuse and considered it not an issue due to the size of
the space. But from other feedback, seems like unregistering is not a
good idea anyway, should always be cummultative. There's a valid use
case where the task is forked up front, then restrictions registered,
and then exec. We can't allow unregistering for that case.
So I think I'll just drop this particular patch for now. It's also why I
kept it separate...
>> @@ -226,14 +247,22 @@ static int io_register_restrictions_task(void __user *arg, unsigned int nr_args)
>> tres.flags & IORING_REG_RESTRICTIONS_MASK);
>> if (ret) {
>> kfree(res);
>> - return ret;
>> + goto out;
>> }
>> if (current->io_uring_restrict &&
>> refcount_dec_and_test(¤t->io_uring_restrict->refs))
>> kfree(current->io_uring_restrict);
>> + res->pid = current->pid;
>
> res->pid must always point to the first task that added a
> restriction. So:
>
> if (!current->io_uring_restrict)
> res->pid = current->pid;
>
> Otherwise, the child will become the owner after adding another
> restriction, and can then break out with a further unregister. Based on
> your testcase, this escapes the filter:
Thanks for looking at that too, but I guess moot with it getting
dropped. But yes I do think you're right!
--
Jens Axboe
prev parent reply other threads:[~2026-01-13 18:25 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-09 18:48 [PATCHSET RFC v2 0/3] Per-task io_uring opcode restrictions Jens Axboe
2026-01-09 18:48 ` [PATCH 1/3] io_uring: allow registration of per-task restrictions Jens Axboe
2026-01-09 18:48 ` [PATCH 2/3] io_uring/register: add MASK support for task filter set Jens Axboe
2026-01-09 18:48 ` [PATCH 3/3] io_uring/register: allow original task restrictions owner to unregister Jens Axboe
2026-01-13 0:10 ` Gabriel Krisman Bertazi
2026-01-13 18:25 ` Jens Axboe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fa2d0d7f-adbc-4e5e-a9d8-9a170ade8eaa@kernel.dk \
--to=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
--cc=krisman@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.