Linux RDMA and InfiniBand development
From: "Eric W. Biederman" <ebiederm@xmission.com>
To: Parav Pandit <parav@nvidia.com>
Cc: <linux-rdma@vger.kernel.org>,  <linux-security-module@vger.kernel.org>
Subject: Re: [PATCH] RDMA/uverbs: Fix CAP_NET_RAW check for flow create in user namespace
Date: Mon, 10 Mar 2025 11:29:03 -0500
Message-ID: <87ecz4q27k.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20250308180602.129663-1-parav@nvidia.com> (Parav Pandit's message of "Sat, 8 Mar 2025 20:06:02 +0200")

Parav Pandit <parav@nvidia.com> writes:

> A process running in a non-init user namespace possesses the
> CAP_NET_RAW capability. However, the patch cited in the fixes
> tag checks the capability in the default init user namespace.
> Because of this, when the process was started by Podman in a
> non-default user namespace, the flow creation failed.

This change isn't a bug fix.  It is a relaxation of permissions,
and it would be very good if the change description explained
why it is in fact safe.

Many parts of the kernel are not safe for arbitrary users
to use.  In those cases a plain capable() check, like the one
you found, is used.
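
For readers less familiar with the distinction, here is a rough
userspace sketch of how a check against the initial user namespace
differs from a check against the namespace that owns the network
namespace.  It is only a model: the model_* types and MODEL_CAP_NET_RAW
are hypothetical stand-ins, and the walk is a simplified version of
what the kernel's capability code does, not a copy of it.

```c
/* Userspace model only: these types mimic, but are not, the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAP_NET_RAW (1u << 13) /* CAP_NET_RAW is capability 13 */

struct model_userns {
	const struct model_userns *parent; /* NULL for the initial namespace */
	int level;                         /* 0 for the initial namespace */
	unsigned int owner_uid;            /* uid in the parent that created it */
};

struct model_cred {
	const struct model_userns *user_ns; /* namespace the task lives in */
	unsigned int euid;
	unsigned int cap_effective;         /* bitmask, valid in user_ns */
};

/*
 * Does cred hold cap with respect to target?  Walk from target towards
 * the initial namespace, looking for the task's own namespace or for a
 * namespace the task created.
 */
static bool model_ns_capable(const struct model_cred *cred,
			     const struct model_userns *target,
			     unsigned int cap)
{
	const struct model_userns *ns = target;

	assert(cred && cred->user_ns && target);
	for (;;) {
		/* Same namespace: the effective set decides. */
		if (ns == cred->user_ns)
			return (cred->cap_effective & cap) != 0;
		/* Target is not a descendant of the task's namespace. */
		if (ns->level <= cred->user_ns->level)
			return false;
		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
			return true;
		ns = ns->parent;
	}
}
```

In this model a capable()-style check always passes the initial
namespace as target, so a task that is root only inside a container's
user namespace fails it, while the same task passes when target is the
container's own namespace.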

> Fix this issue by checking the CAP_NET_RAW networking capability
> in the owner user namespace that created the network namespace.
>
> This change is similar to the following cited patches.
>
> commit 5e1fccc0bfac ("net: Allow userns root control of the core of the network stack.")
> commit 52e804c6dfaa ("net: Allow userns root to control ipv4")
> commit 59cd7377660a ("net: openvswitch: allow conntrack in non-initial user namespace")
> commit 0a3deb11858a ("fs: Allow listmount() in foreign mount namespace")
> commit dd7cb142f467 ("fs: relax permissions for listmount()")

It is different in that hardware is involved.  A fair amount of kernel
bypass is allowed by design in InfiniBand, so this may indeed be safe
to allow for any user on the system.  Still, for someone who isn't
intimately familiar with InfiniBand, this isn't clear.

> Fixes: c938a616aadb ("IB/core: Add raw packet QP type")
> Signed-off-by: Parav Pandit <parav@nvidia.com>
>
> ---
> I would like to have feedback from the LSM experts to make sure this
> fix is correct. Given the widespread usage of the capable() call,
> it makes me wonder if the patch right.
>
> Secondly, I wasn't able to determine which primary namespace (such as
> mount or IPC, etc.) to consider for the CAP_IPC_LOCK capability.
> (not directly related to this patch, but as concept)

I took a quick look and it appears that no one considers any of the
CAP_IPC_LOCK capability checks safe for anyone except the global
root user.

Allowing an arbitrary user to lock all of memory seems to defeat all
of the safeguards that are in place to limit memory locking.

It looks like RLIMIT_MEMLOCK has been updated to be per user namespace
(with hierarchical limits), so I expect the most reasonable thing
to do is to simply ensure the process that creates the user
namespace has a large enough RLIMIT_MEMLOCK when the user namespace
is created.
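
As a rough illustration of that last point, a container launcher could
check its own soft limit before creating the user namespace.  This is
only a userspace sketch: memlock_limit_at_least() is a hypothetical
helper name, and how much locked memory an RDMA workload needs is
something the caller has to decide.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Return true if the calling process's soft RLIMIT_MEMLOCK is at least
 * wanted_bytes (or unlimited).  Returns false if the limit cannot be read.
 */
static bool memlock_limit_at_least(rlim_t wanted_bytes)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
		return false;
	return lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted_bytes;
}
```

A launcher would call this before creating the namespace and either
raise the limit with setrlimit() (if its hard limit allows) or refuse
to start the workload.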

> ---
>  drivers/infiniband/core/uverbs_cmd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 5ad14c39d48c..8d6615f390f5 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -3198,7 +3198,7 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
>  	if (cmd.comp_mask)
>  		return -EINVAL;
>  
> -	if (!capable(CAP_NET_RAW))
> +	if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_RAW))
>  		return -EPERM;

Looking at the code in drivers/infiniband/core/uverbs_cmd.c,
I don't think the original capable() call is actually correct.

The problem is that InfiniBand runs commands through a file descriptor,
which means that anyone who can open the file descriptor and
then find a program that behaves like a suid cat can bypass
the current permission check.

Before we relax any checks, that test needs to be:
file_ns_capable(file, &init_user_ns, CAP_NET_RAW);

Similarly the network namespace you are talking about in those
infiniband commands really needs to be derived from the file
descriptor instead of current.
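
To make the suid-cat problem concrete, here is a small userspace model
of the two styles of check.  The model_* names are hypothetical and
nothing here is kernel code; it only contrasts "whoever is calling
write() right now" with "whoever opened the file".

```c
#include <assert.h>
#include <stdbool.h>

/* A task, reduced to whether it holds CAP_NET_RAW in the initial userns. */
struct model_task {
	bool has_net_raw;
};

/* An open file remembers the task (credentials) that opened it. */
struct model_file {
	const struct model_task *opener;
};

/* capable()-style check: looks only at the task issuing the write. */
static bool model_capable_current(const struct model_task *current_task)
{
	return current_task->has_net_raw;
}

/* file_ns_capable()-style check: looks at the credentials at open time. */
static bool model_file_capable(const struct model_file *file)
{
	return file->opener->has_net_raw;
}
```

If an unprivileged task opens the device and hands the descriptor to a
privileged helper that blindly writes its input, the first check sees
the helper's credentials and passes, while the second still sees the
unprivileged opener and fails.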

Those kinds of bugs seem very easy to find in the InfiniBand code,
so I have a hunch that the InfiniBand code needs some tender loving
care before it is safe for unprivileged users to be able to do
anything with it.

In particular, there were a whole lot of bug fixes and other work done
on the mount namespace and in the networking stack before unprivileged
users were allowed to use them.

In the IP part of the networking stack CAP_NET_RAW allows all kinds
of things, but when it is limited to only a single networking stack
(one the user had to create) it becomes safe.  I don't remember
enough about InfiniBand to say whether those parts guarded with
CAP_NET_RAW are safe in that way.

Eric


>  
>  	if (cmd.flow_attr.flags >= IB_FLOW_ATTR_FLAGS_RESERVED)
