Re: [PATCH 1/2] fs: Add bpf_sock_read_xattr() kfunc to read socket xattrs

Linux filesystem development

From: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>
To: "Christian Brauner" <brauner@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>
Cc: "Alexander Viro" <viro@zeniv.linux.org.uk>,
	"Jan Kara" <jack@suse.cz>, "Simon Horman" <horms@kernel.org>,
	"Kuniyuki Iwashima" <kuniyu@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	<linux-fsdevel@vger.kernel.org>, <netdev@vger.kernel.org>,
	<bpf@vger.kernel.org>, "Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"Jiri Olsa" <jolsa@kernel.org>
Subject: Re: [PATCH 1/2] fs: Add bpf_sock_read_xattr() kfunc to read socket xattrs
Date: Fri, 19 Jun 2026 20:20:40 -0700
Message-ID: <DJDJX62AS415.2BVILN08QK149@gmail.com>
In-Reply-To: <20260617-work-bpf-sock-xattr-v1-1-a1276f7c9da3@kernel.org>

On Wed Jun 17, 2026 at 4:18 AM PDT, Christian Brauner wrote:
> In c8db08110cbe ("Merge tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
> we added support for extended attributes for sockets. This comes in two
> flavors: sockfs and non-sockfs/filesystem sockets. Filesystem sockets
> are actual filesystem objects so reading xattrs must use dedicated fs
> helpers such as bpf_get_dentry_xattr() and bpf_get_file_xattr(). Those
> are inherently sleeping operations. Sockfs sockets on the other hand
> don't need to use sleeping operations as the underlying data structure
> is lockless. In addition, retrieval of sockfs extended attributes often
> happens from LSM hooks that only provide struct socket and it's
> completely nonsensical to grab a reference to a file, then force a
> sleeping operation to retrieve the xattr and drop the reference. We know
> that the sockfs file cannot go away while the LSM hook runs.
>
> This series adds a bpf_sock_read_xattr() kfunc that, given a struct
> socket, reads a user.* extended attribute from the socket's sockfs inode
> into a bpf_dynptr. Together with fsetxattr() from userspace this lets a
> process label a socket with a user.* xattr and have a BPF LSM program
> retrieve that label locklessly. The kfunc mirrors the existing
> bpf_cgroup_read_xattr(), including the restriction to the user.*
> namespace.
>
> systemd uses user.* xattrs on sockets to implement socket rate limiting
> and to tag sockets for other purposes [1] such as implementing a varlink
> registry. There is currently no efficient way for a BPF program to read
> those labels back. The new helper lets a BPF program read back the
> extended attribute of a marked listening socket during bind/connect and
> then act on the connect()ing socket. Extended attributes make it possible to
> allow an unprivileged user manager such as systemd --user to mark
> sockets from userspace and then rediscover them or implement policies.
>
> The kfunc is registered KF_RCU and only for BPF LSM programs. A struct
> socket is only guaranteed to live in sockfs when an LSM socket hook hands
> it out, which is what keeps SOCK_INODE() valid. Sockets that embed struct
> socket outside sockfs (tun, tap) are only reachable from tracing programs
> and are excluded by the registration. (Btw, for consistency it would
> be nice to force allocation of struct socket from sockfs instead of
> simply embedding it in e.g., struct tun_file which makes the SOCKFS_I()
> pattern a hazard - at least outside of sockfs functions.)
>
> The read never sleeps and takes no lock. For sockfs the value lives in
> the inode's in-memory xattr store and simple_xattr_get() resolves it
> with an RCU-protected rhashtable lookup, taking neither the inode lock
> nor any xattr lock. The kfunc is therefore usable from both sleepable
> and non-sleepable LSM hooks.
>
> Link: https://github.com/systemd/systemd/pull/40559 [1]
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
> ---
>  fs/bpf_fs_kfuncs.c  | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/net.h |  1 +
>  net/socket.c        | 25 +++++++++++++++++++++++++
>  3 files changed, 63 insertions(+)
>
> diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
> index 11841c3d4260..85fc9519d1ff 100644
> --- a/fs/bpf_fs_kfuncs.c
> +++ b/fs/bpf_fs_kfuncs.c
> @@ -11,6 +11,7 @@
>  #include <linux/file.h>
>  #include <linux/kernfs.h>
>  #include <linux/mm.h>
> +#include <linux/net.h>
>  #include <linux/xattr.h>
>  
>  __bpf_kfunc_start_defs();
> @@ -359,6 +360,39 @@ __bpf_kfunc int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__s
>  }
>  #endif /* CONFIG_CGROUPS */
>  
> +#ifdef CONFIG_NET
> +/**
> + * bpf_sock_read_xattr - read xattr of a socket's inode in sockfs
> + * @sock: socket to get xattr from
> + * @name__str: name of the xattr
> + * @value_p: output buffer of the xattr value
> + *
> + * Get xattr *name__str* of *sock* and store the output in *value_p*.
> + *
> + * For security reasons, only *name__str* with prefix "user." is allowed.
> + *
> + * Return: length of the xattr value on success, a negative value on error.
> + */
> +__bpf_kfunc int bpf_sock_read_xattr(struct socket *sock, const char *name__str,
> +				    struct bpf_dynptr *value_p)
> +{
> +	struct bpf_dynptr_kern *value_ptr = (struct bpf_dynptr_kern *)value_p;
> +	u32 value_len;
> +	void *value;
> +
> +	/* Only allow reading "user.*" xattrs */
> +	if (strncmp(name__str, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN))
> +		return -EPERM;
> +
> +	value_len = __bpf_dynptr_size(value_ptr);
> +	value = __bpf_dynptr_data_rw(value_ptr, value_len);
> +	if (!value)
> +		return -EINVAL;
> +
> +	return sock_read_xattr(sock, name__str, value, value_len);
> +}
> +#endif /* CONFIG_NET */

lgtm.
How do you want to route it? Through the vfs tree for the next merge window?
If so
Acked-by: Alexei Starovoitov <ast@kernel.org>
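
For illustration only, and not part of the patch: the userspace half of
the flow described in the commit message (label a socket with
fsetxattr(), restricted to user.*) could be sketched roughly as below.
check_user_prefix() and label_socket() are hypothetical names, and
USER_PREFIX is a local stand-in for the kernel's XATTR_USER_PREFIX.
Whether fsetxattr() succeeds on a socket fd depends on the running
kernel having the socket xattr support mentioned above.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Local stand-ins for the kernel's XATTR_USER_PREFIX{,_LEN}. Assumption:
 * defined here so the sketch builds without kernel headers.
 */
#define USER_PREFIX     "user."
#define USER_PREFIX_LEN (sizeof(USER_PREFIX) - 1)

/*
 * Userspace model of the namespace check in the quoted kfunc:
 * returns 0 if @name starts with "user.", -EPERM otherwise.
 */
int check_user_prefix(const char *name)
{
	if (strncmp(name, USER_PREFIX, USER_PREFIX_LEN))
		return -EPERM;
	return 0;
}

/*
 * Hypothetical helper: label socket @sockfd with xattr @name. Names
 * outside user.* are rejected up front, since the kfunc would refuse
 * to read them back anyway. Returns 0 or a negative errno.
 */
int label_socket(int sockfd, const char *name, const void *value, size_t len)
{
	int err = check_user_prefix(name);

	if (err)
		return err;
	if (fsetxattr(sockfd, name, value, len, 0) < 0)
		return -errno;
	return 0;
}
```

A caller might use label_socket(fd, "user.example.label", "web", 3) on a
listening socket; the attribute name and value here are made up.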

Thread overview: 6+ messages
2026-06-17 11:18 [PATCH 0/2] Add bpf_sock_read_xattr() kfunc to read socket xattrs Christian Brauner
2026-06-17 11:18 ` [PATCH 1/2] fs: " Christian Brauner
2026-06-18 18:20   ` John Fastabend
2026-06-20  3:20   ` Alexei Starovoitov [this message]
2026-06-17 11:18 ` [PATCH 2/2] selftests/bpf: Add test for bpf_sock_read_xattr() kfunc Christian Brauner
2026-06-18 18:24   ` John Fastabend
