Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers


From: Breno Leitao <leitao@debian.org>
To: David Laight <david.laight.linux@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	 Kuniyuki Iwashima <kuniyu@google.com>,
	Willem de Bruijn <willemb@google.com>,
	metze@samba.org,  axboe@kernel.dk,
	Stanislav Fomichev <sdf@fomichev.me>,
	io-uring@vger.kernel.org,  bpf@vger.kernel.org,
	netdev@vger.kernel.org,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
Date: Fri, 10 Apr 2026 05:29:37 -0700
Message-ID: <adjkn7p4U13WBs2o@gmail.com>
In-Reply-To: <20260408195640.324ee932@pumpkin>

On Wed, Apr 08, 2026 at 07:56:40PM +0100, David Laight wrote:
> On Wed, 8 Apr 2026 06:52:54 -0700
> Breno Leitao <leitao@debian.org> wrote:
>
> > Hello David,
> >
> > On Wed, Apr 08, 2026 at 12:26:53PM +0100, David Laight wrote:
> > > On Wed, 08 Apr 2026 03:30:28 -0700
> > > Breno Leitao <leitao@debian.org> wrote:
> > >
> > > > Currently, the .getsockopt callback requires __user pointers:
> > > >
> > > >   int (*getsockopt)(struct socket *sock, int level,
> > > >                     int optname, char __user *optval, int __user *optlen);
> > > >
> > > > This prevents kernel callers (io_uring, BPF) from using getsockopt on
> > > > levels other than SOL_SOCKET, since they pass kernel pointers.
> > > >
> > > > Following Linus' suggestion [0], this series introduces sockopt_t, a
> > > > type-safe wrapper around iov_iter, and a getsockopt_iter callback that
> > > > works with both user and kernel buffers. AF_PACKET and CAN raw are
> > > > converted as initial users, with selftests covering the trickiest
> > > > conversion patterns.
> > >
> > > What are you doing about the cases where 'optlen' is a complete lie?
> >
> > Is this incorrect optlen originating from userspace, and getting into
> > the .getsockopt callbacks?
>
> Look at tcp_ao_copy_mkts_to_user() in net/ipv4/tcp_ao.c
> This isn't 'old code' it was added in 2023.
>
> Basically what is being transferred is an array and 'optlen' is the
> size of one element.
> The number of elements is in the first one.

Thank you for pointing this out. I now understand the issue you're raising.

The problem is that optlen doesn't represent the full buffer length. Instead:
	optlen = per-element struct size (stride)
	actual buffer length = optlen * nkeys (where nkeys comes from optval[0])
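
To make the arithmetic concrete, here is a minimal userspace sketch. The
helper name stride_total_len() and its error convention are hypothetical,
not part of the series or of tcp_ao.c; it only shows that the real size is
a product that has to be checked for overflow before anything is sized
from it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): optlen is the per-element
 * stride and nkeys is the element count read from the first element.
 * Returns 0 and stores the full buffer size in *total, or -1 if the
 * inputs are unusable or the product would overflow size_t.
 */
static int stride_total_len(int optlen, uint32_t nkeys, size_t *total)
{
	if (optlen <= 0 || nkeys == 0)
		return -1;
	if ((size_t)nkeys > SIZE_MAX / (size_t)optlen)
		return -1;	/* stride * count would overflow */

	*total = (size_t)optlen * nkeys;
	return 0;
}
```

For example, a stride of 16 bytes with nkeys = 4 describes a 64-byte
buffer, even though optlen itself says 16.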

For handling these cases, I can think of several approaches:

1) Dynamically resize the iter once the actual buffer size is discovered:

  int sockopt_set_buflen(sockopt_t *opt, size_t new_len)
  {
      if (!opt->legacy)   /* Following Stefan's suggestion */
          return -EINVAL;

      /* Re-initialize iter with the actual buffer size.
       * For ubuf: same base pointer, updated count.
       * For kvec: same iov_base, updated iov_len + re-init.
       */

      return 0;
  }

This lets legacy protocols adjust the iov_iter once the real size is
known, mirroring current behavior. Note that this doesn't make the
existing situation any worse: today the code behaves as if the iov_iter
length were set to INT_MAX, since the callback can read from or write to
any location reachable from that __user pointer.
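
As a rough userspace model of that resize step (struct mock_sockopt and
its fields are stand-ins I made up for illustration, not the sockopt_t
layout from the series), the helper only swaps the bound and leaves the
base pointer alone, and it refuses to do so for non-legacy callers:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for sockopt_t. */
struct mock_sockopt {
	void *base;	/* start of the caller's buffer, never changed */
	size_t count;	/* bytes the callback may currently touch */
	bool legacy;	/* only legacy protocols may resize */
};

/* Widen (or shrink) the accessible window once the real size is known. */
static int mock_sockopt_set_buflen(struct mock_sockopt *opt, size_t new_len)
{
	if (!opt->legacy)
		return -EINVAL;

	opt->count = new_len;	/* same base pointer, updated bound */
	return 0;
}
```

The point of the model is that the bound becomes explicit: after the
call, every access can be checked against count, which is more than the
raw __user pointer offers today.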

2) Use a special legacy callback path, as proposed by Stefan.

3) Store base pointers in sockopt_t and defer iter initialization for
   legacy callbacks:

  static int tcp_ao_copy_mkts_to_user(const struct sock *sk,
                                      struct tcp_ao_info *ao_info,
                                      sockopt_t *opt)
  {
        struct tcp_ao_getsockopt opt_in;
        int user_len = opt->optlen;
        struct kvec kvec;

        /* Sketch only: never copy more than opt_in can hold */
        if (user_len <= 0 || user_len > sizeof(opt_in))
                return -EINVAL;

        /* First, initialize a small iter to read the first element */
        sockopt_init_iter(opt, user_len, &kvec);

        if (copy_from_iter(&opt_in, user_len, &opt->iter_in) != user_len)
                return -EFAULT;

        /* Now we know the actual buffer size */
        sockopt_init_iter(opt, user_len * opt_in.nkeys, &kvec);

        /* ... write the full array via copy_to_iter() ... */

        return 0;
  }
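
The same two-phase flow can be modelled in plain userspace C. Everything
below (struct mock_elem, struct mock_opt, the mock_* helpers) is
hypothetical and only mirrors the shape of the idea: open a window of one
stride, read the count, then reopen the window at stride * nkeys and
bound every write against it. As in the kernel case, the caller is
trusted to really own nkeys elements behind the base pointer:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element layout; only nkeys matters for the sketch. */
struct mock_elem {
	uint32_t nkeys;
	uint32_t id;
};

/* Hypothetical stand-in for sockopt_t with a deferred window. */
struct mock_opt {
	unsigned char *base;
	size_t window;	/* bytes currently accessible from base */
};

static void mock_init_window(struct mock_opt *opt, size_t len)
{
	opt->window = len;
}

/* Bounded write: fails instead of writing past the current window. */
static int mock_copy_out(struct mock_opt *opt, size_t off,
			 const void *src, size_t len)
{
	if (off > opt->window || len > opt->window - off)
		return -EFAULT;
	memcpy(opt->base + off, src, len);
	return 0;
}

/* Returns the number of elements written, or a negative errno. */
static int mock_get_keys(struct mock_opt *opt, size_t stride, uint32_t avail)
{
	struct mock_elem first, out;
	uint32_t i, n;

	if (stride != sizeof(first))
		return -EINVAL;

	/* Phase 1: a window of one element, just to learn nkeys. */
	mock_init_window(opt, stride);
	memcpy(&first, opt->base, stride);
	if (first.nkeys == 0 || first.nkeys > SIZE_MAX / stride)
		return -EINVAL;

	/* Phase 2: reopen the window at the real buffer size. */
	mock_init_window(opt, stride * first.nkeys);

	n = first.nkeys < avail ? first.nkeys : avail;
	for (i = 0; i < n; i++) {
		out.nkeys = n;
		out.id = i;
		if (mock_copy_out(opt, i * stride, &out, stride))
			return -EFAULT;
	}
	return (int)n;
}
```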

4) Maintain two separate callbacks: ->getsockopt_iter (to be renamed to ->getsockopt
   after the transition) and ->getsockopt_unsafe for legacy cases.
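
   For illustration, the dispatch for this option could look roughly like
   the userspace model below. struct mock_ops, the simplified callback
   signature and the two sample callbacks are hypothetical; only the
   "prefer the iter callback, fall back to the legacy one" ordering is
   the point:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the two proto_ops callbacks. */
struct mock_ops {
	int (*getsockopt_iter)(int optname, void *buf, size_t len);
	int (*getsockopt_unsafe)(int optname, void *buf, size_t len);
};

/* Sample callbacks that just report which path was taken. */
static int mock_iter_cb(int optname, void *buf, size_t len)
{
	(void)optname; (void)buf; (void)len;
	return 1;
}

static int mock_unsafe_cb(int optname, void *buf, size_t len)
{
	(void)optname; (void)buf; (void)len;
	return 2;
}

/* Prefer the bounded callback; use the legacy one only as a fallback. */
static int mock_do_getsockopt(const struct mock_ops *ops, int optname,
			      void *buf, size_t len)
{
	if (ops->getsockopt_iter)
		return ops->getsockopt_iter(optname, buf, len);
	if (ops->getsockopt_unsafe)
		return ops->getsockopt_unsafe(optname, buf, len);
	return -EOPNOTSUPP;
}
```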


Regardless of which approach we take for these legacy implementations, I don't
believe any of them invalidates the current patchset from a design standpoint.

Since these legacy protocols represent less than 1% of the cases, I'd prefer to
optimize for the common path and handle the exceptional cases as exceptions.
