Date: Mon, 2 Feb 2026 22:31:31 +0000
From: David Laight
To: Breno Leitao
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze@samba.org,
	axboe@kernel.dk, Stanislav Fomichev, io-uring@vger.kernel.org,
	bpf@vger.kernel.org, netdev@vger.kernel.org, Linus Torvalds,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH net-next RFC 0/3] net: move .getsockopt away from __user buffers
Message-ID: <20260202223131.44e81ba1@pumpkin>
References: <20260130-getsockopt-v1-0-9154fcff6f95@debian.org>
	<20260130205227.6fb1d9ad@pumpkin>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2 Feb 2026 04:32:42 -0800
Breno Leitao wrote:

> Hello David,
>
> On Fri, Jan 30, 2026 at 08:52:27PM +0000, David Laight wrote:
>
> > The system call wrapper can do the user copies, it can also suppress
> > the write if the value is unchanged (which matters with clac/slac).
>
> This aligns with my proposal: using an in-kernel optlen that protocol
> functions can operate on directly:
>
> 	typedef struct sockopt {
> 		struct iov_iter iter;
> 		int optlen;
> 	} sockopt_t;
>
> > The obvious change would be to pass the length itself and make the
> > return value -ERRNO or the size.
>
> I explored this approach to avoid embedding optlen in sockopt (which was
> Linus' original suggestion). I attempted returning the length both via
> iov_iter and as a return value, but neither proved ideal.
>
> > #define GETSOCKOPT_RVAL(errval, size) (1 << 31 | (errval) << 20 | (size))
> > which would get picked in the rval < 0 path.
> > It would also let 'return 0' mean 'don't change the size' requiring
> > a special return for the one (or two?) places that want to set the
> > size to zero and return success.
>
> My conclusion is that encoding both optlen and error in the return value
> requires pointer manipulation that isn't justified for this slow path.
> While technically feasible, the resulting "mixed pointer abomination"
> won't be worth it.

Not really, they are both just numbers.
99% of the protocol code can just do 'return -Exxxx' or 'return size'.
That is all simple and foolproof.
The calling code (not many copies) does:
	rval = foo->getsockopt(..., size_in);
	size_out = size_in;
	if (rval >= 0) {
		if (rval > 0)
			size_out = rval;
		rval = 0;
	} else {
		/* abnormal path */
		/* A plain -Exxxx has bit 30 set, GETSOCKOPT_RVAL() leaves it clear */
		if (!(rval & (1 << 30))) {
			/* size is the low 20 bits, errval sits above it */
			size_out = rval & 0xfffff;
			rval = -((rval & ~(1 << 31)) >> 20);
		}
	}
	if (size_out != size_in)
		put_user(size_out);
	return rval;
(Or something similar depending on exactly how the values are merged.)

>
> > There is not much point making the 'optval' parameter more than
> > a structure of a user and kernel address - one of which will be NULL.
> > (This is safer than sockptr_t's discriminant union.)
>
> This approach forces every protocol to distinguish between userspace and
> kernelspace, then perform the appropriate copy:
>
> 	static inline int mgetsockopt(void *kernel_optlen, void *user_optlen, ..)
> 	{
> 		....
> 		if (kernel_optlen)
> 			memcpy(kernel_optlen, newoptlen, ...
> 		else
> 			copy_to_user(user_optlen, newoptlen, ...
> 	}

That is a function provided by the implementation.
It is no different from using the ones that act on iov_iter.
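
For concreteness, the return-value scheme discussed earlier in this message
can be sketched in plain C. This is only an illustration under assumptions:
the bit layout follows the quoted GETSOCKOPT_RVAL() macro (bit 31 set, errno
in bits 20..29, size in bits 0..19), errno values are assumed to be below
1024, and getsockopt_rval() and decode_getsockopt_rval() are hypothetical
names, not existing kernel functions. With that layout a plain -Exxxx has
bit 30 set in two's complement while an encoded value has it clear, which is
what the decoder tests.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed layout: bit 31 = marker, bits 20..29 = errno, bits 0..19 = size. */
#define RVAL_SIZE_BITS	20
#define RVAL_SIZE_MASK	((1u << RVAL_SIZE_BITS) - 1)
#define RVAL_ERR_MASK	0x3ffu

/* Hypothetical encoder: pack an errno (0 for success) and a size. */
static inline int getsockopt_rval(int errval, unsigned int size)
{
	uint32_t v = (1u << 31) |
		     (((uint32_t)errval & RVAL_ERR_MASK) << RVAL_SIZE_BITS) |
		     (size & RVAL_SIZE_MASK);

	return (int32_t)v;
}

/*
 * Hypothetical decoder for the calling code.
 * Returns 0 or -errno and stores the length to report in *size_out.
 */
static inline int decode_getsockopt_rval(int rval, int size_in, int *size_out)
{
	uint32_t bits = (uint32_t)rval;

	*size_out = size_in;
	if (rval >= 0) {
		/* 'return size' sets the length, 'return 0' leaves it alone. */
		if (rval > 0)
			*size_out = rval;
		return 0;
	}
	if (!(bits & (1u << 30))) {
		/* Encoded value: both a size and an errno (possibly 0). */
		*size_out = (int)(bits & RVAL_SIZE_MASK);
		return -(int)((bits >> RVAL_SIZE_BITS) & RVAL_ERR_MASK);
	}
	/* Plain -Exxxx: the length is unchanged. */
	return rval;
}
```

The one awkward case, setting the size to zero and returning success, would
be getsockopt_rval(0, 0) under these assumptions.
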
The real difficulty is stopping the usual culprits (bpf and io_uring) from
cheating and looking inside the structures.

> Additionally, you'd need safeguards ensuring callers never pass both user
> and kernel pointers simultaneously. This seems significantly worse than
> using sockptr.

Sockptr has the real disadvantage that it is very easy to mix up the kernel
and user pointers (there is some horrid code that looks inside).
If you have separate pointers that can't happen.
You might access NULL, but you are never going to use the wrong address.
Remember some systems (s390?) use the same numbers for user and kernel
addresses - you have to get it right.

In any case, if both addresses are set you can just have a rule that one
is used by preference - it isn't a problem.

There might be legitimate reasons for setting both pointers.
Consider setsockopt, the wrapper could copy small user structures into
an on-stack buffer.
The structure would then need to contain the address/length of the kernel
buffer as well as the actual user address in case the code wants to read
more than the expected data length.
For a kernel caller you also want the actual length of the buffer as a
separate field from the length of the [sg]etsockopt().

I'm not sure what fields you need for the address buffer.
Probably 'user address', 'kernel address' and 'kernel length', what you
don't need is support for scatter-gather, page list, pipes etc.

>
> --breno
>
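
A minimal sketch of the three-field buffer description mentioned above
('user address', 'kernel address', 'kernel length'), written as user-space C
so it can be compiled standalone. All names here (struct optval_buf,
optval_copy_in(), fake_copy_from_user()) are hypothetical, and
fake_copy_from_user() is only a memcpy() stand-in for the kernel's
copy_from_user(). It assumes the rule that the kernel buffer is used by
preference and the user address only when the request extends past the
kernel length.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of an option value buffer. */
struct optval_buf {
	const void *user;	/* user address, NULL for a kernel caller */
	const void *kernel;	/* kernel buffer (e.g. on-stack copy), or NULL */
	size_t kernel_len;	/* number of valid bytes at 'kernel' */
};

/*
 * Stand-in for copy_from_user() so the sketch builds outside the kernel.
 * Like the real function it returns the number of bytes NOT copied.
 */
static unsigned long fake_copy_from_user(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy 'len' bytes at 'offset' of the option value into 'dst'.
 * The kernel buffer is preferred; the user address is only used when the
 * request does not fit inside the kernel buffer.
 */
static int optval_copy_in(void *dst, const struct optval_buf *ov,
			  size_t offset, size_t len)
{
	if (ov->kernel && offset <= ov->kernel_len &&
	    len <= ov->kernel_len - offset) {
		memcpy(dst, (const char *)ov->kernel + offset, len);
		return 0;
	}
	if (ov->user) {
		if (fake_copy_from_user(dst, (const char *)ov->user + offset, len))
			return -EFAULT;
		return 0;
	}
	/* Kernel caller asked for more than its buffer holds. */
	return -EFAULT;
}
```

Because the two addresses live in separate fields, protocol code calling a
helper like this never has to decide which kind of pointer it was handed.
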