From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: "Stefano Garzarella" <sgarzare@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Simon Horman" <horms@kernel.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"K. Y. Srinivasan" <kys@microsoft.com>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Wei Liu" <wei.liu@kernel.org>,
"Dexuan Cui" <decui@microsoft.com>,
"Bryan Tan" <bryan-bt.tan@broadcom.com>,
"Vishnu Dasa" <vishnu.dasa@broadcom.com>,
"Broadcom internal kernel review list"
<bcm-kernel-feedback-list@broadcom.com>,
"Shuah Khan" <shuah@kernel.org>,
linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kselftest@vger.kernel.org,
berrange@redhat.com, "Sargun Dhillon" <sargun@sargun.me>,
"Bobby Eshleman" <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next v12 04/12] vsock: add netns support to virtio transports
Date: Tue, 2 Dec 2025 14:01:04 -0800
Message-ID: <aS9hoOKb7yA5Qgod@devvm11784.nha0.facebook.com>
In-Reply-To: <06b7cfea-d366-44f7-943e-087ead2f25c2@redhat.com>

On Tue, Dec 02, 2025 at 09:47:19PM +0100, Paolo Abeni wrote:
> On 12/2/25 6:56 PM, Bobby Eshleman wrote:
> > On Tue, Dec 02, 2025 at 11:18:14AM +0100, Paolo Abeni wrote:
> >> On 11/27/25 8:47 AM, Bobby Eshleman wrote:
> >>> @@ -674,6 +689,17 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
> >>> goto out;
> >>> }
> >>>
> >>> + net = current->nsproxy->net_ns;
> >>> + vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
> >>> +
> >>> + /* Store the mode of the namespace at the time of creation. If this
> >>> + * namespace later changes from "global" to "local", we want this vsock
> >>> + * to continue operating normally and not suddenly break. For that
> >>> + * reason, we save the mode here and later use it when performing
> >>> + * socket lookups with vsock_net_check_mode() (see vhost_vsock_get()).
> >>> + */
> >>> + vsock->net_mode = vsock_net_mode(net);
> >>
> >> I'm sorry for the very late feedback. I think that at very least the
> >> user-space needs a way to query if the given transport is in local or
> >> global mode, as AFAICS there is no way to tell that when socket creation
> >> races with mode change.
> >
> > Are you thinking something along the lines of sockopt?
>
> I'd like to see a way for the user-space to query the socket 'namespace
> mode'.
>
> sockopt could be an option; a possibly better one could be sock_diag. Or
> you could do both using dumping the info with a shared helper invoked by
> both code paths, alike what TCP is doing.
>
> >> Also I'm a bit uneasy with the model implemented here, as 'local' socket
> >> may cross netns boundaris and connect to 'local' socket in other netns
> >> (if I read correctly patch 2/12). That in turns AFAICS break the netns
> >> isolation.
> >
> > Local mode sockets are unable to communicate with local mode (and global
> > mode too) sockets that are in other namespaces. The key piece of code
> > for that is vsock_net_check_mode(), where if either modes is local the
> > namespaces must be the same.
>
> Sorry, I likely misread the large comment in patch 2:
>
> https://lore.kernel.org/netdev/20251126-vsock-vmtest-v12-2-257ee21cd5de@meta.com/
>
> >> Have you considered instead a slightly different model, where the
> >> local/global model is set in stone at netns creation time - alike what
> >> /proc/sys/net/ipv4/tcp_child_ehash_entries is doing[1] - and
> >> inter-netns connectivity is explicitly granted by the admin (I guess
> >> you will need new transport operations for that)?
> >>
> >> /P
> >>
> >> [1] tcp allows using per-netns established socket lookup tables - as
> >> opposed to the default global lookup table (even if match always takes
> >> in account the netns obviously). The mentioned sysctl specify such
> >> configuration for the children namespaces, if any.
> >
> > I'll save this discussion if the above doesn't resolve your concerns.
>
> I still have some concern WRT the dynamic mode change after netns
> creation. I fear some 'unsolvable' (or very hard to solve) race I can't
> see now. A tcp_child_ehash_entries-like model will avoid completely the
> issue, but I understand it would be a significant change over the
> current status.
>
> "Luckily" the merge window is on us and we have some time to discuss. Do
> you have a specific use-case for the ability to change the netns mode
> after creation?
>
> /P

I don't think there is a hard requirement that the mode be changeable
after creation, though I'd love to avoid such a big change, or at
least leave as much of the already-reviewed code unchanged as
possible.

If we define the mode at creation time, following a
tcp_child_ehash_entries-like model, what I'm imagining is:

- /proc/sys/net/vsock/child_ns_mode can be set to "local" or "global"
- /proc/sys/net/vsock/child_ns_mode is mutable and can be changed any
  number of times
- when a netns is created, the new netns mode is inherited from the
  creating netns's child_ns_mode, being assigned using something like:

      net->vsock.ns_mode =
              current->nsproxy->net_ns->vsock.child_ns_mode

- /proc/sys/net/vsock/ns_mode reports the current mode, i.e. the value
  of net->vsock.ns_mode, as "local" or "global"
- /proc/sys/net/vsock/ns_mode and net->vsock.ns_mode are immutable and
  reject writes
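
To make the proposed semantics concrete, here is a minimal userspace
sketch in C. It is not kernel code. The names (struct vsock_ns,
vsock_ns_init(), vsock_ns_write_child_mode(), vsock_ns_write_mode()) are
hypothetical, and three details are assumptions rather than anything
decided in this thread: the initial namespace defaulting to "global", a
new namespace's child_ns_mode starting out equal to its own ns_mode, and
-EPERM as the error for a rejected write.

```c
#include <errno.h>
#include <stddef.h>

enum vsock_ns_mode { VSOCK_NS_GLOBAL, VSOCK_NS_LOCAL };

/* Hypothetical stand-in for the per-netns vsock state. */
struct vsock_ns {
	enum vsock_ns_mode ns_mode;       /* fixed when the netns is created */
	enum vsock_ns_mode child_ns_mode; /* writable at any time */
};

/*
 * Model netns creation: the new namespace takes its mode from the
 * creating (parent) namespace's child_ns_mode. A NULL parent models the
 * initial namespace, assumed here to default to "global".
 */
static void vsock_ns_init(struct vsock_ns *ns, const struct vsock_ns *parent)
{
	ns->ns_mode = parent ? parent->child_ns_mode : VSOCK_NS_GLOBAL;
	/* Assumption: children of this netns default to its own mode. */
	ns->child_ns_mode = ns->ns_mode;
}

/* Model a write to child_ns_mode: always allowed. */
static int vsock_ns_write_child_mode(struct vsock_ns *ns,
				     enum vsock_ns_mode mode)
{
	ns->child_ns_mode = mode;
	return 0;
}

/* Model a write to ns_mode: always rejected, the mode never changes. */
static int vsock_ns_write_mode(struct vsock_ns *ns, enum vsock_ns_mode mode)
{
	(void)ns;
	(void)mode;
	return -EPERM;
}
```

With this shape, changing child_ns_mode only affects namespaces created
afterwards; a namespace that already exists keeps the mode it was
created with, so a socket can never observe its namespace's mode
changing underneath it.
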

Does that align with what you have in mind?

Stefano, what are your thoughts?

Best,
Bobby