* [PATCH net] vsock: keep poll shutdown state consistent
@ 2026-05-16 3:47 Ziyu Zhang
2026-05-18 10:16 ` Stefano Garzarella
0 siblings, 1 reply; 5+ messages in thread
From: Ziyu Zhang @ 2026-05-16 3:47 UTC (permalink / raw)
To: Stefano Garzarella, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Andy King, George Zhang, Dmitry Torokhov,
virtualization, netdev, linux-kernel, baijiaju1990, r33s3n6,
gality369, zhenghaoran154, hanguidong02, zzzccc427, Ziyu Zhang
vsock_poll() reads vsk->peer_shutdown before taking the socket
lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the
lock to report EOF readability. A shutdown packet can update
peer_shutdown while poll is waiting for the lock, so one poll invocation
can report EPOLLIN without the corresponding HUP/RDHUP bits.
Keep non-connectible sockets on a single lockless READ_ONCE()
snapshot. For connectible sockets, defer shutdown-derived poll bits
until after lock_sock() and use one READ_ONCE() snapshot for both EOF
readability and HUP/RDHUP. This preserves shutdowns that arrive before
the lock is acquired and keeps all peer-shutdown-derived bits consistent
for a poll pass.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Ziyu Zhang <ziyuzhang201@gmail.com>
---
net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++--------------
1 file changed, 26 insertions(+), 14 deletions(-)
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index adcba1b7b..bed42347b 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode)
return err;
}
+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown)
+{
+ __poll_t mask = 0;
+
+ /* INET sockets treat local write shutdown and peer write shutdown as a
+ * case of EPOLLHUP set.
+ */
+ if (sk->sk_shutdown == SHUTDOWN_MASK ||
+ ((sk->sk_shutdown & SEND_SHUTDOWN) &&
+ (peer_shutdown & SEND_SHUTDOWN)))
+ mask |= EPOLLHUP;
+
+ if (sk->sk_shutdown & RCV_SHUTDOWN ||
+ peer_shutdown & SEND_SHUTDOWN)
+ mask |= EPOLLRDHUP;
+
+ return mask;
+}
+
static __poll_t vsock_poll(struct file *file, struct socket *sock,
poll_table *wait)
{
@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
/* Signify that there has been an error on this socket. */
mask |= EPOLLERR;
- /* INET sockets treat local write shutdown and peer write shutdown as a
- * case of EPOLLHUP set.
- */
- if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
- ((sk->sk_shutdown & SEND_SHUTDOWN) &&
- (vsk->peer_shutdown & SEND_SHUTDOWN))) {
- mask |= EPOLLHUP;
- }
-
- if (sk->sk_shutdown & RCV_SHUTDOWN ||
- vsk->peer_shutdown & SEND_SHUTDOWN) {
- mask |= EPOLLRDHUP;
- }
+ if (!sock_type_connectible(sk->sk_type))
+ mask |= vsock_poll_shutdown(sk,
+ READ_ONCE(vsk->peer_shutdown));
if (sk_is_readable(sk))
mask |= EPOLLIN | EPOLLRDNORM;
@@ -1171,6 +1180,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
} else if (sock_type_connectible(sk->sk_type)) {
const struct vsock_transport *transport;
+ u32 peer_shutdown;
lock_sock(sk);
@@ -1203,10 +1213,12 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock,
* terminated should also be considered read, and we check the
* shutdown flag for that.
*/
+ peer_shutdown = READ_ONCE(vsk->peer_shutdown);
if (sk->sk_shutdown & RCV_SHUTDOWN ||
- vsk->peer_shutdown & SEND_SHUTDOWN) {
+ peer_shutdown & SEND_SHUTDOWN) {
mask |= EPOLLIN | EPOLLRDNORM;
}
+ mask |= vsock_poll_shutdown(sk, peer_shutdown);
/* Connected sockets that can produce data can be written. */
if (transport && sk->sk_state == TCP_ESTABLISHED) {
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH net] vsock: keep poll shutdown state consistent 2026-05-16 3:47 [PATCH net] vsock: keep poll shutdown state consistent Ziyu Zhang @ 2026-05-18 10:16 ` Stefano Garzarella 2026-05-18 12:39 ` ziyu zhang 0 siblings, 1 reply; 5+ messages in thread From: Stefano Garzarella @ 2026-05-18 10:16 UTC (permalink / raw) To: Ziyu Zhang Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Andy King, George Zhang, Dmitry Torokhov, virtualization, netdev, linux-kernel, baijiaju1990, r33s3n6, gality369, zhenghaoran154, hanguidong02, zzzccc427 On Sat, May 16, 2026 at 11:47:45AM +0800, Ziyu Zhang wrote: >vsock_poll() reads vsk->peer_shutdown before taking the socket >lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the >lock to report EOF readability. A shutdown packet can update >peer_shutdown while poll is waiting for the lock, so one poll invocation >can report EPOLLIN without the corresponding HUP/RDHUP bits. > >Keep non-connectible sockets on a single lockless READ_ONCE() Should this be paired with WRITE_ONCE() on writers? >snapshot. For connectible sockets, defer shutdown-derived poll bits >until after lock_sock() and use one READ_ONCE() snapshot for both EOF >readability and HUP/RDHUP. This preserves shutdowns that arrive before >the lock is acquired and keeps all peer-shutdown-derived bits consistent >for a poll pass. > >Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") >Signed-off-by: Ziyu Zhang <ziyuzhang201@gmail.com> >--- > net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++-------------- > 1 file changed, 26 insertions(+), 14 deletions(-) > >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c >index adcba1b7b..bed42347b 100644 >--- a/net/vmw_vsock/af_vsock.c >+++ b/net/vmw_vsock/af_vsock.c >@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode) > return err; > } > >+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown) >+{ >+ __poll_t mask = 0; >+ >+ /* INET sockets treat local write shutdown and peer write shutdown as a >+ * case of EPOLLHUP set. >+ */ >+ if (sk->sk_shutdown == SHUTDOWN_MASK || >+ ((sk->sk_shutdown & SEND_SHUTDOWN) && >+ (peer_shutdown & SEND_SHUTDOWN))) >+ mask |= EPOLLHUP; >+ >+ if (sk->sk_shutdown & RCV_SHUTDOWN || >+ peer_shutdown & SEND_SHUTDOWN) >+ mask |= EPOLLRDHUP; >+ >+ return mask; >+} >+ > static __poll_t vsock_poll(struct file *file, struct socket *sock, > poll_table *wait) > { >@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > /* Signify that there has been an error on this socket. */ > mask |= EPOLLERR; > >- /* INET sockets treat local write shutdown and peer write shutdown as a >- * case of EPOLLHUP set. >- */ >- if ((sk->sk_shutdown == SHUTDOWN_MASK) || >- ((sk->sk_shutdown & SEND_SHUTDOWN) && >- (vsk->peer_shutdown & SEND_SHUTDOWN))) { >- mask |= EPOLLHUP; >- } >- >- if (sk->sk_shutdown & RCV_SHUTDOWN || >- vsk->peer_shutdown & SEND_SHUTDOWN) { >- mask |= EPOLLRDHUP; >- } >+ if (!sock_type_connectible(sk->sk_type)) >+ mask |= vsock_poll_shutdown(sk, >+ READ_ONCE(vsk->peer_shutdown)); Can we move this in the `if (sock->type == SOCK_DGRAM)` branch ? Not a strong opinion about that, but in any case IMO we should add a comment here to explain why we are doing only for not connectible sockets. That said, if we use WRITE_ONCE in the writers, do we really need to move this after the lock_sock for the connectable ones? > > if (sk_is_readable(sk)) > mask |= EPOLLIN | EPOLLRDNORM; >@@ -1171,6 +1180,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > > } else if (sock_type_connectible(sk->sk_type)) { > const struct vsock_transport *transport; >+ u32 peer_shutdown; > > lock_sock(sk); > >@@ -1203,10 +1213,12 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > * terminated should also be considered read, and we check the > * shutdown flag for that. > */ >+ peer_shutdown = READ_ONCE(vsk->peer_shutdown); > if (sk->sk_shutdown & RCV_SHUTDOWN || >- vsk->peer_shutdown & SEND_SHUTDOWN) { >+ peer_shutdown & SEND_SHUTDOWN) { > mask |= EPOLLIN | EPOLLRDNORM; > } >+ mask |= vsock_poll_shutdown(sk, peer_shutdown); nit: to keep the order the same as before, I would move this call just before this `if` block, but I don't think it makes any difference in the end. > > /* Connected sockets that can produce data can be written. */ > if (transport && sk->sk_state == TCP_ESTABLISHED) { >-- >2.43.0 > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net] vsock: keep poll shutdown state consistent 2026-05-18 10:16 ` Stefano Garzarella @ 2026-05-18 12:39 ` ziyu zhang 2026-05-19 9:36 ` Stefano Garzarella 0 siblings, 1 reply; 5+ messages in thread From: ziyu zhang @ 2026-05-18 12:39 UTC (permalink / raw) To: Stefano Garzarella Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Andy King, George Zhang, Dmitry Torokhov, virtualization, netdev, linux-kernel, baijiaju1990, r33s3n6, gality369, zhenghaoran154, hanguidong02, zzzccc427 On Mon, 18 May 2026 at 18:16, Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Sat, May 16, 2026 at 11:47:45AM +0800, Ziyu Zhang wrote: > >vsock_poll() reads vsk->peer_shutdown before taking the socket > >lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the > >lock to report EOF readability. A shutdown packet can update > >peer_shutdown while poll is waiting for the lock, so one poll invocation > >can report EPOLLIN without the corresponding HUP/RDHUP bits. > > > >Keep non-connectible sockets on a single lockless READ_ONCE() > > Should this be paired with WRITE_ONCE() on writers? Yes, since the poll path uses lockless READ_ONCE() snapshots of peer_shutdown, the writer side should be annotated with WRITE_ONCE() as well. I will add that in v2. > > >snapshot. For connectible sockets, defer shutdown-derived poll bits > >until after lock_sock() and use one READ_ONCE() snapshot for both EOF > >readability and HUP/RDHUP. This preserves shutdowns that arrive before > >the lock is acquired and keeps all peer-shutdown-derived bits consistent > >for a poll pass. > > > >Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") > >Signed-off-by: Ziyu Zhang <ziyuzhang201@gmail.com> > >--- > > net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++-------------- > > 1 file changed, 26 insertions(+), 14 deletions(-) > > > >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c > >index adcba1b7b..bed42347b 100644 > >--- a/net/vmw_vsock/af_vsock.c > >+++ b/net/vmw_vsock/af_vsock.c > >@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode) > > return err; > > } > > > >+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown) > >+{ > >+ __poll_t mask = 0; > >+ > >+ /* INET sockets treat local write shutdown and peer write shutdown as a > >+ * case of EPOLLHUP set. > >+ */ > >+ if (sk->sk_shutdown == SHUTDOWN_MASK || > >+ ((sk->sk_shutdown & SEND_SHUTDOWN) && > >+ (peer_shutdown & SEND_SHUTDOWN))) > >+ mask |= EPOLLHUP; > >+ > >+ if (sk->sk_shutdown & RCV_SHUTDOWN || > >+ peer_shutdown & SEND_SHUTDOWN) > >+ mask |= EPOLLRDHUP; > >+ > >+ return mask; > >+} > >+ > > static __poll_t vsock_poll(struct file *file, struct socket *sock, > > poll_table *wait) > > { > >@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > > /* Signify that there has been an error on this socket. */ > > mask |= EPOLLERR; > > > >- /* INET sockets treat local write shutdown and peer write shutdown as a > >- * case of EPOLLHUP set. > >- */ > >- if ((sk->sk_shutdown == SHUTDOWN_MASK) || > >- ((sk->sk_shutdown & SEND_SHUTDOWN) && > >- (vsk->peer_shutdown & SEND_SHUTDOWN))) { > >- mask |= EPOLLHUP; > >- } > >- > >- if (sk->sk_shutdown & RCV_SHUTDOWN || > >- vsk->peer_shutdown & SEND_SHUTDOWN) { > >- mask |= EPOLLRDHUP; > >- } > >+ if (!sock_type_connectible(sk->sk_type)) > >+ mask |= vsock_poll_shutdown(sk, > >+ READ_ONCE(vsk->peer_shutdown)); > > Can we move this in the `if (sock->type == SOCK_DGRAM)` branch ? > > Not a strong opinion about that, but in any case IMO we should add a > comment here to explain why we are doing only for not connectible > sockets. > > That said, if we use WRITE_ONCE in the writers, do we really need to > move this after the lock_sock for the connectable ones? Yes, I will move the non-connectible handling into the SOCK_DGRAM branch and add a comment there. For connectible sockets, my current understanding is that READ_ONCE()/WRITE_ONCE() make the individual lockless accesses explicit, but they do not make two separate reads in one vsock_poll() invocation observe the same peer_shutdown value. So I still think using one peer_shutdown snapshot after lock_sock() is useful for keeping the returned mask internally consistent. Please let me know if you think WRITE_ONCE() is enough for this case. > > > > > if (sk_is_readable(sk)) > > mask |= EPOLLIN | EPOLLRDNORM; > >@@ -1171,6 +1180,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > > > > } else if (sock_type_connectible(sk->sk_type)) { > > const struct vsock_transport *transport; > >+ u32 peer_shutdown; > > > > lock_sock(sk); > > > >@@ -1203,10 +1213,12 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > > * terminated should also be considered read, and we check the > > * shutdown flag for that. > > */ > >+ peer_shutdown = READ_ONCE(vsk->peer_shutdown); > > if (sk->sk_shutdown & RCV_SHUTDOWN || > >- vsk->peer_shutdown & SEND_SHUTDOWN) { > >+ peer_shutdown & SEND_SHUTDOWN) { > > mask |= EPOLLIN | EPOLLRDNORM; > > } > >+ mask |= vsock_poll_shutdown(sk, peer_shutdown); > > nit: to keep the order the same as before, I would move this call just > before this `if` block, but I don't think it makes any difference in the > end. Sure, I will restore the previous ordering in v2. Thanks, Ziyu > > > > > /* Connected sockets that can produce data can be written. */ > > if (transport && sk->sk_state == TCP_ESTABLISHED) { > >-- > >2.43.0 > > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net] vsock: keep poll shutdown state consistent 2026-05-18 12:39 ` ziyu zhang @ 2026-05-19 9:36 ` Stefano Garzarella 2026-05-19 15:58 ` ziyu zhang 0 siblings, 1 reply; 5+ messages in thread From: Stefano Garzarella @ 2026-05-19 9:36 UTC (permalink / raw) To: ziyu zhang Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Andy King, George Zhang, Dmitry Torokhov, virtualization, netdev, linux-kernel, baijiaju1990, r33s3n6, gality369, zhenghaoran154, hanguidong02, zzzccc427 On Mon, May 18, 2026 at 08:39:37PM +0800, ziyu zhang wrote: >On Mon, 18 May 2026 at 18:16, Stefano Garzarella <sgarzare@redhat.com> wrote: >> >> On Sat, May 16, 2026 at 11:47:45AM +0800, Ziyu Zhang wrote: >> >vsock_poll() reads vsk->peer_shutdown before taking the socket >> >lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the >> >lock to report EOF readability. A shutdown packet can update >> >peer_shutdown while poll is waiting for the lock, so one poll invocation >> >can report EPOLLIN without the corresponding HUP/RDHUP bits. >> > >> >Keep non-connectible sockets on a single lockless READ_ONCE() >> >> Should this be paired with WRITE_ONCE() on writers? > >Yes, since the poll path uses lockless READ_ONCE() snapshots of >peer_shutdown, the writer side should be annotated with WRITE_ONCE() as >well. I will add that in v2. > >> >> >snapshot. For connectible sockets, defer shutdown-derived poll bits >> >until after lock_sock() and use one READ_ONCE() snapshot for both EOF >> >readability and HUP/RDHUP. This preserves shutdowns that arrive before >> >the lock is acquired and keeps all peer-shutdown-derived bits consistent >> >for a poll pass. >> > >> >Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") >> >Signed-off-by: Ziyu Zhang <ziyuzhang201@gmail.com> >> >--- >> > net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++-------------- >> > 1 file changed, 26 insertions(+), 14 deletions(-) >> > >> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c >> >index adcba1b7b..bed42347b 100644 >> >--- a/net/vmw_vsock/af_vsock.c >> >+++ b/net/vmw_vsock/af_vsock.c >> >@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode) >> > return err; >> > } >> > >> >+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown) >> >+{ >> >+ __poll_t mask = 0; >> >+ >> >+ /* INET sockets treat local write shutdown and peer write shutdown as a >> >+ * case of EPOLLHUP set. >> >+ */ >> >+ if (sk->sk_shutdown == SHUTDOWN_MASK || >> >+ ((sk->sk_shutdown & SEND_SHUTDOWN) && >> >+ (peer_shutdown & SEND_SHUTDOWN))) >> >+ mask |= EPOLLHUP; >> >+ >> >+ if (sk->sk_shutdown & RCV_SHUTDOWN || >> >+ peer_shutdown & SEND_SHUTDOWN) >> >+ mask |= EPOLLRDHUP; >> >+ >> >+ return mask; >> >+} >> >+ >> > static __poll_t vsock_poll(struct file *file, struct socket *sock, >> > poll_table *wait) >> > { >> >@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, >> > /* Signify that there has been an error on this socket. */ >> > mask |= EPOLLERR; >> > >> >- /* INET sockets treat local write shutdown and peer write shutdown as a >> >- * case of EPOLLHUP set. >> >- */ >> >- if ((sk->sk_shutdown == SHUTDOWN_MASK) || >> >- ((sk->sk_shutdown & SEND_SHUTDOWN) && >> >- (vsk->peer_shutdown & SEND_SHUTDOWN))) { >> >- mask |= EPOLLHUP; >> >- } >> >- >> >- if (sk->sk_shutdown & RCV_SHUTDOWN || >> >- vsk->peer_shutdown & SEND_SHUTDOWN) { >> >- mask |= EPOLLRDHUP; >> >- } >> >+ if (!sock_type_connectible(sk->sk_type)) >> >+ mask |= vsock_poll_shutdown(sk, >> >+ READ_ONCE(vsk->peer_shutdown)); >> >> Can we move this in the `if (sock->type == SOCK_DGRAM)` branch ? >> >> Not a strong opinion about that, but in any case IMO we should add a >> comment here to explain why we are doing only for not connectible >> sockets. >> >> That said, if we use WRITE_ONCE in the writers, do we really need to >> move this after the lock_sock for the connectable ones? > >Yes, I will move the non-connectible handling into the SOCK_DGRAM branch >and add a comment there. > >For connectible sockets, my current understanding is that >READ_ONCE()/WRITE_ONCE() make the individual lockless accesses explicit, but >they do not make two separate reads in one vsock_poll() invocation observe the >same peer_shutdown value. So I still think using one peer_shutdown snapshot >after lock_sock() is useful for keeping the returned mask internally >consistent. Please let me know if you think WRITE_ONCE() is enough for this >case. What will be the issue of "do not make two separate reads in one vsock_poll() invocation observe the same peer_shutdown value." ? In any case, I'm not against it; I just want to understand the issue better. Thanks, Stefano ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net] vsock: keep poll shutdown state consistent 2026-05-19 9:36 ` Stefano Garzarella @ 2026-05-19 15:58 ` ziyu zhang 0 siblings, 0 replies; 5+ messages in thread From: ziyu zhang @ 2026-05-19 15:58 UTC (permalink / raw) To: Stefano Garzarella Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Andy King, George Zhang, Dmitry Torokhov, virtualization, netdev, linux-kernel, baijiaju1990, r33s3n6, gality369, zhenghaoran154, hanguidong02, zzzccc427 On Tue, 19 May 2026 at 17:36, Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Mon, May 18, 2026 at 08:39:37PM +0800, ziyu zhang wrote: > >On Mon, 18 May 2026 at 18:16, Stefano Garzarella <sgarzare@redhat.com> wrote: > >> > >> On Sat, May 16, 2026 at 11:47:45AM +0800, Ziyu Zhang wrote: > >> >vsock_poll() reads vsk->peer_shutdown before taking the socket > >> >lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the > >> >lock to report EOF readability. A shutdown packet can update > >> >peer_shutdown while poll is waiting for the lock, so one poll invocation > >> >can report EPOLLIN without the corresponding HUP/RDHUP bits. > >> > > >> >Keep non-connectible sockets on a single lockless READ_ONCE() > >> > >> Should this be paired with WRITE_ONCE() on writers? > > > >Yes, since the poll path uses lockless READ_ONCE() snapshots of > >peer_shutdown, the writer side should be annotated with WRITE_ONCE() as > >well. I will add that in v2. > > > >> > >> >snapshot. For connectible sockets, defer shutdown-derived poll bits > >> >until after lock_sock() and use one READ_ONCE() snapshot for both EOF > >> >readability and HUP/RDHUP. This preserves shutdowns that arrive before > >> >the lock is acquired and keeps all peer-shutdown-derived bits consistent > >> >for a poll pass. > >> > > >> >Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") > >> >Signed-off-by: Ziyu Zhang <ziyuzhang201@gmail.com> > >> >--- > >> > net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++-------------- > >> > 1 file changed, 26 insertions(+), 14 deletions(-) > >> > > >> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c > >> >index adcba1b7b..bed42347b 100644 > >> >--- a/net/vmw_vsock/af_vsock.c > >> >+++ b/net/vmw_vsock/af_vsock.c > >> >@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode) > >> > return err; > >> > } > >> > > >> >+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown) > >> >+{ > >> >+ __poll_t mask = 0; > >> >+ > >> >+ /* INET sockets treat local write shutdown and peer write shutdown as a > >> >+ * case of EPOLLHUP set. > >> >+ */ > >> >+ if (sk->sk_shutdown == SHUTDOWN_MASK || > >> >+ ((sk->sk_shutdown & SEND_SHUTDOWN) && > >> >+ (peer_shutdown & SEND_SHUTDOWN))) > >> >+ mask |= EPOLLHUP; > >> >+ > >> >+ if (sk->sk_shutdown & RCV_SHUTDOWN || > >> >+ peer_shutdown & SEND_SHUTDOWN) > >> >+ mask |= EPOLLRDHUP; > >> >+ > >> >+ return mask; > >> >+} > >> >+ > >> > static __poll_t vsock_poll(struct file *file, struct socket *sock, > >> > poll_table *wait) > >> > { > >> >@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, > >> > /* Signify that there has been an error on this socket. */ > >> > mask |= EPOLLERR; > >> > > >> >- /* INET sockets treat local write shutdown and peer write shutdown as a > >> >- * case of EPOLLHUP set. > >> >- */ > >> >- if ((sk->sk_shutdown == SHUTDOWN_MASK) || > >> >- ((sk->sk_shutdown & SEND_SHUTDOWN) && > >> >- (vsk->peer_shutdown & SEND_SHUTDOWN))) { > >> >- mask |= EPOLLHUP; > >> >- } > >> >- > >> >- if (sk->sk_shutdown & RCV_SHUTDOWN || > >> >- vsk->peer_shutdown & SEND_SHUTDOWN) { > >> >- mask |= EPOLLRDHUP; > >> >- } > >> >+ if (!sock_type_connectible(sk->sk_type)) > >> >+ mask |= vsock_poll_shutdown(sk, > >> >+ READ_ONCE(vsk->peer_shutdown)); > >> > >> Can we move this in the `if (sock->type == SOCK_DGRAM)` branch ? > >> > >> Not a strong opinion about that, but in any case IMO we should add a > >> comment here to explain why we are doing only for not connectible > >> sockets. > >> > >> That said, if we use WRITE_ONCE in the writers, do we really need to > >> move this after the lock_sock for the connectable ones? > > > >Yes, I will move the non-connectible handling into the SOCK_DGRAM branch > >and add a comment there. > > > >For connectible sockets, my current understanding is that > >READ_ONCE()/WRITE_ONCE() make the individual lockless accesses explicit, but > >they do not make two separate reads in one vsock_poll() invocation observe the > >same peer_shutdown value. So I still think using one peer_shutdown snapshot > >after lock_sock() is useful for keeping the returned mask internally > >consistent. Please let me know if you think WRITE_ONCE() is enough for this > >case. > > What will be the issue of "do not make two separate reads in one > vsock_poll() invocation observe the same peer_shutdown value." ? > > In any case, I'm not against it; I just want to understand the issue > better. The issue I was trying to avoid is a transiently inconsistent poll mask from one vsock_poll() pass. For example, the early peer_shutdown read can see 0, so the shutdown-derived bits are not set, especially EPOLLRDHUP, and EPOLLHUP in the cases where the existing logic would set it. Then, before lock_sock() succeeds, the peer shutdown can be processed. The later read after lock_sock() can see SEND_SHUTDOWN and set EPOLLIN|EPOLLRDNORM for EOF readability. So the returned mask can say that EOF is readable, but miss the shutdown-derived indication from the same peer_shutdown state. If userspace is waiting specifically for EPOLLRDHUP, that notification can be missed or delayed for that poll pass. I agree this is small and transient. A following read() or another poll pass will observe the shutdown state. My intention with the single snapshot is only to avoid mixing old and new peer_shutdown values in one returned mask. I will send a v2 as a new thread with WRITE_ONCE() on the writers, the DGRAM comment, and your ordering suggestion. Thanks, Ziyu > > Thanks, > Stefano > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-19 15:58 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-16 3:47 [PATCH net] vsock: keep poll shutdown state consistent Ziyu Zhang 2026-05-18 10:16 ` Stefano Garzarella 2026-05-18 12:39 ` ziyu zhang 2026-05-19 9:36 ` Stefano Garzarella 2026-05-19 15:58 ` ziyu zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox