* Hyper-V vsock streams do not fill the supplied buffer in full
@ 2023-07-04 22:45 Gary Guo
From: Gary Guo @ 2023-07-04 22:45 UTC
To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Stefano Garzarella
Cc: linux-hyperv, virtualization, netdev, linux-kernel
When recvmsg is called on a Hyper-V vsock stream socket, it only fills
the supplied buffer with data from the first VM packet. Even if more VM
packets are queued at that time and the buffer still has room, the call
returns with the buffer only partially filled.
This causes problems in WSLD [0], which uses vsock in non-blocking
mode together with epoll.
For stream-oriented sockets, the epoll man page [1] says:
> For stream-oriented files (e.g., pipe, FIFO, stream socket),
> the condition that the read/write I/O space is exhausted can
> also be detected by checking the amount of data read from /
> written to the target file descriptor. For example, if you
> call read(2) by asking to read a certain amount of data and
> read(2) returns a lower number of bytes, you can be sure of
> having exhausted the read I/O space for the file descriptor.
This is used in the wild as an optimisation to reduce the number of
syscalls needed for stream sockets: in edge-triggered mode, if the
buffer given to recvmsg is not completely filled, the application
assumes the socket is drained and skips the extra read that would
otherwise be needed to reach EAGAIN. One example is Tokio, which has
done this since v1.21.0 [2].
When this optimisation is combined with the Hyper-V vsock behaviour,
the following scenario goes wrong:
* the VM host sends data to the guest, and the data is split into
multiple VM packets
* sk_data_ready is called and epoll returns, notifying userspace
that the socket is ready
* userspace calls recvmsg with a buffer, which is only partially filled
* userspace assumes that the stream socket is drained and that epoll
will notify it again when new data arrives
* the kernel still considers the socket to be ready (the remaining VM
packets are still queued), so in edge-triggered mode the epoll
instance is never notified again.
This disagreement between userspace and the kernel about readiness
leaves userspace blocked forever.
[0] https://github.com/nbdd0121/wsld/issues/32
[1] https://man7.org/linux/man-pages/man7/epoll.7.html#:~:text=For%20stream%2Doriented%20files
[2] https://github.com/tokio-rs/tokio/pull/4840
* Re: Hyper-V vsock streams do not fill the supplied buffer in full
@ 2023-07-06 10:01 Stefano Garzarella
From: Stefano Garzarella @ 2023-07-06 10:01 UTC
To: Gary Guo, Dexuan Cui
Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, linux-hyperv,
virtualization, netdev, linux-kernel
Hi Gary,
On Wed, Jul 5, 2023 at 12:45 AM Gary Guo <gary@garyguo.net> wrote:
> This different realisation of the readiness causes the userspace to
> block forever.
Thanks for the detailed description of the problem.
I think we should fix hvs_stream_dequeue() in
net/vmw_vsock/hyperv_transport.c.
We can do something similar to what we do in
virtio_transport_stream_do_dequeue() in
net/vmw_vsock/virtio_transport_common.c.
@Dexuan WDYT?
Thanks,
Stefano
* RE: Hyper-V vsock streams do not fill the supplied buffer in full
@ 2023-07-26 21:34 Dexuan Cui
From: Dexuan Cui @ 2023-07-26 21:34 UTC
To: Stefano Garzarella, Gary Guo
Cc: KY Srinivasan, Haiyang Zhang, Wei Liu,
linux-hyperv@vger.kernel.org,
virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Nischala Yelchuri
> -----Original Message-----
> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Thursday, July 6, 2023 3:02 AM
>
> I think we should fix the hvs_stream_dequeue() in
> net/vmw_vsock/hyperv_transport.c.
> We can do something similar to what we do in
> virtio_transport_stream_do_dequeue() in
> net/vmw_vsock/virtio_transport_common.c
>
> @Dexuan WDYT?
(Sorry for the late response...)
Thanks, Gary, for the good analysis!
I didn't realize that hvs_stream_dequeue() is supposed to
copy as much data as possible to userspace when the socket
is used in EPOLLET mode.
Yes, I think we should fix hvs_stream_dequeue(). We'll try to get
this fixed ASAP.
Thanks,
-- Dexuan