All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Cc: Stefano Garzarella <sgarzare@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	"edumazet@google.com" <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"kys@microsoft.com" <kys@microsoft.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"sthemmin@microsoft.com" <sthemmin@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	Dexuan Cui <decui@microsoft.com>, Bryan Tan <bryantan@vmware.com>,
	Vishnu Dasa <vdasa@vmware.com>,
	Krasnov Arseniy <oxffffaa@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"virtualization@lists.linux-foundation.org" 
	<virtualization@lists.linux-foundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	kernel <kernel@sberdevices.ru>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
Subject: Re: [PATCH net-next v4 0/9] vsock: updates for SO_RCVLOWAT handling
Date: Tue, 23 Aug 2022 15:14:10 -0400	[thread overview]
Message-ID: <YwUnAhWauSFSJX+g@fedora> (raw)
In-Reply-To: <de41de4c-0345-34d7-7c36-4345258b7ba8@sberdevices.ru>

[-- Attachment #1: Type: text/plain, Size: 4416 bytes --]

On Fri, Aug 19, 2022 at 05:21:58AM +0000, Arseniy Krasnov wrote:
> Hello,
> 
> This patchset includes some updates for SO_RCVLOWAT:
> 
> 1) af_vsock:
>    During my experiments with zerocopy receive, i found, that in some
>    cases, poll() implementation violates POSIX: when socket has non-
>    default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>    POLLRDNORM bits in 'revents' even number of bytes available to read
>    on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>    POLLIN flag and then tries to read data(for example using  'read()'
>    call), but read call will be blocked, because  SO_RCVLOWAT logic is
>    supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>    requires that:
> 
>    "POLLIN     Data other than high-priority data may be read without
>                blocking.
>     POLLRDNORM Normal data may be read without blocking."
> 
>    See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
> 
>    So, we have, that poll() syscall returns POLLIN, but read call will
>    be blocked.
> 
>    Also in man page socket(7) i found that:
> 
>    "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>    socket as readable only if at least SO_RCVLOWAT bytes are available."
> 
>    I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>    uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>    this case for TCP socket, it works as POSIX required.
> 
>    I've added some fixes to af_vsock.c and virtio_transport_common.c,
>    test is also implemented.
> 
> 2) virtio/vsock:
>    It adds some optimization to wake ups, when new data arrived. Now,
>    SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>    There is no sense, to kick waiter, when number of available bytes
>    in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>    this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>    or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>    exit from poll() will be "spurious". This logic is also used in TCP
>    sockets.
> 
> 3) vmci/vsock:
>    Same as 2), but i'm not sure about this changes. Will be very good,
>    to get comments from someone who knows this code.
> 
> 4) Hyper-V:
>    As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>    support SO_RCVLOWAT, so he suggested to disable this feature for
>    Hyper-V.
> 
> Thank You

Hi Arseniy,
Stefano will be online again on Monday. I suggest we wait for him to
review this series. If it's urgent, please let me know and I'll take a
look.

Thanks,
Stefan

> Arseniy Krasnov(9):
>  vsock: SO_RCVLOWAT transport set callback
>  hv_sock: disable SO_RCVLOWAT support
>  virtio/vsock: use 'target' in notify_poll_in callback
>  vmci/vsock: use 'target' in notify_poll_in callback
>  vsock: pass sock_rcvlowat to notify_poll_in as target
>  vsock: add API call for data ready
>  virtio/vsock: check SO_RCVLOWAT before wake up reader
>  vmci/vsock: check SO_RCVLOWAT before wake up reader
>  vsock_test: POLLIN + SO_RCVLOWAT test
> 
>  include/net/af_vsock.h                       |   2 +
>  net/vmw_vsock/af_vsock.c                     |  33 +++++++-
>  net/vmw_vsock/hyperv_transport.c             |   7 ++
>  net/vmw_vsock/virtio_transport_common.c      |   7 +-
>  net/vmw_vsock/vmci_transport_notify.c        |  10 +--
>  net/vmw_vsock/vmci_transport_notify_qstate.c |  12 +--
>  tools/testing/vsock/vsock_test.c             | 108 +++++++++++++++++++++++++++
>  7 files changed, 162 insertions(+), 17 deletions(-)
> 
>  Changelog:
> 
>  v1 -> v2:
>  1) Patches for VMCI transport(same as for virtio-vsock).
>  2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
>  3) Waiting logic in test was updated(sleep() -> poll()).
> 
>  v2 -> v3:
>  1) Patches were reordered.
>  2) Commit message updated in 0005.
>  3) Check 'transport' pointer in 0001 for NULL.
> 
>  v3 -> v4:
>  1) vsock_set_rcvlowat() logic changed. Previous version required
>     assigned transport and always called its 'set_rcvlowat' callback
>     (if present). Now, assignment is not needed.
>  2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
>  3) 0009 - small refactoring and style fixes.
>  4) RFC tag was removed.
> 
> -- 
> 2.25.1

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Cc: Vishnu Dasa <vdasa@vmware.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"sthemmin@microsoft.com" <sthemmin@microsoft.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	Dexuan Cui <decui@microsoft.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"virtualization@lists.linux-foundation.org"
	<virtualization@lists.linux-foundation.org>,
	Bryan Tan <bryantan@vmware.com>,
	"edumazet@google.com" <edumazet@google.com>,
	Krasnov Arseniy <oxffffaa@gmail.com>,
	kernel <kernel@sberdevices.ru>, Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH net-next v4 0/9] vsock: updates for SO_RCVLOWAT handling
Date: Tue, 23 Aug 2022 15:14:10 -0400	[thread overview]
Message-ID: <YwUnAhWauSFSJX+g@fedora> (raw)
In-Reply-To: <de41de4c-0345-34d7-7c36-4345258b7ba8@sberdevices.ru>


[-- Attachment #1.1: Type: text/plain, Size: 4416 bytes --]

On Fri, Aug 19, 2022 at 05:21:58AM +0000, Arseniy Krasnov wrote:
> Hello,
> 
> This patchset includes some updates for SO_RCVLOWAT:
> 
> 1) af_vsock:
>    During my experiments with zerocopy receive, i found, that in some
>    cases, poll() implementation violates POSIX: when socket has non-
>    default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>    POLLRDNORM bits in 'revents' even number of bytes available to read
>    on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>    POLLIN flag and then tries to read data(for example using  'read()'
>    call), but read call will be blocked, because  SO_RCVLOWAT logic is
>    supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>    requires that:
> 
>    "POLLIN     Data other than high-priority data may be read without
>                blocking.
>     POLLRDNORM Normal data may be read without blocking."
> 
>    See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
> 
>    So, we have, that poll() syscall returns POLLIN, but read call will
>    be blocked.
> 
>    Also in man page socket(7) i found that:
> 
>    "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>    socket as readable only if at least SO_RCVLOWAT bytes are available."
> 
>    I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>    uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>    this case for TCP socket, it works as POSIX required.
> 
>    I've added some fixes to af_vsock.c and virtio_transport_common.c,
>    test is also implemented.
> 
> 2) virtio/vsock:
>    It adds some optimization to wake ups, when new data arrived. Now,
>    SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>    There is no sense, to kick waiter, when number of available bytes
>    in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>    this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>    or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>    exit from poll() will be "spurious". This logic is also used in TCP
>    sockets.
> 
> 3) vmci/vsock:
>    Same as 2), but i'm not sure about this changes. Will be very good,
>    to get comments from someone who knows this code.
> 
> 4) Hyper-V:
>    As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>    support SO_RCVLOWAT, so he suggested to disable this feature for
>    Hyper-V.
> 
> Thank You

Hi Arseniy,
Stefano will be online again on Monday. I suggest we wait for him to
review this series. If it's urgent, please let me know and I'll take a
look.

Thanks,
Stefan

> Arseniy Krasnov(9):
>  vsock: SO_RCVLOWAT transport set callback
>  hv_sock: disable SO_RCVLOWAT support
>  virtio/vsock: use 'target' in notify_poll_in callback
>  vmci/vsock: use 'target' in notify_poll_in callback
>  vsock: pass sock_rcvlowat to notify_poll_in as target
>  vsock: add API call for data ready
>  virtio/vsock: check SO_RCVLOWAT before wake up reader
>  vmci/vsock: check SO_RCVLOWAT before wake up reader
>  vsock_test: POLLIN + SO_RCVLOWAT test
> 
>  include/net/af_vsock.h                       |   2 +
>  net/vmw_vsock/af_vsock.c                     |  33 +++++++-
>  net/vmw_vsock/hyperv_transport.c             |   7 ++
>  net/vmw_vsock/virtio_transport_common.c      |   7 +-
>  net/vmw_vsock/vmci_transport_notify.c        |  10 +--
>  net/vmw_vsock/vmci_transport_notify_qstate.c |  12 +--
>  tools/testing/vsock/vsock_test.c             | 108 +++++++++++++++++++++++++++
>  7 files changed, 162 insertions(+), 17 deletions(-)
> 
>  Changelog:
> 
>  v1 -> v2:
>  1) Patches for VMCI transport(same as for virtio-vsock).
>  2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
>  3) Waiting logic in test was updated(sleep() -> poll()).
> 
>  v2 -> v3:
>  1) Patches were reordered.
>  2) Commit message updated in 0005.
>  3) Check 'transport' pointer in 0001 for NULL.
> 
>  v3 -> v4:
>  1) vsock_set_rcvlowat() logic changed. Previous version required
>     assigned transport and always called its 'set_rcvlowat' callback
>     (if present). Now, assignment is not needed.
>  2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
>  3) 0009 - small refactoring and style fixes.
>  4) RFC tag was removed.
> 
> -- 
> 2.25.1

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  parent reply	other threads:[~2022-08-23 20:01 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19  5:21 [PATCH net-next v4 0/9] vsock: updates for SO_RCVLOWAT handling Arseniy Krasnov
2022-08-19  5:25 ` [PATCH net-next v4 1/9] vsock: SO_RCVLOWAT transport set callback Arseniy Krasnov
2022-08-19  5:27 ` [PATCH net-next v4 2/9] hv_sock: disable SO_RCVLOWAT support Arseniy Krasnov
2022-08-19  5:29 ` [PATCH net-next v4 3/9] virtio/vsock: use 'target' in notify_poll_in callback Arseniy Krasnov
2022-08-19  5:31 ` [PATCH net-next v4 4/9] vmci/vsock: " Arseniy Krasnov
2022-08-19  5:33 ` [PATCH net-next v4 5/9] vsock: pass sock_rcvlowat to notify_poll_in as target Arseniy Krasnov
2022-08-19  5:36 ` [PATCH net-next v4 6/9] vsock: add API call for data ready Arseniy Krasnov
2022-08-19  5:39 ` [PATCH net-next v4 7/9] virtio/vsock: check SO_RCVLOWAT before wake up reader Arseniy Krasnov
2022-08-19  5:41 ` [PATCH net-next v4 8/9] vmci/vsock: " Arseniy Krasnov
2022-08-19  5:43 ` [PATCH net-next v4 9/9] vsock_test: POLLIN + SO_RCVLOWAT test Arseniy Krasnov
2022-08-23  9:20 ` [PATCH net-next v4 0/9] vsock: updates for SO_RCVLOWAT handling patchwork-bot+netdevbpf
2022-08-23 19:14 ` Stefan Hajnoczi [this message]
2022-08-23 19:14   ` Stefan Hajnoczi
2022-08-23 19:18   ` Jakub Kicinski
2022-08-23 20:30     ` Stefan Hajnoczi
2022-08-23 20:30       ` Stefan Hajnoczi
2022-08-23 20:57       ` Paolo Abeni
2022-08-23 20:57         ` Paolo Abeni
2022-08-24  3:48         ` Arseniy Krasnov
2022-08-29 13:52         ` Stefano Garzarella
2022-08-29 13:52           ` Stefano Garzarella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YwUnAhWauSFSJX+g@fedora \
    --to=stefanha@redhat.com \
    --cc=AVKrasnov@sberdevices.ru \
    --cc=bryantan@vmware.com \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=edumazet@google.com \
    --cc=haiyangz@microsoft.com \
    --cc=kernel@sberdevices.ru \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=oxffffaa@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=pv-drivers@vmware.com \
    --cc=sgarzare@redhat.com \
    --cc=sthemmin@microsoft.com \
    --cc=vdasa@vmware.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.