From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D161713064; Tue, 27 Jun 2023 17:25:19 +0000 (UTC) Received: from mail-pf1-x42c.google.com (mail-pf1-x42c.google.com [IPv6:2607:f8b0:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB2A326BF; Tue, 27 Jun 2023 10:25:17 -0700 (PDT) Received: by mail-pf1-x42c.google.com with SMTP id d2e1a72fcca58-666edfc50deso66937b3a.0; Tue, 27 Jun 2023 10:25:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1687886717; x=1690478717; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=tAy3BdEspJlPeDmEFZs9sE+/cUDTXl7LmDXHAzonOMU=; b=sa7cziA981RYM/ui/S3lmxnf2XasU2ILz8wt9ERWgRTOfDc8m10TTwaKH7tB67XbCH 7rsYn6l2vUVOJI8DO/5zIUA7bjt1BkIBwIlVBWeHDk1kLA4bo5oFXyYJZvhzsuFPwVSy oXOGIeJqge+hH5kCXtLNrR+10jqJiDozwyrh9KOvi6GzHFisaFpXGzPPIB3USUZKV6vF WlMT0Hdi5TTsTBtBwE5sUqvglm6w6aDskYJN9dYEAUld5giNWs9PpI97ONnvUQc4ZPu8 yeB5G3lkyNtOSOCv5DMaGCx/cINWeGtp4fD2IDF2Y5+K792vy+cAl7n5TOfyTH/rM3VR 7rMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687886717; x=1690478717; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=tAy3BdEspJlPeDmEFZs9sE+/cUDTXl7LmDXHAzonOMU=; b=jw+S+2djQ9gPIUj1MeRTCtypl/t7/1Bkj+Y81iN95L1z5H8j1rsTEB48/TBTmzTSUW qD+V64OB+d6VIac1Tah9vPJL+e+/c2v8xS3OdECwH8h3rRueFpXaiQ5TOk8xGMkpWOJz u2Us4q8Y2xHtuLRDBwdfvW2KUxN+VkvdPisPVrUZbcWjb7mFUREXVJBR+wYzyt2UbYlL sEEaoJIvYmDiD3FzcIgTwuyk5ChUzcQQzB9JVSaJNoYsosE15TOjru8SAGSDp8XeSyw1 dgoh3x6gyAod8lgcySX0uwVtOErG7HdAGflJVMo8PJWCD6uQX8ut/TRLC4v3OmlA80vu odgQ== X-Gm-Message-State: AC+VfDzg7TzdDWmTNwYWdH9w235os9CsDjWd1bUqWq9cpZVALzBl1aEG kwSjPMGeFvOq0j8Hfqc98Bg= X-Google-Smtp-Source: ACHHUZ7obCo50dBOrmWezxMqSsQT++fExXVdK6D/0djOUDPgv8ohuvHhxmDfmFkzxBZ//Y/o/SZ8ZQ== X-Received: by 2002:a05:6a00:b4e:b0:67f:d5e7:4604 with SMTP id p14-20020a056a000b4e00b0067fd5e74604mr744066pfo.13.1687886716976; Tue, 27 Jun 2023 10:25:16 -0700 (PDT) Received: from localhost (ec2-54-67-115-33.us-west-1.compute.amazonaws.com. [54.67.115.33]) by smtp.gmail.com with ESMTPSA id x21-20020aa793b5000000b00673e652985esm4265097pff.44.2023.06.27.10.25.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 27 Jun 2023 10:25:16 -0700 (PDT) Date: Tue, 27 Jun 2023 01:19:43 +0000 From: Bobby Eshleman To: Stefano Garzarella Cc: linux-hyperv@vger.kernel.org, Bobby Eshleman , kvm@vger.kernel.org, "Michael S. Tsirkin" , VMware PV-Drivers Reviewers , Simon Horman , Stefan Hajnoczi , virtualization@lists.linux-foundation.org, Eric Dumazet , Dan Carpenter , Xuan Zhuo , Wei Liu , Dexuan Cui , Bryan Tan , Jakub Kicinski , Paolo Abeni , Haiyang Zhang , Arseniy Krasnov , Vishnu Dasa , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, "David S. Miller" Subject: Re: [PATCH RFC net-next v4 6/8] virtio/vsock: support dgrams Message-ID: References: <20230413-b4-vsock-dgram-v4-0-0cebbb2ae899@bytedance.com> <20230413-b4-vsock-dgram-v4-6-0cebbb2ae899@bytedance.com> <92b3a6df-ded3-6470-39d1-fe0939441abc@gmail.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DATE_IN_PAST_12_24, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net On Mon, Jun 26, 2023 at 05:03:15PM +0200, Stefano Garzarella wrote: > On Fri, Jun 23, 2023 at 04:37:55AM +0000, Bobby Eshleman wrote: > > On Thu, Jun 22, 2023 at 06:09:12PM +0200, Stefano Garzarella wrote: > > > On Sun, Jun 11, 2023 at 11:49:02PM +0300, Arseniy Krasnov wrote: > > > > Hello Bobby! > > > > > > > > On 10.06.2023 03:58, Bobby Eshleman wrote: > > > > > This commit adds support for datagrams over virtio/vsock. > > > > > > > > > > Message boundaries are preserved on a per-skb and per-vq entry basis. > > > > > > > > I'm a little bit confused about the following case: let vhost sends 4097 bytes > > > > datagram to the guest. Guest uses 4096 RX buffers in it's virtio queue, each > > > > buffer has attached empty skb to it. Vhost places first 4096 bytes to the first > > > > buffer of guests RX queue, and 1 last byte to the second buffer. Now IIUC guest > > > > has two skb in it rx queue, and user in guest wants to read data - does it read > > > > 4097 bytes, while guest has two skb - 4096 bytes and 1 bytes? In seqpacket there is > > > > special marker in header which shows where message ends, and how it works here? > > > > > > I think the main difference is that DGRAM is not connection-oriented, so > > > we don't have a stream and we can't split the packet into 2 (maybe we > > > could, but we have no guarantee that the second one for example will be > > > not discarded because there is no space). > > > > > > So I think it is acceptable as a restriction to keep it simple. > > > > > > My only doubt is, should we make the RX buffer size configurable, > > > instead of always using 4k? > > > > > I think that is a really good idea. What mechanism do you imagine? > > Some parameter in sysfs? > I comment more on this below. > > > > For sendmsg() with buflen > VQ_BUF_SIZE, I think I'd like -ENOBUFS > > For the guest it should be easy since it allocates the buffers, but for > the host? > > Maybe we should add a field in the configuration space that reports some > sort of MTU. > > Something in addition to what Laura had proposed here: > https://markmail.org/message/ymhz7wllutdxji3e > That sounds good to me. IIUC vhost exposes the limit via the configuration space, and the guest can configure the RX buffer size up to that limit via sysfs? > > returned even though it is uncharacteristic of Linux sockets. > > Alternatively, silently dropping is okay... but seems needlessly > > unhelpful. > > UDP takes advantage of IP fragmentation, right? > But what happens if a fragment is lost? > > We should try to behave in a similar way. > AFAICT in UDP the sending socket will see EHOSTUNREACH on its error queue and the packet will be dropped. For more details: - the IP defragmenter will emit an ICMP_TIME_EXCEEDED from ip_expire() if the fragment queue is not completed within time. - Upon seeing ICMP_TIME_EXCEEDED, the sending stack will then add EHOSTUNREACH to the socket's error queue, as seen in __udp4_lib_err(). Given some updated man pages I think enqueuing EHOSTUNREACH is okay for vsock too. This also reserves ENOBUFS/ENOMEM only for shortage on local buffers / mem. What do you think? Thanks, Bobby