From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: virtio-comment-return-967-cohuck=redhat.com@lists.oasis-open.org Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id 135BA985A5E for ; Sun, 1 Dec 2019 21:44:03 +0000 (UTC) Date: Sun, 1 Dec 2019 16:43:49 -0500 From: "Michael S. Tsirkin" Message-ID: <20191201164128-mutt-send-email-mst@kernel.org> References: <20191111092046-mutt-send-email-mst@kernel.org> <20191120081531-mutt-send-email-mst@kernel.org> <20191124100825-mutt-send-email-mst@kernel.org> <20191127083719-mutt-send-email-mst@kernel.org> <20191201074825-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: <20191201074825-mutt-send-email-mst@kernel.org> Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: Add equal-sized receive buffers feature Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline To: Vitaly Mireyno Cc: Jason Wang , "virtio-comment@lists.oasis-open.org" List-ID: On Sun, Dec 01, 2019 at 08:07:55AM -0500, Michael S. Tsirkin wrote: > On Sun, Dec 01, 2019 at 10:22:17AM +0000, Vitaly Mireyno wrote: > >=20 > >=20 > > >-----Original Message----- > > >From: Michael S. Tsirkin > > >Sent: Wednesday, 27 November, 2019 15:51 > > >To: Vitaly Mireyno > > >Cc: Jason Wang ; virtio-comment@lists.oasis- > > >open.org > > >Subject: Re: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATC= H] > > >virtio-net: Add equal-sized receive buffers feature > > > > > >On Tue, Nov 26, 2019 at 05:27:31PM +0000, Vitaly Mireyno wrote: > > >> > > >> > > >> >-----Original Message----- > > >> >From: virtio-comment@lists.oasis-open.org > > >> > On Behalf Of Michael S. > > >> >Tsirkin > > >> >Sent: Sunday, 24 November, 2019 17:30 > > >> >To: Vitaly Mireyno > > >> >Cc: Jason Wang ; virtio-comment@lists.oasis- > > >> >open.org > > >> >Subject: [virtio-comment] Re: [EXT] Re: [virtio-comment] Re: [PATCH= ] > > >> >virtio- > > >> >net: Add equal-sized receive buffers feature > > >> > > > >> >On Sun, Nov 24, 2019 at 03:02:05PM +0000, Vitaly Mireyno wrote: > > >> >> > > >> >> > > >> >> >-----Original Message----- > > >> >> >From: Michael S. Tsirkin > > >> >> >Sent: Wednesday, 20 November, 2019 15:23 > > >> >> >To: Vitaly Mireyno > > >> >> >Cc: Jason Wang ; virtio-comment@lists.oasis= - > > >> >> >open.org > > >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-net: > > >> >> >Add > > >> >> >equal- sized receive buffers feature > > >> >> > > > >> >> >On Thu, Nov 14, 2019 at 03:49:59PM +0000, Vitaly Mireyno wrote: > > >> >> >> > > >> >> >> > > >> >> >> >-----Original Message----- > > >> >> >> >From: Michael S. Tsirkin > > >> >> >> >Sent: Monday, 11 November, 2019 17:05 > > >> >> >> >To: Vitaly Mireyno > > >> >> >> >Cc: Jason Wang ; > > >> >> >> >virtio-comment@lists.oasis- open.org > > >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-ne= t: > > >> >> >> >Add > > >> >> >> >equal- sized receive buffers feature > > >> >> >> > > > >> >> >> >On Mon, Nov 11, 2019 at 01:58:51PM +0000, Vitaly Mireyno wrot= e: > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> >-----Original Message----- > > >> >> >> >> >From: Michael S. Tsirkin > > >> >> >> >> >Sent: Sunday, 10 November, 2019 17:35 > > >> >> >> >> >To: Vitaly Mireyno > > >> >> >> >> >Cc: Jason Wang ; > > >> >> >> >> >virtio-comment@lists.oasis- open.org > > >> >> >> >> >Subject: Re: [EXT] Re: [virtio-comment] Re: [PATCH] virtio= -net: > > >> >> >> >> >Add > > >> >> >> >> >equal- sized receive buffers feature > > >> >> >> >> > > > >> >> >> >> >On Sun, Nov 10, 2019 at 02:13:14PM +0000, Vitaly Mireyno w= rote: > > >> >> >> >> >> > > >> >> >> >> >> >-----Original Message----- > > >> >> >> >> >> >From: virtio-comment@lists.oasis-open.org > > >> >> >> >> >> > On Behalf Of Mic= hael > > >S. > > >> >> >> >> >> >Tsirkin > > >> >> >> >> >> >Sent: Tuesday, 5 November, 2019 20:52 > > >> >> >> >> >> >To: Jason Wang > > >> >> >> >> >> >Cc: Vitaly Mireyno ; > > >> >> >> >> >> >virtio-comment@lists.oasis- open.org > > >> >> >> >> >> >Subject: [EXT] Re: [virtio-comment] Re: [PATCH] virtio-= net: > > >> >> >> >> >> >Add equal-sized receive buffers feature > > >> >> >> >> >> > > > >> >> >> >> >> >External Email > > >> >> >> >> >> > > > >> >> >> >> >> >-------------------------------------------------------= -- > > >> >> >> >> >> >--- > > >> >> >> >> >> >--- > > >> >> >> >> >> >--- > > >> >> >> >> >> >--- > > >> >> >> >> >> >- On Fri, Nov 01, 2019 at 03:22:26PM +0800, Jason Wang = wrote: > > >> >> >> >> >> >> > > >> >> >> >> >> >> On 2019/11/1 =E4=B8=8B=E5=8D=8812:09, Michael S. Tsir= kin wrote: > > >> >> >> >> >> >> > On Thu, Oct 31, 2019 at 10:46:55AM +0000, Vitaly > > >> >> >> >> >> >> > Mireyno > > >> >wrote: > > >> >> >> >> >> >> > > The feature is limited to receive buffers only. > > >> >> >> >> >> >> > > A driver decides on receive buffer length. The on= ly > > >> >> >> >> >> >> > > limitation is that this > > >> >> >> >> >> >length has to be the same for all receive virtqueue buf= fers. > > >> >> >> >> >> >> > > The driver configures receive buffer length to th= e > > >> >> >> >> >> >> > > device during device > > >> >> >> >> >> >initialization, and the device reads it and may use it > > >> >> >> >> >> >for optimal > > >> >> >> >operation. > > >> >> >> >> >> >> > > No changes for transmit buffers. > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > -----Original Message----- > > >> >> >> >> >> >> > > From: virtio-comment@lists.oasis-open.org > > >> >> >> >> >> >> > > On Behalf O= f > > >> >> >> >> >> >> > > Jason Wang > > >> >> >> >> >> >> > > Sent: Thursday, 31 October, 2019 12:15 > > >> >> >> >> >> >> > > To: Vitaly Mireyno ; > > >> >> >> >> >> >> > > virtio-comment@lists.oasis-open.org > > >> >> >> >> >> >> > > Cc: Michael S. Tsirkin > > >> >> >> >> >> >> > > Subject: [virtio-comment] Re: [PATCH] virtio-net: > > >> >> >> >> >> >> > > Add equal-sized receive buffers feature > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > On 2019/10/31 =E4=B8=8B=E5=8D=885:23, Vitaly Mire= yno wrote: > > >> >> >> >> >> >> > > > Some devices benefit from working with receive > > >> >> >> >> >> >> > > > buffers of a > > >> >> >> >> >> >predefined constant length. > > >> >> >> >> >> >> > > > Add a feature for drivers that allocate > > >> >> >> >> >> >> > > > equal-sized receive buffers, and > > >> >> >> >> >> >devices that benefit from predefined receive buffers le= ngth. > > >> >> >> >> >> >> > > > > > >> >> >> >> >> >> > > > Signed-off-by: Vitaly Mireyno > > >> >> >> >> >> >> > > > > > >> >> >> >> >> >> > > > --- > > >> >> >> >> >> >> > > > content.tex | 29 +++++++++++++++++++++++++++= -- > > >> >> >> >> >> >> > > > 1 file changed, 27 insertions(+), 2 > > >> >> >> >> >> >> > > > deletions(-) > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > Is there any other networking device that has thi= s > > >feature? > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > diff --git a/content.tex b/content.tex index > > >> >> >> >> >> >> > > > b1ea9b9..c9e67c8 > > >> >> >> >> >> >> > > > 100644 > > >> >> >> >> >> >> > > > --- a/content.tex > > >> >> >> >> >> >> > > > +++ b/content.tex > > >> >> >> >> >> >> > > > @@ -2811,6 +2811,10 @@ \subsection{Feature > > >> >> >> >> >> >> > > > bits}\label{sec:Device > > >> >> >> >> >> >Types / Network Device / Feature bits > > >> >> >> >> >> >> > > > \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set > > >MAC > > >> >> >> >> >> >> > > > address > > >> >> >> >> >> >through control > > >> >> >> >> >> >> > > > channel. > > >> >> >> >> >> >> > > > +\item[VIRTIO_NET_F_CONST_RXBUF_LEN(58)] Driver > > >> >> >> >> >> >> > > > +allocates all > > >> >> >> >> >> >receive buffers > > >> >> >> >> >> >> > > > + of the same constant length. Device benefi= ts > > >> >> >> >> >> >> > > > + from working > > >> >> >> >with > > >> >> >> >> >> >> > > > + receive buffers of equal length. > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > Problem is, I don't think linux will use this for > > >> >> >> >> >> >> > skbs since it breaks buffer accounting. This is > > >> >> >> >> >> >> > because it is important to make skbs as small as yo= u > > >> >> >> >> >> >> > can. So even if you see "device would > > >> >> >> >benefit" > > >> >> >> >> >> >> > there is no way to balance this with the benefit to= linux. > > >> >> >> >> >> >> > How do you know which benefit is bigger? > > >> >> >> >> >> >> > > >> >> >> >> >> >> > > >> >> >> >> >> >> I guess the idea is e.g for Linux driver, it can refu= se this > > >feature. > > >> >> >> >> >> > > > >> >> >> >> >> >Okay. What uses it? > > >> >> >> >> >> > > > >> >> >> >> >> >> > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > You also never explained how does device benefit. M= y > > >> >> >> >> >> >> > guess is this allows device to calculate the # of > > >> >> >> >> >> >> > descriptors to fetch that are needed for a packet. = Right? > > >> >> >> >> >> >> > Assuming this, I think a rough estimate should be e= nough. > > >> >> >> >> >> >> > If device fetches too much it can discard extra, if > > >> >> >> >> >> >> > it does not fetch enough it can fetch extra. > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > Let us not forget that express packets are 256 byte= s > > >> >> >> >> >> >> > so up to > > >> >> >> >> >> >> > 16 descriptors fit in a packet, there is no benefit > > >> >> >> >> >> >> > most of the time in knowing whether e.g. 1 or 2 > > >> >> >> >> >> >> > descriptors are > > >> >needed. > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > Let us not forget these are buffers, not descriptor= s. > > >> >> >> >> >> >> > > >> >> >> >> >> >> > > >> >> >> >> >> >> I guess maybe the initial motivation is constant > > >> >> >> >> >> >> descriptor > > >> >length. > > >> >> >> >> >> > > > >> >> >> >> >> > > > >> >> >> >> >> >That conflicts with requirement framing is up to driver= . > > >> >> >> >> >> > > > >> >> >> >> >> > > > >> >> >> >> >> I was using the wrong terminology. The feature is about > > >> >> >> >> >> constant > > >> >> >> >> >*descriptor* length. In other words, the value that driver > > >> >> >> >> >puts in Packed Virtqueue "Element Length" field (or 'len' > > >> >> >> >> >field in the > > >> >> >'pvirtq_desc' > > >> >> >> >struct). > > >> >> >> >> > > > >> >> >> >> >OK so this conflicts with "2.6.4 Message Framing" which > > >> >> >> >> >requires that drivers can split buffers into as many > > >> >> >> >> >descriptors as > > >> >they like. > > >> >> >> >> >Right? > > >> >> >> >> > > > >> >> >> >> >> Does it make more sense now, or you still see an issue > > >> >> >> >> >> with Linux > > >> >> >driver? > > >> >> >> >> > > > >> >> >> >> >I think there's an issue with the flexible framing > > >> >> >> >> >requirements and there's an issue with Linux driver. > > >> >> >> >> > > > >> >> >> >> >> The motivation is to allow the device to calculate the > > >> >> >> >> >> exact number of > > >> >> >> >> >descriptors to consume, before fetching the descriptors. > > >> >> >> >> >This is beneficial for devices for which overconsuming or > > >> >> >> >> >underconsuming descriptors come at a high cost. > > >> >> >> >> > > > >> >> >> >> >So I guess we can agree getting the # of descriptors > > >> >> >> >> >*exactly* right isn't all that important. Right? My questi= on > > >> >> >> >> >is how does the driver balance the device versus Linux nee= ds? > > >> >> >> >> > > > >> >> >> >> >One idea is if we assume this is best effort, does not hav= e > > >> >> >> >> >to be exact, then how about just having the device assume > > >> >> >> >> >descriptor sizes stay more or less constant and use that t= o > > >> >> >> >> >estimate the amount? If it under/over estimates, things ju= st > > >> >> >> >> >go a bit > > >> >slower. > > >> >> >> >> >This way driver can adjust the sizes and device will react > > >> >> >> >> >automatically, with time. > > >> >> >> >> > > > >> >> >> >> > > > >> >> >> >> > > >> >> >> >> I see that "2.6.4 Message Framing" allows a full flexibilit= y > > >> >> >> >> of descriptor lengths, and I presume it's applicable to > > >> >> >> >> Packet Virtqueues as well, though defined under Split Virtq= ueues > > >section. > > >> >> >> > > > >> >> >> >That's a good point. Probably makes sense to move it out to a > > >> >> >> >common section, right? > > >> >> >> > > > >> >> >> >> The example > > >> >> >> >> talks about transmit virtqueue, and it makes perfect sense. > > >> >> >> >> However, wouldn't a typical driver place equal-sized > > >> >> >> >> *receive* descriptors > > >> >> >> >anyway? So if a device can benefit from it, the driver might = as > > >> >> >> >well negotiate this new feature. > > >> >> >> > > > >> >> >> >Having buffer size jump around wildly doesn't seem too useful= , I > > >agree. > > >> >> >> >OTOH being able to adjust it gradually has been in the past > > >> >> >> >demontrated to help performance measureably. > > >> >> >> > > > >> >> >> >> This could be especially relevant for devices, for which > > >> >> >> >> adjusting the number > > >> >> >> >of used descriptors is impractical after descriptors have > > >> >> >> >already been > > >> >> >fetched. > > >> >> >> >> I agree that if this requirement conflicts with specific > > >> >> >> >> driver needs, it will not > > >> >> >> >be negotiated, and the device will either underperform in > > >> >> >> >certain scenarios, or will not come up at all. > > >> >> >> > > > >> >> >> >Right so that makes it a challenge. Device says it prefers > > >> >> >> >fixed buffers. Is that preference more or less important than > > >> >> >> >ability to efficiently support workloads such as TCP small qu= eues? > > >> >> >> >Driver has no idea and I suspect neither does the device. > > >> >> >> >So I don't see how will a Linux driver know that it should > > >> >> >> >enable this, neither how will device know it's okay to just r= efuse > > >features_ok. > > >> >> >> > > > >> >> >> > > > >> >> >> >On the other hand, current drivers already have logic that > > >> >> >> >tries to change buffer sizes in a smooth way. So simply > > >> >> >> >caching the last buffer size and assuming all the others will > > >> >> >> >be exactly the same will go a long way towards limiting how > > >> >> >> >much does device need to > > >> >fetch. > > >> >> >> >This does imply extra logic on the device to recover if thing= s > > >> >> >> >change and the first read did not fetch enough buffers, but > > >> >> >> >then it would be required anyway if as you say above the > > >> >> >> >failure is not > > >> >catastrophic. > > >> >> >> > > > >> >> >> > > > >> >> >> >The feature thus only seems useful for small, > > >> >> >> >feature-restrained devices > > >> >> >> >- which are prepared to sacrifice some performance to cut > > >> >> >> >costs, and which can't recover at all. Is this what you are t= rying to do? > > >> >> >> > > > >> >> >> > > >> >> >> The intention is to enable devices with this specific limitati= on > > >> >> >> to offer virtio offload. The feature can be redefined such th= at > > >> >> >> it would only be offered by devices that are unable to handle > > >> >> >> dynamically changing descriptor lengths. How does that sound? > > >> >> > > > >> >> >So it makes more sense when mergeable buffers are disabled (sinc= e > > >> >> >then buffers are practically all same size). > > >> >> > > > >> >> > > >> >> Actually, mergeable buffers are not a problem. They could be > > >> >> enabled, as long as each descriptor is the same length. So > > >> >> implementations that prefer optimizing memory utilization over > > >> >> jumbo frame performance can negotiate VIRTIO_NET_F_MRG_RXBUF > > >and allocate smaller buffers. > > >> > > > >> >So my point was, without VIRTIO_NET_F_MRG_RXBUF, all buffers are > > >same > > >> >length anyway. So if we are talking about cheap simple hardware, I > > >> >guess it can just not have VIRTIO_NET_F_MRG_RXBUF and be done with > > >it? > > >> >If device does care about memory utilization then I think it needs = to > > >> >give driver the flexibility it needs/uses. > > >> >No? > > >> > > > >> > > >> It's not a matter of device complexity, but a HW architecture, which= could be > > >complex, but have this specific limitation. > > > > > >So - don't do it then? > > > > > >> I see at least two reasons to support and negotiate > > >VIRTIO_NET_F_MRG_RXBUF, while keeping equal-sized descriptor limitatio= n: > > >> * GRO > > > > > >You mean LRO? > > > > > >> * With jumbo packets, if the throughput is capped by the port bandw= idth, > > >and not by the device/driver per-packet performance, it makes sense to > > >optimize memory utilization by allocating small buffers, without sacri= ficing > > >throughput performance. > > > > > >So we are back to square one, if driver cares about memory utilization= with 1K > > >packets vs 9K buffers, why not with 100byte packets vs 1K buffers? Loo= ks like > > >exactly the same tradeoff. > > > > > > > > >> > > >> >Again I can see how we might want to disallow crazy setups with e.g= . > > >> >1 byte per buffer. That's just abuse, no guest does that anyway. So > > >> >asking e.g. for a minimal buffer size sounds very reasonable. But = an > > >> >option that disables functionality that a popular guest uses needs = a > > >> >lot of documentation to help device writers figure out whether they > > >> >want that option or not, and I'd worry that even with documentation= will > > >be misunderstood even if we write it. > > >> >When do you enable this? > > >> >When you don't care about performance? When you don't care about > > >Linux? > > >> > > > >> > > >> I understand that there are guest-side optimizations that require fl= exibility > > >in descriptors length. I can propose the following simple logic: > > >> Device - advertise "equal-sized descriptor" feature only if the devi= ce is > > >absolutely unable to operate otherwise. The device will not set FEATUR= ES_OK > > >unless the driver negotiates this feature. > > >> Driver - if device advertises "equal-sized descriptor" feature - if = possible, > > >give up on the flexible descriptors length optimizations. If not - giv= e up on the > > >device. > > > > > >Yes, that's possible. Looks like a rather limited feature, and i'd rat= her we > > >focused on something more widely applicable, but with enough disclamer= s > > >that devices SHOULD NOT set this bit we can maybe do that. > > > > >=20 > > I agree that this feature is more of a limitation declaration, rather t= han an enhancement, but let me emphasize that its only purpose is to make m= ore *existing* HW devices be virtio compatible.=20 > > This feature will allow such HW devices to offer virtio offload with VI= RTIO_NET_F_MRG_RXBUF capability for GRO/LRO offload. > > New device designs should definitely avoid employing this feature. >=20 > I understand, my question is: can you find a way to make this more > generally useful? Something along the lines of the min buffer size > suggestion which would avoid creating a special case in the > driver? Any idea? as an example of a vague idea, exposing max rx s/g, min buffer + max buffer will allow device to force this from device side. Is that good? If not, how does driver decide on a good fixed buffer size? >=20 > > > > > >> > > >> >> > > > >> >> >With that in mind, I have an idea: scsi and block already have m= ax sg > > >field. > > >> >> >How about we add a writeable max sg field, maybe even make it > > >> >> >programmable per vq? > > >> >> > > > >> >> >Thus driver tells device what is it's max s/g value for rx. Wors= t > > >> >> >case device fetches a bit more than it needs. Discarding extra > > >> >> >shouldn't be expensive. This looks like it will help even smart > > >> >> >devices. What > > >> >do you think? > > >> >> > > > >> >> >This nicely avoids conflicting with the flexible framing require= ment. > > >> >> > > > >> >> > > >> >> My intention was to avoid any descriptor length variations, for > > >> >> devices that unable fetching or discarding extra descriptors. (If > > >> >> in the pipelined HW processing the decision on number of > > >> >> descriptors is made in the early stage, and it cannot be undone i= n a later > > >stage). > > >> > > > >> >Frankly discarding unused descriptors looks to me like something th= at > > >> >shouldn't have a high cost in the hardware. I can see how trying t= o > > >> >predict descriptor length, and fetching extra if not enough was > > >> >fetched can have a high cost, so to me extensions to remove the > > >> >guesswork from the device have value. However a lot of effort went > > >> >into trying to reduce e.g. number of pci express packets needed to = fetch > > >descriptors. > > >> >Each packet fetches 256 bytes anyway, does it not? Not having an > > >> >ability to use that information seems like an obvious waste, and th= at > > >> >means ability to keep some fetched descriptors around even if they > > >> >are not used for a given packet. Again just my $.02. > > >> > > > >> > > >> In packed virtqueue, discarding unused descriptors (and buffers asso= ciated > > >with them) can indeed be easy, but reusing them for the next packet is > > >complicated (or impossible). > > >> I agree that descriptors are being (pre)fetched with the maximum eff= iciency > > >(in terms of PCIe bandwidth), and cached in the device. But the decisi= on to > > >fetch is being made according to the number of left cached-in descript= ors and > > >the expected number of descriptors that will be used by the packet. > > >> If the expected number of descriptors is larger than the actual one,= the next > > >fetch decision will be taken too early, and there will be no way to re= use excess > > >cached descriptors, and they will have to be discarded. > > > > > >> Even if it's possible to skip descriptors in the packed virtqueue (i= s it?), it's > > >surely inefficient. > > > > > >OK I think I understand what you are doing. Device is getting buffers = 1,2,3 for > > >packet 1, it is meanwhile receiving packet > > >2 and decides to get buffers 4,5 for packet 2. > > >At this point it finally gets buffers 1-3 and figures out that buffers= 3 was not > > >needed, but possibly it already started writing packet 2 into buffers = 4. > > >What to do about buffers 3 now? > > > > > >Is above a good example? > > > > > >If yes then it looks like you are unaware that Descriptors can be used= out of > > >order, with split or packed ring, with no issues. Looks like exactly = what you > > >need to address this issue? > > >So device will simply proceed with marking buffers 1,2,4,5 as used, an= d store > > >buffers 3 in some kind of internal memory and use it for the next pack= et. > > > > > >This is exactly the kind of thing out of order was designed for. > > > > > >Does this answer the question? > > > > >=20 > > The example is good, and the suggested solution is clear. However if we= 're talking about HW device that is designed to process packets/buffers in = order, this solution could not be applicable. >=20 >=20 >=20 > > >> > > >> >> > > >> >> Defining max s/g sounds like an interesting feature by itself. > > >> > > > >> >But assuming we have max RX s/g, I guess hardware can set max s/g = =3D 1? > > >> >Then since with !VIRTIO_NET_F_MRG_RXBUF all buffers are forced to b= e > > >> >same length. > > >> > > > >> >> > > >> >> > > > >> >> >> >> Could you specify what issue do you see with the Linux driv= er? > > >> >> >> > > > >> >> >> >See logic around struct ewma. > > >> >> >> > > > >> >> >> >The first relevant commit is I guess commit > > >> >> >> >ab7db91705e95ed1bba1304388936fccfa58c992 > > >> >> >> > virtio-net: auto-tune mergeable rx buffer size for improv= ed > > >> >> >> >performance > > >> >> >> > > > >> >> >> >this claims a gain of about 10% with large packets which isn'= t > > >> >> >> >earth shattering but also not something we can ignore complet= ely. > > >> >> >> >And I suspect it can be bigger with smaller packets. > > >> >> >> > > > >> >> >> >> >> >> > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > So device does not really know the exact # of descr= iptors: > > >> >> >> >> >> >> > current drivers do 1 descriptor per buffer but > > >> >> >> >> >> >> > nothing prevents more. > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > Thoughts? > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > \item[VIRTIO_NET_F_RSC_EXT(61)] Device can > > >> >> >> >> >> >> > > > process duplicated > > >> >> >> >> >> >ACKs > > >> >> >> >> >> >> > > > and report number of coalesced segments > > >> >> >> >> >> >> > > > and duplicated ACKs @@ -2854,8 +2858,8 @@ > > >> >> >> >\subsubsection{Legacy Interface: > > >> >> >> >> >> >Feature bits}\label{sec:Device Types / Network > > >> >> >> >> >> >> > > > \subsection{Device configuration > > >> >> >> >> >> >> > > > layout}\label{sec:Device Types / > > >> >> >> >> >> >Network Device / Device configuration layout} > > >> >> >> >> >> >> > > > \label{sec:Device Types / Block Device / > > >> >> >> >> >> >> > > > Feature bits / Device configuration layout} > > >> >> >> >> >> >> > > > -Three driver-read-only configuration fields ar= e > > >currently defined. > > >> >> >> >> >> >> > > > The \field{mac} address field -always exists > > >> >> >> >> >> >> > > > (though is only valid if VIRTIO_NET_F_MAC is > > >> >> >> >> >> >> > > > set), and > > >> >> >> >> >> >> > > > +The driver-read-only \field{mac} address field > > >> >> >> >> >> >> > > > +always exists (though is only valid if > > >> >> >> >> >> >> > > > +VIRTIO_NET_F_MAC is set), and > > >> >> >> >> >> >> > > > \field{status} only exists if VIRTIO_NET_F_S= TATUS is > > >set. > > >> >> >Two > > >> >> >> >> >> >> > > > read-only bits (for the driver) are currentl= y > > >> >> >> >> >> >> > > > defined for the status > > >> >> >> >> >> >field: > > >> >> >> >> >> >> > > > VIRTIO_NET_S_LINK_UP and > > >> >VIRTIO_NET_S_ANNOUNCE. > > >> >> >> >> >> >> > > > @@ -2875,12 +2879,17 @@ \subsection{Device > > >> >> >> >> >> >> > > > configuration > > >> >> >> >> >> >layout}\label{sec:Device Types / Network Device > > >> >> >> >> >> >> > > > VIRTIO_NET_F_MTU is set. This field specifie= s > > >> >> >> >> >> >> > > > the maximum MTU for > > >> >> >> >> >> >the driver to > > >> >> >> >> >> >> > > > use. > > >> >> >> >> >> >> > > > +The device-read-only field \field{rx_buf_len} > > >> >> >> >> >> >> > > > +only exists if > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > Should be driver-read-only. > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > +VIRTIO_NET_F_CONST_RXBUF_LEN is negotiated. > > >This > > >> >> >> >> >> >> > > > +field specifies the receive buffers length. > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > > > \begin{lstlisting} > > >> >> >> >> >> >> > > > struct virtio_net_config { > > >> >> >> >> >> >> > > > u8 mac[6]; > > >> >> >> >> >> >> > > > le16 status; > > >> >> >> >> >> >> > > > le16 max_virtqueue_pairs; > > >> >> >> >> >> >> > > > le16 mtu; > > >> >> >> >> >> >> > > > + le32 rx_buf_len; > > >> >> >> >> >> >> > > > }; > > >> >> >> >> >> >> > > > \end{lstlisting} @@ -2933,6 +2942,13 @@ > > >> >> >> >> >> >> > > > \subsection{Device configuration > > >> >> >> >> >> >> > > > layout}\label{sec:Device Types / Network > > >> >Device > > >> >> >> >> >> >> > > > A driver SHOULD negotiate the > > >> >> >> >> >> >> > > > VIRTIO_NET_F_STANDBY feature if > > >> >> >> >> >> >the device offers it. > > >> >> >> >> >> >> > > > +A driver SHOULD accept the > > >> >> >> >VIRTIO_NET_F_CONST_RXBUF_LEN > > >> >> >> >> >> >feature if offered. > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has > > >been > > >> >> >> >> >negotiated, > > >> >> >> >> >> >> > > > +the driver MUST set \field{rx_buf_len}. > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > I think it's device that set the field? > > >> >> >> >> >> >> > Makes more sense for the driver, but if you want th= is set > > >e.g. > > >> >> >> >> >> >> > before buffers are added, you must say so. > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > > > +A driver MUST NOT modify \field{rx_buf_len} on= ce > > >> >> >> >> >> >> > > > +it has been > > >> >> >> >> >set. > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > This seems very unflexible. I can see how e.g. XDP > > >> >> >> >> >> >> > would benefit from big buffers while skbs benefit > > >> >> >> >> >> >> > from small > > >> >buffers. > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > This calls for ability to change this. > > >> >> >> >> >> >> > > >> >> >> >> >> >> > > >> >> >> >> >> >> Yes, but it requires non trivial cleanups for the old > > >> >> >> >> >> >> length and place them with new ones. > > >> >> >> >> >> >> > > >> >> >> >> >> >> Thanks > > >> >> >> >> >> > > > >> >> >> >> >> > > > >> >> >> >> >> >Let's see: > > >> >> >> >> >> >1=09- making buffer smaller: just update config space, > > >> >> >> >> >> >=09 then make new buffers smaller > > >> >> >> >> >> > > > >> >> >> >> >> >2=09- making buffers bigger: add larger buffers, > > >> >> >> >> >> >=09once all small ones are consumed update config space > > >> >> >> >> >> > > > >> >> >> >> >> >2 is tricky I agree. Thoughts? > > >> >> >> >> >> > > > >> >> >> >> >> > > > >> >> >> >> >> Agree. It's doable, provided that the driver will follow > > >> >> >> >> >> the update > > >> >> >> >> >procedure. > > >> >> >> >> >> > > > >> >> >> >> >> >> > > >> >> >> >> >> >> > > > >> >> >> >> >> >> > > > \subsubsection{Legacy Interface: Device > > >> >> >> >> >> >> > > > configuration > > >> >> >> >> >> >layout}\label{sec:Device Types / Network Device / Devic= e > > >> >> >> >> >> >configuration layout / Legacy Interface: Device > > >> >> >> >> >> >configuration layout} > > >> >> >> >> >> >> > > > \label{sec:Device Types / Block Device / > > >> >> >> >> >> >> > > > Feature bits / Device > > >> >> >> >> >> >configuration layout / Legacy Interface: Device > > >> >> >> >> >> >configuration layout} > > >> >> >> >> >> >> > > > When using the legacy interface, transitiona= l > > >> >> >> >> >> >> > > > devices and drivers @@ > > >> >> >> >> >> >> > > > -3241,6 +3257,11 @@ \subsubsection{Setting Up > > >> >> >> >> >> >> > > > Receive > > >> >> >> >> >> >Buffers}\label{sec:Device Types / Network Devi > > >> >> >> >> >> >> > > > If VIRTIO_NET_F_MQ is negotiated, each of > > >> >> >> >> >> >> > > > receiveq1\ldots > > >> >> >> >> >> >receiveqN > > >> >> >> >> >> >> > > > that will be used SHOULD be populated with > > >> >> >> >> >> >> > > > receive > > >> >> >buffers. > > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN feature has > > >been > > >> >> >> >> >negotiated, > > >> >> >> >> >> >> > > > +the driver MUST initialize all receive virtque= ue > > >> >> >> >> >> >> > > > +descriptors \field{len} field with the value > > >> >> >> >> >> >> > > > +configured in \field{rx_buf_len} device > > >> >> >> >> >> >> > > > +configuration field, and allocate receive > > >> >> >> >> >> >buffers accordingly. > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > > > \devicenormative{\paragraph}{Setting Up > > >> >> >> >> >> >> > > > Receive Buffers}{Device Types / Network Device = / > > >> >> >> >> >> >> > > > Device Operation / Setting Up Receive Buffers} > > >> >> >> >> >> >> > > > The device MUST set \field{num_buffers} to t= he > > >> >> >> >> >> >> > > > number of descriptors used to @@ -3396,6 +3417,= 10 > > >> >> >> >> >> >> > > > @@ > > >> >> >> >> >> >\subsubsection{Processing of Incoming > > >> >> >> >> >> >Packets}\label{sec:Device Types / Network > > >> >> >> >> >> >> > > > checksum (in case of multiple encapsulated > > >> >> >> >> >> >> > > > protocols, one > > >> >> >> >level > > >> >> >> >> >> >> > > > of checksums is validated). > > >> >> >> >> >> >> > > > +If VIRTIO_NET_F_CONST_RXBUF_LEN has been > > >> >> >> >> >> >> > > > +negotiated, > > >> >> >> >the > > >> >> >> >> >> >device > > >> >> >> >> >> >> > > > +MAY use \field{rx_buf_len} as a buffer length > > >> >> >> >> >> >> > > > +(instead of reading it from virtqueue descript= or > > >> >\field{len} field). > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > Is this safe? What if driver submit a small buffe= r, > > >> >> >> >> >> >> > > then device can read > > >> >> >> >> >> >more than what is allowed? > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > Thanks > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > > >> >> >> >> >> >> > > > + > > >> >> >> >> >> >> > > > \drivernormative{\paragraph}{Processing of I= ncoming > > >> >> >> >> >> >> > > > Packets}{Device Types / Network Device / > > >> >> >> >> >> >> > > > Device Operation > > >> >> >/ > > >> >> >> >> >> >> > > > Processing of Incoming Packets} > > >> > > > >> > > > >> >This publicly archived list offers a means to provide input to the > > >> >OASIS Virtual I/O Device (VIRTIO) TC. > > >> > > > >> >In order to verify user consent to the Feedback License terms and t= o > > >> >minimize spam in the list archive, subscription is required before = posting. > > >> > > > >> >Subscribe: virtio-comment-subscribe@lists.oasis-open.org > > >> >Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > > >> >List help: virtio-comment-help@lists.oasis-open.org > > >> >List archive: https://urldefense.proofpoint.com/v2/url?u=3Dhttps- > > >> >3A__lists.oasis-2Dopen.org_archives_virtio- > > >> > > >>2Dcomment_&d=3DDwIFaQ&c=3DnKjWec2b6R0mOyPaz7xtfQ&r=3DlDHJ2FW52oJ3l > > >q > > >> >qsArgFRdcevq01tbLQAw4A_NO7xgI&m=3DgIIx9_eEGj-aDaM6Z- > > >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=3D- > > >> >JICLquqUnNye7tilUS67AFv7opngsKEl5L75acB64U&e=3D > > >> >Feedback License: https://urldefense.proofpoint.com/v2/url?u=3Dhttp= s- > > >> >3A__www.oasis-2Dopen.org_who_ipr_feedback- > > >> > > >>5Flicense.pdf&d=3DDwIFaQ&c=3DnKjWec2b6R0mOyPaz7xtfQ&r=3DlDHJ2FW52oJ3l= q > > >q > > >> >sArgFRdcevq01tbLQAw4A_NO7xgI&m=3DgIIx9_eEGj-aDaM6Z- > > >> > > >>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=3D4e2kWmSdPAtGMXBTHwfgNE_KOZdDU > > >R > > >> >Wsji73HWdVF3A&e=3D > > >> >List Guidelines: https://urldefense.proofpoint.com/v2/url?u=3Dhttps= - > > >> >3A__www.oasis-2Dopen.org_policies-2Dguidelines_mailing- > > >> > > >>2Dlists&d=3DDwIFaQ&c=3DnKjWec2b6R0mOyPaz7xtfQ&r=3DlDHJ2FW52oJ3lqqsArg= F > > >R > > >> >dcevq01tbLQAw4A_NO7xgI&m=3DgIIx9_eEGj-aDaM6Z- > > >> >42yWtI9MnZcqZ2Gw7KCN7EgCg&s=3Di-dQn5G- > > >> >auoFtCxN8Y2PN8UccM1ezgcrsT2A8T1H8wE&e=3D > > >> >Committee: https://urldefense.proofpoint.com/v2/url?u=3Dhttps- > > >> >3A__www.oasis- > > >> > > >>2Dopen.org_committees_virtio_&d=3DDwIFaQ&c=3DnKjWec2b6R0mOyPaz7xtfQ > > >& > > >> >r=3DlDHJ2FW52oJ3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=3DgIIx9_eEGj- > > >> >aDaM6Z-42yWtI9MnZcqZ2Gw7KCN7EgCg&s=3Dkcv-jA-_-JC3v64-_r5iP- > > >> >XQ9rhyaeOKvrHJNHwZrLc&e=3D > > >> >Join OASIS: https://urldefense.proofpoint.com/v2/url?u=3Dhttps- > > >> >3A__www.oasis- > > >> > > >>2Dopen.org_join_&d=3DDwIFaQ&c=3DnKjWec2b6R0mOyPaz7xtfQ&r=3DlDHJ2FW52 > > >o > > >> >J3lqqsArgFRdcevq01tbLQAw4A_NO7xgI&m=3DgIIx9_eEGj-aDaM6Z- > > >> > > >>42yWtI9MnZcqZ2Gw7KCN7EgCg&s=3DcKbLeI_5Fu9G7aybE5u51yISB0eRer6BvC > > >xr > > >> >wgd5HS4&e=3D > > >> > >=20 This publicly archived list offers a means to provide input to the=0D OASIS Virtual I/O Device (VIRTIO) TC.=0D =0D In order to verify user consent to the Feedback License terms and=0D to minimize spam in the list archive, subscription is required=0D before posting.=0D =0D Subscribe: virtio-comment-subscribe@lists.oasis-open.org=0D Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org=0D List help: virtio-comment-help@lists.oasis-open.org=0D List archive: https://lists.oasis-open.org/archives/virtio-comment/=0D Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf= =0D List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lis= ts=0D Committee: https://www.oasis-open.org/committees/virtio/=0D Join OASIS: https://www.oasis-open.org/join/