From: "Michael S. Tsirkin" <mst@redhat.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Virtio-Dev <virtio-dev@lists.oasis-open.org>,
Kangjie Xu <kangjie.xu@linux.alibaba.com>,
Heng Qi <hengqi@linux.alibaba.com>,
Jason Wang <jasowang@redhat.com>
Subject: Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
Date: Fri, 23 Sep 2022 06:44:54 -0400 [thread overview]
Message-ID: <20220923064208-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1663916247.3806806-1-xuanzhuo@linux.alibaba.com>
On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > > > early email, he mentioned that desc chain may be abandoned in the future. So we
> > > > have been trying not to rely on desc chain.
> > >
> > > This seems to be a very large change which seems a little bit too
> > > early to be considered.
> >
> > I'd like to put it in other terms. Fundamentally devices are not
> > supposed to talk about descriptors at all. Descriptors are
> > a way to describe buffers, and devices should all work in terms
> > of buffers. I am working on cleaning up the spec from confusion
> > and terminology mixups. We have several major sources all over the spec:
> > - descriptor/buffer used inconsistently
> > - feature negotiated/offered used inconsistently
> > - field exists/valid used inconsistently
> >
> > My way to address the first issue is to make sure devices all work
> > with buffers. And buffers are described by descriptors (makes sense,
> > right?) and made available to device by driver and used by device.
>
> Can we try to keep desc info? I think it's a very important feature for
> virtio-net, and many NICs are designed based on this.
>
> Our current solutions B and C will waste a lot of memory. If the page occupied
> by the header can be quickly reclaimed, then we have to do a copy in the driver.
Not sure I understand. Can you explain?
> >
> > The advantage of this is layering - we can change the way buffers
> > are passed around without changing devices. And, it matches
> > the virtio API nicely.
> >
> > Existing devices are all fine with this - they do not pass any
> > information in the descriptor. Yes, this seems like an option to
> > pass some information around, but I am not convinced it is worth
> > the layering violation.
> >
> > By comparison, ability to write data at an offset seems generally
> > useful, in particular we have a very old issue even without
> > the split header feature where with mergeable buffers
> > if we attempt to align the data in the 1st buffer at a cache line
> > boundary by adding an offset before ETH header, then when it spills
> > over to the second buffer it will be misaligned there. Wastes
> > an extra cache line for such packets. Offsets can allow fixing this.
> >
>
> Scheme B In addition to the memory problem, under this scheme, if we want to
> implement tcp zerocopy, then we must apply for two consecutive pages for a
> buffer. The second page is used to place the payload. The first is used to place
> the header.
I just don't understand why there would be a difference. Explanation?
>
> >
> > I don't see architecturally what is wrong with making feature just
> > depend on mergeable buffers for now. We can always allow a combination
> > down the road. Let's just make it clear that if drivers see SPLIT &&
> > !MERGEABLE they should not fail probe, they should instead clear the
> > split header feature.
> >
>
> According to my understanding, for B, it does not depend on mergeable.
>
> If we give up desc info completely, then we prefer B.
>
> Hi Jason, what are your thoughts?
>
> Thanks.
>
> >
> >
> >
> > > >
> > > > If we can't make a feature depend only on mergeable, should we use solution B?
> > > >
> > > > 2. Scheme B ( refer to your suggestion )
> > > >
> > > > Our rethinking approach is no longer based on descriptor chain.
> > > >
> > > > We refer to your proposed offset-based scheme as scheme B.
> > >
> > > The offset seems to be the suggestion of Michael.
> > >
> > > I think I like the design of v7 for several reasons:
> > >
> > > 1) easy to reserve head/tailroom without any extension of the spec
> > > 2) easy to work with mergeable rx buffer
> > > 3) it is the model used by modern NIC like e810 [1]
> > >
> > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > it works which is similar to v7
> > >
> > > Thanks
> > >
> > > > As you suggested, scheme B gives the device a buffer, using offset to
> > > > indicate where to place the payload. Like this:
> > > >
> > > > <header>...<padding>... <beginning of page><data>
> > > >
> > > > But how to apply for this buffer?
> > > > Since we want the payload to be placed on a separate page, the method
> > > > we consider is to directly alloc two pages from driver of contiguous memory.
> > > >
> > > > Then the beginning of this contiguous memory is used to store the headroom,
> > > > and the contiguous memory after the headroom is directly handed over to the device.
> > > > Similar to the following:
> > > >
> > > > [------------------ receive buffer(2 pages) ------------------------------]
> > > > [<------------first page -------------------><------ second page -------->]
> > > > [<-----><virtnet hdr> <mac,ip,tcp>..<padding>< payload >]
> > > > ^ ^
> > > > | |
> > > > | pointer to device
> > > > |
> > > > |
> > > > Driver reserved, the later part is filled
> > > >
> > > > We have already entered v8, but we have not been able to reach an agreement on
> > > > the basic capabilities. I want to solve this problem first.
> > > >
> > > > @Jason @Michael
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > > >+ \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > > > >
> > > > > > >
> > > > > > > I don't get the reason why we need max_len. Can't it implied in the
> > > > > > > length of the first descriptor?
> > > > > > >
> > > > > >
> > > > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \filed{max_len}) $
> > > > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > > > >
> > > > > > >
> > > > > > > >+\end{itemize}
> > > > > > > >+
> > > > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > > > >+
> > > > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > > >+The device MUST reserve space in the first buffer using \filed{offset}.
> > > > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > > > >+the receive packets.
> > > > > > >
> > > > > > >
> > > > > > > Can the device simply don't split the packet in this case? Anyhow we
> > > > > > > need synchronize the driver with the device in the case (e.g when
> > > > > > > driver is try to having a new max_len).
> > > > > > >
> > > > > >
> > > > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > > > will also receive packets according to this offset.
> > > > > > But if the case is considered to be caused by driver error settings,
> > > > > > the device can do not split the packet.
> > > > >
> > > > > Note that protocol like ipv6 allows variable length of the header,
> > > > > falling back to not split the header seems better to me.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > (I wonder if the offset deserves a independent feature (but depends
> > > > > > > on the merge able) in this case).
> > > > > > >
> > > > > >
> > > > > > Okay, we can consider later.
> > > > > >
> > > > > > >
> > > > > > > > The maximum available length of the first buffer
> > > > > > > >+used by the device is specified by \field{max_len}.
> > > > > > >
> > > > > > >
> > > > > > > Similarly the max length seems to be implied by length - offset?
> > > > > > >
> > > > > >
> > > > > > You can refer to the above answer about \field{max_len} similarly.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > > If \field{max_len} is 0 or
> > > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > > > >+
> > > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > > >+
> > > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > > > >+inside the first buffer.
> > > > > > > >+
> > > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > > > > \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > >
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >
> > > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
next prev parent reply other threads:[~2022-09-23 10:45 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-16 2:56 [virtio-dev] [PATCH v8] virtio_net: support for split transport header hengqi
2022-09-19 8:47 ` [virtio-dev] " Heng Qi
2022-09-21 6:27 ` Michael S. Tsirkin
2022-09-20 1:59 ` [virtio-dev] " Jason Wang
2022-09-20 3:28 ` Heng Qi
2022-09-21 6:20 ` Jason Wang
2022-09-21 6:23 ` Jason Wang
2022-09-23 3:23 ` Xuan Zhuo
2022-09-23 4:04 ` Jason Wang
2022-09-23 5:59 ` Michael S. Tsirkin
2022-09-23 6:57 ` Xuan Zhuo
2022-09-23 10:44 ` Michael S. Tsirkin [this message]
2022-09-23 10:48 ` Xuan Zhuo
2022-09-23 11:04 ` Michael S. Tsirkin
2022-09-23 12:40 ` Xuan Zhuo
2022-09-26 8:06 ` Jason Wang
2022-09-28 13:39 ` Michael S. Tsirkin
2022-09-29 1:48 ` Jason Wang
2022-09-29 7:04 ` Michael S. Tsirkin
2022-09-29 8:24 ` Xuan Zhuo
2022-09-29 10:06 ` Michael S. Tsirkin
2022-09-29 11:48 ` Xuan Zhuo
2022-10-08 4:37 ` Jason Wang
2022-10-09 1:49 ` Xuan Zhuo
2022-10-10 17:11 ` Michael S. Tsirkin
2022-10-12 3:17 ` Jason Wang
2022-10-12 5:05 ` Michael S. Tsirkin
2022-10-13 6:47 ` Jason Wang
2022-10-13 14:33 ` Michael S. Tsirkin
2022-10-18 3:07 ` Xuan Zhuo
2022-10-20 8:16 ` Heng Qi
2023-01-31 9:23 ` 回复:[virtio-dev] " hengqi
2023-01-31 9:44 ` [virtio-comment] " Heng Qi
2023-01-31 23:08 ` Parav Pandit
2023-02-01 13:17 ` Heng Qi
2023-02-02 1:45 ` Parav Pandit
2023-02-02 3:45 ` Heng Qi
2023-02-02 4:20 ` Parav Pandit
2023-02-02 5:06 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-02-02 11:07 ` Michael S. Tsirkin
-- strict thread matches above, loose matches on Subject: below --
2022-09-28 1:43 [virtio-dev] " Xuan Zhuo
2022-09-28 4:05 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220923064208-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=hengqi@linux.alibaba.com \
--cc=jasowang@redhat.com \
--cc=kangjie.xu@linux.alibaba.com \
--cc=virtio-dev@lists.oasis-open.org \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox