From: "Michael S. Tsirkin" <mst@redhat.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
Subject: Re: [virtio-dev] packed ring layout proposal v2
Date: Wed, 1 Mar 2017 06:14:21 +0200 [thread overview]
Message-ID: <20170301060913-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20170301035715.GP18844@yliu-dev.sh.intel.com>
On Wed, Mar 01, 2017 at 11:57:15AM +0800, Yuanhan Liu wrote:
> On Wed, Mar 01, 2017 at 03:02:29AM +0200, Michael S. Tsirkin wrote:
> > > > * Descriptor ring:
> > > >
> > > > Guest adds descriptors with unique index values and DESC_HW set in flags.
> > > > Host overwrites used descriptors with correct len, index, and DESC_HW
> > > > clear. Flags are always set/cleared last.
> > >
> > > May I know what's the index intended for? Back referencing a pkt buffer?
> >
> > Yes and generally identify which descriptor completed. Recall that
> > even though vhost net completes in order at the moment,
> > virtio rings serve devices (e.g. storage) that complete out of order.
>
> I see, and thanks.
>
> > > That said, if it's "16: 16" and if we use only 8 bits for batch, we
> > > could still have another 8 bit for anything else, say the number of
> > > desc for a single pkt. With that, the num_buffers of mergeable Rx
> > > header could be replaced. More importantly, we could reduce a cache
> > > line write if non offload are supported in mergeable Rx path.
> >
> > Why do you bother with mergeable Rx without offloads?
>
> Oh, my bad. I actually meant "without offloads __being used__". Just
> assume the pkt size is 64B and no offloads are used. When mergeable
> Rx is negotiated (which is the default case), num_buffers has to be
> set 1. That means an extra cache line write. For the case of non
> mergeable, the cache line write could be avoid by a trick like what
> the following patch did:
>
> http://dpdk.org/browse/dpdk/commit/?id=c9ea670c1dc7e3f111d8139f915082b60c9c1ffe
>
> It basically tries to avoid writing 0 if the value is already 0:
> the case when no offloads are used.
> So while writing this email, I was thinking maybe we could not set
> num_buffers to 1 when there is only one desc (let it be 0 and let
> num_buffers == 0 imply num_buffers = 1). I'm not quite sure we can
> do that now, thus I checked the DPDK code and found it's Okay.
>
> 896 seg_num = header->num_buffers;
> 897
> 898 if (seg_num == 0)
> 899 seg_num = 1;
>
>
> I then also checked linux kernel code, and found it's not okay as
> the code depends on the value being set correctly:
>
> ==> 365 u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> 366 struct page *page = virt_to_head_page(buf);
> 367 int offset = buf - page_address(page);
> 368 unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> 369
> 370 struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
> 371 truesize);
> 372 struct sk_buff *curr_skb = head_skb;
> 373
> 374 if (unlikely(!curr_skb))
> 375 goto err_skb;
> ==> 376 while (--num_buf) {
>
> That means, if we want to do that, it needs an extra feature flag
> (either a global feature flag or a desc flag), something like
> ALLOW_ZERO_NUM_BUFFERS. Or even, make it allowable in virtio 1.1
> (virtio 0.95/1.0 won't benifit from it though).
>
> Does it make sense to you?
Right and then we could use a descriptor flag "header is all 0s".
For virtio 1.0 we could put these in the used ring instead.
>
> > Make each buffer
> > MTU sized and it'll fit without merging. Linux used not to, it only
> > started doing this to save memory aggressively. I don't think
> > DPDK cares about this.
> >
> > >
> > > struct desc {
> > > __le64 addr;
> > > __le16 len;
> > > __le8 batch;
> > > __le8 num_buffers;
> > > __le16 index;
> > > __le16 flags;
> > > };
> >
> > Interesting. How about a benchmark for these ideas?
>
> Sure, I would like to benchmark it. It won't take long to me. But
> currently, I was still focusing on analysising the performance behaviour
> of virtio 0.95/1.0 (when I get some time), to see what's not good for
> performance and what's can be improved.
>
> Besides that, as said, I often got interrupted. Moreoever, DPDK v17.05
> code freeze deadline is coming. So it's just a remind that I may don't
> have time for it recently. Sorry for that.
>
> > > > * Device specific descriptor flags
> > > > We have a lot of unused space in the descriptor. This can be put to
> > > > good use by reserving some flag bits for device use.
> > > > For example, network device can set a bit to request
> > > > that header in the descriptor is suppressed
> > > > (in case it's all 0s anyway). This reduces cache utilization.
> > >
> > > Good proposal. But I think it could only work with Tx, where the driver
> > > knows whether the headers are all 0s while filling the desc. But for
> > > Rx, whereas the driver has to pre-fill the desc, it doesn't know. Thus
> > > it still requires filling an header desc for storing it.
> >
> > I don't see why - I don't think drivers pre-fill buffers in header for RX
> > right now. Why would they start?
>
> Again, my bad, I meant "prepare" but not "pre-fill". Let me try to explain
> it again. I'm thinking:
>
> - For Tx, when the header is all 0s, the header could be discarded. We
> could use one desc for transfering a packet (with a flag NO_HEADER
> or HEADER_ALL_ZERO bit set)
>
> - For Rx, the header is filled in the device (or vhost) side. And the
> driver has to prepare the header desc for each pkt, because the Rx
> driver has no idea whether it will be all 0s.
>
> That means, the header could not be discarded.
>
> If such a global feature is negotiated, we could also discard the header
> desc as well.
>
> --yliu
Right and again, flags could be added to the used ring to pass extra
info.
> > > Maybe we could introduce a global feature? When that's negotiated, no
> > > header desc need filled and processed? I'm thinking this could also
> > > help the vector implementation I mentioned in another email.
> >
> > It's possible of course - it's a subset of what I said.
> > Though it makes it less useful in the general case.
> >
> > > > Note: this feature can be supported in virtio 1.0 as well,
> > > > as we have unused bits in both descriptor and used ring there.
> > >
> > > Agreed.
> > >
> > > --yliu
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
next prev parent reply other threads:[~2017-03-01 4:14 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20160915223915.qjlnlvf2w7u37bu3@redhat.com>
2017-02-08 13:37 ` packed ring layout proposal v2 Christian Borntraeger
2017-02-09 17:43 ` Michael S. Tsirkin
[not found] ` <20170209181955-mutt-send-email-mst@kernel.org>
2017-02-09 18:27 ` Christian Borntraeger
2017-02-08 17:41 ` [virtio-dev] " Paolo Bonzini
2017-02-08 19:59 ` Michael S. Tsirkin
[not found] ` <20170208214435-mutt-send-email-mst@kernel.org>
2017-02-09 15:48 ` Paolo Bonzini
2017-02-09 16:11 ` Cornelia Huck
2017-02-09 18:24 ` Michael S. Tsirkin
[not found] ` <20170209202203-mutt-send-email-mst@kernel.org>
2017-02-10 11:32 ` Paolo Bonzini
[not found] ` <c229269b-1702-ffec-62e8-002c7c142904@redhat.com>
2017-02-10 15:20 ` Michael S. Tsirkin
2017-02-10 16:17 ` Paolo Bonzini
[not found] ` <20170209171105.075a9d9c.cornelia.huck@de.ibm.com>
2017-02-22 16:43 ` Michael S. Tsirkin
[not found] ` <20170222181333-mutt-send-email-mst@kernel.org>
2017-03-07 15:53 ` Cornelia Huck
2017-03-07 20:33 ` Michael S. Tsirkin
[not found] ` <20170307223057-mutt-send-email-mst@kernel.org>
2017-07-10 16:27 ` Amnon Ilan
2017-02-22 4:27 ` packed ring layout proposal - todo list Michael S. Tsirkin
2017-02-22 14:46 ` [virtio-dev] packed ring layout proposal v2 Chien, Roger S
[not found] ` <20170222054336-mutt-send-email-mst@kernel.org>
2017-02-22 9:19 ` [virtio-dev] packed ring layout proposal - todo list Gray, Mark D
[not found] ` <738D45BC1F695740A983F43CFE1B7EA94E93CA7E@IRSMSX108.ger.corp.intel.com>
2017-02-22 15:13 ` Michael S. Tsirkin
2017-02-28 4:29 ` Yuanhan Liu
[not found] ` <20170228042943.GH18844@yliu-dev.sh.intel.com>
2017-03-01 1:07 ` Michael S. Tsirkin
2017-03-08 7:09 ` Yuanhan Liu
[not found] ` <20170308070948.GC18844@yliu-dev.sh.intel.com>
2017-03-08 7:56 ` Yuanhan Liu
[not found] ` <20170308075624.GF18844@yliu-dev.sh.intel.com>
2017-03-29 12:39 ` Michael S. Tsirkin
2017-04-01 7:30 ` Yuanhan Liu
2017-02-28 5:02 ` [virtio-dev] packed ring layout proposal v2 Yuanhan Liu
2017-02-28 5:47 ` [RFC] packed (virtio-net) headers Yuanhan Liu
[not found] ` <20170228050218.GI18844@yliu-dev.sh.intel.com>
2017-03-01 1:02 ` [virtio-dev] packed ring layout proposal v2 Michael S. Tsirkin
[not found] ` <20170301024951-mutt-send-email-mst@kernel.org>
2017-03-01 3:57 ` Yuanhan Liu
[not found] ` <20170301035715.GP18844@yliu-dev.sh.intel.com>
2017-03-01 4:14 ` Michael S. Tsirkin [this message]
2017-03-01 4:57 ` Yuanhan Liu
[not found] ` <20170228054719.GJ18844@yliu-dev.sh.intel.com>
2017-03-01 1:28 ` [RFC] packed (virtio-net) headers Michael S. Tsirkin
2017-07-16 6:00 ` [virtio-dev] packed ring layout proposal v2 Lior Narkis
[not found] ` <DB5PR05MB176690DF180908ABCDDA0860D3A30@DB5PR05MB1766.eurprd05.prod.outlook.com>
2017-07-18 16:23 ` Michael S. Tsirkin
2017-07-19 7:41 ` Lior Narkis
[not found] ` <DB5PR05MB1766461893DB3FE20D338B96D3A60@DB5PR05MB1766.eurprd05.prod.outlook.com>
2017-07-20 13:06 ` Michael S. Tsirkin
2017-09-11 7:47 ` packed ring layout proposal v3 Jason Wang
2017-09-12 16:20 ` [virtio-dev] " Willem de Bruijn
[not found] ` <0f0e1b94-2a46-689c-dbb3-0d578cc8df33@redhat.com>
2017-09-12 16:23 ` [virtio-dev] " Willem de Bruijn
[not found] ` <CAF=yD-+LtZO=Fcw6Y-v0dnxkmwqW1+CqzopyQEApOjOAszgqMg@mail.gmail.com>
2017-09-13 1:26 ` Jason Wang
2017-09-14 8:23 ` Ilya Lesokhin
2017-09-20 9:11 ` [virtio-dev] " Liang, Cunming
2017-09-25 22:24 ` Michael S. Tsirkin
[not found] ` <20170926011826-mutt-send-email-mst@kernel.org>
2017-09-26 23:38 ` Steven Luong (sluong)
[not found] ` <7A0DC0C9-F148-4161-B2D1-8D8D14D8B9A1@cisco.com>
2017-09-27 23:49 ` Michael S. Tsirkin
2017-09-28 9:44 ` Liang, Cunming
2017-09-28 21:13 ` Michael S. Tsirkin
[not found] ` <D0158A423229094DA7ABF71CF2FA0DA34E1AD49B@SHSMSX152.ccr.corp.intel.com>
2017-10-01 4:08 ` Michael S. Tsirkin
2017-10-04 12:39 ` Jens Freimann
2017-10-04 12:58 ` Michael S. Tsirkin
[not found] ` <20171004155532-mutt-send-email-mst@kernel.org>
2017-10-10 9:56 ` Liang, Cunming
[not found] ` <D0158A423229094DA7ABF71CF2FA0DA34E7DE4D3@SHSMSX104.ccr.corp.intel.com>
2017-10-11 12:22 ` Jens Freimann
2017-09-21 13:36 ` Liang, Cunming
2017-09-28 21:27 ` Michael S. Tsirkin
2017-10-08 6:16 ` Ilya Lesokhin
[not found] ` <AM4PR0501MB27236B71E1B02176F1844ACBD4770@AM4PR0501MB2723.eurprd05.prod.outlook.com>
2017-10-25 16:20 ` Michael S. Tsirkin
2017-10-29 9:05 ` Ilya Lesokhin
[not found] ` <AM4PR0501MB2723177BF4E5EE32CC878E8CD4580@AM4PR0501MB2723.eurprd05.prod.outlook.com>
2017-10-29 14:21 ` Michael S. Tsirkin
2017-10-29 14:34 ` Ilya Lesokhin
[not found] ` <AM4PR0501MB272332C92A384C97D6975D8DD4580@AM4PR0501MB2723.eurprd05.prod.outlook.com>
2017-10-30 2:08 ` Michael S. Tsirkin
2017-10-30 6:30 ` Ilya Lesokhin
[not found] ` <AM4PR0501MB27233D93937CA1F2E5AD13DBD4590@AM4PR0501MB2723.eurprd05.prod.outlook.com>
2017-10-30 16:30 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170301060913-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=virtio-dev@lists.oasis-open.org \
--cc=virtualization@lists.linux-foundation.org \
--cc=yuanhan.liu@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).