From: "Michael S. Tsirkin" <mst@redhat.com>
To: Wei Wang <wei.w.wang@intel.com>
Cc: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
    virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, david@redhat.com, cornelia.huck@de.ibm.com,
    akpm@linux-foundation.org, mgorman@techsingularity.net,
    aarcange@redhat.com, amit.shah@redhat.com, pbonzini@redhat.com,
    liliang.opensource@gmail.com, virtio-dev@lists.oasis-open.org,
    yang.zhang.wz@gmail.com, quan.xu@aliyun.com
Subject: Re: [Qemu-devel] [PATCH v12 5/8] virtio-balloon: VIRTIO_BALLOON_F_SG
Date: Wed, 26 Jul 2017 20:02:30 +0300
Message-ID: <20170726155856-mutt-send-email-mst@kernel.org>
In-Reply-To: <59781119.8010200@intel.com>

On Wed, Jul 26, 2017 at 11:48:41AM +0800, Wei Wang wrote:
> On 07/23/2017 09:45 AM, Michael S. Tsirkin wrote:
> > On Fri, Jul 14, 2017 at 03:12:43PM +0800, Wei Wang wrote:
> > > On 07/14/2017 04:19 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 13, 2017 at 03:42:35PM +0800, Wei Wang wrote:
> > > > > On 07/12/2017 09:56 PM, Michael S. Tsirkin wrote:
> > > > > > So the way I see it, there are several issues:
> > > > > >
> > > > > > - internal wait - forces multiple APIs like kick/kick_sync
> > > > > > note how kick_sync can fail but your code never checks return code
> > > > > > - need to re-write the last descriptor - might not work
> > > > > > for alternative layouts which always expose descriptors
> > > > > > immediately
> > > > > Probably it wasn't clear. Please let me explain the two functions here:
> > > > >
> > > > > 1) virtqueue_add_chain_desc(vq, head_id, prev_id,..):
> > > > > grabs a desc from the vq and inserts it at the chain tail (which is
> > > > > indexed by prev_id, probably better to call it tail_id). Then, the
> > > > > newly added desc becomes the tail (i.e. the last desc). The _F_NEXT
> > > > > flag is cleared for each desc when it's added to the chain, and set
> > > > > when another desc comes to follow later.
> > > > And this only works if there are multiple rings like
> > > > avail + descriptor ring.
> > > > It won't work e.g. with the proposed new layout where
> > > > writing out a descriptor exposes it immediately.
> > > I think it can support the 1.1 proposal, too. But before getting
> > > into that, I think we first need to deep dive into the implementation
> > > and usage of _first/next/last. The usage would need to lock the vq
> > > from the first to the end (otherwise, the returned info about the number
> > > of available desc in the vq, i.e. num_free, would be invalid):
> > >
> > > lock(vq);
> > > add_first();
> > > add_next();
> > > add_last();
> > > unlock(vq);
> > >
> > > However, I think the case isn't this simple, since we need to check more
> > > things after each add_xx() step. For example, if only one entry is
> > > available at the time we start to use the vq, that is, num_free is 0 after
> > > add_first(), we wouldn't be able to add_next and add_last. So, it would
> > > work like this:
> > >
> > > start:
> > >     ...get free page block..
> > >     lock(vq);
> > > retry:
> > >     ret = add_first(..,&num_free,);
> > >     if (ret == -ENOSPC) {
> > >         goto retry;
> > >     } else if (!num_free) {
> > >         add_chain_head();
> > >         unlock(vq);
> > >         kick & wait;
> > >         goto start;
> > >     }
> > > next_one:
> > >     ...get free page block..
> > >     add_next(..,&num_free,);
> > >     if (!num_free) {
> > >         add_chain_head();
> > >         unlock(vq);
> > >         kick & wait;
> > >         goto start;
> > >     } else if (num_free == 1) {
> > >         ...get free page block..
> > >         add_last(..);
> > >         unlock(vq);
> > >         kick & wait;
> > >         goto start;
> > >     } else {
> > >         goto next_one;
> > >     }
> > >
> > > Given the above, having three different APIs seems unnecessary to me.
> > > That's the reason to combine them into one virtqueue_add_chain_desc().
> > >
> > > -- or, do you have a different thought about using the three APIs?
> > >
> > >
> > > Implementation Reference:
> > >
> > > struct desc_iterator {
> > >     unsigned int head;
> > >     unsigned int tail;
> > > };
> > >
> > > add_first(*vq, *desc_iterator, *num_free, ..)
> > > {
> > >     if (vq->vq.num_free < 1)
> > >         return -ENOSPC;
> > >     get_desc(&desc_id);
> > >     desc[desc_id].flag &= ~_F_NEXT;
> > >     desc_iterator->head = desc_id;
> > >     desc_iterator->tail = desc_iterator->head;
> > >     *num_free = vq->vq.num_free;
> > > }
> > >
> > > add_next(vq, desc_iterator, *num_free,..)
> > > {
> > >     get_desc(&desc_id);
> > >     desc[desc_id].flag &= ~_F_NEXT;
> > >     desc[desc_iterator->tail].next = desc_id;
> > >     desc[desc_iterator->tail].flag |= _F_NEXT;
> > >     desc_iterator->tail = desc_id;
> > >     *num_free = vq->vq.num_free;
> > > }
> > >
> > > add_last(vq, desc_iterator,..)
> > > {
> > >     get_desc(&desc_id);
> > >     desc[desc_id].flag &= ~_F_NEXT;
> > >     desc[desc_iterator->tail].next = desc_id;
> > >     desc[desc_iterator->tail].flag |= _F_NEXT;
> > >     desc_iterator->tail = desc_id;
> > >
> > >     add_chain_head(); // put the desc_iterator->head to the ring
> > > }
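
For reference, the iterator quoted above can be sketched as a small
user-space model. This is only an illustration under stated assumptions,
not the patch's code: struct toy_vq, toy_vq_init() and the addr/len
parameters are hypothetical, the free list is a plain linked list through
the next field, and add_last() hands back the head index instead of
writing it to an avail ring.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RING_SIZE   8
#define DESC_F_NEXT 1u  /* same bit value as VRING_DESC_F_NEXT */

struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };

/* Hypothetical stand-in for the real virtqueue: descriptor table only. */
struct toy_vq {
    struct desc desc[RING_SIZE];
    unsigned int free_head;  /* first entry of the free list */
    unsigned int num_free;
};

struct desc_iterator { unsigned int head; unsigned int tail; };

static void toy_vq_init(struct toy_vq *vq)
{
    for (unsigned int i = 0; i < RING_SIZE; i++) {
        vq->desc[i].flags = 0;
        vq->desc[i].next = (uint16_t)((i + 1) % RING_SIZE);
    }
    vq->free_head = 0;
    vq->num_free = RING_SIZE;
}

/* Take one descriptor off the free list and fill it in as a chain tail. */
static int get_desc(struct toy_vq *vq, uint64_t addr, uint32_t len,
                    unsigned int *id)
{
    if (vq->num_free < 1)
        return -ENOSPC;
    *id = vq->free_head;
    vq->free_head = vq->desc[*id].next;
    vq->num_free--;
    vq->desc[*id].addr = addr;
    vq->desc[*id].len = len;
    vq->desc[*id].flags &= (uint16_t)~DESC_F_NEXT;
    return 0;
}

static int add_first(struct toy_vq *vq, struct desc_iterator *it,
                     uint64_t addr, uint32_t len, unsigned int *num_free)
{
    unsigned int id;
    int ret = get_desc(vq, addr, len, &id);

    if (ret)
        return ret;
    it->head = it->tail = id;
    *num_free = vq->num_free;
    return 0;
}

static int add_next(struct toy_vq *vq, struct desc_iterator *it,
                    uint64_t addr, uint32_t len, unsigned int *num_free)
{
    unsigned int id;
    int ret = get_desc(vq, addr, len, &id);

    if (ret)
        return ret;
    /* Link the old tail to the new descriptor, then advance the tail. */
    vq->desc[it->tail].next = (uint16_t)id;
    vq->desc[it->tail].flags |= DESC_F_NEXT;
    it->tail = id;
    *num_free = vq->num_free;
    return 0;
}

/* Same as add_next(), then report the head the caller would publish. */
static int add_last(struct toy_vq *vq, struct desc_iterator *it,
                    uint64_t addr, uint32_t len, unsigned int *head_out)
{
    unsigned int unused;
    int ret = add_next(vq, it, addr, len, &unused);

    if (ret)
        return ret;
    *head_out = it->head;
    return 0;
}
```

Note that the model rewrites the previous tail's flags after that
descriptor was written, which is exactly the step that depends on the
descriptors not being exposed until the head is published.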
> > >
> > >
> > > Best,
> > > Wei
> > OK I thought this over. While we might need these new APIs in
> > the future, I think that at the moment, there's a way to implement
> > this feature that is significantly simpler. Just add each s/g
> > as a separate input buffer.
>
>
> Should it be an output buffer?
The hypervisor overwrites these pages with zeroes. Therefore the buffer
is writable by the device: DMA_FROM_DEVICE.
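
To spell out the direction rule, here is a minimal user-space sketch. The
enum reuses the names and values of the kernel's enum dma_data_direction
so that it builds outside the kernel; buf_dma_dir() and check_directions()
are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Same names and values as the kernel's enum dma_data_direction,
 * redefined here so the sketch builds outside the kernel. */
enum dma_dir {
    DMA_TO_DEVICE = 1,    /* driver writes, device reads: "out" buffer */
    DMA_FROM_DEVICE = 2,  /* device writes, driver reads: "in" buffer */
};

/* Hypothetical helper: pick the mapping direction for one buffer. */
static enum dma_dir buf_dma_dir(bool device_writes)
{
    return device_writes ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
}

/* Hypothetical self-check covering the two cases discussed here. */
static void check_directions(void)
{
    /* A free page that the hypervisor zeroes is written by the device. */
    assert(buf_dma_dir(true) == DMA_FROM_DEVICE);
    /* A header that the driver fills in is only read by the device. */
    assert(buf_dma_dir(false) == DMA_TO_DEVICE);
}
```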
> I think output means from the
> driver to device (i.e. DMA_TO_DEVICE).
This part is correct, I believe.
> >
> > This needs zero new APIs.
> >
> > I know that follow-up patches need to add a header in front
> > so you might be thinking: how am I going to add this
> > header? The answer is quite simple - add it as a separate
> > out header.
> >
> > Host will be able to distinguish between header and pages
> > by looking at the direction, and - should we want to add
> > IN data to header - additionally size (<4K => header).
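
The host-side rule in the paragraph above could look roughly like the
following sketch. It is an assumption-laden illustration, not QEMU code:
struct host_buf, classify_buf() and PAGE_SIZE_4K are hypothetical names,
and the only inputs are the two properties mentioned, direction and size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_4K 4096u

enum buf_kind { BUF_HEADER, BUF_PAGES };

/* Hypothetical host-side view of one buffer popped from the vq. */
struct host_buf {
    bool device_writable;  /* true for "in" buffers */
    size_t len;
};

static enum buf_kind classify_buf(const struct host_buf *b)
{
    assert(b != NULL);
    if (!b->device_writable)
        return BUF_HEADER;  /* out direction: always the header */
    if (b->len < PAGE_SIZE_4K)
        return BUF_HEADER;  /* small in buffer: IN data of the header */
    return BUF_PAGES;       /* in buffer of 4K or more: free pages */
}
```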
>
>
> I think this works fine when the cmdq is only used for
> reporting the unused pages. It would be an issue
> if there are other usages (e.g. reporting memory statistics)
> interleaving. I think one solution would be to lock the cmdq until
> a cmd usage is done (e.g. all the unused pages are reported) -
> in this case, the periodically updated guest memory statistics
> may be delayed for a while occasionally when live migration starts.
> Would this be acceptable? If not, probably we can have the cmdq
> for one usage only.
>
>
> Best,
> Wei
OK I see, I think the issue is that reporting free pages
was structured like stats. Let's split it -
send pages on e.g. free_vq, get commands on vq shared with
stats.
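
A rough user-space model of the split, to show why it avoids the delay
described above: free page blocks go on their own queue, so a full
free_vq does not hold back stats. Everything here is a hypothetical
sketch (struct queue, struct balloon, ctrl_vq, the message layout), not
the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_SIZE 8

enum msg_kind { MSG_FREE_PAGES, MSG_STATS };

struct msg { enum msg_kind kind; unsigned long pfn; unsigned int npages; };

/* Hypothetical fixed-size ring standing in for a virtqueue. */
struct queue {
    struct msg ring[QUEUE_SIZE];
    size_t head, tail;  /* tail - head = entries in flight */
};

static size_t queue_len(const struct queue *q)
{
    return q->tail - q->head;
}

static int queue_push(struct queue *q, struct msg m)
{
    assert(q != NULL);
    if (queue_len(q) == QUEUE_SIZE)
        return -1;  /* full: caller must wait for the host */
    q->ring[q->tail % QUEUE_SIZE] = m;
    q->tail++;
    return 0;
}

/* One queue only for free pages, one shared by commands and stats. */
struct balloon {
    struct queue free_vq;
    struct queue ctrl_vq;
};

static int report_free_block(struct balloon *b, unsigned long pfn,
                             unsigned int npages)
{
    struct msg m = { MSG_FREE_PAGES, pfn, npages };

    return queue_push(&b->free_vq, m);
}

static int send_stats(struct balloon *b)
{
    struct msg m = { MSG_STATS, 0, 0 };

    return queue_push(&b->ctrl_vq, m);
}
```

With a single shared queue the same burst of free page reports would have
to drain before stats could be queued.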
--
MST