* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Christoph Hellwig @ 2018-08-01 8:36 UTC (permalink / raw)
To: Will Deacon
Cc: robh, srikar, Michael S. Tsirkin, Benjamin Herrenschmidt,
linuxram, linux-kernel, virtualization, Christoph Hellwig, paulus,
marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev,
elfring, haren, Anshuman Khandual
In-Reply-To: <20180801081637.GA14438@arm.com>
On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> transport (so definitely not PCI) have historically been advertised by qemu
> as not being cache coherent, but because the virtio core has bypassed DMA
> ops then everything has happened to work. If we blindly enable the arch DMA
> ops,
No one is suggesting that as far as I can tell.
> we'll plumb in the non-coherent ops and start getting data corruption,
> so we do need a way to quirk virtio as being "always coherent" if we want to
> use the DMA ops (which we do, because our emulation platforms have an IOMMU
> for all virtio devices).
From all that I've gather so far: no you do not want that. We really
need to figure out virtio "dma" interacts with the host / device.
If you look at the current iommu spec it does talk of physical address
with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
So between that and our discussion in this thread and its previous
iterations I think we need to stick to the current always physical,
bypass system dma ops mode of virtio operation as the default.
We just need to figure out how to deal with devices that deviate
from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really
should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
dma tweaks (offsets, cache flushing), which seems well in spirit of
the original design. The other issue is VIRTIO_F_IO_BARRIER
which is very vaguely defined, and which needs a better definition.
And last but not least we'll need some text explaining the challenges
of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
is what would basically cover them, but a good description including
an explanation of why these matter.
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Will Deacon @ 2018-08-01 9:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: robh, srikar, Michael S. Tsirkin, Benjamin Herrenschmidt,
linuxram, linux-kernel, virtualization, paulus, marc.zyngier, mpe,
joe, robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <20180801083639.GF26378@infradead.org>
Hi Christoph,
On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops,
>
> No one is suggesting that as far as I can tell.
Apologies: it's me that wants the DMA ops enabled to handle legacy devices
behind an IOMMU, but see below.
> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
>
> From all that I've gather so far: no you do not want that. We really
> need to figure out virtio "dma" interacts with the host / device.
>
> If you look at the current iommu spec it does talk of physical address
> with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
That's true, although that doesn't exist in the legacy virtio spec, and we
have an existing emulation platform which puts legacy virtio devices behind
an IOMMU. Currently, Linux is unable to boot on this platform unless the
IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops,
then it works perfectly.
> So between that and our discussion in this thread and its previous
> iterations I think we need to stick to the current always physical,
> bypass system dma ops mode of virtio operation as the default.
As above -- that means we hang during boot because we get stuck trying to
bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy
answer is "just upgrade to latest virtio and advertise the presence of the
IOMMU". I'm pushing for that in future platforms, but it seems a shame not
to support the current platform, especially given that other systems do have
hacks in mainline to get virtio working.
> We just need to figure out how to deal with devices that deviate
> from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in spirit of
> the original design. The other issue is VIRTIO_F_IO_BARRIER
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.
I agree that this makes sense for future revisions of virtio (or perhaps
it can just be a clarification to virtio 1.0), but we're still left in the
dark with legacy devices and it would be nice to have them work on the
systems which currently exist, even if it's a legacy-only hack in the arch
code.
Will
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Tonghao Zhang @ 2018-08-01 9:52 UTC (permalink / raw)
To: jasowang; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <30e62749-3cbd-ae88-6582-c20087884b20@redhat.com>
On Wed, Aug 1, 2018 at 2:01 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年08月01日 11:00, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > Factor out generic busy polling logic and will be
> > used for in tx path in the next patch. And with the patch,
> > qemu can set differently the busyloop_timeout for rx queue.
> >
> > In the handle_tx, the busypoll will vhost_net_disable/enable_vq
> > because we have poll the sock. This can improve performance.
> > [This is suggested by Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>]
> >
> > And when the sock receive skb, we should queue the poll if necessary.
> >
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> > drivers/vhost/net.c | 131 ++++++++++++++++++++++++++++++++++++----------------
> > 1 file changed, 91 insertions(+), 40 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 32c1b52..5b45463 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -440,6 +440,95 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
> > nvq->done_idx = 0;
> > }
> >
> > +static int sk_has_rx_data(struct sock *sk)
> > +{
> > + struct socket *sock = sk->sk_socket;
> > +
> > + if (sock->ops->peek_len)
> > + return sock->ops->peek_len(sock);
> > +
> > + return skb_queue_empty(&sk->sk_receive_queue);
> > +}
> > +
> > +static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
> > + struct vhost_virtqueue *vq)
> > +{
> > + if (!vhost_vq_avail_empty(&net->dev, vq)) {
> > + vhost_poll_queue(&vq->poll);
> > + } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> > + vhost_disable_notify(&net->dev, vq);
> > + vhost_poll_queue(&vq->poll);
> > + }
> > +}
> > +
> > +static void vhost_net_busy_poll_check(struct vhost_net *net,
> > + struct vhost_virtqueue *rvq,
> > + struct vhost_virtqueue *tvq,
> > + bool rx)
> > +{
> > + struct socket *sock = rvq->private_data;
> > +
> > + if (rx)
> > + vhost_net_busy_poll_try_queue(net, tvq);
> > + else if (sock && sk_has_rx_data(sock->sk))
> > + vhost_net_busy_poll_try_queue(net, rvq);
> > + else {
> > + /* On tx here, sock has no rx data, so we
> > + * will wait for sock wakeup for rx, and
> > + * vhost_enable_notify() is not needed. */
>
> A possible case is we do have rx data but guest does not refill the rx
> queue. In this case we may lose notifications from guest.
Yes, should consider this case. thanks.
> > + }
> > +}
> > +
> > +static void vhost_net_busy_poll(struct vhost_net *net,
> > + struct vhost_virtqueue *rvq,
> > + struct vhost_virtqueue *tvq,
> > + bool *busyloop_intr,
> > + bool rx)
> > +{
> > + unsigned long busyloop_timeout;
> > + unsigned long endtime;
> > + struct socket *sock;
> > + struct vhost_virtqueue *vq = rx ? tvq : rvq;
> > +
> > + mutex_lock_nested(&vq->mutex, rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
> > + vhost_disable_notify(&net->dev, vq);
> > + sock = rvq->private_data;
> > +
> > + busyloop_timeout = rx ? rvq->busyloop_timeout:
> > + tvq->busyloop_timeout;
> > +
> > +
> > + /* Busypoll the sock, so don't need rx wakeups during it. */
> > + if (!rx)
> > + vhost_net_disable_vq(net, vq);
>
> Actually this piece of code is not a factoring out. I would suggest to
> add this in another patch, or on top of this series.
I will add this in another patch.
> > +
> > + preempt_disable();
> > + endtime = busy_clock() + busyloop_timeout;
> > +
> > + while (vhost_can_busy_poll(endtime)) {
> > + if (vhost_has_work(&net->dev)) {
> > + *busyloop_intr = true;
> > + break;
> > + }
> > +
> > + if ((sock && sk_has_rx_data(sock->sk) &&
> > + !vhost_vq_avail_empty(&net->dev, rvq)) ||
> > + !vhost_vq_avail_empty(&net->dev, tvq))
> > + break;
>
> Some checks were duplicated in vhost_net_busy_poll_check(). Need
> consider to unify them.
OK
> > +
> > + cpu_relax();
> > + }
> > +
> > + preempt_enable();
> > +
> > + if (!rx)
> > + vhost_net_enable_vq(net, vq);
>
> No need to enable rx virtqueue, if we are sure handle_rx() will be
> called soon.
If we disable rx virtqueue in handle_tx and don't send packets from
guest anymore(handle_tx is not called), so we can wake up for sock rx.
so the network is broken.
> > +
> > + vhost_net_busy_poll_check(net, rvq, tvq, rx);
>
> It looks to me just open code all check here is better and easier to be
> reviewed.
will be changed.
> Thanks
>
> > +
> > + mutex_unlock(&vq->mutex);
> > +}
> > +
> > static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
> > struct vhost_net_virtqueue *nvq,
> > unsigned int *out_num, unsigned int *in_num,
> > @@ -753,16 +842,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
> > return len;
> > }
> >
> > -static int sk_has_rx_data(struct sock *sk)
> > -{
> > - struct socket *sock = sk->sk_socket;
> > -
> > - if (sock->ops->peek_len)
> > - return sock->ops->peek_len(sock);
> > -
> > - return skb_queue_empty(&sk->sk_receive_queue);
> > -}
> > -
> > static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> > bool *busyloop_intr)
> > {
> > @@ -770,41 +849,13 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
> > struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
> > struct vhost_virtqueue *rvq = &rnvq->vq;
> > struct vhost_virtqueue *tvq = &tnvq->vq;
> > - unsigned long uninitialized_var(endtime);
> > int len = peek_head_len(rnvq, sk);
> >
> > - if (!len && tvq->busyloop_timeout) {
> > + if (!len && rvq->busyloop_timeout) {
> > /* Flush batched heads first */
> > vhost_net_signal_used(rnvq);
> > /* Both tx vq and rx socket were polled here */
> > - mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
> > - vhost_disable_notify(&net->dev, tvq);
> > -
> > - preempt_disable();
> > - endtime = busy_clock() + tvq->busyloop_timeout;
> > -
> > - while (vhost_can_busy_poll(endtime)) {
> > - if (vhost_has_work(&net->dev)) {
> > - *busyloop_intr = true;
> > - break;
> > - }
> > - if ((sk_has_rx_data(sk) &&
> > - !vhost_vq_avail_empty(&net->dev, rvq)) ||
> > - !vhost_vq_avail_empty(&net->dev, tvq))
> > - break;
> > - cpu_relax();
> > - }
> > -
> > - preempt_enable();
> > -
> > - if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> > - vhost_poll_queue(&tvq->poll);
> > - } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> > - vhost_disable_notify(&net->dev, tvq);
> > - vhost_poll_queue(&tvq->poll);
> > - }
> > -
> > - mutex_unlock(&tvq->mutex);
> > + vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
> >
> > len = peek_head_len(rnvq, sk);
> > }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-08-01 11:12 UTC (permalink / raw)
To: Michal Hocko
Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <20180730090041.GC24267@dhcp22.suse.cz>
On 07/30/2018 05:00 PM, Michal Hocko wrote:
> On Fri 27-07-18 17:24:55, Wei Wang wrote:
>> The OOM notifier is getting deprecated to use for the reasons mentioned
>> here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
>>
>> This patch replaces the virtio-balloon oom notifier with a shrinker
>> to release balloon pages on memory pressure.
> It would be great to document the replacement. This is not a small
> change...
OK. I plan to document the following to the commit log:
The OOM notifier is getting deprecated to use for the reasons:
- As a callout from the oom context, it is too subtle and easy to
generate bugs and corner cases which are hard to track;
- It is called too late (after the reclaiming has been performed).
Drivers with large amuont of reclaimable memory is expected to be
released them at an early age of memory pressure;
- The notifier callback isn't aware of the oom contrains;
Link: https://lkml.org/lkml/2018/7/12/314
This patch replaces the virtio-balloon oom notifier with a shrinker
to release balloon pages on memory pressure. Users can set the
amount of
memory pages to release each time a shrinker_scan is called via the
module parameter balloon_pages_to_shrink, and the default amount is 256
pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
been used to release balloon pages on OOM. We continue to use this
feature bit for the shrinker, so the shrinker is only registered when
this feature bit has been negotiated with host.
In addition, the bug in the replaced virtballoon_oom_notify that only
VIRTIO_BALLOON_ARRAY_PFNS_MAX (i.e 256) balloon pages can be freed
though the user has specified more than that number is fixed in the
shrinker_scan function.
Best,
Wei
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Michal Hocko @ 2018-08-01 11:34 UTC (permalink / raw)
To: Wei Wang; +Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <5B619599.1000307@intel.com>
On Wed 01-08-18 19:12:25, Wei Wang wrote:
> On 07/30/2018 05:00 PM, Michal Hocko wrote:
> > On Fri 27-07-18 17:24:55, Wei Wang wrote:
> > > The OOM notifier is getting deprecated to use for the reasons mentioned
> > > here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
> > >
> > > This patch replaces the virtio-balloon oom notifier with a shrinker
> > > to release balloon pages on memory pressure.
> > It would be great to document the replacement. This is not a small
> > change...
>
> OK. I plan to document the following to the commit log:
>
> The OOM notifier is getting deprecated to use for the reasons:
> - As a callout from the oom context, it is too subtle and easy to
> generate bugs and corner cases which are hard to track;
> - It is called too late (after the reclaiming has been performed).
> Drivers with large amuont of reclaimable memory is expected to be
> released them at an early age of memory pressure;
> - The notifier callback isn't aware of the oom contrains;
> Link: https://lkml.org/lkml/2018/7/12/314
>
> This patch replaces the virtio-balloon oom notifier with a shrinker
> to release balloon pages on memory pressure. Users can set the amount of
> memory pages to release each time a shrinker_scan is called via the
> module parameter balloon_pages_to_shrink, and the default amount is 256
> pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
> been used to release balloon pages on OOM. We continue to use this
> feature bit for the shrinker, so the shrinker is only registered when
> this feature bit has been negotiated with host.
Do you have any numbers for how does this work in practice? Let's say
you have a medium page cache workload which triggers kswapd to do a
light reclaim? Hardcoded shrinking sounds quite dubious to me but I have
no idea how people expect this to work. Shouldn't this be more
adaptive? How precious are those pages anyway?
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* IEEE Record # 41985: 2018 3rd International Conference on Contemporary Computing and Informatics (IC3I).
From: Dr. S K Niranjan Aradhya @ 2018-08-01 11:37 UTC (permalink / raw)
To: virtualization
[-- Attachment #1.1: Type: text/plain, Size: 1576 bytes --]
*<< Apologies for cross-postings >><<< Please circulate among your friends,
peers and researchers >>>*
IEEE Conference Record No.: #41985;
2018 3rd International Conference on Contemporary Computing and Informatics
(IC3I).
Conference Date : 10 - 12 October 2018
Submission Deadline: 30 July 2018
*Submission Link:http://cmsweb.com.sg/ic3i18/index.php/ic3i18/ic3i18/login
<http://cmsweb.com.sg/ic3i18/index.php/ic3i18/ic3i18/login>*
IEEE ISBN : 978-1-5386-6894-8
IEEE Part No. : CFP18AWQ-ART
Selected, accepted and extended paper will be published in Scopus Indexed
International Journal of Forensic Software Engineering published by
InderScience
All accepted and presented papers will be submitted to the IEEE for
possible publication in IEEE Xplore Digital Library. Previous edition
indexed in: SCOPUS, ISI Web of Science, Engineering Index, Google, etc.
If you like to join the TPC or propose a special session or symposiums
please write to: secretariat@ic3i.org
General Chair(s)
IC3I 2018 Conference
----------------------
Disclaimer: We have clearly mentioned the subject lines and your email
address won't be misleading in any form. We have found your mail address
through our own efforts on the web search and not through any illegal way.
If you wish to remove your information from our mailing list or no longer
receive future announcements, please email with REMOVE in subject. Your
request to opt-out will be effective within a reasonable amount of time.
ic3i-cfp.pdf
<https://drive.google.com/file/d/1wjyVxnuBxgZoHxrqNxxdDzPumPVHu4ma/view?usp=drive_web>
[-- Attachment #1.2: Type: text/html, Size: 2640 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-01 21:56 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram,
virtualization, Christoph Hellwig, paulus, marc.zyngier, joe,
robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <3d6e81511571260de1c8047aaffa8ac4df093d2e.camel@kernel.crashing.org>
On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote:
> On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > > However the question people raise is that DMA API is already full of
> > > arch-specific tricks the likes of which are outlined in your post linked
> > > above. How is this one much worse?
> >
> > None of these warts is visible to the driver, they are all handled in
> > the architecture (possibly on a per-bus basis).
> >
> > So for virtio we really need to decide if it has one set of behavior
> > as specified in the virtio spec, or if it behaves exactly as if it
> > was on a PCI bus, or in fact probably both as you lined up. But no
> > magic arch specific behavior inbetween.
>
> The only arch specific behaviour is needed in the case where it doesn't
> behave like PCI. In this case, the PCI DMA ops are not suitable, but in
> our secure VMs, we still need to make it use swiotlb in order to bounce
> through non-secure pages.
>
> It would be nice if "real PCI" was the default
I think you are mixing "real PCI" which isn't coded up yet and IOMMU
bypass which is. IOMMU bypass will maybe with time become unnecessary
since it seems that one can just program an IOMMU in a bypass mode
instead.
It's hard to blame you since right now if you disable IOMMU bypass
you get a real PCI mode. But they are distinct and to allow people
to enable IOMMU by default we will need to teach someone
(virtio or DMA API) about this mode that does follow
translation and protection rules in the IOMMU but runs
on a CPU and so does not need cache flushes and whatnot.
OTOH real PCI mode as opposed to default hypervisor mode does not perform as
well when what you actually have is a hypervisor.
So we'll likely have a mix of these two modes for a while.
> but it's not, VMs are
> created in "legacy" mode all the times and we don't know at VM creation
> time whether it will become a secure VM or not. The way our secure VMs
> work is that they start as a normal VM, load a secure "payload" and
> call the Ultravisor to "become" secure.
>
> So we're in a bit of a bind here. We need that one-liner optional arch
> hook to make virtio use swiotlb in that "IOMMU bypass" case.
>
> Ben.
And just to make sure I understand, on your platform DMA APIs do include
some of the cache flushing tricks and this is why you don't want to
declare iommu support in the hypervisor?
--
MST
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-01 22:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel,
linuxram, virtualization, paulus, marc.zyngier, mpe, joe,
robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <20180801083639.GF26378@infradead.org>
On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > transport (so definitely not PCI) have historically been advertised by qemu
> > as not being cache coherent, but because the virtio core has bypassed DMA
> > ops then everything has happened to work. If we blindly enable the arch DMA
> > ops,
>
> No one is suggesting that as far as I can tell.
>
> > we'll plumb in the non-coherent ops and start getting data corruption,
> > so we do need a way to quirk virtio as being "always coherent" if we want to
> > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > for all virtio devices).
>
> >From all that I've gather so far: no you do not want that. We really
> need to figure out virtio "dma" interacts with the host / device.
>
> If you look at the current iommu spec it does talk of physical address
> with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
>
> So between that and our discussion in this thread and its previous
> iterations I think we need to stick to the current always physical,
> bypass system dma ops mode of virtio operation as the default.
>
> We just need to figure out how to deal with devices that deviate
> from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really
> should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> dma tweaks (offsets, cache flushing), which seems well in spirit of
> the original design.
Well I wouldn't say that. VIRTIO_F_IOMMU_PLATFORM is for guest
programmable protection which is designed for things like userspace
drivers but still very much which a CPU doing the accesses. I think
VIRTIO_F_IO_BARRIER needs to be extended to VIRTIO_F_PLATFORM_DMA.
> The other issue is VIRTIO_F_IO_BARRIER
> which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges
> of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> is what would basically cover them, but a good description including
> an explanation of why these matter.
I think VIRTIO_F_IOMMU_PLATFORM + VIRTIO_F_PLATFORM_DMA but yea.
--
MST
^ permalink raw reply
* Re: [RFC 0/4] Virtio uses DMA API for all devices
From: Michael S. Tsirkin @ 2018-08-01 22:41 UTC (permalink / raw)
To: Will Deacon
Cc: robh, srikar, Benjamin Herrenschmidt, linuxram, linux-kernel,
virtualization, Christoph Hellwig, paulus, marc.zyngier, mpe, joe,
robin.murphy, david, linuxppc-dev, elfring, haren,
Anshuman Khandual
In-Reply-To: <20180801090535.GB14438@arm.com>
On Wed, Aug 01, 2018 at 10:05:35AM +0100, Will Deacon wrote:
> Hi Christoph,
>
> On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote:
> > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote:
> > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO
> > > transport (so definitely not PCI) have historically been advertised by qemu
> > > as not being cache coherent, but because the virtio core has bypassed DMA
> > > ops then everything has happened to work. If we blindly enable the arch DMA
> > > ops,
> >
> > No one is suggesting that as far as I can tell.
>
> Apologies: it's me that wants the DMA ops enabled to handle legacy devices
> behind an IOMMU, but see below.
>
> > > we'll plumb in the non-coherent ops and start getting data corruption,
> > > so we do need a way to quirk virtio as being "always coherent" if we want to
> > > use the DMA ops (which we do, because our emulation platforms have an IOMMU
> > > for all virtio devices).
> >
> > From all that I've gather so far: no you do not want that. We really
> > need to figure out virtio "dma" interacts with the host / device.
> >
> > If you look at the current iommu spec it does talk of physical address
> > with a little careveout for VIRTIO_F_IOMMU_PLATFORM.
>
> That's true, although that doesn't exist in the legacy virtio spec, and we
> have an existing emulation platform which puts legacy virtio devices behind
> an IOMMU. Currently, Linux is unable to boot on this platform unless the
> IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops,
> then it works perfectly.
>
> > So between that and our discussion in this thread and its previous
> > iterations I think we need to stick to the current always physical,
> > bypass system dma ops mode of virtio operation as the default.
>
> As above -- that means we hang during boot because we get stuck trying to
> bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy
> answer is "just upgrade to latest virtio and advertise the presence of the
> IOMMU". I'm pushing for that in future platforms, but it seems a shame not
> to support the current platform, especially given that other systems do have
> hacks in mainline to get virtio working.
>
> > We just need to figure out how to deal with devices that deviate
> > from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really
> > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu
> > dma tweaks (offsets, cache flushing), which seems well in spirit of
> > the original design. The other issue is VIRTIO_F_IO_BARRIER
> > which is very vaguely defined, and which needs a better definition.
> > And last but not least we'll need some text explaining the challenges
> > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER
> > is what would basically cover them, but a good description including
> > an explanation of why these matter.
>
> I agree that this makes sense for future revisions of virtio (or perhaps
> it can just be a clarification to virtio 1.0), but we're still left in the
> dark with legacy devices and it would be nice to have them work on the
> systems which currently exist, even if it's a legacy-only hack in the arch
> code.
>
> Will
Myself I'm sympathetic to this use-case and I see more uses to this
than just legacy support. But more work is required IMHO.
Will post tomorrow though - it's late here ...
--
MST
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-02 8:18 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <CAMDZJNWX7L4P0yO+-PeDFu_tmtLSntihO7rcLPB2GK4eN9zbwQ@mail.gmail.com>
On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>> +
>>> + cpu_relax();
>>> + }
>>> +
>>> + preempt_enable();
>>> +
>>> + if (!rx)
>>> + vhost_net_enable_vq(net, vq);
>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>> called soon.
> If we disable rx virtqueue in handle_tx and don't send packets from
> guest anymore(handle_tx is not called), so we can wake up for sock rx.
> so the network is broken.
Not sure I understand here. I mean is we schedule work for handle_rx(),
there's no need to enable it since handle_rx() will do this for us.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-08-02 8:41 UTC (permalink / raw)
To: Jason Wang, Tonghao Zhang
Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <b4ffd376-6fa8-4d0c-5c3a-82d1b6e924d6@redhat.com>
On 2018/08/02 17:18, Jason Wang wrote:
> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>> + struct vhost_virtqueue *rvq,
>>> + struct vhost_virtqueue *tvq,
>>> + bool rx)
>>> +{
>>> + struct socket *sock = rvq->private_data;
>>> +
>>> + if (rx)
>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>> + else if (sock && sk_has_rx_data(sock->sk))
>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>> + else {
>>> + /* On tx here, sock has no rx data, so we
>>> + * will wait for sock wakeup for rx, and
>>> + * vhost_enable_notify() is not needed. */
>>
>> A possible case is we do have rx data but guest does not refill the rx
>> queue. In this case we may lose notifications from guest.
> Yes, should consider this case. thanks.
I'm a bit confused. Isn't this covered by the previous
"else if (sock && sk_has_rx_data(...))" block?
>>>> +
>>>> + cpu_relax();
>>>> + }
>>>> +
>>>> + preempt_enable();
>>>> +
>>>> + if (!rx)
>>>> + vhost_net_enable_vq(net, vq);
>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>> called soon.
>> If we disable rx virtqueue in handle_tx and don't send packets from
>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>> so the network is broken.
>
> Not sure I understand here. I mean is we schedule work for handle_rx(),
> there's no need to enable it since handle_rx() will do this for us.
Looks like in the last "else" block in vhost_net_busy_poll_check() we
need to enable vq since in that case we have no rx data and handle_rx()
is not scheduled.
--
Toshiaki Makita
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Jason Wang @ 2018-08-02 9:23 UTC (permalink / raw)
To: Toshiaki Makita, Tonghao Zhang
Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <ca040549-5eda-4a03-4295-17e6d7d44dd5@lab.ntt.co.jp>
On 2018年08月02日 16:41, Toshiaki Makita wrote:
> On 2018/08/02 17:18, Jason Wang wrote:
>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>> + struct vhost_virtqueue *rvq,
>>>> + struct vhost_virtqueue *tvq,
>>>> + bool rx)
>>>> +{
>>>> + struct socket *sock = rvq->private_data;
>>>> +
>>>> + if (rx)
>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>> + else {
>>>> + /* On tx here, sock has no rx data, so we
>>>> + * will wait for sock wakeup for rx, and
>>>> + * vhost_enable_notify() is not needed. */
>>> A possible case is we do have rx data but guest does not refill the rx
>>> queue. In this case we may lose notifications from guest.
>> Yes, should consider this case. thanks.
> I'm a bit confused. Isn't this covered by the previous
> "else if (sock && sk_has_rx_data(...))" block?
The problem is it does nothing if vhost_vq_avail_empty() is true and
vhost_enble_notify() is false.
>
>>>>> +
>>>>> + cpu_relax();
>>>>> + }
>>>>> +
>>>>> + preempt_enable();
>>>>> +
>>>>> + if (!rx)
>>>>> + vhost_net_enable_vq(net, vq);
>>>> No need to enable rx virtqueue, if we are sure handle_rx() will be
>>>> called soon.
>>> If we disable rx virtqueue in handle_tx and don't send packets from
>>> guest anymore(handle_tx is not called), so we can wake up for sock rx.
>>> so the network is broken.
>> Not sure I understand here. I mean is we schedule work for handle_rx(),
>> there's no need to enable it since handle_rx() will do this for us.
> Looks like in the last "else" block in vhost_net_busy_poll_check() we
> need to enable vq since in that case we have no rx data and handle_rx()
> is not scheduled.
>
Yes.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next v7 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: Toshiaki Makita @ 2018-08-02 9:57 UTC (permalink / raw)
To: Jason Wang, Tonghao Zhang
Cc: Linux Kernel Network Developers, virtualization, mst
In-Reply-To: <3272c3b4-a44c-8554-329e-8a5e1a59aafd@redhat.com>
On 2018/08/02 18:23, Jason Wang wrote:
> On 2018年08月02日 16:41, Toshiaki Makita wrote:
>> On 2018/08/02 17:18, Jason Wang wrote:
>>> On 2018年08月01日 17:52, Tonghao Zhang wrote:
>>>>> +static void vhost_net_busy_poll_check(struct vhost_net *net,
>>>>> + struct vhost_virtqueue *rvq,
>>>>> + struct vhost_virtqueue *tvq,
>>>>> + bool rx)
>>>>> +{
>>>>> + struct socket *sock = rvq->private_data;
>>>>> +
>>>>> + if (rx)
>>>>> + vhost_net_busy_poll_try_queue(net, tvq);
>>>>> + else if (sock && sk_has_rx_data(sock->sk))
>>>>> + vhost_net_busy_poll_try_queue(net, rvq);
>>>>> + else {
>>>>> + /* On tx here, sock has no rx data, so we
>>>>> + * will wait for sock wakeup for rx, and
>>>>> + * vhost_enable_notify() is not needed. */
>>>> A possible case is we do have rx data but guest does not refill the rx
>>>> queue. In this case we may lose notifications from guest.
>>> Yes, should consider this case. thanks.
>> I'm a bit confused. Isn't this covered by the previous
>> "else if (sock && sk_has_rx_data(...))" block?
>
> The problem is it does nothing if vhost_vq_avail_empty() is true and
> vhost_enble_notify() is false.
If vhost_enable_notify() is false, guest will eventually kicks vq, no?
--
Toshiaki Makita
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-08-02 10:32 UTC (permalink / raw)
To: Michal Hocko
Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <20180801113444.GK16767@dhcp22.suse.cz>
On 08/01/2018 07:34 PM, Michal Hocko wrote:
> On Wed 01-08-18 19:12:25, Wei Wang wrote:
>> On 07/30/2018 05:00 PM, Michal Hocko wrote:
>>> On Fri 27-07-18 17:24:55, Wei Wang wrote:
>>>> The OOM notifier is getting deprecated to use for the reasons mentioned
>>>> here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
>>>>
>>>> This patch replaces the virtio-balloon oom notifier with a shrinker
>>>> to release balloon pages on memory pressure.
>>> It would be great to document the replacement. This is not a small
>>> change...
>> OK. I plan to document the following to the commit log:
>>
>> The OOM notifier is getting deprecated to use for the reasons:
>> - As a callout from the oom context, it is too subtle and easy to
>> generate bugs and corner cases which are hard to track;
>> - It is called too late (after the reclaiming has been performed).
>> Drivers with large amuont of reclaimable memory is expected to be
>> released them at an early age of memory pressure;
>> - The notifier callback isn't aware of the oom contrains;
>> Link: https://lkml.org/lkml/2018/7/12/314
>>
>> This patch replaces the virtio-balloon oom notifier with a shrinker
>> to release balloon pages on memory pressure. Users can set the amount of
>> memory pages to release each time a shrinker_scan is called via the
>> module parameter balloon_pages_to_shrink, and the default amount is 256
>> pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
>> been used to release balloon pages on OOM. We continue to use this
>> feature bit for the shrinker, so the shrinker is only registered when
>> this feature bit has been negotiated with host.
> Do you have any numbers for how does this work in practice?
It works in this way: for example, we can set the parameter,
balloon_pages_to_shrink, to shrink 1GB memory once shrink scan is
called. Now, we have a 8GB guest, and we balloon out 7GB. When shrink
scan is called, the balloon driver will get back 1GB memory and give
them back to mm, then the ballooned memory becomes 6GB.
When the shrinker scan is called the second time, another 1GB will be
given back to mm. So the ballooned pages are given back to mm gradually.
> Let's say
> you have a medium page cache workload which triggers kswapd to do a
> light reclaim? Hardcoded shrinking sounds quite dubious to me but I have
> no idea how people expect this to work. Shouldn't this be more
> adaptive? How precious are those pages anyway?
Those pages are given to host to use usually because the guest has
enough free memory, and host doesn't want to waste those pieces of
memory as they are not used by this guest. When the guest needs them, it
is reasonable that the guest has higher priority to take them back.
But I'm not sure if there would be a more adaptive approach than
"gradually giving back as the guest wants more".
Best,
Wei
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Tetsuo Handa @ 2018-08-02 11:00 UTC (permalink / raw)
To: Wei Wang, Michal Hocko
Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <5B62DDCC.3030100@intel.com>
On 2018/08/02 19:32, Wei Wang wrote:
> On 08/01/2018 07:34 PM, Michal Hocko wrote:
>> Do you have any numbers for how does this work in practice?
>
> It works in this way: for example, we can set the parameter, balloon_pages_to_shrink,
> to shrink 1GB memory once shrink scan is called. Now, we have a 8GB guest, and we balloon
> out 7GB. When shrink scan is called, the balloon driver will get back 1GB memory and give
> them back to mm, then the ballooned memory becomes 6GB.
Since shrinker might be called concurrently (am I correct?), the balloon might deflate
far more than needed if it releases such much memory. If shrinker is used, releasing 256
pages might be sufficient.
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Wei Wang @ 2018-08-02 11:27 UTC (permalink / raw)
To: Tetsuo Handa, Michal Hocko
Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <87d7ae45-79cb-e294-7397-0e45e2af49cd@I-love.SAKURA.ne.jp>
On 08/02/2018 07:00 PM, Tetsuo Handa wrote:
> On 2018/08/02 19:32, Wei Wang wrote:
>> On 08/01/2018 07:34 PM, Michal Hocko wrote:
>>> Do you have any numbers for how does this work in practice?
>> It works in this way: for example, we can set the parameter, balloon_pages_to_shrink,
>> to shrink 1GB memory once shrink scan is called. Now, we have a 8GB guest, and we balloon
>> out 7GB. When shrink scan is called, the balloon driver will get back 1GB memory and give
>> them back to mm, then the ballooned memory becomes 6GB.
> Since shrinker might be called concurrently (am I correct?),
Not sure about it being concurrently, but I think it would be called
repeatedly as should_continue_reclaim() returns true.
Best,
Wei
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Michal Hocko @ 2018-08-02 11:29 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, mst, Tetsuo Handa, linux-kernel, virtualization,
linux-mm, akpm
In-Reply-To: <5B62EAAC.8000505@intel.com>
On Thu 02-08-18 19:27:40, Wei Wang wrote:
> On 08/02/2018 07:00 PM, Tetsuo Handa wrote:
> > On 2018/08/02 19:32, Wei Wang wrote:
> > > On 08/01/2018 07:34 PM, Michal Hocko wrote:
> > > > Do you have any numbers for how does this work in practice?
> > > It works in this way: for example, we can set the parameter, balloon_pages_to_shrink,
> > > to shrink 1GB memory once shrink scan is called. Now, we have a 8GB guest, and we balloon
> > > out 7GB. When shrink scan is called, the balloon driver will get back 1GB memory and give
> > > them back to mm, then the ballooned memory becomes 6GB.
> > Since shrinker might be called concurrently (am I correct?),
>
> Not sure about it being concurrently, but I think it would be called
> repeatedly as should_continue_reclaim() returns true.
Multiple direct reclaimers might indeed invoke it concurrently.
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* CFP SENSORNETS 2019 - 8th Int.l Conf. on Sensor Networks (Prague/Czech Republic)
From: sensornets @ 2018-08-02 11:30 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
8th International Conference on Sensor Networks
Submission Deadline: October 1, 2018
http://www.sensornets.org/
February 26 - 27, 2019
Prague, Czech Republic.
SENSORNETS is organized in 5 major tracks:
- Sensor Networks Software, Architectures and Applications
- Wireless Sensor Networks
- Energy and Environment
- Intelligent Data Analysis and Processing
- Security and Privacy in Sensor Networks
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
SENSORNETS Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 100 033
Fax: +351 265 520 186
Web: http://www.sensornets.org/
e-mail: sensornets.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP IoTBDS 2019 - 3rd Int.l Conf. on Internet of Things, Big Data and Security (Heraklion, Crete/Greece)
From: iotbds @ 2018-08-02 11:30 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
3rd International Conference on Internet of Things, Big Data and Security
Submission Deadline: December 10, 2018
http://iotbds.org/
May 2 - 4, 2019
Heraklion, Crete, Greece.
IoTBDS is organized in 7 major tracks:
- Big Data Research
- Emerging Services and Analytics
- Internet of Things (IoT) Fundamentals
- Internet of Things (IoT) Applications
- Big Data for Multi-discipline Services
- Security, Privacy and Trust
- IoT Technologies
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
With the presence of internationally distinguished keynote speakers:
Francisco Herrera, University of Granada, Spain
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
IoTBDS Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 184
Fax: +351 265 520 186
Web: http://iotbds.org/
e-mail: iotbds.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP SMARTGREENS 2019 - 8th Int.l Conf. on Smart Cities and Green ICT Systems (Heraklion, Crete/Greece)
From: smartgreens @ 2018-08-02 11:30 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
8th International Conference on Smart Cities and Green ICT Systems
Submission Deadline: December 10, 2018
http://www.smartgreens.org/
May 3 - 5, 2019
Heraklion, Crete, Greece.
SMARTGREENS is organized in 5 major tracks:
- Energy-Aware Systems and Technologies
- Sustainable Computing and Systems
- Smart Cities and Smart Buildings
- Demos and Use-Cases
- Smart and Digital Services
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
With the presence of internationally distinguished keynote speakers:
Norbert Streitz, Founder and Scientific Director, Smart Future Initiative, Germany
Rudolf Giffinger, Vienna University of Technology, Austria
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
SMARTGREENS Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 185
Fax: +351 265 520 186
Web: http://www.smartgreens.org/
e-mail: smartgreens.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP VEHITS 2019 - 5th Int.l Conf. on Vehicle Technology and Intelligent Transport Systems (Heraklion, Crete/Greece)
From: vehits @ 2018-08-02 11:30 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
5th International Conference on Vehicle Technology and Intelligent Transport Systems
Submission Deadline: December 10, 2018
http://www.vehits.org/
May 3 - 5, 2019
Heraklion, Crete, Greece.
VEHITS is organized in 5 major tracks:
- Intelligent Vehicle Technologies
- Intelligent Transport Systems and Infrastructure
- Connected Vehicles
- Sustainable Transport
- Data Analytics
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
With the presence of internationally distinguished keynote speakers:
Javier Sánchez-Medina, University of Las Palmas de Gran Canaria, Spain
Jeroen Ploeg, 2getthere B.V., Utrecht, The Netherlands Lead Cooperative Driving Eindhoven University of Technology, Eindhoven, The Netherlands Associate professor (part time), faculty of Mechanical Engineering, Dynamics and Control group, Netherlands
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
VEHITS Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 185
Fax: +351 265 520 186
Web: http://www.vehits.org/
e-mail: vehits.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP CLOSER 2019 - 8th Int.l Conf. on Cloud Computing and Services Science (Heraklion, Crete/Greece)
From: closer @ 2018-08-02 11:31 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
8th International Conference on Cloud Computing and Services Science
Submission Deadline: December 10, 2018
http://closer.scitevents.org
May 2 - 4, 2019
Heraklion, Crete, Greece.
CLOSER is organized in 9 major tracks:
- Services Science
- Data as a Service
- Cloud Operations
- Edge Cloud and Fog Computing
- Service Modelling and Analytics
- Mobile Cloud Computing
- Cloud Computing Fundamentals
- Cloud Computing Platforms and Applications
- Cloud Computing Enabling Technology
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
CLOSER Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 184
Fax: +351 265 520 186
Web: http://closer.scitevents.org
e-mail: closer.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* CFP ICEIS 2019 - 21st Int.l Conf. on Enterprise Information Systems (Heraklion, Crete/Greece)
From: iceis @ 2018-08-02 11:31 UTC (permalink / raw)
To: virtualization
SUBMISSION DEADLINE
21st International Conference on Enterprise Information Systems
Submission Deadline: December 10, 2018
http://www.iceis.org/
May 3 - 5, 2019
Heraklion, Crete, Greece.
ICEIS is organized in 6 major tracks:
- Databases and Information Systems Integration
- Artificial Intelligence and Decision Support Systems
- Information Systems Analysis and Specification
- Software Agents and Internet Computing
- Human-Computer Interaction
- Enterprise Architecture
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
Should you have any question please don’t hesitate contacting me.
Kind regards,
ICEIS Secretariat
Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 184
Fax: +351 265 520 186
Web: http://www.iceis.org/
e-mail: iceis.secretariat@insticc.org
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Michal Hocko @ 2018-08-02 11:47 UTC (permalink / raw)
To: Wei Wang; +Cc: virtio-dev, mst, linux-kernel, virtualization, linux-mm, akpm
In-Reply-To: <5B62DDCC.3030100@intel.com>
On Thu 02-08-18 18:32:44, Wei Wang wrote:
> On 08/01/2018 07:34 PM, Michal Hocko wrote:
> > On Wed 01-08-18 19:12:25, Wei Wang wrote:
> > > On 07/30/2018 05:00 PM, Michal Hocko wrote:
> > > > On Fri 27-07-18 17:24:55, Wei Wang wrote:
> > > > > The OOM notifier is getting deprecated to use for the reasons mentioned
> > > > > here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
> > > > >
> > > > > This patch replaces the virtio-balloon oom notifier with a shrinker
> > > > > to release balloon pages on memory pressure.
> > > > It would be great to document the replacement. This is not a small
> > > > change...
> > > OK. I plan to document the following to the commit log:
> > >
> > > The OOM notifier is getting deprecated to use for the reasons:
> > > - As a callout from the oom context, it is too subtle and easy to
> > > generate bugs and corner cases which are hard to track;
> > > - It is called too late (after the reclaiming has been performed).
> > > Drivers with large amuont of reclaimable memory is expected to be
> > > released them at an early age of memory pressure;
> > > - The notifier callback isn't aware of the oom contrains;
> > > Link: https://lkml.org/lkml/2018/7/12/314
> > >
> > > This patch replaces the virtio-balloon oom notifier with a shrinker
> > > to release balloon pages on memory pressure. Users can set the amount of
> > > memory pages to release each time a shrinker_scan is called via the
> > > module parameter balloon_pages_to_shrink, and the default amount is 256
> > > pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
> > > been used to release balloon pages on OOM. We continue to use this
> > > feature bit for the shrinker, so the shrinker is only registered when
> > > this feature bit has been negotiated with host.
> > Do you have any numbers for how does this work in practice?
>
> It works in this way: for example, we can set the parameter,
> balloon_pages_to_shrink, to shrink 1GB memory once shrink scan is called.
> Now, we have a 8GB guest, and we balloon out 7GB. When shrink scan is
> called, the balloon driver will get back 1GB memory and give them back to
> mm, then the ballooned memory becomes 6GB.
>
> When the shrinker scan is called the second time, another 1GB will be given
> back to mm. So the ballooned pages are given back to mm gradually.
>
> > Let's say
> > you have a medium page cache workload which triggers kswapd to do a
> > light reclaim? Hardcoded shrinking sounds quite dubious to me but I have
> > no idea how people expect this to work. Shouldn't this be more
> > adaptive? How precious are those pages anyway?
>
> Those pages are given to host to use usually because the guest has enough
> free memory, and host doesn't want to waste those pieces of memory as they
> are not used by this guest. When the guest needs them, it is reasonable that
> the guest has higher priority to take them back.
> But I'm not sure if there would be a more adaptive approach than "gradually
> giving back as the guest wants more".
I am not sure I follow. Let me be more specific. Say you have a trivial
stream IO triggering reclaim to recycle clean page cache. This will
invoke slab shrinkers as well. Do you really want to drop your batch of
pages on each invocation? Doesn't that remove them very quickly? Just
try to dd if=large_file of=/dev/null and see how your pages are
disappearing. Shrinkers usually scale the number of objects they are
going to reclaim based on the memory pressure (aka targer to be
reclaimed).
--
Michal Hocko
SUSE Labs
^ permalink raw reply
* Re: [PATCH v2 2/2] virtio_balloon: replace oom notifier with shrinker
From: Michael S. Tsirkin @ 2018-08-02 15:18 UTC (permalink / raw)
To: Wei Wang
Cc: virtio-dev, linux-kernel, virtualization, linux-mm, akpm,
Michal Hocko
In-Reply-To: <5B62DDCC.3030100@intel.com>
On Thu, Aug 02, 2018 at 06:32:44PM +0800, Wei Wang wrote:
> On 08/01/2018 07:34 PM, Michal Hocko wrote:
> > On Wed 01-08-18 19:12:25, Wei Wang wrote:
> > > On 07/30/2018 05:00 PM, Michal Hocko wrote:
> > > > On Fri 27-07-18 17:24:55, Wei Wang wrote:
> > > > > The OOM notifier is getting deprecated to use for the reasons mentioned
> > > > > here by Michal Hocko: https://lkml.org/lkml/2018/7/12/314
> > > > >
> > > > > This patch replaces the virtio-balloon oom notifier with a shrinker
> > > > > to release balloon pages on memory pressure.
> > > > It would be great to document the replacement. This is not a small
> > > > change...
> > > OK. I plan to document the following to the commit log:
> > >
> > > The OOM notifier is getting deprecated to use for the reasons:
> > > - As a callout from the oom context, it is too subtle and easy to
> > > generate bugs and corner cases which are hard to track;
> > > - It is called too late (after the reclaiming has been performed).
> > > Drivers with large amuont of reclaimable memory is expected to be
> > > released them at an early age of memory pressure;
> > > - The notifier callback isn't aware of the oom contrains;
> > > Link: https://lkml.org/lkml/2018/7/12/314
> > >
> > > This patch replaces the virtio-balloon oom notifier with a shrinker
> > > to release balloon pages on memory pressure. Users can set the amount of
> > > memory pages to release each time a shrinker_scan is called via the
> > > module parameter balloon_pages_to_shrink, and the default amount is 256
> > > pages. Historically, the feature VIRTIO_BALLOON_F_DEFLATE_ON_OOM has
> > > been used to release balloon pages on OOM. We continue to use this
> > > feature bit for the shrinker, so the shrinker is only registered when
> > > this feature bit has been negotiated with host.
> > Do you have any numbers for how does this work in practice?
>
> It works in this way: for example, we can set the parameter,
> balloon_pages_to_shrink, to shrink 1GB memory once shrink scan is called.
> Now, we have a 8GB guest, and we balloon out 7GB. When shrink scan is
> called, the balloon driver will get back 1GB memory and give them back to
> mm, then the ballooned memory becomes 6GB.
>
> When the shrinker scan is called the second time, another 1GB will be given
> back to mm. So the ballooned pages are given back to mm gradually.
I think what's being asked here is a description of tests that
were run. Which workloads see improved behaviour?
Our behaviour under memory pressure isn't great, in particular it is not
clear when it's safe to re-inflate the balloon, if host attempts to
re-inflate it too soon then we still get OOM. It would be better
if VIRTIO_BALLOON_F_DEFLATE_ON_OOM would somehow mean
"it's ok to ask for almost all of memory, if guest needs memory from
balloon for apps to function it can take it from the balloon".
--
MST
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox