From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41098)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Wjp7V-0007Yg-Gu
	for qemu-devel@nongnu.org; Mon, 12 May 2014 08:14:19 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Wjp7P-0003d2-Bv
	for qemu-devel@nongnu.org; Mon, 12 May 2014 08:14:13 -0400
Received: from mx1.redhat.com ([209.132.183.28]:44553)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1Wjp7P-0003cu-3h
	for qemu-devel@nongnu.org; Mon, 12 May 2014 08:14:07 -0400
Date: Mon, 12 May 2014 15:12:52 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20140512121252.GA16576@redhat.com>
References: <33183CC9F5247A488A2544077AF19020815E744C@SZXEMA503-MBS.china.huawei.com>
	<536CA5A5.4080303@redhat.com>
	<33183CC9F5247A488A2544077AF19020815E7B70@SZXEMA503-MBS.china.huawei.com>
	<53709B0C.4030808@redhat.com> <20140512100814.GA15514@redhat.com>
	<53709F01.8090204@redhat.com> <20140512101844.GC15514@redhat.com>
	<5370A19F.5000900@redhat.com> <20140512110723.GB15684@redhat.com>
	<5370B48B.6080001@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5370B48B.6080001@redhat.com>
Subject: Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to
 call_rcu in vhost_set_memory() in vhost kernel module?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Huangweidong (C)" <weidong.huang@huawei.com>, "gleb@redhat.com" <gleb@redhat.com>, Radim Krcmar <rkrcmar@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "Gonglei (Arei)" <arei.gonglei@huawei.com>, "avi.kivity@gmail.com" <avi.kivity@gmail.com>, "Herongguang (Stephen)" <herongguang.he@huawei.com>

On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>>>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>>>Perhaps we can check for cases where only the address is changing,
> >>>>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>>>doing any RCU synchronization?
> >>>>>
> >>>>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>>>cpus, now it matches some. No?
> >>>>
> >>>>Can you explain the problem more verbosely? :)
> >>>>
> >>>>Multiple writers would still be protected by the mutex, so you
> >>>>cannot have an "in-place update" writer racing with a "copy the
> >>>>array" writer.
> >>>
> >>>I am not sure really.
> >>>I'm worried about reader vs writer.
> >>>If reader sees a stale msi value msi will be sent to a wrong
> >>>address.
> >>
> >>That shouldn't happen on any cache-coherent system, no?
> >>
> >>Or at least, it shouldn't become any worse than what can already
> >>happen with RCU.
> >
> >Meaning guest must do some synchronization anyway?
> 
> Yes, I think so.  The simplest would be to mask the interrupt around
> MSI configuration changes.  Radim was looking at a similar bug.

Was this with classic assignment or with vfio?

> He
> couldn't replicate on bare metal, but I don't see why it shouldn't
> be possible there too.

I doubt linux does something tricky here, so
I'm not talking about something linux guest does,
I'm talking about an abstract guarantee of the
APIC. If baremetal causes a synchronisation point
and we don't, at some point linux will use it and
this will bite us.

> >But I am not sure this works correctly in all cases,
> >synchronization with guest VCPU does not have to be
> >the same thing as synchronization with host CPU.
> >For example, can not guest VCPU1 detect that
> >VCPU2 is idle and avoid any synchronization?
> 
> So guest VCPU1 would be the one that sets the MSI word, and VCPU2
> would be the old destination?  That would also be racy, the moment
> after you "check that VCPU2 is idle" an interrupt could come in and
> the VCPU wouldn't be idle anymore.

True but I was talking about something like RCU in guest so
that's not a problem: you check after update.
If it's idle you know next interrupt will be the new one.

> >In any case I'd like to see a patch like this
> >accompanied by some argument explaining why it's
> >a safe thing to do.
> 
> Yes, of course.
> 
> Paolo