From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41098) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wjp7V-0007Yg-Gu for qemu-devel@nongnu.org; Mon, 12 May 2014 08:14:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wjp7P-0003d2-Bv for qemu-devel@nongnu.org; Mon, 12 May 2014 08:14:13 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44553) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wjp7P-0003cu-3h for qemu-devel@nongnu.org; Mon, 12 May 2014 08:14:07 -0400 Date: Mon, 12 May 2014 15:12:52 +0300 From: "Michael S. Tsirkin" Message-ID: <20140512121252.GA16576@redhat.com> References: <33183CC9F5247A488A2544077AF19020815E744C@SZXEMA503-MBS.china.huawei.com> <536CA5A5.4080303@redhat.com> <33183CC9F5247A488A2544077AF19020815E7B70@SZXEMA503-MBS.china.huawei.com> <53709B0C.4030808@redhat.com> <20140512100814.GA15514@redhat.com> <53709F01.8090204@redhat.com> <20140512101844.GC15514@redhat.com> <5370A19F.5000900@redhat.com> <20140512110723.GB15684@redhat.com> <5370B48B.6080001@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5370B48B.6080001@redhat.com> Subject: Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module? List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: "Huangweidong (C)" , "gleb@redhat.com" , Radim Krcmar , "qemu-devel@nongnu.org" , "Gonglei (Arei)" , "avi.kivity@gmail.com" , "Herongguang (Stephen)" On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote: > Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto: > >On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote: > >>Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto: > >>>On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote: > >>>>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto: > >>>>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote: > >>>>>>Perhaps we can check for cases where only the address is changing, > >>>>>>and poke at an existing struct kvm_kernel_irq_routing_entry without > >>>>>>doing any RCU synchronization? > >>>>> > >>>>>I suspect interrupts can get lost then: e.g. if address didn't match any > >>>>>cpus, now it matches some. No? > >>>> > >>>>Can you explain the problem more verbosely? :) > >>>> > >>>>Multiple writers would still be protected by the mutex, so you > >>>>cannot have an "in-place update" writer racing with a "copy the > >>>>array" writer. > >>> > >>>I am not sure really. > >>>I'm worried about reader vs writer. > >>>If reader sees a stale msi value msi will be sent to a wrong > >>>address. > >> > >>That shouldn't happen on any cache-coherent system, no? > >> > >>Or at least, it shouldn't become any worse than what can already > >>happen with RCU. > > > >Meaning guest must do some synchronization anyway? > > Yes, I think so. The simplest would be to mask the interrupt around > MSI configuration changes. Radim was looking at a similar bug. Was this with classic assignment or with vfio? > He > couldn't replicate on bare metal, but I don't see why it shouldn't > be possible there too. I doubt linux does something tricky here, so I'm not talking about something linux guest does, I'm talking about an abstract guarantee of the APIC. If baremetal causes a synchronisation point and we don't, at some point linux will use it and this will bite us. > >But I am not sure this works correctly in all cases, > >synchronization with guest VCPU does not have to be > >the same thing as synchronization with host CPU. > >For example, can not guest VCPU1 detect that > >VCPU2 is idle and avoid any synchronization? > > So guest VCPU1 would be the one that sets the MSI word, and VCPU2 > would be the old destination? That would also be racy, the moment > after you "check that VCPU2 is idle" an interrupt could come in and > the VCPU wouldn't be idle anymore. True but I was talking about something like RCU in guest so that's not a problem: you check after update. If it's idle you know next interrupt will be the new one. > >In any case I'd like to see a patch like this > >accompanied by some argument explaining why it's > >a safe thing to do. > > Yes, of course. > > Paolo