qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
@ 2014-05-09  1:57 Gonglei (Arei)
  2014-05-09  8:14 ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-09  1:57 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: Paolo Bonzini, Huangweidong (C), Herongguang (Stephen),
	mst@redhat.com

Hi,

Vhost devices need to issue the VHOST_SET_MEM_TABLE ioctl in vhost_dev_start()
to tell the vhost kernel module about GPA-to-HVA memory mappings, which is expensive. 
The reason is the same as for the KVM_SET_GSI_ROUTING ioctl: during ioctl processing, 
the kvm and vhost kernel modules call synchronize_rcu() to wait for a grace period before freeing the old memory.
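
For reference, the pattern in question in vhost_set_memory() looks roughly like this 
(paraphrased; the exact code differs slightly between kernel versions):

	rcu_assign_pointer(d->memory, newmem);
	synchronize_rcu();	/* block until all readers of the old table are done */
	kfree(oldmem);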
 
In the KVM_SET_GSI_ROUTING case, we cannot simply change synchronize_rcu to call_rcu, 
since this may lead to a DoS attack if the guest VM keeps setting IRQ affinity.
 
In the VHOST_SET_MEM_TABLE case, I wonder whether we can change synchronize_rcu() to call_rcu(), 
i.e. whether a DoS attack could be triggered from the guest. There are only a few cases in which QEMU 
issues the VHOST_SET_MEM_TABLE ioctl: VM start/reboot, attaching vhost devices, and changes to RAM 
memory regions in the system memory address space. 

And I'd like to know whether guest activities can cause RAM memory regions to change.
 
Can you give me some advice? Thanks!


Best regards,
-Gonglei


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-09  1:57 [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module? Gonglei (Arei)
@ 2014-05-09  8:14 ` Paolo Bonzini
  2014-05-09  9:04   ` Gonglei (Arei)
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-09  8:14 UTC (permalink / raw)
  To: Gonglei (Arei), qemu-devel@nongnu.org
  Cc: Huangweidong (C), Herongguang (Stephen), mst@redhat.com

On 09/05/2014 03:57, Gonglei (Arei) wrote:
> Hi,
>
> Vhost devices need to do VHOST_SET_MEM_TABLE ioctl in vhost_dev_start()
> to tell vhost kernel modules GPA to HVA memory mappings, which consume is expensively.
> The reason is same as KVM_SET_GSI_ROUTING ioctl. That is, in ioctl processing,
> kmod and vhost calls synchronize_rcu() to wait for grace period to free old memory.
>
> In KVM_SET_GSI_ROUTING case, we cannot simply change synchronize_rcu to call_rcu,
> since this may leads to DOS attacks if guest VM keeps setting IRQ affinity.
>
> In VHOST_SET_MEM_TABLE case, I wonder if we can change synchronize_rcu() to call_rcu(),
> i.e., is it possible to trigger DOS attack in guest? There are some cases QEMU would do
> VHOST_SET_MEM_TABLE ioctl, like VM start/reboot/attach vhost devices, and RAM memory
> regions in system memory address space change.
>
> And I'd like to know if guest activities could lead to RAM memory regions change?

Yes, for example enabling/disabling PCI BARs would have that effect.

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-09  8:14 ` Paolo Bonzini
@ 2014-05-09  9:04   ` Gonglei (Arei)
  2014-05-09  9:53     ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-09  9:04 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel@nongnu.org
  Cc: Huangweidong (C), Herongguang (Stephen), mst@redhat.com

Hi,

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, May 09, 2014 4:15 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: mst@redhat.com; Herongguang (Stephen); Huangweidong (C)
> Subject: Re: [RFC] vhost: Can we change synchronize_rcu to call_rcu in
> vhost_set_memory() in vhost kernel module?
> 
> Il 09/05/2014 03:57, Gonglei (Arei) ha scritto:
> > Hi,
> >
> > Vhost devices need to do VHOST_SET_MEM_TABLE ioctl in vhost_dev_start()
> > to tell vhost kernel modules GPA to HVA memory mappings, which consume is
> expensively.
> > The reason is same as KVM_SET_GSI_ROUTING ioctl. That is, in ioctl
> processing,
> > kmod and vhost calls synchronize_rcu() to wait for grace period to free old
> memory.
> >
> > In KVM_SET_GSI_ROUTING case, we cannot simply change synchronize_rcu
> to call_rcu,
> > since this may leads to DOS attacks if guest VM keeps setting IRQ affinity.
> >
> > In VHOST_SET_MEM_TABLE case, I wonder if we can change
> synchronize_rcu() to call_rcu(),
> > i.e., is it possible to trigger DOS attack in guest? There are some cases QEMU
> would do
> > VHOST_SET_MEM_TABLE ioctl, like VM start/reboot/attach vhost devices,
> and RAM memory
> > regions in system memory address space change.
> >
> > And I'd like to know if guest activities could lead to RAM memory regions
> change?
> 
> Yes, for example enabling/disabling PCI BARs would have that effect.
> 
Yes, but PCI BARs are mapped in the PCI hole, and they do not overlap with RAM 
memory regions, so disabling or enabling PCI BARs would not change the RAM MRs' mappings. 
And since vhost_region_add/vhost_region_del check whether the changed MemoryRegionSection is RAM, 
vhost memory mappings will not be affected. Is this correct?


Best regards,
-Gonglei


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-09  9:04   ` Gonglei (Arei)
@ 2014-05-09  9:53     ` Paolo Bonzini
  2014-05-12  9:28       ` Gonglei (Arei)
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-09  9:53 UTC (permalink / raw)
  To: Gonglei (Arei), qemu-devel@nongnu.org
  Cc: Huangweidong (C), Herongguang (Stephen), mst@redhat.com

On 09/05/2014 11:04, Gonglei (Arei) wrote:
>> > Yes, for example enabling/disabling PCI BARs would have that effect.
>> >
> Yes, but PCI BARs are mapped in PCI hole, and they are not overlapped with ram
> memory regions, so disable or enable PCI BARs would not change ram MRs' mapping.

PCI BARs can be RAM (one special case being the ROM BAR).

Paolo

> Since vhost_region_add/vhost_region_del check if changed MemoryRegionSection is ram,
> so vhost memoey mappings will not be influenced, is this correct?


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-09  9:53     ` Paolo Bonzini
@ 2014-05-12  9:28       ` Gonglei (Arei)
  2014-05-12  9:57         ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-12  9:28 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel@nongnu.org
  Cc: gleb@redhat.com, Huangweidong (C), Herongguang (Stephen),
	avi.kivity@gmail.com, mst@redhat.com

Hi,

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, May 09, 2014 5:54 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: mst@redhat.com; Herongguang (Stephen); Huangweidong (C)
> Subject: Re: [RFC] vhost: Can we change synchronize_rcu to call_rcu in
> vhost_set_memory() in vhost kernel module?
> 
> Il 09/05/2014 11:04, Gonglei (Arei) ha scritto:
> >> > Yes, for example enabling/disabling PCI BARs would have that effect.
> >> >
> > Yes, but PCI BARs are mapped in PCI hole, and they are not overlapped with
> ram
> > memory regions, so disable or enable PCI BARs would not change ram MRs'
> mapping.
> 
> PCI BARs can be RAM (one special case being the ROM BAR).
> 
> Paolo
> 
Many thanks for your explanation.

We have now found that migration downtime is mostly spent in the KVM_SET_GSI_ROUTING and 
VHOST_SET_MEM_TABLE ioctls, and that internally this time is spent in synchronize_rcu(). Besides live 
migration, setting MSI IRQ CPU affinity in the VM also triggers the KVM_SET_GSI_ROUTING ioctl in 
QEMU.

From the previous discussion:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Although 
SRCU is much better than the original RCU, in our test case it still cannot satisfy our needs. Our VMs 
work in a telecom scenario: each VM reports its CPU and memory usage to a balance node every second, 
and the balance node dispatches work to different VMs according to VM load. Since this balancing needs 
high accuracy, the IRQ affinity settings in the VM also need high accuracy, so we rebalance IRQ affinity 
every 0.5s. For this telecom scenario the KVM_SET_GSI_ROUTING ioctl therefore needs significant 
optimization, and in the live migration case VHOST_SET_MEM_TABLE needs attention.

We tried changing synchronize_rcu() to call_rcu() with a rate limit, but the rate limit is not easy to 
configure. Do you have better ideas for achieving this? Thanks.
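
For illustration, a minimal sketch of the plain call_rcu() variant we tried, with the rate 
limiting omitted (the rcu_head member "rcu" and the callback are hypothetical additions, not 
existing vhost code):

	/* struct vhost_memory would need an rcu_head member, e.g. "rcu" */
	static void vhost_memory_free_rcu(struct rcu_head *head)
	{
		struct vhost_memory *oldmem =
			container_of(head, struct vhost_memory, rcu);

		kfree(oldmem);
	}

	...
	rcu_assign_pointer(d->memory, newmem);
	/* returns immediately instead of waiting for a grace period */
	call_rcu(&oldmem->rcu, vhost_memory_free_rcu);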

Cc'ing Avi & Gleb for more insight.


Best regards,
-Gonglei


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12  9:28       ` Gonglei (Arei)
@ 2014-05-12  9:57         ` Paolo Bonzini
  2014-05-12 10:08           ` Michael S. Tsirkin
                             ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12  9:57 UTC (permalink / raw)
  To: Gonglei (Arei), qemu-devel@nongnu.org
  Cc: gleb@redhat.com, Huangweidong (C), Herongguang (Stephen),
	avi.kivity@gmail.com, mst@redhat.com

On 12/05/2014 11:28, Gonglei (Arei) wrote:
> From previous discussion:
> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
> we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Though
> SRCU is quite better than originally RCU, in our test case this cannot satisfy our needs. Our VMs
> work in telecom scenario, VMs report CPU and memory usage to balance node each second, and
> balance node dispatch works to different VMs according to VM load. Since this balance needs
> high accuracy, IRQ affinity settings in VM also need high accuracy, so we balance IRQ affinity in
> every 0.5s. So for telecom scenario, KVM_SET_GSI_ROUTING IOCTL needs much optimization.
> And in live migration case, VHOST_SET_MEM_TABLE needs attention.
>
> We tried to change synchronize_rcu() to call_rcu() with rate limit, but rate limit is not easy to
> configure. Do you have better ideas to achieve this? Thanks.

Perhaps we can check for cases where only the address is changing, and 
poke at an existing struct kvm_kernel_irq_routing_entry without doing 
any RCU synchronization?

As long as kvm_set_msi_irq only reads address_lo once, it should work.
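
A rough sketch of the writer side of that idea (field names are from a 2014-era tree; the 
fast path itself is hypothetical and untested):

	/* Under the routing mutex, so this cannot race with a full
	 * "copy the table" update.  e is the existing
	 * kvm_kernel_irq_routing_entry, ue the new userspace entry.
	 */
	if (e->type == KVM_IRQ_ROUTING_MSI &&
	    ue->type == KVM_IRQ_ROUTING_MSI &&
	    e->msi.address_hi == ue->u.msi.address_hi &&
	    e->msi.data == ue->u.msi.data) {
		/* Only address_lo changed: update it in place and skip both
		 * the table copy and synchronize_rcu().  kvm_set_msi_irq must
		 * load address_lo exactly once for this to be safe.
		 */
		ACCESS_ONCE(e->msi.address_lo) = ue->u.msi.address_lo;
		return 0;
	}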

VHOST_SET_MEM_TABLE is a different problem.  What happens in userspace 
that leads to calling that ioctl?  Can we remove it altogether, or delay 
it to after the destination has started running?

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12  9:57         ` Paolo Bonzini
@ 2014-05-12 10:08           ` Michael S. Tsirkin
  2014-05-12 10:14             ` Paolo Bonzini
  2014-05-12 10:30           ` Michael S. Tsirkin
  2014-05-13  7:01           ` Gonglei (Arei)
  2 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 10:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)

On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 11:28, Gonglei (Arei) ha scritto:
> >From previous discussion:
> >https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
> >we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Though
> >SRCU is quite better than originally RCU, in our test case this cannot satisfy our needs. Our VMs
> >work in telecom scenario, VMs report CPU and memory usage to balance node each second, and
> >balance node dispatch works to different VMs according to VM load. Since this balance needs
> >high accuracy, IRQ affinity settings in VM also need high accuracy, so we balance IRQ affinity in
> >every 0.5s. So for telecom scenario, KVM_SET_GSI_ROUTING IOCTL needs much optimization.
> >And in live migration case, VHOST_SET_MEM_TABLE needs attention.
> >
> >We tried to change synchronize_rcu() to call_rcu() with rate limit, but rate limit is not easy to
> >configure. Do you have better ideas to achieve this? Thanks.
> 
> Perhaps we can check for cases where only the address is changing,
> and poke at an existing struct kvm_kernel_irq_routing_entry without
> doing any RCU synchronization?

I suspect interrupts can get lost then: e.g. if the address didn't match any
CPUs before, now it matches some. No?

> As long as kvm_set_msi_irq only reads address_lo once, it should work.
> 
> VHOST_SET_MEM_TABLE is a different problem.  What happens in
> userspace that leads to calling that ioctl?  Can we remove it
> altogether, or delay it to after the destination has started
> running?
> 
> Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 10:08           ` Michael S. Tsirkin
@ 2014-05-12 10:14             ` Paolo Bonzini
  2014-05-12 10:18               ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 10:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)

On 12/05/2014 12:08, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>> Perhaps we can check for cases where only the address is changing,
>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>> doing any RCU synchronization?
>
> I suspect interrupts can get lost then: e.g. if address didn't match any
> cpus, now it matches some. No?

Can you explain the problem more verbosely? :)

Multiple writers would still be protected by the mutex, so you cannot 
have an "in-place update" writer racing with a "copy the array" writer.

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 10:14             ` Paolo Bonzini
@ 2014-05-12 10:18               ` Michael S. Tsirkin
  2014-05-12 10:25                 ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 10:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)

On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>Perhaps we can check for cases where only the address is changing,
> >>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>doing any RCU synchronization?
> >
> >I suspect interrupts can get lost then: e.g. if address didn't match any
> >cpus, now it matches some. No?
> 
> Can you explain the problem more verbosely? :)
> 
> Multiple writers would still be protected by the mutex, so you
> cannot have an "in-place update" writer racing with a "copy the
> array" writer.
> 
> Paolo

I am not sure, really.
I'm worried about reader vs. writer:
if a reader sees a stale MSI value, the MSI will be sent to a wrong
address.


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 10:18               ` Michael S. Tsirkin
@ 2014-05-12 10:25                 ` Paolo Bonzini
  2014-05-12 11:07                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 10:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)

On 12/05/2014 12:18, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
>> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
>>> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>>>> Perhaps we can check for cases where only the address is changing,
>>>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>>>> doing any RCU synchronization?
>>>
>>> I suspect interrupts can get lost then: e.g. if address didn't match any
>>> cpus, now it matches some. No?
>>
>> Can you explain the problem more verbosely? :)
>>
>> Multiple writers would still be protected by the mutex, so you
>> cannot have an "in-place update" writer racing with a "copy the
>> array" writer.
>
> I am not sure really.
> I'm worried about reader vs writer.
> If reader sees a stale msi value msi will be sent to a wrong
> address.

That shouldn't happen on any cache-coherent system, no?

Or at least, it shouldn't become any worse than what can already happen 
with RCU.

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12  9:57         ` Paolo Bonzini
  2014-05-12 10:08           ` Michael S. Tsirkin
@ 2014-05-12 10:30           ` Michael S. Tsirkin
  2014-05-13  7:03             ` Gonglei (Arei)
  2014-05-13  7:01           ` Gonglei (Arei)
  2 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 10:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)

On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 11:28, Gonglei (Arei) ha scritto:
> >From previous discussion:
> >https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
> >we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Though
> >SRCU is quite better than originally RCU, in our test case this cannot satisfy our needs. Our VMs
> >work in telecom scenario, VMs report CPU and memory usage to balance node each second, and
> >balance node dispatch works to different VMs according to VM load. Since this balance needs
> >high accuracy, IRQ affinity settings in VM also need high accuracy, so we balance IRQ affinity in
> >every 0.5s. So for telecom scenario, KVM_SET_GSI_ROUTING IOCTL needs much optimization.
> >And in live migration case, VHOST_SET_MEM_TABLE needs attention.
> >
> >We tried to change synchronize_rcu() to call_rcu() with rate limit, but rate limit is not easy to
> >configure. Do you have better ideas to achieve this? Thanks.
> 
> Perhaps we can check for cases where only the address is changing,
> and poke at an existing struct kvm_kernel_irq_routing_entry without
> doing any RCU synchronization?
> 
> As long as kvm_set_msi_irq only reads address_lo once, it should work.
> 
> VHOST_SET_MEM_TABLE is a different problem.  What happens in
> userspace that leads to calling that ioctl?  Can we remove it
> altogether, or delay it to after the destination has started
> running?
> 
> Paolo

vhost does everything under a VQ lock.
I think RCU for VHOST_SET_MEM_TABLE can be replaced with 
taking and releasing the VQ lock.

Does the below solve the problem for you
(warning: untested, sorry, busy with other bugs right now)?


Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 78987e4..df2e3eb 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem, *oldmem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
+	int i;
 
 	if (copy_from_user(&mem, m, size))
 		return -EFAULT;
@@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	oldmem = rcu_dereference_protected(d->memory,
 					   lockdep_is_held(&d->mutex));
 	rcu_assign_pointer(d->memory, newmem);
-	synchronize_rcu();
+
+	/* All memory accesses are done under some VQ mutex.
+	 * So below is a faster equivalent of synchronize_rcu()
+	 */
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 	kfree(oldmem);
 	return 0;
 }


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 10:25                 ` Paolo Bonzini
@ 2014-05-12 11:07                   ` Michael S. Tsirkin
  2014-05-12 11:46                     ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 11:07 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)

On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>Perhaps we can check for cases where only the address is changing,
> >>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>doing any RCU synchronization?
> >>>
> >>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>cpus, now it matches some. No?
> >>
> >>Can you explain the problem more verbosely? :)
> >>
> >>Multiple writers would still be protected by the mutex, so you
> >>cannot have an "in-place update" writer racing with a "copy the
> >>array" writer.
> >
> >I am not sure really.
> >I'm worried about reader vs writer.
> >If reader sees a stale msi value msi will be sent to a wrong
> >address.
> 
> That shouldn't happen on any cache-coherent system, no?
> 
> Or at least, it shouldn't become any worse than what can already
> happen with RCU.
> 
> Paolo

Meaning the guest must do some synchronization anyway?
It's an interesting question, I am not sure.
We seem to rely on guest synchronization sometimes in core KVM.

But I am not sure this works correctly in all cases;
synchronization with a guest VCPU does not have to be
the same thing as synchronization with a host CPU.
For example, couldn't guest VCPU1 detect that
VCPU2 is idle and avoid any synchronization?

In any case I'd like to see a patch like this
accompanied by some argument explaining why it's
a safe thing to do.

-- 
MST


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 11:07                   ` Michael S. Tsirkin
@ 2014-05-12 11:46                     ` Paolo Bonzini
  2014-05-12 12:12                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 11:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
	qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
	Herongguang (Stephen)

On 12/05/2014 13:07, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
>> Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
>>> On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
>>>> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
>>>>> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>>>>>> Perhaps we can check for cases where only the address is changing,
>>>>>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>>>>>> doing any RCU synchronization?
>>>>>
>>>>> I suspect interrupts can get lost then: e.g. if address didn't match any
>>>>> cpus, now it matches some. No?
>>>>
>>>> Can you explain the problem more verbosely? :)
>>>>
>>>> Multiple writers would still be protected by the mutex, so you
>>>> cannot have an "in-place update" writer racing with a "copy the
>>>> array" writer.
>>>
>>> I am not sure really.
>>> I'm worried about reader vs writer.
>>> If reader sees a stale msi value msi will be sent to a wrong
>>> address.
>>
>> That shouldn't happen on any cache-coherent system, no?
>>
>> Or at least, it shouldn't become any worse than what can already
>> happen with RCU.
>
> Meaning guest must do some synchronization anyway?

Yes, I think so.  The simplest would be to mask the interrupt around MSI 
configuration changes.  Radim was looking at a similar bug.  He couldn't 
replicate it on bare metal, but I don't see why it shouldn't be possible 
there too.

> But I am not sure this works correctly in all cases,
> synchronization with guest VCPU does not have to be
> the same thing as synchronization with host CPU.
> For example, can not guest VCPU1 detect that
> VCPU2 is idle and avoid any synchronization?

So guest VCPU1 would be the one that sets the MSI word, and VCPU2 would 
be the old destination?  That would also be racy: the moment after you 
"check that VCPU2 is idle", an interrupt could come in and the VCPU 
wouldn't be idle anymore.

> In any case I'd like to see a patch like this
> accompanied by some argument explaining why it's
> a safe thing to do.

Yes, of course.

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 11:46                     ` Paolo Bonzini
@ 2014-05-12 12:12                       ` Michael S. Tsirkin
  2014-05-12 12:46                         ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 12:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
	qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
	Herongguang (Stephen)

On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>>>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>>>Perhaps we can check for cases where only the address is changing,
> >>>>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>>>doing any RCU synchronization?
> >>>>>
> >>>>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>>>cpus, now it matches some. No?
> >>>>
> >>>>Can you explain the problem more verbosely? :)
> >>>>
> >>>>Multiple writers would still be protected by the mutex, so you
> >>>>cannot have an "in-place update" writer racing with a "copy the
> >>>>array" writer.
> >>>
> >>>I am not sure really.
> >>>I'm worried about reader vs writer.
> >>>If reader sees a stale msi value msi will be sent to a wrong
> >>>address.
> >>
> >>That shouldn't happen on any cache-coherent system, no?
> >>
> >>Or at least, it shouldn't become any worse than what can already
> >>happen with RCU.
> >
> >Meaning guest must do some synchronization anyway?
> 
> Yes, I think so.  The simplest would be to mask the interrupt around
> MSI configuration changes.  Radim was looking at a similar bug.

Was this with classic assignment or with vfio?

> He
> couldn't replicate on bare metal, but I don't see why it shouldn't
> be possible there too.

I doubt Linux does something tricky here, so
I'm not talking about something the Linux guest does;
I'm talking about an abstract guarantee of the
APIC. If bare metal provides a synchronization point
and we don't, at some point Linux will rely on it and
this will bite us.

> >But I am not sure this works correctly in all cases,
> >synchronization with guest VCPU does not have to be
> >the same thing as synchronization with host CPU.
> >For example, can not guest VCPU1 detect that
> >VCPU2 is idle and avoid any synchronization?
> 
> So guest VCPU1 would be the one that sets the MSI word, and VCPU2
> would be the old destination?  That would also be racy, the moment
> after you "check that VCPU2 is idle" an interrupt could come in and
> the VCPU wouldn't be idle anymore.

True, but I was talking about something like RCU in the guest, so
that's not a problem: you check after the update.
If it's idle, you know the next interrupt will be the new one.

> >In any case I'd like to see a patch like this
> >accompanied by some argument explaining why it's
> >a safe thing to do.
> 
> Yes, of course.
> 
> Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 12:12                       ` Michael S. Tsirkin
@ 2014-05-12 12:46                         ` Paolo Bonzini
  2014-05-12 12:53                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 12:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
	qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
	Herongguang (Stephen)

On 12/05/2014 14:12, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
>> Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
>>> On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
>>>> Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
>>>>> On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
>>>>>> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
>>>>>>> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>>>>>>>> Perhaps we can check for cases where only the address is changing,
>>>>>>>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>>>>>>>> doing any RCU synchronization?
>>>>>>>
>>>>>>> I suspect interrupts can get lost then: e.g. if address didn't match any
>>>>>>> cpus, now it matches some. No?
>>>>>>
>>>>>> Can you explain the problem more verbosely? :)
>>>>>>
>>>>>> Multiple writers would still be protected by the mutex, so you
>>>>>> cannot have an "in-place update" writer racing with a "copy the
>>>>>> array" writer.
>>>>>
>>>>> I am not sure really.
>>>>> I'm worried about reader vs writer.
>>>>> If reader sees a stale msi value msi will be sent to a wrong
>>>>> address.
>>>>
>>>> That shouldn't happen on any cache-coherent system, no?
>>>>
>>>> Or at least, it shouldn't become any worse than what can already
>>>> happen with RCU.
>>>
>>> Meaning guest must do some synchronization anyway?
>>
>> Yes, I think so.  The simplest would be to mask the interrupt around
>> MSI configuration changes.  Radim was looking at a similar bug.
>
> Was this with classic assignment or with vfio?

No, it was with QEMU emulated devices (AHCI).  Not this path, but it did 
involve MSI configuration changes while the interrupt was unmasked in 
the device.

>> He couldn't replicate on bare metal,
>> but I don't see why it shouldn't
>> be possible there too.
>
> I doubt linux does something tricky here, so
> I'm not talking about something linux guest does,
> I'm talking about an abstract guarantee of the
> APIC.

APIC or PCI?  The chipset is involved here, not the APICs (the APICs 
just see interrupt transactions).

> If baremetal causes a synchronisation point
> and we don't, at some point linux will use it and
> this will bite us.

Yes, and I agree.

It's complicated; on one hand I'm not sure we can entirely fix this, on 
the other hand it may be that the guest cannot have too high expectations.

The invalid scenario is an interrupt happening before the MSI 
reconfiguration and using the new value.  It's invalid because MSI 
reconfiguration is a non-posted write and must push previously posted 
writes.  But we cannot do that at all!  QEMU has the big lock that 
implicitly orders transactions, but this does not apply to 
high-performance users of irqfd (vhost, VFIO, even QEMU's own virtio-blk 
dataplane).  Even if we removed all RCU uses for the routing table, we 
don't have a "bottleneck" that orders transactions and removes this 
invalid scenario.

At the same time, it's obviously okay if an interrupt happens after the 
MSI reconfiguration and uses the old value.  That's a posted write 
passing an earlier non-posted write, which is allowed.  This is the main 
reason why I believe that masking/unmasking is required on bare metal. 
In addition, if the masking/unmasking is done with a posted write (MMIO 
BAR), the guest needs to do a read for synchronization before it unmasks 
the interrupt.
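
As pseudocode, the guest-side sequence this implies for one MSI-X vector would look 
roughly like the following (illustrative only; "entry" is an __iomem pointer into the 
device's MSI-X table, constant names as in the Linux MSI-X code):

	u32 ctrl = readl(entry + PCI_MSIX_ENTRY_VECTOR_CTRL);

	writel(ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT,
	       entry + PCI_MSIX_ENTRY_VECTOR_CTRL);		/* 1. mask the vector */
	writel(new_addr_lo, entry + PCI_MSIX_ENTRY_LOWER_ADDR);
	writel(new_addr_hi, entry + PCI_MSIX_ENTRY_UPPER_ADDR);
	writel(new_data,    entry + PCI_MSIX_ENTRY_DATA);	/* 2. reprogram (posted writes) */
	readl(entry + PCI_MSIX_ENTRY_DATA);			/* 3. read back to flush them */
	writel(ctrl & ~PCI_MSIX_ENTRY_CTRL_MASKBIT,
	       entry + PCI_MSIX_ENTRY_VECTOR_CTRL);		/* 4. only now unmask */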

In any case, whether writes synchronize with RCU or bypass it doesn't 
change the picture.  In either case, writes are ordered against each 
other but not against reads.  RCU does nothing except preventing 
dangling pointer accesses.

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 12:46                         ` Paolo Bonzini
@ 2014-05-12 12:53                           ` Michael S. Tsirkin
  2014-05-12 13:02                             ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 12:53 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
	qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
	Herongguang (Stephen)

On Mon, May 12, 2014 at 02:46:46PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 14:12, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> >>>>Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >>>>>On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>>>>>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>>>>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>>>>>Perhaps we can check for cases where only the address is changing,
> >>>>>>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>>>>>doing any RCU synchronization?
> >>>>>>>
> >>>>>>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>>>>>cpus, now it matches some. No?
> >>>>>>
> >>>>>>Can you explain the problem more verbosely? :)
> >>>>>>
> >>>>>>Multiple writers would still be protected by the mutex, so you
> >>>>>>cannot have an "in-place update" writer racing with a "copy the
> >>>>>>array" writer.
> >>>>>
> >>>>>I am not sure really.
> >>>>>I'm worried about reader vs writer.
> >>>>>If reader sees a stale msi value msi will be sent to a wrong
> >>>>>address.
> >>>>
> >>>>That shouldn't happen on any cache-coherent system, no?
> >>>>
> >>>>Or at least, it shouldn't become any worse than what can already
> >>>>happen with RCU.
> >>>
> >>>Meaning guest must do some synchronization anyway?
> >>
> >>Yes, I think so.  The simplest would be to mask the interrupt around
> >>MSI configuration changes.  Radim was looking at a similar bug.
> >
> >Was this with classic assignment or with vfio?
> 
> No, it was with QEMU emulated devices (AHCI).  Not this path, but it
> did involve MSI configuration changes while the interrupt was
> unmasked in the device.
> 
> >>He couldn't replicate on bare metal,
> >>but I don't see why it shouldn't
> >>be possible there too.
> >
> >I doubt linux does something tricky here, so
> >I'm not talking about something linux guest does,
> >I'm talking about an abstract guarantee of the
> >APIC.
> 
> APIC or PCI?  The chipset is involved here, not the APICs (the APICs
> just see interrupt transactions).
> 
> >If baremetal causes a synchronisation point
> >and we don't, at some point linux will use it and
> >this will bite us.
> 
> Yes, and I agree.
> 
> It's complicated; on one hand I'm not sure we can entirely fix this,
> on the other hand it may be that the guest cannot have too high
> expectations.
> 
> The invalid scenario is an interrupt happening before the MSI
> reconfiguration and using the new value.  It's invalid because MSI
> reconfiguration is a non-posted write and must push previously
> posted writes.  But we cannot do that at all!  QEMU has the big lock
> that implicitly orders transactions, but this does not apply to
> high-performance users of irqfd (vhost, VFIO, even QEMU's own
> virtio-blk dataplane).  Even if we removed all RCU uses for the
> routing table, we don't have a "bottleneck" that orders transactions
> and removes this invalid scenario.
> 
> At the same time, it's obviously okay if an interrupt happens after
> the MSI reconfiguration and uses the old value.  That's a posted
> write passing an earlier non-posted write, which is allowed.  This
> is the main reason why I believe that masking/unmasking is required
> on bare metal. In addition, if the masking/unmasking is done with a
> posted write (MMIO BAR), the guest needs to do a read for
> synchronization before it unmasks the interrupt.
> 
> In any case, whether writes synchronize with RCU or bypass it
> doesn't change the picture.  In either case, writes are ordered
> against each other but not against reads.  RCU does nothing except
> preventing dangling pointer accesses.
> 
> Paolo

This is the only part I don't get.
RCU will make sure no VCPUs are running, won't it?
So it's a kind of full barrier.

-- 
MST


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 12:53                           ` Michael S. Tsirkin
@ 2014-05-12 13:02                             ` Paolo Bonzini
  0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 13:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
	qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
	Herongguang (Stephen)

On 12/05/2014 14:53, Michael S. Tsirkin wrote:
>> > In any case, whether writes synchronize with RCU or bypass it
>> > doesn't change the picture.  In either case, writes are ordered
>> > against each other but not against reads.  RCU does nothing except
>> > preventing dangling pointer accesses.
>> >
>> > Paolo
> This is the only part I don't get.
> RCU will make sure no VCPUs are running, won't it?
> So it's a kind of full barrier.

The actual point where the new value becomes visible is where the write 
happens, not where you do synchronize_rcu.  This is the same whether you make a 
full copy of the routing table or overwrite the entry in place.

However, the delay in servicing an older irqfd write can be arbitrary if 
the scheduler decides not to run the irqfd_inject thread.  Such an older 
write may well read a newer routing entry.  And that's the 
invalid scenario according to the PCI spec.

Paolo


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12  9:57         ` Paolo Bonzini
  2014-05-12 10:08           ` Michael S. Tsirkin
  2014-05-12 10:30           ` Michael S. Tsirkin
@ 2014-05-13  7:01           ` Gonglei (Arei)
  2 siblings, 0 replies; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-13  7:01 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel@nongnu.org
  Cc: gleb@redhat.com, Huangweidong (C), Herongguang (Stephen),
	avi.kivity@gmail.com, mst@redhat.com

Hi,

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, May 12, 2014 5:58 PM

> Perhaps we can check for cases where only the address is changing, and
> poke at an existing struct kvm_kernel_irq_routing_entry without doing
> any RCU synchronization?
> 
> As long as kvm_set_msi_irq only reads address_lo once, it should work.
> 
> VHOST_SET_MEM_TABLE is a different problem.  What happens in userspace
> that leads to calling that ioctl?  Can we remove it altogether, or delay
> it to after the destination has started running?
> 
I thought this approach is a little different from the original, as eliminating synchronize_rcu() 
means that when the KVM_SET_GSI_ROUTING ioctl returns, CPUs that are using the old IRQ routing table 
may still be running, and thus after a vCPU in the VM has finished setting CPU affinity, other vCPUs 
in the VM may still receive stale IRQs.

But even in the original code, 
kvm_set_msi()->kvm_irq_delivery_to_apic()->...->__apic_accept_irq() may only set the vCPU's 
LAPIC and call kvm_vcpu_kick(), which means the destination vCPUs may get scheduled to run after 
the vCPU setting IRQ affinity has returned, thus still receiving old stale IRQs.

So thanks for the suggestion, I'll try it.

And the VHOST_SET_MEM_TABLE ioctl comes from vhost_dev_start() in QEMU when the migration 
destination starts its devices.

Best regards,
-Gonglei


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-12 10:30           ` Michael S. Tsirkin
@ 2014-05-13  7:03             ` Gonglei (Arei)
  2014-05-13  8:21               ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-13  7:03 UTC (permalink / raw)
  To: Michael S. Tsirkin, Paolo Bonzini
  Cc: gleb@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
	avi.kivity@gmail.com, Herongguang (Stephen)

Hi,

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Monday, May 12, 2014 6:31 PM
> 
> vhost does everything under a VQ lock.
> I think RCU for VHOST_SET_MEM_TABLE can be replaced with
> taking and freeing the VQ lock.
> 
> Does the below solve the problem for you
> (warning: untested, sorry, busy with other bugs right now)?
> 
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 78987e4..df2e3eb 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d,
> struct vhost_memory __user *m)
>  {
>  	struct vhost_memory mem, *newmem, *oldmem;
>  	unsigned long size = offsetof(struct vhost_memory, regions);
> +	int i;
> 
>  	if (copy_from_user(&mem, m, size))
>  		return -EFAULT;
> @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d,
> struct vhost_memory __user *m)
>  	oldmem = rcu_dereference_protected(d->memory,
>  					   lockdep_is_held(&d->mutex));
>  	rcu_assign_pointer(d->memory, newmem);
> -	synchronize_rcu();
> +
> +	/* All memory accesses are done under some VQ mutex.
> +	 * So below is a faster equivalent of synchronize_rcu()
> +	 */
> +	for (i = 0; i < d->nvqs; ++i) {
> +		mutex_lock(&d->vqs[i]->mutex);
> +		mutex_unlock(&d->vqs[i]->mutex);
> +	}
>  	kfree(oldmem);
>  	return 0;
>  }

Thanks for your advice. I suppose acquiring the mutexes should generally be faster than waiting for 
CPU context switches. And I think d->mutex should also be synchronized, since some code paths take 
only this mutex directly and not the VQ mutexes. Is this right?

I'll try this approach, thanks.

Best regards,
-Gonglei


* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
  2014-05-13  7:03             ` Gonglei (Arei)
@ 2014-05-13  8:21               ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-13  8:21 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
	avi.kivity@gmail.com, Paolo Bonzini, Herongguang (Stephen)

On Tue, May 13, 2014 at 07:03:20AM +0000, Gonglei (Arei) wrote:
> Hi,
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Monday, May 12, 2014 6:31 PM
> > 
> > vhost does everything under a VQ lock.
> > I think RCU for VHOST_SET_MEM_TABLE can be replaced with
> > taking and freeing the VQ lock.
> > 
> > Does the below solve the problem for you
> > (warning: untested, sorry, busy with other bugs right now)?
> > 
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ---
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 78987e4..df2e3eb 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d,
> > struct vhost_memory __user *m)
> >  {
> >  	struct vhost_memory mem, *newmem, *oldmem;
> >  	unsigned long size = offsetof(struct vhost_memory, regions);
> > +	int i;
> > 
> >  	if (copy_from_user(&mem, m, size))
> >  		return -EFAULT;
> > @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d,
> > struct vhost_memory __user *m)
> >  	oldmem = rcu_dereference_protected(d->memory,
> >  					   lockdep_is_held(&d->mutex));
> >  	rcu_assign_pointer(d->memory, newmem);
> > -	synchronize_rcu();
> > +
> > +	/* All memory accesses are done under some VQ mutex.
> > +	 * So below is a faster equivalent of synchronize_rcu()
> > +	 */
> > +	for (i = 0; i < d->nvqs; ++i) {
> > +		mutex_lock(&d->vqs[i]->mutex);
> > +		mutex_unlock(&d->vqs[i]->mutex);
> > +	}
> >  	kfree(oldmem);
> >  	return 0;
> >  }
> 
> Thanks for your advice, I suppose getting mutexes should generally be faster than waiting for 
> CPU context switches. And I think d->mutex should also be synchronized since somewhere gets 
> only this mutex directly and not vq mutexes. Is this right?

No, because all memory table accesses are under some VQ mutex.

> I'll try this approach, thanks.
> 
> Best regards,
> -Gonglei

