* [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
@ 2014-05-09 1:57 Gonglei (Arei)
2014-05-09 8:14 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-09 1:57 UTC (permalink / raw)
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, Huangweidong (C), Herongguang (Stephen),
mst@redhat.com
Hi,
Vhost devices need to issue the VHOST_SET_MEM_TABLE ioctl in vhost_dev_start()
to tell the vhost kernel module the GPA-to-HVA memory mappings, and this ioctl is expensive.
The reason is the same as for the KVM_SET_GSI_ROUTING ioctl: while processing the ioctl,
both kvm and vhost call synchronize_rcu() and wait for a grace period before freeing the old memory.
In the KVM_SET_GSI_ROUTING case we cannot simply change synchronize_rcu to call_rcu,
since that may lead to a DoS attack (unbounded deferred frees) if the guest VM keeps setting IRQ affinity.
In the VHOST_SET_MEM_TABLE case, I wonder whether we can change synchronize_rcu() to call_rcu(),
i.e., could a guest trigger a similar DoS attack? QEMU issues the VHOST_SET_MEM_TABLE ioctl in a
few cases: VM start, reboot, attaching vhost devices, and whenever RAM memory regions in the
system memory address space change. A rough sketch of the call_rcu() variant is included below.
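To make the question concrete, here is a rough, untested sketch of the call_rcu() variant I have
in mind (it assumes struct vhost_memory gains an embedded rcu_head; the callback name is made up):

/* Hypothetical sketch only: struct vhost_memory does not currently embed
 * an rcu_head, so this assumes one is added for this purpose.
 */
static void vhost_memory_free_rcu(struct rcu_head *head)
{
	struct vhost_memory *mem = container_of(head, struct vhost_memory, rcu);

	kfree(mem);
}

	/* In vhost_set_memory(), instead of publishing the new table and then
	 * blocking in synchronize_rcu() before kfree(oldmem), defer the free:
	 */
	rcu_assign_pointer(d->memory, newmem);
	call_rcu(&oldmem->rcu, vhost_memory_free_rcu);
	return 0;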
I'd also like to know whether guest activity can cause RAM memory regions to change.
Can you give me some advice? Thanks!
Best regards,
-Gonglei
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-09 1:57 [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module? Gonglei (Arei)
@ 2014-05-09 8:14 ` Paolo Bonzini
2014-05-09 9:04 ` Gonglei (Arei)
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-09 8:14 UTC (permalink / raw)
To: Gonglei (Arei), qemu-devel@nongnu.org
Cc: Huangweidong (C), Herongguang (Stephen), mst@redhat.com
On 09/05/2014 03:57, Gonglei (Arei) wrote:
> Hi,
>
> Vhost devices need to do VHOST_SET_MEM_TABLE ioctl in vhost_dev_start()
> to tell vhost kernel modules GPA to HVA memory mappings, which consume is expensively.
> The reason is same as KVM_SET_GSI_ROUTING ioctl. That is, in ioctl processing,
> kmod and vhost calls synchronize_rcu() to wait for grace period to free old memory.
>
> In KVM_SET_GSI_ROUTING case, we cannot simply change synchronize_rcu to call_rcu,
> since this may leads to DOS attacks if guest VM keeps setting IRQ affinity.
>
> In VHOST_SET_MEM_TABLE case, I wonder if we can change synchronize_rcu() to call_rcu(),
> i.e., is it possible to trigger DOS attack in guest? There are some cases QEMU would do
> VHOST_SET_MEM_TABLE ioctl, like VM start/reboot/attach vhost devices, and RAM memory
> regions in system memory address space change.
>
> And I'd like to know if guest activities could lead to RAM memory regions change?
Yes, for example enabling/disabling PCI BARs would have that effect.
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-09 8:14 ` Paolo Bonzini
@ 2014-05-09 9:04 ` Gonglei (Arei)
2014-05-09 9:53 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-09 9:04 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel@nongnu.org
Cc: Huangweidong (C), Herongguang (Stephen), mst@redhat.com
Hi,
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, May 09, 2014 4:15 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: mst@redhat.com; Herongguang (Stephen); Huangweidong (C)
> Subject: Re: [RFC] vhost: Can we change synchronize_rcu to call_rcu in
> vhost_set_memory() in vhost kernel module?
>
> Il 09/05/2014 03:57, Gonglei (Arei) ha scritto:
> > Hi,
> >
> > Vhost devices need to do VHOST_SET_MEM_TABLE ioctl in vhost_dev_start()
> > to tell vhost kernel modules GPA to HVA memory mappings, which consume is
> expensively.
> > The reason is same as KVM_SET_GSI_ROUTING ioctl. That is, in ioctl
> processing,
> > kmod and vhost calls synchronize_rcu() to wait for grace period to free old
> memory.
> >
> > In KVM_SET_GSI_ROUTING case, we cannot simply change synchronize_rcu
> to call_rcu,
> > since this may leads to DOS attacks if guest VM keeps setting IRQ affinity.
> >
> > In VHOST_SET_MEM_TABLE case, I wonder if we can change
> synchronize_rcu() to call_rcu(),
> > i.e., is it possible to trigger DOS attack in guest? There are some cases QEMU
> would do
> > VHOST_SET_MEM_TABLE ioctl, like VM start/reboot/attach vhost devices,
> and RAM memory
> > regions in system memory address space change.
> >
> > And I'd like to know if guest activities could lead to RAM memory regions
> change?
>
> Yes, for example enabling/disabling PCI BARs would have that effect.
>
Yes, but PCI BARs are mapped in the PCI hole and do not overlap with RAM memory regions, so
disabling or enabling PCI BARs should not change the RAM MRs' mappings.
And since vhost_region_add/vhost_region_del check whether the changed MemoryRegionSection is RAM
(roughly the check sketched below), vhost's memory mappings should not be affected. Is this correct?
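For reference, the check I mean is roughly the following (a sketch paraphrased from QEMU's
hw/virtio/vhost.c of this era; the exact conditions may differ between QEMU versions):

/* Only RAM-backed sections are added to the table sent via VHOST_SET_MEM_TABLE. */
static bool vhost_section(MemoryRegionSection *section)
{
    return memory_region_is_ram(section->mr);
}

static void vhost_region_add(MemoryListener *listener,
                             MemoryRegionSection *section)
{
    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                         memory_listener);

    if (!vhost_section(section)) {
        return;     /* non-RAM sections never reach the vhost memory table */
    }
    /* ... record the section and mark the memory map as changed ... */
}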
Best regards,
-Gonglei
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-09 9:04 ` Gonglei (Arei)
@ 2014-05-09 9:53 ` Paolo Bonzini
2014-05-12 9:28 ` Gonglei (Arei)
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-09 9:53 UTC (permalink / raw)
To: Gonglei (Arei), qemu-devel@nongnu.org
Cc: Huangweidong (C), Herongguang (Stephen), mst@redhat.com
On 09/05/2014 11:04, Gonglei (Arei) wrote:
>> > Yes, for example enabling/disabling PCI BARs would have that effect.
>> >
> Yes, but PCI BARs are mapped in PCI hole, and they are not overlapped with ram
> memory regions, so disable or enable PCI BARs would not change ram MRs' mapping.
PCI BARs can be RAM (one special case being the ROM BAR).
Paolo
> Since vhost_region_add/vhost_region_del check if changed MemoryRegionSection is ram,
> so vhost memory mappings will not be influenced, is this correct?
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-09 9:53 ` Paolo Bonzini
@ 2014-05-12 9:28 ` Gonglei (Arei)
2014-05-12 9:57 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-12 9:28 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel@nongnu.org
Cc: gleb@redhat.com, Huangweidong (C), Herongguang (Stephen),
avi.kivity@gmail.com, mst@redhat.com
Hi,
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Friday, May 09, 2014 5:54 PM
> To: Gonglei (Arei); qemu-devel@nongnu.org
> Cc: mst@redhat.com; Herongguang (Stephen); Huangweidong (C)
> Subject: Re: [RFC] vhost: Can we change synchronize_rcu to call_rcu in
> vhost_set_memory() in vhost kernel module?
>
> Il 09/05/2014 11:04, Gonglei (Arei) ha scritto:
> >> > Yes, for example enabling/disabling PCI BARs would have that effect.
> >> >
> > Yes, but PCI BARs are mapped in PCI hole, and they are not overlapped with
> ram
> > memory regions, so disable or enable PCI BARs would not change ram MRs'
> mapping.
>
> PCI BARs can be RAM (one special case being the ROM BAR).
>
> Paolo
>
Many thanks for your explanation.
We have now found that migration downtime is mostly spent in the KVM_SET_GSI_ROUTING and
VHOST_SET_MEM_TABLE ioctls, and internally both are stuck in synchronize_rcu(). Besides live
migration, setting MSI IRQ CPU affinity in the VM also triggers the KVM_SET_GSI_ROUTING ioctl
in QEMU.
From previous discussion:
https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Although SRCU is
much better than the original RCU, in our test case it still cannot satisfy our needs. Our VMs
run in a telecom scenario: each VM reports CPU and memory usage to a balance node every second,
and the balance node dispatches work to the VMs according to their load. Since this balancing
needs high accuracy, the IRQ affinity settings in the VM also need high accuracy, so we rebalance
IRQ affinity every 0.5s. For this telecom scenario the KVM_SET_GSI_ROUTING ioctl therefore needs
significant optimization, and in the live migration case VHOST_SET_MEM_TABLE needs attention as well.
We tried changing synchronize_rcu() to call_rcu() with a rate limit (roughly as sketched below),
but the rate limit is not easy to tune. Do you have better ideas for achieving this? Thanks.
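For reference, what we tried was roughly the following (an illustrative sketch only: the counter,
the limit and the embedded rcu_head are all made up, they do not exist in the kvm code today):

/* Illustrative only: assumes struct kvm_irq_routing_table gains an rcu_head,
 * and uses a made-up module-global counter and threshold.
 */
static atomic_t routing_pending_frees = ATOMIC_INIT(0);

static void free_irq_routing_table_rcu(struct rcu_head *head)
{
	struct kvm_irq_routing_table *rt =
		container_of(head, struct kvm_irq_routing_table, rcu);

	atomic_dec(&routing_pending_frees);
	kfree(rt);
}

	/* In kvm_set_irq_routing(), after rcu_assign_pointer(kvm->irq_routing, new): */
	if (atomic_inc_return(&routing_pending_frees) <= ROUTING_FREE_LIMIT) {
		call_rcu(&old->rcu, free_irq_routing_table_rcu);
	} else {
		/* Too many frees outstanding: fall back to the slow path so a guest
		 * hammering KVM_SET_GSI_ROUTING cannot exhaust kernel memory.
		 */
		atomic_dec(&routing_pending_frees);
		synchronize_rcu();
		kfree(old);
	}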
Cc'ing Avi & Gleb for more insight.
Best regards,
-Gonglei
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 9:28 ` Gonglei (Arei)
@ 2014-05-12 9:57 ` Paolo Bonzini
2014-05-12 10:08 ` Michael S. Tsirkin
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 9:57 UTC (permalink / raw)
To: Gonglei (Arei), qemu-devel@nongnu.org
Cc: gleb@redhat.com, Huangweidong (C), Herongguang (Stephen),
avi.kivity@gmail.com, mst@redhat.com
On 12/05/2014 11:28, Gonglei (Arei) wrote:
> From previous discussion:
> https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
> we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Though
> SRCU is quite better than originally RCU, in our test case this cannot satisfy our needs. Our VMs
> work in telecom scenario, VMs report CPU and memory usage to balance node each second, and
> balance node dispatch works to different VMs according to VM load. Since this balance needs
> high accuracy, IRQ affinity settings in VM also need high accuracy, so we balance IRQ affinity in
> every 0.5s. So for telecom scenario, KVM_SET_GSI_ROUTING IOCTL needs much optimization.
> And in live migration case, VHOST_SET_MEM_TABLE needs attention.
>
> We tried to change synchronize_rcu() to call_rcu() with rate limit, but rate limit is not easy to
> configure. Do you have better ideas to achieve this? Thanks.
Perhaps we can check for cases where only the address is changing, and
poke at an existing struct kvm_kernel_irq_routing_entry without doing
any RCU synchronization?
As long as kvm_set_msi_irq only reads address_lo once, it should work.
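Very roughly, something like this (an untested sketch; the helper is made up, field names follow
struct kvm_kernel_irq_routing_entry, and the matching reader would need to load each field only once):

/* Untested sketch of the "update in place" idea.  Writers stay serialized
 * by kvm->irq_lock; readers may observe either the old or the new value,
 * but never a freed entry.
 */
static void kvm_msi_route_update_in_place(struct kvm_kernel_irq_routing_entry *e,
					  const struct kvm_irq_routing_msi *msi)
{
	ACCESS_ONCE(e->msi.address_lo) = msi->address_lo;
	ACCESS_ONCE(e->msi.address_hi) = msi->address_hi;
	ACCESS_ONCE(e->msi.data)       = msi->data;
}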
VHOST_SET_MEM_TABLE is a different problem. What happens in userspace
that leads to calling that ioctl? Can we remove it altogether, or delay
it to after the destination has started running?
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 9:57 ` Paolo Bonzini
@ 2014-05-12 10:08 ` Michael S. Tsirkin
2014-05-12 10:14 ` Paolo Bonzini
2014-05-12 10:30 ` Michael S. Tsirkin
2014-05-13 7:01 ` Gonglei (Arei)
2 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 10:08 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)
On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 11:28, Gonglei (Arei) ha scritto:
> >From previous discussion:
> >https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
> >we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Though
> >SRCU is quite better than originally RCU, in our test case this cannot satisfy our needs. Our VMs
> >work in telecom scenario, VMs report CPU and memory usage to balance node each second, and
> >balance node dispatch works to different VMs according to VM load. Since this balance needs
> >high accuracy, IRQ affinity settings in VM also need high accuracy, so we balance IRQ affinity in
> >every 0.5s. So for telecom scenario, KVM_SET_GSI_ROUTING IOCTL needs much optimization.
> >And in live migration case, VHOST_SET_MEM_TABLE needs attention.
> >
> >We tried to change synchronize_rcu() to call_rcu() with rate limit, but rate limit is not easy to
> >configure. Do you have better ideas to achieve this? Thanks.
>
> Perhaps we can check for cases where only the address is changing,
> and poke at an existing struct kvm_kernel_irq_routing_entry without
> doing any RCU synchronization?
I suspect interrupts can get lost then: e.g. if the address didn't match any
CPUs before, now it matches some. No?
> As long as kvm_set_msi_irq only reads address_lo once, it should work.
>
> VHOST_SET_MEM_TABLE is a different problem. What happens in
> userspace that leads to calling that ioctl? Can we remove it
> altogether, or delay it to after the destination has started
> running?
>
> Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 10:08 ` Michael S. Tsirkin
@ 2014-05-12 10:14 ` Paolo Bonzini
2014-05-12 10:18 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 10:14 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)
On 12/05/2014 12:08, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>> Perhaps we can check for cases where only the address is changing,
>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>> doing any RCU synchronization?
>
> I suspect interrupts can get lost then: e.g. if address didn't match any
> cpus, now it matches some. No?
Can you explain the problem more verbosely? :)
Multiple writers would still be protected by the mutex, so you cannot
have an "in-place update" writer racing with a "copy the array" writer.
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 10:14 ` Paolo Bonzini
@ 2014-05-12 10:18 ` Michael S. Tsirkin
2014-05-12 10:25 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 10:18 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)
On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>Perhaps we can check for cases where only the address is changing,
> >>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>doing any RCU synchronization?
> >
> >I suspect interrupts can get lost then: e.g. if address didn't match any
> >cpus, now it matches some. No?
>
> Can you explain the problem more verbosely? :)
>
> Multiple writers would still be protected by the mutex, so you
> cannot have an "in-place update" writer racing with a "copy the
> array" writer.
>
> Paolo
I am not really sure.
I'm worried about reader vs. writer:
if a reader sees a stale MSI value, the MSI will be sent to the wrong
address.
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 10:18 ` Michael S. Tsirkin
@ 2014-05-12 10:25 ` Paolo Bonzini
2014-05-12 11:07 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 10:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)
On 12/05/2014 12:18, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
>> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
>>> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>>>> Perhaps we can check for cases where only the address is changing,
>>>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>>>> doing any RCU synchronization?
>>>
>>> I suspect interrupts can get lost then: e.g. if address didn't match any
>>> cpus, now it matches some. No?
>>
>> Can you explain the problem more verbosely? :)
>>
>> Multiple writers would still be protected by the mutex, so you
>> cannot have an "in-place update" writer racing with a "copy the
>> array" writer.
>
> I am not sure really.
> I'm worried about reader vs writer.
> If reader sees a stale msi value msi will be sent to a wrong
> address.
That shouldn't happen on any cache-coherent system, no?
Or at least, it shouldn't become any worse than what can already happen
with RCU.
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 9:57 ` Paolo Bonzini
2014-05-12 10:08 ` Michael S. Tsirkin
@ 2014-05-12 10:30 ` Michael S. Tsirkin
2014-05-13 7:03 ` Gonglei (Arei)
2014-05-13 7:01 ` Gonglei (Arei)
2 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 10:30 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)
On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 11:28, Gonglei (Arei) ha scritto:
> >From previous discussion:
> >https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg04925.html
> >we know that you are going to replace RCU in KVM_SET_GSI_ROUTING with SRCU. Though
> >SRCU is quite better than originally RCU, in our test case this cannot satisfy our needs. Our VMs
> >work in telecom scenario, VMs report CPU and memory usage to balance node each second, and
> >balance node dispatch works to different VMs according to VM load. Since this balance needs
> >high accuracy, IRQ affinity settings in VM also need high accuracy, so we balance IRQ affinity in
> >every 0.5s. So for telecom scenario, KVM_SET_GSI_ROUTING IOCTL needs much optimization.
> >And in live migration case, VHOST_SET_MEM_TABLE needs attention.
> >
> >We tried to change synchronize_rcu() to call_rcu() with rate limit, but rate limit is not easy to
> >configure. Do you have better ideas to achieve this? Thanks.
>
> Perhaps we can check for cases where only the address is changing,
> and poke at an existing struct kvm_kernel_irq_routing_entry without
> doing any RCU synchronization?
>
> As long as kvm_set_msi_irq only reads address_lo once, it should work.
>
> VHOST_SET_MEM_TABLE is a different problem. What happens in
> userspace that leads to calling that ioctl? Can we remove it
> altogether, or delay it to after the destination has started
> running?
>
> Paolo
vhost does everything under a VQ lock.
I think the RCU in VHOST_SET_MEM_TABLE can be replaced with
taking and releasing each VQ lock.
Does the below solve the problem for you
(warning: untested, sorry, busy with other bugs right now)?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 78987e4..df2e3eb 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem, *oldmem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
+	int i;
 
 	if (copy_from_user(&mem, m, size))
 		return -EFAULT;
@@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	oldmem = rcu_dereference_protected(d->memory,
 					   lockdep_is_held(&d->mutex));
 	rcu_assign_pointer(d->memory, newmem);
-	synchronize_rcu();
+
+	/* All memory accesses are done under some VQ mutex.
+	 * So below is a faster equivalent of synchronize_rcu()
+	 */
+	for (i = 0; i < d->nvqs; ++i) {
+		mutex_lock(&d->vqs[i]->mutex);
+		mutex_unlock(&d->vqs[i]->mutex);
+	}
 	kfree(oldmem);
 	return 0;
 }
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 10:25 ` Paolo Bonzini
@ 2014-05-12 11:07 ` Michael S. Tsirkin
2014-05-12 11:46 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 11:07 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
Gonglei (Arei), avi.kivity@gmail.com, Herongguang (Stephen)
On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>Perhaps we can check for cases where only the address is changing,
> >>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>doing any RCU synchronization?
> >>>
> >>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>cpus, now it matches some. No?
> >>
> >>Can you explain the problem more verbosely? :)
> >>
> >>Multiple writers would still be protected by the mutex, so you
> >>cannot have an "in-place update" writer racing with a "copy the
> >>array" writer.
> >
> >I am not sure really.
> >I'm worried about reader vs writer.
> >If reader sees a stale msi value msi will be sent to a wrong
> >address.
>
> That shouldn't happen on any cache-coherent system, no?
>
> Or at least, it shouldn't become any worse than what can already
> happen with RCU.
>
> Paolo
Meaning guest must do some synchronization anyway?
It's an interesting question, I am not sure.
We seem to rely on guest synchronization sometimes in core KVM.
But I am not sure this works correctly in all cases:
synchronization with a guest VCPU does not have to be
the same thing as synchronization with a host CPU.
For example, couldn't guest VCPU1 detect that
VCPU2 is idle and avoid any synchronization?
In any case I'd like to see a patch like this
accompanied by some argument explaining why it's
a safe thing to do.
--
MST
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 11:07 ` Michael S. Tsirkin
@ 2014-05-12 11:46 ` Paolo Bonzini
2014-05-12 12:12 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 11:46 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
Herongguang (Stephen)
On 12/05/2014 13:07, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
>> Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
>>> On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
>>>> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
>>>>> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>>>>>> Perhaps we can check for cases where only the address is changing,
>>>>>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>>>>>> doing any RCU synchronization?
>>>>>
>>>>> I suspect interrupts can get lost then: e.g. if address didn't match any
>>>>> cpus, now it matches some. No?
>>>>
>>>> Can you explain the problem more verbosely? :)
>>>>
>>>> Multiple writers would still be protected by the mutex, so you
>>>> cannot have an "in-place update" writer racing with a "copy the
>>>> array" writer.
>>>
>>> I am not sure really.
>>> I'm worried about reader vs writer.
>>> If reader sees a stale msi value msi will be sent to a wrong
>>> address.
>>
>> That shouldn't happen on any cache-coherent system, no?
>>
>> Or at least, it shouldn't become any worse than what can already
>> happen with RCU.
>
> Meaning guest must do some synchronization anyway?
Yes, I think so. The simplest would be to mask the interrupt around MSI
configuration changes. Radim was looking at a similar bug. He couldn't
reproduce it on bare metal, but I don't see why it shouldn't be possible
there too.
> But I am not sure this works correctly in all cases,
> synchronization with guest VCPU does not have to be
> the same thing as synchronization with host CPU.
> For example, can not guest VCPU1 detect that
> VCPU2 is idle and avoid any synchronization?
So guest VCPU1 would be the one that sets the MSI word, and VCPU2 would
be the old destination? That would also be racy, the moment after you
"check that VCPU2 is idle" an interrupt could come in and the VCPU
wouldn't be idle anymore.
> In any case I'd like to see a patch like this
> accompanied by some argument explaining why it's
> a safe thing to do.
Yes, of course.
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 11:46 ` Paolo Bonzini
@ 2014-05-12 12:12 ` Michael S. Tsirkin
2014-05-12 12:46 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 12:12 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
Herongguang (Stephen)
On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>>>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>>>Perhaps we can check for cases where only the address is changing,
> >>>>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>>>doing any RCU synchronization?
> >>>>>
> >>>>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>>>cpus, now it matches some. No?
> >>>>
> >>>>Can you explain the problem more verbosely? :)
> >>>>
> >>>>Multiple writers would still be protected by the mutex, so you
> >>>>cannot have an "in-place update" writer racing with a "copy the
> >>>>array" writer.
> >>>
> >>>I am not sure really.
> >>>I'm worried about reader vs writer.
> >>>If reader sees a stale msi value msi will be sent to a wrong
> >>>address.
> >>
> >>That shouldn't happen on any cache-coherent system, no?
> >>
> >>Or at least, it shouldn't become any worse than what can already
> >>happen with RCU.
> >
> >Meaning guest must do some synchronization anyway?
>
> Yes, I think so. The simplest would be to mask the interrupt around
> MSI configuration changes. Radim was looking at a similar bug.
Was this with classic assignment or with vfio?
> He
> couldn't replicate on bare metal, but I don't see why it shouldn't
> be possible there too.
I doubt Linux does anything tricky here, so
I'm not talking about something the Linux guest does;
I'm talking about an abstract guarantee of the APIC.
If bare metal provides a synchronisation point and we don't,
at some point Linux will rely on it and this will bite us.
> >But I am not sure this works correctly in all cases,
> >synchronization with guest VCPU does not have to be
> >the same thing as synchronization with host CPU.
> >For example, can not guest VCPU1 detect that
> >VCPU2 is idle and avoid any synchronization?
>
> So guest VCPU1 would be the one that sets the MSI word, and VCPU2
> would be the old destination? That would also be racy, the moment
> after you "check that VCPU2 is idle" an interrupt could come in and
> the VCPU wouldn't be idle anymore.
True but I was talking about something like RCU in guest so
that's not a problem: you check after update.
If it's idle you know next interrupt will be the new one.
> >In any case I'd like to see a patch like this
> >accompanied by some argument explaining why it's
> >a safe thing to do.
>
> Yes, of course.
>
> Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 12:12 ` Michael S. Tsirkin
@ 2014-05-12 12:46 ` Paolo Bonzini
2014-05-12 12:53 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 12:46 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
Herongguang (Stephen)
On 12/05/2014 14:12, Michael S. Tsirkin wrote:
> On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
>> Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
>>> On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
>>>> Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
>>>>> On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
>>>>>> Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
>>>>>>> On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
>>>>>>>> Perhaps we can check for cases where only the address is changing,
>>>>>>>> and poke at an existing struct kvm_kernel_irq_routing_entry without
>>>>>>>> doing any RCU synchronization?
>>>>>>>
>>>>>>> I suspect interrupts can get lost then: e.g. if address didn't match any
>>>>>>> cpus, now it matches some. No?
>>>>>>
>>>>>> Can you explain the problem more verbosely? :)
>>>>>>
>>>>>> Multiple writers would still be protected by the mutex, so you
>>>>>> cannot have an "in-place update" writer racing with a "copy the
>>>>>> array" writer.
>>>>>
>>>>> I am not sure really.
>>>>> I'm worried about reader vs writer.
>>>>> If reader sees a stale msi value msi will be sent to a wrong
>>>>> address.
>>>>
>>>> That shouldn't happen on any cache-coherent system, no?
>>>>
>>>> Or at least, it shouldn't become any worse than what can already
>>>> happen with RCU.
>>>
>>> Meaning guest must do some synchronization anyway?
>>
>> Yes, I think so. The simplest would be to mask the interrupt around
>> MSI configuration changes. Radim was looking at a similar bug.
>
> Was this with classic assignment or with vfio?
No, it was with QEMU emulated devices (AHCI). Not this path, but it did
involve MSI configuration changes while the interrupt was unmasked in
the device.
>> He couldn't replicate on bare metal,
>> but I don't see why it shouldn't
>> be possible there too.
>
> I doubt linux does something tricky here, so
> I'm not talking about something linux guest does,
> I'm talking about an abstract guarantee of the
> APIC.
APIC or PCI? The chipset is involved here, not the APICs (the APICs
just see interrupt transactions).
> If baremetal causes a synchronisation point
> and we don't, at some point linux will use it and
> this will bite us.
Yes, and I agree.
It's complicated; on one hand I'm not sure we can entirely fix this, on
the other hand it may be that the guest cannot have too high expectations.
The invalid scenario is an interrupt happening before the MSI
reconfiguration and using the new value. It's invalid because MSI
reconfiguration is a non-posted write and must push previously posted
writes. But we cannot do that at all! QEMU has the big lock that
implicitly orders transactions, but this does not apply to
high-performance users of irqfd (vhost, VFIO, even QEMU's own virtio-blk
dataplane). Even if we removed all RCU uses for the routing table, we
don't have a "bottleneck" that orders transactions and removes this
invalid scenario.
At the same time, it's obviously okay if an interrupt happens after the
MSI reconfiguration and uses the old value. That's a posted write
passing an earlier non-posted write, which is allowed. This is the main
reason why I believe that masking/unmasking is required on bare metal.
In addition, if the masking/unmasking is done with a posted write (MMIO
BAR), the guest needs to do a read for synchronization before it unmasks
the interrupt.
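To illustrate, the guest-side discipline would look roughly like this (a sketch only, not a
proposal for KVM; offsets as defined in the kernel's drivers/pci/msi.h):

/* Mask the vector, rewrite address/data, flush the posted writes with a
 * read, then unmask.  "entry" points at one 16-byte MSI-X vector table slot.
 */
static void msix_retarget_vector(void __iomem *entry,
				 u32 addr_lo, u32 addr_hi, u32 data)
{
	u32 ctrl = readl(entry + PCI_MSIX_ENTRY_VECTOR_CTRL);

	writel(ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT, entry + PCI_MSIX_ENTRY_VECTOR_CTRL);
	readl(entry + PCI_MSIX_ENTRY_VECTOR_CTRL);	/* flush the posted mask write */

	writel(addr_lo, entry + PCI_MSIX_ENTRY_LOWER_ADDR);
	writel(addr_hi, entry + PCI_MSIX_ENTRY_UPPER_ADDR);
	writel(data, entry + PCI_MSIX_ENTRY_DATA);

	readl(entry + PCI_MSIX_ENTRY_VECTOR_CTRL);	/* flush before unmasking */
	writel(ctrl & ~PCI_MSIX_ENTRY_CTRL_MASKBIT, entry + PCI_MSIX_ENTRY_VECTOR_CTRL);
}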
In any case, whether writes synchronize with RCU or bypass it doesn't
change the picture. In either case, writes are ordered against each
other but not against reads. RCU does nothing except preventing
dangling pointer accesses.
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 12:46 ` Paolo Bonzini
@ 2014-05-12 12:53 ` Michael S. Tsirkin
2014-05-12 13:02 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-12 12:53 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
Herongguang (Stephen)
On Mon, May 12, 2014 at 02:46:46PM +0200, Paolo Bonzini wrote:
> Il 12/05/2014 14:12, Michael S. Tsirkin ha scritto:
> >On Mon, May 12, 2014 at 01:46:19PM +0200, Paolo Bonzini wrote:
> >>Il 12/05/2014 13:07, Michael S. Tsirkin ha scritto:
> >>>On Mon, May 12, 2014 at 12:25:35PM +0200, Paolo Bonzini wrote:
> >>>>Il 12/05/2014 12:18, Michael S. Tsirkin ha scritto:
> >>>>>On Mon, May 12, 2014 at 12:14:25PM +0200, Paolo Bonzini wrote:
> >>>>>>Il 12/05/2014 12:08, Michael S. Tsirkin ha scritto:
> >>>>>>>On Mon, May 12, 2014 at 11:57:32AM +0200, Paolo Bonzini wrote:
> >>>>>>>>Perhaps we can check for cases where only the address is changing,
> >>>>>>>>and poke at an existing struct kvm_kernel_irq_routing_entry without
> >>>>>>>>doing any RCU synchronization?
> >>>>>>>
> >>>>>>>I suspect interrupts can get lost then: e.g. if address didn't match any
> >>>>>>>cpus, now it matches some. No?
> >>>>>>
> >>>>>>Can you explain the problem more verbosely? :)
> >>>>>>
> >>>>>>Multiple writers would still be protected by the mutex, so you
> >>>>>>cannot have an "in-place update" writer racing with a "copy the
> >>>>>>array" writer.
> >>>>>
> >>>>>I am not sure really.
> >>>>>I'm worried about reader vs writer.
> >>>>>If reader sees a stale msi value msi will be sent to a wrong
> >>>>>address.
> >>>>
> >>>>That shouldn't happen on any cache-coherent system, no?
> >>>>
> >>>>Or at least, it shouldn't become any worse than what can already
> >>>>happen with RCU.
> >>>
> >>>Meaning guest must do some synchronization anyway?
> >>
> >>Yes, I think so. The simplest would be to mask the interrupt around
> >>MSI configuration changes. Radim was looking at a similar bug.
> >
> >Was this with classic assignment or with vfio?
>
> No, it was with QEMU emulated devices (AHCI). Not this path, but it
> did involve MSI configuration changes while the interrupt was
> unmasked in the device.
>
> >>He couldn't replicate on bare metal,
> >>but I don't see why it shouldn't
> >>be possible there too.
> >
> >I doubt linux does something tricky here, so
> >I'm not talking about something linux guest does,
> >I'm talking about an abstract guarantee of the
> >APIC.
>
> APIC or PCI? The chipset is involved here, not the APICs (the APICs
> just see interrupt transactions).
>
> >If baremetal causes a synchronisation point
> >and we don't, at some point linux will use it and
> >this will bite us.
>
> Yes, and I agree.
>
> It's complicated; on one hand I'm not sure we can entirely fix this,
> on the other hand it may be that the guest cannot have too high
> expectations.
>
> The invalid scenario is an interrupt happening before the MSI
> reconfiguration and using the new value. It's invalid because MSI
> reconfiguration is a non-posted write and must push previously
> posted writes. But we cannot do that at all! QEMU has the big lock
> that implicitly orders transactions, but this does not apply to
> high-performance users of irqfd (vhost, VFIO, even QEMU's own
> virtio-blk dataplane). Even if we removed all RCU uses for the
> routing table, we don't have a "bottleneck" that orders transactions
> and removes this invalid scenario.
>
> At the same time, it's obviously okay if an interrupt happens after
> the MSI reconfiguration and uses the old value. That's a posted
> write passing an earlier non-posted write, which is allowed. This
> is the main reason why I believe that masking/unmasking is required
> on bare metal. In addition, if the masking/unmasking is done with a
> posted write (MMIO BAR), the guest needs to do a read for
> synchronization before it unmasks the interrupt.
>
> In any case, whether writes synchronize with RCU or bypass it
> doesn't change the picture. In either case, writes are ordered
> against each other but not against reads. RCU does nothing except
> preventing dangling pointer accesses.
>
> Paolo
This is the only part I don't get.
RCU will make sure no VCPUs are running, won't it?
So it's a kind of full barrier.
--
MST
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 12:53 ` Michael S. Tsirkin
@ 2014-05-12 13:02 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2014-05-12 13:02 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Huangweidong (C), gleb@redhat.com, Radim Krcmar,
qemu-devel@nongnu.org, Gonglei (Arei), avi.kivity@gmail.com,
Herongguang (Stephen)
On 12/05/2014 14:53, Michael S. Tsirkin wrote:
>> > In any case, whether writes synchronize with RCU or bypass it
>> > doesn't change the picture. In either case, writes are ordered
>> > against each other but not against reads. RCU does nothing except
>> > preventing dangling pointer accesses.
>> >
>> > Paolo
> This is the only part I don't get.
> RCU will make sure no VCPUs are running, won't it?
> So it's a kind of full barrier.
The actual point where the new value becomes visible is where the write
happens, not where you do synchronize_rcu. This is the same whether you
do a full copy of the routing table or overwrite the entry in place.
However, the delay in servicing an older irqfd write can be arbitrary if
the scheduler decides not to run the irqfd_inject thread. Such an older
write may well read a newer routing entry, and that is the invalid
scenario according to the PCI spec.
Paolo
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 9:57 ` Paolo Bonzini
2014-05-12 10:08 ` Michael S. Tsirkin
2014-05-12 10:30 ` Michael S. Tsirkin
@ 2014-05-13 7:01 ` Gonglei (Arei)
2 siblings, 0 replies; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-13 7:01 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel@nongnu.org
Cc: gleb@redhat.com, Huangweidong (C), Herongguang (Stephen),
avi.kivity@gmail.com, mst@redhat.com
Hi,
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Monday, May 12, 2014 5:58 PM
> Perhaps we can check for cases where only the address is changing, and
> poke at an existing struct kvm_kernel_irq_routing_entry without doing
> any RCU synchronization?
>
> As long as kvm_set_msi_irq only reads address_lo once, it should work.
>
> VHOST_SET_MEM_TABLE is a different problem. What happens in userspace
> that leads to calling that ioctl? Can we remove it altogether, or delay
> it to after the destination has started running?
>
I think this approach is a little different from the original: eliminating synchronize_rcu()
means that when the KVM_SET_GSI_ROUTING ioctl returns, CPUs that are still using the old IRQ
routing table may still be running, so after a vCPU in the VM has finished setting CPU affinity,
other vCPUs in the VM may still receive stale IRQs.
But even in the original code, kvm_set_msi()->kvm_irq_delivery_to_apic()->...->__apic_accept_irq()
only sets the destination vCPU's LAPIC and then calls kvm_vcpu_kick(), which means the destination
vCPU may get scheduled to run only after the vCPU that set the IRQ affinity has already returned,
and thus may still receive old, stale IRQs.
So thanks for the suggestion, I'll try it.
And the VHOST_SET_MEM_TABLE ioctl comes from vhost_dev_start() in QEMU when the migration
destination starts its devices.
Best regards,
-Gonglei
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-12 10:30 ` Michael S. Tsirkin
@ 2014-05-13 7:03 ` Gonglei (Arei)
2014-05-13 8:21 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Gonglei (Arei) @ 2014-05-13 7:03 UTC (permalink / raw)
To: Michael S. Tsirkin, Paolo Bonzini
Cc: gleb@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
avi.kivity@gmail.com, Herongguang (Stephen)
Hi,
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Monday, May 12, 2014 6:31 PM
>
> vhost does everything under a VQ lock.
> I think RCU for VHOST_SET_MEM_TABLE can be replaced with
> taking and freeing the VQ lock.
>
> Does the below solve the problem for you
> (warning: untested, sorry, busy with other bugs right now)?
>
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ---
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 78987e4..df2e3eb 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d,
> struct vhost_memory __user *m)
> {
> struct vhost_memory mem, *newmem, *oldmem;
> unsigned long size = offsetof(struct vhost_memory, regions);
> + int i;
>
> if (copy_from_user(&mem, m, size))
> return -EFAULT;
> @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d,
> struct vhost_memory __user *m)
> oldmem = rcu_dereference_protected(d->memory,
> lockdep_is_held(&d->mutex));
> rcu_assign_pointer(d->memory, newmem);
> - synchronize_rcu();
> +
> + /* All memory accesses are done under some VQ mutex.
> + * So below is a faster equivalent of synchronize_rcu()
> + */
> > + for (i = 0; i < d->nvqs; ++i) {
> > + mutex_lock(&d->vqs[i]->mutex);
> > + mutex_unlock(&d->vqs[i]->mutex);
> + }
> kfree(oldmem);
> return 0;
> }
Thanks for your advice. I suppose taking the mutexes should generally be faster than waiting for
every CPU to go through a context switch. And I think d->mutex would also need to be taken and
released, since some code paths take only that mutex directly and not the VQ mutexes. Is this right?
I'll try this approach, thanks.
Best regards,
-Gonglei
* Re: [Qemu-devel] [RFC] vhost: Can we change synchronize_rcu to call_rcu in vhost_set_memory() in vhost kernel module?
2014-05-13 7:03 ` Gonglei (Arei)
@ 2014-05-13 8:21 ` Michael S. Tsirkin
0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2014-05-13 8:21 UTC (permalink / raw)
To: Gonglei (Arei)
Cc: Huangweidong (C), gleb@redhat.com, qemu-devel@nongnu.org,
avi.kivity@gmail.com, Paolo Bonzini, Herongguang (Stephen)
On Tue, May 13, 2014 at 07:03:20AM +0000, Gonglei (Arei) wrote:
> Hi,
>
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Monday, May 12, 2014 6:31 PM
> >
> > vhost does everything under a VQ lock.
> > I think RCU for VHOST_SET_MEM_TABLE can be replaced with
> > taking and freeing the VQ lock.
> >
> > Does the below solve the problem for you
> > (warning: untested, sorry, busy with other bugs right now)?
> >
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > ---
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index 78987e4..df2e3eb 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -593,6 +593,7 @@ static long vhost_set_memory(struct vhost_dev *d,
> > struct vhost_memory __user *m)
> > {
> > struct vhost_memory mem, *newmem, *oldmem;
> > unsigned long size = offsetof(struct vhost_memory, regions);
> > + int i;
> >
> > if (copy_from_user(&mem, m, size))
> > return -EFAULT;
> > @@ -619,7 +620,14 @@ static long vhost_set_memory(struct vhost_dev *d,
> > struct vhost_memory __user *m)
> > oldmem = rcu_dereference_protected(d->memory,
> > lockdep_is_held(&d->mutex));
> > rcu_assign_pointer(d->memory, newmem);
> > - synchronize_rcu();
> > +
> > + /* All memory accesses are done under some VQ mutex.
> > + * So below is a faster equivalent of synchronize_rcu()
> > + */
> > > + for (i = 0; i < d->nvqs; ++i) {
> > > + mutex_lock(&d->vqs[i]->mutex);
> > > + mutex_unlock(&d->vqs[i]->mutex);
> > + }
> > kfree(oldmem);
> > return 0;
> > }
>
> Thanks for your advice, I suppose getting mutexes should generally be faster than waiting for
> CPU context switches. And I think d->mutex should also be synchronized since somewhere gets
> only this mutex directly and not vq mutexes. Is this right?
No, because all memory table accesses are done under some VQ mutex.
> I'll try this approach, thanks.
>
> Best regards,
> -Gonglei