* [PATCH 06/27] xen, smpboot: Use generic SMP booting infrastructure
From: Srivatsa S. Bhat @ 2012-06-01 9:11 UTC (permalink / raw)
To: peterz, paulmck
Cc: linux-arch, Jeremy Fitzhardinge, x86, nikunj,
Konrad Rzeszutek Wilk, vatsa, linux-kernel, rjw, yong.zhang0,
Ingo Molnar, Thomas Gleixner, srivatsa.bhat, H. Peter Anvin,
xen-devel, akpm, virtualization, mingo
In-Reply-To: <20120601090952.31979.24799.stgit@srivatsabhat.in.ibm.com>
Convert xen to use the generic framework to boot secondary CPUs.
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
arch/x86/xen/smp.c | 21 ++++-----------------
1 files changed, 4 insertions(+), 17 deletions(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 602d6b7..46c96f9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -58,13 +58,12 @@ static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
}
-static void __cpuinit cpu_bringup(void)
+void __cpuinit xen_cpu_pre_starting(void *unused)
{
int cpu;
cpu_init();
touch_softlockup_watchdog();
- preempt_disable();
xen_enable_sysenter();
xen_enable_syscall();
@@ -75,25 +74,11 @@ static void __cpuinit cpu_bringup(void)
set_cpu_sibling_map(cpu);
xen_setup_cpu_clockevents();
-
- notify_cpu_starting(cpu);
-
- set_cpu_online(cpu, true);
-
- this_cpu_write(cpu_state, CPU_ONLINE);
-
- wmb();
-
- /* We can take interrupts now: we're officially "up". */
- local_irq_enable();
-
- wmb(); /* make sure everything is out */
}
static void __cpuinit cpu_bringup_and_idle(void)
{
- cpu_bringup();
- cpu_idle();
+ smpboot_start_secondary(NULL);
}
static int xen_smp_intr_init(unsigned int cpu)
@@ -515,6 +500,8 @@ static const struct smp_ops xen_smp_ops __initconst = {
.smp_prepare_cpus = xen_smp_prepare_cpus,
.smp_cpus_done = xen_smp_cpus_done,
+ .cpu_pre_starting = xen_cpu_pre_starting,
+
.cpu_up = xen_cpu_up,
.cpu_die = xen_cpu_die,
.cpu_disable = xen_cpu_disable,
^ permalink raw reply related
* [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Stefan Hajnoczi @ 2012-06-01 9:13 UTC (permalink / raw)
To: virtualization; +Cc: Stefan Hajnoczi, kvm, Michael S. Tsirkin, khoa
Holding the vblk->lock across kick causes poor scalability in SMP
guests. If one CPU is doing virtqueue kick and another CPU touches the
vblk->lock it will have to spin until virtqueue kick completes.
This patch reduces system% CPU utilization in SMP guests that are
running multithreaded I/O-bound workloads. The improvements are small
but show as iops and SMP are increased.
Khoa Huynh <khoa@us.ibm.com> provided initial performance data that
indicates this optimization is worthwhile at high iops.
Asias He <asias@redhat.com> reports the following fio results:
Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
Guest: same as host kernel
Average 3 runs:
with locked kick
read iops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
write iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
read iops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
write iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08 minf=17907363.00
cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34 minf=20020008.50
cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39 minf=19737254.00
cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56 minf=18256760.50
with unlocked kick
read iops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
write iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
read iops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
write iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91 minf=17907893.33
cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04 minf=19757720.00
cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25 minf=19349958.33
cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35 minf=19092343.00
FIO config file:
[global]
exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
group_reporting
norandommap
ioscheduler=noop
thread
bs=512
size=4MB
direct=1
filename=/dev/vdb
numjobs=256
ioengine=aio
iodepth=64
loops=3
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed that.
To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
irqs with spin_unlock_irq(). If the caller of blk_run_queue() had irqs
disabled and we enable them again this could be a problem, right? Can someone
more familiar with kernel locking comment?
drivers/block/virtio_blk.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 774c31d..d674977 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
issued++;
}
- if (issued)
- virtqueue_kick(vblk->vq);
+ if (!issued)
+ return;
+
+ if (virtqueue_kick_prepare(vblk->vq)) {
+ spin_unlock_irq(vblk->disk->queue->queue_lock);
+ virtqueue_notify(vblk->vq);
+ spin_lock_irq(vblk->disk->queue->queue_lock);
+ }
}
/* return id (s/n) string for *disk to *id_str
--
1.7.10
^ permalink raw reply related
* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Jan Beulich @ 2012-06-01 12:59 UTC (permalink / raw)
To: srivatsa.bhat
Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Keir Fraser, akpm,
nikunj, Konrad Rzeszutek Wilk, peterz, linux-kernel, x86, vatsa,
virtualization, rjw, yong.zhang0, Ingo Molnar, H. Peter Anvin,
Thomas Gleixner, paulmck, mingo
In-Reply-To: <20120601091124.31979.91984.stgit@srivatsabhat.in.ibm.com>
>>> On 01.06.12 at 11:11, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
wrote:
> xen_play_dead calls cpu_bringup() which looks weird, because xen_play_dead()
> is invoked in the cpu down path, whereas cpu_bringup() (as the name
> suggests) is useful in the cpu bringup path.
This might not be correct - the code as it is without this change is
safe even when the vCPU gets onlined back later by an external
entity (e.g. the Xen tool stack), and it would in that case resume
at the return point of the VCPUOP_down hypercall. That might
be a heritage from the original XenoLinux tree though, and be
meaningless in pv-ops context - Jeremy, Konrad?
Possibly it was bogus/unused even in that original tree - Keir?
Jan
> Getting rid of xen_play_dead()'s dependency on cpu_bringup() helps in
> hooking on to the generic SMP booting framework.
>
> Also remove the extra call to preempt_enable() added by commit 41bd956
> (xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while
> atomic) because it becomes unnecessary after this change.
>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: xen-devel@lists.xensource.com
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> arch/x86/xen/smp.c | 8 --------
> 1 files changed, 0 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 09a7199..602d6b7 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -417,14 +417,6 @@ static void __cpuinit xen_play_dead(void) /* used only
> with HOTPLUG_CPU */
> {
> play_dead_common();
> HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
> - cpu_bringup();
> - /*
> - * Balance out the preempt calls - as we are running in cpu_idle
> - * loop which has been called at bootup from cpu_bringup_and_idle.
> - * The cpucpu_bringup_and_idle called cpu_bringup which made a
> - * preempt_disable() So this preempt_enable will balance it out.
> - */
> - preempt_enable();
> }
>
> #else /* !CONFIG_HOTPLUG_CPU */
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply
* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Srivatsa S. Bhat @ 2012-06-01 15:13 UTC (permalink / raw)
To: Jan Beulich
Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Keir Fraser, akpm,
nikunj, Konrad Rzeszutek Wilk, peterz, linux-kernel, x86, vatsa,
virtualization, rjw, yong.zhang0, Ingo Molnar, H. Peter Anvin,
Russell King, Thomas Gleixner, paulmck, mingo
In-Reply-To: <4FC8D8E30200007800087D2B@nat28.tlf.novell.com>
On 06/01/2012 06:29 PM, Jan Beulich wrote:
>>>> On 01.06.12 at 11:11, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
> wrote:
>> xen_play_dead calls cpu_bringup() which looks weird, because xen_play_dead()
>> is invoked in the cpu down path, whereas cpu_bringup() (as the name
>> suggests) is useful in the cpu bringup path.
>
> This might not be correct - the code as it is without this change is
> safe even when the vCPU gets onlined back later by an external
> entity (e.g. the Xen tool stack), and it would in that case resume
> at the return point of the VCPUOP_down hypercall. That might
> be a heritage from the original XenoLinux tree though, and be
> meaningless in pv-ops context - Jeremy, Konrad?
>
> Possibly it was bogus/unused even in that original tree - Keir?
>
Thanks for your comments Jan!
In case this change is wrong, the other method I had in mind was to call
cpu_bringup_and_idle() in xen_play_dead(). (Even ARM does something similar,
in the sense that it runs the cpu bringup code including cpu_idle(), in the
cpu offline path, namely the cpu_die() function). Would that approach work
for xen as well? If yes, then we wouldn't have any issues to convert xen to
generic code.
Regards,
Srivatsa S. Bhat
>
>> Getting rid of xen_play_dead()'s dependency on cpu_bringup() helps in
>> hooking on to the generic SMP booting framework.
>>
>> Also remove the extra call to preempt_enable() added by commit 41bd956
>> (xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while
>> atomic) because it becomes unnecessary after this change.
>>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: Jeremy Fitzhardinge <jeremy@goop.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Cc: xen-devel@lists.xensource.com
>> Cc: virtualization@lists.linux-foundation.org
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>> arch/x86/xen/smp.c | 8 --------
>> 1 files changed, 0 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> index 09a7199..602d6b7 100644
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -417,14 +417,6 @@ static void __cpuinit xen_play_dead(void) /* used only
>> with HOTPLUG_CPU */
>> {
>> play_dead_common();
>> HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL);
>> - cpu_bringup();
>> - /*
>> - * Balance out the preempt calls - as we are running in cpu_idle
>> - * loop which has been called at bootup from cpu_bringup_and_idle.
>> - * The cpucpu_bringup_and_idle called cpu_bringup which made a
>> - * preempt_disable() So this preempt_enable will balance it out.
>> - */
>> - preempt_enable();
>> }
>>
>> #else /* !CONFIG_HOTPLUG_CPU */
>>
>>
^ permalink raw reply
* Re: [vmw_vmci RFC 00/11] VMCI for Linux
From: Andy King @ 2012-06-01 15:33 UTC (permalink / raw)
To: Greg KH
Cc: dtor, linux-kernel, virtualization, dsouders,
Andrew Stiegmann (stieg), akpm, cschamp
In-Reply-To: <20120515235024.GB1758@kroah.com>
Greg,
Thanks so much for the comments and apologies for the delayed response.
> Don't we have something like this already for KVM and maybe Xen?
> virtio? Can't you use that code instead of a new block of code that
> is only used by vmware users? It has virtual pci devices which
> should give you what you want/need here, right?
>
> If not, why doesn't that work for you? Would it be easier to just
> extend it?
The VMCI virtual device for which this driver is intended has been
around a lot longer than this submission might suggest. The virtual
hardware was released in a product before Rusty sent his RFC and
quite a bit before it made it to mainline; there was, regrettably,
no virtio then.
As such, it was designed to be its own transport, and it's something
that is now very much fixed at the hardware level (enhancements
not withstanding), and which we have to support all the way back.
In addition to that, our hypervisor endpoints are written using
the existing device backend; virtio doesn't currently make a lot of
sense for them, and would require a lot of additional work.
All of this is unfortunate. While I agree that virtio is certainly
the right approach, and we need to avoid this proliferation, I think
at this point we'd really like to try and upstream this in its current
form. There's certainly the possibility going forwards that we could
add a glue layer, such that other clients could use virtio if they're
willing to write their own hypervisor endpoints.
Does that sound reasonable?
Again, many thanks for taking the time to review this.
Kind regards,
- Andy King
^ permalink raw reply
* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Jan Beulich @ 2012-06-01 15:36 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Russell King, akpm,
nikunj, Konrad Rzeszutek Wilk, peterz, linux-kernel, x86, vatsa,
virtualization, rjw, yong.zhang0, Ingo Molnar, H. Peter Anvin,
Keir Fraser, Thomas Gleixner, paulmck, mingo
In-Reply-To: <4FC8DC19.4000007@linux.vnet.ibm.com>
>>> On 01.06.12 at 17:13, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
wrote:
> On 06/01/2012 06:29 PM, Jan Beulich wrote:
>
>>>>> On 01.06.12 at 11:11, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>> wrote:
>>> xen_play_dead calls cpu_bringup() which looks weird, because xen_play_dead()
>>> is invoked in the cpu down path, whereas cpu_bringup() (as the name
>>> suggests) is useful in the cpu bringup path.
>>
>> This might not be correct - the code as it is without this change is
>> safe even when the vCPU gets onlined back later by an external
>> entity (e.g. the Xen tool stack), and it would in that case resume
>> at the return point of the VCPUOP_down hypercall. That might
>> be a heritage from the original XenoLinux tree though, and be
>> meaningless in pv-ops context - Jeremy, Konrad?
>>
>> Possibly it was bogus/unused even in that original tree - Keir?
>>
>
>
> Thanks for your comments Jan!
>
> In case this change is wrong, the other method I had in mind was to call
> cpu_bringup_and_idle() in xen_play_dead(). (Even ARM does something similar,
> in the sense that it runs the cpu bringup code including cpu_idle(), in the
> cpu offline path, namely the cpu_die() function). Would that approach work
> for xen as well? If yes, then we wouldn't have any issues to convert xen to
> generic code.
No, that wouldn't work either afaict - the function is expected
to return.
Jan
^ permalink raw reply
* Re: [PATCH 03/27] smpboot: Define and use cpu_state per-cpu variable in generic code
From: David Daney @ 2012-06-01 16:59 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Venkatesh Pallipadi, Jeremy Fitzhardinge, linux-ia64, linux-mips,
peterz, Benjamin Herrenschmidt, linux-kernel, H. Peter Anvin,
mingo, linux-arch, xen-devel, Suresh Siddha, linux-sh, x86,
Ingo Molnar, paulmck, Fenghua Yu, Mike Frysinger, Peter Zijlstra,
nikunj, Konrad Rzeszutek Wilk, Chris Metcalf, rjw, yong.zhang0,
tglx, virtualization, Tony Luck, vatsa, Ralf
In-Reply-To: <20120601091038.31979.67878.stgit@srivatsabhat.in.ibm.com>
On 06/01/2012 02:10 AM, Srivatsa S. Bhat wrote:
> The per-cpu variable cpu_state is used in x86 and also used in other
> architectures, to track the state of the cpu during bringup and hotplug.
> Pull it out into generic code.
>
> Cc: Tony Luck<tony.luck@intel.com>
> Cc: Fenghua Yu<fenghua.yu@intel.com>
> Cc: Ralf Baechle<ralf@linux-mips.org>
> Cc: Benjamin Herrenschmidt<benh@kernel.crashing.org>
> Cc: Paul Mundt<lethal@linux-sh.org>
> Cc: Chris Metcalf<cmetcalf@tilera.com>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: Ingo Molnar<mingo@redhat.com>
> Cc: "H. Peter Anvin"<hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Konrad Rzeszutek Wilk<konrad.wilk@oracle.com>
> Cc: Jeremy Fitzhardinge<jeremy@goop.org>
> Cc: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Cc: Andrew Morton<akpm@linux-foundation.org>
> Cc: Mike Frysinger<vapier@gentoo.org>
> Cc: Yong Zhang<yong.zhang0@gmail.com>
> Cc: Venkatesh Pallipadi<venki@google.com>
> Cc: Suresh Siddha<suresh.b.siddha@intel.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linux-mips@linux-mips.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-sh@vger.kernel.org
> Cc: xen-devel@lists.xensource.com
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Srivatsa S. Bhat<srivatsa.bhat@linux.vnet.ibm.com>
> ---
>
> arch/ia64/include/asm/cpu.h | 2 --
> arch/ia64/kernel/process.c | 1 +
> arch/ia64/kernel/smpboot.c | 6 +-----
> arch/mips/cavium-octeon/smp.c | 4 +---
> arch/powerpc/kernel/smp.c | 6 +-----
> arch/sh/include/asm/smp.h | 2 --
> arch/sh/kernel/smp.c | 4 +---
> arch/tile/kernel/smpboot.c | 4 +---
> arch/x86/include/asm/cpu.h | 2 --
> arch/x86/kernel/smpboot.c | 4 +---
> arch/x86/xen/smp.c | 1 +
> include/linux/smpboot.h | 1 +
> kernel/smpboot.c | 4 ++++
> 13 files changed, 13 insertions(+), 28 deletions(-)
>
[...]
> diff --git a/arch/mips/cavium-octeon/smp.c b/arch/mips/cavium-octeon/smp.c
> index 97e7ce9..93cd4b0 100644
> --- a/arch/mips/cavium-octeon/smp.c
> +++ b/arch/mips/cavium-octeon/smp.c
> @@ -13,6 +13,7 @@
> #include<linux/kernel_stat.h>
> #include<linux/sched.h>
> #include<linux/module.h>
> +#include<linux/smpboot.h>
>
> #include<asm/mmu_context.h>
> #include<asm/time.h>
> @@ -252,9 +253,6 @@ static void octeon_cpus_done(void)
>
> #ifdef CONFIG_HOTPLUG_CPU
>
> -/* State of each CPU. */
> -DEFINE_PER_CPU(int, cpu_state);
> -
> extern void fixup_irqs(void);
>
> static DEFINE_SPINLOCK(smp_reserve_lock);
The Octeon bit:
Acked-by: David Daney <david.daney@cavium.com>
FWIW, the rest looks good too.
^ permalink raw reply
* Re: [Xen-devel] [PATCH 05/27] xen, cpu hotplug: Don't call cpu_bringup() in xen_play_dead()
From: Srivatsa S. Bhat @ 2012-06-02 18:06 UTC (permalink / raw)
To: Jan Beulich
Cc: linux-arch, Jeremy Fitzhardinge, xen-devel, Russell King, akpm,
nikunj, Konrad Rzeszutek Wilk, peterz, linux-kernel, x86, vatsa,
virtualization, rjw, yong.zhang0, Ingo Molnar, H. Peter Anvin,
Keir Fraser, Thomas Gleixner, paulmck, mingo
In-Reply-To: <4FC8FD9C0200007800087DF5@nat28.tlf.novell.com>
On 06/01/2012 09:06 PM, Jan Beulich wrote:
>>>> On 01.06.12 at 17:13, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
> wrote:
>> On 06/01/2012 06:29 PM, Jan Beulich wrote:
>>
>>>>>> On 01.06.12 at 11:11, "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
>>> wrote:
>>>> xen_play_dead calls cpu_bringup() which looks weird, because xen_play_dead()
>>>> is invoked in the cpu down path, whereas cpu_bringup() (as the name
>>>> suggests) is useful in the cpu bringup path.
>>>
>>> This might not be correct - the code as it is without this change is
>>> safe even when the vCPU gets onlined back later by an external
>>> entity (e.g. the Xen tool stack), and it would in that case resume
>>> at the return point of the VCPUOP_down hypercall. That might
>>> be a heritage from the original XenoLinux tree though, and be
>>> meaningless in pv-ops context - Jeremy, Konrad?
>>>
>>> Possibly it was bogus/unused even in that original tree - Keir?
>>>
>>
>>
>> Thanks for your comments Jan!
>>
>> In case this change is wrong, the other method I had in mind was to call
>> cpu_bringup_and_idle() in xen_play_dead(). (Even ARM does something similar,
>> in the sense that it runs the cpu bringup code including cpu_idle(), in the
>> cpu offline path, namely the cpu_die() function). Would that approach work
>> for xen as well? If yes, then we wouldn't have any issues to convert xen to
>> generic code.
>
> No, that wouldn't work either afaict - the function is expected
> to return.
>
Ok.. So, I would love to hear a confirmation about whether this patch (which
removes cpu_bringup() in xen_play_dead()) will break things or it is good as is.
If its not correct, then we can probably make __cpu_post_online() return an int,
with the meaning:
0 => success, go ahead and call cpu_idle()
non-zero => stop here, thanks for your services so far.. now leave the rest to me.
So all other archs will return 0, Xen will return non-zero, and it will handle
when to call cpu_idle() and when not to do so.
Might sound a bit ugly, but I don't see much other option. Suggestions are
appreciated!
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [PATCH 0/3] Fix hot-unplug race in virtio-blk
From: Rusty Russell @ 2012-06-04 0:41 UTC (permalink / raw)
To: Asias He, virtualization, Michael S. Tsirkin; +Cc: kvm
In-Reply-To: <1337913289-23486-1-git-send-email-asias@redhat.com>
On Fri, 25 May 2012 10:34:46 +0800, Asias He <asias@redhat.com> wrote:
> This patch set fixes the race when hot-unplug stressed disk.
>
> Asias He (3):
> virtio-blk: Call del_gendisk() before disable guest kick
> virtio-blk: Reset device after blk_cleanup_queue()
> virtio-blk: Use block layer provided spinlock
>
> drivers/block/virtio_blk.c | 25 ++++++-------------------
> 1 file changed, 6 insertions(+), 19 deletions(-)
I've applied these (the revised spinlock patch, thanks to mst for his
feedback).
I think a cc: stable is in order for these, so I've added it.
Thanks,
Rusty.
^ permalink raw reply
* Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick
From: Rusty Russell @ 2012-06-04 1:57 UTC (permalink / raw)
To: Christian Borntraeger, Stefan Hajnoczi
Cc: kvm, Michael S. Tsirkin, virtualization, khoa, Tejun Heo
In-Reply-To: <4FC622F9.2070506@de.ibm.com>
On Wed, 30 May 2012 15:39:05 +0200, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> On 30/05/12 15:19, Stefan Hajnoczi wrote:
> > Holding the vblk->lock across kick causes poor scalability in SMP
> > guests. If one CPU is doing virtqueue kick and another CPU touches the
> > vblk->lock it will have to spin until virtqueue kick completes.
> >
> > This patch reduces system% CPU utilization in SMP guests that are
> > running multithreaded I/O-bound workloads. The improvements are small
> > but show as iops and SMP are increased.
>
> Funny, recently I got a bug report regarding spinlock lockup
> (see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
> Turned out that blk_done was called on many guest cpus while the guest
> was heavily paging on one virtio block device. (and the guest had much
> more cpus than the host)
> This patch will probably reduce the pressure for those cases as well.
> we can then finish requests if somebody else is doing the kick.
>
> IIRC there were some other approaches to address this lock holding during
> kick but this looks like the less intrusive one.
>
> > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Unfortunately, this conflicts with Asias He's deadlock fix, which has
us just using the block-layer-supplied spinlock.
If we drop the lock around the kick as you suggest, we're playing with
fire. All the virtio backends have an atomic notify, so they're OK,
and the block layer *looks* safe at a glance, but there's no assurances.
It seems like a workaround to the fact that we don't have hcall-backed
spinlocks like Xen, or that our virtio device is too laggy.
Cheers,
Rusty.
^ permalink raw reply
* Re: Question regarding Virtio Console and Remoteproc
From: Rusty Russell @ 2012-06-04 4:45 UTC (permalink / raw)
To: Ohad Ben-Cohen, Sjur BRENDELAND
Cc: Amit Shah, Benjamin Herrenschmidt, Linus Walleij, Arnd Bergmann,
virtualization@lists.linux-foundation.org
In-Reply-To: <CAK=WgbY9cXByKXEnpSvVr3dCecmKyumAQ6ffX8_2ghsAr2haEQ@mail.gmail.com>
On Fri, 1 Jun 2012 10:51:38 +0300, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Fri, Jun 1, 2012 at 10:31 AM, Sjur BRENDELAND
> <sjur.brandeland@stericsson.com> wrote:
> > if (virtio_has_feature(vdev, VIRTIO_CONSOLE_USE_DMA_MEM)) {
> > dma_addr_t dma;
> > buf = dma_alloc_coherent(dev, size, &dma, GFP_KERNEL);
> > } else
> > buf = kmalloc(count, GFP_KERNEL);
>
> Something along those lines is also needed for remote processors which
> access memory via an IOMMU (e.g. OMAP4's M3 and DSP).
>
> Allocating the memory via the DMA API will seamlessly configure the
> relevant IOMMU as needed, and will make the buffers accessible to the
> remote processors.
>
> Thanks,
> Ohad.
It seems quite sensible. The formal definition in the spec would be
good. In particular, defining DMA_MEM in a generic (non-Linux) way will
be interesting.
Thanks,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick
From: Michael S. Tsirkin @ 2012-06-04 5:29 UTC (permalink / raw)
To: Rusty Russell
Cc: Stefan Hajnoczi, kvm, Christian Borntraeger, virtualization, khoa,
Tejun Heo
In-Reply-To: <877gvn3kks.fsf@rustcorp.com.au>
On Mon, Jun 04, 2012 at 11:27:39AM +0930, Rusty Russell wrote:
> On Wed, 30 May 2012 15:39:05 +0200, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > On 30/05/12 15:19, Stefan Hajnoczi wrote:
> > > Holding the vblk->lock across kick causes poor scalability in SMP
> > > guests. If one CPU is doing virtqueue kick and another CPU touches the
> > > vblk->lock it will have to spin until virtqueue kick completes.
> > >
> > > This patch reduces system% CPU utilization in SMP guests that are
> > > running multithreaded I/O-bound workloads. The improvements are small
> > > but show as iops and SMP are increased.
> >
> > Funny, recently I got a bug report regarding spinlock lockup
> > (see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
> > Turned out that blk_done was called on many guest cpus while the guest
> > was heavily paging on one virtio block device. (and the guest had much
> > more cpus than the host)
> > This patch will probably reduce the pressure for those cases as well.
> > we can then finish requests if somebody else is doing the kick.
> >
> > IIRC there were some other approaches to address this lock holding during
> > kick but this looks like the less intrusive one.
> >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> > Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Unfortunately, this conflicts with Asias He's deadlock fix, which has
> us just using the block-layer-supplied spinlock.
>
> If we drop the lock around the kick as you suggest, we're playing with
> fire. All the virtio backends have an atomic notify, so they're OK,
What was the point of virtqueue_kick_prepare/virtqueue_notify then?
I really thought the point was making virtqueue_notify
atomic at the API level.
> and the block layer *looks* safe at a glance, but there's no assurances.
what exactly do you have in mind for the block layer?
> It seems like a workaround to the fact that we don't have hcall-backed
> spinlocks like Xen, or that our virtio device is too laggy.
>
> Cheers,
> Rusty.
Exits are expensive no matter what we do, so dropping the lock
makes sense.
^ permalink raw reply
* Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick
From: Christian Borntraeger @ 2012-06-04 6:02 UTC (permalink / raw)
To: Rusty Russell
Cc: Stefan Hajnoczi, kvm, Michael S. Tsirkin, virtualization, khoa,
Tejun Heo
In-Reply-To: <877gvn3kks.fsf@rustcorp.com.au>
On 04/06/12 03:57, Rusty Russell wrote:
> Unfortunately, this conflicts with Asias He's deadlock fix, which has
> us just using the block-layer-supplied spinlock.
>
> If we drop the lock around the kick as you suggest, we're playing with
> fire. All the virtio backends have an atomic notify, so they're OK,
> and the block layer *looks* safe at a glance, but there's no assurances.
Well, the kick itself returns early, but in the host every other action
is already asynchronously running - not caring about the guest locks at all.
So if removing the lock around the kick causes a problem, then the problem
is already present, no?
Christian
^ permalink raw reply
* Re: Question regarding Virtio Console and Remoteproc
From: Benjamin Herrenschmidt @ 2012-06-04 6:10 UTC (permalink / raw)
To: Rusty Russell
Cc: Arnd Bergmann, Linus Walleij,
virtualization@lists.linux-foundation.org, Amit Shah
In-Reply-To: <87bokz1y8a.fsf@rustcorp.com.au>
On Mon, 2012-06-04 at 14:15 +0930, Rusty Russell wrote:
> > Something along those lines is also needed for remote processors
> which
> > access memory via an IOMMU (e.g. OMAP4's M3 and DSP).
> >
> > Allocating the memory via the DMA API will seamlessly configure the
> > relevant IOMMU as needed, and will make the buffers accessible to
> the
> > remote processors.
> >
> > Thanks,
> > Ohad.
>
> It seems quite sensible. The formal definition in the spec would be
> good. In particular, defining DMA_MEM in a generic (non-Linux) way
> will be interesting.
I think we could do a bit better...
We need to dbl check if all the archs we care about use the dma_ops
abstraction, but if they do, then we can remove the conditional and just
use the dma_* functions everywhere, and have virtio-pci itself tweak the
dma_ops pointer in struct device.
The default would be to swap the normal PCI one that the arch code has
put there with some "null" ops to match existing virtio, and we could
then use a negociated capability to avoid that behaviour.
It's important to keep in mind that for KVM, in most cases we want to
keep the bypass of the iommu, simply because it's always going to be
faster than emulating one & mapping/unmapping things to/from it.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick
From: Stefan Hajnoczi @ 2012-06-04 8:27 UTC (permalink / raw)
To: Rusty Russell
Cc: kvm, Michael S. Tsirkin, virtualization, Christian Borntraeger,
Tejun Heo, khoa
In-Reply-To: <877gvn3kks.fsf@rustcorp.com.au>
On Mon, Jun 04, 2012 at 11:27:39AM +0930, Rusty Russell wrote:
> On Wed, 30 May 2012 15:39:05 +0200, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > On 30/05/12 15:19, Stefan Hajnoczi wrote:
> > > Holding the vblk->lock across kick causes poor scalability in SMP
> > > guests. If one CPU is doing virtqueue kick and another CPU touches the
> > > vblk->lock it will have to spin until virtqueue kick completes.
> > >
> > > This patch reduces system% CPU utilization in SMP guests that are
> > > running multithreaded I/O-bound workloads. The improvements are small
> > > but show as iops and SMP are increased.
> >
> > Funny, recently I got a bug report regarding spinlock lockup
> > (see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
> > Turned out that blk_done was called on many guest cpus while the guest
> > was heavily paging on one virtio block device. (and the guest had much
> > more cpus than the host)
> > This patch will probably reduce the pressure for those cases as well.
> > we can then finish requests if somebody else is doing the kick.
> >
> > IIRC there were some other approaches to address this lock holding during
> > kick but this looks like the less intrusive one.
> >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> > Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Unfortunately, this conflicts with Asias He's deadlock fix, which has
> us just using the block-layer-supplied spinlock.
>
> If we drop the lock around the kick as you suggest, we're playing with
> fire. All the virtio backends have an atomic notify, so they're OK,
> and the block layer *looks* safe at a glance, but there's no assurances.
There are assurances:
Documentation/block/biodoc.txt:
"Drivers are free to drop the queue lock themselves, if required."
Other drivers including rbd, nbd, cciss, and probably others drop the
lock too.
Stefan
^ permalink raw reply
* Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Asias He @ 2012-06-04 8:33 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: khoa, Michael S. Tsirkin, kvm, virtualization
In-Reply-To: <1338541986-8083-1-git-send-email-stefanha@linux.vnet.ibm.com>
On 06/01/2012 05:13 PM, Stefan Hajnoczi wrote:
> Holding the vblk->lock across kick causes poor scalability in SMP
> guests. If one CPU is doing virtqueue kick and another CPU touches the
> vblk->lock it will have to spin until virtqueue kick completes.
>
> This patch reduces system% CPU utilization in SMP guests that are
> running multithreaded I/O-bound workloads. The improvements are small
> but show as iops and SMP are increased.
>
> Khoa Huynh<khoa@us.ibm.com> provided initial performance data that
> indicates this optimization is worthwhile at high iops.
>
> Asias He<asias@redhat.com> reports the following fio results:
>
> Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
> Guest: same as host kernel
>
> Average 3 runs:
> with locked kick
> read iops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
> write iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
> read iops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
> write iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
> clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
> clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
> clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
> clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
> cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08 minf=17907363.00
> cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34 minf=20020008.50
> cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39 minf=19737254.00
> cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56 minf=18256760.50
>
> with unlocked kick
> read iops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
> write iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
> read iops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
> write iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
> clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
> clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
> clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
> clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
> cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91 minf=17907893.33
> cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04 minf=19757720.00
> cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25 minf=19349958.33
> cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35 minf=19092343.00
>
> FIO config file:
>
> [global]
> exec_prerun="echo 3> /proc/sys/vm/drop_caches"
> group_reporting
> norandommap
> ioscheduler=noop
> thread
> bs=512
> size=4MB
> direct=1
> filename=/dev/vdb
> numjobs=256
> ioengine=aio
> iodepth=64
> loops=3
>
> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
> ---
> Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed that.
> To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
> irqs with spin_unlock_irq(). If the caller of blk_run_queue() had irqs
> disabled and we enable them again this could be a problem, right? Can someone
> more familiar with kernel locking comment?
blk_run_queue() is not used in our code path. We use __blk_run_queue().
The code path is:
generic_make_request() -> q->make_request_fn() -> blk_queue_bio()
->__blk_run_queue() -> q->request_fn() -> do_virtblk_request().
__blk_run_queue() is called with interrupts disabled and queue lock
locked. In blk_queue_bio, __blk_run_queue() is protected by
spin_lock_irq(q->queue_lock).
The lock in block layer seems a bit confusing, e.g.block/blk-core.c.
There are fixed use of spin_lock_irq() and spin_lock_irqsave() pair.
>
> drivers/block/virtio_blk.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 774c31d..d674977 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
> issued++;
> }
>
> - if (issued)
> - virtqueue_kick(vblk->vq);
> + if (!issued)
> + return;
> +
> + if (virtqueue_kick_prepare(vblk->vq)) {
> + spin_unlock_irq(vblk->disk->queue->queue_lock);
> + virtqueue_notify(vblk->vq);
> + spin_lock_irq(vblk->disk->queue->queue_lock);
> + }
> }
>
> /* return id (s/n) string for *disk to *id_str
--
Asias
^ permalink raw reply
* Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick
From: Asias He @ 2012-06-04 8:35 UTC (permalink / raw)
To: Rusty Russell
Cc: Stefan Hajnoczi, kvm, Christian Borntraeger, Michael S. Tsirkin,
virtualization, khoa, Tejun Heo
In-Reply-To: <877gvn3kks.fsf@rustcorp.com.au>
On 06/04/2012 09:57 AM, Rusty Russell wrote:
> On Wed, 30 May 2012 15:39:05 +0200, Christian Borntraeger<borntraeger@de.ibm.com> wrote:
>> On 30/05/12 15:19, Stefan Hajnoczi wrote:
>>> Holding the vblk->lock across kick causes poor scalability in SMP
>>> guests. If one CPU is doing virtqueue kick and another CPU touches the
>>> vblk->lock it will have to spin until virtqueue kick completes.
>>>
>>> This patch reduces system% CPU utilization in SMP guests that are
>>> running multithreaded I/O-bound workloads. The improvements are small
>>> but show as iops and SMP are increased.
>>
>> Funny, recently I got a bug report regarding spinlock lockup
>> (see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
>> Turned out that blk_done was called on many guest cpus while the guest
>> was heavily paging on one virtio block device. (and the guest had much
>> more cpus than the host)
>> This patch will probably reduce the pressure for those cases as well.
>> we can then finish requests if somebody else is doing the kick.
>>
>> IIRC there were some other approaches to address this lock holding during
>> kick but this looks like the less intrusive one.
>>
>>> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
>> Acked-by: Christian Borntraeger<borntraeger@de.ibm.com>
>
> Unfortunately, this conflicts with Asias He's deadlock fix, which has
> us just using the block-layer-supplied spinlock.
There is a v3 which solved this conflicts.
>
> If we drop the lock around the kick as you suggest, we're playing with
> fire. All the virtio backends have an atomic notify, so they're OK,
> and the block layer *looks* safe at a glance, but there's no assurances.
>
> It seems like a workaround to the fact that we don't have hcall-backed
> spinlocks like Xen, or that our virtio device is too laggy.
>
> Cheers,
> Rusty.
>
>
--
Asias
^ permalink raw reply
* Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick
From: Asias He @ 2012-06-04 8:57 UTC (permalink / raw)
To: Rusty Russell
Cc: Stefan Hajnoczi, kvm, Christian Borntraeger, Michael S. Tsirkin,
virtualization, khoa, Tejun Heo
In-Reply-To: <877gvn3kks.fsf@rustcorp.com.au>
On 06/04/2012 09:57 AM, Rusty Russell wrote:
> On Wed, 30 May 2012 15:39:05 +0200, Christian Borntraeger<borntraeger@de.ibm.com> wrote:
>> On 30/05/12 15:19, Stefan Hajnoczi wrote:
>>> Holding the vblk->lock across kick causes poor scalability in SMP
>>> guests. If one CPU is doing virtqueue kick and another CPU touches the
>>> vblk->lock it will have to spin until virtqueue kick completes.
>>>
>>> This patch reduces system% CPU utilization in SMP guests that are
>>> running multithreaded I/O-bound workloads. The improvements are small
>>> but show as iops and SMP are increased.
>>
>> Funny, recently I got a bug report regarding spinlock lockup
>> (see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
>> Turned out that blk_done was called on many guest cpus while the guest
>> was heavily paging on one virtio block device. (and the guest had much
>> more cpus than the host)
>> This patch will probably reduce the pressure for those cases as well.
>> we can then finish requests if somebody else is doing the kick.
>>
>> IIRC there were some other approaches to address this lock holding during
>> kick but this looks like the less intrusive one.
>>
>>> Signed-off-by: Stefan Hajnoczi<stefanha@linux.vnet.ibm.com>
>> Acked-by: Christian Borntraeger<borntraeger@de.ibm.com>
>
> Unfortunately, this conflicts with Asias He's deadlock fix, which has
> us just using the block-layer-supplied spinlock.
>
> If we drop the lock around the kick as you suggest, we're playing with
> fire. All the virtio backends have an atomic notify, so they're OK,
> and the block layer *looks* safe at a glance, but there's no assurances.
Why are we playing with fire if we drop the lock around the kick?
Which one do you think is un-safe, calling virtqueue_notify() with lock
dropped or dropping the lock in q->request_fn()?
> It seems like a workaround to the fact that we don't have hcall-backed
> spinlocks like Xen, or that our virtio device is too laggy.
>
> Cheers,
> Rusty.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Asias
^ permalink raw reply
* Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Michael S. Tsirkin @ 2012-06-04 11:11 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: khoa, kvm, virtualization
In-Reply-To: <1338541986-8083-1-git-send-email-stefanha@linux.vnet.ibm.com>
On Fri, Jun 01, 2012 at 10:13:06AM +0100, Stefan Hajnoczi wrote:
> Holding the vblk->lock across kick causes poor scalability in SMP
> guests. If one CPU is doing virtqueue kick and another CPU touches the
> vblk->lock it will have to spin until virtqueue kick completes.
>
> This patch reduces system% CPU utilization in SMP guests that are
> running multithreaded I/O-bound workloads. The improvements are small
> but show as iops and SMP are increased.
>
> Khoa Huynh <khoa@us.ibm.com> provided initial performance data that
> indicates this optimization is worthwhile at high iops.
>
> Asias He <asias@redhat.com> reports the following fio results:
>
> Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
> Guest: same as host kernel
>
> Average 3 runs:
> with locked kick
> read iops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
> write iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
> read iops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
> write iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
> clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
> clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
> clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
> clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
> cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08 minf=17907363.00
> cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34 minf=20020008.50
> cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39 minf=19737254.00
> cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56 minf=18256760.50
>
> with unlocked kick
> read iops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
> write iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
> read iops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
> write iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
> clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
> clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
> clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
> clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
> cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91 minf=17907893.33
> cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04 minf=19757720.00
> cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25 minf=19349958.33
> cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35 minf=19092343.00
>
> FIO config file:
>
> [global]
> exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
> group_reporting
> norandommap
> ioscheduler=noop
> thread
> bs=512
> size=4MB
> direct=1
> filename=/dev/vdb
> numjobs=256
> ioengine=aio
> iodepth=64
> loops=3
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
> Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed that.
> To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
> irqs with spin_unlock_irq(). If the caller of blk_run_queue() had irqs
> disabled and we enable them again this could be a problem, right? Can someone
> more familiar with kernel locking comment?
>
> drivers/block/virtio_blk.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 774c31d..d674977 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
> issued++;
> }
>
> - if (issued)
> - virtqueue_kick(vblk->vq);
> + if (!issued)
> + return;
> +
> + if (virtqueue_kick_prepare(vblk->vq)) {
> + spin_unlock_irq(vblk->disk->queue->queue_lock);
> + virtqueue_notify(vblk->vq);
If blk_done runs and completes the request at this point,
can hot unplug then remove the queue?
If yes will we get a use after free?
> + spin_lock_irq(vblk->disk->queue->queue_lock);
> + }
> }
>
> /* return id (s/n) string for *disk to *id_str
> --
> 1.7.10
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Michael S. Tsirkin @ 2012-06-04 11:15 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: khoa, kvm, virtualization
In-Reply-To: <1338541986-8083-1-git-send-email-stefanha@linux.vnet.ibm.com>
On Fri, Jun 01, 2012 at 10:13:06AM +0100, Stefan Hajnoczi wrote:
> Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed that.
> To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
> irqs with spin_unlock_irq(). If the caller of blk_run_queue() had irqs
> disabled and we enable them again this could be a problem, right? Can someone
> more familiar with kernel locking comment?
Why take the risk? What's the advantage of enabling them here? VCPU is
not running while the hypervisor is processing the notification anyway.
And the next line returns from the function so the interrupts will get
enabled.
> drivers/block/virtio_blk.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 774c31d..d674977 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
> issued++;
> }
>
> - if (issued)
> - virtqueue_kick(vblk->vq);
> + if (!issued)
> + return;
> +
> + if (virtqueue_kick_prepare(vblk->vq)) {
> + spin_unlock_irq(vblk->disk->queue->queue_lock);
> + virtqueue_notify(vblk->vq);
> + spin_lock_irq(vblk->disk->queue->queue_lock);
> + }
> }
>
> /* return id (s/n) string for *disk to *id_str
> --
> 1.7.10
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick
From: Khoa Huynh @ 2012-06-04 21:13 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: virtualization, kvm, Michael S. Tsirkin
In-Reply-To: <1338541986-8083-1-git-send-email-stefanha@linux.vnet.ibm.com>
[-- Attachment #1.1: Type: text/plain, Size: 5644 bytes --]
Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> wrote on 06/01/2012 04:13:06
AM:
>
> Holding the vblk->lock across kick causes poor scalability in SMP
> guests. If one CPU is doing virtqueue kick and another CPU touches the
> vblk->lock it will have to spin until virtqueue kick completes.
>
> This patch reduces system% CPU utilization in SMP guests that are
> running multithreaded I/O-bound workloads. The improvements are small
> but show as iops and SMP are increased.
>
> Khoa Huynh <khoa@us.ibm.com> provided initial performance data that
> indicates this optimization is worthwhile at high iops.
Hi Stefan, I was out on vacation last week, so I could not post the details
of my performance testing with this unlocked-kick patch.
Anyway, here are more details:
Test environment:
- Host: IBM x3850 X5 server (40 cores, 256GB) running RHEL6.2
- Guest: 20 vcpus, 16GB, running Linux 3.4 kernel (w/ and w/o this patch)
- Storage: 12 block devices (from 3 SCSI target servers)
- Workload: FIO benchmark with 8KB random reads and writes (50% split)
With RHEL6.2 qemu-kvm:
- Locked kick = 44,529 IOPS
- Unlocked kick = 44,869 IOPS ==> 0.7% gain
With Stefan's "data-plane" qemu-kvm (experimental qemu to get around
qemu mutex):
- Locked kick = 497,943 IOPS
- Unlocked kick = 532,446 IOPS ==> 6.9% gain
I also tested on a smaller host and smaller guest (4 vcpus, 4GB, 4 virtual
block devices) and applied some performance tuning (e.g. smaller data sets
so most data fit into host's cache, 1KB reads) to deliver the highest I/O
rates possible to the guest with the existing RHEL6.2 qemu-kvm:
- Locked kick = 140,533 IOPS
- Unlocked kick = 143,415 IOPS ==> 2.1% gain
As you can see, the higher the I/O rate, the more performance benefit
we would get from Stefan's unlocked-kick patch. In my performance testing,
the highest I/O rate that I could achieve with the existing KVM/QEMU was
~140,000+ IOPS, so the most performance gain that we could get from this
unlocked-kick patch in this case was about 2-3%. However, going forward,
as we relieve the qemu mutex more and more, we should be able to get much
higher I/O rates (500,000 IOPS or higher), and so I believe that the
performance benefit of this unlocked-kick patch would be more apparent.
If you need further information, please let me know.
Thanks,
-Khoa
>
> Asias He <asias@redhat.com> reports the following fio results:
>
> Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
> Guest: same as host kernel
>
> Average 3 runs:
> with locked kick
> read iops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
> write iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
> read iops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
> write iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
> clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
> clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
> clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
> clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
> cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08
minf=17907363.00
> cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34
minf=20020008.50
> cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39
minf=19737254.00
> cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56
minf=18256760.50
>
> with unlocked kick
> read iops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
> write iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
> read iops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
> write iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
> clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
> clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
> clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
> clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
> cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91
minf=17907893.33
> cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04
minf=19757720.00
> cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25
minf=19349958.33
> cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35
minf=19092343.00
>
> FIO config file:
>
> [global]
> exec_prerun="echo 3 > /proc/sys/vm/drop_caches"
> group_reporting
> norandommap
> ioscheduler=noop
> thread
> bs=512
> size=4MB
> direct=1
> filename=/dev/vdb
> numjobs=256
> ioengine=aio
> iodepth=64
> loops=3
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
> Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I
> followed that.
> To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() butwe
enable
> irqs with spin_unlock_irq(). If the caller of blk_run_queue() had irqs
> disabled and we enable them again this could be a problem, right? Can
someone
> more familiar with kernel locking comment?
>
> drivers/block/virtio_blk.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 774c31d..d674977 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue
*q)
> issued++;
> }
>
> - if (issued)
> - virtqueue_kick(vblk->vq);
> + if (!issued)
> + return;
> +
> + if (virtqueue_kick_prepare(vblk->vq)) {
> + spin_unlock_irq(vblk->disk->queue->queue_lock);
> + virtqueue_notify(vblk->vq);
> + spin_lock_irq(vblk->disk->queue->queue_lock);
> + }
> }
>
> /* return id (s/n) string for *disk to *id_str
> --
> 1.7.10
>
[-- Attachment #1.2: Type: text/html, Size: 8132 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [vmw_vmci RFC 00/11] VMCI for Linux
From: Greg KH @ 2012-06-04 22:57 UTC (permalink / raw)
To: Andy King
Cc: dtor, linux-kernel, virtualization, dsouders,
Andrew Stiegmann (stieg), akpm, cschamp
In-Reply-To: <552579991.43606.1338564781979.JavaMail.root@vmware.com>
On Fri, Jun 01, 2012 at 08:33:02AM -0700, Andy King wrote:
> Greg,
>
> Thanks so much for the comments and apologies for the delayed response.
>
> > Don't we have something like this already for KVM and maybe Xen?
> > virtio? Can't you use that code instead of a new block of code that
> > is only used by vmware users? It has virtual pci devices which
> > should give you what you want/need here, right?
> >
> > If not, why doesn't that work for you? Would it be easier to just
> > extend it?
>
> The VMCI virtual device for which this driver is intended has been
> around a lot longer than this submission might suggest. The virtual
> hardware was released in a product before Rusty sent his RFC and
> quite a bit before it made it to mainline; there was, regrettably,
> no virtio then.
>
> As such, it was designed to be its own transport, and it's something
> that is now very much fixed at the hardware level (enhancements
> not withstanding), and which we have to support all the way back.
What "hardware" are you refering to here?
> In addition to that, our hypervisor endpoints are written using
> the existing device backend; virtio doesn't currently make a lot of
> sense for them, and would require a lot of additional work.
>
> All of this is unfortunate. While I agree that virtio is certainly
> the right approach, and we need to avoid this proliferation, I think
> at this point we'd really like to try and upstream this in its current
> form. There's certainly the possibility going forwards that we could
> add a glue layer, such that other clients could use virtio if they're
> willing to write their own hypervisor endpoints.
>
> Does that sound reasonable?
Not really, why should we take an interface that is tied to something
that you are saying isn't something we should be using? Don't you also
have control over the hypervisor side of things in order to properly
design this type of thing?
thanks,
greg k-h
^ permalink raw reply
* Re: [vmw_vmci RFC 00/11] VMCI for Linux
From: Dmitry Torokhov @ 2012-06-05 7:02 UTC (permalink / raw)
To: Greg KH
Cc: Andy King, linux-kernel, virtualization, dsouders,
Andrew Stiegmann (stieg), akpm, cschamp
In-Reply-To: <20120604225757.GC7041@kroah.com>
Hi Greg,
On Mon, Jun 04, 2012 at 03:57:57PM -0700, Greg KH wrote:
> On Fri, Jun 01, 2012 at 08:33:02AM -0700, Andy King wrote:
> > Greg,
> >
> > Thanks so much for the comments and apologies for the delayed response.
> >
> > > Don't we have something like this already for KVM and maybe Xen?
> > > virtio? Can't you use that code instead of a new block of code that
> > > is only used by vmware users? It has virtual pci devices which
> > > should give you what you want/need here, right?
> > >
> > > If not, why doesn't that work for you? Would it be easier to just
> > > extend it?
> >
> > The VMCI virtual device for which this driver is intended has been
> > around a lot longer than this submission might suggest. The virtual
> > hardware was released in a product before Rusty sent his RFC and
> > quite a bit before it made it to mainline; there was, regrettably,
> > no virtio then.
> >
> > As such, it was designed to be its own transport, and it's something
> > that is now very much fixed at the hardware level (enhancements
> > not withstanding), and which we have to support all the way back.
>
> What "hardware" are you refering to here?
The virtual hardware that is currently shipping and has been shipping
for a few years.
>
> > In addition to that, our hypervisor endpoints are written using
> > the existing device backend; virtio doesn't currently make a lot of
> > sense for them, and would require a lot of additional work.
> >
> > All of this is unfortunate. While I agree that virtio is certainly
> > the right approach, and we need to avoid this proliferation, I think
> > at this point we'd really like to try and upstream this in its current
> > form. There's certainly the possibility going forwards that we could
> > add a glue layer, such that other clients could use virtio if they're
> > willing to write their own hypervisor endpoints.
> >
> > Does that sound reasonable?
>
> Not really, why should we take an interface that is tied to something
> that you are saying isn't something we should be using?
That is not what Andy said. If virtio was available when we started
shipping VMCI then we certainly could have used that, but since it
wasn't there we invented something else.
> Don't you also
> have control over the hypervisor side of things in order to properly
> design this type of thing?
We do not have a time machine to go back and change products that we
already shipped to the customers. It is probably the same story as with
Hyper-V's vmbus which is not virtio.
Besides, virtio is not available on non-Linux guests with we have to
support as well, and than affected the design decisions in hypervisor
layer that have been made several years ago.
Thanks,
Dmitry
^ permalink raw reply
* [net-next RFC PATCH] virtio_net: collect satistics and export through ethtool
From: Jason Wang @ 2012-06-05 8:38 UTC (permalink / raw)
To: netdev, rusty, virtualization, linux-kernel, mst
Satistics counters is useful for debugging and performance optimization, so this
patch lets virtio_net driver collect following and export them to userspace
through "ethtool -S":
- number of packets sent/received
- number of bytes sent/received
- number of callbacks for tx/rx
- number of kick for tx/rx
- number of bytes/packets queued for tx
As virtnet_stats were per-cpu, so both per-cpu and gloabl satistics were exposed
like:
NIC statistics:
tx_bytes[0]: 2551
tx_packets[0]: 12
tx_kick[0]: 12
tx_callbacks[0]: 1
tx_queued_packets[0]: 12
tx_queued_bytes[0]: 3055
rx_bytes[0]: 0
rx_packets[0]: 0
rx_kick[0]: 0
rx_callbacks[0]: 0
tx_bytes[1]: 5575
tx_packets[1]: 37
tx_kick[1]: 38
tx_callbacks[1]: 0
tx_queued_packets[1]: 38
tx_queued_bytes[1]: 5217
rx_bytes[1]: 4175
rx_packets[1]: 25
rx_kick[1]: 1
rx_callbacks[1]: 16
tx_bytes: 8126
tx_packets: 49
tx_kick: 50
tx_callbacks: 1
tx_queued_packets: 50
tx_queued_bytes: 8272
rx_bytes: 4175
rx_packets: 25
rx_kick: 1
rx_callbacks: 16
TODO:
- more satistics
- unitfy the ndo_get_stats64 and get_ethtool_stats
- calculate the pending bytes/pkts
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 130 +++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..7ab0cc1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,6 +41,10 @@ module_param(gso, bool, 0444);
#define VIRTNET_SEND_COMMAND_SG_MAX 2
#define VIRTNET_DRIVER_VERSION "1.0.0"
+#define VIRTNET_STAT_OFF(m) offsetof(struct virtnet_stats, m)
+#define VIRTNET_STAT(stat, i) (*((u64 *)((char *)stat + \
+ virtnet_stats_str_attr[i].stat_offset)))
+
struct virtnet_stats {
struct u64_stats_sync syncp;
u64 tx_bytes;
@@ -48,8 +52,33 @@ struct virtnet_stats {
u64 rx_bytes;
u64 rx_packets;
+
+ u64 tx_kick;
+ u64 rx_kick;
+ u64 tx_callbacks;
+ u64 rx_callbacks;
+ u64 tx_queued_packets;
+ u64 tx_queued_bytes;
+};
+
+static struct {
+ char string[ETH_GSTRING_LEN];
+ int stat_offset;
+} virtnet_stats_str_attr[] = {
+ { "tx_bytes", VIRTNET_STAT_OFF(tx_bytes)},
+ { "tx_packets", VIRTNET_STAT_OFF(tx_packets)},
+ { "tx_kick", VIRTNET_STAT_OFF(tx_kick)},
+ { "tx_callbacks", VIRTNET_STAT_OFF(tx_callbacks)},
+ { "tx_queued_packets", VIRTNET_STAT_OFF(tx_queued_packets)},
+ { "tx_queued_bytes", VIRTNET_STAT_OFF(tx_queued_bytes)},
+ { "rx_bytes" , VIRTNET_STAT_OFF(rx_bytes)},
+ { "rx_packets", VIRTNET_STAT_OFF(rx_packets)},
+ { "rx_kick", VIRTNET_STAT_OFF(rx_kick)},
+ { "rx_callbacks", VIRTNET_STAT_OFF(rx_callbacks)},
};
+#define VIRTNET_NUM_STATS ARRAY_SIZE(virtnet_stats_str_attr)
+
struct virtnet_info {
struct virtio_device *vdev;
struct virtqueue *rvq, *svq, *cvq;
@@ -142,6 +171,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
static void skb_xmit_done(struct virtqueue *svq)
{
struct virtnet_info *vi = svq->vdev->priv;
+ struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+
+ u64_stats_update_begin(&stats->syncp);
+ stats->tx_callbacks++;
+ u64_stats_update_end(&stats->syncp);
/* Suppress further interrupts. */
virtqueue_disable_cb(svq);
@@ -461,6 +495,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
{
int err;
bool oom;
+ struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
do {
if (vi->mergeable_rx_bufs)
@@ -477,13 +512,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
} while (err > 0);
if (unlikely(vi->num > vi->max))
vi->max = vi->num;
- virtqueue_kick(vi->rvq);
+ if (virtqueue_kick_prepare(vi->rvq)) {
+ virtqueue_notify(vi->rvq);
+ u64_stats_update_begin(&stats->syncp);
+ stats->rx_kick++;
+ u64_stats_update_end(&stats->syncp);
+ }
return !oom;
}
static void skb_recv_done(struct virtqueue *rvq)
{
struct virtnet_info *vi = rvq->vdev->priv;
+ struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+
+ u64_stats_update_begin(&stats->syncp);
+ stats->rx_callbacks++;
+ u64_stats_update_end(&stats->syncp);
+
/* Schedule NAPI, Suppress further interrupts if successful. */
if (napi_schedule_prep(&vi->napi)) {
virtqueue_disable_cb(rvq);
@@ -626,7 +672,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct virtnet_info *vi = netdev_priv(dev);
+ struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
int capacity;
+ bool kick;
/* Free up any pending old buffers before queueing new ones. */
free_old_xmit_skbs(vi);
@@ -651,7 +699,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
kfree_skb(skb);
return NETDEV_TX_OK;
}
- virtqueue_kick(vi->svq);
+
+ kick = virtqueue_kick_prepare(vi->svq);
+ if (kick)
+ virtqueue_notify(vi->svq);
+
+ u64_stats_update_begin(&stats->syncp);
+ if (kick)
+ stats->tx_kick++;
+ stats->tx_queued_bytes += skb->len;
+ stats->tx_queued_packets++;
+ u64_stats_update_end(&stats->syncp);
/* Don't wait up for transmitted skbs to be freed. */
skb_orphan(skb);
@@ -926,7 +984,6 @@ static void virtnet_get_ringparam(struct net_device *dev,
}
-
static void virtnet_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
{
@@ -939,10 +996,77 @@ static void virtnet_get_drvinfo(struct net_device *dev,
}
+static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+ int i, cpu;
+ switch (stringset) {
+ case ETH_SS_STATS:
+ for_each_possible_cpu(cpu)
+ for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+ sprintf(buf, "%s[%u]",
+ virtnet_stats_str_attr[i].string, cpu);
+ buf += ETH_GSTRING_LEN;
+ }
+ for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+ memcpy(buf, virtnet_stats_str_attr[i].string,
+ ETH_GSTRING_LEN);
+ buf += ETH_GSTRING_LEN;
+ }
+ break;
+ }
+}
+
+static int virtnet_get_sset_count(struct net_device *dev, int sset)
+{
+ switch (sset) {
+ case ETH_SS_STATS:
+ return VIRTNET_NUM_STATS * (num_online_cpus() + 1);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static void virtnet_get_ethtool_stats(struct net_device *dev,
+ struct ethtool_stats *stats, u64 *buf)
+{
+ struct virtnet_info *vi = netdev_priv(dev);
+ int cpu, i;
+ unsigned int start;
+ struct virtnet_stats sample, total;
+
+ memset(&total, 0, sizeof(total));
+ memset(&sample, 0, sizeof(sample));
+
+ for_each_possible_cpu(cpu) {
+ struct virtnet_stats *stats = per_cpu_ptr(vi->stats, cpu);
+ do {
+ start = u64_stats_fetch_begin(&stats->syncp);
+ for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+ VIRTNET_STAT(&sample, i) =
+ VIRTNET_STAT(stats, i);
+
+ }
+ } while (u64_stats_fetch_retry(&stats->syncp, start));
+ for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+ *buf = VIRTNET_STAT(&sample, i);
+ VIRTNET_STAT(&total, i) += VIRTNET_STAT(stats, i);
+ buf++;
+ }
+ }
+
+ for (i = 0; i < VIRTNET_NUM_STATS; i++) {
+ *buf = VIRTNET_STAT(&total, i);
+ buf++;
+ }
+}
+
static const struct ethtool_ops virtnet_ethtool_ops = {
.get_drvinfo = virtnet_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_ringparam = virtnet_get_ringparam,
+ .get_ethtool_stats = virtnet_get_ethtool_stats,
+ .get_strings = virtnet_get_strings,
+ .get_sset_count = virtnet_get_sset_count,
};
#define MIN_MTU 68
^ permalink raw reply related
* Re: Question regarding Virtio Console and Remoteproc
From: Amit Shah @ 2012-06-05 9:43 UTC (permalink / raw)
To: Sjur BRENDELAND
Cc: Linus Walleij, Arnd Bergmann,
virtualization@lists.linux-foundation.org
In-Reply-To: <81C3A93C17462B4BBD7E272753C10579232F4FB338@EXDCVYMBSTM005.EQ1STM.local>
Hi Sjur,
On (Fri) 01 Jun 2012 [09:31:30], Sjur BRENDELAND wrote:
> Hi Amit and Rusty,
>
> I've been looking into the possibility of using the Virtio Console
> Driver together with the remoteproc framework to communicate with
> ST-Ericsson modem over shared memory.
>
> It seems like Virtio Console would be a good fit, except for a issue
> with buffer allocation. Due to HW limitations the STE-Modem cannot
> access kernel memory (no IOMMU and limited address range). Instead
> we have a designated shared memory region used for IPC.
>
> Due to this I cannot use kmalloc() for buffer allocation, but I
> have to allocate buffers from the memory region shared with the
> modem.
>
> In remoteproc this is solved by using dma_alloc_coherent() for all
> memory to be shared with the modem. This works fine for me, because
> I can pass the IPC memory region to dma_declare_coherent_memory()
> so dma_alloc_coherent() will allocate from this memory region.
>
> I think I can solve this issue in Virtio Console by changing calls
> to kmalloc() to something like:
>
> if (virtio_has_feature(vdev, VIRTIO_CONSOLE_USE_DMA_MEM)) {
> dma_addr_t dma;
> buf = dma_alloc_coherent(dev, size, &dma, GFP_KERNEL);
> } else
> buf = kmalloc(count, GFP_KERNEL);
>
> I'd like to get the opinion from you virtualization folks on this!
> If you think it looks reasonable I might start cooking some patches...
I don't have a problem with this.
Thanks,
Amit
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox