* Re: [PATCH net-next v2 6/9] net: use core MTU range checking in virt drivers
From: Jarod Wilson @ 2016-10-21 2:37 UTC
To: Michael S. Tsirkin
Cc: Aaron Conole, David Kershner, Wei Liu, VMware, Inc., netdev,
Haiyang Zhang, linux-kernel, virtualization, Paul Durrant,
Shrikrishna Khare
In-Reply-To: <20161020231559-mutt-send-email-mst@kernel.org>
On Thu, Oct 20, 2016 at 11:23:54PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 20, 2016 at 01:55:21PM -0400, Jarod Wilson wrote:
...
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index fad84f3..720809f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1419,17 +1419,6 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
> > .set_settings = virtnet_set_settings,
> > };
> >
> > -#define MIN_MTU 68
> > -#define MAX_MTU 65535
> > -
> > -static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> > -{
> > - if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
> > - return -EINVAL;
> > - dev->mtu = new_mtu;
> > - return 0;
> > -}
> > -
> > static const struct net_device_ops virtnet_netdev = {
> > .ndo_open = virtnet_open,
> > .ndo_stop = virtnet_close,
> > @@ -1437,7 +1426,6 @@ static const struct net_device_ops virtnet_netdev = {
> > .ndo_validate_addr = eth_validate_addr,
> > .ndo_set_mac_address = virtnet_set_mac_address,
> > .ndo_set_rx_mode = virtnet_set_rx_mode,
> > - .ndo_change_mtu = virtnet_change_mtu,
> > .ndo_get_stats64 = virtnet_stats,
> > .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> > .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> > @@ -1748,6 +1736,9 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
> > return true;
> > }
> >
> > +#define MIN_MTU ETH_MIN_MTU
> > +#define MAX_MTU ETH_MAX_MTU
> > +
>
> Can we drop these btw?
Bah. Yeah. Should have just used them directly. I didn't add ETH_MAX_MTU
until after doing the virtio_net changes, so I missed that.
> > static int virtnet_probe(struct virtio_device *vdev)
> > {
> > int i, err;
> > @@ -1821,6 +1812,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> >
> > dev->vlan_features = dev->features;
> >
> > + /* MTU range: 68 - 65535 */
> > + dev->min_mtu = MIN_MTU;
> > + dev->max_mtu = MAX_MTU;
> > +
> > /* Configuration may specify what MAC to use. Otherwise random. */
> > if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> > virtio_cread_bytes(vdev,
> > @@ -1875,8 +1870,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> > mtu = virtio_cread16(vdev,
> > offsetof(struct virtio_net_config,
> > mtu));
> > - if (virtnet_change_mtu(dev, mtu))
> > + if (mtu < dev->min_mtu || mtu > dev->max_mtu)
>
> In fact the > max_mtu branch does not make sense since a 16 bit
> value can't exceed MAX_MTU.
Hm. mtu is declared as an int, not sure if there's any sort of type
promotion to be worried about (not an area I know much/anything about).
Certainly something that could be looked into as a minor optimization,
though it's only in a probe path and shouldn't hurt anything, so ... meh?
--
Jarod Wilson
jarod@redhat.com
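On the type-promotion question above, here is a minimal userspace sketch, assuming the value read from config space is an unsigned 16-bit quantity (as the virtio_cread16() call suggests) that is then stored in an int. The helper name mtu_in_range and the SKETCH_* constants are illustrative only and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch comment "MTU range: 68 - 65535". */
#define SKETCH_MIN_MTU 68
#define SKETCH_MAX_MTU 65535

/* Hypothetical helper: widen a 16-bit config value to int, then range-check. */
static bool mtu_in_range(uint16_t raw)
{
	int mtu = raw;	/* value-preserving conversion: result is 0..65535 */

	/* Holds for every uint16_t input, so the upper-bound test is redundant. */
	assert(mtu <= SKETCH_MAX_MTU);

	return mtu >= SKETCH_MIN_MTU && mtu <= SKETCH_MAX_MTU;
}
```

Under that assumption the conversion from a 16-bit unsigned value to int never changes the value, so only the lower bound can reject anything.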
* Re: [PATCH v5 9/9] Documentation: virtual: kvm: Support vcpu preempted check
From: Pan Xinhui @ 2016-10-21 1:42 UTC
To: Boqun Feng, Pan Xinhui
Cc: kernellwp, linux-s390, jgross, kvm, rkrcmar, peterz,
xen-devel-request, will.deacon, linux-kernel, virtualization,
mingo, paulus, mpe, benh, pbonzini, paulmck, linuxppc-dev
In-Reply-To: <20161021012348.GC8429@tardis.cn.ibm.com>
On 2016/10/21 09:23, Boqun Feng wrote:
> On Thu, Oct 20, 2016 at 05:27:54PM -0400, Pan Xinhui wrote:
>> Commit ("x86, kvm: support vcpu preempted check") add one field "__u8
>> preempted" into struct kvm_steal_time. This field tells if one vcpu is
>> running or not.
>>
>> It is zero if 1) some old KVM deos not support this filed. 2) the vcpu is
>> preempted. Other values means the vcpu has been preempted.
> ^^^^^^^^^
> s/preempted/not preempted
>
Yes. The missing *not* definitely should be avoided.
> And better to fix other typos in the commit log ;-)
> Maybe you can try aspell? That works for me.
>
I will try it. :)
> Regards,
> Boqun
>
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>> Documentation/virtual/kvm/msr.txt | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
>> index 2a71c8f..3376f13 100644
>> --- a/Documentation/virtual/kvm/msr.txt
>> +++ b/Documentation/virtual/kvm/msr.txt
>> @@ -208,7 +208,8 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
>> __u64 steal;
>> __u32 version;
>> __u32 flags;
>> - __u32 pad[12];
>> + __u8 preempted;
>> + __u32 pad[11];
>> }
>>
>> whose data will be filled in by the hypervisor periodically. Only one
>> @@ -232,6 +233,11 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
>> nanoseconds. Time during which the vcpu is idle, will not be
>> reported as steal time.
>>
>> + preempted: indicate the VCPU who owns this struct is running or
>> + not. Non-zero values mean the VCPU has been preempted. Zero
>> + means the VCPU is not preempted. NOTE, it is always zero if the
>> + hypervisor doesn't support this field.
>> +
>> MSR_KVM_EOI_EN: 0x4b564d04
>> data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
>> when disabled. Bit 1 is reserved and must be zero. When PV end of
>> --
>> 2.4.11
>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: [PATCH v5 9/9] Documentation: virtual: kvm: Support vcpu preempted check
From: Boqun Feng @ 2016-10-21 1:23 UTC
To: Pan Xinhui
Cc: kernellwp, linux-s390, jgross, kvm, rkrcmar, peterz,
xen-devel-request, will.deacon, linux-kernel, virtualization,
mingo, paulus, mpe, benh, pbonzini, paulmck, linuxppc-dev
In-Reply-To: <1476998874-2089-10-git-send-email-xinhui.pan@linux.vnet.ibm.com>
On Thu, Oct 20, 2016 at 05:27:54PM -0400, Pan Xinhui wrote:
> Commit ("x86, kvm: support vcpu preempted check") add one field "__u8
> preempted" into struct kvm_steal_time. This field tells if one vcpu is
> running or not.
>
> It is zero if 1) some old KVM deos not support this filed. 2) the vcpu is
> preempted. Other values means the vcpu has been preempted.
^^^^^^^^^
s/preempted/not preempted
And better to fix other typos in the commit log ;-)
Maybe you can try aspell? That works for me.
Regards,
Boqun
>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> Documentation/virtual/kvm/msr.txt | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
> index 2a71c8f..3376f13 100644
> --- a/Documentation/virtual/kvm/msr.txt
> +++ b/Documentation/virtual/kvm/msr.txt
> @@ -208,7 +208,8 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
> __u64 steal;
> __u32 version;
> __u32 flags;
> - __u32 pad[12];
> + __u8 preempted;
> + __u32 pad[11];
> }
>
> whose data will be filled in by the hypervisor periodically. Only one
> @@ -232,6 +233,11 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
> nanoseconds. Time during which the vcpu is idle, will not be
> reported as steal time.
>
> + preempted: indicate the VCPU who owns this struct is running or
> + not. Non-zero values mean the VCPU has been preempted. Zero
> + means the VCPU is not preempted. NOTE, it is always zero if the
> + hypervisor doesn't support this field.
> +
> MSR_KVM_EOI_EN: 0x4b564d04
> data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
> when disabled. Bit 1 is reserved and must be zero. When PV end of
> --
> 2.4.11
>
* [PATCH v5 9/9] Documentation: virtual: kvm: Support vcpu preempted check
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
Commit ("x86, kvm: support vcpu preempted check") add one field "__u8
preempted" into struct kvm_steal_time. This field tells if one vcpu is
running or not.
It is zero if 1) some old KVM deos not support this filed. 2) the vcpu is
preempted. Other values means the vcpu has been preempted.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
Documentation/virtual/kvm/msr.txt | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 2a71c8f..3376f13 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -208,7 +208,8 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
__u64 steal;
__u32 version;
__u32 flags;
- __u32 pad[12];
+ __u8 preempted;
+ __u32 pad[11];
}
whose data will be filled in by the hypervisor periodically. Only one
@@ -232,6 +233,11 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
nanoseconds. Time during which the vcpu is idle, will not be
reported as steal time.
+ preempted: indicate the VCPU who owns this struct is running or
+ not. Non-zero values mean the VCPU has been preempted. Zero
+ means the VCPU is not preempted. NOTE, it is always zero if the
+ hypervisor doesn't support this field.
+
MSR_KVM_EOI_EN: 0x4b564d04
data: Bit 0 is 1 when PV end of interrupt is enabled on the vcpu; 0
when disabled. Bit 1 is reserved and must be zero. When PV end of
--
2.4.11
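The layout change in this patch (one __u8 added, one pad word removed) can be checked with a small userspace sketch. It mirrors the two struct definitions with fixed-width types; the names steal_time_old, steal_time_new and check_layout are made up for illustration, and the sizes assume a common ABI where a 32-bit integer is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch. */
struct steal_time_old {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t pad[12];
};

/* Layout after the patch: one byte for "preempted", one pad word fewer. */
struct steal_time_new {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;
	uint32_t pad[11];
};

/* Hypothetical check: the record keeps its size and existing field offsets. */
void check_layout(void)
{
	assert(sizeof(struct steal_time_old) == 64);
	assert(sizeof(struct steal_time_new) == 64);
	assert(offsetof(struct steal_time_new, flags) ==
	       offsetof(struct steal_time_old, flags));
	/* The new byte takes the place of the old pad[0]. */
	assert(offsetof(struct steal_time_new, preempted) ==
	       offsetof(struct steal_time_old, pad));
}
```

The three bytes after preempted become implicit compiler padding, which is why the total stays at 64 bytes under these assumptions.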
* [PATCH v5 8/9] s390/spinlock: Provide vcpu_is_preempted
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, rkrcmar, peterz, benh, will.deacon, mingo,
paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
From: Christian Borntraeger <borntraeger@de.ibm.com>
This implements the s390 backend for commit
"kernel/sched: introduce vcpu preempted check interface"
by reworking the existing smp_vcpu_scheduled into
arch_vcpu_is_preempted. We can then also get rid of the
local cpu_is_preempted function by moving the
CIF_ENABLED_WAIT test into arch_vcpu_is_preempted.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/include/asm/spinlock.h | 8 ++++++++
arch/s390/kernel/smp.c | 9 +++++++--
arch/s390/lib/spinlock.c | 25 ++++++++-----------------
3 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index 7e9e09f..7ecd890 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -23,6 +23,14 @@ _raw_compare_and_swap(unsigned int *lock, unsigned int old, unsigned int new)
return __sync_bool_compare_and_swap(lock, old, new);
}
+#ifndef CONFIG_SMP
+static inline bool arch_vcpu_is_preempted(int cpu) { return false; }
+#else
+bool arch_vcpu_is_preempted(int cpu);
+#endif
+
+#define vcpu_is_preempted arch_vcpu_is_preempted
+
/*
* Simple spin lock operations. There are two variants, one clears IRQ's
* on the local processor, one does not.
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 35531fe..b988ed1 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -368,10 +368,15 @@ int smp_find_processor_id(u16 address)
return -1;
}
-int smp_vcpu_scheduled(int cpu)
+bool arch_vcpu_is_preempted(int cpu)
{
- return pcpu_running(pcpu_devices + cpu);
+ if (test_cpu_flag_of(CIF_ENABLED_WAIT, cpu))
+ return false;
+ if (pcpu_running(pcpu_devices + cpu))
+ return false;
+ return true;
}
+EXPORT_SYMBOL(arch_vcpu_is_preempted);
void smp_yield_cpu(int cpu)
{
diff --git a/arch/s390/lib/spinlock.c b/arch/s390/lib/spinlock.c
index e5f50a7..e48a48e 100644
--- a/arch/s390/lib/spinlock.c
+++ b/arch/s390/lib/spinlock.c
@@ -37,15 +37,6 @@ static inline void _raw_compare_and_delay(unsigned int *lock, unsigned int old)
asm(".insn rsy,0xeb0000000022,%0,0,%1" : : "d" (old), "Q" (*lock));
}
-static inline int cpu_is_preempted(int cpu)
-{
- if (test_cpu_flag_of(CIF_ENABLED_WAIT, cpu))
- return 0;
- if (smp_vcpu_scheduled(cpu))
- return 0;
- return 1;
-}
-
void arch_spin_lock_wait(arch_spinlock_t *lp)
{
unsigned int cpu = SPINLOCK_LOCKVAL;
@@ -62,7 +53,7 @@ void arch_spin_lock_wait(arch_spinlock_t *lp)
continue;
}
/* First iteration: check if the lock owner is running. */
- if (first_diag && cpu_is_preempted(~owner)) {
+ if (first_diag && arch_vcpu_is_preempted(~owner)) {
smp_yield_cpu(~owner);
first_diag = 0;
continue;
@@ -81,7 +72,7 @@ void arch_spin_lock_wait(arch_spinlock_t *lp)
* yield the CPU unconditionally. For LPAR rely on the
* sense running status.
*/
- if (!MACHINE_IS_LPAR || cpu_is_preempted(~owner)) {
+ if (!MACHINE_IS_LPAR || arch_vcpu_is_preempted(~owner)) {
smp_yield_cpu(~owner);
first_diag = 0;
}
@@ -108,7 +99,7 @@ void arch_spin_lock_wait_flags(arch_spinlock_t *lp, unsigned long flags)
continue;
}
/* Check if the lock owner is running. */
- if (first_diag && cpu_is_preempted(~owner)) {
+ if (first_diag && arch_vcpu_is_preempted(~owner)) {
smp_yield_cpu(~owner);
first_diag = 0;
continue;
@@ -127,7 +118,7 @@ void arch_spin_lock_wait_flags(arch_spinlock_t *lp, unsigned long flags)
* yield the CPU unconditionally. For LPAR rely on the
* sense running status.
*/
- if (!MACHINE_IS_LPAR || cpu_is_preempted(~owner)) {
+ if (!MACHINE_IS_LPAR || arch_vcpu_is_preempted(~owner)) {
smp_yield_cpu(~owner);
first_diag = 0;
}
@@ -165,7 +156,7 @@ void _raw_read_lock_wait(arch_rwlock_t *rw)
owner = 0;
while (1) {
if (count-- <= 0) {
- if (owner && cpu_is_preempted(~owner))
+ if (owner && arch_vcpu_is_preempted(~owner))
smp_yield_cpu(~owner);
count = spin_retry;
}
@@ -211,7 +202,7 @@ void _raw_write_lock_wait(arch_rwlock_t *rw, unsigned int prev)
owner = 0;
while (1) {
if (count-- <= 0) {
- if (owner && cpu_is_preempted(~owner))
+ if (owner && arch_vcpu_is_preempted(~owner))
smp_yield_cpu(~owner);
count = spin_retry;
}
@@ -241,7 +232,7 @@ void _raw_write_lock_wait(arch_rwlock_t *rw)
owner = 0;
while (1) {
if (count-- <= 0) {
- if (owner && cpu_is_preempted(~owner))
+ if (owner && arch_vcpu_is_preempted(~owner))
smp_yield_cpu(~owner);
count = spin_retry;
}
@@ -285,7 +276,7 @@ void arch_lock_relax(unsigned int cpu)
{
if (!cpu)
return;
- if (MACHINE_IS_LPAR && !cpu_is_preempted(~cpu))
+ if (MACHINE_IS_LPAR && !arch_vcpu_is_preempted(~cpu))
return;
smp_yield_cpu(~cpu);
}
--
2.4.11
* [PATCH v5 7/9] x86, xen: support vcpu preempted check
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
From: Juergen Gross <jgross@suse.com>
Support the vcpu_is_preempted() functionality under Xen. This will
enhance lock performance on overcommitted hosts (more runnable vcpus
than physical cpus in the system) as doing busy waits for preempted
vcpus will hurt system performance far worse than early yielding.
A quick test (4 vcpus on 1 physical cpu doing a parallel build job
with "make -j 8") reduced system time by about 5% with this patch.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/x86/xen/spinlock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 3d6e006..74756bb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -114,7 +114,6 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
}
-
/*
* Our init of PV spinlocks is split in two init functions due to us
* using paravirt patching and jump labels patching and having to do
@@ -137,6 +136,8 @@ void __init xen_init_spinlocks(void)
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_lock_ops.wait = xen_qlock_wait;
pv_lock_ops.kick = xen_qlock_kick;
+
+ pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen;
}
/*
--
2.4.11
* [PATCH v5 6/9] x86, kvm: support vcpu preempted check
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
Support the vcpu_is_preempted() functionality under KVM. This will
enhance lock performance on overcommitted hosts (more runnable vcpus
than physical cpus in the system) as doing busy waits for preempted
vcpus will hurt system performance far worse than early yielding.
Use one field of struct kvm_steal_time to indicate whether a vcpu
is running or not.
unix benchmark result:
host: kernel 4.8.1, i5-4570, 4 cpus
guest: kernel 4.8.1, 8 vcpus
test-case | after-patch | before-patch
Execl Throughput | 18307.9 lps | 11701.6 lps
File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
Pipe Throughput | 11872208.7 lps | 11855628.9 lps
Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
Process Creation | 29881.2 lps | 28572.8 lps
Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
System Call Overhead | 10385653.0 lps | 10419979.0 lps
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
arch/x86/kernel/kvm.c | 12 ++++++++++++
arch/x86/kvm/x86.c | 18 ++++++++++++++++++
3 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..b3fec56 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -45,7 +45,8 @@ struct kvm_steal_time {
__u64 steal;
__u32 version;
__u32 flags;
- __u32 pad[12];
+ __u8 preempted;
+ __u32 pad[11];
};
#define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..0b48dd2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
}
+static bool kvm_vcpu_is_preempted(int cpu)
+{
+ struct kvm_steal_time *src;
+
+ src = &per_cpu(steal_time, cpu);
+
+ return !!src->preempted;
+}
+
#ifdef CONFIG_SMP
static void __init kvm_smp_prepare_boot_cpu(void)
{
@@ -471,6 +480,9 @@ void __init kvm_guest_init(void)
if (kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
has_steal_clock = 1;
pv_time_ops.steal_clock = kvm_steal_clock;
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+ pv_lock_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
+#endif
}
if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c633de..a627537 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
return;
+ vcpu->arch.st.steal.preempted = 0;
+
if (vcpu->arch.st.steal.version & 1)
vcpu->arch.st.steal.version += 1; /* first time write, random junk */
@@ -2810,8 +2812,24 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
}
+static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
+{
+ if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+ return;
+
+ if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+ &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+ return;
+
+ vcpu->arch.st.steal.preempted = 1;
+
+ kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+ &vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+}
+
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ kvm_steal_time_set_preempted(vcpu);
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
--
2.4.11
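The host/guest handshake in this patch can be modelled in plain C. This is only a sketch of the flag protocol as it reads from the diff: the struct holds just the one field of interest, the shared guest memory is replaced by an ordinary variable, and all names (steal_time_sketch, vcpu0_record, host_vcpu_put and so on) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the shared per-vcpu record; only the new field is modelled. */
struct steal_time_sketch {
	uint8_t preempted;
};

/* One record for a single imaginary vcpu; starts out as "not preempted". */
static struct steal_time_sketch vcpu0_record;

/* Host side, vcpu scheduled out (cf. kvm_arch_vcpu_put in the diff). */
static void host_vcpu_put(struct steal_time_sketch *st)
{
	assert(st != NULL);
	st->preempted = 1;
}

/* Host side, steal time recorded before the vcpu runs again
 * (cf. record_steal_time in the diff). */
static void host_record_steal_time(struct steal_time_sketch *st)
{
	assert(st != NULL);
	st->preempted = 0;
}

/* Guest side (cf. kvm_vcpu_is_preempted in the diff). */
static bool guest_vcpu_is_preempted(const struct steal_time_sketch *st)
{
	assert(st != NULL);
	return !!st->preempted;
}
```

A zero-initialized record reads as "not preempted", which matches the documented behaviour on hosts that never write the field.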
* [PATCH v5 5/9] x86, paravirt: Add interface to support kvm/xen vcpu preempted check
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
This is to fix some lock holder preemption issues. Some other lock
implementations do a spin loop before acquiring the lock itself.
Currently the kernel has an interface, bool vcpu_is_preempted(int cpu). It
takes the cpu as a parameter and returns true if the cpu is preempted.
Then the kernel can break out of the spin loops based on the return value
of vcpu_is_preempted.
As the kernel already uses this interface, let's support it.
To connect the kernel with kvm/xen, add vcpu_is_preempted to struct
pv_lock_ops.
Then kvm or xen can provide their own implementation to support
vcpu_is_preempted.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/x86/include/asm/paravirt_types.h | 2 ++
arch/x86/include/asm/spinlock.h | 8 ++++++++
arch/x86/kernel/paravirt-spinlocks.c | 6 ++++++
3 files changed, 16 insertions(+)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0f400c0..38c3bb7 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -310,6 +310,8 @@ struct pv_lock_ops {
void (*wait)(u8 *ptr, u8 val);
void (*kick)(int cpu);
+
+ bool (*vcpu_is_preempted)(int cpu);
};
/* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 921bea7..0526f59 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -26,6 +26,14 @@
extern struct static_key paravirt_ticketlocks_enabled;
static __always_inline bool static_key_false(struct static_key *key);
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+ return pv_lock_ops.vcpu_is_preempted(cpu);
+}
+#endif
+
#include <asm/qspinlock.h>
/*
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 2c55a00..2f204dd 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -21,12 +21,18 @@ bool pv_is_native_spin_unlock(void)
__raw_callee_save___native_queued_spin_unlock;
}
+static bool native_vcpu_is_preempted(int cpu)
+{
+ return 0;
+}
+
struct pv_lock_ops pv_lock_ops = {
#ifdef CONFIG_SMP
.queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
.queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
.wait = paravirt_nop,
.kick = paravirt_nop,
+ .vcpu_is_preempted = native_vcpu_is_preempted,
#endif /* SMP */
};
EXPORT_SYMBOL(pv_lock_ops);
--
2.4.11
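The dispatch pattern this patch adds (an ops table with a native default that a hypervisor backend may override) can be sketched in userspace C as follows. The names lock_ops_sketch, fake_hv_is_preempted and install_fake_hv are hypothetical; the fake backend simply pretends that odd-numbered vcpus are preempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced ops table: only the hook added by this patch. */
struct lock_ops_sketch {
	bool (*vcpu_is_preempted)(int cpu);
};

/* Bare-metal default: a cpu is never a preempted vcpu. */
static bool native_is_preempted(int cpu)
{
	(void)cpu;
	return false;
}

static struct lock_ops_sketch lock_ops = {
	.vcpu_is_preempted = native_is_preempted,
};

/* Hypothetical backend, for illustration only. */
static bool fake_hv_is_preempted(int cpu)
{
	return (cpu & 1) != 0;
}

/* What a backend's init code would do (cf. kvm_guest_init in patch 6/9). */
static void install_fake_hv(void)
{
	lock_ops.vcpu_is_preempted = fake_hv_is_preempted;
}

/* Callers always go through the table, as the spinlock.h wrapper does. */
static bool sketch_vcpu_is_preempted(int cpu)
{
	assert(lock_ops.vcpu_is_preempted != NULL);
	return lock_ops.vcpu_is_preempted(cpu);
}
```

Because the table always holds a valid default, callers need no NULL check or feature test of their own.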
* [PATCH v5 4/9] powerpc/spinlock: support vcpu preempted check
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
This is to fix some lock holder preemption issues. Some other lock
implementations do a spin loop before acquiring the lock itself.
Currently the kernel has an interface, bool vcpu_is_preempted(int cpu). It
takes the cpu as a parameter and returns true if the cpu is preempted. Then
the kernel can break out of the spin loops based on the return value of
vcpu_is_preempted.
As the kernel already uses this interface, let's support it.
Only pSeries needs to support it. However, powerNV is built into the
same kernel image as pSeries, so we need to return false if we are running
as powerNV. Another fact is that lppaca->yield_count stays zero on
powerNV, so we can just skip the machine type check.
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/spinlock.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index abb6b0f..f4a9524 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,14 @@
#define SYNC_IO
#endif
+#ifdef CONFIG_PPC_PSERIES
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+ return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
+}
+#endif
+
#if defined(CONFIG_PPC_SPLPAR)
/* We only yield to the hypervisor if we are in shared processor mode */
#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
--
2.4.11
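The test in this patch reads a big-endian 32-bit counter and treats an odd value as "preempted". A portable sketch of that check, assuming the counter is presented as four raw bytes, might look like this; be32_decode stands in for the kernel's be32_to_cpu(), and both function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for be32_to_cpu(): decode four big-endian bytes. */
static uint32_t be32_decode(const uint8_t b[4])
{
	assert(b != NULL);
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Odd yield count: the vcpu is currently not running on a physical cpu. */
static bool yield_count_says_preempted(const uint8_t yield_count_be[4])
{
	return (be32_decode(yield_count_be) & 1) != 0;
}
```

A counter that stays at zero, which the commit message says is the case on powerNV, always decodes as "not preempted", which is why no machine type check is needed.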
* [PATCH v5 3/9] kernel/locking: Drop the overload of {mutex, rwsem}_spin_on_owner
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
An over-committed guest with more vCPUs than pCPUs has a heavy overhead in
the two spin_on_owner functions. This is caused by the lock holder preemption
issue.
The kernel has an interface, bool vcpu_is_preempted(int cpu), to see if a vCPU
is currently running or not. So break the spin loops when it returns true.
test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report
before patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
4.12% sched-messaging [kernel.vmlinux] [k] system_call
3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
2.00% sched-messaging [kernel.vmlinux] [k] osq_lock
after patch:
9.99% sched-messaging [kernel.vmlinux] [k] mutex_unlock
5.28% sched-messaging [unknown] [H] 0xc0000000000768e0
4.27% sched-messaging [kernel.vmlinux] [k] __copy_tofrom_user_power7
3.77% sched-messaging [kernel.vmlinux] [k] copypage_power7
3.24% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.02% sched-messaging [kernel.vmlinux] [k] system_call
2.69% sched-messaging [kernel.vmlinux] [k] wait_consider_task
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Juergen Gross <jgross@suse.com>
---
kernel/locking/mutex.c | 15 +++++++++++++--
kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
2 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..82108f5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -236,7 +236,13 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
*/
barrier();
- if (!owner->on_cpu || need_resched()) {
+ /*
+ * Use vcpu_is_preempted to detect lock holder preemption
+ * and break. vcpu_is_preempted is a macro defined as false if
+ * the arch does not support the vcpu preempted check.
+ */
+ if (!owner->on_cpu || need_resched() ||
+ vcpu_is_preempted(task_cpu(owner))) {
ret = false;
break;
}
@@ -261,8 +267,13 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
rcu_read_lock();
owner = READ_ONCE(lock->owner);
+
+ /*
+ * Because of the lock holder preemption issue, skip spinning if the
+ * task is not on a cpu or its cpu is preempted.
+ */
if (owner)
- retval = owner->on_cpu;
+ retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
rcu_read_unlock();
/*
* if lock->owner is not set, the mutex owner may have just acquired
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..0897179 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -336,7 +336,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
goto done;
}
- ret = owner->on_cpu;
+ /*
+ * Because of the lock holder preemption issue, skip spinning if the
+ * task is not on a cpu or its cpu is preempted.
+ */
+ ret = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
done:
rcu_read_unlock();
return ret;
@@ -362,8 +366,14 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
*/
barrier();
- /* abort spinning when need_resched or owner is not running */
- if (!owner->on_cpu || need_resched()) {
+ /*
+ * abort spinning when need_resched or owner is not running or
+ * owner's cpu is preempted. vcpu_is_preempted is a macro
+ * defined as false if the arch does not support the vcpu
+ * preempted check.
+ */
+ if (!owner->on_cpu || need_resched() ||
+ vcpu_is_preempted(task_cpu(owner))) {
rcu_read_unlock();
return false;
}
--
2.4.11
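The extended break condition shared by both spin loops in this patch can be isolated into a small predicate. The sketch below is a userspace model, not the kernel code: owner_sketch holds only the two fields the condition consults, and the preemption check is passed in as a function pointer so that both outcomes can be exercised. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the task_struct fields the loop looks at. */
struct owner_sketch {
	bool on_cpu;	/* owner task is currently on a cpu */
	int cpu;	/* which (virtual) cpu, cf. task_cpu(owner) */
};

typedef bool (*preempt_check_fn)(int cpu);

static bool never_preempted(int cpu)  { (void)cpu; return false; }
static bool always_preempted(int cpu) { (void)cpu; return true; }

/* True when an optimistic spinner should give up and block instead. */
static bool should_stop_spinning(const struct owner_sketch *owner,
				 bool need_resched,
				 preempt_check_fn is_preempted)
{
	assert(owner != NULL && is_preempted != NULL);
	return !owner->on_cpu || need_resched || is_preempted(owner->cpu);
}
```

With never_preempted the predicate reduces to the pre-patch condition, which models an architecture without a vcpu preempted check.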
* [PATCH v5 2/9] locking/osq: Drop the overload of osq_lock()
From: Pan Xinhui @ 2016-10-20 21:27 UTC
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
An over-committed guest with more vCPUs than pCPUs has a heavy overhead in
osq_lock().
This is because vCPU A holds the osq lock and yields out, while vCPU B waits
for the per_cpu node->locked to be set. IOW, vCPU B waits for vCPU A to run
and unlock the osq lock.
The kernel has an interface, bool vcpu_is_preempted(int cpu), to see if a vCPU
is currently running or not. So break the spin loops when it returns true.
test case:
perf record -a perf bench sched messaging -g 400 -p && perf report
before patch:
18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
2.49% sched-messaging [kernel.vmlinux] [k] system_call
after patch:
20.68% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner
8.45% sched-messaging [kernel.vmlinux] [k] mutex_unlock
4.12% sched-messaging [kernel.vmlinux] [k] system_call
3.01% sched-messaging [kernel.vmlinux] [k] system_call_common
2.83% sched-messaging [kernel.vmlinux] [k] copypage_power7
2.64% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
2.00% sched-messaging [kernel.vmlinux] [k] osq_lock
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Juergen Gross <jgross@suse.com>
---
kernel/locking/osq_lock.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..39d1385 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
return cpu_nr + 1;
}
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+ return node->cpu - 1;
+}
+
static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
{
int cpu_nr = encoded_cpu_val - 1;
@@ -118,8 +123,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
while (!READ_ONCE(node->locked)) {
/*
* If we need to reschedule bail... so we can block.
+ * Use vcpu_is_preempted() to detect lock holder preemption and
+ * break. vcpu_is_preempted() is a macro defined as false if the
+ * arch does not support the vCPU preempted check.
*/
- if (need_resched())
+ if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
goto unqueue;
cpu_relax_lowlatency();
--
2.4.11
* [PATCH v5 1/9] kernel/sched: introduce vcpu preempted check interface
From: Pan Xinhui @ 2016-10-20 21:27 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
In-Reply-To: <1476998874-2089-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
This patch adds support for fixing the lock holder preemption issue.
Kernel users can call bool vcpu_is_preempted(int cpu) to detect whether a
vCPU is preempted or not.
The default implementation is a macro defined as false, so the compiler can
optimize it out if the arch does not support such a vCPU preempted check.
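A minimal sketch of how the fallback behaves, assuming no architecture
override is present. should_keep_spinning() is a hypothetical caller used only
for illustration; the #ifndef/#define pair is the one this patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * An architecture that can tell would define vcpu_is_preempted before this
 * point (hypothetically, in one of its own headers). Nothing does so here,
 * so the generic fallback below is used.
 */
#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu) false
#endif

/* Hypothetical caller: decide whether spinning on owner_cpu is still useful. */
static bool should_keep_spinning(int owner_cpu, bool need_resched)
{
	assert(owner_cpu >= 0);

	/* With the fallback this reduces to "!need_resched" at compile time. */
	if (need_resched || vcpu_is_preempted(owner_cpu))
		return false;
	return true;
}
```

Because the fallback expands to the constant false, the second half of the
condition disappears on architectures without a vCPU preempted check.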
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Juergen Gross <jgross@suse.com>
---
include/linux/sched.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..44c1ce7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3506,6 +3506,18 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
#endif /* CONFIG_SMP */
+/*
+ * In order to deal with various lock holder preemption issues, provide an
+ * interface to see if a vCPU is currently running or not.
+ *
+ * This allows us to terminate optimistic spin loops and block, analogous to
+ * the native optimistic spin heuristic of testing if the lock owner task is
+ * running or not.
+ */
+#ifndef vcpu_is_preempted
+#define vcpu_is_preempted(cpu) false
+#endif
+
extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
--
2.4.11
* [PATCH v5 0/9] implement vcpu preempted check
From: Pan Xinhui @ 2016-10-20 21:27 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
xen-devel-request, kvm
Cc: kernellwp, jgross, Pan Xinhui, rkrcmar, peterz, benh, will.deacon,
mingo, paulus, mpe, pbonzini, paulmck, boqun.feng
change from v4:
split the x86 kvm vcpu preempted check into two patches.
add documentation patch.
add x86 vcpu preempted check patch under xen
add s390 vcpu preempted check patch
change from v3:
add x86 vcpu preempted check patch
change from v2:
no code change, fix typos, update some comments
change from v1:
a simpler definition of default vcpu_is_preempted
skip machine type check on ppc, and add config. remove dedicated macro.
add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
add more comments
thanks to Boqun and Peter for their suggestions.
This patch set aims to fix lock holder preemption issues.
test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report
18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
2.49% sched-messaging [kernel.vmlinux] [k] system_call
We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some
spin loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_owner variants also caused RCU stalls before this patch set was
applied.
We have also observed some performance improvements in unix benchmark tests.
PPC test result:
1 copy - 0.94%
2 copy - 7.17%
4 copy - 11.9%
8 copy - 3.04%
16 copy - 15.11%
details below:
Without patch:
1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, 1 samples)
2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, 1 samples)
4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, 1 samples)
8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, 1 samples)
16 copy - File Write 4096 bufsize 8000 maxblocks 768000.0 KBps (30.1 s, 1 samples)
With patch:
1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, 1 samples)
2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, 1 samples)
4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, 1 samples)
8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, 1 samples)
16 copy - File Write 4096 bufsize 8000 maxblocks 904762.0 KBps (30.0 s, 1 samples)
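The summary percentages above are consistent with (with - without) / with
computed from the File Write figures, up to rounding or truncation of the last
digit. For the 2 copy row, (1943816 - 1804433) / 1943816 is about 7.17%. The
choice of denominator is inferred from the numbers, not stated in this mail,
and gain_percent() is a hypothetical helper:

```c
#include <assert.h>

/*
 * Relative gain in percent, taken against the patched throughput.
 * The denominator is an inference from the posted numbers.
 */
static double gain_percent(double without_patch, double with_patch)
{
	assert(with_patch > 0.0);
	return (with_patch - without_patch) / with_patch * 100.0;
}
```

Calling gain_percent(768000.0, 904762.0) for the 16 copy row gives a value
just above 15.11.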
X86 test result:
test-case                             | after-patch    | before-patch
Execl Throughput                      |    18307.9 lps |    11701.6 lps
File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps |  790418.9 KBps
File Copy 256 bufsize 500 maxblocks   |  367555.6 KBps |  222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
Pipe Throughput                       | 11872208.7 lps | 11855628.9 lps
Pipe-based Context Switching          |  1495126.5 lps |  1490533.9 lps
Process Creation                      |    29881.2 lps |    28572.8 lps
Shell Scripts (1 concurrent)          |    23224.3 lpm |    22607.4 lpm
Shell Scripts (8 concurrent)          |     3531.4 lpm |     3211.9 lpm
System Call Overhead                  | 10385653.0 lps | 10419979.0 lps
Christian Borntraeger (1):
s390/spinlock: Provide vcpu_is_preempted
Juergen Gross (1):
x86, xen: support vcpu preempted check
Pan Xinhui (7):
kernel/sched: introduce vcpu preempted check interface
locking/osq: Drop the overload of osq_lock()
kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
powerpc/spinlock: support vcpu preempted check
x86, paravirt: Add interface to support kvm/xen vcpu preempted check
x86, kvm: support vcpu preempted check
Documentation: virtual: kvm: Support vcpu preempted check
Documentation/virtual/kvm/msr.txt | 8 +++++++-
arch/powerpc/include/asm/spinlock.h | 8 ++++++++
arch/s390/include/asm/spinlock.h | 8 ++++++++
arch/s390/kernel/smp.c | 9 +++++++--
arch/s390/lib/spinlock.c | 25 ++++++++-----------------
arch/x86/include/asm/paravirt_types.h | 2 ++
arch/x86/include/asm/spinlock.h | 8 ++++++++
arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
arch/x86/kernel/kvm.c | 12 ++++++++++++
arch/x86/kernel/paravirt-spinlocks.c | 6 ++++++
arch/x86/kvm/x86.c | 18 ++++++++++++++++++
arch/x86/xen/spinlock.c | 3 ++-
include/linux/sched.h | 12 ++++++++++++
kernel/locking/mutex.c | 15 +++++++++++++--
kernel/locking/osq_lock.c | 10 +++++++++-
kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
16 files changed, 135 insertions(+), 28 deletions(-)
--
2.4.11
* Re: [PATCH net-next v2 6/9] net: use core MTU range checking in virt drivers
From: Michael S. Tsirkin @ 2016-10-20 20:23 UTC (permalink / raw)
To: Jarod Wilson
Cc: Aaron Conole, David Kershner, Wei Liu, VMware, Inc., netdev,
Haiyang Zhang, linux-kernel, virtualization, Paul Durrant,
Shrikrishna Khare
In-Reply-To: <20161020175524.6184-7-jarod@redhat.com>
On Thu, Oct 20, 2016 at 01:55:21PM -0400, Jarod Wilson wrote:
> hyperv_net:
> - set min/max_mtu, per Haiyang, after rndis_filter_device_add
>
> virtio_net:
> - set min/max_mtu
> - remove virtnet_change_mtu
> vmxnet3:
> - set min/max_mtu
>
> xen-netback:
> - min_mtu = 0, max_mtu = 65517
>
> xen-netfront:
> - min_mtu = 0, max_mtu = 65535
>
> unisys/visor:
> - clean up defines a little to not clash with network core or add
> redundant definitions
>
> CC: netdev@vger.kernel.org
> CC: virtualization@lists.linux-foundation.org
> CC: "K. Y. Srinivasan" <kys@microsoft.com>
> CC: Haiyang Zhang <haiyangz@microsoft.com>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: Shrikrishna Khare <skhare@vmware.com>
> CC: "VMware, Inc." <pv-drivers@vmware.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Paul Durrant <paul.durrant@citrix.com>
> CC: David Kershner <david.kershner@unisys.com>
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
> drivers/net/hyperv/hyperv_net.h | 4 ++--
> drivers/net/hyperv/netvsc_drv.c | 14 +++++++-------
> drivers/net/virtio_net.c | 23 ++++++++++-------------
> drivers/net/vmxnet3/vmxnet3_drv.c | 7 ++++---
> drivers/net/xen-netback/interface.c | 5 ++++-
> drivers/net/xen-netfront.c | 2 ++
> drivers/staging/unisys/include/iochannel.h | 10 ++++------
> drivers/staging/unisys/visornic/visornic_main.c | 4 ++--
> 8 files changed, 35 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index f4fbcb5..3958ada 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -606,8 +606,8 @@ struct nvsp_message {
> } __packed;
>
>
> -#define NETVSC_MTU 65536
> -#define NETVSC_MTU_MIN 68
> +#define NETVSC_MTU 65535
> +#define NETVSC_MTU_MIN ETH_MIN_MTU
>
> #define NETVSC_RECEIVE_BUFFER_SIZE (1024*1024*16) /* 16MB */
> #define NETVSC_RECEIVE_BUFFER_SIZE_LEGACY (1024*1024*15) /* 15MB */
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index f0919bd..3b28cf1 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -872,19 +872,12 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
> struct netvsc_device *nvdev = ndevctx->nvdev;
> struct hv_device *hdev = ndevctx->device_ctx;
> struct netvsc_device_info device_info;
> - int limit = ETH_DATA_LEN;
> u32 num_chn;
> int ret = 0;
>
> if (ndevctx->start_remove || !nvdev || nvdev->destroy)
> return -ENODEV;
>
> - if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
> - limit = NETVSC_MTU - ETH_HLEN;
> -
> - if (mtu < NETVSC_MTU_MIN || mtu > limit)
> - return -EINVAL;
> -
> ret = netvsc_close(ndev);
> if (ret)
> goto out;
> @@ -1402,6 +1395,13 @@ static int netvsc_probe(struct hv_device *dev,
> netif_set_real_num_tx_queues(net, nvdev->num_chn);
> netif_set_real_num_rx_queues(net, nvdev->num_chn);
>
> + /* MTU range: 68 - 1500 or 65521 */
> + net->min_mtu = NETVSC_MTU_MIN;
> + if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
> + net->max_mtu = NETVSC_MTU - ETH_HLEN;
> + else
> + net->max_mtu = ETH_DATA_LEN;
> +
> ret = register_netdev(net);
> if (ret != 0) {
> pr_err("Unable to register netdev.\n");
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index fad84f3..720809f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,17 +1419,6 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
> .set_settings = virtnet_set_settings,
> };
>
> -#define MIN_MTU 68
> -#define MAX_MTU 65535
> -
> -static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> -{
> - if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
> - return -EINVAL;
> - dev->mtu = new_mtu;
> - return 0;
> -}
> -
> static const struct net_device_ops virtnet_netdev = {
> .ndo_open = virtnet_open,
> .ndo_stop = virtnet_close,
> @@ -1437,7 +1426,6 @@ static const struct net_device_ops virtnet_netdev = {
> .ndo_validate_addr = eth_validate_addr,
> .ndo_set_mac_address = virtnet_set_mac_address,
> .ndo_set_rx_mode = virtnet_set_rx_mode,
> - .ndo_change_mtu = virtnet_change_mtu,
> .ndo_get_stats64 = virtnet_stats,
> .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
> .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
> @@ -1748,6 +1736,9 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
> return true;
> }
>
> +#define MIN_MTU ETH_MIN_MTU
> +#define MAX_MTU ETH_MAX_MTU
> +
Can we drop these btw?
> static int virtnet_probe(struct virtio_device *vdev)
> {
> int i, err;
> @@ -1821,6 +1812,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>
> dev->vlan_features = dev->features;
>
> + /* MTU range: 68 - 65535 */
> + dev->min_mtu = MIN_MTU;
> + dev->max_mtu = MAX_MTU;
> +
> /* Configuration may specify what MAC to use. Otherwise random. */
> if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
> virtio_cread_bytes(vdev,
> @@ -1875,8 +1870,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> mtu = virtio_cread16(vdev,
> offsetof(struct virtio_net_config,
> mtu));
> - if (virtnet_change_mtu(dev, mtu))
> + if (mtu < dev->min_mtu || mtu > dev->max_mtu)
In fact the > max_mtu branch does not make sense since a 16 bit
value can't exceed MAX_MTU.
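To illustrate the point, here is a small sketch. It assumes the MTU is held in
a 16-bit variable, as the virtio_cread16() call suggests; mtu_from_config_ok()
is a hypothetical helper name, and the two constants repeat the values the
patch relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_MIN_MTU 68    /* same values the patch relies on */
#define ETH_MAX_MTU 65535

/* A 16-bit value can never be larger than ETH_MAX_MTU. */
static_assert(UINT16_MAX <= ETH_MAX_MTU, "u16 cannot exceed ETH_MAX_MTU");

/* Only the lower bound can reject a value that was read as 16 bits. */
static bool mtu_from_config_ok(uint16_t mtu)
{
	return mtu >= ETH_MIN_MTU;
}
```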
> __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
> + else
> + dev->mtu = mtu;
> }
>
> if (vi->any_header_sg)
> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> index b5554f2..0c36de1 100644
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -2969,9 +2969,6 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
> struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> int err = 0;
>
> - if (new_mtu < VMXNET3_MIN_MTU || new_mtu > VMXNET3_MAX_MTU)
> - return -EINVAL;
> -
> netdev->mtu = new_mtu;
>
> /*
> @@ -3428,6 +3425,10 @@ vmxnet3_probe_device(struct pci_dev *pdev,
> vmxnet3_set_ethtool_ops(netdev);
> netdev->watchdog_timeo = 5 * HZ;
>
> + /* MTU range: 60 - 9000 */
> + netdev->min_mtu = VMXNET3_MIN_MTU;
> + netdev->max_mtu = VMXNET3_MAX_MTU;
> +
> INIT_WORK(&adapter->work, vmxnet3_reset_work);
> set_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state);
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 74dc2bf..e30ffd2 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -302,7 +302,7 @@ static int xenvif_close(struct net_device *dev)
> static int xenvif_change_mtu(struct net_device *dev, int mtu)
> {
> struct xenvif *vif = netdev_priv(dev);
> - int max = vif->can_sg ? 65535 - VLAN_ETH_HLEN : ETH_DATA_LEN;
> + int max = vif->can_sg ? ETH_MAX_MTU - VLAN_ETH_HLEN : ETH_DATA_LEN;
>
> if (mtu > max)
> return -EINVAL;
> @@ -471,6 +471,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>
> dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
>
> + dev->min_mtu = 0;
> + dev->max_mtu = ETH_MAX_MTU - VLAN_ETH_HLEN;
> +
> /*
> * Initialise a dummy MAC address. We choose the numerically
> * largest non-broadcast address to prevent the address getting
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index e17879d..7d616b0 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1329,6 +1329,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
> netdev->features |= netdev->hw_features;
>
> netdev->ethtool_ops = &xennet_ethtool_ops;
> + netdev->min_mtu = 0;
> + netdev->max_mtu = XEN_NETIF_MAX_TX_SIZE;
> SET_NETDEV_DEV(netdev, &dev->dev);
>
> np->netdev = netdev;
> diff --git a/drivers/staging/unisys/include/iochannel.h b/drivers/staging/unisys/include/iochannel.h
> index cba4433..9081b3f 100644
> --- a/drivers/staging/unisys/include/iochannel.h
> +++ b/drivers/staging/unisys/include/iochannel.h
> @@ -113,12 +113,10 @@ enum net_types {
>
> };
>
> -#define ETH_HEADER_SIZE 14 /* size of ethernet header */
> -
> #define ETH_MIN_DATA_SIZE 46 /* minimum eth data size */
> -#define ETH_MIN_PACKET_SIZE (ETH_HEADER_SIZE + ETH_MIN_DATA_SIZE)
> +#define ETH_MIN_PACKET_SIZE (ETH_HLEN + ETH_MIN_DATA_SIZE)
>
> -#define ETH_MAX_MTU 16384 /* maximum data size */
> +#define VISOR_ETH_MAX_MTU 16384 /* maximum data size */
>
> #ifndef MAX_MACADDR_LEN
> #define MAX_MACADDR_LEN 6 /* number of bytes in MAC address */
> @@ -288,7 +286,7 @@ struct net_pkt_xmt {
> int len; /* full length of data in the packet */
> int num_frags; /* number of fragments in frags containing data */
> struct phys_info frags[MAX_PHYS_INFO]; /* physical page information */
> - char ethhdr[ETH_HEADER_SIZE]; /* the ethernet header */
> + char ethhdr[ETH_HLEN]; /* the ethernet header */
> struct {
> /* these are needed for csum at uisnic end */
> u8 valid; /* 1 = struct is valid - else ignore */
> @@ -323,7 +321,7 @@ struct net_pkt_xmtdone {
> */
> #define RCVPOST_BUF_SIZE 4032
> #define MAX_NET_RCV_CHAIN \
> - ((ETH_MAX_MTU + ETH_HEADER_SIZE + RCVPOST_BUF_SIZE - 1) \
> + ((VISOR_ETH_MAX_MTU + ETH_HLEN + RCVPOST_BUF_SIZE - 1) \
> / RCVPOST_BUF_SIZE)
>
> struct net_pkt_rcvpost {
> diff --git a/drivers/staging/unisys/visornic/visornic_main.c b/drivers/staging/unisys/visornic/visornic_main.c
> index 1367007..f8a584b 100644
> --- a/drivers/staging/unisys/visornic/visornic_main.c
> +++ b/drivers/staging/unisys/visornic/visornic_main.c
> @@ -791,7 +791,7 @@ visornic_xmit(struct sk_buff *skb, struct net_device *netdev)
> * pointing to
> */
> firstfraglen = skb->len - skb->data_len;
> - if (firstfraglen < ETH_HEADER_SIZE) {
> + if (firstfraglen < ETH_HLEN) {
> spin_unlock_irqrestore(&devdata->priv_lock, flags);
> devdata->busy_cnt++;
> dev_err(&netdev->dev,
> @@ -864,7 +864,7 @@ visornic_xmit(struct sk_buff *skb, struct net_device *netdev)
> /* copy ethernet header from first frag into ocmdrsp
> * - everything else will be pass in frags & DMA'ed
> */
> - memcpy(cmdrsp->net.xmt.ethhdr, skb->data, ETH_HEADER_SIZE);
> + memcpy(cmdrsp->net.xmt.ethhdr, skb->data, ETH_HLEN);
> /* copy frags info - from skb->data we need to only provide access
> * beyond eth header
> */
> --
> 2.10.0
* RE: [PATCH net-next v2 6/9] net: use core MTU range checking in virt drivers
From: Haiyang Zhang via Virtualization @ 2016-10-20 18:05 UTC (permalink / raw)
To: Jarod Wilson, linux-kernel@vger.kernel.org
Cc: David Kershner, Wei Liu, Michael S. Tsirkin, VMware, Inc.,
netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
Paul Durrant, Shrikrishna Khare
In-Reply-To: <20161020175524.6184-7-jarod@redhat.com>
> -----Original Message-----
> From: Jarod Wilson [mailto:jarod@redhat.com]
> Sent: Thursday, October 20, 2016 1:55 PM
> To: linux-kernel@vger.kernel.org
> Cc: Jarod Wilson <jarod@redhat.com>; netdev@vger.kernel.org;
> virtualization@lists.linux-foundation.org; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Michael S.
> Tsirkin <mst@redhat.com>; Shrikrishna Khare <skhare@vmware.com>; VMware,
> Inc. <pv-drivers@vmware.com>; Wei Liu <wei.liu2@citrix.com>; Paul
> Durrant <paul.durrant@citrix.com>; David Kershner
> <david.kershner@unisys.com>
> Subject: [PATCH net-next v2 6/9] net: use core MTU range checking in
> virt drivers
>
> hyperv_net:
> - set min/max_mtu, per Haiyang, after rndis_filter_device_add
>
> virtio_net:
> - set min/max_mtu
> - remove virtnet_change_mtu
>
> vmxnet3:
> - set min/max_mtu
>
> xen-netback:
> - min_mtu = 0, max_mtu = 65517
>
> xen-netfront:
> - min_mtu = 0, max_mtu = 65535
>
> unisys/visor:
> - clean up defines a little to not clash with network core or add
> redundant definitions
>
> CC: netdev@vger.kernel.org
> CC: virtualization@lists.linux-foundation.org
> CC: "K. Y. Srinivasan" <kys@microsoft.com>
> CC: Haiyang Zhang <haiyangz@microsoft.com>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: Shrikrishna Khare <skhare@vmware.com>
> CC: "VMware, Inc." <pv-drivers@vmware.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Paul Durrant <paul.durrant@citrix.com>
> CC: David Kershner <david.kershner@unisys.com>
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
The hv_netvsc changes look fine. Thanks.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
* [PATCH net-next v2 6/9] net: use core MTU range checking in virt drivers
From: Jarod Wilson @ 2016-10-20 17:55 UTC (permalink / raw)
To: linux-kernel
Cc: Jarod Wilson, David Kershner, Wei Liu, Michael S. Tsirkin,
VMware, Inc., netdev, Haiyang Zhang, virtualization, Paul Durrant,
Shrikrishna Khare
In-Reply-To: <20161020175524.6184-1-jarod@redhat.com>
hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add
virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu
vmxnet3:
- set min/max_mtu
xen-netback:
- min_mtu = 0, max_mtu = 65517
xen-netfront:
- min_mtu = 0, max_mtu = 65535
unisys/visor:
- clean up defines a little to not clash with network core or add
redundant definitions
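A rough user-space model of the approach summarized above: each driver
records its limits once, and a single shared check replaces the per-driver
range tests. struct toy_netdev and toy_set_mtu() are hypothetical stand-ins,
not the real struct net_device or core function; the constants repeat the
kernel values used in this patch.

```c
#include <assert.h>
#include <errno.h>

/* Values mirror the kernel constants used in the patch. */
#define ETH_MIN_MTU   68
#define ETH_MAX_MTU   65535
#define ETH_HLEN      14 /* hyperv:      65535 - 14 = 65521 */
#define VLAN_ETH_HLEN 18 /* xen-netback: 65535 - 18 = 65517 */

/* Toy model of the fields the patch sets; not the real struct net_device. */
struct toy_netdev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu;
};

/* Hypothetical stand-in for the shared check that replaces per-driver tests. */
static int toy_set_mtu(struct toy_netdev *dev, int new_mtu)
{
	if (new_mtu < 0 ||
	    (unsigned int)new_mtu < dev->min_mtu ||
	    (unsigned int)new_mtu > dev->max_mtu)
		return -EINVAL;

	dev->mtu = (unsigned int)new_mtu;
	return 0;
}
```

A driver in this model only fills in min_mtu and max_mtu at probe time, which
is the shape of the changes in the diff below.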
CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
drivers/net/hyperv/hyperv_net.h | 4 ++--
drivers/net/hyperv/netvsc_drv.c | 14 +++++++-------
drivers/net/virtio_net.c | 23 ++++++++++-------------
drivers/net/vmxnet3/vmxnet3_drv.c | 7 ++++---
drivers/net/xen-netback/interface.c | 5 ++++-
drivers/net/xen-netfront.c | 2 ++
drivers/staging/unisys/include/iochannel.h | 10 ++++------
drivers/staging/unisys/visornic/visornic_main.c | 4 ++--
8 files changed, 35 insertions(+), 34 deletions(-)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f4fbcb5..3958ada 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -606,8 +606,8 @@ struct nvsp_message {
} __packed;
-#define NETVSC_MTU 65536
-#define NETVSC_MTU_MIN 68
+#define NETVSC_MTU 65535
+#define NETVSC_MTU_MIN ETH_MIN_MTU
#define NETVSC_RECEIVE_BUFFER_SIZE (1024*1024*16) /* 16MB */
#define NETVSC_RECEIVE_BUFFER_SIZE_LEGACY (1024*1024*15) /* 15MB */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f0919bd..3b28cf1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -872,19 +872,12 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
struct netvsc_device *nvdev = ndevctx->nvdev;
struct hv_device *hdev = ndevctx->device_ctx;
struct netvsc_device_info device_info;
- int limit = ETH_DATA_LEN;
u32 num_chn;
int ret = 0;
if (ndevctx->start_remove || !nvdev || nvdev->destroy)
return -ENODEV;
- if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
- limit = NETVSC_MTU - ETH_HLEN;
-
- if (mtu < NETVSC_MTU_MIN || mtu > limit)
- return -EINVAL;
-
ret = netvsc_close(ndev);
if (ret)
goto out;
@@ -1402,6 +1395,13 @@ static int netvsc_probe(struct hv_device *dev,
netif_set_real_num_tx_queues(net, nvdev->num_chn);
netif_set_real_num_rx_queues(net, nvdev->num_chn);
+ /* MTU range: 68 - 1500 or 65521 */
+ net->min_mtu = NETVSC_MTU_MIN;
+ if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
+ net->max_mtu = NETVSC_MTU - ETH_HLEN;
+ else
+ net->max_mtu = ETH_DATA_LEN;
+
ret = register_netdev(net);
if (ret != 0) {
pr_err("Unable to register netdev.\n");
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fad84f3..720809f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1419,17 +1419,6 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
.set_settings = virtnet_set_settings,
};
-#define MIN_MTU 68
-#define MAX_MTU 65535
-
-static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
-{
- if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
- return -EINVAL;
- dev->mtu = new_mtu;
- return 0;
-}
-
static const struct net_device_ops virtnet_netdev = {
.ndo_open = virtnet_open,
.ndo_stop = virtnet_close,
@@ -1437,7 +1426,6 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = virtnet_set_mac_address,
.ndo_set_rx_mode = virtnet_set_rx_mode,
- .ndo_change_mtu = virtnet_change_mtu,
.ndo_get_stats64 = virtnet_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
@@ -1748,6 +1736,9 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
return true;
}
+#define MIN_MTU ETH_MIN_MTU
+#define MAX_MTU ETH_MAX_MTU
+
static int virtnet_probe(struct virtio_device *vdev)
{
int i, err;
@@ -1821,6 +1812,10 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->vlan_features = dev->features;
+ /* MTU range: 68 - 65535 */
+ dev->min_mtu = MIN_MTU;
+ dev->max_mtu = MAX_MTU;
+
/* Configuration may specify what MAC to use. Otherwise random. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
virtio_cread_bytes(vdev,
@@ -1875,8 +1870,10 @@ static int virtnet_probe(struct virtio_device *vdev)
mtu = virtio_cread16(vdev,
offsetof(struct virtio_net_config,
mtu));
- if (virtnet_change_mtu(dev, mtu))
+ if (mtu < dev->min_mtu || mtu > dev->max_mtu)
__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);
+ else
+ dev->mtu = mtu;
}
if (vi->any_header_sg)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index b5554f2..0c36de1 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -2969,9 +2969,6 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
struct vmxnet3_adapter *adapter = netdev_priv(netdev);
int err = 0;
- if (new_mtu < VMXNET3_MIN_MTU || new_mtu > VMXNET3_MAX_MTU)
- return -EINVAL;
-
netdev->mtu = new_mtu;
/*
@@ -3428,6 +3425,10 @@ vmxnet3_probe_device(struct pci_dev *pdev,
vmxnet3_set_ethtool_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;
+ /* MTU range: 60 - 9000 */
+ netdev->min_mtu = VMXNET3_MIN_MTU;
+ netdev->max_mtu = VMXNET3_MAX_MTU;
+
INIT_WORK(&adapter->work, vmxnet3_reset_work);
set_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 74dc2bf..e30ffd2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -302,7 +302,7 @@ static int xenvif_close(struct net_device *dev)
static int xenvif_change_mtu(struct net_device *dev, int mtu)
{
struct xenvif *vif = netdev_priv(dev);
- int max = vif->can_sg ? 65535 - VLAN_ETH_HLEN : ETH_DATA_LEN;
+ int max = vif->can_sg ? ETH_MAX_MTU - VLAN_ETH_HLEN : ETH_DATA_LEN;
if (mtu > max)
return -EINVAL;
@@ -471,6 +471,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
+ dev->min_mtu = 0;
+ dev->max_mtu = ETH_MAX_MTU - VLAN_ETH_HLEN;
+
/*
* Initialise a dummy MAC address. We choose the numerically
* largest non-broadcast address to prevent the address getting
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index e17879d..7d616b0 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1329,6 +1329,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
netdev->features |= netdev->hw_features;
netdev->ethtool_ops = &xennet_ethtool_ops;
+ netdev->min_mtu = 0;
+ netdev->max_mtu = XEN_NETIF_MAX_TX_SIZE;
SET_NETDEV_DEV(netdev, &dev->dev);
np->netdev = netdev;
diff --git a/drivers/staging/unisys/include/iochannel.h b/drivers/staging/unisys/include/iochannel.h
index cba4433..9081b3f 100644
--- a/drivers/staging/unisys/include/iochannel.h
+++ b/drivers/staging/unisys/include/iochannel.h
@@ -113,12 +113,10 @@ enum net_types {
};
-#define ETH_HEADER_SIZE 14 /* size of ethernet header */
-
#define ETH_MIN_DATA_SIZE 46 /* minimum eth data size */
-#define ETH_MIN_PACKET_SIZE (ETH_HEADER_SIZE + ETH_MIN_DATA_SIZE)
+#define ETH_MIN_PACKET_SIZE (ETH_HLEN + ETH_MIN_DATA_SIZE)
-#define ETH_MAX_MTU 16384 /* maximum data size */
+#define VISOR_ETH_MAX_MTU 16384 /* maximum data size */
#ifndef MAX_MACADDR_LEN
#define MAX_MACADDR_LEN 6 /* number of bytes in MAC address */
@@ -288,7 +286,7 @@ struct net_pkt_xmt {
int len; /* full length of data in the packet */
int num_frags; /* number of fragments in frags containing data */
struct phys_info frags[MAX_PHYS_INFO]; /* physical page information */
- char ethhdr[ETH_HEADER_SIZE]; /* the ethernet header */
+ char ethhdr[ETH_HLEN]; /* the ethernet header */
struct {
/* these are needed for csum at uisnic end */
u8 valid; /* 1 = struct is valid - else ignore */
@@ -323,7 +321,7 @@ struct net_pkt_xmtdone {
*/
#define RCVPOST_BUF_SIZE 4032
#define MAX_NET_RCV_CHAIN \
- ((ETH_MAX_MTU + ETH_HEADER_SIZE + RCVPOST_BUF_SIZE - 1) \
+ ((VISOR_ETH_MAX_MTU + ETH_HLEN + RCVPOST_BUF_SIZE - 1) \
/ RCVPOST_BUF_SIZE)
struct net_pkt_rcvpost {
diff --git a/drivers/staging/unisys/visornic/visornic_main.c b/drivers/staging/unisys/visornic/visornic_main.c
index 1367007..f8a584b 100644
--- a/drivers/staging/unisys/visornic/visornic_main.c
+++ b/drivers/staging/unisys/visornic/visornic_main.c
@@ -791,7 +791,7 @@ visornic_xmit(struct sk_buff *skb, struct net_device *netdev)
* pointing to
*/
firstfraglen = skb->len - skb->data_len;
- if (firstfraglen < ETH_HEADER_SIZE) {
+ if (firstfraglen < ETH_HLEN) {
spin_unlock_irqrestore(&devdata->priv_lock, flags);
devdata->busy_cnt++;
dev_err(&netdev->dev,
@@ -864,7 +864,7 @@ visornic_xmit(struct sk_buff *skb, struct net_device *netdev)
/* copy ethernet header from first frag into ocmdrsp
* - everything else will be pass in frags & DMA'ed
*/
- memcpy(cmdrsp->net.xmt.ethhdr, skb->data, ETH_HEADER_SIZE);
+ memcpy(cmdrsp->net.xmt.ethhdr, skb->data, ETH_HLEN);
/* copy frags info - from skb->data we need to only provide access
* beyond eth header
*/
--
2.10.0
* [PATCH] x86/vmware: Read tsc_khz only once - at boot time
From: Alexey Makhalov @ 2016-10-20 5:02 UTC (permalink / raw)
To: Alok Kataria, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
virtualization, linux-kernel
Re-factor the vmware platform setup code to query the hypervisor for tsc
frequency only once during boot. Since the VMware hypervisor guarantees
constant TSC, calibrate_tsc now uses the saved value.
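A simplified user-space sketch of the pattern: read the frequency once, keep
the derived values, and return the saved value afterwards. The toy_ names, the
cached_ variables and the HZ value of 250 are assumptions for the demo; the
real code obtains eax/ebx from the hypervisor through VMWARE_PORT(GETHZ, ...).

```c
#include <assert.h>
#include <stdint.h>

#define HZ 250 /* assumed tick rate for the demo; the kernel's HZ is a build option */

static unsigned long cached_tsc_khz; /* written once during "boot" */
static unsigned long cached_lpj;

/* One-time setup: split the 64-bit Hz value into kHz and loops per jiffy. */
static void toy_platform_setup(uint32_t eax, uint32_t ebx)
{
	uint64_t tsc_hz;

	/* The patch only trusts the result when ebx != UINT_MAX. */
	assert(ebx != UINT32_MAX);

	tsc_hz = eax | ((uint64_t)ebx << 32);
	cached_tsc_khz = (unsigned long)(tsc_hz / 1000);
	cached_lpj = (unsigned long)(tsc_hz / HZ);
}

/* Later calibration calls just return the saved value. */
static unsigned long toy_get_tsc_khz(void)
{
	return cached_tsc_khz;
}
```

For a 2.4 GHz TSC (eax = 2400000000, ebx = 0) this stores 2400000 kHz, and
every later call to toy_get_tsc_khz() returns that value without querying
anything again.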
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Acked-by: Alok N Kataria <akataria@vmware.com>
---
arch/x86/kernel/cpu/vmware.c | 37 ++++++++++++++++++-------------------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 5130985..46c7b9d 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -48,6 +48,8 @@
"2"(VMWARE_HYPERVISOR_PORT), "3"(UINT_MAX) : \
"memory");
+static unsigned long vmware_tsc_khz __ro_after_init;
+
static inline int __vmware_platform(void)
{
uint32_t eax, ebx, ecx, edx;
@@ -57,35 +59,32 @@ static inline int __vmware_platform(void)
static unsigned long vmware_get_tsc_khz(void)
{
- uint64_t tsc_hz, lpj;
- uint32_t eax, ebx, ecx, edx;
-
- VMWARE_PORT(GETHZ, eax, ebx, ecx, edx);
-
- tsc_hz = eax | (((uint64_t)ebx) << 32);
- do_div(tsc_hz, 1000);
- BUG_ON(tsc_hz >> 32);
- pr_info("TSC freq read from hypervisor : %lu.%03lu MHz\n",
- (unsigned long) tsc_hz / 1000,
- (unsigned long) tsc_hz % 1000);
-
- if (!preset_lpj) {
- lpj = ((u64)tsc_hz * 1000);
- do_div(lpj, HZ);
- preset_lpj = lpj;
- }
-
- return tsc_hz;
+ return vmware_tsc_khz;
}
static void __init vmware_platform_setup(void)
{
uint32_t eax, ebx, ecx, edx;
+ uint64_t lpj, tsc_khz;
VMWARE_PORT(GETHZ, eax, ebx, ecx, edx);
if (ebx != UINT_MAX) {
+ lpj = tsc_khz = eax | (((uint64_t)ebx) << 32);
+ do_div(tsc_khz, 1000);
+ WARN_ON(tsc_khz >> 32);
+ pr_info("TSC freq read from hypervisor : %lu.%03lu MHz\n",
+ (unsigned long) tsc_khz / 1000,
+ (unsigned long) tsc_khz % 1000);
+
+ if (!preset_lpj) {
+ do_div(lpj, HZ);
+ preset_lpj = lpj;
+ }
+
+ vmware_tsc_khz = tsc_khz;
x86_platform.calibrate_tsc = vmware_get_tsc_khz;
+
#ifdef CONFIG_X86_LOCAL_APIC
/* Skip lapic calibration since we know the bus frequency. */
lapic_timer_frequency = ecx / HZ;
--
1.9.1
* Re: [PATCH net-next 5/6] net: use core MTU range checking in virt drivers
From: Shrikrishna Khare @ 2016-10-19 22:21 UTC (permalink / raw)
To: Jarod Wilson
Cc: Michael S. Tsirkin, VMware, Inc., netdev, Haiyang Zhang,
linux-kernel, virtualization, Shrikrishna Khare
In-Reply-To: <20161019023333.15760-6-jarod@redhat.com>
On Wed, 19 Oct 2016, Jarod Wilson wrote:
> hyperv_net:
> - set min/max_mtu
>
> virtio_net:
> - set min/max_mtu
> - remove virtnet_change_mtu
>
> vmxnet3:
> - set min/max_mtu
>
> CC: netdev@vger.kernel.org
> CC: virtualization@lists.linux-foundation.org
> CC: "K. Y. Srinivasan" <kys@microsoft.com>
> CC: Haiyang Zhang <haiyangz@microsoft.com>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: Shrikrishna Khare <skhare@vmware.com>
> CC: "VMware, Inc." <pv-drivers@vmware.com>
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
The vmxnet3 part of the change looks good to me.
Thanks,
Shri
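For readers following the series, the idea of "core MTU range checking" can be sketched roughly as below. This is a simplified user-space model, not the actual net core code: struct sketch_netdev and sketch_set_mtu() are hypothetical names, and treating max_mtu == 0 as "no upper bound" is an assumption of this sketch. The two constants carry the values 68 and 65535 that the removed virtio_net defines used.

```c
#include <errno.h>

#define SKETCH_ETH_MIN_MTU 68      /* value of the old MIN_MTU define */
#define SKETCH_ETH_MAX_MTU 0xFFFFU /* value of the old MAX_MTU define (65535) */

/* Hypothetical, stripped-down stand-in for struct net_device. */
struct sketch_netdev {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu; /* 0 is treated as "no upper bound" in this sketch */
};

/* One central range check replaces the per-driver change_mtu callbacks. */
static int sketch_set_mtu(struct sketch_netdev *dev, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = (unsigned int)new_mtu;
	return 0;
}
```

A driver then only has to fill in min_mtu and max_mtu at probe time, which is why virtnet_change_mtu can go away.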
^ permalink raw reply
* Re: [PATCH v4 5/5] x86, kvm: support vcpu preempted check
From: Pan Xinhui @ 2016-10-19 18:45 UTC (permalink / raw)
To: Radim Krčmář, Pan Xinhui
Cc: kernellwp, linux-s390, jgross, kvm, peterz, xen-devel-request,
will.deacon, linux-kernel, virtualization, mingo, paulus, mpe,
benh, pbonzini, paulmck, linuxppc-dev, boqun.feng
In-Reply-To: <20161019172403.GA9240@potion>
On 2016/10/20 01:24, Radim Krčmář wrote:
> 2016-10-19 06:20-0400, Pan Xinhui:
>> This is to fix some lock holder preemption issues. Some other locks
>> implementation do a spin loop before acquiring the lock itself.
>> Currently kernel has an interface of bool vcpu_is_preempted(int cpu). It
>> takes the cpu as parameter and return true if the cpu is preempted. Then
>> kernel can break the spin loops upon on the retval of vcpu_is_preempted.
>>
>> As kernel has used this interface, So lets support it.
>>
>> We use one field of struct kvm_steal_time to indicate that if one vcpu
>> is running or not.
>>
>> unix benchmark result:
>> host: kernel 4.8.1, i5-4570, 4 cpus
>> guest: kernel 4.8.1, 8 vcpus
>>
>> test-case after-patch before-patch
>> Execl Throughput | 18307.9 lps | 11701.6 lps
>> File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
>> File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
>> File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
>> Pipe Throughput | 11872208.7 lps | 11855628.9 lps
>> Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
>> Process Creation | 29881.2 lps | 28572.8 lps
>> Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
>> Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
>> System Call Overhead | 10385653.0 lps | 10419979.0 lps
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> @@ -98,6 +98,10 @@ struct pv_time_ops {
>> unsigned long long (*steal_clock)(int cpu);
>> };
>>
>> +struct pv_vcpu_ops {
>> + bool (*vcpu_is_preempted)(int cpu);
>> +};
>> +
>
> (I would put it into pv_lock_ops to save the plumbing.)
>
Hi Radim,
Thanks for your reply.
Yes, a new struct makes the patch touch more lines than necessary.
I did that only because I was not sure which existing xxx_ops I should place vcpu_is_preempted in.
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -45,7 +45,8 @@ struct kvm_steal_time {
>> __u64 steal;
>> __u32 version;
>> __u32 flags;
>> - __u32 pad[12];
>> + __u32 preempted;
>
> Why __u32 instead of __u8?
>
I thought it should be 32-bit aligned...
Yes, a u8 is enough to store the preempted status.
>> + __u32 pad[11];
>> };
>
> Please document the change in Documentation/virtual/kvm/msr.txt, section
> MSR_KVM_STEAL_TIME.
>
okay, I totally forgot to do that. thanks!
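The size question raised above can be checked with a small user-space sketch. The struct names are hypothetical and the fixed-width types stand in for __u64/__u32/__u8; the explicit u8_pad[3] member in the second variant is one possible way to keep the layout stable, assumed here for illustration. Both variants keep the structure at 64 bytes and leave pad[] at the same offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as posted in the patch: a full 32-bit word for the flag. */
struct sketch_steal_time_u32 {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t preempted;
	uint32_t pad[11];
};

/* Alternative: one byte for the flag plus three explicit padding bytes. */
struct sketch_steal_time_u8 {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint8_t  preempted;
	uint8_t  u8_pad[3];
	uint32_t pad[11];
};

/* The guest/host ABI depends on the total size staying at 64 bytes. */
_Static_assert(sizeof(struct sketch_steal_time_u32) == 64, "size changed");
_Static_assert(sizeof(struct sketch_steal_time_u8) == 64, "size changed");
_Static_assert(offsetof(struct sketch_steal_time_u8, pad) ==
	       offsetof(struct sketch_steal_time_u32, pad), "pad moved");
```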
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> @@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
>> +static bool kvm_vcpu_is_preempted(int cpu)
>> +{
>> + struct kvm_steal_time *src;
>> +
>> + src = &per_cpu(steal_time, cpu);
>> +
>> + return !!src->preempted;
>> +}
>> +
>> #ifdef CONFIG_SMP
>> static void __init kvm_smp_prepare_boot_cpu(void)
>> {
>> @@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
>> kvm_guest_cpu_init();
>> #endif
>>
>> + pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
>
> Would be nicer to assign conditionally in the KVM_FEATURE_STEAL_TIME
> block. The steal_time structure has to be zeroed, so this code would
> work, but the native function (return false) is better if we know that
> the kvm_vcpu_is_preempted() would always return false anyway.
>
Yes, agreed. Will do that.
I once thought we could patch the code at runtime, replacing the binary code
"call 0xXXXXXXXX #pv_vcpu_ops.vcpu_is_preempted"
with
"xor eax, eax"
However, it is not worth doing that; the performance improvement would probably be very small.
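The conditional assignment discussed above can be modelled in user space roughly like this. It is a sketch under assumptions, not the kernel code: all sketch_* names are hypothetical, the per-cpu steal_time area is replaced by a plain array, and the has_steal_time argument stands in for the KVM_FEATURE_STEAL_TIME check.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 8

struct sketch_vcpu_ops {
	bool (*vcpu_is_preempted)(int cpu);
};

/* Native default: on bare metal a cpu is never a preempted vcpu. */
static bool sketch_native_vcpu_is_preempted(int cpu)
{
	(void)cpu;
	return false;
}

/* Stand-in for the per-cpu steal_time.preempted field. */
static unsigned char sketch_preempted[SKETCH_NR_CPUS];

static bool sketch_kvm_vcpu_is_preempted(int cpu)
{
	return sketch_preempted[cpu] != 0;
}

static struct sketch_vcpu_ops sketch_ops = {
	.vcpu_is_preempted = sketch_native_vcpu_is_preempted,
};

/* Switch to the KVM variant only when the steal time feature is there. */
static void sketch_guest_init(bool has_steal_time)
{
	if (has_steal_time)
		sketch_ops.vcpu_is_preempted = sketch_kvm_vcpu_is_preempted;
}
```

Without the feature the native function stays in place, so callers always get false without reading a structure that the host never updates.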
> Old KVMs won't have the feature, so we could also assign only when KVM
> reports it, but that requires extra definitions and the performance gain
> is fairly small, so I'm ok with this.
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>> &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
>> return;
>>
>> + vcpu->arch.st.steal.preempted = 0;
>> +
>> if (vcpu->arch.st.steal.version & 1)
>> vcpu->arch.st.steal.version += 1; /* first time write, random junk */
>>
>> @@ -2812,6 +2814,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>
>> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> {
>> + if (vcpu->arch.st.msr_val & KVM_MSR_ENABLED)
>> + if (kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
>> + &vcpu->arch.st.steal,
>> + sizeof(struct kvm_steal_time)) == 0) {
>> + vcpu->arch.st.steal.preempted = 1;
>> + kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
>> + &vcpu->arch.st.steal,
>> + sizeof(struct kvm_steal_time));
>> + }
>
> Please name this block of code. Something like
> kvm_steal_time_set_preempted(vcpu);
>
yep, my code style is ugly.
will do that.
thanks
xinhui
> Thanks.
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v4 5/5] x86, kvm: support vcpu preempted check
From: Radim Krčmář @ 2016-10-19 17:24 UTC (permalink / raw)
To: Pan Xinhui
Cc: kernellwp, linux-s390, jgross, kvm, peterz, xen-devel-request,
will.deacon, linux-kernel, virtualization, mingo, paulus, mpe,
benh, pbonzini, paulmck, linuxppc-dev, boqun.feng
In-Reply-To: <1476872416-42752-6-git-send-email-xinhui.pan@linux.vnet.ibm.com>
2016-10-19 06:20-0400, Pan Xinhui:
> This is to fix some lock holder preemption issues. Some other locks
> implementation do a spin loop before acquiring the lock itself.
> Currently kernel has an interface of bool vcpu_is_preempted(int cpu). It
> takes the cpu as parameter and return true if the cpu is preempted. Then
> kernel can break the spin loops upon on the retval of vcpu_is_preempted.
>
> As kernel has used this interface, So lets support it.
>
> We use one field of struct kvm_steal_time to indicate that if one vcpu
> is running or not.
>
> unix benchmark result:
> host: kernel 4.8.1, i5-4570, 4 cpus
> guest: kernel 4.8.1, 8 vcpus
>
> test-case after-patch before-patch
> Execl Throughput | 18307.9 lps | 11701.6 lps
> File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
> File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
> Pipe Throughput | 11872208.7 lps | 11855628.9 lps
> Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
> Process Creation | 29881.2 lps | 28572.8 lps
> Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
> Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
> System Call Overhead | 10385653.0 lps | 10419979.0 lps
>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> @@ -98,6 +98,10 @@ struct pv_time_ops {
> unsigned long long (*steal_clock)(int cpu);
> };
>
> +struct pv_vcpu_ops {
> + bool (*vcpu_is_preempted)(int cpu);
> +};
> +
(I would put it into pv_lock_ops to save the plumbing.)
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -45,7 +45,8 @@ struct kvm_steal_time {
> __u64 steal;
> __u32 version;
> __u32 flags;
> - __u32 pad[12];
> + __u32 preempted;
Why __u32 instead of __u8?
> + __u32 pad[11];
> };
Please document the change in Documentation/virtual/kvm/msr.txt, section
MSR_KVM_STEAL_TIME.
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
> +static bool kvm_vcpu_is_preempted(int cpu)
> +{
> + struct kvm_steal_time *src;
> +
> + src = &per_cpu(steal_time, cpu);
> +
> + return !!src->preempted;
> +}
> +
> #ifdef CONFIG_SMP
> static void __init kvm_smp_prepare_boot_cpu(void)
> {
> @@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
> kvm_guest_cpu_init();
> #endif
>
> + pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
Would be nicer to assign conditionally in the KVM_FEATURE_STEAL_TIME
block. The steal_time structure has to be zeroed, so this code would
work, but the native function (return false) is better if we know that
the kvm_vcpu_is_preempted() would always return false anyway.
Old KVMs won't have the feature, so we could also assign only when KVM
reports it, but that requires extra definitions and the performance gain
is fairly small, so I'm ok with this.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> &vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
> return;
>
> + vcpu->arch.st.steal.preempted = 0;
> +
> if (vcpu->arch.st.steal.version & 1)
> vcpu->arch.st.steal.version += 1; /* first time write, random junk */
>
> @@ -2812,6 +2814,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> {
> + if (vcpu->arch.st.msr_val & KVM_MSR_ENABLED)
> + if (kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> + &vcpu->arch.st.steal,
> + sizeof(struct kvm_steal_time)) == 0) {
> + vcpu->arch.st.steal.preempted = 1;
> + kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> + &vcpu->arch.st.steal,
> + sizeof(struct kvm_steal_time));
> + }
Please name this block of code. Something like
kvm_steal_time_set_preempted(vcpu);
Thanks.
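A rough user-space model of what pulling that block into a named helper could look like is below. Everything here is an assumption for illustration: the sketch_* types and functions are hypothetical, and the two memcpy()-based accessors merely stand in for kvm_read_guest_cached() and kvm_write_guest_cached(), returning 0 on success.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MSR_ENABLED 1u /* stand-in for KVM_MSR_ENABLED */

struct sketch_steal_time {
	uint64_t steal;
	uint32_t version;
	uint32_t flags;
	uint32_t preempted;
	uint32_t pad[11];
};

struct sketch_vcpu {
	uint64_t msr_val;
	struct sketch_steal_time steal;      /* host-side copy */
	struct sketch_steal_time *guest_mem; /* NULL models an unmapped area */
};

/* Stand-ins for the cached guest accessors; 0 means success. */
static int sketch_read_guest(struct sketch_vcpu *v)
{
	if (!v->guest_mem)
		return -1;
	memcpy(&v->steal, v->guest_mem, sizeof(v->steal));
	return 0;
}

static int sketch_write_guest(struct sketch_vcpu *v)
{
	if (!v->guest_mem)
		return -1;
	memcpy(v->guest_mem, &v->steal, sizeof(v->steal));
	return 0;
}

/* The named helper: early returns replace the nested if statements. */
static void sketch_steal_time_set_preempted(struct sketch_vcpu *vcpu)
{
	if (!(vcpu->msr_val & SKETCH_MSR_ENABLED))
		return;
	if (sketch_read_guest(vcpu) != 0)
		return;
	vcpu->steal.preempted = 1;
	sketch_write_guest(vcpu);
}

/* The vcpu_put path then shrinks to a single call. */
static void sketch_vcpu_put(struct sketch_vcpu *vcpu)
{
	sketch_steal_time_set_preempted(vcpu);
}
```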
^ permalink raw reply
* Re: [PATCH v4 0/5] implement vcpu preempted check
From: Pan Xinhui @ 2016-10-19 17:08 UTC (permalink / raw)
To: Juergen Gross, Pan Xinhui, linux-kernel, linuxppc-dev,
virtualization, linux-s390, xen-devel, kvm
Cc: kernellwp, peterz, benh, will.deacon, mingo, paulus, mpe,
pbonzini, paulmck, boqun.feng
In-Reply-To: <a801cd40-1019-01e2-1013-f04363fc7f31@suse.com>
On 2016/10/19 23:58, Juergen Gross wrote:
> On 19/10/16 12:20, Pan Xinhui wrote:
>> change from v3:
>> add x86 vcpu preempted check patch
>> change from v2:
>> no code change, fix typos, update some comments
>> change from v1:
>> a simpler definition of default vcpu_is_preempted
>> skip machine type check on ppc, and add config. remove dedicated macro.
>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> add more comments
>> thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>
>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>
>> We have also observed some performance improvements.
>>
>> PPC test result:
>>
>> 1 copy - 0.94%
>> 2 copy - 7.17%
>> 4 copy - 11.9%
>> 8 copy - 3.04%
>> 16 copy - 15.11%
>>
>> details below:
>> Without patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks 768000.0 KBps (30.1 s, 1 samples)
>>
>> With patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks 904762.0 KBps (30.0 s, 1 samples)
>>
>> X86 test result:
>> test-case after-patch before-patch
>> Execl Throughput | 18307.9 lps | 11701.6 lps
>> File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
>> File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
>> File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
>> Pipe Throughput | 11872208.7 lps | 11855628.9 lps
>> Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
>> Process Creation | 29881.2 lps | 28572.8 lps
>> Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
>> Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
>> System Call Overhead | 10385653.0 lps | 10419979.0 lps
>>
>> Pan Xinhui (5):
>> kernel/sched: introduce vcpu preempted check interface
>> locking/osq: Drop the overload of osq_lock()
>> kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>> powerpc/spinlock: support vcpu preempted check
>> x86, kvm: support vcpu preempted check
>
> The attached patch adds Xen support for x86. Please tell me whether you
> want to add this patch to your series or if I should post it when your
> series has been accepted.
>
Hi Juergen,
Your patch is pretty small and nice :) Thanks!
I can include your patch in my next patchset once this one has been reviewed. :)
> You can add my
>
> Tested-by: Juergen Gross <jgross@suse.com>
>
> for patches 1-3 and 5 (paravirt parts only).
>
Thanks a lot!
xinhui
>
> Juergen
>
>>
>> arch/powerpc/include/asm/spinlock.h | 8 ++++++++
>> arch/x86/include/asm/paravirt_types.h | 6 ++++++
>> arch/x86/include/asm/spinlock.h | 8 ++++++++
>> arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
>> arch/x86/kernel/kvm.c | 11 +++++++++++
>> arch/x86/kernel/paravirt.c | 11 +++++++++++
>> arch/x86/kvm/x86.c | 12 ++++++++++++
>> include/linux/sched.h | 12 ++++++++++++
>> kernel/locking/mutex.c | 15 +++++++++++++--
>> kernel/locking/osq_lock.c | 10 +++++++++-
>> kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
>> 11 files changed, 105 insertions(+), 7 deletions(-)
>>
>
^ permalink raw reply
* Call for Papers - WorldCIST'17 Workshops - Porto Santo Island
From: ML @ 2016-10-19 17:04 UTC (permalink / raw)
To: virtualization
[-- Attachment #1: Type: text/plain, Size: 5479 bytes --]
Please disseminate by your contacts. Thank you!
*Best papers published in SCI/SSCI-indexed journals
---------------------------------------------------------------------------
WorldCIST'17 - 5th World Conference on Information Systems and Technologies
Porto Santo Island, Madeira, Portugal
11th-13th of April 2017
http://www.worldcist.org/
--------------------------------------------------------------------
WorldCIST 2017 will feature a total of 18 Workshops. Paper submission for all Workshops must be performed at https://easychair.org/conferences/?conf=worldcist_workshops2017 selecting the desired Workshop. Workshop papers (Full 10 Pages and Short 7 Pages) will be published by Springer AISC series and the authors of the best Workshop paper will be invited to extend their work for publication at top International Journals (indexed by ISI Web of Knowledge and SCOPUS). Paper submission is open until November 27th for all Workshops.
WORKSHOPS
BIO - Business Intelligence in Organizations
CMAIPA - Computational Methods and Applications for Image Processing and Analysis
CSQA - Computer Supported Qualitative Analysis
ESG - Educational and Serious Games
ETCBPM - Emerging Trends and Challenges in Business Process Management
HISISE - Workshop on Healthcare Information Systems Interoperability, Security and Efficiency
HMInARMM - Human-Machine Interfaces in Automation, Robotics, Mechanics and Mechatronics
ICDSS - Intelligent and Collaborative Decision Support Systems for Improving Manufacturing Processes
ICTwithUAV - ICT solutions with Unmanned Aircraft Vehicles
IoT4Health - Workshop on Internet of Things for Health
ISM - Intelligent Systems and Machines
ISTA - Information Systems and Technologies Adoption
MAMM&MJ - Managing Audiovisual Mass Media (governance, funding and innovation) and Mobile Journalism
NPAT - New Pedagogical Approaches with Technologies
PIS - Workshop on Pervasive Information Systems
RSPPI - Resources Sharing between Private and Public Institutions
SIdEWayS - Social Media World Sensors
TinW - Technologies in the Workplace - Use and Impact on Workers
IMPORTANT DATES
Deadline for paper submission: November 27th, 2016
Notification of paper acceptance: December 25th, 2016
Deadline for final versions and conference registration: January 8th, 2017
Conference dates: April 11 -13, 2017
SUBMISSION AND PAPER FORMAT
Please Submit your paper at: https://easychair.org/conferences/?conf=worldcist_workshops2017
Two types of papers can be submitted to workshops (both will be published at the Springer AISC proceedings):
- Full papers: Finished or consolidated R&D works. These papers are assigned a 10-page limit.
- Short papers: Finished or consolidated R&D works and also Ongoing work but with relevant preliminary results, open to discussion. These papers are assigned a 7-page limit.
Submitted papers must comply with the format of Advances in Intelligent Systems and Computing Series (see Instructions for Authors at Springer Website or download a DOC example) be written in English, must not have been published before, not be under review for any other conference, workshop or publication.
Paper should not include any information leading to the authors identification (in order to enable double blind review). Therefore, the authors names, affiliations and bibliographic references should not be included in the version for evaluation by the Program Committee. This information should only be included in the camera-ready version, saved in Word or Latex format and also in PDF format. These files must be accompanied by the Publication form filled out, in a ZIP file, and uploaded at the conference management system.
All papers will be subjected to a double-blind review by at least two/three members of the Program Committee. Based on Program Committee evaluation, a paper can be rejected or accepted by the Conference Chairs. In the latter case, it can be accepted as the type originally submitted or as another type. Thus, full papers can be accepted as short papers.
PUBLICATION AND INDEXING
Workshop papers will be published in the AISC Springer Conference Proceedings. To ensure that a paper is published in the Proceedings, at least one of the authors must be fully registered by 11th of January 2017, and the paper must comply with the suggested layout and page-limit. Additionally, all recommended changes must be addressed by the authors before they submit the camera-ready version. No more than one paper per registration will be published in the Conference Proceedings. An extra fee must be paid for publication of additional papers, with a maximum of one additional paper per registration. Full and short papers will be published in the Conference Proceedings by Springer, in Advances in Intelligent Systems and Computing. Published full and short papers will be submitted for indexation by ISI, EI-Compendex, SCOPUS and DBLP, among others, and will be available in the SpringerLink Digital Library. The authors of the best selected papers will be invited to extend them for publication in renowned international journals indexed by ISI, SCOPUS and DBLP (see the information available at the main conference CFP for more details).
Website of WorldCIST'17: http://www.worldcist.org/
Best regards,
Maria Lemos
AISTI
http://www.aisti.eu/
^ permalink raw reply
* Re: [PATCH v4 0/5] implement vcpu preempted check
From: Pan Xinhui @ 2016-10-19 16:57 UTC (permalink / raw)
To: Christian Borntraeger, Pan Xinhui, linux-kernel, linuxppc-dev,
virtualization, linux-s390, xen-devel-request, kvm
Cc: kernellwp, jgross, peterz, benh, will.deacon, mingo, paulus, mpe,
pbonzini, paulmck, boqun.feng
In-Reply-To: <e3fd1bd3-c57d-f0b0-68c4-ecd450d1ad32@de.ibm.com>
On 2016/10/19 14:47, Christian Borntraeger wrote:
> On 10/19/2016 12:20 PM, Pan Xinhui wrote:
>> change from v3:
>> add x86 vcpu preempted check patch
>
> If you want you could add the s390 patch that I provided for your last version.
> I also gave my Acked-by for all previous patches.
>
Hi Christian,
Thanks a lot!
I can include your new s390 patch in my next patchset (if a v5 is needed).
xinhui
>
>
>> change from v2:
>> no code change, fix typos, update some comments
>> change from v1:
>> a simpler definition of default vcpu_is_preempted
>> skip machine type check on ppc, and add config. remove dedicated macro.
>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> add more comments
>> thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>
>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>>
>> We have also observed some performance improvements.
>>
>> PPC test result:
>>
>> 1 copy - 0.94%
>> 2 copy - 7.17%
>> 4 copy - 11.9%
>> 8 copy - 3.04%
>> 16 copy - 15.11%
>>
>> details below:
>> Without patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks 768000.0 KBps (30.1 s, 1 samples)
>>
>> With patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks 904762.0 KBps (30.0 s, 1 samples)
>>
>> X86 test result:
>> test-case after-patch before-patch
>> Execl Throughput | 18307.9 lps | 11701.6 lps
>> File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
>> File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
>> File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
>> Pipe Throughput | 11872208.7 lps | 11855628.9 lps
>> Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
>> Process Creation | 29881.2 lps | 28572.8 lps
>> Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
>> Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
>> System Call Overhead | 10385653.0 lps | 10419979.0 lps
>>
>> Pan Xinhui (5):
>> kernel/sched: introduce vcpu preempted check interface
>> locking/osq: Drop the overload of osq_lock()
>> kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>> powerpc/spinlock: support vcpu preempted check
>> x86, kvm: support vcpu preempted check
>>
>> arch/powerpc/include/asm/spinlock.h | 8 ++++++++
>> arch/x86/include/asm/paravirt_types.h | 6 ++++++
>> arch/x86/include/asm/spinlock.h | 8 ++++++++
>> arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
>> arch/x86/kernel/kvm.c | 11 +++++++++++
>> arch/x86/kernel/paravirt.c | 11 +++++++++++
>> arch/x86/kvm/x86.c | 12 ++++++++++++
>> include/linux/sched.h | 12 ++++++++++++
>> kernel/locking/mutex.c | 15 +++++++++++++--
>> kernel/locking/osq_lock.c | 10 +++++++++-
>> kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
>> 11 files changed, 105 insertions(+), 7 deletions(-)
>>
>
^ permalink raw reply
* Re: [PATCH v4 0/5] implement vcpu preempted check
From: Juergen Gross @ 2016-10-19 15:58 UTC (permalink / raw)
To: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
linux-s390, xen-devel, kvm
Cc: kernellwp, peterz, benh, will.deacon, mingo, paulus, mpe,
pbonzini, paulmck, boqun.feng
In-Reply-To: <1476872416-42752-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 4542 bytes --]
On 19/10/16 12:20, Pan Xinhui wrote:
> change from v3:
> add x86 vcpu preempted check patch
> change from v2:
> no code change, fix typos, update some comments
> change from v1:
> a simpler definition of default vcpu_is_preempted
> skip machine type check on ppc, and add config. remove dedicated macro.
> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
> add more comments
> thanks boqun and Peter's suggestion.
>
> This patch set aims to fix lock holder preemption issues.
>
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
>
> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_owner variants also caused RCU stalls before we applied this patch set.
>
> We have also observed some performance improvements.
>
> PPC test result:
>
> 1 copy - 0.94%
> 2 copy - 7.17%
> 4 copy - 11.9%
> 8 copy - 3.04%
> 16 copy - 15.11%
>
> details below:
> Without patch:
>
> 1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks 768000.0 KBps (30.1 s, 1 samples)
>
> With patch:
>
> 1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks 904762.0 KBps (30.0 s, 1 samples)
>
> X86 test result:
> test-case after-patch before-patch
> Execl Throughput | 18307.9 lps | 11701.6 lps
> File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps
> File Copy 256 bufsize 500 maxblocks | 367555.6 KBps | 222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps
> Pipe Throughput | 11872208.7 lps | 11855628.9 lps
> Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps
> Process Creation | 29881.2 lps | 28572.8 lps
> Shell Scripts (1 concurrent) | 23224.3 lpm | 22607.4 lpm
> Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm
> System Call Overhead | 10385653.0 lps | 10419979.0 lps
>
> Pan Xinhui (5):
> kernel/sched: introduce vcpu preempted check interface
> locking/osq: Drop the overload of osq_lock()
> kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
> powerpc/spinlock: support vcpu preempted check
> x86, kvm: support vcpu preempted check
The attached patch adds Xen support for x86. Please tell me whether you
want to add this patch to your series or if I should post it when your
series has been accepted.
You can add my
Tested-by: Juergen Gross <jgross@suse.com>
for patches 1-3 and 5 (paravirt parts only).
Juergen
>
> arch/powerpc/include/asm/spinlock.h | 8 ++++++++
> arch/x86/include/asm/paravirt_types.h | 6 ++++++
> arch/x86/include/asm/spinlock.h | 8 ++++++++
> arch/x86/include/uapi/asm/kvm_para.h | 3 ++-
> arch/x86/kernel/kvm.c | 11 +++++++++++
> arch/x86/kernel/paravirt.c | 11 +++++++++++
> arch/x86/kvm/x86.c | 12 ++++++++++++
> include/linux/sched.h | 12 ++++++++++++
> kernel/locking/mutex.c | 15 +++++++++++++--
> kernel/locking/osq_lock.c | 10 +++++++++-
> kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
> 11 files changed, 105 insertions(+), 7 deletions(-)
>
[-- Attachment #2: 0001-x86-xen-support-vcpu-preempted-check.patch --]
[-- Type: text/x-patch, Size: 1414 bytes --]
From c79b86d00a812d6207ef788d453e2d0289ef22a0 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Wed, 19 Oct 2016 15:30:59 +0200
Subject: [PATCH] x86, xen: support vcpu preempted check
Support the vcpu_is_preempted() functionality under Xen. This will
enhance lock performance on overcommitted hosts (more runnable vcpus
than physical cpus in the system) as doing busy waits for preempted
vcpus will hurt system performance far worse than early yielding.
A quick test (4 vcpus on 1 physical cpu doing a parallel build job
with "make -j 8") reduced system time by about 5% with this patch.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
arch/x86/xen/spinlock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 3d6e006..1d53b1b 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -114,7 +114,6 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(irq_name, cpu) = NULL;
}
-
/*
* Our init of PV spinlocks is split in two init functions due to us
* using paravirt patching and jump labels patching and having to do
@@ -137,6 +136,8 @@ void __init xen_init_spinlocks(void)
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_lock_ops.wait = xen_qlock_wait;
pv_lock_ops.kick = xen_qlock_kick;
+
+ pv_vcpu_ops.vcpu_is_preempted = xen_vcpu_stolen;
}
/*
--
2.6.6
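The early-yield idea behind vcpu_is_preempted() in the optimistic spin loops can be illustrated with a small user-space model. This is not the osq_lock or spin_on_owner code: the sketch_* names, the array of preempted flags, and the bounded spin count are all assumptions made so the example stays self-contained.

```c
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

/* Stand-in for the hypervisor-provided "this vcpu is not running" state. */
static bool sketch_cpu_preempted[SKETCH_NR_CPUS];

static bool sketch_vcpu_is_preempted(int cpu)
{
	return sketch_cpu_preempted[cpu];
}

struct sketch_lock {
	int owner_cpu; /* -1 when the lock is free */
};

/*
 * Spin while the owner is running. Returns true if the lock was seen
 * free, false if the caller should stop spinning (and typically sleep),
 * either because the owner's vcpu is preempted or the budget ran out.
 */
static bool sketch_spin_on_owner(const volatile struct sketch_lock *lock,
				 unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		int owner = lock->owner_cpu;

		if (owner < 0)
			return true;
		/* A preempted owner cannot release the lock: give up early. */
		if (sketch_vcpu_is_preempted(owner))
			return false;
	}
	return false;
}
```

On an overcommitted host the second check is what avoids burning a full time slice waiting for a lock holder that is not scheduled.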
^ permalink raw reply related
* Re: [PATCH net-next 5/6] net: use core MTU range checking in virt drivers
From: Jarod Wilson @ 2016-10-19 14:23 UTC (permalink / raw)
To: Haiyang Zhang
Cc: Michael S. Tsirkin, VMware, Inc., netdev@vger.kernel.org,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, Shrikrishna Khare
In-Reply-To: <BLUPR03MB141289B9E24503DF35D5AAC3CAD20@BLUPR03MB1412.namprd03.prod.outlook.com>
On Wed, Oct 19, 2016 at 02:07:47PM +0000, Haiyang Zhang wrote:
>
>
> > -----Original Message-----
> > From: Jarod Wilson [mailto:jarod@redhat.com]
> > Sent: Tuesday, October 18, 2016 10:34 PM
> > To: linux-kernel@vger.kernel.org
> > Cc: Jarod Wilson <jarod@redhat.com>; netdev@vger.kernel.org;
> > virtualization@lists.linux-foundation.org; KY Srinivasan
> > <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Michael S.
> > Tsirkin <mst@redhat.com>; Shrikrishna Khare <skhare@vmware.com>; VMware,
> > Inc. <pv-drivers@vmware.com>
> > Subject: [PATCH net-next 5/6] net: use core MTU range checking in virt
> > drivers
> >
> > hyperv_net:
> > - set min/max_mtu
> >
> > virtio_net:
> > - set min/max_mtu
> > - remove virtnet_change_mtu
> >
> > vmxnet3:
> > - set min/max_mtu
> >
> > CC: netdev@vger.kernel.org
> > CC: virtualization@lists.linux-foundation.org
> > CC: "K. Y. Srinivasan" <kys@microsoft.com>
> > CC: Haiyang Zhang <haiyangz@microsoft.com>
> > CC: "Michael S. Tsirkin" <mst@redhat.com>
> > CC: Shrikrishna Khare <skhare@vmware.com>
> > CC: "VMware, Inc." <pv-drivers@vmware.com>
> > Signed-off-by: Jarod Wilson <jarod@redhat.com>
> > ---
> > drivers/net/hyperv/hyperv_net.h | 4 ++--
> > drivers/net/hyperv/netvsc_drv.c | 14 +++++++-------
> > drivers/net/virtio_net.c | 23 ++++++++++-------------
> > drivers/net/vmxnet3/vmxnet3_drv.c | 7 ++++---
> > 4 files changed, 23 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/net/hyperv/hyperv_net.h
> > b/drivers/net/hyperv/hyperv_net.h
> > index f4fbcb5..3958ada 100644
> > --- a/drivers/net/hyperv/hyperv_net.h
> > +++ b/drivers/net/hyperv/hyperv_net.h
> > @@ -606,8 +606,8 @@ struct nvsp_message {
> > } __packed;
> >
> >
> > -#define NETVSC_MTU 65536
> > -#define NETVSC_MTU_MIN 68
> > +#define NETVSC_MTU 65535
>
> Why change it to 65535? For Hyperv host, this should be 65536.
Forgot to call this change out, sorry. It was changed because
IP_MAX_MTU is 0xFFFFU, i.e. 65535.
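For reference, the arithmetic behind the "68 - 1500 or 65521" comment in the hunk below can be checked with a small user-space sketch. The *_MODEL names and netvsc_max_mtu_model() are made up for illustration and are not kernel identifiers; the values 14 (ETH_HLEN) and 1500 (ETH_DATA_LEN) are the usual Ethernet constants.

```c
#include <assert.h>

/* Stand-ins so the sketch builds outside the kernel tree (assumed names). */
#define IP_MAX_MTU_MODEL   0xFFFFU /* IP_MAX_MTU as quoted in the reply */
#define ETH_HLEN_MODEL     14      /* Ethernet header length in bytes */
#define ETH_DATA_LEN_MODEL 1500    /* default Ethernet payload size */

/* 0xFFFF is the largest value a 16-bit length field can carry. */
static_assert(IP_MAX_MTU_MODEL == 65535, "0xFFFF must equal 65535");

/*
 * Hypothetical helper mirroring the quoted probe hunk: NVSP v2 and later
 * get NETVSC_MTU minus the Ethernet header, older versions get 1500.
 */
unsigned int netvsc_max_mtu_model(unsigned int netvsc_mtu, int nvsp_v2_or_later)
{
	if (nvsp_v2_or_later)
		return netvsc_mtu - ETH_HLEN_MODEL;
	return ETH_DATA_LEN_MODEL;
}
```

With NETVSC_MTU at 65535 the helper yields 65521, which matches the comment in the patch; with the old value of 65536 it would yield 65522.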
> > @@ -1343,6 +1336,13 @@ static int netvsc_probe(struct hv_device *dev,
> >
> > netif_carrier_off(net);
> >
> > + /* MTU range: 68 - 1500 or 65521 */
> > + net->min_mtu = NETVSC_MTU_MIN;
> > + if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
> > + net->max_mtu = NETVSC_MTU - ETH_HLEN;
> > + else
> > + net->max_mtu = ETH_DATA_LEN;
> > +
> > netvsc_init_settings(net);
> >
> > net_device_ctx = netdev_priv(net);
>
> nvdev->nvsp_version is not set until after rndis_filter_device_add()
> is successfully completed.
> You need to move this part to the place just before this line:
> ret = register_netdev(net);
Okay, will fix that up.
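To illustrate why the ordering matters, here is a user-space model, not the driver code. It assumes the version field is still zero when read too early; every model_* name and MODEL_* constant is hypothetical, and MODEL_NVSP_V2 is only a stand-in for NVSP_PROTOCOL_VERSION_2. The range check is a simplified version of what the core is expected to do with min_mtu/max_mtu.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NVSP_V2      2     /* stand-in value, not the kernel constant */
#define MODEL_NETVSC_MTU   65535
#define MODEL_MTU_MIN      68
#define MODEL_ETH_HLEN     14
#define MODEL_ETH_DATA_LEN 1500

struct model_dev {
	unsigned int nvsp_version; /* 0 until model_device_add() has run */
	unsigned int min_mtu;
	unsigned int max_mtu;
	unsigned int mtu;
};

/* Models the step that negotiates the protocol version. */
static void model_device_add(struct model_dev *d)
{
	d->nvsp_version = MODEL_NVSP_V2;
}

/* Same branch structure as the quoted hunk. */
static void model_set_mtu_range(struct model_dev *d)
{
	d->min_mtu = MODEL_MTU_MIN;
	if (d->nvsp_version >= MODEL_NVSP_V2)
		d->max_mtu = MODEL_NETVSC_MTU - MODEL_ETH_HLEN;
	else
		d->max_mtu = MODEL_ETH_DATA_LEN;
}

/* Simplified range check: reject anything outside [min_mtu, max_mtu]. */
static int model_change_mtu(struct model_dev *d, int new_mtu)
{
	if (new_mtu < 0 || (unsigned int)new_mtu < d->min_mtu)
		return -EINVAL;
	if (d->max_mtu > 0 && (unsigned int)new_mtu > d->max_mtu)
		return -EINVAL;
	d->mtu = (unsigned int)new_mtu;
	return 0;
}

/* Compares the two orderings discussed in the review. */
void model_selfcheck(void)
{
	struct model_dev early = {0}, late = {0};

	/* Range set before the version is known: stuck at 1500. */
	model_set_mtu_range(&early);
	model_device_add(&early);
	assert(early.max_mtu == MODEL_ETH_DATA_LEN);
	assert(model_change_mtu(&early, 9000) == -EINVAL);

	/* Range set after the version is known: jumbo MTU accepted. */
	model_device_add(&late);
	model_set_mtu_range(&late);
	assert(late.max_mtu == MODEL_NETVSC_MTU - MODEL_ETH_HLEN);
	assert(model_change_mtu(&late, 9000) == 0);
}
```

In this model, setting the range too early leaves max_mtu at 1500 even on a v2 host, so a later request for a 9000-byte MTU is rejected. Moving the assignment to just before register_netdev(), as suggested, avoids that.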
--
Jarod Wilson
jarod@redhat.com