* [PATCH v7 0/6] Implement qspinlock/pv-qspinlock on ppc
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
Hi All,
this is the fairlock patchset. You can apply them and build successfully.
patches are based on 4.8-rc4.
qspinlock can avoid waiter starved issue. It has about the same speed in
single-thread and it can be much faster in high contention situations
especially when the spinlock is embedded within the data structure to be
protected.
v6 -> v7:
rebase onto 4.8-rc4
no changelog anymore, sorry for that. I hope there is a very careful review.
Todo:
we can save one function call overhead. As we can use feature-fixup to patch
the binary code. Currently there is pv_lock_ops->lock(lock) and ->unlock(lock) to acquire/release the lock.
some benchmark result below
perf bench
these numbers are ops per sec, So the higher the better.
*******************************************
on pSeries with 32 vcpus, 32Gb memory, pHyp.
------------------------------------------------------------------------------------
test case | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
futex hash | 618572 | 552332 | 553788
futex lock-pi | 364 | 364 | 364
sched pipe | 78984 | 76060 | 81454
------------------------------------------------------------------------------------
unix bench:
these numbers are scores, So the higher the better.
************************************************
on PowerNV with 16 cores(cpus) (smt off), 32Gb memory:
-------------
pv-qspinlock and qspinlock have very similar results because pv-qspinlock use native version
which is only having one callback overhead
------------------------------------------------------------------------------------
test case | pv-qspinlock and qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput 761.1 761.4
File Copy 1024 bufsize 2000 maxblocks 1259.8 1286.6
File Copy 256 bufsize 500 maxblocks 782.2 790.3
File Copy 4096 bufsize 8000 maxblocks 2741.5 2817.4
Pipe Throughput 1063.2 1036.7
Pipe-based Context Switching 284.7 281.1
Process Creation 679.6 649.1
Shell Scripts (1 concurrent) 1933.2 1922.9
Shell Scripts (8 concurrent) 5003.3 4899.8
System Call Overhead 900.6 896.8
==========================
System Benchmarks Index Score 1139.3 1133.0
--------------------------------------------------------------------------- ---------
*******************************************
on pSeries with 32 vcpus, 32Gb memory, pHyp.
------------------------------------------------------------------------------------
test case | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput 877.1 891.2 872.8
File Copy 1024 bufsize 2000 maxblocks 1390.4 1399.2 1395.0
File Copy 256 bufsize 500 maxblocks 882.4 889.5 881.8
File Copy 4096 bufsize 8000 maxblocks 3112.3 3113.4 3121.7
Pipe Throughput 1095.8 1162.6 1158.5
Pipe-based Context Switching 194.9 192.7 200.7
Process Creation 518.4 526.4 509.1
Shell Scripts (1 concurrent) 1401.9 1413.9 1402.2
Shell Scripts (8 concurrent) 3215.6 3246.6 3229.1
System Call Overhead 833.2 892.4 888.1
====================================
System Benchmarks Index Score 1033.7 1052.5 1047.8
------------------------------------------------------------------------------------
******************************************
on pSeries with 32 vcpus, 16Gb memory, KVM.
------------------------------------------------------------------------------------
test case | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput 497.4 518.7 497.8
File Copy 1024 bufsize 2000 maxblocks 1368.8 1390.1 1343.3
File Copy 256 bufsize 500 maxblocks 857.7 859.8 831.4
File Copy 4096 bufsize 8000 maxblocks 2851.7 2838.1 2785.5
Pipe Throughput 1221.9 1265.3 1250.4
Pipe-based Context Switching 529.8 578.1 564.2
Process Creation 408.4 421.6 287.6
Shell Scripts (1 concurrent) 1201.8 1215.3 1185.8
Shell Scripts (8 concurrent) 3758.4 3799.3 3878.9
System Call Overhead 1008.3 1122.6 1134.2
=====================================
System Benchmarks Index Score 1072.0 1108.9 1050.6
------------------------------------------------------------------------------------
Pan Xinhui (6):
pv-qspinlock: use cmpxchg_release in __pv_queued_spin_unlock
powerpc/qspinlock: powerpc support qspinlock
powerpc: pseries/Kconfig: Add qspinlock build config
powerpc: lib/locks.c: Add cpu yield/wake helper function
powerpc/pv-qspinlock: powerpc support pv-qspinlock
powerpc: pSeries: Add pv-qspinlock build config/make
arch/powerpc/include/asm/qspinlock.h | 93 +++++++++++++
arch/powerpc/include/asm/qspinlock_paravirt.h | 36 +++++
.../powerpc/include/asm/qspinlock_paravirt_types.h | 13 ++
arch/powerpc/include/asm/spinlock.h | 35 +++--
arch/powerpc/include/asm/spinlock_types.h | 4 +
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/paravirt.c | 153 +++++++++++++++++++++
arch/powerpc/lib/locks.c | 122 ++++++++++++++++
arch/powerpc/platforms/pseries/Kconfig | 9 ++
arch/powerpc/platforms/pseries/setup.c | 5 +
kernel/locking/qspinlock_paravirt.h | 2 +-
11 files changed, 459 insertions(+), 14 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock.h
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt_types.h
create mode 100644 arch/powerpc/kernel/paravirt.c
--
2.4.11
^ permalink raw reply
* [PATCH v7 1/6] pv-qspinlock: use cmpxchg_release in __pv_queued_spin_unlock
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
In-Reply-To: <1474277037-15200-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
cmpxchg_release() is more lighweight than cmpxchg() on some archs(e.g.
PPC), moreover, in __pv_queued_spin_unlock() we only needs a RELEASE in
the fast path(pairing with *_try_lock() or *_lock()). And the slow path
has smp_store_release too. So it's safe to use cmpxchg_release here.
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
kernel/locking/qspinlock_paravirt.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 8a99abf..ce655aa 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -544,7 +544,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
* unhash. Otherwise it would be possible to have multiple @lock
* entries, which would be BAD.
*/
- locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+ locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
if (likely(locked == _Q_LOCKED_VAL))
return;
--
2.4.11
^ permalink raw reply related
* [PATCH v7 2/6] powerpc/qspinlock: powerpc support qspinlock
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
In-Reply-To: <1474277037-15200-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
This patch add basic code to enable qspinlock on powerpc. qspinlock is
one kind of fairlock implemention. And seen some performance improvement
under some scenarios.
queued_spin_unlock() release the lock by just one write of NULL to the
->locked field which sits at different places in the two endianness
system.
We override some arch_spin_xxx as powerpc has io_sync stuff which makes
sure the io operations are protected by the lock correctly.
There is another special case, see commit
2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/qspinlock.h | 66 +++++++++++++++++++++++++++++++
arch/powerpc/include/asm/spinlock.h | 31 +++++++++------
arch/powerpc/include/asm/spinlock_types.h | 4 ++
arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++
4 files changed, 147 insertions(+), 13 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock.h
diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
new file mode 100644
index 0000000..881a186
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -0,0 +1,66 @@
+#ifndef _ASM_POWERPC_QSPINLOCK_H
+#define _ASM_POWERPC_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+
+#define SPIN_THRESHOLD (1 << 15)
+#define queued_spin_unlock queued_spin_unlock
+#define queued_spin_is_locked queued_spin_is_locked
+#define queued_spin_unlock_wait queued_spin_unlock_wait
+
+extern void queued_spin_unlock_wait(struct qspinlock *lock);
+
+static inline u8 * __qspinlock_lock_byte(struct qspinlock *lock)
+{
+ return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ /* release semantics is required */
+ smp_store_release(__qspinlock_lock_byte(lock), 0);
+}
+
+static inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+ smp_mb();
+ return atomic_read(&lock->val);
+}
+
+#include <asm-generic/qspinlock.h>
+
+/* we need override it as ppc has io_sync stuff */
+#undef arch_spin_trylock
+#undef arch_spin_lock
+#undef arch_spin_lock_flags
+#undef arch_spin_unlock
+#define arch_spin_trylock arch_spin_trylock
+#define arch_spin_lock arch_spin_lock
+#define arch_spin_lock_flags arch_spin_lock_flags
+#define arch_spin_unlock arch_spin_unlock
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ CLEAR_IO_SYNC;
+ return queued_spin_trylock(lock);
+}
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+ CLEAR_IO_SYNC;
+ queued_spin_lock(lock);
+}
+
+static inline
+void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
+{
+ CLEAR_IO_SYNC;
+ queued_spin_lock(lock);
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+ SYNC_IO;
+ queued_spin_unlock(lock);
+}
+#endif /* _ASM_POWERPC_QSPINLOCK_H */
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index fa37fe9..6aef8dd 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,23 @@
#define SYNC_IO
#endif
+#if defined(CONFIG_PPC_SPLPAR)
+/* We only yield to the hypervisor if we are in shared processor mode */
+#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
+extern void __spin_yield(arch_spinlock_t *lock);
+extern void __rw_yield(arch_rwlock_t *lock);
+#else /* SPLPAR */
+#define __spin_yield(x) barrier()
+#define __rw_yield(x) barrier()
+#define SHARED_PROCESSOR 0
+#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#else
+
+#define arch_spin_relax(lock) __spin_yield(lock)
+
static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
{
return lock.slock == 0;
@@ -106,18 +123,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
* held. Conveniently, we have a word in the paca that holds this
* value.
*/
-
-#if defined(CONFIG_PPC_SPLPAR)
-/* We only yield to the hypervisor if we are in shared processor mode */
-#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
-extern void __spin_yield(arch_spinlock_t *lock);
-extern void __rw_yield(arch_rwlock_t *lock);
-#else /* SPLPAR */
-#define __spin_yield(x) barrier()
-#define __rw_yield(x) barrier()
-#define SHARED_PROCESSOR 0
-#endif
-
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
CLEAR_IO_SYNC;
@@ -195,6 +200,7 @@ out:
smp_mb();
}
+#endif /* !CONFIG_QUEUED_SPINLOCKS */
/*
* Read-write spinlocks, allowing multiple readers
* but only one writer.
@@ -330,7 +336,6 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
#define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
#define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
-#define arch_spin_relax(lock) __spin_yield(lock)
#define arch_read_relax(lock) __rw_yield(lock)
#define arch_write_relax(lock) __rw_yield(lock)
diff --git a/arch/powerpc/include/asm/spinlock_types.h b/arch/powerpc/include/asm/spinlock_types.h
index 2351adc..bd7144e 100644
--- a/arch/powerpc/include/asm/spinlock_types.h
+++ b/arch/powerpc/include/asm/spinlock_types.h
@@ -5,11 +5,15 @@
# error "please don't include this file directly"
#endif
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock_types.h>
+#else
typedef struct {
volatile unsigned int slock;
} arch_spinlock_t;
#define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
+#endif
typedef struct {
volatile signed int lock;
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index b7b1237..6574626 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -23,6 +23,7 @@
#include <asm/hvcall.h>
#include <asm/smp.h>
+#ifndef CONFIG_QUEUED_SPINLOCKS
void __spin_yield(arch_spinlock_t *lock)
{
unsigned int lock_value, holder_cpu, yield_count;
@@ -42,6 +43,7 @@ void __spin_yield(arch_spinlock_t *lock)
get_hard_smp_processor_id(holder_cpu), yield_count);
}
EXPORT_SYMBOL_GPL(__spin_yield);
+#endif
/*
* Waiting for a read lock or a write lock on a rwlock...
@@ -68,3 +70,60 @@ void __rw_yield(arch_rwlock_t *rw)
get_hard_smp_processor_id(holder_cpu), yield_count);
}
#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+/*
+ * This forbid we load an old value in another LL/SC. Because the SC here force
+ * another LL/SC repeat. So we guarantee all loads in another LL and SC will
+ * read correct value.
+ */
+static inline u32 atomic_read_sync(atomic_t *v)
+{
+ u32 val;
+
+ __asm__ __volatile__(
+"1: " PPC_LWARX(%0, 0, %2, 0) "\n"
+" stwcx. %0, 0, %2\n"
+" bne- 1b\n"
+ : "=&r" (val), "+m" (*v)
+ : "r" (v)
+ : "cr0", "xer");
+
+ return val;
+}
+
+void queued_spin_unlock_wait(struct qspinlock *lock)
+{
+
+ u32 val;
+
+ smp_mb();
+
+ /*
+ * copied from generic queue_spin_unlock_wait with little modification
+ */
+ for (;;) {
+ /* need _sync, as we might race with another LL/SC in lock()*/
+ val = atomic_read_sync(&lock->val);
+
+ if (!val) /* not locked, we're done */
+ goto done;
+
+ if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
+ break;
+
+ /* not locked, but pending, wait until we observe the lock */
+ cpu_relax();
+ }
+
+ /*
+ * any unlock is good. And need not _sync, as ->val is set by the SC in
+ * unlock(), any loads in lock() must see the correct value.
+ */
+ while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
+ cpu_relax();
+done:
+ smp_mb();
+}
+EXPORT_SYMBOL(queued_spin_unlock_wait);
+#endif
--
2.4.11
^ permalink raw reply related
* [PATCH v7 3/6] powerpc: pseries/Kconfig: Add qspinlock build config
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
In-Reply-To: <1474277037-15200-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
pseries will use qspinlock by default.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index bec90fb..f669323 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -21,6 +21,7 @@ config PPC_PSERIES
select HOTPLUG_CPU if SMP
select ARCH_RANDOM
select PPC_DOORBELL
+ select ARCH_USE_QUEUED_SPINLOCKS
default y
config PPC_SPLPAR
--
2.4.11
^ permalink raw reply related
* [PATCH v7 4/6] powerpc: lib/locks.c: Add cpu yield/wake helper function
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
In-Reply-To: <1474277037-15200-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
Add two corresponding helper functions to support pv-qspinlock.
For normal use, __spin_yield_cpu will confer current vcpu slices to the
target vcpu(say, a lock holder). If target vcpu is not specified or it
is in running state, such conferging to lpar happens or not depends.
Because hcall itself will introduce latency and a little overhead. And
we do NOT want to suffer any latency on some cases, e.g. in interrupt handler.
The second parameter *confer* can indicate such case.
__spin_wake_cpu is simpiler, it will wake up one vcpu regardless of its
current vcpu state.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/spinlock.h | 4 +++
arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+)
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 6aef8dd..abb6b0f 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -56,9 +56,13 @@
/* We only yield to the hypervisor if we are in shared processor mode */
#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
extern void __spin_yield(arch_spinlock_t *lock);
+extern void __spin_yield_cpu(int cpu, int confer);
+extern void __spin_wake_cpu(int cpu);
extern void __rw_yield(arch_rwlock_t *lock);
#else /* SPLPAR */
#define __spin_yield(x) barrier()
+#define __spin_yield_cpu(x,y) barrier()
+#define __spin_wake_cpu(x) barrier()
#define __rw_yield(x) barrier()
#define SHARED_PROCESSOR 0
#endif
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 6574626..892df7d 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -23,6 +23,65 @@
#include <asm/hvcall.h>
#include <asm/smp.h>
+/*
+ * confer our slices to a specified cpu and return. If it is already running or
+ * cpu is -1, then we will check confer. If confer is NULL, we will return
+ * otherwise we confer our slices to lpar.
+ */
+void __spin_yield_cpu(int cpu, int confer)
+{
+ unsigned int holder_cpu = cpu, yield_count;
+
+ if (cpu == -1)
+ goto yield_to_lpar;
+
+ BUG_ON(holder_cpu >= nr_cpu_ids);
+ yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+
+ /* if cpu is running, confer slices to lpar conditionally*/
+ if ((yield_count & 1) == 0)
+ goto yield_to_lpar;
+
+ plpar_hcall_norets(H_CONFER,
+ get_hard_smp_processor_id(holder_cpu), yield_count);
+ return;
+
+yield_to_lpar:
+ if (confer)
+ plpar_hcall_norets(H_CONFER, -1, 0);
+}
+EXPORT_SYMBOL_GPL(__spin_yield_cpu);
+
+void __spin_wake_cpu(int cpu)
+{
+ unsigned int holder_cpu = cpu;
+
+ BUG_ON(holder_cpu >= nr_cpu_ids);
+ /*
+ * NOTE: we should always do this hcall regardless of
+ * the yield_count of the holder_cpu.
+ * as thers might be a case like below;
+ * CPU 1 2
+ * yielded = true
+ * if (yielded)
+ * __spin_wake_cpu()
+ * __spin_yield_cpu()
+ *
+ * So we might lose a wake if we check the yield_count and
+ * return directly if the holder_cpu is running.
+ * IOW. do NOT code like below.
+ * yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+ * if ((yield_count & 1) == 0)
+ * return;
+ *
+ * a PROD hcall marks the target_cpu proded, which cause the next cede or confer
+ * called on the target_cpu invalid.
+ */
+ plpar_hcall_norets(H_PROD,
+ get_hard_smp_processor_id(holder_cpu));
+}
+EXPORT_SYMBOL_GPL(__spin_wake_cpu);
+
#ifndef CONFIG_QUEUED_SPINLOCKS
void __spin_yield(arch_spinlock_t *lock)
{
--
2.4.11
^ permalink raw reply related
* [PATCH v7 5/6] powerpc/pv-qspinlock: powerpc support pv-qspinlock
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
In-Reply-To: <1474277037-15200-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
The default pv-qspinlock uses qspinlock(native version of pv-qspinlock).
pv_lock initialization should be done in bootstage with irq disabled.
And if we run as a guest with powerKVM/pHyp shared_processor mode,
restore pv_lock_ops callbacks to pv-qspinlock(pv version) which makes
full use of virtualization.
There is a hash table, we store cpu number into it and the key is lock.
So everytime pv_wait can know who is the lock holder by searching the
lock. Also store the lock in a per_cpu struct, and remove it when we own
the lock. Then pv_wait can know which lock we are spinning on. But the
cpu in the hash table might not be the correct lock holder, as for
performace issue, we does not take care of hash conflict.
Also introduce spin_lock_holder, which tells who owns the lock now.
currently the only user is spin_unlock_wait.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/qspinlock.h | 29 +++-
arch/powerpc/include/asm/qspinlock_paravirt.h | 36 +++++
.../powerpc/include/asm/qspinlock_paravirt_types.h | 13 ++
arch/powerpc/kernel/paravirt.c | 153 +++++++++++++++++++++
arch/powerpc/lib/locks.c | 8 +-
arch/powerpc/platforms/pseries/setup.c | 5 +
6 files changed, 241 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt_types.h
create mode 100644 arch/powerpc/kernel/paravirt.c
diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
index 881a186..23459fb 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -15,7 +15,7 @@ static inline u8 * __qspinlock_lock_byte(struct qspinlock *lock)
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}
-static inline void queued_spin_unlock(struct qspinlock *lock)
+static inline void native_queued_spin_unlock(struct qspinlock *lock)
{
/* release semantics is required */
smp_store_release(__qspinlock_lock_byte(lock), 0);
@@ -27,6 +27,33 @@ static inline int queued_spin_is_locked(struct qspinlock *lock)
return atomic_read(&lock->val);
}
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#include <asm/qspinlock_paravirt.h>
+/*
+ * try to know who is the lock holder, however it is not always true
+ * Return:
+ * -1, we did not know the lock holder.
+ * other value, likely is the lock holder.
+ */
+extern int spin_lock_holder(void *lock);
+
+static inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+ pv_queued_spin_lock(lock, val);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ pv_queued_spin_unlock(lock);
+}
+#else
+#define spin_lock_holder(l) (-1)
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ native_queued_spin_unlock(lock);
+}
+#endif
+
#include <asm-generic/qspinlock.h>
/* we need override it as ppc has io_sync stuff */
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h b/arch/powerpc/include/asm/qspinlock_paravirt.h
new file mode 100644
index 0000000..d87cda0
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt.h
@@ -0,0 +1,36 @@
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+#error "do not include this file"
+#endif
+
+#ifndef _ASM_QSPINLOCK_PARAVIRT_H
+#define _ASM_QSPINLOCK_PARAVIRT_H
+
+#include <asm/qspinlock_paravirt_types.h>
+
+extern void pv_lock_init(void);
+extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_init_lock_hash(void);
+extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+
+static inline void pv_queued_spin_lock(struct qspinlock *lock, u32 val)
+{
+ pv_lock_op.lock(lock, val);
+}
+
+static inline void pv_queued_spin_unlock(struct qspinlock *lock)
+{
+ pv_lock_op.unlock(lock);
+}
+
+static inline void pv_wait(u8 *ptr, u8 val)
+{
+ pv_lock_op.wait(ptr, val);
+}
+
+static inline void pv_kick(int cpu)
+{
+ pv_lock_op.kick(cpu);
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt_types.h b/arch/powerpc/include/asm/qspinlock_paravirt_types.h
new file mode 100644
index 0000000..83611ed
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt_types.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_QSPINLOCK_PARAVIRT_TYPES_H
+#define _ASM_QSPINLOCK_PARAVIRT_TYPES_H
+
+struct pv_lock_ops {
+ void (*lock)(struct qspinlock *lock, u32 val);
+ void (*unlock)(struct qspinlock *lock);
+ void (*wait)(u8 *ptr, u8 val);
+ void (*kick)(int cpu);
+};
+
+extern struct pv_lock_ops pv_lock_op;
+
+#endif
diff --git a/arch/powerpc/kernel/paravirt.c b/arch/powerpc/kernel/paravirt.c
new file mode 100644
index 0000000..e697b17
--- /dev/null
+++ b/arch/powerpc/kernel/paravirt.c
@@ -0,0 +1,153 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/hash.h>
+#include <linux/bootmem.h>
+
+/* +2 here is to make sure there is not many conflict*/
+#define NUM_LOCK_CPU_ENTRY_SHIFT (order_base_2(NR_CPUS) + 2)
+#define NUM_LOCK_CPU_ENTRY (1 << NUM_LOCK_CPU_ENTRY_SHIFT)
+/* we can only spin on 4 locks at same time on same cpu*/
+#define NUM_LOCKS_PER_CPU 4
+
+static u16 *hash_lock_cpu_ptr;
+
+struct locks_on_cpu {
+ void *l[NUM_LOCKS_PER_CPU];
+ int count;
+};
+
+static DEFINE_PER_CPU(struct locks_on_cpu, node);
+
+static u16 *hash(void *l)
+{
+ int val = hash_ptr(l, NUM_LOCK_CPU_ENTRY_SHIFT);
+
+ return &hash_lock_cpu_ptr[val];
+}
+
+static void __init init_hash(void)
+{
+ int size = NUM_LOCK_CPU_ENTRY * sizeof(*hash_lock_cpu_ptr);
+
+ hash_lock_cpu_ptr = memblock_virt_alloc(size, 0);
+ memset(hash_lock_cpu_ptr, 0, size);
+}
+
+#define lock_get_holder(l) \
+ ((int)(*hash(l) - 1))
+
+#define lock_set_holder(l) \
+ (*hash(l) = raw_smp_processor_id() + 1)
+
+int spin_lock_holder(void *lock)
+{
+ /* we might run on PowerNV, which has no hash table ptr*/
+ if (hash_lock_cpu_ptr)
+ return lock_get_holder(lock);
+ return -1;
+}
+EXPORT_SYMBOL(spin_lock_holder);
+
+static void *this_cpu_lock(void)
+{
+ struct locks_on_cpu *this_node = this_cpu_ptr(&node);
+ int i = this_node->count - 1;
+
+ return this_node->l[i];
+}
+
+static void cpu_save_lock(void *l)
+{
+ struct locks_on_cpu *this_node = this_cpu_ptr(&node);
+ int i = this_node->count++;
+
+ this_node->l[i] = l;
+}
+
+static void cpu_remove_lock(void *l)
+{
+ __this_cpu_dec(node.count);
+}
+
+static void __native_queued_spin_unlock(struct qspinlock *lock)
+{
+ native_queued_spin_unlock(lock);
+}
+
+static void __pv_lock(struct qspinlock *lock, u32 val)
+{
+ /*
+ * save the lock we are spinning on
+ * pv_wait need know this lock
+ */
+ cpu_save_lock(lock);
+
+ __pv_queued_spin_lock_slowpath(lock, val);
+
+ /* as we win the lock, remove it*/
+ cpu_remove_lock(lock);
+
+ /*
+ * let other spinner know who is the lock holder
+ * we does not need to unset lock holder in unlock()
+ */
+ lock_set_holder(lock);
+}
+
+static void __pv_wait(u8 *ptr, u8 val)
+{
+ void *l = this_cpu_lock();
+ int cpu;
+ int always_confer = !in_interrupt();
+
+ while (READ_ONCE(*ptr) == val) {
+ HMT_low();
+ /*
+ * the lock might be unlocked once and locked again
+ */
+ cpu = lock_get_holder(l);
+
+ /*
+ * the default behavior of __spin_yield_cpu is yielding
+ * our cpu slices to target vcpu or lpar(pHyp or KVM).
+ * consider the latency of hcall itself and the priority of
+ * current task, we can do a optimisation.
+ * IOW, if we are in interrupt, and the target vcpu is running
+ * we do not yield ourself to lpar.
+ */
+ __spin_yield_cpu(cpu, always_confer);
+ }
+ HMT_medium();
+}
+
+static void __pv_kick(int cpu)
+{
+ __spin_wake_cpu(cpu);
+}
+
+struct pv_lock_ops pv_lock_op = {
+ .lock = native_queued_spin_lock_slowpath,
+ .unlock = __native_queued_spin_unlock,
+ .wait = NULL,
+ .kick = NULL,
+};
+EXPORT_SYMBOL(pv_lock_op);
+
+void __init pv_lock_init(void)
+{
+ if (SHARED_PROCESSOR) {
+ init_hash();
+ __pv_init_lock_hash();
+ pv_lock_op.lock = __pv_lock;
+ pv_lock_op.unlock = __pv_queued_spin_unlock;
+ pv_lock_op.wait = __pv_wait;
+ pv_lock_op.kick = __pv_kick;
+ }
+}
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 892df7d..5daa35a 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -179,8 +179,12 @@ void queued_spin_unlock_wait(struct qspinlock *lock)
* any unlock is good. And need not _sync, as ->val is set by the SC in
* unlock(), any loads in lock() must see the correct value.
*/
- while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
- cpu_relax();
+ while (atomic_read(&lock->val) & _Q_LOCKED_MASK) {
+ HMT_low();
+ if (SHARED_PROCESSOR)
+ __spin_yield_cpu(spin_lock_holder(lock), 0);
+ }
+ HMT_medium();
done:
smp_mb();
}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 4ffcaa6..672d888 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -487,6 +487,11 @@ static void __init pSeries_setup_arch(void)
}
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+ pv_lock_init();
+#endif
+
}
static int __init pSeries_init_panel(void)
--
2.4.11
^ permalink raw reply related
* [PATCH v7 6/6] powerpc: pSeries: Add pv-qspinlock build config/make
From: Pan Xinhui @ 2016-09-19 9:23 UTC (permalink / raw)
To: linux-kernel, linuxppc-dev
Cc: xinhui.pan, peterz, benh, waiman.long, virtualization, mingo,
paulus, mpe, paulmck
In-Reply-To: <1474277037-15200-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
pSeries run as a guest and might need pv-qspinlock.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/platforms/pseries/Kconfig | 8 ++++++++
2 files changed, 9 insertions(+)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fe4c075..efd2f3d 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -49,6 +49,7 @@ obj-$(CONFIG_PPC_970_NAP) += idle_power4.o
obj-$(CONFIG_PPC_P7_NAP) += idle_book3s.o
procfs-y := proc_powerpc.o
obj-$(CONFIG_PROC_FS) += $(procfs-y)
+obj-$(CONFIG_PARAVIRT_SPINLOCKS) += paravirt.o
rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI) := rtas_pci.o
obj-$(CONFIG_PPC_RTAS) += rtas.o rtas-rtc.o $(rtaspci-y-y)
obj-$(CONFIG_PPC_RTAS_DAEMON) += rtasd.o
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index f669323..46632e4 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -128,3 +128,11 @@ config HV_PERF_CTRS
systems. 24x7 is available on Power 8 systems.
If unsure, select Y.
+
+config PARAVIRT_SPINLOCKS
+ bool "Paravirtialization support for qspinlock"
+ depends on PPC_SPLPAR && QUEUED_SPINLOCKS
+ default y
+ help
+ If platform supports virtualization, for example PowerVM, this option
+ can let guest have a better performace.
--
2.4.11
^ permalink raw reply related
* [PATCH 0/2] Fix recursive Kconfig depedency issue
From: Peter Griffin @ 2016-09-19 9:34 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kernel, airlied, kraxel,
vinod.koul, sfr, bjorn.andersson
Cc: peter.griffin, lee.jones, dri-devel, virtualization
Hi,
This series fixes a Kconfig recurssive depedency issue and also some
trivial white space. This was found when developing the fdma dmaegine
series [1], which has now been applied by Vinod.
These patchs were originally included as part of the fdma series [1],
but I've now sent it separately as requested by Vinod [2], and also
privately by Emil.
FYI not having this patch applied causes kbuild error emails [3].
[1] https://lkml.org/lkml/2016/9/13/191
[2] https://lkml.org/lkml/2016/9/15/562
[3] https://lkml.org/lkml/2016/9/15/850
Peter Griffin (2):
drm/virtio: kconfig: Fix recursive dependency issue.
drm/virtio: kconfig: Fixup white space.
drivers/gpu/drm/virtio/Kconfig | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--
1.9.1
^ permalink raw reply
* [PATCH 1/2] drm/virtio: kconfig: Fix recursive dependency issue.
From: Peter Griffin @ 2016-09-19 9:34 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kernel, airlied, kraxel,
vinod.koul, sfr, bjorn.andersson
Cc: peter.griffin, lee.jones, dri-devel, virtualization
In-Reply-To: <1474277660-5196-1-git-send-email-peter.griffin@linaro.org>
ST_SLIM_REMOTEPROC must select REMOTEPROC, which exposes the following
recursive dependency.
[..]
drivers/video/fbdev/Kconfig:5:error: recursive dependency detected!
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:5: symbol FB is selected by DRM_KMS_FB_HELPER
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/Kconfig:42: symbol DRM_KMS_FB_HELPER depends on DRM_KMS_HELPER
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/Kconfig:36: symbol DRM_KMS_HELPER is selected by DRM_VIRTIO_GPU
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/virtio/Kconfig:1: symbol DRM_VIRTIO_GPU depends on VIRTIO
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/virtio/Kconfig:1: symbol VIRTIO is selected by REMOTEPROC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/remoteproc/Kconfig:4: symbol REMOTEPROC is selected by ST_SLIM_REMOTEPROC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/remoteproc/Kconfig:103: symbol ST_SLIM_REMOTEPROC is selected by ST_FDMA
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/dma/Kconfig:440: symbol ST_FDMA depends on DMADEVICES
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/dma/Kconfig:5: symbol DMADEVICES is selected by SND_SOC_SH4_SIU
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
sound/soc/sh/Kconfig:29: symbol SND_SOC_SH4_SIU is selected by SND_SIU_MIGOR
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
sound/soc/sh/Kconfig:64: symbol SND_SIU_MIGOR depends on I2C
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/i2c/Kconfig:7: symbol I2C is selected by FB_DDC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:63: symbol FB_DDC is selected by FB_CYBER2000_DDC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:378: symbol FB_CYBER2000_DDC depends on FB_CYBER2000
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:366: symbol FB_CYBER2000 depends on FB
which is due to drivers/gpu/drm/virtio/Kconfig depending on VIRTIO.
Fix by dropping depend and switching to select VIRTIO.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
---
drivers/gpu/drm/virtio/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/virtio/Kconfig b/drivers/gpu/drm/virtio/Kconfig
index e1afc3d..90357d9 100644
--- a/drivers/gpu/drm/virtio/Kconfig
+++ b/drivers/gpu/drm/virtio/Kconfig
@@ -1,6 +1,7 @@
config DRM_VIRTIO_GPU
tristate "Virtio GPU driver"
- depends on DRM && VIRTIO
+ depends on DRM
+ select VIRTIO
select DRM_KMS_HELPER
select DRM_TTM
help
--
1.9.1
^ permalink raw reply related
* [PATCH 2/2] drm/virtio: kconfig: Fixup white space.
From: Peter Griffin @ 2016-09-19 9:34 UTC (permalink / raw)
To: linux-arm-kernel, linux-kernel, kernel, airlied, kraxel,
vinod.koul, sfr, bjorn.andersson
Cc: peter.griffin, lee.jones, dri-devel, virtualization
In-Reply-To: <1474277660-5196-1-git-send-email-peter.griffin@linaro.org>
Use tabs instead of spaces.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
---
drivers/gpu/drm/virtio/Kconfig | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/virtio/Kconfig b/drivers/gpu/drm/virtio/Kconfig
index 90357d9..2d83932 100644
--- a/drivers/gpu/drm/virtio/Kconfig
+++ b/drivers/gpu/drm/virtio/Kconfig
@@ -2,10 +2,10 @@ config DRM_VIRTIO_GPU
tristate "Virtio GPU driver"
depends on DRM
select VIRTIO
- select DRM_KMS_HELPER
- select DRM_TTM
+ select DRM_KMS_HELPER
+ select DRM_TTM
help
This is the virtual GPU driver for virtio. It can be used with
- QEMU based VMMs (like KVM or Xen).
+ QEMU based VMMs (like KVM or Xen).
If unsure say M.
--
1.9.1
^ permalink raw reply related
* [PATCH 0/2] virtio-ringtest: more enablement
From: Cornelia Huck @ 2016-09-19 15:06 UTC (permalink / raw)
To: mst; +Cc: pasic, linux-kernel, virtualization
Hi Michael,
here are two patches by Halil so that we can get ringtest going on
our platform as well.
Question: Do you have an idea on how realistic the timings are
supposed to be? The s390 path is, well, 'tweak me'...
Halil Pasic (2):
tools/virtio/ringtest: fix run-on-all.sh for offline cpus
tools/virtio/ringtest: tweaks for s390
tools/virtio/ringtest/main.h | 12 ++++++++++++
tools/virtio/ringtest/run-on-all.sh | 5 +++--
2 files changed, 15 insertions(+), 2 deletions(-)
--
2.8.4
^ permalink raw reply
* [PATCH 1/2] tools/virtio/ringtest: fix run-on-all.sh for offline cpus
From: Cornelia Huck @ 2016-09-19 15:06 UTC (permalink / raw)
To: mst; +Cc: pasic, linux-kernel, virtualization
In-Reply-To: <20160919150656.6194-1-cornelia.huck@de.ibm.com>
From: Halil Pasic <pasic@linux.vnet.ibm.com>
Since ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work
without /dev/cpu") run-on-all.sh uses seq 0 $HOST_AFFINITY as the list
of ids of the CPUs to run the command on (assuming ids of online CPUs
are consecutive and start from 0), where $HOST_AFFINITY is the highest
CPU id in the system previously determined using lscpu. This can fail
on systems with offline CPUs.
Instead let's use lscpu to determine the list of online CPUs.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Fixes: ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work without
/dev/cpu")
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
tools/virtio/ringtest/run-on-all.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/virtio/ringtest/run-on-all.sh b/tools/virtio/ringtest/run-on-all.sh
index 2e69ca8..29b0d39 100755
--- a/tools/virtio/ringtest/run-on-all.sh
+++ b/tools/virtio/ringtest/run-on-all.sh
@@ -1,12 +1,13 @@
#!/bin/sh
+CPUS_ONLINE=$(lscpu --online -p=cpu|grep -v -e '#')
#use last CPU for host. Why not the first?
#many devices tend to use cpu0 by default so
#it tends to be busier
-HOST_AFFINITY=$(lscpu -p=cpu | tail -1)
+HOST_AFFINITY=$(echo "${CPUS_ONLINE}"|tail -n 1)
#run command on all cpus
-for cpu in $(seq 0 $HOST_AFFINITY)
+for cpu in $CPUS_ONLINE
do
#Don't run guest and host on same CPU
#It actually works ok if using signalling
--
2.8.4
^ permalink raw reply related
* [PATCH 2/2] tools/virtio/ringtest: tweaks for s390
From: Cornelia Huck @ 2016-09-19 15:06 UTC (permalink / raw)
To: mst; +Cc: pasic, linux-kernel, virtualization
In-Reply-To: <20160919150656.6194-1-cornelia.huck@de.ibm.com>
From: Halil Pasic <pasic@linux.vnet.ibm.com>
Make ringtest work on s390 too.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
tools/virtio/ringtest/main.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/tools/virtio/ringtest/main.h b/tools/virtio/ringtest/main.h
index 16917ac..7806b7b 100644
--- a/tools/virtio/ringtest/main.h
+++ b/tools/virtio/ringtest/main.h
@@ -26,6 +26,16 @@ static inline void wait_cycles(unsigned long long cycles)
#define VMEXIT_CYCLES 500
#define VMENTRY_CYCLES 500
+#elif defined(__s390x__)
+static inline void wait_cycles(unsigned long long cycles)
+{
+ asm volatile("0: brctg %0,0b" : : "d" (cycles));
+}
+
+/* tweak me */
+#define VMEXIT_CYCLES 200
+#define VMENTRY_CYCLES 200
+
#else
static inline void wait_cycles(unsigned long long cycles)
{
@@ -81,6 +91,8 @@ extern unsigned ring_size;
/* Is there a portable way to do this? */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() asm ("rep; nop" ::: "memory")
+#elif defined(__s390x__)
+#define cpu_relax() barrier()
#else
#define cpu_relax() assert(0)
#endif
--
2.8.4
^ permalink raw reply related
* [PATCH v11] virtio-net: add Max MTU configuration field
From: Aaron Conole @ 2016-09-19 15:10 UTC (permalink / raw)
To: virtio-dev, virtualization
Cc: Victor Kaplansky, Maxime Coquelin, Michael S. Tsirkin
It is helpful for a host to indicate it's MTU to be set on guest NICs
other than the assumed 1500 byte value. This helps in situations where
the host network is using Jumbo Frames, or aiding in PMTU discovery by
configuring a homogenous network. It is also helpful for sizing receive
buffers correctly.
The change adds a new field to configuration area of network
devices. It will be used to pass a maximum MTU from the device to
the driver. This will be used by the driver as a maximum value for
packet sizes during transmission, without segmentation offloading.
In addition, in order to support backward and forward compatibility,
we introduce a new feature bit called VIRTIO_NET_F_MTU.
VIRTIO-152
Signed-off-by: Aaron Conole <aconole@redhat.com>
Cc: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Hannes Reiencke <hare@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Note: should this proposal be accepted and approved, one or more
claims disclosed to the TC admin and listed on the Virtio TC
IPR page https://www.oasis-open.org/committees/virtio/ipr.php
might become Essential Claims.
v1:
This is an attempt at continuing the work done by Victor Kaplansky on
mtu negiotiation for virtio-net devices. It attempts to pick up from
https://lists.oasis-open.org/archives/virtio-dev/201508/msg00007.html
and is just a minor blurb from the first patch along with the 2nd patch
from the series, and some of the feedback integrated.
v2:
Rephrase and provide a mechanism for guest->host and host->guest
communication through a driver read-only and driver write-only field.
v3:
Converted to just support initial MTU. Guest->host and Host->guest MTU
changes are outside the scope of this change.
v4:
Removed references to 'initial', since that condition cannot be tested.
Simply state that if the driver will use the mtu field, it must
negotiate the feature bit, and if not, it must not.
v5:
After feedback from Michael S. Tsirkin
v6:
Bit has to change to bit 25 due to an undocumented bit 24 being taken.
v7:
Partial rewrite with feedback from MST. Additional conformance statements
added.
v8:
Clarified the new conformance statements.
v9:
Updated the commit log for correctness. Added ACKs from
Michael S. Tsirkin, and Maxime Coquelin. Included the virtio
issue id.
v10:
Update the conformance statement wordings from previous 'offered' form
to 'succesfully negotiated' form.
v11:
Clarified size w.r.t. MTU - buffer sizes must also add extra space for ethernet
headers. Also updated the feature bit description.
content.tex | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/content.tex b/content.tex
index 4b45678..45b94c6 100644
--- a/content.tex
+++ b/content.tex
@@ -3049,6 +3049,11 @@ features.
\item[VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2)] Control channel offloads
reconfiguration support.
+\item[VIRTIO_NET_F_MTU(3)] Device maximum MTU reporting is supported. If
+ offered by the device, device advises driver about the value of
+ its maximum MTU. If negotiated, the driver uses \field{mtu} as
+ the maximum MTU value.
+
\item[VIRTIO_NET_F_MAC (5)] Device has given MAC address.
\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Driver can receive TSOv4.
@@ -3140,11 +3145,16 @@ of each of transmit and receive virtqueues (receiveq1\ldots receiveqN
and transmitq1\ldots transmitqN respectively) that can be configured once VIRTIO_NET_F_MQ
is negotiated.
+The following driver-read-only field, \field{mtu} only exists if
+VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
+use.
+
\begin{lstlisting}
struct virtio_net_config {
u8 mac[6];
le16 status;
le16 max_virtqueue_pairs;
+ le16 mtu;
};
\end{lstlisting}
@@ -3153,6 +3163,23 @@ struct virtio_net_config {
The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
if it offers VIRTIO_NET_F_MQ.
+The device MUST set \field{mtu} to between 68 and 65535 inclusive,
+if it offers VIRTIO_NET_F_MTU.
+
+The device SHOULD set \field{mtu} to at least 1280, if it offers
+VIRTIO_NET_F_MTU.
+
+The device MUST NOT modify \field{mtu} once it has been set.
+
+The device MUST NOT pass received packets that exceed \field{mtu} (plus low
+level ethernet header length) size with \field{gso_type} NONE or ECN
+after VIRTIO_NET_F_MTU has been successfully negotiated.
+
+The device MUST forward transmitted packets of up to \field{mtu} (plus low
+level ethernet header length) size with \field{gso_type} NONE or ECN, and do
+so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
+negotiated.
+
\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
@@ -3165,6 +3192,16 @@ If the driver does not negotiate the VIRTIO_NET_F_STATUS feature, it SHOULD
assume the link is active, otherwise it SHOULD read the link status from
the bottom bit of \field{status}.
+A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
+
+If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
+buffers of size \field{mtu} (plus low level ethernet header length) to be
+able to receive at least one receive packet with \field{gso_type} NONE or ECN.
+
+If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
+size exceeding the value of \field{mtu} (plus low level ethernet header length)
+with \field{gso_type} NONE or ECN
+
\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
\label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
When using the legacy interface, transitional devices and drivers
--
2.7.4
^ permalink raw reply related
* Re: [virtio-dev] [PATCH v11] virtio-net: add Max MTU configuration field
From: Cornelia Huck @ 2016-09-19 15:26 UTC (permalink / raw)
To: Aaron Conole
Cc: Victor Kaplansky, virtio-dev, Maxime Coquelin, Michael S. Tsirkin,
virtualization
In-Reply-To: <1474297826-7642-1-git-send-email-aconole@redhat.com>
On Mon, 19 Sep 2016 11:10:26 -0400
Aaron Conole <aconole@redhat.com> wrote:
> @@ -3165,6 +3192,16 @@ If the driver does not negotiate the VIRTIO_NET_F_STATUS feature, it SHOULD
> assume the link is active, otherwise it SHOULD read the link status from
> the bottom bit of \field{status}.
>
> +A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
> +
> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
> +buffers of size \field{mtu} (plus low level ethernet header length) to be
> +able to receive at least one receive packet with \field{gso_type} NONE or ECN.
> +
> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
> +size exceeding the value of \field{mtu} (plus low level ethernet header length)
> +with \field{gso_type} NONE or ECN
Missing period.
> +
> \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
> \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
> When using the legacy interface, transitional devices and drivers
Otherwise,
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11] virtio-net: add Max MTU configuration field
From: Aaron Conole @ 2016-09-19 17:02 UTC (permalink / raw)
To: Cornelia Huck
Cc: Victor Kaplansky, virtio-dev, Maxime Coquelin, Michael S. Tsirkin,
virtualization
In-Reply-To: <20160919172619.6dc465c7.cornelia.huck@de.ibm.com>
Cornelia Huck <cornelia.huck@de.ibm.com> writes:
> On Mon, 19 Sep 2016 11:10:26 -0400
> Aaron Conole <aconole@redhat.com> wrote:
>
>> @@ -3165,6 +3192,16 @@ If the driver does not negotiate the
>> VIRTIO_NET_F_STATUS feature, it SHOULD
>> assume the link is active, otherwise it SHOULD read the link status from
>> the bottom bit of \field{status}.
>>
>> +A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
>> +
>> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
>> +buffers of size \field{mtu} (plus low level ethernet header length) to be
>> +able to receive at least one receive packet with \field{gso_type} NONE or ECN.
>> +
>> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
>> +size exceeding the value of \field{mtu} (plus low level ethernet header length)
>> +with \field{gso_type} NONE or ECN
>
> Missing period.
D'oh!
Michael,
Is this something that requires a new spin of the patch to
correct? If so, I'll add Cornelia's ACK and repost today.
>> +
>> \subsubsection{Legacy Interface: Device configuration
>> layout}\label{sec:Device Types / Network Device / Device
>> configuration layout / Legacy Interface: Device configuration
>> layout}
>> \label{sec:Device Types / Block Device / Feature bits / Device
>> configuration layout / Legacy Interface: Device configuration
>> layout}
>> When using the legacy interface, transitional devices and drivers
>
> Otherwise,
>
> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>
>
^ permalink raw reply
* Re: [PATCH v11] virtio-net: add Max MTU configuration field
From: Michael S. Tsirkin @ 2016-09-19 17:35 UTC (permalink / raw)
To: Aaron Conole
Cc: Victor Kaplansky, virtio-dev, Maxime Coquelin, virtualization
In-Reply-To: <1474297826-7642-1-git-send-email-aconole@redhat.com>
On Mon, Sep 19, 2016 at 11:10:26AM -0400, Aaron Conole wrote:
> It is helpful for a host to indicate it's MTU to be set on guest NICs
> other than the assumed 1500 byte value. This helps in situations where
> the host network is using Jumbo Frames, or aiding in PMTU discovery by
> configuring a homogenous network. It is also helpful for sizing receive
> buffers correctly.
>
> The change adds a new field to configuration area of network
> devices. It will be used to pass a maximum MTU from the device to
> the driver. This will be used by the driver as a maximum value for
> packet sizes during transmission, without segmentation offloading.
>
> In addition, in order to support backward and forward compatibility,
> we introduce a new feature bit called VIRTIO_NET_F_MTU.
>
> VIRTIO-152
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> Cc: Victor Kaplansky <victork@redhat.com>
> Reviewed-by: Hannes Reiencke <hare@suse.de>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> Note: should this proposal be accepted and approved, one or more
> claims disclosed to the TC admin and listed on the Virtio TC
> IPR page https://www.oasis-open.org/committees/virtio/ipr.php
> might become Essential Claims.
>
> v1:
> This is an attempt at continuing the work done by Victor Kaplansky on
> mtu negiotiation for virtio-net devices. It attempts to pick up from
> https://lists.oasis-open.org/archives/virtio-dev/201508/msg00007.html
> and is just a minor blurb from the first patch along with the 2nd patch
> from the series, and some of the feedback integrated.
>
> v2:
> Rephrase and provide a mechanism for guest->host and host->guest
> communication through a driver read-only and driver write-only field.
>
> v3:
> Converted to just support initial MTU. Guest->host and Host->guest MTU
> changes are outside the scope of this change.
>
> v4:
> Removed references to 'initial', since that condition cannot be tested.
> Simply state that if the driver will use the mtu field, it must
> negotiate the feature bit, and if not, it must not.
>
> v5:
> After feedback from Michael S. Tsirkin
>
> v6:
> Bit has to change to bit 25 due to an undocumented bit 24 being taken.
>
> v7:
> Partial rewrite with feedback from MST. Additional conformance statements
> added.
>
> v8:
> Clarified the new conformance statements.
>
> v9:
> Updated the commit log for correctness. Added ACKs from
> Michael S. Tsirkin, and Maxime Coquelin. Included the virtio
> issue id.
>
> v10:
> Update the conformance statement wordings from previous 'offered' form
> to 'succesfully negotiated' form.
>
> v11:
> Clarified size w.r.t. MTU - buffer sizes must also add extra space for ethernet
> headers. Also updated the feature bit description.
>
> content.tex | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/content.tex b/content.tex
> index 4b45678..45b94c6 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3049,6 +3049,11 @@ features.
> \item[VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2)] Control channel offloads
> reconfiguration support.
>
> +\item[VIRTIO_NET_F_MTU(3)] Device maximum MTU reporting is supported. If
> + offered by the device, device advises driver about the value of
> + its maximum MTU. If negotiated, the driver uses \field{mtu} as
> + the maximum MTU value.
> +
> \item[VIRTIO_NET_F_MAC (5)] Device has given MAC address.
>
> \item[VIRTIO_NET_F_GUEST_TSO4 (7)] Driver can receive TSOv4.
> @@ -3140,11 +3145,16 @@ of each of transmit and receive virtqueues (receiveq1\ldots receiveqN
> and transmitq1\ldots transmitqN respectively) that can be configured once VIRTIO_NET_F_MQ
> is negotiated.
>
> +The following driver-read-only field, \field{mtu} only exists if
> +VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
> +use.
> +
> \begin{lstlisting}
> struct virtio_net_config {
> u8 mac[6];
> le16 status;
> le16 max_virtqueue_pairs;
> + le16 mtu;
> };
> \end{lstlisting}
>
> @@ -3153,6 +3163,23 @@ struct virtio_net_config {
> The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> if it offers VIRTIO_NET_F_MQ.
>
> +The device MUST set \field{mtu} to between 68 and 65535 inclusive,
> +if it offers VIRTIO_NET_F_MTU.
> +
> +The device SHOULD set \field{mtu} to at least 1280, if it offers
> +VIRTIO_NET_F_MTU.
> +
> +The device MUST NOT modify \field{mtu} once it has been set.
> +
> +The device MUST NOT pass received packets that exceed \field{mtu} (plus low
> +level ethernet header length) size with \field{gso_type} NONE or ECN
> +after VIRTIO_NET_F_MTU has been successfully negotiated.
> +
> +The device MUST forward transmitted packets of up to \field{mtu} (plus low
> +level ethernet header length) size with \field{gso_type} NONE or ECN, and do
> +so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
> +negotiated.
> +
> \drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>
> A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
> @@ -3165,6 +3192,16 @@ If the driver does not negotiate the VIRTIO_NET_F_STATUS feature, it SHOULD
> assume the link is active, otherwise it SHOULD read the link status from
> the bottom bit of \field{status}.
>
> +A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
> +
> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
> +buffers of size \field{mtu} (plus low level ethernet header length) to be
> +able to receive at least one receive packet with \field{gso_type} NONE or ECN.
I think this should be
If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
buffers to be able to receive at least one receive packet
of size \field{mtu} (plus low level ethernet header length)
with \field{gso_type} NONE or ECN.
The difference being that (IMO) we do not need to dictate buffer size,
individual buffers can be smaller than MTU, as long as
the driver supplies many of these.
> +
> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
> +size exceeding the value of \field{mtu} (plus low level ethernet header length)
> +with \field{gso_type} NONE or ECN
> +
> \subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
> \label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
> When using the legacy interface, transitional devices and drivers
> --
> 2.7.4
^ permalink raw reply
* Re: [virtio-dev] [PATCH v11] virtio-net: add Max MTU configuration field
From: Michael S. Tsirkin @ 2016-09-19 17:36 UTC (permalink / raw)
To: Aaron Conole
Cc: virtio-dev, Victor Kaplansky, Maxime Coquelin, virtualization
In-Reply-To: <f7tr38f6cxg.fsf@redhat.com>
On Mon, Sep 19, 2016 at 01:02:03PM -0400, Aaron Conole wrote:
> Cornelia Huck <cornelia.huck@de.ibm.com> writes:
>
> > On Mon, 19 Sep 2016 11:10:26 -0400
> > Aaron Conole <aconole@redhat.com> wrote:
> >
> >> @@ -3165,6 +3192,16 @@ If the driver does not negotiate the
> >> VIRTIO_NET_F_STATUS feature, it SHOULD
> >> assume the link is active, otherwise it SHOULD read the link status from
> >> the bottom bit of \field{status}.
> >>
> >> +A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
> >> +
> >> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
> >> +buffers of size \field{mtu} (plus low level ethernet header length) to be
> >> +able to receive at least one receive packet with \field{gso_type} NONE or ECN.
> >> +
> >> +If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
> >> +size exceeding the value of \field{mtu} (plus low level ethernet header length)
> >> +with \field{gso_type} NONE or ECN
> >
> > Missing period.
>
> D'oh!
>
> Michael,
> Is this something that requires a new spin of the patch to
> correct? If so, I'll add Cornelia's ACK and repost today.
I posted a comment too :(. If you can do it, I think iterating quickly
is a good idea, yes.
> >> +
> >> \subsubsection{Legacy Interface: Device configuration
> >> layout}\label{sec:Device Types / Network Device / Device
> >> configuration layout / Legacy Interface: Device configuration
> >> layout}
> >> \label{sec:Device Types / Block Device / Feature bits / Device
> >> configuration layout / Legacy Interface: Device configuration
> >> layout}
> >> When using the legacy interface, transitional devices and drivers
> >
> > Otherwise,
> >
> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >
> >
^ permalink raw reply
* [PATCH v12] virtio-net: add Max MTU configuration field
From: Aaron Conole @ 2016-09-19 17:49 UTC (permalink / raw)
To: virtio-dev, virtualization
Cc: Victor Kaplansky, Maxime Coquelin, Michael S. Tsirkin
It is helpful for a host to indicate it's MTU to be set on guest NICs
other than the assumed 1500 byte value. This helps in situations where
the host network is using Jumbo Frames, or aiding in PMTU discovery by
configuring a homogenous network. It is also helpful for sizing receive
buffers correctly.
The change adds a new field to configuration area of network
devices. It will be used to pass a maximum MTU from the device to
the driver. This will be used by the driver as a maximum value for
packet sizes during transmission, without segmentation offloading.
In addition, in order to support backward and forward compatibility,
we introduce a new feature bit called VIRTIO_NET_F_MTU.
VIRTIO-152
Signed-off-by: Aaron Conole <aconole@redhat.com>
Cc: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Hannes Reiencke <hare@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
---
Note: should this proposal be accepted and approved, one or more
claims disclosed to the TC admin and listed on the Virtio TC
IPR page https://www.oasis-open.org/committees/virtio/ipr.php
might become Essential Claims.
v1:
This is an attempt at continuing the work done by Victor Kaplansky on
mtu negiotiation for virtio-net devices. It attempts to pick up from
https://lists.oasis-open.org/archives/virtio-dev/201508/msg00007.html
and is just a minor blurb from the first patch along with the 2nd patch
from the series, and some of the feedback integrated.
v2:
Rephrase and provide a mechanism for guest->host and host->guest
communication through a driver read-only and driver write-only field.
v3:
Converted to just support initial MTU. Guest->host and Host->guest MTU
changes are outside the scope of this change.
v4:
Removed references to 'initial', since that condition cannot be tested.
Simply state that if the driver will use the mtu field, it must
negotiate the feature bit, and if not, it must not.
v5:
After feedback from Michael S. Tsirkin
v6:
Bit has to change to bit 25 due to an undocumented bit 24 being taken.
v7:
Partial rewrite with feedback from MST. Additional conformance statements
added.
v8:
Clarified the new conformance statements.
v9:
Updated the commit log for correctness. Added ACKs from
Michael S. Tsirkin, and Maxime Coquelin. Included the virtio
issue id.
v10:
Update the conformance statement wordings from previous 'offered' form
to 'succesfully negotiated' form.
v11:
Clarified size w.r.t. MTU - buffer sizes must also add extra space for ethernet
headers. Also updated the feature bit description.
v12:
Fixed punctuation mistake. Added updated conformance wording to not force the size
of a packet buffer.
content.tex | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/content.tex b/content.tex
index 4b45678..0e46134 100644
--- a/content.tex
+++ b/content.tex
@@ -3049,6 +3049,11 @@ features.
\item[VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2)] Control channel offloads
reconfiguration support.
+\item[VIRTIO_NET_F_MTU(3)] Device maximum MTU reporting is supported. If
+ offered by the device, device advises driver about the value of
+ its maximum MTU. If negotiated, the driver uses \field{mtu} as
+ the maximum MTU value.
+
\item[VIRTIO_NET_F_MAC (5)] Device has given MAC address.
\item[VIRTIO_NET_F_GUEST_TSO4 (7)] Driver can receive TSOv4.
@@ -3140,11 +3145,16 @@ of each of transmit and receive virtqueues (receiveq1\ldots receiveqN
and transmitq1\ldots transmitqN respectively) that can be configured once VIRTIO_NET_F_MQ
is negotiated.
+The following driver-read-only field, \field{mtu} only exists if
+VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the driver to
+use.
+
\begin{lstlisting}
struct virtio_net_config {
u8 mac[6];
le16 status;
le16 max_virtqueue_pairs;
+ le16 mtu;
};
\end{lstlisting}
@@ -3153,6 +3163,23 @@ struct virtio_net_config {
The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
if it offers VIRTIO_NET_F_MQ.
+The device MUST set \field{mtu} to between 68 and 65535 inclusive,
+if it offers VIRTIO_NET_F_MTU.
+
+The device SHOULD set \field{mtu} to at least 1280, if it offers
+VIRTIO_NET_F_MTU.
+
+The device MUST NOT modify \field{mtu} once it has been set.
+
+The device MUST NOT pass received packets that exceed \field{mtu} (plus low
+level ethernet header length) size with \field{gso_type} NONE or ECN
+after VIRTIO_NET_F_MTU has been successfully negotiated.
+
+The device MUST forward transmitted packets of up to \field{mtu} (plus low
+level ethernet header length) size with \field{gso_type} NONE or ECN, and do
+so without fragmentation, after VIRTIO_NET_F_MTU has been successfully
+negotiated.
+
\drivernormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
A driver SHOULD negotiate VIRTIO_NET_F_MAC if the device offers it.
@@ -3165,6 +3192,16 @@ If the driver does not negotiate the VIRTIO_NET_F_STATUS feature, it SHOULD
assume the link is active, otherwise it SHOULD read the link status from
the bottom bit of \field{status}.
+A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
+
+If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive
+buffers to receive at least one receive packet of size \field{mtu} (plus low
+level ethernet header length) with \field{gso_type} NONE or ECN.
+
+If the driver negotiates VIRTIO_NET_F_MTU, it MUST NOT transmit packets of
+size exceeding the value of \field{mtu} (plus low level ethernet header length)
+with \field{gso_type} NONE or ECN.
+
\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Network Device / Device configuration layout / Legacy Interface: Device configuration layout}
\label{sec:Device Types / Block Device / Feature bits / Device configuration layout / Legacy Interface: Device configuration layout}
When using the legacy interface, transitional devices and drivers
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v9 17/19] drm/virtio: kconfig: Fix recursive dependency issue.
From: Emil Velikov @ 2016-09-19 23:18 UTC (permalink / raw)
To: Peter Griffin
Cc: devicetree, kernel, vinod.koul, lee.jones, linux-remoteproc,
patrice.chotard, ML dri-devel, Linux-Kernel@Vger. Kernel. Org,
David Airlie, dmaengine, dan.j.williams, bjorn.andersson,
open list:VIRTIO GPU DRIVER, LAKML
In-Reply-To: <1473081421-16555-18-git-send-email-peter.griffin@linaro.org>
On 5 September 2016 at 14:16, Peter Griffin <peter.griffin@linaro.org> wrote:
> ST_SLIM_REMOTEPROC must select REMOTEPROC, which exposes the following
> recursive dependency.
>
From a humble skim through remoteproc, drm and a few other subsystems
I think the above is wrong. All the drivers (outside of remoteproc),
that I've seen, depend on the core component, they don't select it.
Furthermore most places explicitly hide the drivers from the menu if
the core component isn't enabled.
Is there something that requires such a different/unusual behaviour in
remoteproc ?
Regards,
Emil
^ permalink raw reply
* Re: [PATCH v9 17/19] drm/virtio: kconfig: Fix recursive dependency issue.
From: Emil Velikov @ 2016-09-20 7:20 UTC (permalink / raw)
To: Peter Griffin
Cc: sfr@canb.auug.org.au, kernel@stlinux.com, airlied@linux.ie,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
bjorn.andersson@linaro.org, vinod.koul@intel.com,
virtualization@lists.linux-foundation.org, lee.jones@linaro.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <1474277660-5196-2-git-send-email-peter.griffin@linaro.org>
[-- Attachment #1.1: Type: text/plain, Size: 296 bytes --]
On Monday, 19 September 2016, Peter Griffin <peter.griffin@linaro.org>
wrote:
> ST_SLIM_REMOTEPROC must select REMOTEPROC, which exposes the following
> recursive dependency.
>
> It must not select, it must depend? Did you miss my earlier
question/suggestion that mentions that ?
Regards,
Emil
[-- Attachment #1.2: Type: text/html, Size: 517 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v9 17/19] drm/virtio: kconfig: Fix recursive dependency issue.
From: Emil Velikov @ 2016-09-20 7:24 UTC (permalink / raw)
To: Peter Griffin
Cc: sfr@canb.auug.org.au, kernel@stlinux.com, airlied@linux.ie,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
bjorn.andersson@linaro.org, vinod.koul@intel.com,
virtualization@lists.linux-foundation.org, lee.jones@linaro.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <CACvgo533SrCHV9V_L3_UA8sTPWKyC9HA8k5BqNad+1eaoWPLmA@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 206 bytes --]
On Tuesday, 20 September 2016, Emil Velikov <emil.l.velikov@gmail.com>
wrote:
> Did you miss my earlier question/suggestion that mentions that ?
>
Please scratch that. Just noticed the timestamps.
Emil
[-- Attachment #1.2: Type: text/html, Size: 426 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v9 17/19] drm/virtio: kconfig: Fix recursive dependency issue.
From: Peter Griffin @ 2016-09-20 8:32 UTC (permalink / raw)
To: Emil Velikov, Arnd Bergmann
Cc: devicetree, kernel, vinod.koul, lee.jones, linux-remoteproc,
patrice.chotard, ML dri-devel, Linux-Kernel@Vger. Kernel. Org,
David Airlie, dmaengine, dan.j.williams, bjorn.andersson,
open list:VIRTIO GPU DRIVER, LAKML
In-Reply-To: <CACvgo52_g7anGBd9HTUyYsbp5QmQEk9LR3nNHgSTgjQ29G3ViA@mail.gmail.com>
Hi Emil,
On Tue, 20 Sep 2016, Emil Velikov wrote:
> On 5 September 2016 at 14:16, Peter Griffin <peter.griffin@linaro.org> wrote:
> > ST_SLIM_REMOTEPROC must select REMOTEPROC, which exposes the following
> > recursive dependency.
> >
> From a humble skim through remoteproc, drm and a few other subsystems
> I think the above is wrong. All the drivers (outside of remoteproc),
> that I've seen, depend on the core component, they don't select it.
I will let Bjorn comment on the remoteproc subsystem Kconfig design, and
why it is like it is.
For this particular SLIM_RPROC I have added it to Kconfig in keeping with all
the other drivers in the remoteproc subsystem which has exposed this recursive
dependency issue.
For this particular kconfig symbol a quick grep reveals more drivers in
the kernel using 'select', than 'depend on'
git grep "select VIRTIO" | wc -l
14
git grep "depends on VIRTIO" | wc -l
10
> Furthermore most places explicitly hide the drivers from the menu if
> the core component isn't enabled.
Remoteproc subsystem takes a different approach, the core code is only enabled
if a driver which relies on it is enabled. This IMHO makes sense, as
remoteproc is not widely used (only a few particular ARM SoC's).
It is true that for subsystems which rely on the core component being
explicitly enabled, they often tend to hide drivers which depend on it
from the menu unless it is. This also makes sense.
>
> Is there something that requires such a different/unusual behaviour in
> remoteproc ?
>
I'm not sure it is that unusual...looking at config USB, it selects USB_COMMON in
mfd subsystem, client drivers select MFD_CORE.
I've added Arnd to this thread, as I've seen lots of Kconfig patches from him
recently, maybe he has some thoughts on whether this is the correct fix or not?
regards,
Peter.
^ permalink raw reply
* Re: [PATCH v9 17/19] drm/virtio: kconfig: Fix recursive dependency issue.
From: Jani Nikula @ 2016-09-20 9:33 UTC (permalink / raw)
To: Peter Griffin, Emil Velikov, Arnd Bergmann
Cc: devicetree, kernel, vinod.koul, linux-remoteproc, patrice.chotard,
ML dri-devel, Linux-Kernel@Vger. Kernel. Org, dmaengine,
dan.j.williams, bjorn.andersson, lee.jones,
open list:VIRTIO GPU DRIVER, LAKML
In-Reply-To: <20160920083251.GB26093@griffinp-ThinkPad-X1-Carbon-2nd>
On Tue, 20 Sep 2016, Peter Griffin <peter.griffin@linaro.org> wrote:
> Hi Emil,
>
> On Tue, 20 Sep 2016, Emil Velikov wrote:
>
>> On 5 September 2016 at 14:16, Peter Griffin <peter.griffin@linaro.org> wrote:
>> > ST_SLIM_REMOTEPROC must select REMOTEPROC, which exposes the following
>> > recursive dependency.
>
>
>> >
>> From a humble skim through remoteproc, drm and a few other subsystems
>> I think the above is wrong. All the drivers (outside of remoteproc),
>> that I've seen, depend on the core component, they don't select it.
>
> I will let Bjorn comment on the remoteproc subsystem Kconfig design, and
> why it is like it is.
>
> For this particular SLIM_RPROC I have added it to Kconfig in keeping with all
> the other drivers in the remoteproc subsystem which has exposed this recursive
> dependency issue.
>
> For this particular kconfig symbol a quick grep reveals more drivers in
> the kernel using 'select', than 'depend on'
>
> git grep "select VIRTIO" | wc -l
> 14
>
> git grep "depends on VIRTIO" | wc -l
> 10
>
>
>> Furthermore most places explicitly hide the drivers from the menu if
>> the core component isn't enabled.
>
> Remoteproc subsystem takes a different approach, the core code is only enabled
> if a driver which relies on it is enabled. This IMHO makes sense, as
> remoteproc is not widely used (only a few particular ARM SoC's).
>
> It is true that for subsystems which rely on the core component being
> explicitly enabled, they often tend to hide drivers which depend on it
> from the menu unless it is. This also makes sense.
>
>>
>> Is there something that requires such a different/unusual behaviour in
>> remoteproc ?
>>
>
> I'm not sure it is that unusual...looking at config USB, it selects USB_COMMON in
> mfd subsystem, client drivers select MFD_CORE.
>
> I've added Arnd to this thread, as I've seen lots of Kconfig patches from him
> recently, maybe he has some thoughts on whether this is the correct fix or not?
Documentation/kbuild/kconfig-language.txt:
Note:
select should be used with care. select will force
a symbol to a value without visiting the dependencies.
By abusing select you are able to select a symbol FOO even
if FOO depends on BAR that is not set.
In general use select only for non-visible symbols
(no prompts anywhere) and for symbols with no dependencies.
That will limit the usefulness but on the other hand avoid
the illegal configurations all over.
People tend to abuse select because it's "convenient". If you depend,
but some of your dependencies aren't met, you're in for some digging
through Kconfig to find the missing deps. Just to make the option you
want visible in menuconfig. If you instead select something with
dependencies, it'll be right most of the time, and it's "convenient",
until it breaks. (And hey, it usually breaks for someone else with some
other config, so it's still convenient for you.)
Perhaps kconfig should complain about selecting visible symbols and
symbols with dependencies.
BR,
Jani.
--
Jani Nikula, Intel Open Source Technology Center
^ permalink raw reply
* Re: [PATCH v9 17/19] drm/virtio: kconfig: Fix recursive dependency issue.
From: Emil Velikov @ 2016-09-21 12:09 UTC (permalink / raw)
To: Peter Griffin
Cc: devicetree, kernel, Arnd Bergmann, vinod.koul, lee.jones,
linux-remoteproc, patrice.chotard, ML dri-devel,
Linux-Kernel@Vger. Kernel. Org, David Airlie, dmaengine,
dan.j.williams, bjorn.andersson, open list:VIRTIO GPU DRIVER,
LAKML
In-Reply-To: <20160920083251.GB26093@griffinp-ThinkPad-X1-Carbon-2nd>
On 20 September 2016 at 09:32, Peter Griffin <peter.griffin@linaro.org> wrote:
> Hi Emil,
>
> On Tue, 20 Sep 2016, Emil Velikov wrote:
>
>> On 5 September 2016 at 14:16, Peter Griffin <peter.griffin@linaro.org> wrote:
>> > ST_SLIM_REMOTEPROC must select REMOTEPROC, which exposes the following
>> > recursive dependency.
>
>
>> >
>> From a humble skim through remoteproc, drm and a few other subsystems
>> I think the above is wrong. All the drivers (outside of remoteproc),
>> that I've seen, depend on the core component, they don't select it.
>
> I will let Bjorn comment on the remoteproc subsystem Kconfig design, and
> why it is like it is.
>
> For this particular SLIM_RPROC I have added it to Kconfig in keeping with all
> the other drivers in the remoteproc subsystem which has exposed this recursive
> dependency issue.
>
> For this particular kconfig symbol a quick grep reveals more drivers in
> the kernel using 'select', than 'depend on'
>
> git grep "select VIRTIO" | wc -l
> 14
>
> git grep "depends on VIRTIO" | wc -l
> 10
>
Might be worth taking a closer look into these at some point.
>
>> Furthermore most places explicitly hide the drivers from the menu if
>> the core component isn't enabled.
>
> Remoteproc subsystem takes a different approach, the core code is only enabled
> if a driver which relies on it is enabled. This IMHO makes sense, as
> remoteproc is not widely used (only a few particular ARM SoC's).
>
> It is true that for subsystems which rely on the core component being
> explicitly enabled, they often tend to hide drivers which depend on it
> from the menu unless it is. This also makes sense.
>
>>
>> Is there something that requires such a different/unusual behaviour in
>> remoteproc ?
>>
>
> I'm not sure it is that unusual...looking at config USB, it selects USB_COMMON in
> mfd subsystem, client drivers select MFD_CORE.
>
On the USB case I'm not sure what the reasoning behind the USB vs
USB_COMMON split. In seems that one could just fold them, but that's
another topic. On the MFD side... it follows the select {,mis,ab}use.
With one (the only one?) MFD driver not using/selecting MFD_CORE doing
it's own version of mfd_add_devices... which could be reworked,
possibly.
> I've added Arnd to this thread, as I've seen lots of Kconfig patches from him
> recently, maybe he has some thoughts on whether this is the correct fix or not?
>
Ack. Fwiw, I believe that the reasoning put by Jani is perfeclty
reasonable, but it'll be great to hear others as well.
Thanks
Emil
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox