* [PATCH v8 0/6] Implement qspinlock/pv-qspinlock on ppc
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
Hi All,
this is the fairlock patchset. You can apply them and build successfully.
patches are based on linux-next
qspinlock can avoid waiter starved issue. It has about the same speed in
single-thread and it can be much faster in high contention situations
especially when the spinlock is embedded within the data structure to be
protected.
v7 -> v8:
add one patch to drop a function call under native qspinlock unlock.
Enabling qspinlock or not is a complier option now.
rebase onto linux-next(4.9-rc7)
v6 -> v7:
rebase onto 4.8-rc4
v1 -> v6:
too many details. snip.
some benchmark result below
perf bench
these numbers are ops per sec, So the higher the better.
*******************************************
on pSeries with 32 vcpus, 32Gb memory, pHyp.
------------------------------------------------------------------------------------
test case | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
futex hash | 618572 | 552332 | 553788
futex lock-pi | 364 | 364 | 364
sched pipe | 78984 | 76060 | 81454
------------------------------------------------------------------------------------
unix bench:
these numbers are scores, So the higher the better.
************************************************
on PowerNV with 16 cores(cpus) (smt off), 32Gb memory:
-------------
pv-qspinlock and qspinlock have very similar results because pv-qspinlock use native version
which is only having one callback overhead
------------------------------------------------------------------------------------
test case | pv-qspinlock and qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput 761.1 761.4
File Copy 1024 bufsize 2000 maxblocks 1259.8 1286.6
File Copy 256 bufsize 500 maxblocks 782.2 790.3
File Copy 4096 bufsize 8000 maxblocks 2741.5 2817.4
Pipe Throughput 1063.2 1036.7
Pipe-based Context Switching 284.7 281.1
Process Creation 679.6 649.1
Shell Scripts (1 concurrent) 1933.2 1922.9
Shell Scripts (8 concurrent) 5003.3 4899.8
System Call Overhead 900.6 896.8
==========================
System Benchmarks Index Score 1139.3 1133.0
--------------------------------------------------------------------------- ---------
*******************************************
on pSeries with 32 vcpus, 32Gb memory, pHyp.
------------------------------------------------------------------------------------
test case | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput 877.1 891.2 872.8
File Copy 1024 bufsize 2000 maxblocks 1390.4 1399.2 1395.0
File Copy 256 bufsize 500 maxblocks 882.4 889.5 881.8
File Copy 4096 bufsize 8000 maxblocks 3112.3 3113.4 3121.7
Pipe Throughput 1095.8 1162.6 1158.5
Pipe-based Context Switching 194.9 192.7 200.7
Process Creation 518.4 526.4 509.1
Shell Scripts (1 concurrent) 1401.9 1413.9 1402.2
Shell Scripts (8 concurrent) 3215.6 3246.6 3229.1
System Call Overhead 833.2 892.4 888.1
====================================
System Benchmarks Index Score 1033.7 1052.5 1047.8
------------------------------------------------------------------------------------
******************************************
on pSeries with 32 vcpus, 16Gb memory, KVM.
------------------------------------------------------------------------------------
test case | pv-qspinlock | qspinlock | current-spinlock
------------------------------------------------------------------------------------
Execl Throughput 497.4 518.7 497.8
File Copy 1024 bufsize 2000 maxblocks 1368.8 1390.1 1343.3
File Copy 256 bufsize 500 maxblocks 857.7 859.8 831.4
File Copy 4096 bufsize 8000 maxblocks 2851.7 2838.1 2785.5
Pipe Throughput 1221.9 1265.3 1250.4
Pipe-based Context Switching 529.8 578.1 564.2
Process Creation 408.4 421.6 287.6
Shell Scripts (1 concurrent) 1201.8 1215.3 1185.8
Shell Scripts (8 concurrent) 3758.4 3799.3 3878.9
System Call Overhead 1008.3 1122.6 1134.2
=====================================
System Benchmarks Index Score 1072.0 1108.9 1050.6
------------------------------------------------------------------------------------
Pan Xinhui (6):
powerpc/qspinlock: powerpc support qspinlock
powerpc: pSeries/Kconfig: Add qspinlock build config
powerpc: lib/locks.c: Add cpu yield/wake helper function
powerpc/pv-qspinlock: powerpc support pv-qspinlock
powerpc: pSeries: Add pv-qspinlock build config/make
powerpc/pv-qspinlock: Optimise native unlock path
arch/powerpc/include/asm/qspinlock.h | 93 ++++++++++++
arch/powerpc/include/asm/qspinlock_paravirt.h | 52 +++++++
.../powerpc/include/asm/qspinlock_paravirt_types.h | 13 ++
arch/powerpc/include/asm/spinlock.h | 35 +++--
arch/powerpc/include/asm/spinlock_types.h | 4 +
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/paravirt.c | 157 +++++++++++++++++++++
arch/powerpc/lib/locks.c | 122 ++++++++++++++++
arch/powerpc/platforms/pseries/Kconfig | 16 +++
arch/powerpc/platforms/pseries/setup.c | 5 +
10 files changed, 485 insertions(+), 13 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock.h
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt_types.h
create mode 100644 arch/powerpc/kernel/paravirt.c
--
2.4.11
^ permalink raw reply
* [PATCH v8 1/6] powerpc/qspinlock: powerpc support qspinlock
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
This patch add basic code to enable qspinlock on powerpc. qspinlock is
one kind of fairlock implementation. And seen some performance improvement
under some scenarios.
queued_spin_unlock() release the lock by just one write of NULL to the
::locked field which sits at different places in the two endianness
system.
We override some arch_spin_XXX as powerpc has io_sync stuff which makes
sure the io operations are protected by the lock correctly.
There is another special case, see commit
2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/qspinlock.h | 66 +++++++++++++++++++++++++++++++
arch/powerpc/include/asm/spinlock.h | 31 +++++++++------
arch/powerpc/include/asm/spinlock_types.h | 4 ++
arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++
4 files changed, 147 insertions(+), 13 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock.h
diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
new file mode 100644
index 0000000..4c89256
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -0,0 +1,66 @@
+#ifndef _ASM_POWERPC_QSPINLOCK_H
+#define _ASM_POWERPC_QSPINLOCK_H
+
+#include <asm-generic/qspinlock_types.h>
+
+#define SPIN_THRESHOLD (1 << 15)
+#define queued_spin_unlock queued_spin_unlock
+#define queued_spin_is_locked queued_spin_is_locked
+#define queued_spin_unlock_wait queued_spin_unlock_wait
+
+extern void queued_spin_unlock_wait(struct qspinlock *lock);
+
+static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
+{
+ return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ /* release semantics is required */
+ smp_store_release(__qspinlock_lock_byte(lock), 0);
+}
+
+static inline int queued_spin_is_locked(struct qspinlock *lock)
+{
+ smp_mb();
+ return atomic_read(&lock->val);
+}
+
+#include <asm-generic/qspinlock.h>
+
+/* we need override it as ppc has io_sync stuff */
+#undef arch_spin_trylock
+#undef arch_spin_lock
+#undef arch_spin_lock_flags
+#undef arch_spin_unlock
+#define arch_spin_trylock arch_spin_trylock
+#define arch_spin_lock arch_spin_lock
+#define arch_spin_lock_flags arch_spin_lock_flags
+#define arch_spin_unlock arch_spin_unlock
+
+static inline int arch_spin_trylock(arch_spinlock_t *lock)
+{
+ CLEAR_IO_SYNC;
+ return queued_spin_trylock(lock);
+}
+
+static inline void arch_spin_lock(arch_spinlock_t *lock)
+{
+ CLEAR_IO_SYNC;
+ queued_spin_lock(lock);
+}
+
+static inline
+void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
+{
+ CLEAR_IO_SYNC;
+ queued_spin_lock(lock);
+}
+
+static inline void arch_spin_unlock(arch_spinlock_t *lock)
+{
+ SYNC_IO;
+ queued_spin_unlock(lock);
+}
+#endif /* _ASM_POWERPC_QSPINLOCK_H */
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 8c1b913..954099e 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -60,6 +60,23 @@ static inline bool vcpu_is_preempted(int cpu)
}
#endif
+#if defined(CONFIG_PPC_SPLPAR)
+/* We only yield to the hypervisor if we are in shared processor mode */
+#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
+extern void __spin_yield(arch_spinlock_t *lock);
+extern void __rw_yield(arch_rwlock_t *lock);
+#else /* SPLPAR */
+#define __spin_yield(x) barrier()
+#define __rw_yield(x) barrier()
+#define SHARED_PROCESSOR 0
+#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm/qspinlock.h>
+#else
+
+#define arch_spin_relax(lock) __spin_yield(lock)
+
static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
{
return lock.slock == 0;
@@ -114,18 +131,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
* held. Conveniently, we have a word in the paca that holds this
* value.
*/
-
-#if defined(CONFIG_PPC_SPLPAR)
-/* We only yield to the hypervisor if we are in shared processor mode */
-#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
-extern void __spin_yield(arch_spinlock_t *lock);
-extern void __rw_yield(arch_rwlock_t *lock);
-#else /* SPLPAR */
-#define __spin_yield(x) barrier()
-#define __rw_yield(x) barrier()
-#define SHARED_PROCESSOR 0
-#endif
-
static inline void arch_spin_lock(arch_spinlock_t *lock)
{
CLEAR_IO_SYNC;
@@ -203,6 +208,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
smp_mb();
}
+#endif /* !CONFIG_QUEUED_SPINLOCKS */
/*
* Read-write spinlocks, allowing multiple readers
* but only one writer.
@@ -338,7 +344,6 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
#define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
#define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
-#define arch_spin_relax(lock) __spin_yield(lock)
#define arch_read_relax(lock) __rw_yield(lock)
#define arch_write_relax(lock) __rw_yield(lock)
diff --git a/arch/powerpc/include/asm/spinlock_types.h b/arch/powerpc/include/asm/spinlock_types.h
index 2351adc..bd7144e 100644
--- a/arch/powerpc/include/asm/spinlock_types.h
+++ b/arch/powerpc/include/asm/spinlock_types.h
@@ -5,11 +5,15 @@
# error "please don't include this file directly"
#endif
+#ifdef CONFIG_QUEUED_SPINLOCKS
+#include <asm-generic/qspinlock_types.h>
+#else
typedef struct {
volatile unsigned int slock;
} arch_spinlock_t;
#define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
+#endif
typedef struct {
volatile signed int lock;
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index b7b1237..6574626 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -23,6 +23,7 @@
#include <asm/hvcall.h>
#include <asm/smp.h>
+#ifndef CONFIG_QUEUED_SPINLOCKS
void __spin_yield(arch_spinlock_t *lock)
{
unsigned int lock_value, holder_cpu, yield_count;
@@ -42,6 +43,7 @@ void __spin_yield(arch_spinlock_t *lock)
get_hard_smp_processor_id(holder_cpu), yield_count);
}
EXPORT_SYMBOL_GPL(__spin_yield);
+#endif
/*
* Waiting for a read lock or a write lock on a rwlock...
@@ -68,3 +70,60 @@ void __rw_yield(arch_rwlock_t *rw)
get_hard_smp_processor_id(holder_cpu), yield_count);
}
#endif
+
+#ifdef CONFIG_QUEUED_SPINLOCKS
+/*
+ * This forbid we load an old value in another LL/SC. Because the SC here force
+ * another LL/SC repeat. So we guarantee all loads in another LL and SC will
+ * read correct value.
+ */
+static inline u32 atomic_read_sync(atomic_t *v)
+{
+ u32 val;
+
+ __asm__ __volatile__(
+"1: " PPC_LWARX(%0, 0, %2, 0) "\n"
+" stwcx. %0, 0, %2\n"
+" bne- 1b\n"
+ : "=&r" (val), "+m" (*v)
+ : "r" (v)
+ : "cr0", "xer");
+
+ return val;
+}
+
+void queued_spin_unlock_wait(struct qspinlock *lock)
+{
+
+ u32 val;
+
+ smp_mb();
+
+ /*
+ * copied from generic queue_spin_unlock_wait with little modification
+ */
+ for (;;) {
+ /* need _sync, as we might race with another LL/SC in lock()*/
+ val = atomic_read_sync(&lock->val);
+
+ if (!val) /* not locked, we're done */
+ goto done;
+
+ if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
+ break;
+
+ /* not locked, but pending, wait until we observe the lock */
+ cpu_relax();
+ }
+
+ /*
+ * any unlock is good. And need not _sync, as ->val is set by the SC in
+ * unlock(), any loads in lock() must see the correct value.
+ */
+ while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
+ cpu_relax();
+done:
+ smp_mb();
+}
+EXPORT_SYMBOL(queued_spin_unlock_wait);
+#endif
--
2.4.11
^ permalink raw reply related
* [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
pSeries/powerNV will use qspinlock from now on.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/Kconfig | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index bec90fb..8a87d06 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -23,6 +23,14 @@ config PPC_PSERIES
select PPC_DOORBELL
default y
+config ARCH_USE_QUEUED_SPINLOCKS
+ default y
+ bool "Enable qspinlock"
+ help
+ Enabling this option will let kernel use qspinlock which is a kind of
+ fairlock. It has shown a good performance improvement on x86 and also ppc
+ especially in high contention cases.
+
config PPC_SPLPAR
depends on PPC_PSERIES
bool "Support for shared-processor logical partitions"
--
2.4.11
^ permalink raw reply related
* [PATCH v8 3/6] powerpc: lib/locks.c: Add cpu yield/wake helper function
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
Add two corresponding helper functions to support pv-qspinlock.
For normal use, __spin_yield_cpu will confer current vcpu slices to the
target vcpu(say, a lock holder). If target vcpu is not specified or it
is in running state, such conferging to lpar happens or not depends.
Because hcall itself will introduce latency and a little overhead. And we
do NOT want to suffer any latency on some cases, e.g. in interrupt handler.
The second parameter *confer* can indicate such case.
__spin_wake_cpu is simpiler, it will wake up one vcpu regardless of its
current vcpu state.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/spinlock.h | 4 +++
arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+)
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 954099e..6426bd5 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -64,9 +64,13 @@ static inline bool vcpu_is_preempted(int cpu)
/* We only yield to the hypervisor if we are in shared processor mode */
#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
extern void __spin_yield(arch_spinlock_t *lock);
+extern void __spin_yield_cpu(int cpu, int confer);
+extern void __spin_wake_cpu(int cpu);
extern void __rw_yield(arch_rwlock_t *lock);
#else /* SPLPAR */
#define __spin_yield(x) barrier()
+#define __spin_yield_cpu(x, y) barrier()
+#define __spin_wake_cpu(x) barrier()
#define __rw_yield(x) barrier()
#define SHARED_PROCESSOR 0
#endif
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index 6574626..bd872c9 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -23,6 +23,65 @@
#include <asm/hvcall.h>
#include <asm/smp.h>
+/*
+ * confer our slices to a specified cpu and return. If it is in running state
+ * or cpu is -1, then we will check confer. If confer is NULL, we will return
+ * otherwise we confer our slices to lpar.
+ */
+void __spin_yield_cpu(int cpu, int confer)
+{
+ unsigned int holder_cpu = cpu, yield_count;
+
+ if (cpu == -1)
+ goto yield_to_lpar;
+
+ BUG_ON(holder_cpu >= nr_cpu_ids);
+ yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+
+ /* if cpu is running, confer slices to lpar conditionally*/
+ if ((yield_count & 1) == 0)
+ goto yield_to_lpar;
+
+ plpar_hcall_norets(H_CONFER,
+ get_hard_smp_processor_id(holder_cpu), yield_count);
+ return;
+
+yield_to_lpar:
+ if (confer)
+ plpar_hcall_norets(H_CONFER, -1, 0);
+}
+EXPORT_SYMBOL_GPL(__spin_yield_cpu);
+
+void __spin_wake_cpu(int cpu)
+{
+ unsigned int holder_cpu = cpu;
+
+ BUG_ON(holder_cpu >= nr_cpu_ids);
+ /*
+ * NOTE: we should always do this hcall regardless of
+ * the yield_count of the holder_cpu.
+ * as thers might be a case like below;
+ * CPU 1 CPU 2
+ * yielded = true
+ * if (yielded)
+ * __spin_wake_cpu()
+ * __spin_yield_cpu()
+ *
+ * So we might lose a wake if we check the yield_count and
+ * return directly if the holder_cpu is running.
+ * IOW. do NOT code like below.
+ * yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
+ * if ((yield_count & 1) == 0)
+ * return;
+ *
+ * a PROD hcall marks the target_cpu proded, which cause the next cede
+ * or confer called on the target_cpu invalid.
+ */
+ plpar_hcall_norets(H_PROD,
+ get_hard_smp_processor_id(holder_cpu));
+}
+EXPORT_SYMBOL_GPL(__spin_wake_cpu);
+
#ifndef CONFIG_QUEUED_SPINLOCKS
void __spin_yield(arch_spinlock_t *lock)
{
--
2.4.11
^ permalink raw reply related
* [PATCH v8 4/6] powerpc/pv-qspinlock: powerpc support pv-qspinlock
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
The default pv-qspinlock uses qspinlock(native version of pv-qspinlock).
pv_lock initialization should be done in bootstage with irq disabled.
And if we run as a guest with powerKVM/pHyp shared_processor mode,
restore pv_lock_ops callbacks to pv-qspinlock(pv version) which makes
full use of virtualization.
There is a hash table, we store cpu number into it and the key is lock.
So everytime pv_wait can know who is the lock holder by searching the
lock. Also store the lock in a per_cpu struct, and remove it when we own
the lock. Then pv_wait can know which lock we are spinning on. But the
cpu in the hash table might not be the correct lock holder, as for
performace issue, we does not take care of hash conflict.
Also introduce spin_lock_holder, which tells who owns the lock now.
currently the only user is spin_unlock_wait.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/qspinlock.h | 29 +++-
arch/powerpc/include/asm/qspinlock_paravirt.h | 36 +++++
.../powerpc/include/asm/qspinlock_paravirt_types.h | 13 ++
arch/powerpc/kernel/paravirt.c | 153 +++++++++++++++++++++
arch/powerpc/lib/locks.c | 8 +-
arch/powerpc/platforms/pseries/setup.c | 5 +
6 files changed, 241 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt_types.h
create mode 100644 arch/powerpc/kernel/paravirt.c
diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
index 4c89256..8fd6349 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -15,7 +15,7 @@ static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}
-static inline void queued_spin_unlock(struct qspinlock *lock)
+static inline void native_queued_spin_unlock(struct qspinlock *lock)
{
/* release semantics is required */
smp_store_release(__qspinlock_lock_byte(lock), 0);
@@ -27,6 +27,33 @@ static inline int queued_spin_is_locked(struct qspinlock *lock)
return atomic_read(&lock->val);
}
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#include <asm/qspinlock_paravirt.h>
+/*
+ * try to know who is the lock holder, however it is not always true
+ * Return:
+ * -1, we did not know the lock holder.
+ * other value, likely is the lock holder.
+ */
+extern int spin_lock_holder(void *lock);
+
+static inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+ pv_queued_spin_lock(lock, val);
+}
+
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ pv_queued_spin_unlock(lock);
+}
+#else
+#define spin_lock_holder(l) (-1)
+static inline void queued_spin_unlock(struct qspinlock *lock)
+{
+ native_queued_spin_unlock(lock);
+}
+#endif
+
#include <asm-generic/qspinlock.h>
/* we need override it as ppc has io_sync stuff */
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h b/arch/powerpc/include/asm/qspinlock_paravirt.h
new file mode 100644
index 0000000..d87cda0
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt.h
@@ -0,0 +1,36 @@
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+#error "do not include this file"
+#endif
+
+#ifndef _ASM_QSPINLOCK_PARAVIRT_H
+#define _ASM_QSPINLOCK_PARAVIRT_H
+
+#include <asm/qspinlock_paravirt_types.h>
+
+extern void pv_lock_init(void);
+extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_init_lock_hash(void);
+extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+
+static inline void pv_queued_spin_lock(struct qspinlock *lock, u32 val)
+{
+ pv_lock_op.lock(lock, val);
+}
+
+static inline void pv_queued_spin_unlock(struct qspinlock *lock)
+{
+ pv_lock_op.unlock(lock);
+}
+
+static inline void pv_wait(u8 *ptr, u8 val)
+{
+ pv_lock_op.wait(ptr, val);
+}
+
+static inline void pv_kick(int cpu)
+{
+ pv_lock_op.kick(cpu);
+}
+
+#endif
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt_types.h b/arch/powerpc/include/asm/qspinlock_paravirt_types.h
new file mode 100644
index 0000000..83611ed
--- /dev/null
+++ b/arch/powerpc/include/asm/qspinlock_paravirt_types.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_QSPINLOCK_PARAVIRT_TYPES_H
+#define _ASM_QSPINLOCK_PARAVIRT_TYPES_H
+
+struct pv_lock_ops {
+ void (*lock)(struct qspinlock *lock, u32 val);
+ void (*unlock)(struct qspinlock *lock);
+ void (*wait)(u8 *ptr, u8 val);
+ void (*kick)(int cpu);
+};
+
+extern struct pv_lock_ops pv_lock_op;
+
+#endif
diff --git a/arch/powerpc/kernel/paravirt.c b/arch/powerpc/kernel/paravirt.c
new file mode 100644
index 0000000..e697b17
--- /dev/null
+++ b/arch/powerpc/kernel/paravirt.c
@@ -0,0 +1,153 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/spinlock.h>
+#include <linux/smp.h>
+#include <linux/hash.h>
+#include <linux/bootmem.h>
+
+/* +2 here is to make sure there is not many conflict*/
+#define NUM_LOCK_CPU_ENTRY_SHIFT (order_base_2(NR_CPUS) + 2)
+#define NUM_LOCK_CPU_ENTRY (1 << NUM_LOCK_CPU_ENTRY_SHIFT)
+/* we can only spin on 4 locks at same time on same cpu*/
+#define NUM_LOCKS_PER_CPU 4
+
+static u16 *hash_lock_cpu_ptr;
+
+struct locks_on_cpu {
+ void *l[NUM_LOCKS_PER_CPU];
+ int count;
+};
+
+static DEFINE_PER_CPU(struct locks_on_cpu, node);
+
+static u16 *hash(void *l)
+{
+ int val = hash_ptr(l, NUM_LOCK_CPU_ENTRY_SHIFT);
+
+ return &hash_lock_cpu_ptr[val];
+}
+
+static void __init init_hash(void)
+{
+ int size = NUM_LOCK_CPU_ENTRY * sizeof(*hash_lock_cpu_ptr);
+
+ hash_lock_cpu_ptr = memblock_virt_alloc(size, 0);
+ memset(hash_lock_cpu_ptr, 0, size);
+}
+
+#define lock_get_holder(l) \
+ ((int)(*hash(l) - 1))
+
+#define lock_set_holder(l) \
+ (*hash(l) = raw_smp_processor_id() + 1)
+
+int spin_lock_holder(void *lock)
+{
+ /* we might run on PowerNV, which has no hash table ptr*/
+ if (hash_lock_cpu_ptr)
+ return lock_get_holder(lock);
+ return -1;
+}
+EXPORT_SYMBOL(spin_lock_holder);
+
+static void *this_cpu_lock(void)
+{
+ struct locks_on_cpu *this_node = this_cpu_ptr(&node);
+ int i = this_node->count - 1;
+
+ return this_node->l[i];
+}
+
+static void cpu_save_lock(void *l)
+{
+ struct locks_on_cpu *this_node = this_cpu_ptr(&node);
+ int i = this_node->count++;
+
+ this_node->l[i] = l;
+}
+
+static void cpu_remove_lock(void *l)
+{
+ __this_cpu_dec(node.count);
+}
+
+static void __native_queued_spin_unlock(struct qspinlock *lock)
+{
+ native_queued_spin_unlock(lock);
+}
+
+static void __pv_lock(struct qspinlock *lock, u32 val)
+{
+ /*
+ * save the lock we are spinning on
+ * pv_wait need know this lock
+ */
+ cpu_save_lock(lock);
+
+ __pv_queued_spin_lock_slowpath(lock, val);
+
+ /* as we win the lock, remove it*/
+ cpu_remove_lock(lock);
+
+ /*
+ * let other spinner know who is the lock holder
+ * we does not need to unset lock holder in unlock()
+ */
+ lock_set_holder(lock);
+}
+
+static void __pv_wait(u8 *ptr, u8 val)
+{
+ void *l = this_cpu_lock();
+ int cpu;
+ int always_confer = !in_interrupt();
+
+ while (READ_ONCE(*ptr) == val) {
+ HMT_low();
+ /*
+ * the lock might be unlocked once and locked again
+ */
+ cpu = lock_get_holder(l);
+
+ /*
+ * the default behavior of __spin_yield_cpu is yielding
+ * our cpu slices to target vcpu or lpar(pHyp or KVM).
+ * consider the latency of hcall itself and the priority of
+ * current task, we can do a optimisation.
+ * IOW, if we are in interrupt, and the target vcpu is running
+ * we do not yield ourself to lpar.
+ */
+ __spin_yield_cpu(cpu, always_confer);
+ }
+ HMT_medium();
+}
+
+static void __pv_kick(int cpu)
+{
+ __spin_wake_cpu(cpu);
+}
+
+struct pv_lock_ops pv_lock_op = {
+ .lock = native_queued_spin_lock_slowpath,
+ .unlock = __native_queued_spin_unlock,
+ .wait = NULL,
+ .kick = NULL,
+};
+EXPORT_SYMBOL(pv_lock_op);
+
+void __init pv_lock_init(void)
+{
+ if (SHARED_PROCESSOR) {
+ init_hash();
+ __pv_init_lock_hash();
+ pv_lock_op.lock = __pv_lock;
+ pv_lock_op.unlock = __pv_queued_spin_unlock;
+ pv_lock_op.wait = __pv_wait;
+ pv_lock_op.kick = __pv_kick;
+ }
+}
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index bd872c9..6e28651 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -179,8 +179,12 @@ void queued_spin_unlock_wait(struct qspinlock *lock)
* any unlock is good. And need not _sync, as ->val is set by the SC in
* unlock(), any loads in lock() must see the correct value.
*/
- while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
- cpu_relax();
+ while (atomic_read(&lock->val) & _Q_LOCKED_MASK) {
+ HMT_low();
+ if (SHARED_PROCESSOR)
+ __spin_yield_cpu(spin_lock_holder(lock), 0);
+ }
+ HMT_medium();
done:
smp_mb();
}
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 97aa3f3..ca61ead 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -487,6 +487,11 @@ static void __init pSeries_setup_arch(void)
}
ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+ pv_lock_init();
+#endif
+
}
static int __init pSeries_init_panel(void)
--
2.4.11
^ permalink raw reply related
* [PATCH v8 5/6] powerpc: pSeries: Add pv-qspinlock build config/make
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
pSeries run as a guest and might need pv-qspinlock.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/platforms/pseries/Kconfig | 8 ++++++++
2 files changed, 9 insertions(+)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1925341..4780415 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_PPC_970_NAP) += idle_power4.o
obj-$(CONFIG_PPC_P7_NAP) += idle_book3s.o
procfs-y := proc_powerpc.o
obj-$(CONFIG_PROC_FS) += $(procfs-y)
+obj-$(CONFIG_PARAVIRT_SPINLOCKS) += paravirt.o
rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI) := rtas_pci.o
obj-$(CONFIG_PPC_RTAS) += rtas.o rtas-rtc.o $(rtaspci-y-y)
obj-$(CONFIG_PPC_RTAS_DAEMON) += rtasd.o
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 8a87d06..0288c78 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -31,6 +31,14 @@ config ARCH_USE_QUEUED_SPINLOCKS
fairlock. It has shown a good performance improvement on x86 and also ppc
especially in high contention cases.
+config PARAVIRT_SPINLOCKS
+ bool "Paravirtialization support for qspinlock"
+ depends on PPC_SPLPAR && QUEUED_SPINLOCKS
+ default y
+ help
+ If kernel need run as a guest then enable this option.
+ Generally it can let kernel have a better performace.
+
config PPC_SPLPAR
depends on PPC_PSERIES
bool "Support for shared-processor logical partitions"
--
2.4.11
^ permalink raw reply related
* [PATCH v8 6/6] powerpc/pv-qspinlock: Optimise native unlock path
From: Pan Xinhui @ 2016-12-05 15:19 UTC (permalink / raw)
To: linux-kernel
Cc: xinhui.pan, peterz, benh, boqun.feng, waiman.long, virtualization,
mingo, paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
Avoid a function call under native version of qspinlock. On powerNV,
bafore applying this patch, every unlock is expensive. This small
optimizes enhance the performance.
We use static_key with jump_label which removes unnecessary loads of
lppaca and its stuff.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/qspinlock_paravirt.h | 18 +++++++++++++++++-
arch/powerpc/kernel/paravirt.c | 4 ++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h b/arch/powerpc/include/asm/qspinlock_paravirt.h
index d87cda0..8d39446 100644
--- a/arch/powerpc/include/asm/qspinlock_paravirt.h
+++ b/arch/powerpc/include/asm/qspinlock_paravirt.h
@@ -6,12 +6,14 @@
#define _ASM_QSPINLOCK_PARAVIRT_H
#include <asm/qspinlock_paravirt_types.h>
+#include <linux/jump_label.h>
extern void pv_lock_init(void);
extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
extern void __pv_init_lock_hash(void);
extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+extern struct static_key_true sharedprocessor_key;
static inline void pv_queued_spin_lock(struct qspinlock *lock, u32 val)
{
@@ -20,7 +22,21 @@ static inline void pv_queued_spin_lock(struct qspinlock *lock, u32 val)
static inline void pv_queued_spin_unlock(struct qspinlock *lock)
{
- pv_lock_op.unlock(lock);
+ /*
+ * on powerNV and pSeries with jump_label, code will be
+ * PowerNV: pSeries:
+ * nop; b 2f;
+ * native unlock 2:
+ * pv unlock;
+ * In this way, we can do unlock quick in native case.
+ *
+ * IF jump_label is not enabled, we fall back into
+ * if condition, IOW, ld && cmp && bne.
+ */
+ if (static_branch_likely(&sharedprocessor_key))
+ native_queued_spin_unlock(lock);
+ else
+ pv_lock_op.unlock(lock);
}
static inline void pv_wait(u8 *ptr, u8 val)
diff --git a/arch/powerpc/kernel/paravirt.c b/arch/powerpc/kernel/paravirt.c
index e697b17..a0a000e 100644
--- a/arch/powerpc/kernel/paravirt.c
+++ b/arch/powerpc/kernel/paravirt.c
@@ -140,6 +140,9 @@ struct pv_lock_ops pv_lock_op = {
};
EXPORT_SYMBOL(pv_lock_op);
+struct static_key_true sharedprocessor_key = STATIC_KEY_TRUE_INIT;
+EXPORT_SYMBOL(sharedprocessor_key);
+
void __init pv_lock_init(void)
{
if (SHARED_PROCESSOR) {
@@ -149,5 +152,6 @@ void __init pv_lock_init(void)
pv_lock_op.unlock = __pv_queued_spin_unlock;
pv_lock_op.wait = __pv_wait;
pv_lock_op.kick = __pv_kick;
+ static_branch_disable(&sharedprocessor_key);
}
}
--
2.4.11
^ permalink raw reply related
* Re: [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
From: Dave Hansen @ 2016-12-05 17:22 UTC (permalink / raw)
To: Li, Liang Z, kvm@vger.kernel.org
Cc: virtio-dev@lists.oasis-open.org, mhocko@suse.com, Amit Shah,
mst@redhat.com, qemu-devel@nongnu.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kirill.shutemov@linux.intel.com, pbonzini@redhat.com,
akpm@linux-foundation.org,
virtualization@lists.linux-foundation.org, Mel Gorman,
dgilbert@redhat.com
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E3A12D814@shsmsx102.ccr.corp.intel.com>
On 12/04/2016 05:13 AM, Li, Liang Z wrote:
>> On 11/30/2016 12:43 AM, Liang Li wrote:
>>> +static void send_unused_pages_info(struct virtio_balloon *vb,
>>> + unsigned long req_id)
>>> +{
>>> + struct scatterlist sg_in;
>>> + unsigned long pos = 0;
>>> + struct virtqueue *vq = vb->req_vq;
>>> + struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
>>> + int ret, order;
>>> +
>>> + mutex_lock(&vb->balloon_lock);
>>> +
>>> + for (order = MAX_ORDER - 1; order >= 0; order--) {
>>
>> I scratched my head for a bit on this one. Why are you walking over orders,
>> *then* zones. I *think* you're doing it because you can efficiently fill the
>> bitmaps at a given order for all zones, then move to a new bitmap. But, it
>> would be interesting to document this.
>
> Yes, use the order is somewhat strange, but it's helpful to keep the API simple.
> Do you think it's acceptable?
Yeah, it's fine. Just comment it, please.
>>> + if (ret == -ENOSPC) {
>>> + void *new_resp_data;
>>> +
>>> + new_resp_data = kmalloc(2 * vb->resp_buf_size,
>>> + GFP_KERNEL);
>>> + if (new_resp_data) {
>>> + kfree(vb->resp_data);
>>> + vb->resp_data = new_resp_data;
>>> + vb->resp_buf_size *= 2;
>>
>> What happens to the data in ->resp_data at this point? Doesn't this just
>> throw it away?
>
> Yes, so we should make sure the data in resp_data is not inuse.
But doesn't it have valid data that we just collected and haven't told
the hypervisor about yet? Aren't we throwing away good data that cost
us something to collect?
>> ...
>>> +struct page_info_item {
>>> + __le64 start_pfn : 52; /* start pfn for the bitmap */
>>> + __le64 page_shift : 6; /* page shift width, in bytes */
What does a page_shift "in bytes" mean? :)
>>> + __le64 bmap_len : 6; /* bitmap length, in bytes */ };
>>
>> Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right?
>
> Currently, we just use the 8 bytes and 0 bytes bitmap, should we support more than 64 bytes?
It just means that with this format, you end up wasting at least ~1/8th
of the space with metadata. That's a bit unfortunate, but I guess it's
not fatal.
I'd definitely call it out in the patch description and make sure other
folks take a look at it.
There's a somewhat easy fix, but that would make the qemu implementation
more complicated: You could just have bmap_len==0x3f imply that there's
another field that contains an extended bitmap length for when you need
long bitmaps.
But, as you note, there's no need for it, so it's a matter of trading
the extra complexity versus the desire to not habing to change the ABI
again for longer (hopefully).
>>> +static int mark_unused_pages(struct zone *zone,
>>> + unsigned long *unused_pages, unsigned long size,
>>> + int order, unsigned long *pos)
>>> +{
>>> + unsigned long pfn, flags;
>>> + unsigned int t;
>>> + struct list_head *curr;
>>> + struct page_info_item *info;
>>> +
>>> + if (zone_is_empty(zone))
>>> + return 0;
>>> +
>>> + spin_lock_irqsave(&zone->lock, flags);
>>> +
>>> + if (*pos + zone->free_area[order].nr_free > size)
>>> + return -ENOSPC;
>>
>> Urg, so this won't partially fill? So, what the nr_free pages limit where we no
>> longer fit in the kmalloc()'d buffer where this simply won't work?
>
> Yes. My initial implementation is partially fill, it's better for the worst case.
> I thought the above code is more efficient for most case ...
> Do you think partially fill the bitmap is better?
Could you please answer the question I asked?
Because if you don't get this right, it could mean that there are system
that simply *fail* here.
^ permalink raw reply
* Oops with CONFIG_VMAP_STCK and bond device + virtio-net
From: Laura Abbott @ 2016-12-05 23:53 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang
Cc: netdev, zbyszek, Linux Kernel Mailing List, virtualization
Hi,
Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1401612
In qemu with two virtio-net interfaces:
$ ip l
...
5: ens14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:e9:64:41 brd ff:ff:ff:ff:ff:ff
6: ens15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:e9:64:42 brd ff:ff:ff:ff:ff:ff
$ sudo ip link add bond1 type bond
$ sudo ip link set ens14 master bond1
Segmentation fault
------------[ cut here ]------------
kernel BUG at ./include/linux/scatterlist.h:140!
invalid opcode: 0000 [#1] SMP
Modules linked in: bonding ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip
ata_generic crc32c_intel qxl drm_kms_helper virtio_pci serio_raw ttm drm pata_acpi
CPU: 5 PID: 1983 Comm: ip Not tainted 4.9.0-0.rc6.git2.1.fc26.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
task: ffff9d50a3583240 task.stack: ffffb06e41040000
RIP: 0010:[<ffffffffbc4896fc>] [<ffffffffbc4896fc>] sg_init_one+0x8c/0xa0
RSP: 0018:ffffb06e41043698 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffb06e41043774 RCX: 0000000000000028
RDX: 0000131ec1043774 RSI: 0000000000000013 RDI: ffffb06ec1043774
RBP: ffffb06e410436b0 R08: 00000000001ddbe0 R09: ffffb06e410436c8
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000006
R13: ffffb06e410436c8 R14: ffff9d50b2dc1800 R15: ffff9d50b3db9600
FS: 00007f15347e5700(0000) GS:ffff9d50bb000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc09bc4000 CR3: 0000000135797000 CR4: 00000000000406e0
Stack:
ffff9d50b229d000 0000000000000000 ffffb06e41043772 ffffb06e41043720
ffffffffc0051123 ffff9d50a3583240 0000000087654321 0000000000000002
0000000000000000 0000000000000000 0000000000000000 000000007b8f5301
Call Trace:
[<ffffffffc0051123>] virtnet_set_mac_address+0xb3/0x140 [virtio_net]
[<ffffffffbc7ae305>] dev_set_mac_address+0x55/0xc0
[<ffffffffc03f319e>] bond_enslave+0x34e/0x1180 [bonding]
[<ffffffffbc7ca22f>] do_setlink+0x6cf/0xd10
[<ffffffffbc20dd6a>] ? get_page_from_freelist+0x6ba/0xca0
[<ffffffffbc037de9>] ? sched_clock+0x9/0x10
[<ffffffffbc068475>] ? kvm_sched_clock_read+0x25/0x40
[<ffffffffbc111ed6>] ? __lock_acquire+0x346/0x1290
[<ffffffffbc4aa436>] ? nla_parse+0xa6/0x120
[<ffffffffbc7ce9e8>] rtnl_newlink+0x5c8/0x870
[<ffffffffbc3ecb32>] ? avc_has_perm_noaudit+0x32/0x210
[<ffffffffbc0bbfca>] ? ns_capable_common+0x7a/0x90
[<ffffffffbc0bbff3>] ? ns_capable+0x13/0x20
[<ffffffffbc7ced76>] rtnetlink_rcv_msg+0xe6/0x210
[<ffffffffbc7c951b>] ? rtnetlink_rcv+0x1b/0x40
[<ffffffffbc7c951b>] ? rtnetlink_rcv+0x1b/0x40
[<ffffffffbc7cec90>] ? rtnl_newlink+0x870/0x870
[<ffffffffbc7f7394>] netlink_rcv_skb+0xa4/0xc0
[<ffffffffbc7c952a>] rtnetlink_rcv+0x2a/0x40
[<ffffffffbc7f6d07>] netlink_unicast+0x1f7/0x2f0
[<ffffffffbc7f6c7f>] ? netlink_unicast+0x16f/0x2f0
[<ffffffffbc7f7102>] netlink_sendmsg+0x302/0x3c0
[<ffffffffbc790c28>] sock_sendmsg+0x38/0x50
[<ffffffffbc791773>] ___sys_sendmsg+0x2e3/0x2f0
[<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
[<ffffffffbc068475>] ? kvm_sched_clock_read+0x25/0x40
[<ffffffffbc037de9>] ? sched_clock+0x9/0x10
[<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
[<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
[<ffffffffbc111775>] ? trace_hardirqs_on_caller+0xf5/0x1b0
[<ffffffffbc7924b4>] __sys_sendmsg+0x54/0x90
[<ffffffffbc792502>] SyS_sendmsg+0x12/0x20
[<ffffffffbc003eec>] do_syscall_64+0x6c/0x1f0
[<ffffffffbc917589>] entry_SYSCALL64_slow_path+0x25/0x25
Code: ca 75 2c 49 8b 55 08 f6 c2 01 75 25 83 e2 03 81 e3 ff 0f 00 00 45 89 65 14 48
RIP [<ffffffffbc4896fc>] sg_init_one+0x8c/0xa0
RSP <ffffb06e41043698>
---[ end trace 9076d2284efbf735 ]---
This looks like an issue with CONFIG_VMAP_STACK since bond_enslave uses
struct sockaddr from the stack and virtnet_set_mac_address calls
sg_init_one which triggers BUG_ON(!virt_addr_valid(buf));
I know there have been a lot of CONFIG_VMAP_STACK fixes around but I
didn't find this one reported yet.
Thanks,
Laura
^ permalink raw reply
* Re: [PATCH v8 1/6] powerpc/qspinlock: powerpc support qspinlock
From: Boqun Feng @ 2016-12-06 0:47 UTC (permalink / raw)
To: Pan Xinhui
Cc: peterz, benh, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-2-git-send-email-xinhui.pan@linux.vnet.ibm.com>
[-- Attachment #1.1: Type: text/plain, Size: 8946 bytes --]
On Mon, Dec 05, 2016 at 10:19:21AM -0500, Pan Xinhui wrote:
> This patch add basic code to enable qspinlock on powerpc. qspinlock is
> one kind of fairlock implementation. And seen some performance improvement
> under some scenarios.
>
> queued_spin_unlock() release the lock by just one write of NULL to the
> ::locked field which sits at different places in the two endianness
> system.
>
> We override some arch_spin_XXX as powerpc has io_sync stuff which makes
> sure the io operations are protected by the lock correctly.
>
> There is another special case, see commit
> 2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")
>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/qspinlock.h | 66 +++++++++++++++++++++++++++++++
> arch/powerpc/include/asm/spinlock.h | 31 +++++++++------
> arch/powerpc/include/asm/spinlock_types.h | 4 ++
> arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++
> 4 files changed, 147 insertions(+), 13 deletions(-)
> create mode 100644 arch/powerpc/include/asm/qspinlock.h
>
> diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
> new file mode 100644
> index 0000000..4c89256
> --- /dev/null
> +++ b/arch/powerpc/include/asm/qspinlock.h
> @@ -0,0 +1,66 @@
> +#ifndef _ASM_POWERPC_QSPINLOCK_H
> +#define _ASM_POWERPC_QSPINLOCK_H
> +
> +#include <asm-generic/qspinlock_types.h>
> +
> +#define SPIN_THRESHOLD (1 << 15)
> +#define queued_spin_unlock queued_spin_unlock
> +#define queued_spin_is_locked queued_spin_is_locked
> +#define queued_spin_unlock_wait queued_spin_unlock_wait
> +
> +extern void queued_spin_unlock_wait(struct qspinlock *lock);
> +
> +static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
> +{
> + return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
> +}
> +
> +static inline void queued_spin_unlock(struct qspinlock *lock)
> +{
> + /* release semantics is required */
> + smp_store_release(__qspinlock_lock_byte(lock), 0);
> +}
> +
> +static inline int queued_spin_is_locked(struct qspinlock *lock)
> +{
> + smp_mb();
> + return atomic_read(&lock->val);
> +}
> +
> +#include <asm-generic/qspinlock.h>
> +
> +/* we need override it as ppc has io_sync stuff */
> +#undef arch_spin_trylock
> +#undef arch_spin_lock
> +#undef arch_spin_lock_flags
> +#undef arch_spin_unlock
> +#define arch_spin_trylock arch_spin_trylock
> +#define arch_spin_lock arch_spin_lock
> +#define arch_spin_lock_flags arch_spin_lock_flags
> +#define arch_spin_unlock arch_spin_unlock
> +
> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
> +{
> + CLEAR_IO_SYNC;
> + return queued_spin_trylock(lock);
> +}
> +
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> + CLEAR_IO_SYNC;
> + queued_spin_lock(lock);
> +}
> +
> +static inline
> +void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
> +{
> + CLEAR_IO_SYNC;
> + queued_spin_lock(lock);
> +}
> +
> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
> +{
> + SYNC_IO;
> + queued_spin_unlock(lock);
> +}
> +#endif /* _ASM_POWERPC_QSPINLOCK_H */
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 8c1b913..954099e 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -60,6 +60,23 @@ static inline bool vcpu_is_preempted(int cpu)
> }
> #endif
>
> +#if defined(CONFIG_PPC_SPLPAR)
> +/* We only yield to the hypervisor if we are in shared processor mode */
> +#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
> +extern void __spin_yield(arch_spinlock_t *lock);
> +extern void __rw_yield(arch_rwlock_t *lock);
> +#else /* SPLPAR */
> +#define __spin_yield(x) barrier()
> +#define __rw_yield(x) barrier()
> +#define SHARED_PROCESSOR 0
> +#endif
> +
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +#include <asm/qspinlock.h>
> +#else
> +
> +#define arch_spin_relax(lock) __spin_yield(lock)
> +
> static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> {
> return lock.slock == 0;
> @@ -114,18 +131,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
> * held. Conveniently, we have a word in the paca that holds this
> * value.
> */
> -
> -#if defined(CONFIG_PPC_SPLPAR)
> -/* We only yield to the hypervisor if we are in shared processor mode */
> -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
> -extern void __spin_yield(arch_spinlock_t *lock);
> -extern void __rw_yield(arch_rwlock_t *lock);
> -#else /* SPLPAR */
> -#define __spin_yield(x) barrier()
> -#define __rw_yield(x) barrier()
> -#define SHARED_PROCESSOR 0
> -#endif
> -
> static inline void arch_spin_lock(arch_spinlock_t *lock)
> {
> CLEAR_IO_SYNC;
> @@ -203,6 +208,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
> smp_mb();
> }
>
> +#endif /* !CONFIG_QUEUED_SPINLOCKS */
> /*
> * Read-write spinlocks, allowing multiple readers
> * but only one writer.
> @@ -338,7 +344,6 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
> #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
> #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
>
> -#define arch_spin_relax(lock) __spin_yield(lock)
> #define arch_read_relax(lock) __rw_yield(lock)
> #define arch_write_relax(lock) __rw_yield(lock)
>
> diff --git a/arch/powerpc/include/asm/spinlock_types.h b/arch/powerpc/include/asm/spinlock_types.h
> index 2351adc..bd7144e 100644
> --- a/arch/powerpc/include/asm/spinlock_types.h
> +++ b/arch/powerpc/include/asm/spinlock_types.h
> @@ -5,11 +5,15 @@
> # error "please don't include this file directly"
> #endif
>
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +#include <asm-generic/qspinlock_types.h>
> +#else
> typedef struct {
> volatile unsigned int slock;
> } arch_spinlock_t;
>
> #define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
> +#endif
>
> typedef struct {
> volatile signed int lock;
> diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
> index b7b1237..6574626 100644
> --- a/arch/powerpc/lib/locks.c
> +++ b/arch/powerpc/lib/locks.c
> @@ -23,6 +23,7 @@
> #include <asm/hvcall.h>
> #include <asm/smp.h>
>
> +#ifndef CONFIG_QUEUED_SPINLOCKS
> void __spin_yield(arch_spinlock_t *lock)
> {
> unsigned int lock_value, holder_cpu, yield_count;
> @@ -42,6 +43,7 @@ void __spin_yield(arch_spinlock_t *lock)
> get_hard_smp_processor_id(holder_cpu), yield_count);
> }
> EXPORT_SYMBOL_GPL(__spin_yield);
> +#endif
>
> /*
> * Waiting for a read lock or a write lock on a rwlock...
> @@ -68,3 +70,60 @@ void __rw_yield(arch_rwlock_t *rw)
> get_hard_smp_processor_id(holder_cpu), yield_count);
> }
> #endif
> +
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +/*
> + * This forbid we load an old value in another LL/SC. Because the SC here force
> + * another LL/SC repeat. So we guarantee all loads in another LL and SC will
> + * read correct value.
> + */
> +static inline u32 atomic_read_sync(atomic_t *v)
> +{
> + u32 val;
> +
> + __asm__ __volatile__(
> +"1: " PPC_LWARX(%0, 0, %2, 0) "\n"
> +" stwcx. %0, 0, %2\n"
> +" bne- 1b\n"
> + : "=&r" (val), "+m" (*v)
> + : "r" (v)
> + : "cr0", "xer");
> +
> + return val;
> +}
> +
> +void queued_spin_unlock_wait(struct qspinlock *lock)
> +{
> +
> + u32 val;
> +
> + smp_mb();
> +
> + /*
> + * copied from generic queue_spin_unlock_wait with little modification
> + */
> + for (;;) {
> + /* need _sync, as we might race with another LL/SC in lock()*/
> + val = atomic_read_sync(&lock->val);
> +
> + if (!val) /* not locked, we're done */
> + goto done;
> +
> + if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
> + break;
> +
> + /* not locked, but pending, wait until we observe the lock */
> + cpu_relax();
> + }
> +
> + /*
> + * any unlock is good. And need not _sync, as ->val is set by the SC in
> + * unlock(), any loads in lock() must see the correct value.
> + */
I don't think the comment here about _sync is correct. First, not all
unlock() has a SC part, and for unlock_wait() case there is nothing to
do with whether lock() see the correct value or not. The reason with
_sync is not needed here is:
/*
* _sync() is not needed here, because once we got here, we must already
* read the ->val as LOCKED via a _sync(). Combining the smp_mb()
* before, we guarantee that all the memory accesses before
* unlock_wait() must be observed by the next lock critical section.
*/
Regards,
Boqun
> + while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
> + cpu_relax();
> +done:
> + smp_mb();
> +}
> +EXPORT_SYMBOL(queued_spin_unlock_wait);
> +#endif
> --
> 2.4.11
>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config
From: Boqun Feng @ 2016-12-06 0:58 UTC (permalink / raw)
To: Pan Xinhui
Cc: peterz, benh, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-3-git-send-email-xinhui.pan@linux.vnet.ibm.com>
[-- Attachment #1.1: Type: text/plain, Size: 1287 bytes --]
On Mon, Dec 05, 2016 at 10:19:22AM -0500, Pan Xinhui wrote:
> pSeries/powerNV will use qspinlock from now on.
>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> arch/powerpc/platforms/pseries/Kconfig | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
> index bec90fb..8a87d06 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
Why here? Not arch/powerpc/platforms/Kconfig?
> @@ -23,6 +23,14 @@ config PPC_PSERIES
> select PPC_DOORBELL
> default y
>
> +config ARCH_USE_QUEUED_SPINLOCKS
> + default y
> + bool "Enable qspinlock"
I think you just enable qspinlock by default for all PPC platforms. I
guess you need to put
depends on PPC_PSERIES || PPC_POWERNV
here to achieve what you mean in you commit message.
Regards,
Boqun
> + help
> + Enabling this option will let kernel use qspinlock which is a kind of
> + fairlock. It has shown a good performance improvement on x86 and also ppc
> + especially in high contention cases.
> +
> config PPC_SPLPAR
> depends on PPC_PSERIES
> bool "Support for shared-processor logical partitions"
> --
> 2.4.11
>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v8 1/6] powerpc/qspinlock: powerpc support qspinlock
From: Pan Xinhui @ 2016-12-06 1:16 UTC (permalink / raw)
To: Boqun Feng, Pan Xinhui
Cc: peterz, benh, linux-kernel, virtualization, mingo, paulus, mpe,
Waiman Long, paulmck, linuxppc-dev
In-Reply-To: <20161206004747.GB18164@tardis.cn.ibm.com>
correct waiman's address.
在 2016/12/6 08:47, Boqun Feng 写道:
> On Mon, Dec 05, 2016 at 10:19:21AM -0500, Pan Xinhui wrote:
>> This patch add basic code to enable qspinlock on powerpc. qspinlock is
>> one kind of fairlock implementation. And seen some performance improvement
>> under some scenarios.
>>
>> queued_spin_unlock() release the lock by just one write of NULL to the
>> ::locked field which sits at different places in the two endianness
>> system.
>>
>> We override some arch_spin_XXX as powerpc has io_sync stuff which makes
>> sure the io operations are protected by the lock correctly.
>>
>> There is another special case, see commit
>> 2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more")
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/qspinlock.h | 66 +++++++++++++++++++++++++++++++
>> arch/powerpc/include/asm/spinlock.h | 31 +++++++++------
>> arch/powerpc/include/asm/spinlock_types.h | 4 ++
>> arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++
>> 4 files changed, 147 insertions(+), 13 deletions(-)
>> create mode 100644 arch/powerpc/include/asm/qspinlock.h
>>
>> diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
>> new file mode 100644
>> index 0000000..4c89256
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/qspinlock.h
>> @@ -0,0 +1,66 @@
>> +#ifndef _ASM_POWERPC_QSPINLOCK_H
>> +#define _ASM_POWERPC_QSPINLOCK_H
>> +
>> +#include <asm-generic/qspinlock_types.h>
>> +
>> +#define SPIN_THRESHOLD (1 << 15)
>> +#define queued_spin_unlock queued_spin_unlock
>> +#define queued_spin_is_locked queued_spin_is_locked
>> +#define queued_spin_unlock_wait queued_spin_unlock_wait
>> +
>> +extern void queued_spin_unlock_wait(struct qspinlock *lock);
>> +
>> +static inline u8 *__qspinlock_lock_byte(struct qspinlock *lock)
>> +{
>> + return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
>> +}
>> +
>> +static inline void queued_spin_unlock(struct qspinlock *lock)
>> +{
>> + /* release semantics is required */
>> + smp_store_release(__qspinlock_lock_byte(lock), 0);
>> +}
>> +
>> +static inline int queued_spin_is_locked(struct qspinlock *lock)
>> +{
>> + smp_mb();
>> + return atomic_read(&lock->val);
>> +}
>> +
>> +#include <asm-generic/qspinlock.h>
>> +
>> +/* we need override it as ppc has io_sync stuff */
>> +#undef arch_spin_trylock
>> +#undef arch_spin_lock
>> +#undef arch_spin_lock_flags
>> +#undef arch_spin_unlock
>> +#define arch_spin_trylock arch_spin_trylock
>> +#define arch_spin_lock arch_spin_lock
>> +#define arch_spin_lock_flags arch_spin_lock_flags
>> +#define arch_spin_unlock arch_spin_unlock
>> +
>> +static inline int arch_spin_trylock(arch_spinlock_t *lock)
>> +{
>> + CLEAR_IO_SYNC;
>> + return queued_spin_trylock(lock);
>> +}
>> +
>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> + CLEAR_IO_SYNC;
>> + queued_spin_lock(lock);
>> +}
>> +
>> +static inline
>> +void arch_spin_lock_flags(arch_spinlock_t *lock, unsigned long flags)
>> +{
>> + CLEAR_IO_SYNC;
>> + queued_spin_lock(lock);
>> +}
>> +
>> +static inline void arch_spin_unlock(arch_spinlock_t *lock)
>> +{
>> + SYNC_IO;
>> + queued_spin_unlock(lock);
>> +}
>> +#endif /* _ASM_POWERPC_QSPINLOCK_H */
>> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
>> index 8c1b913..954099e 100644
>> --- a/arch/powerpc/include/asm/spinlock.h
>> +++ b/arch/powerpc/include/asm/spinlock.h
>> @@ -60,6 +60,23 @@ static inline bool vcpu_is_preempted(int cpu)
>> }
>> #endif
>>
>> +#if defined(CONFIG_PPC_SPLPAR)
>> +/* We only yield to the hypervisor if we are in shared processor mode */
>> +#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
>> +extern void __spin_yield(arch_spinlock_t *lock);
>> +extern void __rw_yield(arch_rwlock_t *lock);
>> +#else /* SPLPAR */
>> +#define __spin_yield(x) barrier()
>> +#define __rw_yield(x) barrier()
>> +#define SHARED_PROCESSOR 0
>> +#endif
>> +
>> +#ifdef CONFIG_QUEUED_SPINLOCKS
>> +#include <asm/qspinlock.h>
>> +#else
>> +
>> +#define arch_spin_relax(lock) __spin_yield(lock)
>> +
>> static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>> {
>> return lock.slock == 0;
>> @@ -114,18 +131,6 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
>> * held. Conveniently, we have a word in the paca that holds this
>> * value.
>> */
>> -
>> -#if defined(CONFIG_PPC_SPLPAR)
>> -/* We only yield to the hypervisor if we are in shared processor mode */
>> -#define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
>> -extern void __spin_yield(arch_spinlock_t *lock);
>> -extern void __rw_yield(arch_rwlock_t *lock);
>> -#else /* SPLPAR */
>> -#define __spin_yield(x) barrier()
>> -#define __rw_yield(x) barrier()
>> -#define SHARED_PROCESSOR 0
>> -#endif
>> -
>> static inline void arch_spin_lock(arch_spinlock_t *lock)
>> {
>> CLEAR_IO_SYNC;
>> @@ -203,6 +208,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
>> smp_mb();
>> }
>>
>> +#endif /* !CONFIG_QUEUED_SPINLOCKS */
>> /*
>> * Read-write spinlocks, allowing multiple readers
>> * but only one writer.
>> @@ -338,7 +344,6 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
>> #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
>> #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
>>
>> -#define arch_spin_relax(lock) __spin_yield(lock)
>> #define arch_read_relax(lock) __rw_yield(lock)
>> #define arch_write_relax(lock) __rw_yield(lock)
>>
>> diff --git a/arch/powerpc/include/asm/spinlock_types.h b/arch/powerpc/include/asm/spinlock_types.h
>> index 2351adc..bd7144e 100644
>> --- a/arch/powerpc/include/asm/spinlock_types.h
>> +++ b/arch/powerpc/include/asm/spinlock_types.h
>> @@ -5,11 +5,15 @@
>> # error "please don't include this file directly"
>> #endif
>>
>> +#ifdef CONFIG_QUEUED_SPINLOCKS
>> +#include <asm-generic/qspinlock_types.h>
>> +#else
>> typedef struct {
>> volatile unsigned int slock;
>> } arch_spinlock_t;
>>
>> #define __ARCH_SPIN_LOCK_UNLOCKED { 0 }
>> +#endif
>>
>> typedef struct {
>> volatile signed int lock;
>> diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
>> index b7b1237..6574626 100644
>> --- a/arch/powerpc/lib/locks.c
>> +++ b/arch/powerpc/lib/locks.c
>> @@ -23,6 +23,7 @@
>> #include <asm/hvcall.h>
>> #include <asm/smp.h>
>>
>> +#ifndef CONFIG_QUEUED_SPINLOCKS
>> void __spin_yield(arch_spinlock_t *lock)
>> {
>> unsigned int lock_value, holder_cpu, yield_count;
>> @@ -42,6 +43,7 @@ void __spin_yield(arch_spinlock_t *lock)
>> get_hard_smp_processor_id(holder_cpu), yield_count);
>> }
>> EXPORT_SYMBOL_GPL(__spin_yield);
>> +#endif
>>
>> /*
>> * Waiting for a read lock or a write lock on a rwlock...
>> @@ -68,3 +70,60 @@ void __rw_yield(arch_rwlock_t *rw)
>> get_hard_smp_processor_id(holder_cpu), yield_count);
>> }
>> #endif
>> +
>> +#ifdef CONFIG_QUEUED_SPINLOCKS
>> +/*
>> + * This forbid we load an old value in another LL/SC. Because the SC here force
>> + * another LL/SC repeat. So we guarantee all loads in another LL and SC will
>> + * read correct value.
>> + */
>> +static inline u32 atomic_read_sync(atomic_t *v)
>> +{
>> + u32 val;
>> +
>> + __asm__ __volatile__(
>> +"1: " PPC_LWARX(%0, 0, %2, 0) "\n"
>> +" stwcx. %0, 0, %2\n"
>> +" bne- 1b\n"
>> + : "=&r" (val), "+m" (*v)
>> + : "r" (v)
>> + : "cr0", "xer");
>> +
>> + return val;
>> +}
>> +
>> +void queued_spin_unlock_wait(struct qspinlock *lock)
>> +{
>> +
>> + u32 val;
>> +
>> + smp_mb();
>> +
>> + /*
>> + * copied from generic queue_spin_unlock_wait with little modification
>> + */
>> + for (;;) {
>> + /* need _sync, as we might race with another LL/SC in lock()*/
>> + val = atomic_read_sync(&lock->val);
>> +
>> + if (!val) /* not locked, we're done */
>> + goto done;
>> +
>> + if (val & _Q_LOCKED_MASK) /* locked, go wait for unlock */
>> + break;
>> +
>> + /* not locked, but pending, wait until we observe the lock */
>> + cpu_relax();
>> + }
>> +
>> + /*
>> + * any unlock is good. And need not _sync, as ->val is set by the SC in
>> + * unlock(), any loads in lock() must see the correct value.
>> + */
>
> I don't think the comment here about _sync is correct. First, not all
> unlock() has a SC part, and for unlock_wait() case there is nothing to
yes, not all unlock has sc, but we are just dealing with sc part or unlock.
> do with whether lock() see the correct value or not. The reason with
yep, but loads can be reorded, so the unlock_wait(SC) vs lock(SC) can forbid
any loads between lock(LL) and lock(SC) reading any too new values.
> _sync is not needed here is:
>
> /*
> * _sync() is not needed here, because once we got here, we must already
> * read the ->val as LOCKED via a _sync(). Combining the smp_mb()
> * before, we guarantee that all the memory accesses before
> * unlock_wait() must be observed by the next lock critical section.
> */
>
good, more clearly. thanks
xinhui
> Regards,
> Boqun
>
>> + while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
>> + cpu_relax();
>> +done:
>> + smp_mb();
>> +}
>> +EXPORT_SYMBOL(queued_spin_unlock_wait);
>> +#endif
>> --
>> 2.4.11
>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v8 3/6] powerpc: lib/locks.c: Add cpu yield/wake helper function
From: Boqun Feng @ 2016-12-06 1:23 UTC (permalink / raw)
To: Pan Xinhui
Cc: peterz, benh, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <1480951166-44830-4-git-send-email-xinhui.pan@linux.vnet.ibm.com>
[-- Attachment #1.1: Type: text/plain, Size: 4246 bytes --]
On Mon, Dec 05, 2016 at 10:19:23AM -0500, Pan Xinhui wrote:
> Add two corresponding helper functions to support pv-qspinlock.
>
> For normal use, __spin_yield_cpu will confer current vcpu slices to the
> target vcpu(say, a lock holder). If target vcpu is not specified or it
> is in running state, such conferging to lpar happens or not depends.
>
> Because hcall itself will introduce latency and a little overhead. And we
> do NOT want to suffer any latency on some cases, e.g. in interrupt handler.
> The second parameter *confer* can indicate such case.
>
> __spin_wake_cpu is simpiler, it will wake up one vcpu regardless of its
> current vcpu state.
>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/spinlock.h | 4 +++
> arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 63 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 954099e..6426bd5 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -64,9 +64,13 @@ static inline bool vcpu_is_preempted(int cpu)
> /* We only yield to the hypervisor if we are in shared processor mode */
> #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
> extern void __spin_yield(arch_spinlock_t *lock);
> +extern void __spin_yield_cpu(int cpu, int confer);
> +extern void __spin_wake_cpu(int cpu);
> extern void __rw_yield(arch_rwlock_t *lock);
> #else /* SPLPAR */
> #define __spin_yield(x) barrier()
> +#define __spin_yield_cpu(x, y) barrier()
> +#define __spin_wake_cpu(x) barrier()
> #define __rw_yield(x) barrier()
> #define SHARED_PROCESSOR 0
> #endif
> diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
> index 6574626..bd872c9 100644
> --- a/arch/powerpc/lib/locks.c
> +++ b/arch/powerpc/lib/locks.c
> @@ -23,6 +23,65 @@
> #include <asm/hvcall.h>
> #include <asm/smp.h>
>
> +/*
> + * confer our slices to a specified cpu and return. If it is in running state
> + * or cpu is -1, then we will check confer. If confer is NULL, we will return
> + * otherwise we confer our slices to lpar.
> + */
> +void __spin_yield_cpu(int cpu, int confer)
> +{
> + unsigned int holder_cpu = cpu, yield_count;
As I said at:
https://marc.info/?l=linux-kernel&m=147455748619343&w=2
@holder_cpu is not necessary and doesn't help anything.
> +
> + if (cpu == -1)
> + goto yield_to_lpar;
> +
> + BUG_ON(holder_cpu >= nr_cpu_ids);
> + yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
> +
> + /* if cpu is running, confer slices to lpar conditionally*/
> + if ((yield_count & 1) == 0)
> + goto yield_to_lpar;
> +
> + plpar_hcall_norets(H_CONFER,
> + get_hard_smp_processor_id(holder_cpu), yield_count);
> + return;
> +
> +yield_to_lpar:
> + if (confer)
> + plpar_hcall_norets(H_CONFER, -1, 0);
> +}
> +EXPORT_SYMBOL_GPL(__spin_yield_cpu);
> +
> +void __spin_wake_cpu(int cpu)
> +{
> + unsigned int holder_cpu = cpu;
And it's even wrong to call the parameter of _wake_cpu() a holder_cpu,
because it's not the current lock holder.
Regards,
Boqun
> +
> + BUG_ON(holder_cpu >= nr_cpu_ids);
> + /*
> + * NOTE: we should always do this hcall regardless of
> + * the yield_count of the holder_cpu.
> + * as thers might be a case like below;
> + * CPU 1 CPU 2
> + * yielded = true
> + * if (yielded)
> + * __spin_wake_cpu()
> + * __spin_yield_cpu()
> + *
> + * So we might lose a wake if we check the yield_count and
> + * return directly if the holder_cpu is running.
> + * IOW. do NOT code like below.
> + * yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
> + * if ((yield_count & 1) == 0)
> + * return;
> + *
> + * a PROD hcall marks the target_cpu proded, which cause the next cede
> + * or confer called on the target_cpu invalid.
> + */
> + plpar_hcall_norets(H_PROD,
> + get_hard_smp_processor_id(holder_cpu));
> +}
> +EXPORT_SYMBOL_GPL(__spin_wake_cpu);
> +
> #ifndef CONFIG_QUEUED_SPINLOCKS
> void __spin_yield(arch_spinlock_t *lock)
> {
> --
> 2.4.11
>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config
From: Pan Xinhui @ 2016-12-06 1:24 UTC (permalink / raw)
To: Boqun Feng, Pan Xinhui
Cc: peterz, benh, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <20161206005834.GC18164@tardis.cn.ibm.com>
在 2016/12/6 08:58, Boqun Feng 写道:
> On Mon, Dec 05, 2016 at 10:19:22AM -0500, Pan Xinhui wrote:
>> pSeries/powerNV will use qspinlock from now on.
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/platforms/pseries/Kconfig | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
>> index bec90fb..8a87d06 100644
>> --- a/arch/powerpc/platforms/pseries/Kconfig
>> +++ b/arch/powerpc/platforms/pseries/Kconfig
>
> Why here? Not arch/powerpc/platforms/Kconfig?
>
>> @@ -23,6 +23,14 @@ config PPC_PSERIES
>> select PPC_DOORBELL
>> default y
>>
>> +config ARCH_USE_QUEUED_SPINLOCKS
>> + default y
>> + bool "Enable qspinlock"
>
> I think you just enable qspinlock by default for all PPC platforms. I
> guess you need to put
>
> depends on PPC_PSERIES || PPC_POWERNV
>
> here to achieve what you mean in you commit message.
>
yes, another good way.
I prefer to put it in pseries/Kconfig as same as pv-qspinlocks config.
when we build nv, it still include pSeries's config anyway.
thanks
xinhui
> Regards,
> Boqun
>
>> + help
>> + Enabling this option will let kernel use qspinlock which is a kind of
>> + fairlock. It has shown a good performance improvement on x86 and also ppc
>> + especially in high contention cases.
>> +
>> config PPC_SPLPAR
>> depends on PPC_PSERIES
>> bool "Support for shared-processor logical partitions"
>> --
>> 2.4.11
>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v8 3/6] powerpc: lib/locks.c: Add cpu yield/wake helper function
From: Pan Xinhui @ 2016-12-06 1:30 UTC (permalink / raw)
To: Boqun Feng, Pan Xinhui
Cc: peterz, benh, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <20161206012315.GD18164@tardis.cn.ibm.com>
在 2016/12/6 09:23, Boqun Feng 写道:
> On Mon, Dec 05, 2016 at 10:19:23AM -0500, Pan Xinhui wrote:
>> Add two corresponding helper functions to support pv-qspinlock.
>>
>> For normal use, __spin_yield_cpu will confer current vcpu slices to the
>> target vcpu(say, a lock holder). If target vcpu is not specified or it
>> is in running state, such conferging to lpar happens or not depends.
>>
>> Because hcall itself will introduce latency and a little overhead. And we
>> do NOT want to suffer any latency on some cases, e.g. in interrupt handler.
>> The second parameter *confer* can indicate such case.
>>
>> __spin_wake_cpu is simpiler, it will wake up one vcpu regardless of its
>> current vcpu state.
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/spinlock.h | 4 +++
>> arch/powerpc/lib/locks.c | 59 +++++++++++++++++++++++++++++++++++++
>> 2 files changed, 63 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
>> index 954099e..6426bd5 100644
>> --- a/arch/powerpc/include/asm/spinlock.h
>> +++ b/arch/powerpc/include/asm/spinlock.h
>> @@ -64,9 +64,13 @@ static inline bool vcpu_is_preempted(int cpu)
>> /* We only yield to the hypervisor if we are in shared processor mode */
>> #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
>> extern void __spin_yield(arch_spinlock_t *lock);
>> +extern void __spin_yield_cpu(int cpu, int confer);
>> +extern void __spin_wake_cpu(int cpu);
>> extern void __rw_yield(arch_rwlock_t *lock);
>> #else /* SPLPAR */
>> #define __spin_yield(x) barrier()
>> +#define __spin_yield_cpu(x, y) barrier()
>> +#define __spin_wake_cpu(x) barrier()
>> #define __rw_yield(x) barrier()
>> #define SHARED_PROCESSOR 0
>> #endif
>> diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
>> index 6574626..bd872c9 100644
>> --- a/arch/powerpc/lib/locks.c
>> +++ b/arch/powerpc/lib/locks.c
>> @@ -23,6 +23,65 @@
>> #include <asm/hvcall.h>
>> #include <asm/smp.h>
>>
>> +/*
>> + * confer our slices to a specified cpu and return. If it is in running state
>> + * or cpu is -1, then we will check confer. If confer is NULL, we will return
>> + * otherwise we confer our slices to lpar.
>> + */
>> +void __spin_yield_cpu(int cpu, int confer)
>> +{
>> + unsigned int holder_cpu = cpu, yield_count;
>
> As I said at:
>
> https://marc.info/?l=linux-kernel&m=147455748619343&w=2
>
> @holder_cpu is not necessary and doesn't help anything.
>
>> +
>> + if (cpu == -1)
>> + goto yield_to_lpar;
>> +
>> + BUG_ON(holder_cpu >= nr_cpu_ids);
>> + yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
>> +
>> + /* if cpu is running, confer slices to lpar conditionally*/
>> + if ((yield_count & 1) == 0)
>> + goto yield_to_lpar;
>> +
>> + plpar_hcall_norets(H_CONFER,
>> + get_hard_smp_processor_id(holder_cpu), yield_count);
>> + return;
>> +
>> +yield_to_lpar:
>> + if (confer)
>> + plpar_hcall_norets(H_CONFER, -1, 0);
>> +}
>> +EXPORT_SYMBOL_GPL(__spin_yield_cpu);
>> +
>> +void __spin_wake_cpu(int cpu)
>> +{
>> + unsigned int holder_cpu = cpu;
>
> And it's even wrong to call the parameter of _wake_cpu() a holder_cpu,
> because it's not the current lock holder.
>
oh, its name is really misleading.
thanks
> Regards,
> Boqun
>
>> +
>> + BUG_ON(holder_cpu >= nr_cpu_ids);
>> + /*
>> + * NOTE: we should always do this hcall regardless of
>> + * the yield_count of the holder_cpu.
>> + * as thers might be a case like below;
>> + * CPU 1 CPU 2
>> + * yielded = true
>> + * if (yielded)
>> + * __spin_wake_cpu()
>> + * __spin_yield_cpu()
>> + *
>> + * So we might lose a wake if we check the yield_count and
>> + * return directly if the holder_cpu is running.
>> + * IOW. do NOT code like below.
>> + * yield_count = be32_to_cpu(lppaca_of(holder_cpu).yield_count);
>> + * if ((yield_count & 1) == 0)
>> + * return;
>> + *
>> + * a PROD hcall marks the target_cpu proded, which cause the next cede
>> + * or confer called on the target_cpu invalid.
>> + */
>> + plpar_hcall_norets(H_PROD,
>> + get_hard_smp_processor_id(holder_cpu));
>> +}
>> +EXPORT_SYMBOL_GPL(__spin_wake_cpu);
>> +
>> #ifndef CONFIG_QUEUED_SPINLOCKS
>> void __spin_yield(arch_spinlock_t *lock)
>> {
>> --
>> 2.4.11
>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
From: Andy Lutomirski @ 2016-12-06 2:10 UTC (permalink / raw)
To: netdev
Cc: Michael S . Tsirkin, linux-kernel, virtualization,
Andy Lutomirski, Laura Abbott
With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
pointer to the stack and it will OOPS. Copy the address to the heap
to prevent the crash.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: zbyszek@in.waw.pl
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
Very lightly tested.
drivers/net/virtio_net.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7276d5a95bd0..cbf1c613c67a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -969,12 +969,17 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
struct virtnet_info *vi = netdev_priv(dev);
struct virtio_device *vdev = vi->vdev;
int ret;
- struct sockaddr *addr = p;
+ struct sockaddr *addr;
struct scatterlist sg;
- ret = eth_prepare_mac_addr_change(dev, p);
+ addr = kmalloc(sizeof(*addr), GFP_KERNEL);
+ if (!addr)
+ return -ENOMEM;
+ memcpy(addr, p, sizeof(*addr));
+
+ ret = eth_prepare_mac_addr_change(dev, addr);
if (ret)
- return ret;
+ goto out;
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
sg_init_one(&sg, addr->sa_data, dev->addr_len);
@@ -982,7 +987,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
dev_warn(&vdev->dev,
"Failed to set mac address by vq command.\n");
- return -EINVAL;
+ ret = -EINVAL;
+ goto out;
}
} else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
!virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
@@ -996,8 +1002,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
}
eth_commit_mac_addr_change(dev, p);
+ ret = 0;
- return 0;
+out:
+ kfree(addr);
+ return ret;
}
static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
--
2.9.3
^ permalink raw reply related
* Re: [PATCH v8 2/6] powerpc: pSeries/Kconfig: Add qspinlock build config
From: Pan Xinhui @ 2016-12-06 2:12 UTC (permalink / raw)
To: Boqun Feng, Pan Xinhui
Cc: peterz, benh, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, paulmck, linuxppc-dev
In-Reply-To: <2d4eb891-e999-b7f8-52ff-31ef7bb8af6b@linux.vnet.ibm.com>
在 2016/12/6 09:24, Pan Xinhui 写道:
>
>
> 在 2016/12/6 08:58, Boqun Feng 写道:
>> On Mon, Dec 05, 2016 at 10:19:22AM -0500, Pan Xinhui wrote:
>>> pSeries/powerNV will use qspinlock from now on.
>>>
>>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>>> ---
>>> arch/powerpc/platforms/pseries/Kconfig | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
>>> index bec90fb..8a87d06 100644
>>> --- a/arch/powerpc/platforms/pseries/Kconfig
>>> +++ b/arch/powerpc/platforms/pseries/Kconfig
>>
>> Why here? Not arch/powerpc/platforms/Kconfig?
>>
>>> @@ -23,6 +23,14 @@ config PPC_PSERIES
>>> select PPC_DOORBELL
>>> default y
>>>
>>> +config ARCH_USE_QUEUED_SPINLOCKS
>>> + default y
>>> + bool "Enable qspinlock"
>>
>> I think you just enable qspinlock by default for all PPC platforms. I
>> guess you need to put
>>
>> depends on PPC_PSERIES || PPC_POWERNV
>>
>> here to achieve what you mean in you commit message.
>>
oh, yes, need depends on PPC_PSERIES || PPC_POWERNV.
> yes, another good way.
> I prefer to put it in pseries/Kconfig as same as pv-qspinlocks config.
> when we build nv, it still include pSeries's config anyway.
>
> thanks
> xinhui
>
>> Regards,
>> Boqun
>>
>>> + help
>>> + Enabling this option will let kernel use qspinlock which is a kind of
>>> + fairlock. It has shown a good performance improvement on x86 and also ppc
>>> + especially in high contention cases.
>>> +
>>> config PPC_SPLPAR
>>> depends on PPC_PSERIES
>>> bool "Support for shared-processor logical partitions"
>>> --
>>> 2.4.11
>>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
From: Jason Wang @ 2016-12-06 2:30 UTC (permalink / raw)
To: Andy Lutomirski, netdev
Cc: Laura Abbott, Michael S . Tsirkin, linux-kernel, virtualization
In-Reply-To: <fe889e578d5dffa9ae0834b449a35fcfd1e10694.1480990173.git.luto@kernel.org>
On 2016年12月06日 10:10, Andy Lutomirski wrote:
> With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
> pointer to the stack and it will OOPS. Copy the address to the heap
> to prevent the crash.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Reported-by: zbyszek@in.waw.pl
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>
> Very lightly tested.
>
> drivers/net/virtio_net.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7276d5a95bd0..cbf1c613c67a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -969,12 +969,17 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> struct virtnet_info *vi = netdev_priv(dev);
> struct virtio_device *vdev = vi->vdev;
> int ret;
> - struct sockaddr *addr = p;
> + struct sockaddr *addr;
> struct scatterlist sg;
>
> - ret = eth_prepare_mac_addr_change(dev, p);
> + addr = kmalloc(sizeof(*addr), GFP_KERNEL);
> + if (!addr)
> + return -ENOMEM;
> + memcpy(addr, p, sizeof(*addr));
> +
> + ret = eth_prepare_mac_addr_change(dev, addr);
> if (ret)
> - return ret;
> + goto out;
>
> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
> sg_init_one(&sg, addr->sa_data, dev->addr_len);
> @@ -982,7 +987,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
> dev_warn(&vdev->dev,
> "Failed to set mac address by vq command.\n");
> - return -EINVAL;
> + ret = -EINVAL;
> + goto out;
> }
> } else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
> !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> @@ -996,8 +1002,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> }
>
> eth_commit_mac_addr_change(dev, p);
> + ret = 0;
>
> - return 0;
> +out:
> + kfree(addr);
> + return ret;
> }
>
> static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
Acked-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
From: Michael S. Tsirkin @ 2016-12-06 3:00 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: netdev, Laura Abbott, linux-kernel, virtualization
In-Reply-To: <fe889e578d5dffa9ae0834b449a35fcfd1e10694.1480990173.git.luto@kernel.org>
On Mon, Dec 05, 2016 at 06:10:58PM -0800, Andy Lutomirski wrote:
> With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
> pointer to the stack and it will OOPS. Copy the address to the heap
> to prevent the crash.
>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Reported-by: zbyszek@in.waw.pl
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Very lightly tested.
>
> drivers/net/virtio_net.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7276d5a95bd0..cbf1c613c67a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -969,12 +969,17 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> struct virtnet_info *vi = netdev_priv(dev);
> struct virtio_device *vdev = vi->vdev;
> int ret;
> - struct sockaddr *addr = p;
> + struct sockaddr *addr;
> struct scatterlist sg;
>
> - ret = eth_prepare_mac_addr_change(dev, p);
> + addr = kmalloc(sizeof(*addr), GFP_KERNEL);
> + if (!addr)
> + return -ENOMEM;
> + memcpy(addr, p, sizeof(*addr));
> +
> + ret = eth_prepare_mac_addr_change(dev, addr);
> if (ret)
> - return ret;
> + goto out;
>
> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
> sg_init_one(&sg, addr->sa_data, dev->addr_len);
> @@ -982,7 +987,8 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
> dev_warn(&vdev->dev,
> "Failed to set mac address by vq command.\n");
> - return -EINVAL;
> + ret = -EINVAL;
> + goto out;
> }
> } else if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
> !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> @@ -996,8 +1002,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
> }
>
> eth_commit_mac_addr_change(dev, p);
> + ret = 0;
>
> - return 0;
> +out:
> + kfree(addr);
> + return ret;
> }
>
> static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
> --
> 2.9.3
^ permalink raw reply
* RE: [PATCH kernel v5 5/5] virtio-balloon: tell host vm's unused page info
From: Li, Liang Z @ 2016-12-06 4:47 UTC (permalink / raw)
To: Hansen, Dave, kvm@vger.kernel.org
Cc: virtio-dev@lists.oasis-open.org, mhocko@suse.com, Amit Shah,
mst@redhat.com, qemu-devel@nongnu.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kirill.shutemov@linux.intel.com, pbonzini@redhat.com,
akpm@linux-foundation.org,
virtualization@lists.linux-foundation.org, Mel Gorman,
dgilbert@redhat.com
In-Reply-To: <70ece7a5-348b-2eb9-c40a-f21b08df042c@intel.com>
> >>> + mutex_lock(&vb->balloon_lock);
> >>> +
> >>> + for (order = MAX_ORDER - 1; order >= 0; order--) {
> >>
> >> I scratched my head for a bit on this one. Why are you walking over
> >> orders,
> >> *then* zones. I *think* you're doing it because you can efficiently
> >> fill the bitmaps at a given order for all zones, then move to a new
> >> bitmap. But, it would be interesting to document this.
> >
> > Yes, use the order is somewhat strange, but it's helpful to keep the API simple.
> > Do you think it's acceptable?
>
> Yeah, it's fine. Just comment it, please.
>
Good!
> >>> + if (ret == -ENOSPC) {
> >>> + void *new_resp_data;
> >>> +
> >>> + new_resp_data = kmalloc(2 * vb->resp_buf_size,
> >>> + GFP_KERNEL);
> >>> + if (new_resp_data) {
> >>> + kfree(vb->resp_data);
> >>> + vb->resp_data = new_resp_data;
> >>> + vb->resp_buf_size *= 2;
> >>
> >> What happens to the data in ->resp_data at this point? Doesn't this
> >> just throw it away?
> >
> > Yes, so we should make sure the data in resp_data is not inuse.
>
> But doesn't it have valid data that we just collected and haven't told the
> hypervisor about yet? Aren't we throwing away good data that cost us
> something to collect?
Indeed. Some filled data may exist for the previous zone. Should we
change the API to
'int get_unused_pages(unsigned long *unused_pages, unsigned long size,
int order, unsigned long *pos, struct zone *zone)' ?
then we can use the 'zone' to record the zone to retry and not discard the
filled data.
> >> ...
> >>> +struct page_info_item {
> >>> + __le64 start_pfn : 52; /* start pfn for the bitmap */
> >>> + __le64 page_shift : 6; /* page shift width, in bytes */
>
> What does a page_shift "in bytes" mean? :)
Obviously, you know. :o
I will try to make it clear.
>
> >>> + __le64 bmap_len : 6; /* bitmap length, in bytes */ };
> >>
> >> Is 'bmap_len' too short? a 64-byte buffer is a bit tiny. Right?
> >
> > Currently, we just use the 8 bytes and 0 bytes bitmap, should we support
> more than 64 bytes?
>
> It just means that with this format, you end up wasting at least ~1/8th of the
> space with metadata. That's a bit unfortunate, but I guess it's not fatal.
>
> I'd definitely call it out in the patch description and make sure other folks take
> a look at it.
OK.
>
> There's a somewhat easy fix, but that would make the qemu implementation
> more complicated: You could just have bmap_len==0x3f imply that there's
> another field that contains an extended bitmap length for when you need long
> bitmaps.
>
> But, as you note, there's no need for it, so it's a matter of trading the extra
> complexity versus the desire to not habing to change the ABI again for longer
> (hopefully).
>
Your suggestion still works without changing the current code, just reserve
' bmap_len==0x3f' for future extension, and it's not used by the current code.
> >>> +static int mark_unused_pages(struct zone *zone,
> >>> + unsigned long *unused_pages, unsigned long size,
> >>> + int order, unsigned long *pos)
> >>> +{
> >>> + unsigned long pfn, flags;
> >>> + unsigned int t;
> >>> + struct list_head *curr;
> >>> + struct page_info_item *info;
> >>> +
> >>> + if (zone_is_empty(zone))
> >>> + return 0;
> >>> +
> >>> + spin_lock_irqsave(&zone->lock, flags);
> >>> +
> >>> + if (*pos + zone->free_area[order].nr_free > size)
> >>> + return -ENOSPC;
> >>
> >> Urg, so this won't partially fill? So, what the nr_free pages limit
> >> where we no longer fit in the kmalloc()'d buffer where this simply won't
> work?
> >
> > Yes. My initial implementation is partially fill, it's better for the worst case.
> > I thought the above code is more efficient for most case ...
> > Do you think partially fill the bitmap is better?
>
> Could you please answer the question I asked?
>
For your question:
-------------------------------------------------------------------------------------------------------
>So, what the nr_free pages limit where we no longer fit in the kmalloc()'d buffer
> where this simply won't work?
------------------------------------------------------------------------------------------------------
No, if the buffer is not big enough to save 'nr_free' pages, get_unused_pages() will return
'-ENOSPC', and the following code will try to allocate a 2x times size buffer for retrying,
until the proper size buffer is allocated. The current order will not be skipped unless the
buffer allocation failed.
> Because if you don't get this right, it could mean that there are system that
> simply *fail* here.
^ permalink raw reply
* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
From: David Hildenbrand @ 2016-12-06 8:40 UTC (permalink / raw)
To: Liang Li, kvm
Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
dave.hansen, dgilbert, pbonzini, akpm, virtualization,
kirill.shutemov
In-Reply-To: <1480495397-23225-1-git-send-email-liang.z.li@intel.com>
Am 30.11.2016 um 09:43 schrieb Liang Li:
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process,
> the main idea of this optimization is to use bitmap to send the page
> information to host instead of the PFNs, to reduce the overhead of
> virtio data transmission, address translation and madvise(). This can
> help to improve the performance by about 85%.
Do you have some statistics/some rough feeling how many consecutive bits
are usually set in the bitmaps? Is it really just purely random or is
there some granularity that is usually consecutive?
IOW in real examples, do we have really large consecutive areas or are
all pages just completely distributed over our memory?
Thanks!
--
David
^ permalink raw reply
* RE: [PATCH v5 1/1] crypto: add virtio-crypto driver
From: Gonglei (Arei) @ 2016-12-06 9:01 UTC (permalink / raw)
To: Gonglei (Arei), linux-kernel@vger.kernel.org,
qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org,
linux-crypto@vger.kernel.org
Cc: Huangweidong (C), Claudio Fontana, mst@redhat.com, Luonengjun,
Hanweidong (Randy), Xuquan (Quan Xu), Wanzongshun (Vincent),
stefanha@redhat.com, Zhoujian (jay, Euler), longpeng,
arei.gonglei@hotmail.com, davem@davemloft.net, Wubin (H),
herbert@gondor.apana.org.au
In-Reply-To: <1480595945-63656-2-git-send-email-arei.gonglei@huawei.com>
Hi Herbert,
Would you please review and/or ack the virtio_crypto_algs.c?
It is the realization of specified algs based on Linux Crypto Framework.
Thanks!
Regards,
-Gonglei
> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Thursday, December 01, 2016 8:39 PM
> To: linux-kernel@vger.kernel.org; qemu-devel@nongnu.org;
> virtio-dev@lists.oasis-open.org; virtualization@lists.linux-foundation.org;
> linux-crypto@vger.kernel.org
> Cc: Luonengjun; mst@redhat.com; stefanha@redhat.com; Huangweidong (C);
> Wubin (H); xin.zeng@intel.com; Claudio Fontana;
> herbert@gondor.apana.org.au; pasic@linux.vnet.ibm.com;
> davem@davemloft.net; Zhoujian (jay, Euler); Hanweidong (Randy);
> arei.gonglei@hotmail.com; cornelia.huck@de.ibm.com; Xuquan (Quan Xu);
> longpeng; Wanzongshun (Vincent); Gonglei (Arei)
> Subject: [PATCH v5 1/1] crypto: add virtio-crypto driver
>
> This patch introduces virtio-crypto driver for Linux Kernel.
>
> The virtio crypto device is a virtual cryptography device
> as well as a kind of virtual hardware accelerator for
> virtual machines. The encryption anddecryption requests
> are placed in the data queue and are ultimately handled by
> thebackend crypto accelerators. The second queue is the
> control queue used to create or destroy sessions for
> symmetric algorithms and will control some advanced features
> in the future. The virtio crypto device provides the following
> cryptoservices: CIPHER, MAC, HASH, and AEAD.
>
> For more information about virtio-crypto device, please see:
> http://qemu-project.org/Features/VirtioCrypto
>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Cornelia Huck <cornelia.huck@de.ibm.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Herbert Xu <herbert@gondor.apana.org.au>
> CC: Halil Pasic <pasic@linux.vnet.ibm.com>
> CC: David S. Miller <davem@davemloft.net>
> CC: Zeng Xin <xin.zeng@intel.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
> MAINTAINERS | 9 +
> drivers/crypto/Kconfig | 2 +
> drivers/crypto/Makefile | 1 +
> drivers/crypto/virtio/Kconfig | 10 +
> drivers/crypto/virtio/Makefile | 5 +
> drivers/crypto/virtio/virtio_crypto_algs.c | 537
> +++++++++++++++++++++++++++
> drivers/crypto/virtio/virtio_crypto_common.h | 122 ++++++
> drivers/crypto/virtio/virtio_crypto_core.c | 464
> +++++++++++++++++++++++
> drivers/crypto/virtio/virtio_crypto_mgr.c | 264 +++++++++++++
> include/uapi/linux/Kbuild | 1 +
> include/uapi/linux/virtio_crypto.h | 450
> ++++++++++++++++++++++
> include/uapi/linux/virtio_ids.h | 1 +
> 12 files changed, 1866 insertions(+)
> create mode 100644 drivers/crypto/virtio/Kconfig
> create mode 100644 drivers/crypto/virtio/Makefile
> create mode 100644 drivers/crypto/virtio/virtio_crypto_algs.c
> create mode 100644 drivers/crypto/virtio/virtio_crypto_common.h
> create mode 100644 drivers/crypto/virtio/virtio_crypto_core.c
> create mode 100644 drivers/crypto/virtio/virtio_crypto_mgr.c
> create mode 100644 include/uapi/linux/virtio_crypto.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ad9b965..cccaaf0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12810,6 +12810,7 @@ F: drivers/net/virtio_net.c
> F: drivers/block/virtio_blk.c
> F: include/linux/virtio_*.h
> F: include/uapi/linux/virtio_*.h
> +F: drivers/crypto/virtio/
>
> VIRTIO DRIVERS FOR S390
> M: Christian Borntraeger <borntraeger@de.ibm.com>
> @@ -12846,6 +12847,14 @@ S: Maintained
> F: drivers/virtio/virtio_input.c
> F: include/uapi/linux/virtio_input.h
>
> +VIRTIO CRYPTO DRIVER
> +M: Gonglei <arei.gonglei@huawei.com>
> +L: virtualization@lists.linux-foundation.org
> +L: linux-crypto@vger.kernel.org
> +S: Maintained
> +F: drivers/crypto/virtio/
> +F: include/uapi/linux/virtio_crypto.h
> +
> VIA RHINE NETWORK DRIVER
> S: Orphan
> F: drivers/net/ethernet/via/via-rhine.c
> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
> index 4d2b81f..7956478 100644
> --- a/drivers/crypto/Kconfig
> +++ b/drivers/crypto/Kconfig
> @@ -555,4 +555,6 @@ config CRYPTO_DEV_ROCKCHIP
>
> source "drivers/crypto/chelsio/Kconfig"
>
> +source "drivers/crypto/virtio/Kconfig"
> +
> endif # CRYPTO_HW
> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
> index ad7250f..bc53cb8 100644
> --- a/drivers/crypto/Makefile
> +++ b/drivers/crypto/Makefile
> @@ -32,3 +32,4 @@ obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
> obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
> obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
> obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
> +obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio/
> diff --git a/drivers/crypto/virtio/Kconfig b/drivers/crypto/virtio/Kconfig
> new file mode 100644
> index 0000000..d80f733
> --- /dev/null
> +++ b/drivers/crypto/virtio/Kconfig
> @@ -0,0 +1,10 @@
> +config CRYPTO_DEV_VIRTIO
> + tristate "VirtIO crypto driver"
> + depends on VIRTIO
> + select CRYPTO_AEAD
> + select CRYPTO_AUTHENC
> + select CRYPTO_BLKCIPHER
> + default m
> + help
> + This driver provides support for virtio crypto device. If you
> + choose 'M' here, this module will be called virtio_crypto.
> diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
> new file mode 100644
> index 0000000..dd342c9
> --- /dev/null
> +++ b/drivers/crypto/virtio/Makefile
> @@ -0,0 +1,5 @@
> +obj-$(CONFIG_CRYPTO_DEV_VIRTIO) += virtio_crypto.o
> +virtio_crypto-objs := \
> + virtio_crypto_algs.o \
> + virtio_crypto_mgr.o \
> + virtio_crypto_core.o
> diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c
> b/drivers/crypto/virtio/virtio_crypto_algs.c
> new file mode 100644
> index 0000000..5100056
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_algs.c
> @@ -0,0 +1,537 @@
> + /* Algorithms supported by virtio crypto device
> + *
> + * Authors: Gonglei <arei.gonglei@huawei.com>
> + *
> + * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/scatterlist.h>
> +#include <crypto/algapi.h>
> +#include <linux/err.h>
> +#include <crypto/scatterwalk.h>
> +#include <linux/atomic.h>
> +
> +#include <uapi/linux/virtio_crypto.h>
> +#include "virtio_crypto_common.h"
> +
> +/*
> + * The algs_lock protects the below global virtio_crypto_active_devs
> + * and crypto algorithms registion.
> + */
> +static DEFINE_MUTEX(algs_lock);
> +static unsigned int virtio_crypto_active_devs;
> +
> +static u64 virtio_crypto_alg_sg_nents_length(struct scatterlist *sg)
> +{
> + u64 total = 0;
> +
> + for (total = 0; sg; sg = sg_next(sg))
> + total += sg->length;
> +
> + return total;
> +}
> +
> +static int
> +virtio_crypto_alg_validate_key(int key_len, uint32_t *alg)
> +{
> + switch (key_len) {
> + case AES_KEYSIZE_128:
> + case AES_KEYSIZE_192:
> + case AES_KEYSIZE_256:
> + *alg = VIRTIO_CRYPTO_CIPHER_AES_CBC;
> + break;
> + default:
> + pr_err("virtio_crypto: Unsupported key length: %d\n",
> + key_len);
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +static int virtio_crypto_alg_ablkcipher_init_session(
> + struct virtio_crypto_ablkcipher_ctx *ctx,
> + uint32_t alg, const uint8_t *key,
> + unsigned int keylen,
> + int encrypt)
> +{
> + struct scatterlist outhdr, key_sg, inhdr, *sgs[3];
> + unsigned int tmp;
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> + int op = encrypt ? VIRTIO_CRYPTO_OP_ENCRYPT :
> VIRTIO_CRYPTO_OP_DECRYPT;
> + int err;
> + unsigned int num_out = 0, num_in = 0;
> +
> + /*
> + * Avoid to do DMA from the stack, switch to using
> + * dynamically-allocated for the key
> + */
> + uint8_t *cipher_key = kmalloc(keylen, GFP_ATOMIC);
> +
> + if (!cipher_key)
> + return -ENOMEM;
> +
> + memcpy(cipher_key, key, keylen);
> +
> + spin_lock(&vcrypto->ctrl_lock);
> + /* Pad ctrl header */
> + vcrypto->ctrl.header.opcode =
> + cpu_to_le32(VIRTIO_CRYPTO_CIPHER_CREATE_SESSION);
> + vcrypto->ctrl.header.algo = cpu_to_le32(alg);
> + /* Set the default dataqueue id to 0 */
> + vcrypto->ctrl.header.queue_id = 0;
> +
> + vcrypto->input.status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
> + /* Pad cipher's parameters */
> + vcrypto->ctrl.u.sym_create_session.op_type =
> + cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
> + vcrypto->ctrl.u.sym_create_session.u.cipher.para.algo =
> + vcrypto->ctrl.header.algo;
> + vcrypto->ctrl.u.sym_create_session.u.cipher.para.keylen =
> + cpu_to_le32(keylen);
> + vcrypto->ctrl.u.sym_create_session.u.cipher.para.op =
> + cpu_to_le32(op);
> +
> + sg_init_one(&outhdr, &vcrypto->ctrl, sizeof(vcrypto->ctrl));
> + sgs[num_out++] = &outhdr;
> +
> + /* Set key */
> + sg_init_one(&key_sg, cipher_key, keylen);
> + sgs[num_out++] = &key_sg;
> +
> + /* Return status and session id back */
> + sg_init_one(&inhdr, &vcrypto->input, sizeof(vcrypto->input));
> + sgs[num_out + num_in++] = &inhdr;
> +
> + err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out,
> + num_in, vcrypto, GFP_ATOMIC);
> + if (err < 0) {
> + spin_unlock(&vcrypto->ctrl_lock);
> + kzfree(cipher_key);
> + return err;
> + }
> + virtqueue_kick(vcrypto->ctrl_vq);
> +
> + /*
> + * Trapping into the hypervisor, so the request should be
> + * handled immediately.
> + */
> + while (!virtqueue_get_buf(vcrypto->ctrl_vq, &tmp) &&
> + !virtqueue_is_broken(vcrypto->ctrl_vq))
> + cpu_relax();
> +
> + if (le32_to_cpu(vcrypto->input.status) != VIRTIO_CRYPTO_OK) {
> + spin_unlock(&vcrypto->ctrl_lock);
> + pr_err("virtio_crypto: Create session failed status: %u\n",
> + le32_to_cpu(vcrypto->input.status));
> + kzfree(cipher_key);
> + return -EINVAL;
> + }
> +
> + if (encrypt)
> + ctx->enc_sess_info.session_id =
> + le64_to_cpu(vcrypto->input.session_id);
> + else
> + ctx->dec_sess_info.session_id =
> + le64_to_cpu(vcrypto->input.session_id);
> +
> + spin_unlock(&vcrypto->ctrl_lock);
> +
> + kzfree(cipher_key);
> + return 0;
> +}
> +
> +static int virtio_crypto_alg_ablkcipher_close_session(
> + struct virtio_crypto_ablkcipher_ctx *ctx,
> + int encrypt)
> +{
> + struct scatterlist outhdr, status_sg, *sgs[2];
> + unsigned int tmp;
> + struct virtio_crypto_destroy_session_req *destroy_session;
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> + int err;
> + unsigned int num_out = 0, num_in = 0;
> +
> + spin_lock(&vcrypto->ctrl_lock);
> + vcrypto->ctrl_status.status = VIRTIO_CRYPTO_ERR;
> + /* Pad ctrl header */
> + vcrypto->ctrl.header.opcode =
> + cpu_to_le32(VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION);
> + /* Set the default virtqueue id to 0 */
> + vcrypto->ctrl.header.queue_id = 0;
> +
> + destroy_session = &vcrypto->ctrl.u.destroy_session;
> +
> + if (encrypt)
> + destroy_session->session_id =
> + cpu_to_le64(ctx->enc_sess_info.session_id);
> + else
> + destroy_session->session_id =
> + cpu_to_le64(ctx->dec_sess_info.session_id);
> +
> + sg_init_one(&outhdr, &vcrypto->ctrl, sizeof(vcrypto->ctrl));
> + sgs[num_out++] = &outhdr;
> +
> + /* Return status and session id back */
> + sg_init_one(&status_sg, &vcrypto->ctrl_status.status,
> + sizeof(vcrypto->ctrl_status.status));
> + sgs[num_out + num_in++] = &status_sg;
> +
> + err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out,
> + num_in, vcrypto, GFP_ATOMIC);
> + if (err < 0) {
> + spin_unlock(&vcrypto->ctrl_lock);
> + return err;
> + }
> + virtqueue_kick(vcrypto->ctrl_vq);
> +
> + while (!virtqueue_get_buf(vcrypto->ctrl_vq, &tmp) &&
> + !virtqueue_is_broken(vcrypto->ctrl_vq))
> + cpu_relax();
> +
> + if (vcrypto->ctrl_status.status != VIRTIO_CRYPTO_OK) {
> + spin_unlock(&vcrypto->ctrl_lock);
> + pr_err("virtio_crypto: Close session failed status: %u, session_id:
> 0x%llx\n",
> + vcrypto->ctrl_status.status,
> + destroy_session->session_id);
> +
> + return -EINVAL;
> + }
> + spin_unlock(&vcrypto->ctrl_lock);
> +
> + return 0;
> +}
> +
> +static int virtio_crypto_alg_ablkcipher_init_sessions(
> + struct virtio_crypto_ablkcipher_ctx *ctx,
> + const uint8_t *key, unsigned int keylen)
> +{
> + uint32_t alg;
> + int ret;
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> +
> + if (keylen > vcrypto->max_cipher_key_len) {
> + pr_err("virtio_crypto: the key is too long\n");
> + goto bad_key;
> + }
> +
> + if (virtio_crypto_alg_validate_key(keylen, &alg))
> + goto bad_key;
> +
> + /* Create encryption session */
> + ret = virtio_crypto_alg_ablkcipher_init_session(ctx,
> + alg, key, keylen, 1);
> + if (ret)
> + return ret;
> + /* Create decryption session */
> + ret = virtio_crypto_alg_ablkcipher_init_session(ctx,
> + alg, key, keylen, 0);
> + if (ret) {
> + virtio_crypto_alg_ablkcipher_close_session(ctx, 1);
> + return ret;
> + }
> + return 0;
> +
> +bad_key:
> + crypto_tfm_set_flags(ctx->tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
> + return -EINVAL;
> +}
> +
> +/* Note: kernel crypto API realization */
> +static int virtio_crypto_ablkcipher_setkey(struct crypto_ablkcipher *tfm,
> + const uint8_t *key,
> + unsigned int keylen)
> +{
> + struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(tfm);
> + int ret;
> +
> + if (!ctx->vcrypto) {
> + /* New key */
> + int node = virtio_crypto_get_current_node();
> + struct virtio_crypto *vcrypto =
> + virtcrypto_get_dev_node(node);
> + if (!vcrypto) {
> + pr_err("virtio_crypto: Could not find a virtio device in the
> system");
> + return -ENODEV;
> + }
> +
> + ctx->vcrypto = vcrypto;
> + }
> +
> + ret = virtio_crypto_alg_ablkcipher_init_sessions(ctx, key, keylen);
> + if (ret) {
> + virtcrypto_dev_put(ctx->vcrypto);
> + ctx->vcrypto = NULL;
> +
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int
> +__virtio_crypto_ablkcipher_do_req(struct virtio_crypto_request *vc_req,
> + struct ablkcipher_request *req,
> + struct data_queue *data_vq,
> + __u8 op)
> +{
> + struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
> + unsigned int ivsize = crypto_ablkcipher_ivsize(tfm);
> + struct virtio_crypto_ablkcipher_ctx *ctx = vc_req->ablkcipher_ctx;
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> + struct virtio_crypto_op_data_req *req_data;
> + int src_nents, dst_nents;
> + int err;
> + unsigned long flags;
> + struct scatterlist outhdr, iv_sg, status_sg, **sgs;
> + int i;
> + u64 dst_len;
> + unsigned int num_out = 0, num_in = 0;
> + int sg_total;
> + uint8_t *iv;
> +
> + src_nents = sg_nents_for_len(req->src, req->nbytes);
> + dst_nents = sg_nents(req->dst);
> +
> + pr_debug("virtio_crypto: Number of sgs (src_nents: %d,
> dst_nents: %d)\n",
> + src_nents, dst_nents);
> +
> + /* Why 3? outhdr + iv + inhdr */
> + sg_total = src_nents + dst_nents + 3;
> + sgs = kzalloc_node(sg_total * sizeof(*sgs), GFP_ATOMIC,
> + dev_to_node(&vcrypto->vdev->dev));
> + if (!sgs)
> + return -ENOMEM;
> +
> + req_data = kzalloc_node(sizeof(*req_data), GFP_ATOMIC,
> + dev_to_node(&vcrypto->vdev->dev));
> + if (!req_data) {
> + kfree(sgs);
> + return -ENOMEM;
> + }
> +
> + vc_req->req_data = req_data;
> + vc_req->type = VIRTIO_CRYPTO_SYM_OP_CIPHER;
> + /* Head of operation */
> + if (op) {
> + req_data->header.session_id =
> + cpu_to_le64(ctx->enc_sess_info.session_id);
> + req_data->header.opcode =
> + cpu_to_le32(VIRTIO_CRYPTO_CIPHER_ENCRYPT);
> + } else {
> + req_data->header.session_id =
> + cpu_to_le64(ctx->dec_sess_info.session_id);
> + req_data->header.opcode =
> + cpu_to_le32(VIRTIO_CRYPTO_CIPHER_DECRYPT);
> + }
> + req_data->u.sym_req.op_type =
> cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER);
> + req_data->u.sym_req.u.cipher.para.iv_len = cpu_to_le32(ivsize);
> + req_data->u.sym_req.u.cipher.para.src_data_len =
> + cpu_to_le32(req->nbytes);
> +
> + dst_len = virtio_crypto_alg_sg_nents_length(req->dst);
> + if (unlikely(dst_len > U32_MAX)) {
> + pr_err("virtio_crypto: The dst_len is beyond U32_MAX\n");
> + err = -EINVAL;
> + goto free;
> + }
> +
> + pr_debug("virtio_crypto: src_len: %u, dst_len: %llu\n",
> + req->nbytes, dst_len);
> +
> + if (unlikely(req->nbytes + dst_len + ivsize +
> + sizeof(vc_req->status) > vcrypto->max_size)) {
> + pr_err("virtio_crypto: The length is too big\n");
> + err = -EINVAL;
> + goto free;
> + }
> +
> + req_data->u.sym_req.u.cipher.para.dst_data_len =
> + cpu_to_le32((uint32_t)dst_len);
> +
> + /* Outhdr */
> + sg_init_one(&outhdr, req_data, sizeof(*req_data));
> + sgs[num_out++] = &outhdr;
> +
> + /* IV */
> +
> + /*
> + * Avoid to do DMA from the stack, switch to using
> + * dynamically-allocated for the IV
> + */
> + iv = kzalloc_node(ivsize, GFP_ATOMIC,
> + dev_to_node(&vcrypto->vdev->dev));
> + if (!iv) {
> + err = -ENOMEM;
> + goto free;
> + }
> + memcpy(iv, req->info, ivsize);
> + sg_init_one(&iv_sg, iv, ivsize);
> + sgs[num_out++] = &iv_sg;
> + vc_req->iv = iv;
> +
> + /* Source data */
> + for (i = 0; i < src_nents; i++)
> + sgs[num_out++] = &req->src[i];
> +
> + /* Destination data */
> + for (i = 0; i < dst_nents; i++)
> + sgs[num_out + num_in++] = &req->dst[i];
> +
> + /* Status */
> + sg_init_one(&status_sg, &vc_req->status, sizeof(vc_req->status));
> + sgs[num_out + num_in++] = &status_sg;
> +
> + vc_req->sgs = sgs;
> +
> + spin_lock_irqsave(&vcrypto->lock, flags);
> + err = virtqueue_add_sgs(data_vq->vq, sgs, num_out,
> + num_in, vc_req, GFP_ATOMIC);
> + spin_unlock_irqrestore(&vcrypto->lock, flags);
> + if (unlikely(err < 0))
> + goto free_iv;
> +
> + return 0;
> +
> +free_iv:
> + kzfree(iv);
> +free:
> + kzfree(req_data);
> + kfree(sgs);
> + return err;
> +}
> +
> +static int virtio_crypto_ablkcipher_encrypt(struct ablkcipher_request *req)
> +{
> + struct crypto_ablkcipher *atfm = crypto_ablkcipher_reqtfm(req);
> + struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(atfm);
> + struct virtio_crypto_request *vc_req = ablkcipher_request_ctx(req);
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> + int ret;
> + /* Use the first data virtqueue as default */
> + struct data_queue *data_vq = &vcrypto->data_vq[0];
> +
> + vc_req->ablkcipher_ctx = ctx;
> + vc_req->ablkcipher_req = req;
> + ret = __virtio_crypto_ablkcipher_do_req(vc_req, req, data_vq, 1);
> + if (ret < 0) {
> + pr_err("virtio_crypto: Encryption failed!\n");
> + return ret;
> + }
> + virtqueue_kick(data_vq->vq);
> +
> + return -EINPROGRESS;
> +}
> +
> +static int virtio_crypto_ablkcipher_decrypt(struct ablkcipher_request *req)
> +{
> + struct crypto_ablkcipher *atfm = crypto_ablkcipher_reqtfm(req);
> + struct virtio_crypto_ablkcipher_ctx *ctx = crypto_ablkcipher_ctx(atfm);
> + struct virtio_crypto_request *vc_req = ablkcipher_request_ctx(req);
> + struct virtio_crypto *vcrypto = ctx->vcrypto;
> + int ret;
> + /* Use the first data virtqueue as default */
> + struct data_queue *data_vq = &vcrypto->data_vq[0];
> +
> + vc_req->ablkcipher_ctx = ctx;
> + vc_req->ablkcipher_req = req;
> +
> + ret = __virtio_crypto_ablkcipher_do_req(vc_req, req, data_vq, 0);
> + if (ret < 0) {
> + pr_err("virtio_crypto: Decryption failed!\n");
> + return ret;
> + }
> + virtqueue_kick(data_vq->vq);
> +
> + return -EINPROGRESS;
> +}
> +
> +static int virtio_crypto_ablkcipher_init(struct crypto_tfm *tfm)
> +{
> + struct virtio_crypto_ablkcipher_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> + tfm->crt_ablkcipher.reqsize = sizeof(struct virtio_crypto_request);
> + ctx->tfm = tfm;
> +
> + return 0;
> +}
> +
> +static void virtio_crypto_ablkcipher_exit(struct crypto_tfm *tfm)
> +{
> + struct virtio_crypto_ablkcipher_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> + if (!ctx->vcrypto)
> + return;
> +
> + virtio_crypto_alg_ablkcipher_close_session(ctx, 1);
> + virtio_crypto_alg_ablkcipher_close_session(ctx, 0);
> + virtcrypto_dev_put(ctx->vcrypto);
> + ctx->vcrypto = NULL;
> +}
> +
> +static struct crypto_alg virtio_crypto_algs[] = { {
> + .cra_name = "cbc(aes)",
> + .cra_driver_name = "virtio_crypto_aes_cbc",
> + .cra_priority = 4001,
> + .cra_flags = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> + .cra_blocksize = AES_BLOCK_SIZE,
> + .cra_ctxsize = sizeof(struct virtio_crypto_ablkcipher_ctx),
> + .cra_alignmask = 0,
> + .cra_module = THIS_MODULE,
> + .cra_type = &crypto_ablkcipher_type,
> + .cra_init = virtio_crypto_ablkcipher_init,
> + .cra_exit = virtio_crypto_ablkcipher_exit,
> + .cra_u = {
> + .ablkcipher = {
> + .setkey = virtio_crypto_ablkcipher_setkey,
> + .decrypt = virtio_crypto_ablkcipher_decrypt,
> + .encrypt = virtio_crypto_ablkcipher_encrypt,
> + .min_keysize = AES_MIN_KEY_SIZE,
> + .max_keysize = AES_MAX_KEY_SIZE,
> + .ivsize = AES_BLOCK_SIZE,
> + },
> + },
> +} };
> +
> +int virtio_crypto_algs_register(void)
> +{
> + int ret = 0;
> +
> + mutex_lock(&algs_lock);
> + if (++virtio_crypto_active_devs != 1)
> + goto unlock;
> +
> + ret = crypto_register_algs(virtio_crypto_algs,
> + ARRAY_SIZE(virtio_crypto_algs));
> + if (ret)
> + virtio_crypto_active_devs--;
> +
> +unlock:
> + mutex_unlock(&algs_lock);
> + return ret;
> +}
> +
> +void virtio_crypto_algs_unregister(void)
> +{
> + mutex_lock(&algs_lock);
> + if (--virtio_crypto_active_devs != 0)
> + goto unlock;
> +
> + crypto_unregister_algs(virtio_crypto_algs,
> + ARRAY_SIZE(virtio_crypto_algs));
> +
> +unlock:
> + mutex_unlock(&algs_lock);
> +}
> diff --git a/drivers/crypto/virtio/virtio_crypto_common.h
> b/drivers/crypto/virtio/virtio_crypto_common.h
> new file mode 100644
> index 0000000..975404b
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_common.h
> @@ -0,0 +1,122 @@
> +/* Common header for Virtio crypto device.
> + *
> + * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _VIRTIO_CRYPTO_COMMON_H
> +#define _VIRTIO_CRYPTO_COMMON_H
> +
> +#include <linux/virtio.h>
> +#include <linux/crypto.h>
> +#include <linux/spinlock.h>
> +#include <crypto/aead.h>
> +#include <crypto/aes.h>
> +#include <crypto/authenc.h>
> +
> +
> +/* Internal representation of a data virtqueue */
> +struct data_queue {
> + /* Virtqueue associated with this send _queue */
> + struct virtqueue *vq;
> +
> + /* Name of the tx queue: dataq.$index */
> + char name[32];
> +};
> +
> +struct virtio_crypto {
> + struct virtio_device *vdev;
> + struct virtqueue *ctrl_vq;
> + struct data_queue *data_vq;
> +
> + /* To protect the vq operations for the dataq */
> + spinlock_t lock;
> +
> + /* To protect the vq operations for the controlq */
> + spinlock_t ctrl_lock;
> +
> + /* Maximum of data queues supported by the device */
> + u32 max_data_queues;
> +
> + /* Number of queue currently used by the driver */
> + u32 curr_queue;
> +
> + /* Maximum length of cipher key */
> + u32 max_cipher_key_len;
> + /* Maximum length of authenticated key */
> + u32 max_auth_key_len;
> + /* Maximum size of per request */
> + u64 max_size;
> +
> + /* Control VQ buffers: protected by the ctrl_lock */
> + struct virtio_crypto_op_ctrl_req ctrl;
> + struct virtio_crypto_session_input input;
> + struct virtio_crypto_inhdr ctrl_status;
> +
> + unsigned long status;
> + atomic_t ref_count;
> + struct list_head list;
> + struct module *owner;
> + uint8_t dev_id;
> +
> + /* Does the affinity hint is set for virtqueues? */
> + bool affinity_hint_set;
> +};
> +
> +struct virtio_crypto_sym_session_info {
> + /* Backend session id, which come from the host side */
> + __u64 session_id;
> +};
> +
> +struct virtio_crypto_ablkcipher_ctx {
> + struct virtio_crypto *vcrypto;
> + struct crypto_tfm *tfm;
> +
> + struct virtio_crypto_sym_session_info enc_sess_info;
> + struct virtio_crypto_sym_session_info dec_sess_info;
> +};
> +
> +struct virtio_crypto_request {
> + /* Cipher or aead */
> + uint32_t type;
> + uint8_t status;
> + struct virtio_crypto_ablkcipher_ctx *ablkcipher_ctx;
> + struct ablkcipher_request *ablkcipher_req;
> + struct virtio_crypto_op_data_req *req_data;
> + struct scatterlist **sgs;
> + uint8_t *iv;
> +};
> +
> +int virtcrypto_devmgr_add_dev(struct virtio_crypto *vcrypto_dev);
> +struct list_head *virtcrypto_devmgr_get_head(void);
> +void virtcrypto_devmgr_rm_dev(struct virtio_crypto *vcrypto_dev);
> +struct virtio_crypto *virtcrypto_devmgr_get_first(void);
> +int virtcrypto_dev_in_use(struct virtio_crypto *vcrypto_dev);
> +int virtcrypto_dev_get(struct virtio_crypto *vcrypto_dev);
> +void virtcrypto_dev_put(struct virtio_crypto *vcrypto_dev);
> +int virtcrypto_dev_started(struct virtio_crypto *vcrypto_dev);
> +struct virtio_crypto *virtcrypto_get_dev_node(int node);
> +int virtcrypto_dev_start(struct virtio_crypto *vcrypto);
> +void virtcrypto_dev_stop(struct virtio_crypto *vcrypto);
> +
> +static inline int virtio_crypto_get_current_node(void)
> +{
> + return topology_physical_package_id(smp_processor_id());
> +}
> +
> +int virtio_crypto_algs_register(void);
> +void virtio_crypto_algs_unregister(void);
> +
> +#endif /* _VIRTIO_CRYPTO_COMMON_H */
> diff --git a/drivers/crypto/virtio/virtio_crypto_core.c
> b/drivers/crypto/virtio/virtio_crypto_core.c
> new file mode 100644
> index 0000000..286d829
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_core.c
> @@ -0,0 +1,464 @@
> + /* Driver for Virtio crypto device.
> + *
> + * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/err.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <linux/cpu.h>
> +
> +#include <uapi/linux/virtio_crypto.h>
> +#include "virtio_crypto_common.h"
> +
> +
> +static void virtcrypto_dataq_callback(struct virtqueue *vq)
> +{
> + struct virtio_crypto *vcrypto = vq->vdev->priv;
> + struct virtio_crypto_request *vc_req;
> + unsigned long flags;
> + unsigned int len;
> + struct ablkcipher_request *ablk_req;
> + int error;
> +
> + spin_lock_irqsave(&vcrypto->lock, flags);
> + do {
> + virtqueue_disable_cb(vq);
> + while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) {
> + if (vc_req->type == VIRTIO_CRYPTO_SYM_OP_CIPHER) {
> + switch (vc_req->status) {
> + case VIRTIO_CRYPTO_OK:
> + error = 0;
> + break;
> + case VIRTIO_CRYPTO_INVSESS:
> + case VIRTIO_CRYPTO_ERR:
> + error = -EINVAL;
> + break;
> + case VIRTIO_CRYPTO_BADMSG:
> + error = -EBADMSG;
> + break;
> + default:
> + error = -EIO;
> + break;
> + }
> + ablk_req = vc_req->ablkcipher_req;
> + /* Finish the encrypt or decrypt process */
> + ablk_req->base.complete(&ablk_req->base, error);
> + }
> +
> + kzfree(vc_req->iv);
> + kzfree(vc_req->req_data);
> + kfree(vc_req->sgs);
> + }
> + } while (!virtqueue_enable_cb(vq));
> + spin_unlock_irqrestore(&vcrypto->lock, flags);
> +}
> +
> +static int virtcrypto_find_vqs(struct virtio_crypto *vi)
> +{
> + vq_callback_t **callbacks;
> + struct virtqueue **vqs;
> + int ret = -ENOMEM;
> + int i, total_vqs;
> + const char **names;
> +
> + /*
> + * We expect 1 data virtqueue, followed by
> + * possible N-1 data queues used in multiqueue mode,
> + * followed by control vq.
> + */
> + total_vqs = vi->max_data_queues + 1;
> +
> + /* Allocate space for find_vqs parameters */
> + vqs = kcalloc(total_vqs, sizeof(*vqs), GFP_KERNEL);
> + if (!vqs)
> + goto err_vq;
> + callbacks = kcalloc(total_vqs, sizeof(*callbacks), GFP_KERNEL);
> + if (!callbacks)
> + goto err_callback;
> + names = kcalloc(total_vqs, sizeof(*names), GFP_KERNEL);
> + if (!names)
> + goto err_names;
> +
> + /* Parameters for control virtqueue */
> + callbacks[total_vqs - 1] = NULL;
> + names[total_vqs - 1] = "controlq";
> +
> + /* Allocate/initialize parameters for data virtqueues */
> + for (i = 0; i < vi->max_data_queues; i++) {
> + callbacks[i] = virtcrypto_dataq_callback;
> + snprintf(vi->data_vq[i].name, sizeof(vi->data_vq[i].name),
> + "dataq.%d", i);
> + names[i] = vi->data_vq[i].name;
> + }
> +
> + ret = vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks,
> + names);
> + if (ret)
> + goto err_find;
> +
> + vi->ctrl_vq = vqs[total_vqs - 1];
> +
> + for (i = 0; i < vi->max_data_queues; i++)
> + vi->data_vq[i].vq = vqs[i];
> +
> + kfree(names);
> + kfree(callbacks);
> + kfree(vqs);
> +
> + return 0;
> +
> +err_find:
> + kfree(names);
> +err_names:
> + kfree(callbacks);
> +err_callback:
> + kfree(vqs);
> +err_vq:
> + return ret;
> +}
> +
> +static int virtcrypto_alloc_queues(struct virtio_crypto *vi)
> +{
> + vi->data_vq = kcalloc(vi->max_data_queues, sizeof(*vi->data_vq),
> + GFP_KERNEL);
> + if (!vi->data_vq)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void virtcrypto_clean_affinity(struct virtio_crypto *vi, long hcpu)
> +{
> + int i;
> +
> + if (vi->affinity_hint_set) {
> + for (i = 0; i < vi->max_data_queues; i++)
> + virtqueue_set_affinity(vi->data_vq[i].vq, -1);
> +
> + vi->affinity_hint_set = false;
> + }
> +}
> +
> +static void virtcrypto_set_affinity(struct virtio_crypto *vcrypto)
> +{
> + int i = 0;
> + int cpu;
> +
> + /*
> + * In single queue mode, we don't set the cpu affinity.
> + */
> + if (vcrypto->curr_queue == 1 || vcrypto->max_data_queues == 1) {
> + virtcrypto_clean_affinity(vcrypto, -1);
> + return;
> + }
> +
> + /*
> + * In multiqueue mode, we let the queue to be private to one cpu
> + * by setting the affinity hint to eliminate the contention.
> + *
> + * TODO: adds cpu hotplug support by register cpu notifier.
> + *
> + */
> + for_each_online_cpu(cpu) {
> + virtqueue_set_affinity(vcrypto->data_vq[i].vq, cpu);
> + if (++i >= vcrypto->max_data_queues)
> + break;
> + }
> +
> + vcrypto->affinity_hint_set = true;
> +}
> +
> +static void virtcrypto_free_queues(struct virtio_crypto *vi)
> +{
> + kfree(vi->data_vq);
> +}
> +
> +static int virtcrypto_init_vqs(struct virtio_crypto *vi)
> +{
> + int ret;
> +
> + /* Allocate send & receive queues */
> + ret = virtcrypto_alloc_queues(vi);
> + if (ret)
> + goto err;
> +
> + ret = virtcrypto_find_vqs(vi);
> + if (ret)
> + goto err_free;
> +
> + get_online_cpus();
> + virtcrypto_set_affinity(vi);
> + put_online_cpus();
> +
> + return 0;
> +
> +err_free:
> + virtcrypto_free_queues(vi);
> +err:
> + return ret;
> +}
> +
> +static int virtcrypto_update_status(struct virtio_crypto *vcrypto)
> +{
> + u32 status;
> + int err;
> + unsigned long flags;
> +
> + virtio_cread(vcrypto->vdev,
> + struct virtio_crypto_config, status, &status);
> +
> + /*
> + * Unknown status bits would be a host error and the driver
> + * should consider the device to be broken.
> + */
> + if (status & (~VIRTIO_CRYPTO_S_HW_READY)) {
> + dev_warn(&vcrypto->vdev->dev,
> + "Unknown status bits: 0x%x\n", status);
> +
> + spin_lock_irqsave(&vcrypto->lock, flags);
> + virtio_break_device(vcrypto->vdev);
> + spin_unlock_irqrestore(&vcrypto->lock, flags);
> + return -EPERM;
> + }
> +
> + if (vcrypto->status == status)
> + return 0;
> +
> + vcrypto->status = status;
> +
> + if (vcrypto->status & VIRTIO_CRYPTO_S_HW_READY) {
> + err = virtcrypto_dev_start(vcrypto);
> + if (err) {
> + dev_err(&vcrypto->vdev->dev,
> + "Failed to start virtio crypto device.\n");
> +
> + return -EPERM;
> + }
> + dev_info(&vcrypto->vdev->dev, "Accelerator is ready\n");
> + } else {
> + virtcrypto_dev_stop(vcrypto);
> + dev_info(&vcrypto->vdev->dev, "Accelerator is not ready\n");
> + }
> +
> + return 0;
> +}
> +
> +static void virtcrypto_del_vqs(struct virtio_crypto *vcrypto)
> +{
> + struct virtio_device *vdev = vcrypto->vdev;
> +
> + virtcrypto_clean_affinity(vcrypto, -1);
> +
> + vdev->config->del_vqs(vdev);
> +
> + virtcrypto_free_queues(vcrypto);
> +}
> +
> +static int virtcrypto_probe(struct virtio_device *vdev)
> +{
> + int err = -EFAULT;
> + struct virtio_crypto *vcrypto;
> + u32 max_data_queues = 0, max_cipher_key_len = 0;
> + u32 max_auth_key_len = 0;
> + u64 max_size = 0;
> +
> + if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
> + return -ENODEV;
> +
> + if (!vdev->config->get) {
> + dev_err(&vdev->dev, "%s failure: config access disabled\n",
> + __func__);
> + return -EINVAL;
> + }
> +
> + if (num_possible_nodes() > 1 && dev_to_node(&vdev->dev) < 0) {
> + /*
> + * If the accelerator is connected to a node with no memory
> + * there is no point in using the accelerator since the remote
> + * memory transaction will be very slow.
> + */
> + dev_err(&vdev->dev, "Invalid NUMA configuration.\n");
> + return -EINVAL;
> + }
> +
> + vcrypto = kzalloc_node(sizeof(*vcrypto), GFP_KERNEL,
> + dev_to_node(&vdev->dev));
> + if (!vcrypto)
> + return -ENOMEM;
> +
> + virtio_cread(vdev, struct virtio_crypto_config,
> + max_dataqueues, &max_data_queues);
> + if (max_data_queues < 1)
> + max_data_queues = 1;
> +
> + virtio_cread(vdev, struct virtio_crypto_config,
> + max_cipher_key_len, &max_cipher_key_len);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + max_auth_key_len, &max_auth_key_len);
> + virtio_cread(vdev, struct virtio_crypto_config,
> + max_size, &max_size);
> +
> + /* Add virtio crypto device to global table */
> + err = virtcrypto_devmgr_add_dev(vcrypto);
> + if (err) {
> + dev_err(&vdev->dev, "Failed to add new virtio crypto device.\n");
> + goto free;
> + }
> + vcrypto->owner = THIS_MODULE;
> + vcrypto = vdev->priv = vcrypto;
> + vcrypto->vdev = vdev;
> + spin_lock_init(&vcrypto->lock);
> + spin_lock_init(&vcrypto->ctrl_lock);
> +
> + /* Use single data queue as default */
> + vcrypto->curr_queue = 1;
> + vcrypto->max_data_queues = max_data_queues;
> + vcrypto->max_cipher_key_len = max_cipher_key_len;
> + vcrypto->max_auth_key_len = max_auth_key_len;
> + vcrypto->max_size = max_size;
> +
> + dev_info(&vdev->dev,
> + "max_queues: %u, max_cipher_key_len: %u, max_auth_key_len: %u,
> max_size 0x%llx\n",
> + vcrypto->max_data_queues,
> + vcrypto->max_cipher_key_len,
> + vcrypto->max_auth_key_len,
> + vcrypto->max_size);
> +
> + err = virtcrypto_init_vqs(vcrypto);
> + if (err) {
> + dev_err(&vdev->dev, "Failed to initialize vqs.\n");
> + goto free_dev;
> + }
> + virtio_device_ready(vdev);
> +
> + err = virtcrypto_update_status(vcrypto);
> + if (err)
> + goto free_vqs;
> +
> + return 0;
> +
> +free_vqs:
> + vcrypto->vdev->config->reset(vdev);
> + virtcrypto_del_vqs(vcrypto);
> +free_dev:
> + virtcrypto_devmgr_rm_dev(vcrypto);
> +free:
> + kfree(vcrypto);
> + return err;
> +}
> +
> +static void virtcrypto_free_unused_reqs(struct virtio_crypto *vcrypto)
> +{
> + struct virtio_crypto_request *vc_req;
> + int i;
> + struct virtqueue *vq;
> +
> + for (i = 0; i < vcrypto->max_data_queues; i++) {
> + vq = vcrypto->data_vq[i].vq;
> + while ((vc_req = virtqueue_detach_unused_buf(vq)) != NULL) {
> + kfree(vc_req->req_data);
> + kfree(vc_req->sgs);
> + }
> + }
> +}
> +
> +static void virtcrypto_remove(struct virtio_device *vdev)
> +{
> + struct virtio_crypto *vcrypto = vdev->priv;
> +
> + dev_info(&vdev->dev, "Start virtcrypto_remove.\n");
> +
> + if (virtcrypto_dev_started(vcrypto))
> + virtcrypto_dev_stop(vcrypto);
> + vdev->config->reset(vdev);
> + virtcrypto_free_unused_reqs(vcrypto);
> + virtcrypto_del_vqs(vcrypto);
> + virtcrypto_devmgr_rm_dev(vcrypto);
> + kfree(vcrypto);
> +}
> +
> +static void virtcrypto_config_changed(struct virtio_device *vdev)
> +{
> + struct virtio_crypto *vcrypto = vdev->priv;
> +
> + virtcrypto_update_status(vcrypto);
> +}
> +
> +#ifdef CONFIG_PM_SLEEP
> +static int virtcrypto_freeze(struct virtio_device *vdev)
> +{
> + struct virtio_crypto *vcrypto = vdev->priv;
> +
> + vdev->config->reset(vdev);
> + virtcrypto_free_unused_reqs(vcrypto);
> + if (virtcrypto_dev_started(vcrypto))
> + virtcrypto_dev_stop(vcrypto);
> +
> + virtcrypto_del_vqs(vcrypto);
> + return 0;
> +}
> +
> +static int virtcrypto_restore(struct virtio_device *vdev)
> +{
> + struct virtio_crypto *vcrypto = vdev->priv;
> + int err;
> +
> + err = virtcrypto_init_vqs(vcrypto);
> + if (err)
> + return err;
> +
> + virtio_device_ready(vdev);
> + err = virtcrypto_dev_start(vcrypto);
> + if (err) {
> + dev_err(&vdev->dev, "Failed to start virtio crypto device.\n");
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +#endif
> +
> +static unsigned int features[] = {
> + /* none */
> +};
> +
> +static struct virtio_device_id id_table[] = {
> + { VIRTIO_ID_CRYPTO, VIRTIO_DEV_ANY_ID },
> + { 0 },
> +};
> +
> +static struct virtio_driver virtio_crypto_driver = {
> + .driver.name = KBUILD_MODNAME,
> + .driver.owner = THIS_MODULE,
> + .feature_table = features,
> + .feature_table_size = ARRAY_SIZE(features),
> + .id_table = id_table,
> + .probe = virtcrypto_probe,
> + .remove = virtcrypto_remove,
> + .config_changed = virtcrypto_config_changed,
> +#ifdef CONFIG_PM_SLEEP
> + .freeze = virtcrypto_freeze,
> + .restore = virtcrypto_restore,
> +#endif
> +};
> +
> +module_virtio_driver(virtio_crypto_driver);
> +
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("virtio crypto device driver");
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Gonglei <arei.gonglei@huawei.com>");
> diff --git a/drivers/crypto/virtio/virtio_crypto_mgr.c
> b/drivers/crypto/virtio/virtio_crypto_mgr.c
> new file mode 100644
> index 0000000..a69ff71
> --- /dev/null
> +++ b/drivers/crypto/virtio/virtio_crypto_mgr.c
> @@ -0,0 +1,264 @@
> + /* Management for virtio crypto devices (refer to adf_dev_mgr.c)
> + *
> + * Copyright 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/mutex.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +
> +#include <uapi/linux/virtio_crypto.h>
> +#include "virtio_crypto_common.h"
> +
> +static LIST_HEAD(virtio_crypto_table);
> +static uint32_t num_devices;
> +
> +/* The table_lock protects the above global list and num_devices */
> +static DEFINE_MUTEX(table_lock);
> +
> +#define VIRTIO_CRYPTO_MAX_DEVICES 32
> +
> +
> +/*
> + * virtcrypto_devmgr_add_dev() - Add vcrypto_dev to the acceleration
> + * framework.
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * Function adds virtio crypto device to the global list.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 0 on success, error code othewise.
> + */
> +int virtcrypto_devmgr_add_dev(struct virtio_crypto *vcrypto_dev)
> +{
> + struct list_head *itr;
> +
> + mutex_lock(&table_lock);
> + if (num_devices == VIRTIO_CRYPTO_MAX_DEVICES) {
> + pr_info("virtio_crypto: only support up to %d devices\n",
> + VIRTIO_CRYPTO_MAX_DEVICES);
> + mutex_unlock(&table_lock);
> + return -EFAULT;
> + }
> +
> + list_for_each(itr, &virtio_crypto_table) {
> + struct virtio_crypto *ptr =
> + list_entry(itr, struct virtio_crypto, list);
> +
> + if (ptr == vcrypto_dev) {
> + mutex_unlock(&table_lock);
> + return -EEXIST;
> + }
> + }
> + atomic_set(&vcrypto_dev->ref_count, 0);
> + list_add_tail(&vcrypto_dev->list, &virtio_crypto_table);
> + vcrypto_dev->dev_id = num_devices++;
> + mutex_unlock(&table_lock);
> + return 0;
> +}
> +
> +struct list_head *virtcrypto_devmgr_get_head(void)
> +{
> + return &virtio_crypto_table;
> +}
> +
> +/*
> + * virtcrypto_devmgr_rm_dev() - Remove vcrypto_dev from the acceleration
> + * framework.
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * Function removes virtio crypto device from the acceleration framework.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: void
> + */
> +void virtcrypto_devmgr_rm_dev(struct virtio_crypto *vcrypto_dev)
> +{
> + mutex_lock(&table_lock);
> + list_del(&vcrypto_dev->list);
> + num_devices--;
> + mutex_unlock(&table_lock);
> +}
> +
> +/*
> + * virtcrypto_devmgr_get_first()
> + *
> + * Function returns the first virtio crypto device from the acceleration
> + * framework.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: pointer to vcrypto_dev or NULL if not found.
> + */
> +struct virtio_crypto *virtcrypto_devmgr_get_first(void)
> +{
> + struct virtio_crypto *dev = NULL;
> +
> + mutex_lock(&table_lock);
> + if (!list_empty(&virtio_crypto_table))
> + dev = list_first_entry(&virtio_crypto_table,
> + struct virtio_crypto,
> + list);
> + mutex_unlock(&table_lock);
> + return dev;
> +}
> +
> +/*
> + * virtcrypto_dev_in_use() - Check whether vcrypto_dev is currently in use
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 1 when device is in use, 0 otherwise.
> + */
> +int virtcrypto_dev_in_use(struct virtio_crypto *vcrypto_dev)
> +{
> + return atomic_read(&vcrypto_dev->ref_count) != 0;
> +}
> +
> +/*
> + * virtcrypto_dev_get() - Increment vcrypto_dev reference count
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * Increment the vcrypto_dev refcount and if this is the first time
> + * incrementing it during this period the vcrypto_dev is in use,
> + * increment the module refcount too.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 0 when successful, EFAULT when fail to bump module refcount
> + */
> +int virtcrypto_dev_get(struct virtio_crypto *vcrypto_dev)
> +{
> + if (atomic_add_return(1, &vcrypto_dev->ref_count) == 1)
> + if (!try_module_get(vcrypto_dev->owner))
> + return -EFAULT;
> + return 0;
> +}
> +
> +/*
> + * virtcrypto_dev_put() - Decrement vcrypto_dev reference count
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * Decrement the vcrypto_dev refcount and if this is the last time
> + * decrementing it during this period the vcrypto_dev is in use,
> + * decrement the module refcount too.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: void
> + */
> +void virtcrypto_dev_put(struct virtio_crypto *vcrypto_dev)
> +{
> + if (atomic_sub_return(1, &vcrypto_dev->ref_count) == 0)
> + module_put(vcrypto_dev->owner);
> +}
> +
> +/*
> + * virtcrypto_dev_started() - Check whether device has started
> + * @vcrypto_dev: Pointer to virtio crypto device.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 1 when the device has started, 0 otherwise
> + */
> +int virtcrypto_dev_started(struct virtio_crypto *vcrypto_dev)
> +{
> + return (vcrypto_dev->status & VIRTIO_CRYPTO_S_HW_READY);
> +}
> +
> +/*
> + * virtcrypto_get_dev_node() - Get vcrypto_dev on the node.
> + * @node: Node id the driver works.
> + *
> + * Function returns the virtio crypto device used fewest on the node.
> + *
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: pointer to vcrypto_dev or NULL if not found.
> + */
> +struct virtio_crypto *virtcrypto_get_dev_node(int node)
> +{
> + struct virtio_crypto *vcrypto_dev = NULL, *tmp_dev;
> + unsigned long best = ~0;
> + unsigned long ctr;
> +
> + mutex_lock(&table_lock);
> + list_for_each_entry(tmp_dev, virtcrypto_devmgr_get_head(), list) {
> +
> + if ((node == dev_to_node(&tmp_dev->vdev->dev) ||
> + dev_to_node(&tmp_dev->vdev->dev) < 0) &&
> + virtcrypto_dev_started(tmp_dev)) {
> + ctr = atomic_read(&tmp_dev->ref_count);
> + if (best > ctr) {
> + vcrypto_dev = tmp_dev;
> + best = ctr;
> + }
> + }
> + }
> +
> + if (!vcrypto_dev) {
> + pr_info("virtio_crypto: Could not find a device on node %d\n",
> + node);
> + /* Get any started device */
> + list_for_each_entry(tmp_dev,
> + virtcrypto_devmgr_get_head(), list) {
> + if (virtcrypto_dev_started(tmp_dev)) {
> + vcrypto_dev = tmp_dev;
> + break;
> + }
> + }
> + }
> + mutex_unlock(&table_lock);
> + if (!vcrypto_dev)
> + return NULL;
> +
> + virtcrypto_dev_get(vcrypto_dev);
> + return vcrypto_dev;
> +}
> +
> +/*
> + * virtcrypto_dev_start() - Start virtio crypto device
> + * @vcrypto: Pointer to virtio crypto device.
> + *
> + * Function notifies all the registered services that the virtio crypto device
> + * is ready to be used.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: 0 on success, EFAULT when fail to register algorithms
> + */
> +int virtcrypto_dev_start(struct virtio_crypto *vcrypto)
> +{
> + if (virtio_crypto_algs_register()) {
> + pr_err("virtio_crypto: Failed to register crypto algs\n");
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * virtcrypto_dev_stop() - Stop virtio crypto device
> + * @vcrypto: Pointer to virtio crypto device.
> + *
> + * Function notifies all the registered services that the virtio crypto device
> + * is ready to be used.
> + * To be used by virtio crypto device specific drivers.
> + *
> + * Return: void
> + */
> +void virtcrypto_dev_stop(struct virtio_crypto *vcrypto)
> +{
> + virtio_crypto_algs_unregister();
> +}
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index cd2be1c..4bdb84c 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -460,6 +460,7 @@ header-y += virtio_rng.h
> header-y += virtio_scsi.h
> header-y += virtio_types.h
> header-y += virtio_vsock.h
> +header-y += virtio_crypto.h
> header-y += vm_sockets.h
> header-y += vt.h
> header-y += vtpm_proxy.h
> diff --git a/include/uapi/linux/virtio_crypto.h
> b/include/uapi/linux/virtio_crypto.h
> new file mode 100644
> index 0000000..50cdc8a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_crypto.h
> @@ -0,0 +1,450 @@
> +#ifndef _VIRTIO_CRYPTO_H
> +#define _VIRTIO_CRYPTO_H
> +/* This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + * may be used to endorse or promote products derived from this
> software
> + * without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> CONTRIBUTORS
> + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
> NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL IBM
> OR
> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
> OF
> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> OUT
> + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#include <linux/types.h>
> +#include <linux/virtio_types.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +
> +
> +#define VIRTIO_CRYPTO_SERVICE_CIPHER 0
> +#define VIRTIO_CRYPTO_SERVICE_HASH 1
> +#define VIRTIO_CRYPTO_SERVICE_MAC 2
> +#define VIRTIO_CRYPTO_SERVICE_AEAD 3
> +
> +#define VIRTIO_CRYPTO_OPCODE(service, op) (((service) << 8) | (op))
> +
> +struct virtio_crypto_ctrl_header {
> +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> + __le32 opcode;
> + __le32 algo;
> + __le32 flag;
> + /* data virtqueue id */
> + __le32 queue_id;
> +};
> +
> +struct virtio_crypto_cipher_session_para {
> +#define VIRTIO_CRYPTO_NO_CIPHER 0
> +#define VIRTIO_CRYPTO_CIPHER_ARC4 1
> +#define VIRTIO_CRYPTO_CIPHER_AES_ECB 2
> +#define VIRTIO_CRYPTO_CIPHER_AES_CBC 3
> +#define VIRTIO_CRYPTO_CIPHER_AES_CTR 4
> +#define VIRTIO_CRYPTO_CIPHER_DES_ECB 5
> +#define VIRTIO_CRYPTO_CIPHER_DES_CBC 6
> +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB 7
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC 8
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR 9
> +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8 10
> +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2 11
> +#define VIRTIO_CRYPTO_CIPHER_AES_F8 12
> +#define VIRTIO_CRYPTO_CIPHER_AES_XTS 13
> +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3 14
> + __le32 algo;
> + /* length of key */
> + __le32 keylen;
> +
> +#define VIRTIO_CRYPTO_OP_ENCRYPT 1
> +#define VIRTIO_CRYPTO_OP_DECRYPT 2
> + /* encrypt or decrypt */
> + __le32 op;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_session_input {
> + /* Device-writable part */
> + __le64 session_id;
> + __le32 status;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_cipher_session_req {
> + struct virtio_crypto_cipher_session_para para;
> + __u8 padding[32];
> +};
> +
> +struct virtio_crypto_hash_session_para {
> +#define VIRTIO_CRYPTO_NO_HASH 0
> +#define VIRTIO_CRYPTO_HASH_MD5 1
> +#define VIRTIO_CRYPTO_HASH_SHA1 2
> +#define VIRTIO_CRYPTO_HASH_SHA_224 3
> +#define VIRTIO_CRYPTO_HASH_SHA_256 4
> +#define VIRTIO_CRYPTO_HASH_SHA_384 5
> +#define VIRTIO_CRYPTO_HASH_SHA_512 6
> +#define VIRTIO_CRYPTO_HASH_SHA3_224 7
> +#define VIRTIO_CRYPTO_HASH_SHA3_256 8
> +#define VIRTIO_CRYPTO_HASH_SHA3_384 9
> +#define VIRTIO_CRYPTO_HASH_SHA3_512 10
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128 11
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256 12
> + __le32 algo;
> + /* hash result length */
> + __le32 hash_result_len;
> + __u8 padding[8];
> +};
> +
> +struct virtio_crypto_hash_create_session_req {
> + struct virtio_crypto_hash_session_para para;
> + __u8 padding[40];
> +};
> +
> +struct virtio_crypto_mac_session_para {
> +#define VIRTIO_CRYPTO_NO_MAC 0
> +#define VIRTIO_CRYPTO_MAC_HMAC_MD5 1
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1 2
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224 3
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256 4
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384 5
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512 6
> +#define VIRTIO_CRYPTO_MAC_CMAC_3DES 25
> +#define VIRTIO_CRYPTO_MAC_CMAC_AES 26
> +#define VIRTIO_CRYPTO_MAC_KASUMI_F9 27
> +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA2 28
> +#define VIRTIO_CRYPTO_MAC_GMAC_AES 41
> +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH 42
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES 49
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9 50
> +#define VIRTIO_CRYPTO_MAC_XCBC_AES 53
> + __le32 algo;
> + /* hash result length */
> + __le32 hash_result_len;
> + /* length of authenticated key */
> + __le32 auth_key_len;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_mac_create_session_req {
> + struct virtio_crypto_mac_session_para para;
> + __u8 padding[40];
> +};
> +
> +struct virtio_crypto_aead_session_para {
> +#define VIRTIO_CRYPTO_NO_AEAD 0
> +#define VIRTIO_CRYPTO_AEAD_GCM 1
> +#define VIRTIO_CRYPTO_AEAD_CCM 2
> +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305 3
> + __le32 algo;
> + /* length of key */
> + __le32 key_len;
> + /* hash result length */
> + __le32 hash_result_len;
> + /* length of the additional authenticated data (AAD) in bytes */
> + __le32 aad_len;
> + /* encrypt or decrypt, See above VIRTIO_CRYPTO_OP_* */
> + __le32 op;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_aead_create_session_req {
> + struct virtio_crypto_aead_session_para para;
> + __u8 padding[32];
> +};
> +
> +struct virtio_crypto_alg_chain_session_para {
> +#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_HASH_THEN_CIPHER 1
> +#define VIRTIO_CRYPTO_SYM_ALG_CHAIN_ORDER_CIPHER_THEN_HASH 2
> + __le32 alg_chain_order;
> +/* Plain hash */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_PLAIN 1
> +/* Authenticated hash (mac) */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_AUTH 2
> +/* Nested hash */
> +#define VIRTIO_CRYPTO_SYM_HASH_MODE_NESTED 3
> + __le32 hash_mode;
> + struct virtio_crypto_cipher_session_para cipher_param;
> + union {
> + struct virtio_crypto_hash_session_para hash_param;
> + struct virtio_crypto_mac_session_para mac_param;
> + __u8 padding[16];
> + } u;
> + /* length of the additional authenticated data (AAD) in bytes */
> + __le32 aad_len;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_alg_chain_session_req {
> + struct virtio_crypto_alg_chain_session_para para;
> +};
> +
> +struct virtio_crypto_sym_create_session_req {
> + union {
> + struct virtio_crypto_cipher_session_req cipher;
> + struct virtio_crypto_alg_chain_session_req chain;
> + __u8 padding[48];
> + } u;
> +
> + /* Device-readable part */
> +
> +/* No operation */
> +#define VIRTIO_CRYPTO_SYM_OP_NONE 0
> +/* Cipher only operation on the data */
> +#define VIRTIO_CRYPTO_SYM_OP_CIPHER 1
> +/*
> + * Chain any cipher with any hash or mac operation. The order
> + * depends on the value of alg_chain_order param
> + */
> +#define VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING 2
> + __le32 op_type;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_destroy_session_req {
> + /* Device-readable part */
> + __le64 session_id;
> + __u8 padding[48];
> +};
> +
> +/* The request of the control virtqueue's packet */
> +struct virtio_crypto_op_ctrl_req {
> + struct virtio_crypto_ctrl_header header;
> +
> + union {
> + struct virtio_crypto_sym_create_session_req
> + sym_create_session;
> + struct virtio_crypto_hash_create_session_req
> + hash_create_session;
> + struct virtio_crypto_mac_create_session_req
> + mac_create_session;
> + struct virtio_crypto_aead_create_session_req
> + aead_create_session;
> + struct virtio_crypto_destroy_session_req
> + destroy_session;
> + __u8 padding[56];
> + } u;
> +};
> +
> +struct virtio_crypto_op_header {
> +#define VIRTIO_CRYPTO_CIPHER_ENCRYPT \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x00)
> +#define VIRTIO_CRYPTO_CIPHER_DECRYPT \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x01)
> +#define VIRTIO_CRYPTO_HASH \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x00)
> +#define VIRTIO_CRYPTO_MAC \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_ENCRYPT \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x00)
> +#define VIRTIO_CRYPTO_AEAD_DECRYPT \
> + VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x01)
> + __le32 opcode;
> + /* algo should be service-specific algorithms */
> + __le32 algo;
> + /* session_id should be service-specific algorithms */
> + __le64 session_id;
> + /* control flag to control the request */
> + __le32 flag;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_cipher_para {
> + /*
> + * Byte Length of valid IV/Counter
> + *
> + * For block ciphers in CBC or F8 mode, or for Kasumi in F8 mode, or for
> + * SNOW3G in UEA2 mode, this is the length of the IV (which
> + * must be the same as the block length of the cipher).
> + * For block ciphers in CTR mode, this is the length of the counter
> + * (which must be the same as the block length of the cipher).
> + * For AES-XTS, this is the 128bit tweak, i, from IEEE Std 1619-2007.
> + *
> + * The IV/Counter will be updated after every partial cryptographic
> + * operation.
> + */
> + __le32 iv_len;
> + /* length of source data */
> + __le32 src_data_len;
> + /* length of dst data */
> + __le32 dst_data_len;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_hash_para {
> + /* length of source data */
> + __le32 src_data_len;
> + /* hash result length */
> + __le32 hash_result_len;
> +};
> +
> +struct virtio_crypto_mac_para {
> + struct virtio_crypto_hash_para hash;
> +};
> +
> +struct virtio_crypto_aead_para {
> + /*
> + * Byte Length of valid IV data pointed to by the below iv_addr
> + * parameter.
> + *
> + * For GCM mode, this is either 12 (for 96-bit IVs) or 16, in which
> + * case iv_addr points to J0.
> + * For CCM mode, this is the length of the nonce, which can be in the
> + * range 7 to 13 inclusive.
> + */
> + __le32 iv_len;
> + /* length of additional auth data */
> + __le32 aad_len;
> + /* length of source data */
> + __le32 src_data_len;
> + /* length of dst data */
> + __le32 dst_data_len;
> +};
> +
> +struct virtio_crypto_cipher_data_req {
> + /* Device-readable part */
> + struct virtio_crypto_cipher_para para;
> + __u8 padding[24];
> +};
> +
> +struct virtio_crypto_hash_data_req {
> + /* Device-readable part */
> + struct virtio_crypto_hash_para para;
> + __u8 padding[40];
> +};
> +
> +struct virtio_crypto_mac_data_req {
> + /* Device-readable part */
> + struct virtio_crypto_mac_para para;
> + __u8 padding[40];
> +};
> +
> +struct virtio_crypto_alg_chain_data_para {
> + __le32 iv_len;
> + /* Length of source data */
> + __le32 src_data_len;
> + /* Length of destination data */
> + __le32 dst_data_len;
> + /* Starting point for cipher processing in source data */
> + __le32 cipher_start_src_offset;
> + /* Length of the source data that the cipher will be computed on */
> + __le32 len_to_cipher;
> + /* Starting point for hash processing in source data */
> + __le32 hash_start_src_offset;
> + /* Length of the source data that the hash will be computed on */
> + __le32 len_to_hash;
> + /* Length of the additional auth data */
> + __le32 aad_len;
> + /* Length of the hash result */
> + __le32 hash_result_len;
> + __le32 reserved;
> +};
> +
> +struct virtio_crypto_alg_chain_data_req {
> + /* Device-readable part */
> + struct virtio_crypto_alg_chain_data_para para;
> +};
> +
> +struct virtio_crypto_sym_data_req {
> + union {
> + struct virtio_crypto_cipher_data_req cipher;
> + struct virtio_crypto_alg_chain_data_req chain;
> + __u8 padding[40];
> + } u;
> +
> + /* See above VIRTIO_CRYPTO_SYM_OP_* */
> + __le32 op_type;
> + __le32 padding;
> +};
> +
> +struct virtio_crypto_aead_data_req {
> + /* Device-readable part */
> + struct virtio_crypto_aead_para para;
> + __u8 padding[32];
> +};
> +
> +/* The request of the data virtqueue's packet */
> +struct virtio_crypto_op_data_req {
> + struct virtio_crypto_op_header header;
> +
> + union {
> + struct virtio_crypto_sym_data_req sym_req;
> + struct virtio_crypto_hash_data_req hash_req;
> + struct virtio_crypto_mac_data_req mac_req;
> + struct virtio_crypto_aead_data_req aead_req;
> + __u8 padding[48];
> + } u;
> +};
> +
> +#define VIRTIO_CRYPTO_OK 0
> +#define VIRTIO_CRYPTO_ERR 1
> +#define VIRTIO_CRYPTO_BADMSG 2
> +#define VIRTIO_CRYPTO_NOTSUPP 3
> +#define VIRTIO_CRYPTO_INVSESS 4 /* Invalid session id */
> +
> +/* The accelerator hardware is ready */
> +#define VIRTIO_CRYPTO_S_HW_READY (1 << 0)
> +
> +struct virtio_crypto_config {
> + /* See VIRTIO_CRYPTO_OP_* above */
> + __u32 status;
> +
> + /*
> + * Maximum number of data queue
> + */
> + __u32 max_dataqueues;
> +
> + /*
> + * Specifies the services mask which the device support,
> + * see VIRTIO_CRYPTO_SERVICE_* above
> + */
> + __u32 crypto_services;
> +
> + /* Detailed algorithms mask */
> + __u32 cipher_algo_l;
> + __u32 cipher_algo_h;
> + __u32 hash_algo;
> + __u32 mac_algo_l;
> + __u32 mac_algo_h;
> + __u32 aead_algo;
> + /* Maximum length of cipher key */
> + __u32 max_cipher_key_len;
> + /* Maximum length of authenticated key */
> + __u32 max_auth_key_len;
> + __u32 reserve;
> + /* Maximum size of each crypto request's content */
> + __u64 max_size;
> +};
> +
> +struct virtio_crypto_inhdr {
> + /* See VIRTIO_CRYPTO_* above */
> + __u8 status;
> +};
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 3228d58..6d5c3b2 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -42,5 +42,6 @@
> #define VIRTIO_ID_GPU 16 /* virtio GPU */
> #define VIRTIO_ID_INPUT 18 /* virtio input */
> #define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
> +#define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
>
> #endif /* _LINUX_VIRTIO_IDS_H */
> --
> 1.8.3.1
>
^ permalink raw reply
* [PATCH 00/10] virtio: sparse fixes
From: Michael S. Tsirkin @ 2016-12-06 15:40 UTC (permalink / raw)
To: linux-kernel
Cc: linux-s390, linux-scsi, kvm, linux-kbuild, netdev,
linux-remoteproc, dri-devel, virtualization, Michal Marek,
linux-crypto, v9fs-developer
I run latest sparse from git on virtio drivers
(turns out the version I had was rather outdated).
This patchset fixes a couple of bugs this uncovered,
and adds some annotations to make it sparse-clean.
In particular, endian-ness is often tricky,
so this patchset enabled endian-ness checks for sparse
builds.
Michael S. Tsirkin (10):
virtio_console: drop unused config fields
drm/virtio: fix endianness in primary_plane_update
drm/virtio: fix lock context imbalance
drm/virtio: annotate virtio_gpu_queue_ctrl_buffer_locked
vhost: make interval tree static inline
vhost: add missing __user annotations
vsock/virtio: add a missing __le annotation
vsock/virtio: mark an internal function static
vsock/virtio: fix src/dst cid format
virtio: enable endian checks for sparse builds
drivers/char/virtio_console.c | 14 +++++++-------
drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++--
drivers/gpu/drm/virtio/virtgpu_vq.c | 6 +++++-
drivers/vhost/vhost.c | 12 ++++++------
net/vmw_vsock/virtio_transport.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 16 ++++++++--------
drivers/block/Makefile | 1 +
drivers/char/Makefile | 1 +
drivers/char/hw_random/Makefile | 2 ++
drivers/gpu/drm/virtio/Makefile | 1 +
drivers/net/Makefile | 3 +++
drivers/net/caif/Makefile | 1 +
drivers/rpmsg/Makefile | 1 +
drivers/s390/virtio/Makefile | 2 ++
drivers/scsi/Makefile | 1 +
drivers/vhost/Makefile | 1 +
drivers/virtio/Makefile | 3 +++
net/9p/Makefile | 1 +
net/packet/Makefile | 1 +
net/vmw_vsock/Makefile | 2 ++
20 files changed, 50 insertions(+), 25 deletions(-)
--
MST
^ permalink raw reply
* [PATCH 01/10] virtio_console: drop unused config fields
From: Michael S. Tsirkin @ 2016-12-06 15:40 UTC (permalink / raw)
To: linux-kernel; +Cc: Amit Shah, Greg Kroah-Hartman, Arnd Bergmann, virtualization
In-Reply-To: <1481038106-24899-1-git-send-email-mst@redhat.com>
struct ports_device includes a config field including the whole
virtio_console_config, but only max_nr_ports in there is ever updated or
used. The rest is unused and in fact does not even mirror the
device config. Drop everything except max_nr_ports,
saving some memory.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/char/virtio_console.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 5649234..8b00e79 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -152,8 +152,8 @@ struct ports_device {
spinlock_t c_ivq_lock;
spinlock_t c_ovq_lock;
- /* The current config space is stored here */
- struct virtio_console_config config;
+ /* max. number of ports this device can hold */
+ u32 max_nr_ports;
/* The virtio device we're associated with */
struct virtio_device *vdev;
@@ -1649,11 +1649,11 @@ static void handle_control_message(struct virtio_device *vdev,
break;
}
if (virtio32_to_cpu(vdev, cpkt->id) >=
- portdev->config.max_nr_ports) {
+ portdev->max_nr_ports) {
dev_warn(&portdev->vdev->dev,
"Request for adding port with "
"out-of-bound id %u, max. supported id: %u\n",
- cpkt->id, portdev->config.max_nr_ports - 1);
+ cpkt->id, portdev->max_nr_ports - 1);
break;
}
add_port(portdev, virtio32_to_cpu(vdev, cpkt->id));
@@ -1894,7 +1894,7 @@ static int init_vqs(struct ports_device *portdev)
u32 i, j, nr_ports, nr_queues;
int err;
- nr_ports = portdev->config.max_nr_ports;
+ nr_ports = portdev->max_nr_ports;
nr_queues = use_multiport(portdev) ? (nr_ports + 1) * 2 : 2;
vqs = kmalloc(nr_queues * sizeof(struct virtqueue *), GFP_KERNEL);
@@ -2047,13 +2047,13 @@ static int virtcons_probe(struct virtio_device *vdev)
}
multiport = false;
- portdev->config.max_nr_ports = 1;
+ portdev->max_nr_ports = 1;
/* Don't test MULTIPORT at all if we're rproc: not a valid feature! */
if (!is_rproc_serial(vdev) &&
virtio_cread_feature(vdev, VIRTIO_CONSOLE_F_MULTIPORT,
struct virtio_console_config, max_nr_ports,
- &portdev->config.max_nr_ports) == 0) {
+ &portdev->max_nr_ports) == 0) {
multiport = true;
}
--
MST
^ permalink raw reply related
* [PATCH 02/10] drm/virtio: fix endianness in primary_plane_update
From: Michael S. Tsirkin @ 2016-12-06 15:40 UTC (permalink / raw)
To: linux-kernel; +Cc: David Airlie, dri-devel, virtualization
In-Reply-To: <1481038106-24899-1-git-send-email-mst@redhat.com>
virtio_gpu_cmd_transfer_to_host_2d expects x and y
parameters in LE, but virtio_gpu_primary_plane_update
passes in the CPU format instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
index ba28c0f..1fda965 100644
--- a/drivers/gpu/drm/virtio/virtgpu_plane.c
+++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
@@ -88,8 +88,8 @@ static void virtio_gpu_primary_plane_update(struct drm_plane *plane,
(vgdev, handle, 0,
cpu_to_le32(plane->state->src_w >> 16),
cpu_to_le32(plane->state->src_h >> 16),
- plane->state->src_x >> 16,
- plane->state->src_y >> 16, NULL);
+ cpu_to_le32(plane->state->src_x >> 16),
+ cpu_to_le32(plane->state->src_y >> 16), NULL);
}
} else {
handle = 0;
--
MST
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox