* [PATCH 0/2] Parallel crypto/IPsec v6
@ 2009-10-08  7:25 Steffen Klassert
  2009-10-08  7:27 ` [PATCH 1/2] padata: generic interface for parallel processing Steffen Klassert
  2009-10-08  7:28 ` [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper Steffen Klassert
  0 siblings, 2 replies; 15+ messages in thread
From: Steffen Klassert @ 2009-10-08  7:25 UTC
  To: Herbert Xu, David Miller; +Cc: linux-crypto

This patchset adds the 'pcrypt' parallel crypto template. With this template
it is possible to process the crypto requests of a transform in parallel
without reordering the requests. This is of particular interest for IPsec.

The parallel crypto template is based on a generic parallelization/serialization
method. This method uses the remote softirq invocation infrastructure for
parallelization and serialization. With this method data objects can be
processed in parallel, starting at some given point. After doing some
expensive operations in parallel, it is possible to serialize again.
After serialization, the parallelized data objects return in the same order
they had before the parallelization. In the case of IPsec, this makes it
possible to run the expensive parts in parallel without packet reordering.

Changes from v5:

- rebased to linux-2.6 git current

Changes from v4:

- Use the dynamic percpu allocator
- Drop the obsolete eseqiv changes (eseqiv is the default IV generator
  for blockcipher algorithms on smp machines now).

Changes from v3:

- The generic aead wrapper is dropped.
- tcrypt is extended to test algorithms by name. So it is possible to
  instantiate pcrypt by doing e.g.:
  modprobe tcrypt alg="pcrypt(authenc(hmac(sha1),cbc(aes)))" type=3

Changes from v2:

- The xfrm netlink configuration code is dropped, this will be an extra
  patchset.
- Add generic aead wrapper interface to be able to wrap an aead algorithm
  with an arbitrary crypto template.
- Convert pcrypt to use the generic aead wrapper.
- Add support for aead algorithms to eseqiv.
- Add support for the pcrypt aead wrapper to authenc. It's now possible to
  choose pcrypt as the default authenc wrapper with a module parameter.
- Patchset applies to linux-2.6 git current.

Changes from v1:

- cpu_chainiv is dropped, pcrypt uses eseqiv as its IV generator now.
- Add a xfrm netlink message to be able to choose pcrypt from userspace.
- Use pcrypt only if it is selected from userspace.

Steffen
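The ordering guarantee described in the cover letter can be illustrated with a small user-space sketch. It is not the kernel code from this patchset: all `sketch_*` names and the two constants are hypothetical, there is no locking or softirq handling, and it assumes that each CPU finishes its own objects in the order it received them. It only shows the idea of hashing sequence numbers round-robin to CPUs and releasing finished objects strictly in sequence order.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_NUM_CPUS  3	/* hypothetical: CPUs taking part */
#define SKETCH_QUEUE_LEN 8	/* hypothetical: slots per reorder queue */

/* Per-CPU reorder queue, loosely modelled on struct padata_queue. */
struct sketch_queue {
	int pending[SKETCH_QUEUE_LEN];	/* finished seq_nrs, FIFO order */
	unsigned int head, tail;	/* pop at head, push at tail */
	int num_obj;			/* objects already released */
};

struct sketch_instance {
	struct sketch_queue queue[SKETCH_NUM_CPUS];
	int seq_nr;			/* last sequence number handed out */
};

static void sketch_init(struct sketch_instance *inst)
{
	memset(inst, 0, sizeof(*inst));
	inst->seq_nr = -1;
}

/* Parallelize: take the next sequence number and hash it to a CPU. */
static int sketch_assign(struct sketch_instance *inst, int *cpu)
{
	int seq_nr = ++inst->seq_nr;

	*cpu = seq_nr % SKETCH_NUM_CPUS;
	return seq_nr;
}

/* A CPU finished its work: park the object on that CPU's reorder queue. */
static void sketch_finish(struct sketch_instance *inst, int seq_nr)
{
	struct sketch_queue *q = &inst->queue[seq_nr % SKETCH_NUM_CPUS];

	assert(q->tail - q->head < SKETCH_QUEUE_LEN);
	q->pending[q->tail++ % SKETCH_QUEUE_LEN] = seq_nr;
}

/*
 * Serialize: release objects strictly in sequence order and stop as soon
 * as the next expected one has not finished yet. Returns the number of
 * sequence numbers written to out[].
 */
static int sketch_release(struct sketch_instance *inst, int *out, int max)
{
	int released = 0;

	while (released < max) {
		struct sketch_queue *next = NULL;
		int cpu, next_nr = -1;

		for (cpu = 0; cpu < SKETCH_NUM_CPUS; cpu++) {
			struct sketch_queue *q = &inst->queue[cpu];
			/* seq_nr this queue has to deliver next */
			int expected = q->num_obj * SKETCH_NUM_CPUS + cpu;

			if (next_nr < 0 || expected < next_nr) {
				next_nr = expected;
				next = q;
			}
		}

		if (next->head == next->tail)
			break;	/* next object is still being processed */

		out[released++] = next->pending[next->head++ % SKETCH_QUEUE_LEN];
		next->num_obj++;
	}

	return released;
}
```

If objects 1 and 2 finish before object 0, `sketch_release()` hands out nothing until object 0 arrives and then delivers 0, 1, 2 in one pass. This is the behaviour that lets IPsec run the expensive crypto step on several CPUs without reordering packets.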
* [PATCH 1/2] padata: generic interface for parallel processing
  2009-10-08  7:25 [PATCH 0/2] Parallel crypto/IPsec v6 Steffen Klassert
@ 2009-10-08  7:27 ` Steffen Klassert
  2009-10-08  7:28 ` [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper Steffen Klassert
  1 sibling, 0 replies; 15+ messages in thread
From: Steffen Klassert @ 2009-10-08  7:27 UTC
  To: Herbert Xu, David Miller; +Cc: linux-crypto

This patch introduces an interface to process data objects
in parallel. On request it is possible to serialize again.
The parallelized objects return after serialization in the
same order as they were before the parallelization.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/interrupt.h  |    3 +-
 include/linux/padata.h     |  121 ++++++++++
 include/trace/events/irq.h |    1 +
 kernel/Makefile            |    2 +-
 kernel/padata.c            |  519 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/softirq.c           |    2 +-
 6 files changed, 645 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/padata.h
 create mode 100644 kernel/padata.c

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index b78cf81..1af1e4b 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -350,7 +350,8 @@ enum
 	TASKLET_SOFTIRQ,
 	SCHED_SOFTIRQ,
 	HRTIMER_SOFTIRQ,
-	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */
+	PADATA_SOFTIRQ,
+	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */
 
 	NR_SOFTIRQS
 };
diff --git a/include/linux/padata.h b/include/linux/padata.h
new file mode 100644
index 0000000..a81161d
--- /dev/null
+++ b/include/linux/padata.h
@@ -0,0 +1,121 @@
+/*
+ * padata.h - header for the padata parallelization interface
+ *
+ * Copyright (C) 2008, 2009 secunet Security Networks AG
+ * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public
License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef PADATA_H +#define PADATA_H + +#include <linux/interrupt.h> +#include <linux/smp.h> +#include <linux/list.h> + +enum +{ + NO_PADATA=0, + AEAD_ENC_PADATA, + AEAD_DEC_PADATA, + NR_PADATA +}; + +struct padata_priv { + struct list_head list; + struct call_single_data csd; + int cb_cpu; + int seq_nr; + unsigned int nr; + int info; + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); +}; + +struct padata_queue { + struct list_head list; + atomic_t num_obj; + int cpu_index; + spinlock_t lock; +}; + +struct parallel_data { + struct work_struct work; + struct padata_queue *queue; + atomic_t seq_nr; + atomic_t queued_objects; + cpumask_var_t cpumask; + cpumask_var_t new_cpumask; + u8 flags; +#define PADATA_INIT 1 +#define PADATA_FLUSH_HARD 2 +#define PADATA_RESET_IN_PROGRESS 4 + spinlock_t lock; +}; + +#ifdef CONFIG_USE_GENERIC_SMP_HELPERS +extern void __init padata_init(unsigned int nr, const struct cpumask *cpumask); +extern void padata_dont_wait(unsigned int nr, struct padata_priv *padata); +extern int padata_do_parallel(unsigned int softirq_nr, unsigned int nr, + struct padata_priv *padata, int cb_cpu); +extern int padata_do_serial(unsigned int nr, struct padata_priv *padata); +extern int padata_cpumask_weight(unsigned int nr); +extern int padata_index_to_cpu(unsigned int nr, int cpu_index); +extern void padata_set_cpumask(unsigned int nr, cpumask_var_t cpumask); +extern void 
padata_add_cpu(unsigned int nr, int cpu);
+extern void padata_remove_cpu(unsigned int nr, int cpu);
+extern void padata_start(unsigned int nr);
+extern void padata_stop(unsigned int nr);
+#else
+static inline void padata_init(unsigned int nr, const struct cpumask *cpumask)
+{
+}
+static inline void padata_dont_wait(unsigned int nr, struct padata_priv *padata)
+{
+}
+static inline int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+				     struct padata_priv *padata, int cb_cpu)
+{
+	return 0;
+}
+static inline int padata_do_serial(unsigned int nr, struct padata_priv *padata)
+{
+	return 0;
+}
+static inline int padata_cpumask_weight(unsigned int nr)
+{
+	return 0;
+}
+static inline int padata_index_to_cpu(unsigned int nr, int cpu_index)
+{
+	return -ENOSYS;
+}
+static inline void padata_set_cpumask(unsigned int nr, cpumask_var_t cpumask)
+{
+}
+static inline void padata_add_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_remove_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_start(unsigned int nr)
+{
+}
+static inline void padata_stop(unsigned int nr)
+{
+}
+#endif
+#endif
diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h
index b89f9db..69584a5 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -19,6 +19,7 @@
 			 softirq_name(TASKLET),		\
 			 softirq_name(SCHED),		\
 			 softirq_name(HRTIMER),		\
+			 softirq_name(PADATA),		\
 			 softirq_name(RCU))
 
 /**
diff --git a/kernel/Makefile b/kernel/Makefile
index b8d4cd8..e8e2ecc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -41,7 +41,7 @@ obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
 obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
 obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
-obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o
+obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o padata.o
 ifneq ($(CONFIG_SMP),y)
 obj-y += up.o
 endif
diff --git a/kernel/padata.c b/kernel/padata.c
new file mode 100644
index 0000000..3e4065b
--- /dev/null
+++
b/kernel/padata.c @@ -0,0 +1,519 @@ +/* + * padata.c - generic interface to process data streams in parallel + * + * Copyright (C) 2008, 2009 secunet Security Networks AG + * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <linux/module.h> +#include <linux/cpumask.h> +#include <linux/err.h> +#include <linux/padata.h> + +#define MAX_SEQ_NR INT_MAX - NR_CPUS + +static struct parallel_data padata_vec[NR_PADATA]; +static struct padata_priv *padata_get_next(struct parallel_data *par_data); + +static void padata_flush_hard(struct parallel_data *par_data) +{ + int cpu; + struct padata_priv *padata; + struct padata_queue *queue; + + for_each_cpu(cpu, par_data->cpumask) { + queue = per_cpu_ptr(par_data->queue, cpu); + + while(!list_empty(&queue->list)) { + padata = list_entry(queue->list.next, + struct padata_priv, list); + + spin_lock(&queue->lock); + list_del_init(&padata->list); + spin_unlock(&queue->lock); + + atomic_dec(&par_data->queued_objects); + padata->serial(padata); + } + } +} + +static void padata_flush_order(struct parallel_data *par_data) +{ + struct padata_priv *padata; + + while (1) { + padata = padata_get_next(par_data); + + if (padata && !IS_ERR(padata)) + padata->serial(padata); + else + break; + } + + padata_flush_hard(par_data); +} + +static void 
padata_reset_work(struct work_struct *work) +{ + int cpu, cpu_index; + struct padata_queue *queue; + struct parallel_data *par_data; + + par_data = container_of(work, struct parallel_data, work); + + if (par_data->flags & (PADATA_INIT|PADATA_RESET_IN_PROGRESS)) + return; + + spin_lock_bh(&par_data->lock); + par_data->flags |= PADATA_RESET_IN_PROGRESS; + + if (!(par_data->flags & PADATA_FLUSH_HARD)) + padata_flush_order(par_data); + else + padata_flush_hard(par_data); + + cpu_index = 0; + + cpumask_copy(par_data->cpumask, par_data->new_cpumask); + + for_each_cpu(cpu, par_data->cpumask) { + queue = per_cpu_ptr(par_data->queue, cpu); + + atomic_set(&queue->num_obj, 0); + queue->cpu_index = cpu_index; + cpu_index++; + } + spin_unlock_bh(&par_data->lock); + + atomic_set(&par_data->seq_nr, -1); + par_data->flags &= ~PADATA_RESET_IN_PROGRESS; + par_data->flags |= PADATA_INIT; +} + +static struct padata_priv *padata_get_next(struct parallel_data *par_data) +{ + int cpu, num_cpus, empty; + int seq_nr, calc_seq_nr, next_nr; + struct padata_queue *queue, *next_queue; + struct padata_priv *padata; + + empty = 0; + next_nr = -1; + next_queue = NULL; + + num_cpus = cpumask_weight(par_data->cpumask); + + for_each_cpu(cpu, par_data->cpumask) { + queue = per_cpu_ptr(par_data->queue, cpu); + + /* + * Calculate the seq_nr of the object that should be + * next in this queue. 
+ */ + calc_seq_nr = (atomic_read(&queue->num_obj) * num_cpus) + + queue->cpu_index; + + if (!list_empty(&queue->list)) { + padata = list_entry(queue->list.next, + struct padata_priv, list); + + seq_nr = padata->seq_nr; + + if (unlikely(calc_seq_nr != seq_nr)) { + par_data->flags &= ~PADATA_INIT; + par_data->flags |= PADATA_FLUSH_HARD; + padata = NULL; + goto out; + } + } else { + seq_nr = calc_seq_nr; + empty++; + } + + if (next_nr < 0 || seq_nr < next_nr) { + next_nr = seq_nr; + next_queue = queue; + } + } + + padata = NULL; + + if (empty == num_cpus) + goto out; + + if (!list_empty(&next_queue->list)) { + padata = list_entry(next_queue->list.next, + struct padata_priv, list); + + spin_lock(&next_queue->lock); + list_del_init(&padata->list); + spin_unlock(&next_queue->lock); + + atomic_dec(&par_data->queued_objects); + atomic_inc(&next_queue->num_obj); + + goto out; + } + + if (next_nr % num_cpus == next_queue->cpu_index) { + padata = ERR_PTR(-ENODATA); + goto out; + } + + padata = ERR_PTR(-EINPROGRESS); +out: + return padata; +} + +static void padata_action(struct softirq_action *h) +{ + struct list_head *cpu_list, local_list; + + cpu_list = &__get_cpu_var(softirq_work_list[PADATA_SOFTIRQ]); + + local_irq_disable(); + list_replace_init(cpu_list, &local_list); + local_irq_enable(); + + while (!list_empty(&local_list)) { + struct padata_priv *padata; + + padata = list_entry(local_list.next, + struct padata_priv, csd.list); + + list_del_init(&padata->csd.list); + + padata->serial(padata); + } +} + +static int padata_cpu_hash(unsigned int nr, struct padata_priv *padata) +{ + int this_cpu, cpu_index; + + this_cpu = smp_processor_id(); + + if (padata->nr != 0) + return this_cpu; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return this_cpu; + + padata->seq_nr = atomic_inc_return(&padata_vec[nr].seq_nr); + + if (padata->seq_nr > MAX_SEQ_NR) { + padata_vec[nr].flags &= ~PADATA_INIT; + padata->seq_nr = 0; + schedule_work(&padata_vec[nr].work); + return this_cpu; + } + 
+ padata->nr = nr; + + /* + * Hash the sequence numbers to the cpus by taking + * seq_nr mod. number of cpus in use. + */ + cpu_index = padata->seq_nr % cpumask_weight(padata_vec[nr].cpumask); + + return padata_index_to_cpu(nr, cpu_index); +} + +/* + * padata_dont_wait - must be called if an object that runs in parallel will + * not be serialized with padata_do_serial + * + * @nr: number of the padata instance + * @padata: object that will not be seen by padata_do_serial + */ +void padata_dont_wait(unsigned int nr, struct padata_priv *padata) +{ + struct padata_queue *queue; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return; + + if (padata->nr == 0 || padata->nr != nr) + return; + + queue = per_cpu_ptr(padata_vec[nr].queue, smp_processor_id()); + atomic_inc(&queue->num_obj); + + padata->nr = 0; + padata->seq_nr = 0; +} +EXPORT_SYMBOL(padata_dont_wait); + +/* + * padata_do_parallel - padata parallelization function + * + * @softirq_nr: number of the softirq that will do the parallelization + * @nr: number of the padata instance + * @padata: object to be parallelized + * @cb_cpu: cpu number on which the serialization callback function will run + */ +int padata_do_parallel(unsigned int softirq_nr, unsigned int nr, + struct padata_priv *padata, int cb_cpu) +{ + int target_cpu; + + padata->cb_cpu = cb_cpu; + + local_bh_disable(); + target_cpu = padata_cpu_hash(nr, padata); + local_bh_enable(); + + send_remote_softirq(&padata->csd, target_cpu, softirq_nr); + + return 1; +} +EXPORT_SYMBOL(padata_do_parallel); + +/* + * padata_do_serial - padata serialization function + * + * @nr: number of the padata instance + * @padata: object to be serialized + * + * returns 1 if the serialization callback function will be called + * from padata, 0 else + */ +int padata_do_serial(unsigned int nr, struct padata_priv *padata) +{ + int cpu; + struct padata_queue *reorder_queue; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return 0; + + if (padata->nr != nr || padata->nr == 0) { 
+ padata->serial(padata); + return 1; + } + + cpu = smp_processor_id(); + + reorder_queue = per_cpu_ptr(padata_vec[nr].queue, cpu); + + spin_lock(&reorder_queue->lock); + list_add_tail(&padata->list, &reorder_queue->list); + spin_unlock(&reorder_queue->lock); + + atomic_inc(&padata_vec[nr].queued_objects); + +try_again: + if (!spin_trylock(&padata_vec[nr].lock)) + goto out; + + while(1) { + padata = padata_get_next(&padata_vec[nr]); + + if (!padata || PTR_ERR(padata) == -EINPROGRESS) + break; + if (PTR_ERR(padata) == -ENODATA) { + spin_unlock(&padata_vec[nr].lock); + goto out; + } + + send_remote_softirq(&padata->csd, padata->cb_cpu, + PADATA_SOFTIRQ); + } + + if (unlikely(!(padata_vec[nr].flags & PADATA_INIT))) { + spin_unlock(&padata_vec[nr].lock); + goto reset_out; + } + + spin_unlock(&padata_vec[nr].lock); + + if (atomic_read(&padata_vec[nr].queued_objects)) + goto try_again; + +out: + return 1; +reset_out: + schedule_work(&padata_vec[nr].work); + return 1; +} +EXPORT_SYMBOL(padata_do_serial); + +/* + * padata_cpumask_weight - get the number of cpus that are actually in use + * + * @nr: number of the padata instance + */ +int padata_cpumask_weight(unsigned int nr) +{ + return cpumask_weight(padata_vec[nr].cpumask); +} +EXPORT_SYMBOL(padata_cpumask_weight); + +/* + * padata_index_to_cpu - get the cpu for a given cpu index + * + * @nr: number of the padata instance + * @cpu_index: index of the cpu in question + * + * The range of cpu_index is 0 <= cpu_index < padata_cpumask_weight(), + * so padata_cpumask_weight must be called before padata_index_to_cpu. 
+ */
+int padata_index_to_cpu(unsigned int nr, int cpu_index)
+{
+	int cpu, target_cpu;
+
+	target_cpu = cpumask_first(padata_vec[nr].cpumask);
+	for (cpu = 0; cpu < cpu_index; cpu++)
+		target_cpu = cpumask_next(target_cpu, padata_vec[nr].cpumask);
+
+	return target_cpu;
+}
+EXPORT_SYMBOL(padata_index_to_cpu);
+
+/*
+ * padata_set_cpumask - set the cpumask that padata uses
+ *
+ * @nr: number of the padata instance
+ * @cpumask: the cpumask to use
+ */
+void padata_set_cpumask(unsigned int nr, cpumask_var_t cpumask)
+{
+	cpumask_copy(padata_vec[nr].new_cpumask, cpumask);
+	padata_vec[nr].flags &= ~PADATA_INIT;
+	padata_vec[nr].flags |= PADATA_FLUSH_HARD;
+
+	schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_set_cpumask);
+
+/*
+ * padata_add_cpu - add a cpu to the padata cpumask
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to add
+ */
+void padata_add_cpu(unsigned int nr, int cpu)
+{
+	cpumask_set_cpu(cpu, padata_vec[nr].cpumask);
+	padata_set_cpumask(nr, padata_vec[nr].cpumask);
+}
+EXPORT_SYMBOL(padata_add_cpu);
+
+/*
+ * padata_remove_cpu - remove a cpu from the padata cpumask
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to remove
+ */
+void padata_remove_cpu(unsigned int nr, int cpu)
+{
+	cpumask_clear_cpu(cpu, padata_vec[nr].cpumask);
+	padata_set_cpumask(nr, padata_vec[nr].cpumask);
+}
+EXPORT_SYMBOL(padata_remove_cpu);
+
+/*
+ * padata_start - start the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_start(unsigned int nr)
+{
+	if (padata_vec[nr].flags & PADATA_INIT)
+		return;
+
+	schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_start);
+
+/*
+ * padata_stop - stop the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_stop(unsigned int nr)
+{
+	padata_vec[nr].flags &= ~PADATA_INIT;
+}
+EXPORT_SYMBOL(padata_stop);
+
+/*
+ * padata_init - initialize a padata instance
+ *
+ * @nr: number of the padata instance
+ * @cpumask: cpumask that padata uses
for parallelization + */ +void __init padata_init(unsigned int nr, const struct cpumask *cpumask) +{ + int cpu, cpu_index; + struct padata_queue *percpu_queue, *queue; + + percpu_queue = alloc_percpu(struct padata_queue); + + if (!percpu_queue) { + printk("padata_init: Failed to alloc the serialization" + "queues for padata nr %d, exiting!\n", nr); + return; + } + + if (!alloc_cpumask_var(&padata_vec[nr].cpumask, GFP_KERNEL)) + goto err_free; + + if (!alloc_cpumask_var(&padata_vec[nr].new_cpumask, GFP_KERNEL)) + goto err_free_mask; + + cpu_index = 0; + + for_each_possible_cpu(cpu) { + queue = per_cpu_ptr(percpu_queue, cpu); + + if (cpumask_test_cpu(cpu, cpumask)) { + queue->cpu_index = cpu_index; + cpu_index++; + } + + INIT_LIST_HEAD(&queue->list); + spin_lock_init(&queue->lock); + atomic_set(&queue->num_obj, 0); + } + + INIT_WORK(&padata_vec[nr].work, padata_reset_work); + + cpumask_copy(padata_vec[nr].cpumask, cpumask); + cpumask_copy(padata_vec[nr].new_cpumask, cpumask); + + atomic_set(&padata_vec[nr].seq_nr, -1); + atomic_set(&padata_vec[nr].queued_objects, 0); + padata_vec[nr].queue = percpu_queue; + padata_vec[nr].flags = 0; + spin_lock_init(&padata_vec[nr].lock); + + return; + +err_free_mask: + free_cpumask_var(padata_vec[nr].cpumask); + +err_free: + free_percpu(percpu_queue); +} + +static int __init padata_initcall(void) +{ + open_softirq(PADATA_SOFTIRQ, padata_action); + + return 0; +} +subsys_initcall(padata_initcall); diff --git a/kernel/softirq.c b/kernel/softirq.c index f8749e5..4317559 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -58,7 +58,7 @@ static DEFINE_PER_CPU(struct task_struct *, ksoftirqd); char *softirq_to_name[NR_SOFTIRQS] = { "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL", - "TASKLET", "SCHED", "HRTIMER", "RCU" + "TASKLET", "SCHED", "HRTIMER","PADATA", "RCU" }; /* -- 1.5.4.2 ^ permalink raw reply related [flat|nested] 15+ messages in thread
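The patch above picks the target CPU by taking the sequence number modulo the number of CPUs in the cpumask and then walking the mask to the CPU with that index. A minimal user-space sketch of this mapping follows. It assumes a plain `unsigned long` bitmask in place of the kernel cpumask API, and every `sketch_*` name is hypothetical. Unlike `padata_index_to_cpu()` in the patch, the sketch returns -1 for an out-of-range index.

```c
#include <assert.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in use. */
typedef unsigned long sketch_cpumask;

/* Number of CPUs in the mask, the analogue of padata_cpumask_weight(). */
static int sketch_cpumask_weight(sketch_cpumask mask)
{
	int weight = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		weight++;

	return weight;
}

/* CPU number of the cpu_index-th set bit, or -1 if the index is out of range. */
static int sketch_index_to_cpu(sketch_cpumask mask, int cpu_index)
{
	int cpu;

	if (cpu_index < 0)
		return -1;

	for (cpu = 0; mask; cpu++, mask >>= 1) {
		if ((mask & 1UL) && cpu_index-- == 0)
			return cpu;
	}

	return -1;
}

/* Hash a sequence number to a CPU: seq_nr modulo the number of CPUs in use. */
static int sketch_seq_to_cpu(sketch_cpumask mask, int seq_nr)
{
	int weight = sketch_cpumask_weight(mask);

	assert(weight > 0 && seq_nr >= 0);
	return sketch_index_to_cpu(mask, seq_nr % weight);
}
```

With a mask that contains CPUs 0, 2, 3 and 5, consecutive sequence numbers cycle through exactly those four CPUs. The same modulo rule is what lets the serialization side compute which sequence number each per-CPU queue has to deliver next.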
* [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
  2009-10-08  7:25 [PATCH 0/2] Parallel crypto/IPsec v6 Steffen Klassert
  2009-10-08  7:27 ` [PATCH 1/2] padata: generic interface for parallel processing Steffen Klassert
@ 2009-10-08  7:28 ` Steffen Klassert
  2009-10-09  6:18   ` David Miller
  1 sibling, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-10-08  7:28 UTC
  To: Herbert Xu, David Miller; +Cc: linux-crypto

This patch adds a parallel crypto template that takes a crypto
algorithm and converts it to process the crypto transforms in
parallel. For the moment only aead is supported.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 crypto/Kconfig             |   13 ++
 crypto/Makefile            |    2 +
 crypto/pcrypt.c            |  409 ++++++++++++++++++++++++++++++++++++++++++++
 crypto/pcrypt_core.c       |  106 ++++++++++++
 include/crypto/pcrypt.h    |   51 ++++++
 include/linux/interrupt.h  |    2 +
 include/trace/events/irq.h |    2 +
 kernel/softirq.c           |    3 +-
 8 files changed, 587 insertions(+), 1 deletions(-)
 create mode 100644 crypto/pcrypt.c
 create mode 100644 crypto/pcrypt_core.c
 create mode 100644 include/crypto/pcrypt.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 26b5dd0..ce1bc59 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -114,6 +114,19 @@ config CRYPTO_NULL
 	help
 	  These are 'Null' algorithms, used by IPsec, which do nothing.
 
+config CRYPTO_PCRYPT_CORE
+	bool
+
+config CRYPTO_PCRYPT
+	tristate "Parallel crypto engine (EXPERIMENTAL)"
+	depends on USE_GENERIC_SMP_HELPERS && EXPERIMENTAL
+	select CRYPTO_MANAGER
+	select CRYPTO_PCRYPT_CORE
+	select CRYPTO_AEAD
+	help
+	  This converts an arbitrary crypto algorithm into a parallel
+	  algorithm that is executed in a softirq.
+ config CRYPTO_WORKQUEUE tristate diff --git a/crypto/Makefile b/crypto/Makefile index 9e8f619..0d0d02d 100644 --- a/crypto/Makefile +++ b/crypto/Makefile @@ -56,6 +56,8 @@ obj-$(CONFIG_CRYPTO_XTS) += xts.o obj-$(CONFIG_CRYPTO_CTR) += ctr.o obj-$(CONFIG_CRYPTO_GCM) += gcm.o obj-$(CONFIG_CRYPTO_CCM) += ccm.o +obj-$(CONFIG_CRYPTO_PCRYPT_CORE) += pcrypt_core.o +obj-$(CONFIG_CRYPTO_PCRYPT) += pcrypt.o obj-$(CONFIG_CRYPTO_CRYPTD) += cryptd.o obj-$(CONFIG_CRYPTO_DES) += des_generic.o obj-$(CONFIG_CRYPTO_FCRYPT) += fcrypt.o diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c new file mode 100644 index 0000000..c8648e6 --- /dev/null +++ b/crypto/pcrypt.c @@ -0,0 +1,409 @@ +/* + * pcrypt - Parallel crypto wrapper. + * + * Copyright (C) 2009 secunet Security Networks AG + * Copyright (C) 2009 Steffen Klassert <steffen.klassert@secunet.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include <crypto/algapi.h> +#include <crypto/internal/aead.h> +#include <linux/err.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <crypto/pcrypt.h> + +struct pcrypt_instance_ctx { + struct crypto_spawn spawn; + unsigned int tfm_count; +}; + +struct pcrypt_aead_ctx { + struct crypto_aead *child; + unsigned int tfm_nr; +}; + +static int pcrypt_do_parallel(struct padata_priv *padata, unsigned int tfm_nr, + unsigned int softirq, unsigned int padata_nr) +{ + unsigned int cpu_index, num_cpus, cb_cpu; + + num_cpus = padata_cpumask_weight(padata_nr); + if (!num_cpus) + return 0; + + cpu_index = tfm_nr % num_cpus; + + cb_cpu = padata_index_to_cpu(padata_nr, cpu_index); + + return padata_do_parallel(softirq, padata_nr, padata, cb_cpu); +} + +static int pcrypt_aead_setkey(struct crypto_aead *parent, + const u8 *key, unsigned int keylen) +{ + struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(parent); + + return crypto_aead_setkey(ctx->child, key, keylen); +} + +static int pcrypt_aead_setauthsize(struct crypto_aead *parent, + unsigned int authsize) +{ + struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(parent); + + return crypto_aead_setauthsize(ctx->child, authsize); +} + +static void pcrypt_aead_serial(struct padata_priv *padata) +{ + struct pcrypt_request *preq = pcrypt_padata_request(padata); + struct aead_request *req = pcrypt_request_ctx(preq); + + aead_request_complete(req->base.data, padata->info); +} + +static void pcrypt_aead_giv_serial(struct padata_priv *padata) +{ + struct pcrypt_request *preq = pcrypt_padata_request(padata); + struct aead_givcrypt_request *req = pcrypt_request_ctx(preq); + + aead_request_complete(req->areq.base.data, padata->info); +} + +static void pcrypt_aead_done(struct crypto_async_request *areq, int err) +{ + struct aead_request *req = areq->data; + struct pcrypt_request *preq = aead_request_ctx(req); + struct padata_priv *padata = pcrypt_request_padata(preq); + + padata->info = err; + 
req->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP; + + local_bh_disable(); + if (padata_do_serial(padata->nr, padata)) + goto out; + + aead_request_complete(req, padata->info); + +out: + local_bh_enable(); +} + +static void pcrypt_aead_enc(struct padata_priv *padata) +{ + struct pcrypt_request *preq = pcrypt_padata_request(padata); + struct aead_request *req = pcrypt_request_ctx(preq); + + padata->info = crypto_aead_encrypt(req); + + if (padata->info) + return; + + if (padata_do_serial(AEAD_ENC_PADATA, padata)) + return; + + aead_request_complete(req->base.data, padata->info); +} + +static int pcrypt_aead_encrypt(struct aead_request *req) +{ + int err; + struct pcrypt_request *preq = aead_request_ctx(req); + struct aead_request *creq = pcrypt_request_ctx(preq); + struct padata_priv *padata = pcrypt_request_padata(preq); + struct crypto_aead *aead = crypto_aead_reqtfm(req); + struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead); + u32 flags = aead_request_flags(req); + + memset(padata, 0, sizeof(struct padata_priv)); + + padata->parallel = pcrypt_aead_enc; + padata->serial = pcrypt_aead_serial; + + aead_request_set_tfm(creq, ctx->child); + aead_request_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP, + pcrypt_aead_done, req); + aead_request_set_crypt(creq, req->src, req->dst, + req->cryptlen, req->iv); + aead_request_set_assoc(creq, req->assoc, req->assoclen); + + if (pcrypt_do_parallel(padata, ctx->tfm_nr, AEAD_ENC_SOFTIRQ, + AEAD_ENC_PADATA)) + err = -EINPROGRESS; + else + err = crypto_aead_encrypt(creq); + + return err; +} + +static void pcrypt_aead_dec(struct padata_priv *padata) +{ + struct pcrypt_request *preq = pcrypt_padata_request(padata); + struct aead_request *req = pcrypt_request_ctx(preq); + + padata->info = crypto_aead_decrypt(req); + + if (padata->info) + return; + + if (padata_do_serial(AEAD_DEC_PADATA, padata)) + return; + + aead_request_complete(req->base.data, padata->info); +} + +static int pcrypt_aead_decrypt(struct aead_request *req) +{ + int 
err; + struct pcrypt_request *preq = aead_request_ctx(req); + struct aead_request *creq = pcrypt_request_ctx(preq); + struct padata_priv *padata = pcrypt_request_padata(preq); + struct crypto_aead *aead = crypto_aead_reqtfm(req); + struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead); + u32 flags = aead_request_flags(req); + + memset(padata, 0, sizeof(struct padata_priv)); + + padata->parallel = pcrypt_aead_dec; + padata->serial = pcrypt_aead_serial; + + aead_request_set_tfm(creq, ctx->child); + aead_request_set_callback(creq, flags & ~CRYPTO_TFM_REQ_MAY_SLEEP, + pcrypt_aead_done, req); + aead_request_set_crypt(creq, req->src, req->dst, + req->cryptlen, req->iv); + aead_request_set_assoc(creq, req->assoc, req->assoclen); + + if (pcrypt_do_parallel(padata, ctx->tfm_nr, AEAD_DEC_SOFTIRQ, + AEAD_DEC_PADATA)) + err = -EINPROGRESS; + else + err = crypto_aead_decrypt(creq); + + return err; +} + +static void pcrypt_aead_givenc(struct padata_priv *padata) +{ + struct pcrypt_request *preq = pcrypt_padata_request(padata); + struct aead_givcrypt_request *req = pcrypt_request_ctx(preq); + + padata->info = crypto_aead_givencrypt(req); + + if (padata->info) + return; + + if (padata_do_serial(AEAD_ENC_PADATA, padata)) + return; + + aead_request_complete(req->areq.base.data, padata->info); +} + +static int pcrypt_aead_givencrypt(struct aead_givcrypt_request *req) +{ + int err; + struct aead_request *areq = &req->areq; + struct pcrypt_request *preq = aead_request_ctx(areq); + struct aead_givcrypt_request *creq = pcrypt_request_ctx(preq); + struct padata_priv *padata = pcrypt_request_padata(preq); + struct crypto_aead *aead = aead_givcrypt_reqtfm(req); + struct pcrypt_aead_ctx *ctx = crypto_aead_ctx(aead); + u32 flags = aead_request_flags(areq); + + memset(padata, 0, sizeof(struct padata_priv)); + + padata->parallel = pcrypt_aead_givenc; + padata->serial = pcrypt_aead_giv_serial; + + aead_givcrypt_set_tfm(creq, ctx->child); + aead_givcrypt_set_callback(creq, flags & 
~CRYPTO_TFM_REQ_MAY_SLEEP, + pcrypt_aead_done, areq); + aead_givcrypt_set_crypt(creq, areq->src, areq->dst, + areq->cryptlen, areq->iv); + aead_givcrypt_set_assoc(creq, areq->assoc, areq->assoclen); + aead_givcrypt_set_giv(creq, req->giv, req->seq); + + + if (pcrypt_do_parallel(padata, ctx->tfm_nr, AEAD_ENC_SOFTIRQ, + AEAD_ENC_PADATA)) + err = -EINPROGRESS; + else + err = crypto_aead_givencrypt(creq); + + return err; +} + +static int pcrypt_aead_init_tfm(struct crypto_tfm *tfm) +{ + struct crypto_instance *inst = crypto_tfm_alg_instance(tfm); + struct pcrypt_instance_ctx *ictx = crypto_instance_ctx(inst); + struct pcrypt_aead_ctx *ctx = crypto_tfm_ctx(tfm); + struct crypto_aead *cipher; + + ictx->tfm_count++; + ctx->tfm_nr = ictx->tfm_count; + + cipher = crypto_spawn_aead(crypto_instance_ctx(inst)); + + if (IS_ERR(cipher)) + return PTR_ERR(cipher); + + ctx->child = cipher; + tfm->crt_aead.reqsize = sizeof(struct pcrypt_request) + + sizeof(struct aead_givcrypt_request) + + crypto_aead_reqsize(cipher); + + return 0; +} + +static void pcrypt_aead_exit_tfm(struct crypto_tfm *tfm) +{ + struct pcrypt_aead_ctx *ctx = crypto_tfm_ctx(tfm); + + crypto_free_aead(ctx->child); +} + +static struct crypto_instance *pcrypt_alloc_instance(struct crypto_alg *alg) +{ + struct crypto_instance *inst; + struct pcrypt_instance_ctx *ctx; + int err; + + inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL); + if (!inst) { + inst = ERR_PTR(-ENOMEM); + goto out; + } + + err = -ENAMETOOLONG; + if (snprintf(inst->alg.cra_driver_name, CRYPTO_MAX_ALG_NAME, + "pcrypt(%s)", alg->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) + goto out_free_inst; + + memcpy(inst->alg.cra_name, alg->cra_name, CRYPTO_MAX_ALG_NAME); + + ctx = crypto_instance_ctx(inst); + err = crypto_init_spawn(&ctx->spawn, alg, inst, + CRYPTO_ALG_TYPE_MASK); + if (err) + goto out_free_inst; + + inst->alg.cra_priority = alg->cra_priority + 100; + inst->alg.cra_blocksize = alg->cra_blocksize; + inst->alg.cra_alignmask = 
alg->cra_alignmask; + +out: + return inst; + +out_free_inst: + kfree(inst); + inst = ERR_PTR(err); + goto out; +} + +static struct crypto_instance *pcrypt_alloc_aead(struct rtattr **tb) +{ + struct crypto_instance *inst; + struct crypto_alg *alg; + struct crypto_attr_type *algt; + + algt = crypto_get_attr_type(tb); + + alg = crypto_get_attr_alg(tb, algt->type, + (algt->mask & CRYPTO_ALG_TYPE_MASK)); + if (IS_ERR(alg)) + return ERR_CAST(alg); + + inst = pcrypt_alloc_instance(alg); + if (IS_ERR(inst)) + goto out_put_alg; + + inst->alg.cra_flags = CRYPTO_ALG_TYPE_AEAD | CRYPTO_ALG_ASYNC; + inst->alg.cra_type = &crypto_aead_type; + + inst->alg.cra_aead.ivsize = alg->cra_aead.ivsize; + inst->alg.cra_aead.geniv = alg->cra_aead.geniv; + inst->alg.cra_aead.maxauthsize = alg->cra_aead.maxauthsize; + + inst->alg.cra_ctxsize = sizeof(struct pcrypt_aead_ctx); + + inst->alg.cra_init = pcrypt_aead_init_tfm; + inst->alg.cra_exit = pcrypt_aead_exit_tfm; + + inst->alg.cra_aead.setkey = pcrypt_aead_setkey; + inst->alg.cra_aead.setauthsize = pcrypt_aead_setauthsize; + inst->alg.cra_aead.encrypt = pcrypt_aead_encrypt; + inst->alg.cra_aead.decrypt = pcrypt_aead_decrypt; + inst->alg.cra_aead.givencrypt = pcrypt_aead_givencrypt; + +out_put_alg: + crypto_mod_put(alg); + return inst; +} + +static struct crypto_instance *pcrypt_alloc(struct rtattr **tb) +{ + struct crypto_attr_type *algt; + + algt = crypto_get_attr_type(tb); + if (IS_ERR(algt)) + return ERR_CAST(algt); + + switch (algt->type & algt->mask & CRYPTO_ALG_TYPE_MASK) { + case CRYPTO_ALG_TYPE_AEAD: + return pcrypt_alloc_aead(tb); + } + + return ERR_PTR(-EINVAL); +} + +static void pcrypt_free(struct crypto_instance *inst) +{ + struct pcrypt_instance_ctx *ctx = crypto_instance_ctx(inst); + + crypto_drop_spawn(&ctx->spawn); + kfree(inst); +} + +static struct crypto_template pcrypt_tmpl = { + .name = "pcrypt", + .alloc = pcrypt_alloc, + .free = pcrypt_free, + .module = THIS_MODULE, +}; + +static int __init pcrypt_init(void) +{ + 
padata_start(AEAD_ENC_PADATA); + padata_start(AEAD_DEC_PADATA); + + return crypto_register_template(&pcrypt_tmpl); +} + +static void __exit pcrypt_exit(void) +{ + padata_stop(AEAD_ENC_PADATA); + padata_stop(AEAD_DEC_PADATA); + + crypto_unregister_template(&pcrypt_tmpl); +} + +module_init(pcrypt_init); +module_exit(pcrypt_exit); + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("Parallel crypto engine"); diff --git a/crypto/pcrypt_core.c b/crypto/pcrypt_core.c new file mode 100644 index 0000000..065d0a3 --- /dev/null +++ b/crypto/pcrypt_core.c @@ -0,0 +1,106 @@ +/* + * pcrypt_core.c - Core functions for the pcrypt crypto parallelization + * + * Copyright (C) 2009 secunet Security Networks AG + * Copyright (C) 2009 Steffen Klassert <steffen.klassert@secunet.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include <linux/interrupt.h> +#include <linux/cpu.h> +#include <linux/err.h> +#include <linux/module.h> +#include <crypto/pcrypt.h> + +static void aead_enc_action(struct softirq_action *h) +{ + struct list_head *cpu_list, local_list; + + cpu_list = &__get_cpu_var(softirq_work_list[AEAD_ENC_SOFTIRQ]); + + local_irq_disable(); + list_replace_init(cpu_list, &local_list); + local_irq_enable(); + + while (!list_empty(&local_list)) { + struct padata_priv *padata; + + padata = list_entry(local_list.next, struct padata_priv, + csd.list); + + list_del_init(&padata->csd.list); + + padata->parallel(padata); + } +} + +static void aead_dec_action(struct softirq_action *h) +{ + struct list_head *cpu_list, local_list; + + cpu_list = &__get_cpu_var(softirq_work_list[AEAD_DEC_SOFTIRQ]); + + local_irq_disable(); + list_replace_init(cpu_list, &local_list); + local_irq_enable(); + + while (!list_empty(&local_list)) { + struct padata_priv *padata; + + padata = list_entry(local_list.next, struct padata_priv, + csd.list); + + list_del_init(&padata->csd.list); + + padata->parallel(padata); + } +} + +static int __devinit pcrypt_cpu_callback(struct notifier_block *nfb, + unsigned long action, void *hcpu) +{ + int cpu = (unsigned long)hcpu; + + switch (action) { + case CPU_ONLINE: + case CPU_ONLINE_FROZEN: + padata_add_cpu(AEAD_ENC_PADATA, cpu); + padata_add_cpu(AEAD_DEC_PADATA, cpu); + break; + + case CPU_DEAD: + case CPU_DEAD_FROZEN: + padata_remove_cpu(AEAD_ENC_PADATA, cpu); + padata_remove_cpu(AEAD_DEC_PADATA, cpu); + + break; + } + + return NOTIFY_OK; +} + +static int __init pcrypt_init_padata(void) +{ + open_softirq(AEAD_ENC_SOFTIRQ, aead_enc_action); + open_softirq(AEAD_DEC_SOFTIRQ, aead_dec_action); + + padata_init(AEAD_ENC_PADATA, cpu_online_mask); + padata_init(AEAD_DEC_PADATA, cpu_online_mask); + + hotcpu_notifier(pcrypt_cpu_callback, 0); + + return 0; +} +subsys_initcall(pcrypt_init_padata); diff --git a/include/crypto/pcrypt.h b/include/crypto/pcrypt.h new file mode 
100644 index 0000000..d7d8bd8 --- /dev/null +++ b/include/crypto/pcrypt.h @@ -0,0 +1,51 @@ +/* + * pcrypt - Parallel crypto engine. + * + * Copyright (C) 2009 secunet Security Networks AG + * Copyright (C) 2009 Steffen Klassert <steffen.klassert@secunet.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef _CRYPTO_PCRYPT_H +#define _CRYPTO_PCRYPT_H + +#include <linux/crypto.h> +#include <linux/kernel.h> +#include <linux/padata.h> + +struct pcrypt_request { + struct padata_priv padata; + void *data; + void *__ctx[] CRYPTO_MINALIGN_ATTR; +}; + +static inline void *pcrypt_request_ctx(struct pcrypt_request *req) +{ + return req->__ctx; +} + +static inline +struct padata_priv *pcrypt_request_padata(struct pcrypt_request *req) +{ + return &req->padata; +} + +static inline +struct pcrypt_request *pcrypt_padata_request(struct padata_priv *padata) +{ + return container_of(padata, struct pcrypt_request, padata); +} + +#endif diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 1af1e4b..6bc5dde 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -350,6 +350,8 @@ enum TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ, + AEAD_ENC_SOFTIRQ, + AEAD_DEC_SOFTIRQ, PADATA_SOFTIRQ, RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */ diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h 
index 69584a5..ded7178 100644
--- a/include/trace/events/irq.h
+++ b/include/trace/events/irq.h
@@ -19,6 +19,8 @@
 			 softirq_name(TASKLET),		\
 			 softirq_name(SCHED),		\
 			 softirq_name(HRTIMER),		\
+			 softirq_name(AEAD_ENC_SOFTIRQ),\
+			 softirq_name(AEAD_DEC_SOFTIRQ),\
 			 softirq_name(padata),		\
 			 softirq_name(RCU))
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4317559..5f20baa 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -58,7 +58,8 @@ static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
-	"TASKLET", "SCHED", "HRTIMER","PADATA", "RCU"
+	"TASKLET", "SCHED", "HRTIMER", "AEAD_ENC_SOFTIRQ", "AEAD_DEC_SOFTIRQ",
+	"PADATA", "RCU"
 };
 
 /*
-- 
1.5.4.2
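
The helper pair in include/crypto/pcrypt.h above relies on struct
padata_priv being embedded in struct pcrypt_request, so that the padata
callbacks can get back to the enclosing request. A minimal userspace
sketch of that round trip follows. It is an illustration only: the
container_of() macro here is a simplified stand-in for the kernel's, and
both structs are cut down to the fields the example needs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-in for struct padata_priv; only one field is kept. */
struct padata_priv {
	int info;
};

/* Same embedding idea as the patch: padata lives inside the request. */
struct pcrypt_request {
	struct padata_priv padata;
	void *data;
};

/* Request -> embedded padata object. */
static struct padata_priv *pcrypt_request_padata(struct pcrypt_request *req)
{
	return &req->padata;
}

/* Embedded padata object -> enclosing request. */
static struct pcrypt_request *pcrypt_padata_request(struct padata_priv *padata)
{
	assert(padata != NULL);
	return container_of(padata, struct pcrypt_request, padata);
}
```

This is why pcrypt_aead_givenc() can be handed only a struct padata_priv
pointer and still reach the request context behind it.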
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: David Miller @ 2009-10-09 6:18 UTC
To: steffen.klassert; +Cc: herbert, linux-crypto

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 8 Oct 2009 09:28:18 +0200

> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 1af1e4b..6bc5dde 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -350,6 +350,8 @@ enum
>  	TASKLET_SOFTIRQ,
>  	SCHED_SOFTIRQ,
>  	HRTIMER_SOFTIRQ,
> +	AEAD_ENC_SOFTIRQ,
> +	AEAD_DEC_SOFTIRQ,
>  	PADATA_SOFTIRQ,
>  	RCU_SOFTIRQ,	/* Preferable RCU should always be the last softirq */
>

Steffen are we going to end up adding a softirq for every crypto
transform type?

That won't work, softirqs are to be scarcely allocated and operate
at a very high level.

I can think of two alternatives:

1) One softirq that does per-cpu padata work via some generic
   callout mechanism.

2) Use tasklets

Thanks.
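
For illustration, the first alternative (one softirq with a generic
callout) might look roughly like the userspace model below. In the patch,
aead_enc_action() and aead_dec_action() differ only in the softirq number;
a single handler can drain one pending list and dispatch through the
per-object ->parallel pointer instead. This is a sketch under assumptions,
not kernel code: struct work_list, work_list_add(),
padata_generic_action() and the demo callbacks are hypothetical names, and
a plain singly linked list stands in for the per-cpu softirq_work_list.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct padata_priv. */
struct padata_priv {
	struct padata_priv *next;	/* hypothetical pending-list link */
	int info;
	void (*parallel)(struct padata_priv *padata);
};

/* Hypothetical stand-in for one cpu's pending work list. */
struct work_list {
	struct padata_priv *head;
	struct padata_priv *tail;
};

static void work_list_add(struct work_list *wl, struct padata_priv *p)
{
	p->next = NULL;
	if (wl->tail)
		wl->tail->next = p;
	else
		wl->head = p;
	wl->tail = p;
}

/*
 * One handler for all object types: detach the pending list, then call
 * each object's own ->parallel callback. Returns the number of objects
 * handled.
 */
static int padata_generic_action(struct work_list *wl)
{
	struct padata_priv *p = wl->head;
	int handled = 0;

	/* Detach the whole list first, like list_replace_init(). */
	wl->head = NULL;
	wl->tail = NULL;

	while (p) {
		struct padata_priv *next = p->next;

		p->next = NULL;
		assert(p->parallel != NULL);
		p->parallel(p);
		handled++;
		p = next;
	}

	return handled;
}

/* Demo callbacks standing in for the encrypt and decrypt paths. */
static void demo_enc(struct padata_priv *p) { p->info = 1; }
static void demo_dec(struct padata_priv *p) { p->info = 2; }
```

The model only shows the dispatch structure. It says nothing about the
performance effect of sharing one softirq between encryption and
decryption.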
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: Steffen Klassert @ 2009-10-09 8:07 UTC
To: David Miller; +Cc: herbert, linux-crypto

On Thu, Oct 08, 2009 at 11:18:33PM -0700, David Miller wrote:
>
> Steffen are we going to end up adding a softirq for every crypto
> transform type?
>
> That won't work, softirqs are to be scarcely allocated and operate
> at a very high level.
>
> I can think of two alternatives:
>
> 1) One softirq that does per-cpu padata work via some generic
>    callout mechanism.

I already tried to reduce the number of softirqs by using the same
softirq for encryption and decryption. But in the case of IPsec this had
a negative performance impact. So if we stay with softirqs we would
probably need at least two for the whole crypto layer.

It would be best if we did not need softirqs at all. In fact I started
with a thread based version, but it never reached the performance of the
softirq version. Anyway, the workqueue interface has changed in the
meantime, so perhaps it is worth trying again with workqueues.

>
> 2) Use tasklets
>

Tasklets are not sufficient because I can't control which cpu the
tasklet will run on. Also, only one tasklet of the same type can run at
a time, so tasklets don't parallelize.
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: Steffen Klassert @ 2009-10-30 10:06 UTC
To: David Miller; +Cc: herbert, linux-crypto

On Thu, Oct 08, 2009 at 11:18:33PM -0700, David Miller wrote:
>
> Steffen are we going to end up adding a softirq for every crypto
> transform type?
>
> That won't work, softirqs are to be scarcely allocated and operate
> at a very high level.
>

I changed padata to use workqueues instead of softirqs to do the
parallelization/serialization. The performance for IPsec is similar to
the softirq version, so we probably don't need to use softirqs at all.

While thinking about using workqueues for padata/pcrypt, I noticed two
problems if we return asynchronously in thread context from the crypto
layer to xfrm. Returning in thread context is not new with pcrypt; we
can also return in thread context if we use cryptd.

If we use tunnel mode, xfrm_input() calls netif_rx(), which is certainly
wrong if we are in thread context; we need to call netif_rx_ni() instead.

Also, xfrm_input() uses bare spinlocks to protect the xfrm_state, which
is not appropriate in this case. We probably need to switch off the
bottom halves if we allow returning from the crypto layer in both
softirq and thread context.

Any thoughts on how to handle this?
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: Herbert Xu @ 2009-10-30 12:58 UTC
To: Steffen Klassert; +Cc: David Miller, linux-crypto

On Fri, Oct 30, 2009 at 11:06:09AM +0100, Steffen Klassert wrote:
>
> If we use tunnel mode, xfrm_input() calls netif_rx(), which is certainly
> wrong if we are in thread context; we need to call netif_rx_ni() instead.

Since this is all happening through a crypto completion call,
it needs to be done with BH off since that's a requirement for
crypto completion functions. So netif_rx will work correctly
as when BH is reenabled it'll pick up the packets.

> Also, xfrm_input() uses bare spinlocks to protect the xfrm_state, which
> is not appropriate in this case. We probably need to switch off the
> bottom halves if we allow returning from the crypto layer in both
> softirq and thread context.

This too should be fine with BH off.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: Steffen Klassert @ 2009-10-30 13:27 UTC
To: Herbert Xu; +Cc: David Miller, linux-crypto

On Fri, Oct 30, 2009 at 08:58:18AM -0400, Herbert Xu wrote:
>
> Since this is all happening through a crypto completion call,
> it needs to be done with BH off since that's a requirement for
> crypto completion functions. So netif_rx will work correctly
> as when BH is reenabled it'll pick up the packets.

Ok, if it's required that BHs are off then everything is fine.
In fact I solved this problem for pcrypt by switching off the BHs.
I just was not sure whether this is the right way to do it.

Thanks for the clarification!

I'm going to send a workqueue based version of padata/pcrypt
within the next week.
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: Herbert Xu @ 2009-10-30 13:30 UTC
To: Steffen Klassert; +Cc: David Miller, linux-crypto

On Fri, Oct 30, 2009 at 02:27:34PM +0100, Steffen Klassert wrote:
>
> Ok, if it's required that BHs are off then everything is fine.
> In fact I solved this problem for pcrypt by switching off the BHs.
> I just was not sure whether this is the right way to do.

Yeah having random contexts for completion functions would only
create chaos :)

> I'm going to send a workqueue based version of padata/pcrypt
> within the next week.

Awesome!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper
From: David Miller @ 2009-11-02 11:36 UTC
To: herbert; +Cc: steffen.klassert, linux-crypto

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 30 Oct 2009 09:30:55 -0400

> On Fri, Oct 30, 2009 at 02:27:34PM +0100, Steffen Klassert wrote:
>> I'm going to send a workqueue based version of padata/pcrypt
>> within the next week.
>
> Awesome!

Thanks a lot for continuing this work Steffen.
* [PATCH 0/2] Parallel crypto/IPsec v5 @ 2009-08-31 9:11 Steffen Klassert 2009-08-31 9:12 ` [PATCH 1/2] padata: generic interface for parallel processing Steffen Klassert 0 siblings, 1 reply; 15+ messages in thread From: Steffen Klassert @ 2009-08-31 9:11 UTC (permalink / raw) To: Herbert Xu; +Cc: David Miller, linux-crypto This patchset adds the 'pcrypt' parallel crypto template. With this template it is possible to process the crypto requests of a transform in parallel without getting request reorder. This is in particular interesting for IPsec. The parallel crypto template is based on a generic parallelization/serialization method. This method uses the remote softirq invocation infrastructure for parallelization and serialization. With this method data objects can be processed in parallel, starting at some given point. After doing some expensive operations in parallel, it is possible to serialize again. The parallelized data objects return after serialization in the order as they were before the parallelization. In the case of IPsec, this makes it possible to run the expensive parts in parallel without getting packet reordering. Changes from v4: - Use the dynamic percpu allocator - Drop of the obsolete eseqiv changes (eseqiv is the default IV generator for blockcipher algorithms on smp machines now). Changes from v3: - The generic aead wrapper is dropped. - tcrypt is extended to test algorithms by name. So it is possible to instantiate pcrypt by doing e.g.: modprobe tcrypt alg="pcrypt(authenc(hmac(sha1),cbc(aes)))" type=3 Changes from v2: - The xfrm netlink configuration code is dropped, this will be an extra patchset. - Add generic aead wrapper interface to be able to wrap an aead algorithm with an arbitrary crypto template. - Convert pcrypt to use the generic aead wrapper. - Add support for aead algorithms to eseqiv. - Add support for the pcrypt aead wrapper to authenc. 
  It's now possible to select pcrypt as the default authenc wrapper
  with a module parameter.
- Patchset applies to linux-2.6 git current.

Changes from v1:

- cpu_chainiv is dropped, pcrypt uses eseqiv as its IV generator now.
- Add an xfrm netlink message to be able to select pcrypt from
  userspace.
- Use pcrypt only if it is selected from userspace.

Steffen
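
The ordering guarantee described in this cover letter can be pictured
with a small userspace model. Objects get increasing sequence numbers and
are spread over the cpus by seq_nr modulo the number of cpus. On the way
back, a queue's next expected sequence number is num_obj * num_cpus +
cpu_index, and only the object with the smallest expected number may be
serialized. The sketch below is a simplification under stated
assumptions, not the padata code: NUM_CPUS, MAX_OBJS, struct
reorder_model and reorder_get_next() are hypothetical, and a done[] array
replaces the per-cpu reorder lists.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPUS 3	/* assumed number of cpus in the cpumask */
#define MAX_OBJS 16	/* assumed upper bound, for the model only */

struct reorder_model {
	bool done[MAX_OBJS];	/* done[s]: object s finished parallel work */
	int num_obj[NUM_CPUS];	/* objects already serialized, per queue */
};

/* Sequence number -> cpu index, i.e. seq_nr mod number of cpus. */
static int seq_to_cpu_index(int seq_nr)
{
	assert(seq_nr >= 0);
	return seq_nr % NUM_CPUS;
}

/*
 * Return the sequence number that may be serialized next, or -1 if the
 * object that is next in order has not finished its parallel work yet.
 */
static int reorder_get_next(struct reorder_model *m)
{
	int next_nr = -1;
	int next_q = -1;

	for (int q = 0; q < NUM_CPUS; q++) {
		/* Next sequence number expected from queue q. */
		int calc_seq_nr = m->num_obj[q] * NUM_CPUS + q;

		if (next_nr < 0 || calc_seq_nr < next_nr) {
			next_nr = calc_seq_nr;
			next_q = q;
		}
	}

	if (next_nr >= MAX_OBJS || !m->done[next_nr])
		return -1;	/* must wait, otherwise we would reorder */

	m->done[next_nr] = false;
	m->num_obj[next_q]++;
	return next_nr;
}
```

If objects 2 and 0 finish first, the model releases 0, then waits for 1
even though 2 is ready. That waiting is what keeps IPsec packets in their
original order.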
* [PATCH 1/2] padata: generic interface for parallel processing 2009-08-31 9:11 [PATCH 0/2] Parallel crypto/IPsec v5 Steffen Klassert @ 2009-08-31 9:12 ` Steffen Klassert 2009-09-19 23:19 ` Herbert Xu 0 siblings, 1 reply; 15+ messages in thread From: Steffen Klassert @ 2009-08-31 9:12 UTC (permalink / raw) To: Herbert Xu; +Cc: David Miller, linux-crypto This patch introduces an interface to process data objects in parallel. On request it is possible to serialize again. The parallelized objects return after serialization in the same order as they were before the parallelization. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- include/linux/interrupt.h | 3 +- include/linux/padata.h | 121 +++++++++++ kernel/Makefile | 2 +- kernel/padata.c | 519 +++++++++++++++++++++++++++++++++++++++++++++ kernel/softirq.c | 2 +- 5 files changed, 644 insertions(+), 3 deletions(-) create mode 100644 include/linux/padata.h create mode 100644 kernel/padata.c diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 35e7df1..b648cca 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -347,7 +347,8 @@ enum TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ, - RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */ + PADATA_SOFTIRQ, + RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */ NR_SOFTIRQS }; diff --git a/include/linux/padata.h b/include/linux/padata.h new file mode 100644 index 0000000..a81161d --- /dev/null +++ b/include/linux/padata.h @@ -0,0 +1,121 @@ +/* + * padata.h - header for the padata parallelization interface + * + * Copyright (C) 2008, 2009 secunet Security Networks AG + * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef PADATA_H +#define PADATA_H + +#include <linux/interrupt.h> +#include <linux/smp.h> +#include <linux/list.h> + +enum +{ + NO_PADATA=0, + AEAD_ENC_PADATA, + AEAD_DEC_PADATA, + NR_PADATA +}; + +struct padata_priv { + struct list_head list; + struct call_single_data csd; + int cb_cpu; + int seq_nr; + unsigned int nr; + int info; + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); +}; + +struct padata_queue { + struct list_head list; + atomic_t num_obj; + int cpu_index; + spinlock_t lock; +}; + +struct parallel_data { + struct work_struct work; + struct padata_queue *queue; + atomic_t seq_nr; + atomic_t queued_objects; + cpumask_var_t cpumask; + cpumask_var_t new_cpumask; + u8 flags; +#define PADATA_INIT 1 +#define PADATA_FLUSH_HARD 2 +#define PADATA_RESET_IN_PROGRESS 4 + spinlock_t lock; +}; + +#ifdef CONFIG_USE_GENERIC_SMP_HELPERS +extern void __init padata_init(unsigned int nr, const struct cpumask *cpumask); +extern void padata_dont_wait(unsigned int nr, struct padata_priv *padata); +extern int padata_do_parallel(unsigned int softirq_nr, unsigned int nr, + struct padata_priv *padata, int cb_cpu); +extern int padata_do_serial(unsigned int nr, struct padata_priv *padata); +extern int padata_cpumask_weight(unsigned int nr); +extern int padata_index_to_cpu(unsigned int nr, int cpu_index); +extern void padata_set_cpumask(unsigned int nr, cpumask_var_t cpumask); +extern void padata_add_cpu(unsigned int nr, int cpu); +extern void padata_remove_cpu(unsigned int 
nr, int cpu);
+extern void padata_start(unsigned int nr);
+extern void padata_stop(unsigned int nr);
+#else
+static inline void padata_init(unsigned int nr, const struct cpumask *cpumask)
+{
+}
+static inline void padata_dont_wait(unsigned int nr, struct padata_priv *padata)
+{
+}
+static inline int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+				     struct padata_priv *padata, int cb_cpu)
+{
+	return 0;
+}
+static inline int padata_do_serial(unsigned int nr, struct padata_priv *padata)
+{
+	return 0;
+}
+static inline int padata_cpumask_weight(unsigned int nr)
+{
+	return 0;
+}
+static inline int padata_index_to_cpu(unsigned int nr, int cpu_index)
+{
+	return -ENOSYS;
+}
+static inline void padata_set_cpumask(unsigned int nr, cpumask_var_t cpumask)
+{
+}
+static inline void padata_add_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_remove_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_start(unsigned int nr)
+{
+}
+static inline void padata_stop(unsigned int nr)
+{
+}
+#endif
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 2093a69..6ff7f1a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -41,7 +41,7 @@ obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
 obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
 obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
-obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o
+obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o padata.o
 ifneq ($(CONFIG_SMP),y)
 obj-y += up.o
 endif
diff --git a/kernel/padata.c b/kernel/padata.c
new file mode 100644
index 0000000..3523f25
--- /dev/null
+++ b/kernel/padata.c
@@ -0,0 +1,519 @@
+/*
+ * padata.c - generic interface to process data streams in parallel
+ *
+ * Copyright (C) 2008, 2009 secunet Security Networks AG
+ * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * 
version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <linux/module.h> +#include <linux/cpumask.h> +#include <linux/err.h> +#include <linux/padata.h> + +#define MAX_SEQ_NR INT_MAX - NR_CPUS + +static struct parallel_data padata_vec[NR_PADATA]; +static struct padata_priv *padata_get_next(struct parallel_data *par_data); + +static void padata_flush_hard(struct parallel_data *par_data) +{ + int cpu; + struct padata_priv *padata; + struct padata_queue *queue; + + for_each_cpu(cpu, par_data->cpumask) { + queue = per_cpu_ptr(par_data->queue, cpu); + + while(!list_empty(&queue->list)) { + padata = list_entry(queue->list.next, + struct padata_priv, list); + + spin_lock(&queue->lock); + list_del_init(&padata->list); + spin_unlock(&queue->lock); + + atomic_dec(&par_data->queued_objects); + padata->serial(padata); + } + } +} + +static void padata_flush_order(struct parallel_data *par_data) +{ + struct padata_priv *padata; + + while (1) { + padata = padata_get_next(par_data); + + if (padata && !IS_ERR(padata)) + padata->serial(padata); + else + break; + } + + padata_flush_hard(par_data); +} + +static void padata_reset_work(struct work_struct *work) +{ + int cpu, cpu_index; + struct padata_queue *queue; + struct parallel_data *par_data; + + par_data = container_of(work, struct parallel_data, work); + + if (par_data->flags & (PADATA_INIT|PADATA_RESET_IN_PROGRESS)) + return; + + spin_lock_bh(&par_data->lock); + par_data->flags |= PADATA_RESET_IN_PROGRESS; + + if (!(par_data->flags & PADATA_FLUSH_HARD)) + 
padata_flush_order(par_data); + else + padata_flush_hard(par_data); + + cpu_index = 0; + + cpumask_copy(par_data->cpumask, par_data->new_cpumask); + + for_each_cpu(cpu, par_data->cpumask) { + queue = per_cpu_ptr(par_data->queue, cpu); + + atomic_set(&queue->num_obj, 0); + queue->cpu_index = cpu_index; + cpu_index++; + } + spin_unlock_bh(&par_data->lock); + + atomic_set(&par_data->seq_nr, -1); + par_data->flags &= ~PADATA_RESET_IN_PROGRESS; + par_data->flags |= PADATA_INIT; +} + +static struct padata_priv *padata_get_next(struct parallel_data *par_data) +{ + int cpu, num_cpus, empty; + int seq_nr, calc_seq_nr, next_nr; + struct padata_queue *queue, *next_queue; + struct padata_priv *padata; + + empty = 0; + next_nr = -1; + next_queue = NULL; + + num_cpus = cpumask_weight(par_data->cpumask); + + for_each_cpu(cpu, par_data->cpumask) { + queue = per_cpu_ptr(par_data->queue, cpu); + + /* + * Calculate the seq_nr of the object that should be + * next in this queue. + */ + calc_seq_nr = (atomic_read(&queue->num_obj) * num_cpus) + + queue->cpu_index; + + if (!list_empty(&queue->list)) { + padata = list_entry(queue->list.next, + struct padata_priv, list); + + seq_nr = padata->seq_nr; + + if (unlikely(calc_seq_nr != seq_nr)) { + par_data->flags &= ~PADATA_INIT; + par_data->flags |= PADATA_FLUSH_HARD; + padata = NULL; + goto out; + } + } else { + seq_nr = calc_seq_nr; + empty++; + } + + if (next_nr < 0 || seq_nr < next_nr) { + next_nr = seq_nr; + next_queue = queue; + } + } + + padata = NULL; + + if (empty == num_cpus) + goto out; + + if (!list_empty(&next_queue->list)) { + padata = list_entry(next_queue->list.next, + struct padata_priv, list); + + spin_lock(&next_queue->lock); + list_del_init(&padata->list); + spin_unlock(&next_queue->lock); + + atomic_dec(&par_data->queued_objects); + atomic_inc(&next_queue->num_obj); + + goto out; + } + + if (next_nr % num_cpus == next_queue->cpu_index) { + padata = ERR_PTR(-ENODATA); + goto out; + } + + padata = ERR_PTR(-EINPROGRESS); 
+out: + return padata; +} + +static void padata_action(struct softirq_action *h) +{ + struct list_head *cpu_list, local_list; + + cpu_list = &__get_cpu_var(softirq_work_list[PADATA_SOFTIRQ]); + + local_irq_disable(); + list_replace_init(cpu_list, &local_list); + local_irq_enable(); + + while (!list_empty(&local_list)) { + struct padata_priv *padata; + + padata = list_entry(local_list.next, + struct padata_priv, csd.list); + + list_del_init(&padata->csd.list); + + padata->serial(padata); + } +} + +static int padata_cpu_hash(unsigned int nr, struct padata_priv *padata) +{ + int this_cpu, cpu_index; + + this_cpu = smp_processor_id(); + + if (padata->nr != 0) + return this_cpu; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return this_cpu; + + padata->seq_nr = atomic_inc_return(&padata_vec[nr].seq_nr); + + if (padata->seq_nr > MAX_SEQ_NR) { + padata_vec[nr].flags &= ~PADATA_INIT; + padata->seq_nr = 0; + schedule_work(&padata_vec[nr].work); + return this_cpu; + } + + padata->nr = nr; + + /* + * Hash the sequence numbers to the cpus by taking + * seq_nr mod. number of cpus in use. 
+ */ + cpu_index = padata->seq_nr % cpumask_weight(padata_vec[nr].cpumask); + + return padata_index_to_cpu(nr, cpu_index); +} + +/* + * padata_dont_wait - must be called if an object that runs in parallel will + * not be serialized with padata_do_serial + * + * @nr: number of the padata instance + * @padata: object that will not be seen by padata_do_serial + */ +void padata_dont_wait(unsigned int nr, struct padata_priv *padata) +{ + struct padata_queue *queue; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return; + + if (padata->nr == 0 || padata->nr != nr) + return; + + queue = per_cpu_ptr(padata_vec[nr].queue, smp_processor_id()); + atomic_inc(&queue->num_obj); + + padata->nr = 0; + padata->seq_nr = 0; +} +EXPORT_SYMBOL(padata_dont_wait); + +/* + * padata_do_parallel - padata parallelization function + * + * @softirq_nr: number of the softirq that will do the parallelization + * @nr: number of the padata instance + * @padata: object to be parallelized + * @cb_cpu: cpu number on which the serialization callback function will run + */ +int padata_do_parallel(unsigned int softirq_nr, unsigned int nr, + struct padata_priv *padata, int cb_cpu) +{ + int target_cpu; + + padata->cb_cpu = cb_cpu; + + local_bh_disable(); + target_cpu = padata_cpu_hash(nr, padata); + local_bh_enable(); + + send_remote_softirq(&padata->csd, target_cpu, softirq_nr); + + return 1; +} +EXPORT_SYMBOL(padata_do_parallel); + +/* + * padata_do_serial - padata serialization function + * + * @nr: number of the padata instance + * @padata: object to be serialized + * + * returns 1 if the serialization callback function will be called + * from padata, 0 else + */ +int padata_do_serial(unsigned int nr, struct padata_priv *padata) +{ + int cpu; + struct padata_queue *reorder_queue; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return 0; + + if (padata->nr != nr || padata->nr == 0) { + padata->serial(padata); + return 1; + } + + cpu = smp_processor_id(); + + reorder_queue = 
per_cpu_ptr(padata_vec[nr].queue, cpu); + + spin_lock(&reorder_queue->lock); + list_add_tail(&padata->list, &reorder_queue->list); + spin_unlock(&reorder_queue->lock); + + atomic_inc(&padata_vec[nr].queued_objects); + +try_again: + if (!spin_trylock(&padata_vec[nr].lock)) + goto out; + + while(1) { + padata = padata_get_next(&padata_vec[nr]); + + if (!padata || PTR_ERR(padata) == -EINPROGRESS) + break; + if (PTR_ERR(padata) == -ENODATA) { + spin_unlock(&padata_vec[nr].lock); + goto out; + } + + send_remote_softirq(&padata->csd, padata->cb_cpu, + PADATA_SOFTIRQ); + } + + if (unlikely(!(padata_vec[nr].flags & PADATA_INIT))) { + spin_unlock(&padata_vec[nr].lock); + goto reset_out; + } + + spin_unlock(&padata_vec[nr].lock); + + if (atomic_read(&padata_vec[nr].queued_objects)) + goto try_again; + +out: + return 1; +reset_out: + schedule_work(&padata_vec[nr].work); + return 1; +} +EXPORT_SYMBOL(padata_do_serial); + +/* + * padata_cpumask_weight - get the number of cpus that are actually in use + * + * @nr: number of the padata instance + */ +int padata_cpumask_weight(unsigned int nr) +{ + return cpumask_weight(padata_vec[nr].cpumask); +} +EXPORT_SYMBOL(padata_cpumask_weight); + +/* + * padata_index_to_cpu - get the cpu for a given cpu index + * + * @nr: number of the padata instance + * @cpu_index: index of the cpu in question + * + * The range of cpu_index is 0 <= cpu_index < padata_cpumask_weight(), + * so padata_cpumask_weight must be called before padata_index_to_cpu. 
+ */
+int padata_index_to_cpu(unsigned int nr, int cpu_index)
+{
+	int cpu, target_cpu;
+
+	target_cpu = cpumask_first(padata_vec[nr].cpumask);
+	for (cpu = 0; cpu < cpu_index; cpu++)
+		target_cpu = cpumask_next(target_cpu, padata_vec[nr].cpumask);
+
+	return target_cpu;
+}
+EXPORT_SYMBOL(padata_index_to_cpu);
+
+/*
+ * padata_set_cpumask - set the cpumask that padata uses
+ *
+ * @nr: number of the padata instance
+ * @cpumask: the cpumask to use
+ */
+void padata_set_cpumask(unsigned int nr, cpumask_var_t cpumask)
+{
+	cpumask_copy(padata_vec[nr].new_cpumask, cpumask);
+	padata_vec[nr].flags &= ~PADATA_INIT;
+	padata_vec[nr].flags |= PADATA_FLUSH_HARD;
+
+	schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_set_cpumask);
+
+/*
+ * padata_add_cpu - add a cpu to the padata cpumask
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to add
+ */
+void padata_add_cpu(unsigned int nr, int cpu)
+{
+	cpumask_set_cpu(cpu, padata_vec[nr].cpumask);
+	padata_set_cpumask(nr, padata_vec[nr].cpumask);
+}
+EXPORT_SYMBOL(padata_add_cpu);
+
+/*
+ * padata_remove_cpu - remove a cpu from the padata cpumask
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to remove
+ */
+void padata_remove_cpu(unsigned int nr, int cpu)
+{
+	cpumask_clear_cpu(cpu, padata_vec[nr].cpumask);
+	padata_set_cpumask(nr, padata_vec[nr].cpumask);
+}
+EXPORT_SYMBOL(padata_remove_cpu);
+
+/*
+ * padata_start - start the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_start(unsigned int nr)
+{
+	if (padata_vec[nr].flags & PADATA_INIT)
+		return;
+
+	schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_start);
+
+/*
+ * padata_stop - stop the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_stop(unsigned int nr)
+{
+	padata_vec[nr].flags &= ~PADATA_INIT;
+}
+EXPORT_SYMBOL(padata_stop);
+
+/*
+ * padata_init - initialize a padata instance
+ *
+ * @nr: number of the padata instance
+ * @cpumask: cpumask that padata uses for parallelization
+ */
+void __init padata_init(unsigned int nr, const struct cpumask *cpumask)
+{
+	int cpu, cpu_index;
+	struct padata_queue *percpu_queue, *queue;
+
+	percpu_queue = alloc_percpu(struct padata_queue);
+
+	if (!percpu_queue) {
+		printk("padata_init: Failed to alloc the serialization "
+		       "queues for padata nr %d, exiting!\n", nr);
+		return;
+	}
+
+	if (!alloc_cpumask_var(&padata_vec[nr].cpumask, GFP_KERNEL))
+		goto err_free;
+
+	if (!alloc_cpumask_var(&padata_vec[nr].new_cpumask, GFP_KERNEL))
+		goto err_free_mask;
+
+	cpu_index = 0;
+
+	for_each_possible_cpu(cpu) {
+		queue = per_cpu_ptr(percpu_queue, cpu);
+
+		if (cpumask_test_cpu(cpu, cpumask)) {
+			queue->cpu_index = cpu_index;
+			cpu_index++;
+		}
+
+		INIT_LIST_HEAD(&queue->list);
+		spin_lock_init(&queue->lock);
+		atomic_set(&queue->num_obj, 0);
+	}
+
+	INIT_WORK(&padata_vec[nr].work, padata_reset_work);
+
+	cpumask_copy(padata_vec[nr].cpumask, cpumask);
+	cpumask_copy(padata_vec[nr].new_cpumask, cpumask);
+
+	atomic_set(&padata_vec[nr].seq_nr, -1);
+	atomic_set(&padata_vec[nr].queued_objects, 0);
+	padata_vec[nr].queue = percpu_queue;
+	padata_vec[nr].flags = 0;
+	spin_lock_init(&padata_vec[nr].lock);
+
+	return;
+
+err_free_mask:
+	/* new_cpumask failed to allocate; release the mask that succeeded */
+	free_cpumask_var(padata_vec[nr].cpumask);
+
+err_free:
+	free_percpu(percpu_queue);
+}
+
+static int __init padata_initcall(void)
+{
+	open_softirq(PADATA_SOFTIRQ, padata_action);
+
+	return 0;
+}
+subsys_initcall(padata_initcall);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index eb5e131..769ffc4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -58,7 +58,7 @@ static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
-	"TASKLET", "SCHED", "HRTIMER", "RCU"
+	"TASKLET", "SCHED", "HRTIMER", "PADATA", "RCU"
 };
 
 /*
--
1.5.4.2
^ permalink raw reply related [flat|nested] 15+ messages in thread
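The ordering rule the patch relies on can be sketched in plain userspace C. This is only an illustration, not code from the patch: the helper names (seq_to_cpu_index, expected_seq_nr, next_cpu_index) are made up here. The arithmetic is taken from padata_cpu_hash(), which picks the cpu index as seq_nr modulo the number of cpus in the mask, and from padata_get_next(), which computes the sequence number each per-cpu queue expects next and takes the smallest.

```c
#include <assert.h>

/* Index (within the padata cpumask) of the cpu that handles seq_nr. */
static int seq_to_cpu_index(int seq_nr, int num_cpus)
{
	assert(num_cpus > 0);
	return seq_nr % num_cpus;
}

/*
 * Sequence number a reorder queue expects next, given that it has
 * already handed back num_obj objects.
 */
static int expected_seq_nr(int num_obj, int num_cpus, int cpu_index)
{
	return num_obj * num_cpus + cpu_index;
}

/*
 * Pick the cpu index whose expected sequence number is smallest;
 * num_obj[i] counts the objects queue i has handed back so far.
 */
static int next_cpu_index(const int *num_obj, int num_cpus)
{
	int i, best = 0;

	for (i = 1; i < num_cpus; i++)
		if (expected_seq_nr(num_obj[i], num_cpus, i) <
		    expected_seq_nr(num_obj[best], num_cpus, best))
			best = i;

	return best;
}
```

With three cpus, object 7 is hashed to cpu index 1, and a queue at index 1 that has already handed back two objects expects exactly sequence number 7 next. Because every queue only ever sees sequence numbers congruent to its own index, taking the smallest expected number across the queues restores the submission order.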
* Re: [PATCH 1/2] padata: generic interface for parallel processing
2009-08-31 9:12 ` [PATCH 1/2] padata: generic interface for parallel processing Steffen Klassert
@ 2009-09-19 23:19 ` Herbert Xu
2009-10-07 14:22 ` Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2009-09-19 23:19 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, linux-crypto
On Mon, Aug 31, 2009 at 11:12:49AM +0200, Steffen Klassert wrote:
> This patch introduces an interface to process data objects
> in parallel. On request it is possible to serialize again.
> The parallelized objects return after serialization in the
> same order as they were before the parallelization.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Looks good to me. Dave, do you want to pick this one up?
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] padata: generic interface for parallel processing
2009-09-19 23:19 ` Herbert Xu
@ 2009-10-07 14:22 ` Steffen Klassert
2009-10-07 20:44 ` David Miller
0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-10-07 14:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, linux-crypto
On Sat, Sep 19, 2009 at 04:19:59PM -0700, Herbert Xu wrote:
> On Mon, Aug 31, 2009 at 11:12:49AM +0200, Steffen Klassert wrote:
> > This patch introduces an interface to process data objects
> > in parallel. On request it is possible to serialize again.
> > The parallelized objects return after serialization in the
> > same order as they were before the parallelization.
> >
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
>
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> Looks good to me. Dave, do you want to pick this one up?
>
I plan to send a rebased patchset tomorrow,
so who wants to take this one?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] padata: generic interface for parallel processing
2009-10-07 14:22 ` Steffen Klassert
@ 2009-10-07 20:44 ` David Miller
0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2009-10-07 20:44 UTC (permalink / raw)
To: steffen.klassert; +Cc: herbert, linux-crypto
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 7 Oct 2009 16:22:41 +0200
> On Sat, Sep 19, 2009 at 04:19:59PM -0700, Herbert Xu wrote:
>> On Mon, Aug 31, 2009 at 11:12:49AM +0200, Steffen Klassert wrote:
>> > This patch introduces an interface to process data objects
>> > in parallel. On request it is possible to serialize again.
>> > The parallelized objects return after serialization in the
>> > same order as they were before the parallelization.
>> >
>> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
>>
>> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
>>
>> Looks good to me. Dave, do you want to pick this one up?
>>
>
> I plan to send a rebased patchset tomorrow,
> so who wants to take this one?
I'll take the padata change, as I agreed with Herbert to do.
But the rest goes to Herbert.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
@ 2009-06-03 11:59 Herbert Xu
2009-06-05 9:20 ` Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2009-06-03 11:59 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, linux-crypto
On Wed, Jun 03, 2009 at 01:23:53PM +0200, Steffen Klassert wrote:
> On Wed, Jun 03, 2009 at 07:40:50PM +1000, Herbert Xu wrote:
> >
> > I see. How about if we let tcrypt test algorithms by name, e.g.,
> > something like
> >
> > modprobe tcrypt alg='pcrypt(authenc(hmac(sha1),cbc(aes)))'
> >
>
> I'm not that sure whether this does what I want.
>
> If pcrypt has cra_name = pcrypt(authenc(hmac(sha1),cbc(aes))) this
> would instantiate this algorithm, but esp wants an algorithm with
> cra_name = authenc(hmac(sha1),cbc(aes)).
> These names do not match, so __crypto_alg_lookup() will not
> choose the pcrypt version regardless of the higher priority.
When pcrypt instantiates an algorithm, it should set cra_name to
%s and cra_driver_name to pcrypt(%s). So as long as the pcrypt
priority is higher than the underlying algorithm, it should all
work.
See for instance how cryptd does it.
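A toy model of that naming scheme, as a hedged sketch only: struct toy_alg and toy_lookup() are hypothetical names and this is not the kernel's lookup code. The sketch assumes a lookup that returns an exact cra_driver_name match if there is one, and otherwise the entry with the highest cra_priority among those whose cra_name matches. The priorities in the usage note are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three fields discussed in this thread. */
struct toy_alg {
	const char *cra_name;        /* generic algorithm name */
	const char *cra_driver_name; /* name of this implementation */
	int cra_priority;
};

/*
 * Return the entry whose driver name equals `name`, or else the
 * highest-priority entry whose generic name equals `name`.
 */
static const struct toy_alg *toy_lookup(const struct toy_alg *algs,
					size_t n, const char *name)
{
	const struct toy_alg *best = NULL;
	size_t i;

	assert(name != NULL);

	for (i = 0; i < n; i++) {
		if (!strcmp(algs[i].cra_driver_name, name))
			return &algs[i];
		if (strcmp(algs[i].cra_name, name))
			continue;
		if (!best || algs[i].cra_priority > best->cra_priority)
			best = &algs[i];
	}

	return best;
}
```

If the wrapper registers with cra_name "authenc(hmac(sha1),cbc(aes))", cra_driver_name "pcrypt(authenc(hmac(sha1),cbc(aes)))" and a priority above the plain authenc entry, a request for the generic name resolves to the wrapper, while a request for the plain driver name still reaches the unwrapped implementation.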
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-03 11:59 [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface Herbert Xu
@ 2009-06-05 9:20 ` Steffen Klassert
2009-06-05 9:20 ` Herbert Xu
0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-06-05 9:20 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, linux-crypto
On Wed, Jun 03, 2009 at 09:59:31PM +1000, Herbert Xu wrote:
>
> When pcrypt instantiates an algorithm, it should set cra_name to
> %s and cra_driver_name to pcrypt(%s). So as long as the pcrypt
> priority is higher than the underlying algorithm, it should all
> work.
>
As it is, I can instantiate pcrypt if the priority of pcrypt is lower
than the priority of the underlying algorithm. If I do the priority
check in crypto_alg_tested() the other way around, I get it to work if
pcrypt has a higher priority than the underlying algorithm. So I guess
we need the patch below, right? If so, I would send a signed-off patch.
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 56c62e2..2492e6c 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -255,7 +255,7 @@ found:
 			continue;
 
 		if (strcmp(alg->cra_driver_name, q->cra_driver_name) &&
-		    q->cra_priority > alg->cra_priority)
+		    q->cra_priority < alg->cra_priority)
 			continue;
 
 		crypto_remove_spawns(&q->cra_users, &list, alg->cra_flags);
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-05 9:20 ` Steffen Klassert
@ 2009-06-05 9:20 ` Herbert Xu
2009-06-05 9:34 ` Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2009-06-05 9:20 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, linux-crypto
On Fri, Jun 05, 2009 at 11:20:30AM +0200, Steffen Klassert wrote:
>
> As it is, I can instantiate pcrypt if the priority of pcrypt is lower
> than the priority of the underlying algorithm. If I do the priority
> check in crypto_alg_tested() the other way around, I get it to work if
> pcrypt has a higher priority than the underlying algorithm. So I guess
> we need the patch below, right? If so, I would send a signed-off patch.
Can you give me an example with actual numbers and what you expect
to happen?
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-05 9:20 ` Herbert Xu
@ 2009-06-05 9:34 ` Steffen Klassert
2009-06-08 5:28 ` Herbert Xu
0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-06-05 9:34 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, linux-crypto
On Fri, Jun 05, 2009 at 07:20:21PM +1000, Herbert Xu wrote:
> On Fri, Jun 05, 2009 at 11:20:30AM +0200, Steffen Klassert wrote:
> >
> > As it is, I can instantiate pcrypt if the priority of pcrypt is lower
> > than the priority of the underlying algorithm. If I do the priority
> > check in crypto_alg_tested() the other way around, I get it to work if
> > pcrypt has a higher priority than the underlying algorithm. So I guess
> > we need the patch below, right? If so, I would send a signed-off patch.
>
> Can you give me an example with actual numbers and what you expect
> to happen?
>
In pcrypt_alloc_instance() I do
inst->alg.cra_priority = alg->cra_priority + 100;
So, in my case authenc has priority 2000 and pcrypt has priority 2100.
In this case pcrypt is not instantiated if I use %s for pcrypt as
cra_name. If I do
inst->alg.cra_priority = alg->cra_priority - 100
it will be instantiated with priority 1900 but it will not be used
because the priority of authenc is higher.
So I did the priority check in crypto_alg_tested() the other way around.
Then I can instantiate pcrypt with priority 2100 and I can use it.
^ permalink raw reply [flat|nested] 15+ messages in thread
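The two variants of the check can be evaluated by hand for the numbers quoted in this thread (authenc at 2000, pcrypt at 2100 or 1900). The sketch below copies only the two-line condition visible in the proposed algapi.c diff; the function names are hypothetical, and the rest of the loop in crypto_alg_tested() is not shown in the thread and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Condition as it stands before the proposed change: true means the
 * loop hits `continue` and leaves the already registered algorithm q
 * alone; false means it falls through to crypto_remove_spawns().
 */
static bool current_check_skips(const char *alg_driver, int alg_prio,
				const char *q_driver, int q_prio)
{
	assert(alg_driver != NULL && q_driver != NULL);
	return strcmp(alg_driver, q_driver) != 0 && q_prio > alg_prio;
}

/* Same condition with the comparison reversed, as in the proposed diff. */
static bool reversed_check_skips(const char *alg_driver, int alg_prio,
				 const char *q_driver, int q_prio)
{
	assert(alg_driver != NULL && q_driver != NULL);
	return strcmp(alg_driver, q_driver) != 0 && q_prio < alg_prio;
}
```

With pcrypt as the newly tested algorithm at 2100 and authenc as q at 2000, the current condition is false, so the code falls through to crypto_remove_spawns() on authenc's users; the reversed condition is true and skips it. At 1900 the two results swap. That lines up with the symptom described in this thread, although the thread itself does not settle why the spawn removal would prevent the instantiation.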
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-05 9:34 ` Steffen Klassert
@ 2009-06-08 5:28 ` Herbert Xu
2009-06-08 6:45 ` Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2009-06-08 5:28 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, linux-crypto
On Fri, Jun 05, 2009 at 11:34:30AM +0200, Steffen Klassert wrote:
>
> In pcrypt_alloc_instance() I do
> inst->alg.cra_priority = alg->cra_priority + 100;
>
> So, in my case authenc has priority 2000 and pcrypt has priority 2100.
> In this case pcrypt is not instantiated if I use %s for pcrypt as
> cra_name. If I do
> inst->alg.cra_priority = alg->cra_priority - 100
> it will be instantiated with priority 1900 but it will not be used
> because the priority of authenc is higher.
>
> So I did the priority check in crypto_alg_tested() the other way around.
> Then I can instantiate pcrypt with priority 2100 and I can use it.
Can you send me a pcrypt patch that I can use to reproduce this?
The check modified is meant to replace instances of the same
implementation (i.e., you're replacing aes-x86-64 with a newer
version of aes-x86-64). It should never do anything when you add
a different implementation of the same algorithm.
So I'm surprised that you're seeing a difference when changing
that check. Because unless you're creating two pcrypt objects
with the same driver name, or your pcrypt object has the wrong
driver name, then this change should make no difference whatsoever.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-08 5:28 ` Herbert Xu
@ 2009-06-08 6:45 ` Steffen Klassert
2009-06-25 6:51 ` Herbert Xu
0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-06-08 6:45 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, linux-crypto
On Mon, Jun 08, 2009 at 03:28:08PM +1000, Herbert Xu wrote:
> On Fri, Jun 05, 2009 at 11:34:30AM +0200, Steffen Klassert wrote:
> >
> > In pcrypt_alloc_instance() I do
> > inst->alg.cra_priority = alg->cra_priority + 100;
> >
> > So, in my case authenc has priority 2000 and pcrypt has priority 2100.
> > In this case pcrypt is not instantiated if I use %s for pcrypt as
> > cra_name. If I do
> > inst->alg.cra_priority = alg->cra_priority - 100
> > it will be instantiated with priority 1900 but it will not be used
> > because the priority of authenc is higher.
> >
> > So I did the priority check in crypto_alg_tested() the other way around.
> > Then I can instantiate pcrypt with priority 2100 and I can use it.
>
> Can you send me a pcrypt patch that I can use to reproduce this?
Yes, I will send the full patchset including the tcrypt changes to
instantiate pcrypt. As the patchset is, I'm not able to instantiate
pcrypt here. I need to either change the priority check in
crypto_alg_tested() or make pcrypt use a lower priority than
authenc.
>
> The check modified is meant to replace instances of the same
> implementation (i.e., you're replacing aes-x86-64 with a newer
> version of aes-x86-64). It should never do anything when you add
> a different implementation of the same algorithm.
>
> So I'm surprised that you're seeing a difference when changing
> that check. Because unless you're creating two pcrypt objects
> with the same driver name, or your pcrypt object has the wrong
> driver name, then this change should make no difference whatsoever.
>
I was just surprised that I was able to instantiate pcrypt if it has a
lower priority than the underlying authenc algorithm. So I searched for
priority checks like the one in crypto_alg_tested() and in fact changing
this check got it to work as I described above.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-08 6:45 ` Steffen Klassert
@ 2009-06-25 6:51 ` Herbert Xu
2009-06-29 11:04 ` Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2009-06-25 6:51 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, linux-crypto
On Mon, Jun 08, 2009 at 08:45:18AM +0200, Steffen Klassert wrote:
>
> Yes, I will send the full patchset including the tcrypt changes to
> instantiate pcrypt.
OK, the patch I just posted to the list should fix the problem.
I was able to test it using
modprobe tcrypt alg='pcrypt(authenc(hmac(sha1-generic),cbc(aes-generic)))' type=3
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-25 6:51 ` Herbert Xu
@ 2009-06-29 11:04 ` Steffen Klassert
2009-06-29 11:59 ` Herbert Xu
0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-06-29 11:04 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, linux-crypto
On Thu, Jun 25, 2009 at 02:51:12PM +0800, Herbert Xu wrote:
>
> OK, the patch I just posted to the list should fix the problem.
>
> I was able to test it using
>
> modprobe tcrypt alg='pcrypt(authenc(hmac(sha1-generic),cbc(aes-generic)))' type=3
>
I applied your patch on top of the last pcrypt patchset, but
unfortunately it does not change anything here.
If I do
modprobe tcrypt alg='pcrypt(authenc(hmac(sha1-generic),cbc(aes-generic)))' type=3
it instantiates
authenc(hmac(sha1-generic),cbc(aes-generic))
but not
pcrypt(authenc(hmac(sha1-generic),cbc(aes-generic)))
regardless of whether your patch is applied or not.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-29 11:04 ` Steffen Klassert
@ 2009-06-29 11:59 ` Herbert Xu
2009-06-29 13:52 ` Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2009-06-29 11:59 UTC (permalink / raw)
To: Steffen Klassert; +Cc: David Miller, linux-crypto
On Mon, Jun 29, 2009 at 01:04:10PM +0200, Steffen Klassert wrote:
>
> I applied your patch on top of the last pcrypt patchset, but
> unfortunately it does not change anything here.
>
> If I do
> modprobe tcrypt alg='pcrypt(authenc(hmac(sha1-generic),cbc(aes-generic)))' type=3
>
> it instantiates
>
> authenc(hmac(sha1-generic),cbc(aes-generic))
>
> but not
>
> pcrypt(authenc(hmac(sha1-generic),cbc(aes-generic)))
>
> regardless of whether your patch is applied or not.
OK. Can you send me the patches you used against the current
cryptodev tree (I only just pushed so give it an hour or so)?
I'll see if I can reproduce it here.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] [PATCH 2/5] aead: Add generic aead wrapper interface
2009-06-29 11:59 ` Herbert Xu
@ 2009-06-29 13:52 ` Steffen Klassert
2009-06-29 13:55 ` [PATCH 1/2] padata: generic interface for parallel processing Steffen Klassert
0 siblings, 1 reply; 15+ messages in thread
From: Steffen Klassert @ 2009-06-29 13:52 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, linux-crypto
On Mon, Jun 29, 2009 at 07:59:50PM +0800, Herbert Xu wrote:
>
> OK. Can you send me the patches you used against the current
> cryptodev tree (I only just pushed so give it an hour or so)?
> I'll see if I can reproduce it here.
>
I'll send the remaining two patches in reply to this mail.
The strange thing is that I can instantiate pcrypt if the priority is
lower than the priority of authenc, but not if it's equal or higher.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] padata: generic interface for parallel processing 2009-06-29 13:52 ` Steffen Klassert @ 2009-06-29 13:55 ` Steffen Klassert 0 siblings, 0 replies; 15+ messages in thread From: Steffen Klassert @ 2009-06-29 13:55 UTC (permalink / raw) To: Herbert Xu; +Cc: David Miller, linux-crypto This patch introduces an interface to process data objects in parallel. On request it is possible to serialize again. The parallelized objects return after serialization in the same order as they were before the parallelization. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- include/linux/interrupt.h | 3 +- include/linux/padata.h | 116 +++++++++++ kernel/Makefile | 2 +- kernel/padata.c | 490 +++++++++++++++++++++++++++++++++++++++++++++ kernel/softirq.c | 2 +- 5 files changed, 610 insertions(+), 3 deletions(-) create mode 100644 include/linux/padata.h create mode 100644 kernel/padata.c diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 2721f07..4aad58f 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -344,7 +344,8 @@ enum TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ, - RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */ + PADATA_SOFTIRQ, + RCU_SOFTIRQ, /* Preferable RCU should always be the last softirq */ NR_SOFTIRQS }; diff --git a/include/linux/padata.h b/include/linux/padata.h new file mode 100644 index 0000000..469359f --- /dev/null +++ b/include/linux/padata.h @@ -0,0 +1,116 @@ +/* + * padata.h - header for the padata parallelization interface + * + * Copyright (C) 2008, 2009 secunet Security Networks AG + * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef PADATA_H +#define PADATA_H + +#include <linux/interrupt.h> +#include <linux/smp.h> +#include <linux/list.h> + +enum +{ + NO_PADATA=0, + AEAD_ENC_PADATA, + AEAD_DEC_PADATA, + NR_PADATA +}; + +struct padata_priv { + struct list_head list; + struct call_single_data csd; + int cb_cpu; + int seq_nr; + unsigned int nr; + int info; + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); +}; + +struct padata_queue { + struct list_head list; + atomic_t num_obj; + int cpu_index; + spinlock_t lock; +}; + +struct parallel_data { + struct work_struct work; + struct padata_queue *queue; + atomic_t seq_nr; + atomic_t queued_objects; + cpumask_t cpu_map; + cpumask_t new_cpu_map; + u8 flags; +#define PADATA_INIT 1 +#define PADATA_FLUSH_HARD 2 +#define PADATA_RESET_IN_PROGRESS 4 + spinlock_t lock; +}; + +#ifdef CONFIG_USE_GENERIC_SMP_HELPERS +extern void __init padata_init(unsigned int nr, cpumask_t cpu_map); +extern void padata_dont_wait(unsigned int nr, struct padata_priv *padata); +extern int padata_do_parallel(unsigned int softirq_nr, unsigned int nr, + struct padata_priv *padata, int cb_cpu); +extern int padata_do_serial(unsigned int nr, struct padata_priv *padata); +extern cpumask_t padata_get_cpumap(unsigned int nr); +extern void padata_set_cpumap(unsigned int nr, cpumask_t cpu_map); +extern void padata_add_cpu(unsigned int nr, int cpu); +extern void padata_remove_cpu(unsigned int nr, int cpu); +extern void padata_start(unsigned int nr); +extern void 
padata_stop(unsigned int nr);
+#else
+static inline void padata_init(unsigned int nr, cpumask_t cpu_map)
+{
+}
+static inline void padata_dont_wait(unsigned int nr, struct padata_priv *padata)
+{
+}
+static inline int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+				     struct padata_priv *padata, int cb_cpu)
+{
+	return 0;
+}
+static inline int padata_do_serial(unsigned int nr, struct padata_priv *padata)
+{
+	return 0;
+}
+static inline cpumask_t padata_get_cpumap(unsigned int nr)
+{
+	return cpu_online_map;
+}
+static inline void padata_set_cpumap(unsigned int nr, cpumask_t cpu_map)
+{
+}
+static inline void padata_add_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_remove_cpu(unsigned int nr, int cpu)
+{
+}
+static inline void padata_start(unsigned int nr)
+{
+}
+static inline void padata_stop(unsigned int nr)
+{
+}
+#endif
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 9df4501..d029314 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -41,7 +41,7 @@ obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
 obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
 obj-$(CONFIG_RT_MUTEX_TESTER) += rtmutex-tester.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
-obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o
+obj-$(CONFIG_USE_GENERIC_SMP_HELPERS) += smp.o padata.o
 ifneq ($(CONFIG_SMP),y)
 obj-y += up.o
 endif
diff --git a/kernel/padata.c b/kernel/padata.c
new file mode 100644
index 0000000..192c9a6
--- /dev/null
+++ b/kernel/padata.c
@@ -0,0 +1,490 @@
+/*
+ * padata.c - generic interface to process data streams in parallel
+ *
+ * Copyright (C) 2008, 2009 secunet Security Networks AG
+ * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include <linux/module.h> +#include <linux/cpumask.h> +#include <linux/err.h> +#include <linux/padata.h> + +#define MAX_SEQ_NR 1000000000 + +static struct parallel_data padata_vec[NR_PADATA]; +static struct padata_priv *padata_get_next(struct parallel_data *par_data); + +static void padata_flush_hard(struct parallel_data *par_data) +{ + int cpu; + struct padata_priv *padata; + struct padata_queue *queue; + + for_each_cpu_mask(cpu, par_data->cpu_map) { + queue = per_cpu_ptr(par_data->queue, cpu); + + while(!list_empty(&queue->list)) { + padata = list_entry(queue->list.next, struct padata_priv, list); + + spin_lock(&queue->lock); + list_del_init(&padata->list); + spin_unlock(&queue->lock); + + atomic_dec(&par_data->queued_objects); + padata->serial(padata); + } + } +} + +static void padata_flush_order(struct parallel_data *par_data) +{ + struct padata_priv *padata; + + while (1) { + padata = padata_get_next(par_data); + + if (padata && !IS_ERR(padata)) + padata->serial(padata); + else + break; + } + + padata_flush_hard(par_data); +} + +static void padata_reset_work(struct work_struct *work) +{ + int cpu, cpu_index; + struct padata_queue *queue; + struct parallel_data *par_data; + + par_data = container_of(work, struct parallel_data, work); + + if (par_data->flags & (PADATA_INIT|PADATA_RESET_IN_PROGRESS)) + return; + + spin_lock_bh(&par_data->lock); + par_data->flags |= PADATA_RESET_IN_PROGRESS; + + if (!(par_data->flags & PADATA_FLUSH_HARD)) + padata_flush_order(par_data); + else + 
padata_flush_hard(par_data); + + cpu_index = 0; + + par_data->cpu_map = par_data->new_cpu_map; + + for_each_cpu_mask(cpu, par_data->cpu_map) { + queue = per_cpu_ptr(par_data->queue, cpu); + + atomic_set(&queue->num_obj, 0); + queue->cpu_index = cpu_index; + cpu_index++; + } + spin_unlock_bh(&par_data->lock); + + atomic_set(&par_data->seq_nr, -1); + par_data->flags &= ~PADATA_RESET_IN_PROGRESS; + par_data->flags |= PADATA_INIT; +} + +static struct padata_priv *padata_get_next(struct parallel_data *par_data) +{ + int cpu, num_cpus, empty; + int seq_nr, calc_seq_nr, next_nr; + struct padata_queue *queue, *next_queue; + struct padata_priv *padata; + + empty = 0; + next_nr = -1; + next_queue = NULL; + + num_cpus = cpus_weight(par_data->cpu_map); + + for_each_cpu_mask(cpu, par_data->cpu_map) { + queue = per_cpu_ptr(par_data->queue, cpu); + + /* + * Calculate the seq_nr of the object that should be + * next in this queue. + */ + calc_seq_nr = (atomic_read(&queue->num_obj) * num_cpus) + + queue->cpu_index; + + if (!list_empty(&queue->list)) { + padata = list_entry(queue->list.next, + struct padata_priv, list); + + seq_nr = padata->seq_nr; + + if (unlikely(calc_seq_nr != seq_nr)) { + par_data->flags &= ~PADATA_INIT; + par_data->flags |= PADATA_FLUSH_HARD; + padata = NULL; + goto out; + } + } else { + seq_nr = calc_seq_nr; + empty++; + } + + if (next_nr < 0 || seq_nr < next_nr) { + next_nr = seq_nr; + next_queue = queue; + } + } + + padata = NULL; + + if (empty == num_cpus) + goto out; + + if (!list_empty(&next_queue->list)) { + padata = list_entry(next_queue->list.next, + struct padata_priv, list); + + spin_lock(&next_queue->lock); + list_del_init(&padata->list); + spin_unlock(&next_queue->lock); + + atomic_dec(&par_data->queued_objects); + atomic_inc(&next_queue->num_obj); + + goto out; + } + + if (next_nr % num_cpus == next_queue->cpu_index) { + padata = ERR_PTR(-ENODATA); + goto out; + } + + padata = ERR_PTR(-EINPROGRESS); +out: + return padata; +} + +static void 
padata_action(struct softirq_action *h) +{ + struct list_head *cpu_list, local_list; + + cpu_list = &__get_cpu_var(softirq_work_list[PADATA_SOFTIRQ]); + + local_irq_disable(); + list_replace_init(cpu_list, &local_list); + local_irq_enable(); + + while (!list_empty(&local_list)) { + struct padata_priv *padata; + + padata = list_entry(local_list.next, + struct padata_priv, csd.list); + + list_del_init(&padata->csd.list); + + padata->serial(padata); + } +} + +static int padata_cpu_hash(unsigned int nr, struct padata_priv *padata) +{ + int cpu, target_cpu, this_cpu, cpu_index; + + this_cpu = smp_processor_id(); + + if (padata->nr != 0) + return this_cpu; + + if (!(padata_vec[nr].flags & PADATA_INIT)) + return this_cpu; + + padata->seq_nr = atomic_inc_return(&padata_vec[nr].seq_nr); + + if (padata->seq_nr > MAX_SEQ_NR) { + padata_vec[nr].flags &= ~PADATA_INIT; + padata->seq_nr = 0; + schedule_work(&padata_vec[nr].work); + return this_cpu; + } + + padata->nr = nr; + + /* + * Hash the sequence numbers to the cpus by taking + * seq_nr mod. number of cpus in use. 
+	 */
+	cpu_index = padata->seq_nr % cpus_weight(padata_vec[nr].cpu_map);
+
+	target_cpu = first_cpu(padata_vec[nr].cpu_map);
+	for (cpu = 0; cpu < cpu_index; cpu++)
+		target_cpu = next_cpu(target_cpu, padata_vec[nr].cpu_map);
+
+	return target_cpu;
+}
+
+/*
+ * padata_dont_wait - must be called if an object that runs in parallel will
+ * not be serialized with padata_do_serial
+ *
+ * @nr: number of the padata instance
+ * @padata: object that will not be seen by padata_do_serial
+ */
+void padata_dont_wait(unsigned int nr, struct padata_priv *padata)
+{
+	struct padata_queue *queue;
+
+	if (!(padata_vec[nr].flags & PADATA_INIT))
+		return;
+
+	if (padata->nr == 0 || padata->nr != nr)
+		return;
+
+	queue = per_cpu_ptr(padata_vec[nr].queue, smp_processor_id());
+	atomic_inc(&queue->num_obj);
+
+	padata->nr = 0;
+	padata->seq_nr = 0;
+}
+EXPORT_SYMBOL(padata_dont_wait);
+
+/*
+ * padata_do_parallel - padata parallelization function
+ *
+ * @softirq_nr: number of the softirq that will do the parallelization
+ * @nr: number of the padata instance
+ * @padata: object to be parallelized
+ * @cb_cpu: cpu number on which the serialization callback function will run
+ */
+int padata_do_parallel(unsigned int softirq_nr, unsigned int nr,
+		       struct padata_priv *padata, int cb_cpu)
+{
+	int target_cpu;
+
+	padata->cb_cpu = cb_cpu;
+
+	local_bh_disable();
+	target_cpu = padata_cpu_hash(nr, padata);
+	local_bh_enable();
+
+	send_remote_softirq(&padata->csd, target_cpu, softirq_nr);
+
+	return 1;
+}
+EXPORT_SYMBOL(padata_do_parallel);
+
+/*
+ * padata_do_serial - padata serialization function
+ *
+ * @nr: number of the padata instance
+ * @padata: object to be serialized
+ *
+ * returns 1 if the serialization callback function will be called
+ * from padata, 0 else
+ */
+int padata_do_serial(unsigned int nr, struct padata_priv *padata)
+{
+	int cpu;
+	struct padata_queue *reorder_queue;
+
+	if (!(padata_vec[nr].flags & PADATA_INIT))
+		return 0;
+
+	if (padata->nr != nr || padata->nr == 0) {
+		padata->serial(padata);
+		return 1;
+	}
+
+	cpu = smp_processor_id();
+
+	reorder_queue = per_cpu_ptr(padata_vec[nr].queue, cpu);
+
+	spin_lock(&reorder_queue->lock);
+	list_add_tail(&padata->list, &reorder_queue->list);
+	spin_unlock(&reorder_queue->lock);
+
+	atomic_inc(&padata_vec[nr].queued_objects);
+
+try_again:
+	if (!spin_trylock(&padata_vec[nr].lock))
+		goto out;
+
+	while (1) {
+		padata = padata_get_next(&padata_vec[nr]);
+
+		if (!padata || PTR_ERR(padata) == -EINPROGRESS)
+			break;
+		if (PTR_ERR(padata) == -ENODATA) {
+			spin_unlock(&padata_vec[nr].lock);
+			goto out;
+		}
+
+		send_remote_softirq(&padata->csd, padata->cb_cpu,
+				    PADATA_SOFTIRQ);
+	}
+
+	if (unlikely(!(padata_vec[nr].flags & PADATA_INIT))) {
+		spin_unlock(&padata_vec[nr].lock);
+		goto reset_out;
+	}
+
+	spin_unlock(&padata_vec[nr].lock);
+
+	if (atomic_read(&padata_vec[nr].queued_objects))
+		goto try_again;
+
+out:
+	return 1;
+reset_out:
+	schedule_work(&padata_vec[nr].work);
+	return 1;
+}
+EXPORT_SYMBOL(padata_do_serial);
+
+/*
+ * padata_get_cpumap - get the cpu map that is actually in use
+ *
+ * @nr: number of the padata instance
+ */
+cpumask_t padata_get_cpumap(unsigned int nr)
+{
+	return padata_vec[nr].cpu_map;
+}
+EXPORT_SYMBOL(padata_get_cpumap);
+
+/*
+ * padata_set_cpumap - set the cpu map that padata uses
+ *
+ * @nr: number of the padata instance
+ * @cpu_map: the cpu map to use
+ */
+void padata_set_cpumap(unsigned int nr, cpumask_t cpu_map)
+{
+	padata_vec[nr].new_cpu_map = cpu_map;
+	padata_vec[nr].flags &= ~PADATA_INIT;
+	padata_vec[nr].flags |= PADATA_FLUSH_HARD;
+
+	schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_set_cpumap);
+
+/*
+ * padata_add_cpu - add a cpu to the padata cpu map
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to add
+ */
+void padata_add_cpu(unsigned int nr, int cpu)
+{
+	cpumask_t cpu_map = padata_vec[nr].cpu_map;
+
+	cpu_set(cpu, cpu_map);
+	padata_set_cpumap(nr, cpu_map);
+}
+EXPORT_SYMBOL(padata_add_cpu);
+
+/*
+ * padata_remove_cpu - remove a cpu from the padata cpu map
+ *
+ * @nr: number of the padata instance
+ * @cpu: cpu to remove
+ */
+void padata_remove_cpu(unsigned int nr, int cpu)
+{
+	cpumask_t cpu_map = padata_vec[nr].cpu_map;
+
+	cpu_clear(cpu, cpu_map);
+	padata_set_cpumap(nr, cpu_map);
+}
+EXPORT_SYMBOL(padata_remove_cpu);
+
+/*
+ * padata_start - start the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_start(unsigned int nr)
+{
+	if (padata_vec[nr].flags & PADATA_INIT)
+		return;
+
+	schedule_work(&padata_vec[nr].work);
+}
+EXPORT_SYMBOL(padata_start);
+
+/*
+ * padata_stop - stop the parallel processing
+ *
+ * @nr: number of the padata instance
+ */
+void padata_stop(unsigned int nr)
+{
+	padata_vec[nr].flags &= ~PADATA_INIT;
+}
+EXPORT_SYMBOL(padata_stop);
+
+/*
+ * padata_init - initialize a padata instance
+ *
+ * @nr: number of the padata instance
+ * @cpu_map: map of the cpu set that padata uses for parallelization
+ */
+void __init padata_init(unsigned int nr, cpumask_t cpu_map)
+{
+	int cpu, cpu_index;
+	struct padata_queue *percpu_queue, *queue;
+
+	percpu_queue = alloc_percpu(struct padata_queue);
+
+	if (!percpu_queue) {
+		printk("padata_init: Failed to alloc the serialization "
+		       "queues for padata nr %d, exiting!\n", nr);
+		return;
+	}
+
+	cpu_index = 0;
+
+	for_each_possible_cpu(cpu) {
+		queue = per_cpu_ptr(percpu_queue, cpu);
+
+		if (cpu_isset(cpu, cpu_map)) {
+			queue->cpu_index = cpu_index;
+			cpu_index++;
+		}
+
+		INIT_LIST_HEAD(&queue->list);
+		spin_lock_init(&queue->lock);
+		atomic_set(&queue->num_obj, 0);
+	}
+
+	INIT_WORK(&padata_vec[nr].work, padata_reset_work);
+
+	atomic_set(&padata_vec[nr].seq_nr, -1);
+	atomic_set(&padata_vec[nr].queued_objects, 0);
+	padata_vec[nr].cpu_map = cpu_map;
+	padata_vec[nr].new_cpu_map = cpu_map;
+	padata_vec[nr].queue = percpu_queue;
+	padata_vec[nr].flags = 0;
+	spin_lock_init(&padata_vec[nr].lock);
+}
+
+static int __init padata_initcall(void)
+{
+	open_softirq(PADATA_SOFTIRQ, padata_action);
+
+	return 0;
+}
+subsys_initcall(padata_initcall);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index b41fb71..e0faebf 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -58,7 +58,7 @@ static DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
-	"TASKLET", "SCHED", "HRTIMER", "RCU"
+	"TASKLET", "SCHED", "HRTIMER", "PADATA", "RCU"
 };
 
 /*
-- 
1.5.4.2
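For readers following the CPU selection at the top of this section (the helper that padata_do_parallel calls as padata_cpu_hash), here is a minimal userspace sketch of the same idea: take the sequence number modulo the number of CPUs in the map, then walk that many set bits from the first CPU. It is not the kernel code. The names cpu_map_t, map_weight, next_cpu_from and pick_target_cpu are hypothetical stand-ins for cpumask_t, cpus_weight, first_cpu/next_cpu and the helper itself, the map is limited to 32 CPUs, and the empty-map check is an assumption added here that the patch does not make.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: bit n set means CPU n is in the map. */
typedef uint32_t cpu_map_t;

/* Count the CPUs in the map (the role cpus_weight() plays in the patch). */
static unsigned int map_weight(cpu_map_t map)
{
	unsigned int n = 0;

	for (; map; map &= map - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* Return the lowest CPU in the map with number >= from, or -1 if none. */
static int next_cpu_from(cpu_map_t map, int from)
{
	int cpu;

	for (cpu = from; cpu < 32; cpu++)
		if (map & ((cpu_map_t)1 << cpu))
			return cpu;
	return -1;
}

/*
 * Pick the CPU for an object with the given sequence number:
 * index = seq_nr % weight, then step to the index-th CPU in the map.
 * Returning -1 for an empty map is an assumption of this sketch.
 */
static int pick_target_cpu(unsigned int seq_nr, cpu_map_t map)
{
	unsigned int weight = map_weight(map);
	unsigned int cpu_index, i;
	int target;

	if (weight == 0)
		return -1;

	cpu_index = seq_nr % weight;

	target = next_cpu_from(map, 0);
	for (i = 0; i < cpu_index; i++)
		target = next_cpu_from(map, target + 1);

	return target;
}
```

With a map containing CPUs 0, 2, 3 and 5 (0x2D), consecutive sequence numbers cycle through 0, 2, 3, 5 and then wrap back to 0, so objects are spread round-robin over the CPUs in the map regardless of gaps in the CPU numbering.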