* [PATCH 0/2] Parallel crypto/IPsec v7
@ 2009-12-18 12:20 Steffen Klassert
  2009-12-18 12:20 ` [PATCH 1/2] padata: generic parallelization/serialization interface Steffen Klassert
  2009-12-18 12:21 ` [PATCH 2/2] crypto: pcrypt - Add pcrypt crypto parallelization wrapper Steffen Klassert
  0 siblings, 2 replies; 4+ messages in thread
From: Steffen Klassert @ 2009-12-18 12:20 UTC (permalink / raw)
  To: Herbert Xu, David Miller; +Cc: linux-crypto

This patchset adds the 'pcrypt' parallel crypto template. With this template it
is possible to process the crypto requests of a transform in parallel without
reordering the requests. This is of particular interest for IPsec.

The parallel crypto template is based on the 'padata' generic
parallelization/serialization method. With this method, data objects can
be processed in parallel, starting at some given point.
After serialization, the parallelized data objects return in the same
order they had before the parallelization. In the case of IPsec, this makes
it possible to run the expensive parts in parallel without packet
reordering.
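
The ordering guarantee described above can be illustrated with a small
userspace sketch. This is not the padata code from the patch; the names
(reorder_queue, reorder_submit, reorder_complete, reorder_release) and the
fixed window size are assumptions made up for illustration. Each object gets
a sequence number at the parallelization point, workers may finish in any
order, and the serializer only releases the object whose number is next.

```c
#include <assert.h>
#include <stdbool.h>

#define REORDER_SLOTS 8u	/* hypothetical window size, a power of two */

struct reorder_queue {
	bool done[REORDER_SLOTS];	/* slot holds a finished object */
	int payload[REORDER_SLOTS];	/* result stored by the worker */
	unsigned int next_submit;	/* number handed to the next object */
	unsigned int next_release;	/* number the serializer expects next */
};

void reorder_init(struct reorder_queue *q)
{
	*q = (struct reorder_queue){0};
}

/* Parallelization point: tag the object with its position in the stream. */
unsigned int reorder_submit(struct reorder_queue *q)
{
	/* At most REORDER_SLOTS objects may be in flight at once. */
	assert(q->next_submit - q->next_release < REORDER_SLOTS);
	return q->next_submit++;
}

/* Called by a worker when it has finished an object, in any order. */
void reorder_complete(struct reorder_queue *q, unsigned int seq, int result)
{
	unsigned int slot = seq % REORDER_SLOTS;

	assert(!q->done[slot]);
	q->payload[slot] = result;
	q->done[slot] = true;
}

/*
 * Serialization point: hand out the next object in submission order.
 * Returns false if that object has not been completed yet, even when
 * later objects are already done.
 */
bool reorder_release(struct reorder_queue *q, int *out)
{
	unsigned int slot = q->next_release % REORDER_SLOTS;

	if (!q->done[slot])
		return false;
	*out = q->payload[slot];
	q->done[slot] = false;
	q->next_release++;
	return true;
}
```

If three objects are submitted and the third completes first,
reorder_release() keeps returning false until the first one is done, so the
consumer sees the results in submission order. The sketch is single-threaded;
a real implementation needs locking or per-cpu queues.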

IPsec forwarding tests with two quad core machines (Intel Core 2 Quad Q6600)
and an EXFO FTB-400 packet blazer showed the results below. In all tests I
used smp_affinity to pin the interrupts of the network cards to different
CPUs.

linux-2.6.33-rc1 (64 bit)
Packetsize: 1420 byte
Test time: 60 sec
Encryption: aes192-sha1
bidirectional throughput without packet loss: 2 x 325 Mbit/s
unidirectional throughput without packet loss: 325 Mbit/s

linux-2.6.33-rc1 (64 bit)
Packetsize: 128 byte
Test time: 60 sec
Encryption: aes192-sha1
bidirectional throughput without packet loss: 2 x 100 Mbit/s
unidirectional throughput without packet loss: 125 Mbit/s


linux-2.6.33-rc1 with padata/pcrypt (64 bit)
Packetsize: 1420 byte
Test time: 60 sec
Encryption: aes192-sha1
bidirectional throughput without packet loss: 2 x 650 Mbit/s
unidirectional throughput without packet loss: 850 Mbit/s

linux-2.6.33-rc1 with padata/pcrypt (64 bit)
Packetsize: 128 byte
Test time: 60 sec
Encryption: aes192-sha1
bidirectional throughput without packet loss: 2 x 100 Mbit/s
unidirectional throughput without packet loss: 125 Mbit/s


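So the performance win on big packets is quite good: bidirectional
throughput doubles (2 x 325 -> 2 x 650 Mbit/s) and unidirectional throughput
rises from 325 to 850 Mbit/s. But on small packets the throughput results
with and without the workqueue based parallelization are almost the same in
my testing environment.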

Changes from v6:

- Rework padata to use workqueues instead of softirqs for
  parallelization/serialization

- Add a cyclic sequence number pattern, which makes resetting the padata
  serialization logic on sequence number overrun superfluous.

- Adapt pcrypt to the changed padata interface.

- Rebased to linux-2.6.33-rc1
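
To illustrate why a cyclic sequence number needs no reset on overrun, here
is a minimal sketch of wraparound-safe comparison, in the style of serial
number arithmetic. It is not taken from the patch; seq_before, seq_next and
seq_distance are hypothetical helpers, and the sketch assumes that no two
live sequence numbers are ever more than 2^31 apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if 'a' comes before 'b' in the cyclic order. The unsigned
 * difference wraps modulo 2^32; reading it as a signed value gives the
 * direction. The conversion to int32_t relies on the usual two's
 * complement behaviour of common compilers.
 */
bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Advance a sequence number; UINT32_MAX simply wraps around to 0. */
uint32_t seq_next(uint32_t s)
{
	return s + 1u;
}

/* Steps from 'from' forward to 'to'; 'to' must not be before 'from'. */
uint32_t seq_distance(uint32_t from, uint32_t to)
{
	assert(!seq_before(to, from));
	return to - from;
}
```

With these helpers, UINT32_MAX still compares as "before" 0, so the ordering
logic keeps working across the overrun without any special case.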

Steffen

* Re: workqueue thing
@ 2009-12-21 14:19 Tejun Heo
  2009-12-21 15:19 ` Arjan van de Ven
  0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2009-12-21 14:19 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jens Axboe, Andi Kleen, Peter Zijlstra, torvalds, awalls,
	linux-kernel, jeff, mingo, akpm, rusty, cl, dhowells, avi,
	johannes

Hello, Arjan.

On 12/21/2009 10:53 PM, Arjan van de Ven wrote:
> I'm just wondering if even that is overkill; I suspect you can do
> entirely without the scheduler intrusion;
> just make a new thread for each work item, with some hysteresis:
>
> * threads should stay around for a bit before dying (you do that)
> * after some minimum nr of threads (say 4 per cpu), you wait, say, 0.1
> seconds before deciding it's time
>   to spawn more threads, to smooth out spikes of very short lived stuff.
> 
> wouldn't that be a lot simpler than "ask the scheduler to see if
> they are all blocked". If they are all very busy churning cpu (say
> doing raid6 work, or btrfs checksumming) you still would want more
> threads I suspect

Ah... okay, there are two aspects in which cmwq involves the scheduler.

A. Concurrency management.  This is achieved by the scheduler
   callbacks which watch how many workers are working.

B. Deadlock avoidance.  This requires migrating rescuers to CPUs under
   allocation distress.  The problem here is that
   set_cpus_allowed_ptr() doesn't allow migrating tasks to CPUs which
   are online but !active (CPU_DOWN_PREPARE).

B would be necessary whichever way you implement a shared worker pool,
unless you create all the workers which might possibly be necessary
for allocation.

For A, it's far more efficient and robust with scheduler callbacks.
It's conceptually pretty simple too.  If you look at the patch which
actually implements the dynamic pool, the amount of code necessary for
implementing this part isn't that big.  Most of the complexity in the
series comes from trying to share workers, not from the dynamic pool
management.  Even if it switches to a timer based one, there simply
won't be much reduction in complexity.  So, I don't think there's any
reason to choose rather fragile heuristics when it can be implemented
in a pretty mechanical way.
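
As a rough illustration of aspect A, here is a toy userspace model of
callback-driven concurrency management. It is not the code from the series;
toy_pool, toy_worker_sleeping and toy_worker_waking_up are hypothetical
names, and the policy shown (wake an idle worker only when the last running
worker blocks while work is pending) is an assumption about how such
callbacks could be used.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pool {
	int nr_running;	/* workers currently runnable on this CPU */
	int nr_idle;	/* parked workers that could be woken */
	int nr_pending;	/* queued work items nobody has picked up */
};

/*
 * Hypothetical hook, called when a worker is about to block. If it was
 * the last running worker and work is still pending, an idle worker is
 * woken to take the next item. Returns true if a worker was woken.
 */
bool toy_worker_sleeping(struct toy_pool *p)
{
	assert(p->nr_running > 0);
	p->nr_running--;

	if (p->nr_running == 0 && p->nr_pending > 0 && p->nr_idle > 0) {
		p->nr_idle--;
		p->nr_running++;
		p->nr_pending--;	/* the woken worker takes one item */
		return true;
	}
	return false;
}

/* Hypothetical hook, called when a blocked worker becomes runnable. */
void toy_worker_waking_up(struct toy_pool *p)
{
	p->nr_running++;
}
```

The decision is made at the moment a worker blocks, from exact counts, so no
timeout or thread-count threshold has to be tuned.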

Thanks.

-- 
tejun
