* ipsec smp scalability and cpu use fairness (softirqs)
From: Timo Teras
Date: 2013-08-12 13:01 UTC
To: netdev

Hi,

I've recently been doing some IPsec benchmarking, and analyzing what
happens when the system runs out of CPU power. The setup is a DMVPN
gateway (gre+xfrm+opennhrp) with traffic in the forward path. The
systems I have been using are VIA Nano (Padlock AES/SHA acceleration)
and Intel Xeon (AES-NI and SSSE3 SHA1) based. In both setups the
crypto happens synchronously, using special opcodes or an assembly
implementation of the algorithm.

It seems that the combination of softirq, NAPI and synchronous crypto
causes two problems:

1. Single-core systems that run out of CPU power are overwhelmed in an
uncontrollable manner. As the softirq is doing the heavy lifting, the
userland processes are starved first. This can cause the userland IKE
daemon to starve and lose tunnels when it is unable to answer liveness
checks. The quick workaround is to set up traffic shaping for the
encrypted traffic.

2. On multicore (6-12 core) systems, it is not easy to distribute the
IPsec work to multiple cores, as a softirq is sticky to the CPU where
it was raised. The IPsec decryption/encryption is done synchronously
in the NAPI poll loop, so throughput is limited by one CPU. If the NIC
supports multiple queues and balancing on the ESP SPI, we can use that
to get some parallelism.

Fundamentally, both problems arise because synchronous crypto happens
in softirq context. I'm wondering if it would make sense to execute
the synchronous crypto in a low-priority per-xfrm_state workqueue or
similar.

Any suggestions or comments?
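A quick way to observe the single-CPU concentration described in problem 2 is to compare per-CPU NET_RX counts in /proc/softirqs. The sketch below is illustrative only (the sample numbers are made up, not from the thread); it parses a /proc/softirqs-style table and computes how much of the NET_RX work landed on one CPU:

```python
# Hedged illustration (not from the original mail): parse /proc/softirqs-style
# text to see whether NET_RX softirq work (and with it the synchronous IPsec
# crypto) is concentrated on a single CPU.

def parse_softirqs(text):
    """Return {softirq_name: [per-cpu counts]} from /proc/softirqs-style text."""
    lines = text.strip().splitlines()
    result = {}
    for line in lines[1:]:              # first line is the "CPU0 CPU1 ..." header
        name, _, counts = line.partition(":")
        result[name.strip()] = [int(c) for c in counts.split()]
    return result

# Made-up sample resembling an overloaded single-queue receiver:
sample = """\
            CPU0       CPU1       CPU2       CPU3
  NET_TX:   1200        310        295        280
  NET_RX: 9800000      12000      11500      11800
"""

stats = parse_softirqs(sample)
rx = stats["NET_RX"]
share = rx[0] / sum(rx)   # fraction of NET_RX handled by CPU0
```

On a real box one would feed `open("/proc/softirqs").read()` in; a `share` near 1.0 matches the symptom Timo describes.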
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Andrew Collins
Date: 2013-08-12 21:58 UTC
To: Timo Teras; Cc: netdev

On Mon, Aug 12, 2013 at 7:01 AM, Timo Teras <timo.teras@iki.fi> wrote:
> 1. Single core systems that are going out of cpu power, are
> overwhelmed in uncontrollable manner. As softirq is doing the heavy
> lifting, the user land processes are starved first. This can cause
> userland IKE daemon to starve and lose tunnels when it is unable to
> answer liveness checks. The quick workaround is to setup traffic
> shaping for the encrypted traffic.

Which kernel version are you on? I've found I've had better behavior
since:

commit c10d73671ad30f54692f7f69f0e09e75d3a8926a
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jan 10 15:26:34 2013 -0800

    softirq: reduce latencies

as it bails out of lengthy softirq processing much earlier, along with
tuning "netdev_budget" to avoid cycling for too long in the NAPI poll.

> 2. On multicore (6-12 cores) systems, it would appear that it is not
> easy to distribute the ipsec to multiple cores, as softirq is sticky
> to the cpu where it was raised. The ipsec decryption/encryption is
> done synchronously in the napi poll loop, and the throughput is
> limited by one cpu. If the NIC supports multiple queues and balancing
> with ESP SPI, we can use that to get some parallelism.

Although it's highly use-case dependent, I've had good luck using RPS.
I'm testing as an IPsec router, however, not with an endpoint on the
host itself, so it processes nearly all IPsec traffic in receive
context.

Andrew Collins
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Timo Teras
Date: 2013-08-13 6:23 UTC
To: Andrew Collins; Cc: netdev

On Mon, 12 Aug 2013 15:58:41 -0600
Andrew Collins <bsderandrew@gmail.com> wrote:
> Which kernel version are you on? I've found I've had better behavior
> since:
>
> commit c10d73671ad30f54692f7f69f0e09e75d3a8926a
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Thu Jan 10 15:26:34 2013 -0800
>
>     softirq: reduce latencies
>
> as it bails from lengthy softirq processing much earlier, along with
> tuning "netdev_budget" to avoid cycling for too long in the NAPI
> poll.

The user process starvation observations are originally from 3.3/3.4
kernels, and I have not yet retested properly with newer ones; I am
currently starting upgrades to 3.10. That commit looks like it will
directly fix most of the single-core starvation issues.

I think netdev_budget mostly affects latencies for other softirqs,
since the rx softirq will practically always be active under this kind
of stress. It can also still cause problems that encrypted and
non-encrypted packets go through the same queues: when we are out of
CPU, we can start dropping even non-encrypted packets early.

> > 2. On multicore (6-12 cores) systems, it would appear that it is
> > not easy to distribute the ipsec to multiple cores, as softirq is
> > sticky to the cpu where it was raised.
>
> Although it's highly usecase dependent, I've had good luck using
> RPS. I'm testing as an ipsec router however, not with an endpoint
> on the host itself, so it processes nearly all ipsec traffic in
> receive context.

Yes, RPS will help in many scenarios, but not all. The flow dissector
knows only IP/TCP/UDP/GRE, not ESP. So as long as traffic is spread
between different IP addresses, it gets distributed. But if I have a
lot of traffic between two nodes, either with different ESP SPIs
(different gatewayed subnets) or even with the same SPI, it won't be.
For my scenario it will usually even be the same SPI. So even if the
flow dissector learns ESP and uses the SPI in the hash, I'd still need
a way to balance traffic across multiple SAs.

I guess the place where I'd want to see the distribution to cores is
the crypto_aead_*() calls. In fact, it seems the code infrastructure
for this already exists: crypto/cryptd.c. It appears to need manual
configuration, and only a few places, e.g. the AES-NI GCM parts, use
it.

I'm wondering if it would make sense to patch net/xfrm/xfrm_algo.c to
use cryptd? Or at least have a Kconfig or sysctl option to make it do
so.

- Timo
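Timo's point about RPS and ESP can be sketched with a toy steering function (illustrative only, not the kernel's flow dissector or hash): if only the address pair is hashed, or if all traffic shares one SPI, every packet of a two-host tunnel maps to the same CPU, and only spreading traffic over multiple SAs lets an SPI-aware hash help:

```python
# Toy model (not kernel code) of RPS-style CPU steering by flow hash.
import zlib

NUM_CPUS = 4

def steer(saddr, daddr, spi=None):
    """Pick a CPU from a hash over the flow keys (SPI included only if given)."""
    key = f"{saddr}-{daddr}-{spi}".encode()
    return zlib.crc32(key) % NUM_CPUS

# Two gateways, one SA: every packet of the flow maps to a single CPU,
# so one core does all the decryption no matter how many cores exist.
cpus_one_sa = {steer("10.0.0.1", "10.0.0.2", spi=0x1001) for _ in range(1000)}

# Two gateways, many SAs: an SPI-aware hash can spread the work --
# which is why Timo needs a way to balance traffic across multiple SAs.
cpus_many_sas = {steer("10.0.0.1", "10.0.0.2", spi=s) for s in range(64)}
```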
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Steffen Klassert
Date: 2013-08-13 7:46 UTC
To: Timo Teras; Cc: Andrew Collins, netdev

On Tue, Aug 13, 2013 at 09:23:12AM +0300, Timo Teras wrote:
>
> For my scenario it will be usually even same SPI. So even if flow
> dissector learns ESP and uses SPI in hash, I'd need a way to balance
> traffic to multiple SAs.
>
> I guess the place where I'd want to see the distribution to cores is
> crypto_aead_*() calls. In fact, it seems there's code infrastructure
> already for it: crypto/cryptd.c. Seems it needs to be manually
> configured and only few places e.g. aesni gcm parts use it.
>
> I'm wondering if it'd make sense to patch net/xfrm/xfrm_algo.c to use
> cryptd? Or at least have a Kconfig or sysctl option make it do so.

It is possible to configure the used crypto algorithm from userspace
with the crypto user configuration API, see crypto/crypto_user.c.

I wrote a tool that uses this API some time ago. It is still a bit
rudimentary, but it does the job. You can find it at:
https://sourceforge.net/projects/crconf/

Also, if you want parallelism, you could use the pcrypt algorithm. It
sends the crypto requests asynchronously, round robin, to a
configurable set of CPUs. Finally, it takes care to bring the served
crypto requests back into the order they were submitted in, to avoid
packet reordering.

Currently we have only one systemwide workqueue for encryption and one
for decryption, so all IPsec packets are sent to the same workqueue
regardless of which state they use.

I have patches that make it possible to configure a separate workqueue
for each state, or to group some states onto a specific workqueue.
These patches are still unpublished because they have not had much
testing yet, but I could send them, after some polishing, for review
or testing if you are interested.
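Steffen's description of pcrypt — asynchronous round-robin dispatch, with completions forced back into submission order to avoid packet reordering — can be sketched as a toy model (this is not the actual padata/pcrypt code, just the scheme it implements):

```python
# Toy model of pcrypt-style parallel crypto: dispatch requests round robin,
# let workers complete them in arbitrary order, then serialize the results
# back into strict submission order before handing them onward.
import random

def pcrypt_model(requests, num_cpus=4, seed=0):
    # Dispatch: tag each request with a sequence number and a target CPU.
    tagged = [(seq, seq % num_cpus, req) for seq, req in enumerate(requests)]

    # Completion: workers finish in arbitrary order (simulated by shuffling).
    rng = random.Random(seed)
    completed = tagged[:]
    rng.shuffle(completed)

    # Serialization: buffer out-of-order completions, emit in sequence order.
    pending, next_seq, out = {}, 0, []
    for seq, _cpu, req in completed:
        pending[seq] = req
        while next_seq in pending:
            out.append(pending.pop(next_seq))
            next_seq += 1
    return out

packets = [f"pkt{i}" for i in range(10)]
# Output order matches submission order despite out-of-order completion.
```

The serialization stage is the part that keeps ESP sequence numbers monotonic on the wire even though several CPUs encrypt concurrently.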
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Timo Teras
Date: 2013-08-13 7:57 UTC
To: Steffen Klassert; Cc: Andrew Collins, netdev

On Tue, 13 Aug 2013 09:46:14 +0200
Steffen Klassert <steffen.klassert@secunet.com> wrote:
> It is possible to configure the used crypto algorithm from userspace
> with the crypto user configuration API, see crypto/crypto_user.c.
>
> I wrote a tool that uses this API some time ago, it is still
> a bit rudimentary but it does the job. You can find it at:
> https://sourceforge.net/projects/crconf/

Exactly what I was looking for! Thanks!

> Also, if you want parallelism, you could use the pcrypt algorithm.
> It sends the crypto requests asynchronously round robin to a
> configurable set of cpus. Finally it takes care to bring the
> served crypto requests back into the order they were submitted
> to avoid packet reordering.

Right. Looks like this helps a lot.

Perhaps it would also be worth experimenting with RPS-type hash-based
CPU selection?

> Currently we have only one systemwide workqueue for encryption
> and one for decryption. So all IPsec packets are sent to the same
> workqueue, regardless which state they use.
>
> I have patches that make it possible to configure a separate
> workqueue for each state or to group some states to a specific
> workqueue. These patches are still unpublished because they
> have not much testing yet, but I could send them after some
> polishing for review or testing if you are interested.

Yes, I'd be interested. Thanks!
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Steffen Klassert
Date: 2013-08-13 10:45 UTC
To: Timo Teras; Cc: Andrew Collins, netdev

On Tue, Aug 13, 2013 at 10:57:57AM +0300, Timo Teras wrote:
> Right. Looks like this helps a lot.
>
> Perhaps it would be worth to experiment also with RPS type hash
> based cpu selection?

Actually, this was the reason why I started to write the patches
mentioned below. The idea was to use a combination of flow-based and
inner-flow parallelization.

On bigger NUMA machines it does not make much sense to use all cores
for parallelization. The performance depends too much on the actual
topology; moving crypto requests to another NUMA node can even reduce
performance. So I wanted to use RPS-type hash-based CPU selection to
choose the node for a given flow, and then use pcrypt to parallelize
that flow on the chosen node.

> > Currently we have only one systemwide workqueue for encryption
> > and one for decryption. So all IPsec packets are sent to the same
> > workqueue, regardless which state they use.
> >
> > I have patches that make it possible to configure a separate
> > workqueue for each state or to group some states to a specific
> > workqueue. These patches are still unpublished because they
> > have not much testing yet, but I could send them after some
> > polishing for review or testing if you are interested.
>
> Yes, I'd be interested.

Ok, I'll send them. It may take some days to rebase and polish.
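Steffen's two-level idea — hash each flow to a NUMA node so its crypto never crosses nodes, then parallelize only within that node — can be sketched as follows (a hypothetical illustration; the 2x4 topology and helper names are made up, not from the patches):

```python
# Hedged sketch of flow-based node selection combined with inner-flow
# parallelization: an RPS-style hash pins a flow to one NUMA node, and
# requests for that flow are then spread round robin over that node's
# CPUs only, avoiding cross-node traffic for the crypto state.
import itertools
import zlib

# Hypothetical topology: 2 nodes x 4 CPUs (illustrative, not measured).
NODES = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}

def node_for_flow(saddr, daddr, spi):
    """Flow hash -> NUMA node (keeps a flow's crypto on one node)."""
    return zlib.crc32(f"{saddr}/{daddr}/{spi:#x}".encode()) % len(NODES)

def make_dispatcher(saddr, daddr, spi):
    """Return a callable yielding the CPU for each successive request."""
    cpus = NODES[node_for_flow(saddr, daddr, spi)]
    rr = itertools.cycle(cpus)
    return lambda: next(rr)

dispatch = make_dispatcher("10.0.0.1", "10.0.0.2", 0x1001)
cpus_used = {dispatch() for _ in range(100)}
node = node_for_flow("10.0.0.1", "10.0.0.2", 0x1001)
# All requests of this flow stay on one node, but use all of its CPUs.
```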
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Timo Teras
Date: 2013-08-13 11:33 UTC
To: Steffen Klassert; Cc: Andrew Collins, netdev

On Tue, 13 Aug 2013 12:45:48 +0200
Steffen Klassert <steffen.klassert@secunet.com> wrote:
> Actually, this was the reason why I started to write the below
> mentioned patches. The idea behind that was to use a combination of
> flow based and inner flow parallelization.
>
> On bigger NUMA machines it does not make much sense to use all
> cores for parallelization. The performance depends too much on the
> actual topology. Moving crypto requests to another NUMA node can
> even reduce performance. So I wanted to use RPS type hash based
> cpu selection to choose the node for a given flow and then use
> pcrypt to parallelize this flow on the chosen node.

Excellent.

I've now been playing with pcrypt. It does not seem to give a
significant boost in throughput. I've set up the cpumasks properly,
and top says the work is distributed to the appropriate kworkers, but
for some reason throughput does not get any better. I've tested with
iperf in both UDP and TCP modes, with various numbers of threads.

Are there any more synchronization points for a single SA that might
limit throughput? I've been testing with auth hmac(sha1), enc
cbc(aes) - according to the metrics the CPUs are still largely idle
instead of processing more data for better throughput. aes-gcm
(without pcrypt) achieves better throughput, even saturating my test
box's links.

Any pointers on what to test, or how to pinpoint the bottleneck?

I also tried enabling RPS on the gre device, but it did not seem to
make any significant difference either.

> > > Currently we have only one systemwide workqueue for encryption
> > > and one for decryption. So all IPsec packets are sent to the
> > > same workqueue, regardless which state they use.
> > >
> > > I have patches that make it possible to configure a separate
> > > workqueue for each state or to group some states to a specific
> > > workqueue. These patches are still unpublished because they
> > > have not much testing yet, but I could send them after some
> > > polishing for review or testing if you are interested.
> >
> > Yes, I'd be interested.
>
> Ok, I'll send them. May take some days to rebase and polish.

Thanks.
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Steffen Klassert
Date: 2013-08-13 11:56 UTC
To: Timo Teras; Cc: Andrew Collins, netdev

On Tue, Aug 13, 2013 at 02:33:25PM +0300, Timo Teras wrote:
>
> I've been now playing with pcrypt. It seems to not give significant
> boost in throughput. I've set up the cpumasks properly, and top says
> the work is distributed to the appropriate kworkers, but for some
> reason throughput does not get any better. I've tested with iperf in
> both UDP and TCP modes, with various numbers of threads.
>
> Is there any more synchronization point for a single SA that might
> limit throughput? I've been testing with auth hmac(sha1), enc
> cbc(aes) - according to the metrics the CPUs are still largely idle
> instead of processing more data for better throughput. aes-gcm
> (without pcrypt) achieves better throughput, even saturating my test
> box's links.
>
> Any pointers what to test, or to pinpoint the bottleneck?

The only pitfall that comes to my mind is that pcrypt must be
instantiated before inserting the states. Your /proc/crypto should
show something like:

name         : authenc(hmac(sha1),cbc(aes))
driver       : pcrypt(authenc(hmac(sha1-generic),cbc(aes-asm)))
module       : pcrypt
priority     : 2100
refcnt       : 1
selftest     : passed
type         : aead
async        : yes
blocksize    : 16
ivsize       : 16
maxauthsize  : 20
geniv        : <built-in>

Here pcrypt is instantiated, i.e. all new IPsec states (that do
hmac-sha1, cbc-aes) will use it, and adding new states increases the
refcount.

I'll do some tests with current net-next on my own tomorrow and let
you know about the results.
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Timo Teras
Date: 2013-08-13 12:41 UTC
To: Steffen Klassert; Cc: Andrew Collins, netdev

On Tue, 13 Aug 2013 13:56:52 +0200
Steffen Klassert <steffen.klassert@secunet.com> wrote:
> The only pitfall that comes to my mind is that pcrypt must be
> instantiated before inserting the states. Your /proc/crypto
> should show something like:
> ...
> I'll do some tests with current net-next on my own tomorrow and let
> you know about the results.

Yes, I've got pcrypt there. Apparently I had some of the CPU bindings
wrong; now that they are fixed, it's looking a lot better. But it
seems that the ksoftirqd on one of the CPUs becomes the first
bottleneck. I'll try to figure out why.

Thanks for all the info so far; I will continue experimenting here
too.
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Steffen Klassert
Date: 2013-08-20 6:19 UTC
To: Timo Teras; Cc: Andrew Collins, netdev

On Tue, Aug 13, 2013 at 03:41:02PM +0300, Timo Teras wrote:
> Yes, I've got pcrypt there. Apparently I had some of the cpu
> bindings not right, so now it's looking a lot better. But it seems
> that ksoftirqd on one of the CPUs becomes first bottleneck. I'll try
> to figure out why.
>
> Thanks on all the info so far, will continue experimenting here too.

Here are the promised test results:

I used my test boxes with two nodes (Intel Xeon X5550 @ 2.67GHz) and
all cores utilized (16 logical cores). I did box-to-box iperf IPsec
tunnel tests with the crypto algorithm:

pcrypt(authenc(hmac(sha1-ssse3),cbc(aes-asm)))

Throughput is at 1.70 Gbit/s.

The same test without pcrypt, i.e. crypto algorithm:

authenc(hmac(sha1-ssse3),cbc(aes-asm))

Throughput is at 560 Mbit/s.

Unfortunately I can't do forwarding tests, as I have only two 10 Gbit
NICs. It would be nice if I could get forwarding test results from
somewhere.
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Timo Teras
Date: 2013-08-20 6:39 UTC
To: Steffen Klassert; Cc: Andrew Collins, netdev

On Tue, 20 Aug 2013 08:19:14 +0200
Steffen Klassert <steffen.klassert@secunet.com> wrote:
> Here are the promised test results:
>
> I used my test boxes with two nodes (Intel Xeon X5550 @ 2.67GHz) and
> all cores utilized (16 logical cores). I did box-to-box iperf IPsec
> tunnel tests with the crypto algorithm:
>
> pcrypt(authenc(hmac(sha1-ssse3),cbc(aes-asm)))
>
> Throughput is at 1.70 Gbit/s.
>
> The same test without pcrypt, i.e. crypto algorithm:
>
> authenc(hmac(sha1-ssse3),cbc(aes-asm))
>
> Throughput is at 560 Mbit/s.
>
> Unfortunately I can't do forwarding tests, I have only two 10 Gbit
> NICs. Would be nice if I could get forwarding test results from
> somewhere.

I got basically the same results. (I managed to get 2.5 Gbit/s after
some cpumask experimenting.)

At this point it seems that one CPU core peaks at 100% softirq; it
appears to be the NIC rx softirq. I am curious why it takes so much
CPU, because plain TCP at 10 Gbit/s does not take much CPU at all. So
even though pcrypt is used, it seems to add considerable overhead in
the softirq rx path. I wonder if it's the pcrypt dispatch overhead or
some generic ipsec/gre overhead, perhaps a locking issue. I should
profile it.

Thanks.
* Re: ipsec smp scalability and cpu use fairness (softirqs)
From: Steffen Klassert
Date: 2013-08-20 6:17 UTC
To: Timo Teras; Cc: Andrew Collins, netdev

On Tue, Aug 13, 2013 at 10:57:57AM +0300, Timo Teras wrote:
> On Tue, 13 Aug 2013 09:46:14 +0200
> Steffen Klassert <steffen.klassert@secunet.com> wrote:
> > Currently we have only one systemwide workqueue for encryption
> > and one for decryption. So all IPsec packets are sent to the same
> > workqueue, regardless which state they use.
> >
> > I have patches that make it possible to configure a separate
> > workqueue for each state or to group some states to a specific
> > workqueue. These patches are still unpublished because they
> > have not much testing yet, but I could send them after some
> > polishing for review or testing if you are interested.
>
> Yes, I'd be interested.

I've pushed the patches to

git://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git net-next-pcrypt

Steffen Klassert (9):
  crypto: api - Add crypto_tfm_has_alg helper
  xfrm: Add a netlink attribute for crypto algorithm drivers
  esp4: Use the crypto algorithm driver name if present
  esp6: Use the crypto algorithm driver name if present
  crypto: Support for multi instance algorithms
  pcrypt: handle errors from crypto_register_template
  crypto: pcrypt - Add support for request backlog
  crypto: pcrypt - Add the padata related informations to the instance context
  crypto: pcrypt - Support for multiple padata instances

 crypto/algapi.c           |    3 +-
 crypto/api.c              |   15 ++
 crypto/pcrypt.c           |  489 +++++++++++++++++++++++++++++++++++----------
 include/linux/crypto.h    |    7 +
 include/net/xfrm.h        |    2 +
 include/uapi/linux/xfrm.h |    5 +
 net/ipv4/esp4.c           |   33 ++-
 net/ipv6/esp6.c           |   33 ++-
 net/xfrm/xfrm_user.c      |    8 +
 9 files changed, 482 insertions(+), 113 deletions(-)

This is a combined patchset of networking and crypto changes. I merged
them and pushed them to a git repo so I don't need to bother the
netdev and crypto lists with these early-stage patches.

The networking changes add the possibility to choose the crypto
algorithm driver on a per-SA basis. I've attached the necessary
iproute2 patch to this mail.

The crypto changes are a general pcrypt update. They add the
possibility to build multiple instances of pcrypt, such that each SA
can have its own pcrypt instance.

There is one unrelated patch in the patchset:

  crypto: pcrypt - Add support for request backlog

It should not interfere with the other patches; it was just too much
pain to rebase without it.

Comments on the patchset and test results are very welcome!

The patch below adds an iproute2 option to configure the crypto driver
per SA:

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 20 Aug 2013 07:13:51 +0200
Subject: [PATCH] iproute2: Add an option to configure the crypto driver
 on a per-SA basis

---
 include/linux/xfrm.h | 5 +++++
 ip/xfrm_state.c      | 7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index 341c3c9..4520008 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -116,6 +116,10 @@ struct xfrm_algo_aead {
 	char		alg_key[0];
 };
 
+struct xfrm_algo_driver {
+	char		driver_name[64];
+};
+
 struct xfrm_stats {
 	__u32	replay_window;
 	__u32	replay;
@@ -298,6 +302,7 @@ enum xfrm_attr_type_t {
 	XFRMA_TFCPAD,		/* __u32 */
 	XFRMA_REPLAY_ESN_VAL,	/* struct xfrm_replay_esn */
 	XFRMA_SA_EXTRA_FLAGS,	/* __u32 */
+	XFRMA_ALG_DRIVER,	/* struct xfrm_algo_driver */
 
 	__XFRMA_MAX
 #define XFRMA_MAX (__XFRMA_MAX - 1)
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 389942c..b7d413d 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -274,6 +274,7 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 		char			buf[RTA_BUF_SIZE];
 	} req;
 	struct xfrm_replay_state replay;
+	struct xfrm_algo_driver driver;
 	char *idp = NULL;
 	char *aeadop = NULL;
 	char *ealgop = NULL;
@@ -290,6 +291,7 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 
 	memset(&req, 0, sizeof(req));
 	memset(&replay, 0, sizeof(replay));
+	memset(&driver, 0, sizeof(driver));
 	memset(&ctx, 0, sizeof(ctx));
 
 	req.n.nlmsg_len = NLMSG_LENGTH(sizeof(req.xsinfo));
@@ -392,6 +394,11 @@ static int xfrm_state_modify(int cmd, unsigned flags, int argc, char **argv)
 			xfrm_sctx_parse((char *)&ctx.str, context, &ctx.sctx);
 			addattr_l(&req.n, sizeof(req.buf), XFRMA_SEC_CTX,
 				  (void *)&ctx, ctx.sctx.len);
+		} else if (strcmp(*argv, "crypto-driver") == 0) {
+			NEXT_ARG();
+			strncpy(driver.driver_name, *argv, sizeof(driver.driver_name));
+			addattr_l(&req.n, sizeof(req.buf), XFRMA_ALG_DRIVER,
+				  (void *)&driver, sizeof(driver));
 		} else {
 			/* try to assume ALGO */
 			int type = xfrm_algotype_getbyname(*argv);
-- 
1.7.9.5
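With the iproute2 patch applied, selecting a driver per SA would presumably look something like the following. This is a hypothetical invocation, not from the thread: the `crypto-driver` keyword and the pcrypt driver name come from the patches above, while the addresses, SPI and keys are placeholders:

```shell
# Hypothetical usage of the patched iproute2 (all values are placeholders):
ip xfrm state add src 192.0.2.1 dst 192.0.2.2 proto esp spi 0x1001 \
    mode tunnel \
    auth 'hmac(sha1)' 0x0123456789abcdef0123456789abcdef01234567 \
    enc 'cbc(aes)' 0x0123456789abcdef0123456789abcdef \
    crypto-driver 'pcrypt(authenc(hmac(sha1-generic),cbc(aes-asm)))'
```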