linux-wireless.vger.kernel.org archive mirror
From: Christian Lamparter <chunkeey@googlemail.com>
To: Ben Greear <greearb@candelatech.com>
Cc: "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>
Subject: Re: Looking for non-NIC hardware-offload for wpa2 decrypt.
Date: Wed, 30 Jul 2014 00:29:56 +0200
Message-ID: <3302077.5sUEMiqNRr@debian64>
In-Reply-To: <53D6B78E.1070705@candelatech.com>

On Monday, July 28, 2014 01:50:22 PM Ben Greear wrote:
> On 03/31/2014 11:09 AM, Christian Lamparter wrote:
> > Hello,
> > 
> > On Sunday, March 30, 2014 09:40:24 PM Ben Greear wrote:
> >> Due to hardware/firmware limitations, it does not appear possible to
> >> have a wifi NIC do hardware decrypt when using multiple stations on a single
> >> NIC (and have both stations connected to the same AP).
> >>
> >> This just happens to be one of my favourite things to do, and it kills
> >> performance compared to normal 'Open' throughput.
> >>
> >> I am curious if anyone knows of any way to accelerate rx-decrypt, perhaps by
> >> using a specialized hardware board or maybe a feature of certain CPUs?
> > 
> > You could check if your CPU (bios and kernel) have support for AES-NI [0].
> > AFAICT mac80211 utilizes the cryptoapi. Therefore anything that supports
> > the proper crypto bindings can be used to accelerate the encryption and
> > decryption process to some degree. And it just happens that thanks to
> > AES-NI parts of math can be efficiently calculated by the CPU. 
> 
> I recently took a look at this again, and the Intel E5 I'm using
> does use the aesni instructions/driver as far as I can tell.
Which E5 exactly? There are many different E5 models.

> Throughput is still around 500Mbps where open is around 800Mbps.
I can't test ath10k or your multiple station on a single NIC thing. But
can you run a test for a "simple" single station - single AP wpa2 setup?
I want to know how close to the 800Mbps it actually goes.

> perf top shows this:
> 
> Samples: 37K of event 'cycles', Event count (approx.): 19360716192
>  12.01%  [kernel]                                      [k] math_state_restore
>  11.64%  [kernel]                                      [k] _aesni_enc1
>   8.25%  [kernel]                                      [k] __save_init_fpu
>   2.44%  [kernel]                                      [k] crypto_xor
>   1.87%  [kernel]                                      [k] irq_fpu_usable
>   1.30%  [kernel]                                      [k] aes_encrypt
>   0.76%  [kernel]                                      [k] __kernel_fpu_end
> ....
Yes, aesni is doing some of the heavy lifting! But in your original post,
you said you were interested in accelerating rx-decrypt... Now it's about
encryption offload?! [please make up your mind :-D]

That being said, 12.01% (math_state_restore, called by kernel_fpu_end)
and 8.25% (__save_init_fpu, called by kernel_fpu_begin) of the cycles,
about 20% in total, are wasted on FPU save and restore overhead.
[You have noticed that before, haven't you? ;-) ]

I think part of the poor performance is due to the design of
aes_encrypt in arch/x86/crypto/aesni-intel_glue.c:

> static void aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
> {
>        struct crypto_aes_ctx *ctx = aes_ctx(crypto_tfm_ctx(tfm));
>        [...]
>                kernel_fpu_begin();
>                aesni_enc(ctx, dst, src);
>                kernel_fpu_end();
>        [...]
> }

Ideally you would want something like:

>                kernel_fpu_begin();
>                aesni_enc(ctx, dst_frame1, src_frame1);
>                aesni_enc(ctx, dst_frame2, src_frame2);
>                ...
>                aesni_enc(ctx, dst_frameN, src_frameN);
>                kernel_fpu_end();

But getting there might not be easy and may involve more than a bit
of "real programming".
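
A rough userspace sketch of the difference between the two patterns
above, not kernel code. Everything in it is a hypothetical stand-in:
model_fpu_begin/model_fpu_end only count calls where the kernel would
save and restore FPU state, and model_enc is a plain XOR placeholder
for aesni_enc, not AES. It only shows how many begin/end pairs each
pattern issues for the same work:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16
#define MAX_BLOCKS 64

/* Hypothetical counters standing in for real FPU save/restore work. */
struct fpu_stats {
	unsigned long begins;	/* modelled kernel_fpu_begin() calls */
	unsigned long ends;	/* modelled kernel_fpu_end() calls */
};

static void model_fpu_begin(struct fpu_stats *s) { s->begins++; }
static void model_fpu_end(struct fpu_stats *s)   { s->ends++; }

/* Placeholder for aesni_enc(): a plain XOR, NOT real AES. */
static void model_enc(uint8_t *dst, const uint8_t *src)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		dst[i] = src[i] ^ 0x5a;
}

/* Pattern of aes_encrypt(): one begin/end pair around every block. */
void encrypt_per_block(struct fpu_stats *s, uint8_t *dst,
		       const uint8_t *src, size_t nblocks)
{
	for (size_t i = 0; i < nblocks; i++) {
		model_fpu_begin(s);
		model_enc(dst + i * BLOCK_SIZE, src + i * BLOCK_SIZE);
		model_fpu_end(s);
	}
}

/* Batched pattern: one begin/end pair around the whole run. */
void encrypt_batched(struct fpu_stats *s, uint8_t *dst,
		     const uint8_t *src, size_t nblocks)
{
	if (nblocks == 0)
		return;

	model_fpu_begin(s);
	for (size_t i = 0; i < nblocks; i++)
		model_enc(dst + i * BLOCK_SIZE, src + i * BLOCK_SIZE);
	model_fpu_end(s);
}

/* Runs both patterns on the same input; returns the begin/end pairs saved. */
unsigned long demo_pairs_saved(size_t nblocks)
{
	static uint8_t src[MAX_BLOCKS * BLOCK_SIZE];
	static uint8_t a[MAX_BLOCKS * BLOCK_SIZE], b[MAX_BLOCKS * BLOCK_SIZE];
	struct fpu_stats per_block = { 0, 0 }, batched = { 0, 0 };

	if (nblocks > MAX_BLOCKS)
		nblocks = MAX_BLOCKS;

	encrypt_per_block(&per_block, a, src, nblocks);
	encrypt_batched(&batched, b, src, nblocks);

	/* Both patterns must produce identical output. */
	assert(memcmp(a, b, nblocks * BLOCK_SIZE) == 0);
	assert(per_block.begins == per_block.ends);
	assert(batched.begins == batched.ends);

	return per_block.begins - batched.begins;
}
```

For N blocks the model saves N - 1 begin/end pairs. Whether that shows
up as throughput depends on how expensive each save/restore really is
on the CPU in question. Note also that the real kernel_fpu_begin
disables preemption, so very long batches would have a latency cost
that this sketch does not model.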

In theory, it should be enough to test if there is some potential
in this approach by "enhancing" the tx-path in the following way:

1. kernel_fpu_begin and kernel_fpu_end calls should be added to
ieee80211_crypto_ccmp_encrypt in net/mac80211/wpa.c:

>+      kernel_fpu_begin();
>        skb_queue_walk(&tx->skbs, skb) {
>+               if (ccmp_encrypt_skb(tx, skb) < 0) {
>+                       /* leave the FPU section before bailing out */
>+                       kernel_fpu_end();
>                        return TX_DROP;
>+               }
>        }
>+      kernel_fpu_end();
>
>        return TX_CONTINUE;

2. ieee80211_aes_ccm_encrypt in net/mac80211/aes_ccm.c
has to call __aes_encrypt instead of aes_encrypt in crypto_aead_encrypt.
[I can't think of a sane way to make this work. Of course, it's possible to
make a copy of ccm(aes) crypto_alg* and overwrite aes_encrypt with
__aes_encrypt. But that's not very nice... (It should work though) ]

> Any other magic add-in cards that would somehow just make this all faster w/out
> having to do any real programming work? :)
I doubt there is a magic add-in card for such a use-case. I think most of
them target applications/libraries directly and not the kernel crypto
interface mac80211 is using.

[It would be really nice to know what E5 you actually have]

Regards
Christian
