netdev.vger.kernel.org archive mirror
* Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-14 14:59 UTC
  To: netdev

I initially mistook this for a traffic control issue, but after
stripping the test beds down to just the MACSec component, I can still
replicate the issue.  After approximately 5TB of transfer / 4 billion
packets over a MACSec link it stops passing traffic.  I have
replicated this now on both Ubuntu Server 18.04 with their patched
4.15 kernel and Gentoo with a vanilla 4.18.13 kernel.

As noted before, my test setup consists of two machines with a direct
ethernet connection to each other.  MACSec is set up on the link, along
with a /30 network.  I then hammer the link with iperf3 in both
directions.  On a gigabit link it takes me one to two days to trip the
bug.  Nothing is logged to dmesg.  If I remove and re-add the MACSec
link, or reboot the machines, traffic flow resumes.  I can replicate on
physical hardware or in VMs.

How should I proceed to help diagnose and correct this?

To replicate, set up two machines with a direct ethernet connection.
If simulating in ESXi, set up a dedicated vSwitch with promiscuous
mode, forged transmits and MAC address changes all allowed and the MTU
increased to 9000, then set up a dedicated port group and VLAN to
simulate the direct connection.

I use the following script on each host to set up the MACSec
connection, adjusting keys, rxmac, interface and IPs as appropriate:

#!/bin/bash

# Interfaces:
# dif = Egress physical interface (Dest)
# eif = Encrypted interface
dif=ens224
eif=macsec0

# MACSec Keys:
# txkey = Transmit (Local) key
# rxkey = Receive (Remote) key
# rxmac = Receive (Remote) MAC address
txkey=60995924232808431491190820961556
rxkey=87345530111733181210202106249824
rxmac=00:0c:29:c5:95:df

# Clear any existing IP config
ifconfig "$dif" 0.0.0.0

# Bring up macsec:
echo "* Enable MACSec"
modprobe macsec
ip link add link "$dif" "$eif" type macsec
ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
ip macsec add "$eif" rx address "$rxmac" port 1
ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
ip link set "$eif" type macsec encrypt on

# Bring up the interfaces:
echo "* Light tunnel NICS"
ip link set "$dif" up
ip link set "$eif" up

# Set IP
ifconfig "$eif" 192.168.211.1/30

Once you can ping across the link, use iperf3 or a similar network
stress tool to flood the link with traffic in both directions and wait
for the bug to trigger.

Josh Coombs


* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Sabrina Dubroca @ 2018-10-14 20:25 UTC
  To: Josh Coombs; +Cc: netdev

2018-10-14, 10:59:31 -0400, Josh Coombs wrote:
> I initially mistook this for a traffic control issue, but after
> stripping the test beds down to just the MACSec component, I can still
> replicate the issue.  After approximately 5TB of transfer / 4 billion
> packets over a MACSec link it stops passing traffic.

I think you're just hitting packet number exhaustion. After 2^32
packets, the packet number would wrap to 0 and start being reused,
which breaks the crypto used by macsec. Before this point, you have to
add a new SA, and tell the macsec device to switch to it.
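
The manual rollover described above can be sketched as a dry run that
only prints the commands.  This is a sketch under assumptions, not a
tested procedure: the next association number (1), the key IDs 03/04
and both key values are placeholders that do not come from this
thread, and the peer has to run the mirror-image commands at roughly
the same time.

```shell
#!/bin/bash
# Dry run: print the commands for a manual SA rollover instead of running them.
# Assumed values (not from the thread): next_an, key IDs 03/04, both keys.

eif=macsec0
rxmac=00:0c:29:c5:95:df
next_an=1
new_txkey=00112233445566778899aabbccddeeff   # placeholder key
new_rxkey=ffeeddccbbaa99887766554433221100   # placeholder key

rekey_cmds() {
    # 1. Install the peer's new key for receive first, so frames sent
    #    under the new SA are accepted as soon as the peer switches.
    echo "ip macsec add $eif rx address $rxmac port 1 sa $next_an pn 1 on key 03 $new_rxkey"
    # 2. Install the new transmit SA with a fresh packet number.
    echo "ip macsec add $eif tx sa $next_an pn 1 on key 04 $new_txkey"
    # 3. Tell the device to transmit on the new SA.
    echo "ip link set $eif type macsec encodingsa $next_an"
}

rekey_cmds
```

Printing rather than executing keeps the sketch safe to run without
root; the order (receive SA first, then transmit SA, then the switch)
is the part that matters.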

That's why you should be using wpa_supplicant. It will monitor the
growth of the packet number, and handle the rekey for you.

If you start with a PN already close to exhaustion (say, 4294967000),
you should hit the "bug" very quickly.

> # Bring up macsec:
> echo "* Enable MACSec"
> modprobe macsec
> ip link add link "$dif" "$eif" type macsec
> ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"

Keep the rest of the configuration, and replace that one with:
ip macsec add "$eif" tx sa 0 pn 4294967000 on key 02 "$txkey"

to trigger the issue faster.
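
As a rough check of the numbers, the headroom left on an SA can be
worked out with shell arithmetic.  This is a sketch that assumes a
32-bit packet number whose last usable value is 2^32 - 1; the
1200-byte average packet size is an assumption chosen to match the
"approx 5TB" in the report, not a measured value.

```shell
#!/bin/bash
# Packet-number headroom for a 32-bit PN (largest usable value is 2^32 - 1).
PN_LIMIT=$(( 1 << 32 ))   # 4294967296

# Packets that can still be sent when the SA starts at PN $1.
packets_left() {
    echo $(( PN_LIMIT - $1 ))
}

# Approximate bytes carried before exhaustion, for start PN $1 and an
# average packet size of $2 bytes (the size is an assumption you supply).
bytes_left() {
    echo $(( (PN_LIMIT - $1) * $2 ))
}

packets_left 4294967000      # prints 296
packets_left 1               # prints 4294967295
bytes_left 1 1200            # prints 5153960754000 (about 5.15 TB)
```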

-- 
Sabrina


* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-14 20:52 UTC
  To: sd; +Cc: netdev

On Sun, Oct 14, 2018 at 4:24 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> 2018-10-14, 10:59:31 -0400, Josh Coombs wrote:
> > I initially mistook this for a traffic control issue, but after
> > stripping the test beds down to just the MACSec component, I can still
> > replicate the issue.  After approximately 5TB of transfer / 4 billion
> > packets over a MACSec link it stops passing traffic.
>
> I think you're just hitting packet number exhaustion. After 2^32
> packets, the packet number would wrap to 0 and start being reused,
> which breaks the crypto used by macsec. Before this point, you have to
> add a new SA, and tell the macsec device to switch to it.

I had not considered that.  I naively thought that as long as I didn't
specify a replay window, it'd roll the PN over on its own and life
would be good.  I'll test that theory tomorrow; it should be easy to
prove out.

> That's why you should be using wpa_supplicant. It will monitor the
> growth of the packet number, and handle the rekey for you.

Thank you for the heads up, I'll read up on this as well.

Josh C


* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-15 15:45 UTC
  To: sd; +Cc: netdev

And confirmed: starting with a high packet number results in a very
short testbed run, 296 packets and then nothing, just as you surmised.
Sorry for raising the alarm falsely.  It looks like I need to roll my own
build of wpa_supplicant, as the Ubuntu builds don't include the macsec
driver.  I haven't tested Gentoo's ebuilds yet to see if they do.
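
For reference, a minimal pre-shared-key setup for wpa_supplicant might
look like the sketch below.  The CAK and CKN are placeholder values,
the output path is a hypothetical choice, and the interface name is
taken from the script earlier in the thread.  The build needs
CONFIG_MACSEC=y and CONFIG_DRIVER_MACSEC_LINUX=y for the macsec_linux
driver to be available.

```shell
#!/bin/bash
# Write a minimal wpa_supplicant configuration for MACsec with a
# pre-shared CAK/CKN pair.  Key material below is a placeholder and the
# file location is a hypothetical choice for this example.
conf="${TMPDIR:-/tmp}/wpa_supplicant-macsec.conf"

cat > "$conf" <<'EOF'
eapol_version=3
ap_scan=0
network={
    key_mgmt=NONE
    eapol_flags=0
    macsec_policy=1
    mka_cak=0123456789ABCDEF0123456789ABCDEF
    mka_ckn=6162636465666768696A6B6C6D6E6F707172737475767778797A303132333435
}
EOF

# Print (rather than run) the command that would start key agreement
# on the physical interface; both peers need the same CAK and CKN.
echo "wpa_supplicant -i ens224 -D macsec_linux -c $conf"
```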

Josh Coombs


* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-17 13:45 UTC
  To: sd; +Cc: netdev

I've got wpa_supplicant working with macsec on Fedora, and my test bed has
shuffled 16 billion packets so far without interruption.  I am a bit
concerned that I've just pushed the resource exhaustion issue down the
road, though.  Looking at the output of ip macsec show, I see four SAs
for TX and RX, and it appears to negotiate a new pair every 3 to 3.5
billion packets.  It doesn't appear to be tearing down old SAs.  What
happens when the available SA slots run out?

Joshua Coombs
GWI

office 207-494-2140
www.gwi.net


* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-17 14:46 UTC
  To: sd; +Cc: netdev

I see it reusing SAs, so I'm good.
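
The reuse fits the four-slot limit: the association number is a 2-bit
field, so each secure channel has SAs 0 through 3 and nothing more.
The sketch below assumes the rekey simply moves to the next slot and
wraps around; that ordering is an assumption for illustration and was
not confirmed in this thread.

```shell
#!/bin/bash
# The association number (AN) is a 2-bit field, so each secure channel
# has four SA slots: 0, 1, 2 and 3.
next_an() {
    echo $(( ($1 + 1) % 4 ))
}

# Walk through five rekeys starting from AN 0 to show the wrap-around.
an=0
for rekey in 1 2 3 4 5; do
    an=$(next_an "$an")
    echo "rekey $rekey -> AN $an"
done
```

The fourth rekey lands back on AN 0, which is why the list of SAs
stops growing at four.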

Joshua Coombs


