[RFC] Kernel unaligned access at __skb_flow

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] Kernel unaligned access at __skb_flow_dissect
@ 2016-01-29 18:06 Sowmini Varadhan
  2016-01-29 18:33 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-29 18:06 UTC (permalink / raw)
  To: netdev


There is an unaligned access at __skb_flow_dissect when it calls
ip6_flowlabel() with the call stack

  __skb_flow_dissect+0x2a8/0x87c
  eth_get_headlen+0x5c/0xaxa4
  ixgbe_clean_rx_irq+0x5cc/0xb20 [ixgbe]
  ixgbe_poll+0x5a4/0x760 [ixgbe]
  net_rx_action+0x13c/0x354
    :

Essentially, ixgbe_pull_tail() is trying to figure out how much
to pull, in order to have an aligned buffer:

        pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);

        /* align pull length to size of long to optimize memcpy performance */
        skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));

and seems like the unaligned access is unavoidable here (see comments
in __skb_get_poff, for example).

This (below) is what I came up with, to get rid of the unaligned access
errors on sparc, Is there a better solution? (Not having access
to struct ip6_hdr in this file made put_unaligned usage non-obvious)


--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -102,6 +102,17 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
+static inline __be32 ip6_flowlabel_align(const u8 *hdr)
+{
+       union {
+               __u8 w[4];
+               __u32 flow;
+       } ip6_flow;
+
+       memcpy(ip6_flow.w, hdr, 4);
+       return (ip6_flow.flow & IPV6_FLOWLABEL_MASK);
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specifie
@@ -230,7 +241,7 @@ ipv6:
                        key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
                }
 
-               flow_label = ip6_flowlabel(iph);
+               flow_label = ip6_flowlabel_align((const u8 *)iph);
                if (flow_label) {
                        if (dissector_uses_key(flow_dissector,
                                               FLOW_DISSECTOR_KEY_FLOW_LABEL)) {

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:06 [RFC] Kernel unaligned access at __skb_flow_dissect Sowmini Varadhan
@ 2016-01-29 18:33 ` Eric Dumazet
  2016-01-29 18:40   ` Eric Dumazet
                     ` (2 more replies)
  2016-01-29 19:04 ` David Miller
  2016-01-30  2:49 ` [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Alexander Duyck
  2 siblings, 3 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 18:33 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev

On Fri, 2016-01-29 at 13:06 -0500, Sowmini Varadhan wrote:
> There is an unaligned access at __skb_flow_dissect when it calls
> ip6_flowlabel() with the call stack
> 
>   __skb_flow_dissect+0x2a8/0x87c
>   eth_get_headlen+0x5c/0xaxa4
>   ixgbe_clean_rx_irq+0x5cc/0xb20 [ixgbe]
>   ixgbe_poll+0x5a4/0x760 [ixgbe]
>   net_rx_action+0x13c/0x354
>     :
> 
> Essentially, ixgbe_pull_tail() is trying to figure out how much
> to pull, in order to have an aligned buffer:
> 
>         pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);
> 
>         /* align pull length to size of long to optimize memcpy performance */
>         skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
> 
> and seems like the unaligned access is unavoidable here (see comments
> in __skb_get_poff, for example).
> 
> This (below) is what I came up with, to get rid of the unaligned access
> errors on sparc, Is there a better solution? (Not having access
> to struct ip6_hdr in this file made put_unaligned usage non-obvious)
> 
> 
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -102,6 +102,17 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
>  }
>  EXPORT_SYMBOL(__skb_flow_get_ports);
>  
> +static inline __be32 ip6_flowlabel_align(const u8 *hdr)
> +{
> +       union {
> +               __u8 w[4];
> +               __u32 flow;
> +       } ip6_flow;
> +
> +       memcpy(ip6_flow.w, hdr, 4);
> +       return (ip6_flow.flow & IPV6_FLOWLABEL_MASK);
> +}
> +
>  /**
>   * __skb_flow_dissect - extract the flow_keys struct and return it
>   * @skb: sk_buff to extract the flow from, can be NULL if the rest are specifie
> @@ -230,7 +241,7 @@ ipv6:
>                         key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
>                 }
>  
> -               flow_label = ip6_flowlabel(iph);
> +               flow_label = ip6_flowlabel_align((const u8 *)iph);
>                 if (flow_label) {
>                         if (dissector_uses_key(flow_dissector,
>                                                FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
> 
> 

Why ipv6 stack itself does not trigger the issue ?

Maybe the driver itself does not properly align IP headers on sparc ?

Make sure NET_IP_ALIGN is 2 on your build.

Note that x86 does not care, but a driver should always align Ethernet
header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
boundaries.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:33 ` Eric Dumazet
@ 2016-01-29 18:40   ` Eric Dumazet
  2016-01-29 19:08     ` David Miller
  2016-01-29 18:54   ` Eric Dumazet
  2016-01-29 19:06   ` David Miller
  2 siblings, 1 reply; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 18:40 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev

On Fri, 2016-01-29 at 10:33 -0800, Eric Dumazet wrote:

> 
> Why ipv6 stack itself does not trigger the issue ?
> 
> Maybe the driver itself does not properly align IP headers on sparc ?
> 
> Make sure NET_IP_ALIGN is 2 on your build.
> 
> Note that x86 does not care, but a driver should always align Ethernet
> header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
> boundaries.
> 

I would try following ixgbe fix (sorry, totally untested, but you get
the idea...)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c4003a88bbf6..7ba64ed463a6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1525,7 +1525,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
 
 	bi->dma = dma;
 	bi->page = page;
-	bi->page_offset = 0;
+	bi->page_offset = NET_IP_ALIGN;
 
 	return true;
 }

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:40   ` Eric Dumazet
@ 2016-01-29 19:08     ` David Miller
  2016-01-29 19:39       ` Eric Dumazet
  0 siblings, 1 reply; 50+ messages in thread
From: David Miller @ 2016-01-29 19:08 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sowmini.varadhan, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jan 2016 10:40:10 -0800

> I would try following ixgbe fix (sorry, totally untested, but you get
> the idea...)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index c4003a88bbf6..7ba64ed463a6 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -1525,7 +1525,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
>  
>  	bi->dma = dma;
>  	bi->page = page;
> -	bi->page_offset = 0;
> +	bi->page_offset = NET_IP_ALIGN;
>  
>  	return true;
>  }

Hmmm, wouldn't this waste 2 bytes on all pages?  Or just the head one with the headers?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 19:08     ` David Miller
@ 2016-01-29 19:39       ` Eric Dumazet
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: sowmini.varadhan, netdev

On Fri, 2016-01-29 at 11:08 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 29 Jan 2016 10:40:10 -0800
> 
> > I would try following ixgbe fix (sorry, totally untested, but you get
> > the idea...)
> > 
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index c4003a88bbf6..7ba64ed463a6 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -1525,7 +1525,7 @@ static bool ixgbe_alloc_mapped_page(struct ixgbe_ring *rx_ring,
> >  
> >  	bi->dma = dma;
> >  	bi->page = page;
> > -	bi->page_offset = 0;
> > +	bi->page_offset = NET_IP_ALIGN;
> >  
> >  	return true;
> >  }
> 
> Hmmm, wouldn't this waste 2 bytes on all pages?  Or just the head one with the headers?

On arches where NET_IP_ALIGN = 2, where it makes sense to get a proper
alignement, even for the copy of headers into skb->head, with same
source/destination alignment.

We 'waste' already 66 bytes for typical ethernet+IPv4+TCP headers.

Lets remind we allocate half a page (2048 bytes) for an ethernet frame
anyway. We have plenty of room.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:33 ` Eric Dumazet
  2016-01-29 18:40   ` Eric Dumazet
@ 2016-01-29 18:54   ` Eric Dumazet
  2016-01-29 19:22     ` Sowmini Varadhan
  2016-01-29 21:00     ` Tom Herbert
  2016-01-29 19:06   ` David Miller
  2 siblings, 2 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 18:54 UTC (permalink / raw)
  To: Sowmini Varadhan, mwdalton; +Cc: netdev

On Fri, 2016-01-29 at 10:33 -0800, Eric Dumazet wrote:
> On Fri, 2016-01-29 at 13:06 -0500, Sowmini Varadhan wrote:
> > There is an unaligned access at __skb_flow_dissect when it calls
> > ip6_flowlabel() with the call stack
> > 
> >   __skb_flow_dissect+0x2a8/0x87c
> >   eth_get_headlen+0x5c/0xaxa4
> >   ixgbe_clean_rx_irq+0x5cc/0xb20 [ixgbe]
> >   ixgbe_poll+0x5a4/0x760 [ixgbe]
> >   net_rx_action+0x13c/0x354
> >     :
> > 
> > Essentially, ixgbe_pull_tail() is trying to figure out how much
> > to pull, in order to have an aligned buffer:
> > 
> >         pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);
> > 
> >         /* align pull length to size of long to optimize memcpy performance */
> >         skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
> > 
> > and seems like the unaligned access is unavoidable here (see comments
> > in __skb_get_poff, for example).
> > 
> > This (below) is what I came up with, to get rid of the unaligned access
> > errors on sparc, Is there a better solution? (Not having access
> > to struct ip6_hdr in this file made put_unaligned usage non-obvious)
> > 
> > 
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -102,6 +102,17 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
> >  }
> >  EXPORT_SYMBOL(__skb_flow_get_ports);
> >  
> > +static inline __be32 ip6_flowlabel_align(const u8 *hdr)
> > +{
> > +       union {
> > +               __u8 w[4];
> > +               __u32 flow;
> > +       } ip6_flow;
> > +
> > +       memcpy(ip6_flow.w, hdr, 4);
> > +       return (ip6_flow.flow & IPV6_FLOWLABEL_MASK);
> > +}
> > +
> >  /**
> >   * __skb_flow_dissect - extract the flow_keys struct and return it
> >   * @skb: sk_buff to extract the flow from, can be NULL if the rest are specifie
> > @@ -230,7 +241,7 @@ ipv6:
> >                         key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
> >                 }
> >  
> > -               flow_label = ip6_flowlabel(iph);
> > +               flow_label = ip6_flowlabel_align((const u8 *)iph);
> >                 if (flow_label) {
> >                         if (dissector_uses_key(flow_dissector,
> >                                                FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
> > 
> > 
> 
> Why ipv6 stack itself does not trigger the issue ?
> 
> Maybe the driver itself does not properly align IP headers on sparc ?
> 
> Make sure NET_IP_ALIGN is 2 on your build.
> 
> Note that x86 does not care, but a driver should always align Ethernet
> header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
> boundaries.

Hmmm.... it seems that flow dissector can "support L2 GRE", leading to
unaligned accesses since a header is 14 bytes (not multiple of 4)

commit e1733de2243609073534cf56afb146a62af3c3d8
Author: Michael Dalton <mwdalton@google.com>
Date:   Mon Mar 11 06:52:28 2013 +0000

    flow_dissector: support L2 GRE
    
    Add support for L2 GRE tunnels, so that RPS can be more effective.


Michael, do we still need this ?

IP stacks in linux assume IP headers are always aligned to 4 bytes,
so it means having a GRE header like this would align trap on some
arches.






  

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:54   ` Eric Dumazet
@ 2016-01-29 19:22     ` Sowmini Varadhan
  2016-01-29 21:00     ` Tom Herbert
  1 sibling, 0 replies; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-29 19:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: mwdalton, netdev, davem

On (01/29/16 10:54), Eric Dumazet wrote:
> > Why ipv6 stack itself does not trigger the issue ?
> > Maybe the driver itself does not properly align IP headers on sparc ?
> > Make sure NET_IP_ALIGN is 2 on your build.
> > Note that x86 does not care, but a driver should always align Ethernet
> > header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
> > boundaries.

Consolidating a few responses:

NET_IP_ALIGN is 2, of course, or else I would have seen errors about unaligned
access in many more places. The issue is being triggered by ipv6
stack (I'm running iperf over ipv6) so I'm not sure I understand the
first question.

I tried out the suggested patch of setting page_offset to NET_IP_ALIGN
in ixgbe_main.c - and yes, it kills the unaligned errors, so that's a
Good Thing. (though I see davem just responded to the patch)

However, the patch itself is not arch specific, and the unaligned 
address issue should have impacted other archs (silently) as well - 
even for ipv4, 

It would also impact other logic in __skb_flow_dissect: 
e.g., the access to iph->saddr for ipv4 frags would be
unaligned in __skb_flow_dissect? (And Ias I mentioned in my original
mail, the comments in __skb_get_poff seem to indicate that others
have tripped up on this before - and yes, casting is not a good thing
but seems like that's what they did to paper over the problem)

> Hmmm.... it seems that flow dissector can "support L2 GRE", leading to
> unaligned accesses since a header is 14 bytes (not multiple of 4)
  :
> IP stacks in linux assume IP headers are always aligned to 4 bytes,
> so it means having a GRE header like this would align trap on some
> arches.

I'm not sure I see how the above commit could have been impacting me,
I dont have any l2 gre invovled (simple p2p Ethernet II + ipv6 + tcp, 
no vlans, gre or other exotic stuff).

--Sowmini

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:54   ` Eric Dumazet
  2016-01-29 19:22     ` Sowmini Varadhan
@ 2016-01-29 21:00     ` Tom Herbert
  2016-01-29 21:09       ` Sowmini Varadhan
  1 sibling, 1 reply; 50+ messages in thread
From: Tom Herbert @ 2016-01-29 21:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sowmini Varadhan, mwdalton, Linux Kernel Network Developers

On Fri, Jan 29, 2016 at 10:54 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-01-29 at 10:33 -0800, Eric Dumazet wrote:
>> On Fri, 2016-01-29 at 13:06 -0500, Sowmini Varadhan wrote:
>> > There is an unaligned access at __skb_flow_dissect when it calls
>> > ip6_flowlabel() with the call stack
>> >
>> >   __skb_flow_dissect+0x2a8/0x87c
>> >   eth_get_headlen+0x5c/0xaxa4
>> >   ixgbe_clean_rx_irq+0x5cc/0xb20 [ixgbe]
>> >   ixgbe_poll+0x5a4/0x760 [ixgbe]
>> >   net_rx_action+0x13c/0x354
>> >     :
>> >
>> > Essentially, ixgbe_pull_tail() is trying to figure out how much
>> > to pull, in order to have an aligned buffer:
>> >
>> >         pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);
>> >
>> >         /* align pull length to size of long to optimize memcpy performance */
>> >         skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>> >
>> > and seems like the unaligned access is unavoidable here (see comments
>> > in __skb_get_poff, for example).
>> >
>> > This (below) is what I came up with, to get rid of the unaligned access
>> > errors on sparc, Is there a better solution? (Not having access
>> > to struct ip6_hdr in this file made put_unaligned usage non-obvious)
>> >
>> >
>> > --- a/net/core/flow_dissector.c
>> > +++ b/net/core/flow_dissector.c
>> > @@ -102,6 +102,17 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int
>> >  }
>> >  EXPORT_SYMBOL(__skb_flow_get_ports);
>> >
>> > +static inline __be32 ip6_flowlabel_align(const u8 *hdr)
>> > +{
>> > +       union {
>> > +               __u8 w[4];
>> > +               __u32 flow;
>> > +       } ip6_flow;
>> > +
>> > +       memcpy(ip6_flow.w, hdr, 4);
>> > +       return (ip6_flow.flow & IPV6_FLOWLABEL_MASK);
>> > +}
>> > +
>> >  /**
>> >   * __skb_flow_dissect - extract the flow_keys struct and return it
>> >   * @skb: sk_buff to extract the flow from, can be NULL if the rest are specifie
>> > @@ -230,7 +241,7 @@ ipv6:
>> >                         key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
>> >                 }
>> >
>> > -               flow_label = ip6_flowlabel(iph);
>> > +               flow_label = ip6_flowlabel_align((const u8 *)iph);
>> >                 if (flow_label) {
>> >                         if (dissector_uses_key(flow_dissector,
>> >                                                FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
>> >
>> >
>>
>> Why ipv6 stack itself does not trigger the issue ?
>>
>> Maybe the driver itself does not properly align IP headers on sparc ?
>>
>> Make sure NET_IP_ALIGN is 2 on your build.
>>
>> Note that x86 does not care, but a driver should always align Ethernet
>> header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
>> boundaries.
>
> Hmmm.... it seems that flow dissector can "support L2 GRE", leading to
> unaligned accesses since a header is 14 bytes (not multiple of 4)
>
> commit e1733de2243609073534cf56afb146a62af3c3d8
> Author: Michael Dalton <mwdalton@google.com>
> Date:   Mon Mar 11 06:52:28 2013 +0000
>
>     flow_dissector: support L2 GRE
>
>     Add support for L2 GRE tunnels, so that RPS can be more effective.
>
>
> Michael, do we still need this ?
>
> IP stacks in linux assume IP headers are always aligned to 4 bytes,
> so it means having a GRE header like this would align trap on some
> arches.
>
Doesn't every IP/VXLAN packet contains unaligned headers? Why don't
those create alignment issues (like when stack looks at addresses)?

>
>
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 21:00     ` Tom Herbert
@ 2016-01-29 21:09       ` Sowmini Varadhan
  2016-01-29 21:33         ` Tom Herbert
  2016-01-29 23:00         ` Tom Herbert
  0 siblings, 2 replies; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-29 21:09 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, mwdalton, Linux Kernel Network Developers

On (01/29/16 13:00), Tom Herbert wrote:
> Doesn't every IP/VXLAN packet contains unaligned headers? Why don't
> those create alignment issues (like when stack looks at addresses)?

They do.

http://comments.gmane.org/gmane.linux.network/370672

some of it was fixed in https://lkml.org/lkml/2015/7/20/63.
There's still some NET_IP_ALIGN missing. IIRC, I was seeing
this for mldv3, but the fix has to be done carefully, by
someone who knows how to fully regression test it.

I dont know if other tunneling methods manage to get the 
NET_IP_ALIGN correct in every case. 

Also, while sparc complains about unaligned access
in most cases, some sins may pass under the radar, and other
platforms dont even generate traps, so it's easy to not know
that there's a problem, a lot of the time.

--Sowmini

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 21:09       ` Sowmini Varadhan
@ 2016-01-29 21:33         ` Tom Herbert
  2016-01-29 23:00         ` Tom Herbert
  1 sibling, 0 replies; 50+ messages in thread
From: Tom Herbert @ 2016-01-29 21:33 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: Eric Dumazet, mwdalton, Linux Kernel Network Developers

On Fri, Jan 29, 2016 at 1:09 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/29/16 13:00), Tom Herbert wrote:
>> Doesn't every IP/VXLAN packet contains unaligned headers? Why don't
>> those create alignment issues (like when stack looks at addresses)?
>
> They do.
>
> http://comments.gmane.org/gmane.linux.network/370672
>
> some of it was fixed in https://lkml.org/lkml/2015/7/20/63.
> There's still some NET_IP_ALIGN missing. IIRC, I was seeing
> this for mldv3, but the fix has to be done carefully, by
> someone who knows how to fully regression test it.
>
> I dont know if other tunneling methods manage to get the
> NET_IP_ALIGN correct in every case.
>
Fortunately, most encapsulations are based on 4-byte alignment.
ETHER_IP even specifies a two byte header to ensure that whatever
follows the encapsulated Ethernet header is 4-byte aligned. The
encapsulation protocols that carry Ethernet without any inserted bytes
for alignment are the potential problem (GRE, VXLAN, etc.).
NET_IP_ALIGN can't help because the outer and inner (IP) headers have
different alignment, so no one value can work.

> Also, while sparc complains about unaligned access
> in most cases, some sins may pass under the radar, and other
> platforms dont even generate traps, so it's easy to not know
> that there's a problem, a lot of the time.
>
That is worrisome. The only way I can think of to be completely
protected would be to copy encapsulated Ethernet frames to
NET_IP_ALIGN buffer in the driver :-( It doesn't look like VXLAN does
anything like that.

Tom

> --Sowmini
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 21:09       ` Sowmini Varadhan
  2016-01-29 21:33         ` Tom Herbert
@ 2016-01-29 23:00         ` Tom Herbert
  2016-01-29 23:04           ` Sowmini Varadhan
  1 sibling, 1 reply; 50+ messages in thread
From: Tom Herbert @ 2016-01-29 23:00 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Eric Dumazet, Michael Dalton, Linux Kernel Network Developers

On Fri, Jan 29, 2016 at 1:09 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/29/16 13:00), Tom Herbert wrote:
>> Doesn't every IP/VXLAN packet contains unaligned headers? Why don't
>> those create alignment issues (like when stack looks at addresses)?
>
> They do.
>
> http://comments.gmane.org/gmane.linux.network/370672
>
> some of it was fixed in https://lkml.org/lkml/2015/7/20/63.
> There's still some NET_IP_ALIGN missing. IIRC, I was seeing
> this for mldv3, but the fix has to be done carefully, by
> someone who knows how to fully regression test it.
>
> I dont know if other tunneling methods manage to get the
> NET_IP_ALIGN correct in every case.
>
> Also, while sparc complains about unaligned access
> in most cases, some sins may pass under the radar, and other
> platforms dont even generate traps, so it's easy to not know
> that there's a problem, a lot of the time.
>
The sparc documentation is pretty clear
http://docs.oracle.com/cd/E19253-01/816-4854/hwovr-2/index.html, seems
like unaligned accesses are not allowed in the architecture.

Tom

> --Sowmini
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 23:00         ` Tom Herbert
@ 2016-01-29 23:04           ` Sowmini Varadhan
  2016-01-29 23:31             ` Tom Herbert
  0 siblings, 1 reply; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-29 23:04 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, Michael Dalton, Linux Kernel Network Developers

On (01/29/16 15:00), Tom Herbert wrote:

> The sparc documentation is pretty clear
> http://docs.oracle.com/cd/E19253-01/816-4854/hwovr-2/index.html, seems
> like unaligned accesses are not allowed in the architecture.

yes, but looks like you  can paper over some of this with
memcpy (as was happening with the saddr ref in skb_flow_dissect
that puzzled me and Eric because it did not generate any traps).

I suppose one could sprinkele a few WARN_ON's for !IS_ALIGNED
but that's not a fool-proof detection method either (in addition
to all the ugly shouting in the code).

--Sowmini

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 23:04           ` Sowmini Varadhan
@ 2016-01-29 23:31             ` Tom Herbert
  2016-01-29 23:58               ` Sowmini Varadhan
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Herbert @ 2016-01-29 23:31 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Eric Dumazet, Michael Dalton, Linux Kernel Network Developers

On Fri, Jan 29, 2016 at 3:04 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/29/16 15:00), Tom Herbert wrote:
>
>> The sparc documentation is pretty clear
>> http://docs.oracle.com/cd/E19253-01/816-4854/hwovr-2/index.html, seems
>> like unaligned accesses are not allowed in the architecture.
>
> yes, but looks like you  can paper over some of this with
> memcpy (as was happening with the saddr ref in skb_flow_dissect
> that puzzled me and Eric because it did not generate any traps).
>
Well, I am more worried about what happens when an unaligned
encapsulated TPC/IP packet gets into the stack. It seems unlikely we'd
want to replace a bunch of address, seq numbers, etc. with memcpy all
over the stack. ;-)

But even within flow dissector, to be completely correct, we need to
replace all 32-bit accesses with the mempy (flow_label, mpls label,
keyid) and be vigilant about new ones coming in. For that matter, if
we really want to be pedantic, nothing prevents a driver from giving
us a packet with 1 byte alignment in which case we need to consider
access to 16-bit values also! Considering that this is a very narrow
use case (requires encapsulated Ethernet and architecture that is
alignment sensitive), I wonder if we should just punt on flow
dissection when we see non 4-byte alignment and
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set.

Tom

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 23:31             ` Tom Herbert
@ 2016-01-29 23:58               ` Sowmini Varadhan
  2016-01-30  0:47                 ` Tom Herbert
  0 siblings, 1 reply; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-29 23:58 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, Michael Dalton, Linux Kernel Network Developers

On (01/29/16 15:31), Tom Herbert wrote:
> But even within flow dissector, to be completely correct, we need to
> replace all 32-bit accesses with the mempy (flow_label, mpls label,
> keyid) and be vigilant about new ones coming in. For that matter, ..

well, one question that came to my mind when I was looking at this is:
why does eth_get_headlen try to compute the flow hash even before the
driver has had a chance to pull things into an aligned buffer? Isn't
that just risking even more unaligned accesses?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 23:58               ` Sowmini Varadhan
@ 2016-01-30  0:47                 ` Tom Herbert
  2016-01-30  1:18                   ` David Miller
  2016-01-30  2:15                   ` Eric Dumazet
  0 siblings, 2 replies; 50+ messages in thread
From: Tom Herbert @ 2016-01-30  0:47 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Eric Dumazet, Michael Dalton, Linux Kernel Network Developers

On Fri, Jan 29, 2016 at 3:58 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/29/16 15:31), Tom Herbert wrote:
>> But even within flow dissector, to be completely correct, we need to
>> replace all 32-bit accesses with the mempy (flow_label, mpls label,
>> keyid) and be vigilant about new ones coming in. For that matter, ..
>
> well, one question that came to my mind when I was looking at this is:
> why does eth_get_headlen try to compute the flow hash even before the
> driver has had a chance to pull things into an aligned buffer? Isn't
> that just risking even more unaligned accesses?
>
Hmm, thanks for pointing that out. It looks like this is called by a
couple drivers as part of pulling up the data to get alignment. I am
actually surprised they are doing this. Flow dissector is not the
cheapest function in the world and it seems like a shame to be calling
it this early and not even getting a hash for the skb. Also, this is
another instance of touching the data so if we do get to the point of
trying steering packets without cache miss (another thread); this
undermines that. Seems like it might be just as well to pull up a
fixed number of bytes (ixgbe want min of 60 bytes), or don't pull up
anything avoid the cache miss here and let the stack pull up as
needed.

Anyway, I think you do have a point that flow_dissector does not
assume alignment or pull up data like the rest of the stack. Seems
like we need a utility function like align_access (maybe there is one
already) to get 32-bit fields and probably do need the sanity check
for odd alignment.

Tom

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-30  0:47                 ` Tom Herbert
@ 2016-01-30  1:18                   ` David Miller
  2016-01-30  2:15                   ` Eric Dumazet
  1 sibling, 0 replies; 50+ messages in thread
From: David Miller @ 2016-01-30  1:18 UTC (permalink / raw)
  To: tom; +Cc: sowmini.varadhan, eric.dumazet, mwdalton, netdev

From: Tom Herbert <tom@herbertland.com>
Date: Fri, 29 Jan 2016 16:47:46 -0800

> Seems like it might be just as well to pull up a fixed number of
> bytes (ixgbe want min of 60 bytes),

This is precisely what we were trying to move away from, please
don't regress back to that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-30  0:47                 ` Tom Herbert
  2016-01-30  1:18                   ` David Miller
@ 2016-01-30  2:15                   ` Eric Dumazet
  2016-01-31 22:13                     ` Tom Herbert
  1 sibling, 1 reply; 50+ messages in thread
From: Eric Dumazet @ 2016-01-30  2:15 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Sowmini Varadhan, Michael Dalton, Linux Kernel Network Developers

On Fri, 2016-01-29 at 16:47 -0800, Tom Herbert wrote:

> Hmm, thanks for pointing that out. It looks like this is called by a
> couple drivers as part of pulling up the data to get alignment. I am
> actually surprised they are doing this. Flow dissector is not the
> cheapest function in the world and it seems like a shame to be calling
> it this early and not even getting a hash for the skb. Also, this is
> another instance of touching the data so if we do get to the point of
> trying steering packets without cache miss (another thread); this
> undermines that. Seems like it might be just as well to pull up a
> fixed number of bytes (ixgbe want min of 60 bytes), or don't pull up
> anything avoid the cache miss here and let the stack pull up as
> needed.

Except that when for example mlx4 is able to pull only the headers
instead of a fixed amount of bytes, we increased the performance,
because GRO could cook packets twice as big for tunneled traffic.

( commit cfecec56ae7c7c40f23fbdac04acee027ca3bd66 )

1) Increasing performance by prefetching headers is a separate matter,
completely orthogonal to this.

2) Taking the opportunity to also give back the l4 hash as a bonus
has been considered, but tests showed no clear benefit.

  For typical cases where l4 hash is not used (RPS and RFS being not
enabled by default on linux), computing it brings nothing but added
complexity.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-30  2:15                   ` Eric Dumazet
@ 2016-01-31 22:13                     ` Tom Herbert
  2016-02-01  0:04                       ` Eric Dumazet
  0 siblings, 1 reply; 50+ messages in thread
From: Tom Herbert @ 2016-01-31 22:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sowmini Varadhan, Michael Dalton, Linux Kernel Network Developers

On Fri, Jan 29, 2016 at 6:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-01-29 at 16:47 -0800, Tom Herbert wrote:
>
>> Hmm, thanks for pointing that out. It looks like this is called by a
>> couple drivers as part of pulling up the data to get alignment. I am
>> actually surprised they are doing this. Flow dissector is not the
>> cheapest function in the world and it seems like a shame to be calling
>> it this early and not even getting a hash for the skb. Also, this is
>> another instance of touching the data so if we do get to the point of
>> trying steering packets without cache miss (another thread); this
>> undermines that. Seems like it might be just as well to pull up a
>> fixed number of bytes (ixgbe want min of 60 bytes), or don't pull up
>> anything avoid the cache miss here and let the stack pull up as
>> needed.
>
> Except that when for example mlx4 is able to pull only the headers
> instead of a fixed amount of bytes, we increased the performance,
> because GRO could cook packets twice as big for tunneled traffic.
>
> ( commit cfecec56ae7c7c40f23fbdac04acee027ca3bd66 )
>
> 1) Increasing performance by prefetching headers is a separate matter,
> completely orthogonal to this.
>
> 2) Taking the opportunity to also give back the l4 hash as a bonus
> has been considered, but tests showed no clear benefit.
>
>   For typical cases where l4 hash is not used (RPS and RFS being not
> enabled by default on linux), computing it brings nothing but added
> complexity.
>
Neither is GRE enabled by default in Linux and it is not a typical
case. So that patch is an optimization for a very narrow use case that
impacts the core data path for everyone. Please at least consider
making it configurable.

>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-31 22:13                     ` Tom Herbert
@ 2016-02-01  0:04                       ` Eric Dumazet
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-02-01  0:04 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Sowmini Varadhan, Michael Dalton, Linux Kernel Network Developers

On Sun, 2016-01-31 at 14:13 -0800, Tom Herbert wrote:

> Neither is GRE enabled by default in Linux and it is not a typical
> case. So that patch is an optimization for a very narrow use case that
> impacts the core data path for everyone. Please at least consider
> making it configurable.

No idea why it should be configurable, by adding yet another
conditional.

Pulling all headers at once is faster than pulling in GRO slow path, or
if GRO is disabled in pskb_may_pull() slow path.

It seems we need to fix flow dissector anyway.

Note: This path is only taken in non native traffic, in case you missed
this.

Regular IP+TCP or IP+UDP does not use it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:33 ` Eric Dumazet
  2016-01-29 18:40   ` Eric Dumazet
  2016-01-29 18:54   ` Eric Dumazet
@ 2016-01-29 19:06   ` David Miller
  2016-01-29 19:37     ` Eric Dumazet
  2 siblings, 1 reply; 50+ messages in thread
From: David Miller @ 2016-01-29 19:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sowmini.varadhan, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jan 2016 10:33:48 -0800

> Why ipv6 stack itself does not trigger the issue ?
> 
> Maybe the driver itself does not properly align IP headers on sparc ?
> 
> Make sure NET_IP_ALIGN is 2 on your build.
> 
> Note that x86 does not care, but a driver should always align Ethernet
> header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
> boundaries.

Eric, please reread the original posting, the driver is trying to do
exactly this.

This is the part in the driver where it is trying to figure out how
much to pull in order to align the packet before it gets sent into the
stack.

That's why only the flow dissector sees this problem.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 19:06   ` David Miller
@ 2016-01-29 19:37     ` Eric Dumazet
  2016-01-29 19:41       ` David Miller
  2016-01-29 19:44       ` Sowmini Varadhan
  0 siblings, 2 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 19:37 UTC (permalink / raw)
  To: David Miller; +Cc: sowmini.varadhan, netdev

On Fri, 2016-01-29 at 11:06 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 29 Jan 2016 10:33:48 -0800
> 
> > Why ipv6 stack itself does not trigger the issue ?
> > 
> > Maybe the driver itself does not properly align IP headers on sparc ?
> > 
> > Make sure NET_IP_ALIGN is 2 on your build.
> > 
> > Note that x86 does not care, but a driver should always align Ethernet
> > header to NET_IP_ALIGN, so that IP headers are aligned to 4 bytes
> > boundaries.
> 
> Eric, please reread the original posting, the driver is trying to do
> exactly this.

Sure, but this driver was tested on x86.

> 
> This is the part in the driver where it is trying to figure out how
> much to pull in order to align the packet before it gets sent into the
> stack.
> 
> That's why only the flow dissector sees this problem.

I have no idea why reading iph->saddr or iph->daddr would not hit the
problem, but accessing the 32bit ipv6 flow label would be an issue.

Something is fishy.

But really adding unaligned() accesses in flow dissector would slow it
quite a lot on MIPS and others.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 19:37     ` Eric Dumazet
@ 2016-01-29 19:41       ` David Miller
  2016-01-29 19:44       ` Sowmini Varadhan
  1 sibling, 0 replies; 50+ messages in thread
From: David Miller @ 2016-01-29 19:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sowmini.varadhan, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jan 2016 11:37:30 -0800

> But really adding unaligned() accesses in flow dissector would slow it
> quite a lot on MIPS and others.

Agreed, the fix definitely belongs in the callers of these interfaces,
which in this case is the driver.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 19:37     ` Eric Dumazet
  2016-01-29 19:41       ` David Miller
@ 2016-01-29 19:44       ` Sowmini Varadhan
  2016-01-29 20:01         ` Eric Dumazet
  1 sibling, 1 reply; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-29 19:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On (01/29/16 11:37), Eric Dumazet wrote:
> 
> I have no idea why reading iph->saddr or iph->daddr would not hit the
> problem, but accessing the 32bit ipv6 flow label would be an issue.
> 
> Something is fishy.

I was wondering about this myself. Even on sparc, I only first
ran into the errors for ipv6. I dont know if the fact that the
saddr is memcpy'ed masks the error (even though the problem
is still there). 

But doing the check of: if (!IS_ALIGNED(..)) on the iph->saddr in the
code does result in a positive.

> But really adding unaligned() accesses in flow dissector would slow it
> quite a lot on MIPS and others.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 19:44       ` Sowmini Varadhan
@ 2016-01-29 20:01         ` Eric Dumazet
  2016-01-29 20:06           ` Eric Dumazet
  2016-01-29 21:16           ` Alexander Duyck
  0 siblings, 2 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 20:01 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: David Miller, netdev

On Fri, 2016-01-29 at 14:44 -0500, Sowmini Varadhan wrote:
> On (01/29/16 11:37), Eric Dumazet wrote:
> > 
> > I have no idea why reading iph->saddr or iph->daddr would not hit the
> > problem, but accessing the 32bit ipv6 flow label would be an issue.
> > 
> > Something is fishy.
> 
> I was wondering about this myself. Even on sparc, I only first
> ran into the errors for ipv6. I dont know if the fact that the
> saddr is memcpy'ed masks the error (even though the problem
> is still there). 

Oh right, recent work in flow dissector added all these memcpy()

I was still looking at linux-4.3 ;)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 20:01         ` Eric Dumazet
@ 2016-01-29 20:06           ` Eric Dumazet
  2016-01-29 21:16           ` Alexander Duyck
  1 sibling, 0 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 20:06 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: David Miller, netdev

On Fri, 2016-01-29 at 12:01 -0800, Eric Dumazet wrote:
> On Fri, 2016-01-29 at 14:44 -0500, Sowmini Varadhan wrote:
> > On (01/29/16 11:37), Eric Dumazet wrote:
> > > 
> > > I have no idea why reading iph->saddr or iph->daddr would not hit the
> > > problem, but accessing the 32bit ipv6 flow label would be an issue.
> > > 
> > > Something is fishy.
> > 
> > I was wondering about this myself. Even on sparc, I only first
> > ran into the errors for ipv6. I dont know if the fact that the
> > saddr is memcpy'ed masks the error (even though the problem
> > is still there). 
> 
> Oh right, recent work in flow dissector added all these memcpy()
> 
> I was still looking at linux-4.3 ;)

The code for GRE/GRE_KEY is still accessing 32bit vars.

const __be32 *keyid;
...
key_keyid->keyid = *keyid;

So instead of auditing flow dissect code, we should fix the few drivers
using it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 20:01         ` Eric Dumazet
  2016-01-29 20:06           ` Eric Dumazet
@ 2016-01-29 21:16           ` Alexander Duyck
  2016-01-29 21:33             ` Eric Dumazet
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-01-29 21:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sowmini Varadhan, David Miller, Netdev

On Fri, Jan 29, 2016 at 12:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-01-29 at 14:44 -0500, Sowmini Varadhan wrote:
>> On (01/29/16 11:37), Eric Dumazet wrote:
>> >
>> > I have no idea why reading iph->saddr or iph->daddr would not hit the
>> > problem, but accessing the 32bit ipv6 flow label would be an issue.
>> >
>> > Something is fishy.
>>
>> I was wondering about this myself. Even on sparc, I only first
>> ran into the errors for ipv6. I dont know if the fact that the
>> saddr is memcpy'ed masks the error (even though the problem
>> is still there).
>
> Oh right, recent work in flow dissector added all these memcpy()
>
> I was still looking at linux-4.3 ;)

It has to be something recent.  I know back when I wrote the code I
tested it on a few different architectures and had to add a few bits
in __skb_get_poff so that it would read doff as a u8 instead of
bitfield in a u32.

Looking at the code it seems like this should be an easy fix.  Just
swap the two lines that acquire and test the flow label with the check
for dissector_uses_key(... _KEY_FLOW_LABEL).  Then ixgbe will stop
trying to grab flow labels.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 21:16           ` Alexander Duyck
@ 2016-01-29 21:33             ` Eric Dumazet
  2016-01-29 22:08               ` Alexander Duyck
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 21:33 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Sowmini Varadhan, David Miller, Netdev

On Fri, 2016-01-29 at 13:16 -0800, Alexander Duyck wrote:

> It has to be something recent.  I know back when I wrote the code I
> tested it on a few different architectures and had to add a few bits
> in __skb_get_poff so that it would read doff as a u8 instead of
> bitfield in a u32.
> 
> Looking at the code it seems like this should be an easy fix.  Just
> swap the two lines that acquire and test the flow label with the check
> for dissector_uses_key(... _KEY_FLOW_LABEL).  Then ixgbe will stop
> trying to grab flow labels.

Note that if we properly align headers, we also align TCP/UDP payload.

Meaning that copying data in recvmsg() is likely faster in some cpus.
(The ones having NET_IP_ALIGN set to 2 presumably)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 21:33             ` Eric Dumazet
@ 2016-01-29 22:08               ` Alexander Duyck
  2016-01-29 22:28                 ` Eric Dumazet
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-01-29 22:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sowmini Varadhan, David Miller, Netdev

On Fri, Jan 29, 2016 at 1:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-01-29 at 13:16 -0800, Alexander Duyck wrote:
>
>> It has to be something recent.  I know back when I wrote the code I
>> tested it on a few different architectures and had to add a few bits
>> in __skb_get_poff so that it would read doff as a u8 instead of
>> bitfield in a u32.
>>
>> Looking at the code it seems like this should be an easy fix.  Just
>> swap the two lines that acquire and test the flow label with the check
>> for dissector_uses_key(... _KEY_FLOW_LABEL).  Then ixgbe will stop
>> trying to grab flow labels.
>
> Note that if we properly align headers, we also align TCP/UDP payload.
>
> Meaning that copying data in recvmsg() is likely faster in some cpus.
> (The ones having NET_IP_ALIGN set to 2 presumably)

It also means DMA becomes dramatically slower as it introduces a
partial write access for the start of every frame.  It is why we had
set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
when unaligned then reading IP unaligned headers.

The gain on recvmsg would probably be minimal.  The only time I have
seen any significant speed-up for copying is if you can get both ends
aligned to something like 16B.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 22:08               ` Alexander Duyck
@ 2016-01-29 22:28                 ` Eric Dumazet
  2016-01-29 23:00                   ` Alexander Duyck
  2016-02-01 16:55                   ` David Laight
  0 siblings, 2 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-29 22:28 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Sowmini Varadhan, David Miller, Netdev

On Fri, 2016-01-29 at 14:08 -0800, Alexander Duyck wrote:

> It also means DMA becomes dramatically slower as it introduces a
> partial write access for the start of every frame.  It is why we had
> set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
> when unaligned then reading IP unaligned headers.

Well, I guess that if you have an arch where DMA accesses are slow and
NET_IP_ALIGN = 2, you are out of luck. This is why some platforms are
better than others.


> 
> The gain on recvmsg would probably be minimal.  The only time I have
> seen any significant speed-up for copying is if you can get both ends
> aligned to something like 16B.

On modern intel cpus, this does not matter at all, sure. It took a while
before "rep movsb" finally did the right thing.

memcpy() and friends implementations are much slower on some older
arches (when dealing with unaligned src/dst)

arch/mips/lib/memcpy.S is a gem ;)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 22:28                 ` Eric Dumazet
@ 2016-01-29 23:00                   ` Alexander Duyck
  2016-02-01 16:55                   ` David Laight
  1 sibling, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2016-01-29 23:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sowmini Varadhan, David Miller, Netdev

On Fri, Jan 29, 2016 at 2:28 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-01-29 at 14:08 -0800, Alexander Duyck wrote:
>
>> It also means DMA becomes dramatically slower as it introduces a
>> partial write access for the start of every frame.  It is why we had
>> set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
>> when unaligned then reading IP unaligned headers.
>
> Well, I guess that if you have an arch where DMA accesses are slow and
> NET_IP_ALIGN = 2, you are out of luck. This is why some platforms are
> better than others.

The other bit you forgot to mention was an IOMMU.  That is another
per-architecture thing that can really slow us down.  Back when I
rewrote the receive path I was dealing with a number of performance
complaints on PowerPC.  The approach I took with the Intel drivers was
supposed to be the best compromise for IOMMU, DMA alignment, and IP
header alignment.

>>
>> The gain on recvmsg would probably be minimal.  The only time I have
>> seen any significant speed-up for copying is if you can get both ends
>> aligned to something like 16B.
>
> On modern intel cpus, this does not matter at all, sure. It took a while
> before "rep movsb" finally did the right thing.
>
> memcpy() and friends implementations are much slower on some older
> arches (when dealing with unaligned src/dst)
>
> arch/mips/lib/memcpy.S is a gem ;)

Yeah.  I can imagine.  The fact is you can't may everybody happy so I
am good with just trying to support the majority architectures as best
as possible if a few have to take a performance hit for an unaligned
memcpy then so be it.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 22:28                 ` Eric Dumazet
  2016-01-29 23:00                   ` Alexander Duyck
@ 2016-02-01 16:55                   ` David Laight
  2016-02-01 17:25                     ` Eric Dumazet
  1 sibling, 1 reply; 50+ messages in thread
From: David Laight @ 2016-02-01 16:55 UTC (permalink / raw)
  To: 'Eric Dumazet', Alexander Duyck
  Cc: Sowmini Varadhan, David Miller, Netdev

From: Eric Dumazet
> Sent: 29 January 2016 22:29
...
> On modern intel cpus, this does not matter at all, sure. It took a while
> before "rep movsb" finally did the right thing.

Unfortunately memcpy_to_io() etc now map to 'rep movsb', and that can
only be optimisied for cached addresses.

So copies to/from pcie space get done as byte copies, slower than slow.

The same is true when usespace has used mmap() to directly access
pcie memory.

There isn't a standard wrapper than generates 'rep movsd'.

	David

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-02-01 16:55                   ` David Laight
@ 2016-02-01 17:25                     ` Eric Dumazet
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-02-01 17:25 UTC (permalink / raw)
  To: David Laight; +Cc: Alexander Duyck, Sowmini Varadhan, David Miller, Netdev

On Mon, 2016-02-01 at 16:55 +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 29 January 2016 22:29
> ...
> > On modern intel cpus, this does not matter at all, sure. It took a while
> > before "rep movsb" finally did the right thing.
> 
> Unfortunately memcpy_to_io() etc now map to 'rep movsb', and that can
> only be optimisied for cached addresses.
> 
> So copies to/from pcie space get done as byte copies, slower than slow.
> 
> The same is true when usespace has used mmap() to directly access
> pcie memory.
> 
> There isn't a standard wrapper than generates 'rep movsd'.

If memcpy_to_io() has problem because it assumed memcpy() had some
properties, it is quite a different problem that should be referred to
x86 maintainers ?

arch/x86/lib/memcpy_64.S certainly contains a suitable variant.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC] Kernel unaligned access at __skb_flow_dissect
  2016-01-29 18:06 [RFC] Kernel unaligned access at __skb_flow_dissect Sowmini Varadhan
  2016-01-29 18:33 ` Eric Dumazet
@ 2016-01-29 19:04 ` David Miller
  2016-01-30  2:49 ` [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Alexander Duyck
  2 siblings, 0 replies; 50+ messages in thread
From: David Miller @ 2016-01-29 19:04 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: netdev

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Fri, 29 Jan 2016 13:06:51 -0500

> @@ -102,6 +102,17 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
>  }
>  EXPORT_SYMBOL(__skb_flow_get_ports);
>  
> +static inline __be32 ip6_flowlabel_align(const u8 *hdr)
> +{
> +       union {
> +               __u8 w[4];
> +               __u32 flow;
> +       } ip6_flow;
> +
> +       memcpy(ip6_flow.w, hdr, 4);
 ...
> -               flow_label = ip6_flowlabel(iph);
> +               flow_label = ip6_flowlabel_align((const u8 *)iph);

Casting is not a foolproof solution.

The compiler is allowed to legally walk through the casts and discover that
the original type has certain alignment guarantees and use them in it's
inline expansion of memcpy().

Also, so much stuff in this file is going to trip over this issue if
the IP header isn't even 4 byte aligned.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-29 18:06 [RFC] Kernel unaligned access at __skb_flow_dissect Sowmini Varadhan
  2016-01-29 18:33 ` Eric Dumazet
  2016-01-29 19:04 ` David Miller
@ 2016-01-30  2:49 ` Alexander Duyck
  2016-01-30  3:23   ` Eric Dumazet
  2 siblings, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-01-30  2:49 UTC (permalink / raw)
  To: netdev, davem, alexander.duyck; +Cc: Sowmini Varadhan, eric.dumazet, tom

This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---

 net/core/flow_dissector.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d79699c9d1b9..f09114f8ee33 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -208,7 +208,6 @@ ip:
 	case htons(ETH_P_IPV6): {
 		const struct ipv6hdr *iph;
 		struct ipv6hdr _iph;
-		__be32 flow_label;
 
 ipv6:
 		iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, hlen, &_iph);
@@ -230,17 +229,19 @@ ipv6:
 			key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
 		}
 
-		flow_label = ip6_flowlabel(iph);
-		if (flow_label) {
-			if (dissector_uses_key(flow_dissector,
-					       FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
+		if (dissector_uses_key(flow_dissector,
+				       FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
+			__be32 flow_label = ip6_flowlabel(iph);
+
+			if (flow_label) {
 				key_tags = skb_flow_dissector_target(flow_dissector,
 								     FLOW_DISSECTOR_KEY_FLOW_LABEL,
 								     target_container);
 				key_tags->flow_label = ntohl(flow_label);
+
+				if (flags & FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
+					goto out_good;
 			}
-			if (flags & FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
-				goto out_good;
 		}
 
 		if (flags & FLOW_DISSECTOR_F_STOP_AT_L3)

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30  2:49 ` [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Alexander Duyck
@ 2016-01-30  3:23   ` Eric Dumazet
  2016-01-30  3:35     ` David Miller
  2016-01-30 16:17     ` Sowmini Varadhan
  0 siblings, 2 replies; 50+ messages in thread
From: Eric Dumazet @ 2016-01-30  3:23 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, alexander.duyck, Sowmini Varadhan, tom

On Fri, 2016-01-29 at 18:49 -0800, Alexander Duyck wrote:
> This patch fixes an issue with unaligned accesses when using
> eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
> The fact is when trying to check the length we don't need to be looking at
> the flow label so we can reorder the checks to first check if we are
> supposed to gather the flow label and then make the call to actually get
> it.
> 
> Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
> ---


You did not quite follow the discussion Alexander, we did not exactly
took a decision about this bug.

As we mentioned, there are other parts that need care.

key_keyid->keyid = *keyid;

Please address them, instead of having to 'wait' for the next crash.

BTW, even a memcpy(&key_addrs->v4addrs, &iph->saddr, 8) could crash, as
the compiler can certainly assume src and dst are 4 bytes aligned, and
could use word accesses when inlining memcpy() even on Sparc.

Apparently the compiler used by Sowmini is gentle.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30  3:23   ` Eric Dumazet
@ 2016-01-30  3:35     ` David Miller
  2016-01-30  4:46       ` Alexander Duyck
  2016-01-30 16:17     ` Sowmini Varadhan
  1 sibling, 1 reply; 50+ messages in thread
From: David Miller @ 2016-01-30  3:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: aduyck, netdev, alexander.duyck, sowmini.varadhan, tom

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jan 2016 19:23:54 -0800

> On Fri, 2016-01-29 at 18:49 -0800, Alexander Duyck wrote:
>> This patch fixes an issue with unaligned accesses when using
>> eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
>> The fact is when trying to check the length we don't need to be looking at
>> the flow label so we can reorder the checks to first check if we are
>> supposed to gather the flow label and then make the call to actually get
>> it.
>> 
>> Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
>> ---
> 
> 
> You did not quite follow the discussion Alexander, we did not exactly
> took a decision about this bug.
> 
> As we mentioned, there are other parts that need care.
> 
> key_keyid->keyid = *keyid;
> 
> Please address them, instead of having to 'wait' for the next crash.

Indeed, this is a more fundamental issue.

This change in and of itself isn't bad, and is probably a suitable
micro-optimization for net-next, but it doesn't fix the bug in
question.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30  3:35     ` David Miller
@ 2016-01-30  4:46       ` Alexander Duyck
  0 siblings, 0 replies; 50+ messages in thread
From: Alexander Duyck @ 2016-01-30  4:46 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, Alex Duyck, Netdev, Sowmini Varadhan, Tom Herbert

On Fri, Jan 29, 2016 at 7:35 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 29 Jan 2016 19:23:54 -0800
>
>> On Fri, 2016-01-29 at 18:49 -0800, Alexander Duyck wrote:
>>> This patch fixes an issue with unaligned accesses when using
>>> eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
>>> The fact is when trying to check the length we don't need to be looking at
>>> the flow label so we can reorder the checks to first check if we are
>>> supposed to gather the flow label and then make the call to actually get
>>> it.
>>>
>>> Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>>> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
>>> ---
>>
>>
>> You did not quite follow the discussion Alexander, we did not exactly
>> took a decision about this bug.
>>
>> As we mentioned, there are other parts that need care.
>>
>> key_keyid->keyid = *keyid;
>>
>> Please address them, instead of having to 'wait' for the next crash.

The key_id bit is not part of the "basic" keys.  The basic keys only
require a 2 byte alignment, it is when you start parsing to populate a
hash that you get into the wider fields.  As far as I know there
aren't any length fields that are wider than 2 bytes.

> Indeed, this is a more fundamental issue.
>
> This change in and of itself isn't bad, and is probably a suitable
> micro-optimization for net-next, but it doesn't fix the bug in
> question.

Actually it does fix the bug as reported.  The problem as reported was
for ixgbe using the function for parsing the length.  In the case of
just getting the length my patch resolves the issue as there are no
other accesses wider than 2 bytes used when *just* computing the
length.  This is something that will have to get addressed sooner or
later anyway, and at least with this patch in the basic case of IPv6
over ixgbe won't cause any issues.

The key_id bit that Eric is pointing out is called in a path that is
only executed if you are actually attempting to compute a hash.  Odds
are the fix for that is going to be much more complicated and extends
well outside this function since it is a fundamental issue with VXLAN
and GRE TEB tunnels as either the inner or outer headers end up being
unaligned unless you want to break them into two buffers so that they
can both be aligned at the same time.

Extending beyond this has anyone taken a look at the transmit path for
these tunnels?  I would suspect we probably also have a number of
issues there since either the inner or outer IP headers have to be
unaligned when we are setting those up.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30  3:23   ` Eric Dumazet
  2016-01-30  3:35     ` David Miller
@ 2016-01-30 16:17     ` Sowmini Varadhan
  2016-01-30 17:43       ` Tom Herbert
  2016-01-30 18:36       ` Alexander Duyck
  1 sibling, 2 replies; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-30 16:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev, davem, alexander.duyck, tom

On (01/29/16 19:23), Eric Dumazet wrote:
> BTW, even a memcpy(&key_addrs->v4addrs, &iph->saddr, 8) could crash, as
> the compiler can certainly assume src and dst are 4 bytes aligned, and
> could use word accesses when inlining memcpy() even on Sparc.
> 
> Apparently the compiler used by Sowmini is gentle.

One more subtlety that I missed until now.. 

eth_get_headlen passes in flow_keys_buf_dissector (NOT flow_keys_dissector!)

So FLOW_DISSECTOR_KEY_IPV4_ADDRS is not set, and this helps to dodge
the unaligned iph->saddr access.

But as others have pointed out, much of this code is brittle
because it's accessing the data before the driver has had a chance
to align things. The page_offset initialization of NET_IP_ALIGN,
with all its weaknesses, at least matches (in principle) the prescription
used for the xmit path.

--Sowmini

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30 16:17     ` Sowmini Varadhan
@ 2016-01-30 17:43       ` Tom Herbert
  2016-01-30 19:12         ` Sowmini Varadhan
  2016-01-30 18:36       ` Alexander Duyck
  1 sibling, 1 reply; 50+ messages in thread
From: Tom Herbert @ 2016-01-30 17:43 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Eric Dumazet, Alexander Duyck, Linux Kernel Network Developers,
	David S. Miller, Alexander Duyck

On Sat, Jan 30, 2016 at 8:17 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/29/16 19:23), Eric Dumazet wrote:
>> BTW, even a memcpy(&key_addrs->v4addrs, &iph->saddr, 8) could crash, as
>> the compiler can certainly assume src and dst are 4 bytes aligned, and
>> could use word accesses when inlining memcpy() even on Sparc.
>>
>> Apparently the compiler used by Sowmini is gentle.
>
> One more subtlety that I missed until now..
>
> eth_get_headlen passes in flow_keys_buf_dissector (NOT flow_keys_dissector!)
>
> So FLOW_DISSECTOR_KEY_IPV4_ADDRS is not set, and this helps to dodge
> the unaligned iph->saddr access.
>
> But as others have pointed out, much of this code is brittle
> because it's accessing the data before the driver has had a chance
> to align things. The page_offset initialization of NET_IP_ALIGN,
> with all its weaknesses, at least matches (in principle) the prescription
> used for the xmit path.
>
That is not the only case, If a GRE TEB packet is ever received and
flow dissector is called for any reason (like skb_get_hash) there's
going to be problems-- and this doesn't require GRE to even be
configured on the host.

I have a patch that changes the 32-bit accesses in flow_dissector to
use get_unaligned_be32. I don't have access to any machines that care
about alignment (only x86). Do you know if there is an alternate way
to test this other than running on architecture like Sparc?

Thanks,
Tom

> --Sowmini
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30 17:43       ` Tom Herbert
@ 2016-01-30 19:12         ` Sowmini Varadhan
  0 siblings, 0 replies; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-30 19:12 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Eric Dumazet, Alexander Duyck, Linux Kernel Network Developers,
	David S. Miller, Alexander Duyck

On (01/30/16 09:43), Tom Herbert wrote:
> That is not the only case, If a GRE TEB packet is ever received and
> flow dissector is called for any reason (like skb_get_hash) there's
> going to be problems-- and this doesn't require GRE to even be
> configured on the host.
> 
> I have a patch that changes the 32-bit accesses in flow_dissector to
> use get_unaligned_be32. I don't have access to any machines that care
> about alignment (only x86). Do you know if there is an alternate way
> to test this other than running on architecture like Sparc?

I can help test this on sparc (would help if you send me any special
config instructions for the GRE TEB case) , but the other way to test
it would be something similar to this:

--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -178,6 +178,8 @@ ip:
 
                ip_proto = iph->protocol;
 
+               WARN_ON_ONCE(!IS_ALIGNED(iph->saddr, 4));
+
                if (!dissector_uses_key(flow_dissector,
                                        FLOW_DISSECTOR_KEY_IPV4_ADDRS))

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30 16:17     ` Sowmini Varadhan
  2016-01-30 17:43       ` Tom Herbert
@ 2016-01-30 18:36       ` Alexander Duyck
  2016-01-30 19:26         ` Eric Dumazet
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-01-30 18:36 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Eric Dumazet, Alexander Duyck, Netdev, David Miller, Tom Herbert

On Sat, Jan 30, 2016 at 8:17 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/29/16 19:23), Eric Dumazet wrote:
>> BTW, even a memcpy(&key_addrs->v4addrs, &iph->saddr, 8) could crash, as
>> the compiler can certainly assume src and dst are 4 bytes aligned, and
>> could use word accesses when inlining memcpy() even on Sparc.
>>
>> Apparently the compiler used by Sowmini is gentle.
>
> One more subtlety that I missed until now..
>
> eth_get_headlen passes in flow_keys_buf_dissector (NOT flow_keys_dissector!)
>
> So FLOW_DISSECTOR_KEY_IPV4_ADDRS is not set, and this helps to dodge
> the unaligned iph->saddr access.

Right this is a bug that got introduced in 4.2 when the check for !skb
was dropped in favor of the flow keys.

My thought was the patch I provided fixes 4.3, and could be applied to
stable for 4.2.  The concerns about GRE TEB and VXLAN go much further
back then that since if I am not mistaken the transmit path for both
is pretty much borked if we try to do GSO as either the inner headers
or outer headers are not aligned so assembling the header will likely
result in unaligned accesses.

> But as others have pointed out, much of this code is brittle
> because it's accessing the data before the driver has had a chance
> to align things. The page_offset initialization of NET_IP_ALIGN,
> with all its weaknesses, at least matches (in principle) the prescription
> used for the xmit path.

That is essentially an unfortunate side effect of this being a
function that has two very different roles to perform.  One goes
through and just determines the length of the frame, the other goes
through and gathers data for computing a hash.

As far as the NET_IP_ALIGN on the page offset I think it is a horrible
idea.  Basically it means we have to allocate at least 1K more space
than we need since page sizes are powers of 2, and buffer sizes in
hardware are measured in 1K increments.

Adding the unaligned reads probably doesn't do much to fix the actual
issue which is the unaligned headers (either inner or outer) on
architectures that don't support unaligned access.  If anything what I
really think we should look at doing is coming up with a way to just
pad the inner headers by NET_IP_ALIGN bytes if the architecture
doesn't support unaligned access and then output the inner and outer
headers as two separate buffers.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30 18:36       ` Alexander Duyck
@ 2016-01-30 19:26         ` Eric Dumazet
  2016-01-31  1:13           ` Alexander Duyck
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Dumazet @ 2016-01-30 19:26 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Sowmini Varadhan, Alexander Duyck, Netdev, David Miller,
	Tom Herbert

On Sat, 2016-01-30 at 10:36 -0800, Alexander Duyck wrote:

> As far as the NET_IP_ALIGN on the page offset I think it is a horrible
> idea.  Basically it means we have to allocate at least 1K more space
> than we need since page sizes are powers of 2, and buffer sizes in
> hardware are measured in 1K increments.

I thought Ethernet frames were something like ~1500 bytes.

Can't you program the NIC to not receive frames bigger than MTU + 18, or
at least 2046 ?

Of course, if the NIC is able to push 2048 bytes (instead of 2046 max),
using an offset of 2 is not going to work.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-30 19:26         ` Eric Dumazet
@ 2016-01-31  1:13           ` Alexander Duyck
  2016-01-31  3:45             ` David Miller
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-01-31  1:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sowmini Varadhan, Alexander Duyck, Netdev, David Miller,
	Tom Herbert

On Sat, Jan 30, 2016 at 11:26 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-01-30 at 10:36 -0800, Alexander Duyck wrote:
>
>> As far as the NET_IP_ALIGN on the page offset I think it is a horrible
>> idea.  Basically it means we have to allocate at least 1K more space
>> than we need since page sizes are powers of 2, and buffer sizes in
>> hardware are measured in 1K increments.
>
> I thought Ethernet frames were something like ~1500 bytes.

It is, but if jumbo frames are enabled or LRO is enabled the NIC will
generate larger frames.

> Can't you program the NIC to not receive frames bigger than MTU + 18, or
> at least 2046 ?

Yes and no.  The NIC will already block frames larger than a fixed
size, but there are features like LRO that will let the NIC assemble a
larger frame.

> Of course, if the NIC is able to push 2048 bytes (instead of 2046 max),
> using an offset of 2 is not going to work.

The NIC hardware only allows us to set receive buffer sizes in 1K
increments.  I think fm10k may be an exception in that it may be able
to support 512 byte, but otherwise igb and ixgbe are strictly set in
1K increments.  So basically if we did the NET_IP_ALIGN of the page we
are limited to 1K buffers if we were to retain the current page reuse,
or we could go up to 3K if we were to allocate order 1 pages for
receive buffers as currently occurs with FCoE.

I would really prefer to keep the pages DMA aligned, and the skb->data
IP aligned.  If nothing else it has the advantage of actually having
the proper alignment on all the headers if I only pull the outer
headers and leave the inner headers in the page.. :-/

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-31  1:13           ` Alexander Duyck
@ 2016-01-31  3:45             ` David Miller
  2016-01-31  9:12               ` Alexander Duyck
  2016-02-01 16:50               ` David Laight
  0 siblings, 2 replies; 50+ messages in thread
From: David Miller @ 2016-01-31  3:45 UTC (permalink / raw)
  To: alexander.duyck; +Cc: eric.dumazet, sowmini.varadhan, aduyck, netdev, tom

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sat, 30 Jan 2016 17:13:35 -0800

> The NIC hardware only allows us to set receive buffer sizes in 1K
> increments.  I think fm10k may be an exception in that it may be able
> to support 512 byte, but otherwise igb and ixgbe are strictly set in
> 1K increments.  So basically if we did the NET_IP_ALIGN of the page we
> are limited to 1K buffers if we were to retain the current page reuse,
> or we could go up to 3K if we were to allocate order 1 pages for
> receive buffers as currently occurs with FCoE.
> 
> I would really prefer to keep the pages DMA aligned, and the skb->data
> IP aligned.  If nothing else it has the advantage of actually having
> the proper alignment on all the headers if I only pull the outer
> headers and leave the inner headers in the page.. :-/

I am unconvinced by your arguments.  People getting hit by these
arguments don't care.  For those not hitting the problem, nothing is
going to change.

x86 uses NET_IP_ALIGN==0, and therefore nothing will change for the
only platform people care a lot about.  PowerPC too.

For platforms with NET_IP_ALIGN==2 or similar, all of these cases
where 1K extra buffer is allocator or _whatever_ other side effect you
think is such as huge deal...  trust me it's _nothing_ compared to the
alternative:

1) Unaligned accesses

2) Constantly having to say "oops this case, oops that case, oops shit
   we have to handle this too" as we constantly patch up flow
   dissector and other paths.

Please, pass aligned buffers to the flow dissector, and let's put this
behind us, _provably_, for good.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-31  3:45             ` David Miller
@ 2016-01-31  9:12               ` Alexander Duyck
  2016-01-31 19:17                 ` David Miller
  2016-02-01 16:50               ` David Laight
  1 sibling, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-01-31  9:12 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, Sowmini Varadhan, Alex Duyck, Netdev, Tom Herbert

On Sat, Jan 30, 2016 at 7:45 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Sat, 30 Jan 2016 17:13:35 -0800
>
>> The NIC hardware only allows us to set receive buffer sizes in 1K
>> increments.  I think fm10k may be an exception in that it may be able
>> to support 512 byte, but otherwise igb and ixgbe are strictly set in
>> 1K increments.  So basically if we did the NET_IP_ALIGN of the page we
>> are limited to 1K buffers if we were to retain the current page reuse,
>> or we could go up to 3K if we were to allocate order 1 pages for
>> receive buffers as currently occurs with FCoE.
>>
>> I would really prefer to keep the pages DMA aligned, and the skb->data
>> IP aligned.  If nothing else it has the advantage of actually having
>> the proper alignment on all the headers if I only pull the outer
>> headers and leave the inner headers in the page.. :-/
>
> I am unconvinced by your arguments.  People getting hit by these
> arguments don't care.  For those not hitting the problem, nothing is
> going to change.
>
> x86 uses NET_IP_ALIGN==0, and therefore nothing will change for the
> only platform people care a lot about.  PowerPC too.

I think you may be getting confused about the scope of what you are
talking about.  The region pointed to by skb->data is IP aligned when
we pass it to the stack.  So the scope of this is very limited and
should only effect the flow dissector.  If needed I would be more than
happy to just fork the function and write up a version that only gets
the length of headers and is good with a 2 byte aligned header.  This
was one of my original misgivings about merging the functionality of
the flow dissector and the functions I was maintaining in the Intel
drivers.

> For platforms with NET_IP_ALIGN==2 or similar, all of these cases
> where 1K extra buffer is allocator or _whatever_ other side effect you
> think is such as huge deal...  trust me it's _nothing_ compared to the
> alternative:

We pay a pretty hefty truesize penalty for consuming any more memory
then we need to.  The changes as proposed would essentially kick our
truesize from 3K to 5K per frame.  It would double the space needed
since it isn't like I can just tell the page allocater to give me an
additional 2 bytes, it bumps me from 4K to 8K.  There would be a
noticeable performance drop, though I admit I don't know if it would
be a very big group noticing it since we are talking about some lesser
used architectures.

> 1) Unaligned accesses

Changing the driver doesn't fix the unaligned accesses problem.  It is
still there with the inner headers on GRE Transparent Ethernet Bridge.
That is why I believe Tom is still trying to come up with a solution
for it.

> 2) Constantly having to say "oops this case, oops that case, oops shit
>    we have to handle this too" as we constantly patch up flow
>    dissector and other paths.

The flow dissector is the only spot that has to deal with this.  If
you forget I had my own function we were perfectly happy to use with
the Intel drivers but I was told we needed to use this one as we
didn't want to duplicate functionality.

The fact is this function served its purpose perfectly fine until
commit b924933cbbfb "flow_dissector: introduce support for ipv6
addressses".  That is when the bug was introduced that is fixed by the
patch I submitted.

> Please, pass aligned buffers to the flow dissector, and let's put this
> behind us, _provably_, for good.

What is the point.  As pointed out earlier there is still an issue
with GRE TEB.  My patch is the surgical approach and fixes the
original issue and is easy to backport to the other kernels that have
this issue.  What you are asking me for is to go through and make
serious changes to igb, ixgbe, ixgbevf, fm10k, and mlx4.  If anything
I would think that fixing this in net with the least invasive patch
would be preferred.  If we want to go through and rewrite the receive
paths in multiple drivers to allow them to use IP aligned pages on
architectures that many of us don't have then we should focus on doing
that in net-next where we can prove out that we aren't making things
worse.

I'm standing by my original patch and still think it is the best fix
for net.  If you want me to go through and rearchitect the drivers so
that they only ever pass IP aligned page buffers to the flow dissector
then I really think it should wait until net-next as it isn't going to
be a small change set, and I don't have the systems or NICs here to be
able to test all of it.

> Thanks.

If nothing else we can discuss this further in Seville as I am just
not convinced changing the drivers is the right way to go about fixing
this, and I won't have the time to really get around to doing any of
it until I have come back from the conference.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-31  9:12               ` Alexander Duyck
@ 2016-01-31 19:17                 ` David Miller
  2016-01-31 21:23                   ` Sowmini Varadhan
  0 siblings, 1 reply; 50+ messages in thread
From: David Miller @ 2016-01-31 19:17 UTC (permalink / raw)
  To: alexander.duyck; +Cc: eric.dumazet, sowmini.varadhan, aduyck, netdev, tom

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sun, 31 Jan 2016 01:12:03 -0800

> If nothing else we can discuss this further in Seville as I am just
> not convinced changing the drivers is the right way to go about fixing
> this, and I won't have the time to really get around to doing any of
> it until I have come back from the conference.

Ok, let me think about this some more.

It's a shame that "put two garbage bytes into the FIFO at the
beginning of every frame" isn't a standard hw feature.  Or maybe
I misunderstand what's going on in all of these drivers...

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-31 19:17                 ` David Miller
@ 2016-01-31 21:23                   ` Sowmini Varadhan
  0 siblings, 0 replies; 50+ messages in thread
From: Sowmini Varadhan @ 2016-01-31 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: alexander.duyck, eric.dumazet, aduyck, netdev, tom

On (01/31/16 11:17), David Miller wrote:
> 
> > If nothing else we can discuss this further in Seville as I am just
> > not convinced changing the drivers is the right way to go about fixing
> > this, and I won't have the time to really get around to doing any of
> > it until I have come back from the conference.
> 
> Ok, let me think about this some more.
> 
> It's a shame that "put two garbage bytes into the FIFO at the
> beginning of every frame" isn't a standard hw feature.  Or maybe
> I misunderstand what's going on in all of these drivers...
> 

We can follow up on this at Seville, but if it is possible,
it might be better to not overload eth_get_headlen to also
do the flow extraction even before the buffer has been aligned
by the driver. Even the NET_IP_ALIGN algorithm will fail if we 
are dealing with IEEE 802.3 headers with LLC/SNAP etc.

And in addition to all this, anything we can do to help align
nested encapsulations  (i.e., Tom's concern) is also needed, and
maybe it will be cleaner to do that if we just split off deep packet
inspection for flow extraction away from eth_get_headlen.

--Sowmini

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-01-31  3:45             ` David Miller
  2016-01-31  9:12               ` Alexander Duyck
@ 2016-02-01 16:50               ` David Laight
  2016-02-01 18:18                 ` Alexander Duyck
  1 sibling, 1 reply; 50+ messages in thread
From: David Laight @ 2016-02-01 16:50 UTC (permalink / raw)
  To: 'David Miller', alexander.duyck@gmail.com
  Cc: eric.dumazet@gmail.com, sowmini.varadhan@oracle.com,
	aduyck@mirantis.com, netdev@vger.kernel.org, tom@herbertland.com

From: David Miller
> Sent: 31 January 2016 03:45
> To: alexander.duyck@gmail.com
...
> > I would really prefer to keep the pages DMA aligned, and the skb->data
> > IP aligned.  If nothing else it has the advantage of actually having
> > the proper alignment on all the headers if I only pull the outer
> > headers and leave the inner headers in the page.. :-/
> 
> I am unconvinced by your arguments.  People getting hit by these
> arguments don't care.  For those not hitting the problem, nothing is
> going to change.
> 
> x86 uses NET_IP_ALIGN==0, and therefore nothing will change for the
> only platform people care a lot about.  PowerPC too.
> 
> For platforms with NET_IP_ALIGN==2 or similar, all of these cases
> where 1K extra buffer is allocator or _whatever_ other side effect you
> think is such as huge deal...  trust me it's _nothing_ compared to the
> alternative:
> 
> 1) Unaligned accesses

Remember the even if you do a 'realignment copy' of the IP header,
at some point the 'userdata' of the packet has to be accessed.
Mostly this will be with memcpy() and you want that copy to be aligned.

I really can't believe just how many ethernet chips are now being designed
that can't write received frames to '4n+2' boundaries.
It isn't even a new problem. Sun fixed the sbus 'DMA' part so that it would
to sbus burst transfers to 4n+2 aligned buffers a long time ago...

	David

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-02-01 16:50               ` David Laight
@ 2016-02-01 18:18                 ` Alexander Duyck
  2016-02-02  9:47                   ` David Laight
  0 siblings, 1 reply; 50+ messages in thread
From: Alexander Duyck @ 2016-02-01 18:18 UTC (permalink / raw)
  To: David Laight
  Cc: David Miller, eric.dumazet@gmail.com, sowmini.varadhan@oracle.com,
	aduyck@mirantis.com, netdev@vger.kernel.org, tom@herbertland.com

On Mon, Feb 1, 2016 at 8:50 AM, David Laight <David.Laight@aculab.com> wrote:
> From: David Miller
>> Sent: 31 January 2016 03:45
>> To: alexander.duyck@gmail.com
> ...
>> > I would really prefer to keep the pages DMA aligned, and the skb->data
>> > IP aligned.  If nothing else it has the advantage of actually having
>> > the proper alignment on all the headers if I only pull the outer
>> > headers and leave the inner headers in the page.. :-/
>>
>> I am unconvinced by your arguments.  People getting hit by these
>> arguments don't care.  For those not hitting the problem, nothing is
>> going to change.
>>
>> x86 uses NET_IP_ALIGN==0, and therefore nothing will change for the
>> only platform people care a lot about.  PowerPC too.
>>
>> For platforms with NET_IP_ALIGN==2 or similar, all of these cases
>> where 1K extra buffer is allocator or _whatever_ other side effect you
>> think is such as huge deal...  trust me it's _nothing_ compared to the
>> alternative:
>>
>> 1) Unaligned accesses
>
> Remember the even if you do a 'realignment copy' of the IP header,
> at some point the 'userdata' of the packet has to be accessed.
> Mostly this will be with memcpy() and you want that copy to be aligned.

The problem is, aligned by what?  For x86 anything other than 16 byte
alignment will require some realignment anyway since internally it is
essentially doing some SSE style copy based on rep movsb.  I'm sure it
is similar for other architectures that have moved on from 32b to 64b
in that they want to move by the largest available data unit and 2
byte alignment vs 4 byte doesn't make that much difference anymore.

The thing you have to keep in mind is that your standard TCP frame has
66 bytes of headers(14 Ethernet, 20 IP, 20 TCP, 12 TCP options).  This
means that you are crossing over into the next cache line by either 2
or 4 bytes depending on NET_IP_ALIGN.  On a system with an 8 byte long
it doesn't really make a difference one way or the other since it is
still unaligned in terms of the fastest possible memcpy.

> I really can't believe just how many ethernet chips are now being designed
> that can't write received frames to '4n+2' boundaries.
> It isn't even a new problem. Sun fixed the sbus 'DMA' part so that it would
> to sbus burst transfers to 4n+2 aligned buffers a long time ago...

The problem here is motivation.  It makes sense that Sun would want
their headers to be IP aligned since a Sparc processor still needs
that.  Most NIC vendors don't really care about the more obscure
architectures, and even 32 bit is falling out of favor.  The market
share is just too small and in many cases customers don't run Linux
OSes on them anyway so even driver support is minimal.  They mostly
care about x86_64, PowerPC, and 64 bit ARM.  And all 3 of those
architectures have efficient unaligned accesses.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  2016-02-01 18:18                 ` Alexander Duyck
@ 2016-02-02  9:47                   ` David Laight
  0 siblings, 0 replies; 50+ messages in thread
From: David Laight @ 2016-02-02  9:47 UTC (permalink / raw)
  To: 'Alexander Duyck'
  Cc: David Miller, eric.dumazet@gmail.com, sowmini.varadhan@oracle.com,
	aduyck@mirantis.com, netdev@vger.kernel.org, tom@herbertland.com

From: Alexander Duyck
> Sent: 01 February 2016 18:18
> >> 1) Unaligned accesses
> >
> > Remember the even if you do a 'realignment copy' of the IP header,
> > at some point the 'userdata' of the packet has to be accessed.
> > Mostly this will be with memcpy() and you want that copy to be aligned.
> 
> The problem is, aligned by what?  For x86 anything other than 16 byte
> alignment will require some realignment anyway since internally it is
> essentially doing some SSE style copy based on rep movsb.

That isn't 'SSE' based, there is probably some kind of barrel shifter
associated with the data cache that allows cache-line reads from
the 'wrong' address.

> I'm sure it
> is similar for other architectures that have moved on from 32b to 64b
> in that they want to move by the largest available data unit and 2
> byte alignment vs 4 byte doesn't make that much difference anymore.

There are still plently of 32bit systems out there.
Including SoC systems with cpus that fault on misaligned transfers on
the same silicon as ethernet MACs that can't write 4n+2 aligned frames.

> The thing you have to keep in mind is that your standard TCP frame has
> 66 bytes of headers(14 Ethernet, 20 IP, 20 TCP, 12 TCP options).  This
> means that you are crossing over into the next cache line by either 2
> or 4 bytes depending on NET_IP_ALIGN.  On a system with an 8 byte long
> it doesn't really make a difference one way or the other since it is
> still unaligned in terms of the fastest possible memcpy.

Unless you align the frame on an 8n+6 boundary.

> > I really can't believe just how many ethernet chips are now being designed
> > that can't write received frames to '4n+2' boundaries.
> > It isn't even a new problem. Sun fixed the sbus 'DMA' part so that it would
> > to sbus burst transfers to 4n+2 aligned buffers a long time ago...
> 
> The problem here is motivation.  It makes sense that Sun would want
> their headers to be IP aligned since a Sparc processor still needs
> that.  Most NIC vendors don't really care about the more obscure
> architectures, and even 32 bit is falling out of favor.  The market
> share is just too small and in many cases customers don't run Linux
> OSes on them anyway so even driver support is minimal.  They mostly
> care about x86_64, PowerPC, and 64 bit ARM.  And all 3 of those
> architectures have efficient unaligned accesses.

With the possible exception of the x86 optimised 'rep movsb' on
very recent intel cpus, unaligned transfers are very likely
slower than aligned ones - just a lot faster than aligning the
copy in software.

My suspicion is that many hardware engineers are just not aware
of the problem. Or just leave it as a 'software problem'.

It is the same family of 'problem' that adds more and more
protocol specific 1s compliment checksum offloading.

	David.



^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2016-02-02  9:50 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-29 18:06 [RFC] Kernel unaligned access at __skb_flow_dissect Sowmini Varadhan
2016-01-29 18:33 ` Eric Dumazet
2016-01-29 18:40   ` Eric Dumazet
2016-01-29 19:08     ` David Miller
2016-01-29 19:39       ` Eric Dumazet
2016-01-29 18:54   ` Eric Dumazet
2016-01-29 19:22     ` Sowmini Varadhan
2016-01-29 21:00     ` Tom Herbert
2016-01-29 21:09       ` Sowmini Varadhan
2016-01-29 21:33         ` Tom Herbert
2016-01-29 23:00         ` Tom Herbert
2016-01-29 23:04           ` Sowmini Varadhan
2016-01-29 23:31             ` Tom Herbert
2016-01-29 23:58               ` Sowmini Varadhan
2016-01-30  0:47                 ` Tom Herbert
2016-01-30  1:18                   ` David Miller
2016-01-30  2:15                   ` Eric Dumazet
2016-01-31 22:13                     ` Tom Herbert
2016-02-01  0:04                       ` Eric Dumazet
2016-01-29 19:06   ` David Miller
2016-01-29 19:37     ` Eric Dumazet
2016-01-29 19:41       ` David Miller
2016-01-29 19:44       ` Sowmini Varadhan
2016-01-29 20:01         ` Eric Dumazet
2016-01-29 20:06           ` Eric Dumazet
2016-01-29 21:16           ` Alexander Duyck
2016-01-29 21:33             ` Eric Dumazet
2016-01-29 22:08               ` Alexander Duyck
2016-01-29 22:28                 ` Eric Dumazet
2016-01-29 23:00                   ` Alexander Duyck
2016-02-01 16:55                   ` David Laight
2016-02-01 17:25                     ` Eric Dumazet
2016-01-29 19:04 ` David Miller
2016-01-30  2:49 ` [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen Alexander Duyck
2016-01-30  3:23   ` Eric Dumazet
2016-01-30  3:35     ` David Miller
2016-01-30  4:46       ` Alexander Duyck
2016-01-30 16:17     ` Sowmini Varadhan
2016-01-30 17:43       ` Tom Herbert
2016-01-30 19:12         ` Sowmini Varadhan
2016-01-30 18:36       ` Alexander Duyck
2016-01-30 19:26         ` Eric Dumazet
2016-01-31  1:13           ` Alexander Duyck
2016-01-31  3:45             ` David Miller
2016-01-31  9:12               ` Alexander Duyck
2016-01-31 19:17                 ` David Miller
2016-01-31 21:23                   ` Sowmini Varadhan
2016-02-01 16:50               ` David Laight
2016-02-01 18:18                 ` Alexander Duyck
2016-02-02  9:47                   ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).