Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: David Miller @ 2016-03-28 23:54 UTC (permalink / raw)
  To: eric.dumazet
  Cc: jengelh, kadlec, sploving1, pablo, kaber, netfilter-devel, netdev
In-Reply-To: <1459198306.6473.126.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 28 Mar 2016 13:51:46 -0700

> On Mon, 2016-03-28 at 13:46 -0700, Eric Dumazet wrote:
> 
>> We have at least 384 bytes of padding in skb->head (this is struct
>> skb_shared_info).
>> 
>> Whatever garbage we might read, current code is fine.
>> 
>> We have to deal with a false positive here.
> 
> Very similar to the one fixed in 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=10ec9472f05b45c94db3c854d22581a20b97db41

I don't see them as similar.

The current options code we are talking about here never references
past legitimate parts of the packet data.  We always check 'length',
and we never access past the boundary it describes.

This was the entire point of my posting.

Talking about padding, rather than the logical correctness of the
code, is therefore a distraction I think :-)

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: David Miller @ 2016-03-28 23:52 UTC (permalink / raw)
  To: jengelh; +Cc: kadlec, sploving1, pablo, kaber, netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.20.1603282214130.16241@nerf40.vanv.qr>

From: Jan Engelhardt <jengelh@inai.de>
Date: Mon, 28 Mar 2016 22:20:39 +0200 (CEST)

> 
> On Monday 2016-03-28 21:29, David Miller wrote:
>>>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
>>>> > >               length--;
>>>> > >               continue;
>>>> > >           default:
>>>> > > +            if (length < 2)
>>>> > > +                return;
>>>> > >               opsize = *ptr++;
>>>> > >               if (opsize < 2) /* "silly options" */
>>>> > >                   return;
>>
>>I'm trying to figure out how this can even matter.
>>If we are in the loop, length is at least one.
>>That means it is legal to read the opsize byte.
> 
> Is that because the skbuff is always padded to a multiple of (at
> least) two?

No, it's because length is at least one, so we can read one byte.

And then before we read more, we make sure length is at least opsize
which is at least two.

This has nothing to do with padding, and everything to do logically
with the code.

^ permalink raw reply

* Re: [PATCH] ipv6: Fix the pmtu path for connected UDP socket
From: Martin KaFai Lau @ 2016-03-28 23:44 UTC (permalink / raw)
  To: Cong Wang
  Cc: Wei Wang, Eric Dumazet, Eric Dumazet, Wei Wang, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <CAM_iQpXsv6HqRP=KySoaercKsR1wxWtzi4OuW=ViFfWzMaE_DQ@mail.gmail.com>

On Mon, Mar 28, 2016 at 03:39:42PM -0700, Cong Wang wrote:
> On Fri, Mar 25, 2016 at 5:16 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Fri, Mar 25, 2016 at 04:55:27PM -0700, Martin KaFai Lau wrote:
> >>  void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu)
> >>  {
> >> +     struct dst_entry *odst;
> >> +
> >> +     odst = sk_dst_get(sk);
> >> +
> >>       ip6_update_pmtu(skb, sock_net(sk), mtu,
> >>                       sk->sk_bound_dev_if, sk->sk_mark);
> >> +
> >> +     if (odst && !odst->error &&
> >> +         !ip6_dst_check(odst, inet6_sk(sk)->dst_cookie)) {
> >> +             struct dst_entry *ndst;
> >> +             struct flowi6 fl6;
> >> +
> >> +             build_skb_flow_key(&fl6, skb, sock_net(sk),
> >> +                                sk->sk_bound_dev_if, sk->sk_mark);
> >> +             ndst = ip6_route_output(sock_net(sk), NULL, &fl6);
> >> +             if (!ndst->error)
> >> +                     ip6_dst_store(sk, ndst, NULL, NULL);
> > oops...missed:
> >                 else
> >                         dst_release(ndst);
>
>
> Can you send an updated patch for review? Since you have a test case
> while I didn't.
>
> Also,
>
> 2) Not sure if we need to update dst or dst->path here, I guess the later
> counts in xfrm case therefore more correct.
Thanks for pointing it out.  Calling ip6_dst_check is also not appropriate here.
odst->ops->check() should be used.  I need to take a closer look and may try
another approach.

Thanks
-- Martin

^ permalink raw reply

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
From: Jesse Gross @ 2016-03-28 23:34 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Alexander Duyck, David Miller, Linux Kernel Network Developers
In-Reply-To: <CALx6S37oooS6_0rQr59E3qkcDu14eGFDYP4CKvc_Zv64zP4NqA@mail.gmail.com>

On Mon, Mar 28, 2016 at 3:10 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>>>
>>>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>>>
>>>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>>>
>>>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>>>> that would cause problems.
>>>>>>>>>>
>>>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>>>
>>>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>>>> but *not* check for it.
>>>>>>>>
>>>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>>>
>>>>>>>
>>>>>>> Kernel software offload does support this combination just fine.
>>>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>>>> HW implementations you're referring to. The limitations of these
>>>>>>> drivers should not dictate how we build the software-- it needs to
>>>>>>> work the other way around.
>>>>>>
>>>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>>>> thing currently as the transmit side doesn't generate these kind of
>>>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>>>> really need to restructure the GSO code in order to handle it.  As far
>>>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>>>> way we could possibly handle it would be in software since what you
>>>>>> are indicating is a indeterminate number of headers that all require
>>>>>> updates.
>>>>>>
>>>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>>>> since the assumption is that the stack will never try to do this.
>>>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>>>> think it would be better to just advertise it that way on transmit
>>>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>>>
>>>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>>>> we should be increasing generality and capabilities of the kernel not
>>>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>>>> this flies then we can offload any combination of encapsulations with
>>>>>>> out protocol specific information.
>>>>>>
>>>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>>>> headers.  Currently we have the pointers for the inner headers and the
>>>>>> outer headers.  If you are talking about adding yet another level we
>>>>>> would need additional pointers in the skbuff to handle them and we
>>>>>> currently don't have them at present.
>>>>>>
>>>>>>>> The change that you are suggesting would result in packets generated
>>>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>>>> might corrupt traffic.
>>>>>>>
>>>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>>>> generic segmentation offload interface. But in the interim, it is
>>>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>>>> for their benefit is definitely not right.
>>>>>>
>>>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>>>> can handle it.  The problem as it stands right now is that we start
>>>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>>>> the standard mac_header, network_header, transport_header, and then
>>>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>>>> deep we start losing data.
>>>>>>
>>>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>>>> encapsulation for over a year now. We never had to add anything to
>>>>> skbuff to make that work. If "losing data" is a problem please provide
>>>>> the *reproducible* test case for that and we'll debug that.
>>>>
>>>> I'm guessing most of your examples involve either a remote checksum
>>>> being enabled or using NICs that don't support any tunnel offloads?
>>>> Hardware needs to be able to identify where headers are in order to
>>>> perform most of their offloads for TSO.  We either have to parse to
>>>> find them or we are provided with them by the stack.  GSO can work
>>>> around it as long as we don't stack checksum based offload with
>>>> non-checksum of the same type.
>>>>
>>>> mostIf for example you were to try and send a frame that had either an
>>>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>>>> probably screw it up.
>>>
>>> Please be more precise. Obviously we're only talking about the few
>>> NICs that even support UDP tunnel offload so it's not most NICs. Also,
>>> there is a big difference between "probably" and "definitely"; no one
>>> has provided a single data point that that there is even an issue. For
>>> instance, looking a i40e I suspect this will work with GRE/UDP since
>>> the driver already deals with offsets and shouldn't care about
>>> intermediate levels of encapsulation.
>>
>> I'm sorry I cannot be super precise as I hadn't looked that closely at
>> the FOU or GUE code before.  Honestly I suspect most people probably
>> haven't.
>>
> Sorry, but one simple test would have identified this patch was a
> regression. I don't mind that bugs are introduced with new patches,
> but it is frustrating when testing of the patches is obviously
> inadequate and then we get the "it's your stuff that's broken" knee
> jerk reaction when a patch causes a regression for someone else.
>
>> So from what I can tell FOU and GUE isn't really even necessarily
>> doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
>> So in the case of IPIP and SIT for instance the only real difference
>> between VXLAN and those tunnels with fou is that you don't have the
>> VXLAN or inner Ethernet headers.  So really you still only have two
>> levels of headers.  Even in the case of GRE you have that set after
>> the outer UDP header so in that case GRE ends up being treated as a
>> tunnel header since the outer transport offset was overwritten.  If we
>> had hardware that supported both UDP and GRE outer checksums it would
>> cause a mess as the hardware would place the GRE checksum in the wrong
>> spot.
>>
>> So if anything all we probably would need to do is treat the FOU or
>> GUE stuff as a special case.  Basically if we end up having to do a
>> GRO over a FOU or GUE instance maybe we need to knock the encap_mark
>> back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
>> because really it can just be treated as a UDP tunnel.  Jesse's code
>> is still needed to catch the case where someone is trying to do
>> something like IPIP over VXLAN or whatever but it does seem like there
>> should be some wiggle room for FOU or GUE.
>>
> Nothing is needed other than to revert this patch. There is no problem
> with GRO. You nor Jesse have not identified any tangible problem. If a
> driver really does had an issue with doing GRE/UDP with GSO then it
> can filter that in check_features (this is like 2 lines of code). But
> that has not been proven to be a problem, and I would expect that
> anyone who wants to fix that better run tests showing there is a
> problem.

Let me try to give a very clear and specific example. Note that this
is pure software and does not involve any hardware offloading.

* A packet is received that is encapsulated with two layers of GRE. It
looks like this: Eth|IP|GRE|IP|GRE|IP|TCP
* The packet is processed through GRO successfully. skb->encapsulation
bit is set and skb_shinfo(skb)->gso_type is set to SKB_GSO_GRE |
SKB_GSO_TCPV4.
* The packet is bridged to an output device and is prepared for
transmission. skb_gso_segment() is called to segment the packet.
* As we descent down the GSO code, we get to gre_gso_segment() for the
first GRE header. skb->encapsulation is cleared and we keep going to
the next header.
* We return to gre_gso_segment() again for the second GRE header.
There is a check for skb->encapsulation being set but it is now clear.
GSO processing is aborted.
* The packet is dropped.

^ permalink raw reply

* Re: [PATCH] ipv6: Fix the pmtu path for connected UDP socket
From: Cong Wang @ 2016-03-28 22:39 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Wei Wang, Eric Dumazet, Eric Dumazet, Wei Wang, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <20160326001600.GA29626@kafai-mba.local>

On Fri, Mar 25, 2016 at 5:16 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Fri, Mar 25, 2016 at 04:55:27PM -0700, Martin KaFai Lau wrote:
>>  void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu)
>>  {
>> +     struct dst_entry *odst;
>> +
>> +     odst = sk_dst_get(sk);
>> +
>>       ip6_update_pmtu(skb, sock_net(sk), mtu,
>>                       sk->sk_bound_dev_if, sk->sk_mark);
>> +
>> +     if (odst && !odst->error &&
>> +         !ip6_dst_check(odst, inet6_sk(sk)->dst_cookie)) {
>> +             struct dst_entry *ndst;
>> +             struct flowi6 fl6;
>> +
>> +             build_skb_flow_key(&fl6, skb, sock_net(sk),
>> +                                sk->sk_bound_dev_if, sk->sk_mark);
>> +             ndst = ip6_route_output(sock_net(sk), NULL, &fl6);
>> +             if (!ndst->error)
>> +                     ip6_dst_store(sk, ndst, NULL, NULL);
> oops...missed:
>                 else
>                         dst_release(ndst);


Can you send an updated patch for review? Since you have a test case
while I didn't.

Also,

1) I think you need to check obsolete before update? There is no
reason to update an obsoleted dst?

2) Not sure if we need to update dst or dst->path here, I guess the later
counts in xfrm case therefore more correct.

Thanks.

^ permalink raw reply

* Re: [RFT Patch net 1/2] ipv6: invalidate the socket cached route on pmtu events if possible
From: Cong Wang @ 2016-03-28 22:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Wei Wang, Steffen Klassert,
	Martin KaFai Lau, Hannes Frederic Sowa, Julian Anastasov
In-Reply-To: <1458929477.6473.47.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Mar 25, 2016 at 11:11 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2016-03-25 at 10:17 -0700, Cong Wang wrote:
>
>> 1) sock lock protects the whole update: the whole check, update, recheck,
>> set logic, to make sure another CPU will not do the same to the same socket
>> at the same time.
>>
>> 2) the dst itself is safe, because it is always refcounted, and we use xchg()
>> to change the pointer in sk_dst_cache.
>>
>> Or am I still missing anything here?
>
> As TCP always lock the socket before doing its heavy stuff,
> it can use a variant of sk_dst_cache manipulations that do not use extra
> atomic operations.
>
> But UDP gets xchg() to safely exchange sk_dst_cache, because we do not
> feel locking the socket is needed for UDP for typical uses (! cork)
>
> If you hold the socket lock in ICMP handler, then it would be
> inconsistent with udp sendmsg() where we do not hold the socket lock.
>
> Since I believe udp sendmsg() is fine, I do believe you do not need to
> lock the socket, and then care about socket being owned by the user.

I see, seems the whole update logic is safe to become lock-free, then
commit 8141ed9fcedb278f4a3a78680591bef1e55f75fb can be just
reverted.

OTOH, other bh_lock_sock() callers need it to update queues etc.,
here we only need to check and update one single pointer in sk.

Steffen?

Thanks.

^ permalink raw reply

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
From: Tom Herbert @ 2016-03-28 22:10 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesse Gross, David Miller, Linux Kernel Network Developers
In-Reply-To: <CAKgT0UdaEodsDwSfKBDJqjRG-7xHEdJGpKsAAmfmgLDL8cT4zQ@mail.gmail.com>

On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>>> <alexander.duyck@gmail.com> wrote:
>>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>>
>>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>>
>>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>>
>>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>>> that would cause problems.
>>>>>>>>>
>>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>>
>>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>>> but *not* check for it.
>>>>>>>
>>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>>
>>>>>>
>>>>>> Kernel software offload does support this combination just fine.
>>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>>> HW implementations you're referring to. The limitations of these
>>>>>> drivers should not dictate how we build the software-- it needs to
>>>>>> work the other way around.
>>>>>
>>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>>> thing currently as the transmit side doesn't generate these kind of
>>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>>> really need to restructure the GSO code in order to handle it.  As far
>>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>>> way we could possibly handle it would be in software since what you
>>>>> are indicating is a indeterminate number of headers that all require
>>>>> updates.
>>>>>
>>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>>> since the assumption is that the stack will never try to do this.
>>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>>> think it would be better to just advertise it that way on transmit
>>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>>
>>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>>> we should be increasing generality and capabilities of the kernel not
>>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>>> this flies then we can offload any combination of encapsulations with
>>>>>> out protocol specific information.
>>>>>
>>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>>> headers.  Currently we have the pointers for the inner headers and the
>>>>> outer headers.  If you are talking about adding yet another level we
>>>>> would need additional pointers in the skbuff to handle them and we
>>>>> currently don't have them at present.
>>>>>
>>>>>>> The change that you are suggesting would result in packets generated
>>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>>> might corrupt traffic.
>>>>>>
>>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>>> generic segmentation offload interface. But in the interim, it is
>>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>>> for their benefit is definitely not right.
>>>>>
>>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>>> can handle it.  The problem as it stands right now is that we start
>>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>>> the standard mac_header, network_header, transport_header, and then
>>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>>> deep we start losing data.
>>>>>
>>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>>> encapsulation for over a year now. We never had to add anything to
>>>> skbuff to make that work. If "losing data" is a problem please provide
>>>> the *reproducible* test case for that and we'll debug that.
>>>
>>> I'm guessing most of your examples involve either a remote checksum
>>> being enabled or using NICs that don't support any tunnel offloads?
>>> Hardware needs to be able to identify where headers are in order to
>>> perform most of their offloads for TSO.  We either have to parse to
>>> find them or we are provided with them by the stack.  GSO can work
>>> around it as long as we don't stack checksum based offload with
>>> non-checksum of the same type.
>>>
>>> mostIf for example you were to try and send a frame that had either an
>>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>>> probably screw it up.
>>
>> Please be more precise. Obviously we're only talking about the few
>> NICs that even support UDP tunnel offload so it's not most NICs. Also,
>> there is a big difference between "probably" and "definitely"; no one
>> has provided a single data point that that there is even an issue. For
>> instance, looking a i40e I suspect this will work with GRE/UDP since
>> the driver already deals with offsets and shouldn't care about
>> intermediate levels of encapsulation.
>
> I'm sorry I cannot be super precise as I hadn't looked that closely at
> the FOU or GUE code before.  Honestly I suspect most people probably
> haven't.
>
Sorry, but one simple test would have identified this patch was a
regression. I don't mind that bugs are introduced with new patches,
but it is frustrating when testing of the patches is obviously
inadequate and then we get the "it's your stuff that's broken" knee
jerk reaction when a patch causes a regression for someone else.

> So from what I can tell FOU and GUE isn't really even necessarily
> doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
> So in the case of IPIP and SIT for instance the only real difference
> between VXLAN and those tunnels with fou is that you don't have the
> VXLAN or inner Ethernet headers.  So really you still only have two
> levels of headers.  Even in the case of GRE you have that set after
> the outer UDP header so in that case GRE ends up being treated as a
> tunnel header since the outer transport offset was overwritten.  If we
> had hardware that supported both UDP and GRE outer checksums it would
> cause a mess as the hardware would place the GRE checksum in the wrong
> spot.
>
> So if anything all we probably would need to do is treat the FOU or
> GUE stuff as a special case.  Basically if we end up having to do a
> GRO over a FOU or GUE instance maybe we need to knock the encap_mark
> back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
> because really it can just be treated as a UDP tunnel.  Jesse's code
> is still needed to catch the case where someone is trying to do
> something like IPIP over VXLAN or whatever but it does seem like there
> should be some wiggle room for FOU or GUE.
>
Nothing is needed other than to revert this patch. There is no problem
with GRO. You nor Jesse have not identified any tangible problem. If a
driver really does had an issue with doing GRE/UDP with GSO then it
can filter that in check_features (this is like 2 lines of code). But
that has not been proven to be a problem, and I would expect that
anyone who wants to fix that better run tests showing there is a
problem.

Tom

> - Alex

^ permalink raw reply

* Re: RESEND: Easily reproducible kernel panic due to netdev all_adj_list refcnt handling
From: Matthias Schiffer @ 2016-03-28 21:31 UTC (permalink / raw)
  To: Andrew Collins; +Cc: netdev
In-Reply-To: <56F5B746.9050401@cradlepoint.com>


[-- Attachment #1.1: Type: text/plain, Size: 1230 bytes --]

On 03/25/2016 11:10 PM, Andrew Collins wrote:
> On 03/25/2016 02:43 PM, Matthias Schiffer wrote:
>> We've tried your patch, and it changes the symptoms a bit, but doesn't fix
>> the panic. I've attached kernel logs of the crash both before and after
>> applying the patch.
>>
>> One note: I did not reproduce this issue myself, it was first reported in
>> [1], and then forwarded to the batman-adv issue tracker [2] by me.
>>
>> Regards,
>> Matthias
>>
>>
>> [1] https://github.com/freifunk-gluon/gluon/issues/680
>> [2] https://www.open-mesh.org/issues/247
> 
> On the off chance it helps, the version of the patch I integrated locally
> takes a somewhat different approach
> than the one I sent to the mailing list (propagates adj_list refcnts). 
> I've attached it in case it's useful.
> 
> I haven't submitted this upstream yet as it's still rather ugly.  I'm of
> the opinion that the whole "every device
> knows every upperdev and lowerdev in its tree" model is rather broken, and
> the patch is just working around
> a design that needs some rework.
> 
> Thanks,
> Andrew Collins

It's ugly, but it seems to help. No crashes so far with the new version of
your patch.

Thanks,
Matthias


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: Eric Dumazet @ 2016-03-28 21:29 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Jan Engelhardt, David Miller, sploving1, Pablo Neira Ayuso,
	Patrick McHardy, netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.10.1603282256120.25443@blackhole.kfki.hu>

On Mon, 2016-03-28 at 23:11 +0200, Jozsef Kadlecsik wrote:

> In net/netfilter/nf_conntrack_proto_tcp.c we copy the options into a 
> buffer with skb_header_pointer(), so it's not a false positive there and 
> the KASAN report referred to that part.
> 

Although the out of bound could be one extra byte,
if skb_header_bpointer() had to copy something (since it also might
return a pointer inside skb->head)

No arch would possibly fault here.

So reading one byte on the stack is fooling KASAN, but no ill effect
would actually happen.

If the read byte is < 2, the function would return because of

	 if (opsize < 2)
		return;

If the read byte is >= 2, the function would return because of
	if (opsize > length)
		return; /* don't parse partial options */


(Since we care here of the case where length == 1)

No big deal, it is probably better to 'fix' the code so that it pleases
dynamic checkers.

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: Jozsef Kadlecsik @ 2016-03-28 21:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Engelhardt, David Miller, sploving1, Pablo Neira Ayuso,
	Patrick McHardy, netfilter-devel, netdev
In-Reply-To: <1459197963.6473.125.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 28 Mar 2016, Eric Dumazet wrote:

> On Mon, 2016-03-28 at 22:20 +0200, Jan Engelhardt wrote:
> > On Monday 2016-03-28 21:29, David Miller wrote:
> > >>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
> > >>> > >               length--;
> > >>> > >               continue;
> > >>> > >           default:
> > >>> > > +            if (length < 2)
> > >>> > > +                return;
> > >>> > >               opsize = *ptr++;
> > >>> > >               if (opsize < 2) /* "silly options" */
> > >>> > >                   return;
> > >
> > >I'm trying to figure out how this can even matter.
> > >If we are in the loop, length is at least one.
> > >That means it is legal to read the opsize byte.
> > 
> > Is that because the skbuff is always padded to a multiple of (at
> > least) two? Maybe such padding is explicitly foregone when ASAN is in
> > place. After all, glibc, in userspace, is likely to do padding as
> > well for malloc, and yet, ASAN catches these cases.

There might be a TCP option combination, which is "properly" padded but 
broken, like (wscale, wscale-value, mss) where the mss-value is missing.

> We have at least 384 bytes of padding in skb->head (this is struct
> skb_shared_info).
> 
> Whatever garbage we might read, current code is fine.
> 
> We have to deal with a false positive here.

In net/netfilter/nf_conntrack_proto_tcp.c we copy the options into a 
buffer with skb_header_pointer(), so it's not a false positive there and 
the KASAN report referred to that part.

I thought it's valid for tcp_parse_options() too, but then I'm wrong so 
at least the part from the patch for tcp_input.c can be dropped.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply

* Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU
From: Eric Dumazet @ 2016-03-28 21:02 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, David S . Miller, netdev
In-Reply-To: <CALx6S35n=R4B0Mm0YpgobPfU9rC=wf+t2b2z=xne3MhCMLbW2g@mail.gmail.com>

On Fri, 2016-03-25 at 17:08 -0700, Tom Herbert wrote:
> On Fri, Mar 25, 2016 at 3:29 PM, Eric Dumazet <edumazet@google.com> wrote:

> > +/* Must be called under rcu_read_lock().
> 
> 
> It might be just as easy to do the rcu_read_lock() within the
> function. That way we don't need to require callers to do it now.
> 
> > + * Does increment socket refcount.
> > + */
> > +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
> > +    IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)
> >  struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
> >                              __be32 daddr, __be16 dport, int dif)
> >  {
> > -       return __udp4_lib_lookup(net, saddr, sport, daddr, dport, dif,
> > -                                &udp_table, NULL);
> > +       struct sock *sk;
> > +
> > +       sk = __udp4_lib_lookup(net, saddr, sport, daddr, dport,
> > +                              dif, &udp_table, NULL);
> > +       if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
> > +               sk = NULL;
> > +       return sk;
> >  }
> >  EXPORT_SYMBOL_GPL(udp4_lib_lookup);
> > +#endif

Well, these callers already run with rcu_read_lock(), I simply added a
comment to remind this assumption.

As I said, we might later avoid the refcounting if callers are modified
to not call sock_put(). This is why I prefer to maintain the reuirement
of rcu_read_lock() being held by callers anyway.

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: Eric Dumazet @ 2016-03-28 20:51 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: David Miller, kadlec, sploving1, pablo, kaber, netfilter-devel,
	netdev
In-Reply-To: <1459197963.6473.125.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 2016-03-28 at 13:46 -0700, Eric Dumazet wrote:

> We have at least 384 bytes of padding in skb->head (this is struct
> skb_shared_info).
> 
> Whatever garbage we might read, current code is fine.
> 
> We have to deal with a false positive here.

Very similar to the one fixed in 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=10ec9472f05b45c94db3c854d22581a20b97db41

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: Eric Dumazet @ 2016-03-28 20:46 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: David Miller, kadlec, sploving1, pablo, kaber, netfilter-devel,
	netdev
In-Reply-To: <alpine.LSU.2.20.1603282214130.16241@nerf40.vanv.qr>

On Mon, 2016-03-28 at 22:20 +0200, Jan Engelhardt wrote:
> On Monday 2016-03-28 21:29, David Miller wrote:
> >>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
> >>> > >               length--;
> >>> > >               continue;
> >>> > >           default:
> >>> > > +            if (length < 2)
> >>> > > +                return;
> >>> > >               opsize = *ptr++;
> >>> > >               if (opsize < 2) /* "silly options" */
> >>> > >                   return;
> >
> >I'm trying to figure out how this can even matter.
> >If we are in the loop, length is at least one.
> >That means it is legal to read the opsize byte.
> 
> Is that because the skbuff is always padded to a multiple of (at
> least) two? Maybe such padding is explicitly foregone when ASAN is in
> place. After all, glibc, in userspace, is likely to do padding as
> well for malloc, and yet, ASAN catches these cases.

We have at least 384 bytes of padding in skb->head (this is struct
skb_shared_info).

Whatever garbage we might read, current code is fine.

We have to deal with a false positive here.

^ permalink raw reply

* [PATCH] ethernet: mvneta: Support netpoll
From: Ezequiel Garcia @ 2016-03-28 20:41 UTC (permalink / raw)
  To: Thomas Petazzoni; +Cc: netdev, Ezequiel Garcia

This commit adds the support for netpoll, which is used
to implement netconsole.

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
---
Tested on Armada 370 Mirabox and Armada XP Openblocks AX3-4
with netconsole.

 drivers/net/ethernet/marvell/mvneta.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 577f7ca7deba..dd114303c98f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3813,6 +3813,24 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
 	return 0;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void mvneta_percpu_poll_controller(void *arg)
+{
+	struct mvneta_port *pp = arg;
+	struct mvneta_pcpu_port *pcpu_port =
+		this_cpu_ptr(pp->ports);
+	mvneta_isr(pp->dev->irq, pcpu_port);
+}
+
+/* Polled functionality used by netconsole and others in non interrupt mode */
+static void mvneta_poll_controller(struct net_device *dev)
+{
+	struct mvneta_port *pp = netdev_priv(dev);
+
+	on_each_cpu(mvneta_percpu_poll_controller, pp, false);
+}
+#endif
+
 static const struct net_device_ops mvneta_netdev_ops = {
 	.ndo_open            = mvneta_open,
 	.ndo_stop            = mvneta_stop,
@@ -3823,6 +3841,9 @@ static const struct net_device_ops mvneta_netdev_ops = {
 	.ndo_fix_features    = mvneta_fix_features,
 	.ndo_get_stats64     = mvneta_get_stats64,
 	.ndo_do_ioctl        = mvneta_ioctl,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	.ndo_poll_controller = mvneta_poll_controller,
+#endif
 };
 
 const struct ethtool_ops mvneta_eth_tool_ops = {
-- 
2.7.0

^ permalink raw reply related

* [PATCH net,stable] qmi_wwan: add "D-Link DWM-221 B1" device id
From: Bjørn Mork @ 2016-03-28 20:38 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: Thomas Schäfer, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Bjørn Mork

Thomas reports:
"Windows:

00 diagnostics
01 modem
02 at-port
03 nmea
04 nic

Linux:

T:  Bus=02 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=2001 ProdID=7e19 Rev=02.32
S:  Manufacturer=Mobile Connect
S:  Product=Mobile Connect
S:  SerialNumber=0123456789ABCDEF
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage"

Reported-by: Thomas Schäfer <tschaefer-zqRNUXuvxA0b1SvskN2V4Q@public.gmane.org>
Signed-off-by: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 7d717c66bcb0..9d1fce8a6e84 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -844,6 +844,7 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x19d2, 0x1426, 2)},	/* ZTE MF91 */
 	{QMI_FIXED_INTF(0x19d2, 0x1428, 2)},	/* Telewell TW-LTE 4G v2 */
 	{QMI_FIXED_INTF(0x19d2, 0x2002, 4)},	/* ZTE (Vodafone) K3765-Z */
+	{QMI_FIXED_INTF(0x2001, 0x7e19, 4)},	/* D-Link DWM-221 B1 */
 	{QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)},    /* Sierra Wireless MC7700 */
 	{QMI_FIXED_INTF(0x114f, 0x68a2, 8)},    /* Sierra Wireless MC7750 */
 	{QMI_FIXED_INTF(0x1199, 0x68a2, 8)},	/* Sierra Wireless MC7710 in QMI mode */
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
From: Alexander Duyck @ 2016-03-28 20:34 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Kernel Network Developers
In-Reply-To: <CALx6S37eOhZ+9rS8vk6FG9qyxpZ_1i5-hw4d4rnFdvEnBde3kg@mail.gmail.com>

On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>
>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>
>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>
>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>> that would cause problems.
>>>>>>>>
>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>
>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>> but *not* check for it.
>>>>>>
>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>
>>>>>
>>>>> Kernel software offload does support this combination just fine.
>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>> HW implementations you're referring to. The limitations of these
>>>>> drivers should not dictate how we build the software-- it needs to
>>>>> work the other way around.
>>>>
>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>> thing currently as the transmit side doesn't generate these kind of
>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>> really need to restructure the GSO code in order to handle it.  As far
>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>> way we could possibly handle it would be in software since what you
>>>> are indicating is a indeterminate number of headers that all require
>>>> updates.
>>>>
>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>> since the assumption is that the stack will never try to do this.
>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>> think it would be better to just advertise it that way on transmit
>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>
>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>> we should be increasing generality and capabilities of the kernel not
>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>> this flies then we can offload any combination of encapsulations with
>>>>> out protocol specific information.
>>>>
>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>> headers.  Currently we have the pointers for the inner headers and the
>>>> outer headers.  If you are talking about adding yet another level we
>>>> would need additional pointers in the skbuff to handle them and we
>>>> currently don't have them at present.
>>>>
>>>>>> The change that you are suggesting would result in packets generated
>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>> might corrupt traffic.
>>>>>
>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>> generic segmentation offload interface. But in the interim, it is
>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>> for their benefit is definitely not right.
>>>>
>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>> can handle it.  The problem as it stands right now is that we start
>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>> the standard mac_header, network_header, transport_header, and then
>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>> deep we start losing data.
>>>>
>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>> encapsulation for over a year now. We never had to add anything to
>>> skbuff to make that work. If "losing data" is a problem please provide
>>> the *reproducible* test case for that and we'll debug that.
>>
>> I'm guessing most of your examples involve either a remote checksum
>> being enabled or using NICs that don't support any tunnel offloads?
>> Hardware needs to be able to identify where headers are in order to
>> perform most of their offloads for TSO.  We either have to parse to
>> find them or we are provided with them by the stack.  GSO can work
>> around it as long as we don't stack checksum based offload with
>> non-checksum of the same type.
>>
>> mostIf for example you were to try and send a frame that had either an
>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>> probably screw it up.
>
> Please be more precise. Obviously we're only talking about the few
> NICs that even support UDP tunnel offload so it's not most NICs. Also,
> there is a big difference between "probably" and "definitely"; no one
> has provided a single data point that that there is even an issue. For
> instance, looking a i40e I suspect this will work with GRE/UDP since
> the driver already deals with offsets and shouldn't care about
> intermediate levels of encapsulation.

I'm sorry I cannot be super precise as I hadn't looked that closely at
the FOU or GUE code before.  Honestly I suspect most people probably
haven't.

So from what I can tell FOU and GUE isn't really even necessarily
doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
So in the case of IPIP and SIT for instance the only real difference
between VXLAN and those tunnels with fou is that you don't have the
VXLAN or inner Ethernet headers.  So really you still only have two
levels of headers.  Even in the case of GRE you have that set after
the outer UDP header so in that case GRE ends up being treated as a
tunnel header since the outer transport offset was overwritten.  If we
had hardware that supported both UDP and GRE outer checksums it would
cause a mess as the hardware would place the GRE checksum in the wrong
spot.

So if anything all we probably would need to do is treat the FOU or
GUE stuff as a special case.  Basically if we end up having to do a
GRO over a FOU or GUE instance maybe we need to knock the encap_mark
back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
because really it can just be treated as a UDP tunnel.  Jesse's code
is still needed to catch the case where someone is trying to do
something like IPIP over VXLAN or whatever but it does seem like there
should be some wiggle room for FOU or GUE.

- Alex

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: Jan Engelhardt @ 2016-03-28 20:20 UTC (permalink / raw)
  To: David Miller; +Cc: kadlec, sploving1, pablo, kaber, netfilter-devel, netdev
In-Reply-To: <20160328.152959.1109841569654352040.davem@davemloft.net>


On Monday 2016-03-28 21:29, David Miller wrote:
>>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
>>> > >               length--;
>>> > >               continue;
>>> > >           default:
>>> > > +            if (length < 2)
>>> > > +                return;
>>> > >               opsize = *ptr++;
>>> > >               if (opsize < 2) /* "silly options" */
>>> > >                   return;
>
>I'm trying to figure out how this can even matter.
>If we are in the loop, length is at least one.
>That means it is legal to read the opsize byte.

Is that because the skbuff is always padded to a multiple of (at
least) two? Maybe such padding is explicitly foregone when ASAN is in
place. After all, glibc, in userspace, is likely to do padding as
well for malloc, and yet, ASAN catches these cases.

^ permalink raw reply

* Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU
From: Rick Jones @ 2016-03-28 20:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, David S . Miller, netdev, Tom Herbert
In-Reply-To: <1459195293.6473.118.camel@edumazet-glaptop3.roam.corp.google.com>

On 03/28/2016 01:01 PM, Eric Dumazet wrote:
> Note : file structures got RCU freeing back in 2.6.14, and I do not
> think named users ever complained about added cost ;)

Couldn't see the tree for the forest I guess :)

rick

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: Eric Dumazet @ 2016-03-28 20:07 UTC (permalink / raw)
  To: David Miller; +Cc: kadlec, sploving1, pablo, kaber, netfilter-devel, netdev
In-Reply-To: <20160328.152959.1109841569654352040.davem@davemloft.net>

On Mon, 2016-03-28 at 15:29 -0400, David Miller wrote:
> From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Date: Mon, 28 Mar 2016 18:48:51 +0200 (CEST)
> 
> >> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
> >> > >               length--;
> >> > >               continue;
> >> > >           default:
> >> > > +            if (length < 2)
> >> > > +                return;
> >> > >               opsize = *ptr++;
> >> > >               if (opsize < 2) /* "silly options" */
> >> > >                   return;
> 
> I'm trying to figure out how this can even matter.
> 
> If we are in the loop, length is at least one.
> 
> That means it is legal to read the opsize byte.
> 
> By the next check, opsize is at least 2.
> 
> And then the very next line in this code makes sure length >= opsize:
> 
> 			if (opsize > length)
> 				return;	/* don't parse partial options */
> 
> Therefore no out-of-range access is possible as far as I can see.

Maybe use kasan_disable_current() and kasan_enable_current() to silence
kasan ?

Oh wait, these are not BH safe.





^ permalink raw reply

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
From: Tom Herbert @ 2016-03-28 20:03 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesse Gross, David Miller, Linux Kernel Network Developers
In-Reply-To: <CAKgT0Uf-mPK3xQuR+BdB5owfuJdQkZPE6Z0MQs9R-y3NRrC_9g@mail.gmail.com>

On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>
>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>
>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>
>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>> that would cause problems.
>>>>>>>
>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>> driver can use ndo_features_check to validate.
>>>>>>
>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>> In this case the GSO flags don't provide enough information to
>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>> but *not* check for it.
>>>>>
>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>
>>>>
>>>> Kernel software offload does support this combination just fine.
>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>> HW implementations you're referring to. The limitations of these
>>>> drivers should not dictate how we build the software-- it needs to
>>>> work the other way around.
>>>
>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>> thing currently as the transmit side doesn't generate these kind of
>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>> really need to restructure the GSO code in order to handle it.  As far
>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>> wonder if we need to add yet another GSO bit to indicate that we
>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>> way we could possibly handle it would be in software since what you
>>> are indicating is a indeterminate number of headers that all require
>>> updates.
>>>
>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>> since the assumption is that the stack will never try to do this.
>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>> think it would be better to just advertise it that way on transmit
>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>
>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>> we should be increasing generality and capabilities of the kernel not
>>>> holding it down with artificial limits. This is why the generic TSO
>>>> work that Alexander and Edward are looking at is so important-- if
>>>> this flies then we can offload any combination of encapsulations with
>>>> out protocol specific information.
>>>
>>> The segmentation code isn't designed to handle more than 2 layers of
>>> headers.  Currently we have the pointers for the inner headers and the
>>> outer headers.  If you are talking about adding yet another level we
>>> would need additional pointers in the skbuff to handle them and we
>>> currently don't have them at present.
>>>
>>>>> The change that you are suggesting would result in packets generated
>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>> optimization and correctness must come first so we cannot use it if it
>>>>> might corrupt traffic.
>>>>
>>>> That's (a few) drivers problem. It's no different than if they had
>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>> generic segmentation offload interface. But in the interim, it is
>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>> for their benefit is definitely not right.
>>>
>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>> can handle it.  The problem as it stands right now is that we start
>>> losing data once we go past 1 level of encapsulation.  We only have
>>> the standard mac_header, network_header, transport_header, and then
>>> the inner set of the same pointers.  If we try to go multiple levels
>>> deep we start losing data.
>>>
>> Huh? GUE/FOU has been running perfectly well with two levels of
>> encapsulation for over a year now. We never had to add anything to
>> skbuff to make that work. If "losing data" is a problem please provide
>> the *reproducible* test case for that and we'll debug that.
>
> I'm guessing most of your examples involve either a remote checksum
> being enabled or using NICs that don't support any tunnel offloads?
> Hardware needs to be able to identify where headers are in order to
> perform most of their offloads for TSO.  We either have to parse to
> find them or we are provided with them by the stack.  GSO can work
> around it as long as we don't stack checksum based offload with
> non-checksum of the same type.
>
> mostIf for example you were to try and send a frame that had either an
> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
> probably screw it up.

Please be more precise. Obviously we're only talking about the few
NICs that even support UDP tunnel offload so it's not most NICs. Also,
there is a big difference between "probably" and "definitely"; no one
has provided a single data point that that there is even an issue. For
instance, looking a i40e I suspect this will work with GRE/UDP since
the driver already deals with offsets and shouldn't care about
intermediate levels of encapsulation.

> Up until now that hasn't been an issue.  As we
> start turning on offload support for multiple tunnel types thanks to
> the partial offloads it will come back and bite us if we try to tell
> hardware to handle more than 2 levels of headers.  I'm thinking if
> nothing else we might have to add yet another bit to GSO for stacked
> tunnels which can probably only be supported via partial GSO and only
> if we can get away with just replicating headers.
>
> It looks like we need to go through and probably clean up both the GSO
> and GRO code.  First we have to get the GSO code setup so that it can
> fully handle multiple levels of tunnels.  The code as it is now
> assumes you would only have one level and we are configuring the
> headers as such.  In addition the GRO code doesn't seem to place the
> header offsets correctly.  For instance, from what I can tell it looks
> like the inner transport offset is never updated.  We will probably
> need to have pointers for the inner-most and outer-most set of headers
> and from there we can start handling the offloads.
>
> - Alex

^ permalink raw reply

* Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU
From: Eric Dumazet @ 2016-03-28 20:01 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, David S . Miller, netdev, Tom Herbert
In-Reply-To: <56F981D7.2050801@hpe.com>

On Mon, 2016-03-28 at 12:11 -0700, Rick Jones wrote:

> I was under the impression that individual DNS queries were supposed to 
> have not only random DNS query IDs but also originate from random UDP 
> source ports.  https://tools.ietf.org/html/rfc5452 4.5 at least touches 
> on the topic but I don't see it making it hard and fast.  By section 10 
> though it is more explicit:
> 
>     This document recommends the use of UDP source port number
>     randomization to extend the effective DNS transaction ID beyond the
>     available 16 bits.
> 
> That being the case, if indeed there were to be 30000-odd concurrent 
> requests outstanding "upstream" from that location there'd have to be 
> 30000 ephemeral ports in play.

Sure, so these heavy duty programs have to consider the randomness
implications already, when picking a port in their pool.

Using the kernel and force socket()/bind()/close() in the hope to get
random values is safe, but never had been optimal for performance.

Note : file structures got RCU freeing back in 2.6.14, and I do not
think named users ever complained about added cost ;)

^ permalink raw reply

* Re: [PATCH 0/9] Netfilter fixes for net
From: David Miller @ 2016-03-28 19:43 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1459187882-5357-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon, 28 Mar 2016 19:57:53 +0200

> The following patchset contains Netfilter fixes for you net tree,
> they are:
 ...
> This batch comes with four patches to validate x_tables blobs coming
> from userspace. CONFIG_USERNS exposes the x_tables interface to
> unpriviledged users and to be honest this interface never received the
> attention for this move away from the CAP_NET_ADMIN domain. Florian is
> working on another round with more patches with more sanity checks, so
> expect a bit more Netfilter fixes in this development cycle than usual.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Looks good, pulled, thanks Pablo!

^ permalink raw reply

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
From: David Miller @ 2016-03-28 19:33 UTC (permalink / raw)
  To: jesse; +Cc: tom, netdev
In-Reply-To: <CAEh+42hbYUMK4WMZYDtzfynd1XFb7OxKkmdPOdT1Z7=n-kw4sA@mail.gmail.com>

From: Jesse Gross <jesse@kernel.org>
Date: Sun, 27 Mar 2016 21:38:34 -0700

> Generally speaking, multiple levels of encapsulation offload are not
> valid.

That's very much not true Jesse.

Please fix the regression you introduced or I will have no choice but to
revert your entire set of changes.

^ permalink raw reply

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
From: Alexander Duyck @ 2016-03-28 19:31 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Kernel Network Developers
In-Reply-To: <CALx6S37RPHwsWJOpNJ+8t1bBbJJ=5PKNwFUrh-yhwwXN63RFOA@mail.gmail.com>

On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>> more IP length fields and they are unaware of this.
>>>>>>
>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>
>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>
>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>> that would cause problems.
>>>>>>
>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>> driver can use ndo_features_check to validate.
>>>>>
>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>> could have is the potential for GRO to construct a packet which is a
>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>> In this case the GSO flags don't provide enough information to
>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>> but *not* check for it.
>>>>
>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>> valid. I think this is pretty clear from the fact that none of the
>>>> encapsulation drivers expose support for encapsulation offloads on
>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>
>>>
>>> Kernel software offload does support this combination just fine.
>>> Seriously, I've tested that more than a thousand times. This is a few
>>> HW implementations you're referring to. The limitations of these
>>> drivers should not dictate how we build the software-- it needs to
>>> work the other way around.
>>
>> Jesse does have a point.  Drivers aren't checking for this kind of
>> thing currently as the transmit side doesn't generate these kind of
>> frames.  The fact that GRO is makes things a bit more messy as we will
>> really need to restructure the GSO code in order to handle it.  As far
>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>> wonder if we need to add yet another GSO bit to indicate that we
>> support multiple layers of encapsulation.  I'm pretty sure the only
>> way we could possibly handle it would be in software since what you
>> are indicating is a indeterminate number of headers that all require
>> updates.
>>
>>>> Asking drivers to assume that this combination of flags means FOU
>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>> since the assumption is that the stack will never try to do this.
>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>> think it would be better to just advertise it that way on transmit
>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>
>>> If it's not FOU it will be something else. Arbitrarily declaring
>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>> we should be increasing generality and capabilities of the kernel not
>>> holding it down with artificial limits. This is why the generic TSO
>>> work that Alexander and Edward are looking at is so important-- if
>>> this flies then we can offload any combination of encapsulations with
>>> out protocol specific information.
>>
>> The segmentation code isn't designed to handle more than 2 layers of
>> headers.  Currently we have the pointers for the inner headers and the
>> outer headers.  If you are talking about adding yet another level we
>> would need additional pointers in the skbuff to handle them and we
>> currently don't have them at present.
>>
>>>> The change that you are suggesting would result in packets generated
>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>> would likely end up being dropped or malformed. GRO is just an
>>>> optimization and correctness must come first so we cannot use it if it
>>>> might corrupt traffic.
>>>
>>> That's (a few) drivers problem. It's no different than if they had
>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>> the long term solution is to eliminate the GSO_ flags and use a
>>> generic segmentation offload interface. But in the interim, it is
>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>> and the interfaces to do that exist. Restricting the capabilities of
>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>> for their benefit is definitely not right.
>>
>> This isn't about if drivers can handle it.  It is about if the skbuff
>> can handle it.  The problem as it stands right now is that we start
>> losing data once we go past 1 level of encapsulation.  We only have
>> the standard mac_header, network_header, transport_header, and then
>> the inner set of the same pointers.  If we try to go multiple levels
>> deep we start losing data.
>>
> Huh? GUE/FOU has been running perfectly well with two levels of
> encapsulation for over a year now. We never had to add anything to
> skbuff to make that work. If "losing data" is a problem please provide
> the *reproducible* test case for that and we'll debug that.

I'm guessing most of your examples involve either a remote checksum
being enabled or using NICs that don't support any tunnel offloads?
Hardware needs to be able to identify where headers are in order to
perform most of their offloads for TSO.  We either have to parse to
find them or we are provided with them by the stack.  GSO can work
around it as long as we don't stack checksum based offload with
non-checksum of the same type.

mostIf for example you were to try and send a frame that had either an
inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
probably screw it up.  Up until now that hasn't been an issue.  As we
start turning on offload support for multiple tunnel types thanks to
the partial offloads it will come back and bite us if we try to tell
hardware to handle more than 2 levels of headers.  I'm thinking if
nothing else we might have to add yet another bit to GSO for stacked
tunnels which can probably only be supported via partial GSO and only
if we can get away with just replicating headers.

It looks like we need to go through and probably clean up both the GSO
and GRO code.  First we have to get the GSO code setup so that it can
fully handle multiple levels of tunnels.  The code as it is now
assumes you would only have one level and we are configuring the
headers as such.  In addition the GRO code doesn't seem to place the
header offsets correctly.  For instance, from what I can tell it looks
like the inner transport offset is never updated.  We will probably
need to have pointers for the inner-most and outer-most set of headers
and from there we can start handling the offloads.

- Alex

^ permalink raw reply

* Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet
From: David Miller @ 2016-03-28 19:29 UTC (permalink / raw)
  To: kadlec; +Cc: sploving1, pablo, kaber, netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.10.1603281844200.23985@blackhole.kfki.hu>

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date: Mon, 28 Mar 2016 18:48:51 +0200 (CEST)

>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
>> > >               length--;
>> > >               continue;
>> > >           default:
>> > > +            if (length < 2)
>> > > +                return;
>> > >               opsize = *ptr++;
>> > >               if (opsize < 2) /* "silly options" */
>> > >                   return;

I'm trying to figure out how this can even matter.

If we are in the loop, length is at least one.

That means it is legal to read the opsize byte.

By the next check, opsize is at least 2.

And then the very next line in this code makes sure length >= opsize:

			if (opsize > length)
				return;	/* don't parse partial options */

Therefore no out-of-range access is possible as far as I can see.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox