Re: [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: davem@davemloft.net, netdev@vger.kernel.org, bpf@vger.kernel.org,
	cake@lists.bufferbloat.net, Davide Caratti <dcaratti@redhat.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Roman Mashak <mrv@mojatatu.com>, Lawrence Brakmo <brakmo@fb.com>,
	Ilya Ponetayev <i.ponetaev@ndmsystems.com>,
	kafai@fb.com, alexei.starovoitov@gmail.com, edumazet@google.com
Subject: Re: [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs
Date: Sat, 04 Jul 2020 13:28:40 +0200	[thread overview]
Message-ID: <87blkvmsd3.fsf@toke.dk> (raw)
In-Reply-To: <003ff65d-fc24-cd25-9e46-95e7ca2aa31f@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 7/3/20 10:26 PM, Toke Høiland-Jørgensen wrote:
>> There are a couple of places in net/sched/ that check skb->protocol and act
>> on the value there. However, in the presence of VLAN tags, the value stored
>> in skb->protocol can be inconsistent based on whether VLAN acceleration is
>> enabled. The commit quoted in the Fixes tag below fixed the users of
>> skb->protocol to use a helper that will always see the VLAN ethertype.
>> 
>> However, most of the callers don't actually handle the VLAN ethertype, but
>> expect to find the IP header type in the protocol field. This means that
>> things like changing the ECN field, or parsing diffserv values, stops
>> working if there's a VLAN tag, or if there are multiple nested VLAN
>> tags (QinQ).
>> 
>> To fix this, change the helper to take an argument that indicates whether
>> the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
>> make sure to skip all of them, so behaviour is consistent even in QinQ
>> mode.
>> 
>> To make the helper usable from the ECN code, move it to if_vlan.h instead
>> of pkt_sched.h.
>> 
>> v3:
>> - Remove empty lines
>> - Move vlan variable definitions inside loop in skb_protocol()
>> - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
>>    bpf_skb_ecn_set_ce()
>> 
>> v2:
>> - Use eth_type_vlan() helper in skb_protocol()
>> - Also fix code that reads skb->protocol directly
>> - Change a couple of 'if/else if' statements to switch constructs to avoid
>>    calling the helper twice
>> 
>> Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
>> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path")
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>   include/linux/if_vlan.h  | 28 ++++++++++++++++++++++++++++
>>   include/net/inet_ecn.h   | 25 +++++++++++++++++--------
>>   include/net/pkt_sched.h  | 11 -----------
>>   net/core/filter.c        | 10 +++++++---
>>   net/sched/act_connmark.c |  9 ++++++---
>>   net/sched/act_csum.c     |  2 +-
>>   net/sched/act_ct.c       |  9 ++++-----
>>   net/sched/act_ctinfo.c   |  9 ++++++---
>>   net/sched/act_mpls.c     |  2 +-
>>   net/sched/act_skbedit.c  |  2 +-
>>   net/sched/cls_api.c      |  2 +-
>>   net/sched/cls_flow.c     |  8 ++++----
>>   net/sched/cls_flower.c   |  2 +-
>>   net/sched/em_ipset.c     |  2 +-
>>   net/sched/em_ipt.c       |  2 +-
>>   net/sched/em_meta.c      |  2 +-
>>   net/sched/sch_cake.c     |  4 ++--
>>   net/sched/sch_dsmark.c   |  6 +++---
>>   net/sched/sch_teql.c     |  2 +-
>>   19 files changed, 86 insertions(+), 51 deletions(-)
>> 
>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>> index b05e855f1ddd..427a5b8597c2 100644
>> --- a/include/linux/if_vlan.h
>> +++ b/include/linux/if_vlan.h
>> @@ -308,6 +308,34 @@ static inline bool eth_type_vlan(__be16 ethertype)
>>   	}
>>   }
>>   
>> +/* A getter for the SKB protocol field which will handle VLAN tags consistently
>> + * whether VLAN acceleration is enabled or not.
>> + */
>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
>> +{
>> +	unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>> +	__be16 proto = skb->protocol;
>> +
>> +	if (!skip_vlan)
>> +		/* VLAN acceleration strips the VLAN header from the skb and
>> +		 * moves it to skb->vlan_proto
>> +		 */
>> +		return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
>> +
>> +	while (eth_type_vlan(proto)) {
>> +		struct vlan_hdr vhdr, *vh;
>> +
>> +		vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
>> +		if (!vh)
>> +			break;
>> +
>> +		proto = vh->h_vlan_encapsulated_proto;
>> +		offset += sizeof(vhdr);
>> +	}
>
> Hm, why is the while loop 'unbounded'? Does it even make sense to have
> a packet with hundreds of vlan hdrs in there what you'd end up
> walking? What if an attacker crafts a max sized packet with only
> vlan_hdr forcing exorbitant looping in fast-path here (e.g. via
> af_packet)?

Hmm, I guess you're right that could theoretically happen. But on the
other hand, a lot of drivers seem to be cheerfully calling
vlan_get_protocol() on incoming packets, which doesn't have a limit on
the depth either.

I guess I could add a depth limit, but in that case I suppose that
should also be added to vlan_get_protocol() (or the two should be
consolidated). WDYT?

> Did you validate that skb_mac_offset() is always valid for the
> call-sites you converted? (We have a skb_mac_header_was_set() test to
> probe for whether skb->mac_header is set to ~0.)

Not extensively; I kinda assumed it would always be valid at those call
sites, since the callers go on to call ip_hdr() or something similar
right afterwards.

I guess Toshiaki's suggestion to use vlan_get_protocol() could be a way
around this, as that seems to deal with skb->mac_len being 0.

-Toke

next prev parent reply	other threads:[~2020-07-04 11:28 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-03 20:26 [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs Toke Høiland-Jørgensen
2020-07-03 21:35 ` David Miller
2020-07-03 22:17 ` Daniel Borkmann
2020-07-04 11:28   ` Toke Høiland-Jørgensen [this message]
2020-07-04  3:24 ` Toshiaki Makita
2020-07-04 11:33   ` Toke Høiland-Jørgensen
2020-07-06  4:24     ` Toshiaki Makita
2020-07-06 10:53       ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87blkvmsd3.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=brakmo@fb.com \
    --cc=cake@lists.bufferbloat.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dcaratti@redhat.com \
    --cc=edumazet@google.com \
    --cc=i.ponetaev@ndmsystems.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kafai@fb.com \
    --cc=mrv@mojatatu.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.