From: David Ahern <dsahern@gmail.com>
To: Eric Dumazet <edumazet@google.com>, Xin Long <lucien.xin@gmail.com>
Cc: network dev <netdev@vger.kernel.org>,
	davem@davemloft.net, kuba@kernel.org,
	Paolo Abeni <pabeni@redhat.com>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Pravin B Shelar <pshelar@ovn.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	Ilya Maximets <i.maximets@ovn.org>,
	Aaron Conole <aconole@redhat.com>,
	Roopa Prabhu <roopa@nvidia.com>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Mahesh Bandewar <maheshb@google.com>,
	Paul Moore <paul@paul-moore.com>,
	Guillaume Nault <gnault@redhat.com>
Subject: Re: [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in length_mt6
Date: Mon, 16 Jan 2023 08:07:58 -0700
Message-ID: <b73e2dd1-d7bc-e96b-8553-1536a1146f3c@gmail.com>
In-Reply-To: <CANn89iLtF3dNcMkMGagCSfb+p5zA3Fa-DV9f9xMHHU_TX2CvSw@mail.gmail.com>

On 1/16/23 2:24 AM, Eric Dumazet wrote:
> On Sun, Jan 15, 2023 at 9:15 PM Xin Long <lucien.xin@gmail.com> wrote:
>>
>> On Sun, Jan 15, 2023 at 2:40 PM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Sun, Jan 15, 2023 at 6:43 PM Xin Long <lucien.xin@gmail.com> wrote:
>>>>
>>>> On Sun, Jan 15, 2023 at 10:41 AM David Ahern <dsahern@gmail.com> wrote:
>>>>>
>>>>> On 1/13/23 8:31 PM, Xin Long wrote:
>>>>>> For IPv6 jumbogram packets, the packet size is bigger than 65535,
>>>>>> so it's not right to get it from payload_len and save it to a u16
>>>>>> variable.
>>>>>>
>>>>>> This patch only fixes it for IPv6 BIG TCP packets, so instead of
>>>>>> parsing IPV6_TLV_JUMBO exthdr, which is quite some work, it only
>>>>>> gets the pktlen via 'skb->len - skb_network_offset(skb)' when
>>>>>> skb_is_gso_v6() and saves it to an u32 variable, similar to IPv4
>>>>>> BIG TCP packets.
>>>>>>
>>>>>> This fix will also help us add selftest for IPv6 BIG TCP in the
>>>>>> following patch.
>>>>>>
>>>>>
>>>>> If this is a bug fix for the existing IPv6 support, send it outside of
>>>>> this set for -net.
>>>>>
>>>> Sure,
>>>> I was thinking of adding it here to be able to support selftest for
>>>> IPv6 too in the next patch. But it seems to make more sense to
>>>> get it into -net first, then add this selftest after it goes to net-next.
>>>>
>>>> I will post it and all other fixes I mentioned in the cover-letter for
>>>> IPv6 BIG TCP for -net.
>>>>
>>>> But before that, I hope Eric can confirm it is okay to read the length
>>>> of IPv6 BIG TCP packets with skb_ipv6_totlen() defined in this patch,
>>>> instead of parsing JUMBO exthdr?
>>>>
>>>
>>> I do not think it is ok, but I will leave the question to netfilter maintainers.
>> Just note that the issue doesn't only exist in netfilter.
>> All the changes in Patch 2-7 from this patchset are also needed for IPv6
>> BIG TCP packets.
>>
>>>
>>> Guessing things in tcpdump or other tools is up to user space implementations,
>>> trying to work around some (kernel ?) deficiencies.
>>>
>>> Yes, IPv6 extensions headers are a pain, we all agree.
>>>
>>> Look at how ip6_rcv_core() properly dissects extension headers _and_ trim
>>> skb accordingly (pskb_trim_rcsum() called either from ip6_rcv_core()
>>> or ipv6_hop_jumbo())
>>>
>>> So skb->len is not the root of trust. Some transport mediums might add paddings.
>>>
>>> IPv4 has similar logic in ip_rcv_core().
>>>
>>> len = ntohs(iph->tot_len);
>>> if (skb->len < len) {
>>>         drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL;
>>>         __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS);
>>>         goto drop;
>>> } else if (len < (iph->ihl*4))
>>>         goto inhdr_error;
>>>
>>> /* Our transport medium may have padded the buffer out. Now we know it
>>>  * is IP we can trim to the true length of the frame.
>>>  * Note this now means skb->len holds ntohs(iph->tot_len).
>>>  */
>>> if (pskb_trim_rcsum(skb, len)) {
>>>         __IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
>>>         goto drop;
>>> }
>>>
>>> After your changes, we might accept illegal packets that were properly
>>> dropped before.
>> I think skb->len is trustable for GSO/GRO packets.
>> In ipv6_gro_complete/inet_gro_complete():
>> The new length for payload_len or iph->tot_len are all calculated from skb->len.
>> As I said in the cover-letter, "there is no padding in GSO/GRO packets".
>> Or am I missing something?
> 
> This seems to be a contract violation with user space providing GSO packets.
> 
> In our changes we added some sanity checks, inherent to JUMBO specs.
> 
> Here, a GSO packet can now have a zero ip length, no matter if it is
> BIG TCP or not.

Meaning your preference is to set tot_len anytime it is <= 64kB so the
only time tot_len == 0 is for large GRO/TSO packets? That is doable.
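
To make that rule concrete, here is a minimal user-space sketch, not
the helpers from this series. struct pkt, pkt_set_totlen() and
pkt_totlen() are hypothetical names made up for illustration, and the
header field is kept in host byte order to keep the example short. The
writer stores the real length whenever it fits in 16 bits, and the
reader only falls back to the buffer length when the field is 0 on a
GSO packet.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MAX_LEN 0xFFFFu /* largest value a 16-bit tot_len can hold */

/* Hypothetical stand-in for the few skb properties the rule depends on. */
struct pkt {
	uint32_t len;        /* bytes in the buffer, like skb->len */
	uint32_t net_offset; /* offset of the IP header in the buffer */
	bool     is_gso;     /* true for an aggregated GRO/TSO packet */
	uint16_t tot_len;    /* header length field (host order here) */
};

/* Writer: keep the real length when it fits, use 0 only when it cannot. */
static void pkt_set_totlen(struct pkt *p, uint32_t l3_len)
{
	p->tot_len = l3_len <= IP_MAX_LEN ? (uint16_t)l3_len : 0;
}

/* Reader: trust the header field unless it is 0 on a GSO packet. */
static uint32_t pkt_totlen(const struct pkt *p)
{
	if (p->tot_len || !p->is_gso)
		return p->tot_len;
	return p->len - p->net_offset;
}
```

With this split, a GSO packet of 64kB or less still carries a non-zero
length that a parser can validate against the buffer, and only the
oversized case relies on the buffer length.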

> 
> It seems we lower the bar for consistency, and allow bugs (say
> changing skb->len) to not be detected.

Not sure why you think it would not be detected. Today's model for GRO
sets tot_len based on skb->len. There is an inherent trust that the
users of the GRO API set the length correctly. If they do not, the
payload delivered to userspace would ultimately be nonsense and hence
detectable. I tend to use ssh to test changes like this for this
reason: the L4 payload must make sense.

For the Tx path, there is a similar line of trust that the skb->len
passed to the L3 layer is correct. IPv4 and IPv6 blindly trust what they
are told for length.


> 
> As you said, user space sniffing packets now have to guess what is the
> intent, instead of headers carrying all the needed information
> that can be fully validated by parsers.

This is a solvable problem within the packet socket API, and the entire
thing is opt-in. If a user's tcpdump / packet capture program is out of
date and does not support the new API for large packets, then that user
does not have to enable large GRO/TSO.

