From: David Ahern <dsahern@gmail.com>
To: Eric Dumazet <edumazet@google.com>, Xin Long <lucien.xin@gmail.com>
Cc: network dev <netdev@vger.kernel.org>,
davem@davemloft.net, kuba@kernel.org,
Paolo Abeni <pabeni@redhat.com>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Pravin B Shelar <pshelar@ovn.org>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
Pablo Neira Ayuso <pablo@netfilter.org>,
Florian Westphal <fw@strlen.de>,
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
Ilya Maximets <i.maximets@ovn.org>,
Aaron Conole <aconole@redhat.com>,
Roopa Prabhu <roopa@nvidia.com>,
Nikolay Aleksandrov <razor@blackwall.org>,
Mahesh Bandewar <maheshb@google.com>,
Paul Moore <paul@paul-moore.com>,
Guillaume Nault <gnault@redhat.com>
Subject: Re: [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in length_mt6
Date: Mon, 16 Jan 2023 08:07:58 -0700
Message-ID: <b73e2dd1-d7bc-e96b-8553-1536a1146f3c@gmail.com>
In-Reply-To: <CANn89iLtF3dNcMkMGagCSfb+p5zA3Fa-DV9f9xMHHU_TX2CvSw@mail.gmail.com>
On 1/16/23 2:24 AM, Eric Dumazet wrote:
> On Sun, Jan 15, 2023 at 9:15 PM Xin Long <lucien.xin@gmail.com> wrote:
>>
>> On Sun, Jan 15, 2023 at 2:40 PM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Sun, Jan 15, 2023 at 6:43 PM Xin Long <lucien.xin@gmail.com> wrote:
>>>>
>>>> On Sun, Jan 15, 2023 at 10:41 AM David Ahern <dsahern@gmail.com> wrote:
>>>>>
>>>>> On 1/13/23 8:31 PM, Xin Long wrote:
>>>>>> For IPv6 jumbogram packets, the packet size is bigger than 65535,
>>>>>> it's not right to get it from payload_len and save it to an u16
>>>>>> variable.
>>>>>>
>>>>>> This patch only fixes it for IPv6 BIG TCP packets, so instead of
>>>>>> parsing IPV6_TLV_JUMBO exthdr, which is quite some work, it only
>>>>>> gets the pktlen via 'skb->len - skb_network_offset(skb)' when
>>>>>> skb_is_gso_v6() and saves it to an u32 variable, similar to IPv4
>>>>>> BIG TCP packets.
>>>>>>
>>>>>> This fix will also help us add selftest for IPv6 BIG TCP in the
>>>>>> following patch.
>>>>>>
>>>>>
>>>>> If this is a bug fix for the existing IPv6 support, send it outside of
>>>>> this set for -net.
>>>>>
>>>> Sure,
>>>> I was thinking of adding it here to be able to support selftest for
>>>> IPv6 too in the next patch. But it seems to make more sense to
>>>> get it into -net first, then add this selftest after it goes to net-next.
>>>>
>>>> I will post it and all other fixes I mentioned in the cover-letter for
>>>> IPv6 BIG TCP for -net.
>>>>
>>>> But before that, I hope Eric can confirm it is okay to read the length
>>>> of IPv6 BIG TCP packets with skb_ipv6_totlen() defined in this patch,
>>>> instead of parsing JUMBO exthdr?
>>>>
>>>
>>> I do not think it is ok, but I will leave the question to netfilter maintainers.
>> Just note that the issue doesn't only exist in netfilter.
>> All the changes in Patch 2-7 from this patchset are also needed for IPv6
>> BIG TCP packets.
>>
>>>
>>> Guessing things in tcpdump or other tools is up to user space implementations,
>>> trying to work around some (kernel ?) deficiencies.
>>>
>>> Yes, IPv6 extensions headers are a pain, we all agree.
>>>
>>> Look at how ip6_rcv_core() properly dissects extension headers _and_ trim
>>> skb accordingly (pskb_trim_rcsum() called either from ip6_rcv_core()
>>> or ipv6_hop_jumbo())
>>>
>>> So skb->len is not the root of trust. Some transport mediums might add paddings.
>>>
>>> Ipv4 has a similar logic in ip_rcv_core().
>>>
>>>         len = ntohs(iph->tot_len);
>>>         if (skb->len < len) {
>>>                 drop_reason = SKB_DROP_REASON_PKT_TOO_SMALL;
>>>                 __IP_INC_STATS(net, IPSTATS_MIB_INTRUNCATEDPKTS);
>>>                 goto drop;
>>>         } else if (len < (iph->ihl*4))
>>>                 goto inhdr_error;
>>>
>>>         /* Our transport medium may have padded the buffer out. Now we know it
>>>          * is IP we can trim to the true length of the frame.
>>>          * Note this now means skb->len holds ntohs(iph->tot_len).
>>>          */
>>>         if (pskb_trim_rcsum(skb, len)) {
>>>                 __IP_INC_STATS(net, IPSTATS_MIB_INDISCARDS);
>>>                 goto drop;
>>>         }
>>>
>>> After your changes, we might accept illegal packets that were properly
>>> dropped before.
>> I think skb->len is trustable for GSO/GRO packets.
>> In ipv6_gro_complete/inet_gro_complete():
>> The new length for payload_len or iph->tot_len are all calculated from skb->len.
>> As I said in the cover-letter, "there is no padding in GSO/GRO packets".
>> Or am I missing something?
>
> This seems to be a contract violation with user space providing GSO packets.
>
> In our changes we added some sanity checks, inherent to JUMBO specs.
>
> Here, a GSO packet can now have a zero ip length, no matter if it is
> BIG TCP or not.
Meaning your preference is to set tot_len anytime it is <= 64kB so the
only time tot_len == 0 is for large GRO/TSO packets? That is doable.
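
A minimal userspace sketch of that rule, for illustration only. The names
fake_pkt, set_tot_len and get_pkt_len are hypothetical and are not kernel
APIs, and "fits" is taken here to mean the 16-bit field maximum of 65535:

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons(), ntohs() */

/* Hypothetical stand-in for the parts of an skb that matter here. */
struct fake_pkt {
	uint32_t len;		/* bytes counted from the start of the IP header */
	uint16_t tot_len_be;	/* IPv4 tot_len field, network byte order */
};

#define IP_MAX_TOT_LEN 65535u	/* largest value the 16-bit field can carry */

/* Writer side: store the real length whenever it fits, so that
 * tot_len == 0 only ever marks a packet too large for the field.
 */
static void set_tot_len(struct fake_pkt *p)
{
	p->tot_len_be = p->len <= IP_MAX_TOT_LEN ? htons((uint16_t)p->len) : 0;
}

/* Reader side: trust the header when it is set, and fall back to the
 * buffer length only for the oversized case.
 */
static uint32_t get_pkt_len(const struct fake_pkt *p)
{
	uint16_t tot_len = ntohs(p->tot_len_be);

	return tot_len ? tot_len : p->len;
}
```

With this split, a reader only has to guess from the buffer length when the
header cannot carry the value at all.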
>
> It seems we lower the bar for consistency, and allow bugs (say
> changing skb->len) to not be detected.
Not sure why you think it would not be detected. Today's model for GRO
sets tot_len based on skb->len. There is an inherent trust that the
users of the GRO API set the length correctly. If they do not, the
payload to userspace would ultimately be nonsense and hence detectable.
I tend to use ssh to test changes like this for this reason - L4 payload
must make sense.
For the Tx path, there is a similar line of trust that the skb->len
passed to the L3 layer is correct. IPv4/IPv6 blindly trust what they are
told for length.
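
For contrast, the receive-side length check in the ip_rcv_core() excerpt
quoted earlier can be sketched in userspace roughly as below. This is a
simplified model, not the kernel code: rx_verdict and rx_check_len are
hypothetical names, and the trim is reduced to overwriting a length value
instead of calling pskb_trim_rcsum():

```c
#include <stdint.h>

enum rx_verdict { RX_OK, RX_TOO_SMALL, RX_BAD_HDR };

/* buf_len:     bytes received, counted from the start of the IP header
 * hdr_tot_len: tot_len from the header, already in host byte order
 * ihl:         header length in 32-bit words
 *
 * On RX_OK, *buf_len is trimmed to the length the header claims, which
 * discards any padding added by the transport medium.
 */
static enum rx_verdict rx_check_len(uint32_t *buf_len, uint16_t hdr_tot_len,
				    uint8_t ihl)
{
	if (*buf_len < hdr_tot_len)
		return RX_TOO_SMALL;	/* truncated packet */
	if (hdr_tot_len < (uint32_t)ihl * 4)
		return RX_BAD_HDR;	/* total length shorter than the header */

	*buf_len = hdr_tot_len;
	return RX_OK;
}
```

Here the header is the reference and the buffer length is corrected to match
it, which is the opposite direction from GRO deriving tot_len from skb->len.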
>
> As you said, user space sniffing packets now have to guess what is the
> intent, instead of headers carrying all the needed information
> that can be fully validated by parsers.
This is a solvable problem within the packet socket API, and the entire
thing is opt-in. If a user's tcpdump / packet capture program is out of
date and does not support the new API for large packets, then that user
does not have to enable large GRO/TSO.
Thread overview: 45+ messages
2023-01-14 3:31 [PATCH net-next 00/10] net: support ipv4 big tcp Xin Long
2023-01-14 3:31 ` [PATCH net-next 01/10] net: add a couple of helpers for iph tot_len Xin Long
2023-01-14 3:31 ` [PATCH net-next 02/10] bridge: use skb_ip_totlen in br netfilter Xin Long
2023-01-14 3:31 ` [PATCH net-next 03/10] openvswitch: use skb_ip_totlen in conntrack Xin Long
2023-01-14 3:31 ` [PATCH net-next 04/10] net: sched: use skb_ip_totlen and iph_totlen Xin Long
2023-01-14 3:31 ` [PATCH net-next 05/10] netfilter: " Xin Long
2023-01-14 3:31 ` [PATCH net-next 06/10] cipso_ipv4: use iph_set_totlen in skbuff_setattr Xin Long
2023-01-14 15:38 ` Paul Moore
2023-01-14 17:52 ` Xin Long
2023-01-16 16:45 ` Paul Moore
2023-01-16 17:36 ` Xin Long
2023-01-16 18:12 ` Paul Moore
2023-01-16 19:33 ` Xin Long
2023-01-17 4:54 ` David Ahern
2023-01-17 19:51 ` Paul Moore
2023-01-17 22:46 ` Paul Moore
2023-01-18 2:47 ` David Ahern
2023-01-18 19:18 ` Paul Moore
2023-01-14 3:31 ` [PATCH net-next 07/10] ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdr Xin Long
2023-01-14 3:31 ` [PATCH net-next 08/10] net: add support for ipv4 big tcp Xin Long
2023-01-14 3:31 ` [PATCH net-next 09/10] netfilter: get ipv6 pktlen properly in length_mt6 Xin Long
2023-01-15 15:41 ` David Ahern
2023-01-15 17:42 ` Xin Long
2023-01-15 19:40 ` Eric Dumazet
2023-01-15 20:14 ` Xin Long
2023-01-15 23:57 ` David Ahern
2023-01-16 9:24 ` Eric Dumazet
2023-01-16 15:07 ` David Ahern [this message]
2023-01-16 16:02 ` Eric Dumazet
2023-01-16 19:09 ` Xin Long
2023-01-16 20:37 ` Eric Dumazet
2023-01-17 15:47 ` Xin Long
2023-01-19 1:18 ` Xin Long
2023-01-19 3:13 ` Eric Dumazet
2023-01-19 15:41 ` David Ahern
2023-01-19 16:49 ` Xin Long
2023-01-19 18:10 ` Eric Dumazet
2023-01-19 18:57 ` Xin Long
2023-01-19 19:17 ` Eric Dumazet
2023-01-19 19:30 ` Xin Long
2023-01-15 23:58 ` David Ahern
2023-01-14 3:31 ` [PATCH net-next 10/10] selftests: add a selftest for big tcp Xin Long
2023-01-15 15:45 ` [PATCH net-next 00/10] net: support ipv4 " David Ahern
2023-01-15 16:04 ` Eric Dumazet
2023-01-15 17:33 ` Xin Long