Re: [PATCH v2 net] mld: fix panic in mld_newpack()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Taehee Yoo <ap420073@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	davem@davemloft.net, kuba@kernel.org, dsahern@kernel.org,
	yoshfuji@linux-ipv6.org, netdev@vger.kernel.org,
	xiyou.wangcong@gmail.com
Subject: Re: [PATCH v2 net] mld: fix panic in mld_newpack()
Date: Sat, 22 May 2021 01:11:57 +0900	[thread overview]
Message-ID: <905ec725-3d38-e110-e1ef-1300da80c2de@gmail.com> (raw)
In-Reply-To: <00922799-f302-b17b-2f2c-032c4a562315@gmail.com>

On 5/21/21 2:20 AM, Eric Dumazet wrote:

Hi Eric,
Thank you for your review!

 >
 >
 > On 5/16/21 4:44 PM, Taehee Yoo wrote:
 >> mld_newpack() doesn't allow to allocate high order page,
 >> only order-0 allocation is allowed.
 >> If headroom size is too large, a kernel panic could occur in skb_put().
 >>
 >> Test commands:
 >>      ip netns del A
 >>      ip netns del B
 >>      ip netns add A
 >>      ip netns add B
 >>      ip link add veth0 type veth peer name veth1
 >>      ip link set veth0 netns A
 >>      ip link set veth1 netns B
 >>
 >>      ip netns exec A ip link set lo up
 >>      ip netns exec A ip link set veth0 up
 >>      ip netns exec A ip -6 a a 2001:db8:0::1/64 dev veth0
 >>      ip netns exec B ip link set lo up
 >>      ip netns exec B ip link set veth1 up
 >>      ip netns exec B ip -6 a a 2001:db8:0::2/64 dev veth1
 >>      for i in {1..99}
 >>      do
 >>          let A=$i-1
 >>          ip netns exec A ip link add ip6gre$i type ip6gre \
 >> 	local 2001:db8:$A::1 remote 2001:db8:$A::2 encaplimit 100
 >>          ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev ip6gre$i
 >>          ip netns exec A ip link set ip6gre$i up
 >>
 >>          ip netns exec B ip link add ip6gre$i type ip6gre \
 >> 	local 2001:db8:$A::2 remote 2001:db8:$A::1 encaplimit 100
 >>          ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev ip6gre$i
 >>          ip netns exec B ip link set ip6gre$i up
 >>      done
 >>
 >> Splat looks like:
 >> kernel BUG at net/core/skbuff.c:110!
 >> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
 >> CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.12.0+ #891
 >> Workqueue: ipv6_addrconf addrconf_dad_work
 >> RIP: 0010:skb_panic+0x15d/0x15f
 >> Code: 92 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 00 ae 79 83
 >> 41 57 41 56 41 55 48 8b 54 24 a6 26 f9 ff <0f> 0b 48 8b 6c 24 20 89
 >> 34 24 e8 4a 4e 92 fe 8b 34 24 48 c7 c1 20
 >> RSP: 0018:ffff88810091f820 EFLAGS: 00010282
 >> RAX: 0000000000000089 RBX: ffff8881086e9000 RCX: 0000000000000000
 >> RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed1020123efb
 >> RBP: ffff888005f6eac0 R08: ffffed1022fc0031 R09: ffffed1022fc0031
 >> R10: ffff888117e00187 R11: ffffed1022fc0030 R12: 0000000000000028
 >> R13: ffff888008284eb0 R14: 0000000000000ed8 R15: 0000000000000ec0
 >> FS:  0000000000000000(0000) GS:ffff888117c00000(0000)
 >> knlGS:0000000000000000
 >> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 >> CR2: 00007f8b801c5640 CR3: 0000000033c2c006 CR4: 00000000003706f0
 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 >> Call Trace:
 >>   ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 >>   ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 >>   skb_put.cold.104+0x22/0x22
 >>   ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 >>   ? rcu_read_lock_sched_held+0x91/0xc0
 >>   mld_newpack+0x398/0x8f0
 >>   ? ip6_mc_hdr.isra.26.constprop.46+0x600/0x600
 >>   ? lock_contended+0xc40/0xc40
 >>   add_grhead.isra.33+0x280/0x380
 >>   add_grec+0x5ca/0xff0
 >>   ? mld_sendpack+0xf40/0xf40
 >>   ? lock_downgrade+0x690/0x690
 >>   mld_send_initial_cr.part.34+0xb9/0x180
 >>   ipv6_mc_dad_complete+0x15d/0x1b0
 >>   addrconf_dad_completed+0x8d2/0xbb0
 >>   ? lock_downgrade+0x690/0x690
 >>   ? addrconf_rs_timer+0x660/0x660
 >>   ? addrconf_dad_work+0x73c/0x10e0
 >>   addrconf_dad_work+0x73c/0x10e0
 >>
 >> Allowing high order page allocation could fix this problem.
 >>
 >> Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations")
 >> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
 >> ---
 >>
 >> v1 -> v2:
 >>   - Wait for mld-sleepable patchset to be merged.
 >>
 >>   net/ipv6/mcast.c | 3 ---
 >>   1 file changed, 3 deletions(-)
 >>
 >> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
 >> index 0d59efb6b49e..d36ef9d25e73 100644
 >> --- a/net/ipv6/mcast.c
 >> +++ b/net/ipv6/mcast.c
 >> @@ -1745,10 +1745,7 @@ static struct sk_buff *mld_newpack(struct 
inet6_dev *idev, unsigned int mtu)
 >>   		     IPV6_TLV_PADN, 0 };
 >>
 >>   	/* we assume size > sizeof(ra) here */
 >> -	/* limit our allocations to order-0 page */
 >> -	size = min_t(int, size, SKB_MAX_ORDER(0, 0));
 >>   	skb = sock_alloc_send_skb(sk, size, 1, &err);
 >> -
 >>   	if (!skb)
 >>   		return NULL;
 >>
 >>
 >
 > Sorry for being late to the party.
 >
 > This is forcing high-order allocations for devices with big mtu,
 > even for non pathological cases.
 >
 > (lo has MTU 65535, so we attempt order-5 allocations :/ )
 >
 > I think this could be smarter [1], addressing both the common case
 > and syzbot-like abuses.
 >
 > XMIT_RECURSION_LIMIT being 8, I doubt the repro makes any sense in 
real world.
 > Maybe it is time to limit netdev chains to 8 as well.
 >
 > Also, veth MTU being 1500, I fail to understand how your script
 > was crashing your host.

The root problem is that dev->need_headroom can be abnormally big.
Because when the tunneling interface begins to send a packet, it 
recalculates dev->needed_headroom.

int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
                  struct flowi6 *fl6, int encap_limit, __u32 *pmtu,
                  __u8 proto)
{
...
         max_headroom = LL_RESERVED_SPACE(dst->dev) + sizeof(struct ipv6hdr)
                         + dst->header_len + t->hlen;
         if (max_headroom > dev->needed_headroom)
                 dev->needed_headroom = max_headroom;

Every time an interface sends a packet, this function is called and it 
increases dev->needed_headroom.
The first time, ip6gre1 will recalculate its own needed_headroom value.
Then, when ip6gre2 is used and if a dst interface is ip6gre1, it 
increases ip6gre->needed_headroom.
This logic will be repeated until it uses ip6gre100.
Actually, most packets will be drop because of XMIT_RECURSION_LIMIT.
But updating dev->needed_headroom will be updated regardless of the 
success of sending packets.
So, the needed_headroom value can be too big.

I think your suggestion can fix this problem more smartly because It can 
avoid high-order allocation by mtu and it allows high-order if hlen or 
tlen is too big.
So, I think it deserves to be used.
And If we deny setting ->needed_headrom to a too big value at the 
ip6_tnl_xmit(), it's safer I think.
How do you think about it?

 >
 > [1]
 > diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
 > index 
d36ef9d25e73cb14eed45701acafcaf78e08451e..420bf2038e810173fc6f86622378b5151463f204 
100644
 > --- a/net/ipv6/mcast.c
 > +++ b/net/ipv6/mcast.c
 > @@ -1738,13 +1738,16 @@ static struct sk_buff *mld_newpack(struct 
inet6_dev *idev, unsigned int mtu)
 >          const struct in6_addr *saddr;
 >          int hlen = LL_RESERVED_SPACE(dev);
 >          int tlen = dev->needed_tailroom;
 > -       unsigned int size = mtu + hlen + tlen;
 > +       unsigned int size;
 >          int err;
 >          u8 ra[8] = { IPPROTO_ICMPV6, 0,
 >                       IPV6_TLV_ROUTERALERT, 2, 0, 0,
 >                       IPV6_TLV_PADN, 0 };
 >
 > -       /* we assume size > sizeof(ra) here */
 > +       /* We assume size > sizeof(ra) here.
 > +        * Also try to not allocate high-order pages for big MTU.
 > +        */
 > +       size = min_t(int, mtu, PAGE_SIZE/2) + hlen + tlen;
 >          skb = sock_alloc_send_skb(sk, size, 1, &err);
 >          if (!skb)
 >                  return NULL;

     prev parent reply	other threads:[~2021-05-21 16:12 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-16 14:44 [PATCH v2 net] mld: fix panic in mld_newpack() Taehee Yoo
2021-05-17 21:10 ` patchwork-bot+netdevbpf
2021-05-20 17:20 ` Eric Dumazet
2021-05-21 16:11   ` Taehee Yoo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=905ec725-3d38-e110-e1ef-1300da80c2de@gmail.com \
    --to=ap420073@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=eric.dumazet@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.