netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
@ 2015-08-10 21:56 Brenden Blanco
  2015-08-11 20:51 ` Linus Lüssing
  0 siblings, 1 reply; 6+ messages in thread
From: Brenden Blanco @ 2015-08-10 21:56 UTC (permalink / raw)
  To: netdev, linus.luessing

Hi folks,

Here is a crash that I am able to easily reproduce. The setup is:

2 VMs, running in libvirt (qemu-kvm)
CPU mode is host-passthrough, virtio drivers used wherever available
Disable ipv6 (just to limit the amount of multicast noise)
Set up a multicast vxlan tunnel between the two VMs
Attach the vxlan device to a linux bridge
Attach a veth pair to the linux bridge
Enable ipv6 on a single veth
At this point, either one of the VMs may crash with the attached trace

Here is the test script. Not all lines are necessary, some are a byproduct of
eliminating various functions from the trace to eliminate them as suspects.

---------------
rmmod ebtable_nat
rmmod ebtables
sysctl net.ipv6.conf.all.disable_ipv6=1
ip l add vxlan0 type vxlan id 10000 group 239.1.1.1 dev eth1
ip l add br0 type bridge
ip l set vxlan0 master br0
ip l set br0 up
ip l set vxlan0 up
ip l add v1a type veth peer name v1b
ip l set v1b master br0
ip l set v1b up
ip l set v1a up
sysctl net.ipv6.conf.v1a.disable_ipv6=0
----------------

Doing some code reading with Alexei, we found a suspect commit, which
introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
when skb->len == len.

9afd85c9e4552 "net: Export IGMP/MLD message validation code"

static struct sk_buff *skb_checksum_maybe_trim(struct sk_buff *skb,
      unsigned int transport_len)
...
       if (skb->len < len) {
               kfree_skb(skb);
               return NULL;
       } else if (skb->len == len) {
               return skb;
       }
...
static int __ipv6_mc_check_mld(struct sk_buff *skb,
                              struct sk_buff **skb_trimmed)
...
       skb_get(skb);
       skb_chk = skb_checksum_trimmed(skb, transport_len,
                                      ipv6_mc_validate_checksum);



Would someone more familiar with the code be able to suggest a viable solution
or patch to try?

Cheers,
Brenden


Apologies for some of the mangled text:

[  100.879047] ------------[ cut here ]------------
[  100.879105] kernel BUG at net/core/skbuff.c:1128!
[  100.879144] invalid opcode: 0000 [#1]
[  100.879250] Modules linked in: veth bridge stp llc vxlan
ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables iptable_filter
ip_tables x_tables netconsole configfs btrfs ib_iser rdma_cm iw_cm
ib_cm ib_sa ib_mad ib_core ib_addr openvswitch iscsi_tcp libiscsi_tcp
libiscsi xor scsi_transport_iscsi libcrc32c raid6_pq dm_crypt iosf_mbi
kvm_intel kvm ppdev dm_multipath crct10dif_pclmul scsi_dh crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd psmouse input_leds serio_raw floppy 8250_fintek
i2c_piix4 parport_pc pata_acpi mac_hid lp parport virtio_scsi
[last unloaded: ebtables]

[  100.881340] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc4+ #3
[  100.881375] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.2-20150617_082717-anatol 04/01/2014
[  100.881416] task: ffff88013abca940 ti: ffff88013abdc000 task.ti:
ffff88013abdc000
[  100.881457] RIP: 0010:[<ffffffff8168d3d7>] [<ffffffff8168d3d7>]
pskb_expand_head+0x227/0x260
[  100.881532] RSP: 0018:ffff88013fd03ab8  EFLAGS: 00010202
[  100.881567] RAX: 0000000000000002 RBX: ffff8800bb601500 RCX: 0000000000000020
[  100.881604] RDX: 0000000000000148 RSI: 0000000000000000 RDI: ffff8800bb601500
[  100.881642] RBP: ffff88013fd03af8 R08: 0000000000000000 R09: 000000000000001c
[  100.881677] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[  100.881714] R13: ffff8800bb601500 R14: ffff8800bb358840 R15: 0000000000000000
[  100.881749] FS:  0000000000000000(0000) GS:ffff88013fd00000(0000)
knlGS:0000000000000000
[  100.881790] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  100.881828] CR2: 00007f91049f2162 CR3: 0000000136a94000 CR4: 00000000001406e0
[  100.881864] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  100.881902] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  100.881936] Stack:
[  100.881977]  ffff88013fd03b38 ffffffff816cd766 ffff88013fd03b67
ffff8800bb601500
[  100.882149]  ffff88013fd03be0 0000000000000008 ffff8800bb358840
0000000000000000
[  100.882316]  ffff88013fd03b48 ffffffff8168e68f ffff88013fd03b48
000000088168f4d0
[  100.882486] Call Trace:
[  100.882524]  <IRQ>
100.882524]  <IRQ> ace:
03b480000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
0000000000
2717-anatol 04/01/2014
_netfilter if you need this.
 may change behavior in the future.

rser"

31 c0 87 87 b0 01 00 00 f7
 tpm br_netfilter e1000e dw_dmac i2c_hid dw_dmac_core wmi video bridge
8250_dw gpio_lynxpoint i2c_designware_platform ptp mei_me stp
i2c_designware_core pps_core llc acpi_pad mei i2c_core shpchp
spi_pxa2xx_platform processor button xt_addrtype nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat
nf_conntrack_ftp nf_conntrack iptable_filter sch_fq_codel nfsd nfs
auth_rpcgss fscache oid_registry nfs_acl lockd grace sunrpc ip_tables
x_tables ext4 mbcache crc16 jbd2 dm_mod hid_generic usbhid hidore
ablk_helper
Aug 10 14:12:43 192.168.1.10 [  100.882558]  [<ffffffff816cd766>] ?
netlink_broadcast_filtered+0x136/0x3e0
Aug 10 14:12:43 192.168.1.10 [  100.882630]  [<ffffffff8168e68f>]
__pskb_pull_tail+0x4f/0x350
Aug 10 14:12:43 192.168.1.10 [  100.882671]  [<ffffffff8177da86>]
ipv6_mc_check_mld+0x286/0x330
Aug 10 14:12:43 192.168.1.10 [  100.882714]  [<ffffffffc03eaf45>]
br_multicast_rcv+0x85/0xb90 [bridge]
Aug 10 14:12:43 192.168.1.10 [  100.882751]  [<ffffffff810bc5e1>] ?
__raw_callee_save___pv_queued_spin_unlock+0x11/0x20
Aug 10 14:12:43 192.168.1.10 [  100.882792]  [<ffffffff8168a3ba>] ?
kfree_skbmem+0x5a/0x60
Aug 10 14:12:43 192.168.1.10 [  100.882829]  [<ffffffffc03dfef2>] ?
br_fdb_update+0x132/0x1e0 [bridge]
Aug 10 14:12:43 192.168.1.10 [  100.882868]  [<ffffffffc03e22b6>]
br_handle_frame_finish+0x296/0x590 [bridge]
Aug 10 14:12:43 192.168.1.10 [  100.882911]  [<ffffffff8169eeaf>] ?
__netif_receive_skb_core+0x63f/0x9a0
Aug 10 14:12:43 192.168.1.10 [  100.882948]  [<ffffffff8170aebe>] ?
udp4_gro_receive+0x11e/0x2d0
Aug 10 14:12:43 192.168.1.10 [  100.882987]  [<ffffffffc03e26ff>]
br_handle_frame+0x14f/0x280 [bridge]
Aug 10 14:12:43 192.168.1.10 [  100.883026]  [<ffffffff8169f228>] ?
__netif_receive_skb+0x18/0x60
Aug 10 14:12:43 192.168.1.10 [  100.883061]  [<ffffffff8169ea41>]
__netif_receive_skb_core+0x1d1/0x9a0
Aug 10 14:12:43 192.168.1.10 [  100.883100]  [<ffffffff8169fe55>] ?
napi_gro_receive+0xb5/0xf0
Aug 10 14:12:43 192.168.1.10 [  100.883136]  [<ffffffff815841bd>] ?
virtnet_receive+0x27d/0x890
Aug 10 14:12:43 192.168.1.10 [  100.883174]  [<ffffffff8169f228>]
__netif_receive_skb+0x18/0x60
Aug 10 14:12:43 192.168.1.10 [  100.883213]  [<ffffffff8169ff2f>]
process_backlog+0x9f/0x150
Aug 10 14:12:43 192.168.1.10 [  100.883248]  [<ffffffff8169f6ac>]
net_rx_action+0x14c/0x320
Aug 10 14:12:43 192.168.1.10 [  100.883289]  [<ffffffff81079a42>]
__do_softirq+0xd2/0x250
Aug 10 14:12:43 192.168.1.10 [  100.883328]  [<ffffffff81079df5>]
irq_exit+0x95/0xa0
Aug 10 14:12:43 192.168.1.10 [  100.883364]  [<ffffffff817aa83a>]
do_IRQ+0x5a/0xe0
Aug 10 14:12:43 192.168.1.10 [  100.883405]  [<ffffffff817a87ab>]
common_interrupt+0x6b/0x6b
Aug 10 14:12:43 192.168.1.10 [  100.883439]  <EOI>
Aug 10 14:12:43 192.168.1.10 100.883439]  <EOI> ffff817a87ab>]
common_interrupt+0x6b/0x6b
Aug 10 14:12:43 192.168.1.10 [  100.883476]  [<ffffffff8105d2b6>] ?
native_safe_halt+0x6/0x10
Aug 10 14:12:43 192.168.1.10 [  100.883550]  [<ffffffff8101ec8e>]
default_idle+0x1e/0xa0
Aug 10 14:12:43 192.168.1.10 [  100.883589]  [<ffffffff8101f35f>]
arch_cpu_idle+0xf/0x20
Aug 10 14:12:43 192.168.1.10 [  100.883625]  [<ffffffff810b59b2>]
default_idle_call+0x32/0x40
Aug 10 14:12:43 192.168.1.10 [  100.883664]  [<ffffffff810b5cce>]
cpu_startup_entry+0x2ae/0x320
Aug 10 14:12:43 192.168.1.10 [  100.883705]  [<ffffffff8104adf5>]
start_secondary+0x175/0x1a0
Aug 10 14:12:43 192.168.1.10 [  100.883740] Code:
Aug 10 14:12:43 192.168.1.10 [  100.885904] RIP
Aug 10 14:12:43 192.168.1.10 [<ffffffff8168d3d7>] pskb_expand_head+0x227/0x260
Aug 10 14:12:43 192.168.1.10 [  100.885973]  RSP <ffff88013fd03ab8>
Aug 10 14:12:43 192.168.1.10 [  100.886017] ---[ end trace 65465776bab2684e ]---

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
  2015-08-10 21:56 ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128 Brenden Blanco
@ 2015-08-11 20:51 ` Linus Lüssing
  2015-08-11 21:47   ` Linus Lüssing
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Lüssing @ 2015-08-11 20:51 UTC (permalink / raw)
  To: Brenden Blanco; +Cc: netdev

On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote:
> Doing some code reading with Alexei, we found a suspect commit, which
> introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
> when skb->len == len.

Urgh, didn't know that pskb_may_pull() doesn't like an skb with a
reference count greater than one... But yes, the BUG() call in
skbuff.c:1128 / pskb_expand_head() says that (though in this case
the BUG() in skbuff.c call actually seems kinda weird (/"wrong"?), as
it isn't shared between different code paths).

Thanks for the thorough analysis, going to provide a patch within
the next 24h (hopefully).

Cheers, Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
  2015-08-11 20:51 ` Linus Lüssing
@ 2015-08-11 21:47   ` Linus Lüssing
  2015-08-12  0:51     ` Alexei Starovoitov
  2015-08-12  4:56     ` David Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Lüssing @ 2015-08-11 21:47 UTC (permalink / raw)
  To: Brenden Blanco; +Cc: netdev

On Tue, Aug 11, 2015 at 10:51:40PM +0200, Linus Lüssing wrote:
> On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote:
> > Doing some code reading with Alexei, we found a suspect commit, which
> > introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
> > when skb->len == len.
> 
> Urgh, didn't know that pskb_may_pull() doesn't like an skb with a
> reference count greater than one... But yes, the BUG() call in
> skbuff.c:1128 / pskb_expand_head() says that (though in this case
> the BUG() in skbuff.c call actually seems kinda weird (/"wrong"?), as
> it isn't shared between different code paths).

The more I think about it, I'm tending to remove the BUG() call in
pskb_expand_head() as in this case it obviously isn't a bug.

The skb_get() allows a simple and in my opinion easy to read cleanup
part of skb_trimmed for any caller of ip{v6,}_mc_check_mld(). No need
to check whether skb == skb_trimmed for a caller for instance,
simply checking whether skb_trimmed exists is enough.


Any objections to remove the "if (skb_shared(skb)) BUG()" part in
pskb_expand_head()? Or would there be any other undesired side
effects in utilising skb_get() like that?

Cheers, Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
  2015-08-11 21:47   ` Linus Lüssing
@ 2015-08-12  0:51     ` Alexei Starovoitov
  2015-08-12  4:56     ` David Miller
  1 sibling, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2015-08-12  0:51 UTC (permalink / raw)
  To: Linus Lüssing; +Cc: Brenden Blanco, netdev

On Tue, Aug 11, 2015 at 11:47:25PM +0200, Linus Lüssing wrote:
> On Tue, Aug 11, 2015 at 10:51:40PM +0200, Linus Lüssing wrote:
> > On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote:
> > > Doing some code reading with Alexei, we found a suspect commit, which
> > > introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
> > > when skb->len == len.
> > 
> > Urgh, didn't know that pskb_may_pull() doesn't like an skb with a
> > reference count greater than one... But yes, the BUG() call in
> > skbuff.c:1128 / pskb_expand_head() says that (though in this case
> > the BUG() in skbuff.c call actually seems kinda weird (/"wrong"?), as
> > it isn't shared between different code paths).
> 
> The more I think about it, I'm tending to remove the BUG() call in
> pskb_expand_head() as in this case it obviously isn't a bug.
> 
> The skb_get() allows a simple and in my opinion easy to read cleanup
> part of skb_trimmed for any caller of ip{v6,}_mc_check_mld(). No need
> to check whether skb == skb_trimmed for a caller for instance,
> simply checking whether skb_trimmed exists is enough.
> 
> 
> Any objections to remove the "if (skb_shared(skb)) BUG()" part in
> pskb_expand_head()? Or would there be any other undesired side
> effects in utilising skb_get() like that?

That fundamental check was there for 10+ years and cannot be removed.
bridge already did skb_share_check() before reaching this
__ipv6_mc_check_mld() path.
There is no reason to do skb_get() there.
It wasn't there before commit 9afd85c9e4552 which claims to do:
'Some small refactoring was done to enhance readibility',
but doing skb_get()+pskb_may_pull() which is incorrect.
Avoiding unnecessary skb_clone() is a good thing, but it should be
done without messing with skb->users, since this code path
already owns skb.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
  2015-08-11 21:47   ` Linus Lüssing
  2015-08-12  0:51     ` Alexei Starovoitov
@ 2015-08-12  4:56     ` David Miller
  2015-08-12 14:26       ` Eric Dumazet
  1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2015-08-12  4:56 UTC (permalink / raw)
  To: linus.luessing; +Cc: bblanco, netdev

From: Linus Lüssing <linus.luessing@c0d3.blue>
Date: Tue, 11 Aug 2015 23:47:25 +0200

> On Tue, Aug 11, 2015 at 10:51:40PM +0200, Linus Lüssing wrote:
>> On Mon, Aug 10, 2015 at 02:56:12PM -0700, Brenden Blanco wrote:
>> > Doing some code reading with Alexei, we found a suspect commit, which
>> > introduces an skb_get and skb_may_pull of the same skb, which leads to the BUG
>> > when skb->len == len.
>> 
>> Urgh, didn't know that pskb_may_pull() doesn't like an skb with a
>> reference count greater than one... But yes, the BUG() call in
>> skbuff.c:1128 / pskb_expand_head() says that (though in this case
>> the BUG() in skbuff.c call actually seems kinda weird (/"wrong"?), as
>> it isn't shared between different code paths).
> 
> The more I think about it, I'm tending to remove the BUG() call in
> pskb_expand_head() as in this case it obviously isn't a bug.

Calling pskb_expand_head() with a shared SKB is absolutely,
positively, a bug.  You just don't understand why it is.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128
  2015-08-12  4:56     ` David Miller
@ 2015-08-12 14:26       ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2015-08-12 14:26 UTC (permalink / raw)
  To: David Miller; +Cc: linus.luessing, bblanco, netdev

On Tue, 2015-08-11 at 21:56 -0700, David Miller wrote:

> Calling pskb_expand_head() with a shared SKB is absolutely,
> positively, a bug.  You just don't understand why it is.

Definitely agree. Its a pain to find races otherwise.

skb_get() in general is quite tricky.

Better avoid it unless really needed in performance critical paths.

Relevant commits

ba34e6d9d346fe4e05d7e417b9edf5140772d34c tcp: make sure skb is not shared before using skb_get()

fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0 ping: Fix race in free in receive path

1dc7b90f7cd050ef6d5e511e652347e52874469c ipv6: tcp: fix race in IPV6_2292PKTOPTIONS 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-08-12 14:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-10 21:56 ipv6_mc_check_mld - kernel BUG at net/core/skbuff.c:1128 Brenden Blanco
2015-08-11 20:51 ` Linus Lüssing
2015-08-11 21:47   ` Linus Lüssing
2015-08-12  0:51     ` Alexei Starovoitov
2015-08-12  4:56     ` David Miller
2015-08-12 14:26       ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).