netfilter-devel.vger.kernel.org archive mirror
* "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-27 21:47 Linus Lüssing
  2019-01-27 22:48 ` Florian Westphal
  2019-01-29  9:07 ` Linus Lüssing
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Lüssing @ 2019-01-27 21:47 UTC (permalink / raw)
  To: netfilter-devel-u79uwXL29TY76Z2rM5mHXA
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r

Hi,

I was trying to implement a multicast-to-multi-unicast conversion
in batman-adv with the following patch:

https://patchwork.open-mesh.org/patch/17729/

However, on OpenWrt with a 4.9.146 kernel I get a
"Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list".

This only happens when sending a SIGTERM to the network manager
"netifd" (i.e. on network shutdown), and only if the node is connected
to a mesh of reasonable size, i.e. if there is a certain amount of
multicast traffic for the multicast-to-multi-unicast patch to work on.

During normal operation, no such crash seems to occur.

The crash itself is triggered by the:

  BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));

in here:

https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354


What confuses me a bit is that the multicast-to-multi-unicast
conversion uses the same or a similar, simple skb_copy() approach as the
"classic broadcast flooding" in batman-adv so far. The latter also
transmits three redundant frames via skb_copy() to increase
reliability for Wifi broadcast packets.

One difference is that the broadcast flooding adds a bit of
delay between each transmission, which the multicast-to-multi-unicast
conversion doesn't.

Looking at "git log net/netfilter/nf_conntrack_core.c" I noticed
"netfilter: nfnetlink_queue: resolve clash for unconfirmed
conntracks" (368982cd7), which says:

"In nfqueue, two consecutive skbuffs may race to create the conntrack
 entry. Hence, the one that loses the race gets dropped due to clash in
 the insertion into the hashes from the nf_conntrack_confirm() path."

This patch is only part of kernels >= 4.18, so it is not yet part of
the firmware we use. Could this issue somehow be related?


Other than that, I was wondering whether we might be forgetting to
reset something after skb_copy()-ing. We do a "skb->protocol =
htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
batman-adv, which sends the encapsulated frame into the
mesh. And we do a nf_reset(skb) after decapsulating a frame
received from the mesh. But maybe that is not enough?

This issue was reported in the following ticket:

https://github.com/freifunk-gluon/gluon/issues/1468

Regards, Linus


Thread overview: 11+ messages
2019-01-27 21:47 "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list" Linus Lüssing
2019-01-27 22:48 ` Florian Westphal
2019-01-28 13:35     ` Chieh-Min Wang
2019-01-28 13:39         ` Florian Westphal
2019-01-28 13:50             ` Pablo Neira Ayuso
2019-01-28 14:01               ` Florian Westphal
2019-01-28 14:03               ` Chieh-Min Wang
2019-01-28 14:13                   ` Florian Westphal
2019-01-28 14:16                       ` Chieh-Min Wang
2019-01-28 14:25                           ` Chieh-Min Wang
2019-01-29  9:07 ` Linus Lüssing
