From: "Linus Lüssing" <linus.luessing-djzkFPsfvsizQB+pC5nmwQ@public.gmane.org>
To: netfilter-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r@public.gmane.org
Subject: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
Date: Sun, 27 Jan 2019 22:47:08 +0100
Message-ID: <20190127214708.GC1788@otheros>

Hi,

I was trying to implement a multicast-to-multi-unicast conversion
in batman-adv with the following patch:

https://patchwork.open-mesh.org/patch/17729/

However, on OpenWrt with a 4.9.146 kernel I get a
"Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list".

This only happens when sending a SIGTERM to the network manager
"netifd" (that is, on network shutdown), and only if the node is
connected to a mesh of reasonable size, so that there is a certain
amount of multicast traffic for the multicast-to-multi-unicast patch
to work on.

During normal operation, no such crash seems to occur.

The crash itself is triggered by this check:

  BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));

which is located here:

https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354
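
As a rough illustration of what such a check tests, here is a simplified
userspace model of an "unhashed" list node. It is only a sketch: the
model_* names are hypothetical and not kernel API, the nulls markers and
RCU are left out, and clearing pprev on removal is a choice of this
model; the kernel helper may behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for a kernel hlist-style node. */
struct model_node {
	struct model_node *next;
	struct model_node **pprev; /* NULL means "not on any list" */
};

struct model_head {
	struct model_node *first;
};

/* The idea behind an "unhashed" test: no back-pointer, so not on a list. */
static bool model_unhashed(const struct model_node *n)
{
	return n->pprev == NULL;
}

/* Link the node at the front of the list. */
static void model_add_head(struct model_node *n, struct model_head *h)
{
	n->next = h->first;
	n->pprev = &h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
}

/*
 * Unlink the node. Returns false in the case where a BUG_ON() on
 * "unhashed" would fire, i.e. the node is not on any list.
 */
static bool model_del_checked(struct model_node *n)
{
	if (model_unhashed(n))
		return false;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL; /* model choice, see note above */
	return true;
}
```

In this model, calling model_del_checked() on a node that was never
passed to model_add_head() returns false, which corresponds to the
situation the BUG_ON() above complains about.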


What confuses me a bit is that the multicast-to-multi-unicast
conversion uses the same (or a very similar) simple skb_copy() approach
as the "classic broadcast flooding" in batman-adv. The latter also
transmits three redundant frames via skb_copy() to increase
reliability for WiFi broadcast packets.

One difference is that the broadcast flooding adds a bit of
delay between each transmission, which the multicast-to-multi-unicast
conversion doesn't.

Looking at "git log net/netfilter/nf_conntrack_core.c", I noticed
"netfilter: nfnetlink_queue: resolve clash for unconfirmed
conntracks" (368982cd7), which says:

"In nfqueue, two consecutive skbuffs may race to create the conntrack
 entry. Hence, the one that loses the race gets dropped due to clash in
 the insertion into the hashes from the nf_conntrack_confirm() path."

This patch is only part of kernels >= 4.18, so it is not part of the
firmware we use yet. Could this issue somehow be related?


Other than that, I was wondering whether we might be forgetting to
reset something after skb_copy()-ing. We set "skb->protocol =
htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
batman-adv, which sends the encapsulated frame into the
mesh. And we call nf_reset(skb) after decapsulating a frame
received from the mesh. But maybe that is not enough?
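
To make the question concrete, here is a small userspace model of the
concern. It assumes that a copied buffer carries over a reference to the
same connection-tracking object as the original unless it is explicitly
reset; that assumption should be checked against the kernel in use. All
model_* names are hypothetical and only mimic the roles of skb_copy()
and nf_reset().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a refcounted connection-tracking entry. */
struct model_ct {
	int refcnt;
};

/* Stand-in for a packet buffer that may point at a tracking entry. */
struct model_skb {
	struct model_ct *ct;
	int protocol;
};

/* Assumed copy semantics: the copy shares the entry and takes a reference. */
static void model_copy(struct model_skb *dst, const struct model_skb *src)
{
	*dst = *src;
	if (dst->ct)
		dst->ct->refcnt++;
}

/* Drop the reference so the buffer no longer points at the entry. */
static void model_reset(struct model_skb *skb)
{
	if (skb->ct) {
		skb->ct->refcnt--;
		skb->ct = NULL;
	}
}
```

Under that assumption, every copy sent without a model_reset() still
points at the same entry as the original, so several copies in flight
at once would all share one tracking object.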

This issue was reported in the following ticket:

https://github.com/freifunk-gluon/gluon/issues/1468

Regards, Linus

