From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Markus Wigge <wigge@bht-berlin.de>
Cc: netfilter@vger.kernel.org
Subject: Re: commit to kernel fails since Debian 12 (bookworm)
Date: Mon, 16 Oct 2023 23:24:26 +0200
Message-ID: <ZS2qCnq6c+mKyDa3@calendula>
In-Reply-To: <43708702-0f37-4ea6-9b3d-4dc8ac2913a1@bht-berlin.de>

Hi Markus,

On Mon, Oct 16, 2023 at 01:02:35PM +0000, Markus Wigge wrote:
> >> With each received message I got a "device or resource busy" when conntrackd
> >> tried to commit it to the kernel.
> >>
> >> When I try to commit the cache now I get all the same errors but at once ;-)
> >
> > That means there is already an entry in the kernel.
>
> Is there any known change between bullseye and bookworm that might
> explain this? Unfortunately I am not so deep inside the kernel mechanics
> involved here.

The only spot where EBUSY could reasonably happen in the kernel is here:

static int
ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
{
        unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
        unsigned long d = ct->status ^ status;

        if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
                /* SEEN_REPLY bit can only be set */
                return -EBUSY;

        if (d & IPS_ASSURED && !(status & IPS_ASSURED))
                /* ASSURED bit can only be set */
                return -EBUSY;

And this EBUSY can only happen if userspace (conntrackd) is losing the
race to update an already existing entry in the kernel.
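
To make the check above concrete, here is a rough userspace sketch of the
same bit logic. It is not kernel code: the sk_/SK_ names are made up for
this example, and the bit positions are assumed to match the uapi
enum ip_conntrack_status (SEEN_REPLY is bit 1, ASSURED is bit 2).

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror enum ip_conntrack_status; hypothetical names. */
#define SK_IPS_SEEN_REPLY (1UL << 1)
#define SK_IPS_ASSURED    (1UL << 2)

/*
 * Userspace model of the check: 'cur' is the status of the entry that
 * already exists in the kernel, 'req' is the status that userspace
 * (e.g. conntrackd) asks for.
 */
int sketch_update_status(unsigned long cur, unsigned long req)
{
        unsigned long d = cur ^ req;    /* bits that would change */

        if ((d & SK_IPS_SEEN_REPLY) && !(req & SK_IPS_SEEN_REPLY))
                return -EBUSY;          /* would clear SEEN_REPLY */

        if ((d & SK_IPS_ASSURED) && !(req & SK_IPS_ASSURED))
                return -EBUSY;          /* would clear ASSURED */

        return 0;
}

/*
 * Example: the packet path already marked the flow ASSURED, then an
 * older synced state without ASSURED arrives from userspace.
 */
void sketch_late_sync_example(void)
{
        int rc = sketch_update_status(SK_IPS_SEEN_REPLY | SK_IPS_ASSURED,
                                      SK_IPS_SEEN_REPLY);

        assert(rc == -EBUSY);
        (void)rc;
}
```

In this model, an update that only adds bits succeeds, while one that
would clear SEEN_REPLY or ASSURED on an existing entry is refused.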

> >> The architecture is quite simple and has worked for several years. It
> >> started flooding the syslog after the dist-upgrade to "bookworm".
> >> Two active-active nodes share a bunch of VLANs in two keepalived groups.
> >>
> >> Each node is primary for one of the groups and secondary for the other. The
> >> interfaces are configured correctly and traffic is flowing as expected.
> >
> > That is, flow-based distribution between the firewalls, correct?
>
> I am not sure about your definition of flow-based but it sounds
> plausible. Each node is responsible for its own dedicated VLANs they
> only failover on reboot or upgrades etc.

So VLAN interfaces are distributed between nodes and, on failover, one
node picks up the VLAN interfaces of the node that is failing? I am
trying to understand whether, in your setup, one node is active but is
also, at the same time, a backup for the flows that are handled by the
other node.

> >> bird and bird6 are announcing the routes correctly on each side.
> >> Shorewall is used to filter the passing traffic. That's all.
> >>
> >>>
> >>> EBUSY can be triggered in nf_conntrack_netlink.c in a few spots; this
> >>> is most likely the ct status flags and conntrackd losing the race to
> >>> update an entry that is being picked up from the packet path.
> >>>
> >>> Is your ruleset dropping invalid packets to disable lazy pick up?
> >>> That is, nf_conntrack_tcp_loose sysctl is set to zero.
> >>
> >> nope:
> >> # sysctl -a | grep loose
> >> net.netfilter.nf_conntrack_dccp_loose = 1
> >> net.netfilter.nf_conntrack_tcp_loose = 1
> >
> > If _loose is enabled, that means kernel conntrack can pick up entries
> > from the middle of a connection, from the packet path.
>
> I don't understand this part. The kernel picks up connections
> automatically? But how when the flow started on the other node?

This is how it works with net.netfilter.nf_conntrack_tcp_loose = 1:
that toggle enables "poor man's" connection pickup, that is, the kernel
infers the current state from packets seen in the middle of the
connection.
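
As a rough illustration of what that toggle changes, here is a toy model
in C. It is deliberately simplified and is not the kernel implementation:
the real TCP tracker uses a state table plus window checks, and all names
below are made up for this sketch. The only assumption it encodes is that
a SYN can always start a new entry, while a plain mid-stream ACK can only
do so when the loose toggle is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical packet classes for this sketch only. */
enum sk_tcp_pkt {
        SK_PKT_SYN,     /* first packet of a new connection */
        SK_PKT_ACK,     /* plain ACK seen mid-stream */
        SK_PKT_OTHER    /* anything else (e.g. RST), never picked up here */
};

/*
 * Returns true if a packet that matches no existing entry would cause
 * the packet path to create one in this simplified model.
 */
bool sketch_creates_entry(enum sk_tcp_pkt pkt, bool tcp_loose)
{
        switch (pkt) {
        case SK_PKT_SYN:
                return true;            /* normal connection setup */
        case SK_PKT_ACK:
                return tcp_loose;       /* mid-stream pickup needs _loose */
        default:
                return false;
        }
}
```

With tcp_loose enabled, the backup node's packet path can therefore
create an entry on its own as soon as it sees mid-stream traffic, before
the synced state from the other node is committed.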

> > Is your ruleset dropping invalid packets?
>
> Only for smurfs as far as I can see:
> >  203M   19G smurfs     0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID,NEW,UNTRACKED
>
> > Chain smurfs (7 references)
> >  pkts bytes target     prot opt in     out     source               destination
> >   19M 6211M RETURN     0    --  *      *       0.0.0.0              0.0.0.0/0
> >     0     0 smurflog   0    --  *      *       0.0.0.0/0            0.0.0.0/0           [goto]  ADDRTYPE match src-type BROADCAST
> >     0     0 smurflog   0    --  *      *       224.0.0.0/4          0.0.0.0/0           [goto]

This RETURN means invalid packets are taken back to the chain where the
jump to smurfs happens.

> > It looks like conntrackd is getting late to synchronize the states
> > for some flows because the packet path already created the entry via
> > _loose mechanism.
>
> Following the logs it appears to me that every single entry is getting
> late then. I doubt that and don't see where state should come from
> beforehand.

From the datapath itself, via the _loose mechanism that is enabled.
