From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Florian Westphal <fw@strlen.de>
Cc: Phil Sutter <phil@nwl.cc>, netfilter-devel@vger.kernel.org
Subject: Re: nftables: Writers starve readers
Date: Thu, 1 Jun 2023 22:06:24 +0200
Message-ID: <ZHj6QAzAhUtfFO+g@calendula>
In-Reply-To: <20230601151105.GB26130@breakpoint.cc>

On Thu, Jun 01, 2023 at 05:11:05PM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> > A call to 'nft list ruleset' in a second terminal hangs without output.
> > It apparently hangs in nft_cache_update() because rule_cache_dump()
> > returns EINTR. On kernel side, I guess it stems from
> > nl_dump_check_consistent() in __nf_tables_dump_rules(). I haven't
> > checked, but the generation counter likely increases while dumping the
> > 100k rules.
>
> Yes.
>
> > One may deem this scenario unrealistic, but I had to insert a 'sleep 5'
> > into the while-loop to unblock 'nft list ruleset' again. A new rule
> > every 4s especially in such a large ruleset is not that unrealistic IMO.
>
> Several seconds is very strange indeed. How large is the data that
> needs to be transferred to userspace, and how large is the buffer
> provided during dumps? strace would help here.
>
> If that's rather small, then dumping a chain with 10k rules may
> have to re-iterate the existing list for a long time before it finds
> the starting point from which to resume the dump.
>
> > I wonder if we can provide some fairness to readers? Ideally a reader
> > would just see the ruleset as it was when it started dumping, but
> > keeping a copy of the large ruleset is probably not feasible.
>
> I can't think of a good solution. We could add a
> "--allow-inconsistent-dump" flag to nftables that disables the restart
> logic for -EINTR case, but we can't make that the default unfortunately.
>
> Or we could experiment with serializing the remaining rules into a
> private kernel-side kmalloc'd buffer once the userspace buffer is
> full, then copy from that buffer on resume without the inconsistency check.
>
> I don't think that we can solve this; slowing down writers when there
> are dumpers will lead to the same issue, just in the opposite direction.

There are currently two pending issues that, if addressed, could
improve things:

NLM_F_DUMP_INTR is set when a writer interferes with a reader, which
currently forces userspace to read all of the remaining messages to
leave things in a consistent state; otherwise the next dump request hits
EILSEQ in libmnl. Before 6d085b22a8b5 ("table: support for the table
owner flag"), the socket was closed and reopened to work around this
issue. There should be a way to discard the ongoing netlink dump
without having to read the remaining messages; that should also
improve things.
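
To illustrate why draining matters, here is a rough toy model in Python,
not kernel or libmnl code. It assumes time advances one tick per dump
batch and that the writer bumps the generation every writer_period
ticks. The function name simulate_dump and all of its parameters are
made up for this sketch; drain=False stands for the hypothetical
"discard the ongoing dump" behaviour, which does not exist today.

```python
def simulate_dump(num_batches, writer_period, start_tick=0,
                  drain=True, max_attempts=5):
    """Toy model of a reader restarting an interrupted dump.

    Returns (attempt_that_succeeded or None, total_batches_read).
    All names here are hypothetical; this is not a real API.
    """
    tick = start_tick
    batches_read = 0
    for attempt in range(1, max_attempts + 1):
        # Generation seen when this dump attempt starts.
        start_gen = tick // writer_period
        interrupted = False
        for _ in range(num_batches):
            tick += 1
            batches_read += 1
            if tick // writer_period != start_gen:
                # Models the kernel flagging the dump as interrupted.
                interrupted = True
                if not drain:
                    # Hypothetical: abandon the dump right away.
                    break
                # Otherwise keep reading the remaining batches.
        if not interrupted:
            return attempt, batches_read
    return None, batches_read


if __name__ == "__main__":
    # Writer every 12 ticks, dump needs 10 batches, reader starts at tick 5.
    print(simulate_dump(10, 12, start_tick=5, drain=True))
    print(simulate_dump(10, 12, start_tick=5, drain=False))
    # Writer faster than a full dump: the reader never finishes.
    print(simulate_dump(10, 4, drain=True))
```

In this model, discarding early lets the reader succeed sooner and read
fewer batches, but it does not help when the writer period is shorter
than one complete dump: the reader still starves, it just wastes less
work per attempt.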

It should be possible to add generation counters per object type, so
userspace does not have to ditch everything it has in its cache, only
what has changed. Currently the generation counter is global.
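
A minimal sketch of the per-object-type idea, again in Python and purely
illustrative: Ruleset stands in for the kernel side and Cache for the
userspace cache. The class names, the list of object types and the
refresh() method are assumptions for this example and do not reflect
the existing nf_tables or nft code.

```python
OBJECT_TYPES = ("table", "chain", "rule", "set")  # assumed subset


class Ruleset:
    """Stand-in for the kernel side: one generation counter per type."""

    def __init__(self):
        self.gen = {t: 0 for t in OBJECT_TYPES}
        self.objects = {t: [] for t in OBJECT_TYPES}

    def add(self, obj_type, name):
        self.objects[obj_type].append(name)
        self.gen[obj_type] += 1  # only this type's counter moves


class Cache:
    """Stand-in for the userspace cache."""

    def __init__(self):
        self.gen = {}
        self.objects = {}

    def refresh(self, ruleset):
        """Re-fetch only the types whose generation changed.

        Returns the list of object types that had to be dumped again.
        """
        stale = [t for t in OBJECT_TYPES
                 if self.gen.get(t) != ruleset.gen[t]]
        for t in stale:
            self.objects[t] = list(ruleset.objects[t])
            self.gen[t] = ruleset.gen[t]
        return stale


if __name__ == "__main__":
    rs = Ruleset()
    rs.add("table", "filter")
    rs.add("chain", "input")
    cache = Cache()
    print(cache.refresh(rs))  # first fill: every type is fetched
    rs.add("rule", "accept-ssh")
    print(cache.refresh(rs))  # only rules are fetched again
```

With a single global counter, the second refresh would have to drop and
re-dump tables, chains and sets as well, even though only a rule was
added.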