From: Florian Westphal <fw@strlen.de>
To: scott.k.mitch1@gmail.com
Cc: netfilter-devel@vger.kernel.org, pablo@netfilter.org
Subject: Re: [PATCH v6 2/2] netfilter: nfnetlink_queue: optimize verdict lookup with hash table
Date: Sun, 18 Jan 2026 00:00:07 +0100
Message-ID: <aWwUd1Z8xz5Kk30j@strlen.de>
In-Reply-To: <20260117173231.88610-3-scott.k.mitch1@gmail.com>
scott.k.mitch1@gmail.com <scott.k.mitch1@gmail.com> wrote:
> From: Scott Mitchell <scott.k.mitch1@gmail.com>
>
> The current implementation uses a linear list to find queued packets by
> ID when processing verdicts from userspace. With large queue depths and
> out-of-order verdicting, this O(n) lookup becomes a significant
> bottleneck, causing userspace verdict processing to dominate CPU time.
>
> Replace the linear search with a hash table for O(1) average-case
> packet lookup by ID. The hash table automatically resizes based on
> queue depth: grows at 75% load factor, shrinks at 25% load factor.
> To prevent rapid resize cycling during traffic bursts, shrinking only
> occurs if at least 60 seconds have passed since the last shrink.
Ouch. Can we first try something simpler rather than starting a
reimplementation of rhashtable?
Or just use a global rhashtable for this?
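Something along these lines, i.e. one table shared by everything, keyed
by packet id (kernel-side sketch only, not runnable as-is; the entry
would need a struct rhash_head instead of the hlist_node, and a real key
would have to fold in net + queue_num, not just the id):

```c
static const struct rhashtable_params nfqnl_rhash_params = {
	.key_len	= sizeof(u32),	/* packet id only; real key needs net + queue_num too */
	.key_offset	= offsetof(struct nf_queue_entry, id),
	.head_offset	= offsetof(struct nf_queue_entry, hash_node),
	.automatic_shrinking = true,	/* rhashtable handles grow/shrink for us */
};

static struct rhashtable nfqnl_hash;	/* one global table, all netns + queues */

	/* on enqueue: */
	rhashtable_insert_fast(&nfqnl_hash, &entry->hash_node, nfqnl_rhash_params);
	/* on verdict: */
	entry = rhashtable_lookup_fast(&nfqnl_hash, &id, nfqnl_rhash_params);
```

All the resize/load-factor logic in this patch then goes away.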
> Hash table memory is allocated with GFP_KERNEL_ACCOUNT so memory is
> attributed to the cgroup rather than kernel overhead.
>
> The existing list data structure is retained for operations requiring
> linear iteration (e.g. flush, device down events). Hot fields
> (queue_hash_mask, queue_hash pointer, resize state) are placed in the
> same cache line as the spinlock and packet counters for optimal memory
> access patterns.
>
> Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
> ---
> include/net/netfilter/nf_queue.h | 1 +
> net/netfilter/nfnetlink_queue.c | 237 +++++++++++++++++++++++++++++--
> 2 files changed, 229 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
> index 4aeffddb7586..3d0def310523 100644
> --- a/include/net/netfilter/nf_queue.h
> +++ b/include/net/netfilter/nf_queue.h
> @@ -11,6 +11,7 @@
> /* Each queued (to userspace) skbuff has one of these. */
> struct nf_queue_entry {
> struct list_head list;
> + struct hlist_node hash_node;
> struct sk_buff *skb;
> unsigned int id;
> unsigned int hook_index; /* index in hook_entries->hook[] */
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> index 7b2cabf08fdf..772d2a7d0d7c 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -30,6 +30,11 @@
> #include <linux/netfilter/nf_conntrack_common.h>
> #include <linux/list.h>
> #include <linux/cgroup-defs.h>
> +#include <linux/workqueue.h>
> +#include <linux/jiffies.h>
> +#include <linux/log2.h>
> +#include <linux/memcontrol.h>
> +#include <linux/sched/mm.h>
> #include <net/gso.h>
> #include <net/sock.h>
> #include <net/tcp_states.h>
> @@ -46,7 +51,11 @@
> #include <net/netfilter/nf_conntrack.h>
> #endif
>
> -#define NFQNL_QMAX_DEFAULT 1024
> +#define NFQNL_QMAX_DEFAULT 1024
> +#define NFQNL_HASH_MIN_SIZE 16
> +#define NFQNL_HASH_MAX_SIZE 131072
Is there a use case for such a large table?
> +#define NFQNL_HASH_DEFAULT_SIZE NFQNL_HASH_MIN_SIZE
> +#define NFQNL_HASH_SHRINK_INTERVAL (60 * HZ) /* Only shrink every 60 seconds */
> /* We're using struct nlattr which has 16bit nla_len. Note that nla_len
> * includes the header length. Thus, the maximum packet length that we
> @@ -59,6 +68,11 @@
> struct nfqnl_instance {
> struct hlist_node hlist; /* global list of queues */
> struct rcu_head rcu;
> + struct work_struct destroy_work;
> + struct work_struct resize_work;
> +#ifdef CONFIG_MEMCG
> + struct mem_cgroup *resize_memcg;
> +#endif
I feel this is way too complicated and over-the-top.
Can we either
1). use a global rhashtable, shared by all netns + all queues (so we
have no extra memory tied down per queue).
OR
2). Try with a simple, statically sized hash table (16? 32? 64?) without
any magic resizing?
And, if we go route 2), how much confidence is there that it's good
enough?
Because if you already suspect you need all this extra grow/shrink logic,
then 1) is my preferred choice.
What is the deal-breaker wrt. rhashtable so that one would start to
reimplement the features it already offers?
Thread overview: 14+ messages
2026-01-17 17:32 [PATCH v6 0/2] netfilter: nfnetlink_queue: optimize verdict lookup with hash table scott.k.mitch1
2026-01-17 17:32 ` [PATCH v6 1/2] netfilter: nfnetlink_queue: nfqnl_instance GFP_ATOMIC -> GFP_KERNEL_ACCOUNT allocation scott.k.mitch1
2026-01-17 22:45 ` Florian Westphal
2026-01-17 23:25 ` Scott Mitchell
2026-01-19 0:39 ` Florian Westphal
2026-01-23 14:02 ` Scott Mitchell
2026-01-17 17:32 ` [PATCH v6 2/2] netfilter: nfnetlink_queue: optimize verdict lookup with hash table scott.k.mitch1
2026-01-17 23:00 ` Florian Westphal [this message]
2026-01-21 15:25 ` Scott Mitchell
2026-01-21 15:49 ` Florian Westphal
2026-01-23 1:58 ` Scott Mitchell
2026-01-23 6:54 ` Florian Westphal
2026-01-23 13:38 ` Scott Mitchell
2026-01-24 16:48 ` Florian Westphal