From: Phil Sutter <phil@nwl.cc>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>, netfilter-devel@vger.kernel.org
Subject: Re: [nf PATCH 2/5] netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
Date: Tue, 26 Sep 2023 15:59:15 +0200
Message-ID: <ZRLjs5Rv91mJWbC0@orbyte.nwl.cc>
In-Reply-To: <ZRLd8MxWZMt3O/Yh@calendula>

On Tue, Sep 26, 2023 at 03:34:40PM +0200, Pablo Neira Ayuso wrote:
> On Tue, Sep 26, 2023 at 02:14:05PM +0200, Phil Sutter wrote:
> > On Tue, Sep 26, 2023 at 12:09:35PM +0200, Pablo Neira Ayuso wrote:
> > > Hi Phil,
> > >
> > > On Tue, Sep 26, 2023 at 11:34:43AM +0200, Phil Sutter wrote:
> > > > On Mon, Sep 25, 2023 at 09:53:17PM +0200, Florian Westphal wrote:
> > > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > > On Sat, Sep 23, 2023 at 06:18:13PM +0200, Florian Westphal wrote:
> > > > > > > > callback_that_might_reset()
> > > > > > > > {
> > > > > > > >     try_module_get ...
> > > > > > > >     rcu_read_unlock()
> > > > > > > >     mutex_lock(net->commit_mutex)
> > > > > > > >     dumper();
> > > > > > > >     mutex_unlock(net->commit_mutex)
> > > > > > > >     rcu_read_lock();
> > > > > > > >     module_put()
> > > > > > > > }
> > > > > > >
> > > > > > > should do the trick.
> > > > > >
> > > > > > Idiom above LGTM, *except for net->commit_mutex*. Please do not use
> > > > > > ->commit_mutex: This will stall ruleset updates for no reason, netlink
> > > > > > dump would grab and release such mutex for each netlink_recvmsg() call
> > > > > > and netlink dump side will always retry because of NLM_F_EINTR.
> > > > >
> > > > > It will stall updates, but for good reason: we are making changes to the
> > > > > expressions state.
> > > >
> > > > This also disqualifies the use of Pablo's suggested reset_lock, right?
> > >
> > > Quick summary:
> > >
> > > We are currently discussing if it makes sense to add a new lock or
> > > > not. The commit_mutex stalls updates, but netlink dumps retrieve
> > > > listings in chunks, that is, one recvmsg() call from userspace (to
> > > > retrieve one list chunk) will grab the mutex then release it until the
> > > > next recvmsg() call is done. Between these two calls an update is
> > > > still possible. The question is if it is worth stalling an ongoing
> > > listing or updates.
> >
> > Thanks for the summary. Assuming that a blocked commit will only be
> > postponed until after the current chunk was filled and is being
> > submitted to user space, I don't see how it would make a practical
> > difference for reset command if commit_mutex is used instead of
> > reset_lock (or a dedicated reset_mutex).
>
> If the problem we are addressing is two processes listing the ruleset
> and concurrently resetting stateful expressions, then there is no difference.
> However, this is stalling writers and I don't think we need this
> according to the problem description.

ACK. Maybe Florian has a case in mind which requires serializing reset
and commit?

> Another point to consider is how likely this list-and-reset happens.
> If it is unlikely, then commit_mutex should be fine. But if we expect
> a process polling to fetch counters very often, this will introduce an
> unnecessary interference with writers. In a very dynamic deployment,
> with frequent transaction updates, that might stall transactions for
> no reason.

I wonder if hardcore 'reset the rules fetching counters' users care
about performance of ruleset changes at the same time. Maybe we
shouldn't care until someone complains?

> Please note that using commit_mutex will *not* change the fact that
> userspace has to properly deal with NLM_F_EINTR, and I am 100% sure
> you told me last time when you submitted this that you would prefer to
> fix that once you get a ticket^H^H^H^H^H^H complaint from someone else.
> Oh, and I accepted that deal.

Sorry, I don't recall. What did I promise to fix once someone complains?
NLM_F_EINTR handling in user space for reset commands? Is it broken??

> Honestly, we already have things to improve on other fronts, such as
> speeding up set updates and reducing userspace memory consumption;
> much of this is userspace work.

I totally agree. And I didn't make the call for locking reset requests,
I'm just trying to answer it since it's my code that's broken in that
regard.

> > > There is the NLM_F_EINTR mechanism in place that tells that an
> > > interference has occurred while keeping the listing lockless.
> > >
> > > Unless I am missing anything, the goal is to fix two different
> > > processes that are listing at the same time, that is, two processes
> > > running a netlink dump at the same time that are resetting the
> > > stateful expressions in the ruleset.
> >
> > Here's a simple repro I use to verify the locking approach (only rule
> > reset for now):
> >
> > | set -e
> > |
> > | RULESET='flush ruleset
> > | table t {
> > |     chain c {
> > |         counter packets 23 bytes 42
> > |     }
> > | }'
> > |
> > | trap "$NFT list ruleset" EXIT
> > | for ((i = 0; i < 10000; i++)); do
> > |     echo "iter $i"
> > |     $NFT -f - <<< "$RULESET"
> > |     $NFT list ruleset | grep -q 'packets 23 bytes 42'
> > |     $NFT reset rules >/dev/null &
> > |     pid=$!
> > |     $NFT reset rules >/dev/null
> > |     wait $pid
> > |     #$NFT list ruleset | grep 'packets'
> > |     $NFT list ruleset | grep -q 'packets 0 bytes 0'
> > | done
> >
> > If the two calls clash, the rule will have huge counter values due to
> > underflow.
>
> Can you give a try with the reset_lock spinlock approach with this
> script that exercises worst case?

It passes with this series applied. It just takes a long time to finish
(due to the 10k iterations). If it triggers, it usually does within ~100
tries. But it depends, and I don't know how to increase the chances.
Otherwise I would have put this in a kselftest.

Cheers, Phil
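
For illustration only, here is a user-space sketch of why two clashing
reset requests can leave huge counter values behind. It is not kernel
code. It assumes that a reset subtracts the value that was just dumped
from the counter, which is what the underflow remark in the thread
suggests. The names struct counter, counter_fetch(), counter_reset() and
reset_twice() are hypothetical, and the interleaving is spelled out by
hand instead of relying on real threads so that the result is
deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state of a stateful expression. */
struct counter {
	uint64_t packets;
};

/* "Dump" step: report the current value to the requester. */
static uint64_t counter_fetch(const struct counter *c)
{
	return c->packets;
}

/* "Reset" step: subtract what was just reported (assumed behaviour). */
static void counter_reset(struct counter *c, uint64_t reported)
{
	c->packets -= reported;
}

/*
 * Model two reset requests hitting the same counter.
 *
 * serialized == true:  each request runs fetch + reset as one unit,
 *                      as it would while holding a common lock.
 * serialized == false: both requests fetch before either one resets,
 *                      which is the clash described in the thread.
 *
 * Returns the counter value left behind afterwards.
 */
static uint64_t reset_twice(uint64_t initial, bool serialized)
{
	struct counter c = { .packets = initial };
	uint64_t a, b;

	if (serialized) {
		a = counter_fetch(&c);
		counter_reset(&c, a);
		b = counter_fetch(&c);
		counter_reset(&c, b);
	} else {
		a = counter_fetch(&c);
		b = counter_fetch(&c);
		counter_reset(&c, a);
		counter_reset(&c, b);	/* subtracts the same amount again */
	}
	return c.packets;
}
```

With the values from the repro script, reset_twice(23, true) leaves the
counter at 0, while reset_twice(23, false) subtracts 23 twice and wraps
the unsigned 64-bit value around to 2^64 - 23. Any lock that makes fetch
and reset one unit avoids this. Which lock to use is the open question
in this thread, and the sketch does not answer it.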