From: Phil Sutter <phil@nwl.cc>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>, netfilter-devel@vger.kernel.org
Subject: Re: [nf PATCH 2/5] netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests
Date: Tue, 26 Sep 2023 15:59:15 +0200
Message-ID: <ZRLjs5Rv91mJWbC0@orbyte.nwl.cc>
In-Reply-To: <ZRLd8MxWZMt3O/Yh@calendula>
On Tue, Sep 26, 2023 at 03:34:40PM +0200, Pablo Neira Ayuso wrote:
> On Tue, Sep 26, 2023 at 02:14:05PM +0200, Phil Sutter wrote:
> > On Tue, Sep 26, 2023 at 12:09:35PM +0200, Pablo Neira Ayuso wrote:
> > > Hi Phil,
> > >
> > > On Tue, Sep 26, 2023 at 11:34:43AM +0200, Phil Sutter wrote:
> > > > On Mon, Sep 25, 2023 at 09:53:17PM +0200, Florian Westphal wrote:
> > > > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > > > On Sat, Sep 23, 2023 at 06:18:13PM +0200, Florian Westphal wrote:
> > > > > > > callback_that_might_reset()
> > > > > > > {
> > > > > > > try_module_get ...
> > > > > > > rcu_read_unlock()
> > > > > > > mutex_lock(net->commit_mutex)
> > > > > > > dumper();
> > > > > > > mutex_unlock(net->commit_mutex)
> > > > > > > rcu_read_lock();
> > > > > > > module_put()
> > > > > > > }
> > > > > > >
> > > > > > > should do the trick.
> > > > > >
> > > > > > Idiom above LGTM, *except for net->commit_mutex*. Please do not use
> > > > > > ->commit_mutex: This will stall ruleset updates for no reason, netlink
> > > > > > dump would grab and release such mutex for each netlink_recvmsg() call
> > > > > > and netlink dump side will always retry because of NLM_F_EINTR.
> > > > >
> > > > > It will stall updates, but for good reason: we are making changes to the
> > > > > expressions state.
> > > >
> > > > This also disqualifies the use of Pablo's suggested reset_lock, right?
> > >
> > > Quick summary:
> > >
> > > We are currently discussing whether it makes sense to add a new lock
> > > or not. The commit_mutex stalls updates, but netlink dumps retrieve
> > > listings in chunks, that is, one recvmsg() call from userspace (to
> > > retrieve one list chunk) will grab the mutex and then release it; it
> > > is not taken again until the next recvmsg() call. Between these two
> > > calls an update is still possible. The question is whether it is
> > > worth stalling an ongoing listing or updates.
> >
> > Thanks for the summary. Assuming that a blocked commit will only be
> > postponed until after the current chunk was filled and is being
> > submitted to user space, I don't see how it would make a practical
> > difference for reset command if commit_mutex is used instead of
> > reset_lock (or a dedicated reset_mutex).
>
> If the problem we are addressing is two processes listing the ruleset
> and concurrently resetting stateful expressions, then there is no
> difference. However, this is stalling writers and I don't think we need
> this according to the problem description.
ACK. Maybe Florian has a case in mind which requires serializing reset
and commit?
> Another point to consider is how likely this list-and-reset happens.
> If it is unlikely, then commit_mutex should be fine. But if we expect
> a process polling to fetch counters very often, this will introduce an
> unnecessary interference with writers. In a very dynamic deployment,
> with frequent transaction updates, that might stall transactions for
> no reason.
I wonder if hardcore users of 'reset rules' for fetching counters care
about the performance of concurrent ruleset changes. Maybe we shouldn't
care until someone complains?
> Please note that using commit_mutex will *not* remove the need for
> userspace to properly deal with NLM_F_EINTR either, and I am 100% sure
> you told me last time when you submitted this that you would prefer to
> fix that once you get a ticket^H^H^H^H^H^H complaint from someone else.
> Oh, and I accepted that deal.
Sorry, I don't recall. What did I promise to fix once someone complains?
NLM_F_EINTR handling in user space for reset commands? Is it broken??
> Honestly, we already have things to improve on other fronts, such as
> speeding up set updates and reducing userspace memory consumption,
> much of which is userspace work.
I totally agree. And I didn't make the call for locking reset requests,
I'm just trying to answer it since it's my code that's broken in that
regard.
> > > There is the NLM_F_EINTR mechanism in place that signals that
> > > interference has occurred while keeping the listing lockless.
> > >
> > > Unless I am missing anything, the goal is to fix two different
> > > processes that are listing at the same time, that is, two processes
> > > running a netlink dump at the same time that are resetting the
> > > stateful expressions in the ruleset.
> >
> > Here's a simple repro I use to verify the locking approach (only rule
> > reset for now):
> >
> > | set -e
> > |
> > | NFT=${NFT:-nft}
> > | RULESET='flush ruleset
> > | table t {
> > | 	chain c {
> > | 		counter packets 23 bytes 42
> > | 	}
> > | }'
> > |
> > | trap "$NFT list ruleset" EXIT
> > | for ((i = 0; i < 10000; i++)); do
> > | 	echo "iter $i"
> > | 	$NFT -f - <<< "$RULESET"
> > | 	$NFT list ruleset | grep -q 'packets 23 bytes 42'
> > | 	$NFT reset rules >/dev/null &
> > | 	pid=$!
> > | 	$NFT reset rules >/dev/null
> > | 	wait $pid
> > | 	#$NFT list ruleset | grep 'packets'
> > | 	$NFT list ruleset | grep -q 'packets 0 bytes 0'
> > | done
> >
> > If the two calls clash, the rule will have huge counter values due to
> > underflow.
>
> Can you give the reset_lock spinlock approach a try with this
> script, which exercises the worst case?
It passes with this series applied. It just takes a long time to finish
(due to the 10k iterations). If the race triggers, it usually does so
within ~100 iterations, but that varies and I don't know how to increase
the chances. Otherwise I would have put this in a kselftest.
Cheers, Phil
Thread overview: 25+ messages
2023-09-23 1:38 [nf PATCH 0/5] Introduce locking for reset requests Phil Sutter
2023-09-23 1:38 ` [nf PATCH 1/5] netfilter: nf_tables: Don't allocate nft_rule_dump_ctx Phil Sutter
2023-09-23 1:38 ` [nf PATCH 2/5] netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests Phil Sutter
2023-09-23 11:04 ` Florian Westphal
2023-09-23 15:03 ` Phil Sutter
2023-09-23 16:18 ` Florian Westphal
2023-09-25 9:32 ` Pablo Neira Ayuso
2023-09-25 19:53 ` Florian Westphal
2023-09-26 9:34 ` Phil Sutter
2023-09-26 10:09 ` Pablo Neira Ayuso
2023-09-26 12:14 ` Phil Sutter
2023-09-26 13:34 ` Pablo Neira Ayuso
2023-09-26 13:59 ` Phil Sutter [this message]
2023-09-27 11:41 ` Florian Westphal
2023-09-27 12:54 ` Phil Sutter
2023-09-25 11:02 ` Pablo Neira Ayuso
2023-09-25 10:47 ` Pablo Neira Ayuso
2023-09-26 9:14 ` Phil Sutter
2023-09-26 10:00 ` Pablo Neira Ayuso
2023-09-25 10:48 ` Pablo Neira Ayuso
2023-09-25 11:01 ` Pablo Neira Ayuso
2023-09-23 1:38 ` [nf PATCH 3/5] netfilter: nf_tables: Introduce struct nft_obj_dump_ctx Phil Sutter
2023-09-23 1:38 ` [nf PATCH 4/5] netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests Phil Sutter
2023-09-23 1:38 ` [nf PATCH 5/5] netfilter: nf_tables: Add locking for NFT_MSG_GETSETELEM_RESET requests Phil Sutter
2023-09-25 10:53 ` Pablo Neira Ayuso