From: Davide Caratti <dcaratti@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jiri Pirko <jiri@resnulli.us>,
"David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: Re: [PATCH net-next 2/2] net/sched: act_police: don't use spinlock in the data path
Date: Fri, 16 Nov 2018 12:28:54 +0100 [thread overview]
Message-ID: <e7e539deac5573c913ed05cfd85bf5eb8533be19.camel@redhat.com> (raw)
In-Reply-To: <6ef73ea1-2f0e-b1c5-3a0a-1742db67b59a@gmail.com>
On Thu, 2018-11-15 at 05:53 -0800, Eric Dumazet wrote:
>
> On 11/15/2018 03:43 AM, Davide Caratti wrote:
> > On Wed, 2018-11-14 at 22:46 -0800, Eric Dumazet wrote:
> > > On 09/13/2018 10:29 AM, Davide Caratti wrote:
> > > > use RCU instead of spinlocks, to protect concurrent read/write on
> > > > act_police configuration. This reduces the effects of contention in the
> > > > data path, in case multiple readers are present.
> > > >
> > > > Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> > > > ---
> > > > net/sched/act_police.c | 156 ++++++++++++++++++++++++-----------------
> > > > 1 file changed, 92 insertions(+), 64 deletions(-)
> > > >
> > >
> > > I must be missing something obvious with this patch.
> >
> > hello Eric,
> >
> > On the opposite, I missed something obvious when I wrote that patch: there
> > is a race condition on tcfp_toks, tcfp_ptoks and tcfp_t_c: thank you for
> > noticing it.
> >
> > These variables still need to be protected with a spinlock. I will do a
> > patch and evaluate if 'act_police' is still faster than a version where
> > 2d550dbad83c ("net/sched: .... ") is reverted, and share results in the
> > next hours.
> >
> > Ok?
> >
>
> SGTM, thanks.
hello,
I just finished the comparison of act_police, in the following cases:
a) revert the RCU-ification (i.e. commit 2d550dbad83c ("net/sched:
act_police: don't use spinlock in the data path"), and leave per-cpu
counters used by the rate estimator
b) keep RCU-ified configuration parameters, and protect read/update of
tcfp_toks, tcfp_ptoks and tcfp_t with a spinlock (code at the bottom of
this message).
## Test setup:
$DEV is a 'dummy' with clsact qdisc; the following two commands,
# test police with 'rate'
$TC filter add dev $DEV egress matchall \
action police rate 2gbit burst 100k conform-exceed pass/pass index 100
# test police with 'avrate'
$TC filter add dev prova egress estimator 1s 8s matchall \
action police avrate 2gbit conform-exceed pass/pass index 100
are tested with the following loop:
for c in 1 2 4 8 16; do
./pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i $DEV
done
## Test results:
using rate | reverted | patched
$c | act_police (a) | act_police (b)
-------------+----------------+---------------
1 | 3364442 | 3345580
2 | 2703030 | 2721919
4 | 1130146 | 1253555
8 | 664238 | 658777
16 | 154026 | 155259
using avrate | reverted | patched
$c | act_police (a) | act_police (b)
-------------+----------------+---------------
1 | 3621796 | 3658472
2 | 3075589 | 3548135
4 | 2313314 | 3343717
8 | 768458 | 3260480
16 | 177776 | 3254128
so, 'avrate' still gets a significant improvement because the 'conform/exceed'
decision doesn't need the spinlock in this case. The estimation is probably
less accurate, because it use per-CPU variables: if this is not acceptable,
then we need to revert also 93be42f9173b ("net/sched: act_police: use per-cpu
counters").
## patch code:
-- >8 --
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 052855d..42db852 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -27,10 +27,7 @@ struct tcf_police_params {
u32 tcfp_ewma_rate;
s64 tcfp_burst;
u32 tcfp_mtu;
- s64 tcfp_toks;
- s64 tcfp_ptoks;
s64 tcfp_mtu_ptoks;
- s64 tcfp_t_c;
struct psched_ratecfg rate;
bool rate_present;
struct psched_ratecfg peak;
@@ -40,6 +37,9 @@ struct tcf_police_params {
struct tcf_police {
struct tc_action common;
+ s64 tcfp_toks;
+ s64 tcfp_ptoks;
+ s64 tcfp_t_c;
struct tcf_police_params __rcu *params;
};
@@ -186,12 +186,6 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
}
new->tcfp_burst = PSCHED_TICKS2NS(parm->burst);
- new->tcfp_toks = new->tcfp_burst;
- if (new->peak_present) {
- new->tcfp_mtu_ptoks = (s64)psched_l2t_ns(&new->peak,
- new->tcfp_mtu);
- new->tcfp_ptoks = new->tcfp_mtu_ptoks;
- }
if (tb[TCA_POLICE_AVRATE])
new->tcfp_ewma_rate = nla_get_u32(tb[TCA_POLICE_AVRATE]);
@@ -207,7 +201,14 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
}
spin_lock_bh(&police->tcf_lock);
- new->tcfp_t_c = ktime_get_ns();
+ police->tcfp_t_c = ktime_get_ns();
+ police->tcfp_toks = new->tcfp_burst;
+ if (new->peak_present) {
+ new->tcfp_mtu_ptoks = (s64)psched_l2t_ns(&new->peak,
+ new->tcfp_mtu);
+ police->tcfp_ptoks = new->tcfp_mtu_ptoks;
+ }
+
police->tcf_action = parm->action;
rcu_swap_protected(police->params,
new,
@@ -255,27 +256,29 @@ static int tcf_police_act(struct sk_buff *skb, const struct tc_action *a,
ret = p->tcfp_result;
goto end;
}
-
+ spin_lock_bh(&police->tcf_lock);
now = ktime_get_ns();
- toks = min_t(s64, now - p->tcfp_t_c, p->tcfp_burst);
+ toks = min_t(s64, now - police->tcfp_t_c, p->tcfp_burst);
if (p->peak_present) {
- ptoks = toks + p->tcfp_ptoks;
+ ptoks = toks + police->tcfp_ptoks;
if (ptoks > p->tcfp_mtu_ptoks)
ptoks = p->tcfp_mtu_ptoks;
ptoks -= (s64)psched_l2t_ns(&p->peak,
qdisc_pkt_len(skb));
}
- toks += p->tcfp_toks;
+ toks += police->tcfp_toks;
if (toks > p->tcfp_burst)
toks = p->tcfp_burst;
toks -= (s64)psched_l2t_ns(&p->rate, qdisc_pkt_len(skb));
if ((toks|ptoks) >= 0) {
- p->tcfp_t_c = now;
- p->tcfp_toks = toks;
- p->tcfp_ptoks = ptoks;
+ police->tcfp_t_c = now;
+ police->tcfp_toks = toks;
+ police->tcfp_ptoks = ptoks;
+ spin_unlock_bh(&police->tcf_lock);
ret = p->tcfp_result;
goto inc_drops;
}
+ spin_unlock_bh(&police->tcf_lock);
}
inc_overlimits:
-- >8 --
any feedback is appreciated.
thanks!
--
davide
next prev parent reply other threads:[~2018-11-16 21:40 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-13 17:29 [PATCH net-next 0/2] net/sched: act_police: lockless data path Davide Caratti
2018-09-13 17:29 ` [PATCH net-next 1/2] net/sched: act_police: use per-cpu counters Davide Caratti
2018-09-13 17:29 ` [PATCH net-next 2/2] net/sched: act_police: don't use spinlock in the data path Davide Caratti
2018-11-15 6:46 ` Eric Dumazet
2018-11-15 11:43 ` Davide Caratti
2018-11-15 13:53 ` Eric Dumazet
2018-11-16 11:28 ` Davide Caratti [this message]
2018-11-16 14:34 ` Eric Dumazet
2018-11-16 14:39 ` Eric Dumazet
2018-11-16 14:41 ` David Laight
2018-11-16 14:55 ` Eric Dumazet
2018-09-16 22:32 ` [PATCH net-next 0/2] net/sched: act_police: lockless " David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e7e539deac5573c913ed05cfd85bf5eb8533be19.camel@redhat.com \
--to=dcaratti@redhat.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).