From: Vlad Buslov <vladbu@mellanox.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Vlad Buslov <vladbu@mellanox.com>,
"pablo@netfilter.org" <pablo@netfilter.org>,
"xiyou.wangcong@gmail.com" <xiyou.wangcong@gmail.com>,
"jhs@mojatatu.com" <jhs@mojatatu.com>, mlxsw <mlxsw@mellanox.com>,
Alex Kushnarov <alexanderk@mellanox.com>,
"pabeni@redhat.com" <pabeni@redhat.com>
Subject: Re: tc tp creation performance degratation since kernel 5.1
Date: Thu, 13 Jun 2019 10:09:32 +0000 [thread overview]
Message-ID: <vbfblz123vt.fsf@mellanox.com> (raw)
In-Reply-To: <20190613081152.GC2254@nanopsycho.orion>
On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli.us> wrote:
> I made a mistake during measurements, sorry about that.
>
> This is the correct script:
> -----------------------------------------------------------------------
> #!/bin/bash
>
> dev=testdummy
> ip link add name $dev type dummy
> ip link set dev $dev up
> tc qdisc add dev $dev ingress
>
> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
> pref_id=1
>
> while [ $pref_id -lt 20000 ]
> do
> echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
> #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
> ((pref_id++))
> done
>
> start=$(date +"%s")
> tc -b $tmp_file_name
> stop=$(date +"%s")
> echo "Insertion duration: $(($stop - $start)) sec"
> rm -f $tmp_file_name
>
> ip link del dev $dev
> -----------------------------------------------------------------------
>
> Note the commented out matchall. I don't see the regression with
> matchall. However, I see that with flower:
> kernel 5.1
> Insertion duration: 4 sec
> kernel 5.2
> Insertion duration: 163 sec
>
> I don't see any significant difference in perf:
> kernel 5.1
> 77.24% tc [kernel.vmlinux] [k] tcf_chain_tp_find
> 1.67% tc [kernel.vmlinux] [k] mutex_spin_on_owner
> 1.44% tc [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
> 0.93% tc [kernel.vmlinux] [k] idr_get_free
> 0.79% tc_pref_scale_o [kernel.vmlinux] [k] do_syscall_64
> 0.69% tc [kernel.vmlinux] [k] finish_task_switch
> 0.53% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
> 0.49% tc [kernel.vmlinux] [k] __memset
> 0.36% tc_pref_scale_o libc-2.28.so [.] malloc
> 0.30% tc_pref_scale_o libc-2.28.so [.] _int_free
> 0.24% tc [kernel.vmlinux] [k] __memcpy
> 0.23% tc [cls_flower] [k] fl_change
> 0.23% tc [kernel.vmlinux] [k] __nla_validate_parse
> 0.22% tc [kernel.vmlinux] [k] __slab_alloc
>
>
> 75.57% tc [kernel.kallsyms] [k] tcf_chain_tp_find
> 2.70% tc [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> 1.13% tc_pref_scale_o [kernel.kallsyms] [k] do_syscall_64
> 0.87% tc libc-2.28.so [.] __memset_sse2_unaligned_erms
> 0.86% ip [kernel.kallsyms] [k] finish_task_switch
> 0.67% tc [kernel.kallsyms] [k] memset
> 0.63% tc [kernel.kallsyms] [k] mutex_spin_on_owner
> 0.52% tc_pref_scale_o libc-2.28.so [.] malloc
> 0.48% tc [kernel.kallsyms] [k] idr_get_free
> 0.46% tc [kernel.kallsyms] [k] fl_change
> 0.42% tc_pref_scale_o libc-2.28.so [.] _int_free
> 0.35% tc_pref_scale_o libc-2.28.so [.] __GI___strlen_sse2
> 0.35% tc_pref_scale_o libc-2.28.so [.] __mbrtowc
> 0.34% tc_pref_scale_o libc-2.28.so [.] __fcntl64_nocancel_adjusted
>
> Any ideas?
Thanks for providing reproduction script!
I've investigate the problem and found the root cause. First of all I
noticed that CPU utilization during problematic tc run is quite low
(<10%), so I decided to investigate why tc sleeps so much. I've used bcc
and obtained following off-CPU trace (uninteresting traces are omitted
for brevity):
~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
...
finish_task_switch
__sched_text_start
schedule
schedule_timeout
wait_for_completion
__wait_rcu_gp
synchronize_rcu
fl_change
tc_new_tfilter
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
sock_sendmsg
___sys_sendmsg
__sys_sendmsg
do_syscall_64
entry_SYSCALL_64_after_hwframe
- tc (2069)
142284953
As you can see 142 seconds are spent sleeping in synchronize_rcu(). The
code is in fl_create_new_mask() function:
err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
&newmask->ht_node, mask_ht_params);
if (err)
goto errout_destroy;
/* Wait until any potential concurrent users of mask are finished */
synchronize_rcu();
The justification for this is described in comment in
fl_check_assign_mask() (user of fl_create_new_mask()):
/* Insert mask as temporary node to prevent concurrent creation of mask
* with same key. Any concurrent lookups with same key will return
* -EAGAIN because mask's refcnt is zero. It is safe to insert
* stack-allocated 'mask' to masks hash table because we call
* synchronize_rcu() before returning from this function (either in case
* of error or after replacing it with heap-allocated mask in
* fl_create_new_mask()).
*/
fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
&mask->ht_node,
mask_ht_params);
The offending commit is part of my series that implements unlocked
flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
insertion")
The justification presented in it is no longer relevant since Ivan
Vecera changed mask to be dynamically allocated in commit 2cddd2014782
("net/sched: cls_flower: allocate mask dynamically in fl_change()").
With this we can just change fl_change() to deallocate temporary mask
with rcu grace period and remove offending syncrhonize_rcu() call.
Any other suggestions?
next prev parent reply other threads:[~2019-06-13 15:38 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-12 12:03 tc tp creation performance degratation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
2019-06-13 4:50 ` Jiri Pirko
2019-06-12 12:34 ` Vlad Buslov
2019-06-13 5:49 ` Jiri Pirko
2019-06-13 8:11 ` Jiri Pirko
2019-06-13 10:09 ` Vlad Buslov [this message]
2019-06-13 11:11 ` Jiri Pirko
2019-06-13 11:26 ` Vlad Buslov
2019-06-13 14:18 ` Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=vbfblz123vt.fsf@mellanox.com \
--to=vladbu@mellanox.com \
--cc=alexanderk@mellanox.com \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=mlxsw@mellanox.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pablo@netfilter.org \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.