All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@mellanox.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Vlad Buslov <vladbu@mellanox.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"pablo@netfilter.org" <pablo@netfilter.org>,
	"xiyou.wangcong@gmail.com" <xiyou.wangcong@gmail.com>,
	"jhs@mojatatu.com" <jhs@mojatatu.com>, mlxsw <mlxsw@mellanox.com>,
	Alex Kushnarov <alexanderk@mellanox.com>,
	"pabeni@redhat.com" <pabeni@redhat.com>
Subject: Re: tc tp creation performance degratation since kernel 5.1
Date: Thu, 13 Jun 2019 11:26:17 +0000	[thread overview]
Message-ID: <vbfa7el20ff.fsf@mellanox.com> (raw)
In-Reply-To: <20190613111135.GA2201@nanopsycho.orion>


On Thu 13 Jun 2019 at 14:11, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, Jun 13, 2019 at 12:09:32PM CEST, vladbu@mellanox.com wrote:
>>On Thu 13 Jun 2019 at 11:11, Jiri Pirko <jiri@resnulli.us> wrote:
>>> I made a mistake during measurements, sorry about that.
>>>
>>> This is the correct script:
>>> -----------------------------------------------------------------------
>>> #!/bin/bash
>>>
>>> dev=testdummy
>>> ip link add name $dev type dummy
>>> ip link set dev $dev up
>>> tc qdisc add dev $dev ingress
>>>
>>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>>> pref_id=1
>>>
>>> while [ $pref_id -lt 20000 ]
>>> do
>>>         echo "filter add dev $dev ingress proto ip pref $pref_id flower action drop" >> $tmp_file_name
>>>         #echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>>>         ((pref_id++))
>>> done
>>>
>>> start=$(date +"%s")
>>> tc -b $tmp_file_name
>>> stop=$(date +"%s")
>>> echo "Insertion duration: $(($stop - $start)) sec"
>>> rm -f $tmp_file_name
>>>
>>> ip link del dev $dev
>>> -----------------------------------------------------------------------
>>>
>>> Note the commented out matchall. I don't see the regression with
>>> matchall. However, I see that with flower:
>>> kernel 5.1
>>> Insertion duration: 4 sec
>>> kernel 5.2
>>> Insertion duration: 163 sec
>>>
>>> I don't see any significant difference in perf:
>>> kernel 5.1
>>>     77.24%  tc               [kernel.vmlinux]    [k] tcf_chain_tp_find
>>>      1.67%  tc               [kernel.vmlinux]    [k] mutex_spin_on_owner
>>>      1.44%  tc               [kernel.vmlinux]    [k] _raw_spin_unlock_irqrestore
>>>      0.93%  tc               [kernel.vmlinux]    [k] idr_get_free
>>>      0.79%  tc_pref_scale_o  [kernel.vmlinux]    [k] do_syscall_64
>>>      0.69%  tc               [kernel.vmlinux]    [k] finish_task_switch
>>>      0.53%  tc               libc-2.28.so        [.] __memset_sse2_unaligned_erms
>>>      0.49%  tc               [kernel.vmlinux]    [k] __memset
>>>      0.36%  tc_pref_scale_o  libc-2.28.so        [.] malloc
>>>      0.30%  tc_pref_scale_o  libc-2.28.so        [.] _int_free
>>>      0.24%  tc               [kernel.vmlinux]    [k] __memcpy
>>>      0.23%  tc               [cls_flower]        [k] fl_change
>>>      0.23%  tc               [kernel.vmlinux]    [k] __nla_validate_parse
>>>      0.22%  tc               [kernel.vmlinux]    [k] __slab_alloc
>>>
>>>
>>>     75.57%  tc               [kernel.kallsyms]  [k] tcf_chain_tp_find
>>>      2.70%  tc               [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>>      1.13%  tc_pref_scale_o  [kernel.kallsyms]  [k] do_syscall_64
>>>      0.87%  tc               libc-2.28.so       [.] __memset_sse2_unaligned_erms
>>>      0.86%  ip               [kernel.kallsyms]  [k] finish_task_switch
>>>      0.67%  tc               [kernel.kallsyms]  [k] memset
>>>      0.63%  tc               [kernel.kallsyms]  [k] mutex_spin_on_owner
>>>      0.52%  tc_pref_scale_o  libc-2.28.so       [.] malloc
>>>      0.48%  tc               [kernel.kallsyms]  [k] idr_get_free
>>>      0.46%  tc               [kernel.kallsyms]  [k] fl_change
>>>      0.42%  tc_pref_scale_o  libc-2.28.so       [.] _int_free
>>>      0.35%  tc_pref_scale_o  libc-2.28.so       [.] __GI___strlen_sse2
>>>      0.35%  tc_pref_scale_o  libc-2.28.so       [.] __mbrtowc
>>>      0.34%  tc_pref_scale_o  libc-2.28.so       [.] __fcntl64_nocancel_adjusted
>>>
>>> Any ideas?
>>
>>Thanks for providing reproduction script!
>>
>>I've investigate the problem and found the root cause. First of all I
>>noticed that CPU utilization during problematic tc run is quite low
>>(<10%), so I decided to investigate why tc sleeps so much. I've used bcc
>>and obtained following off-CPU trace (uninteresting traces are omitted
>>for brevity):
>>
>>~$ sudo /usr/share/bcc/tools/offcputime -K -p `pgrep -nx tc`
>>Tracing off-CPU time (us) of PID 2069 by kernel stack... Hit Ctrl-C to end.
>>...
>>    finish_task_switch
>>    __sched_text_start
>>    schedule
>>    schedule_timeout
>>    wait_for_completion
>>    __wait_rcu_gp
>>    synchronize_rcu
>>    fl_change
>>    tc_new_tfilter
>>    rtnetlink_rcv_msg
>>    netlink_rcv_skb
>>    netlink_unicast
>>    netlink_sendmsg
>>    sock_sendmsg
>>    ___sys_sendmsg
>>    __sys_sendmsg
>>    do_syscall_64
>>    entry_SYSCALL_64_after_hwframe
>>    -                tc (2069)
>>        142284953
>>
>>As you can see 142 seconds are spent sleeping in synchronize_rcu(). The
>>code is in fl_create_new_mask() function:
>>
>>	err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
>>				      &newmask->ht_node, mask_ht_params);
>>	if (err)
>>		goto errout_destroy;
>>
>>	/* Wait until any potential concurrent users of mask are finished */
>>	synchronize_rcu();
>>
>>The justification for this is described in comment in
>>fl_check_assign_mask() (user of fl_create_new_mask()):
>>
>>	/* Insert mask as temporary node to prevent concurrent creation of mask
>>	 * with same key. Any concurrent lookups with same key will return
>>	 * -EAGAIN because mask's refcnt is zero. It is safe to insert
>>	 * stack-allocated 'mask' to masks hash table because we call
>>	 * synchronize_rcu() before returning from this function (either in case
>>	 * of error or after replacing it with heap-allocated mask in
>>	 * fl_create_new_mask()).
>>	 */
>>	fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
>>						       &mask->ht_node,
>>						       mask_ht_params);
>>
>>The offending commit is part of my series that implements unlocked
>>flower: 195c234d15c9 ("net: sched: flower: handle concurrent mask
>>insertion")
>>
>>The justification presented in it is no longer relevant since Ivan
>>Vecera changed mask to be dynamically allocated in commit 2cddd2014782
>>("net/sched: cls_flower: allocate mask dynamically in fl_change()").
>>With this we can just change fl_change() to deallocate temporary mask
>>with rcu grace period and remove offending syncrhonize_rcu() call.
>>
>>Any other suggestions?
>
> So basically you just change synchronize_rcu() to kfree_rcu(mask),
>    correct?

Not really. I remove synchronize_rcu() and change all kfree(mask) in
fl_change() to tcf_queue_work(&mask->rwork, fl_mask_free_work) which
uses queue_rcu_work() internally. This would allow us to deallocate
fl_flow_mask in same manner on all code paths and doesn't require any
extensions in fl_flow_mask struct (kfree_rcu would require extending it
with rcu_head).

  reply	other threads:[~2019-06-13 15:29 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-12 12:03 tc tp creation performance degratation since kernel 5.1 Jiri Pirko
2019-06-12 12:30 ` Paolo Abeni
2019-06-13  4:50   ` Jiri Pirko
2019-06-12 12:34 ` Vlad Buslov
2019-06-13  5:49   ` Jiri Pirko
2019-06-13  8:11 ` Jiri Pirko
2019-06-13 10:09   ` Vlad Buslov
2019-06-13 11:11     ` Jiri Pirko
2019-06-13 11:26       ` Vlad Buslov [this message]
2019-06-13 14:18         ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vbfa7el20ff.fsf@mellanox.com \
    --to=vladbu@mellanox.com \
    --cc=alexanderk@mellanox.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=mlxsw@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.