All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Netdev <netdev@vger.kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [RFC PATCH] net_sched: bulk free tcf_block
Date: Wed, 29 Nov 2017 19:47:52 +0100	[thread overview]
Message-ID: <1511981272.15699.18.camel@redhat.com> (raw)
In-Reply-To: <CAKgT0UeE+TE95cCTPSJQ_yN21jikRtKV3rNc=EDOB5XkWUsnKQ@mail.gmail.com>

Hi,

On Wed, 2017-11-29 at 08:14 -0800, Alexander Duyck wrote:
> On Wed, Nov 29, 2017 at 6:25 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > Currently deleting qdisc with a large number of children and filters
> > can take a lot of time:
> > 
> > tc qdisc add dev lo root htb
> > for I in `seq 1 1000`; do
> >         tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
> >         tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
> >         for J in `seq 1 10`; do
> >                 tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
> >         done
> > done
> > time tc qdisc del dev root
> > 
> > real    0m54.764s
> > user    0m0.023s
> > sys     0m0.000s
> > 
> > This is due to the multiple rcu_barrier() calls, one for each tcf_block
> > freed, invoked with the rtnl lock held. Most other network related
> > tasks will block in this timespan.
> > 
> > This change implements bulk free of tcf_block() at destroy() time:
> > when deleting a qdisc hierarchy, the to-be-deleted blocks are queued
> > in a rtnl_lock protected list, and a single rcu_barrier is invoked
> > for each burst.
> > 
> > The burst is flushed after the deletion of the topmost qdisc of the
> > destroyed hierarchy and all the queued blocks are deleted with a
> > single delayed work call.
> > 
> > After this change, the previous example gives:
> > 
> > real    0m0.193s
> > user    0m0.000s
> > sys     0m0.016s
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> While I agree there is an issue I don't think this is being approached
> quite the right way. The question I have is why aren't we using the
> standard RCU approach for this and simply using either call_rcu or
> kfree_rcu to free the object?
> 
> > ---
> > This patch adds 2 new list_head fields to tcf_block, that could be
> > replaced with a single pointer, open coding single linked list
> > manipulation in cls_api.c, if a lower memory impact is required.
> > ---
> >  include/net/pkt_cls.h     |  1 +
> >  include/net/sch_generic.h |  5 +++
> >  net/sched/cls_api.c       | 78 +++++++++++++++++++++++++++++++++++------------
> >  net/sched/sch_api.c       |  1 +
> >  net/sched/sch_generic.c   | 17 +++++++++++
> >  net/sched/sch_ingress.c   |  1 +
> >  6 files changed, 83 insertions(+), 20 deletions(-)
> > 
> > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > index 0105445cab83..12777cfae77c 100644
> > --- a/include/net/pkt_cls.h
> > +++ b/include/net/pkt_cls.h
> > @@ -45,6 +45,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
> >  void tcf_block_put(struct tcf_block *block);
> >  void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
> >                        struct tcf_block_ext_info *ei);
> > +void tcf_flush_blocks(void);
> > 
> >  static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
> >  {
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 65d0d25f2648..99846ee644a8 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -71,6 +71,7 @@ struct Qdisc {
> >                                       * qdisc_tree_decrease_qlen() should stop.
> >                                       */
> >  #define TCQ_F_INVISIBLE                0x80 /* invisible by default in dump */
> > +#define TCQ_F_DELETING         0x100
> >         u32                     limit;
> >         const struct Qdisc_ops  *ops;
> >         struct qdisc_size_table __rcu *stab;
> > @@ -279,6 +280,10 @@ struct tcf_block {
> >         struct Qdisc *q;
> >         struct list_head cb_list;
> >         struct work_struct work;
> > +
> > +       /* TODO: use a single list, do avoid wasting too much memory */
> > +       struct list_head del_list;
> > +       struct list_head del_head;
> >  };
> > 
> 
> This is just adding yet another layer of deferred freeing. We already
> have RCU why don't we just use that?
> 
> >  static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > index 7d97f612c9b9..446b16c1f532 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -37,6 +37,61 @@ static LIST_HEAD(tcf_proto_base);
> >  /* Protects list of registered TC modules. It is pure SMP lock. */
> >  static DEFINE_RWLOCK(cls_mod_lock);
> > 
> > +/* List of tcf_blocks queued for deletion. Bulk freeing them we avoid the
> > + * rcu_barrier() storm at root_qdisc->destroy() time
> > + */
> > +static LIST_HEAD(queued_blocks);
> > +
> > +static void queue_for_deletion(struct tcf_block *block)
> > +{
> > +       if (WARN_ON(!list_empty(&block->del_list)))
> > +               return;
> > +
> > +       ASSERT_RTNL();
> > +       list_add(&block->del_list, &queued_blocks);
> > +}
> > +
> > +static void flush_blocks(struct work_struct *work)
> > +{
> > +       struct tcf_block *block, *tmp_block;
> > +       struct tcf_chain *chain, *tmp;
> > +       struct list_head *head;
> > +
> > +       head = &container_of(work, struct tcf_block, work)->del_head;
> > +       rtnl_lock();
> > +       list_for_each_entry(block, head, del_list)
> > +               /* Only chain 0 should be still here. */
> > +               list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
> > +                       tcf_chain_put(chain);
> > +       rtnl_unlock();
> > +
> > +       list_for_each_entry_safe(block, tmp_block, head, del_list)
> > +               kfree(block);
> > +}
> > +
> > +void tcf_flush_blocks(void)
> > +{
> > +       struct tcf_block *head;
> > +       LIST_HEAD(flush_burst);
> > +
> > +       ASSERT_RTNL();
> > +       if (list_empty(&queued_blocks))
> > +               return;
> > +
> > +       head = list_first_entry(&queued_blocks, struct tcf_block, del_list);
> > +       INIT_LIST_HEAD(&head->del_head);
> > +       list_splice_init(&queued_blocks, &head->del_head);
> > +       INIT_WORK(&head->work, flush_blocks);
> > +
> > +       /* Wait for existing RCU callbacks to cool down, make sure their works
> > +        * have been queued before this. We can not flush pending works here
> > +        * because we are holding the RTNL lock.
> > +        */
> > +       rcu_barrier();
> > +       schedule_work(&head->work);
> > +}
> > +EXPORT_SYMBOL_GPL(tcf_flush_blocks);
> > +
> >  /* Find classifier type by string name */
> > 
> >  static const struct tcf_proto_ops *tcf_proto_lookup_ops(const char *kind)
> > @@ -288,6 +343,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
> >                 return -ENOMEM;
> >         INIT_LIST_HEAD(&block->chain_list);
> >         INIT_LIST_HEAD(&block->cb_list);
> > +       INIT_LIST_HEAD(&block->del_list);
> > 
> >         /* Create chain 0 by default, it has to be always present. */
> >         chain = tcf_chain_create(block, 0);
> > @@ -330,19 +386,6 @@ int tcf_block_get(struct tcf_block **p_block,
> >  }
> >  EXPORT_SYMBOL(tcf_block_get);
> > 
> > -static void tcf_block_put_final(struct work_struct *work)
> > -{
> > -       struct tcf_block *block = container_of(work, struct tcf_block, work);
> > -       struct tcf_chain *chain, *tmp;
> > -
> > -       rtnl_lock();
> > -       /* Only chain 0 should be still here. */
> > -       list_for_each_entry_safe(chain, tmp, &block->chain_list, list)
> > -               tcf_chain_put(chain);
> 
> So it seems like the heart of the problem is right here. The
> tcf_chain_put call is updating the reference count and I would assume
> that is the only bit that really needs you to still be holding onto
> the rtnl_lock.
> 
> The question I would have is if there is anything accessing the
> reference count or manipulating the list itself without holding the
> rtnl lock? If not you could look at converting this whole thing from
> using a list to an rculist and it seems like it would save you a lot
> of trouble. As far as I can tell the only thing you would then really
> have to worry about would be the freeing of the chain itself which
> would happen in an rcu callback instead of with the rtnl lock held.

Thank you for the feedback!

Big fat disclaimer: I'm not 100% sure I fully understood the locking
schema currently in use (ence the RFC tag), so please Jamal/Cong/Jiri
correct me if/where needed.

AFAIU, there are some more context information: tcf_block_put_final()
and the like must be invoked after that the related filters are freed.
Filter themself must be destroyed synchronously at namespace deletion
time, but must respect RCU grace period elsewhere - see comments in 
tcf_exts_get_net(). 

The rcu_barrier() enforces the proper ordering in the latter case, and
the rtnl_lock protects vs concurrency in the first case.

Cheers,

Paolo

  reply	other threads:[~2017-11-29 18:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-29 14:25 [RFC PATCH] net_sched: bulk free tcf_block Paolo Abeni
2017-11-29 16:14 ` Alexander Duyck
2017-11-29 18:47   ` Paolo Abeni [this message]
2017-12-01  7:14 ` Cong Wang
2017-12-01 11:05   ` Paolo Abeni
2017-12-01 22:07     ` Cong Wang
2017-12-04 10:12       ` Paolo Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1511981272.15699.18.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.