From: Jarek Poplawski <jarkao2@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: hadi@cyberus.ca, David Miller <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next V2] net: dynamic ingress_queue allocation
Date: Sat, 2 Oct 2010 11:32:55 +0200 [thread overview]
Message-ID: <20101002093255.GA2049@del.dom.local> (raw)
In-Reply-To: <1285941388.2641.175.camel@edumazet-laptop>
On Fri, Oct 01, 2010 at 03:56:28PM +0200, Eric Dumazet wrote:
> Le vendredi 01 octobre 2010 ?? 07:45 -0400, jamal a écrit :
> > On Fri, 2010-10-01 at 00:58 +0200, Eric Dumazet wrote:
> > > Hi Jamal
> > >
> > > Here is the dynamic allocation I promised. I lightly tested it, could
> > > you review it please ?
> > > Thanks !
> > >
> > > [PATCH net-next2.6] net: dynamic ingress_queue allocation
> > >
> > > ingress being not used very much, and net_device->ingress_queue being
> > > quite a big object (128 or 256 bytes), use a dynamic allocation if
> > > needed (tc qdisc add dev eth0 ingress ...)
> >
> > I agree with the principle that it is valuable in making it dynamic for
> > people who dont want it; but, but (like my kid would say, sniff, sniff)
> > you are making me sad saying it is not used very much ;-> It is used
> > very much in my world. My friend Jarek uses it;->
Thanks Jamal, my friend, I'm really glad! (And sad ;-)
>
> ;)
>
> >
> > > +#ifdef CONFIG_NET_CLS_ACT
> >
> > I think appropriately this should be NET_SCH_INGRESS (everywhere else as
> > well).
> >
>
> I first thought of this, and found it would add a new dependence on
> vmlinux :
>
> If someone wants to add NET_SCH_INGRESS module, he would need to
> recompile whole kernel and reboot.
>
> This is probably why ing_filter() and handle_ing() are enclosed with
> CONFIG_NET_CLS_ACT, not CONFIG_NET_SCH_INGRESS.
>
> Since struct net_dev only holds one pointer (after this patch), I
> believe its better to use same dependence.
>
> >
> > > +static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
> > > +{
> > > +#ifdef CONFIG_NET_CLS_ACT
> > > + return dev->ingress_queue;
> > > +#else
> > > + return NULL;
> > > +#endif
> >
> > Above, if you just returned dev->ingress_queue wouldnt it always be
> > NULL if it was not allocated?
> >
>
> ingress_queue is not defined in "struct net_device *dev" if
> !CONFIG_NET_CLS_ACT
>
> Returning NULL here permits dead code elimination by compiler.
>
> Then, probably nobody unset CONFIG_NET_CLS_ACT, so we can probably avoid
> this preprocessor stuff.
>
> >
> > > @@ -2737,7 +2734,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> > > struct packet_type **pt_prev,
> > > int *ret, struct net_device *orig_dev)
> > > {
> > > - if (skb->dev->ingress_queue.qdisc == &noop_qdisc)
> > > + struct netdev_queue *rxq = dev_ingress_queue(skb->dev);
> > > +
> > > + if (!rxq || rxq->qdisc == &noop_qdisc)
> > > goto out;
> >
> > I stared at above a little longer since this is the only fast path
> > affected; is it a few more cycles now for people who love ingress?
>
> I see, this adds an indirection and a conditional branch, but this
> should be in cpu cache and well predicted.
>
> I thought adding a fake "struct netdev_queue" object, with a qdisc
> pointing to noop_qdisc. But this would slow down a bit non ingress
> users ;)
>
> For people not using ingress, my solution removes an access to an extra
> cache line. So network latency is improved a bit when cpu caches are
> full of user land data.
>
> >
> > > @@ -690,6 +693,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> > > (new && new->flags & TCQ_F_INGRESS)) {
> > > num_q = 1;
> > > ingress = 1;
> > > + if (!dev_ingress_queue(dev))
> > > + return -ENOENT;
> > > }
> > >
> >
> > The above looks clever but worries me because it changes the old flow.
> > If you have time, the following tests will alleviate my fears
> >
> > 1) compile support for ingress and add/delete ingress qdisc
>
> This worked for me, but I dont know complex setups.
>
> > 2) Dont compile support and add/delete ingress qdisc
>
> tc gives an error (a bit like !CONFIG_NET_SCH_INGRESS)
>
> # tc qdisc add dev eth0 ingress
> RTNETLINK answers: No such file or directory
> # tc -s -d qdisc show dev eth0
> qdisc mq 0: root
> Sent 636 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
>
> > 3) Compile ingress as a module and add/delete ingress qdisc
> >
> >
>
> Seems to work like 1)
>
> > Other than that excellent work Eric. And you can add my
> > Acked/reviewed-by etc.
> >
> > BTW, did i say i like your per-cpu stats stuff? It applies nicely to
> > qdiscs, actions etc ;->
>
> I took a look at ifb as suggested by Stephen but could not see trivial
> changes (LLTX or per-cpu stats), since central lock is needed I am
> afraid. And qdisc are the same, stats updates are mostly free as we
> dirtied cache line for the lock.
>
> Thanks Jamal !
>
> Here is the V2, with two #ifdef removed.
>
>
> [PATCH net-next V2] net: dynamic ingress_queue allocation
>
> ingress being not used very much, and net_device->ingress_queue being
> quite a big object (128 or 256 bytes), use a dynamic allocation if
> needed (tc qdisc add dev eth0 ingress ...)
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> include/linux/netdevice.h | 11 ++++++--
> net/core/dev.c | 48 +++++++++++++++++++++++++++---------
> net/sched/sch_api.c | 40 ++++++++++++++++++++----------
> net/sched/sch_generic.c | 36 +++++++++++++++------------
> 4 files changed, 92 insertions(+), 43 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ceed347..4f86009 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -986,8 +986,7 @@ struct net_device {
> rx_handler_func_t *rx_handler;
> void *rx_handler_data;
>
> - struct netdev_queue ingress_queue; /* use two cache lines */
> -
> + struct netdev_queue *ingress_queue;
> /*
> * Cache lines mostly used on transmit path
> */
> @@ -1115,6 +1114,14 @@ static inline void netdev_for_each_tx_queue(struct net_device *dev,
> f(dev, &dev->_tx[i], arg);
> }
>
> +
> +static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
> +{
> + return dev->ingress_queue;
> +}
> +
> +extern struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
> +
> /*
> * Net namespace inlines
> */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a313bab..e3bb8c9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2702,11 +2702,10 @@ EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
> * the ingress scheduler, you just cant add policies on ingress.
> *
> */
> -static int ing_filter(struct sk_buff *skb)
> +static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
> {
> struct net_device *dev = skb->dev;
> u32 ttl = G_TC_RTTL(skb->tc_verd);
> - struct netdev_queue *rxq;
> int result = TC_ACT_OK;
> struct Qdisc *q;
>
> @@ -2720,8 +2719,6 @@ static int ing_filter(struct sk_buff *skb)
> skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
> skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
>
> - rxq = &dev->ingress_queue;
> -
> q = rxq->qdisc;
> if (q != &noop_qdisc) {
> spin_lock(qdisc_lock(q));
> @@ -2737,7 +2734,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> struct packet_type **pt_prev,
> int *ret, struct net_device *orig_dev)
> {
> - if (skb->dev->ingress_queue.qdisc == &noop_qdisc)
> + struct netdev_queue *rxq = dev_ingress_queue(skb->dev);
> +
> + if (!rxq || rxq->qdisc == &noop_qdisc)
> goto out;
>
> if (*pt_prev) {
> @@ -2745,7 +2744,7 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> *pt_prev = NULL;
> }
>
> - switch (ing_filter(skb)) {
> + switch (ing_filter(skb, rxq)) {
> case TC_ACT_SHOT:
> case TC_ACT_STOLEN:
> kfree_skb(skb);
> @@ -4932,15 +4931,17 @@ static void __netdev_init_queue_locks_one(struct net_device *dev,
> struct netdev_queue *dev_queue,
> void *_unused)
> {
> - spin_lock_init(&dev_queue->_xmit_lock);
> - netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
> - dev_queue->xmit_lock_owner = -1;
> + if (dev_queue) {
> + spin_lock_init(&dev_queue->_xmit_lock);
> + netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
> + dev_queue->xmit_lock_owner = -1;
> + }
> }
>
> static void netdev_init_queue_locks(struct net_device *dev)
> {
> netdev_for_each_tx_queue(dev, __netdev_init_queue_locks_one, NULL);
> - __netdev_init_queue_locks_one(dev, &dev->ingress_queue, NULL);
> + __netdev_init_queue_locks_one(dev, dev_ingress_queue(dev), NULL);
Is dev_ingress_queue(dev) not NULL anytime here?
> }
>
> unsigned long netdev_fix_features(unsigned long features, const char *name)
> @@ -5447,16 +5448,37 @@ static void netdev_init_one_queue(struct net_device *dev,
> struct netdev_queue *queue,
> void *_unused)
> {
> - queue->dev = dev;
> + if (queue)
> + queue->dev = dev;
> }
>
> static void netdev_init_queues(struct net_device *dev)
> {
> - netdev_init_one_queue(dev, &dev->ingress_queue, NULL);
> + netdev_init_one_queue(dev, dev_ingress_queue(dev), NULL);
Is dev_ingress_queue(dev) not NULL anytime here?
> netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
> spin_lock_init(&dev->tx_global_lock);
> }
>
> +struct netdev_queue *dev_ingress_queue_create(struct net_device *dev)
> +{
> + struct netdev_queue *queue = dev_ingress_queue(dev);
> +
> +#ifdef CONFIG_NET_CLS_ACT
> + if (queue)
> + return queue;
> + queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> + if (!queue)
> + return NULL;
> + netdev_init_one_queue(dev, queue, NULL);
> + __netdev_init_queue_locks_one(dev, queue, NULL);
> + queue->qdisc = &noop_qdisc;
> + queue->qdisc_sleeping = &noop_qdisc;
> + smp_wmb();
Why don't we need smp_rmb() in handle_ing()?
> + dev->ingress_queue = queue;
> +#endif
> + return queue;
> +}
> +
> /**
> * alloc_netdev_mq - allocate network device
> * @sizeof_priv: size of private data to allocate space for
> @@ -5559,6 +5581,8 @@ void free_netdev(struct net_device *dev)
>
> kfree(dev->_tx);
>
> + kfree(dev_ingress_queue(dev));
> +
> /* Flush device addresses */
> dev_addr_flush(dev);
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index b802078..8635110 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -240,7 +240,10 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
> if (q)
> goto out;
>
> - q = qdisc_match_from_root(dev->ingress_queue.qdisc_sleeping, handle);
> + if (!dev_ingress_queue(dev))
> + goto out;
> + q = qdisc_match_from_root(dev_ingress_queue(dev)->qdisc_sleeping,
> + handle);
I'd prefer:
+ if (dev_ingress_queue(dev))
+ q = qdisc_match_from_root(dev_ingress_queue(dev)->qdisc_sleeping,
> out:
> return q;
> }
> @@ -690,6 +693,8 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> (new && new->flags & TCQ_F_INGRESS)) {
> num_q = 1;
> ingress = 1;
> + if (!dev_ingress_queue(dev))
> + return -ENOENT;
Is this test really needed here?
> }
>
> if (dev->flags & IFF_UP)
> @@ -701,7 +706,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
> }
>
> for (i = 0; i < num_q; i++) {
> - struct netdev_queue *dev_queue = &dev->ingress_queue;
> + struct netdev_queue *dev_queue = dev_ingress_queue(dev);
>
> if (!ingress)
> dev_queue = netdev_get_tx_queue(dev, i);
> @@ -979,7 +984,8 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
> return -ENOENT;
> q = qdisc_leaf(p, clid);
> } else { /* ingress */
> - q = dev->ingress_queue.qdisc_sleeping;
> + if (dev_ingress_queue(dev))
> + q = dev_ingress_queue(dev)->qdisc_sleeping;
> }
> } else {
> q = dev->qdisc;
> @@ -1044,7 +1050,8 @@ replay:
> return -ENOENT;
> q = qdisc_leaf(p, clid);
> } else { /*ingress */
> - q = dev->ingress_queue.qdisc_sleeping;
> + if (dev_ingress_queue_create(dev))
> + q = dev_ingress_queue(dev)->qdisc_sleeping;
I wonder if doing dev_ingress_queue_create() just before qdisc_create()
(and the test here) isn't more readable.
> }
> } else {
> q = dev->qdisc;
> @@ -1123,11 +1130,14 @@ replay:
> create_n_graft:
> if (!(n->nlmsg_flags&NLM_F_CREATE))
> return -ENOENT;
> - if (clid == TC_H_INGRESS)
> - q = qdisc_create(dev, &dev->ingress_queue, p,
> - tcm->tcm_parent, tcm->tcm_parent,
> - tca, &err);
> - else {
> + if (clid == TC_H_INGRESS) {
> + if (dev_ingress_queue(dev))
> + q = qdisc_create(dev, dev_ingress_queue(dev), p,
> + tcm->tcm_parent, tcm->tcm_parent,
> + tca, &err);
> + else
> + err = -ENOENT;
> + } else {
> struct netdev_queue *dev_queue;
>
> if (p && p->ops->cl_ops && p->ops->cl_ops->select_queue)
> @@ -1304,8 +1314,10 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
> if (tc_dump_qdisc_root(dev->qdisc, skb, cb, &q_idx, s_q_idx) < 0)
> goto done;
>
> - dev_queue = &dev->ingress_queue;
> - if (tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb, &q_idx, s_q_idx) < 0)
> + dev_queue = dev_ingress_queue(dev);
> + if (dev_queue &&
> + tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
> + &q_idx, s_q_idx) < 0)
> goto done;
>
> cont:
> @@ -1595,8 +1607,10 @@ static int tc_dump_tclass(struct sk_buff *skb, struct netlink_callback *cb)
> if (tc_dump_tclass_root(dev->qdisc, skb, tcm, cb, &t, s_t) < 0)
> goto done;
>
> - dev_queue = &dev->ingress_queue;
> - if (tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb, &t, s_t) < 0)
> + dev_queue = dev_ingress_queue(dev);
> + if (dev_queue &&
> + tc_dump_tclass_root(dev_queue->qdisc_sleeping, skb, tcm, cb,
> + &t, s_t) < 0)
> goto done;
>
> done:
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 545278a..c42dec5 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -721,16 +721,18 @@ static void transition_one_qdisc(struct net_device *dev,
> struct netdev_queue *dev_queue,
> void *_need_watchdog)
> {
> - struct Qdisc *new_qdisc = dev_queue->qdisc_sleeping;
> - int *need_watchdog_p = _need_watchdog;
> + if (dev_queue) {
> + struct Qdisc *new_qdisc = dev_queue->qdisc_sleeping;
> + int *need_watchdog_p = _need_watchdog;
>
> - if (!(new_qdisc->flags & TCQ_F_BUILTIN))
> - clear_bit(__QDISC_STATE_DEACTIVATED, &new_qdisc->state);
> + if (!(new_qdisc->flags & TCQ_F_BUILTIN))
> + clear_bit(__QDISC_STATE_DEACTIVATED, &new_qdisc->state);
>
> - rcu_assign_pointer(dev_queue->qdisc, new_qdisc);
> - if (need_watchdog_p && new_qdisc != &noqueue_qdisc) {
> - dev_queue->trans_start = 0;
> - *need_watchdog_p = 1;
> + rcu_assign_pointer(dev_queue->qdisc, new_qdisc);
> + if (need_watchdog_p && new_qdisc != &noqueue_qdisc) {
> + dev_queue->trans_start = 0;
> + *need_watchdog_p = 1;
> + }
> }
> }
>
> @@ -753,7 +755,7 @@ void dev_activate(struct net_device *dev)
>
> need_watchdog = 0;
> netdev_for_each_tx_queue(dev, transition_one_qdisc, &need_watchdog);
> - transition_one_qdisc(dev, &dev->ingress_queue, NULL);
> + transition_one_qdisc(dev, dev_ingress_queue(dev), NULL);
I'd prefer here and similarly later:
+ if (dev_ingress_queue(dev))
+ transition_one_qdisc(dev, dev_ingress_queue(dev), NULL);
to show NULL dev_queue is only legal in this one case.
Cheers,
Jarek P.
>
> if (need_watchdog) {
> dev->trans_start = jiffies;
> @@ -768,7 +770,7 @@ static void dev_deactivate_queue(struct net_device *dev,
> struct Qdisc *qdisc_default = _qdisc_default;
> struct Qdisc *qdisc;
>
> - qdisc = dev_queue->qdisc;
> + qdisc = dev_queue ? dev_queue->qdisc : NULL;
> if (qdisc) {
> spin_lock_bh(qdisc_lock(qdisc));
>
> @@ -812,7 +814,7 @@ static bool some_qdisc_is_busy(struct net_device *dev)
> void dev_deactivate(struct net_device *dev)
> {
> netdev_for_each_tx_queue(dev, dev_deactivate_queue, &noop_qdisc);
> - dev_deactivate_queue(dev, &dev->ingress_queue, &noop_qdisc);
> + dev_deactivate_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
>
> dev_watchdog_down(dev);
>
> @@ -830,15 +832,17 @@ static void dev_init_scheduler_queue(struct net_device *dev,
> {
> struct Qdisc *qdisc = _qdisc;
>
> - dev_queue->qdisc = qdisc;
> - dev_queue->qdisc_sleeping = qdisc;
> + if (dev_queue) {
> + dev_queue->qdisc = qdisc;
> + dev_queue->qdisc_sleeping = qdisc;
> + }
> }
>
> void dev_init_scheduler(struct net_device *dev)
> {
> dev->qdisc = &noop_qdisc;
> netdev_for_each_tx_queue(dev, dev_init_scheduler_queue, &noop_qdisc);
> - dev_init_scheduler_queue(dev, &dev->ingress_queue, &noop_qdisc);
> + dev_init_scheduler_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
>
> setup_timer(&dev->watchdog_timer, dev_watchdog, (unsigned long)dev);
> }
> @@ -847,7 +851,7 @@ static void shutdown_scheduler_queue(struct net_device *dev,
> struct netdev_queue *dev_queue,
> void *_qdisc_default)
> {
> - struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
> + struct Qdisc *qdisc = dev_queue ? dev_queue->qdisc_sleeping : NULL;
> struct Qdisc *qdisc_default = _qdisc_default;
>
> if (qdisc) {
> @@ -861,7 +865,7 @@ static void shutdown_scheduler_queue(struct net_device *dev,
> void dev_shutdown(struct net_device *dev)
> {
> netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc);
> - shutdown_scheduler_queue(dev, &dev->ingress_queue, &noop_qdisc);
> + shutdown_scheduler_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
> qdisc_destroy(dev->qdisc);
> dev->qdisc = &noop_qdisc;
>
>
>
next prev parent reply other threads:[~2010-10-02 9:33 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-28 15:58 [PATCH net-next2.6] net: rename netdev rx_queue to ingress_queue Eric Dumazet
2010-09-28 18:04 ` Jarek Poplawski
2010-09-29 10:56 ` jamal
2010-09-29 13:18 ` Jarek Poplawski
2010-09-29 16:54 ` Jarek Poplawski
2010-09-30 22:58 ` [PATCH net-next2.6] net: dynamic ingress_queue allocation Eric Dumazet
2010-10-01 11:45 ` jamal
2010-10-01 13:56 ` [PATCH net-next V2] " Eric Dumazet
2010-10-02 9:32 ` Jarek Poplawski [this message]
2010-10-02 16:11 ` [PATCH net-next V3] " Eric Dumazet
2010-10-03 9:42 ` Jarek Poplawski
2010-10-03 13:10 ` jamal
2010-10-04 8:42 ` Eric Dumazet
2010-10-04 9:00 ` [PATCH net-next] net: relax rtnl_dereference() Eric Dumazet
2010-10-05 7:29 ` David Miller
2010-10-04 12:06 ` [PATCH net-next V3] net: dynamic ingress_queue allocation Jarek Poplawski
2010-10-04 12:52 ` Eric Dumazet
2010-10-04 14:24 ` Jarek Poplawski
2010-10-04 15:21 ` Eric Dumazet
2010-10-05 7:24 ` David Miller
2010-10-05 7:31 ` Eric Dumazet
2010-10-02 12:10 ` [PATCH net-next V2] " jamal
2010-09-29 20:27 ` [PATCH net-next2.6] net: rename netdev rx_queue to ingress_queue David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101002093255.GA2049@del.dom.local \
--to=jarkao2@gmail.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=hadi@cyberus.ca \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.