From: Daniel Borkmann <daniel@iogearbox.net>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: davem@davemloft.net, john.fastabend@gmail.com, ast@plumgrid.com,
netdev@vger.kernel.org,
John Fastabend <john.r.fastabend@intel.com>
Subject: Re: [PATCH net-next 1/4] net: qdisc: add op to run filters/actions before enqueue
Date: Tue, 01 Sep 2015 20:50:30 +0200 [thread overview]
Message-ID: <55E5F376.80600@iogearbox.net> (raw)
In-Reply-To: <1441128083.8932.177.camel@edumazet-glaptop2.roam.corp.google.com>
On 09/01/2015 07:21 PM, Eric Dumazet wrote:
> On Tue, 2015-09-01 at 18:34 +0200, Daniel Borkmann wrote:
>> From: John Fastabend <john.r.fastabend@intel.com>
>>
>> Add a new ->preclassify() op to allow multiqueue queuing disciplines
>> to call tc_classify() or perform other work before dev_pick_tx().
>>
>> This helps, for example, with mqprio queueing discipline that has
>> offload support by most popular 10G NICs, where the txq effectively
>> picks the qdisc.
>>
>> Once traffic is being directed to a specific queue then hardware TX
>> rings may be tuned to support this traffic type. mqprio already
>> gives the ability to do this via skb->priority where the ->preclassify()
>> provides more control over packet steering, it can classify the skb
>> and set the priority, for example, from an eBPF classifier (or action).
>>
>> Also this allows traffic classifiers to be run without holding the
>> qdisc lock and gives one place to attach filters when mqprio is
>> in use. ->preclassify() could also be added to other mq qdiscs later
>> on: f.e. most classful qdiscs first check major/minor numbers of
>> skb->priority before actually consulting a more complex classifier.
>>
>> For mqprio case today, a filter has to be attached to each txq qdisc
>> to have all traffic hit the filter. Since ->preclassify() is currently
>> only used by mqprio, the __dev_queue_xmit() fast path is guarded by
>> a generic, hidden Kconfig option (NET_CLS_PRECLASSIFY) that is only
>> selected by mqprio,
>
> So all distros will select it, basically.
>
> ...
>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 877c848..b768bca 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3052,6 +3052,23 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
>> rcu_read_lock_bh();
>>
>> skb_update_prio(skb);
>> +#ifdef CONFIG_NET_CLS_PRECLASSIFY
>> + q = rcu_dereference_bh(dev->qdisc);
>> + if (q && q->preclassify) {
>> + switch (q->preclassify(skb, q)) {
>> + default:
>> + break;
>> +#ifdef CONFIG_NET_CLS_ACT
>> + case TC_ACT_SHOT:
>> + case TC_ACT_STOLEN:
>> + case TC_ACT_QUEUED:
>> + kfree_skb(skb);
>> + rc = NET_XMIT_SUCCESS;
>> + goto out;
>> +#endif
>> + }
>> + }
>> +#endif
>>
>
> Since its a device attribute after all, why are you storing it in
> dev->qdisc->preclassify, adding a cache line miss for moderate load ?
>
> (mqprio/mq root qdisc is normally not fetched in fast path ?)
>
> dev->preclassify would be better IMO, close to dev->_tx
Yes, makes sense, as this cacheline is accessed anyway few cycles later.
I'll look into how well this approach integrates into tc's configuration
path. I think mqprio/mq/... could just install/remove dev->preclassify()
handler as soon as first filter is attached/detached.
Thanks,
Daniel
next prev parent reply other threads:[~2015-09-01 18:50 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-01 16:34 [PATCH net-next 0/4] BPF updates Daniel Borkmann
2015-09-01 16:34 ` [PATCH net-next 1/4] net: qdisc: add op to run filters/actions before enqueue Daniel Borkmann
2015-09-01 17:21 ` Eric Dumazet
2015-09-01 18:50 ` Daniel Borkmann [this message]
2015-09-02 6:22 ` Cong Wang
2015-09-02 12:44 ` Daniel Borkmann
2015-09-02 20:29 ` Jamal Hadi Salim
2015-09-01 16:34 ` [PATCH net-next 2/4] ebpf: migrate bpf_prog's flags to bitfield Daniel Borkmann
2015-09-01 16:34 ` [PATCH net-next 3/4] net: {cls,act}_bpf: add helper for retrieving routing realms Daniel Borkmann
2015-09-01 16:34 ` [PATCH net-next 4/4] net: {cls,act}_bpf: make skb->priority writable Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55E5F376.80600@iogearbox.net \
--to=daniel@iogearbox.net \
--cc=ast@plumgrid.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=john.r.fastabend@intel.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.