Netdev List
* Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time
From: Stephen Hemminger @ 2017-10-12 16:07 UTC
  To: Phil Sutter; +Cc: Michal Kubecek, Hangbin Liu, netdev, Hangbin Liu
In-Reply-To: <20171011111007.GA11332@orbyte.nwl.cc>

On Wed, 11 Oct 2017 13:10:07 +0200
Phil Sutter <phil@nwl.cc> wrote:

> On Tue, Oct 10, 2017 at 09:47:43AM -0700, Stephen Hemminger wrote:
> > On Tue, 10 Oct 2017 08:41:17 +0200
> > Michal Kubecek <mkubecek@suse.cz> wrote:
> > 
> > > On Mon, Oct 09, 2017 at 10:25:25PM +0200, Phil Sutter wrote:
> > > > Hi Stephen,
> > > > 
> > > > On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote:  
> > > > > On Thu, 28 Sep 2017 21:33:46 +0800
> > > > > Hangbin Liu <haliu@redhat.com> wrote:
> > > > >   
> > > > > > From: Hangbin Liu <liuhangbin@gmail.com>
> > > > > > 
> > > > > > This is an update for 460c03f3f3cc ("iplink: double the buffer size also in
> > > > > > > iplink_get()"). After this update, we will not need to double the buffer size
> > > > > > > every time the number of VFs increases.
> > > > > > 
> > > > > > With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
> > > > > > length parameter.
> > > > > > 
> > > > > > > With call like rtnl_talk(&rth, nlh, nlh, sizeof(req)), I add a new variable
> > > > > > > answer to avoid overwriting data in nlh, because it may have more info after
> > > > > > > nlh. Also this will avoid the nlh buffer not being large enough.
> > > > > > 
> > > > > > We need to free answer after using.
> > > > > > 
> > > > > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > > > > > Signed-off-by: Phil Sutter <phil@nwl.cc>
> > > > > > ---  
> > > > > 
> > > > > > Most of the uses of rtnl_talk() don't need this peek and dynamic sizing.
> > > > > Can only those places that need that be targeted?  
> > > > 
> > > > We could probably do that, by having a buffer on stack in __rtnl_talk()
> > > > which will be used instead of the allocated one if 'answer' is NULL. Or
> > > > maybe even introduce a dedicated API call for the dynamically allocated
> > > > receive buffer. But I really doubt that's feasible: AFAICT, that stack
> > > > buffer still needs to be reasonably sized since the reply might be
> > > > larger than the request (reusing the request buffer would be the most
> > > > simple way to tackle this), also there is support for extack which may
> > > > bloat the response to arbitrary size. Hangbin has shown in his benchmark
> > > > that the overhead of the second syscall is negligible, so why care about
> > > > that and increase code complexity even further?
> > > > 
> > > > Not saying it's not possible, but I just doubt it's worth the effort.  
> > > 
> > > Agreed. Current code is based on the assumption that we can estimate the
> > > maximum reply length in advance and the reason for this series is that
> > > this assumption turned out to be wrong. I'm afraid that if we replace
> > > it by an assumption that we can estimate the maximum reply length for
> > > most requests with only few exceptions, it's only matter of time for us
> > > to be proven wrong again.
> > > 
> > > Michal Kubecek
> > > 
> > 
> > For query responses, yes the response may be large. But for the common cases of
> > add address or add route, the response should just be ack or error.
> 
> And with extack, error is comprised of the original request plus an
> arbitrarily sized error message, so we can't just reuse the request
> buffer and are back to "guessing" the right length again.
> 
> To get an idea of what we're talking about, I wrote a simple benchmark
> which adds 256 * 254 (= 65024) addresses to an interface, then removes
> them again one by one and measured the time that takes for binaries with
> and without Hangbin's patches:
> 
> OP	Vanilla		Hangbin		Delta
> --------------------------------------------------------
> add	real 2m16.244s	real 2m27.964s	+11.72s	(108.6%)
> 	user 0m15.241s	user 0m17.295s	+2.054s	(113.5%)
> 	sys  1m40.229s	sys  1m48.239s	+8.01s	(108.0%)
> 
> remove	real 1m44.950s	real 1m47.044s	+2.094s	(102.0%)
> 	user 0m13.899s	user 0m14.723s	+0.824s (105.9%)
> 	sys  1m30.798s	sys  1m31.938s	+1.140s (101.3%)
> 
> So the overhead of the second syscall and dynamic memory allocation is
> less than 10% overall. Given the short time a single call to 'ip'
> typically takes, I don't think the difference is noticeable even in
> highly performance critical applications.
> 
> Cheers, Phil
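For readers following along, the "second syscall" being measured can be sketched roughly as below. This is a minimal illustration, not the iproute2 code: the helper name `recv_alloc` is hypothetical, and using `MSG_PEEK | MSG_TRUNC` to learn the length is an assumption about how the peek is done. It works on any Linux datagram-style socket, so it can be tried without netlink.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not the iproute2 implementation): receive one
 * datagram into a freshly allocated buffer that is sized to fit it.
 * Returns the message length and stores the buffer in *out (caller
 * frees), or returns -1 on error.
 */
static ssize_t recv_alloc(int fd, char **out)
{
	char probe;
	ssize_t len, got;
	char *buf;

	/* First syscall: MSG_PEEK leaves the message queued, MSG_TRUNC makes
	 * recv() report the full length although the buffer is one byte. */
	len = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
	if (len < 0)
		return -1;

	buf = malloc(len ? (size_t)len : 1);
	if (!buf)
		return -1;

	/* Second syscall: actually dequeue the message into the buffer. */
	got = recv(fd, buf, (size_t)len, 0);
	if (got < 0) {
		free(buf);
		return -1;
	}

	*out = buf;
	return got;
}
```

The cost under discussion is exactly the extra recv() plus the malloc()/free() pair per request; in exchange, the caller no longer has to guess a maximum reply size.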

For a better benchmark, I generated 4 Million routes
then did: 
	# ip -batch routes.txt


OP	Vanilla		Hangbin		Delta
-----------------------------------------------------
add	real 1:25.840	1:33.677	+9.13%
	user   10.690	   6.078	-43.14%
	sys  1:00.920	1:13.109	+20.00%	

remove	real 2:29.881	2:25.872	-2.67%
	user   12.862	   7.942	-38.25%
	sys    44.127	  44.633	+1.15%


So the answer is addition is slower but deletion appears faster?
If I rerun the Vanilla test, I get about the same times.

The slowdown won't impact me, but what about large scale users
like Cumulus.
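
The Delta column in a table like this is the relative change from the Vanilla time to the Hangbin time. A minimal sketch of that arithmetic follows, assuming times are first converted to seconds; the helper names `pct_delta` and `to_seconds` are hypothetical and only serve the illustration.

```c
/*
 * Hypothetical helper: relative change from `before` to `after`, in percent.
 * Both arguments are durations in seconds; `before` must be non-zero.
 */
static double pct_delta(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Convert a "minutes:seconds" reading such as 1:25.840 into seconds. */
static double to_seconds(unsigned int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}
```

For the "add / real" row, `pct_delta(to_seconds(1, 25.840), to_seconds(1, 33.677))` works out to about +9.13, and a negative result means the patched binary was faster.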


* Re: [Intel-wired-lan] [next-queue PATCH v6 2/5] mqprio: Implement select_queue class_ops
From: Alexander Duyck @ 2017-10-12 16:09 UTC
  To: Jesus Sanchez-Palencia
  Cc: Vinicius Costa Gomes, Netdev, intel-wired-lan, rodney.cummings,
	andre.guedes, Jiri Pirko, ivan.briano, Richard Cochran, henrik,
	Jamal Hadi Salim, levipearson, boon.leong.ong, Cong Wang
In-Reply-To: <633dfce2-71f1-f704-7ae2-b1ef2ba9b448@intel.com>

On Thu, Oct 12, 2017 at 8:59 AM, Jesus Sanchez-Palencia
<jesus.sanchez-palencia@intel.com> wrote:
> Hi Alex,
>
>
> On 10/12/2017 08:21 AM, Alexander Duyck wrote:
>> On Wed, Oct 11, 2017 at 5:54 PM, Vinicius Costa Gomes
>> <vinicius.gomes@intel.com> wrote:
>>> From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>>
>>> When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
>>> the netdev_queue pointer that the current child qdisc is associated
>>> with before creating the new qdisc.
>>>
>>> Currently, when using mqprio as root qdisc, the kernel will end up
>>> getting the queue #0 pointer from the mqprio (root qdisc), which leaves
>>> any new child qdisc with a possibly wrong netdev_queue pointer.
>>>
>>> Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
>>> issue and avoids an inconsistent state when child qdiscs are replaced.
>>>
>>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>> ---
>>>  net/sched/sch_mqprio.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
>>> index 6bcdfe6e7b63..8c042ae323e3 100644
>>> --- a/net/sched/sch_mqprio.c
>>> +++ b/net/sched/sch_mqprio.c
>>> @@ -396,6 +396,12 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
>>>         }
>>>  }
>>>
>>> +static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch,
>>> +                                               struct tcmsg *tcm)
>>> +{
>>> +       return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
>>> +}
>>> +
>>
>> So I was just comparing this against mq_select_queue, and I was
>> wondering why we are willing to return NULL here instead of just
>> returning a pointer to the first Tx queue? I realize there is the fix
>> in the first patch but it seems like if we are going to go that route
>> then maybe we should update mq as well so that both of these qdiscs
>> behave the same way. Either this should work like mq, or mq should
>> work like this, but we shouldn't have them exposing different
>> behaviors.
>
>
> This was brought up by Cong Wang during the review of our v2. Based on my
> understanding, the point I've made is that for mqprio the inner qdiscs are
> always 'related' to one of the Tx netdev_queues per design. Returning any other
> queue as a fallback seemed like going against that to me.
>
> I'm still inclined to say that we should keep this function as the patch is
> proposing, thus either returning the correct netdev_queue for a given handle, or
> NULL as a way to flag that something was 'wrong' with it. Returning queue #0 is
> misleading in that sense, imho.
>
> As for aligning mq_select_queue() with this approach, if my reasoning behind
> mqprio is correct and also applies to mq, I would be happy to send that fix as
> part of our v7.
>
> What do you think?
>
>
> Thanks,
> Jesus

I think it would be better to bring mq_select_queue in line with your
fix. You could probably just add it to your first patch. That way if
the user specifies a bad qdisc classid they don't get to just
overwrite the qdisc on Tx queue 0.

- Alex
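
The two behaviours being compared can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: `struct fake_dev`, `select_queue_strict` and `select_queue_fallback` are hypothetical stand-ins rather than the kernel functions, and the "class minor N maps to Tx queue N-1" convention is assumed for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TX_QUEUES 4	/* illustrative device size */

struct txq { int id; };

struct fake_dev {
	struct txq queues[NUM_TX_QUEUES];
};

/* Minor part of a 32-bit tc handle (the low 16 bits), as TC_H_MIN() yields. */
static uint32_t handle_minor(uint32_t handle)
{
	return handle & 0xFFFFu;
}

/* Strict lookup: class minor N maps to Tx queue N-1; anything else is NULL. */
static struct txq *select_queue_strict(struct fake_dev *dev, uint32_t parent)
{
	uint32_t ntx = handle_minor(parent) - 1;	/* minor 0 wraps to a huge value */

	if (ntx >= NUM_TX_QUEUES)
		return NULL;
	return &dev->queues[ntx];
}

/* Fallback lookup: a bad class id silently resolves to Tx queue 0. */
static struct txq *select_queue_fallback(struct fake_dev *dev, uint32_t parent)
{
	struct txq *q = select_queue_strict(dev, parent);

	return q ? q : &dev->queues[0];
}
```

With the strict variant the caller can turn NULL into an error for the user; with the fallback variant a mistyped classid ends up replacing whatever qdisc sits on queue 0.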


* RE: [PATCH 3/4] net: qcom/emac: enforce DMA address restrictions
From: David Laight @ 2017-10-12 16:20 UTC
  To: 'Timur Tabi', David S. Miller, netdev@vger.kernel.org
In-Reply-To: <e523734d-d845-8eae-4a5b-e679b8e46654@codeaurora.org>

From: Timur Tabi
> Sent: 12 October 2017 15:13
> On 10/12/17 4:30 AM, David Laight wrote:
> > Isn't the memory allocated by a single kzalloc() call?
> 
> dma_alloc_coherent, actually.
> 
> > IIRC that guarantees it doesn't cross a power of 2 boundary less than
> > the size.
> 
> I'm pretty sure that kzalloc does not make that guarantee, and I don't
> think dma_alloc_coherent does either.

dma_alloc_coherent() definitely does.
And I've a driver that relies on it (for 16k blocks).

	David
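
The property under discussion can be stated as a small check: a block that is aligned to its own size (rounded up to a power of two) cannot straddle any power-of-two boundary of that size or larger. The sketch below only illustrates that arithmetic; the helper names are hypothetical, and whether a given allocator really returns naturally aligned memory is the claim made in this thread, not something the code demonstrates.

```c
#include <stdint.h>

/*
 * Hypothetical check: does [addr, addr + size) cross a multiple of
 * `boundary`? `boundary` must be a power of two and `size` non-zero.
 */
static int crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
	uint64_t first = addr;
	uint64_t last = addr + size - 1;

	/* The range stays inside one window iff first and last share all
	 * address bits above the boundary. */
	return ((first ^ last) & ~(boundary - 1)) != 0;
}

/* Smallest power of two >= size (size must be non-zero). */
static uint64_t roundup_pow2(uint64_t size)
{
	uint64_t p = 1;

	while (p < size)
		p <<= 1;
	return p;
}

/* Natural alignment: addr is a multiple of the size rounded up to 2^n. */
static int naturally_aligned(uint64_t addr, uint64_t size)
{
	return (addr & (roundup_pow2(size) - 1)) == 0;
}
```

For example, a 16k block at 0x4000 is naturally aligned and stays within one 64k window, while the same block placed at 0xF000 is not aligned and crosses the 0x10000 boundary.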


* Re: [Intel-wired-lan] [next-queue PATCH v6 2/5] mqprio: Implement select_queue class_ops
From: Jesus Sanchez-Palencia @ 2017-10-12 16:16 UTC
  To: Alexander Duyck
  Cc: Vinicius Costa Gomes, Netdev, intel-wired-lan, rodney.cummings,
	andre.guedes, Jiri Pirko, ivan.briano, Richard Cochran, henrik,
	Jamal Hadi Salim, levipearson, boon.leong.ong, Cong Wang
In-Reply-To: <CAKgT0UcpSq2-LaqDKhTWKoZdemDE2SOCWggxGBjCH98qYPBGFw@mail.gmail.com>



On 10/12/2017 09:09 AM, Alexander Duyck wrote:
> On Thu, Oct 12, 2017 at 8:59 AM, Jesus Sanchez-Palencia
> <jesus.sanchez-palencia@intel.com> wrote:
>> Hi Alex,
>>
>>
>> On 10/12/2017 08:21 AM, Alexander Duyck wrote:
>>> On Wed, Oct 11, 2017 at 5:54 PM, Vinicius Costa Gomes
>>> <vinicius.gomes@intel.com> wrote:
>>>> From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>>>
>>>> When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
>>>> the netdev_queue pointer that the current child qdisc is associated
>>>> with before creating the new qdisc.
>>>>
>>>> Currently, when using mqprio as root qdisc, the kernel will end up
>>>> getting the queue #0 pointer from the mqprio (root qdisc), which leaves
>>>> any new child qdisc with a possibly wrong netdev_queue pointer.
>>>>
>>>> Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
>>>> issue and avoids an inconsistent state when child qdiscs are replaced.
>>>>
>>>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>>>> ---
>>>>  net/sched/sch_mqprio.c | 7 +++++++
>>>>  1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
>>>> index 6bcdfe6e7b63..8c042ae323e3 100644
>>>> --- a/net/sched/sch_mqprio.c
>>>> +++ b/net/sched/sch_mqprio.c
>>>> @@ -396,6 +396,12 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
>>>>         }
>>>>  }
>>>>
>>>> +static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch,
>>>> +                                               struct tcmsg *tcm)
>>>> +{
>>>> +       return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
>>>> +}
>>>> +
>>>
>>> So I was just comparing this against mq_select_queue, and I was
>>> wondering why we are willing to return NULL here instead of just
>>> returning a pointer to the first Tx queue? I realize there is the fix
>>> in the first patch but it seems like if we are going to go that route
>>> then maybe we should update mq as well so that both of these qdiscs
>>> behave the same way. Either this should work like mq, or mq should
>>> work like this, but we shouldn't have them exposing different
>>> behaviors.
>>
>>
>> This was brought up by Cong Wang during the review of our v2. Based on my
>> understanding, the point I've made is that for mqprio the inner qdiscs are
>> always 'related' to one of the Tx netdev_queues per design. Returning any other
>> queue as a fallback seemed like going against that to me.
>>
>> I'm still inclined to say that we should keep this function as the patch is
>> proposing, thus either returning the correct netdev_queue for a given handle, or
>> NULL as a way to flag that something was 'wrong' with it. Returning queue #0 is
>> misleading in that sense, imho.
>>
>> As for aligning mq_select_queue() with this approach, if my reasoning behind
>> mqprio is correct and also applies to mq, I would be happy to send that fix as
>> part of our v7.
>>
>> What do you think?
>>
>>
>> Thanks,
>> Jesus
> 
> I think it would be better to bring mq_select_queue in line with your
> fix. You could probably just add it to your first patch. That way if
> the user specifies a bad qdisc classid they don't get to just
> overwrite the qdisc on Tx queue 0.

OK, I will send the fix then, but I'm not sure I'll send it together
with the patch fixing qdisc_alloc(). It looks like a change that should go together
with this one, or after it, instead.

Anyhow, it will be fixed in our v7. Thanks!


* Re: BUG:af_packet fails to TX TSO frames
From: Willem de Bruijn @ 2017-10-12 16:30 UTC
  To: Anton Ivanov; +Cc: Network Development, David Miller
In-Reply-To: <e301fbfd-a283-caa1-5915-8be15677ed74@cambridgegreys.com>

On Thu, Oct 12, 2017 at 11:44 AM, Anton Ivanov
<anton.ivanov@cambridgegreys.com> wrote:
> Found it.
>
> Two bugs canceling each other.
> The bind sequence in psock_txring_vnet.c is wrong.
>
> It sets addr.sll_protocol = htons(ETH_P_IP);
> before calling bind.
>
> If you set addr.sll_protocol to ETH_P_ALL where it should have been in the
> first place the test program blows up with -ENOBUFS

There is no such requirement that the socket should bind to ETH_P_ALL.

> I think what is happening is that this value is taken into account when
> looking at "what should I use to segment it with" in skb_mac_gso_segment
> which is invoked at the end of the verification chain which starts in
> packet_direct_xmit in af_packet.c

packet_snd sets skb->protocol based on the protocol that the packet
socket is bound to. Binding to ETH_P_IP is the right choice here.
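
As a rough sketch of the bind address in question, assuming a Linux packet socket that sends IPv4 frames: the helper name `fill_ipv4_bind_addr` is hypothetical, while `struct sockaddr_ll`, `AF_PACKET`, `ETH_P_IP` and `htons()` are the standard definitions.

```c
#include <string.h>
#include <arpa/inet.h>		/* htons() */
#include <sys/socket.h>		/* AF_PACKET */
#include <linux/if_ether.h>	/* ETH_P_IP */
#include <linux/if_packet.h>	/* struct sockaddr_ll */

/*
 * Hypothetical helper: fill in the address that would be passed to bind()
 * on a packet socket sending IPv4 frames out of interface `ifindex`.
 */
static void fill_ipv4_bind_addr(struct sockaddr_ll *addr, int ifindex)
{
	memset(addr, 0, sizeof(*addr));
	addr->sll_family = AF_PACKET;
	addr->sll_protocol = htons(ETH_P_IP);	/* network byte order */
	addr->sll_ifindex = ifindex;
}
```

The resulting structure would then be handed to bind() as a `struct sockaddr *` together with `sizeof(struct sockaddr_ll)`.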


* Re: RFC(v2): Audit Kernel Container IDs
From: Casey Schaufler @ 2017-10-12 16:33 UTC
  To: Richard Guy Briggs, cgroups-u79uwXL29TY76Z2rM5mHXA,
	Linux Containers, Linux API, Linux Audit, Linux FS Devel,
	Linux Kernel, Linux Network Development
  Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Al Viro,
	David Howells, Simo Sorce, trondmy-7I+n7zu2hftEKMMhf/gKZA,
	Eric Paris, Serge E. Hallyn, Eric W. Biederman
In-Reply-To: <20171012141359.saqdtnodwmbz33b2-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>

On 10/12/2017 7:14 AM, Richard Guy Briggs wrote:
> Containers are a userspace concept.  The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions.  Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this.  This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.
>
> The registration is a pseudo filesystem (proc, since PID tree already
> exists) write of a u8[16] UUID representing the container ID to a file
> representing a process that will become the first process in a new
> container.  This write might place restrictions on mount namespaces
> required to define a container, or at least careful checking of
> namespaces in the kernel to verify permissions of the orchestrator so it
> can't change its own container ID.  A bind mount of nsfs may be
> necessary in the container orchestrator's mntNS.
> Note: Use a 128-bit scalar rather than a string to make compares faster
> and simpler.
>
> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
> registration.

Hang on. If containers are a user space concept, how can
you want CAP_CONTAINER_ANYTHING? If there's no such thing as
a container, how can you be asking for a capability to manage
them?

>   At that time, record the target container's user-supplied
> container identifier along with the target container's first process
> (which may become the target container's "init" process) process ID
> (referenced from the initial PID namespace), all namespace IDs (in the
> form of a nsfs device number and inode number tuple) in a new auxiliary
> record AUDIT_CONTAINER with a qualifying op=$action field.
>
> Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' task_struct.
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children.  If this is deemed overly restrictive,
> switch all threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_CONTAINER_ADMIN.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable.  Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
>
> Log the destruction of every namespace when it is no longer used by any
> process, include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> When a container ceases to exist because the last process in that
> container has exited and hence the last namespace has been destroyed and
> its refcount dropping to zero, log the fact.
> (This latter is likely needed for certification accountability.)  A
> container object may need a list of processes and/or namespaces.
>
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container.  A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.
>
> (v2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and children into same container
>
> - RGB
>
> --
> Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>
> --
> Linux-audit mailing list
> Linux-audit-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> https://www.redhat.com/mailman/listinfo/linux-audit
>
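
The RFC's note about preferring a fixed 128-bit value over a string for the container ID can be illustrated with a short sketch. Everything here is an assumption made for illustration: the RFC does not define `struct container_id`, these helper names, or any "unset" encoding.

```c
#include <stdint.h>
#include <string.h>

#define CONTAINER_ID_LEN 16	/* 128 bits, as in the u8[16] proposal */

/* Hypothetical type: the RFC does not define a struct for the ID. */
struct container_id {
	uint8_t b[CONTAINER_ID_LEN];
};

/* Fixed-size comparison: no terminator scan, no length bookkeeping. */
static int container_id_equal(const struct container_id *a,
			      const struct container_id *b)
{
	return memcmp(a->b, b->b, CONTAINER_ID_LEN) == 0;
}

/* An all-zero ID is used here (by assumption) to mean "not set". */
static int container_id_is_set(const struct container_id *id)
{
	static const struct container_id unset;

	return !container_id_equal(id, &unset);
}
```

Because the length is fixed, the comparison is a constant 16-byte memcmp() and there is no need to validate or bound a user-supplied string.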


* Re: [PATCH net-next 1/1] veth: tweak creation of veth device
From: Roman Mashak @ 2017-10-12 16:49 UTC
  To: David Miller; +Cc: jhs, netdev
In-Reply-To: <20171011.151706.1844884518098480593.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

>> When creating veth pair, at first rtnl_new_link() creates veth_dev, i.e.
>> one end of the veth pipe, but does not register it; then veth_newlink() gets
>> invoked, where peer dev is created _and_ registered, followed by veth_dev
>> registration, which may fail if peer information, that is VETH_INFO_PEER
>> attribute, has not been provided and the kernel will allocate unique veth
>> name.
>> 
>> So, we should ask the kernel to allocate unique name for veth_dev only
>> when peer info is not available.
>> 
>> Example:
>> 
>> % ip link add dev veth0 type veth
>> RTNETLINK answers: File exists
>> 
>> After fix:
>> % ip link add dev veth0 type veth
>> % ip link show dev veth0
>> 5: veth0@veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>>     link/ether f6:ef:8b:96:f4:ec brd ff:ff:ff:ff:ff:ff
>> %
>> 
>> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
>
> I'm not so sure about this.
>
> If we specify an explicit tb[IFLA_NAME], we shouldn't completely ignore that
> request from the user just because they didn't give any peer information.
>
> I see what happens in this case, the peer gets 'veth0' and then since
> the user asked for 'veth0' for the non-peer it conflicts.

So, the only way is to require user space to _always_ pass in
VETH_INFO_PEER, which may break existing code (fixing iproute2 is easiest).

Otherwise, ignore netlink messages lacking VETH_INFO_PEER and return an
error.

IMO, neither of these solutions seems reasonable.

Also, there are valid use cases where a user does not care about the name
of the veth sitting in the container, but assigns a name following a certain
pattern to the host-side veth.

> Well, too bad.  The user must work to orchestrate things such that
> this doesn't happen.  That means either providing the IFLA_NAME for
> both the peer and the non-peer, or specifying neither.
>
> I'm not applying this, sorry.


* Re: [PATCH 3/4] net: qcom/emac: enforce DMA address restrictions
From: Timur Tabi @ 2017-10-12 16:52 UTC
  To: David Laight, David S. Miller, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD0092ADF@AcuExch.aculab.com>

On 10/12/2017 11:20 AM, David Laight wrote:
>> I'm pretty sure that kzalloc does not make that guarantee, and I don't
>> think dma_alloc_coherent does either.

> dma_alloc_coherent() definitely does.
> And I've a driver that relies on it (for 16k blocks).

What about when an IOMMU is used?  The DMA address that gets returned is 
not necessarily the physical address of the memory buffer.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* Re: [PATCH 3/4] net: qcom/emac: enforce DMA address restrictions
From: David Miller @ 2017-10-12 16:58 UTC
  To: timur; +Cc: David.Laight, netdev
In-Reply-To: <e523734d-d845-8eae-4a5b-e679b8e46654@codeaurora.org>

From: Timur Tabi <timur@codeaurora.org>
Date: Thu, 12 Oct 2017 09:13:25 -0500

> On 10/12/17 4:30 AM, David Laight wrote:
>> Isn't the memory allocated by a single kzalloc() call?
> 
> dma_alloc_coherent, actually.
> 
>> IIRC that guarantees it doesn't cross a power of 2 boundary less than
>> the size.
> 
> I'm pretty sure that kzalloc does not make that guarantee, and I don't
> think dma_alloc_coherent does either.

Both make that guarantee, even when an IOMMU is used.


* Re: [PATCH 3/4] net: qcom/emac: enforce DMA address restrictions
From: Timur Tabi @ 2017-10-12 17:15 UTC
  To: David Miller; +Cc: David.Laight, netdev
In-Reply-To: <20171012.095837.2057549694773237248.davem@davemloft.net>

On 10/12/2017 11:58 AM, David Miller wrote:
>> I'm pretty sure that kzalloc does not make that guarantee, and I don't
>> think dma_alloc_coherent does either.
> Both make that guarantee, even when an IOMMU is used.
> 

Ok, Dave, then can you drop this patch?

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [patch net-next 00/34] net: sched: allow qdiscs to share filter block instances
From: Jiri Pirko @ 2017-10-12 17:17 UTC
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb

From: Jiri Pirko <jiri@mellanox.com>

First of all, I would like to apologize for the big patchset. However, after a
couple of hours trying to figure out how to cut it, I found out it is
actually not possible. I would have to add some interface in one patchset
and only use it in second, which is forbidden. Also, I would like
to provide the reviewer the full picture. Most of the patches are small
and contained anyway, so it should be easy to review them. But to the
motivation:

Currently the filters added to qdiscs are independent. So for example if you
have 2 netdevices and you create an ingress qdisc on both and you want to add
identical filter rules to both, you need to add them twice. This patchset
makes this easier and mainly saves resources by allowing all filters within a
qdisc to be shared - I call it a "filter block". This also helps to save
resources when we offload to hw, for example to expensive TCAM.

So back to the example. First, we create 2 qdiscs. Both will share
block number 22. "22" is just an identification. If we don't pass any
block number, a new one will be generated by kernel:

$ tc qdisc add dev ens7 ingress block 22
                                ^^^^^^^^
$ tc qdisc add dev ens8 ingress block 22
                                ^^^^^^^^

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress ffff: dev ens7 parent ffff:fff1 block 22 
qdisc ingress ffff: dev ens8 parent ffff:fff1 block 22 

Now we can add a filter to any of the qdiscs sharing the same block:

$ tc filter add dev ens7 parent ffff: protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop


We will see the same output if we list filters for ens7 and ens8, including stats:

$ tc -s filter show dev ens7 ingress
filter protocol ip pref 25 flower chain 0 
filter protocol ip pref 25 flower chain 0 handle 0x1 
  eth_type ipv4
  dst_ip 192.168.0.0/16
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 39 sec used 2 sec
        Action statistics:
        Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0 

$ tc -s filter show dev ens8 ingress
filter protocol ip pref 25 flower chain 0 
filter protocol ip pref 25 flower chain 0 handle 0x1 
  eth_type ipv4
  dst_ip 192.168.0.0/16
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 40 sec used 3 sec
        Action statistics:
        Sent 3108 bytes 37 pkt (dropped 37, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0
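
The sharing shown in the example above (two qdiscs, one set of filters and counters) can be modelled as a lookup of a reference-counted block by its index. This is only a toy sketch under stated assumptions: `struct filter_block`, `block_get`, `block_put` and the fixed-size table are hypothetical and are not the interfaces introduced by this patchset.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 8	/* illustrative table size */

struct filter_block {
	uint32_t index;		/* user-visible block number, 0 = slot free */
	unsigned int refcnt;	/* number of qdiscs attached */
	unsigned int nfilters;	/* filters are stored once per block */
};

static struct filter_block block_table[MAX_BLOCKS];

/* Attach to block `index`, creating it on first use. NULL if no room. */
static struct filter_block *block_get(uint32_t index)
{
	struct filter_block *free_slot = NULL;
	size_t i;

	if (index == 0)		/* 0 marks a free slot in this toy model */
		return NULL;

	for (i = 0; i < MAX_BLOCKS; i++) {
		if (block_table[i].index == index) {
			block_table[i].refcnt++;
			return &block_table[i];
		}
		if (!free_slot && block_table[i].index == 0)
			free_slot = &block_table[i];
	}
	if (!free_slot)
		return NULL;

	free_slot->index = index;
	free_slot->refcnt = 1;
	free_slot->nfilters = 0;
	return free_slot;
}

/* Detach from a block; the slot is recycled when the last user leaves. */
static void block_put(struct filter_block *block)
{
	if (--block->refcnt == 0)
		block->index = 0;
}
```

Two callers asking for block 22 get the same object back, which is why a filter added through one qdisc shows up, with the same statistics, when listed through the other.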


Patches overview:

Patches 1-3 introduce infrastructure for block sharing and the interface
   functions to the qdisc, tcf_block_get_ext and tcf_block_put_ext
Patches 4-11 remove usages of the tp->q pointer, which needs to be
   eventually removed in order to make the tcf_proto independent
   of a qdisc instance
Patches 12-19 introduce block callbacks, internal infra and driver-facing
   interface, and add callback calling to individual classifiers
Patches 20-28 convert individual drivers from ndo_setup_tc to block
   callbacks for classifiers offloading
Patches 29-31 remove unused things due to the previous conversion
Patch 32 introduces block mechanism to handle netif_keep_dst calls
Patch 33 removes tp->q and tp->classid - makes tcf_proto independent of qdisc
Patch 34 finally enables block sharing for cls_ingress and cls_clsact


Iproute2 implementation is here:
https://github.com/jpirko/iproute2_mlxsw/commit/f91ff81e3b307adfe769f29b3c04c70f39a2520c

The next patchset will introduce block sharing for mlxsw. For the
curious ones the patches could be found here:
https://github.com/jpirko/linux_mlxsw/commits/jiri_devel_shblock

Jiri Pirko (34):
  net: sched: store Qdisc pointer in struct block
  net: sched: introduce support for multiple filter chain pointers
    registration
  net: sched: introduce shared filter blocks infrastructure
  net: sched: teach tcf_bind/unbind_filter to use block->q
  net: sched: ematch: obtain net pointer from blocks
  net: core: use dev->ingress_queue instead of tp->q
  net: sched: cls_u32: use block instead of q in tc_u_common
  net: sched: avoid usage of tp->q in tcf_classify
  net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct
    Qdisc
  net: sched: use tcf_block_q helper to get q pointer for sch_tree_lock
  net: sched: propagate q and parent from caller down to tcf_fill_node
  net: sched: add block bind/unbind notification to drivers
  net: sched: introduce per-block callbacks
  net: sched: use extended variants of block get and put in ingress and
    clsact qdiscs
  net: sched: use tc_setup_cb_call to call per-block callbacks
  net: sched: cls_matchall: call block callbacks for offload
  net: sched: cls_u32: swap u32_remove_hw_knode and u32_remove_hw_hnode
  net: sched: cls_u32: call block callbacks for offload
  net: sched: cls_bpf: call block callbacks for offload
  mlxsw: spectrum: Convert ndo_setup_tc offloads to block callbacks
  mlx5e: Convert ndo_setup_tc offloads to block callbacks
  bnxt: Convert ndo_setup_tc offloads to block callbacks
  cxgb4: Convert ndo_setup_tc offloads to block callbacks
  ixgbe: Convert ndo_setup_tc offloads to block callbacks
  mlx5e_rep: Convert ndo_setup_tc offloads to block callbacks
  nfp: flower: Convert ndo_setup_tc offloads to block callbacks
  nfp: bpf: Convert ndo_setup_tc offloads to block callbacks
  dsa: Convert ndo_setup_tc offloads to block callbacks
  net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS*
  net: sched: remove unused classid field from tc_cls_common_offload
  net: sched: remove unused is_classid_clsact_ingress/egress helpers
  net: sched: introduce block mechanism to handle netif_keep_dst calls
  net: sched: remove classid and q fields from tcf_proto
  net: sched: allow ingress and clsact qdiscs to share filter blocks

 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |  37 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c       |   3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c      |  41 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c    |  42 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  45 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  45 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  62 ++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  83 +++-
 drivers/net/ethernet/netronome/nfp/bpf/main.c      |  51 +-
 drivers/net/ethernet/netronome/nfp/bpf/offload.c   |   4 +
 .../net/ethernet/netronome/nfp/flower/offload.c    |  54 ++-
 include/linux/netdevice.h                          |   1 +
 include/net/pkt_cls.h                              | 195 +++++++-
 include/net/pkt_sched.h                            |  14 +-
 include/net/sch_generic.h                          |  15 +-
 include/uapi/linux/pkt_sched.h                     |  12 +
 net/core/dev.c                                     |  21 +-
 net/dsa/slave.c                                    |  64 ++-
 net/sched/cls_api.c                                | 527 +++++++++++++++++++--
 net/sched/cls_bpf.c                                |  32 +-
 net/sched/cls_flow.c                               |   9 +-
 net/sched/cls_flower.c                             |  29 +-
 net/sched/cls_fw.c                                 |   5 +-
 net/sched/cls_matchall.c                           |  58 +--
 net/sched/cls_route.c                              |   2 +-
 net/sched/cls_tcindex.c                            |   5 +-
 net/sched/cls_u32.c                                |  79 ++-
 net/sched/ematch.c                                 |   2 +-
 net/sched/sch_api.c                                |   6 +-
 net/sched/sch_atm.c                                |   4 +-
 net/sched/sch_cbq.c                                |   2 +-
 net/sched/sch_drr.c                                |   2 +-
 net/sched/sch_dsmark.c                             |   2 +-
 net/sched/sch_fq_codel.c                           |   2 +-
 net/sched/sch_hfsc.c                               |   4 +-
 net/sched/sch_htb.c                                |   4 +-
 net/sched/sch_ingress.c                            | 123 ++++-
 net/sched/sch_multiq.c                             |   2 +-
 net/sched/sch_prio.c                               |   2 +-
 net/sched/sch_qfq.c                                |   2 +-
 net/sched/sch_sfb.c                                |   2 +-
 net/sched/sch_sfq.c                                |   2 +-
 43 files changed, 1370 insertions(+), 330 deletions(-)

-- 
2.9.5

^ permalink raw reply

* [patch net-next 01/34] net: sched: store Qdisc pointer in struct block
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Prepare for removal of tp->q and store Qdisc pointer in the block
structure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h     | 4 ++--
 include/net/sch_generic.h | 1 +
 net/sched/cls_api.c       | 3 ++-
 net/sched/sch_atm.c       | 4 ++--
 net/sched/sch_cbq.c       | 2 +-
 net/sched/sch_drr.c       | 2 +-
 net/sched/sch_dsmark.c    | 2 +-
 net/sched/sch_fq_codel.c  | 2 +-
 net/sched/sch_hfsc.c      | 4 ++--
 net/sched/sch_htb.c       | 4 ++--
 net/sched/sch_ingress.c   | 6 +++---
 net/sched/sch_multiq.c    | 2 +-
 net/sched/sch_prio.c      | 2 +-
 net/sched/sch_qfq.c       | 2 +-
 net/sched/sch_sfb.c       | 2 +-
 net/sched/sch_sfq.c       | 2 +-
 16 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f526374..772dfa8 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -22,7 +22,7 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 				bool create);
 void tcf_chain_put(struct tcf_chain *chain);
 int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain);
+		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q);
 void tcf_block_put(struct tcf_block *block);
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
@@ -30,7 +30,7 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 #else
 static inline
 int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain)
+		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q)
 {
 	return 0;
 }
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 684d8ed..df4032c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -270,6 +270,7 @@ struct tcf_chain {
 
 struct tcf_block {
 	struct list_head chain_list;
+	struct Qdisc *q;
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2977b8a..f7d3f1f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ tcf_chain_filter_chain_ptr_set(struct tcf_chain *chain,
 }
 
 int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain)
+		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q)
 {
 	struct tcf_block *block = kzalloc(sizeof(*block), GFP_KERNEL);
 	struct tcf_chain *chain;
@@ -257,6 +257,7 @@ int tcf_block_get(struct tcf_block **p_block,
 		goto err_chain_create;
 	}
 	tcf_chain_filter_chain_ptr_set(chain, p_filter_chain);
+	block->q = q;
 	*p_block = block;
 	return 0;
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index c5fcdf1..2dbd249 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -281,7 +281,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 		goto err_out;
 	}
 
-	error = tcf_block_get(&flow->block, &flow->filter_list);
+	error = tcf_block_get(&flow->block, &flow->filter_list, sch);
 	if (error) {
 		kfree(flow);
 		goto err_out;
@@ -546,7 +546,7 @@ static int atm_tc_init(struct Qdisc *sch, struct nlattr *opt)
 		p->link.q = &noop_qdisc;
 	pr_debug("atm_tc_init: link (%p) qdisc %p\n", &p->link, p->link.q);
 
-	err = tcf_block_get(&p->link.block, &p->link.filter_list);
+	err = tcf_block_get(&p->link.block, &p->link.filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index dcef97f..c3b92d6 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1566,7 +1566,7 @@ cbq_change_class(struct Qdisc *sch, u32 classid, u32 parentid, struct nlattr **t
 	if (cl == NULL)
 		goto failure;
 
-	err = tcf_block_get(&cl->block, &cl->filter_list);
+	err = tcf_block_get(&cl->block, &cl->filter_list, sch);
 	if (err) {
 		kfree(cl);
 		return err;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 2d0e8d4..753dc7a 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -412,7 +412,7 @@ static int drr_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
 	struct drr_sched *q = qdisc_priv(sch);
 	int err;
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 	err = qdisc_class_hash_init(&q->clhash);
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 2836c80..fb4fb71 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -344,7 +344,7 @@ static int dsmark_init(struct Qdisc *sch, struct nlattr *opt)
 	if (!opt)
 		goto errout;
 
-	err = tcf_block_get(&p->block, &p->filter_list);
+	err = tcf_block_get(&p->block, &p->filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index de3b57c..3c40ede 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -481,7 +481,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 			return err;
 	}
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3f88b75..a692184 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1033,7 +1033,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	if (cl == NULL)
 		return -ENOBUFS;
 
-	err = tcf_block_get(&cl->block, &cl->filter_list);
+	err = tcf_block_get(&cl->block, &cl->filter_list, sch);
 	if (err) {
 		kfree(cl);
 		return err;
@@ -1405,7 +1405,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
 		return err;
 	q->eligible = RB_ROOT;
 
-	err = tcf_block_get(&q->root.block, &q->root.filter_list);
+	err = tcf_block_get(&q->root.block, &q->root.filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c6d7ae8..57be73c 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1030,7 +1030,7 @@ static int htb_init(struct Qdisc *sch, struct nlattr *opt)
 	if (!opt)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
@@ -1393,7 +1393,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 		if (!cl)
 			goto failure;
 
-		err = tcf_block_get(&cl->block, &cl->filter_list);
+		err = tcf_block_get(&cl->block, &cl->filter_list, sch);
 		if (err) {
 			kfree(cl);
 			goto failure;
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 44de4ee..9ccc1b8 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -59,7 +59,7 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
-	err = tcf_block_get(&q->block, &dev->ingress_cl_list);
+	err = tcf_block_get(&q->block, &dev->ingress_cl_list, sch);
 	if (err)
 		return err;
 
@@ -153,11 +153,11 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
-	err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list);
+	err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list, sch);
 	if (err)
 		return err;
 
-	err = tcf_block_get(&q->egress_block, &dev->egress_cl_list);
+	err = tcf_block_get(&q->egress_block, &dev->egress_cl_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index ff4fc3e..31e0a28 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -245,7 +245,7 @@ static int multiq_init(struct Qdisc *sch, struct nlattr *opt)
 	if (opt == NULL)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 2dd6c68..95fad34 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -212,7 +212,7 @@ static int prio_init(struct Qdisc *sch, struct nlattr *opt)
 	if (!opt)
 		return -EINVAL;
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 6ddfd49..8694c7b 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1419,7 +1419,7 @@ static int qfq_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
 	int i, j, err;
 	u32 max_cl_shift, maxbudg_shift, max_classes;
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index cc39e17..487d375 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -553,7 +553,7 @@ static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
 	struct sfb_sched_data *q = qdisc_priv(sch);
 	int err;
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 74ea863..123a53a 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -725,7 +725,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	setup_deferrable_timer(&q->perturb_timer, sfq_perturbation,
 			       (unsigned long)sch);
 
-	err = tcf_block_get(&q->block, &q->filter_list);
+	err = tcf_block_get(&q->block, &q->filter_list, sch);
 	if (err)
 		return err;
 
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 02/34] net: sched: introduce support for multiple filter chain pointers registration
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

So far, it was only possible to register a single filter chain
pointer to block->chain[0]. However, once blocks become shareable,
we need to allow registration of multiple filter chain pointers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_sched.h   |  7 +++++
 include/net/sch_generic.h |  3 ++-
 net/sched/cls_api.c       | 68 ++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 67 insertions(+), 11 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 259bc19..2d234af 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -4,7 +4,9 @@
 #include <linux/jiffies.h>
 #include <linux/ktime.h>
 #include <linux/if_vlan.h>
+#include <linux/netdevice.h>
 #include <net/sch_generic.h>
+#include <net/net_namespace.h>
 #include <uapi/linux/pkt_sched.h>
 
 #define DEFAULT_TX_QUEUE_LEN	1000
@@ -146,4 +148,9 @@ static inline bool is_classid_clsact_egress(u32 classid)
 	       TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
 }
 
+static inline struct net *qdisc_net(struct Qdisc *q)
+{
+	return dev_net(q->dev_queue->dev);
+}
+
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index df4032c..6583c59 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -261,7 +261,7 @@ struct qdisc_skb_cb {
 
 struct tcf_chain {
 	struct tcf_proto __rcu *filter_chain;
-	struct tcf_proto __rcu **p_filter_chain;
+	struct list_head filter_chain_list;
 	struct list_head list;
 	struct tcf_block *block;
 	u32 index; /* chain index */
@@ -270,6 +270,7 @@ struct tcf_chain {
 
 struct tcf_block {
 	struct list_head chain_list;
+	struct net *net;
 	struct Qdisc *q;
 };
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f7d3f1f..0ffd79a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -171,6 +171,11 @@ static void tcf_proto_destroy(struct tcf_proto *tp)
 	kfree_rcu(tp, rcu);
 }
 
+struct tfc_filter_chain_list_item {
+	struct list_head list;
+	struct tcf_proto __rcu **p_filter_chain;
+};
+
 static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 					  u32 chain_index)
 {
@@ -179,6 +184,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 	if (!chain)
 		return NULL;
+	INIT_LIST_HEAD(&chain->filter_chain_list);
 	list_add_tail(&chain->list, &block->chain_list);
 	chain->block = block;
 	chain->index = chain_index;
@@ -188,10 +194,11 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 
 static void tcf_chain_flush(struct tcf_chain *chain)
 {
+	struct tfc_filter_chain_list_item *item;
 	struct tcf_proto *tp;
 
-	if (chain->p_filter_chain)
-		RCU_INIT_POINTER(*chain->p_filter_chain, NULL);
+	list_for_each_entry(item, &chain->filter_chain_list, list)
+		RCU_INIT_POINTER(*item->p_filter_chain, NULL);
 	while ((tp = rtnl_dereference(chain->filter_chain)) != NULL) {
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
 		tcf_chain_put(chain);
@@ -233,11 +240,41 @@ void tcf_chain_put(struct tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_chain_put);
 
+static int
+tcf_chain_filter_chain_ptr_add(struct tcf_chain *chain,
+			       struct tcf_proto __rcu **p_filter_chain)
+{
+	struct tfc_filter_chain_list_item *item;
+
+	item = kmalloc(sizeof(*item), GFP_KERNEL);
+	if (!item)
+		return -ENOMEM;
+	item->p_filter_chain = p_filter_chain;
+	list_add(&item->list, &chain->filter_chain_list);
+	return 0;
+}
+
 static void
-tcf_chain_filter_chain_ptr_set(struct tcf_chain *chain,
+tcf_chain_filter_chain_ptr_del(struct tcf_chain *chain,
 			       struct tcf_proto __rcu **p_filter_chain)
 {
-	chain->p_filter_chain = p_filter_chain;
+	struct tfc_filter_chain_list_item *item;
+
+	list_for_each_entry(item, &chain->filter_chain_list, list) {
+		if (!p_filter_chain ||
+		    item->p_filter_chain == p_filter_chain) {
+			RCU_INIT_POINTER(*item->p_filter_chain, NULL);
+			list_del(&item->list);
+			kfree(item);
+			return;
+		}
+	}
+	WARN_ON(1);
+}
+
+static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+{
+	return list_first_entry(&block->chain_list, struct tcf_chain, list);
 }
 
 int tcf_block_get(struct tcf_block **p_block,
@@ -256,7 +293,8 @@ int tcf_block_get(struct tcf_block **p_block,
 		err = -ENOMEM;
 		goto err_chain_create;
 	}
-	tcf_chain_filter_chain_ptr_set(chain, p_filter_chain);
+	tcf_chain_filter_chain_ptr_add(chain, p_filter_chain);
+	block->net = qdisc_net(q);
 	block->q = q;
 	*p_block = block;
 	return 0;
@@ -274,6 +312,8 @@ void tcf_block_put(struct tcf_block *block)
 	if (!block)
 		return;
 
+	tcf_chain_filter_chain_ptr_del(tcf_block_chain_zero(block), NULL);
+
 	/* XXX: Standalone actions are not allowed to jump to any chain, and
 	 * bound actions should be all removed after flushing. However,
 	 * filters are destroyed in RCU callbacks, we have to hold the chains
@@ -371,9 +411,13 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 				struct tcf_chain_info *chain_info,
 				struct tcf_proto *tp)
 {
-	if (chain->p_filter_chain &&
-	    *chain_info->pprev == chain->filter_chain)
-		rcu_assign_pointer(*chain->p_filter_chain, tp);
+	if (*chain_info->pprev == chain->filter_chain) {
+		struct tfc_filter_chain_list_item *item;
+
+		list_for_each_entry(item, &chain->filter_chain_list, list)
+			rcu_assign_pointer(*item->p_filter_chain, tp);
+	}
+
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
 	tcf_chain_hold(chain);
@@ -385,8 +429,12 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 {
 	struct tcf_proto *next = rtnl_dereference(chain_info->next);
 
-	if (chain->p_filter_chain && tp == chain->filter_chain)
-		RCU_INIT_POINTER(*chain->p_filter_chain, next);
+	if (tp == chain->filter_chain) {
+		struct tfc_filter_chain_list_item *item;
+
+		list_for_each_entry(item, &chain->filter_chain_list, list)
+			RCU_INIT_POINTER(*item->p_filter_chain, next);
+	}
 	RCU_INIT_POINTER(*chain_info->pprev, next);
 	tcf_chain_put(chain);
 }
-- 
2.9.5

^ permalink raw reply related
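
Editorial note on patch 02: the idea of keeping several external head
pointers in sync with one chain can be tried out in userspace. The
following is only a minimal sketch under stated assumptions: all names
(struct filter, struct head_ref, chain_head_ptr_add and so on) are
hypothetical stand-ins rather than the kernel types, RCU and locking are
left out, and, unlike the patch, the sketch also syncs a pointer at
registration time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct filter {
	struct filter *next;
	int prio;
};

/* One registered external pointer that must mirror the chain head. */
struct head_ref {
	struct head_ref *next;
	struct filter **p_head;
};

struct chain {
	struct filter *head;
	struct head_ref *refs;
};

/* Register an external head pointer. Syncing it right away is a choice
 * of this sketch, not something taken from the patch.
 */
static int chain_head_ptr_add(struct chain *c, struct filter **p_head)
{
	struct head_ref *r;

	assert(c && p_head);
	r = malloc(sizeof(*r));
	if (!r)
		return -1;
	r->p_head = p_head;
	r->next = c->refs;
	c->refs = r;
	*p_head = c->head;
	return 0;
}

/* Unregister one external pointer and clear it. */
static void chain_head_ptr_del(struct chain *c, struct filter **p_head)
{
	struct head_ref **pp;

	for (pp = &c->refs; *pp; pp = &(*pp)->next) {
		struct head_ref *r = *pp;

		if (r->p_head == p_head) {
			*r->p_head = NULL;
			*pp = r->next;
			free(r);
			return;
		}
	}
}

/* Change the head and propagate it to every registered pointer. */
static void chain_set_head(struct chain *c, struct filter *f)
{
	struct head_ref *r;

	c->head = f;
	for (r = c->refs; r; r = r->next)
		*r->p_head = f;
}

static void chain_insert_front(struct chain *c, struct filter *f)
{
	f->next = c->head;
	chain_set_head(c, f);
}

static void chain_remove_front(struct chain *c)
{
	if (c->head)
		chain_set_head(c, c->head->next);
}
```

Registering two pointers and then calling chain_insert_front() leaves
both pointing at the new head, which is the property the list walk in
tcf_chain_tp_insert() and tcf_chain_tp_remove() is meant to provide.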

* [patch net-next 03/34] net: sched: introduce shared filter blocks infrastructure
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Allow qdiscs to share filter blocks among them. Each qdisc type has to
use the block get/put variants that enable sharing. Shared blocks are
tracked within each net namespace and identified by a u32 value. This
value is auto-generated if the user did not pass one from userspace. If
the user passes a value that is not in use, a new block is created. If
the user passes a value that is already in use, the existing block is
re-used.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h     |  57 ++++++++++++++-
 include/net/sch_generic.h |   2 +
 net/sched/cls_api.c       | 183 +++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 222 insertions(+), 20 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 772dfa8..0cf520b 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -17,13 +17,42 @@ struct tcf_walker {
 int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
+struct tcf_block_ext_info {
+	bool shareable;
+	u32 block_index;
+};
+
 #ifdef CONFIG_NET_CLS
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 				bool create);
 void tcf_chain_put(struct tcf_chain *chain);
+
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q);
+int tcf_block_get_ext(struct tcf_block **p_block,
+		      struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+		      struct tcf_block_ext_info *ei);
 void tcf_block_put(struct tcf_block *block);
+void tcf_block_put_ext(struct tcf_block *block,
+		       struct tcf_proto __rcu **p_filter_chain,
+		       struct tcf_block_ext_info *ei);
+
+static inline bool tcf_is_block_shared(const struct tcf_block *block)
+{
+	return block->refcnt != 1;
+}
+
+static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
+{
+	WARN_ON(tcf_is_block_shared(block));
+	return block->q;
+}
+
+static inline struct net_device *tcf_block_dev(struct tcf_block *block)
+{
+	return tcf_block_q(block)->dev_queue->dev;
+}
+
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
 
@@ -35,8 +64,34 @@ int tcf_block_get(struct tcf_block **p_block,
 	return 0;
 }
 
-static inline void tcf_block_put(struct tcf_block *block)
+static inline
+int tcf_block_get_ext(struct tcf_block **p_block,
+		      struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+		      struct tcf_block_ext_info *ei)
 {
+	return 0;
+}
+
+static inline
+void tcf_block_put(struct tcf_block *block)
+{
+}
+
+static inline
+void tcf_block_put_ext(struct tcf_block *block,
+		       struct tcf_proto __rcu **p_filter_chain,
+		       struct tcf_block_ext_info *ei)
+{
+}
+
+static inline struct net_device *tcf_block_dev(struct tcf_block *block)
+{
+	return NULL;
+}
+
+static inline bool tcf_is_block_shared(const struct tcf_block *block)
+{
+	return false;
 }
 
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 6583c59..0b2ba3b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -270,6 +270,8 @@ struct tcf_chain {
 
 struct tcf_block {
 	struct list_head chain_list;
+	u32 index; /* block index for shared blocks */
+	unsigned int refcnt;
 	struct net *net;
 	struct Qdisc *q;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0ffd79a..2d0f18f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -25,6 +25,7 @@
 #include <linux/kmod.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include <linux/idr.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/netlink.h>
@@ -272,48 +273,70 @@ tcf_chain_filter_chain_ptr_del(struct tcf_chain *chain,
 	WARN_ON(1);
 }
 
-static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+struct tcf_net {
+	struct idr idr;
+};
+
+static unsigned int tcf_net_id;
+
+static int tcf_block_insert(struct tcf_block *block, struct net *net,
+			    u32 block_index)
 {
-	return list_first_entry(&block->chain_list, struct tcf_chain, list);
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+	int idr_start;
+	int idr_end;
+	int index;
+
+	if (block_index >= INT_MAX)
+		return -EINVAL;
+	idr_start = block_index ? block_index : 1;
+	idr_end = block_index ? block_index + 1 : INT_MAX;
+
+	index = idr_alloc(&tn->idr, block, idr_start, idr_end, GFP_KERNEL);
+	if (index < 0)
+		return index;
+	block->index = index;
+	return 0;
 }
 
-int tcf_block_get(struct tcf_block **p_block,
-		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q)
+static void tcf_block_remove(struct tcf_block *block, struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_remove(&tn->idr, block->index);
+}
+
+static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q)
 {
-	struct tcf_block *block = kzalloc(sizeof(*block), GFP_KERNEL);
+	struct tcf_block *block;
 	struct tcf_chain *chain;
 	int err;
 
+	block = kzalloc(sizeof(*block), GFP_KERNEL);
 	if (!block)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	INIT_LIST_HEAD(&block->chain_list);
+	block->refcnt = 1;
+	block->net = net;
+	block->q = q;
+
 	/* Create chain 0 by default, it has to be always present. */
 	chain = tcf_chain_create(block, 0);
 	if (!chain) {
 		err = -ENOMEM;
 		goto err_chain_create;
 	}
-	tcf_chain_filter_chain_ptr_add(chain, p_filter_chain);
-	block->net = qdisc_net(q);
-	block->q = q;
-	*p_block = block;
-	return 0;
+	return block;
 
 err_chain_create:
 	kfree(block);
-	return err;
+	return ERR_PTR(err);
 }
-EXPORT_SYMBOL(tcf_block_get);
 
-void tcf_block_put(struct tcf_block *block)
+static void tcf_block_destroy(struct tcf_block *block)
 {
 	struct tcf_chain *chain, *tmp;
 
-	if (!block)
-		return;
-
-	tcf_chain_filter_chain_ptr_del(tcf_block_chain_zero(block), NULL);
-
 	/* XXX: Standalone actions are not allowed to jump to any chain, and
 	 * bound actions should be all removed after flushing. However,
 	 * filters are destroyed in RCU callbacks, we have to hold the chains
@@ -341,6 +364,100 @@ void tcf_block_put(struct tcf_block *block)
 		tcf_chain_put(chain);
 	kfree(block);
 }
+
+static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	return idr_find(&tn->idr, block_index);
+}
+
+static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
+{
+	return list_first_entry(&block->chain_list, struct tcf_chain, list);
+}
+
+int tcf_block_get_ext(struct tcf_block **p_block,
+		      struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
+		      struct tcf_block_ext_info *ei)
+{
+	struct net *net = qdisc_net(q);
+	struct tcf_block *block = NULL;
+	bool created = false;
+	int err;
+
+	if (ei->shareable) {
+		block = tcf_block_lookup(net, ei->block_index);
+		if (block)
+			block->refcnt++;
+	}
+
+	if (!block) {
+		block = tcf_block_create(net, q);
+		if (IS_ERR(block))
+			return PTR_ERR(block);
+		created = true;
+		if (ei->shareable) {
+			err = tcf_block_insert(block, net, ei->block_index);
+			if (err)
+				goto err_block_insert;
+		}
+	}
+
+	err = tcf_chain_filter_chain_ptr_add(tcf_block_chain_zero(block),
+					     p_filter_chain);
+	if (err)
+		goto err_chain_filter_chain_ptr_add;
+
+	*p_block = block;
+	return 0;
+
+err_chain_filter_chain_ptr_add:
+	if (created) {
+		if (ei->shareable)
+			tcf_block_remove(block, net);
+err_block_insert:
+		tcf_block_destroy(block);
+	} else {
+		block->refcnt--;
+	}
+	return err;
+}
+EXPORT_SYMBOL(tcf_block_get_ext);
+
+int tcf_block_get(struct tcf_block **p_block,
+		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q)
+{
+	struct tcf_block_ext_info ei = {0, };
+
+	return tcf_block_get_ext(p_block, p_filter_chain, q, &ei);
+}
+EXPORT_SYMBOL(tcf_block_get);
+
+void tcf_block_put_ext(struct tcf_block *block,
+		       struct tcf_proto __rcu **p_filter_chain,
+		       struct tcf_block_ext_info *ei)
+{
+	if (!block)
+		return;
+
+	tcf_chain_filter_chain_ptr_del(tcf_block_chain_zero(block),
+				       p_filter_chain);
+
+	if (--block->refcnt == 0) {
+		if (ei->shareable)
+			tcf_block_remove(block, block->net);
+		tcf_block_destroy(block);
+	}
+}
+EXPORT_SYMBOL(tcf_block_put_ext);
+
+void tcf_block_put(struct tcf_block *block)
+{
+	struct tcf_block_ext_info ei = {0, };
+
+	tcf_block_put_ext(block, NULL, &ei);
+}
 EXPORT_SYMBOL(tcf_block_put);
 
 /* Main classifier routine: scans classifier chain attached
@@ -1090,8 +1207,36 @@ int tc_setup_cb_call(struct tcf_exts *exts, enum tc_setup_type type,
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
 
+static __net_init int tcf_net_init(struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_init(&tn->idr);
+	return 0;
+}
+
+static void __net_exit tcf_net_exit(struct net *net)
+{
+	struct tcf_net *tn = net_generic(net, tcf_net_id);
+
+	idr_destroy(&tn->idr);
+}
+
+static struct pernet_operations tcf_net_ops = {
+	.init = tcf_net_init,
+	.exit = tcf_net_exit,
+	.id   = &tcf_net_id,
+	.size = sizeof(struct tcf_net),
+};
+
 static int __init tc_filter_init(void)
 {
+	int err;
+
+	err = register_pernet_subsys(&tcf_net_ops);
+	if (err)
+		return err;
+
 	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_ctl_tfilter, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_ctl_tfilter, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_ctl_tfilter,
-- 
2.9.5

^ permalink raw reply related
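
Editorial note on patch 03: the lookup-or-create behaviour described in
the commit message (index 0 means "pick one for me", an unused index
creates a block, a used index re-uses it) can be modelled in userspace.
This is a rough sketch under assumptions: a small fixed array stands in
for the per-namespace IDR, every name here (struct block_table,
block_get, block_put, MAX_BLOCKS) is hypothetical, and the put side
decides by block index rather than by an ext-info flag as the patch
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BLOCKS 16	/* hypothetical limit; the patch uses an IDR */

struct block {
	uint32_t index;		/* 0 for blocks that are not shared */
	unsigned int refcnt;
};

/* Stand-in for the per-net-namespace table; slot 0 is never used. */
struct block_table {
	struct block *slot[MAX_BLOCKS];
};

static struct block *block_lookup(struct block_table *t, uint32_t index)
{
	if (index == 0 || index >= MAX_BLOCKS)
		return NULL;
	return t->slot[index];
}

/* index == 0 asks for the first free slot, otherwise for that exact slot. */
static int block_insert(struct block_table *t, struct block *b,
			uint32_t index)
{
	uint32_t i;

	if (index >= MAX_BLOCKS)
		return -1;
	if (index) {
		if (t->slot[index])
			return -1;
		i = index;
	} else {
		for (i = 1; i < MAX_BLOCKS && t->slot[i]; i++)
			;
		if (i == MAX_BLOCKS)
			return -1;
	}
	t->slot[i] = b;
	b->index = i;
	return 0;
}

/* Re-use an existing shared block or create a new one. */
static struct block *block_get(struct block_table *t, bool shareable,
			       uint32_t index)
{
	struct block *b = NULL;

	if (shareable)
		b = block_lookup(t, index);
	if (b) {
		b->refcnt++;
		return b;
	}
	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->refcnt = 1;
	if (shareable && block_insert(t, b, index)) {
		free(b);
		return NULL;
	}
	return b;
}

/* Drop one reference; the last user unpublishes and frees the block. */
static void block_put(struct block_table *t, struct block *b)
{
	if (!b)
		return;
	assert(b->refcnt > 0);
	if (--b->refcnt == 0) {
		if (b->index)
			t->slot[b->index] = NULL;
		free(b);
	}
}
```

Two block_get() calls with the same non-zero index return the same
pointer with refcnt 2, and the block only disappears from the table when
the second block_put() drops the count to zero.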

* [patch net-next 04/34] net: sched: teach tcf_bind/unbind_filter to use block->q
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Whenever block->q is set, it can be used instead of tp->q as it
contains the same value. When it is not set, which can't happen now but
might happen once the follow-up shared blocks are introduced, the class
is not set in the result. That would lead to a class lookup instead
of direct class pointer use for classful qdiscs. However, there is no
plan to support classful qdiscs sharing filter blocks, so that may
never happen.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0cf520b..05bc999 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -108,36 +108,43 @@ __cls_set_class(unsigned long *clp, unsigned long cl)
 }
 
 static inline unsigned long
-cls_set_class(struct tcf_proto *tp, unsigned long *clp, 
-	unsigned long cl)
+cls_set_class(struct Qdisc *q, unsigned long *clp, unsigned long cl)
 {
 	unsigned long old_cl;
-	
-	tcf_tree_lock(tp);
+
+	sch_tree_lock(q);
 	old_cl = __cls_set_class(clp, cl);
-	tcf_tree_unlock(tp);
- 
+	sch_tree_unlock(q);
 	return old_cl;
 }
 
 static inline void
 tcf_bind_filter(struct tcf_proto *tp, struct tcf_result *r, unsigned long base)
 {
+	struct Qdisc *q = tp->chain->block->q;
 	unsigned long cl;
 
-	cl = tp->q->ops->cl_ops->bind_tcf(tp->q, base, r->classid);
-	cl = cls_set_class(tp, &r->class, cl);
+	/* Check q as it is not set for shared blocks. In that case,
+	 * setting class is not supported.
+	 */
+	if (!q)
+		return;
+	cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
+	cl = cls_set_class(q, &r->class, cl);
 	if (cl)
-		tp->q->ops->cl_ops->unbind_tcf(tp->q, cl);
+		q->ops->cl_ops->unbind_tcf(q, cl);
 }
 
 static inline void
 tcf_unbind_filter(struct tcf_proto *tp, struct tcf_result *r)
 {
+	struct Qdisc *q = tp->chain->block->q;
 	unsigned long cl;
 
+	if (!q)
+		return;
 	if ((cl = __cls_set_class(&r->class, 0)) != 0)
-		tp->q->ops->cl_ops->unbind_tcf(tp->q, cl);
+		q->ops->cl_ops->unbind_tcf(q, cl);
 }
 
 struct tcf_exts {
-- 
2.9.5

^ permalink raw reply related
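
Editorial note on patch 04: the effect of the new NULL check can be
illustrated outside the kernel. The sketch below is an assumption-laden
model, not the kernel code: struct qdisc, struct block, struct result
and the toy_bind/toy_unbind callbacks are hypothetical, and the
sch_tree_lock()/sch_tree_unlock() pair around the swap is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the qdisc, its class ops and the block. */
struct qdisc {
	unsigned long (*bind)(struct qdisc *q, unsigned int classid);
	void (*unbind)(struct qdisc *q, unsigned long cl);
	int bound;		/* number of currently bound classes */
};

struct block {
	struct qdisc *q;	/* NULL models a block without a single owner */
};

struct result {
	unsigned long cls;	/* cached class handle, 0 when unset */
	unsigned int classid;
};

/* Store a new class handle and hand back the previous one. */
static unsigned long swap_class(unsigned long *clp, unsigned long cl)
{
	unsigned long old = *clp;

	*clp = cl;
	return old;
}

static void bind_filter(struct block *b, struct result *r)
{
	struct qdisc *q;
	unsigned long cl;

	assert(b && r);
	q = b->q;
	if (!q)			/* no owning qdisc: leave r->cls unset */
		return;
	cl = q->bind(q, r->classid);
	cl = swap_class(&r->cls, cl);
	if (cl)			/* release the class bound previously */
		q->unbind(q, cl);
}

static void unbind_filter(struct block *b, struct result *r)
{
	struct qdisc *q;
	unsigned long cl;

	assert(b && r);
	q = b->q;
	if (!q)
		return;
	cl = swap_class(&r->cls, 0);
	if (cl)
		q->unbind(q, cl);
}

/* Toy class ops: the handle is classid + 100, and binds are counted. */
static unsigned long toy_bind(struct qdisc *q, unsigned int classid)
{
	q->bound++;
	return classid + 100UL;
}

static void toy_unbind(struct qdisc *q, unsigned long cl)
{
	(void)cl;
	q->bound--;
}
```

With a block whose q is NULL, bind_filter() returns without touching the
result, so a consumer would have to fall back to looking the class up by
classid, which is the behaviour the commit message describes.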

* [patch net-next 05/34] net: sched: ematch: obtain net pointer from blocks
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Instead of using tp->q, use block to get the net pointer.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/ematch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/ematch.c b/net/sched/ematch.c
index 03b677b..1331a4c 100644
--- a/net/sched/ematch.c
+++ b/net/sched/ematch.c
@@ -178,7 +178,7 @@ static int tcf_em_validate(struct tcf_proto *tp,
 	struct tcf_ematch_hdr *em_hdr = nla_data(nla);
 	int data_len = nla_len(nla) - sizeof(*em_hdr);
 	void *data = (void *) em_hdr + sizeof(*em_hdr);
-	struct net *net = dev_net(qdisc_dev(tp->q));
+	struct net *net = tp->chain->block->net;
 
 	if (!TCF_EM_REL_VALID(em_hdr->flags))
 		goto errout;
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 06/34] net: core: use dev->ingress_queue instead of tp->q
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

In sch_handle_egress and sch_handle_ingress, don't use tp->q. Use the
qdisc reachable through dev->ingress_queue instead, which is the same
pointer.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/dev.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fcddccb..cb9e5e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3273,14 +3273,18 @@ EXPORT_SYMBOL(dev_loopback_xmit);
 static struct sk_buff *
 sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 {
+	struct netdev_queue *netdev_queue =
+				rcu_dereference_bh(dev->ingress_queue);
 	struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list);
 	struct tcf_result cl_res;
+	struct Qdisc *q;
 
-	if (!cl)
+	if (!cl || !netdev_queue)
 		return skb;
+	q = netdev_queue->qdisc;
 
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
-	qdisc_bstats_cpu_update(cl->q, skb);
+	qdisc_bstats_cpu_update(q, skb);
 
 	switch (tcf_classify(skb, cl, &cl_res, false)) {
 	case TC_ACT_OK:
@@ -3288,7 +3292,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
-		qdisc_qstats_cpu_drop(cl->q);
+		qdisc_qstats_cpu_drop(q);
 		*ret = NET_XMIT_DROP;
 		kfree_skb(skb);
 		return NULL;
@@ -4188,16 +4192,21 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		   struct net_device *orig_dev)
 {
 #ifdef CONFIG_NET_CLS_ACT
+	struct netdev_queue *netdev_queue =
+				rcu_dereference_bh(skb->dev->ingress_queue);
 	struct tcf_proto *cl = rcu_dereference_bh(skb->dev->ingress_cl_list);
 	struct tcf_result cl_res;
+	struct Qdisc *q;
 
 	/* If there's at least one ingress present somewhere (so
 	 * we get here via enabled static key), remaining devices
 	 * that are not configured with an ingress qdisc will bail
 	 * out here.
 	 */
-	if (!cl)
+	if (!cl || !netdev_queue)
 		return skb;
+	q = netdev_queue->qdisc;
+
 	if (*pt_prev) {
 		*ret = deliver_skb(skb, *pt_prev, orig_dev);
 		*pt_prev = NULL;
@@ -4205,7 +4214,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 
 	qdisc_skb_cb(skb)->pkt_len = skb->len;
 	skb->tc_at_ingress = 1;
-	qdisc_bstats_cpu_update(cl->q, skb);
+	qdisc_bstats_cpu_update(q, skb);
 
 	switch (tcf_classify(skb, cl, &cl_res, false)) {
 	case TC_ACT_OK:
@@ -4213,7 +4222,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
-		qdisc_qstats_cpu_drop(cl->q);
+		qdisc_qstats_cpu_drop(q);
 		kfree_skb(skb);
 		return NULL;
 	case TC_ACT_STOLEN:
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 07/34] net: sched: cls_u32: use block instead of q in tc_u_common
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

tc_u_common is currently per-q. With blocks, it has to be converted to
be per-block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_u32.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 094d224..b6d4606 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -93,7 +93,7 @@ struct tc_u_hnode {
 
 struct tc_u_common {
 	struct tc_u_hnode __rcu	*hlist;
-	struct Qdisc		*q;
+	struct tcf_block	*block;
 	int			refcnt;
 	struct idr		handle_idr;
 	struct hlist_node	hnode;
@@ -335,11 +335,7 @@ static struct hlist_head *tc_u_common_hash;
 
 static unsigned int tc_u_hash(const struct tcf_proto *tp)
 {
-	struct net_device *dev = tp->q->dev_queue->dev;
-	u32 qhandle = tp->q->handle;
-	int ifindex = dev->ifindex;
-
-	return hash_64((u64)ifindex << 32 | qhandle, U32_HASH_SHIFT);
+	return hash_64((u64) tp->chain->block, U32_HASH_SHIFT);
 }
 
 static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
@@ -349,7 +345,7 @@ static struct tc_u_common *tc_u_common_find(const struct tcf_proto *tp)
 
 	h = tc_u_hash(tp);
 	hlist_for_each_entry(tc, &tc_u_common_hash[h], hnode) {
-		if (tc->q == tp->q)
+		if (tc->block == tp->chain->block)
 			return tc;
 	}
 	return NULL;
@@ -378,7 +374,7 @@ static int u32_init(struct tcf_proto *tp)
 			kfree(root_ht);
 			return -ENOBUFS;
 		}
-		tp_c->q = tp->q;
+		tp_c->block = tp->chain->block;
 		INIT_HLIST_NODE(&tp_c->hnode);
 		idr_init(&tp_c->handle_idr);
 
-- 
2.9.5

^ permalink raw reply related
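
The tc_u_hash() change in the patch above keys the tc_u_common hash
table by the block pointer itself rather than by (ifindex, qdisc handle).
A minimal userspace sketch of that idea follows. All names here
(ptr_hash, common_insert, common_find, SKETCH_HASH_SHIFT) are
hypothetical stand-ins, and the multiplier is only assumed to play the
role of the constant used by the kernel's hash_64(); this is not the
cls_u32 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HASH_SHIFT 10	/* hypothetical table size: 2^10 buckets */
#define SKETCH_HASH_SIZE  (1u << SKETCH_HASH_SHIFT)

struct block {
	int index;
};

/* Stand-in for tc_u_common: one entry per block, chained per bucket. */
struct common {
	const struct block *block;
	struct common *next;
};

static struct common *buckets[SKETCH_HASH_SIZE];

/* Multiplicative hash of the pointer value, keeping the top bits. */
static unsigned int ptr_hash(const void *p)
{
	uint64_t v = (uint64_t)(uintptr_t)p;

	return (unsigned int)((v * 0x61C8864680B583EBull) >>
			      (64 - SKETCH_HASH_SHIFT));
}

static void common_insert(struct common *c)
{
	unsigned int h = ptr_hash(c->block);

	assert(h < SKETCH_HASH_SIZE);
	c->next = buckets[h];
	buckets[h] = c;
}

/* Lookup compares the block pointer, as tc_u_common_find() does. */
static struct common *common_find(const struct block *b)
{
	struct common *c;

	for (c = buckets[ptr_hash(b)]; c; c = c->next)
		if (c->block == b)
			return c;
	return NULL;
}
```

Two filters attached to the same block find the same entry no matter
which qdisc they were added through, which is the property a shared
block needs.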

* [patch net-next 09/34] net: sched: tcindex, fw, flow: use tcf_block_q helper to get struct Qdisc
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Use the tcf_block_q helper to get the q pointer from the block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flow.c    | 7 +++++--
 net/sched/cls_fw.c      | 5 ++++-
 net/sched/cls_tcindex.c | 5 ++++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 2a3a60e..f3be666 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -491,8 +491,11 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 			perturb_period = nla_get_u32(tb[TCA_FLOW_PERTURB]) * HZ;
 		}
 
-		if (TC_H_MAJ(baseclass) == 0)
-			baseclass = TC_H_MAKE(tp->q->handle, baseclass);
+		if (TC_H_MAJ(baseclass) == 0) {
+			struct Qdisc *q = tcf_block_q(tp->chain->block);
+
+			baseclass = TC_H_MAKE(q->handle, baseclass);
+		}
 		if (TC_H_MIN(baseclass) == 0)
 			baseclass = TC_H_MAKE(baseclass, 1);
 
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 941245a..aa1e1f3 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -28,6 +28,7 @@
 #include <net/netlink.h>
 #include <net/act_api.h>
 #include <net/pkt_cls.h>
+#include <net/sch_generic.h>
 
 #define HTSIZE 256
 
@@ -83,9 +84,11 @@ static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			}
 		}
 	} else {
+		struct Qdisc *q = tcf_block_q(tp->chain->block);
+
 		/* Old method: classify the packet using its skb mark. */
 		if (id && (TC_H_MAJ(id) == 0 ||
-			   !(TC_H_MAJ(id ^ tp->q->handle)))) {
+			   !(TC_H_MAJ(id ^ q->handle)))) {
 			res->classid = id;
 			res->class = 0;
 			return 0;
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 14a7e08..d732b54 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -13,6 +13,7 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 #include <net/pkt_cls.h>
+#include <net/sch_generic.h>
 
 /*
  * Passing parameters to the root seems to be done more awkwardly than really
@@ -90,9 +91,11 @@ static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 
 	f = tcindex_lookup(p, key);
 	if (!f) {
+		struct Qdisc *q = tcf_block_q(tp->chain->block);
+
 		if (!p->fall_through)
 			return -1;
-		res->classid = TC_H_MAKE(TC_H_MAJ(tp->q->handle), key);
+		res->classid = TC_H_MAKE(TC_H_MAJ(q->handle), key);
 		res->class = 0;
 		pr_debug("alg 0x%x\n", res->classid);
 		return 0;
-- 
2.9.5

^ permalink raw reply related
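
The three classifiers touched above only need q->handle for classid
arithmetic. The sketch below reproduces that arithmetic in userspace,
assuming the usual 16-bit major / 16-bit minor split of a tc handle. The
SK_H_* macros and both function names are hypothetical local copies made
for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the TC_H_* helpers: major in the upper 16 bits,
 * minor in the lower 16 bits.
 */
#define SK_H_MAJ_MASK 0xFFFF0000u
#define SK_H_MIN_MASK 0x0000FFFFu
#define SK_H_MAJ(h) ((h) & SK_H_MAJ_MASK)
#define SK_H_MIN(h) ((h) & SK_H_MIN_MASK)
#define SK_H_MAKE(maj, min) \
	(((maj) & SK_H_MAJ_MASK) | ((min) & SK_H_MIN_MASK))

/* Mirrors the flow_change() hunk: borrow the major number from the
 * qdisc handle when none was given, and default the minor number to 1.
 */
static uint32_t complete_baseclass(uint32_t qhandle, uint32_t baseclass)
{
	if (SK_H_MAJ(baseclass) == 0)
		baseclass = SK_H_MAKE(qhandle, baseclass);
	if (SK_H_MIN(baseclass) == 0)
		baseclass = SK_H_MAKE(baseclass, 1);
	assert(SK_H_MIN(baseclass) != 0);
	return baseclass;
}

/* Mirrors the tcindex_classify() fall-through: qdisc major, key as minor. */
static uint32_t fallthrough_classid(uint32_t qhandle, uint32_t key)
{
	return SK_H_MAKE(SK_H_MAJ(qhandle), key);
}
```

For a qdisc with handle 1: (0x00010000), an empty baseclass becomes 1:1
and a fall-through key of 0x2a becomes 1:2a.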

* [patch net-next 08/34] net: sched: avoid usage of tp->q in tcf_classify
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Use the block index in the reclassify loop message instead of the qdisc
id.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2d0f18f..b1bcc8b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -501,8 +501,9 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 #ifdef CONFIG_NET_CLS_ACT
 reset:
 	if (unlikely(limit++ >= max_reclassify_loop)) {
-		net_notice_ratelimited("%s: reclassify loop, rule prio %u, protocol %02x\n",
-				       tp->q->ops->id, tp->prio & 0xffff,
+		net_notice_ratelimited("%u: reclassify loop, rule prio %u, protocol %02x\n",
+				       tp->chain->block->index,
+				       tp->prio & 0xffff,
 				       ntohs(tp->protocol));
 		return TC_ACT_SHOT;
 	}
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 10/34] net: sched: use tcf_block_q helper to get q pointer for sch_tree_lock
From: Jiri Pirko @ 2017-10-12 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Use tcf_block_q helper to get q pointer to be used for direct call of
sch_tree_lock/unlock instead of tcf_tree_lock/unlock.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 3 ---
 net/sched/sch_api.c       | 6 ++++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 0b2ba3b..a4926c9 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -361,9 +361,6 @@ static inline void sch_tree_unlock(const struct Qdisc *q)
 	spin_unlock_bh(qdisc_root_sleeping_lock(q));
 }
 
-#define tcf_tree_lock(tp)	sch_tree_lock((tp)->q)
-#define tcf_tree_unlock(tp)	sch_tree_unlock((tp)->q)
-
 extern struct Qdisc noop_qdisc;
 extern struct Qdisc_ops noop_qdisc_ops;
 extern struct Qdisc_ops pfifo_fast_ops;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index aa82116..a9ac912 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1661,9 +1661,11 @@ static int tcf_node_bind(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
 	struct tcf_bind_args *a = (void *)arg;
 
 	if (tp->ops->bind_class) {
-		tcf_tree_lock(tp);
+		struct Qdisc *q = tcf_block_q(tp->chain->block);
+
+		sch_tree_lock(q);
 		tp->ops->bind_class(n, a->classid, a->cl);
-		tcf_tree_unlock(tp);
+		sch_tree_unlock(q);
 	}
 	return 0;
 }
-- 
2.9.5

^ permalink raw reply related

* [patch net-next 11/34] net: sched: propagate q and parent from caller down to tcf_fill_node
From: Jiri Pirko @ 2017-10-12 17:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

The callers already have this info, so have them pass it down to
tcf_fill_node.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 55 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index b1bcc8b..908b38a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -585,8 +585,8 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 }
 
 static int tcf_fill_node(struct net *net, struct sk_buff *skb,
-			 struct tcf_proto *tp, void *fh, u32 portid,
-			 u32 seq, u16 flags, int event)
+			 struct tcf_proto *tp, struct Qdisc *q, u32 parent,
+			 void *fh, u32 portid, u32 seq, u16 flags, int event)
 {
 	struct tcmsg *tcm;
 	struct nlmsghdr  *nlh;
@@ -599,8 +599,8 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 	tcm->tcm_family = AF_UNSPEC;
 	tcm->tcm__pad1 = 0;
 	tcm->tcm__pad2 = 0;
-	tcm->tcm_ifindex = qdisc_dev(tp->q)->ifindex;
-	tcm->tcm_parent = tp->classid;
+	tcm->tcm_ifindex = qdisc_dev(q)->ifindex;
+	tcm->tcm_parent = parent;
 	tcm->tcm_info = TC_H_MAKE(tp->prio, tp->protocol);
 	if (nla_put_string(skb, TCA_KIND, tp->ops->kind))
 		goto nla_put_failure;
@@ -623,6 +623,7 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 
 static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 			  struct nlmsghdr *n, struct tcf_proto *tp,
+			  struct Qdisc *q, u32 parent,
 			  void *fh, int event, bool unicast)
 {
 	struct sk_buff *skb;
@@ -632,7 +633,7 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 	if (!skb)
 		return -ENOBUFS;
 
-	if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq,
+	if (tcf_fill_node(net, skb, tp, q, parent, fh, portid, n->nlmsg_seq,
 			  n->nlmsg_flags, event) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
@@ -647,6 +648,7 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 
 static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 			      struct nlmsghdr *n, struct tcf_proto *tp,
+			      struct Qdisc *q, u32 parent,
 			      void *fh, bool unicast, bool *last)
 {
 	struct sk_buff *skb;
@@ -657,7 +659,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 	if (!skb)
 		return -ENOBUFS;
 
-	if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq,
+	if (tcf_fill_node(net, skb, tp, q, parent, fh, portid, n->nlmsg_seq,
 			  n->nlmsg_flags, RTM_DELTFILTER) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
@@ -677,6 +679,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 }
 
 static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
+				 struct Qdisc *q, u32 parent,
 				 struct nlmsghdr *n,
 				 struct tcf_chain *chain, int event)
 {
@@ -684,7 +687,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 
 	for (tp = rtnl_dereference(chain->filter_chain);
 	     tp; tp = rtnl_dereference(tp->next))
-		tfilter_notify(net, oskb, n, tp, 0, event, false);
+		tfilter_notify(net, oskb, n, tp, q, parent, 0, event, false);
 }
 
 /* Add/change/delete/get a filter node */
@@ -803,7 +806,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 	if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) {
-		tfilter_notify_chain(net, skb, n, chain, RTM_DELTFILTER);
+		tfilter_notify_chain(net, skb, q, parent, n,
+				     chain, RTM_DELTFILTER);
 		tcf_chain_flush(chain);
 		err = 0;
 		goto errout;
@@ -850,7 +854,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	if (!fh) {
 		if (n->nlmsg_type == RTM_DELTFILTER && t->tcm_handle == 0) {
 			tcf_chain_tp_remove(chain, &chain_info, tp);
-			tfilter_notify(net, skb, n, tp, fh,
+			tfilter_notify(net, skb, n, tp, q, parent, fh,
 				       RTM_DELTFILTER, false);
 			tcf_proto_destroy(tp);
 			err = 0;
@@ -875,8 +879,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			}
 			break;
 		case RTM_DELTFILTER:
-			err = tfilter_del_notify(net, skb, n, tp, fh, false,
-						 &last);
+			err = tfilter_del_notify(net, skb, n, tp, q, parent,
+						 fh, false, &last);
 			if (err)
 				goto errout;
 			if (last) {
@@ -885,7 +889,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			}
 			goto errout;
 		case RTM_GETTFILTER:
-			err = tfilter_notify(net, skb, n, tp, fh,
+			err = tfilter_notify(net, skb, n, tp, q, parent, fh,
 					     RTM_NEWTFILTER, true);
 			goto errout;
 		default:
@@ -899,7 +903,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	if (err == 0) {
 		if (tp_created)
 			tcf_chain_tp_insert(chain, &chain_info, tp);
-		tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false);
+		tfilter_notify(net, skb, n, tp, q, parent, fh,
+			       RTM_NEWTFILTER, false);
 	} else {
 		if (tp_created)
 			tcf_proto_destroy(tp);
@@ -918,6 +923,8 @@ struct tcf_dump_args {
 	struct tcf_walker w;
 	struct sk_buff *skb;
 	struct netlink_callback *cb;
+	struct Qdisc *q;
+	u32 parent;
 };
 
 static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
@@ -925,13 +932,14 @@ static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
 	struct tcf_dump_args *a = (void *)arg;
 	struct net *net = sock_net(a->skb->sk);
 
-	return tcf_fill_node(net, a->skb, tp, n, NETLINK_CB(a->cb->skb).portid,
+	return tcf_fill_node(net, a->skb, tp, a->q, a->parent,
+			     n, NETLINK_CB(a->cb->skb).portid,
 			     a->cb->nlh->nlmsg_seq, NLM_F_MULTI,
 			     RTM_NEWTFILTER);
 }
 
-static bool tcf_chain_dump(struct tcf_chain *chain, struct sk_buff *skb,
-			   struct netlink_callback *cb,
+static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
+			   struct sk_buff *skb, struct netlink_callback *cb,
 			   long index_start, long *p_index)
 {
 	struct net *net = sock_net(skb->sk);
@@ -953,7 +961,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct sk_buff *skb,
 			memset(&cb->args[1], 0,
 			       sizeof(cb->args) - sizeof(cb->args[0]));
 		if (cb->args[1] == 0) {
-			if (tcf_fill_node(net, skb, tp, 0,
+			if (tcf_fill_node(net, skb, tp, q, parent, 0,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					  RTM_NEWTFILTER) <= 0)
@@ -966,6 +974,8 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct sk_buff *skb,
 		arg.w.fn = tcf_node_dump;
 		arg.skb = skb;
 		arg.cb = cb;
+		arg.q = q;
+		arg.parent = parent;
 		arg.w.stop = 0;
 		arg.w.skip = cb->args[1] - 1;
 		arg.w.count = 0;
@@ -991,6 +1001,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	const struct Qdisc_class_ops *cops;
 	long index_start;
 	long index;
+	u32 parent;
 	int err;
 
 	if (nlmsg_len(cb->nlh) < sizeof(*tcm))
@@ -1004,10 +1015,13 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	if (!dev)
 		return skb->len;
 
-	if (!tcm->tcm_parent)
+	parent = tcm->tcm_parent;
+	if (!parent) {
 		q = dev->qdisc;
-	else
+		parent = q->handle;
+	} else {
 		q = qdisc_lookup(dev, TC_H_MAJ(tcm->tcm_parent));
+	}
 	if (!q)
 		goto out;
 	cops = q->ops->cl_ops;
@@ -1031,7 +1045,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		if (tca[TCA_CHAIN] &&
 		    nla_get_u32(tca[TCA_CHAIN]) != chain->index)
 			continue;
-		if (!tcf_chain_dump(chain, skb, cb, index_start, &index))
+		if (!tcf_chain_dump(chain, q, parent, skb, cb,
+				    index_start, &index))
 			break;
 	}
 
-- 
2.9.5

^ permalink raw reply related
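
The patch above threads q and parent through every caller, and for the
walker-driven dump path it stores them in tcf_dump_args so the callback
can recover them. A small userspace sketch of that pattern follows. The
types and names (walker, dump_args, node_dump, resolve_parent) are
hypothetical simplifications, and a plain int stands in for the filter
node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OUT_MAX 8

struct walker {
	int (*fn)(int node, struct walker *w);
	int count;
};

struct dump_args {
	struct walker w;	/* must stay first so the cast below is valid */
	uint32_t parent;	/* context supplied by the caller */
	uint32_t out[OUT_MAX];
};

/* Like tc_dump_tfilter(): a zero parent means "use the root handle". */
static uint32_t resolve_parent(uint32_t requested, uint32_t root_handle)
{
	return requested ? requested : root_handle;
}

/* The callback only receives the walker; it casts back to reach parent. */
static int node_dump(int node, struct walker *w)
{
	struct dump_args *a = (struct dump_args *)w;

	if (w->count >= OUT_MAX)
		return -1;
	a->out[w->count++] = a->parent | (uint32_t)node;
	return 0;
}

static void walk(const int *nodes, int n, struct walker *w)
{
	assert(w->fn != NULL);
	for (int i = 0; i < n; i++)
		if (w->fn(nodes[i], w) < 0)
			break;
}
```

Because the callback reads parent from its arguments and not from the
filter itself, the same filter could be reported with a different parent
for each qdisc that shares its block.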

* [patch net-next 12/34] net: sched: add block bind/unbind notification to drivers
From: Jiri Pirko @ 2017-10-12 17:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Introduce a new type of ndo_setup_tc message to propagate
binding/unbinding of a block to the binder.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/netdevice.h |  1 +
 include/net/pkt_cls.h     | 21 +++++++++++++++++++--
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 33 +++++++++++++++++++++++++++++++--
 4 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 31bb301..062a4f5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -771,6 +771,7 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 
 enum tc_setup_type {
 	TC_SETUP_MQPRIO,
+	TC_SETUP_BLOCK,
 	TC_SETUP_CLSU32,
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_CLSMATCHALL,
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 05bc999..104326fc 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -17,9 +17,15 @@ struct tcf_walker {
 int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
+enum tcf_block_binder_type {
+	TCF_BLOCK_BINDER_TYPE_UNSPEC,
+};
+
 struct tcf_block_ext_info {
 	bool shareable;
 	u32 block_index;
+	enum tcf_block_binder_type binder_type;
+	bool bound;
 };
 
 #ifdef CONFIG_NET_CLS
@@ -34,7 +40,7 @@ int tcf_block_get_ext(struct tcf_block **p_block,
 		      struct tcf_block_ext_info *ei);
 void tcf_block_put(struct tcf_block *block);
 void tcf_block_put_ext(struct tcf_block *block,
-		       struct tcf_proto __rcu **p_filter_chain,
+		       struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei);
 
 static inline bool tcf_is_block_shared(const struct tcf_block *block)
@@ -79,7 +85,7 @@ void tcf_block_put(struct tcf_block *block)
 
 static inline
 void tcf_block_put_ext(struct tcf_block *block,
-		       struct tcf_proto __rcu **p_filter_chain,
+		       struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei)
 {
 }
@@ -468,6 +474,17 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 int tc_setup_cb_call(struct tcf_exts *exts, enum tc_setup_type type,
 		     void *type_data, bool err_stop);
 
+enum tc_block_command {
+	TC_BLOCK_BIND,
+	TC_BLOCK_UNBIND,
+};
+
+struct tc_block_offload {
+	enum tc_block_command command;
+	enum tcf_block_binder_type binder_type;
+	struct tcf_block *block;
+};
+
 struct tc_cls_common_offload {
 	u32 chain_index;
 	__be16 protocol;
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a4926c9..e210452 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -69,6 +69,7 @@ struct Qdisc {
 				      * qdisc_tree_decrease_qlen() should stop.
 				      */
 #define TCQ_F_INVISIBLE		0x80 /* invisible by default in dump */
+#define TCQ_F_BOUNDOFFLOAD	0x100 /* bound to the offload driver */
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 908b38a..8f6e2c9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -377,6 +377,33 @@ static struct tcf_chain *tcf_block_chain_zero(struct tcf_block *block)
 	return list_first_entry(&block->chain_list, struct tcf_chain, list);
 }
 
+static void tcf_block_offload_cmd(struct tcf_block *block, struct Qdisc *q,
+				  struct tcf_block_ext_info *ei,
+				  enum tc_block_command command)
+{
+	struct net_device *dev = q->dev_queue->dev;
+	struct tc_block_offload bo = {};
+
+	if (!tc_can_offload(dev))
+		return;
+	bo.command = command;
+	bo.binder_type = ei->binder_type;
+	bo.block = block;
+	dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
+}
+
+static void tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
+				   struct tcf_block_ext_info *ei)
+{
+	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_BIND);
+}
+
+static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
+				     struct tcf_block_ext_info *ei)
+{
+	tcf_block_offload_cmd(block, q, ei, TC_BLOCK_UNBIND);
+}
+
 int tcf_block_get_ext(struct tcf_block **p_block,
 		      struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
 		      struct tcf_block_ext_info *ei)
@@ -409,6 +436,7 @@ int tcf_block_get_ext(struct tcf_block **p_block,
 	if (err)
 		goto err_chain_filter_chain_ptr_add;
 
+	tcf_block_offload_bind(block, q, ei);
 	*p_block = block;
 	return 0;
 
@@ -435,12 +463,13 @@ int tcf_block_get(struct tcf_block **p_block,
 EXPORT_SYMBOL(tcf_block_get);
 
 void tcf_block_put_ext(struct tcf_block *block,
-		       struct tcf_proto __rcu **p_filter_chain,
+		       struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei)
 {
 	if (!block)
 		return;
 
+	tcf_block_offload_unbind(block, q, ei);
 	tcf_chain_filter_chain_ptr_del(tcf_block_chain_zero(block),
 				       p_filter_chain);
 
@@ -456,7 +485,7 @@ void tcf_block_put(struct tcf_block *block)
 {
 	struct tcf_block_ext_info ei = {0, };
 
-	tcf_block_put_ext(block, NULL, &ei);
+	tcf_block_put_ext(block, NULL, block->q, &ei);
 }
 EXPORT_SYMBOL(tcf_block_put);
 
-- 
2.9.5

^ permalink raw reply related
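
On the receiving side, a driver would be expected to dispatch on the new
setup type and then on the bind/unbind command. The sketch below shows
one way such a handler could look in userspace. Everything in it
(fake_dev, fake_setup_tc, the enum and struct names, the chosen error
codes) is a hypothetical stand-in and not taken from any driver in this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum setup_type { SETUP_MQPRIO, SETUP_BLOCK };
enum block_command { BLOCK_BIND, BLOCK_UNBIND };

/* Stand-in for struct tc_block_offload. */
struct block_offload {
	enum block_command command;
	void *block;
};

/* Hypothetical driver state: how many blocks are currently bound. */
struct fake_dev {
	int bound_blocks;
};

static int fake_setup_tc(struct fake_dev *dev, enum setup_type type,
			 void *type_data)
{
	struct block_offload *bo = type_data;

	if (type != SETUP_BLOCK)
		return -EOPNOTSUPP;
	assert(bo != NULL);

	switch (bo->command) {
	case BLOCK_BIND:
		dev->bound_blocks++;
		return 0;
	case BLOCK_UNBIND:
		if (dev->bound_blocks == 0)
			return -ENOENT;
		dev->bound_blocks--;
		return 0;
	}
	return -EOPNOTSUPP;
}
```

Note that tcf_block_offload_cmd() in the patch does not check the return
value of ndo_setup_tc, so in this version of the series an error from
the driver would not fail the bind.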

* [patch net-next 13/34] net: sched: introduce per-block callbacks
From: Jiri Pirko @ 2017-10-12 17:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload an inserted rule for a specific
block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h     |  81 ++++++++++++++++++++++++++++++++++++
 include/net/sch_generic.h |   1 +
 net/sched/cls_api.c       | 104 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 186 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 104326fc..febd52e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -28,6 +28,8 @@ struct tcf_block_ext_info {
 	bool bound;
 };
 
+struct tcf_block_cb;
+
 #ifdef CONFIG_NET_CLS
 struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
 				bool create);
@@ -59,6 +61,21 @@ static inline struct net_device *tcf_block_dev(struct tcf_block *block)
 	return tcf_block_q(block)->dev_queue->dev;
 }
 
+void *tcf_block_cb_priv(struct tcf_block_cb *block_cb);
+struct tcf_block_cb *tcf_block_cb_lookup(struct tcf_block *block,
+					 tc_setup_cb_t *cb, void *cb_ident);
+void tcf_block_cb_incref(struct tcf_block_cb *block_cb);
+unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb);
+struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
+					     tc_setup_cb_t *cb, void *cb_ident,
+					     void *cb_priv);
+int tcf_block_cb_register(struct tcf_block *block,
+			  tc_setup_cb_t *cb, void *cb_ident,
+			  void *cb_priv);
+void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb);
+void tcf_block_cb_unregister(struct tcf_block *block,
+			     tc_setup_cb_t *cb, void *cb_ident);
+
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		 struct tcf_result *res, bool compat_mode);
 
@@ -100,6 +117,70 @@ static inline bool tcf_is_block_shared(const struct tcf_block *block)
 	return false;
 }
 
+static inline
+int tc_setup_cb_block_register(struct tcf_block *block, tc_setup_cb_t *cb,
+			       void *cb_priv)
+{
+	return 0;
+}
+
+static inline
+void tc_setup_cb_block_unregister(struct tcf_block *block, tc_setup_cb_t *cb,
+				  void *cb_priv)
+{
+}
+
+static inline
+void *tcf_block_cb_priv(struct tcf_block_cb *block_cb)
+{
+	return NULL;
+}
+
+static inline
+struct tcf_block_cb *tcf_block_cb_lookup(struct tcf_block *block,
+					 tc_setup_cb_t *cb, void *cb_ident)
+{
+	return NULL;
+}
+
+static inline
+void tcf_block_cb_incref(struct tcf_block_cb *block_cb)
+{
+}
+
+static inline
+unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb)
+{
+	return 0;
+}
+
+static inline
+struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
+					     tc_setup_cb_t *cb, void *cb_ident,
+					     void *cb_priv)
+{
+	return NULL;
+}
+
+static inline
+int tcf_block_cb_register(struct tcf_block *block,
+			  tc_setup_cb_t *cb, void *cb_ident,
+			  void *cb_priv)
+{
+	return 0;
+}
+
+static inline
+void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb)
+{
+}
+
+static inline
+void tcf_block_cb_unregister(struct tcf_block *block,
+			     tc_setup_cb_t *cb, void *cb_ident)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			       struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e210452..dfa9617 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -275,6 +275,7 @@ struct tcf_block {
 	unsigned int refcnt;
 	struct net *net;
 	struct Qdisc *q;
+	struct list_head cb_list;
 };
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 8f6e2c9..7837c8a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -319,6 +319,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q)
 	block->refcnt = 1;
 	block->net = net;
 	block->q = q;
+	INIT_LIST_HEAD(&block->cb_list);
 
 	/* Create chain 0 by default, it has to be always present. */
 	chain = tcf_chain_create(block, 0);
@@ -489,6 +490,109 @@ void tcf_block_put(struct tcf_block *block)
 }
 EXPORT_SYMBOL(tcf_block_put);
 
+struct tcf_block_cb {
+	struct list_head list;
+	tc_setup_cb_t *cb;
+	void *cb_ident;
+	void *cb_priv;
+	unsigned int refcnt;
+};
+
+void *tcf_block_cb_priv(struct tcf_block_cb *block_cb)
+{
+	return block_cb->cb_priv;
+}
+EXPORT_SYMBOL(tcf_block_cb_priv);
+
+struct tcf_block_cb *tcf_block_cb_lookup(struct tcf_block *block,
+					 tc_setup_cb_t *cb, void *cb_ident)
+{	struct tcf_block_cb *block_cb;
+
+	list_for_each_entry(block_cb, &block->cb_list, list)
+		if (block_cb->cb == cb && block_cb->cb_ident == cb_ident)
+			return block_cb;
+	return NULL;
+}
+EXPORT_SYMBOL(tcf_block_cb_lookup);
+
+void tcf_block_cb_incref(struct tcf_block_cb *block_cb)
+{
+	block_cb->refcnt++;
+}
+EXPORT_SYMBOL(tcf_block_cb_incref);
+
+unsigned int tcf_block_cb_decref(struct tcf_block_cb *block_cb)
+{
+	return --block_cb->refcnt;
+}
+EXPORT_SYMBOL(tcf_block_cb_decref);
+
+struct tcf_block_cb *__tcf_block_cb_register(struct tcf_block *block,
+					     tc_setup_cb_t *cb, void *cb_ident,
+					     void *cb_priv)
+{
+	struct tcf_block_cb *block_cb;
+
+	block_cb = kzalloc(sizeof(*block_cb), GFP_KERNEL);
+	if (!block_cb)
+		return NULL;
+	block_cb->cb = cb;
+	block_cb->cb_ident = cb_ident;
+	block_cb->cb_priv = cb_priv;
+	list_add(&block_cb->list, &block->cb_list);
+	return block_cb;
+}
+EXPORT_SYMBOL(__tcf_block_cb_register);
+
+int tcf_block_cb_register(struct tcf_block *block,
+			  tc_setup_cb_t *cb, void *cb_ident,
+			  void *cb_priv)
+{
+	struct tcf_block_cb *block_cb;
+
+	block_cb = __tcf_block_cb_register(block, cb, cb_ident, cb_priv);
+	return block_cb ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(tcf_block_cb_register);
+
+void __tcf_block_cb_unregister(struct tcf_block_cb *block_cb)
+{
+	list_del(&block_cb->list);
+	kfree(block_cb);
+}
+EXPORT_SYMBOL(__tcf_block_cb_unregister);
+
+void tcf_block_cb_unregister(struct tcf_block *block,
+			     tc_setup_cb_t *cb, void *cb_ident)
+{
+	struct tcf_block_cb *block_cb;
+
+	block_cb = tcf_block_cb_lookup(block, cb, cb_ident);
+	if (!block_cb)
+		return;
+	__tcf_block_cb_unregister(block_cb);
+}
+EXPORT_SYMBOL(tcf_block_cb_unregister);
+
+static int tcf_block_cb_call(struct tcf_block *block, enum tc_setup_type type,
+			     void *type_data, bool err_stop)
+{
+	struct tcf_block_cb *block_cb;
+	int ok_count = 0;
+	int err;
+
+	list_for_each_entry(block_cb, &block->cb_list, list) {
+		err = block_cb->cb(type, type_data, block_cb->cb_priv);
+		if (err) {
+			if (err_stop)
+				return err;
+		} else {
+			ok_count++;
+		}
+	}
+	return ok_count;
+}
+
 /* Main classifier routine: scans classifier chain attached
  * to this qdisc, (optionally) tests for protocol and asks
  * specific classifiers.
-- 
2.9.5
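
For readers following the return convention of tcf_block_cb_call() above,
here is a minimal userspace sketch of the same loop. It is not kernel code:
every model_* name is hypothetical, a plain singly linked list stands in for
struct list_head, and the two sample callbacks exist only for illustration.
It shows that with err_stop set the first failing callback aborts the walk
and its error is returned, while without err_stop failures are skipped and
the number of callbacks that returned 0 is reported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model only: all model_* names are hypothetical, not kernel API. */
typedef int (*model_cb_t)(int type, void *type_data, void *cb_priv);

struct model_block_cb {
	struct model_block_cb *next;
	model_cb_t cb;
	void *cb_priv;
};

struct model_block {
	struct model_block_cb *cb_list;
};

/* Insert at the head of the list, which is what list_add() does as well. */
static void model_block_cb_add(struct model_block *block,
			       struct model_block_cb *block_cb,
			       model_cb_t cb, void *cb_priv)
{
	assert(cb != NULL);
	block_cb->cb = cb;
	block_cb->cb_priv = cb_priv;
	block_cb->next = block->cb_list;
	block->cb_list = block_cb;
}

/* Same control flow as the loop in tcf_block_cb_call(). */
static int model_block_cb_call(struct model_block *block, int type,
			       void *type_data, int err_stop)
{
	struct model_block_cb *block_cb;
	int ok_count = 0;
	int err;

	for (block_cb = block->cb_list; block_cb; block_cb = block_cb->next) {
		err = block_cb->cb(type, type_data, block_cb->cb_priv);
		if (err) {
			if (err_stop)
				return err;	/* first failure wins */
		} else {
			ok_count++;		/* count successful callbacks */
		}
	}
	return ok_count;
}

/* Sample callback that succeeds; cb_priv points to an invocation counter. */
static int model_cb_ok(int type, void *type_data, void *cb_priv)
{
	(void)type;
	(void)type_data;
	++*(int *)cb_priv;
	return 0;
}

/* Sample callback that always fails, as a driver without support might. */
static int model_cb_fail(int type, void *type_data, void *cb_priv)
{
	(void)type;
	(void)type_data;
	++*(int *)cb_priv;
	return -EOPNOTSUPP;
}
```

With three entries registered (ok, fail, ok), a call with err_stop clear
visits all three and returns 2; a call with err_stop set returns -EOPNOTSUPP
as soon as the failing entry is reached. Since errors are negative and the
count is never negative, the caller can tell the two cases apart by sign.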


* [patch net-next 14/34] net: sched: use extended variants of block get and put in ingress and clsact qdiscs
From: Jiri Pirko @ 2017-10-12 17:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, jeffrey.t.kirsher, saeedm,
	matanb, leonro, idosch, jakub.kicinski, ast, daniel, simon.horman,
	pieter.jansenvanvuuren, john.hurley, edumazet, dsahern,
	alexander.h.duyck, john.fastabend, willemb
In-Reply-To: <20171012171823.1431-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Use the previously introduced extended variants of the block get and put
functions. This allows specifying binder types specific to clsact
ingress/egress, which lets drivers distinguish who actually got the
block.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h   |  2 ++
 net/sched/sch_ingress.c | 36 +++++++++++++++++++++++++++++-------
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index febd52e..4e6cdf4 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -19,6 +19,8 @@ int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
 enum tcf_block_binder_type {
 	TCF_BLOCK_BINDER_TYPE_UNSPEC,
+	TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
+	TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
 };
 
 struct tcf_block_ext_info {
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index 9ccc1b8..b599db2 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -20,6 +20,7 @@
 
 struct ingress_sched_data {
 	struct tcf_block *block;
+	struct tcf_block_ext_info block_info;
 };
 
 static struct Qdisc *ingress_leaf(struct Qdisc *sch, unsigned long arg)
@@ -59,7 +60,10 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
-	err = tcf_block_get(&q->block, &dev->ingress_cl_list, sch);
+	q->block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+
+	err = tcf_block_get_ext(&q->block, &dev->ingress_cl_list,
+				sch, &q->block_info);
 	if (err)
 		return err;
 
@@ -72,8 +76,10 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt)
 static void ingress_destroy(struct Qdisc *sch)
 {
 	struct ingress_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
-	tcf_block_put(q->block);
+	tcf_block_put_ext(q->block, &dev->ingress_cl_list,
+			  sch, &q->block_info);
 	net_dec_ingress_queue();
 }
 
@@ -114,6 +120,8 @@ static struct Qdisc_ops ingress_qdisc_ops __read_mostly = {
 struct clsact_sched_data {
 	struct tcf_block *ingress_block;
 	struct tcf_block *egress_block;
+	struct tcf_block_ext_info ingress_block_info;
+	struct tcf_block_ext_info egress_block_info;
 };
 
 static unsigned long clsact_find(struct Qdisc *sch, u32 classid)
@@ -153,13 +161,19 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 	struct net_device *dev = qdisc_dev(sch);
 	int err;
 
-	err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list, sch);
+	q->ingress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
+
+	err = tcf_block_get_ext(&q->ingress_block, &dev->ingress_cl_list,
+				sch, &q->ingress_block_info);
 	if (err)
 		return err;
 
-	err = tcf_block_get(&q->egress_block, &dev->egress_cl_list, sch);
+	q->egress_block_info.binder_type = TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS;
+
+	err = tcf_block_get_ext(&q->egress_block, &dev->egress_cl_list,
+				sch, &q->egress_block_info);
 	if (err)
-		return err;
+		goto err_egress_block_get;
 
 	net_inc_ingress_queue();
 	net_inc_egress_queue();
@@ -167,14 +181,22 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt)
 	sch->flags |= TCQ_F_CPUSTATS;
 
 	return 0;
+
+err_egress_block_get:
+	tcf_block_put_ext(q->ingress_block, &dev->ingress_cl_list,
+			  sch, &q->ingress_block_info);
+	return err;
 }
 
 static void clsact_destroy(struct Qdisc *sch)
 {
 	struct clsact_sched_data *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 
-	tcf_block_put(q->egress_block);
-	tcf_block_put(q->ingress_block);
+	tcf_block_put_ext(q->egress_block, &dev->egress_cl_list,
+			  sch, &q->egress_block_info);
+	tcf_block_put_ext(q->ingress_block, &dev->ingress_cl_list,
+			  sch, &q->ingress_block_info);
 
 	net_dec_ingress_queue();
 	net_dec_egress_queue();
-- 
2.9.5
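
The behavioural change in clsact_init() above is the new unwind path: if
getting the egress block fails, the ingress block that was already taken is
put again. A minimal userspace sketch of that pattern follows. It is not
kernel code: all model_* and MODEL_* names are hypothetical, the "held" flag
stands in for whatever state the real block get/put functions manage, and
the fail argument exists only to force the error path.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model only: all names below are hypothetical, not kernel API. */
enum model_binder_type {
	MODEL_BINDER_UNSPEC,
	MODEL_BINDER_CLSACT_INGRESS,
	MODEL_BINDER_CLSACT_EGRESS,
};

struct model_block {
	int held;				/* 1 while a get is outstanding */
	enum model_binder_type binder_type;	/* who bound the block */
};

struct model_clsact {
	struct model_block ingress;
	struct model_block egress;
};

/* Take a block and record the binder type; 'fail' forces an error. */
static int model_block_get(struct model_block *block,
			   enum model_binder_type binder_type, int fail)
{
	if (fail)
		return -ENOMEM;
	assert(!block->held);
	block->held = 1;
	block->binder_type = binder_type;
	return 0;
}

/* Release a block that was successfully taken earlier. */
static void model_block_put(struct model_block *block)
{
	assert(block->held);
	block->held = 0;
}

/* Same shape as clsact_init(): undo the ingress get if egress fails. */
static int model_clsact_init(struct model_clsact *q, int fail_egress)
{
	int err;

	err = model_block_get(&q->ingress, MODEL_BINDER_CLSACT_INGRESS, 0);
	if (err)
		return err;

	err = model_block_get(&q->egress, MODEL_BINDER_CLSACT_EGRESS,
			      fail_egress);
	if (err)
		goto err_egress_block_get;

	return 0;

err_egress_block_get:
	model_block_put(&q->ingress);
	return err;
}
```

Calling model_clsact_init() with fail_egress set returns -ENOMEM and leaves
neither block held; calling it with fail_egress clear leaves both held, each
tagged with its own binder type.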
