Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/4] RCU: introduce noref debug
From: Paul E. McKenney @ 2017-10-11  4:02 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: linux-kernel, Josh Triplett, Steven Rostedt, David S. Miller,
	Eric Dumazet, Hannes Frederic Sowa, netdev
In-Reply-To: <1507567992.21825.9.camel@redhat.com>

On Mon, Oct 09, 2017 at 06:53:12PM +0200, Paolo Abeni wrote:
> On Fri, 2017-10-06 at 09:34 -0700, Paul E. McKenney wrote:
> > On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote:
> > > Hi,
> > > 
> > > On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote:
> > > > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote:
> > > > > The networking subsystem is currently using some kind of long-lived
> > > > > RCU-protected, references to avoid the overhead of full book-keeping.
> > > > > 
> > > > > Such references - skb_dst() noref - are stored inside the skbs and can be
> > > > > moved across relevant slices of the network stack, with the users
> > > > > being in charge of properly clearing the relevant skb - or properly refcount
> > > > > the related dst references - before the skb escapes the RCU section.
> > > > > 
> > > > > We currently don't have any deterministic debug infrastructure to check
> > > > > the dst noref usages - and the introduction of others noref artifact is
> > > > > currently under discussion.
> > > > > 
> > > > > This series tries to tackle the above introducing an RCU debug infrastructure
> > > > > aimed at spotting incorrect noref pointer usage, in patch one. The
> > > > > infrastructure is small and must be explicitly enabled via a newly introduced
> > > > > build option.
> > > > > 
> > > > > Patch two uses such infrastructure to track dst noref usage in the networking
> > > > > stack.
> > > > > 
> > > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure
> > > > > on basic scenarios.
> > > 
> > > Thank you for the prompt reply!
> > > > 
> > > > This patchset does not look like it handles rcu_read_lock() nesting.
> > > > For example, given code like this:
> > > > 
> > > > 	void foo(void)
> > > > 	{
> > > > 		rcu_read_lock();
> > > > 		rcu_track_noref(&key2, &noref2, true);
> > > > 		do_something();
> > > > 		rcu_track_noref(&key2, &noref2, false);
> > > > 		rcu_read_unlock();
> > > > 	}
> > > > 
> > > > 	void bar(void)
> > > > 	{
> > > > 		rcu_read_lock();
> > > > 		rcu_track_noref(&key1, &noref1, true);
> > > > 		do_something_more();
> > > > 		foo();
> > > > 		do_something_else();
> > > > 		rcu_track_noref(&key1, &noref1, false);
> > > > 		rcu_read_unlock();
> > > > 	}
> > > > 
> > > > 	void grill(void)
> > > > 	{
> > > > 		foo();
> > > > 	}
> > > > 
> > > > It looks like foo()'s rcu_read_unlock() will complain about key1.
> > > > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but
> > > > that will break the call from grill().
> > > 
> > > Actually the code should cope correctly with your example; when foo()'s
> > > rcu_read_unlock() is called, 'cache' contains:
> > > 
> > > { { &key1, &noref1, 1},  // ...
> > > 
> > > and when the related __rcu_check_noref() is invoked preempt_count() is
> > > 2 - because the check is called before decreasing the preempt counter.
> > > 
> > > In the main loop inside __rcu_check_noref() we will hit always the
> > > 'continue' statement because 'cache->store[i].nesting != nesting', so
> > > no warn will be triggered.
> > 
> > You are right, it was too early, and my example wasn't correct.  How
> > about this one?
> > 
> > 	void foo(void (*f)(struct s *sp), struct s **spp)
> > 	{
> > 		rcu_read_lock();
> > 		rcu_track_noref(&key2, &noref2, true);
> > 		f(spp);
> > 		rcu_track_noref(&key2, &noref2, false);
> > 		rcu_read_unlock();
> > 	}
> > 
> > 	void barcb(struct s **spp)
> > 	{
> > 		*spp = &noref3;
> > 		rcu_track_noref(&key3, *spp, true);
> > 	}
> > 
> > 	void bar(void)
> > 	{
> > 		struct s *sp;
> > 
> > 		rcu_read_lock();
> > 		rcu_track_noref(&key1, &noref1, true);
> > 		do_something_more();
> > 		foo(barcb, &sp);
> > 		do_something_else(sp);
> > 		rcu_track_noref(&key3, sp, false);
> > 		rcu_track_noref(&key1, &noref1, false);
> > 		rcu_read_unlock();
> > 	}
> > 
> > 	void grillcb(struct s **spp)
> > 	{
> > 		*spp
> > 	}
> > 
> > 	void grill(void)
> > 	{
> > 		foo();
> > 	}
> 
> You are right: this will generate a splat, even if the code it safe.
> The false positive can be avoided looking for leaked references only in
> the outermost rcu unlook. I did a previous implementation performing
> such check, but it emitted very generic splat so I tried to be more
> strict. The latter choice allowed to find/do 3/4.
> 
> What about using save_stack_trace() in rcu_track_noref(, true) and
> reporting such stack trace when the check in the outer most rcu fails?
> 
> the current strict/false-postive-prone check could be enabled under an
> additional build flag.

Linus and Ingo will ask me how users decide how they should set that
additional build flag.  Especially given that if there is code that
requires non-strict checking, isn't everyone required to set up non-strict
checking to avoid false positives?  Unless you can also configure out
all the code that requires non-strict checking, I suppose, but how
would you keep track of what to configure out?

> > How does the user select the key argument?  It looks like someone
> > can choose to just pass in NULL -- is that the intent, or am I confused
> > about this as well?
> 
> The 'key' argument is intented to discriminate the scope of the same
> noref dst attached to different skbs, which happens e.g. as a result of
> as skb_dst_copy().
> 
> In a generic use case, it can be NULL, too.

OK.  There will probably be some discussion about the API in that case.

> > I am also concerned about false negatives when the user invokes
> > rcu_track_noref(..., false) but then leaks the pointer anyway.  Or is
> > there something you are doing that catches this that I am missing?
> 
> If the rcu_track_noref(..., false) is misplaced or missing, yes we can
> have false negative. 
> 
> In the noref rcu use-case the rcu_track_noref(, false) call is/must be
> placed side-by-side with the code clearing the the noref pointer, so
> is/should be quite easy avoiding such mistakes.

True enough.  Except that if people were good about always clearing the
pointer, then the pointer couldn't leak, right?  Or am I missing something
in your use cases?

Or to put it another way -- have you been able to catch any real
pointer-leak bugs with this?  The other RCU debug options have had
pretty long found-bug lists before we accepted them.

							Thanx, Paul

^ permalink raw reply

* ipsec: Fix dst leak in xfrm_bundle_create().
From: David Miller @ 2017-10-11  3:59 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert


If we cannot find a suitable inner_mode value, we will leak
the currently allocated 'xdst'.

The fix is to make sure it is linked into the chain before
erroring out.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Steffen, I found this via visual inspection.  Please double check my
work before applying this :-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index f06253969972..2746b62a8944 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1573,6 +1573,14 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 			goto put_states;
 		}
 
+		if (!dst_prev)
+			dst0 = dst1;
+		else
+			/* Ref count is taken during xfrm_alloc_dst()
+			 * No need to do dst_clone() on dst1
+			 */
+			dst_prev->child = dst1;
+
 		if (xfrm[i]->sel.family == AF_UNSPEC) {
 			inner_mode = xfrm_ip2inner_mode(xfrm[i],
 							xfrm_af2proto(family));
@@ -1584,14 +1592,6 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
 		} else
 			inner_mode = xfrm[i]->inner_mode;
 
-		if (!dst_prev)
-			dst0 = dst1;
-		else
-			/* Ref count is taken during xfrm_alloc_dst()
-			 * No need to do dst_clone() on dst1
-			 */
-			dst_prev->child = dst1;
-
 		xdst->route = dst;
 		dst_copy_metrics(dst1, dst);
 

^ permalink raw reply related

* Re: [PATCH net 2/2] net: call cgroup_sk_alloc() earlier in sk_clone_lock()
From: David Miller @ 2017-10-11  3:24 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, hannes, cgallek, tj
In-Reply-To: <20171011021233.24158-2-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 10 Oct 2017 19:12:33 -0700

> If for some reason, the newly allocated child need to be freed,
> we will call cgroup_put() (via sk_free_unlock_clone()) while the
> corresponding cgroup_get() was not yet done, and we will free memory
> too soon.
> 
> Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net 1/2] Revert "net: defer call to cgroup_sk_alloc()"
From: David Miller @ 2017-10-11  3:24 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, hannes, cgallek, tj
In-Reply-To: <20171011021233.24158-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 10 Oct 2017 19:12:32 -0700

> This reverts commit fbb1fb4ad415cb31ce944f65a5ca700aaf73a227.
> 
> This was not the proper fix, lets cleanly revert it, so that
> following patch can be carried to stable versions.
> 
> sock_cgroup_ptr() callers do not expect a NULL return value.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [next-queue PATCH v5 0/5] TSN: Add qdisc based config interface for CBS
From: David Miller @ 2017-10-11  3:23 UTC (permalink / raw)
  To: vinicius.gomes
  Cc: netdev, intel-wired-lan, jhs, xiyou.wangcong, jiri, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, henrik, levipearson, rodney.cummings
In-Reply-To: <20171011004400.14946-1-vinicius.gomes@intel.com>

From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Date: Tue, 10 Oct 2017 17:43:55 -0700

> Changes since v4:
>  - Added a software implementation of the CBS algorithm;

This series now looks fine to me.

I'll give others a chance to review it.

^ permalink raw reply

* Re: [PATCH RFC 0/3] tun zerocopy stats
From: Jason Wang @ 2017-10-11  3:15 UTC (permalink / raw)
  To: Willem de Bruijn, David Miller
  Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn
In-Reply-To: <CAF=yD-J+Abjka9PViEFE6OhKOWf65UK1do5prYboFC+m+Tv1wA@mail.gmail.com>



On 2017年10月11日 03:11, Willem de Bruijn wrote:
> On Tue, Oct 10, 2017 at 1:39 PM, David Miller <davem@davemloft.net> wrote:
>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>> Date: Tue, 10 Oct 2017 11:29:33 -0400
>>
>>> If there is a way to expose these stats through vhost_net directly,
>>> instead of through tun, that may be better. But I did not see a
>>> suitable interface. Perhaps debugfs.
>> Please don't use debugfs, thank you :-)
> Okay. I'll take a look at tracing for on-demand measurement.

This reminds me a past series that adding tracepoints to vhost/net[1]. 
It can count zero/datacopy independently and even contains a sample 
program to show the stats.

Thanks

[1] https://lists.oasis-open.org/archives/virtio-dev/201403/msg00025.html

^ permalink raw reply

* Re: [PATCH net 0/7] net: qualcomm: rmnet: Fix some existing functionality
From: David Miller @ 2017-10-11  3:11 UTC (permalink / raw)
  To: subashab; +Cc: netdev
In-Reply-To: <1507681229-27861-1-git-send-email-subashab@codeaurora.org>

From: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Date: Tue, 10 Oct 2017 18:20:22 -0600

> This series fixes some of the broken rmnet functionality from the initial
> patchset. Bridge mode is re-written and made useable and the muxed_ep is
> converted to hlist.
> 
> Patches 1-5 are cleanups in preparation for these changes.
> Patch 6 does the hlist conversion.
> Patch 7 has the implementation of the rmnet bridge mode.
> 
> Note that there will be a compilation error when merging net with net-next
> due to the addition of the ext ack argument in
> netdev_master_upper_dev_link / ndo_add_slave.

I don't thnk any of these changes qualify as "fixes".  They don't
fix any bugs at all.

These are cleanups, and the addition of a new feature used for
debugging.

Therefore these should be all targetted at the 'net-next' tree
not 'net'.

^ permalink raw reply

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
From: Alexei Starovoitov @ 2017-10-11  2:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Cong Wang, netdev, Eric Dumazet, Yuchung Cheng, Neal Cardwell,
	Martin KaFai Lau, Brendan Gregg, Song Liu
In-Reply-To: <87po9ull5u.fsf@stressinduktion.org>

On Tue, Oct 10, 2017 at 11:58:53PM +0200, Hannes Frederic Sowa wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Mon, Oct 09, 2017 at 10:35:47PM -0700, Cong Wang wrote:
> 
> [...]
> 
> >> +		trace_tcp_retransmit_skb(sk, skb, segs);
> >
> > I'm happy to see new tracepoints being added to tcp stack, but I'm concerned
> > with practical usability of them.
> > Like the above tracepoint definition makes it not very useful from bpf point of view,
> > since 'sk' pointer is not recored by as part of the tracepoint.
> > In bpf/tracing world we prefer tracepoints to have raw pointers recorded
> > in TP_STRUCT__entry() and _not_ printed in TP_printk()
> > (since pointers are useless for userspace).
> 
> Ack.
> 
> Also could the TP_printk also use the socket cookies so they can get
> associated with netlink dumps and as such also be associated to user
> space processes? It could help against races while trying to associate
> the socket with a process. ss already supports dumping those cookies
> with -e.

makes sense to me.

> The corresponding commit would be:
> 
> commit 33cf7c90fe2f97afb1cadaa0cfb782cb9d1b9ee2
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Mar 11 18:53:14 2015 -0700
> 
>     net: add real socket cookies
> 
> Right now they only get set when needed but as Eric already mentioned in
> his commit log, this could be refined.

actually we hit that too for completely different tracing use case.
Indeed would be good to generate socket cookie unconditionally
for all sockets. I don't think there is any harm.

^ permalink raw reply

* Re: [Patch net-next] tcp: add a tracepoint for tcp_retransmit_skb()
From: Alexei Starovoitov @ 2017-10-11  2:53 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Eric Dumazet, Yuchung Cheng,
	Neal Cardwell, Martin KaFai Lau, Brendan Gregg,
	Hannes Frederic Sowa, Song Liu
In-Reply-To: <CAM_iQpU9TAU-SbjTyfp3xZUjpdrz=WWobHiAhoZ2zeF0oonYXg@mail.gmail.com>

On Tue, Oct 10, 2017 at 02:37:11PM -0700, Cong Wang wrote:
> >
> > More concrete, if you can make this trace_tcp_retransmit_skb() to record
> > sk, skb pointers and err code at the end of __tcp_retransmit_skb() it will solve
> > our need as well.
> 
> 
> Note, currently I only call trace_tcp_retransmit_skb() for successful
> retransmissions, since you mentioned err code, I guess you want it
> for failures too? I am not sure if tracing unsuccessful TCP retransmissions
> is meaningful here, I guess it's needed for BPF to track TCP states?
>
> It doesn't harm to add it, at least we can filter out err!=0 since we
> only care about successful ones.

right now only successful rxmit would be enough for us.
Only that 'err' is hard to do via kprobe, since it's in some random
register and debug info is generally not available.
If you want to drop err for now and call tracepoint only
on success, I think, that's fine too. Need to double check.
Only sk and skb pointers are must have.

Thanks!

^ permalink raw reply

* Re: Regression in throughput between kvm guests over virtual bridge
From: Jason Wang @ 2017-10-11  2:41 UTC (permalink / raw)
  To: Matthew Rosato, netdev; +Cc: davem, mst
In-Reply-To: <038445a6-9dd5-30c2-aac0-ab5efbfa7024@linux.vnet.ibm.com>



On 2017年10月06日 04:07, Matthew Rosato wrote:
> On 09/25/2017 04:18 PM, Matthew Rosato wrote:
>> On 09/22/2017 12:03 AM, Jason Wang wrote:
>>>
>>> On 2017年09月21日 03:38, Matthew Rosato wrote:
>>>>> Seems to make some progress on wakeup mitigation. Previous patch tries
>>>>> to reduce the unnecessary traversal of waitqueue during rx. Attached
>>>>> patch goes even further which disables rx polling during processing tx.
>>>>> Please try it to see if it has any difference.
>>>> Unfortunately, this patch doesn't seem to have made a difference.  I
>>>> tried runs with both this patch and the previous patch applied, as well
>>>> as only this patch applied for comparison (numbers from vhost thread of
>>>> sending VM):
>>>>
>>>> 4.12    4.13     patch1   patch2   patch1+2
>>>> 2.00%   +3.69%   +2.55%   +2.81%   +2.69%   [...] __wake_up_sync_key
>>>>
>>>> In each case, the regression in throughput was still present.
>>> This probably means some other cases of the wakeups were missed. Could
>>> you please record the callers of __wake_up_sync_key()?
>>>
>> Hi Jason,
>>
>> With your 2 previous patches applied, every call to __wake_up_sync_key
>> (for both sender and server vhost threads) shows the following stack trace:
>>
>>       vhost-11478-11520 [002] ....   312.927229: __wake_up_sync_key
>> <-sock_def_readable
>>       vhost-11478-11520 [002] ....   312.927230: <stack trace>
>>   => dev_hard_start_xmit
>>   => sch_direct_xmit
>>   => __dev_queue_xmit
>>   => br_dev_queue_push_xmit
>>   => br_forward_finish
>>   => __br_forward
>>   => br_handle_frame_finish
>>   => br_handle_frame
>>   => __netif_receive_skb_core
>>   => netif_receive_skb_internal
>>   => tun_get_user
>>   => tun_sendmsg
>>   => handle_tx
>>   => vhost_worker
>>   => kthread
>>   => kernel_thread_starter
>>   => kernel_thread_starter
>>
> Ping...  Jason, any other ideas or suggestions?
>

Sorry for the late, recovering from a long holiday. Will go back to this 
soon.

Thanks

^ permalink raw reply

* [PATCH net-next] net: hns3: make local functions static
From: Wei Yongjun @ 2017-10-11  2:35 UTC (permalink / raw)
  To: Yisen Zhuang, Salil Mehta, Lipeng, Yunsheng Lin, Colin Ian King
  Cc: Wei Yongjun, netdev

Fixes the following sparse warnings:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:464:5: warning:
 symbol 'hns3_change_all_ring_bd_num' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:477:5: warning:
 symbol 'hns3_set_ringparam' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
index 9b36ce0..ddbd7f3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
@@ -461,7 +461,8 @@ static int hns3_get_rxnfc(struct net_device *netdev,
 	return 0;
 }
 
-int hns3_change_all_ring_bd_num(struct hns3_nic_priv *priv, u32 new_desc_num)
+static int hns3_change_all_ring_bd_num(struct hns3_nic_priv *priv,
+				       u32 new_desc_num)
 {
 	struct hnae3_handle *h = priv->ae_handle;
 	int i;
@@ -474,7 +475,8 @@ int hns3_change_all_ring_bd_num(struct hns3_nic_priv *priv, u32 new_desc_num)
 	return hns3_init_all_ring(priv);
 }
 
-int hns3_set_ringparam(struct net_device *ndev, struct ethtool_ringparam *param)
+static int hns3_set_ringparam(struct net_device *ndev,
+			      struct ethtool_ringparam *param)
 {
 	struct hns3_nic_priv *priv = netdev_priv(ndev);
 	struct hnae3_handle *h = priv->ae_handle;

^ permalink raw reply related

* [PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update
From: Manish Kurup @ 2017-10-11  2:33 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel
  Cc: aring, mrv, kurup.manish, manish.kurup

Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.

Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 21 ++++++++-----
 net/sched/act_vlan.c         | 73 ++++++++++++++++++++++++++++++--------------
 2 files changed, 63 insertions(+), 31 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..67fd355 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+	struct rcu_head   rcu;
+	int               tcfv_action;
+	u16               tcfv_push_vid;
+	__be16            tcfv_push_proto;
+	u8                tcfv_push_prio;
+};
+
 struct tcf_vlan {
 	struct tc_action	common;
-	int			tcfv_action;
-	u16			tcfv_push_vid;
-	__be16			tcfv_push_proto;
-	u8			tcfv_push_prio;
+	struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,22 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-	return to_vlan(a)->tcfv_action;
+	return to_vlan(a)->vlan_p->tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-	return to_vlan(a)->tcfv_push_vid;
+	return to_vlan(a)->vlan_p->tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-	return to_vlan(a)->tcfv_push_proto;
+	return to_vlan(a)->vlan_p->tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-	return to_vlan(a)->tcfv_push_prio;
+	return to_vlan(a)->vlan_p->tcfv_push_prio;
 }
 
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 14c262c..9bb0236 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -29,31 +29,37 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	int action;
 	int err;
 	u16 tci;
+	struct tcf_vlan_params *p;
 
 	tcf_lastuse_update(&v->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-	spin_lock(&v->tcf_lock);
-	action = v->tcf_action;
-
 	/* Ensure 'data' points at mac_header prior calling vlan manipulating
 	 * functions.
 	 */
 	if (skb_at_tc_ingress(skb))
 		skb_push_rcsum(skb, skb->mac_len);
 
-	switch (v->tcfv_action) {
+	rcu_read_lock();
+
+	action = READ_ONCE(v->tcf_action);
+
+	p = rcu_dereference(v->vlan_p);
+
+	switch (p->tcfv_action) {
 	case TCA_VLAN_ACT_POP:
 		err = skb_vlan_pop(skb);
 		if (err)
 			goto drop;
 		break;
+
 	case TCA_VLAN_ACT_PUSH:
-		err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-				    (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+		err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+				(p->tcfv_push_prio << VLAN_PRIO_SHIFT));
 		if (err)
 			goto drop;
 		break;
+
 	case TCA_VLAN_ACT_MODIFY:
 		/* No-op if no vlan tag (either hw-accel or in-payload) */
 		if (!skb_vlan_tagged(skb))
@@ -69,15 +75,16 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 				goto drop;
 		}
 		/* replace the vid */
-		tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+		tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
 		/* replace prio bits, if tcfv_push_prio specified */
-		if (v->tcfv_push_prio) {
+		if (p->tcfv_push_prio) {
 			tci &= ~VLAN_PRIO_MASK;
-			tci |= v->tcfv_push_prio << VLAN_PRIO_SHIFT;
+			tci |= p->tcfv_push_prio << VLAN_PRIO_SHIFT;
 		}
 		/* put updated tci as hwaccel tag */
-		__vlan_hwaccel_put_tag(skb, v->tcfv_push_proto, tci);
+		__vlan_hwaccel_put_tag(skb, p->tcfv_push_proto, tci);
 		break;
+
 	default:
 		BUG();
 	}
@@ -89,6 +96,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
 
 unlock:
+	rcu_read_unlock();
 	if (skb_at_tc_ingress(skb))
 		skb_pull_rcsum(skb, skb->mac_len);
 
@@ -111,6 +119,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	struct nlattr *tb[TCA_VLAN_MAX + 1];
 	struct tc_vlan *parm;
 	struct tcf_vlan *v;
+	struct tcf_vlan_params *p, *p_old;
 	int action;
 	__be16 push_vid = 0;
 	__be16 push_proto = 0;
@@ -187,16 +196,33 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
 	v = to_vlan(*a);
 
-	spin_lock_bh(&v->tcf_lock);
-
-	v->tcfv_action = action;
-	v->tcfv_push_vid = push_vid;
-	v->tcfv_push_prio = push_prio;
-	v->tcfv_push_proto = push_proto;
+	ASSERT_RTNL();
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (unlikely(!p)) {
+		if (ovr)
+			tcf_idr_release(*a, bind);
+		return -ENOMEM;
+	}
 
 	v->tcf_action = parm->action;
 
-	spin_unlock_bh(&v->tcf_lock);
+	p_old = rtnl_dereference(v->vlan_p);
+
+	if (ovr)
+		spin_lock_bh(&v->tcf_lock);
+
+	p->tcfv_action = action;
+	p->tcfv_push_vid = push_vid;
+	p->tcfv_push_prio = push_prio;
+	p->tcfv_push_proto = push_proto;
+
+	rcu_assign_pointer(v->vlan_p, p);
+
+	if (ovr)
+		spin_unlock_bh(&v->tcf_lock);
+
+	if (p_old)
+		kfree_rcu(p_old, rcu);
 
 	if (ret == ACT_P_CREATED)
 		tcf_idr_insert(tn, *a);
@@ -208,25 +234,26 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct tc_action *a,
 {
 	unsigned char *b = skb_tail_pointer(skb);
 	struct tcf_vlan *v = to_vlan(a);
+	struct tcf_vlan_params *p = rtnl_dereference(v->vlan_p);
 	struct tc_vlan opt = {
 		.index    = v->tcf_index,
 		.refcnt   = v->tcf_refcnt - ref,
 		.bindcnt  = v->tcf_bindcnt - bind,
 		.action   = v->tcf_action,
-		.v_action = v->tcfv_action,
+		.v_action = p->tcfv_action,
 	};
 	struct tcf_t t;
 
 	if (nla_put(skb, TCA_VLAN_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
 
-	if ((v->tcfv_action == TCA_VLAN_ACT_PUSH ||
-	     v->tcfv_action == TCA_VLAN_ACT_MODIFY) &&
-	    (nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, v->tcfv_push_vid) ||
+	if ((p->tcfv_action == TCA_VLAN_ACT_PUSH ||
+	     p->tcfv_action == TCA_VLAN_ACT_MODIFY) &&
+	    (nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, p->tcfv_push_vid) ||
 	     nla_put_be16(skb, TCA_VLAN_PUSH_VLAN_PROTOCOL,
-			  v->tcfv_push_proto) ||
+			  p->tcfv_push_proto) ||
 	     (nla_put_u8(skb, TCA_VLAN_PUSH_VLAN_PRIORITY,
-					      v->tcfv_push_prio))))
+					      p->tcfv_push_prio))))
 		goto nla_put_failure;
 
 	tcf_tm_dump(&t, &v->tcf_tm);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats
From: Manish Kurup @ 2017-10-11  2:32 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel
  Cc: aring, mrv, kurup.manish, manish.kurup

The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..14c262c 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	int err;
 	u16 tci;
 
-	spin_lock(&v->tcf_lock);
 	tcf_lastuse_update(&v->tcf_tm);
-	bstats_update(&v->tcf_bstats, skb);
+	bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+	spin_lock(&v->tcf_lock);
 	action = v->tcf_action;
 
 	/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 
 drop:
 	action = TC_ACT_SHOT;
-	v->tcf_qstats.drops++;
+	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
 	if (skb_at_tc_ingress(skb))
 		skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
-				     &act_vlan_ops, bind, false);
+						&act_vlan_ops, bind, true);
 		if (ret)
 			return ret;
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3] mac80211: aead api to reduce redundancy
From: Xiang Gao @ 2017-10-11  2:31 UTC (permalink / raw)
  To: davem, johannes, linux-kernel, linux-wireless, netdev; +Cc: qasdfgtyuiop

Currently, the aes_ccm.c and aes_gcm.c are almost line by line copy of
each other. This patch reduce code redundancy by moving the code in these
two files to crypto/aead_api.c to make it a higher level aead api. The
file aes_ccm.c and aes_gcm.c are removed and all the functions there are
now implemented in their headers using the newly added aead api.

Signed-off-by: Xiang Gao <qasdfgtyuiop@gmail.com>
---
 net/mac80211/Makefile                  |   3 +-
 net/mac80211/{aes_ccm.c => aead_api.c} |  40 ++++++------
 net/mac80211/aead_api.h                |  27 ++++++++
 net/mac80211/aes_ccm.h                 |  42 +++++++++----
 net/mac80211/aes_gcm.c                 | 109 ---------------------------------
 net/mac80211/aes_gcm.h                 |  38 +++++++++---
 net/mac80211/wpa.c                     |   4 +-
 7 files changed, 111 insertions(+), 152 deletions(-)
 rename net/mac80211/{aes_ccm.c => aead_api.c} (67%)
 create mode 100644 net/mac80211/aead_api.h
 delete mode 100644 net/mac80211/aes_gcm.c

diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 282912245938..80f25ff2f24b 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -6,6 +6,7 @@ mac80211-y := \
 	driver-ops.o \
 	sta_info.o \
 	wep.o \
+	aead_api.o \
 	wpa.o \
 	scan.o offchannel.o \
 	ht.o agg-tx.o agg-rx.o \
@@ -15,8 +16,6 @@ mac80211-y := \
 	rate.o \
 	michael.o \
 	tkip.o \
-	aes_ccm.o \
-	aes_gcm.o \
 	aes_cmac.o \
 	aes_gmac.o \
 	fils_aead.o \
diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aead_api.c
similarity index 67%
rename from net/mac80211/aes_ccm.c
rename to net/mac80211/aead_api.c
index a4e0d59a40dd..cc48675ba742 100644
--- a/net/mac80211/aes_ccm.c
+++ b/net/mac80211/aead_api.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright 2014-2015, Qualcomm Atheros, Inc.
  * Copyright 2003-2004, Instant802 Networks, Inc.
  * Copyright 2005-2006, Devicescape Software, Inc.
  *
@@ -12,30 +13,29 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/err.h>
+#include <linux/scatterlist.h>
 #include <crypto/aead.h>
 
-#include <net/mac80211.h>
-#include "key.h"
-#include "aes_ccm.h"
+#include "aead_api.h"
 
-int ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic,
-			      size_t mic_len)
+int aead_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, size_t aad_len,
+		 u8 *data, size_t data_len, u8 *mic)
 {
+	size_t mic_len = tfm->authsize;
 	struct scatterlist sg[3];
 	struct aead_request *aead_req;
 	int reqsize = sizeof(*aead_req) + crypto_aead_reqsize(tfm);
 	u8 *__aad;
 
-	aead_req = kzalloc(reqsize + CCM_AAD_LEN, GFP_ATOMIC);
+	aead_req = kzalloc(reqsize + aad_len, GFP_ATOMIC);
 	if (!aead_req)
 		return -ENOMEM;
 
 	__aad = (u8 *)aead_req + reqsize;
-	memcpy(__aad, aad, CCM_AAD_LEN);
+	memcpy(__aad, aad, aad_len);
 
 	sg_init_table(sg, 3);
-	sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
+	sg_set_buf(&sg[0], __aad, aad_len);
 	sg_set_buf(&sg[1], data, data_len);
 	sg_set_buf(&sg[2], mic, mic_len);
 
@@ -49,10 +49,10 @@ int ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
 	return 0;
 }
 
-int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic,
-			      size_t mic_len)
+int aead_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, size_t aad_len,
+		 u8 *data, size_t data_len, u8 *mic)
 {
+	size_t mic_len = tfm->authsize;
 	struct scatterlist sg[3];
 	struct aead_request *aead_req;
 	int reqsize = sizeof(*aead_req) + crypto_aead_reqsize(tfm);
@@ -62,15 +62,15 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
 	if (data_len == 0)
 		return -EINVAL;
 
-	aead_req = kzalloc(reqsize + CCM_AAD_LEN, GFP_ATOMIC);
+	aead_req = kzalloc(reqsize + aad_len, GFP_ATOMIC);
 	if (!aead_req)
 		return -ENOMEM;
 
 	__aad = (u8 *)aead_req + reqsize;
-	memcpy(__aad, aad, CCM_AAD_LEN);
+	memcpy(__aad, aad, aad_len);
 
 	sg_init_table(sg, 3);
-	sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
+	sg_set_buf(&sg[0], __aad, aad_len);
 	sg_set_buf(&sg[1], data, data_len);
 	sg_set_buf(&sg[2], mic, mic_len);
 
@@ -84,14 +84,14 @@ int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
 	return err;
 }
 
-struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[],
-						    size_t key_len,
-						    size_t mic_len)
+struct crypto_aead *
+aead_key_setup_encrypt(const char *alg, const u8 key[],
+		       size_t key_len, size_t mic_len)
 {
 	struct crypto_aead *tfm;
 	int err;
 
-	tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC);
+	tfm = crypto_alloc_aead(alg, 0, CRYPTO_ALG_ASYNC);
 	if (IS_ERR(tfm))
 		return tfm;
 
@@ -109,7 +109,7 @@ struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[],
 	return ERR_PTR(err);
 }
 
-void ieee80211_aes_key_free(struct crypto_aead *tfm)
+void aead_key_free(struct crypto_aead *tfm)
 {
 	crypto_free_aead(tfm);
 }
diff --git a/net/mac80211/aead_api.h b/net/mac80211/aead_api.h
new file mode 100644
index 000000000000..5e39ea843bbf
--- /dev/null
+++ b/net/mac80211/aead_api.h
@@ -0,0 +1,27 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _AEAD_API_H
+#define _AEAD_API_H
+
+#include <crypto/aead.h>
+#include <linux/crypto.h>
+
+struct crypto_aead *
+aead_key_setup_encrypt(const char *alg, const u8 key[],
+		       size_t key_len, size_t mic_len);
+
+int aead_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
+		 size_t aad_len, u8 *data,
+		 size_t data_len, u8 *mic);
+
+int aead_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
+		 size_t aad_len, u8 *data,
+		 size_t data_len, u8 *mic);
+
+void aead_key_free(struct crypto_aead *tfm);
+
+#endif /* _AEAD_API_H */
diff --git a/net/mac80211/aes_ccm.h b/net/mac80211/aes_ccm.h
index fcd3254c5cf0..e9b7ca0bde5b 100644
--- a/net/mac80211/aes_ccm.h
+++ b/net/mac80211/aes_ccm.h
@@ -10,19 +10,39 @@
 #ifndef AES_CCM_H
 #define AES_CCM_H
 
-#include <linux/crypto.h>
+#include "aead_api.h"
 
 #define CCM_AAD_LEN	32
 
-struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[],
-						    size_t key_len,
-						    size_t mic_len);
-int ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic,
-			      size_t mic_len);
-int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic,
-			      size_t mic_len);
-void ieee80211_aes_key_free(struct crypto_aead *tfm);
+static inline struct crypto_aead *
+ieee80211_aes_key_setup_encrypt(const u8 key[], size_t key_len, size_t mic_len)
+{
+	return aead_key_setup_encrypt("ccm(aes)", key, key_len, mic_len);
+}
+
+static inline int
+ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm,
+			  u8 *b_0, u8 *aad, u8 *data,
+			  size_t data_len, u8 *mic)
+{
+	return aead_encrypt(tfm, b_0, aad + 2,
+			    be16_to_cpup((__be16 *)aad),
+			    data, data_len, mic);
+}
+
+static inline int
+ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm,
+			  u8 *b_0, u8 *aad, u8 *data,
+			  size_t data_len, u8 *mic)
+{
+	return aead_decrypt(tfm, b_0, aad + 2,
+			    be16_to_cpup((__be16 *)aad),
+			    data, data_len, mic);
+}
+
+static inline void ieee80211_aes_key_free(struct crypto_aead *tfm)
+{
+	return aead_key_free(tfm);
+}
 
 #endif /* AES_CCM_H */
diff --git a/net/mac80211/aes_gcm.c b/net/mac80211/aes_gcm.c
deleted file mode 100644
index 8a4397cc1b08..000000000000
--- a/net/mac80211/aes_gcm.c
+++ /dev/null
@@ -1,109 +0,0 @@
-/*
- * Copyright 2014-2015, Qualcomm Atheros, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include <linux/kernel.h>
-#include <linux/types.h>
-#include <linux/err.h>
-#include <crypto/aead.h>
-
-#include <net/mac80211.h>
-#include "key.h"
-#include "aes_gcm.h"
-
-int ieee80211_aes_gcm_encrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic)
-{
-	struct scatterlist sg[3];
-	struct aead_request *aead_req;
-	int reqsize = sizeof(*aead_req) + crypto_aead_reqsize(tfm);
-	u8 *__aad;
-
-	aead_req = kzalloc(reqsize + GCM_AAD_LEN, GFP_ATOMIC);
-	if (!aead_req)
-		return -ENOMEM;
-
-	__aad = (u8 *)aead_req + reqsize;
-	memcpy(__aad, aad, GCM_AAD_LEN);
-
-	sg_init_table(sg, 3);
-	sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
-	sg_set_buf(&sg[1], data, data_len);
-	sg_set_buf(&sg[2], mic, IEEE80211_GCMP_MIC_LEN);
-
-	aead_request_set_tfm(aead_req, tfm);
-	aead_request_set_crypt(aead_req, sg, sg, data_len, j_0);
-	aead_request_set_ad(aead_req, sg[0].length);
-
-	crypto_aead_encrypt(aead_req);
-	kzfree(aead_req);
-	return 0;
-}
-
-int ieee80211_aes_gcm_decrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic)
-{
-	struct scatterlist sg[3];
-	struct aead_request *aead_req;
-	int reqsize = sizeof(*aead_req) + crypto_aead_reqsize(tfm);
-	u8 *__aad;
-	int err;
-
-	if (data_len == 0)
-		return -EINVAL;
-
-	aead_req = kzalloc(reqsize + GCM_AAD_LEN, GFP_ATOMIC);
-	if (!aead_req)
-		return -ENOMEM;
-
-	__aad = (u8 *)aead_req + reqsize;
-	memcpy(__aad, aad, GCM_AAD_LEN);
-
-	sg_init_table(sg, 3);
-	sg_set_buf(&sg[0], &__aad[2], be16_to_cpup((__be16 *)__aad));
-	sg_set_buf(&sg[1], data, data_len);
-	sg_set_buf(&sg[2], mic, IEEE80211_GCMP_MIC_LEN);
-
-	aead_request_set_tfm(aead_req, tfm);
-	aead_request_set_crypt(aead_req, sg, sg,
-			       data_len + IEEE80211_GCMP_MIC_LEN, j_0);
-	aead_request_set_ad(aead_req, sg[0].length);
-
-	err = crypto_aead_decrypt(aead_req);
-	kzfree(aead_req);
-
-	return err;
-}
-
-struct crypto_aead *ieee80211_aes_gcm_key_setup_encrypt(const u8 key[],
-							size_t key_len)
-{
-	struct crypto_aead *tfm;
-	int err;
-
-	tfm = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
-	if (IS_ERR(tfm))
-		return tfm;
-
-	err = crypto_aead_setkey(tfm, key, key_len);
-	if (err)
-		goto free_aead;
-	err = crypto_aead_setauthsize(tfm, IEEE80211_GCMP_MIC_LEN);
-	if (err)
-		goto free_aead;
-
-	return tfm;
-
-free_aead:
-	crypto_free_aead(tfm);
-	return ERR_PTR(err);
-}
-
-void ieee80211_aes_gcm_key_free(struct crypto_aead *tfm)
-{
-	crypto_free_aead(tfm);
-}
diff --git a/net/mac80211/aes_gcm.h b/net/mac80211/aes_gcm.h
index 55aed5352494..d2b096033009 100644
--- a/net/mac80211/aes_gcm.h
+++ b/net/mac80211/aes_gcm.h
@@ -9,16 +9,38 @@
 #ifndef AES_GCM_H
 #define AES_GCM_H
 
-#include <linux/crypto.h>
+#include "aead_api.h"
 
 #define GCM_AAD_LEN	32
 
-int ieee80211_aes_gcm_encrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic);
-int ieee80211_aes_gcm_decrypt(struct crypto_aead *tfm, u8 *j_0, u8 *aad,
-			      u8 *data, size_t data_len, u8 *mic);
-struct crypto_aead *ieee80211_aes_gcm_key_setup_encrypt(const u8 key[],
-							size_t key_len);
-void ieee80211_aes_gcm_key_free(struct crypto_aead *tfm);
+static inline int ieee80211_aes_gcm_encrypt(struct crypto_aead *tfm,
+					    u8 *j_0, u8 *aad,  u8 *data,
+					    size_t data_len, u8 *mic)
+{
+	return aead_encrypt(tfm, j_0, aad + 2,
+			    be16_to_cpup((__be16 *)aad),
+			    data, data_len, mic);
+}
+
+static inline int ieee80211_aes_gcm_decrypt(struct crypto_aead *tfm,
+					    u8 *j_0, u8 *aad, u8 *data,
+					    size_t data_len, u8 *mic)
+{
+	return aead_decrypt(tfm, j_0, aad + 2,
+			    be16_to_cpup((__be16 *)aad),
+			    data, data_len, mic);
+}
+
+static inline struct crypto_aead *
+ieee80211_aes_gcm_key_setup_encrypt(const u8 key[], size_t key_len)
+{
+	return aead_key_setup_encrypt("gcm(aes)", key,
+				      key_len, IEEE80211_GCMP_MIC_LEN);
+}
+
+static inline void ieee80211_aes_gcm_key_free(struct crypto_aead *tfm)
+{
+	return aead_key_free(tfm);
+}
 
 #endif /* AES_GCM_H */
diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c
index 0d722ea98a1b..b58722d9de37 100644
--- a/net/mac80211/wpa.c
+++ b/net/mac80211/wpa.c
@@ -464,7 +464,7 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb,
 	pos += IEEE80211_CCMP_HDR_LEN;
 	ccmp_special_blocks(skb, pn, b_0, aad);
 	return ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len,
-					 skb_put(skb, mic_len), mic_len);
+					 skb_put(skb, mic_len));
 }
 
 
@@ -543,7 +543,7 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx,
 				    key->u.ccmp.tfm, b_0, aad,
 				    skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN,
 				    data_len,
-				    skb->data + skb->len - mic_len, mic_len))
+				    skb->data + skb->len - mic_len))
 				return RX_DROP_UNUSABLE;
 		}
 
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH] mac80211: aead api to reduce redundancy
From: Xiang Gao @ 2017-10-11  2:31 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David S. Miller, linux-kernel, linux-wireless, netdev
In-Reply-To: <1507532953.26041.5.camel@sipsolutions.net>

2017-10-09 3:09 GMT-04:00 Johannes Berg <johannes@sipsolutions.net>:
> On Sun, 2017-10-08 at 01:43 -0400, Xiang Gao wrote:
>>
>> By the way, I'm still struggling on how to run unit tests. It might
>> take time for me to make it run on my machine.
>
> I can run it easily, so don't worry about it too much. Running it is of
> course much appreciated, but I don't really want to go and require that
> right now, it takes a long time to run.
>
> If you do want to set it up, I suggest the vm scripts (hostap
> repository in tests/hwsim/vm/ - you can use the kernel .config there as
> a base to compile a kernel and then just kick it off from there, but it
> can take a while to run.

Thanks for your help on this. This information is actually very helpful to me.

Since the unit test is not required, I will put working on this patch
higher priority than unit tests. I will send out patches without
running unit tests for now before I can make it run on my computer.
But I'm still interested in trying to run it on my computer after I
finish this patch.

I will send PATCH v3 soon.

Thanks

>
>> Hmm... good question. The reason is, aes_ccm.c and aes_gcm.c was
>> almost exact copy of each other. But they have different copyright
>> information.
>> The copyright of aes_ccm.c was:
>>
>> Copyright 2006, Devicescape Software, Inc.
>> Copyright 2003-2004, Instant802 Networks, Inc.
>>
>> and the copyright of aes_gcm.c was:
>>
>> Copyright 2014-2015, Qualcomm Atheros, Inc.
>>
>> I just don't know how to write the copyright for the new aead_api.c,
>> so I does not put anything there.
>
> Heh, good point. Well, I guess we can pretend it wasn't already copied
> before and just "keep" both.
>
> johannes

^ permalink raw reply

* [PATCH net 2/2] net: call cgroup_sk_alloc() earlier in sk_clone_lock()
From: Eric Dumazet @ 2017-10-11  2:12 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Johannes Weiner, Craig Gallek,
	Tejun Heo
In-Reply-To: <20171011021233.24158-1-edumazet@google.com>

If for some reason, the newly allocated child need to be freed,
we will call cgroup_put() (via sk_free_unlock_clone()) while the
corresponding cgroup_get() was not yet done, and we will free memory
too soon.

Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
---
 net/core/sock.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 70c6ccbdf49f2f8a5a0f7c41c7849ea01459be50..415f441c63b9e2ff8feb010f44ca27303c72aaa1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1687,6 +1687,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		atomic_set(&newsk->sk_zckey, 0);
 
 		sock_reset_flag(newsk, SOCK_DONE);
+		cgroup_sk_alloc(&newsk->sk_cgrp_data);
 
 		rcu_read_lock();
 		filter = rcu_dereference(sk->sk_filter);
@@ -1718,8 +1719,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
-		cgroup_sk_alloc(&newsk->sk_cgrp_data);
-
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
 		 * (Documentation/RCU/rculist_nulls.txt for details)
-- 
2.15.0.rc0.271.g36b669edcc-goog

^ permalink raw reply related

* [PATCH net 1/2] Revert "net: defer call to cgroup_sk_alloc()"
From: Eric Dumazet @ 2017-10-11  2:12 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Johannes Weiner, Craig Gallek,
	Tejun Heo

This reverts commit fbb1fb4ad415cb31ce944f65a5ca700aaf73a227.

This was not the proper fix, lets cleanly revert it, so that
following patch can be carried to stable versions.

sock_cgroup_ptr() callers do not expect a NULL return value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/cgroup/cgroup.c          | 11 +++++++++++
 net/core/sock.c                 |  3 ++-
 net/ipv4/inet_connection_sock.c |  5 -----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 3380a3e49af501e457991b2823020494cf32af80..44857278eb8aa6a2bbf27b7eb12137ef42628170 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5709,6 +5709,17 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 	if (cgroup_sk_alloc_disabled)
 		return;
 
+	/* Socket clone path */
+	if (skcd->val) {
+		/*
+		 * We might be cloning a socket which is left in an empty
+		 * cgroup and the cgroup might have already been rmdir'd.
+		 * Don't use cgroup_get_live().
+		 */
+		cgroup_get(sock_cgroup_ptr(skcd));
+		return;
+	}
+
 	rcu_read_lock();
 
 	while (true) {
diff --git a/net/core/sock.c b/net/core/sock.c
index 4499e31538132ed59a16d92e6f6b923e776df84e..70c6ccbdf49f2f8a5a0f7c41c7849ea01459be50 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1680,7 +1680,6 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 		/* sk->sk_memcg will be populated at accept() time */
 		newsk->sk_memcg = NULL;
-		memset(&newsk->sk_cgrp_data, 0, sizeof(newsk->sk_cgrp_data));
 
 		atomic_set(&newsk->sk_drops, 0);
 		newsk->sk_send_head	= NULL;
@@ -1719,6 +1718,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_incoming_cpu = raw_smp_processor_id();
 		atomic64_set(&newsk->sk_cookie, 0);
 
+		cgroup_sk_alloc(&newsk->sk_cgrp_data);
+
 		/*
 		 * Before updating sk_refcnt, we must commit prior changes to memory
 		 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d32c74507314cc4b91d040de8e877e4bd8204106..67aec7a106860b26c929fea1624d652c87972f04 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -26,8 +26,6 @@
 #include <net/tcp.h>
 #include <net/sock_reuseport.h>
 #include <net/addrconf.h>
-#include <net/cls_cgroup.h>
-#include <net/netprio_cgroup.h>
 
 #ifdef INET_CSK_DEBUG
 const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
@@ -478,9 +476,6 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern)
 		spin_unlock_bh(&queue->fastopenq.lock);
 	}
 	mem_cgroup_sk_alloc(newsk);
-	cgroup_sk_alloc(&newsk->sk_cgrp_data);
-	sock_update_classid(&newsk->sk_cgrp_data);
-	sock_update_netprioidx(&newsk->sk_cgrp_data);
 out:
 	release_sock(sk);
 	if (req)
-- 
2.15.0.rc0.271.g36b669edcc-goog

^ permalink raw reply related

* RE: [net-next,RESEND] igb: add function to get maximum RSS queues
From: Brown, Aaron F @ 2017-10-11  1:45 UTC (permalink / raw)
  To: Zhang Shengju, Kirsher, Jeffrey T,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
In-Reply-To: <1505828454-2826-1-git-send-email-zhangshengju@cmss.chinamobile.com>

> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Zhang Shengju
> Sent: Tuesday, September 19, 2017 6:41 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; intel-wired-
> lan@lists.osuosl.org; netdev@vger.kernel.org
> Subject: [net-next,RESEND] igb: add function to get maximum RSS queues
> 
> This patch adds a new function igb_get_max_rss_queues() to get maximum
> RSS queues, this will reduce duplicate code and facilitate future
> maintenance.
> 
> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
> ---
>  drivers/net/ethernet/intel/igb/igb.h         |  1 +
>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 32 +---------------------------
>  drivers/net/ethernet/intel/igb/igb_main.c    | 12 +++++++++--
>  3 files changed, 12 insertions(+), 33 deletions(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* RE: [PATCH net] driver: e1000: fix race condition between e1000_down() and e1000_watchdog
From: Brown, Aaron F @ 2017-10-11  1:07 UTC (permalink / raw)
  To: Vincenzo Maffione, Kirsher, Jeffrey T
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170916160000.3560-1-v.maffione@gmail.com>

> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Vincenzo Maffione
> Sent: Saturday, September 16, 2017 9:00 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Vincenzo Maffione <v.maffione@gmail.com>
> Subject: [PATCH net] driver: e1000: fix race condition between e1000_down()
> and e1000_watchdog
> 
> This patch fixes a race condition that can result into the interface being
> up and carrier on, but with transmits disabled in the hardware.
> The bug may show up by repeatedly IFF_DOWN+IFF_UP the interface, which
> allows e1000_watchdog() interleave with e1000_down().
> 
>     CPU x                           CPU y
>     --------------------------------------------------------------------
>     e1000_down():
>         netif_carrier_off()
>                                     e1000_watchdog():
>                                         if (carrier == off) {
>                                             netif_carrier_on();
>                                             enable_hw_transmit();
>                                         }
>         disable_hw_transmit();
>                                     e1000_watchdog():
>                                         /* carrier on, do nothing */
> 
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* [iproute2 net-next v2 3/3] man: Add initial manpage for tc-cbs(8)
From: Vinicius Costa Gomes @ 2017-10-11  0:46 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, henrik, levipearson, rodney.cummings
In-Reply-To: <20171011004652.15630-1-vinicius.gomes@intel.com>

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 man/man8/tc-cbs.8 | 112 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)
 create mode 100644 man/man8/tc-cbs.8

diff --git a/man/man8/tc-cbs.8 b/man/man8/tc-cbs.8
new file mode 100644
index 00000000..97e00c84
--- /dev/null
+++ b/man/man8/tc-cbs.8
@@ -0,0 +1,112 @@
+.TH CBS 8 "18 Sept 2017" "iproute2" "Linux"
+.SH NAME
+CBS \- Credit Based Shaper (CBS) Qdisc
+.SH SYNOPSIS
+.B tc qdisc ... dev
+dev
+.B parent
+classid
+.B [ handle
+major:
+.B ] cbs idleslope
+idleslope
+.B sendslope
+sendslope
+.B hicredit
+hicredit
+.B locredit
+locredit
+.B [ offload
+0|1
+.B ]
+
+.SH DESCRIPTION
+The CBS (Credit Based Shaper) qdisc implements the shaping algorithm
+defined by the IEEE 802.1Q-2014 Section 8.6.8.2, which applies a well
+defined rate limiting method to the traffic.
+
+This queueing discipline is intended to be used by TSN (Time Sensitive
+Networking) applications, the CBS parameters are derived directly by
+what is described by the Annex L of the IEEE 802.1Q-2014
+Sepcification. The algorithm and how it affects the latency are
+detailed there.
+
+CBS is meant to be installed under another qdisc that maps packet
+flows to traffic classes, one example is
+.BR mqprio(8).
+
+.SH PARAMETERS
+.TP
+idleslope
+Idleslope is the rate of credits that is accumulated (in kilobits per
+second) when there is at least one packet waiting for transmission.
+Packets are transmitted when the current value of credits is equal or
+greater than zero. When there is no packet to be transmitted the
+amount of credits is set to zero. This is the main tunable of the CBS
+algorithm.
+.TP
+sendslope
+Sendslope is the rate of credits that is depleted (it should be a
+negative number of kilobits per second) when a transmission is
+ocurring. It can be calculated as follows, (IEEE 802.1Q-2014 Section
+8.6.8.2 item g):
+
+sendslope = idleslope - port_transmit_rate
+
+.TP
+hicredit
+Hicredit defines the maximum amount of credits (in bytes) that can be
+accumulated. Hicredit depends on the characteristics of interfering
+traffic, 'max_interference_size' is the maximum size of any burst of
+traffic that can delay the transmission of a frame that is available
+for transmission for this traffic class, (IEEE 802.1Q-2014 Annex L,
+Equation L-3):
+
+hicredit = max_interference_size * (idleslope / port_transmit_rate)
+
+.TP
+locredit
+Locredit is the minimum amount of credits that can be reached. It is a
+function of the traffic flowing through this qdisc (IEEE 802.1Q-2014
+Annex L, Equation L-2):
+
+locredit = max_frame_size * (sendslope / port_transmit_rate)
+
+.TP
+offload
+When
+.B offload
+is 1,
+.BR cbs(8)
+will try to configure the network interface so the CBS algorithm runs
+in the controller. The default is 0.
+
+.SH EXAMPLES
+
+CBS is used to enforce a Quality of Service by limiting the data rate
+of a traffic class, to separate packets into traffic classes the user
+may choose
+.BR mqprio(8),
+and configure it like this:
+
+.EX
+# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 3 \\
+	map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \\
+	queues 1@0 1@1 2@2 \\
+	hw 0
+.EE
+.P
+To replace the current queuing disciple by CBS in the current queueing
+discipline connected to traffic class number 0, issue:
+.P
+.EX
+# tc qdisc replace dev eth0 parent 100:4 cbs \\
+	locredit -1470 hicredit 30 sendslope -980000 idleslope 20000
+.EE
+
+These values are obtained from the following parameters, idleslope is
+20mbit/s, the transmission rate is 1Gbit/s and the maximum interfering
+frame size is 1500 bytes.
+
+.SH AUTHORS
+Vinicius Costa Gomes <vinicius.gomes@intel.com>
-- 
2.14.2

^ permalink raw reply related

* [iproute2 net-next v2 1/3] update headers with CBS API [ONLY FOR TESTING]
From: Vinicius Costa Gomes @ 2017-10-11  0:46 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, henrik, levipearson, rodney.cummings

The headers will be updated when iproute2 fetches the headers from
net-next, this patch is only to ease testing.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/pkt_sched.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 099bf552..41e349df 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -871,4 +871,22 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CBS */
+struct tc_cbs_qopt {
+	__u8 offload;
+	__s32 hicredit;
+	__s32 locredit;
+	__s32 idleslope;
+	__s32 sendslope;
+};
+
+enum {
+	TCA_CBS_UNSPEC,
+	TCA_CBS_PARMS,
+	__TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
-- 
2.14.2

^ permalink raw reply related

* [iproute2 net-next v2 2/3] tc: Add support for the CBS qdisc
From: Vinicius Costa Gomes @ 2017-10-11  0:46 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, henrik, levipearson, rodney.cummings
In-Reply-To: <20171011004652.15630-1-vinicius.gomes@intel.com>

The Credit Based Shaper (CBS) queueing discipline allows bandwidth
reservation with sub-milisecond precision. It is defined by the
802.1Q-2014 specification (section 8.6.8.2 and Annex L).

The syntax is:

tc qdisc add dev DEV parent NODE cbs locredit <LOCREDIT>
   		hicredit <HICREDIT> sendslope <SENDSLOPE>
		idleslope <IDLESLOPE>

(The order is not important)

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 tc/Makefile |   1 +
 tc/q_cbs.c  | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
 create mode 100644 tc/q_cbs.c

diff --git a/tc/Makefile b/tc/Makefile
index 777de5e6..24bd3e2e 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -69,6 +69,7 @@ TCMODULES += q_hhf.o
 TCMODULES += q_clsact.o
 TCMODULES += e_bpf.o
 TCMODULES += f_matchall.o
+TCMODULES += q_cbs.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_cbs.c b/tc/q_cbs.c
new file mode 100644
index 00000000..e53be654
--- /dev/null
+++ b/tc/q_cbs.c
@@ -0,0 +1,142 @@
+/*
+ * q_cbs.c		CBS.
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Vinicius Costa Gomes <vinicius.gomes@intel.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... cbs hicredit BYTES locredit BYTES sendslope BPS idleslope BPS\n");
+	fprintf(stderr, "           [offload 0|1]\n");
+
+}
+
+static void explain1(const char *arg, const char *val)
+{
+	fprintf(stderr, "cbs: illegal value for \"%s\": \"%s\"\n", arg, val);
+}
+
+static int cbs_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n)
+{
+	struct tc_cbs_qopt opt = {};
+	struct rtattr *tail;
+
+	while (argc > 0) {
+		if (matches(*argv, "offload") == 0) {
+			NEXT_ARG();
+			if (opt.offload) {
+				fprintf(stderr, "cbs: duplicate \"offload\" specification\n");
+				return -1;
+			}
+			if (get_u8(&opt.offload, *argv, 0)) {
+				explain1("offload", *argv);
+				return -1;
+			}
+		} else if (matches(*argv, "hicredit") == 0) {
+			NEXT_ARG();
+			if (opt.hicredit) {
+				fprintf(stderr, "cbs: duplicate \"hicredit\" specification\n");
+				return -1;
+			}
+			if (get_s32(&opt.hicredit, *argv, 0)) {
+				explain1("hicredit", *argv);
+				return -1;
+			}
+		} else if (matches(*argv, "locredit") == 0) {
+			NEXT_ARG();
+			if (opt.locredit) {
+				fprintf(stderr, "cbs: duplicate \"locredit\" specification\n");
+				return -1;
+			}
+			if (get_s32(&opt.locredit, *argv, 0)) {
+				explain1("locredit", *argv);
+				return -1;
+			}
+		} else if (matches(*argv, "sendslope") == 0) {
+			NEXT_ARG();
+			if (opt.sendslope) {
+				fprintf(stderr, "cbs: duplicate \"sendslope\" specification\n");
+				return -1;
+			}
+			if (get_s32(&opt.sendslope, *argv, 0)) {
+				explain1("sendslope", *argv);
+				return -1;
+			}
+		} else if (matches(*argv, "idleslope") == 0) {
+			NEXT_ARG();
+			if (opt.idleslope) {
+				fprintf(stderr, "cbs: duplicate \"idleslope\" specification\n");
+				return -1;
+			}
+			if (get_s32(&opt.idleslope, *argv, 0)) {
+				explain1("idleslope", *argv);
+				return -1;
+			}
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "cbs: unknown parameter \"%s\"\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	addattr_l(n, 2024, TCA_CBS_PARMS, &opt, sizeof(opt));
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int cbs_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CBS_MAX+1];
+	struct tc_cbs_qopt *qopt;
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CBS_MAX, opt);
+
+	if (tb[TCA_CBS_PARMS] == NULL)
+		return -1;
+
+	qopt = RTA_DATA(tb[TCA_CBS_PARMS]);
+	if (RTA_PAYLOAD(tb[TCA_CBS_PARMS])  < sizeof(*qopt))
+		return -1;
+
+	fprintf(f, "hicredit %d ", qopt->hicredit);
+	fprintf(f, "locredit %d ", qopt->locredit);
+	fprintf(f, "sendslope %d ", qopt->sendslope);
+	fprintf(f, "idleslope %d ", qopt->idleslope);
+	fprintf(f, "offload %d ", qopt->offload);
+
+	return 0;
+}
+
+struct qdisc_util cbs_qdisc_util = {
+	.id		= "cbs",
+	.parse_qopt	= cbs_parse_opt,
+	.print_qopt	= cbs_print_opt,
+};
-- 
2.14.2

^ permalink raw reply related

* [next-queue PATCH v5 1/5] net/sched: Check for null dev_queue on create flow
From: Vinicius Costa Gomes @ 2017-10-11  0:43 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Jesus Sanchez-Palencia, jhs, xiyou.wangcong, jiri, andre.guedes,
	ivan.briano, boon.leong.ong, richardcochran, henrik, levipearson,
	rodney.cummings
In-Reply-To: <20171011004400.14946-1-vinicius.gomes@intel.com>

From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>

In qdisc_alloc() the dev_queue pointer was used without any checks
being performed. If qdisc_create() gets a null dev_queue pointer, it
just passes it along to qdisc_alloc(), leading to a crash. That
happens if a root qdisc implements select_queue() and returns a null
dev_queue pointer for an "invalid handle", for example, or if the
dev_queue associated with the parent qdisc is null.

This patch is in preparation for the next in this series, where
select_queue() is being added to mqprio and as it may return a null
dev_queue.

Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 net/sched/sch_generic.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a0a198768aad..de2408f1ccd3 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 	struct Qdisc *sch;
 	unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
 	int err = -ENOBUFS;
-	struct net_device *dev = dev_queue->dev;
+	struct net_device *dev;
+
+	if (!dev_queue) {
+		err = -EINVAL;
+		goto errout;
+	}
 
+	dev = dev_queue->dev;
 	p = kzalloc_node(size, GFP_KERNEL,
 			 netdev_queue_numa_node_read(dev_queue));
 
-- 
2.14.2

^ permalink raw reply related

* [next-queue PATCH v5 0/5] TSN: Add qdisc based config interface for CBS
From: Vinicius Costa Gomes @ 2017-10-11  0:43 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
	ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
	richardcochran, henrik, levipearson, rodney.cummings

Hi,

Changes since v4:
 - Added a software implementation of the CBS algorithm;

Changes since v3:
 - None, only a clean patchset without old patches;

Changes since v2:
 - squashed the patch introducing the userspace API into the patch
   implementing CBS;

Changes since v1:
 - Solved the mqprio dependency;
 - Fixed a mqprio bug, that caused the inner qdisc to have a wrong
   dev_queue associated with it;

Changes from the RFC:
 - Fixed comments from Henrik Austad;
 - Simplified the Qdisc, using the generic implementation of callbacks
   where possible;
 - Small refactor on the driver (igb) code;

This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.

As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

Overview
========

Time-sensitive Networking (TSN) is a set of standards that aim to
address resources availability for providing bandwidth reservation and
bounded latency on Ethernet based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.

The initial target of this work is the Intel i210 NIC, but other
controllers' datasheet were also taken into account, like the Renesas
RZ/A1H RZ/A1M group and the Synopsis DesignWare Ethernet QoS
controller.


Proposal
========

Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.

For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.

As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:

$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
     idleslope I

   Note that the parameters for this qdisc are the ones defined by the
   802.1Q-2014 spec, so no hardware specific functionality is exposed here.

Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.


Testing this RFC
================

Attached to this cover letter are:
 - calculate_cbs_params.py: A Python script to calculate the
   parameters to the CBS queueing discipline;
 - tsn-talker.c: A sample C implementation of the talker side of a stream;
 - tsn-listener.c: A sample C implementation of the listener side of a
   stream;

For testing the patches of this series, you may want to use the
attached samples to this cover letter and use the 'mqprio' qdisc to
setup the priorities to Tx queues mapping, together with the 'cbs'
qdisc to configure the HW shaper of the i210 controller:

1) Setup priorities to traffic classes to hardware queues mapping
$ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \
     map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

For a more detailed explanation, see mqprio(8), in short, this command
will map traffic with priority 3 to the hardware queue 0, traffic with
priority 2 to hardware queue 1, and the rest will be mapped to
hardware queues 2 and 3.

2) Check scheme. You want to get the inner qdiscs ID from the bottom up
$ tc -g class show dev ens4

Ex.:
+---(100:3) mqprio
|    +---(100:6) mqprio
|    +---(100:7) mqprio
|
+---(100:2) mqprio
|    +---(100:5) mqprio
|
+---(100:1) mqprio
     +---(100:4) mqprio

* Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1.

3) Calculate CBS parameters for classes A and B. i.e. BW for A is 20Mbps and
   for B is 10Mbps:
$ calc_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500

4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \
     hicredit 30 sendslope -980000 idleslope 20000

5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev ens4 parent 100:5 cbs \
     locredit -1485 hicredit 31 sendslope -990000 idleslope 10000

6) Run Listener:
$ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500

7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
$ ./tsn-talker -d 01:AA:AA:AA:AA:AA -i ens4 -p 3 -s 1500

 * The bandwidth displayed on the listener output at this stage should be very
   close to the one configured for class A.

8) You can also run a Talker for class B (prio 2 here and using a
different address):
$ ./tsn-talker -d 01:BB:BB:BB:BB:BB -i ens4 -p 2 -s 1500


Authors
=======
 - Andre Guedes <andre.guedes@intel.com>
 - Ivan Briano <ivan.briano@intel.com>
 - Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
 - Vinicius Gomes <vinicius.gomes@intel.com>


Andre Guedes (1):
  igb: Add support for CBS offload

Jesus Sanchez-Palencia (2):
  net/sched: Check for null dev_queue on create flow
  mqprio: Implement select_queue class_ops

Vinicius Costa Gomes (2):
  net/sched: Introduce Credit Based Shaper (CBS) qdisc
  net/sched: Add support for HW offloading for CBS

 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 347 +++++++++++++++++++++++
 include/linux/netdevice.h                      |   1 +
 include/net/pkt_sched.h                        |   9 +
 include/uapi/linux/pkt_sched.h                 |  18 ++
 net/sched/Kconfig                              |  11 +
 net/sched/Makefile                             |   1 +
 net/sched/sch_cbs.c                            | 375 +++++++++++++++++++++++++
 net/sched/sch_generic.c                        |   8 +-
 net/sched/sch_mqprio.c                         |   7 +
 12 files changed, 813 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/sch_cbs.c

Annex: Sample files
===================

calc_cbs_params.py
--8<---------------cut here---------------start------------->8---
#!/usr/bin/env python
#
# Copyright (c) 2017, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of Intel Corporation nor the names of its contributors
#       may be used to endorse or promote products derived from this software
#       without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import argparse
import math

def print_cbs_params_for_class_a(args):
    idleslope = args.idleslope_a
    sendslope = idleslope - args.link_speed

    # According to 802.1Q-2014 spec, Annex L, hiCredit and
    # loCredit for SR class A are calculated following the
    # equations L-10 and L-12, respectively.
    hicredit = math.ceil(idleslope * args.frame_non_sr / args.link_speed)
    locredit = math.ceil(sendslope * args.frame_a / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \
          (idleslope, sendslope, hicredit, locredit))

def print_cbs_params_for_class_b(args):
    idleslope = args.idleslope_b
    sendslope = idleslope - args.link_speed

    # Annex L doesn't present a straightforward equation to
    # calculate hiCredit for Class B so we have to derive it
    # based on generic equations presented in that Annex.
    #
    # L-3 is the primary equation to calculate hiCredit. Section
    # L.2 states that the 'maxInterferenceSize' for SR class B
    # is the maximum burst size for SR class A plus the
    # maxInterferenceSize from SR class A (which is equal to the
    # maximum frame from non-SR traffic).
    #
    # The maximum burst size for SR class A equation is shown in
    # L-16. Merging L-16 into L-3 we get the resulting equation
    # which calculates hiCredit B (refer to section L.3 in case
    # you're not familiar with the legend):
    #
    # hiCredit B = Rb * (     Mo         Ma   )
    #                     ---------- + ------
    #                      Ro - Ra       Ro
    #
    hicredit = math.ceil(idleslope * \
               ((args.frame_non_sr / (args.link_speed - args.idleslope_a)) + \
               (args.frame_a / args.link_speed)))

    # loCredit B is calculated following equation L-2.
    locredit = math.ceil(sendslope * args.frame_b / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \
          (idleslope, sendslope, hicredit, locredit))

def main():
    parser = argparse.ArgumentParser()

    parser.add_argument('-S', dest='link_speed', default=1000000.0, type=float,
                        help='Link speed in kbps')
    parser.add_argument('-s', dest='frame_non_sr', default=1500.0, type=float,
                        help='Maximum frame size from non-SR traffic (MTU size'
                        'usually')
    parser.add_argument('-A', dest='idleslope_a', default=0, type=float,
                        help='Idleslope for SR class A in kbps')
    parser.add_argument('-a', dest='frame_a', default=0, type=float,
                        help='Maximum frame size for SR class A traffic')
    parser.add_argument('-B', dest='idleslope_b', default=0, type=float,
                        help='Idleslope for SR class B in kbps')
    parser.add_argument('-b', dest='frame_b', default=0, type=float,
                        help='Maximum frame size for SR class B traffic')

    args = parser.parse_args()

    if args.idleslope_a > 0:
        print_cbs_params_for_class_a(args)

    if args.idleslope_b > 0:
        print_cbs_params_for_class_b(args)

if __name__ == "__main__":
    main()
--8<---------------cut here---------------end--------------->8---

tsn-talker.c
--8<---------------cut here---------------start------------->8---
/*
 * Copyright (c) 2017, Intel Corporation
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Intel Corporation nor the names of its contributors
 *       may be used to endorse or promote products derived from this software
 *       without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAGIC 0xCC

static uint8_t ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static int priority = -1;
static size_t size = 1500;
static uint64_t seq;
static int delay = -1;

static struct argp_option options[] = {
	{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
	{"delay", 'D', "NUM", 0, "Delay (in us) between packet transmission" },
	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
	{"prio", 'p', "NUM", 0, "SO_PRIORITY to be set in socket" },
	{"packet-size", 's', "NUM", 0, "Size of packets to be transmitted" },
	{ 0 }
};

static error_t parser(int key, char *arg, struct argp_state *state)
{
	int res;

	switch (key) {
	case 'd':
		res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
					&macaddr[0], &macaddr[1], &macaddr[2],
					&macaddr[3], &macaddr[4], &macaddr[5]);
		if (res != 6) {
			printf("Invalid address\n");
			exit(EXIT_FAILURE);
		}

		break;
	case 'D':
		delay = atoi(arg);
		break;
	case 'i':
		strncpy(ifname, arg, sizeof(ifname) - 1);
		break;
	case 'p':
		priority = atoi(arg);
		break;
	case 's':
		size = atoi(arg);
		break;
	}

	return 0;
}

static struct argp argp = { options, parser };

int main(int argc, char *argv[])
{
	int fd, res;
	struct ifreq req;
	uint8_t *data;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
		.sll_halen = ETH_ALEN,
	};

	argp_parse(&argp, argc, argv, 0, NULL, NULL);

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return 1;
	}

	strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
	res = ioctl(fd, SIOCGIFINDEX, &req);
	if (res < 0) {
		perror("Couldn't get interface index");
		goto err;
	}

	sk_addr.sll_ifindex = req.ifr_ifindex;
	memcpy(&sk_addr.sll_addr, macaddr, ETH_ALEN);

	if (priority != -1) {
		res = setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &priority,
							sizeof(priority));
		if (res < 0) {
			perror("Couldn't set priority");
			goto err;
		}

	}

	data = alloca(size);
	memset(data, MAGIC, size);

	printf("Sending packets...\n");

	while (1) {
		uint64_t *seq_ptr = (uint64_t *) &data[0];
		ssize_t n;

		*seq_ptr = seq++;

		n = sendto(fd, data, size, 0, (struct sockaddr *) &sk_addr,
							sizeof(sk_addr));
		if (n < 0)
			perror("Failed to send data");

		if (delay > 0)
			usleep(delay);
	}

	close(fd);
	return 0;

err:
	close(fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---

tsn-listener.c
--8<---------------cut here---------------start------------->8---
/*
 * Copyright (c) 2017, Intel Corporation
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *     * Redistributions of source code must retain the above copyright notice,
 *       this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of Intel Corporation nor the names of its contributors
 *       may be used to endorse or promote products derived from this software
 *       without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/timerfd.h>
#include <unistd.h>

static uint8_t ifname[IFNAMSIZ];
static uint8_t macaddr[ETH_ALEN];
static uint64_t data_count;
static int size = 1500;
static time_t interval = 1;
static bool check_seq = false;
static uint64_t expected_seq;

static struct argp_option options[] = {
	{"check-seq", 'c', NULL, 0, "Check sequence number within packet" },
	{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
	{"ifname", 'i', "IFNAME", 0, "Network Interface" },
	{"interval", 'I', "SEC", 0, "Interval between bandwidth reports" },
	{"packet-size", 's', "NUM", 0, "Expected packet size" },
	{ 0 }
};

static error_t parser(int key, char *arg, struct argp_state *state)
{
	int res;

	switch (key) {
	case 'c':
		check_seq = true;
		break;
	case 'd':
		res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
					&macaddr[0], &macaddr[1], &macaddr[2],
					&macaddr[3], &macaddr[4], &macaddr[5]);
		if (res != 6) {
			printf("Invalid address\n");
			exit(EXIT_FAILURE);
		}

		break;
	case 'i':
		strncpy(ifname, arg, sizeof(ifname) - 1);
		break;
	case 'I':
		interval = atoi(arg);
		break;
	case 's':
		size = atoi(arg);
		break;
	}

	return 0;
}

static struct argp argp = { options, parser };

static int setup_timer(void)
{
	int fd, res;
	struct itimerspec tspec = { 0 };

	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0) {
		perror("Couldn't create timer");
		return -1;
	}

	tspec.it_value.tv_sec = interval;
	tspec.it_interval.tv_sec = interval;

	res = timerfd_settime(fd, 0, &tspec, NULL);
	if (res < 0) {
		perror("Couldn't set timer");
		close(fd);
		return -1;
	}

	return fd;
}

static int setup_socket(void)
{
	int fd, res;
	struct sockaddr_ll sk_addr = {
		.sll_family = AF_PACKET,
		.sll_protocol = htons(ETH_P_TSN),
	};

	fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
	if (fd < 0) {
		perror("Couldn't open socket");
		return -1;
	}

	/* If user provided a network interface, bind() to it. */
	if (ifname[0] != '\0') {
		struct ifreq req;

		strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
		res = ioctl(fd, SIOCGIFINDEX, &req);
		if (res < 0) {
			perror("Couldn't get interface index");
			goto err;
		}

		sk_addr.sll_ifindex = req.ifr_ifindex;

		res = bind(fd, (struct sockaddr *) &sk_addr, sizeof(sk_addr));
		if (res < 0) {
			perror("Couldn't bind() to interface");
			goto err;
		}
	}

	/* If user provided the stream destination address, set it as multicast
	 * address.
	 */
	if (macaddr[0] != '\0') {
		struct packet_mreq mreq;

		mreq.mr_ifindex = sk_addr.sll_ifindex;
		mreq.mr_type = PACKET_MR_MULTICAST;
		mreq.mr_alen = ETH_ALEN;
		memcpy(&mreq.mr_address, macaddr, ETH_ALEN);

		res = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
					&mreq, sizeof(struct packet_mreq));
		if (res < 0) {
			perror("Couldn't set PACKET_ADD_MEMBERSHIP");
			goto err;
		}
	}

	return fd;

err:
	close(fd);
	return -1;
}

static void recv_packet(int fd)
{
	uint8_t *data = alloca(size);
	ssize_t n = recv(fd, data, size, 0);

	if (n < 0) {
		perror("Failed to receive data");
		return;
	}

	if (n != size)
		printf("Size mismatch: expected %d, got %d\n", size, n);

	if (check_seq) {
		uint64_t *seq = (uint64_t *) &data[0];

		/* If 'expected_seq' is equal to zero, it means this is the
		 * first packet we received so we don't know what sequence
		 * number to expect.
		 */
		if (expected_seq == 0)
			expected_seq = *seq;

		if (*seq != expected_seq) {
			printf("Sequence mismatch: expected %llu, got %llu\n",
					expected_seq, *seq);

			expected_seq = *seq;
		}

		expected_seq++;
	}

	data_count += n;
}

static void report_bw(int fd)
{
	uint64_t expirations;
	ssize_t n = read(fd, &expirations, sizeof(uint64_t));

	if (n < 0) {
		perror("Couldn't read timerfd");
		return;
	}

	if (expirations != 1)
		printf("Some went wrong with timerfd\n");

	printf("Receiving data rate: %llu kbps\n", (data_count * 8) / (1000 * interval));

	data_count = 0;
}

int main(int argc, char *argv[])
{
	int sk_fd, timer_fd, res;
	struct pollfd fds[2];

	argp_parse(&argp, argc, argv, 0, NULL, NULL);

	sk_fd = setup_socket();
	if (sk_fd < 0)
		return 1;

	timer_fd = setup_timer();
	if (timer_fd < 0) {
		close(sk_fd);
		return 1;
	}

	fds[0].fd = sk_fd;
	fds[0].events = POLLIN;
	fds[1].fd = timer_fd;
	fds[1].events = POLLIN;

	printf("Waiting for packets...\n");

	while (1) {
		res = poll(fds, 2, -1);
		if (res < 0) {
			perror("Error on poll()");
			goto err;
		}

		if (fds[0].revents & POLLIN)
			recv_packet(fds[0].fd);

		if (fds[1].revents & POLLIN) {
			report_bw(fds[1].fd);
		}
	}

	close(timer_fd);
	close(sk_fd);
	return 0;

err:
	close(timer_fd);
	close(sk_fd);
	return 1;
}
--8<---------------cut here---------------end--------------->8---

^ permalink raw reply

* [next-queue PATCH v5 5/5] igb: Add support for CBS offload
From: Vinicius Costa Gomes @ 2017-10-11  0:44 UTC (permalink / raw)
  To: netdev, intel-wired-lan
  Cc: Andre Guedes, jhs, xiyou.wangcong, jiri, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong, richardcochran, henrik,
	levipearson, rodney.cummings
In-Reply-To: <20171011004400.14946-1-vinicius.gomes@intel.com>

From: Andre Guedes <andre.guedes@intel.com>

This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from Traffic Control system. This support enable us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper
which is implemented in hardware by i210 controller.

The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.

FQTSS transmission mode from i210 controller is automatically enabled
by the IGB driver when the CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing FQTSS mode requires NIC
reset.

FQTSS feature is supported by i210 controller only.

Signed-off-by: Andre Guedes <andre.guedes@intel.com>
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 347 +++++++++++++++++++++++++
 4 files changed, 384 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
 #define E1000_RXPBS_CFG_TS_EN           0x80000000
 
 #define I210_RXPBSIZE_DEFAULT		0x000000A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK		0x0000003F
+#define I210_RXPBSIZE_PB_32KB		0x00000020
 #define I210_TXPBSIZE_DEFAULT		0x04000014 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK		0xC0FFFFFF
+#define I210_TXPBSIZE_PB0_8KB		(8 << 0)
+#define I210_TXPBSIZE_PB1_8KB		(8 << 6)
+#define I210_TXPBSIZE_PB2_4KB		(4 << 12)
+#define I210_TXPBSIZE_PB3_4KB		(4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT		0x00000098
+
+#define I210_SR_QUEUES_NUM		2
 
 /* SerDes Control */
 #define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
 #define E1000_VLAPQF_P_VALID(_n)	(0x1 << (3 + (_n) * 4))
 #define E1000_VLAPQF_QUEUE_MASK	0x03
 
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE	BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARB	BIT(4)
+#define E1000_TQAVCTRL_DATATRANARB	BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK	0xFFFF
+#define E1000_TQAVCC_QUEUEMODE		BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY		BIT(27)
+
 #endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
 
 #define E1000_I210_FLA		0x1201C
 
+#define E1000_I210_DTXMXPKTSZ	0x355C
+
+#define E1000_I210_TXDCTL(_n)	(0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL	0x3570
+#define E1000_I210_TQAVCC(_n)	(0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n)	(0x300C + ((_n) * 0x40))
+
 #define E1000_INVM_DATA_REG(_n)	(0x12120 + 4*(_n))
 #define E1000_INVM_SIZE		64 /* Number of INVM Data Registers */
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
 	u16 count;			/* number of desc. in the ring */
 	u8 queue_index;			/* logical index of the ring*/
 	u8 reg_idx;			/* physical index of the ring */
+	bool cbs_enable;		/* indicates if CBS is enabled */
+	s32 idleslope;			/* idleSlope in kbps */
+	s32 sendslope;			/* sendSlope in kbps */
+	s32 hicredit;			/* hiCredit in bytes */
+	s32 locredit;			/* loCredit in bytes */
 
 	/* everything past this point are written often */
 	u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
 #define IGB_FLAG_EEE			BIT(14)
 #define IGB_FLAG_VLAN_PROMISC		BIT(15)
 #define IGB_FLAG_RX_LEGACY		BIT(16)
+#define IGB_FLAG_FQTSS			BIT(17)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0		0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 837d9b46a390..be2cf263efa9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -34,6 +34,7 @@
 #include <linux/slab.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/pkt_sched.h>
 #include <linux/net_tstamp.h>
 #include <linux/mii.h>
 #include <linux/ethtool.h>
@@ -62,6 +63,17 @@
 #define BUILD 0
 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) "." \
 __stringify(BUILD) "-k"
+
+enum queue_mode {
+	QUEUE_MODE_STRICT_PRIORITY,
+	QUEUE_MODE_STREAM_RESERVATION,
+};
+
+enum tx_queue_prio {
+	TX_QUEUE_PRIO_HIGH,
+	TX_QUEUE_PRIO_LOW,
+};
+
 char igb_driver_name[] = "igb";
 char igb_driver_version[] = DRV_VERSION;
 static const char igb_driver_string[] =
@@ -1271,6 +1283,12 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
 		ring->count = adapter->tx_ring_count;
 		ring->queue_index = txr_idx;
 
+		ring->cbs_enable = false;
+		ring->idleslope = 0;
+		ring->sendslope = 0;
+		ring->hicredit = 0;
+		ring->locredit = 0;
+
 		u64_stats_init(&ring->tx_syncp);
 		u64_stats_init(&ring->tx_syncp2);
 
@@ -1598,6 +1616,284 @@ static void igb_get_hw_control(struct igb_adapter *adapter)
 			ctrl_ext | E1000_CTRL_EXT_DRV_LOAD);
 }
 
+static void enable_fqtss(struct igb_adapter *adapter, bool enable)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+
+	if (enable)
+		adapter->flags |= IGB_FLAG_FQTSS;
+	else
+		adapter->flags &= ~IGB_FLAG_FQTSS;
+
+	if (netif_running(netdev))
+		schedule_work(&adapter->reset_task);
+}
+
+static bool is_fqtss_enabled(struct igb_adapter *adapter)
+{
+	return (adapter->flags & IGB_FLAG_FQTSS) ? true : false;
+}
+
+static void set_tx_desc_fetch_prio(struct e1000_hw *hw, int queue,
+				   enum tx_queue_prio prio)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 4);
+
+	val = rd32(E1000_I210_TXDCTL(queue));
+
+	if (prio == TX_QUEUE_PRIO_HIGH)
+		val |= E1000_TXDCTL_PRIORITY;
+	else
+		val &= ~E1000_TXDCTL_PRIORITY;
+
+	wr32(E1000_I210_TXDCTL(queue), val);
+}
+
+static void set_queue_mode(struct e1000_hw *hw, int queue, enum queue_mode mode)
+{
+	u32 val;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+
+	val = rd32(E1000_I210_TQAVCC(queue));
+
+	if (mode == QUEUE_MODE_STREAM_RESERVATION)
+		val |= E1000_TQAVCC_QUEUEMODE;
+	else
+		val &= ~E1000_TQAVCC_QUEUEMODE;
+
+	wr32(E1000_I210_TQAVCC(queue), val);
+}
+
+/**
+ *  igb_configure_cbs - Configure Credit-Based Shaper (CBS)
+ *  @adapter: pointer to adapter struct
+ *  @queue: queue number
+ *  @enable: true = enable CBS, false = disable CBS
+ *  @idleslope: idleSlope in kbps
+ *  @sendslope: sendSlope in kbps
+ *  @hicredit: hiCredit in bytes
+ *  @locredit: loCredit in bytes
+ *
+ *  Configure CBS for a given hardware queue. When disabling, idleslope,
+ *  sendslope, hicredit, locredit arguments are ignored. Returns 0 if
+ *  success. Negative otherwise.
+ **/
+static void igb_configure_cbs(struct igb_adapter *adapter, int queue,
+			      bool enable, int idleslope, int sendslope,
+			      int hicredit, int locredit)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 tqavcc;
+	u16 value;
+
+	WARN_ON(hw->mac.type != e1000_i210);
+	WARN_ON(queue < 0 || queue > 1);
+
+	if (enable) {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_HIGH);
+		set_queue_mode(hw, queue, QUEUE_MODE_STREAM_RESERVATION);
+
+		/* According to i210 datasheet section 7.2.7.7, we should set
+		 * the 'idleSlope' field from TQAVCC register following the
+		 * equation:
+		 *
+		 * For 100 Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 0.2                          (E1)
+		 *
+		 * For 1000Mbps link speed:
+		 *
+		 *     value = BW * 0x7735 * 2                            (E2)
+		 *
+		 * E1 and E2 can be merged into one equation as shown below.
+		 * Note that 'link-speed' is in Mbps.
+		 *
+		 *     value = BW * 0x7735 * 2 * link-speed
+		 *                           --------------               (E3)
+		 *                                1000
+		 *
+		 * 'BW' is the percentage bandwidth out of full link speed
+		 * which can be found with the following equation. Note that
+		 * idleSlope here is the parameter from this function which
+		 * is in kbps.
+		 *
+		 *     BW =     idleSlope
+		 *          -----------------                             (E4)
+		 *          link-speed * 1000
+		 *
+		 * That said, we can come up with a generic equation to
+		 * calculate the value we should set it TQAVCC register by
+		 * replacing 'BW' in E3 by E4. The resulting equation is:
+		 *
+		 * value =     idleSlope     * 0x7735 * 2 * link-speed
+		 *         -----------------            --------------    (E5)
+		 *         link-speed * 1000                 1000
+		 *
+		 * 'link-speed' is present in both sides of the fraction so
+		 * it is canceled out. The final equation is the following:
+		 *
+		 *     value = idleSlope * 61034
+		 *             -----------------                          (E6)
+		 *                  1000000
+		 */
+		value = DIV_ROUND_UP_ULL(idleslope * 61034ULL, 1000000);
+
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		tqavcc |= value;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		wr32(E1000_I210_TQAVHC(queue), 0x80000000 + hicredit * 0x7735);
+	} else {
+		set_tx_desc_fetch_prio(hw, queue, TX_QUEUE_PRIO_LOW);
+		set_queue_mode(hw, queue, QUEUE_MODE_STRICT_PRIORITY);
+
+		/* Set idleSlope to zero. */
+		tqavcc = rd32(E1000_I210_TQAVCC(queue));
+		tqavcc &= ~E1000_TQAVCC_IDLESLOPE_MASK;
+		wr32(E1000_I210_TQAVCC(queue), tqavcc);
+
+		/* Set hiCredit to zero. */
+		wr32(E1000_I210_TQAVHC(queue), 0);
+	}
+
+	/* XXX: In i210 controller the sendSlope and loCredit parameters from
+	 * CBS are not configurable by software so we don't do any 'controller
+	 * configuration' in respect to these parameters.
+	 */
+
+	netdev_dbg(netdev, "CBS %s: queue %d idleslope %d sendslope %d hiCredit %d locredit %d\n",
+		   (enable) ? "enabled" : "disabled", queue,
+		   idleslope, sendslope, hicredit, locredit);
+}
+
+static int igb_save_cbs_params(struct igb_adapter *adapter, int queue,
+			       bool enable, int idleslope, int sendslope,
+			       int hicredit, int locredit)
+{
+	struct igb_ring *ring;
+
+	if (queue < 0 || queue > adapter->num_tx_queues)
+		return -EINVAL;
+
+	ring = adapter->tx_ring[queue];
+
+	ring->cbs_enable = enable;
+	ring->idleslope = idleslope;
+	ring->sendslope = sendslope;
+	ring->hicredit = hicredit;
+	ring->locredit = locredit;
+
+	return 0;
+}
+
+static bool is_any_cbs_enabled(struct igb_adapter *adapter)
+{
+	struct igb_ring *ring;
+	int i;
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		ring = adapter->tx_ring[i];
+
+		if (ring->cbs_enable)
+			return true;
+	}
+
+	return false;
+}
+
+static void igb_setup_tx_mode(struct igb_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct e1000_hw *hw = &adapter->hw;
+	u32 val;
+
+	/* Only i210 controller supports changing the transmission mode. */
+	if (hw->mac.type != e1000_i210)
+		return;
+
+	if (is_fqtss_enabled(adapter)) {
+		int i, max_queue;
+
+		/* Configure TQAVCTRL register: set transmit mode to 'Qav',
+		 * set data fetch arbitration to 'round robin' and set data
+		 * transfer arbitration to 'credit shaper algorithm.
+		 */
+		val = rd32(E1000_I210_TQAVCTRL);
+		val |= E1000_TQAVCTRL_XMIT_MODE | E1000_TQAVCTRL_DATATRANARB;
+		val &= ~E1000_TQAVCTRL_DATAFETCHARB;
+		wr32(E1000_I210_TQAVCTRL, val);
+
+		/* Configure Tx and Rx packet buffers sizes as described in
+		 * i210 datasheet section 7.2.7.7.
+		 */
+		val = rd32(E1000_TXPBS);
+		val &= ~I210_TXPBSIZE_MASK;
+		val |= I210_TXPBSIZE_PB0_8KB | I210_TXPBSIZE_PB1_8KB |
+			I210_TXPBSIZE_PB2_4KB | I210_TXPBSIZE_PB3_4KB;
+		wr32(E1000_TXPBS, val);
+
+		val = rd32(E1000_RXPBS);
+		val &= ~I210_RXPBSIZE_MASK;
+		val |= I210_RXPBSIZE_PB_32KB;
+		wr32(E1000_RXPBS, val);
+
+		/* Section 8.12.9 states that MAX_TPKT_SIZE from DTXMXPKTSZ
+		 * register should not exceed the buffer size programmed in
+		 * TXPBS. The smallest buffer size programmed in TXPBS is 4kB
+		 * so according to the datasheet we should set MAX_TPKT_SIZE to
+		 * 4kB / 64.
+		 *
+		 * However, when we do so, no frame from queue 2 and 3 are
+		 * transmitted.  It seems the MAX_TPKT_SIZE should not be great
+		 * or _equal_ to the buffer size programmed in TXPBS. For this
+		 * reason, we set set MAX_ TPKT_SIZE to (4kB - 1) / 64.
+		 */
+		val = (4096 - 1) / 64;
+		wr32(E1000_I210_DTXMXPKTSZ, val);
+
+		/* Since FQTSS mode is enabled, apply any CBS configuration
+		 * previously set. If no previous CBS configuration has been
+		 * done, then the initial configuration is applied, which means
+		 * CBS is disabled.
+		 */
+		max_queue = (adapter->num_tx_queues < I210_SR_QUEUES_NUM) ?
+			    adapter->num_tx_queues : I210_SR_QUEUES_NUM;
+
+		for (i = 0; i < max_queue; i++) {
+			struct igb_ring *ring = adapter->tx_ring[i];
+
+			igb_configure_cbs(adapter, i, ring->cbs_enable,
+					  ring->idleslope, ring->sendslope,
+					  ring->hicredit, ring->locredit);
+		}
+	} else {
+		wr32(E1000_RXPBS, I210_RXPBSIZE_DEFAULT);
+		wr32(E1000_TXPBS, I210_TXPBSIZE_DEFAULT);
+		wr32(E1000_I210_DTXMXPKTSZ, I210_DTXMXPKTSZ_DEFAULT);
+
+		val = rd32(E1000_I210_TQAVCTRL);
+		/* According to Section 8.12.21, the other flags we've set when
+		 * enabling FQTSS are not relevant when disabling FQTSS so we
+		 * don't set they here.
+		 */
+		val &= ~E1000_TQAVCTRL_XMIT_MODE;
+		wr32(E1000_I210_TQAVCTRL, val);
+	}
+
+	netdev_dbg(netdev, "FQTSS %s\n", (is_fqtss_enabled(adapter)) ?
+		   "enabled" : "disabled");
+}
+
 /**
  *  igb_configure - configure the hardware for RX and TX
  *  @adapter: private board structure
@@ -1609,6 +1905,7 @@ static void igb_configure(struct igb_adapter *adapter)
 
 	igb_get_hw_control(adapter);
 	igb_set_rx_mode(netdev);
+	igb_setup_tx_mode(adapter);
 
 	igb_restore_vlan(adapter);
 
@@ -2150,6 +2447,55 @@ igb_features_check(struct sk_buff *skb, struct net_device *dev,
 	return features;
 }
 
+static int igb_offload_cbs(struct igb_adapter *adapter,
+			   struct tc_cbs_qopt_offload *qopt)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	int err;
+
+	/* CBS offloading is only supported by i210 controller. */
+	if (hw->mac.type != e1000_i210)
+		return -EOPNOTSUPP;
+
+	/* CBS offloading is only supported by queue 0 and queue 1. */
+	if (qopt->queue < 0 || qopt->queue > 1)
+		return -EINVAL;
+
+	err = igb_save_cbs_params(adapter, qopt->queue, qopt->enable,
+				  qopt->idleslope, qopt->sendslope,
+				  qopt->hicredit, qopt->locredit);
+	if (err)
+		return err;
+
+	if (is_fqtss_enabled(adapter)) {
+		igb_configure_cbs(adapter, qopt->queue, qopt->enable,
+				  qopt->idleslope, qopt->sendslope,
+				  qopt->hicredit, qopt->locredit);
+
+		if (!is_any_cbs_enabled(adapter))
+			enable_fqtss(adapter, false);
+
+	} else {
+		enable_fqtss(adapter, true);
+	}
+
+	return 0;
+}
+
+static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
+			void *type_data)
+{
+	struct igb_adapter *adapter = netdev_priv(dev);
+
+	switch (type) {
+	case TC_SETUP_CBS:
+		return igb_offload_cbs(adapter, type_data);
+
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static const struct net_device_ops igb_netdev_ops = {
 	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
@@ -2175,6 +2521,7 @@ static const struct net_device_ops igb_netdev_ops = {
 	.ndo_set_features	= igb_set_features,
 	.ndo_fdb_add		= igb_ndo_fdb_add,
 	.ndo_features_check	= igb_features_check,
+	.ndo_setup_tc		= igb_setup_tc,
 };
 
 /**
-- 
2.14.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox