Netdev List
* Re: [PATCH net-next] tun: use per cpu variables for stats accounting
From: Paolo Abeni @ 2016-04-15 12:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, mst, hannes, ebiederm, gkurz, jasowang
In-Reply-To: <20160414.225617.2128747969495613941.davem@davemloft.net>

On Thu, 2016-04-14 at 22:56 -0400, David Miller wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Wed, 13 Apr 2016 10:52:20 +0200
> 
> > Currently the tun device accounting uses dev->stats without applying any
> > kind of protection, regardless that accounting happens in preemptible
> > process context.
> > This patch move the tun stats to a per cpu data structure, and protect
> > the updates with  u64_stats_update_begin()/u64_stats_update_end() or
> > this_cpu_inc according to the stat type. The per cpu stats are
> > aggregated by the newly added ndo_get_stats64 ops.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> Ok this seems reasonable, applied, thanks.
> 
> I guess most applications use tuntap by having two threads, one
> for transmit and one for receive processing?

Probably. I guess that a user space application can leverage multiple
transmit threads to improve the throughput.

Paolo
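
For readers following the thread, the accounting scheme described in the
commit message can be modelled in plain userspace C. This is only a sketch
under assumptions: NR_SLOTS, struct pcpu_stats, account_rx() and sum_stats()
are hypothetical names, each slot stands in for one CPU, and the
u64_stats_update_begin()/u64_stats_update_end() protection used by the real
patch is deliberately left out.

```c
#include <stdint.h>

#define NR_SLOTS 4	/* hypothetical: stands in for the number of CPUs */

/* One private set of counters per "CPU", so writers never share a slot. */
struct pcpu_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

static struct pcpu_stats slots[NR_SLOTS];

/* Writer side: a CPU only ever updates its own slot. */
static void account_rx(unsigned int cpu, uint64_t len)
{
	slots[cpu].rx_packets++;
	slots[cpu].rx_bytes += len;
}

/* Reader side: sum all slots, as a stats64-style callback would. */
static struct pcpu_stats sum_stats(void)
{
	struct pcpu_stats total = { 0, 0 };
	unsigned int i;

	for (i = 0; i < NR_SLOTS; i++) {
		total.rx_packets += slots[i].rx_packets;
		total.rx_bytes += slots[i].rx_bytes;
	}
	return total;
}
```

The point of the layout is that the hot path touches only local data and the
cost of combining the counters is paid by the (rare) reader.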


* Re: [PATCH 0/5] wireless: ti: Convert specialized logging macros to kernel style
From: Kalle Valo @ 2016-04-15 12:22 UTC (permalink / raw)
  To: Eliad Peller
  Cc: Joe Perches, LKML, linux-wireless@vger.kernel.org,
	open list:NETWORKING DRIVERS, Guy Mishol, Uri Mashiach,
	Johannes Berg
In-Reply-To: <CAB3XZEfePJG7t6+TGYer_pcdqqaTfYugK0cARwhgLm9KRG1fmg@mail.gmail.com>

Eliad Peller <eliad@wizery.com> writes:

> On Thu, Mar 31, 2016 at 11:07 AM, Joe Perches <joe@perches.com> wrote:
>> On Thu, 2016-03-31 at 10:39 +0300, Kalle Valo wrote:
>>> Joe Perches <joe@perches.com> writes:
>>> > On Wed, 2016-03-30 at 14:51 +0300, Kalle Valo wrote:
>>> > > Joe Perches <joe@perches.com> writes:
>>> > > >
>>> > > > Using the normal kernel logging mechanisms makes this code
>>> > > > a bit more like other wireless drivers.
>>> > > Personally I don't see the point but I don't have any strong opinions. A
>>> > > bigger problem is that TI drivers are not really in active development
>>> > > and that's I'm not thrilled to take big patches like this for dormant
>>> > > drivers.
>>> > Not very dormant.
>>> >
>>> > 35 patches in the last year, most of them adding functionality.
>>> Oh, I didn't realise it had that many patches. But the driver is
>>> orphaned and doesn't have a maintainer so could I then have an ack from
>>> one of the active contributors that this ok?
>>
>> Fine by me.
>>
>> $ ./scripts/get_maintainer.pl -f --git drivers/net/wireless/ti/
>>
>> Kalle Valo <kvalo@codeaurora.org> (maintainer:NETWORKING DRIVERS (WIRELESS),commit_signer:27/35=77%)
>> Eliad Peller <eliad@wizery.com> (commit_signer:9/35=26%,authored:7/35=20%)
>> Guy Mishol <guym@ti.com> (commit_signer:6/35=17%,authored:5/35=14%)
>> Johannes Berg <johannes.berg@intel.com> (commit_signer:6/35=17%,authored:3/35=9%)
>> Uri Mashiach <uri.mashiach@compulab.co.il> (commit_signer:4/35=11%,authored:4/35=11%)
>>
>> For those people now added to the cc list,
>> here's the original patch thread:
>>
>> https://lkml.org/lkml/2016/3/7/1099
>
> I don't have a strong opinion here either.
> (I do like the trailing newline being added automatically, but that's
> hardly an issue...)

Ok, I didn't get any objections so I'm planning to apply this set. If someone
thinks this is a bad idea, speak up now.

-- 
Kalle Valo


* Re: [PATCH 2/2] net-ath9k_htc: Replace a variable initialisation by an assignment in ath9k_htc_set_channel()
From: Kalle Valo @ 2016-04-15 12:09 UTC (permalink / raw)
  To: Julian Calaby
  Cc: ath9k-devel, linux-wireless, netdev, QCA ath9k Development, LKML,
	SF Markus Elfring, kernel-janitors, Julia Lawall
In-Reply-To: <CAGRGNgXR_OoER0rN5Z8n_5VtZimpU7WDNZuc4vkdb2eKDT2frQ@mail.gmail.com>

Julian Calaby <julian.calaby@gmail.com> writes:

> Hi Kalle,
>
> On Sat, Jan 2, 2016 at 5:25 AM, SF Markus Elfring
> <elfring@users.sourceforge.net> wrote:
>> From: Markus Elfring <elfring@users.sourceforge.net>
>> Date: Fri, 1 Jan 2016 19:09:32 +0100
>>
>> Replace an explicit initialisation for one local variable at the beginning
>> by a conditional assignment.
>>
>> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
>
> This looks sane to me.
>
> Reviewed-by: Julian Calaby <julian.calaby@gmail.com>

Before I commit I'll just change the commit title to:

ath9k_htc: Replace a variable initialisation by an assignment in ath9k_htc_set_channel()

-- 
Kalle Valo


* Re: [PATCH net-next 0/2] act_bpf, cls_bpf: send eBPF bytecode through
From: Daniel Borkmann @ 2016-04-15 10:41 UTC (permalink / raw)
  To: Quentin Monnet; +Cc: netdev, alexei.starovoitov
In-Reply-To: <1460714856-7221-1-git-send-email-quentin.monnet@6wind.com>

Hi Quentin,

On 04/15/2016 12:07 PM, Quentin Monnet wrote:
> When a new BPF traffic control filter or action is set up with tc, the
> bytecode is sent back to userspace through a netlink socket for cBPF, but
> not for eBPF (the file descriptor pointing to the object file containing
> the bytecode is sent instead).
>
> This patch makes cls_bpf and act_bpf modules send the bytecode for eBPF as
> well (in addition to the file descriptor).
>
> New BPF flags are used in order to differenciate what BPF version is in
> use, so that userspace tools can process the bytecode properly.
>
> Once the series is accepted and merged, it is intended to submit a patch
> for the iproute2 package, so as to fix tc utility so as to use the new
> flags and to display the bytecode in eBPF format when needed. This tc
> patch is already available at:
> https://github.com/6WIND/iproute2/commits/netlink_eBPF

Thanks for working on this, but it's unfortunately not that easy. Let
me ask, what would be the intended use-case to dump the insns?

I'm asking because if you dump them as-is, then a reinject at a later
time of that bytecode back into the kernel will most likely be rejected
by the verifier.

This is because on load time, verifier does rewrites/expansion on some
of the insns (f.e. map pointers, helper functions, ctx access etc, see
also appendix in [1]), so the code as seen in the kernel would need to
be sanitized first.

Also, how would you make sense/transform maps into a meaningful
representation (probably possible to find a scheme when they are pinned)?

Another possibility is that such programs need to be pinned (can be done
easily by tc in the background) and then implement a CRIU facility into
the bpf(2) syscall to retrieve them. tc could make use of this w/o too
much effort, and at the same time it would help CRIU folks, too. It
also seems cleaner to have only one central api (bpf(2)) to dump them,
but needs a bit of thought.

Thanks & cheers,
Daniel

   [1] http://www.netdevconf.org/1.1/proceedings/slides/borkmann-tc-classifier-cls-bpf.pdf

> Quentin Monnet (2):
>    act_bpf: send back eBPF bytecode through netlink socket
>    cls_bpf: send back eBPF bytecode through netlink socket
>
>   include/uapi/linux/pkt_cls.h       |  1 +
>   include/uapi/linux/tc_act/tc_bpf.h |  1 +
>   net/sched/act_bpf.c                | 23 +++++++++++++++++++++++
>   net/sched/cls_bpf.c                | 25 +++++++++++++++++++++++--
>   4 files changed, 48 insertions(+), 2 deletions(-)
>


* [PATCH net-next 2/2] cls_bpf: send back eBPF bytecode through netlink socket
From: Quentin Monnet @ 2016-04-15 10:07 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1460714856-7221-1-git-send-email-quentin.monnet@6wind.com>

As for act_bpf in a former patch, this patch makes the scheduler send
eBPF bytecode through the netlink socket for BPF filters set up with tc.
The existing TCA_BPF_FLAGS netlink attribute is used to embed a new flag
signaling eBPF bytecode, so as to identify the BPF version when reading
from the socket on userspace side.

Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
---
 include/uapi/linux/pkt_cls.h |  1 +
 net/sched/cls_bpf.c          | 25 +++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index c43c5f78b9c4..09d726fc2c5a 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -376,6 +376,7 @@ enum {
 /* BPF classifier */
 
 #define TCA_BPF_FLAG_ACT_DIRECT		(1 << 0)
+#define TCA_BPF_FLAG_EBPF		(1 << 1)
 
 enum {
 	TCA_BPF_UNSPEC,
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 425fe6a0eda3..f1d9057d8d94 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -450,6 +450,25 @@ static int cls_bpf_dump_bpf_info(const struct cls_bpf_prog *prog,
 static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 				  struct sk_buff *skb)
 {
+	struct bpf_prog *filter;
+
+	rcu_read_lock();
+	filter = rcu_dereference(prog->filter);
+	if (filter) {
+		if (nla_put_u16(skb, TCA_BPF_OPS_LEN, filter->len)) {
+			rcu_read_unlock();
+			return -EMSGSIZE;
+		}
+
+		if (nla_put(skb, TCA_BPF_OPS,
+			    filter->len * sizeof(struct sock_filter),
+			    filter->insnsi)) {
+			rcu_read_unlock();
+			return -EMSGSIZE;
+		}
+	}
+	rcu_read_unlock();
+
 	if (nla_put_u32(skb, TCA_BPF_FD, prog->bpf_fd))
 		return -EMSGSIZE;
 
@@ -481,10 +500,12 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 	    nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
 		goto nla_put_failure;
 
-	if (cls_bpf_is_ebpf(prog))
+	if (cls_bpf_is_ebpf(prog)) {
+		bpf_flags |= TCA_BPF_FLAG_EBPF;
 		ret = cls_bpf_dump_ebpf_info(prog, skb);
-	else
+	} else {
 		ret = cls_bpf_dump_bpf_info(prog, skb);
+	}
 	if (ret)
 		goto nla_put_failure;
 
-- 
2.7.4


* [PATCH net-next 1/2] act_bpf: send back eBPF bytecode through netlink socket
From: Quentin Monnet @ 2016-04-15 10:07 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1460714856-7221-1-git-send-email-quentin.monnet@6wind.com>

When a new BPF traffic control action is set up with tc, the bytecode is
sent back to userspace through a netlink socket for cBPF, but not for
eBPF (the file descriptor pointing to the object file containing the
bytecode is sent instead).

This patch makes act_bpf module send the bytecode for eBPF as well (in
addition to the file descriptor).

It also adds a new BPF netlink attribute (a flag) in order to
differentiate what BPF version is in use, so that userspace tools can
process it properly.

Signed-off-by: Quentin Monnet <quentin.monnet@6wind.com>
---
 include/uapi/linux/tc_act/tc_bpf.h |  1 +
 net/sched/act_bpf.c                | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
index 07f17cc70bb3..8c9a44324467 100644
--- a/include/uapi/linux/tc_act/tc_bpf.h
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -26,6 +26,7 @@ enum {
 	TCA_ACT_BPF_OPS,
 	TCA_ACT_BPF_FD,
 	TCA_ACT_BPF_NAME,
+	TCA_ACT_BPF_EBPF,
 	__TCA_ACT_BPF_MAX,
 };
 #define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 8c9f1f0459ab..fcd30f0b3b75 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -118,6 +118,28 @@ static int tcf_bpf_dump_bpf_info(const struct tcf_bpf *prog,
 static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
 				  struct sk_buff *skb)
 {
+	struct bpf_prog *filter;
+
+	if (nla_put_flag(skb, TCA_ACT_BPF_EBPF))
+		return -EMSGSIZE;
+
+	rcu_read_lock();
+	filter = rcu_dereference(prog->filter);
+	if (filter) {
+		if (nla_put_u16(skb, TCA_ACT_BPF_OPS_LEN, filter->len)) {
+			rcu_read_unlock();
+			return -EMSGSIZE;
+		}
+
+		if (nla_put(skb, TCA_ACT_BPF_OPS,
+			    filter->len * sizeof(struct sock_filter),
+			    filter->insnsi)) {
+			rcu_read_unlock();
+			return -EMSGSIZE;
+		}
+	}
+	rcu_read_unlock();
+
 	if (nla_put_u32(skb, TCA_ACT_BPF_FD, prog->bpf_fd))
 		return -EMSGSIZE;
 
@@ -170,6 +192,7 @@ static const struct nla_policy act_bpf_policy[TCA_ACT_BPF_MAX + 1] = {
 	[TCA_ACT_BPF_PARMS]	= { .len = sizeof(struct tc_act_bpf) },
 	[TCA_ACT_BPF_FD]	= { .type = NLA_U32 },
 	[TCA_ACT_BPF_NAME]	= { .type = NLA_NUL_STRING, .len = ACT_BPF_NAME_LEN },
+	[TCA_ACT_BPF_EBPF]	= { .type = NLA_FLAG },
 	[TCA_ACT_BPF_OPS_LEN]	= { .type = NLA_U16 },
 	[TCA_ACT_BPF_OPS]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
-- 
2.7.4


* [PATCH net-next 0/2] act_bpf, cls_bpf: send eBPF bytecode through
From: Quentin Monnet @ 2016-04-15 10:07 UTC (permalink / raw)
  To: netdev

When a new BPF traffic control filter or action is set up with tc, the
bytecode is sent back to userspace through a netlink socket for cBPF, but
not for eBPF (the file descriptor pointing to the object file containing
the bytecode is sent instead).

This patch makes cls_bpf and act_bpf modules send the bytecode for eBPF as
well (in addition to the file descriptor).

New BPF flags are used in order to differentiate what BPF version is in
use, so that userspace tools can process the bytecode properly.

Once the series is accepted and merged, the plan is to submit a patch
for the iproute2 package, so that the tc utility uses the new flags
and displays the bytecode in eBPF format when needed. This tc
patch is already available at:
https://github.com/6WIND/iproute2/commits/netlink_eBPF

Quentin Monnet (2):
  act_bpf: send back eBPF bytecode through netlink socket
  cls_bpf: send back eBPF bytecode through netlink socket

 include/uapi/linux/pkt_cls.h       |  1 +
 include/uapi/linux/tc_act/tc_bpf.h |  1 +
 net/sched/act_bpf.c                | 23 +++++++++++++++++++++++
 net/sched/cls_bpf.c                | 25 +++++++++++++++++++++++--
 4 files changed, 48 insertions(+), 2 deletions(-)

-- 
2.7.4
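
As an illustration of the userspace side, the check a dump parser might do
with the proposed flag could look like the sketch below. Assumptions: the
flag value mirrors the pkt_cls.h hunk of patch 2/2 and is not part of any
released header, the flags are assumed to arrive as a 32-bit value in
TCA_BPF_FLAGS, and enum bpf_flavour, bpf_flavour_from_flags() and
bpf_insn_count() are hypothetical helpers, not iproute2 functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Copied from the series for illustration; the second flag is only proposed. */
#define TCA_BPF_FLAG_ACT_DIRECT	(1 << 0)
#define TCA_BPF_FLAG_EBPF	(1 << 1)

/* Both cBPF (struct sock_filter) and eBPF (struct bpf_insn) use 8-byte insns. */
#define BPF_INSN_SIZE	8

enum bpf_flavour { BPF_FLAVOUR_CLASSIC, BPF_FLAVOUR_EXTENDED };

/* Decide how the dumped ops payload should be decoded. */
static enum bpf_flavour bpf_flavour_from_flags(uint32_t flags)
{
	return (flags & TCA_BPF_FLAG_EBPF) ? BPF_FLAVOUR_EXTENDED
					   : BPF_FLAVOUR_CLASSIC;
}

/* Number of instructions carried in an ops payload of the given length. */
static size_t bpf_insn_count(size_t payload_len)
{
	return payload_len / BPF_INSN_SIZE;
}
```

Because the instruction size is the same for both flavours, only the flag
tells the tool which disassembler to apply to the payload.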


* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Paolo Abeni @ 2016-04-15  9:38 UTC (permalink / raw)
  To: Paul Moore
  Cc: Casey Schaufler, Florian Westphal, linux-security-module,
	David S. Miller, James Morris, Andreas Gruenbacher,
	Stephen Smalley, netdev, selinux
In-Reply-To: <CAHC9VhSgWAoWgywOZOvT76jCcHuEywjFu5CKJo9D0k+66RbFCA@mail.gmail.com>

On Thu, 2016-04-14 at 18:53 -0400, Paul Moore wrote:
> On Tue, Apr 12, 2016 at 4:52 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > Will be ok if we post a v2 version of this series, removing the hooks
> > de-registration bits, but preserving the selinux nf-hooks and
> > socket_sock_rcv_skb() on-demand/delayed registration ? Will that fit
> > with the post-init read only memory usage that you are planning ?
> 
> The work Florian and and I were talking about would be limited just to
> the netfilter hooks, the LSM hooks, e.g. socket_sock_rcv_skb() and
> friends, would remain as they are today.  What what we discussing was
> defaulting to not registering the netfilter hooks until it became
> necessary due to a labeled networking configuration or the
> always_check_network policy capability; the registration of the
> netfilter hooks would be permanent, you could not unregister the hooks
> at that point, you would need to reboot.  Does that make sense?

Yes, AFAIC it makes sense. I'll try to follow this route for an eventual
v2.

> As far as Casey's concerns, I don't think the work we are talking
> about for the v2 patchset would have any effect on the socket/sock
> security blobs as you really can't manage those adequately from the
> netfilter hooks; you most likely will reference them and perhaps even
> update the data within, but not allocate or free the blobs.  Besides,
> even in some weird case you were alloc/free'ing security blobs in the
> netfilter hooks, we can deal with that on a per-LSM basis if/when the
> full fledged stacking patches are merged; everything we are talking
> about is a hidden implementation detail so changing it in the future
> shouldn't be a problem.

Casey, are you ok with the above?

Thank you,

Paolo




* Re: [patch net-next 05/18] mlxsw: spectrum_buffers: Push out indexes and direction out of SB structs
From: Jiri Pirko @ 2016-04-15  8:52 UTC (permalink / raw)
  To: David Laight
  Cc: netdev@vger.kernel.org, davem@davemloft.net, idosch@mellanox.com,
	eladr@mellanox.com, yotamg@mellanox.com, ogerlitz@mellanox.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	jhs@mojatatu.com, john.fastabend@gmail.com, rami.rosen@intel.com,
	gospo@cumulusnetworks.com, stephen@networkplumber.org,
	sfeldma@gmail.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D5F4A2E99@AcuExch.aculab.com>

Fri, Apr 15, 2016 at 10:33:27AM CEST, David.Laight@ACULAB.COM wrote:
>From: Jiri Pirko
>> Sent: 14 April 2016 17:19
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Structs are in arrays so use array index as pool/tc/prio index. With
>> that, there is need to maintain separate arrays for ingress and egress.
>...
>> +static const u16 mlxsw_sp_pbs[] = {
>> +	2 * MLXSW_SP_BYTES_TO_CELLS(ETH_FRAME_LEN),
>> +	0,
>> +	0,
>> +	0,
>> +	0,
>> +	0,
>> +	0,
>> +	0,
>> +	0, /* Unused */
>> +	2 * MLXSW_SP_BYTES_TO_CELLS(MLXSW_PORT_MAX_MTU),
>>  };
>
>Use designated initialisers.

Okay

>
>> 
>>  #define MLXSW_SP_PBS_LEN ARRAY_SIZE(mlxsw_sp_pbs)
>> @@ -106,10 +96,9 @@ static int mlxsw_sp_port_pb_init(struct mlxsw_sp_port *mlxsw_sp_port)
>>  	mlxsw_reg_pbmc_pack(pbmc_pl, mlxsw_sp_port->local_port,
>>  			    0xffff, 0xffff / 2);
>>  	for (i = 0; i < MLXSW_SP_PBS_LEN; i++) {
>
>I'd rather see an explicit ARRAY_COUNT(mlxsw_sp_pbs) than some 'randon' constant.

See "#define MLXSW_SP_PBS_LEN ARRAY_SIZE(mlxsw_sp_pbs)"


>
>> -		const struct mlxsw_sp_pb *pb;
>> -
>> -		pb = &mlxsw_sp_pbs[i];
>> -		mlxsw_reg_pbmc_lossy_buffer_pack(pbmc_pl, pb->index, pb->size);
>> +		if (i == 8)
>> +			continue;
>
>I'm guessing that is the same '8' as the commented 'unused' slot when mlxsw_sp_pbs[]
>is initialised.
>Would be better if a named constant.
>If this in initialisation code an illegal value (maybe 0xffff) to mark the
>unused slot.

Okay. Will send follow-up to make this a bit nicer. Thanks.


>
>> +		mlxsw_reg_pbmc_lossy_buffer_pack(pbmc_pl, i, mlxsw_sp_pbs[i]);
>
>	David
>


* Fwd: Re: Section 4 No. 9,10 Failed was occurred by IPv6 Ready Logo Conformance Test
From: Yuki Machida @ 2016-04-15  8:47 UTC (permalink / raw)
  To: fgont, hagen, hannes, davem, netdev, rongqing.li
In-Reply-To: <57020D23.9010403@jp.fujitsu.com>

Hi,

I have not yet reported this issue to the contributors of commit 9d28971,
sorry.
I am resending my information.

Regards,
Yuki Machida

-------- Forwarded Message --------
Subject: Re: Section 4 No. 9,10 Failed was occurred by IPv6 Ready Logo Conformance Test
Date: Mon, 4 Apr 2016 15:43:47 +0900
From: Yuki Machida <machida.yuki@jp.fujitsu.com>
To: Rongqing Li <rongqing.li@windriver.com>, netdev <netdev@vger.kernel.org>

Hi Roy,

On 2016年04月01日 17:00, Yuki Machida wrote:
> Hi Roy,
> 
> Thank you for your advice.
> I am very glad.
> 
> Futher comment below.
> 
> On 2016年04月01日 16:43, Rongqing Li wrote:
>>
>>
>> On 2016年04月01日 15:31, Yuki Machida wrote:
>>> Hi all,
>>>
>>> I tested 4.6-rc1 by IPv6 Ready Logo Core Conformance Test.
>>> 4.6-rc1 has some FAILs in Section 4 (RFC 1981: Path MTU Discovery for IP version 6).
>>> I conformed that it was PASSed in 3.14.28 and it was FAILed in 4.1.17.
>>> I will find a patch between 3.14 and 4.1.
>>>
>>> IPv6 Ready Logo
>>> https://www.ipv6ready.org/
>>> TAHI Project
>>> http://www.tahi.org/
>>>
>>> I ran the IPv6 Ready Logo Core Conformance Test on Intel D510MO (Atom D510).
>>> It is using userland build with yocto project.
>>>
>>> Test Environment
>>> Test Specification          : 4.0.6
>>> Tool Version                : REL_3_3_2
>>> Test Program Version        : V6LC_5_0_0
>>> Target Device               : Intel D510MO (Atom D510)
>>>
>>> List of FAILs
>>>
>>> Section 4: RFC 1981 - Path MTU Discovery for IPv6
>>> - Test v6LC.4.1.6: Receiving MTU Below IPv6 Minimum Link MTU
>>>      - No. 9 Part A: MTU equal to 56
>>>      - No.10 Part B: MTU equal to 1279
>>>
>>
>> apply this one
>>
>> commit 8013d1d7eafb0589ca766db6b74026f76b7f5cb4
>> Author: Hangbin Liu <liuhangbin@gmail.com>
>> Date:   Thu Jul 30 14:28:42 2015 +0800
>>
>>       net/ipv6: add sysctl option accept_ra_min_hop_limit
>>
>>       Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface")
>>       disabled accept hop limit from RA if it is smaller than the current hop
>>       limit for security stuff. But this behavior kind of break the RFC
>> definition.
>>
>>       RFC 4861, 6.3.4.  Processing Received Router Advertisements
>>          A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
>>          and Retrans Timer) may contain a value denoting that it is
>>          unspecified.  In such cases, the parameter should be ignored and the
>>          host should continue using whatever value it is already using.
>>
>>          If the received Cur Hop Limit value is non-zero, the host SHOULD set
>>          its CurHopLimit variable to the received value.
>>
>>       So add sysctl option accept_ra_min_hop_limit to let user choose the
>> minimum
>>       hop limit value they can accept from RA. And set default to 1 to
>> meet RFC
>>       standards.
>>
>>       Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>>       Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
>>       Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I conformed that above patch has been applied at v4.3 in linux.git.
> 
> % git tag --contains=8013d1d7eafb0589ca766db6b74026f76b7f5cb4 | head
> v4.3
> v4.3-rc1
> v4.3-rc2
> v4.3-rc3
> v4.3-rc4
> v4.3-rc5
> v4.3-rc6
> v4.3-rc7
> v4.4
> v4.4-rc1
> 
>>
>>
>>
>>
>>
>> and revert the below one, the TAHI should be updated
>>
>> commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2
>> Author: Hagen Paul Pfeifer <hagen@jauu.net>
>> Date: Thu Jan 15 22:34:25 2015 +0100
>>
>>       ipv6: stop sending PTB packets for MTU < 1280
>>
>>       Reduce the attack vector and stop generating IPv6 Fragment Header for
>>       paths with an MTU smaller than the minimum required IPv6 MTU
>>       size (1280 byte) - called atomic fragments.
>>
>>       See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
>>       for more information and how this "feature" can be misused.
>>
>>       [1]
>> https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00
>>
>>       Signed-off-by: Fernando Gont <fgont@si6networks.com>
>>       Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
>>       Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>       Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I will try.

I confirmed that v4.1.20 with the above patch reverted passes the Section 4 No. 9 and 10 testcases
in the IPv6 Ready Logo Conformance Test.
I can't immediately revert the above patch from v4.6-rc1 because the implementation has changed.

I am considering how to fix this problem for mainline.

> 
>>
>>
>>
>> -Roy
>>
>>
>>
>>
>>> Regards,
>>> Yuki Machida
>>>
>>


* RE: [patch net-next 05/18] mlxsw: spectrum_buffers: Push out indexes and direction out of SB structs
From: David Laight @ 2016-04-15  8:33 UTC (permalink / raw)
  To: 'Jiri Pirko', netdev@vger.kernel.org
  Cc: davem@davemloft.net, idosch@mellanox.com, eladr@mellanox.com,
	yotamg@mellanox.com, ogerlitz@mellanox.com,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com,
	jhs@mojatatu.com, john.fastabend@gmail.com, rami.rosen@intel.com,
	gospo@cumulusnetworks.com, stephen@networkplumber.org,
	sfeldma@gmail.com
In-Reply-To: <1460650770-19382-6-git-send-email-jiri@resnulli.us>

From: Jiri Pirko
> Sent: 14 April 2016 17:19
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Structs are in arrays so use array index as pool/tc/prio index. With
> that, there is need to maintain separate arrays for ingress and egress.
...
> +static const u16 mlxsw_sp_pbs[] = {
> +	2 * MLXSW_SP_BYTES_TO_CELLS(ETH_FRAME_LEN),
> +	0,
> +	0,
> +	0,
> +	0,
> +	0,
> +	0,
> +	0,
> +	0, /* Unused */
> +	2 * MLXSW_SP_BYTES_TO_CELLS(MLXSW_PORT_MAX_MTU),
>  };

Use designated initialisers.

> 
>  #define MLXSW_SP_PBS_LEN ARRAY_SIZE(mlxsw_sp_pbs)
> @@ -106,10 +96,9 @@ static int mlxsw_sp_port_pb_init(struct mlxsw_sp_port *mlxsw_sp_port)
>  	mlxsw_reg_pbmc_pack(pbmc_pl, mlxsw_sp_port->local_port,
>  			    0xffff, 0xffff / 2);
>  	for (i = 0; i < MLXSW_SP_PBS_LEN; i++) {

I'd rather see an explicit ARRAY_COUNT(mlxsw_sp_pbs) than some 'random' constant.

> -		const struct mlxsw_sp_pb *pb;
> -
> -		pb = &mlxsw_sp_pbs[i];
> -		mlxsw_reg_pbmc_lossy_buffer_pack(pbmc_pl, pb->index, pb->size);
> +		if (i == 8)
> +			continue;

I'm guessing that is the same '8' as the commented 'unused' slot where mlxsw_sp_pbs[]
is initialised.
It would be better as a named constant.
If this is initialisation code, use an illegal value (maybe 0xffff) to mark the
unused slot.
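
A rough sketch of what the two suggestions above could look like together,
written as standalone C rather than as a change to the driver. All names and
sizes here (PB_UNUSED_INDEX, PB_UNUSED_SIZE, PB_DEFAULT_CELLS,
PB_MAX_MTU_CELLS, pbs[], count_packed()) are made up for illustration and are
not taken from mlxsw.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Hypothetical names and sizes, chosen only for illustration. */
#define PB_UNUSED_INDEX		8
#define PB_UNUSED_SIZE		0xffff	/* illegal size marks the unused slot */
#define PB_DEFAULT_CELLS	32
#define PB_MAX_MTU_CELLS	210

/* Designated initialisers: unlisted slots (1..7) are implicitly zero. */
static const uint16_t pbs[] = {
	[0]			= 2 * PB_DEFAULT_CELLS,
	[PB_UNUSED_INDEX]	= PB_UNUSED_SIZE,
	[9]			= 2 * PB_MAX_MTU_CELLS,
};

/* Walk the table and skip the marked slot instead of testing "i == 8". */
static size_t count_packed(void)
{
	size_t i, n = 0;

	for (i = 0; i < ARRAY_SIZE(pbs); i++) {
		if (pbs[i] == PB_UNUSED_SIZE)
			continue;
		n++;	/* a real driver would pack buffer i here */
	}
	return n;
}
```

With this layout the index of each buffer is visible at the initialisation
site, and the loop no longer needs to know which index is unused.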

> +		mlxsw_reg_pbmc_lossy_buffer_pack(pbmc_pl, i, mlxsw_sp_pbs[i]);

	David


* Re: [PATCH net-next V3 00/16] net: fec: cleanup and fixes
From: Holger Schurig @ 2016-04-15  8:33 UTC (permalink / raw)
  To: Troy Kisky
  Cc: netdev, davem, fugang.duan, lznuaa, andrew, stillcompiling, arnd,
	sergei.shtylyov, gerg, fabio.estevam, johannes, l.stach,
	linux-arm-kernel, tremyfr
In-Reply-To: <570FB998.2080900@boundarydevices.com>

> I think I've already fixed this, but I've only submitted once.
>
> commit 466cb4a2e5583d2e18470f30d5948edcf4b947f5
> Author: Troy Kisky <troy.kisky@boundarydevices.com>
> Date:   Wed Jan 20 12:52:10 2016 -0700
>
>     net: fec: update dirty_tx even if no skb
>
>     If dirty_tx isn't updated, then dma_unmap_single
>     will be called twice.
>
>     Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>

Thanks!   This fixed the issue.  Have a

Tested-by: <holgerschurig@gmail.com>


* [patch iproute2 11/11] devlink: add manpage for shared buffer
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Manpage for devlink "sb" object.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 man/man8/devlink-dev.8     |   1 +
 man/man8/devlink-monitor.8 |   1 +
 man/man8/devlink-port.8    |   1 +
 man/man8/devlink-sb.8      | 313 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 316 insertions(+)
 create mode 100644 man/man8/devlink-sb.8

diff --git a/man/man8/devlink-dev.8 b/man/man8/devlink-dev.8
index af96a29..62bcead 100644
--- a/man/man8/devlink-dev.8
+++ b/man/man8/devlink-dev.8
@@ -52,6 +52,7 @@ Shows the state of specified devlink device.
 .SH SEE ALSO
 .BR devlink (8),
 .BR devlink-port (8),
+.BR devlink-sb (8),
 .BR devlink-monitor (8),
 .br
 
diff --git a/man/man8/devlink-monitor.8 b/man/man8/devlink-monitor.8
index 98134c3..13fe641 100644
--- a/man/man8/devlink-monitor.8
+++ b/man/man8/devlink-monitor.8
@@ -29,6 +29,7 @@ opens Devlink Netlink socket, listens on it and dumps state changes.
 .SH SEE ALSO
 .BR devlink (8),
 .BR devlink-dev (8),
+.BR devlink-sb (8),
 .BR devlink-port (8),
 .br
 
diff --git a/man/man8/devlink-port.8 b/man/man8/devlink-port.8
index d78837c..a639d01 100644
--- a/man/man8/devlink-port.8
+++ b/man/man8/devlink-port.8
@@ -120,6 +120,7 @@ Unplit the specified previously split devlink port.
 .SH SEE ALSO
 .BR devlink (8),
 .BR devlink-dev (8),
+.BR devlink-sb (8),
 .BR devlink-monitor (8),
 .br
 
diff --git a/man/man8/devlink-sb.8 b/man/man8/devlink-sb.8
new file mode 100644
index 0000000..ffb5553
--- /dev/null
+++ b/man/man8/devlink-sb.8
@@ -0,0 +1,313 @@
+.TH DEVLINK\-SB 8 "14 Apr 2016" "iproute2" "Linux"
+.SH NAME
+devlink-sb \- devlink shared buffer configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B sb
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-n\fR[\fIno-nice-names\fR] }
+
+.ti -8
+.BR "devlink sb show "
+.RI "[ " DEV " [ "
+.B sb
+.IR SB_INDEX " ] ]"
+
+.ti -8
+.BR "devlink sb pool show "
+.RI "[ " DEV " [ "
+.B sb
+.IR SB_INDEX " ] "
+.br
+.B pool
+.IR POOL_INDEX " ]"
+
+.ti -8
+.BI "devlink sb pool set " DEV "
+.RB "[ " sb
+.IR SB_INDEX " ] "
+.br
+.BI pool " POOL_INDEX "
+.br
+.BI size " POOL_SIZE "
+.br
+.BR thtype " { " static " | " dynamic " }"
+
+.ti -8
+.BR "devlink sb port pool show "
+.RI "[ " DEV/PORT_INDEX " [ "
+.B sb
+.IR SB_INDEX " ] "
+.br
+.B pool
+.IR POOL_INDEX " ]"
+
+.ti -8
+.BI "devlink sb port pool set " DEV/PORT_INDEX "
+.RB "[ " sb
+.IR SB_INDEX " ] "
+.br
+.BI pool " POOL_INDEX "
+.br
+.BI th " THRESHOLD "
+
+.ti -8
+.BR "devlink sb tc bind show "
+.RI "[ " DEV/PORT_INDEX " [ "
+.B sb
+.IR SB_INDEX " ] "
+.br
+.BI tc " TC_INDEX "
+.br
+.B type
+.RB "{ " ingress " | " egress " } ]"
+
+.ti -8
+.BI "devlink sb tc bind set " DEV/PORT_INDEX "
+.RB "[ " sb
+.IR SB_INDEX " ] "
+.br
+.BI tc " TC_INDEX "
+.br
+.BR type " { " ingress " | " egress " }"
+.br
+.BI pool " POOL_INDEX "
+.br
+.BI th " THRESHOLD "
+
+.ti -8
+.BR "devlink sb occupancy show "
+.RI "{ " DEV " | " DEV/PORT_INDEX " } [ "
+.B sb
+.IR SB_INDEX " ] "
+
+.ti -8
+.BR "devlink sb occupancy snapshot "
+.IR DEV " [ "
+.B sb
+.IR SB_INDEX " ]"
+
+.ti -8
+.BR "devlink sb occupancy clearmax "
+.IR DEV " [ "
+.B sb
+.IR SB_INDEX " ]"
+
+.ti -8
+.B devlink sb help
+
+.SH "DESCRIPTION"
+.SS devlink sb show - display available shared buffers and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device to show shared buffers.
+If this argument is omitted all shared buffers of all devices are listed.
+
+.PP
+.I "SB_INDEX"
+- specifies the shared buffer.
+If this argument is omitted shared buffer with index 0 is selected.
+Behaviour of this argument is the same for every command.
+
+.SS devlink sb pool show - display available pools and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device to show pools.
+If this argument is omitted all pools of all devices are listed.
+
+.SS devlink sb pool set - set attributes of pool
+
+.PP
+.I "DEV"
+- specifies the devlink device to set pool.
+
+.TP
+.BI size " POOL_SIZE"
+size of the pool in Bytes.
+
+.TP
+.BR thtype " { " static " | " dynamic " } "
+pool threshold type.
+
+.I static
+- Threshold values for the pool will be passed in Bytes.
+
+.I dynamic
+- Threshold values ("to_alpha") for the pool will be used to compute alpha parameter according to formula:
+.br
+.in +16
+alpha = 2 ^ (to_alpha - 10)
+.in -16
+
+.in +10
+The range of the passed value is between 0 and 20. The computed alpha is used to determine the maximum usage of the flow:
+.in -10
+.br
+.in +16
+max_usage = alpha / (1 + alpha) * Free_Buffer
+.in -16
+
+.SS devlink sb port pool show - display port-pool combinations and threshold for each
+.I "DEV/PORT_INDEX"
+- specifies the devlink port.
+
+.TP
+.BI pool " POOL_INDEX"
+pool index.
+
+.SS devlink sb port pool set - set port-pool threshold
+.I "DEV/PORT_INDEX"
+- specifies the devlink port.
+
+.TP
+.BI pool " POOL_INDEX"
+pool index.
+
+.TP
+.BI th " THRESHOLD"
+threshold value. Type of the value is either Bytes or "to_alpha", depending on
+.B thtype
+set for the pool.
+
+.SS devlink sb tc bind show - display port-TC to pool bindings and threshold for each
+
+.I "DEV/PORT_INDEX"
+- specifies the devlink port.
+
+.TP
+.BI tc " TC_INDEX"
+index of either ingress or egress TC, usually in range 0 to 8 (depends on device).
+
+.TP
+.BR type " { " ingress " | " egress " } "
+TC type.
+
+.SS devlink sb tc bind set - set port-TC to pool binding with specified threshold
+
+.I "DEV/PORT_INDEX"
+- specifies the devlink port.
+
+.TP
+.BI tc " TC_INDEX"
+index of either ingress or egress TC, usually in range 0 to 8 (depends on device).
+
+.TP
+.BR type " { " ingress " | " egress " } "
+TC type.
+
+.TP
+.BI pool " POOL_INDEX"
+index of pool to bind this to.
+
+.TP
+.BI th " THRESHOLD"
+threshold value. Type of the value is either Bytes or "to_alpha", depending on
+.B thtype
+set for the pool.
+
+.SS devlink sb occupancy show - display shared buffer occupancy values for device or port
+
+.PP
+This command is used to browse shared buffer occupancy values. Values are shown for every port-pool combination as well as for all port-TC combinations (with the pool this port-TC is bound to). Format of value is:
+.br
+.in +16
+current_value/max_value
+.in -16
+Note that before showing values, one has to issue
+.B occupancy snapshot
+command first.
+
+.PP
+.I "DEV"
+- specifies the devlink device to show occupancy values for.
+
+.I "DEV/PORT_INDEX"
+- specifies the devlink port to show occupancy values for.
+
+.SS devlink sb occupancy snapshot - take occupancy snapshot of shared buffer for device
+This command is used to take a snapshot of shared buffer occupancy values. After that, the values can be shown using
+.B occupancy show
+command.
+
+.PP
+.I "DEV"
+- specifies the devlink device to take occupancy snapshot on.
+
+.SS devlink sb occupancy clearmax - clear occupancy watermarks of shared buffer for device
+This command is used to reset the maximal occupancy values reached for the whole device. Note that before browsing the reset values, one has to issue
+.B occupancy snapshot
+command.
+
+.PP
+.I "DEV"
+- specifies the devlink device to clear occupancy watermarks on.
+
+.SH "EXAMPLES"
+.PP
+devlink sb show
+.RS 4
+List available shared buffers.
+.RE
+.PP
+devlink sb pool show
+.RS 4
+List available pools and their config.
+.RE
+.PP
+devlink sb port pool show pci/0000:03:00.0/1 pool 0
+.RS 4
+Show port-pool setup for specified port and pool.
+.RE
+.PP
+sudo devlink sb port pool set pci/0000:03:00.0/1 pool 0 th 15
+.RS 4
+Change threshold for specified port and pool.
+.RE
+.PP
+devlink sb tc bind show pci/0000:03:00.0/1 tc 0 type ingress
+.RS 4
+Show pool binding and threshold for specified port and TC.
+.RE
+.PP
+sudo devlink sb tc bind set pci/0000:03:00.0/1 tc 0 type ingress pool 0 th 9
+.RS 4
+Set pool binding and threshold for specified port and TC.
+.RE
+.PP
+sudo devlink sb occupancy snapshot pci/0000:03:00.0
+.RS 4
+Make a snapshot of occupancy of shared buffer for specified devlink device.
+.RE
+.PP
+devlink sb occupancy show pci/0000:03:00.0/1
+.RS 4
+Show occupancy for specified port from the snapshot.
+.RE
+.PP
+sudo devlink sb occupancy clearmax pci/0000:03:00.0
+.RS 4
+Clear watermarks for shared buffer of specified devlink device.
+.RE
+
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-port (8),
+.BR devlink-monitor (8),
+.br
+
+.SH AUTHOR
+Jiri Pirko <jiri@mellanox.com>
-- 
2.5.5
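
For readers who want to check the dynamic threshold formula from the man page by hand, here is a minimal sketch in C. The names sb_alpha() and sb_max_usage() are hypothetical and are not part of devlink or the kernel; the code only evaluates the two formulas as written above, assuming to_alpha is within the documented 0 to 20 range.

```c
#include <assert.h>

/*
 * alpha = 2 ^ (to_alpha - 10)
 * Hypothetical helper; to_alpha must be in the documented range 0..20.
 * Powers of two in this range are exactly representable as doubles.
 */
static double sb_alpha(unsigned int to_alpha)
{
	assert(to_alpha <= 20);
	if (to_alpha >= 10)
		return (double)(1u << (to_alpha - 10));
	return 1.0 / (double)(1u << (10 - to_alpha));
}

/*
 * max_usage = alpha / (1 + alpha) * Free_Buffer
 * free_buffer is in the same unit the caller wants the result in.
 */
static double sb_max_usage(unsigned int to_alpha, double free_buffer)
{
	double alpha = sb_alpha(to_alpha);

	return alpha / (1.0 + alpha) * free_buffer;
}
```

With to_alpha = 10, alpha is 1 and the maximum usage is half of the free buffer. With to_alpha = 15 (the value used in the "port pool set" example, if that pool's thtype is dynamic), alpha is 32 and the maximum usage is 32/33, roughly 97%, of the free buffer.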


* [patch iproute2 10/11] devlink: implement shared buffer occupancy control
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Use the kernel shared buffer occupancy control commands to take a snapshot
and to clear occupancy watermarks. Also allow showing occupancy values in
a column-aligned, per-port format.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c       | 349 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/devlink.h |   6 +
 2 files changed, 355 insertions(+)
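
Note on the column padding used by the occupancy output: the pr_out_sp() macro in the patch below prints a formatted string and then pads it with spaces up to a minimum width. The following is only a rough sketch of the same idea writing into a buffer instead of stdout, so the result can be inspected. format_padded() is a hypothetical name and does not exist in devlink.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf, then pad with spaces up to `width` characters.
 * Returns the number of characters stored (excluding the NUL), or a
 * negative value on a formatting error. Hypothetical helper, not
 * taken from devlink.c.
 */
static int format_padded(char *buf, size_t size, int width,
			 const char *fmt, ...)
{
	va_list ap;
	int len, pad;

	assert(buf != NULL && size > 0);

	va_start(ap, fmt);
	len = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (len < 0)
		return len;
	if ((size_t)len >= size)
		return (int)size - 1;	/* truncated, no room for padding */

	if (len < width) {
		/* "%*s" with an empty string emits (width - len) spaces */
		pad = snprintf(buf + len, size - len, "%*s", width - len, "");
		if (pad < 0)
			return pad;
		if ((size_t)(len + pad) >= size)
			return (int)size - 1;
		len += pad;
	}
	return len;
}
```

Calling format_padded(buf, sizeof(buf), 7, "%2u:", 3u) stores " 3:" followed by four spaces. Output that is already wider than the requested width is left unpadded, which matches the "ret < num" check in the macro.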

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 228807f..ffefa86 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -27,6 +27,12 @@
 
 #define pr_err(args...) fprintf(stderr, ##args)
 #define pr_out(args...) fprintf(stdout, ##args)
+#define pr_out_sp(num, args...)					\
+	do {							\
+		int ret = fprintf(stdout, ##args);		\
+		if (ret < num)					\
+			fprintf(stdout, "%*s", num - ret, "");	\
+	} while (0)
 
 static int _mnlg_socket_recv_run(struct mnlg_socket *nlg,
 				 mnl_cb_t data_cb, void *data)
@@ -275,6 +281,12 @@ static int attr_cb(const struct nlattr *attr, void *data)
 	if (type == DEVLINK_ATTR_SB_TC_INDEX &&
 	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
 		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_OCC_CUR &&
+	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_OCC_MAX &&
+	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+		return MNL_CB_ERROR;
 	tb[type] = attr;
 	return MNL_CB_OK;
 }
@@ -864,6 +876,7 @@ static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
 	struct nlattr *attr_bus_name = tb[DEVLINK_ATTR_BUS_NAME];
 	struct nlattr *attr_dev_name = tb[DEVLINK_ATTR_DEV_NAME];
 	struct nlattr *attr_port_index = tb[DEVLINK_ATTR_PORT_INDEX];
+	struct nlattr *attr_sb_index = tb[DEVLINK_ATTR_SB_INDEX];
 
 	if (opts->present & DL_OPT_HANDLE &&
 	    attr_bus_name && attr_dev_name) {
@@ -885,6 +898,12 @@ static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
 		    port_index != opts->port_index)
 			return false;
 	}
+	if (opts->present & DL_OPT_SB && attr_sb_index) {
+		uint32_t sb_index = mnl_attr_get_u32(attr_sb_index);
+
+		if (sb_index != opts->sb_index)
+			return false;
+	}
 	return true;
 }
 
@@ -1168,6 +1187,9 @@ static void cmd_sb_help(void)
 	pr_out("       devlink sb tc bind set DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX\n");
 	pr_out("                              type { ingress | egress } pool POOL_INDEX\n");
 	pr_out("                              th THRESHOLD\n");
+	pr_out("       devlink sb occupancy show { DEV | DEV/PORT_INDEX } [ sb SB_INDEX ]\n");
+	pr_out("       devlink sb occupancy snapshot DEV [ sb SB_INDEX ]\n");
+	pr_out("       devlink sb occupancy clearmax DEV [ sb SB_INDEX ]\n");
 }
 
 static void pr_out_sb(struct nlattr **tb)
@@ -1504,6 +1526,330 @@ static int cmd_sb_tc(struct dl *dl)
 	return -ENOENT;
 }
 
+struct occ_item {
+	struct list_head list;
+	uint32_t index;
+	uint32_t cur;
+	uint32_t max;
+	uint32_t bound_pool_index;
+};
+
+struct occ_port {
+	struct list_head list;
+	char *bus_name;
+	char *dev_name;
+	uint32_t port_index;
+	uint32_t sb_index;
+	struct list_head pool_list;
+	struct list_head ing_tc_list;
+	struct list_head eg_tc_list;
+};
+
+struct occ_show {
+	struct dl *dl;
+	int err;
+	struct list_head port_list;
+};
+
+static struct occ_item *occ_item_alloc(void)
+{
+	return calloc(1, sizeof(struct occ_item));
+}
+
+static void occ_item_free(struct occ_item *occ_item)
+{
+	free(occ_item);
+}
+
+static struct occ_port *occ_port_alloc(uint32_t port_index)
+{
+	struct occ_port *occ_port;
+
+	occ_port = calloc(1, sizeof(*occ_port));
+	if (!occ_port)
+		return NULL;
+	occ_port->port_index = port_index;
+	INIT_LIST_HEAD(&occ_port->pool_list);
+	INIT_LIST_HEAD(&occ_port->ing_tc_list);
+	INIT_LIST_HEAD(&occ_port->eg_tc_list);
+	return occ_port;
+}
+
+static void occ_port_free(struct occ_port *occ_port)
+{
+	struct occ_item *occ_item, *tmp;
+
+	list_for_each_entry_safe(occ_item, tmp, &occ_port->pool_list, list)
+		occ_item_free(occ_item);
+	list_for_each_entry_safe(occ_item, tmp, &occ_port->ing_tc_list, list)
+		occ_item_free(occ_item);
+	list_for_each_entry_safe(occ_item, tmp, &occ_port->eg_tc_list, list)
+		occ_item_free(occ_item);
+}
+
+static struct occ_show *occ_show_alloc(struct dl *dl)
+{
+	struct occ_show *occ_show;
+
+	occ_show = calloc(1, sizeof(*occ_show));
+	if (!occ_show)
+		return NULL;
+	occ_show->dl = dl;
+	INIT_LIST_HEAD(&occ_show->port_list);
+	return occ_show;
+}
+
+static void occ_show_free(struct occ_show *occ_show)
+{
+	struct occ_port *occ_port, *tmp;
+
+	list_for_each_entry_safe(occ_port, tmp, &occ_show->port_list, list)
+		occ_port_free(occ_port);
+}
+
+static struct occ_port *occ_port_get(struct occ_show *occ_show,
+				     struct nlattr **tb)
+{
+	struct occ_port *occ_port;
+	uint32_t port_index;
+
+	port_index = mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_INDEX]);
+
+	list_for_each_entry_reverse(occ_port, &occ_show->port_list, list) {
+		if (occ_port->port_index == port_index)
+			return occ_port;
+	}
+	occ_port = occ_port_alloc(port_index);
+	if (!occ_port)
+		return NULL;
+	list_add_tail(&occ_port->list, &occ_show->port_list);
+	return occ_port;
+}
+
+static void pr_out_occ_show_item_list(const char *label, struct list_head *list,
+				      bool bound_pool)
+{
+	struct occ_item *occ_item;
+	int i = 1;
+
+	pr_out_sp(7, "  %s:", label);
+	list_for_each_entry(occ_item, list, list) {
+		if ((i - 1) % 4 == 0 && i != 1)
+			pr_out_sp(7, " ");
+		if (bound_pool)
+			pr_out_sp(7, "%2u(%u):", occ_item->index,
+				  occ_item->bound_pool_index);
+		else
+			pr_out_sp(7, "%2u:", occ_item->index);
+		pr_out_sp(15, "%7u/%u", occ_item->cur, occ_item->max);
+		if (i++ % 4 == 0)
+			pr_out("\n");
+	}
+	if ((i - 1) % 4 != 0)
+		pr_out("\n");
+}
+
+static void pr_out_occ_show_port(struct occ_port *occ_port)
+{
+	pr_out_occ_show_item_list("pool", &occ_port->pool_list, false);
+	pr_out_occ_show_item_list("itc", &occ_port->ing_tc_list, true);
+	pr_out_occ_show_item_list("etc", &occ_port->eg_tc_list, true);
+}
+
+static void pr_out_occ_show(struct occ_show *occ_show)
+{
+	struct dl *dl = occ_show->dl;
+	struct dl_opts *opts = &dl->opts;
+	struct occ_port *occ_port;
+
+	list_for_each_entry(occ_port, &occ_show->port_list, list) {
+		__pr_out_port_handle_nice(dl, opts->bus_name, opts->dev_name,
+					  occ_port->port_index);
+		pr_out(":\n");
+		pr_out_occ_show_port(occ_port);
+	}
+}
+
+static void cmd_sb_occ_port_pool_process(struct occ_show *occ_show,
+					 struct nlattr **tb)
+{
+	struct occ_port *occ_port;
+	struct occ_item *occ_item;
+
+	if (occ_show->err || !dl_dump_filter(occ_show->dl, tb))
+		return;
+
+	occ_port = occ_port_get(occ_show, tb);
+	if (!occ_port) {
+		occ_show->err = -ENOMEM;
+		return;
+	}
+
+	occ_item = occ_item_alloc();
+	if (!occ_item) {
+		occ_show->err = -ENOMEM;
+		return;
+	}
+	occ_item->index = mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_POOL_INDEX]);
+	occ_item->cur = mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_OCC_CUR]);
+	occ_item->max = mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_OCC_MAX]);
+	list_add_tail(&occ_item->list, &occ_port->pool_list);
+}
+
+static int cmd_sb_occ_port_pool_process_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct occ_show *occ_show = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_PORT_INDEX] || !tb[DEVLINK_ATTR_SB_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_POOL_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_OCC_CUR] || !tb[DEVLINK_ATTR_SB_OCC_MAX])
+		return MNL_CB_ERROR;
+	cmd_sb_occ_port_pool_process(occ_show, tb);
+	return MNL_CB_OK;
+}
+
+static void cmd_sb_occ_tc_pool_process(struct occ_show *occ_show,
+				       struct nlattr **tb)
+{
+	struct occ_port *occ_port;
+	struct occ_item *occ_item;
+	uint8_t pool_type;
+
+	if (occ_show->err || !dl_dump_filter(occ_show->dl, tb))
+		return;
+
+	occ_port = occ_port_get(occ_show, tb);
+	if (!occ_port) {
+		occ_show->err = -ENOMEM;
+		return;
+	}
+
+	occ_item = occ_item_alloc();
+	if (!occ_item) {
+		occ_show->err = -ENOMEM;
+		return;
+	}
+	occ_item->index = mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_TC_INDEX]);
+	occ_item->cur = mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_OCC_CUR]);
+	occ_item->max = mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_OCC_MAX]);
+	occ_item->bound_pool_index =
+			mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_POOL_INDEX]);
+	pool_type = mnl_attr_get_u8(tb[DEVLINK_ATTR_SB_POOL_TYPE]);
+	if (pool_type == DEVLINK_SB_POOL_TYPE_INGRESS)
+		list_add_tail(&occ_item->list, &occ_port->ing_tc_list);
+	else if (pool_type == DEVLINK_SB_POOL_TYPE_EGRESS)
+		list_add_tail(&occ_item->list, &occ_port->eg_tc_list);
+	else
+		occ_item_free(occ_item);
+}
+
+static int cmd_sb_occ_tc_pool_process_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct occ_show *occ_show = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_PORT_INDEX] || !tb[DEVLINK_ATTR_SB_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_TC_INDEX] || !tb[DEVLINK_ATTR_SB_POOL_TYPE] ||
+	    !tb[DEVLINK_ATTR_SB_POOL_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_OCC_CUR] || !tb[DEVLINK_ATTR_SB_OCC_MAX])
+		return MNL_CB_ERROR;
+	cmd_sb_occ_tc_pool_process(occ_show, tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_sb_occ_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	struct occ_show *occ_show;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP;
+	int err;
+
+	err = dl_argv_parse(dl, DL_OPT_HANDLE | DL_OPT_HANDLEP, DL_OPT_SB);
+	if (err)
+		return err;
+
+	occ_show = occ_show_alloc(dl);
+	if (!occ_show)
+		return -ENOMEM;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_PORT_POOL_GET, flags);
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh,
+				  cmd_sb_occ_port_pool_process_cb, occ_show);
+	if (err)
+		goto out;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_TC_POOL_BIND_GET, flags);
+
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh,
+				  cmd_sb_occ_tc_pool_process_cb, occ_show);
+	if (err)
+		goto out;
+
+	pr_out_occ_show(occ_show);
+
+out:
+	occ_show_free(occ_show);
+	return err;
+}
+
+static int cmd_sb_occ_snapshot(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_OCC_SNAPSHOT,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE, DL_OPT_SB);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_sb_occ_clearmax(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_OCC_MAX_CLEAR,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE, DL_OPT_SB);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_sb_occ(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list")) {
+		dl_arg_inc(dl);
+		return cmd_sb_occ_show(dl);
+	} else if (dl_argv_match(dl, "snapshot")) {
+		dl_arg_inc(dl);
+		return cmd_sb_occ_snapshot(dl);
+	} else if (dl_argv_match(dl, "clearmax")) {
+		dl_arg_inc(dl);
+		return cmd_sb_occ_clearmax(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static int cmd_sb(struct dl *dl)
 {
 	if (dl_argv_match(dl, "help")) {
@@ -1522,6 +1868,9 @@ static int cmd_sb(struct dl *dl)
 	} else if (dl_argv_match(dl, "tc")) {
 		dl_arg_inc(dl);
 		return cmd_sb_tc(dl);
+	} else if (dl_argv_match(dl, "occupancy")) {
+		dl_arg_inc(dl);
+		return cmd_sb_occ(dl);
 	}
 	pr_err("Command \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
diff --git a/include/linux/devlink.h b/include/linux/devlink.h
index 9c1aa57..ba0073b 100644
--- a/include/linux/devlink.h
+++ b/include/linux/devlink.h
@@ -53,6 +53,10 @@ enum devlink_command {
 	DEVLINK_CMD_SB_TC_POOL_BIND_NEW,
 	DEVLINK_CMD_SB_TC_POOL_BIND_DEL,
 
+	/* Shared buffer occupancy monitoring commands */
+	DEVLINK_CMD_SB_OCC_SNAPSHOT,
+	DEVLINK_CMD_SB_OCC_MAX_CLEAR,
+
 	/* add new commands above here */
 
 	__DEVLINK_CMD_MAX,
@@ -119,6 +123,8 @@ enum devlink_attr {
 	DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE,	/* u8 */
 	DEVLINK_ATTR_SB_THRESHOLD,		/* u32 */
 	DEVLINK_ATTR_SB_TC_INDEX,		/* u16 */
+	DEVLINK_ATTR_SB_OCC_CUR,		/* u32 */
+	DEVLINK_ATTR_SB_OCC_MAX,		/* u32 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
-- 
2.5.5


* [patch iproute2 09/11] devlink: implement shared buffer support
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Implement support for the kernel devlink shared buffer interface.
Introduce a new object "sb" that allows browsing the shared buffer
parameters and changing the configuration.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c       | 603 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/devlink.h |  57 +++++
 2 files changed, 659 insertions(+), 1 deletion(-)
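
Note on the 16-bit argument parsing: pool and TC indexes are parsed with a strtoul()-based helper that rejects non-numeric input and values above USHRT_MAX. A standalone sketch of that pattern follows, for anyone who wants to try the edge cases outside of devlink. parse_u16() is a hypothetical name; the logic is meant to mirror strtouint16_t() from the patch below, with a precondition assert added for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a base-10 unsigned 16-bit value.
 * Returns 0 on success, -EINVAL if str is empty or has trailing
 * garbage, -ERANGE if the value does not fit. Hypothetical helper
 * modelled on strtouint16_t().
 */
static int parse_u16(const char *str, uint16_t *p_val)
{
	char *endptr;
	unsigned long val;

	assert(str != NULL && p_val != NULL);

	val = strtoul(str, &endptr, 10);
	if (endptr == str || *endptr != '\0')
		return -EINVAL;
	if (val > USHRT_MAX)
		return -ERANGE;
	*p_val = (uint16_t)val;
	return 0;
}
```

One thing worth knowing: strtoul() accepts a leading minus sign and negates the result as an unsigned value, so input such as "-1" ends up far above USHRT_MAX and is reported as -ERANGE rather than -EINVAL.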

diff --git a/devlink/devlink.c b/devlink/devlink.c
index e2e0413..228807f 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -114,6 +114,13 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HANDLEP		BIT(1)
 #define DL_OPT_PORT_TYPE	BIT(2)
 #define DL_OPT_PORT_COUNT	BIT(3)
+#define DL_OPT_SB		BIT(4)
+#define DL_OPT_SB_POOL		BIT(5)
+#define DL_OPT_SB_SIZE		BIT(6)
+#define DL_OPT_SB_TYPE		BIT(7)
+#define DL_OPT_SB_THTYPE	BIT(8)
+#define DL_OPT_SB_TH		BIT(9)
+#define DL_OPT_SB_TC		BIT(10)
 
 struct dl_opts {
 	uint32_t present; /* flags of present items */
@@ -122,6 +129,13 @@ struct dl_opts {
 	uint32_t port_index;
 	enum devlink_port_type port_type;
 	uint32_t port_count;
+	uint32_t sb_index;
+	uint16_t sb_pool_index;
+	uint32_t sb_pool_size;
+	enum devlink_sb_pool_type sb_pool_type;
+	enum devlink_sb_threshold_type sb_pool_thtype;
+	uint32_t sb_threshold;
+	uint16_t sb_tc_index;
 };
 
 struct dl {
@@ -225,6 +239,42 @@ static int attr_cb(const struct nlattr *attr, void *data)
 	if (type == DEVLINK_ATTR_PORT_IBDEV_NAME &&
 	    mnl_attr_validate(attr, MNL_TYPE_NUL_STRING) < 0)
 		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_INDEX &&
+	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_SIZE &&
+	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_INGRESS_POOL_COUNT &&
+	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_EGRESS_POOL_COUNT &&
+	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_INGRESS_TC_COUNT &&
+	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_EGRESS_TC_COUNT &&
+	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_POOL_INDEX &&
+	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_POOL_TYPE &&
+	    mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_POOL_SIZE &&
+	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE &&
+	    mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_THRESHOLD &&
+	    mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+		return MNL_CB_ERROR;
+	if (type == DEVLINK_ATTR_SB_TC_INDEX &&
+	    mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+		return MNL_CB_ERROR;
 	tb[type] = attr;
 	return MNL_CB_OK;
 }
@@ -363,6 +413,20 @@ static int strtouint32_t(const char *str, uint32_t *p_val)
 	return 0;
 }
 
+static int strtouint16_t(const char *str, uint16_t *p_val)
+{
+	char *endptr;
+	unsigned long int val;
+
+	val = strtoul(str, &endptr, 10);
+	if (endptr == str || *endptr != '\0')
+		return -EINVAL;
+	if (val > USHRT_MAX)
+		return -ERANGE;
+	*p_val = val;
+	return 0;
+}
+
 static int __dl_argv_handle(char *str, char **p_bus_name, char **p_dev_name)
 {
 	strslashrsplit(str, p_bus_name, p_dev_name);
@@ -503,6 +567,24 @@ static int dl_argv_uint32_t(struct dl *dl, uint32_t *p_val)
 	return 0;
 }
 
+static int dl_argv_uint16_t(struct dl *dl, uint16_t *p_val)
+{
+	char *str = dl_argv_next(dl);
+	int err;
+
+	if (!str) {
+		pr_err("Unsigned number argument expected\n");
+		return -EINVAL;
+	}
+
+	err = strtouint16_t(str, p_val);
+	if (err) {
+		pr_err("\"%s\" is not a number or not within range\n", str);
+		return err;
+	}
+	return 0;
+}
+
 static int dl_argv_str(struct dl *dl, const char **p_str)
 {
 	const char *str = dl_argv_next(dl);
@@ -530,6 +612,33 @@ static int port_type_get(const char *typestr, enum devlink_port_type *p_type)
 	return 0;
 }
 
+static int pool_type_get(const char *typestr, enum devlink_sb_pool_type *p_type)
+{
+	if (strcmp(typestr, "ingress") == 0) {
+		*p_type = DEVLINK_SB_POOL_TYPE_INGRESS;
+	} else if (strcmp(typestr, "egress") == 0) {
+		*p_type = DEVLINK_SB_POOL_TYPE_EGRESS;
+	} else {
+		pr_err("Unknown pool type \"%s\"\n", typestr);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int threshold_type_get(const char *typestr,
+			      enum devlink_sb_threshold_type *p_type)
+{
+	if (strcmp(typestr, "static") == 0) {
+		*p_type = DEVLINK_SB_THRESHOLD_TYPE_STATIC;
+	} else if (strcmp(typestr, "dynamic") == 0) {
+		*p_type = DEVLINK_SB_THRESHOLD_TYPE_DYNAMIC;
+	} else {
+		pr_err("Unknown threshold type \"%s\"\n", typestr);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			 uint32_t o_optional)
 {
@@ -579,6 +688,66 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_PORT_COUNT;
+		} else if (dl_argv_match(dl, "sb") &&
+			   (o_all & DL_OPT_SB)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint32_t(dl, &opts->sb_index);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB;
+		} else if (dl_argv_match(dl, "pool") &&
+			   (o_all & DL_OPT_SB_POOL)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint16_t(dl, &opts->sb_pool_index);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB_POOL;
+		} else if (dl_argv_match(dl, "size") &&
+			   (o_all & DL_OPT_SB_SIZE)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint32_t(dl, &opts->sb_pool_size);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB_SIZE;
+		} else if (dl_argv_match(dl, "type") &&
+			   (o_all & DL_OPT_SB_TYPE)) {
+			const char *typestr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &typestr);
+			if (err)
+				return err;
+			err = pool_type_get(typestr, &opts->sb_pool_type);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB_TYPE;
+		} else if (dl_argv_match(dl, "thtype") &&
+			   (o_all & DL_OPT_SB_THTYPE)) {
+			const char *typestr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &typestr);
+			if (err)
+				return err;
+			err = threshold_type_get(typestr,
+						 &opts->sb_pool_thtype);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB_THTYPE;
+		} else if (dl_argv_match(dl, "th") &&
+			   (o_all & DL_OPT_SB_TH)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint32_t(dl, &opts->sb_threshold);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB_TH;
+		} else if (dl_argv_match(dl, "tc") &&
+			   (o_all & DL_OPT_SB_TC)) {
+			dl_arg_inc(dl);
+			err = dl_argv_uint16_t(dl, &opts->sb_tc_index);
+			if (err)
+				return err;
+			o_found |= DL_OPT_SB_TC;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -587,6 +756,11 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 
 	opts->present = o_found;
 
+	if ((o_optional & DL_OPT_SB) && !(o_found & DL_OPT_SB)) {
+		opts->sb_index = 0;
+		opts->present |= DL_OPT_SB;
+	}
+
 	if ((o_required & DL_OPT_PORT_TYPE) && !(o_found & DL_OPT_PORT_TYPE)) {
 		pr_err("Port type option expected.\n");
 		return -EINVAL;
@@ -598,6 +772,35 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 		return -EINVAL;
 	}
 
+	if ((o_required & DL_OPT_SB_POOL) && !(o_found & DL_OPT_SB_POOL)) {
+		pr_err("Pool index option expected.\n");
+		return -EINVAL;
+	}
+
+	if ((o_required & DL_OPT_SB_SIZE) && !(o_found & DL_OPT_SB_SIZE)) {
+		pr_err("Pool size option expected.\n");
+		return -EINVAL;
+	}
+
+	if ((o_required & DL_OPT_SB_TYPE) && !(o_found & DL_OPT_SB_TYPE)) {
+		pr_err("Pool type option expected.\n");
+		return -EINVAL;
+	}
+
+	if ((o_required & DL_OPT_SB_THTYPE) && !(o_found & DL_OPT_SB_THTYPE)) {
+		pr_err("Pool threshold type option expected.\n");
+		return -EINVAL;
+	}
+
+	if ((o_required & DL_OPT_SB_TH) && !(o_found & DL_OPT_SB_TH)) {
+		pr_err("Threshold option expected.\n");
+		return -EINVAL;
+	}
+
+	if ((o_required & DL_OPT_SB_TC) && !(o_found & DL_OPT_SB_TC)) {
+		pr_err("TC index option expected.\n");
+		return -EINVAL;
+	}
 	return 0;
 }
 
@@ -620,6 +823,27 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_PORT_COUNT)
 		mnl_attr_put_u32(nlh, DEVLINK_ATTR_PORT_SPLIT_COUNT,
 				 opts->port_count);
+	if (opts->present & DL_OPT_SB)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_SB_INDEX,
+				 opts->sb_index);
+	if (opts->present & DL_OPT_SB_POOL)
+		mnl_attr_put_u16(nlh, DEVLINK_ATTR_SB_POOL_INDEX,
+				 opts->sb_pool_index);
+	if (opts->present & DL_OPT_SB_SIZE)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_SB_POOL_SIZE,
+				 opts->sb_pool_size);
+	if (opts->present & DL_OPT_SB_TYPE)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_SB_POOL_TYPE,
+				opts->sb_pool_type);
+	if (opts->present & DL_OPT_SB_THTYPE)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE,
+				opts->sb_pool_thtype);
+	if (opts->present & DL_OPT_SB_TH)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_SB_THRESHOLD,
+				 opts->sb_threshold);
+	if (opts->present & DL_OPT_SB_TC)
+		mnl_attr_put_u16(nlh, DEVLINK_ATTR_SB_TC_INDEX,
+				 opts->sb_tc_index);
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
@@ -929,6 +1153,380 @@ static int cmd_port(struct dl *dl)
 	return -ENOENT;
 }
 
+static void cmd_sb_help(void)
+{
+	pr_out("Usage: devlink sb show [ DEV [ sb SB_INDEX ] ]\n");
+	pr_out("       devlink sb pool show [ DEV [ sb SB_INDEX ] pool POOL_INDEX ]\n");
+	pr_out("       devlink sb pool set DEV [ sb SB_INDEX ] pool POOL_INDEX\n");
+	pr_out("                           size POOL_SIZE thtype { static | dynamic }\n");
+	pr_out("       devlink sb port pool show [ DEV/PORT_INDEX [ sb SB_INDEX ]\n");
+	pr_out("                                   pool POOL_INDEX ]\n");
+	pr_out("       devlink sb port pool set DEV/PORT_INDEX [ sb SB_INDEX ]\n");
+	pr_out("                                pool POOL_INDEX th THRESHOLD\n");
+	pr_out("       devlink sb tc bind show [ DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX\n");
+	pr_out("                                 type { ingress | egress } ]\n");
+	pr_out("       devlink sb tc bind set DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX\n");
+	pr_out("                              type { ingress | egress } pool POOL_INDEX\n");
+	pr_out("                              th THRESHOLD\n");
+}
+
+static void pr_out_sb(struct nlattr **tb)
+{
+	pr_out_handle(tb);
+	pr_out(": sb %u size %u ing_pools %u eg_pools %u ing_tcs %u eg_tcs %u\n",
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_INDEX]),
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_SIZE]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_INGRESS_POOL_COUNT]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_EGRESS_POOL_COUNT]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_INGRESS_TC_COUNT]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_EGRESS_TC_COUNT]));
+}
+
+static int cmd_sb_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_SB_INDEX] || !tb[DEVLINK_ATTR_SB_SIZE] ||
+	    !tb[DEVLINK_ATTR_SB_INGRESS_POOL_COUNT] ||
+	    !tb[DEVLINK_ATTR_SB_EGRESS_POOL_COUNT] ||
+	    !tb[DEVLINK_ATTR_SB_INGRESS_TC_COUNT] ||
+	    !tb[DEVLINK_ATTR_SB_EGRESS_TC_COUNT])
+		return MNL_CB_ERROR;
+	pr_out_sb(tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_sb_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE, DL_OPT_SB);
+		if (err)
+			return err;
+	}
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_sb_show_cb, NULL);
+}
+
+static const char *pool_type_name(uint8_t type)
+{
+	switch (type) {
+	case DEVLINK_SB_POOL_TYPE_INGRESS: return "ingress";
+	case DEVLINK_SB_POOL_TYPE_EGRESS: return "egress";
+	default: return "<unknown type>";
+	}
+}
+
+static const char *threshold_type_name(uint8_t type)
+{
+	switch (type) {
+	case DEVLINK_SB_THRESHOLD_TYPE_STATIC: return "static";
+	case DEVLINK_SB_THRESHOLD_TYPE_DYNAMIC: return "dynamic";
+	default: return "<unknown type>";
+	}
+}
+
+static void pr_out_sb_pool(struct nlattr **tb)
+{
+	pr_out_handle(tb);
+	pr_out(": sb %u pool %u type %s size %u thtype %s\n",
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_INDEX]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_POOL_INDEX]),
+	       pool_type_name(mnl_attr_get_u8(tb[DEVLINK_ATTR_SB_POOL_TYPE])),
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_POOL_SIZE]),
+	       threshold_type_name(mnl_attr_get_u8(tb[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE])));
+}
+
+static int cmd_sb_pool_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_SB_INDEX] || !tb[DEVLINK_ATTR_SB_POOL_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_POOL_TYPE] || !tb[DEVLINK_ATTR_SB_POOL_SIZE] ||
+	    !tb[DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE])
+		return MNL_CB_ERROR;
+	pr_out_sb_pool(tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_sb_pool_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_POOL_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_SB_POOL,
+					DL_OPT_SB);
+		if (err)
+			return err;
+	}
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_sb_pool_show_cb, NULL);
+}
+
+static int cmd_sb_pool_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_POOL_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_SB_POOL |
+				DL_OPT_SB_SIZE | DL_OPT_SB_THTYPE, DL_OPT_SB);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_sb_pool(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_sb_pool_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_sb_pool_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static void pr_out_sb_port_pool(struct dl *dl, struct nlattr **tb)
+{
+	pr_out_port_handle_nice(dl, tb);
+	pr_out(": sb %u pool %u threshold %u\n",
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_INDEX]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_POOL_INDEX]),
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_THRESHOLD]));
+}
+
+static int cmd_sb_port_pool_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct dl *dl = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_PORT_INDEX] || !tb[DEVLINK_ATTR_SB_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_POOL_INDEX] || !tb[DEVLINK_ATTR_SB_THRESHOLD])
+		return MNL_CB_ERROR;
+	pr_out_sb_port_pool(dl, tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_sb_port_pool_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_PORT_POOL_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl,
+					DL_OPT_HANDLEP | DL_OPT_SB_POOL,
+					DL_OPT_SB);
+		if (err)
+			return err;
+	}
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_sb_port_pool_show_cb, dl);
+}
+
+static int cmd_sb_port_pool_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_PORT_POOL_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLEP | DL_OPT_SB_POOL |
+				DL_OPT_SB_TH, DL_OPT_SB);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_sb_port_pool(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_sb_port_pool_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_sb_port_pool_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static int cmd_sb_port(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "pool")) {
+		dl_arg_inc(dl);
+		return cmd_sb_port_pool(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static void pr_out_sb_tc_bind(struct dl *dl, struct nlattr **tb)
+{
+	pr_out_port_handle_nice(dl, tb);
+	pr_out(": sb %u tc %u type %s pool %u threshold %u\n",
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_INDEX]),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_TC_INDEX]),
+	       pool_type_name(mnl_attr_get_u8(tb[DEVLINK_ATTR_SB_POOL_TYPE])),
+	       mnl_attr_get_u16(tb[DEVLINK_ATTR_SB_POOL_INDEX]),
+	       mnl_attr_get_u32(tb[DEVLINK_ATTR_SB_THRESHOLD]));
+}
+
+static int cmd_sb_tc_bind_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct dl *dl = data;
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_PORT_INDEX] || !tb[DEVLINK_ATTR_SB_INDEX] ||
+	    !tb[DEVLINK_ATTR_SB_TC_INDEX] || !tb[DEVLINK_ATTR_SB_POOL_TYPE] ||
+	    !tb[DEVLINK_ATTR_SB_POOL_INDEX] || !tb[DEVLINK_ATTR_SB_THRESHOLD])
+		return MNL_CB_ERROR;
+	pr_out_sb_tc_bind(dl, tb);
+	return MNL_CB_OK;
+}
+
+static int cmd_sb_tc_bind_show(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_TC_POOL_BIND_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLEP | DL_OPT_SB_TC |
+					DL_OPT_SB_TYPE, DL_OPT_SB);
+		if (err)
+			return err;
+	}
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_sb_tc_bind_show_cb, dl);
+}
+
+static int cmd_sb_tc_bind_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_SB_TC_POOL_BIND_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLEP | DL_OPT_SB_TC |
+				DL_OPT_SB_TYPE | DL_OPT_SB_POOL | DL_OPT_SB_TH,
+				DL_OPT_SB);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_sb_tc_bind(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_sb_tc_bind_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_sb_tc_bind_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static int cmd_sb_tc(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "bind")) {
+		dl_arg_inc(dl);
+		return cmd_sb_tc_bind(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
+static int cmd_sb(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_sb_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_sb_show(dl);
+	} else if (dl_argv_match(dl, "pool")) {
+		dl_arg_inc(dl);
+		return cmd_sb_pool(dl);
+	} else if (dl_argv_match(dl, "port")) {
+		dl_arg_inc(dl);
+		return cmd_sb_port(dl);
+	} else if (dl_argv_match(dl, "tc")) {
+		dl_arg_inc(dl);
+		return cmd_sb_tc(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static const char *cmd_name(uint8_t cmd)
 {
 	switch (cmd) {
@@ -1064,7 +1662,7 @@ static int cmd_mon(struct dl *dl)
 static void help(void)
 {
 	pr_out("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
-	       "where  OBJECT := { dev | port | monitor }\n"
+	       "where  OBJECT := { dev | port | sb | monitor }\n"
 	       "       OPTIONS := { -V[ersion] | -n[no-nice-names] }\n");
 }
 
@@ -1079,6 +1677,9 @@ static int dl_cmd(struct dl *dl)
 	} else if (dl_argv_match(dl, "port")) {
 		dl_arg_inc(dl);
 		return cmd_port(dl);
+	} else if (dl_argv_match(dl, "sb")) {
+		dl_arg_inc(dl);
+		return cmd_sb(dl);
 	} else if (dl_argv_match(dl, "monitor")) {
 		dl_arg_inc(dl);
 		return cmd_mon(dl);
diff --git a/include/linux/devlink.h b/include/linux/devlink.h
index c9fee57..9c1aa57 100644
--- a/include/linux/devlink.h
+++ b/include/linux/devlink.h
@@ -33,6 +33,26 @@ enum devlink_command {
 	DEVLINK_CMD_PORT_SPLIT,
 	DEVLINK_CMD_PORT_UNSPLIT,
 
+	DEVLINK_CMD_SB_GET,		/* can dump */
+	DEVLINK_CMD_SB_SET,
+	DEVLINK_CMD_SB_NEW,
+	DEVLINK_CMD_SB_DEL,
+
+	DEVLINK_CMD_SB_POOL_GET,	/* can dump */
+	DEVLINK_CMD_SB_POOL_SET,
+	DEVLINK_CMD_SB_POOL_NEW,
+	DEVLINK_CMD_SB_POOL_DEL,
+
+	DEVLINK_CMD_SB_PORT_POOL_GET,	/* can dump */
+	DEVLINK_CMD_SB_PORT_POOL_SET,
+	DEVLINK_CMD_SB_PORT_POOL_NEW,
+	DEVLINK_CMD_SB_PORT_POOL_DEL,
+
+	DEVLINK_CMD_SB_TC_POOL_BIND_GET,	/* can dump */
+	DEVLINK_CMD_SB_TC_POOL_BIND_SET,
+	DEVLINK_CMD_SB_TC_POOL_BIND_NEW,
+	DEVLINK_CMD_SB_TC_POOL_BIND_DEL,
+
 	/* add new commands above here */
 
 	__DEVLINK_CMD_MAX,
@@ -46,6 +66,31 @@ enum devlink_port_type {
 	DEVLINK_PORT_TYPE_IB,
 };
 
+enum devlink_sb_pool_type {
+	DEVLINK_SB_POOL_TYPE_INGRESS,
+	DEVLINK_SB_POOL_TYPE_EGRESS,
+};
+
+/* static threshold - limiting the maximum number of bytes.
+ * dynamic threshold - limiting the maximum number of bytes
+ *   based on the currently available free space in the shared buffer pool.
+ *   In this mode, the maximum quota is calculated based
+ *   on the following formula:
+ *     max_quota = alpha / (1 + alpha) * Free_Buffer
+ *   where Free_Buffer is the amount of unoccupied buffer associated with
+ *   the relevant pool.
+ *   The value range which can be passed is 0-20 and serves
+ *   for computation of alpha by following formula:
+ *     alpha = 2 ^ (passed_value - 10)
+ */
+
+enum devlink_sb_threshold_type {
+	DEVLINK_SB_THRESHOLD_TYPE_STATIC,
+	DEVLINK_SB_THRESHOLD_TYPE_DYNAMIC,
+};
+
+#define DEVLINK_SB_THRESHOLD_TO_ALPHA_MAX 20
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -62,6 +107,18 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_IBDEV_NAME,		/* string */
 	DEVLINK_ATTR_PORT_SPLIT_COUNT,		/* u32 */
 	DEVLINK_ATTR_PORT_SPLIT_GROUP,		/* u32 */
+	DEVLINK_ATTR_SB_INDEX,			/* u32 */
+	DEVLINK_ATTR_SB_SIZE,			/* u32 */
+	DEVLINK_ATTR_SB_INGRESS_POOL_COUNT,	/* u16 */
+	DEVLINK_ATTR_SB_EGRESS_POOL_COUNT,	/* u16 */
+	DEVLINK_ATTR_SB_INGRESS_TC_COUNT,	/* u16 */
+	DEVLINK_ATTR_SB_EGRESS_TC_COUNT,	/* u16 */
+	DEVLINK_ATTR_SB_POOL_INDEX,		/* u16 */
+	DEVLINK_ATTR_SB_POOL_TYPE,		/* u8 */
+	DEVLINK_ATTR_SB_POOL_SIZE,		/* u32 */
+	DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE,	/* u8 */
+	DEVLINK_ATTR_SB_THRESHOLD,		/* u32 */
+	DEVLINK_ATTR_SB_TC_INDEX,		/* u16 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
-- 
2.5.5

^ permalink raw reply related
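
The dynamic threshold formula documented in the include/linux/devlink.h hunk above can be worked through in a few lines of C. This is only an illustrative sketch: the helper names sb_alpha() and sb_max_quota() and the local SB_THRESHOLD_TO_ALPHA_MAX define are hypothetical and are not part of the patch, and real hardware may well do this computation in fixed point rather than in floating point.

```c
#include <stdint.h>

/* Hypothetical local copy of DEVLINK_SB_THRESHOLD_TO_ALPHA_MAX. */
#define SB_THRESHOLD_TO_ALPHA_MAX 20

/* alpha = 2 ^ (passed_value - 10); returns -1.0 if the value is out of range. */
static double sb_alpha(unsigned int passed_value)
{
	if (passed_value > SB_THRESHOLD_TO_ALPHA_MAX)
		return -1.0;
	/* 2^(v - 10) == 2^v / 2^10, which avoids pulling in libm. */
	return (double)(1u << passed_value) / 1024.0;
}

/* max_quota = alpha / (1 + alpha) * Free_Buffer; -1.0 if out of range. */
static double sb_max_quota(unsigned int passed_value, double free_buffer)
{
	double alpha = sb_alpha(passed_value);

	if (alpha < 0)
		return -1.0;
	return alpha / (1.0 + alpha) * free_buffer;
}
```

With a passed value of 10, alpha is 1 and the quota is half of the free buffer. The extremes of the range, 0 and 20, give alpha = 1/1024 and alpha = 1024 respectively.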

* [patch iproute2 04/11] devlink: introduce pr_out_port_handle helper
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 39f423a..0904e07 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -523,6 +523,12 @@ static void pr_out_handle(struct nlattr **tb)
 			mnl_attr_get_str(tb[DEVLINK_ATTR_DEV_NAME]));
 }
 
+static void pr_out_port_handle(struct nlattr **tb)
+{
+	pr_out_handle(tb);
+	pr_out("/%d", mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_INDEX]));
+}
+
 static void pr_out_dev(struct nlattr **tb)
 {
 	pr_out_handle(tb);
@@ -599,8 +605,8 @@ static void pr_out_port(struct nlattr **tb)
 	struct nlattr *pt_attr = tb[DEVLINK_ATTR_PORT_TYPE];
 	struct nlattr *dpt_attr = tb[DEVLINK_ATTR_PORT_DESIRED_TYPE];
 
-	pr_out_handle(tb);
-	pr_out("/%d:", mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_INDEX]));
+	pr_out_port_handle(tb);
+	pr_out(":");
 	if (pt_attr) {
 		uint16_t port_type = mnl_attr_get_u16(pt_attr);
 
-- 
2.5.5

^ permalink raw reply related

* [patch iproute2 08/11] devlink: allow to parse both devlink and port handle in the same time
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

For filtering purposes, it makes sense for the user to specify either
a devlink handle or a port handle.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 109 ++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 90 insertions(+), 19 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index d436bbf..e2e0413 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -363,6 +363,12 @@ static int strtouint32_t(const char *str, uint32_t *p_val)
 	return 0;
 }
 
+static int __dl_argv_handle(char *str, char **p_bus_name, char **p_dev_name)
+{
+	strslashrsplit(str, p_bus_name, p_dev_name);
+	return 0;
+}
+
 static int dl_argv_handle(struct dl *dl, char **p_bus_name, char **p_dev_name)
 {
 	char *str = dl_argv_next(dl);
@@ -376,8 +382,40 @@ static int dl_argv_handle(struct dl *dl, char **p_bus_name, char **p_dev_name)
 		pr_err("Expected \"bus_name/dev_name\".\n");
 		return -EINVAL;
 	}
+	return __dl_argv_handle(str, p_bus_name, p_dev_name);
+}
 
-	strslashrsplit(str, p_bus_name, p_dev_name);
+static int __dl_argv_handle_port(char *str,
+				 char **p_bus_name, char **p_dev_name,
+				 uint32_t *p_port_index)
+{
+	char *handlestr = handlestr;
+	char *portstr = portstr;
+	int err;
+
+	strslashrsplit(str, &handlestr, &portstr);
+	err = strtouint32_t(portstr, p_port_index);
+	if (err) {
+		pr_err("Port index \"%s\" is not a number or not within range\n",
+		       portstr);
+		return err;
+	}
+	strslashrsplit(handlestr, p_bus_name, p_dev_name);
+	return 0;
+}
+
+static int __dl_argv_handle_port_ifname(struct dl *dl, char *str,
+					char **p_bus_name, char **p_dev_name,
+					uint32_t *p_port_index)
+{
+	int err;
+
+	err = ifname_map_lookup(dl, str, p_bus_name, p_dev_name,
+				p_port_index);
+	if (err) {
+		pr_err("Netdevice \"%s\" not found\n", str);
+		return err;
+	}
 	return 0;
 }
 
@@ -386,7 +424,6 @@ static int dl_argv_handle_port(struct dl *dl, char **p_bus_name,
 {
 	char *str = dl_argv_next(dl);
 	unsigned int slash_count;
-	int err;
 
 	if (!str) {
 		pr_err("Port identification (\"bus_name/dev_name/port_index\" or \"netdev ifname\") expected.\n");
@@ -398,26 +435,52 @@ static int dl_argv_handle_port(struct dl *dl, char **p_bus_name,
 		pr_err("Expected \"bus_name/dev_name/port_index\" or \"netdev_ifname\".\n");
 		return -EINVAL;
 	}
-
 	if (slash_count == 2) {
-		char *handlestr = handlestr;
-		char *portstr = portstr;
-
-		err = strslashrsplit(str, &handlestr, &portstr);
-		err = strtouint32_t(portstr, p_port_index);
-		if (err) {
-			pr_err("Port index \"%s\" is not a number or not within range\n",
-			       portstr);
+		return __dl_argv_handle_port(str, p_bus_name,
+					     p_dev_name, p_port_index);
+	} else if (slash_count == 0) {
+		return __dl_argv_handle_port_ifname(dl, str, p_bus_name,
+						    p_dev_name, p_port_index);
+	}
+	return 0;
+}
+
+static int dl_argv_handle_both(struct dl *dl, char **p_bus_name,
+			       char **p_dev_name, uint32_t *p_port_index,
+			       uint32_t *p_handle_bit)
+{
+	char *str = dl_argv_next(dl);
+	unsigned int slash_count;
+	int err;
+
+	if (!str) {
+		pr_err("One of following identifications expected:\n"
+		       "Devlink identification (\"bus_name/dev_name\")\n"
+		       "Port identification (\"bus_name/dev_name/port_index\" or \"netdev ifname\")\n");
+		return -EINVAL;
+	}
+	slash_count = strslashcount(str);
+	if (slash_count == 1) {
+		err = __dl_argv_handle(str, p_bus_name, p_dev_name);
+		if (err)
 			return err;
-		}
-		strslashrsplit(handlestr, p_bus_name, p_dev_name);
+		*p_handle_bit = DL_OPT_HANDLE;
+	} else if (slash_count == 2) {
+		err = __dl_argv_handle_port(str, p_bus_name,
+					    p_dev_name, p_port_index);
+		if (err)
+			return err;
+		*p_handle_bit = DL_OPT_HANDLEP;
 	} else if (slash_count == 0) {
-		err = ifname_map_lookup(dl, str, p_bus_name, p_dev_name,
-					p_port_index);
-		if (err) {
-			pr_err("Netdevice \"%s\" not found\n", str);
+		err = __dl_argv_handle_port_ifname(dl, str, p_bus_name,
+						   p_dev_name, p_port_index);
+		if (err)
 			return err;
-		}
+		*p_handle_bit = DL_OPT_HANDLEP;
+	} else {
+		pr_err("Wrong port identification string format.\n");
+		pr_err("Expected \"bus_name/dev_name\" or \"bus_name/dev_name/port_index\" or \"netdev_ifname\".\n");
+		return -EINVAL;
 	}
 	return 0;
 }
@@ -475,7 +538,15 @@ static int dl_argv_parse(struct dl *dl, uint32_t o_required,
 	uint32_t o_found = 0;
 	int err;
 
-	if (o_required & DL_OPT_HANDLE) {
+	if (o_required & DL_OPT_HANDLE && o_required & DL_OPT_HANDLEP) {
+		uint32_t handle_bit = handle_bit;
+
+		err = dl_argv_handle_both(dl, &opts->bus_name, &opts->dev_name,
+					  &opts->port_index, &handle_bit);
+		if (err)
+			return err;
+		o_found |= handle_bit;
+	} else if (o_required & DL_OPT_HANDLE) {
 		err = dl_argv_handle(dl, &opts->bus_name, &opts->dev_name);
 		if (err)
 			return err;
-- 
2.5.5

^ permalink raw reply related
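
The dispatch in dl_argv_handle_both() above boils down to counting slashes in the identification string. A minimal standalone sketch of that idea follows. The enum handle_kind and the helpers slash_count() and classify_handle() are hypothetical names used for illustration only; the patch itself uses strslashcount() and sets DL_OPT_HANDLE or DL_OPT_HANDLEP instead of returning an enum.

```c
/* Hypothetical classification result, not part of the patch. */
enum handle_kind {
	HANDLE_IFNAME,	/* "netdev_ifname", no slash */
	HANDLE_DEV,	/* "bus_name/dev_name" */
	HANDLE_PORT,	/* "bus_name/dev_name/port_index" */
	HANDLE_INVALID,	/* more than two slashes */
};

/* Count '/' characters in a NUL-terminated string. */
static unsigned int slash_count(const char *str)
{
	unsigned int count = 0;

	for (; *str; str++)
		if (*str == '/')
			count++;
	return count;
}

/* Decide which kind of handle the user passed, based on slash count only. */
static enum handle_kind classify_handle(const char *str)
{
	switch (slash_count(str)) {
	case 0:
		return HANDLE_IFNAME;
	case 1:
		return HANDLE_DEV;
	case 2:
		return HANDLE_PORT;
	default:
		return HANDLE_INVALID;
	}
}
```

Under these assumptions, "pci/0000:01:00.0" would be treated as a devlink handle and "pci/0000:01:00.0/1" as a port handle. Both names are made up for the example.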

* [patch iproute2 07/11] devlink: introduce dump filtering function
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

This function is to be used from dump callbacks to decide whether the
current output should be filtered out or not. Filtering is based on
previously parsed and stored command line options.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 0c2132f..d436bbf 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -563,6 +563,36 @@ static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
 	return 0;
 }
 
+static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
+{
+	struct dl_opts *opts = &dl->opts;
+	struct nlattr *attr_bus_name = tb[DEVLINK_ATTR_BUS_NAME];
+	struct nlattr *attr_dev_name = tb[DEVLINK_ATTR_DEV_NAME];
+	struct nlattr *attr_port_index = tb[DEVLINK_ATTR_PORT_INDEX];
+
+	if (opts->present & DL_OPT_HANDLE &&
+	    attr_bus_name && attr_dev_name) {
+		const char *bus_name = mnl_attr_get_str(attr_bus_name);
+		const char *dev_name = mnl_attr_get_str(attr_dev_name);
+
+		if (strcmp(bus_name, opts->bus_name) != 0 ||
+		    strcmp(dev_name, opts->dev_name) != 0)
+			return false;
+	}
+	if (opts->present & DL_OPT_HANDLEP &&
+	    attr_bus_name && attr_dev_name && attr_port_index) {
+		const char *bus_name = mnl_attr_get_str(attr_bus_name);
+		const char *dev_name = mnl_attr_get_str(attr_dev_name);
+		uint32_t port_index = mnl_attr_get_u32(attr_port_index);
+
+		if (strcmp(bus_name, opts->bus_name) != 0 ||
+		    strcmp(dev_name, opts->dev_name) != 0 ||
+		    port_index != opts->port_index)
+			return false;
+	}
+	return true;
+}
+
 static void cmd_dev_help(void)
 {
 	pr_out("Usage: devlink dev show [ DEV ]\n");
-- 
2.5.5

^ permalink raw reply related
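
The filtering logic of dl_dump_filter() above can be tried outside of devlink with plain structs in place of netlink attributes. The following is a simplified sketch, not the patch's code: struct filter_opts, struct dump_entry, OPT_HANDLE, OPT_HANDLEP and dump_filter() are hypothetical stand-ins, and it assumes every dumped entry carries a bus name and a device name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OPT_HANDLE	(1u << 0)	/* user gave bus_name/dev_name */
#define OPT_HANDLEP	(1u << 1)	/* user gave a port handle */

/* Stored command line options (stand-in for struct dl_opts). */
struct filter_opts {
	uint32_t present;
	const char *bus_name;
	const char *dev_name;
	uint32_t port_index;
};

/* One dumped object (stand-in for the parsed attribute table). */
struct dump_entry {
	const char *bus_name;
	const char *dev_name;
	bool has_port;
	uint32_t port_index;
};

/* Return true if the entry should be printed, false if filtered out. */
static bool dump_filter(const struct filter_opts *opts,
			const struct dump_entry *e)
{
	bool same_dev;

	/* No handle on the command line: nothing to filter on. */
	if (!(opts->present & (OPT_HANDLE | OPT_HANDLEP)))
		return true;

	same_dev = strcmp(e->bus_name, opts->bus_name) == 0 &&
		   strcmp(e->dev_name, opts->dev_name) == 0;

	if ((opts->present & OPT_HANDLE) && !same_dev)
		return false;
	/* As in the patch, the port check applies only to entries with a port. */
	if ((opts->present & OPT_HANDLEP) && e->has_port &&
	    (!same_dev || e->port_index != opts->port_index))
		return false;
	return true;
}
```

A dump callback would call such a function once per received message and skip printing when it returns false.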

* [patch iproute2 05/11] devlink: introduce helper to print out nice names (ifnames)
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

By default, ifnames will be printed out. The user can turn that off
using the "-n" option on the command line.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c       | 90 +++++++++++++++++++++++++++++++++++++++++++------
 man/man8/devlink-dev.8  |  1 +
 man/man8/devlink-port.8 |  1 +
 man/man8/devlink.8      |  5 +++
 4 files changed, 86 insertions(+), 11 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 0904e07..5e08666 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -114,6 +114,7 @@ struct dl {
 	struct list_head ifname_map_list;
 	int argc;
 	char **argv;
+	bool no_nice_names;
 };
 
 static int dl_argc(struct dl *dl)
@@ -290,6 +291,23 @@ static int ifname_map_lookup(struct dl *dl, const char *ifname,
 	return -ENOENT;
 }
 
+static int ifname_map_rev_lookup(struct dl *dl, const char *bus_name,
+				 const char *dev_name, uint32_t port_index,
+				 char **p_ifname)
+{
+	struct ifname_map *ifname_map;
+
+	list_for_each_entry(ifname_map, &dl->ifname_map_list, list) {
+		if (strcmp(bus_name, ifname_map->bus_name) == 0 &&
+		    strcmp(dev_name, ifname_map->dev_name) == 0 &&
+		    port_index == ifname_map->port_index) {
+			*p_ifname = ifname_map->ifname;
+			return 0;
+		}
+	}
+	return -ENOENT;
+}
+
 static unsigned int strslashcount(char *str)
 {
 	unsigned int count = 0;
@@ -517,16 +535,62 @@ static void cmd_dev_help(void)
 	pr_out("Usage: devlink dev show [ DEV ]\n");
 }
 
+static void __pr_out_handle(const char *bus_name, const char *dev_name)
+{
+	pr_out("%s/%s", bus_name, dev_name);
+}
+
 static void pr_out_handle(struct nlattr **tb)
 {
-	pr_out("%s/%s", mnl_attr_get_str(tb[DEVLINK_ATTR_BUS_NAME]),
+	__pr_out_handle(mnl_attr_get_str(tb[DEVLINK_ATTR_BUS_NAME]),
 			mnl_attr_get_str(tb[DEVLINK_ATTR_DEV_NAME]));
 }
 
+static void __pr_out_port_handle(const char *bus_name, const char *dev_name,
+				 uint32_t port_index)
+{
+	__pr_out_handle(bus_name, dev_name);
+	pr_out("/%d", port_index);
+}
+
 static void pr_out_port_handle(struct nlattr **tb)
 {
-	pr_out_handle(tb);
-	pr_out("/%d", mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_INDEX]));
+	__pr_out_port_handle(mnl_attr_get_str(tb[DEVLINK_ATTR_BUS_NAME]),
+			     mnl_attr_get_str(tb[DEVLINK_ATTR_DEV_NAME]),
+			     mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_INDEX]));
+}
+
+static void __pr_out_port_handle_nice(struct dl *dl, const char *bus_name,
+				      const char *dev_name, uint32_t port_index)
+{
+	char *ifname;
+	int err;
+
+	if (dl->no_nice_names)
+		goto no_nice_names;
+
+	err = ifname_map_rev_lookup(dl, bus_name, dev_name,
+				    port_index, &ifname);
+	if (err)
+		goto no_nice_names;
+	pr_out("%s", ifname);
+	return;
+
+no_nice_names:
+	__pr_out_port_handle(bus_name, dev_name, port_index);
+}
+
+static void pr_out_port_handle_nice(struct dl *dl, struct nlattr **tb)
+{
+	const char *bus_name;
+	const char *dev_name;
+	uint32_t port_index;
+
+	bus_name = mnl_attr_get_str(tb[DEVLINK_ATTR_BUS_NAME]);
+	dev_name = mnl_attr_get_str(tb[DEVLINK_ATTR_DEV_NAME]);
+	port_index = mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_INDEX]);
+
+	__pr_out_port_handle_nice(dl, bus_name, dev_name, port_index);
 }
 
 static void pr_out_dev(struct nlattr **tb)
@@ -867,7 +931,7 @@ static void help(void)
 {
 	pr_out("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "where  OBJECT := { dev | port | monitor }\n"
-	       "       OPTIONS := { -V[ersion] }\n");
+	       "       OPTIONS := { -V[ersion] | -n[no-nice-names] }\n");
 }
 
 static int dl_cmd(struct dl *dl)
@@ -939,6 +1003,7 @@ int main(int argc, char **argv)
 {
 	static const struct option long_options[] = {
 		{ "Version",		no_argument,		NULL, 'V' },
+		{ "no-nice-names",	no_argument,		NULL, 'n' },
 		{ NULL, 0, NULL, 0 }
 	};
 	struct dl *dl;
@@ -946,13 +1011,22 @@ int main(int argc, char **argv)
 	int err;
 	int ret;
 
-	while ((opt = getopt_long(argc, argv, "V",
+	dl = dl_alloc();
+	if (!dl) {
+		pr_err("Failed to allocate memory for devlink\n");
+		return EXIT_FAILURE;
+	}
+
+	while ((opt = getopt_long(argc, argv, "Vn",
 				  long_options, NULL)) >= 0) {
 
 		switch (opt) {
 		case 'V':
 			printf("devlink utility, iproute2-ss%s\n", SNAPSHOT);
 			return EXIT_SUCCESS;
+		case 'n':
+			dl->no_nice_names = true;
+			break;
 		default:
 			pr_err("Unknown option.\n");
 			help();
@@ -963,12 +1037,6 @@ int main(int argc, char **argv)
 	argc -= optind;
 	argv += optind;
 
-	dl = dl_alloc();
-	if (!dl) {
-		pr_err("Failed to allocate memory for devlink\n");
-		return EXIT_FAILURE;
-	}
-
 	err = dl_init(dl, argc, argv);
 	if (err) {
 		ret = EXIT_FAILURE;
diff --git a/man/man8/devlink-dev.8 b/man/man8/devlink-dev.8
index 7878d89..af96a29 100644
--- a/man/man8/devlink-dev.8
+++ b/man/man8/devlink-dev.8
@@ -16,6 +16,7 @@ devlink-dev \- devlink device configuration
 .ti -8
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
+\fB\-n\fR[\fIno-nice-names\fR] }
 
 .ti -8
 .B devlink dev show
diff --git a/man/man8/devlink-port.8 b/man/man8/devlink-port.8
index e6ae686..d78837c 100644
--- a/man/man8/devlink-port.8
+++ b/man/man8/devlink-port.8
@@ -16,6 +16,7 @@ devlink-port \- devlink port configuration
 .ti -8
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
+\fB\-n\fR[\fIno-nice-names\fR] }
 
 .ti -8
 .BR "devlink port set "
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index f608ccc..df00f4f 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -19,6 +19,7 @@ devlink \- Devlink tool
 .ti -8
 .IR OPTIONS " := { "
 \fB\-V\fR[\fIersion\fR] |
+\fB\-n\fR[\fIno-nice-names\fR] }
 
 .SH OPTIONS
 
@@ -28,6 +29,10 @@ Print the version of the
 .B devlink
 utility and exit.
 
+.TP
+.BR "\-n" , " -no-nice-names"
+Turn off printing out nice names, for example netdevice ifnames instead of devlink port identification.
+
 .SS
 .I OBJECT
 
-- 
2.5.5

^ permalink raw reply related
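
The fallback behaviour of __pr_out_port_handle_nice() above (print the ifname if a reverse lookup succeeds and nice names are enabled, otherwise print bus/dev/port) can be sketched standalone. This sketch uses an array instead of the ifname_map_list and formats into a buffer instead of calling pr_out(); struct ifname_entry, rev_lookup() and format_port() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for one struct ifname_map element. */
struct ifname_entry {
	const char *bus_name;
	const char *dev_name;
	uint32_t port_index;
	const char *ifname;
};

/* Return the ifname for bus/dev/port, or NULL if there is no mapping. */
static const char *rev_lookup(const struct ifname_entry *map, size_t n,
			      const char *bus_name, const char *dev_name,
			      uint32_t port_index)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(bus_name, map[i].bus_name) == 0 &&
		    strcmp(dev_name, map[i].dev_name) == 0 &&
		    port_index == map[i].port_index)
			return map[i].ifname;
	}
	return NULL;
}

/* Write the nice name, or "bus/dev/port" when disabled or not found. */
static void format_port(char *buf, size_t len,
			const struct ifname_entry *map, size_t n,
			bool no_nice_names, const char *bus_name,
			const char *dev_name, uint32_t port_index)
{
	const char *ifname = NULL;

	if (!no_nice_names)
		ifname = rev_lookup(map, n, bus_name, dev_name, port_index);
	if (ifname)
		snprintf(buf, len, "%s", ifname);
	else
		snprintf(buf, len, "%s/%s/%u", bus_name, dev_name,
			 (unsigned int)port_index);
}
```

The device and interface names used with such helpers would come from the devlink port dump; any names in a test are made up.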

* [patch iproute2 06/11] devlink: split dl_argv_parse_put to parse and put parts
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

It is handy to have the parsed cmdline data stored so it can be used for
dump filtering. So split the original dl_argv_parse_put into parse and
put parts.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 105 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 69 insertions(+), 36 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 5e08666..0c2132f 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -109,12 +109,28 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 	free(ifname_map);
 }
 
+#define BIT(nr)                 (1UL << (nr))
+#define DL_OPT_HANDLE		BIT(0)
+#define DL_OPT_HANDLEP		BIT(1)
+#define DL_OPT_PORT_TYPE	BIT(2)
+#define DL_OPT_PORT_COUNT	BIT(3)
+
+struct dl_opts {
+	uint32_t present; /* flags of present items */
+	char *bus_name;
+	char *dev_name;
+	uint32_t port_index;
+	enum devlink_port_type port_type;
+	uint32_t port_count;
+};
+
 struct dl {
 	struct mnlg_socket *nlg;
 	struct list_head ifname_map_list;
 	int argc;
 	char **argv;
 	bool no_nice_names;
+	struct dl_opts opts;
 };
 
 static int dl_argc(struct dl *dl)
@@ -347,11 +363,9 @@ static int strtouint32_t(const char *str, uint32_t *p_val)
 	return 0;
 }
 
-static int dl_argv_put_handle(struct nlmsghdr *nlh, struct dl *dl)
+static int dl_argv_handle(struct dl *dl, char **p_bus_name, char **p_dev_name)
 {
 	char *str = dl_argv_next(dl);
-	char *bus_name = bus_name;
-	char *dev_name = dev_name;
 
 	if (!str) {
 		pr_err("Devlink identification (\"bus_name/dev_name\") expected\n");
@@ -363,19 +377,15 @@ static int dl_argv_put_handle(struct nlmsghdr *nlh, struct dl *dl)
 		return -EINVAL;
 	}
 
-	strslashrsplit(str, &bus_name, &dev_name);
-	mnl_attr_put_strz(nlh, DEVLINK_ATTR_BUS_NAME, bus_name);
-	mnl_attr_put_strz(nlh, DEVLINK_ATTR_DEV_NAME, dev_name);
+	strslashrsplit(str, p_bus_name, p_dev_name);
 	return 0;
 }
 
-static int dl_argv_put_handle_port(struct nlmsghdr *nlh, struct dl *dl)
+static int dl_argv_handle_port(struct dl *dl, char **p_bus_name,
+			       char **p_dev_name, uint32_t *p_port_index)
 {
 	char *str = dl_argv_next(dl);
 	unsigned int slash_count;
-	char *bus_name = bus_name;
-	char *dev_name = dev_name;
-	uint32_t port_index = port_index;
 	int err;
 
 	if (!str) {
@@ -394,24 +404,21 @@ static int dl_argv_put_handle_port(struct nlmsghdr *nlh, struct dl *dl)
 		char *portstr = portstr;
 
 		err = strslashrsplit(str, &handlestr, &portstr);
-		err = strtouint32_t(portstr, &port_index);
+		err = strtouint32_t(portstr, p_port_index);
 		if (err) {
 			pr_err("Port index \"%s\" is not a number or not within range\n",
 			       portstr);
 			return err;
 		}
-		strslashrsplit(handlestr, &bus_name, &dev_name);
+		strslashrsplit(handlestr, p_bus_name, p_dev_name);
 	} else if (slash_count == 0) {
-		err = ifname_map_lookup(dl, str, &bus_name, &dev_name,
-					&port_index);
+		err = ifname_map_lookup(dl, str, p_bus_name, p_dev_name,
+					p_port_index);
 		if (err) {
 			pr_err("Netdevice \"%s\" not found\n", str);
 			return err;
 		}
 	}
-	mnl_attr_put_strz(nlh, DEVLINK_ATTR_BUS_NAME, bus_name);
-	mnl_attr_put_strz(nlh, DEVLINK_ATTR_DEV_NAME, dev_name);
-	mnl_attr_put_u32(nlh, DEVLINK_ATTR_PORT_INDEX, port_index);
 	return 0;
 }
 
@@ -460,55 +467,46 @@ static int port_type_get(const char *typestr, enum devlink_port_type *p_type)
 	return 0;
 }
 
-#define BIT(nr)                 (1UL << (nr))
-#define DL_OPT_HANDLE		BIT(0)
-#define DL_OPT_HANDLEP		BIT(1)
-#define DL_OPT_PORT_TYPE	BIT(2)
-#define DL_OPT_PORT_COUNT	BIT(3)
-
-static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
-			     uint32_t o_required, uint32_t o_optional)
+static int dl_argv_parse(struct dl *dl, uint32_t o_required,
+			 uint32_t o_optional)
 {
+	struct dl_opts *opts = &dl->opts;
 	uint32_t o_all = o_required | o_optional;
 	uint32_t o_found = 0;
 	int err;
 
 	if (o_required & DL_OPT_HANDLE) {
-		err = dl_argv_put_handle(nlh, dl);
+		err = dl_argv_handle(dl, &opts->bus_name, &opts->dev_name);
 		if (err)
 			return err;
+		o_found |= DL_OPT_HANDLE;
 	} else if (o_required & DL_OPT_HANDLEP) {
-		err = dl_argv_put_handle_port(nlh, dl);
+		err = dl_argv_handle_port(dl, &opts->bus_name, &opts->dev_name,
+					  &opts->port_index);
 		if (err)
 			return err;
+		o_found |= DL_OPT_HANDLEP;
 	}
 
 	while (dl_argc(dl)) {
 		if (dl_argv_match(dl, "type") &&
 		    (o_all & DL_OPT_PORT_TYPE)) {
-			enum devlink_port_type port_type;
 			const char *typestr;
 
 			dl_arg_inc(dl);
 			err = dl_argv_str(dl, &typestr);
 			if (err)
 				return err;
-			err = port_type_get(typestr, &port_type);
+			err = port_type_get(typestr, &opts->port_type);
 			if (err)
 				return err;
-			mnl_attr_put_u16(nlh, DEVLINK_ATTR_PORT_TYPE,
-					 port_type);
 			o_found |= DL_OPT_PORT_TYPE;
 		} else if (dl_argv_match(dl, "count") &&
 			   (o_all & DL_OPT_PORT_COUNT)) {
-			uint32_t count;
-
 			dl_arg_inc(dl);
-			err = dl_argv_uint32_t(dl, &count);
+			err = dl_argv_uint32_t(dl, &opts->port_count);
 			if (err)
 				return err;
-			mnl_attr_put_u32(nlh, DEVLINK_ATTR_PORT_SPLIT_COUNT,
-					 count);
 			o_found |= DL_OPT_PORT_COUNT;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
@@ -516,6 +514,8 @@ static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
 		}
 	}
 
+	opts->present = o_found;
+
 	if ((o_required & DL_OPT_PORT_TYPE) && !(o_found & DL_OPT_PORT_TYPE)) {
 		pr_err("Port type option expected.\n");
 		return -EINVAL;
@@ -530,6 +530,39 @@ static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
 	return 0;
 }
 
+static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
+{
+	struct dl_opts *opts = &dl->opts;
+
+	if (opts->present & DL_OPT_HANDLE) {
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_BUS_NAME, opts->bus_name);
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_DEV_NAME, opts->dev_name);
+	} else if (opts->present & DL_OPT_HANDLEP) {
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_BUS_NAME, opts->bus_name);
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_DEV_NAME, opts->dev_name);
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_PORT_INDEX,
+				 opts->port_index);
+	}
+	if (opts->present & DL_OPT_PORT_TYPE)
+		mnl_attr_put_u16(nlh, DEVLINK_ATTR_PORT_TYPE,
+				 opts->port_type);
+	if (opts->present & DL_OPT_PORT_COUNT)
+		mnl_attr_put_u32(nlh, DEVLINK_ATTR_PORT_SPLIT_COUNT,
+				 opts->port_count);
+}
+
+static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
+			     uint32_t o_required, uint32_t o_optional)
+{
+	int err;
+
+	err = dl_argv_parse(dl, o_required, o_optional);
+	if (err)
+		return err;
+	dl_opts_put(nlh, dl);
+	return 0;
+}
+
 static void cmd_dev_help(void)
 {
 	pr_out("Usage: devlink dev show [ DEV ]\n");
-- 
2.5.5

^ permalink raw reply related
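
The parse/put split above follows a simple pattern: the parse step stores values and records what it found in a "present" bitmask, and the put step emits only the fields whose bit is set. A reduced sketch of that pattern, with a single "count" option and a text buffer in place of a netlink message, might look as follows. All names here (struct opts, OPT_COUNT, opts_parse(), opts_put()) are hypothetical, and the number parsing is deliberately less strict than devlink's strtouint32_t().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define OPT_COUNT	(1u << 0)

struct opts {
	uint32_t present;	/* flags of present items */
	uint32_t count;
};

/* Parse "count N" pairs into *o; returns 0 on success, -1 on error. */
static int opts_parse(struct opts *o, int argc, char **argv)
{
	uint32_t found = 0;
	int i;

	for (i = 0; i < argc; i++) {
		if (strcmp(argv[i], "count") == 0 && i + 1 < argc) {
			o->count = (uint32_t)strtoul(argv[++i], NULL, 10);
			found |= OPT_COUNT;
		} else {
			return -1;	/* unknown option or missing value */
		}
	}
	o->present = found;
	return 0;
}

/* Emit only the fields marked present; an empty string if none. */
static void opts_put(const struct opts *o, char *buf, size_t len)
{
	if (len == 0)
		return;
	buf[0] = '\0';
	if (o->present & OPT_COUNT)
		snprintf(buf, len, "count=%u", (unsigned int)o->count);
}
```

Because the parsed values stay in the struct after the put step, a later consumer such as a dump filter can read them again without re-parsing the command line.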

* [patch iproute2 03/11] list: add list_add_tail helper
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/list.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/list.h b/include/list.h
index b549c3e..5b529dc 100644
--- a/include/list.h
+++ b/include/list.h
@@ -33,6 +33,11 @@ static inline void list_add(struct list_head *new, struct list_head *head)
 	__list_add(new, head, head->next);
 }
 
+static inline void list_add_tail(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head->prev, head);
+}
+
 static inline void __list_del(struct list_head *prev, struct list_head *next)
 {
 	next->prev = prev;
-- 
2.5.5

^ permalink raw reply related

* [patch iproute2 02/11] list: add list_for_each_entry_reverse macro
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/list.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/list.h b/include/list.h
index cdebe4d..b549c3e 100644
--- a/include/list.h
+++ b/include/list.h
@@ -50,9 +50,15 @@ static inline void list_del(struct list_head *entry)
 #define list_first_entry(ptr, type, member) \
 	list_entry((ptr)->next, type, member)
 
+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
 #define list_next_entry(pos, member) \
 	list_entry((pos)->member.next, typeof(*(pos)), member)
 
+#define list_prev_entry(pos, member) \
+	list_entry((pos)->member.prev, typeof(*(pos)), member)
+
 #define list_for_each_entry(pos, head, member)				\
 	for (pos = list_first_entry(head, typeof(*pos), member);	\
 	     &pos->member != (head);					\
@@ -64,6 +70,11 @@ static inline void list_del(struct list_head *entry)
 	     &pos->member != (head);					\
 	     pos = n, n = list_next_entry(n, member))
 
+#define list_for_each_entry_reverse(pos, head, member)			\
+	for (pos = list_last_entry(head, typeof(*pos), member);		\
+	     &pos->member != (head);					\
+	     pos = list_prev_entry(pos, member))
+
 struct hlist_head {
 	struct hlist_node *first;
 };
-- 
2.5.5

^ permalink raw reply related
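
This patch also has an empty description. The sketch below (not part of
the posted series) shows how the three new macros fit together. The
macro bodies follow the diff, except that __typeof__ is used in place of
typeof so the sketch also builds in strict ISO C modes of GCC and Clang.
struct item, append() and collect_reverse() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the list_head type in include/list.h. */
struct list_head {
	struct list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_last_entry(ptr, type, member) \
	list_entry((ptr)->prev, type, member)

#define list_prev_entry(pos, member) \
	list_entry((pos)->member.prev, __typeof__(*(pos)), member)

/* Walk from the last entry back to the first; stop on reaching the head. */
#define list_for_each_entry_reverse(pos, head, member)			\
	for (pos = list_last_entry(head, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = list_prev_entry(pos, member))

/* Hypothetical payload type used only for this sketch. */
struct item {
	int value;
	struct list_head list;
};

/* Hypothetical helper: link "new" just before the head (at the tail). */
static void append(struct list_head *head, struct list_head *new)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/*
 * Build a list holding 1..count in insertion order, then copy the values
 * into out[] while walking the list backwards. Returns the number of
 * values written.
 */
static int collect_reverse(int count, int *out, int max)
{
	struct item items[8];
	struct list_head head = { &head, &head };	/* empty list */
	struct item *pos;
	int i, n = 0;

	assert(count <= 8 && count <= max);

	for (i = 0; i < count; i++) {
		items[i].value = i + 1;
		append(&head, &items[i].list);
	}

	list_for_each_entry_reverse(pos, &head, list)
		out[n++] = pos->value;

	return n;
}
```

On an empty list the loop body never runs: list_last_entry() yields a
pointer whose list member is the head itself, so the termination test
fails on the first check. As with list_for_each_entry(), the current
entry must not be unlinked inside the loop body.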

* [patch iproute2 01/11] devlink: fix "devlink port" help message
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma
In-Reply-To: <1460706713-5942-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Replace the stale "dl" command name with "devlink" in the
"devlink port" usage text.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index c2da850..39f423a 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -578,9 +578,9 @@ static int cmd_dev(struct dl *dl)
 static void cmd_port_help(void)
 {
 	pr_out("Usage: devlink port show [ DEV/PORT_INDEX ]\n");
-	pr_out("       dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]\n");
-	pr_out("       dl port split DEV/PORT_INDEX count COUNT\n");
-	pr_out("       dl port unsplit DEV/PORT_INDEX\n");
+	pr_out("       devlink port set DEV/PORT_INDEX [ type { eth | ib | auto} ]\n");
+	pr_out("       devlink port split DEV/PORT_INDEX count COUNT\n");
+	pr_out("       devlink port unsplit DEV/PORT_INDEX\n");
 }
 
 static const char *port_type_name(uint32_t type)
-- 
2.5.5

^ permalink raw reply related

* [patch iproute2 00/11] devlink: add support for shared buffer configuration and control
From: Jiri Pirko @ 2016-04-15  7:51 UTC (permalink / raw)
  To: netdev
  Cc: stephen, davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay,
	jhs, john.fastabend, rami.rosen, gospo, sfeldma

From: Jiri Pirko <jiri@mellanox.com>

Jiri Pirko (11):
  devlink: fix "devlink port" help message
  list: add list_for_each_entry_reverse macro
  list: add list_add_tail helper
  devlink: introduce pr_out_port_handle helper
  devlink: introduce helper to print out nice names (ifnames)
  devlink: split dl_argv_parse_put to parse and put parts
  devlink: introduce dump filtering function
  devlink: allow to parse both devlink and port handle in the same time
  devlink: implement shared buffer support
  devlink: implement shared buffer occupancy control
  devlink: add manpage for shared buffer

 devlink/devlink.c          | 1310 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/devlink.h    |   63 +++
 include/list.h             |   16 +
 man/man8/devlink-dev.8     |    2 +
 man/man8/devlink-monitor.8 |    1 +
 man/man8/devlink-port.8    |    2 +
 man/man8/devlink-sb.8      |  313 +++++++++++
 man/man8/devlink.8         |    5 +
 8 files changed, 1636 insertions(+), 76 deletions(-)
 create mode 100644 man/man8/devlink-sb.8

-- 
2.5.5

^ permalink raw reply

* [patch net-next] devlink: fix sb register stub in case devlink is disabled
From: Jiri Pirko @ 2016-04-15  7:17 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, fengguang.wu

From: Jiri Pirko <jiri@mellanox.com>

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: bf7974710a40 ("devlink: add shared buffer configuration")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index be64218..1d45b61 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -184,7 +184,9 @@ static inline void devlink_port_split_set(struct devlink_port *devlink_port,
 static inline int devlink_sb_register(struct devlink *devlink,
 				      unsigned int sb_index, u32 size,
 				      u16 ingress_pools_count,
-				      u16 egress_pools_count, u16 tc_count)
+				      u16 egress_pools_count,
+				      u16 ingress_tc_count,
+				      u16 egress_tc_count)
 {
 	return 0;
 }
-- 
2.5.5

^ permalink raw reply related
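
The fix above has no description beyond its subject, so the sketch below
(not part of the patch) illustrates the general pattern it repairs: a
header offers a real prototype when a feature is built in and a static
inline no-op stub when it is not, and both must take the same
parameters or callers fail to compile in one of the two configurations.
DEMO_NET_DEVLINK, the u16/u32 typedefs and demo_driver_init() are
hypothetical stand-ins; the seven-parameter list is taken from the stub
as it reads after the patch, and it is an assumption here that the real
function uses the same list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's fixed-width types. */
typedef uint16_t u16;
typedef uint32_t u32;

struct devlink;	/* opaque for the purposes of this sketch */

#ifdef DEMO_NET_DEVLINK	/* hypothetical stand-in for the config switch */

/* Feature enabled: the definition lives elsewhere. */
int devlink_sb_register(struct devlink *devlink,
			unsigned int sb_index, u32 size,
			u16 ingress_pools_count,
			u16 egress_pools_count,
			u16 ingress_tc_count,
			u16 egress_tc_count);

#else

/* Feature disabled: a no-op stub with the same parameter list. */
static inline int devlink_sb_register(struct devlink *devlink,
				      unsigned int sb_index, u32 size,
				      u16 ingress_pools_count,
				      u16 egress_pools_count,
				      u16 ingress_tc_count,
				      u16 egress_tc_count)
{
	return 0;
}

#endif

/*
 * Hypothetical caller. It passes seven arguments, so it only compiles
 * in both configurations when the stub and the prototype agree.
 */
static int demo_driver_init(struct devlink *devlink)
{
	return devlink_sb_register(devlink, 0, 1024, 4, 4, 8, 8);
}
```

With the pre-patch stub, which took a single tc_count parameter, a
caller written like demo_driver_init() would pass one argument too many
whenever the feature is disabled, which matches the kind of build
failure an automated build robot would report.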

