Netdev List


* [PATCH v2] net: ibm: emac: support RGMII-[RX|TX]ID phymode
From: Christian Lamparter @ 2017-12-20 16:02 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, Andrew Lunn, Christophe Jaillet

The RGMII spec allows compliance for devices that implement an internal
delay on TXC and/or RXC inside the transmitter. This patch adds the
necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the
emac driver.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>

---
v2: - utilize phy_interface_mode_is_rgmii()
---
 drivers/net/ethernet/ibm/emac/core.c  |  4 ++--
 drivers/net/ethernet/ibm/emac/emac.h  |  3 +++
 drivers/net/ethernet/ibm/emac/rgmii.c | 10 ++++++++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 7feff2450ed6..043e72e28bba 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -199,8 +199,8 @@ static void __emac_set_multicast_list(struct emac_instance *dev);
 
 static inline int emac_phy_supports_gige(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
-		phy_mode == PHY_MODE_RGMII ||
+	return  phy_interface_mode_is_rgmii(phy_mode) ||
+		phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_SGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
diff --git a/drivers/net/ethernet/ibm/emac/emac.h b/drivers/net/ethernet/ibm/emac/emac.h
index 5afcc27ceebb..8c6d2af7281b 100644
--- a/drivers/net/ethernet/ibm/emac/emac.h
+++ b/drivers/net/ethernet/ibm/emac/emac.h
@@ -112,6 +112,9 @@ struct emac_regs {
 #define PHY_MODE_RMII	PHY_INTERFACE_MODE_RMII
 #define PHY_MODE_SMII	PHY_INTERFACE_MODE_SMII
 #define PHY_MODE_RGMII	PHY_INTERFACE_MODE_RGMII
+#define PHY_MODE_RGMII_ID	PHY_INTERFACE_MODE_RGMII_ID
+#define PHY_MODE_RGMII_RXID	PHY_INTERFACE_MODE_RGMII_RXID
+#define PHY_MODE_RGMII_TXID	PHY_INTERFACE_MODE_RGMII_TXID
 #define PHY_MODE_TBI	PHY_INTERFACE_MODE_TBI
 #define PHY_MODE_GMII	PHY_INTERFACE_MODE_GMII
 #define PHY_MODE_RTBI	PHY_INTERFACE_MODE_RTBI
diff --git a/drivers/net/ethernet/ibm/emac/rgmii.c b/drivers/net/ethernet/ibm/emac/rgmii.c
index c4a1ac38bba8..124b0473d2b7 100644
--- a/drivers/net/ethernet/ibm/emac/rgmii.c
+++ b/drivers/net/ethernet/ibm/emac/rgmii.c
@@ -52,9 +52,9 @@
 /* RGMII bridge supports only GMII/TBI and RGMII/RTBI PHYs */
 static inline int rgmii_valid_mode(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
+	return  phy_interface_mode_is_rgmii(phy_mode) ||
+		phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_MII ||
-		phy_mode == PHY_MODE_RGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
 }
@@ -63,6 +63,9 @@ static inline const char *rgmii_mode_name(int mode)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return "RGMII";
 	case PHY_MODE_TBI:
 		return "TBI";
@@ -81,6 +84,9 @@ static inline u32 rgmii_mode_mask(int mode, int input)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return RGMII_FER_RGMII(input);
 	case PHY_MODE_TBI:
 		return RGMII_FER_TBI(input);
-- 
2.15.1
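
For readers unfamiliar with the helper used in v2, the following is a minimal userspace sketch of the range check that phy_interface_mode_is_rgmii() performs, assuming the four RGMII variants are declared contiguously in the mode enum. The enum and function names here are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's phy_interface_t. Only the
 * relative order of the four RGMII entries matters for this sketch.
 */
enum sketch_phy_mode {
	SKETCH_MODE_MII,
	SKETCH_MODE_GMII,
	SKETCH_MODE_RGMII,      /* no internal delay */
	SKETCH_MODE_RGMII_ID,   /* internal delay on both RXC and TXC */
	SKETCH_MODE_RGMII_RXID, /* internal delay on RXC only */
	SKETCH_MODE_RGMII_TXID, /* internal delay on TXC only */
	SKETCH_MODE_TBI,
};

/* True for plain RGMII and for every internal-delay variant. */
bool sketch_mode_is_rgmii(enum sketch_phy_mode mode)
{
	return mode >= SKETCH_MODE_RGMII && mode <= SKETCH_MODE_RGMII_TXID;
}
```

A single range check like this is why the patch can drop the explicit PHY_MODE_RGMII comparison and still cover the three new delay modes.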

^ permalink raw reply related

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Pavel Machek @ 2017-12-20 16:01 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Neftin, Sasha, Keller, Jacob E, bpoirier@suse.com,
	nix.or.die@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	lsorense@csclub.uwaterloo.ca, David Miller
In-Reply-To: <20171220155421.GA3849@amd>

[-- Attachment #1: Type: text/plain, Size: 962 bytes --]

On Wed 2017-12-20 16:54:21, Pavel Machek wrote:
> Hi!
> 
> > >> Before asking to revert 19110cfbb..., please check whether Benjamin's
> > >> follow-up patch works for you: http://patchwork.ozlabs.org/patch/846825/
> 
> > >
> > Pavel, before asking for a revert, let's check Benjamin's patch that follows up on his previous one. The previous patch was not complete, and the latest one completes the changes.
> >
> 
> v4.15-rc4+:
> 
> Ethernet works with 19110cfbb reverted.
> 
> Ethernet works with patchwork.ozlabs.org/patch/846825/ applied.

Hmm. So... ethernet originally did not work with patch/846825/ applied
or 19110cfbb reverted, so I re-plugged ethernet cables. Now it works
even with plain v4.15-rc4+.

So it looks like the bug was fixed in the mainline in the meantime...?

Sorry for the noise,
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Andreas Hartmann @ 2017-12-20 15:56 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <6f75bdf5-839b-8c84-c8be-e83d071b245e@maya.org>

On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
[...]
>> I have been able to reproduce the hang by sending a UFO packet
>> between two guests running v4.13 on a host running v4.15-rc1.
>>
>> The vhost_net_ubuf_ref refcount indeed underflows (to -1) from
>> vhost_zerocopy_callback being called for each segment of a
>> segmented UFO skb. This refcount is decremented then on each
>> segment, but incremented only once for the entire UFO skb.
>>
>> Before v4.14, these packets would be converted in skb_segment to
>> regular copy packets with skb_orphan_frags and the callback function
>> called once at this point. v4.14 added support for reference counted
>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>
>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>> to limit cloning of this state to those skbs that can do so safely.
>>
>> Please try a host with the following patch. This fixes it for me. I intend to
>> send it to net.
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a592ca025fc4..d2d985418819 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> -                       goto err;
>>
>>                 while (pos < offset + len) {
>>                         if (i >= nfrags) {
>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>                                 goto err;
>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> +                               goto err;
>>
>>                         *nskb_frag = *frag;
>>                         __skb_frag_ref(nskb_frag);
>>
>>
>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>> in the frags[] array. I will follow-up with a patch to net-next that only
>> checks once per skb:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 466581cf4cdc..a293a33604ec 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>                                               SKBTX_SHARED_FRAG;
>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>                         goto err;
>>
>>                 while (pos < offset + len) {
>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                                 BUG_ON(!nfrags);
>>
>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>> +                                                      GFP_ATOMIC))
>> +                                       goto err;
>> +
>>                                 list_skb = list_skb->next;
>>                         }
>>
>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>                                 goto err;
>>                         }
>>
>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>> -                               goto err;
>> -
> 
> I'm currently testing this one.
> 

Testing is in progress. I'm testing with 4.14.7, which already contains "net:
accept UFO datagrams from tuntap and packet".

First, I tested an unpatched 4.14.7: the problem (a qemu process that can
no longer be killed) occurred promptly on shutdown of the machine, as
expected.

Next, I applied the above patch (the second one). So far, I have not
seen any further problems on shutdown of VMs. Looks promising.


Thanks,
regards,
Andreas
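
The accounting imbalance described in the quoted analysis can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the function name and parameters are hypothetical, and the real vhost and skb code paths are far more involved. It only shows why one increment per skb combined with one completion callback per segment drives the count negative.

```c
#include <assert.h>

/* Toy model of the completion accounting for one zerocopy UFO skb.
 *
 * nsegs:        number of segments produced by segmentation
 * orphan_first: nonzero models the fixed ordering, where the frags are
 *               converted to regular copies before any zerocopy state
 *               is cloned, so the callback runs once for the whole skb
 *
 * Returns the resulting reference count, starting from zero.
 */
int sketch_refs_after_segmentation(int nsegs, int orphan_first)
{
	int refcount = 0;
	int callbacks;

	assert(nsegs >= 1);

	refcount++;                          /* taken once for the entire skb */
	callbacks = orphan_first ? 1 : nsegs;
	refcount -= callbacks;               /* one decrement per callback */

	return refcount;
}
```

With two segments and no orphaning, the model ends at -1, matching the value reported above; with orphaning first it stays balanced at 0.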

^ permalink raw reply

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Pavel Machek @ 2017-12-20 15:54 UTC (permalink / raw)
  To: Fujinaka, Todd
  Cc: Neftin, Sasha, Keller, Jacob E, bpoirier@suse.com,
	nix.or.die@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	lsorense@csclub.uwaterloo.ca, David Miller
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD69845BBE2@ORSMSX110.amr.corp.intel.com>

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

Hi!

> >> Before asking to revert 19110cfbb..., please check whether Benjamin's
> >> follow-up patch works for you: http://patchwork.ozlabs.org/patch/846825/

> >
> Pavel, before asking for a revert, let's check Benjamin's patch that follows up on his previous one. The previous patch was not complete, and the latest one completes the changes.
>

v4.15-rc4+:

Ethernet works with 19110cfbb reverted.

Ethernet works with patchwork.ozlabs.org/patch/846825/ applied.


									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH v1 2/4] lib/net_utils: Introduce mac_pton_from_user()
From: David Miller @ 2017-12-20 15:51 UTC (permalink / raw)
  To: gregkh; +Cc: andriy.shevchenko, netdev, Larry.Finger, florian.c.schilhabel,
	devel
In-Reply-To: <20171220071355.GB1957@kroah.com>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 20 Dec 2017 08:13:55 +0100

> On Tue, Dec 19, 2017 at 09:14:10PM +0200, Andy Shevchenko wrote:
>> Some drivers are getting MAC from user space. Make a helper for them.
>> 
>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> ---
>>  include/linux/kernel.h |  1 +
>>  lib/net_utils.c        | 12 ++++++++++++
>>  2 files changed, 13 insertions(+)
> 
> Don't do this just for some horrid staging drivers.  They can just drop
> that functionality entirely and use the "normal" way of doing this if
> they really want it.

Agreed.

^ permalink raw reply

* [PATCH net v2] openvswitch: Fix pop_vlan action for double tagged frames
From: Eric Garver @ 2017-12-20 15:39 UTC (permalink / raw)
  To: netdev; +Cc: ovs-dev, Jiri Benc

skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames, but skb->protocol is set to the ethertype by
key_extract(). So temporarily set it to the TPID when doing a pop_vlan.

Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets")
Signed-off-by: Eric Garver <e@erig.me>
---
 net/openvswitch/actions.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 30a5df27116e..c484e0941047 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -280,6 +280,13 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
 static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 {
 	int err;
+	__be16 proto = skb->protocol;
+
+	/* skb->protocol is set to the inner most parsed ethertype. To satisfy
+	 * skb_vlan_pop() for multi-tagged frames we must set it to the tpid.
+	 */
+	if (is_flow_key_valid(key) && key->eth.vlan.tci && key->eth.cvlan.tci)
+		skb->protocol = key->eth.cvlan.tpid;
 
 	err = skb_vlan_pop(skb);
 	if (skb_vlan_tag_present(skb)) {
@@ -288,6 +295,9 @@ static int pop_vlan(struct sk_buff *skb, struct sw_flow_key *key)
 		key->eth.vlan.tci = 0;
 		key->eth.vlan.tpid = 0;
 	}
+
+	skb->protocol = proto;
+
 	return err;
 }
 
-- 
2.12.0
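
The save, override, and restore pattern in this patch can be sketched in userspace with a toy frame model. Everything below is hypothetical: the struct, the constants' names, and sketch_vlan_pop() are simplified stand-ins that only model the case where the outer tag is held out of band, and they assume (as the commit message states) that the pop routine only moves the next tag when the protocol field holds a TPID.

```c
#include <stdint.h>

#define SKETCH_TPID_8021Q 0x8100
#define SKETCH_ETH_IPV4   0x0800

/* Toy frame: outer tag held out of band, inner tags still in the data. */
struct sketch_frame {
	uint16_t protocol;   /* what the parser left in the protocol field */
	int hw_tag_present;  /* 1 if an out-of-band (outer) tag is set */
	int inline_tags;     /* number of tags remaining in the packet data */
};

/* Simplified stand-in for the pop routine: drops the outer tag, then
 * promotes the next inline tag only if protocol looks like a TPID.
 */
void sketch_vlan_pop(struct sketch_frame *f)
{
	f->hw_tag_present = 0;
	if (f->protocol != SKETCH_TPID_8021Q || f->inline_tags == 0)
		return;
	f->inline_tags--;
	f->hw_tag_present = 1;
}

/* The pattern from the patch: temporarily present the TPID, pop, and
 * restore the parsed ethertype afterwards.
 */
void sketch_pop_vlan_action(struct sketch_frame *f, uint16_t inner_tpid)
{
	uint16_t saved = f->protocol;

	if (f->inline_tags > 0)
		f->protocol = inner_tpid;

	sketch_vlan_pop(f);
	f->protocol = saved;
}
```

In this model, calling sketch_vlan_pop() directly on a double-tagged frame whose protocol is an ethertype leaves the inner tag stranded in the data, while the wrapped action promotes it and leaves the protocol field unchanged for later processing.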

^ permalink raw reply related

* Re: [PATCH net v2] ipv4: Fix use-after-free when flushing FIB tables
From: Alexander Duyck @ 2017-12-20 15:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Netdev, David Miller, Duyck, Alexander H, David Ahern,
	Fengguang Wu, mlxsw
In-Reply-To: <20171220085156.27991-1-idosch@mellanox.com>

On Wed, Dec 20, 2017 at 12:51 AM, Ido Schimmel <idosch@mellanox.com> wrote:
> Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the
> local table uses the same trie allocated for the main table when custom
> rules are not in use.
>
> When a net namespace is dismantled, the main table is flushed and freed
> (via an RCU callback) before the local table. In case the callback is
> invoked before the local table is iterated, a use-after-free can occur.
>
> Fix this by iterating over the FIB tables in reverse order, so that the
> main table is always freed after the local table.
>
> v2: Add a comment to make the fix more explicit per Dave's and Alex's
> feedback.
>
> Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> ---
>  net/ipv4/fib_frontend.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index f52d27a422c3..08ce7f14ede1 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1298,14 +1298,17 @@ static int __net_init ip_fib_net_init(struct net *net)
>
>  static void ip_fib_net_exit(struct net *net)
>  {
> -       unsigned int i;
> +       int i;
>
>         rtnl_lock();
>  #ifdef CONFIG_IP_MULTIPLE_TABLES
>         RCU_INIT_POINTER(net->ipv4.fib_main, NULL);
>         RCU_INIT_POINTER(net->ipv4.fib_default, NULL);
>  #endif
> -       for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
> +       /* The local table must be destroyed before the main table,
> +        * as it might be using main's trie.
> +        */

I think we might want even more description here. Specifically why
reversing the order allows local to be destroyed before main. I was
thinking something along the lines of:

Destroy the tables in reverse order to guarantee that the local table,
ID 255, is destroyed before main table, ID 254. This is necessary as
local may contain references to data contained in main.

> +       for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) {
>                 struct hlist_head *head = &net->ipv4.fib_table_hash[i];
>                 struct hlist_node *tmp;
>                 struct fib_table *tb;
> --
> 2.14.3
>
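
Two details of the quoted hunk can be checked with a small userspace sketch: why the loop counter becomes a signed int, and why walking the hash slots downward reaches the local table (ID 255) before the main table (ID 254). The slot computation below assumes the table ID is masked with the hash size minus one, and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed slot computation: table ID masked by a power-of-two size. */
unsigned int sketch_slot(unsigned int table_id, unsigned int hashsz)
{
	assert(hashsz != 0 && (hashsz & (hashsz - 1)) == 0);
	return table_id & (hashsz - 1);
}

/* Visit slots from last to first and record the order. The counter is
 * signed on purpose: with an unsigned counter, "i >= 0" is always true
 * and the loop would never terminate.
 */
int sketch_reverse_visit(int *order, int nslots)
{
	int i;
	int n = 0;

	for (i = nslots - 1; i >= 0; i--)
		order[n++] = i;

	return n;
}
```

Under that assumption, ID 255 lands in a higher slot than ID 254 for both a 256-entry and a 2-entry hash, so a descending walk frees local first in either configuration.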

^ permalink raw reply

* Re: [patch iproute2] tc: add -bs option for batch mode
From: Stephen Hemminger @ 2017-12-20 15:17 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev@vger.kernel.org, gerlitz.or@gmail.com
In-Reply-To: <VI1PR0501MB21433CE21CEF8212A8BBE21AAB0C0@VI1PR0501MB2143.eurprd05.prod.outlook.com>

On Wed, 20 Dec 2017 09:23:34 +0000
Chris Mi <chrism@mellanox.com> wrote:

> > Your real performance win is just not asking for ACK for every rule.  
> No. Even if batch_size > 1, we ack every rule. The real performance win is
> to send multiple rules in one system call. If we are not asking for ACK for every rule,
> the performance will be improved further.

Try the no ACK method.

When we were optimizing routing daemons like Quagga, it was discovered
that an ACK for every route insert was the main bottleneck. Doing asynchronous
error handling got a bigger win than your batching.

Please try that; sending multiple messages using iov is not necessary.

^ permalink raw reply

* Re: [PATCH bpf-next 1/8] bpf: offload: don't require rtnl for dev list manipulation
From: Kirill Tkhai @ 2017-12-20 15:00 UTC (permalink / raw)
  To: Jakub Kicinski, netdev, alexei.starovoitov, daniel; +Cc: oss-drivers
In-Reply-To: <20171220041006.25629-2-jakub.kicinski@netronome.com>

Hi, Jakub,

thanks for looking into this.

Sadly, __bpf_prog_offload_destroy() still needs rtnl_lock() context,
but the rwsem is still worthwhile, as it becomes useful for the next patches
in the series.

Please see one minor nit near the last hunk. Everything else looks good
to me.

On 20.12.2017 07:09, Jakub Kicinski wrote:
> We only need to hold rtnl_lock() around ndo calls.  The device
> offload initialization doesn't require it.  Neither will the
> soon-to-come querying of the offload info.  Use struct rw_semaphore
> because map offload will require sleeping with the semaphore
> held for read.
> 
> Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>

> ---
>  kernel/bpf/offload.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> index 8455b89d1bbf..b88e5ebdc61d 100644
> --- a/kernel/bpf/offload.c
> +++ b/kernel/bpf/offload.c
> @@ -20,8 +20,12 @@
>  #include <linux/netdevice.h>
>  #include <linux/printk.h>
>  #include <linux/rtnetlink.h>
> +#include <linux/rwsem.h>
>  
> -/* protected by RTNL */
> +/* Protects bpf_prog_offload_devs and offload members of all progs.
> + * RTNL lock cannot be taken when holding this lock.
> + */
> +static struct rw_semaphore bpf_devs_lock;
>  static LIST_HEAD(bpf_prog_offload_devs);
>  
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
> @@ -43,17 +47,21 @@ int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr)
>  	offload->prog = prog;
>  	init_waitqueue_head(&offload->verifier_done);
>  
> -	rtnl_lock();
> +	/* Our UNREGISTER notifier will grab bpf_devs_lock, so we are safe
> +	 * to assume the netdev doesn't get unregistered as long as we hold
> +	 * bpf_devs_lock.
> +	 */
> +	down_write(&bpf_devs_lock);
>  	offload->netdev = __dev_get_by_index(net, attr->prog_ifindex);
>  	if (!offload->netdev) {
> -		rtnl_unlock();
> +		up_write(&bpf_devs_lock);
>  		kfree(offload);
>  		return -EINVAL;
>  	}
>  
>  	prog->aux->offload = offload;
>  	list_add_tail(&offload->offloads, &bpf_prog_offload_devs);
> -	rtnl_unlock();
> +	up_write(&bpf_devs_lock);
>  
>  	return 0;
>  }
> @@ -126,7 +134,9 @@ void bpf_prog_offload_destroy(struct bpf_prog *prog)
>  	wake_up(&offload->verifier_done);
>  
>  	rtnl_lock();
> +	down_write(&bpf_devs_lock);
>  	__bpf_prog_offload_destroy(prog);
> +	up_write(&bpf_devs_lock);
>  	rtnl_unlock();
>  
>  	kfree(offload);
> @@ -181,11 +191,13 @@ static int bpf_offload_notification(struct notifier_block *notifier,
>  		if (netdev->reg_state != NETREG_UNREGISTERING)
>  			break;
>  
> +		down_write(&bpf_devs_lock);
>  		list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
>  					 offloads) {
>  			if (offload->netdev == netdev)
>  				__bpf_prog_offload_destroy(offload->prog);
>  		}
> +		up_write(&bpf_devs_lock);
>  		break;
>  	default:
>  		break;
> @@ -199,6 +211,7 @@ static struct notifier_block bpf_offload_notifier = {
>  
>  static int __init bpf_offload_init(void)
>  {
> +	init_rwsem(&bpf_devs_lock);

DECLARE_RWSEM() could be used instead of this.

>  	register_netdevice_notifier(&bpf_offload_notifier);
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: [PATCH] net: Fix double free and memory corruption in get_net_ns_by_id()
From: Nicolas Dichtel @ 2017-12-20 15:00 UTC (permalink / raw)
  To: Eric W. Biederman, netdev
  Cc: David Miller, ktkhai, security, secalert, eric.dumazet, stephen
In-Reply-To: <87d13aaaqr.fsf@xmission.com>

On 19/12/2017 at 18:27, Eric W. Biederman wrote:
> 
> (I can trivially verify that that idr_remove in cleanup_net happens
>  after the network namespace count has dropped to zero --EWB)
> 
> Function get_net_ns_by_id() does not check for net::count
> after it has found a peer in netns_ids idr.
> 
> It may dereference a peer after its count has already been
> finally decremented. This leads to a double free and memory
> corruption:
> 
> put_net(peer)                                   rtnl_lock()
> atomic_dec_and_test(&peer->count) [count=0]     ...
> __put_net(peer)                                 get_net_ns_by_id(net, id)
>   spin_lock(&cleanup_list_lock)
>   list_add(&net->cleanup_list, &cleanup_list)
>   spin_unlock(&cleanup_list_lock)
> queue_work()                                      peer = idr_find(&net->netns_ids, id)
>   |                                               get_net(peer) [count=1]
>   |                                               ...
>   |                                               (use after final put)
>   v                                               ...
>   cleanup_net()                                   ...
>     spin_lock(&cleanup_list_lock)                 ...
>     list_replace_init(&cleanup_list, ..)          ...
>     spin_unlock(&cleanup_list_lock)               ...
>     ...                                           ...
>     ...                                           put_net(peer)
>     ...                                             atomic_dec_and_test(&peer->count) [count=0]
>     ...                                               spin_lock(&cleanup_list_lock)
>     ...                                               list_add(&net->cleanup_list, &cleanup_list)
>     ...                                               spin_unlock(&cleanup_list_lock)
>     ...                                             queue_work()
>     ...                                           rtnl_unlock()
>     rtnl_lock()                                   ...
>     for_each_net(tmp) {                           ...
>       id = __peernet2id(tmp, peer)                ...
>       spin_lock_irq(&tmp->nsid_lock)              ...
>       idr_remove(&tmp->netns_ids, id)             ...
>       ...                                         ...
>       net_drop_ns()                               ...
> 	net_free(peer)                            ...
>     }                                             ...
>   |
>   v
>   cleanup_net()
>     ...
>     (Second free of peer)
> 
> Also, put_net() on the right cpu may be reordered with the left cpu's
> list_replace_init(&cleanup_list, ..), and then cleanup_list
> will be corrupted.
> 
> Since cleanup_net() is executed in a worker thread, while
> put_net(peer) can happen anywhere, there should be
> enough time for a concurrent get_net_ns_by_id() to pick
> the peer up, so the race does not seem unlikely.
> The patch fixes the problem in the standard way.
> 
> (Also, there is a possible problem in peernet2id_alloc(), which requires
> a check of net::count under nsid_lock and maybe_get_net(peer), but
> in the current stable kernel it is used under rtnl_lock() and it has to be
> safe. Open vSwitch began to use peernet2id_alloc(), and possibly it should
> be fixed too. As this is not in a stable kernel yet, I'll send
> a separate message to netdev@ later).
> 
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids"
> Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Good catch, thank you.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
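
The diff itself is not quoted in this reply, so the following is only a sketch of the general technique the commit message alludes to: taking a reference only if the count has not already reached zero. It uses C11 atomics in userspace, and the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a network namespace. */
struct sketch_obj {
	atomic_int count;
};

/* Take a reference only if the object is still alive. Returns false
 * when the count has already dropped to zero, in which case the
 * caller must treat the lookup as a miss and not touch the object.
 */
bool sketch_get_unless_zero(struct sketch_obj *obj)
{
	int old = atomic_load(&obj->count);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->count, &old, old + 1))
			return true;
	}

	return false;
}
```

Compared with an unconditional increment, this never resurrects an object from zero, which is the step that leads to the second put and the double free in the timeline above.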

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: add vsock_test test suite
From: Jorgen S. Hansen @ 2017-12-20 14:48 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev@vger.kernel.org, Dexuan Cui
In-Reply-To: <20171213144911.6428-1-stefanha@redhat.com>

> On Dec 13, 2017, at 3:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> The vsock_diag.ko module already has a test suite but the core AF_VSOCK
> functionality has no tests.  This patch series adds several test cases that
> exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept,
> half-closed connections, simultaneous connections).
> 
> The test suite is modest but I hope to cover additional cases in the future.
> My goal is to have a shared test suite so VMCI, Hyper-V, and KVM can ensure
> that our transports behave the same.
> 
> I have tested virtio-vsock.
> 
> Jorgen: Please test the VMCI transport and let me know if anything needs to be
> adjusted.  See tools/testing/vsock/README for information on how to run the
> test suite.
> 

I tried running the vsock_test on VMCI, and all the tests failed in one way or
another:
1) connection reset test: when the guest tries to connect to the host, we
  get EINVAL as the error instead of ECONNRESET. I’ll fix that.
2) client close and server close tests: On the host side, VMCI doesn’t
  support reading data from a socket that has been closed by the
  guest. When the guest closes a connection, all data is gone, and
  we return EOF on the host side. So the tests that try to read data
  after close, should not attempt that on VMCI host side. I got the
  tests to pass by adding a getsockname call to determine if
  the local CID was the host CID, and then skip the read attempt
  in that case. We could add a vmci flag, that would enable
  this behavior.
3) send_byte(fd, -EPIPE): for the VMCI transport, the close
  isn’t necessarily visible immediately on the peer. So in most
  cases, these send operations would complete with success.
  I was running these tests using nested virtualization, so I
  suspect that the problem is more likely to occur here, but
  I had to add a sleep to be sure to get the EPIPE error.
4) server close test: the connect would sometimes fail. This looks
  like an issue where we detect the peer close on the client side
  before we complete the connection handshake on the client
  side. There are two different channels used for the connection
  handshake and the disconnect. I’ll look into this to see what
  exactly is going on.
5) multiple connections tests: with the standard socket sizes,
  VMCI is only able to support about 100 concurrent stream
  connections so this test passes with MULTICONN_NFDS
  set to 100.

Thanks,
Jorgen

^ permalink raw reply

* Re: [PATCH v1] net: bonding: Replace mac address parsing
From: Andy Gospodarek @ 2017-12-20 14:25 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Jay Vosburgh, Veaceslav Falico, netdev
In-Reply-To: <20171219182044.5678-1-andriy.shevchenko@linux.intel.com>

On Tue, Dec 19, 2017 at 08:20:44PM +0200, Andy Shevchenko wrote:
> Replace sscanf() with mac_pton().
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Nice cleanup.  Thanks!

Acked-by: Andy Gospodarek <andy@greyhouse.net>

> ---
>  drivers/net/bonding/bond_options.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index 8a9b085c2a98..58c705f24f96 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -1431,13 +1431,9 @@ static int bond_option_ad_actor_system_set(struct bonding *bond,
>  {
>  	u8 macaddr[ETH_ALEN];
>  	u8 *mac;
> -	int i;
>  
>  	if (newval->string) {
> -		i = sscanf(newval->string, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
> -			   &macaddr[0], &macaddr[1], &macaddr[2],
> -			   &macaddr[3], &macaddr[4], &macaddr[5]);
> -		if (i != ETH_ALEN)
> +		if (!mac_pton(newval->string, macaddr))
>  			goto err;
>  		mac = macaddr;
>  	} else {
> -- 
> 2.15.1
> 
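
To illustrate what a strict parser buys over the removed sscanf() call, here is a userspace sketch of a MAC address parser that insists on two hex digits per octet and ':' separators. It is not the kernel's mac_pton() implementation; the names are hypothetical and the exact acceptance rules are this sketch's own.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int sketch_hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse "xx:xx:xx:xx:xx:xx" into mac[]. Returns true on success and
 * leaves mac[] untouched on failure.
 */
bool sketch_mac_parse(const char *s, unsigned char *mac)
{
	unsigned char tmp[SKETCH_ETH_ALEN];
	int i;

	if (strlen(s) < (size_t)(3 * SKETCH_ETH_ALEN - 1))
		return false;

	for (i = 0; i < SKETCH_ETH_ALEN; i++) {
		int hi = sketch_hex_val(s[i * 3]);
		int lo = sketch_hex_val(s[i * 3 + 1]);

		if (hi < 0 || lo < 0)
			return false;
		if (i != SKETCH_ETH_ALEN - 1 && s[i * 3 + 2] != ':')
			return false;
		tmp[i] = (unsigned char)((hi << 4) | lo);
	}

	memcpy(mac, tmp, sizeof(tmp));
	return true;
}
```

A "%hhx:%hhx:..." format would accept short forms such as "1:2:3:4:5:6", whereas this sketch rejects them, which is one reason a dedicated helper makes the accepted input format easier to reason about.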

^ permalink raw reply

* Re: [PATCH v3,net-next 2/2] ip6_gre: fix potential memory leak in ip6erspan_rcv
From: William Tu @ 2017-12-20 14:11 UTC (permalink / raw)
  To: Haishuang Yan
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Linux Kernel Network Developers, linux-kernel
In-Reply-To: <1513735621-21913-3-git-send-email-yanhaishuang@cmss.chinamobile.com>

On Tue, Dec 19, 2017 at 6:07 PM, Haishuang Yan
<yanhaishuang@cmss.chinamobile.com> wrote:
> If md is NULL, tun_dst must be freed, otherwise it will cause memory
> leak.
>
> Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
>
> ---
> Changes since v3:
>   * Rebase on latest master branch.
>   * Fix wrong commit information.
> ---
>  net/ipv6/ip6_gre.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 9bd1103..45038a9 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -550,8 +550,10 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
>
>                         info = &tun_dst->u.tun_info;
>                         md = ip_tunnel_info_opts(info);
> -                       if (!md)
> +                       if (!md) {
> +                               dst_release((struct dst_entry *)tun_dst);
>                                 return PACKET_REJECT;
isn't md allocated previously at
    tun_dst = ipv6_tun_rx_dst(skb, flags, tun_id,
                                           sizeof(*md));
so md should never be null after we check tun_dst?
William

^ permalink raw reply

* [PATCH net-next] net: packet: allow bind to device which is !IFF_UP
From: yuan linyu @ 2017-12-20 13:20 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, yuan linyu

From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

This tries to allow tcpdump to capture packets once the device becomes IFF_UP.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
 net/packet/af_packet.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index da215e5..11b19fc 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3124,13 +3124,8 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
 	if (proto == 0 || !need_rehook)
 		goto out_unlock;
 
-	if (!unlisted && (!dev || (dev->flags & IFF_UP))) {
+	if (!unlisted)
 		register_prot_hook(sk);
-	} else {
-		sk->sk_err = ENETDOWN;
-		if (!sock_flag(sk, SOCK_DEAD))
-			sk->sk_error_report(sk);
-	}
 
 out_unlock:
 	rcu_read_unlock();
-- 
2.7.4

^ permalink raw reply related

* Re: sparc64 verifier failures..
From: Daniel Borkmann @ 2017-12-20 13:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, alexei.starovoitov
In-Reply-To: <20171219.153636.783486472609136549.davem@davemloft.net>

On 12/19/2017 09:36 PM, David Miller wrote:
> 
> I'm getting about 100 verifier failures on sparc64.
> 
> The vast majority of them seem to be due to misaligned packet
> accesses.  Here is a sample of some of the failures.

Thanks, I'll check it in the next few days. It would probably make sense to
have a --strict-align mode for test_verifier that enables it on all tests
unconditionally, so the selftests suite would always run in both modes (at
least on archs that don't have this restriction).

^ permalink raw reply

* [PATCH bpf-next 2/2] bpf: allow for correlation of maps and helpers in dump
From: Daniel Borkmann @ 2017-12-20 12:42 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: netdev, jakub.kicinski, quentin.monnet, Daniel Borkmann
In-Reply-To: <20171220124257.4512-1-daniel@iogearbox.net>

Currently a dump of an xlated prog (post verifier stage) doesn't
correlate used helpers as well as maps. The prog info lists
involved map ids, however there's no correlation of where in the
program they are used as of today. Likewise, bpftool does not
correlate helper calls with the target functions.

The latter can be done w/o any kernel changes through kallsyms,
and also has the advantage that this works with inlined helpers
and BPF calls.
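
As a rough illustration of the kallsyms-based correlation described
above, the following sketch resolves a call's imm to a symbol name by
adding it to the address of __bpf_call_base and looking the result up
in a sorted table. struct ksym, ksym_sort() and resolve_call() are
hypothetical names for this sketch and not part of the patch; a real
tool would fill the table from /proc/kallsyms.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory symbol table entry; a real tool would fill
 * this from /proc/kallsyms.
 */
struct ksym {
	unsigned long addr;
	const char *name;
};

/* Three-way compare without subtraction, so that a large address
 * difference cannot be truncated when narrowed to int.
 */
static int ksym_cmp(const void *a, const void *b)
{
	unsigned long x = ((const struct ksym *)a)->addr;
	unsigned long y = ((const struct ksym *)b)->addr;

	return (x > y) - (x < y);
}

/* Sort once so that lookups can use bsearch(). */
static void ksym_sort(struct ksym *tab, size_t n)
{
	qsort(tab, n, sizeof(*tab), ksym_cmp);
}

/* Format the call target into buf: the symbol name if the address
 * call_base + imm is present in the table, the raw address otherwise.
 */
static const char *resolve_call(const struct ksym *tab, size_t n,
				unsigned long call_base, int imm,
				char *buf, size_t len)
{
	struct ksym key = { .addr = call_base + (long)imm, .name = NULL };
	const struct ksym *hit;

	assert(buf && len > 0);
	hit = bsearch(&key, tab, n, sizeof(*tab), ksym_cmp);
	if (hit)
		snprintf(buf, len, "%s", hit->name);
	else
		snprintf(buf, len, "0x%lx", key.addr);
	return buf;
}
```

With a made-up base address B for __bpf_call_base and a table entry at
B + 73424 named bpf_map_lookup_elem, resolve_call(tab, n, B, 73424,
buf, sizeof(buf)) yields that name; a negative imm resolves a symbol
below the base, and an address without an entry is printed as hex.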

Example, via interpreter:

  # tc filter show dev foo ingress
  filter protocol all pref 49152 bpf chain 0
  filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
                      direct-action not_in_hw id 1 tag c74773051b364165   <-- prog id:1

  * Output before patch (calls/maps remain unclear):

  # bpftool prog dump xlated id 1             <-- dump prog id:1
   0: (b7) r1 = 2
   1: (63) *(u32 *)(r10 -4) = r1
   2: (bf) r2 = r10
   3: (07) r2 += -4
   4: (18) r1 = 0xffff95c47a8d4800
   6: (85) call unknown#73040
   7: (15) if r0 == 0x0 goto pc+18
   8: (bf) r2 = r10
   9: (07) r2 += -4
  10: (bf) r1 = r0
  11: (85) call unknown#73040
  12: (15) if r0 == 0x0 goto pc+23
  [...]

  * Output after patch:

  # bpftool prog dump xlated id 1
   0: (b7) r1 = 2
   1: (63) *(u32 *)(r10 -4) = r1
   2: (bf) r2 = r10
   3: (07) r2 += -4
   4: (18) r1 = map[id:2]                     <-- map id:2
   6: (85) call bpf_map_lookup_elem#73424     <-- helper call
   7: (15) if r0 == 0x0 goto pc+18
   8: (bf) r2 = r10
   9: (07) r2 += -4
  10: (bf) r1 = r0
  11: (85) call bpf_map_lookup_elem#73424
  12: (15) if r0 == 0x0 goto pc+23
  [...]

  # bpftool map show id 2                     <-- show/dump/etc map id:2
  2: hash_of_maps  flags 0x0
        key 4B  value 4B  max_entries 3  memlock 4096B

Example, JITed, same prog:

  # tc filter show dev foo ingress
  filter protocol all pref 49152 bpf chain 0
  filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
                  direct-action not_in_hw id 3 tag c74773051b364165 jited

  # bpftool prog show id 3
  3: sched_cls  tag c74773051b364165
        loaded_at Dec 19/13:48  uid 0
        xlated 384B  jited 257B  memlock 4096B  map_ids 2

  # bpftool prog dump xlated id 3
   0: (b7) r1 = 2
   1: (63) *(u32 *)(r10 -4) = r1
   2: (bf) r2 = r10
   3: (07) r2 += -4
   4: (18) r1 = map[id:2]                      <-- map id:2
   6: (85) call __htab_map_lookup_elem#77408   <-+ inlined rewrite
   7: (15) if r0 == 0x0 goto pc+2                |
   8: (07) r0 += 56                              |
   9: (79) r0 = *(u64 *)(r0 +0)                <-+
  10: (15) if r0 == 0x0 goto pc+24
  11: (bf) r2 = r10
  12: (07) r2 += -4
  [...]

Example, same prog, but kallsyms disabled (in that case we are
also not allowed to pass any relative offsets, etc, so prog
becomes pointer sanitized on dump):

  # sysctl kernel.kptr_restrict=2
  kernel.kptr_restrict = 2

  # bpftool prog dump xlated id 3
   0: (b7) r1 = 2
   1: (63) *(u32 *)(r10 -4) = r1
   2: (bf) r2 = r10
   3: (07) r2 += -4
   4: (18) r1 = map[id:2]
   6: (85) call bpf_unspec#0
   7: (15) if r0 == 0x0 goto pc+2
  [...]

Example, BPF calls via interpreter:

  # bpftool prog dump xlated id 1
   0: (85) call pc+2#__bpf_prog_run_args32
   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

Example, BPF calls via JIT:

  # sysctl net.core.bpf_jit_enable=1
  net.core.bpf_jit_enable = 1
  # sysctl net.core.bpf_jit_kallsyms=1
  net.core.bpf_jit_kallsyms = 1

  # bpftool prog dump xlated id 1
   0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F
   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

And finally, an example for tail calls that is now working
as well wrt correlation:

  # bpftool prog dump xlated id 2
  [...]
  10: (b7) r2 = 8
  11: (85) call bpf_trace_printk#-41312
  12: (bf) r1 = r6
  13: (18) r2 = map[id:1]
  15: (b7) r3 = 0
  16: (85) call bpf_tail_call#12
  17: (b7) r1 = 42
  18: (6b) *(u16 *)(r6 +46) = r1
  19: (b7) r0 = 0
  20: (95) exit

  # bpftool map show id 1
  1: prog_array  flags 0x0
        key 4B  value 4B  max_entries 1  memlock 4096B

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/filter.h   |   9 +++
 kernel/bpf/core.c        |   4 +-
 kernel/bpf/disasm.c      |  65 ++++++++++++++---
 kernel/bpf/disasm.h      |  29 ++++++--
 kernel/bpf/syscall.c     |  87 +++++++++++++++++++++--
 kernel/bpf/verifier.c    |  30 ++++++--
 tools/bpf/bpftool/prog.c | 181 ++++++++++++++++++++++++++++++++++++++++++++---
 7 files changed, 370 insertions(+), 35 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index e872b4e..2b0df27 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -18,6 +18,7 @@
 #include <linux/capability.h>
 #include <linux/cryptohash.h>
 #include <linux/set_memory.h>
+#include <linux/kallsyms.h>
 
 #include <net/sch_generic.h>
 
@@ -724,6 +725,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog);
 void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_helper_changes_pkt_data(void *func);
 
+static inline bool bpf_dump_raw_ok(void)
+{
+	/* Reconstruction of call-sites is dependent on kallsyms,
+	 * thus make dump the same restriction.
+	 */
+	return kallsyms_show_value() == 1;
+}
+
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 768e0a0..70a5345 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -771,7 +771,9 @@ struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *prog)
 
 /* Base function for offset calculation. Needs to go into .text section,
  * therefore keeping it non-static as well; will also be used by JITs
- * anyway later on, so do not let the compiler omit it.
+ * anyway later on, so do not let the compiler omit it. This also needs
+ * to go into kallsyms for correlation from e.g. bpftool, so naming
+ * must not change.
  */
 noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index 883f88f..8740406 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -21,10 +21,39 @@ static const char * const func_id_str[] = {
 };
 #undef __BPF_FUNC_STR_FN
 
-const char *func_id_name(int id)
+static const char *__func_get_name(const struct bpf_insn_cbs *cbs,
+				   const struct bpf_insn *insn,
+				   char *buff, size_t len)
 {
 	BUILD_BUG_ON(ARRAY_SIZE(func_id_str) != __BPF_FUNC_MAX_ID);
 
+	if (insn->src_reg != BPF_PSEUDO_CALL &&
+	    insn->imm >= 0 && insn->imm < __BPF_FUNC_MAX_ID &&
+	    func_id_str[insn->imm])
+		return func_id_str[insn->imm];
+
+	if (cbs && cbs->cb_call)
+		return cbs->cb_call(cbs->private_data, insn);
+
+	if (insn->src_reg == BPF_PSEUDO_CALL)
+		snprintf(buff, len, "%+d", insn->imm);
+
+	return buff;
+}
+
+static const char *__func_imm_name(const struct bpf_insn_cbs *cbs,
+				   const struct bpf_insn *insn,
+				   u64 full_imm, char *buff, size_t len)
+{
+	if (cbs && cbs->cb_imm)
+		return cbs->cb_imm(cbs->private_data, insn, full_imm);
+
+	snprintf(buff, len, "0x%llx", (unsigned long long)full_imm);
+	return buff;
+}
+
+const char *func_id_name(int id)
+{
 	if (id >= 0 && id < __BPF_FUNC_MAX_ID && func_id_str[id])
 		return func_id_str[id];
 	else
@@ -83,7 +112,7 @@ static const char *const bpf_jmp_string[16] = {
 	[BPF_EXIT >> 4] = "exit",
 };
 
-static void print_bpf_end_insn(bpf_insn_print_cb verbose,
+static void print_bpf_end_insn(bpf_insn_print_t verbose,
 			       struct bpf_verifier_env *env,
 			       const struct bpf_insn *insn)
 {
@@ -92,9 +121,12 @@ static void print_bpf_end_insn(bpf_insn_print_cb verbose,
 		insn->imm, insn->dst_reg);
 }
 
-void print_bpf_insn(bpf_insn_print_cb verbose, struct bpf_verifier_env *env,
-		    const struct bpf_insn *insn, bool allow_ptr_leaks)
+void print_bpf_insn(const struct bpf_insn_cbs *cbs,
+		    struct bpf_verifier_env *env,
+		    const struct bpf_insn *insn,
+		    bool allow_ptr_leaks)
 {
+	const bpf_insn_print_t verbose = cbs->cb_print;
 	u8 class = BPF_CLASS(insn->code);
 
 	if (class == BPF_ALU || class == BPF_ALU64) {
@@ -175,12 +207,15 @@ void print_bpf_insn(bpf_insn_print_cb verbose, struct bpf_verifier_env *env,
 			 */
 			u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
 			bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
+			char tmp[64];
 
 			if (map_ptr && !allow_ptr_leaks)
 				imm = 0;
 
-			verbose(env, "(%02x) r%d = 0x%llx\n", insn->code,
-				insn->dst_reg, (unsigned long long)imm);
+			verbose(env, "(%02x) r%d = %s\n",
+				insn->code, insn->dst_reg,
+				__func_imm_name(cbs, insn, imm,
+						tmp, sizeof(tmp)));
 		} else {
 			verbose(env, "BUG_ld_%02x\n", insn->code);
 			return;
@@ -189,12 +224,20 @@ void print_bpf_insn(bpf_insn_print_cb verbose, struct bpf_verifier_env *env,
 		u8 opcode = BPF_OP(insn->code);
 
 		if (opcode == BPF_CALL) {
-			if (insn->src_reg == BPF_PSEUDO_CALL)
-				verbose(env, "(%02x) call pc%+d\n", insn->code,
-					insn->imm);
-			else
+			char tmp[64];
+
+			if (insn->src_reg == BPF_PSEUDO_CALL) {
+				verbose(env, "(%02x) call pc%s\n",
+					insn->code,
+					__func_get_name(cbs, insn,
+							tmp, sizeof(tmp)));
+			} else {
+				strcpy(tmp, "unknown");
 				verbose(env, "(%02x) call %s#%d\n", insn->code,
-					func_id_name(insn->imm), insn->imm);
+					__func_get_name(cbs, insn,
+							tmp, sizeof(tmp)),
+					insn->imm);
+			}
 		} else if (insn->code == (BPF_JMP | BPF_JA)) {
 			verbose(env, "(%02x) goto pc%+d\n",
 				insn->code, insn->off);
diff --git a/kernel/bpf/disasm.h b/kernel/bpf/disasm.h
index 8de977e..e0857d0 100644
--- a/kernel/bpf/disasm.h
+++ b/kernel/bpf/disasm.h
@@ -17,16 +17,35 @@
 #include <linux/bpf.h>
 #include <linux/kernel.h>
 #include <linux/stringify.h>
+#ifndef __KERNEL__
+#include <stdio.h>
+#include <string.h>
+#endif
+
+struct bpf_verifier_env;
 
 extern const char *const bpf_alu_string[16];
 extern const char *const bpf_class_string[8];
 
 const char *func_id_name(int id);
 
-struct bpf_verifier_env;
-typedef void (*bpf_insn_print_cb)(struct bpf_verifier_env *env,
-				  const char *, ...);
-void print_bpf_insn(bpf_insn_print_cb verbose, struct bpf_verifier_env *env,
-		    const struct bpf_insn *insn, bool allow_ptr_leaks);
+typedef void (*bpf_insn_print_t)(struct bpf_verifier_env *env,
+				 const char *, ...);
+typedef const char *(*bpf_insn_revmap_call_t)(void *private_data,
+					      const struct bpf_insn *insn);
+typedef const char *(*bpf_insn_print_imm_t)(void *private_data,
+					    const struct bpf_insn *insn,
+					    __u64 full_imm);
+
+struct bpf_insn_cbs {
+	bpf_insn_print_t	cb_print;
+	bpf_insn_revmap_call_t	cb_call;
+	bpf_insn_print_imm_t	cb_imm;
+	void			*private_data;
+};
 
+void print_bpf_insn(const struct bpf_insn_cbs *cbs,
+		    struct bpf_verifier_env *env,
+		    const struct bpf_insn *insn,
+		    bool allow_ptr_leaks);
 #endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 30e728d..007802c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1558,6 +1558,67 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	return fd;
 }
 
+static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
+					      unsigned long addr)
+{
+	int i;
+
+	for (i = 0; i < prog->aux->used_map_cnt; i++)
+		if (prog->aux->used_maps[i] == (void *)addr)
+			return prog->aux->used_maps[i];
+	return NULL;
+}
+
+static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
+{
+	const struct bpf_map *map;
+	struct bpf_insn *insns;
+	u64 imm;
+	int i;
+
+	insns = kmemdup(prog->insnsi, bpf_prog_insn_size(prog),
+			GFP_USER);
+	if (!insns)
+		return insns;
+
+	for (i = 0; i < prog->len; i++) {
+		if (insns[i].code == (BPF_JMP | BPF_TAIL_CALL)) {
+			insns[i].code = BPF_JMP | BPF_CALL;
+			insns[i].imm = BPF_FUNC_tail_call;
+			/* fall-through */
+		}
+		if (insns[i].code == (BPF_JMP | BPF_CALL) ||
+		    insns[i].code == (BPF_JMP | BPF_CALL_ARGS)) {
+			if (insns[i].code == (BPF_JMP | BPF_CALL_ARGS))
+				insns[i].code = BPF_JMP | BPF_CALL;
+			if (!bpf_dump_raw_ok())
+				insns[i].imm = 0;
+			continue;
+		}
+
+		if (insns[i].code != (BPF_LD | BPF_IMM | BPF_DW))
+			continue;
+
+		imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
+		map = bpf_map_from_imm(prog, imm);
+		if (map) {
+			insns[i].src_reg = BPF_PSEUDO_MAP_FD;
+			insns[i].imm = map->id;
+			insns[i + 1].imm = 0;
+			continue;
+		}
+
+		if (!bpf_dump_raw_ok() &&
+		    imm == (unsigned long)prog->aux) {
+			insns[i].imm = 0;
+			insns[i + 1].imm = 0;
+			continue;
+		}
+	}
+
+	return insns;
+}
+
 static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 				   const union bpf_attr *attr,
 				   union bpf_attr __user *uattr)
@@ -1608,18 +1669,34 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 	ulen = info.jited_prog_len;
 	info.jited_prog_len = prog->jited_len;
 	if (info.jited_prog_len && ulen) {
-		uinsns = u64_to_user_ptr(info.jited_prog_insns);
-		ulen = min_t(u32, info.jited_prog_len, ulen);
-		if (copy_to_user(uinsns, prog->bpf_func, ulen))
-			return -EFAULT;
+		if (bpf_dump_raw_ok()) {
+			uinsns = u64_to_user_ptr(info.jited_prog_insns);
+			ulen = min_t(u32, info.jited_prog_len, ulen);
+			if (copy_to_user(uinsns, prog->bpf_func, ulen))
+				return -EFAULT;
+		} else {
+			info.jited_prog_insns = 0;
+		}
 	}
 
 	ulen = info.xlated_prog_len;
 	info.xlated_prog_len = bpf_prog_insn_size(prog);
 	if (info.xlated_prog_len && ulen) {
+		struct bpf_insn *insns_sanitized;
+		bool fault;
+
+		if (prog->blinded && !bpf_dump_raw_ok()) {
+			info.xlated_prog_insns = 0;
+			goto done;
+		}
+		insns_sanitized = bpf_insn_prepare_dump(prog);
+		if (!insns_sanitized)
+			return -ENOMEM;
 		uinsns = u64_to_user_ptr(info.xlated_prog_insns);
 		ulen = min_t(u32, info.xlated_prog_len, ulen);
-		if (copy_to_user(uinsns, prog->insnsi, ulen))
+		fault = copy_to_user(uinsns, insns_sanitized, ulen);
+		kfree(insns_sanitized);
+		if (fault)
 			return -EFAULT;
 	}
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 811e6e9..f423726 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4426,9 +4426,12 @@ static int do_check(struct bpf_verifier_env *env)
 		}
 
 		if (env->log.level) {
+			const struct bpf_insn_cbs cbs = {
+				.cb_print	= verbose,
+			};
+
 			verbose(env, "%d: ", insn_idx);
-			print_bpf_insn(verbose, env, insn,
-				       env->allow_ptr_leaks);
+			print_bpf_insn(&cbs, env, insn, env->allow_ptr_leaks);
 		}
 
 		err = ext_analyzer_insn_hook(env, insn_idx, prev_insn_idx);
@@ -5016,14 +5019,14 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 {
 	struct bpf_prog *prog = env->prog, **func, *tmp;
 	int i, j, subprog_start, subprog_end = 0, len, subprog;
-	struct bpf_insn *insn = prog->insnsi;
+	struct bpf_insn *insn;
 	void *old_bpf_func;
 	int err = -ENOMEM;
 
 	if (env->subprog_cnt == 0)
 		return 0;
 
-	for (i = 0; i < prog->len; i++, insn++) {
+	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
 		if (insn->code != (BPF_JMP | BPF_CALL) ||
 		    insn->src_reg != BPF_PSEUDO_CALL)
 			continue;
@@ -5115,6 +5118,25 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		bpf_prog_lock_ro(func[i]);
 		bpf_prog_kallsyms_add(func[i]);
 	}
+
+	/* Last step: make now unused interpreter insns from main
+	 * prog consistent for later dump requests, so they can
+	 * later look the same as if they were interpreted only.
+	 */
+	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
+		unsigned long addr;
+
+		if (insn->code != (BPF_JMP | BPF_CALL) ||
+		    insn->src_reg != BPF_PSEUDO_CALL)
+			continue;
+		insn->off = env->insn_aux_data[i].call_imm;
+		subprog = find_subprog(env, i + insn->off + 1);
+		addr  = (unsigned long)func[subprog + 1]->bpf_func;
+		addr &= PAGE_MASK;
+		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
+			    addr - __bpf_call_base;
+	}
+
 	prog->jited = 1;
 	prog->bpf_func = func[0]->bpf_func;
 	prog->aux->func = func;
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 037484c..42ee889 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -401,6 +401,88 @@ static int do_show(int argc, char **argv)
 	return err;
 }
 
+#define SYM_MAX_NAME	256
+
+struct kernel_sym {
+	unsigned long address;
+	char name[SYM_MAX_NAME];
+};
+
+struct dump_data {
+	unsigned long address_call_base;
+	struct kernel_sym *sym_mapping;
+	__u32 sym_count;
+	char scratch_buff[SYM_MAX_NAME];
+};
+
+static int kernel_syms_cmp(const void *sym_a, const void *sym_b)
+{
+	return ((struct kernel_sym *)sym_a)->address -
+	       ((struct kernel_sym *)sym_b)->address;
+}
+
+static void kernel_syms_load(struct dump_data *dd)
+{
+	struct kernel_sym *sym;
+	char buff[256];
+	void *tmp, *address;
+	FILE *fp;
+
+	fp = fopen("/proc/kallsyms", "r");
+	if (!fp)
+		return;
+
+	while (!feof(fp)) {
+		if (!fgets(buff, sizeof(buff), fp))
+			break;
+		tmp = realloc(dd->sym_mapping,
+			      (dd->sym_count + 1) *
+			      sizeof(*dd->sym_mapping));
+		if (!tmp) {
+out:
+			free(dd->sym_mapping);
+			dd->sym_mapping = NULL;
+			fclose(fp);
+			return;
+		}
+		dd->sym_mapping = tmp;
+		sym = &dd->sym_mapping[dd->sym_count];
+		if (sscanf(buff, "%p %*c %s", &address, sym->name) != 2)
+			continue;
+		sym->address = (unsigned long)address;
+		if (!strcmp(sym->name, "__bpf_call_base")) {
+			dd->address_call_base = sym->address;
+			/* sysctl kernel.kptr_restrict was set */
+			if (!sym->address)
+				goto out;
+		}
+		if (sym->address)
+			dd->sym_count++;
+	}
+
+	fclose(fp);
+
+	qsort(dd->sym_mapping, dd->sym_count,
+	      sizeof(*dd->sym_mapping), kernel_syms_cmp);
+}
+
+static void kernel_syms_destroy(struct dump_data *dd)
+{
+	free(dd->sym_mapping);
+}
+
+static struct kernel_sym *kernel_syms_search(struct dump_data *dd,
+					     unsigned long key)
+{
+	struct kernel_sym sym = {
+		.address = key,
+	};
+
+	return dd->sym_mapping ?
+	       bsearch(&sym, dd->sym_mapping, dd->sym_count,
+		       sizeof(*dd->sym_mapping), kernel_syms_cmp) : NULL;
+}
+
 static void print_insn(struct bpf_verifier_env *env, const char *fmt, ...)
 {
 	va_list args;
@@ -410,8 +492,71 @@ static void print_insn(struct bpf_verifier_env *env, const char *fmt, ...)
 	va_end(args);
 }
 
-static void dump_xlated_plain(void *buf, unsigned int len, bool opcodes)
+static const char *print_call_pcrel(struct dump_data *dd,
+				    struct kernel_sym *sym,
+				    unsigned long address,
+				    const struct bpf_insn *insn)
 {
+	if (sym)
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "%+d#%s", insn->off, sym->name);
+	else
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "%+d#0x%lx", insn->off, address);
+	return dd->scratch_buff;
+}
+
+static const char *print_call_helper(struct dump_data *dd,
+				     struct kernel_sym *sym,
+				     unsigned long address)
+{
+	if (sym)
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "%s", sym->name);
+	else
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "0x%lx", address);
+	return dd->scratch_buff;
+}
+
+static const char *print_call(void *private_data,
+			      const struct bpf_insn *insn)
+{
+	struct dump_data *dd = private_data;
+	unsigned long address = dd->address_call_base + insn->imm;
+	struct kernel_sym *sym;
+
+	sym = kernel_syms_search(dd, address);
+	if (insn->src_reg == BPF_PSEUDO_CALL)
+		return print_call_pcrel(dd, sym, address, insn);
+	else
+		return print_call_helper(dd, sym, address);
+}
+
+static const char *print_imm(void *private_data,
+			     const struct bpf_insn *insn,
+			     __u64 full_imm)
+{
+	struct dump_data *dd = private_data;
+
+	if (insn->src_reg == BPF_PSEUDO_MAP_FD)
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "map[id:%u]", insn->imm);
+	else
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "0x%llx", (unsigned long long)full_imm);
+	return dd->scratch_buff;
+}
+
+static void dump_xlated_plain(struct dump_data *dd, void *buf,
+			      unsigned int len, bool opcodes)
+{
+	const struct bpf_insn_cbs cbs = {
+		.cb_print	= print_insn,
+		.cb_call	= print_call,
+		.cb_imm		= print_imm,
+		.private_data	= dd,
+	};
 	struct bpf_insn *insn = buf;
 	bool double_insn = false;
 	unsigned int i;
@@ -425,7 +570,7 @@ static void dump_xlated_plain(void *buf, unsigned int len, bool opcodes)
 		double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW);
 
 		printf("% 4d: ", i);
-		print_bpf_insn(print_insn, NULL, insn + i, true);
+		print_bpf_insn(&cbs, NULL, insn + i, true);
 
 		if (opcodes) {
 			printf("       ");
@@ -454,8 +599,15 @@ static void print_insn_json(struct bpf_verifier_env *env, const char *fmt, ...)
 	va_end(args);
 }
 
-static void dump_xlated_json(void *buf, unsigned int len, bool opcodes)
+static void dump_xlated_json(struct dump_data *dd, void *buf,
+			     unsigned int len, bool opcodes)
 {
+	const struct bpf_insn_cbs cbs = {
+		.cb_print	= print_insn_json,
+		.cb_call	= print_call,
+		.cb_imm		= print_imm,
+		.private_data	= dd,
+	};
 	struct bpf_insn *insn = buf;
 	bool double_insn = false;
 	unsigned int i;
@@ -470,7 +622,7 @@ static void dump_xlated_json(void *buf, unsigned int len, bool opcodes)
 
 		jsonw_start_object(json_wtr);
 		jsonw_name(json_wtr, "disasm");
-		print_bpf_insn(print_insn_json, NULL, insn + i, true);
+		print_bpf_insn(&cbs, NULL, insn + i, true);
 
 		if (opcodes) {
 			jsonw_name(json_wtr, "opcodes");
@@ -505,6 +657,7 @@ static void dump_xlated_json(void *buf, unsigned int len, bool opcodes)
 static int do_dump(int argc, char **argv)
 {
 	struct bpf_prog_info info = {};
+	struct dump_data dd = {};
 	__u32 len = sizeof(info);
 	unsigned int buf_size;
 	char *filepath = NULL;
@@ -592,6 +745,14 @@ static int do_dump(int argc, char **argv)
 		goto err_free;
 	}
 
+	if ((member_len == &info.jited_prog_len &&
+	     info.jited_prog_insns == 0) ||
+	    (member_len == &info.xlated_prog_len &&
+	     info.xlated_prog_insns == 0)) {
+		p_err("error retrieving insn dump: kernel.kptr_restrict set?");
+		goto err_free;
+	}
+
 	if (filepath) {
 		fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
 		if (fd < 0) {
@@ -608,17 +769,19 @@ static int do_dump(int argc, char **argv)
 			goto err_free;
 		}
 	} else {
-		if (member_len == &info.jited_prog_len)
+		if (member_len == &info.jited_prog_len) {
 			disasm_print_insn(buf, *member_len, opcodes);
-		else
+		} else {
+			kernel_syms_load(&dd);
 			if (json_output)
-				dump_xlated_json(buf, *member_len, opcodes);
+				dump_xlated_json(&dd, buf, *member_len, opcodes);
 			else
-				dump_xlated_plain(buf, *member_len, opcodes);
+				dump_xlated_plain(&dd, buf, *member_len, opcodes);
+			kernel_syms_destroy(&dd);
+		}
 	}
 
 	free(buf);
-
 	return 0;
 
 err_free:
-- 
2.9.5


* [PATCH bpf-next 1/2] bpf: fix kallsyms handling for subprogs
From: Daniel Borkmann @ 2017-12-20 12:42 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: netdev, jakub.kicinski, quentin.monnet, Daniel Borkmann
In-Reply-To: <20171220124257.4512-1-daniel@iogearbox.net>

Right now kallsyms handling is not working with JITed subprogs.
The reason is that when they are passed to bpf_prog_kallsyms_add()
in jit_subprogs(), as added by 1c2a088a6626 ("bpf: x64: add JIT
support for multi-function programs"), their prog type is 0, which
the BPF core treats as a cBPF program, since only cBPF programs
have a type of 0. Thus, they need to inherit the type from the main
prog.

Once that is fixed, they are indeed added to the BPF kallsyms
infra, but their tag is 0. Therefore, since the intention is to add
them as bpf_prog_F_<tag>, we need to pass them to bpf_prog_calc_tag()
first. And once this is resolved, there is a use-after-free on
prog cleanup: we remove the kallsyms entry from the main prog,
later walk all subprogs and call bpf_jit_free() on them. However,
the kallsyms linkage was never released on them. Thus, do that
for all subprogs right in __bpf_prog_put() when refcount hits 0.
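
To illustrate why the tag matters for the kallsyms name, here is a
minimal sketch that builds a symbol name from a tag. It assumes an
8-byte tag rendered as hex and the bpf_prog_<tag>_F layout seen in
the JIT example of patch 2/2 (bpf_prog_3b185187f1855c4c_F);
prog_sym_name() and TAG_SIZE are hypothetical names for this sketch,
not kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAG_SIZE 8	/* assumed: 8 tag bytes, i.e. 16 hex digits */

/* Hypothetical helper: write "bpf_prog_<hex tag>_<suffix>" into buf.
 * Returns buf, or NULL if buf is too small for the full name.
 */
static const char *prog_sym_name(const unsigned char tag[TAG_SIZE],
				 const char *suffix,
				 char *buf, size_t len)
{
	static const char prefix[] = "bpf_prog_";
	size_t need, off;
	int i;

	assert(tag && suffix && buf);
	/* prefix + hex digits + '_' + suffix + terminating NUL */
	need = strlen(prefix) + 2 * TAG_SIZE + 1 + strlen(suffix) + 1;
	if (len < need)
		return NULL;

	off = (size_t)snprintf(buf, len, "%s", prefix);
	for (i = 0; i < TAG_SIZE; i++)
		off += (size_t)snprintf(buf + off, len - off, "%02x", tag[i]);
	snprintf(buf + off, len - off, "_%s", suffix);
	return buf;
}
```

Under these assumptions, an all-zero tag would give every subprog the
same name, bpf_prog_0000000000000000_F, which is why the tag has to be
calculated before the subprog is added to kallsyms.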

Fixes: 1c2a088a6626 ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/syscall.c  | 6 ++++++
 kernel/bpf/verifier.c | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e2e1c78..30e728d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -937,10 +937,16 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
+		int i;
+
 		trace_bpf_prog_put_rcu(prog);
 		/* bpf_prog_free_id() must be called first */
 		bpf_prog_free_id(prog, do_idr_lock);
+
+		for (i = 0; i < prog->aux->func_cnt; i++)
+			bpf_prog_kallsyms_del(prog->aux->func[i]);
 		bpf_prog_kallsyms_del(prog);
+
 		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
 	}
 }
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 48b2901..811e6e9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5062,7 +5062,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 			goto out_free;
 		memcpy(func[i]->insnsi, &prog->insnsi[subprog_start],
 		       len * sizeof(struct bpf_insn));
+		func[i]->type = prog->type;
 		func[i]->len = len;
+		if (bpf_prog_calc_tag(func[i]))
+			goto out_free;
 		func[i]->is_func = 1;
 		/* Use bpf_prog_F_tag to indicate functions in stack traces.
 		 * Long term would need debug info to populate names
-- 
2.9.5


* [PATCH bpf-next 0/2] bpftool improvements for xlated dump
From: Daniel Borkmann @ 2017-12-20 12:42 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: netdev, jakub.kicinski, quentin.monnet, Daniel Borkmann

This work adds correlation of maps and calls into the bpftool
xlated dump in order to help debugging and introspection of
loaded BPF progs. The first patch makes kallsyms work on subprogs
with bpf calls, and the second implements the actual correlation.
Details and example output can be found in the 2nd patch.

Thanks a lot!

Daniel Borkmann (2):
  bpf: fix kallsyms handling for subprogs
  bpf: allow for correlation of maps and helpers in dump

 include/linux/filter.h   |   9 +++
 kernel/bpf/core.c        |   4 +-
 kernel/bpf/disasm.c      |  65 ++++++++++++++---
 kernel/bpf/disasm.h      |  29 ++++++--
 kernel/bpf/syscall.c     |  93 ++++++++++++++++++++++--
 kernel/bpf/verifier.c    |  33 +++++++--
 tools/bpf/bpftool/prog.c | 181 ++++++++++++++++++++++++++++++++++++++++++++---
 7 files changed, 379 insertions(+), 35 deletions(-)

-- 
2.9.5


* [patch net-next 10/10] mlxsw: core: Add support for reload
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add support for hot reload. First, all the driver/core resources are
released except for the PCI and devlink instances, then a reset is
performed through the PCI interface. Finally, the driver performs
initialization.

In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.
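
The flow described above can be sketched as a small toy model. Every
name here (struct toy_core, toy_register(), toy_unregister(),
toy_reload()) is hypothetical and only mirrors the described
behaviour; it is not the mlxsw API, and the init_ok parameter merely
simulates whether re-initialization succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; hypothetical, for illustration only. */
struct toy_core {
	bool resources_up;	/* driver/core resources initialized */
	bool devlink_alive;	/* devlink instance still allocated */
	bool reload_fail;	/* last reload left a partial state */
};

static void toy_unregister(struct toy_core *c, bool reload)
{
	if (!c->reload_fail) {
		c->resources_up = false;	/* regular teardown */
		if (reload)
			return;		/* devlink survives a reload */
	}
	/* Removal, or removal after a failed reload: skip the regular
	 * teardown in the latter case and only free the instance.
	 */
	c->devlink_alive = false;
}

static int toy_register(struct toy_core *c, bool reload, bool init_ok)
{
	if (!reload)
		c->devlink_alive = true;	/* allocated on first probe only */
	if (!init_ok)
		return -1;
	c->resources_up = true;
	return 0;
}

static int toy_reload(struct toy_core *c, bool init_ok)
{
	int err;

	assert(c->devlink_alive);	/* reload needs a live instance */
	toy_unregister(c, true);
	/* the device reset would be issued here */
	err = toy_register(c, true, init_ok);
	if (err)
		c->reload_fail = true;
	return err;
}
```

In this model a failed toy_reload() leaves resources_up false and
reload_fail true, so a later toy_unregister(c, false) skips the
regular teardown and only releases the devlink instance.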

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 64 +++++++++++++++++++++++-------
 drivers/net/ethernet/mellanox/mlxsw/core.h |  5 ++-
 drivers/net/ethernet/mellanox/mlxsw/i2c.c  |  5 ++-
 drivers/net/ethernet/mellanox/mlxsw/pci.c  |  5 ++-
 4 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 9fe25b1..4b33919 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -113,6 +113,7 @@ struct mlxsw_core {
 	struct mlxsw_thermal *thermal;
 	struct mlxsw_core_port *ports;
 	unsigned int max_ports;
+	bool reload_fail;
 	unsigned long driver_priv[0];
 	/* driver_priv has to be always the last item */
 };
@@ -962,7 +963,28 @@ mlxsw_devlink_sb_occ_tc_port_bind_get(struct devlink_port *devlink_port,
 						     pool_type, p_cur, p_max);
 }
 
+static int mlxsw_devlink_core_bus_device_reload(struct devlink *devlink)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	const struct mlxsw_bus *mlxsw_bus = mlxsw_core->bus;
+	int err;
+
+	if (!mlxsw_bus->reset)
+		return -EOPNOTSUPP;
+
+	mlxsw_core_bus_device_unregister(mlxsw_core, true);
+	mlxsw_bus->reset(mlxsw_core->bus_priv);
+	err = mlxsw_core_bus_device_register(mlxsw_core->bus_info,
+					     mlxsw_core->bus,
+					     mlxsw_core->bus_priv, true,
+					     devlink);
+	if (err)
+		mlxsw_core->reload_fail = true;
+	return err;
+}
+
 static const struct devlink_ops mlxsw_devlink_ops = {
+	.reload				= mlxsw_devlink_core_bus_device_reload,
 	.port_type_set			= mlxsw_devlink_port_type_set,
 	.port_split			= mlxsw_devlink_port_split,
 	.port_unsplit			= mlxsw_devlink_port_unsplit,
@@ -980,23 +1002,26 @@ static const struct devlink_ops mlxsw_devlink_ops = {
 
 int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 				   const struct mlxsw_bus *mlxsw_bus,
-				   void *bus_priv)
+				   void *bus_priv, bool reload,
+				   struct devlink *devlink)
 {
 	const char *device_kind = mlxsw_bus_info->device_kind;
 	struct mlxsw_core *mlxsw_core;
 	struct mlxsw_driver *mlxsw_driver;
-	struct devlink *devlink;
 	size_t alloc_size;
 	int err;
 
 	mlxsw_driver = mlxsw_core_driver_get(device_kind);
 	if (!mlxsw_driver)
 		return -EINVAL;
-	alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
-	devlink = devlink_alloc(&mlxsw_devlink_ops, alloc_size);
-	if (!devlink) {
-		err = -ENOMEM;
-		goto err_devlink_alloc;
+
+	if (!reload) {
+		alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
+		devlink = devlink_alloc(&mlxsw_devlink_ops, alloc_size);
+		if (!devlink) {
+			err = -ENOMEM;
+			goto err_devlink_alloc;
+		}
 	}
 
 	mlxsw_core = devlink_priv(devlink);
@@ -1012,7 +1037,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_bus_init;
 
-	if (mlxsw_driver->resources_register) {
+	if (mlxsw_driver->resources_register && !reload) {
 		err = mlxsw_driver->resources_register(mlxsw_core);
 		if (err)
 			goto err_register_resources;
@@ -1038,9 +1063,11 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_emad_init;
 
-	err = devlink_register(devlink, mlxsw_bus_info->dev);
-	if (err)
-		goto err_devlink_register;
+	if (!reload) {
+		err = devlink_register(devlink, mlxsw_bus_info->dev);
+		if (err)
+			goto err_devlink_register;
+	}
 
 	err = mlxsw_hwmon_init(mlxsw_core, mlxsw_bus_info, &mlxsw_core->hwmon);
 	if (err)
@@ -1082,20 +1109,29 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 }
 EXPORT_SYMBOL(mlxsw_core_bus_device_register);
 
-void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
+void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
+				      bool reload)
 {
 	const char *device_kind = mlxsw_core->bus_info->device_kind;
 	struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
+	if (mlxsw_core->reload_fail)
+		goto out;
+
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
 	mlxsw_thermal_fini(mlxsw_core->thermal);
-	devlink_unregister(devlink);
+	if (!reload)
+		devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
 	kfree(mlxsw_core->lag.mapping);
 	mlxsw_ports_fini(mlxsw_core);
-	devlink_resources_unregister(devlink, NULL);
+	if (!reload)
+		devlink_resources_unregister(devlink, NULL);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
+	if (reload)
+		return;
+out:
 	devlink_free(devlink);
 	mlxsw_core_driver_put(device_kind);
 }
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index e44061d..5ddafd7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -66,8 +66,9 @@ void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver);
 
 int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 				   const struct mlxsw_bus *mlxsw_bus,
-				   void *bus_priv);
-void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core);
+				   void *bus_priv, bool reload,
+				   struct devlink *devlink);
+void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core, bool reload);
 
 struct mlxsw_tx_info {
 	u8 local_port;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/i2c.c b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
index c0dcfa0..25f9915 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/i2c.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/i2c.c
@@ -539,7 +539,8 @@ static int mlxsw_i2c_probe(struct i2c_client *client,
 	mlxsw_i2c->dev = &client->dev;
 
 	err = mlxsw_core_bus_device_register(&mlxsw_i2c->bus_info,
-					     &mlxsw_i2c_bus, mlxsw_i2c);
+					     &mlxsw_i2c_bus, mlxsw_i2c, false,
+					     NULL);
 	if (err) {
 		dev_err(&client->dev, "Fail to register core bus\n");
 		return err;
@@ -557,7 +558,7 @@ static int mlxsw_i2c_remove(struct i2c_client *client)
 {
 	struct mlxsw_i2c *mlxsw_i2c = i2c_get_clientdata(client);
 
-	mlxsw_core_bus_device_unregister(mlxsw_i2c->core);
+	mlxsw_core_bus_device_unregister(mlxsw_i2c->core, false);
 	mutex_destroy(&mlxsw_i2c->cmd.lock);
 
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 6468cf2..ceb8785 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1734,7 +1734,8 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mlxsw_pci->id = id;
 
 	err = mlxsw_core_bus_device_register(&mlxsw_pci->bus_info,
-					     &mlxsw_pci_bus, mlxsw_pci);
+					     &mlxsw_pci_bus, mlxsw_pci, false,
+					     NULL);
 	if (err) {
 		dev_err(&pdev->dev, "cannot register bus device\n");
 		goto err_bus_device_register;
@@ -1762,7 +1763,7 @@ static void mlxsw_pci_remove(struct pci_dev *pdev)
 {
 	struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev);
 
-	mlxsw_core_bus_device_unregister(mlxsw_pci->core);
+	mlxsw_core_bus_device_unregister(mlxsw_pci->core, false);
 	mlxsw_pci_free_irq_vectors(mlxsw_pci);
 	iounmap(mlxsw_pci->hw_addr);
 	pci_release_regions(mlxsw_pci->pdev);
-- 
2.9.5

* [patch net-next 09/10] mlxsw: pci: Add support for getting resource through devlink
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Until now the KVD partitioning was static. This patch lets the driver
obtain the partition sizes via devlink. If a size is not available from
devlink, the default from the config profile is used.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
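Note for readers: the fallback arithmetic described above can be sketched
in plain C as follows. This is only an illustration under assumptions: the
struct and function names are hypothetical stand-ins, not the driver's
types, and the minimum-size checks from the patch are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the profile fields the patch reads. */
struct kvd_profile {
	uint64_t linear_size;
	uint64_t hash_single_parts;
	uint64_t hash_double_parts;
	uint64_t hash_granularity;
};

struct kvd_sizes {
	uint64_t linear;
	uint64_t hash_double;
	uint64_t hash_single;
};

/* Default split, used when no size was supplied through devlink. */
static int kvd_default_split(uint64_t kvd_size, const struct kvd_profile *p,
			     struct kvd_sizes *out)
{
	uint64_t hash, dbl;

	assert(p->hash_granularity != 0);
	assert(p->hash_single_parts + p->hash_double_parts != 0);

	/* Checked up front here so the subtraction cannot wrap. */
	if (kvd_size < p->linear_size)
		return -1;

	/* The hash part is what is left after the linear part. */
	hash = kvd_size - p->linear_size;

	/* Split by the parts ratio, then round down to the granularity. */
	dbl = hash * p->hash_double_parts /
	      (p->hash_double_parts + p->hash_single_parts);
	dbl -= dbl % p->hash_granularity;

	out->linear = p->linear_size;
	out->hash_double = dbl;
	out->hash_single = hash - dbl;
	return 0;
}
```

With a KVD of 1000 entries, 200 linear entries, a 1:2 single/double ratio
and a granularity of 128, this gives 512 double and 288 single entries.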
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 16 ++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h     |  9 ++++
 drivers/net/ethernet/mellanox/mlxsw/pci.c      | 40 +++++-------------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 57 ++++++++++++++++++++++++++
 4 files changed, 92 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 2488662..9fe25b1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1800,6 +1800,22 @@ void mlxsw_core_flush_owq(void)
 }
 EXPORT_SYMBOL(mlxsw_core_flush_owq);
 
+int mlxsw_core_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
+			     const struct mlxsw_config_profile *profile,
+			     u64 *p_single_size, u64 *p_double_size,
+			     u64 *p_linear_size)
+{
+	struct mlxsw_driver *driver =  mlxsw_core->driver;
+
+	if (!driver->kvd_sizes_get)
+		return -EINVAL;
+
+	return driver->kvd_sizes_get(mlxsw_core, profile,
+				     p_single_size, p_double_size,
+				     p_linear_size);
+}
+EXPORT_SYMBOL(mlxsw_core_kvd_sizes_get);
+
 static int __init mlxsw_core_module_init(void)
 {
 	int err;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index e23f83b..e44061d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -309,10 +309,19 @@ struct mlxsw_driver {
 	void (*txhdr_construct)(struct sk_buff *skb,
 				const struct mlxsw_tx_info *tx_info);
 	int (*resources_register)(struct mlxsw_core *mlxsw_core);
+	int (*kvd_sizes_get)(struct mlxsw_core *mlxsw_core,
+			     const struct mlxsw_config_profile *profile,
+			     u64 *p_single_size, u64 *p_double_size,
+			     u64 *p_linear_size);
 	u8 txhdr_len;
 	const struct mlxsw_config_profile *profile;
 };
 
+int mlxsw_core_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
+			     const struct mlxsw_config_profile *profile,
+			     u64 *p_single_size, u64 *p_double_size,
+			     u64 *p_linear_size);
+
 bool mlxsw_core_res_valid(struct mlxsw_core *mlxsw_core,
 			  enum mlxsw_res_id res_id);
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 39872fb..6468cf2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1053,38 +1053,18 @@ static int mlxsw_pci_resources_query(struct mlxsw_pci *mlxsw_pci, char *mbox,
 }
 
 static int
-mlxsw_pci_profile_get_kvd_sizes(const struct mlxsw_config_profile *profile,
+mlxsw_pci_profile_get_kvd_sizes(const struct mlxsw_pci *mlxsw_pci,
+				const struct mlxsw_config_profile *profile,
 				struct mlxsw_res *res)
 {
-	u32 single_size, double_size, linear_size;
-
-	if (!MLXSW_RES_VALID(res, KVD_SINGLE_MIN_SIZE) ||
-	    !MLXSW_RES_VALID(res, KVD_DOUBLE_MIN_SIZE) ||
-	    !profile->used_kvd_split_data)
-		return -EIO;
-
-	linear_size = profile->kvd_linear_size;
+	u64 single_size, double_size, linear_size;
+	int err;
 
-	/* The hash part is what left of the kvd without the
-	 * linear part. It is split to the single size and
-	 * double size by the parts ratio from the profile.
-	 * Both sizes must be a multiplications of the
-	 * granularity from the profile.
-	 */
-	double_size = MLXSW_RES_GET(res, KVD_SIZE) - linear_size;
-	double_size *= profile->kvd_hash_double_parts;
-	double_size /= profile->kvd_hash_double_parts +
-		       profile->kvd_hash_single_parts;
-	double_size /= profile->kvd_hash_granularity;
-	double_size *= profile->kvd_hash_granularity;
-	single_size = MLXSW_RES_GET(res, KVD_SIZE) - double_size -
-		      linear_size;
-
-	/* Check results are legal. */
-	if (single_size < MLXSW_RES_GET(res, KVD_SINGLE_MIN_SIZE) ||
-	    double_size < MLXSW_RES_GET(res, KVD_DOUBLE_MIN_SIZE) ||
-	    MLXSW_RES_GET(res, KVD_SIZE) < linear_size)
-		return -EIO;
+	err = mlxsw_core_kvd_sizes_get(mlxsw_pci->core, profile,
+				       &single_size, &double_size,
+				       &linear_size);
+	if (err)
+		return err;
 
 	MLXSW_RES_SET(res, KVD_SINGLE_SIZE, single_size);
 	MLXSW_RES_SET(res, KVD_DOUBLE_SIZE, double_size);
@@ -1185,7 +1165,7 @@ static int mlxsw_pci_config_profile(struct mlxsw_pci *mlxsw_pci, char *mbox,
 			mbox, profile->adaptive_routing_group_cap);
 	}
 	if (MLXSW_RES_VALID(res, KVD_SIZE)) {
-		err = mlxsw_pci_profile_get_kvd_sizes(profile, res);
+		err = mlxsw_pci_profile_get_kvd_sizes(mlxsw_pci, profile, res);
 		if (err)
 			return err;
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 68e7d13..ab41438 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4122,6 +4122,62 @@ static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
 	return 0;
 }
 
+static int mlxsw_sp_kvd_sizes_get(struct mlxsw_core *mlxsw_core,
+				  const struct mlxsw_config_profile *profile,
+				  u64 *p_single_size, u64 *p_double_size,
+				  u64 *p_linear_size)
+{
+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
+	u64 double_size;
+	int err;
+
+	if (!MLXSW_CORE_RES_VALID(mlxsw_core, KVD_SINGLE_MIN_SIZE) ||
+	    !MLXSW_CORE_RES_VALID(mlxsw_core, KVD_DOUBLE_MIN_SIZE) ||
+	    !profile->used_kvd_split_data)
+		return -EIO;
+
+	/* The hash part is what is left of the kvd without the
+	 * linear part. It is split into the single size and
+	 * double size by the parts ratio from the profile.
+	 * Both sizes must be multiples of the
+	 * granularity from the profile. In case the user
+	 * provided the sizes they are obtained via devlink.
+	 */
+	err = devlink_resource_size_get(devlink,
+					MLXSW_SP_RESOURCE_KVD_LINEAR,
+					p_linear_size);
+	if (err)
+		*p_linear_size = profile->kvd_linear_size;
+
+	err = devlink_resource_size_get(devlink,
+					MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
+					p_double_size);
+	if (err) {
+		double_size = MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE) -
+			      *p_linear_size;
+		double_size *= profile->kvd_hash_double_parts;
+		double_size /= profile->kvd_hash_double_parts +
+			       profile->kvd_hash_single_parts;
+		*p_double_size = rounddown(double_size,
+					   profile->kvd_hash_granularity);
+	}
+
+	err = devlink_resource_size_get(devlink,
+					MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
+					p_single_size);
+	if (err)
+		*p_single_size = MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE) -
+				 *p_double_size - *p_linear_size;
+
+	/* Check results are legal. */
+	if (*p_single_size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_SINGLE_MIN_SIZE) ||
+	    *p_double_size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_DOUBLE_MIN_SIZE) ||
+	    MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE) < *p_linear_size)
+		return -EIO;
+
+	return 0;
+}
+
 static struct mlxsw_driver mlxsw_sp_driver = {
 	.kind				= mlxsw_sp_driver_name,
 	.priv_size			= sizeof(struct mlxsw_sp),
@@ -4142,6 +4198,7 @@ static struct mlxsw_driver mlxsw_sp_driver = {
 	.sb_occ_tc_port_bind_get	= mlxsw_sp_sb_occ_tc_port_bind_get,
 	.txhdr_construct		= mlxsw_sp_txhdr_construct,
 	.resources_register		= mlxsw_sp_resources_register,
+	.kvd_sizes_get			= mlxsw_sp_kvd_sizes_get,
 	.txhdr_len			= MLXSW_TXHDR_LEN,
 	.profile			= &mlxsw_sp_config_profile,
 };
-- 
2.9.5

* [patch net-next 08/10] mlxsw: spectrum: Add support for getting kvdl occupancy
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add support for getting the kvdl occupancy through the resource interface.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
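Note for readers: the occupancy calculation can be sketched outside the
kernel roughly as below. It assumes one usage bit per allocation unit and
uses a byte array in place of the kernel bitmap helpers; the struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified partition: one usage bit per allocation unit. */
struct kvdl_part_sketch {
	unsigned int start_index;
	unsigned int end_index;		/* inclusive */
	unsigned int alloc_size;	/* entries per allocation unit */
	const uint8_t *usage;		/* bit i set => unit i is in use */
};

/* Occupancy in entries: every used unit accounts for alloc_size entries. */
static uint64_t kvdl_part_occ_sketch(const struct kvdl_part_sketch *part)
{
	unsigned int nr_units, i;
	uint64_t occ = 0;

	assert(part->alloc_size != 0);
	nr_units = (part->end_index - part->start_index + 1) /
		   part->alloc_size;
	for (i = 0; i < nr_units; i++)
		if (part->usage[i / 8] & (1u << (i % 8)))
			occ += part->alloc_size;
	return occ;
}
```

The total occupancy is then the sum of this value over all partitions.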
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  9 ++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_kvdl.c    | 26 ++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 6e615ae..68e7d13 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4043,12 +4043,21 @@ mlxsw_sp_resource_kvd_hash_double_size_validate(struct devlink *devlink, u64 siz
 	return 0;
 }
 
+static u64 mlxsw_sp_resource_kvd_linear_occ_get(struct devlink *devlink)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
+
+	return mlxsw_sp_kvdl_occ_get(mlxsw_sp);
+}
+
 static struct devlink_resource_ops mlxsw_sp_resource_kvd_ops = {
 	.size_validate = mlxsw_sp_resource_kvd_size_validate,
 };
 
 static struct devlink_resource_ops mlxsw_sp_resource_kvd_linear_ops = {
 	.size_validate = mlxsw_sp_resource_kvd_linear_size_validate,
+	.occ_get = mlxsw_sp_resource_kvd_linear_occ_get,
 };
 
 static struct devlink_resource_ops mlxsw_sp_resource_kvd_hash_single_ops = {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 5abc8c5..ff8d32b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -469,6 +469,7 @@ void mlxsw_sp_kvdl_free(struct mlxsw_sp *mlxsw_sp, int entry_index);
 int mlxsw_sp_kvdl_alloc_size_query(struct mlxsw_sp *mlxsw_sp,
 				   unsigned int entry_count,
 				   unsigned int *p_alloc_size);
+u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp);
 
 struct mlxsw_sp_acl_rule_info {
 	unsigned int priority;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
index 310c382..cfacc17 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c
@@ -286,6 +286,32 @@ static void mlxsw_sp_kvdl_parts_fini(struct mlxsw_sp *mlxsw_sp)
 		mlxsw_sp_kvdl_part_fini(mlxsw_sp, i);
 }
 
+u64 mlxsw_sp_kvdl_part_occ(struct mlxsw_sp_kvdl_part *part)
+{
+	unsigned int nr_entries;
+	int bit = -1;
+	u64 occ = 0;
+
+	nr_entries = (part->info->end_index -
+		      part->info->start_index + 1) /
+		      part->info->alloc_size;
+	while ((bit = find_next_bit(part->usage, nr_entries, bit + 1))
+		< nr_entries)
+		occ += part->info->alloc_size;
+	return occ;
+}
+
+u64 mlxsw_sp_kvdl_occ_get(const struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_kvdl_part *part;
+	u64 occ = 0;
+
+	list_for_each_entry(part, &mlxsw_sp->kvdl->parts_list, list)
+		occ += mlxsw_sp_kvdl_part_occ(part);
+
+	return occ;
+}
+
 int mlxsw_sp_kvdl_init(struct mlxsw_sp *mlxsw_sp)
 {
 	struct mlxsw_sp_kvdl *kvdl;
-- 
2.9.5

* [patch net-next 06/10] mlxsw: spectrum: Register KVD resources with devlink
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Register the KVD resources with devlink. The KVD is a memory resource
that is subdivided into three partitions: linear, hash single and hash
double.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
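Note for readers: the size checks applied to the hash partitions boil
down to two conditions, sketched here in standalone C. The function name
and the flat parameter list are assumptions made for illustration; the
patch itself reads the granularity from the profile and the minimum from
the core resources.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* A requested size must be a multiple of the granularity and must not
 * be smaller than the partition's minimum size.
 */
static int kvd_hash_size_validate_sketch(uint64_t size, uint64_t granularity,
					 uint64_t min_size)
{
	assert(granularity != 0);

	if (size % granularity)
		return -EINVAL;	/* wrong granularity */
	if (size < min_size)
		return -EINVAL;	/* smaller than minimum */
	return 0;
}
```

The linear partition would only need the granularity check, and the
top-level KVD resource rejects every resize request.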
 drivers/net/ethernet/mellanox/mlxsw/core.c     |   9 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h     |   1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 135 +++++++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  12 +++
 4 files changed, 157 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index f3315bc..2488662 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1012,6 +1012,12 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_bus_init;
 
+	if (mlxsw_driver->resources_register) {
+		err = mlxsw_driver->resources_register(mlxsw_core);
+		if (err)
+			goto err_register_resources;
+	}
+
 	err = mlxsw_ports_init(mlxsw_core);
 	if (err)
 		goto err_ports_init;
@@ -1067,6 +1073,8 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 err_ports_init:
 	mlxsw_bus->fini(bus_priv);
 err_bus_init:
+	devlink_resources_unregister(devlink, NULL);
+err_register_resources:
 	devlink_free(devlink);
 err_devlink_alloc:
 	mlxsw_core_driver_put(device_kind);
@@ -1086,6 +1094,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 	mlxsw_emad_fini(mlxsw_core);
 	kfree(mlxsw_core->lag.mapping);
 	mlxsw_ports_fini(mlxsw_core);
+	devlink_resources_unregister(devlink, NULL);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	devlink_free(devlink);
 	mlxsw_core_driver_put(device_kind);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 34dda96..e23f83b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -308,6 +308,7 @@ struct mlxsw_driver {
 				       u32 *p_cur, u32 *p_max);
 	void (*txhdr_construct)(struct sk_buff *skb,
 				const struct mlxsw_tx_info *tx_info);
+	int (*resources_register)(struct mlxsw_core *mlxsw_core);
 	u8 txhdr_len;
 	const struct mlxsw_config_profile *profile;
 };
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d373df7..6e615ae 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3979,6 +3979,140 @@ static const struct mlxsw_config_profile mlxsw_sp_config_profile = {
 	.resource_query_enable		= 1,
 };
 
+static bool
+mlxsw_sp_resource_kvd_granularity_validate(struct netlink_ext_ack *extack,
+					   u64 size)
+{
+	const struct mlxsw_config_profile *profile;
+
+	profile = &mlxsw_sp_config_profile;
+	if (size % profile->kvd_hash_granularity) {
+		NL_SET_ERR_MSG_MOD(extack, "resource set with wrong granularity");
+		return false;
+	}
+	return true;
+}
+
+static int
+mlxsw_sp_resource_kvd_size_validate(struct devlink *devlink, u64 size,
+				    struct netlink_ext_ack *extack)
+{
+	NL_SET_ERR_MSG_MOD(extack, "kvd size cannot be changed");
+	return -EINVAL;
+}
+
+static int
+mlxsw_sp_resource_kvd_linear_size_validate(struct devlink *devlink, u64 size,
+					   struct netlink_ext_ack *extack)
+{
+	if (!mlxsw_sp_resource_kvd_granularity_validate(extack, size))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int
+mlxsw_sp_resource_kvd_hash_single_size_validate(struct devlink *devlink, u64 size,
+						struct netlink_ext_ack *extack)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+
+	if (!mlxsw_sp_resource_kvd_granularity_validate(extack, size))
+		return -EINVAL;
+
+	if (size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_SINGLE_MIN_SIZE)) {
+		NL_SET_ERR_MSG_MOD(extack, "hash single size is smaller than minimum");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int
+mlxsw_sp_resource_kvd_hash_double_size_validate(struct devlink *devlink, u64 size,
+						struct netlink_ext_ack *extack)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+
+	if (!mlxsw_sp_resource_kvd_granularity_validate(extack, size))
+		return -EINVAL;
+
+	if (size < MLXSW_CORE_RES_GET(mlxsw_core, KVD_DOUBLE_MIN_SIZE)) {
+		NL_SET_ERR_MSG_MOD(extack, "hash double size is smaller than minimum");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_ops = {
+	.size_validate = mlxsw_sp_resource_kvd_size_validate,
+};
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_linear_ops = {
+	.size_validate = mlxsw_sp_resource_kvd_linear_size_validate,
+};
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_hash_single_ops = {
+	.size_validate = mlxsw_sp_resource_kvd_hash_single_size_validate,
+};
+
+static struct devlink_resource_ops mlxsw_sp_resource_kvd_hash_double_ops = {
+	.size_validate = mlxsw_sp_resource_kvd_hash_double_size_validate,
+};
+
+static int mlxsw_sp_resources_register(struct mlxsw_core *mlxsw_core)
+{
+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
+	u32 kvd_size, single_size, double_size, linear_size;
+	const struct mlxsw_config_profile *profile;
+	int err;
+
+	profile = &mlxsw_sp_config_profile;
+	if (!MLXSW_CORE_RES_VALID(mlxsw_core, KVD_SIZE))
+		return -EIO;
+
+	kvd_size = MLXSW_CORE_RES_GET(mlxsw_core, KVD_SIZE);
+	err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD,
+					true, kvd_size,
+					MLXSW_SP_RESOURCE_KVD,
+					DEVLINK_RESOURCE_ID_PARENT_TOP,
+					&mlxsw_sp_resource_kvd_ops);
+	if (err)
+		return err;
+
+	linear_size = profile->kvd_linear_size;
+	err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD_LINEAR,
+					false, linear_size,
+					MLXSW_SP_RESOURCE_KVD_LINEAR,
+					MLXSW_SP_RESOURCE_KVD,
+					&mlxsw_sp_resource_kvd_linear_ops);
+	if (err)
+		return err;
+
+	double_size = kvd_size - linear_size;
+	double_size *= profile->kvd_hash_double_parts;
+	double_size /= profile->kvd_hash_double_parts +
+		       profile->kvd_hash_single_parts;
+	double_size = rounddown(double_size, profile->kvd_hash_granularity);
+	err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD_HASH_DOUBLE,
+					false, double_size,
+					MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
+					MLXSW_SP_RESOURCE_KVD,
+					&mlxsw_sp_resource_kvd_hash_double_ops);
+	if (err)
+		return err;
+
+	single_size = kvd_size - double_size - linear_size;
+	err = devlink_resource_register(devlink, MLXSW_SP_RESOURCE_NAME_KVD_HASH_SINGLE,
+					false, single_size,
+					MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
+					MLXSW_SP_RESOURCE_KVD,
+					&mlxsw_sp_resource_kvd_hash_single_ops);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 static struct mlxsw_driver mlxsw_sp_driver = {
 	.kind				= mlxsw_sp_driver_name,
 	.priv_size			= sizeof(struct mlxsw_sp),
@@ -3998,6 +4132,7 @@ static struct mlxsw_driver mlxsw_sp_driver = {
 	.sb_occ_port_pool_get		= mlxsw_sp_sb_occ_port_pool_get,
 	.sb_occ_tc_port_bind_get	= mlxsw_sp_sb_occ_tc_port_bind_get,
 	.txhdr_construct		= mlxsw_sp_txhdr_construct,
+	.resources_register		= mlxsw_sp_resources_register,
 	.txhdr_len			= MLXSW_TXHDR_LEN,
 	.profile			= &mlxsw_sp_config_profile,
 };
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a0adcd8..5abc8c5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -66,6 +66,18 @@
 #define MLXSW_SP_KVD_LINEAR_SIZE 98304 /* entries */
 #define MLXSW_SP_KVD_GRANULARITY 128
 
+#define MLXSW_SP_RESOURCE_NAME_KVD "kvd"
+#define MLXSW_SP_RESOURCE_NAME_KVD_LINEAR "linear"
+#define MLXSW_SP_RESOURCE_NAME_KVD_HASH_SINGLE "hash_single"
+#define MLXSW_SP_RESOURCE_NAME_KVD_HASH_DOUBLE "hash_double"
+
+enum mlxsw_sp_resource_id {
+	MLXSW_SP_RESOURCE_KVD,
+	MLXSW_SP_RESOURCE_KVD_LINEAR,
+	MLXSW_SP_RESOURCE_KVD_HASH_SINGLE,
+	MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE,
+};
+
 struct mlxsw_sp_port;
 struct mlxsw_sp_rif;
 
-- 
2.9.5

* [patch net-next 07/10] mlxsw: spectrum_dpipe: Connect dpipe tables to resources
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host - KVD hash single
2. IPv6 host - KVD hash double
3. Adjacency - KVD linear

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
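Note for readers: the table-to-resource mapping listed above can be
pictured as a small lookup table. The sketch below is standalone C with
hypothetical names; in particular the table name strings are placeholders
and not the values of the driver's MLXSW_SP_DPIPE_TABLE_NAME_* macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the resource ids used in this series. */
enum resource_id_sketch {
	RES_KVD,
	RES_KVD_LINEAR,
	RES_KVD_HASH_SINGLE,
	RES_KVD_HASH_DOUBLE,
};

struct table_binding {
	const char *table;	/* placeholder table name */
	enum resource_id_sketch res;
};

static const struct table_binding bindings[] = {
	{ "host4", RES_KVD_HASH_SINGLE },	/* IPv4 host */
	{ "host6", RES_KVD_HASH_DOUBLE },	/* IPv6 host */
	{ "adj",   RES_KVD_LINEAR },		/* Adjacency */
};

/* Returns 0 and stores the resource id, or -1 for an unknown table. */
static int table_resource_lookup(const char *table,
				 enum resource_id_sketch *res)
{
	size_t i;

	assert(table && res);
	for (i = 0; i < sizeof(bindings) / sizeof(bindings[0]); i++) {
		if (!strcmp(bindings[i].table, table)) {
			*res = bindings[i].res;
			return 0;
		}
	}
	return -1;
}
```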
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 72 ++++++++++++++++++----
 1 file changed, 60 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
index 96fdba7..282fe82 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
@@ -774,11 +774,27 @@ static struct devlink_dpipe_table_ops mlxsw_sp_host4_ops = {
 static int mlxsw_sp_dpipe_host4_table_init(struct mlxsw_sp *mlxsw_sp)
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+	int err;
 
-	return devlink_dpipe_table_register(devlink,
-					    MLXSW_SP_DPIPE_TABLE_NAME_HOST4,
-					    &mlxsw_sp_host4_ops,
-					    mlxsw_sp, false);
+	err = devlink_dpipe_table_register(devlink,
+					   MLXSW_SP_DPIPE_TABLE_NAME_HOST4,
+					   &mlxsw_sp_host4_ops,
+					   mlxsw_sp, false);
+	if (err)
+		return err;
+
+	err = devlink_dpipe_table_resource_set(devlink,
+					       MLXSW_SP_DPIPE_TABLE_NAME_HOST4,
+					       MLXSW_SP_RESOURCE_KVD_HASH_SINGLE);
+	if (err)
+		goto err_resource_set;
+
+	return 0;
+
+err_resource_set:
+	devlink_dpipe_table_unregister(devlink,
+				       MLXSW_SP_DPIPE_TABLE_NAME_HOST4);
+	return err;
 }
 
 static void mlxsw_sp_dpipe_host4_table_fini(struct mlxsw_sp *mlxsw_sp)
@@ -832,11 +848,27 @@ static struct devlink_dpipe_table_ops mlxsw_sp_host6_ops = {
 static int mlxsw_sp_dpipe_host6_table_init(struct mlxsw_sp *mlxsw_sp)
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+	int err;
 
-	return devlink_dpipe_table_register(devlink,
-					    MLXSW_SP_DPIPE_TABLE_NAME_HOST6,
-					    &mlxsw_sp_host6_ops,
-					    mlxsw_sp, false);
+	err = devlink_dpipe_table_register(devlink,
+					   MLXSW_SP_DPIPE_TABLE_NAME_HOST6,
+					   &mlxsw_sp_host6_ops,
+					   mlxsw_sp, false);
+	if (err)
+		return err;
+
+	err = devlink_dpipe_table_resource_set(devlink,
+					       MLXSW_SP_DPIPE_TABLE_NAME_HOST6,
+					       MLXSW_SP_RESOURCE_KVD_HASH_DOUBLE);
+	if (err)
+		goto err_resource_set;
+
+	return 0;
+
+err_resource_set:
+	devlink_dpipe_table_unregister(devlink,
+				       MLXSW_SP_DPIPE_TABLE_NAME_HOST6);
+	return err;
 }
 
 static void mlxsw_sp_dpipe_host6_table_fini(struct mlxsw_sp *mlxsw_sp)
@@ -1216,11 +1248,27 @@ static struct devlink_dpipe_table_ops mlxsw_sp_dpipe_table_adj_ops = {
 static int mlxsw_sp_dpipe_adj_table_init(struct mlxsw_sp *mlxsw_sp)
 {
 	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+	int err;
 
-	return devlink_dpipe_table_register(devlink,
-					    MLXSW_SP_DPIPE_TABLE_NAME_ADJ,
-					    &mlxsw_sp_dpipe_table_adj_ops,
-					    mlxsw_sp, false);
+	err = devlink_dpipe_table_register(devlink,
+					   MLXSW_SP_DPIPE_TABLE_NAME_ADJ,
+					   &mlxsw_sp_dpipe_table_adj_ops,
+					   mlxsw_sp, false);
+	if (err)
+		return err;
+
+	err = devlink_dpipe_table_resource_set(devlink,
+					       MLXSW_SP_DPIPE_TABLE_NAME_ADJ,
+					       MLXSW_SP_RESOURCE_KVD_LINEAR);
+	if (err)
+		goto err_resource_set;
+
+	return 0;
+
+err_resource_set:
+	devlink_dpipe_table_unregister(devlink,
+				       MLXSW_SP_DPIPE_TABLE_NAME_ADJ);
+	return err;
 }
 
 static void mlxsw_sp_dpipe_adj_table_fini(struct mlxsw_sp *mlxsw_sp)
-- 
2.9.5

* [patch net-next 05/10] mlxsw: pci: Add support for performing bus reset
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be reset by accessing the PCI BAR, because
the mailbox/emad interfaces are unavailable.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
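Note for readers: the ordering of the reset steps (release the IRQ
vectors, reset the device, allocate the vectors again) can be sketched
with mock callbacks as below. All names are hypothetical. Unlike the
void callback added by this patch, the sketch propagates errors, which is
a deliberate deviation for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bus operations; the mocks below only record call order. */
struct bus_ops_sketch {
	void (*free_irqs)(void *priv);
	int (*sw_reset)(void *priv);
	int (*alloc_irqs)(void *priv);
};

struct trace {
	char steps[4];	/* up to three step letters plus a NUL */
	int n;
};

static void record(struct trace *t, char c)
{
	if (t->n < 3)
		t->steps[t->n++] = c;
}

static void mock_free_irqs(void *priv) { record(priv, 'F'); }
static int mock_sw_reset(void *priv) { record(priv, 'R'); return 0; }
static int mock_alloc_irqs(void *priv) { record(priv, 'A'); return 0; }

static const struct bus_ops_sketch mock_ops = {
	.free_irqs = mock_free_irqs,
	.sw_reset = mock_sw_reset,
	.alloc_irqs = mock_alloc_irqs,
};

/* Free vectors, reset the device, then allocate the vectors again. */
static int bus_reset_sketch(const struct bus_ops_sketch *ops, void *priv)
{
	int err;

	assert(ops && ops->free_irqs && ops->sw_reset && ops->alloc_irqs);
	ops->free_irqs(priv);
	err = ops->sw_reset(priv);
	if (err)
		return err;
	return ops->alloc_irqs(priv);
}
```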
 drivers/net/ethernet/mellanox/mlxsw/core.h |  1 +
 drivers/net/ethernet/mellanox/mlxsw/pci.c  | 53 ++++++++++++++++++++++--------
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 6e966af..34dda96 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -332,6 +332,7 @@ struct mlxsw_bus {
 		    const struct mlxsw_config_profile *profile,
 		    struct mlxsw_res *res);
 	void (*fini)(void *bus_priv);
+	void (*reset)(void *bus_priv);
 	bool (*skb_transmit_busy)(void *bus_priv,
 				  const struct mlxsw_tx_info *tx_info);
 	int (*skb_transmit)(void *bus_priv, struct sk_buff *skb,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index 23f7d82..39872fb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -154,6 +154,7 @@ struct mlxsw_pci {
 		} comp;
 	} cmd;
 	struct mlxsw_bus_info bus_info;
+	const struct pci_device_id *id;
 };
 
 static void mlxsw_pci_queue_tasklet_schedule(struct mlxsw_pci_queue *q)
@@ -1622,16 +1623,6 @@ static int mlxsw_pci_cmd_exec(void *bus_priv, u16 opcode, u8 opcode_mod,
 	return err;
 }
 
-static const struct mlxsw_bus mlxsw_pci_bus = {
-	.kind			= "pci",
-	.init			= mlxsw_pci_init,
-	.fini			= mlxsw_pci_fini,
-	.skb_transmit_busy	= mlxsw_pci_skb_transmit_busy,
-	.skb_transmit		= mlxsw_pci_skb_transmit,
-	.cmd_exec		= mlxsw_pci_cmd_exec,
-	.features		= MLXSW_BUS_F_TXRX,
-};
-
 static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci,
 			      const struct pci_device_id *id)
 {
@@ -1655,6 +1646,41 @@ static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci,
 	return 0;
 }
 
+static void mlxsw_pci_free_irq_vectors(struct mlxsw_pci *mlxsw_pci)
+{
+	pci_free_irq_vectors(mlxsw_pci->pdev);
+}
+
+static int mlxsw_pci_alloc_irq_vectors(struct mlxsw_pci *mlxsw_pci)
+{
+	int err;
+
+	err = pci_alloc_irq_vectors(mlxsw_pci->pdev, 1, 1, PCI_IRQ_MSIX);
+	if (err < 0)
+		dev_err(&mlxsw_pci->pdev->dev, "MSI-X init failed\n");
+	return err;
+}
+
+static void mlxsw_pci_reset(void *bus_priv)
+{
+	struct mlxsw_pci *mlxsw_pci = bus_priv;
+
+	mlxsw_pci_free_irq_vectors(mlxsw_pci);
+	mlxsw_pci_sw_reset(mlxsw_pci, mlxsw_pci->id);
+	mlxsw_pci_alloc_irq_vectors(mlxsw_pci);
+}
+
+static const struct mlxsw_bus mlxsw_pci_bus = {
+	.kind			= "pci",
+	.init			= mlxsw_pci_init,
+	.fini			= mlxsw_pci_fini,
+	.skb_transmit_busy	= mlxsw_pci_skb_transmit_busy,
+	.skb_transmit		= mlxsw_pci_skb_transmit,
+	.cmd_exec		= mlxsw_pci_cmd_exec,
+	.features		= MLXSW_BUS_F_TXRX,
+	.reset			= mlxsw_pci_reset,
+};
+
 static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	const char *driver_name = pdev->driver->name;
@@ -1716,7 +1742,7 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_sw_reset;
 	}
 
-	err = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX);
+	err = mlxsw_pci_alloc_irq_vectors(mlxsw_pci);
 	if (err < 0) {
 		dev_err(&pdev->dev, "MSI-X init failed\n");
 		goto err_msix_init;
@@ -1725,6 +1751,7 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mlxsw_pci->bus_info.device_kind = driver_name;
 	mlxsw_pci->bus_info.device_name = pci_name(mlxsw_pci->pdev);
 	mlxsw_pci->bus_info.dev = &pdev->dev;
+	mlxsw_pci->id = id;
 
 	err = mlxsw_core_bus_device_register(&mlxsw_pci->bus_info,
 					     &mlxsw_pci_bus, mlxsw_pci);
@@ -1736,7 +1763,7 @@ static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	return 0;
 
 err_bus_device_register:
-	pci_free_irq_vectors(mlxsw_pci->pdev);
+	mlxsw_pci_free_irq_vectors(mlxsw_pci);
 err_msix_init:
 err_sw_reset:
 	iounmap(mlxsw_pci->hw_addr);
@@ -1756,7 +1783,7 @@ static void mlxsw_pci_remove(struct pci_dev *pdev)
 	struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev);
 
 	mlxsw_core_bus_device_unregister(mlxsw_pci->core);
-	pci_free_irq_vectors(mlxsw_pci->pdev);
+	mlxsw_pci_free_irq_vectors(mlxsw_pci);
 	iounmap(mlxsw_pci->hw_addr);
 	pci_release_regions(mlxsw_pci->pdev);
 	pci_disable_device(mlxsw_pci->pdev);
-- 
2.9.5

* [patch net-next 04/10] devlink: Add relation between dpipe and resource
From: Jiri Pirko @ 2017-12-20 11:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz, dsa, roopa
In-Reply-To: <20171220115821.22171-1-jiri@resnulli.us>

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Hardware processes modeled via dpipe commonly use internal hardware
resources. Exposing the relation between a dpipe table and the resource
it consumes helps users understand hardware limitations.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
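Note: a minimal userspace sketch of the lookup-and-tag step that the new
helper performs, shown in isolation. The demo_* names and structs are
hypothetical stand-ins, not kernel API, and the devlink->lock handling of
the real function is left out.

```c
/* Userspace sketch only: these structs are simplified stand-ins for the
 * kernel's devlink types, and demo_table_resource_set() is a hypothetical
 * analogue of devlink_dpipe_table_resource_set() without locking.
 */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_table {
	const char *name;
	bool resource_valid;	/* true once a resource id has been attached */
	uint64_t resource_id;
};

/* Look a table up by name; returns NULL when no table matches. */
static struct demo_table *demo_table_find(struct demo_table *tables,
					  size_t count, const char *name)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (!strcmp(tables[i].name, name))
			return &tables[i];
	return NULL;
}

/* Attach a resource id to a named table, or fail with -EINVAL. */
static int demo_table_resource_set(struct demo_table *tables, size_t count,
				   const char *name, uint64_t resource_id)
{
	struct demo_table *table = demo_table_find(tables, count, name);

	if (!table)
		return -EINVAL;
	table->resource_id = resource_id;
	table->resource_valid = true;
	return 0;
}
```

The real helper likewise returns -EINVAL when the table is not found, so
the table has to be registered first; it does not check that resource_id
names a registered resource.
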
 include/net/devlink.h        | 13 +++++++++++++
 include/uapi/linux/devlink.h |  1 +
 net/core/devlink.c           | 31 +++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 340c2fc..9217620 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -183,6 +183,8 @@ struct devlink_dpipe_table_ops;
  * @counters_enabled: indicates if counters are active
  * @counter_control_extern: indicates if counter control is in dpipe or
  *			    external tool
+ * @resource_valid: indicates that the resource id is valid
+ * @resource_id: id of the resource this table is related to
  * @table_ops: table operations
  * @rcu: rcu
  */
@@ -192,6 +194,8 @@ struct devlink_dpipe_table {
 	const char *name;
 	bool counters_enabled;
 	bool counter_control_extern;
+	bool resource_valid;
+	u64 resource_id;
 	struct devlink_dpipe_table_ops *table_ops;
 	struct rcu_head rcu;
 };
@@ -386,6 +390,8 @@ void devlink_resources_unregister(struct devlink *devlink,
 int devlink_resource_size_get(struct devlink *devlink,
 			      u64 resource_id,
 			      u64 *p_resource_size);
+int devlink_dpipe_table_resource_set(struct devlink *devlink,
+				     const char *table_name, u64 resource_id);
 
 #else
 
@@ -548,6 +554,13 @@ devlink_resource_size_get(struct devlink *devlink, u64 resource_id,
 	return -EOPNOTSUPP;
 }
 
+static inline int
+devlink_dpipe_table_resource_set(struct devlink *devlink,
+				 const char *table_name, u64 resource_id)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index bee827b..b84c6aa 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -217,6 +217,7 @@ enum devlink_attr {
 	DEVLINK_ATTR_RESOURCE_SIZE_VALID,	/* u8 */
 	DEVLINK_ATTR_RESOURCE_OCC,		/* u64 */
 	DEVLINK_ATTR_RESOURCE_ID,		/* u64 */
+	DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID,	/* u64 */
 
 	/* add new attributes above here, update the policy in devlink.c */
 
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 3dedc3f..8fbef2d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1686,6 +1686,9 @@ static int devlink_dpipe_table_put(struct sk_buff *skb,
 		       table->counters_enabled))
 		goto nla_put_failure;
 
+	if (table->resource_valid)
+		nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_TABLE_RESOURCE_ID,
+				  table->resource_id, DEVLINK_ATTR_PAD);
 	if (devlink_dpipe_matches_put(table, skb))
 		goto nla_put_failure;
 
@@ -3234,6 +3237,34 @@ int devlink_resource_size_get(struct devlink *devlink,
 }
 EXPORT_SYMBOL_GPL(devlink_resource_size_get);
 
+/**
+ *	devlink_dpipe_table_resource_set - set the resource id
+ *
+ *	@devlink: devlink
+ *	@table_name: table name
+ *	@resource_id: resource id
+ */
+int devlink_dpipe_table_resource_set(struct devlink *devlink,
+				     const char *table_name, u64 resource_id)
+{
+	struct devlink_dpipe_table *table;
+	int err = 0;
+
+	mutex_lock(&devlink->lock);
+	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
+					 table_name);
+	if (!table) {
+		err = -EINVAL;
+		goto out;
+	}
+	table->resource_id = resource_id;
+	table->resource_valid = true;
+out:
+	mutex_unlock(&devlink->lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_table_resource_set);
+
 static int __init devlink_module_init(void)
 {
 	return genl_register_family(&devlink_nl_family);
-- 
2.9.5
