Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Eric Dumazet @ 2016-04-08 18:23 UTC (permalink / raw)
  To: Petri Gynther
  Cc: Florian Fainelli, netdev, David Miller, opendmb, Jaedon Shin
In-Reply-To: <CAGXr9JED3WLEQr67FrMJoJnUaYJwOcnwRE4JxHTaf+0ha00kig@mail.gmail.com>

On Fri, 2016-04-08 at 09:54 -0700, Petri Gynther wrote:
> On Wed, Apr 6, 2016 at 1:25 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> >
> > 2016-04-05 17:50 GMT-07:00 Petri Gynther <pgynther@google.com>:
> > > Add Byte Queue Limits (BQL) support to bcmgenet driver.
> > >
> > > Signed-off-by: Petri Gynther <pgynther@google.com>
> >
> > Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >
> > Thanks!
> > --
> > Florian
> 
> Any further comments?
> 
> Notable difference from some other drivers --
> netdev_tx_reset_queue(txq) is called for all queues in
> bcmgenet_netif_start(), just before netif_tx_start_all_queues(dev).
> This is to ensure that BQL is reset before the interface becomes
> operational.
> 
> I think that is the right place for these calls.
> 
> Some other drivers call it from the "interface down" path.

BQL is ready to go at device setup :

__QUEUE_STATE_STACK_XOFF is not set

dql_reset() was called from dql_init(), called from
netdev_init_one_queue()

^ permalink raw reply

* Re: [PATCH] ieee802154/adf7242: fix memory leak of firmware
From: Marcel Holtmann @ 2016-04-08 17:34 UTC (permalink / raw)
  To: Sudip Mukherjee
  Cc: Michael Hennerich, Alexander Aring, linux-kernel, linux-wpan,
	netdev
In-Reply-To: <1460027764-27428-1-git-send-email-sudipm.mukherjee@gmail.com>

Hi Sudip,

> If the firmware upload or the firmware verification fails then we
> printed the error message and exited but we missed releasing the
> firmware.
> 
> Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
> ---
> drivers/net/ieee802154/adf7242.c | 2 ++
> 1 file changed, 2 insertions(+)

patch has been applied to bluetooth-next tree.

Regards

Marcel

^ permalink raw reply

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Alexei Starovoitov @ 2016-04-08 17:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Brenden Blanco, davem, netdev, tom, ogerlitz, daniel,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo, linux-mm
In-Reply-To: <20160408143340.10e5b1d0@redhat.com>

On Fri, Apr 08, 2016 at 02:33:40PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Fri, 8 Apr 2016 12:36:14 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > > +/* user return codes for PHYS_DEV prog type */
> > > +enum bpf_phys_dev_action {
> > > +	BPF_PHYS_DEV_DROP,
> > > +	BPF_PHYS_DEV_OK,
> > > +};  
> > 
> > I can imagine these extra return codes:
> > 
> >  BPF_PHYS_DEV_MODIFIED,   /* Packet page/payload modified */
> >  BPF_PHYS_DEV_STOLEN,     /* E.g. forward use-case */
> >  BPF_PHYS_DEV_SHARED,     /* Queue for async processing, e.g. tcpdump use-case */
> > 
> > The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> > which we can look at when we get that far...
> 
> I want to point out something which is quite FUNDAMENTAL, for
> understanding these return codes (and network stack).
> 
> 
> At driver RX time, the network stack basically have two ways of
> building an SKB, which is send up the stack.
> 
> Option-A (fastest): The packet page is writable. The SKB can be
> allocated and skb->data/head can point directly to the page.  And
> we place/write skb_shared_info in the end/tail-room. (This is done by
> calling build_skb()).
> 
> Option-B (slower): The packet page is read-only.  The SKB cannot point
> skb->data/head directly to the page, because skb_shared_info need to be
> written into skb->end (slightly hidden via skb_shinfo() casting).  To
> get around this, a separate piece of memory is allocated (speedup by
> __alloc_page_frag) for pointing skb->data/head, so skb_shared_info can
> be written. (This is done when calling netdev/napi_alloc_skb()).
>   Drivers then need to copy over packet headers, and assign + adjust
> skb_shinfo(skb)->frags[0] offset to skip copied headers.
> 
> 
> Unfortunately most drivers use option-B.  Due to cost of calling the
> page allocator.  It is only slightly most expensive to get a larger
> compound page from the page allocator, which then can be partitioned into
> page-fragments, thus amortizing the page alloc cost.  Unfortunately the
> cost is added later, when constructing the SKB.
>  Another reason for option-B, is that archs with expensive IOMMU
> requirements (like PowerPC), don't need to dma_unmap on every packet,
> but only on the compound page level.
> 
> Side-note: Most drivers have a "copy-break" optimization.  Especially
> for option-B, when copying header data anyhow. For small packet, one
> might as well free (or recycle) the RX page, if header size fits into
> the newly allocated memory (for skb_shared_info).

I think you guys are going into overdesign territory, so
. nack on read-only pages
. nack on copy-break approach
. nack on per-ring programs
. nack on modified/stolen/shared return codes

The whole thing must be dead simple to use. Above is not simple by any means.
The programs must see writeable pages only and return codes:
drop, pass to stack, redirect to xmit.
If program wishes to modify packets before passing it to stack, it
shouldn't need to deal with different return values.
No special things to deal with small or large packets. No header splits.
Program must not be aware of any such things.
Drivers can use DMA_BIDIRECTIONAL to allow received page to be
modified by the program and immediately sent to xmit. 
No dma map/unmap/sync per packet. If some odd architectures/dma setups
cannot do it, then XDP will not be applicable there.
We are not going to sacrifice performance for generality.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-next] ipv6, token: allow for clearing the current device token
From: Daniel Borkmann @ 2016-04-08 17:13 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Bjørn Mork; +Cc: davem, robbat2, netdev
In-Reply-To: <5707CFF6.6090707@stressinduktion.org>

On 04/08/2016 05:36 PM, Hannes Frederic Sowa wrote:
> On 08.04.2016 17:25, Bjørn Mork wrote:
>> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>>> On Fri, Apr 8, 2016, at 16:18, Bjørn Mork wrote:
>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>
>>>>>       if (!token)
>>>>>           return -EINVAL;
>>>>> -    if (ipv6_addr_any(token))
>>>>> -        return -EINVAL;
>>>>>       if (dev->flags & (IFF_LOOPBACK | IFF_NOARP))
>>>>>           return -EINVAL;
>>>>
>>>> Not directly related to the patch in question.  It just made me aware of
>>>> this restriction...
>>>>
>>>> I realize that I'm a few years late here, but what's with the IFF_NOARP?
>>>> Is that just because we can't do DAD for the token based addresses?  How
>>>> is that different from manually configuring the whole address?
>>>
>>> IFF_NOARP is kind of the equivalent to no neighbor discovery. If you set
>>> a token and never get in a router advertisement you never create a
>>> tokenized ip address, thus the feature is useless.
>>
>> You can get router advertisements with IFF_NOARP. You cannot lookup L2
>> addresses, but the L3 prefix info is still as useful as with any other
>> interface.
>
> Of course router advertisements can be send and received with IFF_NOARP and probably we act on them as usual, as you showed. Looking in the source we don't really specify what those flags mean/do for IPv6. So I think you can assume that it is in there because of history.
>
> I would absolutely not mind if you remove the limitation for IFF_ARP.

Agreed me neither, the code should be able to handle it as far as I see.

Thanks,
Daniel

^ permalink raw reply

* [patch net-next] devlink: share user_ptr pointer for both devlink and devlink_port
From: Jiri Pirko @ 2016-04-08 17:12 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo

From: Jiri Pirko <jiri@mellanox.com>

Ptr to devlink structure can be easily obtained from
devlink_port->devlink. So share user_ptr[0] pointer for both and leave
user_ptr[1] free for other users.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 net/core/devlink.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 44f880d..b84cf0d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -119,7 +119,8 @@ static struct devlink_port *devlink_port_get_from_info(struct devlink *devlink,
 	return devlink_port_get_from_attrs(devlink, info->attrs);
 }
 
-#define DEVLINK_NL_FLAG_NEED_PORT	BIT(0)
+#define DEVLINK_NL_FLAG_NEED_DEVLINK	BIT(0)
+#define DEVLINK_NL_FLAG_NEED_PORT	BIT(1)
 
 static int devlink_nl_pre_doit(const struct genl_ops *ops,
 			       struct sk_buff *skb, struct genl_info *info)
@@ -132,8 +133,9 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
 		mutex_unlock(&devlink_mutex);
 		return PTR_ERR(devlink);
 	}
-	info->user_ptr[0] = devlink;
-	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
+	if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_DEVLINK) {
+		info->user_ptr[0] = devlink;
+	} else if (ops->internal_flags & DEVLINK_NL_FLAG_NEED_PORT) {
 		struct devlink_port *devlink_port;
 
 		mutex_lock(&devlink_port_mutex);
@@ -143,7 +145,7 @@ static int devlink_nl_pre_doit(const struct genl_ops *ops,
 			mutex_unlock(&devlink_mutex);
 			return PTR_ERR(devlink_port);
 		}
-		info->user_ptr[1] = devlink_port;
+		info->user_ptr[0] = devlink_port;
 	}
 	return 0;
 }
@@ -356,8 +358,8 @@ out:
 static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
 					struct genl_info *info)
 {
-	struct devlink *devlink = info->user_ptr[0];
-	struct devlink_port *devlink_port = info->user_ptr[1];
+	struct devlink_port *devlink_port = info->user_ptr[0];
+	struct devlink *devlink = devlink_port->devlink;
 	struct sk_buff *msg;
 	int err;
 
@@ -436,8 +438,8 @@ static int devlink_port_type_set(struct devlink *devlink,
 static int devlink_nl_cmd_port_set_doit(struct sk_buff *skb,
 					struct genl_info *info)
 {
-	struct devlink *devlink = info->user_ptr[0];
-	struct devlink_port *devlink_port = info->user_ptr[1];
+	struct devlink_port *devlink_port = info->user_ptr[0];
+	struct devlink *devlink = devlink_port->devlink;
 	int err;
 
 	if (info->attrs[DEVLINK_ATTR_PORT_TYPE]) {
@@ -511,6 +513,7 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.doit = devlink_nl_cmd_get_doit,
 		.dumpit = devlink_nl_cmd_get_dumpit,
 		.policy = devlink_nl_policy,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 		/* can be retrieved by unprivileged users */
 	},
 	{
@@ -533,12 +536,14 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.doit = devlink_nl_cmd_port_split_doit,
 		.policy = devlink_nl_policy,
 		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
 	{
 		.cmd = DEVLINK_CMD_PORT_UNSPLIT,
 		.doit = devlink_nl_cmd_port_unsplit_doit,
 		.policy = devlink_nl_policy,
 		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
 };
 
-- 
2.5.5

^ permalink raw reply related

* Re: [patch net-next 0/5] mlxsw: small driver update
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <20160408.130737.1379821883201014485.davem@davemloft.net>

Fri, Apr 08, 2016 at 07:07:37PM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Fri, 8 Apr 2016 17:51:55 +0200
>
>> Fri, Apr 08, 2016 at 05:45:20PM CEST, jiri@resnulli.us wrote:
>>>From: Jiri Pirko <jiri@mellanox.com>
>>>
>>>Cosmetics, in preparation to sharedbuffer patchset.
>> 
>> Dave, I just realized there is dependency on:
>> "devlink: remove implicit type set in port register" which I sent couple
>> of minutes after this patchset. I can either resend in bulk, or if you
>> could apply in order, that would be great.
>
>The devlink series also lacked a header posting.  Can you just sort this
>all out properly and respin everything?

done.

>
>Thanks.
>
>> Thanks and sorry, owe you another beer :)
>
>:-)

^ permalink raw reply

* [patch net-next 6/6] mlxsw: reg: Fix SBPM register name
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Fix copy&paste error and state the name of SBPM register correctly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 19bdc82..57e4a63 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3598,8 +3598,8 @@ static inline void mlxsw_reg_sbcm_pack(char *payload, u8 local_port, u8 pg_buff,
 	mlxsw_reg_sbcm_pool_set(payload, pool);
 }
 
-/* SBPM - Shared Buffer Class Management Register
- * ----------------------------------------------
+/* SBPM - Shared Buffer Port Management Register
+ * ---------------------------------------------
  * The SBPM register configures and retrieves the shared buffer allocation
  * and configuration according to Port-Pool, including the definition
  * of the associated quota.
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 5/6] mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Same field, same values, so share the same enum.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h          | 23 +++++++---------------
 .../net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 20 +++++++++----------
 2 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 28f5b99..19bdc82 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3476,9 +3476,10 @@ static const struct mlxsw_reg_info mlxsw_reg_sbpr = {
 	.len = MLXSW_REG_SBPR_LEN,
 };
 
-enum mlxsw_reg_sbpr_dir {
-	MLXSW_REG_SBPR_DIR_INGRESS,
-	MLXSW_REG_SBPR_DIR_EGRESS,
+/* shared direstion enum for SBPR, SBCM, SBPM */
+enum mlxsw_reg_sbxx_dir {
+	MLXSW_REG_SBXX_DIR_INGRESS,
+	MLXSW_REG_SBXX_DIR_EGRESS,
 };
 
 /* reg_sbpr_dir
@@ -3511,7 +3512,7 @@ enum mlxsw_reg_sbpr_mode {
 MLXSW_ITEM32(reg, sbpr, mode, 0x08, 0, 4);
 
 static inline void mlxsw_reg_sbpr_pack(char *payload, u8 pool,
-				       enum mlxsw_reg_sbpr_dir dir,
+				       enum mlxsw_reg_sbxx_dir dir,
 				       enum mlxsw_reg_sbpr_mode mode, u32 size)
 {
 	MLXSW_REG_ZERO(sbpr, payload);
@@ -3553,11 +3554,6 @@ MLXSW_ITEM32(reg, sbcm, local_port, 0x00, 16, 8);
  */
 MLXSW_ITEM32(reg, sbcm, pg_buff, 0x00, 8, 6);
 
-enum mlxsw_reg_sbcm_dir {
-	MLXSW_REG_SBCM_DIR_INGRESS,
-	MLXSW_REG_SBCM_DIR_EGRESS,
-};
-
 /* reg_sbcm_dir
  * Direction.
  * Access: Index
@@ -3590,7 +3586,7 @@ MLXSW_ITEM32(reg, sbcm, max_buff, 0x1C, 0, 24);
 MLXSW_ITEM32(reg, sbcm, pool, 0x24, 0, 4);
 
 static inline void mlxsw_reg_sbcm_pack(char *payload, u8 local_port, u8 pg_buff,
-				       enum mlxsw_reg_sbcm_dir dir,
+				       enum mlxsw_reg_sbxx_dir dir,
 				       u32 min_buff, u32 max_buff, u8 pool)
 {
 	MLXSW_REG_ZERO(sbcm, payload);
@@ -3630,11 +3626,6 @@ MLXSW_ITEM32(reg, sbpm, local_port, 0x00, 16, 8);
  */
 MLXSW_ITEM32(reg, sbpm, pool, 0x00, 8, 4);
 
-enum mlxsw_reg_sbpm_dir {
-	MLXSW_REG_SBPM_DIR_INGRESS,
-	MLXSW_REG_SBPM_DIR_EGRESS,
-};
-
 /* reg_sbpm_dir
  * Direction.
  * Access: Index
@@ -3661,7 +3652,7 @@ MLXSW_ITEM32(reg, sbpm, min_buff, 0x18, 0, 24);
 MLXSW_ITEM32(reg, sbpm, max_buff, 0x1C, 0, 24);
 
 static inline void mlxsw_reg_sbpm_pack(char *payload, u8 local_port, u8 pool,
-				       enum mlxsw_reg_sbpm_dir dir,
+				       enum mlxsw_reg_sbxx_dir dir,
 				       u32 min_buff, u32 max_buff)
 {
 	MLXSW_REG_ZERO(sbpm, payload);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index 97c8d53..f58b1d3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -110,7 +110,7 @@ static int mlxsw_sp_port_headroom_init(struct mlxsw_sp_port *mlxsw_sp_port)
 
 struct mlxsw_sp_sb_pool {
 	u8 pool;
-	enum mlxsw_reg_sbpr_dir dir;
+	enum mlxsw_reg_sbxx_dir dir;
 	enum mlxsw_reg_sbpr_mode mode;
 	u32 size;
 };
@@ -129,11 +129,11 @@ struct mlxsw_sp_sb_pool {
 	}
 
 #define MLXSW_SP_SB_POOL_INGRESS(_pool, _size)			\
-	MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBPR_DIR_INGRESS,	\
+	MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBXX_DIR_INGRESS,	\
 			 MLXSW_REG_SBPR_MODE_DYNAMIC, _size)
 
 #define MLXSW_SP_SB_POOL_EGRESS(_pool, _size)			\
-	MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBPR_DIR_EGRESS,	\
+	MLXSW_SP_SB_POOL(_pool, MLXSW_REG_SBXX_DIR_EGRESS,	\
 			 MLXSW_REG_SBPR_MODE_DYNAMIC, _size)
 
 static const struct mlxsw_sp_sb_pool mlxsw_sp_sb_pools[] = {
@@ -173,7 +173,7 @@ struct mlxsw_sp_sb_cm {
 		u8 pg;
 		u8 tc;
 	} u;
-	enum mlxsw_reg_sbcm_dir dir;
+	enum mlxsw_reg_sbxx_dir dir;
 	u32 min_buff;
 	u32 max_buff;
 	u8 pool;
@@ -189,15 +189,15 @@ struct mlxsw_sp_sb_cm {
 	}
 
 #define MLXSW_SP_SB_CM_INGRESS(_pg, _min_buff, _max_buff)		\
-	MLXSW_SP_SB_CM(_pg, MLXSW_REG_SBCM_DIR_INGRESS,			\
+	MLXSW_SP_SB_CM(_pg, MLXSW_REG_SBXX_DIR_INGRESS,			\
 		       _min_buff, _max_buff, 0)
 
 #define MLXSW_SP_SB_CM_EGRESS(_tc, _min_buff, _max_buff)		\
-	MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBCM_DIR_EGRESS,			\
+	MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBXX_DIR_EGRESS,			\
 		       _min_buff, _max_buff, 0)
 
 #define MLXSW_SP_CPU_PORT_SB_CM_EGRESS(_tc)				\
-	MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBCM_DIR_EGRESS, 104, 2, 3)
+	MLXSW_SP_SB_CM(_tc, MLXSW_REG_SBXX_DIR_EGRESS, 104, 2, 3)
 
 static const struct mlxsw_sp_sb_cm mlxsw_sp_sb_cms[] = {
 	MLXSW_SP_SB_CM_INGRESS(0, MLXSW_SP_BYTES_TO_CELLS(10000), 8),
@@ -304,7 +304,7 @@ static int mlxsw_sp_cpu_port_sb_cms_init(struct mlxsw_sp *mlxsw_sp)
 
 struct mlxsw_sp_sb_pm {
 	u8 pool;
-	enum mlxsw_reg_sbpm_dir dir;
+	enum mlxsw_reg_sbxx_dir dir;
 	u32 min_buff;
 	u32 max_buff;
 };
@@ -318,11 +318,11 @@ struct mlxsw_sp_sb_pm {
 	}
 
 #define MLXSW_SP_SB_PM_INGRESS(_pool, _min_buff, _max_buff)	\
-	MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBPM_DIR_INGRESS,	\
+	MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBXX_DIR_INGRESS,	\
 		       _min_buff, _max_buff)
 
 #define MLXSW_SP_SB_PM_EGRESS(_pool, _min_buff, _max_buff)	\
-	MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBPM_DIR_EGRESS,	\
+	MLXSW_SP_SB_PM(_pool, MLXSW_REG_SBXX_DIR_EGRESS,	\
 		       _min_buff, _max_buff)
 
 static const struct mlxsw_sp_sb_pm mlxsw_sp_sb_pms[] = {
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 4/6] mlxsw: Do not pass around driver_priv directly
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Instead of that, pass mlxsw_core and use a helper to get driver priv
from driver code. Looks much cleaner that way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 19 +++++++++++--------
 drivers/net/ethernet/mellanox/mlxsw/core.h     | 11 +++++++----
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 17 +++++++++--------
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |  8 ++++----
 4 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 39161fb..3958195 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -114,6 +114,12 @@ struct mlxsw_core {
 	/* driver_priv has to be always the last item */
 };
 
+void *mlxsw_core_driver_priv(struct mlxsw_core *mlxsw_core)
+{
+	return mlxsw_core->driver_priv;
+}
+EXPORT_SYMBOL(mlxsw_core_driver_priv);
+
 struct mlxsw_rx_listener_item {
 	struct list_head list;
 	struct mlxsw_rx_listener rxl;
@@ -795,8 +801,7 @@ static int mlxsw_devlink_port_split(struct devlink *devlink,
 		return -EINVAL;
 	if (!mlxsw_core->driver->port_split)
 		return -EOPNOTSUPP;
-	return mlxsw_core->driver->port_split(mlxsw_core->driver_priv,
-					      port_index, count);
+	return mlxsw_core->driver->port_split(mlxsw_core, port_index, count);
 }
 
 static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
@@ -808,8 +813,7 @@ static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
 		return -EINVAL;
 	if (!mlxsw_core->driver->port_unsplit)
 		return -EOPNOTSUPP;
-	return mlxsw_core->driver->port_unsplit(mlxsw_core->driver_priv,
-						port_index);
+	return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index);
 }
 
 static const struct devlink_ops mlxsw_devlink_ops = {
@@ -880,8 +884,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (err)
 		goto err_devlink_register;
 
-	err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
-				 mlxsw_bus_info);
+	err = mlxsw_driver->init(mlxsw_core, mlxsw_bus_info);
 	if (err)
 		goto err_driver_init;
 
@@ -892,7 +895,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	return 0;
 
 err_debugfs_init:
-	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+	mlxsw_core->driver->fini(mlxsw_core);
 err_driver_init:
 	devlink_unregister(devlink);
 err_devlink_register:
@@ -918,7 +921,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 	struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
 	mlxsw_core_debugfs_fini(mlxsw_core);
-	mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+	mlxsw_core->driver->fini(mlxsw_core);
 	devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
 	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 0454212..f3cebef 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -62,6 +62,8 @@ struct mlxsw_driver;
 struct mlxsw_bus;
 struct mlxsw_bus_info;
 
+void *mlxsw_core_driver_priv(struct mlxsw_core *mlxsw_core);
+
 int mlxsw_core_driver_register(struct mlxsw_driver *mlxsw_driver);
 void mlxsw_core_driver_unregister(struct mlxsw_driver *mlxsw_driver);
 
@@ -192,11 +194,12 @@ struct mlxsw_driver {
 	const char *kind;
 	struct module *owner;
 	size_t priv_size;
-	int (*init)(void *driver_priv, struct mlxsw_core *mlxsw_core,
+	int (*init)(struct mlxsw_core *mlxsw_core,
 		    const struct mlxsw_bus_info *mlxsw_bus_info);
-	void (*fini)(void *driver_priv);
-	int (*port_split)(void *driver_priv, u8 local_port, unsigned int count);
-	int (*port_unsplit)(void *driver_priv, u8 local_port);
+	void (*fini)(struct mlxsw_core *mlxsw_core);
+	int (*port_split)(struct mlxsw_core *mlxsw_core, u8 local_port,
+			  unsigned int count);
+	int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port);
 	void (*txhdr_construct)(struct sk_buff *skb,
 				const struct mlxsw_tx_info *tx_info);
 	u8 txhdr_len;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 8abe1a6..19b3c14 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1948,9 +1948,10 @@ static u8 mlxsw_sp_cluster_base_port_get(u8 local_port)
 	return local_port - offset;
 }
 
-static int mlxsw_sp_port_split(void *priv, u8 local_port, unsigned int count)
+static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
+			       unsigned int count)
 {
-	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
 	struct mlxsw_sp_port *mlxsw_sp_port;
 	u8 width = MLXSW_PORT_MODULE_MAX_WIDTH / count;
 	u8 module, cur_width, base_port;
@@ -2022,9 +2023,9 @@ err_port_create:
 	return err;
 }
 
-static int mlxsw_sp_port_unsplit(void *priv, u8 local_port)
+static int mlxsw_sp_port_unsplit(struct mlxsw_core *mlxsw_core, u8 local_port)
 {
-	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
 	struct mlxsw_sp_port *mlxsw_sp_port;
 	u8 module, cur_width, base_port;
 	unsigned int count;
@@ -2369,10 +2370,10 @@ static int mlxsw_sp_lag_init(struct mlxsw_sp *mlxsw_sp)
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(slcr), slcr_pl);
 }
 
-static int mlxsw_sp_init(void *priv, struct mlxsw_core *mlxsw_core,
+static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 			 const struct mlxsw_bus_info *mlxsw_bus_info)
 {
-	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
 	int err;
 
 	mlxsw_sp->core = mlxsw_core;
@@ -2443,9 +2444,9 @@ err_event_register:
 	return err;
 }
 
-static void mlxsw_sp_fini(void *priv)
+static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
 {
-	struct mlxsw_sp *mlxsw_sp = priv;
+	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
 
 	mlxsw_sp_switchdev_fini(mlxsw_sp);
 	mlxsw_sp_traps_fini(mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index 2518c84..3842eab 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -1447,10 +1447,10 @@ static int mlxsw_sx_flood_init(struct mlxsw_sx *mlxsw_sx)
 	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(sgcr), sgcr_pl);
 }
 
-static int mlxsw_sx_init(void *priv, struct mlxsw_core *mlxsw_core,
+static int mlxsw_sx_init(struct mlxsw_core *mlxsw_core,
 			 const struct mlxsw_bus_info *mlxsw_bus_info)
 {
-	struct mlxsw_sx *mlxsw_sx = priv;
+	struct mlxsw_sx *mlxsw_sx = mlxsw_core_driver_priv(mlxsw_core);
 	int err;
 
 	mlxsw_sx->core = mlxsw_core;
@@ -1497,9 +1497,9 @@ err_event_register:
 	return err;
 }
 
-static void mlxsw_sx_fini(void *priv)
+static void mlxsw_sx_fini(struct mlxsw_core *mlxsw_core)
 {
-	struct mlxsw_sx *mlxsw_sx = priv;
+	struct mlxsw_sx *mlxsw_sx = mlxsw_core_driver_priv(mlxsw_core);
 
 	mlxsw_sx_traps_fini(mlxsw_sx);
 	mlxsw_sx_event_unregister(mlxsw_sx, MLXSW_TRAP_ID_PUDE);
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 3/6] mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Instead of passing around driver priv, pass struct mlxsw_core *
directly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 15 +++------------
 drivers/net/ethernet/mellanox/mlxsw/core.h     |  5 ++---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  4 ++--
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |  4 ++--
 4 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 004fb8b..39161fb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -381,7 +381,7 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
 
 	mlxsw_core->emad.trans_active = true;
 
-	err = mlxsw_core_skb_transmit(mlxsw_core->driver_priv, skb, tx_info);
+	err = mlxsw_core_skb_transmit(mlxsw_core, skb, tx_info);
 	if (err) {
 		dev_err(mlxsw_core->bus_info->dev, "Failed to transmit EMAD (tid=%llx)\n",
 			mlxsw_core->emad.tid);
@@ -929,26 +929,17 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 }
 EXPORT_SYMBOL(mlxsw_core_bus_device_unregister);
 
-static struct mlxsw_core *__mlxsw_core_get(void *driver_priv)
-{
-	return container_of(driver_priv, struct mlxsw_core, driver_priv);
-}
-
-bool mlxsw_core_skb_transmit_busy(void *driver_priv,
+bool mlxsw_core_skb_transmit_busy(struct mlxsw_core *mlxsw_core,
 				  const struct mlxsw_tx_info *tx_info)
 {
-	struct mlxsw_core *mlxsw_core = __mlxsw_core_get(driver_priv);
-
 	return mlxsw_core->bus->skb_transmit_busy(mlxsw_core->bus_priv,
 						  tx_info);
 }
 EXPORT_SYMBOL(mlxsw_core_skb_transmit_busy);
 
-int mlxsw_core_skb_transmit(void *driver_priv, struct sk_buff *skb,
+int mlxsw_core_skb_transmit(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
 			    const struct mlxsw_tx_info *tx_info)
 {
-	struct mlxsw_core *mlxsw_core = __mlxsw_core_get(driver_priv);
-
 	return mlxsw_core->bus->skb_transmit(mlxsw_core->bus_priv, skb,
 					     tx_info);
 }
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 06631a0..0454212 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -75,10 +75,9 @@ struct mlxsw_tx_info {
 	bool is_emad;
 };
 
-bool mlxsw_core_skb_transmit_busy(void *driver_priv,
+bool mlxsw_core_skb_transmit_busy(struct mlxsw_core *mlxsw_core,
 				  const struct mlxsw_tx_info *tx_info);
-
-int mlxsw_core_skb_transmit(void *driver_priv, struct sk_buff *skb,
+int mlxsw_core_skb_transmit(struct mlxsw_core *mlxsw_core, struct sk_buff *skb,
 			    const struct mlxsw_tx_info *tx_info);
 
 struct mlxsw_rx_listener {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 3216f2b..8abe1a6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -390,7 +390,7 @@ static netdev_tx_t mlxsw_sp_port_xmit(struct sk_buff *skb,
 	u64 len;
 	int err;
 
-	if (mlxsw_core_skb_transmit_busy(mlxsw_sp, &tx_info))
+	if (mlxsw_core_skb_transmit_busy(mlxsw_sp->core, &tx_info))
 		return NETDEV_TX_BUSY;
 
 	if (unlikely(skb_headroom(skb) < MLXSW_TXHDR_LEN)) {
@@ -414,7 +414,7 @@ static netdev_tx_t mlxsw_sp_port_xmit(struct sk_buff *skb,
 	/* Due to a race we might fail here because of a full queue. In that
 	 * unlikely case we simply drop the packet.
 	 */
-	err = mlxsw_core_skb_transmit(mlxsw_sp, skb, &tx_info);
+	err = mlxsw_core_skb_transmit(mlxsw_sp->core, skb, &tx_info);
 
 	if (!err) {
 		pcpu_stats = this_cpu_ptr(mlxsw_sp_port->pcpu_stats);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index 2417f09..2518c84 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -302,7 +302,7 @@ static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
 	u64 len;
 	int err;
 
-	if (mlxsw_core_skb_transmit_busy(mlxsw_sx, &tx_info))
+	if (mlxsw_core_skb_transmit_busy(mlxsw_sx->core, &tx_info))
 		return NETDEV_TX_BUSY;
 
 	if (unlikely(skb_headroom(skb) < MLXSW_TXHDR_LEN)) {
@@ -320,7 +320,7 @@ static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
 	/* Due to a race we might fail here because of a full queue. In that
 	 * unlikely case we simply drop the packet.
 	 */
-	err = mlxsw_core_skb_transmit(mlxsw_sx, skb, &tx_info);
+	err = mlxsw_core_skb_transmit(mlxsw_sx->core, skb, &tx_info);
 
 	if (!err) {
 		pcpu_stats = this_cpu_ptr(mlxsw_sx_port->pcpu_stats);
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 2/6] mlxsw: Move devlink port registration into common core code
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Remove devlink port reg/unreg from spectrum and switchx2 code and rather
do the common work in core. That also ensures code separation where
devlink is only used in core.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c     | 22 ++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h     | 10 +++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 31 +++++++++-----------------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  3 +--
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 30 +++++++++----------------
 5 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index f69f628..004fb8b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1358,6 +1358,28 @@ void mlxsw_core_lag_mapping_clear(struct mlxsw_core *mlxsw_core,
 }
 EXPORT_SYMBOL(mlxsw_core_lag_mapping_clear);
 
+int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core,
+			 struct mlxsw_core_port *mlxsw_core_port, u8 local_port,
+			 struct net_device *dev, bool split, u32 split_group)
+{
+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
+	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+
+	if (split)
+		devlink_port_split_set(devlink_port, split_group);
+	devlink_port_type_eth_set(devlink_port, dev);
+	return devlink_port_register(devlink, devlink_port, local_port);
+}
+EXPORT_SYMBOL(mlxsw_core_port_init);
+
+void mlxsw_core_port_fini(struct mlxsw_core_port *mlxsw_core_port)
+{
+	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+
+	devlink_port_unregister(devlink_port);
+}
+EXPORT_SYMBOL(mlxsw_core_port_fini);
+
 int mlxsw_cmd_exec(struct mlxsw_core *mlxsw_core, u16 opcode, u8 opcode_mod,
 		   u32 in_mod, bool out_mbox_direct,
 		   char *in_mbox, size_t in_mbox_size,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index c73d1c0..06631a0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -43,6 +43,7 @@
 #include <linux/gfp.h>
 #include <linux/types.h>
 #include <linux/skbuff.h>
+#include <net/devlink.h>
 
 #include "trap.h"
 #include "reg.h"
@@ -131,6 +132,15 @@ u8 mlxsw_core_lag_mapping_get(struct mlxsw_core *mlxsw_core,
 void mlxsw_core_lag_mapping_clear(struct mlxsw_core *mlxsw_core,
 				  u16 lag_id, u8 local_port);
 
+struct mlxsw_core_port {
+	struct devlink_port devlink_port;
+};
+
+int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core,
+			 struct mlxsw_core_port *mlxsw_core_port, u8 local_port,
+			 struct net_device *dev, bool split, u32 split_group);
+void mlxsw_core_port_fini(struct mlxsw_core_port *mlxsw_core_port);
+
 #define MLXSW_CONFIG_PROFILE_SWID_COUNT 8
 
 struct mlxsw_swid_config {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 507263a..3216f2b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -50,7 +50,6 @@
 #include <linux/bitops.h>
 #include <linux/list.h>
 #include <linux/dcbnl.h>
-#include <net/devlink.h>
 #include <net/switchdev.h>
 #include <generated/utsrelease.h>
 
@@ -1685,9 +1684,7 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
 static int __mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 				  bool split, u8 module, u8 width)
 {
-	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
 	struct mlxsw_sp_port *mlxsw_sp_port;
-	struct devlink_port *devlink_port;
 	struct net_device *dev;
 	size_t bytes;
 	int err;
@@ -1740,16 +1737,6 @@ static int __mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 	 */
 	dev->hard_header_len += MLXSW_TXHDR_LEN;
 
-	devlink_port = &mlxsw_sp_port->devlink_port;
-	if (mlxsw_sp_port->split)
-		devlink_port_split_set(devlink_port, module);
-	err = devlink_port_register(devlink, devlink_port, local_port);
-	if (err) {
-		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to register devlink port\n",
-			mlxsw_sp_port->local_port);
-		goto err_devlink_port_register;
-	}
-
 	err = mlxsw_sp_port_system_port_mapping_set(mlxsw_sp_port);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to set system port mapping\n",
@@ -1812,7 +1799,14 @@ static int __mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 		goto err_register_netdev;
 	}
 
-	devlink_port_type_eth_set(devlink_port, dev);
+	err = mlxsw_core_port_init(mlxsw_sp->core, &mlxsw_sp_port->core_port,
+				   mlxsw_sp_port->local_port, dev,
+				   mlxsw_sp_port->split, module);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to init core port\n",
+			mlxsw_sp_port->local_port);
+		goto err_core_port_init;
+	}
 
 	err = mlxsw_sp_port_vlan_init(mlxsw_sp_port);
 	if (err)
@@ -1822,6 +1816,8 @@ static int __mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 	return 0;
 
 err_port_vlan_init:
+	mlxsw_core_port_fini(&mlxsw_sp_port->core_port);
+err_core_port_init:
 	unregister_netdev(dev);
 err_register_netdev:
 err_port_dcb_init:
@@ -1832,8 +1828,6 @@ err_port_mtu_set:
 err_port_speed_by_width_set:
 err_port_swid_set:
 err_port_system_port_mapping_set:
-	devlink_port_unregister(&mlxsw_sp_port->devlink_port);
-err_devlink_port_register:
 err_dev_addr_init:
 	free_percpu(mlxsw_sp_port->pcpu_stats);
 err_alloc_stats:
@@ -1887,16 +1881,13 @@ static void mlxsw_sp_port_vports_fini(struct mlxsw_sp_port *mlxsw_sp_port)
 static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
 	struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[local_port];
-	struct devlink_port *devlink_port;
 
 	if (!mlxsw_sp_port)
 		return;
 	mlxsw_sp->ports[local_port] = NULL;
-	devlink_port = &mlxsw_sp_port->devlink_port;
-	devlink_port_type_clear(devlink_port);
+	mlxsw_core_port_fini(&mlxsw_sp_port->core_port);
 	unregister_netdev(mlxsw_sp_port->dev); /* This calls ndo_stop */
 	mlxsw_sp_port_dcb_fini(mlxsw_sp_port);
-	devlink_port_unregister(devlink_port);
 	mlxsw_sp_port_vports_fini(mlxsw_sp_port);
 	mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
 	mlxsw_sp_port_swid_set(mlxsw_sp_port, MLXSW_PORT_SWID_DISABLED_PORT);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 47610a5..361b0c2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -44,7 +44,6 @@
 #include <linux/list.h>
 #include <linux/dcbnl.h>
 #include <net/switchdev.h>
-#include <net/devlink.h>
 
 #include "port.h"
 #include "core.h"
@@ -166,6 +165,7 @@ struct mlxsw_sp_port_pcpu_stats {
 };
 
 struct mlxsw_sp_port {
+	struct mlxsw_core_port core_port; /* must be first */
 	struct net_device *dev;
 	struct mlxsw_sp_port_pcpu_stats __percpu *pcpu_stats;
 	struct mlxsw_sp *mlxsw_sp;
@@ -198,7 +198,6 @@ struct mlxsw_sp_port {
 	unsigned long *untagged_vlans;
 	/* VLAN interfaces */
 	struct list_head vports_list;
-	struct devlink_port devlink_port;
 };
 
 static inline bool
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index c49447f..2417f09 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -43,7 +43,6 @@
 #include <linux/device.h>
 #include <linux/skbuff.h>
 #include <linux/if_vlan.h>
-#include <net/devlink.h>
 #include <net/switchdev.h>
 #include <generated/utsrelease.h>
 
@@ -75,11 +74,11 @@ struct mlxsw_sx_port_pcpu_stats {
 };
 
 struct mlxsw_sx_port {
+	struct mlxsw_core_port core_port; /* must be first */
 	struct net_device *dev;
 	struct mlxsw_sx_port_pcpu_stats __percpu *pcpu_stats;
 	struct mlxsw_sx *mlxsw_sx;
 	u8 local_port;
-	struct devlink_port devlink_port;
 };
 
 /* tx_hdr_version
@@ -956,9 +955,7 @@ mlxsw_sx_port_mac_learning_mode_set(struct mlxsw_sx_port *mlxsw_sx_port,
 
 static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 {
-	struct devlink *devlink = priv_to_devlink(mlxsw_sx->core);
 	struct mlxsw_sx_port *mlxsw_sx_port;
-	struct devlink_port *devlink_port;
 	struct net_device *dev;
 	bool usable;
 	int err;
@@ -1012,14 +1009,6 @@ static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 		goto port_not_usable;
 	}
 
-	devlink_port = &mlxsw_sx_port->devlink_port;
-	err = devlink_port_register(devlink, devlink_port, local_port);
-	if (err) {
-		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to register devlink port\n",
-			mlxsw_sx_port->local_port);
-		goto err_devlink_port_register;
-	}
-
 	err = mlxsw_sx_port_system_port_mapping_set(mlxsw_sx_port);
 	if (err) {
 		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to set system port mapping\n",
@@ -1077,11 +1066,19 @@ static int mlxsw_sx_port_create(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 		goto err_register_netdev;
 	}
 
-	devlink_port_type_eth_set(devlink_port, dev);
+	err = mlxsw_core_port_init(mlxsw_sx->core, &mlxsw_sx_port->core_port,
+				   mlxsw_sx_port->local_port, dev, false, 0);
+	if (err) {
+		dev_err(mlxsw_sx->bus_info->dev, "Port %d: Failed to init core port\n",
+			mlxsw_sx_port->local_port);
+		goto err_core_port_init;
+	}
 
 	mlxsw_sx->ports[local_port] = mlxsw_sx_port;
 	return 0;
 
+err_core_port_init:
+	unregister_netdev(dev);
 err_register_netdev:
 err_port_mac_learning_mode_set:
 err_port_stp_state_set:
@@ -1090,8 +1087,6 @@ err_port_mtu_set:
 err_port_speed_set:
 err_port_swid_set:
 err_port_system_port_mapping_set:
-	devlink_port_unregister(&mlxsw_sx_port->devlink_port);
-err_devlink_port_register:
 port_not_usable:
 err_port_module_check:
 err_dev_addr_get:
@@ -1104,15 +1099,12 @@ err_alloc_stats:
 static void mlxsw_sx_port_remove(struct mlxsw_sx *mlxsw_sx, u8 local_port)
 {
 	struct mlxsw_sx_port *mlxsw_sx_port = mlxsw_sx->ports[local_port];
-	struct devlink_port *devlink_port;
 
 	if (!mlxsw_sx_port)
 		return;
-	devlink_port = &mlxsw_sx_port->devlink_port;
-	devlink_port_type_clear(devlink_port);
+	mlxsw_core_port_fini(&mlxsw_sx_port->core_port);
 	unregister_netdev(mlxsw_sx_port->dev); /* This calls ndo_stop */
 	mlxsw_sx_port_swid_set(mlxsw_sx_port, MLXSW_PORT_SWID_DISABLED_PORT);
-	devlink_port_unregister(devlink_port);
 	free_percpu(mlxsw_sx_port->pcpu_stats);
 	free_netdev(mlxsw_sx_port->dev);
 }
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 1/6] devlink: remove implicit type set in port register
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <1460135485-16095-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

As we rely on caller zeroing or correctly set the struct before the call,
this implicit type set is either no-op (DEVLINK_PORT_TYPE_NOTSET is 0)
or it rewrites wanted value. So remove this.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/devlink.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 590fa56..44f880d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -630,7 +630,6 @@ int devlink_port_register(struct devlink *devlink,
 	}
 	devlink_port->devlink = devlink;
 	devlink_port->index = port_index;
-	devlink_port->type = DEVLINK_PORT_TYPE_NOTSET;
 	devlink_port->registered = true;
 	list_add_tail(&devlink_port->list, &devlink->port_list);
 	mutex_unlock(&devlink_port_mutex);
-- 
2.5.5

^ permalink raw reply related

* [patch net-next 0/6] mlxsw: small driver update + one tiny devlink dependency
From: Jiri Pirko @ 2016-04-08 17:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, gospo

From: Jiri Pirko <jiri@mellanox.com>

Cosmetics, in preparation to sharedbuffer patchset.
First patch is here to allow patch number two.

Jiri Pirko (6):
  devlink: remove implicit type set in port register
  mlxsw: Move devlink port registration into common core code
  mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*
  mlxsw: Do not pass around driver_priv directly
  mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM
  mlxsw: reg: Fix SBPM register name

 drivers/net/ethernet/mellanox/mlxsw/core.c         | 56 ++++++++++++++--------
 drivers/net/ethernet/mellanox/mlxsw/core.h         | 26 +++++++---
 drivers/net/ethernet/mellanox/mlxsw/reg.h          | 27 ++++-------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 52 +++++++++-----------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  3 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 20 ++++----
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c     | 42 +++++++---------
 net/core/devlink.c                                 |  1 -
 8 files changed, 114 insertions(+), 113 deletions(-)

-- 
2.5.5

^ permalink raw reply

* Re: [patch net-next 0/5] mlxsw: small driver update
From: David Miller @ 2016-04-08 17:07 UTC (permalink / raw)
  To: jiri; +Cc: netdev, idosch, eladr, yotamg, ogerlitz, roopa, gospo
In-Reply-To: <20160408155155.GC1932@nanopsycho.orion>

From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 8 Apr 2016 17:51:55 +0200

> Fri, Apr 08, 2016 at 05:45:20PM CEST, jiri@resnulli.us wrote:
>>From: Jiri Pirko <jiri@mellanox.com>
>>
>>Cosmetics, in preparation to sharedbuffer patchset.
> 
> Dave, I just realized there is dependency on:
> "devlink: remove implicit type set in port register" which I sent couple
> of minutes after this patchset. I can either resend in bulk, or if you
> could apply in order, that would be great.

The devlink series also lacked a header posting.  Can you just sort this
all out properly and respin everything?

Thanks.

> Thanks and sorry, owe you another beer :)

:-)

^ permalink raw reply

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Eric Dumazet @ 2016-04-08 17:04 UTC (permalink / raw)
  To: David Miller; +Cc: yangyingliang, netdev, dingtianhong
In-Reply-To: <20160408.125336.1805413467131988444.davem@davemloft.net>

On Fri, 2016-04-08 at 12:53 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 08 Apr 2016 07:44:25 -0700
> 
> > On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> > 
> >> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
> > 
> > Try :
> > 
> > echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> > 
> > And restart your flows.
> 
> I'm honestly beginning to suspect a bug in their driver and how they
> handle skb->truesize.
> 
> Yang, until you show us the driver you are using and how is handles
> receive packets, we are largely in the dark about a major component
> of this issue and that is entirely unfair to us.

Apparently their skb->truesize and skb->len combinations are correct.

I suspect an issue with rcvbuf autouning on a bidirectional tcp traffic.
We mostly focus on unidirectional flows, but they seem to use a mixed
case.

Also, fact that sendmsg() locks the socket for the duration of the call
is problematic : I suspect their issues would mostly disappear by using
smaller chunk sizes (ie 64KB per sendmsg() instead of 256KB).

We also could add resched points in sendmsg() (processing backlog if it
gets too hot), but I fear this would slow down the fast path.

^ permalink raw reply

* Re: [RFC PATCH v2 4/5] mlx4: add support for fast rx drop bpf program
From: Brenden Blanco @ 2016-04-08 17:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo
In-Reply-To: <20160408134158.0f153ff9@redhat.com>

On Fri, Apr 08, 2016 at 01:41:58PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Thu,  7 Apr 2016 21:48:49 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
> 
> > +int mlx4_call_bpf(struct bpf_prog *prog, void *data, unsigned int length)
> > +{
> > +	struct sk_buff *skb = this_cpu_ptr(&percpu_bpf_phys_dev_md);
> > +	int ret;
> > +
> > +	build_bpf_phys_dev_md(skb, data, length);
> > +
> > +	rcu_read_lock();
> > +	ret = BPF_PROG_RUN(prog, (void *)skb);
> > +	rcu_read_unlock();
> > +
> > +	return ret;
> > +}
> [...]
> 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > index 86bcfe5..287da02 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > @@ -840,6 +843,23 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> >  		l2_tunnel = (dev->hw_enc_features & NETIF_F_RXCSUM) &&
> >  			(cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_L2_TUNNEL));
> >  
> > +		/* A bpf program gets first chance to drop the packet. It may
> > +		 * read bytes but not past the end of the frag.
> > +		 */
> > +		if (prog) {
> > +			struct ethhdr *ethh;
> > +			dma_addr_t dma;
> > +
> > +			dma = be64_to_cpu(rx_desc->data[0].addr);
> > +			dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
> > +						DMA_FROM_DEVICE);
> > +			ethh = page_address(frags[0].page) +
> > +							frags[0].page_offset;
> > +			if (mlx4_call_bpf(prog, ethh, frags[0].page_size) ==
> > +							BPF_PHYS_DEV_DROP)
> > +				goto next;
> > +		}
> > +
> 
> Do the program need to know if the packet-page we handover is
> considered 'read-only' at the call site?  Or do we rely on my idea
> requiring the registration to "know" this...
Seems that we're leaning towards the latter at this point. In either
case, we won't allow buggy situations, and let's see which approach
works better when we have RFC code in hand.
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Brenden Blanco @ 2016-04-08 17:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo, linux-mm
In-Reply-To: <20160408143340.10e5b1d0@redhat.com>

On Fri, Apr 08, 2016 at 02:33:40PM +0200, Jesper Dangaard Brouer wrote:
> 
> On Fri, 8 Apr 2016 12:36:14 +0200 Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > > +/* user return codes for PHYS_DEV prog type */
> > > +enum bpf_phys_dev_action {
> > > +	BPF_PHYS_DEV_DROP,
> > > +	BPF_PHYS_DEV_OK,
> > > +};  
> > 
> > I can imagine these extra return codes:
> > 
> >  BPF_PHYS_DEV_MODIFIED,   /* Packet page/payload modified */
> >  BPF_PHYS_DEV_STOLEN,     /* E.g. forward use-case */
> >  BPF_PHYS_DEV_SHARED,     /* Queue for async processing, e.g. tcpdump use-case */
> > 
> > The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> > which we can look at when we get that far...
> 
> I want to point out something which is quite FUNDAMENTAL, for
> understanding these return codes (and network stack).
> 
> 
> At driver RX time, the network stack basically have two ways of
> building an SKB, which is send up the stack.
> 
> Option-A (fastest): The packet page is writable. The SKB can be
> allocated and skb->data/head can point directly to the page.  And
> we place/write skb_shared_info in the end/tail-room. (This is done by
> calling build_skb()).
> 
> Option-B (slower): The packet page is read-only.  The SKB cannot point
> skb->data/head directly to the page, because skb_shared_info need to be
> written into skb->end (slightly hidden via skb_shinfo() casting).  To
> get around this, a separate piece of memory is allocated (speedup by
> __alloc_page_frag) for pointing skb->data/head, so skb_shared_info can
> be written. (This is done when calling netdev/napi_alloc_skb()).
>   Drivers then need to copy over packet headers, and assign + adjust
> skb_shinfo(skb)->frags[0] offset to skip copied headers.
> 
> 
> Unfortunately most drivers use option-B.  Due to cost of calling the
> page allocator.  It is only slightly most expensive to get a larger
> compound page from the page allocator, which then can be partitioned into
> page-fragments, thus amortizing the page alloc cost.  Unfortunately the
> cost is added later, when constructing the SKB.
>  Another reason for option-B, is that archs with expensive IOMMU
> requirements (like PowerPC), don't need to dma_unmap on every packet,
> but only on the compound page level.
> 
> Side-note: Most drivers have a "copy-break" optimization.  Especially
> for option-B, when copying header data anyhow. For small packet, one
> might as well free (or recycle) the RX page, if header size fits into
> the newly allocated memory (for skb_shared_info).
> 
> 
> For the early filter drop (DDoS use-case), it does not matter that the
> packet-page is read-only.
> 
> BUT for the future XDP (eXpress Data Path) use-case it does matter.  If
> we ever want to see speeds comparable to DPDK, then drivers to
> need to implement option-A, as this allow forwarding at the packet-page
> level.
> 
> I hope, my future page-pool facility can remove/hide the cost calling
> the page allocator.
> 
Can't wait! This will open up a lot of doors.
> 
> Back to the return codes, thus:
> -------------------------------
> BPF_PHYS_DEV_SHARED requires driver use option-B, when constructing
> the SKB, and treat packet data as read-only.
> 
> BPF_PHYS_DEV_MODIFIED requires driver to provide a writable packet-page.
I understand the driver/hw requirement, but the codes themselves I think
need some tweaking. For instance, if the packet is both modified and
forwarded, should the flags be ORed together? Or is the need for this
return code made obsolete if the driver knows ahead of time via struct
bpf_prog flags that the prog intends to modify the packet, and can set
up the page accordingly?
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-next] net: bcmgenet: add BQL support
From: Petri Gynther @ 2016-04-08 16:54 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, David Miller, opendmb, Jaedon Shin
In-Reply-To: <CAGVrzcbHGq7YP6kJ+M+jM+CCsKXMoqtzkf04YHPj+kseqm__9A@mail.gmail.com>

On Wed, Apr 6, 2016 at 1:25 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> 2016-04-05 17:50 GMT-07:00 Petri Gynther <pgynther@google.com>:
> > Add Byte Queue Limits (BQL) support to bcmgenet driver.
> >
> > Signed-off-by: Petri Gynther <pgynther@google.com>
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>
> Thanks!
> --
> Florian

Any further comments?

Notable difference from some other drivers --
netdev_tx_reset_queue(txq) is called for all queues in
bcmgenet_netif_start(), just before netif_tx_start_all_queues(dev).
This is to ensure that BQL is reset before the interface becomes
operational.

I think that is the right place for these calls.

Some other drivers call it from the "interface down" path.

^ permalink raw reply

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: David Miller @ 2016-04-08 16:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: yangyingliang, netdev, dingtianhong
In-Reply-To: <1460126665.6473.437.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Apr 2016 07:44:25 -0700

> On Fri, 2016-04-08 at 19:18 +0800, Yang Yingliang wrote:
> 
>> I expand  tcp_adv_win_scale and tcp_rmem. It has no effect.
> 
> Try :
> 
> echo -2 >/proc/sys/net/ipv4/tcp_adv_win_scale
> 
> And restart your flows.

I'm honestly beginning to suspect a bug in their driver and how they
handle skb->truesize.

Yang, until you show us the driver you are using and how is handles
receive packets, we are largely in the dark about a major component
of this issue and that is entirely unfair to us.

^ permalink raw reply

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Brenden Blanco @ 2016-04-08 16:48 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jesper Dangaard Brouer, davem, netdev, tom, alexei.starovoitov,
	ogerlitz, eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo
In-Reply-To: <5707916A.2030305@iogearbox.net>

On Fri, Apr 08, 2016 at 01:09:30PM +0200, Daniel Borkmann wrote:
> On 04/08/2016 12:36 PM, Jesper Dangaard Brouer wrote:
> >On Thu,  7 Apr 2016 21:48:46 -0700
> >Brenden Blanco <bblanco@plumgrid.com> wrote:
> >
> >>Add a new bpf prog type that is intended to run in early stages of the
> >>packet rx path. Only minimal packet metadata will be available, hence a
> >>new context type, struct bpf_phys_dev_md, is exposed to userspace. So
> >>far only expose the readable packet length, and only in read mode.
> >>
> >>The PHYS_DEV name is chosen to represent that the program is meant only
> >>for physical adapters, rather than all netdevs.
> >>
> >>While the user visible struct is new, the underlying context must be
> >>implemented as a minimal skb in order for the packet load_* instructions
> >>to work. The skb filled in by the driver must have skb->len, skb->head,
> >>and skb->data set, and skb->data_len == 0.
> >>
> >>Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> >>---
> >>  include/uapi/linux/bpf.h | 14 ++++++++++
> >>  kernel/bpf/verifier.c    |  1 +
> >>  net/core/filter.c        | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 83 insertions(+)
> >>
> >>diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>index 70eda5a..3018d83 100644
> >>--- a/include/uapi/linux/bpf.h
> >>+++ b/include/uapi/linux/bpf.h
> >>@@ -93,6 +93,7 @@ enum bpf_prog_type {
> >>  	BPF_PROG_TYPE_SCHED_CLS,
> >>  	BPF_PROG_TYPE_SCHED_ACT,
> >>  	BPF_PROG_TYPE_TRACEPOINT,
> >>+	BPF_PROG_TYPE_PHYS_DEV,
> >>  };
> >>
> >>  #define BPF_PSEUDO_MAP_FD	1
> >>@@ -368,6 +369,19 @@ struct __sk_buff {
> >>  	__u32 tc_classid;
> >>  };
> >>
> >>+/* user return codes for PHYS_DEV prog type */
> >>+enum bpf_phys_dev_action {
> >>+	BPF_PHYS_DEV_DROP,
> >>+	BPF_PHYS_DEV_OK,
> >>+};
> >
> >I can imagine these extra return codes:
> >
> >  BPF_PHYS_DEV_MODIFIED,   /* Packet page/payload modified */
> >  BPF_PHYS_DEV_STOLEN,     /* E.g. forward use-case */
> >  BPF_PHYS_DEV_SHARED,     /* Queue for async processing, e.g. tcpdump use-case */
> >
> >The "STOLEN" and "SHARED" use-cases require some refcnt manipulations,
> >which we can look at when we get that far...
> 
> I'd probably let the tcpdump case be handled by the rest of the stack.
> Forwarding could be to some txqueue or perhaps directly to a virtual net
> device e.g. the veth end sitting in a container (where fake skb would
> need to be promoted to a real one). Or, perhaps instead of veth end,
> directly demuxed into a target socket queue in that container ...
> Alternatively for tcpdump use-case, you could also do sampling on the
> bpf_phy_dev with eBPF maps.
+1, there is plenty of infrastructure to deal with this already.
> 
> >For the "MODIFIED" case, people caring about checksumming, might want
> >to voice their concern if we want additional info or return codes,
> >indicating if software need to recalc CSUM (or e.g. if only IP-pseudo
> >hdr was modified).
> >
> >>+/* user accessible metadata for PHYS_DEV packet hook
> >>+ * new fields must be added to the end of this structure
> >>+ */
> >>+struct bpf_phys_dev_md {
> >>+	__u32 len;
> >>+};
> >
> >This is userspace visible.  I can easily imagine this structure will get
> >extended.  How does a userspace eBPF program know that the struct got
> >extended? (bet you got some scheme, normally I would add a "version" as
> >first elem).
Since fields are only ever added to the end, programs that access beyond
the struct as understood by the running kernel will be rejected. The
struct ordering is a hard requirement. We've gone through quite a few
upgrades of this style of struct and not had any issues that I am aware.
> >
> >BTW, how and where is this "struct bpf_phys_dev_md" allocated?
> 
> It isn't, see bpf_phys_dev_convert_ctx_access(). At load time the verifier
> will convert/rewrite this based on offsetof() to a real access of the per
> cpu sk_buff, that's the only purpose.
> 
> Cheers,
> Daniel

^ permalink raw reply

* Re: [PATCH net] mpls: find_outdev: check for err ptr in addition to NULL check
From: David Miller @ 2016-04-08 16:43 UTC (permalink / raw)
  To: roopa; +Cc: netdev
In-Reply-To: <1460089718-25788-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Thu,  7 Apr 2016 21:28:38 -0700

> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> find_outdev calls inet{,6}_fib_lookup_dev() or dev_get_by_index() to
> find the output device. In case of an error, inet{,6}_fib_lookup_dev()
> returns error pointer and dev_get_by_index() returns NULL. But the function
> only checks for NULL and thus can end up calling dev_put on an ERR_PTR.
> This patch adds an additional check for err ptr after the NULL check.
> 
> Before: Trying to add an mpls route with no oif from user, no available
> path to 10.1.1.8 and no default route:
> $ip -f mpls route add 100 as 200 via inet 10.1.1.8
> [  822.337195] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000003a3
 ...
> After patch:
> $ip -f mpls route add 100 as 200 via inet 10.1.1.8
> RTNETLINK answers: Network is unreachable
> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Reported-by: David Miller <davem@davemloft.net>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net: thunderx: Fix broken of_node_put() code.
From: David Daney @ 2016-04-08 16:41 UTC (permalink / raw)
  To: David Daney
  Cc: David S. Miller, netdev, linux-kernel, linux-arm-kernel,
	Robert Richter, Sunil Goutham, David Daney
In-Reply-To: <1459472517-5696-1-git-send-email-ddaney.cavm@gmail.com>

Due to mail server malfunction, this patch was sent twice.  Please 
ignore this duplicate.

Thanks,
David Daney


On 03/31/2016 06:01 PM, David Daney wrote:
> From: David Daney <david.daney@cavium.com>
>
> commit b7d3e3d3d21a ("net: thunderx: Don't leak phy device references
> on -EPROBE_DEFER condition.") incorrectly moved the call to
> of_node_put() outside of the loop.  Under normal loop exit, the node
> has already had of_node_put() called, so the extra call results in:
>
> [    8.228020] ERROR: Bad of_node_put() on /soc@0/pci@848000000000/mrml-bridge0@1,0/bgx0/xlaui00
> [    8.239433] CPU: 16 PID: 608 Comm: systemd-udevd Not tainted 4.6.0-rc1-numa+ #157
> [    8.247380] Hardware name: www.cavium.com EBB8800/EBB8800, BIOS 0.3 Mar  2 2016
> [    8.273541] Call trace:
> [    8.273550] [<fffffc0008097364>] dump_backtrace+0x0/0x210
> [    8.273557] [<fffffc0008097598>] show_stack+0x24/0x2c
> [    8.273560] [<fffffc0008399ed0>] dump_stack+0x8c/0xb4
> [    8.273566] [<fffffc00085aa828>] of_node_release+0xa8/0xac
> [    8.273570] [<fffffc000839cad8>] kobject_cleanup+0x8c/0x194
> [    8.273573] [<fffffc000839c97c>] kobject_put+0x44/0x6c
> [    8.273576] [<fffffc00085a9ab0>] of_node_put+0x24/0x30
> [    8.273587] [<fffffc0000bd0f74>] bgx_probe+0x17c/0xcd8 [thunder_bgx]
> [    8.273591] [<fffffc00083ed220>] pci_device_probe+0xa0/0x114
> [    8.273596] [<fffffc0008473fbc>] driver_probe_device+0x178/0x418
> [    8.273599] [<fffffc000847435c>] __driver_attach+0x100/0x118
> [    8.273602] [<fffffc0008471b58>] bus_for_each_dev+0x6c/0xac
> [    8.273605] [<fffffc0008473884>] driver_attach+0x30/0x38
> [    8.273608] [<fffffc00084732f4>] bus_add_driver+0x1f8/0x29c
> [    8.273611] [<fffffc0008475028>] driver_register+0x70/0x110
> [    8.273617] [<fffffc00083ebf08>] __pci_register_driver+0x60/0x6c
> [    8.273623] [<fffffc0000bf0040>] bgx_init_module+0x40/0x48 [thunder_bgx]
> [    8.273626] [<fffffc0008090d04>] do_one_initcall+0xcc/0x1c0
> [    8.273631] [<fffffc0008198abc>] do_init_module+0x68/0x1c8
> [    8.273635] [<fffffc0008125668>] load_module+0xf44/0x11f4
> [    8.273638] [<fffffc0008125b64>] SyS_finit_module+0xb8/0xe0
> [    8.273641] [<fffffc0008093b30>] el0_svc_naked+0x24/0x28
>
> Go back to the previous (correct) code that only did the extra
> of_node_put() call on early exit from the loop.
>
> Signed-off-by: David Daney <david.daney@cavium.com>
> ---
>   drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> index 9679515..d20539a 100644
> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> @@ -1011,10 +1011,11 @@ static int bgx_init_of_phy(struct bgx *bgx)
>   		}
>
>   		lmac++;
> -		if (lmac == MAX_LMAC_PER_BGX)
> +		if (lmac == MAX_LMAC_PER_BGX) {
> +			of_node_put(node);
>   			break;
> +		}
>   	}
> -	of_node_put(node);
>   	return 0;
>
>   defer:
>

^ permalink raw reply

* Re: [RFC PATCH v2 2/5] net: add ndo to set bpf prog in adapter rx
From: Brenden Blanco @ 2016-04-08 16:39 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo
In-Reply-To: <20160408113858.4d39b274@redhat.com>

On Fri, Apr 08, 2016 at 11:38:58AM +0200, Jesper Dangaard Brouer wrote:
> 
> On Thu,  7 Apr 2016 21:48:47 -0700 Brenden Blanco <bblanco@plumgrid.com> wrote:
> 
> > Add two new set/get netdev ops for drivers implementing the
> > BPF_PROG_TYPE_PHYS_DEV filter.
> > 
> > Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
> [...]
> > 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index cb4e508..3acf732 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> [...]
> > @@ -1102,6 +1103,14 @@ struct tc_to_netdev {
> >   *	appropriate rx headroom value allows avoiding skb head copy on
> >   *	forward. Setting a negative value resets the rx headroom to the
> >   *	default value.
> > + * int  (*ndo_bpf_set)(struct net_device *dev, struct bpf_prog *prog);
> > + *	This function is used to set or clear a bpf program used in the
> > + *	earliest stages of packet rx. The prog will have been loaded as
> > + *	BPF_PROG_TYPE_PHYS_DEV. The callee is responsible for calling
> > + *	bpf_prog_put on any old progs that are stored, but not on the passed
> > + *	in prog.
> > + * bool (*ndo_bpf_get)(struct net_device *dev);
> > + *	This function is used to check if a bpf program is set on the device.
> >   *
> 
> This interface for the entire device, right.  I can imagine that users
> want to attach a eBPF program per RX queue.  Can we adapt the interface
> to support this? (could default to adding it all queues)
> 
Currently yes, for the entire device. I don't see rx queue exposed in
common setlink APIs. Wouldn't this be available only through ethtool,
generally? That would be a significant change to the api, but not a lot
of code. I would defer to others on which is cleaner. An alternative
could be to always run the program, but expose the queue number in
struct bpf_phys_dev_md. That is not as flexible since the program is
still shared, but maybe still useful.
> 
> I'm also wondering if we should add a "flags" parameter.  Or maybe we
> can extend 'struct bpf_prog' with I have in mind.
> 
> When the eBPF program is attached to a RX queue, I want to know if the
> program want to modify packet-data, up-front.
> 
> The problem is that most drivers use dma_sync, which means that data is
> considered 'read-only' (the "considered" part depend on DMA engine, and
> we might find a DMA loop-hole for some configs).
>   If the program want to write, the driver have the option of
> reconfiguring the driver routine to use dma_unmap, before handing over
> the page.  Or driver can also choose to realloc the specific RX ring
> queue pages as single pages (using dma_map/unmap consistently).
>  This also allow us to give a return code indicating given driver does
> not support writable packet-pages (rejecting program).
When write-mode is enabled for this prog type, we'll add the flag. I
don't want to add unused flags prematurely. When we add such support, it
should be available in the bpf_prog struct, similar to the cb_access or
dst_needed bool fields.
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [net-next 00/18][pull request] 10GbE Intel Wired LAN Driver Updates 2016-04-07
From: David Miller @ 2016-04-08 16:32 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene, john.ronciak
In-Reply-To: <1460085673-87056-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu,  7 Apr 2016 20:20:55 -0700

> This series contains updates to ixgbe and ixgbevf.

Pulled, thanks Jeff.

^ permalink raw reply

* [PATCH] net: thunderx: Fix broken of_node_put() code.
From: David Daney @ 2016-04-01  1:01 UTC (permalink / raw)
  To: David S. Miller, netdev
  Cc: linux-kernel, linux-arm-kernel, Robert Richter, Sunil Goutham,
	David Daney

From: David Daney <david.daney@cavium.com>

commit b7d3e3d3d21a ("net: thunderx: Don't leak phy device references
on -EPROBE_DEFER condition.") incorrectly moved the call to
of_node_put() outside of the loop.  Under normal loop exit, the node
has already had of_node_put() called, so the extra call results in:

[    8.228020] ERROR: Bad of_node_put() on /soc@0/pci@848000000000/mrml-bridge0@1,0/bgx0/xlaui00
[    8.239433] CPU: 16 PID: 608 Comm: systemd-udevd Not tainted 4.6.0-rc1-numa+ #157
[    8.247380] Hardware name: www.cavium.com EBB8800/EBB8800, BIOS 0.3 Mar  2 2016
[    8.273541] Call trace:
[    8.273550] [<fffffc0008097364>] dump_backtrace+0x0/0x210
[    8.273557] [<fffffc0008097598>] show_stack+0x24/0x2c
[    8.273560] [<fffffc0008399ed0>] dump_stack+0x8c/0xb4
[    8.273566] [<fffffc00085aa828>] of_node_release+0xa8/0xac
[    8.273570] [<fffffc000839cad8>] kobject_cleanup+0x8c/0x194
[    8.273573] [<fffffc000839c97c>] kobject_put+0x44/0x6c
[    8.273576] [<fffffc00085a9ab0>] of_node_put+0x24/0x30
[    8.273587] [<fffffc0000bd0f74>] bgx_probe+0x17c/0xcd8 [thunder_bgx]
[    8.273591] [<fffffc00083ed220>] pci_device_probe+0xa0/0x114
[    8.273596] [<fffffc0008473fbc>] driver_probe_device+0x178/0x418
[    8.273599] [<fffffc000847435c>] __driver_attach+0x100/0x118
[    8.273602] [<fffffc0008471b58>] bus_for_each_dev+0x6c/0xac
[    8.273605] [<fffffc0008473884>] driver_attach+0x30/0x38
[    8.273608] [<fffffc00084732f4>] bus_add_driver+0x1f8/0x29c
[    8.273611] [<fffffc0008475028>] driver_register+0x70/0x110
[    8.273617] [<fffffc00083ebf08>] __pci_register_driver+0x60/0x6c
[    8.273623] [<fffffc0000bf0040>] bgx_init_module+0x40/0x48 [thunder_bgx]
[    8.273626] [<fffffc0008090d04>] do_one_initcall+0xcc/0x1c0
[    8.273631] [<fffffc0008198abc>] do_init_module+0x68/0x1c8
[    8.273635] [<fffffc0008125668>] load_module+0xf44/0x11f4
[    8.273638] [<fffffc0008125b64>] SyS_finit_module+0xb8/0xe0
[    8.273641] [<fffffc0008093b30>] el0_svc_naked+0x24/0x28

Go back to the previous (correct) code that only did the extra
of_node_put() call on early exit from the loop.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index 9679515..d20539a 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -1011,10 +1011,11 @@ static int bgx_init_of_phy(struct bgx *bgx)
 		}
 
 		lmac++;
-		if (lmac == MAX_LMAC_PER_BGX)
+		if (lmac == MAX_LMAC_PER_BGX) {
+			of_node_put(node);
 			break;
+		}
 	}
-	of_node_put(node);
 	return 0;
 
 defer:
-- 
1.8.3.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox