Netdev List

Netdev List
 help / color / mirror / Atom feed

* suspicius csum initialization in vmxnet3_rx_csum
From: Paolo Abeni @ 2018-05-24  8:47 UTC (permalink / raw)
  To: Ronak Doshi, Shrikrishna Khare, pv-drivers; +Cc: netdev, Neil Horman

Hi all,

we are hitting the BUG() condition in skb_checksum_help() -
skb_checksum_start_offset(skb) >= skb_headlen(skb) for skb received
from vmnxnet3 and queued from OVS to user-space

I think that the root cause is in vmxnet3_rx_csum():

			if (gdesc->rcd.csum) {
                                skb->csum = htons(gdesc->rcd.csum);
                                skb->ip_summed = CHECKSUM_PARTIAL;

CHECKSUM_PARTIAL looks suspicious here, as the csum field is
initialized, instead of csum_offset/csum_start. To be honest I find
also strange that the csum value is converted from host byte order.

I'm wild guessing something like the below patch should fix the issue,
but I'm not familiar with the vmxnet3 code. Can you please have a look?

Thank you,

Paolo
---
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 9ebe2a689966..06ade074c32c 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1172,7 +1172,7 @@ vmxnet3_rx_csum(struct vmxnet3_adapter *adapter,
 		} else {
 			if (gdesc->rcd.csum) {
 				skb->csum = htons(gdesc->rcd.csum);
-				skb->ip_summed = CHECKSUM_PARTIAL;
+				skb->ip_summed = CHECKSUM_COMPLETE;
 			} else {
 				skb_checksum_none_assert(skb);
 			}

^ permalink raw reply related

* Re: [PATCH 0/4] RFC CPSW switchdev mode
From: Ilias Apalodimas @ 2018-05-24  8:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, ivecera,
	francois.ozog, yogeshs, spatton
In-Reply-To: <20180524080528.GD2295@nanopsycho>

On Thu, May 24, 2018 at 10:05:28AM +0200, Jiri Pirko wrote:
> Thu, May 24, 2018 at 08:56:20AM CEST, ilias.apalodimas@linaro.org wrote:
> Any reason you need cpu port? We don't need it in mlxsw and also in dsa.
Yes i've seen that on mlxsw/rocker drivers and i was reluctant adding one here. 
The reason is that TI wants this configured differently from customer facing 
ports. Apparently there are existing customers already using the "feature". 
So OR'ing and adding the cpu port on every operation (add/del vlans add 
ucast/mcast entries etc) was less favoured. 
> 
> What is this device? Could you give me some pointer to description?
This is the switch used on TI's AM5728 and BBB boards. I am pretty sure there 
are other platforms i am not aware of.
http://www.ti.com/lit/ug/spruhz6j/spruhz6j.pdf is the techincal reference
manual. Section 24.11.5.4 "Initialization and Configuration of CPSW" is the
switch part.

Thanks,
Ilias

^ permalink raw reply

* [PATCH net-next] net: bridge: add support for port isolation
From: Nikolay Aleksandrov @ 2018-05-24  8:56 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, stephen, bridge, Nikolay Aleksandrov

This patch adds support for a new port flag - BR_ISOLATED. If it is set
then isolated ports cannot communicate between each other, but they can
still communicate with non-isolated ports. The same can be achieved via
ACLs but they can't scale with large number of ports and also the
complexity of the rules grows. This feature can be used to achieve
isolated vlan functionality (similar to pvlan) as well, though currently
it will be port-wide (for all vlans on the port). The new test in
should_deliver uses data that is already cache hot and the new boolean
is used to avoid an additional source port test in should_deliver.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 include/linux/if_bridge.h    | 1 +
 include/uapi/linux/if_link.h | 1 +
 net/bridge/br_forward.c      | 3 ++-
 net/bridge/br_input.c        | 1 +
 net/bridge/br_netlink.c      | 9 ++++++++-
 net/bridge/br_private.h      | 9 +++++++++
 net/bridge/br_sysfs_if.c     | 2 ++
 7 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 585d27182425..7843b98e1c6e 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -50,6 +50,7 @@ struct br_ip_list {
 #define BR_VLAN_TUNNEL		BIT(13)
 #define BR_BCAST_FLOOD		BIT(14)
 #define BR_NEIGH_SUPPRESS	BIT(15)
+#define BR_ISOLATED		BIT(16)
 
 #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b85266420bfb..cf01b6824244 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -333,6 +333,7 @@ enum {
 	IFLA_BRPORT_BCAST_FLOOD,
 	IFLA_BRPORT_GROUP_FWD_MASK,
 	IFLA_BRPORT_NEIGH_SUPPRESS,
+	IFLA_BRPORT_ISOLATED,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 7a7fd672ccf2..9019f326fe81 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -30,7 +30,8 @@ static inline int should_deliver(const struct net_bridge_port *p,
 	vg = nbp_vlan_group_rcu(p);
 	return ((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
 		br_allowed_egress(vg, skb) && p->state == BR_STATE_FORWARDING &&
-		nbp_switchdev_allowed_egress(p, skb);
+		nbp_switchdev_allowed_egress(p, skb) &&
+		!br_skb_isolated(p, skb);
 }
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 7f98a7d25866..72074276c088 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -114,6 +114,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 		goto drop;
 
 	BR_INPUT_SKB_CB(skb)->brdev = br->dev;
+	BR_INPUT_SKB_CB(skb)->src_port_isolated = !!(p->flags & BR_ISOLATED);
 
 	if (IS_ENABLED(CONFIG_INET) &&
 	    (skb->protocol == htons(ETH_P_ARP) ||
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 015f465c514b..9f5eb05b0373 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -139,6 +139,7 @@ static inline size_t br_port_info_size(void)
 		+ nla_total_size(1)	/* IFLA_BRPORT_PROXYARP_WIFI */
 		+ nla_total_size(1)	/* IFLA_BRPORT_VLAN_TUNNEL */
 		+ nla_total_size(1)	/* IFLA_BRPORT_NEIGH_SUPPRESS */
+		+ nla_total_size(1)	/* IFLA_BRPORT_ISOLATED */
 		+ nla_total_size(sizeof(struct ifla_bridge_id))	/* IFLA_BRPORT_ROOT_ID */
 		+ nla_total_size(sizeof(struct ifla_bridge_id))	/* IFLA_BRPORT_BRIDGE_ID */
 		+ nla_total_size(sizeof(u16))	/* IFLA_BRPORT_DESIGNATED_PORT */
@@ -213,7 +214,8 @@ static int br_port_fill_attrs(struct sk_buff *skb,
 							BR_VLAN_TUNNEL)) ||
 	    nla_put_u16(skb, IFLA_BRPORT_GROUP_FWD_MASK, p->group_fwd_mask) ||
 	    nla_put_u8(skb, IFLA_BRPORT_NEIGH_SUPPRESS,
-		       !!(p->flags & BR_NEIGH_SUPPRESS)))
+		       !!(p->flags & BR_NEIGH_SUPPRESS)) ||
+	    nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)))
 		return -EMSGSIZE;
 
 	timerval = br_timer_value(&p->message_age_timer);
@@ -660,6 +662,7 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_VLAN_TUNNEL] = { .type = NLA_U8 },
 	[IFLA_BRPORT_GROUP_FWD_MASK] = { .type = NLA_U16 },
 	[IFLA_BRPORT_NEIGH_SUPPRESS] = { .type = NLA_U8 },
+	[IFLA_BRPORT_ISOLATED]	= { .type = NLA_U8 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -810,6 +813,10 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
 	if (err)
 		return err;
 
+	err = br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
+	if (err)
+		return err;
+
 	br_port_flags_change(p, old_flags ^ p->flags);
 	return 0;
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 742f40aefdaf..11520ed528b0 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -423,6 +423,7 @@ struct br_input_skb_cb {
 #endif
 
 	bool proxyarp_replied;
+	bool src_port_isolated;
 
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 	bool vlan_filtered;
@@ -574,6 +575,14 @@ int br_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
 void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	      enum br_pkt_type pkt_type, bool local_rcv, bool local_orig);
 
+/* return true if both source port and dest port are isolated */
+static inline bool br_skb_isolated(const struct net_bridge_port *to,
+				   const struct sk_buff *skb)
+{
+	return BR_INPUT_SKB_CB(skb)->src_port_isolated &&
+	       (to->flags & BR_ISOLATED);
+}
+
 /* br_if.c */
 void br_port_carrier_check(struct net_bridge_port *p, bool *notified);
 int br_add_bridge(struct net *net, const char *name);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index fd31ad83ec7b..f99c5bf5c906 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -192,6 +192,7 @@ BRPORT_ATTR_FLAG(proxyarp_wifi, BR_PROXYARP_WIFI);
 BRPORT_ATTR_FLAG(multicast_flood, BR_MCAST_FLOOD);
 BRPORT_ATTR_FLAG(broadcast_flood, BR_BCAST_FLOOD);
 BRPORT_ATTR_FLAG(neigh_suppress, BR_NEIGH_SUPPRESS);
+BRPORT_ATTR_FLAG(isolated, BR_ISOLATED);
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -243,6 +244,7 @@ static const struct brport_attribute *brport_attrs[] = {
 	&brport_attr_broadcast_flood,
 	&brport_attr_group_fwd_mask,
 	&brport_attr_neigh_suppress,
+	&brport_attr_isolated,
 	NULL
 };
 
-- 
2.11.0

^ permalink raw reply related

* Re: INFO: rcu detected stall in corrupted
From: Xin Long @ 2018-05-24  9:02 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Eric Dumazet, David Miller, syzbot+f116bc1994efe725d51b, kuznet,
	LKML, network dev, syzkaller-bugs, yoshfuji, dsahern,
	Roopa Prabhu, linux-sctp
In-Reply-To: <20180523231340.GN5488@localhost.localdomain>

On Thu, May 24, 2018 at 7:13 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Mon, May 21, 2018 at 11:13:46AM -0700, Eric Dumazet wrote:
>>
>>
>> On 05/21/2018 11:09 AM, David Miller wrote:
>> > From: syzbot <syzbot+f116bc1994efe725d51b@syzkaller.appspotmail.com>
>> > Date: Mon, 21 May 2018 11:05:02 -0700
>> >
>> >>  find_match+0x244/0x13a0 net/ipv6/route.c:691
>> >>  find_rr_leaf net/ipv6/route.c:729 [inline]
>> >>  rt6_select net/ipv6/route.c:779 [inline]
>> >
>> > Hmmm, endless loop in find_rr_leaf or similar?
>> >
>>
>>
>> I do not think so, this really looks like SCTP specific
>> , we now have dozens of traces all sharing :
>>
>>  sctp_transport_route+0xad/0x450 net/sctp/transport.c:293
>>  sctp_packet_config+0xb89/0xfd0 net/sctp/output.c:123
>>  sctp_outq_flush+0x79c/0x4370 net/sctp/outqueue.c:894
>>  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>>  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>>  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>>  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>>  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>>
>>
>> Some kind of infinite loop.
>>
>> When the hrtimer fires, it can point to any code that sits below but does not necessarily have a bug.
>
> Agreed. Xin Long identified the root cause. syzkaller is setting too
> aggressive parameters to SCTP RTO, leading to issues with the
> heartbeat timer.
Right, I will prepare a fix soon with your suggestion rto_min value "HZ/5"
Thanks.

^ permalink raw reply

* [PATCH v2 net-next] sfc: stop the TX queue before pushing new buffers
From: Martin Habets @ 2018-05-24  9:14 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev, jarod
In-Reply-To: <651409eb-6faa-0224-e521-5b14d6913c9a@solarflare.com>

efx_enqueue_skb() can push new buffers for the xmit_more functionality.
We must stops the TX queue before this or else the TX queue does not get
restarted and we get a netdev watchdog.

In the error handling we may now need to unwind more than 1 packet, and
we may need to push the new buffers onto the partner queue.

v2: In the error leg also push this queue if xmit_more is set

Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Martin Habets <mhabets@solarflare.com>
---

Dave, could you please also queue this patch up for stable?

 drivers/net/ethernet/sfc/tx.c |   33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index cece961f2e82..c3ad564ac4c0 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -435,17 +435,18 @@ static int efx_tx_map_data(struct efx_tx_queue *tx_queue, struct sk_buff *skb,
 	} while (1);
 }
 
-/* Remove buffers put into a tx_queue.  None of the buffers must have
- * an skb attached.
+/* Remove buffers put into a tx_queue for the current packet.
+ * None of the buffers must have an skb attached.
  */
-static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
+static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue,
+			       unsigned int insert_count)
 {
 	struct efx_tx_buffer *buffer;
 	unsigned int bytes_compl = 0;
 	unsigned int pkts_compl = 0;
 
 	/* Work backwards until we hit the original insert pointer value */
-	while (tx_queue->insert_count != tx_queue->write_count) {
+	while (tx_queue->insert_count != insert_count) {
 		--tx_queue->insert_count;
 		buffer = __efx_tx_queue_get_insert_buffer(tx_queue);
 		efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
@@ -504,6 +505,8 @@ static int efx_tx_tso_fallback(struct efx_tx_queue *tx_queue,
  */
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 {
+	unsigned int old_insert_count = tx_queue->insert_count;
+	bool xmit_more = skb->xmit_more;
 	bool data_mapped = false;
 	unsigned int segments;
 	unsigned int skb_len;
@@ -553,8 +556,10 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	/* Update BQL */
 	netdev_tx_sent_queue(tx_queue->core_txq, skb_len);
 
+	efx_tx_maybe_stop_queue(tx_queue);
+
 	/* Pass off to hardware */
-	if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
+	if (!xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
 		struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
 
 		/* There could be packets left on the partner queue if those
@@ -577,14 +582,26 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 		tx_queue->tx_packets++;
 	}
 
-	efx_tx_maybe_stop_queue(tx_queue);
-
 	return NETDEV_TX_OK;
 
 
 err:
-	efx_enqueue_unwind(tx_queue);
+	efx_enqueue_unwind(tx_queue, old_insert_count);
 	dev_kfree_skb_any(skb);
+
+	/* If we're not expecting another transmit and we had something to push
+	 * on this queue or a partner queue then we need to push here to get the
+	 * previous packets out.
+	 */
+	if (!xmit_more) {
+		struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
+
+		if (txq2->xmit_more_available)
+			efx_nic_push_buffers(txq2);
+
+		efx_nic_push_buffers(tx_queue);
+	}
+
 	return NETDEV_TX_OK;
 }
 

^ permalink raw reply related

* Re: [PATCH bpf-next v4 02/10] bpf: powerpc64: pad function address loads with NOPs
From: Daniel Borkmann @ 2018-05-24  9:25 UTC (permalink / raw)
  To: Sandipan Das; +Cc: ast, netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <8826a2e1-71c5-06fc-7a66-3c33c1a54c78@linux.vnet.ibm.com>

On 05/24/2018 10:25 AM, Sandipan Das wrote:
> On 05/24/2018 01:04 PM, Daniel Borkmann wrote:
>> On 05/24/2018 08:56 AM, Sandipan Das wrote:
>>> For multi-function programs, loading the address of a callee
>>> function to a register requires emitting instructions whose
>>> count varies from one to five depending on the nature of the
>>> address.
>>>
>>> Since we come to know of the callee's address only before the
>>> extra pass, the number of instructions required to load this
>>> address may vary from what was previously generated. This can
>>> make the JITed image grow or shrink.
>>>
>>> To avoid this, we should generate a constant five-instruction
>>> when loading function addresses by padding the optimized load
>>> sequence with NOPs.
>>>
>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 34 +++++++++++++++++++++++-----------
>>>  1 file changed, 23 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..e4582744a31d 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -167,25 +167,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
>>>  
>>>  static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, u64 func)
>>>  {
>>> +	unsigned int i, ctx_idx = ctx->idx;
>>> +
>>> +	/* Load function address into r12 */
>>> +	PPC_LI64(12, func);
>>> +
>>> +	/* For bpf-to-bpf function calls, the callee's address is unknown
>>> +	 * until the last extra pass. As seen above, we use PPC_LI64() to
>>> +	 * load the callee's address, but this may optimize the number of
>>> +	 * instructions required based on the nature of the address.
>>> +	 *
>>> +	 * Since we don't want the number of instructions emitted to change,
>>> +	 * we pad the optimized PPC_LI64() call with NOPs to guarantee that
>>> +	 * we always have a five-instruction sequence, which is the maximum
>>> +	 * that PPC_LI64() can emit.
>>> +	 */
>>> +	for (i = ctx->idx - ctx_idx; i < 5; i++)
>>> +		PPC_NOP();
>>
>> By the way, I think you can still optimize this. The nops are not really
>> needed in case of insn->src_reg != BPF_PSEUDO_CALL since the address of
>> a normal BPF helper call will always be at a fixed location and known a
>> priori.
> 
> Ah, true. Thanks for pointing this out. There are a few other things that
> we are planning to do for the ppc64 JIT compiler. Will put out a patch for
> this with that series.

Awesome, thanks Sandipan!

^ permalink raw reply

* Re: [PATCH v2 05/13] mtd: rawnand: marvell: remove the dmaengine compat need
From: Miquel Raynal @ 2018-05-24  9:30 UTC (permalink / raw)
  To: Robert Jarzmik
  Cc: Daniel Mack, Haojian Zhuang, Ezequiel Garcia, Boris Brezillon,
	David Woodhouse, Brian Norris, Marek Vasut, Richard Weinberger,
	Liam Girdwood, Mark Brown, Arnd Bergmann, alsa-devel, netdev,
	linux-mmc, linux-kernel, linux-ide, linux-mtd, dmaengine,
	linux-arm-kernel, linux-media
In-Reply-To: <20180524070703.11901-6-robert.jarzmik@free.fr>

Hi Robert,

On Thu, 24 May 2018 09:06:55 +0200, Robert Jarzmik
<robert.jarzmik@free.fr> wrote:

> As the pxa architecture switched towards the dmaengine slave map, the
> old compatibility mechanism to acquire the dma requestor line number and
> priority are not needed anymore.
> 
> This patch simplifies the dma resource acquisition, using the more
> generic function dma_request_slave_channel().
> 
> Signed-off-by: Signed-off-by: Daniel Mack <daniel@zonque.org>
> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
> ---
>  drivers/mtd/nand/raw/marvell_nand.c | 17 +----------------
>  1 file changed, 1 insertion(+), 16 deletions(-)
> 

Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>

Thanks,
Miquèl

^ permalink raw reply

* Re: STMMAC driver with TSO enabled issue
From: Jose Abreu @ 2018-05-24  9:31 UTC (permalink / raw)
  To: Bhadram Varka, Jose Abreu, netdev@vger.kernel.org, Joao Pinto
In-Reply-To: <c1c9025e-f75b-9ae1-4513-24615308e0af@nvidia.com>

Hi Bhadram,

On 24-05-2018 06:58, Bhadram Varka wrote:
>
> After some time if check Tx descriptor status - then I see only
> below
>
> [..]
> [85788.286730] 027 [0x827951b0]: 0xf854f000 0x0 0x16d8 0x90000000
>
> index 025 and 026 descriptors processed but not index 027.
>
> At this stage Tx DMA is always in below state -
>
> ■ 3'b011: Running (Reading Data from system memory
> buffer and queuing it to the Tx buffer (Tx FIFO))

Thats strange, I think the descriptors look okay though. I will
need the registers values (before the lock) and, if possible, the
git bisect output.

Thanks and Best Regards,
Jose Miguel Abreu

>
> Thanks,
> Bhadram.

^ permalink raw reply

* Re: [PATCH net-next v3 0/7] Add support for QCA8334 switch
From: Michal Vokáč @ 2018-05-24  9:34 UTC (permalink / raw)
  To: Florian Fainelli, andrew
  Cc: netdev, linux-kernel, devicetree, vivien.didelot, mark.rutland,
	robh+dt, davem, michal.vokac
In-Reply-To: <29ea7cdd-5690-e39a-a2ef-4f48fbcb7659@gmail.com>

On 23.5.2018 17:39, Florian Fainelli wrote:
> 
> 
> On 05/22/2018 11:20 PM, Michal Vokáč wrote:
>> This series basically adds support for a QCA8334 ethernet switch to the
>> qca8k driver. It is a four-port variant of the already supported seven
>> port QCA8337. Register map is the same for the whole familly and all chips
>> have the same device ID.
>>
>> Major part of this series enhances the CPU port setting. Currently the CPU
>> port is not set to any sensible defaults compatible with the xGMII
>> interface. This series forces the CPU port to its maximum bandwidth and
>> also allows to adjust the new defaults using fixed-link device tree
>> sub-node.
>>
>> Alongside these changes I fixed two checkpatch warnings regarding SPDX and
>> redundant parentheses.
> 
> Looks great, thanks Michal! Do you have any features or things you are
> working on that would be added later to the driver?

Thank you too Florian. And also big thank to you Andrew. You helped me
a lot to debug the RGMII issue. I have been stuck at that for more than
a month and would not resolve it without your help.

As I have done this in a process of upgrading our BSP to a more recent
kernel, and hopefully mainline, I now need to move on to other parts of
the board. So unfortunately no, I do not have any other enhancements
planned to this driver for now. But as we are probably one of the few
with access to the NDA covered Qualcomm documentation I see a great
opportunity to work on that later. I am afraid "later" means something
like next year in this case as I am basically the only kernel developer
in our company and not yet very experienced.

Thank you all for your time,
Michal

^ permalink raw reply

* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Tariq Toukan @ 2018-05-24  9:45 UTC (permalink / raw)
  To: Qing Huang, tariqt, davem, haakon.bugge, yanjun.zhu
  Cc: netdev, linux-rdma, linux-kernel, gi-oh.kim
In-Reply-To: <20180523232246.20445-1-qing.huang@oracle.com>



On 24/05/2018 2:22 AM, Qing Huang wrote:
> When a system is under memory presure (high usage with fragments),
> the original 256KB ICM chunk allocations will likely trigger kernel
> memory management to enter slow path doing memory compact/migration
> ops in order to complete high order memory allocations.
> 
> When that happens, user processes calling uverb APIs may get stuck
> for more than 120s easily even though there are a lot of free pages
> in smaller chunks available in the system.
> 
> Syslog:
> ...
> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
> oracle_205573_e:205573 blocked for more than 120 seconds.
> ...
> 
> With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
> 
> However in order to support smaller ICM chunk size, we need to fix
> another issue in large size kcalloc allocations.
> 
> E.g.
> Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
> size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
> entry). So we need a 16MB allocation for a table->icm pointer array to
> hold 2M pointers which can easily cause kcalloc to fail.
> 
> The solution is to use kvzalloc to replace kcalloc which will fall back
> to vmalloc automatically if kmalloc fails.
> 
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> Acked-by: Daniel Jurgens <danielj@mellanox.com>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
> v4: use kvzalloc instead of vzalloc
>      add one err condition check
>      don't include vmalloc.h any more
> 
> v3: use PAGE_SIZE instead of PAGE_SHIFT
>      add comma to the end of enum variables
>      include vmalloc.h header file to avoid build issues on Sparc
> 
> v2: adjusted chunk size to reflect different architectures
> 
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 16 +++++++++-------
>   1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index a822f7a..685337d 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,12 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in as big chunks as we can, up to a maximum of 256 KB
> - * per chunk.
> + * We allocate in page size (default 4KB on many archs) chunks to avoid high
> + * order memory allocations in fragmented/high usage memory situation.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> -	MLX4_TABLE_CHUNK_SIZE	= 1 << 18
> +	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> +	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -398,9 +398,11 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>   	u64 size;
>   
>   	obj_per_chunk = MLX4_TABLE_CHUNK_SIZE / obj_size;
> +	if (WARN_ON(!obj_per_chunk))
> +		return -EINVAL;
>   	num_icm = (nobj + obj_per_chunk - 1) / obj_per_chunk;
>   
> -	table->icm      = kcalloc(num_icm, sizeof(*table->icm), GFP_KERNEL);
> +	table->icm      = kvzalloc(num_icm * sizeof(*table->icm), GFP_KERNEL);
>   	if (!table->icm)
>   		return -ENOMEM;
>   	table->virt     = virt;
> @@ -446,7 +448,7 @@ int mlx4_init_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table,
>   			mlx4_free_icm(dev, table->icm[i], use_coherent);
>   		}
>   
> -	kfree(table->icm);
> +	kvfree(table->icm);
>   
>   	return -ENOMEM;
>   }
> @@ -462,5 +464,5 @@ void mlx4_cleanup_icm_table(struct mlx4_dev *dev, struct mlx4_icm_table *table)
>   			mlx4_free_icm(dev, table->icm[i], table->coherent);
>   		}
>   
> -	kfree(table->icm);
> +	kvfree(table->icm);
>   }
> 

Thanks Qing.

Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

^ permalink raw reply

* Re: [PATCH 3/4] cpsw_switchdev: add switchdev support files
From: Maxim Uvarov @ 2018-05-24 10:00 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, francois.ozog, yogeshs, spatton
In-Reply-To: <1527144984-31236-4-git-send-email-ilias.apalodimas@linaro.org>

2018-05-24 9:56 GMT+03:00 Ilias Apalodimas <ilias.apalodimas@linaro.org>:
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>  drivers/net/ethernet/ti/Kconfig          |   9 +
>  drivers/net/ethernet/ti/Makefile         |   1 +
>  drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++++++++++++++++++
>  drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
>  4 files changed, 313 insertions(+)
>  create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
>  create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h
>
> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
> index 48a541e..b22ae7d 100644
> --- a/drivers/net/ethernet/ti/Kconfig
> +++ b/drivers/net/ethernet/ti/Kconfig
> @@ -73,6 +73,15 @@ config TI_CPSW
>           To compile this driver as a module, choose M here: the module
>           will be called cpsw.
>
> +config TI_CPSW_SWITCHDEV
> +       bool "TI CPSW switchdev support"
> +       depends on TI_CPSW
> +       depends on NET_SWITCHDEV
> +       help
> +         Enable switchdev support on TI's CPSW Ethernet Switch.
> +
> +         This will allow you to configure the switch using standard tools.
> +
>  config TI_CPTS
>         bool "TI Common Platform Time Sync (CPTS) Support"
>         depends on TI_CPSW || TI_KEYSTONE_NETCP
> diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
> index 0be551d..3926c6a 100644
> --- a/drivers/net/ethernet/ti/Makefile
> +++ b/drivers/net/ethernet/ti/Makefile
> @@ -15,6 +15,7 @@ obj-$(CONFIG_TI_CPSW_PHY_SEL) += cpsw-phy-sel.o
>  obj-$(CONFIG_TI_CPSW_ALE) += cpsw_ale.o
>  obj-$(CONFIG_TI_CPTS_MOD) += cpts.o
>  obj-$(CONFIG_TI_CPSW) += ti_cpsw.o
> +ti_cpsw-objs:= cpsw_switchdev.o
>  ti_cpsw-y := cpsw.o
>
>  obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o
> diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
> new file mode 100644
> index 0000000..bf8c1bf
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
> @@ -0,0 +1,299 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Texas Instruments switchdev Driver
> + *
> + * Copyright (C) 2018 Texas Instruments
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation version 2.
> + *
> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any
> + * kind, whether express or implied; without even the implied warranty
> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/etherdevice.h>
> +#include <linux/if_bridge.h>
> +#include <net/switchdev.h>
> +#include "cpsw.h"
> +#include "cpsw_priv.h"
> +#include "cpsw_ale.h"
> +
> +static u32 cpsw_switchdev_get_ver(struct net_device *ndev)
> +{
> +       struct cpsw_priv *priv = netdev_priv(ndev);
> +       struct cpsw_common *cpsw = priv->cpsw;
> +
> +       return cpsw->version;
> +}
> +
> +static int cpsw_port_attr_set(struct net_device *dev,
> +                             const struct switchdev_attr *attr,
> +                             struct switchdev_trans *trans)
> +{
> +       return -EOPNOTSUPP;
> +}
> +
> +static int cpsw_port_attr_get(struct net_device *dev,
> +                             struct switchdev_attr *attr)
> +{
> +       u32 cpsw_ver;
> +       int err = 0;
> +
> +       switch (attr->id) {
> +       case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
> +               cpsw_ver = cpsw_switchdev_get_ver(dev);
> +               attr->u.ppid.id_len = sizeof(cpsw_ver);
> +               memcpy(&attr->u.ppid.id, &cpsw_ver, attr->u.ppid.id_len);
> +               break;
> +       default:
> +               return -EOPNOTSUPP;
> +       }
> +
> +       return err;

err is always 0 here.

> +}
> +
> +static u16 cpsw_get_pvid(struct cpsw_priv *priv)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       u32 __iomem *port_vlan_reg;
> +       u32 pvid;
> +
> +       if (priv->emac_port) {
> +               int reg = CPSW2_PORT_VLAN;
> +
> +               if (cpsw->version == CPSW_VERSION_1)
> +                       reg = CPSW1_PORT_VLAN;
> +               pvid = slave_read(cpsw->slaves + (priv->emac_port - 1), reg);
> +       } else {
> +               port_vlan_reg = &cpsw->host_port_regs->port_vlan;
> +               pvid = readl(port_vlan_reg);
> +       }
> +
> +       pvid = pvid & 0xfff;
> +
> +       return pvid;
> +}
> +
> +static void cpsw_set_pvid(struct cpsw_priv *priv, u16 vid, bool cfi, u32 cos)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       void __iomem *port_vlan_reg;
> +       u32 pvid;
> +
> +       pvid = vid;
> +       pvid |= cfi ? BIT(12) : 0;
> +       pvid |= (cos & 0x7) << 13;
> +
> +       if (priv->emac_port) {
> +               int reg = CPSW2_PORT_VLAN;
> +
> +               if (cpsw->version == CPSW_VERSION_1)
> +                       reg = CPSW1_PORT_VLAN;
> +               /* no barrier */
> +               slave_write(cpsw->slaves + (priv->emac_port - 1), pvid, reg);
> +       } else {
> +               /* CPU port */
> +               port_vlan_reg = &cpsw->host_port_regs->port_vlan;
> +               writel(pvid, port_vlan_reg);
> +       }
> +}
> +
> +static int cpsw_port_vlan_add(struct cpsw_priv *priv, bool untag, bool pvid,
> +                             u16 vid)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int port_mask = BIT(priv->emac_port);
> +       int unreg_mcast_mask = 0;
> +       int reg_mcast_mask = 0;
> +       int untag_mask = 0;
> +       int ret = 0;
> +
> +       if (priv->ndev->flags & IFF_ALLMULTI)
> +               unreg_mcast_mask = port_mask;
> +
> +       if (priv->ndev->flags & IFF_MULTICAST)
> +               reg_mcast_mask = port_mask;
> +
> +       if (untag)
> +               untag_mask = port_mask;
> +
> +       ret = cpsw_ale_vlan_add_modify(cpsw->ale, vid, port_mask, untag_mask,
> +                                      reg_mcast_mask, unreg_mcast_mask);
> +       if (ret) {
> +               dev_err(priv->dev, "Unable to add vlan\n");
> +               return ret;
> +       }
> +
> +       if (!pvid)
> +               return ret;
> +
> +       cpsw_set_pvid(priv, vid, 0, 0);
> +
> +       dev_dbg(priv->dev, "VID: %u dev: %s port: %u\n", vid,
> +               priv->ndev->name, priv->emac_port);
> +
> +       return ret;
> +}
> +
> +static int cpsw_port_vlan_del(struct cpsw_priv *priv, u16 vid)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int port_mask = BIT(priv->emac_port);
> +       int ret = 0;

no need to set it to 0 here.

> +
> +       ret = cpsw_ale_vlan_del_modify(cpsw->ale, vid, port_mask);
> +       if (ret != 0)
> +               return ret;
> +
> +       ret = cpsw_ale_del_ucast(cpsw->ale, priv->mac_addr,
> +                                HOST_PORT_NUM, ALE_VLAN, vid);
> +
> +       if (vid == cpsw_get_pvid(priv))
> +               cpsw_set_pvid(priv, 0, 0, 0);
> +
> +       if (ret != 0) {
> +               dev_dbg(priv->dev, "Failed to delete unicast entry\n");
> +               ret = 0;

no need to set it to 0.

> +       }
> +
> +       ret = cpsw_ale_del_mcast(cpsw->ale, priv->ndev->broadcast,
> +                                0, ALE_VLAN, vid);
> +       if (ret != 0) {
> +               dev_dbg(priv->dev, "Failed to delete multicast entry\n");
> +               ret = 0;
> +       }
> +

just return 0 as it always returned.

> +       return ret;
> +}
> +
> +static int cpsw_port_vlans_add(struct cpsw_priv *priv,
> +                              const struct switchdev_obj_port_vlan *vlan,
> +                              struct switchdev_trans *trans)
> +{
> +       bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
> +       bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
> +       u16 vid;
> +
> +       if (switchdev_trans_ph_prepare(trans))
> +               return 0;
> +
> +       for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
> +               int err;
> +
> +               err = cpsw_port_vlan_add(priv, untagged, pvid, vid);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static int cpsw_port_vlans_del(struct cpsw_priv *priv,
> +                              const struct switchdev_obj_port_vlan *vlan)
> +
> +{
> +       u16 vid;
> +
> +       for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
> +               int err;
> +
> +               err = cpsw_port_vlan_del(priv, vid);
> +               if (err)
> +                       return err;
> +       }
> +
> +       return 0;
> +}
> +
> +static int cpsw_port_mdb_add(struct cpsw_priv *priv,
> +                            struct switchdev_obj_port_mdb *mdb,
> +                            struct switchdev_trans *trans)
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int port_mask;
> +       int err;
> +
> +       if (switchdev_trans_ph_prepare(trans))
> +               return 0;
> +
> +       port_mask = BIT(priv->emac_port);
> +       err = cpsw_ale_mcast_add_modify(cpsw->ale, mdb->addr, port_mask,
> +                                       ALE_VLAN, mdb->vid, 0);
> +
> +       return err;
> +}
> +
> +static int cpsw_port_mdb_del(struct cpsw_priv *priv,
> +                            struct switchdev_obj_port_mdb *mdb)
> +
> +{
> +       struct cpsw_common *cpsw = priv->cpsw;
> +       int del_mask;
> +       int err;
> +
> +       del_mask = BIT(priv->emac_port);
> +       err = cpsw_ale_mcast_del_modify(cpsw->ale, mdb->addr, del_mask,
> +                                       ALE_VLAN, mdb->vid);
> +
> +       return err;
> +}
> +
> +static int cpsw_port_obj_add(struct net_device *ndev,
> +                            const struct switchdev_obj *obj,
> +                            struct switchdev_trans *trans)
> +{
> +       struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> +       struct switchdev_obj_port_mdb *mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
> +       struct cpsw_priv *priv = netdev_priv(ndev);
> +       int err = 0;
> +
> +       switch (obj->id) {
> +       case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +               err = cpsw_port_vlans_add(priv, vlan, trans);
> +               break;
> +       case SWITCHDEV_OBJ_ID_PORT_MDB:
> +               err = cpsw_port_mdb_add(priv, mdb, trans);
> +               break;
> +       default:
> +               err = -EOPNOTSUPP;
> +               break;
> +       }
> +
> +       return err;
> +}
> +
> +static int cpsw_port_obj_del(struct net_device *ndev,
> +                            const struct switchdev_obj *obj)
> +{
> +       struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
> +       struct cpsw_priv *priv = netdev_priv(ndev);
> +       int err = 0;
> +
> +       switch (obj->id) {
> +       case SWITCHDEV_OBJ_ID_PORT_VLAN:
> +               err = cpsw_port_vlans_del(priv, vlan);
> +               break;
> +       case SWITCHDEV_OBJ_ID_PORT_MDB:
> +               err = cpsw_port_mdb_del(priv, SWITCHDEV_OBJ_PORT_MDB(obj));
> +               break;
> +       default:
> +               err = -EOPNOTSUPP;
> +               break;
> +       }
> +
> +       return err;
> +}
> +
> +static const struct switchdev_ops cpsw_port_switchdev_ops = {
> +       .switchdev_port_attr_set        = cpsw_port_attr_set,
> +       .switchdev_port_attr_get        = cpsw_port_attr_get,
> +       .switchdev_port_obj_add         = cpsw_port_obj_add,
> +       .switchdev_port_obj_del         = cpsw_port_obj_del,
> +};
> +
> +void cpsw_port_switchdev_init(struct net_device *ndev)
> +{
> +       ndev->switchdev_ops = &cpsw_port_switchdev_ops;
> +}
> diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.h b/drivers/net/ethernet/ti/cpsw_switchdev.h
> new file mode 100644
> index 0000000..4940462
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/cpsw_switchdev.h
> @@ -0,0 +1,4 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <net/switchdev.h>
> +
> +void cpsw_port_switchdev_init(struct net_device *ndev);
> --
> 2.7.4
>



-- 
Best regards,
Maxim Uvarov

^ permalink raw reply

* [PATCH] net/9p: fix error path of p9_virtio_probe
From: Jean-Philippe Brucker @ 2018-05-24 10:10 UTC (permalink / raw)
  To: v9fs-developer, ericvh, rminnich, lucho; +Cc: netdev, davem

Currently when virtio_find_single_vq fails, we go through del_vqs which
throws a warning (Trying to free already-free IRQ). Skip del_vqs if vq
allocation failed.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 net/9p/trans_virtio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 4d0372263e5d..1c87eee522b7 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -562,7 +562,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
 	chan->vq = virtio_find_single_vq(vdev, req_done, "requests");
 	if (IS_ERR(chan->vq)) {
 		err = PTR_ERR(chan->vq);
-		goto out_free_vq;
+		goto out_free_chan;
 	}
 	chan->vq->vdev->priv = chan;
 	spin_lock_init(&chan->lock);
@@ -615,6 +615,7 @@ static int p9_virtio_probe(struct virtio_device *vdev)
 	kfree(tag);
 out_free_vq:
 	vdev->config->del_vqs(vdev);
+out_free_chan:
 	kfree(chan);
 fail:
 	return err;
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf-next v7 0/6] ipv6: sr: introduce seg6local End.BPF action
From: Daniel Borkmann @ 2018-05-24 10:12 UTC (permalink / raw)
  To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <cover.1526824042.git.m.xhonneux@gmail.com>

On 05/20/2018 03:58 PM, Mathieu Xhonneux wrote:
> As of Linux 4.14, it is possible to define advanced local processing for
> IPv6 packets with a Segment Routing Header through the seg6local LWT
> infrastructure. This LWT implements the network programming principles
> defined in the IETF “SRv6 Network Programming” draft.
> 
> The implemented operations are generic, and it would be very interesting to
> be able to implement user-specific seg6local actions, without having to
> modify the kernel directly. To do so, this patchset adds an End.BPF action
> to seg6local, powered by some specific Segment Routing-related helpers,
> which provide SR functionalities that can be applied on the packet. This
> BPF hook would then allow to implement specific actions at native kernel
> speed such as OAM features, advanced SR SDN policies, SRv6 actions like
> Segment Routing Header (SRH) encapsulation depending on the content of
> the packet, etc.
> 
> This patchset is divided in 6 patches, whose main features are :
> 
> - A new seg6local action End.BPF with the corresponding new BPF program
>   type BPF_PROG_TYPE_LWT_SEG6LOCAL. Such attached BPF program can be
>   passed to the LWT seg6local through netlink, the same way as the LWT
>   BPF hook operates.
> - 3 new BPF helpers for the seg6local BPF hook, allowing to edit/grow/
>   shrink a SRH and apply on a packet some of the generic SRv6 actions.
> - 1 new BPF helper for the LWT BPF IN hook, allowing to add a SRH through
>   encapsulation (via IPv6 encapsulation or inlining if the packet contains
>   already an IPv6 header).
> 
> As this patchset adds a new LWT BPF hook, I took into account the result of
> the discussions when the LWT BPF infrastructure got merged. Hence, the
> seg6local BPF hook doesn’t allow write access to skb->data directly, only
> the SRH can be modified through specific helpers, which ensures that the
> integrity of the packet is maintained.
> More details are available in the related patches messages.
> 
> The performances of this BPF hook have been assessed with the BPF JIT
> enabled on an Intel Xeon X3440 processors with 4 cores and 8 threads
> clocked at 2.53 GHz. No throughput losses are noted with the seg6local
> BPF hook when the BPF program does nothing (440kpps). Adding a 8-bytes
> TLV (1 call each to bpf_lwt_seg6_adjust_srh and bpf_lwt_seg6_store_bytes)
> drops the throughput to 410kpps, and inlining a SRH via
> bpf_lwt_seg6_action drops the throughput to 420kpps.
> All throughputs are stable.
> 
> -------
> v2: move the SRH integrity state from skb->cb to a per-cpu buffer
> v3: - document helpers in man-page style
>     - fix kbuild bugs
>     - un-break BPF LWT out hook
>     - bpf_push_seg6_encap is now static
>     - preempt_enable is now called when the packet is dropped in
>       input_action_end_bpf
> v4: fix kbuild bugs when CONFIG_IPV6=m
> v5: fix kbuild sparse warnings when CONFIG_IPV6=m
> v6: fix skb pointers-related bugs in helpers
> v7: - fix memory leak in error path of End.BPF setup
>     - add freeing of BPF data in seg6_local_destroy_state
>     - new enums SEG6_LOCAL_BPF_* instead of re-using ones of lwt bpf for
>       netlink nested bpf attributes
>     - SEG6_LOCAL_BPF_PROG attr now contains prog->aux->id when dumping
>       state
> 
> Thanks.
> 
> Mathieu Xhonneux (6):
>   ipv6: sr: make seg6.h includable without IPv6
>   ipv6: sr: export function lookup_nexthop
>   bpf: Add IPv6 Segment Routing helpers
>   bpf: Split lwt inout verifier structures
>   ipv6: sr: Add seg6local action End.BPF
>   selftests/bpf: test for seg6local End.BPF action
> 
>  include/linux/bpf_types.h                         |   5 +-
>  include/net/seg6.h                                |   7 +-
>  include/net/seg6_local.h                          |  32 ++
>  include/uapi/linux/bpf.h                          |  97 ++++-
>  include/uapi/linux/seg6_local.h                   |  12 +
>  kernel/bpf/verifier.c                             |   1 +
>  net/core/filter.c                                 | 393 ++++++++++++++++---
>  net/ipv6/Kconfig                                  |   5 +
>  net/ipv6/seg6_local.c                             | 190 +++++++++-
>  tools/include/uapi/linux/bpf.h                    |  97 ++++-
>  tools/lib/bpf/libbpf.c                            |   1 +
>  tools/testing/selftests/bpf/Makefile              |   6 +-
>  tools/testing/selftests/bpf/bpf_helpers.h         |  12 +
>  tools/testing/selftests/bpf/test_lwt_seg6local.c  | 437 ++++++++++++++++++++++
>  tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++++++
>  15 files changed, 1363 insertions(+), 72 deletions(-)
>  create mode 100644 include/net/seg6_local.h
>  create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
>  create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh

Applied to bpf-next, thanks Mathieu!

^ permalink raw reply

* Re: [PATCH bpf-next v7 3/6] bpf: Add IPv6 Segment Routing helpers
From: Daniel Borkmann @ 2018-05-24 10:18 UTC (permalink / raw)
  To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <98333ad8233f546c91e8f51c8c8ae457ce19980f.1526824042.git.m.xhonneux@gmail.com>

On 05/20/2018 03:58 PM, Mathieu Xhonneux wrote:
> The BPF seg6local hook should be powerful enough to enable users to
> implement most of the use-cases one could think of. After some thinking,
> we figured out that the following actions should be possible on a SRv6
> packet, requiring 3 specific helpers :
>     - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
>     - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
>                                (to add/delete TLVs)
>     - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
>                            (specifically End.X, End.T, End.B6 and
>                             End.B6.Encap)
> 
> The specifications of these helpers are provided in the patch (see
> include/uapi/linux/bpf.h).
> 
> The non-sensitive fields of the SRH are the following : flags, tag and
> TLVs. The other fields can not be modified, to maintain the SRH
> integrity. Flags, tag and TLVs can easily be modified as their validity
> can be checked afterwards via seg6_validate_srh. It is not allowed to
> modify the segments directly. If one wants to add segments on the path,
> he should stack a new SRH using the End.B6 action via
> bpf_lwt_seg6_action.
> 
> Growing, shrinking or editing TLVs via the helpers will flag the SRH as
> invalid, and it will have to be re-validated before re-entering the IPv6
> layer. This flag is stored in a per-CPU buffer, along with the current
> header length in bytes.
> 
> Storing the SRH len in bytes in the control block is mandatory when using
> bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
> len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
> boundary). When adding/deleting TLVs within the BPF program, the SRH may
> temporary be in an invalid state where its length cannot be rounded to 8
> bytes without remainder, hence the need to store the length in bytes
> separately. The caller of the BPF program can then ensure that the SRH's
> final length is valid using this value. Again, a final SRH modified by a
> BPF program which doesn’t respect the 8-bytes boundary will be discarded
> as it will be considered as invalid.
> 
> Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
> available from the LWT BPF IN hook, but not from the seg6local BPF one.
> This helper allows to encapsulate a Segment Routing Header (either with
> a new outer IPv6 header, or by inlining it directly in the existing IPv6
> header) into a non-SRv6 packet. This helper is required if we want to
> offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
> as the BPF seg6local hook only works on traffic already containing a SRH.
> This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
> the same purpose but with a static SRH per route.
> 
> These helpers require CONFIG_IPV6=y (and not =m).
> 
> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
> Acked-by: David Lebrun <dlebrun@google.com>

One minor comments for follow-ups in here below.

> +BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
> +	   const void *, from, u32, len)
> +{
> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
> +	struct seg6_bpf_srh_state *srh_state =
> +		this_cpu_ptr(&seg6_bpf_srh_states);
> +	void *srh_tlvs, *srh_end, *ptr;
> +	struct ipv6_sr_hdr *srh;
> +	int srhoff = 0;
> +
> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> +		return -EINVAL;
> +
> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
> +	srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
> +	srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
> +
> +	ptr = skb->data + offset;
> +	if (ptr >= srh_tlvs && ptr + len <= srh_end)
> +		srh_state->valid = 0;
> +	else if (ptr < (void *)&srh->flags ||
> +		 ptr + len > (void *)&srh->segments)
> +		return -EFAULT;
> +
> +	if (unlikely(bpf_try_make_writable(skb, offset + len)))
> +		return -EFAULT;
> +
> +	memcpy(skb->data + offset, from, len);
> +	return 0;
> +#else /* CONFIG_IPV6_SEG6_BPF */
> +	return -EOPNOTSUPP;
> +#endif
> +}

Instead of doing this inside the helper you can reject the program already
in the lwt_*_func_proto() by returning NULL when !CONFIG_IPV6_SEG6_BPF. That
way programs get rejected at verification time instead of runtime, so the
user can probe availability more easily.

^ permalink raw reply

* Re: [PATCH v6 0/5] PCI: Improve PCIe link status reporting
From: Ganesh Goudar @ 2018-05-24 10:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jeff Kirsher, Michael Chan, Ariel Elior, David S. Miller,
	linux-pci, everest-linux-l2, intel-wired-lan, netdev,
	linux-kernel, Tal Gilboa, Tariq Toukan, Jacob Keller,
	Jakub Kicinski
In-Reply-To: <20180523214651.GE150632@bhelgaas-glaptop.roam.corp.google.com>

On Wednesday, May 05/23/18, 2018 at 16:46:51 -0500, Bjorn Helgaas wrote:
> [+to Davem]
> 
> On Thu, May 03, 2018 at 03:00:07PM -0500, Bjorn Helgaas wrote:
> > This is based on Tal's recent work to unify the approach for reporting PCIe
> > link speed/width and whether the device is being limited by a slower
> > upstream link.
> > 
> > The new pcie_print_link_status() interface appeared in v4.17-rc1; see
> > 9e506a7b5147 ("PCI: Add pcie_print_link_status() to log link speed and
> > whether it's limited").
> > 
> > That's a good way to replace use of pcie_get_minimum_link(), which gives
> > misleading results when a path contains both a fast, narrow link and a
> > slow, wide link: it reports the equivalent of a slow, narrow link.
> > 
> > This series removes the remaining uses of pcie_get_minimum_link() and then
> > removes the interface itself.  I'd like to merge them all through the PCI
> > tree to make the removal easy.
> > 
> > This does change the dmesg reporting of link speeds, and in the ixgbe case,
> > it changes the reporting from KERN_WARN level to KERN_INFO.  If that's an
> > issue, let's talk about it.  I'm hoping the reduce code size, improved
> > functionality, and consistency across drivers is enough to make this
> > worthwhile.
> > 
> > ---
> > 
> > Bjorn Helgaas (5):
> >       bnx2x: Report PCIe link properties with pcie_print_link_status()
> >       bnxt_en: Report PCIe link properties with pcie_print_link_status()
> >       cxgb4: Report PCIe link properties with pcie_print_link_status()
> >       ixgbe: Report PCIe link properties with pcie_print_link_status()
> >       PCI: Remove unused pcie_get_minimum_link()
> > 
> > 
> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   23 ++-----
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c        |   19 ------
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |   75 ----------------------
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   47 --------------
> >  drivers/pci/pci.c                                |   43 -------------
> >  include/linux/pci.h                              |    2 -
> >  6 files changed, 9 insertions(+), 200 deletions(-)
> 
> I applied all of these on pci/enumeration for v4.18.  If you'd rather take
> them, Dave, let me know and I'll drop them.
> 
> I solicited more acks, but only heard from Jeff.
Sorry for that, Thanks for cxgb4 changes.

^ permalink raw reply

* Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly
From: Christophe Leroy @ 2018-05-24 10:18 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, netdev, linuxppc-dev
In-Reply-To: <3848a4ad-2c0e-691f-e98f-347cfe3484e8@c-s.fr>



On 05/24/2018 06:20 AM, Christophe LEROY wrote:
> 
> 
> Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
>> On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
>>> The generic csum_ipv6_magic() generates a pretty bad result
>>
>> <snip>
>>
>> Please try with a more recent compiler, what you used is pretty ancient.
>> It's not like recent compilers do great on this either, but it's not
>> *that* bad anymore ;-)


Here is what I get with GCC 8.1
It doesn't look much better, does it ?


net/ipv6/ip6_checksum.o:     file format elf32-powerpc


Disassembly of section .text:

00000000 <csum_ipv6_magic>:
    0:	94 21 ff f0 	stwu    r1,-16(r1)
    4:	80 04 00 00 	lwz     r0,0(r4)
    8:	81 64 00 04 	lwz     r11,4(r4)
    c:	81 04 00 08 	lwz     r8,8(r4)
   10:	93 e1 00 0c 	stw     r31,12(r1)
   14:	81 43 00 00 	lwz     r10,0(r3)
   18:	83 e3 00 04 	lwz     r31,4(r3)
   1c:	81 23 00 08 	lwz     r9,8(r3)
   20:	81 83 00 0c 	lwz     r12,12(r3)
   24:	7c ea 3a 14 	add     r7,r10,r7
   28:	7d 4a 38 10 	subfc   r10,r10,r7
   2c:	7c ff 3a 14 	add     r7,r31,r7
   30:	81 44 00 0c 	lwz     r10,12(r4)
   34:	7c 63 19 10 	subfe   r3,r3,r3
   38:	7c 63 38 50 	subf    r3,r3,r7
   3c:	7f ff 18 10 	subfc   r31,r31,r3
   40:	7c e9 1a 14 	add     r7,r9,r3
   44:	83 e1 00 0c 	lwz     r31,12(r1)
   48:	7c 63 19 10 	subfe   r3,r3,r3
   4c:	38 21 00 10 	addi    r1,r1,16
   50:	7c 63 38 50 	subf    r3,r3,r7
   54:	7d 29 18 10 	subfc   r9,r9,r3
   58:	7d 2c 1a 14 	add     r9,r12,r3
   5c:	7c 63 19 10 	subfe   r3,r3,r3
   60:	7c 63 48 50 	subf    r3,r3,r9
   64:	7d 8c 18 10 	subfc   r12,r12,r3
   68:	7d 20 1a 14 	add     r9,r0,r3
   6c:	7c 63 19 10 	subfe   r3,r3,r3
   70:	7c 63 48 50 	subf    r3,r3,r9
   74:	7c 00 18 10 	subfc   r0,r0,r3
   78:	7d 2b 1a 14 	add     r9,r11,r3
   7c:	7c 63 19 10 	subfe   r3,r3,r3
   80:	7c 63 48 50 	subf    r3,r3,r9
   84:	7d 6b 18 10 	subfc   r11,r11,r3
   88:	7d 28 1a 14 	add     r9,r8,r3
   8c:	7c 63 19 10 	subfe   r3,r3,r3
   90:	7c 63 48 50 	subf    r3,r3,r9
   94:	7d 08 18 10 	subfc   r8,r8,r3
   98:	7d 2a 1a 14 	add     r9,r10,r3
   9c:	7c 63 19 10 	subfe   r3,r3,r3
   a0:	7c 63 48 50 	subf    r3,r3,r9
   a4:	7d 4a 18 10 	subfc   r10,r10,r3
   a8:	7d 23 2a 14 	add     r9,r3,r5
   ac:	7c 63 19 10 	subfe   r3,r3,r3
   b0:	7c 63 48 50 	subf    r3,r3,r9
   b4:	7c a5 18 10 	subfc   r5,r5,r3
   b8:	7c 63 32 14 	add     r3,r3,r6
   bc:	7d 29 49 10 	subfe   r9,r9,r9
   c0:	7d 29 18 50 	subf    r9,r9,r3
   c4:	7c c6 48 10 	subfc   r6,r6,r9
   c8:	7c 63 19 10 	subfe   r3,r3,r3
   cc:	7c 63 48 50 	subf    r3,r3,r9
   d0:	54 69 80 3e 	rotlwi  r9,r3,16
   d4:	7c 63 4a 14 	add     r3,r3,r9
   d8:	7c 63 18 f8 	not     r3,r3
   dc:	54 63 84 3e 	rlwinm  r3,r3,16,16,31
   e0:	4e 80 00 20 	blr

net/ipv6/ip6_checksum.o:     file format elf64-powerpc


Disassembly of section .text:

0000000000000000 <.csum_ipv6_magic>:
    0:	fb e1 ff f8 	std     r31,-8(r1)
    4:	81 43 00 00 	lwz     r10,0(r3)
    8:	81 83 00 04 	lwz     r12,4(r3)
    c:	81 23 00 08 	lwz     r9,8(r3)
   10:	80 03 00 0c 	lwz     r0,12(r3)
   14:	7c e7 52 14 	add     r7,r7,r10
   18:	80 64 00 08 	lwz     r3,8(r4)
   1c:	81 04 00 00 	lwz     r8,0(r4)
   20:	78 ff 00 20 	clrldi  r31,r7,32
   24:	7c ec 3a 14 	add     r7,r12,r7
   28:	81 64 00 04 	lwz     r11,4(r4)
   2c:	7f ea f8 50 	subf    r31,r10,r31
   30:	81 44 00 0c 	lwz     r10,12(r4)
   34:	7b ff 0f e0 	rldicl  r31,r31,1,63
   38:	7c ff 3a 14 	add     r7,r31,r7
   3c:	eb e1 ff f8 	ld      r31,-8(r1)
   40:	78 e4 00 20 	clrldi  r4,r7,32
   44:	7c e9 3a 14 	add     r7,r9,r7
   48:	7d 8c 20 50 	subf    r12,r12,r4
   4c:	79 8c 0f e0 	rldicl  r12,r12,1,63
   50:	7d 8c 3a 14 	add     r12,r12,r7
   54:	79 87 00 20 	clrldi  r7,r12,32
   58:	7d 80 62 14 	add     r12,r0,r12
   5c:	7d 29 38 50 	subf    r9,r9,r7
   60:	79 29 0f e0 	rldicl  r9,r9,1,63
   64:	7d 29 62 14 	add     r9,r9,r12
   68:	79 27 00 20 	clrldi  r7,r9,32
   6c:	7d 28 4a 14 	add     r9,r8,r9
   70:	7c 00 38 50 	subf    r0,r0,r7
   74:	78 00 0f e0 	rldicl  r0,r0,1,63
   78:	7c 00 4a 14 	add     r0,r0,r9
   7c:	78 09 00 20 	clrldi  r9,r0,32
   80:	7c 0b 02 14 	add     r0,r11,r0
   84:	7d 08 48 50 	subf    r8,r8,r9
   88:	79 08 0f e0 	rldicl  r8,r8,1,63
   8c:	7d 08 02 14 	add     r8,r8,r0
   90:	79 09 00 20 	clrldi  r9,r8,32
   94:	7d 03 42 14 	add     r8,r3,r8
   98:	7d 2b 48 50 	subf    r9,r11,r9
   9c:	79 29 0f e0 	rldicl  r9,r9,1,63
   a0:	7d 29 42 14 	add     r9,r9,r8
   a4:	79 28 00 20 	clrldi  r8,r9,32
   a8:	7d 2a 4a 14 	add     r9,r10,r9
   ac:	7d 03 40 50 	subf    r8,r3,r8
   b0:	79 08 0f e0 	rldicl  r8,r8,1,63
   b4:	7d 08 4a 14 	add     r8,r8,r9
   b8:	79 09 00 20 	clrldi  r9,r8,32
   bc:	7d 08 2a 14 	add     r8,r8,r5
   c0:	7d 2a 48 50 	subf    r9,r10,r9
   c4:	79 29 0f e0 	rldicl  r9,r9,1,63
   c8:	7d 29 42 14 	add     r9,r9,r8
   cc:	79 2a 00 20 	clrldi  r10,r9,32
   d0:	7d 29 32 14 	add     r9,r9,r6
   d4:	7c a5 50 50 	subf    r5,r5,r10
   d8:	78 a5 0f e0 	rldicl  r5,r5,1,63
   dc:	7d 25 4a 14 	add     r9,r5,r9
   e0:	79 2a 00 20 	clrldi  r10,r9,32
   e4:	7c c6 50 50 	subf    r6,r6,r10
   e8:	78 c6 0f e0 	rldicl  r6,r6,1,63
   ec:	7c c6 4a 14 	add     r6,r6,r9
   f0:	54 c3 80 3e 	rotlwi  r3,r6,16
   f4:	7c c6 1a 14 	add     r6,r6,r3
   f8:	7c c3 30 f8 	not     r3,r6
   fc:	78 63 84 22 	rldicl  r3,r3,48,48
  100:	4e 80 00 20 	blr

Christophe

>>
>>> --- a/arch/powerpc/lib/checksum_32.S
>>> +++ b/arch/powerpc/lib/checksum_32.S
>>> @@ -293,3 +293,36 @@ dst_error:
>>>       EX_TABLE(51b, dst_error);
>>>   EXPORT_SYMBOL(csum_partial_copy_generic)
>>> +
>>> +/*
>>> + * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
>>> + *                      const struct in6_addr *daddr,
>>> + *                      __u32 len, __u8 proto, __wsum sum)
>>> + */
>>> +
>>> +_GLOBAL(csum_ipv6_magic)
>>> +    lwz    r8, 0(r3)
>>> +    lwz    r9, 4(r3)
>>> +    lwz    r10, 8(r3)
>>> +    lwz    r11, 12(r3)
>>> +    addc    r0, r5, r6
>>> +    adde    r0, r0, r7
>>> +    adde    r0, r0, r8
>>> +    adde    r0, r0, r9
>>> +    adde    r0, r0, r10
>>> +    adde    r0, r0, r11
>>> +    lwz    r8, 0(r4)
>>> +    lwz    r9, 4(r4)
>>> +    lwz    r10, 8(r4)
>>> +    lwz    r11, 12(r4)
>>> +    adde    r0, r0, r8
>>> +    adde    r0, r0, r9
>>> +    adde    r0, r0, r10
>>> +    adde    r0, r0, r11
>>> +    addze    r0, r0
>>> +    rotlwi    r3, r0, 16
>>> +    add    r3, r0, r3
>>> +    not    r3, r3
>>> +    rlwinm    r3, r3, 16, 16, 31
>>> +    blr
>>> +EXPORT_SYMBOL(csum_ipv6_magic)
>>
>> Clustering the loads and carry insns together is pretty much the worst 
>> you
>> can do on most 32-bit CPUs.
> 
> Oh, really ? __csum_partial is written that way too.
> 
> Right, now I tried interleaving the lwz and adde. I get no improvment at 
> all on a 885, but I get a 15% improvment on a 8321.
> 
> Christophe
> 
>>
>>
>> Segher
>>

^ permalink raw reply

* Re: [PATCH net-next] bpfilter: don't pass O_CREAT when opening console for debug
From: Daniel Borkmann @ 2018-05-24 10:33 UTC (permalink / raw)
  To: Jakub Kicinski, davem; +Cc: alexei.starovoitov, netdev, oss-drivers
In-Reply-To: <20180524074119.32132-1-jakub.kicinski@netronome.com>

On 05/24/2018 09:41 AM, Jakub Kicinski wrote:
> Passing O_CREAT (00000100) to open means we should also pass file
> mode as the third parameter.  Creating /dev/console as a regular
> file may not be helpful anyway, so simply drop the flag when
> opening debug_fd.
> 
> Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH net-next] bpfilter: fix build dependency
From: Daniel Borkmann @ 2018-05-24 10:33 UTC (permalink / raw)
  To: Alexei Starovoitov, David S . Miller; +Cc: jakub.kicinski, netdev, kernel-team
In-Reply-To: <20180524042905.3605566-1-ast@kernel.org>

On 05/24/2018 06:29 AM, Alexei Starovoitov wrote:
> BPFILTER could have been enabled without INET causing this build error:
> ERROR: "bpfilter_process_sockopt" [net/bpfilter/bpfilter.ko] undefined!
> 
> Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
> Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH] netfilter: nft_numgen: fix ptr_ret.cocci warnings
From: Laura Garcia @ 2018-05-24 10:40 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	Netfilter Development Mailing list, coreteam, netdev,
	linux-kernel
In-Reply-To: <20180523105859.GA74092@ivb42>

On Wed, May 23, 2018 at 12:58 PM, kbuild test robot
<fengguang.wu@intel.com> wrote:
> From: kbuild test robot <fengguang.wu@intel.com>
>
> net/netfilter/nft_numgen.c:117:1-3: WARNING: PTR_ERR_OR_ZERO can be used
>
>
>  Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
>
> Generated by: scripts/coccinelle/api/ptr_ret.cocci
>
> Fixes: d734a2888922 ("netfilter: nft_numgen: add map lookups for numgen statements")
> CC: Laura Garcia Liebana <nevola@gmail.com>
> Signed-off-by: kbuild test robot <fengguang.wu@intel.com>

Acked-by: Laura Garcia Liebana <nevola@gmail.com>

^ permalink raw reply

* [PATCH 1/6] ravb: remove custom .nway_reset from ethtool ops
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-1-git-send-email-vladimir_zapolskiy@mentor.com>

The change fixes a sleep in atomic context issue, which can be
always triggered by running 'ethtool -r' command, because
phy_start_aneg() protects phydev fields by a mutex.

Another note is that the change implicitly replaces phy_start_aneg()
with a newer phy_restart_aneg().

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 68f122140966..4a043eb0e2aa 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1150,21 +1150,6 @@ static int ravb_set_link_ksettings(struct net_device *ndev,
 	return error;
 }
 
-static int ravb_nway_reset(struct net_device *ndev)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	int error = -ENODEV;
-	unsigned long flags;
-
-	if (ndev->phydev) {
-		spin_lock_irqsave(&priv->lock, flags);
-		error = phy_start_aneg(ndev->phydev);
-		spin_unlock_irqrestore(&priv->lock, flags);
-	}
-
-	return error;
-}
-
 static u32 ravb_get_msglevel(struct net_device *ndev)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
@@ -1377,7 +1362,7 @@ static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 }
 
 static const struct ethtool_ops ravb_ethtool_ops = {
-	.nway_reset		= ravb_nway_reset,
+	.nway_reset		= phy_ethtool_nway_reset,
 	.get_msglevel		= ravb_get_msglevel,
 	.set_msglevel		= ravb_set_msglevel,
 	.get_link		= ethtool_op_get_link,
-- 
2.8.1

^ permalink raw reply related

* [PATCH 0/6] ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc

For ages trivial changes to RAVB and SuperH ethernet links by means of
standard 'ethtool' trigger a 'sleeping function called from invalid
context' bug, to visualize it on r8a7795 ULCB:

  % ethtool -r eth0
  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
  in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
  INFO: lockdep is turned off.
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>]           (null)
  hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
  softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
  softirqs last disabled at (0): [<0000000000000000>]           (null)
  CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
  Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
  Call trace:
   dump_backtrace+0x0/0x198
   show_stack+0x24/0x30
   dump_stack+0xb8/0xf4
   ___might_sleep+0x1c8/0x1f8
   __might_sleep+0x58/0x90
   __mutex_lock+0x50/0x890
   mutex_lock_nested+0x3c/0x50
   phy_start_aneg_priv+0x38/0x180
   phy_start_aneg+0x24/0x30
   ravb_nway_reset+0x3c/0x68
   dev_ethtool+0x3dc/0x2338
   dev_ioctl+0x19c/0x490
   sock_do_ioctl+0xe0/0x238
   sock_ioctl+0x254/0x460
   do_vfs_ioctl+0xb0/0x918
   ksys_ioctl+0x50/0x80
   sys_ioctl+0x34/0x48
   __sys_trace_return+0x0/0x4

The root cause is that an attempt to modify ECMR and GECMR registers
only when RX/TX function is disabled was too overcomplicated in its
original implementation, also processing of an optional Link Change
interrupt added even more complexity, as a result the implementation
was error prone.

The new locking scheme is confirmed to be correct by dumping driver
specific and generic PHY framework function calls with aid of ftrace
while running more or less advanced tests.

Please note that sh_eth patches from the series were built-tested only.

On purpose I do not add Fixes tags, the reused PHY handlers were added
way later than the fixed problems were firstly found in the drivers.

Vladimir Zapolskiy (6):
  ravb: remove custom .nway_reset from ethtool ops
  ravb: remove custom .get_link_ksettings from ethtool ops
  ravb: remove custom .set_link_ksettings from ethtool ops
  sh_eth: remove custom .nway_reset from ethtool ops
  sh_eth: remove custom .get_link_ksettings from ethtool ops
  sh_eth: remove custom .set_link_ksettings from ethtool ops

 drivers/net/ethernet/renesas/ravb_main.c | 93 ++++++-------------------------
 drivers/net/ethernet/renesas/sh_eth.c    | 94 ++++++--------------------------
 2 files changed, 34 insertions(+), 153 deletions(-)

-- 
2.8.1

^ permalink raw reply

* [PATCH 2/6] ravb: remove custom .get_link_ksettings from ethtool ops
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-1-git-send-email-vladimir_zapolskiy@mentor.com>

The change replaces a custom implementation of .get_link_ksettings
callback with a shared phy_ethtool_get_link_ksettings(), note that
&priv->lock wrapping is not needed, because the lock does not
serialize access to phydev fields.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 4a043eb0e2aa..3d91caa44176 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1096,22 +1096,6 @@ static int ravb_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int ravb_get_link_ksettings(struct net_device *ndev,
-				   struct ethtool_link_ksettings *cmd)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	unsigned long flags;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&priv->lock, flags);
-	phy_ethtool_ksettings_get(ndev->phydev, cmd);
-	spin_unlock_irqrestore(&priv->lock, flags);
-
-	return 0;
-}
-
 static int ravb_set_link_ksettings(struct net_device *ndev,
 				   const struct ethtool_link_ksettings *cmd)
 {
@@ -1372,7 +1356,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
 	.get_ringparam		= ravb_get_ringparam,
 	.set_ringparam		= ravb_set_ringparam,
 	.get_ts_info		= ravb_get_ts_info,
-	.get_link_ksettings	= ravb_get_link_ksettings,
+	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
 	.set_link_ksettings	= ravb_set_link_ksettings,
 	.get_wol		= ravb_get_wol,
 	.set_wol		= ravb_set_wol,
-- 
2.8.1

^ permalink raw reply related

* [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-1-git-send-email-vladimir_zapolskiy@mentor.com>

The change replaces a custom implementation of .set_link_ksettings
callback with a shared phy_ethtool_set_link_ksettings(), this fixes
sleep in atomic context bug, which is encountered every time when link
settings are changed by ethtool.

Now duplex mode setting is enforced in ravb_adjust_link() only, also
now TX/RX is disabled when link is put down or modifications to E-MAC
registers ECMR and GECMR are expected for both cases of checked and
ignored link status pin state from E-MAC interrupt handler.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
 1 file changed, 15 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 3d91caa44176..0d811c02ff34 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
 	struct ravb_private *priv = netdev_priv(ndev);
 	struct phy_device *phydev = ndev->phydev;
 	bool new_state = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
+
+	/* Disable TX and RX right over here, if E-MAC change is ignored */
+	if (priv->no_avb_link)
+		ravb_rcv_snd_disable(ndev);
 
 	if (phydev->link) {
 		if (phydev->duplex != priv->duplex) {
@@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
 			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
 			new_state = true;
 			priv->link = phydev->link;
-			if (priv->no_avb_link)
-				ravb_rcv_snd_enable(ndev);
 		}
 	} else if (priv->link) {
 		new_state = true;
 		priv->link = 0;
 		priv->speed = 0;
 		priv->duplex = -1;
-		if (priv->no_avb_link)
-			ravb_rcv_snd_disable(ndev);
 	}
 
+	/* Enable TX and RX right over here, if E-MAC change is ignored */
+	if (priv->no_avb_link && phydev->link)
+		ravb_rcv_snd_enable(ndev);
+
+	mmiowb();
+	spin_unlock_irqrestore(&priv->lock, flags);
+
 	if (new_state && netif_msg_link(priv))
 		phy_print_status(phydev);
 }
@@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int ravb_set_link_ksettings(struct net_device *ndev,
-				   const struct ethtool_link_ksettings *cmd)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	unsigned long flags;
-	int error;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&priv->lock, flags);
-
-	/* Disable TX and RX */
-	ravb_rcv_snd_disable(ndev);
-
-	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
-	if (error)
-		goto error_exit;
-
-	if (cmd->base.duplex == DUPLEX_FULL)
-		priv->duplex = 1;
-	else
-		priv->duplex = 0;
-
-	ravb_set_duplex(ndev);
-
-error_exit:
-	mdelay(1);
-
-	/* Enable TX and RX */
-	ravb_rcv_snd_enable(ndev);
-
-	mmiowb();
-	spin_unlock_irqrestore(&priv->lock, flags);
-
-	return error;
-}
-
 static u32 ravb_get_msglevel(struct net_device *ndev)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
@@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
 	.set_ringparam		= ravb_set_ringparam,
 	.get_ts_info		= ravb_get_ts_info,
 	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
-	.set_link_ksettings	= ravb_set_link_ksettings,
+	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
 	.get_wol		= ravb_get_wol,
 	.set_wol		= ravb_set_wol,
 };
-- 
2.8.1

^ permalink raw reply related

* [PATCH 4/6] sh_eth: remove custom .nway_reset from ethtool ops
From: Vladimir Zapolskiy @ 2018-05-24 11:11 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-1-git-send-email-vladimir_zapolskiy@mentor.com>

The change fixes a sleep in atomic context issue, which can be
always triggered by running 'ethtool -r' command, because
phy_start_aneg() protects phydev fields by a mutex.

Another note is that the change implicitly replaces phy_start_aneg()
with a newer phy_restart_aneg().

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/sh_eth.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index d9cadfb1bc4a..6d1fed2b4a4a 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2252,22 +2252,6 @@ static void sh_eth_get_regs(struct net_device *ndev, struct ethtool_regs *regs,
 	pm_runtime_put_sync(&mdp->pdev->dev);
 }
 
-static int sh_eth_nway_reset(struct net_device *ndev)
-{
-	struct sh_eth_private *mdp = netdev_priv(ndev);
-	unsigned long flags;
-	int ret;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&mdp->lock, flags);
-	ret = phy_start_aneg(ndev->phydev);
-	spin_unlock_irqrestore(&mdp->lock, flags);
-
-	return ret;
-}
-
 static u32 sh_eth_get_msglevel(struct net_device *ndev)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
@@ -2418,7 +2402,7 @@ static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_regs_len	= sh_eth_get_regs_len,
 	.get_regs	= sh_eth_get_regs,
-	.nway_reset	= sh_eth_nway_reset,
+	.nway_reset	= phy_ethtool_nway_reset,
 	.get_msglevel	= sh_eth_get_msglevel,
 	.set_msglevel	= sh_eth_set_msglevel,
 	.get_link	= ethtool_op_get_link,
-- 
2.8.1

^ permalink raw reply related

* [PATCH 5/6] sh_eth: remove custom .get_link_ksettings from ethtool ops
From: Vladimir Zapolskiy @ 2018-05-24 11:14 UTC (permalink / raw)
  To: David S. Miller, Sergei Shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <1527160318-10958-1-git-send-email-vladimir_zapolskiy@mentor.com>

The change replaces a custom implementation of .get_link_ksettings
callback with a shared phy_ethtool_get_link_ksettings(), note that
&priv->lock wrapping is not needed, because the lock does not
serialize access to phydev fields.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
---
 drivers/net/ethernet/renesas/sh_eth.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 6d1fed2b4a4a..e627b2b6c3b3 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -2019,22 +2019,6 @@ static int sh_eth_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int sh_eth_get_link_ksettings(struct net_device *ndev,
-				     struct ethtool_link_ksettings *cmd)
-{
-	struct sh_eth_private *mdp = netdev_priv(ndev);
-	unsigned long flags;
-
-	if (!ndev->phydev)
-		return -ENODEV;
-
-	spin_lock_irqsave(&mdp->lock, flags);
-	phy_ethtool_ksettings_get(ndev->phydev, cmd);
-	spin_unlock_irqrestore(&mdp->lock, flags);
-
-	return 0;
-}
-
 static int sh_eth_set_link_ksettings(struct net_device *ndev,
 				     const struct ethtool_link_ksettings *cmd)
 {
@@ -2411,7 +2395,7 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_sset_count     = sh_eth_get_sset_count,
 	.get_ringparam	= sh_eth_get_ringparam,
 	.set_ringparam	= sh_eth_set_ringparam,
-	.get_link_ksettings = sh_eth_get_link_ksettings,
+	.get_link_ksettings = phy_ethtool_get_link_ksettings,
 	.set_link_ksettings = sh_eth_set_link_ksettings,
 	.get_wol	= sh_eth_get_wol,
 	.set_wol	= sh_eth_set_wol,
-- 
2.8.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox