Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net] ixgbe: fix parsing of TC actions for HW offload
From: Ondřej Hlavatý @ 2018-05-28 16:39 UTC (permalink / raw)
  To: netdev
  Cc: Ondřej Hlavatý, Jeff Kirsher, intel-wired-lan,
	Jamal Hadi Salim, Jiri Pirko

The previous code was optimistic, accepting the offload of whole action
chain when there was a single known action (drop/redirect). This results
in offloading a rule which should not be offloaded, because its behavior
cannot be reproduced in the hardware.

For example:

$ tc filter add dev eno1 parent ffff: protocol ip \
    u32 ht 800: order 1 match tcp src 42 FFFF \
    action mirred egress mirror dev enp1s16 pipe \
    drop

The controller is unable to mirror the packet to a VF, but still
offloads the rule by dropping the packet.

Change the approach of the function to a pessimistic one, rejecting the
chain when an unknown action is found. This is better suited for future
extensions.

Note that both recognized actions always return TC_ACT_SHOT, therefore
it is safe to ignore actions behind them.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Ondřej Hlavatý <ohlavaty@redhat.com>

---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba99f7b8..d01e1f0280cf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9054,7 +9054,6 @@ static int parse_tc_actions(struct ixgbe_adapter *adapter,
 {
 	const struct tc_action *a;
 	LIST_HEAD(actions);
-	int err;
 
 	if (!tcf_exts_has_actions(exts))
 		return -EINVAL;
@@ -9075,14 +9074,14 @@ static int parse_tc_actions(struct ixgbe_adapter *adapter,
 
 			if (!dev)
 				return -EINVAL;
-			err = handle_redirect_action(adapter, dev->ifindex, queue,
-						     action);
-			if (err == 0)
-				return err;
+			return handle_redirect_action(adapter, dev->ifindex,
+						      queue, action);
 		}
+
+		return -EINVAL;
 	}
 
-	return -EINVAL;
+	return 0;
 }
 #else
 static int parse_tc_actions(struct ixgbe_adapter *adapter,
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v3 net-next 2/2] tcp: minor optimization around tcp_hdr() usage in tcp receive path
From: Eric Dumazet @ 2018-05-28 16:36 UTC (permalink / raw)
  To: laoar.shao; +Cc: songliubraving, David Miller, netdev, LKML
In-Reply-To: <1527521753-17963-2-git-send-email-laoar.shao@gmail.com>

On Mon, May 28, 2018 at 8:36 AM Yafang Shao <laoar.shao@gmail.com> wrote:

> This is additional to the commit ea1627c20c34 ("tcp: minor optimizations
around tcp_hdr() usage").
> At this point, skb->data is same with tcp_hdr() as tcp header has not
> been pulled yet.

> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>   net/ipv4/tcp_ipv4.c | 2 +-
>   net/ipv6/tcp_ipv6.c | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)

> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index adbdb50..d179386 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1486,7 +1486,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff
*skb)
>                                  sk->sk_rx_dst = NULL;
>                          }
>                  }
> -               tcp_rcv_established(sk, skb, tcp_hdr(skb));
> +               tcp_rcv_established(sk, skb, (const struct tcphdr
*)skb->data);
>                  return 0;
>          }

> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 7d47c2b..1c633ff 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1322,7 +1322,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct
sk_buff *skb)
>                          }
>                  }

> -               tcp_rcv_established(sk, skb, tcp_hdr(skb));
> +               tcp_rcv_established(sk, skb, (const struct tcphdr
*)skb->data);
>                  if (opt_skb)
>                          goto ipv6_pktoptions;
>                  return 0;
> --
> 1.8.3.1


I would rather remove the third parameter of tcp_rcv_established() instead
of duplicating the cast.

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: Lock before br_fdb_find()
From: Petr Machata @ 2018-05-28 16:19 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: bridge, netdev, stephen, davem
In-Reply-To: <012aee15-9e57-0267-cc3a-3b091977fa3f@cumulusnetworks.com>

Nikolay Aleksandrov <nikolay@cumulusnetworks.com> writes:

> Fixes: 4d4fd36126d6 ("net: bridge: Publish bridge accessor functions")

Correct.

Thanks,
Petr

^ permalink raw reply

* Re: Is it possible to get device information via CMSG?
From: Michal Kubecek @ 2018-05-28  6:57 UTC (permalink / raw)
  To: Eric S. Raymond; +Cc: netdev
In-Reply-To: <20180526093912.E8D563A42D4@snark.thyrsus.com>

On Sat, May 26, 2018 at 05:39:12AM -0400, Eric S. Raymond wrote:
> I'm trying to untangle some nasty code in the Mills implementation of
> NTP.  I could simplify it a lot if there were a way to query a packet
> to find out the name of the network interface it arrived on.  (At the
> moment the code has to iterate over all interfaces checking for
> traffic on each one just so it doesn't lose that information.)
> 
> This seems like the kind of thing the CMSG macros are intended to
> support, but I can't find anywhere a specification of what cmsg_level
> and cmsg_type values are valid and what their semantics are.
> 
> So I have two questions:
> 
> 1. Is there a cmsg_level/cmsg_type combination that will return the
> name of the device the packet arrived through?

Not name directly, AFAIK, but you can set SOL_IP / IP_PKTINFO (or
SOL_IPV6 / IPV6_RECVPKTINFO) socket option and get IP_PKTINFO
(IPV6_PKTINFO) message with recvmsg(). This will tell you incoming
interface index so that you can look the name up. See ip(7) or ipv6(7)
for format of the message (struct ip_pktinfo, struct in6_pktinfo).

However, I suspect that userspace application is not really interested
in incoming interface name but rather in destination address of the
incoming packet which is also provided in IP_PKTINFO / IPV6_PKTINFO
message. 

Michal Kubecek

^ permalink raw reply

* Re: [PATCH rdma-next 3/3] IB/mlx5: Introduce a new mini-CQE format
From: Jason Gunthorpe @ 2018-05-28 16:11 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Guy Levi,
	Yishai Hadas, Yonatan Cohen, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527104234.17261-4-leon@kernel.org>

On Sun, May 27, 2018 at 01:42:34PM +0300, Leon Romanovsky wrote:
> From: Yonatan Cohen <yonatanc@mellanox.com>
> 
> The new mini-CQE format includes the stride index, byte count and
> packet checksum.
> Stride index is needed for striding WQ feature.
> This patch exposes this capability and enables its setting
> via mlx5 UHW data as part of query device and cq creation.
> 
> Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
> Reviewed-by: Guy Levi <guyle@mellanox.com>
> Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/cq.c   | 42 +++++++++++++++++++++++++++++----------
>  drivers/infiniband/hw/mlx5/main.c |  4 ++++
>  include/uapi/rdma/mlx5-abi.h      |  2 +-
>  3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> index 7b4ce1a19de0..ad39d64b8108 100644
> +++ b/drivers/infiniband/hw/mlx5/cq.c
> @@ -751,6 +751,28 @@ static int alloc_cq_frag_buf(struct mlx5_ib_dev *dev,
>  	return 0;
>  }
>  
> +enum {
> +	MLX5_CQE_RES_FORMAT_HASH = 0,
> +	MLX5_CQE_RES_FORMAT_CSUM = 1,
> +	MLX5_CQE_RES_FORMAT_CSUM_STRIDX = 3,
> +};

What is this??

> +static int mini_cqe_res_format_to_hw(struct mlx5_ib_dev *dev, u8 format)
> +{
> +	switch (format) {
> +	case MLX5_IB_CQE_RES_FORMAT_HASH:
> +		return MLX5_CQE_RES_FORMAT_HASH;

Used here..

> +		mini_cqe_format =
> +			mini_cqe_res_format_to_hw(dev,
> +						  ucmd.cqe_comp_res_format);

And format comes from a ucmd, so that enum is upai.

Put it in the right place and put the right comment beside
struct mlx5_ib_create_cq's cqe_comp_res_format..

And what is wrong with the user space patches? Where is the update to
enum mlx5dv_cqe_comp_res_format ? And why is this wrong?

struct mlx5dv_cq_init_attr {
        uint64_t comp_mask; /* Use enum mlx5dv_cq_init_attr_mask */
        uint8_t cqe_comp_res_format; /* Use enum mlx5dv_cqe_comp_res_format */
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^

No, it isn't, and there isn't even an enum for it. Are you sure this is
designed right? Looks pretty wrong to me.

Fix it all please, and you need to arrange things to share the uapi
header with dv just like verbs is doing.

No more of this lax attitude toward uapi!

Jason

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-28 16:09 UTC (permalink / raw)
  To: Alexander Aring, Tariq Toukan
  Cc: David Miller, edumazet, netdev, fw, herbert, tgraf, brouer,
	alex.aring, stefan, ktkhai, eric.dumazet, Moshe Shemesh,
	Eran Ben Elisha
In-Reply-To: <20180528145224.3ih6urfixwv4fwkf@x220t>



On 05/28/2018 07:52 AM, Alexander Aring wrote:

> as somebody who had similar issues with this patch series I can tell you
> about what happened for the 6LoWPAN fragmentation.
> 
> The issue sounds similar, but there is too much missing information here
> to say something about if you have exactly the issue which we had.
> 
> Our problem:
> 
> The patch series uses memcmp() to compare hash keys, we had some padding
> bytes in our hash key and it occurs that we had sometimes random bytes
> in this structure when it's put on stack. We solved it by a struct
> foo_key bar = {}, which in case of gcc it _seems_ it makes a whole
> memset(bar, 0, ..) on the structure.
> 
> I asked on the netdev mailinglist how to deal with this problem in
> general, because = {} works in case of gcc, others compilers may have a
> different handling or even gcc will changes this behaviour in future.
> I got no reply so I did what it works for me. :-)
> 
> At least maybe a memcmp() on structures should never be used, it should
> be compared by field. I would recommend this way when the compiler is
> always clever enough to optimize it in some cases, but I am not so a
> compiler expert to say anything about that.
> 
> I checked the hash key structures for x86_64 and pahole, so far I didn't
> find any padding bytes there, but it might be different on
> architectures or ?compiler?.
> 
> Additional useful information to check if you running into the same problem
> would be:
> 
>  - Which architecture do you use?
> 
>  - Do you have similar problems with a veth setup?
> 
> You could also try this:
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index b939b94e7e91..40ece9ab8b12 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
>  static struct frag_queue *
>  fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
>  {
> -       struct frag_v6_compare_key key = {
> -               .id = id,
> -               .saddr = hdr->saddr,
> -               .daddr = hdr->daddr,
> -               .user = IP6_DEFRAG_LOCAL_DELIVER,
> -               .iif = iif,
> -       };
> +       struct frag_v6_compare_key key = {};
>         struct inet_frag_queue *q;
>  
>         if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
>                                             IPV6_ADDR_LINKLOCAL)))
>                 key.iif = 0;
>  
> +       key.id = id;
> +       key.saddr = hdr->saddr;
> +       key.daddr = hdr->daddr;
> +       key.user = IP6_DEFRAG_LOCAL_DELIVER;
> +       key.iif = iif;
> +
>         q = inet_frag_find(&net->ipv6.frags, &key);
>         if (!q)
>                 return NULL;
> 
> - Alex
> 

Hi Alex.

This patch makes no sense, since struct frag_v6_compare_key has no hole.

Only 6LoWPAN had a problem really, because of its way of having unions (and holes).

Also note that your patch would break the case when we force key.iif to be zero.


Tariq, here are my test results : No drops for me.

# ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      202117      0    10592.00
212992           10.00           0              0.00

Somehow, you might send packets too fast and receiver has a problem with that ?
For particular needs, you might need to adjust :

/proc/sys/net/ipv6/ip6frag_time  (to 2 seconds instead of the default of 60)
/proc/sys/net/ipv6/ip6frag_low_thresh
/proc/sys/net/ipv6/ip6frag_high_thresh

Once your receiver has filled its capacity with frags, the default of 60 seconds to garbage collect
might be the reason you notice a problem.

Check :
grep FRAG6 /proc/net/sockstat6

On Google servers we multiply by 25 the limits for ipv6 frags memory usage :

/proc/sys/net/ipv6/ip6frag_high_thresh:104857600  (instead of 4MB)
/proc/sys/net/ipv6/ip6frag_low_thresh:78643200  (instead of 3 MB)

When using 64KB datagrams, note that the truesize of the datagram would be about 44 * 2 = 88 KB,
so after ~40 lost packets in the network, you no longer can accept ipv6 fragments, until garbage
collector evicted old datagrams.

^ permalink raw reply

* Re: [PATCH] net: sched: split tc_ctl_tfilter into three handlers
From: Jamal Hadi Salim @ 2018-05-28 16:06 UTC (permalink / raw)
  To: Vlad Buslov, netdev; +Cc: xiyou.wangcong, jiri, davem, kliteyn
In-Reply-To: <78c4bc47-381c-4468-1745-1142d842cc70@mojatatu.com>

On 28/05/18 12:02 PM, Jamal Hadi Salim wrote:
> On 27/05/18 03:55 PM, Vlad Buslov wrote:
>> tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
>> RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
>> involves a lot of branching on specific message type because most of the
>> code is message-specific. This significantly complicates adding new
>> functionality and doesn't provide much benefit of code reuse.
>>
>> Split tc_ctl_tfilter to three standalone functions that handle filter 
>> new,
>> delete and get requests.
>>
>> The only truly protocol independent part of tc_ctl_tfilter is code that
>> looks up queue, class, and block. Refactor this code to standalone
>> tcf_block_find function that is used by all three new handlers.
>>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> 
> FWIW, I like this separation - makes things more maintainable and
> readable (we should do it in act_api as well).

Sorry - meant to point out that this belongs to net-next not net.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] net: sched: split tc_ctl_tfilter into three handlers
From: Jamal Hadi Salim @ 2018-05-28 16:02 UTC (permalink / raw)
  To: Vlad Buslov, netdev; +Cc: xiyou.wangcong, jiri, davem, kliteyn
In-Reply-To: <1527450903-11408-1-git-send-email-vladbu@mellanox.com>

On 27/05/18 03:55 PM, Vlad Buslov wrote:
> tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER,
> RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function
> involves a lot of branching on specific message type because most of the
> code is message-specific. This significantly complicates adding new
> functionality and doesn't provide much benefit of code reuse.
> 
> Split tc_ctl_tfilter to three standalone functions that handle filter new,
> delete and get requests.
> 
> The only truly protocol independent part of tc_ctl_tfilter is code that
> looks up queue, class, and block. Refactor this code to standalone
> tcf_block_find function that is used by all three new handlers.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

FWIW, I like this separation - makes things more maintainable and
readable (we should do it in act_api as well).


cheers,
jamal

^ permalink raw reply

* Re: [PATCH v2 net-next] net:sched: add action inheritdsfield to skbedit
From: Jamal Hadi Salim @ 2018-05-28 16:01 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, Fu, Qiaobin
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Michel Machado,
	xiyou.wangcong@gmail.com
In-Reply-To: <20180528154228.GC3787@localhost.localdomain>

On 28/05/18 11:42 AM, Marcelo Ricardo Leitner wrote:
> On Mon, May 28, 2018 at 05:40:18AM +0000, Fu, Qiaobin wrote:
>> The new action inheritdsfield copies the field DS of
>> IPv4 and IPv6 packets into skb->priority. This enables
>> later classification of packets based on the DS field.
>>
>> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
>>
>> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
>> Reviewed-by: Michel Machado <michel@digirati.com.br>
>> ---
>>
>> Note that the motivation for this patch is found in the following discussion:
>> https://www.spinics.net/lists/netdev/msg501061.html
>> ---
>>
>> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
>> index fbcfe27..432ad2f 100644
>> --- a/include/uapi/linux/tc_act/tc_skbedit.h
>> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
>> @@ -30,9 +30,11 @@
>>   #define SKBEDIT_F_MARK			0x4
>>   #define SKBEDIT_F_PTYPE			0x8
>>   #define SKBEDIT_F_MASK			0x10
>> +#define SKBEDIT_F_INHERITDSFIELD	0x20
>>   
>>   struct tc_skbedit {
>>   	tc_gen;
>> +	__u64 flags;
> 
> I don't think this is doable. It looks like it was prepared for such
> change, but it breaks UAPI as it causes tc without the respective
> patch to not be able to talk to skbedit anymore:
> 
> With this patch:
> [root@f28 ~]# tc action add action skbedit priority 1
> RTNETLINK answers: Numerical result out of range
> We have an error talking to the kernel
> [root@f28 ~]#
> 
> While without this patch:
> [root@f28 ~]# tc action add action skbedit priority 1
> [root@f28 ~]#

And the new flags field doesnt even seem to be in use.
Qiaobin, you dont seem to need this.
Also you have added a workaround for default_priority
which I dont think is needed either (sounds like you
want to make sure that SKBEDIT_F_PRIORITY and new
feature are mutually exclusive - which seems unnecessary.
My suggestion is to remove:
+	} else {
+default_priority:
+		if (d->flags & SKBEDIT_F_PRIORITY)
+			skb->priority = d->priority;
+	}

and leave alone:
-	if (d->flags & SKBEDIT_F_PRIORITY)
-		skb->priority = d->priority;

If someone wants to set both flags, then let them.

cheers,
jamal

^ permalink raw reply

* Re: Unable to create ip alias on bridge interface
From: Michal Kubecek @ 2018-05-28 12:05 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: netdev
In-Reply-To: <CAA5aLPgrCNTn15BQ_sbNqSE8cjRpx+goSN9gGHziNLLUuH6X9A@mail.gmail.com>

On Mon, May 28, 2018 at 02:35:41PM +0530, Akshat Kakkar wrote:
> I am having a bridge named br0 having ports eno1 and eno2 as members.
> I have given IP to br0 as 10.10.10.1/24
> 
> Now I want to create alias on br0 as br0:1 and give IP as
> 10.10.10.2/24, but I am unable to.
> 
> I know, we can add multiple IPs to br0 using "ip addr" command, but I
> dont want to do it that way as I want all outgoing connections from
> br0 to take src ip as 10.10.10.1. I know by providing option of "src"
> in all routes, things can work but this looks more like a hack and
> less of a solution.

I don't understand. There are no actual aliases since kernel 2.2 and an
attempt to add "br0:1 with address 10.10.10.2/24" using ifconfig should
result in the same configuration as

  ip addr add 10.10.10.2/24 brd + label br0:1 dev br0

where the "label br0:1" part only adds a label which allows ifconfig to
see the new address.

As both addresses share the same range, you don't even have to worry
about source address as primary address (10.10.10.1 - or first one added
in general) will be used unless specified otherwise.

Michal Kubecek

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: Lock before br_fdb_find()
From: Nikolay Aleksandrov @ 2018-05-28 15:53 UTC (permalink / raw)
  To: Petr Machata, bridge, netdev; +Cc: davem
In-Reply-To: <58a3153f-9393-da8c-7aea-7a68537f6f90@cumulusnetworks.com>

On 28/05/18 18:52, Nikolay Aleksandrov wrote:
> On 28/05/18 18:44, Petr Machata wrote:
>> Callers of br_fdb_find() need to hold the hash lock, which
>> br_fdb_find_port() doesn't do. Add the missing lock/unlock
>> pair.
>>
>> Signed-off-by: Petr Machata <petrm@mellanox.com>
>> ---
>>  net/bridge/br_fdb.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index b19e310..3f5691a 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -135,9 +135,11 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>>  		return NULL;
>>  
>>  	br = netdev_priv(br_dev);
>> +	spin_lock_bh(&br->hash_lock);
>>  	f = br_fdb_find(br, addr, vid);
>>  	if (f && f->dst)
>>  		dev = f->dst->dev;
>> +	spin_unlock_bh(&br->hash_lock);
>>  
>>  	return dev;
>>  }
>>
> 
> There's also a lockdep assert for hash_lock in br_find_fdb().
> 
> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 

Fixes: 4d4fd36126d6 ("net: bridge: Publish bridge accessor functions")

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: Lock before br_fdb_find()
From: Nikolay Aleksandrov @ 2018-05-28 15:52 UTC (permalink / raw)
  To: Petr Machata, bridge, netdev; +Cc: davem
In-Reply-To: <ff84c00d882dc646a7d69dcbdfe8417b4c0bdf91.1527522008.git.petrm@mellanox.com>

On 28/05/18 18:44, Petr Machata wrote:
> Callers of br_fdb_find() need to hold the hash lock, which
> br_fdb_find_port() doesn't do. Add the missing lock/unlock
> pair.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> ---
>  net/bridge/br_fdb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b19e310..3f5691a 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -135,9 +135,11 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
>  		return NULL;
>  
>  	br = netdev_priv(br_dev);
> +	spin_lock_bh(&br->hash_lock);
>  	f = br_fdb_find(br, addr, vid);
>  	if (f && f->dst)
>  		dev = f->dst->dev;
> +	spin_unlock_bh(&br->hash_lock);
>  
>  	return dev;
>  }
> 

There's also a lockdep assert for hash_lock in br_find_fdb().

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* [PATCH] net: davinci: fix building davinci mdio code without CONFIG_OF
From: Arnd Bergmann @ 2018-05-28 15:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Grygorii Strashko, Sekhar Nori, Florian Fainelli,
	linux-omap, netdev, linux-kernel

Test-building this driver on targets without CONFIG_OF revealed a build
failure:

drivers/net/ethernet/ti/davinci_mdio.c: In function 'davinci_mdio_probe':
drivers/net/ethernet/ti/davinci_mdio.c:380:9: error: implicit declaration of function 'davinci_mdio_probe_dt'; did you mean 'davinci_mdio_probe'? [-Werror=implicit-function-declaration]

This adjusts the #ifdef logic in the driver to make it build in
all configurations.

Fixes: 2652113ff043 ("net: ethernet: ti: Allow most drivers with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/ti/davinci_mdio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c b/drivers/net/ethernet/ti/davinci_mdio.c
index 8ac72831af05..a98aedae1b41 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -321,7 +321,6 @@ static int davinci_mdio_write(struct mii_bus *bus, int phy_id,
 	return ret;
 }
 
-#if IS_ENABLED(CONFIG_OF)
 static int davinci_mdio_probe_dt(struct mdio_platform_data *data,
 			 struct platform_device *pdev)
 {
@@ -339,7 +338,6 @@ static int davinci_mdio_probe_dt(struct mdio_platform_data *data,
 
 	return 0;
 }
-#endif
 
 #if IS_ENABLED(CONFIG_OF)
 static const struct davinci_mdio_of_param of_cpsw_mdio_data = {
@@ -374,7 +372,7 @@ static int davinci_mdio_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
-	if (dev->of_node) {
+	if (IS_ENABLED(CONFIG_OF) && dev->of_node) {
 		const struct of_device_id	*of_id;
 
 		ret = davinci_mdio_probe_dt(&data->pdata, pdev);
-- 
2.9.0

^ permalink raw reply related

* [PATCH, net-next] net: ethernet: freescale: fix false-positive string overflow warning
From: Arnd Bergmann @ 2018-05-28 15:49 UTC (permalink / raw)
  To: Fugang Duan, David S. Miller
  Cc: Arnd Bergmann, Fabio Estevam, Andrew Lunn, Troy Kisky,
	Florian Fainelli, netdev, linux-kernel

While compile-testing on arm64 with gcc-8.1, I ran into a build diagnostic:

drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
drivers/net/ethernet/freescale/fec_main.c:3517:25: error: '%d' directive writing between 1 and 10 bytes into a region of size 5 [-Werror=format-overflow=]
   sprintf(irq_name, "int%d", i);
                         ^~
drivers/net/ethernet/freescale/fec_main.c:3517:21: note: directive argument in the range [0, 2147483646]
   sprintf(irq_name, "int%d", i);
                     ^~~~~~~
drivers/net/ethernet/freescale/fec_main.c:3517:3: note: 'sprintf' output between 5 and 14 bytes into a destination of size 8
   sprintf(irq_name, "int%d", i);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It appears this has never shown on ppc32 or arm32 for an unknown reason, but
now gcc fails to identify that the 'irq_cnt' loop index has an upper bound
of 3, and instead uses a bogus range.

To work around the warning, this changes the sprintf to snprintf with the
correct buffer length.

Fixes: 78cc6e7ef957 ("net: ethernet: freescale: Allow FEC with COMPILE_TEST")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index ab7521c04eb2..c729665107f5 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3514,7 +3514,7 @@ fec_probe(struct platform_device *pdev)
 		goto failed_init;
 
 	for (i = 0; i < irq_cnt; i++) {
-		sprintf(irq_name, "int%d", i);
+		snprintf(irq_name, sizeof(irq_name), "int%d", i);
 		irq = platform_get_irq_byname(pdev, irq_name);
 		if (irq < 0)
 			irq = platform_get_irq(pdev, i);
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH net-next 05/14] nfp: abm: add simple RED offload
From: Nogah Frankel @ 2018-05-28 15:49 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: jiri, xiyou.wangcong, john.fastabend, netdev, oss-drivers,
	alexei.starovoitov, nogahf, yuvalm, gerlitz.or
In-Reply-To: <20180526045338.10993-6-jakub.kicinski@netronome.com>

On 26-May-18 7:53 AM, Jakub Kicinski wrote:
> Offload simple RED configurations.  For now support only DCTCP
> like scenarios where min and max are the same.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>   drivers/net/ethernet/netronome/nfp/abm/main.c | 82 +++++++++++++++++++
>   drivers/net/ethernet/netronome/nfp/abm/main.h | 10 +++
>   2 files changed, 92 insertions(+)
> 
> diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.c b/drivers/net/ethernet/netronome/nfp/abm/main.c
> index 28a18ac62040..22251d88c958 100644
> --- a/drivers/net/ethernet/netronome/nfp/abm/main.c
> +++ b/drivers/net/ethernet/netronome/nfp/abm/main.c
> @@ -38,6 +38,8 @@
>   #include <linux/netdevice.h>
>   #include <linux/rcupdate.h>
>   #include <linux/slab.h>
> +#include <net/pkt_cls.h>
> +#include <net/pkt_sched.h>
>   
>   #include "../nfpcore/nfp.h"
>   #include "../nfpcore/nfp_cpp.h"
> @@ -55,6 +57,84 @@ static u32 nfp_abm_portid(enum nfp_repr_type rtype, unsigned int id)
>   	       FIELD_PREP(NFP_ABM_PORTID_ID, id);
>   }
>   
> +static void
> +nfp_abm_red_destroy(struct net_device *netdev, struct nfp_abm_link *alink,
> +		    u32 handle)
> +{
> +	struct nfp_port *port = nfp_port_from_netdev(netdev);
> +
> +	if (handle != alink->qdiscs[0].handle)
> +		return;
> +
> +	alink->qdiscs[0].handle = TC_H_UNSPEC;
> +	port->tc_offload_cnt = 0;
> +	nfp_abm_ctrl_set_all_q_lvls(alink, ~0);
> +}
> +
> +static int
> +nfp_abm_red_replace(struct net_device *netdev, struct nfp_abm_link *alink,
> +		    struct tc_red_qopt_offload *opt)
> +{
> +	struct nfp_port *port = nfp_port_from_netdev(netdev);
> +	int err;
> +
> +	if (opt->set.min != opt->set.max || !opt->set.is_ecn) {

I am a bit worried about the min == max.
sch_red doesn't really support it. It will calculate incorrect delta 
value. (And that only if tc_red_eval_P in iproute2 won't reject it).
You might maybe use max = min+1,  because in real life it will probably 
act the same but without this problem.

Nogah Frankel
(from a new mail address)

> +		nfp_warn(alink->abm->app->cpp,
> +			 "RED offload failed - unsupported parameters\n");
> +		err = -EINVAL;
> +		goto err_destroy;
> +	}
> +	err = nfp_abm_ctrl_set_all_q_lvls(alink, opt->set.min);
> +	if (err)
> +		goto err_destroy;
> +
> +	alink->qdiscs[0].handle = opt->handle;
> +	port->tc_offload_cnt = 1;
> +
> +	return 0;
> +err_destroy:
> +	if (alink->qdiscs[0].handle != TC_H_UNSPEC)
> +		nfp_abm_red_destroy(netdev, alink, alink->qdiscs[0].handle);
> +	return err;
> +}
> +
> +static int
> +nfp_abm_setup_tc_red(struct net_device *netdev, struct nfp_abm_link *alink,
> +		     struct tc_red_qopt_offload *opt)
> +{
> +	if (opt->parent != TC_H_ROOT)
> +		return -EOPNOTSUPP;
> +
> +	switch (opt->command) {
> +	case TC_RED_REPLACE:
> +		return nfp_abm_red_replace(netdev, alink, opt);
> +	case TC_RED_DESTROY:
> +		nfp_abm_red_destroy(netdev, alink, opt->handle);
> +		return 0;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static int
> +nfp_abm_setup_tc(struct nfp_app *app, struct net_device *netdev,
> +		 enum tc_setup_type type, void *type_data)
> +{
> +	struct nfp_repr *repr = netdev_priv(netdev);
> +	struct nfp_port *port;
> +
> +	port = nfp_port_from_netdev(netdev);
> +	if (!port || port->type != NFP_PORT_PF_PORT)
> +		return -EOPNOTSUPP;
> +
> +	switch (type) {
> +	case TC_SETUP_QDISC_RED:
> +		return nfp_abm_setup_tc_red(netdev, repr->app_priv, type_data);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
>   static struct net_device *nfp_abm_repr_get(struct nfp_app *app, u32 port_id)
>   {
>   	enum nfp_repr_type rtype;
> @@ -403,6 +483,8 @@ const struct nfp_app_type app_abm = {
>   	.vnic_alloc	= nfp_abm_vnic_alloc,
>   	.vnic_free	= nfp_abm_vnic_free,
>   
> +	.setup_tc	= nfp_abm_setup_tc,
> +
>   	.eswitch_mode_get	= nfp_abm_eswitch_mode_get,
>   	.eswitch_mode_set	= nfp_abm_eswitch_mode_set,
>   
> diff --git a/drivers/net/ethernet/netronome/nfp/abm/main.h b/drivers/net/ethernet/netronome/nfp/abm/main.h
> index 1ac651cdc140..979f98fb808b 100644
> --- a/drivers/net/ethernet/netronome/nfp/abm/main.h
> +++ b/drivers/net/ethernet/netronome/nfp/abm/main.h
> @@ -58,18 +58,28 @@ struct nfp_abm {
>   	const struct nfp_rtsym *q_lvls;
>   };
>   
> +/**
> + * struct nfp_red_qdisc - representation of single RED Qdisc
> + * @handle:	handle of currently offloaded RED Qdisc
> + */
> +struct nfp_red_qdisc {
> +	u32 handle;
> +};
> +
>   /**
>    * struct nfp_abm_link - port tuple of a ABM NIC
>    * @abm:	back pointer to nfp_abm
>    * @vnic:	data vNIC
>    * @id:		id of the data vNIC
>    * @queue_base:	id of base to host queue within PCIe (not QC idx)
> + * @qdiscs:	array of qdiscs
>    */
>   struct nfp_abm_link {
>   	struct nfp_abm *abm;
>   	struct nfp_net *vnic;
>   	unsigned int id;
>   	unsigned int queue_base;
> +	struct nfp_red_qdisc qdiscs[1];
>   };
>   
>   void nfp_abm_ctrl_read_params(struct nfp_abm_link *alink);
> 

^ permalink raw reply

* Re: [bpf-next PATCH] bpf: sockhash fix race with bpf_tcp_close and map delete
From: Daniel Borkmann @ 2018-05-28 15:48 UTC (permalink / raw)
  To: John Fastabend, ast; +Cc: netdev
In-Reply-To: <35940fbd-c354-e5ad-8baf-678d8245389c@gmail.com>

On 05/28/2018 05:13 PM, John Fastabend wrote:
> On 05/27/2018 03:36 PM, Daniel Borkmann wrote:
>> On 05/25/2018 07:37 PM, John Fastabend wrote:
>>> syzbot reported two related splats, a use after free and null
>>> pointer dereference, when a TCP socket is closed while the map is
>>> also being removed.
>>>
>>> The psock keeps a reference to all map slots that have a reference
>>> to the sock so that when the sock is closed we can clean up any
>>> outstanding sock{map|hash} entries. This avoids pinning a sock
>>> forever if the map owner fails to do proper cleanup. However, the
>>> result is we have two paths that can free an entry in the map. Even
>>> the comment in the sock{map|hash} tear down function, sock_hash_free()
>>> notes this:
>>>
>>>  At this point no update, lookup or delete operations can happen.
>>>  However, be aware we can still get a socket state event updates,
>>>  and data ready callbacks that reference the psock from sk_user_data.
>>>
>>> Both removal paths omitted taking the hash bucket lock resulting
>>> in the case where we have two references that are in the process
>>> of being free'd.
>>>
>>> Reported-by: syzbot+a761b81c211794fa1072@syzkaller.appspotmail.com
>>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>>
>> Applied to bpf-next, thanks John!
> 
> This needs a v2 it introduces a slightly different/related error. I'll
> have a v2 shortly.

Ok, I've just dropped it from the tree in favor for a v2. Thanks John!

^ permalink raw reply

* [PATCH net-next] net: bridge: Lock before br_fdb_find()
From: Petr Machata @ 2018-05-28 15:44 UTC (permalink / raw)
  To: bridge, netdev; +Cc: stephen, davem

Callers of br_fdb_find() need to hold the hash lock, which
br_fdb_find_port() doesn't do. Add the missing lock/unlock
pair.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 net/bridge/br_fdb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index b19e310..3f5691a 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -135,9 +135,11 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev,
 		return NULL;
 
 	br = netdev_priv(br_dev);
+	spin_lock_bh(&br->hash_lock);
 	f = br_fdb_find(br, addr, vid);
 	if (f && f->dst)
 		dev = f->dst->dev;
+	spin_unlock_bh(&br->hash_lock);
 
 	return dev;
 }
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH v2 net-next] net:sched: add action inheritdsfield to skbedit
From: Marcelo Ricardo Leitner @ 2018-05-28 15:42 UTC (permalink / raw)
  To: Fu, Qiaobin
  Cc: davem@davemloft.net, netdev@vger.kernel.org, jhs@mojatatu.com,
	Michel Machado, xiyou.wangcong@gmail.com
In-Reply-To: <A371B54E-186D-4747-B4FB-744E201C3FE5@bu.edu>

On Mon, May 28, 2018 at 05:40:18AM +0000, Fu, Qiaobin wrote:
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
> 
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> 
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> 
> Note that the motivation for this patch is found in the following discussion:
> https://www.spinics.net/lists/netdev/msg501061.html
> ---
> 
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index fbcfe27..432ad2f 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -30,9 +30,11 @@
>  #define SKBEDIT_F_MARK			0x4
>  #define SKBEDIT_F_PTYPE			0x8
>  #define SKBEDIT_F_MASK			0x10
> +#define SKBEDIT_F_INHERITDSFIELD	0x20
>  
>  struct tc_skbedit {
>  	tc_gen;
> +	__u64 flags;

I don't think this is doable. It looks like it was prepared for such
change, but it breaks UAPI as it causes tc without the respective
patch to not be able to talk to skbedit anymore:

With this patch:
[root@f28 ~]# tc action add action skbedit priority 1 
RTNETLINK answers: Numerical result out of range
We have an error talking to the kernel
[root@f28 ~]#

While without this patch:
[root@f28 ~]# tc action add action skbedit priority 1
[root@f28 ~]#


>  };
>  
>  enum {

^ permalink raw reply

* [PATCH v3 net-next 2/2] tcp: minor optimization around tcp_hdr() usage in tcp receive path
From: Yafang Shao @ 2018-05-28 15:35 UTC (permalink / raw)
  To: songliubraving, edumazet, davem; +Cc: netdev, linux-kernel, Yafang Shao
In-Reply-To: <1527521753-17963-1-git-send-email-laoar.shao@gmail.com>

This is additional to the commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 net/ipv4/tcp_ipv4.c | 2 +-
 net/ipv6/tcp_ipv6.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index adbdb50..d179386 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1486,7 +1486,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 				sk->sk_rx_dst = NULL;
 			}
 		}
-		tcp_rcv_established(sk, skb, tcp_hdr(skb));
+		tcp_rcv_established(sk, skb, (const struct tcphdr *)skb->data);
 		return 0;
 	}
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7d47c2b..1c633ff 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1322,7 +1322,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 			}
 		}
 
-		tcp_rcv_established(sk, skb, tcp_hdr(skb));
+		tcp_rcv_established(sk, skb, (const struct tcphdr *)skb->data);
 		if (opt_skb)
 			goto ipv6_pktoptions;
 		return 0;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v3 net-next 1/2] tcp: use data length instead of skb->len in tcp_probe
From: Yafang Shao @ 2018-05-28 15:35 UTC (permalink / raw)
  To: songliubraving, edumazet, davem; +Cc: netdev, linux-kernel, Yafang Shao

At this point skb->len is including tcp header length, so it is meaningless
to user. data length could be more helpful, with which we can easily filter
out the packet without payload.

Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

---
v3: tcp_hdr() is a little expensive than skb->data, so replace it with
    skb->data.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/trace/events/tcp.h | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index c1a5284..7ff0446 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -236,7 +236,7 @@
 		__field(__u16, sport)
 		__field(__u16, dport)
 		__field(__u32, mark)
-		__field(__u16, length)
+		__field(__u16, data_len)
 		__field(__u32, snd_nxt)
 		__field(__u32, snd_una)
 		__field(__u32, snd_cwnd)
@@ -250,6 +250,7 @@
 	TP_fast_assign(
 		const struct tcp_sock *tp = tcp_sk(sk);
 		const struct inet_sock *inet = inet_sk(sk);
+		const struct tcphdr *th = (const struct tcphdr *)skb->data;
 
 		memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
 		memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
@@ -261,7 +262,7 @@
 		__entry->dport = ntohs(inet->inet_dport);
 		__entry->mark = skb->mark;
 
-		__entry->length = skb->len;
+		__entry->data_len = skb->len - __tcp_hdrlen(th);
 		__entry->snd_nxt = tp->snd_nxt;
 		__entry->snd_una = tp->snd_una;
 		__entry->snd_cwnd = tp->snd_cwnd;
@@ -272,9 +273,9 @@
 		__entry->sock_cookie = sock_gen_cookie(sk);
 	),
 
-	TP_printk("src=%pISpc dest=%pISpc mark=%#x length=%d snd_nxt=%#x snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u rcv_wnd=%u sock_cookie=%llx",
+	TP_printk("src=%pISpc dest=%pISpc mark=%#x data_len=%d snd_nxt=%#x snd_una=%#x snd_cwnd=%u ssthresh=%u snd_wnd=%u srtt=%u rcv_wnd=%u sock_cookie=%llx",
 		  __entry->saddr, __entry->daddr, __entry->mark,
-		  __entry->length, __entry->snd_nxt, __entry->snd_una,
+		  __entry->data_len, __entry->snd_nxt, __entry->snd_una,
 		  __entry->snd_cwnd, __entry->ssthresh, __entry->snd_wnd,
 		  __entry->srtt, __entry->rcv_wnd, __entry->sock_cookie)
 );
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] brcmfmac: stop watchdog before detach and free everything
From: Michael Nazzareno Trimarchi @ 2018-05-28 15:33 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, David S. Miller, Pieter-Paul Giesberts,
	Ian Molton, open list:TI WILINK WIRELES...,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	brcm80211-dev-list, netdev, LKML
In-Reply-To: <CAOf5uw=G5iYfuoRBCOWU-PkEGpm1_qMwB+1=MjRQgxB+e_0w=g@mail.gmail.com>

Hi Andy

The problem seems really easy to solve:

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 412a05b..ba60b151 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4227,13 +4227,7 @@ struct brcmf_sdio *brcmf_sdio_probe(struct
brcmf_sdio_dev *sdiodev)
        timer_setup(&bus->timer, brcmf_sdio_watchdog, 0);
        /* Initialize watchdog thread */
        init_completion(&bus->watchdog_wait);
-       bus->watchdog_tsk = kthread_run(brcmf_sdio_watchdog_thread,
-                                       bus, "brcmf_wdog/%s",
-                                       dev_name(&sdiodev->func1->dev));
-       if (IS_ERR(bus->watchdog_tsk)) {
-               pr_warn("brcmf_watchdog thread failed to start\n");
-               bus->watchdog_tsk = NULL;
-       }
+
        /* Initialize DPC thread */
        bus->dpc_triggered = false;
        bus->dpc_running = false;
@@ -4281,6 +4275,14 @@ struct brcmf_sdio *brcmf_sdio_probe(struct
brcmf_sdio_dev *sdiodev)
                goto fail;
        }

+       bus->watchdog_tsk = kthread_run(brcmf_sdio_watchdog_thread,
+                                       bus, "brcmf_wdog/%s",
+                                       dev_name(&sdiodev->func1->dev));
+       if (IS_ERR(bus->watchdog_tsk)) {
+               pr_warn("brcmf_watchdog thread failed to start\n");
+               bus->watchdog_tsk = NULL;
+       }
+
        return bus;

Just look here

        ret = brcmf_fw_get_firmwares(sdiodev->dev, fwreq,
                                     brcmf_sdio_firmware_callback);
        if (ret != 0) {
                brcmf_err("async firmware request failed: %d\n", ret);
                kfree(fwreq);
                goto fail;
        }

        bus->watchdog_tsk = kthread_run(brcmf_sdio_watchdog_thread,
                                        bus, "brcmf_wdog/%s",
                                        dev_name(&sdiodev->func1->dev));
        if (IS_ERR(bus->watchdog_tsk)) {
                pr_warn("brcmf_watchdog thread failed to start\n");
                bus->watchdog_tsk = NULL;
        }

        return bus;

fail:
        brcmf_sdio_remove(bus);
        return NULL;


On Mon, May 28, 2018 at 5:29 PM, Michael Nazzareno Trimarchi
<michael@amarulasolutions.com> wrote:
> Hi
>
> On Mon, May 28, 2018 at 5:25 PM, Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
>> On Mon, May 28, 2018 at 12:54 PM, Michael Nazzareno Trimarchi
>> <michael@amarulasolutions.com> wrote:
>>> Hi Arend
>>>
>>> On Mon, May 28, 2018 at 11:51 AM, Arend van Spriel
>>> <arend.vanspriel@broadcom.com> wrote:
>>>> On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
>>>>>
>>>>> Watchdog need to be stopped in brcmf_sdio_remove to avoid
>>>>> i
>>>>> The system is going down NOW!
>>>>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual
>>>>> address 000002f8
>>>>> Sent SIGTERM to all processes
>>>>> [ 1348.121412] Mem abort info:
>>>>> [ 1348.126962]   ESR = 0x96000004
>>>>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>>>>> [ 1348.135948]   SET = 0, FnV = 0
>>>>> [ 1348.138997]   EA = 0, S1PTW = 0
>>>>> [ 1348.142154] Data abort info:
>>>>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>>>>> [ 1348.148884]   CM = 0, WnR = 0
>>>>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>>>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>>>>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>>> [ 1348.168927] Modules linked in: ipv6
>>>>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>>>>> 4.17.0-rc5-next-20180517 #18
>>>>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>>>>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>>>>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>>>>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>>>>
>>>>
>>>> Hi Michael,
>>>>
>>>> Thanks for the patch. In normal scenario the callstack looks like this:
>>>>
>>>> brcmf_sdio_remove()
>>>>         -> brcmf_detach()
>>>>                 -> brcmf_bus_stop()
>>>>                         -> brcmf_sdio_bus_stop()
>>>>
>>>> In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario did
>>>> you encounter this null pointer deref?
>>>
>>> Is this happen even when there is not wifi firmware?
>>> boot without any firmware in the filesystem and then trigger a reboot
>>
>> Something like the above I had noticed for a long (couple of kernel
>> releases?) time, but wasn't a big priority to me.
>> Though, I can test this on my side.
>>
>> P.S. I think rmmod or echo > unbind will trigger that as well.
>>
>
> Right now the module is compiled in the kernel. I can dig down tonight
> on this if needed
>
> Michael
>
>> --
>> With Best Regards,
>> Andy Shevchenko
>
>
>
> --
> | Michael Nazzareno Trimarchi                     Amarula Solutions BV |
> | COO  -  Founder                                      Cruquiuskade 47 |
> | +31(0)851119172                                 Amsterdam 1018 AM NL |
> |                  [`as] http://www.amarulasolutions.com               |



-- 
| Michael Nazzareno Trimarchi                     Amarula Solutions BV |
| COO  -  Founder                                      Cruquiuskade 47 |
| +31(0)851119172                                 Amsterdam 1018 AM NL |
|                  [`as] http://www.amarulasolutions.com               |

^ permalink raw reply related

* [PATCH] [net-next, wrong] make BPFILTER_UMH depend on X86
From: Arnd Bergmann @ 2018-05-28 15:31 UTC (permalink / raw)
  To: David S. Miller, Alexei Starovoitov
  Cc: Arnd Bergmann, Masahiro Yamada, linux-kbuild, netdev,
	linux-kernel

When build testing across architectures, I run into a build error on
all targets other than X86:

gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objdump: net/bpfilter/bpfilter_umh: File format not recognized
gcc-8.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-objcopy:net/bpfilter/bpfilter_umh.o: Invalid bfd target

The problem is that 'hostprogs' get built with 'gcc' rather than
'$(CROSS_COMPILE)gcc', and my default gcc (as most people's) targets x86.

To work around it, adding an X86 dependency gets randconfigs building
again on my box.

Clearly, this is not a good solution, since it should actually work fine
when building native kernels on other architectures but that is now
disabled, while cross building an x86 kernel on another host is still
broken after my patch.

What we probably want here is to try out if the compiler is able to build
executables for the target architecture and not build the helper otherwise,
at least when compile-testing. No idea how to do that though.

Link: http://www.kernel.org/pub/tools/crosstool/
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/bpfilter/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bpfilter/Kconfig b/net/bpfilter/Kconfig
index 60725c5f79db..61cc4fcbb4d0 100644
--- a/net/bpfilter/Kconfig
+++ b/net/bpfilter/Kconfig
@@ -9,6 +9,7 @@ menuconfig BPFILTER
 if BPFILTER
 config BPFILTER_UMH
 	tristate "bpfilter kernel module with user mode helper"
+	depends on X86 # actually depends on native builds
 	default m
 	help
 	  This builds bpfilter kernel module with embedded user mode helper
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH] brcmfmac: stop watchdog before detach and free everything
From: Michael Nazzareno Trimarchi @ 2018-05-28 15:29 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, David S. Miller, Pieter-Paul Giesberts,
	Ian Molton, open list:TI WILINK WIRELES...,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	brcm80211-dev-list, netdev, LKML
In-Reply-To: <CAHp75VcC_cisfVTKU0bVm3+=NFLt3P0EkXh3uC4hWCA7y_7G3w@mail.gmail.com>

Hi

On Mon, May 28, 2018 at 5:25 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Mon, May 28, 2018 at 12:54 PM, Michael Nazzareno Trimarchi
> <michael@amarulasolutions.com> wrote:
>> Hi Arend
>>
>> On Mon, May 28, 2018 at 11:51 AM, Arend van Spriel
>> <arend.vanspriel@broadcom.com> wrote:
>>> On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
>>>>
>>>> Watchdog need to be stopped in brcmf_sdio_remove to avoid
>>>> i
>>>> The system is going down NOW!
>>>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual
>>>> address 000002f8
>>>> Sent SIGTERM to all processes
>>>> [ 1348.121412] Mem abort info:
>>>> [ 1348.126962]   ESR = 0x96000004
>>>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>>>> [ 1348.135948]   SET = 0, FnV = 0
>>>> [ 1348.138997]   EA = 0, S1PTW = 0
>>>> [ 1348.142154] Data abort info:
>>>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>>>> [ 1348.148884]   CM = 0, WnR = 0
>>>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>>>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>> [ 1348.168927] Modules linked in: ipv6
>>>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>>>> 4.17.0-rc5-next-20180517 #18
>>>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>>>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>>>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>>>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>>>
>>>
>>> Hi Michael,
>>>
>>> Thanks for the patch. In normal scenario the callstack looks like this:
>>>
>>> brcmf_sdio_remove()
>>>         -> brcmf_detach()
>>>                 -> brcmf_bus_stop()
>>>                         -> brcmf_sdio_bus_stop()
>>>
>>> In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario did
>>> you encounter this null pointer deref?
>>
>> Is this happen even when there is not wifi firmware?
>> boot without any firmware in the filesystem and then trigger a reboot
>
> Something like the above I had noticed for a long (couple of kernel
> releases?) time, but wasn't a big priority to me.
> Though, I can test this on my side.
>
> P.S. I think rmmod or echo > unbind will trigger that as well.
>

Right now the module is compiled in the kernel. I can dig down tonight
on this if needed

Michael

> --
> With Best Regards,
> Andy Shevchenko



-- 
| Michael Nazzareno Trimarchi                     Amarula Solutions BV |
| COO  -  Founder                                      Cruquiuskade 47 |
| +31(0)851119172                                 Amsterdam 1018 AM NL |
|                  [`as] http://www.amarulasolutions.com               |

^ permalink raw reply

* Re: [PATCH] brcmfmac: stop watchdog before detach and free everything
From: Andy Shevchenko @ 2018-05-28 15:25 UTC (permalink / raw)
  To: Michael Nazzareno Trimarchi
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, David S. Miller, Pieter-Paul Giesberts,
	Ian Molton, open list:TI WILINK WIRELES...,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	brcm80211-dev-list, netdev, LKML
In-Reply-To: <CAOf5uw=mPOgngZW8UcPfvtxH+5nCR-36ovjYPH7AvO99y1M_xA@mail.gmail.com>

On Mon, May 28, 2018 at 12:54 PM, Michael Nazzareno Trimarchi
<michael@amarulasolutions.com> wrote:
> Hi Arend
>
> On Mon, May 28, 2018 at 11:51 AM, Arend van Spriel
> <arend.vanspriel@broadcom.com> wrote:
>> On 5/28/2018 9:50 AM, Michael Trimarchi wrote:
>>>
>>> Watchdog need to be stopped in brcmf_sdio_remove to avoid
>>> i
>>> The system is going down NOW!
>>> [ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual
>>> address 000002f8
>>> Sent SIGTERM to all processes
>>> [ 1348.121412] Mem abort info:
>>> [ 1348.126962]   ESR = 0x96000004
>>> [ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
>>> [ 1348.135948]   SET = 0, FnV = 0
>>> [ 1348.138997]   EA = 0, S1PTW = 0
>>> [ 1348.142154] Data abort info:
>>> [ 1348.145045]   ISV = 0, ISS = 0x00000004
>>> [ 1348.148884]   CM = 0, WnR = 0
>>> [ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
>>> [ 1348.158475] [00000000000002f8] pgd=0000000000000000
>>> [ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>> [ 1348.168927] Modules linked in: ipv6
>>> [ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted
>>> 4.17.0-rc5-next-20180517 #18
>>> [ 1348.180757] Hardware name: Amarula A64-Relic (DT)
>>> [ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
>>> [ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
>>> [ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
>>
>>
>> Hi Michael,
>>
>> Thanks for the patch. In normal scenario the callstack looks like this:
>>
>> brcmf_sdio_remove()
>>         -> brcmf_detach()
>>                 -> brcmf_bus_stop()
>>                         -> brcmf_sdio_bus_stop()
>>
>> In brcmf_sdio_bus_stop() the watchdog is terminated. So in what scenario did
>> you encounter this null pointer deref?
>
> Is this happen even when there is not wifi firmware?
> boot without any firmware in the filesystem and then trigger a reboot

Something like the above I had noticed for a long (couple of kernel
releases?) time, but wasn't a big priority to me.
Though, I can test this on my side.

P.S. I think rmmod or echo > unbind will trigger that as well.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v3 10/11] net: sched: atomically check-allocate action
From: Jiri Pirko @ 2018-05-28 15:24 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel, kliteyn
In-Reply-To: <1527455849-22327-11-git-send-email-vladbu@mellanox.com>

Sun, May 27, 2018 at 11:17:28PM CEST, vladbu@mellanox.com wrote:
>Implement function that atomically checks if action exists and either takes
>reference to it, or allocates idr slot for action index to prevent
>concurrent allocations of actions with same index. Use EBUSY error pointer
>to indicate that idr slot is reserved.
>
>Implement cleanup helper function that removes temporary error pointer from
>idr. (in case of error between idr allocation and insertion of newly
>created action to specified index)
>
>Refactor all action init functions to insert new action to idr using this
>API.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox