Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net v2] rtnetlink: Disallow FDB configuration for non-Ethernet device
From: David Miller @ 2018-10-30  3:53 UTC (permalink / raw)
  To: idosch; +Cc: netdev, vyasevich, dsahern
In-Reply-To: <20181029203622.20608-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 29 Oct 2018 20:36:43 +0000

> When an FDB entry is configured, the address is validated to have the
> length of an Ethernet address, but the device for which the address is
> configured can be of any type.
> 
> The above can result in the use of uninitialized memory when the address
> is later compared against existing addresses since 'dev->addr_len' is
> used and it may be greater than ETH_ALEN, as with ip6tnl devices.
> 
> Fix this by making sure that FDB entries are only configured for
> Ethernet devices.
 ...
> v2:
> * Make error message more specific (David)
> 
> Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
> Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: David Ahern <dsahern@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet @ 2018-10-30  3:52 UTC (permalink / raw)
  To: Cong Wang, Paweł Staszewski
  Cc: Linux Kernel Network Developers, Dimitris Michailidis
In-Reply-To: <1e954663-ed05-4f33-4384-db880844f9d1@gmail.com>



On 10/29/2018 07:53 PM, Eric Dumazet wrote:
> 
> 
> On 10/29/2018 07:27 PM, Cong Wang wrote:
>> Hi,
>>
>> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>>>
>>> Sorry not complete - followed by hw csum:
>>>
>>> [  342.190831] vlan1490: hw csum failure
>>> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
>>> [  342.190836] Call Trace:
>>> [  342.190839]  <IRQ>
>>> [  342.190849]  dump_stack+0x46/0x5b
>>> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
>>> [  342.190859]  tcp_v4_rcv+0xef/0x960
>>> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
>>> [  342.190866]  ip_local_deliver+0x5e/0xe0
>>> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
>>> [  342.190870]  ip_rcv+0x41/0xc0
>>> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
>>> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
>>> [  342.190879]  napi_gro_receive+0xb7/0xe0
>>> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
>>> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
>>> [  342.190888]  mlx5e_napi_poll+0xab/0xc90
>>
>>
>> We got exactly the same backtrace in our data center. However,
>> it is not easy for us to reproduce it, do you have any clue to reproduce it?
>>
>> If you do, try to tcpdump the packets triggering this warning, it could
>> be useful for debugging.
>>
>> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
>> occurs. We tried to revert the offending commit 88078d98d1bb, it
>> disappears. So it is likely that commit 88078d98d1bb introduces
>> more troubles than the one fixed by d55bef5059dd057bd.
>>
> 
> Or this could be that mlx5 driver is buggy when dealing with VLAN tags.
> 
> It both uses vlan_tci (hardware vlan offload) in skb _and_ this piece of code in mlx5e_handle_csum() 
> 
> 		if (network_depth > ETH_HLEN)
> 			/* CQE csum is calculated from the IP header and does
> 			 * not cover VLAN headers (if present). This will add
> 			 * the checksum manually.
> 			 */
> 			skb->csum = csum_partial(skb->data + ETH_HLEN,
> 						 network_depth - ETH_HLEN,
> 						 skb->csum);
> 
> 
> That seems strange to me, because skb_vlan_untag() will not adjust skb->csum in this case.
> 

Bug might be in NETIF_F_RXFCS mlx5 handling btw...

Code does :

if (unlikely(netdev->features & NETIF_F_RXFCS))
     skb->csum = csum_add(skb->csum,
                          (__force __wsum)mlx5e_get_fcs(skb));

But Dimitris told us that we need to take into account if FCS starts at odd or even offset.

->
if (unlikely(netdev->features & NETIF_F_RXFCS))
     skb->csum = csum_block_add(skb->csum,
                                (__force __wsum)mlx5e_get_fcs(skb),
                                skb->len);

^ permalink raw reply

* Re: [PATCH net] sctp: check policy more carefully when getting pr status
From: David Miller @ 2018-10-30  3:51 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <cadb9859cc2015b176433d4ac7bb1bc228f97574.1540825991.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 29 Oct 2018 23:13:11 +0800

> When getting pr_assocstatus and pr_streamstatus by sctp_getsockopt,
> it doesn't correctly process the case when policy is set with
> SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK. It even causes a
> slab-out-of-bounds in sctp_getsockopt_pr_streamstatus().
> 
> This patch fixes it by return -EINVAL for this case.
> 
> Fixes: 0ac1077e3a54 ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL")
> Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com
> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer
From: David Miller @ 2018-10-30  3:51 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <f3d3a876b124a96a8e44913abfa552a278cd758c.1540825829.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 29 Oct 2018 23:10:29 +0800

> If a transport is removed by asconf but there still are some chunks with
> this transport queuing on out_chunk_list, later an use-after-free issue
> will be caused when accessing this transport from these chunks in
> sctp_outq_flush().
> 
> This is an old bug, we fix it by clearing the transport of these chunks
> in out_chunk_list when removing a transport in sctp_assoc_rm_peer().
> 
> Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/2] mlxsw: Couple of fixes
From: David Miller @ 2018-10-30  3:48 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, petrm, shalomt, alexpe, mlxsw
In-Reply-To: <20181029142517.1858-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 29 Oct 2018 14:26:13 +0000

> First patch makes sure mlxsw does not ignore user requests to delete FDB
> entries that were learned by the device.
> 
> Second patch fixes a use-after-free that can be triggered by requesting
> a reload via devlink when the previous reload failed.
> 
> Please consider both patches for stable. They apply cleanly to both
> 4.18.y and 4.19.y.

Applied and queued up for -stable, thanks.

^ permalink raw reply

* [PATCH 2/2] net: drop a space before tabs
From: Bo YU @ 2018-10-30  3:42 UTC (permalink / raw)
  To: davem; +Cc: yuzibode, Bo YU, netdev
In-Reply-To: <cover.1540868768.git.tsu.yubo@gmail.com>

Fix a warning from checkpatch.pl:'please no space before tabs'
in include/net/af_unix.h

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
---
 include/net/af_unix.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index d53aea859a76..ddbba838d048 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -40,7 +40,7 @@ struct unix_skb_parms {
 	u32			consumed;
 } __randomize_layout;
 
-#define UNIXCB(skb) 	(*(struct unix_skb_parms *)&((skb)->cb))
+#define UNIXCB(skb)	(*(struct unix_skb_parms *)&((skb)->cb))
 
 #define unix_state_lock(s)	spin_lock(&unix_sk(s)->lock)
 #define unix_state_unlock(s)	spin_unlock(&unix_sk(s)->lock)
-- 
2.11.0

^ permalink raw reply related

* [PATCH 1/2] net: add an identifier name for 'struct sock *'
From: Bo YU @ 2018-10-30  3:42 UTC (permalink / raw)
  To: davem; +Cc: yuzibode, Bo YU, netdev
In-Reply-To: <cover.1540868768.git.tsu.yubo@gmail.com>

Fix a warning from checkpatch:
function definition argument 'struct sock *' should also have an
identifier name in include/net/af_unix.h.

Signed-off-by: Bo YU <tsu.yubo@gmail.com>
---
 include/net/af_unix.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index e2695c4bf358..d53aea859a76 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -13,7 +13,7 @@ void unix_notinflight(struct user_struct *user, struct file *fp);
 void unix_gc(void);
 void wait_for_unix_gc(void);
 struct sock *unix_get_socket(struct file *filp);
-struct sock *unix_peer_get(struct sock *);
+struct sock *unix_peer_get(struct sock *sk);
 
 #define UNIX_HASH_SIZE	256
 #define UNIX_HASH_BITS	8
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net] Documentation: ip-sysctl.txt: Document tcp_fwmark_accept
From: David Miller @ 2018-10-30  3:42 UTC (permalink / raw)
  To: lorenzo; +Cc: netdev, zenczykowski
In-Reply-To: <20181029003029.60969-1-lorenzo@google.com>

From: Lorenzo Colitti <lorenzo@google.com>
Date: Mon, 29 Oct 2018 09:30:29 +0900

> This patch documents the tcp_fwmark_accept sysctl that was
> added in 3.15.
> 
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

Applied, thanks Lorenzo.

^ permalink raw reply

* Re: [PATCH v2] bonding: fix length of actor system
From: David Miller @ 2018-10-30  3:41 UTC (permalink / raw)
  To: tobias.jungel; +Cc: jay.vosburgh, vfalico, andy, edumazet, netdev
In-Reply-To: <9e78727aaee4a775dbe3a4a9942a733a14f8a608.camel@gmail.com>

From: Tobias Jungel <tobias.jungel@gmail.com>
Date: Sun, 28 Oct 2018 12:54:10 +0100

> The attribute IFLA_BOND_AD_ACTOR_SYSTEM is sent to user space having the
> length of sizeof(bond->params.ad_actor_system) which is 8 byte. This
> patch aligns the length to ETH_ALEN to have the same MAC address exposed
> as using sysfs.
> 
> Fixes: f87fda00b6ed2 ("bonding: prevent out of bound accesses")
> 

Please do not put empty lines between "Fixes:" and other tags, keep
all the tags together in one uninterrupted group.

> Signed-off-by: Tobias Jungel <tobias.jungel@gmail.com>

Applied and queued up for -stable, thank you.

^ permalink raw reply

* [PATCH 0/2] net: fix warnings in include/net/af_unix.h
From: Bo YU @ 2018-10-30  3:34 UTC (permalink / raw)
  To: yuzibode; +Cc: Bo YU, netdev

Fix two warnings from checkpatch.pl.

Bo YU (2):
  net: add an identifier name for 'struct sock *'
  net: drop a space before tabs

 include/net/af_unix.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

^ permalink raw reply

* Re: [PATCH net v5] net/ipv6: Add anycast addresses to a global hashtable
From: David Miller @ 2018-10-30  3:32 UTC (permalink / raw)
  To: 0xeffeff; +Cc: netdev, kuznet, yoshfuji
In-Reply-To: <20181028015159.10636-1-0xeffeff@gmail.com>

From: Jeff Barnhill <0xeffeff@gmail.com>
Date: Sun, 28 Oct 2018 01:51:59 +0000

> +struct ipv6_ac_addrlist {
> +	struct in6_addr		acal_addr;
> +	possible_net_t		acal_pnet;
> +	refcount_t		acal_users;
> +	struct hlist_node	acal_lst; /* inet6_acaddr_lst */
> +	struct rcu_head		rcu;
> +};

Please just add the hlist to ifcaddr6 instead of duplicating so much
information and reference counters here.

This seems to waste a lot of memory unnecessary and add lots of
unnecessary object allocate/setup/destroy logic.

^ permalink raw reply

* Re: [net 1/2] net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
From: Eric Dumazet @ 2018-10-30  3:29 UTC (permalink / raw)
  To: Saeed Mahameed, David S. Miller; +Cc: netdev, Eran Ben Elisha
In-Reply-To: <20180524215313.7605-2-saeedm@mellanox.com>



On 05/24/2018 02:53 PM, Saeed Mahameed wrote:
> From: Eran Ben Elisha <eranbe@mellanox.com>
> 
> When RXFCS feature is enabled, the HW do not strip the FCS data,
> however it is not present in the checksum calculated by the HW.
> 
> Fix that by manually calculating the FCS checksum and adding it to the SKB
> checksum field.
> 
> Add helper function to find the FCS data for all SKB forms (linear,
> one fragment or more).
> 
> Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag")
> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 42 +++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 176645762e49..1ff0b0e93804 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -615,6 +615,45 @@ static inline bool is_last_ethertype_ip(struct sk_buff *skb, int *network_depth)
>  	return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6));
>  }
>  
> +static __be32 mlx5e_get_fcs(struct sk_buff *skb)
> +{
> +	int last_frag_sz, bytes_in_prev, nr_frags;
> +	u8 *fcs_p1, *fcs_p2;
> +	skb_frag_t *last_frag;
> +	__be32 fcs_bytes;
> +
> +	if (!skb_is_nonlinear(skb))
> +		return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN);
> +
> +	nr_frags = skb_shinfo(skb)->nr_frags;
> +	last_frag = &skb_shinfo(skb)->frags[nr_frags - 1];
> +	last_frag_sz = skb_frag_size(last_frag);
> +
> +	/* If all FCS data is in last frag */
> +	if (last_frag_sz >= ETH_FCS_LEN)
> +		return *(__be32 *)(skb_frag_address(last_frag) +
> +				   last_frag_sz - ETH_FCS_LEN);
> +
> +	fcs_p2 = (u8 *)skb_frag_address(last_frag);
> +	bytes_in_prev = ETH_FCS_LEN - last_frag_sz;
> +
> +	/* Find where the other part of the FCS is - Linear or another frag */
> +	if (nr_frags == 1) {
> +		fcs_p1 = skb_tail_pointer(skb);
> +	} else {
> +		skb_frag_t *prev_frag = &skb_shinfo(skb)->frags[nr_frags - 2];
> +
> +		fcs_p1 = skb_frag_address(prev_frag) +
> +			    skb_frag_size(prev_frag);
> +	}
> +	fcs_p1 -= bytes_in_prev;
> +
> +	memcpy(&fcs_bytes, fcs_p1, bytes_in_prev);
> +	memcpy(((u8 *)&fcs_bytes) + bytes_in_prev, fcs_p2, last_frag_sz);
> +
> +	return fcs_bytes;
> +}
>

Oh well, this is so ugly, why isn't skb_header_pointer() used ?

Untested patch :

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 94224c22ecc310a87b6715051e335446f29bec03..11129e3a50d6f3b9a49861a99023541720bbcbe4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -713,43 +713,12 @@ static inline void mlx5e_enable_ecn(struct mlx5e_rq *rq, struct sk_buff *skb)
        rq->stats->ecn_mark += !!rc;
 }
 
-static __be32 mlx5e_get_fcs(struct sk_buff *skb)
+static __be32 mlx5e_get_fcs(const struct sk_buff *skb)
 {
-       int last_frag_sz, bytes_in_prev, nr_frags;
-       u8 *fcs_p1, *fcs_p2;
-       skb_frag_t *last_frag;
        __be32 fcs_bytes;
 
-       if (!skb_is_nonlinear(skb))
-               return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN);
-
-       nr_frags = skb_shinfo(skb)->nr_frags;
-       last_frag = &skb_shinfo(skb)->frags[nr_frags - 1];
-       last_frag_sz = skb_frag_size(last_frag);
-
-       /* If all FCS data is in last frag */
-       if (last_frag_sz >= ETH_FCS_LEN)
-               return *(__be32 *)(skb_frag_address(last_frag) +
-                                  last_frag_sz - ETH_FCS_LEN);
-
-       fcs_p2 = (u8 *)skb_frag_address(last_frag);
-       bytes_in_prev = ETH_FCS_LEN - last_frag_sz;
-
-       /* Find where the other part of the FCS is - Linear or another frag */
-       if (nr_frags == 1) {
-               fcs_p1 = skb_tail_pointer(skb);
-       } else {
-               skb_frag_t *prev_frag = &skb_shinfo(skb)->frags[nr_frags - 2];
-
-               fcs_p1 = skb_frag_address(prev_frag) +
-                           skb_frag_size(prev_frag);
-       }
-       fcs_p1 -= bytes_in_prev;
-
-       memcpy(&fcs_bytes, fcs_p1, bytes_in_prev);
-       memcpy(((u8 *)&fcs_bytes) + bytes_in_prev, fcs_p2, last_frag_sz);
-
-       return fcs_bytes;
+       return *(__be32 *)skb_header_pointer(skb, skb->len - ETH_FCS_LEN,
+                                            ETH_FCS_LEN, &fcs_bytes);
 }
 

^ permalink raw reply related

* Re: [PATCH net] ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12
From: David Miller @ 2018-10-30  3:26 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev, daniel, hannes, xiyou.wangcong
In-Reply-To: <1540524635-18667-1-git-send-email-liuhangbin@gmail.com>

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Fri, 26 Oct 2018 11:30:35 +0800

> Similiar with ipv6 mcast commit 89225d1ce6af3 ("net: ipv6: mld: fix v1/v2
> switchback timeout to rfc3810, 9.12.")
> 
> i) RFC3376 8.12. Older Version Querier Present Timeout says:
> 
>    The Older Version Querier Interval is the time-out for transitioning
>    a host back to IGMPv3 mode once an older version query is heard.
>    When an older version query is received, hosts set their Older
>    Version Querier Present Timer to Older Version Querier Interval.
> 
>    This value MUST be ((the Robustness Variable) times (the Query
>    Interval in the last Query received)) plus (one Query Response
>    Interval).
> 
> Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT.
> Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response
> Interval) in struct in_device.
> 
> Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri.
> We need update these values when receive IGMPv3 queries.
> 
> Reported-by: Ying Xu <yinxu@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] ipv6/mcast: update mc_qrv when join new group
From: David Miller @ 2018-10-30  3:23 UTC (permalink / raw)
  To: liuhangbin; +Cc: netdev
In-Reply-To: <1540521054-17825-1-git-send-email-liuhangbin@gmail.com>

From: Hangbin Liu <liuhangbin@gmail.com>
Date: Fri, 26 Oct 2018 10:30:54 +0800

> Currently we only set mc_qrv to sysctl_mld_qrv when interface up. If we
> change sysctl_mld_qrv after interface up, it will has no effect.
> 
> Fix it by assigning latest sysctl_mld_qrv to idev mc_qrv when join new group.
> 
> Reported-by: Ying Xu <yinxu@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Why isn't mld_update_qrv() taking care of this?

At a minimum, you must make your change take MLD_QRV_DEFAULT etc.
into account like mld_update_qrv() does.

^ permalink raw reply

* Re: [PATCH 2/2] net: xilinx_emaclite: recheck condition after timeout in mdio_wait()
From: Andrew Lunn @ 2018-10-30 12:10 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: Anirudha Sarangi, John Linn, David S. Miller, Michal Simek,
	Radhey Shyam Pandey, YueHaibing, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20181030093139.10226-3-kurt@linutronix.de>

On Tue, Oct 30, 2018 at 10:31:39AM +0100, Kurt Kanzenbach wrote:
> The function could report a false positive if it gets preempted between reading
> the XEL_MDIOCTRL_OFFSET register and checking for the timeout.  In such a case,
> the condition has to be rechecked to avoid false positives.
> 
> Therefore, check for expected condition even after the timeout occurred.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> ---
>  drivers/net/ethernet/xilinx/xilinx_emaclite.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> index 639e3e99af46..957d03085bd0 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> @@ -714,19 +714,29 @@ static irqreturn_t xemaclite_interrupt(int irq, void *dev_id)
>  static int xemaclite_mdio_wait(struct net_local *lp)
>  {
>  	unsigned long end = jiffies + 2;
> +	u32 val;
>  
>  	/* wait for the MDIO interface to not be busy or timeout
>  	 * after some time.
>  	 */
> -	while (xemaclite_readl(lp->base_addr + XEL_MDIOCTRL_OFFSET) &
> -			XEL_MDIOCTRL_MDIOSTS_MASK) {
> +	while (1) {
> +		val = xemaclite_readl(lp->base_addr + XEL_MDIOCTRL_OFFSET);

Hi Kurt

It looks like readx_poll_timeout() should work here.

   Andrew

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Eric Dumazet @ 2018-10-30  3:08 UTC (permalink / raw)
  To: Cong Wang, Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpXqJyK4zKiDNBMnX4GRa2Wt=9iAYvXORzs0wMkk+1=Qvg@mail.gmail.com>



On 10/29/2018 07:41 PM, Cong Wang wrote:
> On Mon, Oct 29, 2018 at 7:25 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 10/29/2018 07:21 PM, Cong Wang wrote:
>>> On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>
>>>> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?
>>>
>>> For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?
>>>
>>> If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
>>> the end result may not be simpler.
>>
>> I meant to reinstate what was there before my patch in this error case
>>
>>        if (skb->ip_summed == CHECKSUM_COMPLETE)
>>                skb->ip_summed = CHECKSUM_NONE;
>>
>> That would only be run in error (quite unlikely) path, instead of saving old_csum in all cases.
> 
> I know your point, however, I am not sure that is a desired behavior.
> 
> On failure, I think the whole skb should be restored to its previous state
> before entering this function, changing it to CHECKSUM_NONE on failure
> is inconsistent with success case.
> 

Before my patch, we were changing skb->ip_summed to CHECKSUM_NONE, 
so why suddenly we need to be consistent ?

In any case, ip_check_defrag() should really drop this skb, as for other allocation
failures (like skb_share_check()), if really we want consistency.

^ permalink raw reply

* Re: [PATCH 1/2] net:stmmac: dwmac-mediatek: add support for mt2712
From: Andrew Lunn @ 2018-10-30 11:56 UTC (permalink / raw)
  To: biao huang
  Cc: Corentin Labbe, davem, robh+dt, mark.rutland, devicetree,
	nelson.chang, netdev, sean.wang, liguo.zhang, linux-kernel,
	matthias.bgg, joabreu, linux-mediatek, honghui.zhang, yt.shen,
	linux-arm-kernel
In-Reply-To: <1540883812.26982.15.camel@mhfsdcap03>

> > > +	plat_dat->interface = priv_plat->phy_mode;
> > > +	/* clk_csr_i = 250-300MHz & MDC = clk_csr_i/124 */
> > > +	plat_dat->clk_csr = 5;
> > > +	plat_dat->has_gmac4 = 1;
> > > +	plat_dat->has_gmac = 0;
> > > +	plat_dat->pmt = 0;
> > > +	plat_dat->maxmtu = 1500;
> > 
> > ETH_DATA_LEN ?
> how about getting maxmtu from device tree rather than assignment here?

Does it vary between devices? I expect it is a constant, determined by
the DMA hardware design. So no need to make it configurabale.

    Andrew

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet @ 2018-10-30  2:53 UTC (permalink / raw)
  To: Cong Wang, Paweł Staszewski; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAM_iQpWWZqur_dLC_eM7iX7BFDncrYFSXHohwxCt2xMaAjRpMQ@mail.gmail.com>



On 10/29/2018 07:27 PM, Cong Wang wrote:
> Hi,
> 
> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>>
>> Sorry not complete - followed by hw csum:
>>
>> [  342.190831] vlan1490: hw csum failure
>> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
>> [  342.190836] Call Trace:
>> [  342.190839]  <IRQ>
>> [  342.190849]  dump_stack+0x46/0x5b
>> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
>> [  342.190859]  tcp_v4_rcv+0xef/0x960
>> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
>> [  342.190866]  ip_local_deliver+0x5e/0xe0
>> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
>> [  342.190870]  ip_rcv+0x41/0xc0
>> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
>> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
>> [  342.190879]  napi_gro_receive+0xb7/0xe0
>> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
>> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
>> [  342.190888]  mlx5e_napi_poll+0xab/0xc90
> 
> 
> We got exactly the same backtrace in our data center. However,
> it is not easy for us to reproduce it, do you have any clue to reproduce it?
> 
> If you do, try to tcpdump the packets triggering this warning, it could
> be useful for debugging.
> 
> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
> occurs. We tried to revert the offending commit 88078d98d1bb, it
> disappears. So it is likely that commit 88078d98d1bb introduces
> more troubles than the one fixed by d55bef5059dd057bd.
> 

Or this could be that mlx5 driver is buggy when dealing with VLAN tags.

It both uses vlan_tci (hardware vlan offload) in skb _and_ this piece of code in mlx5e_handle_csum() 

		if (network_depth > ETH_HLEN)
			/* CQE csum is calculated from the IP header and does
			 * not cover VLAN headers (if present). This will add
			 * the checksum manually.
			 */
			skb->csum = csum_partial(skb->data + ETH_HLEN,
						 network_depth - ETH_HLEN,
						 skb->csum);


That seems strange to me, because skb_vlan_untag() will not adjust skb->csum in this case.

^ permalink raw reply

* Re: 4.19 - tons of hw csum failure errors
From: Cong Wang @ 2018-10-30  2:51 UTC (permalink / raw)
  To: nikola.ciprich; +Cc: Linux Kernel Network Developers, nik, Eric Dumazet
In-Reply-To: <20181027191837.GA4283@localhost.localdomain>

(Cc'ing Eric)

On Sat, Oct 27, 2018 at 12:47 PM Nikola Ciprich
<nikola.ciprich@linuxbox.cz> wrote:
>
> Hi,
>
> just wanted to report, thet after switching to 4.19 (fro 4.14.x, so maybe
> the problem appeared somewhere between), I'm getting tons of similar
> messages:
>
> Oct 27 09:06:27 xxx kernel: br501: hw csum failure
> Oct 27 09:06:27 xxx kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E     4.19.0lb7.00_01_PRE04 #1
> Oct 27 09:06:27 xxx kernel: Hardware name: Supermicro Super Server/X11DDW-NT, BIOS 2.0b 03/07/2018
> Oct 27 09:06:27 xxx kernel: Call Trace:
> Oct 27 09:06:27 xxx kernel:  <IRQ>
> Oct 27 09:06:27 xxx kernel:  dump_stack+0x5a/0x73
> Oct 27 09:06:27 xxx kernel:  __skb_checksum_complete+0xba/0xc0
> Oct 27 09:06:27 xxx kernel:  tcp_error+0x108/0x180 [nf_conntrack]
> Oct 27 09:06:27 xxx kernel:  nf_conntrack_in+0xd2/0x4b0 [nf_conntrack]
> Oct 27 09:06:27 xxx kernel:  ? csum_partial+0xd/0x20
> Oct 27 09:06:27 xxx kernel:  nf_hook_slow+0x3d/0xb0
> Oct 27 09:06:27 xxx kernel:  ip_rcv+0xb5/0xd0
> Oct 27 09:06:27 xxx kernel:  ? ip_rcv_finish_core.isra.12+0x370/0x370
> Oct 27 09:06:27 xxx kernel:  __netif_receive_skb_one_core+0x52/0x70
> Oct 27 09:06:27 xxx kernel:  process_backlog+0xa3/0x150
> Oct 27 09:06:27 xxx kernel:  net_rx_action+0x2af/0x3f0
> Oct 27 09:06:27 xxx kernel:  __do_softirq+0xd1/0x28c
> Oct 27 09:06:27 xxx kernel:  irq_exit+0xde/0xf0
> Oct 27 09:06:27 xxx kernel:  do_IRQ+0x54/0xe0
> Oct 27 09:06:27 xxx kernel:  common_interrupt+0xf/0xf
> Oct 27 09:06:27 xxx kernel:  </IRQ>


We got the same warning (but a different backtrace) with mlx5 driver.

It seems you are using a different driver. Do you have any clue to reproduce
it?

If you do, try to tcpdump the packets triggering this warning, it could
be useful for debugging.

As I explained in other thread, it is likely that commit 88078d98d1bb
introduces more troubles than the one fixed by d55bef5059dd057bd.
You can try to play with these two commits to see if you get the same
conclusion.

BTW, the offending commit has been backported to 4.14 too. ;)

Thanks!

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-10-30  2:43 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Linux Kernel Network Developers, Eric Dumazet, dmichail
In-Reply-To: <CAM_iQpWWZqur_dLC_eM7iX7BFDncrYFSXHohwxCt2xMaAjRpMQ@mail.gmail.com>

(Adding Eric and Dimitris into Cc)

On Mon, Oct 29, 2018 at 7:27 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi,
>
> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
> >
> > Sorry not complete - followed by hw csum:
> >
> > [  342.190831] vlan1490: hw csum failure
> > [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
> > [  342.190836] Call Trace:
> > [  342.190839]  <IRQ>
> > [  342.190849]  dump_stack+0x46/0x5b
> > [  342.190856]  __skb_checksum_complete+0x9a/0xa0
> > [  342.190859]  tcp_v4_rcv+0xef/0x960
> > [  342.190864]  ip_local_deliver_finish+0x49/0xd0
> > [  342.190866]  ip_local_deliver+0x5e/0xe0
> > [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
> > [  342.190870]  ip_rcv+0x41/0xc0
> > [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
> > [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
> > [  342.190879]  napi_gro_receive+0xb7/0xe0
> > [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
> > [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
> > [  342.190888]  mlx5e_napi_poll+0xab/0xc90
>
>
> We got exactly the same backtrace in our data center. However,
> it is not easy for us to reproduce it, do you have any clue to reproduce it?
>
> If you do, try to tcpdump the packets triggering this warning, it could
> be useful for debugging.
>
> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
> occurs. We tried to revert the offending commit 88078d98d1bb, it
> disappears. So it is likely that commit 88078d98d1bb introduces
> more troubles than the one fixed by d55bef5059dd057bd.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Cong Wang @ 2018-10-30  2:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <5466e637-37d2-a11e-c30c-fc17a4dbeaf1@gmail.com>

On Mon, Oct 29, 2018 at 7:25 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/29/2018 07:21 PM, Cong Wang wrote:
> > On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?
> >
> > For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?
> >
> > If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
> > the end result may not be simpler.
>
> I meant to reinstate what was there before my patch in this error case
>
>        if (skb->ip_summed == CHECKSUM_COMPLETE)
>                skb->ip_summed = CHECKSUM_NONE;
>
> That would only be run in error (quite unlikely) path, instead of saving old_csum in all cases.

I know your point, however, I am not sure that is a desired behavior.

On failure, I think the whole skb should be restored to its previous state
before entering this function, changing it to CHECKSUM_NONE on failure
is inconsistent with success case.

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-10-30  2:27 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Kernel Network Developers
In-Reply-To: <ec5c3923-a268-aa2c-fea1-350f8bce65c2@itcare.pl>

Hi,

On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>
> Sorry not complete - followed by hw csum:
>
> [  342.190831] vlan1490: hw csum failure
> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
> [  342.190836] Call Trace:
> [  342.190839]  <IRQ>
> [  342.190849]  dump_stack+0x46/0x5b
> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
> [  342.190859]  tcp_v4_rcv+0xef/0x960
> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
> [  342.190866]  ip_local_deliver+0x5e/0xe0
> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
> [  342.190870]  ip_rcv+0x41/0xc0
> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
> [  342.190879]  napi_gro_receive+0xb7/0xe0
> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
> [  342.190888]  mlx5e_napi_poll+0xab/0xc90


We got exactly the same backtrace in our data center. However,
it is not easy for us to reproduce it, do you have any clue to reproduce it?

If you do, try to tcpdump the packets triggering this warning, it could
be useful for debugging.

Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
occurs. We tried to revert the offending commit 88078d98d1bb, it
disappears. So it is likely that commit 88078d98d1bb introduces
more troubles than the one fixed by d55bef5059dd057bd.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Eric Dumazet @ 2018-10-30  2:25 UTC (permalink / raw)
  To: Cong Wang, Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpW8pT7yZ1pTmhNTh1+8yVhd+=mYLCNQ-cvCqPf127qRJA@mail.gmail.com>



On 10/29/2018 07:21 PM, Cong Wang wrote:
> On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?
> 
> For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?
> 
> If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
> the end result may not be simpler.

I meant to reinstate what was there before my patch in this error case

       if (skb->ip_summed == CHECKSUM_COMPLETE)
               skb->ip_summed = CHECKSUM_NONE;

That would only be run in error (quite unlikely) path, instead of saving old_csum in all cases.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Cong Wang @ 2018-10-30  2:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <ae7cf8de-af0c-4f06-1602-4f03144fbb8f@gmail.com>

On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?

For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?

If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
the end result may not be simpler.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Eric Dumazet @ 2018-10-30  2:14 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Eric Dumazet
In-Reply-To: <20181030003515.12075-1-xiyou.wangcong@gmail.com>



On 10/29/2018 05:35 PM, Cong Wang wrote:
> Most callers of pskb_trim_rcsum() simply drops the skb when
> it fails, however, ip_check_defrag() still continues to pass
> the skb up to stack. In that case, we should restore its previous
> csum if __pskb_trim() fails.
> 
> Found this during code review.
> 
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/core/skbuff.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 946de0e24c87..5decd6e6d2b6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1843,6 +1843,9 @@ EXPORT_SYMBOL(___pskb_trim);
>   */
>  int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
>  {
> +	__wsum old_csum = skb->csum;
> +	int ret;
> +
>  	if (skb->ip_summed == CHECKSUM_COMPLETE) {
>  		int delta = skb->len - len;
>  
> @@ -1850,7 +1853,10 @@ int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
>  					   skb_checksum(skb, len, delta, 0),
>  					   len);
>  	}
> -	return __pskb_trim(skb, len);
> +	ret = __pskb_trim(skb, len);
> +	if (unlikely(ret))
> +		skb->csum = old_csum;

Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?

> +	return ret;
>  }
>  EXPORT_SYMBOL(pskb_trim_rcsum_slow);
>  
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox