Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] MAINTAINERS: Direct networking documentation changes to netdev
From: David Miller @ 2018-04-19 17:42 UTC (permalink / raw)
  To: corbet; +Cc: netdev
In-Reply-To: <20180418101413.46fd1212@lwn.net>

From: Jonathan Corbet <corbet@lwn.net>
Date: Wed, 18 Apr 2018 10:14:13 -0600

> Networking docs changes go through the networking tree, so patch the
> MAINTAINERS file to direct authors to the right place.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Applied, thanks Jonathan.

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: David Miller @ 2018-04-19 17:45 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, willemb
In-Reply-To: <CAF=yD-JOp37i83ROiizCUOJS3mz+TUgn91b9pWSGJ0sh-O9gbg@mail.gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed, 18 Apr 2018 14:12:40 -0400

> Actually, yes, I should be able to relax that constraint with
> segmentation.
> 
> It is there in case a corked packet may grow to the point of
> having to be fragmented. But segmentation already ensures
> that its datagrams always fit in MTU.
> 
> Should be simple refinement to
> 
>             !(flags & MSG_MORE) &&
> 
> in __ip_append_data.

Right, if segmentation never sends greater than mtu then we can
relax the restriction in this way.

Great idea.

^ permalink raw reply

* Re: [PATCH net-next] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
From: David Miller @ 2018-04-19 17:45 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <20180418184315.48704-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 18 Apr 2018 11:43:15 -0700

> After working on IP defragmentation lately, I found that some large
> packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> zero paddings on the last (small) fragment.
> 
> While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> fragments had CHECKSUM_COMPLETE set.

Oops.

> We can instead compute the checksum of the part we are trimming,
> usually smaller than the part we keep.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Looks good, applied, thanks Eric.

^ permalink raw reply

* Re: [Patch net] llc: hold llc_sap before release_sock()
From: David Miller @ 2018-04-19 17:54 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev
In-Reply-To: <20180418185156.27067-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 18 Apr 2018 11:51:56 -0700

> @@ -199,9 +200,15 @@ static int llc_ui_release(struct socket *sock)
>  		llc->laddr.lsap, llc->daddr.lsap);
>  	if (!llc_send_disc(sk))
>  		llc_ui_wait_for_disc(sk, sk->sk_rcvtimeo);
> +	sap = llc->sap;
> +	/* Hold this for release_sock(), so that llc_backlog_rcv() could still
> +	 * use it.
> +	 */
> +	llc_sap_hold(sap);
>  	if (!sock_flag(sk, SOCK_ZAPPED))
>  		llc_sap_remove_socket(llc->sap, sk);
>  	release_sock(sk);
> +	llc_sap_put(sap);
>  	if (llc->dev)
>  		dev_put(llc->dev);
>  	sock_put(sk);

Yeah, kind of a weird ordering issue here.  It would have been nice if we could
remove the sap after the release_sock() but it appears that we can't.

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net] vmxnet3: fix incorrect dereference when rxvlan is disabled
From: David Miller @ 2018-04-19 17:59 UTC (permalink / raw)
  To: doshir; +Cc: netdev, pv-drivers, linux-kernel
In-Reply-To: <20180418194805.29119-1-doshir@vmware.com>

From: Ronak Doshi <doshir@vmware.com>
Date: Wed, 18 Apr 2018 12:48:04 -0700

> vmxnet3_get_hdr_len() is used to calculate the header length which in
> turn is used to calculate the gso_size for skb. When rxvlan offload is
> disabled, vlan tag is present in the header and the function references
> ip header from sizeof(ethhdr) and leads to incorrect pointer reference.
> 
> This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
> if vlan tag is present and correctly references the ip hdr.
> 
> Signed-off-by: Ronak Doshi <doshir@vmware.com>
> Acked-by: Guolin Yang <gyang@vmware.com>
> Acked-by: Louis Luo <llouis@vmware.com>

Applied and queued up for -stable, thanks.

Please provide an appropriate Fixes: tag next time.

^ permalink raw reply

* [PATCH] net:  Work around crash in ipv6 fib-walk-continue
From: greearb @ 2018-04-19 18:01 UTC (permalink / raw)
  To: netdev; +Cc: Ben Greear

From: Ben Greear <greearb@candelatech.com>

This keeps us from crashing in certain test cases where we
bring up many (1000, for instance) mac-vlans with IPv6
enabled in the kernel.  This bug has been around for a
very long time.

Until a real fix is found (and for stable), maybe it
is better to return an incomplete fib walk instead
of crashing.

BUG: unable to handle kernel NULL pointer dereference at 8
IP: fib6_walk_continue+0x5b/0x140 [ipv6]
PGD 80000007dfc0c067 P4D 80000007dfc0c067 PUD 7e66ff067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c vrf]
CPU: 3 PID: 15117 Comm: ip Tainted: G           O     4.16.0+ #5
Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
RSP: 0018:ffffc90008c3bc10 EFLAGS: 00010287
RAX: ffff88085ac45050 RBX: ffff8807e03008a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc90008c3bc48 RDI: ffffffff8232b240
RBP: ffff880819167600 R08: 0000000000000008 R09: ffff8807dff10071
R10: ffffc90008c3bbd0 R11: 0000000000000000 R12: ffff8807e03008a0
R13: 0000000000000002 R14: ffff8807e05744c8 R15: ffff8807e08ef000
FS:  00007f2f04342700(0000) GS:ffff88087fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000007e0556002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 inet6_dump_fib+0x14b/0x2c0 [ipv6]
 netlink_dump+0x216/0x2a0
 netlink_recvmsg+0x254/0x400
 ? copy_msghdr_from_user+0xb5/0x110
 ___sys_recvmsg+0xe9/0x230
 ? find_held_lock+0x3b/0xb0
 ? __handle_mm_fault+0x617/0x1180
 ? __audit_syscall_entry+0xb3/0x110
 ? __sys_recvmsg+0x39/0x70
 __sys_recvmsg+0x39/0x70
 do_syscall_64+0x63/0x120
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f2f03a72030
RSP: 002b:00007fffab3de508 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
RAX: ffffffffffffffda RBX: 00007fffab3e641c RCX: 00007f2f03a72030
RDX: 0000000000000000 RSI: 00007fffab3de570 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000007e6c R09: 00007fffab3e63a8
R10: 00007fffab3de5b0 R11: 0000000000000246 R12: 00007fffab3e6608
R13: 000000000066b460 R14: 0000000000007e6c R15: 0000000000000000
Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 89 53 2c c7 4
RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: ffffc90008c3bc10
CR2: 0000000000000008
---[ end trace bd03458864eb266c ]---

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

* This patch is against 4.16+, but a similar patch fixes the same issue
  older kernels.  Perhaps newer kernels will be resolved by David
  Ahern's fib6 changes, but I guess those won't be backported, so maybe
  this patch is still useful either way.

 net/ipv6/ip6_fib.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 92b8d8c..afef362 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1855,6 +1855,12 @@ static int fib6_walk_continue(struct fib6_walker *w)
 			if (fn == w->root)
 				return 0;
 			pn = rcu_dereference_protected(fn->parent, 1);
+			if (WARN_ON_ONCE(!pn)) {
+				pr_err("FWS-U, w: %p  fn: %p  pn: %p\n",
+				       w, fn, pn);
+				/* Attempt to work around crash that has been here forever. --Ben */
+				return 0;
+			}
 			left = rcu_dereference_protected(pn->left, 1);
 			right = rcu_dereference_protected(pn->right, 1);
 			w->node = pn;
-- 
2.4.11

^ permalink raw reply related

* Re: [Xen-devel] [PATCH] xen-netfront: Fix hang on device removal
From: Simon Gaiser @ 2018-04-19 18:10 UTC (permalink / raw)
  To: netdev
  Cc: Jason Andryuk, xen-devel, Eduardo Otubo, Juergen Gross,
	Boris Ostrovsky, open list
In-Reply-To: <20180228122323.3914-1-jandryuk@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 815 bytes --]

Jason Andryuk:
> A toolstack may delete the vif frontend and backend xenstore entries
> while xen-netfront is in the removal code path.  In that case, the
> checks for xenbus_read_driver_state would return XenbusStateUnknown, and
> xennet_remove would hang indefinitely.  This hang prevents system
> shutdown.
> 
> xennet_remove must be able to handle XenbusStateUnknown, and
> netback_changed must also wake up the wake_queue for that state as well.
> 
> Fixes: 5b5971df3bc2 ("xen-netfront: remove warning when unloading module")

I think this should go into stable since AFAIK the hanging network
device can only be fixed by rebooting the guest. AFAICS this affects all
4.* branches since 5b5971df3bc2 got backported to them.

Upstream commit c2d2e6738a209f0f9dffa2dc8e7292fc45360d61.

Simon


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen-netfront: Fix hang on device removal
From: Jason Andryuk @ 2018-04-19 18:14 UTC (permalink / raw)
  To: Simon Gaiser
  Cc: netdev, xen-devel, Eduardo Otubo, Juergen Gross, Boris Ostrovsky,
	open list
In-Reply-To: <1b07e839-5633-3b1e-1997-b86a891b2962@invisiblethingslab.com>

On Thu, Apr 19, 2018 at 2:10 PM, Simon Gaiser
<simon@invisiblethingslab.com> wrote:
> Jason Andryuk:
>> A toolstack may delete the vif frontend and backend xenstore entries
>> while xen-netfront is in the removal code path.  In that case, the
>> checks for xenbus_read_driver_state would return XenbusStateUnknown, and
>> xennet_remove would hang indefinitely.  This hang prevents system
>> shutdown.
>>
>> xennet_remove must be able to handle XenbusStateUnknown, and
>> netback_changed must also wake up the wake_queue for that state as well.
>>
>> Fixes: 5b5971df3bc2 ("xen-netfront: remove warning when unloading module")
>
> I think this should go into stable since AFAIK the hanging network
> device can only be fixed by rebooting the guest. AFAICS this affects all
> 4.* branches since 5b5971df3bc2 got backported to them.
>
> Upstream commit c2d2e6738a209f0f9dffa2dc8e7292fc45360d61.

Simon,

Yes, I agree.  I actually submitted the request to stable earlier
today, so hopefully it gets added soon.

Have you experienced this hang?

Regards,
Jason

^ permalink raw reply

* Re: [PATCH iproute2 net-next] vxlan: fix ttl inherit behavior
From: David Ahern @ 2018-04-19 18:15 UTC (permalink / raw)
  To: Hangbin Liu, netdev; +Cc: Stephen Hemminger, Jiri Benc
In-Reply-To: <1524027948-5395-1-git-send-email-liuhangbin@gmail.com>

On 4/17/18 11:05 PM, Hangbin Liu wrote:
> Like kernel net-next commit 72f6d71e491e6 ("vxlan: add ttl inherit support"),
> vxlan ttl inherit should means inherit the inner protocol's ttl value.
> 
> But currently when we add vxlan with "ttl inherit", we only set ttl 0,
> which is actually use whatever default value instead of inherit the inner
> protocol's ttl value.
> 
> To make a difference with ttl inherit and ttl == 0, we add an attribute
> IFLA_VXLAN_TTL_INHERIT when "ttl inherit" specified. And use "ttl auto"
> to means "use whatever default value", the same behavior with ttl == 0.
> 
> Reported-by: Jianlin Shi <jishi@redhat.com>
> Suggested-by: Jiri Benc <jbenc@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  include/uapi/linux/if_link.h | 1 +
>  ip/iplink_vxlan.c            | 8 ++++++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 

applied to iproute2-next.

^ permalink raw reply

* [PATCH v2 1/3] ethtool: Support ETHTOOL_GSTATS2 command.
From: greearb @ 2018-04-19 18:17 UTC (permalink / raw)
  To: netdev; +Cc: linux-wireless, ath10k, Ben Greear

From: Ben Greear <greearb@candelatech.com>

This is similar to ETHTOOL_GSTATS, but it allows you to specify
flags.  These flags can be used by the driver to decrease the
amount of stats refreshed.  In particular, this helps with ath10k
since getting the firmware stats can be slow.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  Change flag names to ETHTOOL_GS_NO_REFRESH_FW,
     ETHTOOL_GS2_REFRESH_ALL.

 include/linux/ethtool.h      | 12 ++++++++++++
 include/uapi/linux/ethtool.h | 11 +++++++++++
 net/core/ethtool.c           | 40 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index ebe4181..c90bc6f6 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -243,6 +243,15 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
  * @get_ethtool_stats: Return extended statistics about the device.
  *	This is only useful if the device maintains statistics not
  *	included in &struct rtnl_link_stats64.
+ * @get_ethtool_stats2: Return extended statistics about the device.
+ *	This is only useful if the device maintains statistics not
+ *	included in &struct rtnl_link_stats64.
+ *      Takes a flags argument:  0 means all (same as get_ethtool_stats),
+ *      0x1 (ETHTOOL_GS2_NO_REFRESH_FW) means skip refreshing firmware stats.
+ *      Other flags are reserved for now.
+ *      Same number of stats will be returned, but some of them might
+ *      not be as accurate/refreshed.  This is to allow not querying
+ *      firmware or other expensive-to-read stats, for instance.
  * @begin: Function to be called before any other operation.  Returns a
  *	negative error code or zero.
  * @complete: Function to be called after any other operation except
@@ -355,6 +364,9 @@ struct ethtool_ops {
 	int	(*set_phys_id)(struct net_device *, enum ethtool_phys_id_state);
 	void	(*get_ethtool_stats)(struct net_device *,
 				     struct ethtool_stats *, u64 *);
+	void	(*get_ethtool_stats2)(struct net_device *dev,
+				      struct ethtool_stats *gstats, u64 *data,
+				      u32 flags);
 	int	(*begin)(struct net_device *);
 	void	(*complete)(struct net_device *);
 	u32	(*get_priv_flags)(struct net_device *);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 4ca65b5..75c753d 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1396,11 +1396,22 @@ enum ethtool_fec_config_bits {
 #define ETHTOOL_PHY_STUNABLE	0x0000004f /* Set PHY tunable configuration */
 #define ETHTOOL_GFECPARAM	0x00000050 /* Get FEC settings */
 #define ETHTOOL_SFECPARAM	0x00000051 /* Set FEC settings */
+#define ETHTOOL_GSTATS2		0x00000052 /* get NIC-specific statistics
+					    * with ability to specify flags.
+					    * See ETHTOOL_GS2* below.
+					    */
 
 /* compatibility with older code */
 #define SPARC_ETH_GSET		ETHTOOL_GSET
 #define SPARC_ETH_SSET		ETHTOOL_SSET
 
+/* GSTATS2 flags */
+#define ETHTOOL_GS2_REFRESH_ALL     (0)    /* default is to update all stats */
+#define ETHTOOL_GS2_NO_REFRESH_FW   (1<<0) /* Skip refreshing stats that probe
+					    * firmware, and thus are
+					    * slow/expensive.
+					    */
+
 /* Link mode bit indices */
 enum ethtool_link_mode_bit_indices {
 	ETHTOOL_LINK_MODE_10baseT_Half_BIT	= 0,
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03416e6..dd16f15 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1952,16 +1952,14 @@ static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
 	return rc;
 }
 
-static int ethtool_get_stats(struct net_device *dev, void __user *useraddr)
+static int _ethtool_get_stats(struct net_device *dev, void __user *useraddr,
+			      u32 flags)
 {
 	struct ethtool_stats stats;
 	const struct ethtool_ops *ops = dev->ethtool_ops;
 	u64 *data;
 	int ret, n_stats;
 
-	if (!ops->get_ethtool_stats || !ops->get_sset_count)
-		return -EOPNOTSUPP;
-
 	n_stats = ops->get_sset_count(dev, ETH_SS_STATS);
 	if (n_stats < 0)
 		return n_stats;
@@ -1976,7 +1974,10 @@ static int ethtool_get_stats(struct net_device *dev, void __user *useraddr)
 	if (n_stats && !data)
 		return -ENOMEM;
 
-	ops->get_ethtool_stats(dev, &stats, data);
+	if (flags != ETHTOOL_GS2_REFRESH_ALL)
+		ops->get_ethtool_stats2(dev, &stats, data, flags);
+	else
+		ops->get_ethtool_stats(dev, &stats, data);
 
 	ret = -EFAULT;
 	if (copy_to_user(useraddr, &stats, sizeof(stats)))
@@ -1991,6 +1992,31 @@ static int ethtool_get_stats(struct net_device *dev, void __user *useraddr)
 	return ret;
 }
 
+static int ethtool_get_stats(struct net_device *dev, void __user *useraddr)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+
+	if (!ops->get_ethtool_stats || !ops->get_sset_count)
+		return -EOPNOTSUPP;
+	return _ethtool_get_stats(dev, useraddr, ETHTOOL_GS2_REFRESH_ALL);
+}
+
+static int ethtool_get_stats2(struct net_device *dev, void __user *useraddr)
+{
+	struct ethtool_stats stats;
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	u32 flags = 0;
+
+	if (!ops->get_ethtool_stats2 || !ops->get_sset_count)
+		return -EOPNOTSUPP;
+
+	if (copy_from_user(&stats, useraddr, sizeof(stats)))
+		return -EFAULT;
+
+	flags = stats.n_stats;
+	return _ethtool_get_stats(dev, useraddr, flags);
+}
+
 static int ethtool_get_phy_stats(struct net_device *dev, void __user *useraddr)
 {
 	struct ethtool_stats stats;
@@ -2632,6 +2658,7 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GSSET_INFO:
 	case ETHTOOL_GSTRINGS:
 	case ETHTOOL_GSTATS:
+	case ETHTOOL_GSTATS2:
 	case ETHTOOL_GPHYSTATS:
 	case ETHTOOL_GTSO:
 	case ETHTOOL_GPERMADDR:
@@ -2742,6 +2769,9 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GSTATS:
 		rc = ethtool_get_stats(dev, useraddr);
 		break;
+	case ETHTOOL_GSTATS2:
+		rc = ethtool_get_stats2(dev, useraddr);
+		break;
 	case ETHTOOL_GPERMADDR:
 		rc = ethtool_get_perm_addr(dev, useraddr);
 		break;
-- 
2.4.11

^ permalink raw reply related

* [PATCH v2 2/3] mac80211:  Add support for ethtool gstats2 API.
From: greearb @ 2018-04-19 18:17 UTC (permalink / raw)
  To: netdev; +Cc: linux-wireless, ath10k, Ben Greear
In-Reply-To: <1524161863-30860-1-git-send-email-greearb@candelatech.com>

From: Ben Greear <greearb@candelatech.com>

This enables users to request fewer stats to be refreshed
in cases where firmware does not need to be probed.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  No changes.

 include/net/mac80211.h    |  6 ++++++
 net/mac80211/driver-ops.h |  9 +++++++--
 net/mac80211/ethtool.c    | 18 +++++++++++++-----
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index d2279b2..4854f33 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -3361,6 +3361,8 @@ enum ieee80211_reconfig_type {
  *
  * @get_et_stats:  Ethtool API to get a set of u64 stats.
  *
+ * @get_et_stats2:  Ethtool API to get a set of u64 stats, with flags.
+ *
  * @get_et_strings:  Ethtool API to get a set of strings to describe stats
  *	and perhaps other supported types of ethtool data-sets.
  *
@@ -3692,6 +3694,10 @@ struct ieee80211_ops {
 	void	(*get_et_stats)(struct ieee80211_hw *hw,
 				struct ieee80211_vif *vif,
 				struct ethtool_stats *stats, u64 *data);
+	void	(*get_et_stats2)(struct ieee80211_hw *hw,
+				 struct ieee80211_vif *vif,
+				 struct ethtool_stats *stats, u64 *data,
+				 u32 flags);
 	void	(*get_et_strings)(struct ieee80211_hw *hw,
 				  struct ieee80211_vif *vif,
 				  u32 sset, u8 *data);
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h
index 4d82fe7..519d2db 100644
--- a/net/mac80211/driver-ops.h
+++ b/net/mac80211/driver-ops.h
@@ -58,10 +58,15 @@ static inline void drv_get_et_strings(struct ieee80211_sub_if_data *sdata,
 
 static inline void drv_get_et_stats(struct ieee80211_sub_if_data *sdata,
 				    struct ethtool_stats *stats,
-				    u64 *data)
+				    u64 *data, u32 flags)
 {
 	struct ieee80211_local *local = sdata->local;
-	if (local->ops->get_et_stats) {
+	if (local->ops->get_et_stats2) {
+		trace_drv_get_et_stats(local);
+		local->ops->get_et_stats2(&local->hw, &sdata->vif, stats, data,
+					  flags);
+		trace_drv_return_void(local);
+	} else if (local->ops->get_et_stats) {
 		trace_drv_get_et_stats(local);
 		local->ops->get_et_stats(&local->hw, &sdata->vif, stats, data);
 		trace_drv_return_void(local);
diff --git a/net/mac80211/ethtool.c b/net/mac80211/ethtool.c
index 9cc986d..b67520e 100644
--- a/net/mac80211/ethtool.c
+++ b/net/mac80211/ethtool.c
@@ -61,9 +61,9 @@ static int ieee80211_get_sset_count(struct net_device *dev, int sset)
 	return rv;
 }
 
-static void ieee80211_get_stats(struct net_device *dev,
-				struct ethtool_stats *stats,
-				u64 *data)
+static void ieee80211_get_stats2(struct net_device *dev,
+				 struct ethtool_stats *stats,
+				 u64 *data, u32 flags)
 {
 	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
 	struct ieee80211_chanctx_conf *chanctx_conf;
@@ -199,7 +199,14 @@ static void ieee80211_get_stats(struct net_device *dev,
 	if (WARN_ON(i != STA_STATS_LEN))
 		return;
 
-	drv_get_et_stats(sdata, stats, &(data[STA_STATS_LEN]));
+	drv_get_et_stats(sdata, stats, &data[STA_STATS_LEN], flags);
+}
+
+static void ieee80211_get_stats(struct net_device *dev,
+				struct ethtool_stats *stats,
+				u64 *data)
+{
+	ieee80211_get_stats2(dev, stats, data, 0);
 }
 
 static void ieee80211_get_strings(struct net_device *dev, u32 sset, u8 *data)
@@ -211,7 +218,7 @@ static void ieee80211_get_strings(struct net_device *dev, u32 sset, u8 *data)
 		sz_sta_stats = sizeof(ieee80211_gstrings_sta_stats);
 		memcpy(data, ieee80211_gstrings_sta_stats, sz_sta_stats);
 	}
-	drv_get_et_strings(sdata, sset, &(data[sz_sta_stats]));
+	drv_get_et_strings(sdata, sset, &data[sz_sta_stats]);
 }
 
 static int ieee80211_get_regs_len(struct net_device *dev)
@@ -238,5 +245,6 @@ const struct ethtool_ops ieee80211_ethtool_ops = {
 	.set_ringparam = ieee80211_set_ringparam,
 	.get_strings = ieee80211_get_strings,
 	.get_ethtool_stats = ieee80211_get_stats,
+	.get_ethtool_stats2 = ieee80211_get_stats2,
 	.get_sset_count = ieee80211_get_sset_count,
 };
-- 
2.4.11

^ permalink raw reply related

* [PATCH v2 3/3] ath10k:  Support ethtool gstats2 API.
From: greearb @ 2018-04-19 18:17 UTC (permalink / raw)
  To: netdev; +Cc: linux-wireless, ath10k, Ben Greear
In-Reply-To: <1524161863-30860-1-git-send-email-greearb@candelatech.com>

From: Ben Greear <greearb@candelatech.com>

Skip a firmware stats update when calling
code indicates the stats refresh is not needed.

Signed-off-by: Ben Greear <greearb@candelatech.com>
---

v2:  Convert to new flag name, attempt to fix build
   when there is not DEBUGFS enabled for ath10k.

 drivers/net/wireless/ath/ath10k/debug.c | 18 +++++++++++++++---
 drivers/net/wireless/ath/ath10k/debug.h |  5 +++++
 drivers/net/wireless/ath/ath10k/mac.c   |  1 +
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index bac832c..235cd04 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -1159,9 +1159,10 @@ int ath10k_debug_get_et_sset_count(struct ieee80211_hw *hw,
 	return 0;
 }
 
-void ath10k_debug_get_et_stats(struct ieee80211_hw *hw,
-			       struct ieee80211_vif *vif,
-			       struct ethtool_stats *stats, u64 *data)
+void ath10k_debug_get_et_stats2(struct ieee80211_hw *hw,
+				struct ieee80211_vif *vif,
+				struct ethtool_stats *stats, u64 *data,
+				u32 flags)
 {
 	struct ath10k *ar = hw->priv;
 	static const struct ath10k_fw_stats_pdev zero_stats = {};
@@ -1170,6 +1171,9 @@ void ath10k_debug_get_et_stats(struct ieee80211_hw *hw,
 
 	mutex_lock(&ar->conf_mutex);
 
+	if (flags & ETHTOOL_GS2_NO_REFRESH_FW)
+		goto skip_query_fw_stats;
+
 	if (ar->state == ATH10K_STATE_ON) {
 		ret = ath10k_debug_fw_stats_request(ar);
 		if (ret) {
@@ -1180,6 +1184,7 @@ void ath10k_debug_get_et_stats(struct ieee80211_hw *hw,
 		}
 	}
 
+skip_query_fw_stats:
 	pdev_stats = list_first_entry_or_null(&ar->debug.fw_stats.pdevs,
 					      struct ath10k_fw_stats_pdev,
 					      list);
@@ -1244,6 +1249,13 @@ void ath10k_debug_get_et_stats(struct ieee80211_hw *hw,
 	WARN_ON(i != ATH10K_SSTATS_LEN);
 }
 
+void ath10k_debug_get_et_stats(struct ieee80211_hw *hw,
+			       struct ieee80211_vif *vif,
+			       struct ethtool_stats *stats, u64 *data)
+{
+	ath10k_debug_get_et_stats2(hw, vif, stats, data, 0);
+}
+
 static const struct file_operations fops_fw_dbglog = {
 	.read = ath10k_read_fw_dbglog,
 	.write = ath10k_write_fw_dbglog,
diff --git a/drivers/net/wireless/ath/ath10k/debug.h b/drivers/net/wireless/ath/ath10k/debug.h
index 0afca5c..e953dd0 100644
--- a/drivers/net/wireless/ath/ath10k/debug.h
+++ b/drivers/net/wireless/ath/ath10k/debug.h
@@ -117,6 +117,10 @@ int ath10k_debug_get_et_sset_count(struct ieee80211_hw *hw,
 void ath10k_debug_get_et_stats(struct ieee80211_hw *hw,
 			       struct ieee80211_vif *vif,
 			       struct ethtool_stats *stats, u64 *data);
+void ath10k_debug_get_et_stats2(struct ieee80211_hw *hw,
+				struct ieee80211_vif *vif,
+				struct ethtool_stats *stats, u64 *data,
+				u32 level);
 
 static inline u64 ath10k_debug_get_fw_dbglog_mask(struct ath10k *ar)
 {
@@ -195,6 +199,7 @@ static inline u32 ath10k_debug_get_fw_dbglog_level(struct ath10k *ar)
 #define ath10k_debug_get_et_strings NULL
 #define ath10k_debug_get_et_sset_count NULL
 #define ath10k_debug_get_et_stats NULL
+#define ath10k_debug_get_et_stats2 NULL
 
 #endif /* CONFIG_ATH10K_DEBUGFS */
 #ifdef CONFIG_MAC80211_DEBUGFS
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index bf05a36..27b793c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -7734,6 +7734,7 @@ static const struct ieee80211_ops ath10k_ops = {
 	.ampdu_action			= ath10k_ampdu_action,
 	.get_et_sset_count		= ath10k_debug_get_et_sset_count,
 	.get_et_stats			= ath10k_debug_get_et_stats,
+	.get_et_stats2			= ath10k_debug_get_et_stats2,
 	.get_et_strings			= ath10k_debug_get_et_strings,
 	.add_chanctx			= ath10k_mac_op_add_chanctx,
 	.remove_chanctx			= ath10k_mac_op_remove_chanctx,
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH net-next] net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
From: Alexander Duyck @ 2018-04-19 18:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <20180418184315.48704-1-edumazet@google.com>

On Wed, Apr 18, 2018 at 11:43 AM, Eric Dumazet <edumazet@google.com> wrote:
> After working on IP defragmentation lately, I found that some large
> packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> zero paddings on the last (small) fragment.
>
> While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> fragments had CHECKSUM_COMPLETE set.
>
> We can instead compute the checksum of the part we are trimming,
> usually smaller than the part we keep.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/skbuff.h |  5 ++---
>  net/core/skbuff.c      | 14 ++++++++++++++
>  2 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9065477ed255a48f7e01b8a28ea6321cce9127f5..d274059529eb5216d041dfdcad4a564a623c8ea0 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3131,6 +3131,7 @@ static inline void *skb_push_rcsum(struct sk_buff *skb, unsigned int len)
>         return skb->data;
>  }
>
> +int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
>  /**
>   *     pskb_trim_rcsum - trim received skb and update checksum
>   *     @skb: buffer to trim
> @@ -3144,9 +3145,7 @@ static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)
>  {
>         if (likely(len >= skb->len))
>                 return 0;
> -       if (skb->ip_summed == CHECKSUM_COMPLETE)
> -               skb->ip_summed = CHECKSUM_NONE;
> -       return __pskb_trim(skb, len);
> +       return pskb_trim_rcsum_slow(skb, len);
>  }
>

I'm wondering if in the past padding was somehow screwing up the
CHECKSUM_COMPLETE value being provided. I wonder if it wouldn't be in
our interest to just consider manually computing the checksum of the
fragment after stripping the padding instead of just subtracting the
offset of the padding.

I guess if we start seeing checksum errors popping up on some devices
we can reevaluate this if necessary.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
From: Vlastimil Babka @ 2018-04-19 18:28 UTC (permalink / raw)
  To: Mikulas Patocka, David Miller, Andrew Morton, linux-mm
  Cc: eric.dumazet, edumazet, bhutchings, netdev, linux-kernel, mst,
	jasowang, virtualization, dm-devel, Laura Abbott
In-Reply-To: <alpine.LRH.2.02.1804191207380.31175@file01.intranet.prod.int.rdu2.redhat.com>

On 04/19/2018 06:12 PM, Mikulas Patocka wrote:
> From: Mikulas Patocka <mpatocka@redhat.com>
> Subject: [PATCH] kvmalloc: always use vmalloc if CONFIG_DEBUG_VM
> 
> The kvmalloc function tries to use kmalloc and falls back to vmalloc if
> kmalloc fails.
> 
> Unfortunatelly, some kernel code has bugs - it uses kvmalloc and then
> uses DMA-API on the returned memory or frees it with kfree. Such bugs were
> found in the virtio-net driver, dm-integrity or RHEL7 powerpc-specific
> code.
> 
> These bugs are hard to reproduce because vmalloc falls back to kmalloc
> only if memory is fragmented.
> 
> In order to detect these bugs reliably I submit this patch that changes
> kvmalloc to always use vmalloc if CONFIG_DEBUG_VM is turned on.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Hmm AFAIK Fedora uses CONFIG_DEBUG_VM in their kernels. Sure you want to
impose this on all users? Seems too much for DEBUG_VM to me. Maybe it
should be hidden under some error injection config?

> ---
>  mm/util.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c	2018-04-18 15:46:23.000000000 +0200
> +++ linux-2.6/mm/util.c	2018-04-18 16:00:43.000000000 +0200
> @@ -395,6 +395,7 @@ EXPORT_SYMBOL(vm_mmap);
>   */
>  void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  {
> +#ifndef CONFIG_DEBUG_VM
>  	gfp_t kmalloc_flags = flags;
>  	void *ret;
>  
> @@ -426,6 +427,7 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	 */
>  	if (ret || size <= PAGE_SIZE)
>  		return ret;
> +#endif

Did you verify that vmalloc does the right thing for sub-page sizes?
Shouldn't those be exempted?

>  	return __vmalloc_node_flags_caller(size, node, flags,
>  			__builtin_return_address(0));
> 

^ permalink raw reply

* Re: [bisected] Stack overflow after fs: "switch the IO-triggering parts of umount to fs_pin" (was net namespaces kernel stack overflow)
From: Alexander Aring @ 2018-04-19 18:37 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: Al Viro, linux-kernel, netdev, Jamal Hadi Salim
In-Reply-To: <188a05bc-de07-c048-6a8a-63dc899cce6d@virtuozzo.com>

Hi,

On Thu, Apr 19, 2018 at 12:56 PM, Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> On 19.04.2018 19:44, Al Viro wrote:
>> On Thu, Apr 19, 2018 at 04:34:48PM +0100, Al Viro wrote:
>>
>>> IOW, we only get there if our vfsmount was an MNT_INTERNAL one.
>>> So we have mnt->mnt_umount of some MNT_INTERNAL mount found in
>>> ->mnt_pins of some other mount.  Which, AFAICS, means that
>>> it used to be mounted on that other mount.  How the hell can
>>> that happen?
>>>
>>> It looks like you somehow get a long chain of MNT_INTERNAL mounts
>>> stacked on top of each other, which ought to be prevented by
>>>         mnt_flags &= ~MNT_INTERNAL_FLAGS;
>>> in do_add_mount().  Nuts...
>>
>> Arrrrrgh...  Nuts is right - clone_mnt() preserves the sodding
>> MNT_INTERNAL, with obvious results.
>>
>> netns is related to the problem, by exposing MNT_INTERNAL mounts
>> (in /proc/*/ns/*) for mount --bind to copy and attach to the
>> tree.  AFAICS, the minimal reproducer is
>>
>> touch /tmp/a
>> unshare -m sh -c 'for i in `seq 10000`; do mount --bind /proc/1/ns/net /tmp/a; done'
>>
>> (and it can be anything in /proc/*/ns/*, really)
>>
>> I think the fix should be along the lines of the following:
>>
>> Don't leak MNT_INTERNAL away from internal mounts
>>
>> We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
>> their copies.
>>
>> Cc: stable@kernel.org
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> Flawless victory! Thanks.
>

Thanks to all.

Also thanks to Kirill for helping me here and doing the main part by
bisecting this issue.

Finally, my testing stuff which produced this bug also works well now.

Tested-by: Alexander Aring <aring@mojatatu.com>

- Alex

^ permalink raw reply

* Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Michael S. Tsirkin @ 2018-04-19 18:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tiwei Bie, alex.williamson, ddutile, alexander.h.duyck,
	virtio-dev, linux-kernel, kvm, virtualization, netdev, dan.daly,
	cunming.liang, zhihong.wang, jianfeng.tan, xiao.w.wang
In-Reply-To: <30a63fff-7599-640a-361f-a27e5783012a@redhat.com>

On Tue, Apr 10, 2018 at 03:25:45PM +0800, Jason Wang wrote:
> > > > One problem is that, different virtio ring compatible devices
> > > > may have different device interfaces. That is to say, we will
> > > > need different drivers in QEMU. It could be troublesome. And
> > > > that's what this patch trying to fix. The idea behind this
> > > > patch is very simple: mdev is a standard way to emulate device
> > > > in kernel.
> > > So you just move the abstraction layer from qemu to kernel, and you still
> > > need different drivers in kernel for different device interfaces of
> > > accelerators. This looks even more complex than leaving it in qemu. As you
> > > said, another idea is to implement userspace vhost backend for accelerators
> > > which seems easier and could co-work with other parts of qemu without
> > > inventing new type of messages.
> > I'm not quite sure. Do you think it's acceptable to
> > add various vendor specific hardware drivers in QEMU?
> > 
> 
> I don't object but we need to figure out the advantages of doing it in qemu
> too.
> 
> Thanks

To be frank kernel is exactly where device drivers belong.  DPDK did
move them to userspace but that's merely a requirement for data path.
*If* you can have them in kernel that is best:
- update kernel and there's no need to rebuild userspace
- apps can be written in any language no need to maintain multiple
  libraries or add wrappers
- security concerns are much smaller (ok people are trying to
  raise the bar with IOMMUs and such, but it's already pretty
  good even without)

The biggest issue is that you let userspace poke at the
device which is also allowed by the IOMMU to poke at
kernel memory (needed for kernel driver to work).

Yes, maybe if device is not buggy it's all fine, but
it's better if we do not have to trust the device
otherwise the security picture becomes more murky.

I suggested attaching a PASID to (some) queues - see my old post "using
PASIDs to enable a safe variant of direct ring access".

Then using IOMMU with VFIO to limit access through queue to corrent
ranges of memory.


-- 
MST

^ permalink raw reply

* Re: [PATCH 02/39] proc: introduce proc_create_seq{,_data}
From: Alexey Dobriyan @ 2018-04-19 18:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	Corey Minyard, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-kernel, linux-acpi, netdev, netfilter-devel, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4, linux-afs,
	megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180419124140.9309-3-hch@lst.de>

On Thu, Apr 19, 2018 at 02:41:03PM +0200, Christoph Hellwig wrote:
> Variants of proc_create{,_data} that directly take a struct seq_operations
> argument and drastically reduces the boilerplate code in the callers.

> +static int proc_seq_open(struct inode *inode, struct file *file)
> +{
> +	struct proc_dir_entry *de = PDE(inode);
> +
> +	return seq_open(file, de->seq_ops);
> +}
> +
> +static const struct file_operations proc_seq_fops = {
> +	.open		= proc_seq_open,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= seq_release,
> +};
> +
> +struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode,
> +		struct proc_dir_entry *parent, const struct seq_operations *ops,
> +		void *data)
> +{
> +	struct proc_dir_entry *p;
> +
> +	p = proc_create_data(name, mode, parent, &proc_seq_fops, data);
> +	if (p)
> +		p->seq_ops = ops;
> +	return p;
> +}

Should be oopsable.
Once proc_create_data() returns, entry is live, ->open can be called.

> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -44,6 +44,7 @@ struct proc_dir_entry {
>  	struct completion *pde_unload_completion;
>  	const struct inode_operations *proc_iops;
>  	const struct file_operations *proc_fops;
> +	const struct seq_operations *seq_ops;
>  	void *data;
>  	unsigned int low_ino;
>  	nlink_t nlink;

"struct proc_dir_entry is 192/128 bytes now.
If someone knows how to pad array to certain size without union
please tell.

^ permalink raw reply

* Re: [PATCH 14/39] proc: introduce proc_create_net_single
From: Alexey Dobriyan @ 2018-04-19 18:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	Corey Minyard, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-kernel, linux-acpi, netdev, netfilter-devel, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4, linux-afs,
	megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180419124140.9309-15-hch@lst.de>

On Thu, Apr 19, 2018 at 02:41:15PM +0200, Christoph Hellwig wrote:
> Variant of proc_create_data that directly take a seq_file show

> +struct proc_dir_entry *proc_create_net_single(const char *name, umode_t mode,
> +		struct proc_dir_entry *parent,
> +		int (*show)(struct seq_file *, void *), void *data)
> +{
> +	struct proc_dir_entry *p;
> +
> +	p = proc_create_data(name, mode, parent, &proc_net_single_fops, data);
> +	if (p)
> +		p->single_show = show;
> +	return p;
> +}

Ditto, should be oopsable.

^ permalink raw reply

* Re: [PATCH 03/39] proc: introduce proc_create_seq_private
From: Alexey Dobriyan @ 2018-04-19 18:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rtc, Alessandro Zummo, Alexandre Belloni, devel, linux-scsi,
	Corey Minyard, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-kernel, linux-acpi, netdev, netfilter-devel, Alexander Viro,
	Jiri Slaby, Andrew Morton, linux-ext4, linux-afs,
	megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180419124140.9309-4-hch@lst.de>

On Thu, Apr 19, 2018 at 02:41:04PM +0200, Christoph Hellwig wrote:
> Variant of proc_create_data that directly take a struct seq_operations

> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -45,6 +45,7 @@ struct proc_dir_entry {
>  	const struct inode_operations *proc_iops;
>  	const struct file_operations *proc_fops;
>  	const struct seq_operations *seq_ops;
> +	size_t state_size;

"unsigned int" please.

Where have you seen 4GB priv states?

^ permalink raw reply

* Re: WARNING in refcount_dec
From: Willem de Bruijn @ 2018-04-19 18:55 UTC (permalink / raw)
  To: DaeRyong Jeong
  Cc: Cong Wang, Byoungyoung Lee, LKML, Kyungtae Kim,
	Linux Kernel Network Developers, Willem de Bruijn
In-Reply-To: <CACsK=jcgPLMydKfRkKKXXUVqAWXwszbvkr=5jYXXPAmigTtszQ@mail.gmail.com>

On Thu, Apr 19, 2018 at 2:32 AM, DaeRyong Jeong <threeearcat@gmail.com> wrote:
> Hello.
> We have analyzed the cause of the crash in v4.16-rc3, WARNING in refcount_dec,
> which is found by RaceFuzzer (a modified version of Syzkaller).
>
> Since struct packet_sock's member variables, running, has_vnet_hdr, origdev
> and auxdata are declared as bitfields, accessing these variables can race if
> there is no synchronization mechanism.

Great catch.

These fields po->{running, auxdata, origdev, has_vnet_hdr} are
accessed without a uniform locking strategy.

po->running is always accessed with po->bind_lock held (with the
exception of reading in packet_seq_show, but that is best effort).

That is the only field written to outside setsockopt. If it is moved to
a separate word, it will no longer interfere with the others.

The other fields are read lockless in the various recv and send
functions, but only set in setsockopt. We've had enough
locking bugs around setsockopt that I suggest we wrap all of
those in lock_sock, like the example I gave before for
has_vnet_hdr.

^ permalink raw reply

* Re: simplify procfs code for seq_file instances
From: Alexey Dobriyan @ 2018-04-19 18:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Alexander Viro, Greg Kroah-Hartman, Jiri Slaby,
	Corey Minyard, Alessandro Zummo, Alexandre Belloni, linux-acpi,
	drbd-dev, linux-ide, netdev, linux-rtc, megaraidlinux.pdl,
	linux-scsi, devel, linux-afs, linux-ext4, jfs-discussion,
	netfilter-devel, linux-kernel
In-Reply-To: <20180419124140.9309-1-hch@lst.de>

>     git://git.infradead.org/users/hch/misc.git proc_create


I want to ask if it is time to start using poorman function overloading
with _b_c_e(). There are millions of allocation functions for example,
all slightly difference, and people will add more. Seeing /proc interfaces
doubled like this is painful.

^ permalink raw reply

* Re: [PATCH] docs: ip-sysctl.txt: fix name of some ipv6 variables
From: David Miller @ 2018-04-19 19:22 UTC (permalink / raw)
  To: olivier.gayot; +Cc: netdev
In-Reply-To: <1524081786-30392-1-git-send-email-olivier.gayot@sigexec.com>

From: Olivier Gayot <olivier.gayot@sigexec.com>
Date: Wed, 18 Apr 2018 22:03:06 +0200

> The name of the following proc/sysctl entries were incorrectly
> documented:
> 
>     /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number
>     /proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number
>     /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length
>     /proc/sys/net/ipv6/conf/<interface>/max_hbt_length
> 
> Their name was set to the name of the symbol in the .data field of the
> control table instead of their .proc name.
> 
> Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>

Good catch, applied, thank you.

^ permalink raw reply

* [Patch net] llc: delete timers synchronously in llc_sk_free()
From: Cong Wang @ 2018-04-19 19:25 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang

The connection timers of an llc sock could be still flying
after we delete them in llc_sk_free(), and even possibly
after we free the sock. We could just wait synchronously
here in case of troubles.

Note, I leave other call paths as they are, since they may
not have to wait, at least we can change them to synchronously
when needed.

Also, move the code to net/llc/llc_conn.c, which is apparently
a better place.

Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 include/net/llc_conn.h |  1 +
 net/llc/llc_c_ac.c     |  9 +--------
 net/llc/llc_conn.c     | 22 +++++++++++++++++++++-
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index 5c40f118c0fa..df528a623548 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -97,6 +97,7 @@ static __inline__ char llc_backlog_type(struct sk_buff *skb)
 
 struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority,
 			  struct proto *prot, int kern);
+void llc_sk_stop_all_timers(struct sock *sk, bool sync);
 void llc_sk_free(struct sock *sk);
 
 void llc_sk_reset(struct sock *sk);
diff --git a/net/llc/llc_c_ac.c b/net/llc/llc_c_ac.c
index 163121192aca..4d78375f9872 100644
--- a/net/llc/llc_c_ac.c
+++ b/net/llc/llc_c_ac.c
@@ -1099,14 +1099,7 @@ int llc_conn_ac_inc_tx_win_size(struct sock *sk, struct sk_buff *skb)
 
 int llc_conn_ac_stop_all_timers(struct sock *sk, struct sk_buff *skb)
 {
-	struct llc_sock *llc = llc_sk(sk);
-
-	del_timer(&llc->pf_cycle_timer.timer);
-	del_timer(&llc->ack_timer.timer);
-	del_timer(&llc->rej_sent_timer.timer);
-	del_timer(&llc->busy_state_timer.timer);
-	llc->ack_must_be_send = 0;
-	llc->ack_pf = 0;
+	llc_sk_stop_all_timers(sk, false);
 	return 0;
 }
 
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index 110e32bcb399..c0ac522b48a1 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -961,6 +961,26 @@ struct sock *llc_sk_alloc(struct net *net, int family, gfp_t priority, struct pr
 	return sk;
 }
 
+void llc_sk_stop_all_timers(struct sock *sk, bool sync)
+{
+	struct llc_sock *llc = llc_sk(sk);
+
+	if (sync) {
+		del_timer_sync(&llc->pf_cycle_timer.timer);
+		del_timer_sync(&llc->ack_timer.timer);
+		del_timer_sync(&llc->rej_sent_timer.timer);
+		del_timer_sync(&llc->busy_state_timer.timer);
+	} else {
+		del_timer(&llc->pf_cycle_timer.timer);
+		del_timer(&llc->ack_timer.timer);
+		del_timer(&llc->rej_sent_timer.timer);
+		del_timer(&llc->busy_state_timer.timer);
+	}
+
+	llc->ack_must_be_send = 0;
+	llc->ack_pf = 0;
+}
+
 /**
  *	llc_sk_free - Frees a LLC socket
  *	@sk - socket to free
@@ -973,7 +993,7 @@ void llc_sk_free(struct sock *sk)
 
 	llc->state = LLC_CONN_OUT_OF_SVC;
 	/* Stop all (possibly) running timers */
-	llc_conn_ac_stop_all_timers(sk, NULL);
+	llc_sk_stop_all_timers(sk, true);
 #ifdef DEBUG_LLC_CONN_ALLOC
 	printk(KERN_INFO "%s: unackq=%d, txq=%d\n", __func__,
 		skb_queue_len(&llc->pdu_unack_q),
-- 
2.13.0

^ permalink raw reply related

* Re: [PATCH net-next] lan78xx: Add support to dump lan78xx registers
From: David Miller @ 2018-04-19 19:26 UTC (permalink / raw)
  To: raghuramchary.jallipalli; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180418155735.28070-1-raghuramchary.jallipalli@microchip.com>

From: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Date: Wed, 18 Apr 2018 21:27:35 +0530

> +	/* Read Device/MAC registers */
> +	for (i = 0, j = 0; i < (sizeof(lan78xx_regs) / sizeof(u32)); i++, j++)
> +		lan78xx_read_reg(dev, lan78xx_regs[i], &data[j]);

There is no need for two loop variables, both i and j increment over the
same numbers.

^ permalink raw reply

* Re: [PATCH net 2/3] net: sched: ife: handle malformed tlv length
From: David Miller @ 2018-04-19 19:30 UTC (permalink / raw)
  To: aring; +Cc: yotam.gi, jhs, xiyou.wangcong, jiri, yuvalm, netdev, kernel
In-Reply-To: <20180418213534.6215-3-aring@mojatatu.com>

From: Alexander Aring <aring@mojatatu.com>
Date: Wed, 18 Apr 2018 17:35:33 -0400

> @@ -92,12 +92,43 @@ struct meta_tlvhdr {
>  	__be16 len;
>  };
>  
> +static inline bool __ife_tlv_meta_valid(const unsigned char *skbdata,
> +					const unsigned char *ifehdr_end)
> +{

Please do not use inline in foo.c files, let the compiler decide.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox