Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 5.2 034/249] net: mvpp2: cls: Extract the RSS context when parsing the ethtool rule
From: Sasha Levin @ 2019-07-15 13:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Maxime Chevallier, David S . Miller, Sasha Levin, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Maxime Chevallier <maxime.chevallier@bootlin.com>

[ Upstream commit c561da68038a738f30eca21456534c2d1872d13d ]

ethtool_rx_flow_rule_create takes into parameter the ethtool flow spec,
which doesn't contain the rss context id. We therefore need to extract
it ourself before parsing the ethtool rule.

The FLOW_RSS flag is only set in info->fs.flow_type, and not
info->flow_type.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
index a57d17ab91f0..fb06c0aa620a 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_cls.c
@@ -1242,6 +1242,12 @@ int mvpp2_ethtool_cls_rule_ins(struct mvpp2_port *port,

 	input.fs = &info->fs;

+	/* We need to manually set the rss_ctx, since this info isn't present
+	 * in info->fs
+	 */
+	if (info->fs.flow_type & FLOW_RSS)
+		input.rss_ctx = info->rss_context;
+
 	ethtool_rule = ethtool_rx_flow_rule_create(&input);
 	if (IS_ERR(ethtool_rule)) {
 		ret = PTR_ERR(ethtool_rule);
-- 
2.20.1

^ permalink raw reply related

* [PATCH AUTOSEL 5.2 027/249] ice: Gracefully handle reset failure in ice_alloc_vfs()
From: Sasha Levin @ 2019-07-15 13:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Brett Creeley, Anirudh Venkataramanan, Andrew Bowers,
	Jeff Kirsher, Sasha Levin, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Brett Creeley <brett.creeley@intel.com>

[ Upstream commit 72f9c2039859e6303550f202d6cc6b8d8af0178c ]

Currently if ice_reset_all_vfs() fails in ice_alloc_vfs() we fail to
free some resources, reset variables, and return an error value.
Fix this by adding another unroll case to free the pf->vf array, set
the pf->num_alloc_vfs to 0, and return an error code.

Without this, if ice_reset_all_vfs() fails in ice_alloc_vfs() we will
not be able to do SRIOV without hard rebooting the system because
rmmod'ing the driver does not work.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index a805cbdd69be..81ea77978355 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -1134,7 +1134,7 @@ static int ice_alloc_vfs(struct ice_pf *pf, u16 num_alloc_vfs)
 			   GFP_KERNEL);
 	if (!vfs) {
 		ret = -ENOMEM;
-		goto err_unroll_sriov;
+		goto err_pci_disable_sriov;
 	}
 	pf->vf = vfs;
 
@@ -1154,12 +1154,19 @@ static int ice_alloc_vfs(struct ice_pf *pf, u16 num_alloc_vfs)
 	pf->num_alloc_vfs = num_alloc_vfs;
 
 	/* VF resources get allocated during reset */
-	if (!ice_reset_all_vfs(pf, true))
+	if (!ice_reset_all_vfs(pf, true)) {
+		ret = -EIO;
 		goto err_unroll_sriov;
+	}
 
 	goto err_unroll_intr;
 
 err_unroll_sriov:
+	pf->vf = NULL;
+	devm_kfree(&pf->pdev->dev, vfs);
+	vfs = NULL;
+	pf->num_alloc_vfs = 0;
+err_pci_disable_sriov:
 	pci_disable_sriov(pf->pdev);
 err_unroll_intr:
 	/* rearm interrupts here */
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 018/249] batman-adv: fix for leaked TVLV handler.
From: Sasha Levin @ 2019-07-15 13:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jeremy Sowden, syzbot+d454a826e670502484b8, Simon Wunderlich,
	Sasha Levin, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Jeremy Sowden <jeremy@azazel.net>

[ Upstream commit 17f78dd1bd624a4dd78ed5db3284a63ee807fcc3 ]

A handler for BATADV_TVLV_ROAM was being registered when the
translation-table was initialized, but not unregistered when the
translation-table was freed.  Unregister it.

Fixes: 122edaa05940 ("batman-adv: tvlv - convert roaming adv packet to use tvlv unicast packets")
Reported-by: syzbot+d454a826e670502484b8@syzkaller.appspotmail.com
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/batman-adv/translation-table.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 1ddfd5e011ee..8a482c5ec67b 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3813,6 +3813,8 @@ static void batadv_tt_purge(struct work_struct *work)
  */
 void batadv_tt_free(struct batadv_priv *bat_priv)
 {
+	batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_ROAM, 1);
+
 	batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_TT, 1);
 	batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_TT, 1);
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 015/249] ath: DFS JP domain W56 fixed pulse type 3 RADAR detection
From: Sasha Levin @ 2019-07-15 13:43 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Anilkumar Kolli, Tamizh chelvam, Kalle Valo, Sasha Levin,
	linux-wireless, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Anilkumar Kolli <akolli@codeaurora.org>

[ Upstream commit d8792393a783158cbb2c39939cb897dc5e5299b6 ]

Increase pulse width range from 1-2usec to 0-4usec.
During data traffic HW occasionally fails detecting radar pulses,
so that SW cannot get enough radar reports to achieve the success rate.

Tested ath10k hw and fw:
	* QCA9888(10.4-3.5.1-00052)
	* QCA4019(10.4-3.2.1.1-00017)
	* QCA9984(10.4-3.6-00104)
	* QCA988X(10.2.4-1.0-00041)

Tested ath9k hw: AR9300

Tested-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/ath/dfs_pattern_detector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/dfs_pattern_detector.c b/drivers/net/wireless/ath/dfs_pattern_detector.c
index d52b31b45df7..a274eb0d1968 100644
--- a/drivers/net/wireless/ath/dfs_pattern_detector.c
+++ b/drivers/net/wireless/ath/dfs_pattern_detector.c
@@ -111,7 +111,7 @@ static const struct radar_detector_specs jp_radar_ref_types[] = {
 	JP_PATTERN(0, 0, 1, 1428, 1428, 1, 18, 29, false),
 	JP_PATTERN(1, 2, 3, 3846, 3846, 1, 18, 29, false),
 	JP_PATTERN(2, 0, 1, 1388, 1388, 1, 18, 50, false),
-	JP_PATTERN(3, 1, 2, 4000, 4000, 1, 18, 50, false),
+	JP_PATTERN(3, 0, 4, 4000, 4000, 1, 18, 50, false),
 	JP_PATTERN(4, 0, 5, 150, 230, 1, 23, 50, false),
 	JP_PATTERN(5, 6, 10, 200, 500, 1, 16, 50, false),
 	JP_PATTERN(6, 11, 20, 200, 500, 1, 12, 50, false),
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 011/249] ath6kl: add some bounds checking
From: Sasha Levin @ 2019-07-15 13:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dan Carpenter, Kalle Valo, Sasha Levin, linux-wireless, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Dan Carpenter <dan.carpenter@oracle.com>

[ Upstream commit 5d6751eaff672ea77642e74e92e6c0ac7f9709ab ]

The "ev->traffic_class" and "reply->ac" variables come from the network
and they're used as an offset into the wmi->stream_exist_for_ac[] array.
Those variables are u8 so they can be 0-255 but the stream_exist_for_ac[]
array only has WMM_NUM_AC (4) elements.  We need to add a couple bounds
checks to prevent array overflows.

I also modified one existing check from "if (traffic_class > 3) {" to
"if (traffic_class >= WMM_NUM_AC) {" just to make them all consistent.

Fixes: bdcd81707973 (" Add ath6kl cleaned up driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/ath/ath6kl/wmi.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath6kl/wmi.c b/drivers/net/wireless/ath/ath6kl/wmi.c
index 68854c45d0a4..9ab6aa9ded5c 100644
--- a/drivers/net/wireless/ath/ath6kl/wmi.c
+++ b/drivers/net/wireless/ath/ath6kl/wmi.c
@@ -1176,6 +1176,10 @@ static int ath6kl_wmi_pstream_timeout_event_rx(struct wmi *wmi, u8 *datap,
 		return -EINVAL;
 
 	ev = (struct wmi_pstream_timeout_event *) datap;
+	if (ev->traffic_class >= WMM_NUM_AC) {
+		ath6kl_err("invalid traffic class: %d\n", ev->traffic_class);
+		return -EINVAL;
+	}
 
 	/*
 	 * When the pstream (fat pipe == AC) timesout, it means there were
@@ -1517,6 +1521,10 @@ static int ath6kl_wmi_cac_event_rx(struct wmi *wmi, u8 *datap, int len,
 		return -EINVAL;
 
 	reply = (struct wmi_cac_event *) datap;
+	if (reply->ac >= WMM_NUM_AC) {
+		ath6kl_err("invalid AC: %d\n", reply->ac);
+		return -EINVAL;
+	}
 
 	if ((reply->cac_indication == CAC_INDICATION_ADMISSION_RESP) &&
 	    (reply->status_code != IEEE80211_TSPEC_STATUS_ADMISS_ACCEPTED)) {
@@ -2633,7 +2641,7 @@ int ath6kl_wmi_delete_pstream_cmd(struct wmi *wmi, u8 if_idx, u8 traffic_class,
 	u16 active_tsids = 0;
 	int ret;
 
-	if (traffic_class > 3) {
+	if (traffic_class >= WMM_NUM_AC) {
 		ath6kl_err("invalid traffic class: %d\n", traffic_class);
 		return -EINVAL;
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 004/249] ath9k: Don't trust TX status TID number when reporting airtime
From: Sasha Levin @ 2019-07-15 13:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Toke Høiland-Jørgensen, Miguel Catalan Cid, Kalle Valo,
	Sasha Levin, linux-wireless, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Toke Høiland-Jørgensen <toke@redhat.com>

[ Upstream commit 389b72e58259336c2d56d58b660b79cf4b9e0dcb ]

As already noted a comment in ath_tx_complete_aggr(), the hardware will
occasionally send a TX status with the wrong tid number. If we trust the
value, airtime usage will be reported to the wrong AC, which can cause the
deficit on that AC to become very low, blocking subsequent attempts to
transmit.

To fix this, account airtime usage to the TID number from the original skb,
instead of the one in the hardware TX status report.

Reported-by: Miguel Catalan Cid <miguel.catalan@i2cat.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/ath/ath9k/xmit.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
index b17e1ca40995..3be0aeedb9b5 100644
--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -668,7 +668,8 @@ static bool bf_is_ampdu_not_probing(struct ath_buf *bf)
 static void ath_tx_count_airtime(struct ath_softc *sc,
 				 struct ieee80211_sta *sta,
 				 struct ath_buf *bf,
-				 struct ath_tx_status *ts)
+				 struct ath_tx_status *ts,
+				 u8 tid)
 {
 	u32 airtime = 0;
 	int i;
@@ -679,7 +680,7 @@ static void ath_tx_count_airtime(struct ath_softc *sc,
 		airtime += rate_dur * bf->rates[i].count;
 	}
 
-	ieee80211_sta_register_airtime(sta, ts->tid, airtime, 0);
+	ieee80211_sta_register_airtime(sta, tid, airtime, 0);
 }
 
 static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,
@@ -709,7 +710,7 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq,
 	if (sta) {
 		struct ath_node *an = (struct ath_node *)sta->drv_priv;
 		tid = ath_get_skb_tid(sc, an, bf->bf_mpdu);
-		ath_tx_count_airtime(sc, sta, bf, ts);
+		ath_tx_count_airtime(sc, sta, bf, ts, tid->tidno);
 		if (ts->ts_status & (ATH9K_TXERR_FILT | ATH9K_TXERR_XRETRY))
 			tid->clear_ps_filter = true;
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 003/249] ath10k: fix incorrect multicast/broadcast rate setting
From: Sasha Levin @ 2019-07-15 13:42 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Pradeep kumar Chitrapu, Zhi Chen, Sven Eckelmann, Kalle Valo,
	Sasha Levin, ath10k, linux-wireless, netdev
In-Reply-To: <20190715134655.4076-1-sashal@kernel.org>

From: Pradeep kumar Chitrapu <pradeepc@codeaurora.org>

[ Upstream commit 93ee3d108fc77e19efeac3ec5aa7d5886711bfef ]

Invalid rate code is sent to firmware when multicast rate value of 0 is
sent to driver indicating disabled case, causing broken mesh path.
so fix that.

Tested on QCA9984 with firmware 10.4-3.6.1-00827

Sven tested on IPQ4019 with 10.4-3.5.3-00057 and QCA9888 with 10.4-3.5.3-00053
(ath10k-firmware) and 10.4-3.6-00140 (linux-firmware 2018-12-16-211de167).

Fixes: cd93b83ad92 ("ath10k: support for multicast rate control")
Co-developed-by: Zhi Chen <zhichen@codeaurora.org>
Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
Tested-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/ath/ath10k/mac.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 9c703d287333..e8997e22ceec 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -5588,8 +5588,8 @@ static void ath10k_bss_info_changed(struct ieee80211_hw *hw,
 	struct cfg80211_chan_def def;
 	u32 vdev_param, pdev_param, slottime, preamble;
 	u16 bitrate, hw_value;
-	u8 rate, basic_rate_idx;
-	int rateidx, ret = 0, hw_rate_code;
+	u8 rate, basic_rate_idx, rateidx;
+	int ret = 0, hw_rate_code, mcast_rate;
 	enum nl80211_band band;
 	const struct ieee80211_supported_band *sband;
 
@@ -5776,7 +5776,11 @@ static void ath10k_bss_info_changed(struct ieee80211_hw *hw,
 	if (changed & BSS_CHANGED_MCAST_RATE &&
 	    !ath10k_mac_vif_chan(arvif->vif, &def)) {
 		band = def.chan->band;
-		rateidx = vif->bss_conf.mcast_rate[band] - 1;
+		mcast_rate = vif->bss_conf.mcast_rate[band];
+		if (mcast_rate > 0)
+			rateidx = mcast_rate - 1;
+		else
+			rateidx = ffs(vif->bss_conf.basic_rates) - 1;
 
 		if (ar->phy_capability & WHAL_WLAN_11A_CAPABILITY)
 			rateidx += ATH10K_MAC_FIRST_OFDM_RATE_IDX;
-- 
2.20.1


^ permalink raw reply related

* IPv6 L2TP issues related to 93531c67
From: Paul Donohue @ 2019-07-15 16:18 UTC (permalink / raw)
  To: David Ahern; +Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev

I have a system that establishes four L2TP over IPv6 tunnels using site-local addresses via the following:
ip l2tp add tunnel tunnel_id 1233 peer_tunnel_id 1233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp1 tunnel_id 1233 session_id 1233 peer_session_id 1233
ip link set dev net_l2tp1 up
ip l2tp add tunnel tunnel_id 1235 peer_tunnel_id 1235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp2 tunnel_id 1235 session_id 1235 peer_session_id 1235
ip link set dev net_l2tp2 up
ip l2tp add tunnel tunnel_id 2233 peer_tunnel_id 2233 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:3
ip l2tp add session name net_l2tp3 tunnel_id 2233 session_id 2233 peer_session_id 2233
ip link set dev net_l2tp3 up
ip l2tp add tunnel tunnel_id 2235 peer_tunnel_id 2235 encap ip local fd23:2355:accd::2:4 remote fd23:2355:accd::2:2
ip l2tp add session name net_l2tp4 tunnel_id 2235 session_id 2235 peer_session_id 2235
ip link set dev net_l2tp4 up

These tunnels worked fine on kernel 4.4.  On kernel 4.15, there was a bug that caused intermittent L2TP packet errors, but everything worked fine after applying 4522a70db7aa5e77526a4079628578599821b193.

However, after upgrading to kernel 4.18 with 4522a70d (or upgrading to kernel 5.0 which includes 4522a70d, or upgrading to the current master kernel branch), two of the four tunnels always fail to work properly after a reboot, although it appears random which two work and which two fail.

When I say "fail to work properly", the problem is that packets generated by the l2tp kernel modules (in response to a packet being sent to the associated net_l2tpX interface) are silently dropped.  The l2tp_debugfs kernel module reports that L2TP packets are being transmitted with no errors, iptables counters and nflog rules can be used to confirm that well-formed packets are generated and sent, but tcpdump does not see the packets being sent on any interface on the system.  iptables reports that the destination interface of the lost packets is "lo" (which is clearly incorrect and probably an indicator of the underlying issue), but `tcpdump -nnn -i lo` doesn't show any packets.  Incoming L2TP packets appear to be processed correctly, only outgoing L2TP packets appear affected.

Reverting commit 93531c6743157d7e8c5792f8ed1a57641149d62c (identified by bisection) fixes this issue.

IPv4 L2TP tunnels do not appear affected by this issue.  Based on a few quick tests, it appears that switching to publicly-routable IPv6 addresses instead of site-local addresses seems to prevent this issue, although I haven't done sufficient testing of this, and it is not clear to me how the code in 93531c67 might be affected by the type of IPv6 address, so this observation may be a red herring.  Manually deleting and re-creating a broken interface seems to make it work again, although I have not thoroughly experimented with making changes after boot time to see if the problem is entirely random, if it is based on the number of existing interfaces, if it is based on a boot-time timing issue, etc.

It is not obvious to me how commit 93531c6743157d7e8c5792f8ed1a57641149d62c causes this issue, or how it should be fixed.  Could someone take a look and point me in the right direction for further troubleshooting?

Thanks!

^ permalink raw reply

* Re: [PATCH] mm/gup: Use put_user_page*() instead of put_page*()
From: Ira Weiny @ 2019-07-15 16:29 UTC (permalink / raw)
  To: Bharath Vedartham
  Cc: John Hubbard, akpm, Mauro Carvalho Chehab, Dimitri Sivanich,
	Arnd Bergmann, Greg Kroah-Hartman, Alex Williamson, Cornelia Huck,
	Jens Axboe, Alexander Viro, Björn Töpel,
	Magnus Karlsson, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Enrico Weigelt, Thomas Gleixner, Alexios Zavras,
	Dan Carpenter, Max Filippov, Matt Sickler, Kirill A. Shutemov,
	Keith Busch, YueHaibing, linux-media, linux-kernel, devel, kvm,
	linux-block, linux-fsdevel, linux-mm, netdev, bpf, xdp-newbies,
	Jason Gunthorpe
In-Reply-To: <20190715065654.GA3716@bharath12345-Inspiron-5559>

On Mon, Jul 15, 2019 at 12:26:54PM +0530, Bharath Vedartham wrote:
> On Sun, Jul 14, 2019 at 04:33:42PM -0700, John Hubbard wrote:
> > On 7/14/19 12:08 PM, Bharath Vedartham wrote:
> > > This patch converts all call sites of get_user_pages
> > > to use put_user_page*() instead of put_page*() functions to
> > > release reference to gup pinned pages.
> Hi John, 
> > Hi Bharath,
> > 
> > Thanks for jumping in to help, and welcome to the party!
> > 
> > You've caught everyone in the middle of a merge window, btw.  As a
> > result, I'm busy rebasing and reworking the get_user_pages call sites, 
> > and gup tracking, in the wake of some semi-traumatic changes to bio 
> > and gup and such. I plan to re-post right after 5.3-rc1 shows up, from 
> > here:
> > 
> >     https://github.com/johnhubbard/linux/commits/gup_dma_core
> > 
> > ...which you'll find already covers the changes you've posted, except for:
> > 
> >     drivers/misc/sgi-gru/grufault.c
> >     drivers/staging/kpc2000/kpc_dma/fileops.c
> > 
> > ...and this one, which is undergoing to larger local changes, due to
> > bvec, so let's leave it out of the choices:
> > 
> >     fs/io_uring.c
> > 
> > Therefore, until -rc1, if you'd like to help, I'd recommend one or more
> > of the following ideas:
> > 
> > 1. Pull down https://github.com/johnhubbard/linux/commits/gup_dma_core
> > and find missing conversions: look for any additional missing 
> > get_user_pages/put_page conversions. You've already found a couple missing 
> > ones. I haven't re-run a search in a long time, so there's probably even more.
> > 	a) And find more, after I rebase to 5.3-rc1: people probably are adding
> > 	get_user_pages() calls as we speak. :)
> Shouldn't this be documented then? I don't see any docs for using
> put_user_page*() in v5.2.1 in the memory management API section?
> > 2. Patches: Focus on just one subsystem at a time, and perfect the patch for
> > it. For example, I think this the staging driver would be perfect to start with:
> > 
> >     drivers/staging/kpc2000/kpc_dma/fileops.c
> > 
> > 	a) verify that you've really, corrected converted the whole
> > 	driver. (Hint: I think you might be overlooking a put_page call.)
> Yup. I did see that! Will fix it!
> > 	b) Attempt to test it if you can (I'm being hypocritical in
> > 	the extreme here, but one of my problems is that testing
> > 	has been light, so any help is very valuable). qemu...?
> > 	OTOH, maybe even qemu cannot easily test a kpc2000, but
> > 	perhaps `git blame` and talking to the authors would help
> > 	figure out a way to validate the changes.
> Great! I ll do that, I ll mail the patch authors and ask them for help
> in testing. 
> > 	Thinking about whether you can run a test that would prove or
> > 	disprove my claim in (a), above, could be useful in coming up
> > 	with tests to run.
> 
> > In other words, a few very high quality conversions (even just one) that
> > we can really put our faith in, is what I value most here. Tested patches
> > are awesome.
> I understand that! 
> > 3. Once I re-post, turn on the new CONFIG_DEBUG_GET_USER_PAGES_REFERENCES
> > and run things such as xfstest/fstest. (Again, doing so would be going
> > further than I have yet--very helpful). Help clarify what conversions have
> > actually been tested and work, and which ones remain unvalidated.
> > Other: Please note that this:
> Yup will do that.
> >     https://github.com/johnhubbard/linux/commits/gup_dma_core
> > 
> >     a) gets rebased often, and
> > 
> >     b) has a bunch of commits (iov_iter and related) that conflict
> >        with the latest linux.git,
> > 
> >     c) has some bugs in the bio area, that I'm fixing, so I don't trust
> >        that's it's safely runnable, for a few more days.
> I assume your repo contains only work related to fixing gup issues and
> not the main repo for gup development? i.e where gup changes are merged?

We have been using Andrews tree for merging.

> Also are release_pages and put_user_pages interchangable? 

Conceptually yes.  But release_pages is more efficient.  There was some
discussion around this starting here:

https://lore.kernel.org/lkml/20190523172852.GA27175@iweiny-DESK2.sc.intel.com/

And a resulting bug fix.

https://lkml.org/lkml/2019/6/21/95

Ira

> > One note below, for the future:
> > 
> > > 
> > > This is a bunch of trivial conversions which is a part of an effort
> > > by John Hubbard to solve issues with gup pinned pages and 
> > > filesystem writeback.
> > > 
> > > The issue is more clearly described in John Hubbard's patch[1] where
> > > put_user_page*() functions are introduced.
> > > 
> > > Currently put_user_page*() simply does put_page but future implementations
> > > look to change that once treewide change of put_page callsites to 
> > > put_user_page*() is finished.
> > > 
> > > The lwn article describing the issue with gup pinned pages and filesystem 
> > > writeback [2].
> > > 
> > > This patch has been tested by building and booting the kernel as I don't
> > > have the required hardware to test the device drivers.
> > > 
> > > I did not modify gpu/drm drivers which use release_pages instead of
> > > put_page() to release reference of gup pinned pages as I am not clear
> > > whether release_pages and put_page are interchangable. 
> > > 
> > > [1] https://lkml.org/lkml/2019/3/26/1396
> > 
> > When referring to patches in a commit description, please use the 
> > commit hash, not an external link. See Submitting Patches [1] for details.
> > 
> > Also, once you figure out the right maintainers and other involved people,
> > putting Cc: in the commit description is common practice, too.
> > 
> > [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html
> Will work on that! Thanks!
> > thanks,
> > -- 
> > John Hubbard
> > NVIDIA
> > 
> > > 
> > > [2] https://lwn.net/Articles/784574/
> > > 
> > > Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> > > ---
> > >  drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +--
> > >  drivers/misc/sgi-gru/grufault.c           | 2 +-
> > >  drivers/staging/kpc2000/kpc_dma/fileops.c | 4 +---
> > >  drivers/vfio/vfio_iommu_type1.c           | 2 +-
> > >  fs/io_uring.c                             | 7 +++----
> > >  mm/gup_benchmark.c                        | 6 +-----
> > >  net/xdp/xdp_umem.c                        | 7 +------
> > >  7 files changed, 9 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> > > index 66a6c6c..d6eeb43 100644
> > > --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> > > +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> > > @@ -349,8 +349,7 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
> > >  	BUG_ON(dma->sglen);
> > >  
> > >  	if (dma->pages) {
> > > -		for (i = 0; i < dma->nr_pages; i++)
> > > -			put_page(dma->pages[i]);
> > > +		put_user_pages(dma->pages, dma->nr_pages);
> > >  		kfree(dma->pages);
> > >  		dma->pages = NULL;
> > >  	}
> > > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > > index 4b713a8..61b3447 100644
> > > --- a/drivers/misc/sgi-gru/grufault.c
> > > +++ b/drivers/misc/sgi-gru/grufault.c
> > > @@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> > >  	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> > >  		return -EFAULT;
> > >  	*paddr = page_to_phys(page);
> > > -	put_page(page);
> > > +	put_user_page(page);
> > >  	return 0;
> > >  }
> > >  
> > > diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > > index 6166587..26dceed 100644
> > > --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> > > +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > > @@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
> > >  	sg_free_table(&acd->sgt);
> > >   err_dma_map_sg:
> > >   err_alloc_sg_table:
> > > -	for (i = 0 ; i < acd->page_count ; i++){
> > > -		put_page(acd->user_pages[i]);
> > > -	}
> > > +	put_user_pages(acd->user_pages, acd->page_count);
> > >   err_get_user_pages:
> > >  	kfree(acd->user_pages);
> > >   err_alloc_userpages:
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index add34ad..c491524 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -369,7 +369,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> > >  		 */
> > >  		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> > >  			ret = -EOPNOTSUPP;
> > > -			put_page(page[0]);
> > > +			put_user_page(page[0]);
> > >  		}
> > >  	}
> > >  	up_read(&mm->mmap_sem);
> > > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > > index 4ef62a4..b4a4549 100644
> > > --- a/fs/io_uring.c
> > > +++ b/fs/io_uring.c
> > > @@ -2694,10 +2694,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
> > >  			 * if we did partial map, or found file backed vmas,
> > >  			 * release any pages we did get
> > >  			 */
> > > -			if (pret > 0) {
> > > -				for (j = 0; j < pret; j++)
> > > -					put_page(pages[j]);
> > > -			}
> > > +			if (pret > 0)
> > > +				put_user_pages(pages, pret);
> > > +
> > >  			if (ctx->account_mem)
> > >  				io_unaccount_mem(ctx->user, nr_pages);
> > >  			kvfree(imu->bvec);
> > > diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> > > index 7dd602d..15fc7a2 100644
> > > --- a/mm/gup_benchmark.c
> > > +++ b/mm/gup_benchmark.c
> > > @@ -76,11 +76,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
> > >  	gup->size = addr - gup->addr;
> > >  
> > >  	start_time = ktime_get();
> > > -	for (i = 0; i < nr_pages; i++) {
> > > -		if (!pages[i])
> > > -			break;
> > > -		put_page(pages[i]);
> > > -	}
> > > +	put_user_pages(pages, nr_pages);
> > >  	end_time = ktime_get();
> > >  	gup->put_delta_usec = ktime_us_delta(end_time, start_time);
> > >  
> > > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > > index 9c6de4f..6103e19 100644
> > > --- a/net/xdp/xdp_umem.c
> > > +++ b/net/xdp/xdp_umem.c
> > > @@ -173,12 +173,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> > >  {
> > >  	unsigned int i;
> > >  
> > > -	for (i = 0; i < umem->npgs; i++) {
> > > -		struct page *page = umem->pgs[i];
> > > -
> > > -		set_page_dirty_lock(page);
> > > -		put_page(page);
> > > -	}
> > > +	put_user_pages_dirty_lock(umem->pgs, umem->npgs);
> > >  
> > >  	kfree(umem->pgs);
> > >  	umem->pgs = NULL;
> > > 

^ permalink raw reply

* Re: [PATCH net v3] net: neigh: fix multiple neigh timer scheduling
From: David Ahern @ 2019-07-15 16:32 UTC (permalink / raw)
  To: Lorenzo Bianconi, davem; +Cc: netdev, marek
In-Reply-To: <552d7c8de6a07e12f7b76791da953e81478138cd.1563134704.git.lorenzo.bianconi@redhat.com>

On 7/14/19 3:36 PM, Lorenzo Bianconi wrote:
> Neigh timer can be scheduled multiple times from userspace adding
> multiple neigh entries and forcing the neigh timer scheduling passing
> NTF_USE in the netlink requests.
> This will result in a refcount leak and in the following dump stack:
> 

...

> 
> Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
> receiving a netlink request with NTF_USE flag set
> 
> Reported-by: Marek Majkowski <marek@cloudflare.com>
> Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
> ---
> Changes since v2:
> - remove check_timer flag and run neigh_del_timer directly
> Changes since v1:
> - fix compilation errors defining neigh_event_send_check_timer routine
> ---
>  net/core/neighbour.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@gmail.com>


^ permalink raw reply

* [PATCH bpf 0/5] bpf: allow wide (u64) aligned loads for some fields of bpf_sock_addr
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song

When fixing selftests by adding support for wide stores, Yonghong
reported that he had seen some examples where clang generates
single u64 loads for two adjacent u32s as well:
http://lore.kernel.org/netdev/a66c937f-94c0-eaf8-5b37-8587d66c0c62@fb.com

Let's support aligned u64 reads for some bpf_sock_addr fields
as well.

(This can probably wait for bpf-next, I'll defer to Younhong and the
maintainers.)

Cc: Yonghong Song <yhs@fb.com>

Stanislav Fomichev (5):
  bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok
  bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and
    msg_src_ip6
  selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.c
  selftests/bpf: add selftests for wide loads
  bpf: sync bpf.h to tools/

 include/linux/filter.h                        |  2 +-
 include/uapi/linux/bpf.h                      |  4 +-
 net/core/filter.c                             | 24 ++++--
 tools/include/uapi/linux/bpf.h                |  4 +-
 .../selftests/bpf/verifier/wide_access.c      | 73 +++++++++++++++++++
 .../selftests/bpf/verifier/wide_store.c       | 36 ---------
 6 files changed, 95 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/wide_access.c
 delete mode 100644 tools/testing/selftests/bpf/verifier/wide_store.c

-- 
2.22.0.510.g264f2c817a-goog

^ permalink raw reply

* [PATCH bpf 1/5] bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok to indicate
that it can be used for both loads and stores.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/filter.h |  2 +-
 net/core/filter.c      | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6d944369ca87..ff65d22cf336 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -747,7 +747,7 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 	return size <= size_default && (size & (size - 1)) == 0;
 }
 
-#define bpf_ctx_wide_store_ok(off, size, type, field)			\
+#define bpf_ctx_wide_access_ok(off, size, type, field)			\
 	(size == sizeof(__u64) &&					\
 	off >= offsetof(type, field) &&					\
 	off + sizeof(__u64) <= offsetofend(type, field) &&		\
diff --git a/net/core/filter.c b/net/core/filter.c
index 47f6386fb17a..c5983ddb1a9f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6890,14 +6890,14 @@ static bool sock_addr_is_valid_access(int off, int size,
 			if (!bpf_ctx_narrow_access_ok(off, size, size_default))
 				return false;
 		} else {
-			if (bpf_ctx_wide_store_ok(off, size,
-						  struct bpf_sock_addr,
-						  user_ip6))
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   user_ip6))
 				return true;
 
-			if (bpf_ctx_wide_store_ok(off, size,
-						  struct bpf_sock_addr,
-						  msg_src_ip6))
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   msg_src_ip6))
 				return true;
 
 			if (size != size_default)
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 2/5] bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and msg_src_ip6
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Add explicit check for u64 loads of user_ip6 and msg_src_ip6 and
update the comment.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/uapi/linux/bpf.h |  4 ++--
 net/core/filter.c        | 12 +++++++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6f68438aa4ed..81be929b89fc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3248,7 +3248,7 @@ struct bpf_sock_addr {
 	__u32 user_ip4;		/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 user_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 user_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__u32 user_port;	/* Allows 4-byte read and write.
@@ -3260,7 +3260,7 @@ struct bpf_sock_addr {
 	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
diff --git a/net/core/filter.c b/net/core/filter.c
index c5983ddb1a9f..0f6854ccf894 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6884,9 +6884,19 @@ static bool sock_addr_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct bpf_sock_addr, msg_src_ip4):
 	case bpf_ctx_range_till(struct bpf_sock_addr, msg_src_ip6[0],
 				msg_src_ip6[3]):
-		/* Only narrow read access allowed for now. */
 		if (type == BPF_READ) {
 			bpf_ctx_record_field_size(info, size_default);
+
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   user_ip6))
+				return true;
+
+			if (bpf_ctx_wide_access_ok(off, size,
+						   struct bpf_sock_addr,
+						   msg_src_ip6))
+				return true;
+
 			if (!bpf_ctx_narrow_access_ok(off, size, size_default))
 				return false;
 		} else {
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 3/5] selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.c
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Move the file and rename internal BPF_SOCK_ADDR define to
BPF_SOCK_ADDR_STORE. This selftest will be extended in the next commit
with the wide loads.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/verifier/wide_access.c      | 36 +++++++++++++++++++
 .../selftests/bpf/verifier/wide_store.c       | 36 -------------------
 2 files changed, 36 insertions(+), 36 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/wide_access.c
 delete mode 100644 tools/testing/selftests/bpf/verifier/wide_store.c

diff --git a/tools/testing/selftests/bpf/verifier/wide_access.c b/tools/testing/selftests/bpf/verifier/wide_access.c
new file mode 100644
index 000000000000..3ac97328432f
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/wide_access.c
@@ -0,0 +1,36 @@
+#define BPF_SOCK_ADDR_STORE(field, off, res, err) \
+{ \
+	"wide store to bpf_sock_addr." #field "[" #off "]", \
+	.insns = { \
+	BPF_MOV64_IMM(BPF_REG_0, 1), \
+	BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, \
+		    offsetof(struct bpf_sock_addr, field[off])), \
+	BPF_EXIT_INSN(), \
+	}, \
+	.result = res, \
+	.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, \
+	.expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, \
+	.errstr = err, \
+}
+
+/* user_ip6[0] is u64 aligned */
+BPF_SOCK_ADDR_STORE(user_ip6, 0, ACCEPT,
+		    NULL),
+BPF_SOCK_ADDR_STORE(user_ip6, 1, REJECT,
+		    "invalid bpf_context access off=12 size=8"),
+BPF_SOCK_ADDR_STORE(user_ip6, 2, ACCEPT,
+		    NULL),
+BPF_SOCK_ADDR_STORE(user_ip6, 3, REJECT,
+		    "invalid bpf_context access off=20 size=8"),
+
+/* msg_src_ip6[0] is _not_ u64 aligned */
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 0, REJECT,
+		    "invalid bpf_context access off=44 size=8"),
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 1, ACCEPT,
+		    NULL),
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 2, REJECT,
+		    "invalid bpf_context access off=52 size=8"),
+BPF_SOCK_ADDR_STORE(msg_src_ip6, 3, REJECT,
+		    "invalid bpf_context access off=56 size=8"),
+
+#undef BPF_SOCK_ADDR_STORE
diff --git a/tools/testing/selftests/bpf/verifier/wide_store.c b/tools/testing/selftests/bpf/verifier/wide_store.c
deleted file mode 100644
index 8fe99602ded4..000000000000
--- a/tools/testing/selftests/bpf/verifier/wide_store.c
+++ /dev/null
@@ -1,36 +0,0 @@
-#define BPF_SOCK_ADDR(field, off, res, err) \
-{ \
-	"wide store to bpf_sock_addr." #field "[" #off "]", \
-	.insns = { \
-	BPF_MOV64_IMM(BPF_REG_0, 1), \
-	BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, \
-		    offsetof(struct bpf_sock_addr, field[off])), \
-	BPF_EXIT_INSN(), \
-	}, \
-	.result = res, \
-	.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, \
-	.expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, \
-	.errstr = err, \
-}
-
-/* user_ip6[0] is u64 aligned */
-BPF_SOCK_ADDR(user_ip6, 0, ACCEPT,
-	      NULL),
-BPF_SOCK_ADDR(user_ip6, 1, REJECT,
-	      "invalid bpf_context access off=12 size=8"),
-BPF_SOCK_ADDR(user_ip6, 2, ACCEPT,
-	      NULL),
-BPF_SOCK_ADDR(user_ip6, 3, REJECT,
-	      "invalid bpf_context access off=20 size=8"),
-
-/* msg_src_ip6[0] is _not_ u64 aligned */
-BPF_SOCK_ADDR(msg_src_ip6, 0, REJECT,
-	      "invalid bpf_context access off=44 size=8"),
-BPF_SOCK_ADDR(msg_src_ip6, 1, ACCEPT,
-	      NULL),
-BPF_SOCK_ADDR(msg_src_ip6, 2, REJECT,
-	      "invalid bpf_context access off=52 size=8"),
-BPF_SOCK_ADDR(msg_src_ip6, 3, REJECT,
-	      "invalid bpf_context access off=56 size=8"),
-
-#undef BPF_SOCK_ADDR
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 4/5] selftests/bpf: add selftests for wide loads
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Mirror existing wide store tests with wide loads. The only significant
difference is expected error string.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/verifier/wide_access.c      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/selftests/bpf/verifier/wide_access.c b/tools/testing/selftests/bpf/verifier/wide_access.c
index 3ac97328432f..ccade9312d21 100644
--- a/tools/testing/selftests/bpf/verifier/wide_access.c
+++ b/tools/testing/selftests/bpf/verifier/wide_access.c
@@ -34,3 +34,40 @@ BPF_SOCK_ADDR_STORE(msg_src_ip6, 3, REJECT,
 		    "invalid bpf_context access off=56 size=8"),
 
 #undef BPF_SOCK_ADDR_STORE
+
+#define BPF_SOCK_ADDR_LOAD(field, off, res, err) \
+{ \
+	"wide load from bpf_sock_addr." #field "[" #off "]", \
+	.insns = { \
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, \
+		    offsetof(struct bpf_sock_addr, field[off])), \
+	BPF_MOV64_IMM(BPF_REG_0, 1), \
+	BPF_EXIT_INSN(), \
+	}, \
+	.result = res, \
+	.prog_type = BPF_PROG_TYPE_CGROUP_SOCK_ADDR, \
+	.expected_attach_type = BPF_CGROUP_UDP6_SENDMSG, \
+	.errstr = err, \
+}
+
+/* user_ip6[0] is u64 aligned */
+BPF_SOCK_ADDR_LOAD(user_ip6, 0, ACCEPT,
+		   NULL),
+BPF_SOCK_ADDR_LOAD(user_ip6, 1, REJECT,
+		   "invalid bpf_context access off=12 size=8"),
+BPF_SOCK_ADDR_LOAD(user_ip6, 2, ACCEPT,
+		   NULL),
+BPF_SOCK_ADDR_LOAD(user_ip6, 3, REJECT,
+		   "invalid bpf_context access off=20 size=8"),
+
+/* msg_src_ip6[0] is _not_ u64 aligned */
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 0, REJECT,
+		   "invalid bpf_context access off=44 size=8"),
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 1, ACCEPT,
+		   NULL),
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 2, REJECT,
+		   "invalid bpf_context access off=52 size=8"),
+BPF_SOCK_ADDR_LOAD(msg_src_ip6, 3, REJECT,
+		   "invalid bpf_context access off=56 size=8"),
+
+#undef BPF_SOCK_ADDR_LOAD
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* [PATCH bpf 5/5] bpf: sync bpf.h to tools/
From: Stanislav Fomichev @ 2019-07-15 16:39 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

Update bpf_sock_addr comments to indicate support for 8-byte reads
from user_ip6 and msg_src_ip6.

Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f506c68b2612..1f61374fcf81 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3245,7 +3245,7 @@ struct bpf_sock_addr {
 	__u32 user_ip4;		/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 user_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 user_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__u32 user_port;	/* Allows 4-byte read and write.
@@ -3257,7 +3257,7 @@ struct bpf_sock_addr {
 	__u32 msg_src_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
 				 * Stored in network byte order.
 				 */
-	__u32 msg_src_ip6[4];	/* Allows 1,2,4-byte read and 4,8-byte write.
+	__u32 msg_src_ip6[4];	/* Allows 1,2,4,8-byte read and 4,8-byte write.
 				 * Stored in network byte order.
 				 */
 	__bpf_md_ptr(struct bpf_sock *, sk);
-- 
2.22.0.510.g264f2c817a-goog


^ permalink raw reply related

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Randy Dunlap @ 2019-07-15 16:43 UTC (permalink / raw)
  To: Stephen Rothwell, Linux Next Mailing List
  Cc: Linux Kernel Mailing List, linux-kbuild, Masahiro Yamada,
	netdev@vger.kernel.org
In-Reply-To: <20190715144848.4cc41e07@canb.auug.org.au>

On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> Hi all,
> 
> Please do not add v5.4 material to your linux-next included branches
> until after v5.3-rc1 has been released.
> 
> Changes since 20190712:
> 

Hi,

I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
for include/net/netfilter/nf_tables_offload.h.s:

  CC      include/net/netfilter/nf_tables_offload.h.s
In file included from ./../include/net/netfilter/nf_tables_offload.h:5:0,
                 from <command-line>:0:
../include/net/netfilter/nf_tables.h: In function ‘nft_gencursor_next’:
../include/net/netfilter/nf_tables.h:1223:14: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return net->nft.gencursor + 1 == 1 ? 1 : 0;
              ^~~
              nf
In file included from ../include/linux/kernel.h:11:0,
                 from ../include/net/flow_offload.h:4,
                 from ./../include/net/netfilter/nf_tables_offload.h:4,
                 from <command-line>:0:
../include/net/netfilter/nf_tables.h: In function ‘nft_genmask_cur’:
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:261:17: note: in definition of macro ‘__READ_ONCE’
  union { typeof(x) __val; char __c[1]; } __u;   \
                 ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:263:22: note: in definition of macro ‘__READ_ONCE’
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                      ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:263:42: note: in definition of macro ‘__READ_ONCE’
   __read_once_size(&(x), __u.__c, sizeof(x));  \
                                          ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:265:30: note: in definition of macro ‘__READ_ONCE’
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                              ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
  return 1 << READ_ONCE(net->nft.gencursor);
                             ^
../include/linux/compiler.h:265:50: note: in definition of macro ‘__READ_ONCE’
   __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                                                  ^
../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
  return 1 << READ_ONCE(net->nft.gencursor);
              ^~~~~~~~~
make[2]: *** [../scripts/Makefile.build:304: include/net/netfilter/nf_tables_offload.h.s] Error 1


Should this header file not be tested?

thanks.
-- 
~Randy

^ permalink raw reply

* Re: [PATCH bpf 0/5] bpf: allow wide (u64) aligned loads for some fields of bpf_sock_addr
From: Andrii Nakryiko @ 2019-07-15 17:16 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Yonghong Song
In-Reply-To: <20190715163956.204061-1-sdf@google.com>

On Mon, Jul 15, 2019 at 9:40 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> When fixing selftests by adding support for wide stores, Yonghong
> reported that he had seen some examples where clang generates
> single u64 loads for two adjacent u32s as well:
> http://lore.kernel.org/netdev/a66c937f-94c0-eaf8-5b37-8587d66c0c62@fb.com
>
> Let's support aligned u64 reads for some bpf_sock_addr fields
> as well.
>
> (This can probably wait for bpf-next, I'll defer to Younhong and the
> maintainers.)
>
> Cc: Yonghong Song <yhs@fb.com>
>
> Stanislav Fomichev (5):
>   bpf: rename bpf_ctx_wide_store_ok to bpf_ctx_wide_access_ok
>   bpf: allow wide aligned loads for bpf_sock_addr user_ip6 and
>     msg_src_ip6
>   selftests/bpf: rename verifier/wide_store.c to verifier/wide_access.c
>   selftests/bpf: add selftests for wide loads
>   bpf: sync bpf.h to tools/
>

LGTM!

For the series:

Acked-by: Andrii Narkyiko <andriin@fb.com>

>  include/linux/filter.h                        |  2 +-
>  include/uapi/linux/bpf.h                      |  4 +-
>  net/core/filter.c                             | 24 ++++--
>  tools/include/uapi/linux/bpf.h                |  4 +-
>  .../selftests/bpf/verifier/wide_access.c      | 73 +++++++++++++++++++
>  .../selftests/bpf/verifier/wide_store.c       | 36 ---------
>  6 files changed, 95 insertions(+), 48 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/verifier/wide_access.c
>  delete mode 100644 tools/testing/selftests/bpf/verifier/wide_store.c
>
> --
> 2.22.0.510.g264f2c817a-goog

^ permalink raw reply

* Re: INFO: task hung in unregister_netdevice_notifier (3)
From: Oliver Hartkopp @ 2019-07-15 17:16 UTC (permalink / raw)
  To: syzbot, davem, linux-can, linux-kernel, mkl, netdev,
	syzkaller-bugs, Kirill Tkhai
In-Reply-To: <000000000000d018ea058d9c46e3@google.com>

Hello all,

On 14.07.19 06:07, syzbot wrote:
> syzbot has found a reproducer for the following crash on:

the internal users of the CAN networking subsystem like CAN_BCM and 
CAN_RAW hold a number of CAN identifier subscriptions ('filters') for 
CAN netdevices (only type ARPHRD_CAN) in their socket data structures.

The per-socket netdevice notifier is used to manage the ad-hoc removal 
of these filters at netdevice removal time.

What I can see in the console output at

https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000

seems to be a race between an unknown register_netdevice_notifier() call 
("A") and the unregister_netdevice_notifier() ("B") likely invoked by 
bcm_release() ("C"):

[ 1047.294207][ T1049]  schedule+0xa8/0x270
[ 1047.318401][ T1049]  rwsem_down_write_slowpath+0x70a/0xf70
[ 1047.324114][ T1049]  ? downgrade_write+0x3c0/0x3c0
[ 1047.438644][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.443483][ T1049]  ? lock_acquire+0x190/0x410
[ 1047.448191][ T1049]  ? unregister_netdevice_notifier+0x7e/0x390
[ 1047.547227][ T1049]  down_write+0x13c/0x150
[ 1047.579535][ T1049]  ? down_write+0x13c/0x150
[ 1047.584106][ T1049]  ? __down_timeout+0x2d0/0x2d0
[ 1047.635356][ T1049]  ? mark_held_locks+0xf0/0xf0
[ 1047.640721][ T1049]  unregister_netdevice_notifier+0x7e/0x390  <- "B"
[ 1047.646667][ T1049]  ? __sock_release+0x89/0x280
[ 1047.709126][ T1049]  ? register_netdevice_notifier+0x630/0x630 <- "A"
[ 1047.715203][ T1049]  ? __kasan_check_write+0x14/0x20
[ 1047.775138][ T1049]  bcm_release+0x93/0x5e0                    <- "C"
[ 1047.795337][ T1049]  __sock_release+0xce/0x280
[ 1047.829016][ T1049]  sock_close+0x1e/0x30

The question to me is now:

Is the problem located in an (un)register_netdevice_notifier race OR is 
it generally a bad idea to call unregister_netdevice_notifier() in a 
sock release?

I've never seen that kind of problem in the wild. But if it would be the 
latter case wouldn't it be the same problem when someone unloads the 
kernel module at the 'wrong' time?

In commit 328fbe747ad46 ("net: Close race between {un, 
}register_netdevice_notifier() and setup_net()/cleanup_net()") Kirill 
Tkhai reviewed the calling site in CAN_RAW raw_release() which points to 
the same situation. Therefore added him to the recipient list.

Should down_write() be replaced with something like 
rwsem_down_write_slowpath()??

Regards,
Oliver

> HEAD commit:    a2d79c71 Merge tag 'for-5.3/io_uring-20190711' of 
> git://gi..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e45f0fa00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3539b1747f03988e
> dashboard link: 
> https://syzkaller.appspot.com/bug?extid=0f1827363a305f74996f
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1765c52fa00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f1827363a305f74996f@syzkaller.appspotmail.com
> 
> INFO: task syz-executor.4:9527 blocked for more than 143 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.4  D28136  9527   9356 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.2:9528 blocked for more than 145 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.2  D28136  9528   9354 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 
> 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 
> c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.0:9529 blocked for more than 147 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.0  D28136  9529   9353 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.5:9533 blocked for more than 148 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.5  D28136  9533   9358 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 5f fe ff ff 31 c9 31 f6 41 b9 b0 20 41 00 41 b8 8c d6 65 00 ba 02 
> 00 00 00 bf 28 38 44 00 ff 15 7d a1 24 00 85 c0 0f 85 37 fe <ff> ff 31 
> c9 31 f6 41 b9 b0 20 41 00 41 b8 90 d6 65 00 ba 03 00 00
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.1:9534 blocked for more than 148 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.1  D28136  9534   9359 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> INFO: task syz-executor.3:9535 blocked for more than 150 seconds.
>        Not tainted 5.2.0+ #80
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.3  D28136  9535   9351 0x00000004
> Call Trace:
>   context_switch kernel/sched/core.c:3252 [inline]
>   __schedule+0x755/0x1580 kernel/sched/core.c:3878
>   schedule+0xa8/0x270 kernel/sched/core.c:3942
>   rwsem_down_write_slowpath+0x70a/0xf70 kernel/locking/rwsem.c:1198
>   __down_write kernel/locking/rwsem.c:1349 [inline]
>   down_write+0x13c/0x150 kernel/locking/rwsem.c:1485
>   unregister_netdevice_notifier+0x7e/0x390 net/core/dev.c:1713
>   bcm_release+0x93/0x5e0 net/can/bcm.c:1525
>   __sock_release+0xce/0x280 net/socket.c:586
>   sock_close+0x1e/0x30 net/socket.c:1264
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:185 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413501
> Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 
> 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 
> 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:0000000000a6fbc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000413501
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> RBP: 0000000000000001 R08: ffffffffffffffff R09: ffffffffffffffff
> R10: 0000000000a6fca0 R11: 0000000000000293 R12: 000000000075c9a0
> R13: 000000000075c9a0 R14: 00000000007619c8 R15: ffffffffffffffff
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1049:
>   #0: 00000000ede263b0 (rcu_read_lock){....}, at: 
> debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5257
> 1 lock held by rsyslogd/9208:
>   #0: 00000000da20b59a (&f->f_pos_lock){+.+.}, at: 
> __fdget_pos+0xee/0x110 fs/file.c:801
> 2 locks held by getty/9298:
>   #0: 00000000e9efae0d (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000007287a12 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9299:
>   #0: 00000000ad0733b0 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000094dd5193 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9300:
>   #0: 00000000692c340f (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 00000000538c7d7d (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9301:
>   #0: 00000000116ea6c7 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 00000000a908a9f7 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9302:
>   #0: 0000000042704f01 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 0000000041cc8671 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9303:
>   #0: 000000001ef3b293 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
>   #1: 000000008b703302 (&ldata->atomic_read_lock){+.+.}, at: 
> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156
> 2 locks held by getty/9304:
>   #0: 0000000095601bb0 (&tty->ldisc_sem){++++}, at: 
> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
> 

^ permalink raw reply

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Laura Garcia @ 2019-07-15 17:28 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, linux-kbuild, Masahiro Yamada,
	netdev@vger.kernel.org, Netfilter Development Mailing list
In-Reply-To: <ccb5b818-c191-2d9e-311f-b2c79b7f6823@infradead.org>

CC'ing netfilter.

On Mon, Jul 15, 2019 at 6:45 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> > Hi all,
> >
> > Please do not add v5.4 material to your linux-next included branches
> > until after v5.3-rc1 has been released.
> >
> > Changes since 20190712:
> >
>
> Hi,
>
> I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
> for include/net/netfilter/nf_tables_offload.h.s:
>
>   CC      include/net/netfilter/nf_tables_offload.h.s
> In file included from ./../include/net/netfilter/nf_tables_offload.h:5:0,
>                  from <command-line>:0:
> ../include/net/netfilter/nf_tables.h: In function ‘nft_gencursor_next’:
> ../include/net/netfilter/nf_tables.h:1223:14: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return net->nft.gencursor + 1 == 1 ? 1 : 0;
>               ^~~
>               nf
> In file included from ../include/linux/kernel.h:11:0,
>                  from ../include/net/flow_offload.h:4,
>                  from ./../include/net/netfilter/nf_tables_offload.h:4,
>                  from <command-line>:0:
> ../include/net/netfilter/nf_tables.h: In function ‘nft_genmask_cur’:
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:261:17: note: in definition of macro ‘__READ_ONCE’
>   union { typeof(x) __val; char __c[1]; } __u;   \
>                  ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:263:22: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size(&(x), __u.__c, sizeof(x));  \
>                       ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:263:42: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size(&(x), __u.__c, sizeof(x));  \
>                                           ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:265:30: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
>                               ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> ../include/net/netfilter/nf_tables.h:1234:29: error: ‘const struct net’ has no member named ‘nft’; did you mean ‘nf’?
>   return 1 << READ_ONCE(net->nft.gencursor);
>                              ^
> ../include/linux/compiler.h:265:50: note: in definition of macro ‘__READ_ONCE’
>    __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
>                                                   ^
> ../include/net/netfilter/nf_tables.h:1234:14: note: in expansion of macro ‘READ_ONCE’
>   return 1 << READ_ONCE(net->nft.gencursor);
>               ^~~~~~~~~
> make[2]: *** [../scripts/Makefile.build:304: include/net/netfilter/nf_tables_offload.h.s] Error 1
>
>
> Should this header file not be tested?
>
> thanks.
> --
> ~Randy

^ permalink raw reply

* Re: linux-next: Tree for Jul 15 (HEADERS_TEST w/ netfilter tables offload)
From: Pablo Neira Ayuso @ 2019-07-15 17:33 UTC (permalink / raw)
  To: Laura Garcia
  Cc: Randy Dunlap, Stephen Rothwell, Linux Next Mailing List,
	Linux Kernel Mailing List, linux-kbuild, Masahiro Yamada,
	netdev@vger.kernel.org, Netfilter Development Mailing list
In-Reply-To: <CAF90-WirEMg7arNOTmo+tyJ20rt_zeN=nr0OO6Qk0Ss8J4QrUA@mail.gmail.com>

On Mon, Jul 15, 2019 at 07:28:04PM +0200, Laura Garcia wrote:
> CC'ing netfilter.
> 
> On Mon, Jul 15, 2019 at 6:45 PM Randy Dunlap <rdunlap@infradead.org> wrote:
> >
> > On 7/14/19 9:48 PM, Stephen Rothwell wrote:
> > > Hi all,
> > >
> > > Please do not add v5.4 material to your linux-next included branches
> > > until after v5.3-rc1 has been released.
> > >
> > > Changes since 20190712:
> > >
> >
> > Hi,
> >
> > I am seeing these build errors from HEADERS_TEST (or KERNEL_HEADERS_TEST)
> > for include/net/netfilter/nf_tables_offload.h.s:
> >
> >   CC      include/net/netfilter/nf_tables_offload.h.s
[...]
> > Should this header file not be tested?

Yes, it should indeed be added.

^ permalink raw reply

* Re: [PATCH iproute2] utils: don't match empty strings as prefixes
From: Stephen Hemminger @ 2019-07-15 17:37 UTC (permalink / raw)
  To: Matteo Croce; +Cc: netdev, David Ahern
In-Reply-To: <CAGnkfhyyJJR0frmO7Z+bviu6xYnJVitw-G0Nzgv9UQ2PYO1goA@mail.gmail.com>

On Sun, 14 Jul 2019 16:57:54 +0200
Matteo Croce <mcroce@redhat.com> wrote:

> On Wed, Jul 10, 2019 at 1:18 AM Matteo Croce <mcroce@redhat.com> wrote:
> >
> > On Tue, Jul 9, 2019 at 11:38 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:  
> > >
> > > On Tue,  9 Jul 2019 22:40:40 +0200
> > > Matteo Croce <mcroce@redhat.com> wrote:
> > >  
> > > > iproute has an utility function which checks if a string is a prefix for
> > > > another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
> > > > instead of 'address'.
> > > >
> > > > This routine unfortunately considers an empty string as prefix
> > > > of any pattern, leading to undefined behaviour when an empty
> > > > argument is passed to ip:
> > > >
> > > >     # ip ''
> > > >     1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> > > >         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > > >         inet 127.0.0.1/8 scope host lo
> > > >            valid_lft forever preferred_lft forever
> > > >         inet6 ::1/128 scope host
> > > >            valid_lft forever preferred_lft forever
> > > >
> > > >     # tc ''
> > > >     qdisc noqueue 0: dev lo root refcnt 2
> > > >
> > > >     # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
> > > >     # ip addr show dev dummy0
> > > >     6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> > > >         link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
> > > >         inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
> > > >            valid_lft forever preferred_lft forever
> > > >
> > > > Rewrite matches() so it takes care of an empty input, and doesn't
> > > > scan the input strings three times: the actual implementation
> > > > does 2 strlen and a memcpy to accomplish the same task.
> > > >
> > > > Signed-off-by: Matteo Croce <mcroce@redhat.com>
> > > > ---
> > > >  include/utils.h |  2 +-
> > > >  lib/utils.c     | 14 +++++++++-----
> > > >  2 files changed, 10 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/include/utils.h b/include/utils.h
> > > > index 927fdc17..f4d12abb 100644
> > > > --- a/include/utils.h
> > > > +++ b/include/utils.h
> > > > @@ -198,7 +198,7 @@ int nodev(const char *dev);
> > > >  int check_ifname(const char *);
> > > >  int get_ifname(char *, const char *);
> > > >  const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
> > > > -int matches(const char *arg, const char *pattern);
> > > > +int matches(const char *prefix, const char *string);
> > > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
> > > >  int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
> > > >
> > > > diff --git a/lib/utils.c b/lib/utils.c
> > > > index be0f11b0..73ce19bb 100644
> > > > --- a/lib/utils.c
> > > > +++ b/lib/utils.c
> > > > @@ -887,13 +887,17 @@ const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
> > > >       return name;
> > > >  }
> > > >
> > > > -int matches(const char *cmd, const char *pattern)
> > > > +/* Check if 'prefix' is a non empty prefix of 'string' */
> > > > +int matches(const char *prefix, const char *string)
> > > >  {
> > > > -     int len = strlen(cmd);
> > > > +     if (!*prefix)
> > > > +             return 1;
> > > > +     while(*string && *prefix == *string) {
> > > > +             prefix++;
> > > > +             string++;
> > > > +     }
> > > >
> > > > -     if (len > strlen(pattern))
> > > > -             return -1;
> > > > -     return memcmp(pattern, cmd, len);
> > > > +     return *prefix;
> > > >  }
> > > >
> > > >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits)  
> > >
> > > ERROR: space required before the open parenthesis '('
> > > #134: FILE: lib/utils.c:895:
> > > +       while(*string && *prefix == *string) {
> > >
> > > total: 1 errors, 1 warnings, 30 lines checked
> > >
> > > The empty prefix string is a bug and should not be allowed.
> > > Also return value should be same as old code (yours isn't).
> > >
> > >
> > >  
> >
> > The old return value was the difference between the first pair of
> > bytes, according to the memcmp manpage.
> > All calls only checks if the matches() return value is 0 or not 0:
> >
> > iproute2$ git grep 'matches(' |grep -v -e '== 0' -e '= 0' -e '!matches('
> > include/utils.h:int matches(const char *prefix, const char *string);
> > include/xtables.h:extern void xtables_register_matches(struct
> > xtables_match *, unsigned int);
> > lib/color.c:    if (matches(dup, "-color"))
> > lib/utils.c:int matches(const char *prefix, const char *string)
> > tc/tc.c:                if (matches(argv[0], iter->c))
> >
> > Is it a problem if it returns a non negative value for non matching strings?
> >
> > Regards,
> >
> >
> > --
> > Matteo Croce
> > per aspera ad upstream  
> 
> Hi Stephen,
> 
> should I send a v2 which keeps the old behaviour, even if noone checks
> for all the values?
> Just to clarify, the old behaviour of matches(cmd, pattern) was:
> 
> -1 if len(cmd) > len(pattern)
> 0 if pattern is equal to cmd
> 0 if pattern starts with cmd
> < 0 if pattern is alphabetically lower than cmd
> > 0 if pattern is alphabetically higher than cmd  
> 
> Regards,

Maybe time to make matches() into a boolean since that is how it is used.

^ permalink raw reply

* Re: [PATCH] ath10k: work around uninitialized vht_pfr variable
From: Kalle Valo @ 2019-07-15 17:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Arnd Bergmann, Miaoqing Pan, David S. Miller, Rakesh Pillai,
	Brian Norris, Balaji Pothunoori, Wen Gong, Pradeep kumar Chitrapu,
	Sriram R, ath10k, linux-wireless, netdev, linux-kernel,
	clang-built-linux
In-Reply-To: <20190708125050.3689133-1-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> wrote:

> As clang points out, the vht_pfr is assigned to a struct member
> without being initialized in one case:
> 
> drivers/net/wireless/ath/ath10k/mac.c:7528:7: error: variable 'vht_pfr' is used uninitialized whenever 'if' condition
>       is false [-Werror,-Wsometimes-uninitialized]
>                 if (!ath10k_mac_can_set_bitrate_mask(ar, band, mask,
>                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7551:20: note: uninitialized use occurs here
>                 arvif->vht_pfr = vht_pfr;
>                                  ^~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7528:3: note: remove the 'if' if its condition is always true
>                 if (!ath10k_mac_can_set_bitrate_mask(ar, band, mask,
>                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7483:12: note: initialize the variable 'vht_pfr' to silence this warning
>         u8 vht_pfr;
> 
> Add an explicit but probably incorrect initialization here.
> I suspect we want a better fix here, but chose this approach to
> illustrate the issue.
> 
> Fixes: 8b97b055dc9d ("ath10k: fix failure to set multiple fixed rate")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>

Patch applied to wireless-drivers.git, thanks.

ff414f31ce37 ath10k: work around uninitialized vht_pfr variable

-- 
https://patchwork.kernel.org/patch/11034993/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Michael S. Tsirkin @ 2019-07-15 17:50 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: Jason Wang, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190715074416.a3s2i5ausognotbn@steredhat>

On Mon, Jul 15, 2019 at 09:44:16AM +0200, Stefano Garzarella wrote:
> On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> > 
> > On 2019/7/12 下午6:00, Stefano Garzarella wrote:
> > > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote:
> > > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote:
> > > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote:
> > > > > > > Hi,
> > > > > > > as Jason suggested some months ago, I looked better at the virtio-net driver to
> > > > > > > understand if we can reuse some parts also in the virtio-vsock driver, since we
> > > > > > > have similar challenges (mergeable buffers, page allocation, small
> > > > > > > packets, etc.).
> > > > > > > 
> > > > > > > Initially, I would add the skbuff in the virtio-vsock in order to re-use
> > > > > > > receive_*() functions.
> > > > > > 
> > > > > > Yes, that will be a good step.
> > > > > > 
> > > > > Okay, I'll go on this way.
> > > > > 
> > > > > > > Then I would move receive_[small, big, mergeable]() and
> > > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net driver, in order to
> > > > > > > call them also from virtio-vsock. I need to do some refactoring (e.g. leave the
> > > > > > > XDP part on the virtio-net driver), but I think it is feasible.
> > > > > > > 
> > > > > > > The idea is to create a virtio-skb.[h,c] where put these functions and a new
> > > > > > > object where stores some attributes needed (e.g. hdr_len ) and status (e.g.
> > > > > > > some fields of struct receive_queue).
> > > > > > 
> > > > > > My understanding is we could be more ambitious here. Do you see any blocker
> > > > > > for reusing virtio-net directly? It's better to reuse not only the functions
> > > > > > but also the logic like NAPI to avoid re-inventing something buggy and
> > > > > > duplicated.
> > > > > > 
> > > > > These are my concerns:
> > > > > - virtio-vsock is not a "net_device", so a lot of code related to
> > > > >    ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, offloading) will be
> > > > >    not used by virtio-vsock.
> > 
> > 
> > Linux support device other than ethernet, so it should not be a problem.
> > 
> > 
> > > > > 
> > > > > - virtio-vsock has a different header. We can consider it as part of
> > > > >    virtio_net payload, but it precludes the compatibility with old hosts. This
> > > > >    was one of the major doubts that made me think about using only the
> > > > >    send/recv skbuff functions, that it shouldn't break the compatibility.
> > 
> > 
> > We can extend the current vnet header helper for it to work for vsock.
> 
> Okay, I'll do it.
> 
> > 
> > 
> > > > > 
> > > > > > > This is an idea of virtio-skb.h that
> > > > > > > I have in mind:
> > > > > > >       struct virtskb;
> > > > > > 
> > > > > > What fields do you want to store in virtskb? It looks to be exist sk_buff is
> > > > > > flexible enough to us?
> > > > > My idea is to store queues information, like struct receive_queue or
> > > > > struct send_queue, and some device attributes (e.g. hdr_len ).
> > 
> > 
> > If you reuse skb or virtnet_info, there is not necessary.
> > 
> 
> Okay.
> 
> > 
> > > > > 
> > > > > > 
> > > > > > >       struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
> > > > > > >       struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
> > > > > > >       struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);
> > > > > > > 
> > > > > > >       int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
> > > > > > >       int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > > > > > >       int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
> > > > > > > 
> > > > > > > For the Guest->Host path it should be easier, so maybe I can add a
> > > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
> > > > > > > of xmit_skb().
> > > > > > 
> > > > > > I may miss something, but I don't see any thing that prevents us from using
> > > > > > xmit_skb() directly.
> > > > > > 
> > > > > Yes, but my initial idea was to make it more parametric and not related to the
> > > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the
> > > > > 'num_buffers' should be handled by the caller.
> > > > > 
> > > > > > > Let me know if you have in mind better names or if I should put these function
> > > > > > > in another place.
> > > > > > > 
> > > > > > > I would like to leave the control part completely separate, so, for example,
> > > > > > > the two drivers will negotiate the features independently and they will call
> > > > > > > the right virtskb_receive_*() function based on the negotiation.
> > > > > > 
> > > > > > If it's one the issue of negotiation, we can simply change the
> > > > > > virtnet_probe() to deal with different devices.
> > > > > > 
> > > > > > 
> > > > > > > I already started to work on it, but before to do more steps and send an RFC
> > > > > > > patch, I would like to hear your opinion.
> > > > > > > Do you think that makes sense?
> > > > > > > Do you see any issue or a better solution?
> > > > > > 
> > > > > > I still think we need to seek a way of adding some codes on virtio-net.c
> > > > > > directly if there's no huge different in the processing of TX/RX. That would
> > > > > > save us a lot time.
> > > > > After the reading of the buffers from the virtqueue I think the process
> > > > > is slightly different, because virtio-net will interface with the network
> > > > > stack, while virtio-vsock will interface with the vsock-core (socket).
> > > > > So the virtio-vsock implements the following:
> > > > > - control flow mechanism to avoid to loose packets, informing the peer
> > > > >    about the amount of memory available in the receive queue using some
> > > > >    fields in the virtio_vsock_hdr
> > > > > - de-multiplexing parsing the virtio_vsock_hdr and choosing the right
> > > > >    socket depending on the port
> > > > > - socket state handling
> > 
> > 
> > I think it's just a branch, for ethernet, go for networking stack. otherwise
> > go for vsock core?
> > 
> 
> Yes, that should work.
> 
> So, I should refactor the functions that can be called also from the vsock
> core, in order to remove "struct net_device *dev" parameter.
> Maybe creating some wrappers for the network stack.
> 
> Otherwise I should create a fake net_device for vsock_core.
> 
> What do you suggest?

Neither.

I think what Jason was saying all along is this:

virtio net doesn't actually lose packets, at least most
of the time. And it actually most of the time
passes all packets to host. So it's possible to use a virtio net
device (possibly with a feature flag that says "does not lose packets,
all packets go to host") and build vsock on top.

and all of this is nice, but don't expect anything easy,
or any quick results.

Also, in a sense it's a missed opportunity: we could cut out a lot
of fat and see just how fast can a protocol that is completely
new and separate from networking stack go.
Instead vsock implementation carries so much baggage from both
networking stack - such as softirq processing - and itself such as
workqueues, global state and crude locking - to the point where
it's actually slower than TCP.

> > 
> > > > > 
> > > > > We can use the virtio-net as transport, but we should add a lot of
> > > > > code to skip "net device" stuff when it is used by the virtio-vsock.
> > 
> > 
> > This could be another choice, but consider it was not transparent to the
> > admin and require new features, we may seek a transparent solution here.
> > 
> > 
> > > > > This could break something in virtio-net, for this reason, I thought to reuse
> > > > > only the send/recv functions starting from the idea to split the virtio-net
> > > > > driver in two parts:
> > > > > a. one with all stuff related to the network stack
> > > > > b. one with the stuff needed to communicate with the host
> > > > > 
> > > > > And use skbuff to communicate between parts. In this way, virtio-vsock
> > > > > can use only the b part.
> > > > > 
> > > > > Maybe we can do this split in a better way, but I'm not sure it is
> > > > > simple.
> > > > > 
> > > > > Thanks,
> > > > > Stefano
> > > > Frankly, skb is a huge structure which adds a lot of
> > > > overhead. I am not sure that using it is such a great idea
> > > > if building a device that does not have to interface
> > > > with the networking stack.
> > 
> > 
> > I believe vsock is mainly used for stream performance not for PPS. So the
> > impact should be minimal. We can use other metadata, just need branch in
> > recv_xxx().
> > 
> 
> Yes, I think stream performance is the case.
> 
> > 
> > > Thanks for the advice!
> > > 
> > > > So I agree with Jason in theory. To clarify, he is basically saying
> > > > current implementation is all wrong, it should be a protocol and we
> > > > should teach networking stack that there are reliable net devices that
> > > > handle just this protocol. We could add a flag in virtio net that
> > > > will say it's such a device.
> > > > 
> > > > Whether it's doable, I don't know, and it's definitely not simple - in
> > > > particular you will have to also re-implement existing devices in these
> > > > terms, and not just virtio - vmware vsock too.
> > 
> > 
> > Merging vsock protocol to exist networking stack could be a long term goal,
> > I believe for the first phase, we can seek to use virtio-net first.
> >
> 
> Yes, I agree.
> 
> > 
> > > > 
> > > > If you want to do a POC you can add a new address family,
> > > > that's easier.
> > > Very interesting!
> > > I agree with you. In this way we can completely split the protocol
> > > logic, from the device.
> > > 
> > > As you said, it will not simple to do, but can be an opportunity to learn
> > > better the Linux networking stack!
> > > I'll try to do a PoC with AF_VSOCK2 that will use the virtio-net.
> > 
> > 
> > I suggest to do this step by step:
> > 
> > 1) use virtio-net but keep some protocol logic
> > 
> > 2) separate protocol logic and merge it to exist Linux networking stack
> 
> Make sense, thanks for the suggestions, I'll try to do these steps!
> 
> Thanks,
> Stefano


An alternative is look at sources of overhead in vsock and get rid of
them, or rewrite it from scratch focusing on performance.


-- 
MST

^ permalink raw reply

* Re: [PATCH v2 1/2] rt2x00usb: fix rx queue hang
From: Kalle Valo @ 2019-07-15 17:53 UTC (permalink / raw)
  To: Soeren Moch
  Cc: Stanislaw Gruszka, Soeren Moch, stable, Helmut Schaa,
	David S. Miller, linux-wireless, netdev, linux-kernel
In-Reply-To: <20190701105314.9707-1-smoch@web.de>

Soeren Moch <smoch@web.de> wrote:

> Since commit ed194d136769 ("usb: core: remove local_irq_save() around
>  ->complete() handler") the handler rt2x00usb_interrupt_rxdone() is
> not running with interrupts disabled anymore. So this completion handler
> is not guaranteed to run completely before workqueue processing starts
> for the same queue entry.
> Be sure to set all other flags in the entry correctly before marking
> this entry ready for workqueue processing. This way we cannot miss error
> conditions that need to be signalled from the completion handler to the
> worker thread.
> Note that rt2x00usb_work_rxdone() processes all available entries, not
> only such for which queue_work() was called.
> 
> This patch is similar to what commit df71c9cfceea ("rt2x00: fix order
> of entry flags modification") did for TX processing.
> 
> This fixes a regression on a RT5370 based wifi stick in AP mode, which
> suddenly stopped data transmission after some period of heavy load. Also
> stopping the hanging hostapd resulted in the error message "ieee80211
> phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
> Other operation modes are probably affected as well, this just was
> the used testcase.
> 
> Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")
> Cc: stable@vger.kernel.org # 4.20+
> Signed-off-by: Soeren Moch <smoch@web.de>
> Acked-by: Stanislaw Gruszka <sgruszka@redhat.com>

Patch applied to wireless-drivers.git, thanks.

41a531ffa4c5 rt2x00usb: fix rx queue hang

-- 
https://patchwork.kernel.org/patch/11025561/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox