Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 0/2] allow userspace to query device features
From: Zhu Lingshan @ 2022-08-15  9:26 UTC (permalink / raw)
  To: jasowang, mst
  Cc: virtualization, netdev, kvm, parav, xieyongji, gautam.dawar,
	Zhu Lingshan

This series allows userspace to query device features of
a vDPA device through a new netlink attr
VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES

This series also make some fields of virtio-net 
device config space conditional on the feature bits,
this means:

MTU should be conditional on VIRTIO_F_NET_MTU
MAC should be conditional on VIRTIO_F_NET_MAC
MQ should be conditional on VIRTIO_F_NET_MQ

For details, please refer to commit message
of patch 2/2

Thanks!

Zhu Lingshan (2):
  vDPA: allow userspace to query features of a vDPA device
  vDPA: conditionally read fields in virtio-net dev

 drivers/vdpa/vdpa.c       | 71 +++++++++++++++++++++++++++++++--------
 include/uapi/linux/vdpa.h |  3 ++
 2 files changed, 60 insertions(+), 14 deletions(-)

-- 
2.31.1


^ permalink raw reply

* Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Russell King (Oracle) @ 2022-08-15  9:19 UTC (permalink / raw)
  To: Wei Fang
  Cc: Krzysztof Kozlowski, andrew@lunn.ch, hkallweit1@gmail.com,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, robh+dt@kernel.org,
	krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
	netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <DB9PR04MB8106851412412A65DF0A6F5388689@DB9PR04MB8106.eurprd04.prod.outlook.com>

On Mon, Aug 15, 2022 at 08:51:32AM +0000, Wei Fang wrote:
> 
> 
> > -----Original Message-----
> > From: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> > Sent: 2022年8月12日 19:26
> > To: Wei Fang <wei.fang@nxp.com>; andrew@lunn.ch; hkallweit1@gmail.com;
> > linux@armlinux.org.uk; davem@davemloft.net; edumazet@google.com;
> > kuba@kernel.org; pabeni@redhat.com; robh+dt@kernel.org;
> > krzysztof.kozlowski+dt@linaro.org; f.fainelli@gmail.com;
> > netdev@vger.kernel.org; devicetree@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation
> > property
> > 
> > On 12/08/2022 12:02, Wei Fang wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> > >> Sent: 2022年8月12日 15:28
> > >> To: Wei Fang <wei.fang@nxp.com>; andrew@lunn.ch;
> > >> hkallweit1@gmail.com; linux@armlinux.org.uk; davem@davemloft.net;
> > >> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> > >> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> > >> f.fainelli@gmail.com; netdev@vger.kernel.org;
> > >> devicetree@vger.kernel.org; linux-kernel@vger.kernel.org
> > >> Subject: Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation
> > >> property
> > >>
> > >> On 12/08/2022 17:50, wei.fang@nxp.com wrote:
> > >>> From: Wei Fang <wei.fang@nxp.com>
> > >>>
> > >>
> > >> Please use subject prefix matching subsystem.
> > >>
> > > Ok, I'll add the subject prefix.
> > >
> > >>> The hibernation mode of Atheros AR803x PHYs is default enabled.
> > >>> When the cable is unplugged, the PHY will enter hibernation mode and
> > >>> the PHY clock does down. For some MACs, it needs the clock to
> > >>> support it's logic. For instance, stmmac needs the PHY inputs clock
> > >>> is present for software reset completion. Therefore, It is
> > >>> reasonable to add a DT property to disable hibernation mode.
> > >>>
> > >>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> > >>> ---
> > >>>  Documentation/devicetree/bindings/net/qca,ar803x.yaml | 6 ++++++
> > >>>  1 file changed, 6 insertions(+)
> > >>>
> > >>> diff --git a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> > >>> b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> > >>> index b3d4013b7ca6..d08431d79b83 100644
> > >>> --- a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> > >>> +++ b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> > >>> @@ -40,6 +40,12 @@ properties:
> > >>>        Only supported on the AR8031.
> > >>>      type: boolean
> > >>>
> > >>> +  qca,disable-hibernation:
> > >>> +    description: |
> > >>> +    If set, the PHY will not enter hibernation mode when the cable is
> > >>> +    unplugged.
> > >>
> > >> Wrong indentation. Did you test the bindings?
> > >>
> > > Sorry, I just checked the patch and forgot to check the dt-bindings.
> > >
> > >> Unfortunately the property describes driver behavior not hardware, so
> > >> it is not suitable for DT. Instead describe the hardware
> > >> characteristics/features/bugs/constraints. Not driver behavior. Both
> > >> in property name and property description.
> > >>
> > > Thanks for your review and feedback. Actually, the hibernation mode is a
> > feature of hardware, I will modify the property name and description to be
> > more in line with the requirements of the DT property.
> > 
> > hibernation is a feature, but 'disable-hibernation' is not. DTS describes the
> > hardware, not policy or driver bejhvior. Why disabling hibernation is a property
> > of hardware? How you described, it's not, therefore either property is not for
> > DT or it has to be phrased correctly to describe the hardware.
> > 
> Sorry, I'm a little confused. Hibernation mode is a feature of PHY hardware, the
> mode defaults to be enabled. We can configure it enabled or not through the
> register which the PHY provided. Now some MACs need the PHY clocks always
> output a valid clock so that MACs can operate correctly. And I add the property
> to disable hibernation mode to avoid this case. And I noticed that there are 
> some similar properties existed in the qca,ar803x,yaml, such as
> "qca,disable-smarteee" and "qca,keep-pll-enabled". So why can't I use the
> "qca,disable-hibernation" property? How should I do? 

To me, your proposal is entirely reasonable and in keeping with DT's
mandate, which is to describe not only how the hardware is connected
but also how the hardware needs to be configured to inter-operate. I
don't see any need for you to change your proposed property.

It is, as you point out above, no different from these other
properties that configure the PHY's internal settings.

I think Krzysztof is confusing the term "hibernation" with the system
level action, and believing that this property has something to do
with preventing the driver doing something when the system enters
that state. I can't fathom any other explanation.

Maybe changing the description of the property would help:

+  qca,disable-hibernation:
+    description: |
+      Configure the PHY to disable hardware hibernation power saving
+      mode, which is entered when the cable is disconnected.
+    type: boolean

Also, I'd suggest also fixing up the patch description to stress that
this is a hardware feature:

"The hardware hibernation mode of Atheros AR803x PHYs is defaults to
being enabled at hardware reset. When the cable is unplugged, the PHY
will enter hibernation mode after a delay and the PHY clock output to
the MAC can be stopped to save power.

"However, some MACs need this clock for proper functioning of their
logic. For instance, stmmac needs the PHY clock for software reset to
complete.

"Therefore, add a DT property to configure the PHY to disable this
hardware hibernation mode."

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [PATCH] virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"
From: Michael S. Tsirkin @ 2022-08-15  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Xuan Zhuo, Jason Wang, Andres Freund, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization, netdev,
	Linus Torvalds, Jens Axboe, James Bottomley, Martin K. Petersen,
	Guenter Roeck, linux-kernel, Greg KH, c

This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.

This has been reported to trip up guests on GCP (Google Cloud).  Why is
not yet clear - to be debugged, but the patch itself has several other
issues:

- It treats unknown speed as < 10G
- It leaves userspace no way to find out the ring size set by hypervisor
- It tests speed when link is down
- It ignores the virtio spec advice:
        Both \field{speed} and \field{duplex} can change, thus the driver
        is expected to re-read these values after receiving a
        configuration change notification.
- It is not clear the performance impact has been tested properly

Revert the patch for now.

Link: https://lore.kernel.org/r/20220814212610.GA3690074%40roeck-us.net
Link: https://lore.kernel.org/r/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de
Link: https://lore.kernel.org/r/3df6bb82-1951-455d-a768-e9e1513eb667%40www.fastmail.com
Link: https://lore.kernel.org/r/FCDC5DDE-3CDD-4B8A-916F-CA7D87B547CE%40anarazel.de
Fixes: 762faee5a267 ("virtio_net: set the default max ring size by find_vqs()")
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Andres Freund <andres@anarazel.de>
---
 drivers/net/virtio_net.c | 42 ++++------------------------------------
 1 file changed, 4 insertions(+), 38 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d934774e9733..ece00b84e3a7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3432,29 +3432,6 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu
 		   (unsigned int)GOOD_PACKET_LEN);
 }
 
-static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes)
-{
-	u32 i, rx_size, tx_size;
-
-	if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000) {
-		rx_size = 1024;
-		tx_size = 1024;
-
-	} else if (vi->speed < SPEED_40000) {
-		rx_size = 1024 * 4;
-		tx_size = 1024 * 4;
-
-	} else {
-		rx_size = 1024 * 8;
-		tx_size = 1024 * 8;
-	}
-
-	for (i = 0; i < vi->max_queue_pairs; i++) {
-		sizes[rxq2vq(i)] = rx_size;
-		sizes[txq2vq(i)] = tx_size;
-	}
-}
-
 static int virtnet_find_vqs(struct virtnet_info *vi)
 {
 	vq_callback_t **callbacks;
@@ -3462,7 +3439,6 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 	int ret = -ENOMEM;
 	int i, total_vqs;
 	const char **names;
-	u32 *sizes;
 	bool *ctx;
 
 	/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
@@ -3490,15 +3466,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 		ctx = NULL;
 	}
 
-	sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL);
-	if (!sizes)
-		goto err_sizes;
-
 	/* Parameters for control virtqueue, if any */
 	if (vi->has_cvq) {
 		callbacks[total_vqs - 1] = NULL;
 		names[total_vqs - 1] = "control";
-		sizes[total_vqs - 1] = 64;
 	}
 
 	/* Allocate/initialize parameters for send/receive virtqueues */
@@ -3513,10 +3484,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 			ctx[rxq2vq(i)] = true;
 	}
 
-	virtnet_config_sizes(vi, sizes);
-
-	ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks,
-				       names, sizes, ctx, NULL);
+	ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
+				  names, ctx, NULL);
 	if (ret)
 		goto err_find;
 
@@ -3536,8 +3505,6 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 
 
 err_find:
-	kfree(sizes);
-err_sizes:
 	kfree(ctx);
 err_ctx:
 	kfree(names);
@@ -3897,9 +3864,6 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->curr_queue_pairs = num_online_cpus();
 	vi->max_queue_pairs = max_queue_pairs;
 
-	virtnet_init_settings(dev);
-	virtnet_update_settings(vi);
-
 	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
 	err = init_vqs(vi);
 	if (err)
@@ -3912,6 +3876,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
 	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
 
+	virtnet_init_settings(dev);
+
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
 		vi->failover = net_failover_create(vi->dev);
 		if (IS_ERR(vi->failover)) {
-- 
MST


^ permalink raw reply related

* [PATCH v2 2/2] Revert "mlxsw: core: Add the hottest thermal zone detection"
From: Daniel Lezcano @ 2022-08-15  9:10 UTC (permalink / raw)
  To: daniel.lezcano, rafael
  Cc: vadimp, davem, netdev, linux-kernel, Vadim Pasternak,
	Ido Schimmel, Petr Machata, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
In-Reply-To: <20220815091032.1731268-1-daniel.lezcano@linaro.org>

This reverts commit 6f73862fabd93213de157d9cc6ef76084311c628.

As discussed in the thread:

https://lore.kernel.org/all/f3c62ebe-7d59-c537-a010-bff366c8aeba@linaro.org/

the feature provided by commits 2dc2f760052da and 6f73862fabd93 is
actually already handled by the thermal framework via the cooling
device state aggregation, thus all this code is pointless.

The revert conflicts with the following changes:
 - 7f4957be0d5b8: thermal: Use mode helpers in drivers
 - 6a79507cfe94c: mlxsw: core: Extend thermal module with per QSFP module thermal zones

These conflicts were fixed and the resulting changes are in this patch.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
---
 v2:
   - Fix 'err' not used as reported by kbuild test:
   https://lore.kernel.org/all/202208150708.fk6sfd8u-lkp@intel.com/
---
 .../ethernet/mellanox/mlxsw/core_thermal.c    | 64 ++-----------------
 1 file changed, 4 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index f5751242653b..237a813fbb52 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -22,7 +22,6 @@
 #define MLXSW_THERMAL_HYSTERESIS_TEMP	5000	/* 5C */
 #define MLXSW_THERMAL_MODULE_TEMP_SHIFT	(MLXSW_THERMAL_HYSTERESIS_TEMP * 2)
 #define MLXSW_THERMAL_ZONE_MAX_NAME	16
-#define MLXSW_THERMAL_TEMP_SCORE_MAX	GENMASK(31, 0)
 #define MLXSW_THERMAL_MAX_STATE	10
 #define MLXSW_THERMAL_MIN_STATE	2
 #define MLXSW_THERMAL_MAX_DUTY	255
@@ -96,8 +95,6 @@ struct mlxsw_thermal {
 	u8 tz_module_num;
 	struct mlxsw_thermal_module *tz_gearbox_arr;
 	u8 tz_gearbox_num;
-	unsigned int tz_highest_score;
-	struct thermal_zone_device *tz_highest_dev;
 };
 
 static inline u8 mlxsw_state_to_duty(int state)
@@ -186,34 +183,6 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
 	return 0;
 }
 
-static void mlxsw_thermal_tz_score_update(struct mlxsw_thermal *thermal,
-					  struct thermal_zone_device *tzdev,
-					  struct mlxsw_thermal_trip *trips,
-					  int temp)
-{
-	struct mlxsw_thermal_trip *trip = trips;
-	unsigned int score, delta, i, shift = 1;
-
-	/* Calculate thermal zone score, if temperature is above the hot
-	 * threshold score is set to MLXSW_THERMAL_TEMP_SCORE_MAX.
-	 */
-	score = MLXSW_THERMAL_TEMP_SCORE_MAX;
-	for (i = MLXSW_THERMAL_TEMP_TRIP_NORM; i < MLXSW_THERMAL_NUM_TRIPS;
-	     i++, trip++) {
-		if (temp < trip->temp) {
-			delta = DIV_ROUND_CLOSEST(temp, trip->temp - temp);
-			score = delta * shift;
-			break;
-		}
-		shift *= 256;
-	}
-
-	if (score > thermal->tz_highest_score) {
-		thermal->tz_highest_score = score;
-		thermal->tz_highest_dev = tzdev;
-	}
-}
-
 static int mlxsw_thermal_bind(struct thermal_zone_device *tzdev,
 			      struct thermal_cooling_device *cdev)
 {
@@ -278,10 +247,8 @@ static int mlxsw_thermal_get_temp(struct thermal_zone_device *tzdev,
 		dev_err(dev, "Failed to query temp sensor\n");
 		return err;
 	}
+
 	mlxsw_reg_mtmp_unpack(mtmp_pl, &temp, NULL, NULL, NULL, NULL);
-	if (temp > 0)
-		mlxsw_thermal_tz_score_update(thermal, tzdev, thermal->trips,
-					      temp);
 
 	*p_temp = temp;
 	return 0;
@@ -342,22 +309,6 @@ static int mlxsw_thermal_set_trip_hyst(struct thermal_zone_device *tzdev,
 	return 0;
 }
 
-static int mlxsw_thermal_trend_get(struct thermal_zone_device *tzdev,
-				   int trip, enum thermal_trend *trend)
-{
-	struct mlxsw_thermal_module *tz = tzdev->devdata;
-	struct mlxsw_thermal *thermal = tz->parent;
-
-	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
-		return -EINVAL;
-
-	if (tzdev == thermal->tz_highest_dev)
-		return 1;
-
-	*trend = THERMAL_TREND_STABLE;
-	return 0;
-}
-
 static struct thermal_zone_params mlxsw_thermal_params = {
 	.no_hwmon = true,
 };
@@ -371,7 +322,6 @@ static struct thermal_zone_device_ops mlxsw_thermal_ops = {
 	.set_trip_temp	= mlxsw_thermal_set_trip_temp,
 	.get_trip_hyst	= mlxsw_thermal_get_trip_hyst,
 	.set_trip_hyst	= mlxsw_thermal_set_trip_hyst,
-	.get_trend	= mlxsw_thermal_trend_get,
 };
 
 static int mlxsw_thermal_module_bind(struct thermal_zone_device *tzdev,
@@ -456,7 +406,6 @@ static int mlxsw_thermal_module_temp_get(struct thermal_zone_device *tzdev,
 	int temp, crit_temp, emerg_temp;
 	struct device *dev;
 	u16 sensor_index;
-	int err;
 
 	dev = thermal->bus_info->dev;
 	sensor_index = MLXSW_REG_MTMP_MODULE_INDEX_MIN + tz->module;
@@ -471,10 +420,8 @@ static int mlxsw_thermal_module_temp_get(struct thermal_zone_device *tzdev,
 		return 0;
 
 	/* Update trip points. */
-	err = mlxsw_thermal_module_trips_update(dev, thermal->core, tz,
-						crit_temp, emerg_temp);
-	if (!err && temp > 0)
-		mlxsw_thermal_tz_score_update(thermal, tzdev, tz->trips, temp);
+	mlxsw_thermal_module_trips_update(dev, thermal->core, tz,
+					  crit_temp, emerg_temp);
 
 	return 0;
 }
@@ -547,7 +494,6 @@ static struct thermal_zone_device_ops mlxsw_thermal_module_ops = {
 	.set_trip_temp	= mlxsw_thermal_module_trip_temp_set,
 	.get_trip_hyst	= mlxsw_thermal_module_trip_hyst_get,
 	.set_trip_hyst	= mlxsw_thermal_module_trip_hyst_set,
-	.get_trend	= mlxsw_thermal_trend_get,
 };
 
 static int mlxsw_thermal_gearbox_temp_get(struct thermal_zone_device *tzdev,
@@ -568,8 +514,6 @@ static int mlxsw_thermal_gearbox_temp_get(struct thermal_zone_device *tzdev,
 		return err;
 
 	mlxsw_reg_mtmp_unpack(mtmp_pl, &temp, NULL, NULL, NULL, NULL);
-	if (temp > 0)
-		mlxsw_thermal_tz_score_update(thermal, tzdev, tz->trips, temp);
 
 	*p_temp = temp;
 	return 0;
@@ -584,7 +528,6 @@ static struct thermal_zone_device_ops mlxsw_thermal_gearbox_ops = {
 	.set_trip_temp	= mlxsw_thermal_module_trip_temp_set,
 	.get_trip_hyst	= mlxsw_thermal_module_trip_hyst_get,
 	.set_trip_hyst	= mlxsw_thermal_module_trip_hyst_set,
-	.get_trend	= mlxsw_thermal_trend_get,
 };
 
 static int mlxsw_thermal_get_max_state(struct thermal_cooling_device *cdev,
@@ -667,6 +610,7 @@ mlxsw_thermal_module_tz_init(struct mlxsw_thermal_module *module_tz)
 							MLXSW_THERMAL_TRIP_MASK,
 							module_tz,
 							&mlxsw_thermal_module_ops,
+
 							&mlxsw_thermal_params,
 							0,
 							module_tz->parent->polling_delay);
-- 
2.34.1


^ permalink raw reply related

* [PATCH v2 1/2] Revert "mlxsw: core: Use different get_trend() callbacks for different thermal zones"
From: Daniel Lezcano @ 2022-08-15  9:10 UTC (permalink / raw)
  To: daniel.lezcano, rafael
  Cc: vadimp, davem, netdev, linux-kernel, Vadim Pasternak,
	Ido Schimmel, Petr Machata, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

This reverts commit 2dc2f760052da4925482ecdcdc5c94d4a599153c.

As discussed in the thread:

https://lore.kernel.org/all/f3c62ebe-7d59-c537-a010-bff366c8aeba@linaro.org/

the feature provided by commits 2dc2f760052da and 6f73862fabd93 is
actually already handled by the thermal framework via the cooling
device state aggregation, thus all this code is pointless.

No conflict happened when reverting the patch.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
---
 .../ethernet/mellanox/mlxsw/core_thermal.c    | 23 ++++---------------
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 05f54bd982c0..f5751242653b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -345,7 +345,8 @@ static int mlxsw_thermal_set_trip_hyst(struct thermal_zone_device *tzdev,
 static int mlxsw_thermal_trend_get(struct thermal_zone_device *tzdev,
 				   int trip, enum thermal_trend *trend)
 {
-	struct mlxsw_thermal *thermal = tzdev->devdata;
+	struct mlxsw_thermal_module *tz = tzdev->devdata;
+	struct mlxsw_thermal *thermal = tz->parent;
 
 	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
 		return -EINVAL;
@@ -537,22 +538,6 @@ mlxsw_thermal_module_trip_hyst_set(struct thermal_zone_device *tzdev, int trip,
 	return 0;
 }
 
-static int mlxsw_thermal_module_trend_get(struct thermal_zone_device *tzdev,
-					  int trip, enum thermal_trend *trend)
-{
-	struct mlxsw_thermal_module *tz = tzdev->devdata;
-	struct mlxsw_thermal *thermal = tz->parent;
-
-	if (trip < 0 || trip >= MLXSW_THERMAL_NUM_TRIPS)
-		return -EINVAL;
-
-	if (tzdev == thermal->tz_highest_dev)
-		return 1;
-
-	*trend = THERMAL_TREND_STABLE;
-	return 0;
-}
-
 static struct thermal_zone_device_ops mlxsw_thermal_module_ops = {
 	.bind		= mlxsw_thermal_module_bind,
 	.unbind		= mlxsw_thermal_module_unbind,
@@ -562,7 +547,7 @@ static struct thermal_zone_device_ops mlxsw_thermal_module_ops = {
 	.set_trip_temp	= mlxsw_thermal_module_trip_temp_set,
 	.get_trip_hyst	= mlxsw_thermal_module_trip_hyst_get,
 	.set_trip_hyst	= mlxsw_thermal_module_trip_hyst_set,
-	.get_trend	= mlxsw_thermal_module_trend_get,
+	.get_trend	= mlxsw_thermal_trend_get,
 };
 
 static int mlxsw_thermal_gearbox_temp_get(struct thermal_zone_device *tzdev,
@@ -599,7 +584,7 @@ static struct thermal_zone_device_ops mlxsw_thermal_gearbox_ops = {
 	.set_trip_temp	= mlxsw_thermal_module_trip_temp_set,
 	.get_trip_hyst	= mlxsw_thermal_module_trip_hyst_get,
 	.set_trip_hyst	= mlxsw_thermal_module_trip_hyst_set,
-	.get_trend	= mlxsw_thermal_module_trend_get,
+	.get_trend	= mlxsw_thermal_trend_get,
 };
 
 static int mlxsw_thermal_get_max_state(struct thermal_cooling_device *cdev,
-- 
2.34.1


^ permalink raw reply related

* RE: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Wei Fang @ 2022-08-15  8:51 UTC (permalink / raw)
  To: Krzysztof Kozlowski, andrew@lunn.ch, hkallweit1@gmail.com,
	linux@armlinux.org.uk, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, robh+dt@kernel.org,
	krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
	netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <14cf568e-d7ee-886e-5122-69b2e58b8717@linaro.org>



> -----Original Message-----
> From: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> Sent: 2022年8月12日 19:26
> To: Wei Fang <wei.fang@nxp.com>; andrew@lunn.ch; hkallweit1@gmail.com;
> linux@armlinux.org.uk; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; robh+dt@kernel.org;
> krzysztof.kozlowski+dt@linaro.org; f.fainelli@gmail.com;
> netdev@vger.kernel.org; devicetree@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation
> property
> 
> On 12/08/2022 12:02, Wei Fang wrote:
> >
> >
> >> -----Original Message-----
> >> From: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> >> Sent: 2022年8月12日 15:28
> >> To: Wei Fang <wei.fang@nxp.com>; andrew@lunn.ch;
> >> hkallweit1@gmail.com; linux@armlinux.org.uk; davem@davemloft.net;
> >> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> >> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org;
> >> f.fainelli@gmail.com; netdev@vger.kernel.org;
> >> devicetree@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation
> >> property
> >>
> >> On 12/08/2022 17:50, wei.fang@nxp.com wrote:
> >>> From: Wei Fang <wei.fang@nxp.com>
> >>>
> >>
> >> Please use subject prefix matching subsystem.
> >>
> > Ok, I'll add the subject prefix.
> >
> >>> The hibernation mode of Atheros AR803x PHYs is default enabled.
> >>> When the cable is unplugged, the PHY will enter hibernation mode and
> >>> the PHY clock does down. For some MACs, it needs the clock to
> >>> support it's logic. For instance, stmmac needs the PHY inputs clock
> >>> is present for software reset completion. Therefore, It is
> >>> reasonable to add a DT property to disable hibernation mode.
> >>>
> >>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> >>> ---
> >>>  Documentation/devicetree/bindings/net/qca,ar803x.yaml | 6 ++++++
> >>>  1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> >>> b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> >>> index b3d4013b7ca6..d08431d79b83 100644
> >>> --- a/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> >>> +++ b/Documentation/devicetree/bindings/net/qca,ar803x.yaml
> >>> @@ -40,6 +40,12 @@ properties:
> >>>        Only supported on the AR8031.
> >>>      type: boolean
> >>>
> >>> +  qca,disable-hibernation:
> >>> +    description: |
> >>> +    If set, the PHY will not enter hibernation mode when the cable is
> >>> +    unplugged.
> >>
> >> Wrong indentation. Did you test the bindings?
> >>
> > Sorry, I just checked the patch and forgot to check the dt-bindings.
> >
> >> Unfortunately the property describes driver behavior not hardware, so
> >> it is not suitable for DT. Instead describe the hardware
> >> characteristics/features/bugs/constraints. Not driver behavior. Both
> >> in property name and property description.
> >>
> > Thanks for your review and feedback. Actually, the hibernation mode is a
> feature of hardware, I will modify the property name and description to be
> more in line with the requirements of the DT property.
> 
> hibernation is a feature, but 'disable-hibernation' is not. DTS describes the
> hardware, not policy or driver bejhvior. Why disabling hibernation is a property
> of hardware? How you described, it's not, therefore either property is not for
> DT or it has to be phrased correctly to describe the hardware.
> 
Sorry, I'm a little confused. Hibernation mode is a feature of PHY hardware, the
mode defaults to be enabled. We can configure it enabled or not through the
register which the PHY provided. Now some MACs need the PHY clocks always
output a valid clock so that MACs can operate correctly. And I add the property
to disable hibernation mode to avoid this case. And I noticed that there are 
some similar properties existed in the qca,ar803x,yaml, such as
"qca,disable-smarteee" and "qca,keep-pll-enabled". So why can't I use the
"qca,disable-hibernation" property? How should I do? 


> Best regards,
> Krzysztof

^ permalink raw reply

* Re: [PATCH ipsec 2/2] xfrm: Skip checking of already-verified secpath entries
From: Steffen Klassert @ 2022-08-15  8:50 UTC (permalink / raw)
  To: Benedict Wong; +Cc: netdev, nharold, lorenzo
In-Reply-To: <20220810182210.721493-3-benedictwong@google.com>

On Wed, Aug 10, 2022 at 06:22:10PM +0000, Benedict Wong wrote:
> This change fixes a bug where inbound packets to nested IPsec tunnels
> fails to pass policy checks due to the inner tunnel's policy checks
> not having a reference to the outer policy/template. This causes the
> policy check to fail, since the first entries in the secpath correlate
> to the outer tunnel, while the templates being verified are for the
> inner tunnel.
> 
> In order to ensure that the appropriate policy and template context is
> searchable, the policy checks must be done incrementally after each
> decryption step. As such, this marks secpath entries as having been
> successfully matched, skipping these on subsequent policy checks.
> 
> By skipping the immediate error return in the case where the secpath
> entry had previously been validated, this change allows secpath entries
> that matched a policy/template previously, while still requiring that
> each searched template find a match in the secpath.
> 
> For security:
> - All templates must have matching secpath entries
>   - Unchanged by current patch; templates that do not match any secpath
>     entry still return -1. This patch simply allows skipping earlier
>     blocks of verified secpath entries
> - All entries (except trailing transport mode entries) must have a
>   matching template
>   - Unvalidated entries, including transport-mode entries still return
>     the errored index if it does not match the correct template.
> 
> Test: Tested against Android Kernel Unit Tests
> Signed-off-by: Benedict Wong <benedictwong@google.com>
> Change-Id: Ic32831cb00151d0de2e465f18ec37d5f7b680e54

This ID is meaningless on a mainline kernel, please remove it.

> ---
>  include/net/xfrm.h     |  1 +
>  net/xfrm/xfrm_input.c  |  3 ++-
>  net/xfrm/xfrm_policy.c | 11 ++++++++++-
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index c39d910d4b45..a2f2840aba6b 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -1031,6 +1031,7 @@ struct xfrm_offload {
>  struct sec_path {
>  	int			len;
>  	int			olen;
> +	int			verified_cnt;
>  
>  	struct xfrm_state	*xvec[XFRM_MAX_DEPTH];
>  	struct xfrm_offload	ovec[XFRM_MAX_OFFLOAD_DEPTH];
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index b24df8a44585..895935077a91 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -129,6 +129,7 @@ struct sec_path *secpath_set(struct sk_buff *skb)
>  	memset(sp->ovec, 0, sizeof(sp->ovec));
>  	sp->olen = 0;
>  	sp->len = 0;
> +	sp->verified_cnt = 0;
>  
>  	return sp;
>  }
> @@ -587,7 +588,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
>  
>  		// If nested tunnel, check outer states before context is lost.

Please use networking style comments here too.

>  		if (x->outer_mode.flags & XFRM_MODE_FLAG_TUNNEL
> -				&& sp->len > 0
> +				&& sp->len > sp->verified_cnt
>  				&& !xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family)) {

As in the first patch, please use common networking code
alignment.

Thanks!

^ permalink raw reply

* Re: [PATCH net] l2tp: Serialize access to sk_user_data with sock lock
From: Tom Parkin @ 2022-08-15  8:43 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: Jakub Kicinski, netdev, kernel-team, van fantasy
In-Reply-To: <87edxlu6kd.fsf@cloudflare.com>

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

On  Fri, Aug 12, 2022 at 11:54:43 +0200, Jakub Sitnicki wrote:
> On Thu, Aug 11, 2022 at 10:23 AM -07, Jakub Kicinski wrote:
> > On Wed, 10 Aug 2022 12:28:48 +0200 Jakub Sitnicki wrote:
> >> Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
> >
> > That tag immediately sets off red flags. Please find the commit where
> > to code originates, not where it was last moved.
> 
> The code move happened in v2.6.35. There's no point in digging further, IMHO.

At the time of fd558d186df2, sk_user_data was checked/set by the newly
added function l2tp_tunnel_create.  The only callpath for
l2tp_tunnel_create was via.  pppol2tp_connect which called
l2tp_tunnel_create with lock_sock held (and indeed still does).

I think the addition of the netlink API (which added a new callpath
for l2tp_tunnel_create via. l2tp_nl_cmd_tunnel_create which was *not*
lock_sock-protected) is perhaps the right commit to point to?

309795f4bec2 ("l2tp: Add netlink control API for L2TP")

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH ipsec 1/2] xfrm: Check policy for nested XFRM packets in xfrm_input
From: Steffen Klassert @ 2022-08-15  8:45 UTC (permalink / raw)
  To: Benedict Wong; +Cc: netdev, nharold, lorenzo
In-Reply-To: <20220810182210.721493-2-benedictwong@google.com>

On Wed, Aug 10, 2022 at 06:22:09PM +0000, Benedict Wong wrote:
> This change ensures that all nested XFRM packets have their policy
> checked before decryption of the next layer, so that policies are
> verified at each intermediate step of the decryption process.
> 
> This is necessary especially for nested tunnels, as the IP addresses,
> protocol and ports may all change, thus not matching the previous
> policies. In order to ensure that packets match the relevant inbound
> templates, the xfrm_policy_check should be done before handing off to
> the inner XFRM protocol to decrypt and decapsulate.
> 
> Test: Tested against Android Kernel Unit Tests
> Signed-off-by: Benedict Wong <benedictwong@google.com>
> Change-Id: I20c5abf39512d7f6cf438c0921a78a84e281b4e9
> ---
>  net/xfrm/xfrm_input.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 144238a50f3d..b24df8a44585 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -585,6 +585,13 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
>  			goto drop;
>  		}
>  
> +		// If nested tunnel, check outer states before context is lost.

Please use networking style comments like so /* ... */

> +		if (x->outer_mode.flags & XFRM_MODE_FLAG_TUNNEL
> +				&& sp->len > 0

Please align this to the opening brace of the if statement
like it is done everywhere in networking code. If you are
unsure about coding style, try checkpatch it helps in that
case.

> +				&& !xfrm_policy_check(NULL, XFRM_POLICY_IN, skb, family)) {

Hm, shouldn't the xfrm_policy_check called along the
packet path for each round after decapsulation?

Do you use ESP transformation offload (INET_ESP_OFFLOAD/
INET6_ESP_OFFLOAD)?

> +			goto drop;
> +		}
> +
>  		skb->mark = xfrm_smark_get(skb->mark, x);
>  
>  		sp->xvec[sp->len++] = x;
> -- 
> 2.37.1.595.g718a3a8f04-goog

^ permalink raw reply

* Re: [PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
From: Geert Uytterhoeven @ 2022-08-15  8:38 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas
In-Reply-To: <CAMuHMdWo_wRwV-i_iyTxVnEsf3Th9GBAG+wxUQMQGnw1t2ijTg@mail.gmail.com>

Hi Saravana,

On Tue, Jul 5, 2022 at 11:11 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Wed, Jun 1, 2022 at 2:44 PM Saravana Kannan <saravanak@google.com> wrote:
> > Now that fw_devlink=on by default and fw_devlink supports interrupt
> > properties, the execution will never get to the point where
> > driver_deferred_probe_check_state() is called before the supplier has
> > probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch, which is now commit f8217275b57aa48d ("net:
> mdio: Delete usage of driver_deferred_probe_check_state()") in
> driver-core/driver-core-next.
>
> Seems like I missed something when providing my T-b for this series,
> sorry for that.
>
> arch/arm/boot/dts/r8a7791-koelsch.dts has:
>
>     &ether {
>             pinctrl-0 = <&ether_pins>, <&phy1_pins>;
>             pinctrl-names = "default";
>
>             phy-handle = <&phy1>;
>             renesas,ether-link-active-low;
>             status = "okay";
>
>             phy1: ethernet-phy@1 {
>                     compatible = "ethernet-phy-id0022.1537",
>                                  "ethernet-phy-ieee802.3-c22";
>                     reg = <1>;
>                     interrupt-parent = <&irqc0>;
>                     interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
>                     micrel,led-mode = <1>;
>                     reset-gpios = <&gpio5 22 GPIO_ACTIVE_LOW>;
>             };
>     };
>
> Despite the interrupts property, &ether is now probed before irqc0
> (interrupt-controller@e61c0000 in arch/arm/boot/dts/r8a7791.dtsi),
> causing the PHY not finding its interrupt, and resorting to polling:
>
>     -Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY
> driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=185)
>     +Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY
> driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)
>
> Reverting this commit, and commit 9cbffc7a59561be9 ("driver core:
> Delete driver_deferred_probe_check_state()") fixes that.

FTR, this issue is now present in v6.0-rc1.
I haven't tried your newest series yet.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH 1/2] dt-bindings: vertexcom-mse102x: Update email address
From: Stefan Wahren @ 2022-08-15  8:06 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski
  Cc: netdev, devicetree, Stefan Wahren

in-tech smart charging is now chargebyte. So update the email address
accordingly.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml b/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml
index 8156a9aeb589..304757bf9281 100644
--- a/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml
+++ b/Documentation/devicetree/bindings/net/vertexcom-mse102x.yaml
@@ -7,7 +7,7 @@ $schema: "http://devicetree.org/meta-schemas/core.yaml#"
 title: The Vertexcom MSE102x (SPI) Device Tree Bindings
 
 maintainers:
-  - Stefan Wahren <stefan.wahren@in-tech.com>
+  - Stefan Wahren <stefan.wahren@chargebyte.com>
 
 description:
   Vertexcom's MSE102x are a family of HomePlug GreenPHY chips.
-- 
2.34.1


^ permalink raw reply related

* [PATCH 2/2] net: vertexcom: mse102x: Update email address
From: Stefan Wahren @ 2022-08-15  8:06 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski
  Cc: netdev, devicetree, Stefan Wahren
In-Reply-To: <20220815080626.9688-1-stefan.wahren@i2se.com>

in-tech smart charging is now chargebyte. So update the email address
accordingly.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 drivers/net/ethernet/vertexcom/mse102x.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/vertexcom/mse102x.c b/drivers/net/ethernet/vertexcom/mse102x.c
index eb39a45de012..30a2f38491fe 100644
--- a/drivers/net/ethernet/vertexcom/mse102x.c
+++ b/drivers/net/ethernet/vertexcom/mse102x.c
@@ -762,6 +762,6 @@ static struct spi_driver mse102x_driver = {
 module_spi_driver(mse102x_driver);
 
 MODULE_DESCRIPTION("MSE102x Network driver");
-MODULE_AUTHOR("Stefan Wahren <stefan.wahren@in-tech.com>");
+MODULE_AUTHOR("Stefan Wahren <stefan.wahren@chargebyte.com>");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("spi:" DRV_NAME);
-- 
2.34.1


^ permalink raw reply related

* Re: upstream kernel crashes
From: Michael S. Tsirkin @ 2022-08-15  8:02 UTC (permalink / raw)
  To: Andres Freund
  Cc: Linus Torvalds, Jens Axboe, James Bottomley, Martin K. Petersen,
	Guenter Roeck, linux-kernel, Greg Kroah-Hartman, Xuan Zhuo,
	Jason Wang, virtualization, netdev
In-Reply-To: <3df6bb82-1951-455d-a768-e9e1513eb667@www.fastmail.com>

On Mon, Aug 15, 2022 at 12:46:36AM -0700, Andres Freund wrote:
> Hi,
> 
> On Mon, Aug 15, 2022, at 00:29, Michael S. Tsirkin wrote:
> > On Mon, Aug 15, 2022 at 12:11:43AM -0700, Andres Freund wrote:
> >> Hi,
> >> 
> >> On 2022-08-14 20:18:44 -0700, Linus Torvalds wrote:
> >> > On Sun, Aug 14, 2022 at 6:36 PM Andres Freund <andres@anarazel.de> wrote:
> >> > >
> >> > > Some of the symptoms could be related to the issue in this thread, hence
> >> > > listing them here
> >> > 
> >> > Smells like slab corruption to me, and the problems may end up being
> >> > then largely random just depending on who ends up using the allocation
> >> > that gets trampled on.
> >> > 
> >> > I wouldn't be surprised if it's all the same thing - including your
> >> > network issue.
> >> 
> >> Yea. As I just wrote in
> >> https://postgr.es/m/20220815070203.plwjx7b3cyugpdt7%40awork3.anarazel.de I
> >> bisected it down to one commit (762faee5a267). With that commit I only see the
> >> networking issue across a few reboots, but with ebcce4926365 some boots oops
> >> badly and other times it' "just" network not working.
> >> 
> >> 
> >> [oopses]
> 
> >> If somebody knowledgeable staring at 762faee5a267 doesn't surface somebody I
> >> can create a kernel with some more debugging stuff enabled, if somebody tells
> >> me what'd work best here.
> >> 
> >> 
> >> Greetings,
> >> 
> >> Andres Freund
> >
> > Thanks a lot for the work!
> > Just a small clarification:
> >
> > So IIUC you see several issues, right?
> 
> Yes, although they might be related, as theorized by Linus upthread.
> 
> > With 762faee5a2678559d3dc09d95f8f2c54cd0466a7 you see networking issues.
> 
> Yes.
> 
> 
> > With ebcce492636506443e4361db6587e6acd1a624f9 you see crashes.
> 
> Changed between rebooting. Sometimes the network issue, sometimes the crashes in the email you're replying to.
>

OK just adding:

    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
	L: virtualization@lists.linux-foundation.org
	L: netdev@vger.kernel.org

I think we can drop the original Cc list:

Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Guenter Roeck <linux@roeck-us.net>, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>

but I'm not sure, maybe they want to be informed.

> 
> > MST


^ permalink raw reply

* Re: [PATCH net v2] Revert "tcp: change pingpong threshold to 3"
From: Jiri Slaby @ 2022-08-15  7:48 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Wei Wang, David Miller, Eric Dumazet, Jakub Kicinski, netdev,
	Soheil Hassas Yeganeh, Yuchung Cheng, LemmyHuang, stable
In-Reply-To: <e318ba59-d58a-5826-82c9-6cfc2409cbd4@kernel.org>

On 06. 08. 22, 16:41, Jiri Slaby wrote:
> On 06. 08. 22, 13:24, Neal Cardwell wrote:
>> On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby <jirislaby@kernel.org> wrote:
>>>
>>> On 21. 07. 22, 22:44, Wei Wang wrote:
>>>> This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c.
>>>>
>>>> This to-be-reverted commit was meant to apply a stricter rule for the
>>>> stack to enter pingpong mode. However, the condition used to check for
>>>> interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
>>>> jiffy based and might be too coarse, which delays the stack entering
>>>> pingpong mode.
>>>> We revert this patch so that we no longer use the above condition to
>>>> determine interactive session, and also reduce pingpong threshold to 1.
>>>>
>>>> Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
>>>> Reported-by: LemmyHuang <hlm3280@163.com>
>>>> Suggested-by: Neal Cardwell <ncardwell@google.com>
>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>
>>>
>>> This breaks python-eventlet [1] (and was backported to stable trees):
>>> ________________ TestHttpd.test_018b_http_10_keepalive_framing
>>> _________________
>>>
>>> self = <tests.wsgi_test.TestHttpd
>>> testMethod=test_018b_http_10_keepalive_framing>
>>>
>>>       def test_018b_http_10_keepalive_framing(self):
>>>           # verify that if an http/1.0 client sends connection: 
>>> keep-alive
>>>           # that we don't mangle the request framing if the app doesn't
>>> read the request
>>>           def app(environ, start_response):
>>>               resp_body = {
>>>                   '/1': b'first response',
>>>                   '/2': b'second response',
>>>                   '/3': b'third response',
>>>               }.get(environ['PATH_INFO'])
>>>               if resp_body is None:
>>>                   resp_body = 'Unexpected path: ' + environ['PATH_INFO']
>>>                   if six.PY3:
>>>                       resp_body = resp_body.encode('latin1')
>>>               # Never look at wsgi.input!
>>>               start_response('200 OK', [('Content-type', 'text/plain')])
>>>               return [resp_body]
>>>
>>>           self.site.application = app
>>>           sock = eventlet.connect(self.server_addr)
>>>           req_body = b'GET /tricksy HTTP/1.1\r\n'
>>>           body_len = str(len(req_body)).encode('ascii')
>>>
>>>           sock.sendall(b'PUT /1 HTTP/1.0\r\nHost:
>>> localhost\r\nConnection: keep-alive\r\n'
>>>                        b'Content-Length: ' + body_len + b'\r\n\r\n' +
>>> req_body)
>>>           result1 = read_http(sock)
>>>           self.assertEqual(b'first response', result1.body)
>>>           self.assertEqual(result1.headers_original.get('Connection'),
>>> 'keep-alive')
>>>
>>>           sock.sendall(b'PUT /2 HTTP/1.0\r\nHost:
>>> localhost\r\nConnection: keep-alive\r\n'
>>>                        b'Content-Length: ' + body_len + b'\r\nExpect:
>>> 100-continue\r\n\r\n')
>>>           # Client may have a short timeout waiting on that 100 Continue
>>>           # and basically immediately send its body
>>>           sock.sendall(req_body)
>>>           result2 = read_http(sock)
>>>           self.assertEqual(b'second response', result2.body)
>>>           self.assertEqual(result2.headers_original.get('Connection'),
>>> 'close')
>>>
>>>   >       sock.sendall(b'PUT /3 HTTP/1.0\r\nHost:
>>> localhost\r\nConnection: close\r\n\r\n')
>>>
>>> tests/wsgi_test.py:648:
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> _ _ _ _
>>> eventlet/greenio/base.py:407: in sendall
>>>       tail = self.send(data, flags)
>>> eventlet/greenio/base.py:401: in send
>>>       return self._send_loop(self.fd.send, data, flags)
>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>> _ _ _ _
>>>
>>> self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0>
>>> send_method = <built-in method send of socket object at 0x7f5f2f73d520>
>>> data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection: 
>>> close\r\n\r\n'
>>> args = (0,), _timeout_exc = timeout('timed out'), eno = 32
>>>
>>>       def _send_loop(self, send_method, data, *args):
>>>           if self.act_non_blocking:
>>>               return send_method(data, *args)
>>>
>>>           _timeout_exc = socket_timeout('timed out')
>>>           while True:
>>>               try:
>>>   >               return send_method(data, *args)
>>> E               BrokenPipeError: [Errno 32] Broken pipe
>>>
>>> eventlet/greenio/base.py:388: BrokenPipeError
>>> ====================
>>>
>>> Reverting this revert on the top of 5.19 solves the issue.
>>>
>>> Any ideas?
>>
>> Interesting. This revert should return the kernel back to the delayed
>> ACK behavior it had for many years before May 2019 and Linux 5.1,
>> which contains the commit it is reverting:
>>
>>    4a41f453bedfd tcp: change pingpong threshold to 3
>>
>> It sounds like perhaps this test you mention has an implicit
>> dependence on the timing of delayed ACKs.
>>
>> A few questions:
> 
> Dunno. I am only an openSUSE kernel maintainer and this popped out at 
> me. Feel free to dig to eventlet's sources on your own :P.

Any updates on this or should I send a revert directly?

The "before() &&" part of the patch makes the difference. That is this diff:
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
          * and it is a reply for ato after last received packet,
          * increase pingpong count.
          */
-       if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
-           (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
+       pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u 
ping=%u\n",
+                       __func__, sk, sk->sk_addrpair, sk->sk_portpair, now,
+                       tp->lsndtime, icsk->icsk_ack.lrcvtime,
+                       inet_csk(sk)->icsk_ack.pingpong);
+       if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
+           (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) {
                 inet_csk_inc_pingpong_cnt(sk);
+               pr_info("\tINC ping=%u before=%u\n",
+                               inet_csk(sk)->icsk_ack.pingpong,
+                               before(tp->lsndtime, 
icsk->icsk_ack.lrcvtime));
+       }

         tp->lsndtime = now;
  }

makes it work again, and outputs this:

 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d 
(100007f0100007f:e858b18b) now=4294902140 lsndtime=4294902140 
lrcvtime=4294902140 ping=0
 > TCP: tcp_event_data_sent: sk=00000000a4becf82 
(100007f0100007f:8bb158e8) now=4294902143 lsndtime=4294902140 
lrcvtime=4294902142 ping=0
 > TCP:     INC ping=1 before=1
 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d 
(100007f0100007f:e858b18b) now=4294902145 lsndtime=4294902140 
lrcvtime=4294902144 ping=0
 > TCP:     INC ping=1 before=1
 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d 
(100007f0100007f:e858b18b) now=4294902147 lsndtime=4294902145 
lrcvtime=4294902144 ping=1
 > TCP:     INC ping=2 before=0

IMO, this "before=0" is the "source" of the problem. But I have no idea 
what this means at all...

 > TCP: tcp_event_data_sent: sk=00000000a4becf82 
(100007f0100007f:8bb158e8) now=4294902149 lsndtime=4294902143 
lrcvtime=4294902148 ping=1
 > TCP:     INC ping=2 before=1
 > TCP: tcp_event_data_sent: sk=00000000fd67cf8d 
(100007f0100007f:e858b18b) now=4294902151 lsndtime=4294902147 
lrcvtime=4294902150 ping=3
 > TCP:     INC ping=4 before=1
 > TCP: tcp_event_data_sent: sk=00000000c7a417e9 
(100007f0100007f:e85ab18b) now=4294902153 lsndtime=4294902153 
lrcvtime=4294902153 ping=0
 > TCP: tcp_event_data_sent: sk=000000008681183e 
(100007f0100007f:8bb15ae8) now=4294902155 lsndtime=4294902153 
lrcvtime=4294902154 ping=0
 > TCP:     INC ping=1 before=1



>> (1) What are the timeout values in this test? If there is some
>> implicit or explicit timeout value less than the typical Linux TCP
>> 40ms delayed ACK timer value then this could be the problem. If you
>> make sure all timeouts are at least, say, 300ms then this should
>> remove dependencies on delayed ACK behavior (and make the test more
>> portable).
>>
>> (2) Does this test use the TCP_NODELAY socket option to disable
>> Nagle's algorithm? Presumably it should, given that it's a network app
>> that cares about latency. Omitting the TCP_NODELAY socket option can
>> cause request/response traffic to depend on delayed ACK behavior.
>>
>> (3) If (1) and (2) do not fix the test, would you be able to provide
>> binary .pcap traces of the behavior with the test (a) passing and (b)
>> failing? For example:
>>     sudo tcpdump -i any -w /tmp/trace.pcap -s 100 port 80 &
>>     # run test
>>     killall tcpdump
>>
>> thanks,
>> neal
> 
> 

thanks,
-- 
js
suse labs


^ permalink raw reply

* Re: [PATCH v14 37/42] virtio_net: set the default max ring size by find_vqs()
From: Michael S. Tsirkin @ 2022-08-15  7:39 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Hans de Goede, Mark Gross, Vadim Pasternak,
	Bjorn Andersson, Mathieu Poirier, Cornelia Huck, Halil Pasic,
	Eric Farman, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Vincent Whitchurch, linux-um, netdev, platform-driver-x86,
	linux-remoteproc, linux-s390, kvm, bpf, kangjie.xu
In-Reply-To: <1660548498.412278-11-xuanzhuo@linux.alibaba.com>

On Mon, Aug 15, 2022 at 03:28:18PM +0800, Xuan Zhuo wrote:
> On Mon, 15 Aug 2022 03:14:43 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Aug 15, 2022 at 02:35:03PM +0800, Xuan Zhuo wrote:
> > > On Mon, 15 Aug 2022 02:00:16 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Aug 01, 2022 at 02:38:57PM +0800, Xuan Zhuo wrote:
> > > > > Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
> > > > > rx at the same time.
> > > > >
> > > > >                          | rx/tx ring size
> > > > > -------------------------------------------
> > > > > speed == UNKNOWN or < 10G| 1024
> > > > > speed < 40G              | 4096
> > > > > speed >= 40G             | 8192
> > > > >
> > > > > Call virtnet_update_settings() once before calling init_vqs() to update
> > > > > speed.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > I've been looking at this patchset because of the resent
> > > > reported crashes, and I'm having second thoughts about this.
> > > >
> > > > Do we really want to second-guess the device supplied
> > > > max ring size? If yes why?
> > > >
> > > > Could you please share some performance data that motivated this
> > > > specific set of numbers?
> > >
> > >
> > > The impact of this value on performance is as follows. The larger the value, the
> > > throughput can be increased, but the delay will also increase accordingly. It is
> > > a maximum limit for the ring size under the corresponding speed. The purpose of
> > > this limitation is not to improve performance, but more to reduce memory usage.
> > >
> > > These data come from many other network cards and some network optimization
> > > experience.
> > >
> > > For example, in the case of speed = 20G, the impact of ring size greater
> > > than 4096 on performance has no meaning. At this time, if the device supports
> > > 8192, we limit it to 4096 through this, the real meaning is to reduce the memory
> > > usage.
> > >
> > >
> > > >
> > > > Also why do we intepret UNKNOWN as "very low"?
> > > > I'm thinking that should definitely be "don't change anything".
> > > >
> > >
> > > Generally speaking, for a network card with a high speed, it will return a
> > > correct speed. But I think it is a good idea to do nothing.
> >
> >
> >
> >
> >
> > >
> > > > Finally if all this makes sense then shouldn't we react when
> > > > speed changes?
> > >
> > > This is the feedback of the network card when it is started, and theoretically
> > > it should not change in the future.
> >
> > Yes it should:
> > 	Both \field{speed} and \field{duplex} can change, thus the driver
> > 	is expected to re-read these values after receiving a
> > 	configuration change notification.
> >
> >
> > Moreover, during probe link can quite reasonably be down.
> > If it is, then speed and duplex might not be correct.
> >
> 
> 
> It seems that this is indeed a problem.
> 
> But I feel that this is not the reason for the abnormal network.

Yes, but it's a reason to revert this patch and rethink the approach.

> I'm still trying google cloud vm.
> 
> 
> >
> >
> >
> > > >
> > > > Could you try reverting this and showing performance results
> > > > before and after please? Thanks!
> > >
> > > I hope the above reply can help you, if there is anything else you need me to
> > > cooperate with, I am very happy.
> > >
> > > If you think it's ok, I can resubmit a commit with 'UNKNOW' set to unlimited. I
> > > can submit it with the issue of #30.
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 42 ++++++++++++++++++++++++++++++++++++----
> > > > >  1 file changed, 38 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 8a5810bcb839..40532ecbe7fc 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -3208,6 +3208,29 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu
> > > > >  		   (unsigned int)GOOD_PACKET_LEN);
> > > > >  }
> > > > >
> > > > > +static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes)
> > > > > +{
> > > > > +	u32 i, rx_size, tx_size;
> > > > > +
> > > > > +	if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000) {
> > > > > +		rx_size = 1024;
> > > > > +		tx_size = 1024;
> > > > > +
> > > > > +	} else if (vi->speed < SPEED_40000) {
> > > > > +		rx_size = 1024 * 4;
> > > > > +		tx_size = 1024 * 4;
> > > > > +
> > > > > +	} else {
> > > > > +		rx_size = 1024 * 8;
> > > > > +		tx_size = 1024 * 8;
> > > > > +	}
> > > > > +
> > > > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +		sizes[rxq2vq(i)] = rx_size;
> > > > > +		sizes[txq2vq(i)] = tx_size;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > >  static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > >  {
> > > > >  	vq_callback_t **callbacks;
> > > > > @@ -3215,6 +3238,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > >  	int ret = -ENOMEM;
> > > > >  	int i, total_vqs;
> > > > >  	const char **names;
> > > > > +	u32 *sizes;
> > > > >  	bool *ctx;
> > > > >
> > > > >  	/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
> > > > > @@ -3242,10 +3266,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > >  		ctx = NULL;
> > > > >  	}
> > > > >
> > > > > +	sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL);
> > > > > +	if (!sizes)
> > > > > +		goto err_sizes;
> > > > > +
> > > > >  	/* Parameters for control virtqueue, if any */
> > > > >  	if (vi->has_cvq) {
> > > > >  		callbacks[total_vqs - 1] = NULL;
> > > > >  		names[total_vqs - 1] = "control";
> > > > > +		sizes[total_vqs - 1] = 64;
> > > > >  	}
> > > > >
> > > > >  	/* Allocate/initialize parameters for send/receive virtqueues */
> > > > > @@ -3260,8 +3289,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > >  			ctx[rxq2vq(i)] = true;
> > > > >  	}
> > > > >
> > > > > -	ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
> > > > > -				  names, ctx, NULL);
> > > > > +	virtnet_config_sizes(vi, sizes);
> > > > > +
> > > > > +	ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks,
> > > > > +				       names, sizes, ctx, NULL);
> > > > >  	if (ret)
> > > > >  		goto err_find;
> > > > >
> > > > > @@ -3281,6 +3312,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > > >
> > > > >
> > > > >  err_find:
> > > > > +	kfree(sizes);
> > > > > +err_sizes:
> > > > >  	kfree(ctx);
> > > > >  err_ctx:
> > > > >  	kfree(names);
> > > > > @@ -3630,6 +3663,9 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > > >  		vi->curr_queue_pairs = num_online_cpus();
> > > > >  	vi->max_queue_pairs = max_queue_pairs;
> > > > >
> > > > > +	virtnet_init_settings(dev);
> > > > > +	virtnet_update_settings(vi);
> > > > > +
> > > > >  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> > > > >  	err = init_vqs(vi);
> > > > >  	if (err)
> > > > > @@ -3642,8 +3678,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > > >  	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> > > > >  	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> > > > >
> > > > > -	virtnet_init_settings(dev);
> > > > > -
> > > > >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
> > > > >  		vi->failover = net_failover_create(vi->dev);
> > > > >  		if (IS_ERR(vi->failover)) {
> > > > > --
> > > > > 2.31.0
> > > >
> >


^ permalink raw reply

* Re: [PATCH net-next] net: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msg
From: Nikolay Aleksandrov @ 2022-08-15  7:32 UTC (permalink / raw)
  To: Zhengchao Shao, netdev, linux-kernel, davem, edumazet, kuba,
	pabeni
  Cc: idosch, petrm, florent.fourcot, weiyongjun1, yuehaibing
In-Reply-To: <feff23c6-529c-3421-c48c-463846e59630@blackwall.org>

On 15/08/2022 08:44, Nikolay Aleksandrov wrote:
> On 15/08/2022 05:46, Zhengchao Shao wrote:
>> When bulk delete command is received in the rtnetlink_rcv_msg function,
>> if bulk delete is not supported, module_put is not called to release
>> the reference counting. As a result, module reference count is leaked.
>>
>> Fixes: a6cec0bcd342("net: rtnetlink: add bulk delete support flag")
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>  net/core/rtnetlink.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>> index ac45328607f7..4b5b15c684ed 100644
>> --- a/net/core/rtnetlink.c
>> +++ b/net/core/rtnetlink.c
>> @@ -6070,6 +6070,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
>>  	if (kind == RTNL_KIND_DEL && (nlh->nlmsg_flags & NLM_F_BULK) &&
>>  	    !(flags & RTNL_FLAG_BULK_DEL_SUPPORTED)) {
>>  		NL_SET_ERR_MSG(extack, "Bulk delete is not supported");
>> +		module_put(owner);
>>  		goto err_unlock;
>>  	}
>>  
> 
> Oops, thanks.
> Acked-by: Nikolay Aleksandrov <razor@blackwall.org>

By the way I think this should be targeted at -net,
I didn't notice the net-next tag earlier.

^ permalink raw reply

* Re: [PATCH v14 37/42] virtio_net: set the default max ring size by find_vqs()
From: Xuan Zhuo @ 2022-08-15  7:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Hans de Goede, Mark Gross, Vadim Pasternak,
	Bjorn Andersson, Mathieu Poirier, Cornelia Huck, Halil Pasic,
	Eric Farman, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Vincent Whitchurch, linux-um, netdev, platform-driver-x86,
	linux-remoteproc, linux-s390, kvm, bpf, kangjie.xu
In-Reply-To: <20220815031022-mutt-send-email-mst@kernel.org>

On Mon, 15 Aug 2022 03:14:43 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Aug 15, 2022 at 02:35:03PM +0800, Xuan Zhuo wrote:
> > On Mon, 15 Aug 2022 02:00:16 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Aug 01, 2022 at 02:38:57PM +0800, Xuan Zhuo wrote:
> > > > Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
> > > > rx at the same time.
> > > >
> > > >                          | rx/tx ring size
> > > > -------------------------------------------
> > > > speed == UNKNOWN or < 10G| 1024
> > > > speed < 40G              | 4096
> > > > speed >= 40G             | 8192
> > > >
> > > > Call virtnet_update_settings() once before calling init_vqs() to update
> > > > speed.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > >
> > > I've been looking at this patchset because of the resent
> > > reported crashes, and I'm having second thoughts about this.
> > >
> > > Do we really want to second-guess the device supplied
> > > max ring size? If yes why?
> > >
> > > Could you please share some performance data that motivated this
> > > specific set of numbers?
> >
> >
> > The impact of this value on performance is as follows. The larger the value, the
> > throughput can be increased, but the delay will also increase accordingly. It is
> > a maximum limit for the ring size under the corresponding speed. The purpose of
> > this limitation is not to improve performance, but more to reduce memory usage.
> >
> > These data come from many other network cards and some network optimization
> > experience.
> >
> > For example, in the case of speed = 20G, the impact of ring size greater
> > than 4096 on performance has no meaning. At this time, if the device supports
> > 8192, we limit it to 4096 through this, the real meaning is to reduce the memory
> > usage.
> >
> >
> > >
> > > Also why do we intepret UNKNOWN as "very low"?
> > > I'm thinking that should definitely be "don't change anything".
> > >
> >
> > Generally speaking, for a network card with a high speed, it will return a
> > correct speed. But I think it is a good idea to do nothing.
>
>
>
>
>
> >
> > > Finally if all this makes sense then shouldn't we react when
> > > speed changes?
> >
> > This is the feedback of the network card when it is started, and theoretically
> > it should not change in the future.
>
> Yes it should:
> 	Both \field{speed} and \field{duplex} can change, thus the driver
> 	is expected to re-read these values after receiving a
> 	configuration change notification.
>
>
> Moreover, during probe link can quite reasonably be down.
> If it is, then speed and duplex might not be correct.
>


It seems that this is indeed a problem.

But I feel that this is not the reason for the abnormal network.

I'm still trying google cloud vm.


>
>
>
> > >
> > > Could you try reverting this and showing performance results
> > > before and after please? Thanks!
> >
> > I hope the above reply can help you, if there is anything else you need me to
> > cooperate with, I am very happy.
> >
> > If you think it's ok, I can resubmit a commit with 'UNKNOW' set to unlimited. I
> > can submit it with the issue of #30.
> >
> > Thanks.
> >
> >
> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 42 ++++++++++++++++++++++++++++++++++++----
> > > >  1 file changed, 38 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8a5810bcb839..40532ecbe7fc 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -3208,6 +3208,29 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu
> > > >  		   (unsigned int)GOOD_PACKET_LEN);
> > > >  }
> > > >
> > > > +static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes)
> > > > +{
> > > > +	u32 i, rx_size, tx_size;
> > > > +
> > > > +	if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000) {
> > > > +		rx_size = 1024;
> > > > +		tx_size = 1024;
> > > > +
> > > > +	} else if (vi->speed < SPEED_40000) {
> > > > +		rx_size = 1024 * 4;
> > > > +		tx_size = 1024 * 4;
> > > > +
> > > > +	} else {
> > > > +		rx_size = 1024 * 8;
> > > > +		tx_size = 1024 * 8;
> > > > +	}
> > > > +
> > > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +		sizes[rxq2vq(i)] = rx_size;
> > > > +		sizes[txq2vq(i)] = tx_size;
> > > > +	}
> > > > +}
> > > > +
> > > >  static int virtnet_find_vqs(struct virtnet_info *vi)
> > > >  {
> > > >  	vq_callback_t **callbacks;
> > > > @@ -3215,6 +3238,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > >  	int ret = -ENOMEM;
> > > >  	int i, total_vqs;
> > > >  	const char **names;
> > > > +	u32 *sizes;
> > > >  	bool *ctx;
> > > >
> > > >  	/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
> > > > @@ -3242,10 +3266,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > >  		ctx = NULL;
> > > >  	}
> > > >
> > > > +	sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL);
> > > > +	if (!sizes)
> > > > +		goto err_sizes;
> > > > +
> > > >  	/* Parameters for control virtqueue, if any */
> > > >  	if (vi->has_cvq) {
> > > >  		callbacks[total_vqs - 1] = NULL;
> > > >  		names[total_vqs - 1] = "control";
> > > > +		sizes[total_vqs - 1] = 64;
> > > >  	}
> > > >
> > > >  	/* Allocate/initialize parameters for send/receive virtqueues */
> > > > @@ -3260,8 +3289,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > >  			ctx[rxq2vq(i)] = true;
> > > >  	}
> > > >
> > > > -	ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
> > > > -				  names, ctx, NULL);
> > > > +	virtnet_config_sizes(vi, sizes);
> > > > +
> > > > +	ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks,
> > > > +				       names, sizes, ctx, NULL);
> > > >  	if (ret)
> > > >  		goto err_find;
> > > >
> > > > @@ -3281,6 +3312,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > >
> > > >
> > > >  err_find:
> > > > +	kfree(sizes);
> > > > +err_sizes:
> > > >  	kfree(ctx);
> > > >  err_ctx:
> > > >  	kfree(names);
> > > > @@ -3630,6 +3663,9 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > >  		vi->curr_queue_pairs = num_online_cpus();
> > > >  	vi->max_queue_pairs = max_queue_pairs;
> > > >
> > > > +	virtnet_init_settings(dev);
> > > > +	virtnet_update_settings(vi);
> > > > +
> > > >  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> > > >  	err = init_vqs(vi);
> > > >  	if (err)
> > > > @@ -3642,8 +3678,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > >  	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> > > >  	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> > > >
> > > > -	virtnet_init_settings(dev);
> > > > -
> > > >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
> > > >  		vi->failover = net_failover_create(vi->dev);
> > > >  		if (IS_ERR(vi->failover)) {
> > > > --
> > > > 2.31.0
> > >
>

^ permalink raw reply

* Re: [PATCH] wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
From: Siddh Raman Pant @ 2022-08-15  7:24 UTC (permalink / raw)
  To: Greg KH
  Cc: johannes berg, david s. miller, eric dumazet, jakub kicinski,
	paolo abeni, netdev, linux-wireless, linux-kernel, stable,
	linux-kernel-mentees, syzbot+b6c9fe29aefe68e4ad34
In-Reply-To: <Yvnu/WT1Z+K36UwW@kroah.com>

On Mon, 15 Aug 2022 12:30:13 +0530  Greg KH  wrote:
> Please no blank line before your signed-off-by line or the tools will
> not like it.

Oh okay, noted.

> And did sysbot verify that this change solved the problem?

Syzbot was failing to boot for reasons unrelated to the patch.
I tried testing three times, and every time the boot was failing.
It was taking more than 5 hours for syzbot to pick up the patch
from pending state every time, so I now posted it.

You can look at the syzkaller group page here:
https://groups.google.com/g/syzkaller-bugs/c/bGZwWS4Q3ek/m/dQ3pdAVSAAAJ
(Pardon the atrocious HTML email at the end, I used a mobile app
to email and forgot it doesn't send in plaintext.)

I have locally tested this with the reproducer syzbot gave.

Thanks,
Siddh

^ permalink raw reply

* Re: [PATCH v14 37/42] virtio_net: set the default max ring size by find_vqs()
From: Michael S. Tsirkin @ 2022-08-15  7:14 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Hans de Goede, Mark Gross, Vadim Pasternak,
	Bjorn Andersson, Mathieu Poirier, Cornelia Huck, Halil Pasic,
	Eric Farman, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Vincent Whitchurch, linux-um, netdev, platform-driver-x86,
	linux-remoteproc, linux-s390, kvm, bpf, kangjie.xu
In-Reply-To: <1660545303.436073-9-xuanzhuo@linux.alibaba.com>

On Mon, Aug 15, 2022 at 02:35:03PM +0800, Xuan Zhuo wrote:
> On Mon, 15 Aug 2022 02:00:16 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Aug 01, 2022 at 02:38:57PM +0800, Xuan Zhuo wrote:
> > > Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
> > > rx at the same time.
> > >
> > >                          | rx/tx ring size
> > > -------------------------------------------
> > > speed == UNKNOWN or < 10G| 1024
> > > speed < 40G              | 4096
> > > speed >= 40G             | 8192
> > >
> > > Call virtnet_update_settings() once before calling init_vqs() to update
> > > speed.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > Acked-by: Jason Wang <jasowang@redhat.com>
> >
> > I've been looking at this patchset because of the resent
> > reported crashes, and I'm having second thoughts about this.
> >
> > Do we really want to second-guess the device supplied
> > max ring size? If yes why?
> >
> > Could you please share some performance data that motivated this
> > specific set of numbers?
> 
> 
> The impact of this value on performance is as follows. The larger the value, the
> throughput can be increased, but the delay will also increase accordingly. It is
> a maximum limit for the ring size under the corresponding speed. The purpose of
> this limitation is not to improve performance, but more to reduce memory usage.
> 
> These data come from many other network cards and some network optimization
> experience.
> 
> For example, in the case of speed = 20G, the impact of ring size greater
> than 4096 on performance has no meaning. At this time, if the device supports
> 8192, we limit it to 4096 through this, the real meaning is to reduce the memory
> usage.
> 
> 
> >
> > Also why do we intepret UNKNOWN as "very low"?
> > I'm thinking that should definitely be "don't change anything".
> >
> 
> Generally speaking, for a network card with a high speed, it will return a
> correct speed. But I think it is a good idea to do nothing.





> 
> > Finally if all this makes sense then shouldn't we react when
> > speed changes?
> 
> This is the feedback of the network card when it is started, and theoretically
> it should not change in the future.

Yes it should:
	Both \field{speed} and \field{duplex} can change, thus the driver
	is expected to re-read these values after receiving a
	configuration change notification.


Moreover, during probe link can quite reasonably be down.
If it is, then speed and duplex might not be correct.




> >
> > Could you try reverting this and showing performance results
> > before and after please? Thanks!
> 
> I hope the above reply can help you, if there is anything else you need me to
> cooperate with, I am very happy.
> 
> If you think it's ok, I can resubmit a commit with 'UNKNOW' set to unlimited. I
> can submit it with the issue of #30.
> 
> Thanks.
> 
> 
> >
> > > ---
> > >  drivers/net/virtio_net.c | 42 ++++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 38 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 8a5810bcb839..40532ecbe7fc 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -3208,6 +3208,29 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu
> > >  		   (unsigned int)GOOD_PACKET_LEN);
> > >  }
> > >
> > > +static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes)
> > > +{
> > > +	u32 i, rx_size, tx_size;
> > > +
> > > +	if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000) {
> > > +		rx_size = 1024;
> > > +		tx_size = 1024;
> > > +
> > > +	} else if (vi->speed < SPEED_40000) {
> > > +		rx_size = 1024 * 4;
> > > +		tx_size = 1024 * 4;
> > > +
> > > +	} else {
> > > +		rx_size = 1024 * 8;
> > > +		tx_size = 1024 * 8;
> > > +	}
> > > +
> > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +		sizes[rxq2vq(i)] = rx_size;
> > > +		sizes[txq2vq(i)] = tx_size;
> > > +	}
> > > +}
> > > +
> > >  static int virtnet_find_vqs(struct virtnet_info *vi)
> > >  {
> > >  	vq_callback_t **callbacks;
> > > @@ -3215,6 +3238,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >  	int ret = -ENOMEM;
> > >  	int i, total_vqs;
> > >  	const char **names;
> > > +	u32 *sizes;
> > >  	bool *ctx;
> > >
> > >  	/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
> > > @@ -3242,10 +3266,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >  		ctx = NULL;
> > >  	}
> > >
> > > +	sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL);
> > > +	if (!sizes)
> > > +		goto err_sizes;
> > > +
> > >  	/* Parameters for control virtqueue, if any */
> > >  	if (vi->has_cvq) {
> > >  		callbacks[total_vqs - 1] = NULL;
> > >  		names[total_vqs - 1] = "control";
> > > +		sizes[total_vqs - 1] = 64;
> > >  	}
> > >
> > >  	/* Allocate/initialize parameters for send/receive virtqueues */
> > > @@ -3260,8 +3289,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >  			ctx[rxq2vq(i)] = true;
> > >  	}
> > >
> > > -	ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
> > > -				  names, ctx, NULL);
> > > +	virtnet_config_sizes(vi, sizes);
> > > +
> > > +	ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks,
> > > +				       names, sizes, ctx, NULL);
> > >  	if (ret)
> > >  		goto err_find;
> > >
> > > @@ -3281,6 +3312,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >
> > >
> > >  err_find:
> > > +	kfree(sizes);
> > > +err_sizes:
> > >  	kfree(ctx);
> > >  err_ctx:
> > >  	kfree(names);
> > > @@ -3630,6 +3663,9 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >  		vi->curr_queue_pairs = num_online_cpus();
> > >  	vi->max_queue_pairs = max_queue_pairs;
> > >
> > > +	virtnet_init_settings(dev);
> > > +	virtnet_update_settings(vi);
> > > +
> > >  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> > >  	err = init_vqs(vi);
> > >  	if (err)
> > > @@ -3642,8 +3678,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >  	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> > >  	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> > >
> > > -	virtnet_init_settings(dev);
> > > -
> > >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
> > >  		vi->failover = net_failover_create(vi->dev);
> > >  		if (IS_ERR(vi->failover)) {
> > > --
> > > 2.31.0
> >


^ permalink raw reply

* Re: Kernel Panic in skb_release_data using genet
From: Maxime Ripard @ 2022-08-15  7:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: nicolas saenz julienne, Doug Berger, bcm-kernel-feedback-list,
	linux-kernel, netdev
In-Reply-To: <a6035600-56f6-1760-ae5c-5e8131a2e8e4@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3023 bytes --]

Hi Florian,

On Thu, Aug 11, 2022 at 08:33:58PM -0700, Florian Fainelli wrote:
> 
> 
> On 5/17/2022 12:52 AM, Maxime Ripard wrote:
> > It's not really 100% reliable, but happens 30%-50% of the time at boot
> > when KASAN is enabled. It seems like enabling KASAN increases that
> > likelihood though, it went unnoticed for some time before I started
> > having those issues again when I enabled it for something unrelated.
> > 
> > It looks like it happens in bursts though, so I would get 10-15 boots
> > fine, and then 4-5 boots with that crash.
> > 
> > Cold boot vs reboot doesn't seem to affect it in one way or the other.
> > 
> > > What version of GCC did you build your kernel with?
> > 
> > The arm64 cross-compiler packaged by Fedora, which is GCC 11.2
> > at the moment.
> > 
> > > How often does that happen? What config.txt file are you using
> > > for your Pi4 B?
> > 
> > You'll find my config.txt and kernel .config attached
> 
> OK, so this is what I have been able to reproduce so far but this does not
> appear to be very reliable to reproduce, I will try my best to hold on to
> that lead though, thanks for your patience.
> 
> # udhcpc -i eth0
> udhcpc: started, v1.35.0
> [   34.355086] bcmgenet fd580000.ethernet: configuring instance for external
> RGMII (RX delay)
> [   34.363758]
> ==================================================================
> [   34.371106] BUG: KASAN: user-memory-access in put_page+0x10/0x64
> [   34.377227] Read of size 4 at addr 01000085 by task ifconfig/165
> [   34.383338]
> [   34.384857] CPU: 0 PID: 165 Comm: ifconfig Tainted: G        W   5.19.0
> #43
> [   34.392560] Hardware name: BCM2711
> [   34.396020]  unwind_backtrace from show_stack+0x18/0x1c
> [   34.401354]  show_stack from dump_stack_lvl+0x40/0x4c
> [   34.406502]  dump_stack_lvl from kasan_report+0x8c/0xa4
> [   34.411825]  kasan_report from put_page+0x10/0x64
> [   34.416615]  put_page from skb_release_data+0x84/0x13c
> [   34.421847]  skb_release_data from __kfree_skb+0x14/0x20
> [   34.427256]  __kfree_skb from bcmgenet_rx_poll+0x504/0x6f8
> [   34.432846]  bcmgenet_rx_poll from __napi_poll.constprop.0+0x50/0x1c0
> [   34.439407]  __napi_poll.constprop.0 from net_rx_action+0x278/0x488
> [   34.445787]  net_rx_action from __do_softirq+0x268/0x390
> [   34.451197]  __do_softirq from __irq_exit_rcu+0x88/0xf8
> [   34.456521]  __irq_exit_rcu from irq_exit+0x10/0x18
> [   34.461492]  irq_exit from call_with_stack+0x18/0x20
> [   34.466553]  call_with_stack from __irq_svc+0x84/0x94
> [   34.471696] Exception stack(0xf0d337f8 to 0xf0d33840)

It looks fairly close indeed.

There's a bunch of notable differences though (user-memory-access vs
wild-memory-access, the read size) but the type of memory access error
can just be due to the randomness of the memory address we try to
access, and the read 4 vs 8 could be because you're running on ARM and
I'm running on arm64?

Thanks again for looking into it

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [GIT PULL] virtio: fatures, fixes
From: Andres Freund @ 2022-08-15  7:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, Linus Torvalds, kvm, virtualization, netdev,
	linux-kernel, alvaro.karsz, colin.i.king, colin.king,
	dan.carpenter, david, elic, eperezma, gautam.dawar, gshan,
	hdegoede, hulkci, jasowang, jiaming, kangjie.xu, lingshan.zhu,
	liubo03, michael.christie, pankaj.gupta, peng.fan, quic_mingxue,
	robin.murphy, sgarzare, suwan.kim027, syoshida, xieyongji,
	xuqiang36, Jens Axboe, Guenter Roeck
In-Reply-To: <20220814194031.ciql3slc5c34ayjw@awork3.anarazel.de>

Hi,

On 2022-08-14 12:40:31 -0700, Andres Freund wrote:
> On 2022-08-14 04:59:48 -0400, Michael S. Tsirkin wrote:
> > On Sat, Aug 13, 2022 at 09:39:06PM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2022-08-13 20:52:39 -0700, Andres Freund wrote:
> > > > Is there specific information you'd like from the VM? I just recreated the
> > > > problem and can extract.
> > >
> > > Actually, after reproducing I seem to now hit a likely different issue. I
> > > guess I should have checked exactly the revision I had a problem with earlier,
> > > rather than doing a git pull (up to aea23e7c464b)
> >
> > Looks like there's a generic memory corruption so it crashes
> > in random places.
>
> Either a generic memory corruption, or something wrong with IO.
>
> > Would bisect be possible for you?
>
> I'll give it a go.

Bisect points to

commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7 (refs/bisect/bad)
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date:   Mon Aug 1 14:38:57 2022 +0800

    virtio_net: set the default max ring size by find_vqs()

    Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
    rx at the same time.

                             | rx/tx ring size
    -------------------------------------------
    speed == UNKNOWN or < 10G| 1024
    speed < 40G              | 4096
    speed >= 40G             | 8192

    Call virtnet_update_settings() once before calling init_vqs() to update
    speed.

    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Message-Id: <20220801063902.129329-38-xuanzhuo@linux.alibaba.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


I'm not 100% confident yet, because the likelihood of encountering problems
was not uniform across the versions, with one of them showing the problem only
in 1/3 boots, whereas some of the others showed it 100% of the time. But I've
rebooted enough times to be fairly confident.

With 762faee5a267 I reliably see network not connecting, with
762faee5a267^=fe3dc04e31aa I haven't seen a problem yet.


I did see some other types of crashes in commits nearby, so this might not be
the only problematic bit. See also the discussion around
https://lore.kernel.org/all/CAHk-=wikzU4402P-FpJRK_QwfVOS+t-3p1Wx5awGHTvr-s_0Ew@mail.gmail.com/

Greetings,

Andres Freund

^ permalink raw reply

* Re: [PATCH] wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
From: Greg KH @ 2022-08-15  7:00 UTC (permalink / raw)
  To: Siddh Raman Pant
  Cc: Johannes Berg, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-wireless, linux-kernel, stable,
	linux-kernel-mentees, syzbot+b6c9fe29aefe68e4ad34
In-Reply-To: <20220814151512.9985-1-code@siddh.me>

On Sun, Aug 14, 2022 at 08:45:12PM +0530, Siddh Raman Pant via Linux-kernel-mentees wrote:
> When we are not connected to a channel, sending channel "switch"
> announcement doesn't make any sense.
> 
> The BSS list is empty in that case. This causes the for loop in
> cfg80211_get_bss() to be bypassed, so the function returns NULL
> (check line 1424 of net/wireless/scan.c), causing the WARN_ON()
> in ieee80211_ibss_csa_beacon() to get triggered (check line 500
> of net/mac80211/ibss.c), which was consequently reported on the
> syzkaller dashboard.
> 
> Thus, check if we have an existing connection before generating
> the CSA beacon in ieee80211_ibss_finish_csa().
> 
> Fixes: cd7760e62c2a ("mac80211: add support for CSA in IBSS mode")
> Bug report: https://syzkaller.appspot.com/bug?id=05603ef4ae8926761b678d2939a3b2ad28ab9ca6
> Reported-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> 
> Signed-off-by: Siddh Raman Pant <code@siddh.me>

Please no blank line before your signed-off-by line or the tools will
not like it.

And did sysbot verify that this change solved the problem?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v14 37/42] virtio_net: set the default max ring size by find_vqs()
From: Xuan Zhuo @ 2022-08-15  6:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, Richard Weinberger, Anton Ivanov, Johannes Berg,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Hans de Goede, Mark Gross, Vadim Pasternak,
	Bjorn Andersson, Mathieu Poirier, Cornelia Huck, Halil Pasic,
	Eric Farman, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Vincent Whitchurch, linux-um, netdev, platform-driver-x86,
	linux-remoteproc, linux-s390, kvm, bpf, kangjie.xu
In-Reply-To: <20220815015405-mutt-send-email-mst@kernel.org>

On Mon, 15 Aug 2022 02:00:16 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Aug 01, 2022 at 02:38:57PM +0800, Xuan Zhuo wrote:
> > Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx,
> > rx at the same time.
> >
> >                          | rx/tx ring size
> > -------------------------------------------
> > speed == UNKNOWN or < 10G| 1024
> > speed < 40G              | 4096
> > speed >= 40G             | 8192
> >
> > Call virtnet_update_settings() once before calling init_vqs() to update
> > speed.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
>
> I've been looking at this patchset because of the resent
> reported crashes, and I'm having second thoughts about this.
>
> Do we really want to second-guess the device supplied
> max ring size? If yes why?
>
> Could you please share some performance data that motivated this
> specific set of numbers?


The impact of this value on performance is as follows. The larger the value, the
throughput can be increased, but the delay will also increase accordingly. It is
a maximum limit for the ring size under the corresponding speed. The purpose of
this limitation is not to improve performance, but more to reduce memory usage.

These data come from many other network cards and some network optimization
experience.

For example, in the case of speed = 20G, the impact of ring size greater
than 4096 on performance has no meaning. At this time, if the device supports
8192, we limit it to 4096 through this, the real meaning is to reduce the memory
usage.


>
> Also why do we intepret UNKNOWN as "very low"?
> I'm thinking that should definitely be "don't change anything".
>

Generally speaking, for a network card with a high speed, it will return a
correct speed. But I think it is a good idea to do nothing.


> Finally if all this makes sense then shouldn't we react when
> speed changes?

This is the feedback of the network card when it is started, and theoretically
it should not change in the future.

>
> Could you try reverting this and showing performance results
> before and after please? Thanks!

I hope the above reply can help you, if there is anything else you need me to
cooperate with, I am very happy.

If you think it's ok, I can resubmit a commit with 'UNKNOW' set to unlimited. I
can submit it with the issue of #30.

Thanks.


>
> > ---
> >  drivers/net/virtio_net.c | 42 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8a5810bcb839..40532ecbe7fc 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -3208,6 +3208,29 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu
> >  		   (unsigned int)GOOD_PACKET_LEN);
> >  }
> >
> > +static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes)
> > +{
> > +	u32 i, rx_size, tx_size;
> > +
> > +	if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000) {
> > +		rx_size = 1024;
> > +		tx_size = 1024;
> > +
> > +	} else if (vi->speed < SPEED_40000) {
> > +		rx_size = 1024 * 4;
> > +		tx_size = 1024 * 4;
> > +
> > +	} else {
> > +		rx_size = 1024 * 8;
> > +		tx_size = 1024 * 8;
> > +	}
> > +
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		sizes[rxq2vq(i)] = rx_size;
> > +		sizes[txq2vq(i)] = tx_size;
> > +	}
> > +}
> > +
> >  static int virtnet_find_vqs(struct virtnet_info *vi)
> >  {
> >  	vq_callback_t **callbacks;
> > @@ -3215,6 +3238,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >  	int ret = -ENOMEM;
> >  	int i, total_vqs;
> >  	const char **names;
> > +	u32 *sizes;
> >  	bool *ctx;
> >
> >  	/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
> > @@ -3242,10 +3266,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >  		ctx = NULL;
> >  	}
> >
> > +	sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL);
> > +	if (!sizes)
> > +		goto err_sizes;
> > +
> >  	/* Parameters for control virtqueue, if any */
> >  	if (vi->has_cvq) {
> >  		callbacks[total_vqs - 1] = NULL;
> >  		names[total_vqs - 1] = "control";
> > +		sizes[total_vqs - 1] = 64;
> >  	}
> >
> >  	/* Allocate/initialize parameters for send/receive virtqueues */
> > @@ -3260,8 +3289,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >  			ctx[rxq2vq(i)] = true;
> >  	}
> >
> > -	ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
> > -				  names, ctx, NULL);
> > +	virtnet_config_sizes(vi, sizes);
> > +
> > +	ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks,
> > +				       names, sizes, ctx, NULL);
> >  	if (ret)
> >  		goto err_find;
> >
> > @@ -3281,6 +3312,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >
> >
> >  err_find:
> > +	kfree(sizes);
> > +err_sizes:
> >  	kfree(ctx);
> >  err_ctx:
> >  	kfree(names);
> > @@ -3630,6 +3663,9 @@ static int virtnet_probe(struct virtio_device *vdev)
> >  		vi->curr_queue_pairs = num_online_cpus();
> >  	vi->max_queue_pairs = max_queue_pairs;
> >
> > +	virtnet_init_settings(dev);
> > +	virtnet_update_settings(vi);
> > +
> >  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> >  	err = init_vqs(vi);
> >  	if (err)
> > @@ -3642,8 +3678,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> >  	netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
> >  	netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
> >
> > -	virtnet_init_settings(dev);
> > -
> >  	if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
> >  		vi->failover = net_failover_create(vi->dev);
> >  		if (IS_ERR(vi->failover)) {
> > --
> > 2.31.0
>

^ permalink raw reply

* [PATCH net-next] net: sched: make tcf_action_dump_1() static
From: Zhengchao Shao @ 2022-08-15  7:01 UTC (permalink / raw)
  To: netdev, linux-kernel, davem, edumazet, kuba, pabeni, jhs,
	xiyou.wangcong, jiri
  Cc: weiyongjun1, yuehaibing, shaozhengchao

Function tcf_action_dump_1() is not used outside of act_api.c, so remove
the superfluous EXPORT_SYMBOL() and marks it static.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
 include/net/act_api.h |   1 -
 net/sched/act_api.c   | 100 +++++++++++++++++++++---------------------
 2 files changed, 49 insertions(+), 52 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 9cf6870b526e..d51b3f931771 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -215,7 +215,6 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind,
 		    int ref, bool terse);
 int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int);
-int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int, int);
 
 static inline void tcf_action_update_bstats(struct tc_action *a,
 					    struct sk_buff *skb)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index b69fcde546ba..9fd98bf5c724 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -510,6 +510,55 @@ tcf_action_dump_terse(struct sk_buff *skb, struct tc_action *a, bool from_act)
 	return -1;
 }
 
+int
+tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
+{
+	return a->ops->dump(skb, a, bind, ref);
+}
+
+static int
+tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
+{
+	int err = -EINVAL;
+	unsigned char *b = skb_tail_pointer(skb);
+	struct nlattr *nest;
+	u32 flags;
+
+	if (tcf_action_dump_terse(skb, a, false))
+		goto nla_put_failure;
+
+	if (a->hw_stats != TCA_ACT_HW_STATS_ANY &&
+	    nla_put_bitfield32(skb, TCA_ACT_HW_STATS,
+			       a->hw_stats, TCA_ACT_HW_STATS_ANY))
+		goto nla_put_failure;
+
+	if (a->used_hw_stats_valid &&
+	    nla_put_bitfield32(skb, TCA_ACT_USED_HW_STATS,
+			       a->used_hw_stats, TCA_ACT_HW_STATS_ANY))
+		goto nla_put_failure;
+
+	flags = a->tcfa_flags & TCA_ACT_FLAGS_USER_MASK;
+	if (flags &&
+	    nla_put_bitfield32(skb, TCA_ACT_FLAGS,
+			       flags, flags))
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_ACT_IN_HW_COUNT, a->in_hw_count))
+		goto nla_put_failure;
+
+	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
+	if (!nest)
+		goto nla_put_failure;
+	err = tcf_action_dump_old(skb, a, bind, ref);
+	if (err > 0) {
+		nla_nest_end(skb, nest);
+		return err;
+	}
+
+nla_put_failure:
+	nlmsg_trim(skb, b);
+	return -1;
+}
 static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 			   struct netlink_callback *cb)
 {
@@ -1132,57 +1181,6 @@ static void tcf_action_put_many(struct tc_action *actions[])
 	}
 }
 
-int
-tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
-{
-	return a->ops->dump(skb, a, bind, ref);
-}
-
-int
-tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
-{
-	int err = -EINVAL;
-	unsigned char *b = skb_tail_pointer(skb);
-	struct nlattr *nest;
-	u32 flags;
-
-	if (tcf_action_dump_terse(skb, a, false))
-		goto nla_put_failure;
-
-	if (a->hw_stats != TCA_ACT_HW_STATS_ANY &&
-	    nla_put_bitfield32(skb, TCA_ACT_HW_STATS,
-			       a->hw_stats, TCA_ACT_HW_STATS_ANY))
-		goto nla_put_failure;
-
-	if (a->used_hw_stats_valid &&
-	    nla_put_bitfield32(skb, TCA_ACT_USED_HW_STATS,
-			       a->used_hw_stats, TCA_ACT_HW_STATS_ANY))
-		goto nla_put_failure;
-
-	flags = a->tcfa_flags & TCA_ACT_FLAGS_USER_MASK;
-	if (flags &&
-	    nla_put_bitfield32(skb, TCA_ACT_FLAGS,
-			       flags, flags))
-		goto nla_put_failure;
-
-	if (nla_put_u32(skb, TCA_ACT_IN_HW_COUNT, a->in_hw_count))
-		goto nla_put_failure;
-
-	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
-	if (nest == NULL)
-		goto nla_put_failure;
-	err = tcf_action_dump_old(skb, a, bind, ref);
-	if (err > 0) {
-		nla_nest_end(skb, nest);
-		return err;
-	}
-
-nla_put_failure:
-	nlmsg_trim(skb, b);
-	return -1;
-}
-EXPORT_SYMBOL(tcf_action_dump_1);
-
 int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[],
 		    int bind, int ref, bool terse)
 {
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Wei Fang @ 2022-08-15  6:49 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: hkallweit1@gmail.com, linux@armlinux.org.uk, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	robh+dt@kernel.org, krzysztof.kozlowski+dt@linaro.org,
	f.fainelli@gmail.com, netdev@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <YvZggGkdlAUuQ1NG@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 2022年8月12日 22:15
> To: Wei Fang <wei.fang@nxp.com>
> Cc: hkallweit1@gmail.com; linux@armlinux.org.uk; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> robh+dt@kernel.org; krzysztof.kozlowski+dt@linaro.org; f.fainelli@gmail.com;
> netdev@vger.kernel.org; devicetree@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
> 
> On Sat, Aug 13, 2022 at 12:50:08AM +1000, wei.fang@nxp.com wrote:
> > From: Wei Fang <wei.fang@nxp.com>
> >
> > The hibernation mode of Atheros AR803x PHYs is default enabled.
> > When the cable is unplugged, the PHY will enter hibernation mode and
> > the PHY clock does down. For some MACs, it needs the clock to support
> > it's logic. For instance, stmmac needs the PHY inputs clock is present
> > for software reset completion. Therefore, It is reasonable to add a DT
> > property to disable hibernation mode.
> 
> It is not the first time we have seen this. What you should really be
> concentrating on is the clock out. That is what the MAC requires here.
> 
> You already have the property qca,clk-out-frequency. You could maybe piggy
> back off this. If that property is being used, you know the clock output is used. So
> you should do what is needed to keep it ticking.
> 
> You also have qca,keep-pll-enabled:
> 
>       If set, keep the PLL enabled even if there is no link. Useful if you
>       want to use the clock output without an ethernet link.
> 
> To me, it seems like you already have enough properties, you just need to imply
> that you need to disable hibernation in order to fulfil these properties.
> 
> 	Andrew

Hi Andrew,

	Your suggestion is indeed an effective solution, but I checked both the datasheet
and the driver of AR803x PHYs and found that the qca,clk-out-frequency and the
qca,keep-pll-enabled properties are associated with the CLK_25M pin of AR803x PHYs.
But there is a case that CLK_25M pin is not used on some platforms.
Taking our i.MX8DXL platform as an example, the stmmac and AR8031 PHY are applied
on this platform, but the CLK_25M pin of AR8031 is not used. So when I used the method
you mentioned above, it did not work as expected. In this case, we can only disable the
hibernation mode of AR803x PHYs and keep the RX_CLK always outputting a valid clock
so that the stmmac can complete the software reset operation.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox