netdev.vger.kernel.org archive mirror
* [PATCH v4 net] qede: Fix scheduling while atomic
@ 2023-05-18 14:52 Manish Chopra
  2023-05-20  4:30 ` Jakub Kicinski
  0 siblings, 1 reply; 3+ messages in thread
From: Manish Chopra @ 2023-05-18 14:52 UTC (permalink / raw)
  To: kuba; +Cc: netdev, aelior, palok, Sudarsana Kalluru, David Miller

The bonding module collects statistics while holding a spinlock;
beneath that, the qede->qed driver statistics flow gets scheduled
out due to the usleep_range() used in the PTT acquire logic, which
results in the bug and traces below -

[ 3673.988874] Hardware name: HPE ProLiant DL365 Gen10 Plus/ProLiant DL365 Gen10 Plus, BIOS A42 10/29/2021
[ 3673.988878] Call Trace:
[ 3673.988891]  dump_stack_lvl+0x34/0x44
[ 3673.988908]  __schedule_bug.cold+0x47/0x53
[ 3673.988918]  __schedule+0x3fb/0x560
[ 3673.988929]  schedule+0x43/0xb0
[ 3673.988932]  schedule_hrtimeout_range_clock+0xbf/0x1b0
[ 3673.988937]  ? __hrtimer_init+0xc0/0xc0
[ 3673.988950]  usleep_range+0x5e/0x80
[ 3673.988955]  qed_ptt_acquire+0x2b/0xd0 [qed]
[ 3673.988981]  _qed_get_vport_stats+0x141/0x240 [qed]
[ 3673.989001]  qed_get_vport_stats+0x18/0x80 [qed]
[ 3673.989016]  qede_fill_by_demand_stats+0x37/0x400 [qede]
[ 3673.989028]  qede_get_stats64+0x19/0xe0 [qede]
[ 3673.989034]  dev_get_stats+0x5c/0xc0
[ 3673.989045]  netstat_show.constprop.0+0x52/0xb0
[ 3673.989055]  dev_attr_show+0x19/0x40
[ 3673.989065]  sysfs_kf_seq_show+0x9b/0xf0
[ 3673.989076]  seq_read_iter+0x120/0x4b0
[ 3673.989087]  new_sync_read+0x118/0x1a0
[ 3673.989095]  vfs_read+0xf3/0x180
[ 3673.989099]  ksys_read+0x5f/0xe0
[ 3673.989102]  do_syscall_64+0x3b/0x90
[ 3673.989109]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3673.989115] RIP: 0033:0x7f8467d0b082
[ 3673.989119] Code: c0 e9 b2 fe ff ff 50 48 8d 3d ca 05 08 00 e8 35 e7 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[ 3673.989121] RSP: 002b:00007ffffb21fd08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 3673.989127] RAX: ffffffffffffffda RBX: 000000000100eca0 RCX: 00007f8467d0b082
[ 3673.989128] RDX: 00000000000003ff RSI: 00007ffffb21fdc0 RDI: 0000000000000003
[ 3673.989130] RBP: 00007f8467b96028 R08: 0000000000000010 R09: 00007ffffb21ec00
[ 3673.989132] R10: 00007ffffb27b170 R11: 0000000000000246 R12: 00000000000000f0
[ 3673.989134] R13: 0000000000000003 R14: 00007f8467b92000 R15: 0000000000045a05
[ 3673.989139] CPU: 30 PID: 285188 Comm: read_all Kdump: loaded Tainted: G        W  OE

Fix this by collecting the statistics asynchronously from a periodic
delayed work scheduled at the default stats coalescing interval, and
return the most recent copy of the statistics from .ndo_get_stats64().
Also add the ability to configure/retrieve the stats coalescing
interval using the commands below -

ethtool -C ethx stats-block-usecs <val>
ethtool -c ethx

Fixes: 133fac0eedc3 ("qede: Add basic ethtool support")
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Manish Chopra <manishc@marvell.com>
---
v1->v2:
 - Fixed checkpatch and kdoc warnings.
v2->v3:
 - Moving the changelog after tags.
v3->v4:
 - Changes to collect stats periodically using delayed work
   and add ability to configure/retrieve stats coalescing
   interval using ethtool
 - Modified commit description to reflect the changes
---
 drivers/net/ethernet/qlogic/qede/qede.h       |  9 +++++
 .../net/ethernet/qlogic/qede/qede_ethtool.c   | 38 ++++++++++++++++++-
 drivers/net/ethernet/qlogic/qede/qede_main.c  | 34 ++++++++++++++++-
 3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index f90dcfe9ee68..7106e29e2c2c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -32,6 +32,11 @@
 
 #define DRV_MODULE_SYM		qede
 
+#define QEDE_DEF_STATS_COAL_INTERVAL	HZ
+#define QEDE_DEF_STATS_COAL_TICKS	1000000
+#define QEDE_MIN_STATS_COAL_TICKS	250000
+#define QEDE_MAX_STATS_COAL_TICKS	5000000
+
 struct qede_stats_common {
 	u64 no_buff_discards;
 	u64 packet_too_big_discard;
@@ -271,6 +276,10 @@ struct qede_dev {
 #define QEDE_ERR_WARN			3
 
 	struct qede_dump_info		dump_info;
+	struct delayed_work		periodic_task;
+	unsigned long			stats_coal_interval;
+	u32				stats_coal_ticks;
+	spinlock_t			stats_lock; /* lock for vport stats access */
 };
 
 enum QEDE_STATE {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 8284c4c1528f..3457f2af4bde 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -426,6 +426,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
 		}
 	}
 
+	spin_lock(&edev->stats_lock);
+
 	for (i = 0; i < QEDE_NUM_STATS; i++) {
 		if (qede_is_irrelevant_stat(edev, i))
 			continue;
@@ -435,6 +437,8 @@ static void qede_get_ethtool_stats(struct net_device *dev,
 		buf++;
 	}
 
+	spin_unlock(&edev->stats_lock);
+
 	__qede_unlock(edev);
 }
 
@@ -817,6 +821,7 @@ static int qede_get_coalesce(struct net_device *dev,
 
 	coal->rx_coalesce_usecs = rx_coal;
 	coal->tx_coalesce_usecs = tx_coal;
+	coal->stats_block_coalesce_usecs = edev->stats_coal_ticks;
 
 	return rc;
 }
@@ -830,6 +835,33 @@ int qede_set_coalesce(struct net_device *dev, struct ethtool_coalesce *coal,
 	int i, rc = 0;
 	u16 rxc, txc;
 
+	if (edev->stats_coal_ticks != coal->stats_block_coalesce_usecs) {
+		u32 stats_coal_ticks, prev_stats_coal_ticks;
+
+		stats_coal_ticks = coal->stats_block_coalesce_usecs;
+		prev_stats_coal_ticks = edev->stats_coal_ticks;
+
+		/* zero coal ticks to disable periodic stats */
+		if (stats_coal_ticks)
+			stats_coal_ticks = clamp_t(u32, stats_coal_ticks,
+						   QEDE_MIN_STATS_COAL_TICKS,
+						   QEDE_MAX_STATS_COAL_TICKS);
+
+		stats_coal_ticks = rounddown(stats_coal_ticks, QEDE_MIN_STATS_COAL_TICKS);
+		edev->stats_coal_ticks = stats_coal_ticks;
+
+		if (edev->stats_coal_ticks) {
+			edev->stats_coal_interval = (unsigned long)edev->stats_coal_ticks *
+							HZ / 1000000;
+
+			if (prev_stats_coal_ticks == 0)
+				schedule_delayed_work(&edev->periodic_task, 0);
+		}
+
+		DP_VERBOSE(edev, QED_MSG_DEBUG, "stats coal interval=%lu jiffies\n",
+			   edev->stats_coal_interval);
+	}
+
 	if (!netif_running(dev)) {
 		DP_INFO(edev, "Interface is down\n");
 		return -EINVAL;
@@ -2236,7 +2268,8 @@ static int qede_get_per_coalesce(struct net_device *dev,
 }
 
 static const struct ethtool_ops qede_ethtool_ops = {
-	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS,
+	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS |
+					  ETHTOOL_COALESCE_STATS_BLOCK_USECS,
 	.get_link_ksettings		= qede_get_link_ksettings,
 	.set_link_ksettings		= qede_set_link_ksettings,
 	.get_drvinfo			= qede_get_drvinfo,
@@ -2287,7 +2320,8 @@ static const struct ethtool_ops qede_ethtool_ops = {
 };
 
 static const struct ethtool_ops qede_vf_ethtool_ops = {
-	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS,
+	.supported_coalesce_params	= ETHTOOL_COALESCE_USECS |
+					  ETHTOOL_COALESCE_STATS_BLOCK_USECS,
 	.get_link_ksettings		= qede_get_link_ksettings,
 	.get_drvinfo			= qede_get_drvinfo,
 	.get_msglevel			= qede_get_msglevel,
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 06c6a5813606..5aba9c4a759d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -308,6 +308,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
 
 	edev->ops->get_vport_stats(edev->cdev, &stats);
 
+	spin_lock(&edev->stats_lock);
+
 	p_common->no_buff_discards = stats.common.no_buff_discards;
 	p_common->packet_too_big_discard = stats.common.packet_too_big_discard;
 	p_common->ttl0_discard = stats.common.ttl0_discard;
@@ -405,6 +407,8 @@ void qede_fill_by_demand_stats(struct qede_dev *edev)
 		p_ah->tx_1519_to_max_byte_packets =
 		    stats.ah.tx_1519_to_max_byte_packets;
 	}
+
+	spin_unlock(&edev->stats_lock);
 }
 
 static void qede_get_stats64(struct net_device *dev,
@@ -413,9 +417,10 @@ static void qede_get_stats64(struct net_device *dev,
 	struct qede_dev *edev = netdev_priv(dev);
 	struct qede_stats_common *p_common;
 
-	qede_fill_by_demand_stats(edev);
 	p_common = &edev->stats.common;
 
+	spin_lock(&edev->stats_lock);
+
 	stats->rx_packets = p_common->rx_ucast_pkts + p_common->rx_mcast_pkts +
 			    p_common->rx_bcast_pkts;
 	stats->tx_packets = p_common->tx_ucast_pkts + p_common->tx_mcast_pkts +
@@ -435,6 +440,8 @@ static void qede_get_stats64(struct net_device *dev,
 		stats->collisions = edev->stats.bb.tx_total_collisions;
 	stats->rx_crc_errors = p_common->rx_crc_errors;
 	stats->rx_frame_errors = p_common->rx_align_errors;
+
+	spin_unlock(&edev->stats_lock);
 }
 
 #ifdef CONFIG_QED_SRIOV
@@ -1000,6 +1007,21 @@ static void qede_unlock(struct qede_dev *edev)
 	rtnl_unlock();
 }
 
+static void qede_periodic_task(struct work_struct *work)
+{
+	struct qede_dev *edev = container_of(work, struct qede_dev,
+					     periodic_task.work);
+
+	if (test_bit(QEDE_SP_DISABLE, &edev->sp_flags))
+		return;
+
+	if (edev->stats_coal_ticks) {
+		qede_fill_by_demand_stats(edev);
+		schedule_delayed_work(&edev->periodic_task,
+				      edev->stats_coal_interval);
+	}
+}
+
 static void qede_sp_task(struct work_struct *work)
 {
 	struct qede_dev *edev = container_of(work, struct qede_dev,
@@ -1208,7 +1230,9 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 		 * from there, although it's unlikely].
 		 */
 		INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
+		INIT_DELAYED_WORK(&edev->periodic_task, qede_periodic_task);
 		mutex_init(&edev->qede_lock);
+		spin_lock_init(&edev->stats_lock);
 
 		rc = register_netdev(edev->ndev);
 		if (rc) {
@@ -1233,6 +1257,11 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level,
 	edev->rx_copybreak = QEDE_RX_HDR_SIZE;
 
 	qede_log_probe(edev);
+
+	edev->stats_coal_interval = QEDE_DEF_STATS_COAL_INTERVAL;
+	edev->stats_coal_ticks = QEDE_DEF_STATS_COAL_TICKS;
+	schedule_delayed_work(&edev->periodic_task, 0);
+
 	return 0;
 
 err4:
@@ -1301,6 +1330,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
 		unregister_netdev(ndev);
 
 		cancel_delayed_work_sync(&edev->sp_task);
+		cancel_delayed_work_sync(&edev->periodic_task);
 
 		edev->ops->common->set_power_state(cdev, PCI_D0);
 
@@ -2571,6 +2601,8 @@ static void qede_recovery_handler(struct qede_dev *edev)
 
 	DP_NOTICE(edev, "Starting a recovery process\n");
 
+	edev->stats_coal_ticks = 0;
+
 	/* No need to acquire first the qede_lock since is done by qede_sp_task
 	 * before calling this function.
 	 */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v4 net] qede: Fix scheduling while atomic
  2023-05-18 14:52 [PATCH v4 net] qede: Fix scheduling while atomic Manish Chopra
@ 2023-05-20  4:30 ` Jakub Kicinski
  2023-05-22 13:09   ` [EXT] " Manish Chopra
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Kicinski @ 2023-05-20  4:30 UTC (permalink / raw)
  To: Manish Chopra; +Cc: netdev, aelior, palok, Sudarsana Kalluru, David Miller

On Thu, 18 May 2023 20:22:14 +0530 Manish Chopra wrote:
> Bonding module collects the statistics while holding
> the spinlock, beneath that qede->qed driver statistics
> flow gets scheduled out due to usleep_range() used in PTT
> acquire logic which results into below bug and traces -

>  	struct qede_dump_info		dump_info;
> +	struct delayed_work		periodic_task;
> +	unsigned long			stats_coal_interval;
> +	u32				stats_coal_ticks;

It's a bit odd to make _interval ulong and ticks u32 when _ticks will
obviously be much larger..

Also - s/ticks/usecs/ ? I'd have guessed interval == usecs, ticks ==
jiffies without reading the code, and the inverse is true.

> +	spinlock_t			stats_lock; /* lock for vport stats access */
>  };

> +	if (edev->stats_coal_ticks != coal->stats_block_coalesce_usecs) {
> +		u32 stats_coal_ticks, prev_stats_coal_ticks;
> +
> +		stats_coal_ticks = coal->stats_block_coalesce_usecs;
> +		prev_stats_coal_ticks = edev->stats_coal_ticks;
> +
> +		/* zero coal ticks to disable periodic stats */
> +		if (stats_coal_ticks)
> +			stats_coal_ticks = clamp_t(u32, stats_coal_ticks,
> +						   QEDE_MIN_STATS_COAL_TICKS,
> +						   QEDE_MAX_STATS_COAL_TICKS);
> +
> +		stats_coal_ticks = rounddown(stats_coal_ticks, QEDE_MIN_STATS_COAL_TICKS);
> +		edev->stats_coal_ticks = stats_coal_ticks;

Why round down the usecs?  Don't you want to return to the user on get
exactly what set specified?  Otherwise I wouldn't bother saving the
usecs at all, just convert back from jiffies.

> +		if (edev->stats_coal_ticks) {
> +			edev->stats_coal_interval = (unsigned long)edev->stats_coal_ticks *
> +							HZ / 1000000;

usecs_to_jiffies()

> +			if (prev_stats_coal_ticks == 0)
> +				schedule_delayed_work(&edev->periodic_task, 0);
> +		}
> +
> +		DP_VERBOSE(edev, QED_MSG_DEBUG, "stats coal interval=%lu jiffies\n",
> +			   edev->stats_coal_interval);
> +	}
> +
>  	if (!netif_running(dev)) {
>  		DP_INFO(edev, "Interface is down\n");
>  		return -EINVAL;
-- 
pw-bot: cr


* RE: [EXT] Re: [PATCH v4 net] qede: Fix scheduling while atomic
  2023-05-20  4:30 ` Jakub Kicinski
@ 2023-05-22 13:09   ` Manish Chopra
  0 siblings, 0 replies; 3+ messages in thread
From: Manish Chopra @ 2023-05-22 13:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, Ariel Elior, Alok Prasad,
	Sudarsana Reddy Kalluru, David Miller

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, May 20, 2023 10:01 AM
> To: Manish Chopra <manishc@marvell.com>
> Cc: netdev@vger.kernel.org; Ariel Elior <aelior@marvell.com>; Alok Prasad
> <palok@marvell.com>; Sudarsana Reddy Kalluru <skalluru@marvell.com>;
> David Miller <davem@davemloft.net>
> Subject: [EXT] Re: [PATCH v4 net] qede: Fix scheduling while atomic
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, 18 May 2023 20:22:14 +0530 Manish Chopra wrote:
> > Bonding module collects the statistics while holding the spinlock,
> > beneath that qede->qed driver statistics flow gets scheduled out due
> > to usleep_range() used in PTT acquire logic which results into below
> > bug and traces -
> 
> >  	struct qede_dump_info		dump_info;
> > +	struct delayed_work		periodic_task;
> > +	unsigned long			stats_coal_interval;
> > +	u32				stats_coal_ticks;
> 
> It's a bit odd to make _interval ulong and ticks u32 when _ticks will obviously
> be much larger..
> 
> Also - s/ticks/usecs/ ? I'd have guessed interval == usecs, ticks == jiffies
> without reading the code, and the inverse is true.

I will rename to avoid the confusion (ticks to usecs and interval to ticks).

> 
> > +	spinlock_t			stats_lock; /* lock for vport stats
> access */
> >  };
> 
> > +	if (edev->stats_coal_ticks != coal->stats_block_coalesce_usecs) {
> > +		u32 stats_coal_ticks, prev_stats_coal_ticks;
> > +
> > +		stats_coal_ticks = coal->stats_block_coalesce_usecs;
> > +		prev_stats_coal_ticks = edev->stats_coal_ticks;
> > +
> > +		/* zero coal ticks to disable periodic stats */
> > +		if (stats_coal_ticks)
> > +			stats_coal_ticks = clamp_t(u32, stats_coal_ticks,
> > +
> QEDE_MIN_STATS_COAL_TICKS,
> > +
> QEDE_MAX_STATS_COAL_TICKS);
> > +
> > +		stats_coal_ticks = rounddown(stats_coal_ticks,
> QEDE_MIN_STATS_COAL_TICKS);
> > +		edev->stats_coal_ticks = stats_coal_ticks;
> 
> Why round down the usecs?  Don't you want to return to the user on get
> exactly what set specified?  Otherwise I wouldn't bother saving the usecs at
> all, just convert back from jiffies.

Maybe I should drop this and allow the user to configure any usecs value
(without limiting to a min/max), as long as it results in a different/unique
ticks/jiffies value. But I should still save the usecs rather than the
ticks/jiffies (or preferably both, to avoid a runtime conversion every time
the work is scheduled), because jiffies_to_usecs() on get could be
misleading. For example, if the user configured 10 usecs, that results in a
single jiffy, and reading it back via jiffies_to_usecs() would report
1000 usecs, which is not what the user configured.

> 
> > +		if (edev->stats_coal_ticks) {
> > +			edev->stats_coal_interval = (unsigned long)edev-
> >stats_coal_ticks *
> > +							HZ / 1000000;
> 
> usecs_to_jiffies()
> 
> > +			if (prev_stats_coal_ticks == 0)
> > +				schedule_delayed_work(&edev-
> >periodic_task, 0);
> > +		}
> > +
> > +		DP_VERBOSE(edev, QED_MSG_DEBUG, "stats coal
> interval=%lu jiffies\n",
> > +			   edev->stats_coal_interval);
> > +	}
> > +
> >  	if (!netif_running(dev)) {
> >  		DP_INFO(edev, "Interface is down\n");
> >  		return -EINVAL;
> --
> pw-bot: cr
