From: "Arinzon, David" <darinzon@amazon.com>
To: Simon Horman <horms@kernel.org>
Cc: "Nelson, Shannon" <shannon.nelson@amd.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Woodhouse, David" <dwmw@amazon.co.uk>,
"Machulsky, Zorik" <zorik@amazon.com>,
"Matushevsky, Alexander" <matua@amazon.com>,
"Bshara, Saeed" <saeedb@amazon.com>,
"Wilson, Matt" <msw@amazon.com>,
"Liguori, Anthony" <aliguori@amazon.com>,
"Bshara, Nafea" <nafea@amazon.com>,
"Belgazal, Netanel" <netanel@amazon.com>,
"Saidi, Ali" <alisaidi@amazon.com>,
"Herrenschmidt, Benjamin" <benh@amazon.com>,
"Kiyanovski, Arthur" <akiyano@amazon.com>,
"Dagan, Noam" <ndagan@amazon.com>,
"Agroskin, Shay" <shayagr@amazon.com>,
"Itzko, Shahar" <itzko@amazon.com>,
"Abboud, Osama" <osamaabb@amazon.com>,
"Ostrovsky, Evgeny" <evostrov@amazon.com>,
"Tabachnik, Ofir" <ofirt@amazon.com>,
"Koler, Nati" <nkoler@amazon.com>
Subject: RE: [PATCH v2 net-next 07/11] net: ena: Add more information on TX timeouts
Date: Thu, 1 Feb 2024 12:53:41 +0000 [thread overview]
Message-ID: <903a717258b548e19314b35e1ff9b638@amazon.com> (raw)
In-Reply-To: <20240201122705.GA530335@kernel.org>
> On Tue, Jan 30, 2024 at 09:53:49AM +0000, darinzon@amazon.com wrote:
> > From: David Arinzon <darinzon@amazon.com>
> >
> > The function responsible for polling TX completions might not receive
> > the CPU resources it needs due to higher priority tasks running on the
> > requested core.
> >
> > The driver might not be able to recognize such cases, but it can use
> > its state to suspect that they happened. If both conditions are met:
> >
> > - napi hasn't been executed more than the TX completion timeout value
> > - napi is scheduled (meaning that we've received an interrupt)
> >
> > Then it's more likely that the napi handler isn't scheduled because of
> > an overloaded CPU.
> > It was decided that for this case, the driver would wait twice as long
> > as the regular timeout before scheduling a reset.
> > The driver uses ENA_REGS_RESET_SUSPECTED_POLL_STARVATION reset
> reason
> > to indicate this case to the device.
> >
> > This patch also adds more information to the ena_tx_timeout() callback.
> > This function is called by the kernel when it detects that a specific
> > TX queue has been closed for too long.
> >
> > Signed-off-by: Shay Agroskin <shayagr@amazon.com>
> > Signed-off-by: David Arinzon <darinzon@amazon.com>
> > ---
> > drivers/net/ethernet/amazon/ena/ena_netdev.c | 77
> +++++++++++++++----
> > .../net/ethernet/amazon/ena/ena_regs_defs.h | 1 +
> > 2 files changed, 64 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > index 18acb76..ae9291b 100644
> > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > @@ -47,19 +47,44 @@ static int ena_restore_device(struct ena_adapter
> > *adapter);
> >
> > static void ena_tx_timeout(struct net_device *dev, unsigned int
> > txqueue) {
> > + enum ena_regs_reset_reason_types reset_reason =
> > + ENA_REGS_RESET_OS_NETDEV_WD;
> > struct ena_adapter *adapter = netdev_priv(dev);
> > + unsigned int time_since_last_napi, threshold;
> > + struct ena_ring *tx_ring;
> > + int napi_scheduled;
> > +
> > + if (txqueue >= adapter->num_io_queues) {
> > + netdev_err(dev, "TX timeout on invalid queue %u\n", txqueue);
> > + goto schedule_reset;
> > + }
> > +
> > + threshold = jiffies_to_usecs(dev->watchdog_timeo);
> > + tx_ring = &adapter->tx_ring[txqueue];
> > +
> > + time_since_last_napi = jiffies_to_usecs(jiffies - tx_ring-
> >tx_stats.last_napi_jiffies);
> > + napi_scheduled = !!(tx_ring->napi->state & NAPIF_STATE_SCHED);
> >
> > + netdev_err(dev,
> > + "TX q %d is paused for too long (threshold %u). Time since last
> napi %u usec. napi scheduled: %d\n",
> > + txqueue,
> > + threshold,
> > + time_since_last_napi,
> > + napi_scheduled);
> > +
> > + if (threshold < time_since_last_napi && napi_scheduled) {
> > + netdev_err(dev,
> > + "napi handler hasn't been called for a long time but is
> scheduled\n");
> > + reset_reason =
> > + ENA_REGS_RESET_SUSPECTED_POLL_STARVATION;
>
> Hi David,
>
> a nit from my side: the line above is indented one tab-stop too many.
> No need to respin just for this AFAIC.
>
Hi Simon,
Thanks for pointing it out. Seems like I got carried away a bit with the
Indentation due to the print above it.
Thanks,
David
> > + }
> > +schedule_reset:
> > /* Change the state of the device to trigger reset
> > * Check that we are not in the middle or a trigger already
> > */
> > -
> > if (test_and_set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))
> > return;
> >
> > - ena_reset_device(adapter, ENA_REGS_RESET_OS_NETDEV_WD);
> > + ena_reset_device(adapter, reset_reason);
> > ena_increase_stat(&adapter->dev_stats.tx_timeout, 1,
> > &adapter->syncp);
> > -
> > - netif_err(adapter, tx_err, dev, "Transmit time out\n");
> > }
> >
>
> ...
next prev parent reply other threads:[~2024-02-01 12:53 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-30 9:53 [PATCH v2 net-next 00/11] ENA driver changes darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 01/11] net: ena: Remove an unused field darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 02/11] net: ena: Add more documentation for RX copybreak darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 03/11] net: ena: Minor cosmetic changes darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 04/11] net: ena: Enable DIM by default darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 05/11] net: ena: Remove CQ tail pointer update darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 06/11] net: ena: Change error print during ena_device_init() darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 07/11] net: ena: Add more information on TX timeouts darinzon
2024-02-01 12:17 ` Paolo Abeni
2024-02-01 13:21 ` Arinzon, David
2024-02-01 14:36 ` Paolo Abeni
2024-02-01 15:43 ` Arinzon, David
2024-02-01 12:27 ` Simon Horman
2024-02-01 12:53 ` Arinzon, David [this message]
2024-01-30 9:53 ` [PATCH v2 net-next 08/11] net: ena: Relocate skb_tx_timestamp() to improve time stamping accuracy darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 09/11] net: ena: Change default print level for netif_ prints darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 10/11] net: ena: handle ena_calc_io_queue_size() possible errors darinzon
2024-01-30 9:53 ` [PATCH v2 net-next 11/11] net: ena: Reduce lines with longer column width boundary darinzon
2024-02-01 16:12 ` Jakub Kicinski
2024-02-01 17:40 ` Arinzon, David
2024-02-01 12:30 ` [PATCH v2 net-next 00/11] ENA driver changes patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=903a717258b548e19314b35e1ff9b638@amazon.com \
--to=darinzon@amazon.com \
--cc=akiyano@amazon.com \
--cc=aliguori@amazon.com \
--cc=alisaidi@amazon.com \
--cc=benh@amazon.com \
--cc=davem@davemloft.net \
--cc=dwmw@amazon.co.uk \
--cc=evostrov@amazon.com \
--cc=horms@kernel.org \
--cc=itzko@amazon.com \
--cc=kuba@kernel.org \
--cc=matua@amazon.com \
--cc=msw@amazon.com \
--cc=nafea@amazon.com \
--cc=ndagan@amazon.com \
--cc=netanel@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=nkoler@amazon.com \
--cc=ofirt@amazon.com \
--cc=osamaabb@amazon.com \
--cc=saeedb@amazon.com \
--cc=shannon.nelson@amd.com \
--cc=shayagr@amazon.com \
--cc=zorik@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).