All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Arinzon, David" <darinzon@amazon.com>
To: Paolo Abeni <pabeni@redhat.com>,
	"Nelson, Shannon" <shannon.nelson@amd.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: "Woodhouse, David" <dwmw@amazon.co.uk>,
	"Machulsky, Zorik" <zorik@amazon.com>,
	"Matushevsky, Alexander" <matua@amazon.com>,
	"Bshara, Saeed" <saeedb@amazon.com>,
	"Wilson, Matt" <msw@amazon.com>,
	"Liguori, Anthony" <aliguori@amazon.com>,
	"Bshara, Nafea" <nafea@amazon.com>,
	"Belgazal, Netanel" <netanel@amazon.com>,
	"Saidi, Ali" <alisaidi@amazon.com>,
	"Herrenschmidt, Benjamin" <benh@amazon.com>,
	"Kiyanovski, Arthur" <akiyano@amazon.com>,
	"Dagan, Noam" <ndagan@amazon.com>,
	"Agroskin, Shay" <shayagr@amazon.com>,
	"Itzko, Shahar" <itzko@amazon.com>,
	"Abboud, Osama" <osamaabb@amazon.com>,
	"Ostrovsky, Evgeny" <evostrov@amazon.com>,
	"Tabachnik, Ofir" <ofirt@amazon.com>,
	"Koler, Nati" <nkoler@amazon.com>
Subject: RE: [PATCH v2 net-next 07/11] net: ena: Add more information on TX timeouts
Date: Thu, 1 Feb 2024 15:43:23 +0000	[thread overview]
Message-ID: <4e6160bad7144dbc8c98cac49dbe3891@amazon.com> (raw)
In-Reply-To: <1878748538c778a0f0d7fb23cafc4a661132097d.camel@redhat.com>

> On Thu, 2024-02-01 at 13:21 +0000, Arinzon, David wrote:
> > > On Tue, 2024-01-30 at 09:53 +0000, darinzon@amazon.com wrote:
> > > > @@ -3408,25 +3437,45 @@ static int
> > > check_missing_comp_in_tx_queue(struct ena_adapter *adapter,
> > > >                       adapter->missing_tx_completion_to);
> > > >
> > > >               if (unlikely(is_tx_comp_time_expired)) {
> > > > -                     if (!tx_buf->print_once) {
> > > > -                             time_since_last_napi = jiffies_to_usecs(jiffies - tx_ring-
> > > > tx_stats.last_napi_jiffies);
> > > > -                             missing_tx_comp_to = jiffies_to_msecs(adapter-
> > > > missing_tx_completion_to);
> > > > -                             netif_notice(adapter, tx_err, adapter->netdev,
> > > > -                                          "Found a Tx that wasn't completed on time, qid
> %d,
> > > index %d. %u usecs have passed since last napi execution. Missing Tx
> > > timeout value %u msecs\n",
> > > > -                                          tx_ring->qid, i, time_since_last_napi,
> > > missing_tx_comp_to);
> > > > +                     time_since_last_napi =
> > > > +                             jiffies_to_usecs(jiffies - tx_ring-
> >tx_stats.last_napi_jiffies);
> > > > +                     napi_scheduled = !!(ena_napi->napi.state &
> > > > + NAPIF_STATE_SCHED);
> > > > +
> > > > +                     if (missing_tx_comp_to <
> > > > + time_since_last_napi &&
> > > napi_scheduled) {
> > > > +                             /* We suspect napi isn't called because the
> > > > +                              * bottom half is not run. Require a bigger
> > > > +                              * timeout for these cases
> > > > +                              */
> > >
> > > Not blocking this series, but I guess the above "the bottom half is
> > > not run", after commit d15121be7485655129101f3960ae6add40204463,
> > > happens only when running in napi threaded mode, right?
> > >
> > > cheers,
> > >
> > > Paolo
> >
> > Hi Paolo,
> >
> > The ENA driver napi routine doesn't run in threaded mode.
> 
> ... unless you do:
> 
> echo 1 > /sys/class/net/<nic name>/threaded
> 
> :)
> 

Thanks for pointing this out. We will look into this further.

> > We've seen cases where napi is indeed scheduled, but didn't get a
> > chance to run for a noticeable amount of time and process TX
> > completions, and based on that we conclude that there's a high CPU
> > load that doesn't allow the routine to run in a timely manner.
> > Based on the information in
> d15121be7485655129101f3960ae6add40204463,
> > the observed stalls are in the magnitude of milliseconds, the above
> > code is actually an additional grace time, and the timeouts here are in
> seconds.
> 
> Do I read correctly that in your scenario the napi instance is not scheduled for
> _seconds_?  That smells like a serious bug somewhere else really worthy of
> more investigation.
> 
> Cheers,
> 
> Paolo
> 

Thanks for noting this. It is something that we're actively monitoring and looking to improve.

Thanks,
David


  reply	other threads:[~2024-02-01 15:43 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30  9:53 [PATCH v2 net-next 00/11] ENA driver changes darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 01/11] net: ena: Remove an unused field darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 02/11] net: ena: Add more documentation for RX copybreak darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 03/11] net: ena: Minor cosmetic changes darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 04/11] net: ena: Enable DIM by default darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 05/11] net: ena: Remove CQ tail pointer update darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 06/11] net: ena: Change error print during ena_device_init() darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 07/11] net: ena: Add more information on TX timeouts darinzon
2024-02-01 12:17   ` Paolo Abeni
2024-02-01 13:21     ` Arinzon, David
2024-02-01 14:36       ` Paolo Abeni
2024-02-01 15:43         ` Arinzon, David [this message]
2024-02-01 12:27   ` Simon Horman
2024-02-01 12:53     ` Arinzon, David
2024-01-30  9:53 ` [PATCH v2 net-next 08/11] net: ena: Relocate skb_tx_timestamp() to improve time stamping accuracy darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 09/11] net: ena: Change default print level for netif_ prints darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 10/11] net: ena: handle ena_calc_io_queue_size() possible errors darinzon
2024-01-30  9:53 ` [PATCH v2 net-next 11/11] net: ena: Reduce lines with longer column width boundary darinzon
2024-02-01 16:12   ` Jakub Kicinski
2024-02-01 17:40     ` Arinzon, David
2024-02-01 12:30 ` [PATCH v2 net-next 00/11] ENA driver changes patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4e6160bad7144dbc8c98cac49dbe3891@amazon.com \
    --to=darinzon@amazon.com \
    --cc=akiyano@amazon.com \
    --cc=aliguori@amazon.com \
    --cc=alisaidi@amazon.com \
    --cc=benh@amazon.com \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.co.uk \
    --cc=evostrov@amazon.com \
    --cc=itzko@amazon.com \
    --cc=kuba@kernel.org \
    --cc=matua@amazon.com \
    --cc=msw@amazon.com \
    --cc=nafea@amazon.com \
    --cc=ndagan@amazon.com \
    --cc=netanel@amazon.com \
    --cc=netdev@vger.kernel.org \
    --cc=nkoler@amazon.com \
    --cc=ofirt@amazon.com \
    --cc=osamaabb@amazon.com \
    --cc=pabeni@redhat.com \
    --cc=saeedb@amazon.com \
    --cc=shannon.nelson@amd.com \
    --cc=shayagr@amazon.com \
    --cc=zorik@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.