From: <netanel@amazon.com>
To: <davem@davemloft.net>, <netdev@vger.kernel.org>
Cc: Netanel Belgazal <netanel@amazon.com>, <dwmw@amazon.com>,
<zorik@amazon.com>, <matua@amazon.com>, <saeedb@amazon.com>,
<msw@amazon.com>, <aliguori@amazon.com>, <nafea@amazon.com>,
<evgenys@amazon.com>
Subject: [PATCH net 8/9] net: ena: bug fix in lost tx packets detection mechanism
Date: Sun, 11 Jun 2017 15:42:50 +0300 [thread overview]
Message-ID: <1497184971-28178-9-git-send-email-netanel@amazon.com> (raw)
In-Reply-To: <1497184971-28178-1-git-send-email-netanel@amazon.com>
From: Netanel Belgazal <netanel@amazon.com>
check_for_missing_tx_completions() is called from a timer
task and looking for lost tx packets.
The old implementation accumulate all the lost tx packets
and did not check if those packets were retrieved on a later stage.
This cause to a situation where the driver reset
the device for no reason.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 -
drivers/net/ethernet/amazon/ena/ena_netdev.c | 66 +++++++++++++++------------
drivers/net/ethernet/amazon/ena/ena_netdev.h | 14 +++++-
3 files changed, 50 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 533b2fbdeef1..3ee55e2fd694 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -80,7 +80,6 @@ static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(tx_poll),
ENA_STAT_TX_ENTRY(doorbells),
ENA_STAT_TX_ENTRY(prepare_ctx_err),
- ENA_STAT_TX_ENTRY(missing_tx_comp),
ENA_STAT_TX_ENTRY(bad_req_id),
};
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 3c366bfbbab1..4f16ed38bcf3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1995,6 +1995,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
tx_info->tx_descs = nb_hw_desc;
tx_info->last_jiffies = jiffies;
+ tx_info->print_once = 0;
tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use,
tx_ring->ring_size);
@@ -2564,13 +2565,44 @@ static void ena_fw_reset_device(struct work_struct *work)
"Reset attempt failed. Can not reset the device\n");
}
-static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+static int check_missing_comp_in_queue(struct ena_adapter *adapter,
+ struct ena_ring *tx_ring)
{
struct ena_tx_buffer *tx_buf;
unsigned long last_jiffies;
+ u32 missed_tx = 0;
+ int i;
+
+ for (i = 0; i < tx_ring->ring_size; i++) {
+ tx_buf = &tx_ring->tx_buffer_info[i];
+ last_jiffies = tx_buf->last_jiffies;
+ if (unlikely(last_jiffies &&
+ time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
+ if (!tx_buf->print_once)
+ netif_notice(adapter, tx_err, adapter->netdev,
+ "Found a Tx that wasn't completed on time, qid %d, index %d.\n",
+ tx_ring->qid, i);
+
+ tx_buf->print_once = 1;
+ missed_tx++;
+
+ if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) {
+ netif_err(adapter, tx_err, adapter->netdev,
+ "The number of lost tx completions is above the threshold (%d > %d). Reset the device\n",
+ missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS);
+ set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
+ return -EIO;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+{
struct ena_ring *tx_ring;
- int i, j, budget;
- u32 missed_tx;
+ int i, budget, rc;
/* Make sure the driver doesn't turn the device in other process */
smp_rmb();
@@ -2586,31 +2618,9 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter)
for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) {
tx_ring = &adapter->tx_ring[i];
- for (j = 0; j < tx_ring->ring_size; j++) {
- tx_buf = &tx_ring->tx_buffer_info[j];
- last_jiffies = tx_buf->last_jiffies;
- if (unlikely(last_jiffies && time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
- netif_notice(adapter, tx_err, adapter->netdev,
- "Found a Tx that wasn't completed on time, qid %d, index %d.\n",
- tx_ring->qid, j);
-
- u64_stats_update_begin(&tx_ring->syncp);
- missed_tx = tx_ring->tx_stats.missing_tx_comp++;
- u64_stats_update_end(&tx_ring->syncp);
-
- /* Clear last jiffies so the lost buffer won't
- * be counted twice.
- */
- tx_buf->last_jiffies = 0;
-
- if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) {
- netif_err(adapter, tx_err, adapter->netdev,
- "The number of lost tx completion is above the threshold (%d > %d). Reset the device\n",
- missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS);
- set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
- }
- }
- }
+ rc = check_missing_comp_in_queue(adapter, tx_ring);
+ if (unlikely(rc))
+ return;
budget--;
if (!budget)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 8828f1d6dd22..88b5e5612338 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -146,7 +146,18 @@ struct ena_tx_buffer {
u32 tx_descs;
/* num of buffers used by this skb */
u32 num_of_bufs;
- /* Save the last jiffies to detect missing tx packets */
+
+ /* Used for detect missing tx packets to limit the number of prints */
+ u32 print_once;
+ /* Save the last jiffies to detect missing tx packets
+ *
+ * sets to non zero value on ena_start_xmit and set to zero on
+ * napi and timer_Service_routine.
+ *
+ * while this value is not protected by lock,
+ * a given packet is not expected to be handled by ena_start_xmit
+ * and by napi/timer_service at the same time.
+ */
unsigned long last_jiffies;
struct ena_com_buf bufs[ENA_PKT_MAX_BUFS];
} ____cacheline_aligned;
@@ -170,7 +181,6 @@ struct ena_stats_tx {
u64 napi_comp;
u64 tx_poll;
u64 doorbells;
- u64 missing_tx_comp;
u64 bad_req_id;
};
--
2.7.4
next prev parent reply other threads:[~2017-06-11 12:53 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-11 12:42 [PATCH net 0/9] Bugs fixes in ena ethernet driver netanel
2017-06-11 12:42 ` [PATCH net 1/9] net: ena: fix rare uncompleted admin command false alarm netanel
2017-06-11 12:42 ` [PATCH net 2/9] net: ena: fix bug that might cause hang after consecutive open/close interface netanel
2017-06-11 12:42 ` [PATCH net 3/9] net: ena: add missing return when ena_com_get_io_handlers() fails netanel
2017-06-11 12:42 ` [PATCH net 4/9] net: ena: fix race condition between submit and completion admin command netanel
2017-06-11 12:42 ` [PATCH net 5/9] net: ena: add missing unmap bars on device removal netanel
2017-06-11 12:42 ` [PATCH net 6/9] net: ena: fix theoretical Rx hang on low memory systems netanel
2017-06-11 12:42 ` [PATCH net 7/9] net: ena: disable admin msix while working in polling mode netanel
2017-06-11 12:42 ` netanel [this message]
2017-06-11 12:42 ` [PATCH net 9/9] net: ena: update ena driver to version 1.1.7 netanel
2017-06-11 20:38 ` [PATCH net 0/9] Bugs fixes in ena ethernet driver David Miller
2017-06-12 14:09 ` Belgazal, Netanel
2017-06-12 14:24 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1497184971-28178-9-git-send-email-netanel@amazon.com \
--to=netanel@amazon.com \
--cc=aliguori@amazon.com \
--cc=davem@davemloft.net \
--cc=dwmw@amazon.com \
--cc=evgenys@amazon.com \
--cc=matua@amazon.com \
--cc=msw@amazon.com \
--cc=nafea@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=saeedb@amazon.com \
--cc=zorik@amazon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.