* [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm
2017-06-08 21:46 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
@ 2017-06-08 21:46 ` netanel
0 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-08 21:46 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
The current flow to detect admin completion is:
while (command_not_completed) {
if (timeout)
error
check_for_completion()
sleep()
}
So in case the sleep took more than the timeout
(in case the thread/workqueue was not scheduled due to higher priority
task or prolonged VMexit), the driver can detect a stall even if
the completion is present.
The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 08d11ce..e1c2fab 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -508,15 +508,20 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_ctx,
struct ena_com_admin_queue *admin_queue)
{
- unsigned long flags;
- u32 start_time;
+ unsigned long flags, timeout;
int ret;
- start_time = ((u32)jiffies_to_usecs(jiffies));
+ timeout = jiffies + ADMIN_CMD_TIMEOUT_US;
+
+ while (1) {
+ spin_lock_irqsave(&admin_queue->q_lock, flags);
+ ena_com_handle_admin_completion(admin_queue);
+ spin_unlock_irqrestore(&admin_queue->q_lock, flags);
- while (comp_ctx->status == ENA_CMD_SUBMITTED) {
- if ((((u32)jiffies_to_usecs(jiffies)) - start_time) >
- ADMIN_CMD_TIMEOUT_US) {
+ if (comp_ctx->status != ENA_CMD_SUBMITTED)
+ break;
+
+ if (time_is_before_jiffies(timeout)) {
pr_err("Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
spin_lock_irqsave(&admin_queue->q_lock, flags);
@@ -528,10 +533,6 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
goto err;
}
- spin_lock_irqsave(&admin_queue->q_lock, flags);
- ena_com_handle_admin_completion(admin_queue);
- spin_unlock_irqrestore(&admin_queue->q_lock, flags);
-
msleep(100);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 0/8] Bug fixes in ena ethernet driver
@ 2017-06-09 6:55 netanel
2017-06-09 6:55 ` [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm netanel
` (9 more replies)
0 siblings, 10 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
This patchset contains fixes for the bugs that were discovered so far.
Netanel Belgazal (8):
net: ena: fix rare uncompleted admin command false alarm
net: ena: fix bug that might cause hang after consecutive open/close
interface.
net: ena: add missing return when ena_com_get_io_handlers() fails
net: ena: fix race condition between submit and completion admin
command
net: ena: add missing unmap bars on device removal
net: ena: fix theoretical Rx hang on low memory systems
net: ena: disable admin msix while working in polling mode
net: ena: bug fix in lost tx packets detection mechanism
drivers/net/ethernet/amazon/ena/ena_com.c | 35 +++--
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 2 +-
drivers/net/ethernet/amazon/ena/ena_netdev.c | 179 +++++++++++++++++++-------
drivers/net/ethernet/amazon/ena/ena_netdev.h | 16 ++-
4 files changed, 168 insertions(+), 64 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 2/8] net: ena: fix bug that might cause hang after consecutive open/close interface netanel
` (8 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
The current flow to detect admin completion is:
while (command_not_completed) {
if (timeout)
error
check_for_completion()
sleep()
}
So in case the sleep took more than the timeout
(in case the thread/workqueue was not scheduled due to higher priority
task or prolonged VMexit), the driver can detect a stall even if
the completion is present.
The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 08d11ce..e1c2fab 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -508,15 +508,20 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_ctx,
struct ena_com_admin_queue *admin_queue)
{
- unsigned long flags;
- u32 start_time;
+ unsigned long flags, timeout;
int ret;
- start_time = ((u32)jiffies_to_usecs(jiffies));
+ timeout = jiffies + ADMIN_CMD_TIMEOUT_US;
+
+ while (1) {
+ spin_lock_irqsave(&admin_queue->q_lock, flags);
+ ena_com_handle_admin_completion(admin_queue);
+ spin_unlock_irqrestore(&admin_queue->q_lock, flags);
- while (comp_ctx->status == ENA_CMD_SUBMITTED) {
- if ((((u32)jiffies_to_usecs(jiffies)) - start_time) >
- ADMIN_CMD_TIMEOUT_US) {
+ if (comp_ctx->status != ENA_CMD_SUBMITTED)
+ break;
+
+ if (time_is_before_jiffies(timeout)) {
pr_err("Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
spin_lock_irqsave(&admin_queue->q_lock, flags);
@@ -528,10 +533,6 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
goto err;
}
- spin_lock_irqsave(&admin_queue->q_lock, flags);
- ena_com_handle_admin_completion(admin_queue);
- spin_unlock_irqrestore(&admin_queue->q_lock, flags);
-
msleep(100);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 2/8] net: ena: fix bug that might cause hang after consecutive open/close interface.
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
2017-06-09 6:55 ` [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 3/8] net: ena: add missing return when ena_com_get_io_handlers() fails netanel
` (7 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
Fixing a bug that the driver does not unmask the IO interrupts
in ndo_open():
occasionally, the MSI-X interrupt (for one or more IO queues)
can be masked when ndo_close() was called.
If that is followed by ndo open(),
then the MSI-X will be still masked so no interrupt
will be received by the driver.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 41 ++++++++++++++++++----------
1 file changed, 26 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 7c1214d..0e3c60c7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1078,6 +1078,26 @@ inline void ena_adjust_intr_moderation(struct ena_ring *rx_ring,
rx_ring->per_napi_bytes = 0;
}
+static inline void ena_unmask_interrupt(struct ena_ring *tx_ring,
+ struct ena_ring *rx_ring)
+{
+ struct ena_eth_io_intr_reg intr_reg;
+
+ /* Update intr register: rx intr delay,
+ * tx intr delay and interrupt unmask
+ */
+ ena_com_update_intr_reg(&intr_reg,
+ rx_ring->smoothed_interval,
+ tx_ring->smoothed_interval,
+ true);
+
+ /* It is a shared MSI-X.
+ * Tx and Rx CQ have pointer to it.
+ * So we use one of them to reach the intr reg
+ */
+ ena_com_unmask_intr(rx_ring->ena_com_io_cq, &intr_reg);
+}
+
static inline void ena_update_ring_numa_node(struct ena_ring *tx_ring,
struct ena_ring *rx_ring)
{
@@ -1108,7 +1128,6 @@ static int ena_io_poll(struct napi_struct *napi, int budget)
{
struct ena_napi *ena_napi = container_of(napi, struct ena_napi, napi);
struct ena_ring *tx_ring, *rx_ring;
- struct ena_eth_io_intr_reg intr_reg;
u32 tx_work_done;
u32 rx_work_done;
@@ -1149,22 +1168,9 @@ static int ena_io_poll(struct napi_struct *napi, int budget)
if (ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev))
ena_adjust_intr_moderation(rx_ring, tx_ring);
- /* Update intr register: rx intr delay,
- * tx intr delay and interrupt unmask
- */
- ena_com_update_intr_reg(&intr_reg,
- rx_ring->smoothed_interval,
- tx_ring->smoothed_interval,
- true);
-
- /* It is a shared MSI-X.
- * Tx and Rx CQ have pointer to it.
- * So we use one of them to reach the intr reg
- */
- ena_com_unmask_intr(rx_ring->ena_com_io_cq, &intr_reg);
+ ena_unmask_interrupt(tx_ring, rx_ring);
}
-
ena_update_ring_numa_node(tx_ring, rx_ring);
ret = rx_work_done;
@@ -1485,6 +1491,11 @@ static int ena_up_complete(struct ena_adapter *adapter)
ena_napi_enable_all(adapter);
+ /* Enable completion queues interrupt */
+ for (i = 0; i < adapter->num_queues; i++)
+ ena_unmask_interrupt(&adapter->tx_ring[i],
+ &adapter->rx_ring[i]);
+
/* schedule napi in case we had pending packets
* from the last time we disable napi
*/
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 3/8] net: ena: add missing return when ena_com_get_io_handlers() fails
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
2017-06-09 6:55 ` [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm netanel
2017-06-09 6:55 ` [PATCH net-next 2/8] net: ena: fix bug that might cause hang after consecutive open/close interface netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 4/8] net: ena: fix race condition between submit and completion admin command netanel
` (6 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0e3c60c7..1e71e89 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1543,6 +1543,7 @@ static int ena_create_io_tx_queue(struct ena_adapter *adapter, int qid)
"Failed to get TX queue handlers. TX queue num %d rc: %d\n",
qid, rc);
ena_com_destroy_io_queue(ena_dev, ena_qid);
+ return rc;
}
ena_com_update_numa_node(tx_ring->ena_com_io_cq, ctx.numa_node);
@@ -1607,6 +1608,7 @@ static int ena_create_io_rx_queue(struct ena_adapter *adapter, int qid)
"Failed to get RX queue handlers. RX queue num %d rc: %d\n",
qid, rc);
ena_com_destroy_io_queue(ena_dev, ena_qid);
+ return rc;
}
ena_com_update_numa_node(rx_ring->ena_com_io_cq, ctx.numa_node);
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 4/8] net: ena: fix race condition between submit and completion admin command
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (2 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 3/8] net: ena: add missing return when ena_com_get_io_handlers() fails netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 5/8] net: ena: add missing unmap bars on device removal netanel
` (5 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
Bug:
"Completion context is occupied" error printout will be noticed in
dmesg.
This error will cause the admin command to fail, which will lead to
an ena_probe() failure or a watchdog reset (depends on which admin
command failed).
Root cause:
__ena_com_submit_admin_cmd() is the function that submits new entries to
the admin queue.
The function have a check that makes sure the queue is not full and the
function does not override any outstanding command.
It uses head and tail indexes for this check.
The head is increased by ena_com_handle_admin_completion() which runs
from interrupt context, and the tail index is increased by the submit
function (the function is running under ->q_lock, so there is no risk
of multithread increment).
Each command is associated with a completion context. This context
allocated before call to __ena_com_submit_admin_cmd() and freed by
ena_com_wait_and_process_admin_cq_interrupts(), right after the command
was completed.
This can lead to a state where the head was increased, the check passed,
but the completion context is still in use.
Solution:
Use the atomic variable ->outstanding_cmds instead of using the head and
the tail indexes.
This variable is safe for use since it is bumped in get_comp_ctx() in
__ena_com_submit_admin_cmd() and is freed by comp_ctxt_release()
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index e1c2fab..ea60b9e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -232,11 +232,9 @@ static struct ena_comp_ctx *__ena_com_submit_admin_cmd(struct ena_com_admin_queu
tail_masked = admin_queue->sq.tail & queue_size_mask;
/* In case of queue FULL */
- cnt = admin_queue->sq.tail - admin_queue->sq.head;
+ cnt = atomic_read(&admin_queue->outstanding_cmds);
if (cnt >= admin_queue->q_depth) {
- pr_debug("admin queue is FULL (tail %d head %d depth: %d)\n",
- admin_queue->sq.tail, admin_queue->sq.head,
- admin_queue->q_depth);
+ pr_debug("admin queue is full.\n");
admin_queue->stats.out_of_space++;
return ERR_PTR(-ENOSPC);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 5/8] net: ena: add missing unmap bars on device removal
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (3 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 4/8] net: ena: fix race condition between submit and completion admin command netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 6/8] net: ena: fix theoretical Rx hang on low memory systems netanel
` (4 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
This patch also change the mapping functions to devm_ functions
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 1e71e89..4e9fbdd 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2853,6 +2853,11 @@ static void ena_release_bars(struct ena_com_dev *ena_dev, struct pci_dev *pdev)
{
int release_bars;
+ if (ena_dev->mem_bar)
+ devm_iounmap(&pdev->dev, ena_dev->mem_bar);
+
+ devm_iounmap(&pdev->dev, ena_dev->reg_bar);
+
release_bars = pci_select_bars(pdev, IORESOURCE_MEM) & ENA_BAR_MASK;
pci_release_selected_regions(pdev, release_bars);
}
@@ -2940,8 +2945,9 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto err_free_ena_dev;
}
- ena_dev->reg_bar = ioremap(pci_resource_start(pdev, ENA_REG_BAR),
- pci_resource_len(pdev, ENA_REG_BAR));
+ ena_dev->reg_bar = devm_ioremap(&pdev->dev,
+ pci_resource_start(pdev, ENA_REG_BAR),
+ pci_resource_len(pdev, ENA_REG_BAR));
if (!ena_dev->reg_bar) {
dev_err(&pdev->dev, "failed to remap regs bar\n");
rc = -EFAULT;
@@ -2961,8 +2967,9 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
ena_set_push_mode(pdev, ena_dev, &get_feat_ctx);
if (ena_dev->tx_mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV) {
- ena_dev->mem_bar = ioremap_wc(pci_resource_start(pdev, ENA_MEM_BAR),
- pci_resource_len(pdev, ENA_MEM_BAR));
+ ena_dev->mem_bar = devm_ioremap_wc(&pdev->dev,
+ pci_resource_start(pdev, ENA_MEM_BAR),
+ pci_resource_len(pdev, ENA_MEM_BAR));
if (!ena_dev->mem_bar) {
rc = -EFAULT;
goto err_device_destroy;
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 6/8] net: ena: fix theoretical Rx hang on low memory systems
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (4 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 5/8] net: ena: add missing unmap bars on device removal netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 6/8] net: ena: fix theoretical Rx stuck " netanel
` (3 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
For the rare case where the device runs out of free rx buffer
descriptors (in case of pressure on kernel memory),
and the napi handler continuously fail to refill new Rx descriptors
until device rx queue totally runs out of all free rx buffers
to post incoming packet, leading to a deadlock:
* The device won't send interrupts since all the new
Rx packets will be dropped.
* The napi handler won't try to allocate new Rx descriptors
since allocation is part of NAPI that's not being invoked any more
The fix involves detecting this scenario and rescheduling NAPI
(to refill buffers) by the keepalive/watchdog task.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 +
drivers/net/ethernet/amazon/ena/ena_netdev.c | 55 +++++++++++++++++++++++++++
drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +
3 files changed, 58 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 67b2338f..533b2fb 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -94,6 +94,7 @@ static const struct ena_stats ena_stats_rx_strings[] = {
ENA_STAT_RX_ENTRY(dma_mapping_err),
ENA_STAT_RX_ENTRY(bad_desc_num),
ENA_STAT_RX_ENTRY(rx_copybreak_pkt),
+ ENA_STAT_RX_ENTRY(empty_rx_ring),
};
static const struct ena_stats ena_stats_ena_com_strings[] = {
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 4e9fbdd..3c366bf 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -190,6 +190,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter)
rxr->sgl_size = adapter->max_rx_sgl_size;
rxr->smoothed_interval =
ena_com_get_nonadaptive_moderation_interval_rx(ena_dev);
+ rxr->empty_rx_queue = 0;
}
}
@@ -2619,6 +2620,58 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter)
adapter->last_monitored_tx_qid = i % adapter->num_queues;
}
+/* trigger napi schedule after 2 consecutive detections */
+#define EMPTY_RX_REFILL 2
+/* For the rare case where the device runs out of Rx descriptors and the
+ * napi handler failed to refill new Rx descriptors (due to a lack of memory
+ * for example).
+ * This case will lead to a deadlock:
+ * The device won't send interrupts since all the new Rx packets will be dropped
+ * The napi handler won't allocate new Rx descriptors so the device will be
+ * able to send new packets.
+ *
+ * This scenario can happen when the kernel's vm.min_free_kbytes is too small.
+ * It is recommended to have at least 512MB, with a minimum of 128MB for
+ * constrained environment).
+ *
+ * When such a situation is detected - Reschedule napi
+ */
+static void check_for_empty_rx_ring(struct ena_adapter *adapter)
+{
+ struct ena_ring *rx_ring;
+ int i, refill_required;
+
+ if (!test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
+ return;
+
+ if (test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))
+ return;
+
+ for (i = 0; i < adapter->num_queues; i++) {
+ rx_ring = &adapter->rx_ring[i];
+
+ refill_required =
+ ena_com_sq_empty_space(rx_ring->ena_com_io_sq);
+ if (unlikely(refill_required == (rx_ring->ring_size - 1))) {
+ rx_ring->empty_rx_queue++;
+
+ if (rx_ring->empty_rx_queue >= EMPTY_RX_REFILL) {
+ u64_stats_update_begin(&rx_ring->syncp);
+ rx_ring->rx_stats.empty_rx_ring++;
+ u64_stats_update_end(&rx_ring->syncp);
+
+ netif_err(adapter, drv, adapter->netdev,
+ "trigger refill for ring %d\n", i);
+
+ napi_schedule(rx_ring->napi);
+ rx_ring->empty_rx_queue = 0;
+ }
+ } else {
+ rx_ring->empty_rx_queue = 0;
+ }
+ }
+}
+
/* Check for keep alive expiration */
static void check_for_missing_keep_alive(struct ena_adapter *adapter)
{
@@ -2673,6 +2726,8 @@ static void ena_timer_service(unsigned long data)
check_for_missing_tx_completions(adapter);
+ check_for_empty_rx_ring(adapter);
+
if (debug_area)
ena_dump_stats_to_buf(adapter, debug_area);
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 0e22bce..8828f1d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -184,6 +184,7 @@ struct ena_stats_rx {
u64 dma_mapping_err;
u64 bad_desc_num;
u64 rx_copybreak_pkt;
+ u64 empty_rx_ring;
};
struct ena_ring {
@@ -231,6 +232,7 @@ struct ena_ring {
struct ena_stats_tx tx_stats;
struct ena_stats_rx rx_stats;
};
+ int empty_rx_queue;
} ____cacheline_aligned;
struct ena_stats_dev {
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 6/8] net: ena: fix theoretical Rx stuck on low memory systems
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (5 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 6/8] net: ena: fix theoretical Rx hang on low memory systems netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 7/8] net: ena: disable admin msix while working in polling mode netanel
` (2 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
For the rare case where the device runs out of free rx buffer
descriptors (in case of pressure on kernel memory),
and the napi handler continuously fail to refill new Rx descriptors
until device rx queue totally runs out of all free rx buffers
to post incoming packet, leading to a deadlock:
* The device won't send interrupts since all the new
Rx packets will be dropped.
* The napi handler won't try to allocate new Rx descriptors
since allocation is part of NAPI that's not being invoked any more
The fix involves detecting this scenario and rescheduling NAPI
(to refill buffers) by the keepalive/watchdog task.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 +
drivers/net/ethernet/amazon/ena/ena_netdev.c | 55 +++++++++++++++++++++++++++
drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +
3 files changed, 58 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 67b2338f..533b2fb 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -94,6 +94,7 @@ static const struct ena_stats ena_stats_rx_strings[] = {
ENA_STAT_RX_ENTRY(dma_mapping_err),
ENA_STAT_RX_ENTRY(bad_desc_num),
ENA_STAT_RX_ENTRY(rx_copybreak_pkt),
+ ENA_STAT_RX_ENTRY(empty_rx_ring),
};
static const struct ena_stats ena_stats_ena_com_strings[] = {
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 4e9fbdd..3c366bf 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -190,6 +190,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter)
rxr->sgl_size = adapter->max_rx_sgl_size;
rxr->smoothed_interval =
ena_com_get_nonadaptive_moderation_interval_rx(ena_dev);
+ rxr->empty_rx_queue = 0;
}
}
@@ -2619,6 +2620,58 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter)
adapter->last_monitored_tx_qid = i % adapter->num_queues;
}
+/* trigger napi schedule after 2 consecutive detections */
+#define EMPTY_RX_REFILL 2
+/* For the rare case where the device runs out of Rx descriptors and the
+ * napi handler failed to refill new Rx descriptors (due to a lack of memory
+ * for example).
+ * This case will lead to a deadlock:
+ * The device won't send interrupts since all the new Rx packets will be dropped
+ * The napi handler won't allocate new Rx descriptors so the device will be
+ * able to send new packets.
+ *
+ * This scenario can happen when the kernel's vm.min_free_kbytes is too small.
+ * It is recommended to have at least 512MB, with a minimum of 128MB for
+ * constrained environment).
+ *
+ * When such a situation is detected - Reschedule napi
+ */
+static void check_for_empty_rx_ring(struct ena_adapter *adapter)
+{
+ struct ena_ring *rx_ring;
+ int i, refill_required;
+
+ if (!test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
+ return;
+
+ if (test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))
+ return;
+
+ for (i = 0; i < adapter->num_queues; i++) {
+ rx_ring = &adapter->rx_ring[i];
+
+ refill_required =
+ ena_com_sq_empty_space(rx_ring->ena_com_io_sq);
+ if (unlikely(refill_required == (rx_ring->ring_size - 1))) {
+ rx_ring->empty_rx_queue++;
+
+ if (rx_ring->empty_rx_queue >= EMPTY_RX_REFILL) {
+ u64_stats_update_begin(&rx_ring->syncp);
+ rx_ring->rx_stats.empty_rx_ring++;
+ u64_stats_update_end(&rx_ring->syncp);
+
+ netif_err(adapter, drv, adapter->netdev,
+ "trigger refill for ring %d\n", i);
+
+ napi_schedule(rx_ring->napi);
+ rx_ring->empty_rx_queue = 0;
+ }
+ } else {
+ rx_ring->empty_rx_queue = 0;
+ }
+ }
+}
+
/* Check for keep alive expiration */
static void check_for_missing_keep_alive(struct ena_adapter *adapter)
{
@@ -2673,6 +2726,8 @@ static void ena_timer_service(unsigned long data)
check_for_missing_tx_completions(adapter);
+ check_for_empty_rx_ring(adapter);
+
if (debug_area)
ena_dump_stats_to_buf(adapter, debug_area);
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 0e22bce..8828f1d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -184,6 +184,7 @@ struct ena_stats_rx {
u64 dma_mapping_err;
u64 bad_desc_num;
u64 rx_copybreak_pkt;
+ u64 empty_rx_ring;
};
struct ena_ring {
@@ -231,6 +232,7 @@ struct ena_ring {
struct ena_stats_tx tx_stats;
struct ena_stats_rx rx_stats;
};
+ int empty_rx_queue;
} ____cacheline_aligned;
struct ena_stats_dev {
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 7/8] net: ena: disable admin msix while working in polling mode
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (6 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 6/8] net: ena: fix theoretical Rx stuck " netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 6:55 ` [PATCH net-next 8/8] net: ena: bug fix in lost tx packets detection mechanism netanel
2017-06-09 19:33 ` [PATCH net-next 0/8] Bug fixes in ena ethernet driver David Miller
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index ea60b9e..f5b237e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -61,6 +61,8 @@
#define ENA_MMIO_READ_TIMEOUT 0xFFFFFFFF
+#define ENA_REGS_ADMIN_INTR_MASK 1
+
/*****************************************************************************/
/*****************************************************************************/
/*****************************************************************************/
@@ -1454,6 +1456,12 @@ void ena_com_admin_destroy(struct ena_com_dev *ena_dev)
void ena_com_set_admin_polling_mode(struct ena_com_dev *ena_dev, bool polling)
{
+ u32 mask_value = 0;
+
+ if (polling)
+ mask_value = ENA_REGS_ADMIN_INTR_MASK;
+
+ writel(mask_value, ena_dev->reg_bar + ENA_REGS_INTR_MASK_OFF);
ena_dev->admin_queue.polling = polling;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next 8/8] net: ena: bug fix in lost tx packets detection mechanism
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (7 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 7/8] net: ena: disable admin msix while working in polling mode netanel
@ 2017-06-09 6:55 ` netanel
2017-06-09 19:33 ` [PATCH net-next 0/8] Bug fixes in ena ethernet driver David Miller
9 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 6:55 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
check_for_missing_tx_completions() is called from a timer
task and looking for lost tx packets.
The old implementation accumulate all the lost tx packets
and did not check if those packets were retrieved on a later stage.
This cause to a situation where the driver reset
the device for no reason.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 -
drivers/net/ethernet/amazon/ena/ena_netdev.c | 66 +++++++++++++++------------
drivers/net/ethernet/amazon/ena/ena_netdev.h | 14 +++++-
3 files changed, 50 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 533b2fb..3ee55e2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -80,7 +80,6 @@ static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(tx_poll),
ENA_STAT_TX_ENTRY(doorbells),
ENA_STAT_TX_ENTRY(prepare_ctx_err),
- ENA_STAT_TX_ENTRY(missing_tx_comp),
ENA_STAT_TX_ENTRY(bad_req_id),
};
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 3c366bf..4f16ed3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1995,6 +1995,7 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev)
tx_info->tx_descs = nb_hw_desc;
tx_info->last_jiffies = jiffies;
+ tx_info->print_once = 0;
tx_ring->next_to_use = ENA_TX_RING_IDX_NEXT(next_to_use,
tx_ring->ring_size);
@@ -2564,13 +2565,44 @@ static void ena_fw_reset_device(struct work_struct *work)
"Reset attempt failed. Can not reset the device\n");
}
-static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+static int check_missing_comp_in_queue(struct ena_adapter *adapter,
+ struct ena_ring *tx_ring)
{
struct ena_tx_buffer *tx_buf;
unsigned long last_jiffies;
+ u32 missed_tx = 0;
+ int i;
+
+ for (i = 0; i < tx_ring->ring_size; i++) {
+ tx_buf = &tx_ring->tx_buffer_info[i];
+ last_jiffies = tx_buf->last_jiffies;
+ if (unlikely(last_jiffies &&
+ time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
+ if (!tx_buf->print_once)
+ netif_notice(adapter, tx_err, adapter->netdev,
+ "Found a Tx that wasn't completed on time, qid %d, index %d.\n",
+ tx_ring->qid, i);
+
+ tx_buf->print_once = 1;
+ missed_tx++;
+
+ if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) {
+ netif_err(adapter, tx_err, adapter->netdev,
+ "The number of lost tx completions is above the threshold (%d > %d). Reset the device\n",
+ missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS);
+ set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
+ return -EIO;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static void check_for_missing_tx_completions(struct ena_adapter *adapter)
+{
struct ena_ring *tx_ring;
- int i, j, budget;
- u32 missed_tx;
+ int i, budget, rc;
/* Make sure the driver doesn't turn the device in other process */
smp_rmb();
@@ -2586,31 +2618,9 @@ static void check_for_missing_tx_completions(struct ena_adapter *adapter)
for (i = adapter->last_monitored_tx_qid; i < adapter->num_queues; i++) {
tx_ring = &adapter->tx_ring[i];
- for (j = 0; j < tx_ring->ring_size; j++) {
- tx_buf = &tx_ring->tx_buffer_info[j];
- last_jiffies = tx_buf->last_jiffies;
- if (unlikely(last_jiffies && time_is_before_jiffies(last_jiffies + TX_TIMEOUT))) {
- netif_notice(adapter, tx_err, adapter->netdev,
- "Found a Tx that wasn't completed on time, qid %d, index %d.\n",
- tx_ring->qid, j);
-
- u64_stats_update_begin(&tx_ring->syncp);
- missed_tx = tx_ring->tx_stats.missing_tx_comp++;
- u64_stats_update_end(&tx_ring->syncp);
-
- /* Clear last jiffies so the lost buffer won't
- * be counted twice.
- */
- tx_buf->last_jiffies = 0;
-
- if (unlikely(missed_tx > MAX_NUM_OF_TIMEOUTED_PACKETS)) {
- netif_err(adapter, tx_err, adapter->netdev,
- "The number of lost tx completion is above the threshold (%d > %d). Reset the device\n",
- missed_tx, MAX_NUM_OF_TIMEOUTED_PACKETS);
- set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
- }
- }
- }
+ rc = check_missing_comp_in_queue(adapter, tx_ring);
+ if (unlikely(rc))
+ return;
budget--;
if (!budget)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 8828f1d..88b5e56 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -146,7 +146,18 @@ struct ena_tx_buffer {
u32 tx_descs;
/* num of buffers used by this skb */
u32 num_of_bufs;
- /* Save the last jiffies to detect missing tx packets */
+
+ /* Used for detect missing tx packets to limit the number of prints */
+ u32 print_once;
+ /* Save the last jiffies to detect missing tx packets
+ *
+ * sets to non zero value on ena_start_xmit and set to zero on
+ * napi and timer_Service_routine.
+ *
+ * while this value is not protected by lock,
+ * a given packet is not expected to be handled by ena_start_xmit
+ * and by napi/timer_service at the same time.
+ */
unsigned long last_jiffies;
struct ena_com_buf bufs[ENA_PKT_MAX_BUFS];
} ____cacheline_aligned;
@@ -170,7 +181,6 @@ struct ena_stats_tx {
u64 napi_comp;
u64 tx_poll;
u64 doorbells;
- u64 missing_tx_comp;
u64 bad_req_id;
};
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
` (8 preceding siblings ...)
2017-06-09 6:55 ` [PATCH net-next 8/8] net: ena: bug fix in lost tx packets detection mechanism netanel
@ 2017-06-09 19:33 ` David Miller
2017-06-09 22:13 ` Belgazal, Netanel
9 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2017-06-09 19:33 UTC (permalink / raw)
To: netanel; +Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, evgenys
From: <netanel@amazon.com>
Date: Fri, 9 Jun 2017 09:55:16 +0300
> This patchset contains fixes for the bugs that were discovered so far.
You submitted patch #6 twice, once with the word "stuck" in the subject
line, once with the word "hang" in the subject line.
Please sort this out and resubmit, thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver
2017-06-09 19:33 ` [PATCH net-next 0/8] Bug fixes in ena ethernet driver David Miller
@ 2017-06-09 22:13 ` Belgazal, Netanel
0 siblings, 0 replies; 14+ messages in thread
From: Belgazal, Netanel @ 2017-06-09 22:13 UTC (permalink / raw)
To: David Miller
Cc: netdev@vger.kernel.org, Woodhouse, David, Machulsky, Zorik,
Matushevsky, Alexander, BSHARA, Said, Wilson, Matt,
Liguori, Anthony, Bshara, Nafea, Schmeilin, Evgeny
I the last minute I fixed patchset #6 commit subject from stuck to hang and I forget to remove it.
Sorry for that.
resubmitted.
________________________________________
From: David Miller <davem@davemloft.net>
Sent: Friday, June 9, 2017 10:33 PM
To: Belgazal, Netanel
Cc: netdev@vger.kernel.org; Woodhouse, David; Machulsky, Zorik; Matushevsky, Alexander; BSHARA, Said; Wilson, Matt; Liguori, Anthony; Bshara, Nafea; Schmeilin, Evgeny
Subject: Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver
From: <netanel@amazon.com>
Date: Fri, 9 Jun 2017 09:55:16 +0300
> This patchset contains fixes for the bugs that were discovered so far.
You submitted patch #6 twice, once with the word "stuck" in the subject
line, once with the word "hang" in the subject line.
Please sort this out and resubmit, thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm
2017-06-09 22:13 netanel
@ 2017-06-09 22:13 ` netanel
0 siblings, 0 replies; 14+ messages in thread
From: netanel @ 2017-06-09 22:13 UTC (permalink / raw)
To: davem, netdev
Cc: Netanel Belgazal, dwmw, zorik, matua, saeedb, msw, aliguori,
nafea, evgenys
From: Netanel Belgazal <netanel@amazon.com>
The current flow to detect admin completion is:
while (command_not_completed) {
if (timeout)
error
check_for_completion()
sleep()
}
So in case the sleep took more than the timeout
(in case the thread/workqueue was not scheduled due to higher priority
task or prolonged VMexit), the driver can detect a stall even if
the completion is present.
The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 08d11ce..e1c2fab 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -508,15 +508,20 @@ static int ena_com_comp_status_to_errno(u8 comp_status)
static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_ctx,
struct ena_com_admin_queue *admin_queue)
{
- unsigned long flags;
- u32 start_time;
+ unsigned long flags, timeout;
int ret;
- start_time = ((u32)jiffies_to_usecs(jiffies));
+ timeout = jiffies + ADMIN_CMD_TIMEOUT_US;
+
+ while (1) {
+ spin_lock_irqsave(&admin_queue->q_lock, flags);
+ ena_com_handle_admin_completion(admin_queue);
+ spin_unlock_irqrestore(&admin_queue->q_lock, flags);
- while (comp_ctx->status == ENA_CMD_SUBMITTED) {
- if ((((u32)jiffies_to_usecs(jiffies)) - start_time) >
- ADMIN_CMD_TIMEOUT_US) {
+ if (comp_ctx->status != ENA_CMD_SUBMITTED)
+ break;
+
+ if (time_is_before_jiffies(timeout)) {
pr_err("Wait for completion (polling) timeout\n");
/* ENA didn't have any completion */
spin_lock_irqsave(&admin_queue->q_lock, flags);
@@ -528,10 +533,6 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
goto err;
}
- spin_lock_irqsave(&admin_queue->q_lock, flags);
- ena_com_handle_admin_completion(admin_queue);
- spin_unlock_irqrestore(&admin_queue->q_lock, flags);
-
msleep(100);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-06-09 22:14 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-06-09 6:55 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
2017-06-09 6:55 ` [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm netanel
2017-06-09 6:55 ` [PATCH net-next 2/8] net: ena: fix bug that might cause hang after consecutive open/close interface netanel
2017-06-09 6:55 ` [PATCH net-next 3/8] net: ena: add missing return when ena_com_get_io_handlers() fails netanel
2017-06-09 6:55 ` [PATCH net-next 4/8] net: ena: fix race condition between submit and completion admin command netanel
2017-06-09 6:55 ` [PATCH net-next 5/8] net: ena: add missing unmap bars on device removal netanel
2017-06-09 6:55 ` [PATCH net-next 6/8] net: ena: fix theoretical Rx hang on low memory systems netanel
2017-06-09 6:55 ` [PATCH net-next 6/8] net: ena: fix theoretical Rx stuck " netanel
2017-06-09 6:55 ` [PATCH net-next 7/8] net: ena: disable admin msix while working in polling mode netanel
2017-06-09 6:55 ` [PATCH net-next 8/8] net: ena: bug fix in lost tx packets detection mechanism netanel
2017-06-09 19:33 ` [PATCH net-next 0/8] Bug fixes in ena ethernet driver David Miller
2017-06-09 22:13 ` Belgazal, Netanel
-- strict thread matches above, loose matches on Subject: below --
2017-06-09 22:13 netanel
2017-06-09 22:13 ` [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm netanel
2017-06-08 21:46 [PATCH net-next 0/8] Bug fixes in ena ethernet driver netanel
2017-06-08 21:46 ` [PATCH net-next 1/8] net: ena: fix rare uncompleted admin command false alarm netanel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).