From: <netanel@amazon.com>
To: <davem@davemloft.net>, <netdev@vger.kernel.org>
Cc: Netanel Belgazal <netanel@amazon.com>, <dwmw@amazon.com>,
<zorik@amazon.com>, <matua@amazon.com>, <saeedb@amazon.com>,
<msw@amazon.com>, <aliguori@amazon.com>, <nafea@amazon.com>,
<evgenys@amazon.com>
Subject: [PATCH net-next 4/8] net: ena: fix race condition between submit and completion admin command
Date: Fri, 9 Jun 2017 09:55:20 +0300
Message-ID: <1496991325-551-5-git-send-email-netanel@amazon.com>
In-Reply-To: <1496991325-551-1-git-send-email-netanel@amazon.com>
From: Netanel Belgazal <netanel@amazon.com>
Bug:
The error "Completion context is occupied" is printed to dmesg.
This error causes the admin command to fail, which leads to an
ena_probe() failure or a watchdog reset (depending on which admin
command failed).
Root cause:
__ena_com_submit_admin_cmd() is the function that submits new entries to
the admin queue.
The function has a check that makes sure the queue is not full and that
the function does not overwrite any outstanding command.
It uses the head and tail indexes for this check.
The head is increased by ena_com_handle_admin_completion(), which runs
from interrupt context, and the tail index is increased by the submit
function (the function runs under ->q_lock, so there is no risk of
concurrent increments).
Each command is associated with a completion context. This context is
allocated before the call to __ena_com_submit_admin_cmd() and freed by
ena_com_wait_and_process_admin_cq_interrupts(), right after the command
completes.
Because the head is increased from interrupt context before the waiter
frees the context, this can lead to a state where the head was
increased, the check passed, but the completion context is still in use.
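To make the interleaving concrete, here is a minimal userspace sketch of
the head/tail check. It is a simplified model, not the driver code: the
struct, the model_* names, the depth of 4 and the choice of indexing the
context by the masked tail are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_DEPTH 4 /* hypothetical small depth for illustration */

/* Simplified, hypothetical model of the admin queue; not the driver's struct. */
struct model_queue {
	uint16_t head;          /* advanced by the completion handler */
	uint16_t tail;          /* advanced by the submit path */
	bool occupied[Q_DEPTH]; /* per-command completion context state */
};

enum submit_result { SUBMIT_OK, SUBMIT_FULL, SUBMIT_CTX_OCCUPIED };

/* Submit path that decides "queue full" from the head and tail indexes. */
static enum submit_result model_submit_head_tail(struct model_queue *q)
{
	/* Assumption: the context slot is the masked tail index. */
	uint16_t slot = q->tail & (Q_DEPTH - 1);

	if ((uint16_t)(q->tail - q->head) >= Q_DEPTH)
		return SUBMIT_FULL;
	if (q->occupied[slot])
		return SUBMIT_CTX_OCCUPIED; /* "Completion context is occupied" */
	q->occupied[slot] = true;
	q->tail++;
	return SUBMIT_OK;
}

/* Completion handler (interrupt context in the driver): only moves head. */
static void model_handle_completion(struct model_queue *q)
{
	q->head++;
}

/* Waiter side: frees the context some time after the completion. */
static void model_release_ctx(struct model_queue *q, uint16_t slot)
{
	assert(slot < Q_DEPTH);
	q->occupied[slot] = false;
}

/*
 * Replays the problematic interleaving. Returns the result of the submit
 * that races with the waiter; *after_release gets the result of a retry
 * once the waiter has freed the context.
 */
static enum submit_result model_replay_race(enum submit_result *after_release)
{
	struct model_queue q = {0};
	enum submit_result raced;

	for (int i = 0; i < Q_DEPTH; i++)
		model_submit_head_tail(&q); /* fill the queue */

	model_handle_completion(&q);        /* head moves, slot 0 not freed yet */
	raced = model_submit_head_tail(&q); /* check passes, context still in use */

	model_release_ctx(&q, 0);           /* waiter finally frees the context */
	*after_release = model_submit_head_tail(&q);
	return raced;
}
```

In this model the racing submit sees tail - head == 3, so the full check
passes, yet slot 0 is still marked occupied and the submit fails. The
same submit succeeds once the waiter has released the context.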
Solution:
Use the atomic variable ->outstanding_cmds instead of using the head and
the tail indexes.
This variable is safe to use since it is bumped in get_comp_ctx(), called
from __ena_com_submit_admin_cmd(), and is decremented by comp_ctxt_release().
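A minimal userspace sketch of the counter-based check, under stated
assumptions: it uses C11 atomics in place of the kernel's atomic_t, and
the struct, the counted_* names, the depth of 4 and the round-robin slot
selection are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CQ_DEPTH 4 /* hypothetical small depth for illustration */

/* Simplified, hypothetical model; not the driver's admin queue struct. */
struct counted_queue {
	atomic_int outstanding_cmds; /* contexts taken and not yet released */
	unsigned int next_slot;      /* assumed round-robin context index */
	bool occupied[CQ_DEPTH];     /* per-command completion context state */
};

enum counted_result { COUNTED_OK, COUNTED_FULL, COUNTED_CTX_OCCUPIED };

/* Submit path that decides "queue full" from the outstanding counter. */
static enum counted_result counted_submit(struct counted_queue *q)
{
	unsigned int slot = q->next_slot;

	if (atomic_load(&q->outstanding_cmds) >= CQ_DEPTH)
		return COUNTED_FULL; /* the driver returns ERR_PTR(-ENOSPC) here */
	if (q->occupied[slot])
		return COUNTED_CTX_OCCUPIED;

	/* Taking the context and bumping the counter happen together. */
	q->occupied[slot] = true;
	atomic_fetch_add(&q->outstanding_cmds, 1);
	q->next_slot = (slot + 1) % CQ_DEPTH;
	return COUNTED_OK;
}

/* Waiter side: frees the context and drops the counter in the same step. */
static void counted_release(struct counted_queue *q, unsigned int slot)
{
	assert(slot < CQ_DEPTH && q->occupied[slot]);
	q->occupied[slot] = false;
	atomic_fetch_sub(&q->outstanding_cmds, 1);
}

/*
 * Replays the interleaving from the root-cause description. Returns the
 * result of a submit issued after the completion interrupt but before the
 * waiter released the context; *after_release gets the result of a retry.
 */
static enum counted_result counted_replay(enum counted_result *after_release)
{
	struct counted_queue q = {0};
	enum counted_result raced;

	atomic_init(&q.outstanding_cmds, 0);
	for (int i = 0; i < CQ_DEPTH; i++)
		counted_submit(&q); /* fill the queue */

	/* A completion interrupt here would not change outstanding_cmds. */
	raced = counted_submit(&q);

	counted_release(&q, 0);
	*after_release = counted_submit(&q);
	return raced;
}
```

In this model the submit that arrives before the release is rejected as
queue full, which the caller can handle, instead of tripping over an
occupied context. The retry after the release succeeds.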
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
---
drivers/net/ethernet/amazon/ena/ena_com.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index e1c2fab..ea60b9e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -232,11 +232,9 @@ static struct ena_comp_ctx *__ena_com_submit_admin_cmd(struct ena_com_admin_queu
tail_masked = admin_queue->sq.tail & queue_size_mask;
/* In case of queue FULL */
- cnt = admin_queue->sq.tail - admin_queue->sq.head;
+ cnt = atomic_read(&admin_queue->outstanding_cmds);
if (cnt >= admin_queue->q_depth) {
- pr_debug("admin queue is FULL (tail %d head %d depth: %d)\n",
- admin_queue->sq.tail, admin_queue->sq.head,
- admin_queue->q_depth);
+ pr_debug("admin queue is full.\n");
admin_queue->stats.out_of_space++;
return ERR_PTR(-ENOSPC);
}
--
2.7.4