From: Daniel Walker <dwalker@codeaurora.org>
To: "David S. Miller" <davem@davemloft.net>
Cc: Jim Harford <c_jharfo@quicinc.com>, netdev@vger.kernel.org
Subject: [PATCH] net: Fix race condition on receive path.
Date: Fri, 05 Mar 2010 11:34:59 -0800
Message-ID: <1267817699.30393.8.camel@c-dwalke-linux.qualcomm.com>
From: Jim Harford <c_jharfo@quicinc.com>
Fixes a race condition on the networking receive path that causes all
received packets to be dropped after 15-60 minutes of heavy network usage.

Function process_backlog() empties the receive queue, re-enables
interrupts, and then "completes" the softIRQ. This opens a time window in
which netif_rx() can run (in IRQ context) and enqueue a received packet
without re-scheduling the softIRQ. After that, the receive queue is never
drained again; once it fills up to netdev_max_backlog, every received
packet is dropped.
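
For reference, this is the pre-patch enqueue logic in netif_rx() (the same
code the diff below removes, restructured here without the goto for
readability); the comments mark where the wakeup is lost:

	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
		if (queue->input_pkt_queue.qlen) {
			/* Queue not empty: assume the backlog softIRQ is
			 * already scheduled and just append the packet.
			 * Once the wakeup below has been lost, this
			 * assumption is wrong on every subsequent call. */
			__skb_queue_tail(&queue->input_pkt_queue, skb);
			local_irq_restore(flags);
			return NET_RX_SUCCESS;
		}

		/* Queue empty: schedule the backlog softIRQ, then enqueue.
		 * In the race window NAPI_STATE_SCHED is still set, so
		 * napi_schedule() does nothing, yet process_backlog() is
		 * about to complete and will not look at the queue again. */
		napi_schedule(&queue->backlog);
		__skb_queue_tail(&queue->input_pkt_queue, skb);
		local_irq_restore(flags);
		return NET_RX_SUCCESS;
	}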

The fix has netif_rx() call napi_schedule() whether or not the receive
queue is empty. Two questions must be addressed:
1) Is the fix correct, i.e. is it safe to always call napi_schedule()?
2) Does the fix degrade performance?

With respect to correctness, the napi softIRQ routine should NOT be
scheduled if it is already running. This is controlled by the bit flag
NAPI_STATE_SCHED in the "state" field of struct napi_struct. When this
flag is set, it means the softIRQ routine is either already pending or
currently running. When the softIRQ is finished, the bit is cleared via
__napi_complete(), which is called by napi_complete(), which is called by
process_backlog() after re-enabling interrupts and just prior to
returning.
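
For reference, the completion side described above is roughly the
following (simplified sketch; the exact code lives in net/core/dev.c and
details vary by kernel version):

	/* Called by process_backlog() via napi_complete() once the queue
	 * is empty; clearing NAPI_STATE_SCHED is what allows the backlog
	 * softIRQ to be scheduled again. */
	void __napi_complete(struct napi_struct *n)
	{
		list_del(&n->poll_list);
		smp_mb__before_clear_bit();
		clear_bit(NAPI_STATE_SCHED, &n->state);
	}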

When netif_rx() calls napi_schedule(), that routine checks the
NAPI_STATE_SCHED bit flag by calling napi_schedule_prep(), which calls
test_and_set_bit(); if the flag is already set, napi_schedule() returns
without doing anything. Note that test_and_set_bit() implements an
atomic "test and set" operation that is commonly used as the foundation
of mutual exclusion mechanisms.

So it is safe for netif_rx() to call napi_schedule(), whether or not the
receive queue is empty.
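
For reference, the scheduling guard described above looks roughly like
this (simplified from include/linux/netdevice.h; the napi_disable_pending()
check and other details are omitted and vary by kernel version):

	/* Claim the right to schedule: returns non-zero only if
	 * NAPI_STATE_SCHED was previously clear, i.e. the softIRQ was
	 * neither pending nor running.  test_and_set_bit() makes the
	 * check-and-claim atomic. */
	static inline int napi_schedule_prep(struct napi_struct *n)
	{
		return !test_and_set_bit(NAPI_STATE_SCHED, &n->state);
	}

	static inline void napi_schedule(struct napi_struct *n)
	{
		if (napi_schedule_prep(n))
			__napi_schedule(n);	/* raise NET_RX_SOFTIRQ */
		/* otherwise: already pending or running, do nothing */
	}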

With respect to performance, the following behavior was observed on an
ARM CPU, cross-compiling for ARM with gcc. Removing the test of whether
the receive queue is empty saves 2 assembly instructions. Calling
napi_schedule() when that routine does nothing (because the
NAPI_STATE_SCHED flag is already set) adds 11 assembly instructions.
Thus, there is a net addition of 9 assembly instructions. Furthermore,
we observe in our testing that even under heavy network load, the receive
queue is empty on at least 95% of netif_rx() invocations; since the old
code already called napi_schedule() on the empty-queue path, the 9 extra
instructions are only paid on the remaining 5% or less. Amortized over
all netif_rx() invocations, this is an average penalty of less than 1
assembly instruction per invocation (0.05 * 9 = 0.45). Finally, the
test-and-set operation does not touch memory shared between CPU cores
(which can be costly), because the data structures in question are
per-CPU by design.
So calling napi_schedule() at every invocation of netif_rx() does not
impair performance.
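
For reference, the per-CPU data referred to above is roughly the
following (abridged from include/linux/netdevice.h of this era; unrelated
members omitted and the exact declaration macro may differ):

	struct softnet_data {
		/* ... other members omitted ... */
		struct sk_buff_head	input_pkt_queue;  /* drained by process_backlog() */
		struct napi_struct	backlog;          /* its state word holds NAPI_STATE_SCHED */
	};

	/* One instance per CPU; netif_rx() and process_backlog() operate
	 * on the local CPU's instance with interrupts disabled. */
	DECLARE_PER_CPU(struct softnet_data, softnet_data);
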
CRs-fixed: 184599
Signed-off-by: Jim Harford <c_jharfo@quicinc.com>
Signed-off-by: Daniel Walker <dwalker@codeaurora.org>
---
 net/core/dev.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index be9924f..43161c6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2150,15 +2150,10 @@ int netif_rx(struct sk_buff *skb)
 	__get_cpu_var(netdev_rx_stat).total++;
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (queue->input_pkt_queue.qlen) {
-enqueue:
-			__skb_queue_tail(&queue->input_pkt_queue, skb);
-			local_irq_restore(flags);
-			return NET_RX_SUCCESS;
-		}
-
+		__skb_queue_tail(&queue->input_pkt_queue, skb);
 		napi_schedule(&queue->backlog);
-		goto enqueue;
+		local_irq_restore(flags);
+		return NET_RX_SUCCESS;
 	}
 
 	__get_cpu_var(netdev_rx_stat).dropped++;
 	local_irq_restore(flags);
--
1.6.3.3