netdev.vger.kernel.org archive mirror
From: Daniel Walker <dwalker@codeaurora.org>
To: "David S. Miller" <davem@davemloft.net>
Cc: Jim Harford <c_jharfo@quicinc.com>, netdev@vger.kernel.org
Subject: [PATCH] net: Fix race condition on receive path.
Date: Fri, 05 Mar 2010 11:34:59 -0800
Message-ID: <1267817699.30393.8.camel@c-dwalke-linux.qualcomm.com>

From: Jim Harford <c_jharfo@quicinc.com>

Fixes a race condition on the networking receive path that causes all
received packets to be dropped after 15-60 minutes of heavy network usage.
Function process_backlog() empties the receive queue, re-enables
interrupts, then "completes" the softIRQ.  This provides a time window for
netif_rx() to execute (in IRQ context) and enqueue a received packet
without re-scheduling the softIRQ.  After this, the receive queue is never
processed and the system eventually begins to drop all received packets.

The fix has netif_rx() calling napi_schedule(), whether or not the
receive queue is empty.  There are two questions that must be addressed:
  1) Is the fix correct?  That is, is it safe to always call napi_schedule()?
  2) Does the fix result in a performance degradation?

With respect to correctness, the NAPI softIRQ routine should NOT be
scheduled if it is already running.  This is controlled by the bit flag
NAPI_STATE_SCHED in the "state" field of struct napi_struct.  When this
flag is set, the softIRQ routine is either already pending or currently
running.  When the softIRQ finishes, process_backlog() re-enables
interrupts and, just before returning, calls napi_complete(), which
clears the bit via __napi_complete().

When netif_rx() calls napi_schedule(), that routine checks the
NAPI_STATE_SCHED bit flag by calling napi_schedule_prep(), which calls
test_and_set_bit(); if the flag is already set, napi_schedule() returns
without doing anything.  Note that test_and_set_bit() implements an
atomic "test and set" operation that is commonly used as the foundation
of mutual exclusion mechanisms.

So it is safe for netif_rx() to call napi_schedule(), whether or not the
receive queue is empty.

With respect to performance, the following behavior was observed on an
ARM CPU, cross-compiled with gcc.  Removing the test of whether the
receive queue is empty saves 2 assembly instructions.  Calling
napi_schedule() when that routine does nothing (because the
NAPI_STATE_SCHED flag is already set) adds 11 assembly instructions, for
a net addition of 9 assembly instructions.  Furthermore, we observe in
our testing that, even under heavy network load, the receive queue is
empty on at least 95% of netif_rx() invocations.  The 9 assembly
instructions, amortized over all netif_rx() invocations, yield an
average performance penalty of less than 1 assembly instruction per
invocation of netif_rx().  Finally, the test-and-set operation does not
touch memory shared by multiple CPU cores (which can be costly), because
the data structures in question are per-CPU by design.

So calling napi_schedule() at every invocation of netif_rx() does not
impair performance.

CRs-fixed: 184599
Signed-off-by: Jim Harford <c_jharfo@quicinc.com>
Signed-off-by: Daniel Walker <dwalker@codeaurora.org>
---
 net/core/dev.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index be9924f..43161c6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2150,15 +2150,10 @@ int netif_rx(struct sk_buff *skb)
 
 	__get_cpu_var(netdev_rx_stat).total++;
 	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
-		if (queue->input_pkt_queue.qlen) {
-enqueue:
-			__skb_queue_tail(&queue->input_pkt_queue, skb);
-			local_irq_restore(flags);
-			return NET_RX_SUCCESS;
-		}
-
+		__skb_queue_tail(&queue->input_pkt_queue, skb);
 		napi_schedule(&queue->backlog);
-		goto enqueue;
+		local_irq_restore(flags);
+		return NET_RX_SUCCESS;
 	}
 
 	__get_cpu_var(netdev_rx_stat).dropped++;
-- 
1.6.3.3

Thread overview: 5+ messages
2010-03-05 19:34 Daniel Walker [this message]
2010-03-05 21:21 ` [PATCH] net: Fix race condition on receive path Stephen Hemminger
2010-03-05 23:41   ` Harford, Jim
2010-03-05 23:44     ` Daniel Walker
2010-03-05 23:48       ` Stephen Hemminger
