From: Neil Horman <nhorman@tuxdriver.com>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, nhorman@tuxdriver.com
Subject: [PATCH] netpoll: fix race on poll_list resulting in garbage entry
Date: Tue, 9 Dec 2008 16:06:44 -0500
Message-ID: <20081209210644.GC3785@hmsreliant.think-freely.org>
Hey all-
A few months back a race was discussed between the netpoll napi service
path and the fast path through net_rx_action:
http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
A patch was submitted for that bug, but I think we missed a case.
Consider the following scenario:
INITIAL STATE
CPU0 has one napi_struct A on its poll_list
CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
napi_struct A that CPU0 has on its list
CPU0                                CPU1
net_rx_action                       poll_napi
  !list_empty (returns true)          locks poll_lock for A
                                      poll_one_napi
                                        napi->poll
                                          netif_rx_complete
                                            __napi_complete
                                            (removes A from poll_list)
  list_entry(list->next)
In the above scenario, net_rx_action assumes that the per-cpu poll_list is
exclusive to that cpu. netpoll of course violates that, and because the netpoll
path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty
list at the top of the while loop in net_rx_action, but have it become empty by
the time it calls list_entry. Since the poll_list isn't surrounded by any other
structure, the returned data from that list_entry call in this situation is
garbage, and any number of crashes can result depending on what exactly that
garbage is.
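For reference, the window looks roughly like this. This is a paraphrase of
the 2.6.28-era net_rx_action() loop in net/core/dev.c, trimmed down to the
two operations that race (the real loop also handles budget and time-slice
accounting), not a verbatim copy:

	while (!list_empty(list)) {	/* CPU0 sees A on the list */
		struct napi_struct *n;

		/*
		 * <-- CPU1's poll_napi() can run napi->poll() here;
		 * __napi_complete() unlinks A, leaving the list empty
		 */
		n = list_entry(list->next, struct napi_struct, poll_list);
		/*
		 * an empty list head points back at itself, so list_entry()
		 * computes an offset from the head rather than from a real
		 * napi_struct - n is garbage from this point on
		 */
		n->poll(n, budget);	/* crash waiting to happen */
	}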
Given that it's not feasible for performance reasons to place exclusive locks
around each cpu's poll list to provide that mutual exclusion, I think the best
solution is to modify the netpoll path in such a way that we continue to guarantee
that the poll_list for a cpu is in fact exclusive to that cpu. To do this I've
implemented the patch below. It adds an additional bit to the state field in
the napi_struct. When executing napi->poll from the netpoll path, this bit will
be set. When a driver calls netif_rx_complete, if that bit is set, it will not
remove the napi_struct from the poll_list. That work will be saved for the next
iteration of net_rx_action.
I've tested this and it seems to work well. The biggest drawback I can
see to it is the fact that it might result in an extra pass through
net_rx_action in the event that the device is actually contended for (i.e. the
netpoll path performs all the needed work on the device, and the call
to net_rx_action winds up doing nothing except removing the napi_struct from
the poll_list). However, I think this is probably a small price to pay, given
that the alternative is a crash.
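Concretely, the extra pass in the contended case would look something like
this (again a paraphrase, assuming a well-behaved driver whose poll routine
finds no work and completes immediately):

	/* next net_rx_action() pass on CPU0, after netpoll drained the device */
	n = list_entry(list->next, struct napi_struct, poll_list); /* A, still linked */
	work = n->poll(n, budget);	/* driver finds nothing to do */
	/*
	 * the driver calls netif_rx_complete(); NAPI_STATE_NPSVC is clear
	 * by now, so __napi_complete() finally unlinks A from the poll_list
	 */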
Regards
Neil
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
 include/linux/netdevice.h |    7 +++++++
 net/core/netpoll.c        |    2 ++
 2 files changed, 9 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9d77b1d..e26f549 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -319,6 +319,7 @@ enum
 {
 	NAPI_STATE_SCHED,	/* Poll is scheduled */
 	NAPI_STATE_DISABLE,	/* Disable pending */
+	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
 };

 extern void __napi_schedule(struct napi_struct *n);
@@ -1497,6 +1498,12 @@ static inline void netif_rx_complete(struct net_device *dev,
 {
 	unsigned long flags;

+	/*
+	 * don't let napi dequeue from the cpu poll list
+	 * just in case it's running on a different cpu
+	 */
+	if (unlikely(test_bit(NAPI_STATE_NPSVC, &napi->state)))
+		return;
 	local_irq_save(flags);
 	__netif_rx_complete(dev, napi);
 	local_irq_restore(flags);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 6c7af39..dadac62 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -133,9 +133,11 @@ static int poll_one_napi(struct netpoll_info *npinfo,

 	npinfo->rx_flags |= NETPOLL_RX_DROP;
 	atomic_inc(&trapped);
+	set_bit(NAPI_STATE_NPSVC, &napi->state);

 	work = napi->poll(napi, budget);

+	clear_bit(NAPI_STATE_NPSVC, &napi->state);
 	atomic_dec(&trapped);
 	npinfo->rx_flags &= ~NETPOLL_RX_DROP;
--
/****************************************************
* Neil Horman <nhorman@tuxdriver.com>
* Software Engineer, Red Hat
****************************************************/