From: Mark Broadbent <markb@wetlettuce.com>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: Matt Mackall <mpm@selenic.com>,
"Network Development (Linux)" <netdev@oss.sgi.com>
Subject: Re: Followup to netpoll issues
Date: Fri, 07 Jan 2005 20:14:47 +0000 [thread overview]
Message-ID: <1105128887.3814.20.camel@localhost> (raw)
In-Reply-To: <20050107002053.GD27896@electric-eye.fr.zoreil.com>
[-- Attachment #1: Type: text/plain, Size: 6913 bytes --]
[cc'ing netdev]
On Fri, 2005-01-07 at 01:20 +0100, Francois Romieu wrote:
> Mark Broadbent <markb@wetlettuce.com> :
> [...]
> > I had a think about the netpoll deadlock following the initial detective
> > work that you and Francois did. I came up with a sample fix, given below,
> > that defers the packet transmission within netpoll until the xmit_lock
> > can be grabbed. An advantage of this is that it maintains the packet
> > ordering. I have attached the patch.
>
> Nice.
>
> Comments below. Off to bed now.
>
> > It is possible for netpoll to deadlock by attempting to take the network
> > device xmit_lock twice on different processors. This patch resolves this
> > by deferring the transmission of the second and subsequent skbs.
> >
> > Signed-Off-By: Mark Broadbent <markb@wetlettuce.com>
> >
> > --- linux-2.6.10-org/net/core/netpoll.c 2005-01-05 22:30:53.000000000 +0000
> > +++ linux-2.6.10/net/core/netpoll.c 2005-01-06 21:00:24.000000000 +0000
> [...]
> > @@ -178,17 +201,14 @@ repeat:
> > return skb;
> > }
> >
> > -void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> > +/* This must be called with np->dev->xmit_lock spinlock held - this function
> > + * will drop the lock before returning, indicating if it needs to be
> > + * retried
> > + */
> > +static int transmit_skb(struct netpoll *np, struct sk_buff *skb)
>
> It may seem like cosmetic but I do not like functions which unlock a lock
> they have not acquired themselves (and I am probably not alone). Would you
> consider pushing the trylock() in transmit_skb() itself ?
>
> All the callers of this function have to issue the trylock anyway so it
> should not matter from a performance POV.
Good point, I was having doubts about that.
> [...]
> > +
> > + spin_lock(&tx_list_lock);
> ^^^^^^^^^
> kernel/workqueue.c::run_workqueue() could reenable interrupts before
> it calls tx_retry_wq(). An irq safe version of the underlined lock
> seems required to avoid deadlocking with printks potentially issued
> from interrupt context.
Fixed.
> > +
> > + list_for_each_safe(tx_ctr, tx_tmp, &tx_queue.link) {
> > + tx_el = list_entry(tx_ctr, struct tx_queue_list, link);
> > +
> > + while (tx_el->np && tx_el->np->dev &&
> > + netif_running(tx_el->np->dev)) {
>
> netif_running() without xmit_lock held ? Uh oh...
This is done in the original code (and still is in other parts of the code).
> Imho you should try to keep transmit_skb() closer to the former
> netpoll_send_skb() (modulo the try_lock change of course).
> Btw I do not see what will prevent cleanup_netconsole() from succeeding
> when tx_retry_wq() is scheduled for later execution. Add some module
> refcounting ?
No need: if netpoll_cleanup is called whilst a retry is pending, the tx
list lock is taken and all references to the netpoll pointer are
deleted from the pending list before the lock is released.
> > +void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> > +{
> > +repeat:
> > + if(!np || !np->dev || !netif_running(np->dev))
> > + goto out_free;
>
> I am not sure that the first two tests above really made sense in
> the former version of the code:
> -> kernel/printk.c::vprintk() (downs console_sem)
> -> kernel/printk.c::release_console_sem
> -> kernel/printk.c::call_console_drivers
> [blah blah]
> -> drivers/net/netconsole.c::netconsole.write_msg
> -> drivers/net/netconsole.c::netpoll_send_udp
> -> drivers/net/netconsole.c::netpoll_send_skb
>
> -> drivers/net/netconsole.c::cleanup_netconsole
> -> kernel/printk.c::unregister_console
> -> kernel/printk.c::acquire_console_sem
> -> try to down(&console_sem)
>
> dev_put() has not been issued, np->dev is still among us and != NULL.
> So far, so good.
>
> However netpoll_send_skb can be called through:
> netpoll_rx -> arp_reply -> netpoll_send_skb
> Here I fail to see why netpoll_cleanup could not happen at the same time
> and issue np->dev = NULL;
Could this be solved by having an rwlock per netpoll structure to
protect against elements of the structure being changed?
e.g.
struct netpoll {
...
rwlock_t netpoll_lock;
};
write_lock_irqsave(&np->netpoll_lock, flags);
np->dev = NULL;
write_unlock_irqrestore(&np->netpoll_lock, flags);
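As a rough sketch, the read side could then look something like the
helper below (the helper name and the dev_hold() reference counting are
just illustrative, not part of the patch):

static struct net_device *netpoll_get_dev(struct netpoll *np)
{
	struct net_device *dev = NULL;
	unsigned long flags;

	/* Take a stable reference to np->dev, or return NULL if the
	 * netpoll is being torn down concurrently.
	 */
	read_lock_irqsave(&np->netpoll_lock, flags);
	if (np->dev && netif_running(np->dev)) {
		dev_hold(np->dev);
		dev = np->dev;
	}
	read_unlock_irqrestore(&np->netpoll_lock, flags);

	return dev;
}

Callers would then work against the returned dev and dev_put() it when
done, rather than dereferencing np->dev directly.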
> > +
> > + /* If the list is non-empty then pkts are pending tx,
> > + * append to the end of list to stop reordering of the pkts
> > + */
> > + if (!list_empty(&tx_queue.link) || !spin_trylock(&np->dev->xmit_lock)) {
> > + struct tx_queue_list *txel;
> > +
> > + /* If we failed to get the xmit_lock then netpoll probably
> > + * already holds this lock but will deadlock here if we relock,
>
> Ok ("probably" ?).
:) Fixed
> > + * queue the skb for transmission later
> > + */
> > + spin_lock(&tx_list_lock);
>
> No spin_lock_irqsave ? Hmmm...
> The only use of netpoll_send_skb() goes through netpoll_send_udp() which is
> issued in drivers/net/netconsole.c::write_msg() with local_irq disabled.
> You are right but an extra comment would not hurt imho.
Is it safe to issue spin_lock_irqsave/spin_unlock_irqrestore in this
context? Reason: what if it were not netconsole calling netpoll_send_udp,
and local interrupts were not already disabled?
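For reference, the irq-safe variant I am asking about would be something
like the fragment below; since the saved flags restore the previous
interrupt state, it should be safe whether or not the caller already has
local interrupts disabled (a sketch only, not part of the patch):

	unsigned long flags;

	spin_lock_irqsave(&tx_list_lock, flags);
	/* ... queue the skb on tx_queue for deferred transmission ... */
	spin_unlock_irqrestore(&tx_list_lock, flags);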
> > +
> > + txel = kmalloc(sizeof(struct tx_queue_list), GFP_ATOMIC);
> > + if (!txel)
> > + goto out_free;
> > +
> > + txel->skb = skb_get(skb);
> > + txel->np = np;
> > +
> > + list_add_tail(&txel->link, &tx_queue.link);
> > +
> > + queue_delayed_work(tx_wq, &tx_wq_obj, HZ/100);
> > +
> > + spin_unlock(&tx_list_lock);
> > + } else if (transmit_skb(np, skb))
> > + goto repeat;
>
> As I read it, the previous code could loop as long as netif_queue_stopped()
> is true and it can acquire the lock. Any reason to not defer the skb for
> transmission in tx_retry_wq() ?
Nope, no good reason. If transmit_skb fails it'll now defer.
>
> > +
> > + return;
> > +
> > +out_free:
> > + __kfree_skb(skb);
> > }
> >
> > void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
> > @@ -636,6 +729,15 @@ int netpoll_setup(struct netpoll *np)
> > spin_unlock_irqrestore(&rx_list_lock, flags);
> > }
> >
> > + /* Workqueue to schedule the tx retries */
> > + if (!tx_wq)
> > + tx_wq = create_workqueue("netpoll");
> > +
> > + if (!tx_wq) {
> > + printk(KERN_ERR "Failed to create netpoll workqueue\n");
> > + goto release;
> > + }
> > +
>
> This queue is created but never removed. Maybe it could/should go in a
> separate function and be anchored to net_dev_init(). The network maintainers
> will tell if they care or not.
>
> Pure curiosity: why a specific queue instead of schedule_delayed_work() ?
Pure ignorance I'm afraid ;). That's more what I was looking for
originally, so I've changed it.
I've attached a revised version.
Thanks for your comments (and Matt's)
Mark
--
Mark Broadbent <markb@wetlettuce.com>
[-- Attachment #2: netpoll-fix-tx-deadlock-2.patch --]
[-- Type: text/x-patch, Size: 4765 bytes --]
It is possible for netpoll to deadlock by attempting to take the network
device xmit_lock twice on different processors. This patch resolves this
by deferring the transmission of the second and subsequent skbs.
Signed-Off-By: Mark Broadbent <markb@wetlettuce.com>
--- linux-2.6.10-org/net/core/netpoll.c 2005-01-05 22:30:53.000000000 +0000
+++ linux-2.6.10/net/core/netpoll.c 2005-01-07 20:10:20.000000000 +0000
@@ -1,6 +1,9 @@
/*
* Common framework for low-level network console, dump, and debugger code
*
+ * Jan 5 2005 Mark Broadbent <markb@wetlettuce.com>
+ * avoid deadlock on transmission
+ *
* Sep 8 2003 Matt Mackall <mpm@selenic.com>
*
* based on the netconsole code from:
@@ -19,6 +22,8 @@
#include <linux/netpoll.h>
#include <linux/sched.h>
#include <linux/rcupdate.h>
+#include <linux/workqueue.h>
+#include <linux/list.h>
#include <net/tcp.h>
#include <net/udp.h>
#include <asm/unaligned.h>
@@ -31,6 +36,22 @@
#define MAX_SKBS 32
#define MAX_UDP_CHUNK 1460
+static void tx_retry_wq(void *p);
+
+struct tx_queue_list {
+ struct list_head link;
+ struct sk_buff *skb;
+ struct netpoll *np;
+};
+
+static spinlock_t tx_list_lock = SPIN_LOCK_UNLOCKED;
+static struct tx_queue_list tx_queue = {
+ .link = LIST_HEAD_INIT(tx_queue.link)
+};
+
+static DECLARE_WORK(tx_wq_obj, tx_retry_wq, NULL);
+
+
static spinlock_t skb_list_lock = SPIN_LOCK_UNLOCKED;
static int nr_skbs;
static struct sk_buff *skbs;
@@ -178,17 +199,21 @@ repeat:
return skb;
}
-void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
+/* Try to take np->dev->xmit_lock and transmit the skb. The lock is
+ * dropped again before returning. The return value indicates whether
+ * the transmission needs to be retried
+ */
+static int transmit_skb(struct netpoll *np, struct sk_buff *skb)
{
int status;
-repeat:
- if(!np || !np->dev || !netif_running(np->dev)) {
- __kfree_skb(skb);
- return;
- }
-
- spin_lock(&np->dev->xmit_lock);
+ if (!np || !np->dev || !netif_running(np->dev))
+ return -1;
+
+ /* Attempt the xmit lock as we may already hold it */
+ if (!spin_trylock(&np->dev->xmit_lock))
+ return -1;
+
np->dev->xmit_lock_owner = smp_processor_id();
/*
@@ -200,7 +225,7 @@ repeat:
spin_unlock(&np->dev->xmit_lock);
netpoll_poll(np);
- goto repeat;
+ return -1;
}
status = np->dev->hard_start_xmit(skb, np->dev);
@@ -210,8 +235,73 @@ repeat:
/* transmit busy */
if(status) {
netpoll_poll(np);
- goto repeat;
+ return -1;
}
+
+ return 0;
+}
+
+static void tx_retry_wq(void *p)
+{
+ struct list_head *tx_ctr, *tx_tmp;
+ struct tx_queue_list *tx_el;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tx_list_lock, flags);
+
+ list_for_each_safe(tx_ctr, tx_tmp, &tx_queue.link) {
+ tx_el = list_entry(tx_ctr, struct tx_queue_list, link);
+
+ if (transmit_skb(tx_el->np, tx_el->skb)) {
+ /* Transmit failed - reschedule */
+ schedule_delayed_work(&tx_wq_obj, HZ/100);
+ goto out_unlock;
+ }
+
+ list_del(tx_ctr);
+ __kfree_skb(tx_el->skb);
+ kfree(tx_el);
+ }
+
+out_unlock:
+ spin_unlock_irqrestore(&tx_list_lock, flags);
+}
+
+void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
+{
+ /* If the list is non-empty (packets are already pending tx) or the
+ * skb transmit failed, append to the end of the list to stop
+ * reordering of the pkts
+ */
+ if (!list_empty(&tx_queue.link) || transmit_skb(np, skb)) {
+ struct tx_queue_list *txel;
+
+ /* If transmit_skb failed to get the xmit_lock then netpoll already
+ * holds the lock and we would deadlock if we took it again, so
+ * queue the skb for transmission later.
+ * This function is always called with local irqs disabled, hence
+ * the lack of spin_lock_irqsave
+ */
+ spin_lock(&tx_list_lock);
+
+ txel = kmalloc(sizeof(struct tx_queue_list), GFP_ATOMIC);
+ if (!txel)
+ goto out_free;
+
+ txel->skb = skb_get(skb);
+ txel->np = np;
+
+ list_add_tail(&txel->link, &tx_queue.link);
+
+ schedule_delayed_work(&tx_wq_obj, HZ/100);
+
+ spin_unlock(&tx_list_lock);
+ }
+
+ return;
+
+out_free:
+ __kfree_skb(skb);
}
void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
@@ -644,6 +734,23 @@ int netpoll_setup(struct netpoll *np)
void netpoll_cleanup(struct netpoll *np)
{
+ struct list_head *tx_ctr, *tx_tmp;
+ struct tx_queue_list *tx_el;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tx_list_lock, flags);
+ list_for_each_safe(tx_ctr, tx_tmp, &tx_queue.link) {
+ tx_el = list_entry(tx_ctr, struct tx_queue_list, link);
+
+ /* This netpoll is going away - remove just these elements */
+ if (tx_el->np == np) {
+ list_del(tx_ctr);
+ __kfree_skb(tx_el->skb);
+ kfree(tx_el);
+ }
+ }
+ spin_unlock_irqrestore(&tx_list_lock, flags);
+
if (np->rx_hook) {
unsigned long flags;