[cc'ing netdev]

On Fri, 2005-01-07 at 01:20 +0100, Francois Romieu wrote:
> Mark Broadbent :
> [...]
> > I had a think about the netpoll deadlock following the initial detective
> > work that you and Francois did. I came up with a sample fix, given below,
> > that defers the packet transmission within netpoll until the xmit_lock
> > can be grabbed. An advantage of this is that it maintains the packet
> > ordering. I have attached the patch.
> 
> Nice.
> 
> Comments below. Off to bed now.
> 
> > It is possible for netpoll to deadlock by attempting to take the network
> > device xmit_lock twice on different processors. This patch resolves this
> > by deferring the transmission of the second and subsequent skbs.
> > 
> > Signed-off-by: Mark Broadbent
> > 
> > --- linux-2.6.10-org/net/core/netpoll.c	2005-01-05 22:30:53.000000000 +0000
> > +++ linux-2.6.10/net/core/netpoll.c	2005-01-06 21:00:24.000000000 +0000
> [...]
> > @@ -178,17 +201,14 @@ repeat:
> >  	return skb;
> >  }
> >  
> > -void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> > +/* This must be called with np->dev->xmit_lock spinlock held - this
> > + * function will drop the lock before returning, indicating if it
> > + * needs to be retried
> > + */
> > +static int transmit_skb(struct netpoll *np, struct sk_buff *skb)
> 
> It may seem cosmetic but I do not like functions which unlock a lock
> they have not acquired themselves (and I am probably not alone). Would
> you consider pushing the trylock() into transmit_skb() itself?
> 
> All the callers of this function have to issue the trylock anyway so it
> should not matter from a performance POV.

Good point, I was having doubts about that.

> [...]
> > +
> > +	spin_lock(&tx_list_lock);
>        	^^^^^^^^^
> kernel/workqueue.c::run_workqueue() could reenable interrupts before
> it calls tx_retry_wq(). An irq-safe version of the underlined lock
> seems required to avoid deadlocking with printks potentially issued
> from interrupt context.

Fixed.

> > +
> > +	list_for_each_safe(tx_ctr, tx_tmp, &tx_queue.link) {
> > +		tx_el = list_entry(tx_ctr, struct tx_queue_list, link);
> > +
> > +		while (tx_el->np && tx_el->np->dev &&
> > +		       netif_running(tx_el->np->dev)) {
> 
> netif_running() without xmit_lock held? Uh oh...

This is done in the original code (and still is in other parts of the
code).

> Imho you should try to keep transmit_skb() closer to the former
> netpoll_send_skb() (modulo the trylock change of course).
> 
> Btw I do not see what will prevent cleanup_netconsole() from succeeding
> when tx_retry_wq() is scheduled for later execution. Add some module
> refcounting?

No need: if netpoll_cleanup is called whilst a retry is pending, the tx
list lock is taken. All the references to the netpoll pointer are
deleted from the pending list before the lock is released.
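To be concrete, the purge in netpoll_cleanup() looks something like this
(sketch only, not the exact hunk - names as in the patch above):

	struct list_head *tx_ctr, *tx_tmp;
	struct tx_queue_list *tx_el;
	unsigned long flags;

	/* drop every queued skb that still references this netpoll */
	spin_lock_irqsave(&tx_list_lock, flags);
	list_for_each_safe(tx_ctr, tx_tmp, &tx_queue.link) {
		tx_el = list_entry(tx_ctr, struct tx_queue_list, link);
		if (tx_el->np == np) {
			list_del(&tx_el->link);
			__kfree_skb(tx_el->skb);
			kfree(tx_el);
		}
	}
	spin_unlock_irqrestore(&tx_list_lock, flags);

so tx_retry_wq() can never see a netpoll that has been cleaned up.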
> > +void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> > +{
> > +repeat:
> > +	if(!np || !np->dev || !netif_running(np->dev))
> > +		goto out_free;
> 
> I am not sure that the first two tests above really made sense in
> the former version of the code:
> 
> -> kernel/printk.c::vprintk() (downs console_sem)
>    -> kernel/printk.c::release_console_sem
>       -> kernel/printk.c::call_console_drivers
>          [blah blah]
>          -> drivers/net/netconsole.c::netconsole.write_msg
>             -> drivers/net/netconsole.c::netpoll_send_udp
>                -> drivers/net/netconsole.c::netpoll_send_skb
> 
> -> drivers/net/netconsole.c::cleanup_netconsole
>    -> kernel/printk.c::unregister_console
>       -> kernel/printk.c::acquire_console_sem
>          -> try to down(&console_sem)
> 
> dev_put() has not been issued, np->dev is still among us and != NULL.
> So far, so good.
> 
> However netpoll_send_skb can be called through:
> netpoll_rx -> arp_reply -> netpoll_send_skb
> Here I fail to see why netpoll_cleanup could not happen at the same time
> and issue np->dev = NULL;

Could this be solved by having a per-netpoll-structure rwlock to protect
against changes to elements of the structure? E.g.

	struct netpoll {
		...
		rwlock_t netpoll_lock;
	};

	write_lock_irqsave(&np->netpoll_lock, flags);
	np->dev = NULL;
	write_unlock_irqrestore(&np->netpoll_lock, flags);

> > +
> > +	/* If the list is non-empty then pkts are pending tx,
> > +	 * append to the end of list to stop reordering of the pkts
> > +	 */
> > +	if (!list_empty(&tx_queue.link) || !spin_trylock(&np->dev->xmit_lock)) {
> > +		struct tx_queue_list *txel;
> > +
> > +		/* If we failed to get the xmit_lock then netpoll probably
> > +		 * already holds this lock but will deadlock here if we relock,
> 
> Ok ("probably"?). :)

Fixed.

> > +		 * queue the skb for transmission later
> > +		 */
> > +		spin_lock(&tx_list_lock);
> 
> No spin_lock_irqsave? Hmmm...
> The only use of netpoll_send_skb() goes through netpoll_send_udp() which
> is issued in drivers/net/netconsole.c::write_msg() with local_irq
> disabled. You are right but an extra comment would not hurt imho.

Is it safe to issue spin_lock_irqsave/spin_unlock_irqrestore in this
context anyway? Reason: what if it wasn't netconsole calling
netpoll_send_udp, and the caller did not have local interrupts disabled?

> > +
> > +		txel = kmalloc(sizeof(struct tx_queue_list), GFP_ATOMIC);
> > +		if (!txel)
> > +			goto out_free;
> > +
> > +		txel->skb = skb_get(skb);
> > +		txel->np = np;
> > +
> > +		list_add_tail(&txel->link, &tx_queue.link);
> > +
> > +		queue_delayed_work(tx_wq, &tx_wq_obj, HZ/100);
> > +
> > +		spin_unlock(&tx_list_lock);
> > +	} else if (transmit_skb(np, skb))
> > +		goto repeat;
> 
> As I read it, the previous code could loop as long as netif_queue_stopped()
> is true and it can acquire the lock. Any reason not to defer the skb for
> transmission in tx_retry_wq()?

Nope, no good reason. If transmit_skb fails it'll now defer.
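The tail of netpoll_send_skb() then becomes roughly this (untested
sketch, with the trylock pushed into transmit_skb() as you suggested, so
it returns non-zero when the skb could not be sent):

	if (!list_empty(&tx_queue.link) || transmit_skb(np, skb)) {
		struct tx_queue_list *txel;
		unsigned long flags;

		/* list busy, queue stopped or lock contended:
		 * queue the skb and let tx_retry_wq() retry it */
		txel = kmalloc(sizeof(*txel), GFP_ATOMIC);
		if (!txel)
			goto out_free;

		txel->skb = skb_get(skb);
		txel->np = np;

		spin_lock_irqsave(&tx_list_lock, flags);
		list_add_tail(&txel->link, &tx_queue.link);
		spin_unlock_irqrestore(&tx_list_lock, flags);

		queue_delayed_work(tx_wq, &tx_wq_obj, HZ/100);
	}
	return;

out_free:
	__kfree_skb(skb);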
> > +
> > +	return;
> > +
> > +out_free:
> > +	__kfree_skb(skb);
> >  }
> >  
> >  void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
> > @@ -636,6 +729,15 @@ int netpoll_setup(struct netpoll *np)
> >  		spin_unlock_irqrestore(&rx_list_lock, flags);
> >  	}
> >  
> > +	/* Workqueue to schedule the tx retries */
> > +	if (!tx_wq)
> > +		tx_wq = create_workqueue("netpoll");
> > +
> > +	if (!tx_wq) {
> > +		printk(KERN_ERR "Failed to create netpoll workqueue\n");
> > +		goto release;
> > +	}
> > +
> 
> This queue is created but never removed. Maybe it could/should go in a
> separate function and be anchored to net_dev_init(). The network
> maintainers will tell if they care or not.
> 
> Pure curiosity: why a specific queue instead of schedule_delayed_work()?

Pure ignorance I'm afraid ;). That's more like what I was looking for
originally, so I've changed it.

I've attached a revised version.

Thanks for your comments (and Matt's).

Mark

-- 
Mark Broadbent