public inbox for linux-kernel@vger.kernel.org
* sungem hangs in atomic if netconsole enabled but no carrier
@ 2005-12-20 12:08 Johannes Berg
  2005-12-20 21:18 ` Benjamin Herrenschmidt
  2005-12-20 22:19 ` Francois Romieu
  0 siblings, 2 replies; 5+ messages in thread
From: Johannes Berg @ 2005-12-20 12:08 UTC
  To: linux-kernel; +Cc: David S. Miller, Benjamin Herrenschmidt, Eric Lemoine

I've been debugging some issues and wondered why I got hangs in random
places in the code. It turns out that the problem is that I still had
netconsole enabled even though I have no network at the moment. So what
I had was:
  * sungem compiled in
  * netconsole=.... as command line
  * no network cable plugged in

sungem does recognise this situation and says that waiting for carrier
timed out. However, later, when I printk() with interrupts disabled,
the system hangs after printing out a few lines to the console (I think
it's more than one, not sure though, might be just a single one).

Turns out that if I remove the netconsole=... option to my kernel, all
works fine and the system no longer hangs. Obviously not plugging in a
network cable is pretty useless when netconsole is turned on, but I
think it should not hang the system completely. So far I haven't been
able to figure out where it actually hangs and don't even know how to do
so -- I'm open for suggestions on how to find out why/where it hangs or
even fixes.

johannes




* Re: sungem hangs in atomic if netconsole enabled but no carrier
  2005-12-20 12:08 sungem hangs in atomic if netconsole enabled but no carrier Johannes Berg
@ 2005-12-20 21:18 ` Benjamin Herrenschmidt
  2005-12-20 21:23   ` David S. Miller
  2005-12-20 22:19 ` Francois Romieu
  1 sibling, 1 reply; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2005-12-20 21:18 UTC
  To: Johannes Berg; +Cc: linux-kernel, David S. Miller, Eric Lemoine


> Turns out that if I remove the netconsole=... option to my kernel, all
> works fine and the system no longer hangs. Obviously not plugging in a
> network cable is pretty useless when netconsole is turned on, but I
> think it should not hang the system completely. So far I haven't been
> able to figure out where it actually hangs and don't even know how to do
> so -- I'm open for suggestions on how to find out why/where it hangs or
> even fixes.

Hrm... I've heard various reports about problems with netconsole... I've
never tried it myself so far though. One thing I remember to beware of
is if sungem does a printk while holding its lock...

Ben.




* Re: sungem hangs in atomic if netconsole enabled but no carrier
  2005-12-20 21:18 ` Benjamin Herrenschmidt
@ 2005-12-20 21:23   ` David S. Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David S. Miller @ 2005-12-20 21:23 UTC
  To: benh; +Cc: johannes, linux-kernel, davem, eric.lemoine

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 21 Dec 2005 08:18:40 +1100

> 
> > Turns out that if I remove the netconsole=... option to my kernel, all
> > works fine and the system no longer hangs. Obviously not plugging in a
> > network cable is pretty useless when netconsole is turned on, but I
> > think it should not hang the system completely. So far I haven't been
> > able to figure out where it actually hangs and don't even know how to do
> > so -- I'm open for suggestions on how to find out why/where it hangs or
> > even fixes.
> 
> Hrm... I've heard various reports about problems with netconsole... I've
> never tried it myself so far though. One thing I remember to beware of
> is if sungem does a printk while holding its lock...

I bet that's it, doing a printk while already in the netconsole
printk output path.


* Re: sungem hangs in atomic if netconsole enabled but no carrier
  2005-12-20 12:08 sungem hangs in atomic if netconsole enabled but no carrier Johannes Berg
  2005-12-20 21:18 ` Benjamin Herrenschmidt
@ 2005-12-20 22:19 ` Francois Romieu
  2005-12-20 22:59   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 5+ messages in thread
From: Francois Romieu @ 2005-12-20 22:19 UTC
  To: Johannes Berg
  Cc: linux-kernel, David S. Miller, Benjamin Herrenschmidt,
	Eric Lemoine

Johannes Berg <johannes@sipsolutions.net> :
[...]
> think it should not hang the system completely. So far I haven't been
> able to figure out where it actually hangs and don't even know how to do
> so -- I'm open for suggestions on how to find out why/where it hangs or
> even fixes.

See the thread "Netconsole violates dev->hard_start_xmit synch rules"
started 06/09/2005 on netdev@vger.kernel.org for some interesting
background.

(the innocent hero slowly fades into the swamps of netpolling...)

Still with us?

Were you using sundance.c, you would probably bug on the first timeout:

[net/sched/sch_generic.c]
static void dev_watchdog(unsigned long arg)
{
        struct net_device *dev = (struct net_device *)arg;

        spin_lock(&dev->xmit_lock);
        ^^^^^^^^^
        if (dev->qdisc != &noop_qdisc) {
                if (netif_device_present(dev) &&
                    netif_running(dev) &&
                    netif_carrier_ok(dev)) {
                        if (netif_queue_stopped(dev) &&
                            (jiffies - dev->trans_start) > dev->watchdog_timeo) {
                                printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n", dev->name);
                                dev->tx_timeout(dev);
                                ^^^^^^^^^^^^^^^
[net/core/netpoll.c]
static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
        int status;
        struct netpoll_info *npinfo;

        if (!np || !np->dev || !netif_running(np->dev)) {
                __kfree_skb(skb);
                return;
        }

        npinfo = np->dev->npinfo;

        /* avoid recursion */
        if (npinfo->poll_owner == smp_processor_id() ||
            np->dev->xmit_lock_owner == smp_processor_id()) {
                if (np->drop)
                        np->drop(skb);
                else
                        __kfree_skb(skb);
                return;
        }

        do {
                npinfo->tries--;
                spin_lock(&np->dev->xmit_lock);
                ^^^^^^^^^

A quick glance shows no netif_carrier_{on/off} in the sundance driver.
It would be a good candidate.
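
To make the interaction between the two excerpts above concrete, here is a
toy userspace model in C. It is only a sketch and not kernel code: struct
toy_dev, the toy_* helpers and the result names are all hypothetical. It
assumes that the bare spin_lock() in dev_watchdog() leaves xmit_lock_owner
untouched, so the "avoid recursion" test in netpoll_send_skb() cannot see
that the current CPU already holds the lock.

```c
#include <assert.h>

#define NO_CPU (-1)

/* Toy stand-in for the TX lock state of a device (hypothetical). */
struct toy_dev {
    int holder;          /* CPU really holding the lock, or NO_CPU */
    int xmit_lock_owner; /* CPU that recorded itself as owner, or NO_CPU */
};

enum toy_result { TOY_SENT, TOY_DROPPED, TOY_SELF_DEADLOCK };

/* Take the lock and record the owner, so the guard below can see it. */
static void toy_lock_recording_owner(struct toy_dev *dev, int cpu)
{
    assert(dev->holder == NO_CPU);
    dev->holder = cpu;
    dev->xmit_lock_owner = cpu;
}

/* Take the lock without recording the owner, like a bare spin_lock(). */
static void toy_lock_bare(struct toy_dev *dev, int cpu)
{
    assert(dev->holder == NO_CPU);
    dev->holder = cpu;
}

static void toy_unlock(struct toy_dev *dev)
{
    assert(dev->holder != NO_CPU);
    dev->holder = NO_CPU;
    dev->xmit_lock_owner = NO_CPU;
}

/* Models the recursion guard followed by the attempt to take the lock. */
static enum toy_result toy_netpoll_send(const struct toy_dev *dev, int cpu)
{
    if (dev->xmit_lock_owner == cpu)
        return TOY_DROPPED;       /* guard fires, the message is dropped */
    if (dev->holder == cpu)
        return TOY_SELF_DEADLOCK; /* a real spin_lock() would spin forever */
    return TOY_SENT;              /* free, or held by another CPU that will
                                     eventually release it */
}
```

In this model a printk issued under toy_lock_recording_owner() is dropped
safely, while the same printk issued under toy_lock_bare() on the same CPU
ends in TOY_SELF_DEADLOCK.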

However you are using sungem.c and despite the fact that I should really
have something for dinner *now*, you are protected by netif_carrier_off.

But (drums roll):

[drivers/net/sungem.c]
#define DEFAULT_MSG     (NETIF_MSG_DRV          | \
                         NETIF_MSG_PROBE        | \
                         NETIF_MSG_LINK)

Thus gem_link_timer() will periodically complain that the link is down.

So gem_start_xmit() is issued.

Repeat until the TX ring is full: netif_stop_queue() is called.

gem_link_timer() printks.

net/core/netpoll.c::netpoll_send_skb() notices that the queue is stopped
and decides to try the usual NAPI poll(). A few function calls later, the
driver ends in drivers/net/sungem.c::gem_poll() where it takes so many
(irq-)locks that I do not even want to verify that it has a chance
to play nice with the pending gem_link_timer().
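
The sequence above can be sketched as a small userspace model in C. This is
an illustration under stated assumptions, not the sungem driver: struct
toy_nic, the toy_* functions and TOY_TX_RING_SIZE are hypothetical, the ring
size is deliberately tiny, and the model assumes that queued frames are
never completed while there is no carrier.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_TX_RING_SIZE 4 /* hypothetical, deliberately tiny */

struct toy_nic {
    bool carrier_ok;     /* link state */
    bool queue_stopped;  /* set once the TX ring is full */
    int  tx_pending;     /* frames sitting in the TX ring */
    int  poll_fallbacks; /* times the send path fell back to polling */
};

/* Models a driver transmit routine queueing one frame. */
static void toy_start_xmit(struct toy_nic *nic)
{
    assert(!nic->queue_stopped);
    nic->tx_pending++;
    if (nic->tx_pending == TOY_TX_RING_SIZE)
        nic->queue_stopped = true; /* stands in for netif_stop_queue() */
}

/* Models TX completion; assumed to make no progress without a link. */
static void toy_tx_complete(struct toy_nic *nic)
{
    if (!nic->carrier_ok)
        return;
    nic->tx_pending = 0;
    nic->queue_stopped = false;
}

/* Models one console message travelling through the netpoll send path. */
static void toy_console_message(struct toy_nic *nic)
{
    toy_tx_complete(nic);
    if (nic->queue_stopped) {
        nic->poll_fallbacks++; /* the path that ends in the driver's poll() */
        return;
    }
    toy_start_xmit(nic);
}
```

With carrier_ok false, the first TOY_TX_RING_SIZE messages are queued
normally and every later one takes the polling fallback, which is the point
where the locking questions above start to matter.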

--
Ueimor


* Re: sungem hangs in atomic if netconsole enabled but no carrier
  2005-12-20 22:19 ` Francois Romieu
@ 2005-12-20 22:59   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2005-12-20 22:59 UTC
  To: Francois Romieu
  Cc: Johannes Berg, linux-kernel, David S. Miller, Eric Lemoine

On Tue, 2005-12-20 at 23:19 +0100, Francois Romieu wrote:

> net/core/netpoll.c::netpoll_send_skb() notices that the queue is stopped
> and decides to try the usual NAPI poll(). A few function calls later, the
> driver ends in drivers/net/sungem.c::gem_poll() where it takes so many
> (irq-)locks that I do not even want to verify that it has a chance
> to play nice with the pending gem_link_timer().

I'm not a fan of the locking in sungem. I think I wrote a big fat comment
about it and why it is like that for now; better ideas are welcome :)

Ben.


