* sungem hangs in atomic if netconsole enabled but no carrier
@ 2005-12-20 12:08 Johannes Berg
2005-12-20 21:18 ` Benjamin Herrenschmidt
2005-12-20 22:19 ` Francois Romieu
0 siblings, 2 replies; 5+ messages in thread
From: Johannes Berg @ 2005-12-20 12:08 UTC (permalink / raw)
To: linux-kernel; +Cc: David S. Miller, Benjamin Herrenschmidt, Eric Lemoine
I've been debugging some issues and wondered why I got hangs in random
places in the code. It turns out that the problem is that I still had
netconsole enabled even though I have no network at the moment. So what
I had was:
* sungem compiled in
* netconsole=.... as command line
* no network cable plugged in
sungem does recognise this situation and says that waiting for carrier
timed out. However, later, when I printk() with interrupts disabled,
the system hangs after printing out a few lines to the console (I think
it's more than one, not sure though, might be just a single one).
Turns out that if I remove the netconsole=... option to my kernel, all
works fine and the system no longer hangs. Obviously not plugging in a
network cable is pretty useless when netconsole is turned on, but I
think it should not hang the system completely. So far I haven't been
able to figure out where it actually hangs and don't even know how to do
so -- I'm open for suggestions on how to find out why/where it hangs or
even fixes.
johannes
* Re: sungem hangs in atomic if netconsole enabled but no carrier
From: Benjamin Herrenschmidt @ 2005-12-20 21:18 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-kernel, David S. Miller, Eric Lemoine
> Turns out that if I remove the netconsole=... option to my kernel, all
> works fine and the system no longer hangs. Obviously not plugging in a
> network cable is pretty useless when netconsole is turned on, but I
> think it should not hang the system completely. So far I haven't been
> able to figure out where it actually hangs and don't even know how to do
> so -- I'm open for suggestions on how to find out why/where it hangs or
> even fixes.
Hrm... I've heard various reports about problems with netconsole... I've
never tried it myself so far though. One thing I remember to beware of
is if sungem does a printk while holding its lock...
Ben.
* Re: sungem hangs in atomic if netconsole enabled but no carrier
From: David S. Miller @ 2005-12-20 21:23 UTC (permalink / raw)
To: benh; +Cc: johannes, linux-kernel, davem, eric.lemoine
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 21 Dec 2005 08:18:40 +1100
>
> > Turns out that if I remove the netconsole=... option to my kernel, all
> > works fine and the system no longer hangs. Obviously not plugging in a
> > network cable is pretty useless when netconsole is turned on, but I
> > think it should not hang the system completely. So far I haven't been
> > able to figure out where it actually hangs and don't even know how to do
> > so -- I'm open for suggestions on how to find out why/where it hangs or
> > even fixes.
>
> Hrm... I've heard various reports about problems with netconsole... I've
> never tried it myself so far though. One thing I remember to beware of
> is if sungem does a printk while holding its lock...
I bet that's it, doing a printk while already in the netconsole
printk output path.
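
To make that failure mode concrete, here is a minimal userspace sketch, not sungem code. All names (toy_lock, toy_xmit, toy_printk, toy_link_event) are hypothetical stand-ins, and a trylock is used so that the re-entry is counted instead of spinning forever the way a real spin_lock() would.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver's non-recursive lock. */
struct toy_lock {
	bool held;
};

/* Returns false in the case where a real spin_lock() would spin forever. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock drv_lock;
static int deadlocks_detected;

/* Stand-in for the driver's transmit path, which needs drv_lock. */
static void toy_xmit(const char *msg)
{
	(void)msg;
	if (!toy_trylock(&drv_lock)) {
		/* Lock already held by the caller: a real driver would hang here. */
		deadlocks_detected++;
		return;
	}
	toy_unlock(&drv_lock);
}

/* Stand-in for printk() routed through a network console. */
static void toy_printk(const char *msg)
{
	toy_xmit(msg);
}

/* Driver code that logs while holding its own lock. */
static void toy_link_event(void)
{
	(void)toy_trylock(&drv_lock);
	toy_printk("link down\n");	/* re-enters toy_xmit() with drv_lock held */
	toy_unlock(&drv_lock);
}
```

Calling toy_printk() on its own goes through cleanly; calling toy_link_event() bumps deadlocks_detected, which is the point where the real thing would never return.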
* Re: sungem hangs in atomic if netconsole enabled but no carrier
From: Francois Romieu @ 2005-12-20 22:19 UTC (permalink / raw)
To: Johannes Berg
Cc: linux-kernel, David S. Miller, Benjamin Herrenschmidt,
Eric Lemoine
Johannes Berg <johannes@sipsolutions.net> :
[...]
> think it should not hang the system completely. So far I haven't been
> able to figure out where it actually hangs and don't even know how to do
> so -- I'm open for suggestions on how to find out why/where it hangs or
> even fixes.
See the thread "Netconsole violates dev->hard_start_xmit synch rules",
started on 06/09/2005 on netdev@vger.kernel.org, for some interesting
background.
(the innocent hero slowly fades into the swamps of netpolling...)
Still with us?
Were you using sundance.c, you would probably bug on the first timeout:
[net/sched/sch_generic.c]
static void dev_watchdog(unsigned long arg)
{
        struct net_device *dev = (struct net_device *)arg;

        spin_lock(&dev->xmit_lock);
        ^^^^^^^^^
        if (dev->qdisc != &noop_qdisc) {
                if (netif_device_present(dev) &&
                    netif_running(dev) &&
                    netif_carrier_ok(dev)) {
                        if (netif_queue_stopped(dev) &&
                            (jiffies - dev->trans_start) > dev->watchdog_timeo) {
                                printk(KERN_INFO "NETDEV WATCHDOG: %s: transmit timed out\n", dev->name);
                                dev->tx_timeout(dev);
                                ^^^^^^^^^^^^^^^
[net/core/netpoll.c]
static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
{
        int status;
        struct netpoll_info *npinfo;

        if (!np || !np->dev || !netif_running(np->dev)) {
                __kfree_skb(skb);
                return;
        }

        npinfo = np->dev->npinfo;

        /* avoid recursion */
        if (npinfo->poll_owner == smp_processor_id() ||
            np->dev->xmit_lock_owner == smp_processor_id()) {
                if (np->drop)
                        np->drop(skb);
                else
                        __kfree_skb(skb);
                return;
        }

        do {
                npinfo->tries--;
                spin_lock(&np->dev->xmit_lock);
                ^^^^^^^^^
A quick glance shows no netif_carrier_{on/off} in the sundance driver.
It would be a good candidate.
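
The "avoid recursion" test in netpoll_send_skb() quoted above boils down to an owner check before taking the lock. A rough userspace sketch of that idea follows; struct toy_dev, toy_send() and the result enum are hypothetical names for illustration, and CPU ids are plain integers passed in by the caller.

```c
#define TOY_NO_OWNER (-1)

/* Outcome of one send attempt in this toy model. */
enum toy_result {
	TOY_SENT,	/* lock was free, packet handed to the driver */
	TOY_DROPPED,	/* this CPU already owns the lock: drop, do not recurse */
	TOY_WOULD_SPIN	/* another CPU owns the lock: a real spin_lock() would wait */
};

/* Hypothetical device with an owner-tracked transmit lock. */
struct toy_dev {
	int xmit_lock_owner;	/* CPU id holding the lock, or TOY_NO_OWNER */
	int sent;
	int dropped;
};

static enum toy_result toy_send(struct toy_dev *dev, int this_cpu)
{
	/* Same test as the quoted code: are we already inside the xmit path? */
	if (dev->xmit_lock_owner == this_cpu) {
		dev->dropped++;
		return TOY_DROPPED;
	}
	if (dev->xmit_lock_owner != TOY_NO_OWNER)
		return TOY_WOULD_SPIN;

	dev->xmit_lock_owner = this_cpu;	/* "take" the lock */
	dev->sent++;				/* hand the packet to the driver */
	dev->xmit_lock_owner = TOY_NO_OWNER;	/* "release" the lock */
	return TOY_SENT;
}
```

Note that the check only knows about the one lock whose owner is recorded. A lock that is private to the driver is invisible to it, so a printk issued under such a lock is not caught this way.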
However you are using sungem.c and despite the fact that I should really
have something for dinner *now*, you are protected by netif_carrier_off.
But (drums roll):
[drivers/net/sungem.c]
#define DEFAULT_MSG     (NETIF_MSG_DRV          | \
                         NETIF_MSG_PROBE        | \
                         NETIF_MSG_LINK)
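Thus gem_link_timer() will periodically complain that the link is down.
With netconsole enabled, each of these messages is handed to the driver
for transmission, so gem_start_xmit() is issued.
Repeat until the TX ring is full: netif_stop_queue() is called.
Then gem_link_timer() printks again.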
net/core/netpoll.c::netpoll_send_skb() notices that the queue is stopped
and decides to try the usual NAPI poll(). A few function calls later, the
driver ends in drivers/net/sungem.c::gem_poll() where it takes so many
(irq-)locks that I do not even want to verify that it has a chance
to play nice with the pending gem_link_timer().
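
The "repeat until the TX ring is full" step can be sketched in userspace as below. This is a simplified model, not sungem code: the ring size, struct toy_nic and all toy_* functions are hypothetical, and it assumes that without a carrier no descriptor ever completes, so every queued message permanently occupies a slot.

```c
#include <stdbool.h>

#define TOY_TX_RING_SIZE 4	/* hypothetical, far smaller than a real ring */

struct toy_nic {
	int tx_used;		/* descriptors currently occupied */
	bool queue_stopped;	/* set once the ring is full */
	bool carrier;		/* link up? */
};

/* Queue one packet; stops the queue when the last slot is taken. */
static bool toy_start_xmit(struct toy_nic *nic)
{
	if (nic->queue_stopped)
		return false;
	nic->tx_used++;
	if (nic->tx_used == TOY_TX_RING_SIZE)
		nic->queue_stopped = true;
	return true;
}

/* Assumption: descriptors are only reclaimed while the link is up. */
static void toy_tx_complete(struct toy_nic *nic)
{
	if (!nic->carrier)
		return;
	nic->tx_used = 0;
	nic->queue_stopped = false;
}

/*
 * Emit one "link down" style message per timer tick and report after how
 * many ticks the queue stops, or -1 if it never does within max_ticks.
 */
static int toy_ticks_until_stop(struct toy_nic *nic, int max_ticks)
{
	for (int tick = 1; tick <= max_ticks; tick++) {
		toy_start_xmit(nic);
		if (nic->queue_stopped)
			return tick;
		toy_tx_complete(nic);
	}
	return -1;
}
```

With carrier set to false the queue stops after TOY_TX_RING_SIZE ticks; with carrier set to true it never stops in this model. The next message after the stop is the one that sends netpoll down the poll path described above.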
--
Ueimor
* Re: sungem hangs in atomic if netconsole enabled but no carrier
From: Benjamin Herrenschmidt @ 2005-12-20 22:59 UTC (permalink / raw)
To: Francois Romieu
Cc: Johannes Berg, linux-kernel, David S. Miller, Eric Lemoine
On Tue, 2005-12-20 at 23:19 +0100, Francois Romieu wrote:
> net/core/netpoll.c::netpoll_send_skb() notices that the queue is stopped
> and decides to try the usual NAPI poll(). A few function calls later, the
> driver ends in drivers/net/sungem.c::gem_poll() where it takes so many
> (irq-)locks that I do not even want to verify that it has a chance
> to play nice with the pending gem_link_timer().
I'm not a fan of the locking in sungem. I think I wrote a big fat comment
about it and why it is like that for now. Better ideas are welcome :)
Ben.