From: Stephen Hemminger <shemminger@linux-foundation.org>
To: Chris Lightfoot <chris@ex-parrot.com>
Cc: Stephen Hemminger <shemminger@osdl.org>, netdev@vger.kernel.org
Subject: Re: sky2 problems on Intel Mac Mini
Date: Mon, 29 Jan 2007 16:01:17 -0800
Message-ID: <20070129160117.7dda9563@freekitty>
In-Reply-To: <DJgDe2E/Cvnb.CrmE6+LDxkzqNQTYvzH+WQ@sphinx.mythic-beasts.com>

On Mon, 29 Jan 2007 23:57:32 +0000
Chris Lightfoot <chris@ex-parrot.com> wrote:

>   [ please cc: me on any reply ]
> 
> I'm seeing lots of problems with the sky2 driver on Mac
> Minis. Based on the suggestions in,
>     http://www.mail-archive.com/netdev@vger.kernel.org/msg28221.html
> I am running stock 2.6.19 + the patches from the
> mactel-linux.org site to get the kernel booting on the
> Apple hardware; none of these touches the sky2 code. The
> module is installed with disable_msi=1 and
> idle_timeout=10; the chip version is,
>     Yukon-EC (0xb6) rev 2
> 
> The crashes we're seeing at the moment show (with
> debug=16) lots and lots of transmits being queued up and
> never being completed, even with the timeout switched on.
> For instance, (this is on a machine running NFS root and
> vlans)

Is this NFS over UDP?

> 
>     [ lots of normal activity alternating tx queued / tx done ]
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 65, len 150 
> Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 106 status 0x9e2100 len 154 
> Jan 29 21:03:22 yeti kernel: eth0: tx done 66 
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 67, len 150 
> Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 107 status 0x9e2100 len 154 
> Jan 29 21:03:22 yeti kernel: eth0: tx done 68 
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 69, len 150 
> Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 108 status 0x9e2100 len 154 
> Jan 29 21:03:22 yeti kernel: eth0: tx done 70 
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 71, len 89 
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 73, len 1090
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 75, len 1514
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 79, len 90 
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 81, len 1514 
> Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 84, len 1090 
> Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 86, len 98 
> Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 88, len 1514 
> Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 91, len 1090 
> Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 93, len 54 
> Jan 29 21:03:23 yeti kernel: eth0: tx queued, slot 94, len 66 
> Jan 29 21:03:24 yeti kernel: eth0: tx queued, slot 95, len 54 
> Jan 29 21:03:24 yeti kernel: eth0: tx queued, slot 96, len 66 
> Jan 29 21:03:24 yeti kernel: eth0: tx queued, slot 97, len 98 
>     [ ... and so on for a total of 109 tx queued with no tx done, after which
>       our watchdog rebooted the machine ] 
> 
> -- though we've also seen, e.g., (no NFS root, no vlans)
> 
> Jan 28 19:32:16 t1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Jan 28 19:32:16 t1 kernel: sky2 eth0: tx timeout
> Jan 28 19:32:16 t1 kernel: sky2 eth0: transmit ring 115 .. 92 report=115 done=115
> Jan 28 19:32:16 t1 kernel: sky2 hardware hung? flushing
> Jan 28 19:32:25 t1 kernel: BUG: soft lockup detected on CPU#0!
> Jan 28 19:32:25 t1 kernel:  [<c015495a>] softlockup_tick+0xba/0xe0
> Jan 28 19:32:25 t1 kernel:  [<c01327e9>] update_process_times+0x39/0x90
> Jan 28 19:32:25 t1 kernel:  [<c0117337>] smp_apic_timer_interrupt+0x97/0xc0
> Jan 28 19:32:25 t1 kernel:  [<c0103eab>] apic_timer_interrupt+0x1f/0x24
> Jan 28 19:32:25 t1 kernel:  [<c0445107>] _spin_lock_irqsave+0x67/0x80
> Jan 28 19:32:25 t1 kernel:  [<c0445136>] _spin_lock_bh+0x6/0x20
> Jan 28 19:32:25 t1 kernel:  [<c0302f40>] sky2_tx_clean+0x20/0x70
> Jan 28 19:32:25 t1 kernel:  [<c0303904>] sky2_tx_timeout+0x144/0x1b0
> Jan 28 19:32:25 t1 kernel:  [<c03da1c0>] dev_watchdog+0x0/0xe0
> Jan 28 19:32:25 t1 kernel:  [<c03da28e>] dev_watchdog+0xce/0xe0
> Jan 28 19:32:25 t1 kernel:  [<c0132916>] run_timer_softirq+0xc6/0x1c0
> Jan 28 19:32:25 t1 kernel:  [<c0120c80>] scheduler_tick+0xb0/0x3a0
> Jan 28 19:32:25 t1 kernel:  [<c012d1ea>] __do_softirq+0xca/0xf0
> Jan 28 19:32:25 t1 kernel:  [<c012d245>] do_softirq+0x35/0x40
> Jan 28 19:32:25 t1 kernel:  [<c012d295>] irq_exit+0x45/0x50
> Jan 28 19:32:25 t1 kernel:  [<c011733c>] smp_apic_timer_interrupt+0x9c/0xc0
> Jan 28 19:32:25 t1 kernel:  [<c0103eab>] apic_timer_interrupt+0x1f/0x24
> Jan 28 19:32:25 t1 kernel:  [<c0101332>] mwait_idle_with_hints+0x32/0x40
> Jan 28 19:32:25 t1 kernel:  [<c0101370>] mwait_idle+0x30/0x40
> Jan 28 19:32:25 t1 kernel:  [<c0101144>] cpu_idle+0x94/0xe0
> Jan 28 19:32:25 t1 kernel:  [<c0592a16>] start_kernel+0x1c6/0x230
> Jan 28 19:32:25 t1 kernel:  [<c0592360>] unknown_bootoption+0x0/0x1e0
> Jan 28 19:32:25 t1 kernel:  =======================
> 
> -- I assume this is just the same problem showing up on a
> kernel with soft lockup detection enabled?
> 
> Hopefully I should be able to actually log into one of
> these machines over an alternate connection next time the
> problem recurs, at which point I should be able to get
> ethtool -d output. Anything else I should do at that
> point?
> 
> Any suggestions for what to do next to chase this problem
> down? I haven't yet tried the sk98lin driver on this
> hardware; is that still worth doing? Are there any useful
> tests we should try? Unfortunately, though these crashes
> happen pretty frequently (several times per day
> typically), I don't have a test case to reproduce one;
> however, if it'd be useful, I can probably get a pcap
> trace of the period immediately before the interface falls
> over using port mirroring on the switch to which the
> machines are connected. Is that likely to be informative?
> 
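
The stall in the quoted debug=16 log (a long run of "tx queued" lines
with no "tx done" in between) could be spotted mechanically in a saved
syslog. The following is only a minimal sketch, not part of the driver
or of any existing tool: the function name longest_stall and the two
regular expressions are assumptions based on the message format shown
in the quoted log lines above.

```python
import re

# Patterns assumed from the quoted debug=16 output; other kernel
# versions may format these messages differently.
QUEUED = re.compile(r"tx queued, slot (\d+), len (\d+)")
DONE = re.compile(r"tx done (\d+)")


def longest_stall(lines):
    """Return the longest run of 'tx queued' lines with no 'tx done' between them."""
    longest = 0
    current = 0
    for line in lines:
        if QUEUED.search(line):
            current += 1
            longest = max(longest, current)
        elif DONE.search(line):
            # A completion was reported, so the run of pending transmits ends.
            current = 0
        # Any other line (for example rx status) is ignored.
    return longest


# A few lines copied from the quoted log, used as sample input.
sample = [
    "Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 69, len 150",
    "Jan 29 21:03:22 yeti kernel: sky2 eth0: rx slot 108 status 0x9e2100 len 154",
    "Jan 29 21:03:22 yeti kernel: eth0: tx done 70",
    "Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 71, len 89",
    "Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 73, len 1090",
    "Jan 29 21:03:22 yeti kernel: eth0: tx queued, slot 75, len 1514",
]

if __name__ == "__main__":
    print(longest_stall(sample))
```

Feeding the full log through the same function (for example with
open("/var/log/kern.log") as the line source, a path that is itself an
assumption) would show whether long stalls also occur without ending in
a watchdog reboot.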


-- 
Stephen Hemminger <shemminger@linux-foundation.org>
