From: Michael Breuer <mbreuer@majjas.com>
To: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
	David Miller <davem@davemloft.net>,
	akpm@linux-foundation.org, flyboy@gmail.com,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Michael Chan <mchan@broadcom.com>, Don Fry <pcnet32@verizon.net>,
	Francois Romieu <romieu@fr.zoreil.com>,
	Matt Carlson <mcarlson@broadcom.com>
Subject: Re: [PATCH] sky2:  receive dma mapping error handling
Date: Sat, 30 Jan 2010 11:31:48 -0500
Message-ID: <4B645EF4.4050701@majjas.com>
In-Reply-To: <20100128153643.0fca3c51@nehalam>

On 01/28/2010 06:36 PM, Stephen Hemminger wrote:
> Please try this patch (and only this patch), on 2.6.33-rc5[*];
> none of the other patches that did not make it upstream because that
> confuses things too much.
>
> The code that checks for DMA mapping errors on receive buffers would
> not handle errors correctly.  I doubt you have these errors, but if you
> did then it would explain the problems.  The code has to be a little
> tricky and build mapping for new rx buffer before releasing old one,
> that way if new mapping fails, the old one can be reused.
>
> If it works for you, I will resubmit with signed-off.
>
> -
>
Nope - tx crash again. This time the system stayed up (but hosed) for a 
few hours. When I tried to recover eth0 the system then crashed.

Brief summary of events (log extract below):

System start Jan 28 19:29
Everything seemed good (load and all) until 17:13:11 the following day 
when I got rx errors:

Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:14 mail kernel: sky2 eth0: rx error, status 0x5f60010 length 1518
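
To make the pattern in these rx errors easier to see, here is a minimal sketch that tallies the distinct status values. It assumes the syslog format shown above with the mail client's line wrapping undone; the names `RX_ERROR` and `tally_rx_errors` are made up for illustration and are not part of any sky2 tooling. It does not try to interpret the status bits.

```python
import re
from collections import Counter

# The ten rx error lines from the log extract, unwrapped.
LOG = """\
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:11 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x6230010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x7f40010 length 1518
Jan 29 17:13:12 mail kernel: sky2 eth0: rx error, status 0x8180010 length 1518
Jan 29 17:13:14 mail kernel: sky2 eth0: rx error, status 0x5f60010 length 1518
"""

# Hypothetical pattern for the message format seen in this log.
RX_ERROR = re.compile(r"sky2 (\S+): rx error, status (0x[0-9a-f]+) length (\d+)")


def tally_rx_errors(text):
    """Count occurrences of each (interface, status, length) triple."""
    counts = Counter()
    for match in RX_ERROR.finditer(text):
        iface, status, length = match.groups()
        counts[(iface, status, int(length))] += 1
    return counts


counts = tally_rx_errors(LOG)
for (iface, status, length), n in counts.most_common():
    print(f"{iface} status={status} length={length} count={n}")

# Collect the low 16 bits of every status word to see whether they differ.
low_halves = {int(status, 16) & 0xFFFF for (_, status, _) in counts}
print("distinct low 16 bits:", sorted(hex(v) for v in low_halves))
```

On the ten lines above this finds four distinct status values, all with the same low 16 bits (0x0010) and the same reported length; only the upper half of the status word varies.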

The system continued running normally after this until this morning (Jan
30) at 05:44:55:
Jan 30 05:44:55 mail kernel: DRHD: handling fault status reg 2
Jan 30 05:44:55 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr ffc4331ff000
Jan 30 05:44:55 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Jan 30 05:44:55 mail kernel: net_ratelimit: 2 callbacks suppressed
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: error interrupt status=0xc0000000
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 30 05:45:01 mail kernel: ------------[ cut here ]------------
Jan 30 05:45:01 mail kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf3/0x161()
Jan 30 05:45:01 mail kernel: Hardware name: System Product Name
Jan 30 05:45:01 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Jan 30 05:45:01 mail kernel: Modules linked in: iptable_raw iptable_mangle ipt_MASQUERADE iptable_nat nf_nat cpufreq_stats ip6table_filter ip6table_mangle ip6_tables bridge stp appletalk psnap llc nfsd lockd nfs_acl auth_rpcgss exportfs hwmon_vid coretemp sunrpc acpi_cpufreq sit tunnel4 ipt_LOG nf_conntrack_netbios_ns nf_conntrack_ftp xt_DSCP xt_dscp xt_MARK nf_conntrack_ipv6 xt_multiport ipv6 dm_multipath kvm_intel kvm snd_hda_codec_analog snd_hda_intel snd_ens1371 gameport snd_hda_codec snd_rawmidi snd_ac97_codec gspca_spca505 ac97_bus gspca_main snd_hwdep videodev snd_seq snd_seq_device v4l1_compat snd_pcm v4l2_compat_ioctl32 snd_timer snd soundcore snd_page_alloc firewire_ohci pcspkr i2c_i801 firewire_core wmi asus_atk0110 crc_itu_t sky2 hwmon iTCO_wdt iTCO_vendor_support fbcon tileblit font bitblit softcursor raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 ata_generic pata_acpi pata_marvell nouveau ttm drm_kms_helper drm agpgart fb i2c_algo_bit cfbcopyarea i2c_core cf
Jan 30 05:45:01 mail kernel: bimgblt cfbfillrect [last unloaded: nf_nat]
Jan 30 05:45:01 mail kernel: Pid: 0, comm: swapper Tainted: G        W 2.6.33-rc5WITHMMAPNODMARFORKTIPSKY2DMAMAP-00283-gd4d37bd-dirty #1
Jan 30 05:45:01 mail kernel: Call Trace:
Jan 30 05:45:01 mail kernel: <IRQ>  [<ffffffff8104a03d>] warn_slowpath_common+0x7c/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8104a0ac>] warn_slowpath_fmt+0x41/0x43
Jan 30 05:45:01 mail kernel: [<ffffffff813d2f43>] ? netif_tx_lock+0x44/0x6c
Jan 30 05:45:01 mail kernel: [<ffffffff813d30ab>] dev_watchdog+0xf3/0x161
Jan 30 05:45:01 mail kernel: [<ffffffff8106a31f>] ? sched_clock_cpu+0x44/0xce
Jan 30 05:45:01 mail kernel: [<ffffffff8105761a>] run_timer_softirq+0x1c3/0x26b
Jan 30 05:45:01 mail kernel: [<ffffffff8105060c>] __do_softirq+0xf8/0x1cd
Jan 30 05:45:01 mail kernel: [<ffffffff8107192b>] ? tick_program_event+0x2a/0x2c
Jan 30 05:45:01 mail kernel: [<ffffffff8100ab1c>] call_softirq+0x1c/0x30
Jan 30 05:45:01 mail kernel: [<ffffffff8100c2b3>] do_softirq+0x4b/0xa3
Jan 30 05:45:01 mail kernel: [<ffffffff810501f8>] irq_exit+0x4a/0x8c
Jan 30 05:45:01 mail kernel: [<ffffffff81461859>] smp_apic_timer_interrupt+0x86/0x94
Jan 30 05:45:01 mail kernel: [<ffffffff8100a5d3>] apic_timer_interrupt+0x13/0x20
Jan 30 05:45:01 mail kernel: <EOI>  [<ffffffff812afbd4>] ? acpi_idle_enter_bm+0x256/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff812afbcd>] ? acpi_idle_enter_bm+0x24f/0x28a
Jan 30 05:45:01 mail kernel: [<ffffffff8139574c>] cpuidle_idle_call+0x9e/0xfa
Jan 30 05:45:01 mail kernel: [<ffffffff81008c05>] cpu_idle+0xb4/0xf6
Jan 30 05:45:01 mail kernel: [<ffffffff81455d48>] start_secondary+0x201/0x242
Jan 30 05:45:01 mail kernel: ---[ end trace 57f7151f6a5def07 ]---
Jan 30 05:45:01 mail kernel: sky2 eth0: tx timeout
Jan 30 05:45:01 mail kernel: sky2 eth0: transmit ring 14 .. 102 report=14 done=14
Jan 30 05:45:01 mail kernel: sky2 eth0: disabling interface
Jan 30 05:45:01 mail kernel: sky2 eth0: enabling interface
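
The ordering of events in this extract (DMAR fault first, tx timeout a few seconds later) can be checked mechanically. The following is a rough sketch that assumes the syslog format above with wrapped lines rejoined and the log trimmed to the relevant entries; `DMAR_FAULT`, `TX_TIMEOUT`, `stamp` and `first_event` are illustrative names, and the year is an assumption taken from the Date header because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Relevant lines from the extract above, unwrapped and trimmed.
LOG = """\
Jan 30 05:44:55 mail kernel: DRHD: handling fault status reg 2
Jan 30 05:44:55 mail kernel: DMAR:[DMA Read] Request device [06:00.0] fault addr ffc4331ff000
Jan 30 05:44:55 mail kernel: DMAR:[fault reason 06] PTE Read access is not set
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: error interrupt status=0xc0000000
Jan 30 05:44:55 mail kernel: sky2 0000:06:00.0: PCI hardware error (0x2010)
Jan 30 05:45:01 mail kernel: NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Jan 30 05:45:01 mail kernel: sky2 eth0: tx timeout
"""

YEAR = 2010  # assumption: syslog omits the year, so use the mail's Date header

# Hypothetical patterns matching the message formats seen in this log.
DMAR_FAULT = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] fault addr ([0-9a-f]+)"
)
TX_TIMEOUT = re.compile(r"sky2 (\S+): tx timeout")


def stamp(line):
    """Parse the 15-character syslog timestamp at the start of a line."""
    return datetime.strptime(f"{YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def first_event(pattern, text):
    """Return (timestamp, match) for the first line matching pattern."""
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            return stamp(line), match
    raise ValueError(f"no line matches {pattern.pattern!r}")


fault_time, fault = first_event(DMAR_FAULT, LOG)
timeout_time, timeout = first_event(TX_TIMEOUT, LOG)
delay = timeout_time - fault_time

print(f"DMA {fault.group(1)} fault: device {fault.group(2)} addr {fault.group(3)}")
print(f"tx timeout on {timeout.group(1)} after {delay.total_seconds():.0f} s")
```

For this extract it reports a 6 second gap between the DMA read fault on device 06:00.0 and the tx timeout on eth0. Run against a full log, the same approach would show whether every tx timeout is preceded by a DMAR fault.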

This down/up continued for several hours until I intervened at about 10:05.

I saw that there was no eth0 connectivity; eth1 was ok. It appeared that
eth0 was receiving traffic but unable to send. arpwatch was reporting
bogons, DHCP showed many DISCOVER/OFFER pairs, no REQUEST/ACK. Pings to
any system failed; arp showed incomplete for anything hanging off of
eth0. arping also failed.
I manually stopped and started eth0 (ifconfig) and reset iptables 
(although eth0 has no filters).

As I started looking at logs, the system hung and rebooted. I'm up now
with dma debug enabled; however, as with 2.6.32.4, num_entries is dropping
and I don't think that dma debug will remain enabled long enough to
catch a crash.

So, as I see things, there are two issues here: 1) the TX hang following
the DMAR error and 2) the inability to recover the interface and the
subsequent system instability.



