netdev.vger.kernel.org archive mirror
* 2.6.25-rc8-mm2: FIX kmalloc-2048 (was Re: 2.6.25-rc8-mm2: IP: [<ffffffff802868f9>] __kmalloc+0x69/0x110)
       [not found]       ` <20080414183221.GA5234@martell.zuzino.mipt.ru>
@ 2008-04-14 19:56         ` Alexey Dobriyan
  2008-04-14 20:05           ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-14 19:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, netdev

I can reproduce semi-reliably (by kernel standards) corruption in
kmalloc-2048. No idea if this can explain all "struct file" related
oopses I saw, or SLUB free pointer corruption Pekka and Christoph are
looking into.

The 8139too and atl1 drivers are in use. 8139too connects to the outside
world; atl1 connects to a laptop collecting netconsole logs. However, I never
managed to capture the late oopses with netconsole, even with the init
scripts that shut down the interfaces disabled. :-(



Transcribed from photo:

8000 flags=0x8000000000002082
INFO: Object 0xffff81017ff9d2d0 @offset=21200 fp=0xffff81017ff9ca88

Bytes b4 0xffff81017ff9d2c0:  62 ea ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a
  Object 0xffff81017ff9d2d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
  Object 0xffff81017ff9d2e0:  6b 6b 00 18 f3 a2 9f 90 00 1b 38 af 22 49 08 00
  Object 0xffff81017ff9d2f0:  45 10 00 4c ff 59 40 00 40 11 86 ac c0 a8 00 2a
  Object 0xffff81017ff9d300:  50 fa a2 be 91 43 00 7b 00 38 54 d4 23 00 00 00
  Object 0xffff81017ff9d310:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Object 0xffff81017ff9d320:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  Object 0xffff81017ff9d330:  00 00 00 00 4c ff 10 44 74 7f 6f 9d e4 c8 a2 4f
  Object 0xffff81017ff9d340:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 Redzone 0xffff81017ff9dad0:  bb bb bb bb bb bb bb bb
 Padding 0xffff81017ff9db10:  5a 5a 5a 5a 5a 5a 5a 5a

 Pid: 6168, comm: reboot Not tainted 2.6.25-rc8-mm2 #28

 Call Trace:
	print_trailer
	check_bytes_and_report
	check_object
	__free_slab
	discard_slab
	__slab_free
	? skb_release_data
	kfree
	? skb_release_data
	skb_release_all
	__kfree_skb
	kfree_skb
	atl1_clean_rx_ring
	atl1_down
	atl1_close
	dev_close
	dev_change_flags
	devinet_ioctl
	? trace_hardirqs_on
	inet_ioctl
	sock_ioctl
	vfs_ioctl
	do_vfs_ioctl
	sys_ioctl
	system_call_after_swapgs

FIX kmalloc-2048: Restoring 0xffff81017ff9d2e2-0xffff81017ff9d8d9=0x6b
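
Note on the dump above: 0x6b, 0x5a and 0xbb are the SLUB debug fill values
for a freed object, padding and an inactive red zone, so the non-0x6b bytes
starting at 0xffff81017ff9d2e2 were written after the object was freed. Read
as network byte order, they look like an Ethernet/IPv4/UDP frame. The sketch
below is only an illustration of that reading, not part of the original
report: the byte string is copied by hand from the "Object" lines, and the
helper names (parse_frame, ip_header_checksum_ok) are made up for this
example.

```python
import struct

# Bytes copied from the "Object" lines of the dump, starting at
# 0xffff81017ff9d2e2 (the first non-0x6b byte of the freed object).
DUMP = bytes.fromhex(
    "00 18 f3 a2 9f 90 00 1b 38 af 22 49 08 00 "  # Ethernet header (14 bytes)
    "45 10 00 4c ff 59 40 00 40 11 86 ac "        # IPv4 header, first 12 bytes
    "c0 a8 00 2a 50 fa a2 be "                    # IPv4 source and destination
    "91 43 00 7b 00 38 54 d4 "                    # UDP header (8 bytes)
)


def fmt_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def fmt_ip(raw):
    return ".".join(str(b) for b in raw)


def ip_header_checksum_ok(header):
    # One's-complement sum of all 16-bit words, checksum field included,
    # folds to 0xffff for a valid IPv4 header.
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def parse_frame(buf):
    dst, src, ethertype = struct.unpack_from("!6s6sH", buf, 0)
    (ver_ihl, _tos, total_len, _ident, _frag, _ttl, proto, _csum,
     saddr, daddr) = struct.unpack_from("!BBHHHBBH4s4s", buf, 14)
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    sport, dport, udp_len, _udp_csum = struct.unpack_from("!HHHH", buf, 14 + ihl)
    return {
        "dst_mac": fmt_mac(dst),
        "src_mac": fmt_mac(src),
        "ethertype": ethertype,
        "ip_total_len": total_len,
        "proto": proto,
        "src_ip": fmt_ip(saddr),
        "dst_ip": fmt_ip(daddr),
        "sport": sport,
        "dport": dport,
        "udp_len": udp_len,
        "ip_checksum_ok": ip_header_checksum_ok(buf[14:14 + ihl]),
    }


if __name__ == "__main__":
    for key, value in parse_frame(DUMP).items():
        print(f"{key:15} {value}")
```

Under this reading the frame is UDP from 192.168.0.42 to port 123, with source
MAC 00:1b:38:af:22:49, which is the netconsole remote address given on the
kernel command line later in this thread. The IPv4 header checksum verifies,
which suggests the transcription from the photo is accurate for these bytes.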



* Re: 2.6.25-rc8-mm2: FIX kmalloc-2048 (was Re: 2.6.25-rc8-mm2: IP: [<ffffffff802868f9>] __kmalloc+0x69/0x110)
  2008-04-14 19:56         ` 2.6.25-rc8-mm2: FIX kmalloc-2048 (was Re: 2.6.25-rc8-mm2: IP: [<ffffffff802868f9>] __kmalloc+0x69/0x110) Alexey Dobriyan
@ 2008-04-14 20:05           ` Christoph Lameter
  2008-04-19 11:17             ` Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2008-04-14 20:05 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, netdev

On Mon, 14 Apr 2008, Alexey Dobriyan wrote:

> I can reproduce semi-reliably (by kernel standards) corruption in
> kmalloc-2048. No idea if this can explain all "struct file" related
> oopses I saw, or SLUB free pointer corruption Pekka and Christoph are
> looking into.

The slub free pointer corruption is usually a result of the overwrites.

> Bytes b4 0xffff81017ff9d2c0:  62 ea ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a
>   Object 0xffff81017ff9d2d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
>   Object 0xffff81017ff9d2e0:  6b 6b 00 18 f3 a2 9f 90 00 1b 38 af 22 49 08 00
>   Object 0xffff81017ff9d2f0:  45 10 00 4c ff 59 40 00 40 11 86 ac c0 a8 00 2a
>   Object 0xffff81017ff9d300:  50 fa a2 be 91 43 00 7b 00 38 54 d4 23 00 00 00
>   Object 0xffff81017ff9d310:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   Object 0xffff81017ff9d320:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   Object 0xffff81017ff9d330:  00 00 00 00 4c ff 10 44 74 7f 6f 9d e4 c8 a2 4f
>   Object 0xffff81017ff9d340:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  Redzone 0xffff81017ff9dad0:  bb bb bb bb bb bb bb bb
>  Padding 0xffff81017ff9db10:  5a 5a 5a 5a 5a 5a 5a 5a
> 
> FIX kmalloc-2048: Restoring 0xffff81017ff9d2e2-0xffff81017ff9d8d9=0x6b

Looks like skb corruption. Would be helpful to have the complete output 
though. Does the data in the restored range trigger any memories?


* Re: 2.6.25-rc8-mm2: FIX kmalloc-2048 (was Re: 2.6.25-rc8-mm2: IP: [<ffffffff802868f9>] __kmalloc+0x69/0x110)
  2008-04-14 20:05           ` Christoph Lameter
@ 2008-04-19 11:17             ` Alexey Dobriyan
  2008-04-19 14:45               ` atl1 64-bit => 32-bit DMA borkage (reproducible, bisected) Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-19 11:17 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, netdev

On Mon, Apr 14, 2008 at 01:05:09PM -0700, Christoph Lameter wrote:
> On Mon, 14 Apr 2008, Alexey Dobriyan wrote:
> 
> > I can reproduce semi-reliably (by kernel standards) corruption in
> > kmalloc-2048. No idea if this can explain all "struct file" related
> > oopses I saw, or SLUB free pointer corruption Pekka and Christoph are
> > looking into.
> 
> The slub free pointer corruption is usually a result of the overwrites.
> 
> > Bytes b4 0xffff81017ff9d2c0:  62 ea ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a
> >   Object 0xffff81017ff9d2d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> >   Object 0xffff81017ff9d2e0:  6b 6b 00 18 f3 a2 9f 90 00 1b 38 af 22 49 08 00
> >   Object 0xffff81017ff9d2f0:  45 10 00 4c ff 59 40 00 40 11 86 ac c0 a8 00 2a
> >   Object 0xffff81017ff9d300:  50 fa a2 be 91 43 00 7b 00 38 54 d4 23 00 00 00
> >   Object 0xffff81017ff9d310:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >   Object 0xffff81017ff9d320:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >   Object 0xffff81017ff9d330:  00 00 00 00 4c ff 10 44 74 7f 6f 9d e4 c8 a2 4f
> >   Object 0xffff81017ff9d340:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >  Redzone 0xffff81017ff9dad0:  bb bb bb bb bb bb bb bb
> >  Padding 0xffff81017ff9db10:  5a 5a 5a 5a 5a 5a 5a 5a
> > 
> > FIX kmalloc-2048: Restoring 0xffff81017ff9d2e2-0xffff81017ff9d8d9=0x6b
> 
> Looks like skb corruption. Would be helpful to have the complete output 
> though. Does the data in the restored range trigger any memories?

No.

I'm currently tracing this bug and 2.6.24 also has it. :-(



* atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-19 11:17             ` Alexey Dobriyan
@ 2008-04-19 14:45               ` Alexey Dobriyan
  2008-04-20  2:54                 ` Jay Cliburn
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-19 14:45 UTC (permalink / raw)
  To: Luca Tettamanti, Chris Snook, Jay Cliburn, Jeff Garzik
  Cc: Pekka Enberg, Andrew Morton, linux-kernel, netdev,
	Christoph Lameter, torvalds

On Sat, Apr 19, 2008 at 03:17:19PM +0400, Alexey Dobriyan wrote:
> > > Bytes b4 0xffff81017ff9d2c0:  62 ea ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a
> > >   Object 0xffff81017ff9d2d0:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > >   Object 0xffff81017ff9d2e0:  6b 6b 00 18 f3 a2 9f 90 00 1b 38 af 22 49 08 00
> > >   Object 0xffff81017ff9d2f0:  45 10 00 4c ff 59 40 00 40 11 86 ac c0 a8 00 2a
> > >   Object 0xffff81017ff9d300:  50 fa a2 be 91 43 00 7b 00 38 54 d4 23 00 00 00
> > >   Object 0xffff81017ff9d310:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >   Object 0xffff81017ff9d320:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >   Object 0xffff81017ff9d330:  00 00 00 00 4c ff 10 44 74 7f 6f 9d e4 c8 a2 4f
> > >   Object 0xffff81017ff9d340:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >  Redzone 0xffff81017ff9dad0:  bb bb bb bb bb bb bb bb
> > >  Padding 0xffff81017ff9db10:  5a 5a 5a 5a 5a 5a 5a 5a
> > > 
> > > FIX kmalloc-2048: Restoring 0xffff81017ff9d2e2-0xffff81017ff9d8d9=0x6b

OK, nailed it.

It's commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 aka "atl1: disable broken 64-bit DMA".

With this commit in tree, I can reproduce either
a) kmalloc-2048 corruption after initscripts shutdown eth0
	http://marc.info/?l=linux-kernel&m=120820360221261&w=2

b) or oopses at filp_close() first reported long ago
	(sorry, can't find that email)

c) or hard hang after initscripts shutdown eth0 with even SysRq not working.
	http://marc.info/?l=linux-kernel&m=120795046008115&w=2

I have two boxes: one with atl1 and 4G RAM, with 2G remapped above the 4G
boundary, and another with r8169. They are connected directly by an Ethernet
cable, and the NICs agree on 1Gbps speed.

So, it's enough to scp a 200 MB git archive and immediately start the
reboot sequence for the horrors described above to appear. It's not 100%
reproducible, more like 90%.

I tested the kernel one commit earlier 10 times; it doesn't have these
issues and reboots reliably.

CONFIG_IOMMU is in use; lspci, /proc/mtrr and dmesg output are below:

03:00.0 Ethernet controller [0200]: Attansic Technology Corp. L1 Gigabit Ethernet Adapter [1969:1048] (rev b0)
	Subsystem: ASUSTeK Computer Inc. Unknown device [1043:8226]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Interrupt: pin A routed to IRQ 319
	Region 0: Memory at fe9c0000 (64-bit, non-prefetchable) [size=256K]
	Expansion ROM at fe9a0000 [disabled] [size=128K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [48] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
		Address: 00000000fee0300c  Data: 4161
	Capabilities: [58] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag- AttnBtn+ AttnInd+ PwrInd+ RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
			ClockPM- Suprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [6c] Vital Product Data <?>
	Kernel driver in use: atl1
00: 69 19 48 10 06 04 10 00 b0 00 00 02 08 00 00 00
10: 04 00 9c fe 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 26 82
30: 00 00 9a fe 40 00 00 00 00 00 00 00 0a 01 00 00
40: 01 48 02 c0 00 00 00 00 05 58 81 00 0c 30 e0 fe
50: 00 00 00 00 61 41 00 00 10 6c 01 00 80 7f 00 00
60: 00 20 1b 00 11 f4 03 00 40 00 11 10 03 00 28 81
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 69 19 48 10 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
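
For readers cross-checking the raw config space against the decoded lspci
text, the first rows of the hex dump can be unpacked by hand. The sketch
below is an illustration only: the offsets are the standard PCI type 0 header
fields, the bytes are copied from rows 00, 10 and 20 above, and the function
name decode_header is made up for this example.

```python
import struct

# Rows 00, 10 and 20 of the config space dump above (little-endian fields).
CFG = bytes.fromhex(
    "69 19 48 10 06 04 10 00 b0 00 00 02 08 00 00 00 "
    "04 00 9c fe 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 43 10 26 82 "
)


def decode_header(cfg):
    vendor, device, command, _status = struct.unpack_from("<HHHH", cfg, 0x00)
    bar0, bar1 = struct.unpack_from("<II", cfg, 0x10)
    subsys_vendor, subsys_id = struct.unpack_from("<HH", cfg, 0x2C)

    is_mem = (bar0 & 0x1) == 0                       # bit 0: 0 = memory BAR
    is_64bit = is_mem and ((bar0 >> 1) & 0x3) == 0x2  # bits 2:1: 10b = 64-bit
    base = bar0 & ~0xF
    if is_64bit:
        base |= bar1 << 32                           # upper half lives in BAR1

    return {
        "vendor": vendor,
        "device": device,
        "revision": cfg[0x08],
        "class": (cfg[0x0B] << 8) | cfg[0x0A],
        "cache_line_bytes": cfg[0x0C] * 4,           # register counts dwords
        "mem_enabled": bool(command & (1 << 1)),
        "bus_master": bool(command & (1 << 2)),
        "intx_disabled": bool(command & (1 << 10)),
        "bar0_base": base,
        "bar0_64bit": is_64bit,
        "bar0_prefetchable": bool(bar0 & 0x8),
        "subsystem": (subsys_vendor, subsys_id),
    }


if __name__ == "__main__":
    for key, value in decode_header(CFG).items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{key:18} {shown}")
```

Note that the 64-bit type of BAR 0 only describes where the device's
registers may be mapped. It says nothing about which bus addresses the NIC
can safely use for DMA, which is the question in this thread.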


reg00: base=0x80000000 (2048MB), size=2048MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=2048MB: write-back, count=1


Linux version 2.6.23-rc6 (ad@martell) (gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)) #14 SMP PREEMPT Sat Apr 19 17:46:31 MSD 2008
Command line: root=/dev/sda2 netconsole=@192.168.0.1/eth0,9353@192.168.0.42/00:1b:38:af:22:49 ignore_loglevel
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007ff90000 (usable)
 BIOS-e820: 000000007ff90000 - 000000007ff9e000 (ACPI data)
 BIOS-e820: 000000007ff9e000 - 000000007ffe0000 (ACPI NVS)
 BIOS-e820: 000000007ffe0000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524176) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1572864) 2 entries of 256 used
end_pfn_map = 1572864
DMI 2.4 present.
ACPI: RSDP 000FA980, 0024 (r2 ACPIAM)
ACPI: XSDT 7FF90100, 0054 (r1 KOZIRO FRONTIER  2000707 MSFT       97)
ACPI: FACP 7FF90290, 00F4 (r3 MSTEST OEMFACP   2000707 MSFT       97)
ACPI: DSDT 7FF905C0, 8FA9 (r1  A0637 A0637000        0 INTL 20060113)
ACPI: FACS 7FF9E000, 0040
ACPI: APIC 7FF90390, 006C (r1 MSTEST OEMAPIC   2000707 MSFT       97)
ACPI: MCFG 7FF90400, 003C (r1 MSTEST OEMMCFG   2000707 MSFT       97)
ACPI: SLIC 7FF90440, 0176 (r1 KOZIRO FRONTIER  2000707 MSFT       97)
ACPI: OEMB 7FF9E040, 007B (r1 MSTEST AMI_OEM   2000707 MSFT       97)
ACPI: HPET 7FF99570, 0038 (r1 MSTEST OEMHPET   2000707 MSFT       97)
Entering add_active_range(0, 0, 159) 0 entries of 256 used
Entering add_active_range(0, 256, 524176) 1 entries of 256 used
Entering add_active_range(0, 1048576, 1572864) 2 entries of 256 used
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1572864
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0:        0 ->      159
    0:      256 ->   524176
    0:  1048576 ->  1572864
On node 0 totalpages: 1048367
  DMA zone: 56 pages used for memmap
  DMA zone: 2000 pages reserved
  DMA zone: 1943 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 505800 pages, LIFO batch:31
  Normal zone: 7168 pages used for memmap
  Normal zone: 517120 pages, LIFO batch:31
  Movable zone: 0 pages used for memmap
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
ACPI: HPET id: 0x8086a202 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7ee00000)
PERCPU: Allocating 29912 bytes of per cpu data
Built 1 zonelists in Zone order.  Total pages: 1024863
Kernel command line: root=/dev/sda2 netconsole=@192.168.0.1/eth0,9353@192.168.0.42/00:1b:38:af:22:49 ignore_loglevel
netconsole: local port 6665
netconsole: local IP 192.168.0.1
netconsole: interface eth0
netconsole: remote port 9353
netconsole: remote IP 192.168.0.42
netconsole: remote ethernet address 00:1b:38:af:22:49
debug: ignoring loglevel setting.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Extended CMOS year: 2000
time.c: Detected 2135.040 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:    8
... MAX_LOCK_DEPTH:          30
... MAX_LOCKDEP_KEYS:        2048
... CLASSHASH_SIZE:           1024
... MAX_LOCKDEP_ENTRIES:     8192
... MAX_LOCKDEP_CHAINS:      16384
... CHAINHASH_SIZE:          8192
 memory used by lock dependency info: 1648 kB
 per task-struct memory footprint: 1680 bytes
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Checking aperture...
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Placing software IO TLB between 0x161b000 - 0x561b000
Memory: 4026652k/6291456k available (2330k kernel code, 166608k reserved, 1300k data, 200k init)
SLUB: Genslabs=22, HWalign=64, Order=0-1, MinObjects=4, CPUs=2, Nodes=1
Calibrating delay using timer specific routine.. 4273.23 BogoMIPS (lpj=2136619)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
using mwait in idle threads.
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
Freeing SMP alternatives: 19k freed
ACPI: Core revision 20070126
Using local APIC timer interrupts.
result 16679993
Detected 16.679 MHz APIC timer.
lockdep: not fixing up alternatives.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 4270.09 BogoMIPS (lpj=2135045)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU1: Thermal monitoring enabled (TM2)
Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz stepping 02
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
No dock devices found.
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
PCI quirk: region 0480-04bf claimed by ICH6 GPIO
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P7._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P8._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs *3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 10 11 12 *14 15)
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] -  08, should be 03 [20070126]
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 15 devices
ACPI: ACPI bus type pnp unregistered
SCSI subsystem initialized
libata version 2.21 loaded.
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
PCI-GART: No AMD northbridge found.
Time: tsc clocksource has been installed.
pnp: 00:01: iomem range 0xfed14000-0xfed19fff has been reserved
pnp: 00:07: ioport range 0x290-0x297 has been reserved
pnp: 00:08: iomem range 0xffafe000-0xffb0cbff could not be reserved
pnp: 00:08: iomem range 0xffb00000-0xffbfffff could not be reserved
pnp: 00:08: iomem range 0xfed1c000-0xfed1ffff has been reserved
pnp: 00:08: iomem range 0xfed20000-0xfed8ffff has been reserved
pnp: 00:0b: iomem range 0xfec00000-0xfec00fff has been reserved
pnp: 00:0b: iomem range 0xfee00000-0xfee00fff could not be reserved
pnp: 00:0d: iomem range 0xe0000000-0xefffffff has been reserved
pnp: 00:0e: iomem range 0x0-0x9ffff could not be reserved
pnp: 00:0e: iomem range 0xc0000-0xcffff has been reserved
pnp: 00:0e: iomem range 0xe0000-0xfffff could not be reserved
pnp: 00:0e: iomem range 0x100000-0x7fffffff could not be reserved
PCI: Bridge: 0000:00:01.0
  IO window: 9000-9fff
  MEM window: f8700000-fe7fffff
  PREFETCH window: bfe00000-dfdfffff
PCI: Bridge: 0000:00:1c.0
  IO window: disabled.
  MEM window: disabled.
  PREFETCH window: dfe00000-dfefffff
PCI: Bridge: 0000:00:1c.3
  IO window: disabled.
  MEM window: fe900000-fe9fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.4
  IO window: a000-afff
  MEM window: fe800000-fe8fffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
  IO window: b000-bfff
  MEM window: fea00000-feafffff
  PREFETCH window: 88000000-880fffff
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:01.0 to 64
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1c.3 to 64
ACPI: PCI Interrupt 0000:00:1c.4[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1c.4 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 65536 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
TCP: Hash tables configured (established 65536 bind 65536)
TCP reno registered
io scheduler noop registered
io scheduler cfq registered (default)
Boot video device is 0000:01:00.0
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.102
ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:03:00.0 to 64
atl1 0000:03:00.0: version 2.0.7
8139too Fast Ethernet driver 0.9.28
ACPI: PCI Interrupt 0000:05:02.0[A] -> GSI 23 (level, low) -> IRQ 23
eth1: RealTek RTL8139 at 0xb800, 00:80:48:2e:06:2e, IRQ 23
eth1:  Identified 8139 chip type 'RTL-8100B/8139D'
netconsole: device eth0 not up yet, forcing it
atl1 0000:03:00.0: eth0 link is up 1000 Mbps full duplex
console [netcon0] enabled
netconsole: network logging started
ahci 0000:02:00.0: version 2.3
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
ahci 0000:02:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
ahci 0000:02:00.0: flags: 64bit ncq pm led clo pmp pio slum part 
PCI: Setting latency timer of device 0000:02:00.0 to 64
scsi0 : ahci
scsi1 : ahci
ata1: SATA max UDMA/133 cmd 0xffffc20000024100 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 16
ata2: SATA max UDMA/133 cmd 0xffffc20000024180 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 16
ata1: SATA link down (SStatus 0 SControl 300)
ata2: SATA link down (SStatus 0 SControl 300)
ata_piix 0000:00:1f.2: version 2.12
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi2 : ata_piix
scsi3 : ata_piix
ata3: SATA max UDMA/133 cmd 0x000000000001ec00 ctl 0x000000000001e882 bmdma 0x000000000001e400 irq 19
ata4: SATA max UDMA/133 cmd 0x000000000001e800 ctl 0x000000000001e482 bmdma 0x000000000001e408 irq 19
ata3.00: ATA-8: ST3750330AS, SD15, max UDMA/133
ata3.00: 1465149168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.01: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
ata3.01: 312581808 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata3.00: configured for UDMA/133
ata3.01: configured for UDMA/133
ata4.00: ATA-7: ST3250620AS, 3.AAE, max UDMA/133
ata4.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access     ATA      ST3750330AS      SD15 PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 1465149168 512-byte hardware sectors (750156 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
scsi 2:0:1:0: Direct-Access     ATA      ST3160811AS      3.AA PQ: 0 ANSI: 5
sd 2:0:1:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:1:0: [sdb] Write Protect is off
sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:1:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 2:0:1:0: [sdb] Write Protect is off
sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: unknown partition table
sd 2:0:1:0: [sdb] Attached SCSI disk
scsi 3:0:0:0: Direct-Access     ATA      ST3250620AS      3.AA PQ: 0 ANSI: 5
sd 3:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 3:0:0:0: [sdc] Write Protect is off
sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdc] 488397168 512-byte hardware sectors (250059 MB)
sd 3:0:0:0: [sdc] Write Protect is off
sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
sd 3:0:0:0: [sdc] Attached SCSI disk
ata_piix 0000:00:1f.5: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.5[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1f.5 to 64
scsi4 : ata_piix
scsi5 : ata_piix
ata5: SATA max UDMA/133 cmd 0x000000000001d400 ctl 0x000000000001d082 bmdma 0x000000000001c880 irq 19
ata6: SATA max UDMA/133 cmd 0x000000000001d000 ctl 0x000000000001cc02 bmdma 0x000000000001c888 irq 19
ACPI: PCI Interrupt 0000:02:00.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:02:00.1 to 64
scsi6 : pata_jmicron
scsi7 : pata_jmicron
ata7: PATA max UDMA/100 cmd 0x000000000001ac00 ctl 0x000000000001a882 bmdma 0x000000000001a400 irq 17
ata8: PATA max UDMA/100 cmd 0x000000000001a800 ctl 0x000000000001a482 bmdma 0x000000000001a408 irq 17
ata7.01: ATAPI: Optiarc DVD RW AD-7173A, 1-01, max UDMA/66
ata7.01: configured for UDMA/66
scsi 6:0:1:0: CD-ROM            Optiarc  DVD RW AD-7173A  1-01 PQ: 0 ANSI: 5
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
Advanced Linux Sound Architecture Driver Version 1.0.14 (Fri Jul 20 09:12:58 2007 UTC).
ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:1b.0 to 64
input: AT Translated Set 2 keyboard as /class/input/input0
ALSA device list:
  #0: HDA Intel at 0xfebf8000 irq 22
TCP cubic registered
NET: Registered protocol family 1
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 200k freed
Write protecting the kernel read-only data: 3216k
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
ACPI: PCI Interrupt 0000:00:1a.7[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1a.7 to 64
ehci_hcd 0000:00:1a.7: EHCI Host Controller
ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1a.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1a.7
ehci_hcd 0000:00:1a.7: irq 18, io mem 0xfebffc00
USB Universal Host Controller Interface driver v3.0
ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 6:0:1:0: Attached scsi CD-ROM sr0
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 23 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 32 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 23, io mem 0xfebff800
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
ACPI: PCI Interrupt 0000:00:1a.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1a.0 to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1a.0: irq 16, io base 0x0000dc00
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1a.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1a.1 to 64
uhci_hcd 0000:00:1a.1: UHCI Host Controller
uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1a.1: irq 17, io base 0x0000e000
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 23 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.0: irq 23, io base 0x0000d480
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 19
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 6
uhci_hcd 0000:00:1d.1: irq 19, io base 0x0000d800
usb usb6: configuration #1 chosen from 1 choice
hub 6-0:1.0: USB hub found
hub 6-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 7
uhci_hcd 0000:00:1d.2: irq 18, io base 0x0000d880
usb usb7: configuration #1 chosen from 1 choice
hub 7-0:1.0: USB hub found
hub 7-0:1.0: 2 ports detected
EXT3 FS on sda2, internal journal
usbcore: registered new interface driver usblp
Adding 9775512k swap on /dev/sda1.  Priority:-1 extents:1 across:9775512k
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
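
A note on the memory layout in the log above: the e820 map shows one usable
range above the 4 GiB boundary, and the kernel reports that SWIOTLB bounce
buffering is in use. A device restricted to 32-bit DMA addresses cannot reach
that upper range directly. The sketch below only does the arithmetic on the
usable ranges from the log; it assumes 4 KiB pages, and the names USABLE and
split_by_mask are made up for this example.

```python
PAGE = 4096            # assumption: 4 KiB pages
DMA32_LIMIT = 1 << 32  # first address a 32-bit DMA mask cannot express

# "usable" ranges copied from the BIOS-e820 lines in the dmesg above.
USABLE = [
    (0x0000000000000000, 0x000000000009FC00),
    (0x0000000000100000, 0x000000007FF90000),
    (0x0000000100000000, 0x0000000180000000),
]


def whole_pages(start, end):
    # Count only pages that lie entirely inside [start, end).
    first = -(-start // PAGE)  # round start up to a page boundary
    last = end // PAGE         # round end down to a page boundary
    return max(0, last - first)


def split_by_mask(ranges, limit=DMA32_LIMIT):
    below = above = 0
    for start, end in ranges:
        if start < limit:
            below += whole_pages(start, min(end, limit))
        if end > limit:
            above += whole_pages(max(start, limit), end)
    return below, above


if __name__ == "__main__":
    below, above = split_by_mask(USABLE)
    print(f"pages below 4 GiB: {below}")
    print(f"pages above 4 GiB: {above} ({above * PAGE // 2**20} MiB)")
    print(f"total pages:       {below + above}")
```

The total agrees with the "On node 0 totalpages: 1048367" line, and the pages
above 4 GiB correspond to the Normal zone PFN range 1048576 -> 1572864, that
is, 2048 MiB that a 32-bit-only device would have to reach via bounce buffers.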



* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-19 14:45               ` atl1 64-bit => 32-bit DMA borkage (reproducible, bisected) Alexey Dobriyan
@ 2008-04-20  2:54                 ` Jay Cliburn
  2008-04-20 11:14                   ` Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Jay Cliburn @ 2008-04-20  2:54 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Sat, 19 Apr 2008 18:45:35 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> OK, nailed it.
> 
> It's commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 aka "atl1:
> disable broken 64-bit DMA".
> 
> With this commit in tree, I can reproduce either
> a) kmalloc-2048 corruption after initscripts shutdown eth0
> 	http://marc.info/?l=linux-kernel&m=120820360221261&w=2
> 
> b) or oopses at filp_close() first reported long ago
> 	(sorry, can't find that email)
> 
> c) or hard hang after initscripts shutdown eth0 with even SysRq not
> working. http://marc.info/?l=linux-kernel&m=120795046008115&w=2
> 
> I have two boxes one with atl1, 4G RAM with 2G remapped after 4G
> boundary, another with r8169 connected with just ethernet cable. NICs
> agree on 1Gbps speed.
> 
> So, it's enough to scp 200 MB git archive and immediately start
> rebooting sequence for horrors described above to appear. It's not
> 100% reproducible but more like 90%.

Do I understand correctly that these failures occur only while the
network interface is going down?

Jay

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20 11:14                   ` Alexey Dobriyan
@ 2008-04-20 11:06                     ` Jay Cliburn
  2008-04-20 12:26                       ` Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Jay Cliburn @ 2008-04-20 11:06 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Sun, 20 Apr 2008 15:14:53 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
> > On Sat, 19 Apr 2008 18:45:35 +0400
> > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > 
> > > OK, nailed it.
> > > 
> > > It's commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 aka "atl1:
> > > disable broken 64-bit DMA".
> > > 
> > > With this commit in tree, I can reproduce either
> > > a) kmalloc-2048 corruption after initscripts shutdown eth0
> > > 	http://marc.info/?l=linux-kernel&m=120820360221261&w=2
> > > 
> > > b) or oopses at filp_close() first reported long ago
> > > 	(sorry, can't find that email)
> > > 
> > > c) or hard hang after initscripts shutdown eth0 with even SysRq
> > > not working.
> > > http://marc.info/?l=linux-kernel&m=120795046008115&w=2
> > > 
> > > I have two boxes one with atl1, 4G RAM with 2G remapped after 4G
> > > boundary, another with r8169 connected with just ethernet cable.
> > > NICs agree on 1Gbps speed.
> > > 
> > > So, it's enough to scp 200 MB git archive and immediately start
> > > rebooting sequence for horrors described above to appear. It's not
> > > 100% reproducible but more like 90%.
> > 
> > Do I understand correctly that these failures occur only while the
> > network interface is going down?
> 
> Yep. During up or running there were no problems with this card.
> 

One more question:  Does it happen whether or not you're using atl1 as
a netconsole?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20  2:54                 ` Jay Cliburn
@ 2008-04-20 11:14                   ` Alexey Dobriyan
  2008-04-20 11:06                     ` Jay Cliburn
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-20 11:14 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
> On Sat, 19 Apr 2008 18:45:35 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > OK, nailed it.
> > 
> > It's commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 aka "atl1:
> > disable broken 64-bit DMA".
> > 
> > With this commit in tree, I can reproduce either
> > a) kmalloc-2048 corruption after initscripts shutdown eth0
> > 	http://marc.info/?l=linux-kernel&m=120820360221261&w=2
> > 
> > b) or oopses at filp_close() first reported long ago
> > 	(sorry, can't find that email)
> > 
> > c) or hard hang after initscripts shutdown eth0 with even SysRq not
> > working. http://marc.info/?l=linux-kernel&m=120795046008115&w=2
> > 
> > I have two boxes one with atl1, 4G RAM with 2G remapped after 4G
> > boundary, another with r8169 connected with just ethernet cable. NICs
> > agree on 1Gbps speed.
> > 
> > So, it's enough to scp 200 MB git archive and immediately start
> > rebooting sequence for horrors described above to appear. It's not
> > 100% reproducible but more like 90%.
> 
> Do I understand correctly that these failures occur only while the
> network interface is going down?

Yep. During up or running there were no problems with this card.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20 11:06                     ` Jay Cliburn
@ 2008-04-20 12:26                       ` Alexey Dobriyan
  2008-04-20 18:37                         ` Jay Cliburn
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-20 12:26 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Sun, Apr 20, 2008 at 06:06:07AM -0500, Jay Cliburn wrote:
> On Sun, 20 Apr 2008 15:14:53 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
> > > On Sat, 19 Apr 2008 18:45:35 +0400
> > > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > > 
> > > > OK, nailed it.
> > > > 
> > > > It's commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4 aka "atl1:
> > > > disable broken 64-bit DMA".
> > > > 
> > > > With this commit in tree, I can reproduce either
> > > > a) kmalloc-2048 corruption after initscripts shutdown eth0
> > > > 	http://marc.info/?l=linux-kernel&m=120820360221261&w=2
> > > > 
> > > > b) or oopses at filp_close() first reported long ago
> > > > 	(sorry, can't find that email)
> > > > 
> > > > c) or hard hang after initscripts shutdown eth0 with even SysRq
> > > > not working.
> > > > http://marc.info/?l=linux-kernel&m=120795046008115&w=2
> > > > 
> > > > I have two boxes one with atl1, 4G RAM with 2G remapped after 4G
> > > > boundary, another with r8169 connected with just ethernet cable.
> > > > NICs agree on 1Gbps speed.
> > > > 
> > > > So, it's enough to scp 200 MB git archive and immediately start
> > > > rebooting sequence for horrors described above to appear. It's not
> > > > 100% reproducible but more like 90%.
> > > 
> > > Do I understand correctly that these failures occur only while the
> > > network interface is going down?
> > 
> > Yep. During up or running there were no problems with this card.
> > 
> 
> One more question:  Does it happen whether or not you're using atl1 as
> a netconsole?

The bugs happen without netconsole too.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20 12:26                       ` Alexey Dobriyan
@ 2008-04-20 18:37                         ` Jay Cliburn
  2008-04-20 20:55                           ` Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Jay Cliburn @ 2008-04-20 18:37 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Sun, 20 Apr 2008 16:26:31 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Sun, Apr 20, 2008 at 06:06:07AM -0500, Jay Cliburn wrote:
> > On Sun, 20 Apr 2008 15:14:53 +0400
> > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > 
> > > On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
> > > > On Sat, 19 Apr 2008 18:45:35 +0400
> > > > Alexey Dobriyan <adobriyan@gmail.com> wrote:
[...]
> > > > > So, it's enough to scp 200 MB git archive and immediately
> > > > > start rebooting sequence for horrors described above to
> > > > > appear. It's not 100% reproducible but more like 90%.
> > > > 
> > > > Do I understand correctly that these failures occur only while
> > > > the network interface is going down?
> > > 
> > > Yep. During up or running there were no problems with this card.
> > > 
> > 
> > One more question:  Does it happen whether or not you're using atl1
> > as a netconsole?
> 
> The bugs happen without netconsole too.
> 

I can't duplicate this error, but it's probably because my machine
doesn't have 4GB of memory.

I have one report in February 2008 of another user encountering strange
oopses in 2.6.23.12 and 2.6.24 whenever he downed the interface.  I
suspect your experience is a repeat of that.

Just to be clear, you transfer about 200MB to the NIC (Rx direction),
then immediately reboot, right?  Can you duplicate the problem if you
simply ifconfig down instead of rebooting after the transfer?  

Thanks for your help.

Jay

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20 18:37                         ` Jay Cliburn
@ 2008-04-20 20:55                           ` Alexey Dobriyan
  2008-04-21 18:42                             ` Chris Snook
  2008-04-22  2:08                             ` Jay Cliburn
  0 siblings, 2 replies; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-20 20:55 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Sun, Apr 20, 2008 at 01:37:04PM -0500, Jay Cliburn wrote:
> On Sun, 20 Apr 2008 16:26:31 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Sun, Apr 20, 2008 at 06:06:07AM -0500, Jay Cliburn wrote:
> > > On Sun, 20 Apr 2008 15:14:53 +0400
> > > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > > 
> > > > On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
> > > > > On Sat, 19 Apr 2008 18:45:35 +0400
> > > > > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> [...]
> > > > > > So, it's enough to scp 200 MB git archive and immediately
> > > > > > start rebooting sequence for horrors described above to
> > > > > > appear. It's not 100% reproducible but more like 90%.
> > > > > 
> > > > > Do I understand correctly that these failures occur only while
> > > > > the network interface is going down?
> > > > 
> > > > Yep. During up or running there were no problems with this card.
> > > > 
> > > 
> > > One more question:  Does it happen whether or not you're using atl1
> > > as a netconsole?
> > 
> > The bugs happen without netconsole too.
> > 
> 
> I can't duplicate this error, but it's probably because my machine
> doesn't have 4GB of memory.
> 
> I have one report in February 2008 of another user encountering strange
> oopses in 2.6.23.12 and 2.6.24 whenever he downed the interface.  I
> suspect your experience is a repeat of that.
> 
> Just to be clear, you transfer about 200MB to the NIC (Rx direction),
> then immediately reboot, right?

Yup!

> Can you duplicate the problem if you
> simply ifconfig down instead of rebooting after the transfer?  

Aha, ifconfig down is enough. Here is what the reproducer looks like now:

	./sync-linux-linus && ssh core2 "sudo /sbin/ifconfig eth0 down"

where the first script is basically scp(1).

Also, booting with 1G or 2G of RAM (mem=1024m) makes the issue go away.

A printk at dev_close() time shows that NETIF_F_HIGHDMA was somehow not
enabled.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20 20:55                           ` Alexey Dobriyan
@ 2008-04-21 18:42                             ` Chris Snook
  2008-04-21 19:56                               ` Alexey Dobriyan
  2008-04-22  2:08                             ` Jay Cliburn
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Snook @ 2008-04-21 18:42 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Jay Cliburn, Luca Tettamanti, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

Alexey Dobriyan wrote:
> On Sun, Apr 20, 2008 at 01:37:04PM -0500, Jay Cliburn wrote:
>> On Sun, 20 Apr 2008 16:26:31 +0400
>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>
>>> On Sun, Apr 20, 2008 at 06:06:07AM -0500, Jay Cliburn wrote:
>>>> On Sun, 20 Apr 2008 15:14:53 +0400
>>>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>>>
>>>>> On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
>>>>>> On Sat, 19 Apr 2008 18:45:35 +0400
>>>>>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>> [...]
>>>>>>> So, it's enough to scp 200 MB git archive and immediately
>>>>>>> start rebooting sequence for horrors described above to
>>>>>>> appear. It's not 100% reproducible but more like 90%.
>>>>>> Do I understand correctly that these failures occur only while
>>>>>> the network interface is going down?
>>>>> Yep. During up or running there were no problems with this card.
>>>>>
>>>> One more question:  Does it happen whether or not you're using atl1
>>>> as a netconsole?
>>> The bugs happen without netconsole too.
>>>
>> I can't duplicate this error, but it's probably because my machine
>> doesn't have 4GB of memory.
>>
>> I have one report in February 2008 of another user encountering strange
>> oopses in 2.6.23.12 and 2.6.24 whenever he downed the interface.  I
>> suspect your experience is a repeat of that.
>>
>> Just to be clear, you transfer about 200MB to the NIC (Rx direction),
>> then immediately reboot, right?
> 
> Yup!
> 
>> Can you duplicate the problem if you
>> simply ifconfig down instead of rebooting after the transfer?  
> 
> Aha, ifconfig down is enough. Here is what the reproducer looks like now:
> 
> 	./sync-linux-linus && ssh core2 "sudo /sbin/ifconfig eth0 down"
> 
> where the first script is basically scp(1).
> 
> Also, booting with 1G or 2G of RAM (mem=1024m) makes the issue go away.
> 
> A printk at dev_close() time shows that NETIF_F_HIGHDMA was somehow not
> enabled.
> 

Does the problem go away with iommu=nomerge?  If so, I suspect we're not 
properly flushing an iowrite somewhere.

-- Chris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-21 18:42                             ` Chris Snook
@ 2008-04-21 19:56                               ` Alexey Dobriyan
  0 siblings, 0 replies; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-21 19:56 UTC (permalink / raw)
  To: Chris Snook
  Cc: Jay Cliburn, Luca Tettamanti, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Mon, Apr 21, 2008 at 02:42:42PM -0400, Chris Snook wrote:
> Alexey Dobriyan wrote:
>> On Sun, Apr 20, 2008 at 01:37:04PM -0500, Jay Cliburn wrote:
>>> On Sun, 20 Apr 2008 16:26:31 +0400
>>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>>
>>>> On Sun, Apr 20, 2008 at 06:06:07AM -0500, Jay Cliburn wrote:
>>>>> On Sun, 20 Apr 2008 15:14:53 +0400
>>>>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>>>>
>>>>>> On Sat, Apr 19, 2008 at 09:54:44PM -0500, Jay Cliburn wrote:
>>>>>>> On Sat, 19 Apr 2008 18:45:35 +0400
>>>>>>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>> [...]
>>>>>>>> So, it's enough to scp 200 MB git archive and immediately
>>>>>>>> start rebooting sequence for horrors described above to
>>>>>>>> appear. It's not 100% reproducible but more like 90%.
>>>>>>> Do I understand correctly that these failures occur only while
>>>>>>> the network interface is going down?
>>>>>> Yep. During up or running there were no problems with this card.
>>>>>>
>>>>> One more question:  Does it happen whether or not you're using atl1
>>>>> as a netconsole?
>>>> The bugs happen without netconsole too.
>>>>
>>> I can't duplicate this error, but it's probably because my machine
>>> doesn't have 4GB of memory.
>>>
>>> I have one report in February 2008 of another user encountering strange
>>> oopses in 2.6.23.12 and 2.6.24 whenever he downed the interface.  I
>>> suspect your experience is a repeat of that.
>>>
>>> Just to be clear, you transfer about 200MB to the NIC (Rx direction),
>>> then immediately reboot, right?
>> Yup!
>>> Can you duplicate the problem if you
>>> simply ifconfig down instead of rebooting after the transfer?  
>> Aha, ifconfig down is enough. Here is what the reproducer looks like now:
>> 	./sync-linux-linus && ssh core2 "sudo /sbin/ifconfig eth0 down"
>> where the first script is basically scp(1).
>> Also, booting with 1G or 2G of RAM (mem=1024m) makes the issue go away.
>> A printk at dev_close() time shows that NETIF_F_HIGHDMA was somehow not
>> enabled.
>
> Does the problem go away with iommu=nomerge?  If so, I suspect we're not 
> properly flushing an iowrite somewhere.

nomerge doesn't help.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-20 20:55                           ` Alexey Dobriyan
  2008-04-21 18:42                             ` Chris Snook
@ 2008-04-22  2:08                             ` Jay Cliburn
  2008-04-22 19:02                               ` Alexey Dobriyan
  2008-04-26  0:57                               ` Jay Cliburn
  1 sibling, 2 replies; 26+ messages in thread
From: Jay Cliburn @ 2008-04-22  2:08 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Mon, 21 Apr 2008 00:55:00 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> Aha, ifconfig down is enough. Here is what the reproducer looks like now:
> 
> 	./sync-linux-linus && ssh core2 "sudo /sbin/ifconfig eth0 down"
> 
> where the first script is basically scp(1).
> 
> Also, booting with 1G or 2G of RAM (mem=1024m) makes the issue go away.
> 
> A printk at dev_close() time shows that NETIF_F_HIGHDMA was somehow not
> enabled.
> 

Alexey, can you please try this (very minimally tested) patch?

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 5586fc6..07fe5c0 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -1115,9 +1115,6 @@ static void atl1_free_ring_resources(struct atl1_adapter *adapter)
 	struct atl1_rrd_ring *rrd_ring = &adapter->rrd_ring;
 	struct atl1_ring_header *ring_header = &adapter->ring_header;
 
-	atl1_clean_tx_ring(adapter);
-	atl1_clean_rx_ring(adapter);
-
 	kfree(tpd_ring->buffer_info);
 	pci_free_consistent(pdev, ring_header->size, ring_header->desc,
 		ring_header->dma);
@@ -3423,6 +3420,8 @@ static int atl1_set_ringparam(struct net_device *netdev,
 		adapter->rrd_ring = rrd_old;
 		adapter->tpd_ring = tpd_old;
 		adapter->ring_header = rhdr_old;
+		atl1_clean_tx_ring(adapter);
+		atl1_clean_rx_ring(adapter);
 		atl1_free_ring_resources(adapter);
 		adapter->rfd_ring = rfd_new;
 		adapter->rrd_ring = rrd_new;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-22  2:08                             ` Jay Cliburn
@ 2008-04-22 19:02                               ` Alexey Dobriyan
  2008-04-26  0:57                               ` Jay Cliburn
  1 sibling, 0 replies; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-22 19:02 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 4683 bytes --]

On Mon, Apr 21, 2008 at 09:08:21PM -0500, Jay Cliburn wrote:
> On Mon, 21 Apr 2008 00:55:00 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > Aha, ifconfig down is enough. Here is what the reproducer looks like now:
> > 
> > 	./sync-linux-linus && ssh core2 "sudo /sbin/ifconfig eth0 down"
> > 
> > where the first script is basically scp(1).
> > 
> > Also, booting with 1G or 2G of RAM (mem=1024m) makes the issue go away.
> > 
> > A printk at dev_close() time shows that NETIF_F_HIGHDMA was somehow not
> > enabled.
> > 
> 
> Alexey, can you please try this (very minimally tested) patch?
> 
> diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
> index 5586fc6..07fe5c0 100644
> --- a/drivers/net/atlx/atl1.c
> +++ b/drivers/net/atlx/atl1.c
> @@ -1115,9 +1115,6 @@ static void atl1_free_ring_resources(struct atl1_adapter *adapter)
>  	struct atl1_rrd_ring *rrd_ring = &adapter->rrd_ring;
>  	struct atl1_ring_header *ring_header = &adapter->ring_header;
>  
> -	atl1_clean_tx_ring(adapter);
> -	atl1_clean_rx_ring(adapter);
> -
>  	kfree(tpd_ring->buffer_info);
>  	pci_free_consistent(pdev, ring_header->size, ring_header->desc,
>  		ring_header->dma);
> @@ -3423,6 +3420,8 @@ static int atl1_set_ringparam(struct net_device *netdev,
>  		adapter->rrd_ring = rrd_old;
>  		adapter->tpd_ring = tpd_old;
>  		adapter->ring_header = rhdr_old;
> +		atl1_clean_tx_ring(adapter);
> +		atl1_clean_rx_ring(adapter);
>  		atl1_free_ring_resources(adapter);

The patch doesn't help, unfortunately.

BTW, below is a clean corruption trace:


atl1 0000:03:00.0: eth0 link is up 1000 Mbps full duplex
=============================================================================
BUG kmalloc-2048: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xffff81017ed7a97a-0xffff81017ed7af71. First byte 0x0 instead of 0x6b
INFO: Allocated in dev_alloc_skb+0x18/0x30 age=23894 cpu=1 pid=30255
INFO: Freed in skb_release_data+0x7a/0xc0 age=20700 cpu=0 pid=0
INFO: Slab 0xffffe200053bf240 used=12 fp=0xffff81017ed7a968 flags=0x17c000000040c3
INFO: Object 0xffff81017ed7a968 @offset=10600 fp=0xffff81017ed7ca88

Bytes b4 0xffff81017ed7a958:  14 09 a7 01 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ..§.....ZZZZZZZZ
  Object 0xffff81017ed7a968:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff81017ed7a978:  6b 6b 00 18 f3 a2 9f 90 00 1b 38 af 22 49 08 00 kk..ó¢....8¯"I..
  Object 0xffff81017ed7a988:  45 10 00 4c a4 9f 40 00 40 11 d2 fe c0 a8 00 2a E..L¤.@.@.ÒþÀ¨.*
  Object 0xffff81017ed7a998:  59 6f a8 b1 9d e9 00 7b 00 38 58 29 23 00 00 00 Yo¨±.é.{.8X)#...
  Object 0xffff81017ed7a9a8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81017ed7a9b8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81017ed7a9c8:  00 00 00 00 1e 31 61 fa 08 5e 9a 73 de cf ce 94 .....1aú.^.sÞÏÎ.
  Object 0xffff81017ed7a9d8:  63 64 65 66 67 68 6a 69 6b 6c 6d 6e 6f 70 71 72 cdefghjiklmnopqr
 Redzone 0xffff81017ed7b168:  bb bb bb bb bb bb bb bb                         »»»»»»»»        
 Padding 0xffff81017ed7b1a8:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
Pid: 31677, comm: ifconfig Not tainted 2.6.25-3925e6fc1f774048404fdd910b0345b06c699eb4 #5

Call Trace:
 [<ffffffff80288277>] print_trailer+0xe7/0x170
 [<ffffffff802883a5>] check_bytes_and_report+0xa5/0xd0
 [<ffffffff80288678>] check_object+0xa8/0x250
 [<ffffffff80289975>] __slab_alloc+0x535/0x690
 [<ffffffff80253f3e>] ? mark_held_locks+0x3e/0x80
 [<ffffffff803f2fd8>] ? dev_alloc_skb+0x18/0x30
 [<ffffffff8028aff6>] __kmalloc_track_caller+0xe6/0x100
 [<ffffffff803f2fd8>] ? dev_alloc_skb+0x18/0x30
 [<ffffffff803f2b8f>] __alloc_skb+0x6f/0x160
 [<ffffffff803f2fd8>] dev_alloc_skb+0x18/0x30
 [<ffffffff8036512a>] atl1_alloc_rx_buffers+0x11a/0x260
 [<ffffffff80366dc7>] atl1_up+0x77/0x750
 [<ffffffff80367a0b>] atl1_open+0x3b/0x50
 [<ffffffff803fa3fa>] dev_open+0x5a/0x90
 [<ffffffff803f8ca9>] dev_change_flags+0x99/0x1b0
 [<ffffffff8043d1d2>] devinet_ioctl+0x592/0x740
 [<ffffffff803fa229>] ? dev_ioctl+0x479/0x550
 [<ffffffff8043d891>] inet_ioctl+0x61/0x80
 [<ffffffff803eaa16>] sock_ioctl+0x56/0x240
 [<ffffffff8029b271>] vfs_ioctl+0x31/0x90
 [<ffffffff8029b343>] do_vfs_ioctl+0x73/0x2d0
 [<ffffffff8029b5ea>] sys_ioctl+0x4a/0x80
 [<ffffffff8020b54b>] system_call_after_swapgs+0x7b/0x80

FIX kmalloc-2048: Restoring 0xffff81017ed7a97a-0xffff81017ed7af71=0x6b

FIX kmalloc-2048: Marking all objects used
atl1 0000:03:00.0: eth0 link is up 1000 Mbps full duplex
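
The "Poison overwritten" report above comes from SLUB checking that a freed
object still holds its poison pattern. A minimal user-space sketch of that
kind of check follows, assuming the usual poison values (0x6b fill, 0xa5 in
the last byte); the struct and function names are hypothetical and this is
not the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b  /* fill byte of a freed object, as seen in the dump */
#define POISON_END  0xa5  /* marker stored in the last byte of a freed object */

/* Hypothetical result type: offsets of the damaged span inside one object. */
struct poison_damage {
	size_t first;  /* first byte that no longer holds its poison value */
	size_t last;   /* last byte that no longer holds its poison value */
};

/* Fill a buffer the way a freed, poisoned object is expected to look. */
static void poison_object(uint8_t *obj, size_t size)
{
	assert(size > 0);
	memset(obj, POISON_FREE, size - 1);
	obj[size - 1] = POISON_END;
}

/* Return 1 and fill *out if any byte differs from its expected poison. */
static int find_poison_damage(const uint8_t *obj, size_t size,
			      struct poison_damage *out)
{
	int damaged = 0;

	for (size_t i = 0; i < size; i++) {
		uint8_t expect = (i == size - 1) ? POISON_END : POISON_FREE;

		if (obj[i] != expect) {
			if (!damaged) {
				out->first = i;
				damaged = 1;
			}
			out->last = i;
		}
	}
	return damaged;
}
```

Applied to the trace, the reported range 0xffff81017ed7a97a-0xffff81017ed7af71
maps to object offsets 0x12 through 0x609, i.e. 1528 bytes of a 2048-byte
object written after it was freed.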


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-22  2:08                             ` Jay Cliburn
  2008-04-22 19:02                               ` Alexey Dobriyan
@ 2008-04-26  0:57                               ` Jay Cliburn
  2008-04-28  6:42                                 ` Alexey Dobriyan
  2008-05-04 21:15                                 ` Alexey Dobriyan
  1 sibling, 2 replies; 26+ messages in thread
From: Jay Cliburn @ 2008-04-26  0:57 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Jay Cliburn, Luca Tettamanti, Chris Snook, Jeff Garzik,
	Pekka Enberg, Andrew Morton, linux-kernel, netdev,
	Christoph Lameter, torvalds

On Mon, 21 Apr 2008 21:08:21 -0500
Jay Cliburn <jacliburn@bellsouth.net> wrote:

> 
> Alexey, can you please try this (very minimally tested) patch?

Alexey, have you found time to try this patch yet?

Thanks.

> 
> diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
> index 5586fc6..07fe5c0 100644
> --- a/drivers/net/atlx/atl1.c
> +++ b/drivers/net/atlx/atl1.c
> @@ -1115,9 +1115,6 @@ static void atl1_free_ring_resources(struct atl1_adapter *adapter)
>  	struct atl1_rrd_ring *rrd_ring = &adapter->rrd_ring;
>  	struct atl1_ring_header *ring_header = &adapter->ring_header;
>  
> -	atl1_clean_tx_ring(adapter);
> -	atl1_clean_rx_ring(adapter);
> -
>  	kfree(tpd_ring->buffer_info);
>  	pci_free_consistent(pdev, ring_header->size, ring_header->desc,
>  		ring_header->dma);
> @@ -3423,6 +3420,8 @@ static int atl1_set_ringparam(struct net_device *netdev,
>  		adapter->rrd_ring = rrd_old;
>  		adapter->tpd_ring = tpd_old;
>  		adapter->ring_header = rhdr_old;
> +		atl1_clean_tx_ring(adapter);
> +		atl1_clean_rx_ring(adapter);
>  		atl1_free_ring_resources(adapter);
>  		adapter->rfd_ring = rfd_new;
>  		adapter->rrd_ring = rrd_new;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-26  0:57                               ` Jay Cliburn
@ 2008-04-28  6:42                                 ` Alexey Dobriyan
  2008-05-04 21:15                                 ` Alexey Dobriyan
  1 sibling, 0 replies; 26+ messages in thread
From: Alexey Dobriyan @ 2008-04-28  6:42 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Fri, Apr 25, 2008 at 07:57:43PM -0500, Jay Cliburn wrote:
> On Mon, 21 Apr 2008 21:08:21 -0500
> Jay Cliburn <jacliburn@bellsouth.net> wrote:
> 
> > 
> > Alexey, can you please try this (very minimally tested) patch?
> 
> Alexey, have you found time to try this patch yet?

I've tried it and it doesn't help.
http://marc.info/?l=linux-netdev&m=120888791230434&w=2

> > --- a/drivers/net/atlx/atl1.c
> > +++ b/drivers/net/atlx/atl1.c
> > @@ -1115,9 +1115,6 @@ static void atl1_free_ring_resources(struct atl1_adapter *adapter)
> >  	struct atl1_rrd_ring *rrd_ring = &adapter->rrd_ring;
> >  	struct atl1_ring_header *ring_header = &adapter->ring_header;
> >  
> > -	atl1_clean_tx_ring(adapter);
> > -	atl1_clean_rx_ring(adapter);
> > -
> >  	kfree(tpd_ring->buffer_info);
> >  	pci_free_consistent(pdev, ring_header->size, ring_header->desc,
> >  		ring_header->dma);
> > @@ -3423,6 +3420,8 @@ static int atl1_set_ringparam(struct net_device *netdev,
> >  		adapter->rrd_ring = rrd_old;
> >  		adapter->tpd_ring = tpd_old;
> >  		adapter->ring_header = rhdr_old;
> > +		atl1_clean_tx_ring(adapter);
> > +		atl1_clean_rx_ring(adapter);
> >  		atl1_free_ring_resources(adapter);
> >  		adapter->rfd_ring = rfd_new;
> >  		adapter->rrd_ring = rrd_new;


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-04-26  0:57                               ` Jay Cliburn
  2008-04-28  6:42                                 ` Alexey Dobriyan
@ 2008-05-04 21:15                                 ` Alexey Dobriyan
  2008-05-05  0:31                                   ` Jay Cliburn
  2008-05-06 16:02                                   ` Jay Cliburn
  1 sibling, 2 replies; 26+ messages in thread
From: Alexey Dobriyan @ 2008-05-04 21:15 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

Looking at how other netdevice drivers handle this:

8139too and others check netif_running() in the interrupt handler.

r8169 has a scary "50k$" question comment about irqs being disabled after
interacting with hardware.

But the r8169 case should be fixed by atlx_irq_disable()?

Writes to REG_IMR, REG_ISR are commented out in atl1_reset_hw(), why?
(I'll test that soon)

Do we have a theory why changing from a 64-bit DMA mask to a 32-bit mask
resurrects the bug? The NIC here never showed any sort of corruption
described in the commit which banned 64-bit DMA.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-04 21:15                                 ` Alexey Dobriyan
@ 2008-05-05  0:31                                   ` Jay Cliburn
  2008-05-05  0:34                                     ` Jay Cliburn
  2008-05-06 16:02                                   ` Jay Cliburn
  1 sibling, 1 reply; 26+ messages in thread
From: Jay Cliburn @ 2008-05-05  0:31 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Mon, 5 May 2008 01:15:07 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> Looking at how other netdevice drivers handle this:
> 
> 8139too and others check netif_running() in the interrupt handler.
> 
> r8169 has a scary "50k$" question comment about irqs being disabled after
> interacting with hardware.
> 
> But the r8169 case should be fixed by atlx_irq_disable()?

Agreed.

> 
> Writes to REG_IMR, REG_ISR are commented out in atl1_reset_hw(), why?

Came from the vendor that way.

> (I'll test that soon)
> 
> Do we have a theory why changing from a 64-bit DMA mask to a 32-bit mask
> resurrects the bug? The NIC here never showed any sort of corruption
> described in the commit which banned 64-bit DMA.

We had multiple reports of users who encountered repeated memory
corruption when transferring large files while running with a 64-bit DMA
mask.  Chris Snook noticed the upper 32 bits of the descriptor address
register are shared among five other registers, each containing the low
bits for one of five descriptors.  All the descriptors, therefore, have
to live within the same 4GB address space.
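
The constraint described above (one shared register for the upper 32 address
bits, separate registers for the low bits) can be sketched as a simple check.
This is an illustration only, assuming a 4GB window as stated here; the
struct and function names are hypothetical and do not come from the atl1
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one descriptor ring's DMA region. */
struct dma_region {
	uint64_t base;  /* bus address of the first byte */
	uint64_t len;   /* length in bytes, must be non-zero */
};

/*
 * Return 1 if every region starts and ends with the same upper 32 address
 * bits, i.e. all of them fit behind a single shared "high dword" register.
 */
static int share_upper_32(const struct dma_region *r, size_t n)
{
	uint32_t hi;

	assert(n > 0);
	hi = (uint32_t)(r[0].base >> 32);

	for (size_t i = 0; i < n; i++) {
		uint64_t last;

		assert(r[i].len > 0);
		last = r[i].base + r[i].len - 1;
		if ((uint32_t)(r[i].base >> 32) != hi ||
		    (uint32_t)(last >> 32) != hi)
			return 0;
	}
	return 1;
}
```

A set of rings that straddles a 4GB boundary fails this check even when each
ring on its own is reachable with 64-bit addressing.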

I'll keep poking at it as time permits through the week, but I probably
won't be able to devote a whole lot of time to it until next weekend.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-05  0:31                                   ` Jay Cliburn
@ 2008-05-05  0:34                                     ` Jay Cliburn
  0 siblings, 0 replies; 26+ messages in thread
From: Jay Cliburn @ 2008-05-05  0:34 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Alexey Dobriyan, Luca Tettamanti, Chris Snook, Jeff Garzik,
	Pekka Enberg, Andrew Morton, linux-kernel, netdev,
	Christoph Lameter, torvalds

On Sun, 4 May 2008 19:31:28 -0500
Jay Cliburn <jacliburn@bellsouth.net> wrote:

> All the descriptors, therefore, have to live within the same 4GB address
> space.

Make that "...within the same 2GB address space."

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-04 21:15                                 ` Alexey Dobriyan
  2008-05-05  0:31                                   ` Jay Cliburn
@ 2008-05-06 16:02                                   ` Jay Cliburn
  2008-05-09 19:51                                     ` Alexey Dobriyan
  1 sibling, 1 reply; 26+ messages in thread
From: Jay Cliburn @ 2008-05-06 16:02 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Mon, 5 May 2008 01:15:07 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> Looking at how other netdevice drivers handle this:
> 
> 8139too and others check netif_running() in the interrupt handler.
> 
> r8169 has scary "50k$" question comment re irqs disabled after
> interacting with hardware.
> 
> But the r8169 case should be fixed by atlx_irq_disable()?
> 
> Writes to REG_IMR, REG_ISR are commented in atl1_reset_hw(), why?
> (I'll test that soon)

I've tried all the stuff you mentioned above, and more, to prevent the
memory corruption, all to no avail.

I booted with mem=4000M and didn't hit the bug.  I diffed dmesg between
booting with mem=4000M and booting without it, and found that iommu
was being disabled when booting with full memory:

--- dmesg-4000.txt      2008-05-06 10:14:07.000000000 -0500
+++ dmesg-4096.txt      2008-05-06 10:09:19.000000000 -0500
@@ -1,5 +1,5 @@
 Linux version 2.6.26-rc1 (jcliburn@finch.hogchain.net) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-27)) #4 SMP Mon May 5 18:03:48 CDT 2008
-Command line: ro root=LABEL=/1 console=ttyS0,38400 console=tty0 slub_debug=FZPU mem=4000M
+Command line: ro root=LABEL=/1 console=ttyS0,38400 console=tty0 slub_debug=FZPU
[...]
+Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed
[...]

So I then booted with iommu=allowed.  No errors.  Can't hit the bug to
save my life.

Why would disabling iommu cause the atl1 driver to write over poisoned
memory?

Alexey, can you please try booting with iommu=allowed and see if you
avoid the problem?

Thanks,
Jay


* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-09 19:51                                     ` Alexey Dobriyan
@ 2008-05-09 18:56                                       ` Chris Snook
  2008-05-09 20:07                                         ` Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Snook @ 2008-05-09 18:56 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Jay Cliburn, Luca Tettamanti, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

Alexey Dobriyan wrote:
> Hmmm, there was a wonderful oops on interface stop here when the other end
> of atl1 cable was physically unplugged (but there was traffic before):
> 
> 	atl1_down
> 	atl1_clean_rx_ring
> 	swiotlb_unmap_single
> 	swiotlb_unmap_single_attrs
> 	memcpy_c
> 

Intel chip, or AMD?

-- Chris


* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-09 20:07                                         ` Alexey Dobriyan
@ 2008-05-09 19:38                                           ` Jay Cliburn
  2008-05-10 19:31                                             ` [PATCH] " Alexey Dobriyan
  0 siblings, 1 reply; 26+ messages in thread
From: Jay Cliburn @ 2008-05-09 19:38 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Chris Snook, Luca Tettamanti, Jeff Garzik, Andrew Morton,
	linux-kernel, netdev

[trimmed cc list slightly]

On Sat, 10 May 2008 00:07:15 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Fri, May 09, 2008 at 02:56:21PM -0400, Chris Snook wrote:
> > Alexey Dobriyan wrote:
> >> Hmmm, there was a wonderful oops on interface stop here when the
> >> other end of atl1 cable was physically unplugged (but there was
> >> traffic before):
> >> 	atl1_down
> >> 	atl1_clean_rx_ring
> >> 	swiotlb_unmap_single
> >> 	swiotlb_unmap_single_attrs
> >> 	memcpy_c
> >
> > Intel chip, or AMD?
> 
> Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
> Asus P5B-E motherboard.
> 

I see the same thing with a Socket AM2-based board (Asus M2V) with 4GB
RAM installed. The problem occurs only when SWIOTLB is active, which
happens automatically at boot (in arch/x86/kernel/pci-swiotlb.c) when
the page frame number exceeds 1048576 (corresponding to 2^32 bytes).
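
The arithmetic behind that threshold, as a small userspace sketch.
It assumes 4KB pages and models only the page-frame comparison
described above; the real boot-time check may consult other conditions
as well, and the names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumes 4KB pages */
/* First page frame at or above 2^32 bytes: 2^32 >> 12 = 1048576. */
#define DMA32_LIMIT_PFN	((UINT64_C(1) << 32) >> PAGE_SHIFT)

/* Convert a byte address (or memory size) to a page frame number. */
uint64_t bytes_to_pfn(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT;
}

/* Model of the condition only: memory exists above the 32-bit limit. */
bool bounce_buffers_needed(uint64_t highest_pfn)
{
	return highest_pfn > DMA32_LIMIT_PFN;
}
```

This matches the earlier mem=4000M observation: 4000MB ends at page
frame 1024000, below the limit, so no bounce buffers are set up.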

I thought for a while that the problem went away with iommu=allowed, but
I was wrong.

The bug appears to be a "simple" skb write-after-free that happens only
when bounce buffers are in use, but I'll be damned if I can find the
cause of it.

<continues looking>

=============================================================================
BUG kmalloc-2048: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xffff81010004297a-0xffff810100042f71. First byte 0x0 instead of 0x6b
INFO: Allocated in dev_alloc_skb+0x16/0x2c age=5813 cpu=0 pid=3029
INFO: Freed in skb_release_data+0xa8/0xad age=201 cpu=0 pid=0
INFO: Slab 0xffffe20005801600 objects=15 used=0 fp=0xffff810100045b18 flags=0x8000000000002082
INFO: Object 0xffff810100042968 @offset=10600 fp=0xffff8101000418d8

Bytes b4 0xffff810100042958:  aa 91 fd ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
  Object 0xffff810100042968:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
  Object 0xffff810100042978:  6b 6b 00 17 31 4e 9d 41 00 0f db bc af 14 08 00 kk..1N.A........
  Object 0xffff810100042988:  45 00 00 4e 87 5e 00 00 40 11 6e 82 c0 a8 01 fe E..N.^..@.n.....
  Object 0xffff810100042998:  c0 a8 01 70 00 89 00 89 00 3a 3b 67 00 09 00 00 ...p.....:;g....
  Object 0xffff8101000429a8:  00 01 00 00 00 00 00 00 20 43 4b 41 41 41 41 41 .........CKAAAAA
  Object 0xffff8101000429b8:  41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
  Object 0xffff8101000429c8:  41 41 41 41 41 41 41 41 41 00 00 21 00 01 f0 53 AAAAAAAAA..!...S
  Object 0xffff8101000429d8:  56 17 df 3e 3b 9f b7 1f 2d 29 f0 68 cf 4d 61 97 V..>;...-).h.Ma.
 Redzone 0xffff810100043168:  bb bb bb bb bb bb bb bb                         ........
 Padding 0xffff8101000431a8:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Pid: 3030, comm: ifconfig Not tainted 2.6.26-rc1 #3

Call Trace:
 [<ffffffff8108cf62>] print_trailer+0x123/0x12c
 [<ffffffff8108d00f>] check_bytes_and_report+0xa4/0xcb
 [<ffffffff8108d33e>] check_object+0xca/0x212
 [<ffffffff8108d6cd>] __free_slab+0x85/0xfd
 [<ffffffff811e5dd3>] ? skb_release_data+0xa8/0xad
 [<ffffffff8108d77d>] discard_slab+0x38/0x3a
 [<ffffffff8108e172>] __slab_free+0xdb/0x2ac
 [<ffffffff8108e47a>] kfree+0xbc/0xcb
 [<ffffffff811e5dd3>] ? skb_release_data+0xa8/0xad
 [<ffffffff811e5dd3>] skb_release_data+0xa8/0xad
 [<ffffffff811e6494>] skb_release_all+0xc9/0xce
 [<ffffffff811e5c2e>] __kfree_skb+0x11/0x78
 [<ffffffff811e5cbc>] kfree_skb+0x27/0x29
 [<ffffffffa00cc3aa>] :atl1:atl1_clean_rx_ring+0x7e/0xe2
 [<ffffffffa00cc4d7>] :atl1:atl1_down+0xc9/0xce
 [<ffffffffa00cedcd>] :atl1:atl1_close+0x18/0x27
 [<ffffffff811ebe2d>] dev_close+0x57/0x72
 [<ffffffff811ebb31>] dev_change_flags+0xa8/0x164
 [<ffffffff8122f44c>] devinet_ioctl+0x26a/0x5f6
 [<ffffffff8122fc79>] inet_ioctl+0x92/0xaa
 [<ffffffff811df6d4>] sock_ioctl+0x1da/0x202
 [<ffffffff8109f252>] vfs_ioctl+0x2a/0x77
 [<ffffffff8109f501>] do_vfs_ioctl+0x262/0x27f
 [<ffffffff8109f575>] sys_ioctl+0x57/0x7a
 [<ffffffff8100bff7>] tracesys+0xd5/0xda

FIX kmalloc-2048: Restoring 0xffff81010004297a-0xffff810100042f71=0x6b
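
The bytes that replaced the 0x6b poison look like a received frame.
A minimal sketch that decodes the first 42 overwritten bytes, copied
from the object dump above starting at 0xffff81010004297a. It assumes
Ethernet II carrying IPv4 and UDP, and does no validation; the struct
and function names are hypothetical.

```c
#include <stdint.h>

/* Fields pulled out of the frame; layout is for illustration only. */
struct udp_frame_info {
	uint16_t ethertype;
	uint8_t  ip_proto;
	uint16_t ip_total_len;
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Bytes from the dump, beginning right after the two leading 6b bytes. */
static const uint8_t dump_frame[42] = {
	0x00, 0x17, 0x31, 0x4e, 0x9d, 0x41,		/* destination MAC */
	0x00, 0x0f, 0xdb, 0xbc, 0xaf, 0x14,		/* source MAC */
	0x08, 0x00,					/* ethertype */
	0x45, 0x00, 0x00, 0x4e, 0x87, 0x5e, 0x00, 0x00,	/* IPv4 header */
	0x40, 0x11, 0x6e, 0x82, 0xc0, 0xa8, 0x01, 0xfe,
	0xc0, 0xa8, 0x01, 0x70,
	0x00, 0x89, 0x00, 0x89, 0x00, 0x3a, 0x3b, 0x67,	/* UDP header */
};

static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumes an untagged Ethernet II frame carrying IPv4 + UDP. */
struct udp_frame_info decode_frame(const uint8_t *f)
{
	struct udp_frame_info info;
	const uint8_t *ip = f + 14;			/* after Ethernet header */
	const uint8_t *udp = ip + (ip[0] & 0x0f) * 4;	/* IHL in 32-bit words */

	info.ethertype = be16(f + 12);
	info.ip_total_len = be16(ip + 2);
	info.ip_proto = ip[9];
	info.src_ip = be32(ip + 12);
	info.dst_ip = be32(ip + 16);
	info.src_port = be16(udp);
	info.dst_port = be16(udp + 2);
	return info;
}
```

Decoded this way the data is a UDP datagram from 192.168.1.254 to
192.168.1.112, port 137 to port 137, which is consistent with receive
data being copied into an skb buffer after it was freed.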


* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-06 16:02                                   ` Jay Cliburn
@ 2008-05-09 19:51                                     ` Alexey Dobriyan
  2008-05-09 18:56                                       ` Chris Snook
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-05-09 19:51 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Luca Tettamanti, Chris Snook, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

Hmmm, there was a wonderful oops on interface stop here when the other end
of atl1 cable was physically unplugged (but there was traffic before):

	atl1_down
	atl1_clean_rx_ring
	swiotlb_unmap_single
	swiotlb_unmap_single_attrs
	memcpy_c



* Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-09 18:56                                       ` Chris Snook
@ 2008-05-09 20:07                                         ` Alexey Dobriyan
  2008-05-09 19:38                                           ` Jay Cliburn
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-05-09 20:07 UTC (permalink / raw)
  To: Chris Snook
  Cc: Jay Cliburn, Luca Tettamanti, Jeff Garzik, Pekka Enberg,
	Andrew Morton, linux-kernel, netdev, Christoph Lameter, torvalds

On Fri, May 09, 2008 at 02:56:21PM -0400, Chris Snook wrote:
> Alexey Dobriyan wrote:
>> Hmmm, there was a wonderful oops on interface stop here when the other end
>> of atl1 cable was physically unplugged (but there was traffic before):
>> 	atl1_down
>> 	atl1_clean_rx_ring
>> 	swiotlb_unmap_single
>> 	swiotlb_unmap_single_attrs
>> 	memcpy_c
>
> Intel chip, or AMD?

Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
Asus P5B-E motherboard.



* [PATCH] Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-09 19:38                                           ` Jay Cliburn
@ 2008-05-10 19:31                                             ` Alexey Dobriyan
  2008-05-11  1:58                                               ` Jay Cliburn
  0 siblings, 1 reply; 26+ messages in thread
From: Alexey Dobriyan @ 2008-05-10 19:31 UTC (permalink / raw)
  To: Jay Cliburn
  Cc: Chris Snook, Luca Tettamanti, Jeff Garzik, Andrew Morton,
	linux-kernel, netdev

On Fri, May 09, 2008 at 02:38:54PM -0500, Jay Cliburn wrote:
> [trimmed cc list slightly]
> 
> On Sat, 10 May 2008 00:07:15 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Fri, May 09, 2008 at 02:56:21PM -0400, Chris Snook wrote:
> > > Alexey Dobriyan wrote:
> > >> Hmmm, there was a wonderful oops on interface stop here when the
> > >> other end of atl1 cable was physically unplugged (but there was
> > >> traffic before):
> > >> 	atl1_down
> > >> 	atl1_clean_rx_ring
> > >> 	swiotlb_unmap_single
> > >> 	swiotlb_unmap_single_attrs
> > >> 	memcpy_c
> > >
> > > Intel chip, or AMD?
> > 
> > Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
> > Asus P5B-E motherboard.
> > 
> 
> I see the same thing with a Socket AM2-based board (Asus M2V) with 4GB
> RAM installed. The problem occurs only when SWIOTLB is active, which
> happens automatically at boot (in arch/x86/kernel/pci-swiotlb.c) when
> the page frame number exceeds 1048576 (corresponding to 2^32 bytes).
> 
> I thought for a while that the problem went away with iommu=allowed, but
> I was wrong.
> 
> The bug appears to be a "simple" skb write-after-free that happens only
> when bounce buffers are in use, but I'll be damned if I can find the
> cause of it.
> 
> <continues looking>

Try this patch! If the swiotlb poisoning scares you, drop that hunk; I'm
not entirely sure it's correct, but it makes the aforementioned second
oops deterministic.

--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2027,6 +2029,7 @@ rrd_ok:
 		/* Good Receive */
 		pci_unmap_page(adapter->pdev, buffer_info->dma,
 			       buffer_info->length, PCI_DMA_FROMDEVICE);
+		buffer_info->dma = 0;
 		skb = buffer_info->skb;
 		length = le16_to_cpu(rrd->xsz.xsum_sz.pkt_size);
 
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index d568894..f6165ed 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -399,12 +399,14 @@ unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
 	/*
 	 * First, sync the memory before unmapping the entry
 	 */
-	if (buffer && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+	if (buffer && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))) {
 		/*
 		 * bounce... copy the data back into the original buffer * and
 		 * delete the bounce buffer.
 		 */
 		memcpy(buffer, dma_addr, size);
+		io_tlb_orig_addr[index] = (void *)0x9a9a9a9a9a9a9a9aUL;
+	}
 
 	/*
 	 * Return the buffer to the free list by setting the corresponding



* Re: [PATCH] Re: atl1 64-bit => 32-bit DMA borkage (reproducible, bisected)
  2008-05-10 19:31                                             ` [PATCH] " Alexey Dobriyan
@ 2008-05-11  1:58                                               ` Jay Cliburn
  0 siblings, 0 replies; 26+ messages in thread
From: Jay Cliburn @ 2008-05-11  1:58 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Chris Snook, Luca Tettamanti, Jeff Garzik, Andrew Morton,
	linux-kernel, netdev

On Sat, 10 May 2008 23:31:07 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Fri, May 09, 2008 at 02:38:54PM -0500, Jay Cliburn wrote:

> > The bug appears to be a "simple" skb write-after-free that happens
> > only when bounce buffers are in use, but I'll be damned if I can
> > find the cause of it.
> > 
> > <continues looking>
> 
> Try this patch! If scared, remove swiotlb poisoning, I'm not entirely
> sure it's correct, but it makes aforementioned second oops
> deterministic.

Seems to fix it for me.  Nicely done, Alexey!  Thanks!

I looked at that blasted unmap a thousand times, but never noticed the
missing buffer_info->dma clear.
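
One plausible reading of the failure, as a toy userspace model rather
than kernel code: with bounce buffers, a FROM_DEVICE unmap copies the
bounce data back into the original buffer, so a second unmap through a
stale handle writes into memory that has since been freed. Every name
below is hypothetical, and the single bounce slot is a simplification.

```c
#include <string.h>

#define BUF_LEN	32
#define POISON	0x6b	/* the "freed object" pattern seen in the dump */

static unsigned char bounce[BUF_LEN];	/* one bounce slot */
static unsigned char *orig_addr;	/* buffer the slot was mapped for */

struct rx_buffer {
	unsigned char data[BUF_LEN];	/* stands in for the skb data */
	int dma;			/* nonzero while "mapped" */
};

/* Remember which buffer the bounce slot belongs to; return a handle. */
static int toy_map(struct rx_buffer *b)
{
	orig_addr = b->data;
	return 1;
}

/* FROM_DEVICE style unmap: copy bounce data back to the original. */
static void toy_unmap(void)
{
	if (orig_addr)
		memcpy(orig_addr, bounce, BUF_LEN);
}

/* Returns how many poison bytes were overwritten after the "free". */
int simulate_rx_then_close(int clear_handle)
{
	struct rx_buffer buf;
	int i, clobbered = 0;

	buf.dma = toy_map(&buf);
	memset(bounce, 0x45, BUF_LEN);		/* "device" delivers a packet */

	toy_unmap();				/* receive path unmaps */
	if (clear_handle)
		buf.dma = 0;			/* the one-line fix */
	memset(buf.data, POISON, BUF_LEN);	/* buffer is freed and poisoned */

	if (buf.dma)				/* close path: still looks mapped */
		toy_unmap();			/* second copy-back hits freed memory */

	for (i = 0; i < BUF_LEN; i++)
		if (buf.data[i] != POISON)
			clobbered++;
	return clobbered;
}
```

In this model simulate_rx_then_close(0) overwrites the whole poisoned
buffer and simulate_rx_then_close(1) overwrites nothing. Without bounce
buffers the second unmap copies nothing, which would fit the bug only
showing up when SWIOTLB is active.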

I'll get input from one more tester, and if it's positive, I'll submit
this to Jeff.

Thanks again.

> 
> --- a/drivers/net/atlx/atl1.c
> +++ b/drivers/net/atlx/atl1.c
> @@ -2027,6 +2029,7 @@ rrd_ok:
>  		/* Good Receive */
>  		pci_unmap_page(adapter->pdev, buffer_info->dma,
>  			       buffer_info->length, PCI_DMA_FROMDEVICE);
> +		buffer_info->dma = 0;
>  		skb = buffer_info->skb;
>  		length = le16_to_cpu(rrd->xsz.xsum_sz.pkt_size);
>  
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index d568894..f6165ed 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -399,12 +399,14 @@ unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
>  	/*
>  	 * First, sync the memory before unmapping the entry
>  	 */
> -	if (buffer && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
> +	if (buffer && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))) {
>  		/*
>  		 * bounce... copy the data back into the original buffer * and
>  		 * delete the bounce buffer.
>  		 */
>  		memcpy(buffer, dma_addr, size);
> +		io_tlb_orig_addr[index] = (void *)0x9a9a9a9a9a9a9a9aUL;
> +	}
>  
>  	/*
>  	 * Return the buffer to the free list by setting the corresponding
> 


end of thread, other threads:[~2008-05-11  1:58 UTC | newest]

Thread overview: 26+ messages
     [not found] <20080410203354.f0a6f464.akpm@linux-foundation.org>
     [not found] ` <20080413204422.GA5136@martell.zuzino.mipt.ru>
     [not found]   ` <84144f020804140901p1c076fd2q73e3effe7cd96da3@mail.gmail.com>
     [not found]     ` <Pine.LNX.4.64.0804141050550.6296@schroedinger.engr.sgi.com>
     [not found]       ` <20080414183221.GA5234@martell.zuzino.mipt.ru>
2008-04-14 19:56         ` 2.6.25-rc8-mm2: FIX kmalloc-2048 (was Re: 2.6.25-rc8-mm2: IP: [<ffffffff802868f9>] __kmalloc+0x69/0x110) Alexey Dobriyan
2008-04-14 20:05           ` Christoph Lameter
2008-04-19 11:17             ` Alexey Dobriyan
2008-04-19 14:45               ` atl1 64-bit => 32-bit DMA borkage (reproducible, bisected) Alexey Dobriyan
2008-04-20  2:54                 ` Jay Cliburn
2008-04-20 11:14                   ` Alexey Dobriyan
2008-04-20 11:06                     ` Jay Cliburn
2008-04-20 12:26                       ` Alexey Dobriyan
2008-04-20 18:37                         ` Jay Cliburn
2008-04-20 20:55                           ` Alexey Dobriyan
2008-04-21 18:42                             ` Chris Snook
2008-04-21 19:56                               ` Alexey Dobriyan
2008-04-22  2:08                             ` Jay Cliburn
2008-04-22 19:02                               ` Alexey Dobriyan
2008-04-26  0:57                               ` Jay Cliburn
2008-04-28  6:42                                 ` Alexey Dobriyan
2008-05-04 21:15                                 ` Alexey Dobriyan
2008-05-05  0:31                                   ` Jay Cliburn
2008-05-05  0:34                                     ` Jay Cliburn
2008-05-06 16:02                                   ` Jay Cliburn
2008-05-09 19:51                                     ` Alexey Dobriyan
2008-05-09 18:56                                       ` Chris Snook
2008-05-09 20:07                                         ` Alexey Dobriyan
2008-05-09 19:38                                           ` Jay Cliburn
2008-05-10 19:31                                             ` [PATCH] " Alexey Dobriyan
2008-05-11  1:58                                               ` Jay Cliburn
