* [PROBLEM] r8169 deadlocks
From: Srihari Vijayaraghavan @ 2004-01-15 9:38 UTC
To: netdev
Hello,
Hardware:
Athlon 64 3200+
Gigabyte K8VNXP
VIA K8T800
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 (rev 10)
	Subsystem: Giga-byte Technology: Unknown device e000
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (8000ns min, 16000ns max), cache line size 08
	Interrupt: pin A routed to IRQ 5
	Region 0: I/O ports at e800 [size=256]
	Region 1: Memory at e3005000 (32-bit, non-prefetchable) [size=256]
	Expansion ROM at <unassigned> [disabled] [size=64K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Software:
SuSE 9 for AMD64
2.6.1-mm3 kernel
Let "desktop" be the computer with the Realtek r8169 card, and "laptop" the
computer from which I perform these steps:
1. ssh desktop
2. while true; do ls -la /; done
3. Within a few seconds the desktop computer hangs
(and, of course, the ssh session on the laptop hangs too)
Here is the sysrq-p output from the desktop computer (captured using a serial console):
Pid: 1963, comm: ls Not tainted
RIP: 0010:[<ffffffffa008afd9>] <ffffffffa008afd9>{:r8169:rtl8169_tx_interrupt+73}
RSP: 0000:ffffffff80374dc8 EFLAGS: 00000286
RAX: 0000000000000420 RBX: ffffffff80374d18 RCX: 0000010000399000
RDX: ffffffff80370e80 RSI: 000000003525d05e RDI: 0000000080391bf0
RBP: ffffffff801100d9 R08: 0000000000000007 R09: 0000000000000000
R10: 0000002a95587de0 R11: 0000000000000003 R12: 0000000000000042
R13: 0000000000000001 R14: 00000000000000bc R15: 000001003f7d1340
FS: 00000000005144a0(005b) GS:ffffffff80370e80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a957876d0 CR3: 0000000000101000 CR4: 00000000000006a0
Call Trace:<IRQ> <ffffffffa008b3c8>{:r8169:rtl8169_interrupt+120}
<ffffffff8011222f>{handle_IRQ_event+47}
<ffffffff801123b3>{do_IRQ+147} <ffffffff801100d9>{ret_from_intr+0}
<EOI> <ffffffff80110152>{retint_careful+13}
And here is the sysrq-t:
sibling
task PC pid father child younger older
init S ffffffff8014efa7 1 0 3 (NOTLB)
000001003ff8dd88 0000000000000002 ffffffff80311600 00000000000001f7
ffffffff802c67e0 0000000000000000 000001003ff8b3e0 ffffffff802bb520
00000000802c67e0 000000d000000010
Call Trace:<ffffffff8013a37e>{schedule_timeout+158}
<ffffffff8013a2d0>{process_timeout+0}
<ffffffff80172d3a>{pipe_poll+42} <ffffffff8017914a>{do_select+778}
<ffffffff80178c80>{__pollwait+0} <ffffffff801795b0>{sys_select+992}
<ffffffff8010fb60>{system_call+124}
events/0 R 000001003a4a1d48 3 1 4 5 (L-TLB)
000001000243de88 0000000000000046 ffffffff80311600 ffffffff80140870
000000000000078a 000000693a4a1cd8 000001003ff8a2c0 000001003fd0d1e0
000001003ffecd70 000001000243ded8
Call Trace:<ffffffff80140870>{__call_usermodehelper+0}
<ffffffff80140c6c>{worker_thread+300}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff80140b40>{worker_thread+0} <ffffffff80140b40>{worker_thread+0}
<ffffffff80143d86>{kthread+54} <ffffffff8011054b>{child_rip+8}
<ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}
kblockd/0 S 0000000000000013 4 3 8 (L-TLB)
000001003fd3be88 0000000000000046 ffffffff80311600 0000000000000006
0000000000000001 000000768020bc37 000001003ff89a30 000001003fd0d1e0
000001003fd69870 000001003fd3bed8
Call Trace:<ffffffff80140c6c>{worker_thread+300}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff80140b40>{worker_thread+0}
<ffffffff80140b40>{worker_thread+0} <ffffffff80143d86>{kthread+54}
<ffffffff8011054b>{child_rip+8}
<ffffffff80143db0>{keventd_create_kthread+0}
<ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}
aio/0 S 000001003ff88948 8 3 199 4 (L-TLB)
000001003fd11e88 0000000000000046 ffffffff80311600 0000000000000003
0000000000000000 0000007d00000000 000001003fd0f420 000001003ff88910
0000000000000000 0000000000000000
Call Trace:<ffffffff80140c6c>{worker_thread+300}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff80140b40>{worker_thread+0}
<ffffffff80140b40>{worker_thread+0} <ffffffff80143d86>{kthread+54}
<ffffffff8011054b>{child_rip+8}
<ffffffff80143db0>{keventd_create_kthread+0}
<ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}
pdflush S 000000000003fff0 5 1 6 3 (L-TLB)
000001003fd19ee8 0000000000000046 ffffffff80311600 0000000000000000
0000000000000000 0000007d00000000 000001003ff891a0 000001003ff88910
0000000000000000 0000000000000000
Call Trace:<ffffffff8015091f>{__pdflush+159} <ffffffff80150a4c>{pdflush+12}
<ffffffff8011054b>{child_rip+8} <ffffffff80150a40>{pdflush+0}
<ffffffff80110543>{child_rip+0}
pdflush S 0000000000000000 6 1 7 5 (L-TLB)
000001003fd17ee8 0000000000000046 ffffffff80311600 0000000000000000
0000000000000000 0000000000000000 000001003ff88910 ffffffff802bb520
0000000000000000 0000000000000000
Call Trace:<ffffffff8015091f>{__pdflush+159} <ffffffff80150a4c>{pdflush+12}
<ffffffff8011054b>{child_rip+8} <ffffffff80150a40>{pdflush+0}
<ffffffff80110543>{child_rip+0}
kswapd0 S 000000000003fff0 7 1 10 6 (L-TLB)
000001003fd15da8 0000000000000046 ffffffff80311600 0000000000000000
0000000000000000 0000007d00000000 000001003ff88080 000001003ff8b3e0
0000000000000000 0000000000000000
Call Trace:<ffffffff80155df5>{kswapd+277}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff8011054b>{child_rip+8}
<ffffffff80155ce0>{kswapd+0} <ffffffff80110543>{child_rip+0}
kjournald S ffffffff8012f2a0 10 1 192 7 (L-TLB)
000001003fc17e68 0000000000000046 ffffffff80311600 0000010030748340
ffffffff802bb520 0000010030748280 000001003fd0c950 ffffffff802bb520
000001003fc9f298 000001003fc17eb8
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
kjournald S 0000000000000006 192 1 193 10 (L-TLB)
000001003e77be68 0000000000000046 ffffffff80311600 000001003fc9f078
000001003e77be48 0000007d8012f388 000001003f6de340 000001003f6df460
0000000000000000 0000000000000000
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
kjournald S ffffffff8012f2a0 193 1 194 192 (L-TLB)
000001003e4a1e68 0000000000000046 ffffffff80311600 000001003e4a6e78
ffffffff802bb520 000000768012f388 000001003fd0eb90 ffffffff802bb520
000001003e4a6e98 000001003e4a1eb8
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
kjournald S ffffffff8012f2a0 194 1 195 193 (L-TLB)
000001003e51de68 0000000000000046 ffffffff80311600 000001003326ec40
ffffffff802bb520 000000753326eb80 000001003f6ddab0 ffffffff802bb520
000001003e4a6c98 000001003e51deb8
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
kjournald S ffffffff8012f2a0 195 1 196 194 (L-TLB)
000001003e5f3e68 0000000000000046 ffffffff80311600 000001003e4a6a78
ffffffff802bb520 000000738012f388 000001003f6debd0 ffffffff802bb520
000001003e4a6a98 000001003e5f3eb8
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
kjournald S ffffffff8012f2a0 196 1 200 195 (L-TLB)
000001003e4d5e68 0000000000000046 ffffffff80311600 0000010036f6b160
ffffffff802bb520 0000007536f69fa0 000001003f6dc990 ffffffff802bb520
000001003e4a6898 000001003e4d5eb8
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
reiserfs/0 S 0000000000000000 199 3 8 (L-TLB)
000001003ee49e88 0000000000000046 ffffffff80311600 0000000000000206
0000000000000000 00000076a001f8bf 000001003f6dc100 0000010031492e10
000001003fc8c570 000001003ee49ed8
Call Trace:<ffffffff80140c6c>{worker_thread+300}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff80140b40>{worker_thread+0}
<ffffffff80140b40>{worker_thread+0} <ffffffff80143d86>{kthread+54}
<ffffffff8011054b>{child_rip+8}
<ffffffff80143db0>{keventd_create_kthread+0}
<ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}
kjournald S ffffffff8012f2a0 200 1 240 196 (L-TLB)
000001003f09de68 0000000000000046 ffffffff80311600 000001003e4a6078
ffffffff802bb520 000000738012f388 000001003e4cf4a0 ffffffff802bb520
000001003e4a6098 000001003f09deb8
Call Trace:<ffffffff801aef67>{kjournald+455}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801307a0>{autoremove_wake_function+0}
<ffffffff801aed80>{commit_timeout+0}
<ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
<ffffffff80110543>{child_rip+0}
scsi_eh_0 S 000001003f5a7dc0 240 1 1875 200 (L-TLB)
000001003f31fe48 0000000000000046 ffffffff80311600 0000000000000008
000001003f5e9ac0 0000007d801340a3 000001003e4ce380 000001003e4cec10
000001003f31ff18 ffffffff80133d21
Call Trace:<ffffffff80133d21>{reparent_to_init+481}
<ffffffff8010ede6>{__down_interruptible+198}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff801bbe81>{__down_failed_interruptible+53}
<ffffffffa00531c4>{:scsi_mod:.text.lock.scsi_error+65}
<ffffffff8011054b>{child_rip+8}
<ffffffffa0052ea0>{:scsi_mod:scsi_error_handler+0}
<ffffffff80110543>{child_rip+0}
bash S ffffffff80159b41 1875 1 1928 1895 240 (NOTLB)
00000100398ebeb8 0000000000000002 ffffffff80311600 00000100398ebf58
00000000005c7ac0 000000783e4cd260 000001003e4cd260 000001003f6df460
000001003f6df460 ffffffff80131e39
Call Trace:<ffffffff80131e39>{copy_process+2265}
<ffffffff80135556>{sys_wait4+598}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8010fb60>{system_call+124}
sshd S 0000000000000256 1895 1 1929 1875 (NOTLB)
0000010039915d88 0000000000000006 ffffffff80311600 0000000000000000
000001003fd0e300 000000758014f070 000001003fd0e300 000001003f6dd220
0000000000000246 0000000000000000
Call Trace:<ffffffff8013a2fe>{schedule_timeout+30}
<ffffffff802482e1>{tcp_poll+33}
<ffffffff8017914a>{do_select+778} <ffffffff80178c80>{__pollwait+0}
<ffffffff801795b0>{sys_select+992} <ffffffff8010fb60>{system_call+124}
slabdiff.py S 0000000000000000 1928 1875 (NOTLB)
000001003d78fd88 0000000000000006 ffffffff80311600 ffffffff801ec504
000000000000000a 0000000000000202 000001003f6df460 ffffffff802bb520
000000000000000a ffffffff801eee2d
Call Trace:<ffffffff801ec504>{lf+36} <ffffffff801eee2d>{do_con_write+1581}
<ffffffff8013a37e>{schedule_timeout+158}
<ffffffff8013a2d0>{process_timeout+0}
<ffffffff8017914a>{do_select+778} <ffffffff801e3f07>{write_chan+551}
<ffffffff80178c80>{__pollwait+0} <ffffffff801795b0>{sys_select+992}
<ffffffff8010fb60>{system_call+124}
sshd S ffffffff8012f2a0 1929 1895 1932 (NOTLB)
000001003a4a1d88 0000000000000006 ffffffff80311600 00000000000001f7
000001003fd0da70 0000007d00000000 000001003f6dd220 000001003fd0da70
000000003a4a1ec0 000000d000000010
Call Trace:<ffffffff8013a2fe>{schedule_timeout+30}
<ffffffff801e4f0b>{pty_write_room+43}
<ffffffff801e406c>{normal_poll+316} <ffffffff8017914a>{do_select+778}
<ffffffff80178c80>{__pollwait+0} <ffffffff801795b0>{sys_select+992}
<ffffffff8010fb60>{system_call+124}
bash S ffffffff80159b41 1932 1929 1963 (NOTLB)
0000010036f1deb8 0000000000000002 ffffffff80311600 0000010036f1df58
00000000005b6018 0000007d3fd0d1e0 000001003fd0d1e0 000001003fd0da70
000001003fd0da70 ffffffff80131e39
Call Trace:<ffffffff80131e39>{copy_process+2265}
<ffffffff8010fbe9>{sysret_signal+28}
<ffffffff80135556>{sys_wait4+598}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8012f340>{default_wake_function+0}
<ffffffff8010fb60>{system_call+124}
ls R current task 1963 1932 (NOTLB)
000001003255df70 0000000000000006 ffffffff80311600 00000000000000a1
000001003f6dd220 0000007500000000 000001003fd0da70 000001003f6dd220
0000000000518200 000000000040c921
Call Trace:<ffffffff80110152>{retint_careful+13}
Please feel free to ask for more information (and please cc me on replies).
Thanks
Hari
harisri@bigpond.com
* Re: [PROBLEM] r8169 deadlocks
From: Francois Romieu @ 2004-01-15 21:08 UTC
To: Srihari Vijayaraghavan; +Cc: netdev
Srihari Vijayaraghavan <harisri@bigpond.com> :
[...]
> Let "desktop" be the computer with the Realtek r8169 card, and "laptop" the
> computer from which I perform these steps:
> 1. ssh desktop
> 2. while true; do ls -la /; done
> 3. Within a few seconds the desktop computer hangs
> (and, of course, the ssh session on the laptop hangs too)
>
> Here is the sysrq-p output from the desktop computer (captured using a serial console):
> Pid: 1963, comm: ls Not tainted
> RIP: 0010:[<ffffffffa008afd9>]
> <ffffffffa008afd9>{:r8169:rtl8169_tx_interrupt+73}
> RSP: 0000:ffffffff80374dc8 EFLAGS: 00000286
> RAX: 0000000000000420 RBX: ffffffff80374d18 RCX: 0000010000399000
> RDX: ffffffff80370e80 RSI: 000000003525d05e RDI: 0000000080391bf0
> RBP: ffffffff801100d9 R08: 0000000000000007 R09: 0000000000000000
> R10: 0000002a95587de0 R11: 0000000000000003 R12: 0000000000000042
> R13: 0000000000000001 R14: 00000000000000bc R15: 000001003f7d1340
> FS: 00000000005144a0(005b) GS:ffffffff80370e80(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000002a957876d0 CR3: 0000000000101000 CR4: 00000000000006a0
>
> Call Trace:<IRQ> <ffffffffa008b3c8>{:r8169:rtl8169_interrupt+120}
> <ffffffff8011222f>{handle_IRQ_event+47}
> <ffffffff801123b3>{do_IRQ+147} <ffffffff801100d9>{ret_from_intr+0}
> <EOI> <ffffffff80110152>{retint_careful+13}
*head scratch*
Can you monitor 'vmstat 1' output on the r8169 host during the test ?
You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
If it does not perform better, you can try against 2.6.1-bk1 the set at
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b
If I remember correctly, you are the first report of a non-completely
dysfunctional driver for the new version of the r8169. Things improve.
--
Ueimor
* Re: [PROBLEM] r8169 deadlocks
From: Srihari Vijayaraghavan @ 2004-01-17 1:34 UTC
To: Francois Romieu; +Cc: netdev
Hello Francois,
On Friday 16 January 2004 08:08, Francois Romieu wrote:
> *head scratch*
Sorry :-)
> Can you monitor 'vmstat 1' output on the r8169 host during the test ?
The computer deadlocks within a few seconds (3 to 5), hanging everything
including vmstat, so its output never makes it as far as the file system. I will
write it down by hand and post it.
Here is the sysrq-m output from when it hung; maybe it provides some of what you
wanted from vmstat:
SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty
Free pages: 987624kB (0kB HighMem)
Active:2407 inactive:1605 dirty:1 writeback:0 unstable:0 free:246906
DMA free:13256kB min:12kB low:24kB high:36kB active:0kB inactive:0kB
Normal free:974368kB min:1004kB low:2008kB high:3012kB active:9628kB inactive:6420kB
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB
DMA: 0*4kB 1*8kB 0*16kB 2*32kB 2*64kB 2*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 3*4096kB = 13256kB
Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 237*4096kB = 974368kB
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap: 0kB
262128 pages of RAM
5950 reserved pages
2977 pages shared
0 pages swap cached
> You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
I shall try this and then report the status.
> If it does not perform better, you can try against 2.6.1-bk1 the set at
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b
OK. I have tried 2.6.1-mm4, which includes the most recent -netdev updates from
Jeff Garzik, and it behaves the same way.
> If I remember correctly, you are the first report of a non-completely
> dysfunctional driver for the new version of the r8169. Things improve.
Sorry, I am unable to understand your statement.
Thanks
Hari
harisri@bigpond.com
* Re: [PROBLEM] r8169 deadlocks
From: Francois Romieu @ 2004-01-17 12:53 UTC
To: Srihari Vijayaraghavan; +Cc: netdev
Srihari Vijayaraghavan <harisri@bigpond.com> :
[memory stats]
Ok, the driver does not seem to leak.
[...]
> > You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> > http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
>
> I shall try this and then report the status.
Please (see "Scenario" below).
> > If it does not perform better, you can try against 2.6.1-bk1 the set at
> > http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b
>
> OK. I have tried 2.6.1-mm4 which includes the most recent -netdev updates
> from Jeff Garzik and it behaves the same way.
>
> > If I remember correctly, you are the first report of a non-completely
> > dysfunctional driver for the new version of the r8169. Things improve.
>
> Sorry I am unable to understand your statement.
Tests have shown that the stock r8169 is foobar on amd64 without Realtek's
changes. The r8169 in -mm and -netdev merges various changes made by Realtek
and several contributors. Tests have shown that this modified r8169 was
completely broken. Your report indicates that the latest modified r8169 is
(slowly) returning to sanity on amd64. Nice :o)
r8169-tx-index-overflow.patch has not been included in either -mm or -netdev
so far. It has only been moderately tested on x86, so amd64 users are welcome.
I do not claim it will solve everything, but nasty things [*] can happen
without it.
[*] Scenario:
While submitting skbs, start_xmit crosses the end of the Tx descriptor ring and
feeds the start of the ring again (so far, so good). It is possible/expected
that several skbs are pending, especially as the start_xmit function uses
posted pci writes to tell the asic that it must wake up. Later, the Tx irq
handler is notified that the first pending buffer was sent. Now, depending on
the state of the memory just after the end of the Tx descriptor ring,
interesting things (deadlock included) can happen.
Take a look at rtl8169_tx_interrupt() and assume that tp->dirty_tx = 63 and
tp->cur_tx = 63 + 48 = 111. "entry" starts at tp->cur_tx % NUM_TX_DESC = 47 and
can be incremented tp->cur_tx - tp->dirty_tx = 48 times, so it can walk
descriptors 47 to 94, waaaaayyy beyond the end of the allowed Tx descriptor ring
(NUM_TX_DESC = 64 entries, indices 0 to 63). If something in this memory area
looks like a Tx descriptor owned by the asic, the irq handler loops forever. If
it looks like a Tx descriptor that belongs to the cpu, the irq handler will
free the skb and the asic may simply send crap on the wire.
If this explanation is right, it applies to 2.4.x as well. However, it is
surprising, as Robert Olsson was able to send packets at rather high rates
with the Realtek variant of this driver (where the start_xmit/tx_interrupt
functions are identical).
So, please, please, test in a sane environment (no binary modules) and tell
me if things behave the same/better/worse.
--
Ueimor
* Re: [PROBLEM] r8169 deadlocks
From: Srihari Vijayaraghavan @ 2004-01-19 11:51 UTC
To: Francois Romieu; +Cc: netdev
Hello Francois,
On Friday 16 January 2004 08:08, Francois Romieu wrote:
> Can you monitor 'vmstat 1' output on the r8169 host during the test ?
Here it is (2.6.1-bk2-netdev4):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 887000  10820  87992    0    0   291    47 1051   170  7  4 83  6
 0  0      0 886800  10820  87992    0    0     0   108 1042    13  0  0 100  0
 0  0      0 886800  10820  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10820  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10820  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10836  87992    0    0     0    56 1013    18  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     4 1009    10  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0    16 1011    11  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     7  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10856  87992    0    0     0    16 1011    11  0  0 100  0
 0  0      0 886800  10864  87992    0    0     0   140 1037    12  0  0 100  0
 0  0      0 886800  10864  87992    0    0     0     0 1008     3  0  0 100  0
 2  0      0 886472  10864  87992    0    0     0     0 1305  1958 12 12 76  0
It hung at the final entry.
> You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
The r8169-tx-index-overflow.patch does not (cleanly) apply on 2.6.1-bk2 +
netdev4.
> If it does not perform better, you can try against 2.6.1-bk1 the set at
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b
I have yet to try this combination.
Thanks
Hari
* Re: [PROBLEM] r8169 deadlocks
From: Francois Romieu @ 2004-01-19 23:24 UTC
To: Srihari Vijayaraghavan; +Cc: netdev
Srihari Vijayaraghavan <harisri@bigpond.com> :
[vmstat 1 output]
Ok, mostly idle.
[...]
> > You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> > http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
>
> The r8169-tx-index-overflow.patch does not (cleanly) apply on 2.6.1-bk2 +
> netdev4.
Can you verify that your kernel tree is fine or give an (sh-)history of
the applied patches ?
I have just checked and the patch applies cleanly on kernel 2.6.1-bk2 +
Jeff's 2.6.1-bk1-netdev4 as well as on kernel 2.6.1-bk4 + Jeff's
2.6.1-bk4-netdev1.
--
Ueimor
* Re: [PROBLEM] r8169 deadlocks
From: Srihari Vijayaraghavan @ 2004-01-20 10:50 UTC
To: Francois Romieu; +Cc: netdev
Hello Francois,
On Tuesday 20 January 2004 10:24, Francois Romieu wrote:
> > The r8169-tx-index-overflow.patch does not (cleanly) apply on 2.6.1-bk2 +
> > netdev4.
>
> Can you verify that your kernel tree is fine or give an (sh-)history of
> the applied patches ?
cd /usr/local/src
tar xfj /media/cdrecorder/v2.6/linux-2.6.0.tar.bz2
cd linux-2.6.0
bunzip2 -c /media/cdrecorder/v2.6/patch-2.6.1.bz2 |patch -p1
bunzip2 -c ~/linux/patch-2.6.1-bk2.bz2 |patch -p1
bunzip2 -c ~/linux/2.6.1-bk1-netdev4.patch.bz2 |patch -p1
patch -p1 --dry-run < ~/linux/r8169/r8169-tx-index-overflow.patch
patching file drivers/net/r8169.c
Hunk #1 succeeded at 1341 (offset 364 lines).
Hunk #2 FAILED at 1351.
Hunk #3 succeeded at 1365 with fuzz 1 (offset 367 lines).
1 out of 3 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej
> I have just checked and the patch applies cleanly on kernel 2.6.1-bk2 +
> Jeff's 2.6.1-bk1-netdev4 as well as on kernel 2.6.1-bk4 + Jeff's
> 2.6.1-bk4-netdev1.
Interesting.
In this very thread (in a message on which you did not cc me, BTW :-) you
mentioned that you welcomed AMD64-RTL8169 users, which gave me an idea. I tested
this computer under a 32-bit kernel (vanilla Fedora + 2.6.1-mm4), where it
survives my torture test (I have run it for no more than 5 minutes, but under
the 64-bit kernel it does not survive for more than 5 seconds).
(And BTW, I do not like binary-only kernel modules. I do this bug reporting
"for fun", and there is no fun in binary-only modules. I have been reading
lkml for long enough to understand that :-)
Thanks for your help and suggestions so far; I appreciate them.
Hari
* Re: [PROBLEM] r8169 deadlocks
From: Francois Romieu @ 2004-01-20 20:52 UTC
To: Srihari Vijayaraghavan; +Cc: netdev
Srihari Vijayaraghavan <harisri@bigpond.com> :
> On Tuesday 20 January 2004 10:24, Francois Romieu wrote:
[...]
> cd /usr/local/src
> tar xfj /media/cdrecorder/v2.6/linux-2.6.0.tar.bz2
> cd linux-2.6.0
> bunzip2 -c /media/cdrecorder/v2.6/patch-2.6.1.bz2 |patch -p1
> bunzip2 -c ~/linux/patch-2.6.1-bk2.bz2 |patch -p1
> bunzip2 -c ~/linux/2.6.1-bk1-netdev4.patch.bz2 |patch -p1
> patch -p1 --dry-run < ~/linux/r8169/r8169-tx-index-overflow.patch
> patching file drivers/net/r8169.c
> Hunk #1 succeeded at 1341 (offset 364 lines).
> Hunk #2 FAILED at 1351.
> Hunk #3 succeeded at 1365 with fuzz 1 (offset 367 lines).
> 1 out of 3 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej
$ cat>foo<<EOD
tar jxf linux-2.6.0.tar.bz2
bunzip2 -c patch-2.6.1.bz2 | patch -p1 -d linux-2.6.0
bunzip2 -c patch-2.6.1-bk2.bz2 | patch -p1 -d linux-2.6.0
bunzip2 -c 2.6.1-bk1-netdev4.patch.bz2 | patch -p1 -d linux-2.6.0
wget http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
EOD
$ sh foo
[...]
$ patch -p1 -d linux-2.6.0 < r8169-tx-index-overflow.patch
patching file drivers/net/r8169.c
Okay...
$ md5sum r8169-tx-index-overflow.patch
99b2f5886d6bf1d4df0f7553bb5bef57 r8169-tx-index-overflow.patch
[...]
> In this very thread you mentioned (in which you did not cc me BTW :-) that
> welcomed AMD64-RTL8169 users, that gave me an idea. I tested this computer
I did :o)
----- The following addresses had permanent fatal errors -----
<harisri@bigpond.com>
(reason: 554 recipient <harisri@bigpond.com> exceeds mailbox storage quota)
> under 32 bit kernel (vanilla Fedora + 2.6.1-mm4) in which it survives my
> torture test (I have verified for no more than 5 minutes though, but then it
> does not survive for more than 5 secs under the 64 bit kernel).
Point taken.
--
Ueimor
* Re: [PROBLEM] r8169 deadlocks
From: Srihari Vijayaraghavan @ 2004-01-21 10:15 UTC
To: Francois Romieu; +Cc: netdev
Hello Francois,
On Wednesday 21 January 2004 07:52, Francois Romieu wrote:
> [snip]
> $ cat>foo<<EOD
> tar jxf linux-2.6.0.tar.bz2
> bunzip2 -c patch-2.6.1.bz2 | patch -p1 -d linux-2.6.0
> bunzip2 -c patch-2.6.1-bk2.bz2 | patch -p1 -d linux-2.6.0
> bunzip2 -c 2.6.1-bk1-netdev4.patch.bz2 | patch -p1 -d linux-2.6.0
> wget
> http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
> EOD
> $ sh foo
> [...]
> $ patch -p1 -d linux-2.6.0 < r8169-tx-index-overflow.patch
> patching file drivers/net/r8169.c
>
> Okay...
>
> $ md5sum r8169-tx-index-overflow.patch
> 99b2f5886d6bf1d4df0f7553bb5bef57 r8169-tx-index-overflow.patch
>
> [...]
Must be my mistake. Thanks for verifying things.
> I did :o)
>
> ----- The following addresses had permanent fatal errors -----
> <harisri@bigpond.com>
> (reason: 554 recipient <harisri@bigpond.com> exceeds mailbox storage
> quota)
:-) I apologize for that. God save me from the spammers!
> > under 32 bit kernel (vanilla Fedora + 2.6.1-mm4) in which it survives my
> > torture test (I have verified for no more than 5 minutes though, but then
> > it does not survive for more than 5 secs under the 64 bit kernel).
>
> Point taken.
I have good news: I ran my usual test under vanilla linux-2.6.2-rc1, and to
my surprise the kernel does not hang anymore :-) (although the Tx counter not
incrementing is another problem altogether).
If you want me to, I can test your r8169-tx-index-overflow.patch.
Thanks
Hari
* Re: [PROBLEM] r8169 deadlocks
From: Francois Romieu @ 2004-01-21 23:59 UTC
To: Srihari Vijayaraghavan; +Cc: netdev, jgarzik
[-- Attachment #1: Type: text/plain, Size: 523 bytes --]
Srihari Vijayaraghavan <harisri@bigpond.com> :
[...]
> I have a good news: I checked out things as usual under vanilla
> linux-2.6.2-rc1, and to my surprise Kernel does not hang anymore :-).
> (although Tx counter not incrementing is altogether another problem).
r8169 did not change between 2.6.1 and 2.6.2-rc1. The change of behavior
probably comes from some other part of the kernel. Joy.
> If you want me to I can test your r8169-tx-index-overflow.patch.
See attachment. It applies against plain 2.6.2-rc1.
--
Ueimor
[-- Attachment #2: r8169.c-diff --]
[-- Type: text/plain, Size: 872 bytes --]
--- linux-2.6.2-rc1/drivers/net/r8169.c.orig 2004-01-22 00:41:03.000000000 +0100
+++ linux-2.6.2-rc1/drivers/net/r8169.c 2004-01-22 00:46:46.000000000 +0100
@@ -871,7 +871,6 @@ rtl8169_tx_interrupt(struct net_device *
 		     void *ioaddr)
 {
 	unsigned long dirty_tx, tx_left = 0;
-	int entry = tp->cur_tx % NUM_TX_DESC;
 
 	assert(dev != NULL);
 	assert(tp != NULL);
@@ -881,14 +880,14 @@ rtl8169_tx_interrupt(struct net_device *
 	tx_left = tp->cur_tx - dirty_tx;
 
 	while (tx_left > 0) {
+		int entry = dirty_tx % NUM_TX_DESC;
+
 		if ((tp->TxDescArray[entry].status & OWNbit) == 0) {
-			dev_kfree_skb_irq(tp->
-					  Tx_skbuff[dirty_tx % NUM_TX_DESC]);
-			tp->Tx_skbuff[dirty_tx % NUM_TX_DESC] = NULL;
+			dev_kfree_skb_irq(tp->Tx_skbuff[entry]);
+			tp->Tx_skbuff[entry] = NULL;
 			tp->stats.tx_packets++;
 			dirty_tx++;
 			tx_left--;
-			entry++;
 		}
 	}
 
* Re: [PROBLEM] r8169 deadlocks
From: Srihari Vijayaraghavan @ 2004-01-22 10:32 UTC
To: Francois Romieu; +Cc: netdev, jgarzik
Hello Francois,
On Thursday 22 January 2004 10:59, Francois Romieu wrote:
> [...]
> r8169 did not evolve between 2.6.1 and 2.6.2-rc1. Change of behavior
> probably comes from some other part of the kernel. Joy.
>
> > If you want me to I can test your r8169-tx-index-overflow.patch.
>
> See attachment. It applies against plain 2.6.2-rc1.
OK. I have applied your patch and tested my computer under moderate to high
network load. Although I may not generate the kind of network load that
triggers the bug your patch targets, I can confirm that the patch causes no
stability issues.
Thanks
Hari