netdev.vger.kernel.org archive mirror
* [PROBLEM] r8169 deadlocks
@ 2004-01-15  9:38 Srihari Vijayaraghavan
  2004-01-15 21:08 ` Francois Romieu
  0 siblings, 1 reply; 11+ messages in thread
From: Srihari Vijayaraghavan @ 2004-01-15  9:38 UTC (permalink / raw)
  To: netdev

Hello,

Hardware:
Athlon 64 3200+
Gigabyte K8VNXP
VIA K8T800
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 (rev 10)
        Subsystem: Giga-byte Technology: Unknown device e000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (8000ns min, 16000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 5
        Region 0: I/O ports at e800 [size=256]
        Region 1: Memory at e3005000 (32-bit, non-prefetchable) [size=256]
        Expansion ROM at <unassigned> [disabled] [size=64K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Software:
SuSE 9 for AMD64
2.6.1-mm3 kernel

The desktop is the computer with the RealTek r8169 card; I perform these
steps from the laptop:
1. ssh desktop
2. while true; do ls -la /; done
3. Within a few seconds the desktop computer hangs
(and of course the ssh session on the laptop hangs too)

Here is the sysrq-p from the desktop computer (captured using serial-console):
Pid: 1963, comm: ls Not tainted
RIP: 0010:[<ffffffffa008afd9>] <ffffffffa008afd9>{:r8169:rtl8169_tx_interrupt+73}
RSP: 0000:ffffffff80374dc8  EFLAGS: 00000286
RAX: 0000000000000420 RBX: ffffffff80374d18 RCX: 0000010000399000
RDX: ffffffff80370e80 RSI: 000000003525d05e RDI: 0000000080391bf0
RBP: ffffffff801100d9 R08: 0000000000000007 R09: 0000000000000000
R10: 0000002a95587de0 R11: 0000000000000003 R12: 0000000000000042
R13: 0000000000000001 R14: 00000000000000bc R15: 000001003f7d1340
FS:  00000000005144a0(005b) GS:ffffffff80370e80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a957876d0 CR3: 0000000000101000 CR4: 00000000000006a0

Call Trace:<IRQ> <ffffffffa008b3c8>{:r8169:rtl8169_interrupt+120} <ffffffff8011222f>{handle_IRQ_event+47}
       <ffffffff801123b3>{do_IRQ+147} <ffffffff801100d9>{ret_from_intr+0}
        <EOI> <ffffffff80110152>{retint_careful+13}

And here is the sysrq-t:
                                                       sibling
  task                 PC          pid father child younger older
init          S ffffffff8014efa7     1      0     3               (NOTLB)
000001003ff8dd88 0000000000000002 ffffffff80311600 00000000000001f7
       ffffffff802c67e0 0000000000000000 000001003ff8b3e0 ffffffff802bb520
       00000000802c67e0 000000d000000010
Call Trace:<ffffffff8013a37e>{schedule_timeout+158} 
<ffffffff8013a2d0>{process_timeout+0}
       <ffffffff80172d3a>{pipe_poll+42} <ffffffff8017914a>{do_select+778}
       <ffffffff80178c80>{__pollwait+0} <ffffffff801795b0>{sys_select+992}
       <ffffffff8010fb60>{system_call+124}
events/0      R 000001003a4a1d48     3      1     4       5       (L-TLB)
000001000243de88 0000000000000046 ffffffff80311600 ffffffff80140870
       000000000000078a 000000693a4a1cd8 000001003ff8a2c0 000001003fd0d1e0
       000001003ffecd70 000001000243ded8
Call Trace:<ffffffff80140870>{__call_usermodehelper+0} 
<ffffffff80140c6c>{worker_thread+300}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff8012f340>{default_wake_function+0}
       <ffffffff80140b40>{worker_thread+0} <ffffffff80140b40>{worker_thread+0}
       <ffffffff80143d86>{kthread+54} <ffffffff8011054b>{child_rip+8}
       <ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}

kblockd/0     S 0000000000000013     4      3             8       (L-TLB)
000001003fd3be88 0000000000000046 ffffffff80311600 0000000000000006
       0000000000000001 000000768020bc37 000001003ff89a30 000001003fd0d1e0
       000001003fd69870 000001003fd3bed8 
Call Trace:<ffffffff80140c6c>{worker_thread+300} 
<ffffffff8012f340>{default_wake_function+0}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff80140b40>{worker_thread+0}
       <ffffffff80140b40>{worker_thread+0} <ffffffff80143d86>{kthread+54}
       <ffffffff8011054b>{child_rip+8} 
<ffffffff80143db0>{keventd_create_kthread+0}
       <ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0} 

aio/0         S 000001003ff88948     8      3           199     4 (L-TLB)
000001003fd11e88 0000000000000046 ffffffff80311600 0000000000000003
       0000000000000000 0000007d00000000 000001003fd0f420 000001003ff88910 
       0000000000000000 0000000000000000
Call Trace:<ffffffff80140c6c>{worker_thread+300} 
<ffffffff8012f340>{default_wake_function+0}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff80140b40>{worker_thread+0}
       <ffffffff80140b40>{worker_thread+0} <ffffffff80143d86>{kthread+54}
       <ffffffff8011054b>{child_rip+8} 
<ffffffff80143db0>{keventd_create_kthread+0}
       <ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}

pdflush       S 000000000003fff0     5      1             6     3 (L-TLB)
000001003fd19ee8 0000000000000046 ffffffff80311600 0000000000000000
       0000000000000000 0000007d00000000 000001003ff891a0 000001003ff88910
       0000000000000000 0000000000000000
Call Trace:<ffffffff8015091f>{__pdflush+159} <ffffffff80150a4c>{pdflush+12}
       <ffffffff8011054b>{child_rip+8} <ffffffff80150a40>{pdflush+0}
       <ffffffff80110543>{child_rip+0} 
pdflush       S 0000000000000000     6      1             7     5 (L-TLB)
000001003fd17ee8 0000000000000046 ffffffff80311600 0000000000000000
       0000000000000000 0000000000000000 000001003ff88910 ffffffff802bb520
       0000000000000000 0000000000000000
Call Trace:<ffffffff8015091f>{__pdflush+159} <ffffffff80150a4c>{pdflush+12} 
       <ffffffff8011054b>{child_rip+8} <ffffffff80150a40>{pdflush+0}
       <ffffffff80110543>{child_rip+0} 
kswapd0       S 000000000003fff0     7      1            10     6 (L-TLB)
000001003fd15da8 0000000000000046 ffffffff80311600 0000000000000000
       0000000000000000 0000007d00000000 000001003ff88080 000001003ff8b3e0 
       0000000000000000 0000000000000000
Call Trace:<ffffffff80155df5>{kswapd+277} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff8011054b>{child_rip+8} 
       <ffffffff80155ce0>{kswapd+0} <ffffffff80110543>{child_rip+0}
       
kjournald     S ffffffff8012f2a0    10      1           192     7 (L-TLB)
000001003fc17e68 0000000000000046 ffffffff80311600 0000010030748340 
       ffffffff802bb520 0000010030748280 000001003fd0c950 ffffffff802bb520 
       000001003fc9f298 000001003fc17eb8
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0} 
       <ffffffff80110543>{child_rip+0}
kjournald     S 0000000000000006   192      1           193    10 (L-TLB)
000001003e77be68 0000000000000046 ffffffff80311600 000001003fc9f078
       000001003e77be48 0000007d8012f388 000001003f6de340 000001003f6df460
       0000000000000000 0000000000000000 
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
       <ffffffff80110543>{child_rip+0} 
kjournald     S ffffffff8012f2a0   193      1           194   192 (L-TLB)
000001003e4a1e68 0000000000000046 ffffffff80311600 000001003e4a6e78 
       ffffffff802bb520 000000768012f388 000001003fd0eb90 ffffffff802bb520
       000001003e4a6e98 000001003e4a1eb8 
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0} 
       <ffffffff80110543>{child_rip+0}
kjournald     S ffffffff8012f2a0   194      1           195   193 (L-TLB)
000001003e51de68 0000000000000046 ffffffff80311600 000001003326ec40
       ffffffff802bb520 000000753326eb80 000001003f6ddab0 ffffffff802bb520 
       000001003e4a6c98 000001003e51deb8
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0} 
       <ffffffff80110543>{child_rip+0}
kjournald     S ffffffff8012f2a0   195      1           196   194 (L-TLB)
000001003e5f3e68 0000000000000046 ffffffff80311600 000001003e4a6a78
       ffffffff802bb520 000000738012f388 000001003f6debd0 ffffffff802bb520 
       000001003e4a6a98 000001003e5f3eb8
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0} 
       <ffffffff80110543>{child_rip+0}
kjournald     S ffffffff8012f2a0   196      1           200   195 (L-TLB)
000001003e4d5e68 0000000000000046 ffffffff80311600 0000010036f6b160
       ffffffff802bb520 0000007536f69fa0 000001003f6dc990 ffffffff802bb520 
       000001003e4a6898 000001003e4d5eb8
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
       <ffffffff80110543>{child_rip+0} 
reiserfs/0    S 0000000000000000   199      3                   8 (L-TLB)
000001003ee49e88 0000000000000046 ffffffff80311600 0000000000000206 
       0000000000000000 00000076a001f8bf 000001003f6dc100 0000010031492e10
       000001003fc8c570 000001003ee49ed8 
Call Trace:<ffffffff80140c6c>{worker_thread+300} 
<ffffffff8012f340>{default_wake_function+0}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff80140b40>{worker_thread+0}
       <ffffffff80140b40>{worker_thread+0} <ffffffff80143d86>{kthread+54}
       <ffffffff8011054b>{child_rip+8} 
<ffffffff80143db0>{keventd_create_kthread+0}
       <ffffffff80143d50>{kthread+0} <ffffffff80110543>{child_rip+0}

kjournald     S ffffffff8012f2a0   200      1           240   196 (L-TLB)
000001003f09de68 0000000000000046 ffffffff80311600 000001003e4a6078 
       ffffffff802bb520 000000738012f388 000001003e4cf4a0 ffffffff802bb520
       000001003e4a6098 000001003f09deb8 
Call Trace:<ffffffff801aef67>{kjournald+455} 
<ffffffff801307a0>{autoremove_wake_function+0}
       <ffffffff801307a0>{autoremove_wake_function+0} 
<ffffffff801aed80>{commit_timeout+0}
       <ffffffff8011054b>{child_rip+8} <ffffffff801aeda0>{kjournald+0}
       <ffffffff80110543>{child_rip+0} 
scsi_eh_0     S 000001003f5a7dc0   240      1          1875   200 (L-TLB)
000001003f31fe48 0000000000000046 ffffffff80311600 0000000000000008 
       000001003f5e9ac0 0000007d801340a3 000001003e4ce380 000001003e4cec10
       000001003f31ff18 ffffffff80133d21 
Call Trace:<ffffffff80133d21>{reparent_to_init+481} 
<ffffffff8010ede6>{__down_interruptible+198}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff801bbe81>{__down_failed_interruptible+53}
       <ffffffffa00531c4>{:scsi_mod:.text.lock.scsi_error+65}
       <ffffffff8011054b>{child_rip+8} 
<ffffffffa0052ea0>{:scsi_mod:scsi_error_handler+0}
       <ffffffff80110543>{child_rip+0} 
bash          S ffffffff80159b41  1875      1  1928    1895   240 (NOTLB)
00000100398ebeb8 0000000000000002 ffffffff80311600 00000100398ebf58 
       00000000005c7ac0 000000783e4cd260 000001003e4cd260 000001003f6df460
       000001003f6df460 ffffffff80131e39 
Call Trace:<ffffffff80131e39>{copy_process+2265} 
<ffffffff80135556>{sys_wait4+598}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff8012f340>{default_wake_function+0}
       <ffffffff8010fb60>{system_call+124} 
sshd          S 0000000000000256  1895      1  1929          1875 (NOTLB)
0000010039915d88 0000000000000006 ffffffff80311600 0000000000000000 
       000001003fd0e300 000000758014f070 000001003fd0e300 000001003f6dd220
       0000000000000246 0000000000000000
Call Trace:<ffffffff8013a2fe>{schedule_timeout+30} 
<ffffffff802482e1>{tcp_poll+33} 
       <ffffffff8017914a>{do_select+778} <ffffffff80178c80>{__pollwait+0}
       <ffffffff801795b0>{sys_select+992} <ffffffff8010fb60>{system_call+124} 

slabdiff.py   S 0000000000000000  1928   1875                     (NOTLB)
000001003d78fd88 0000000000000006 ffffffff80311600 ffffffff801ec504
       000000000000000a 0000000000000202 000001003f6df460 ffffffff802bb520
       000000000000000a ffffffff801eee2d
Call Trace:<ffffffff801ec504>{lf+36} <ffffffff801eee2d>{do_con_write+1581}
       <ffffffff8013a37e>{schedule_timeout+158} 
<ffffffff8013a2d0>{process_timeout+0}
       <ffffffff8017914a>{do_select+778} <ffffffff801e3f07>{write_chan+551}
       <ffffffff80178c80>{__pollwait+0} <ffffffff801795b0>{sys_select+992}
       <ffffffff8010fb60>{system_call+124} 
sshd          S ffffffff8012f2a0  1929   1895  1932               (NOTLB)
000001003a4a1d88 0000000000000006 ffffffff80311600 00000000000001f7
       000001003fd0da70 0000007d00000000 000001003f6dd220 000001003fd0da70
       000000003a4a1ec0 000000d000000010 
Call Trace:<ffffffff8013a2fe>{schedule_timeout+30} 
<ffffffff801e4f0b>{pty_write_room+43}
       <ffffffff801e406c>{normal_poll+316} <ffffffff8017914a>{do_select+778} 
       <ffffffff80178c80>{__pollwait+0} <ffffffff801795b0>{sys_select+992}
       <ffffffff8010fb60>{system_call+124} 
bash          S ffffffff80159b41  1932   1929  1963               (NOTLB)
0000010036f1deb8 0000000000000002 ffffffff80311600 0000010036f1df58 
       00000000005b6018 0000007d3fd0d1e0 000001003fd0d1e0 000001003fd0da70
       000001003fd0da70 ffffffff80131e39 
Call Trace:<ffffffff80131e39>{copy_process+2265} 
<ffffffff8010fbe9>{sysret_signal+28}
       <ffffffff80135556>{sys_wait4+598} 
<ffffffff8012f340>{default_wake_function+0}
       <ffffffff8012f340>{default_wake_function+0} 
<ffffffff8010fb60>{system_call+124}

ls            R   current task    1963   1932                     (NOTLB)
000001003255df70 0000000000000006 ffffffff80311600 00000000000000a1
       000001003f6dd220 0000007500000000 000001003fd0da70 000001003f6dd220
       0000000000518200 000000000040c921
Call Trace:<ffffffff80110152>{retint_careful+13}

Please feel free to ask for more information. (please cc me in replies)

Thanks
Hari
harisri@bigpond.com


* Re: [PROBLEM] r8169 deadlocks
  2004-01-15  9:38 [PROBLEM] r8169 deadlocks Srihari Vijayaraghavan
@ 2004-01-15 21:08 ` Francois Romieu
  2004-01-17  1:34   ` Srihari Vijayaraghavan
  2004-01-19 11:51   ` Srihari Vijayaraghavan
  0 siblings, 2 replies; 11+ messages in thread
From: Francois Romieu @ 2004-01-15 21:08 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: netdev

Srihari Vijayaraghavan <harisri@bigpond.com> :
[...]
> Consider desktop as the computer with the RealTek r8169 card and laptop from 
> where I perform these steps:
> 1. ssh desktop
> 2. while true; do ls -la /; done
> 3. In few seconds the desktop computer hangs
> (And of course at the laptop computer the ssh session hangs)
> 
> Here is the sysrq-p from the desktop computer (captured using serial-console):
> Pid: 1963, comm: ls Not tainted
> RIP: 0010:[<ffffffffa008afd9>] 
> <ffffffffa008afd9>{:r8169:rtl8169_tx_interrupt+73}
> RSP: 0000:ffffffff80374dc8  EFLAGS: 00000286
> RAX: 0000000000000420 RBX: ffffffff80374d18 RCX: 0000010000399000
> RDX: ffffffff80370e80 RSI: 000000003525d05e RDI: 0000000080391bf0
> RBP: ffffffff801100d9 R08: 0000000000000007 R09: 0000000000000000
> R10: 0000002a95587de0 R11: 0000000000000003 R12: 0000000000000042
> R13: 0000000000000001 R14: 00000000000000bc R15: 000001003f7d1340
> FS:  00000000005144a0(005b) GS:ffffffff80370e80(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000002a957876d0 CR3: 0000000000101000 CR4: 00000000000006a0
> 
> Call Trace:<IRQ> <ffffffffa008b3c8>{:r8169:rtl8169_interrupt+120} 
> <ffffffff8011222f>{handle_IRQ_event+47}
>        <ffffffff801123b3>{do_IRQ+147} <ffffffff801100d9>{ret_from_intr+0}
>         <EOI> <ffffffff80110152>{retint_careful+13}

*head scratch*

Can you monitor 'vmstat 1' output on the r8169 host during the test ?

You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 + 
http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch 

If it does not perform better, you can try against 2.6.1-bk1 the set at
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b

If I remember correctly, yours is the first report of a not completely
dysfunctional driver for the new version of the r8169. Things are improving.

--
Ueimor


* Re: [PROBLEM] r8169 deadlocks
  2004-01-15 21:08 ` Francois Romieu
@ 2004-01-17  1:34   ` Srihari Vijayaraghavan
  2004-01-17 12:53     ` Francois Romieu
  2004-01-19 11:51   ` Srihari Vijayaraghavan
  1 sibling, 1 reply; 11+ messages in thread
From: Srihari Vijayaraghavan @ 2004-01-17  1:34 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Hello Francois,

On Friday 16 January 2004 08:08, Francois Romieu wrote:
> *head scratch*

Sorry :-)

> Can you monitor 'vmstat 1' output on the r8169 host during the test ?

The computer deadlocks within a few seconds (3 to 5), and it hangs everything 
including vmstat, so its output never makes it as far as the file system. I 
will write it down by hand and post it.

Here is the sysrq-m output from when it hung; maybe it provides some of what 
you wanted from vmstat:
SysRq : Show Memory
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty

Free pages:      987624kB (0kB HighMem)
Active:2407 inactive:1605 dirty:1 writeback:0 unstable:0 free:246906
DMA free:13256kB min:12kB low:24kB high:36kB active:0kB inactive:0kB
Normal free:974368kB min:1004kB low:2008kB high:3012kB active:9628kB inactive:6420kB
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB
DMA: 0*4kB 1*8kB 0*16kB 2*32kB 2*64kB 2*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 3*4096kB = 13256kB
Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 237*4096kB = 974368kB
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap:            0kB
262128 pages of RAM
5950 reserved pages
2977 pages shared
0 pages swap cached

> You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch

I shall try this and then report the status.

> If it does not perform better, you can try against 2.6.1-bk1 the set at
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b

OK. I have tried 2.6.1-mm4 which includes the most recent -netdev updates from 
Jeff Garzik and it behaves the same way.

> If I remember correctly, you are the first report of a non-completely
> disfunctional driver for the new version of the r8169. Things improve.

Sorry I am unable to understand your statement.

Thanks
Hari
harisri@bigpond.com


* Re: [PROBLEM] r8169 deadlocks
  2004-01-17  1:34   ` Srihari Vijayaraghavan
@ 2004-01-17 12:53     ` Francois Romieu
  0 siblings, 0 replies; 11+ messages in thread
From: Francois Romieu @ 2004-01-17 12:53 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: netdev

Srihari Vijayaraghavan <harisri@bigpond.com> :
[memory stats]

Ok, the driver does not seem to leak.

[...]
> > You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> > http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
> 
> I shall try this and then report the status.

Please (see "Scenario" below).

> > If it does not perform better, you can try against 2.6.1-bk1 the set at
> > http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b
> 
> OK. I have tried 2.6.1-mm4 which includes the most recent -netdev updates
> from Jeff Garzik and it behaves the same way.
> 
> > If I remember correctly, you are the first report of a non-completely
> > disfunctional driver for the new version of the r8169. Things improve.
> 
> Sorry I am unable to understand your statement.

Tests have shown that stock r8169 is foobar on amd64 without Realtek's
changes.  The r8169 in -mm and -netdev merges various changes made by Realtek
and several contributors. Tests have shown that this modified r8169 was
completely broken. Your report indicates that the latest modified r8169 is
(slowly) returning to sanity on amd64. Nice :o)

r8169-tx-index-overflow.patch has not been included in -mm nor in -netdev
so far. It has only been moderately tested on x86 so amd64 users are welcome.
I do not claim it will solve everything but nasty things [*] can happen
without it.

[*] Scenario:
While submitting skbs, start_xmit crosses the end of the Tx descriptor ring and
feeds the start of the ring again (so far, so good). It is possible/expected
that several skbs are pending, especially as the start_xmit function uses
posted pci writes to tell the asic that it must wake up. Later, the Tx irq
handler notices that the first pending buffer was sent. Now, depending on the
state of the memory just after the end of the Tx descriptor ring, interesting
things (deadlock included) can happen.
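
The producer side of this scenario can be sketched in a few lines of C. This
is only an illustration, not code from r8169.c: struct tx_ring, tx_pending()
and tx_queue_one() are hypothetical names, and the only things taken from the
driver are the cur_tx/dirty_tx counters and NUM_TX_DESC = 64. It shows the
"so far, so good" part: the counters run freely and are reduced modulo the
ring size only when a slot is addressed, so crossing the end of the ring is
harmless here.

```c
#include <assert.h>

#define NUM_TX_DESC 64	/* ring size quoted in this thread */

/* Hypothetical stand-in for the driver's private state: two free-running
 * counters, reduced modulo NUM_TX_DESC only when a slot is addressed. */
struct tx_ring {
	unsigned long cur_tx;	/* next descriptor start_xmit will fill */
	unsigned long dirty_tx;	/* oldest descriptor not yet reclaimed */
};

/* Number of skbs handed to the asic and not yet reclaimed. */
static unsigned long tx_pending(const struct tx_ring *r)
{
	return r->cur_tx - r->dirty_tx;
}

/* Queue one skb: return the slot used and advance the producer counter. */
static unsigned int tx_queue_one(struct tx_ring *r)
{
	unsigned int slot;

	assert(tx_pending(r) < NUM_TX_DESC);	/* ring must not be full */
	slot = r->cur_tx % NUM_TX_DESC;		/* wraps from 63 back to 0 */
	r->cur_tx++;
	return slot;
}
```

Starting from cur_tx = dirty_tx = 63, two calls to tx_queue_one() use slots 63
and 0, and tx_pending() then reports 2 pending skbs.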

Take a look at rtl8169_tx_interrupt() and assume that tp->dirty_tx = 63 and
tp->cur_tx = 63 + 48 = 111. "entry" starts at tp->cur_tx % NUM_TX_DESC = 47
and can be incremented by tp->cur_tx - tp->dirty_tx = 48 units, thus ending
waaaaayyy beyond the end of the allowed Tx descriptor ring (NUM_TX_DESC = 64
entries). If something in this memory area looks like a Tx descriptor which
is owned by the asic, the irq handler loops for life. If this memory area
looks like a Tx descriptor which belongs to the cpu, the irq handler will
free the skb and the asic may simply send crap on the wire.
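
The arithmetic above can be checked with a small stand-alone sketch. It is not
the driver code and not the content of r8169-tx-index-overflow.patch:
highest_index_unwrapped() only mimics the index walk described in this
paragraph, and highest_index_wrapped() is an assumption about what a bounded
walk could look like (start from dirty_tx and reduce the index modulo
NUM_TX_DESC at every step). Both function names are made up for illustration.

```c
#include <assert.h>

#define NUM_TX_DESC 64	/* ring size quoted in this thread */

/* Walk described above: "entry" starts at cur_tx % NUM_TX_DESC and is
 * incremented once per pending descriptor without ever wrapping.
 * Returns the highest descriptor index the walk would inspect. */
static unsigned long highest_index_unwrapped(unsigned long cur_tx,
					     unsigned long dirty_tx)
{
	unsigned long entry = cur_tx % NUM_TX_DESC;
	unsigned long tx_left = cur_tx - dirty_tx;
	unsigned long highest = 0;

	assert(tx_left <= NUM_TX_DESC);
	while (tx_left > 0) {
		if (entry > highest)
			highest = entry;
		entry++;	/* never reduced modulo NUM_TX_DESC */
		tx_left--;
	}
	return highest;
}

/* Assumed bounded variant: derive the index from dirty_tx on every
 * iteration so that it always stays inside the ring. */
static unsigned long highest_index_wrapped(unsigned long cur_tx,
					   unsigned long dirty_tx)
{
	unsigned long tx_left = cur_tx - dirty_tx;
	unsigned long highest = 0;

	assert(tx_left <= NUM_TX_DESC);
	while (tx_left > 0) {
		unsigned long entry = dirty_tx % NUM_TX_DESC;

		if (entry > highest)
			highest = entry;
		dirty_tx++;
		tx_left--;
	}
	return highest;
}
```

With dirty_tx = 63 and cur_tx = 63 + 48, the unwrapped walk inspects indices
47 through 94, i.e. 31 slots past the last valid index 63, while the wrapped
walk visits 63, 0, 1, ... 46 and never leaves the ring.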

If this explanation is right, it applies to 2.4.x as well. However, that is
surprising, as Robert Olsson was able to send packets at rather high rates
with the Realtek variant of this driver (where the start_xmit/tx_interrupt
functions are identical).

So, please, please, test in a sane environment (no binary modules) and tell
me if things behave the same/better/worse.

--
Ueimor


* Re: [PROBLEM] r8169 deadlocks
  2004-01-15 21:08 ` Francois Romieu
  2004-01-17  1:34   ` Srihari Vijayaraghavan
@ 2004-01-19 11:51   ` Srihari Vijayaraghavan
  2004-01-19 23:24     ` Francois Romieu
  1 sibling, 1 reply; 11+ messages in thread
From: Srihari Vijayaraghavan @ 2004-01-19 11:51 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Hello Francois,

On Friday 16 January 2004 08:08, Francois Romieu wrote:
> Can you monitor 'vmstat 1' output on the r8169 host during the test ?

Here it is (2.6.1-bk2-netdev4):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 887000  10820  87992    0    0   291    47 1051   170  7  4 83  6
 0  0      0 886800  10820  87992    0    0     0   108 1042    13  0  0 100  0
 0  0      0 886800  10820  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10820  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10820  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10836  87992    0    0     0    56 1013    18  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     4 1009    10  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10840  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0    16 1011    11  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     7  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10848  87992    0    0     0     0 1008     3  0  0 100  0
 0  0      0 886800  10856  87992    0    0     0    16 1011    11  0  0 100  0
 0  0      0 886800  10864  87992    0    0     0   140 1037    12  0  0 100  0
 0  0      0 886800  10864  87992    0    0     0     0 1008     3  0  0 100  0
 2  0      0 886472  10864  87992    0    0     0     0 1305  1958 12 12 76  0

It hung at the final entry.

> You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch

The r8169-tx-index-overflow.patch does not apply cleanly on 2.6.1-bk2 + 
netdev4.

> If it does not perform better, you can try against 2.6.1-bk1 the set at
> http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.1-bk1-b

I am yet to try this combination.

Thanks
Hari


* Re: [PROBLEM] r8169 deadlocks
  2004-01-19 11:51   ` Srihari Vijayaraghavan
@ 2004-01-19 23:24     ` Francois Romieu
  2004-01-20 10:50       ` Srihari Vijayaraghavan
  0 siblings, 1 reply; 11+ messages in thread
From: Francois Romieu @ 2004-01-19 23:24 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: netdev

Srihari Vijayaraghavan <harisri@bigpond.com> :
[vmstat 1 output]

Ok, mostly idle.

[...]
> > You can try 2.6.1-bk2 + Jeff Garzik's -netdev4 +
> > http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
> 
> The r8169-tx-index-overflow.patch does not (cleanly) apply on 2.6.1-bk2 + 
> netdev4.

Can you verify that your kernel tree is fine or give an (sh-)history of
the applied patches ?

I have just checked and the patch applies cleanly on kernel 2.6.1-bk2 +
Jeff's 2.6.1-bk1-netdev4 as well as on kernel 2.6.1-bk4 + Jeff's
2.6.1-bk4-netdev1. 

--
Ueimor


* Re: [PROBLEM] r8169 deadlocks
  2004-01-19 23:24     ` Francois Romieu
@ 2004-01-20 10:50       ` Srihari Vijayaraghavan
  2004-01-20 20:52         ` Francois Romieu
  0 siblings, 1 reply; 11+ messages in thread
From: Srihari Vijayaraghavan @ 2004-01-20 10:50 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Hello Francois,

On Tuesday 20 January 2004 10:24, Francois Romieu wrote:
> > The r8169-tx-index-overflow.patch does not (cleanly) apply on 2.6.1-bk2 +
> > netdev4.
>
> Can you verify that your kernel tree is fine or give an (sh-)history of
> the applied patches ?

cd /usr/local/src
tar xfj /media/cdrecorder/v2.6/linux-2.6.0.tar.bz2
cd linux-2.6.0
bunzip2 -c /media/cdrecorder/v2.6/patch-2.6.1.bz2 |patch -p1
bunzip2 -c ~/linux/patch-2.6.1-bk2.bz2 |patch -p1
bunzip2 -c ~/linux/2.6.1-bk1-netdev4.patch.bz2 |patch -p1
patch -p1 --dry-run < ~/linux/r8169/r8169-tx-index-overflow.patch
patching file drivers/net/r8169.c
Hunk #1 succeeded at 1341 (offset 364 lines).
Hunk #2 FAILED at 1351.
Hunk #3 succeeded at 1365 with fuzz 1 (offset 367 lines).
1 out of 3 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej

> I have just checked and the patch applies cleanly on kernel 2.6.1-bk2 +
> Jeff's 2.6.1-bk1-netdev4 as well as on kernel 2.6.1-bk4 + Jeff's
> 2.6.1-bk4-netdev1.

Interesting.

In this very thread (in a message on which you did not cc me, BTW :-) you 
mentioned that you welcomed AMD64-RTL8169 users, which gave me an idea. I 
tested this computer under a 32-bit kernel (vanilla Fedora + 2.6.1-mm4), where 
it survives my torture test (I have run it for no more than 5 minutes, but 
under the 64-bit kernel it does not survive for more than 5 seconds).

(And BTW I do not like binary-only kernel modules. I do this bug reporting 
"for fun", and there is no fun in binary-only modules. I have been reading 
lkml for long enough to understand that :-)

Thanks for the help and suggestions so far, I appreciate them.

Hari


* Re: [PROBLEM] r8169 deadlocks
  2004-01-20 10:50       ` Srihari Vijayaraghavan
@ 2004-01-20 20:52         ` Francois Romieu
  2004-01-21 10:15           ` Srihari Vijayaraghavan
  0 siblings, 1 reply; 11+ messages in thread
From: Francois Romieu @ 2004-01-20 20:52 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: netdev

Srihari Vijayaraghavan <harisri@bigpond.com> :
> On Tuesday 20 January 2004 10:24, Francois Romieu wrote:
[...]
> cd /usr/local/src
> tar xfj /media/cdrecorder/v2.6/linux-2.6.0.tar.bz2
> cd linux-2.6.0
> bunzip2 -c /media/cdrecorder/v2.6/patch-2.6.1.bz2 |patch -p1
> bunzip2 -c ~/linux/patch-2.6.1-bk2.bz2 |patch -p1
> bunzip2 -c ~/linux/2.6.1-bk1-netdev4.patch.bz2 |patch -p1
> patch -p1 --dry-run < ~/linux/r8169/r8169-tx-index-overflow.patch
> patching file drivers/net/r8169.c
> Hunk #1 succeeded at 1341 (offset 364 lines).
> Hunk #2 FAILED at 1351.
> Hunk #3 succeeded at 1365 with fuzz 1 (offset 367 lines).
> 1 out of 3 hunks FAILED -- saving rejects to file drivers/net/r8169.c.rej

$ cat>foo<<EOD
tar jxf linux-2.6.0.tar.bz2 
bunzip2 -c patch-2.6.1.bz2 | patch -p1 -d linux-2.6.0
bunzip2 -c patch-2.6.1-bk2.bz2 | patch -p1 -d linux-2.6.0
bunzip2 -c 2.6.1-bk1-netdev4.patch.bz2 | patch -p1 -d linux-2.6.0
wget http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
EOD
$ sh foo
[...]
$ patch -p1 -d linux-2.6.0 < r8169-tx-index-overflow.patch 
patching file drivers/net/r8169.c

Okay...

$ md5sum r8169-tx-index-overflow.patch 
99b2f5886d6bf1d4df0f7553bb5bef57  r8169-tx-index-overflow.patch

[...]
> In this very thread you mentioned (in which you did not cc me BTW :-) that you 
> welcomed AMD64-RTL8169 users, that gave me an idea. I tested this computer 

I did :o)

   ----- The following addresses had permanent fatal errors -----
<harisri@bigpond.com>
    (reason: 554 recipient <harisri@bigpond.com> exceeds mailbox storage quota)

> under 32 bit kernel (vanilla Fedora + 2.6.1-mm4) in which it survives my 
> torture test (I have verified for no more than 5 minutes though, but then it 
> does not survive for more than 5 secs under the 64 bit kernel).

Point taken.

--
Ueimor


* Re: [PROBLEM] r8169 deadlocks
  2004-01-20 20:52         ` Francois Romieu
@ 2004-01-21 10:15           ` Srihari Vijayaraghavan
  2004-01-21 23:59             ` Francois Romieu
  0 siblings, 1 reply; 11+ messages in thread
From: Srihari Vijayaraghavan @ 2004-01-21 10:15 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Hello Francois,

On Wednesday 21 January 2004 07:52, Francois Romieu wrote:
> [snip]
> $ cat>foo<<EOD
> tar jxf linux-2.6.0.tar.bz2
> bunzip2 -c patch-2.6.1.bz2 | patch -p1 -d linux-2.6.0
> bunzip2 -c patch-2.6.1-bk2.bz2 | patch -p1 -d linux-2.6.0
> bunzip2 -c 2.6.1-bk1-netdev4.patch.bz2 | patch -p1 -d linux-2.6.0
> wget
> http://www.fr.zoreil.com/people/francois/misc/r8169-tx-index-overflow.patch
> EOD
> $ sh foo
> [...]
> $ patch -p1 -d linux-2.6.0 < r8169-tx-index-overflow.patch
> patching file drivers/net/r8169.c
>
> Okay...
>
> $ md5sum r8169-tx-index-overflow.patch
> 99b2f5886d6bf1d4df0f7553bb5bef57  r8169-tx-index-overflow.patch
>
> [...]

Must be my mistake. Thanks for verifying things.

> I did :o)
>
>    ----- The following addresses had permanent fatal errors -----
> <harisri@bigpond.com>
>     (reason: 554 recipient <harisri@bigpond.com> exceeds mailbox storage
> quota)

:-) I apologize for that. God save me from the spammers!

> > under 32 bit kernel (vanilla Fedora + 2.6.1-mm4) in which it survives my
> > torture test (I have verified for no more than 5 minutes though, but then
> > it does not survive for more than 5 secs under the 64 bit kernel).
>
> Point taken.

I have good news: I tested as usual under vanilla linux-2.6.2-rc1, and to
my surprise the kernel does not hang anymore :-) (although the Tx counter
not incrementing is another problem altogether).

If you want me to, I can test your r8169-tx-index-overflow.patch.

Thanks
Hari 

* Re: [PROBLEM] r8169 deadlocks
  2004-01-21 10:15           ` Srihari Vijayaraghavan
@ 2004-01-21 23:59             ` Francois Romieu
  2004-01-22 10:32               ` Srihari Vijayaraghavan
  0 siblings, 1 reply; 11+ messages in thread
From: Francois Romieu @ 2004-01-21 23:59 UTC (permalink / raw)
  To: Srihari Vijayaraghavan; +Cc: netdev, jgarzik

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

Srihari Vijayaraghavan <harisri@bigpond.com> :
[...]
> I have good news: I tested as usual under vanilla linux-2.6.2-rc1, and to
> my surprise the kernel does not hang anymore :-) (although the Tx counter
> not incrementing is another problem altogether).

r8169 did not change between 2.6.1 and 2.6.2-rc1. The change in behavior
probably comes from some other part of the kernel. Joy.

> If you want me to, I can test your r8169-tx-index-overflow.patch.

See attachment. It applies against plain 2.6.2-rc1.

--
Ueimor

[-- Attachment #2: r8169.c-diff --]
[-- Type: text/plain, Size: 872 bytes --]

--- linux-2.6.2-rc1/drivers/net/r8169.c.orig	2004-01-22 00:41:03.000000000 +0100
+++ linux-2.6.2-rc1/drivers/net/r8169.c	2004-01-22 00:46:46.000000000 +0100
@@ -871,7 +871,6 @@ rtl8169_tx_interrupt(struct net_device *
 		     void *ioaddr)
 {
 	unsigned long dirty_tx, tx_left = 0;
-	int entry = tp->cur_tx % NUM_TX_DESC;
 
 	assert(dev != NULL);
 	assert(tp != NULL);
@@ -881,14 +880,14 @@ rtl8169_tx_interrupt(struct net_device *
 	tx_left = tp->cur_tx - dirty_tx;
 
 	while (tx_left > 0) {
+		int entry = dirty_tx % NUM_TX_DESC;
+
 		if ((tp->TxDescArray[entry].status & OWNbit) == 0) {
-			dev_kfree_skb_irq(tp->
-					  Tx_skbuff[dirty_tx % NUM_TX_DESC]);
-			tp->Tx_skbuff[dirty_tx % NUM_TX_DESC] = NULL;
+			dev_kfree_skb_irq(tp->Tx_skbuff[entry]);
+			tp->Tx_skbuff[entry] = NULL;
 			tp->stats.tx_packets++;
 			dirty_tx++;
 			tx_left--;
-			entry++;
 		}
 	}
 

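For readers following the diff above: before the patch, `entry` was seeded
from `tp->cur_tx` and then incremented without a modulo, while the skb being
freed was indexed from `dirty_tx`. The following is only a user-space sketch
of that index arithmetic, not driver code. The ring size of 64, the
`walk_result` struct and the `walk_old`/`walk_new` helpers are assumptions
made for illustration, and the sketch assumes every descriptor has already
been released by the hardware.

```c
#include <stdio.h>

#define NUM_TX_DESC 64	/* assumed ring size, for illustration only */

struct walk_result {
	unsigned long mismatches;	/* status index != skb index */
	unsigned long out_of_range;	/* status index >= NUM_TX_DESC */
};

/* Pre-patch indexing: entry starts at cur_tx and is never wrapped. */
static struct walk_result walk_old(unsigned long cur_tx, unsigned long dirty_tx)
{
	struct walk_result r = { 0, 0 };
	unsigned long tx_left = cur_tx - dirty_tx;
	unsigned long entry = cur_tx % NUM_TX_DESC;

	while (tx_left > 0) {
		unsigned long skb_idx = dirty_tx % NUM_TX_DESC;

		if (entry >= NUM_TX_DESC)
			r.out_of_range++;
		if (entry != skb_idx)
			r.mismatches++;
		dirty_tx++;
		tx_left--;
		entry++;	/* no modulo: can walk past the ring */
	}
	return r;
}

/* Post-patch indexing: one index, derived from dirty_tx on each pass. */
static struct walk_result walk_new(unsigned long cur_tx, unsigned long dirty_tx)
{
	struct walk_result r = { 0, 0 };
	unsigned long tx_left = cur_tx - dirty_tx;

	while (tx_left > 0) {
		unsigned long entry = dirty_tx % NUM_TX_DESC;
		unsigned long skb_idx = entry;	/* same slot for status and skb */

		if (entry >= NUM_TX_DESC)
			r.out_of_range++;
		if (entry != skb_idx)
			r.mismatches++;
		dirty_tx++;
		tx_left--;
	}
	return r;
}
```

With `cur_tx = 124` and `dirty_tx = 114`, `walk_old` starts checking at
slot 60 while the skbs to free sit in slots 50..59, and its last six status
lookups land on indices 64..69, past the end of the assumed 64-entry ring.
`walk_new` reports no mismatch and no out-of-range index for the same input.
Whether this is what caused the original hang is not established in the
thread.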
* Re: [PROBLEM] r8169 deadlocks
  2004-01-21 23:59             ` Francois Romieu
@ 2004-01-22 10:32               ` Srihari Vijayaraghavan
  0 siblings, 0 replies; 11+ messages in thread
From: Srihari Vijayaraghavan @ 2004-01-22 10:32 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, jgarzik

Hello Francois,

On Thursday 22 January 2004 10:59, Francois Romieu wrote:
> [...]
> r8169 did not change between 2.6.1 and 2.6.2-rc1. The change in behavior
> probably comes from some other part of the kernel. Joy.
>
> > If you want me to, I can test your r8169-tx-index-overflow.patch.
>
> See attachment. It applies against plain 2.6.2-rc1.

OK. I have applied your patch and I have tested my computer under moderate to 
high network load. Although I may not have the kind of network load to 
trigger the bug your patch is meant to attack, I can confirm there is no 
stability issue with your patch.

Thanks
Hari

Thread overview: 11+ messages
2004-01-15  9:38 [PROBLEM] r8169 deadlocks Srihari Vijayaraghavan
2004-01-15 21:08 ` Francois Romieu
2004-01-17  1:34   ` Srihari Vijayaraghavan
2004-01-17 12:53     ` Francois Romieu
2004-01-19 11:51   ` Srihari Vijayaraghavan
2004-01-19 23:24     ` Francois Romieu
2004-01-20 10:50       ` Srihari Vijayaraghavan
2004-01-20 20:52         ` Francois Romieu
2004-01-21 10:15           ` Srihari Vijayaraghavan
2004-01-21 23:59             ` Francois Romieu
2004-01-22 10:32               ` Srihari Vijayaraghavan