Netdev List
 help / color / mirror / Atom feed
* FW:  ccid2/ccid3 oopses
From: devzero @ 2008-01-08 22:06 UTC (permalink / raw)
  To: dccp; +Cc: netdev

Hello !

as suggested by Ian McDonald, i`m forwarding this to dccp and netdev mailing lists.


> hi !
>
> i know dccp_ccid2 and ccid3 modules are considered experimental, but i`m unsure if i probably triggered a bug inside or outside that modules here (i`m no kernel developer)
>
> apparently, i got crashes when loading/unloading other driver modules just after ccid2 or ccid3 had been loaded/unloaded _once_ (have not used them at all, just modprobe module;modprobe -r module)
>
> this was detected during some hardcore module load/unload testing session and apparently these modules seem to be the root cause of other modules crashing, so they seem to leave the system in an inconsistent state after load/unload.
>
> this can be reproduced with recent 2.6.24rc6 kernel which was mostly built with allmodconfig.
> i could not reproduce this with a more minimalistic configuration, e.g. the suse kernel of the day runs fine.
>
> the easiest way to reproduce is:
>
> while true;do modprobe dccp_ccid2/3;modprobe -r dccp_ccid2/3;done
> after short time, the kernel oopses (messages below)
>
> i`m not sure if this is worth to be filed at kernel bugzilla, so i`m contacting you personally first.
>
> i`d happily assist in helping debug this or provide more input (.config etc) if you want to take a look.
>
> regards
> Roland 
>
>
> [ 2322.177054] CCID: Unregistered CCID 2 (ccid2)
> [ 2322.377927] CCID: Registered CCID 2 (ccid2)
>
> [ 2322.413793] BUG: unable to handle kernel paging request at virtual address 40000864
> [ 2322.425066] printing eip: c01792e1 *pde = 00000000
> [ 2322.431523] Oops: 0000 [#1] SMP
> [ 2322.435249] Modules linked in: dccp_ccid2 dccp edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci parport_pc floppy ata_generic parport pcnet32 rtc_cmos libata rtc_core rtc_lib mii pcspkr container thermal piix generic i2c_piix4 processor button ac i2c_core power_supply shpchp ide_core intel_agp pci_hotplug agpgart mousedev evdev sg ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi ehci_hcd uhci_hcd scsi_mod usbcore
> [ 2322.489115]
> [ 2322.491535] Pid: 1730, comm: kjournald Not tainted (2.6.24-rc6 #4)
> [ 2322.497266] EIP: 0060:[<c01792e1>] EFLAGS: 00010002 CPU: 0
> [ 2322.503205] EIP is at kmem_cache_alloc+0x5d/0xa6
> [ 2322.508789] EAX: 00000000 EBX: 00000282 ECX: c03750a0 EDX: 40000864
> [ 2322.514864] ESI: c1408314 EDI: 40000864 EBP: c03750a0 ESP: df9cfe94
> [ 2322.521110]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 2322.527346] Process kjournald (pid: 1730, ti=df9ce000 task=deaf31a0 task.ti=df9ce000)
> [ 2322.535722] Stack: c016094d c1408314 40000864 00011200 df4032a0 df408e40 def45000 0000000f
> [ 2322.545350]        00011210 c016094d 00000010 e08cb15f 00000000 dead9c00 df161ab8 00000004
> [ 2322.556833]        00000000 def45000 0000000f 00000000 c019acdf 00000000 df402940 00000010
> [ 2322.565168] Call Trace:
> [ 2322.568637]  [<c016094d>] mempool_alloc+0x24/0xc2
> [ 2322.573169]  [<c016094d>] mempool_alloc+0x24/0xc2
> [ 2322.577175]  [<e08cb15f>] __journal_file_buffer+0x9b/0x11c [jbd]
> [ 2322.585033]  [<c019acdf>] bio_alloc_bioset+0x8c/0xe6
> [ 2322.589301]  [<c019ad44>] bio_alloc+0xb/0x17
> [ 2322.593309]  [<c0197688>] submit_bh+0x6e/0xf8
> [ 2322.597358]  [<e08ccdba>] journal_commit_transaction+0x6de/0xbe8 [jbd]
> [ 2322.605109]  [<c013095c>] lock_timer_base+0x19/0x35
> [ 2322.610478]  [<e08cf9e6>] kjournald+0xae/0x1dd [jbd]
> [ 2322.616182]  [<c0139985>] autoremove_wake_function+0x0/0x33
> [ 2322.621341]  [<e08cf938>] kjournald+0x0/0x1dd [jbd]
> [ 2322.628588]  [<c01398bc>] kthread+0x38/0x60
> [ 2322.633306]  [<c0139884>] kthread+0x0/0x60
> [ 2322.637365]  [<c0107beb>] kernel_thread_helper+0x7/0x10
> [ 2322.645002]  =======================
> [ 2322.649049] Code: 3e 85 ff 89 7c 24 08 75 1b 89 14 24 8b 54 24 0c 83 c9 ff 89 e8 89 74 24 04 e8 2b fb ff ff 89 44 24 08 eb 0c 8b 54 24 08 8b 46 0c <8b> 04 82 89 06 89 d8 50 9d 0f 1f 84 00 00 00 00 00 66 83 7c 24
> [ 2322.673340] EIP: [<c01792e1>] kmem_cache_alloc+0x5d/0xa6 SS:ESP 0068:df9cfe94
> [ 2322.681327] ---[ end trace 35dbcab07ee48cc5 ]---
> [ 2322.737700] ------------[ cut here ]------------
> [ 2322.748822] Kernel BUG at c0199e6d [verbose debug info unavailable]
> [ 2322.755960] invalid opcode: 0000 [#2] SMP
> [ 2322.760773] Modules linked in: dccp_ccid2 dccp edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci parport_pc floppy ata_generic parport pcnet32 rtc_cmos libata rtc_core rtc_lib mii pcspkr container thermal piix generic i2c_piix4 processor button ac i2c_core power_supply shpchp ide_core intel_agp pci_hotplug agpgart mousedev evdev sg ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi ehci_hcd uhci_hcd scsi_mod usbcore
> [ 2322.813338]
> [ 2322.817134] Pid: 3125, comm: klogd Tainted: G      D (2.6.24-rc6 #4)
> [ 2322.821416] EIP: 0060:[<c0199e6d>] EFLAGS: 00010246 CPU: 0
> [ 2322.828832] EIP is at end_buffer_async_write+0x6f/0xfd
> [ 2322.833341] EAX: 00000000 EBX: df1d2268 ECX: df1d2268 EDX: 00000001
> [ 2322.839963] ESI: def45080 EDI: df77c0c0 EBP: c13a9d40 ESP: dee9be6c
> [ 2322.845409]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 2322.851864] Process klogd (pid: 3125, ti=dee9a000 task=df474bc0 task.ti=dee9a000)
> [ 2322.859787] Stack: dfa61000 00000015 ffffffff e088db79 c0107a5c ffffffff 00000202 00000200
> [ 2322.868770]        e088db79 dfa61000 def45080 def45080 df77c0c0 00001000 c01994fd c01994dc
> [ 2322.877370]        c019a99d 00001000 c01d919a 00000000 dee9bedc c02d8c44 dee9bfa0 dee9bedc
> [ 2322.887755] Call Trace:
> [ 2322.889427]  [<e088db79>] mptscsih_io_done+0x0/0xa52 [mptscsih]
> [ 2322.896192]  [<c0107a5c>] apic_timer_interrupt+0x28/0x30
> [ 2322.901439]  [<e088db79>] mptscsih_io_done+0x0/0xa52 [mptscsih]
> [ 2322.909389]  [<c01994fd>] end_bio_bh_io_sync+0x21/0x29
> [ 2322.915672]  [<c01994dc>] end_bio_bh_io_sync+0x0/0x29
> [ 2322.921404]  [<c019a99d>] bio_endio+0x27/0x29
> [ 2322.928198]  [<c01d919a>] __end_that_request_first+0x192/0x33d
> [ 2322.933405]  [<e0816a93>] scsi_end_request+0x1a/0xa8 [scsi_mod]
> [ 2322.940634]  [<e081761b>] scsi_io_completion+0x14c/0x2fb [scsi_mod]
> [ 2322.947318]  [<c01db7e1>] blk_done_softirq+0x5b/0x67
> [ 2322.953414]  [<c012cfae>] __do_softirq+0x75/0xe1
> [ 2322.957435]  [<c012d05f>] do_softirq+0x45/0x53
> [ 2322.962434]  [<c012d2c3>] irq_exit+0x38/0x6b
> [ 2322.966576]  [<c0109843>] do_IRQ+0x5c/0x71
> [ 2322.971256]  [<c012d2de>] irq_exit+0x53/0x6b
> [ 2322.975864]  [<c011a12a>] smp_apic_timer_interrupt+0x71/0x7d
> [ 2322.981461]  [<c010799f>] common_interrupt+0x23/0x28
> [ 2322.989075]  =======================
> [ 2322.995502] Code: 24 16 7b 31 c0 e8 59 f4 f8 ff 8b 45 10 90 0f ba 68 3c 15 90 0f ba 2b 0b 90 0f ba 33 00 90 0f ba 6d 00 01 8b 45 00 f6 c4 08 75 04 <0f> 0b eb fe 8b 75 0c 9c 58 0f 1f 84 00 00 00 00 00 89 c7 fa 0f
> [ 2323.023538] EIP: [<c0199e6d>] end_buffer_async_write+0x6f/0xfd SS:ESP 0068:dee9be6c
> [ 2323.033481] Kernel panic - not syncing: Fatal exception in interrupt
>
>
>
>
> [  306.079477] CCID: Unregistered CCID 3 (ccid3)
> [  306.143890] BUG: unable to handle kernel paging request at virtual address 40001864
> [  306.152506] printing eip: c01792e1 *pde = 00000000
> [  306.156682] Oops: 0000 [#1] SMP
> [  306.160469] Modules linked in: dccp_tfrc_lib dccp edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci ata_generic libata floppy parport_pc rtc_cmos parport pcnet32 rtc_core mii i2c_piix4 rtc_lib i2c_core pcspkr piix generic ac thermal container ide_core power_supply button shpchp intel_agp processor pci_hotplug agpgart mousedev evdev sg ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi uhci_hcd ehci_hcd scsi_mod usbcore
> [  306.220412]
> [  306.222927] Pid: 136, comm: pdflush Not tainted (2.6.24-rc6 #4)
> [  306.228691] EIP: 0060:[<c01792e1>] EFLAGS: 00010002 CPU: 0
> [  306.238499] EIP is at kmem_cache_alloc+0x5d/0xa6
> [  306.242756] EAX: 00000000 EBX: 00000282 ECX: c03750a0 EDX: 40001864
> [  306.248712] ESI: c1408314 EDI: 40001864 EBP: c03750a0 ESP: dead3d54
> [  306.254538]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [  306.258526] Process pdflush (pid: 136, ti=dead2000 task=df49e5e0 task.ti=dead2000)
> [  306.264717] Stack: c016094d c1408314 40001864 00011200 de5fb000 df446080 de65e680 00000000
> [  306.276685]        00011210 c016094d 00000010 00000000 00000003 de6bc378 00000003 de6bcde0
> [  306.284594]        00000000 de65e680 00000000 00000000 c019acdf 00000000 df402a20 00000010
> [  306.296711] Call Trace:
> [  306.298976]  [<c016094d>] mempool_alloc+0x24/0xc2
> [  306.304602]  [<c016094d>] mempool_alloc+0x24/0xc2
> [  306.308722]  [<c019acdf>] bio_alloc_bioset+0x8c/0xe6
> [  306.314133]  [<c019ad44>] bio_alloc+0xb/0x17
> [  306.318609]  [<c0197688>] submit_bh+0x6e/0xf8
> [  306.324696]  [<c0199265>] __block_write_full_page+0x222/0x30f
> [  306.330100]  [<c019c810>] blkdev_get_block+0x0/0x43
> [  306.334949]  [<c0199420>] block_write_full_page+0xce/0xd6
> [  306.340733]  [<c019c810>] blkdev_get_block+0x0/0x43
> [  306.348460]  [<c0163588>] __writepage+0x8/0x21
> [  306.352736]  [<c0163a1e>] write_cache_pages+0x15b/0x273
> [  306.362425]  [<c0163580>] __writepage+0x0/0x21
> [  306.369968]  [<c0163b36>] generic_writepages+0x0/0x26
> [  306.375911]  [<c0163b55>] generic_writepages+0x1f/0x26
> [  306.380734]  [<c0163b7c>] do_writepages+0x20/0x30
> [  306.386999]  [<c019423b>] __writeback_single_inode+0x17c/0x2a0
> [  306.394198]  [<c012d2de>] irq_exit+0x53/0x6b
> [  306.398880]  [<c011a12a>] smp_apic_timer_interrupt+0x71/0x7d
> [  306.403016]  [<c0194650>] sync_sb_inodes+0x18a/0x240
> [  306.410354]  [<c01948a4>] writeback_inodes+0x5a/0x9c
> [  306.414853]  [<c01643d1>] wb_kupdate+0x7c/0xde
> [  306.419014]  [<c01648b8>] pdflush+0x130/0x1d0
> [  306.424730]  [<c0164355>] wb_kupdate+0x0/0xde
> [  306.430029]  [<c0164788>] pdflush+0x0/0x1d0
> [  306.435018]  [<c01398bc>] kthread+0x38/0x60
> [  306.439003]  [<c0139884>] kthread+0x0/0x60
> [  306.442884]  [<c0107beb>] kernel_thread_helper+0x7/0x10
> [  306.448467]  =======================
> [  306.451998] Code: 3e 85 ff 89 7c 24 08 75 1b 89 14 24 8b 54 24 0c 83 c9 ff 89 e8 89 74 24 04 e8 2b fb ff ff 89 44 24 08 eb 0c 8b 54 24 08 8b 46 0c <8b> 04 82 89 06 89 d8 50 9d 0f 1f 84 00 00 00 00 00 66 83 7c 24
> [  306.478944] EIP: [<c01792e1>] kmem_cache_alloc+0x5d/0xa6 SS:ESP 0068:dead3d54
> [  306.486157] ---[ end trace 0170fd34e372695a ]---
> [  306.511450] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [  306.532261] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
> [  306.551315] sd 0:0:0:0: [sda] Add. Sense: Data phase error
> [  306.563451] end_request: I/O error, dev sda, sector 4437024
> [  306.575308] Buffer I/O error on device sda2, logical block 426108
> [  306.587177] lost page write due to I/O error on sda2
> [  306.596911] Buffer I/O error on device sda2, logical block 426109
> [  306.608804] lost page write due to I/O error on sda2
> [  306.617992] Buffer I/O error on device sda2, logical block 426110
> [  306.628005] lost page write due to I/O error on sda2
> [  306.636004] Buffer I/O error on device sda2, logical block 426111
> [  306.644813] lost page write due to I/O error on sda2
> [  306.652236] Buffer I/O error on device sda2, logical block 426112
> [  306.662203] lost page write due to I/O error on sda2
> [  306.671197] Buffer I/O error on device sda2, logical block 426113
> [  306.679141] lost page write due to I/O error on sda2
> [  306.699255] BUG: unable to handle kernel paging request at virtual address 100d4a84
> [  306.715348] printing eip: c0139945 *pde = 00000000
> [  306.724664] Oops: 0000 [#2] SMP
> [  306.730582] Modules linked in: dccp_tfrc_lib dccp edd iptable_filter ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode firmware_class fuse loop dm_mod ide_cd cdrom pata_acpi ata_piix ahci ata_generic libata floppy parport_pc rtc_cmos parport pcnet32 rtc_core mii i2c_piix4 rtc_lib i2c_core pcspkr piix generic ac thermal container ide_core power_supply button shpchp intel_agp processor pci_hotplug agpgart mousedev evdev sg ext3 jbd mbcache sd_mod mptspi mptscsih mptbase scsi_transport_spi uhci_hcd ehci_hcd scsi_mod usbcore
> [  306.788046]
> [  306.788569] Pid: 136, comm: pdflush Tainted: G      D (2.6.24-rc6 #4)
> [  306.800851] EIP: 0060:[<c0139945>] EFLAGS: 00010296 CPU: 0
> [  306.806474] EIP is at __wake_up_bit+0x9/0x33
> [  306.814710] EAX: 100d4a84 EBX: 100d4a80 ECX: 0000000c EDX: c13670e0
> [  306.818605] ESI: df39a8f8 EDI: 00000202 EBP: c13670e0 ESP: dead3acc
> [  306.835065]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [  306.837951] Process pdflush (pid: 136, ti=dead2000 task=df49e5e0 task.ti=dead2000)
> [  306.844247] Stack: 00000202 c13670e0 c015e5c5 df39a8f8 c0199ee0 c0317b16 dead3ae8 32616473
> [  306.855168]        00000200 c0160924 00000000 df446000 de65ef80 df4e00c0 00004000 de65e200
> [  306.864865]        de65e200 df4e00c0 00003000 c01994fd c01994dc c019a99d 00001000 c01d919a
> [  306.874768] Call Trace:
> [  306.876862]  [<c015e5c5>] end_page_writeback+0x2f/0x3c
> [  306.883188]  [<c0199ee0>] end_buffer_async_write+0xe2/0xfd
> [  306.892703]  [<c0160924>] mempool_free+0x6a/0x6f
> [  306.896859]  [<c01994fd>] end_bio_bh_io_sync+0x21/0x29
> [  306.903132]  [<c01994dc>] end_bio_bh_io_sync+0x0/0x29
> [  306.908307]  [<c019a99d>] bio_endio+0x27/0x29
> [  306.912352]  [<c01d919a>] __end_that_request_first+0x192/0x33d
> [  306.919035]  [<e0816a93>] scsi_end_request+0x1a/0xa8 [scsi_mod]
> [  306.926526]  [<e08177c2>] scsi_io_completion+0x2f3/0x2fb [scsi_mod]
> [  306.932877]  [<c01db7e1>] blk_done_softirq+0x5b/0x67
> [  306.938996]  [<c012cfae>] __do_softirq+0x75/0xe1
> [  306.944407]  [<c012d05f>] do_softirq+0x45/0x53
> [  306.948536]  [<c012d2c3>] irq_exit+0x38/0x6b
> [  306.952574]  [<c011a12a>] smp_apic_timer_interrupt+0x71/0x7d
> [  306.958088]  [<c0107a5c>] apic_timer_interrupt+0x28/0x30
> [  306.963199]  [<c014ce77>] acct_collect+0x152/0x15b
> [  306.968109]  [<c012b710>] do_exit+0x1b9/0x68c
> [  306.972841]  [<c01292c0>] printk+0x1b/0x1f
> [  306.976869]  [<c0108417>] die+0x21d/0x224
> [  306.980899]  [<c0120397>] do_page_fault+0x4b6/0x593
> [  306.986648]  [<c01d9793>] generic_make_request+0x3be/0x3ec
> [  306.991209]  [<c01d9793>] generic_make_request+0x3be/0x3ec
> [  306.996429]  [<c011fee1>] do_page_fault+0x0/0x593
> [  307.000875]  [<c02c2c3a>] error_code+0x72/0x78
> [  307.004889]  [<c01792e1>] kmem_cache_alloc+0x5d/0xa6
> [  307.008678]  [<c016094d>] mempool_alloc+0x24/0xc2
> [  307.011091]  [<c016094d>] mempool_alloc+0x24/0xc2
> [  307.026957]  [<c019acdf>] bio_alloc_bioset+0x8c/0xe6
> [  307.032287]  [<c019ad44>] bio_alloc+0xb/0x17
> [  307.035254]  [<c0197688>] submit_bh+0x6e/0xf8
> [  307.040911]  [<c0199265>] __block_write_full_page+0x222/0x30f
> [  307.048908]  [<c019c810>] blkdev_get_block+0x0/0x43
> [  307.055251]  [<c0199420>] block_write_full_page+0xce/0xd6
> [  307.062257]  [<c019c810>] blkdev_get_block+0x0/0x43
> [  307.067258]  [<c0163588>] __writepage+0x8/0x21
> [  307.072534]  [<c0163a1e>] write_cache_pages+0x15b/0x273
> [  307.076626]  [<c0163580>] __writepage+0x0/0x21
> [  307.078707]  [<c0163b36>] generic_writepages+0x0/0x26
> [  307.084487]  [<c0163b55>] generic_writepages+0x1f/0x26
> [  307.091273]  [<c0163b7c>] do_writepages+0x20/0x30
> [  307.096249]  [<c019423b>] __writeback_single_inode+0x17c/0x2a0
> [  307.102496]  [<c012d2de>] irq_exit+0x53/0x6b
> [  307.107266]  [<c011a12a>] smp_apic_timer_interrupt+0x71/0x7d
> [  307.112929]  [<c0194650>] sync_sb_inodes+0x18a/0x240
> [  307.119285]  [<c01948a4>] writeback_inodes+0x5a/0x9c
> [  307.124927]  [<c01643d1>] wb_kupdate+0x7c/0xde
> [  307.132149]  [<c01648b8>] pdflush+0x130/0x1d0
> [  307.136540]  [<c0164355>] wb_kupdate+0x0/0xde
> [  307.140336]  [<c0164788>] pdflush+0x0/0x1d0
> [  307.144942]  [<c01398bc>] kthread+0x38/0x60
> [  307.150390]  [<c0139884>] kthread+0x0/0x60
> [  307.155285]  [<c0107beb>] kernel_thread_helper+0x7/0x10
> [  307.162728]  =======================
> [  307.164944] Code: c1 eb 1e 69 db 80 07 00 00 81 c3 80 49 35 c0 2b 8b 08 07 00 00 d3 e8 6b c0 0c 03 83 00 07 00 00 5b c3 53 89 c3 83 ec 0c 8d 40 04 <39> 43 04 89 54 24 04 89 4c 24 08 74 18 8d 44 24 04 b9 01 00 00
> [  307.188954] EIP: [<c0139945>] __wake_up_bit+0x9/0x33 SS:ESP 0068:dead3acc
> [  307.199560] Kernel panic - not syncing: Fatal exception in interrupt



_______________________________________________________________________
Jetzt neu! Schützen Sie Ihren PC mit McAfee und WEB.DE. 30 Tage
kostenlos testen. http://www.pc-sicherheit.web.de/startseite/?mc=022220


^ permalink raw reply

* Please pull 'upstream-jgarzik' branch of wireless-2.6
From: John W. Linville @ 2008-01-08 22:23 UTC (permalink / raw)
  To: jeff-o2qLIJkoznsdnm+yROfE0A
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q

[-- Attachment #1: Type: text/plain, Size: 8503 bytes --]

Jeff,

Another round of patches intended for 2.6.25...the biggest factions are
rt2x00 and b43 updates, as well as some Viro-isms... :-)

Please let me know if there are any problems!

John

P.S.  Copying Dave in case he is handling these requests...FWIW, it
will definitely depend on the patches already in netdev-2.6#upstream...

---

Individual patches are available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/upstream-jgarzik

---

The following changes since commit 65d0aa09c183ee45dc1786675209313fa75cf4ec:
  Jeff Garzik (1):
        wireless/iwl: fix namespace breakage

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-jgarzik

Al Viro (35):
      eliminate byteswapping in struct ieee80211_qos_parameters
      several missing cpu_to_le16() in ieee80211softmac_capabilities()
      ieee80211softmac_auth_resp() fix
      ieee80211: fix misannotations
      ieee80211: beacon->capability is little-endian
      airo: fix transmit_802_11_packet()
      airo: fix endianness bug in ->dBm handling
      airo: bug in airo_interrupt() handling on incoming 802.11
      airo endianness bug: cap_rid.extSoftCap
      airo: fix writerids() endianness
      hostap: fix endianness with txdesc->sw_support
      p54common annotations and fixes
      ipw2100 annotations and fixes
      ray_cs fixes
      ipw2200 fix: struct ieee80211_radiotap_header is little-endian
      ipw2200 fix: ->rt_chbitmask is le16
      ipw2200: ipw_tx_skb() endianness bug
      airo: trivial endianness annotations
      airo: sanitize handling of SSID_rid
      bap_read()/bap_write() work with fixed-endian buffers
      airo: sanitize BSSListRid handling
      airo: sanitize handling of WepKeyRid
      airo: sanitize handling of StatsRid
      airo: sanitize handling of CapabilityRid
      airo: sanitize APListRid handling
      airo: sanitize handling of StatusRid
      airo: last of endianness annotations
      hostap annotations
      hostap: don't mess with mixed-endian even for internal skb queues
      p54pci: endianness annotations and fixes
      bcm43xx annotations
      prism54 trivial annotations
      ipw2200 trivial annotations
      ipw2200: do not byteswap struct ipw_associate
      misc wireless annotations

Daniel Walker (1):
      prism54: remove questionable down_interruptible usage

Ivo van Doorn (12):
      rt2x00: Fix chipset debugfs file
      rt2x00: Always call ieee80211_stop_queue() when return NETDEV_TX_BUSY
      rt2x00: Only set the TBCN flag when the interface is configured to send beacons.
      rt2x00: Store queue idx and entry idx in data_ring and data_entry
      rt2x00: Move start() and stop() handlers into rt2x00lib.c
      rt2x00: Put 802.11 data on 4 byte boundary
      rt2x00: Move packet filter flags
      rt2x00: Cleanup write_tx_desc() arguments
      rt2x00: Determine MY_BSS from descriptor
      rt2x00: Move init_txring and init_rxring into rt2x00lib
      rt2x00: Correctly initialize data and desc pointer
      rt2x00: Release rt2x00 2.0.14

John W. Linville (1):
      Revert "rtl8187: fix tx power reading"

Michael Buesch (14):
      ssb: Fix extraction of values from SPROM
      b43: Only select allowed TX and RX antennas
      b43: Fix chip access validation for new devices
      ssb: Fix PCMCIA lowlevel register access
      b43: Remove PIO support
      b43: Add definitions for MAC Control register
      b43: Fix upload of beacon packets to the hardware
      b43: Fix template upload locking.
      b43: Put multicast frames on the mcast queue
      b43: Fix tim search buffer overrun
      b43-ssb-bridge: Add PCI ID for BCM43XG
      b43: Add NPHY kconfig option
      b43: Fix any N-PHY related WARN_ON() in the attach stage.
      b43: Add N-PHY related initvals firmware filenames.

Miguel Botón (3):
      ssb: add 'ssb_pcihost_set_power_state' function
      b44: power down PHY when interface down
      iwlwifi: fix compilation warning in 'iwl-4965.c'

Zhu Yi (1):
      iwlwifi: fix typo in 'drivers/net/wireless/iwlwifi/Kconfig'

 drivers/net/b44.c                             |   28 +-
 drivers/net/wireless/adm8211.c                |    8 +-
 drivers/net/wireless/airo.c                   | 1233 ++++++++++++-------------
 drivers/net/wireless/atmel.c                  |   30 +-
 drivers/net/wireless/b43/Kconfig              |   58 +-
 drivers/net/wireless/b43/Makefile             |    9 +-
 drivers/net/wireless/b43/b43.h                |   97 +--
 drivers/net/wireless/b43/debugfs.c            |    1 -
 drivers/net/wireless/b43/dma.c                |  101 ++-
 drivers/net/wireless/b43/dma.h                |   50 -
 drivers/net/wireless/b43/main.c               |  627 +++++++------
 drivers/net/wireless/b43/main.h               |    3 +
 drivers/net/wireless/b43/xmit.c               |   26 +-
 drivers/net/wireless/b43legacy/main.c         |    5 -
 drivers/net/wireless/b43legacy/phy.c          |    2 +-
 drivers/net/wireless/bcm43xx/bcm43xx.h        |    6 +-
 drivers/net/wireless/bcm43xx/bcm43xx_main.c   |   40 +-
 drivers/net/wireless/bcm43xx/bcm43xx_pio.c    |    6 +-
 drivers/net/wireless/bcm43xx/bcm43xx_xmit.c   |    6 +-
 drivers/net/wireless/hostap/hostap_80211.h    |   34 +-
 drivers/net/wireless/hostap/hostap_80211_rx.c |    2 +-
 drivers/net/wireless/hostap/hostap_ap.c       |   72 +-
 drivers/net/wireless/hostap/hostap_common.h   |   34 +-
 drivers/net/wireless/hostap/hostap_download.c |   22 +-
 drivers/net/wireless/hostap/hostap_hw.c       |   28 +-
 drivers/net/wireless/hostap/hostap_info.c     |    9 +-
 drivers/net/wireless/hostap/hostap_ioctl.c    |   66 +-
 drivers/net/wireless/hostap/hostap_main.c     |    6 +-
 drivers/net/wireless/hostap/hostap_pci.c      |   16 +-
 drivers/net/wireless/hostap/hostap_wlan.h     |  202 ++--
 drivers/net/wireless/ipw2100.c                |   10 +-
 drivers/net/wireless/ipw2200.c                |  175 ++--
 drivers/net/wireless/ipw2200.h                |  190 ++--
 drivers/net/wireless/iwlwifi/Kconfig          |    4 +-
 drivers/net/wireless/iwlwifi/iwl-4965.c       |    2 +-
 drivers/net/wireless/iwlwifi/iwl3945-base.c   |    4 +-
 drivers/net/wireless/iwlwifi/iwl4965-base.c   |    4 +-
 drivers/net/wireless/p54common.c              |    8 +-
 drivers/net/wireless/p54pci.c                 |   16 +-
 drivers/net/wireless/p54pci.h                 |    4 +-
 drivers/net/wireless/prism54/isl_38xx.h       |   10 +-
 drivers/net/wireless/prism54/isl_ioctl.c      |   12 +-
 drivers/net/wireless/prism54/islpci_eth.c     |    2 +-
 drivers/net/wireless/prism54/islpci_eth.h     |   38 +-
 drivers/net/wireless/prism54/islpci_mgt.h     |    2 +-
 drivers/net/wireless/ray_cs.c                 |   69 +-
 drivers/net/wireless/rt2x00/rt2400pci.c       |  102 +--
 drivers/net/wireless/rt2x00/rt2500pci.c       |   88 +--
 drivers/net/wireless/rt2x00/rt2500usb.c       |   36 +-
 drivers/net/wireless/rt2x00/rt2x00.h          |   32 +-
 drivers/net/wireless/rt2x00/rt2x00debug.c     |   13 +-
 drivers/net/wireless/rt2x00/rt2x00dev.c       |  142 +++-
 drivers/net/wireless/rt2x00/rt2x00lib.h       |    4 +-
 drivers/net/wireless/rt2x00/rt2x00mac.c       |   59 +-
 drivers/net/wireless/rt2x00/rt2x00pci.c       |   28 +-
 drivers/net/wireless/rt2x00/rt2x00ring.h      |   13 +
 drivers/net/wireless/rt2x00/rt2x00usb.c       |   87 +-
 drivers/net/wireless/rt2x00/rt2x00usb.h       |    5 +-
 drivers/net/wireless/rt2x00/rt61pci.c         |  114 +--
 drivers/net/wireless/rt2x00/rt73usb.c         |   35 +-
 drivers/net/wireless/rtl8187_rtl8225.c        |    8 +-
 drivers/net/wireless/wavelan_cs.p.h           |    2 +-
 drivers/ssb/b43_pci_bridge.c                  |    1 +
 drivers/ssb/pci.c                             |   76 ++-
 drivers/ssb/pcmcia.c                          |   71 +-
 include/linux/ssb/ssb.h                       |   29 +-
 include/linux/ssb/ssb_regs.h                  |   38 +-
 include/net/ieee80211.h                       |    6 +-
 net/ieee80211/ieee80211_crypt_tkip.c          |   22 +-
 net/ieee80211/ieee80211_rx.c                  |   47 +-
 net/ieee80211/ieee80211_tx.c                  |   14 +-
 net/ieee80211/softmac/ieee80211softmac_auth.c |    6 +-
 net/ieee80211/softmac/ieee80211softmac_io.c   |   10 +-
 73 files changed, 2209 insertions(+), 2256 deletions(-)

Omnibus patch attached as 'upstream-jgarzik.patch.bz2'.
-- 
John W. Linville
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org

[-- Attachment #2: upstream-jgarzik.patch.bz2 --]
[-- Type: application/x-bzip2, Size: 66997 bytes --]

^ permalink raw reply

* AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-08 22:27 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, linux-kernel

Hello,

I was coding an application which passes variable-length messages
over an AF_UNIX SOCK_STREAM socket.  As such I would pass a message
length embedded as the first element in the message, use recv(...,MSG_PEEK)
to determine the message length, and perform the necessary allocation
on the receiving end before performing the real recv().

However, the program would occasionally get into a situation where
a call to recv(sockfd, &buf, len, MSG_PEEK) returns some number
of bytes less than the requested length, and persists in this state
(i.e. retrying the call continues to return the same amount of data)
even when more than sufficient data is known to have been successfully
written to the socket.

By my reading of the af_unix.c kernel code, I believe the problem lies in
unix_stream_recvmsg().  Here we find that, under MSG_PEEK, only a single
skb is pulled from the receive queue and the data copied into the user
buffer.  This fails to account for the fact that this skb could contain less
data than is necessary to satisfy the MSG_PEEK request.  The "short" skb
is then pushed back onto the queue, and thus a retried MSG_PEEK will again
return a short response.  The only way to clear this condition is to
perform a recv() without specifying MSG_PEEK.

I'll be happy to submit a patch to fix this, but would like to first
confirm with others that A) this is incorrect behavior for MSG_PEEK
semantics and B) my approach to the fix sounds reasonable.

What I propose as a fix is, for MSG_PEEK operations, dequeueing enough
skbs to satisfy the operation (checking of course that there is enough
data in the queue in the first place), copying out the requested data
to the user buffer, then pushing those skbs back onto the receive queue.

I'm not terribly familiar with the Linux networking code, so if the
above approach sounds misguided, please feel free to let me know.

Below you'll find an example program (not the real program, I know
there's some no-nos in here, this was written simply to demonstrate
the bug) that demonstrates the problem.  It fails quite reliably and
quickly on an SGI Altix IA64, though given the nature of the problem I
don't believe it's architecture-specific.  Running the program with
no arguments produces output similar to the following:

consumer: recv of message size is not progressing at time 1199829455094356.
consumer: want 8 bytes, but only receiving 5.
consumer: SIOCOUTQ=0, SIOCINQ=202421
consumer: had 36985 successes
producer: sendmsg failed at time 1199829455094571

The meaning of the first two lines is obvious (UNIX time in microseconds).
The third line gives the output and input queue lengths for the AF_UNIX
socket at the time of the error message, to confirm that there is indeed
sufficient data available for satisfying the MSG_PEEK operation.  The
fourth line details how many iterations of the main loop of the program
were successful until we got hung up on the MSG_PEEK.  The final line
simply confirms that the producer failed as a result of the consumer
closing the socket, not the other way around.

Thanks,
Brent Casavant

#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/un.h>

#include <linux/sockios.h>

#define SOCKNAME "/tmp/peektest.sock"
#define MAXLEN 2048
#define NUMMSG 64
#define ERRLOOPS 50000

struct sockmsg {
	size_t length;
	char payload[MAXLEN];
};

struct sockmsg messages[NUMMSG];
struct iovec iov[NUMMSG];

void
do_consumer(char *socketname) {
	struct stat sb;
	int status;
	int sockfd, fd;
	struct sockaddr_un su;
	ssize_t bytes;
	struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0,
			      .msg_iov = iov, .msg_iovlen = 1,
			      .msg_control = NULL, .msg_controllen = 0,
			      .msg_flags = 0 };
	int successes = 0;

	/* Wait for socket to become available */
	do {
		sleep(1);
		status = stat(socketname, &sb);
	} while ((status < 0) && (ENOENT == errno));
	if ((status < 0) && (ENOENT != errno)) {
		printf("consumer: stat failed\n");
		return;
	}
	if (!S_ISSOCK(sb.st_mode)) {
		printf("consumer: %s is not a socket\n", socketname);
		return;
	}

	/* Connect to the producer */
	sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sockfd < 0) {
		printf("consumer: socket() failed\n");
		return;
	}
	su.sun_family = PF_UNIX;
	strcpy(su.sun_path, socketname);
	if (connect(sockfd, (struct sockaddr*) &su, SUN_LEN(&su)) < 0) {
		printf("consumer: connect() failed\n");
		return;
	}

	/* Main loop here.  Inspect the first size_t in the incoming
	 * message to determine the message length.  Then read in
	 * the entire message.
	 */
	do {
		size_t msglen;
		int loops;

		/* XXX: Here's where the trouble occurs. Even though the
		 * socket gets filled up quite plentifully by the producer,
		 * the recv(..., MSG_PEEK) operation eventually will get
		 * stuck returning less bytes than requested and never
		 * return any more.
		 * Note that including or not MSG_WAITALL in the flags
		 * makes no difference.
		 */
		/* Get length of next message */
		loops = 0;
		do {
			int outq, inq;
			bytes = recv(sockfd, &msglen, sizeof(size_t),
							MSG_PEEK|MSG_WAITALL);
			if (loops++ > ERRLOOPS) {
				struct timeval tv;
				gettimeofday(&tv, NULL);
				printf("consumer: recv of message size is "
				       "not progressing at time %ld.\n",
				       tv.tv_sec*1000000 + tv.tv_usec);
				printf("consumer: want %d bytes, but only "
				       "receiving %d.\n", sizeof(size_t),
				       bytes);
				ioctl(sockfd, SIOCOUTQ, &outq);
				ioctl(sockfd, SIOCINQ, &inq);
				printf("consumer: SIOCOUTQ=%d, SIOCINQ=%d\n",
				       outq, inq);
				printf("consumer: had %d successes\n",
				       successes);
				return;
			}
			/* Just in case, give the producer a little time to
			 * fill the queue if we're not making progress.
			 */
			if ((bytes < sizeof(size_t)) && (0 == (loops % 10000)))
				sleep(1);
		} while (bytes < sizeof(size_t));

		/* Receive the message */
		iov[0].iov_len = msglen;
		bytes = recvmsg(sockfd, &msg, MSG_WAITALL);
		if (bytes < 0) {
			printf("consumer: recvmsg failed\n");
			return;
		} else if (bytes != msglen) {
			printf("consumer: short recvmsg\n");
			return;
		}
		/* Discard what we receive, we don't really care
		 * about it.
		 */
		successes++;
	} while(1);
}

void
do_producer(char *socketname) {
	int sockfd, fd;
	struct sockaddr_un su;
	struct msghdr msg = { .msg_name = NULL, .msg_namelen = 0,
			      .msg_iov = iov, .msg_iovlen = NUMMSG,
			      .msg_control = NULL, .msg_controllen = 0,
			      .msg_flags = 0 };
	ssize_t sent;

	/* Remove any leftover socket with each run */
	unlink(SOCKNAME);

	/* Create/bind/listen/accept socket */
	sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sockfd < 0) {
		printf("producer: socket() failed\n");
		return;
	}
	su.sun_family = PF_UNIX;
	strcpy(su.sun_path, socketname);
	if (bind(sockfd, (struct sockaddr*) &su, SUN_LEN(&su)) < 0) {
		printf("producer: bind() failed\n");
		return;
	}
	if (listen(sockfd, 5) < 0) {
		printf("producer: listen() failed\n");
		return;
	}
	while ((fd = accept(sockfd, NULL, 0)) < 0)
		printf("producer: accept() failed\n");

	/* Don't leave it laying around when we exit */
	unlink(SOCKNAME);

	/* Main loop here.  Set message lengths, and send them over the
	 * socket.
	 */
	srand(getpid());
	do {
		int i;
		size_t tosend;
		struct timeval tv;

		/* Set up each IOV/message */
		tosend = 0;
		for (i = 0; i < msg.msg_iovlen; i++) {
			/* Send random lengths of data */
			messages[i].length =
					(rand() % MAXLEN) + sizeof(size_t);
			iov[i].iov_len = messages[i].length;
			tosend += messages[i].length;
		}

		/* Send the message */
		sent = sendmsg(fd, &msg, MSG_NOSIGNAL);
		if (sent < 0) {
			gettimeofday(&tv, NULL);
			printf("producer: sendmsg failed at time %ld\n",
			       tv.tv_sec*1000000 + tv.tv_usec);
			return;
		}
		if (sent < tosend) {
			gettimeofday(&tv, NULL);
			printf("producer: short sendmsg at time %ld\n",
			       tv.tv_sec*1000000 + tv.tv_usec);
			return;
		}
	} while(1);
}

int
main(int argc, char *argv[]) {
	pid_t pid;
	int i;

	/* Initialize messages and IOVs */
	for (i = 0; i < NUMMSG; i++) {
		memset(&messages[i], 0, sizeof(struct sockmsg));
		iov[i].iov_base = &messages[i];
		iov[i].iov_len = sizeof(struct sockmsg);
	}

	/* With no arguments, start both producer and consumer */
	if (argc < 2) {
		if ((pid = fork()) < 0) {
			printf("Problem with fork\n");
			return 1;
		} else if (0 == pid)
			do_producer(SOCKNAME);
		else
			do_consumer(SOCKNAME);
		return 0;
	}

	/* Otherwise start the specified role */
	if (0 == strcmp(argv[1], "producer"))
		do_producer(SOCKNAME);
	else if (0 == strcmp(argv[1], "consumer"))
		do_consumer(SOCKNAME);
	else {
		printf("Error: Must specify \"producer\" or \"consumer\" "
		       "on command line.\n");
		return 1;
	}

	return 0;
}

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Rick Jones @ 2008-01-08 22:40 UTC (permalink / raw)
  To: Brent Casavant; +Cc: netdev, David Miller, linux-kernel
In-Reply-To: <alpine.BSF.1.00.0801081556580.24286@pkunk.americas.sgi.com>

Potential bugs notwithstanding, given that this is a STREAM socket, and 
as such shouldn't (I hope, or I'm eating toes for dinner again) have 
side effects like tossing the rest of a datagram, why are you using 
MSG_PEEK?  Why not simply read the N bytes of the message that will have 
the message length with a normal read/recv, and then read that many 
bytes in the next call?

rick jones

^ permalink raw reply

* Re: SACK scoreboard
From: David Miller @ 2008-01-08 22:44 UTC (permalink / raw)
  To: jheffner; +Cc: ilpo.jarvinen, lachlan.andrew, netdev, quetchen
In-Reply-To: <4783AA29.3080406@psc.edu>

From: John Heffner <jheffner@psc.edu>
Date: Tue, 08 Jan 2008 11:51:53 -0500

> I haven't thought about this too hard, but can we approximate this by 
> moving scaked data into a sacked queue, then if something bad happens 
> merge this back into the retransmit queue?

That defeats the impetus for the change.

We want to free up the data, say, 2 packets at a time as
ACKs come in.  The key goal is smooth liberation of
retransmit queue packets over time.

The big problem is that recovery from even a single packet loss in a
window makes us run kfree_skb() for a all the packets in a full
window's worth of data when recovery completes.

If we just move such packets to a seperate list, we still have to
iterate over all of them when the cumulative ACK arrives.

This problem, that retransmit queue liberation is not smooth, is the
biggest flaw in how SACK is specified.  I mean, consider Ilpo's
mentioned case of 500,000 packet windows.  The issue cannot be
ignored.  SACK is clearly broken.

You speak of a path in Linux where we can reneg on SACKs, but I doubt
it really ever runs because of how aggressive the queue collapser is.
Alexey even has a comment there:

	 * This must not ever occur. */

To be honest this code sits here because it was written before the
queue collapser was coded up.

Really, find me a box where the LINUX_MIB_OFOPRUNED or
LINUX_MIB_RECVPRUNED counters are anything other than zero.

So this is a non-issue and I did consider it before proposing that we
redefine SACK.

^ permalink raw reply

* Re: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: David Miller @ 2008-01-08 22:47 UTC (permalink / raw)
  To: Ramkrishna.Vepa; +Cc: netdev
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD7702CAA89B@nekter>

From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
Date: Tue, 8 Jan 2008 13:17:03 -0500

> Dave,
> 
> This change is not required as the macro, is_s2io_card_up() checks for
> an internal state of the adapter and not netif's state. We want to make
> sure that the adapter registers are not accessed when the adapter is
> being brought down.

If the adapter is being brought down, you would have done
a napi_disable() first and therefore never reach this code
path.

The removal is correct, I read how your driver works.

^ permalink raw reply

* One more patch... -- Re: Please pull 'upstream-jgarzik' branch of wireless-2.6
From: John W. Linville @ 2008-01-08 22:42 UTC (permalink / raw)
  To: jeff; +Cc: netdev, linux-wireless, davem
In-Reply-To: <20080108222305.GD3086@tuxdriver.com>

On Tue, Jan 08, 2008 at 05:23:05PM -0500, John W. Linville wrote:
> Jeff,
> 
> Another round of patches intended for 2.6.25...the biggest factions are
> rt2x00 and b43 updates, as well as some Viro-isms... :-)
> 
> Please let me know if there are any problems!
> 
> John
> 
> P.S.  Copying Dave in case he is handling these requests...FWIW, it
> will definitely depend on the patches already in netdev-2.6#upstream...

I left out a patch.  I have pushed it on top of the previous request.

Let me know if there are problems!

Thanks,

John

---

The following changes since commit f94de7b013f78ad8bbe1064c108dd55141efb177:
  Miguel Botón (1):
        iwlwifi: fix compilation warning in 'iwl-4965.c'

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-jgarzik

Michael Buesch (1):
      zd1211rw: fix alignment for QOS and WDS frames

 drivers/net/wireless/zd1211rw/zd_mac.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c
index 14fb727..7b86930 100644
--- a/drivers/net/wireless/zd1211rw/zd_mac.c
+++ b/drivers/net/wireless/zd1211rw/zd_mac.c
@@ -623,6 +623,8 @@ int zd_mac_rx(struct ieee80211_hw *hw, const u8 *buffer, unsigned int length)
 	const struct rx_status *status;
 	struct sk_buff *skb;
 	int bad_frame = 0;
+	u16 fc;
+	bool is_qos, is_4addr, need_padding;
 
 	if (length < ZD_PLCP_HEADER_SIZE + 10 /* IEEE80211_1ADDR_LEN */ +
 	             FCS_LEN + sizeof(struct rx_status))
@@ -674,9 +676,22 @@ int zd_mac_rx(struct ieee80211_hw *hw, const u8 *buffer, unsigned int length)
 			&& !mac->pass_ctrl)
 		return 0;
 
-	skb = dev_alloc_skb(length);
+	fc = le16_to_cpu(*((__le16 *) buffer));
+
+	is_qos = ((fc & IEEE80211_FCTL_FTYPE) == IEEE80211_FTYPE_DATA) &&
+		 ((fc & IEEE80211_FCTL_STYPE) == IEEE80211_STYPE_QOS_DATA);
+	is_4addr = (fc & (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS)) ==
+		   (IEEE80211_FCTL_TODS | IEEE80211_FCTL_FROMDS);
+	need_padding = is_qos ^ is_4addr;
+
+	skb = dev_alloc_skb(length + (need_padding ? 2 : 0));
 	if (skb == NULL)
 		return -ENOMEM;
+	if (need_padding) {
+		/* Make sure the the payload data is 4 byte aligned. */
+		skb_reserve(skb, 2);
+	}
+
 	memcpy(skb_put(skb, length), buffer, length);
 
 	ieee80211_rx_irqsafe(hw, skb, &stats);
-- 
John W. Linville
linville@tuxdriver.com

^ permalink raw reply related

* Re: AF_UNIX MSG_PEEK bug?
From: Tom Spink @ 2008-01-08 22:53 UTC (permalink / raw)
  To: Rick Jones; +Cc: Brent Casavant, netdev, David Miller, linux-kernel
In-Reply-To: <4783FBD6.1000004@hp.com>

On 08/01/2008, Rick Jones <rick.jones2@hp.com> wrote:
> Potential bugs notwithstanding, given that this is a STREAM socket, and
> as such shouldn't (I hope, or I'm eating toes for dinner again) have
> side effects like tossing the rest of a datagram, why are you using
> MSG_PEEK?  Why not simply read the N bytes of the message that will have
> the message length with a normal read/recv, and then read that many
> bytes in the next call?
>
> rick jones
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

Hi,

Where in the code is the message length being sent across the socket?

-- 
Regards,
Tom Spink
University of Edinburgh

^ permalink raw reply

* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Rafael J. Wysocki @ 2008-01-08 22:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Randy Dunlap, Kevin Winchester, J. Bruce Fields,
	Al Viro, Linux Kernel Mailing List, Andrew Morton, NetDev
In-Reply-To: <alpine.LFD.1.00.0801081126200.3148@woody.linux-foundation.org>

On Tuesday, 8 of January 2008, Linus Torvalds wrote:
> 
> On Tue, 8 Jan 2008, Arjan van de Ven wrote:
> > 
> > ok done; I had to fizzle a bit because some things aren't *exactly* a 
> > BUG() statement but I track them anyway (things like the "sleeping in 
> > invalid context" check), so I had to somewhat arbitrarily assign 
> > categories for those. I might fine tune these over time some; if you or 
> > someone else sees problems with categorization please let me know
> 
> Looking good. I wonder if we could also have some way to cross-ref these 
> things with the regression list (notably try to get pointers to them in 
> the regression list). 

I'm thinking about that, but haven't invented any automated solution yet.
I only can manually add references to kerneloops.org, for now.

Greetings,
Rafael

^ permalink raw reply

* Re: [PATCH 7/7]: [NET]: Make ->poll() breakout consistent in Intel ethernet drivers.
From: David Miller @ 2008-01-08 22:55 UTC (permalink / raw)
  To: auke-jan.h.kok; +Cc: netdev, jesse.brandeburg, john.ronciak
In-Reply-To: <4783C95B.8060005@intel.com>

From: "Kok, Auke" <auke-jan.h.kok@intel.com>
Date: Tue, 08 Jan 2008 11:04:59 -0800

> this is exactly the change I was eyeballing and indeed this seems to be the
> general use case in most drivers anyway.
> 
> I'll try to see how this impacts the (especially 4-port) TX performance issue with
> e1000e, but this should be just fine for now, and we can address later anyway.
> 
> Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
> 

Ok, thanks for reviewing.

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: David Miller @ 2008-01-08 23:00 UTC (permalink / raw)
  To: romieu; +Cc: linux, netdev, akpm
In-Reply-To: <20080108213640.GC4450@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Tue, 8 Jan 2008 22:36:40 +0100

> linux@horizon.com <linux@horizon.com> :
> > I take that back.  This patch does NOT fix the leak, at least if
> > ping: sendmsg: No buffer space available
> > is any indication...
> 
> Can you try the patch below ?

Same kind of bug as the RX side :-)  I bet this fixes his
problem...

^ permalink raw reply

* RE: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: Ramkrishna Vepa @ 2008-01-08 23:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080108.144731.168854521.davem@davemloft.net>

Dave,
Sorry, should have been clearer. When I meant "brought down" did not
mean close, but when a adapter reset is initiated. The napi_disable() is
called only on a close. When the driver does a reset, napi_disable() is
not called. 

Ram
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, January 08, 2008 2:48 PM
> To: Ramkrishna Vepa
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 3/7]: [NET]: Do not check netif_running() and
carrier
> state in ->poll()
> 
> From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
> Date: Tue, 8 Jan 2008 13:17:03 -0500
> 
> > Dave,
> >
> > This change is not required as the macro, is_s2io_card_up() checks
for
> > an internal state of the adapter and not netif's state. We want to
make
> > sure that the adapter registers are not accessed when the adapter is
> > being brought down.
> 
> If the adapter is being brought down, you would have done
> a napi_disable() first and therefore never reach this code
> path.
> 
> The removal is correct, I read how your driver works.

^ permalink raw reply

* Re: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: David Miller @ 2008-01-08 23:07 UTC (permalink / raw)
  To: Ramkrishna.Vepa; +Cc: netdev
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD7702CAA9E8@nekter>

From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
Date: Tue, 8 Jan 2008 18:01:32 -0500

> Dave,
> Sorry, should have been clearer. When I meant "brought down" did not
> mean close, but when a adapter reset is initiated. The napi_disable() is
> called only on a close. When the driver does a reset, napi_disable() is
> not called. 

You should be doing a napi_disable() during a reset, like every
other driver does.

It is the only reliable way to prevent the code path from running.

Otherwise, you can start resetting the device right after
that check in the ->poll() routine, and thus still touch
the device during the reset sequence.

In short the check is wrong, because it doesn't fully prevent
what you want it to prevent.  Only a napi_disable() would do
that fully for you.

^ permalink raw reply

* my NAPI patches
From: David Miller @ 2008-01-08 23:08 UTC (permalink / raw)
  To: netdev


Can people do me a favor and do more constructive things like test
that my patches really do fix the "device down during packet flood"
bug instead of nit-picking all the individual driver fixups I made?

That's what is useful at this time, thanks.

^ permalink raw reply

* RE: [PATCH 3/7]: [NET]: Do not check netif_running() and carrier state in ->poll()
From: Ramkrishna Vepa @ 2008-01-08 23:12 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080108.150735.182342472.davem@davemloft.net>

Dave,

Got it. These new napi interface changes were introduced by someone else
and we assumed it to be correct. We will make the fix and submit.

Thanks,
Ram

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Tuesday, January 08, 2008 3:08 PM
> To: Ramkrishna Vepa
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 3/7]: [NET]: Do not check netif_running() and
carrier
> state in ->poll()
> 
> From: "Ramkrishna Vepa" <Ramkrishna.Vepa@neterion.com>
> Date: Tue, 8 Jan 2008 18:01:32 -0500
> 
> > Dave,
> > Sorry, should have been clearer. When I meant "brought down" did not
> > mean close, but when a adapter reset is initiated. The
napi_disable() is
> > called only on a close. When the driver does a reset, napi_disable()
is
> > not called.
> 
> You should be doing a napi_disable() during a reset, like every
> other driver does.
> 
> It is the only reliable way to prevent the code path from running.
> 
> Otherwise, you can start resetting the device right after
> that check in the ->poll() routine, and thus still touch
> the device during the reset sequence.
> 
> In short the check is wrong, because it doesn't fully prevent
> what you want it to prevent.  Only a napi_disable() would do
> that fully for you.

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-08 23:18 UTC (permalink / raw)
  To: Tom Spink; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <7b9198260801081453s198af7efycc7c35668c65eaf1@mail.gmail.com>

On Tue, 8 Jan 2008, Tom Spink wrote:

> Where in the code is the message length being sent across the socket?

In do_producer(), there are the following lines in the main loop:

	/* Send random lengths of data */
	messages[i].length = (rand() % MAXLEN) + sizeof(size_t);
	iov[i].iov_len = messages[i].length;

The entire "struct sockmsg" is sent across the socket, so the first
size_t in each message contains the length of the entire message
(including the size_t).  This size gets picked up at the
recv(...,MSG_PEEK) line in do_consumer().

Thanks,
Brent

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-08 23:20 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, David Miller, linux-kernel
In-Reply-To: <4783FBD6.1000004@hp.com>

On Tue, 8 Jan 2008, Rick Jones wrote:

> Potential bugs notwithstanding, given that this is a STREAM socket, and as
> such shouldn't (I hope, or I'm eating toes for dinner again) have side effects
> like tossing the rest of a datagram, why are you using MSG_PEEK?  Why not
> simply read the N bytes of the message that will have the message length with
> a normal read/recv, and then read that many bytes in the next call?

That's entirely reasonable, and probably a worthwhile change to make.
But, as you say, it doesn't change whether or not this is a bug in
the MSG_PEEK code.

With a small bit of complication I certainly can do what you suggest.

The initial reasoning was that this made it easy to handle the case where
the caller of the library routine (my code which stumbled on this was
part of a small library I wrote as part of the application) did not supply
a sufficiently sized buffer for reception of the next message.  The "easy"
way to do this was a MSG_PEEK to validate the buffer size against the
message size, followed by a regular recv() if the buffer is large enough.
To do what you suggest (which effectively works around the bug) is certainly
possible, but also requires maintaining state between calls to the
"get message" routine, as I need to track whether or not the size has already
been read from the next message on the stream socket.

The change isn't terribly difficult, but wasn't the initial idea.

Plus it's always good to flush a bug out of hiding, right? ;)

Brent

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* Re: [PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write
From: linux @ 2008-01-08 23:33 UTC (permalink / raw)
  To: linux, romieu; +Cc: akpm, davem, netdev
In-Reply-To: <20080108204744.GB4450@electric-eye.fr.zoreil.com>

>> +	do {
>> +		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
>> +		u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
>> +		/* + rather than | lets compiler microoptimize better */
>> +		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
>> +	} while (len);

> Imho something is not quite right when the code needs a comment every line
> and I am mildly convinced that we really want to honk an "optimizing mdio
> methods is ok" signal around.

Oh, but those are SPACE-saving optimiztions. :-)
I know it's not time-critical; it's really pure hack value, but is it
that evil?

> "while (len--) {" is probably more akpm-ish btw.

Well spotted.

[...]
>>  static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>>  {
>>  	void __iomem *ioaddr = ipg_ioaddr(dev);
>> +	u8 const polarity = ipg_r8(PHY_CTRL) &
>> +			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);

> (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY) appears twice. I would not
> mind a #define for it.

I'm hardly going to go to war over over the matter, but actually I disagree.

There's a non-zero mental cost to keeping track of an additional name,
and when it's only used two times, and is pretty simple, I think reducing
the number of layers of #defines to understand is a positive advantage.
The above reads "the two polarity bits from the PHY_CTRL register"
to a person who's never read ipg.h.  Adding IPG_PC_POLARITY_BITS just
requires mentally dereferencing another layer of pointers.

Think of it as a function small enough that it can be inlined.

>> @@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>[...]
>> -	for (i = 0; i < p[6].len; i++) {
>> -		p[6].field |=
>> -		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
>> -	}
>> +	send_three_state(ioaddr, polarity);	/* TA first bit */
>> +	(void)read_phy_bit(ioaddr, polarity);		/* TA second bit */
>> +
>> +	for (i = 0; i < 16; i++)
>> +		data += data + read_phy_bit(ioaddr, polarity);
                ^^^^^^^^^^^^
> Huh ?

Okay, I guess you prefer
+		data = 2*data + read_phy_bit(ioaddr, polarity);

That's only one character longer and easier to understand.
Or even four characters:
+		data = (data<<1) + read_phy_bit(ioaddr, polarity);

That's just the synonym that happened to come out of my fingers at the
time.  There's no particular meaning to it.

>> @@ -299,11 +273,13 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
>>  static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
[...]
>> +	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
>> +	mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
>> +	                                  phy_id << 7 | phy_reg << 2 |
>> +					  0x2, 16);

> Use the 80 cols luke:
>	                                  phy_id << 7 | phy_reg << 2 | 0x2, 16);

Good spotting, thanks.

Here's a revised patch:

    drivers/net/ipg.c: Fixed style problems that AKPM noticed.
    
    (And a few more while at it.  Including an actual bug in enabling multicast
    due to & vs. && confusion.)

diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..fb69374 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -188,9 +188,9 @@ static void send_end(void __iomem *ioaddr, u8 phyctrlpolarity)
 		phyctrlpolarity) & IPG_PC_RSVD_MASK, PHY_CTRL);
 }
 
-static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
+static unsigned read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 {
-	u16 bit_data;
+	unsigned bit_data;
 
 	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | phyctrlpolarity);
 
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32.  "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+	otherbits |= IPG_PC_MGMTDIR;
+	while (len--) {
+		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+		u8 d = ((data >> len) & 1) * IPG_PC_MGMTDATA;
+		/* + rather than | allows slight code size microoptimization */
+		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+	}
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+	unsigned i, data = 0;
 	/*
 	 * The GMII mangement frame structure for a read is as follows:
 	 *
@@ -218,78 +237,34 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 	 * A = bit of Physical Layer device address (MSB first)
 	 * R = bit of register address (MSB first)
 	 * z = High impedance state
-	 * D = bit of read data (MSB first)
+	 * 0 = preamble bit sent by PHY
+	 * D = bit of read data (MSB first), sent by PHY
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE,	32 },	/* Preamble */
-		{ GMII_ST,		2  },	/* ST */
-		{ GMII_READ,		2  },	/* OP */
-		{ phy_id,		5  },	/* PHYAD */
-		{ phy_reg,		5  },	/* REGAD */
-		{ 0x0000,		2  },	/* TA */
-		{ 0x0000,		16 },	/* DATA */
-		{ 0x0000,		1  }	/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity  = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 5; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 *          First write a '0' to bit 1 of the PhyCtrl
-			 *          register, then write a '1' to bit 1 of the
-			 *          PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
-
-	send_three_state(ioaddr, polarity);
-
-	read_phy_bit(ioaddr, polarity);
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST<<12 | GMII_READ << 10 |
+	                                  phy_id << 5 | phy_reg, 14);
 
 	/*
 	 * For a read cycle, the bits for the next two fields (TA and
 	 * DATA) are driven by the PHY (the IPG reads these bits).
 	 */
-	for (i = 0; i < p[6].len; i++) {
-		p[6].field |=
-		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
-	}
+	send_three_state(ioaddr, polarity);	/* TA first bit */
+	(void)read_phy_bit(ioaddr, polarity);		/* TA second bit */
+
+	for (i = 0; i < 16; i++)
+		data = 2*data + read_phy_bit(ioaddr, polarity);
 
+	/* Trailing idle */
 	send_three_state(ioaddr, polarity);
 	send_three_state(ioaddr, polarity);
 	send_three_state(ioaddr, polarity);
 	send_end(ioaddr, polarity);
 
 	/* Return the value of the DATA field. */
-	return p[6].field;
+	return data;
 }
 
 /*
@@ -299,11 +274,13 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
 	/*
-	 * The GMII mangement frame structure for a read is as follows:
+	 * The GMII mangement frame structure for a write is as follows:
 	 *
 	 * |Preamble|st|op|phyad|regad|ta|      data      |idle|
-	 * |< 32 1s>|01|10|AAAAA|RRRRR|z0|DDDDDDDDDDDDDDDD|z   |
+	 * |< 32 1s>|01|01|AAAAA|RRRRR|10|DDDDDDDDDDDDDDDD|z   |
 	 *
 	 * <32 1s> = 32 consecutive logic 1 values
 	 * A = bit of Physical Layer device address (MSB first)
@@ -312,66 +289,17 @@ static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
 	 * D = bit of write data (MSB first)
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE,	32 },	/* Preamble */
-		{ GMII_ST,		2  },	/* ST */
-		{ GMII_WRITE,		2  },	/* OP */
-		{ phy_id,		5  },	/* PHYAD */
-		{ phy_reg,		5  },	/* REGAD */
-		{ 0x0002,		2  },	/* TA */
-		{ val & 0xffff,		16 },	/* DATA */
-		{ 0x0000,		1  }	/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity  = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 7; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 *          First write a '0' to bit 1 of the PhyCtrl
-			 *          register, then write a '1' to bit 1 of the
-			 *          PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
+	                                  phy_id << 7 | phy_reg << 2 | 0x2, 16);
+	mdio_write_bits(ioaddr, polarity, val, 16);	/* DATA */
 
 	/* The last cycle is a tri-state, so read from the PHY. */
-	for (j = 7; j < 8; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
-
-			p[j].field |= ((ipg_r8(PHY_CTRL) &
-				IPG_PC_MGMTDATA) >> 1) << (p[j].len - 1 - i);
-
-			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
-		}
-	}
+	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
+	(void)ipg_r8(PHY_CTRL);
+	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
 }
 
 /* Set LED_Mode JES20040127EEPROM */
@@ -379,18 +307,17 @@ static void ipg_set_led_mode(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	u32 mode;
+	u32 mode = ipg_r32(ASIC_CTRL);
 
-	mode = ipg_r32(ASIC_CTRL);
 	mode &= ~(IPG_AC_LED_MODE_BIT_1 | IPG_AC_LED_MODE | IPG_AC_LED_SPEED);
 
-	if ((sp->LED_Mode & 0x03) > 1)
-		mode |= IPG_AC_LED_MODE_BIT_1;	/* Write Asic Control Bit 29 */
-
-	if ((sp->LED_Mode & 0x01) == 1)
+	if (sp->LED_Mode & 0x01)
 		mode |= IPG_AC_LED_MODE;	/* Write Asic Control Bit 14 */
 
-	if ((sp->LED_Mode & 0x08) == 8)
+	if (sp->LED_Mode & 0x02)
+		mode |= IPG_AC_LED_MODE_BIT_1;	/* Write Asic Control Bit 29 */
+
+	if (sp->LED_Mode & 0x08)
 		mode |= IPG_AC_LED_SPEED;	/* Write Asic Control Bit 27 */
 
 	ipg_w32(mode, ASIC_CTRL);
@@ -401,11 +328,13 @@ static void ipg_set_phy_set(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	int physet;
+	u8 physet = ipg_r8(PHY_SET);
+	u8 newset = sp->LED_Mode >> 4;
 
-	physet = ipg_r8(PHY_SET);
-	physet &= ~(IPG_PS_MEM_LENB9B | IPG_PS_MEM_LEN9 | IPG_PS_NON_COMPDET);
-	physet |= ((sp->LED_Mode & 0x70) >> 4);
+	/* Change three bits of physet to values in newset */
+	newset ^= physet;
+	newset &= (IPG_PS_MEM_LENB9B | IPG_PS_MEM_LEN9 | IPG_PS_NON_COMPDET);
+	physet ^= newset;
 	ipg_w8(physet, PHY_SET);
 }
 
@@ -444,7 +373,7 @@ static int ipg_find_phyaddr(struct net_device *dev)
 	unsigned int phyaddr, i;
 
 	for (i = 0; i < 32; i++) {
-		u32 status;
+		unsigned status;
 
 		/* Search for the correct PHY address among 32 possible. */
 		phyaddr = (IPG_NIC_PHY_ADDRESS + i) % 32;
@@ -470,10 +399,7 @@ static int ipg_config_autoneg(struct net_device *dev)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
 	void __iomem *ioaddr = sp->ioaddr;
-	unsigned int txflowcontrol;
-	unsigned int rxflowcontrol;
-	unsigned int fullduplex;
-	unsigned int gig;
+	bool txflowcontrol, rxflowcontrol, fullduplex, gig;
 	u32 mac_ctrl_val;
 	u32 asicctrl;
 	u8 phyctrl;
@@ -487,10 +413,10 @@ static int ipg_config_autoneg(struct net_device *dev)
 	/* Set flags for use in resolving auto-negotation, assuming
 	 * non-1000Mbps, half duplex, no flow control.
 	 */
-	fullduplex = 0;
-	txflowcontrol = 0;
-	rxflowcontrol = 0;
-	gig = 0;
+	fullduplex = false;
+	txflowcontrol = false;
+	rxflowcontrol = false;
+	gig = false;
 
 	/* To accomodate a problem in 10Mbps operation,
 	 * set a global flag if PHY running in 10Mbps mode.
@@ -512,7 +438,7 @@ static int ipg_config_autoneg(struct net_device *dev)
 		break;
 	case IPG_PC_LINK_SPEED_1000MBPS:
 		printk("1000Mbps.\n");
-		gig = 1;
+		gig = true;
 		break;
 	default:
 		printk("undefined!\n");
@@ -520,19 +446,20 @@ static int ipg_config_autoneg(struct net_device *dev)
 	}
 
 	if (phyctrl & IPG_PC_DUPLEX_STATUS) {
-		fullduplex = 1;
-		txflowcontrol = 1;
-		rxflowcontrol = 1;
+		fullduplex = true;
+		txflowcontrol = true;
+		rxflowcontrol = true;
 	}
 
 	/* Configure full duplex, and flow control. */
-	if (fullduplex == 1) {
+	if (fullduplex) {
 		/* Configure IPG for full duplex operation. */
 		printk(KERN_INFO "%s: setting full duplex, ", dev->name);
 
 		mac_ctrl_val |= IPG_MC_DUPLEX_SELECT_FD;
 
-		if (txflowcontrol == 1) {
+		/* FIXME: Does this variable always equal fullduplex? */
+		if (txflowcontrol) {
 			printk("TX flow control");
 			mac_ctrl_val |= IPG_MC_TX_FLOW_CONTROL_ENABLE;
 		} else {
@@ -540,7 +467,7 @@ static int ipg_config_autoneg(struct net_device *dev)
 			mac_ctrl_val &= ~IPG_MC_TX_FLOW_CONTROL_ENABLE;
 		}
 
-		if (rxflowcontrol == 1) {
+		if (rxflowcontrol) {
 			printk(", RX flow control.");
 			mac_ctrl_val |= IPG_MC_RX_FLOW_CONTROL_ENABLE;
 		} else {
@@ -568,9 +495,8 @@ static int ipg_config_autoneg(struct net_device *dev)
 static void ipg_nic_set_multicast_list(struct net_device *dev)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
-	struct dev_mc_list *mc_list_ptr;
-	unsigned int hashindex;
-	u32 hashtable[2];
+	struct dev_mc_list *mc;
+	u32 hashtable[2] = { 0, 0 };
 	u8 receivemode;
 
 	IPG_DEBUG_MSG("_nic_set_multicast_list\n");
@@ -581,52 +507,34 @@ static void ipg_nic_set_multicast_list(struct net_device *dev)
 		/* NIC to be configured in promiscuous mode. */
 		receivemode = IPG_RM_RECEIVEALLFRAMES;
 	} else if ((dev->flags & IFF_ALLMULTI) ||
-		   (dev->flags & IFF_MULTICAST &
-		    (dev->mc_count > IPG_MULTICAST_HASHTABLE_SIZE))) {
+		   (dev->flags & IFF_MULTICAST &&
+		    (dev->mc_count > 2*IPG_MULTICAST_HASHTABLE_SIZE))) {
 		/* NIC to be configured to receive all multicast
 		 * frames. */
 		receivemode |= IPG_RM_RECEIVEMULTICAST;
-	} else if (dev->flags & IFF_MULTICAST & (dev->mc_count > 0)) {
+	} else if (dev->flags & IFF_MULTICAST && (dev->mc_count > 0)) {
 		/* NIC to be configured to receive selected
 		 * multicast addresses. */
 		receivemode |= IPG_RM_RECEIVEMULTICASTHASH;
-	}
-
-	/* Calculate the bits to set for the 64 bit, IPG HASHTABLE.
-	 * The IPG applies a cyclic-redundancy-check (the same CRC
-	 * used to calculate the frame data FCS) to the destination
-	 * address all incoming multicast frames whose destination
-	 * address has the multicast bit set. The least significant
-	 * 6 bits of the CRC result are used as an addressing index
-	 * into the hash table. If the value of the bit addressed by
-	 * this index is a 1, the frame is passed to the host system.
-	 */
-
-	/* Clear hashtable. */
-	hashtable[0] = 0x00000000;
-	hashtable[1] = 0x00000000;
-
-	/* Cycle through all multicast addresses to filter. */
-	for (mc_list_ptr = dev->mc_list;
-	     mc_list_ptr != NULL; mc_list_ptr = mc_list_ptr->next) {
-		/* Calculate CRC result for each multicast address. */
-		hashindex = crc32_le(0xffffffff, mc_list_ptr->dmi_addr,
-				     ETH_ALEN);
 
-		/* Use only the least significant 6 bits. */
-		hashindex = hashindex & 0x3F;
-
-		/* Within "hashtable", set bit number "hashindex"
-		 * to a logic 1.
+		/*
+		 * The IPG uses the standard hash table technique to filter
+		 * incoming multicast packets.  It computes the CRC of the
+		 * incoming MAC address, and uses the low 6 bits of the
+		 * result to select a hash table bit.  If set (and the address
+		 * is a multicast address), the packet is received.
 		 */
-		set_bit(hashindex, (void *)hashtable);
-	}
+		for (mc = dev->mc_list; mc; mv = mc->next) {
+			/* Calculate CRC result for each multicast address. */
+			u32 hashindex = crc32_le(0xffffffff, mc->dmi_addr,
+					ETH_ALEN);
 
-	/* Write the value of the hashtable, to the 4, 16 bit
-	 * HASHTABLE IPG registers.
-	 */
-	ipg_w32(hashtable[0], HASHTABLE_0);
-	ipg_w32(hashtable[1], HASHTABLE_1);
+			/* Use low 6 bits to select hash table bit */
+			set_bit(hashindex & 63, (void *)hashtable);
+		}
+		ipg_w32(hashtable[0], HASHTABLE_0);
+		ipg_w32(hashtable[1], HASHTABLE_1);
+	}
 
 	ipg_w8(IPG_RM_RSVD_MASK & receivemode, RECEIVE_MODE);
 


^ permalink raw reply related

* Re: AF_UNIX MSG_PEEK bug?
From: Tom Spink @ 2008-01-08 23:39 UTC (permalink / raw)
  To: Brent Casavant; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <alpine.BSF.1.00.0801081716010.24286@pkunk.americas.sgi.com>

On 08/01/2008, Brent Casavant <bcasavan@sgi.com> wrote:
> On Tue, 8 Jan 2008, Tom Spink wrote:
>
> > Where in the code is the message length being sent across the socket?
>
> In do_producer(), there are the following lines in the main loop:
>
>         /* Send random lengths of data */
>         messages[i].length = (rand() % MAXLEN) + sizeof(size_t);
>         iov[i].iov_len = messages[i].length;
>
> The entire "struct sockmsg" is sent across the socket, so the first
> size_t in each message contains the length of the entire message
> (including the size_t).  This size gets picked up at the
> recv(...,MSG_PEEK) line in do_consumer().
>
> Thanks,
> Brent
>
> --
> Brent Casavant                          All music is folk music.  I ain't
> bcasavan@sgi.com                        never heard a horse sing a song.
> Silicon Graphics, Inc.                    -- Louis Armstrong
>

Hi,

But you're not consuming the size_t on the other end.  You're only
peeking it, i.e. you're doing the recv to peek at the message, but
never calling recv to remove that data from the queue... or am I
missing something?

-- 
Regards,
Tom Spink
University of Edinburgh

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: Francois Romieu @ 2008-01-08 23:28 UTC (permalink / raw)
  To: David Miller; +Cc: linux, netdev, akpm
In-Reply-To: <20080108.150038.57735076.davem@davemloft.net>

David Miller <davem@davemloft.net> :
[...]
> Same kind of bug as the RX side :-)  I bet this fixes his
> problem...

I am not sure but the Rx side is probably just here to distract
from the real problem. Please don't ask... :o)

Anyway I'll poke an adapter in the test computer and give it a
try tomorrow. Nobody will complain if I crash it.

-- 
Ueimor

^ permalink raw reply

* Re: AF_UNIX MSG_PEEK bug?
From: Tom Spink @ 2008-01-08 23:46 UTC (permalink / raw)
  To: Brent Casavant; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <7b9198260801081539x7aee72fbm53b5f298c5faf56@mail.gmail.com>

On 08/01/2008, Tom Spink <tspink@gmail.com> wrote:
> On 08/01/2008, Brent Casavant <bcasavan@sgi.com> wrote:
> > On Tue, 8 Jan 2008, Tom Spink wrote:
> >
> > > Where in the code is the message length being sent across the socket?
> >
> > In do_producer(), there are the following lines in the main loop:
> >
> >         /* Send random lengths of data */
> >         messages[i].length = (rand() % MAXLEN) + sizeof(size_t);
> >         iov[i].iov_len = messages[i].length;
> >
> > The entire "struct sockmsg" is sent across the socket, so the first
> > size_t in each message contains the length of the entire message
> > (including the size_t).  This size gets picked up at the
> > recv(...,MSG_PEEK) line in do_consumer().
> >
> > Thanks,
> > Brent
> >
> > --
> > Brent Casavant                          All music is folk music.  I ain't
> > bcasavan@sgi.com                        never heard a horse sing a song.
> > Silicon Graphics, Inc.                    -- Louis Armstrong
> >
>
> Hi,
>
> But you're not consuming the size_t on the other end.  You're only
> peeking it, i.e. you're doing the recv to peek at the message, but
> never calling recv to remove that data from the queue... or am I
> missing something?
>
> --
> Regards,
> Tom Spink
> University of Edinburgh
>

Ach.  I *am* missing something... and what I'm missing is my
understanding of the sendmsg and recvmsg calls.

A quick Google has sorted me out.

-- 
Regards,
Tom Spink
University of Edinburgh

^ permalink raw reply

* Re: 2.6.24-rc6-mm1
From: FUJITA Tomonori @ 2008-01-08 23:57 UTC (permalink / raw)
  To: mingo, akpm
  Cc: fujita.tomonori, just.for.lkml, tomof, jarkao2, herbert,
	linux-kernel, neilb, bfields, netdev, tom
In-Reply-To: <20080108155948.GC26114@elte.hu>

On Tue, 8 Jan 2008 16:59:48 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> 
> > The patches are available at:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/tomo/iommu/
> > 
> > Or if you prefer the git tree:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git 
> > iommu-sg-fixes
> 
> btw., these improvements to the IOMMU code are in -mm and will go into 
> v2.6.25, right? The changes look robust to me.

Thanks, they have been in -mm though the iommu helper fix hasn't
yet. Balbir Singh found the bug in 2.6.24-rc6-mm1. I've just check
mmotm and found that the IOMMU helper patch doesn't include the fix.

Andrew, can you replace

iommu-sg-add-iommu-helper-functions-for-the-free-area-management.patch

with the updated patch:

http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048997.html

For your convenience I've attached the updated patch too.

Hopefully, they will go into v2.6.25. At least, I hope that the
patches (0001-0011) that make the IOMMUs respect segment size limits
when merging sg lists will be merged. They are simple and I got ACKs
on POWER and PARISC.


Thanks,

=
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] add IOMMU helper functions for the free area management

This adds IOMMU helper functions for the free area management.  These
functions take care of LLD's segment boundary limit for IOMMUs.  They would be
useful for IOMMUs that use bitmap for the free area management.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 include/linux/iommu-helper.h |    7 ++++
 lib/Makefile                 |    1 +
 lib/iommu-helper.c           |   80 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/iommu-helper.h
 create mode 100644 lib/iommu-helper.c

diff --git a/include/linux/iommu-helper.h b/include/linux/iommu-helper.h
new file mode 100644
index 0000000..4dd4c04
--- /dev/null
+++ b/include/linux/iommu-helper.h
@@ -0,0 +1,7 @@
+extern unsigned long iommu_area_alloc(unsigned long *map, unsigned long size,
+				      unsigned long start, unsigned int nr,
+				      unsigned long shift,
+				      unsigned long boundary_size,
+				      unsigned long align_mask);
+extern void iommu_area_free(unsigned long *map, unsigned long start,
+			    unsigned int nr);
diff --git a/lib/Makefile b/lib/Makefile
index b6793ed..0e7383f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_SMP) += percpu_counter.o
 obj-$(CONFIG_AUDIT_GENERIC) += audit.o
 
 obj-$(CONFIG_SWIOTLB) += swiotlb.o
+obj-$(CONFIG_IOMMU_HELPER) += iommu-helper.o
 obj-$(CONFIG_FAULT_INJECTION) += fault-inject.o
 
 lib-$(CONFIG_GENERIC_BUG) += bug.o
diff --git a/lib/iommu-helper.c b/lib/iommu-helper.c
new file mode 100644
index 0000000..495575a
--- /dev/null
+++ b/lib/iommu-helper.c
@@ -0,0 +1,80 @@
+/*
+ * IOMMU helper functions for the free area management
+ */
+
+#include <linux/module.h>
+#include <linux/bitops.h>
+
+static unsigned long find_next_zero_area(unsigned long *map,
+					 unsigned long size,
+					 unsigned long start,
+					 unsigned int nr,
+					 unsigned long align_mask)
+{
+	unsigned long index, end, i;
+again:
+	index = find_next_zero_bit(map, size, start);
+
+	/* Align allocation */
+	index = (index + align_mask) & ~align_mask;
+
+	end = index + nr;
+	if (end >= size)
+		return -1;
+	for (i = index; i < end; i++) {
+		if (test_bit(i, map)) {
+			start = i+1;
+			goto again;
+		}
+	}
+	return index;
+}
+
+static inline void set_bit_area(unsigned long *map, unsigned long i,
+				int len)
+{
+	unsigned long end = i + len;
+	while (i < end) {
+		__set_bit(i, map);
+		i++;
+	}
+}
+
+static inline int is_span_boundary(unsigned int index, unsigned int nr,
+				   unsigned long shift,
+				   unsigned long boundary_size)
+{
+	shift = (shift + index) & (boundary_size - 1);
+	return shift + nr > boundary_size;
+}
+
+unsigned long iommu_area_alloc(unsigned long *map, unsigned long size,
+			       unsigned long start, unsigned int nr,
+			       unsigned long shift, unsigned long boundary_size,
+			       unsigned long align_mask)
+{
+	unsigned long index;
+again:
+	index = find_next_zero_area(map, size, start, nr, align_mask);
+	if (index != -1) {
+		if (is_span_boundary(index, nr, shift, boundary_size)) {
+			/* we could do more effectively */
+			start = index + 1;
+			goto again;
+		}
+		set_bit_area(map, index, nr);
+	}
+	return index;
+}
+EXPORT_SYMBOL(iommu_area_alloc);
+
+void iommu_area_free(unsigned long *map, unsigned long start, unsigned int nr)
+{
+	unsigned long end = start + nr;
+
+	while (start < end) {
+		__clear_bit(start, map);
+		start++;
+	}
+}
+EXPORT_SYMBOL(iommu_area_free);
-- 
1.5.3.4



^ permalink raw reply related

* Re: AF_UNIX MSG_PEEK bug?
From: Brent Casavant @ 2008-01-09  0:08 UTC (permalink / raw)
  To: Tom Spink; +Cc: Rick Jones, netdev, David Miller, linux-kernel
In-Reply-To: <7b9198260801081546u50f11e9ck32ec6f071efa300d@mail.gmail.com>

On Tue, 8 Jan 2008, Tom Spink wrote:

> Ach.  I *am* missing something... and what I'm missing is my
> understanding of the sendmsg and recvmsg calls.

No problem.  We all have those days. :)

Brent

-- 
Brent Casavant                          All music is folk music.  I ain't
bcasavan@sgi.com                        never heard a horse sing a song.
Silicon Graphics, Inc.                    -- Louis Armstrong

^ permalink raw reply

* Re: 2.6.24-rc6-mm1
From: Andrew Morton @ 2008-01-09  0:27 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: mingo, fujita.tomonori, just.for.lkml, tomof, jarkao2, herbert,
	linux-kernel, neilb, bfields, netdev, tom
In-Reply-To: <20080109085753O.fujita.tomonori@lab.ntt.co.jp>

On Wed, 09 Jan 2008 08:57:53 +0900
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> Andrew, can you replace
> 
> iommu-sg-add-iommu-helper-functions-for-the-free-area-management.patch
> 
> with the updated patch:
> 
> http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048997.html
> 
> For your convenience I've attached the updated patch too.

<generates the incremental>


> --- a/lib/iommu-helper.c~a
> +++ a/lib/iommu-helper.c
> @@ -8,15 +8,20 @@
>  static unsigned long find_next_zero_area(unsigned long *map,
>  					 unsigned long size,
>  					 unsigned long start,
> -					 unsigned int nr)
> +					 unsigned int nr,
> +					 unsigned long align_mask)
>  {
>  	unsigned long index, end, i;
>  again:
>  	index = find_next_zero_bit(map, size, start);
> +
> +	/* Align allocation */
> +	index = (index + align_mask) & ~align_mask;

The ALIGN() macro is the approved way of doing this.

(I don't think ALIGN adds much value really, especially given that you've
commented what's going on, but I guess it does make reviewing and reading a
little easier).


>  	end = index + nr;
> -	if (end > size)
> +	if (end >= size)
>  		return -1;
> -	for (i = index + 1; i < end; i++) {
> +	for (i = index; i < end; i++) {
>  		if (test_bit(i, map)) {
>  			start = i+1;
>  			goto again;
> @@ -50,9 +55,8 @@ unsigned long iommu_area_alloc(unsigned 
>  {
>  	unsigned long index;
>  again:
> -	index = find_next_zero_area(map, size, start, nr);
> +	index = find_next_zero_area(map, size, start, nr, align_mask);
>  	if (index != -1) {
> -		index = (index + align_mask) & ~align_mask;
>  		if (is_span_boundary(index, nr, shift, boundary_size)) {
>  			/* we could do more effectively */
>  			start = index + 1;
> _
> 
> 

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: linux @ 2008-01-09  0:38 UTC (permalink / raw)
  To: linux, romieu; +Cc: akpm, davem, netdev
In-Reply-To: <20080108213640.GC4450@electric-eye.fr.zoreil.com>

> Can you try the patch below ?

Testing now... (I presume you noticed the one-character typo in my
earlier patch.  That should be "mc = mc->next", not "mv = mc->next".)

That doesn't seem to do it.  Not entirely, at least.  After downloading
and partially re-uploading an 800M file, slabtop reports:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
341576 341574  99%    0.50K  42697        8    170788K kmalloc-512
342006 341953  99%    0.19K  16286       21     65144K kmalloc-192
 30592  30575  99%    2.00K   7648        4     61184K kmalloc-2048
 30213  30193  99%    0.44K   3357        9     13428K skbuff_fclone_cache
  7650   7643  99%    0.08K    150       51       600K sysfs_dir_cache
  4000   3938  98%    0.12K    125       32       500K kmalloc-128
   258    258 100%    1.15K     43        6       344K raid5-md5
   232    221  95%    1.00K     58        4       232K kmalloc-1024
  3136   3110  99%    0.06K     49       64       196K kmalloc-64
   264     80  30%    0.68K     24       11       192K ext3_inode_cache

The "kmalloc-2048" was down in the noise before the upload started.
This is in single-user mode, after sync and echo 3 > /proc/sys/vm/drop_caches.


I'll have to try this after this evening's social plans, but I'm thinking
of implementing more rapid bug detection: explicitly zero the sp->TxBuff
slot when the skb is freed, and check that it is zero before putting
anything else in there.  (And likewise for RxBuff.)

That way, I don't have to use up a noticeable amount of memory to see
the bug and reboot to clear up the damage each test cycle.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox