From mboxrd@z Thu Jan 1 00:00:00 1970 From: Laurent Riffard Subject: Re: 2.6.18-rc1-mm1 inconsistent lock state in netpoll_send_skb Date: Wed, 12 Jul 2006 00:00:39 +0200 Message-ID: <44B41F87.9080306@free.fr> References: <20060709021106.9310d4d1.akpm@osdl.org> <44B17735.4060006@free.fr> <1152520823.4874.0.camel@laptopd505.fenrus.org> <44B2A522.2080703@free.fr> <1152607239.3128.21.camel@laptopd505.fenrus.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Andrew Morton , Kernel development list , netdev@vger.kernel.org Return-path: Received: from smtp6-g19.free.fr ([212.27.42.36]:46225 "EHLO smtp6-g19.free.fr") by vger.kernel.org with ESMTP id S932168AbWGKWAX (ORCPT ); Tue, 11 Jul 2006 18:00:23 -0400 To: Arjan van de Ven In-Reply-To: <1152607239.3128.21.camel@laptopd505.fenrus.org> Sender: netdev-owner@vger.kernel.org List-Id: netdev.vger.kernel.org Le 11.07.2006 10:40, Arjan van de Ven a =E9crit : >> Reversed (or previously applied) patch detected!=20 >> >> Wrong patch ? This one won't apply, it seems to be already=20 >> applied to 2.6.18-rc1-mm1. >=20 > ok these patches ought to fix this for real (sorry I don't have this > hardware so I cannot actually do the testing) >=20 > I hope you have time to test these.. >=20 > Greetings, > Arjan van de Ven >=20 > From: Arjan van de Ven > Subject: lockdep: core, add enable/disable_irq_irqsave/irqrestore() A= PIs >=20 > Introduce the disable_irq_nosync_lockdep_irqsave() and enable_irq_loc= kdep_irqrestore() APIs. > These are needed for NE2000; basically NE2000 calls disable_irq and e= nable_irq as locking > against the IRQ handler, but both in cases where interrupts are on an= d off. This means that > lockdep needs to track the old state of the virtual irq flags on disa= ble_irq, and restore these > at enable_irq time. >=20 > Signed-off-by: Arjan van de Ven > Signed-off-by: Ingo Molnar > Index: linux-2.6.18-rc1/include/linux/interrupt.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- linux-2.6.18-rc1.orig/include/linux/interrupt.h > +++ linux-2.6.18-rc1/include/linux/interrupt.h > @@ -123,6 +123,14 @@ static inline void disable_irq_nosync_lo > #endif > } > =20 > +static inline void disable_irq_nosync_lockdep_irqsave(unsigned int i= rq, unsigned long *flags) > +{ > + disable_irq_nosync(irq); > +#ifdef CONFIG_LOCKDEP > + local_irq_save(*flags); > +#endif > +} > + > static inline void disable_irq_lockdep(unsigned int irq) > { > disable_irq(irq); > @@ -139,6 +147,14 @@ static inline void enable_irq_lockdep(un > enable_irq(irq); > } > =20 > +static inline void enable_irq_lockdep_irqrestore(unsigned int irq, u= nsigned long *flags) > +{ > +#ifdef CONFIG_LOCKDEP > + local_irq_restore(*flags); > +#endif > + enable_irq(irq); > +} > + > /* IRQ wakeup (PM) control: */ > extern int set_irq_wake(unsigned int irq, unsigned int on); > =20 > From: Arjan van de Ven > Subject: lockdep: annotate the ne2000 driver with the new disable_irq= API addition >=20 > The ne2000 driver's xmit function gets called from netpoll with the > _xmit_lock spinlock held as _irqsave. This means the xmit function ne= eds to preserve this > irq-off state throughout to avoid deadlock. It does, but we need to a= lso tell lockdep that > the function indeed does this by using the proper disable_irq annotat= ion. >=20 > Signed-off-by: Arjan van de Ven > Signed-off-by: Ingo Molnar >=20 > Index: linux-2.6.18-rc1/drivers/net/8390.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- linux-2.6.18-rc1.orig/drivers/net/8390.c > +++ linux-2.6.18-rc1/drivers/net/8390.c > @@ -299,7 +299,7 @@ static int ei_start_xmit(struct sk_buff=20 > * Slow phase with lock held. > */ > =20 > - disable_irq_nosync_lockdep(dev->irq); > + disable_irq_nosync_lockdep_irqsave(dev->irq, &flags); > =09 > spin_lock(&ei_local->page_lock); > =09 > @@ -338,7 +338,7 @@ static int ei_start_xmit(struct sk_buff=20 > netif_stop_queue(dev); > outb_p(ENISR_ALL, e8390_base + EN0_IMR); > spin_unlock(&ei_local->page_lock); > - enable_irq_lockdep(dev->irq); > + enable_irq_lockdep_irqrestore(dev->irq, &flags); > ei_local->stat.tx_errors++; > return 1; > } > @@ -379,7 +379,7 @@ static int ei_start_xmit(struct sk_buff=20 > outb_p(ENISR_ALL, e8390_base + EN0_IMR); > =09 > spin_unlock(&ei_local->page_lock); > - enable_irq_lockdep(dev->irq); > + enable_irq_lockdep_irqrestore(dev->irq, &flags); > =20 > dev_kfree_skb (skb); > ei_local->stat.tx_bytes +=3D send_length; >=20 >=20 Well, the warning did not go away: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D [ INFO: inconsistent lock state ] --------------------------------- inconsistent {in-softirq-W} -> {softirq-on-W} usage. swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes: (&dev->_xmit_lock){-+..}, at: [] netpoll_send_skb+0x6b/0xdb {in-softirq-W} state was registered at: [] lock_acquire+0x60/0x80 [] _spin_lock+0x19/0x28 [] dev_watchdog+0x11/0xaf [] run_timer_softirq+0xed/0x145 [] __do_softirq+0x46/0x9c [] do_softirq+0x4d/0xab irq event stamp: 616419 hardirqs last enabled at (616419): [] restore_nocheck+0x12/0= x15 hardirqs last disabled at (616417): [] __do_softirq+0x5f/0x9c softirqs last enabled at (616418): [] __do_softirq+0x97/0x9c softirqs last disabled at (616413): [] do_softirq+0x4d/0xab other info that might help us debug this: no locks held by swapper/1. stack backtrace: [] show_trace+0xd/0x10 [] dump_stack+0x19/0x1d [] print_usage_bug+0x1cc/0x1d9 [] mark_lock+0x22d/0x349 [] __lock_acquire+0x463/0x9a5 [] lock_acquire+0x60/0x80 [] _spin_lock+0x19/0x28 [] netpoll_send_skb+0x6b/0xdb [] netpoll_send_udp+0x1fd/0x207 [] write_msg+0x42/0x6a [] __call_console_drivers+0x3b/0x48 [] _call_console_drivers+0x54/0x58 [] release_console_sem+0x118/0x1ed [] register_console+0x190/0x197 [] init_netconsole+0x4e/0x62 [] init+0x88/0x1e1 [] kernel_thread_helper+0x5/0xb =46YI, here is a diff between previous dmesg and current=20 one (with above patch applied). Please note the=20 "+no locks held by swapper/1." line.=20 --- dmesg-2.6.18-rc1-mm1 2006-07-09 23:23:33.000000000 +0200 +++ - 2006-07-11 23:51:20.760770585 +0200 @@ -1,4 +1,4 @@ -Linux version 2.6.18-rc1-mm1 (laurent@antares.localdomain) (gcc versio= n 4.1.1 20060330 (prerelease)) #67 Sun Jul 9 19:41:56 CEST 2006 +Linux version 2.6.18-rc1-mm1 (laurent@antares.localdomain) (gcc versio= n 4.1.1 20060330 (prerelease)) #68 Tue Jul 11 23:13:20 CEST 2006 BIOS-provided physical RAM map: sanitize start sanitize end @@ -40,7 +40,7 @@ ACPI: DSDT (v001 ASUS A7V133-C 0x00001000 MSFT 0x0100000b) @ 0x00000= 000 ACPI: PM-Timer IO Port: 0xe408 Allocating PCI resources starting at 30000000 (gap: 20000000:dfff0000) -Detected 1410.216 MHz processor. +Detected 1410.384 MHz processor. Built 1 zonelists. Total pages: 131052 Kernel command line: root=3D/dev/vglinux1/lvroot video=3Dvesafb:ywrap,= mtrr splash=3Dsilent resume=3D/dev/hdb6 netconsole=3D@192.163.0.3/,@192= =2E168.0.1/00:0E:9B:91:ED:72 netconsole: local port 6665 @@ -153,7 +153,7 @@ Inode-cache hash table entries: 32768 (order: 5, 131072 bytes) Memory: 514484k/524208k available (1612k kernel code, 9184k reserved, = 1175k data, 176k init, 0k highmem) Checking if this processor honours the WP bit even in supervisor mode.= =2E. Ok. -Calibrating delay using timer specific routine.. 2824.42 BogoMIPS (lpj= =3D5648858) +Calibrating delay using timer specific routine.. 2824.46 BogoMIPS (lpj= =3D5648924) Mount-cache hash table entries: 512 CPU: After generic identify, caps: 0383f9ff c1cbf9ff 00000000 00000000= 00000000 00000000 00000000 CPU: After vendor identify, caps: 0383f9ff c1cbf9ff 00000000 00000000 = 00000000 00000000 00000000 @@ -236,40 +236,39 @@ --------------------------------- inconsistent {in-softirq-W} -> {softirq-on-W} usage. swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes: - (&dev->_xmit_lock){-+..}, at: [] netpoll_send_skb+0x6b/0xdb + (&dev->_xmit_lock){-+..}, at: [] netpoll_send_skb+0x6b/0xdb {in-softirq-W} state was registered at: - [] lock_acquire+0x60/0x80 - [] _spin_lock+0x19/0x28 - [] dev_watchdog+0x11/0xaf + [] lock_acquire+0x60/0x80 + [] _spin_lock+0x19/0x28 + [] dev_watchdog+0x11/0xaf [] run_timer_softirq+0xed/0x145 [] __do_softirq+0x46/0x9c [] do_softirq+0x4d/0xab -irq event stamp: 616842 -hardirqs last enabled at (616841): [] _spin_unlock_irqresto= re+0x36/0x3c -hardirqs last disabled at (616842): [] _spin_lock_irqsave+0x= f/0x32 -softirqs last enabled at (616802): [] dev_mc_upload+0x36/0x= 3a -softirqs last disabled at (616798): [] _spin_lock_bh+0xb/0x2= d +irq event stamp: 616419 +hardirqs last enabled at (616419): [] restore_nocheck+0x12/= 0x15 +hardirqs last disabled at (616417): [] __do_softirq+0x5f/0x9= c +softirqs last enabled at (616418): [] __do_softirq+0x97/0x9= c +softirqs last disabled at (616413): [] do_softirq+0x4d/0xab =20 other info that might help us debug this: -1 lock held by swapper/1: - #0: (&dev->_xmit_lock){-+..}, at: [] netpoll_send_skb+0x6b= /0xdb +no locks held by swapper/1. =20 stack backtrace: [] show_trace+0xd/0x10 [] dump_stack+0x19/0x1d - [] print_usage_bug+0x1cc/0x1d9 - [] mark_lock+0x22d/0x349 - [] mark_held_locks+0x47/0x65 - [] trace_hardirqs_on+0xef/0x119 - [] ei_start_xmit+0x242/0x260 - [] netpoll_send_skb+0x89/0xdb - [] netpoll_send_udp+0x1fd/0x207 - [] write_msg+0x42/0x6a + [] print_usage_bug+0x1cc/0x1d9 + [] mark_lock+0x22d/0x349 + [] __lock_acquire+0x463/0x9a5 + [] lock_acquire+0x60/0x80 + [] _spin_lock+0x19/0x28 + [] netpoll_send_skb+0x6b/0xdb + [] netpoll_send_udp+0x1fd/0x207 + [] write_msg+0x42/0x6a [] __call_console_drivers+0x3b/0x48 [] _call_console_drivers+0x54/0x58 [] release_console_sem+0x118/0x1ed [] register_console+0x190/0x197 - [] init_netconsole+0x4e/0x62 + [] init_netconsole+0x4e/0x62 [] init+0x88/0x1e1 [] kernel_thread_helper+0x5/0xb netconsole: network logging started @@ -347,8 +346,8 @@ hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKD] -> GSI 5 (level, lo= w) -> IRQ 5 -usb 1-2: new low speed USB device using uhci_hcd and address 2 ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=3D[5] MMIO=3D[c5800000-c= 58007ff] Max Packet=3D[2048] IR/IT contexts=3D[8/8] +usb 1-2: new low speed USB device using uhci_hcd and address 2 usb 1-2: new device found, idVendor=3D044f, idProduct=3Db303 usb 1-2: new device strings: Mfr=3D4, Product=3D30, SerialNumber=3D0 usb 1-2: Product: FireStorm Dual Analog 2 @@ -359,14 +358,14 @@ input: USB HID v1.10 Gamepad [THRUSTMASTER FireStorm Dual Analog 2] on= usb-0000:00:04.2-2 usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver -ohci1394: fw-host0: AT dma reset ctx=3D0, aborting transmission -ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting... Linux agpgart interface v0.101 (c) Dave Jones agpgart: Detected VIA Twister-K/KT133x/KM133 chipset +ohci1394: fw-host0: AT dma reset ctx=3D0, aborting transmission +ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting... agpgart: AGP aperture is 256M @ 0xd0000000 input: PC Speaker as /class/input/input3 -ieee1394: Host added: ID:BUS[0-00:1023] GUID[00308d0120e085ca] Using specific hotkey driver +ieee1394: Host added: ID:BUS[0-00:1023] GUID[00308d0120e085ca] EXT3 FS on dm-4, internal journal Adding 257000k swap on /dev/hdb6. Priority:1 extents:1 across:257000k Adding 64220k swap on /dev/hda5. Priority:2 extents:1 across:64220k @@ -382,7 +381,7 @@ z_hash_table: 8192 buckets z_hash_table: 8192 buckets j_hash_table: 16384 buckets -loading reiser4 bitmap......done (33 jiffies) +loading reiser4 bitmap......done (31 jiffies) ReiserFS: hda7: found reiserfs format "3.6" with standard journal ReiserFS: hda7: using ordered data mode ReiserFS: hda7: journal params: device hda7, size 8192, journal first = block 18, max trans len 1024, max batch 900, max commit age 30, max tra= ns age 30 @@ -409,8 +408,3 @@ Using specific hotkey driver ACPI: PCI Interrupt 0000:00:0d.0[A] -> Link [LNKD] -> GSI 5 (level, lo= w) -> IRQ 5 NET: Registered protocol family 17 -zcip uses obsolete (PF_INET,SOCK_PACKET) -device eth0 entered promiscuous mode -device eth0 left promiscuous mode -martian source 255.255.255.255 from 82.64.105.254, on dev eth0 -ll header: ff:ff:ff:ff:ff:ff:00:07:cb:0e:8e:f9:08:00 --=20 laurent