* odd memory corruption in 2.5.27?
@ 2002-07-23 6:23 Zwane Mwaikambo
2002-07-23 6:24 ` Zwane Mwaikambo
0 siblings, 1 reply; 29+ messages in thread
From: Zwane Mwaikambo @ 2002-07-23 6:23 UTC (permalink / raw)
To: Alexander Viro; +Cc: Linux Kernel
Hi Al,
kernel is 2.5.27, the box was doing a 'make -j2 bzImage' at the
time on an NFS mounted filesystem, the server is 2.4.19-re5-ac3
(gdb) list *0xc014b09c
0xc014b09c is in filp_close (open.c:834).
829 */
830 int filp_close(struct file *filp, fl_owner_t id)
831 {
832 int retval;
833
834 if (!file_count(filp)) {
835 printk(KERN_ERR "VFS: Close: file count is 0\n");
836 return 0;
837 }
838 retval = 0;
Unable to handle kernel NULL pointer dereference at virtual address 0000008c
printing eip:
c014b09c
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c014b09c>] Not tainted
EFLAGS: 00010206
eax: cea36a44 ebx: cea36920 ecx: cb235b30 edx: 00000078
esi: 00000078 edi: 00000001 ebp: cea36920 esp: ca9bfddc
ds: 0018 es: 0018 ss: 0018
Process fixdep (pid: 1015, threadinfo=ca9be000 task=c8245980)
Stack: cea36920 00000001 00000001 0000000a c01202db 00000078 cea36920 ca9be000
c8245980 cea36920 00000000 c0120e40 00000000 c036bb00 ca9bff28 00000002
c0314a6f 00000001 c0340018 c0310018 ffffff00 c0108622 00000010 00000202
Call Trace: [<c01202db>] [<c0120e40>] [<c0108622>] [<c0115261>] [<c014cd1d>]
[<c01616ac>] [<c0157cda>] [<c0159020>] [<c012f75c>] [<c0114e90>] [<c01080a0>]
[<c0120018>] [<c014cd1d>] [<c0154970>] [<c0154ed1>] [<c014b066>] [<c01075db>]
Code: 8b 46 14 85 c0 75 12 68 c0 6e 31 c0 e8 13 1b fd ff 5a 31 c0
>>EIP; c014b09c <filp_close+c/130> <=====
Trace; c01202db <put_files_struct+4b/d0>
Trace; c0120e40 <do_exit+210/520>
Trace; c0108622 <die+f2/100>
Trace; c0115261 <do_page_fault+3d1/506>
Trace; c014cd1d <fget+4d/70>
Trace; c01616ac <dput+1c/240>
Trace; c0157cda <permission+1a/30>
Trace; c0159020 <link_path_walk+a30/a60>
Trace; c012f75c <zap_pmd_range+4c/60>
Trace; c0114e90 <do_page_fault+0/506>
Trace; c01080a0 <error_code+34/40>
Trace; c0120018 <will_become_orphaned_pgrp+48/d0>
Trace; c014cd1d <fget+4d/70>
Trace; c0154970 <vfs_fstat+10/40>
Trace; c0154ed1 <sys_fstat64+11/30>
Trace; c014b066 <sys_open+66/70>
Trace; c01075db <syscall_call+7/b>
Code; c014b09c <filp_close+c/130>
00000000 <_EIP>:
Code; c014b09c <filp_close+c/130> <=====
0: 8b 46 14 mov 0x14(%esi),%eax <=====
Code; c014b09f <filp_close+f/130>
3: 85 c0 test %eax,%eax
Code; c014b0a1 <filp_close+11/130>
5: 75 12 jne 19 <_EIP+0x19> c014b0b5 <filp_close+25/130>
Code; c014b0a3 <filp_close+13/130>
7: 68 c0 6e 31 c0 push $0xc0316ec0
Code; c014b0a8 <filp_close+18/130>
c: e8 13 1b fd ff call fffd1b24 <_EIP+0xfffd1b24> c011cbc0 <printk+0/210>
Code; c014b0ad <filp_close+1d/130>
11: 5a pop %edx
Code; c014b0ae <filp_close+1e/130>
12: 31 c0 xor %eax,%eax
--
function.linuxpower.ca
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: odd memory corruption in 2.5.27? 2002-07-23 6:23 odd memory corruption in 2.5.27? Zwane Mwaikambo @ 2002-07-23 6:24 ` Zwane Mwaikambo 2002-07-23 6:26 ` Zwane Mwaikambo 0 siblings, 1 reply; 29+ messages in thread From: Zwane Mwaikambo @ 2002-07-23 6:24 UTC (permalink / raw) To: Alexander Viro; +Cc: Linux Kernel Hi Al, Hmm this is wild pointer #2, whats -really- freaky is that its the same address 0x14(00000078), this was also over NFS. Unable to handle kernel NULL pointer dereference at virtual address 0000008c c014cd1d *pde = 00000000 Oops: 0002 CPU: 0 EIP: 0010:[<c014cd1d>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010206 eax: ccefb4c0 ebx: 00000078 ecx: 00000003 edx: ccf30000 esi: ccefb39c edi: 00000003 ebp: bfffe998 esp: ccf31f5c ds: 0018 es: 0018 ss: 0018 Stack: ccf31f7c fffffff7 c0154970 ccf31f7c bfffe9e0 c0154ed1 00000003 ccf31f7c 00000246 00000246 c4ca7000 00000001 00000000 00000000 00000400 00000078 00000003 c4ca7000 bfffe9c8 c014b066 c16b6a30 c4ca7000 ccf30000 bfffe9e0 Call Trace: [<c0154970>] [<c0154ed1>] [<c014b066>] [<c01075db>] Code: f0 ff 43 14 f0 ff 46 04 ff 4a 10 8b 42 08 83 e0 08 74 05 e8 >>EIP; c014cd1d <fget+4d/70> <===== Trace; c0154970 <vfs_fstat+10/40> Trace; c0154ed1 <sys_fstat64+11/30> Trace; c014b066 <sys_open+66/70> Trace; c01075db <syscall_call+7/b> Code; c014cd1d <fget+4d/70> 00000000 <_EIP>: Code; c014cd1d <fget+4d/70> <===== 0: f0 ff 43 14 lock incl 0x14(%ebx) <===== Code; c014cd21 <fget+51/70> 4: f0 ff 46 04 lock incl 0x4(%esi) Code; c014cd25 <fget+55/70> 8: ff 4a 10 decl 0x10(%edx) Code; c014cd28 <fget+58/70> b: 8b 42 08 mov 0x8(%edx),%eax Code; c014cd2b <fget+5b/70> e: 83 e0 08 and $0x8,%eax Code; c014cd2e <fget+5e/70> 11: 74 05 je 18 <_EIP+0x18> c014cd35 <fget+65/70> Code; c014cd30 <fget+60/70> -- function.linuxpower.ca ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: odd memory corruption in 2.5.27? 2002-07-23 6:24 ` Zwane Mwaikambo @ 2002-07-23 6:26 ` Zwane Mwaikambo 2002-07-23 7:57 ` Trond Myklebust 0 siblings, 1 reply; 29+ messages in thread From: Zwane Mwaikambo @ 2002-07-23 6:26 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux Kernel, Alexander Viro Hi Trond, Arnaldo, Al I tried reproducing using a local filesystem and couldn't (machine survived 3 make -j10 kernel compiles). Here is another oops for the collection. Al i'll remove you from further CCs now. client: 2.5.27-serial, 3c905B server: 2.4.19-pre5-ac3, 3c905B connection: 100Mb/FD I got this message before it oopsed; RPC: garbage, exit EIO Unable to handle kernel paging request at virtual address 5f707275 c013f822 *pde = 00000000 Oops: 0000 CPU: 0 EIP: 0010:[<c013f822>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010202 eax: 5f707261 ebx: 00000001 ecx: 5f707261 edx: 00000000 esi: c826d2e4 edi: c826d2e4 ebp: c038c000 esp: c038deb0 ds: 0018 es: 0018 ss: 0018 Stack: c028d4fe c826d2e4 cea012cc c028d53c c826d2e4 cde2ca04 cea012cc c028d67d c826d2e4 c02a2b00 c02a28d3 c826d2e4 c8e0ae2c c02a2b00 c038c000 c02a2c21 cea012cc c8e0ae2c c02a2c8b cea012cc 00000001 00000000 cc00fc0c 00000000 Call Trace: [<c028d4fe>] [<c028d53c>] [<c028d67d>] [<c02a2b00>] [<c02a28d3>] [<c02a2b00>] [<c02a2c21>] [<c02a2c8b>] [<c0126614>] [<c0122950>] [<c01227e0>] [<c01224eb>] [<c0109d82>] [<c01053a0>] [<c0107f2a>] [<c01053a0>] [<c01053a0>] [<c01053ca>] [<c0105472>] [<c0105000>] Code: 8b 41 14 a9 00 08 00 00 75 14 f0 ff 49 10 0f 94 c0 84 c0 74 >>EIP; c013f822 <__free_pages+2/30> <===== Trace; c028d4fe <skb_release_data+1e/80> Trace; c028d53c <skb_release_data+5c/80> Trace; c028d67d <__kfree_skb+ad/e0> Trace; c02a2b00 <ip_evictor+1d0/200> Trace; c02a28d3 <ip_frag_destroy+43/a0> Trace; c02a2b00 <ip_evictor+1d0/200> Trace; c02a2c21 <ip_expire+f1/1a0> Trace; c02a2c8b <ip_expire+15b/1a0> Trace; c0126614 <timer_bh+324/400> Trace; c0122950 <bh_action+70/140> Trace; c01227e0 <tasklet_hi_action+70/c0> Trace; c01224eb <do_softirq+7b/f0> Trace; c0109d82 <do_IRQ+1d2/1e0> Trace; c01053a0 <default_idle+0/40> Trace; c0107f2a <common_interrupt+22/28> Trace; c01053a0 <default_idle+0/40> Trace; c01053a0 <default_idle+0/40> Trace; c01053ca <default_idle+2a/40> Trace; c0105472 <cpu_idle+52/70> Trace; c0105000 <_stext+0/0> Code; c013f822 <__free_pages+2/30> 00000000 <_EIP>: Code; c013f822 <__free_pages+2/30> <===== 0: 8b 41 14 mov 0x14(%ecx),%eax <===== Code; c013f825 <__free_pages+5/30> 3: a9 00 08 00 00 test $0x800,%eax Code; c013f82a <__free_pages+a/30> 8: 75 14 jne 1e <_EIP+0x1e> c013f840 <__free_pages+20/30> Code; c013f82c <__free_pages+c/30> a: f0 ff 49 10 lock decl 0x10(%ecx) Code; c013f830 <__free_pages+10/30> e: 0f 94 c0 sete %al Code; c013f833 <__free_pages+13/30> 11: 84 c0 test %al,%al Code; c013f835 <__free_pages+15/30> 13: 74 00 je 15 <_EIP+0x15> c013f837 <__free_pages+17/30> <0>Kernel panic: Aiee, killing interrupt handler! -- function.linuxpower.ca ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: odd memory corruption in 2.5.27? 2002-07-23 6:26 ` Zwane Mwaikambo @ 2002-07-23 7:57 ` Trond Myklebust 2002-07-23 8:54 ` Zwane Mwaikambo 0 siblings, 1 reply; 29+ messages in thread From: Trond Myklebust @ 2002-07-23 7:57 UTC (permalink / raw) To: Zwane Mwaikambo; +Cc: Linux Kernel, Alexander Viro On Tuesday 23 July 2002 08:26, Zwane Mwaikambo wrote: > Hi Trond, Arnaldo, Al > I tried reproducing using a local filesystem and couldn't > (machine survived 3 make -j10 kernel compiles). Here is another oops for > the collection. Al i'll remove you from further CCs now. > > client: 2.5.27-serial, 3c905B > server: 2.4.19-pre5-ac3, 3c905B > connection: 100Mb/FD > > I got this message before it oopsed; > RPC: garbage, exit EIO Just means that some RPC message reply from the server was crap. We should deal fine with that sort of thing... AFAICS The Oops itself happened deep down in the socket layer in the part which has to do with reassembling fragments into packets. The garbage collector tried to release a fragment that had timed out and Oopsed. Suggests either memory corruption or else that the networking driver is doing something odd ('cos at that point in the socket layer *only* the driver + the fragment handler should have touched the skb). Cheers, Trond ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: odd memory corruption in 2.5.27? 2002-07-23 7:57 ` Trond Myklebust @ 2002-07-23 8:54 ` Zwane Mwaikambo 2002-07-23 20:32 ` george anzinger 0 siblings, 1 reply; 29+ messages in thread From: Zwane Mwaikambo @ 2002-07-23 8:54 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux Kernel On Tue, 23 Jul 2002, Trond Myklebust wrote: > Just means that some RPC message reply from the server was crap. We should > deal fine with that sort of thing... > > AFAICS The Oops itself happened deep down in the socket layer in the part > which has to do with reassembling fragments into packets. The garbage > collector tried to release a fragment that had timed out and Oopsed. > > Suggests either memory corruption or else that the networking driver is doing > something odd ('cos at that point in the socket layer *only* the driver + the > fragment handler should have touched the skb). Thanks, that helps quite a bit, i'll see if i can pinpoint it and send it to the relevant people. Thanks, Zwane -- function.linuxpower.ca ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: odd memory corruption in 2.5.27? 2002-07-23 8:54 ` Zwane Mwaikambo @ 2002-07-23 20:32 ` george anzinger 2002-07-23 20:47 ` William Lee Irwin III ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: george anzinger @ 2002-07-23 20:32 UTC (permalink / raw) To: Zwane Mwaikambo; +Cc: Trond Myklebust, Linux Kernel Zwane Mwaikambo wrote: > > On Tue, 23 Jul 2002, Trond Myklebust wrote: > > > Just means that some RPC message reply from the server was crap. We should > > deal fine with that sort of thing... > > > > AFAICS The Oops itself happened deep down in the socket layer in the part > > which has to do with reassembling fragments into packets. The garbage > > collector tried to release a fragment that had timed out and Oopsed. > > > > Suggests either memory corruption or else that the networking driver is doing > > something odd ('cos at that point in the socket layer *only* the driver + the > > fragment handler should have touched the skb). > > Thanks, that helps quite a bit, i'll see if i can pinpoint it and send it > to the relevant people. > I just spent a month tracking down this issue. It comes down to the slab allocater using per cpu data structures and protecting them with a combination of interrupt disables and spin_locks. Preemption is allowed (incorrectly) if interrupts are off and preempt_count goes to zero on the spin_unlock. I will wager that this is an SMP machine. After the preemption interrupts will be on (schedule() does that) AND you could be on a different cpu. Either of these is a BAD thing. The proposed fix is to catch the attempted preemption in preempt_schedule() and just return if the interrupt system is off. (Of course there is more that this to it, but I do believe that the problem is known. You could blow this assertion out of the water by asserting that the machine is NOT smp.) -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: odd memory corruption in 2.5.27? 2002-07-23 20:32 ` george anzinger @ 2002-07-23 20:47 ` William Lee Irwin III 2002-07-23 23:28 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Ingo Molnar 2002-07-24 6:52 ` odd memory corruption in 2.5.27? Zwane Mwaikambo 2 siblings, 0 replies; 29+ messages in thread From: William Lee Irwin III @ 2002-07-23 20:47 UTC (permalink / raw) To: george anzinger; +Cc: Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Tue, Jul 23, 2002 at 01:32:11PM -0700, george anzinger wrote: > I just spent a month tracking down this issue. It comes > down to the slab allocater using per cpu data structures and > protecting them with a combination of interrupt disables and > spin_locks. Preemption is allowed (incorrectly) if > interrupts are off and preempt_count goes to zero on the > spin_unlock. I will wager that this is an SMP machine. > After the preemption interrupts will be on (schedule() does > that) AND you could be on a different cpu. Either of these > is a BAD thing. > The proposed fix is to catch the attempted preemption in > preempt_schedule() and just return if the interrupt system > is off. (Of course there is more that this to it, but I do > believe that the problem is known. You could blow this > assertion out of the water by asserting that the machine is > NOT smp.) I've been seeing the slab allocator race like mad already (i.e. BUG at slab.c:1947 and slab.c:1961. I'll test this fix. Thanks, Bill ^ permalink raw reply [flat|nested] 29+ messages in thread
* [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] 2002-07-23 20:32 ` george anzinger 2002-07-23 20:47 ` William Lee Irwin III @ 2002-07-23 23:28 ` Ingo Molnar 2002-07-23 23:53 ` george anzinger ` (2 more replies) 2002-07-24 6:52 ` odd memory corruption in 2.5.27? Zwane Mwaikambo 2 siblings, 3 replies; 29+ messages in thread From: Ingo Molnar @ 2002-07-23 23:28 UTC (permalink / raw) To: george anzinger Cc: Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Tue, 23 Jul 2002, george anzinger wrote: > I just spent a month tracking down this issue. It comes > down to the slab allocater using per cpu data structures and > protecting them with a combination of interrupt disables and > spin_locks. Preemption is allowed (incorrectly) if > interrupts are off and preempt_count goes to zero on the > spin_unlock. [...] > The proposed fix is to catch the attempted preemption in > preempt_schedule() and just return if the interrupt system > is off. [...] this is most definitely not the correct fix ... i'm quite convinced that the fix is to avoid illegal preemption, not to work it around. i've written debugging code that caught and reported this slab.c bug within minutes. The code detects the irqs-off condition in schedule(). You can find it my latest irqlock patchset, at: http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.27-G3 it fixes this and other related bugs as well. Changes in -G3: - slab.c needs to spin_unlock_no_resched(), instead of spin_unlock(). (It also has to check for preemption in the right spot.) This should fix the memory corruption. - irq_exit() needs to run softirqs if interrupts not active - in the previous patch it ran them when preempt_count() was 0, which is incorrect. - spinlock macros are updated to enable preemption after enabling interrupts. Besides avoiding false positive warnings, this also - fork.c has to call scheduler_tick() with preemption disabled - otherwise scheduler_tick()'s spin_unlock can preempt! - irqs_disabled() macro introduced. - [ all other local_irq_enable() or sti instances conditional on CONFIG_DEBUG_IRQ_SCHEDULE are to fix false positive warnings. ] Changes in -G0: - fix buggy in_softirq(). Fortunately the bug made the test broader, which didnt result in algorithmical breakage, just suboptimal performance. - move do_softirq() processing into irq_exit() => this also fixes the softirq processing bugs present in apic.c IRQ handlers that did not test for softirqs after irq_exit(). - simplify local_bh_enable(). Changes in -F9: - replace all instances of: local_save_flags(flags); local_irq_disable(); with the shorter form of: local_irq_save(flags); about 30 files are affected by this change. Changes in -F8: - preempt/hardirq/softirq count separation, cleanups. - skbuff.c fix. - use irq_count() in scheduler_tick() Changes in -F3: - the entry.S cleanups/speedups by Oleg Nesterov. - a rather critical synchronize_irq() bugfix: if a driver frees an interrupt that is still being probed then synchronize_irq() locks up. This bug has caused a spurious boot-lockup on one of my testsystems, ifconfig would lock up trying to close eth0. - remove duplicate definitions from asm-i386/system.h, this fixes compiler warnings. Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] 2002-07-23 23:28 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Ingo Molnar @ 2002-07-23 23:53 ` george anzinger 2002-07-23 23:56 ` Linus Torvalds 2002-07-24 1:08 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Robert Love 2 siblings, 0 replies; 29+ messages in thread From: george anzinger @ 2002-07-23 23:53 UTC (permalink / raw) To: Ingo Molnar Cc: Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel Ingo Molnar wrote: > > On Tue, 23 Jul 2002, george anzinger wrote: > > > I just spent a month tracking down this issue. It comes > > down to the slab allocater using per cpu data structures and > > protecting them with a combination of interrupt disables and > > spin_locks. Preemption is allowed (incorrectly) if > > interrupts are off and preempt_count goes to zero on the > > spin_unlock. [...] > > > The proposed fix is to catch the attempted preemption in > > preempt_schedule() and just return if the interrupt system > > is off. [...] > > this is most definitely not the correct fix ... > > i'm quite convinced that the fix is to avoid illegal preemption, not to > work it around. I like this. The only change I would make is to enable interrupts in exit.c and entry.S where they are only enabled under the debug condition now, (mostly I try to avoid Heisenberg). I really do like the ability to track down this problem by just turing on the debug option. > > i've written debugging code that caught and reported this slab.c bug > within minutes. Yeah, its easy when you know what to look for :) > The code detects the irqs-off condition in schedule(). You > can find it my latest irqlock patchset, at: > > http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.27-G3 > > it fixes this and other related bugs as well. > > Changes in -G3: > > - slab.c needs to spin_unlock_no_resched(), instead of spin_unlock(). (It > also has to check for preemption in the right spot.) This should fix > the memory corruption. > > - irq_exit() needs to run softirqs if interrupts not active - in the > previous patch it ran them when preempt_count() was 0, which is > incorrect. > > - spinlock macros are updated to enable preemption after enabling > interrupts. Besides avoiding false positive warnings, this also > > - fork.c has to call scheduler_tick() with preemption disabled - > otherwise scheduler_tick()'s spin_unlock can preempt! > > - irqs_disabled() macro introduced. > > - [ all other local_irq_enable() or sti instances conditional on > CONFIG_DEBUG_IRQ_SCHEDULE are to fix false positive warnings. ] > > Changes in -G0: > > - fix buggy in_softirq(). Fortunately the bug made the test broader, > which didnt result in algorithmical breakage, just suboptimal > performance. > > - move do_softirq() processing into irq_exit() => this also fixes the > softirq processing bugs present in apic.c IRQ handlers that did not > test for softirqs after irq_exit(). > > - simplify local_bh_enable(). > > Changes in -F9: > > - replace all instances of: > > local_save_flags(flags); > local_irq_disable(); > > with the shorter form of: > > local_irq_save(flags); > > about 30 files are affected by this change. > > Changes in -F8: > > - preempt/hardirq/softirq count separation, cleanups. > > - skbuff.c fix. > > - use irq_count() in scheduler_tick() > > Changes in -F3: > > - the entry.S cleanups/speedups by Oleg Nesterov. > > - a rather critical synchronize_irq() bugfix: if a driver frees an > interrupt that is still being probed then synchronize_irq() locks up. > This bug has caused a spurious boot-lockup on one of my testsystems, > ifconfig would lock up trying to close eth0. > > - remove duplicate definitions from asm-i386/system.h, this fixes > compiler warnings. > > Ingo -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] 2002-07-23 23:28 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Ingo Molnar 2002-07-23 23:53 ` george anzinger @ 2002-07-23 23:56 ` Linus Torvalds 2002-07-24 0:07 ` Ingo Molnar 2002-07-24 1:08 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Robert Love 2 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2002-07-23 23:56 UTC (permalink / raw) To: Ingo Molnar Cc: george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Wed, 24 Jul 2002, Ingo Molnar wrote: > > - slab.c needs to spin_unlock_no_resched(), instead of spin_unlock(). (It > also has to check for preemption in the right spot.) This should fix > the memory corruption. That cannot be right. If we want to drop a spinlock but remain non-preemptible, we should comment that a _lot_, and not just say "xxxx_no_resched()". In fact, I personally think that every "spin_unlock_no_resched()" is an outright BUG. Either the spin_unlock() makes us preemptible (in which case it doesn't matter from a correctness point whether we schedule immediately, or whether something else like a vmalloc fault might force us to schedule soon afterwards), or the spin_unlock is doing something magical, and we depend on the preemptability to not change. In the latter case (which should be very very rare indeed), we should just use /* BIG comment about what we're doing. */ /* We're dropping the spinlock, but we remain non-preemptable */ __raw_spin_unlock(..); and then later on, when preemptability is over, we do local_irq_enable(); preempt_enable(); so that we _clearly_ mark out the region where we must not re-schedule. It is simply not acceptable to just play games with disabling interrupts, and magically "knowing" that we're not preemptable without making that clear some way. Please get rid of spin_unlock_no_schedule() and friends, ok? Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] 2002-07-23 23:56 ` Linus Torvalds @ 2002-07-24 0:07 ` Ingo Molnar 2002-07-24 2:15 ` Linus Torvalds 0 siblings, 1 reply; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 0:07 UTC (permalink / raw) To: Linus Torvalds Cc: george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Tue, 23 Jul 2002, Linus Torvalds wrote: > /* BIG comment about what we're doing. */ > /* We're dropping the spinlock, but we remain non-preemptable */ > __raw_spin_unlock(..); > > and then later on, when preemptability is over, we do > > local_irq_enable(); > preempt_enable(); > > so that we _clearly_ mark out the region where we must not re-schedule. agreed, this is much nicer. i removed the spin_unlock_no_resched() define, and modified the slab.c fix (and the other spin_unlock_no_resched() places) to be more verbose about what they do. we still have preempt_enable_no_resched() - that is legitimately needed in a number of cases - i've added comments to make its usage clear. these cleanups are in the -G5 patch: http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.27-G5 i've test-booted BK-curr + G5, CONFIG_SMP, CONFIG_PREEMPT and CONFIG_DEBUG_IRQ_SCHEDULE kernel on an SMP box, it works fine. Ingo Changes in -G3: - slab.c needs to spin_unlock_no_resched(), instead of spin_unlock(). (It also has to check for preemption in the right spot.) This should fix the memory corruption. - irq_exit() needs to run softirqs if interrupts not active - in the previous patch it ran them when preempt_count() was 0, which is incorrect. - spinlock macros are updated to enable preemption after enabling interrupts. Besides avoiding false positive warnings, this also - fork.c has to call scheduler_tick() with preemption disabled - otherwise scheduler_tick()'s spin_unlock can preempt! - irqs_disabled() macro introduced. - [ all other local_irq_enable() or sti instances conditional on CONFIG_DEBUG_IRQ_SCHEDULE are to fix false positive warnings. ] Changes in -G0: - fix buggy in_softirq(). Fortunately the bug made the test broader, which didnt result in algorithmical breakage, just suboptimal performance. - move do_softirq() processing into irq_exit() => this also fixes the softirq processing bugs present in apic.c IRQ handlers that did not test for softirqs after irq_exit(). - simplify local_bh_enable(). Changes in -F9: - replace all instances of: local_save_flags(flags); local_irq_disable(); with the shorter form of: local_irq_save(flags); about 30 files are affected by this change. Changes in -F8: - preempt/hardirq/softirq count separation, cleanups. - skbuff.c fix. - use irq_count() in scheduler_tick() Changes in -F3: - the entry.S cleanups/speedups by Oleg Nesterov. - a rather critical synchronize_irq() bugfix: if a driver frees an interrupt that is still being probed then synchronize_irq() locks up. This bug has caused a spurious boot-lockup on one of my testsystems, ifconfig would lock up trying to close eth0. - remove duplicate definitions from asm-i386/system.h, this fixes compiler warnings. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] 2002-07-24 0:07 ` Ingo Molnar @ 2002-07-24 2:15 ` Linus Torvalds 2002-07-24 8:59 ` [patch] irqlock patch 2.5.27-H3 Ingo Molnar 0 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2002-07-24 2:15 UTC (permalink / raw) To: Ingo Molnar Cc: george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel Ingo, you really shouldn't use a generic #define like IRQ_MASK for something as obscure as the mask for preemption bits (that apparently gets used exactly _once_ in the same header file that defines it). That #define is already used in the kernel inside various files, and from the things I looked at, other users had more reason to call their stuff IRQ_MASK than the new code has. Also, please don't do things like #define NR_PREEMPT 256 when what you really are doing is doling out bits rather than "numbers". So I think you'd be better off doing #define PREEMPT_BITS 8 #define HARDIRQ_BITS 8 #define SOFTIRQ_BITS 8 #define PREEMPT_SHIFT 0 #define HARDIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS) #define SOFTIRQ_SHIFT (HARDIRQ_SHIFT + PREEMPT_BITS) #define __MASK(x) ((1UL << (x))-1) #define PREEMPT_MASK (__MASK(PREEMPT_BITS) << PREEMPT_SHIFT) #define HARDIRQ_MASK (__MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT) #define SOFTIRQ_MASK (__MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT) #define hardirq_count() (preempt_count() & HARDIRQ_MASK) #define softirq_count() (preempt_count() & SOFTIRQ_MASK) #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK)) which creates bitmasks from bit-operations rather than doing non-bitwise arithmetic to get them from magic values that have to be powers-of-2. And avoids using too generic a name (ie IRQ_MASK is gone). Ehh? Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* [patch] irqlock patch 2.5.27-H3. 2002-07-24 2:15 ` Linus Torvalds @ 2002-07-24 8:59 ` Ingo Molnar 0 siblings, 0 replies; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 8:59 UTC (permalink / raw) To: Linus Torvalds Cc: george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Tue, 23 Jul 2002, Linus Torvalds wrote: > you really shouldn't use a generic #define like IRQ_MASK for something as > obscure as the mask for preemption bits (that apparently gets used > exactly _once_ in the same header file that defines it). > > That #define is already used in the kernel inside various files, and from > the things I looked at, other users had more reason to call their stuff > IRQ_MASK than the new code has. (doh - and i've specifically checked the new names for namespace collision because they looked too generic - apparently not well enough.) > So I think you'd be better off doing > > #define PREEMPT_BITS 8 > #define HARDIRQ_BITS 8 > #define SOFTIRQ_BITS 8 > > #define PREEMPT_SHIFT 0 > #define HARDIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS) > #define SOFTIRQ_SHIFT (HARDIRQ_SHIFT + PREEMPT_BITS) > > #define __MASK(x) ((1UL << (x))-1) > > #define PREEMPT_MASK (__MASK(PREEMPT_BITS) << PREEMPT_SHIFT) > #define HARDIRQ_MASK (__MASK(HARDIRQ_BITS) << HARDIRQ_SHIFT) > #define SOFTIRQ_MASK (__MASK(SOFTIRQ_BITS) << SOFTIRQ_SHIFT) > > #define hardirq_count() (preempt_count() & HARDIRQ_MASK) > #define softirq_count() (preempt_count() & SOFTIRQ_MASK) > #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK)) > > which creates bitmasks from bit-operations rather than doing non-bitwise > arithmetic to get them from magic values that have to be powers-of-2. yeah, agreed. i did one more modification in this area, just for educational purposes it's a better ordering to have: - preempt count - softirq count - hardirq count the higher positioned the bitmask, the 'stronger' and more atomic the execution concept is. Latest patch (against BK-curr) is at: http://redhat.com/~mingo/remove-irqlock-patches/remove-irqlock-2.5.27-H3 Changes in -H3: - init thread needs to have preempt_count of 1 until sched_init(). (William Lee Irwin III) - clean up the irq-mask macros. (Linus) - add barrier() to irq_enter() and irq_exit(). (based on Oleg Nesterov's comment.) - move the irqs-off check into preempt_schedule() and remove CONFIG_DEBUG_IRQ_SCHEDULE. Changes in -G5: - remove spin_unlock_no_resched() and comment the affected places more agressively. Changes in -G3: - slab.c needs to spin_unlock_no_resched(), instead of spin_unlock(). (It also has to check for preemption in the right spot.) This should fix the memory corruption. - irq_exit() needs to run softirqs if interrupts not active - in the previous patch it ran them when preempt_count() was 0, which is incorrect. - spinlock macros are updated to enable preemption after enabling interrupts. Besides avoiding false positive warnings, this also - fork.c has to call scheduler_tick() with preemption disabled - otherwise scheduler_tick()'s spin_unlock can preempt! - irqs_disabled() macro introduced. - [ all other local_irq_enable() or sti instances conditional on CONFIG_DEBUG_IRQ_SCHEDULE are to fix false positive warnings. ] Changes in -G0: - fix buggy in_softirq(). Fortunately the bug made the test broader, which didnt result in algorithmical breakage, just suboptimal performance. - move do_softirq() processing into irq_exit() => this also fixes the softirq processing bugs present in apic.c IRQ handlers that did not test for softirqs after irq_exit(). - simplify local_bh_enable(). Changes in -F9: - replace all instances of: local_save_flags(flags); local_irq_disable(); with the shorter form of: local_irq_save(flags); about 30 files are affected by this change. Changes in -F8: - preempt/hardirq/softirq count separation, cleanups. - skbuff.c fix. - use irq_count() in scheduler_tick() Changes in -F3: - the entry.S cleanups/speedups by Oleg Nesterov. - a rather critical synchronize_irq() bugfix: if a driver frees an interrupt that is still being probed then synchronize_irq() locks up. This bug has caused a spurious boot-lockup on one of my testsystems, ifconfig would lock up trying to close eth0. - remove duplicate definitions from asm-i386/system.h, this fixes compiler warnings. Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] 2002-07-23 23:28 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Ingo Molnar 2002-07-23 23:53 ` george anzinger 2002-07-23 23:56 ` Linus Torvalds @ 2002-07-24 1:08 ` Robert Love 2002-07-24 3:13 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Andrew Morton 2 siblings, 1 reply; 29+ messages in thread From: Robert Love @ 2002-07-24 1:08 UTC (permalink / raw) To: Ingo Molnar Cc: george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Tue, 2002-07-23 at 16:28, Ingo Molnar wrote: > this is most definitely not the correct fix ... > > i'm quite convinced that the fix is to avoid illegal preemption, not to > work it around. I am not sure I am fully convinced one way or the other, but treating every bit of code as we find it scares me. The fact is, if a spin_unlock() can magically reenable interrupts that is a bug. I don't like relying on chance and the possibility your debug tool found the problem... but at the same time, Ingo's solution is a lot cleaner. Linus, Ingo, comments? Attached is the patch George mentioned, against 2.5.27. Robert Love diff -urN linux-2.5.27/include/asm-i386/system.h linux/include/asm-i386/system.h --- linux-2.5.27/include/asm-i386/system.h Sat Jul 20 12:11:05 2002 +++ linux/include/asm-i386/system.h Tue Jul 23 18:03:47 2002 @@ -270,6 +270,13 @@ /* Compiling for a 386 proper. Is it worth implementing via cli/sti? */ #endif +#define MASK_IF 0x200 +#define interrupts_enabled() ({ \ + int flg; \ + __save_flags(flg); \ + flg & MASK_IF; \ +}) + /* * Force strict CPU ordering. * And yes, this is required on UP too when we're talking diff -urN linux-2.5.27/kernel/sched.c linux/kernel/sched.c --- linux-2.5.27/kernel/sched.c Sat Jul 20 12:11:11 2002 +++ linux/kernel/sched.c Tue Jul 23 18:02:13 2002 @@ -899,7 +899,7 @@ { struct thread_info *ti = current_thread_info(); - if (unlikely(ti->preempt_count)) + if (unlikely(ti->preempt_count || !interrupts_enabled())) return; need_resched: ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] 2002-07-24 1:08 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Robert Love @ 2002-07-24 3:13 ` Andrew Morton 2002-07-24 3:18 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Andrew Morton @ 2002-07-24 3:13 UTC (permalink / raw) To: Robert Love Cc: Ingo Molnar, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel Robert Love wrote: > > ... > Attached is the patch George mentioned, against 2.5.27. > I'd agree with Robert on that. The idea should be that, as much as possible, preempt "just works" as long as the programmer sticks with standard SMP programming practices. And yet here we have a case where a spin_unlock() will go and turn on local interrupts. Only with CONFIG_PREEMPT, and even then, extremely rarely. Plus it would be nice to have an arch-independent way of querying whether the current CPU has interrupts enabled. I've needed that several times within debug/devel code. Robert and George's patch doesn't seem to be optimal though - if we're not going to preempt at spin_unlock() time, we need to preempt at local_irq_restore() time. It'll be untrivial to fix all this, but this very subtle change to the locking semantics with CONFIG_PREEMPT is quite nasty. - ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] 2002-07-24 3:13 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Andrew Morton @ 2002-07-24 3:18 ` Linus Torvalds 2002-07-24 7:13 ` Ingo Molnar 2002-07-24 7:34 ` Ingo Molnar 2002-07-24 7:37 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Ingo Molnar 2 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2002-07-24 3:18 UTC (permalink / raw) To: Andrew Morton Cc: Robert Love, Ingo Molnar, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Tue, 23 Jul 2002, Andrew Morton wrote: > > And yet here we have a case where a spin_unlock() will > go and turn on local interrupts. Only with CONFIG_PREEMPT, > and even then, extremely rarely. I think that's just a bug, the same way it was a bug that preemtion would sometimes set tsk->state to TASK_RUNNING. I think Robert already sent a fix: make "preempt_schedule()" refuse to schedule if local interrupts are disabled. That, together with making it a warning (so that we can _fix_ places that have unbalanced irq/spinlock behaviour) shoul dbe fine. Eventually, if we think all places are fixed, we can remove the test from preempt_schedule(). Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] 2002-07-24 3:18 ` Linus Torvalds @ 2002-07-24 7:13 ` Ingo Molnar 0 siblings, 0 replies; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 7:13 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Tue, 23 Jul 2002, Linus Torvalds wrote: > > And yet here we have a case where a spin_unlock() will > > go and turn on local interrupts. Only with CONFIG_PREEMPT, > > and even then, extremely rarely. > > I think that's just a bug, the same way it was a bug that preemtion would > sometimes set tsk->state to TASK_RUNNING. > > I think Robert already sent a fix: make "preempt_schedule()" refuse to > schedule if local interrupts are disabled. my problem with Robert's patch is that the intention is not debugging, the intention of the change was make it the standard thing. This just hides serious bugs like the one in slab.c. I'd suggest to rather fix these bugs and be aware of them via a debugging mechanism, instead of putting one more (not quite cheap) check into one of our hotpaths. > That, together with making it a warning (so that we can _fix_ places > that have unbalanced irq/spinlock behaviour) shoul dbe fine. [...] yep - i've moved the check from schedule() to preempt_schedule(), which clearly is the most serious offender. This enabled the removal of the CONFIG_DEBUG_IRQ_SCHEDULE define. Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] 2002-07-24 3:13 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Andrew Morton 2002-07-24 3:18 ` Linus Torvalds @ 2002-07-24 7:34 ` Ingo Molnar 2002-07-24 8:00 ` [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] Andrew Morton 2002-07-24 7:37 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Ingo Molnar 2 siblings, 1 reply; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 7:34 UTC (permalink / raw) To: Andrew Morton Cc: Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Tue, 23 Jul 2002, Andrew Morton wrote: > Robert and George's patch doesn't seem to be optimal though - if we're > not going to preempt at spin_unlock() time, we need to preempt at > local_irq_restore() time. It'll be untrivial to fix all this, but this > very subtle change to the locking semantics with CONFIG_PREEMPT is quite > nasty. this is precisely the reason why we cannot pretend these bugs do not exist and just work this around in preempt_schedule(). Code that relies on cli/sti for atomicity should be pretty rare and limited, there's 1 known case so far where it leads to bugs. Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 7:34 ` Ingo Molnar @ 2002-07-24 8:00 ` Andrew Morton 2002-07-24 7:54 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Andrew Morton @ 2002-07-24 8:00 UTC (permalink / raw) To: Ingo Molnar Cc: Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel Ingo Molnar wrote: > > On Tue, 23 Jul 2002, Andrew Morton wrote: > > > Robert and George's patch doesn't seem to be optimal though - if we're > > not going to preempt at spin_unlock() time, we need to preempt at > > local_irq_restore() time. It'll be untrivial to fix all this, but this > > very subtle change to the locking semantics with CONFIG_PREEMPT is quite > > nasty. > > this is precisely the reason why we cannot pretend these bugs do not exist > and just work this around in preempt_schedule(). But there is no bug in slab. The bug is that spin_unlock() is scheduling inside local_irq_disable(). > Code that relies on > cli/sti for atomicity should be pretty rare and limited, there's 1 known > case so far where it leads to bugs. Are you implying that all code which does spin_unlock() inside local_irq_disable() needs to be converted to use _raw_spin_unlock()? If so then, umm, ugh. I hope that the debug check is working for CONFIG_PREEMPT=n. BTW, what is the situation with spin_unlock_irq[restore]()? Seems that these will schedule inside local_irq_disable() quite a lot? - ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 8:00 ` [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] Andrew Morton @ 2002-07-24 7:54 ` Ingo Molnar 2002-07-24 8:03 ` William Lee Irwin III 2002-07-24 16:40 ` Linus Torvalds 2 siblings, 0 replies; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 7:54 UTC (permalink / raw) To: Andrew Morton Cc: Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Wed, 24 Jul 2002, Andrew Morton wrote: > > Code that relies on > > cli/sti for atomicity should be pretty rare and limited, there's 1 known > > case so far where it leads to bugs. > > Are you implying that all code which does spin_unlock() inside > local_irq_disable() needs to be converted to use _raw_spin_unlock()? If > so then, umm, ugh. I hope that the debug check is working for > CONFIG_PREEMPT=n. yes, it works for CONFIG_PREEMT=n as well. > BTW, what is the situation with spin_unlock_irq[restore]()? Seems that > these will schedule inside local_irq_disable() quite a lot? i changed the order in my patch - and there's another valid reason for it: to slightly reduce the amount of time spent with irqs disabled. Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 8:00 ` [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] Andrew Morton 2002-07-24 7:54 ` Ingo Molnar @ 2002-07-24 8:03 ` William Lee Irwin III 2002-07-24 8:06 ` Ingo Molnar 2002-07-24 16:40 ` Linus Torvalds 2 siblings, 1 reply; 29+ messages in thread From: William Lee Irwin III @ 2002-07-24 8:03 UTC (permalink / raw) To: Andrew Morton Cc: Ingo Molnar, Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel Ingo Molnar wrote: >> Code that relies on >> cli/sti for atomicity should be pretty rare and limited, there's 1 known >> case so far where it leads to bugs. On Wed, Jul 24, 2002 at 01:00:00AM -0700, Andrew Morton wrote: > Are you implying that all code which does spin_unlock() inside > local_irq_disable() needs to be converted to use _raw_spin_unlock()? > If so then, umm, ugh. I hope that the debug check is working > for CONFIG_PREEMPT=n. > BTW, what is the situation with spin_unlock_irq[restore]()? Seems > that these will schedule inside local_irq_disable() quite a lot? Since we're on the subject of preempt_schedule() being done at inappropriate times, I'm seeing it being called and panicking the machine before the per-cpu GDT, IDT, TSS, and LDT are loaded in cpu_init() and suspect that may be a wee bit too early to be sane. Cheers, Bill ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 8:03 ` William Lee Irwin III @ 2002-07-24 8:06 ` Ingo Molnar 2002-07-24 8:15 ` William Lee Irwin III 0 siblings, 1 reply; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 8:06 UTC (permalink / raw) To: William Lee Irwin III Cc: Andrew Morton, Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Wed, 24 Jul 2002, William Lee Irwin III wrote: > Since we're on the subject of preempt_schedule() being done at > inappropriate times, I'm seeing it being called and panicking the > machine before the per-cpu GDT, IDT, TSS, and LDT are loaded in > cpu_init() and suspect that may be a wee bit too early to be sane. i've fixed this in my tree: the init thread needs to start up with a nonzero preempt_count, and schedule_init() sets it to 0. [schedule_init() is the point after we can schedule.] Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 8:06 ` Ingo Molnar @ 2002-07-24 8:15 ` William Lee Irwin III 2002-07-24 8:17 ` Ingo Molnar 0 siblings, 1 reply; 29+ messages in thread From: William Lee Irwin III @ 2002-07-24 8:15 UTC (permalink / raw) To: Ingo Molnar Cc: Andrew Morton, Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Wed, 24 Jul 2002, William Lee Irwin III wrote: >> Since we're on the subject of preempt_schedule() being done at >> inappropriate times, I'm seeing it being called and panicking the >> machine before the per-cpu GDT, IDT, TSS, and LDT are loaded in >> cpu_init() and suspect that may be a wee bit too early to be sane. On Wed, Jul 24, 2002 at 10:06:28AM +0200, Ingo Molnar wrote: > i've fixed this in my tree: the init thread needs to start up with a > nonzero preempt_count, and schedule_init() sets it to 0. [schedule_init() > is the point after we can schedule.] Sorry about the duplicated report, I must have missed the fix in the changelogs. I've been following closely in general partly because my machines are very sensitive to i386 interrupt changes, and I need to help do testing and report things early as you're not likely to have a similar machine. Since it's where the fixes are going, it looks like I'd better use your tree. I'll follow up once I've updated and had time to test. (They are unfortunately very slow to boot.) Thanks, Bill ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 8:15 ` William Lee Irwin III @ 2002-07-24 8:17 ` Ingo Molnar 0 siblings, 0 replies; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 8:17 UTC (permalink / raw) To: William Lee Irwin III Cc: Andrew Morton, Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Wed, 24 Jul 2002, William Lee Irwin III wrote: > > i've fixed this in my tree: the init thread needs to start up with a > > nonzero preempt_count, and schedule_init() sets it to 0. [schedule_init() > > is the point after we can schedule.] > > Sorry about the duplicated report, I must have missed the fix in the > changelogs. [...] sorry - i was too compact. What i wanted to say: "i agree that this is a bug, and based on your report i've fixed this bug in my tree, it will show up in the next patch". Thanks! Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 8:00 ` [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] Andrew Morton 2002-07-24 7:54 ` Ingo Molnar 2002-07-24 8:03 ` William Lee Irwin III @ 2002-07-24 16:40 ` Linus Torvalds 2002-07-24 16:49 ` Robert Love 2 siblings, 1 reply; 29+ messages in thread From: Linus Torvalds @ 2002-07-24 16:40 UTC (permalink / raw) To: Andrew Morton Cc: Ingo Molnar, Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Wed, 24 Jul 2002, Andrew Morton wrote: > > But there is no bug in slab. The bug is that spin_unlock() is > scheduling inside local_irq_disable(). I disagree. There is no reason to believe that you can mix irq disables and spinlocking willy nilly. The two have commonalities, and thinking that you can build up your own special irq-safe spinlocks by hand just isn't _true_. The bug is conceptually definitely in slab.c - although we can't blame it too much since we've modified some behaviour. Just as an example, in the not too distant future what _will_ happen is that spin_lock_irqsave() .. spin_unlock_irqrestore() will not necessarily increment the preemtion count. Why should they? They've disabled local interrupts, so there's no preemption to protect against. That's just an _obvious_ optimization. But that optimization also means that you _have_ to pair the things properly. It is incorrect and buggy to play pairing games. The same way you could not pair the global irq lock with a local unlock, you cannot play games with local_irq_disable() and spinlocks, and expect it to "just work". Normal code should always pair these primitives. And, in fact, 99% of all code _does_ pair them properly, and people would consider anything else strange. The 1% (or less) of code that thinks it wants to be clever will have to live with that decision. We will have code that does the preemption regions by hand, and some day you will need to do spin_lock_irqsave(..) ... ... preempt_disable(); spin_unlock(..); ... ... local_irq_restore(..); preempt_enable(); if you want to do what slab apparently wants to do right now. In short, you should _always_ write out what you actually expect of the environment. Making the assumption that "spin_lock_irqsave" is 100% the same as "local_irq_save + preempt_disable + __spin_lock" is a BUG, simply because it assumes something that has never been guaranteed, and is nothign but an implementation detail. > Are you implying that all code which does spin_unlock() inside > local_irq_disable() needs to be converted to use _raw_spin_unlock()? > If so then, umm, ugh. I hope that the debug check is working > for CONFIG_PREEMPT=n. I'm saying that preempt will have the irq check (for safety and debugging, and it will scream if somebody does something bad), and I'm saying outright that if you expect to not be scheduled, you should _tell_ the kernel so, instead of just thinking that it's implied by something else. You can nest spinlocks any amount you want, but no, you cannot just assume that nesting irq disables and spinlocks gives you the same thing as using the irq-safe spinlocks. We can easily add a debugging check to spin_unlock() that says: /* Somebody messed up, doesn't hold any other preemption thing * than this lock that is now getting released, and has interrupts * disabled */ BUG_ON(preempt_count() == 1 && interrupts_enabled()) No? Linus ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] 2002-07-24 16:40 ` Linus Torvalds @ 2002-07-24 16:49 ` Robert Love 2002-07-24 20:56 ` [patch] irqlock patch -G3. [was Re: odd memorycorruptionin2.5.27?] george anzinger 0 siblings, 1 reply; 29+ messages in thread From: Robert Love @ 2002-07-24 16:49 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Ingo Molnar, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linux Kernel On Wed, 2002-07-24 at 09:40, Linus Torvalds wrote: > Just as an example, in the not too distant future what _will_ happen is > that > > spin_lock_irqsave() > .. > spin_unlock_irqrestore() > > will not necessarily increment the preemtion count. Why should they? > They've disabled local interrupts, so there's no preemption to protect > against. That's just an _obvious_ optimization. So obvious it is even in my queue :) I do not think we are ready yet - there is just way too much code that does not pair up as you mention... but I have played with the patch and some debugging to see just how feasible it is. Fairly soon. Note that other preemptible kernels (IRIX and BeOS, for example) do not even have a preemption count -- all spinlocks also disable interrupts. Not that I am suggesting that, I am very fond of our preempt_count mechanism, but its a point to consider even if it were feasible (e.g. we did not have spinlocks held for hundreds of milliseconds). > We can easily add a debugging check to spin_unlock() that says: > > /* Somebody messed up, doesn't hold any other preemption thing > * than this lock that is now getting released, and has interrupts > * disabled > */ > BUG_ON(preempt_count() == 1 && interrupts_enabled()) > > No? Pretty similar to the debugging I played with... as long as it goes away eventually (who wants this in their unlock path?) we certainly should add it at some point. Robert Love ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memorycorruptionin2.5.27?] 2002-07-24 16:49 ` Robert Love @ 2002-07-24 20:56 ` george anzinger 0 siblings, 0 replies; 29+ messages in thread From: george anzinger @ 2002-07-24 20:56 UTC (permalink / raw) To: Robert Love Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, Zwane Mwaikambo, Trond Myklebust, Linux Kernel Robert Love wrote: > > On Wed, 2002-07-24 at 09:40, Linus Torvalds wrote: > > > Just as an example, in the not too distant future what _will_ happen is > > that > > > > spin_lock_irqsave() > > .. > > spin_unlock_irqrestore() > > > > will not necessarily increment the preemtion count. Why should they? > > They've disabled local interrupts, so there's no preemption to protect > > against. That's just an _obvious_ optimization. I thought this was the thing to do back at the beginning of the preemption saga, but now I am not so sure. Lately I have been doing a bit of code in the clock routines. These get called from both interrupt context and from normal context and thus need to use the IRQ locks. The task entry does a spin_lock_irq() and the interrupt routine KNOWS interrupts are off and just does a spin_lock(). The inner code just knows that it is wrapped by an IRQ off and so does not use the IRQ version of the spin locks it needs. The above change would auger that the "inner" code should use _raw* spinlocks OR should use the nested IRQ locks. But an IRQ lock is a LOT more expensive (due to the hardware sync implied by messing with the interrupt system) than inc and dec on a counter. This, by the way, is the same reason that preempt_enable()/preemp_disable() should be preferred over local_irq_disable()/local_irq_enable() when either would do the job. > > So obvious it is even in my queue :) > > I do not think we are ready yet - there is just way too much code that > does not pair up as you mention... but I have played with the patch and > some debugging to see just how feasible it is. Fairly soon. > > Note that other preemptible kernels (IRIX and BeOS, for example) do not > even have a preemption count -- all spinlocks also disable interrupts. > Not that I am suggesting that, I am very fond of our preempt_count > mechanism, but its a point to consider even if it were feasible (e.g. we > did not have spinlocks held for hundreds of milliseconds). > > > We can easily add a debugging check to spin_unlock() that says: > > > > /* Somebody messed up, doesn't hold any other preemption thing > > * than this lock that is now getting released, and has interrupts > > * disabled > > */ > > BUG_ON(preempt_count() == 1 && interrupts_enabled()) > > > > No? > > Pretty similar to the debugging I played with... as long as it goes away > eventually (who wants this in their unlock path?) we certainly should > add it at some point. I think we also need to have debugging code in the local_irq_enable() to make sure the correct preemption enable is being done. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] 2002-07-24 3:13 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Andrew Morton 2002-07-24 3:18 ` Linus Torvalds 2002-07-24 7:34 ` Ingo Molnar @ 2002-07-24 7:37 ` Ingo Molnar 2 siblings, 0 replies; 29+ messages in thread From: Ingo Molnar @ 2002-07-24 7:37 UTC (permalink / raw) To: Andrew Morton Cc: Robert Love, george anzinger, Zwane Mwaikambo, Trond Myklebust, Linus Torvalds, Linux Kernel On Tue, 23 Jul 2002, Andrew Morton wrote: > [...] It'll be untrivial to fix all this, but this very subtle change to > the locking semantics with CONFIG_PREEMPT is quite nasty. well, preempt is different, lets face it. I'm very strongly against making the irq-on path check for preemption ... Ingo ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: odd memory corruption in 2.5.27? 2002-07-23 20:32 ` george anzinger 2002-07-23 20:47 ` William Lee Irwin III 2002-07-23 23:28 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Ingo Molnar @ 2002-07-24 6:52 ` Zwane Mwaikambo 2 siblings, 0 replies; 29+ messages in thread From: Zwane Mwaikambo @ 2002-07-24 6:52 UTC (permalink / raw) To: george anzinger; +Cc: Trond Myklebust, Linux Kernel On Tue, 23 Jul 2002, george anzinger wrote: > protecting them with a combination of interrupt disables and > spin_locks. Preemption is allowed (incorrectly) if > interrupts are off and preempt_count goes to zero on the > spin_unlock. I will wager that this is an SMP machine. > After the preemption interrupts will be on (schedule() does > that) AND you could be on a different cpu. Either of these > is a BAD thing. > > The proposed fix is to catch the attempted preemption in > preempt_schedule() and just return if the interrupt system > is off. (Of course there is more that this to it, but I do > believe that the problem is known. You could blow this > assertion out of the water by asserting that the machine is > NOT smp.) I haven't looked at it further than gathering oopses and idly browsing surrounding code. About your assertion, you're almost right, its UP box running an SMP kernel w/ CONFIG_PREEMT. -- function.linuxpower.ca ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2002-07-24 21:12 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-07-23 6:23 odd memory corruption in 2.5.27? Zwane Mwaikambo 2002-07-23 6:24 ` Zwane Mwaikambo 2002-07-23 6:26 ` Zwane Mwaikambo 2002-07-23 7:57 ` Trond Myklebust 2002-07-23 8:54 ` Zwane Mwaikambo 2002-07-23 20:32 ` george anzinger 2002-07-23 20:47 ` William Lee Irwin III 2002-07-23 23:28 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Ingo Molnar 2002-07-23 23:53 ` george anzinger 2002-07-23 23:56 ` Linus Torvalds 2002-07-24 0:07 ` Ingo Molnar 2002-07-24 2:15 ` Linus Torvalds 2002-07-24 8:59 ` [patch] irqlock patch 2.5.27-H3 Ingo Molnar 2002-07-24 1:08 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in 2.5.27?] Robert Love 2002-07-24 3:13 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Andrew Morton 2002-07-24 3:18 ` Linus Torvalds 2002-07-24 7:13 ` Ingo Molnar 2002-07-24 7:34 ` Ingo Molnar 2002-07-24 8:00 ` [patch] irqlock patch -G3. [was Re: odd memory corruptionin2.5.27?] Andrew Morton 2002-07-24 7:54 ` Ingo Molnar 2002-07-24 8:03 ` William Lee Irwin III 2002-07-24 8:06 ` Ingo Molnar 2002-07-24 8:15 ` William Lee Irwin III 2002-07-24 8:17 ` Ingo Molnar 2002-07-24 16:40 ` Linus Torvalds 2002-07-24 16:49 ` Robert Love 2002-07-24 20:56 ` [patch] irqlock patch -G3. [was Re: odd memorycorruptionin2.5.27?] george anzinger 2002-07-24 7:37 ` [patch] irqlock patch -G3. [was Re: odd memory corruption in2.5.27?] Ingo Molnar 2002-07-24 6:52 ` odd memory corruption in 2.5.27? Zwane Mwaikambo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox