* Re: 2.6.5-rc1-mm2
2004-03-18 4:14 2.6.5-rc1-mm2 Andrew Morton
@ 2004-03-18 16:01 ` John Cherry
2004-03-18 20:31 ` USB: gphoto2 hangs, device disconnection oddity (was Re: 2.6.5-rc1-mm2) Sean Neakums
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: John Cherry @ 2004-03-18 16:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel@vger.kernel.org
No change with 2.6.5-rc1-mm2.
Linux 2.6 (mm tree) Compile Statistics (gcc 3.2.2)
Warnings/Errors Summary
Kernel          bzImage      bzImage    bzImage    modules    bzImage    modules
                (defconfig)  (allno)    (allyes)   (allyes)   (allmod)   (allmod)
--------------  -----------  ---------  ---------  ---------  ---------  ---------
2.6.5-rc1-mm2   0w/0e        5w/0e      135w/ 5e   8w/0e      3w/0e      133w/0e
2.6.5-rc1-mm1   0w/0e        5w/0e      135w/ 5e   8w/0e      3w/0e      133w/0e
2.6.4-mm2       1w/2e        5w/2e      144w/10e   8w/0e      3w/2e      144w/0e
2.6.4-mm1       1w/0e        5w/0e      146w/ 5e   8w/0e      3w/0e      144w/0e
2.6.4-rc2-mm1   1w/0e        5w/0e      146w/12e   11w/0e     3w/0e      147w/2e
2.6.4-rc1-mm2   1w/0e        5w/0e      144w/ 0e   11w/0e     3w/0e      145w/0e
2.6.4-rc1-mm1   1w/0e        5w/0e      147w/ 5e   11w/0e     3w/0e      147w/0e
2.6.3-mm4       1w/0e        5w/0e      146w/ 0e   7w/0e      3w/0e      142w/0e
2.6.3-mm3       1w/2e        5w/2e      146w/15e   7w/0e      3w/2e      144w/5e
2.6.3-mm2       1w/8e        5w/0e      140w/ 0e   7w/0e      3w/0e      138w/0e
2.6.3-mm1       1w/0e        5w/0e      143w/ 5e   7w/0e      3w/0e      141w/0e
2.6.3-rc3-mm1   1w/0e        0w/0e      144w/13e   7w/0e      3w/0e      142w/3e
2.6.3-rc2-mm1   1w/0e        0w/265e    144w/ 5e   7w/0e      3w/0e      145w/0e
2.6.3-rc1-mm1   1w/0e        0w/265e    141w/ 5e   7w/0e      3w/0e      143w/0e
2.6.2-mm1       2w/0e        0w/264e    147w/ 5e   7w/0e      3w/0e      173w/0e
2.6.2-rc3-mm1   2w/0e        0w/265e    146w/ 5e   7w/0e      3w/0e      172w/0e
2.6.2-rc2-mm2   0w/0e        0w/264e    145w/ 5e   7w/0e      3w/0e      171w/0e
2.6.2-rc2-mm1   0w/0e        0w/264e    146w/ 5e   7w/0e      3w/0e      172w/0e
2.6.2-rc1-mm3   0w/0e        0w/265e    144w/ 8e   7w/0e      3w/0e      169w/0e
2.6.2-rc1-mm2   0w/0e        0w/264e    144w/ 5e   10w/0e     3w/0e      171w/0e
2.6.2-rc1-mm1   0w/0e        0w/264e    144w/ 5e   10w/0e     3w/0e      171w/0e
2.6.1-mm5       2w/5e        0w/264e    153w/11e   10w/0e     3w/0e      180w/0e
2.6.1-mm4       0w/821e      0w/264e    154w/ 5e   8w/1e      5w/0e      179w/0e
2.6.1-mm3       0w/0e        0w/0e      151w/ 5e   10w/0e     3w/0e      177w/0e
2.6.1-mm2       0w/0e        0w/0e      143w/ 5e   12w/0e     3w/0e      171w/0e
2.6.1-mm1       0w/0e        0w/0e      146w/ 9e   12w/0e     6w/0e      171w/0e
2.6.1-rc2-mm1   0w/0e        0w/0e      149w/ 0e   12w/0e     6w/0e      171w/4e
2.6.1-rc1-mm2   0w/0e        0w/0e      157w/15e   12w/0e     3w/0e      185w/4e
2.6.1-rc1-mm1   0w/0e        0w/0e      156w/10e   12w/0e     3w/0e      184w/2e
2.6.0-mm2       0w/0e        0w/0e      161w/ 0e   12w/0e     3w/0e      189w/0e
2.6.0-mm1       0w/0e        0w/0e      173w/ 0e   12w/0e     3w/0e      212w/0e
Web page with links to complete details:
http://developer.osdl.org/cherry/compile/
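
[Editorial sketch, not part of the original message.] The summary rows above record each build as `<warnings>w/<errors>e`. To compare two releases mechanically, something like the following Python could be used. The function names and the column labels are illustrative assumptions; only the cell notation and the column order are taken from the table.

```python
import re

# One "<warnings>w/<errors>e" cell; the error count may be space-padded ("135w/ 5e").
CELL = re.compile(r"(\d+)w/\s*(\d+)e")

# Column order as given in the summary header (build target, config).
COLUMNS = [
    "bzImage (defconfig)", "bzImage (allno)", "bzImage (allyes)",
    "modules (allyes)", "bzImage (allmod)", "modules (allmod)",
]


def parse_row(line):
    """Return (kernel, {column: (warnings, errors)}) for one summary row."""
    kernel, rest = line.split(None, 1)
    cells = [(int(w), int(e)) for w, e in CELL.findall(rest)]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}: {line!r}")
    return kernel, dict(zip(COLUMNS, cells))


def delta(newer, older):
    """Per-column (warning, error) change between two parsed rows."""
    return {c: (newer[c][0] - older[c][0], newer[c][1] - older[c][1])
            for c in COLUMNS}


if __name__ == "__main__":
    rows = [
        "2.6.5-rc1-mm2 0w/0e 5w/0e 135w/ 5e 8w/0e 3w/0e 133w/0e",
        "2.6.4-mm2 1w/2e 5w/2e 144w/10e 8w/0e 3w/2e 144w/0e",
    ]
    (_, new), (_, old) = (parse_row(r) for r in rows)
    for col, (dw, de) in delta(new, old).items():
        print(f"{col}: {dw:+d} warnings, {de:+d} errors")
```

The parser keys on the cell pattern rather than on column positions, so it tolerates any amount of whitespace between cells.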
^ permalink raw reply [flat|nested] 14+ messages in thread
* USB: gphoto2 hangs, device disconnection oddity (was Re: 2.6.5-rc1-mm2)
2004-03-18 4:14 2.6.5-rc1-mm2 Andrew Morton
2004-03-18 16:01 ` 2.6.5-rc1-mm2 John Cherry
@ 2004-03-18 20:31 ` Sean Neakums
2004-03-19 9:27 ` 2.6.5-rc1-mm2 Marc-Christian Petersen
2004-03-30 19:27 ` 2.6.5-rc1-mm2 Jesse Barnes
3 siblings, 0 replies; 14+ messages in thread
From: Sean Neakums @ 2004-03-18 20:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-usb-devel
On one machine (a Dell Inspiron 4100 laptop), with 2.6.5-rc1-mm2 and
2.6.5-rc1-mm1, but not with 2.6.5-rc1, gphoto2 hangs trying to talk to
my camera:
$ ps -C gphoto2 -o comm,s,wchan
COMMAND S WCHAN
gphoto2 D usb_disable_device
However, I was able to connect, mount and perform large transfers to a
USB Storage device without any problems, although the device still
shows up in lsusb after it is umounted and disconnected, and plugging
in the camera has no effect, which is how I first noticed this problem.
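
[Editorial sketch, not part of the original message.] The `ps` invocation above reports the task state (`D`, uninterruptible sleep) and the kernel wait channel. Roughly the same information can be gathered for every blocked task by reading `/proc/<pid>/stat` and `/proc/<pid>/wchan`, as in this Python sketch. The function names are illustrative, and on some kernels `wchan` is only meaningful to privileged readers, so treat the output as a hint rather than a diagnosis.

```python
import os


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so cut at the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for tasks in uninterruptible sleep (state D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # task exited meanwhile, or wchan is not readable
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```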
Here's the Inspiron's controller:
$ sudo lspci -s 00:1d.0 -vvvv
00:1d.0 USB Controller: Intel Corp. 82801CA/CAM USB (Hub #1) (rev 01) (prog-if 00 [UHCI])
Subsystem: Intel Corp.: Unknown device 4541
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 11
Region 4: I/O ports at bf80 [size=32]
But on another machine (Gigabyte 6VTXD board, VIA chipset) running
2.6.5-rc1-mm1, gphoto2 works fine. Here's its controller:
$ sudo lspci -s 00:07 -vvvv
[...]
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 1a) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin D routed to IRQ 10
Region 4: I/O ports at c800 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 1a) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin D routed to IRQ 10
Region 4: I/O ports at cc00 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
[...]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-18 4:14 2.6.5-rc1-mm2 Andrew Morton
2004-03-18 16:01 ` 2.6.5-rc1-mm2 John Cherry
2004-03-18 20:31 ` USB: gphoto2 hangs, device disconnection oddity (was Re: 2.6.5-rc1-mm2) Sean Neakums
@ 2004-03-19 9:27 ` Marc-Christian Petersen
2004-03-30 19:27 ` 2.6.5-rc1-mm2 Jesse Barnes
3 siblings, 0 replies; 14+ messages in thread
From: Marc-Christian Petersen @ 2004-03-19 9:27 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
[-- Attachment #1: Type: text/plain, Size: 211 bytes --]
On Thursday 18 March 2004 05:14, Andrew Morton wrote:
Hi Andrew,
> +move-job-control-stuff-tosignal_struct-sparc64-fix.patch
> Fix the signal rework for sparc64
prolly this one too for ebtables.
ciao, Marc
[-- Attachment #2: move-job-control-stuff-tosignal_struct-ebtables-fix.patch --]
[-- Type: text/x-diff, Size: 463 bytes --]
--- old/net/bridge/netfilter/ebtables.c 2003-12-18 03:58:40.000000000 +0100
+++ new/net/bridge/netfilter/ebtables.c 2004-03-19 10:23:43.000000000 +0100
@@ -46,7 +46,7 @@ static void print_string(char *str)
 	struct tty_struct *my_tty;
 
 	/* The tty for the current task */
-	my_tty = current->tty;
+	my_tty = current->signal->tty;
 	if (my_tty != NULL) {
 		my_tty->driver->write(my_tty, 0, str, strlen(str));
 		my_tty->driver->write(my_tty, 0, "\015\012", 2);
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-18 4:14 2.6.5-rc1-mm2 Andrew Morton
` (2 preceding siblings ...)
2004-03-19 9:27 ` 2.6.5-rc1-mm2 Marc-Christian Petersen
@ 2004-03-30 19:27 ` Jesse Barnes
2004-03-30 19:36 ` 2.6.5-rc1-mm2 Andrew Morton
3 siblings, 1 reply; 14+ messages in thread
From: Jesse Barnes @ 2004-03-30 19:27 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Wednesday 17 March 2004 8:14 pm, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5-rc1/2.6.5-rc1-mm2/
>
> - Dropped the early-x86-cpu-detection patches, as these appear to be the
> source of recent early-crash problems.
>
> - Several fixes against the new writeback code.
>
> - Several fixes against the new block unplugging code.
I just tracked down a hang I've been seeing in the 2.6.5-rcX-mm trees to this
release. The symptom is that the machine hangs sometime during init script
startup, usually at around the time swap space is enabled (using pretty stock
Red Hat scripts). Before I look into it any further, are there any patches
that I should look at dropping to see if the hang goes away?
The hang occurs all the way through 2.6.5-rc3-mm1, but Linus' 2.6.5-rc3
release works fine.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-30 19:27 ` 2.6.5-rc1-mm2 Jesse Barnes
@ 2004-03-30 19:36 ` Andrew Morton
2004-03-30 19:44 ` 2.6.5-rc1-mm2 Jesse Barnes
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2004-03-30 19:36 UTC (permalink / raw)
To: Jesse Barnes; +Cc: linux-kernel
Jesse Barnes <jbarnes@sgi.com> wrote:
>
> On Wednesday 17 March 2004 8:14 pm, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5-rc1/2.6.5-rc1-mm2/
> >
> > - Dropped the early-x86-cpu-detection patches, as these appear to be the
> > source of recent early-crash problems.
> >
> > - Several fixes against the new writeback code.
> >
> > - Several fixes against the new block unplugging code.
>
> I just tracked down a hang I've been seeing in the 2.6.5-rcX-mm trees to this
> release. The symptom is that the machine hangs sometime during init script
> startup, usually at around the time swap space is enabled (using pretty stock
> Red Hat scripts). Before I look into it any further, are there any patches
> that I should look at dropping to see if the hang goes away?
>
> The hang occurs all the way through 2.6.5-rc3-mm1, but Linus' 2.6.5-rc3
> release works fine.
I don't see anything especially hangy in 2.6.5-rc1-mm2 - maybe it's
something which was sucked in via one of the "external trees". rc3-mm1
boots OK on my ia64 box.
Do you not have the means to work out where things are stuck at?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-30 19:36 ` 2.6.5-rc1-mm2 Andrew Morton
@ 2004-03-30 19:44 ` Jesse Barnes
2004-03-31 19:02 ` 2.6.5-rc1-mm2 Jesse Barnes
0 siblings, 1 reply; 14+ messages in thread
From: Jesse Barnes @ 2004-03-30 19:44 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Tuesday 30 March 2004 11:36 am, Andrew Morton wrote:
> I don't see anything especially hangy in 2.6.5-rc1-mm2 - maybe it's
> something which was sucked in via one of the "external trees". rc3-mm1
> boots OK on my ia64 box.
Well, like I said, the BK trees (both Linus' linux-2.5 and David's
to-linus-2.5) continue to work, all the way up through today, and
2.6.5-rc1-mm1 worked too.
> Do you not have the means to work out where things are stuck at?
It looks like there's a bug in the sysrq implementation in the sn_serial
driver. Once the initial console is opened, sysrq no longer works. All I've
determined so far is that both CPUs in my box are in cpu_idle somewhere...
Anyway, I'll keep looking.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-30 19:44 ` 2.6.5-rc1-mm2 Jesse Barnes
@ 2004-03-31 19:02 ` Jesse Barnes
2004-03-31 20:06 ` 2.6.5-rc1-mm2 Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Jesse Barnes @ 2004-03-31 19:02 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Tuesday 30 March 2004 11:44 am, Jesse Barnes wrote:
> It looks like there's a bug in the sysrq implementation in the sn_serial
> driver. Once the initial console is opened, sysrq no longer works. All
> I've determined so far is that both CPUs in my box are in cpu_idle
> somewhere... Anyway, I'll keep looking.
Ah, now sysrq is working (just had to configure it correctly). I've seen two
backtraces in the hangs I've seen. The one I just reproduced looks like this:
Enabling local filesystem quotas: [ OK ]
Enabling swap space: [ OK ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Starting sysstat: [ OK ]
Setting network parameters: ^[SYSSysRq : Show State
[ bunch of kernel daemon traces ]
...
S10network S a0000001000d8cf0 0 1143 1104 1156 (NOTLB)
Call Trace:
[<a0000001000c4200>] schedule+0xda0/0x1360
sp=e00000387a27fdc0 bsp=e00000387a2791b8
[<a0000001000d8cf0>] sys_wait4+0x450/0x660
sp=e00000387a27fdd0 bsp=e00000387a2790f0
[<a000000100011a60>] ia64_ret_from_syscall+0x0/0x20
sp=e00000387a27fe30 bsp=e00000387a2790b8
initlog S a0000001000e8650 0 1156 1143 1157 (NOTLB)
Call Trace:
[<a0000001000c4200>] schedule+0xda0/0x1360
sp=e00000387af47ce0 bsp=e00000387af411a0
[<a0000001000e8650>] schedule_timeout+0x190/0x1a0
sp=e00000387af47cf0 bsp=e00000387af41168
[<a00000010072eb70>] unix_wait_for_peer+0x210/0x220
sp=e00000387af47d30 bsp=e00000387af41130
[<a00000010072ee30>] unix_stream_connect+0x2b0/0xd00
sp=e00000387af47d90 bsp=e00000387af41098
[<a0000001006285f0>] sys_connect+0xf0/0x140
sp=e00000387af47da0 bsp=e00000387af41020
[<a000000100011a60>] ia64_ret_from_syscall+0x0/0x20
sp=e00000387af47e30 bsp=e00000387af41020
sysctl Z a0000001000d7330 0 1157 1156 (L-TLB)
Call Trace:
[<a0000001000c4200>] schedule+0xda0/0x1360
sp=e00000347a5a7e20 bsp=e00000347a5a1078
[<a0000001000d7330>] do_exit+0x490/0x500
sp=e00000347a5a7e30 bsp=e00000347a5a1018
[<a0000001000d77b0>] do_group_exit+0x290/0x360
sp=e00000347a5a7e30 bsp=e00000347a5a0fe0
[<a000000100011a60>] ia64_ret_from_syscall+0x0/0x20
sp=e00000347a5a7e30 bsp=e00000347a5a0fc8
and the CPU is in cpu_idle (somewhere, either default_idle or somewhere
along that call path). The other failure was also a hang, and it looked
like an infinite number of page faults was being generated, something
like
...
[<a0000001001233c0>] __free_pages+0x60/0x140
sp=e0000030148ebb80 bsp=e0000030148e5388
[<a00000010012b670>] slab_destroy+0x2f0/0x3e0
sp=e0000030148ebb80 bsp=e0000030148e5338
[<a000000100130120>] reap_timer_fnc+0x480/0x680
sp=e0000030148ebb80 bsp=e0000030148e5268
[<a0000001000e7ee0>] run_timer_softirq+0x380/0x5c0
sp=e0000030148ebb90 bsp=e0000030148e51e0
[<a0000001000dbd10>] __do_softirq+0x1d0/0x1e0
sp=e0000030148ebbb0 bsp=e0000030148e5160
[<a0000001000dbda0>] do_softirq+0x80/0xe0
sp=e0000030148ebbb0 bsp=e0000030148e5100
[<a000000100018300>] ia64_handle_irq+0x180/0x1c0
sp=e0000030148ebbb0 bsp=e0000030148e50c0
[<a000000100011c00>] ia64_leave_kernel+0x0/0x280
sp=e0000030148ebbb0 bsp=e0000030148e50c0
[<a000000100019d20>] default_idle+0xe0/0x180
or
...
[<a00000010005de40>] mapped_kernel_page_is_present+0x100/0x120
sp=e0000030148eb920 bsp=e0000030148e5438
[<a00000010005dfd0>] ia64_do_page_fault+0x170/0x960
sp=e0000030148eb920 bsp=e0000030148e53c8
[<a000000100011c00>] ia64_leave_kernel+0x0/0x280
sp=e0000030148eb9b0 bsp=e0000030148e53c8
[<a0000001001233c0>] __free_pages+0x60/0x140
sp=e0000030148ebb80 bsp=e0000030148e5388
[<a00000010012b670>] slab_destroy+0x2f0/0x3e0
sp=e0000030148ebb80 bsp=e0000030148e5338
[<a000000100130120>] reap_timer_fnc+0x480/0x680
sp=e0000030148ebb80 bsp=e0000030148e5268
[<a0000001000e7ee0>] run_timer_softirq+0x380/0x5c0
sp=e0000030148ebb90 bsp=e0000030148e51e0
Jesse
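
[Editorial sketch, not part of the original message.] The two page-fault backtraces above overlap in several frames. A small Python helper can pull the `symbol+0xoffset/0xsize` frames out of pasted SysRq output and list the symbols two traces have in common. The function names are illustrative assumptions; the sample frames are copied from the traces in this message.

```python
import re

# One frame line, e.g. "[<a0000001001233c0>] __free_pages+0x60/0x140".
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z_]\w*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return [(address, symbol, offset, size)] for every frame found in text."""
    return [(int(addr, 16), sym, int(off, 16), int(size, 16))
            for addr, sym, off, size in FRAME.findall(text)]


def shared_symbols(trace_a, trace_b):
    """Symbols present in both traces, in the order they appear in trace_a."""
    seen = {sym for _, sym, _, _ in parse_frames(trace_b)}
    return [sym for _, sym, _, _ in parse_frames(trace_a) if sym in seen]


if __name__ == "__main__":
    first = """
    [<a0000001001233c0>] __free_pages+0x60/0x140
    [<a00000010012b670>] slab_destroy+0x2f0/0x3e0
    [<a000000100130120>] reap_timer_fnc+0x480/0x680
    [<a0000001000e7ee0>] run_timer_softirq+0x380/0x5c0
    [<a0000001000dbd10>] __do_softirq+0x1d0/0x1e0
    """
    second = """
    [<a00000010005dfd0>] ia64_do_page_fault+0x170/0x960
    [<a000000100011c00>] ia64_leave_kernel+0x0/0x280
    [<a0000001001233c0>] __free_pages+0x60/0x140
    [<a00000010012b670>] slab_destroy+0x2f0/0x3e0
    [<a000000100130120>] reap_timer_fnc+0x480/0x680
    [<a0000001000e7ee0>] run_timer_softirq+0x380/0x5c0
    """
    print(shared_symbols(first, second))
```

The `sp=`/`bsp=` lines between frames do not match the pattern and are skipped.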
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-31 19:02 ` 2.6.5-rc1-mm2 Jesse Barnes
@ 2004-03-31 20:06 ` Andrew Morton
2004-03-31 23:15 ` 2.6.5-rc1-mm2 Jesse Barnes
0 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2004-03-31 20:06 UTC (permalink / raw)
To: Jesse Barnes; +Cc: linux-kernel
Jesse Barnes <jbarnes@sgi.com> wrote:
>
> On Tuesday 30 March 2004 11:44 am, Jesse Barnes wrote:
> > It looks like there's a bug in the sysrq implementation in the sn_serial
> > driver. Once the initial console is opened, sysrq no longer works. All
> > I've determined so far is that both CPUs in my box are in cpu_idle
> > somewhere... Anyway, I'll keep looking.
>
> Ah, now sysrq is working (just had to configure it correctly).
great.
> I've seen two
> backtraces in the hangs I've seen. The one I just reproduced looks like this:
>
> Enabling local filesystem quotas: [ OK ]
> Enabling swap space: [ OK ]
> INIT: Entering runlevel: 3
> Entering non-interactive startup
> Starting sysstat: [ OK ]
> Setting network parameters: ^[SYSSysRq : Show State
> [ bunch of kernel daemon traces ]
> ...
> S10network S a0000001000d8cf0 0 1143 1104 1156 (NOTLB)
>
> Call Trace:
> [<a0000001000c4200>] schedule+0xda0/0x1360
> sp=e00000387a27fdc0 bsp=e00000387a2791b8
> [<a0000001000d8cf0>] sys_wait4+0x450/0x660
> sp=e00000387a27fdd0 bsp=e00000387a2790f0
> [<a000000100011a60>] ia64_ret_from_syscall+0x0/0x20
> sp=e00000387a27fe30 bsp=e00000387a2790b8
> initlog S a0000001000e8650 0 1156 1143 1157 (NOTLB)
>
> Call Trace:
> [<a0000001000c4200>] schedule+0xda0/0x1360
> sp=e00000387af47ce0 bsp=e00000387af411a0
> [<a0000001000e8650>] schedule_timeout+0x190/0x1a0
> sp=e00000387af47cf0 bsp=e00000387af41168
> [<a00000010072eb70>] unix_wait_for_peer+0x210/0x220
> sp=e00000387af47d30 bsp=e00000387af41130
> [<a00000010072ee30>] unix_stream_connect+0x2b0/0xd00
> sp=e00000387af47d90 bsp=e00000387af41098
> [<a0000001006285f0>] sys_connect+0xf0/0x140
> sp=e00000387af47da0 bsp=e00000387af41020
> [<a000000100011a60>] ia64_ret_from_syscall+0x0/0x20
> sp=e00000387af47e30 bsp=e00000387af41020
> sysctl Z a0000001000d7330 0 1157 1156 (L-TLB)
>
> Call Trace:
> [<a0000001000c4200>] schedule+0xda0/0x1360
> sp=e00000347a5a7e20 bsp=e00000347a5a1078
> [<a0000001000d7330>] do_exit+0x490/0x500
> sp=e00000347a5a7e30 bsp=e00000347a5a1018
> [<a0000001000d77b0>] do_group_exit+0x290/0x360
> sp=e00000347a5a7e30 bsp=e00000347a5a0fe0
> [<a000000100011a60>] ia64_ret_from_syscall+0x0/0x20
> sp=e00000347a5a7e30 bsp=e00000347a5a0fc8
>
> and the CPU is in cpu_idle (somewhere, either default_idle or somewhere
> along that call path). The other failure was also a hang, and it looked
> like an infinite number of page faults was being generated, something
> like
>
> ...
> [<a0000001001233c0>] __free_pages+0x60/0x140
> sp=e0000030148ebb80 bsp=e0000030148e5388
> [<a00000010012b670>] slab_destroy+0x2f0/0x3e0
> sp=e0000030148ebb80 bsp=e0000030148e5338
> [<a000000100130120>] reap_timer_fnc+0x480/0x680
> sp=e0000030148ebb80 bsp=e0000030148e5268
> [<a0000001000e7ee0>] run_timer_softirq+0x380/0x5c0
> sp=e0000030148ebb90 bsp=e0000030148e51e0
> [<a0000001000dbd10>] __do_softirq+0x1d0/0x1e0
> sp=e0000030148ebbb0 bsp=e0000030148e5160
> [<a0000001000dbda0>] do_softirq+0x80/0xe0
> sp=e0000030148ebbb0 bsp=e0000030148e5100
> [<a000000100018300>] ia64_handle_irq+0x180/0x1c0
> sp=e0000030148ebbb0 bsp=e0000030148e50c0
> [<a000000100011c00>] ia64_leave_kernel+0x0/0x280
> sp=e0000030148ebbb0 bsp=e0000030148e50c0
> [<a000000100019d20>] default_idle+0xe0/0x180
>
So are we to assume that this is the offending process? That the periodic
slab reaping code has screwed up?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-31 20:06 ` 2.6.5-rc1-mm2 Andrew Morton
@ 2004-03-31 23:15 ` Jesse Barnes
2004-03-31 23:56 ` 2.6.5-rc1-mm2 Andrew Morton
0 siblings, 1 reply; 14+ messages in thread
From: Jesse Barnes @ 2004-03-31 23:15 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Wednesday 31 March 2004 12:06 pm, Andrew Morton wrote:
> So are we to assume that this is the offending process? That the periodic
> slab reaping code has screwed up?
It looks like it. Disabling the slab cache reaping function allows it to boot
again.
Jesse
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-31 23:15 ` 2.6.5-rc1-mm2 Jesse Barnes
@ 2004-03-31 23:56 ` Andrew Morton
2004-03-31 23:58 ` 2.6.5-rc1-mm2 Jesse Barnes
2004-04-01 19:28 ` 2.6.5-rc1-mm2 Jesse Barnes
0 siblings, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2004-03-31 23:56 UTC (permalink / raw)
To: Jesse Barnes; +Cc: linux-kernel
Jesse Barnes <jbarnes@sgi.com> wrote:
>
> On Wednesday 31 March 2004 12:06 pm, Andrew Morton wrote:
> > So are we to assume that this is the offending process? That the periodic
> > slab reaping code has screwed up?
>
> It looks like it. Disabling the slab cache reaping function allows it to boot
> again.
I suspect that the reap timer is innocent and what we have is simply
scribbled-on slab metadata. Which means it could be anything at all.
One last thing: could you please stick a
printk(KERN_EMERG "destroying slab %s\n", cachep->name);
at the start of slab_destroy()? That'll help narrow it down.
Could you also punt me over the .config? If I can make it happen, the
binary search will find it. But it probably won't happen here.
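
[Editorial sketch, not part of the original message.] The "binary search" mentioned above means repeatedly halving an ordered patch series until the first patch that introduces the failure is isolated. A minimal Python sketch follows, assuming a single culprit and a caller-supplied `is_bad` test (hypothetical here; in practice it would build and boot a kernel with the given patches applied). The patch names in the example are made up.

```python
def first_bad_patch(patches, is_bad):
    """Return the first patch whose inclusion makes is_bad(prefix) true.

    Assumes is_bad([]) is False, is_bad(patches) is True, and that once the
    culprit is applied every longer prefix of the series is also bad.
    """
    if not patches:
        raise ValueError("empty patch series")
    # Invariant: the first `lo` patches are good, the first `hi` patches are bad.
    lo, hi = 0, len(patches)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(patches[:mid]):
            hi = mid
        else:
            lo = mid
    return patches[hi - 1]


if __name__ == "__main__":
    # Hypothetical series; "patch-04" stands in for the offending change.
    series = [f"patch-{n:02d}" for n in range(1, 9)]
    tested = []

    def hangs(applied):
        tested.append(len(applied))
        return "patch-04" in applied

    print(first_bad_patch(series, hangs), "found after", len(tested), "test builds")
```

Each test halves the remaining range, so a series of N patches needs about log2(N) builds, provided the failure reproduces reliably on the test machine.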
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-31 23:56 ` 2.6.5-rc1-mm2 Andrew Morton
@ 2004-03-31 23:58 ` Jesse Barnes
2004-04-01 0:16 ` 2.6.5-rc1-mm2 Jesse Barnes
2004-04-01 19:28 ` 2.6.5-rc1-mm2 Jesse Barnes
1 sibling, 1 reply; 14+ messages in thread
From: Jesse Barnes @ 2004-03-31 23:58 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Wednesday 31 March 2004 3:56 pm, Andrew Morton wrote:
> Jesse Barnes <jbarnes@sgi.com> wrote:
> > On Wednesday 31 March 2004 12:06 pm, Andrew Morton wrote:
> > > So are we to assume that this is the offending process? That the
> > > periodic slab reaping code has screwed up?
> >
> > It looks like it. Disabling the slab cache reaping function allows it to
> > boot again.
>
> I suspect that the reap timer is innocent and what we have is simply
> scribbled-on slab metadata. Which means it could be anything at all.
That's what I thought too, I'm trying to track down exactly which slab is
having problems now.
>
> One last thing: could you please stick a
>
> printk(KERN_EMERG "destroying slab %s\n", cachep->name);
I'm already booting up something similar...
> at the start of slab_destroy()? That'll help narrow it down.
>
> Could you also punt me over the .config? If I can make it happen, the
> binary search will find it. But it probably won't happen here.
I'm using sn2_defconfig in arch/ia64/configs.
Thanks,
Jesse
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-31 23:58 ` 2.6.5-rc1-mm2 Jesse Barnes
@ 2004-04-01 0:16 ` Jesse Barnes
0 siblings, 0 replies; 14+ messages in thread
From: Jesse Barnes @ 2004-04-01 0:16 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Wednesday 31 March 2004 3:58 pm, Jesse Barnes wrote:
> > Could you also punt me over the .config? If I can make it happen, the
> > binary search will find it. But it probably won't happen here.
>
> I'm using sn2_defconfig in arch/ia64/configs.
It's the 32k slab, and it's something that I enabled between -rc1-mm1 and
-rc1-mm2 in sn2_defconfig. Arg! I didn't think to check the config file
first since it works fine in other trees. Oh well, I'm building with slab
debugging enabled now (and the naughty config file)...
Jesse
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.5-rc1-mm2
2004-03-31 23:56 ` 2.6.5-rc1-mm2 Andrew Morton
2004-03-31 23:58 ` 2.6.5-rc1-mm2 Jesse Barnes
@ 2004-04-01 19:28 ` Jesse Barnes
1 sibling, 0 replies; 14+ messages in thread
From: Jesse Barnes @ 2004-04-01 19:28 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton
On Wednesday 31 March 2004 3:56 pm, Andrew Morton wrote:
> Could you also punt me over the .config? If I can make it happen, the
> binary search will find it. But it probably won't happen here.
CONFIG_HUGETLBFS is the culprit. I'm trying to narrow it down to a specific
hugetlb related patch now.
Jesse
^ permalink raw reply [flat|nested] 14+ messages in thread