* 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
@ 2006-07-12 16:53 Martin Bligh
2006-07-13 1:16 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Martin Bligh @ 2006-07-12 16:53 UTC
To: LKML; +Cc: Andrew Morton
-git3 was fine
(bootlog for git3: http://test.kernel.org/abat/40748/debug/console.log)
-mm1 has the same issue
Slightly different manifestations across 2 boots
http://test.kernel.org/abat/40760/debug/console.log
http://test.kernel.org/abat/40837/debug/console.log
32GB NUMA-Q system w/16 processors.
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 0kB
Total swap = 0kB
Free swap: 0kB
8321024 pages of RAM
8159232 pages of HIGHMEM
133127 reserved pages
3739 pages shared
0 pages swap cached
208 pages dirty
0 pages writeback
1135 pages mapped
24266 pages slab
76 pages pagetables
Out of Memory: Kill process 1 (init) score 0 and children.
No available memory (MPOL_BIND): Killed process 1267 (rc).
-- 0:conmux-control -- time-stamp -- Jul/12/06 2:00:36 --
-- 0:conmux-control -- time-stamp -- Jul/12/06 2:09:55 --
(bot:conmon-payload) disconnected
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-12 16:53 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot Martin Bligh
@ 2006-07-13 1:16 ` Andrew Morton
2006-07-13 1:24 ` Martin Bligh
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2006-07-13 1:16 UTC
To: Martin Bligh; +Cc: linux-kernel
On Wed, 12 Jul 2006 09:53:08 -0700
Martin Bligh <mbligh@google.com> wrote:
> -git3 was fine
> (bootlog for git3: http://test.kernel.org/abat/40748/debug/console.log)
>
> -mm1 has the same issue
>
> Slightly different manifestations across 2 boots
>
> http://test.kernel.org/abat/40760/debug/console.log
> http://test.kernel.org/abat/40837/debug/console.log
[<c0136fcf>] out_of_memory+0x29/0xf6
[<c0137f48>] __alloc_pages+0x1ed/0x276
[<c014db73>] kmem_getpages+0x63/0xc1
[<c014e960>] cache_grow+0xaa/0x139
[<c014eb6a>] cache_alloc_refill+0x17b/0x1c0
[<c014f1ef>] __kmalloc+0x83/0x93
[<c0168cf5>] alloc_fd_array+0x19/0x24
[<c0169122>] alloc_fdtable+0xb2/0xef
[<c016917f>] expand_fdtable+0x20/0x7d
[<c0169221>] expand_files+0x45/0x50
[<c0161263>] locate_fd+0x70/0x8e
[<c01612aa>] dupfd+0x29/0x61
[<c01613dc>] sys_dup+0x1b/0x23
[<c01027d3>] syscall_call+0x7/0xb
I suspect that's because I had me a little mistake.
--- a/fs/file.c~alloc_fdtable-expansion-fix
+++ a/fs/file.c
@@ -240,7 +240,7 @@ static struct fdtable *alloc_fdtable(int
 	if (!fdt)
 		goto out;
 
-	nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nfds));
+	nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nr + 1));
 	if (nfds > NR_OPEN)
 		nfds = NR_OPEN;
_
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-13 1:16 ` Andrew Morton
@ 2006-07-13 1:24 ` Martin Bligh
2006-07-13 14:12 ` Andy Whitcroft
0 siblings, 1 reply; 8+ messages in thread
From: Martin Bligh @ 2006-07-13 1:24 UTC
To: Andrew Morton; +Cc: linux-kernel, Andy Whitcroft
Andrew Morton wrote:
> On Wed, 12 Jul 2006 09:53:08 -0700
> Martin Bligh <mbligh@google.com> wrote:
>
>
>>-git3 was fine
>>(bootlog for git3: http://test.kernel.org/abat/40748/debug/console.log)
>>
>>-mm1 has the same issue
>>
>>Slightly different manifestations across 2 boots
>>
>>http://test.kernel.org/abat/40760/debug/console.log
>>http://test.kernel.org/abat/40837/debug/console.log
>
>
> [<c0136fcf>] out_of_memory+0x29/0xf6
> [<c0137f48>] __alloc_pages+0x1ed/0x276
> [<c014db73>] kmem_getpages+0x63/0xc1
> [<c014e960>] cache_grow+0xaa/0x139
> [<c014eb6a>] cache_alloc_refill+0x17b/0x1c0
> [<c014f1ef>] __kmalloc+0x83/0x93
> [<c0168cf5>] alloc_fd_array+0x19/0x24
> [<c0169122>] alloc_fdtable+0xb2/0xef
> [<c016917f>] expand_fdtable+0x20/0x7d
> [<c0169221>] expand_files+0x45/0x50
> [<c0161263>] locate_fd+0x70/0x8e
> [<c01612aa>] dupfd+0x29/0x61
> [<c01613dc>] sys_dup+0x1b/0x23
> [<c01027d3>] syscall_call+0x7/0xb
>
> I suspect that's because I had me a little mistake.
>
> --- a/fs/file.c~alloc_fdtable-expansion-fix
> +++ a/fs/file.c
> @@ -240,7 +240,7 @@ static struct fdtable *alloc_fdtable(int
> if (!fdt)
> goto out;
>
> - nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nfds));
> + nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nr + 1));
> if (nfds > NR_OPEN)
> nfds = NR_OPEN;
>
> _
>
Thanks, that was affecting several machines.
Andy, any chance we can do an across-all-machines run of that one on top
of -mm1? Thanks,
M.
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-13 1:24 ` Martin Bligh
@ 2006-07-13 14:12 ` Andy Whitcroft
2006-07-14 8:00 ` Andy Whitcroft
0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-07-13 14:12 UTC
To: Martin Bligh; +Cc: Andrew Morton, linux-kernel
Martin Bligh wrote:
> Andrew Morton wrote:
>> On Wed, 12 Jul 2006 09:53:08 -0700
>> Martin Bligh <mbligh@google.com> wrote:
>>
>>
>>> -git3 was fine
>>> (bootlog for git3: http://test.kernel.org/abat/40748/debug/console.log)
>>>
>>> -mm1 has the same issue
>>>
>>> Slightly different manifestations across 2 boots
>>>
>>> http://test.kernel.org/abat/40760/debug/console.log
>>> http://test.kernel.org/abat/40837/debug/console.log
>>
>>
>> [<c0136fcf>] out_of_memory+0x29/0xf6
>> [<c0137f48>] __alloc_pages+0x1ed/0x276
>> [<c014db73>] kmem_getpages+0x63/0xc1
>> [<c014e960>] cache_grow+0xaa/0x139
>> [<c014eb6a>] cache_alloc_refill+0x17b/0x1c0
>> [<c014f1ef>] __kmalloc+0x83/0x93
>> [<c0168cf5>] alloc_fd_array+0x19/0x24
>> [<c0169122>] alloc_fdtable+0xb2/0xef
>> [<c016917f>] expand_fdtable+0x20/0x7d
>> [<c0169221>] expand_files+0x45/0x50
>> [<c0161263>] locate_fd+0x70/0x8e
>> [<c01612aa>] dupfd+0x29/0x61
>> [<c01613dc>] sys_dup+0x1b/0x23
>> [<c01027d3>] syscall_call+0x7/0xb
>>
>> I suspect that's because I had me a little mistake.
>>
>> --- a/fs/file.c~alloc_fdtable-expansion-fix
>> +++ a/fs/file.c
>> @@ -240,7 +240,7 @@ static struct fdtable *alloc_fdtable(int
>> if (!fdt)
>> goto out;
>>
>> - nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nfds));
>> + nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nr + 1));
>> if (nfds > NR_OPEN)
>> nfds = NR_OPEN;
>>
>> _
>>
>
> Thanks, that was affecting several machines.
>
> Andy, any chance we can do an across-all-machines run of that one on top
> of -mm1? Thanks,
>
> M.
Yep, I've run this together with Badari's fix as a set across the whole family.
I did all dbenchall runs for now, as this failure shows up on those and
Badari's is triggered the same way. If there is any measure of success there
I'll throw in the externals too.
-apw
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-13 14:12 ` Andy Whitcroft
@ 2006-07-14 8:00 ` Andy Whitcroft
2006-07-14 8:08 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-07-14 8:00 UTC
To: Andy Whitcroft; +Cc: Martin Bligh, Andrew Morton, linux-kernel
Andy Whitcroft wrote:
> Martin Bligh wrote:
>> Andrew Morton wrote:
>>> On Wed, 12 Jul 2006 09:53:08 -0700
>>> Martin Bligh <mbligh@google.com> wrote:
>>>
>>>
>>>> -git3 was fine
>>>> (bootlog for git3: http://test.kernel.org/abat/40748/debug/console.log)
>>>>
>>>> -mm1 has the same issue
>>>>
>>>> Slightly different manifestations across 2 boots
>>>>
>>>> http://test.kernel.org/abat/40760/debug/console.log
>>>> http://test.kernel.org/abat/40837/debug/console.log
>>>
>>>
>>> [<c0136fcf>] out_of_memory+0x29/0xf6
>>> [<c0137f48>] __alloc_pages+0x1ed/0x276
>>> [<c014db73>] kmem_getpages+0x63/0xc1
>>> [<c014e960>] cache_grow+0xaa/0x139
>>> [<c014eb6a>] cache_alloc_refill+0x17b/0x1c0
>>> [<c014f1ef>] __kmalloc+0x83/0x93
>>> [<c0168cf5>] alloc_fd_array+0x19/0x24
>>> [<c0169122>] alloc_fdtable+0xb2/0xef
>>> [<c016917f>] expand_fdtable+0x20/0x7d
>>> [<c0169221>] expand_files+0x45/0x50
>>> [<c0161263>] locate_fd+0x70/0x8e
>>> [<c01612aa>] dupfd+0x29/0x61
>>> [<c01613dc>] sys_dup+0x1b/0x23
>>> [<c01027d3>] syscall_call+0x7/0xb
>>>
>>> I suspect that's because I had me a little mistake.
>>>
>>> --- a/fs/file.c~alloc_fdtable-expansion-fix
>>> +++ a/fs/file.c
>>> @@ -240,7 +240,7 @@ static struct fdtable *alloc_fdtable(int
>>> if (!fdt)
>>> goto out;
>>>
>>> - nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nfds));
>>> + nfds = max_t(int, 8 * L1_CACHE_BYTES, roundup_pow_of_two(nr + 1));
>>> if (nfds > NR_OPEN)
>>> nfds = NR_OPEN;
>>>
>>> _
>>>
>>
>> Thanks, that was affecting several machines.
>>
>> Andy, any chance we can do an across-all-machines run of that one on top
>> of -mm1? Thanks,
>>
>> M.
>
> Yep, I've run this with badari's fix as a set across the whole family. I
> did all dbenchall runs for now as this example is showing on that and
> badari's is triggered same. If there is any measure of success there
> I'll throw in the externals too.
General goodness from this one. Except where we're getting issues with
the e1000's. That seems to be fixed up by backing out some driver changes.
All moot, as -mm2 is showing similar goodness.
-apw
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-14 8:00 ` Andy Whitcroft
@ 2006-07-14 8:08 ` Andrew Morton
2006-07-14 8:32 ` Andy Whitcroft
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2006-07-14 8:08 UTC
To: Andy Whitcroft; +Cc: apw, mbligh, linux-kernel
On Fri, 14 Jul 2006 09:00:36 +0100
Andy Whitcroft <apw@shadowen.org> wrote:
> > Yep, I've run this with badari's fix as a set across the whole family. I
> > did all dbenchall runs for now as this example is showing on that and
> > badari's is triggered same. If there is any measure of success there
> > I'll throw in the externals too.
>
> General goodness from this one. Except where we're getting issues with
> the e1000's. That seems to be fixed up by backing out some driver changes.
>
> All moot, as -mm2 is showing similar goodness.
Is -mm2's e1000 OK?
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-14 8:08 ` Andrew Morton
@ 2006-07-14 8:32 ` Andy Whitcroft
2006-07-14 10:13 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2006-07-14 8:32 UTC
To: Andrew Morton; +Cc: mbligh, linux-kernel, Benjamin Herrenschmidt
Andrew Morton wrote:
> On Fri, 14 Jul 2006 09:00:36 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
>
>>> Yep, I've run this with badari's fix as a set across the whole family. I
>>> did all dbenchall runs for now as this example is showing on that and
>>> badari's is triggered same. If there is any measure of success there
>>> I'll throw in the externals too.
>> General goodness from this one. Except where we're getting issues with
>> the e1000's. That seems to be fixed up by backing out some driver changes.
>>
>> All moot, as -mm2 is showing similar goodness.
>
> Is -mm2's e1000 OK?
Whilst calling it the e1000 problem (that was how it was originally
reported) I should say that this was related to the sysfs change in the
following patches:
gregkh-driver-network-class_device-to-device.patch
gregkh-driver-class_device_rename-remove.patch
I have two boxes under test which were failing on -mm1 with errors similar to
the following (from userland):
eth-id-00:02:55:d3:37:4a No interface found
Both are booting -mm2 fine.
I can only see two outstanding issues. An IDE lost interrupt issue on a
blade we have under test which I believe benh is looking at, and what
looks like an s390 tool chain issue which I am told is being looked at.
-apw
* Re: 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot
2006-07-14 8:32 ` Andy Whitcroft
@ 2006-07-14 10:13 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2006-07-14 10:13 UTC
To: Andy Whitcroft; +Cc: Andrew Morton, mbligh, linux-kernel
> I can only see two outstanding issues. An IDE lost interrupt issue on a
> blade we have under test which I believe benh is looking at, and what
> looks like an s390 tool chain issue which I am told is being looked at.
Yeah, well, I'm trying to look at it... between flights :)
Ben.
Thread overview: 8+ messages
2006-07-12 16:53 2.6.18-rc1-git4 and 2.6.18-rc1-mm1 OOM's on boot Martin Bligh
2006-07-13 1:16 ` Andrew Morton
2006-07-13 1:24 ` Martin Bligh
2006-07-13 14:12 ` Andy Whitcroft
2006-07-14 8:00 ` Andy Whitcroft
2006-07-14 8:08 ` Andrew Morton
2006-07-14 8:32 ` Andy Whitcroft
2006-07-14 10:13 ` Benjamin Herrenschmidt