* stack overflow
@ 2000-09-04 10:47 Zeshan Ahmad
2000-09-04 11:03 ` Matti Aarnio
0 siblings, 1 reply; 35+ messages in thread
From: Zeshan Ahmad @ 2000-09-04 10:47 UTC (permalink / raw)
To: linux-mm
Hi
Can anyone tell me how the stack size can be changed in
the kernel? I am experiencing a stack overflow problem
when the function kmem_cache_sizes_init is called in
/init/main.c. The exact place where the stack overflow
occurs is in the function kmem_cache_slabmgmt in
/mm/slab.c.
Is there any way to change the stack size in the kernel?
Can a change in stack size simply solve this kernel
stack overflow problem?
Urgent help is needed.
ZEESHAN
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2000-09-04 10:47 Zeshan Ahmad
@ 2000-09-04 11:03 ` Matti Aarnio
2000-09-04 11:23 ` Tigran Aivazian
0 siblings, 1 reply; 35+ messages in thread
From: Matti Aarnio @ 2000-09-04 11:03 UTC (permalink / raw)
To: Zeshan Ahmad; +Cc: linux-mm
On Mon, Sep 04, 2000 at 03:47:44AM -0700, Zeshan Ahmad wrote:
> Hi
>
> Can any1 tell me how can the stack size be changed in
> the Kernel. i am experiencing a stack overflow problem
In kernel ? DON'T!
> when the function kmem_cache_sizes_init is called in
> /init/main.c The exact place where the stack overflow
> occurs is in the function kmem_cache_slabmgmt in
> /mm/slab.c
>
> Is there any way to change the stack size in Kernel?
> Can the change in stack size simply solve this Kernel
> stack overflow problem?
That is indicative that somewhere along the path
you are: a) recursing, or b) otherwise wasting stack
with overly large local allocations (e.g. "auto"
variables).
In kernel space: NEVER use stack-based buffers,
always kmalloc(). (If they are more than 8-16 bytes
in size, that is.) Similarly, NEVER use alloca()!
> Urgent help is needed.
>
> ZEESHAN
/Matti Aarnio
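The advice above (keep large buffers off the stack) can be sketched in user space. This is only an illustration under stated assumptions: malloc()/free() stand in for kmalloc()/kfree(), and the function name copy_and_sum is hypothetical, not taken from any kernel source.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: needs a scratch copy of `len` bytes.
 * Instead of declaring `unsigned char scratch[len]` (or a large fixed
 * array) on the stack, the scratch buffer is taken from the heap.
 * In kernel code the malloc()/free() pair would be kmalloc()/kfree().
 * Returns 0 on success, -1 if the allocation fails.
 */
int copy_and_sum(const unsigned char *src, size_t len, unsigned long *sum)
{
    unsigned char *scratch;
    size_t i;

    *sum = 0;
    if (len == 0)
        return 0;               /* nothing to do, no allocation needed */

    scratch = malloc(len);      /* heap, not stack */
    if (scratch == NULL)
        return -1;              /* caller must handle the failure */

    memcpy(scratch, src, len);
    for (i = 0; i < len; i++)
        *sum += scratch[i];

    free(scratch);
    return 0;
}
```

The cost is an allocation that can fail, so the caller has to check the return value; the stack frame of the function stays a few words in size regardless of len.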
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2000-09-04 11:03 ` Matti Aarnio
@ 2000-09-04 11:23 ` Tigran Aivazian
2000-09-05 10:55 ` Mark Hemment
0 siblings, 1 reply; 35+ messages in thread
From: Tigran Aivazian @ 2000-09-04 11:23 UTC (permalink / raw)
To: Matti Aarnio; +Cc: Zeshan Ahmad, linux-mm, Mark Hemment
On Mon, 4 Sep 2000, Matti Aarnio wrote:
> > when the function kmem_cache_sizes_init is called in
> > /init/main.c The exact place where the stack overflow
> > occurs is in the function kmem_cache_slabmgmt in
> > /mm/slab.c
> >
> > Is there any way to change the stack size in Kernel?
> > Can the change in stack size simply solve this Kernel
> > stack overflow problem?
>
> That is indicative that somewhere along the path
> you are: a) recursin
looking at the code, it seems in theory possible to recurse via
kmem_cache_alloc()->kmem_cache_grow()->kmem_cache_slabmgmt()->kmem_cache_alloc() but
I thought Mark invented offslab_limit to prevent this.
Maybe decreasing offslab_limit can help? Defer to Mark...
Regards,
Tigran
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2000-09-04 11:23 ` Tigran Aivazian
@ 2000-09-05 10:55 ` Mark Hemment
0 siblings, 0 replies; 35+ messages in thread
From: Mark Hemment @ 2000-09-05 10:55 UTC (permalink / raw)
To: Tigran Aivazian; +Cc: Matti Aarnio, Zeshan Ahmad, linux-mm
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1525 bytes --]
Hi,
A quick look indicates what could be the problem.
In my original, the code assumes that all general purpose slabs below
"bufctl_limit" were suitable for bufctl allocation (look at a 2.2.x
version, in kmem_cache_sizes_init() I have a state variable called
"found").
The modified code (by who?), which now uses "offslab_limit", removes
the assumption, which could very well be causing the stack overflow; the
code now breaks the comment "Inc off-slab bufctl limit until the ceiling
is hit" as it has no ceiling.
Bringing back the state variable will close at least one door.
I've attached a completely untested (and uncompiled) patch.
If this doesn't fix it, I'll look deeper.
Mark
On Mon, 4 Sep 2000, Tigran Aivazian wrote:
> On Mon, 4 Sep 2000, Matti Aarnio wrote:
> > > when the function kmem_cache_sizes_init is called in
> > > /init/main.c The exact place where the stack overflow
> > > occurs is in the function kmem_cache_slabmgmt in
> > > /mm/slab.c
> > >
> > > Is there any way to change the stack size in Kernel?
> > > Can the change in stack size simply solve this Kernel
> > > stack overflow problem?
> >
> > That is indicative that somewhere along the path
> > you are: a) recursin
>
> looking at the code, it seems in theory possible to recurse via
> kmem_cache_alloc()->kmem_cache_grow()->kmem_cache_slabmgmt()->kmem_cache_alloc() but
> I thought Mark invented offslab_limit to prevent this.
>
> Maybe decreasing offslab_limit can help? Defer to Mark...
>
> Regards,
> Tigran
>
[-- Attachment #2: slab.patch --]
[-- Type: TEXT/PLAIN, Size: 1152 bytes --]
--- slab.c.00 Tue Sep 5 12:49:16 2000
+++ slab.c Tue Sep 5 12:51:36 2000
@@ -424,14 +424,17 @@
*/
void __init kmem_cache_sizes_init(void)
{
+ unsigned int limit_found;
cache_sizes_t *sizes = cache_sizes;
char name[20];
+
/*
* Fragmentation resistance on low memory - only use bigger
* page orders on machines with more than 32MB of memory.
*/
if (num_physpages > (32 << 20) >> PAGE_SHIFT)
slab_break_gfp_order = BREAK_GFP_ORDER_HI;
+ limit_found = 0;
do {
/* For performance, all the general caches are L1 aligned.
* This should be particularly beneficial on SMP boxes, as it
@@ -446,9 +449,12 @@
}
/* Inc off-slab bufctl limit until the ceiling is hit. */
- if (!(OFF_SLAB(sizes->cs_cachep))) {
- offslab_limit = sizes->cs_size-sizeof(slab_t);
- offslab_limit /= 2;
+ if (limit_found == 0) {
+ if (!(OFF_SLAB(sizes->cs_cachep))) {
+ offslab_limit = sizes->cs_size-sizeof(slab_t);
+ offslab_limit /= 2;
+ } else
+ limit_found = 1;
}
sprintf(name, "size-%Zd(DMA)",sizes->cs_size);
sizes->cs_dmacachep = kmem_cache_create(name, sizes->cs_size, 0,
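The effect of the state variable in the patch above can be modelled in user space. This is a toy sketch, not the slab allocator: struct toy_cache, MGMT_SIZE and toy_offslab_limit are hypothetical names, MGMT_SIZE stands in for sizeof(slab_t), and the cache table in the usage note is invented for illustration.

```c
#include <stddef.h>

#define MGMT_SIZE 32u   /* hypothetical stand-in for sizeof(slab_t) */

struct toy_cache {
    unsigned size;      /* object size, like cs_size (must be >= MGMT_SIZE) */
    int off_slab;       /* nonzero if slab management is kept off-slab */
};

/*
 * Walk the general caches in increasing size order and compute the limit.
 * With use_ceiling != 0 the walk behaves like the patched code: the limit
 * stops changing once the first off-slab cache has been seen.
 * With use_ceiling == 0 it behaves like the unpatched code: any later
 * on-slab cache raises the limit again.
 */
unsigned toy_offslab_limit(const struct toy_cache *c, size_t n, int use_ceiling)
{
    unsigned limit = 0;
    int limit_found = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (use_ceiling && limit_found)
            continue;                       /* ceiling already hit */
        if (!c[i].off_slab)
            limit = (c[i].size - MGMT_SIZE) / 2;
        else
            limit_found = 1;
    }
    return limit;
}
```

For an invented table of sizes 64, 128, 256 (off-slab) and 512 (on-slab), the walk with the ceiling stops at the 128-byte cache, while the walk without it keeps raising the limit at the 512-byte cache. Whether such a table can occur on a real machine is an assumption of the sketch.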
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
@ 2000-09-05 19:03 Zeshan Ahmad
2000-09-06 8:33 ` Mark Hemment
0 siblings, 1 reply; 35+ messages in thread
From: Zeshan Ahmad @ 2000-09-05 19:03 UTC (permalink / raw)
To: Mark Hemment; +Cc: tigran, linux-mm
Hi
I have figured out why the patch isn't working.
Mark wrote:
>In my original, the code assumes that all general
>purpose slabs below
>"bufctl_limit" where suitable for bufctl allocation
>(look at a 2.2.x
>version, in kmem_cache_sizes_init() I have a state
>variable called
>"found").
Since I am already using 2.2.x, the patch does not
help: my kernel already uses the state variable
"found".
So I presume this will not work.
Is any other solution available?
Regards
Zeshan
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2000-09-05 19:03 Zeshan Ahmad
@ 2000-09-06 8:33 ` Mark Hemment
0 siblings, 0 replies; 35+ messages in thread
From: Mark Hemment @ 2000-09-06 8:33 UTC (permalink / raw)
To: Zeshan Ahmad; +Cc: tigran, linux-mm
Hi Zeshan,
What version of 2.2.x are you using, and have you applied any patches
to it?
I'm not subscribed to linux-mm at the moment, so I missed your original
posting.
Mark
On Tue, 5 Sep 2000, Zeshan Ahmad wrote:
> Hi
>
> I have figured out why the patch is'nt working.
>
> Mark wrote:
> >In my original, the code assumes that all general
> >purpose slabs below
> >"bufctl_limit" where suitable for bufctl allocation
> >(look at a 2.2.x
> >version, in kmem_cache_sizes_init() I have a state
> >variable called
> >"found").
>
> Since I am already using 2.2.x, so the patch is not
> working. This means i am already using the variable
> "found".
> So this will not work i presume.
>
> Any other solution available?
>
> Regards
> Zeshan
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
@ 2000-09-06 13:25 Zeshan Ahmad
0 siblings, 0 replies; 35+ messages in thread
From: Zeshan Ahmad @ 2000-09-06 13:25 UTC (permalink / raw)
To: Mark Hemment; +Cc: tigran, linux-mm
[-- Attachment #1: Type: text/plain, Size: 1806 bytes --]
Thanks for your reply and for considering my problem. I am
using kernel version 2.2.14.
I have attached the portion of mm/slab.c containing
the function kmem_cache_sizes_init from the kernel
code I am using. Please have a look at it
and recommend any changes.
Also, please suggest any good reading about the "slab
allocator" on the net.
Thanks for your support. Help is badly needed.
Anxiously waiting for your reply.
Regards
ZESHAN
--- Mark Hemment <markhe@veritas.com> wrote:
> Hi Zeshan,
>
> What version of 2.2.x are you using, and have you
> applied any patches it
> to?
> I'm not subscribed to linux-mm at the moment, so I
> missed your original
> posting.
>
> Mark
>
>
> On Tue, 5 Sep 2000, Zeshan Ahmad wrote:
>
> > Hi
> >
> > I have figured out why the patch is'nt working.
> >
> > Mark wrote:
> > >In my original, the code assumes that all general
> > >purpose slabs below
> > >"bufctl_limit" where suitable for bufctl
> allocation
> > >(look at a 2.2.x
> > >version, in kmem_cache_sizes_init() I have a
> state
> > >variable called
> > >"found").
> >
> > Since I am already using 2.2.x, so the patch is
> not
> > working. This means i am already using the
> variable
> > "found".
> > So this will not work i presume.
> >
> > Any other solution available?
> >
> > Regards
> > Zeshan
> >
[-- Attachment #2: kmem_cache_sizes_init.txt --]
[-- Type: text/plain, Size: 1191 bytes --]
void __init kmem_cache_sizes_init(void)
{
    unsigned int found = 0;
    cache_slabp = kmem_cache_create("slab_cache", sizeof(kmem_slab_t),
                                    0, SLAB_HWCACHE_ALIGN, NULL, NULL);
    if (cache_slabp) {
        char **names = cache_sizes_name;
        cache_sizes_t *sizes = cache_sizes;
        do {
            /* For performance, all the general caches are L1 aligned.
             * This should be particularly beneficial on SMP boxes, as it
             * eliminates "false sharing".
             * Note for systems short on memory removing the alignment will
             * allow tighter packing of the smaller caches. */
            if (!(sizes->cs_cachep =
                  kmem_cache_create(*names++, sizes->cs_size,
                                    0, SLAB_HWCACHE_ALIGN, NULL, NULL)))
                goto panic_time;
            if (!found) {
                /* Inc off-slab bufctl limit until the ceiling is hit. */
                if (SLAB_BUFCTL(sizes->cs_cachep->c_flags))
                    found++;
                else
                    bufctl_limit =
                        (sizes->cs_size/sizeof(kmem_bufctl_t));
            }
            sizes->cs_cachep->c_flags |= SLAB_CFLGS_GENERAL;
            sizes++;
        } while (sizes->cs_size);
#if SLAB_SELFTEST
        kmem_self_test();
#endif /* SLAB_SELFTEST */
        return;
    }
panic_time:
    panic("kmem_cache_sizes_init: Error creating caches");
    /* NOTREACHED */
}
^ permalink raw reply [flat|nested] 35+ messages in thread
* Stack overflow
@ 2003-01-24 7:08 Madhavi
2003-01-24 7:53 ` Linux Geek
0 siblings, 1 reply; 35+ messages in thread
From: Madhavi @ 2003-01-24 7:08 UTC (permalink / raw)
To: linux-kernel
Hi
I am testing a PCI network device driver on linux-2.4.19.
I have observed a peculiar problem during testing.
I have a piece of functionality which works well if the code which performs it
is embedded directly in the calling function. If the same functionality is
implemented as a separate function, and this function is called at the
required place, the system crashes. I have used KDB for debugging, but
KDB also fails when this system crash occurs.
Could this be because of a function stack overflow? I am new to this
field. Could someone throw some light on this?
Thanks in advance.
regards
Madhavi.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Stack overflow
2003-01-24 7:08 Stack overflow Madhavi
@ 2003-01-24 7:53 ` Linux Geek
2003-01-24 15:32 ` GrandMasterLee
0 siblings, 1 reply; 35+ messages in thread
From: Linux Geek @ 2003-01-24 7:53 UTC (permalink / raw)
To: Madhavi; +Cc: linux-kernel
>
>
>I have a functionality which works well if the code whcih performs this
>function is embedded in the required function. If this functionality is
>implemented as a separate function, and this function is called at the
>required place, the system crashes. I have used KDB for debugging. But,
>
>
I'd suggest checking the args passed to the function and the sizes they
would consume when they are passed 'call by value'.
Try passing them as pointers, maybe.
Yes, I think there is a limit on the kernel stack, but I'm not too sure
about the number.
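The by-value versus by-pointer difference suggested above can be shown with a small sketch. The struct pkt_ctx and both function names are hypothetical, chosen only to resemble something a network driver might pass around.

```c
#include <stddef.h>

/* Hypothetical per-packet context, roughly one Ethernet frame in size. */
struct pkt_ctx {
    unsigned char frame[1536];
    unsigned int len;
};

/*
 * By value: the caller must copy the whole struct (about 1.5 kB here)
 * to pass it, which typically lands on the stack.
 */
unsigned int len_by_value(struct pkt_ctx ctx)
{
    return ctx.len;
}

/* By pointer: only one pointer is passed, whatever the struct size. */
unsigned int len_by_pointer(const struct pkt_ctx *ctx)
{
    return ctx->len;
}
```

Both functions return the same result; only the amount of data pushed for the call differs, which matters when the whole kernel stack is a few kilobytes.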
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Stack overflow
2003-01-24 7:53 ` Linux Geek
@ 2003-01-24 15:32 ` GrandMasterLee
2003-01-24 15:41 ` Madhavi
0 siblings, 1 reply; 35+ messages in thread
From: GrandMasterLee @ 2003-01-24 15:32 UTC (permalink / raw)
To: Linux Geek; +Cc: Madhavi, linux-kernel
On Fri, 2003-01-24 at 01:53, Linux Geek wrote:
> >
> >
> >I have a functionality which works well if the code whcih performs this
> >function is embedded in the required function. If this functionality is
> >implemented as a separate function, and this function is called at the
> >required place, the system crashes. I have used KDB for debugging. But,
> >
> >
> I'd suggest check the args passed to the function and the sizes they
> would consume when they are passed as 'call by value'.
> Try to pass them as pointers maybe.
>
> Yes, i think there is a limit on kernel stack but not i'm not too sure
> about the number.
The kernel has an 8K stack max. That said, it *could* be your stack. Do
you get any panics or oops? If so, you *could* write them down :-D And
debug them later. If it appears corrupted, then it's probably stack.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Stack overflow
2003-01-24 15:32 ` GrandMasterLee
@ 2003-01-24 15:41 ` Madhavi
2003-01-24 16:42 ` Richard B. Johnson
2003-01-24 16:52 ` Gianni Tedesco
0 siblings, 2 replies; 35+ messages in thread
From: Madhavi @ 2003-01-24 15:41 UTC (permalink / raw)
To: GrandMasterLee; +Cc: Linux Geek, linux-kernel
On 24 Jan 2003, GrandMasterLee wrote:
> On Fri, 2003-01-24 at 01:53, Linux Geek wrote:
> > >
> > >
> > >I have a functionality which works well if the code whcih performs this
> > >function is embedded in the required function. If this functionality is
> > >implemented as a separate function, and this function is called at the
> > >required place, the system crashes. I have used KDB for debugging. But,
> > >
> > >
> > I'd suggest check the args passed to the function and the sizes they
> > would consume when they are passed as 'call by value'.
> > Try to pass them as pointers maybe.
> >
> > Yes, i think there is a limit on kernel stack but not i'm not too sure
> > about the number.
>
> The kernel has an 8K stack max. That said, it *could* be your stack. Do
> you get any panics or oops? If so, you *could* write them down :-D And
> debug them later. If it appears corrupted, then it's probably stack.
>
I am getting a panic saying "attempted to kill idle task". The call trace
scrolls through for 4-5 pages. I have also observed a complete system hang a
few times.
How can I solve this problem, supposing I can't avoid that function call?
Any pointers regarding this would be of great help for me.
thanks & regards
Madhavi.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Stack overflow
2003-01-24 15:41 ` Madhavi
@ 2003-01-24 16:42 ` Richard B. Johnson
2003-01-24 16:52 ` Gianni Tedesco
1 sibling, 0 replies; 35+ messages in thread
From: Richard B. Johnson @ 2003-01-24 16:42 UTC (permalink / raw)
To: Madhavi; +Cc: GrandMasterLee, Linux Geek, linux-kernel
On Fri, 24 Jan 2003, Madhavi wrote:
>
> On 24 Jan 2003, GrandMasterLee wrote:
>
> > On Fri, 2003-01-24 at 01:53, Linux Geek wrote:
> > > >
> > > >
> > > >I have a functionality which works well if the code whcih performs this
> > > >function is embedded in the required function. If this functionality is
> > > >implemented as a separate function, and this function is called at the
> > > >required place, the system crashes. I have used KDB for debugging. But,
> > > >
> > > >
> > > I'd suggest check the args passed to the function and the sizes they
> > > would consume when they are passed as 'call by value'.
> > > Try to pass them as pointers maybe.
> > >
> > > Yes, i think there is a limit on kernel stack but not i'm not too sure
> > > about the number.
> >
> > The kernel has an 8K stack max. That said, it *could* be your stack. Do
> > you get any panics or oops? If so, you *could* write them down :-D And
> > debug them later. If it appears corrupted, then it's probably stack.
> >
>
> I am getting a panic saying "attempted to kill idle task". The call trace
> scrolls through for 4-5 pages. I have also observed complete system hang a
> few times.
>
> How can I solve this problem, supposing I can't avoid that function call?
> Any pointers regarding this would be of great help for me.
>
> thanks & regards
> Madhavi.
>
Make a user-mode 'main()' test program that calls your function with
the required parameters. Link it with your function and run it either
with gdb or with lots of ordinary printf statements.
You can resolve all missing globals, including functions, with
simple 'ints' as long as they are not accessed in your code. This
lets you test practically anything in user-mode when, if you
screw up, you just have a core file and don't have to re-boot.
Most all of the modules that I write for embedded systems are
initially tested this way. That lets me get linked-lists and
other easy-to-muck-up things working correctly before I throw
it off the cliff, hoping it will fly. My user-mode code calls
ISRs, actually accessing hardware in most cases (iopl(3) allows
this).
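A minimal sketch of the user-mode approach described above, under stated assumptions: ring_put and struct ring are a hypothetical stand-in for the driver function under test, fake_jiffies is a hypothetical stub for a kernel global that the code references but the test never depends on, and run_harness() is meant to be called from the test program's main().

```c
#include <stdio.h>

/* Stub: a kernel global resolved with a plain int so the code links. */
int fake_jiffies;

#define RING_SLOTS 4

/* Hypothetical driver data structure lifted into user space. */
struct ring {
    int slot[RING_SLOTS];
    unsigned head;
    unsigned count;
};

/* Function under test: append a value, or return -1 if the ring is full. */
int ring_put(struct ring *r, int v)
{
    if (r->count == RING_SLOTS)
        return -1;
    r->slot[(r->head + r->count) % RING_SLOTS] = v;
    r->count++;
    return 0;
}

/* Harness body: returns the number of failed checks. */
int run_harness(void)
{
    struct ring r = { {0}, 0, 0 };
    int i, failures = 0;

    for (i = 0; i < RING_SLOTS; i++)
        if (ring_put(&r, i) != 0)
            failures++;
    if (ring_put(&r, 99) != -1)     /* a full ring must reject the put */
        failures++;

    printf("harness: %d failure(s)\n", failures);
    return failures;
}
```

A mistake in the ring logic shows up here as a failed check or a core file under gdb, rather than as a crashed machine.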
Also, PCI interrupts are level interrupts. Do not enable interrupts
in your ISR until you have serviced the reason why you got the
interrupt in the first place, otherwise the ISR will be interrupted
by the same interrupt...,
by the same interrupt...,
by the same interrupt...,
until you crash.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Why is the government concerned about the lunatic fringe? Think about it.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Stack overflow
2003-01-24 15:41 ` Madhavi
2003-01-24 16:42 ` Richard B. Johnson
@ 2003-01-24 16:52 ` Gianni Tedesco
1 sibling, 0 replies; 35+ messages in thread
From: Gianni Tedesco @ 2003-01-24 16:52 UTC (permalink / raw)
To: Madhavi; +Cc: GrandMasterLee, Linux Geek, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 399 bytes --]
On Fri, 2003-01-24 at 15:41, Madhavi wrote:
> How can I solve this problem, supposing I can't avoid that function call?
> Any pointers regarding this would be of great help for me.
post source code.
--
// Gianni Tedesco (gianni at scaramanga dot co dot uk)
lynx --source www.scaramanga.co.uk/gianni-at-ecsc.asc | gpg --import
8646BE7D: 6D9F 2287 870E A2C9 8F60 3A3C 91B5 7669 8646 BE7D
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 35+ messages in thread
* stack overflow
@ 2003-09-12 17:53 Breno
2003-09-12 22:50 ` Andreas Dilger
0 siblings, 1 reply; 35+ messages in thread
From: Breno @ 2003-09-12 17:53 UTC (permalink / raw)
To: Kernel List
Hi ... this is my idea to check a stack overflow. What do you think ?
#define STACK_LIMIT (1024*8192)/PAGE_SIZE
int check_stack_overflow(struct task_struct *tsk)
{
    unsigned long stack_size,stack_addr,stack_ptr;
    int i;
    if(tsk->mm != NULL)
    {
        stack_addr = tsk->mm->start_stack;
        stack_ptr = tsk->thread.esp;
        for(i=0; i < stack_ptr; i++)
            stack_addr++;
        stack_size = (stack_addr - stack_ptr)/PAGE_SIZE;
        if(stack_size > ( STACK_LIMIT - 1))
        {
            printk(KERN_CRIT "Process %s : pid %d - Can cause stack overflow\n",
                   tsk->comm,tsk->pid);
            return 0;
        }
    }
    return 0;
}
att,
Breno
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2003-09-12 22:50 ` Andreas Dilger
@ 2003-09-12 19:14 ` Breno
2003-09-12 23:06 ` William Lee Irwin III
1 sibling, 0 replies; 35+ messages in thread
From: Breno @ 2003-09-12 19:14 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Kernel List
I think the size limit of the user stack is 8 MB.
Breno
----- Original Message -----
From: "Andreas Dilger" <adilger@clusterfs.com>
To: "Breno" <brenosp@brasilsec.com.br>
Cc: "Kernel List" <linux-kernel@vger.kernel.org>
Sent: Friday, September 12, 2003 11:50 PM
Subject: Re: stack overflow
> On Sep 12, 2003 18:53 +0100, Breno wrote:
> > Hi ... this is my idea to check a stack overflow. What do you think ?
> >
> > #define STACK_LIMIT (1024*8192)/PAGE_SIZE
> >
> > int check_stack_overflow(struct task_struct *tsk)
> > {
> >
> > unsigned long stack_size,stack_addr,stack_ptr;
> > int i;
> >
> > if(tsk->mm != NULL)
> > {
> > stack_addr = tsk->mm->start_stack;
> >
> > stack_ptr = tsk->thread.esp;
> >
> > for(i=0; i < stack_ptr; i++)
> > stack_addr++;
> >
> > stack_size = (stack_addr - stack_ptr)/PAGE_SIZE;
> >
> > if(stack_size > ( STACK_LIMIT - 1))
>
> Well, with the exception of the fact that STACK_LIMIT is 8MB, and kernel
> stacks are only 8kB (on i386)...
>
> Also, see "do_IRQ()" (i386) for CONFIG_DEBUG_STACKOVERFLOW to see this already.
>
> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2003-09-12 23:06 ` William Lee Irwin III
@ 2003-09-12 19:23 ` Breno
2003-09-12 23:18 ` Alan Cox
1 sibling, 0 replies; 35+ messages in thread
From: Breno @ 2003-09-12 19:23 UTC (permalink / raw)
To: William Lee Irwin III, Kernel List
Wli,
Precisely because user stacks are demand paged, you can calculate the size of
the stack. This would be impossible, or at least more difficult, if you had
more than one mm->start_stack :)
att
Breno
----- Original Message -----
From: "William Lee Irwin III" <wli@holomorphy.com>
To: "Breno" <brenosp@brasilsec.com.br>; "Kernel List"
<linux-kernel@vger.kernel.org>
Sent: Saturday, September 13, 2003 12:06 AM
Subject: Re: stack overflow
> On Fri, Sep 12, 2003 at 04:50:47PM -0600, Andreas Dilger wrote:
> > Well, with the exception of the fact that STACK_LIMIT is 8MB, and kernel
> > stacks are only 8kB (on i386)...
> > Also, see "do_IRQ()" (i386) for CONFIG_DEBUG_STACKOVERFLOW to see this already.
>
> What he actually wants is in-kernel user stack overflow checking, which
> is basically impossible since user stacks are demand paged. He's been
> told this before and failed to absorb it.
>
> There have been attempts to use i386 segmentation for stack limit
> checks written but they should probably not be confused with this.
>
>
> -- wli
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2003-09-12 17:53 stack overflow Breno
@ 2003-09-12 22:50 ` Andreas Dilger
2003-09-12 19:14 ` Breno
2003-09-12 23:06 ` William Lee Irwin III
0 siblings, 2 replies; 35+ messages in thread
From: Andreas Dilger @ 2003-09-12 22:50 UTC (permalink / raw)
To: Breno; +Cc: Kernel List
On Sep 12, 2003 18:53 +0100, Breno wrote:
> Hi ... this is my idea to check a stack overflow. What do you think ?
>
> #define STACK_LIMIT (1024*8192)/PAGE_SIZE
>
> int check_stack_overflow(struct task_struct *tsk)
> {
>
> unsigned long stack_size,stack_addr,stack_ptr;
> int i;
>
> if(tsk->mm != NULL)
> {
> stack_addr = tsk->mm->start_stack;
>
> stack_ptr = tsk->thread.esp;
>
> for(i=0; i < stack_ptr; i++)
> stack_addr++;
>
> stack_size = (stack_addr - stack_ptr)/PAGE_SIZE;
>
> if(stack_size > ( STACK_LIMIT - 1))
Well, with the exception of the fact that STACK_LIMIT is 8MB, and kernel
stacks are only 8kB (on i386)...
Also, see "do_IRQ()" (i386) for CONFIG_DEBUG_STACKOVERFLOW to see this already.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
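A kernel-stack check of the kind mentioned above can be modelled in user space. This is a rough sketch, assuming an 8 kB stack that is aligned to its own size, so the low bits of the stack pointer give the distance from the stack base. All TOY_ names and the reserved and margin values are hypothetical; the real check lives in the kernel source and is not reproduced here.

```c
#define TOY_THREAD_SIZE 8192UL  /* 8 kB kernel stack, as stated in the thread */
#define TOY_RESERVED     512UL  /* hypothetical per-task data at the stack base */
#define TOY_WARN_MARGIN 1024UL  /* hypothetical warning threshold */

/*
 * The stack grows down towards its base. Masking the stack pointer with
 * (size - 1) yields its offset above the base; subtracting the reserved
 * area gives the bytes still usable.
 */
unsigned long toy_stack_left(unsigned long sp)
{
    unsigned long offset = sp & (TOY_THREAD_SIZE - 1);

    return offset > TOY_RESERVED ? offset - TOY_RESERVED : 0;
}

/* Nonzero when fewer than TOY_WARN_MARGIN usable bytes remain. */
int toy_stack_low(unsigned long sp)
{
    return toy_stack_left(sp) < TOY_WARN_MARGIN;
}
```

Note the scale: everything here is measured against 8 kB, not the 8 MB used by STACK_LIMIT in the quoted proposal.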
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2003-09-12 22:50 ` Andreas Dilger
2003-09-12 19:14 ` Breno
@ 2003-09-12 23:06 ` William Lee Irwin III
2003-09-12 19:23 ` Breno
2003-09-12 23:18 ` Alan Cox
1 sibling, 2 replies; 35+ messages in thread
From: William Lee Irwin III @ 2003-09-12 23:06 UTC (permalink / raw)
To: Breno, Kernel List
On Fri, Sep 12, 2003 at 04:50:47PM -0600, Andreas Dilger wrote:
> Well, with the exception of the fact that STACK_LIMIT is 8MB, and kernel
> stacks are only 8kB (on i386)...
> Also, see "do_IRQ()" (i386) for CONFIG_DEBUG_STACKOVERFLOW to see this already.
What he actually wants is in-kernel user stack overflow checking, which
is basically impossible since user stacks are demand paged. He's been
told this before and failed to absorb it.
There have been attempts to use i386 segmentation for stack limit
checks written but they should probably not be confused with this.
-- wli
^ permalink raw reply [flat|nested] 35+ messages in thread
* stack overflow - kernel thread
2003-09-12 23:25 ` William Lee Irwin III
@ 2003-09-12 23:18 ` Breno
0 siblings, 0 replies; 35+ messages in thread
From: Breno @ 2003-09-12 23:18 UTC (permalink / raw)
To: William Lee Irwin III, Alan Cox; +Cc: Linux Kernel Mailing List
Hi .... this worked
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/unistd.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/highmem.h>
#include <linux/string.h>
#include <asm/uaccess.h>
#include <asm/errno.h>
#include <asm/mman.h>
#include <asm/page.h>
#include <asm/pgalloc.h>
#include <asm/processor.h>
#include <asm-i386/vm86.h>
#define THREAD 1
#define STACK_LIMIT 8192
int check_stack()
{
    struct task_struct *p = NULL;
    struct vm_area_struct *vma;
    unsigned long stack_size, stack_addr, stack_ptr;
    daemonize();
    sprintf(current->comm,"CHECK_STACK");
    for(;;schedule())
    {
        for_each_task(p)
        {
            if((p->mm != NULL) && (p->pid != 1))
            {
                stack_addr = p->mm->start_stack;
                stack_ptr = p->thread.esp;
                vma = find_vma(p->mm,stack_addr);
                if(!vma)
                    return 0;
                stack_size = (vma->vm_end - vma->vm_start)/PAGE_SIZE;
                if(stack_size >= (STACK_LIMIT - 1024))
                {
                    printk(KERN_CRIT"Process %s pid: %d\n",p->comm,p->pid);
                    printk(KERN_CRIT"Stack size %lu k\n",stack_size);
                    vma->vm_flags &= ~(VM_WRITE | VM_GROWSDOWN);
                    return 0;
                }
            }
        }
    }
    return 0;
}
int init_module()
{
    int th_pid[THREAD];
    int i;
    for(i = 0;i < THREAD;i++)
    {
        th_pid[i] = kernel_thread(check_stack,NULL,CLONE_SIGHAND);
        /* check inside the loop: after it, th_pid[i] would be out of bounds */
        if(th_pid[i] < 0)
            printk(KERN_CRIT"Cannot create thread\n");
    }
    return 0;
}
void cleanup_module()
{
}
att,
Breno
----- Original Message -----
From: "William Lee Irwin III" <wli@holomorphy.com>
To: "Alan Cox" <alan@lxorguk.ukuu.org.uk>
Cc:
"Breno" <brenosp@brasilsec.com.br>; "Linux Kernel Mailing List"
<linux-kernel@vger.kernel.org>
Sent: Saturday, September 13, 2003 12:25 AM
Subject: Re: stack overflow
> On Sat, 2003-09-13 at 00:06, William Lee Irwin III wrote:
> >> What he actually wants is in-kernel user stack overflow checking, which
> >> is basically impossible since user stacks are demand paged. He's been
> >> told this before and failed to absorb it.
>
> On Sat, Sep 13, 2003 at 12:18:32AM +0100, Alan Cox wrote:
> > We will fault and error on a user stack exceed. You need to use
> > sigaltstack to catch it for obvious reasons. You can also use mmap and
> > drop in red zones on user space stacks
>
> Stack rlimits are fine and we already do those; the rest sounds like
> something userspace has to do.
>
>
> -- wli
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2003-09-12 23:06 ` William Lee Irwin III
2003-09-12 19:23 ` Breno
@ 2003-09-12 23:18 ` Alan Cox
2003-09-12 23:25 ` William Lee Irwin III
1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2003-09-12 23:18 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Breno, Linux Kernel Mailing List
On Sat, 2003-09-13 at 00:06, William Lee Irwin III wrote:
> What he actually wants is in-kernel user stack overflow checking, which
> is basically impossible since user stacks are demand paged. He's been
> told this before and failed to absorb it.
We will fault and error on a user stack exceed. You need to use
sigaltstack to catch it for obvious reasons. You can also use mmap and
drop in red zones on user space stacks
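The sigaltstack approach mentioned above can be sketched as follows. This is a minimal user-space illustration, not a complete recovery scheme: the 64 kB size, the handler name on_segv and the function install_overflow_handler are assumptions made for the example. The alternate stack is needed because the normal stack is exactly what has run out when the fault arrives.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define ALT_STACK_BYTES (64 * 1024)   /* hypothetical, generously sized */

static char alt_stack[ALT_STACK_BYTES];

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void on_segv(int sig)
{
    static const char msg[] = "SIGSEGV caught on alternate stack\n";

    (void)sig;
    (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

/* Returns 0 on success, -1 on failure. */
int install_overflow_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof(alt_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;         /* deliver on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A program would call install_overflow_handler() early in main(); without SA_ONSTACK the kernel would try to push the signal frame onto the exhausted stack and the process would simply be killed.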
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
2003-09-12 23:18 ` Alan Cox
@ 2003-09-12 23:25 ` William Lee Irwin III
2003-09-12 23:18 ` stack overflow - kernel thread Breno
0 siblings, 1 reply; 35+ messages in thread
From: William Lee Irwin III @ 2003-09-12 23:25 UTC (permalink / raw)
To: Alan Cox; +Cc: Breno, Linux Kernel Mailing List
On Sat, 2003-09-13 at 00:06, William Lee Irwin III wrote:
>> What he actually wants is in-kernel user stack overflow checking, which
>> is basically impossible since user stacks are demand paged. He's been
>> told this before and failed to absorb it.
On Sat, Sep 13, 2003 at 12:18:32AM +0100, Alan Cox wrote:
> We will fault and error on a user stack exceed. You need to use
> sigaltstack to catch it for obvious reasons. You can also use mmap and
> drop in red zones on user space stacks
Stack rlimits are fine and we already do those; the rest sounds like
something userspace has to do.
-- wli
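The stack rlimit referred to above can be read from user space with getrlimit(). A minimal sketch; the wrapper name stack_soft_limit and its out-parameters are hypothetical.

```c
#include <sys/resource.h>

/*
 * Stores the soft stack limit in *bytes and returns 0, or returns -1 on
 * error. *unlimited is set to 1 (and *bytes to 0) when the limit is
 * RLIM_INFINITY.
 */
int stack_soft_limit(unsigned long long *bytes, int *unlimited)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = *unlimited ? 0 : (unsigned long long)rl.rlim_cur;
    return 0;
}
```

The value is a per-process setting that an administrator or the shell can change, so code should query it rather than assume a fixed figure such as 8 MB.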
^ permalink raw reply [flat|nested] 35+ messages in thread
* stack overflow
@ 2004-03-08 16:43 Stuart_Hayes-DYMqY+WieiM
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F128AA4-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Stuart_Hayes-DYMqY+WieiM @ 2004-03-08 16:43 UTC (permalink / raw)
To: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f; +Cc: Stuart_Hayes-DYMqY+WieiM
Hello!
I'm using the x86_64 architecture with Linux, and I'm getting what appear to be
stack overflows while the ACPI stuff is being initialized (all the _STA
methods are being executed).
Just wondering if anyone has taken a look at stack usage with the ACPI stuff,
and what could be done to fix this (other than simplifying the ACPI tables!).
I suspect this problem might become an issue with a large number of people as
the x86_64 architecture becomes more common, since it uses ACPI by default (and
i386 did not).
Here are some of the reasons I believe the stack is overflowing:
I've added some "printk"s to the kernel, and I've found that the stack pointer
goes down by ~6K between namespace/nseval.c:acpi_ns_evaluate_relative() and
executer/exstore.c:acpi_ex_store().
If I disable local interrupts while the ACPI stuff is being initialized, it
seems to make it through without failing.
If I simplify some of the methods, it seems to work ok.
Thanks
Stuart
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration. http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F128AA4-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
@ 2004-03-08 18:26 ` Andi Kleen
[not found] ` <20040308182630.GB9490-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2004-03-10 4:33 ` Len Brown
1 sibling, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-03-08 18:26 UTC (permalink / raw)
To: Stuart_Hayes-DYMqY+WieiM; +Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
> Here are some of the reasons I believe the stack is overflowing:
>
> I've added some "printk"s to the kernel, and I've found that the stack pointer
> goes down by ~6K between namespace/nseval.c:acpi_ns_evaluate_relative() and
> executer/exstore.c:acpi_ex_store().
The usual way to start is to run
objdump -S <acpi object modules> | grep 'sub.*rsp'
then sort by the biggest stack pigs and fix them one by one (e.g.
by kmallocing local data instead of allocating it on the stack).
If the problem still occurs afterwards, it is most likely recursion or
too deep nesting. I have an old 2.4 patch that can catch these, but it
would need porting to 2.6.
-Andi
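To make the sorting step concrete, here is a small sketch that pulls the
frame size out of one line of x86_64 objdump output of the form
"sub $0x148,%rsp". The function name is hypothetical, and it only handles
that exact AT&T-syntax pattern; frames set up any other way are ignored.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the immediate subtracted from %rsp (the frame size in bytes),
 * or -1 if the line is not a "sub $imm,%rsp" instruction. */
long frame_size_from_objdump_line(const char *line)
{
	const char *p = strstr(line, "sub");
	char *end;
	long imm;

	if (p == NULL)
		return -1;
	p = strchr(p, '$');		/* start of the immediate operand */
	if (p == NULL)
		return -1;
	imm = strtol(p + 1, &end, 16);	/* objdump prints the immediate in hex */
	if (end == p + 1 || strncmp(end, ",%rsp", 5) != 0)
		return -1;
	return imm;
}
```

Feeding every disassembly line through this and sorting the results
numerically gives the list of biggest stack users.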
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
[not found] ` <20040308182630.GB9490-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2004-03-09 7:17 ` Len Brown
0 siblings, 0 replies; 35+ messages in thread
From: Len Brown @ 2004-03-09 7:17 UTC (permalink / raw)
To: Andi Kleen; +Cc: Stuart_Hayes-DYMqY+WieiM, ACPI Developers
Stuart,
Does CONFIG_ACPI_DEBUG change the results of your measurements?
Is it possible to run an i386 kernel on the same system to see if we've
got an x86_64-specific issue?
There is some run-time stack tracing code in ACPI (see
acpi_gbl_lowest_stack_pointer) but it hasn't been used in a while.
thanks,
-Len
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-09 18:34 Moore, Robert
0 siblings, 0 replies; 35+ messages in thread
From: Moore, Robert @ 2004-03-09 18:34 UTC (permalink / raw)
To: Brown, Len, Andi Kleen
Cc: Stuart_Hayes-DYMqY+WieiM, ACPI Developers, Grover, Andrew
There's the whole tracing mechanism that sits on the stack -- it goes
away when debug is disabled.
There is very, very little recursion in the ACPI subsystem, and only
when it is bounded.
Bob
_______________________________________________
Acpi-devel mailing list
Acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
https://lists.sourceforge.net/lists/listinfo/acpi-devel
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-09 20:00 Stuart_Hayes-DYMqY+WieiM
0 siblings, 0 replies; 35+ messages in thread
From: Stuart_Hayes-DYMqY+WieiM @ 2004-03-09 20:00 UTC (permalink / raw)
To: len.brown-ral2JQCrhuEAvxtiuMwx3w, ak-l3A5Bk7waGM
Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I am in the process of trying this without CONFIG_ACPI_DEBUG now.
I put a little extra debug stuff in utilities/utdebug.c to keep track
of all the nested functions, and added the ACPI_FUNCTION_TRACE to some
more of the functions, and this is what I get when the stack pointer is
lowest (the number by each is the address of the first argument of the
acpi_ut_trace(_*) function, which I store along with the function
name). This gives a pretty good picture of what's going on with the
recursion and the deep nesting of functions.
acpi_init 0000010001e0defc
acpi_bus_init 0000010001e0deac
acpi_initialize_objects 0000010001e0de6c
ns_initialize_devices 0000010001e0de1c
ns_walk_namespace 0000010001e0dd8c
ns_init_one_device 0000010001e0dd2c
ut_execute_STA 0000010001e0dccc
ut_evaluate_object 0000010001e0dc5c
ns_evaluate_relative 0000010001e0db8c
ns_evaluate_by_handle 0000010001e0db1c
ns_execute_control_method 0000010001e0dacc
psx_execute 0000010001e0da5c
ps_parse_aml 0000010001e0da0c
ps_parse_loop 0000010001e0d8fc
ds_exec_end_op 0000010001e0d89c
ds_resolve_operands 0000010001e0d85c
ex_resolve_to_value 0000010001e0d80c
ex_resolve_node_to_value 0000010001e0d7ac
ex_read_data_from_field 0000010001e0d71c
ex_extract_from_field 0000010001e0d69c
ex_field_datum_io 0000010001e0d61c
ex_access_region 0000010001e0d5ac
ev_address_space_dispatch 0000010001e0d52c
ev_pci_config_region_setup 0000010001e0d4ac
ut_evaluate_numeric_object 0000010001e0d44c
ut_evaluate_object 0000010001e0d3dc
ns_evaluate_relative 0000010001e0d30c
ns_evaluate_by_handle 0000010001e0d29c
ns_execute_control_method 0000010001e0d24c
psx_execute 0000010001e0d1dc
ps_parse_aml 0000010001e0d18c
ps_parse_loop 0000010001e0d07c
ds_exec_end_op 0000010001e0d01c
ex_resolve_operands 0000010001e0cf9c
ex_resolve_to_value 0000010001e0cf4c
ex_resolve_node_to_value 0000010001e0ceec
ex_read_data_from_field 0000010001e0ce5c
ex_extract_from_field 0000010001e0cddc
ex_field_datum_io 0000010001e0cd5c
ex_access_region 0000010001e0ccec
ev_address_space_dispatch 0000010001e0cc6c
ev_pci_config_region_setup 0000010001e0cbec
acpi_evaluate_integer 0000010001e0caac
acpi_evaluate_object 0000010001e0ca2c
ns_evaluate_relative 0000010001e0c95c
ns_evaluate_by_handle 0000010001e0c8ec
ns_execute_control_method 0000010001e0c89c
psx_execute 0000010001e0c82c
ps_parse_aml 0000010001e0c7dc
ps_parse_loop 0000010001e0c6cc
ds_exec_end_op 0000010001e0c66c
ex_resolve_operands 0000010001e0c5ec
ex_resolve_to_value 0000010001e0c59c
ex_resolve_node_to_value 0000010001e0c53c
ex_read_data_from_field 0000010001e0c4ac
ex_extract_from_field 0000010001e0c42c
ex_field_datum_io 0000010001e0c3ac
ex_access_region 0000010001e0c33c
ev_address_space_dispatch 0000010001e0c2bc
ex_enter_interpreter 0000010001e0c27c
ut_acquire_mutex 0000010001e0c21c
os_wait_semaphore 0000010001e0c1cc
Thanks
Stuart
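As a rough cross-check of the trace above: the first address (acpi_init,
...defc) and the last (os_wait_semaphore, ...c1cc) are 0x1d30 = 7472 bytes
apart. The sketch below computes that span and finds the largest single drop
between consecutive entries. The struct and function names are illustrative
only.

```c
#include <stddef.h>

/* One line of the trace: function name and the recorded stack address. */
struct trace_entry {
	const char *name;
	unsigned long long addr;
};

/* Bytes between the outermost (first) and innermost (last) entries.
 * The stack grows down, so the first address is the highest. */
unsigned long long trace_depth(const struct trace_entry *t, size_t n)
{
	return n ? t[0].addr - t[n - 1].addr : 0;
}

/* Index of the entry that sits furthest below its predecessor
 * (0 when there are fewer than two entries). */
size_t biggest_drop(const struct trace_entry *t, size_t n)
{
	size_t i, best = 0;
	unsigned long long best_delta = 0;

	for (i = 1; i < n; i++) {
		unsigned long long delta = t[i - 1].addr - t[i].addr;

		if (delta > best_delta) {
			best_delta = delta;
			best = i;
		}
	}
	return best;
}
```

For the first five entries of the trace, trace_depth() gives 368 bytes and
biggest_drop() points at ns_walk_namespace (a 144-byte drop).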
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-09 21:04 Stuart_Hayes-DYMqY+WieiM
0 siblings, 0 replies; 35+ messages in thread
From: Stuart_Hayes-DYMqY+WieiM @ 2004-03-09 21:04 UTC (permalink / raw)
To: len.brown-ral2JQCrhuEAvxtiuMwx3w, ak-l3A5Bk7waGM
Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
It seems to work with CONFIG_ACPI_DEBUG off. I'm guessing we're just
squeaking by with that, though. Wouldn't more complex ACPI methods
cause the stack usage to go up, causing it to break again?
I don't think this will occur with i386, because the pointers are
half the size, but I'll try it as soon as I get a chance.
Thanks
Stuart
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-09 22:48 Moore, Robert
0 siblings, 0 replies; 35+ messages in thread
From: Moore, Robert @ 2004-03-09 22:48 UTC (permalink / raw)
To: Stuart_Hayes-DYMqY+WieiM, Brown, Len, ak-l3A5Bk7waGM
Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Can you please post your DSDT and give us some idea of which _STA method
is executing?
I seem to remember that there may be a bit of leftover recursion in the
operation region/field handling code, but something looks very odd about
the way the ev_pci_config_region_setup function is getting called twice.
> It seems to work with CONFIG_ACPI_DEBUG off. I'm guessing we're just
> squeaking by with that, though. Wouldn't more complex ACPI methods
> cause the stack usage to go up, causing it to break again?
I think this is an odd case (i.e., bug), since the interpreter has been
specifically architected to not use recursion -- however, the original
version of the interpreter did recurse based on the complexity of the
ASL code (when the interpreter was running as an application.) This was
removed in all obvious cases, but I do think that there may be a couple
that were missed - fields being perhaps one of them.
Bob
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F128AA4-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
2004-03-08 18:26 ` Andi Kleen
@ 2004-03-10 4:33 ` Len Brown
[not found] ` <1078893223.2346.585.camel-D2Zvc0uNKG8@public.gmane.org>
1 sibling, 1 reply; 35+ messages in thread
From: Len Brown @ 2004-03-10 4:33 UTC (permalink / raw)
To: Stuart_Hayes-DYMqY+WieiM; +Cc: ACPI Developers, Robert Moore, Andi Kleen
On Mon, 2004-03-08 at 11:43, Stuart_Hayes-DYMqY+WieiM@public.gmane.org wrote:
> If I disable local interrupts while the ACPI stuff is being initialized, it
> seems to make it through without failing.
hmmm, i wonder if the failure happens when ACPI is interrupted, or if
there is an issue with some ACPI code running in an interrupt?
re: the stack trace
I'm sure if Bob gets your DSDT he'll be able to address the recursion
issue at hand.
More interesting, perhaps, would be adding debugging code to
"gracefully" check for this failure. There must be such DEBUG code
already built into the kernel someplace.
Sorting the list of stack frame sizes below shows
acpi_evaluate_integer() is the winner with 320 bytes on the stack. Note
that this isn't from passing structures, but from allocating local
structures. On i386 acpi_parse_object is 124 bytes, on x86_64 it will
be bigger...
./foo <stack.txt |sort -n
0 acpi_init
64 acpi_initialize_objects
64 ds_resolve_operands
64 ex_enter_interpreter
80 acpi_bus_init
80 ex_resolve_to_value
80 ex_resolve_to_value
80 ex_resolve_to_value
80 ns_execute_control_method
80 ns_execute_control_method
80 ns_execute_control_method
80 ns_initialize_devices
80 os_wait_semaphore
80 ps_parse_aml
80 ps_parse_aml
80 ps_parse_aml
96 ds_exec_end_op
96 ds_exec_end_op
96 ds_exec_end_op
96 ex_resolve_node_to_value
96 ex_resolve_node_to_value
96 ex_resolve_node_to_value
96 ns_init_one_device
96 ut_acquire_mutex
96 ut_evaluate_numeric_object
96 ut_execute_STA
112 ex_access_region
112 ex_access_region
112 ex_access_region
112 ns_evaluate_by_handle
112 ns_evaluate_by_handle
112 ns_evaluate_by_handle
112 psx_execute
112 psx_execute
112 psx_execute
112 ut_evaluate_object
112 ut_evaluate_object
128 acpi_evaluate_object
128 ev_address_space_dispatch
128 ev_address_space_dispatch
128 ev_address_space_dispatch
128 ev_pci_config_region_setup
128 ev_pci_config_region_setup
128 ex_extract_from_field
128 ex_extract_from_field
128 ex_extract_from_field
128 ex_field_datum_io
128 ex_field_datum_io
128 ex_field_datum_io
128 ex_resolve_operands
128 ex_resolve_operands
144 ex_read_data_from_field
144 ex_read_data_from_field
144 ex_read_data_from_field
144 ns_walk_namespace
208 ns_evaluate_relative
208 ns_evaluate_relative
208 ns_evaluate_relative
272 ps_parse_loop
272 ps_parse_loop
272 ps_parse_loop
320 acpi_evaluate_integer
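#include <stdio.h>

/* Read "name address" pairs (outermost frame first) and print how many
 * bytes each entry sits below the previous one. */
int main(void)
{
	char name[512];
	unsigned long long number;
	unsigned long long previous = 0;

	/* Stop at EOF or at the first line that does not parse. */
	while (scanf("%511s %llx", name, &number) == 2)
	{
		unsigned long long delta;

		if (previous == 0) previous = number;
		delta = previous - number;
		printf("%llu %s\n", delta, name);
		previous = number;	/* next delta is relative to this entry */
	}
	return 0;
}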
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
[not found] ` <1078893223.2346.585.camel-D2Zvc0uNKG8@public.gmane.org>
@ 2004-03-10 13:32 ` Andi Kleen
[not found] ` <20040310133208.GC12272-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Andi Kleen @ 2004-03-10 13:32 UTC (permalink / raw)
To: Len Brown
Cc: Stuart_Hayes-DYMqY+WieiM, ACPI Developers, Robert Moore,
Andi Kleen
On Tue, Mar 09, 2004 at 11:33:43PM -0500, Brown, Len wrote:
> On Mon, 2004-03-08 at 11:43, Stuart_Hayes-DYMqY+WieiM@public.gmane.org wrote:
>
> > If I disable local interrupts while the ACPI stuff is being initialized, it
> > seems to make it through without failing.
>
> hmmm, i wonder if the failure happens when ACPI is interrupted, or if
> there is an issue with some ACPI code running in an interrupt?
x86-64 has separate interrupt stacks, interrupts shouldn't be a problem.
> More interesting, perhaps, would be adding debugging code to
> "gracefully" check for this failure. There must be such DEBUG code
> already built into the kernel someplace.
I have a patch for x86-64 (for 2.4, but could be ported). But it's
quite a slow down because it instruments every function.
If you have some recursion, either fix it or at least error out
when the stack gets too low. We can add a "stack_left" function
exported by the architecture.
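A userspace model of what such a stack_left helper might compute. This is
only a sketch under stated assumptions: stacks are thread_size bytes,
aligned to thread_size, grow down, and keep a reserved area (for example
thread_info) at the lowest addresses. The signature is hypothetical; no such
function exists in the sources discussed here.

```c
#include <stdint.h>

/* Bytes still available below the stack pointer sp.
 * thread_size must be a power of two; reserved is the size of the area at
 * the bottom of the stack that must never be overwritten. */
uint64_t stack_left(uint64_t sp, uint64_t thread_size, uint64_t reserved)
{
	uint64_t offset = sp & (thread_size - 1);	/* bytes above the stack base */

	return offset > reserved ? offset - reserved : 0;
}
```

If the stack were 8 KiB and 8 KiB-aligned, the lowest address in Stuart's
trace (0x0000010001e0c1cc) would leave 460 bytes above the stack base before
subtracting any reserved area. A caller about to recurse could check the
result against a threshold and return an error instead of descending.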
>
> Sorting the list of stack frame sizes below shows
> acpi_evaluate_integer() is the winner with 320 bytes on the stack. Note
> that this isn't from passing structures, but from allocating local
> structures. On i386 acpi_parse_object is 124 bytes, on x86_64 it will
> be bigger...
I would suggest fixing anything > 100 bytes at least
(and double-checking anything that could expand on 64-bit).
-Andi
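The kind of fix suggested in this thread, moving a large local buffer off
the stack, looks roughly like this. It is a userspace sketch: malloc() and
free() stand in for kmalloc() and kfree(), and the function, its purpose and
the 320-byte size are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define SCRATCH_BYTES 320

/* Before the change this function would declare
 *     unsigned char scratch[SCRATCH_BYTES];
 * and pay 320 bytes of stack on every call. With the buffer on the heap the
 * frame only holds a pointer. Returns the byte sum, or -1 on error. */
int checksum_with_heap_scratch(const char *data, size_t len)
{
	unsigned char *scratch;
	size_t i;
	int sum = 0;

	if (len > SCRATCH_BYTES)
		return -1;
	scratch = malloc(SCRATCH_BYTES);	/* kmalloc(SCRATCH_BYTES, GFP_KERNEL) in kernel code */
	if (scratch == NULL)
		return -1;

	memcpy(scratch, data, len);
	for (i = 0; i < len; i++)
		sum += scratch[i];

	free(scratch);				/* kfree() in kernel code */
	return sum;
}
```

The cost is an allocation per call and a new failure path, which is why this
is usually reserved for the largest frames on deep call chains.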
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-10 15:56 Stuart_Hayes-DYMqY+WieiM
0 siblings, 0 replies; 35+ messages in thread
From: Stuart_Hayes-DYMqY+WieiM @ 2004-03-10 15:56 UTC (permalink / raw)
To: robert.moore-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, ak-l3A5Bk7waGM
Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Here is the relevant method, and all of the fields, regions, and
methods needed to execute it (I think). If this isn't sufficient,
let me know. The method being run is ...VPR0.D0F0._STA.
This is cut from the output from Phoenix's "ad" program, and it is
showing both the disassembled code and the AML itself. Sorry about
the long lines.
Thanks
Stuart
Scope/*0x10*//*0x85,0x31,0x03*/(\_SB/*0x5C,0x5F,0x53,0x42,0x5F*/)
{
  Device/*0x5B,0x82*//*0x8D,0xD0,0x02*/(PCI0/*0x50,0x43,0x49,0x30*/)
  {
    Device/*0x5B,0x82*//*0x4C,0x89*/(VPR0/*0x56,0x50,0x52,0x30*/)
    {
      Name/*0x08*/(DEVN/*0x44,0x45,0x56,0x4E*/,/*0x0A*/0x00/*0x00*/)
      Method/*0x14*//*0x3B*/(_ADR/*0x5F,0x41,0x44,0x52*/,0,NotSerialized/*0x00*/)
      {
        Store/*0x70*/(/*0x0C*/0x00060000/*0x00,0x00,0x06,0x00*/,Local0/*0x60*/)
        Store/*0x70*/(\_SB.PCI0.ISA.CPTP/*0x5C,0x2F,0x04,0x5F,0x53,0x42,0x5F,0x50,0x43,0x49,0x30,0x49,0x53,0x41,0x5F,0x43,0x50,0x54,0x50*/,Local1/*0x61*/)
        If/*0xA0*//*0x0C*/(LEqual/*0x93*/(Local1/*0x61*/,/*0x0A*/0x01/*0x01*/))
        {
          Store/*0x70*/(/*0x0C*/0x000A0000/*0x00,0x00,0x0A,0x00*/,Local0/*0x60*/)
        }
        Store/*0x70*/(Local0/*0x60*/,DEVN/*0x44,0x45,0x56,0x4E*/ /* \_SB.PCI0.VPR0.DEVN */)
        Store/*0x70*/(Local0/*0x60*/,Debug/*0x5B,0x31*/)
        Return/*0xA4*/(Local0/*0x60*/)
      }
      Method/*0x14*//*0x21*/(MADR/*0x4D,0x41,0x44,0x52*/,1,NotSerialized/*0x01*/)
      {
        Store/*0x70*/(Arg0/*0x68*/,Local0/*0x60*/)
        If/*0xA0*//*0x0B*/(And/*0x7B*/(SCPL/*0x53,0x43,0x50,0x4C*/ /* \_SB.PCI0.VPR0.SCPL */,/*0x0A*/0x40/*0x40*//*0x00*/))
        {
          Return/*0xA4*/(Local0/*0x60*/)
        }
        Else/*0xA1*//*0x0B*/
        {
          Or/*0x7D*/(Local0/*0x60*/,/*0x0C*/0x001F0000/*0x00,0x00,0x1F,0x00*/,Local0/*0x60*/)
          Return/*0xA4*/(Local0/*0x60*/)
        }
      }
      OperationRegion(NBCF,PCI_Config,0x00,0x0100)
      Field(NBCF /* \_SB.PCI0.VPR0.NBCF */,ByteAcc,NoLock,Preserve)
      {
        Offset(0x70),
        Offset(0x73),
        PNUM,8,
        Offset(0x78),
        SCPL,16,
        SCPH,16,
        SCTL,8,
        SCTH,8,
        SSTA,8,
        Offset(0x80),
        RPCT,8
      }
      Method/*0x14*//*0x15*/(MSTA/*0x4D,0x53,0x54,0x41*/,1,NotSerialized/*0x01*/)
      {
        If/*0xA0*//*0x09*/(LEqual/*0x93*/(Arg0/*0x68*/,/*0x0B*/0xFFFF/*0xFF,0xFF*/))
        {
          Return/*0xA4*/(/*0x0A*/0x00/*0x00*/)
        }
        Else/*0xA1*//*0x04*/
        {
          Return/*0xA4*/(/*0x0A*/0x0F/*0x0F*/)
        }
      }
      Device/*0x5B,0x82*//*0x4C,0x05*/(D0F0/*0x44,0x30,0x46,0x30*/)
      {
        Method/*0x14*//*0x0D*/(_ADR/*0x5F,0x41,0x44,0x52*/,0,NotSerialized/*0x00*/)
        {
          Return/*0xA4*/(MADR/*0x4D,0x41,0x44,0x52*/ /* \_SB.PCI0.VPR0.MADR */(/*0x0A*/0x00/*0x00*/))
        }
        Method/*0x14*//*0x0F*/(_SUN/*0x5F,0x53,0x55,0x4E*/,0,NotSerialized/*0x00*/)
        {
          Return/*0xA4*/(ShiftRight/*0x7A*/(SCPH/*0x53,0x43,0x50,0x48*/ /* \_SB.PCI0.VPR0.SCPH */,/*0x0A*/0x03/*0x03*//*0x00*/))
        }
        OperationRegion/*0x5B,0x80*/(SCFG/*0x53,0x43,0x46,0x47*/,PCI_Config/*0x02*/,/*0x0A*/0x00/*0x00*/,/*0x0B*/0x0100/*0x00,0x01*/)
        Field/*0x5B,0x81*//*0x0B*/(SCFG/*0x53,0x43,0x46,0x47*/ /* \_SB.PCI0.VPR0.D0F0.SCFG */,WordAcc,NoLock,Preserve/*0x02*/)
        {
          VDID/*0x56,0x44,0x49,0x44*//*0x10*/,16
        }
        Method/*0x14*//*0x0F*/(_STA/*0x5F,0x53,0x54,0x41*/,0,NotSerialized/*0x00*/)
        {
          Return/*0xA4*/(MSTA/*0x4D,0x53,0x54,0x41*/ /* \_SB.PCI0.VPR0.MSTA */(VDID/*0x56,0x44,0x49,0x44*/ /* \_SB.PCI0.VPR0.D0F0.VDID */))
        }
        Method/*0x14*//*0x0E*/(_EJ0/*0x5F,0x45,0x4A,0x30*/,1,NotSerialized/*0x01*/)
        {
          PWCM/*0x50,0x57,0x43,0x4D*/ /* \_SB.PCI0.VPR0.PWCM */(/*0x0A*/0xF8/*0xF8*/,/*0x0A*/0x07/*0x07*/)
        }
      }
    }
  }
}
Moore, Robert wrote:
> Can you please post your DSDT and give us some idea of which _STA
> method is executing?
>
> I seem to remember that there may be a bit of leftover recursion in
> the operation region/field handling code, but something looks very
> odd about the way the ev_pci_config_region_setup function is getting
> called twice.
>
>
>>> It seems to work with CONFIG_ACPI_DEBUG off. I'm guessing we're just
>>> squeaking by with that, though. Wouldn't more complex ACPI methods
>>> cause the stack usage to go up, causing it to break again?
>
> I think this is an odd case (i.e., bug), since the interpreter has
> been specifically architected to not use recursion -- however, the
> original version of the interpreter did recurse based on the
> complexity of the ASL code (when the interpreter was running as an
> application.) This was removed in all obvious cases, but I do think
> that there may be a couple that were missed - fields being perhaps
> one of them.
>
> Bob
>
> -----Original Message-----
> From: acpi-devel-admin-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> [mailto:acpi-devel-admin-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org] On Behalf Of
> Stuart_Hayes-DYMqY+WieiM@public.gmane.org
> Sent: Tuesday, March 09, 2004 12:00 PM
> To: Brown, Len; ak-l3A5Bk7waGM@public.gmane.org
> Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
> Subject: RE: [ACPI] stack overflow
>
>
> I am in the process of trying this without CONFIG_ACPI_DEBUG now.
>
> I put a little extra debug stuff in utilities/utdebug.c to keep track
> of all the nested functions, and added the ACPI_FUNCTION_TRACE to some
> more of the functions, and this is what I get when the stack pointer
> is lowest (the number by each is the address of the first argument of
> the acpi_ut_trace(_*) function, which I store along with the function
> name). This gives a pretty good picture of what's going on with the
> recursion and the deep nesting of functions.
>
> acpi_init 0000010001e0defc
> acpi_bus_init 0000010001e0deac
> acpi_initialize_objects 0000010001e0de6c
> ns_initialize_devices 0000010001e0de1c
> ns_walk_namespace 0000010001e0dd8c
> ns_init_one_device 0000010001e0dd2c
> ut_execute_STA 0000010001e0dccc
> ut_evaluate_object 0000010001e0dc5c
> ns_evaluate_relative 0000010001e0db8c
> ns_evaluate_by_handle 0000010001e0db1c
> ns_execute_control_method 0000010001e0dacc
> psx_execute 0000010001e0da5c
> ps_parse_aml 0000010001e0da0c
> ps_parse_loop 0000010001e0d8fc
> ds_exec_end_op 0000010001e0d89c
> ds_resolve_operands 0000010001e0d85c
> ex_resolve_to_value 0000010001e0d80c
> ex_resolve_node_to_value 0000010001e0d7ac
> ex_read_data_from_field 0000010001e0d71c
> ex_extract_from_field 0000010001e0d69c
> ex_field_datum_io 0000010001e0d61c
> ex_access_region 0000010001e0d5ac
> ev_address_space_dispatch 0000010001e0d52c
> ev_pci_config_region_setup 0000010001e0d4ac
> ut_evaluate_numeric_object 0000010001e0d44c
> ut_evaluate_object 0000010001e0d3dc
> ns_evaluate_relative 0000010001e0d30c
> ns_evaluate_by_handle 0000010001e0d29c
> ns_execute_control_method 0000010001e0d24c
> psx_execute 0000010001e0d1dc
> ps_parse_aml 0000010001e0d18c
> ps_parse_loop 0000010001e0d07c
> ds_exec_end_op 0000010001e0d01c
> ex_resolve_operands 0000010001e0cf9c
> ex_resolve_to_value 0000010001e0cf4c
> ex_resolve_node_to_value 0000010001e0ceec
> ex_read_data_from_field 0000010001e0ce5c
> ex_extract_from_field 0000010001e0cddc
> ex_field_datum_io 0000010001e0cd5c
> ex_access_region 0000010001e0ccec
> ev_address_space_dispatch 0000010001e0cc6c
> ev_pci_config_region_setup 0000010001e0cbec
> acpi_evaluate_integer 0000010001e0caac
> acpi_evaluate_object 0000010001e0ca2c
> ns_evaluate_relative 0000010001e0c95c
> ns_evaluate_by_handle 0000010001e0c8ec
> ns_execute_control_method 0000010001e0c89c
> psx_execute 0000010001e0c82c
> ps_parse_aml 0000010001e0c7dc
> ps_parse_loop 0000010001e0c6cc
> ds_exec_end_op 0000010001e0c66c
> ex_resolve_operands 0000010001e0c5ec
> ex_resolve_to_value 0000010001e0c59c
> ex_resolve_node_to_value 0000010001e0c53c
> ex_read_data_from_field 0000010001e0c4ac
> ex_extract_from_field 0000010001e0c42c
> ex_field_datum_io 0000010001e0c3ac
> ex_access_region 0000010001e0c33c
> ev_address_space_dispatch 0000010001e0c2bc
> ex_enter_interpreter 0000010001e0c27c
> ut_acquire_mutex 0000010001e0c21c
> os_wait_semaphore 0000010001e0c1cc
>
> Thanks
> Stuart
>
>
>
> Len Brown wrote:
>> Stuart,
>> Does CONFIG_ACPI_DEBUG change the results of your measurements?
>>
>> Is it possible to run an i386 kernel on the same system to see if
>> we've got an x86_64-specific issue?
>>
>> There is some run-time stack tracing code in ACPI (see
>> acpi_gbl_lowest_stack_pointer) but it hasn't been used in a while.
>>
>> thanks,
>> -Len
>>
>> On Mon, 2004-03-08 at 13:26, Andi Kleen wrote:
>>>> Here are some of the reasons I believe the stack is overflowing:
>>>>
>>>> I've added some "printk"s to the kernel, and I've found that the
>>>> stack pointer goes down by ~6K between
>>>> namespace/nseval.c:acpi_ns_evaluate_relative() and
>>>> executer/exstore.c:acpi_ex_store().
>>>
>>> The usual way to start is do
>>>
>>> objdump -S <acpi object modules> | grep sub.*rsp
>>>
>>> then sort by the biggest stack pigs and fix them one by one (e.g.
>>> by kmallocing local data instead of allocating it on the stack)
>>> If the problem still occurs afterwards, it is most likely recursion
>>> or too deep nesting. I have an old 2.4 patch that can catch these,
>>> but it would need porting to 2.6.
>>>
>>> -Andi
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-10 17:08 Stuart_Hayes-DYMqY+WieiM
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F020E5FD9-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
0 siblings, 1 reply; 35+ messages in thread
From: Stuart_Hayes-DYMqY+WieiM @ 2004-03-10 17:08 UTC (permalink / raw)
To: robert.moore-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, ak-l3A5Bk7waGM
Cc: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Here is the console output that occurred right before the lowest
stack pointer occurred (the point where the system output the list
of ACPI functions that I sent earlier). You can see from this
which methods were being executed, and where the code was in
evaluating ...VPR0.D0F0._STA.
Also, I should probably mention that this is a 2.4 kernel. It's
RHEL3, which is based on 2.4.21, but I've tested this with the
latest 2.4 kernel (2.4.25), and the results were the same.
Thanks
Stuart
in acpi_ns_init_one_device for D0F0 (ptr=0000010001ddacf0)
. in acpi_ns_evaluate_relative to execute D0F0, method _STA
...internalized path=_STA
...looked up node, pointer=0000010001dd91f0
(in apci_ns_execute_control_method _STA)
(method aml: a4 4d 53 54 41 56 44 49 44 )
in acpi_ns_evaluate_relative to execute D0F0, method _ADR
...internalized path=_ADR
...looked up node, pointer=0000010001ddad70
(in apci_ns_execute_control_method _ADR)
(method aml: a4 4d 41 44 52 a 0 )
in acpi_ns_evaluate_relative to execute VPR0, method _ADR
...internalized path=_ADR
...looked up node, pointer=0000010001ddbaf0
(in apci_ns_execute_control_method _ADR)
(method aml: 70 c 0 0 6 0 60 70 5c 2f 4 5f 53 42 5f 50 43 49 30 49 53
41 5f 43 50 54 50 61 a0 c 93 61 a 1 70 c 0 0 a 0 60 70 60 44 45 56 4e 70
60 5b 31 a4 60 )
[ACPI Debug] Integer: 0000000000060000
...exiting acpi_ns_evaluate_relative for obj=_ADR
in acpi_ns_evaluate_relative to execute PCI0, method _SEG
...internalized path=_SEG
...looked up node, pointer=0000000000000000
[PCI0._SEG] was not found
in acpi_ns_evaluate_relative to execute PCI0, method _BBN
...internalized path=_BBN
...looked up node, pointer=0000010001de64f0
...exiting acpi_ns_evaluate_relative for obj=_BBN
in acpi_ns_evaluate_relative to execute VPR0, method _ADR
...internalized path=_ADR
...looked up node, pointer=0000010001ddbaf0
(in apci_ns_execute_control_method _ADR)
(method aml: 70 c 0 0 6 0 60 70 5c 2f 4 5f 53 42 5f 50 43 49 30 49 53
41 5f 43 50 54 50 61 a0 c 93 61 a 1 70 c 0 0 a 0 60 70 60 44 45 56 4e 70
60 5b 31 a4 60
--(function list I sent earlier was here)--
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F020E5FD9-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
@ 2004-03-10 18:40 ` Len Brown
0 siblings, 0 replies; 35+ messages in thread
From: Len Brown @ 2004-03-10 18:40 UTC (permalink / raw)
To: Stuart_Hayes-DYMqY+WieiM; +Cc: Robert Moore, Andi Kleen, ACPI Developers
Stuart,
not that I expect it to make a difference, but note that you can update
to the latest ACPI interpreter by applying the ACPI patch here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/release/
I do have a patch there for 2.4.25.
Thanks for all the info -- I'm sure that Bob will be able to use it to
find out why the interpreter erroneously went recursive.
thanks,
-Len
On Wed, 2004-03-10 at 12:08, Stuart_Hayes-DYMqY+WieiM@public.gmane.org wrote:
> Here is the console output that occurred right before the lowest
> stack pointer occurred (the point where the system output the list
> of ACPI functions that I sent earlier). You can see from this
> which methods were being executed, and where the code was in
> evaluating ...VPR0.D0F0._STA.
>
> Also, I should probably mention that this is a 2.4 kernel. It's
> RHEL3, which is based on 2.4.21, but I've tested this with the
> latest 2.4 kernel (2.4.25), and the results were the same.
>
> Thanks
> Stuart
>
>
> in acpi_ns_init_one_device for D0F0 (ptr=0000010001ddacf0)
> . in acpi_ns_evaluate_relative to execute D0F0, method _STA
> ...internalized path=_STA
> ...looked up node, pointer=0000010001dd91f0
> (in apci_ns_execute_control_method _STA)
> (method aml: a4 4d 53 54 41 56 44 49 44 )
> in acpi_ns_evaluate_relative to execute D0F0, method _ADR
> ...internalized path=_ADR
> ...looked up node, pointer=0000010001ddad70
> (in apci_ns_execute_control_method _ADR)
> (method aml: a4 4d 41 44 52 a 0 )
> in acpi_ns_evaluate_relative to execute VPR0, method _ADR
> ...internalized path=_ADR
> ...looked up node, pointer=0000010001ddbaf0
> (in apci_ns_execute_control_method _ADR)
> (method aml: 70 c 0 0 6 0 60 70 5c 2f 4 5f 53 42 5f 50 43 49 30 49 53 41
> 5f 43 50 54 50 61 a0 c 93 61 a 1 70 c 0 0 a 0 60 70 60 44 45 56 4e 70 60 5b
> 31 a4 60 )
> [ACPI Debug] Integer: 0000000000060000
> ...exiting acpi_ns_evaluate_relative for obj=_ADR
> in acpi_ns_evaluate_relative to execute PCI0, method _SEG
> ...internalized path=_SEG
> ...looked up node, pointer=0000000000000000
> [PCI0._SEG] was not found
> in acpi_ns_evaluate_relative to execute PCI0, method _BBN
> ...internalized path=_BBN
> ...looked up node, pointer=0000010001de64f0
> ...exiting acpi_ns_evaluate_relative for obj=_BBN
> in acpi_ns_evaluate_relative to execute VPR0, method _ADR
> ...internalized path=_ADR
> ...looked up node, pointer=0000010001ddbaf0
> (in apci_ns_execute_control_method _ADR)
> (method aml: 70 c 0 0 6 0 60 70 5c 2f 4 5f 53 42 5f 50 43 49 30 49 53 41
> 5f 43 50 54 50 61 a0 c 93 61 a 1 70 c 0 0 a 0 60 70 60 44 45 56 4e 70 60 5b
> 31 a4 60
>
> --(function list I sent earlier was here)--
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: stack overflow
[not found] ` <20040310133208.GC12272-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
@ 2004-03-10 18:44 ` Len Brown
0 siblings, 0 replies; 35+ messages in thread
From: Len Brown @ 2004-03-10 18:44 UTC (permalink / raw)
To: Andi Kleen; +Cc: Stuart_Hayes-DYMqY+WieiM, ACPI Developers, Robert Moore
On Wed, 2004-03-10 at 08:32, Andi Kleen wrote:
> If you have some recursion, either fix it or at least add an error
> out when the stack gets too low. We can add a "stack_left" function
> exported by the architecture.
I think we'll want to put some run-time sanity checks for illegal
recursion into the interpreter. That should be less invasive than doing
the full-blown stack check -- which can be enabled as a separate DEBUG
test when needed.
> > Sorting the list of stack frame sizes below shows
> > acpi_evaluate_integer() is the winner with 320 bytes on the stack. Note
> > that this isn't from passing structures, but from allocating local
> > structures. On i386 acpi_parse_object is 124 bytes, on x86_64 it will
> > be bigger...
>
> I would suggest fixing anything > 100 bytes at least
> (and double-checking anything that could be expanded on 64bit)
Agreed, Bob and I will fix the big stack users.
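A rough sketch of the usual fix for a large stack frame: move the big local structure off the stack. This is user-space C in which `malloc()` stands in for `kmalloc()`; `struct big_scratch`, `sum_on_stack` and `sum_on_heap` are hypothetical names, and the 320-byte size is only chosen to match the figure quoted above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical 320-byte scratch structure. */
struct big_scratch {
    unsigned char bytes[320];
};

/* Before: the whole structure lives in this function's stack frame. */
static int sum_on_stack(unsigned char fill)
{
    struct big_scratch s;
    int total = 0;
    size_t i;

    memset(s.bytes, fill, sizeof(s.bytes));
    for (i = 0; i < sizeof(s.bytes); i++)
        total += s.bytes[i];
    return total;
}

/* After: only a pointer is on the stack; the data is heap-allocated. */
static int sum_on_heap(unsigned char fill)
{
    struct big_scratch *s = malloc(sizeof(*s));
    int total = 0;
    size_t i;

    if (s == NULL)
        return -1;      /* allocation can fail; the caller must cope */
    memset(s->bytes, fill, sizeof(s->bytes));
    for (i = 0; i < sizeof(s->bytes); i++)
        total += s->bytes[i];
    free(s);
    return total;
}
```

The trade-off is an extra failure path and an allocation per call, in exchange for a stack frame that no longer grows with the size of the structure.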
thanks,
-Len
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: stack overflow
@ 2004-03-22 17:40 Stuart_Hayes-DYMqY+WieiM
0 siblings, 0 replies; 35+ messages in thread
From: Stuart_Hayes-DYMqY+WieiM @ 2004-03-22 17:40 UTC (permalink / raw)
To: acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Cc: robert.moore-ral2JQCrhuEAvxtiuMwx3w
Len Brown wrote:
> On Wed, 2004-03-10 at 08:32, Andi Kleen wrote:
>
>> If you have some recursion, either fix it or at least add an error
>> out when the stack gets too low. We can add a "stack_left" function
>> exported by the architecture.
>
> I think we'll want to put some run-time sanity checks for illegal
> recursion into the interpreter. That should be less invasive than
> doing the full-blown stack check, which can be enabled as a
> separate DEBUG test when needed.
>
>>> Sorting the list of stack frame sizes below shows
>>> acpi_evaluate_integer() is the winner with 320 bytes on the stack.
>>> Note that this isn't from passing structures, but from allocating
>>> local structures. On i386 acpi_parse_object is 124 bytes, on
>>> x86_64 it will be bigger...
>>
>> I would suggest fixing anything > 100 bytes at least
>> (and double-checking anything that could be expanded on 64bit)
>
> Agreed, Bob and I will fix the big stack users.
>
> thanks,
> -Len
Thanks for all the help. Robert Moore identified a big part of the
problem (recursion):
(quoting Robert)
"The issue is how PCI_Config operation regions are initialized.
1) a _STA accesses a PCI_Config space field
2) This eventually causes transfer to ev_pci_config_region_setup
3) ev_pci_config_region_setup attempts to resolve the _ADR object
4) The implementation of _ADR in the ASL accesses PCI_Config space
5) This in turn causes another call to ev_pci_config_region_setup"
(end of quote)
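The five steps above can be modelled with a small re-entrancy guard. This is only an illustration of the call cycle, not the way ACPI CA handles it: `struct region`, `region_setup` and `evaluate_adr` are hypothetical stand-ins for the region object, ev_pci_config_region_setup and the _ADR evaluation.

```c
#include <stdbool.h>

/* Hypothetical model of a PCI_Config operation region. */
struct region {
    bool setup_in_progress;  /* set while setup is on the call stack */
    bool setup_done;         /* set once setup has completed */
    int  setup_calls;        /* how many times setup was entered */
};

static int region_setup(struct region *r);

/*
 * Stand-in for evaluating _ADR when its ASL reads PCI_Config space
 * (step 4), which leads back into region setup (step 5).
 */
static int evaluate_adr(struct region *r)
{
    return region_setup(r);
}

/* Stand-in for steps 2 and 3; returns 0 on success, -1 on recursion. */
static int region_setup(struct region *r)
{
    int rc;

    r->setup_calls++;
    if (r->setup_done)
        return 0;
    if (r->setup_in_progress)
        return -1;           /* re-entered before finishing: bail out */

    r->setup_in_progress = true;
    rc = evaluate_adr(r);
    r->setup_in_progress = false;

    if (rc == 0)
        r->setup_done = true;
    return rc;
}
```

In this model the second entry into `region_setup` is rejected, so the cycle stops after two calls rather than continuing until the stack is exhausted.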
Turning off ACPI debug messages seems to fix my problem for now.
With debug messages enabled, the stack usage seems to be roughly 64
extra bytes for each function called (presumably from the extra local
variables defined in some of the debug macros in acmacros.h). I have
modified my DSDT so that only one layer of recursion happens, and,
with this, the *difference* in kernel stack usage between having ACPI
debug enabled and disabled is 2776 bytes. Even with this better DSDT,
the kernel stack overflows by 616 bytes (I used a larger kernel stack
to figure that out...) if I have debug messages enabled. FYI, the
kernel stack on my system is 5408 bytes total (8K - size of the
task_struct).
With debug messages disabled, I have a good 2K of margin on the stack
with my modified DSDT.
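For reference, the figures above are consistent with each other. The sketch below just redoes the arithmetic in C; it assumes "8K" means 8192 bytes, that the 616-byte overflow and the 2776-byte difference were measured on the same code path, and (for the frame estimate only) that the whole difference comes from the roughly 64 bytes per call. All names are made up for this example.

```c
/* Figures reported in the message above, in bytes. */
enum {
    STACK_AREA     = 8192, /* assumed meaning of "8K" */
    STACK_TOTAL    = 5408, /* usable kernel stack */
    OVERFLOW_DEBUG = 616,  /* overflow with debug messages enabled */
    DEBUG_DELTA    = 2776, /* debug-enabled minus debug-disabled usage */
    PER_CALL_DEBUG = 64    /* rough extra bytes per function call */
};

/* Implied size of the task_struct sharing the 8K area. */
static int task_struct_size(void)
{
    return STACK_AREA - STACK_TOTAL;
}

/* Peak stack usage with debug messages enabled. */
static int peak_with_debug(void)
{
    return STACK_TOTAL + OVERFLOW_DEBUG;
}

/* Peak stack usage with debug messages disabled. */
static int peak_without_debug(void)
{
    return peak_with_debug() - DEBUG_DELTA;
}

/* Headroom left with debug messages disabled. */
static int margin_without_debug(void)
{
    return STACK_TOTAL - peak_without_debug();
}

/* Very rough count of nested calls carrying the debug overhead. */
static int approx_debug_frames(void)
{
    return DEBUG_DELTA / PER_CALL_DEBUG;
}
```

That gives a peak of 6024 bytes with debug enabled and 3248 bytes without, leaving 2160 bytes free, which matches the "good 2K of margin" figure; the per-call estimate suggests a call chain on the order of 43 frames deep.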
Thanks!
Stuart
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2004-03-22 17:40 UTC | newest]
Thread overview: 35+ messages
2003-09-12 17:53 stack overflow Breno
2003-09-12 22:50 ` Andreas Dilger
2003-09-12 19:14 ` Breno
2003-09-12 23:06 ` William Lee Irwin III
2003-09-12 19:23 ` Breno
2003-09-12 23:18 ` Alan Cox
2003-09-12 23:25 ` William Lee Irwin III
2003-09-12 23:18 ` stack overflow - kernel thread Breno
-- strict thread matches above, loose matches on Subject: below --
2004-03-22 17:40 stack overflow Stuart_Hayes-DYMqY+WieiM
2004-03-10 17:08 Stuart_Hayes-DYMqY+WieiM
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F020E5FD9-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
2004-03-10 18:40 ` Len Brown
2004-03-10 15:56 Stuart_Hayes-DYMqY+WieiM
2004-03-09 22:48 Moore, Robert
2004-03-09 21:04 Stuart_Hayes-DYMqY+WieiM
2004-03-09 20:00 Stuart_Hayes-DYMqY+WieiM
2004-03-09 18:34 Moore, Robert
2004-03-08 16:43 Stuart_Hayes-DYMqY+WieiM
[not found] ` <CE41BFEF2481C246A8DE0D2B4DBACF4F128AA4-novRXWwkcpil7xnNSM18fRtLTTO9Z+wMojBamW5iJbs@public.gmane.org>
2004-03-08 18:26 ` Andi Kleen
[not found] ` <20040308182630.GB9490-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2004-03-09 7:17 ` Len Brown
2004-03-10 4:33 ` Len Brown
[not found] ` <1078893223.2346.585.camel-D2Zvc0uNKG8@public.gmane.org>
2004-03-10 13:32 ` Andi Kleen
[not found] ` <20040310133208.GC12272-B4tOwbsTzaBolqkO4TVVkw@public.gmane.org>
2004-03-10 18:44 ` Len Brown
2003-01-24 7:08 Stack overflow Madhavi
2003-01-24 7:53 ` Linux Geek
2003-01-24 15:32 ` GrandMasterLee
2003-01-24 15:41 ` Madhavi
2003-01-24 16:42 ` Richard B. Johnson
2003-01-24 16:52 ` Gianni Tedesco
2000-09-06 13:25 stack overflow Zeshan Ahmad
2000-09-05 19:03 Zeshan Ahmad
2000-09-06 8:33 ` Mark Hemment
2000-09-04 10:47 Zeshan Ahmad
2000-09-04 11:03 ` Matti Aarnio
2000-09-04 11:23 ` Tigran Aivazian
2000-09-05 10:55 ` Mark Hemment