* Re: [BUG] test9 ACPI bad: scheduling while atomic!
@ 2003-10-27 8:22 Noah J. Misch
[not found] ` <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Noah J. Misch @ 2003-10-27 8:22 UTC (permalink / raw)
To: alex.williamson-VXdhtT5mjnY
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
shaohua.li-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, jon-fdRWMHV75ajk1uMJSBkQmQ
Hi,
> On an Omnibook 500 running test9, removing AC power causes an
> immediate hang. This laptop is getting a little old and I have to force
> on ACPI support, but this did not happen with test8. The bug and panic
I have this problem as well, on a Sony Vaio, model PCG-571L.
> are shown below. It looks like the AML associated with the AC event is
> trying to do an AML_SLEEP_OP. Since this is called while in the
> interrupt handler, and the eventual call to acpi_os_sleep() sets the
> current state to interruptible... boom. One simple, but terribly ugly,
> workaround is to make acpi_os_sleep() call acpi_os_stall() if
> in_atomic() is true (patch below). Hopefully there's a better way to
> fix this. Somehow the interpreter really needs to drop interrupt
> context before it starts making calls like this. Thanks,
This problem stems from the changes in revision 1.26 of drivers/acpi/ec.c.
They come from a patch Shaohua Li submitted for kernel bug 1171 at
bugme.osdl.org. That patch can cause acpi_ec_gpe_query to run in interrupt
context, whereas before it always ran from a workqueue. The function does things
that are not allowed in interrupt context, like sleeping and kmalloc'ing with
GFP_KERNEL.
This was obvious on my system because it has no ECDT table, and as such
acpi_ec_gpe_query was _always_ running in interrupt context, whereas with an
ECDT it would only do so for a brief time during boot, and the problem would be
much more subtle. That's probably why nobody noticed this in earlier tests.
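
The difference between the two dispatch paths can be sketched in user space. This is only an illustration, not the code in ec.c: the names in_interrupt_ctx, ec_query, queue_work, run_queued_work and gpe_handler are hypothetical stand-ins, and the "interrupt context" is modeled with a plain flag. It shows a query function that is safe when deferred to a queue and unsafe when the handler calls it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "we are inside an interrupt handler". */
static bool in_interrupt_ctx = false;

static int queries_run = 0;        /* how many times the query body ran */
static int atomic_violations = 0;  /* how many of those ran in interrupt context */

/* Stand-in for a query routine that may sleep or allocate with a
 * blocking allocator, so it must not run while in_interrupt_ctx is set. */
static void ec_query(void *ctx)
{
    (void)ctx;
    if (in_interrupt_ctx)
        atomic_violations++;   /* models "scheduling while atomic" */
    queries_run++;
}

/* Tiny fixed-capacity work queue (assumption: enough for a sketch). */
#define MAX_WORK 8
struct work { void (*fn)(void *); void *ctx; };
static struct work queue[MAX_WORK];
static size_t queued = 0;

static bool queue_work(void (*fn)(void *), void *ctx)
{
    if (queued == MAX_WORK)
        return false;
    queue[queued].fn = fn;
    queue[queued].ctx = ctx;
    queued++;
    return true;
}

/* Drains the queue; models the worker that runs in process context. */
static void run_queued_work(void)
{
    assert(!in_interrupt_ctx);
    for (size_t i = 0; i < queued; i++)
        queue[i].fn(queue[i].ctx);
    queued = 0;
}

/* Models the GPE handler. call_directly == false is the old behaviour
 * (always defer); call_directly == true is the problematic path. */
static void gpe_handler(void *ctx, bool call_directly)
{
    in_interrupt_ctx = true;
    if (call_directly)
        ec_query(ctx);
    else
        (void)queue_work(ec_query, ctx);
    in_interrupt_ctx = false;
}
```

Calling gpe_handler(NULL, false) followed by run_queued_work() runs the query once with no violation; calling gpe_handler(NULL, true) runs it immediately and records one.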
I reversed cset 1.1337.43.3 as follows, and that fixed the problem:
bk export -tpatch -r1.1337.43.3 | patch -p1 -R
I can't figure out why that patch fixed the oops in bug 1171. It was a hook
into the ec address space handler, not the gpe handler, that led to the oops,
yet the patch seems to only modify gpe-related code. Perhaps you could explain,
Shaohua?
I'd guess the T40 oops results from the ACPI_MEM_FREE on line 305 of
drivers/acpi/events/evregion.c freeing already-freed memory. I'm actually not
sure why that free is even there. I also can't figure out why only
SMP-configured kernels exhibited the problem. If someone has the problem
hardware, I am willing to debug it, however.
The errant patch does address what seems to be a race condition that could play
out as follows:
1) The early ECDT probe locates an ECDT and registers a handler for the relevant
   GPE and address space.
2) An IRQ triggers acpi_ec_gpe_handler, which schedules acpi_ec_gpe_query.
3) ACPI scans for devices and adds the "real" embedded controller device,
   freeing the (temporary) context of the old GPE query handler.
4) The workqueue runs acpi_ec_gpe_query with a context that has already been
   kfree'd, causing it to fail.
It seems rather theoretical, but perhaps we could fix it with a patch like the
following. I tested it for kicks and didn't hit any problems, but I'm afraid it
risks more problems than it solves. Thoughts?
--- 1.27/drivers/acpi/ec.c Mon Oct 27 03:50:57 2003
+++ edited/drivers/acpi/ec.c Mon Oct 27 03:51:57 2003
@@ -28,6 +28,7 @@
 #include <linux/init.h>
 #include <linux/types.h>
 #include <linux/delay.h>
+#include <linux/workqueue.h>
 #include <linux/proc_fs.h>
 #include <asm/io.h>
 #include <acpi/acpi_bus.h>
@@ -593,6 +594,10 @@
 		ACPI_ADR_SPACE_EC, &acpi_ec_space_handler);
 	acpi_remove_gpe_handler(NULL, ec_ecdt->gpe_bit, &acpi_ec_gpe_handler);
+
+	/* Clear any pending GPE queries before freeing the context for
+	   their handlers */
+	flush_scheduled_work();
 	kfree(ec_ecdt);
 }
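
The race in steps 1-4 and the effect of flushing before the free can be sketched in user space. This is a simplified model under stated assumptions, not kernel code: struct ec_ctx, ec_query, flush_pending, remove_early_ec and simulate are hypothetical names, and the kfree is modeled by setting a flag so the sketch never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context; "freed" models kfree() without real deallocation. */
struct ec_ctx {
    bool freed;
    int handled;
};

static int stale_runs = 0;   /* queries that ran against a freed context */

/* Stand-in for the deferred query routine. */
static void ec_query(void *data)
{
    struct ec_ctx *ctx = data;
    if (ctx->freed) {
        stale_runs++;        /* in the kernel this would be a use-after-free */
        return;
    }
    ctx->handled++;
}

/* Tiny fixed-capacity pending-work list. */
#define MAX_WORK 8
struct work { void (*fn)(void *); void *data; };
static struct work pending[MAX_WORK];
static size_t npending = 0;

static void queue_work(void (*fn)(void *), void *data)
{
    assert(npending < MAX_WORK);
    pending[npending].fn = fn;
    pending[npending].data = data;
    npending++;
}

/* Runs everything queued so far; plays the role of flushing the queue. */
static void flush_pending(void)
{
    for (size_t i = 0; i < npending; i++)
        pending[i].fn(pending[i].data);
    npending = 0;
}

/* Step 3: tear down the early context, optionally flushing first. */
static void remove_early_ec(struct ec_ctx *ctx, bool flush_first)
{
    if (flush_first)
        flush_pending();
    ctx->freed = true;       /* models kfree(ec_ecdt) */
}

/* Plays out steps 2-4 and returns how many stale queries ran. */
static int simulate(bool flush_first)
{
    struct ec_ctx early = { false, 0 };

    stale_runs = 0;
    npending = 0;
    queue_work(ec_query, &early);          /* step 2: IRQ schedules a query */
    remove_early_ec(&early, flush_first);  /* step 3: context is released */
    flush_pending();                       /* step 4: queue finally runs */
    return stale_runs;
}
```

In this model simulate(false) reports one stale run and simulate(true) reports none, which is the ordering the patch above relies on.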
* Re: [BUG] test9 ACPI bad: scheduling while atomic!
[not found] ` <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>
@ 2003-10-27 16:47 ` Alex Williamson
[not found] ` <1067273229.7497.30.camel-Wmjt7DDUnIVxnVILBQAtiA@public.gmane.org>
2003-10-27 20:24 ` [ACPI] " Nate Lawson
1 sibling, 1 reply; 4+ messages in thread
From: Alex Williamson @ 2003-10-27 16:47 UTC (permalink / raw)
To: Noah J. Misch
Cc: linux-kernel, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
shaohua.li-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, jon-fdRWMHV75ajk1uMJSBkQmQ
On Mon, 2003-10-27 at 01:22, Noah J. Misch wrote:
> This problem stems from the changes in revision 1.26 of drivers/acpi/ec.c.
> They come from a patch Shaohua Li submitted for kernel bug 1171 at
> bugme.osdl.org. That patch can cause acpi_ec_gpe_query to run in interrupt
> context, whereas before it always ran from a workqueue. The function does things
> that are not allowed in interrupt context, like sleeping and kmalloc'ing with
> GFP_KERNEL.
>
> This was obvious on my system because it has no ECDT table, and as such
> acpi_ec_gpe_query was _always_ running in interrupt context, whereas with an
> ECDT it would only do so for a brief time during boot, and the problem would be
> much more subtle. That's probably why nobody noticed this in earlier tests.
>
I don't have an ECDT either. Is it possible that the setting of
ec_device_init = 1 is simply misplaced? I can see why we wouldn't want
to call acpi_os_queue_for_execution() early in bootup, but there ought
to be a fixed point after which it's ok, regardless of whether the
system has the ECDT table. Would it be sufficient to set ec_device_init
to 1 at the beginning of acpi_ec_add(), with no dependency on the ECDT
table?
Alex
--
Alex Williamson HP Linux & Open Source Lab
* Re: [BUG] test9 ACPI bad: scheduling while atomic!
[not found] ` <1067273229.7497.30.camel-Wmjt7DDUnIVxnVILBQAtiA@public.gmane.org>
@ 2003-10-27 18:02 ` Noah J. Misch
0 siblings, 0 replies; 4+ messages in thread
From: Noah J. Misch @ 2003-10-27 18:02 UTC (permalink / raw)
To: Alex Williamson
Cc: linux-kernel, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
shaohua.li-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, jon-fdRWMHV75ajk1uMJSBkQmQ
On Mon, 27 Oct 2003, Alex Williamson wrote:
> > This was obvious on my system because it has no ECDT table, and as such
> > acpi_ec_gpe_query was _always_ running in interrupt context, whereas with an
> > ECDT it would only do so for a brief time during boot, and the problem would be
> > much more subtle. That's probably why nobody noticed this in earlier tests.
> >
>
> I don't have an ECDT either. Is it possible that the setting of
> ec_device_init = 1 is simply misplaced?
It is misplaced. If revision 1.26 of ec.c were otherwise sound, I would place
ec_device_init = 1 right before the call to acpi_install_gpe_handler in
acpi_ec_start. Any spot outside that if statement, between the point where _add
removes the handlers and the point where _start installs them, would work. This
would fix your crash, but it's not the right fix.
> I can see why we wouldn't want to call acpi_os_queue_for_execution() early in
> bootup, but there ought to be a fixed point after which it's ok, regardless of
> whether the system has the ECDT table.
I don't think early calls to schedule_work (via acpi_os_queue_for_execution) are
a problem. The call to init_workqueues is just before do_initcalls in
do_basic_setup, so it happens earlier than all this stuff.
The more general problem is that acpi_ec_gpe_query cannot run in an interrupt
handler as written. It used to always run from a queue. We can either fix it
so it can run from an interrupt handler or change it back to never doing so. I
favor the latter, especially because I don't see how the recent change fixed the
problem T40 users were experiencing.
> Would it be sufficient to set ec_device_init to 1 at the beginning of
> acpi_ec_add(), with no dependency on the ECDT table?
That particular placement looks racy. I would do it after removing the
handlers, as explained above.
Thanks,
Noah
* Re: [ACPI] Re: [BUG] test9 ACPI bad: scheduling while atomic!
[not found] ` <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>
2003-10-27 16:47 ` Alex Williamson
@ 2003-10-27 20:24 ` Nate Lawson
1 sibling, 0 replies; 4+ messages in thread
From: Nate Lawson @ 2003-10-27 20:24 UTC (permalink / raw)
To: Noah J. Misch
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
On Mon, 27 Oct 2003, Noah J. Misch wrote:
> > are shown below. It looks like the AML associated with the AC event is
> > trying to do an AML_SLEEP_OP. Since this is called while in the
> > interrupt handler, and the eventual call to acpi_os_sleep() sets the
> > current state to interruptible... boom. One simple, but terribly ugly,
> > workaround is to make acpi_os_sleep() call acpi_os_stall() if
> > in_atomic() is true (patch below). Hopefully there's a better way to
> > fix this. Somehow the interpreter really needs to drop interrupt
> > context before it starts making calls like this. Thanks,
I thought a change was committed to address this, calling Stall for up to
255 us and Sleep for more than that.
-Nate
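
The two delay policies mentioned in this thread can be put side by side in a small sketch. It is an illustration only, not the committed ACPI code: choose_delay, enum delay_kind and STALL_MAX_US are hypothetical names, the 255 us threshold is taken from the message above, and the "stall whenever atomic" branch models the workaround quoted from the original report.

```c
#include <stdbool.h>

/* Which primitive to use for a requested delay. */
enum delay_kind {
    DELAY_STALL,   /* busy-wait; safe in atomic context */
    DELAY_SLEEP    /* give up the CPU; only valid where sleeping is allowed */
};

/* Threshold quoted in the thread: stall for up to 255 us, sleep above it. */
#define STALL_MAX_US 255u

/* Hypothetical policy helper combining both suggestions:
 *  - in atomic context, always stall (the "ugly" workaround);
 *  - otherwise, stall for short delays and sleep for longer ones. */
static enum delay_kind choose_delay(unsigned int microseconds, bool atomic)
{
    if (atomic)
        return DELAY_STALL;
    return microseconds <= STALL_MAX_US ? DELAY_STALL : DELAY_SLEEP;
}
```

Note that the threshold alone does not help when a long delay is requested in atomic context, which is why the sketch keeps the atomic check as a separate branch.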