* Re: [BUG] test9 ACPI bad: scheduling while atomic!
@ 2003-10-27 8:22 Noah J. Misch
[not found] ` <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Noah J. Misch @ 2003-10-27 8:22 UTC (permalink / raw)
To: alex.williamson-VXdhtT5mjnY
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
shaohua.li-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, jon-fdRWMHV75ajk1uMJSBkQmQ
Hi,
> On an Omnibook 500 running test9, removing AC power causes an
> immediate hang. This laptop is getting a little old and I have to force
> on ACPI support, but this did not happen with test8. The bug and panic
I have this problem as well, on a Sony Vaio, model PCG-571L.
> are shown below. It looks like the AML associated with the AC event is
> trying to do an AML_SLEEP_OP. Since this is called while in the
> interrupt handler, and the eventual call to acpi_os_sleep() sets the
> current state to interruptible... boom. One simple, but terribly ugly,
> workaround is to make acpi_os_sleep() call acpi_os_stall() if
> in_atomic() is true (patch below). Hopefully there's a better way to
> fix this. Somehow the interpreter really needs to drop interrupt
> context before it starts making calls like this. Thanks,
This problem stems from the changes in revision 1.26 of drivers/acpi/ec.c.
They come from a patch Shaohua Li submitted for kernel bug 1171 at
bugme.osdl.org. That patch can cause acpi_ec_gpe_query to run in interrupt
context, whereas before it always ran from a workqueue. That function does
things that are unsafe in interrupt context, like sleeping and kmalloc'ing with
GFP_KERNEL.
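[Editorial sketch, not part of the original message.] The distinction drawn above, a handler that only schedules the query versus one that runs it directly, can be modelled in plain userspace C. Everything here is hypothetical: `work_queue`, `queue_work_item`, `run_pending`, `gpe_handler_model` and `ec_query_model` are illustrative names, not kernel APIs, and the `in_interrupt_ctx` flag merely stands in for "we are inside the interrupt handler".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MAX_PENDING 8

typedef void (*work_fn)(void *ctx);

struct work_item { work_fn fn; void *ctx; };

struct work_queue {
    struct work_item items[MAX_PENDING];
    size_t head, count;
};

/* Models "we are currently inside the interrupt handler". */
static bool in_interrupt_ctx;

/* Safe to call from the handler: it only records the work and never blocks. */
static bool queue_work_item(struct work_queue *q, work_fn fn, void *ctx)
{
    if (q->count == MAX_PENDING)
        return false;
    size_t tail = (q->head + q->count) % MAX_PENDING;
    q->items[tail].fn = fn;
    q->items[tail].ctx = ctx;
    q->count++;
    return true;
}

/* Drains the queue from "process context"; returns how many items ran. */
static size_t run_pending(struct work_queue *q)
{
    size_t ran = 0;
    assert(!in_interrupt_ctx); /* deferred work must not run in the handler */
    while (q->count > 0) {
        struct work_item w = q->items[q->head];
        q->head = (q->head + 1) % MAX_PENDING;
        q->count--;
        w.fn(w.ctx);
        ran++;
    }
    return ran;
}

/* Stand-in for a query routine that may sleep: it logs the context it saw. */
struct query_log { int runs; int ran_atomic; };

static void ec_query_model(void *ctx)
{
    struct query_log *log = ctx;
    log->runs++;
    if (in_interrupt_ctx)
        log->ran_atomic++;
}

/* Models the GPE handler: it schedules the query instead of calling it. */
static bool gpe_handler_model(struct work_queue *q, struct query_log *log)
{
    in_interrupt_ctx = true;
    bool queued = queue_work_item(q, ec_query_model, log);
    in_interrupt_ctx = false;
    return queued;
}
```

In this model the query never observes `in_interrupt_ctx` set, because the handler returns before `run_pending` is called. Calling `ec_query_model` directly from the handler would instead count as an atomic-context run.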
This was obvious on my system because it has no ECDT table, and as such
acpi_ec_gpe_query was _always_ running in interrupt context, whereas with an
ECDT it would only do so for a brief time during boot, and the problem would be
much more subtle. That's probably why nobody noticed this in earlier tests.
I reversed cset 1.1337.43.3 as follows, and that fixed the problem:
bk export -tpatch -r1.1337.43.3 | patch -p1 -R
I can't figure out why that patch fixed the oops in bug 1171. It was a hook
into the ec address space handler, not the gpe handler, that led to the oops,
yet the patch seems to only modify gpe-related code. Perhaps you could explain,
Shaohua?
I'd guess the T40 oops results from the ACPI_MEM_FREE on line 305 of
drivers/acpi/events/evregion.c freeing already-freed memory. I'm actually not
sure why that free is even there. I also can't figure out why only SMP-configured
kernels exhibited the problem. If someone has the problem hardware, I am
willing to debug it, however.
The errant patch does address what seems to be a race condition that could play
out as follows:
1) The early ECDT probe locates an ECDT and registers a handler for the relevant
GPE and address space.
2) An IRQ triggers acpi_ec_gpe_handler, which schedules acpi_ec_gpe_query.
3) ACPI scans for devices and adds the "real" embedded controller device,
freeing the (temporary) context of the old GPE query handler.
4) The workqueue runs acpi_ec_gpe_query with a context that has already been
kfree'd, causing it to fail.
It seems rather theoretical, but perhaps we could fix it with a patch like the
following. I tested it for kicks and didn't hit any problems, but I'm afraid it
risks more problems than it solves. Thoughts?
--- 1.27/drivers/acpi/ec.c Mon Oct 27 03:50:57 2003
+++ edited/drivers/acpi/ec.c Mon Oct 27 03:51:57 2003
@@ -28,6 +28,7 @@
#include <linux/init.h>
#include <linux/types.h>
#include <linux/delay.h>
+#include <linux/workqueue.h>
#include <linux/proc_fs.h>
#include <asm/io.h>
#include <acpi/acpi_bus.h>
@@ -593,6 +594,10 @@
ACPI_ADR_SPACE_EC, &acpi_ec_space_handler);
acpi_remove_gpe_handler(NULL, ec_ecdt->gpe_bit, &acpi_ec_gpe_handler);
+
+ /* Clear any pending GPE queries before freeing the context for
+ their handlers */
+ flush_scheduled_work();
kfree(ec_ecdt);
}
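[Editorial sketch, not part of the original message.] The ordering the patch aims for, draining any queued queries before freeing the context they point at, can be illustrated with a single-threaded userspace model. All names (`ec_ctx`, `schedule_query`, `flush_pending`, `teardown_ctx`) are hypothetical, and `flush_pending` only approximates what flush_scheduled_work() does in the kernel, where the work runs on a separate thread and the caller waits for it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of "flush pending work, then free". */
struct ec_ctx { int gpe_bit; int queries_handled; };

struct pending { void (*fn)(struct ec_ctx *); struct ec_ctx *ctx; };

#define MAX_PENDING 4
static struct pending pending_work[MAX_PENDING];
static int pending_count;

/* Stand-in for the deferred query; it dereferences its context. */
static void query_model(struct ec_ctx *ctx)
{
    ctx->queries_handled++;
}

/* Queue a query against ctx; returns 0 on success, -1 if the queue is full. */
static int schedule_query(struct ec_ctx *ctx)
{
    if (pending_count == MAX_PENDING)
        return -1;
    pending_work[pending_count].fn = query_model;
    pending_work[pending_count].ctx = ctx;
    pending_count++;
    return 0;
}

/* Run every queued item now; returns how many ran. */
static int flush_pending(void)
{
    int ran = 0;
    for (int i = 0; i < pending_count; i++) {
        pending_work[i].fn(pending_work[i].ctx);
        ran++;
    }
    pending_count = 0;
    return ran;
}

/*
 * Teardown in the order the patch proposes: drain first, free second, so no
 * queued item can run later with a dangling context pointer.
 */
static int teardown_ctx(struct ec_ctx *ctx)
{
    int flushed = flush_pending();
    assert(pending_count == 0);
    free(ctx);
    return flushed;
}
```

Swapping the two steps in `teardown_ctx` would leave a queued item holding a freed pointer, which is step 4 of the race described earlier in the message.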
[parent not found: <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>]
* Re: [BUG] test9 ACPI bad: scheduling while atomic!
[not found] ` <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>
@ 2003-10-27 16:47 ` Alex Williamson
[not found] ` <1067273229.7497.30.camel-Wmjt7DDUnIVxnVILBQAtiA@public.gmane.org>
2003-10-27 20:24 ` [ACPI] " Nate Lawson
1 sibling, 1 reply; 4+ messages in thread
From: Alex Williamson @ 2003-10-27 16:47 UTC (permalink / raw)
To: Noah J. Misch
Cc: linux-kernel, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
shaohua.li-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, jon-fdRWMHV75ajk1uMJSBkQmQ
On Mon, 2003-10-27 at 01:22, Noah J. Misch wrote:
> This problem stems from the changes in revision 1.26 of drivers/acpi/ec.c.
> They come from a patch Shaohua Li submitted for kernel bug 1171 at
> bugme.osdl.org. That patch can cause acpi_ec_gpe_query to run in interrupt
> context, whereas before it always ran from a workqueue. It does non-interrupt
> like things, like sleeping and kmalloc'ing with GFP_KERNEL.
>
> This was obvious on my system because it has no ECDT table, and as such
> acpi_ec_gpe_query was _always_ running in interrupt context, whereas with an
> ECDT it would only do so for a brief time during boot, and the problem would be
> much more subtle. That's probably why nobody noticed this in earlier tests.
>
I don't have an ECDT either. Is it possible that the setting of
ec_device_init = 1 is simply misplaced? I can see why we wouldn't want
to call acpi_os_queue_for_execution() early in bootup, but there ought
to be a fixed point after which it's ok, regardless of whether the
system has the ECDT table. Would it be sufficient to set ec_device_init
to 1 at the beginning of acpi_ec_add(), with no dependency on the ECDT
table?
Alex
--
Alex Williamson HP Linux & Open Source Lab
[parent not found: <1067273229.7497.30.camel-Wmjt7DDUnIVxnVILBQAtiA@public.gmane.org>]
* Re: [BUG] test9 ACPI bad: scheduling while atomic!
[not found] ` <1067273229.7497.30.camel-Wmjt7DDUnIVxnVILBQAtiA@public.gmane.org>
@ 2003-10-27 18:02 ` Noah J. Misch
0 siblings, 0 replies; 4+ messages in thread
From: Noah J. Misch @ 2003-10-27 18:02 UTC (permalink / raw)
To: Alex Williamson
Cc: linux-kernel, acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
shaohua.li-ral2JQCrhuEAvxtiuMwx3w,
len.brown-ral2JQCrhuEAvxtiuMwx3w, jon-fdRWMHV75ajk1uMJSBkQmQ
On Mon, 27 Oct 2003, Alex Williamson wrote:
> > This was obvious on my system because it has no ECDT table, and as such
> > acpi_ec_gpe_query was _always_ running in interrupt context, whereas with an
> > ECDT it would only do so for a brief time during boot, and the problem would be
> > much more subtle. That's probably why nobody noticed this in earlier tests.
> >
>
> I don't have an ECDT either. Is it possible that the setting of
> ec_device_init = 1 is simply misplaced?
It is misplaced. If revision 1.26 of ec.c were otherwise sound, I would place
ec_device_init = 1 right before the call to acpi_install_gpe_handler in
acpi_ec_start. Anywhere outside that if and between where _add removes the
handlers and _start installs them would work. This would fix your crash, but
it's not the right fix.
> I can see why we wouldn't want to call acpi_os_queue_for_execution() early in
> bootup, but there ought to be a fixed point after which it's ok, regardless of
> whether the system has the ECDT table.
I don't think early calls to schedule_work (via acpi_os_queue_for_execution) are
a problem. The call to init_workqueues is just before do_initcalls in
do_basic_setup, so it happens earlier than all this stuff.
The more general problem is that acpi_ec_gpe_query cannot run in an interrupt
handler as written. It used to always run from a queue. We can either fix it
so it can run from an interrupt handler or change it back to never doing so.
I favor the latter, especially because I don't see how the recent change fixed
the problem T40 users were experiencing.
> Would it be sufficient to set ec_device_init to 1 at the beginning of
> acpi_ec_add(), with no dependency on the ECDT table?
That particular placement looks racy. I would do it after removing the
handlers, as explained above.
Thanks,
Noah
* Re: [ACPI] Re: [BUG] test9 ACPI bad: scheduling while atomic!
[not found] ` <Pine.GSO.4.58.0310262327040.19469-8Uwgm7wxmYs@public.gmane.org>
2003-10-27 16:47 ` Alex Williamson
@ 2003-10-27 20:24 ` Nate Lawson
1 sibling, 0 replies; 4+ messages in thread
From: Nate Lawson @ 2003-10-27 20:24 UTC (permalink / raw)
To: Noah J. Misch
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
acpi-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
On Mon, 27 Oct 2003, Noah J. Misch wrote:
> > are shown below. It looks like the AML associated with the AC event is
> > trying to do an AML_SLEEP_OP. Since this is called while in the
> > interrupt handler, and the eventual call to acpi_os_sleep() sets the
> > current state to interruptible... boom. One simple, but terribly ugly,
> > workaround is to make acpi_os_sleep() call acpi_os_stall() if
> > in_atomic() is true (patch below). Hopefully there's a better way to
> > fix this. Somehow the interpreter really needs to drop interrupt
> > context before it starts making calls like this. Thanks,
I thought a change was committed to address this, calling Stall for up to
255 us and Sleep for more than that.
-Nate
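[Editorial sketch, not part of the original message.] One reading of the rule Nate recalls is a simple threshold: busy-wait (Stall) for delays of up to 255 us, and block (Sleep) for anything longer. The userspace C below models only that decision. `plan_delay`, `delay_plan` and `STALL_LIMIT_US` are hypothetical names, and treating the 255 us limit as inclusive and rounding the sleep up to whole milliseconds are both assumptions. It is not the code of any committed change.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold quoted in the thread; treating it as inclusive is an assumption. */
#define STALL_LIMIT_US 255u

enum delay_kind { DELAY_STALL, DELAY_SLEEP };

struct delay_plan {
    enum delay_kind kind;
    uint32_t amount; /* microseconds for a stall, milliseconds for a sleep */
};

/* Decide how to wait for `usec` microseconds under the threshold rule. */
static struct delay_plan plan_delay(uint32_t usec)
{
    struct delay_plan p;

    if (usec <= STALL_LIMIT_US) {
        /* Short delay: busy-wait, which is legal in atomic context. */
        p.kind = DELAY_STALL;
        p.amount = usec;
    } else {
        /* Long delay: sleep, rounding up to whole milliseconds (assumed). */
        p.kind = DELAY_SLEEP;
        p.amount = usec / 1000u + (usec % 1000u != 0u);
        assert(p.amount >= 1u);
    }
    return p;
}
```

Note that a threshold alone does not make a long delay safe in interrupt context. It only keeps short delays from sleeping, so the deferral question discussed earlier in the thread still applies to the Sleep branch.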