* Re: [Bugme-new] [Bug 9731] New: 2.6.24-rc7: Deadlock when any ACPI eject sys node written
From: Andrew Morton @ 2008-01-11 20:37 UTC
To: Greg KH, Kay Sievers; +Cc: bugme-daemon, linux-acpi, arai
On Fri, 11 Jan 2008 09:38:25 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9731
>
> Summary: 2.6.24-rc7: Deadlock when any ACPI eject sys node written
> Product: ACPI
> Version: 2.5
> KernelVersion: 2.6.24-rc7
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> AssignedTo: acpi_other@kernel-bugs.osdl.org
> ReportedBy: arai@vmware.com
>
>
> Latest working kernel version: Unknown
> Earliest failing kernel version: All 2.6.24-rc versions I've tried
> Distribution: sles10
> Hardware Environment: x86_64
> Software Environment:
> Problem Description:
> I have "hardware" that supports ejectable CPUs. Any attempt to eject a CPU by
> echoing 1 into the /sys node results in the shell doing the echo deadlocking.
>
> Here's what dmesg says bash is doing:
>
> bash D 0000000000000000 0 3552 3372
> ffff810007023ca8 0000000000000082 0000000000000000 ffff8100014327f0
> 0000000000000000 ffffffff00000000 ffff81000ecde0c0 ffff8100014437c0
> 304a455f0dd521a0 00000000ffffdb37 00000000000000ff ffff81000fe37900
> Call Trace:
> [<ffffffff80447282>] wait_for_completion+0xa2/0xf0
> [<ffffffff80231d50>] default_wake_function+0x0/0x10
> [<ffffffff802e2f6d>] sysfs_addrm_finish+0x1dd/0x250
> [<ffffffff802e17d6>] sysfs_hash_and_remove+0xa6/0xc0
> [<ffffffff8038d37d>] device_remove_file+0x2d/0x60
> [<ffffffff803525c3>] acpi_device_unregister+0xc8/0x124
> [<ffffffff80352778>] acpi_bus_remove+0x5e/0x64
> [<ffffffff803527f8>] acpi_bus_trim+0x7a/0xee
> [<ffffffff803528e8>] acpi_eject_store+0x7c/0x119
> [<ffffffff802e1ef4>] sysfs_write_file+0xd4/0x150
> [<ffffffff80293f7d>] vfs_write+0xdd/0x150
> [<ffffffff80294643>] sys_write+0x53/0x90
> [<ffffffff8020bf1e>] system_call+0x7e/0x83
>
> The problem seems to be that acpi_device_unregister tries to delete the sys
> node for eject, but the node cannot be deleted until the write completes.
>
> sysfs_write_file calls flush_write_buffer, which does this:
>
> static int
> flush_write_buffer(struct dentry * dentry, struct sysfs_buffer * buffer, size_t count)
> {
> 	struct sysfs_dirent *attr_sd = dentry->d_fsdata;
> 	struct kobject *kobj = attr_sd->s_parent->s_elem.dir.kobj;
> 	struct sysfs_ops * ops = buffer->ops;
> 	int rc;
>
> 	/* need attr_sd for attr and ops, its parent for kobj */
> 	if (!sysfs_get_active_two(attr_sd))
> 		return -ENODEV;
>
> 	rc = ops->store(kobj, attr_sd->s_elem.attr.attr, buffer->page, count);
>
> 	sysfs_put_active_two(attr_sd);
>
> 	return rc;
> }
>
> sysfs_addrm_finish calls sysfs_deactivate, which is stuck waiting forever on
> the wait_for_completion call:
>
> /**
>  * sysfs_deactivate - deactivate sysfs_dirent
>  * @sd: sysfs_dirent to deactivate
>  *
>  * Deny new active references and drain existing ones.
>  */
> static void sysfs_deactivate(struct sysfs_dirent *sd)
> {
> 	DECLARE_COMPLETION_ONSTACK(wait);
> 	int v;
>
> 	BUG_ON(sd->s_sibling || !(sd->s_flags & SYSFS_FLAG_REMOVED));
> 	sd->s_sibling = (void *)&wait;
>
> 	/* atomic_add_return() is a mb(), put_active() will always see
> 	 * the updated sd->s_sibling.
> 	 */
> 	v = atomic_add_return(SD_DEACTIVATED_BIAS, &sd->s_active);
>
> 	if (v != SD_DEACTIVATED_BIAS)
> 		wait_for_completion(&wait);
>
> 	sd->s_sibling = NULL;
> }
>
> But it looks to me like the wait_for_completion() won't return until
> flush_write_buffer() calls sysfs_put_active_two(), and that call is only
> reached after ops->store() returns. This looks like a deadlock to me.
>
> I can provide more information if it's helpful, and can help with testing any
> patches.
>
> I'm not sure exactly when this problem was first introduced. 2.6.22 hung in a
> similar way, but it looks like the code that deals with deleting sysfs nodes
> got significantly reworked between 2.6.22 and 2.6.24.
>
> Steps to reproduce:
> echo 1 into any /sys/devices/LNXSYSTM:00/ACPI*/eject node. Watch the parent
> process hang.
Thanks. So it would seem that sysfs core changes caused the acpi code to
fail.
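
For anyone following along, the cycle described in the report can be
modelled in user space. This is only a rough sketch under stated
assumptions: struct node, DEACTIVATED_BIAS, write_attr() and the
node_*() helpers are hypothetical stand-ins, not the kernel's sysfs
code, and where the kernel would sleep in wait_for_completion() the
model just reports that the caller would block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SD_DEACTIVATED_BIAS; the value is arbitrary. */
#define DEACTIVATED_BIAS (-1000000)

/* Hypothetical stand-in for the s_active counter in a sysfs_dirent. */
struct node {
	int active;	/* active references, plus the bias once removed */
};

/* Plays the role of sysfs_get_active_two(): refuse once deactivated. */
static bool node_get_active(struct node *n)
{
	if (n->active < 0)
		return false;
	n->active++;
	return true;
}

/* Plays the role of sysfs_put_active_two(): drop one active reference. */
static void node_put_active(struct node *n)
{
	/* Catch an unbalanced put in this model. */
	assert(n->active != 0 && n->active != DEACTIVATED_BIAS);
	n->active--;
}

/*
 * Plays the role of sysfs_deactivate(). Returns true where the kernel
 * code would call wait_for_completion(), i.e. when active references
 * are still outstanding.
 */
static bool node_deactivate_would_block(struct node *n)
{
	n->active += DEACTIVATED_BIAS;
	return n->active != DEACTIVATED_BIAS;
}

/* A store handler that removes its own node, as the eject path does. */
static bool store_removes_self(struct node *n)
{
	return node_deactivate_would_block(n);
}

/* A store handler that leaves the node alone. */
static bool store_noop(struct node *n)
{
	(void)n;
	return false;
}

/*
 * Plays the role of flush_write_buffer(): the active reference is held
 * across the store call. Returns 1 if the store would have blocked
 * forever, 0 if it completed, -1 if the node was already removed.
 */
static int write_attr(struct node *n, bool (*store)(struct node *))
{
	bool stuck;

	if (!node_get_active(n))
		return -1;
	stuck = store(n);	/* the real code never gets past this point */
	node_put_active(n);
	return stuck ? 1 : 0;
}
```

In this model, write_attr(&n, store_removes_self) on a fresh node
returns 1, because the reference taken by the writer is the one the
removal is waiting on. write_attr(&n, store_noop) returns 0, and
removing the node from any context that holds no active reference does
not block.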