From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	toshi.kani@hp.com, lenb@kernel.org, wency@cn.fujitsu.com,
	vasilis.liaskovitis@profitbricks.com
Subject: Re: [PATCH v2] acpi : acpi_bus_trim() stops removing devices when failing to remove the device
Date: Fri, 26 Oct 2012 16:33:49 +0900
Message-ID: <508A3CDD.20506@jp.fujitsu.com>
In-Reply-To: <20121019175941.GB3375@kroah.com>

Hi Greg,

Sorry for the late reply.

2012/10/20 2:59, Greg Kroah-Hartman wrote:
> On Fri, Oct 19, 2012 at 06:29:52AM +0200, Rafael J. Wysocki wrote:
>> On Thursday 11 of October 2012 19:12:28 Yasuaki Ishimatsu wrote:
>>> acpi_bus_trim() stops removing devices when acpi_bus_remove() returns an
>>> error number. But acpi_bus_remove() cannot return an error number correctly:
>>> it only returns -EINVAL when the dev argument is NULL. Thus even if a
>>> device cannot be removed correctly, acpi_bus_trim() ignores this and
>>> continues to remove devices. acpi_bus_hot_remove_device() uses
>>> acpi_bus_trim() for removing devices. Therefore
>>> acpi_bus_hot_remove_device() can send "_EJ0" to firmware even if the device
>>> is running on the system. In this case, the system cannot work well.
>>>
>>> Vasilis hit the bug in memory hotplug and reported it as follows:
>>> https://lkml.org/lkml/2012/9/26/318
>>>
>>> So acpi_bus_trim() should check whether the device was removed correctly.
>>> The patch adds error checks to some of the functions that remove the device.
>>>
>>> With the patch applied, acpi_bus_trim() stops removing devices when it
>>> fails to remove a device. But I think there is no impact, with the
>>> exception of the CPU and memory hotplug paths, because other devices only
>>> fail in irregular cases, such as when the device is NULL.
>>>
>>> v1->v2
>>> - add a rollback for reinstalling a notify handler.
>>>
>>> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>>
>> Greg, do you think there may be any problems with the changes in dd.c?
>
> Yes, I don't like it.
>
> remove should always work, just like the exit call in a module.  It
> means that the core wants to remove the driver, so it is going to
> happen, a driver can't refuse it.
>
> Which brings me to the larger question, why would this solve anything?

We are currently developing physical memory hot plug:

https://lkml.org/lkml/2012/10/23/213

If we apply that patch set, physical memory can be hot removed in the
following way:

"echo 1 > /sys/bus/acpi/devices/PNP/eject"

In this case, acpi_bus_hot_remove_device() tries to remove the memory
device via acpi_bus_trim(). But if the device contains memory that cannot
be removed, memory hot remove fails and the memory remains in the kernel.
However, acpi_bus_trim() cannot notice that memory hot remove failed and
returns 0. So acpi_bus_hot_remove_device() continues to remove memory
devices and sends the _EJ0 method to firmware. After that, the memory
device can no longer be used, but the kernel still treats the memory as
present. So if someone accesses that memory, a kernel panic occurs.
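
To make the failure mode above concrete, here is a minimal userspace
sketch. It is not kernel code: every sim_* name and struct mem_dev are
hypothetical stand-ins invented for illustration, and the -EBUSY return
value is an assumption. It only models the sequence described above
(remove fails, trim still returns 0, eject is sent anyway) next to a
variant where the error is propagated to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory device; not a kernel structure. */
struct mem_dev {
	bool removable;	/* false if it holds memory that cannot be removed */
	bool in_use;	/* true while the (simulated) kernel still uses it */
	bool ejected;	/* true once the simulated _EJ0 has been sent */
};

/* Simulated per-device removal; -EBUSY is an assumed error code. */
static int sim_remove(struct mem_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;
	if (!dev->removable)
		return -EBUSY;
	dev->in_use = false;
	return 0;
}

/* Net effect described in the mail: a failure never reaches the caller. */
static int sim_trim_ignoring_errors(struct mem_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		(void)sim_remove(&devs[i]);
	return 0;
}

/* Alternative: stop at the first failure and report it. */
static int sim_trim_checking_errors(struct mem_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = sim_remove(&devs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

typedef int (*sim_trim_fn)(struct mem_dev *devs, size_t n);

/* Simulated hot remove: eject only when trim reported success. */
static int sim_hot_remove(struct mem_dev *devs, size_t n, sim_trim_fn trim)
{
	int ret;

	assert(devs != NULL && trim != NULL);

	ret = trim(devs, n);
	if (ret)
		return ret;

	for (size_t i = 0; i < n; i++)
		devs[i].ejected = true;	/* stands in for sending _EJ0 */
	return 0;
}

/* An ejected device that is still in use is the panic scenario. */
static bool sim_is_dangerous(const struct mem_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].ejected && devs[i].in_use)
			return true;
	return false;
}
```

With one removable and one irremovable device, calling sim_hot_remove()
with sim_trim_ignoring_errors ejects a device that is still in use, while
sim_trim_checking_errors makes sim_hot_remove() return the error before
anything is ejected. Where such a check belongs in the real code is a
separate question that this sketch does not answer.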

Thanks,
Yasuaki Ishimatsu

> If the kernel wants to unbind a device, why would we ever not want that
> to happen?
>
> So, NAK on this patch, sorry.  Fix up the ACPI core to handle this
> properly, don't mess with the driver core here.
>
> greg k-h
>


