From: Harry Ciao <qingtao.cao@windriver.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org,
bluesmoke-devel@lists.sourceforge.net,
Doug Thompson <norsk5@yahoo.com>
Subject: Re: [PATCH 1/1] EDAC: fix edac core deadlock when removing a device
Date: Tue, 23 Dec 2008 10:56:01 +0800 [thread overview]
Message-ID: <49505341.6010304@windriver.com> (raw)
In-Reply-To: <20081218144342.4f033416.akpm@linux-foundation.org>
Hi Andrew,
Thanks for your response!
I think the change to edac_device_workq_function() is necessary, because
edac_device_workq_teardown()'s call to cancel_delayed_work() won't
ensure that the work won't be running, that's why it has to go on to
call flush_workqueue() when cancel_delayed_work() returns zero, when the
work has been added to edac_workqueue already, so the whole
edac_workqueue has to be flushed to make sure this work completes before
the edac device it belongs to could be safely removed.
I also add a printk to edac_device_workq_function():
static void edac_device_workq_function(struct work_struct *work_req)
{
struct delayed_work *d_work = (struct delayed_work *)work_req;
struct edac_device_ctl_info *edac_dev = to_edac_device_ctl_work(d_work);
mutex_lock(&device_ctls_mutex);
/* If we are being removed, bail out immediately */
if (edac_dev->op_state == OP_OFFLINE) {
printk(KERN_CRIT "+++HARRY+++ edac device is being removed, bail out
from work!\n");
mutex_unlock(&device_ctls_mutex);
return;
}
/* Only poll controllers that are running polled and have a check */
if ((edac_dev->op_state == OP_RUNNING_POLL) &&
(edac_dev->edac_check != NULL)) {
edac_dev->edac_check(edac_dev);
}
mutex_unlock(&device_ctls_mutex);
...
and do a test to run pairs of insmod/rmmod about 10 times, as you could
see from below logs that 2/10 chances that the work is running by
edac_poller worker thread:
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
+++HARRY+++ edac device is being removed, bail out from work!
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
+++HARRY+++ edac device is being removed, bail out from work!
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root>
So edac_device_workq_function() should bail out immediately if the edac
device that current work belongs to is being removed.
Best regards and Merry Chrismas!
Harry
Andrew Morton 写道:
> On Thu, 18 Dec 2008 13:09:18 +0800
> Harry Ciao <qingtao.cao@windriver.com> wrote:
>
>
>> To remove an edac device, all pending work must be complete. At that point,
>> it is safe to remove the edac_dev structure.
>>
>> If the pending work is not properly cleared and proper care is not taken
>> when waiting for it's completion, the following stack trace result:
>>
>> ...
>>
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -394,6 +394,12 @@ static void edac_device_workq_function(struct work_struct *work_req)
>>
>> mutex_lock(&device_ctls_mutex);
>>
>> + /* If we are being removed, bail out immediately */
>> + if (edac_dev->op_state == OP_OFFLINE) {
>> + mutex_unlock(&device_ctls_mutex);
>> + return;
>> + }
>> +
>> /* Only poll controllers that are running polled and have a check */
>> if ((edac_dev->op_state == OP_RUNNING_POLL) &&
>> (edac_dev->edac_check != NULL)) {
>> @@ -585,14 +591,14 @@ struct edac_device_ctl_info *edac_device_del_device(struct device *dev)
>> /* mark this instance as OFFLINE */
>> edac_dev->op_state = OP_OFFLINE;
>>
>> - /* clear workq processing on this instance */
>> - edac_device_workq_teardown(edac_dev);
>> -
>> /* deregister from global list */
>> del_edac_device_from_global_list(edac_dev);
>>
>> mutex_unlock(&device_ctls_mutex);
>>
>> + /* clear workq processing on this instance */
>> + edac_device_workq_teardown(edac_dev);
>> +
>> /* Tear down the sysfs entries for this instance */
>> edac_device_remove_sysfs(edac_dev);
>>
>>
>
> Is the change to edac_device_workq_function() necessary?
> edac_device_workq_teardown()'s call to cancel_delayed_work() will
> ensure that edac_device_workq_function() isn't running.
>
> Incidentally, edac_device_workq_teardown() is an open-coded
> cancel_delayed_work_sync().
>
>
>
prev parent reply other threads:[~2008-12-23 3:06 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-18 5:09 [PATCH 0/1] fix EDAC core deadlock when removes a device Harry Ciao
2008-12-18 5:09 ` [PATCH 1/1] EDAC: fix edac core deadlock when removing " Harry Ciao
2008-12-18 22:43 ` Andrew Morton
2008-12-23 2:56 ` Harry Ciao [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49505341.6010304@windriver.com \
--to=qingtao.cao@windriver.com \
--cc=akpm@linux-foundation.org \
--cc=bluesmoke-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=norsk5@yahoo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.