All of lore.kernel.org
 help / color / mirror / Atom feed
From: Harry Ciao <qingtao.cao@windriver.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org,
	bluesmoke-devel@lists.sourceforge.net,
	Doug Thompson <norsk5@yahoo.com>
Subject: Re: [PATCH 1/1] EDAC: fix edac core deadlock when removing a device
Date: Tue, 23 Dec 2008 10:56:01 +0800	[thread overview]
Message-ID: <49505341.6010304@windriver.com> (raw)
In-Reply-To: <20081218144342.4f033416.akpm@linux-foundation.org>

Hi Andrew,

Thanks for your response!

I think the change to edac_device_workq_function() is necessary, because 
edac_device_workq_teardown()'s call to cancel_delayed_work() won't 
ensure that the work won't be running, that's why it has to go on to 
call flush_workqueue() when cancel_delayed_work() returns zero, when the 
work has been added to edac_workqueue already, so the whole 
edac_workqueue has to be flushed to make sure this work completes before 
the edac device it belongs to could be safely removed.

I also add a printk to edac_device_workq_function():

static void edac_device_workq_function(struct work_struct *work_req)
{
struct delayed_work *d_work = (struct delayed_work *)work_req;
struct edac_device_ctl_info *edac_dev = to_edac_device_ctl_work(d_work);

mutex_lock(&device_ctls_mutex);

/* If we are being removed, bail out immediately */
if (edac_dev->op_state == OP_OFFLINE) {
printk(KERN_CRIT "+++HARRY+++ edac device is being removed, bail out 
from work!\n");
mutex_unlock(&device_ctls_mutex);
return;
}

/* Only poll controllers that are running polled and have a check */
if ((edac_dev->op_state == OP_RUNNING_POLL) &&
(edac_dev->edac_check != NULL)) {
edac_dev->edac_check(edac_dev);
}

mutex_unlock(&device_ctls_mutex);
...

and do a test to run pairs of insmod/rmmod about 10 times, as you could 
see from below logs that 2/10 chances that the work is running by 
edac_poller worker thread:

root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
+++HARRY+++ edac device is being removed, bail out from work!
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
+++HARRY+++ edac device is being removed, bail out from work!
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root> insmod mv64x60_edac.ko
root@localhost:/root> rmmod mv64x60_edac.ko
root@localhost:/root>

So edac_device_workq_function() should bail out immediately if the edac 
device that current work belongs to is being removed.

Best regards and Merry Chrismas!

Harry


Andrew Morton 写道:
> On Thu, 18 Dec 2008 13:09:18 +0800
> Harry Ciao <qingtao.cao@windriver.com> wrote:
>
>   
>> To remove an edac device, all pending work must be complete. At that point, 
>> it is safe to remove the edac_dev structure.
>>
>> If the pending work is not properly cleared and proper care is not taken 
>> when waiting for it's completion, the following stack trace result:
>>
>> ...
>>
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -394,6 +394,12 @@ static void edac_device_workq_function(struct work_struct *work_req)
>>  
>>  	mutex_lock(&device_ctls_mutex);
>>  
>> +	/* If we are being removed, bail out immediately */
>> +	if (edac_dev->op_state == OP_OFFLINE) {
>> +		mutex_unlock(&device_ctls_mutex);
>> +		return;
>> +	}
>> +
>>  	/* Only poll controllers that are running polled and have a check */
>>  	if ((edac_dev->op_state == OP_RUNNING_POLL) &&
>>  		(edac_dev->edac_check != NULL)) {
>> @@ -585,14 +591,14 @@ struct edac_device_ctl_info *edac_device_del_device(struct device *dev)
>>  	/* mark this instance as OFFLINE */
>>  	edac_dev->op_state = OP_OFFLINE;
>>  
>> -	/* clear workq processing on this instance */
>> -	edac_device_workq_teardown(edac_dev);
>> -
>>  	/* deregister from global list */
>>  	del_edac_device_from_global_list(edac_dev);
>>  
>>  	mutex_unlock(&device_ctls_mutex);
>>  
>> +	/* clear workq processing on this instance */
>> +	edac_device_workq_teardown(edac_dev);
>> +
>>  	/* Tear down the sysfs entries for this instance */
>>  	edac_device_remove_sysfs(edac_dev);
>>  
>>     
>
> Is the change to edac_device_workq_function() necessary? 
> edac_device_workq_teardown()'s call to cancel_delayed_work() will
> ensure that edac_device_workq_function() isn't running.
>
> Incidentally, edac_device_workq_teardown() is an open-coded
> cancel_delayed_work_sync().
>
>
>   


      reply	other threads:[~2008-12-23  3:06 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-18  5:09 [PATCH 0/1] fix EDAC core deadlock when removes a device Harry Ciao
2008-12-18  5:09 ` [PATCH 1/1] EDAC: fix edac core deadlock when removing " Harry Ciao
2008-12-18 22:43   ` Andrew Morton
2008-12-23  2:56     ` Harry Ciao [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49505341.6010304@windriver.com \
    --to=qingtao.cao@windriver.com \
    --cc=akpm@linux-foundation.org \
    --cc=bluesmoke-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=norsk5@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.