All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <htejun@gmail.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Chinner <dgc@sgi.com>, David Howells <dhowells@redhat.com>,
	Gautham Shenoy <ego@in.ibm.com>, Jarek Poplawski <jarkao2@o2.pl>,
	Ingo Molnar <mingo@elte.hu>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] make cancel_rearming_delayed_work() reliable
Date: Fri, 11 May 2007 17:19:01 +0200	[thread overview]
Message-ID: <46448965.7070500@gmail.com> (raw)
In-Reply-To: <20070511134714.GA191@tv-sign.ru>

Oleg Nesterov wrote:
>> and this.  After grabbing cwq lock, compare it to get_wq_data() first,
>> if they are the same, both are using the same lock so there's no
>> reason for the barriers.  If they are different, it doesn't matter
>> anyway.  We need to retry the locking.
> 
> I think this is not right. The problem is that work->data (its
> WORK_STRUCT_WQ_DATA_MASK part, cwq) could be changed _without_ holding
> the old cwq->lock.
> 
> Suppose that CPU_0 does queue_delayed_work(dwork). We start the timer,
> work->data points to cwq_0. CPU_DEAD comes, the timer migrates to
> CPU_1, but work->data was not changed yet.
> 
>>   retry:
>> 	cwq = get_sw_data(work);
>> 	if (!cwq)
>> 		return ret;
>>
>> 	spin_lock_irq(&cwq->lock);
>> 	if (unlikely(cwq != get_wq_data(work))) {
>> 		/* oops wrong cwq */
>> 		spin_unlock_irq(&cwq->lock);
>> 		goto retry;	/* or just return 0; */
>> 	}
> 
> dwork->timer fires, delayed_work_timer_fn() changes work->data to cwq_1
> and queues this work on cwq_1.
> 
>> 	if (!list_empty(&work->entry)) {
>> 		list_del_init(&work->entry);
> 
> Oops! we are holding cwq_0->lock, but modify cwq_1->worklist.
> 
> Actually, we have the same problem with a plain queue_work() if _cpu_down()
> comes at "right" time.
>
> However, I agree, this smp_wmb() in insert_work() should die. We can
> introduce "smp_mb__before_spinlock()" (no-op on x86 at least) to kill it.

Yeah, right, we allow cwq pointer to change without holding the lock.
Although I still think that is where we should fix the problem.  Taking
down CPU is a cold cold path.  We can afford a lot of overhead there.
IMHO, if we can do that, it would be far better than memory barrier
dance which tends to be difficult to understand and thus prove/maintain
correctness.  I'll think about it more.

>>> + * It is possible to use this function if the work re-queues itself. It can
>>> + * cancel the work even if it migrates to another workqueue, however in that
>>> + * case it only garantees that work->func() has completed on the last queued
>>> + * workqueue.
>> We first prevent requeueing from happening; then, wait for each cwq to
>> finish the work, so I think we're guaranteed that they're finished on
>> all cpus.  Otherwise, the 'sync' part isn't too useful as it means all
>> rearming tasks might be running on completion of cancel_work_sync().
> 
> Yes, sure, you are right. What I meant was:
> 
> 	struct workqueue_struct *WQ1, *WQ1;
> 
> 	void work_func(struct work_struct *self)
> 	{
> 		// migrate on another workqueue
> 		queue_work(WQ2, self);
> 
> 		do_something();
> 	}
> 
> 	queue_work(WQ1, work);
> 
> Now, cancel_work_sync() can't guarantee that this work has finished
> its execution on WQ1.

Right again, I misunderstood that the migration was between cwqs.
Things definitely won't work if it's between different wqs.

Thanks.

-- 
tejun

  reply	other threads:[~2007-05-11 15:19 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-03 20:42 [PATCH] make cancel_rearming_delayed_work() reliable Oleg Nesterov
2007-05-04  1:15 ` Andrew Morton
2007-05-04 17:09   ` Oleg Nesterov
2007-05-05 21:32   ` [PATCH] make-cancel_rearming_delayed_work-reliable-fix Oleg Nesterov
2007-05-07 10:31     ` Jarek Poplawski
2007-05-07 10:34       ` Oleg Nesterov
2007-05-07 11:55         ` Anton Vorontsov
2007-05-07 11:33           ` Oleg Nesterov
2007-05-08  9:16         ` Jarek Poplawski
2007-05-08 12:02           ` Oleg Nesterov
2007-05-08 13:07             ` Jarek Poplawski
2007-05-08  7:15 ` [PATCH] make cancel_rearming_delayed_work() reliable Jarek Poplawski
2007-05-08 12:31   ` Oleg Nesterov
2007-05-08 13:56     ` Jarek Poplawski
2007-05-08 14:05       ` Oleg Nesterov
2007-05-08 14:32         ` Jarek Poplawski
2007-05-08 14:12       ` Jarek Poplawski
2007-05-11 13:55 ` Tejun Heo
2007-05-11 13:47   ` Oleg Nesterov
2007-05-11 15:19     ` Tejun Heo [this message]
2007-05-11 14:53       ` Oleg Nesterov
2007-05-12  5:50         ` Tejun Heo
2007-05-13 19:27           ` Oleg Nesterov
2007-05-13 20:16             ` Tejun Heo
2007-05-13 21:25               ` Oleg Nesterov
2007-05-14 19:44               ` Oleg Nesterov
2007-05-15  8:26                 ` Tejun Heo
2007-05-15 13:09                   ` Jarek Poplawski
2007-05-15 22:08                     ` Oleg Nesterov
2007-05-16  5:21                       ` Jarek Poplawski
2007-05-15 22:00                   ` Oleg Nesterov
2007-05-16 11:25                     ` Tejun Heo
2007-05-16 18:52                       ` Oleg Nesterov
2007-05-17  9:36                         ` Tejun Heo
2007-05-18  7:35                         ` Jarek Poplawski
2007-05-18  8:13                           ` Jarek Poplawski
2007-05-18 13:33                           ` Tejun Heo
2007-05-21  7:00                             ` Jarek Poplawski
2007-05-21  8:59                               ` Tejun Heo
2007-05-21 10:10                                 ` Jarek Poplawski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46448965.7070500@gmail.com \
    --to=htejun@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dgc@sgi.com \
    --cc=dhowells@redhat.com \
    --cc=ego@in.ibm.com \
    --cc=jarkao2@o2.pl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@tv-sign.ru \
    --cc=vatsa@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.