All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jarek Poplawski <jarkao2@o2.pl>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work
Date: Tue, 24 Apr 2007 13:53:23 +0200	[thread overview]
Message-ID: <20070424115322.GA2423@ff.dom.local> (raw)
In-Reply-To: <20070423163312.GA129@tv-sign.ru>

On Mon, Apr 23, 2007 at 08:33:12PM +0400, Oleg Nesterov wrote:
> On 04/23, Jarek Poplawski wrote:
> >
> > On Fri, Apr 20, 2007 at 09:08:36PM +0400, Oleg Nesterov wrote:
> > > 
> > > First, this flag should be cleared after return from cancel_rearming_delayed_work().
> > 
> > I think this flag, if at all, probably should be cleared only
> > consciously by the owner of a work, maybe as a schedule_xxx_work
> > parameter, (but shouldn't be used from work handlers for rearming).
> > Mostly it should mean: we are closing (and have no time to chase
> > our work)...
> 
> This will change the API. Currently it is possible to do:
> 
> 	cancel_delayed_work(dwork);
> 	schedule_delayed_work(dwork, delay);
> 
> and we have such a code. With the change you propose this can't work.

Not necessarily: this all was only a concept and schedule_xxx_work
could be also a new function, if needed.

> 
> > > Also, we should add a lot of nasty checks to workqueue.c
> > 
> > Checking a flag isn't nasty - it's clear. IMHO current way of checking,
> > whether cancel succeeded, is nasty.
> > 
> > > 
> > > I _think_ we can re-use WORK_STRUCT_PENDING to improve this interface.
> > > Note that if we set WORK_STRUCT_PENDING, the work can't be queued, and
> > > dwork->timer can't be started. The only problem is that it is not so
> > > trivial to avoid races.
> > 
> > If there were no place, it would be better, then current way.
> > But WORK_STRUCT_PENDING couldn't be used for some error checking,
> > as it's now.
> 
> Look,
> 
> 	void cancel_rearming_delayed_work(struct delayed_work *dwork)
> 	{
> 		struct work_struct *work = &dwork->work;
> 		struct cpu_workqueue_struct *cwq = get_wq_data(work);
> 		struct workqueue_struct *wq;
> 		const cpumask_t *cpu_map;
> 		int retry;
> 		int cpu;
> 
> 		if (!cwq)
> 			return;
> 
> 	retry:
> 		spin_lock_irq(&cwq->lock);
> 		list_del_init(&work->entry);
> 		__set_bit(WORK_STRUCT_PENDING, work_data_bits(work));
> 		retry = try_to_del_timer_sync(&dwork->timer) < 0;
> 		spin_unlock_irq(&cwq->lock);
> 
> 		if (unlikely(retry))
> 			goto retry;
> 
> 		// the work can't be re-queued and the timer can't
> 		// be re-started due to WORK_STRUCT_PENDING
> 
> 		wq = cwq->wq;
> 		cpu_map = wq_cpu_map(wq);
> 
> 		for_each_cpu_mask(cpu, *cpu_map)
> 			wait_on_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
> 
> 		work_clear_pending(work);
> 	}
> 
> I think this almost works, except:
> 
> 	- we should change run_workqueue() to call work_clear_pending()
> 	  under cwq->lock. I'd like to avoid this.
> 
> 	- this is racy wrt cpu-hotplug. We should re-check get_wq_data()
> 	  when we take the lock. This is easy.
> 
> 	- we should factor out the common code with cancel_work_sync().
> 
> I may be wrong, still had no time to concentrate on this with a "clear head".
> May be tomorrow.

This looks fine. Of course, it requires to remove some debugging
currently done with _PENDING flag and it's hard to estimate this
all before you do more, but it should be more foreseeable than
current way. But the races with _PENDING could be really "funny"
without locking it everywhere. BTW - are a few locks more a real
problem, while serving a "sleeping" path? And I don't think there
is any reason to hurry... 

> 
> > > > - for a work function: to stop execution as soon as possible,
> > > > even without completing the usual job, at first possible check.
> > > 
> > > I doubt we need this "in general". It is easy to add some flag to the
> > > work_struct's container and check it in work->func() when needed.
> > 
> > Yes, but currently you cannot to behave like this e.g. with
> > "rearming" work.
> 
> Why?

OK, it's not impossible, but needs some bothering: if I simply
set some flag and my work function exits before rearming -
cancel_rearming_delayed_work can loop.

> 
> >                   And maybe a common api could save some work.
> 
> May be you are right, but still I don't think we should introduce
> the new flag to implement this imho not-so-useful feature.

Maybe you are right. Probably some code should be analysed,
to check how often such activities are needed.

Cheers,
Jarek P.

  reply	other threads:[~2007-04-24 11:47 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20070419002548.72689f0e.akpm@linux-foundation.org>
     [not found] ` <20070419102122.GA93@tv-sign.ru>
2007-04-20  9:22   ` Fw: [PATCH -mm] workqueue: debug possible endless loop in cancel_rearming_delayed_work Jarek Poplawski
2007-04-20 17:08     ` Oleg Nesterov
2007-04-23  9:00       ` Jarek Poplawski
2007-04-23 16:33         ` Oleg Nesterov
2007-04-24 11:53           ` Jarek Poplawski [this message]
2007-04-24 18:55             ` Oleg Nesterov
2007-04-25  6:12               ` Jarek Poplawski
2007-04-25 12:20               ` Jarek Poplawski
2007-04-25 12:28                 ` Jarek Poplawski
2007-04-25 12:47                   ` Oleg Nesterov
2007-04-25 14:47                     ` Oleg Nesterov
2007-04-26 12:59                       ` Jarek Poplawski
2007-04-26 16:34                         ` Oleg Nesterov
2007-04-27  5:26                           ` Jarek Poplawski
2007-04-27  7:52                             ` Oleg Nesterov
2007-04-27  9:03                               ` Jarek Poplawski
2007-04-26 13:13                     ` Jarek Poplawski
2007-04-26 16:44                       ` Oleg Nesterov
2007-04-27  5:52                         ` Jarek Poplawski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070424115322.GA2423@ff.dom.local \
    --to=jarkao2@o2.pl \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@tv-sign.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.