From: Petr Mladek <pmladek@suse.com>
To: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Josh Triplett <josh@joshtriplett.org>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jiri Kosina <jkosina@suse.cz>, Borislav Petkov <bp@suse.de>,
Michal Hocko <mhocko@suse.cz>,
linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>,
live-patching@vger.kernel.org, linux-api@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC v2 07/18] kthread: Allow to cancel kthread work
Date: Fri, 2 Oct 2015 17:43:36 +0200
Message-ID: <20151002154336.GC3122@pathway.suse.cz>
In-Reply-To: <20150928170314.GF2589@mtj.duckdns.org>

On Mon 2015-09-28 13:03:14, Tejun Heo wrote:
> Hello, Petr.
>
> On Fri, Sep 25, 2015 at 01:26:17PM +0200, Petr Mladek wrote:
> > 1) PENDING state plus -EAGAIN/busy loop cycle
> > ---------------------------------------------
> >
> > IMHO, we want to use the timer because it is an elegant solution.
> > Then we must release the lock when the timer is running. The lock
> > must be taken by the timer->function(). And there is a small window
> > when the timer is no longer pending but timer->function is not running:
> >
> > CPU0                                CPU1
> >
> > run_timer_softirq()
> >   __run_timers()
> >     detach_expired_timer()
> >       detach_timer()
> >         # clear_pending
> >
> >                                     try_to_grab_pending_kthread_work()
> >                                       del_timer()
> >                                         # fails because not pending
> >
> >                                       test_and_set_bit(KTHREAD_WORK_PENDING_BIT)
> >                                         # fails because already set
> >
> >                                       if (!list_empty(&work->node))
> >                                         # fails because still not queued
> >
> >                 !!! problematic window !!!
> >
> >     call_timer_fn()
> >       queue_kthread_work()
>
> Let's say each work item has a state variable which is protected by a
> lock and the state can be one of IDLE, PENDING, CANCELING. Let's also
> assume that all cancelers synchronize with each other via mutex, so we
> only have to worry about a single canceler. Wouldn't something like
> the following work while being a lot simpler?
>
> Delayed queueing and execution.
>
> 1. Lock and check whether state is IDLE. If not, nothing to do.
>
> 2. Set state to PENDING and schedule the timer and unlock.
>
> 3. On expiration, timer_fn grabs the lock and sees whether state is
> still PENDING. If so, schedule the work item for execution;
> otherwise, nothing to do.
>
> 4. After dequeueing from execution queue with lock held, the worker is
> marked as executing the work item and state is reset to IDLE.
>
> Canceling
>
> 1. Lock, dequeue and set the state to CANCELING.
>
> 2. Unlock and perform del_timer_sync().
>
> 3. Flush the work item.
>
> 4. Lock and reset the state to IDLE and unlock.
>
>
> > 2) CANCEL state plus custom waitqueue
> > -------------------------------------
> >
> > cancel_kthread_work_sync() has to wait for the running work. It might take
> > quite some time. Therefore we could not block others by a spinlock.
> > Also others could not wait for the spin lock in a busy wait.
>
> Hmmm? Cancelers can synchronize amongst them using a mutex and the
> actual work item wait can use flushing.
>
> > IMHO, the proposed and rather complex solutions are needed in both cases.
> >
> > Or did I miss a possible trick, please?
>
> I probably have missed something in the above and it is not completely
> correct but I do think it can be way simpler than how workqueue does
> it.
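
For reference, here is roughly how I read the scheme above. It is only
a sketch and makes a couple of assumptions: all the names are made up,
a single per-work spinlock protects the state, step 4 of the delayed
queueing (the worker resetting the state back to IDLE) is left out, and
the dequeue in the cancel path pretends that the per-work lock is
enough:

/*
 * Minimal sketch of the single-state-variable scheme. The timer is
 * assumed to be set up with
 *   setup_timer(&dwork->timer, delayed_kthread_work_timer_fn,
 *               (unsigned long)dwork);
 */
enum {
        KTHREAD_WORK_IDLE,
        KTHREAD_WORK_PENDING,
        KTHREAD_WORK_CANCELING,
};

struct kthread_delayed_work {
        struct kthread_work work;
        struct kthread_worker *worker;
        struct timer_list timer;
        spinlock_t lock;                /* protects state */
        unsigned int state;
};

/* Delayed queueing, steps 1 and 2 */
bool queue_delayed_kthread_work(struct kthread_worker *worker,
                                struct kthread_delayed_work *dwork,
                                unsigned long delay)
{
        unsigned long flags;
        bool ret = false;

        spin_lock_irqsave(&dwork->lock, flags);
        if (dwork->state == KTHREAD_WORK_IDLE) {
                dwork->state = KTHREAD_WORK_PENDING;
                dwork->worker = worker;
                mod_timer(&dwork->timer, jiffies + delay);
                ret = true;
        }
        spin_unlock_irqrestore(&dwork->lock, flags);

        return ret;
}

/* Step 3: queue for execution only when the state is still PENDING */
static void delayed_kthread_work_timer_fn(unsigned long __dwork)
{
        struct kthread_delayed_work *dwork = (void *)__dwork;
        unsigned long flags;

        spin_lock_irqsave(&dwork->lock, flags);
        if (dwork->state == KTHREAD_WORK_PENDING)
                queue_kthread_work(dwork->worker, &dwork->work);
        spin_unlock_irqrestore(&dwork->lock, flags);
}

/* Canceling: all cancelers are serialized by a single mutex */
static DEFINE_MUTEX(kthread_cancel_mutex);

void cancel_delayed_kthread_work_sync(struct kthread_delayed_work *dwork)
{
        unsigned long flags;

        mutex_lock(&kthread_cancel_mutex);

        /* 1. Lock, dequeue, and mark the work CANCELING. */
        spin_lock_irqsave(&dwork->lock, flags);
        /*
         * work.node sits on the worker's list, so this dequeue really
         * needs worker->lock as well; the sketch glosses over it and
         * this is exactly where the problems below start.
         */
        if (dwork->state == KTHREAD_WORK_PENDING)
                list_del_init(&dwork->work.node);
        dwork->state = KTHREAD_WORK_CANCELING;
        spin_unlock_irqrestore(&dwork->lock, flags);

        /* 2. + 3. Stop the timer and wait for a possibly running callback. */
        del_timer_sync(&dwork->timer);
        flush_kthread_work(&dwork->work);

        /* 4. Back to IDLE. */
        spin_lock_irqsave(&dwork->lock, flags);
        dwork->state = KTHREAD_WORK_IDLE;
        spin_unlock_irqrestore(&dwork->lock, flags);

        mutex_unlock(&kthread_cancel_mutex);
}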

I have played with this idea, but it opens a can of worms of locking
problems and ends up looking even more complicated.

Let me show it on a snippet of code:

struct kthread_worker {
        spinlock_t lock;
        struct list_head work_list;
        struct kthread_work *current_work;
};

enum {
        KTHREAD_WORK_IDLE,
        KTHREAD_WORK_PENDING,
        KTHREAD_WORK_CANCELING,
};

struct kthread_work {
        unsigned int flags;
        spinlock_t lock;
        struct list_head node;
        kthread_work_func_t func;
        struct kthread_worker *worker;
};

/* the main kthread worker cycle */
int kthread_worker_fn(void *worker_ptr)
{
        struct kthread_worker *worker = worker_ptr;
        struct kthread_work *work;

repeat:
        /* needed so that the schedule() below really sleeps */
        set_current_state(TASK_INTERRUPTIBLE);

        if (kthread_should_stop()) {
                __set_current_state(TASK_RUNNING);
                return 0;
        }

        work = NULL;
        spin_lock_irq(&worker->lock);
        if (!list_empty(&worker->work_list)) {
                work = list_first_entry(&worker->work_list,
                                        struct kthread_work, node);
                spin_lock(&work->lock);
                list_del_init(&work->node);
                work->flags = KTHREAD_WORK_IDLE;
                spin_unlock(&work->lock);
        }
        worker->current_work = work;
        spin_unlock_irq(&worker->lock);

        if (work) {
                __set_current_state(TASK_RUNNING);
                work->func(work);
        } else if (!freezing(current)) {
                schedule();
        }

        try_to_freeze();
        goto repeat;
}
EXPORT_SYMBOL_GPL(kthread_worker_fn);

static void __queue_kthread_work(struct kthread_worker *worker,
                                 struct kthread_work *work)
{
        list_add_tail(&work->node, &worker->work_list);
        work->flags = KTHREAD_WORK_PENDING;
        work->worker = worker;
}

bool queue_kthread_work(struct kthread_worker *worker,
                        struct kthread_work *work)
{
        bool ret = false;
        unsigned long flags;

        /*
         * Lock the worker first to avoid an ABBA deadlock.
         * What if the work is already queued to another worker?
         */
        spin_lock_irqsave(&worker->lock, flags);
        spin_lock(&work->lock);

        if (work->flags != KTHREAD_WORK_IDLE)
                goto unlock_out;

        __queue_kthread_work(worker, work);
        ret = true;

unlock_out:
        spin_unlock(&work->lock);
        spin_unlock_irqrestore(&worker->lock, flags);

        return ret;
}

/*
 * Assumed helpers for the sketch, mirroring the ones that workqueues
 * use for the same purpose (cwt_wait, cwt_wakefn, cancel_waitq):
 */
struct kthread_cancel_wait {
        wait_queue_t wait;
        struct kthread_work *work;
};

static int kthread_cancel_wakefn(wait_queue_t *wait, unsigned int mode,
                                 int sync, void *key)
{
        struct kthread_cancel_wait *cwait;

        cwait = container_of(wait, struct kthread_cancel_wait, wait);

        /* Wake only the canceler that waits for this particular work. */
        if (cwait->work != key)
                return 0;

        return autoremove_wake_function(wait, mode, sync, key);
}

static DECLARE_WAIT_QUEUE_HEAD(cancel_waitq);

bool cancel_kthread_work_sync(struct kthread_work *work)
{
        struct kthread_worker *worker;
        struct kthread_cancel_wait cwait;
        bool flush = true;
        bool ret = false;
        unsigned long flags;

again:
        worker = ACCESS_ONCE(work->worker);

        if (worker)
                spin_lock_irqsave(&worker->lock, flags);
        else
                local_irq_save(flags);

        spin_lock(&work->lock);

        if (worker && worker != work->worker) {
                spin_unlock(&work->lock);
                spin_unlock_irqrestore(&worker->lock, flags);
                goto again;
        }

        switch (work->flags) {
        case KTHREAD_WORK_PENDING:
                list_del_init(&work->node);
                ret = true;
                /* fall through */
        case KTHREAD_WORK_IDLE:
                work->flags = KTHREAD_WORK_CANCELING;
                break;
        case KTHREAD_WORK_CANCELING:
                /*
                 * Some other task is already canceling the work. Wait
                 * in the custom waitqueue until the work gets flushed.
                 */
                init_wait(&cwait.wait);
                cwait.wait.func = kthread_cancel_wakefn;
                cwait.work = work;
                prepare_to_wait_exclusive(&cancel_waitq, &cwait.wait,
                                          TASK_UNINTERRUPTIBLE);
                flush = false;
                break;
        }

        spin_unlock(&work->lock);
        if (worker)
                spin_unlock_irqrestore(&worker->lock, flags);
        else
                local_irq_restore(flags);

        if (flush) {
                flush_kthread_work(work);
                /*
                 * We are the cancel leader. Nobody else can manipulate
                 * the work now.
                 */
                work->flags = KTHREAD_WORK_IDLE;
                /*
                 * Paired with prepare_to_wait() above so that either
                 * waitqueue_active() is visible here or the CANCELING
                 * state is visible there.
                 */
                smp_mb();
                if (waitqueue_active(&cancel_waitq))
                        __wake_up(&cancel_waitq, TASK_NORMAL, 1, work);
        } else {
                if (READ_ONCE(work->flags) == KTHREAD_WORK_CANCELING)
                        schedule();
                finish_wait(&cancel_waitq, &cwait.wait);
                goto again;
        }

        return ret;
}

IMHO, we need both locks. The worker manipulates more works and
therefore needs its own lock. And we need a work-specific lock because
the work might be assigned to different workers over time and we need
to be sure that operations on it, e.g. queuing, are really serialized.

Now, worker_fn() needs to get the first work from the worker list
and then manipulate that work => it needs to take worker->lock
and then work->lock.

It means that most other functions need to take both locks in this
order to avoid an ABBA deadlock. This causes quite some complications,
e.g. in cancel_kthread_work_sync() above.
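
The worker lookup and the re-check would most likely need to be pulled
out into a helper shared by every function that starts from a work
rather than from a worker. A rough sketch (the helper name is made up):

/*
 * Take worker->lock (A) before work->lock (B) even when we start from
 * the work, otherwise we would deadlock against worker_fn() and
 * queue_kthread_work(), which take the locks in the A -> B order.
 * Returns the locked worker, or NULL when the work has no worker.
 */
static struct kthread_worker *lock_kthread_work(struct kthread_work *work,
                                                unsigned long *flags)
{
        struct kthread_worker *worker;

again:
        worker = ACCESS_ONCE(work->worker);

        if (worker)
                spin_lock_irqsave(&worker->lock, *flags);
        else
                local_irq_save(*flags);

        spin_lock(&work->lock);

        /*
         * The work might have been (re)assigned to another worker
         * while we were waiting for the locks; drop everything and
         * retry with the new one.
         */
        if (worker != work->worker) {
                spin_unlock(&work->lock);
                if (worker)
                        spin_unlock_irqrestore(&worker->lock, *flags);
                else
                        local_irq_restore(*flags);
                goto again;
        }

        return worker;
}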

There might be even more problems if we try to queue the same work
to more workers and start mixing the locks from different workers.

I do not know. It is possible that you prefer this solution over the
atomic bit operations. Also, there might exist a trick that would make
it easier.

I would personally prefer to reuse the tricks from the workqueues. They
are proven to work and they make the code faster; we might need to
queue the work from a sensitive fast path. IMHO, the code is already
much simpler than the workqueue one because there are no pools of
kthreads behind each worker.

Best Regards,
Petr