linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oren Laadan <orenl@librato.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@osdl.org>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org, Serge Hallyn <serue@us.ibm.com>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Pavel Emelyanov <xemul@openvz.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Oren Laadan <orenl@librato.com>,
	Oren Laadan <orenl@cs.columbia.edu>
Subject: [RFC v17][PATCH 47/60] deferqueue: generic queue to defer work
Date: Wed, 22 Jul 2009 06:00:09 -0400	[thread overview]
Message-ID: <1248256822-23416-48-git-send-email-orenl@librato.com> (raw)
In-Reply-To: <1248256822-23416-1-git-send-email-orenl@librato.com>

Add a interface to postpone an action until the end of the entire
checkpoint or restart operation. This is useful when during the
scan of tasks an operation cannot be performed in place, to avoid
the need for a second scan.

One use case is when restoring an ipc shared memory region that has
been deleted (but is still attached), during restart it needs to be
create, attached and then deleted. However, creation and attachment
are performed in distinct locations, so deletion can not be performed
on the spot. Instead, this work (delete) is deferred until later.
(This example is in one of the following patches).

This interface allows chronic procrastination in the kernel:

deferqueue_create(void):
    Allocates and returns a new deferqueue.

deferqueue_run(deferqueue):
    Executes all the pending works in the queue. Returns the number
    of works executed, or an error upon the first error reported by
    a deferred work.

deferqueue_add(deferqueue, data, size, func, dtor):
    Enqueue a deferred work. @function is the callback function to
    do the work, which will be called with @data as an argument.
    @size tells the size of data. @dtor is a destructor callback
    that is invoked for deferred works remaining in the queue when
    the queue is destroyed. NOTE: for a given deferred work, @dtor
    is _not_ called if @func was already called (regardless of the
    return value of the latter).

deferqueue_destroy(deferqueue):
    Free the deferqueue and any queued items while invoking the
    @dtor callback for each queued item.

Why aren't we using the existing kernel workqueue mechanism?  We need
to defer to work until the end of the operation: not earlier, since we
need other things to be in place; not later, to not block waiting for
it. However, the workqueue schedules the work for 'some time later'.
Also, the kernel workqueue may run in any task context, but we require
many times that an operation be run in the context of some specific
restarting task (e.g., restoring IPC state of a certain ipc_ns).

Instead, this mechanism is a simple way for the c/r operation as a
whole, and later a task in particular, to defer some action until
later (but not arbitrarily later) _in the restore_ operation.

Changelog[v17]
  - Fix deferqueue_add() function

Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
---
 checkpoint/Kconfig         |    5 ++
 include/linux/deferqueue.h |   58 +++++++++++++++++++++++
 kernel/Makefile            |    1 +
 kernel/deferqueue.c        |  109 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/deferqueue.h
 create mode 100644 kernel/deferqueue.c

diff --git a/checkpoint/Kconfig b/checkpoint/Kconfig
index 21fc86b..4a2c845 100644
--- a/checkpoint/Kconfig
+++ b/checkpoint/Kconfig
@@ -2,10 +2,15 @@
 # implemented the hooks for processor state etc. needed by the
 # core checkpoint/restart code.
 
+config DEFERQUEUE
+	bool
+	default n
+
 config CHECKPOINT
 	bool "Checkpoint/restart (EXPERIMENTAL)"
 	depends on CHECKPOINT_SUPPORT && EXPERIMENTAL
 	depends on CGROUP_FREEZER
+	select DEFERQUEUE
 	help
 	  Application checkpoint/restart is the ability to save the
 	  state of a running application so that it can later resume
diff --git a/include/linux/deferqueue.h b/include/linux/deferqueue.h
new file mode 100644
index 0000000..2eb58cf
--- /dev/null
+++ b/include/linux/deferqueue.h
@@ -0,0 +1,58 @@
+/*
+ * deferqueue.h --- deferred work queue handling for Linux.
+ */
+
+#ifndef _LINUX_DEFERQUEUE_H
+#define _LINUX_DEFERQUEUE_H
+
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+/*
+ * This interface allows chronic procrastination in the kernel:
+ *
+ * deferqueue_create(void):
+ *     Allocates and returns a new deferqueue.
+ *
+ * deferqueue_run(deferqueue):
+ *     Executes all the pending works in the queue. Returns the number
+ *     of works executed, or an error upon the first error reported by
+ *     a deferred work.
+ *
+ * deferqueue_add(deferqueue, data, size, func, dtor):
+ * 	Enqueue a deferred work. @function is the callback function to
+ *      do the work, which will be called with @data as an argument.
+ *      @size tells the size of data. @dtor is a destructor callback
+ *      that is invoked for deferred works remaining in the queue when
+ *      the queue is destroyed. NOTE: for a given deferred work, @dtor
+ *      is _not_ called if @func was already called (regardless of the
+ *      return value of the latter).
+ *
+ * deferqueue_destroy(deferqueue):
+ *      Free the deferqueue and any queued items while invoking the
+ *      @dtor callback for each queued item.
+ */
+
+
+typedef int (*deferqueue_func_t)(void *);
+
+struct deferqueue_entry {
+	deferqueue_func_t function;
+	deferqueue_func_t destructor;
+	struct list_head list;
+	char data[0];
+};
+
+struct deferqueue_head {
+	spinlock_t lock;
+	struct list_head list;
+};
+
+struct deferqueue_head *deferqueue_create(void);
+void deferqueue_destroy(struct deferqueue_head *head);
+int deferqueue_add(struct deferqueue_head *head, void *data, int size,
+		   deferqueue_func_t func, deferqueue_func_t dtor);
+int deferqueue_run(struct deferqueue_head *head);
+
+#endif
diff --git a/kernel/Makefile b/kernel/Makefile
index 2093a69..ef229da 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -23,6 +23,7 @@ CFLAGS_REMOVE_cgroup-debug.o = -pg
 CFLAGS_REMOVE_sched_clock.o = -pg
 endif
 
+obj-$(CONFIG_DEFERQUEUE) += deferqueue.o
 obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
diff --git a/kernel/deferqueue.c b/kernel/deferqueue.c
new file mode 100644
index 0000000..3fb388b
--- /dev/null
+++ b/kernel/deferqueue.c
@@ -0,0 +1,109 @@
+/*
+ *  Infrastructure to manage deferred work
+ *
+ *  This differs from a workqueue in that the work must be deferred
+ *  until specifically run by the caller.
+ *
+ *  As the only user currently is checkpoint/restart, which has
+ *  very simple usage, the locking is kept simple.  Adding rules
+ *  is protected by the head->lock.  But deferqueue_run() is only
+ *  called once, after all entries have been added.  So it is not
+ *  protected.  Similarly, _destroy is only called once when the
+ *  ckpt_ctx is releeased, so it is not locked or refcounted.  These
+ *  can of course be added if needed by other users.
+ *
+ *  Why not use workqueue ?  We need to defer work until the end of an
+ *  operation: not earlier, since we need other things to be in place;
+ *  not later, to not block waiting for it. However, the workqueue
+ *  schedules the work for 'some time later'. Also, workqueue may run
+ *  in any task context, but we require many times that an operation
+ *  be run in the context of some specific restarting task (e.g.,
+ *  restoring IPC state of a certain ipc_ns).
+ *
+ *  Instead, this mechanism is a simple way for the c/r operation as a
+ *  whole, and later a task in particular, to defer some action until
+ *  later (but not arbitrarily later) _in the restore_ operation.
+ *
+ *  Copyright (C) 2009 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/deferqueue.h>
+
+struct deferqueue_head *deferqueue_create(void)
+{
+	struct deferqueue_head *h = kmalloc(sizeof(*h), GFP_KERNEL);
+	if (h) {
+		spin_lock_init(&h->lock);
+		INIT_LIST_HEAD(&h->list);
+	}
+	return h;
+}
+
+void deferqueue_destroy(struct deferqueue_head *h)
+{
+	if (!list_empty(&h->list)) {
+		struct deferqueue_entry *dq, *n;
+
+		pr_debug("%s: freeing non-empty queue\n", __func__);
+		list_for_each_entry_safe(dq, n, &h->list, list) {
+			dq->destructor(dq->data);
+			list_del(&dq->list);
+			kfree(dq);
+		}
+	}
+	kfree(h);
+}
+
+int deferqueue_add(struct deferqueue_head *head, void *data, int size,
+		   deferqueue_func_t func, deferqueue_func_t dtor)
+{
+	struct deferqueue_entry *dq;
+
+	dq = kmalloc(sizeof(*dq) + size, GFP_KERNEL);
+	if (!dq)
+		return -ENOMEM;
+
+	dq->function = func;
+	dq->destructor = dtor;
+	memcpy(dq->data, data, size);
+
+	pr_debug("%s: adding work %p func %p dtor %p\n",
+		 __func__, dq, func, dtor);
+	spin_lock(&head->lock);
+	list_add_tail(&dq->list, &head->list);
+	spin_unlock(&head->lock);
+	return 0;
+}
+
+/*
+ * deferqueue_run - perform all work in the work queue
+ * @head: deferqueue_head from which to run
+ *
+ * returns: number of works performed, or < 0 on error
+ */
+int deferqueue_run(struct deferqueue_head *head)
+{
+	struct deferqueue_entry *dq, *n;
+	int nr = 0;
+	int ret;
+
+	list_for_each_entry_safe(dq, n, &head->list, list) {
+		pr_debug("doing work %p function %p\n", dq, dq->function);
+		/* don't call destructor - function callback should do it */
+		ret = dq->function(dq->data);
+		if (ret < 0)
+			pr_debug("wq function failed %d\n", ret);
+		list_del(&dq->list);
+		kfree(dq);
+		nr++;
+	}
+
+	return nr;
+}
-- 
1.6.0.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2009-07-22 10:00 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-22  9:59 [RFC v17][PATCH 00/60] Kernel based checkpoint/restart Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 01/60] c/r: extend arch_setup_additional_pages() Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 02/60] x86: ptrace debugreg checks rewrite Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 03/60] c/r: break out new_user_ns() Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 04/60] c/r: split core function out of some set*{u,g}id functions Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 05/60] cgroup freezer: Fix buggy resume test for tasks frozen with cgroup freezer Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 06/60] cgroup freezer: Update stale locking comments Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 07/60] cgroup freezer: Add CHECKPOINTING state to safeguard container checkpoint Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 08/60] cgroup freezer: interface to freeze a cgroup from within the kernel Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 09/60] Namespaces submenu Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 10/60] c/r: make file_pos_read/write() public Oren Laadan
2009-07-23  2:33   ` KAMEZAWA Hiroyuki
2009-07-22  9:59 ` [RFC v17][PATCH 11/60] pids 1/7: Factor out code to allocate pidmap page Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 12/60] pids 2/7: Have alloc_pidmap() return actual error code Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 13/60] pids 3/7: Add target_pid parameter to alloc_pidmap() Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 14/60] pids 4/7: Add target_pids parameter to alloc_pid() Oren Laadan
     [not found]   ` <1248256822-23416-15-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-08-03 18:22     ` Serge E. Hallyn
2009-07-22  9:59 ` [RFC v17][PATCH 16/60] pids 6/7: Define do_fork_with_pids() Oren Laadan
2009-08-03 18:26   ` Serge E. Hallyn
2009-08-04  8:37     ` Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 17/60] pids 7/7: Define clone_with_pids syscall Oren Laadan
     [not found]   ` <1248256822-23416-18-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-29  0:44     ` Sukadev Bhattiprolu
2009-07-22  9:59 ` [RFC v17][PATCH 18/60] c/r: create syscalls: sys_checkpoint, sys_restart Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 19/60] c/r: documentation Oren Laadan
     [not found]   ` <1248256822-23416-20-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-23 14:24     ` Serge E. Hallyn
2009-07-23 15:24       ` Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 20/60] c/r: basic infrastructure for checkpoint/restart Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 21/60] c/r: x86_32 support " Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 22/60] c/r: external checkpoint of a task other than ourself Oren Laadan
2009-07-22 17:52   ` Serge E. Hallyn
2009-07-23  4:32     ` Oren Laadan
     [not found]       ` <4A67E7D7.9060800-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-23 13:12         ` Serge E. Hallyn
2009-07-23 14:14           ` Oren Laadan
2009-07-23 14:54         ` Serge E. Hallyn
2009-07-23 14:47   ` Serge E. Hallyn
2009-07-23 15:33     ` Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 23/60] c/r: export functionality used in next patch for restart-blocks Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 24/60] c/r: restart-blocks Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 25/60] c/r: checkpoint multiple processes Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 26/60] c/r: restart " Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 27/60] c/r: introduce PF_RESTARTING, and skip notification on exit Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 29/60] c/r: Save and restore the [compat_]robust_list member of the task struct Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 30/60] c/r: infrastructure for shared objects Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 31/60] c/r: detect resource leaks for whole-container checkpoint Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 32/60] c/r: introduce '->checkpoint()' method in 'struct file_operations' Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 33/60] c/r: dump open file descriptors Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 34/60] c/r: restore " Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 35/60] c/r: add generic '->checkpoint' f_op to ext fses Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 36/60] c/r: add generic '->checkpoint()' f_op to simple devices Oren Laadan
2009-07-22  9:59 ` [RFC v17][PATCH 37/60] c/r: introduce method '->checkpoint()' in struct vm_operations_struct Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 38/60] c/r: dump memory address space (private memory) Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 39/60] c/r: restore " Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 40/60] c/r: export shmem_getpage() to support shared memory Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 42/60] c/r: restore anonymous- and file-mapped- " Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 43/60] splice: export pipe/file-to-pipe/file functionality Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 45/60] c/r: make ckpt_may_checkpoint_task() check each namespace individually Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 46/60] c/r: support for UTS namespace Oren Laadan
2009-07-22 10:00 ` Oren Laadan [this message]
2009-07-22 10:00 ` [RFC v17][PATCH 48/60] c/r (ipc): allow allocation of a desired ipc identifier Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 49/60] c/r: save and restore sysvipc namespace basics Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 50/60] c/r: support share-memory sysv-ipc Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 51/60] c/r: support message-queues sysv-ipc Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 52/60] c/r: support semaphore sysv-ipc Oren Laadan
     [not found]   ` <1248256822-23416-53-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-22 17:25     ` Cyrill Gorcunov
2009-07-23  3:46       ` Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 53/60] c/r: (s390): expose a constant for the number of words (CRs) Oren Laadan
     [not found] ` <1248256822-23416-1-git-send-email-orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-07-22  9:59   ` [RFC v17][PATCH 15/60] pids 5/7: Add target_pids parameter to copy_process() Oren Laadan
2009-07-22  9:59   ` [RFC v17][PATCH 28/60] c/r: support for zombie processes Oren Laadan
2009-07-22 10:00   ` [RFC v17][PATCH 41/60] c/r: dump anonymous- and file-mapped- shared memory Oren Laadan
2009-07-22 10:00   ` [RFC v17][PATCH 44/60] c/r: support for open pipes Oren Laadan
2009-07-22 10:00   ` [RFC v17][PATCH 54/60] c/r: add CKPT_COPY() macro Oren Laadan
2009-07-24 19:09   ` [RFC v17][PATCH 00/60] Kernel based checkpoint/restart Serge E. Hallyn
2009-07-22 10:00 ` [RFC v17][PATCH 55/60] c/r: define s390-specific checkpoint-restart code Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 56/60] c/r: clone_with_pids: define the s390 syscall Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 57/60] c/r: capabilities: define checkpoint and restore fns Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 58/60] c/r: checkpoint and restore task credentials Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 59/60] c/r: restore file->f_cred Oren Laadan
2009-07-22 10:00 ` [RFC v17][PATCH 60/60] c/r: checkpoint and restore (shared) task's sighand_struct Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1248256822-23416-48-git-send-email-orenl@librato.com \
    --to=orenl@librato.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=orenl@cs.columbia.edu \
    --cc=serue@us.ibm.com \
    --cc=torvalds@osdl.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xemul@openvz.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).