* [PATCH RFC 0/4] VFS revoke()
@ 2014-08-12 18:54 David Herrmann
  2014-08-12 18:54 ` [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts David Herrmann
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: David Herrmann @ 2014-08-12 18:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Tejun Heo,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig,
	David Herrmann

Hi

This patchset implements synchronous shutdown of file->f_op->xyz() access. It
provides generic helpers that can be used by any provider of file->f_op to
prevent new tasks from entering a file-operation and to synchronously wait for
all pending file-operations to finish.
Furthermore, it includes two patches that show how this can be used by device
drivers (using Input-evdev as an example).

There are several use-cases for this patchset:

1) Hotplug Support
Currently, a device driver for hotpluggable hardware has to jump through hoops
to make unplugging work. Imagine a driver that registers a cdev via cdev_add()
in the ->probe() callback on its bus. Once cdev_add() returns, user-space might
start entering file->f_op->xyz() callbacks on the cdev.
Now, if the hardware device is unplugged, the ->remove() callback of the device
driver is called on its bus. The driver now removes the cdev via cdev_del(),
but this does not guarantee that there's no open file-description left.
Furthermore, it does not prevent any further entry to file-operations.
Therefore, the driver has to protect all its f_op callbacks and deny new tasks
once the device has been removed. This needlessly leaves the device around
(albeit disabled) until all file-descriptions are closed by user-space.
With generic revoke() support, a driver can synchronously shut down all
file-operations. This way, it can guarantee that all open files are closed
before returning from its ->remove() callback on the bus.
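
For illustration, a minimal sketch of such a ->remove() path using the helpers
from this series (the foo_*() names and members are made up):

       static void foo_remove(struct foo *foo)
       {
               cdev_del(&foo->cdev);        /* no new open() via the cdev */
               revoke_device(&foo->revoke); /* deny new f_op entries on open files */
               wake_up_all(&foo->waitq);    /* wake f_ops sleeping in the driver */
               drain_device(&foo->revoke);  /* wait; ->release() ran for all files */
               /* hardware and driver state can now be torn down safely */
       }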

2) Access Management
Once a user-space process has been granted access to a device node, there is no
way to revoke that access again. This is problematic for all devices that are
shared between sessions. Imagine switching between multiple graphics servers:
there is currently no way to make sure only the foreground server gets access
to graphics and input devices.
The easiest solution, obviously, is to provide a user-space daemon that
dispatches between those processes. However, this requires wrapping *all* API
calls and reduces the performance of device-operations considerably.
Therefore, input and graphics drivers have started to implement access
management as part of their API. To avoid duplicating that code everywhere, we
need a generic revoke() implementation that revokes *just one file-description*.
This way, a generic access manager (like the 'logind' daemon) can hand out
file-descriptors to processes and retain a copy itself. By revoking access to
the file, the manager takes away the process's access, too. We'd love to see
such revoke
support for webcams, audio devices, disks and more, so we can prevent background
sessions from accessing those.
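
As a rough user-space sketch of that model (using evdev's existing EVIOCREVOKE
as the example; send_fd_to_client() is a made-up helper, error handling
omitted):

       int fd = open("/dev/input/event0", O_RDWR | O_CLOEXEC);
       send_fd_to_client(fd);        /* pass via SCM_RIGHTS, keep our own fd */
       /* ...later, on session switch... */
       ioctl(fd, EVIOCREVOKE, NULL); /* cut access to this one file-description */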

3) Generic revoke() / frevoke()
Obviously, the generic revoke() syscall can be implemented with this, too.
Unlike 2), this syscall revokes access to *all* open file-descriptions attached
to a device/object. This can be exposed to user-space, but is also handy to
implement existing syscalls like vhangup().


This patchset introduces two new objects:
  struct revokable_device:
      This is the entry point to revoke-support. Every object that needs revoke
      support has to have a revokable_device embedded somewhere. This is usually
      part of the "cdev", "inode" or other kinds of parent devices. However, it
      can be used freely and does not enforce any restrictions.
      The revokable_device object is used as parent for all open files on the
      underlying device. That is, each file on that device that is opened via
      ->open() will be attached to this revokable_device object. Each attached
      file can be managed independently of the parent device and can be revoked
      by itself.
      The revokable_device object then allows managing all attached files at
      once. That is, calling revoke_device() causes all attached files to be
      revoked at the same time and prevents new files from being opened.
  struct revokable_file:
      Inside of the ->open() callback, a driver needs to call make_revokable()
      if it needs revoke-support on a file. This will create a new
      "revokable_file" object and link it from file->f_revoke. The object will
      also be attached to the parent revokable_device object that is passed to
      make_revokable().
      Once make_revokable() returns, the file is active and may be revoked at
      any time (maybe even before you return from ->open()).
      Each revokable_file can be revoked independently of the others. Once a
      file is revoked, any new entry to a file-operation is denied and pending
      operations are completed synchronously. Once all those operations are
      done, the ->release() callback of the file is called early so the device
      driver can detach from it. Once the last file-reference is dropped,
      __fput() will no longer call ->release(), as it has already been called.
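
A minimal sketch of the ->open() side (struct foo and its members are made-up
placeholders):

       static int foo_open(struct inode *inode, struct file *file)
       {
               struct foo *foo = container_of(inode->i_cdev, struct foo, cdev);
               int retval;

               file->private_data = foo;
               nonseekable_open(inode, file);

               /* attach this file to the parent revokable_device */
               retval = make_revokable(&foo->revoke, file);
               if (retval)
                       return retval;

               /* the file may be revoked from here on, even before we return */
               return 0;
       }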

Unlike previous attempts to implement revoke(), this series implements revoke()
on the file-description level. That is, each open file description can be
revoked independently of the others. This is required to implement
access-management on a per-file level, instead of revoking the whole device.
Furthermore, this series splits revoke_file() and drain_file(). The former only
marks the file as revoked and prevents new file-operations, the latter waits for
pending operations to finish. This split allows drivers to perform any required
actions in between both calls (or avoid draining at all). Note that if
drain_file() is used, the driver needs to wake up sleeping file-operations
after revoke_file() has been called; otherwise, drain_file() might stall (see
the sketch below).
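
The per-file sequence in a driver hence looks roughly like this (foo->waitq
standing in for whatever wait-queue the driver's file-operations sleep on):

       revoke_file(file);          /* deny new f_op entries */
       wake_up_all(&foo->waitq);   /* wake f_ops sleeping inside the driver */
       drain_file(file);           /* wait for pending f_ops; ->release() runs */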

Open questions:
 * Obviously, we need to protect all calls into file->f_op->xyz(). This series
   provides enter_file() and leave_file() that can be used like:
      if (enter_file(file))
          retval = file->f_op->xyz(file, ...);
      else
          retval = -ENODEV;
   The question is, should we do this at the time we actually invoke those
   callbacks, or should we do it at syscall-entry time? If we do it right
   when calling the f_op, we might end up with stacked calls. This works fine,
   but makes drain_file_self() awful to implement.
   Note that I have implemented both, but didn't include the patches in this
   series as I wasn't sure which to go with and they need much more love...
   Making sure we call enter_file/leave_file everywhere will require proper
   auditing and I guess people will tell me to drop drain_self(), but let's see..
 * Do we need drain_file_self()? That is, do we really want to allow drivers
   to drain a file while inside an f_op callback? It is handy for implementing
   existing calls like EVIOCREVOKE properly, but makes the whole system a lot
   more complex.
   It was a nice challenge to implement it, but I'm fine with dropping it again.
 * What should revoke() do? Currently, revoke_device() calls revoke_file() on
   *all* attached files and prevents any new calls to ->open(). This is handy to
   make hotplugging work, but doesn't make much sense if called on real files.
   We definitely want to allow new calls to open(), so I guess setting
   "device->revoked" to "true" should be optional and only used by drivers
   explicitly on unplug. The current implementation can be made to allow both.
   However, parallel calls to open() while revoke_device() is running will still
   be revoked until revoke_device() actually returns.
 * What error code do we want to return when a file is revoked? poll() should
   probably signal POLLHUP, but what about all the other calls? ENODEV? EACCES?
   For user-space code, a separate EUNPLUG code would be *very* handy to make
   dealing with asynchronous unplug easier. EREVOKED would also work, but
   doesn't allow distinguishing between device unplug and plain access
   revocation.
 * What to do with remaining data in receive buffers? On revoke() we want all
   access to the device to be denied, but on unplug we *might* want to allow
   user-space to retrieve already queued data in the receive buffers. That is,
   poll returns POLLHUP | POLLIN and read() can return already queued data. I
   haven't seen any reason to do this, though. If hardware shows up that has a
   really short lifetime and we want all data to be forwarded to user-space, we
   might want to add some special flag to allow poll() and read(). Until then, I
   think we can safely ignore it.

Notes:
 * This series is loosely based on:
     - previous mails by Al Viro
     - "active"-refs of kernfs by Tejun Heo
     - EVIOCREVOKE on evdev
   The patches are written from scratch (no previous attempt allowed revoke() on
   single files so far), but credits go to these guys, too!
 * We probably want a generic kick() callback. With this in place, we can
   implement frevoke() easily:
       int frevoke(struct file *file)
       {
               if (!file->f_revoke)
                       return -ENOTSUPP;
               if (!enter_file(file))
                       return -ENODEV;
               revoke_device(file->f_revoke->device);
               if (file->f_revoke->device->kick)
                       file->f_revoke->device->kick(file->f_revoke->device);
               leave_file(file);
               /* TODO: need ref on device; f_revoke->device may be NULL here */
               drain_device(file->f_revoke->device);
               return 0;
       }
   Implementing revoke() is harder as we need a connection between a dentry and
   a revokable_file. We could add inode->i_revoke for that.
   In both cases, we need some guarantee that revokable_device does not vanish
   while we use it. So we either need to know the parent object and hold a ref
   to it, or we add ref-counts to revokable_device.
 * mmap() is obviously left to device-drivers to handle in ->release() or
   ->kick(). I recently posted patches that do page duplication and thus can
   give each open-file effectively a MAP_PRIVATE copy at any time. But if
   drivers map I/O memory directly into user-space, you're screwed, anyway.
   Imho, this is something drivers need to fix on a per-case basis. Same way
   drivers handle ->release() regarding mmaps (either leaving them alive or
   revoking them, or .. whatever).
 * This adds 8 bytes to all "struct file"s that don't use revoke(). I think
   that's a fair tradeoff. If you want revoke(), it adds ~40 bytes for
   revokable_file and ~80 bytes for each revokable_device.
   The fast-paths check file->f_revoke for NULL in enter_file(). Should be fine.
   In case you enabled revoke(), you get atomic_inc_unless_negative() and
   atomic_dec_return() in fast-paths. Slow-paths (in case the file is revoked)
   are heavy, but that should be just fine. I mean, most device release paths
   use several synchronize_rcu() calls and more. Some atomic_t magic for
   revoke() should be just fine.
 * "struct kactive" uses atomic_t heavily to implement active device references.
   I have no idea whether it makes sense to do it that way. Shout at me if the
   slow-paths should be changed to use a spin-lock..
   I'd also appreciate mem-barrier reviews there. I'm not entirely sure I got
   them all right.
 * Making revokable_device->active_files RCU protected would be *really* nice.
   I didn't succeed in doing so, though. The hard part is revoke_device(), which
   somehow needs to iterate the whole list of attached files and cannot allow
   parallel modifications. But it needs to drop the spin-lock for ->release().
   My current way of moving between two lists works fine without RCU, but not
   with RCU.
   But maybe the current spin-lock protection is just fine. I mean, there's not
   much traversal going on, anyway, and modifications are always protected by
   the lock.
 * The evdev patches in this series rely on some trivial cleanups that are not
   yet in mainline. In case anyone cares, see:
       "[PATCH v2] Input: evdev - drop redundant list-locking"
 * ...
 * I probably have a lot more notes that I forgot again. Any comments welcome. I
   just wanted to get this out so people know I'm working on it, and maybe
   people can point out whether this is going in the right direction.


Anyway, regardless of revoke(2), frevoke(2) and stuff, this series already
makes writing device-drivers a lot easier. I included two patches that convert
one of the most trivial drivers (and one of the few drivers that actually does
proper unplug handling) to use the new infrastructure (Input: evdev). I have
similar patches for ALSA and DRM, but they require far more cleanups first, so
I didn't include them for now. Note that most drivers are horribly racy
regarding unplugging (ALSA replaces f_op with a dummy and hopes no-one is
inside an f_op at that time) or don't care at all (DRM just destroys the device
on PCI unplug). Most of this works fine so far, because we did our best to
reduce races to a minimum. However, it really looks ugly and it would make many
developers' lives easier if we had synchronous shutdown.

After adding enter_file()/leave_file() annotations, this series already works
fine on my machine.

Comments welcome!
David

David Herrmann (4):
  kactive: introduce generic "active"-refcounts
  vfs: add revoke() helpers
  Input: evdev - switch to revoke helpers
  Input: evdev - drop redundant client_list

 drivers/input/evdev.c   | 179 +++++++++++----------------
 fs/Makefile             |   2 +-
 fs/file_table.c         |   4 +-
 fs/revoke.c             | 194 ++++++++++++++++++++++++++++++
 include/linux/fs.h      |   2 +
 include/linux/kactive.h | 269 +++++++++++++++++++++++++++++++++++++++++
 include/linux/revoke.h  | 124 +++++++++++++++++++
 lib/Makefile            |   2 +-
 lib/kactive.c           | 312 ++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 974 insertions(+), 114 deletions(-)
 create mode 100644 fs/revoke.c
 create mode 100644 include/linux/kactive.h
 create mode 100644 include/linux/revoke.h
 create mode 100644 lib/kactive.c

-- 
2.0.4


* [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts
  2014-08-12 18:54 [PATCH RFC 0/4] VFS revoke() David Herrmann
@ 2014-08-12 18:54 ` David Herrmann
  2014-08-18 21:03   ` Tejun Heo
  2014-08-12 18:54 ` [PATCH RFC 2/4] vfs: add revoke() helpers David Herrmann
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: David Herrmann @ 2014-08-12 18:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Tejun Heo,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig,
	David Herrmann

This introduces a new reference-type "struct kactive". Unlike kref, this
type manages "active references". That means references can only be
acquired if the object is active. At any time the object can be
deactivated, causing any new attempt to acquire an active reference to
fail. Furthermore, after disabling an object, you can wait for all active
references to be dropped. This allows synchronous object removal without
dangling _active_ objects.

This obviously does NOT replace the usual ref-count. Rather, it is meant
to be used in combination with normal ref-counts. This way, you can get
active references and normal references to the object.

Active references are usually required for callbacks. That is, if an
object has callbacks that can be entered by user-space, you usually want
an active reference to the object as long as you are _inside_ the
callback. This way the object cannot be removed while you work on it.
Normal references, on the other hand, are required by the underlying
file/device to make sure the object with its callback-pointer can be
accessed at all.
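
A rough usage sketch (obj, its members and do_something() are placeholders,
not part of this patch):

        /* guarding a callback */
        if (!kactive_get(&obj->active))
                return -ENODEV;
        ret = do_something(obj);
        kactive_put(&obj->active, &obj->waitq, NULL);

        /* synchronous removal */
        kactive_disable(&obj->active, &obj->waitq, NULL);
        wake_up_all(&obj->waitq);   /* wake tasks sleeping inside callbacks */
        kactive_drain(&obj->active, &obj->waitq);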

kernfs implements an active-reference type with spin-locks. This patch is
very loosely based on kernfs but avoids spin-locks.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
---
 include/linux/kactive.h | 269 +++++++++++++++++++++++++++++++++++++++++
 lib/Makefile            |   2 +-
 lib/kactive.c           | 312 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 582 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/kactive.h
 create mode 100644 lib/kactive.c

diff --git a/include/linux/kactive.h b/include/linux/kactive.h
new file mode 100644
index 0000000..e8fca7e
--- /dev/null
+++ b/include/linux/kactive.h
@@ -0,0 +1,269 @@
+/*
+ * kactive - library routines for handling active counters on objects that can
+ *           be revoked at any time
+ *
+ * Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#ifndef _KACTIVE_H_
+#define _KACTIVE_H_
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+
+/**
+ * struct kactive - Counting active object users
+ * @count: internal state-tracking
+ *
+ * "kactive" counts active users on arbitrary objects. For each user currently
+ * accessing the object, an internal counter is increased. However, unlike
+ * reference-counts, kactive allows to disable objects so no new references can
+ * be acquired. Once disabled, you can wait for the counter to drop to zero,
+ * thus, waiting until all users are gone (called 'draining').
+ *
+ * All paths that operate on an object and require it to be active, have to use
+ * kactive_get() to acquire an active reference, and kactive_put() to drop it
+ * again. Unlike kref_get(), kactive_get() might fail, in which case the caller
+ * must bail out. Furthermore, an active reference should be dropped as soon as
+ * possible. If it is claimed for an indefinite period, you must integrate it
+ * into a wait-queue and pass it to kactive. Otherwise, draining a kactive
+ * counter will stall.
+ * You're strongly recommended to drop active references before stalling for
+ * an indefinite period, and to try to reacquire them afterwards.
+ *
+ * A kactive counter will undergo the following states (in this order):
+ *  - NEW: First, a counter has to be initialized via kactive_init(). The object
+ *         may not be accessed before it is initialized. kactive_is_new()
+ *         returns true iff the counter is in this state. While in this state
+ *         no references can be acquired.
+ *    test: kactive_is_new()
+ *  - ENABLED: Once a counter is enabled via kactive_enable(), the counter is
+ *             alive and references can be acquired. kactive_is_enabled()
+ *             returns true iff the counter is in this state.
+ *             This state may be skipped if kactive_disable() is called before
+ *             kactive_enable().
+ *    test: kactive_is_enabled()
+ *  - DRAINING: Once a counter is disabled via kactive_disable() no new
+ *              references can be acquired. However, there might still be
+ *              pending references that haven't been dropped. If so, the counter
+ *              is put into the DRAINING state until all those references are
+ *              dropped.
+ *    test: kactive_is_draining()
+ *  - SELF_DRAINING: This is the same state as DRAINING, but is entered after
+ *                   the first call to kactive_drain_self() returned. All
+ *                   remaining refs are now held by callers of
+ *                   kactive_drain_self(). If kactive_drain_self() is never
+ *                   called, this state may be skipped.
+ *                   In the public API, this state is treated the same as
+ *                   DRAINING and kactive_is_draining() returns true if in this
+ *                   state.
+ *    test: __kactive_is_self_draining()
+ *  - DRAINED: Once a counter was disabled via kactive_disable() and there are
+ *             no more active references, the counter will become DRAINED. It
+ *             will stay in this state until it is destroyed.
+ *    test: kactive_is_drained()
+ *
+ * Both kactive_disable() and kactive_put() can cause a transition to state
+ * DRAINED. In that case, the caller might want to perform cleanups on the
+ * object before doing the transition. Therefore, both functions take a
+ * callback to run before the final transition is done.
+ *
+ * Calling kactive_drain() waits until all active references have been dropped.
+ * Obviously, you must have called kactive_disable() beforehand, otherwise this
+ * might sleep for an indefinite period. Furthermore, kactive_drain() must be
+ * called without any active reference held. You can use kactive_drain_self()
+ * to avoid this restriction and drain all users but yourself. This requires an
+ * additional atomic_t counter, though. It has to be shared across all users of
+ * kactive_drain_self() on that kactive counter.
+ *
+ * If you plan on using kactive_drain() or kactive_drain_self(), the kactive
+ * counter needs a wait-queue to wait on. As draining is supposed to happen
+ * rarely, the kactive counter does not include its own wait-queue. Instead,
+ * you have to pass a wait-queue pointer to all functions releasing references
+ * or draining the counter.
+ * When draining the counter, the task will wait as TASK_NORMAL on the
+ * wait-queue until the object is drained. Therefore, any call that releases
+ * references wakes up all waiting tasks in state TASK_NORMAL. If that's
+ * incompatible with your wait-queue, you must provide a separate one for the
+ * kactive counter. Note that wake-ups only happen when the last (or second to
+ * last, to support self-draining) reference is dropped. Hence, the wait-queue
+ * of each kactive counter is woken up at most twice.
+ *
+ * The internal counter can be in one of the following states:
+ *   [0, +inf]
+ *      Counter is enabled and counts active refs
+ *   [__KACTIVE_BIAS + 1, -1]
+ *      Counter is draining and counts active refs
+ *   {__KACTIVE_BIAS}
+ *      Counter is draining and there's only a single ref left. It has already
+ *      been dropped but the last user performs cleanups before the counter is
+ *      marked as drained.
+ *   [__KACTIVE_BIAS * 2 + 1, __KACTIVE_BIAS - 1]
+ *      Counter was drained via kactive_drain_self() before and now counts the
+ *      remaining refs of all tasks that drained the counter while holding refs
+ *      themselves.
+ *   {__KACTIVE_BIAS * 2}
+ *      Counter was drained via kactive_drain_self() before and only a single
+ *      remaining ref is left. It has already been dropped, but the last user
+ *      performs cleanups before the counter is marked as drained.
+ *   {__KACTIVE_BIAS * 2 - 1}
+ *      Counter is drained.
+ *   {__KACTIVE_BIAS * 2 - 2}
+ *      Initial state of the counter.
+ */
+struct kactive {
+	atomic_t count;
+};
+
+typedef void (*kactive_release_t) (struct kactive *kactive);
+
+#define __KACTIVE_BIAS			(INT_MIN / 2 + 2)
+#define __KACTIVE_NEW			(__KACTIVE_BIAS * 2 - 2)
+#define __KACTIVE_DRAINED		(__KACTIVE_BIAS * 2 - 1)
+
+/**
+ * kactive_init() - Initialize kactive counter
+ * @kactive: kactive counter to modify
+ *
+ * This initializes a kactive counter. You should call this on any counter
+ * before accessing it via kactive_*() calls. If you don't call this, you must
+ * initialize the memory to 0 and it will behave as if you called:
+ *         kactive_init(&kactive);
+ *         kactive_enable(&kactive);
+ *
+ * Once this call returns the counter is in state NEW. You may call
+ * kactive_enable() to enable the counter, or kactive_disable() to immediately
+ * disable the counter.
+ *
+ * There is no allocated memory on a counter so you can destroy it at any time
+ * as long as you make sure no-one else can access it, anymore. Furthermore,
+ * you can call kactive_init() (or clear the memory) at any time to reset the
+ * counter. This, however, will cause undefined behavior if there are other
+ * tasks that may access the counter in parallel.
+ */
+static inline void kactive_init(struct kactive *kactive)
+{
+	atomic_set(&kactive->count, __KACTIVE_NEW);
+}
+
+/* true if @kactive is in state NEW */
+static inline bool kactive_is_new(struct kactive *kactive)
+{
+	return atomic_read(&kactive->count) == __KACTIVE_NEW;
+}
+
+/* true if @kactive is in state ENABLED */
+static inline bool kactive_is_enabled(struct kactive *kactive)
+{
+	return atomic_read(&kactive->count) >= 0;
+}
+
+/* true if @kactive is DRAINING or DRAINED */
+static inline bool kactive_is_disabled(struct kactive *kactive)
+{
+	int v = atomic_read(&kactive->count);
+	return v > __KACTIVE_NEW && v < 0;
+}
+
+/* true if @kactive is in state DRAINING or SELF_DRAINING */
+static inline bool kactive_is_draining(struct kactive *kactive)
+{
+	int v = atomic_read(&kactive->count);
+	return v > __KACTIVE_DRAINED && v < 0;
+}
+
+/* true if @kactive is in state DRAINED */
+static inline bool kactive_is_drained(struct kactive *kactive)
+{
+	return atomic_read(&kactive->count) == __KACTIVE_DRAINED;
+}
+
+/**
+ * kactive_get() - Acquire active reference
+ * @kactive: kactive counter to modify or NULL
+ *
+ * This tries to acquire a new active reference on @kactive. If @kactive wasn't
+ * enabled yet, or if it was already disabled, this will return false.
+ * Otherwise, the active-counter of @kactive is increased and this will return
+ * true. You can release this reference via kactive_put() again.
+ *
+ * If @kactive is NULL, this is a no-op and will always return false.
+ *
+ * It is safe to nest multiple calls to kactive_get(). However, kactive_get()
+ * might fail regardless of whether you already own a reference or not.
+ *
+ * Returns: If an active reference was acquired, true is returned. Otherwise,
+ *          this returns false.
+ */
+static inline bool kactive_get(struct kactive *kactive)
+{
+	if (unlikely(!kactive))
+		return false;
+
+	return atomic_inc_unless_negative(&kactive->count);
+}
+
+/* slow-path of kactive_put() */
+bool __kactive_release(struct kactive *kactive,
+		       wait_queue_head_t *waitq,
+		       kactive_release_t release,
+		       int v);
+
+/**
+ * kactive_put() - Release active reference
+ * @kactive: kactive counter to modify or NULL
+ * @waitq: wait queue to wake up if counter dropped to zero
+ * @release: release-function to call before dropping the last ref, or NULL
+ *
+ * This drops an active reference previously acquired via kactive_get(). Iff
+ * this call dropped the last reference and the kactive counter is in state
+ * DRAINING, then @release is called (if not NULL) and the counter is marked as
+ * DRAINED.
+ *
+ * All waiters on @waitq are woken up if the kactive counter is in state
+ * DRAINING and this call dropped the last or second to last reference.
+ *
+ * If @kactive is NULL, this is a no-op.
+ *
+ * Returns: If this caused a transition to state DRAINED, true is returned.
+ *          Otherwise, this call returns false.
+ */
+static inline bool kactive_put(struct kactive *kactive,
+			       wait_queue_head_t *waitq,
+			       kactive_release_t release)
+{
+	int v;
+
+	if (unlikely(!kactive))
+		return false;
+
+	v = atomic_read(&kactive->count);
+	if (WARN_ON(v <= __KACTIVE_BIAS * 2 ||
+	            v == __KACTIVE_BIAS ||
+		    v == 0))
+		return false;
+
+	v = atomic_dec_return(&kactive->count);
+	if (v < 0)
+		return __kactive_release(kactive, waitq, release, v);
+
+	return false;
+}
+
+void kactive_drain(struct kactive *kactive, wait_queue_head_t *waitq);
+void kactive_drain_self(struct kactive *kactive,
+			wait_queue_head_t *waitq,
+			unsigned int refs_owned,
+			atomic_t *drain_counter);
+
+bool kactive_enable(struct kactive *kactive);
+bool kactive_disable(struct kactive *kactive,
+		     wait_queue_head_t *waitq,
+		     kactive_release_t release);
+
+#endif /* _KACTIVE_H_ */
diff --git a/lib/Makefile b/lib/Makefile
index d6b4bc4..ec7f6a0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -20,7 +20,7 @@ lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
 
 lib-y	+= kobject.o klist.o
-obj-y	+= lockref.o
+obj-y	+= lockref.o kactive.o
 
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
diff --git a/lib/kactive.c b/lib/kactive.c
new file mode 100644
index 0000000..5029110
--- /dev/null
+++ b/lib/kactive.c
@@ -0,0 +1,312 @@
+/*
+ * kactive - library routines for handling active counters on objects that can
+ *           be revoked at any time
+ *
+ * Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/export.h>
+#include <linux/kactive.h>
+#include <linux/kernel.h>
+#include <linux/lockdep.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+
+/* true if @kactive is SELF_DRAINING */
+static inline bool __kactive_is_self_draining(struct kactive *kactive)
+{
+	int v = atomic_read(&kactive->count);
+	return v > __KACTIVE_DRAINED && v < __KACTIVE_BIAS;
+}
+
+/* true if @kactive is SELF_DRAINING or DRAINING with context-ref */
+static inline bool __kactive_is_almost_drained(struct kactive *kactive)
+{
+	int v = atomic_read(&kactive->count);
+	return v > __KACTIVE_DRAINED && v <= __KACTIVE_BIAS + 1;
+}
+
+/* atomically add @num to @kactive iff @kactive is greater than @cmp */
+static inline int __kactive_add_gt(struct kactive *kactive, int num, int cmp)
+{
+	int old, new;
+
+	do {
+		old = atomic_read(&kactive->count);
+		if (old <= cmp)
+			return old;
+		new = old + num;
+	} while (atomic_cmpxchg(&kactive->count, old, new) != old);
+
+	return old;
+}
+
+/* atomically add @num to @kactive iff @kactive is lower than @cmp */
+static inline int __kactive_add_lt(struct kactive *kactive, int num, int cmp)
+{
+	int old, new;
+
+	do {
+		old = atomic_read(&kactive->count);
+		if (old >= cmp)
+			return old;
+		new = old + num;
+	} while (atomic_cmpxchg(&kactive->count, old, new) != old);
+
+	return old;
+}
+
+bool __kactive_release(struct kactive *kactive,
+		       wait_queue_head_t *waitq,
+		       kactive_release_t release,
+		       int v)
+{
+	if (v == __KACTIVE_BIAS || v == __KACTIVE_BIAS * 2) {
+		if (release)
+			release(kactive);
+
+		/*
+		 * We allow freeing @kactive immediately after kactive_drain()
+		 * returns. So make sure release() is fully done before we
+		 * mark it as drained.
+		 */
+		mb();
+
+		atomic_set(&kactive->count, __KACTIVE_DRAINED);
+		wake_up_all(waitq);
+		return true;
+	} else if (v == __KACTIVE_BIAS + 1) {
+		/* wake up kactive_drain_self() with context-ref */
+		wake_up_all(waitq);
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(__kactive_release);
+
+/**
+ * kactive_drain() - Drain kactive counter
+ * @kactive: kactive counter to drain
+ * @waitq: wait queue to wait on
+ *
+ * This waits on the given kactive counter until all references were dropped.
+ * You must wake up any sleeping tasks that might hold active references before
+ * draining the kactive counter, otherwise this call might stall. Moreover, if
+ * this is called on an enabled counter it will stall until someone disables
+ * the counter.
+ *
+ * You must not hold any active references yourself when calling this! See
+ * kactive_drain_self() to avoid this restriction.
+ *
+ * Once this call returns, no active reference will exist, anymore.
+ *
+ * This call waits on @waitq in case there are active references. You must pass
+ * the same wait-queue to kactive_put() or wake up the queue manually to make
+ * sure all draining tasks are correctly notified.
+ *
+ * It is safe to call this function multiple times (even in parallel). All of
+ * those calls are guaranteed to sleep until there are no more active refs.
+ * This can even be called in parallel to kactive_drain_self(); it will wait
+ * until all references, including those held by callers of
+ * kactive_drain_self(), are released.
+ */
+void kactive_drain(struct kactive *kactive, wait_queue_head_t *waitq)
+{
+	/*
+	 * When draining without holding active refs we simply wait
+	 * until the kactive counter drops to DRAINED.
+	 */
+	wait_event(*waitq, kactive_is_drained(kactive));
+}
+EXPORT_SYMBOL_GPL(kactive_drain);
+
+/**
+ * kactive_drain_self() - Drain kactive counter while holding active refs
+ * @kactive: kactive counter to drain
+ * @waitq: wait queue to wait on
+ * @refs_owned: number of active references owned by the caller
+ * @drain_counter: separate atomic counter to keep track of context refs
+ *
+ * This is similar to kactive_drain(), but allows the caller to hold active
+ * references themselves. However, you must know how many references are held
+ * by your call-stack and pass this as @refs_owned. Otherwise, this will
+ * dead-lock. Furthermore, a second atomic_t counter is needed to make this
+ * work. This counter has to be shared across all users of kactive_drain_self()
+ * on this kactive counter. It has to be initialized to 0 and passed as
+ * @drain_counter to this helper. No other kactive helpers need access to it.
+ *
+ * If @refs_owned is 0, this is equivalent to kactive_drain() and
+ * @drain_counter is not accessed (can be NULL). The following description
+ * assumes @refs_owned is greater than zero.
+ *
+ * When calling kactive_drain_self(), this blocks until all references except
+ * the caller's own are dropped. The kactive counter must have been disabled
+ * before this is called. Furthermore, any pending tasks that hold active refs
+ * must have been woken up. Otherwise, this might block indefinitely.
+ *
+ * If there are multiple parallel callers of kactive_drain_self(), they're
+ * treated equally. That means all calls will block until all refs but the
+ * sum of their own refs have been dropped.
+ *
+ * If any caller of kactive_drain_self() calls this function a second time, it
+ * will immediately return as all remaining refs are guaranteed to be held by a
+ * caller of kactive_drain_self().
+ * Once kactive_drain_self() returns, your remaining active references are
+ * still valid as usual.
+ */
+void kactive_drain_self(struct kactive *kactive,
+			wait_queue_head_t *waitq,
+			unsigned int refs_owned,
+			atomic_t *drain_counter)
+{
+	bool first;
+	int v;
+
+	if (refs_owned < 1) {
+		kactive_drain(kactive, waitq);
+		return;
+	}
+
+	/*
+	 * When draining while holding refs yourself, we drop the refs
+	 * and then drain the kactive counter. The first one to enter
+	 * kactive_drain_self() retains a context-ref to make sure
+	 * racing calls to kactive_put() don't free the object. Once
+	 * all but the context-ref are dropped, the first one to leave
+	 * performs a transition to SELF_DRAINING, acquires all refs of
+	 * all tasks inside kactive_drain_self() and then drops the
+	 * context-ref. This guarantees that no task leaving
+	 * kactive_drain_self() early can drop all remaining refs via
+	 * kactive_put().
+	 * Once the first task leaves kactive_drain_self(), we know all
+	 * other tasks with self-refs had to be inside
+	 * kactive_drain_self(), too, and must have dropped their local
+	 * refs. Otherwise, the counter would have never dropped to 1
+	 * and the task could not have left kactive_drain_self(). If a
+	 * task enters kactive_drain_self() a second time, their local
+	 * cache must already see that we're SELF_DRAINING and we can
+	 * exit early without waiting again.
+	 */
+
+	if (__kactive_is_self_draining(kactive))
+		return;
+
+	/*
+	 * At this point we know:
+	 *  - this task has never entered kactive_drain_self() before,
+	 *    otherwise the local view must show SELF_DRAINING
+	 *  - no-one else can have left kactive_drain_self() as we
+	 *    still hold refs and never dropped them
+	 * We might not be the first to enter kactive_drain_self() so
+	 * remember our ref-count in the shared counter.
+	 */
+
+	v = atomic_add_return(refs_owned, drain_counter);
+	first = (v == refs_owned);
+
+	/*
+	 * Make sure our refs are accounted for in @drain_counter
+	 * before we drop them.
+	 */
+	smp_wmb();
+
+	/*
+	 * If we were the first to modify @drain_counter, acquire an
+	 * additional context-ref. Only one is allowed to do that,
+	 * otherwise kactive_put() might never wake us up as it has no
+	 * access to @drain_counter.
+	 * If someone passed us and enters wait_event() early, our own
+	 * refs serve as their context-ref until we acquire it.
+	 */
+	atomic_sub(refs_owned - first, &kactive->count);
+
+	wait_event(*waitq, __kactive_is_almost_drained(kactive));
+
+	/*
+	 * Make sure we read all context-references after wait_event()
+	 * read the kactive counter.
+	 */
+	smp_rmb();
+	v = atomic_read(drain_counter);
+
+	/*
+	 * The first one to leave has to do an atomic transition to
+	 * SELF_DRAINING while acquiring all @drain_counter refs and
+	 * dropping the context-ref.
+	 * All others to leave know that their refs have already been
+	 * safely reacquired.
+	 */
+	__kactive_add_gt(kactive,
+			 v + __KACTIVE_BIAS - 1,
+			 __KACTIVE_BIAS);
+}
+EXPORT_SYMBOL_GPL(kactive_drain_self);
+
+/**
+ * kactive_enable() - Enable kactive counter
+ * @kactive: kactive counter to enable
+ *
+ * This enables a kactive counter in case it wasn't already enabled. Note that
+ * if the counter was disabled via kactive_disable() before, it can never be
+ * enabled again.
+ *
+ * It is safe to call this multiple times (even in parallel). Any call but the
+ * first is a no-op.
+ *
+ * Returns: If the counter was not in state NEW, false is returned. Otherwise,
+ *          the counter is put into state ENABLED and true is returned.
+ */
+bool kactive_enable(struct kactive *kactive)
+{
+	/* try transition NEW=>ENABLED */
+	return __kactive_add_lt(kactive,
+				-__KACTIVE_NEW,
+				__KACTIVE_NEW + 1) <= __KACTIVE_NEW;
+}
+EXPORT_SYMBOL_GPL(kactive_enable);
+
+/**
+ * kactive_disable() - Disable kactive counter
+ * @kactive: kactive counter to disable
+ * @waitq: wait queue to wake up if counter dropped to zero
+ * @release: release-function to call before dropping the last ref, or NULL
+ *
+ * This disables a kactive counter. This works regardless of the state @kactive
+ * is in, even if it wasn't enabled yet. A counter that was disabled can never
+ * be enabled again. Once this call returns, no new active references can be
+ * acquired.
+ *
+ * It is safe to call this multiple times (even in parallel). Any call but the
+ * first is a no-op.
+ *
+ * Iff this call causes a transition to state DRAINED, @release is called (if
+ * not NULL) before it is marked as DRAINED.
+ *
+ * Returns: If this call caused a transition into state DRAINED, this returns
+ *          true. Otherwise, false is returned.
+ */
+bool kactive_disable(struct kactive *kactive,
+		     wait_queue_head_t *waitq,
+		     kactive_release_t release)
+{
+	/*
+	 * First try direct transition from NEW=>DRAINING...
+	 * ...then it can never be NEW again, so try ENABLED=>DRAINING.
+	 */
+	if (__kactive_add_lt(kactive,
+			     2 - __KACTIVE_BIAS,
+			     __KACTIVE_NEW + 1) <= __KACTIVE_NEW)
+		return __kactive_release(kactive, waitq, release,
+					 __KACTIVE_BIAS);
+	else if (__kactive_add_gt(kactive, __KACTIVE_BIAS, -1) == 0)
+		return __kactive_release(kactive, waitq, release,
+					 __KACTIVE_BIAS);
+
+	/* someone else already disabled it, we didn't modify @kactive */
+	return false;
+}
+EXPORT_SYMBOL_GPL(kactive_disable);
-- 
2.0.4


* [PATCH RFC 2/4] vfs: add revoke() helpers
  2014-08-12 18:54 [PATCH RFC 0/4] VFS revoke() David Herrmann
  2014-08-12 18:54 ` [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts David Herrmann
@ 2014-08-12 18:54 ` David Herrmann
  2014-08-19 12:27   ` Christoph Hellwig
  2014-08-12 18:54 ` [PATCH RFC 3/4] Input: evdev - switch to revoke helpers David Herrmann
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: David Herrmann @ 2014-08-12 18:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Tejun Heo,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig,
	David Herrmann

This patch adds generic VFS helpers to revoke file access. By default,
revoke() support is disabled for all files. Drivers need to call
make_revokable() in their f_op->open() callback to enable it for a given
file.

If a file is marked revokable, VFS core tracks all tasks inside file->f_op
callbacks. Once a file is revoked, we prevent new tasks from entering
those callbacks and synchronously wait for existing tasks to leave them.
Once all tasks are done, we call f_op->release() early, so the device
driver can detach the file.

Each file description can be revoked independently by calling
revoke_file(). This prevents new tasks from entering any file->f_op
callbacks, but does *not* wait for existing tasks. You have to call
drain_file() to explicitly wait for any pending file-operations to finish.
In between revoke_file() and drain_file(), you must wake up any sleeping
file-operations to make sure drain_file() can complete in finite time.
This makes drain_file() device-dependent. In case we want to implement
generic revoke() syscalls, we need an f_op->kick() callback that is called
in between both calls.

Once make_revokable() was called, file->f_revoke points to a
"struct revokable_file" object which contains revoke() management data.
Additionally, each revokable file must be linked to an object of type
"struct revokable_device". The device object is usually attached to the
parent inode of the revokable files, but no such restriction is enforced.
Revokable devices manage attached files and can be used to revoke access
to all attached files at once. That is, calling revoke_device() is
equivalent to calling revoke_file() on *all* open file descriptions
attached to the device. This is non-trivial, though. Hence, we provide a
generic implementation for it.
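
A rough sketch of the device side (the foo_* names are placeholders): a driver
initializes the revokable_device before its cdev becomes reachable:

        init_waitqueue_head(&foo->waitq);    /* queue its f_ops sleep on */
        init_revokable_device(&foo->revoke);
        cdev_add(&foo->cdev, devt, 1);

On unplug it then calls revoke_device(), wakes its own wait-queue and calls
drain_device(), as outlined in the cover letter.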

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
---
 fs/Makefile            |   2 +-
 fs/file_table.c        |   4 +-
 fs/revoke.c            | 194 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |   2 +
 include/linux/revoke.h | 124 +++++++++++++++++++++++++++++++
 5 files changed, 323 insertions(+), 3 deletions(-)
 create mode 100644 fs/revoke.c
 create mode 100644 include/linux/revoke.h

diff --git a/fs/Makefile b/fs/Makefile
index 4030cbf..33ac82d 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -11,7 +11,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		attr.o bad_inode.o file.o filesystems.o namespace.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o splice.o sync.o utimes.o \
-		stack.o fs_struct.o statfs.o
+		stack.o fs_struct.o statfs.o revoke.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o block_dev.o direct-io.o mpage.o
diff --git a/fs/file_table.c b/fs/file_table.c
index 385bfd3..a8555cc 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -26,6 +26,7 @@
 #include <linux/hardirq.h>
 #include <linux/task_work.h>
 #include <linux/ima.h>
+#include <linux/revoke.h>
 
 #include <linux/atomic.h>
 
@@ -212,8 +213,7 @@ static void __fput(struct file *file)
 			file->f_op->fasync(-1, file, 0);
 	}
 	ima_file_free(file);
-	if (file->f_op->release)
-		file->f_op->release(inode, file);
+	release_file(file);
 	security_file_free(file);
 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
 		     !(file->f_mode & FMODE_PATH))) {
diff --git a/fs/revoke.c b/fs/revoke.c
new file mode 100644
index 0000000..6a38f78
--- /dev/null
+++ b/fs/revoke.c
@@ -0,0 +1,194 @@
+/*
+ * File Access Revocation
+ * Written 2014 by David Herrmann <dh.herrmann@gmail.com>
+ */
+
+#include <linux/atomic.h>
+#include <linux/export.h>
+#include <linux/fs.h>
+#include <linux/kactive.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/revoke.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+
+static void release_drained_file(struct revokable_file *rf)
+{
+	struct revokable_device *dev = rf->device;
+	struct file *file = rf->file;
+
+	hlist_del_init(&rf->node);
+	hlist_add_head(&rf->node, &dev->revoked_files);
+	rf->device = NULL;
+
+	spin_unlock_irq(&dev->lock);
+	if (file->f_op->release)
+		file->f_op->release(file->f_inode, file);
+	spin_lock_irq(&dev->lock);
+
+	hlist_del_init(&rf->node);
+}
+
+/* callback for locked kactive_disable() */
+static void ____release_drained_revokable(struct kactive *active)
+{
+	release_drained_file(container_of(active, struct revokable_file,
+					  active));
+}
+
+/* callback for unlocked kactive_disable() */
+void __release_drained_revokable(struct kactive *active)
+{
+	struct revokable_file *rf = container_of(active, struct revokable_file,
+						 active);
+	struct revokable_device *dev = rf->device;
+
+	spin_lock_irq(&dev->lock);
+	release_drained_file(rf);
+	spin_unlock_irq(&dev->lock);
+}
+EXPORT_SYMBOL_GPL(__release_drained_revokable);
+
+int make_revokable(struct revokable_device *dev, struct file *file)
+{
+	struct revokable_file *rf;
+	int retval;
+
+	if (dev->revoked)
+		return -ENODEV;
+
+	rf = kzalloc(sizeof(*rf), GFP_KERNEL);
+	if (!rf)
+		return -ENOMEM;
+
+	INIT_HLIST_NODE(&rf->node);
+	rf->waitq = &dev->waitq;
+	rf->file = file;
+	rf->device = dev;
+	kactive_init(&rf->active);
+	kactive_enable(&rf->active);
+	atomic_set(&rf->drain_count, 0);
+
+	spin_lock_irq(&dev->lock);
+	if (dev->revoked) {
+		retval = -ENODEV;
+	} else {
+		file->f_revoke = rf;
+		hlist_add_head(&rf->node, &dev->active_files);
+		retval = 0;
+	}
+	spin_unlock_irq(&dev->lock);
+
+	if (retval)
+		kfree(rf);
+	return retval;
+}
+EXPORT_SYMBOL_GPL(make_revokable);
+
+void release_file(struct file *file)
+{
+	struct revokable_file *rf = file->f_revoke;
+
+	if (rf) {
+		/*
+		 * There cannot be any active ref left, so all kactive_drain()
+		 * does is wait for possible parallel kactive_disable() calls
+		 * to finish.
+		 */
+		kactive_disable(&rf->active, rf->waitq,
+				__release_drained_revokable);
+		kactive_drain(&rf->active, rf->waitq);
+		kfree(rf);
+	} else if (file->f_op->release) {
+		file->f_op->release(file->f_inode, file);
+	}
+}
+
+void revoke_file(struct file *file)
+{
+	struct revokable_file *rf = file->f_revoke;
+
+	if (WARN_ON(!rf))
+		return;
+
+	kactive_disable(&rf->active, rf->waitq, __release_drained_revokable);
+}
+EXPORT_SYMBOL_GPL(revoke_file);
+
+void drain_file(struct file *file)
+{
+	struct revokable_file *rf = file->f_revoke;
+
+	if (WARN_ON(!rf))
+		return;
+
+	revoke_file(file);
+	kactive_drain(&rf->active, rf->waitq);
+}
+EXPORT_SYMBOL_GPL(drain_file);
+
+void drain_file_self(struct file *file)
+{
+	struct revokable_file *rf = file->f_revoke;
+
+	if (WARN_ON(!rf))
+		return;
+
+	revoke_file(file);
+	kactive_drain_self(&rf->active, rf->waitq, 1, &rf->drain_count);
+}
+EXPORT_SYMBOL_GPL(drain_file_self);
+
+void init_revokable_device(struct revokable_device *dev)
+{
+	INIT_HLIST_HEAD(&dev->active_files);
+	INIT_HLIST_HEAD(&dev->revoked_files);
+	init_waitqueue_head(&dev->waitq);
+	spin_lock_init(&dev->lock);
+	dev->revoked = false;
+}
+EXPORT_SYMBOL_GPL(init_revokable_device);
+
+void revoke_device(struct revokable_device *dev)
+{
+	struct revokable_file *rf;
+
+	spin_lock_irq(&dev->lock);
+	dev->revoked = true;
+	while (!hlist_empty(&dev->active_files)) {
+		rf = to_revokable_safe(dev->active_files.first);
+		hlist_del_init(&rf->node);
+		hlist_add_head(&rf->node, &dev->revoked_files);
+
+		/*
+		 * Call kactive_disable() with device->lock held to protect
+		 * against parallel file release. It might drop the lock
+		 * temporarily when calling into f_op->release(), though.
+		 */
+		kactive_disable(&rf->active, rf->waitq,
+				____release_drained_revokable);
+	}
+	spin_unlock_irq(&dev->lock);
+}
+EXPORT_SYMBOL_GPL(revoke_device);
+
+static bool device_is_drained(struct revokable_device *dev)
+{
+	bool drained;
+
+	spin_lock_irq(&dev->lock);
+	drained = dev->revoked &&
+			hlist_empty(&dev->active_files) &&
+			hlist_empty(&dev->revoked_files);
+	spin_unlock_irq(&dev->lock);
+
+	return drained;
+}
+
+void drain_device(struct revokable_device *dev)
+{
+	revoke_device(dev);
+	wait_event(dev->waitq, device_is_drained(dev));
+}
+EXPORT_SYMBOL_GPL(drain_device);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f0890e4..6230f29 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -49,6 +49,7 @@ struct swap_info_struct;
 struct seq_file;
 struct workqueue_struct;
 struct iov_iter;
+struct revokable_file;
 
 extern void __init inode_init(void);
 extern void __init inode_init_early(void);
@@ -792,6 +793,7 @@ struct file {
 	struct fown_struct	f_owner;
 	const struct cred	*f_cred;
 	struct file_ra_state	f_ra;
+	struct revokable_file	*f_revoke;
 
 	u64			f_version;
 #ifdef CONFIG_SECURITY
diff --git a/include/linux/revoke.h b/include/linux/revoke.h
new file mode 100644
index 0000000..a466902
--- /dev/null
+++ b/include/linux/revoke.h
@@ -0,0 +1,124 @@
+#ifndef _LINUX_REVOKE_H
+#define _LINUX_REVOKE_H
+
+/*
+ * File Access Revocation
+ * Written 2014 by David Herrmann <dh.herrmann@gmail.com>
+ */
+
+#include <linux/atomic.h>
+#include <linux/fs.h>
+#include <linux/kactive.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/wait.h>
+
+struct revokable_device {
+	struct hlist_head		active_files;
+	struct hlist_head		revoked_files;
+	wait_queue_head_t		waitq;
+	spinlock_t			lock;
+	bool				revoked : 1;
+};
+
+struct revokable_file {
+	struct hlist_node		node;
+	wait_queue_head_t		*waitq;
+	struct file			*file;
+	struct revokable_device		*device;
+	struct kactive			active;
+	atomic_t			drain_count;
+};
+
+#define to_revokable_safe(_hlist_node) hlist_entry_safe((_hlist_node), \
+						struct revokable_file, node)
+
+void init_revokable_device(struct revokable_device *dev);
+void revoke_device(struct revokable_device *dev);
+void drain_device(struct revokable_device *dev);
+
+int make_revokable(struct revokable_device *dev, struct file *file);
+void __release_drained_revokable(struct kactive *active);
+void release_file(struct file *file);
+void revoke_file(struct file *file);
+void drain_file(struct file *file);
+void drain_file_self(struct file *file);
+
+static inline bool device_is_revoked(struct revokable_device *dev)
+{
+	return dev->revoked;
+}
+
+static inline bool file_is_revoked(struct file *file)
+{
+	return file->f_revoke && kactive_is_disabled(&file->f_revoke->active);
+}
+
+static inline bool enter_file(struct file *file)
+{
+	return file->f_revoke ? kactive_get(&file->f_revoke->active) : true;
+}
+
+static inline void leave_file(struct file *file)
+{
+	if (file->f_revoke)
+		kactive_put(&file->f_revoke->active,
+			    file->f_revoke->waitq,
+			    __release_drained_revokable);
+}
+
+/*
+ * revokable_device iterator
+ */
+
+struct revokable_iter {
+	struct hlist_node *pos;
+	unsigned int current_list : 2;
+};
+
+static inline struct revokable_file*
+__next_revokable(struct revokable_device *dev, struct revokable_iter *iter)
+{
+	do {
+		if (iter->pos) {
+			iter->pos = iter->pos->next;
+		} else {
+			--iter->current_list;
+			if (iter->current_list == 2)
+				iter->pos = dev->active_files.first;
+			else if (iter->current_list == 1)
+				iter->pos = dev->revoked_files.first;
+		}
+	} while ((iter->pos && !to_revokable_safe(iter->pos)->device) ||
+		 (!iter->pos && iter->current_list > 0));
+	return to_revokable_safe(iter->pos);
+}
+
+static inline void *__next_revokable_private(struct revokable_device *dev,
+					     struct revokable_iter *iter)
+{
+	__next_revokable(dev, iter);
+	if (!iter->pos)
+		return NULL;
+	return to_revokable_safe(iter->pos)->file->private_data;
+}
+
+#define REVOKABLE_ITER_INIT \
+		((struct revokable_iter){ .pos = NULL, .current_list = 3 })
+
+static inline void revokable_iter_init(struct revokable_iter *iter)
+{
+	*iter = REVOKABLE_ITER_INIT;
+}
+
+#define for_each_revokable(_pos, _iter, _dev)				\
+	for (_pos = __next_revokable((_dev), (_iter));			\
+		(_iter)->pos;						\
+		_pos = __next_revokable((_dev), (_iter)))
+
+#define for_each_revokable_private(_pos, _iter, _dev)			\
+	for (_pos = __next_revokable_private((_dev), (_iter));		\
+		(_iter)->pos;						\
+		_pos = __next_revokable_private((_dev), (_iter)))
+
+#endif /* _LINUX_REVOKE_H */
-- 
2.0.4


* [PATCH RFC 3/4] Input: evdev - switch to revoke helpers
  2014-08-12 18:54 [PATCH RFC 0/4] VFS revoke() David Herrmann
  2014-08-12 18:54 ` [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts David Herrmann
  2014-08-12 18:54 ` [PATCH RFC 2/4] vfs: add revoke() helpers David Herrmann
@ 2014-08-12 18:54 ` David Herrmann
  2014-08-12 18:54 ` [PATCH RFC 4/4] Input: evdev - drop redundant client_list David Herrmann
  2014-08-12 19:14 ` [PATCH RFC 0/4] VFS revoke() Al Viro
  4 siblings, 0 replies; 9+ messages in thread
From: David Herrmann @ 2014-08-12 18:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Tejun Heo,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig,
	David Herrmann

Evdev currently protects all f_op->xyz() callbacks with a mutex. We need
that to allow synchronous device removal. Once an input device is
unplugged, we mark it as dead and thus all further file-operations will
fail. The same logic is used for EVIOCREVOKE to revoke file access on a
single file-description.

By using the generic revoke() helpers, we can drop all those protections
and instead use synchronous revoke_file() and revoke_device(). As a bonus,
we no longer leave open files around after a device is dead. We can
synchronously drain the files and thus free all our file contexts and
disconnect from the dead parent device.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
---
 drivers/input/evdev.c | 154 +++++++++++++++++++-------------------------------
 1 file changed, 57 insertions(+), 97 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 48b7216..0cab144 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -26,6 +26,7 @@
 #include <linux/major.h>
 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/revoke.h>
 #include "input-compat.h"
 
 struct evdev {
@@ -37,7 +38,7 @@ struct evdev {
 	struct mutex mutex;
 	struct device dev;
 	struct cdev cdev;
-	bool exist;
+	struct revokable_device revoke;
 };
 
 struct evdev_client {
@@ -49,7 +50,6 @@ struct evdev_client {
 	struct evdev *evdev;
 	struct list_head node;
 	int clkid;
-	bool revoked;
 	unsigned int bufsize;
 	struct input_event buffer[];
 };
@@ -165,9 +165,6 @@ static void evdev_pass_values(struct evdev_client *client,
 	struct input_event event;
 	bool wakeup = false;
 
-	if (client->revoked)
-		return;
-
 	event.time = ktime_to_timeval(client->clkid == CLOCK_MONOTONIC ?
 				      mono : real);
 
@@ -237,28 +234,8 @@ static int evdev_fasync(int fd, struct file *file, int on)
 static int evdev_flush(struct file *file, fl_owner_t id)
 {
 	struct evdev_client *client = file->private_data;
-	struct evdev *evdev = client->evdev;
-	int retval;
-
-	retval = mutex_lock_interruptible(&evdev->mutex);
-	if (retval)
-		return retval;
-
-	if (!evdev->exist || client->revoked)
-		retval = -ENODEV;
-	else
-		retval = input_flush_device(&evdev->handle, file);
-
-	mutex_unlock(&evdev->mutex);
-	return retval;
-}
 
-static void evdev_free(struct device *dev)
-{
-	struct evdev *evdev = container_of(dev, struct evdev, dev);
-
-	input_put_device(evdev->handle.dev);
-	kfree(evdev);
+	return input_flush_device(&client->evdev->handle, file);
 }
 
 /*
@@ -304,7 +281,7 @@ static int evdev_release(struct inode *inode, struct file *file)
 	mutex_lock(&evdev->mutex);
 	evdev_ungrab(evdev, client);
 	list_del_rcu(&client->node);
-	if (evdev->exist && !--evdev->open)
+	if (!--evdev->open)
 		input_close_device(&evdev->handle);
 	mutex_unlock(&evdev->mutex);
 
@@ -352,11 +329,6 @@ static int evdev_open(struct inode *inode, struct file *file)
 
 	list_add_tail_rcu(&client->node, &evdev->client_list);
 
-	if (!evdev->exist) {
-		error = -ENODEV;
-		goto err_unlink;
-	}
-
 	if (!evdev->open++) {
 		error = input_open_device(&evdev->handle);
 		if (error) {
@@ -365,13 +337,19 @@ static int evdev_open(struct inode *inode, struct file *file)
 		}
 	}
 
-	mutex_unlock(&evdev->mutex);
-
 	file->private_data = client;
 	nonseekable_open(inode, file);
 
+	error = make_revokable(&evdev->revoke, file);
+	if (error)
+		goto err_close;
+
+	mutex_unlock(&evdev->mutex);
 	return 0;
 
+err_close:
+	if (!--evdev->open)
+		input_close_device(&evdev->handle);
 err_unlink:
 	list_del_rcu(&client->node);
 	mutex_unlock(&evdev->mutex);
@@ -392,29 +370,15 @@ static ssize_t evdev_write(struct file *file, const char __user *buffer,
 	if (count != 0 && count < input_event_size())
 		return -EINVAL;
 
-	retval = mutex_lock_interruptible(&evdev->mutex);
-	if (retval)
-		return retval;
-
-	if (!evdev->exist || client->revoked) {
-		retval = -ENODEV;
-		goto out;
-	}
-
 	while (retval + input_event_size() <= count) {
+		if (input_event_from_user(buffer + retval, &event))
+			return -EFAULT;
 
-		if (input_event_from_user(buffer + retval, &event)) {
-			retval = -EFAULT;
-			goto out;
-		}
 		retval += input_event_size();
-
 		input_inject_event(&evdev->handle,
 				   event.type, event.code, event.value);
 	}
 
- out:
-	mutex_unlock(&evdev->mutex);
 	return retval;
 }
 
@@ -449,7 +413,7 @@ static ssize_t evdev_read(struct file *file, char __user *buffer,
 		return -EINVAL;
 
 	for (;;) {
-		if (!evdev->exist || client->revoked)
+		if (device_is_revoked(&evdev->revoke))
 			return -ENODEV;
 
 		if (client->packet_head == client->tail &&
@@ -478,7 +442,7 @@ static ssize_t evdev_read(struct file *file, char __user *buffer,
 		if (!(file->f_flags & O_NONBLOCK)) {
 			error = wait_event_interruptible(evdev->wait,
 					client->packet_head != client->tail ||
-					!evdev->exist || client->revoked);
+					device_is_revoked(&evdev->revoke));
 			if (error)
 				return error;
 		}
@@ -496,11 +460,7 @@ static unsigned int evdev_poll(struct file *file, poll_table *wait)
 
 	poll_wait(file, &evdev->wait, wait);
 
-	if (evdev->exist && !client->revoked)
-		mask = POLLOUT | POLLWRNORM;
-	else
-		mask = POLLHUP | POLLERR;
-
+	mask = POLLOUT | POLLWRNORM;
 	if (client->packet_head != client->tail)
 		mask |= POLLIN | POLLRDNORM;
 
@@ -748,17 +708,6 @@ static int evdev_handle_mt_request(struct input_dev *dev,
 	return 0;
 }
 
-static int evdev_revoke(struct evdev *evdev, struct evdev_client *client,
-			struct file *file)
-{
-	client->revoked = true;
-	evdev_ungrab(evdev, client);
-	input_flush_device(&evdev->handle, file);
-	wake_up_interruptible(&evdev->wait);
-
-	return 0;
-}
-
 static long evdev_do_ioctl(struct file *file, unsigned int cmd,
 			   void __user *p, int compat_mode)
 {
@@ -821,12 +770,6 @@ static long evdev_do_ioctl(struct file *file, unsigned int cmd,
 		else
 			return evdev_ungrab(evdev, client);
 
-	case EVIOCREVOKE:
-		if (p)
-			return -EINVAL;
-		else
-			return evdev_revoke(evdev, client, file);
-
 	case EVIOCSCLOCKID:
 		if (copy_from_user(&i, p, sizeof(unsigned int)))
 			return -EFAULT;
@@ -963,6 +906,17 @@ static long evdev_do_ioctl(struct file *file, unsigned int cmd,
 	return -EINVAL;
 }
 
+static int evdev_revoke(struct evdev *evdev, struct evdev_client *client,
+			struct file *file)
+{
+	revoke_file(file);
+	wake_up_interruptible(&evdev->wait);
+	kill_fasync(&client->fasync, SIGIO, POLL_HUP);
+	drain_file_self(file);
+
+	return 0;
+}
+
 static long evdev_ioctl_handler(struct file *file, unsigned int cmd,
 				void __user *p, int compat_mode)
 {
@@ -970,19 +924,21 @@ static long evdev_ioctl_handler(struct file *file, unsigned int cmd,
 	struct evdev *evdev = client->evdev;
 	int retval;
 
-	retval = mutex_lock_interruptible(&evdev->mutex);
-	if (retval)
-		return retval;
-
-	if (!evdev->exist || client->revoked) {
-		retval = -ENODEV;
-		goto out;
+	/* unlocked ioctls */
+	switch (cmd) {
+	case EVIOCREVOKE:
+		if (p)
+			return -EINVAL;
+		else
+			return evdev_revoke(evdev, client, file);
 	}
 
-	retval = evdev_do_ioctl(file, cmd, p, compat_mode);
+	retval = mutex_lock_interruptible(&evdev->mutex);
+	if (!retval) {
+		retval = evdev_do_ioctl(file, cmd, p, compat_mode);
+		mutex_unlock(&evdev->mutex);
+	}
 
- out:
-	mutex_unlock(&evdev->mutex);
 	return retval;
 }
 
@@ -1015,9 +971,16 @@ static const struct file_operations evdev_fops = {
 	.llseek		= no_llseek,
 };
 
+static void evdev_free(struct device *dev)
+{
+	struct evdev *evdev = container_of(dev, struct evdev, dev);
+
+	input_put_device(evdev->handle.dev);
+	kfree(evdev);
+}
+
 static void evdev_cleanup(struct evdev *evdev)
 {
-	struct input_handle *handle = &evdev->handle;
 	struct evdev_client *client;
 
 	/*
@@ -1025,21 +988,18 @@ static void evdev_cleanup(struct evdev *evdev)
 	 * Then wake up running users that wait for I/O so they can disconnect
 	 * from the dead device.
 	 */
-	mutex_lock(&evdev->mutex);
-	evdev->exist = false;
-	list_for_each_entry(client, &evdev->client_list, node)
+
+	cdev_del(&evdev->cdev);
+	revoke_device(&evdev->revoke);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(client, &evdev->client_list, node)
 		kill_fasync(&client->fasync, SIGIO, POLL_HUP);
-	mutex_unlock(&evdev->mutex);
+	rcu_read_unlock();
 
 	wake_up_interruptible(&evdev->wait);
 
-	cdev_del(&evdev->cdev);
-
-	/* evdev is marked dead so no one else accesses evdev->open */
-	if (evdev->open) {
-		input_flush_device(handle, NULL);
-		input_close_device(handle);
-	}
+	drain_device(&evdev->revoke);
 }
 
 /*
@@ -1070,7 +1030,7 @@ static int evdev_connect(struct input_handler *handler, struct input_dev *dev,
 	INIT_LIST_HEAD(&evdev->client_list);
 	mutex_init(&evdev->mutex);
 	init_waitqueue_head(&evdev->wait);
-	evdev->exist = true;
+	init_revokable_device(&evdev->revoke);
 
 	dev_no = minor;
 	/* Normalize device number if it falls into legacy range */
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC 4/4] Input: evdev - drop redundant client_list
  2014-08-12 18:54 [PATCH RFC 0/4] VFS revoke() David Herrmann
                   ` (2 preceding siblings ...)
  2014-08-12 18:54 ` [PATCH RFC 3/4] Input: evdev - switch to revoke helpers David Herrmann
@ 2014-08-12 18:54 ` David Herrmann
  2014-08-12 19:14 ` [PATCH RFC 0/4] VFS revoke() Al Viro
  4 siblings, 0 replies; 9+ messages in thread
From: David Herrmann @ 2014-08-12 18:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Tejun Heo,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig,
	David Herrmann

The client_list of evdev devices keeps the same information as the VFS
revokable_file list. Drop the redundant list and use the VFS core instead.
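
For reference, iterating the open files then uses the revoke-core iterator
instead of our own list (condensed from the diff below):

	struct revokable_iter iter = REVOKABLE_ITER_INIT;
	struct evdev_client *client;
	unsigned long flags;

	spin_lock_irqsave(&evdev->revoke.lock, flags);
	for_each_revokable_private(client, &iter, &evdev->revoke)
		evdev_pass_values(client, vals, count, time_mono, time_real);
	spin_unlock_irqrestore(&evdev->revoke.lock, flags);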

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
---
 drivers/input/evdev.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 0cab144..f8c305a 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -34,7 +34,6 @@ struct evdev {
 	struct input_handle handle;
 	wait_queue_head_t wait;
 	struct evdev_client __rcu *grab;
-	struct list_head client_list;
 	struct mutex mutex;
 	struct device dev;
 	struct cdev cdev;
@@ -48,7 +47,6 @@ struct evdev_client {
 	spinlock_t buffer_lock; /* protects access to buffer, head and tail */
 	struct fasync_struct *fasync;
 	struct evdev *evdev;
-	struct list_head node;
 	int clkid;
 	unsigned int bufsize;
 	struct input_event buffer[];
@@ -195,6 +193,7 @@ static void evdev_events(struct input_handle *handle,
 	struct evdev *evdev = handle->private;
 	struct evdev_client *client;
 	ktime_t time_mono, time_real;
+	unsigned long flags;
 
 	time_mono = ktime_get();
 	time_real = ktime_mono_to_real(time_mono);
@@ -203,12 +202,17 @@ static void evdev_events(struct input_handle *handle,
 
 	client = rcu_dereference(evdev->grab);
 
-	if (client)
+	if (client) {
 		evdev_pass_values(client, vals, count, time_mono, time_real);
-	else
-		list_for_each_entry_rcu(client, &evdev->client_list, node)
+	} else {
+		struct revokable_iter iter = REVOKABLE_ITER_INIT;
+
+		spin_lock_irqsave(&evdev->revoke.lock, flags);
+		for_each_revokable_private(client, &iter, &evdev->revoke)
 			evdev_pass_values(client, vals, count,
 					  time_mono, time_real);
+		spin_unlock_irqrestore(&evdev->revoke.lock, flags);
+	}
 
 	rcu_read_unlock();
 }
@@ -280,7 +284,6 @@ static int evdev_release(struct inode *inode, struct file *file)
 
 	mutex_lock(&evdev->mutex);
 	evdev_ungrab(evdev, client);
-	list_del_rcu(&client->node);
 	if (!--evdev->open)
 		input_close_device(&evdev->handle);
 	mutex_unlock(&evdev->mutex);
@@ -327,13 +330,11 @@ static int evdev_open(struct inode *inode, struct file *file)
 	if (error)
 		goto err_free;
 
-	list_add_tail_rcu(&client->node, &evdev->client_list);
-
 	if (!evdev->open++) {
 		error = input_open_device(&evdev->handle);
 		if (error) {
 			evdev->open--;
-			goto err_unlink;
+			goto err_unlock;
 		}
 	}
 
@@ -350,10 +351,8 @@ static int evdev_open(struct inode *inode, struct file *file)
 err_close:
 	if (!--evdev->open)
 		input_close_device(&evdev->handle);
-err_unlink:
-	list_del_rcu(&client->node);
+err_unlock:
 	mutex_unlock(&evdev->mutex);
-	synchronize_rcu();
 err_free:
 	kfree(client);
 	return error;
@@ -981,6 +980,7 @@ static void evdev_free(struct device *dev)
 
 static void evdev_cleanup(struct evdev *evdev)
 {
+	struct revokable_iter iter = REVOKABLE_ITER_INIT;
 	struct evdev_client *client;
 
 	/*
@@ -992,10 +992,10 @@ static void evdev_cleanup(struct evdev *evdev)
 	cdev_del(&evdev->cdev);
 	revoke_device(&evdev->revoke);
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(client, &evdev->client_list, node)
+	spin_lock_irq(&evdev->revoke.lock);
+	for_each_revokable_private(client, &iter, &evdev->revoke)
 		kill_fasync(&client->fasync, SIGIO, POLL_HUP);
-	rcu_read_unlock();
+	spin_unlock_irq(&evdev->revoke.lock);
 
 	wake_up_interruptible(&evdev->wait);
 
@@ -1027,7 +1027,6 @@ static int evdev_connect(struct input_handler *handler, struct input_dev *dev,
 		goto err_free_minor;
 	}
 
-	INIT_LIST_HEAD(&evdev->client_list);
 	mutex_init(&evdev->mutex);
 	init_waitqueue_head(&evdev->wait);
 	init_revokable_device(&evdev->revoke);
-- 
2.0.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 0/4] VFS revoke()
  2014-08-12 18:54 [PATCH RFC 0/4] VFS revoke() David Herrmann
                   ` (3 preceding siblings ...)
  2014-08-12 18:54 ` [PATCH RFC 4/4] Input: evdev - drop redundant client_list David Herrmann
@ 2014-08-12 19:14 ` Al Viro
  4 siblings, 0 replies; 9+ messages in thread
From: Al Viro @ 2014-08-12 19:14 UTC (permalink / raw)
  To: David Herrmann
  Cc: linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Andrew Morton,
	Tejun Heo, linux-fsdevel, Theodore Tso, Christoph Hellwig

On Tue, Aug 12, 2014 at 08:54:04PM +0200, David Herrmann wrote:

> Open questions:
>  * Obviously, we need to protect all calls into file->f_op->xyz(). This series
>    provides enter_file() and leave_file() that can be used like:
>       if (enter_file(file))
>           retval = file->f_op->xyz(file, ...);
>       else
>           retval = -ENODEV;
>    Question is, should we do this at the time we actually invoke those
>    callbacks or should we do it at the syscall-entry time?

The former, obviously.  The real bitch is around mmap - you don't need
that for evdev, but a generic solution would better deal with that kind
of crap.
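
I.e., something along these lines at the call site (sketch only, using the
enter_file()/leave_file() helpers quoted above):

	if (enter_file(file)) {
		retval = file->f_op->read(file, buf, count, pos);
		leave_file(file);
	} else {
		retval = -ENODEV;
	}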

I'll post a review once I'm done with Eric's patchset; hopefully tonight or
tomorrow morning.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts
  2014-08-12 18:54 ` [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts David Herrmann
@ 2014-08-18 21:03   ` Tejun Heo
  2014-08-19  8:29     ` David Herrmann
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-08-18 21:03 UTC (permalink / raw)
  To: David Herrmann
  Cc: linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Andrew Morton,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig

Hello,

On Tue, Aug 12, 2014 at 08:54:05PM +0200, David Herrmann wrote:
> This introduces a new reference-type "struct kactive". Unlike kref, this
> type manages "active references". That means, references can only be
> acquired if the object is active. At any time the object can be
> deactivated, causing any new attempt to acquire an active reference to
> fail. Furthermore, after disabling an object, you can wait for all active
> references to be dropped. This allows synchronous object removal without
> dangling _active_ objects.
> 
> This obviously does NOT replace the usual ref-count. Moreover, it is meant
> to be used in combination with normal ref-counts. This way, you can get
> active references and normal references to the object.
> 
> Active references are usually required for callbacks. That is, if an
> object has callbacks that can be entered by user-space, you usually want
> an active reference to the object as long as you are _inside_ the
> callback. This way the object cannot be removed while you work on it.
> Normal references, on the other hand, are required by the underlying
> file/device to make sure the object with its callback-pointer can be
> accessed at all.
> 
> kernfs implements an active-reference type with spin-locks. This patch is
> very loosely based on kernfs but avoids spin-locks.

Yeah, this is very similar to kernfs's [de]activation support which is
pretty much a revocation mechanism.  I have two comments.

* Is it necessary to make this a library?  Should it just be embedded
  into revoke(2) implementation?

* I struggled a lot with self-draining support in kernfs.  Ultimately,
  what worked is the ability to just put the reference which is held
  for the current invocation, fully delegating the responsibility of
  keeping the current object accessible to whoever requested to put
  that reference.  Self-draining can be implemented on top of it but
  that in itself is a bit too rigid for dynamic filesystems.

I'd love to see this in vfs layer.  A lot of kernfs's complexities
come from revocation support and using something common would be nice.
If this gets in, do you plan to convert kernfs to use it?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts
  2014-08-18 21:03   ` Tejun Heo
@ 2014-08-19  8:29     ` David Herrmann
  0 siblings, 0 replies; 9+ messages in thread
From: David Herrmann @ 2014-08-19  8:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Andrew Morton,
	Al Viro, linux-fsdevel, Theodore Tso, Christoph Hellwig

Hi Tejun

On Mon, Aug 18, 2014 at 11:03 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Tue, Aug 12, 2014 at 08:54:05PM +0200, David Herrmann wrote:
>> This introduces a new reference-type "struct kactive". Unlike kref, this
>> type manages "active references". That means, references can only be
>> acquired if the object is active. At any time the object can be
>> deactivated, causing any new attempt to acquire an active reference to
>> fail. Furthermore, after disabling an object, you can wait for all active
>> references to be dropped. This allows synchronous object removal without
>> dangling _active_ objects.
>>
>> This obviously does NOT replace the usual ref-count. Moreover, it is meant
>> to be used in combination with normal ref-counts. This way, you can get
>> active references and normal references to the object.
>>
>> Active references are usually required for callbacks. That is, if an
>> object has callbacks that can be entered by user-space, you usually want
>> an active reference to the object as long as you are _inside_ the
>> callback. This way the object cannot be removed while you work on it.
>> Normal references, on the other hand, are required by the underlying
>> file/device to make sure the object with its callback-pointer can be
>> accessed at all.
>>
>> kernfs implements an active-reference type with spin-locks. This patch is
>> very loosely based on kernfs but avoids spin-locks.
>
> Yeah, this is very similar to kernfs's [de]activation support which is
> pretty much a revocation mechanism.  I have two comments.
>
> * Is it necessary to make this a library?  Should it just be embedded
>   into revoke(2) implementation?

I have no objections to embedding it into revoke(2).

> * I struggled a lot with self-draining support in kernfs.  Ultimately,
>   what worked is the ability to just put the reference which is held
>   for the current invocation, fully delegating the responsibility of
>   keeping the current object accessible to whoever requested to put
>   that reference.  Self-draining can be implemented on top of it but
>   that in itself is a bit too rigid for dynamic filesystems.

Yes, I saw that you just drop your own reference and then acquire a
fake-ref to simplify exit paths. Not sure which is the better way. I
guess for a start it'd be enough to just get normal draining and add
self-draining once we're done with the base implementation.
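
(For context, the self-draining path in the evdev conversion is EVIOCREVOKE,
which revokes the caller's own file from within its ioctl:

	revoke_file(file);
	wake_up_interruptible(&evdev->wait);
	kill_fasync(&client->fasync, SIGIO, POLL_HUP);
	drain_file_self(file);
)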

> I'd love to see this in vfs layer.  A lot of kernfs's complexities
> come from revocation support and using something common would be nice.
> If this gets in, do you plan to convert kernfs to use it?

kernfs also requires delayed 'enable' for attributes, so you can
enable a whole tree at once. I haven't added support for this to
revoke(2), but I'm planning to.

Thanks
David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC 2/4] vfs: add revoke() helpers
  2014-08-12 18:54 ` [PATCH RFC 2/4] vfs: add revoke() helpers David Herrmann
@ 2014-08-19 12:27   ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2014-08-19 12:27 UTC (permalink / raw)
  To: David Herrmann
  Cc: linux-kernel, Greg Kroah-Hartman, Linus Torvalds, Andrew Morton,
	Tejun Heo, Al Viro, linux-fsdevel, Theodore Tso,
	Christoph Hellwig

NAK on anything bloating struct file more.  If you want to add something
new to it, find a way to put it on a diet first by removing existing
fields.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-08-19 12:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-12 18:54 [PATCH RFC 0/4] VFS revoke() David Herrmann
2014-08-12 18:54 ` [PATCH RFC 1/4] kactive: introduce generic "active"-refcounts David Herrmann
2014-08-18 21:03   ` Tejun Heo
2014-08-19  8:29     ` David Herrmann
2014-08-12 18:54 ` [PATCH RFC 2/4] vfs: add revoke() helpers David Herrmann
2014-08-19 12:27   ` Christoph Hellwig
2014-08-12 18:54 ` [PATCH RFC 3/4] Input: evdev - switch to revoke helpers David Herrmann
2014-08-12 18:54 ` [PATCH RFC 4/4] Input: evdev - drop redundant client_list David Herrmann
2014-08-12 19:14 ` [PATCH RFC 0/4] VFS revoke() Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).