From: Tejun Heo <tj@kernel.org>
To: gregkh@linuxfoundation.org
Cc: linux-kernel@vger.kernel.org, schwidefsky@de.ibm.com,
heiko.carstens@de.ibm.com, stern@rowland.harvard.edu,
JBottomley@parallels.com, bhelgaas@google.com,
Tejun Heo <tj@kernel.org>
Subject: [PATCH 05/14] kernfs: restructure removal path to fix possible premature return
Date: Fri, 10 Jan 2014 08:46:51 -0500 [thread overview]
Message-ID: <1389361620-5086-6-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1389361620-5086-1-git-send-email-tj@kernel.org>
The recursive nature of kernfs_remove() means that, even if
kernfs_remove() is not allowed to be called multiple times on the same
node, there may be race conditions between removal of parent and its
descendants. While we can claim that kernfs_remove() shouldn't be
called on one of the descendants while the removal of an ancestor is
in progress, such rule is unnecessarily restrictive and very difficult
to enforce. It's better to simply allow invoking kernfs_remove() as
the caller sees fit as long as the caller ensures that the node is
accessible.
The current behavior in such situations is broken. Whoever enters
removal path first takes the node off the hierarchy and then
deactivates. Following removers either return as soon as it notices
that it's not the first one or can't even find the target node as it
has already been removed from the hierarchy. In both cases, the
following removers may finish prematurely while the nodes which should
be removed and drained are still being processed by the first one.
This patch restructures so that multiple removers, whether through
recursion or direction invocation, always follow the following rules.
* When there are multiple concurrent removers, only one puts the base
ref.
* Regardless of which one puts the base ref, all removers are blocked
until the target node is fully deactivated and removed.
To achieve the above, removal path now first deactivates the subtree,
drains it and then unlinks one-by-one. __kernfs_deactivate() is
called directly from __kernfs_removal() and drops and regrabs
kernfs_mutex for each descendant to drain active refs. As this means
that multiple removers can enter __kernfs_deactivate() for the same
node, the function is updated so that it can handle multiple
deactivators of the same node - only one actually deactivates but all
wait till drain completion.
The restructured removal path guarantees that a removed node gets
unlinked only after the node is deactivated and drained. Combined
with proper multiple deactivator handling, this guarantees that any
invocation of kernfs_remove() returns only after the node itself and
all its descendants are deactivated, drained and removed.
v2: Draining separated into a separate loop (used to be in the same
loop as unlink) and done from __kernfs_deactivate(). This is to
allow exposing deactivation as a separate interface later.
Root node removal was broken in v1 patch. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
fs/kernfs/dir.c | 139 ++++++++++++++++++++++++++++++-------------------
include/linux/kernfs.h | 1 +
2 files changed, 87 insertions(+), 53 deletions(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 7f8afc1..e565ec0 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -181,14 +181,38 @@ void kernfs_put_active(struct kernfs_node *kn)
* kernfs_drain - drain kernfs_node
* @kn: kernfs_node to drain
*
- * Drain existing usages.
+ * Drain existing usages of @kn. Mutiple removers may invoke this function
+ * concurrently on @kn and all will return after draining is complete.
+ * Returns %true if drain is performed and kernfs_mutex was temporarily
+ * released. %false if @kn was already drained and no operation was
+ * necessary.
+ *
+ * The caller is responsible for ensuring @kn stays pinned while this
+ * function is in progress even if it gets removed by someone else.
*/
-static void kernfs_drain(struct kernfs_node *kn)
+static bool kernfs_drain(struct kernfs_node *kn)
+ __releases(&kernfs_mutex) __acquires(&kernfs_mutex)
{
struct kernfs_root *root = kernfs_root(kn);
+ lockdep_assert_held(&kernfs_mutex);
WARN_ON_ONCE(atomic_read(&kn->active) >= 0);
+ /*
+ * We want to go through the active ref lockdep annotation at least
+ * once for all node removals, but the lockdep annotation can't be
+ * nested inside kernfs_mutex and deactivation can't make forward
+ * progress if we keep dropping the mutex. Use JUST_ACTIVATED to
+ * force the slow path once for each deactivation if lockdep is
+ * enabled.
+ */
+ if ((!kernfs_lockdep(kn) || !(kn->flags & KERNFS_JUST_DEACTIVATED)) &&
+ atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
+ return false;
+
+ kn->flags &= ~KERNFS_JUST_DEACTIVATED;
+ mutex_unlock(&kernfs_mutex);
+
if (kernfs_lockdep(kn)) {
rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
if (atomic_read(&kn->active) != KN_DEACTIVATED_BIAS)
@@ -202,6 +226,9 @@ static void kernfs_drain(struct kernfs_node *kn)
lock_acquired(&kn->dep_map, _RET_IP_);
rwsem_release(&kn->dep_map, 1, _RET_IP_);
}
+
+ mutex_lock(&kernfs_mutex);
+ return true;
}
/**
@@ -447,49 +474,6 @@ int kernfs_add_one(struct kernfs_addrm_cxt *acxt, struct kernfs_node *kn,
}
/**
- * kernfs_remove_one - remove kernfs_node from parent
- * @acxt: addrm context to use
- * @kn: kernfs_node to be removed
- *
- * Mark @kn removed and drop nlink of parent inode if @kn is a
- * directory. @kn is unlinked from the children list.
- *
- * This function should be called between calls to
- * kernfs_addrm_start() and kernfs_addrm_finish() and should be
- * passed the same @acxt as passed to kernfs_addrm_start().
- *
- * LOCKING:
- * Determined by kernfs_addrm_start().
- */
-static void kernfs_remove_one(struct kernfs_addrm_cxt *acxt,
- struct kernfs_node *kn)
-{
- struct kernfs_iattrs *ps_iattr;
-
- /*
- * Removal can be called multiple times on the same node. Only the
- * first invocation is effective and puts the base ref.
- */
- if (atomic_read(&kn->active) < 0)
- return;
-
- if (kn->parent) {
- kernfs_unlink_sibling(kn);
-
- /* Update timestamps on the parent */
- ps_iattr = kn->parent->iattr;
- if (ps_iattr) {
- ps_iattr->ia_iattr.ia_ctime = CURRENT_TIME;
- ps_iattr->ia_iattr.ia_mtime = CURRENT_TIME;
- }
- }
-
- atomic_add(KN_DEACTIVATED_BIAS, &kn->active);
- kn->u.removed_list = acxt->removed;
- acxt->removed = kn;
-}
-
-/**
* kernfs_addrm_finish - finish up kernfs_node add/remove
* @acxt: addrm context to finish up
*
@@ -512,7 +496,6 @@ void kernfs_addrm_finish(struct kernfs_addrm_cxt *acxt)
acxt->removed = kn->u.removed_list;
- kernfs_drain(kn);
kernfs_unmap_bin_file(kn);
kernfs_put(kn);
}
@@ -822,23 +805,73 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
return pos->parent;
}
+static void __kernfs_deactivate(struct kernfs_node *kn)
+{
+ struct kernfs_node *pos;
+
+ lockdep_assert_held(&kernfs_mutex);
+
+ /* prevent any new usage under @kn by deactivating all nodes */
+ pos = NULL;
+ while ((pos = kernfs_next_descendant_post(pos, kn))) {
+ if (atomic_read(&pos->active) >= 0) {
+ atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
+ pos->flags |= KERNFS_JUST_DEACTIVATED;
+ }
+ }
+
+ /*
+ * Drain the subtree. If kernfs_drain() blocked to drain, which is
+ * indicated by %true return, it temporarily released kernfs_mutex
+ * and the rbtree might have been modified inbetween breaking our
+ * future walk. Restart the walk after each %true return.
+ */
+ pos = NULL;
+ while ((pos = kernfs_next_descendant_post(pos, kn))) {
+ bool drained;
+
+ kernfs_get(pos);
+ drained = kernfs_drain(pos);
+ kernfs_put(pos);
+ if (drained)
+ pos = NULL;
+ }
+}
+
static void __kernfs_remove(struct kernfs_addrm_cxt *acxt,
struct kernfs_node *kn)
{
- struct kernfs_node *pos, *next;
+ struct kernfs_node *pos;
+
+ lockdep_assert_held(&kernfs_mutex);
if (!kn)
return;
pr_debug("kernfs %s: removing\n", kn->name);
- next = NULL;
+ __kernfs_deactivate(kn);
+
+ /* unlink the subtree node-by-node */
do {
- pos = next;
- next = kernfs_next_descendant_post(pos, kn);
- if (pos)
- kernfs_remove_one(acxt, pos);
- } while (next);
+ struct kernfs_iattrs *ps_iattr;
+
+ pos = kernfs_leftmost_descendant(kn);
+
+ if (pos->parent) {
+ kernfs_unlink_sibling(pos);
+
+ /* update timestamps on the parent */
+ ps_iattr = pos->parent->iattr;
+ if (ps_iattr) {
+ ps_iattr->ia_iattr.ia_ctime = CURRENT_TIME;
+ ps_iattr->ia_iattr.ia_mtime = CURRENT_TIME;
+ }
+ }
+
+ pos->u.removed_list = acxt->removed;
+ acxt->removed = pos;
+ } while (pos != kn);
}
/**
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 289d4f6..61bd7ae 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -37,6 +37,7 @@ enum kernfs_node_type {
#define KERNFS_FLAG_MASK ~KERNFS_TYPE_MASK
enum kernfs_node_flag {
+ KERNFS_JUST_DEACTIVATED = 0x0010, /* used to aid lockdep annotation */
KERNFS_NS = 0x0020,
KERNFS_HAS_SEQ_SHOW = 0x0040,
KERNFS_HAS_MMAP = 0x0080,
--
1.8.4.2
next prev parent reply other threads:[~2014-01-10 13:47 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-10 13:46 [PATCHSET v2 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
2014-01-10 13:46 ` [PATCH 01/14] kernfs: fix get_active failure handling in kernfs_seq_*() Tejun Heo
2014-01-10 13:46 ` [PATCH 02/14] kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq Tejun Heo
2014-01-10 13:46 ` [PATCH 03/14] kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() Tejun Heo
2014-01-10 13:46 ` [PATCH 04/14] kernfs: remove KERNFS_REMOVED Tejun Heo
2014-01-10 13:46 ` Tejun Heo [this message]
2014-01-10 13:46 ` [PATCH 06/14] kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove() Tejun Heo
2014-01-10 13:46 ` [PATCH 07/14] kernfs: remove kernfs_addrm_cxt Tejun Heo
2014-01-10 13:46 ` [PATCH 08/14] kernfs: make kernfs_get_active() block if the node is deactivated but not removed Tejun Heo
2014-01-10 13:46 ` [PATCH 09/14] kernfs: implement kernfs_{de|re}activate[_self]() Tejun Heo
2014-01-10 13:46 ` [PATCH 10/14] kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers Tejun Heo
2014-01-10 13:46 ` [PATCH 11/14] pci: use device_remove_file_self() instead of device_schedule_callback() Tejun Heo
2014-01-10 13:46 ` [PATCH 12/14] scsi: " Tejun Heo
2014-01-10 13:46 ` [PATCH 13/14] s390: " Tejun Heo
2014-01-10 13:47 ` [PATCH 14/14] sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() Tejun Heo
2014-01-10 13:54 ` [PATCHSET v2 driver-core-next] kernfs, sysfs, driver-core: implement synchronous self-removal Tejun Heo
2014-01-10 15:46 ` Alan Stern
2014-01-11 18:51 ` Tejun Heo
2014-01-11 20:19 ` Greg KH
2014-01-11 22:52 ` Alan Stern
2014-01-11 23:37 ` Greg KH
-- strict thread matches above, loose matches on Subject: below --
2014-01-10 13:57 [PATCHSET v3 " Tejun Heo
2014-01-10 13:57 ` [PATCH 05/14] kernfs: restructure removal path to fix possible premature return Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1389361620-5086-6-git-send-email-tj@kernel.org \
--to=tj@kernel.org \
--cc=JBottomley@parallels.com \
--cc=bhelgaas@google.com \
--cc=gregkh@linuxfoundation.org \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=schwidefsky@de.ibm.com \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox