* [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode
@ 2016-04-06 9:22 Dave Chinner
2016-04-06 9:22 ` [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
` (5 more replies)
0 siblings, 6 replies; 20+ messages in thread
From: Dave Chinner @ 2016-04-06 9:22 UTC (permalink / raw)
To: xfs
Hi folks,
There is a problem that RHEL QE tripped over on a long-running
fsstress test on a RHEL 6.6 kernel. Brian did all the hard work of
working out the initial cause of the GPF that was being tripped, but
had not yet worked out how to fix the issues around xfs_inode_free().
The code is the same as the current upstream code, so the problem
still exists....
The first patch fixes an obvious (now!) bug in xfs_iflush_cluster
where it checks the wrong inode after lookup for validity. It still
kind-of works, because the correct inode number is used for the "are
we still in the right cluster" check, so it's not quite a hole the
size of a truck. Still something that should not have slipped
through 6 years ago and not been discovered until now...
The most important patch (#4) addresses the use-after-free issues that
the xfs inode has w.r.t. RCU freeing and the lookup that
xfs_iflush_cluster is doing under the rcu_read_lock. All the
structures accessed under RCU context need to be freed after the
current RCU grace period expires, as RCU lookups may attempt to
access them at any time during the grace period. Hence we have to
move the freeing into the RCU callback so that we don't free them
prematurely.
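As a rough userspace sketch of that rule (all names here are invented
for illustration; this is not the kernel code), the point is that
nothing an RCU reader can reach through the inode may be freed before
the deferred callback runs:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins, not the real XFS structures. */
struct fake_log_item { int lid_flags; };

struct fake_inode {
	unsigned long i_ino;
	struct fake_log_item *i_itemp;	/* dynamically attached structure */
};

/*
 * Models what runs after the RCU grace period expires (i.e. what
 * call_rcu() would invoke). Freeing the attached log item here,
 * rather than before queueing, means a concurrent RCU lookup can
 * never see a dangling ip->i_itemp pointer.
 */
static void fake_inode_free_callback(struct fake_inode *ip)
{
	free(ip->i_itemp);
	ip->i_itemp = NULL;
	/* the inode itself would be freed here too */
}

/* Returns 1 if the log item was still intact when "queued", 0 if not. */
static int fake_inode_free(struct fake_inode *ip)
{
	/*
	 * Nothing is freed here: readers inside the current grace
	 * period must still be able to dereference ip->i_itemp.
	 */
	int intact = (ip->i_itemp != NULL);

	fake_inode_free_callback(ip);	/* models the deferred call_rcu() */
	return intact;
}
```

The real patch makes the same structural move: the per-fork and log
item freeing migrates from xfs_inode_free() into the RCU callback.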
The rest of the patches are defensive in that they make
xfs_iflush_cluster only act on relevant inodes and guarantee
detection of inodes that are in the process of being freed.
While these aren't absolutely necessary, it seems silly to ignore
these obvious issues while I'm fixing up other issues with the same
code.
There's some more detail on the fixes in the commit descriptions.
Brian, I've only run this through xfstests, so I have no real idea
if it fixes the problem fsstress has uncovered. AIUI it takes 3 or 4
days to reproduce the issue so this is kind of a pre-emptive strike
on what I think is the underlying issue based on your description
and commentary. I figured having code to explain the problems would
save some time while you sleep....
Comments, thoughts, testing and flames all welcome...
-Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
@ 2016-04-06 9:22 ` Dave Chinner
2016-04-06 12:51 ` Brian Foster
2016-04-07 15:47 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 2/5] xfs: skip stale inodes " Dave Chinner
` (4 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2016-04-06 9:22 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Some careless idiot wrote crap code in commit 1a3e8f3 ("xfs: convert
inode cache lookups to use RCU locking") back in late 2010, and so
xfs_iflush_cluster checks the wrong inode for whether it is still
valid under RCU protection. Fix it to lock and check the correct
inode.
Part of the reason for the screwup was the unconventional naming of
the cluster inode variable - iq - so also rename all the cluster
inode variables to use more conventional prefixes to reduce
potential future confusion (cilist, cilist_size, cip).
Discovered-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 64 +++++++++++++++++++++++++++---------------------------
1 file changed, 32 insertions(+), 32 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f79ea59..2718d10 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3149,16 +3149,16 @@ out_release_wip:
STATIC int
xfs_iflush_cluster(
- xfs_inode_t *ip,
- xfs_buf_t *bp)
+ struct xfs_inode *ip,
+ struct xfs_buf *bp)
{
- xfs_mount_t *mp = ip->i_mount;
+ struct xfs_mount *mp = ip->i_mount;
struct xfs_perag *pag;
unsigned long first_index, mask;
unsigned long inodes_per_cluster;
- int ilist_size;
- xfs_inode_t **ilist;
- xfs_inode_t *iq;
+ int cilist_size;
+ struct xfs_inode **cilist;
+ struct xfs_inode *cip;
int nr_found;
int clcount = 0;
int bufwasdelwri;
@@ -3167,23 +3167,23 @@ xfs_iflush_cluster(
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
inodes_per_cluster = mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog;
- ilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
- ilist = kmem_alloc(ilist_size, KM_MAYFAIL|KM_NOFS);
- if (!ilist)
+ cilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
+ cilist = kmem_alloc(cilist_size, KM_MAYFAIL|KM_NOFS);
+ if (!cilist)
goto out_put;
mask = ~(((mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog)) - 1);
first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
rcu_read_lock();
/* really need a gang lookup range call here */
- nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
+ nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)cilist,
first_index, inodes_per_cluster);
if (nr_found == 0)
goto out_free;
for (i = 0; i < nr_found; i++) {
- iq = ilist[i];
- if (iq == ip)
+ cip = cilist[i];
+ if (cip == ip)
continue;
/*
@@ -3192,20 +3192,20 @@ xfs_iflush_cluster(
* We need to check under the i_flags_lock for a valid inode
* here. Skip it if it is not valid or the wrong inode.
*/
- spin_lock(&ip->i_flags_lock);
- if (!ip->i_ino ||
- (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
- spin_unlock(&ip->i_flags_lock);
+ spin_lock(&cip->i_flags_lock);
+ if (!cip->i_ino ||
+ (XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
+ spin_unlock(&cip->i_flags_lock);
continue;
}
- spin_unlock(&ip->i_flags_lock);
+ spin_unlock(&cip->i_flags_lock);
/*
* Do an un-protected check to see if the inode is dirty and
* is a candidate for flushing. These checks will be repeated
* later after the appropriate locks are acquired.
*/
- if (xfs_inode_clean(iq) && xfs_ipincount(iq) == 0)
+ if (xfs_inode_clean(cip) && xfs_ipincount(cip) == 0)
continue;
/*
@@ -3213,15 +3213,15 @@ xfs_iflush_cluster(
* then this inode cannot be flushed and is skipped.
*/
- if (!xfs_ilock_nowait(iq, XFS_ILOCK_SHARED))
+ if (!xfs_ilock_nowait(cip, XFS_ILOCK_SHARED))
continue;
- if (!xfs_iflock_nowait(iq)) {
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ if (!xfs_iflock_nowait(cip)) {
+ xfs_iunlock(cip, XFS_ILOCK_SHARED);
continue;
}
- if (xfs_ipincount(iq)) {
- xfs_ifunlock(iq);
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ if (xfs_ipincount(cip)) {
+ xfs_ifunlock(cip);
+ xfs_iunlock(cip, XFS_ILOCK_SHARED);
continue;
}
@@ -3229,18 +3229,18 @@ xfs_iflush_cluster(
* arriving here means that this inode can be flushed. First
* re-check that it's dirty before flushing.
*/
- if (!xfs_inode_clean(iq)) {
+ if (!xfs_inode_clean(cip)) {
int error;
- error = xfs_iflush_int(iq, bp);
+ error = xfs_iflush_int(cip, bp);
if (error) {
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ xfs_iunlock(cip, XFS_ILOCK_SHARED);
goto cluster_corrupt_out;
}
clcount++;
} else {
- xfs_ifunlock(iq);
+ xfs_ifunlock(cip);
}
- xfs_iunlock(iq, XFS_ILOCK_SHARED);
+ xfs_iunlock(cip, XFS_ILOCK_SHARED);
}
if (clcount) {
@@ -3250,7 +3250,7 @@ xfs_iflush_cluster(
out_free:
rcu_read_unlock();
- kmem_free(ilist);
+ kmem_free(cilist);
out_put:
xfs_perag_put(pag);
return 0;
@@ -3293,8 +3293,8 @@ cluster_corrupt_out:
/*
* Unlocks the flush lock
*/
- xfs_iflush_abort(iq, false);
- kmem_free(ilist);
+ xfs_iflush_abort(cip, false);
+ kmem_free(cilist);
xfs_perag_put(pag);
return -EFSCORRUPTED;
}
--
2.7.0
* [PATCH 2/5] xfs: skip stale inodes in xfs_iflush_cluster
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
2016-04-06 9:22 ` [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
@ 2016-04-06 9:22 ` Dave Chinner
2016-04-06 12:51 ` Brian Foster
2016-04-07 15:47 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 3/5] xfs: xfs_iflush_cluster has range issues Dave Chinner
` (3 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2016-04-06 9:22 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
We don't write back stale inodes, so we should skip them in
xfs_iflush_cluster, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2718d10..6598104 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3190,10 +3190,11 @@ xfs_iflush_cluster(
* because this is an RCU protected lookup, we could find a
* recently freed or even reallocated inode during the lookup.
* We need to check under the i_flags_lock for a valid inode
- * here. Skip it if it is not valid or the wrong inode.
+ * here. Skip it if it is not valid, stale or the wrong inode.
*/
spin_lock(&cip->i_flags_lock);
if (!cip->i_ino ||
+ __xfs_iflags_test(ip, XFS_ISTALE) ||
(XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
spin_unlock(&cip->i_flags_lock);
continue;
--
2.7.0
* [PATCH 3/5] xfs: xfs_iflush_cluster has range issues
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
2016-04-06 9:22 ` [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
2016-04-06 9:22 ` [PATCH 2/5] xfs: skip stale inodes " Dave Chinner
@ 2016-04-06 9:22 ` Dave Chinner
2016-04-06 12:52 ` Brian Foster
2016-04-07 15:48 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
` (2 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2016-04-06 9:22 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
xfs_iflush_cluster() does a gang lookup on the radix tree, meaning
it can find inodes beyond the current cluster if there is sparse
cache population. Gang lookups return results in ascending index
order, so stop trying to cluster inodes once the first inode outside
the cluster mask is detected.
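The stop condition can be sketched in plain userspace C (the helper
name and the inode numbers are made up for illustration):

```c
#include <stddef.h>

/*
 * Gang lookup results arrive in ascending inode-number order, so the
 * first result whose masked agino falls outside the cluster means all
 * later results are outside it too: break instead of continue.
 */
static int count_cluster_members(const unsigned long *aginos, size_t nr,
				 unsigned long first_index, unsigned long mask)
{
	int clcount = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if ((aginos[i] & mask) != first_index)
			break;	/* sorted: no later entry can match */
		clcount++;
	}
	return clcount;
}
```

With a 32-inode cluster mask, a lookup that returned aginos
{64, 65, 70, 96, 97} for first_index 64 would stop at 96 rather than
scanning the tail of the list.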
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 6598104..b984be4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3194,11 +3194,20 @@ xfs_iflush_cluster(
*/
spin_lock(&cip->i_flags_lock);
if (!cip->i_ino ||
- __xfs_iflags_test(ip, XFS_ISTALE) ||
- (XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
+ __xfs_iflags_test(ip, XFS_ISTALE)) {
spin_unlock(&cip->i_flags_lock);
continue;
}
+
+ /*
+ * Once we fall off the end of the cluster, no point checking
+ * any more inodes in the list because they will also all be
+ * outside the cluster.
+ */
+ if ((XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
+ spin_unlock(&cip->i_flags_lock);
+ break;
+ }
spin_unlock(&cip->i_flags_lock);
/*
--
2.7.0
* [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
` (2 preceding siblings ...)
2016-04-06 9:22 ` [PATCH 3/5] xfs: xfs_iflush_cluster has range issues Dave Chinner
@ 2016-04-06 9:22 ` Dave Chinner
2016-04-07 15:50 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 5/5] xfs: mark reclaimed inodes invalid earlier Dave Chinner
2016-04-06 12:52 ` [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Brian Foster
5 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-04-06 9:22 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The xfs_inode freed in xfs_inode_free() has multiple allocated
structures attached to it. We free these in xfs_inode_free() before
we mark the inode as invalid, and before we run call_rcu() to queue
the structure for freeing.
Unfortunately, this freeing can race with other accesses that are in
the RCU current grace period that have found the inode in the radix
tree with a valid state. This includes xfs_iflush_cluster(), which
calls xfs_inode_clean(), and that accesses the inode log item on the
xfs_inode.
The log item structure is freed in xfs_inode_free(), so there is the
possibility we can be accessing freed memory in xfs_iflush_cluster()
after validating the xfs_inode structure as being valid for this RCU
context. Hence we can get spuriously incorrect clean state returned
from such checks. This can lead to us thinking the inode is dirty
when it is, in fact, clean, and so incorrectly attaching it to the
buffer for IO and completion processing.
This then leads to use-after-free situations on the xfs_inode itself
if the IO completes after the current RCU grace period expires. The
buffer callbacks will access the xfs_inode and try to do all sorts
of things it shouldn't with freed memory.
IOWs, xfs_iflush_cluster() only works correctly when racing with
inode reclaim if the inode log item is present and correctly stating
the inode is clean. If the inode is being freed, then reclaim has
already made sure the inode is clean, and hence xfs_iflush_cluster
can skip it. However, we are accessing the inode under RCU
read lock protection and so also must ensure that all dynamically
allocated memory we reference in this context is not freed until the
RCU grace period expires.
To fix this, move all the potential memory freeing into
xfs_inode_free_callback() so that RCU protected lookup code is
guaranteed to always have the memory structures it needs available
during the RCU grace period in which lookup races can occur.
Discovered-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_icache.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index bf2d607..0c94cde 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -94,13 +94,6 @@ xfs_inode_free_callback(
struct inode *inode = container_of(head, struct inode, i_rcu);
struct xfs_inode *ip = XFS_I(inode);
- kmem_zone_free(xfs_inode_zone, ip);
-}
-
-void
-xfs_inode_free(
- struct xfs_inode *ip)
-{
switch (VFS_I(ip)->i_mode & S_IFMT) {
case S_IFREG:
case S_IFDIR:
@@ -118,6 +111,13 @@ xfs_inode_free(
ip->i_itemp = NULL;
}
+ kmem_zone_free(xfs_inode_zone, ip);
+}
+
+void
+xfs_inode_free(
+ struct xfs_inode *ip)
+{
/*
* Because we use RCU freeing we need to ensure the inode always
* appears to be reclaimed with an invalid inode number when in the
--
2.7.0
* [PATCH 5/5] xfs: mark reclaimed inodes invalid earlier
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
` (3 preceding siblings ...)
2016-04-06 9:22 ` [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
@ 2016-04-06 9:22 ` Dave Chinner
2016-04-06 12:52 ` [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Brian Foster
5 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2016-04-06 9:22 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The last thing we do before using call_rcu() on an xfs_inode to be
freed is mark it as invalid. This means there is a window between
when we know for certain that the inode is going to be freed and
when we do actually mark it as "freed".
This is important in the context of RCU lookups - we can look up the
inode, find that it is valid, and then use it as such not realising
that it is in the final stages of being freed.
As such, mark the inode as being invalid the moment we know it is
going to be reclaimed. This can be done while we still hold the
XFS_ILOCK_EXCL and the flush lock in xfs_reclaim_inode(), meaning that
it occurs well before we remove it from the radix tree, and that
the i_flags_lock, the XFS_ILOCK and the inode flush lock all act as
synchronisation points for detecting that an inode is about to go
away.
For defensive purposes, this allows us to add a further check to
xfs_iflush_cluster to ensure we skip inodes that are being freed
after we grab the XFS_ILOCK_SHARED and the flush lock. We know that
if the inode number is valid while we have these locks held, the
inode has not progressed through reclaim to the point where it is
clean and about to be freed.
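The ordering this relies on can be sketched as a toy userspace model
(the zeroed inode number stands in for the full locking dance; every
name here is illustrative, not the kernel API):

```c
#include <assert.h>

struct toy_inode {
	unsigned long i_ino;	/* zeroed when the inode is being freed */
};

/* What reclaim now does while still holding the ILOCK and flush lock. */
static void toy_reclaim_invalidate(struct toy_inode *ip)
{
	ip->i_ino = 0;
}

/*
 * What the cluster flush does after winning both trylocks: a recheck
 * of i_ino under the locks. Because reclaim zeroed i_ino before it
 * dropped those locks, a zero here reliably means "skip this inode".
 * Returns 1 if the inode may be flushed, 0 if it must be skipped.
 */
static int toy_flush_recheck(const struct toy_inode *ip)
{
	return ip->i_ino != 0;
}
```

In the real patch the recheck sits between the successful
xfs_iflock_nowait()/xfs_ipincount() checks and the call to
xfs_iflush_int(), mirroring this skip-if-zero shape.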
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_icache.c | 40 ++++++++++++++++++++++++++++++++--------
fs/xfs/xfs_inode.c | 14 +++++++++++++-
2 files changed, 45 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0c94cde..a60db43 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -114,6 +114,18 @@ xfs_inode_free_callback(
kmem_zone_free(xfs_inode_zone, ip);
}
+static void
+__xfs_inode_free(
+ struct xfs_inode *ip)
+{
+ /* asserts to verify all state is correct here */
+ ASSERT(atomic_read(&ip->i_pincount) == 0);
+ ASSERT(!xfs_isiflocked(ip));
+ XFS_STATS_DEC(ip->i_mount, vn_active);
+
+ call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback);
+}
+
void
xfs_inode_free(
struct xfs_inode *ip)
@@ -129,12 +141,7 @@ xfs_inode_free(
ip->i_ino = 0;
spin_unlock(&ip->i_flags_lock);
- /* asserts to verify all state is correct here */
- ASSERT(atomic_read(&ip->i_pincount) == 0);
- ASSERT(!xfs_isiflocked(ip));
- XFS_STATS_DEC(ip->i_mount, vn_active);
-
- call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback);
+ __xfs_inode_free(ip);
}
/*
@@ -929,6 +936,7 @@ xfs_reclaim_inode(
int sync_mode)
{
struct xfs_buf *bp = NULL;
+ xfs_ino_t ino = ip->i_ino; /* for radix_tree_delete */
int error;
restart:
@@ -993,6 +1001,22 @@ restart:
xfs_iflock(ip);
reclaim:
+ /*
+ * Because we use RCU freeing we need to ensure the inode always appears
+ * to be reclaimed with an invalid inode number when in the free state.
+ * We do this as early as possible under the ILOCK and flush lock so
+ * that xfs_iflush_cluster() can be guaranteed to detect races with us
+ * here. By doing this, we guarantee that once xfs_iflush_cluster has
+ * locked both the XFS_ILOCK and the flush lock that it will see either
+ * a valid, flushable inode that will serialise correctly against the
+ * locks below, or it will see a clean (and invalid) inode that it can
+ * skip.
+ */
+ spin_lock(&ip->i_flags_lock);
+ ip->i_flags = XFS_IRECLAIM;
+ ip->i_ino = 0;
+ spin_unlock(&ip->i_flags_lock);
+
xfs_ifunlock(ip);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
@@ -1006,7 +1030,7 @@ reclaim:
*/
spin_lock(&pag->pag_ici_lock);
if (!radix_tree_delete(&pag->pag_ici_root,
- XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino)))
+ XFS_INO_TO_AGINO(ip->i_mount, ino)))
ASSERT(0);
__xfs_inode_clear_reclaim(pag, ip);
spin_unlock(&pag->pag_ici_lock);
@@ -1023,7 +1047,7 @@ reclaim:
xfs_qm_dqdetach(ip);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- xfs_inode_free(ip);
+ __xfs_inode_free(ip);
return error;
out_ifunlock:
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b984be4..5b84bbc 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3222,7 +3222,6 @@ xfs_iflush_cluster(
* Try to get locks. If any are unavailable or it is pinned,
* then this inode cannot be flushed and is skipped.
*/
-
if (!xfs_ilock_nowait(cip, XFS_ILOCK_SHARED))
continue;
if (!xfs_iflock_nowait(cip)) {
@@ -3235,6 +3234,19 @@ xfs_iflush_cluster(
continue;
}
+
+ /*
+ * Check the inode number again, just to be certain we are not
+ * racing with freeing in xfs_reclaim_inode(). See the comments
+ * in that function for more information as to why the initial
+ * check is not sufficient.
+ */
+ if (!cip->i_ino) {
+ xfs_ifunlock(cip);
+ xfs_iunlock(cip, XFS_ILOCK_SHARED);
+ continue;
+ }
+
/*
* arriving here means that this inode can be flushed. First
* re-check that it's dirty before flushing.
--
2.7.0
* Re: [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster
2016-04-06 9:22 ` [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
@ 2016-04-06 12:51 ` Brian Foster
2016-04-07 15:47 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Brian Foster @ 2016-04-06 12:51 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Apr 06, 2016 at 07:22:50PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Some careless idiot wrote crap code in commit 1a3e8f3 ("xfs: convert
> inode cache lookups to use RCU locking") back in late 2010, and so
> xfs_iflush_cluster checks the wrong inode for whether it is still
> valid under RCU protection. Fix it to lock and check the correct
> inode.
>
> Part of the reason for the screwup was the unconventional naming of
> the cluster inode variable - iq - so also rename all the cluster
> inode variables to use a more conventional prefixes to reduce
> potential future confusion (cilist, cilist_size, cip).
>
> Discovered-by: Brain Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Heh, the variable name changes are probably a good idea. It's very easy
to misread between iq/ip. I'm not sure how many times I read through
xfs_iflush_cluster() before I realized it wasn't doing what I thought it
was doing, but it was at least once or twice. ;)
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_inode.c | 64 +++++++++++++++++++++++++++---------------------------
> 1 file changed, 32 insertions(+), 32 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f79ea59..2718d10 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3149,16 +3149,16 @@ out_release_wip:
>
> STATIC int
> xfs_iflush_cluster(
> - xfs_inode_t *ip,
> - xfs_buf_t *bp)
> + struct xfs_inode *ip,
> + struct xfs_buf *bp)
> {
> - xfs_mount_t *mp = ip->i_mount;
> + struct xfs_mount *mp = ip->i_mount;
> struct xfs_perag *pag;
> unsigned long first_index, mask;
> unsigned long inodes_per_cluster;
> - int ilist_size;
> - xfs_inode_t **ilist;
> - xfs_inode_t *iq;
> + int cilist_size;
> + struct xfs_inode **cilist;
> + struct xfs_inode *cip;
> int nr_found;
> int clcount = 0;
> int bufwasdelwri;
> @@ -3167,23 +3167,23 @@ xfs_iflush_cluster(
> pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
>
> inodes_per_cluster = mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog;
> - ilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
> - ilist = kmem_alloc(ilist_size, KM_MAYFAIL|KM_NOFS);
> - if (!ilist)
> + cilist_size = inodes_per_cluster * sizeof(xfs_inode_t *);
> + cilist = kmem_alloc(cilist_size, KM_MAYFAIL|KM_NOFS);
> + if (!cilist)
> goto out_put;
>
> mask = ~(((mp->m_inode_cluster_size >> mp->m_sb.sb_inodelog)) - 1);
> first_index = XFS_INO_TO_AGINO(mp, ip->i_ino) & mask;
> rcu_read_lock();
> /* really need a gang lookup range call here */
> - nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)ilist,
> + nr_found = radix_tree_gang_lookup(&pag->pag_ici_root, (void**)cilist,
> first_index, inodes_per_cluster);
> if (nr_found == 0)
> goto out_free;
>
> for (i = 0; i < nr_found; i++) {
> - iq = ilist[i];
> - if (iq == ip)
> + cip = cilist[i];
> + if (cip == ip)
> continue;
>
> /*
> @@ -3192,20 +3192,20 @@ xfs_iflush_cluster(
> * We need to check under the i_flags_lock for a valid inode
> * here. Skip it if it is not valid or the wrong inode.
> */
> - spin_lock(&ip->i_flags_lock);
> - if (!ip->i_ino ||
> - (XFS_INO_TO_AGINO(mp, iq->i_ino) & mask) != first_index) {
> - spin_unlock(&ip->i_flags_lock);
> + spin_lock(&cip->i_flags_lock);
> + if (!cip->i_ino ||
> + (XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
> + spin_unlock(&cip->i_flags_lock);
> continue;
> }
> - spin_unlock(&ip->i_flags_lock);
> + spin_unlock(&cip->i_flags_lock);
>
> /*
> * Do an un-protected check to see if the inode is dirty and
> * is a candidate for flushing. These checks will be repeated
> * later after the appropriate locks are acquired.
> */
> - if (xfs_inode_clean(iq) && xfs_ipincount(iq) == 0)
> + if (xfs_inode_clean(cip) && xfs_ipincount(cip) == 0)
> continue;
>
> /*
> @@ -3213,15 +3213,15 @@ xfs_iflush_cluster(
> * then this inode cannot be flushed and is skipped.
> */
>
> - if (!xfs_ilock_nowait(iq, XFS_ILOCK_SHARED))
> + if (!xfs_ilock_nowait(cip, XFS_ILOCK_SHARED))
> continue;
> - if (!xfs_iflock_nowait(iq)) {
> - xfs_iunlock(iq, XFS_ILOCK_SHARED);
> + if (!xfs_iflock_nowait(cip)) {
> + xfs_iunlock(cip, XFS_ILOCK_SHARED);
> continue;
> }
> - if (xfs_ipincount(iq)) {
> - xfs_ifunlock(iq);
> - xfs_iunlock(iq, XFS_ILOCK_SHARED);
> + if (xfs_ipincount(cip)) {
> + xfs_ifunlock(cip);
> + xfs_iunlock(cip, XFS_ILOCK_SHARED);
> continue;
> }
>
> @@ -3229,18 +3229,18 @@ xfs_iflush_cluster(
> * arriving here means that this inode can be flushed. First
> * re-check that it's dirty before flushing.
> */
> - if (!xfs_inode_clean(iq)) {
> + if (!xfs_inode_clean(cip)) {
> int error;
> - error = xfs_iflush_int(iq, bp);
> + error = xfs_iflush_int(cip, bp);
> if (error) {
> - xfs_iunlock(iq, XFS_ILOCK_SHARED);
> + xfs_iunlock(cip, XFS_ILOCK_SHARED);
> goto cluster_corrupt_out;
> }
> clcount++;
> } else {
> - xfs_ifunlock(iq);
> + xfs_ifunlock(cip);
> }
> - xfs_iunlock(iq, XFS_ILOCK_SHARED);
> + xfs_iunlock(cip, XFS_ILOCK_SHARED);
> }
>
> if (clcount) {
> @@ -3250,7 +3250,7 @@ xfs_iflush_cluster(
>
> out_free:
> rcu_read_unlock();
> - kmem_free(ilist);
> + kmem_free(cilist);
> out_put:
> xfs_perag_put(pag);
> return 0;
> @@ -3293,8 +3293,8 @@ cluster_corrupt_out:
> /*
> * Unlocks the flush lock
> */
> - xfs_iflush_abort(iq, false);
> - kmem_free(ilist);
> + xfs_iflush_abort(cip, false);
> + kmem_free(cilist);
> xfs_perag_put(pag);
> return -EFSCORRUPTED;
> }
> --
> 2.7.0
>
* Re: [PATCH 2/5] xfs: skip stale inodes in xfs_iflush_cluster
2016-04-06 9:22 ` [PATCH 2/5] xfs: skip stale inodes " Dave Chinner
@ 2016-04-06 12:51 ` Brian Foster
2016-04-07 15:47 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Brian Foster @ 2016-04-06 12:51 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Apr 06, 2016 at 07:22:51PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> We don't write back stale inodes, so we should skip them in
> xfS_iflush_cluster, too.
xfs_iflush_cluster
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_inode.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 2718d10..6598104 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3190,10 +3190,11 @@ xfs_iflush_cluster(
> * because this is an RCU protected lookup, we could find a
> * recently freed or even reallocated inode during the lookup.
> * We need to check under the i_flags_lock for a valid inode
> - * here. Skip it if it is not valid or the wrong inode.
> + * here. Skip it if it is not valid, stale or the wrong inode.
> */
> spin_lock(&cip->i_flags_lock);
> if (!cip->i_ino ||
> + __xfs_iflags_test(ip, XFS_ISTALE) ||
> (XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
> spin_unlock(&cip->i_flags_lock);
> continue;
> --
> 2.7.0
>
* Re: [PATCH 3/5] xfs: xfs_iflush_cluster has range issues
2016-04-06 9:22 ` [PATCH 3/5] xfs: xfs_iflush_cluster has range issues Dave Chinner
@ 2016-04-06 12:52 ` Brian Foster
2016-04-07 15:48 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Brian Foster @ 2016-04-06 12:52 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Apr 06, 2016 at 07:22:52PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_iflush_cluster() does a gang lookup on the radix tree, meaning
> it can find inode beyond the current cluster if there is sparse
inodes
> cache population. gang lookups return results in ascending index
> order, so stop trying to cluster iodes once the first inode outside
inodes
> the cluster mask is detected.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_inode.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 6598104..b984be4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3194,11 +3194,20 @@ xfs_iflush_cluster(
> */
> spin_lock(&cip->i_flags_lock);
> if (!cip->i_ino ||
> - __xfs_iflags_test(ip, XFS_ISTALE) ||
> - (XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
> + __xfs_iflags_test(ip, XFS_ISTALE)) {
> spin_unlock(&cip->i_flags_lock);
> continue;
> }
> +
> + /*
> + * Once we fall off the end of the cluster, no point checking
> + * any more inodes in the list because they will also all be
> + * outside the cluster.
> + */
> + if ((XFS_INO_TO_AGINO(mp, cip->i_ino) & mask) != first_index) {
> + spin_unlock(&cip->i_flags_lock);
> + break;
> + }
> spin_unlock(&cip->i_flags_lock);
>
> /*
> --
> 2.7.0
>
* Re: [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
` (4 preceding siblings ...)
2016-04-06 9:22 ` [PATCH 5/5] xfs: mark reclaimed inodes invalid earlier Dave Chinner
@ 2016-04-06 12:52 ` Brian Foster
5 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2016-04-06 12:52 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Apr 06, 2016 at 07:22:49PM +1000, Dave Chinner wrote:
> Hi folks,
>
> There is a problem that RHEL QE tripped over on a long-running
> fsstress test on a RHEL 6.6. Brian did all the hard work of working
> out the initial cause of the GPF that was being tripped, but
> had not yet got past how to fix the issues around xfs_free_inode.
> The code is the same as the current upstream code, so the problem
> still exists....
>
> The first patch fixes an obvious (now!) bug in xfs_iflush_cluster
> where it checks the wrong inode after lookup for validity. It still
> kind-of works, because the correct inode number is used for the "are
> we still in the right cluster" check, so it's not quite a hole the
> size of a truck. Still something that should not have slipped
> through 6 years ago and not been discovered until now...
>
> The most important patch (#4) addresses the use-after-free issues
> that the xfs inode has w.r.t. RCU freeing and the lookup that
> xfs_iflush_cluster is doing under the rcu_read_lock. All the
> structures accessed under RCU context need to be freed after the
> current RCU grace period expires, as RCU lookups may attempt to
> access them at any time during the grace period. Hence we have to
> move their freeing into the RCU callback so that we don't free them
> prematurely.
>
> The rest of the patches are defensive in that they make
> xfs_iflush_cluster only act on relevant inodes and be able to
> guarantee detection of inodes that are in the process of being freed.
> While these aren't absolutely necessary, it seems silly to ignore
> these obvious issues while I'm fixing up other issues with the same
> code.
>
> There's some more detail on the fixes in the commit descriptions.
>
> Brian, I've only run this through xfstests, so I have no real idea
> if it fixes the problem fsstress has uncovered. AIUI it takes 3 or 4
> days to reproduce the issue so this is kind of a pre-emptive strike
> on what I think is the underlying issue based on your description
> and commentary. I figured having code to explain the problems would
> save some time while you sleep....
>
> Comments, thoughts, testing and flames all welcome...
>
Thanks.. it looks mostly clean and logical to me on a first pass
through. I've reviewed the first three, which look straightforward, and
I'll follow up on the last couple once I've run through testing and had
a closer look.
To migrate some of the discussion over to the list... what I'm testing
now is an XFS_IRECLAIM flag check (and inode skip) on the inode in the
cluster flush path. This has run clean for the better part of yesterday
through overnight. While the test requires a few days to verify, the
hacked up kernel I've been testing with usually detects this a bit
sooner (within 6-8 hours or so), so I'm fairly confident that test at
least corners the problem.
The concern raised with the XFS_IRECLAIM check is that of performance
(skipping an inode flush when we don't need to). I'm not totally
convinced that's much of a problem considering how rare this situation
is. The best/only reproducer we have takes a couple days on a debug
kernel, for a bug that's been present for several years. The concern I
had with this approach was more that both codepaths seemed to have
trylocks or checks that cause the reclaim or flush to "back off", and I
didn't want to create a situation where neither path would actually do
its job (potentially repeatedly) for a particular inode.
So that approach seemed cleaner than the alternative I was considering,
but the thought of pushing the i_itemp deallocation to the RCU callback
also crossed my mind later on yesterday after writing up my last report
on the bug. This series appears to move all of the deallocation (incl.
fork destruction) to the RCU cb, but that seems reasonable to me as
well. I think relying on RCU is probably the cleanest/most correct
approach so far, so I'll kill the currently running test, pull these
back to the rhel kernel and see how that goes.
Note that I haven't even attempted to reproduce this on an upstream
kernel simply because it's so difficult to reproduce. It's not even
consistently reproducible with the original kernel on different
hardware. I agree that the problem likely exists upstream, but I'm going
to continue to use the original kernel for verification because that
hw/kernel has provided a consistent reproducer...
Brian
> -Dave.
>
* Re: [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster
2016-04-06 9:22 ` [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
2016-04-06 12:51 ` Brian Foster
@ 2016-04-07 15:47 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-04-07 15:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Apr 06, 2016 at 07:22:50PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Some careless idiot wrote crap code in commit 1a3e8f3 ("xfs: convert
> inode cache lookups to use RCU locking") back in late 2010, and so
> xfs_iflush_cluster checks the wrong inode for whether it is still
> valid under RCU protection. Fix it to lock and check the correct
> inode.
>
> Part of the reason for the screwup was the unconventional naming of
> the cluster inode variable - iq - so also rename all the cluster
> inode variables to use more conventional prefixes to reduce
> potential future confusion (cilist, cilist_size, cip).
Between all that renaming I didn't actually manage to find the fix :)
Can you split the rename and the fix into separate patches?
* Re: [PATCH 2/5] xfs: skip stale inodes in xfs_iflush_cluster
2016-04-06 9:22 ` [PATCH 2/5] xfs: skip stale inodes " Dave Chinner
2016-04-06 12:51 ` Brian Foster
@ 2016-04-07 15:47 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-04-07 15:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 3/5] xfs: xfs_iflush_cluster has range issues
2016-04-06 9:22 ` [PATCH 3/5] xfs: xfs_iflush_cluster has range issues Dave Chinner
2016-04-06 12:52 ` Brian Foster
@ 2016-04-07 15:48 ` Christoph Hellwig
1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-04-07 15:48 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
2016-04-06 9:22 ` [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
@ 2016-04-07 15:50 ` Christoph Hellwig
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-04-07 15:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
* RE: [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
@ 2016-04-12 8:56 Alex Lyakas
2016-04-12 11:29 ` Brent Bice
2016-04-12 23:29 ` Dave Chinner
0 siblings, 2 replies; 20+ messages in thread
From: Alex Lyakas @ 2016-04-12 8:56 UTC (permalink / raw)
To: xfs; +Cc: bbice, Shyam Kaushik
Hello Dave,
Looking at the patch, I see that now we call xfs_idestroy_fork() in RCU callback. This can do the following chain:
xfs_iext_destroy => xfs_iext_irec_remove => xfs_iext_realloc_indirect => kmem_realloc => kmem_alloc => kmem_alloc => congestion_wait()
At least according to documentation, the RCU callback cannot block, since it may be called from softirq context. Is this fine?
Thanks,
Alex.
P.S.: I have submitted a patch called “[PATCH] xfs: optimize destruction of the indirection array”, which changes xfs_iext_destroy to call only kmem_free(). But this patch got stuck in the spam filter of the mailing list, and Brent is working to release it from there.
* Re: [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
2016-04-12 8:56 Alex Lyakas
@ 2016-04-12 11:29 ` Brent Bice
2016-04-12 21:11 ` Dave Chinner
2016-04-12 23:29 ` Dave Chinner
1 sibling, 1 reply; 20+ messages in thread
From: Brent Bice @ 2016-04-12 11:29 UTC (permalink / raw)
To: Alex Lyakas, xfs; +Cc: Shyam Kaushik
Could you resend it to be sure? I flagged it for redelivery the day
I emailed you so if nobody else has seen it yet then something still
didn't work. I was pretty sure I'd seen it get delivered ok to oss,
but... (shrug)
Brent
On 04/12/2016 02:56 AM, Alex Lyakas wrote:
> Hello Dave,
> Looking at the patch, I see that now we call xfs_idestroy_fork() in RCU
> callback. This can do the following chain:
> xfs_iext_destroy => xfs_iext_irec_remove => xfs_iext_realloc_indirect =>
> kmem_realloc => kmem_alloc => kmem_alloc => congestion_wait()
> At least according to documentation, the RCU callback cannot block,
> since it may be called from softirq context. Is this fine?
> Thanks,
> Alex.
> P.S.: I have submitted a patch called “[PATCH] xfs: optimize destruction
> of the indirection array”, which changes xfs_iext_destroy to call only
> kmem_free(). But this patch got stuck in the spam filter of the mailing
> list, and Brent is working to release it from there.
>
>
>
* Re: [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
2016-04-12 11:29 ` Brent Bice
@ 2016-04-12 21:11 ` Dave Chinner
2016-04-12 21:55 ` Brent Bice
0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2016-04-12 21:11 UTC (permalink / raw)
To: Brent Bice; +Cc: Shyam Kaushik, Alex Lyakas, xfs
On Tue, Apr 12, 2016 at 05:29:15AM -0600, Brent Bice wrote:
> Could you resend it to be sure? I flagged it for redelivery the
> day I emailed you so if nobody else has seen it yet then something
> still didn't work. I was pretty sure I'd seen it get delivered ok
> to oss, but... (shrug)
No, it didn't, and it's not in the archives, either.
It seems that sgi.com is rejecting quite a bit of legitimate mail
lately. I've seen a couple of people in the past couple of weeks
send stuff that has reached my private inboxes but not the list.
Now this one, too.
If someone is having to micro-manage the list to prevent rejections
of valid email then that is bad...
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
2016-04-12 21:11 ` Dave Chinner
@ 2016-04-12 21:55 ` Brent Bice
0 siblings, 0 replies; 20+ messages in thread
From: Brent Bice @ 2016-04-12 21:55 UTC (permalink / raw)
To: Dave Chinner; +Cc: Shyam Kaushik, Alex Lyakas, xfs
On 04/12/2016 03:11 PM, Dave Chinner wrote:
> It seems that sgi.com is rejecting quite a bit of legitimate mail
> lately. I've seen a couple of people in the past couple of weeks
> send stuff that has reached my private in boxes but not the list.
> Now this one, too.
Yeah, this one wasn't the anti-spam filters that I manage (the
barracudas, our private RBL, etc) but spamassassin on oss.sgi.com.
So... Dunno. I took a crack at whitelisting Alex's from address in
spamassassin (though I haven't looked at its configs or man pages in
about 10 years).
> If someone is having to micro-manage the list to prevent rejections
> of valid email then that is bad...
So true. I only happened to notice because I had someone (I forget who
it was several years ago) tweak SA on oss so that rejections get
forwarded to me. The idea was I'd look at those (stuff that the cudas
missed but spamassassin caught) and use that info to update our filters
(the filters that benefit everyone at SGI, not just oss).
Anyway, I don't always have the bandwidth to look at every email SA
on oss rejects and update spam filters (if possible) and just happened
to notice this one and that it wasn't spam.
I am a lot more aggressive with the filters for oss because it's
such a sore-thumb-target for spammers. For instance, there are large
swaths of yahoo IP space being blocked now (for oss only) because over
the years I only found one legit sender to oss from a yahoo IP - the
rest was all spam/phish. But I've only dropped in yahoo IP ranges that
I was seeing spam from. I've gotten aggressive with some Chinese IP
space too for similar reasons. If any of the senders you know of having
trouble might be in yahoo or chinese IP space let me know and I can look
in the logs, but I'd bet a doughnut that if they didn't get a bounce
then they're running afoul of spamassassin 'n not the filters I manage. :-(
There was talk a while back of taking oss out of SGI and hosting it
somewhere in "Da cloud!" (tm) and having someone else in the xfs
community own it, wasn't there? Anyone still looking into that?
Brent
* Re: [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe
2016-04-12 8:56 Alex Lyakas
2016-04-12 11:29 ` Brent Bice
@ 2016-04-12 23:29 ` Dave Chinner
1 sibling, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2016-04-12 23:29 UTC (permalink / raw)
To: Alex Lyakas; +Cc: bbice, Shyam Kaushik, xfs
On Tue, Apr 12, 2016 at 11:56:35AM +0300, Alex Lyakas wrote:
> Hello Dave,
>
> Looking at the patch, I see that now we call xfs_idestroy_fork() in RCU callback. This can do the following chain:
>
> xfs_iext_destroy => xfs_iext_irec_remove => xfs_iext_realloc_indirect => kmem_realloc => kmem_alloc => kmem_alloc => congestion_wait()
>
> At least according to documentation, the RCU callback cannot block, since it may be called from softirq context. Is this fine?
Right, I forgot about that. Too many forests. I'll reconstruct your
patch from the email you appended it to previously and add that to
the series to test against.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
end of thread, other threads:[~2016-04-12 23:29 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-06 9:22 [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Dave Chinner
2016-04-06 9:22 ` [PATCH 1/5] xfs: fix inode validity check in xfs_iflush_cluster Dave Chinner
2016-04-06 12:51 ` Brian Foster
2016-04-07 15:47 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 2/5] xfs: skip stale inodes " Dave Chinner
2016-04-06 12:51 ` Brian Foster
2016-04-07 15:47 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 3/5] xfs: xfs_iflush_cluster has range issues Dave Chinner
2016-04-06 12:52 ` Brian Foster
2016-04-07 15:48 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe Dave Chinner
2016-04-07 15:50 ` Christoph Hellwig
2016-04-06 9:22 ` [PATCH 5/5] xfs: mark reclaimed inodes invalid earlier Dave Chinner
2016-04-06 12:52 ` [PATCH 0/5] xfs; xfs_iflush_cluster vs xfs_reclaim_inode Brian Foster
-- strict thread matches above, loose matches on Subject: below --
2016-04-12 8:55 [PATCH 4/5] xfs: xfs_inode_free() isn't RCU safe Alex Lyakas
2016-04-12 8:56 Alex Lyakas
2016-04-12 11:29 ` Brent Bice
2016-04-12 21:11 ` Dave Chinner
2016-04-12 21:55 ` Brent Bice
2016-04-12 23:29 ` Dave Chinner