* Review: Reduce in-core superblock lock contention near ENOSPC
@ 2006-11-23 4:41 David Chinner
2006-11-30 18:03 ` Lachlan McIlroy
0 siblings, 1 reply; 11+ messages in thread
From: David Chinner @ 2006-11-23 4:41 UTC (permalink / raw)
To: xfs-dev; +Cc: xfs
The existing per-cpu superblock counter code still uses the existing
superblock spin lock when we approach ENOSPC for global
synchronisation. On larger machines than this code was originally
tested on we've found that we can still get catastrophic spinlock
contention as we near ENOSPC due to the frequency of rebalances
increasing.
The following patch prevents the case of tens of CPUs spinning on
the incore superblock lock as rebalances fly around. This is done
mainly by introducing a sleeping lock that is used to serialise
balances and modifications near ENOSPC. While this does not prevent
contention and serialisation, it prevents us from wasting the CPU
time of potentially hundreds of CPUs.
This patch also reduces the number of balances occurring by
separating the "need rebalance" case from the "slow allocate" case.
Previously, when a per-cpu counter ran dry, we would lock the
superblock, disable the counters, rebalance and retry the
modification. If we failed a second time we'd disable the per-cpu
counter and use the global slowpath.
However, while we had the counters disabled, other threads could
enter the slow path and sit waiting on the global superblock
lock to do a rebalance. IOWs, near ENOSPC we can end up with
lots of CPUs waiting to do a rebalance and then executing a
rebalance even though it is not necessary.
Now, a counter running dry will trigger a rebalance during which
counters are disabled. Any thread that sees a disabled counter
enters a different path where it waits on the new mutex. When it
gets the new mutex, it checks if the counter is disabled. If the
counter is disabled, then we _know_ that we have to use the global
counter and lock and it is safe to do so immediately. Otherwise, we
drop the mutex and go back to trying the per-cpu counters which we
know were re-enabled.
IOWs, we only do a single rebalance for each counter that runs dry
and we don't get a stampeding herd of rebalances on large CPU count
machines and the subsequent problems that spinlock contention will
cause.
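The disabled-counter slow path described above can be sketched in user space. This is a minimal illustration of the pattern, not the kernel code: `counter_disabled` stands in for the per-counter disabled bit and `icsb_mutex` for m_icsb_mutex, and both names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Stand-ins for the per-counter disabled state and m_icsb_mutex. */
static pthread_mutex_t icsb_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int counter_disabled;

/* Returns 1 if the caller must fall back to the global counter, or 0 if
 * the counter was re-enabled while we slept on the mutex and the per-cpu
 * fast path should simply be retried. */
static int slow_path_decide(void)
{
	pthread_mutex_lock(&icsb_mutex);
	if (!atomic_load(&counter_disabled)) {
		/* Someone beat us to the rebalance: retry the fast path. */
		pthread_mutex_unlock(&icsb_mutex);
		return 0;
	}
	/* Still disabled, and no rebalance can be in progress while we
	 * hold the mutex, so it is safe to modify the global counter
	 * here before dropping the lock. */
	pthread_mutex_unlock(&icsb_mutex);
	return 1;
}
```

The key property is that the disabled bit is only set or cleared under the mutex, so the re-check after acquiring it is authoritative.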
This patch also fixes a rebalance loop that can occur when we try to
reserve more than any per-cpu counter holds while the aggregate free
space is sufficient for a rebalance to always redistribute the space
across the per-cpu counters. It does so by ensuring that the minimum
amount on each counter as a result of a rebalance is sufficient to
satisfy the request, or it falls back to the global slow path
earlier than it otherwise would.
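The threshold check that breaks this loop can be sketched as follows. This is an illustrative stand-alone function under assumed names, not the kernel implementation: it distributes the aggregate count evenly across CPUs (the kernel uses do_div()) and only allows the per-cpu counters to be re-enabled if each CPU's share covers both the re-enable threshold and the minimum the caller needs.

```c
#include <stdint.h>

/* Hypothetical sketch of the rebalance threshold check: refuse to
 * re-enable per-cpu counters unless each CPU would receive at least
 * max(min_per_cpu, reenable_threshold) units. */
static int can_reenable(uint64_t total, unsigned int ncpus,
			uint64_t min_per_cpu, uint64_t reenable_threshold)
{
	uint64_t per_cpu = total / ncpus;	/* do_div() in the kernel */
	uint64_t floor = min_per_cpu > reenable_threshold
				? min_per_cpu : reenable_threshold;
	return per_cpu >= floor;
}
```

When this returns false, the caller falls back to the global slow path instead of rebalancing again, which is what terminates the loop.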
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/xfs_mount.c | 234 +++++++++++++++++++++++++++++++----------------------
fs/xfs/xfs_mount.h | 1
2 files changed, 142 insertions(+), 93 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2006-10-19 10:29:35.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2006-10-19 10:32:16.626827226 +1000
@@ -52,21 +52,19 @@ STATIC void xfs_unmountfs_wait(xfs_mount
#ifdef HAVE_PERCPU_SB
STATIC void xfs_icsb_destroy_counters(xfs_mount_t *);
-STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, int);
+STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, int,
+int);
STATIC void xfs_icsb_sync_counters(xfs_mount_t *);
STATIC int xfs_icsb_modify_counters(xfs_mount_t *, xfs_sb_field_t,
int64_t, int);
-STATIC int xfs_icsb_modify_counters_locked(xfs_mount_t *, xfs_sb_field_t,
- int64_t, int);
STATIC int xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t);
#else
#define xfs_icsb_destroy_counters(mp) do { } while (0)
-#define xfs_icsb_balance_counter(mp, a, b) do { } while (0)
+#define xfs_icsb_balance_counter(mp, a, b, c) do { } while (0)
#define xfs_icsb_sync_counters(mp) do { } while (0)
#define xfs_icsb_modify_counters(mp, a, b, c) do { } while (0)
-#define xfs_icsb_modify_counters_locked(mp, a, b, c) do { } while (0)
#endif
@@ -540,9 +538,11 @@ xfs_readsb(xfs_mount_t *mp, int flags)
ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
}
- xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0);
+ mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0, 0);
+ mutex_unlock(&mp->m_icsb_mutex);
mp->m_sb_bp = bp;
xfs_buf_relse(bp);
@@ -1479,9 +1479,11 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
case XFS_SBS_IFREE:
case XFS_SBS_FDBLOCKS:
if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
- status = xfs_icsb_modify_counters_locked(mp,
+ XFS_SB_UNLOCK(mp, s);
+ status = xfs_icsb_modify_counters(mp,
msbp->msb_field,
msbp->msb_delta, rsvd);
+ s = XFS_SB_LOCK(mp);
break;
}
/* FALLTHROUGH */
@@ -1515,11 +1517,12 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
case XFS_SBS_IFREE:
case XFS_SBS_FDBLOCKS:
if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
- status =
- xfs_icsb_modify_counters_locked(mp,
+ XFS_SB_UNLOCK(mp, s);
+ status = xfs_icsb_modify_counters(mp,
msbp->msb_field,
-(msbp->msb_delta),
rsvd);
+ s = XFS_SB_LOCK(mp);
break;
}
/* FALLTHROUGH */
@@ -1727,14 +1730,17 @@ xfs_icsb_cpu_notify(
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
break;
case CPU_ONLINE:
- xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0);
+ mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0, 0);
+ mutex_unlock(&mp->m_icsb_mutex);
break;
case CPU_DEAD:
/* Disable all the counters, then fold the dead cpu's
* count into the total on the global superblock and
* re-enable the counters. */
+ mutex_lock(&mp->m_icsb_mutex);
s = XFS_SB_LOCK(mp);
xfs_icsb_disable_counter(mp, XFS_SBS_ICOUNT);
xfs_icsb_disable_counter(mp, XFS_SBS_IFREE);
@@ -1746,10 +1752,14 @@ xfs_icsb_cpu_notify(
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
- xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, XFS_ICSB_SB_LOCKED);
- xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, XFS_ICSB_SB_LOCKED);
- xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, XFS_ICSB_SB_LOCKED);
+ xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT,
+ XFS_ICSB_SB_LOCKED, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_IFREE,
+ XFS_ICSB_SB_LOCKED, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS,
+ XFS_ICSB_SB_LOCKED, 0);
XFS_SB_UNLOCK(mp, s);
+ mutex_unlock(&mp->m_icsb_mutex);
break;
}
@@ -1778,6 +1788,9 @@ xfs_icsb_init_counters(
cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
}
+
+ mutex_init(&mp->m_icsb_mutex);
+
/*
* start with all counters disabled so that the
* initial balance kicks us off correctly
@@ -1882,6 +1895,17 @@ xfs_icsb_disable_counter(
ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS));
+ /*
+ * If we are already disabled, then there is nothing to do
+ * here. We check before locking all the counters to avoid
+ * the expensive lock operation when being called in the
+ * slow path and the counter is already disabled. This is
+ * safe because the only time we set or clear this state is under
+ * the m_icsb_mutex.
+ */
+ if (xfs_icsb_counter_disabled(mp, field))
+ return 0;
+
xfs_icsb_lock_all_counters(mp);
if (!test_and_set_bit(field, &mp->m_icsb_counters)) {
/* drain back to superblock */
@@ -1991,24 +2015,33 @@ xfs_icsb_sync_counters_lazy(
/*
* Balance and enable/disable counters as necessary.
*
- * Thresholds for re-enabling counters are somewhat magic.
- * inode counts are chosen to be the same number as single
- * on disk allocation chunk per CPU, and free blocks is
- * something far enough zero that we aren't going thrash
- * when we get near ENOSPC.
+ * Thresholds for re-enabling counters are somewhat magic. inode counts are
+ * chosen to be the same number as single on disk allocation chunk per CPU, and
+ * free blocks is something far enough zero that we aren't going thrash when we
+ * get near ENOSPC. We also need to supply a minimum we require per cpu to
+ * prevent looping endlessly when xfs_alloc_space asks for more than will
+ * be distributed to a single CPU but each CPU has enough blocks to be
+ * reenabled.
+ *
+ * Note that we can be called when counters are already disabled.
+ * xfs_icsb_disable_counter() optimises the counter locking in this case to
+ * prevent locking every per-cpu counter needlessly.
*/
-#define XFS_ICSB_INO_CNTR_REENABLE 64
+
+#define XFS_ICSB_INO_CNTR_REENABLE (uint64_t)64
#define XFS_ICSB_FDBLK_CNTR_REENABLE(mp) \
- (512 + XFS_ALLOC_SET_ASIDE(mp))
+ (uint64_t)(512 + XFS_ALLOC_SET_ASIDE(mp))
STATIC void
xfs_icsb_balance_counter(
xfs_mount_t *mp,
xfs_sb_field_t field,
- int flags)
+ int flags,
+ int min_per_cpu)
{
uint64_t count, resid;
int weight = num_online_cpus();
int s;
+ uint64_t min = (uint64_t)min_per_cpu;
if (!(flags & XFS_ICSB_SB_LOCKED))
s = XFS_SB_LOCK(mp);
@@ -2021,19 +2054,19 @@ xfs_icsb_balance_counter(
case XFS_SBS_ICOUNT:
count = mp->m_sb.sb_icount;
resid = do_div(count, weight);
- if (count < XFS_ICSB_INO_CNTR_REENABLE)
+ if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE))
goto out;
break;
case XFS_SBS_IFREE:
count = mp->m_sb.sb_ifree;
resid = do_div(count, weight);
- if (count < XFS_ICSB_INO_CNTR_REENABLE)
+ if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE))
goto out;
break;
case XFS_SBS_FDBLOCKS:
count = mp->m_sb.sb_fdblocks;
resid = do_div(count, weight);
- if (count < XFS_ICSB_FDBLK_CNTR_REENABLE(mp))
+ if (count < max(min, XFS_ICSB_FDBLK_CNTR_REENABLE(mp)))
goto out;
break;
default:
@@ -2048,32 +2081,39 @@ out:
XFS_SB_UNLOCK(mp, s);
}
-STATIC int
-xfs_icsb_modify_counters_int(
+int
+xfs_icsb_modify_counters(
xfs_mount_t *mp,
xfs_sb_field_t field,
int64_t delta,
- int rsvd,
- int flags)
+ int rsvd)
{
xfs_icsb_cnts_t *icsbp;
long long lcounter; /* long counter for 64 bit fields */
- int cpu, s, locked = 0;
- int ret = 0, balance_done = 0;
+ int cpu, ret = 0, s;
+ might_sleep();
again:
cpu = get_cpu();
- icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu),
- xfs_icsb_lock_cntr(icsbp);
+ icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu);
+
+ /*
+ * if the counter is disabled, go to slow path
+ */
if (unlikely(xfs_icsb_counter_disabled(mp, field)))
goto slow_path;
+ xfs_icsb_lock_cntr(icsbp);
+ if (unlikely(xfs_icsb_counter_disabled(mp, field))) {
+ xfs_icsb_unlock_cntr(icsbp);
+ goto slow_path;
+ }
switch (field) {
case XFS_SBS_ICOUNT:
lcounter = icsbp->icsb_icount;
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_icount = lcounter;
break;
@@ -2081,7 +2121,7 @@ again:
lcounter = icsbp->icsb_ifree;
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_ifree = lcounter;
break;
@@ -2091,7 +2131,7 @@ again:
lcounter = icsbp->icsb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp);
break;
default:
@@ -2100,72 +2140,80 @@ again:
}
xfs_icsb_unlock_cntr(icsbp);
put_cpu();
- if (locked)
- XFS_SB_UNLOCK(mp, s);
return 0;
- /*
- * The slow path needs to be run with the SBLOCK
- * held so that we prevent other threads from
- * attempting to run this path at the same time.
- * this provides exclusion for the balancing code,
- * and exclusive fallback if the balance does not
- * provide enough resources to continue in an unlocked
- * manner.
- */
slow_path:
- xfs_icsb_unlock_cntr(icsbp);
put_cpu();
- /* need to hold superblock incase we need
- * to disable a counter */
- if (!(flags & XFS_ICSB_SB_LOCKED)) {
- s = XFS_SB_LOCK(mp);
- locked = 1;
- flags |= XFS_ICSB_SB_LOCKED;
- }
- if (!balance_done) {
- xfs_icsb_balance_counter(mp, field, flags);
- balance_done = 1;
+ /*
+ * serialise with a mutex so we don't burn lots of cpu on
+ * the superblock lock. We still need to hold the superblock
+ * lock, however, when we modify the global structures.
+ */
+ mutex_lock(&mp->m_icsb_mutex);
+
+ /*
+ * Now running atomically.
+ *
+ * If the counter is enabled, someone has beaten us to rebalancing.
+ * Drop the lock and try again in the fast path....
+ */
+ if (!(xfs_icsb_counter_disabled(mp, field))) {
+ mutex_unlock(&mp->m_icsb_mutex);
goto again;
- } else {
- /*
- * we might not have enough on this local
- * cpu to allocate for a bulk request.
- * We need to drain this field from all CPUs
- * and disable the counter fastpath
- */
- xfs_icsb_disable_counter(mp, field);
}
+ /*
+ * The counter is currently disabled. Because we are
+ * running atomically here, we know a rebalance cannot
+ * be in progress. Hence we can go straight to operating
+ * on the global superblock. We do not call xfs_mod_incore_sb()
+ * here even though we need to get the SB_LOCK. Doing so
+ * will cause us to re-enter this function and deadlock.
+ * Hence we get the SB_LOCK ourselves and then call
+ * xfs_mod_incore_sb_unlocked() as the unlocked path operates
+ * directly on the global counters.
+ */
+ s = XFS_SB_LOCK(mp);
ret = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd);
+ XFS_SB_UNLOCK(mp, s);
- if (locked)
- XFS_SB_UNLOCK(mp, s);
+ /*
+ * Now that we've modified the global superblock, we
+ * may be able to re-enable the distributed counters
+ * (e.g. lots of space just got freed). After that
+ * we are done.
+ */
+ if (ret != ENOSPC)
+ xfs_icsb_balance_counter(mp, field, 0, 0);
+ mutex_unlock(&mp->m_icsb_mutex);
return ret;
-}
-STATIC int
-xfs_icsb_modify_counters(
- xfs_mount_t *mp,
- xfs_sb_field_t field,
- int64_t delta,
- int rsvd)
-{
- return xfs_icsb_modify_counters_int(mp, field, delta, rsvd, 0);
-}
+balance_counter:
+ xfs_icsb_unlock_cntr(icsbp);
+ put_cpu();
-/*
- * Called when superblock is already locked
- */
-STATIC int
-xfs_icsb_modify_counters_locked(
- xfs_mount_t *mp,
- xfs_sb_field_t field,
- int64_t delta,
- int rsvd)
-{
- return xfs_icsb_modify_counters_int(mp, field, delta,
- rsvd, XFS_ICSB_SB_LOCKED);
+ /*
+ * We may have multiple threads here if multiple per-cpu
+ * counters run dry at the same time. This will mean we can
+ * do more balances than strictly necessary but it is not
+ * the common slowpath case.
+ */
+ mutex_lock(&mp->m_icsb_mutex);
+
+ /*
+ * running atomically.
+ *
+ * This will leave the counter in the correct state for future
+ * accesses. After the rebalance, we simply try again but with the
+ * global superblock lock held. This ensures that the counter state as
+ * a result of the balance does not change and our retry will either
+ * succeed through the fast path or slow path without another balance
+ * operation being required.
+ */
+ xfs_icsb_balance_counter(mp, field, 0, delta);
+ mutex_unlock(&mp->m_icsb_mutex);
+ goto again;
}
+
#endif
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2006-10-19 10:25:12.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2006-10-19 10:32:16.626827226 +1000
@@ -419,6 +419,7 @@ typedef struct xfs_mount {
xfs_icsb_cnts_t *m_sb_cnts; /* per-cpu superblock counters */
unsigned long m_icsb_counters; /* disabled per-cpu counters */
struct notifier_block m_icsb_notifier; /* hotplug cpu notifier */
+ struct mutex m_icsb_mutex; /* balancer sync lock */
#endif
} xfs_mount_t;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-11-23 4:41 Review: Reduce in-core superblock lock contention near ENOSPC David Chinner
@ 2006-11-30 18:03 ` Lachlan McIlroy
2006-11-30 22:38 ` David Chinner
0 siblings, 1 reply; 11+ messages in thread
From: Lachlan McIlroy @ 2006-11-30 18:03 UTC (permalink / raw)
To: David Chinner; +Cc: xfs-dev, xfs
Dave,
Could you have changed the SB_LOCK from a spinlock to a blocking
mutex and have achieved a similar effect?
Has this change had much testing on a large machine?
These changes wouldn't apply cleanly to tot (3 hunks failed in
xfs_mount.c) but I couldn't see why.
The changes look fine to me, couple of comments below.
Lachlan
@@ -1479,9 +1479,11 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
case XFS_SBS_IFREE:
case XFS_SBS_FDBLOCKS:
if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
- status = xfs_icsb_modify_counters_locked(mp,
+ XFS_SB_UNLOCK(mp, s);
+ status = xfs_icsb_modify_counters(mp,
msbp->msb_field,
msbp->msb_delta, rsvd);
+ s = XFS_SB_LOCK(mp);
break;
}
/* FALLTHROUGH */
Is it safe to be releasing the SB_LOCK? Is it assumed that the
superblock wont change while we process the list of xfs_mod_sb
structures?
@@ -1515,11 +1517,12 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
case XFS_SBS_IFREE:
case XFS_SBS_FDBLOCKS:
if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
- status =
- xfs_icsb_modify_counters_locked(mp,
+ XFS_SB_UNLOCK(mp, s);
+ status = xfs_icsb_modify_counters(mp,
msbp->msb_field,
-(msbp->msb_delta),
rsvd);
+ s = XFS_SB_LOCK(mp);
break;
}
/* FALLTHROUGH */
Same as above.
@@ -1882,6 +1895,17 @@ xfs_icsb_disable_counter(
ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS));
+ /*
+ * If we are already disabled, then there is nothing to do
+ * here. We check before locking all the counters to avoid
+ * the expensive lock operation when being called in the
+ * slow path and the counter is already disabled. This is
+ * safe because the only time we set or clear this state is under
+ * the m_icsb_mutex.
+ */
+ if (xfs_icsb_counter_disabled(mp, field))
+ return 0;
+
xfs_icsb_lock_all_counters(mp);
if (!test_and_set_bit(field, &mp->m_icsb_counters)) {
/* drain back to superblock */
Nice one, that will avoid a lot of unnecessary work.
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-11-30 18:03 ` Lachlan McIlroy
@ 2006-11-30 22:38 ` David Chinner
2006-12-01 0:41 ` David Chinner
2006-12-01 19:22 ` Lachlan McIlroy
0 siblings, 2 replies; 11+ messages in thread
From: David Chinner @ 2006-11-30 22:38 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: David Chinner, xfs-dev, xfs
On Thu, Nov 30, 2006 at 06:03:40PM +0000, Lachlan McIlroy wrote:
> Dave,
>
> Could you have changed the SB_LOCK from a spinlock to a blocking
> mutex and have achieved a similar effect?
Sort of - it would still be inefficient and wouldn't help solve the
underlying causes of contention. Also, everything else that uses
the SB_LOCK would now have a sleep point where there wasn't one
previously. If we are nesting the SB_LOCK somewhere else inside
another spinlock (not sure if we are) then we can't sleep. I'd
prefer not to change the semantics of such a lock if I can avoid it.
I think the slow path code is somewhat clearer with a separate
mutex - it clearly documents the serialisation barrier that
the slow path uses and allows us to do slow path checks on the
per-cpu counters without needing the SB_LOCK.
It also means that in future, we can slowly remove the need for
holding the SB_LOCK across the entire rebalance operation and only
use it when referencing the global superblock fields during
the rebalance.
If the need arises, it also means we can move to a mutex per counter
so we can independently rebalance different types of counters at the
same time (which we can't do right now).
> Has this change had much testing on a large machine?
8p is the largest I've run it on (junkbond) and it's been ENOSPC
tested on a 2.7GB/s filesystem (junkbond once again) as well
as on single, slow disks.
I've tried and tried to get the people that reported the problem to
test this fix but no luck so far (this bug has been open for months
and most of that time has been me waiting for someone to run a
test). I've basically got sick of waiting and I just want to
move this on. It's already too late for sles10sp1 because of
the lack of response.
> These changes wouldn't apply cleanly to tot (3 hunks failed in
> xfs_mount.c) but I couldn't see why.
Whitespace issue? Try setting:
$ export QUILT_PATCH_OPTS="--ignore-whitespace"
I'll apply the patch to a separate tree and see if I hit the same
problem....
> The changes look fine to me, couple of comments below.
>
> Lachlan
>
>
> @@ -1479,9 +1479,11 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
> case XFS_SBS_IFREE:
> case XFS_SBS_FDBLOCKS:
> if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
> - status = xfs_icsb_modify_counters_locked(mp,
> + XFS_SB_UNLOCK(mp, s);
> + status = xfs_icsb_modify_counters(mp,
> msbp->msb_field,
> msbp->msb_delta,
> rsvd);
> + s = XFS_SB_LOCK(mp);
> break;
> }
> /* FALLTHROUGH */
>
> Is it safe to be releasing the SB_LOCK?
Yes.
> Is it assumed that the
> superblock wont change while we process the list of xfs_mod_sb
> structures?
No. We are applying deltas - it doesn't matter if other deltas are
applied at the same time by other callers because in the end all
the deltas get applied and it adds up to the same thing.
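To illustrate the point with a toy example (plain C, nothing to do with the actual XFS structures): because the updates are deltas, any interleaving that applies the same set of deltas reaches the same final counter value.

```c
#include <stdint.h>

/* Apply a list of signed deltas to a counter. Two interleavings of the
 * same deltas always produce the same result, since addition commutes. */
static int64_t apply_deltas(int64_t counter, const int64_t *deltas, int n)
{
	int i;

	for (i = 0; i < n; i++)
		counter += deltas[i];
	return counter;
}
```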
> @@ -1515,11 +1517,12 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
> case XFS_SBS_IFREE:
> case XFS_SBS_FDBLOCKS:
> if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB))
> {
> - status =
> -
> xfs_icsb_modify_counters_locked(mp,
> + XFS_SB_UNLOCK(mp, s);
> + status = xfs_icsb_modify_counters(mp,
> msbp->msb_field,
> -(msbp->msb_delta),
> rsvd);
> + s = XFS_SB_LOCK(mp);
> break;
> }
> /* FALLTHROUGH */
>
> Same as above.
Ditto ;)
Thanks for looking at this, Lachlan.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-11-30 22:38 ` David Chinner
@ 2006-12-01 0:41 ` David Chinner
2006-12-01 20:12 ` Lachlan McIlroy
2006-12-01 19:22 ` Lachlan McIlroy
1 sibling, 1 reply; 11+ messages in thread
From: David Chinner @ 2006-12-01 0:41 UTC (permalink / raw)
To: David Chinner; +Cc: Lachlan McIlroy, xfs-dev, xfs
On Fri, Dec 01, 2006 at 09:38:11AM +1100, David Chinner wrote:
> On Thu, Nov 30, 2006 at 06:03:40PM +0000, Lachlan McIlroy wrote:
>
> > These changes wouldn't apply cleanly to tot (3 hunks failed in
> > xfs_mount.c) but I couldn't see why.
>
> Whitespace issue? Try setting:
>
> $ export QUILT_PATCH_OPTS="--ignore-whitespace"
>
> I'll apply the patch to a separate tree and see if I hit the same
> problem....
I see the problem - it's the next patch I am going to send out for
review, which is earlier in my series....
The growfs fix changes the delta parameter to xfs_icsb_modify_counters()
from int to int64_t, and that is why the hunks don't apply.
The attached patch should apply (with a 6 line offset to most hunks).
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/xfs_mount.c | 234 +++++++++++++++++++++++++++++++----------------------
fs/xfs/xfs_mount.h | 1
2 files changed, 142 insertions(+), 93 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2006-10-19 10:29:35.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2006-10-19 10:32:16.626827226 +1000
@@ -52,21 +52,19 @@ STATIC void xfs_unmountfs_wait(xfs_mount
#ifdef HAVE_PERCPU_SB
STATIC void xfs_icsb_destroy_counters(xfs_mount_t *);
-STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, int);
+STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, int,
+int);
STATIC void xfs_icsb_sync_counters(xfs_mount_t *);
STATIC int xfs_icsb_modify_counters(xfs_mount_t *, xfs_sb_field_t,
int, int);
-STATIC int xfs_icsb_modify_counters_locked(xfs_mount_t *, xfs_sb_field_t,
- int, int);
STATIC int xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t);
#else
#define xfs_icsb_destroy_counters(mp) do { } while (0)
-#define xfs_icsb_balance_counter(mp, a, b) do { } while (0)
+#define xfs_icsb_balance_counter(mp, a, b, c) do { } while (0)
#define xfs_icsb_sync_counters(mp) do { } while (0)
#define xfs_icsb_modify_counters(mp, a, b, c) do { } while (0)
-#define xfs_icsb_modify_counters_locked(mp, a, b, c) do { } while (0)
#endif
@@ -540,9 +538,11 @@ xfs_readsb(xfs_mount_t *mp, int flags)
ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
}
- xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0);
+ mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0, 0);
+ mutex_unlock(&mp->m_icsb_mutex);
mp->m_sb_bp = bp;
xfs_buf_relse(bp);
@@ -1479,9 +1479,11 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
case XFS_SBS_IFREE:
case XFS_SBS_FDBLOCKS:
if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
- status = xfs_icsb_modify_counters_locked(mp,
+ XFS_SB_UNLOCK(mp, s);
+ status = xfs_icsb_modify_counters(mp,
msbp->msb_field,
msbp->msb_delta, rsvd);
+ s = XFS_SB_LOCK(mp);
break;
}
/* FALLTHROUGH */
@@ -1515,11 +1517,12 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
case XFS_SBS_IFREE:
case XFS_SBS_FDBLOCKS:
if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
- status =
- xfs_icsb_modify_counters_locked(mp,
+ XFS_SB_UNLOCK(mp, s);
+ status = xfs_icsb_modify_counters(mp,
msbp->msb_field,
-(msbp->msb_delta),
rsvd);
+ s = XFS_SB_LOCK(mp);
break;
}
/* FALLTHROUGH */
@@ -1727,14 +1730,17 @@ xfs_icsb_cpu_notify(
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
break;
case CPU_ONLINE:
- xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0);
- xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0);
+ mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0, 0);
+ mutex_unlock(&mp->m_icsb_mutex);
break;
case CPU_DEAD:
/* Disable all the counters, then fold the dead cpu's
* count into the total on the global superblock and
* re-enable the counters. */
+ mutex_lock(&mp->m_icsb_mutex);
s = XFS_SB_LOCK(mp);
xfs_icsb_disable_counter(mp, XFS_SBS_ICOUNT);
xfs_icsb_disable_counter(mp, XFS_SBS_IFREE);
@@ -1746,10 +1752,14 @@ xfs_icsb_cpu_notify(
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
- xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, XFS_ICSB_SB_LOCKED);
- xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, XFS_ICSB_SB_LOCKED);
- xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, XFS_ICSB_SB_LOCKED);
+ xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT,
+ XFS_ICSB_SB_LOCKED, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_IFREE,
+ XFS_ICSB_SB_LOCKED, 0);
+ xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS,
+ XFS_ICSB_SB_LOCKED, 0);
XFS_SB_UNLOCK(mp, s);
+ mutex_unlock(&mp->m_icsb_mutex);
break;
}
@@ -1778,6 +1788,9 @@ xfs_icsb_init_counters(
cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i);
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
}
+
+ mutex_init(&mp->m_icsb_mutex);
+
/*
* start with all counters disabled so that the
* initial balance kicks us off correctly
@@ -1882,6 +1895,17 @@ xfs_icsb_disable_counter(
ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS));
+ /*
+ * If we are already disabled, then there is nothing to do
+ * here. We check before locking all the counters to avoid
+ * the expensive lock operation when being called in the
+ * slow path and the counter is already disabled. This is
+ * safe because the only time we set or clear this state is under
+ * the m_icsb_mutex.
+ */
+ if (xfs_icsb_counter_disabled(mp, field))
+ return 0;
+
xfs_icsb_lock_all_counters(mp);
if (!test_and_set_bit(field, &mp->m_icsb_counters)) {
/* drain back to superblock */
@@ -1991,24 +2015,33 @@ xfs_icsb_sync_counters_lazy(
/*
* Balance and enable/disable counters as necessary.
*
- * Thresholds for re-enabling counters are somewhat magic.
- * inode counts are chosen to be the same number as single
- * on disk allocation chunk per CPU, and free blocks is
- * something far enough zero that we aren't going thrash
- * when we get near ENOSPC.
+ * Thresholds for re-enabling counters are somewhat magic. inode counts are
+ * chosen to be the same number as single on disk allocation chunk per CPU, and
+ * free blocks is something far enough zero that we aren't going thrash when we
+ * get near ENOSPC. We also need to supply a minimum we require per cpu to
+ * prevent looping endlessly when xfs_alloc_space asks for more than will
+ * be distributed to a single CPU but each CPU has enough blocks to be
+ * reenabled.
+ *
+ * Note that we can be called when counters are already disabled.
+ * xfs_icsb_disable_counter() optimises the counter locking in this case to
+ * prevent locking every per-cpu counter needlessly.
*/
-#define XFS_ICSB_INO_CNTR_REENABLE 64
+
+#define XFS_ICSB_INO_CNTR_REENABLE (uint64_t)64
#define XFS_ICSB_FDBLK_CNTR_REENABLE(mp) \
- (512 + XFS_ALLOC_SET_ASIDE(mp))
+ (uint64_t)(512 + XFS_ALLOC_SET_ASIDE(mp))
STATIC void
xfs_icsb_balance_counter(
xfs_mount_t *mp,
xfs_sb_field_t field,
- int flags)
+ int flags,
+ int min_per_cpu)
{
uint64_t count, resid;
int weight = num_online_cpus();
int s;
+ uint64_t min = (uint64_t)min_per_cpu;
if (!(flags & XFS_ICSB_SB_LOCKED))
s = XFS_SB_LOCK(mp);
@@ -2021,19 +2054,19 @@ xfs_icsb_balance_counter(
case XFS_SBS_ICOUNT:
count = mp->m_sb.sb_icount;
resid = do_div(count, weight);
- if (count < XFS_ICSB_INO_CNTR_REENABLE)
+ if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE))
goto out;
break;
case XFS_SBS_IFREE:
count = mp->m_sb.sb_ifree;
resid = do_div(count, weight);
- if (count < XFS_ICSB_INO_CNTR_REENABLE)
+ if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE))
goto out;
break;
case XFS_SBS_FDBLOCKS:
count = mp->m_sb.sb_fdblocks;
resid = do_div(count, weight);
- if (count < XFS_ICSB_FDBLK_CNTR_REENABLE(mp))
+ if (count < max(min, XFS_ICSB_FDBLK_CNTR_REENABLE(mp)))
goto out;
break;
default:
@@ -2048,32 +2081,39 @@ out:
XFS_SB_UNLOCK(mp, s);
}
-STATIC int
-xfs_icsb_modify_counters_int(
+int
+xfs_icsb_modify_counters(
xfs_mount_t *mp,
xfs_sb_field_t field,
int delta,
- int rsvd,
- int flags)
+ int rsvd)
{
xfs_icsb_cnts_t *icsbp;
long long lcounter; /* long counter for 64 bit fields */
- int cpu, s, locked = 0;
- int ret = 0, balance_done = 0;
+ int cpu, ret = 0, s;
+ might_sleep();
again:
cpu = get_cpu();
- icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu),
- xfs_icsb_lock_cntr(icsbp);
+ icsbp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, cpu);
+
+ /*
+ * if the counter is disabled, go to slow path
+ */
if (unlikely(xfs_icsb_counter_disabled(mp, field)))
goto slow_path;
+ xfs_icsb_lock_cntr(icsbp);
+ if (unlikely(xfs_icsb_counter_disabled(mp, field))) {
+ xfs_icsb_unlock_cntr(icsbp);
+ goto slow_path;
+ }
switch (field) {
case XFS_SBS_ICOUNT:
lcounter = icsbp->icsb_icount;
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_icount = lcounter;
break;
@@ -2081,7 +2121,7 @@ again:
lcounter = icsbp->icsb_ifree;
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_ifree = lcounter;
break;
@@ -2091,7 +2131,7 @@ again:
lcounter = icsbp->icsb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp);
break;
default:
@@ -2100,72 +2140,80 @@ again:
}
xfs_icsb_unlock_cntr(icsbp);
put_cpu();
- if (locked)
- XFS_SB_UNLOCK(mp, s);
return 0;
- /*
- * The slow path needs to be run with the SBLOCK
- * held so that we prevent other threads from
- * attempting to run this path at the same time.
- * this provides exclusion for the balancing code,
- * and exclusive fallback if the balance does not
- * provide enough resources to continue in an unlocked
- * manner.
- */
slow_path:
- xfs_icsb_unlock_cntr(icsbp);
put_cpu();
- /* need to hold superblock incase we need
- * to disable a counter */
- if (!(flags & XFS_ICSB_SB_LOCKED)) {
- s = XFS_SB_LOCK(mp);
- locked = 1;
- flags |= XFS_ICSB_SB_LOCKED;
- }
- if (!balance_done) {
- xfs_icsb_balance_counter(mp, field, flags);
- balance_done = 1;
+ /*
+ * serialise with a mutex so we don't burn lots of cpu on
+ * the superblock lock. We still need to hold the superblock
+ * lock, however, when we modify the global structures.
+ */
+ mutex_lock(&mp->m_icsb_mutex);
+
+ /*
+ * Now running atomically.
+ *
+ * If the counter is enabled, someone has beaten us to rebalancing.
+ * Drop the lock and try again in the fast path....
+ */
+ if (!(xfs_icsb_counter_disabled(mp, field))) {
+ mutex_unlock(&mp->m_icsb_mutex);
goto again;
- } else {
- /*
- * we might not have enough on this local
- * cpu to allocate for a bulk request.
- * We need to drain this field from all CPUs
- * and disable the counter fastpath
- */
- xfs_icsb_disable_counter(mp, field);
}
+ /*
+ * The counter is currently disabled. Because we are
+ * running atomically here, we know a rebalance cannot
+ * be in progress. Hence we can go straight to operating
+ * on the global superblock. We do not call xfs_mod_incore_sb()
+ * here even though we need to get the SB_LOCK. Doing so
+ * will cause us to re-enter this function and deadlock.
+ * Hence we get the SB_LOCK ourselves and then call
+ * xfs_mod_incore_sb_unlocked() as the unlocked path operates
+ * directly on the global counters.
+ */
+ s = XFS_SB_LOCK(mp);
ret = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd);
+ XFS_SB_UNLOCK(mp, s);
- if (locked)
- XFS_SB_UNLOCK(mp, s);
+ /*
+ * Now that we've modified the global superblock, we
+ * may be able to re-enable the distributed counters
+ * (e.g. lots of space just got freed). After that
+ * we are done.
+ */
+ if (ret != ENOSPC)
+ xfs_icsb_balance_counter(mp, field, 0, 0);
+ mutex_unlock(&mp->m_icsb_mutex);
return ret;
-}
-STATIC int
-xfs_icsb_modify_counters(
- xfs_mount_t *mp,
- xfs_sb_field_t field,
- int delta,
- int rsvd)
-{
- return xfs_icsb_modify_counters_int(mp, field, delta, rsvd, 0);
-}
+balance_counter:
+ xfs_icsb_unlock_cntr(icsbp);
+ put_cpu();
-/*
- * Called when superblock is already locked
- */
-STATIC int
-xfs_icsb_modify_counters_locked(
- xfs_mount_t *mp,
- xfs_sb_field_t field,
- int delta,
- int rsvd)
-{
- return xfs_icsb_modify_counters_int(mp, field, delta,
- rsvd, XFS_ICSB_SB_LOCKED);
+ /*
+ * We may have multiple threads here if multiple per-cpu
+ * counters run dry at the same time. This will mean we can
+ * do more balances than strictly necessary but it is not
+ * the common slowpath case.
+ */
+ mutex_lock(&mp->m_icsb_mutex);
+
+ /*
+ * running atomically.
+ *
+ * This will leave the counter in the correct state for future
+ * accesses. After the rebalance, we simply try again but with the
+ * global superblock lock held. This ensures that the counter state as
+ * a result of the balance does not change and our retry will either
+ * succeed through the fast path or slow path without another balance
+ * operation being required.
+ */
+ xfs_icsb_balance_counter(mp, field, 0, delta);
+ mutex_unlock(&mp->m_icsb_mutex);
+ goto again;
}
+
#endif
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2006-10-19 10:25:12.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2006-10-19 10:32:16.626827226 +1000
@@ -419,6 +419,7 @@ typedef struct xfs_mount {
xfs_icsb_cnts_t *m_sb_cnts; /* per-cpu superblock counters */
unsigned long m_icsb_counters; /* disabled per-cpu counters */
struct notifier_block m_icsb_notifier; /* hotplug cpu notifier */
+ struct mutex m_icsb_mutex; /* balancer sync lock */
#endif
} xfs_mount_t;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-11-30 22:38 ` David Chinner
2006-12-01 0:41 ` David Chinner
@ 2006-12-01 19:22 ` Lachlan McIlroy
2006-12-03 23:49 ` David Chinner
1 sibling, 1 reply; 11+ messages in thread
From: Lachlan McIlroy @ 2006-12-01 19:22 UTC (permalink / raw)
To: David Chinner; +Cc: xfs-dev, xfs
David Chinner wrote:
> On Thu, Nov 30, 2006 at 06:03:40PM +0000, Lachlan McIlroy wrote:
>
>>Dave,
>>
>>Could you have changed the SB_LOCK from a spinlock to a blocking
>>mutex and have achieved a similar effect?
>
>
> Sort of - it would still be inefficient and wouldn't help solve the
> underlying causes of contention. Also, everything else that uses
> the SB_LOCK would now have a sleep point where there wasn't one
> previously. If we are nesting the SB_LOCK somewhere else inside a
> another spinlock (not sure if we are) then we can't sleep. I'd
> prefer not to change the semantics of such a lock if I can avoid it.
That's fair enough and I can't disagree with you. I think the SB_LOCK
was/is being abused anyway and was used too genericly (if there's such
a word). Using separate locks for specific purposes like you've done
with the new mutex is a great start to cleaning this code up.
>
> I think the slow path code is somewhat clearer with a separate
> mutex - it clearly documents the serialisation barrier that
> the slow path uses and allows us to do slow path checks on the
> per-cpu counters without needing the SB_LOCK.
It's certainly an improvement over the original code.
>
> It also means that in future, we can slowly remove the need for
> holding the SB_LOCK across the entire rebalance operation and only
> use it when referencing the global superblock fields during
> the rebalance.
Sounds good.
>
> If the need arises, it also means we can move to a mutex per counter
> so we can independently rebalance different types of counters at the
> same time (which we can't do right now).
That seems so obvious - I'm surprised we can't do it now.
>
>
>>Has this change had much testing on a large machine?
>
>
> 8p is the largest I've run it on (junkbond) and it's been ENOSPC
> tested on a 2.7GB/s filesystem (junkbond once again) as well
> as on single, slow disks.
>
> I've tried and tried to get the ppl that reported the problem to
> test this fix but no luck so far (this bug has been open for months
> and most of that time has been me waiting for someone to run a
> test). I've basically got sick of waiting and I just want to
> move this on. It's already too late for sles10sp1 because of
> the lack of response.
If it's important to them they'll test it. If the change doesn't fix
their problem then I'm sure we'll hear from them again.
>
>
>>These changes wouldn't apply cleanly to tot (3 hunks failed in
>>xfs_mount.c) but I couldn't see why.
>
>
> Whitespace issue? Try setting:
>
> $ export QUILT_PATCH_OPTS="--ignore-whitespace"
>
> I'll apply the patch to a separate tree and see if I hit the same
> problem....
>
>
>>The changes look fine to me, couple of comments below.
>>
>>Lachlan
>>
>>
>>@@ -1479,9 +1479,11 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
>> case XFS_SBS_IFREE:
>> case XFS_SBS_FDBLOCKS:
>> if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB)) {
>>- status = xfs_icsb_modify_counters_locked(mp,
>>+ XFS_SB_UNLOCK(mp, s);
>>+ status = xfs_icsb_modify_counters(mp,
>> msbp->msb_field,
>> msbp->msb_delta,
>> rsvd);
>>+ s = XFS_SB_LOCK(mp);
>> break;
>> }
>> /* FALLTHROUGH */
>>
>>Is it safe to be releasing the SB_LOCK?
>
>
> Yes.
>
>
>>Is it assumed that the
>>superblock won't change while we process the list of xfs_mod_sb
>>structures?
>
>
> No. We are applying deltas - it doesn't matter if other deltas are
> applied at the same time by other callers because in the end all
> the deltas get applied and it adds up to the same thing.
Okay.
>
>
>>@@ -1515,11 +1517,12 @@ xfs_mod_incore_sb_batch(xfs_mount_t *mp,
>> case XFS_SBS_IFREE:
>> case XFS_SBS_FDBLOCKS:
>> if (!(mp->m_flags & XFS_MOUNT_NO_PERCPU_SB))
>> {
>>- status =
>>-
>>xfs_icsb_modify_counters_locked(mp,
>>+ XFS_SB_UNLOCK(mp, s);
>>+ status = xfs_icsb_modify_counters(mp,
>> msbp->msb_field,
>> -(msbp->msb_delta),
>> rsvd);
>>+ s = XFS_SB_LOCK(mp);
>> break;
>> }
>> /* FALLTHROUGH */
>>
>>Same as above.
>
>
> Ditto ;)
>
> Thanks for looking at this, Lachlan.
>
> Cheers,
>
> Dave.
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-12-01 0:41 ` David Chinner
@ 2006-12-01 20:12 ` Lachlan McIlroy
0 siblings, 0 replies; 11+ messages in thread
From: Lachlan McIlroy @ 2006-12-01 20:12 UTC (permalink / raw)
To: David Chinner; +Cc: xfs-dev, xfs
David Chinner wrote:
> On Fri, Dec 01, 2006 at 09:38:11AM +1100, David Chinner wrote:
>
>>On Thu, Nov 30, 2006 at 06:03:40PM +0000, Lachlan McIlroy wrote:
>>
>>
>>>These changes wouldn't apply cleanly to tot (3 hunks failed in
>>>xfs_mount.c) but I couldn't see why.
>>
>>Whitespace issue? Try setting:
>>
>>$ export QUILT_PATCH_OPTS="--ignore-whitespace"
>>
>>I'll apply the patch to a separate tree and see if I hit the same
>>problem....
>
>
> I see the problem - the next patch I am going to send out for
> review which is earlier in my series....
>
> The growfs fix changes the delta parameter to xfs_icsb_modify_counters()
> from int to int64_t, and that is why the hunks don't apply.
>
> The attached patch should apply (with a 6 line offset to most hunks).
>
That's even worse - now it loses track of which file it's patching.
[lachlan@linux (2.6.x-xfs)2.6.x-xfs]$ patch -p1 -l -i ENOSPC.patch.eml
patching file fs/xfs/xfs_mount.c
Hunk #2 succeeded at 543 (offset 5 lines).
Hunk #3 succeeded at 1485 (offset 6 lines).
Hunk #4 succeeded at 1523 (offset 6 lines).
Hunk #5 succeeded at 1736 (offset 6 lines).
Hunk #6 succeeded at 1758 (offset 6 lines).
Hunk #7 succeeded at 1794 (offset 6 lines).
Hunk #8 succeeded at 1901 (offset 6 lines).
Hunk #9 succeeded at 2021 (offset 6 lines).
Hunk #10 succeeded at 2060 (offset 6 lines).
Hunk #11 FAILED at 2087.
1 out of 11 hunks FAILED -- saving rejects to file fs/xfs/xfs_mount.c.rej
missing header for unified diff at line 256 of patch
can't find file to patch at input line 256
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
| break;
|
--------------------------
File to patch:
Skip this patch? [y]
Skipping patch.
3 out of 3 hunks ignored
patching file fs/xfs/xfs_mount.h
It seems to have a problem with line 256 of the patch:
@@ -2081,7 +2121,7 @@ again: <---- line 256
lcounter = icsbp->icsb_ifree;
lcounter += delta;
if (unlikely(lcounter < 0))
- goto slow_path;
+ goto balance_counter;
icsbp->icsb_ifree = lcounter;
break;
I can't see what's wrong. Don't sweat over it - it's not that important
now that the review is done. I'll leave it to you to merge it with tot.
Lachlan
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-12-01 19:22 ` Lachlan McIlroy
@ 2006-12-03 23:49 ` David Chinner
2006-12-05 11:46 ` Klaus Strebel
0 siblings, 1 reply; 11+ messages in thread
From: David Chinner @ 2006-12-03 23:49 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: David Chinner, xfs-dev, xfs
On Fri, Dec 01, 2006 at 07:22:18PM +0000, Lachlan McIlroy wrote:
> David Chinner wrote:
> >On Thu, Nov 30, 2006 at 06:03:40PM +0000, Lachlan McIlroy wrote:
> >I think the slow path code is somewhat clearer with a separate
> >mutex - it clearly documents the serialisation barrier that
> >the slow path uses and allows us to do slow path checks on the
> >per-cpu counters without needing the SB_LOCK.
>
> It's certainly an improvement over the original code.
>
> >
> >It also means that in future, we can slowly remove the need for
> >holding the SB_LOCK across the entire rebalance operation and only
> >use it when referencing the global superblock fields during
> >the rebalance.
>
> Sounds good.
>
> >
> >If the need arises, it also means we can move to a mutex per counter
> >so we can independently rebalance different types of counters at the
> >same time (which we can't do right now).
>
> That seems so obvious - I'm surprised we can't do it now.
Well, the reason I didn't go down this path in the first place
was that typically only one type of counter would need rebalancing
at a time (e.g. free blocks or free inodes, but not both at the
same time). I tested this out on revenue2 with simultaneous creates
and deletes of small files but could not cause contention on the
single global lock under these loads. I also tried parallel
writes of large files with creates and deletes, but the create/delete
was slowed sufficiently by the parallel writes that it once again
didn't cause an issue.
Hence it didn't seem to be an issue and a single lock simplified
the initial implementation. What I'm thinking now is that it may
become an issue with MDFS as achievable create and delete rates
should be much higher on one of these filesystems and then it may
prove to be an issue.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-12-03 23:49 ` David Chinner
@ 2006-12-05 11:46 ` Klaus Strebel
2006-12-05 21:55 ` David Chinner
0 siblings, 1 reply; 11+ messages in thread
From: Klaus Strebel @ 2006-12-05 11:46 UTC (permalink / raw)
To: David Chinner; +Cc: Lachlan McIlroy, xfs-dev, xfs
David Chinner wrote:
> On Fri, Dec 01, 2006 at 07:22:18PM +0000, Lachlan McIlroy wrote:
>> David Chinner wrote:
>>> On Thu, Nov 30, 2006 at 06:03:40PM +0000, Lachlan McIlroy wrote:
>>> I think the slow path code is somewhat clearer with a separate
>>> mutex - it clearly documents the serialisation barrier that
>>> the slow path uses and allows us to do slow path checks on the
>>> per-cpu counters without needing the SB_LOCK.
>> It's certainly an improvement over the original code.
>>
>>> It also means that in future, we can slowly remove the need for
>>> holding the SB_LOCK across the entire rebalance operation and only
>>> use it when referencing the global superblock fields during
>>> the rebalance.
>> Sounds good.
>>
>>> If the need arises, it also means we can move to a mutex per counter
>>> so we can independently rebalance different types of counters at the
>>> same time (which we can't do right now).
>> That seems so obvious - I'm surprised we can't do it now.
>
> Well, the reason I didn't go down this path in the first place
> was that typically only one type of counter would need rebalancing
> at a time (e.g. free blocks or free inodes, but not both at the
> same time). I tested this out on revenue2 with simultaneous creates
> and deletes of small files but could not cause contention on the
> single global lock under these loads. I also tried parallel
> writes of large files with creates and deletes, but the create/delete
> was slowed sufficiently by the parallel writes that it once again
> didn't cause an issue.
>
> Hence it didn't seem to be an issue and a single lock simplified
> the initial implementation. What I'm thinking now is that it may
> become an issue with MDFS as achievable create and delete rates
> should be much higher on one of these filesystems and then it may
> prove to be an issue.
>
> Cheers,
>
> Dave.
Hi guys,
just updated my CVS copy from oss.sgi.com ( the linux-2.6-xfs ) and
tried to compile ... but your patch fails to compile if HAVE_PERCPU_SB
is #ifndef'd :-(, the m_icsb_mutex is not in the struct (see xfs_mount.h).
Make oldconfig didn't show HAVE_PERCPU_SB as option for .config, looks
like nobody tested on a single processor config ??
Ciao
Klaus
--
Mit freundlichen Grüssen / best regards
Klaus Strebel, Dipl.-Inform. (FH), mailto:klaus.strebel@gmx.net
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-12-05 11:46 ` Klaus Strebel
@ 2006-12-05 21:55 ` David Chinner
2006-12-06 8:43 ` Lachlan McIlroy
0 siblings, 1 reply; 11+ messages in thread
From: David Chinner @ 2006-12-05 21:55 UTC (permalink / raw)
To: Klaus Strebel; +Cc: David Chinner, Lachlan McIlroy, xfs-dev, xfs
On Tue, Dec 05, 2006 at 12:46:46PM +0100, Klaus Strebel wrote:
> Hi guys,
>
> just updated my CVS copy from oss.sgi.com ( the linux-2.6-xfs ) and
> tried to compile ... but your patch fails to compile if HAVE_PERCPU_SB
> is #ifndef'd :-(, the m_icsb_mutex is not in the struct see xfs_mount.h.
> Make oldconfig didn't show HAVE_PERCPU_SB as option for .config, looks
> like nobody tested on a single processor config ??
Sorry - my bad. The code did not change for UP, so I didn't think to
test it. The patch below abstracts the icsb_mutex so that it
doesn't get directly referenced by code outside the per-cpu counter
code. Builds with and without HAVE_PERCPU_SB defined.
I'll run a test cycle on it and get it fixed up.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
---
fs/xfs/xfs_mount.c | 23 ++++++++++++-----------
fs/xfs/xfs_mount.h | 20 ++++++++++++++++++++
2 files changed, 32 insertions(+), 11 deletions(-)
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.c 2006-12-04 11:33:54.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.c 2006-12-06 08:43:25.389157507 +1100
@@ -536,11 +536,11 @@ xfs_readsb(xfs_mount_t *mp, int flags)
ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
}
- mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_lock(mp);
xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0, 0);
xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0, 0);
xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0, 0);
- mutex_unlock(&mp->m_icsb_mutex);
+ xfs_icsb_unlock(mp);
mp->m_sb_bp = bp;
xfs_buf_relse(bp);
@@ -1726,17 +1726,17 @@ xfs_icsb_cpu_notify(
memset(cntp, 0, sizeof(xfs_icsb_cnts_t));
break;
case CPU_ONLINE:
- mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_lock(mp);
xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0, 0);
xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0, 0);
xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0, 0);
- mutex_unlock(&mp->m_icsb_mutex);
+ xfs_icsb_unlock(mp);
break;
case CPU_DEAD:
/* Disable all the counters, then fold the dead cpu's
* count into the total on the global superblock and
* re-enable the counters. */
- mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_lock(mp);
s = XFS_SB_LOCK(mp);
xfs_icsb_disable_counter(mp, XFS_SBS_ICOUNT);
xfs_icsb_disable_counter(mp, XFS_SBS_IFREE);
@@ -1755,7 +1755,7 @@ xfs_icsb_cpu_notify(
xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS,
XFS_ICSB_SB_LOCKED, 0);
XFS_SB_UNLOCK(mp, s);
- mutex_unlock(&mp->m_icsb_mutex);
+ xfs_icsb_unlock(mp);
break;
}
@@ -1803,6 +1803,7 @@ xfs_icsb_destroy_counters(
unregister_hotcpu_notifier(&mp->m_icsb_notifier);
free_percpu(mp->m_sb_cnts);
}
+ mutex_destroy(&mp->m_icsb_mutex);
}
STATIC_INLINE void
@@ -2146,7 +2147,7 @@ slow_path:
* the superblock lock. We still need to hold the superblock
* lock, however, when we modify the global structures.
*/
- mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_lock(mp);
/*
* Now running atomically.
@@ -2155,7 +2156,7 @@ slow_path:
* Drop the lock and try again in the fast path....
*/
if (!(xfs_icsb_counter_disabled(mp, field))) {
- mutex_unlock(&mp->m_icsb_mutex);
+ xfs_icsb_unlock(mp);
goto again;
}
@@ -2182,7 +2183,7 @@ slow_path:
*/
if (ret != ENOSPC)
xfs_icsb_balance_counter(mp, field, 0, 0);
- mutex_unlock(&mp->m_icsb_mutex);
+ xfs_icsb_unlock(mp);
return ret;
balance_counter:
@@ -2195,7 +2196,7 @@ balance_counter:
* do more balances than strictly necessary but it is not
* the common slowpath case.
*/
- mutex_lock(&mp->m_icsb_mutex);
+ xfs_icsb_lock(mp);
/*
* running atomically.
@@ -2206,7 +2207,7 @@ balance_counter:
* another balance operation being required.
*/
xfs_icsb_balance_counter(mp, field, 0, delta);
- mutex_unlock(&mp->m_icsb_mutex);
+ xfs_icsb_unlock(mp);
goto again;
}
Index: 2.6.x-xfs-new/fs/xfs/xfs_mount.h
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_mount.h 2006-12-04 11:33:56.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_mount.h 2006-12-06 08:43:18.630046128 +1100
@@ -576,6 +576,26 @@ xfs_daddr_to_agbno(struct xfs_mount *mp,
}
/*
+ * Per-cpu superblock locking functions
+ */
+#ifdef HAVE_PERCPU_SB
+STATIC_INLINE void
+xfs_icsb_lock(xfs_mount_t *mp)
+{
+ mutex_lock(&mp->m_icsb_mutex);
+}
+
+STATIC_INLINE void
+xfs_icsb_unlock(xfs_mount_t *mp)
+{
+ mutex_unlock(&mp->m_icsb_mutex);
+}
+#else
+#define xfs_icsb_lock(mp)
+#define xfs_icsb_unlock(mp)
+#endif
+
+/*
* This structure is for use by the xfs_mod_incore_sb_batch() routine.
* xfs_growfs can specify a few fields which are more than int limit
*/
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-12-05 21:55 ` David Chinner
@ 2006-12-06 8:43 ` Lachlan McIlroy
2006-12-08 5:16 ` David Chinner
0 siblings, 1 reply; 11+ messages in thread
From: Lachlan McIlroy @ 2006-12-06 8:43 UTC (permalink / raw)
To: David Chinner; +Cc: Klaus Strebel, xfs-dev, xfs
David Chinner wrote:
> On Tue, Dec 05, 2006 at 12:46:46PM +0100, Klaus Strebel wrote:
>
>>Hi guys,
>>
>>just updated my CVS copy from oss.sgi.com ( the linux-2.6-xfs ) and
>>tried to compile ... but your patch fails to compile if HAVE_PERCPU_SB
>>is #ifndef'd :-(, the m_icsb_mutex is not in the struct see xfs_mount.h.
>>Make oldconfig didn't show HAVE_PERCPU_SB as option for .config, looks
>>like nobody tested on a single processor config ??
>
>
> Sorry - my bad. The code did not change for UP, so I didn't think to
> test it. The patch below abstracts the icsb_mutex so that it
> doesn't get directly referenced by code outside the per-cpu counter
> code. Builds with and without HAVE_PERCPU_SB defined.
>
> I'll run a test cycle on it and get it fixed up.
>
> Cheers,
>
> Dave.
@@ -1803,6 +1803,7 @@ xfs_icsb_destroy_counters(
unregister_hotcpu_notifier(&mp->m_icsb_notifier);
free_percpu(mp->m_sb_cnts);
}
+ mutex_destroy(&mp->m_icsb_mutex);
}
Do you need to abstract the call to mutex_destroy too?
The rest of the change looks good.
Lachlan
* Re: Review: Reduce in-core superblock lock contention near ENOSPC
2006-12-06 8:43 ` Lachlan McIlroy
@ 2006-12-08 5:16 ` David Chinner
0 siblings, 0 replies; 11+ messages in thread
From: David Chinner @ 2006-12-08 5:16 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: David Chinner, Klaus Strebel, xfs-dev, xfs
On Wed, Dec 06, 2006 at 08:43:47AM +0000, Lachlan McIlroy wrote:
> David Chinner wrote:
> >On Tue, Dec 05, 2006 at 12:46:46PM +0100, Klaus Strebel wrote:
> >
> >>Hi guys,
> >>
> >>just updated my CVS copy from oss.sgi.com ( the linux-2.6-xfs ) and
> >>tried to compile ... but your patch fails to compile if HAVE_PERCPU_SB
> >>is #ifndef'd :-(, the m_icsb_mutex is not in the struct see xfs_mount.h.
> >>Make oldconfig didn't show HAVE_PERCPU_SB as option for .config, looks
> >>like nobody tested on a single processor config ??
> >
> >
> >Sorry - my bad. The code did not change for UP, so I didn't think to
> >test it. The patch below abstracts the icsb_mutex so that it
> >doesn't get directly referenced by code outside the per-cpu counter
> >code. Builds with and without HAVE_PERCPU_SB defined.
> >
> >I'll run a test cycle on it and get it fixed up.
> >
> >Cheers,
> >
> >Dave.
>
> @@ -1803,6 +1803,7 @@ xfs_icsb_destroy_counters(
> unregister_hotcpu_notifier(&mp->m_icsb_notifier);
> free_percpu(mp->m_sb_cnts);
> }
> + mutex_destroy(&mp->m_icsb_mutex);
> }
>
> Do you need to abstract the call to mutex_destroy too?
No. I didn't abstract the mutex_init and mutex_destroy calls because
they are in the init/destroy functions for the icsb subsystem and
those functions are #define'd out when HAVE_PERCPU_SB is not
defined.
> The rest of the change looks good.
Thanks, Lachlan.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Thread overview: 11+ messages
2006-11-23 4:41 Review: Reduce in-core superblock lock contention near ENOSPC David Chinner
2006-11-30 18:03 ` Lachlan McIlroy
2006-11-30 22:38 ` David Chinner
2006-12-01 0:41 ` David Chinner
2006-12-01 20:12 ` Lachlan McIlroy
2006-12-01 19:22 ` Lachlan McIlroy
2006-12-03 23:49 ` David Chinner
2006-12-05 11:46 ` Klaus Strebel
2006-12-05 21:55 ` David Chinner
2006-12-06 8:43 ` Lachlan McIlroy
2006-12-08 5:16 ` David Chinner