* [PATCH 0/3] Use generic percpu counters in XFS
@ 2010-11-29 0:36 Dave Chinner
2010-11-29 0:36 ` [PATCH 1/3] lib: percpu counter add unless less than functionality Dave Chinner
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Dave Chinner @ 2010-11-29 0:36 UTC (permalink / raw)
To: xfs; +Cc: linux-kernel, a.p.zijlstra
This series replaces the XFS per-cpu superblock counters with generic
per-cpu counters. It adds a specialised operation to the generic counter
implementation, then switches the XFS per-cpu superblock counters over to
it. The result is several hundred lines of complex code removed from XFS.
^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] lib: percpu counter add unless less than functionality
  2010-11-29  0:36 [PATCH 0/3] Use generic percpu counters in XFS Dave Chinner
@ 2010-11-29  0:36 ` Dave Chinner
  2010-11-29 12:08   ` Peter Zijlstra
  2010-11-29  0:36 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner
  2010-11-29  0:36 ` [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() Dave Chinner
  2 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2010-11-29  0:36 UTC (permalink / raw)
  To: xfs; +Cc: linux-kernel, a.p.zijlstra

From: Dave Chinner <dchinner@redhat.com>

To use the generic percpu counter infrastructure for counters that
require conditional addition based on a threshold value, we need special
handling of the counter. Further, the caller needs to know the status of
the conditional addition to determine what action to take depending on
whether the addition occurred or not. Examples of this sort of usage are
resource counters that cannot go below zero (e.g. filesystem free
blocks).

To allow XFS to replace its complex roll-your-own per-cpu superblock
counters, a single generic conditional function is required:
percpu_counter_add_unless_lt(). This will add the amount to the counter
unless the result would be less than the given threshold. A
caller-supplied threshold is required because XFS does not necessarily
use the same threshold for every counter.

percpu_counter_add_unless_lt() attempts to minimise counter lock
traversals by only taking the counter lock when the threshold is within
the error range of the current counter value. Hence when the threshold
is not within the counter error range, the counter still has the same
scalability characteristics as the normal percpu_counter_add() function.

Adding this functionality to the generic percpu counters allows us to
remove the much more complex and less efficient XFS percpu counter code
(~700 lines) and replace it with generic percpu counters.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/linux/percpu_counter.h |   20 ++++++++++
 lib/percpu_counter.c           |   78 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 0 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 46f6ba5..465b658f 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -41,6 +41,8 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
 int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs);
+int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
+				 s64 threshold);
 
 static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount)
 {
@@ -153,6 +155,24 @@ static inline int percpu_counter_initialized(struct percpu_counter *fbc)
 	return 1;
 }
 
+static inline int percpu_counter_add_unless_lt(struct percpu_counter *fbc,
+					       s64 amount, s64 threshold)
+{
+	s64 count;
+	int ret = -1;
+
+	preempt_disable();
+	count = fbc->count + amount;
+	if (count < threshold)
+		goto out;
+	fbc->count = count;
+	ret = count == threshold ? 0 : 1;
+out:
+	preempt_enable();
+	return ret;
+}
+
+
 #endif	/* CONFIG_SMP */
 
 static inline void percpu_counter_inc(struct percpu_counter *fbc)
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 604678d..ba625b7 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -213,6 +213,84 @@ int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
 }
 EXPORT_SYMBOL(percpu_counter_compare);
 
+/**
+ * percpu_counter_add_unless_lt - add to a counter avoiding underruns
+ * @fbc: counter
+ * @amount: amount to add
+ * @threshold: underrun threshold
+ *
+ * Add @amount to @fbc if and only if the result of the addition is greater
+ * than or equal to @threshold. Return 1 if greater and added, 0 if equal and
+ * added, and -1 if an underrun would have occurred.
+ *
+ * This is useful for operations that must accurately and atomically add a
+ * delta to a counter only if the result is greater than a given threshold
+ * (e.g. for free space accounting with ENOSPC checking in filesystems).
+ */
+int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
+				 s64 threshold)
+{
+	s64 count;
+	s64 error = 2 * percpu_counter_batch * num_online_cpus();
+	int cpu;
+	int ret = -1;
+
+	preempt_disable();
+
+	/* Check to see if rough count will be sufficient for comparison */
+	count = percpu_counter_read(fbc);
+	if (count + amount < threshold - error)
+		goto out;
+
+	/*
+	 * If the counter is over the threshold and the change is less than the
+	 * batch size, we might be able to avoid locking.
+	 */
+	if (count > threshold + error && abs(amount) < percpu_counter_batch) {
+		__percpu_counter_add(fbc, amount, percpu_counter_batch);
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * If the result is over the error threshold, we can just add it
+	 * into the global counter, ignoring what is in the per-cpu counters
+	 * as they will not change the result of the calculation.
+	 */
+	spin_lock(&fbc->lock);
+	if (fbc->count + amount > threshold + error) {
+		fbc->count += amount;
+		ret = 1;
+		goto out_unlock;
+	}
+
+	/*
+	 * Result is within the error margin. Run an open-coded sum of the
+	 * per-cpu counters to get the exact value at this point in time,
+	 * and if the result would be above the threshold, add the amount to
+	 * the global counter.
+	 */
+	count = fbc->count;
+	for_each_online_cpu(cpu) {
+		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+		count += *pcount;
+	}
+	WARN_ON(count < threshold);
+
+	if (count + amount >= threshold) {
+		ret = 0;
+		if (count + amount > threshold)
+			ret = 1;
+		fbc->count += amount;
+	}
+out_unlock:
+	spin_unlock(&fbc->lock);
+out:
+	preempt_enable();
+	return ret;
+}
+EXPORT_SYMBOL(percpu_counter_add_unless_lt);
+
 static int __init percpu_counter_startup(void)
 {
 	compute_batch_value();
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread
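[As a usage illustration (not part of the series): a minimal sketch of how a
filesystem might consume percpu_counter_add_unless_lt() for free block
accounting with a single atomic allocate-or-ENOSPC check. The my_mount and
my_fs_* names below are hypothetical and appear nowhere in the patches.]

#include <linux/percpu_counter.h>
#include <linux/errno.h>

struct my_mount {
	struct percpu_counter	free_blocks;	/* contended free-block count */
};

/*
 * Attempt to allocate nblocks: the counter is only modified if it would
 * not drop below the reserved threshold, so the allocation and the ENOSPC
 * check are one atomic operation. A sketch only, assuming the return
 * semantics described in the patch (>= 0 means the add happened).
 */
static int my_fs_alloc_blocks(struct my_mount *mp, s64 nblocks, s64 reserved)
{
	if (percpu_counter_add_unless_lt(&mp->free_blocks, -nblocks,
					 reserved) < 0)
		return -ENOSPC;		/* add would underrun the reserve */
	return 0;			/* counter already updated */
}

/* Returning blocks can never underrun, so a plain add suffices. */
static void my_fs_free_blocks(struct my_mount *mp, s64 nblocks)
{
	percpu_counter_add(&mp->free_blocks, nblocks);
}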
* Re: [PATCH 1/3] lib: percpu counter add unless less than functionality
  2010-11-29  0:36 ` [PATCH 1/3] lib: percpu counter add unless less than functionality Dave Chinner
@ 2010-11-29 12:08   ` Peter Zijlstra
  2010-11-30  4:30     ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2010-11-29 12:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-kernel

On Mon, 2010-11-29 at 11:36 +1100, Dave Chinner wrote:
> +int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
> +				 s64 threshold);

It might make sense to implement that as:

__percpu_counter_add_unless_lt(struct percpu_counter *fbc,
			       s64 amount,
			       s64 threshold,
			       s32 batch);

That way it's consistent with the __percpu_counter_add() implementation,
where you can also specify the batch value.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] lib: percpu counter add unless less than functionality
  2010-11-29 12:08   ` Peter Zijlstra
@ 2010-11-30  4:30     ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-11-30  4:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: xfs, linux-kernel

On Mon, Nov 29, 2010 at 01:08:50PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-11-29 at 11:36 +1100, Dave Chinner wrote:
> > +int percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
> > +				 s64 threshold);
> 
> It might make sense to implement that as:
> 
> __percpu_counter_add_unless_lt(struct percpu_counter *fbc,
> 			       s64 amount,
> 			       s64 threshold,
> 			       s32 batch);
> 
> That way it's consistent with the __percpu_counter_add() implementation,
> where you can also specify the batch value.

Good point. I'll fix that up.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread
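[To make the suggestion concrete: the split would presumably mirror the
existing __percpu_counter_add()/percpu_counter_add() pairing, along these
lines. This is a sketch of the proposed interface only, not code from the
thread.]

/* Core implementation takes an explicit batch size... */
int __percpu_counter_add_unless_lt(struct percpu_counter *fbc, s64 amount,
				   s64 threshold, s32 batch);

/* ...and the original name becomes a wrapper passing the default batch. */
static inline int percpu_counter_add_unless_lt(struct percpu_counter *fbc,
					       s64 amount, s64 threshold)
{
	return __percpu_counter_add_unless_lt(fbc, amount, threshold,
					      percpu_counter_batch);
}

[This is what the V2 posting at the end of the thread refers to with "use
wrappers for percpu_counter_add_unless_lt() to allow custom batch sizes to
be used".]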
* [PATCH 2/3] xfs: use generic per-cpu counter infrastructure 2010-11-29 0:36 [PATCH 0/3] Use generic percpu counters in XFS Dave Chinner 2010-11-29 0:36 ` [PATCH 1/3] lib: percpu counter add unless less than functionality Dave Chinner @ 2010-11-29 0:36 ` Dave Chinner 2010-11-29 9:16 ` Christoph Hellwig 2010-11-29 0:36 ` [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() Dave Chinner 2 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2010-11-29 0:36 UTC (permalink / raw) To: xfs; +Cc: linux-kernel, a.p.zijlstra From: Dave Chinner <dchinner@redhat.com> XFS has a per-cpu counter implementation for in-core superblock counters that pre-dated the generic implementation. It is complex and baroque as it is tailored directly to the needs of ENOSPC detection. Now that the generic percpu counter infrastructure has the percpu_counter_add_unless_lt() function that implements the necessary threshold checks for us, switch the XFS per-cpu superblock counters to use the generic percpu counter infrastructure. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_linux.h | 9 - fs/xfs/linux-2.6/xfs_super.c | 4 +- fs/xfs/xfs_fsops.c | 4 +- fs/xfs/xfs_mount.c | 842 ++++++++++-------------------------------- fs/xfs/xfs_mount.h | 71 +--- 5 files changed, 211 insertions(+), 719 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_linux.h b/fs/xfs/linux-2.6/xfs_linux.h index 214ddd7..9fa4f2a 100644 --- a/fs/xfs/linux-2.6/xfs_linux.h +++ b/fs/xfs/linux-2.6/xfs_linux.h @@ -88,15 +88,6 @@ #include <xfs_super.h> #include <xfs_buf.h> -/* - * Feature macros (disable/enable) - */ -#ifdef CONFIG_SMP -#define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */ -#else -#undef HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */ -#endif - #define irix_sgid_inherit xfs_params.sgid_inherit.val #define irix_symlink_mode xfs_params.symlink_mode.val #define xfs_panic_mask xfs_params.panic_mask.val diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c index 6f02771..6cbebb2 100644 --- a/fs/xfs/linux-2.6/xfs_super.c +++ b/fs/xfs/linux-2.6/xfs_super.c @@ -1227,9 +1227,9 @@ xfs_fs_statfs( statp->f_fsid.val[0] = (u32)id; statp->f_fsid.val[1] = (u32)(id >> 32); - xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT); - spin_lock(&mp->m_sb_lock); + xfs_icsb_sync_counters(mp); + statp->f_bsize = sbp->sb_blocksize; lsize = sbp->sb_logstart ? 
sbp->sb_logblocks : 0;
 	statp->f_blocks = sbp->sb_dblocks - lsize;
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index a7c116e..fb9a9c8 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -478,8 +478,8 @@ xfs_fs_counts(
 	xfs_mount_t		*mp,
 	xfs_fsop_counts_t	*cnt)
 {
-	xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT);
 	spin_lock(&mp->m_sb_lock);
+	xfs_icsb_sync_counters(mp);
 	cnt->freedata = mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
 	cnt->freertx = mp->m_sb.sb_frextents;
 	cnt->freeino = mp->m_sb.sb_ifree;
@@ -540,7 +540,7 @@ xfs_reserve_blocks(
 	 */
retry:
 	spin_lock(&mp->m_sb_lock);
-	xfs_icsb_sync_counters_locked(mp, 0);
+	xfs_icsb_sync_counters(mp);
 
 	/*
 	 * If our previous reservation was larger than the current value,
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 19e9dfa..f0353fb 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -46,19 +46,6 @@
 STATIC void	xfs_unmountfs_wait(xfs_mount_t *);
 
-
-#ifdef HAVE_PERCPU_SB
-STATIC void	xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t,
-						int);
-STATIC void	xfs_icsb_balance_counter_locked(xfs_mount_t *, xfs_sb_field_t,
-						int);
-STATIC void	xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t);
-#else
-
-#define xfs_icsb_balance_counter(mp, a, b)		do { } while (0)
-#define xfs_icsb_balance_counter_locked(mp, a, b)	do { } while (0)
-#endif
-
 static const struct {
 	short offset;
 	short type;	/* 0 = integer
@@ -280,6 +267,112 @@ xfs_free_perag(
 	}
 }
 
+
+/*
+ * Per-cpu incore superblock counters
+ *
+ * Simple concept, difficult implementation, now somewhat simplified by generic
+ * per-cpu counter support. This provides distributed per cpu counters for
+ * contended fields (e.g. free block count).
+ *
+ * Difficulties arise in that the incore sb is used for ENOSPC checking, and
+ * hence needs to be accurately read when we are running low on space. Hence we
+ * need to check against counter error bounds and determine how accurately to
+ * sum based on that metric. The percpu counters take care of this for us,
+ * so we only need to modify the fast path to handle per-cpu counter error
+ * cases.
+ */ +static inline int +xfs_icsb_add( + struct xfs_mount *mp, + int counter, + int64_t delta, + int64_t threshold) +{ + int ret; + + ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta, + threshold); + if (ret < 0) + return -ENOSPC; + return 0; +} + +static inline void +xfs_icsb_set( + struct xfs_mount *mp, + int counter, + int64_t value) +{ + percpu_counter_set(&mp->m_icsb[counter], value); +} + +static inline int64_t +xfs_icsb_sum( + struct xfs_mount *mp, + int counter) +{ + return percpu_counter_sum_positive(&mp->m_icsb[counter]); +} + +static inline int64_t +xfs_icsb_read( + struct xfs_mount *mp, + int counter) +{ + return percpu_counter_read_positive(&mp->m_icsb[counter]); +} + +void +xfs_icsb_reinit_counters( + struct xfs_mount *mp) +{ + xfs_icsb_set(mp, XFS_ICSB_FDBLOCKS, mp->m_sb.sb_fdblocks); + xfs_icsb_set(mp, XFS_ICSB_IFREE, mp->m_sb.sb_ifree); + xfs_icsb_set(mp, XFS_ICSB_ICOUNT, mp->m_sb.sb_icount); +} + +int +xfs_icsb_init_counters( + struct xfs_mount *mp) +{ + int i; + int error; + + for (i = 0; i < XFS_ICSB_MAX; i++) { + error = percpu_counter_init(&mp->m_icsb[i], 0); + if (error) + goto out_error; + } + xfs_icsb_reinit_counters(mp); + return 0; + +out_error: + for (; i >= 0; i--) + percpu_counter_destroy(&mp->m_icsb[i]); + return error; +} + +void +xfs_icsb_destroy_counters( + xfs_mount_t *mp) +{ + int i; + + for (i = 0; i < XFS_ICSB_MAX; i++) + percpu_counter_destroy(&mp->m_icsb[i]); +} + +void +xfs_icsb_sync_counters( + xfs_mount_t *mp) +{ + assert_spin_locked(&mp->m_sb_lock); + mp->m_sb.sb_icount = xfs_icsb_sum(mp, XFS_ICSB_ICOUNT); + mp->m_sb.sb_ifree = xfs_icsb_sum(mp, XFS_ICSB_IFREE); + mp->m_sb.sb_fdblocks = xfs_icsb_sum(mp, XFS_ICSB_FDBLOCKS); +} + /* * Check size of device based on the (data/realtime) block count. * Note: this check is used by the growfs code as well as mount. @@ -1562,7 +1655,9 @@ xfs_log_sbcount( if (!xfs_fs_writable(mp)) return 0; - xfs_icsb_sync_counters(mp, 0); + spin_lock(&mp->m_sb_lock); + xfs_icsb_sync_counters(mp); + spin_unlock(&mp->m_sb_lock); /* * we don't need to do this if we are updating the superblock @@ -1674,9 +1769,8 @@ xfs_mod_incore_sb_unlocked( int64_t delta, int rsvd) { - int scounter; /* short counter for 32 bit fields */ - long long lcounter; /* long counter for 64 bit fields */ - long long res_used, rem; + int scounter = 0; /* short counter for 32 bit fields */ + long long lcounter = 0; /* long counter for 64 bit fields */ /* * With the in-core superblock spin lock held, switch @@ -1685,66 +1779,6 @@ xfs_mod_incore_sb_unlocked( * 0, then do not apply the delta and return EINVAL. 
*/ switch (field) { - case XFS_SBS_ICOUNT: - lcounter = (long long)mp->m_sb.sb_icount; - lcounter += delta; - if (lcounter < 0) { - ASSERT(0); - return XFS_ERROR(EINVAL); - } - mp->m_sb.sb_icount = lcounter; - return 0; - case XFS_SBS_IFREE: - lcounter = (long long)mp->m_sb.sb_ifree; - lcounter += delta; - if (lcounter < 0) { - ASSERT(0); - return XFS_ERROR(EINVAL); - } - mp->m_sb.sb_ifree = lcounter; - return 0; - case XFS_SBS_FDBLOCKS: - lcounter = (long long) - mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); - res_used = (long long)(mp->m_resblks - mp->m_resblks_avail); - - if (delta > 0) { /* Putting blocks back */ - if (res_used > delta) { - mp->m_resblks_avail += delta; - } else { - rem = delta - res_used; - mp->m_resblks_avail = mp->m_resblks; - lcounter += rem; - } - } else { /* Taking blocks away */ - lcounter += delta; - if (lcounter >= 0) { - mp->m_sb.sb_fdblocks = lcounter + - XFS_ALLOC_SET_ASIDE(mp); - return 0; - } - - /* - * We are out of blocks, use any available reserved - * blocks if were allowed to. - */ - if (!rsvd) - return XFS_ERROR(ENOSPC); - - lcounter = (long long)mp->m_resblks_avail + delta; - if (lcounter >= 0) { - mp->m_resblks_avail = lcounter; - return 0; - } - printk_once(KERN_WARNING - "Filesystem \"%s\": reserve blocks depleted! " - "Consider increasing reserve pool size.", - mp->m_fsname); - return XFS_ERROR(ENOSPC); - } - - mp->m_sb.sb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp); - return 0; case XFS_SBS_FREXTENTS: lcounter = (long long)mp->m_sb.sb_frextents; lcounter += delta; @@ -1846,9 +1880,7 @@ xfs_mod_incore_sb( { int status; -#ifdef HAVE_PERCPU_SB ASSERT(field < XFS_SBS_ICOUNT || field > XFS_SBS_FDBLOCKS); -#endif spin_lock(&mp->m_sb_lock); status = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd); spin_unlock(&mp->m_sb_lock); @@ -1907,6 +1939,89 @@ unwind: return error; } +int +xfs_icsb_modify_counters( + xfs_mount_t *mp, + xfs_sb_field_t field, + int64_t delta, + int rsvd) +{ + int64_t lcounter; + int64_t res_used; + int ret = 0; + + + switch (field) { + case XFS_SBS_ICOUNT: + ret = xfs_icsb_add(mp, cntr, delta, 0); + if (ret < 0) { + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; + + case XFS_SBS_IFREE: + ret = xfs_icsb_add(mp, XFS_ICSB_IFREE, delta, 0); + if (ret < 0) { + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; + + case XFS_SBS_FDBLOCKS: + /* + * if we are putting blocks back, put them into the reserve + * block pool first. + */ + if (mp->m_resblks != mp->m_resblks_avail && delta > 0) { + spin_lock(&mp->m_sb_lock); + res_used = (int64_t)(mp->m_resblks - + mp->m_resblks_avail); + if (res_used > delta) { + mp->m_resblks_avail += delta; + delta = 0; + } else { + delta -= res_used; + mp->m_resblks_avail = mp->m_resblks; + } + spin_unlock(&mp->m_sb_lock); + if (!delta) + return 0; + } + + /* try the change */ + ret = xfs_icsb_add(mp, XFS_ICSB_FDBLOCKS, delta, + XFS_ALLOC_SET_ASIDE(mp)); + if (likely(ret >= 0)) + return 0; + + /* ENOSPC */ + ASSERT(ret == -ENOSPC); + ASSERT(delta < 0); + + if (!rsvd) + return XFS_ERROR(ENOSPC); + + spin_lock(&mp->m_sb_lock); + lcounter = (int64_t)mp->m_resblks_avail + delta; + if (lcounter >= 0) { + mp->m_resblks_avail = lcounter; + spin_unlock(&mp->m_sb_lock); + return 0; + } + spin_unlock(&mp->m_sb_lock); + printk_once(KERN_WARNING + "Filesystem \"%s\": reserve blocks depleted! 
" + "Consider increasing reserve pool size.", + mp->m_fsname); + return XFS_ERROR(ENOSPC); + default: + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; +} + /* * xfs_getsb() is called to obtain the buffer for the superblock. * The buffer is returned locked and read in from disk. @@ -2000,572 +2115,3 @@ xfs_dev_is_read_only( } return 0; } - -#ifdef HAVE_PERCPU_SB -/* - * Per-cpu incore superblock counters - * - * Simple concept, difficult implementation - * - * Basically, replace the incore superblock counters with a distributed per cpu - * counter for contended fields (e.g. free block count). - * - * Difficulties arise in that the incore sb is used for ENOSPC checking, and - * hence needs to be accurately read when we are running low on space. Hence - * there is a method to enable and disable the per-cpu counters based on how - * much "stuff" is available in them. - * - * Basically, a counter is enabled if there is enough free resource to justify - * running a per-cpu fast-path. If the per-cpu counter runs out (i.e. a local - * ENOSPC), then we disable the counters to synchronise all callers and - * re-distribute the available resources. - * - * If, once we redistributed the available resources, we still get a failure, - * we disable the per-cpu counter and go through the slow path. - * - * The slow path is the current xfs_mod_incore_sb() function. This means that - * when we disable a per-cpu counter, we need to drain its resources back to - * the global superblock. We do this after disabling the counter to prevent - * more threads from queueing up on the counter. - * - * Essentially, this means that we still need a lock in the fast path to enable - * synchronisation between the global counters and the per-cpu counters. This - * is not a problem because the lock will be local to a CPU almost all the time - * and have little contention except when we get to ENOSPC conditions. - * - * Basically, this lock becomes a barrier that enables us to lock out the fast - * path while we do things like enabling and disabling counters and - * synchronising the counters. - * - * Locking rules: - * - * 1. m_sb_lock before picking up per-cpu locks - * 2. per-cpu locks always picked up via for_each_online_cpu() order - * 3. accurate counter sync requires m_sb_lock + per cpu locks - * 4. modifying per-cpu counters requires holding per-cpu lock - * 5. modifying global counters requires holding m_sb_lock - * 6. enabling or disabling a counter requires holding the m_sb_lock - * and _none_ of the per-cpu locks. - * - * Disabled counters are only ever re-enabled by a balance operation - * that results in more free resources per CPU than a given threshold. - * To ensure counters don't remain disabled, they are rebalanced when - * the global resource goes above a higher threshold (i.e. some hysteresis - * is present to prevent thrashing). - */ - -#ifdef CONFIG_HOTPLUG_CPU -/* - * hot-plug CPU notifier support. - * - * We need a notifier per filesystem as we need to be able to identify - * the filesystem to balance the counters out. This is achieved by - * having a notifier block embedded in the xfs_mount_t and doing pointer - * magic to get the mount pointer from the notifier block address. 
- */ -STATIC int -xfs_icsb_cpu_notify( - struct notifier_block *nfb, - unsigned long action, - void *hcpu) -{ - xfs_icsb_cnts_t *cntp; - xfs_mount_t *mp; - - mp = (xfs_mount_t *)container_of(nfb, xfs_mount_t, m_icsb_notifier); - cntp = (xfs_icsb_cnts_t *) - per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu); - switch (action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: - /* Easy Case - initialize the area and locks, and - * then rebalance when online does everything else for us. */ - memset(cntp, 0, sizeof(xfs_icsb_cnts_t)); - break; - case CPU_ONLINE: - case CPU_ONLINE_FROZEN: - xfs_icsb_lock(mp); - xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0); - xfs_icsb_unlock(mp); - break; - case CPU_DEAD: - case CPU_DEAD_FROZEN: - /* Disable all the counters, then fold the dead cpu's - * count into the total on the global superblock and - * re-enable the counters. */ - xfs_icsb_lock(mp); - spin_lock(&mp->m_sb_lock); - xfs_icsb_disable_counter(mp, XFS_SBS_ICOUNT); - xfs_icsb_disable_counter(mp, XFS_SBS_IFREE); - xfs_icsb_disable_counter(mp, XFS_SBS_FDBLOCKS); - - mp->m_sb.sb_icount += cntp->icsb_icount; - mp->m_sb.sb_ifree += cntp->icsb_ifree; - mp->m_sb.sb_fdblocks += cntp->icsb_fdblocks; - - memset(cntp, 0, sizeof(xfs_icsb_cnts_t)); - - xfs_icsb_balance_counter_locked(mp, XFS_SBS_ICOUNT, 0); - xfs_icsb_balance_counter_locked(mp, XFS_SBS_IFREE, 0); - xfs_icsb_balance_counter_locked(mp, XFS_SBS_FDBLOCKS, 0); - spin_unlock(&mp->m_sb_lock); - xfs_icsb_unlock(mp); - break; - } - - return NOTIFY_OK; -} -#endif /* CONFIG_HOTPLUG_CPU */ - -int -xfs_icsb_init_counters( - xfs_mount_t *mp) -{ - xfs_icsb_cnts_t *cntp; - int i; - - mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t); - if (mp->m_sb_cnts == NULL) - return -ENOMEM; - -#ifdef CONFIG_HOTPLUG_CPU - mp->m_icsb_notifier.notifier_call = xfs_icsb_cpu_notify; - mp->m_icsb_notifier.priority = 0; - register_hotcpu_notifier(&mp->m_icsb_notifier); -#endif /* CONFIG_HOTPLUG_CPU */ - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - memset(cntp, 0, sizeof(xfs_icsb_cnts_t)); - } - - mutex_init(&mp->m_icsb_mutex); - - /* - * start with all counters disabled so that the - * initial balance kicks us off correctly - */ - mp->m_icsb_counters = -1; - return 0; -} - -void -xfs_icsb_reinit_counters( - xfs_mount_t *mp) -{ - xfs_icsb_lock(mp); - /* - * start with all counters disabled so that the - * initial balance kicks us off correctly - */ - mp->m_icsb_counters = -1; - xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0); - xfs_icsb_unlock(mp); -} - -void -xfs_icsb_destroy_counters( - xfs_mount_t *mp) -{ - if (mp->m_sb_cnts) { - unregister_hotcpu_notifier(&mp->m_icsb_notifier); - free_percpu(mp->m_sb_cnts); - } - mutex_destroy(&mp->m_icsb_mutex); -} - -STATIC void -xfs_icsb_lock_cntr( - xfs_icsb_cnts_t *icsbp) -{ - while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) { - ndelay(1000); - } -} - -STATIC void -xfs_icsb_unlock_cntr( - xfs_icsb_cnts_t *icsbp) -{ - clear_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags); -} - - -STATIC void -xfs_icsb_lock_all_counters( - xfs_mount_t *mp) -{ - xfs_icsb_cnts_t *cntp; - int i; - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - xfs_icsb_lock_cntr(cntp); - } -} - -STATIC void -xfs_icsb_unlock_all_counters( - xfs_mount_t *mp) -{ - xfs_icsb_cnts_t 
*cntp; - int i; - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - xfs_icsb_unlock_cntr(cntp); - } -} - -STATIC void -xfs_icsb_count( - xfs_mount_t *mp, - xfs_icsb_cnts_t *cnt, - int flags) -{ - xfs_icsb_cnts_t *cntp; - int i; - - memset(cnt, 0, sizeof(xfs_icsb_cnts_t)); - - if (!(flags & XFS_ICSB_LAZY_COUNT)) - xfs_icsb_lock_all_counters(mp); - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - cnt->icsb_icount += cntp->icsb_icount; - cnt->icsb_ifree += cntp->icsb_ifree; - cnt->icsb_fdblocks += cntp->icsb_fdblocks; - } - - if (!(flags & XFS_ICSB_LAZY_COUNT)) - xfs_icsb_unlock_all_counters(mp); -} - -STATIC int -xfs_icsb_counter_disabled( - xfs_mount_t *mp, - xfs_sb_field_t field) -{ - ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS)); - return test_bit(field, &mp->m_icsb_counters); -} - -STATIC void -xfs_icsb_disable_counter( - xfs_mount_t *mp, - xfs_sb_field_t field) -{ - xfs_icsb_cnts_t cnt; - - ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS)); - - /* - * If we are already disabled, then there is nothing to do - * here. We check before locking all the counters to avoid - * the expensive lock operation when being called in the - * slow path and the counter is already disabled. This is - * safe because the only time we set or clear this state is under - * the m_icsb_mutex. - */ - if (xfs_icsb_counter_disabled(mp, field)) - return; - - xfs_icsb_lock_all_counters(mp); - if (!test_and_set_bit(field, &mp->m_icsb_counters)) { - /* drain back to superblock */ - - xfs_icsb_count(mp, &cnt, XFS_ICSB_LAZY_COUNT); - switch(field) { - case XFS_SBS_ICOUNT: - mp->m_sb.sb_icount = cnt.icsb_icount; - break; - case XFS_SBS_IFREE: - mp->m_sb.sb_ifree = cnt.icsb_ifree; - break; - case XFS_SBS_FDBLOCKS: - mp->m_sb.sb_fdblocks = cnt.icsb_fdblocks; - break; - default: - BUG(); - } - } - - xfs_icsb_unlock_all_counters(mp); -} - -STATIC void -xfs_icsb_enable_counter( - xfs_mount_t *mp, - xfs_sb_field_t field, - uint64_t count, - uint64_t resid) -{ - xfs_icsb_cnts_t *cntp; - int i; - - ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS)); - - xfs_icsb_lock_all_counters(mp); - for_each_online_cpu(i) { - cntp = per_cpu_ptr(mp->m_sb_cnts, i); - switch (field) { - case XFS_SBS_ICOUNT: - cntp->icsb_icount = count + resid; - break; - case XFS_SBS_IFREE: - cntp->icsb_ifree = count + resid; - break; - case XFS_SBS_FDBLOCKS: - cntp->icsb_fdblocks = count + resid; - break; - default: - BUG(); - break; - } - resid = 0; - } - clear_bit(field, &mp->m_icsb_counters); - xfs_icsb_unlock_all_counters(mp); -} - -void -xfs_icsb_sync_counters_locked( - xfs_mount_t *mp, - int flags) -{ - xfs_icsb_cnts_t cnt; - - xfs_icsb_count(mp, &cnt, flags); - - if (!xfs_icsb_counter_disabled(mp, XFS_SBS_ICOUNT)) - mp->m_sb.sb_icount = cnt.icsb_icount; - if (!xfs_icsb_counter_disabled(mp, XFS_SBS_IFREE)) - mp->m_sb.sb_ifree = cnt.icsb_ifree; - if (!xfs_icsb_counter_disabled(mp, XFS_SBS_FDBLOCKS)) - mp->m_sb.sb_fdblocks = cnt.icsb_fdblocks; -} - -/* - * Accurate update of per-cpu counters to incore superblock - */ -void -xfs_icsb_sync_counters( - xfs_mount_t *mp, - int flags) -{ - spin_lock(&mp->m_sb_lock); - xfs_icsb_sync_counters_locked(mp, flags); - spin_unlock(&mp->m_sb_lock); -} - -/* - * Balance and enable/disable counters as necessary. - * - * Thresholds for re-enabling counters are somewhat magic. 
inode counts are - * chosen to be the same number as single on disk allocation chunk per CPU, and - * free blocks is something far enough zero that we aren't going thrash when we - * get near ENOSPC. We also need to supply a minimum we require per cpu to - * prevent looping endlessly when xfs_alloc_space asks for more than will - * be distributed to a single CPU but each CPU has enough blocks to be - * reenabled. - * - * Note that we can be called when counters are already disabled. - * xfs_icsb_disable_counter() optimises the counter locking in this case to - * prevent locking every per-cpu counter needlessly. - */ - -#define XFS_ICSB_INO_CNTR_REENABLE (uint64_t)64 -#define XFS_ICSB_FDBLK_CNTR_REENABLE(mp) \ - (uint64_t)(512 + XFS_ALLOC_SET_ASIDE(mp)) -STATIC void -xfs_icsb_balance_counter_locked( - xfs_mount_t *mp, - xfs_sb_field_t field, - int min_per_cpu) -{ - uint64_t count, resid; - int weight = num_online_cpus(); - uint64_t min = (uint64_t)min_per_cpu; - - /* disable counter and sync counter */ - xfs_icsb_disable_counter(mp, field); - - /* update counters - first CPU gets residual*/ - switch (field) { - case XFS_SBS_ICOUNT: - count = mp->m_sb.sb_icount; - resid = do_div(count, weight); - if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE)) - return; - break; - case XFS_SBS_IFREE: - count = mp->m_sb.sb_ifree; - resid = do_div(count, weight); - if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE)) - return; - break; - case XFS_SBS_FDBLOCKS: - count = mp->m_sb.sb_fdblocks; - resid = do_div(count, weight); - if (count < max(min, XFS_ICSB_FDBLK_CNTR_REENABLE(mp))) - return; - break; - default: - BUG(); - count = resid = 0; /* quiet, gcc */ - break; - } - - xfs_icsb_enable_counter(mp, field, count, resid); -} - -STATIC void -xfs_icsb_balance_counter( - xfs_mount_t *mp, - xfs_sb_field_t fields, - int min_per_cpu) -{ - spin_lock(&mp->m_sb_lock); - xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu); - spin_unlock(&mp->m_sb_lock); -} - -int -xfs_icsb_modify_counters( - xfs_mount_t *mp, - xfs_sb_field_t field, - int64_t delta, - int rsvd) -{ - xfs_icsb_cnts_t *icsbp; - long long lcounter; /* long counter for 64 bit fields */ - int ret = 0; - - might_sleep(); -again: - preempt_disable(); - icsbp = this_cpu_ptr(mp->m_sb_cnts); - - /* - * if the counter is disabled, go to slow path - */ - if (unlikely(xfs_icsb_counter_disabled(mp, field))) - goto slow_path; - xfs_icsb_lock_cntr(icsbp); - if (unlikely(xfs_icsb_counter_disabled(mp, field))) { - xfs_icsb_unlock_cntr(icsbp); - goto slow_path; - } - - switch (field) { - case XFS_SBS_ICOUNT: - lcounter = icsbp->icsb_icount; - lcounter += delta; - if (unlikely(lcounter < 0)) - goto balance_counter; - icsbp->icsb_icount = lcounter; - break; - - case XFS_SBS_IFREE: - lcounter = icsbp->icsb_ifree; - lcounter += delta; - if (unlikely(lcounter < 0)) - goto balance_counter; - icsbp->icsb_ifree = lcounter; - break; - - case XFS_SBS_FDBLOCKS: - BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0); - - lcounter = icsbp->icsb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); - lcounter += delta; - if (unlikely(lcounter < 0)) - goto balance_counter; - icsbp->icsb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp); - break; - default: - BUG(); - break; - } - xfs_icsb_unlock_cntr(icsbp); - preempt_enable(); - return 0; - -slow_path: - preempt_enable(); - - /* - * serialise with a mutex so we don't burn lots of cpu on - * the superblock lock. We still need to hold the superblock - * lock, however, when we modify the global structures. 
- */ - xfs_icsb_lock(mp); - - /* - * Now running atomically. - * - * If the counter is enabled, someone has beaten us to rebalancing. - * Drop the lock and try again in the fast path.... - */ - if (!(xfs_icsb_counter_disabled(mp, field))) { - xfs_icsb_unlock(mp); - goto again; - } - - /* - * The counter is currently disabled. Because we are - * running atomically here, we know a rebalance cannot - * be in progress. Hence we can go straight to operating - * on the global superblock. We do not call xfs_mod_incore_sb() - * here even though we need to get the m_sb_lock. Doing so - * will cause us to re-enter this function and deadlock. - * Hence we get the m_sb_lock ourselves and then call - * xfs_mod_incore_sb_unlocked() as the unlocked path operates - * directly on the global counters. - */ - spin_lock(&mp->m_sb_lock); - ret = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd); - spin_unlock(&mp->m_sb_lock); - - /* - * Now that we've modified the global superblock, we - * may be able to re-enable the distributed counters - * (e.g. lots of space just got freed). After that - * we are done. - */ - if (ret != ENOSPC) - xfs_icsb_balance_counter(mp, field, 0); - xfs_icsb_unlock(mp); - return ret; - -balance_counter: - xfs_icsb_unlock_cntr(icsbp); - preempt_enable(); - - /* - * We may have multiple threads here if multiple per-cpu - * counters run dry at the same time. This will mean we can - * do more balances than strictly necessary but it is not - * the common slowpath case. - */ - xfs_icsb_lock(mp); - - /* - * running atomically. - * - * This will leave the counter in the correct state for future - * accesses. After the rebalance, we simply try again and our retry - * will either succeed through the fast path or slow path without - * another balance operation being required. - */ - xfs_icsb_balance_counter(mp, field, delta); - xfs_icsb_unlock(mp); - goto again; -} - -#endif diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 5861b49..42d31df 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -65,44 +65,19 @@ struct xfs_nameops; struct xfs_ail; struct xfs_quotainfo; -#ifdef HAVE_PERCPU_SB - /* - * Valid per-cpu incore superblock counters. Note that if you add new counters, - * you may need to define new counter disabled bit field descriptors as there - * are more possible fields in the superblock that can fit in a bitfield on a - * 32 bit platform. The XFS_SBS_* values for the current current counters just - * fit. + * Per-cpu incore superblock counters. 
*/ -typedef struct xfs_icsb_cnts { - uint64_t icsb_fdblocks; - uint64_t icsb_ifree; - uint64_t icsb_icount; - unsigned long icsb_flags; -} xfs_icsb_cnts_t; - -#define XFS_ICSB_FLAG_LOCK (1 << 0) /* counter lock bit */ - -#define XFS_ICSB_LAZY_COUNT (1 << 1) /* accuracy not needed */ +enum { + XFS_ICSB_FDBLOCKS = 0, + XFS_ICSB_IFREE, + XFS_ICSB_ICOUNT, + XFS_ICSB_MAX, +}; -extern int xfs_icsb_init_counters(struct xfs_mount *); -extern void xfs_icsb_reinit_counters(struct xfs_mount *); -extern void xfs_icsb_destroy_counters(struct xfs_mount *); -extern void xfs_icsb_sync_counters(struct xfs_mount *, int); -extern void xfs_icsb_sync_counters_locked(struct xfs_mount *, int); extern int xfs_icsb_modify_counters(struct xfs_mount *, xfs_sb_field_t, int64_t, int); -#else -#define xfs_icsb_init_counters(mp) (0) -#define xfs_icsb_destroy_counters(mp) do { } while (0) -#define xfs_icsb_reinit_counters(mp) do { } while (0) -#define xfs_icsb_sync_counters(mp, flags) do { } while (0) -#define xfs_icsb_sync_counters_locked(mp, flags) do { } while (0) -#define xfs_icsb_modify_counters(mp, field, delta, rsvd) \ - xfs_mod_incore_sb(mp, field, delta, rsvd) -#endif - typedef struct xfs_mount { struct super_block *m_super; xfs_tid_t m_tid; /* next unused tid for fs */ @@ -186,12 +161,6 @@ typedef struct xfs_mount { struct xfs_chash *m_chash; /* fs private inode per-cluster * hash table */ atomic_t m_active_trans; /* number trans frozen */ -#ifdef HAVE_PERCPU_SB - xfs_icsb_cnts_t __percpu *m_sb_cnts; /* per-cpu superblock counters */ - unsigned long m_icsb_counters; /* disabled per-cpu counters */ - struct notifier_block m_icsb_notifier; /* hotplug cpu notifier */ - struct mutex m_icsb_mutex; /* balancer sync lock */ -#endif struct xfs_mru_cache *m_filestream; /* per-mount filestream data */ struct task_struct *m_sync_task; /* generalised sync thread */ xfs_sync_work_t m_sync_work; /* work item for VFS_SYNC */ @@ -202,6 +171,7 @@ typedef struct xfs_mount { __int64_t m_update_flags; /* sb flags we need to update on the next remount,rw */ struct shrinker m_inode_shrink; /* inode reclaim shrinker */ + struct percpu_counter m_icsb[XFS_ICSB_MAX]; } xfs_mount_t; /* @@ -333,26 +303,6 @@ struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno, void xfs_perag_put(struct xfs_perag *pag); /* - * Per-cpu superblock locking functions - */ -#ifdef HAVE_PERCPU_SB -static inline void -xfs_icsb_lock(xfs_mount_t *mp) -{ - mutex_lock(&mp->m_icsb_mutex); -} - -static inline void -xfs_icsb_unlock(xfs_mount_t *mp) -{ - mutex_unlock(&mp->m_icsb_mutex); -} -#else -#define xfs_icsb_lock(mp) -#define xfs_icsb_unlock(mp) -#endif - -/* * This structure is for use by the xfs_mod_incore_sb_batch() routine. * xfs_growfs can specify a few fields which are more than int limit */ @@ -379,6 +329,11 @@ extern int xfs_sb_validate_fsb_count(struct xfs_sb *, __uint64_t); extern int xfs_dev_is_read_only(struct xfs_mount *, char *); +extern int xfs_icsb_init_counters(struct xfs_mount *); +extern void xfs_icsb_reinit_counters(struct xfs_mount *); +extern void xfs_icsb_destroy_counters(struct xfs_mount *); +extern void xfs_icsb_sync_counters(struct xfs_mount *); + #endif /* __KERNEL__ */ extern void xfs_mod_sb(struct xfs_trans *, __int64_t); -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] xfs: use generic per-cpu counter infrastructure
  2010-11-29  0:36 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner
@ 2010-11-29  9:16   ` Christoph Hellwig
  2010-11-30  4:37     ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2010-11-29  9:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-kernel, a.p.zijlstra

On Mon, Nov 29, 2010 at 11:36:40AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> XFS has a per-cpu counter implementation for in-core superblock
> counters that pre-dated the generic implementation. It is complex
> and baroque as it is tailored directly to the needs of ENOSPC
> detection.
> 
> Now that the generic percpu counter infrastructure has the
> percpu_counter_add_unless_lt() function that implements the
> necessary threshold checks for us, switch the XFS per-cpu
> superblock counters to use the generic percpu counter
> infrastructure.

Looks good, but a few comments below:

> +/*
> + * Per-cpu incore superblock counters
> + *
> + * Simple concept, difficult implementation, now somewhat simplified by generic
> + * per-cpu counter support. This provides distributed per cpu counters for
> + * contended fields (e.g. free block count).

The kind of historic comments like "now simplified by .." don't make any
sense after only a short while. I'd just remove the first sentence
above, as the details of the problems are explained much better later.

> +static inline int
> +xfs_icsb_add(
> +	struct xfs_mount	*mp,
> +	int			counter,
> +	int64_t			delta,
> +	int64_t			threshold)
> +{
> +	int	ret;
> +
> +	ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta,
> +						threshold);
> +	if (ret < 0)
> +		return -ENOSPC;
> +	return 0;
> +}
> +
> +static inline void
> +xfs_icsb_set(
> +	struct xfs_mount	*mp,
> +	int			counter,
> +	int64_t			value)
> +{
> +	percpu_counter_set(&mp->m_icsb[counter], value);
> +}
> +
> +static inline int64_t
> +xfs_icsb_sum(
> +	struct xfs_mount	*mp,
> +	int			counter)
> +{
> +	return percpu_counter_sum_positive(&mp->m_icsb[counter]);
> +}

I still don't like these wrappers. They are all local to xfs_mount.c,
and only have a single function calling them. See the RFC patch below
which removes them, and imho makes the code more readable. Especially
in the xfs_icsb_add case where we can get rid of one level of error
remapping, and go directly from the weird percpu return values to the
positive xfs errors instead of doing a detour via the negative linux
errors.

> +static inline int64_t
> +xfs_icsb_read(
> +	struct xfs_mount	*mp,
> +	int			counter)
> +{
> +	return percpu_counter_read_positive(&mp->m_icsb[counter]);
> +}

this one is entirely unused.

Index: xfs/fs/xfs/xfs_mount.c
===================================================================
--- xfs.orig/fs/xfs/xfs_mount.c	2010-11-29 09:43:31.423011248 +0100
+++ xfs/fs/xfs/xfs_mount.c	2010-11-29 09:56:32.546255345 +0100
@@ -282,54 +282,14 @@ xfs_free_perag(
  * so we only need to modify the fast path to handle per-cpu counter error
  * cases.
*/ -static inline int -xfs_icsb_add( - struct xfs_mount *mp, - int counter, - int64_t delta, - int64_t threshold) -{ - int ret; - - ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta, - threshold); - if (ret < 0) - return -ENOSPC; - return 0; -} - -static inline void -xfs_icsb_set( - struct xfs_mount *mp, - int counter, - int64_t value) -{ - percpu_counter_set(&mp->m_icsb[counter], value); -} - -static inline int64_t -xfs_icsb_sum( - struct xfs_mount *mp, - int counter) -{ - return percpu_counter_sum_positive(&mp->m_icsb[counter]); -} - -static inline int64_t -xfs_icsb_read( - struct xfs_mount *mp, - int counter) -{ - return percpu_counter_read_positive(&mp->m_icsb[counter]); -} - void xfs_icsb_reinit_counters( struct xfs_mount *mp) { - xfs_icsb_set(mp, XFS_ICSB_FDBLOCKS, mp->m_sb.sb_fdblocks); - xfs_icsb_set(mp, XFS_ICSB_IFREE, mp->m_sb.sb_ifree); - xfs_icsb_set(mp, XFS_ICSB_ICOUNT, mp->m_sb.sb_icount); + percpu_counter_set(&mp->m_icsb[XFS_ICSB_FDBLOCKS], + mp->m_sb.sb_fdblocks); + percpu_counter_set(&mp->m_icsb[XFS_ICSB_IFREE], mp->m_sb.sb_ifree); + percpu_counter_set(&mp->m_icsb[XFS_ICSB_ICOUNT], mp->m_sb.sb_icount); } int @@ -368,9 +328,12 @@ xfs_icsb_sync_counters( xfs_mount_t *mp) { assert_spin_locked(&mp->m_sb_lock); - mp->m_sb.sb_icount = xfs_icsb_sum(mp, XFS_ICSB_ICOUNT); - mp->m_sb.sb_ifree = xfs_icsb_sum(mp, XFS_ICSB_IFREE); - mp->m_sb.sb_fdblocks = xfs_icsb_sum(mp, XFS_ICSB_FDBLOCKS); + mp->m_sb.sb_icount = + percpu_counter_sum_positive(&mp->m_icsb[XFS_ICSB_ICOUNT]); + mp->m_sb.sb_ifree = + percpu_counter_sum_positive(&mp->m_icsb[XFS_ICSB_IFREE]); + mp->m_sb.sb_fdblocks = + percpu_counter_sum_positive(&mp->m_icsb[XFS_ICSB_FDBLOCKS]); } /* @@ -1953,7 +1916,7 @@ xfs_icsb_modify_counters( switch (field) { case XFS_SBS_ICOUNT: - ret = xfs_icsb_add(mp, cntr, delta, 0); + ret = percpu_counter_add_unless_lt(&mp->m_icsb[XFS_SBS_ICOUNT], delta, 0); if (ret < 0) { ASSERT(0); return XFS_ERROR(EINVAL); @@ -1961,7 +1924,7 @@ xfs_icsb_modify_counters( return 0; case XFS_SBS_IFREE: - ret = xfs_icsb_add(mp, XFS_ICSB_IFREE, delta, 0); + ret = percpu_counter_add_unless_lt(&mp->m_icsb[XFS_SBS_IFREE], delta, 0); if (ret < 0) { ASSERT(0); return XFS_ERROR(EINVAL); @@ -1990,13 +1953,12 @@ xfs_icsb_modify_counters( } /* try the change */ - ret = xfs_icsb_add(mp, XFS_ICSB_FDBLOCKS, delta, - XFS_ALLOC_SET_ASIDE(mp)); + ret = percpu_counter_add_unless_lt(&mp->m_icsb[XFS_ICSB_FDBLOCKS], + delta, XFS_ALLOC_SET_ASIDE(mp)); if (likely(ret >= 0)) return 0; /* ENOSPC */ - ASSERT(ret == -ENOSPC); ASSERT(delta < 0); if (!rsvd) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] xfs: use generic per-cpu counter infrastructure
  2010-11-29  9:16   ` Christoph Hellwig
@ 2010-11-30  4:37     ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2010-11-30  4:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, linux-kernel, a.p.zijlstra

On Mon, Nov 29, 2010 at 04:16:10AM -0500, Christoph Hellwig wrote:
> On Mon, Nov 29, 2010 at 11:36:40AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > XFS has a per-cpu counter implementation for in-core superblock
> > counters that pre-dated the generic implementation. It is complex
> > and baroque as it is tailored directly to the needs of ENOSPC
> > detection.
> > 
> > Now that the generic percpu counter infrastructure has the
> > percpu_counter_add_unless_lt() function that implements the
> > necessary threshold checks for us, switch the XFS per-cpu
> > superblock counters to use the generic percpu counter
> > infrastructure.
> 
> Looks good, but a few comments below:
> 
> > +/*
> > + * Per-cpu incore superblock counters
> > + *
> > + * Simple concept, difficult implementation, now somewhat simplified by generic
> > + * per-cpu counter support. This provides distributed per cpu counters for
> > + * contended fields (e.g. free block count).
> 
> The kind of historic comments like "now simplified by .." don't make any
> sense after only a short while. I'd just remove the first sentence
> above, as the details of the problems are explained much better later.

Ok, will do.

> > +static inline int
> > +xfs_icsb_add(
> > +	struct xfs_mount	*mp,
> > +	int			counter,
> > +	int64_t			delta,
> > +	int64_t			threshold)
> > +{
> > +	int	ret;
> > +
> > +	ret = percpu_counter_add_unless_lt(&mp->m_icsb[counter], delta,
> > +						threshold);
> > +	if (ret < 0)
> > +		return -ENOSPC;
> > +	return 0;
> > +}
> > +
> > +static inline void
> > +xfs_icsb_set(
> > +	struct xfs_mount	*mp,
> > +	int			counter,
> > +	int64_t			value)
> > +{
> > +	percpu_counter_set(&mp->m_icsb[counter], value);
> > +}
> > +
> > +static inline int64_t
> > +xfs_icsb_sum(
> > +	struct xfs_mount	*mp,
> > +	int			counter)
> > +{
> > +	return percpu_counter_sum_positive(&mp->m_icsb[counter]);
> > +}
> 
> I still don't like these wrappers. They are all local to xfs_mount.c,
> and only have a single function calling them. See the RFC patch below
> which removes them, and imho makes the code more readable. Especially
> in the xfs_icsb_add case where we can get rid of one level of error
> remapping, and go directly from the weird percpu return values to the
> positive xfs errors instead of doing a detour via the negative linux
> errors.

Ok, if we need to tweak the batch size in future, then we can deal
with it then. I'll clean it up as you suggest...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread
* [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() 2010-11-29 0:36 [PATCH 0/3] Use generic percpu counters in XFS Dave Chinner 2010-11-29 0:36 ` [PATCH 1/3] lib: percpu counter add unless less than functionality Dave Chinner 2010-11-29 0:36 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner @ 2010-11-29 0:36 ` Dave Chinner 2010-11-29 9:19 ` Christoph Hellwig 2 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2010-11-29 0:36 UTC (permalink / raw) To: xfs; +Cc: linux-kernel, a.p.zijlstra From: Dave Chinner <dchinner@redhat.com> With the conversion to percpu counters, xfs_icsb_modify_counters() really does not need to exist. Convert the inode counter modifications to use a common helper function for the one place that calls them, and add another function for the free block modification and convert all the callers to use that. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_bmap.c | 34 +++++------- fs/xfs/xfs_fsops.c | 3 +- fs/xfs/xfs_mount.c | 160 +++++++++++++++++++++++++--------------------------- fs/xfs/xfs_mount.h | 4 +- fs/xfs/xfs_trans.c | 19 +++--- 5 files changed, 103 insertions(+), 117 deletions(-) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 4111cd3..6a47556 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -614,8 +614,8 @@ xfs_bmap_add_extent( nblks += cur->bc_private.b.allocated; ASSERT(nblks <= da_old); if (nblks < da_old) - xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - (int64_t)(da_old - nblks), rsvd); + xfs_icsb_modify_free_blocks(ip->i_mount, + (int64_t)(da_old - nblks), rsvd); } /* * Clear out the allocated field, done with it now in any case. @@ -1079,7 +1079,7 @@ xfs_bmap_add_extent_delay_real( diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) - (cur ? cur->bc_private.b.allocated : 0)); if (diff > 0 && - xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, + xfs_icsb_modify_free_blocks(ip->i_mount, -((int64_t)diff), rsvd)) { /* * Ick gross gag me with a spoon. @@ -1090,8 +1090,7 @@ xfs_bmap_add_extent_delay_real( temp--; diff--; if (!diff || - !xfs_icsb_modify_counters(ip->i_mount, - XFS_SBS_FDBLOCKS, + !xfs_icsb_modify_free_blocks(ip->i_mount, -((int64_t)diff), rsvd)) break; } @@ -1099,8 +1098,7 @@ xfs_bmap_add_extent_delay_real( temp2--; diff--; if (!diff || - !xfs_icsb_modify_counters(ip->i_mount, - XFS_SBS_FDBLOCKS, + !xfs_icsb_modify_free_blocks(ip->i_mount, -((int64_t)diff), rsvd)) break; } @@ -1769,8 +1767,8 @@ xfs_bmap_add_extent_hole_delay( } if (oldlen != newlen) { ASSERT(oldlen > newlen); - xfs_icsb_modify_counters(ip->i_mount, XFS_SBS_FDBLOCKS, - (int64_t)(oldlen - newlen), rsvd); + xfs_icsb_modify_free_blocks(ip->i_mount, + (int64_t)(oldlen - newlen), rsvd); /* * Nothing to do for disk quota accounting here. */ @@ -3114,10 +3112,9 @@ xfs_bmap_del_extent( * Nothing to do for disk quota accounting here. 
*/ ASSERT(da_old >= da_new); - if (da_old > da_new) { - xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - (int64_t)(da_old - da_new), rsvd); - } + if (da_old > da_new) + xfs_icsb_modify_free_blocks(ip->i_mount, + (int64_t)(da_old - da_new), rsvd); done: *logflagsp = flags; return error; @@ -4530,14 +4527,12 @@ xfs_bmapi( -((int64_t)extsz), (flags & XFS_BMAPI_RSVBLOCKS)); } else { - error = xfs_icsb_modify_counters(mp, - XFS_SBS_FDBLOCKS, + error = xfs_icsb_modify_free_blocks(mp, -((int64_t)alen), (flags & XFS_BMAPI_RSVBLOCKS)); } if (!error) { - error = xfs_icsb_modify_counters(mp, - XFS_SBS_FDBLOCKS, + error = xfs_icsb_modify_free_blocks(mp, -((int64_t)indlen), (flags & XFS_BMAPI_RSVBLOCKS)); if (error && rt) @@ -4546,8 +4541,7 @@ xfs_bmapi( (int64_t)extsz, (flags & XFS_BMAPI_RSVBLOCKS)); else if (error) - xfs_icsb_modify_counters(mp, - XFS_SBS_FDBLOCKS, + xfs_icsb_modify_free_blocks(mp, (int64_t)alen, (flags & XFS_BMAPI_RSVBLOCKS)); } @@ -5210,7 +5204,7 @@ xfs_bunmapi( ip, -((long)del.br_blockcount), 0, XFS_QMOPT_RES_RTBLKS); } else { - xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, + xfs_icsb_modify_free_blocks(mp, (int64_t)del.br_blockcount, rsvd); (void)xfs_trans_reserve_quota_nblks(NULL, ip, -((long)del.br_blockcount), 0, diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index fb9a9c8..be34ff2 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -596,8 +596,7 @@ out: * the extra reserve blocks from the reserve..... */ int error; - error = xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, - fdblks_delta, 0); + error = xfs_icsb_modify_free_blocks(mp, fdblks_delta, 0); if (error == ENOSPC) goto retry; } diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index f0353fb..3905fc3 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -373,6 +373,83 @@ xfs_icsb_sync_counters( mp->m_sb.sb_fdblocks = xfs_icsb_sum(mp, XFS_ICSB_FDBLOCKS); } +int +xfs_icsb_modify_inodes( + xfs_mount_t *mp, + int cntr, + int64_t delta, + int rsvd) +{ + int ret = 0; + + ASSERT(cntr == XFS_ICSB_ICOUNT || cntr == XFS_ICSB_IFREE); + + ret = xfs_icsb_add(mp, cntr, delta, 0); + if (ret < 0) { + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; +} + +int +xfs_icsb_modify_free_blocks( + xfs_mount_t *mp, + int64_t delta, + int rsvd) +{ + int64_t lcounter; + int64_t res_used; + int ret = 0; + + /* + * if we are putting blocks back, put them into the reserve + * block pool first. + */ + if (unlikely(mp->m_resblks != mp->m_resblks_avail) && delta > 0) { + spin_lock(&mp->m_sb_lock); + res_used = (int64_t)(mp->m_resblks - + mp->m_resblks_avail); + if (res_used > delta) { + mp->m_resblks_avail += delta; + delta = 0; + } else { + delta -= res_used; + mp->m_resblks_avail = mp->m_resblks; + } + spin_unlock(&mp->m_sb_lock); + if (!delta) + return 0; + } + + /* try the change */ + ret = xfs_icsb_add(mp, XFS_ICSB_FDBLOCKS, delta, + XFS_ALLOC_SET_ASIDE(mp)); + if (likely(ret >= 0)) + return 0; + + /* ENOSPC */ + ASSERT(ret == -ENOSPC); + ASSERT(delta < 0); + + if (!rsvd) + return XFS_ERROR(ENOSPC); + + spin_lock(&mp->m_sb_lock); + lcounter = (int64_t)mp->m_resblks_avail + delta; + if (lcounter >= 0) { + mp->m_resblks_avail = lcounter; + spin_unlock(&mp->m_sb_lock); + return 0; + } + spin_unlock(&mp->m_sb_lock); + printk_once(KERN_WARNING + "Filesystem \"%s\": reserve blocks depleted! " + "Consider increasing reserve pool size.", + mp->m_fsname); + return XFS_ERROR(ENOSPC); +} + /* * Check size of device based on the (data/realtime) block count. * Note: this check is used by the growfs code as well as mount. 
@@ -1939,89 +2016,6 @@ unwind: return error; } -int -xfs_icsb_modify_counters( - xfs_mount_t *mp, - xfs_sb_field_t field, - int64_t delta, - int rsvd) -{ - int64_t lcounter; - int64_t res_used; - int ret = 0; - - - switch (field) { - case XFS_SBS_ICOUNT: - ret = xfs_icsb_add(mp, cntr, delta, 0); - if (ret < 0) { - ASSERT(0); - return XFS_ERROR(EINVAL); - } - return 0; - - case XFS_SBS_IFREE: - ret = xfs_icsb_add(mp, XFS_ICSB_IFREE, delta, 0); - if (ret < 0) { - ASSERT(0); - return XFS_ERROR(EINVAL); - } - return 0; - - case XFS_SBS_FDBLOCKS: - /* - * if we are putting blocks back, put them into the reserve - * block pool first. - */ - if (mp->m_resblks != mp->m_resblks_avail && delta > 0) { - spin_lock(&mp->m_sb_lock); - res_used = (int64_t)(mp->m_resblks - - mp->m_resblks_avail); - if (res_used > delta) { - mp->m_resblks_avail += delta; - delta = 0; - } else { - delta -= res_used; - mp->m_resblks_avail = mp->m_resblks; - } - spin_unlock(&mp->m_sb_lock); - if (!delta) - return 0; - } - - /* try the change */ - ret = xfs_icsb_add(mp, XFS_ICSB_FDBLOCKS, delta, - XFS_ALLOC_SET_ASIDE(mp)); - if (likely(ret >= 0)) - return 0; - - /* ENOSPC */ - ASSERT(ret == -ENOSPC); - ASSERT(delta < 0); - - if (!rsvd) - return XFS_ERROR(ENOSPC); - - spin_lock(&mp->m_sb_lock); - lcounter = (int64_t)mp->m_resblks_avail + delta; - if (lcounter >= 0) { - mp->m_resblks_avail = lcounter; - spin_unlock(&mp->m_sb_lock); - return 0; - } - spin_unlock(&mp->m_sb_lock); - printk_once(KERN_WARNING - "Filesystem \"%s\": reserve blocks depleted! " - "Consider increasing reserve pool size.", - mp->m_fsname); - return XFS_ERROR(ENOSPC); - default: - ASSERT(0); - return XFS_ERROR(EINVAL); - } - return 0; -} - /* * xfs_getsb() is called to obtain the buffer for the superblock. * The buffer is returned locked and read in from disk. diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 42d31df..c8ff435 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -75,8 +75,8 @@ enum { XFS_ICSB_MAX, }; -extern int xfs_icsb_modify_counters(struct xfs_mount *, xfs_sb_field_t, - int64_t, int); +extern int xfs_icsb_modify_inodes(struct xfs_mount *, int, int64_t, int); +extern int xfs_icsb_modify_free_blocks(struct xfs_mount *, int64_t, int); typedef struct xfs_mount { struct super_block *m_super; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index f6d956b..cf2214d 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -696,7 +696,7 @@ xfs_trans_reserve( * fail if the count would go below zero. 
	 */
 	if (blocks > 0) {
-		error = xfs_icsb_modify_counters(tp->t_mountp, XFS_SBS_FDBLOCKS,
+		error = xfs_icsb_modify_free_blocks(tp->t_mountp,
 					  -((int64_t)blocks), rsvd);
 		if (error != 0) {
 			current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
@@ -767,8 +767,8 @@ undo_log:
 
 undo_blocks:
 	if (blocks > 0) {
-		xfs_icsb_modify_counters(tp->t_mountp, XFS_SBS_FDBLOCKS,
-					 (int64_t)blocks, rsvd);
+		xfs_icsb_modify_free_blocks(tp->t_mountp,
+					 (int64_t)blocks, rsvd);
 		tp->t_blk_res = 0;
 	}
 
@@ -1045,21 +1045,20 @@ xfs_trans_unreserve_and_mod_sb(
 
 	/* apply the per-cpu counters */
 	if (blkdelta) {
-		error = xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
-						 blkdelta, rsvd);
+		error = xfs_icsb_modify_free_blocks(mp, blkdelta, rsvd);
 		if (error)
 			goto out;
 	}
 
 	if (idelta) {
-		error = xfs_icsb_modify_counters(mp, XFS_SBS_ICOUNT,
+		error = xfs_icsb_modify_inodes(mp, XFS_ICSB_ICOUNT,
 						 idelta, rsvd);
 		if (error)
 			goto out_undo_fdblocks;
 	}
 
 	if (ifreedelta) {
-		error = xfs_icsb_modify_counters(mp, XFS_SBS_IFREE,
+		error = xfs_icsb_modify_inodes(mp, XFS_ICSB_IFREE,
 						 ifreedelta, rsvd);
 		if (error)
 			goto out_undo_icount;
@@ -1129,13 +1128,13 @@ xfs_trans_unreserve_and_mod_sb(
 
 out_undo_ifreecount:
 	if (ifreedelta)
-		xfs_icsb_modify_counters(mp, XFS_SBS_IFREE, -ifreedelta, rsvd);
+		xfs_icsb_modify_inodes(mp, XFS_ICSB_IFREE, -ifreedelta, rsvd);
 out_undo_icount:
 	if (idelta)
-		xfs_icsb_modify_counters(mp, XFS_SBS_ICOUNT, -idelta, rsvd);
+		xfs_icsb_modify_inodes(mp, XFS_ICSB_ICOUNT, -idelta, rsvd);
 out_undo_fdblocks:
 	if (blkdelta)
-		xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS, -blkdelta, rsvd);
+		xfs_icsb_modify_free_blocks(mp, -blkdelta, rsvd);
 out:
 	ASSERT(error == 0);
 	return;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() 2010-11-29 0:36 ` [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() Dave Chinner @ 2010-11-29 9:19 ` Christoph Hellwig 2010-11-30 4:38 ` Dave Chinner 0 siblings, 1 reply; 12+ messages in thread From: Christoph Hellwig @ 2010-11-29 9:19 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-kernel, a.p.zijlstra On Mon, Nov 29, 2010 at 11:36:41AM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > With the conversion to percpu counters, xfs_icsb_modify_counters() really does > not need to exist. Convert the inode counter modifications to use a common > helper function for the one place that calls them, and add another function for > the free block modification and convert all the callers to use that. > +xfs_icsb_modify_inodes( > + xfs_mount_t *mp, struct xfs_mount, please. > + int cntr, > + int64_t delta, > + int rsvd) the rsvd argument isn't used at all. > +{ > + int ret = 0; > + > + ASSERT(cntr == XFS_ICSB_ICOUNT || cntr == XFS_ICSB_IFREE); > + > + ret = xfs_icsb_add(mp, cntr, delta, 0); > + if (ret < 0) { > + ASSERT(0); > + return XFS_ERROR(EINVAL); > + } > + return 0; You could get rid of the ret variable as we don't care about the value. I also don't think we need the assert here - the caller already does one for us. > +xfs_icsb_modify_free_blocks( > + xfs_mount_t *mp, same here. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() 2010-11-29 9:19 ` Christoph Hellwig @ 2010-11-30 4:38 ` Dave Chinner 0 siblings, 0 replies; 12+ messages in thread From: Dave Chinner @ 2010-11-30 4:38 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs, linux-kernel, a.p.zijlstra On Mon, Nov 29, 2010 at 04:19:20AM -0500, Christoph Hellwig wrote: > On Mon, Nov 29, 2010 at 11:36:41AM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@redhat.com> > > > > With the conversion to percpu counters, xfs_icsb_modify_counters() really does > > not need to exist. Convert the inode counter modifications to use a common > > helper function for the one place that calls them, and add another function for > > the free block modification and convert all the callers to use that. > > > +xfs_icsb_modify_inodes( > > + xfs_mount_t *mp, > > struct xfs_mount, please. > > > + int cntr, > > + int64_t delta, > > + int rsvd) > > the rsvd argument isn't used at all. > > > +{ > > + int ret = 0; > > + > > + ASSERT(cntr == XFS_ICSB_ICOUNT || cntr == XFS_ICSB_IFREE); > > + > > + ret = xfs_icsb_add(mp, cntr, delta, 0); > > + if (ret < 0) { > > + ASSERT(0); > > + return XFS_ERROR(EINVAL); > > + } > > + return 0; > > You could get rid of the ret variable as we don't care about the > value. I also don't think we need the assert here - the caller already > does one for us. > > > +xfs_icsb_modify_free_blocks( > > + xfs_mount_t *mp, > > same here. Ok, makes sense. I'll clean it up. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 12+ messages in thread
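Reading Christoph's three comments together, the cleaned-up helper would presumably end up something like the sketch below: struct xfs_mount instead of the typedef, the unused rsvd argument gone, no intermediate ret variable, and the redundant ASSERT dropped. This is one reading of the requested cleanup, not the follow-up patch Dave actually posted:

int
xfs_icsb_modify_inodes(
	struct xfs_mount	*mp,
	int			cntr,
	int64_t			delta)
{
	/* cntr validity (XFS_ICSB_ICOUNT/XFS_ICSB_IFREE) is asserted by the caller */
	if (xfs_icsb_add(mp, cntr, delta, 0) < 0)
		return XFS_ERROR(EINVAL);
	return 0;
}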
* [PATCH 0/3] Use generic percpu counters in XFS V2 @ 2010-12-13 1:21 Dave Chinner 2010-12-13 1:21 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner 0 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2010-12-13 1:21 UTC (permalink / raw) To: xfs; +Cc: linux-kernel, a.p.zijlstra This series replaces the XFS per-cpu superblock counters with generic per-cpu counters. It adds a specialised operation to the generic counter implementation and then switches the XFS percpu superblock counters over to use the generic percpu counter implementation. The result is several hundred lines of complex code removed from XFS. Version 2: - use wrappers for percpu_counter_add_unless_lt() to allow custom batch sizes to be used. - removed xfs_icsb_*() wrappers from percpu counters. - cleaned up split of xfs_icsb_modify_counters(). ^ permalink raw reply [flat|nested] 12+ messages in thread
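To illustrate the first bullet point above: a batch-size wrapper might look like the sketch below. It assumes a __percpu_counter_add_unless_lt() variant taking an explicit batch, by analogy with __percpu_counter_add(); neither that signature nor the XFS_FDBLOCKS_BATCH value is quoted from the posted patches, so both are hypothetical:

/* hypothetical per-counter batch size */
#define XFS_FDBLOCKS_BATCH	128

static inline int
xfs_fdblocks_add_unless_lt(
	struct xfs_mount	*mp,
	s64			delta,
	s64			threshold)
{
	/* larger batch: fewer global-lock hits for the hot free-block counter */
	return __percpu_counter_add_unless_lt(&mp->m_icsb[XFS_ICSB_FDBLOCKS],
					      delta, threshold,
					      XFS_FDBLOCKS_BATCH);
}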
* [PATCH 2/3] xfs: use generic per-cpu counter infrastructure 2010-12-13 1:21 [PATCH 0/3] Use generic percpu counters in XFS V2 Dave Chinner @ 2010-12-13 1:21 ` Dave Chinner 2010-12-16 15:36 ` Christoph Hellwig 0 siblings, 1 reply; 12+ messages in thread From: Dave Chinner @ 2010-12-13 1:21 UTC (permalink / raw) To: xfs; +Cc: linux-kernel, a.p.zijlstra From: Dave Chinner <dchinner@redhat.com> XFS has a per-cpu counter implementation for in-core superblock counters that pre-dated the generic implementation. It is complex and baroque as it is tailored directly to the needs of ENOSPC detection. Now that the generic percpu counter infrastructure has the percpu_counter_add_unless_lt() function that implements the necessary threshold checks for us, switch the XFS per-cpu superblock counters to use the generic percpu counter infrastructure. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/linux-2.6/xfs_linux.h | 9 - fs/xfs/linux-2.6/xfs_super.c | 4 +- fs/xfs/xfs_fsops.c | 4 +- fs/xfs/xfs_mount.c | 802 ++++++++---------------------------------- fs/xfs/xfs_mount.h | 71 +--- 5 files changed, 171 insertions(+), 719 deletions(-) diff --git a/fs/xfs/linux-2.6/xfs_linux.h b/fs/xfs/linux-2.6/xfs_linux.h index 214ddd7..9fa4f2a 100644 --- a/fs/xfs/linux-2.6/xfs_linux.h +++ b/fs/xfs/linux-2.6/xfs_linux.h @@ -88,15 +88,6 @@ #include <xfs_super.h> #include <xfs_buf.h> -/* - * Feature macros (disable/enable) - */ -#ifdef CONFIG_SMP -#define HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */ -#else -#undef HAVE_PERCPU_SB /* per cpu superblock counters are a 2.6 feature */ -#endif - #define irix_sgid_inherit xfs_params.sgid_inherit.val #define irix_symlink_mode xfs_params.symlink_mode.val #define xfs_panic_mask xfs_params.panic_mask.val diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c index c45b323..abcda07 100644 --- a/fs/xfs/linux-2.6/xfs_super.c +++ b/fs/xfs/linux-2.6/xfs_super.c @@ -1229,9 +1229,9 @@ xfs_fs_statfs( statp->f_fsid.val[0] = (u32)id; statp->f_fsid.val[1] = (u32)(id >> 32); - xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT); - spin_lock(&mp->m_sb_lock); + xfs_icsb_sync_counters(mp); + statp->f_bsize = sbp->sb_blocksize; lsize = sbp->sb_logstart ? 
sbp->sb_logblocks : 0; statp->f_blocks = sbp->sb_dblocks - lsize; diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index a7c116e..fb9a9c8 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -478,8 +478,8 @@ xfs_fs_counts( xfs_mount_t *mp, xfs_fsop_counts_t *cnt) { - xfs_icsb_sync_counters(mp, XFS_ICSB_LAZY_COUNT); spin_lock(&mp->m_sb_lock); + xfs_icsb_sync_counters(mp); cnt->freedata = mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); cnt->freertx = mp->m_sb.sb_frextents; cnt->freeino = mp->m_sb.sb_ifree; @@ -540,7 +540,7 @@ xfs_reserve_blocks( */ retry: spin_lock(&mp->m_sb_lock); - xfs_icsb_sync_counters_locked(mp, 0); + xfs_icsb_sync_counters(mp); /* * If our previous reservation was larger than the current value, diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 19e9dfa..16be1b1 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -46,19 +46,6 @@ STATIC void xfs_unmountfs_wait(xfs_mount_t *); - -#ifdef HAVE_PERCPU_SB -STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t, - int); -STATIC void xfs_icsb_balance_counter_locked(xfs_mount_t *, xfs_sb_field_t, - int); -STATIC void xfs_icsb_disable_counter(xfs_mount_t *, xfs_sb_field_t); -#else - -#define xfs_icsb_balance_counter(mp, a, b) do { } while (0) -#define xfs_icsb_balance_counter_locked(mp, a, b) do { } while (0) -#endif - static const struct { short offset; short type; /* 0 = integer @@ -281,6 +268,71 @@ xfs_free_perag( } /* + * Per-cpu incore superblock counters + * + * This provides distributed per cpu counters for contended fields (e.g. free + * block count). Difficulties arise in that the incore sb is used for ENOSPC + * checking, and hence needs to be accurately read when we are running low on + * space. Hence we need to check against counter error bounds and determine how + * accurately to sum based on that metric. The percpu counters take care of + * this for us, so we only need to modify the fast path to handle per-cpu + * counter error cases. + */ +void +xfs_icsb_reinit_counters( + struct xfs_mount *mp) +{ + percpu_counter_set(&mp->m_icsb[XFS_ICSB_FDBLOCKS], + mp->m_sb.sb_fdblocks); + percpu_counter_set(&mp->m_icsb[XFS_ICSB_IFREE], mp->m_sb.sb_ifree); + percpu_counter_set(&mp->m_icsb[XFS_ICSB_ICOUNT], mp->m_sb.sb_icount); +} + +int +xfs_icsb_init_counters( + struct xfs_mount *mp) +{ + int i; + int error; + + for (i = 0; i < XFS_ICSB_MAX; i++) { + error = percpu_counter_init(&mp->m_icsb[i], 0); + if (error) + goto out_error; + } + xfs_icsb_reinit_counters(mp); + return 0; + +out_error: + for (; i >= 0; i--) + percpu_counter_destroy(&mp->m_icsb[i]); + return error; +} + +void +xfs_icsb_destroy_counters( + xfs_mount_t *mp) +{ + int i; + + for (i = 0; i < XFS_ICSB_MAX; i++) + percpu_counter_destroy(&mp->m_icsb[i]); +} + +void +xfs_icsb_sync_counters( + xfs_mount_t *mp) +{ + assert_spin_locked(&mp->m_sb_lock); + mp->m_sb.sb_icount = + percpu_counter_sum_positive(&mp->m_icsb[XFS_ICSB_ICOUNT]); + mp->m_sb.sb_ifree = + percpu_counter_sum_positive(&mp->m_icsb[XFS_ICSB_IFREE]); + mp->m_sb.sb_fdblocks = + percpu_counter_sum_positive(&mp->m_icsb[XFS_ICSB_FDBLOCKS]); +} + +/* * Check size of device based on the (data/realtime) block count. * Note: this check is used by the growfs code as well as mount. 
*/ @@ -1562,7 +1614,9 @@ xfs_log_sbcount( if (!xfs_fs_writable(mp)) return 0; - xfs_icsb_sync_counters(mp, 0); + spin_lock(&mp->m_sb_lock); + xfs_icsb_sync_counters(mp); + spin_unlock(&mp->m_sb_lock); /* * we don't need to do this if we are updating the superblock @@ -1674,9 +1728,8 @@ xfs_mod_incore_sb_unlocked( int64_t delta, int rsvd) { - int scounter; /* short counter for 32 bit fields */ - long long lcounter; /* long counter for 64 bit fields */ - long long res_used, rem; + int scounter = 0; /* short counter for 32 bit fields */ + long long lcounter = 0; /* long counter for 64 bit fields */ /* * With the in-core superblock spin lock held, switch * 0, then do not apply the delta and return EINVAL. */ switch (field) { - case XFS_SBS_ICOUNT: - lcounter = (long long)mp->m_sb.sb_icount; - lcounter += delta; - if (lcounter < 0) { - ASSERT(0); - return XFS_ERROR(EINVAL); - } - mp->m_sb.sb_icount = lcounter; - return 0; - case XFS_SBS_IFREE: - lcounter = (long long)mp->m_sb.sb_ifree; - lcounter += delta; - if (lcounter < 0) { - ASSERT(0); - return XFS_ERROR(EINVAL); - } - mp->m_sb.sb_ifree = lcounter; - return 0; - case XFS_SBS_FDBLOCKS: - lcounter = (long long) - mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); - res_used = (long long)(mp->m_resblks - mp->m_resblks_avail); - - if (delta > 0) { /* Putting blocks back */ - if (res_used > delta) { - mp->m_resblks_avail += delta; - } else { - rem = delta - res_used; - mp->m_resblks_avail = mp->m_resblks; - lcounter += rem; - } - } else { /* Taking blocks away */ - lcounter += delta; - if (lcounter >= 0) { - mp->m_sb.sb_fdblocks = lcounter + - XFS_ALLOC_SET_ASIDE(mp); - return 0; - } - - /* - * We are out of blocks, use any available reserved - * blocks if were allowed to. - */ - if (!rsvd) - return XFS_ERROR(ENOSPC); - - lcounter = (long long)mp->m_resblks_avail + delta; - if (lcounter >= 0) { - mp->m_resblks_avail = lcounter; - return 0; - } - printk_once(KERN_WARNING - "Filesystem \"%s\": reserve blocks depleted! " - "Consider increasing reserve pool size.", - mp->m_fsname); - return XFS_ERROR(ENOSPC); - } - - mp->m_sb.sb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp); - return 0; case XFS_SBS_FREXTENTS: lcounter = (long long)mp->m_sb.sb_frextents; lcounter += delta; @@ -1846,9 +1839,7 @@ xfs_mod_incore_sb( { int status; -#ifdef HAVE_PERCPU_SB ASSERT(field < XFS_SBS_ICOUNT || field > XFS_SBS_FDBLOCKS); -#endif spin_lock(&mp->m_sb_lock); status = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd); spin_unlock(&mp->m_sb_lock); @@ -1907,6 +1898,90 @@ unwind: return error; } +int +xfs_icsb_modify_counters( + xfs_mount_t *mp, + xfs_sb_field_t field, + int64_t delta, + int rsvd) +{ + int64_t lcounter; + int64_t res_used; + int ret = 0; + + + switch (field) { + case XFS_SBS_ICOUNT: + ret = percpu_counter_add_unless_lt(&mp->m_icsb[XFS_ICSB_ICOUNT], + delta, 0); + if (ret < 0) { + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; + + case XFS_SBS_IFREE: + ret = percpu_counter_add_unless_lt(&mp->m_icsb[XFS_ICSB_IFREE], + delta, 0); + if (ret < 0) { + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; + + case XFS_SBS_FDBLOCKS: + /* + * if we are putting blocks back, put them into the reserve + * block pool first. 
+ */ + if (mp->m_resblks != mp->m_resblks_avail && delta > 0) { + spin_lock(&mp->m_sb_lock); + res_used = (int64_t)(mp->m_resblks - + mp->m_resblks_avail); + if (res_used > delta) { + mp->m_resblks_avail += delta; + delta = 0; + } else { + delta -= res_used; + mp->m_resblks_avail = mp->m_resblks; + } + spin_unlock(&mp->m_sb_lock); + if (!delta) + return 0; + } + + /* try the change */ + ret = percpu_counter_add_unless_lt(&mp->m_icsb[XFS_ICSB_FDBLOCKS], + delta, XFS_ALLOC_SET_ASIDE(mp)); + if (likely(ret >= 0)) + return 0; + + /* ENOSPC */ + ASSERT(delta < 0); + + if (!rsvd) + return XFS_ERROR(ENOSPC); + + spin_lock(&mp->m_sb_lock); + lcounter = (int64_t)mp->m_resblks_avail + delta; + if (lcounter >= 0) { + mp->m_resblks_avail = lcounter; + spin_unlock(&mp->m_sb_lock); + return 0; + } + spin_unlock(&mp->m_sb_lock); + printk_once(KERN_WARNING + "Filesystem \"%s\": reserve blocks depleted! " + "Consider increasing reserve pool size.", + mp->m_fsname); + return XFS_ERROR(ENOSPC); + default: + ASSERT(0); + return XFS_ERROR(EINVAL); + } + return 0; +} + /* * xfs_getsb() is called to obtain the buffer for the superblock. * The buffer is returned locked and read in from disk. @@ -2000,572 +2075,3 @@ xfs_dev_is_read_only( } return 0; } - -#ifdef HAVE_PERCPU_SB -/* - * Per-cpu incore superblock counters - * - * Simple concept, difficult implementation - * - * Basically, replace the incore superblock counters with a distributed per cpu - * counter for contended fields (e.g. free block count). - * - * Difficulties arise in that the incore sb is used for ENOSPC checking, and - * hence needs to be accurately read when we are running low on space. Hence - * there is a method to enable and disable the per-cpu counters based on how - * much "stuff" is available in them. - * - * Basically, a counter is enabled if there is enough free resource to justify - * running a per-cpu fast-path. If the per-cpu counter runs out (i.e. a local - * ENOSPC), then we disable the counters to synchronise all callers and - * re-distribute the available resources. - * - * If, once we redistributed the available resources, we still get a failure, - * we disable the per-cpu counter and go through the slow path. - * - * The slow path is the current xfs_mod_incore_sb() function. This means that - * when we disable a per-cpu counter, we need to drain its resources back to - * the global superblock. We do this after disabling the counter to prevent - * more threads from queueing up on the counter. - * - * Essentially, this means that we still need a lock in the fast path to enable - * synchronisation between the global counters and the per-cpu counters. This - * is not a problem because the lock will be local to a CPU almost all the time - * and have little contention except when we get to ENOSPC conditions. - * - * Basically, this lock becomes a barrier that enables us to lock out the fast - * path while we do things like enabling and disabling counters and - * synchronising the counters. - * - * Locking rules: - * - * 1. m_sb_lock before picking up per-cpu locks - * 2. per-cpu locks always picked up via for_each_online_cpu() order - * 3. accurate counter sync requires m_sb_lock + per cpu locks - * 4. modifying per-cpu counters requires holding per-cpu lock - * 5. modifying global counters requires holding m_sb_lock - * 6. enabling or disabling a counter requires holding the m_sb_lock - * and _none_ of the per-cpu locks. 
- * - * Disabled counters are only ever re-enabled by a balance operation - * that results in more free resources per CPU than a given threshold. - * To ensure counters don't remain disabled, they are rebalanced when - * the global resource goes above a higher threshold (i.e. some hysteresis - * is present to prevent thrashing). - */ - -#ifdef CONFIG_HOTPLUG_CPU -/* - * hot-plug CPU notifier support. - * - * We need a notifier per filesystem as we need to be able to identify - * the filesystem to balance the counters out. This is achieved by - * having a notifier block embedded in the xfs_mount_t and doing pointer - * magic to get the mount pointer from the notifier block address. - */ -STATIC int -xfs_icsb_cpu_notify( - struct notifier_block *nfb, - unsigned long action, - void *hcpu) -{ - xfs_icsb_cnts_t *cntp; - xfs_mount_t *mp; - - mp = (xfs_mount_t *)container_of(nfb, xfs_mount_t, m_icsb_notifier); - cntp = (xfs_icsb_cnts_t *) - per_cpu_ptr(mp->m_sb_cnts, (unsigned long)hcpu); - switch (action) { - case CPU_UP_PREPARE: - case CPU_UP_PREPARE_FROZEN: - /* Easy Case - initialize the area and locks, and - * then rebalance when online does everything else for us. */ - memset(cntp, 0, sizeof(xfs_icsb_cnts_t)); - break; - case CPU_ONLINE: - case CPU_ONLINE_FROZEN: - xfs_icsb_lock(mp); - xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0); - xfs_icsb_unlock(mp); - break; - case CPU_DEAD: - case CPU_DEAD_FROZEN: - /* Disable all the counters, then fold the dead cpu's - * count into the total on the global superblock and - * re-enable the counters. */ - xfs_icsb_lock(mp); - spin_lock(&mp->m_sb_lock); - xfs_icsb_disable_counter(mp, XFS_SBS_ICOUNT); - xfs_icsb_disable_counter(mp, XFS_SBS_IFREE); - xfs_icsb_disable_counter(mp, XFS_SBS_FDBLOCKS); - - mp->m_sb.sb_icount += cntp->icsb_icount; - mp->m_sb.sb_ifree += cntp->icsb_ifree; - mp->m_sb.sb_fdblocks += cntp->icsb_fdblocks; - - memset(cntp, 0, sizeof(xfs_icsb_cnts_t)); - - xfs_icsb_balance_counter_locked(mp, XFS_SBS_ICOUNT, 0); - xfs_icsb_balance_counter_locked(mp, XFS_SBS_IFREE, 0); - xfs_icsb_balance_counter_locked(mp, XFS_SBS_FDBLOCKS, 0); - spin_unlock(&mp->m_sb_lock); - xfs_icsb_unlock(mp); - break; - } - - return NOTIFY_OK; -} -#endif /* CONFIG_HOTPLUG_CPU */ - -int -xfs_icsb_init_counters( - xfs_mount_t *mp) -{ - xfs_icsb_cnts_t *cntp; - int i; - - mp->m_sb_cnts = alloc_percpu(xfs_icsb_cnts_t); - if (mp->m_sb_cnts == NULL) - return -ENOMEM; - -#ifdef CONFIG_HOTPLUG_CPU - mp->m_icsb_notifier.notifier_call = xfs_icsb_cpu_notify; - mp->m_icsb_notifier.priority = 0; - register_hotcpu_notifier(&mp->m_icsb_notifier); -#endif /* CONFIG_HOTPLUG_CPU */ - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - memset(cntp, 0, sizeof(xfs_icsb_cnts_t)); - } - - mutex_init(&mp->m_icsb_mutex); - - /* - * start with all counters disabled so that the - * initial balance kicks us off correctly - */ - mp->m_icsb_counters = -1; - return 0; -} - -void -xfs_icsb_reinit_counters( - xfs_mount_t *mp) -{ - xfs_icsb_lock(mp); - /* - * start with all counters disabled so that the - * initial balance kicks us off correctly - */ - mp->m_icsb_counters = -1; - xfs_icsb_balance_counter(mp, XFS_SBS_ICOUNT, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_IFREE, 0); - xfs_icsb_balance_counter(mp, XFS_SBS_FDBLOCKS, 0); - xfs_icsb_unlock(mp); -} - -void -xfs_icsb_destroy_counters( - xfs_mount_t *mp) -{ - if (mp->m_sb_cnts) { - 
unregister_hotcpu_notifier(&mp->m_icsb_notifier); - free_percpu(mp->m_sb_cnts); - } - mutex_destroy(&mp->m_icsb_mutex); -} - -STATIC void -xfs_icsb_lock_cntr( - xfs_icsb_cnts_t *icsbp) -{ - while (test_and_set_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags)) { - ndelay(1000); - } -} - -STATIC void -xfs_icsb_unlock_cntr( - xfs_icsb_cnts_t *icsbp) -{ - clear_bit(XFS_ICSB_FLAG_LOCK, &icsbp->icsb_flags); -} - - -STATIC void -xfs_icsb_lock_all_counters( - xfs_mount_t *mp) -{ - xfs_icsb_cnts_t *cntp; - int i; - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - xfs_icsb_lock_cntr(cntp); - } -} - -STATIC void -xfs_icsb_unlock_all_counters( - xfs_mount_t *mp) -{ - xfs_icsb_cnts_t *cntp; - int i; - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - xfs_icsb_unlock_cntr(cntp); - } -} - -STATIC void -xfs_icsb_count( - xfs_mount_t *mp, - xfs_icsb_cnts_t *cnt, - int flags) -{ - xfs_icsb_cnts_t *cntp; - int i; - - memset(cnt, 0, sizeof(xfs_icsb_cnts_t)); - - if (!(flags & XFS_ICSB_LAZY_COUNT)) - xfs_icsb_lock_all_counters(mp); - - for_each_online_cpu(i) { - cntp = (xfs_icsb_cnts_t *)per_cpu_ptr(mp->m_sb_cnts, i); - cnt->icsb_icount += cntp->icsb_icount; - cnt->icsb_ifree += cntp->icsb_ifree; - cnt->icsb_fdblocks += cntp->icsb_fdblocks; - } - - if (!(flags & XFS_ICSB_LAZY_COUNT)) - xfs_icsb_unlock_all_counters(mp); -} - -STATIC int -xfs_icsb_counter_disabled( - xfs_mount_t *mp, - xfs_sb_field_t field) -{ - ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS)); - return test_bit(field, &mp->m_icsb_counters); -} - -STATIC void -xfs_icsb_disable_counter( - xfs_mount_t *mp, - xfs_sb_field_t field) -{ - xfs_icsb_cnts_t cnt; - - ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS)); - - /* - * If we are already disabled, then there is nothing to do - * here. We check before locking all the counters to avoid - * the expensive lock operation when being called in the - * slow path and the counter is already disabled. This is - * safe because the only time we set or clear this state is under - * the m_icsb_mutex. 
- */ - if (xfs_icsb_counter_disabled(mp, field)) - return; - - xfs_icsb_lock_all_counters(mp); - if (!test_and_set_bit(field, &mp->m_icsb_counters)) { - /* drain back to superblock */ - - xfs_icsb_count(mp, &cnt, XFS_ICSB_LAZY_COUNT); - switch(field) { - case XFS_SBS_ICOUNT: - mp->m_sb.sb_icount = cnt.icsb_icount; - break; - case XFS_SBS_IFREE: - mp->m_sb.sb_ifree = cnt.icsb_ifree; - break; - case XFS_SBS_FDBLOCKS: - mp->m_sb.sb_fdblocks = cnt.icsb_fdblocks; - break; - default: - BUG(); - } - } - - xfs_icsb_unlock_all_counters(mp); -} - -STATIC void -xfs_icsb_enable_counter( - xfs_mount_t *mp, - xfs_sb_field_t field, - uint64_t count, - uint64_t resid) -{ - xfs_icsb_cnts_t *cntp; - int i; - - ASSERT((field >= XFS_SBS_ICOUNT) && (field <= XFS_SBS_FDBLOCKS)); - - xfs_icsb_lock_all_counters(mp); - for_each_online_cpu(i) { - cntp = per_cpu_ptr(mp->m_sb_cnts, i); - switch (field) { - case XFS_SBS_ICOUNT: - cntp->icsb_icount = count + resid; - break; - case XFS_SBS_IFREE: - cntp->icsb_ifree = count + resid; - break; - case XFS_SBS_FDBLOCKS: - cntp->icsb_fdblocks = count + resid; - break; - default: - BUG(); - break; - } - resid = 0; - } - clear_bit(field, &mp->m_icsb_counters); - xfs_icsb_unlock_all_counters(mp); -} - -void -xfs_icsb_sync_counters_locked( - xfs_mount_t *mp, - int flags) -{ - xfs_icsb_cnts_t cnt; - - xfs_icsb_count(mp, &cnt, flags); - - if (!xfs_icsb_counter_disabled(mp, XFS_SBS_ICOUNT)) - mp->m_sb.sb_icount = cnt.icsb_icount; - if (!xfs_icsb_counter_disabled(mp, XFS_SBS_IFREE)) - mp->m_sb.sb_ifree = cnt.icsb_ifree; - if (!xfs_icsb_counter_disabled(mp, XFS_SBS_FDBLOCKS)) - mp->m_sb.sb_fdblocks = cnt.icsb_fdblocks; -} - -/* - * Accurate update of per-cpu counters to incore superblock - */ -void -xfs_icsb_sync_counters( - xfs_mount_t *mp, - int flags) -{ - spin_lock(&mp->m_sb_lock); - xfs_icsb_sync_counters_locked(mp, flags); - spin_unlock(&mp->m_sb_lock); -} - -/* - * Balance and enable/disable counters as necessary. - * - * Thresholds for re-enabling counters are somewhat magic. inode counts are - * chosen to be the same number as single on disk allocation chunk per CPU, and - * free blocks is something far enough zero that we aren't going thrash when we - * get near ENOSPC. We also need to supply a minimum we require per cpu to - * prevent looping endlessly when xfs_alloc_space asks for more than will - * be distributed to a single CPU but each CPU has enough blocks to be - * reenabled. - * - * Note that we can be called when counters are already disabled. - * xfs_icsb_disable_counter() optimises the counter locking in this case to - * prevent locking every per-cpu counter needlessly. 
- */ - -#define XFS_ICSB_INO_CNTR_REENABLE (uint64_t)64 -#define XFS_ICSB_FDBLK_CNTR_REENABLE(mp) \ - (uint64_t)(512 + XFS_ALLOC_SET_ASIDE(mp)) -STATIC void -xfs_icsb_balance_counter_locked( - xfs_mount_t *mp, - xfs_sb_field_t field, - int min_per_cpu) -{ - uint64_t count, resid; - int weight = num_online_cpus(); - uint64_t min = (uint64_t)min_per_cpu; - - /* disable counter and sync counter */ - xfs_icsb_disable_counter(mp, field); - - /* update counters - first CPU gets residual*/ - switch (field) { - case XFS_SBS_ICOUNT: - count = mp->m_sb.sb_icount; - resid = do_div(count, weight); - if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE)) - return; - break; - case XFS_SBS_IFREE: - count = mp->m_sb.sb_ifree; - resid = do_div(count, weight); - if (count < max(min, XFS_ICSB_INO_CNTR_REENABLE)) - return; - break; - case XFS_SBS_FDBLOCKS: - count = mp->m_sb.sb_fdblocks; - resid = do_div(count, weight); - if (count < max(min, XFS_ICSB_FDBLK_CNTR_REENABLE(mp))) - return; - break; - default: - BUG(); - count = resid = 0; /* quiet, gcc */ - break; - } - - xfs_icsb_enable_counter(mp, field, count, resid); -} - -STATIC void -xfs_icsb_balance_counter( - xfs_mount_t *mp, - xfs_sb_field_t fields, - int min_per_cpu) -{ - spin_lock(&mp->m_sb_lock); - xfs_icsb_balance_counter_locked(mp, fields, min_per_cpu); - spin_unlock(&mp->m_sb_lock); -} - -int -xfs_icsb_modify_counters( - xfs_mount_t *mp, - xfs_sb_field_t field, - int64_t delta, - int rsvd) -{ - xfs_icsb_cnts_t *icsbp; - long long lcounter; /* long counter for 64 bit fields */ - int ret = 0; - - might_sleep(); -again: - preempt_disable(); - icsbp = this_cpu_ptr(mp->m_sb_cnts); - - /* - * if the counter is disabled, go to slow path - */ - if (unlikely(xfs_icsb_counter_disabled(mp, field))) - goto slow_path; - xfs_icsb_lock_cntr(icsbp); - if (unlikely(xfs_icsb_counter_disabled(mp, field))) { - xfs_icsb_unlock_cntr(icsbp); - goto slow_path; - } - - switch (field) { - case XFS_SBS_ICOUNT: - lcounter = icsbp->icsb_icount; - lcounter += delta; - if (unlikely(lcounter < 0)) - goto balance_counter; - icsbp->icsb_icount = lcounter; - break; - - case XFS_SBS_IFREE: - lcounter = icsbp->icsb_ifree; - lcounter += delta; - if (unlikely(lcounter < 0)) - goto balance_counter; - icsbp->icsb_ifree = lcounter; - break; - - case XFS_SBS_FDBLOCKS: - BUG_ON((mp->m_resblks - mp->m_resblks_avail) != 0); - - lcounter = icsbp->icsb_fdblocks - XFS_ALLOC_SET_ASIDE(mp); - lcounter += delta; - if (unlikely(lcounter < 0)) - goto balance_counter; - icsbp->icsb_fdblocks = lcounter + XFS_ALLOC_SET_ASIDE(mp); - break; - default: - BUG(); - break; - } - xfs_icsb_unlock_cntr(icsbp); - preempt_enable(); - return 0; - -slow_path: - preempt_enable(); - - /* - * serialise with a mutex so we don't burn lots of cpu on - * the superblock lock. We still need to hold the superblock - * lock, however, when we modify the global structures. - */ - xfs_icsb_lock(mp); - - /* - * Now running atomically. - * - * If the counter is enabled, someone has beaten us to rebalancing. - * Drop the lock and try again in the fast path.... - */ - if (!(xfs_icsb_counter_disabled(mp, field))) { - xfs_icsb_unlock(mp); - goto again; - } - - /* - * The counter is currently disabled. Because we are - * running atomically here, we know a rebalance cannot - * be in progress. Hence we can go straight to operating - * on the global superblock. We do not call xfs_mod_incore_sb() - * here even though we need to get the m_sb_lock. Doing so - * will cause us to re-enter this function and deadlock. 
- * Hence we get the m_sb_lock ourselves and then call - * xfs_mod_incore_sb_unlocked() as the unlocked path operates - * directly on the global counters. - */ - spin_lock(&mp->m_sb_lock); - ret = xfs_mod_incore_sb_unlocked(mp, field, delta, rsvd); - spin_unlock(&mp->m_sb_lock); - - /* - * Now that we've modified the global superblock, we - * may be able to re-enable the distributed counters - * (e.g. lots of space just got freed). After that - * we are done. - */ - if (ret != ENOSPC) - xfs_icsb_balance_counter(mp, field, 0); - xfs_icsb_unlock(mp); - return ret; - -balance_counter: - xfs_icsb_unlock_cntr(icsbp); - preempt_enable(); - - /* - * We may have multiple threads here if multiple per-cpu - * counters run dry at the same time. This will mean we can - * do more balances than strictly necessary but it is not - * the common slowpath case. - */ - xfs_icsb_lock(mp); - - /* - * running atomically. - * - * This will leave the counter in the correct state for future - * accesses. After the rebalance, we simply try again and our retry - * will either succeed through the fast path or slow path without - * another balance operation being required. - */ - xfs_icsb_balance_counter(mp, field, delta); - xfs_icsb_unlock(mp); - goto again; -} - -#endif diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 5861b49..42d31df 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -65,44 +65,19 @@ struct xfs_nameops; struct xfs_ail; struct xfs_quotainfo; -#ifdef HAVE_PERCPU_SB - /* - * Valid per-cpu incore superblock counters. Note that if you add new counters, - * you may need to define new counter disabled bit field descriptors as there - * are more possible fields in the superblock that can fit in a bitfield on a - * 32 bit platform. The XFS_SBS_* values for the current current counters just - * fit. + * Per-cpu incore superblock counters. 
*/ -typedef struct xfs_icsb_cnts { - uint64_t icsb_fdblocks; - uint64_t icsb_ifree; - uint64_t icsb_icount; - unsigned long icsb_flags; -} xfs_icsb_cnts_t; - -#define XFS_ICSB_FLAG_LOCK (1 << 0) /* counter lock bit */ - -#define XFS_ICSB_LAZY_COUNT (1 << 1) /* accuracy not needed */ +enum { + XFS_ICSB_FDBLOCKS = 0, + XFS_ICSB_IFREE, + XFS_ICSB_ICOUNT, + XFS_ICSB_MAX, +}; -extern int xfs_icsb_init_counters(struct xfs_mount *); -extern void xfs_icsb_reinit_counters(struct xfs_mount *); -extern void xfs_icsb_destroy_counters(struct xfs_mount *); -extern void xfs_icsb_sync_counters(struct xfs_mount *, int); -extern void xfs_icsb_sync_counters_locked(struct xfs_mount *, int); extern int xfs_icsb_modify_counters(struct xfs_mount *, xfs_sb_field_t, int64_t, int); -#else -#define xfs_icsb_init_counters(mp) (0) -#define xfs_icsb_destroy_counters(mp) do { } while (0) -#define xfs_icsb_reinit_counters(mp) do { } while (0) -#define xfs_icsb_sync_counters(mp, flags) do { } while (0) -#define xfs_icsb_sync_counters_locked(mp, flags) do { } while (0) -#define xfs_icsb_modify_counters(mp, field, delta, rsvd) \ - xfs_mod_incore_sb(mp, field, delta, rsvd) -#endif - typedef struct xfs_mount { struct super_block *m_super; xfs_tid_t m_tid; /* next unused tid for fs */ @@ -186,12 +161,6 @@ typedef struct xfs_mount { struct xfs_chash *m_chash; /* fs private inode per-cluster * hash table */ atomic_t m_active_trans; /* number trans frozen */ -#ifdef HAVE_PERCPU_SB - xfs_icsb_cnts_t __percpu *m_sb_cnts; /* per-cpu superblock counters */ - unsigned long m_icsb_counters; /* disabled per-cpu counters */ - struct notifier_block m_icsb_notifier; /* hotplug cpu notifier */ - struct mutex m_icsb_mutex; /* balancer sync lock */ -#endif struct xfs_mru_cache *m_filestream; /* per-mount filestream data */ struct task_struct *m_sync_task; /* generalised sync thread */ xfs_sync_work_t m_sync_work; /* work item for VFS_SYNC */ @@ -202,6 +171,7 @@ typedef struct xfs_mount { __int64_t m_update_flags; /* sb flags we need to update on the next remount,rw */ struct shrinker m_inode_shrink; /* inode reclaim shrinker */ + struct percpu_counter m_icsb[XFS_ICSB_MAX]; } xfs_mount_t; /* @@ -333,26 +303,6 @@ struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno, void xfs_perag_put(struct xfs_perag *pag); /* - * Per-cpu superblock locking functions - */ -#ifdef HAVE_PERCPU_SB -static inline void -xfs_icsb_lock(xfs_mount_t *mp) -{ - mutex_lock(&mp->m_icsb_mutex); -} - -static inline void -xfs_icsb_unlock(xfs_mount_t *mp) -{ - mutex_unlock(&mp->m_icsb_mutex); -} -#else -#define xfs_icsb_lock(mp) -#define xfs_icsb_unlock(mp) -#endif - -/* * This structure is for use by the xfs_mod_incore_sb_batch() routine. * xfs_growfs can specify a few fields which are more than int limit */ @@ -379,6 +329,11 @@ extern int xfs_sb_validate_fsb_count(struct xfs_sb *, __uint64_t); extern int xfs_dev_is_read_only(struct xfs_mount *, char *); +extern int xfs_icsb_init_counters(struct xfs_mount *); +extern void xfs_icsb_reinit_counters(struct xfs_mount *); +extern void xfs_icsb_destroy_counters(struct xfs_mount *); +extern void xfs_icsb_sync_counters(struct xfs_mount *); + #endif /* __KERNEL__ */ extern void xfs_mod_sb(struct xfs_trans *, __int64_t); -- 1.7.2.3 ^ permalink raw reply related [flat|nested] 12+ messages in thread
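Distilling the diff above, the lifecycle of the converted counters reduces to the pattern below (error handling and surrounding mount code elided; this is a summary sketch, not a verbatim excerpt from the patch):

	/* mount: allocate the three counters, then seed from the on-disk sb */
	error = xfs_icsb_init_counters(mp);	/* percpu_counter_init() x 3 */

	/* wherever an accurate in-core superblock is needed (statfs, growfs) */
	spin_lock(&mp->m_sb_lock);
	xfs_icsb_sync_counters(mp);		/* fold per-cpu sums into mp->m_sb */
	freedata = mp->m_sb.sb_fdblocks - XFS_ALLOC_SET_ASIDE(mp);
	spin_unlock(&mp->m_sb_lock);

	/* unmount: tear the counters down again */
	xfs_icsb_destroy_counters(mp);		/* percpu_counter_destroy() x 3 */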
* Re: [PATCH 2/3] xfs: use generic per-cpu counter infrastructure 2010-12-13 1:21 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner @ 2010-12-16 15:36 ` Christoph Hellwig 0 siblings, 0 replies; 12+ messages in thread From: Christoph Hellwig @ 2010-12-16 15:36 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs, linux-kernel, a.p.zijlstra On Mon, Dec 13, 2010 at 12:21:52PM +1100, Dave Chinner wrote: > From: Dave Chinner <dchinner@redhat.com> > > XFS has a per-cpu counter implementation for in-core superblock > counters that pre-dated the generic implementation. It is complex > and baroque as it is tailored directly to the needs of ENOSPC > detection. > > Now that the generic percpu counter infrastructure has the > percpu_counter_add_unless_lt() function that implements the > necessary threshold checks for us, switch the XFS per-cpu > superblock counters to use the generic percpu counter > infrastructure. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Looks good to me, Reviewed-by: Christoph Hellwig <hch@lst.de> A little nitpick: > -#ifdef HAVE_PERCPU_SB > ASSERT(field < XFS_SBS_ICOUNT || field > XFS_SBS_FDBLOCKS); > -#endif No need to keep this assert - xfs_mod_incore_sb_unlocked already has one for unknown fields. ^ permalink raw reply [flat|nested] 12+ messages in thread
Thread overview: 12+ messages -- 2010-11-29 0:36 [PATCH 0/3] Use generic percpu counters in XFS Dave Chinner 2010-11-29 0:36 ` [PATCH 1/3] lib: percpu counter add unless less than functionality Dave Chinner 2010-11-29 12:08 ` Peter Zijlstra 2010-11-30 4:30 ` Dave Chinner 2010-11-29 0:36 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner 2010-11-29 9:16 ` Christoph Hellwig 2010-11-30 4:37 ` Dave Chinner 2010-11-29 0:36 ` [PATCH 3/3] xfs: demultiplex xfs_icsb_modify_counters() Dave Chinner 2010-11-29 9:19 ` Christoph Hellwig 2010-11-30 4:38 ` Dave Chinner -- strict thread matches above, loose matches on Subject: below -- 2010-12-13 1:21 [PATCH 0/3] Use generic percpu counters in XFS V2 Dave Chinner 2010-12-13 1:21 ` [PATCH 2/3] xfs: use generic per-cpu counter infrastructure Dave Chinner 2010-12-16 15:36 ` Christoph Hellwig