* [PATCH] Increase the default size of the reserved blocks pool
@ 2008-09-29 8:54 Lachlan McIlroy
2008-09-30 3:26 ` Mark Goodwin
2008-09-30 4:11 ` Dave Chinner
0 siblings, 2 replies; 9+ messages in thread
From: Lachlan McIlroy @ 2008-09-29 8:54 UTC (permalink / raw)
To: xfs-dev, xfs-oss
The current default size of the reserved blocks pool is easy to deplete
with certain workloads, in particular workloads that do lots of concurrent
delayed allocation extent conversions. If enough transactions are running
in parallel and the entire pool is consumed then subsequent calls to
xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
warning so we know if this starts happening again.
--- a/fs/xfs/xfs_mount.c 2008-09-29 18:30:26.000000000 +1000
+++ b/fs/xfs/xfs_mount.c 2008-09-29 18:27:37.000000000 +1000
@@ -1194,7 +1194,7 @@ xfs_mountfs(
*/
resblks = mp->m_sb.sb_dblocks;
do_div(resblks, 20);
- resblks = min_t(__uint64_t, resblks, 1024);
+ resblks = min_t(__uint64_t, resblks, 16384);
error = xfs_reserve_blocks(mp, &resblks, NULL);
if (error)
cmn_err(CE_WARN, "XFS: Unable to allocate reserve blocks. "
@@ -1483,6 +1483,7 @@ xfs_mod_incore_sb_unlocked(
int scounter; /* short counter for 32 bit fields */
long long lcounter; /* long counter for 64 bit fields */
long long res_used, rem;
+ static int depleted = 0;
/*
* With the in-core superblock spin lock held, switch
@@ -1535,6 +1536,9 @@ xfs_mod_incore_sb_unlocked(
if (rsvd) {
lcounter = (long long)mp->m_resblks_avail + delta;
if (lcounter < 0) {
+ if ((depleted % 100) == 0)
+ printk(KERN_DEBUG "XFS reserved blocks pool depleted.\n");
+ depleted++;
return XFS_ERROR(ENOSPC);
}
mp->m_resblks_avail = lcounter;
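For readers following along, the sizing rule the first hunk changes can be modelled in userspace. This is an illustrative sketch only (the function name `default_resblks` and the standalone form are not from xfs_mount.c):

```c
#include <stdint.h>

/*
 * Userspace model of the default reserve-pool sizing in xfs_mountfs():
 * 5% of the filesystem's data blocks, capped at `cap` blocks.
 */
static uint64_t default_resblks(uint64_t dblocks, uint64_t cap)
{
	uint64_t resblks = dblocks / 20;	/* do_div(resblks, 20) */
	return resblks < cap ? resblks : cap;	/* min_t(__uint64_t, resblks, cap) */
}
```

For a 5 GiB filesystem with 4 KiB blocks (dblocks = 1310720), 5% is 65536 blocks, so the cap applies: the old cap of 1024 blocks gives a 4 MiB pool, the proposed 16384 gives 64 MiB.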
^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-29  8:54 [PATCH] Increase the default size of the reserved blocks pool Lachlan McIlroy
@ 2008-09-30  3:26 ` Mark Goodwin
  2008-09-30  4:25   ` Dave Chinner
  2008-09-30  4:11 ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Goodwin @ 2008-09-30  3:26 UTC (permalink / raw)
  To: lachlan; +Cc: xfs-dev, xfs-oss

Lachlan McIlroy wrote:
> The current default size of the reserved blocks pool is easy to deplete
> with certain workloads, in particular workloads that do lots of concurrent
> delayed allocation extent conversions. If enough transactions are running
> in parallel and the entire pool is consumed then subsequent calls to
> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
> warning so we know if this starts happening again.
>

Should we also change the semantics of the XFS_SET_RESBLKS ioctl
so that the passed in value is the minimum required by the caller,
i.e. silently succeed if the current value is more than that?

Cheers
-- Mark

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  3:26 ` Mark Goodwin
@ 2008-09-30  4:25   ` Dave Chinner
  2008-09-30  6:08     ` Lachlan McIlroy
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  4:25 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: lachlan, xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 01:26:17PM +1000, Mark Goodwin wrote:
>
>
> Lachlan McIlroy wrote:
>> The current default size of the reserved blocks pool is easy to deplete
>> with certain workloads, in particular workloads that do lots of concurrent
>> delayed allocation extent conversions. If enough transactions are running
>> in parallel and the entire pool is consumed then subsequent calls to
>> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
>> warning so we know if this starts happening again.
>>
>
> Should we also change the semantics of the XFS_SET_RESBLKS ioctl
> so that the passed in value is the minimum required by the caller,
> i.e. silently succeed if the current value is more than that?

No. If we are asked to reduce the size of the pool, then we should
do so. The caller might have reason for wanting the pool size
reduced. e.g. using it to trigger early ENOSPC notification so that
there is always room to write critical application data when the
filesystem fills up....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  4:25 ` Dave Chinner
@ 2008-09-30  6:08   ` Lachlan McIlroy
  2008-09-30  6:37     ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Lachlan McIlroy @ 2008-09-30  6:08 UTC (permalink / raw)
  To: Mark Goodwin, lachlan, xfs-dev, xfs-oss

Dave Chinner wrote:
> On Tue, Sep 30, 2008 at 01:26:17PM +1000, Mark Goodwin wrote:
>>
>> Lachlan McIlroy wrote:
>>> The current default size of the reserved blocks pool is easy to deplete
>>> with certain workloads, in particular workloads that do lots of concurrent
>>> delayed allocation extent conversions. If enough transactions are running
>>> in parallel and the entire pool is consumed then subsequent calls to
>>> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
>>> warning so we know if this starts happening again.
>>>
>> Should we also change the semantics of the XFS_SET_RESBLKS ioctl
>> so that the passed in value is the minimum required by the caller,
>> i.e. silently succeed if the current value is more than that?
>
> No. If we are asked to reduce the size of the pool, then we should
> do so. The caller might have reason for wanting the pool size
> reduced. e.g. using it to trigger early ENOSPC notification so that
> there is always room to write critical application data when the
> filesystem fills up....
>

We tossed around the idea of preventing applications from reducing the
size of the reserved pool so that they could not weaken the integrity
of the filesystem by removing critical resources. We need to support
reducing the pool size because we do so on unmount.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  6:08 ` Lachlan McIlroy
@ 2008-09-30  6:37   ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  6:37 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: Mark Goodwin, xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 04:08:15PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Tue, Sep 30, 2008 at 01:26:17PM +1000, Mark Goodwin wrote:
>>>
>>> Lachlan McIlroy wrote:
>>>> The current default size of the reserved blocks pool is easy to deplete
>>>> with certain workloads, in particular workloads that do lots of concurrent
>>>> delayed allocation extent conversions. If enough transactions are running
>>>> in parallel and the entire pool is consumed then subsequent calls to
>>>> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
>>>> warning so we know if this starts happening again.
>>>>
>>> Should we also change the semantics of the XFS_SET_RESBLKS ioctl
>>> so that the passed in value is the minimum required by the caller,
>>> i.e. silently succeed if the current value is more than that?
>>
>> No. If we are asked to reduce the size of the pool, then we should
>> do so. The caller might have reason for wanting the pool size
>> reduced. e.g. using it to trigger early ENOSPC notification so that
>> there is always room to write critical application data when the
>> filesystem fills up....
>>
>
> We tossed around the idea of preventing applications from reducing the
> size of the reserved pool so that they could not weaken the integrity
> of the filesystem by removing critical resources. We need to support
> reducing the pool size because we do so on unmount.

Some people so tightly control their use of disk space that even the
default needs to be reduced.

We recently had someone come across this very problem when upgrading
from 2.6.18 to 2.6.25 - their app preallocated almost the entire
filesystem and so when the reserve pool took its blocks, the
filesystem was permanently at ENOSPC. The only way to fix this was to
reduce the pool size, and it was obvious that in this configuration
the reserve pool was superfluous because it was a static layout.

So at one end of the scale we've got the problem that some workloads,
when run at ENOSPC, will exhaust the default pool size. At the other
end we've got some workloads where the default pool size is too
large. And we've got the vast middle ground where there are no
problems with the current pool size but there may be issues with a
significant increase in pool size. It's this vast middle ground where
we'll get all the "I upgraded and now I can't use my XFS filesystem"
reports from. Let's not make more trouble for ourselves than is
necessary.

Hence it seems to me that the default should not be changed, the
various mitigation strategies we talked about should be implemented,
and SGI should tune the reserve pool to suit their users in the
Propack distro (like so many other tunables are modified)....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-29  8:54 [PATCH] Increase the default size of the reserved blocks pool Lachlan McIlroy
  2008-09-30  3:26 ` Mark Goodwin
@ 2008-09-30  4:11 ` Dave Chinner
  2008-09-30  4:29   ` Dave Chinner
  2008-09-30  6:19   ` Lachlan McIlroy
  1 sibling, 2 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  4:11 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
> The current default size of the reserved blocks pool is easy to deplete
> with certain workloads, in particular workloads that do lots of concurrent
> delayed allocation extent conversions. If enough transactions are running
> in parallel and the entire pool is consumed then subsequent calls to
> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
> warning so we know if this starts happening again.
>
> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
> @@ -1194,7 +1194,7 @@ xfs_mountfs(
>  	 */
>  	resblks = mp->m_sb.sb_dblocks;
>  	do_div(resblks, 20);
> -	resblks = min_t(__uint64_t, resblks, 1024);
> +	resblks = min_t(__uint64_t, resblks, 16384);

I'm still not convinced such a large increase is needed for average
case. This means that at a filesystem size of 5GB we are reserving
256MB (5%) for a corner case workload that is unlikely to be run on a
5GB filesystem. That is a substantial reduction in space for such
a filesystem, and quite possibly will drive systems into immediate
ENOSPC at mount. At that point stuff is going to fail badly during
boot.

Indeed - this will ENOSPC the root drive on my laptop the moment I
apply it (6GB root, 200MB free) and reboot, as well as my main
server (4GB root - 150MB free, 2GB /var - 100MB free, etc).

On that basis alone, I'd suggest this is a bad change to make to the
default value of the reserved block pool.

>  	error = xfs_reserve_blocks(mp, &resblks, NULL);
>  	if (error)
>  		cmn_err(CE_WARN, "XFS: Unable to allocate reserve blocks. "
> @@ -1483,6 +1483,7 @@ xfs_mod_incore_sb_unlocked(
>  	int		scounter;	/* short counter for 32 bit fields */
>  	long long	lcounter;	/* long counter for 64 bit fields */
>  	long long	res_used, rem;
> +	static int	depleted = 0;
>
>  	/*
>  	 * With the in-core superblock spin lock held, switch
> @@ -1535,6 +1536,9 @@ xfs_mod_incore_sb_unlocked(
>  		if (rsvd) {
>  			lcounter = (long long)mp->m_resblks_avail + delta;
>  			if (lcounter < 0) {
> +				if ((depleted % 100) == 0)
> +					printk(KERN_DEBUG "XFS reserved blocks pool depleted.\n");
> +				depleted++;
>  				return XFS_ERROR(ENOSPC);
>  			}

This should use the generic printk ratelimiter, and the error message
should use xfs_fs_cmn_err() to indicate what filesystem the error
is occurring on. ie.:

	if (printk_ratelimit())
		xfs_fs_cmn_err(CE_WARN, mp,
			"ENOSPC: reserved block pool empty");

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread
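For readers outside the kernel, the patch's every-Nth-occurrence throttling (as opposed to printk_ratelimit()'s time-based throttling) can be sketched in userspace. The function name `pool_warn` and the standalone form are illustrative, not from the patch:

```c
#include <stdio.h>

/*
 * Sketch of the patch's modulo rate limiter: print on occurrence
 * 0, 100, 200, ...  Returns 1 if a warning was emitted.  The kernel
 * alternative discussed here, printk_ratelimit(), limits by elapsed
 * time instead of by occurrence count.
 */
static int pool_warn(void)
{
	static int depleted = 0;
	int warned = 0;

	if ((depleted % 100) == 0) {
		fprintf(stderr, "XFS reserved blocks pool depleted.\n");
		warned = 1;
	}
	depleted++;
	return warned;
}
```

The modulo scheme guarantees the first failure is always reported, which is one reason an occurrence-based limiter might be preferred over a time-based one here.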
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  4:11 ` Dave Chinner
@ 2008-09-30  4:29   ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  4:29 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 02:11:49PM +1000, Dave Chinner wrote:
> On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
>> The current default size of the reserved blocks pool is easy to deplete
>> with certain workloads, in particular workloads that do lots of concurrent
>> delayed allocation extent conversions. If enough transactions are running
>> in parallel and the entire pool is consumed then subsequent calls to
>> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
>> warning so we know if this starts happening again.
>>
>> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
>> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
>> @@ -1194,7 +1194,7 @@ xfs_mountfs(
>>  	 */
>>  	resblks = mp->m_sb.sb_dblocks;
>>  	do_div(resblks, 20);
>>  -	resblks = min_t(__uint64_t, resblks, 1024);
>>  +	resblks = min_t(__uint64_t, resblks, 16384);
>
> I'm still not convinced such a large increase is needed for average
> case. This means that at a filesystem size of 5GB we are reserving
> 256MB (5%) for a corner case workload that is unlikely to be run on a
> 5GB filesystem. That is a substantial reduction in space for such
> a filesystem, and quite possibly will drive systems into immediate
> ENOSPC at mount. At that point stuff is going to fail badly during
> boot.

Sorry, helps if I get the maths right - I was thinking of 16k
filesystem blocks there. It's 64MB with 4k block size. My point still
stands, though, that this is a problem for small filesystems that are
typically used for root filesystems and are often run near full....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread
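The corrected arithmetic above can be checked directly. A throwaway helper (illustrative only, not kernel code) that converts a reserve size in filesystem blocks to MiB:

```c
#include <stdint.h>

/* Pool size in MiB for a reserve of `resblks` filesystem blocks of
 * `blksize` bytes each -- just the arithmetic behind the figures above. */
static uint64_t pool_mib(uint64_t resblks, uint64_t blksize)
{
	return (resblks * blksize) >> 20;
}
```

With the proposed cap of 16384 blocks this gives 64 MiB at a 4 KiB block size and 256 MiB at 16 KiB, matching the correction.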
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  4:11 ` Dave Chinner
@ 2008-09-30  6:19   ` Lachlan McIlroy
  2008-09-30  6:40     ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Lachlan McIlroy @ 2008-09-30  6:19 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-dev, xfs-oss

Dave Chinner wrote:
> On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
>> The current default size of the reserved blocks pool is easy to deplete
>> with certain workloads, in particular workloads that do lots of concurrent
>> delayed allocation extent conversions. If enough transactions are running
>> in parallel and the entire pool is consumed then subsequent calls to
>> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
>> warning so we know if this starts happening again.
>>
>> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
>> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
>> @@ -1194,7 +1194,7 @@ xfs_mountfs(
>>  	 */
>>  	resblks = mp->m_sb.sb_dblocks;
>>  	do_div(resblks, 20);
>>  -	resblks = min_t(__uint64_t, resblks, 1024);
>>  +	resblks = min_t(__uint64_t, resblks, 16384);
>
> I'm still not convinced such a large increase is needed for average
> case. This means that at a filesystem size of 5GB we are reserving
> 256MB (5%) for a corner case workload that is unlikely to be run on a
> 5GB filesystem. That is a substantial reduction in space for such
> a filesystem, and quite possibly will drive systems into immediate
> ENOSPC at mount. At that point stuff is going to fail badly during
> boot.

What the? Just last week you were trying to convince me that increasing
the pool size was a good idea.

>
> Indeed - this will ENOSPC the root drive on my laptop the moment I
> apply it (6GB root, 200MB free) and reboot, as well as my main
> server (4GB root - 150MB free, 2GB /var - 100MB free, etc).
>
> On that basis alone, I'd suggest this is a bad change to make to the
> default value of the reserved block pool.
>
>>  	error = xfs_reserve_blocks(mp, &resblks, NULL);
>>  	if (error)
>>  		cmn_err(CE_WARN, "XFS: Unable to allocate reserve blocks. "
>> @@ -1483,6 +1483,7 @@ xfs_mod_incore_sb_unlocked(
>>  	int		scounter;	/* short counter for 32 bit fields */
>>  	long long	lcounter;	/* long counter for 64 bit fields */
>>  	long long	res_used, rem;
>> +	static int	depleted = 0;
>>
>>  	/*
>>  	 * With the in-core superblock spin lock held, switch
>> @@ -1535,6 +1536,9 @@ xfs_mod_incore_sb_unlocked(
>>  		if (rsvd) {
>>  			lcounter = (long long)mp->m_resblks_avail + delta;
>>  			if (lcounter < 0) {
>> +				if ((depleted % 100) == 0)
>> +					printk(KERN_DEBUG "XFS reserved blocks pool depleted.\n");
>> +				depleted++;
>>  				return XFS_ERROR(ENOSPC);
>>  			}
>
> This should use the generic printk ratelimiter, and the error message
> should use xfs_fs_cmn_err() to indicate what filesystem the error
> is occurring on. ie.:
>
> 	if (printk_ratelimit())
> 		xfs_fs_cmn_err(CE_WARN, mp,
> 			"ENOSPC: reserved block pool empty");

Okay, I didn't know about printk_ratelimit(). Hmmm, that routine is not
entirely useful - if the system is generating lots of log messages then
it could suppress the one key message that indicates what's really going
on.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  6:19 ` Lachlan McIlroy
@ 2008-09-30  6:40   ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  6:40 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 04:19:56PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
>>> The current default size of the reserved blocks pool is easy to deplete
>>> with certain workloads, in particular workloads that do lots of concurrent
>>> delayed allocation extent conversions. If enough transactions are running
>>> in parallel and the entire pool is consumed then subsequent calls to
>>> xfs_trans_reserve() will fail with ENOSPC. Also add a rate limited
>>> warning so we know if this starts happening again.
>>>
>>> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
>>> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
>>> @@ -1194,7 +1194,7 @@ xfs_mountfs(
>>>  	 */
>>>  	resblks = mp->m_sb.sb_dblocks;
>>>  	do_div(resblks, 20);
>>>  -	resblks = min_t(__uint64_t, resblks, 1024);
>>>  +	resblks = min_t(__uint64_t, resblks, 16384);
>>
>> I'm still not convinced such a large increase is needed for average
>> case. This means that at a filesystem size of 5GB we are reserving
>> 256MB (5%) for a corner case workload that is unlikely to be run on a
>> 5GB filesystem. That is a substantial reduction in space for such
>> a filesystem, and quite possibly will drive systems into immediate
>> ENOSPC at mount. At that point stuff is going to fail badly during
>> boot.
>
> What the? Just last week you were trying to convince me that increasing
> the pool size was a good idea.

For your customer's systems that are being run at ENOSPC - not the
default for everyone!

>> This should use the generic printk ratelimiter, and the error message
>> should use xfs_fs_cmn_err() to indicate what filesystem the error
>> is occurring on. ie.:
>>
>> 	if (printk_ratelimit())
>> 		xfs_fs_cmn_err(CE_WARN, mp,
>> 			"ENOSPC: reserved block pool empty");
>
> Okay, I didn't know about printk_ratelimit(). Hmmm, that routine is not
> entirely useful - if the system is generating lots of log messages then
> it could suppress the one key message that indicates what's really going
> on.

If the message is that critical, then it shouldn't be rate limited
at all.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-09-30  6:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-29  8:54 [PATCH] Increase the default size of the reserved blocks pool Lachlan McIlroy
2008-09-30  3:26 ` Mark Goodwin
2008-09-30  4:25   ` Dave Chinner
2008-09-30  6:08     ` Lachlan McIlroy
2008-09-30  6:37       ` Dave Chinner
2008-09-30  4:11 ` Dave Chinner
2008-09-30  4:29   ` Dave Chinner
2008-09-30  6:19   ` Lachlan McIlroy
2008-09-30  6:40     ` Dave Chinner