public inbox for linux-xfs@vger.kernel.org
* [PATCH] Increase the default size of the reserved blocks pool
@ 2008-09-29  8:54 Lachlan McIlroy
  2008-09-30  3:26 ` Mark Goodwin
  2008-09-30  4:11 ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Lachlan McIlroy @ 2008-09-29  8:54 UTC (permalink / raw)
  To: xfs-dev, xfs-oss

The current default size of the reserved blocks pool is easy to deplete
with certain workloads, in particular workloads that do lots of concurrent
delayed allocation extent conversions.  If enough transactions are running
in parallel and the entire pool is consumed then subsequent calls to
xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
warning so we know if this starts happening again.

--- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
+++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
@@ -1194,7 +1194,7 @@ xfs_mountfs(
 	 */
 	resblks = mp->m_sb.sb_dblocks;
 	do_div(resblks, 20);
-	resblks = min_t(__uint64_t, resblks, 1024);
+	resblks = min_t(__uint64_t, resblks, 16384);
 	error = xfs_reserve_blocks(mp, &resblks, NULL);
 	if (error)
 		cmn_err(CE_WARN, "XFS: Unable to allocate reserve blocks. "
@@ -1483,6 +1483,7 @@ xfs_mod_incore_sb_unlocked(
 	int		scounter;	/* short counter for 32 bit fields */
 	long long	lcounter;	/* long counter for 64 bit fields */
 	long long	res_used, rem;
+	static int	depleted = 0;
 
 	/*
 	 * With the in-core superblock spin lock held, switch
@@ -1535,6 +1536,9 @@ xfs_mod_incore_sb_unlocked(
 				if (rsvd) {
 					lcounter = (long long)mp->m_resblks_avail + delta;
 					if (lcounter < 0) {
+						if ((depleted % 100) == 0)
+							printk(KERN_DEBUG "XFS reserved blocks pool depleted.\n");
+						depleted++;
 						return XFS_ERROR(ENOSPC);
 					}
 					mp->m_resblks_avail = lcounter;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-29  8:54 [PATCH] Increase the default size of the reserved blocks pool Lachlan McIlroy
@ 2008-09-30  3:26 ` Mark Goodwin
  2008-09-30  4:25   ` Dave Chinner
  2008-09-30  4:11 ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Goodwin @ 2008-09-30  3:26 UTC (permalink / raw)
  To: lachlan; +Cc: xfs-dev, xfs-oss



Lachlan McIlroy wrote:
> The current default size of the reserved blocks pool is easy to deplete
> with certain workloads, in particular workloads that do lots of concurrent
> delayed allocation extent conversions.  If enough transactions are running
> in parallel and the entire pool is consumed then subsequent calls to
> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
> warning so we know if this starts happening again.
> 

Should we also change the semantics of the XFS_SET_RESBLKS ioctl
so that the passed in value is the minimum required by the caller,
i.e. silently succeed if the current value is more than that?

Cheers
-- Mark


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-29  8:54 [PATCH] Increase the default size of the reserved blocks pool Lachlan McIlroy
  2008-09-30  3:26 ` Mark Goodwin
@ 2008-09-30  4:11 ` Dave Chinner
  2008-09-30  4:29   ` Dave Chinner
  2008-09-30  6:19   ` Lachlan McIlroy
  1 sibling, 2 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  4:11 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
> The current default size of the reserved blocks pool is easy to deplete
> with certain workloads, in particular workloads that do lots of concurrent
> delayed allocation extent conversions.  If enough transactions are running
> in parallel and the entire pool is consumed then subsequent calls to
> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
> warning so we know if this starts happening again.
>
> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
> @@ -1194,7 +1194,7 @@ xfs_mountfs(
> 	 */
> 	resblks = mp->m_sb.sb_dblocks;
> 	do_div(resblks, 20);
> -	resblks = min_t(__uint64_t, resblks, 1024);
> +	resblks = min_t(__uint64_t, resblks, 16384);

I'm still not convinced such a large increase is needed for average
case. This means that at a filesystem size of 5GB we are reserving
256MB (5%) for a corner case workload that is unlikely to be run on a
5GB filesystem. That is a substantial reduction in space for such
a filesystem, and quite possibly will drive systems into immediate
ENOSPC at mount. At that point stuff is going to fail badly during
boot.

Indeed - this will ENOSPC the root drive on my laptop the moment I
apply it (6GB root, 200MB free) and reboot, as well as my main
server (4GB root - 150MB free, 2GB /var - 100MB free, etc).
On that basis alone, I'd suggest this is a bad change to make to the
default value of the reserved block pool.

> 	error = xfs_reserve_blocks(mp, &resblks, NULL);
> 	if (error)
> 		cmn_err(CE_WARN, "XFS: Unable to allocate reserve blocks. "
> @@ -1483,6 +1483,7 @@ xfs_mod_incore_sb_unlocked(
> 	int		scounter;	/* short counter for 32 bit fields */
> 	long long	lcounter;	/* long counter for 64 bit fields */
> 	long long	res_used, rem;
> +	static int	depleted = 0;
>
> 	/*
> 	 * With the in-core superblock spin lock held, switch
> @@ -1535,6 +1536,9 @@ xfs_mod_incore_sb_unlocked(
> 				if (rsvd) {
> 					lcounter = (long long)mp->m_resblks_avail + delta;
> 					if (lcounter < 0) {
> +						if ((depleted % 100) == 0)
> +							printk(KERN_DEBUG "XFS reserved blocks pool depleted.\n");
> +						depleted++;
> 						return XFS_ERROR(ENOSPC);
> 					}

This should use the generic printk ratelimiter, and the error message
should use xfs_fs_cmn_err() to indicate what filesystem the error
is occurring on. i.e.:

	if (printk_ratelimit())
		xfs_fs_cmn_err(CE_WARN, mp,
				"ENOSPC: reserved block pool empty");

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  3:26 ` Mark Goodwin
@ 2008-09-30  4:25   ` Dave Chinner
  2008-09-30  6:08     ` Lachlan McIlroy
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  4:25 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: lachlan, xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 01:26:17PM +1000, Mark Goodwin wrote:
>
>
> Lachlan McIlroy wrote:
>> The current default size of the reserved blocks pool is easy to deplete
>> with certain workloads, in particular workloads that do lots of concurrent
>> delayed allocation extent conversions.  If enough transactions are running
>> in parallel and the entire pool is consumed then subsequent calls to
>> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
>> warning so we know if this starts happening again.
>>
>
> Should we also change the semantics of the XFS_SET_RESBLKS ioctl
> so that the passed in value is the minimum required by the caller,
> i.e. silently succeed if the current value is more than that?

No. If we are asked to reduce the size of the pool, then we should
do so. The caller might have reason for wanting the pool size
reduced. e.g. using it to trigger early ENOSPC notification so that
there is always room to write critical application data when the
filesystem fills up....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  4:11 ` Dave Chinner
@ 2008-09-30  4:29   ` Dave Chinner
  2008-09-30  6:19   ` Lachlan McIlroy
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  4:29 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 02:11:49PM +1000, Dave Chinner wrote:
> On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
> > The current default size of the reserved blocks pool is easy to deplete
> > with certain workloads, in particular workloads that do lots of concurrent
> > delayed allocation extent conversions.  If enough transactions are running
> > in parallel and the entire pool is consumed then subsequent calls to
> > xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
> > warning so we know if this starts happening again.
> >
> > --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
> > +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
> > @@ -1194,7 +1194,7 @@ xfs_mountfs(
> > 	 */
> > 	resblks = mp->m_sb.sb_dblocks;
> > 	do_div(resblks, 20);
> > -	resblks = min_t(__uint64_t, resblks, 1024);
> > +	resblks = min_t(__uint64_t, resblks, 16384);
> 
> I'm still not convinced such a large increase is needed for average
> case. This means that at a filesystem size of 5GB we are reserving
> 256MB (5%) for a corner case workload that is unlikely to be run on a
> 5GB filesystem. That is a substantial reduction in space for such
> a filesystem, and quite possibly will drive systems into immediate
> ENOSPC at mount. At that point stuff is going to fail badly during
> boot.

Sorry, helps if I get the maths right - I was thinking of 16k
filesystem blocks there. It's 64MB with 4k block size. My point
still stands, though, that this is a problem for small filesystems
that are typically used for root filesystems and are often run
near full....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  4:25   ` Dave Chinner
@ 2008-09-30  6:08     ` Lachlan McIlroy
  2008-09-30  6:37       ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Lachlan McIlroy @ 2008-09-30  6:08 UTC (permalink / raw)
  To: Mark Goodwin, lachlan, xfs-dev, xfs-oss

Dave Chinner wrote:
> On Tue, Sep 30, 2008 at 01:26:17PM +1000, Mark Goodwin wrote:
>>
>> Lachlan McIlroy wrote:
>>> The current default size of the reserved blocks pool is easy to deplete
>>> with certain workloads, in particular workloads that do lots of concurrent
>>> delayed allocation extent conversions.  If enough transactions are running
>>> in parallel and the entire pool is consumed then subsequent calls to
>>> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
>>> warning so we know if this starts happening again.
>>>
>> Should we also change the semantics of the XFS_SET_RESBLKS ioctl
>> so that the passed in value is the minimum required by the caller,
>> i.e. silently succeed if the current value is more than that?
> 
> No. If we are asked to reduce the size of the pool, then we should
> do so. The caller might have reason for wanting the pool size
> reduced. e.g. using it to trigger early ENOSPC notification so that
> there is always room to write critical application data when the
> filesystem fills up....
> 

We tossed around the idea of preventing applications from reducing the
size of the reserved pool so that they could not weaken the integrity
of the filesystem by removing critical resources.  We need to support
reducing the pool size because we do so on unmount.


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  4:11 ` Dave Chinner
  2008-09-30  4:29   ` Dave Chinner
@ 2008-09-30  6:19   ` Lachlan McIlroy
  2008-09-30  6:40     ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Lachlan McIlroy @ 2008-09-30  6:19 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-dev, xfs-oss

Dave Chinner wrote:
> On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
>> The current default size of the reserved blocks pool is easy to deplete
>> with certain workloads, in particular workloads that do lots of concurrent
>> delayed allocation extent conversions.  If enough transactions are running
>> in parallel and the entire pool is consumed then subsequent calls to
>> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
>> warning so we know if this starts happening again.
>>
>> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
>> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
>> @@ -1194,7 +1194,7 @@ xfs_mountfs(
>> 	 */
>> 	resblks = mp->m_sb.sb_dblocks;
>> 	do_div(resblks, 20);
>> -	resblks = min_t(__uint64_t, resblks, 1024);
>> +	resblks = min_t(__uint64_t, resblks, 16384);
> 
> I'm still not convinced such a large increase is needed for average
> case. This means that at a filesystem size of 5GB we are reserving
> 256MB (5%) for a corner case workload that is unlikely to be run on a
> 5GB filesystem. That is a substantial reduction in space for such
> a filesystem, and quite possibly will drive systems into immediate
> ENOSPC at mount. At that point stuff is going to fail badly during
> boot.
What the?  Just last week you were trying to convince me that increasing
the pool size was a good idea.

> 
> Indeed - this will ENOSPC the root drive on my laptop the moment I
> apply it (6GB root, 200MB free) and reboot, as well as my main
> server (4GB root - 150MB free, 2GB /var - 100MB free, etc).
> On that basis alone, I'd suggest this is a bad change to make to the
> default value of the reserved block pool.
> 
>> 	error = xfs_reserve_blocks(mp, &resblks, NULL);
>> 	if (error)
>> 		cmn_err(CE_WARN, "XFS: Unable to allocate reserve blocks. "
>> @@ -1483,6 +1483,7 @@ xfs_mod_incore_sb_unlocked(
>> 	int		scounter;	/* short counter for 32 bit fields */
>> 	long long	lcounter;	/* long counter for 64 bit fields */
>> 	long long	res_used, rem;
>> +	static int	depleted = 0;
>>
>> 	/*
>> 	 * With the in-core superblock spin lock held, switch
>> @@ -1535,6 +1536,9 @@ xfs_mod_incore_sb_unlocked(
>> 				if (rsvd) {
>> 					lcounter = (long long)mp->m_resblks_avail + delta;
>> 					if (lcounter < 0) {
>> +						if ((depleted % 100) == 0)
>> +							printk(KERN_DEBUG "XFS reserved blocks pool depleted.\n");
>> +						depleted++;
>> 						return XFS_ERROR(ENOSPC);
>> 					}
> 
> This should use the generic printk ratelimiter, and the error message
> should use xfs_fs_cmn_err() to indicate what filesystem the error
> is occurring on. i.e.:
> 
> 	if (printk_ratelimit())
> 		xfs_fs_cmn_err(CE_WARN, mp,
> 				"ENOSPC: reserved block pool empty");

Okay, I didn't know about printk_ratelimit().  Hmmm, that routine is not
entirely useful - if the system is generating lots of log messages then
it could suppress the one key message that indicates what's really going
on.


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  6:08     ` Lachlan McIlroy
@ 2008-09-30  6:37       ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  6:37 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: Mark Goodwin, xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 04:08:15PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Tue, Sep 30, 2008 at 01:26:17PM +1000, Mark Goodwin wrote:
>>>
>>> Lachlan McIlroy wrote:
>>>> The current default size of the reserved blocks pool is easy to deplete
>>>> with certain workloads, in particular workloads that do lots of concurrent
>>>> delayed allocation extent conversions.  If enough transactions are running
>>>> in parallel and the entire pool is consumed then subsequent calls to
>>>> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
>>>> warning so we know if this starts happening again.
>>>>
>>> Should we also change the semantics of the XFS_SET_RESBLKS ioctl
>>> so that the passed in value is the minimum required by the caller,
>>> i.e. silently succeed if the current value is more than that?
>>
>> No. If we are asked to reduce the size of the pool, then we should
>> do so. The caller might have reason for wanting the pool size
>> reduced. e.g. using it to trigger early ENOSPC notification so that
>> there is always room to write critical application data when the
>> filesystem fills up....
>>
>
> We tossed around the idea of preventing applications from reducing the
> size of the reserved pool so that they could not weaken the integrity
> of the filesystem by removing critical resources.  We need to support
> reducing the pool size because we do so on unmount.

Some people so tightly control their use of disk space that even the
default needs to be reduced. We recently had someone come across this
very problem when upgrading from 2.6.18 to 2.6.25 - their app
preallocated almost the entire filesystem, and so when the reserve
pool took its blocks, the filesystem was permanently at ENOSPC.
The only way to fix this was to reduce the pool size, and it was
obvious that in this configuration the reserve pool was superfluous
because it was a static layout.

So at one end of the scale we've got the problem of some workloads
when run at ENOSPC will exhaust the default pool size. At the other
end we've got some workloads where the default pool size is too
large. And we've got the vast middle ground where there are no
problems with the current pool size but may have issues with a
significant increase in pool size.

It's this vast middle ground where we'll get all the "I upgraded and
now I can't use my XFS filesystem" reports from. Let's not make more
trouble for ourselves than is necessary.  Hence it seems to me that
the default should not be changed, the various mitigation strategies
we talked about should be implemented, and SGI should tune the
reserve pool to suit their users in the Propack distro (like so many
other tunables are modified)....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] Increase the default size of the reserved blocks pool
  2008-09-30  6:19   ` Lachlan McIlroy
@ 2008-09-30  6:40     ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2008-09-30  6:40 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss

On Tue, Sep 30, 2008 at 04:19:56PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Mon, Sep 29, 2008 at 06:54:13PM +1000, Lachlan McIlroy wrote:
>>> The current default size of the reserved blocks pool is easy to deplete
>>> with certain workloads, in particular workloads that do lots of concurrent
>>> delayed allocation extent conversions.  If enough transactions are running
>>> in parallel and the entire pool is consumed then subsequent calls to
>>> xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
>>> warning so we know if this starts happening again.
>>>
>>> --- a/fs/xfs/xfs_mount.c	2008-09-29 18:30:26.000000000 +1000
>>> +++ b/fs/xfs/xfs_mount.c	2008-09-29 18:27:37.000000000 +1000
>>> @@ -1194,7 +1194,7 @@ xfs_mountfs(
>>> 	 */
>>> 	resblks = mp->m_sb.sb_dblocks;
>>> 	do_div(resblks, 20);
>>> -	resblks = min_t(__uint64_t, resblks, 1024);
>>> +	resblks = min_t(__uint64_t, resblks, 16384);
>>
>> I'm still not convinced such a large increase is needed for average
>> case. This means that at a filesystem size of 5GB we are reserving
>> 256MB (5%) for a corner case workload that is unlikely to be run on a
>> 5GB filesystem. That is a substantial reduction in space for such
>> a filesystem, and quite possibly will drive systems into immediate
>> ENOSPC at mount. At that point stuff is going to fail badly during
>> boot.
> What the?  Just last week you were trying to convince me that increasing
> the pool size was a good idea.

For your customer's systems that are being run at ENOSPC - not the
default for everyone!

>> This should use the generic printk ratelimiter, and the error message
>> should use xfs_fs_cmn_err() to indicate what filesystem the error
>> is occurring on. i.e.:
>>
>> 	if (printk_ratelimit())
>> 		xfs_fs_cmn_err(CE_WARN, mp,
>> 				"ENOSPC: reserved block pool empty");
>
> Okay, I didn't know about printk_ratelimit().  Hmmm, that routine is not
> entirely useful - if the system is generating lots of log messages then
> it could suppress the one key message that indicates what's really going
> on.

If the message is that critical, then it shouldn't be rate limited
at all.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
