public inbox for linux-xfs@vger.kernel.org
* [PATCH] xfs:negative_icount.patch
@ 2010-07-16 18:53 Stuart Brodsky
  2010-07-18  4:54 ` Christoph Hellwig
  2010-07-19  0:14 ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: Stuart Brodsky @ 2010-07-16 18:53 UTC (permalink / raw)
  To: xfs

This patch fixes the stat -f icount field going negative.  The use of
lazy counts means that the superblock field sb_icount is not updated
correctly and will allow over-allocation of inodes above maxicount.

Signed-off-by: Stu Brodsky <sbrodsky@sgi.com>

Index: a/fs/xfs/xfs_ialloc.c
===================================================================
--- a/fs/xfs/xfs_ialloc.c.orig	2010-07-16 10:12:02.000000000 -0500
+++ b/fs/xfs/xfs_ialloc.c	2010-07-16 10:19:56.312633030 -0500
@@ -260,7 +260,7 @@
 	 */
 	newlen = XFS_IALLOC_INODES(args.mp);
 	if (args.mp->m_maxicount &&
-	    args.mp->m_sb.sb_icount + newlen > args.mp->m_maxicount)
+	    atomic64_read(&args.mp->m_inodesinuse) + newlen > args.mp->m_maxicount)
 		return XFS_ERROR(ENOSPC);
 	args.minlen = args.maxlen = XFS_IALLOC_BLOCKS(args.mp);
 	/*
@@ -708,7 +708,7 @@
 	 */
 
 	if (mp->m_maxicount &&
-	    mp->m_sb.sb_icount + XFS_IALLOC_INODES(mp) > mp->m_maxicount) {
+	    atomic64_read(&mp->m_inodesinuse) + XFS_IALLOC_INODES(mp) > mp->m_maxicount) {
 		noroom = 1;
 		okalloc = 0;
 	}
Index: b/fs/xfs/xfs_mount.c
===================================================================
--- a/fs/xfs/xfs_mount.c.orig	2010-07-16 10:12:02.000000000 -0500
+++ b/fs/xfs/xfs_mount.c	2010-07-16 10:20:49.717572132 -0500
@@ -744,6 +744,7 @@
 	mp->m_blockmask = sbp->sb_blocksize - 1;
 	mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
 	mp->m_blockwmask = mp->m_blockwsize - 1;
+	atomic64_set(&mp->m_inodesinuse, sbp->sb_icount);
 
 	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1);
 	mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0);
Index: a/fs/xfs/xfs_mount.h
===================================================================
--- a/fs/xfs/xfs_mount.h.orig	2010-07-16 10:12:02.000000000 -0500
+++ b/fs/xfs/xfs_mount.h	2010-07-16 10:21:36.377577685 -0500
@@ -137,6 +137,7 @@
 	__uint8_t		m_agno_log;	/* log #ag's */
 	__uint8_t		m_agino_log;	/* #bits for agino in inum */
 	__uint16_t		m_inode_cluster_size;/* min inode buf size */
+	atomic64_t		m_inodesinuse;	/* in use inodes */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
 	uint			m_blockwmask;	/* blockwsize-1 */
Index: a/fs/xfs/xfs_trans.c
===================================================================
--- a/fs/xfs/xfs_trans.c.orig	2010-07-16 10:12:02.000000000 -0500
+++ b/fs/xfs/xfs_trans.c	2010-07-16 10:22:47.690249548 -0500
@@ -805,6 +805,7 @@
 	switch (field) {
 	case XFS_TRANS_SB_ICOUNT:
 		tp->t_icount_delta += delta;
+		atomic64_add(delta, &tp->t_mountp->m_inodesinuse);
 		if (xfs_sb_version_haslazysbcount(&mp->m_sb))
 			flags &= ~XFS_TRANS_SB_DIRTY;
 		break;




-- 
Stuart Brodsky
2750 Blue Water Road
Eagan, MN. 55121
651-683-7910
<sbrodsky@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs:negative_icount.patch
  2010-07-16 18:53 [PATCH] xfs:negative_icount.patch Stuart Brodsky
@ 2010-07-18  4:54 ` Christoph Hellwig
  2010-07-19  0:14 ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2010-07-18  4:54 UTC (permalink / raw)
  To: Stuart Brodsky; +Cc: xfs

On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote:
> This patch fixes the stat -f icount field going negative.

In what tool?  Do you mean the Inodes: Total output in the stat(1) tool?

Please describe the problem in more detail in the commit log, and also
use a descriptive subject line.


> The use of
> lazy counts means that the super block field sb_icount is not updated
> correctly and will allow over allocation of inodes above maxicount.

The patch replaces the use of m_sb.sb_icount with your new atomic
variable in a few places in the inode allocator.  If it really was the
stat tool mentioned above I would expect xfs_fs_statfs to also be
updated to use it.  Either way I'm a bit confused about which users
are switched to the new variable and which not.  This probably needs
a good comment in the code explaining where to use it and where not.

And the commit message needs a comment explaining how this negative
icount actually happened.

Note that introducing a filesystem-global atomic variable generally
isn't a good thing for scalability.  This one is touched seldom enough
that it might not be a big problem, but depending on where you need to
use it, summing up the per-cpu counters on demand might be a better option.
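The per-cpu idea can be sketched in plain userspace C.  The names and the
fixed NR_CPUS below are illustrative only; the kernel's real interface for
this is the percpu_counter API, not this code:

```c
#include <stdint.h>

#define NR_CPUS 4	/* illustrative; real kernels size this at runtime */

/* One counter slot per CPU: writers touch only their own slot, so the
 * fast path has no shared-cacheline contention. */
struct percpu_icount {
	int64_t slot[NR_CPUS];
};

/* Fast path: a CPU adds its delta to its own slot only. */
static inline void percpu_icount_add(struct percpu_icount *c, int cpu,
				     int64_t delta)
{
	c->slot[cpu] += delta;
}

/* Slow path: readers sum every slot on demand to get the total. */
static inline int64_t percpu_icount_sum(const struct percpu_icount *c)
{
	int64_t sum = 0;
	for (int i = 0; i < NR_CPUS; i++)
		sum += c->slot[i];
	return sum;
}
```

The trade-off is exactly the one described above: updates scale, but each
read pays for a walk over all the slots, so it suits counters that are
written often and read rarely.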


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs:negative_icount.patch
  2010-07-16 18:53 [PATCH] xfs:negative_icount.patch Stuart Brodsky
  2010-07-18  4:54 ` Christoph Hellwig
@ 2010-07-19  0:14 ` Dave Chinner
       [not found]   ` <ECC25D18-BB9D-4EC7-A73C-139A688CE1C2@sgi.com>
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2010-07-19  0:14 UTC (permalink / raw)
  To: Stuart Brodsky; +Cc: xfs

On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote:
> This patch fixes the stat -f icount field going negative.  The use of
> lazy counts means that the super block field sb_icount is not updated
> correctly and will allow over allocation of inodes above maxicount.
> 
> Signed-off-by: Stu Brodsky <sbrodsky@sgi.com>
> 
> Index: a/fs/xfs/xfs_ialloc.c
> ===================================================================
> --- a/fs/xfs/xfs_ialloc.c.orig	2010-07-16 10:12:02.000000000 -0500
> +++ b/fs/xfs/xfs_ialloc.c	2010-07-16 10:19:56.312633030 -0500
> @@ -260,7 +260,7 @@
>  	 */
>  	newlen = XFS_IALLOC_INODES(args.mp);
>  	if (args.mp->m_maxicount &&
> -	    args.mp->m_sb.sb_icount + newlen > args.mp->m_maxicount)
> +	    atomic64_read(&args.mp->m_inodesinuse) + newlen > args.mp->m_maxicount)
>  		return XFS_ERROR(ENOSPC);
>  	args.minlen = args.maxlen = XFS_IALLOC_BLOCKS(args.mp);
>  	/*
> @@ -708,7 +708,7 @@
>  	 */
>  
>  	if (mp->m_maxicount &&
> -	    mp->m_sb.sb_icount + XFS_IALLOC_INODES(mp) > mp->m_maxicount) {
> +	    atomic64_read(&mp->m_inodesinuse) + XFS_IALLOC_INODES(mp) > mp->m_maxicount) {
>  		noroom = 1;
>  		okalloc = 0;
>  	}

.....

> Index: a/fs/xfs/xfs_trans.c
> ===================================================================
> --- a/fs/xfs/xfs_trans.c.orig	2010-07-16 10:12:02.000000000 -0500
> +++ b/fs/xfs/xfs_trans.c	2010-07-16 10:22:47.690249548 -0500
> @@ -805,6 +805,7 @@
>  	switch (field) {
>  	case XFS_TRANS_SB_ICOUNT:
>  		tp->t_icount_delta += delta;
> +		atomic64_add(delta,&tp->t_mountp->m_inodesinuse);
>  		if (xfs_sb_version_haslazysbcount(&mp->m_sb))
>  			flags &= ~XFS_TRANS_SB_DIRTY;
>  		break;

This is still racy. It may fix your specific reproducer, but I can't see
how moving the in-use inode count update from late in xfs_trans_commit()
to early in xfs_trans_commit() solves the problem. If we look at it
like this:


	thread 1		thread 2
	check count
	<start alloc>
	.....
				check count
				<start alloc>
				....
	<update count>


All you've done is change where <update count> occurs, rather than
close the race that leads to the negative inode counts.
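The window can be demonstrated, and closed, with a standalone sketch.
All names are hypothetical and C11 atomics stand in for the kernel's
atomic64_t; this is not XFS code:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The racy check-then-update pattern described above: another thread
 * can pass the check between our load and our add, so the total can
 * overshoot max by many concurrent allocations. */
static bool try_alloc_racy(atomic_int_least64_t *count, int64_t newlen,
			   int64_t max)
{
	if (atomic_load(count) + newlen > max)	/* check ... */
		return false;
	atomic_fetch_add(count, newlen);	/* ... then update: window */
	return true;
}

/* A closed variant: reserve in the same atomic operation as the check,
 * and roll the reservation back if it went over the limit. */
static bool try_alloc_reserved(atomic_int_least64_t *count, int64_t newlen,
			       int64_t max)
{
	if (atomic_fetch_add(count, newlen) + newlen > max) {
		atomic_fetch_sub(count, newlen);	/* undo overshoot */
		return false;
	}
	return true;
}
```

In the reserved variant an in-flight reservation can still transiently
exceed max, but it is always undone, so the settled count never does.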

Realistically, there is nothing wrong with XFS exceeding mp->m_maxicount by a
small amount - it's just an arbitrary limit that we use to prevent all the
filesystem space from being taken up by inodes.  I think the problem is just a
reporting problem here in xfs_fs_statfs():

1245         fakeinos = statp->f_bfree << sbp->sb_inopblog;
1246         statp->f_files =
1247             MIN(sbp->sb_icount + fakeinos, (__uint64_t)XFS_MAXINUMBER);
1248         if (mp->m_maxicount)
1249                 statp->f_files = min_t(typeof(statp->f_files),
1250                                         statp->f_files,
1251                                         mp->m_maxicount);
1252         statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);

Assume no free space and no free inodes (statp->f_bfree = 0, => fakeinos = 0),
and we have just hit the above race condition to allocate more inodes than
mp->m_maxicount. Hence we have sbp->sb_icount > mp->m_maxicount and that gives:

	statp->f_files = min(sbp->sb_icount, mp->m_maxicount);
	statp->f_files = mp->m_maxicount;

And now we've allocated all the free inodes (sbp->sb_ifree = 0). Therefore:

	statp->f_ffree = mp->m_maxicount - (sbp->sb_icount - 0)

Which gives statp->f_ffree < 0 and hence negative free inodes. I think
all that is required to "fix" this is to clamp statp->f_ffree to zero.
i.e.:

	if (statp->f_ffree < 0)
		statp->f_ffree = 0;
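That arithmetic, including the proposed clamp, can be checked with a
standalone version of the calculation (plain values stand in for the
statfs and superblock fields; names are illustrative):

```c
#include <stdint.h>

/* Re-creation of the f_ffree calculation quoted above, plus the clamp.
 * Evaluated as signed so the over-maxicount case shows up as a
 * negative number before being clamped. */
static int64_t ffree_clamped(uint64_t f_files, uint64_t sb_icount,
			     uint64_t sb_ifree)
{
	int64_t ffree = (int64_t)f_files - (int64_t)(sb_icount - sb_ifree);

	if (ffree < 0)		/* the proposed fix: clamp to zero */
		ffree = 0;
	return ffree;
}
```

With f_files capped at maxicount = 100 but sb_icount raced up to 105 and
sb_ifree = 0, the unclamped result is -5; the clamp reports 0 instead.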

Make sense?

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs:negative_icount.patch
       [not found]   ` <ECC25D18-BB9D-4EC7-A73C-139A688CE1C2@sgi.com>
@ 2010-07-19 23:11     ` Dave Chinner
  2010-07-20 12:56       ` Stuart Brodsky
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2010-07-19 23:11 UTC (permalink / raw)
  To: Stuart Brodsky; +Cc: xfs, Alex Elder

Hi Stuart, 

A couple of minor list-etiquette comments first:

	- please don't top post when replying as it makes it
	  really hard to understand what you are replying to
	  because there is no context to your reply.

	- don't remove the mailing list from the CC list while
	  discussing a patch as there are plenty more people than
	  just the person who sent the email that might want to put
	  their 2c worth in...

I've rearranged your reply and re-added the mailing list to the CC
list....

On Mon, Jul 19, 2010 at 07:11:55AM -0500, Stuart Brodsky wrote:
> On Jul 18, 2010, at 7:14 PM, Dave Chinner wrote:
> > On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote:
> >> This patch fixes the stat -f icount field going negative.  The use of
> >> lazy counts means that the super block field sb_icount is not updated
> >> correctly and will allow over allocation of inodes above maxicount.
.....
> >> Index: a/fs/xfs/xfs_trans.c
> >> ===================================================================
> >> --- a/fs/xfs/xfs_trans.c.orig	2010-07-16 10:12:02.000000000 -0500
> >> +++ b/fs/xfs/xfs_trans.c	2010-07-16 10:22:47.690249548 -0500
> >> @@ -805,6 +805,7 @@
> >> 	switch (field) {
> >> 	case XFS_TRANS_SB_ICOUNT:
> >> 		tp->t_icount_delta += delta;
> >> +		atomic64_add(delta,&tp->t_mountp->m_inodesinuse);
> >> 		if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> >> 			flags &= ~XFS_TRANS_SB_DIRTY;
> >> 		break;
> > 
> > This is still racy. It may fix your specific reproducer but I can't see how
> > changing the in use inode count from late in xfs_trans_commit() to early in
> > xfs_trans_commit() solves the problem: if we look at it
> > like this:
....
> > Realistically, there is nothing wrong with XFS exceeding mp->m_maxicount by a
> > small amount - it's just an arbitrary limit that we use to prevent all the
> > filesytem space from being taken up by inodes.  I think the problem is just a
> > reporting problem here in xfs_fs_statfs():
> > 
> > 1245         fakeinos = statp->f_bfree << sbp->sb_inopblog;
> > 1246         statp->f_files =
> > 1247             MIN(sbp->sb_icount + fakeinos, (__uint64_t)XFS_MAXINUMBER);
> > 1248         if (mp->m_maxicount)
> > 1249                 statp->f_files = min_t(typeof(statp->f_files),
> > 1250                                         statp->f_files,
> > 1251                                         mp->m_maxicount);
> > 1252         statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);
> > 
> > Assume no free space and no free inodes (statp->f_bfree = 0, => fakeinos = 0),
> > and we have just hit the above race condition to allocate more inodes than
> > mp->m_maxicount. Hence we have sbp->sb_icount > mp->m_maxicount and that gives:
> > 
> > 	statp->f_files = min(sbp->sb_icount, mp->m_maxicount);
> > 	statp->f_files = mp->m_maxicount;
> > 
> > And now we've allocated all the free inodes (sbp->sb_ifree = 0). Therefore:
> > 
> > 	statp->f_ffree = mp->m_maxicount - (sbp->sb_icount - 0)
> > 
> > Which gives statp->f_ffree < 0 and hence negative free inodes. I think
> > all that is required to "fix" this is to clamp statp->f_ffree to zero.
> > i.e.:
> > 
> > 	if (statp->f_ffree < 0)
> > 		statp->f_ffree = 0;
> > 
> > Make sense?
>
> I agree that if we are willing to accept the fact that the actual
> number of inodes allocated can be over maxicount then your fix is
> correct in just clamping the f_ffree to 0.

Given that it already exceeds maxicount (and has for years ;) and
nothing breaks except for the stats, I don't see any problem
here. It seems much more complex to do anything accurate and AFAICT
it is not necessary to be absolutely accurate.

Anyway, we have to allow the allocated inode count to exceed
maxicount as you can use growfs to reduce the allowed inode
percentage in the filesystem. Hence if we are near maxicount
allocated inodes and we reduce the max inode percentage then the
allocated inode count will exceed maxicount. In that case, we really
do need to report zero free inodes, not some negative number....

> I just don't like
> using a counter that has the potential to be out of date anywhere
> in the code.

I'm not sure I follow you - nothing gets out of date AFAICT.

> In this case as you suggest being over maxicount is
> not the end of the world.  We will report ENOSPC a bit later but
> we still will detect it.

We still report ENOSPC at exactly the same time as we do now.
Nothing there changes, only the information going to userspace is
sanitised.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs:negative_icount.patch
  2010-07-19 23:11     ` Dave Chinner
@ 2010-07-20 12:56       ` Stuart Brodsky
  0 siblings, 0 replies; 5+ messages in thread
From: Stuart Brodsky @ 2010-07-20 12:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, Alex Elder

Thanks for the feedback.  I will have an updated patch shortly.  There is a slight problem in that f_ffree is an unsigned 64-bit value, so the comparison never sees it go negative, but that is easy to work around.  I'll send out the tested patch in a bit.
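The workaround is the standard one for unsigned arithmetic: compare the
operands instead of testing the sign of the difference, so the underflowing
subtraction is never performed.  A sketch with illustrative names only, not
the final patch:

```c
#include <stdint.h>

/* With an unsigned f_ffree, "ffree < 0" can never be true, so the
 * clamp is expressed on the operands instead: subtract only when the
 * result would be non-negative, otherwise report zero. */
static uint64_t ffree_unsigned(uint64_t f_files, uint64_t sb_icount,
			       uint64_t sb_ifree)
{
	uint64_t inuse = sb_icount - sb_ifree;	/* inodes in use */

	return f_files > inuse ? f_files - inuse : 0;
}
```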

Thanks,

---
Stuart Brodsky
2750 Blue Water Road
Eagan, MN. 55121
651-683-7910
sbrodsky@sgi.com





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-07-20 12:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-16 18:53 [PATCH] xfs:negative_icount.patch Stuart Brodsky
2010-07-18  4:54 ` Christoph Hellwig
2010-07-19  0:14 ` Dave Chinner
     [not found]   ` <ECC25D18-BB9D-4EC7-A73C-139A688CE1C2@sgi.com>
2010-07-19 23:11     ` Dave Chinner
2010-07-20 12:56       ` Stuart Brodsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox