From: Dave Chinner <david@fromorbit.com>
To: Stuart Brodsky <sbrodsky@sgi.com>
Cc: xfs@oss.sgi.com, Alex Elder <aelder@sgi.com>
Subject: Re: [PATCH] xfs:negative_icount.patch
Date: Tue, 20 Jul 2010 09:11:50 +1000 [thread overview]
Message-ID: <20100719231150.GE32635@dastard> (raw)
In-Reply-To: <ECC25D18-BB9D-4EC7-A73C-139A688CE1C2@sgi.com>
Hi Stuart,
A couple of minor list-etiquette comments first:
- please don't top post when replying as it makes it
really hard to understand what you are replying to
because there is no context to your reply.
- don't remove the mailing list from the CC list while
discussing a patch as there are plenty more people than
just the person who sent the email that might want to put
their 2c worth in...
I've rearranged your reply and re-added the mailing list to the CC
list....
On Mon, Jul 19, 2010 at 07:11:55AM -0500, Stuart Brodsky wrote:
> On Jul 18, 2010, at 7:14 PM, Dave Chinner wrote:
> > On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote:
> >> This patch fixes the stat -f icount field going negative. The use of
> >> lazy counts means that the super block field sb_icount is not updated
> >> correctly and will allow over allocation of inodes above maxicount.
.....
> >> Index: a/fs/xfs/xfs_trans.c
> >> ===================================================================
> >> --- a/fs/xfs/xfs_trans.c.orig 2010-07-16 10:12:02.000000000 -0500
> >> +++ b/fs/xfs/xfs_trans.c 2010-07-16 10:22:47.690249548 -0500
> >> @@ -805,6 +805,7 @@
> >> switch (field) {
> >> case XFS_TRANS_SB_ICOUNT:
> >> tp->t_icount_delta += delta;
> >> + atomic64_add(delta,&tp->t_mountp->m_inodesinuse);
> >> if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> >> flags &= ~XFS_TRANS_SB_DIRTY;
> >> break;
> >
> > This is still racy. It may fix your specific reproducer but I can't see how
> > changing the in use inode count from late in xfs_trans_commit() to early in
> > xfs_trans_commit() solves the problem: if we look at it
> > like this:
....
> > Realistically, there is nothing wrong with XFS exceeding mp->m_maxicount by a
> > small amount - it's just an arbitrary limit that we use to prevent all the
> > filesytem space from being taken up by inodes. I think the problem is just a
> > reporting problem here in xfs_fs_statfs():
> >
> > 1245 fakeinos = statp->f_bfree << sbp->sb_inopblog;
> > 1246 statp->f_files =
> > 1247 MIN(sbp->sb_icount + fakeinos, (__uint64_t)XFS_MAXINUMBER);
> > 1248 if (mp->m_maxicount)
> > 1249 statp->f_files = min_t(typeof(statp->f_files),
> > 1250 statp->f_files,
> > 1251 mp->m_maxicount);
> > 1252 statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);
> >
> > Assume no free space and no free inodes (statp->f_bfree = 0, => fakeinos = 0),
> > and we have just hit the above race condition to allocate more inodes than
> > mp->m_maxicount. Hence we have sbp->sb_icount > mp->m_maxicount and that gives:
> >
> > statp->f_files = min(sbp->sb_icount, mp->m_maxicount);
> > statp->f_files = mp->m_maxicount;
> >
> > And now we've allocated all the free inodes (sbp->sb_ifree = 0). Therefore:
> >
> > statp->f_ffree = mp->m_maxicount - (sbp->sb_icount - 0)
> >
> > Which gives statp->f_ffree < 0 and hence negative free inodes. I think
> > all that is required to "fix" this is to clamp statp->f_ffree to zero.
> > i.e.:
> >
> > if (statp->f_ffree < 0)
> > statp->f_ffree = 0;
> >
> > Make sense?
>
> I agree that if we are willing to accept the fact that the actual
> number of inodes allocated can be over maxicount then your fix is
> correct in just clamping the f_ffree to 0.
Given that it already exceeds maxicount (and has for years ;) and
nothing breaks except for the stats, I don't see any problem
here. It seems much more complex to do anythign accurate and AFAICT
it is not necessary to be absolutely accurate.
Anyway, we have to allow the allocated inode count to exceed
maxicount as you can use growfs to reduce the allowed inode
percentage in the filesystem. Hence if we are near maxicount
allocated inodes and we reduce the max inode percentage then the
allocated inode count will exceed maxicount. In that case, we really
do need to report zero free inodes, not some negative number....
> I just don't like
> using a counter that has the potential to be out of date any where
> in the code.
I'm not sure I follow you - nothing gets out of date AFAICT.
> In this case as you suggest being over maxicount is
> not the end of the world. We will report ENOSPC a bit later but
> we still will detect it.
We still report ENOSPC at exactly the same time as we do now.
Nothing there changes, only the information going to userspace is
sanitised.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-07-19 23:08 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-16 18:53 [PATCH] xfs:negative_icount.patch Stuart Brodsky
2010-07-18 4:54 ` Christoph Hellwig
2010-07-19 0:14 ` Dave Chinner
[not found] ` <ECC25D18-BB9D-4EC7-A73C-139A688CE1C2@sgi.com>
2010-07-19 23:11 ` Dave Chinner [this message]
2010-07-20 12:56 ` Stuart Brodsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100719231150.GE32635@dastard \
--to=david@fromorbit.com \
--cc=aelder@sgi.com \
--cc=sbrodsky@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox