From: "Dilger, Andreas" <andreas.dilger@intel.com>
To: Theodore Ts'o <tytso@mit.edu>,
"Zhang, Hongchao" <hongchao.zhang@intel.com>,
Eric Sandeen <sandeen@redhat.com>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: an issue of ext4
Date: Thu, 6 Mar 2014 23:57:15 +0000
Message-ID: <CF3CDE77.96870%andreas.dilger@intel.com>
In-Reply-To: <20140305125105.GA11600@thunk.org>

On 2014/03/05, 5:51 AM, "Theodore Ts'o" <tytso@mit.edu> wrote:
>On Wed, Mar 05, 2014 at 12:33:32PM +0000, Zhang, Hongchao wrote:
>>
>> in ext4_fill_super, the variables related to statfs should be
>> initialized after journal recovery is completed. otherwise, if a
>> large number of blocks were being allocated before the filesystem
>> crashed, then the blocks and inode counters may become negative
>> during use and report incorrect values to statfs call.
>
>The ext4_statfs() doesn't use the free blocks and inodes count from
>the superblock. For scalability reasons, we no longer update the
>journal values in the superblock while they are in use, but rather
>compute them from the sum of the values from the blockgroup
>descriptors, and then track them via percpu counters.

Ted,

This doesn't relate to using the summary counters in the superblock.
The problem is that the percpu counters are initialized from the
group descriptors (or block and inode bitmaps if EXT4_DEBUG is on)
at mount time _before_ the journal has been replayed. That means
journal replay can still change the group descriptors (or bitmaps)
after the counters are initialized, and statfs(), allocators, etc.
will use the wrong values for the rest of the mount.

If the journal is large and there was heavy allocation happening
before the reboot, then the counters can be significantly incorrect.
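
To illustrate the ordering issue, here is a minimal userspace sketch. It
is not the ext4 code: struct toy_fs, the function names, and the single
cached counter standing in for the percpu counters are all hypothetical
simplifications made up for this example.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified model; none of these names exist
 * in the kernel. */
#define NGROUPS 4

struct toy_fs {
    long gd_free_blocks[NGROUPS]; /* per-group free block counts */
    long cached_free_blocks;      /* stands in for the percpu counter */
};

/* Sum the per-group counts, as mount does to seed the counter. */
long sum_free_blocks(const struct toy_fs *fs)
{
    long sum = 0;
    for (size_t i = 0; i < NGROUPS; i++)
        sum += fs->gd_free_blocks[i];
    return sum;
}

/* Pretend journal replay: apply allocations that were committed to the
 * journal but had not yet reached the group descriptors. */
void replay_journal(struct toy_fs *fs, const long *allocated)
{
    for (size_t i = 0; i < NGROUPS; i++)
        fs->gd_free_blocks[i] -= allocated[i];
}

/* Problematic order: seed the counter first, then replay. The cached
 * value never sees the replayed allocations. */
long mount_init_before_replay(struct toy_fs *fs, const long *allocated)
{
    fs->cached_free_blocks = sum_free_blocks(fs);
    replay_journal(fs, allocated);
    return fs->cached_free_blocks;
}

/* Intended order: replay first, then seed the counter. */
long mount_init_after_replay(struct toy_fs *fs, const long *allocated)
{
    replay_journal(fs, allocated);
    fs->cached_free_blocks = sum_free_blocks(fs);
    return fs->cached_free_blocks;
}
```

With four groups of 100 free blocks each and 35 blocks' worth of
allocations sitting in the journal, the first variant keeps reporting
400 free blocks while the descriptors say 365; the second variant
reports 365.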

However, looking more closely at the upstream kernel, I see that this
code was changed by Dmitry Monakhov in v2.6.34-rc7-16-g84061e0 to
move the counter initialization after journal init (almost the same
as Hongchao's patch does), but then you submitted a patch,
v2.6.37-rc1-3-gce7e010, to initialize the percpu counters both before
and after the journal is loaded. It isn't clear from your commit
comment why initializing them both before and after was needed.

It seems we hit this problem in RHEL6 (which is missing both of
these changes). Your patch made upstream look like the original
unpatched code, which loaded the counters only before the journal is
replayed, so Hongchao's patch still applied to upstream.

So I guess upstream is OK, with the exception that it isn't clear
why commit ce7e010 was made. I guess we need to ask Eric to backport
84061e0 and ce7e010 to RHEL6, and use those patches in place of
our own in the meantime.

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division