From: Alexandru Cardaniuc <cardaniuc@gmail.com>
To: xfs@oss.sgi.com
Subject: Re: corruption of in-memory data detected
Date: Tue, 01 Jul 2014 13:13:19 -0700
Message-ID: <878uoc25y8.fsf@gmail.com>
In-Reply-To: <20140701093803.GH4453@dastard> (Dave Chinner's message of "Tue, 1 Jul 2014 19:38:03 +1000")
Dave Chinner <david@fromorbit.com> writes:
> On Tue, Jul 01, 2014 at 01:29:35AM -0700, Alexandru Cardaniuc wrote:
>> Dave Chinner <david@fromorbit.com> writes:
>>
>> > On Mon, Jun 30, 2014 at 11:44:45PM -0700, Alexandru Cardaniuc
>> > wrote:
>> >> Hi All,
>>
>> >> I am having an issue with an XFS filesystem shutting down under
>> >> high load with very many small files. Basically, I have around
>> >> 3.5 - 4 million files on this filesystem. New files are being
>> >> written to the FS all the time, until I get to 9-11 mln small
>> >> files (35k on average).
> ....
>> > You've probably fragmented free space to the point where inodes
>> > cannot be allocated anymore, and then it's shutdown because it got
>> > enospc with a dirty inode allocation transaction.
>>
>> > xfs_db -c "freesp -s" <dev>
>>
>> > should tell us whether this is the case or not.
>> This is what I have
>>
>> # xfs_db -c "freesp -s" /dev/sda5
>>    from      to extents    blocks    pct
>>       1       1     657       657   0.00
>>       2       3     264       607   0.00
>>       4       7      29       124   0.00
>>       8      15      13       143   0.00
>>      16      31      41       752   0.00
>>      32      63       8       293   0.00
>>      64     127      12      1032   0.00
>>     128     255       8      1565   0.00
>>     256     511      10      4044   0.00
>>     512    1023       7      5750   0.00
>>    1024    2047      10     16061   0.01
>>    2048    4095       5     16948   0.01
>>    4096    8191       7     43312   0.02
>>    8192   16383       9    115578   0.06
>>   16384   32767       6    159576   0.08
>>   32768   65535       3    104586   0.05
>>  262144  524287       1    507710   0.25
>> 4194304 7454720      28 200755934  99.51
>> total free extents 1118
>> total free blocks 201734672
>> average free extent size 180442
>
> So it's not freespace fragmentation, but that was just the most likely
> cause. Most likely it's a transient condition where an AG is out of
> space but in determining that condition the AGF was modified. We've
> fixed several bugs in that area over the past few years....
I still have the FS available. Any other information I can assemble to
help you identify the issue?
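
As a sanity check on the freesp output above, here is a minimal Python
sketch that recomputes the summary lines from the histogram rows. The
helper name summarize_freesp and the embedded FREESP_ROWS string are
illustrative assumptions and not part of xfs_db; the numbers are copied
from the output above.

```python
# Illustrative sketch: recompute the xfs_db "freesp -s" summary from its
# histogram rows. summarize_freesp is a hypothetical helper, not an XFS tool.

# Columns: from, to, extents, blocks (the pct column is recomputed below).
FREESP_ROWS = """\
1 1 657 657
2 3 264 607
4 7 29 124
8 15 13 143
16 31 41 752
32 63 8 293
64 127 12 1032
128 255 8 1565
256 511 10 4044
512 1023 7 5750
1024 2047 10 16061
2048 4095 5 16948
4096 8191 7 43312
8192 16383 9 115578
16384 32767 6 159576
32768 65535 3 104586
262144 524287 1 507710
4194304 7454720 28 200755934
"""


def summarize_freesp(rows: str) -> dict:
    """Return totals for a 'from to extents blocks' free-space histogram."""
    buckets = []
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        lo, hi, extents, blocks = (int(f) for f in fields[:4])
        buckets.append((lo, hi, extents, blocks))

    total_extents = sum(b[2] for b in buckets)
    total_blocks = sum(b[3] for b in buckets)
    # Bucket covering the largest extent sizes (highest "from" value).
    largest = max(buckets, key=lambda b: b[0])

    return {
        "total_extents": total_extents,
        "total_blocks": total_blocks,
        # Integer division, matching the truncated figure in the output above.
        "avg_extent": total_blocks // total_extents,
        # Share of all free blocks held in the largest-size bucket.
        "largest_bucket_pct": 100.0 * largest[3] / total_blocks,
    }


summary = summarize_freesp(FREESP_ROWS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The totals agree with the summary lines xfs_db printed, and the 28
extents in the 4194304-7454720 range hold roughly 99.5% of all free
blocks, which is consistent with free space not being fragmented.
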
>> >> Using CentOS 5.9 with kernel 2.6.18-348.el5xen
>> > The "enospc with dirty transaction" shutdown bugs have been fixed
>> > in more recent kernels than RHEL5.
>> These fixes were not backported to RHEL5 kernels?
> No.
I assume I wouldn't just be able to take the source for the XFS kernel
module and compile it against the 2.6.18 kernel in CentOS 5.x?
>> >> The problem is reproducible and I don't think it's hardware
>> >> related. The problem was reproduced on multiple servers of the
>> >> same type. So, I doubt it's a memory issue or something like
>> >> that.
>>
>> > Nope, it's not hardware, it's buggy software that has been fixed
>> > in the years since 2.6.18....
>> I would hope these fixes would be backported to RHEL5 (CentOS 5)
>> kernels...
>
> TANSTAAFL.
>> > If you've fragmented free space, then your only options are:
>>
>> > - dump/mkfs/restore
>> > - remove a large number of files from the filesystem so free space
>> >   defragments.
>> That wouldn't be fixed automagically using xfs_repair, would it?
> No.
>> > If you simply want to avoid the shutdown, then upgrade to a more
>> > recent kernel (3.x of some kind) where all the known issues have
>> > been fixed.
>> How about 2.6.32? That's the kernel that comes with RHEL 6.x
>
> It might, but I don't know the exact root cause of your problem so I
> couldn't say for sure.
>> >> I went through the kernel updates for CentOS 5.10 (newer kernel),
>> >> but didn't see any xfs related fixes since CentOS 5.9
>>
>> > That's something you need to talk to your distro maintainers
>> > about....
>> I was worried you were going to say that :)
>
> There's only so much that upstream can do to support heavily patched, 6
> year old distro kernels.
>> What are my options at this point? Am I correct to assume that the
>> issue is related to the load and if I manage to decrease the load,
>> the issue is not going to reproduce itself?
> It's more likely related to the layout of data and metadata on disk.
>> We have been using XFS on RHEL 5 kernels for years and didn't see
>> this issue. Now, the issue happens consistently, but seems to be
>> related to high load...
> There are several different potential causes - high load just iterates
> the problem space faster.
>> We have hundreds of these servers deployed in production right now,
>> so some way to address the current situation would be very welcome.
> I'd suggest talking to Red Hat about what they can do to help you,
> especially as CentOS is now a RH distro....
I will try that. Thanks.
--
"It's very well to be thrifty, but don't amass a hoard of regrets."
- Charles D'Orleans
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs