From: Eric Sandeen <sandeen@sandeen.net>
To: Josh Endries <endries@cs.cornell.edu>
Cc: xfs@oss.sgi.com
Subject: Re: Crash recovery/zero-byte file question
Date: Sun, 19 May 2013 21:22:55 -0500
Message-ID: <519988FF.3010702@sandeen.net>
In-Reply-To: <76015885.11204.1369015296087.JavaMail.root@coecis.cornell.edu>
On 5/19/13 9:01 PM, Josh Endries wrote:
> Hello,
>
> Thanks for the reply!
>
>>> We have a RHEL 6.3 machine with a large XFS mount that suffered a
>>> power outage.
>>
>> For starters, have you engaged your RH support folks?
>
> Unfortunately we don't have support for these machines. We have tons of RH machines and licenses, but only a few with paid support. Generally the (grant-funded) research machines don't include RH support. (And generally we don't run into problems like this. :))
ok
>>> When it came back up, it allegedly fixed itself, but
>>> now many files are zero bytes. I found a bug report/errata fix at RH
>>> that mentions something similar, which might be what we ran into.
>>
>> Which one? RH support can probably help you decide if that bug report
>> applies, and where/when it was fixed.
>
> This one: https://access.redhat.com/site/solutions/272673
well, that's a "solution" ;)
> You need a login to view that, though... I think this is the same one, which I just found today:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=845233
>
> That URL is currently broken for me, so here is a cache of it:
>
> http://webcache.googleusercontent.com/search?q=cache:3OjuPDd8A1AJ:https://bugzilla.redhat.com/show_bug.cgi%3Fid%3D845233+&cd=2&hl=en&ct=clnk&gl=us&client=firefox-a
>
> Reading this, I'm no longer sure we have a kernel with the fix. That machine is running:
>
> 2.6.32-279.el6.x86_64
Right, and: "Fixed In Version: kernel-2.6.32-328.el6"
So this is a known bug and it has been fixed, but it seems you're not
running a kernel that has the fix.
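The version comparison above can be sketched roughly as follows. This is only an illustration, not a tool from this thread: the helper names `parse_release` and `has_fix` are made up, and it assumes the fix is identified solely by the RHEL 6 build number (the "328" in kernel-2.6.32-328.el6). It does not account for fixes backported into z-stream kernels of an earlier build, so treat a negative result as a hint to check the errata rather than as proof.

```python
import platform
import re

# Taken from the bug report: "Fixed In Version: kernel-2.6.32-328.el6"
FIXED_BUILD = (2, 6, 32, 328)


def parse_release(release):
    """Return (major, minor, patch, build) from a RHEL-style release string.

    Example input: "2.6.32-279.el6.x86_64"
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(release, fixed=FIXED_BUILD):
    """True if the release is at or past the build that carries the fix.

    Assumption: same RHEL 6 kernel series, no z-stream backports considered.
    """
    return parse_release(release) >= fixed


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    try:
        verdict = "has" if has_fix(running) else "does NOT have"
        print(f"{running}: {verdict} a build >= {FIXED_BUILD[3]}")
    except ValueError as err:
        print(err)
```

Calling `has_fix("2.6.32-279.el6.x86_64")` on the release string quoted above reports that the build predates the fixed one.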
> I'm not really sure when the files were created or how long it was
> idle before the crash... I wonder if ctime/mtime would be reliable
> for the files. I also don't know how to reproduce the situation in
> order to test if it's fixed in a later kernel. I can pull the power
> out to test if I knew how to modify files ahead of time such that
> they would zero themselves out.
I think you can be fairly certain that it's resolved in the above
kernel.
>>> We
>>> are running a kernel that should have the fix as far as I can tell,
>>> but we definitely have zero byte files that shouldn't be.
>>
>> shouldn't be because they had all been properly synced to disk
>> before the power loss, or? (just in general, files not fsynced
>> aren't guaranteed to be in any particular state if you lose power,
>> though of course there are certain expectations of timely flushing).
>
> No, I mean they shouldn't be zero normally. They weren't zero a week
> ago. In other words, the files definitely changed unexpectedly, I'm
> assuming due to the power outage. The files had not been touched in
> at least a few days before the crash, according to the researcher
> working on those files. If I read the report correctly, though, that
> might not matter much.
ok
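As a side note on the general fsync point quoted above (not on the bug in this thread, which hit files that had been idle for days): an application that wants a file to survive power loss usually has to ask for it explicitly. A minimal sketch, assuming a POSIX system; the function name `durable_write` and the ".tmp" suffix are made up for illustration:

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path so the contents are on disk before returning."""
    tmp = path + ".tmp"  # hypothetical temp-file naming, for illustration only
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to write the file data to disk
    os.replace(tmp, path)      # atomically swap the new file into place
    # fsync the containing directory so the rename itself is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "result.dat")
        durable_write(target, b"important data\n")
        print(os.path.getsize(target), "bytes written to", target)
```

Files written without something like this are, as noted above, not guaranteed to be in any particular state after a power loss.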
>>> My question is: is there a way to restore this or fix it before going
>>> to backups? Is it worth it to unmount and run xfs_check or similar?
>>> Unfortunately, since the system came up and appeared to be working,
>>> some users have been using that mount point.
>>
>> If you have backups that's probably the best option.
>
> There aren't any backups of these files. The researchers should be
> able to recreate them (I hope so); the data sets come from various
> places. It's a lot of data, so I was hoping I could recover something
> to lessen the downtime. They opted not to back up that directory
> because it's just too many TBs for normal backups.
>
> I'm not really expecting to be able to restore everything, I just
> want to put some effort in to getting back what I can before telling
> them they need to start over...
Dave is more familiar with that bug than I am, but short of some serious
forensics & luck, I don't think you'll be able to get things back.
I'd update to the kernel mentioned above soon, though, and sorry
about the hassle. :(
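If it helps to size up the damage before telling the researchers, something along these lines would list the zero-length regular files under a directory together with their mtime and ctime. It is only a sketch: `find_zero_byte_files` is a made-up name, it recovers nothing, and whether the timestamps are trustworthy after this bug is an open question.

```python
import os
import stat
import sys
import time


def find_zero_byte_files(root):
    """Yield (path, mtime, ctime) for each zero-length regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                yield path, st.st_mtime, st.st_ctime


def fmt(ts):
    """Format a timestamp in local time for display."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts))


if __name__ == "__main__":
    # Directory to scan; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, mtime, ctime in find_zero_byte_files(root):
        print(f"{fmt(mtime)}  {fmt(ctime)}  {path}")
```

Run it read-only against the affected mount point; it only calls lstat and never opens or modifies the files.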
-Eric
> Thanks,
> Josh
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs