public inbox for linux-xfs@vger.kernel.org
From: Mark Tinguely <tinguely@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Ben Myers <bpm@sgi.com>, Stan Hoeppner <stan@hardwarefreak.com>,
	Markus Trippelsdorf <markus@trippelsdorf.de>,
	xfs@oss.sgi.com
Subject: Re: [Bisected] Corruption of root fs during git bisect of drm system hang
Date: Sat, 20 Jul 2013 12:21:47 -0500
Message-ID: <51EAC72B.905@sgi.com>
In-Reply-To: <20130720031840.GA11674@dastard>

On 07/19/13 22:18, Dave Chinner wrote:
> On Fri, Jul 19, 2013 at 04:11:28PM -0500, Mark Tinguely wrote:
>> On 07/19/13 07:22, Markus Trippelsdorf wrote:
>>>
>>> I've bisected this issue to the following commit:
>>>
>>>   commit cca9f93a52d2ead50b5da59ca83d5f469ee4be5f
>>>   Author: Dave Chinner <dchinner@redhat.com>
>>>   Date:   Thu Jun 27 16:04:49 2013 +1000
>>>
>>>       xfs: don't do IO when creating an new inode
>>>
>>> Reverting this commit on top of the Linus tree "solves" all problems for
>>> me. IOW I no longer lose my KDE and LibreOffice config files during a
>>> crash. Log recovery now works fine and xfs_repair shows no issues.
>>>
>>> So users of 3.11.0-rc1 beware. Only run this version if you have
>>> up-to-date backups handy.
>>>
>>
>> I reviewed the above patch and liked it, but I think I recreated the
>> above-mentioned problem with a simple script:
>>
>> cp /root/.bash_history /root/.lesshst /root/.pwclientrc \
>>    /root/.viminfo /root/.bash_profile /root/.lesshst.YCJCDz \
>>    /root/.quiltrc /somexfsdir
>> sync
>> echo c > /proc/sysrq-trigger
>> .... reboot, remount ...
>> cd /somexfsdir
>
> I've only reproduced the problem *once* with this method - the first
> time I tried. Then I mkfs'd the filesystem rather than repairing it
> and I haven't been able to reproduce it since.  So the problem is
> far more subtle than just copying some files, running sync and
> crashing the machine - there's some kind of initial or timing
> condition that we are missing that triggers it...
>
> The one interesting thing I noticed was that the generation number
> in the crash case was non-zero. That's an important piece of
> information, and:
>
>> # cat .bash_history
>> cat: .bash_history: No such file or directory
>>
>> xfs_db>  inode 131
>> xfs_db>  p
>> core.magic = 0x494e
>> core.mode = 0
>
> That's a "free" inode, and why XFS considers it invalid when the
> lookup sees it.
>
>> core.gen = 3707503345
>
> You saw it as well, Mark.
>
> That means it has actually been allocated and written to disk at
> some point in time. That is, inodes allocated by mkfs in the root
> inode chunk have a generation number of zero. For this inode to have
> a non-zero generation number, it had to be written after
> allocation - either before the sync or during log recovery.
>
> Unfortunately, without the 'xfs_logprint -t -i <dev>' output from
> prior to mounting the filesystem which demonstrates the problem, I
> can't tell if the issue is a recovery problem or something that
> happened before the crash....
>
>> revert the above commit and the problem goes away.
> ....
>> core.mode = 0100600
>
> Not a free inode...
>
>> core.gen = 0
>
> And, importantly, the generation number is zero, as would be
> expected for an inode in the root chunk.
>
> FWIW, if you can reproduce this on demand, Mark, one thing to try is
> to see if mounting "-o ikeep" makes the problem go away, as this
> optimisation is only used on filesystems that are configured to free
> inode chunks...
>
> Cheers,
>
> Dave.


Yeah, I thought of the logprint and the ikeep afterwards.
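
As a rough sketch of those two steps (capture the log before mounting, then
mount with ikeep): DEV, MNT, DRY_RUN and the helper names are placeholders
made up for illustration, not values from this thread. By default it only
prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only. DEV and MNT are hypothetical placeholders; set them for
# the box under test. With DRY_RUN=1 (the default) nothing is executed.
DEV=${DEV:-/dev/sdX1}
MNT=${MNT:-/somexfsdir}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# After the crash and reboot, before the filesystem is mounted (and the
# log replayed): dump the transactional view of the log.
capture_log() {
    run xfs_logprint -t -i "$DEV"
}

# Then mount with ikeep to see whether the problem still shows up.
mount_ikeep() {
    run mount -o ikeep "$DEV" "$MNT"
}

capture_log
mount_ikeep
```

When running it for real (DRY_RUN=0), the logprint output would need to be
redirected somewhere off the filesystem under test.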

I tried the script today and it did not reproduce the problem. The
logprint and the mounted filesystem were empty. I will rebuild the
sources to eliminate some patched kernel versions on that box and
experiment with the sync and the shooting of the kernel.
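
To tell quickly whether a given run hit the problem, the two xfs_db fields
discussed above can be checked with something like the sketch below. The
function name and the output labels are made up for illustration; it only
encodes the reading from this thread (core.mode of 0 is a free inode, and a
non-zero core.gen is not what is expected for an inode in the root chunk).

```shell
#!/bin/sh
# Sketch only: classify an inode from the core.mode and core.gen values
# that xfs_db prints. String comparisons are used so the large
# generation number and the octal mode need no arithmetic.
classify_inode() {
    mode=$1   # core.mode as printed by xfs_db (octal, e.g. 0100600)
    gen=$2    # core.gen as printed by xfs_db (decimal)

    if [ "$mode" = 0 ]; then state=free; else state=in-use; fi
    if [ "$gen" = 0 ]; then hist=gen-zero; else hist=gen-nonzero; fi
    echo "$state $hist"
}

classify_inode 0 3707503345   # values from the crash case
classify_inode 0100600 0      # values with the commit reverted
```

A "free gen-nonzero" result for a file that was just copied and synced
matches the bad case quoted above.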

--Mark.
