public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Ben Myers <bpm@sgi.com>, Mark Tinguely <tinguely@sgi.com>,
	Stan Hoeppner <stan@hardwarefreak.com>,
	xfs@oss.sgi.com
Subject: Re: [Bisected] Corruption of root fs during git bisect of drm system hang
Date: Sat, 20 Jul 2013 11:48:36 +1000
Message-ID: <20130720014836.GZ11674@dastard>
In-Reply-To: <20130719122235.GA360@x4>

On Fri, Jul 19, 2013 at 02:22:35PM +0200, Markus Trippelsdorf wrote:
> On 2013.07.15 at 08:47 +0200, Markus Trippelsdorf wrote:
> > On 2013.07.15 at 12:28 +1000, Dave Chinner wrote:
> > > On Fri, Jul 12, 2013 at 09:07:21AM +0200, Markus Trippelsdorf wrote:
> > > > On 2013.07.12 at 12:17 +1000, Dave Chinner wrote:
> > > > > On Thu, Jul 11, 2013 at 11:07:55AM +0200, Markus Trippelsdorf wrote:
> > > > > > On 2013.07.10 at 23:12 -0500, Stan Hoeppner wrote:
> > > > > > > On 7/10/2013 10:58 PM, Dave Chinner wrote:
> > > > > > > > On Thu, Jul 11, 2013 at 05:36:21AM +0200, Markus Trippelsdorf wrote:
> > > > > > > 
> > > > > > > >> I was losing my KDE settings bit by bit with every reboot during the
> > > > > > > >> bisection. First my window-rules disappeared, then my desktop background
> > > > > > > >> changed to default, then my taskbar moved from top to the bottom, etc.
> > > > > > > >> In the end I had to restore all my .files from backup. 
> > > > > > > > 
> > > > > > > > That's not filesystem corruption. That sounds more like someone not
> > > > > > > > using fsync in the appropriate place when overwriting a file....
> > > > > > > 
> > > > > > t@ubunt:~# xfs_repair /dev/sdb
> > > > > > Phase 1 - find and verify superblock...
> > > > > > Phase 2 - using internal log
> > > > > >         - zero log...
> > > > > >         - scan filesystem freespace and inode maps...
> > > > > > agi unlinked bucket 0 is 683435008 in ag 2 (inode=4978402304)
> > > > > > agi unlinked bucket 1 is 683435009 in ag 2 (inode=4978402305)
> > > > > >         - found root inode chunk
> > > > > 
> > > > > Again, these are signs that log recovery has not completed
> > > > > successfully or that for some reason it thought the log was clean.
> > > > > Can you please post the dmesg output after the crash when you go
> > > > > through the mount/unmount process before you run xfs_repair?
> > > > 
> > > > Sure.
> > > > First boot after crash:
> > > >  XFS (sdb2): Mounting Filesystem
> > > >  XFS (sdb2): Starting recovery (logdev: internal)
> > > >  XFS (sdb2): Ending recovery (logdev: internal)
> > > > 
> > > > Second boot after crash:
> > > >  XFS (sdb2): Mounting Filesystem
> > > >  XFS (sdb2): Ending clean mount 
> > > > 
> > > > I then boot Ubuntu from another disc to run xfs_repair.
> > > 
> > > That's what should have been in the initial description of your
> > > problem.
> > > 
> > > > And looking through my logs I see this WARNING:
> > > > 
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 439 at fs/inode.c:280 drop_nlink+0x33/0x40()
> > > > CPU: 0 PID: 439 Comm: gconfd-2 Not tainted 3.10.0-08982-g6d128e1-dirty #42
> > > > Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503    04/13/2011
> > > >  0000000000000009 ffffffff8157d030 0000000000000000 ffffffff81060788
> > > >  ffff8801f8608cc8 ffff880205998230 ffff8801f7bede58 0000000000000000
> > > >  ffff8801f86083c0 ffffffff8110ce93 ffff8801f8608b40 ffffffff811b7104
> > > > Call Trace:
> > > >  [<ffffffff8157d030>] ? dump_stack+0x41/0x51
> > > >  [<ffffffff81060788>] ? warn_slowpath_common+0x68/0x80
> > > >  [<ffffffff8110ce93>] ? drop_nlink+0x33/0x40
> > > >  [<ffffffff811b7104>] ? xfs_droplink+0x24/0x60
> > > >  [<ffffffff811b84ed>] ? xfs_remove+0x24d/0x380
> > > >  [<ffffffff811b1657>] ? xfs_vn_unlink+0x37/0x80
> > > >  [<ffffffff8110414e>] ? vfs_unlink+0x6e/0xe0
> > > >  [<ffffffff8110432a>] ? do_unlinkat+0x16a/0x220
> > > >  [<ffffffff810f4fa9>] ? SyS_faccessat+0x149/0x200
> > > >  [<ffffffff81583292>] ? system_call_fastpath+0x16/0x1b
> > > 
> > > When did that occur? Before the crash, after the first/second mount?
> > > after you ran repair?
> > 
> > After the first mount.
> > 
> > > > Some further observations:
> > > > 
> > > > When I boot 3.2.0 after the crash, log recovery works fine.
> > > > 
> > > > When I boot 3.9.0 after the crash I get the following:
> > > > 
> > > > [    2.332989] XFS (sdc2): Mounting Filesystem
> > > > [    2.406206] XFS (sdc2): Starting recovery (logdev: internal)
> > > > [    2.418147] XFS (sdc2): log record CRC mismatch: found 0xdbcaef48, expected 0x69e7934e.
> > > 
> > > Just informational - indicating that the log records don't have
> > > valid CRCs in them because 3.2 didn't calculate them. If you are
> > > getting them after a crash on a 3.9+ kernel, then there's a
> > > problem writing to the log....
> > 
> > The crash always occurred on the current Linus tree kernel...
> > 
> > > > When I boot the current Linus tree after the crash, log recovery fails silently.
> > > 
> > > dmesg output, please. Indeed, what does "fails silently" mean? The
> > > filesystem doesn't mount but no error is given?
> > 
> > Again, there is no dmesg output. XFS tells me that it's "Ending recovery
> > (logdev: internal)" without any errors, when in fact it didn't recover
> > the log at all. It then mounts the filesystem normally (rw) in this
> > unclean state. That's when the WARNING I posted above happened.
> 
> I've bisected this issue to the following commit:
> 
>  commit cca9f93a52d2ead50b5da59ca83d5f469ee4be5f
>  Author: Dave Chinner <dchinner@redhat.com>
>  Date:   Thu Jun 27 16:04:49 2013 +1000
> 
>      xfs: don't do IO when creating an new inode
>          
> Reverting this commit on top of the Linus tree "solves" all problems for
> me. IOW, I no longer lose my KDE and LibreOffice config files during a
> crash. Log recovery now works fine and xfs_repair shows no issues.

Thanks for bisecting this, Markus.

I'll admit, right now it doesn't make a lot of sense to me - I don't
immediately see a connection between not reading an inode during the
create phase and unlinked list and directory corruption after a
crash. But now you've identified a change that might be the cause,
I have an avenue of investigation I can follow.

Indeed, in the time I've taken to write this mail I've thought of
2-3 possible causes that I need to investigate....

> So users of 3.11.0-rc1 beware. Only run this version if you have
> up-to-date backups handy.

Don't be so dramatic - very few people are doing what you are doing,
so let's try to understand the root cause of the problem before jumping
to rash conclusions....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


Thread overview: 37+ messages
2013-07-10  9:06 Corruption of root fs during git bisect of drm system hang Markus Trippelsdorf
2013-07-11  0:31 ` Dave Chinner
2013-07-11  3:36   ` Markus Trippelsdorf
2013-07-11  3:58     ` Dave Chinner
2013-07-11  4:12       ` Stan Hoeppner
2013-07-11  9:07         ` Markus Trippelsdorf
2013-07-11 11:28           ` Markus Trippelsdorf
2013-07-11 20:24             ` Stan Hoeppner
2013-07-11 20:40               ` Markus Trippelsdorf
2013-07-11 23:01                 ` Stan Hoeppner
2013-07-12  2:38                 ` Dave Chinner
2013-07-12  2:17           ` Dave Chinner
2013-07-12  7:07             ` Markus Trippelsdorf
2013-07-13  9:05               ` Markus Trippelsdorf
2013-07-15  2:28               ` Dave Chinner
2013-07-15  6:47                 ` Markus Trippelsdorf
2013-07-19 12:22                   ` [Bisected] " Markus Trippelsdorf
2013-07-19 12:41                     ` Stefan Ring
2013-07-19 12:51                       ` Markus Trippelsdorf
2013-07-19 16:02                         ` Eric Sandeen
2013-07-19 16:32                           ` Markus Trippelsdorf
2013-07-19 19:13                             ` Ben Myers
2013-07-19 19:56                               ` Markus Trippelsdorf
2013-07-19 20:28                                 ` Markus Trippelsdorf
2013-07-19 19:23                             ` Eric Sandeen
2013-07-19 19:53                               ` Markus Trippelsdorf
2013-07-19 21:11                     ` Mark Tinguely
2013-07-20  3:18                       ` Dave Chinner
2013-07-20 17:21                         ` Mark Tinguely
2013-07-21  7:37                           ` Dave Chinner
2013-07-20  1:48                     ` Dave Chinner [this message]
2013-07-22 10:22                       ` Dave Chinner
2013-07-22 10:47                         ` Markus Trippelsdorf
2013-07-22 22:54                           ` Dave Chinner
2013-07-11  4:15       ` Markus Trippelsdorf
2013-07-11  0:37 ` Stan Hoeppner
2013-07-11  3:47   ` Markus Trippelsdorf
