* Re: 2.6.20 (XFS? related) crash after uptime of > 180 days during apt-get dist-upgrade on Debian Testing
From: David Chinner @ 2007-09-18 10:39 UTC
To: Christoph Hellwig, David Chinner, Justin Piszcz, linux-kernel, xfs
On Tue, Sep 18, 2007 at 10:20:13AM +0100, Christoph Hellwig wrote:
> On Tue, Sep 18, 2007 at 11:45:37AM +1000, David Chinner wrote:
> > No idea. It looks like dpkg was trying to remove a directory on the
> > same path the lookup was on, and both have gone splat in __d_lookup on
> > the same dentry. Something happened in those 180 days that left a
> > landmine that was tripped over here, I think. I can't see any way of
> > tracking it down from this, but thanks for reporting it anyway.
>
> This looks a lot like the i_sem leak that Vlad debugged. Do you remember
> where this was fixed?
The i_sem leak was hitting us on SLES9 (2.6.5 base kernel), and it was fixed
before the i_sem -> i_mutex conversion in mainline, some time around 2.6.16,
IIRC. Given this was a 2.6.20 kernel, there'd be an almighty kaboom if that
bug still existed after the i_mutex conversion....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: 2.6.20 (XFS? related) crash after uptime of > 180 days during apt-get dist-upgrade on Debian Testing
From: Justin Piszcz @ 2007-09-19 8:47 UTC
To: linux-kernel; +Cc: xfs
On Mon, 17 Sep 2007, Justin Piszcz wrote:
> Including the XFS mailing list in here too because it may be an XFS bug
> looking at the call trace.
>
> System: Debian Testing
> Kernel: 2.6.20
> Config: Attached
>
> I was running apt-get dist-upgrade as I always do to get the latest packages
> upgraded and the kernel OOPS'd when it was upgrading 'tzdata' and the process
> went into D-state and I had to reboot.
>
> The config file is from 2.6.20; it had been moved to a 2.6.22 directory
> for an upgrade, but all of the options were left unchanged.
>
> Here is the OOPS I captured via dmesg before I rebooted:
>
>
Also,
Not sure if this helps, but when this happened, any file that was open()
for read/write seems to have been corrupted too.
$ /usr/sbin/xfs_bmap -v myconfig.txt.orig
myconfig.txt.orig:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          64601112..64601119 14 (52040..52047)       8
$ /usr/sbin/xfs_bmap -v myconfig.txt
myconfig.txt:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          64625720..64625727 14 (76648..76655)       8
$ md5sum myconfig*
db8c50ca2c86d2e757ecef1d6b3fcc69 myconfig.txt
09fb630623b3ae614511cef4c7a21063 myconfig.txt.orig
$ file myconfig.txt myconfig.txt.orig
myconfig.txt: ASCII text
myconfig.txt.orig: data
$
$ strings -a myconfig.txt.orig
$
$ od -c myconfig.txt.orig
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0003500  \0  \0  \0  \0  \0  \0
0003506
Seems like it was NULL'd out?
Justin.
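The check done by hand above with `strings -a` and `od -c` (is the file nothing but NUL bytes?) can be scripted in a few lines of POSIX shell. This is a minimal sketch assuming only standard `tr` and `wc`; the function name `is_all_zero` is hypothetical, not an existing tool:

```shell
#!/bin/sh
# is_all_zero FILE  (hypothetical helper, not part of any package)
# Succeeds (exit status 0) only if FILE exists, is non-empty, and every
# byte in it is NUL (0x00), i.e. the "NULL'd out" pattern shown by od -c.
is_all_zero() {
    f=$1
    # A missing or zero-length file is not the pattern we are looking for.
    [ -s "$f" ] || return 1
    # Delete every NUL byte and count what is left; strip the padding
    # some wc implementations print in front of the number.
    remaining=$(tr -d '\000' < "$f" | wc -c | tr -d ' ')
    [ "$remaining" -eq 0 ]
}

# Example, using the file from the report above:
# is_all_zero myconfig.txt.orig && echo "myconfig.txt.orig is all NULs"
```

Run over a list of files, a check like this gives a quick way to see whether every file was hit or only the ones that were being written at crash time.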
* Re: 2.6.20 (XFS? related) crash after uptime of > 180 days during apt-get dist-upgrade on Debian Testing
From: David Chinner @ 2007-09-21 0:15 UTC
To: Justin Piszcz; +Cc: linux-kernel, xfs
On Wed, Sep 19, 2007 at 04:47:38AM -0400, Justin Piszcz wrote:
> On Mon, 17 Sep 2007, Justin Piszcz wrote:
>
> >Including the XFS mailing list in here too because it may be an XFS bug
> >looking at the call trace.
> >
> >System: Debian Testing
> >Kernel: 2.6.20
> >Config: Attached
> >
> >I was running apt-get dist-upgrade as I always do to get the latest
> >packages upgraded and the kernel OOPS'd when it was upgrading 'tzdata' and
> >the process went into D-state and I had to reboot.
> >
> >The config file is from 2.6.20; it had been moved to a 2.6.22 directory
> >for an upgrade, but all of the options were left unchanged.
> >
> >Here is the OOPS I captured via dmesg before I rebooted:
> >
> >
>
> Also,
>
> Not sure if this helps, but when this happened, any file that was open()
> for read/write seems to have been corrupted too.
Is that all files, or just ones that were being changed?
> $ /usr/sbin/xfs_bmap -v myconfig.txt.orig
> myconfig.txt.orig:
>  EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
>    0: [0..7]:          64601112..64601119 14 (52040..52047)       8
> $ /usr/sbin/xfs_bmap -v myconfig.txt
> myconfig.txt:
>  EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
>    0: [0..7]:          64625720..64625727 14 (76648..76655)       8
> $ md5sum myconfig*
> db8c50ca2c86d2e757ecef1d6b3fcc69 myconfig.txt
> 09fb630623b3ae614511cef4c7a21063 myconfig.txt.orig
> $ file myconfig.txt myconfig.txt.orig
> myconfig.txt: ASCII text
> myconfig.txt.orig: data
> $
>
> $ strings -a myconfig.txt.orig
> $
>
> $ od -c myconfig.txt.orig
> 0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
> *
> 0003500  \0  \0  \0  \0  \0  \0
> 0003506
>
> Seems like it was NULL'd out?
A single block of zeros. It's possible that the crash occurred between
the allocation transaction and the data write: the allocation gets
replayed from the log (along with the new file size), but the data write
does not, because file data is not journalled. This is one of the rarer
"NULL files on crash" failure modes, fixed in 2.6.22.
Cheers,
Dave.