* [bug] ext{3,4}: __find_get_block_slow() failed on 3.0.3
@ 2011-08-19 23:51 Thilo-Alexander Ginkel
2011-09-05 12:59 ` Jan Kara
0 siblings, 1 reply; 3+ messages in thread
From: Thilo-Alexander Ginkel @ 2011-08-19 23:51 UTC
To: linux-kernel, linux-ext4
Hi there,
while rsyncing a large amount (> 1TB) of data from an ext3 to an ext4
volume on my machine [1], I encountered an issue where rsync and syslog
eventually started consuming 100% CPU and my syslog was flooded [2]
with error messages:
-- 8< --
> kernel: [101543.047293] b_state=0x00000029, b_size=>[10ock01543.04>[101543.047321] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> kernel: [101543.047330] b_state=0x00000029, b_size=4096
> kernel: [101543.047>[10ock01543.047348] b_state=0x00000029, b_size=4096
> kernel: [101543.047353] device blocksize: 4096
> kernel: [101543.047359] __find_get_block_slow() failed. block=328204473, b01543.0>[10ock01543.047>[1ock01543.047404] b_state=0x00000029, b_size=4096
> kernel: [101543.047409] device blocksize: 4096
> kernel: [101543.047414] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> kernel: [10154ock01543.0>[1ock01543.0492>[1ock01543.0492>[1ock01543.049>[1ock01543.0492>[1ock01543.0>[1ock01543.049>[1ock01543.049>[1ock01543.0492>[10ock01543.0>[1ock=01543.04>[1ock01543.>[1ock01543.0493>[1ock01543.049>[1ock01543.04>[1ock01543.0493>[1ock01543.04941>[1ock01543.0494>[1ock01543.0>[1ock01543.049>[10ock01543.0>[1ock01543.04>[1ock01543.04>[1ock01543.0495>[1ock01543.0495>[1ock01543.0495>[1ock01543.0496>[1ock01543.04>[1ock01543.04>[1ock01543.049>[1ock01543.049>[1ock01543.04>[1ock01543.0497>[1ock01543.0>[1ock01543.0497>[1ock01543.0497>[1ock01543.0498>[1ock01543.0498>[1ock01543.04>[1ock01543.04>[1ock01543.0498>[1ock01543.0498>[1ock01543.0499>[1ock01543.0499>[1ock01543.04>[101543.049967] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> kernel: [101543.049975] b_state=0x00000029, b_size=4096
> kernel: [101543.049980] device blocksize: 4096
> kernel: [101543.049986] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
-- 8< --
These are not preceded by any other error messages (about possible FS
inconsistencies) as has been the case in the past when bugs related to
this error message were reported.
Judging by the block size, the possibly corrupt volume is the ext3 one
(the ext4 volume has a block size of 2048).
A forced fsck.ext{3,4} of the source and target partitions did not
show any inconsistencies.
Any ideas?
Thanks,
Thilo
[1] Linux andromeda 3.0.3-030003-generic #201108180913 SMP Thu Aug 18
09:15:59 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
[2] /var/log/kern.log grew to 200 MB just while shutting down the system
* Re: [bug] ext{3,4}: __find_get_block_slow() failed on 3.0.3
From: Jan Kara @ 2011-09-05 12:59 UTC
To: Thilo-Alexander Ginkel; +Cc: linux-kernel, linux-ext4
Hi,
On Sat 20-08-11 01:51:49, Thilo-Alexander Ginkel wrote:
> while rsyncing a large amount (> 1TB) of data from an ext3 to an ext4
> volume on my machine [1], I encountered an issue where rsync and syslog
> eventually started consuming 100% CPU and my syslog was flooded [2]
> with error messages:
>
> -- 8< --
> > kernel: [101543.047293] b_state=0x00000029, b_size=>[10ock01543.04>[101543.047321] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [101543.047330] b_state=0x00000029, b_size=4096
> > kernel: [101543.047>[10ock01543.047348] b_state=0x00000029, b_size=4096
> > kernel: [101543.047353] device blocksize: 4096
> > kernel: [101543.047359] __find_get_block_slow() failed. block=328204473, b01543.0>[10ock01543.047>[1ock01543.047404] b_state=0x00000029, b_size=4096
> > kernel: [101543.047409] device blocksize: 4096
> > kernel: [101543.047414] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [10154ock01543.0>[1ock01543.0492>[1ock01543.0492>[1ock01543.049>[1ock01543.0492>[1ock01543.0>[1ock01543.049>[1ock01543.049>[1ock01543.0492>[10ock01543.0>[1ock=01543.04>[1ock01543.>[1ock01543.0493>[1ock01543.049>[1ock01543.04>[1ock01543.0493>[1ock01543.04941>[1ock01543.0494>[1ock01543.0>[1ock01543.049>[10ock01543.0>[1ock01543.04>[1ock01543.04>[1ock01543.0495>[1ock01543.0495>[1ock01543.0495>[1ock01543.0496>[1ock01543.04>[1ock01543.04>[1ock01543.049>[1ock01543.049>[1ock01543.04>[1ock01543.0497>[1ock01543.0>[1ock01543.0497>[1ock01543.0497>[1ock01543.0498>[1ock01543.0498>[1ock01543.04>[1ock01543.04>[1ock01543.0498>[1ock01543.0498>[1ock01543.0499>[1ock01543.0499>[1ock01543.04>[101543.049967] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [101543.049975] b_state=0x00000029, b_size=4096
> > kernel: [101543.049980] device blocksize: 4096
> > kernel: [101543.049986] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> -- 8< --
>
> These are not preceded by any other error messages (about possible FS
> inconsistencies) as has been the case in the past when bugs related to
> this error message were reported.
>
> Judging by the block size, the possibly corrupt volume is the ext3 one
> (the ext4 volume has a block size of 2048).
>
> A forced fsck.ext{3,4} of the source and target partitions did not
> show any inconsistencies.
>
> Any ideas?
Something has corrupted your buffer head structure in memory (and we then
looped infinitely in __getblk_slow()). bh->b_blocknr was 0xC139000B9 while
it should have been 0x139000B9 (the 5th byte has been changed from 0x00 to
0x0C). It might be a hw fault, a buggy driver, or some other bug - hard to
say. You might want to run memtest for some time, or enable some kernel debug
options (DEBUG_PAGEALLOC, DEBUG_SLAB) which might catch the code causing the
corruption (this assumes it's at least occasionally reproducible and you
are willing to take the performance hit)...
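The hex values above can be checked directly against the decimal numbers in
the log. The following is only an illustrative sketch, not part of the
original diagnosis: it takes "block=" and "b_blocknr=" from the reported
messages, prints both in hex, and XORs them to show which bits differ. The
helper name flipped_bits() is made up for this example.

```python
# Values copied from the log lines in the report.
requested_block = 328204473      # "block=" in the log
observed_blocknr = 51867812025   # "b_blocknr=" in the log


def flipped_bits(expected, observed):
    """Return positions (0 = least significant) of the bits that differ."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


print(hex(requested_block))                     # 0x139000b9
print(hex(observed_blocknr))                    # 0xc139000b9
print(hex(requested_block ^ observed_blocknr))  # 0xc00000000
print(flipped_bits(requested_block, observed_blocknr))  # [34, 35]
```

Only bits 34 and 35 differ, both inside the fifth byte (0x00 became 0x0C),
and the low 32 bits are identical. That is the pattern one would expect from
a localized in-memory corruption rather than from a wrong block lookup.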
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: [bug] ext{3,4}: __find_get_block_slow() failed on 3.0.3
From: Thilo-Alexander Ginkel @ 2011-09-20 18:08 UTC
To: Jan Kara; +Cc: linux-kernel, linux-ext4
On Mon, Sep 5, 2011 at 14:59, Jan Kara <jack@suse.cz> wrote:
> Something has corrupted your buffer head structure in memory (and we then
> looped infinitely in __getblk_slow()). bh->b_blocknr was 0xC139000B9 while
> it should have been 0x139000B9 (the 5th byte has been changed from 0x00 to
> 0x0C). It might be a hw fault, a buggy driver, or some other bug - hard to
> say. You might want to run memtest for some time, or enable some kernel debug
> options (DEBUG_PAGEALLOC, DEBUG_SLAB) which might catch the code causing the
> corruption (this assumes it's at least occasionally reproducible and you
> are willing to take the performance hit)...
Thanks for your reply and sorry for the slow response. As my system
also experienced lockups from time to time, I performed an extensive
memtest run, which actually brought up sporadic memory corruption
(some bits flipping to zero) after 20 hours or so. I swapped CPU,
mainboard and RAM and have not experienced any problems since then, so
I guess this was the cause of the issue.
Thanks,
Thilo