From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kara
Subject: Re: [bug] ext{3,4}: __find_get_block_slow() failed on 3.0.3
Date: Mon, 5 Sep 2011 14:59:40 +0200
Message-ID: <20110905125939.GF5466@quack.suse.cz>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
To: Thilo-Alexander Ginkel
Return-path:
Received: from cantor2.suse.de ([195.135.220.15]:37892 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752835Ab1IEM7o (ORCPT ); Mon, 5 Sep 2011 08:59:44 -0400
Content-Disposition: inline
In-Reply-To:
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

  Hi,

On Sat 20-08-11 01:51:49, Thilo-Alexander Ginkel wrote:
> while rsyncing a large amount (> 1TB) of data from an ext3 to an ext4
> on my machine [1], I encountered an issue where rsync and syslog
> eventually started consuming 100% CPU and my syslog was flooded [2]
> with error messages:
>
> -- 8< --
> > kernel: [101543.047293] b_state=0x00000029, b_size=[...]
> > kernel: [101543.047321] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [101543.047330] b_state=0x00000029, b_size=4096
> > [...]
> > kernel: [101543.047348] b_state=0x00000029, b_size=4096
> > kernel: [101543.047353] device blocksize: 4096
> > kernel: [101543.047359] __find_get_block_slow() failed. block=328204473, b[...]
> > kernel: [101543.047404] b_state=0x00000029, b_size=4096
> > kernel: [101543.047409] device blocksize: 4096
> > kernel: [101543.047414] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > [...]
> > kernel: [101543.049967] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> > kernel: [101543.049975] b_state=0x00000029, b_size=4096
> > kernel: [101543.049980] device blocksize: 4096
> > kernel: [101543.049986] __find_get_block_slow() failed. block=328204473, b_blocknr=51867812025
> -- 8< --
>
> These are not preceded by any other error messages (about possible FS
> inconsistencies) as has been the case in the past when bugs related to
> this error message were reported.
>
> Judging by the block size, the possibly corrupt volume is the ext3 one
> (the ext4 volume has a block size of 2048).
>
> A forced fsck.ext{3,4} of the source and target partitions did not
> show any inconsistencies.
>
> Any ideas?
  Something has corrupted your buffer head structure in memory (and we then
looped infinitely in __getblk_slow()). bh->b_blocknr was 0xC139000B9 when
it should have been 0x139000B9 (the 5th byte was changed from 0x00 to
0x0C). It might be a hw fault, a buggy driver, or some other bug - hard to
say.
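
To make the arithmetic concrete, here is a minimal Python sketch (not kernel code; the helper name differing_bytes is made up for illustration) that compares the two numbers from the log byte by byte. Bytes are numbered from 1 starting at the least significant one, which is the convention the "5th byte" remark appears to use, and the 8-byte width assumes a 64-bit block number.

```python
# Values taken from the quoted log: the block that was looked up and the
# b_blocknr that was found in the buffer head.
expected = 328204473       # block=...
observed = 51867812025     # b_blocknr=...


def differing_bytes(a: int, b: int, width: int = 8) -> list[tuple[int, int, int]]:
    """Return (byte_no, byte_of_a, byte_of_b) for each byte that differs.

    byte_no is 1-based, counted from the least significant byte.
    width is the number of bytes compared (8 assumes a 64-bit block number).
    """
    diffs = []
    for i in range(width):
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        if byte_a != byte_b:
            diffs.append((i + 1, byte_a, byte_b))
    return diffs


print(hex(expected))   # 0x139000b9
print(hex(observed))   # 0xc139000b9
for byte_no, good, bad in differing_bytes(expected, observed):
    print(f"byte {byte_no}: 0x{good:02X} -> 0x{bad:02X}")   # byte 5: 0x00 -> 0x0C
```

The low four bytes match the requested block exactly, so the difference is confined to a single byte above them (0x0C, i.e. two bits set).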
You might want to run memtest for some time, or enable some kernel debug
options (DEBUG_PAGEALLOC, DEBUG_SLAB) which might catch the code causing
the corruption (this assumes it's at least occasionally reproducible and
you are willing to take the performance hit)...

								Honza
-- 
Jan Kara
SUSE Labs, CR