3.5.3: kernel BUG at fs/btrfs/ctree.c:3451!

* 3.5.3: kernel BUG at fs/btrfs/ctree.c:3451!
@ 2012-09-20 17:17 Marc MERLIN
  2012-09-21  3:46 ` kernel BUG at fs/btrfs/extent_io.c:1884! Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2012-09-20 17:17 UTC (permalink / raw)
  To: linux-btrfs

I had a btrfs built on top of 5 drives (dmcrypt devices).

The drive then died while I was writing to the filesystem and my system
crashed and rebooted:

[384555.534020] sd 10:0:0:0: rejecting I/O to offline device                    
[384555.535057] sd 10:0:0:0: rejecting I/O to offline device                    
[384556.666885] ------------[ cut here ]------------                            
[384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache                     
[384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!                            
[384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP                           
[384556.687878] CPU 2                                                           

	/* push data from right to left */
	copy_extent_buffer(left, right,
			   btrfs_item_nr_offset(btrfs_header_nritems(left)),
			   btrfs_item_nr_offset(0),
			   push_items * sizeof(struct btrfs_item));

	push_space = BTRFS_LEAF_DATA_SIZE(root) -
		     btrfs_item_offset_nr(right, push_items - 1);

	copy_extent_buffer(left, right, btrfs_leaf_data(left) +
		     leaf_data_end(root, left) - push_space,
		     btrfs_leaf_data(right) +
		     btrfs_item_offset_nr(right, push_items - 1),
		     push_space);
	old_left_nritems = btrfs_header_nritems(left);
	BUG_ON(old_left_nritems <= 0);  <<<<<<< 3451

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

* Re: kernel BUG at fs/btrfs/extent_io.c:1884!
  2012-09-20 17:17 3.5.3: kernel BUG at fs/btrfs/ctree.c:3451! Marc MERLIN
@ 2012-09-21  3:46 ` Marc MERLIN
  2012-09-21  3:51   ` cwillu
                     ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Marc MERLIN @ 2012-09-21  3:46 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
> I had a btrfs built on top of 5 drives (dmcrypt devices).
> 
> The drive then died while I was writing to the filesystem and my system
> crashed and rebooted:
> 
> [384555.534020] sd 10:0:0:0: rejecting I/O to offline device                    
> [384555.535057] sd 10:0:0:0: rejecting I/O to offline device                    
> [384556.666885] ------------[ cut here ]------------                            
> [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache                     
> [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!                            
> [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP                           
> [384556.687878] CPU 2                                                           
> 

Oh my, now I'm trying again with a new drive, and a big cp from an
existing array to a new one dies with:
[32042.079411] ------------[ cut here ]------------                             
[32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
[32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
[32042.099227] CPU 1                                                            
[32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
_pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
 ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
core sparse_keymap    

    int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
			    u64 length, u64 logical, struct page *page,
			    int mirror_num)
    {
	    struct bio *bio;
	    struct btrfs_device *dev;
	    DECLARE_COMPLETION_ONSTACK(compl);
	    u64 map_length = 0;
	    u64 sector;
	    struct btrfs_bio *bbio = NULL;
	    int ret;

	    BUG_ON(!mirror_num); <<<<<

This is more of a problem since I can't back up my filesystem (source is
ext4 and destination is btrfs).

Any suggestion on what went wrong here?

Thanks,
Marc

* Re: kernel BUG at fs/btrfs/extent_io.c:1884!
  2012-09-21  3:46 ` kernel BUG at fs/btrfs/extent_io.c:1884! Marc MERLIN
@ 2012-09-21  3:51   ` cwillu
  2012-09-21  4:11     ` Marc MERLIN
  2012-09-21  3:53   ` Liu Bo
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: cwillu @ 2012-09-21  3:51 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On Thu, Sep 20, 2012 at 9:46 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
>> I had a btrfs built on top of 5 drives (dmcrypt devices).
>>
>> The drive then died while I was writing to the filesystem and my system
>> crashed and rebooted:
>>
>> [384555.534020] sd 10:0:0:0: rejecting I/O to offline device
>> [384555.535057] sd 10:0:0:0: rejecting I/O to offline device
>> [384556.666885] ------------[ cut here ]------------
>> [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache
>> [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!
>> [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP
>> [384556.687878] CPU 2
>>
>
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> [32042.099227] CPU 1
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
> _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
>  ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
> nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
> core sparse_keymap
>
>     int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
>                             u64 length, u64 logical, struct page *page,
>                             int mirror_num)
>     {
>             struct bio *bio;
>             struct btrfs_device *dev;
>             DECLARE_COMPLETION_ONSTACK(compl);
>             u64 map_length = 0;
>             u64 sector;
>             struct btrfs_bio *bbio = NULL;
>             int ret;
>
>             BUG_ON(!mirror_num); <<<<<
>
> This is more of a problem since I can't backup my filesystem (source is
> ext4 and destination is btrfs).
>
> Any suggestion on what went wrong here?

There should have been a stack trace as well as a couple other things,
can you post those as well please?

* Re: kernel BUG at fs/btrfs/extent_io.c:1884!
  2012-09-21  3:46 ` kernel BUG at fs/btrfs/extent_io.c:1884! Marc MERLIN
  2012-09-21  3:51   ` cwillu
@ 2012-09-21  3:53   ` Liu Bo
  2012-09-21  4:57   ` Stefan Behrens
  2012-09-23 16:16   ` crash in read_extent_buffer+0xb7/0xfb Marc MERLIN
  3 siblings, 0 replies; 11+ messages in thread
From: Liu Bo @ 2012-09-21  3:53 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 09/21/2012 11:46 AM, Marc MERLIN wrote:
> On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
>> I had a btrfs built on top of 5 drives (dmcrypt devices).
>>
>> The drive then died while I was writing to the filesystem and my system
>> crashed and rebooted:
>>
>> [384555.534020] sd 10:0:0:0: rejecting I/O to offline device                    
>> [384555.535057] sd 10:0:0:0: rejecting I/O to offline device                    
>> [384556.666885] ------------[ cut here ]------------                            
>> [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache                     
>> [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!                            
>> [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP                           
>> [384556.687878] CPU 2                                                           
>>
>  
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------                             
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
> [32042.099227] CPU 1                                                            
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
> _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
>  ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
> nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
> core sparse_keymap    
> 
>     int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
> 			    u64 length, u64 logical, struct page *page,
> 			    int mirror_num)
>     {
> 	    struct bio *bio;
> 	    struct btrfs_device *dev;
> 	    DECLARE_COMPLETION_ONSTACK(compl);
> 	    u64 map_length = 0;
> 	    u64 sector;
> 	    struct btrfs_bio *bbio = NULL;
> 	    int ret;
> 
> 	    BUG_ON(!mirror_num); <<<<<
> 
> This is more of a problem since I can't backup my filesystem (source is
> ext4 and destination is btrfs).
> 
> Any suggestion on what went wrong here?
> 

Could you please show us the complete stack info?

thanks,
liubo

> Thanks,
> Marc
> 


* Re: kernel BUG at fs/btrfs/extent_io.c:1884!
  2012-09-21  3:51   ` cwillu
@ 2012-09-21  4:11     ` Marc MERLIN
  0 siblings, 0 replies; 11+ messages in thread
From: Marc MERLIN @ 2012-09-21  4:11 UTC (permalink / raw)
  To: cwillu, Liu Bo; +Cc: linux-btrfs

On Thu, Sep 20, 2012 at 09:51:59PM -0600, cwillu wrote:
> > Oh my, now I'm trying again with a new drive, and a big cp from an
> > existing array to a new one dies with:
> > [32042.079411] ------------[ cut here ]------------
> > [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> > [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> > [32042.099227] CPU 1
> > [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
> > _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> >  ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
> > nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
> > core sparse_keymap
> >
> >     int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
> >                             u64 length, u64 logical, struct page *page,
> >                             int mirror_num)
> >     {
> >             struct bio *bio;
> >             struct btrfs_device *dev;
> >             DECLARE_COMPLETION_ONSTACK(compl);
> >             u64 map_length = 0;
> >             u64 sector;
> >             struct btrfs_bio *bbio = NULL;
> >             int ret;
> >
> >             BUG_ON(!mirror_num); <<<<<
> >
> > This is more of a problem since I can't backup my filesystem (source is
> > ext4 and destination is btrfs).
> >
> > Any suggestion on what went wrong here?
> 
> There should have been a stack trace as well as a couple other things,
> can you post those as well please?

Actually, I found a few more lines in syslog just before the crash:
 kernel: [32008.938796] lost page write due to I/O error on /dev/mapper/crypt_e0e810c2-0d8f-409f-9674-e05763083a45
 kernel: [32008.938800] btrfs: bdev /dev/mapper/crypt_e0e810c2-0d8f-409f-9674-e05763083a45 errs: wr 1933, rd 0, flush 32, corrupt 0, gen 0
 kernel: [32008.954383] lost page write due to I/O error on /dev/dm-6
 kernel: [32008.954386] btrfs: bdev /dev/dm-6 errs: wr 1490, rd 0, flush 18, corrupt 0, gen 0
 kernel: [32008.969038] lost page write due to I/O error on /dev/dm-6
 kernel: [32008.969043] btrfs: bdev /dev/dm-6 errs: wr 1491, rd 0, flush 18, corrupt 0, gen 0
 kernel: [32008.979997] lost page write due to I/O error on /dev/dm-6
 kernel: [32008.980002] btrfs: bdev /dev/dm-6 errs: wr 1492, rd 0, flush 18, corrupt 0, gen 0

That helps answer my question: a disk error caused the crash.
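
For anyone wanting to track these per-device counters from the log, here is a minimal C sketch that pulls the five numbers out of one "errs:" line. The struct and function names are made up for this example and are not part of btrfs; the format string only assumes the line layout shown in the syslog excerpt above.

```c
#include <stdio.h>
#include <string.h>

/* Per-device error counters as printed on a btrfs "errs:" log line. */
struct btrfs_errs {
	unsigned int wr, rd, flush, corrupt, gen;
};

/*
 * Parse the counters out of one log line (hypothetical helper).
 * Returns 0 on success, -1 if the line has no complete "errs:" record.
 */
static int parse_btrfs_errs(const char *line, struct btrfs_errs *out)
{
	const char *p = strstr(line, "errs: ");

	if (!p)
		return -1;
	if (sscanf(p, "errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
		   &out->wr, &out->rd, &out->flush, &out->corrupt,
		   &out->gen) != 5)
		return -1;
	return 0;
}
```

Fed the last /dev/dm-6 line above, this yields wr=1492 and flush=18; the "lost page write" lines carry no counters and are rejected.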

As for a stack trace, I was surprised that I didn't get one, but the lines I posted
are the last ones I got on my serial console (they didn't even make it to syslog).

To be clearer, all I got was:
[32042.079411] ------------[ cut here ]------------                             
[32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
[32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
[32042.099227] CPU 1                                                            
[32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
_pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
 ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
core sparse_keymap                                                              
LILO 23.2 boot:                                                                 
Loading linux...........................................................        
BIOS data check successful   

I'm booting with:
auto BOOT_IMAGE=linux ro root=900 panic=20 console=tty0 console=ttyS0,115200n8 elevator=cfq pcie_aspm=force edd=off irqpoll

Is panic=20 causing the stack trace not to be printed somehow?

If not, is one of my config options set wrong?
http://marc.merlins.org/tmp/config-3.5.3-amd64-preempt-noide-20120903

Thanks,
Marc

* Re: kernel BUG at fs/btrfs/extent_io.c:1884!
  2012-09-21  3:46 ` kernel BUG at fs/btrfs/extent_io.c:1884! Marc MERLIN
  2012-09-21  3:51   ` cwillu
  2012-09-21  3:53   ` Liu Bo
@ 2012-09-21  4:57   ` Stefan Behrens
  2012-09-21  5:43     ` Marc MERLIN
  2012-09-23 16:16   ` crash in read_extent_buffer+0xb7/0xfb Marc MERLIN
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Behrens @ 2012-09-21  4:57 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 09/21/2012 05:46, Marc MERLIN wrote:
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP
> [32042.099227] CPU 1
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
> _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
>   ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
> nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
> core sparse_keymap
>
>      int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start,
> 			    u64 length, u64 logical, struct page *page,
> 			    int mirror_num)
>      {
> 	    struct bio *bio;
> 	    struct btrfs_device *dev;
> 	    DECLARE_COMPLETION_ONSTACK(compl);
> 	    u64 map_length = 0;
> 	    u64 sector;
> 	    struct btrfs_bio *bbio = NULL;
> 	    int ret;
>
> 	    BUG_ON(!mirror_num); <<<<<
>

This was fixed with commit c0901581ad077004145c9ee80e843fba71c100b8 and 
is included in Linux 3.6 RC1.

* Re: kernel BUG at fs/btrfs/extent_io.c:1884!
  2012-09-21  4:57   ` Stefan Behrens
@ 2012-09-21  5:43     ` Marc MERLIN
  0 siblings, 0 replies; 11+ messages in thread
From: Marc MERLIN @ 2012-09-21  5:43 UTC (permalink / raw)
  To: Stefan Behrens; +Cc: linux-btrfs

On Fri, Sep 21, 2012 at 06:57:32AM +0200, Stefan Behrens wrote:
> >	    BUG_ON(!mirror_num); <<<<<
> >
> 
> This was fixed with commit c0901581ad077004145c9ee80e843fba71c100b8 and 
> is included in Linux 3.6 RC1.

Congrats to all for having a time machine and fixing my reported bugs in the
past :)

Thanks for the fix and the link,
Marc

* Re: crash in read_extent_buffer+0xb7/0xfb
  2012-09-21  3:46 ` kernel BUG at fs/btrfs/extent_io.c:1884! Marc MERLIN
                     ` (2 preceding siblings ...)
  2012-09-21  4:57   ` Stefan Behrens
@ 2012-09-23 16:16   ` Marc MERLIN
  2012-09-24 13:08     ` David Sterba
  3 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2012-09-23 16:16 UTC (permalink / raw)
  To: linux-btrfs

On Thu, Sep 20, 2012 at 08:46:52PM -0700, Marc MERLIN wrote:
> On Thu, Sep 20, 2012 at 10:17:47AM -0700, Marc MERLIN wrote:
> > I had a btrfs built on top of 5 drives (dmcrypt devices).
> > 
> > The drive then died while I was writing to the filesystem and my system
> > crashed and rebooted:
> > 
> > [384555.534020] sd 10:0:0:0: rejecting I/O to offline device                    
> > [384555.535057] sd 10:0:0:0: rejecting I/O to offline device                    
> > [384556.666885] ------------[ cut here ]------------                            
> > [384556.667909] sd 10:0:0:0: [sdj] Synchronizing SCSI cache                     
> > [384556.677509] kernel BUG at fs/btrfs/ctree.c:3451!                            
> > [384556.682551] invalid opcode: 0000 [#1] PREEMPT SMP                           
> > [384556.687878] CPU 2                                                           
> > 
>  
> Oh my, now I'm trying again with a new drive, and a big cp from an
> existing array to a new one dies with:
> [32042.079411] ------------[ cut here ]------------                             
> [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
> [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
> [32042.099227] CPU 1                                                            
> [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
> _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
>  ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
> nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
> core sparse_keymap    

I had a different crash while copying to a 5-disk btrfs array. Not sure if this is
fixed too, but pasting just in case.
 
[207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 76, corrupt 0, gen 0
[207055.067267] btrfs bad mapping eb start 8653217792 len 4096, wanted 184467440 50581869634 4
[207055.078099] general protection fault: 0000 [#1] PREEMPT SMP
[207055.085213] CPU 3
[207055.087173] Modules linked in:[207055.091512]  raid456 async_raid6_recov asy
nc_pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb1
05 ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_REJECT
 xt_state xt_tcpudp xt_LOG iptable_mangle iptable_filter deflate ctr twofish_gen
eric twofish_x86_64_3way twofish_x86_64 twofish_common camellia_generic camellia
_x86_64 serpent_sse2_x86_64 lrw serpent_generic xts gf128mul blowfish_generic bl
owfish_x86_64 blowfish_common cast5 des_generic xcbc rmd160 sha512_generic crypt
o_null af_key xfrm_algo dm_crypt dm_mirror dm_region_hash dm_log aes_x86_64 fuse
 lm85 hwmon_vid dm_snapshot dm_mod iptable_nat ip_tables nf_conntrack_ftp ipt_MA
SQUERADE nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 x_tables nf_conntrack sg st snd
_pcm_oss snd_mixer_oss snd_hda_codec_hdmi snd_hda_codec_realtek snd_cmipci gamep
ort rc_ati_x10 snd_opl3_lib snd_mpu401_uart pl2303 ati_remote rc_core snd_seq_mi
di snd_seq_midi_event snd_seq usbserial snd_rawmidi kvm_intel kvm snd_seq_device
 snd_hda_intel[207055.193933]  i915 snd_hda_codec drm_kms_helper snd_hwdep snd_p
cm drm snd_timer eeepc_wmi asus_wmi sparse_keymap rfkill snd i2c_i801 parport_pc
 acpi_cpufreq i2c_algo_bit microcode crc32c_intel ehci_hcd xhci_hcd ghash_clmuln
i_intel pci_hotplug wmi cryptd r8169 snd_page_alloc soundcore pcspkr tpm_tis mpe
rf tpm evdev tpm_bios usbcore i2c_core parport mii lpc_ich mei sata_sil24 corete
mp sata_mv fan thermal processor button video thermal_sys usb_common [last unloa
ded: kl5kusb105]

[207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: G        W    3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product Name/P8H67-M PRO
[207055.261478] RIP: 0010:[<ffffffff811fc9ae>]  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
[207055.271621] RSP: 0018:ffff880105ff3880  EFLAGS: 00010202
[207055.278516] RAX: 0000000000000bbe RBX: ffff8800405ba1f8 RCX: ffff8800405ba2c8
[207055.287257] RDX: ffff880105ff38ec RSI: 0000000000000086 RDI: ffff880105ff38ec
[207055.295967] RBP: ffff880105ff38c0 R08: 007ffffffd4ebdc8 R09: 0000160000000000
[207055.304674] R10: 0000000000001000 R11: 6db6db6db6db6db7 R12: 0000000000000004
[207055.313356] R13: ffff880000000000 R14: fffffffa9d7b9446 R15: 000000000000044 2
[207055.322032] FS:  0000000000000000(0000) GS:ffff88011f380000(0000) knlGS:0000000000000000
[207055.331692] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[207055.339014] CR2: 00000000f7021000 CR3: 0000000001a0c000 CR4: 00000000000407e0
[207055.347715] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[207055.356403] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[207055.365092] Process btrfs-transacti (pid: 6456, threadinfo ffff880105ff2000,task ffff880105e7e600)
[207055.376219] Stack:
[207055.380369]  fffffffa9d7b9442 000fffffffa9d7b9 ffff880105ff38a0 0000000000000000
[207055.389447]  ffff8800405ba1f8 fffffffa9d7b9431 fffffffa9d7b9442 00000000798be017
[207055.398481]  ffff880105ff3910 ffffffff811f2855 ffff8800405ba1f8 fffffffa9d7b9000
[207055.407543] Call Trace:
[207055.411582]  [<ffffffff811f2855>] btrfs_token_item_offset+0x86/0xb8
[207055.419436]  [<ffffffff811f295f>] btrfs_item_offset+0xb/0xd
[207055.426585]  [<ffffffff811c04bf>] btrfs_item_offset_nr+0x14/0x16
[207055.434143]  [<ffffffff811c08f9>] leaf_space_used+0x58/0x81
[207055.441269]  [<ffffffff811c42ea>] btrfs_leaf_free_space+0x33/0x72
[207055.448924]  [<ffffffff811c4d45>] push_leaf_right+0xa1/0x142
[207055.456092]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.463329]  [<ffffffff811c4f13>] split_leaf+0x79/0x52f
[207055.470222]  [<ffffffff811f295f>] ? btrfs_item_offset+0xb/0xd
[207055.477483]  [<ffffffff811c08f9>] ? leaf_space_used+0x58/0x81
[207055.484744]  [<ffffffff814aac0e>] ? _raw_write_unlock+0x28/0x33
[207055.492203]  [<ffffffff8120a523>] ? btrfs_set_lock_blocking_rw+0x9b/0xec
[207055.500770]  [<ffffffff811c5b5c>] btrfs_search_slot+0x583/0x62e
[207055.508199]  [<ffffffff811c6e32>] btrfs_insert_empty_items+0x62/0xb4
[207055.516029]  [<ffffffff811cef40>] run_clustered_refs+0x3e2/0x741
[207055.523655]  [<ffffffff811cf503>] btrfs_run_delayed_refs+0x264/0x373
[207055.531450]  [<ffffffff81085cf8>] ? arch_local_irq_save+0x15/0x1b
[207055.538950]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.545965]  [<ffffffff814aaab9>] ? _raw_spin_unlock+0x27/0x32
[207055.553168]  [<ffffffff811f6c51>] ? btrfs_run_ordered_operations+0x19f/0x1ae
[207055.561517]  [<ffffffff811dd30f>] btrfs_commit_transaction+0xa9/0x8dc
[207055.569231]  [<ffffffff8105957a>] ? add_wait_queue+0x44/0x44
[207055.576235]  [<ffffffff81049f32>] ? init_timer_deferrable_key+0x17/0x17
[207055.584056]  [<ffffffff811d7e58>] transaction_kthread+0x174/0x230
[207055.591332]  [<ffffffff811d7ce4>] ? try_to_freeze+0x33/0x33
[207055.598153]  [<ffffffff81058e3c>] kthread+0x86/0x8e
[207055.604162]  [<ffffffff814b08a4>] kernel_thread_helper+0x4/0x10
[207055.611168]  [<ffffffff81058db6>] ? kthread_freezable_should_stop+0x3e/0x3e
[207055.619358]  [<ffffffff814b08a0>] ? gs_change+0x13/0x13
[207055.625624] Code: b7 6d db b6 6d db b6 6d 49 bd 00 00 00 00 00 88 ff ff 49 c1 e0 03 eb 43 48 8b 8b 50 01 00 00 4c 89 d0 48 89 d7 4c 29 f8 4c 39 e0 <4a> 8b 0c 01 49 0f 47 c4 49 83 c0 08 49 29 c4 4c 01 c9 48 c1 f9
[207055.647970] RIP  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
[207055.655271]  RSP <ffff880105ff3880>
[207055.665029] ---[ end trace 06a6f0aa8102336a ]---
[207055.671223] Kernel panic - not syncing: Fatal exception
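
For readers not used to the notation: in the RIP and Call Trace lines, `symbol+0xoff/0xsize` gives the offset into the function and the function's total length in bytes. A small C sketch to split such an entry follows; the struct and helper names are made up for this example.

```c
#include <stdio.h>

/* One "symbol+0xoff/0xsize" entry from an oops. */
struct ksym_loc {
	char name[64];
	unsigned int offset;	/* bytes from the start of the function */
	unsigned int size;	/* total length of the function in bytes */
};

/* Returns 0 on success, -1 if the text does not match the expected form. */
static int parse_ksym(const char *s, struct ksym_loc *out)
{
	if (sscanf(s, "%63[^+]+%x/%x", out->name, &out->offset,
		   &out->size) != 3)
		return -1;
	return 0;
}
```

For `read_extent_buffer+0xb7/0xfb` this gives offset 0xb7 in a function 0xfb bytes long, so the fault is well inside the function body rather than at its entry.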




* Re: crash in read_extent_buffer+0xb7/0xfb
  2012-09-23 16:16   ` crash in read_extent_buffer+0xb7/0xfb Marc MERLIN
@ 2012-09-24 13:08     ` David Sterba
  2012-09-24 14:41       ` Marc MERLIN
  0 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2012-09-24 13:08 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On Sun, Sep 23, 2012 at 09:16:34AM -0700, Marc MERLIN wrote:
> > Oh my, now I'm trying again with a new drive, and a big cp from an
> > existing array to a new one dies with:
> > [32042.079411] ------------[ cut here ]------------                             
> > [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884!                         
> > [32042.092528] invalid opcode: 0000 [#1] PREEMPT SMP                            
> > [32042.099227] CPU 1                                                            
> > [32042.101095] Modules linked in:[32042.105950]  raid456 async_raid6_recov async
> > _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105
> >  ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s
> > nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_
> > core sparse_keymap    
> 
> I had a different crash while copying to a 5-disk btrfs array. Not sure if this is
> fixed too, but pasting just in case.
>  
> [207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 76, corrupt 0, gen 0

So many write and flush errors?

> [207055.067267] btrfs bad mapping eb start 8653217792 len 4096, wanted 184467440 50581869634 4

        if (start + min_len > eb->len) {
                printk(KERN_ERR "btrfs bad mapping eb start %llu len %lu, "
                       "wanted %lu %lu\n", (unsigned long long)eb->start,
                       eb->len, start, min_len);
                WARN_ON(1);
                return -EINVAL;
        }

8653217792  = 0x203c5a000	eb->start
4096       			eb->len

184467440   = 0x00afebff0	start
50581869634 = 0xbc6ea1442	min_len

bogus numbers, no pattern, not visible in the stacktrace.
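
As a cross-check on the conversions above, here is a small C sketch (not kernel code; `bad_mapping` is a made-up helper mirroring the quoted check). It evaluates the test for the two values as printed, and also under one assumption: that the serial console wrapped the line at column 80, so the two decimal runs are really one number, 18446744050581869634, followed by a min_len of 4. If that assumption holds, start is 0xfffffffa9d7b9442, a value that does appear in the Stack: dump of the oops.

```c
#include <stdint.h>

/* Mirrors the quoted check: nonzero when the range falls outside the buffer. */
static int bad_mapping(uint64_t start, uint64_t min_len, uint64_t eb_len)
{
	return start + min_len > eb_len;
}

/* Values exactly as they appear split in the log line. */
static const uint64_t split_start = 184467440ULL;
static const uint64_t split_min_len = 50581869634ULL;

/* Assumption: the console wrapped one number, so the halves are rejoined. */
static const uint64_t joined_start = 18446744050581869634ULL;
static const uint64_t joined_min_len = 4;
```

Both readings fail the check against eb->len = 4096, so the message is printed either way; only the joined reading lines up with other values in the register and stack dump.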


> [207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: G        W    3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product Name/P8H67-M PRO
> [207055.261478] RIP: 0010:[<ffffffff811fc9ae>]  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
> [207055.271621] RSP: 0018:ffff880105ff3880  EFLAGS: 00010202
> [207055.278516] RAX: 0000000000000bbe RBX: ffff8800405ba1f8 RCX: ffff8800405ba2c8
> [207055.287257] RDX: ffff880105ff38ec RSI: 0000000000000086 RDI: ffff880105ff38ec
> [207055.295967] RBP: ffff880105ff38c0 R08: 007ffffffd4ebdc8 R09: 0000160000000000
> [207055.304674] R10: 0000000000001000 R11: 6db6db6db6db6db7 R12: 0000000000000004

R11 contains the POISON_FREE pattern, though it's not clear who and where
used it. It may come from some unhandled case in the write error
recovery paths.
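
One way to test the R11 reading: POISON_FREE is 0x6b in include/linux/poison.h, so a freed and poisoned 64-bit word would read 0x6b6b6b6b6b6b6b6b. The C sketch below compares R11 with that. It also tries a second possibility, offered as an assumption rather than a finding: that R11 holds the constant a compiler loads to divide exactly by 7, for instance when subtracting pointers to a structure whose size is a multiple of 7. The same bytes, b7 6d db b6 6d db b6 6d, open the Code: line of the oops, which would fit a constant loaded by the function itself. The helper names are made up for this example.

```c
#include <stdint.h>

#define POISON_FREE 0x6b	/* value from include/linux/poison.h */

/* Repeat one byte across a 64-bit word, as slab poisoning would leave it. */
static uint64_t fill_word(uint8_t byte)
{
	return (uint64_t)byte * 0x0101010101010101ULL;
}

/* True if c is the multiplicative inverse of d modulo 2^64. */
static int is_inverse_of(uint64_t c, uint64_t d)
{
	return c * d == 1;	/* unsigned arithmetic wraps at 64 bits */
}

static const uint64_t r11 = 0x6db6db6db6db6db7ULL;	/* R11 from the oops */
```

With these values, `r11` differs from `fill_word(POISON_FREE)`, and `is_inverse_of(r11, 7)` holds.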

The crash site is not any of the BUG_ONs but some place that actually
tries to access unmapped memory, so from that point it slipped
through the sanity checks.


david

* Re: crash in read_extent_buffer+0xb7/0xfb
  2012-09-24 13:08     ` David Sterba
@ 2012-09-24 14:41       ` Marc MERLIN
  2012-09-24 15:37         ` David Sterba
  0 siblings, 1 reply; 11+ messages in thread
From: Marc MERLIN @ 2012-09-24 14:41 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Sep 24, 2012 at 03:08:47PM +0200, David Sterba wrote:
> > I had a different crash while copying to a 5-disk btrfs array. Not sure if this is
> > fixed too, but pasting just in case.
> >  
> > [207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 76, corrupt 0, gen 0
> 
> So many write and flush errors?
 
It's possible; I'm using cheap, crappy drives for tests and copies.

> R11 contains the POISON_FREE pattern, though it's not clear who and where
> used it. It may come from some unhandled case in the write error
> recovery paths.
 
Considering that I was doing a huge copy to a btrfs filesystem (source was
ext4) and that I was using crappy drives in a 5-drive configuration
with no redundancy since there is no raid5 yet, it's very possible.

> The crash site is not any of the BUG_ON but some place that actually
> tries to access an unmapped memory, so from that point it slipped
> through sanity checks.

In case it helps, here is the decoded ASM, which I forgot to include:

========
   0:   b7 6d                   mov    $0x6d,%bh
   2:   db b6 6d db b6 6d       (bad)  0x6db6db6d(%rsi)
   8:   49 bd 00 00 00 00 00    movabs $0xffff880000000000,%r13
   f:   88 ff ff 
  12:   49 c1 e0 03             shl    $0x3,%r8
  16:   eb 43                   jmp    0x5b
  18:   48 8b 8b 50 01 00 00    mov    0x150(%rbx),%rcx
  1f:   4c 89 d0                mov    %r10,%rax
  22:   48 89 d7                mov    %rdx,%rdi
  25:   4c 29 f8                sub    %r15,%rax
  28:   4c 39 e0                cmp    %r12,%rax
  2b:*  4a 8b 0c 01             mov    (%rcx,%r8,1),%rcx     <-- trapping instruction
  2f:   49 0f 47 c4             cmova  %r12,%rax
  33:   49 83 c0 08             add    $0x8,%r8
  37:   49 29 c4                sub    %rax,%r12
  3a:   4c 01 c9                add    %r9,%rcx
  3d:   48                      rex.W
  3e:   c1                      .byte 0xc1
  3f:   f9                      stc    

Code starting with the faulting instruction
===========================================
   0:   4a 8b 0c 01             mov    (%rcx,%r8,1),%rcx
   4:   49 0f 47 c4             cmova  %r12,%rax
   8:   49 83 c0 08             add    $0x8,%r8
   c:   49 29 c4                sub    %rax,%r12
   f:   4c 01 c9                add    %r9,%rcx
  12:   48                      rex.W
  13:   c1                      .byte 0xc1
  14:   f9                      stc   
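
To make the trapping instruction concrete, this C sketch computes the address that `mov (%rcx,%r8,1),%rcx` tried to load from, using RCX and R8 from the register dump. It assumes 48-bit virtual addresses, where bits 63..47 must be all zeros or all ones; an access outside that range raises a general protection fault rather than a page fault, which would match the log. The helper names are made up for this example.

```c
#include <stdint.h>

/* Register values copied from the oops. */
static const uint64_t rcx = 0xffff8800405ba2c8ULL;
static const uint64_t r8  = 0x007ffffffd4ebdc8ULL;

/* Effective address of base + index * 1, wrapping at 64 bits. */
static uint64_t effective_address(uint64_t base, uint64_t index)
{
	return base + index;
}

/* Assumes 48-bit virtual addresses: bits 63..47 must all be equal. */
static int is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

The sum is 0x007f88003daa6090, which is not canonical. R8 shifted right by 3 is 0x000fffffffa9d7b9, the second word of the Stack: dump, which suggests R8 is an out-of-range array index scaled by 8. That is an inference from the dump, not something confirmed in this thread.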

For this oops:

[207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: G        W    3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product Name/P8H67-M PRO
[207055.261478] RIP: 0010:[<ffffffff811fc9ae>]  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
[207055.271621] RSP: 0018:ffff880105ff3880  EFLAGS: 00010202
[207055.278516] RAX: 0000000000000bbe RBX: ffff8800405ba1f8 RCX: ffff8800405ba2c8
[207055.287257] RDX: ffff880105ff38ec RSI: 0000000000000086 RDI: ffff880105ff38ec
[207055.295967] RBP: ffff880105ff38c0 R08: 007ffffffd4ebdc8 R09: 0000160000000000
[207055.304674] R10: 0000000000001000 R11: 6db6db6db6db6db7 R12: 0000000000000004
[207055.313356] R13: ffff880000000000 R14: fffffffa9d7b9446 R15: 0000000000000442
[207055.322032] FS:  0000000000000000(0000) GS:ffff88011f380000(0000) knlGS:0000000000000000
[207055.331692] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[207055.339014] CR2: 00000000f7021000 CR3: 0000000001a0c000 CR4: 00000000000407e0
[207055.347715] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[207055.356403] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[207055.365092] Process btrfs-transacti (pid: 6456, threadinfo ffff880105ff2000,task ffff880105e7e600)
[207055.376219] Stack:
[207055.380369]  fffffffa9d7b9442 000fffffffa9d7b9 ffff880105ff38a0 0000000000000000
[207055.389447]  ffff8800405ba1f8 fffffffa9d7b9431 fffffffa9d7b9442 00000000798be017
[207055.398481]  ffff880105ff3910 ffffffff811f2855 ffff8800405ba1f8 fffffffa9d7b9000
[207055.407543] Call Trace:
[207055.411582]  [<ffffffff811f2855>] btrfs_token_item_offset+0x86/0xb8
[207055.419436]  [<ffffffff811f295f>] btrfs_item_offset+0xb/0xd
[207055.426585]  [<ffffffff811c04bf>] btrfs_item_offset_nr+0x14/0x16
[207055.434143]  [<ffffffff811c08f9>] leaf_space_used+0x58/0x81
[207055.441269]  [<ffffffff811c42ea>] btrfs_leaf_free_space+0x33/0x72
[207055.448924]  [<ffffffff811c4d45>] push_leaf_right+0xa1/0x142
[207055.456092]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.463329]  [<ffffffff811c4f13>] split_leaf+0x79/0x52f
[207055.470222]  [<ffffffff811f295f>] ? btrfs_item_offset+0xb/0xd
[207055.477483]  [<ffffffff811c08f9>] ? leaf_space_used+0x58/0x81
[207055.484744]  [<ffffffff814aac0e>] ? _raw_write_unlock+0x28/0x33
[207055.492203]  [<ffffffff8120a523>] ? btrfs_set_lock_blocking_rw+0x9b/0xec
[207055.500770]  [<ffffffff811c5b5c>] btrfs_search_slot+0x583/0x62e
[207055.508199]  [<ffffffff811c6e32>] btrfs_insert_empty_items+0x62/0xb4
[207055.516029]  [<ffffffff811cef40>] run_clustered_refs+0x3e2/0x741
[207055.523655]  [<ffffffff811cf503>] btrfs_run_delayed_refs+0x264/0x373
[207055.531450]  [<ffffffff81085cf8>] ? arch_local_irq_save+0x15/0x1b
[207055.538950]  [<ffffffff814aa936>] ? _raw_spin_lock+0x1b/0x1f
[207055.545965]  [<ffffffff814aaab9>] ? _raw_spin_unlock+0x27/0x32
[207055.553168]  [<ffffffff811f6c51>] ? btrfs_run_ordered_operations+0x19f/0x1ae
[207055.561517]  [<ffffffff811dd30f>] btrfs_commit_transaction+0xa9/0x8dc
[207055.569231]  [<ffffffff8105957a>] ? add_wait_queue+0x44/0x44
[207055.576235]  [<ffffffff81049f32>] ? init_timer_deferrable_key+0x17/0x17
[207055.584056]  [<ffffffff811d7e58>] transaction_kthread+0x174/0x230
[207055.591332]  [<ffffffff811d7ce4>] ? try_to_freeze+0x33/0x33
[207055.598153]  [<ffffffff81058e3c>] kthread+0x86/0x8e
[207055.604162]  [<ffffffff814b08a4>] kernel_thread_helper+0x4/0x10
[207055.611168]  [<ffffffff81058db6>] ? kthread_freezable_should_stop+0x3e/0x3e
[207055.619358]  [<ffffffff814b08a0>] ? gs_change+0x13/0x13
[207055.625624] Code: b7 6d db b6 6d db b6 6d 49 bd 00 00 00 00 00 88 ff ff 49 c1 e0 03 eb 43 48 8b 8b 50 01 00 00 4c 89 d0 48 89 d7 4c 29 f8 4c 39 e0 <4a> 8b 0c 01 49 0f 47 c4 49 83 c0 08 49 29 c4 4c 01 c9 48 c1 f9
[207055.647970] RIP  [<ffffffff811fc9ae>] read_extent_buffer+0xb7/0xfb
[207055.655271]  RSP <ffff880105ff3880>
[207055.665029] ---[ end trace 06a6f0aa8102336a ]---
[207055.671223] Kernel panic - not syncing: Fatal exception
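
The offset of the trapping instruction can be cross-checked straight from the
Code: line of the oops, where the byte in angle brackets marks the faulting
RIP. A minimal Python sketch follows; the helper name trap_offset is made up
for illustration and is not part of any kernel script:

```python
import re

# "Code:" bytes copied from the oops above; <4a> marks the faulting RIP.
CODE = (
    "b7 6d db b6 6d db b6 6d 49 bd 00 00 00 00 00 88 ff ff 49 c1 e0 03 "
    "eb 43 48 8b 8b 50 01 00 00 4c 89 d0 48 89 d7 4c 29 f8 4c 39 e0 "
    "<4a> 8b 0c 01 49 0f 47 c4 49 83 c0 08 49 29 c4 4c 01 c9 48 c1 f9"
)


def trap_offset(code_line):
    """Return (offset, byte) of the token wrapped in <..> (hypothetical helper)."""
    for offset, token in enumerate(code_line.split()):
        match = re.fullmatch(r"<([0-9a-f]{2})>", token)
        if match:
            return offset, int(match.group(1), 16)
    raise ValueError("no <..> marker found in Code: line")


offset, byte = trap_offset(CODE)
print(hex(offset), hex(byte))  # 0x2b 0x4a
```

The result, 0x2b, agrees with the line marked "2b:*" in the decoded listing.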


* Re: crash in read_extent_buffer+0xb7/0xfb
  2012-09-24 14:41       ` Marc MERLIN
@ 2012-09-24 15:37         ` David Sterba
  0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2012-09-24 15:37 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On Mon, Sep 24, 2012 at 07:41:03AM -0700, Marc MERLIN wrote:
> It's possible, I have crappy drives that were cheap that I'm using for tests
> and copies.

Yeah, that makes a good use of crappy disks :)

> Considering that I was doing a huge copy to a btrfs filesystem (source was
> ext4) and that I was using crappy drives in a 5 drives configuration
> with no redundancy since there is no raid5 yet, it's very possible.

Well, in your case raid1 might not be enough to protect the data.

>    0:   b7 6d                   mov    $0x6d,%bh
>    2:   db b6 6d db b6 6d       (bad)  0x6db6db6d(%rsi)
>    8:   49 bd 00 00 00 00 00    movabs $0xffff880000000000,%r13
>    f:   88 ff ff 
>   12:   49 c1 e0 03             shl    $0x3,%r8
>   16:   eb 43                   jmp    0x5b
>   18:   48 8b 8b 50 01 00 00    mov    0x150(%rbx),%rcx
>   1f:   4c 89 d0                mov    %r10,%rax
>   22:   48 89 d7                mov    %rdx,%rdi
>   25:   4c 29 f8                sub    %r15,%rax
>   28:   4c 39 e0                cmp    %r12,%rax
>   2b:*  4a 8b 0c 01             mov    (%rcx,%r8,1),%rcx     <-- trapping instruction

RCX + R08 = ffff8800405ba2c8 + 007ffffffd4ebdc8 = 1007f88003daa6090, which
overflows 64 bits.
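
The sum can be checked mechanically. Below is a small Python sketch using RCX
and R08 from the register dump. Reading R08 as a byte offset into a pointer
array (index times 8, per the "shl $0x3,%r8" above) is an assumption taken
from the disassembly, not something verified against the source, and the
canonical-address test assumes 48-bit virtual addresses:

```python
MASK64 = (1 << 64) - 1

rcx = 0xffff8800405ba2c8  # base pointer, from the register dump
r8 = 0x007ffffffd4ebdc8   # scaled index, from the register dump

full = rcx + r8           # exact sum, needs 65 bits
wrapped = full & MASK64   # what 64-bit address arithmetic yields


def is_canonical(addr):
    """True if bits 63..47 are all equal (x86-64, 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


print(hex(full))                             # 0x1007f88003daa6090
print(hex(wrapped), is_canonical(wrapped))   # 0x7f88003daa6090 False
print(hex(r8 >> 3))                          # 0xfffffffa9d7b9
```

The wrapped address has mixed upper bits, so it is not a canonical address,
which would be consistent with the fault on this load. The unscaled index,
000fffffffa9d7b9, also appears as the second word of the stack dump.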

I'm afraid this does not tell much of the story. The last function that
is not a struct helper was leaf_space_used(), via push_leaf_right(),
split_leaf() from btrfs_search_slot() -- all sanity checks I see are past
any of those calls, so it's probably corrupted on-disk.

The call stack is unfortunately deep, and going backwards in assembly to
track where R11 could get set is tedious.

Did you see any other messages in the log? If you could recreate the
filesystem and workload, doing a fsck occasionally may narrow down the
surface for analysis. Otherwise I'm out of ideas now.

david


end of thread, other threads:[~2012-09-24 15:37 UTC | newest]

Thread overview: 11+ messages
2012-09-20 17:17 3.5.3: kernel BUG at fs/btrfs/ctree.c:3451! Marc MERLIN
2012-09-21  3:46 ` kernel BUG at fs/btrfs/extent_io.c:1884! Marc MERLIN
2012-09-21  3:51   ` cwillu
2012-09-21  4:11     ` Marc MERLIN
2012-09-21  3:53   ` Liu Bo
2012-09-21  4:57   ` Stefan Behrens
2012-09-21  5:43     ` Marc MERLIN
2012-09-23 16:16   ` crash in read_extent_buffer+0xb7/0xfb Marc MERLIN
2012-09-24 13:08     ` David Sterba
2012-09-24 14:41       ` Marc MERLIN
2012-09-24 15:37         ` David Sterba
