* memory overflow or underflow in free space tree / space_info?
From: Stefan Priebe - Profihost AG @ 2016-07-29 18:40 UTC
To: linux-btrfs@vger.kernel.org; +Cc: osandov

Dear list,

I'm seeing btrfs no space messages frequently on big filesystems (> 30TB).

In all cases I'm getting a trace like this one and a space_info warning
(since commit [1]). Could someone please be so kind and help me
debug / fix this bug? I'm using space_cache=v2 on all those systems.

------------[ cut here ]------------
WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5710
btrfs_free_block_groups+0x35a/0x400 [btrfs]()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
pps_core
CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
 ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
Call Trace:
 [<ffffffffbd3c712f>] dump_stack+0x63/0x84
 [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc034a17a>] btrfs_free_block_groups+0x35a/0x400 [btrfs]
 [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
 [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
 [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
 [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
 [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
 [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
 [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
 [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
 [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
 [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
 [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace bd985b05cc90617f ]---
------------[ cut here ]------------
WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5711
btrfs_free_block_groups+0x3f4/0x400 [btrfs]()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
pps_core
CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
 ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
Call Trace:
 [<ffffffffbd3c712f>] dump_stack+0x63/0x84
 [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc034a214>] btrfs_free_block_groups+0x3f4/0x400 [btrfs]
 [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
 [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
 [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
 [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
 [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
 [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
 [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
 [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
 [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
 [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
 [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace bd985b05cc906180 ]---
------------[ cut here ]------------
WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:9990
btrfs_free_block_groups+0x2a4/0x400 [btrfs]()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
pps_core
CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
 ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 ffff880c6aaa4528
 0000000000000038 0000000000000000 ffff8802fe8d8c88 ffff8808881d2000
Call Trace:
 [<ffffffffbd3c712f>] dump_stack+0x63/0x84
 [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc034a0c4>] btrfs_free_block_groups+0x2a4/0x400 [btrfs]
 [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
 [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
 [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
 [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
 [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
 [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
 [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
 [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
 [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
 [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
 [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace bd985b05cc906181 ]---
BTRFS: space_info 4 has 18446743491956604928 free, is not full
BTRFS: space_info total=307627032576, used=206629289984, pinned=0,
reserved=0, may_use=682750558208, readonly=131072

Greets,
Stefan

[1]
https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git/commit/?h=for-next&id=d555b6c380c644af63dbdaa7cc14bba041a4e4dd
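The "free" figure in the space_info line of the report above looks like an unsigned 64-bit wraparound rather than a real amount of free space. The following is a minimal sketch that redoes the arithmetic with the counters copied from that log line. The formula (total minus every other counter, including may_use) is an assumption inferred from the numbers, not taken from the kernel source, and the helper names are hypothetical.

```python
# Sketch: reproduce the "free" figure from the space_info log line using
# unsigned 64-bit arithmetic. Counter values are copied from the log; the
# formula (total minus all other counters) is an assumption.
U64_MASK = (1 << 64) - 1

space_info = {
    "total": 307627032576,
    "used": 206629289984,
    "pinned": 0,
    "reserved": 0,
    "may_use": 682750558208,
    "readonly": 131072,
}


def free_bytes_u64(info):
    """Return total minus every other counter, wrapped to 64 bits like a C u64."""
    consumed = sum(value for key, value in info.items() if key != "total")
    return (info["total"] - consumed) & U64_MASK


def as_signed64(value):
    """Reinterpret a wrapped u64 value as a signed 64-bit integer."""
    return value - (1 << 64) if value >= (1 << 63) else value


wrapped = free_bytes_u64(space_info)
deficit = as_signed64(wrapped)
print(wrapped)  # the value a u64 would hold after the subtraction
print(deficit)  # the same value read as a signed number
```

Under that assumed formula the wrapped value equals the 18446743491956604928 printed in the log, and read as a signed number it is a deficit of roughly 580 GB, which is dominated by may_use being larger than total.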
* Re: memory overflow or underflow in free space tree / space_info?
From: Omar Sandoval @ 2016-07-29 19:11 UTC
To: Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
> Dear list,
>
> I'm seeing btrfs no space messages frequently on big filesystems (> 30TB).
>
> In all cases I'm getting a trace like this one and a space_info warning
> (since commit [1]). Could someone please be so kind and help me
> debug / fix this bug? I'm using space_cache=v2 on all those systems.

Hm, so I think this indicates a bug in space accounting somewhere else
rather than in the free space tree itself. I haven't debugged one of these
issues before; I'll see if I can reproduce it. Cc'ing Josef, too.

> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5710

Do these line numbers match up with yours?

5706 static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
5707 {
5708         block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
5709                                 (u64)-1);
5710         WARN_ON(fs_info->delalloc_block_rsv.size > 0);
5711         WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
5712         WARN_ON(fs_info->trans_block_rsv.size > 0);
5713         WARN_ON(fs_info->trans_block_rsv.reserved > 0);
5714         WARN_ON(fs_info->chunk_block_rsv.size > 0);
5715         WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
5716         WARN_ON(fs_info->delayed_block_rsv.size > 0);
5717         WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
5718 }

> btrfs_free_block_groups+0x35a/0x400 [btrfs]()
> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
> raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
> x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
> usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
> usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
> raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
> pps_core
> CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
> 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
> ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
> 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
> Call Trace:
> [<ffffffffbd3c712f>] dump_stack+0x63/0x84
> [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
> [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffc034a17a>] btrfs_free_block_groups+0x35a/0x400 [btrfs]
> [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
> [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
> [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
> [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
> [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
> [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
> [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
> [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
> [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
> [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
> [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
> [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
> [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
> ---[ end trace bd985b05cc90617f ]---
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5711
> btrfs_free_block_groups+0x3f4/0x400 [btrfs]()
> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
> raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
> x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
> usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
> usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
> raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
> pps_core
> CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
> 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
> ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
> 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
> Call Trace:
> [<ffffffffbd3c712f>] dump_stack+0x63/0x84
> [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
> [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffc034a214>] btrfs_free_block_groups+0x3f4/0x400 [btrfs]
> [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
> [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
> [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
> [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
> [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
> [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
> [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
> [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
> [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
> [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
> [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
> [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
> [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
> ---[ end trace bd985b05cc906180 ]---
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:9990

I don't see what warning this is in kdave/for-next.

> btrfs_free_block_groups+0x2a4/0x400 [btrfs]()
> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
> raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
> x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
> usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
> usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
> raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
> pps_core
> CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
> 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
> ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 ffff880c6aaa4528
> 0000000000000038 0000000000000000 ffff8802fe8d8c88 ffff8808881d2000
> Call Trace:
> [<ffffffffbd3c712f>] dump_stack+0x63/0x84
> [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
> [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffc034a0c4>] btrfs_free_block_groups+0x2a4/0x400 [btrfs]
> [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
> [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
> [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
> [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
> [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
> [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
> [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
> [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
> [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
> [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
> [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
> [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
> [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
> ---[ end trace bd985b05cc906181 ]---
> BTRFS: space_info 4 has 18446743491956604928 free, is not full
> BTRFS: space_info total=307627032576, used=206629289984, pinned=0,
> reserved=0, may_use=682750558208, readonly=131072
>
> Greets,
> Stefan
>
> [1]
> https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git/commit/?h=for-next&id=d555b6c380c644af63dbdaa7cc14bba041a4e4dd

--
Omar
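The WARN_ON checks quoted in the reply above fire when a block reservation still has a non-zero size or reserved count at unmount. The following toy model illustrates how a single unbalanced reserve/release pair would leave such a residue. It is only a sketch: the class, its methods and the byte counts are hypothetical stand-ins and do not reproduce the kernel's accounting, and it assumes the reporter's line numbers 5710 and 5711 do correspond to the delalloc_block_rsv checks, which the thread does not confirm.

```python
# Toy model (hypothetical, not kernel code): a reservation that is not
# fully released leaves counters above zero at "unmount" time.
class BlockRsv:
    """Stand-in for a block reservation with `size` and `reserved` counters."""

    def __init__(self, name):
        self.name = name
        self.size = 0
        self.reserved = 0

    def reserve(self, nbytes):
        """Account for nbytes being set aside."""
        self.size += nbytes
        self.reserved += nbytes

    def release(self, nbytes):
        """Give back up to nbytes, never dropping below zero."""
        nbytes = min(nbytes, self.size)
        self.size -= nbytes
        self.reserved -= min(nbytes, self.reserved)


def unmount_warnings(reservations):
    """List the checks that would trip, in the order size then reserved."""
    warnings = []
    for rsv in reservations:
        if rsv.size > 0:
            warnings.append(f"{rsv.name}.size > 0")
        if rsv.reserved > 0:
            warnings.append(f"{rsv.name}.reserved > 0")
    return warnings


delalloc = BlockRsv("delalloc_block_rsv")
trans = BlockRsv("trans_block_rsv")

delalloc.reserve(8192)
delalloc.release(4096)  # unbalanced on purpose: 4096 bytes are never released
trans.reserve(4096)
trans.release(4096)     # balanced pair, leaves nothing behind

leftover = unmount_warnings([delalloc, trans])
print(leftover)
```

In this model only the unbalanced reservation is reported, once for size and once for reserved, which is the same pairing as the first two warnings in the report.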
* Re: memory overflow or underflow in free space tree / space_info?
From: Omar Sandoval @ 2016-07-29 19:14 UTC
To: Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
> > Dear list,
> >
> > I'm seeing btrfs no space messages frequently on big filesystems (> 30TB).
> >
> > In all cases I'm getting a trace like this one and a space_info warning
> > (since commit [1]). Could someone please be so kind and help me
> > debug / fix this bug? I'm using space_cache=v2 on all those systems.
>
> Hm, so I think this indicates a bug in space accounting somewhere else
> rather than in the free space tree itself. I haven't debugged one of these
> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.

I should've asked: what sort of filesystem activity triggers this?

--
Omar
* Re: memory overflow or underflow in free space tree / space_info?
From: Stefan Priebe - Profihost AG @ 2016-07-29 19:40 UTC
To: Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On 29.07.2016 at 21:14, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>>> Dear list,
>>>
>>> I'm seeing btrfs no space messages frequently on big filesystems (> 30TB).
>>>
>>> In all cases I'm getting a trace like this one and a space_info warning
>>> (since commit [1]). Could someone please be so kind and help me
>>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
>>
>> Hm, so I think this indicates a bug in space accounting somewhere else
>> rather than in the free space tree itself. I haven't debugged one of these
>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>
> I should've asked: what sort of filesystem activity triggers this?
>

Sure.

The workload on the FS is basically:
- Write file1 (50GB - 500GB)
- cp --reflink=always file1 to file2
- apply changes to file2 (100MB - 5GB)
- cp --reflink=always file2 to file3
- apply changes to file3 (100MB - 5GB)
...
- delete file1
- cp --reflink=always file3 to file4
- apply changes to file4 (100MB - 5GB)
- delete file2
...

And this for around 300 files a day. btrfs balance with dusage=5 and
musage=5 runs daily, sometimes in parallel with the workload above.

Greets,
Stefan
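The workload described in the message above can be written down as a dry-run plan, which may help anyone trying to reproduce it. This is a minimal sketch under stated assumptions: the function names and the directory are hypothetical, the deletion pattern (remove the file three generations back before each new copy) is one reading of the elided steps, and the plan only builds command strings without running anything. The balance command uses the usage filters named in the message.

```python
# Sketch: build the command sequence for the rolling reflink chain as a
# dry run. Nothing is executed; names and the directory are placeholders.
def reflink_chain_plan(generations, directory="/mnt/backup"):
    """Return the shell commands (and comment lines) for `generations` files."""
    plan = [f"# write {directory}/file1 (50GB - 500GB)"]
    for n in range(2, generations + 1):
        if n >= 4:
            # Assumed reading of the list: file1 goes before file4 is
            # created, file2 before file5, and so on.
            plan.append(f"rm {directory}/file{n - 3}")
        plan.append(
            f"cp --reflink=always {directory}/file{n - 1} {directory}/file{n}"
        )
        plan.append(f"# apply changes to {directory}/file{n} (100MB - 5GB)")
    return plan


def balance_command(directory="/mnt/backup", usage=5):
    """Return the daily balance invocation with data and metadata usage filters."""
    return f"btrfs balance start -dusage={usage} -musage={usage} {directory}"


plan = reflink_chain_plan(4, directory="/d")
for line in plan:
    print(line)
print(balance_command("/d"))
```

Keeping the plan as plain strings makes it easy to review the ordering before pointing it at a scratch filesystem; whether this ordering is enough to trigger the accounting problem is not established in the thread.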
* Re: memory overflow or underflow in free space tree / space_info?
From: Josef Bacik @ 2016-07-29 21:03 UTC
To: Omar Sandoval, Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org

On 07/29/2016 03:14 PM, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>>> Dear list,
>>>
>>> I'm seeing btrfs no space messages frequently on big filesystems (> 30TB).
>>>
>>> In all cases I'm getting a trace like this one and a space_info warning
>>> (since commit [1]). Could someone please be so kind and help me
>>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
>>
>> Hm, so I think this indicates a bug in space accounting somewhere else
>> rather than in the free space tree itself. I haven't debugged one of these
>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>
> I should've asked: what sort of filesystem activity triggers this?
>

Chris just fixed this, I think. Try the next branch from his git tree

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git

and see if it still happens. Thanks,

Josef
* Re: memory overflow or underflow in free space tree / space_info?
From: Holger Hoffstätte @ 2016-07-29 22:57 UTC
To: linux-btrfs

On Fri, 29 Jul 2016 17:03:43 -0400, Josef Bacik wrote:

> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Dear list,
>>>>
>>>> I'm seeing btrfs no space messages frequently on big filesystems (> 30TB).
>>>>
>>>> In all cases I'm getting a trace like this one and a space_info warning
>>>> (since commit [1]). Could someone please be so kind and help me
>>>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
>>>
>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>> rather than in the free space tree itself. I haven't debugged one of these
>>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>>
>> I should've asked: what sort of filesystem activity triggers this?
>>
>
> Chris just fixed this, I think. Try the next branch from his git tree
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
>
> and see if it still happens. Thanks,
>
> Josef

Hi Josef,

can you say which patch you have in mind? The tree in question doesn't
have any of Chandra's pagesize/sectorsize patches (carefully patched
around, for stability and LTS patchability), so I hope it's not the recent
commit 8b8b08cb "fix delalloc accounting after copy_from_user faults",
because that would be too fiddly (at least for me) to backport correctly.

The only other patch I just found missing and which looks like it
could/should (I think?) work on top of the 4.4.x pagesize-based
calculations in file.c is:

a2af23b7 "__btrfs_buffered_write: Pass valid file offset when
releasing delalloc space"

Would that make sense? Neither I nor any other users of that tree have
observed weird space-info underflows so far (and I use my fs daily), so
it's definitely something peculiar Stefan is doing with his weird
compressed rsync-inplace workload. Odd sector offsets causing a slowly
creeping space_info underflow sounds to me like it just might be the
problem.

thanks,
Holger
* Re: memory overflow or underflow in free space tree / space_info?
From: Holger Hoffstätte @ 2016-07-29 23:09 UTC
To: linux-btrfs

On Fri, 29 Jul 2016 22:57:36 +0000, Holger Hoffstätte wrote:

> The only other patch I just found missing and which looks like it
> could/should (I think?) work on top of the 4.4.x pagesize-based
> calculations in file.c is:
>
> a2af23b7 "__btrfs_buffered_write: Pass valid file offset when
> releasing delalloc space"
>
> Would that make sense?

No, it wouldn't, not without some other sectorsize-related patches
that came before... and those would just make matters worse.
So forget the above.

-h
* Re: memory overflow or underflow in free space tree / space_info?
From: Stefan Priebe - Profihost AG @ 2016-08-04 11:40 UTC
To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

On 29.07.2016 at 23:03, Josef Bacik wrote:
> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost
>>> AG wrote:
>>>> Dear list,
>>>>
>>>> I'm seeing btrfs no space messages frequently on big filesystems (>
>>>> 30TB).
>>>>
>>>> In all cases I'm getting a trace like this one and a space_info warning
>>>> (since commit [1]). Could someone please be so kind and help me
>>>> debug / fix this bug? I'm using space_cache=v2 on all those
>>>> systems.
>>>
>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>> rather than in the free space tree itself. I haven't debugged one of these
>>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>>
>> I should've asked: what sort of filesystem activity triggers this?
>>
>
> Chris just fixed this, I think. Try the next branch from his git tree
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git

Thanks, I'm now running a 4.4 kernel with those patches backported. If that
still shows an error, I will try the vanilla tree.

Thanks!

Stefan

> and see if it still happens. Thanks,
>
> Josef
* Re: memory overflow or underflow in free space tree / space_info?
From: Stefan Priebe - Profihost AG @ 2016-08-08 6:17 UTC
To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

On 04.08.2016 at 13:40, Stefan Priebe - Profihost AG wrote:
> On 29.07.2016 at 23:03, Josef Bacik wrote:
>> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost
>>>> AG wrote:
>>>>> Dear list,
>>>>>
>>>>> I'm seeing btrfs no space messages frequently on big filesystems (>
>>>>> 30TB).
>>>>>
>>>>> In all cases I'm getting a trace like this one and a space_info warning
>>>>> (since commit [1]). Could someone please be so kind and help me
>>>>> debug / fix this bug? I'm using space_cache=v2 on all those
>>>>> systems.
>>>>
>>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>>> rather than in the free space tree itself. I haven't debugged one of these
>>>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>>>
>>> I should've asked: what sort of filesystem activity triggers this?
>>>
>>
>> Chris just fixed this, I think. Try the next branch from his git tree
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
>
> Thanks, I'm now running a 4.4 kernel with those patches backported. If that
> still shows an error, I will try the vanilla tree.

OK, this didn't work. I'll switch to the linux-btrfs next branch
and see if that helps.

Greets,
Stefan

>
> Thanks!
>
> Stefan
>
>> and see if it still happens. Thanks,
>>
>> Josef
* Re: memory overflow or underflow in free space tree / space_info?
From: Stefan Priebe - Profihost AG @ 2016-08-10 21:31 UTC
To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hi Josef,

same again with Chris' next branch:

ERROR: error during balancing '/vmbackup/': No space left on device
There may be more info in syslog - try dmesg | tail
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=5
  METADATA (flags 0x2): balancing, usage=5
  SYSTEM (flags 0x2): balancing, usage=5

dmesg:
[203784.411189] BTRFS info (device dm-0): 114 enospc errors during balance

uname -r
4.7.0-rc6-29043-g8b8b08c

Greets,
Stefan

On 08.08.2016 at 08:17, Stefan Priebe - Profihost AG wrote:
> On 04.08.2016 at 13:40, Stefan Priebe - Profihost AG wrote:
>> On 29.07.2016 at 23:03, Josef Bacik wrote:
>>> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>>>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost
>>>>> AG wrote:
>>>>>> Dear list,
>>>>>>
>>>>>> I'm seeing btrfs no space messages frequently on big filesystems (>
>>>>>> 30TB).
>>>>>>
>>>>>> In all cases I'm getting a trace like this one and a space_info warning
>>>>>> (since commit [1]). Could someone please be so kind and help me
>>>>>> debug / fix this bug? I'm using space_cache=v2 on all those
>>>>>> systems.
>>>>>
>>>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>>>> rather than in the free space tree itself. I haven't debugged one of these
>>>>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>>>>
>>>> I should've asked: what sort of filesystem activity triggers this?
>>>>
>>>
>>> Chris just fixed this, I think. Try the next branch from his git tree
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
>>
>> Thanks, I'm now running a 4.4 kernel with those patches backported. If that
>> still shows an error, I will try the vanilla tree.
>
> OK, this didn't work. I'll switch to the linux-btrfs next branch
> and see if that helps.
>
> Greets,
> Stefan
>
>>
>> Thanks!
>>
>> Stefan
>>
>>> and see if it still happens. Thanks,
>>>
>>> Josef
* Re: memory overflow or underflow in free space tree / space_info?
From: Stefan Priebe - Profihost AG @ 2016-08-11 6:09 UTC
To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hello,

the backtrace and info on umount look the same:

[241910.341124] ------------[ cut here ]------------
[241910.379991] WARNING: CPU: 1 PID: 26664 at fs/btrfs/extent-tree.c:5701 btrfs_free_block_groups+0x370/0x410 [btrfs]
[241910.422099] Modules linked in: netconsole mpt3sas ipt_REJECT raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
[241910.616845] CPU: 1 PID: 26664 Comm: umount Not tainted 4.7.0-rc6-29043-g8b8b08c #1
[241910.669646] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
[241910.723716] 0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf 0000000000000000
[241910.779309] 0000000000000000 ffff8808d104bcf8 ffffffffbd085615 ffff8808d104bd08
[241910.835143] 000016455a3410a8 00000047a0000000 0000000000000000 ffff8808469e2088
[241910.891882] Call Trace:
[241910.947624] [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
[241911.003714] [<ffffffffbd085615>] __warn+0xe5/0x100
[241911.060167] [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
[241911.117422] [<ffffffffc058ca90>] btrfs_free_block_groups+0x370/0x410 [btrfs]
[241911.175975] [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
[241911.235170] [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
[241911.294638] [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
[241911.353005] [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
[241911.409832] [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
[241911.466467] [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
[241911.522602] [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
[241911.577979] [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
[241911.633188] [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
[241911.688146] [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
[241911.742740] [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
[241911.797039] [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
[241911.850750] [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
[241911.903564] ---[ end trace fae017546778f2b0 ]---
[241911.955332] ------------[ cut here ]------------
[241912.006262] WARNING: CPU: 1 PID: 26664 at fs/btrfs/extent-tree.c:5702 btrfs_free_block_groups+0x40a/0x410 [btrfs]
[241912.059326] Modules linked in: netconsole mpt3sas ipt_REJECT raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
[241912.298666] CPU: 1 PID: 26664 Comm: umount Tainted: G W 4.7.0-rc6-29043-g8b8b08c #1
[241912.363401] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
[241912.429395] 0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf 0000000000000000
[241912.497080] 0000000000000000 ffff8808d104bcf8 ffffffffbd085615 ffff8808d104bd08
[241912.565113] 000016465a3410a8 00000047a0000000 0000000000000000 ffff8808469e2088
[241912.634105] Call Trace:
[241912.702992] [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
[241912.773473] [<ffffffffbd085615>] __warn+0xe5/0x100
[241912.844339] [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
[241912.916083] [<ffffffffc058cb2a>] btrfs_free_block_groups+0x40a/0x410 [btrfs]
[241912.989103] [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
[241913.062672] [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
[241913.136364] [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
[241913.208701] [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
[241913.279194] [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
[241913.348065] [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
[241913.415082] [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
[241913.479841] [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
[241913.543353] [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
[241913.605959] [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
[241913.667542] [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
[241913.729612] [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
[241913.791203] [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
[241913.852485] ---[ end trace fae017546778f2b1 ]---
[241913.913638] ------------[ cut here ]------------
[241913.974871] WARNING: CPU: 1 PID: 26664 at fs/btrfs/extent-tree.c:10013 btrfs_free_block_groups+0x2ba/0x410 [btrfs]
[241914.039315] Modules linked in: netconsole mpt3sas ipt_REJECT raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
[241914.315918] CPU: 1 PID: 26664 Comm: umount Tainted: G W 4.7.0-rc6-29043-g8b8b08c #1
[241914.388096] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
[241914.460679] 0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf 0000000000000000
[241914.534126] 0000000000000000 ffff8808d104bcf8 ffffffffbd085615 ffff8808d104bce8
[241914.607523] 0000271dbd3dac8c ffff88085184aac8 0000000000000038 0000000000000000 [241914.681318] Call Trace: [241914.754437] [<ffffffffbd3d83cf>] dump_stack+0x63/0x84 [241914.828796] [<ffffffffbd085615>] __warn+0xe5/0x100 [241914.902953] [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20 [241914.977271] [<ffffffffc058c9da>] btrfs_free_block_groups+0x2ba/0x410 [btrfs] [241915.052041] [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs] [241915.126282] [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs] [241915.200758] [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100 [241915.273872] [<ffffffffbd1df026>] kill_anon_super+0x16/0x30 [241915.345132] [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs] [241915.414703] [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90 [241915.482488] [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70 [241915.547994] [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90 [241915.611962] [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20 [241915.674717] [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0 [241915.736398] [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95 [241915.798592] [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150 [241915.860295] [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25 [241915.921642] ---[ end trace fae017546778f2b2 ]--- [241915.982893] BTRFS: space_info 4 has 114577997824 free, is not full [241916.045103] BTRFS: space_info total=307627032576, used=193048903680, pinned=0, reserved=0, may_use=688537059328, readonly=131072 Greets, Stefan Am 10.08.2016 um 23:31 schrieb Stefan Priebe - Profihost AG: > Hi Josef, > > same again with chris next branch: > > ERROR: error during balancing '/vmbackup/': No space left on device > There may be more info in syslog - try dmesg | tail > Dumping filters: flags 0x7, state 0x0, force is off > DATA (flags 0x2): balancing, usage=5 > METADATA (flags 0x2): balancing, usage=5 > SYSTEM (flags 0x2): balancing, usage=5 > > dmesg: > [203784.411189] BTRFS info (device dm-0): 114 enospc errors 
during balance > > uname -r 4.7.0-rc6-29043-g8b8b08c > > Greets, > Stefan > > Am 08.08.2016 um 08:17 schrieb Stefan Priebe - Profihost AG: >> Am 04.08.2016 um 13:40 schrieb Stefan Priebe - Profihost AG: >>> Am 29.07.2016 um 23:03 schrieb Josef Bacik: >>>> On 07/29/2016 03:14 PM, Omar Sandoval wrote: >>>>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote: >>>>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost >>>>>> AG wrote: >>>>>>> Dear list, >>>>>>> >>>>>>> i'm seeing btrfs no space messages frequently on big filesystems (> >>>>>>> 30TB). >>>>>>> >>>>>>> In all cases i'm getting a trace like this one a space_info warning. >>>>>>> (since commit [1]). Could someone please be so kind and help me >>>>>>> debugging / fixing this bug? I'm using space_cache=v2 on all those >>>>>>> systems. >>>>>> >>>>>> Hm, so I think this indicates a bug in space accounting somewhere else >>>>>> rather than the free space tree itself. I haven't debugged one of these >>>>>> issues before, I'll see if I can reproduce it. Cc'ing Josef, too. >>>>> >>>>> I should've asked, what sort of filesystem activity triggers this? >>>>> >>>> >>>> Chris just fixed this I think, try his next branch from his git tree >>>> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git >>> >>> Thanks now running a 4.4 with those patches backported. If that still >>> shows an error i will try that vanilla tree. >> >> OK this didn't work. I'll start / try using the linux-btrfs next branch >> and look if this helps. >> >> Greets, >> Stefan >> >>> >>> Thanks! >>> >>> Stefan >>> >>>> and see if it still happens. Thanks, >>>> >>>> Josef ^ permalink raw reply [flat|nested] 14+ messages in thread
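
The space_info dump line at the end of the trace in this message can be checked mechanically. What follows is a minimal userspace sketch in C, not kernel code: `struct space_info_dump`, `parse_space_info` and `leftover_bytes` are hypothetical names introduced here, and the field order is assumed to match the line exactly as it was logged above. The sketch also assumes, as the warnings at umount suggest, that outstanding reservations are expected to be gone by the time the filesystem is torn down.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Counters as they appear in the "BTRFS: space_info total=..." line. */
struct space_info_dump {
	uint64_t total, used, pinned, reserved, may_use, readonly;
};

/*
 * Parse one dmesg line. Anything before "space_info total=" (timestamp,
 * "BTRFS:" prefix) is skipped. Returns 0 on success, -1 otherwise.
 */
static int parse_space_info(const char *line, struct space_info_dump *d)
{
	const char *p = strstr(line, "space_info total=");
	int n;

	if (!p)
		return -1;
	n = sscanf(p,
		   "space_info total=%" SCNu64 ", used=%" SCNu64
		   ", pinned=%" SCNu64 ", reserved=%" SCNu64
		   ", may_use=%" SCNu64 ", readonly=%" SCNu64,
		   &d->total, &d->used, &d->pinned, &d->reserved,
		   &d->may_use, &d->readonly);
	return n == 6 ? 0 : -1;
}

/* Bytes still pinned, reserved or promised (may_use) in the dump. */
static uint64_t leftover_bytes(const struct space_info_dump *d)
{
	return d->pinned + d->reserved + d->may_use;
}
```

Fed the line logged at [241916.045103], the parser yields may_use=688537059328 against total=307627032576, so the leftover reservation alone is larger than the whole space_info.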
* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-11 6:09 ` Stefan Priebe - Profihost AG
@ 2016-08-14 15:22 ` Stefan Priebe - Profihost AG
  2016-08-29 14:02 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-14 15:22 UTC
To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hi Josef,

is there anything I could do or test? Results with a vanilla next branch are the
same.

Stefan

On 11.08.2016 at 08:09, Stefan Priebe - Profihost AG wrote:
> Hello,
>
> the backtrace and info on umount looks the same:
> [...]
* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-14 15:22 ` Stefan Priebe - Profihost AG
@ 2016-08-29 14:02 ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-29 14:02 UTC
To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hi Josef,

this still happens with the current 4.8-rc* releases. Is there anything I can do
to debug this? Maybe insert some code to check for an underflow or overflow in
the code?

Stefan

On 14.08.2016 at 17:22, Stefan Priebe - Profihost AG wrote:
> Hi Josef,
>
> anything i could do or test? Results with a vanilla next branch are the
> same.
>
> Stefan
> [...]
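
As a rough illustration of the kind of check asked about in this message, here is a userspace sketch in C of a counter update that reports an underflow or overflow instead of wrapping silently. It is not btrfs code: `counter_sub_checked` and `counter_add_checked` are hypothetical helpers, and the choice to clamp to zero on underflow is an assumption made only to keep the example small. `__builtin_sub_overflow` and `__builtin_add_overflow` are GCC/Clang builtins.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: subtract `bytes` from `*counter`. If the result
 * would drop below zero, log the event, clamp the counter to 0 and
 * return false so the caller can dump more context.
 */
static bool counter_sub_checked(uint64_t *counter, uint64_t bytes,
				const char *name)
{
	uint64_t result;

	if (__builtin_sub_overflow(*counter, bytes, &result)) {
		fprintf(stderr, "underflow: %s=%llu minus %llu\n", name,
			(unsigned long long)*counter,
			(unsigned long long)bytes);
		*counter = 0;
		return false;
	}
	*counter = result;
	return true;
}

/*
 * Hypothetical helper: add `bytes` to `*counter`. On overflow the
 * counter is left unchanged and false is returned.
 */
static bool counter_add_checked(uint64_t *counter, uint64_t bytes,
				const char *name)
{
	uint64_t result;

	if (__builtin_add_overflow(*counter, bytes, &result)) {
		fprintf(stderr, "overflow: %s=%llu plus %llu\n", name,
			(unsigned long long)*counter,
			(unsigned long long)bytes);
		return false;
	}
	*counter = result;
	return true;
}
```

In a kernel experiment the same pattern would presumably wrap each place where a space_info counter such as may_use is decremented, with a warning in place of the `fprintf`; which call sites those are is not established in this thread.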
* Re: memory overflow or undeflow in free space tree / space_info? 2016-07-29 19:11 ` Omar Sandoval 2016-07-29 19:14 ` Omar Sandoval @ 2016-07-29 19:39 ` Stefan Priebe - Profihost AG 1 sibling, 0 replies; 14+ messages in thread From: Stefan Priebe - Profihost AG @ 2016-07-29 19:39 UTC (permalink / raw) To: Omar Sandoval Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Holger Hoffstätte Am 29.07.2016 um 21:11 schrieb Omar Sandoval: > On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote: >> Dear list, >> >> i'm seeing btrfs no space messages frequently on big filesystems (> 30TB). >> >> In all cases i'm getting a trace like this one a space_info warning. >> (since commit [1]). Could someone please be so kind and help me >> debugging / fixing this bug? I'm using space_cache=v2 on all those systems. > > Hm, so I think this indicates a bug in space accounting somewhere else > rather than the free space tree itself. I haven't debugged one of these > issues before, I'll see if I can reproduce it. Cc'ing Josef, too. Thanks. >> ------------[ cut here ]------------ >> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5710 > > Do these line numbers match up with yours? > > 5706 static void release_global_block_rsv(struct btrfs_fs_info *fs_info) > 5707 { > 5708 block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL, > 5709 (u64)-1); > 5710 WARN_ON(fs_info->delalloc_block_rsv.size > 0); > 5711 WARN_ON(fs_info->delalloc_block_rsv.reserved > 0); > 5712 WARN_ON(fs_info->trans_block_rsv.size > 0); > 5713 WARN_ON(fs_info->trans_block_rsv.reserved > 0); > 5714 WARN_ON(fs_info->chunk_block_rsv.size > 0); > 5715 WARN_ON(fs_info->chunk_block_rsv.reserved > 0); > 5716 WARN_ON(fs_info->delayed_block_rsv.size > 0); > 5717 WARN_ON(fs_info->delayed_block_rsv.reserved > 0); > 5718 } Yes it does. But the kernel i'm using is somewhat special i'm using a 4.4 kernel with a patchset from holger (CC'ed). 
See here: https://github.com/hhoffstaette/kernel-patches/tree/c9cce0933a40db84627241143b123210aee0fde6/4.4.15

>> btrfs_free_block_groups+0x35a/0x400 [btrfs]()
>> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
>> raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
>> usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
>> usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
>> raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
>> pps_core
>> CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
>> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
>> 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
>> ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
>> 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
>> Call Trace:
>> [<ffffffffbd3c712f>] dump_stack+0x63/0x84
>> [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
>> [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
>> [<ffffffffc034a17a>] btrfs_free_block_groups+0x35a/0x400 [btrfs]
>> [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
>> [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
>> [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
>> [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
>> [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
>> [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
>> [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
>> [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
>> [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
>> [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
>> [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
>> [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
>> [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace bd985b05cc90617f ]---
>> ------------[ cut here ]------------
>> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5711
>> btrfs_free_block_groups+0x3f4/0x400 [btrfs]()
>> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
>> raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
>> usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
>> usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
>> raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
>> pps_core
>> CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
>> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
>> 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
>> ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
>> 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
>> Call Trace:
>> [<ffffffffbd3c712f>] dump_stack+0x63/0x84
>> [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
>> [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
>> [<ffffffffc034a214>] btrfs_free_block_groups+0x3f4/0x400 [btrfs]
>> [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
>> [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
>> [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
>> [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
>> [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
>> [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
>> [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
>> [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
>> [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
>> [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
>> [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
>> [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
>> [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace bd985b05cc906180 ]---
>> ------------[ cut here ]------------
>> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:9990
>
> I don't see what warning this is in kdave/for-next.
>
>> btrfs_free_block_groups+0x2a4/0x400 [btrfs]()
>> Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
>> raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
>> usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
>> usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
>> raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
>> pps_core
>> CPU: 5 PID: 26421 Comm: umount Tainted: G W O 4.4.15+43-ph #1
>> Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
>> 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
>> ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 ffff880c6aaa4528
>> 0000000000000038 0000000000000000 ffff8802fe8d8c88 ffff8808881d2000
>> Call Trace:
>> [<ffffffffbd3c712f>] dump_stack+0x63/0x84
>> [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
>> [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
>> [<ffffffffc034a0c4>] btrfs_free_block_groups+0x2a4/0x400 [btrfs]
>> [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
>> [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
>> [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
>> [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
>> [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
>> [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
>> [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
>> [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
>> [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
>> [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
>> [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
>> [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
>> [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
>> ---[ end trace bd985b05cc906181 ]---
>> BTRFS: space_info 4 has 18446743491956604928 free, is not full
>> BTRFS: space_info total=307627032576, used=206629289984, pinned=0,
>> reserved=0, may_use=682750558208, readonly=131072
>>
>> Greets,
>> Stefan
>>
>> [1]
>> https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git/commit/?h=for-next&id=d555b6c380c644af63dbdaa7cc14bba041a4e4dd
>
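For reference, the "free" figure in the space_info dump quoted above is consistent with an unsigned 64-bit wraparound. Subtracting used, pinned, reserved, may_use and readonly from total gives a negative number, and that number modulo 2^64 is exactly the value printed. The sketch below only redoes that arithmetic with the numbers from the log. The helper names are made up for illustration, and the formula is inferred from the numbers, not taken from the source of the patched 4.4.15 kernel.

```python
# Reproduce the "free" value from the quoted space_info dump.
# Assumption: free = total - used - pinned - reserved - may_use - readonly,
# evaluated as an unsigned 64-bit quantity (inferred from the log, not from
# the kernel source). Function names are hypothetical.

U64_MODULUS = 1 << 64


def space_info_free(total, used, pinned, reserved, may_use, readonly):
    """Return the remaining bytes as an unsigned 64-bit value."""
    remaining = total - used - pinned - reserved - may_use - readonly
    return remaining % U64_MODULUS  # wraps around if remaining is negative


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - U64_MODULUS if value >= (1 << 63) else value


free = space_info_free(
    total=307627032576,
    used=206629289984,
    pinned=0,
    reserved=0,
    may_use=682750558208,
    readonly=131072,
)

print(free)               # 18446743491956604928, as in the log
print(as_signed64(free))  # -581752946688
```

Read as a signed value, the counter is about 542 GiB below zero: may_use (roughly 636 GiB) is larger than the total size of this space_info (286.5 GiB).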
end of thread, other threads:[~2016-08-29 14:02 UTC | newest]

Thread overview: 14+ messages
2016-07-29 18:40 memory overflow or undeflow in free space tree / space_info? Stefan Priebe - Profihost AG
2016-07-29 19:11 ` Omar Sandoval
2016-07-29 19:14 ` Omar Sandoval
2016-07-29 19:40 ` Stefan Priebe - Profihost AG
2016-07-29 21:03 ` Josef Bacik
2016-07-29 22:57 ` Holger Hoffstätte
2016-07-29 23:09 ` Holger Hoffstätte
2016-08-04 11:40 ` Stefan Priebe - Profihost AG
2016-08-08 6:17 ` Stefan Priebe - Profihost AG
2016-08-10 21:31 ` Stefan Priebe - Profihost AG
2016-08-11 6:09 ` Stefan Priebe - Profihost AG
2016-08-14 15:22 ` Stefan Priebe - Profihost AG
2016-08-29 14:02 ` Stefan Priebe - Profihost AG
2016-07-29 19:39 ` Stefan Priebe - Profihost AG