linux-btrfs.vger.kernel.org archive mirror
* memory overflow or underflow in free space tree / space_info?
@ 2016-07-29 18:40 Stefan Priebe - Profihost AG
  2016-07-29 19:11 ` Omar Sandoval
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-07-29 18:40 UTC
  To: linux-btrfs@vger.kernel.org; +Cc: osandov

Dear list,

I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).

In all cases I'm getting a trace like this one, including a space_info warning
(since commit [1]). Could someone please be so kind as to help me
debug / fix this bug? I'm using space_cache=v2 on all those systems.

------------[ cut here ]------------
WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5710
btrfs_free_block_groups+0x35a/0x400 [btrfs]()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
pps_core
CPU: 5 PID: 26421 Comm: umount Tainted: G        W  O    4.4.15+43-ph #1
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
 ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
Call Trace:
 [<ffffffffbd3c712f>] dump_stack+0x63/0x84
 [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc034a17a>] btrfs_free_block_groups+0x35a/0x400 [btrfs]
 [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
 [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
 [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
 [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
 [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
 [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
 [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
 [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
 [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
 [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
 [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace bd985b05cc90617f ]---
------------[ cut here ]------------
WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5711
btrfs_free_block_groups+0x3f4/0x400 [btrfs]()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
pps_core
CPU: 5 PID: 26421 Comm: umount Tainted: G        W  O    4.4.15+43-ph #1
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
 ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 00000047a0000000
 0000000000000000 ffff8806016a1400 ffff8808881d2088 ffff8808881d2000
Call Trace:
 [<ffffffffbd3c712f>] dump_stack+0x63/0x84
 [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc034a214>] btrfs_free_block_groups+0x3f4/0x400 [btrfs]
 [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
 [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
 [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
 [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
 [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
 [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
 [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
 [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
 [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
 [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
 [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace bd985b05cc906180 ]---
------------[ cut here ]------------
WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:9990
btrfs_free_block_groups+0x2a4/0x400 [btrfs]()
Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 mpt3sas
raid_class scsi_transport_sas xt_multiport iptable_filter ip_tables
x_tables 8021q garp bonding coretemp loop i40e(O) vxlan ip6_udp_tunnel
usbhid udp_tunnel sb_edac ehci_pci edac_core ehci_hcd i2c_i801 i2c_core
usbcore shpchp usb_common ipmi_si ipmi_msghandler button btrfs dm_mod
raid1 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
xor raid6_pq md_mod ixgbe mdio sg sd_mod ahci ptp libahci megaraid_sas
pps_core
CPU: 5 PID: 26421 Comm: umount Tainted: G        W  O    4.4.15+43-ph #1
Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
 0000000000000000 ffff880ae8b47cd8 ffffffffbd3c712f 0000000000000000
 ffffffffc03ec603 ffff880ae8b47d18 ffffffffbd0837e7 ffff880c6aaa4528
 0000000000000038 0000000000000000 ffff8802fe8d8c88 ffff8808881d2000
Call Trace:
 [<ffffffffbd3c712f>] dump_stack+0x63/0x84
 [<ffffffffbd0837e7>] warn_slowpath_common+0x97/0xe0
 [<ffffffffbd08384a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffc034a0c4>] btrfs_free_block_groups+0x2a4/0x400 [btrfs]
 [<ffffffffc035ba4b>] close_ctree+0x15b/0x330 [btrfs]
 [<ffffffffc03291f9>] btrfs_put_super+0x19/0x20 [btrfs]
 [<ffffffffbd1cd33f>] generic_shutdown_super+0x6f/0x100
 [<ffffffffbd1cd866>] kill_anon_super+0x16/0x30
 [<ffffffffc032f96a>] btrfs_kill_super+0x1a/0xb0 [btrfs]
 [<ffffffffbd1cda31>] deactivate_locked_super+0x51/0x90
 [<ffffffffbd1ce42e>] deactivate_super+0x4e/0x70
 [<ffffffffbd1e9373>] cleanup_mnt+0x43/0x90
 [<ffffffffbd1e9412>] __cleanup_mnt+0x12/0x20
 [<ffffffffbd09ef8e>] task_work_run+0x7e/0xa0
 [<ffffffffbd07e550>] exit_to_usermode_loop+0x66/0x95
 [<ffffffffbd002a56>] syscall_return_slowpath+0xa6/0xf0
 [<ffffffffbd6b6f4c>] int_ret_from_sys_call+0x25/0x8f
---[ end trace bd985b05cc906181 ]---
BTRFS: space_info 4 has 18446743491956604928 free, is not full
BTRFS: space_info total=307627032576, used=206629289984, pinned=0,
reserved=0, may_use=682750558208, readonly=131072

Greets,
Stefan

[1]
https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git/commit/?h=for-next&id=d555b6c380c644af63dbdaa7cc14bba041a4e4dd


* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 18:40 memory overflow or underflow in free space tree / space_info? Stefan Priebe - Profihost AG
@ 2016-07-29 19:11 ` Omar Sandoval
  2016-07-29 19:14   ` Omar Sandoval
  2016-07-29 19:39   ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 14+ messages in thread
From: Omar Sandoval @ 2016-07-29 19:11 UTC
  To: Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
> Dear list,
> 
> I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).
> 
> In all cases I'm getting a trace like this one, including a space_info warning
> (since commit [1]). Could someone please be so kind as to help me
> debug / fix this bug? I'm using space_cache=v2 on all those systems.

Hm, so I think this indicates a bug in space accounting somewhere else
rather than in the free space tree itself. I haven't debugged one of these
issues before; I'll see if I can reproduce it. Cc'ing Josef, too.

> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5710

Do these line numbers match up with yours?

  5706	static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
  5707	{
  5708		block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
  5709					(u64)-1);
  5710		WARN_ON(fs_info->delalloc_block_rsv.size > 0);
  5711		WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
  5712		WARN_ON(fs_info->trans_block_rsv.size > 0);
  5713		WARN_ON(fs_info->trans_block_rsv.reserved > 0);
  5714		WARN_ON(fs_info->chunk_block_rsv.size > 0);
  5715		WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
  5716		WARN_ON(fs_info->delayed_block_rsv.size > 0);
  5717		WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
  5718	}

> [...]
> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:9990

I can't tell which warning this is in kdave/for-next.

> [...]

-- 
Omar


* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 19:11 ` Omar Sandoval
@ 2016-07-29 19:14   ` Omar Sandoval
  2016-07-29 19:40     ` Stefan Priebe - Profihost AG
  2016-07-29 21:03     ` Josef Bacik
  2016-07-29 19:39   ` Stefan Priebe - Profihost AG
  1 sibling, 2 replies; 14+ messages in thread
From: Omar Sandoval @ 2016-07-29 19:14 UTC
  To: Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
> > Dear list,
> > 
> > I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).
> > 
> > In all cases I'm getting a trace like this one, including a space_info warning
> > (since commit [1]). Could someone please be so kind as to help me
> > debug / fix this bug? I'm using space_cache=v2 on all those systems.
> 
> Hm, so I think this indicates a bug in space accounting somewhere else
> rather than in the free space tree itself. I haven't debugged one of these
> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.

I should've asked, what sort of filesystem activity triggers this?

-- 
Omar


* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 19:11 ` Omar Sandoval
  2016-07-29 19:14   ` Omar Sandoval
@ 2016-07-29 19:39   ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-07-29 19:39 UTC
  To: Omar Sandoval
  Cc: linux-btrfs@vger.kernel.org, Josef Bacik, Holger Hoffstätte

On 29.07.2016 at 21:11, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>> Dear list,
>>
>> I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).
>>
>> In all cases I'm getting a trace like this one, including a space_info warning
>> (since commit [1]). Could someone please be so kind as to help me
>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
> 
> Hm, so I think this indicates a bug in space accounting somewhere else
> rather than in the free space tree itself. I haven't debugged one of these
> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.

Thanks.

>> ------------[ cut here ]------------
>> WARNING: CPU: 5 PID: 26421 at fs/btrfs/extent-tree.c:5710
> 
> Do these line numbers match up with yours?
> 
>   5706	static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>   5707	{
>   5708		block_rsv_release_bytes(fs_info, &fs_info->global_block_rsv, NULL,
>   5709					(u64)-1);
>   5710		WARN_ON(fs_info->delalloc_block_rsv.size > 0);
>   5711		WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
>   5712		WARN_ON(fs_info->trans_block_rsv.size > 0);
>   5713		WARN_ON(fs_info->trans_block_rsv.reserved > 0);
>   5714		WARN_ON(fs_info->chunk_block_rsv.size > 0);
>   5715		WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
>   5716		WARN_ON(fs_info->delayed_block_rsv.size > 0);
>   5717		WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
>   5718	}

Yes, they do.

But the kernel I'm using is somewhat special: it's a 4.4 kernel with
a patchset from Holger (CC'ed). See here:
https://github.com/hhoffstaette/kernel-patches/tree/c9cce0933a40db84627241143b123210aee0fde6/4.4.15



* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 19:14   ` Omar Sandoval
@ 2016-07-29 19:40     ` Stefan Priebe - Profihost AG
  2016-07-29 21:03     ` Josef Bacik
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-07-29 19:40 UTC
  To: Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik


On 29.07.2016 at 21:14, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>>> Dear list,
>>>
>>> I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).
>>>
>>> In all cases I'm getting a trace like this one, including a space_info warning
>>> (since commit [1]). Could someone please be so kind as to help me
>>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
>>
>> Hm, so I think this indicates a bug in space accounting somewhere else
>> rather than in the free space tree itself. I haven't debugged one of these
>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
> 
> I should've asked, what sort of filesystem activity triggers this?
> 

Sure.


The workload on the FS is basically:
- Write file1 (50GB - 500GB)

- cp --reflink=always file1 to file2
- apply changes to file2 (100MB - 5GB)

- cp --reflink=always file2 to file3
- apply changes to file3 (100MB - 5GB)

...

- delete file1

- cp --reflink=always file3 to file4
- apply changes to file4 (100MB - 5GB)

- delete file2

...

This repeats for around 300 files a day. A btrfs balance with dusage=5
and musage=5 runs daily, sometimes in parallel with the workload above.

Greets,
Stefan


* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 19:14   ` Omar Sandoval
  2016-07-29 19:40     ` Stefan Priebe - Profihost AG
@ 2016-07-29 21:03     ` Josef Bacik
  2016-07-29 22:57       ` Holger Hoffstätte
  2016-08-04 11:40       ` Stefan Priebe - Profihost AG
  1 sibling, 2 replies; 14+ messages in thread
From: Josef Bacik @ 2016-07-29 21:03 UTC
  To: Omar Sandoval, Stefan Priebe - Profihost AG; +Cc: linux-btrfs@vger.kernel.org

On 07/29/2016 03:14 PM, Omar Sandoval wrote:
> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>>> Dear list,
>>>
>>> I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).
>>>
>>> In all cases I'm getting a trace like this one, including a space_info warning
>>> (since commit [1]). Could someone please be so kind as to help me
>>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
>>
>> Hm, so I think this indicates a bug in space accounting somewhere else
>> rather than in the free space tree itself. I haven't debugged one of these
>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>
> I should've asked, what sort of filesystem activity triggers this?
>

Chris just fixed this, I think. Try the next branch from his git tree

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git

and see if it still happens.  Thanks,

Josef


* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 21:03     ` Josef Bacik
@ 2016-07-29 22:57       ` Holger Hoffstätte
  2016-07-29 23:09         ` Holger Hoffstätte
  2016-08-04 11:40       ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 14+ messages in thread
From: Holger Hoffstätte @ 2016-07-29 22:57 UTC
  To: linux-btrfs

On Fri, 29 Jul 2016 17:03:43 -0400, Josef Bacik wrote:

> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Dear list,
>>>>
>>>> I'm frequently seeing btrfs "no space" messages on big filesystems (> 30TB).
>>>>
>>>> In all cases I'm getting a trace like this one, including a space_info warning
>>>> (since commit [1]). Could someone please be so kind as to help me
>>>> debug / fix this bug? I'm using space_cache=v2 on all those systems.
>>>
>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>> rather than in the free space tree itself. I haven't debugged one of these
>>> issues before; I'll see if I can reproduce it. Cc'ing Josef, too.
>>
>> I should've asked, what sort of filesystem activity triggers this?
>>
> 
> Chris just fixed this, I think. Try the next branch from his git tree
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
> 
> and see if it still happens.  Thanks,
> 
> Josef

Hi Josef,

can you say which patch you have in mind? The tree in question
doesn't have any of Chandan's pagesize/sectorsize patches (carefully
patched around, for stability and LTS patchability), so I hope it's
not the recent commit

8b8b08cb "fix delalloc accounting after copy_from_user faults"

because that would be too fiddly (at least for me) to backport
correctly.

The only other patch I just found missing and which looks like it
could/should (I think?) work on top of the 4.4.x pagesize-based
calculations in file.c is:

a2af23b7 "__btrfs_buffered_write: Pass valid file offset when
releasing delalloc space"

Would that make sense? Neither I nor any other users of that tree
have observed weird space-info underflows so far (and I use my
fs daily), so it's definitely something peculiar Stefan is doing
with his weird compressed rsync-inplace workload. Odd sector offsets
causing slowly creeping space_info underflow sounds to me like it
just might be the problem.

thanks,
Holger



* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 22:57       ` Holger Hoffstätte
@ 2016-07-29 23:09         ` Holger Hoffstätte
  0 siblings, 0 replies; 14+ messages in thread
From: Holger Hoffstätte @ 2016-07-29 23:09 UTC
  To: linux-btrfs


On Fri, 29 Jul 2016 22:57:36 +0000, Holger Hoffstätte wrote:

> The only other patch I just found missing and which looks like it
> could/should (I think?) work on top of the 4.4.x pagesize-based
> calculations in file.c is:
> 
> a2af23b7 "__btrfs_buffered_write: Pass valid file offset when
> releasing delalloc space"
> 
> Would that make sense?

No, it wouldn't, not without some other sectorsize-related patches
that came before...and those would just make matters worse.

So forget the above.

-h



* Re: memory overflow or underflow in free space tree / space_info?
  2016-07-29 21:03     ` Josef Bacik
  2016-07-29 22:57       ` Holger Hoffstätte
@ 2016-08-04 11:40       ` Stefan Priebe - Profihost AG
  2016-08-08  6:17         ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-04 11:40 UTC
  To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

On 29.07.2016 at 23:03, Josef Bacik wrote:
> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost
>>> AG wrote:
>>>> Dear list,
>>>>
>>>> i'm seeing btrfs no space messages frequently on big filesystems (>
>>>> 30TB).
>>>>
>>>> In all cases i'm getting a trace like this one a space_info warning.
>>>> (since commit [1]). Could someone please be so kind and help me
>>>> debugging / fixing this bug? I'm using space_cache=v2 on all those
>>>> systems.
>>>
>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>> rather than the free space tree itself. I haven't debugged one of these
>>> issues before, I'll see if I can reproduce it. Cc'ing Josef, too.
>>
>> I should've asked, what sort of filesystem activity triggers this?
>>
> 
> Chris just fixed this I think, try his next branch from his git tree
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git

Thanks, I'm now running 4.4 with those patches backported. If that still
shows an error, I will try the vanilla tree.

Thanks!

Stefan

> and see if it still happens.  Thanks,
> 
> Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-04 11:40       ` Stefan Priebe - Profihost AG
@ 2016-08-08  6:17         ` Stefan Priebe - Profihost AG
  2016-08-10 21:31           ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-08  6:17 UTC (permalink / raw)
  To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

On 04.08.2016 at 13:40, Stefan Priebe - Profihost AG wrote:
> On 29.07.2016 at 23:03, Josef Bacik wrote:
>> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost
>>>> AG wrote:
>>>>> Dear list,
>>>>>
>>>>> i'm seeing btrfs no space messages frequently on big filesystems (>
>>>>> 30TB).
>>>>>
>>>>> In all cases i'm getting a trace like this one a space_info warning.
>>>>> (since commit [1]). Could someone please be so kind and help me
>>>>> debugging / fixing this bug? I'm using space_cache=v2 on all those
>>>>> systems.
>>>>
>>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>>> rather than the free space tree itself. I haven't debugged one of these
>>>> issues before, I'll see if I can reproduce it. Cc'ing Josef, too.
>>>
>>> I should've asked, what sort of filesystem activity triggers this?
>>>
>>
>> Chris just fixed this I think, try his next branch from his git tree
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
> 
> Thanks now running a 4.4 with those patches backported. If that still
> shows an error i will try that vanilla tree.

OK, this didn't work. I'll try the linux-btrfs next branch and see
whether that helps.

Greets,
Stefan

> 
> Thanks!
> 
> Stefan
> 
>> and see if it still happens.  Thanks,
>>
>> Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-08  6:17         ` Stefan Priebe - Profihost AG
@ 2016-08-10 21:31           ` Stefan Priebe - Profihost AG
  2016-08-11  6:09             ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-10 21:31 UTC (permalink / raw)
  To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hi Josef,

Same again with Chris's next branch:

ERROR: error during balancing '/vmbackup/': No space left on device
There may be more info in syslog - try dmesg | tail
Dumping filters: flags 0x7, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=5
  METADATA (flags 0x2): balancing, usage=5
  SYSTEM (flags 0x2): balancing, usage=5

dmesg:
[203784.411189] BTRFS info (device dm-0): 114 enospc errors during balance

uname -r 4.7.0-rc6-29043-g8b8b08c
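The filter dump from btrfs balance can be decoded by hand. The sketch below assumes the chunk-type bits (data = 1, system = 2, metadata = 4) and the per-type argument bits (profiles = 1, usage = 2) mirror BTRFS_BALANCE_DATA/SYSTEM/METADATA and BTRFS_BALANCE_ARGS_PROFILES/USAGE in the kernel's <linux/btrfs.h>. The macro and function names are illustrative only, so check the values against your own headers.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to mirror BTRFS_BALANCE_DATA/SYSTEM/METADATA. */
#define BAL_DATA          (1ULL << 0)
#define BAL_SYSTEM        (1ULL << 1)
#define BAL_METADATA      (1ULL << 2)
/* Assumed to mirror BTRFS_BALANCE_ARGS_PROFILES/USAGE. */
#define BAL_ARGS_PROFILES (1ULL << 0)
#define BAL_ARGS_USAGE    (1ULL << 1)

/* Append `name` to the NUL-terminated `buf` (capacity `cap`) if `bit`
 * is set in `flags`, separating entries with '|'. */
static void add_if(char *buf, size_t cap, uint64_t flags, uint64_t bit,
                   const char *name)
{
    if (!(flags & bit))
        return;
    if (buf[0] != '\0')
        strncat(buf, "|", cap - strlen(buf) - 1);
    strncat(buf, name, cap - strlen(buf) - 1);
}

/* Decode the top-level "Dumping filters: flags 0x..." value. */
static void decode_chunk_types(uint64_t flags, char *buf, size_t cap)
{
    buf[0] = '\0';
    add_if(buf, cap, flags, BAL_DATA, "DATA");
    add_if(buf, cap, flags, BAL_SYSTEM, "SYSTEM");
    add_if(buf, cap, flags, BAL_METADATA, "METADATA");
}

/* Decode a per-type "(flags 0x...)" value. */
static void decode_args(uint64_t flags, char *buf, size_t cap)
{
    buf[0] = '\0';
    add_if(buf, cap, flags, BAL_ARGS_PROFILES, "profiles");
    add_if(buf, cap, flags, BAL_ARGS_USAGE, "usage");
}
```

Under those assumptions, flags 0x7 means all three chunk types were selected, and each per-type 0x2 means only the usage filter was active. That matches the usage=5 printed on every line.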

Greets,
Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-10 21:31           ` Stefan Priebe - Profihost AG
@ 2016-08-11  6:09             ` Stefan Priebe - Profihost AG
  2016-08-14 15:22               ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-11  6:09 UTC (permalink / raw)
  To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hello,

the backtrace and info on umount look the same:

[241910.341124] ------------[ cut here ]------------
[241910.379991] WARNING: CPU: 1 PID: 26664 at
fs/btrfs/extent-tree.c:5701 btrfs_free_block_groups+0x370/0x410 [btrfs]
[241910.422099] Modules linked in: netconsole mpt3sas ipt_REJECT
raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp
iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci
i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si
ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod
ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
[241910.616845] CPU: 1 PID: 26664 Comm: umount Not tainted
4.7.0-rc6-29043-g8b8b08c #1
[241910.669646] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c
02/18/2015
[241910.723716]  0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf
0000000000000000
[241910.779309]  0000000000000000 ffff8808d104bcf8 ffffffffbd085615
ffff8808d104bd08
[241910.835143]  000016455a3410a8 00000047a0000000 0000000000000000
ffff8808469e2088
[241910.891882] Call Trace:
[241910.947624]  [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
[241911.003714]  [<ffffffffbd085615>] __warn+0xe5/0x100
[241911.060167]  [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
[241911.117422]  [<ffffffffc058ca90>]
btrfs_free_block_groups+0x370/0x410 [btrfs]
[241911.175975]  [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
[241911.235170]  [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
[241911.294638]  [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
[241911.353005]  [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
[241911.409832]  [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
[241911.466467]  [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
[241911.522602]  [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
[241911.577979]  [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
[241911.633188]  [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
[241911.688146]  [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
[241911.742740]  [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
[241911.797039]  [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
[241911.850750]  [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
[241911.903564] ---[ end trace fae017546778f2b0 ]---
[241911.955332] ------------[ cut here ]------------
[241912.006262] WARNING: CPU: 1 PID: 26664 at
fs/btrfs/extent-tree.c:5702 btrfs_free_block_groups+0x40a/0x410 [btrfs]
[241912.059326] Modules linked in: netconsole mpt3sas ipt_REJECT
raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp
iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci
i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si
ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod
ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
[241912.298666] CPU: 1 PID: 26664 Comm: umount Tainted: G        W
4.7.0-rc6-29043-g8b8b08c #1
[241912.363401] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c
02/18/2015
[241912.429395]  0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf
0000000000000000
[241912.497080]  0000000000000000 ffff8808d104bcf8 ffffffffbd085615
ffff8808d104bd08
[241912.565113]  000016465a3410a8 00000047a0000000 0000000000000000
ffff8808469e2088
[241912.634105] Call Trace:
[241912.702992]  [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
[241912.773473]  [<ffffffffbd085615>] __warn+0xe5/0x100
[241912.844339]  [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
[241912.916083]  [<ffffffffc058cb2a>]
btrfs_free_block_groups+0x40a/0x410 [btrfs]
[241912.989103]  [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
[241913.062672]  [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
[241913.136364]  [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
[241913.208701]  [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
[241913.279194]  [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
[241913.348065]  [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
[241913.415082]  [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
[241913.479841]  [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
[241913.543353]  [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
[241913.605959]  [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
[241913.667542]  [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
[241913.729612]  [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
[241913.791203]  [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
[241913.852485] ---[ end trace fae017546778f2b1 ]---
[241913.913638] ------------[ cut here ]------------
[241913.974871] WARNING: CPU: 1 PID: 26664 at
fs/btrfs/extent-tree.c:10013 btrfs_free_block_groups+0x2ba/0x410 [btrfs]
[241914.039315] Modules linked in: netconsole mpt3sas ipt_REJECT
raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp
iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci
i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si
ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod
ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
[241914.315918] CPU: 1 PID: 26664 Comm: umount Tainted: G        W
4.7.0-rc6-29043-g8b8b08c #1
[241914.388096] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c
02/18/2015
[241914.460679]  0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf
0000000000000000
[241914.534126]  0000000000000000 ffff8808d104bcf8 ffffffffbd085615
ffff8808d104bce8
[241914.607523]  0000271dbd3dac8c ffff88085184aac8 0000000000000038
0000000000000000
[241914.681318] Call Trace:
[241914.754437]  [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
[241914.828796]  [<ffffffffbd085615>] __warn+0xe5/0x100
[241914.902953]  [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
[241914.977271]  [<ffffffffc058c9da>]
btrfs_free_block_groups+0x2ba/0x410 [btrfs]
[241915.052041]  [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
[241915.126282]  [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
[241915.200758]  [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
[241915.273872]  [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
[241915.345132]  [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
[241915.414703]  [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
[241915.482488]  [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
[241915.547994]  [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
[241915.611962]  [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
[241915.674717]  [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
[241915.736398]  [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
[241915.798592]  [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
[241915.860295]  [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
[241915.921642] ---[ end trace fae017546778f2b2 ]---
[241915.982893] BTRFS: space_info 4 has 114577997824 free, is not full
[241916.045103] BTRFS: space_info total=307627032576, used=193048903680,
pinned=0, reserved=0, may_use=688537059328, readonly=131072
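The two space_info lines can be cross-checked by hand. Subtracting used, pinned, reserved and readonly from total reproduces the printed free value exactly (307627032576 - 193048903680 - 0 - 0 - 131072 = 114577997824), so may_use is evidently not part of that figure. Yet may_use (641.25 GiB) is more than twice the total of this space_info (286.5 GiB). This is presumably the metadata space_info, if 4 is the block group type flag. No real reservation can be that large, which points at leaked or wrapped accounting. The struct and function names below are hypothetical. The free formula is inferred from the numbers above, not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counters as printed by the "space_info total=..., used=..." line. */
struct space_counters {
    uint64_t total, used, pinned, reserved, may_use, readonly;
};

/* Inferred formula behind "has N free": total minus the bytes that are
 * used, pinned, reserved or read-only. may_use is NOT subtracted.
 * (Unsigned arithmetic: this wraps if the parts exceed total.) */
static uint64_t free_bytes(const struct space_counters *s)
{
    return s->total - s->used - s->pinned - s->reserved - s->readonly;
}

/* A may_use larger than the whole space_info cannot correspond to real
 * outstanding reservations. */
static bool may_use_exceeds_total(const struct space_counters *s)
{
    return s->may_use > s->total;
}
```

Plugging in the values from the log gives back the printed 114577997824 free bytes, and may_use_exceeds_total() returns true.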

Greets,
Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-11  6:09             ` Stefan Priebe - Profihost AG
@ 2016-08-14 15:22               ` Stefan Priebe - Profihost AG
  2016-08-29 14:02                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-14 15:22 UTC (permalink / raw)
  To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hi Josef,

is there anything I could do or test? Results with a vanilla next
branch are the same.

Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: memory overflow or undeflow in free space tree / space_info?
  2016-08-14 15:22               ` Stefan Priebe - Profihost AG
@ 2016-08-29 14:02                 ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-08-29 14:02 UTC (permalink / raw)
  To: Josef Bacik, Omar Sandoval; +Cc: linux-btrfs@vger.kernel.org

Hi Josef,

this still happens with the current 4.8-rc* releases. Is there anything
I can do to debug this? Maybe insert some code that checks for an
underflow or overflow?

Stefan
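The under/overflow check suggested above could look roughly like this. Route every update of a space counter (for example the bytes_may_use field of struct btrfs_space_info) through a helper that refuses to wrap and reports the offending values. This is a userspace sketch. counter_add() is a hypothetical name, not an existing btrfs function. In kernel code the fprintf() would naturally become a WARN_ON(), so the stack trace identifies the caller that released too much.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: apply a signed delta to an unsigned 64-bit
 * counter. If the result would drop below zero or exceed UINT64_MAX,
 * report it, clamp the counter instead of wrapping, and return -1.
 * Returns 0 when the update was applied normally. */
static int counter_add(uint64_t *counter, int64_t delta, const char *name)
{
    if (delta < 0) {
        /* |delta| computed without overflowing on INT64_MIN */
        uint64_t dec = (uint64_t)(-(delta + 1)) + 1;
        if (*counter < dec) {
            fprintf(stderr, "%s underflow: %llu - %llu\n", name,
                    (unsigned long long)*counter,
                    (unsigned long long)dec);
            *counter = 0;
            return -1;
        }
        *counter -= dec;
    } else {
        uint64_t inc = (uint64_t)delta;
        if (UINT64_MAX - *counter < inc) {
            fprintf(stderr, "%s overflow: %llu + %llu\n", name,
                    (unsigned long long)*counter,
                    (unsigned long long)inc);
            *counter = UINT64_MAX;
            return -1;
        }
        *counter += inc;
    }
    return 0;
}
```

Clamping keeps the counter usable after the first report. The stderr line (or WARN_ON() trace) is the useful part, because it shows the first caller that got the accounting wrong, long before the umount warnings fire.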

On 14.08.2016 at 17:22, Stefan Priebe - Profihost AG wrote:
> Hi Josef,
> 
> anything i could do or test? Results with a vanilla next branch are the
> same.
> 
> Stefan
> 
> On 11.08.2016 at 08:09, Stefan Priebe - Profihost AG wrote:
>> Hello,
>>
>> the backtrace and info on umount looks the same:
>>
>> [241910.341124] ------------[ cut here ]------------
>> [241910.379991] WARNING: CPU: 1 PID: 26664 at
>> fs/btrfs/extent-tree.c:5701 btrfs_free_block_groups+0x370/0x410 [btrfs]
>> [241910.422099] Modules linked in: netconsole mpt3sas ipt_REJECT
>> raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp
>> iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci
>> i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si
>> ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod
>> ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
>> [241910.616845] CPU: 1 PID: 26664 Comm: umount Not tainted
>> 4.7.0-rc6-29043-g8b8b08c #1
>> [241910.669646] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c
>> 02/18/2015
>> [241910.723716]  0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf
>> 0000000000000000
>> [241910.779309]  0000000000000000 ffff8808d104bcf8 ffffffffbd085615
>> ffff8808d104bd08
>> [241910.835143]  000016455a3410a8 00000047a0000000 0000000000000000
>> ffff8808469e2088
>> [241910.891882] Call Trace:
>> [241910.947624]  [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
>> [241911.003714]  [<ffffffffbd085615>] __warn+0xe5/0x100
>> [241911.060167]  [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
>> [241911.117422]  [<ffffffffc058ca90>]
>> btrfs_free_block_groups+0x370/0x410 [btrfs]
>> [241911.175975]  [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
>> [241911.235170]  [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
>> [241911.294638]  [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
>> [241911.353005]  [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
>> [241911.409832]  [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
>> [241911.466467]  [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
>> [241911.522602]  [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
>> [241911.577979]  [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
>> [241911.633188]  [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
>> [241911.688146]  [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
>> [241911.742740]  [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
>> [241911.797039]  [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
>> [241911.850750]  [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
>> [241911.903564] ---[ end trace fae017546778f2b0 ]---
>> [241911.955332] ------------[ cut here ]------------
>> [241912.006262] WARNING: CPU: 1 PID: 26664 at
>> fs/btrfs/extent-tree.c:5702 btrfs_free_block_groups+0x40a/0x410 [btrfs]
>> [241912.059326] Modules linked in: netconsole mpt3sas ipt_REJECT
>> raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp
>> iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci
>> i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si
>> ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod
>> ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
>> [241912.298666] CPU: 1 PID: 26664 Comm: umount Tainted: G        W
>> 4.7.0-rc6-29043-g8b8b08c #1
>> [241912.363401] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c
>> 02/18/2015
>> [241912.429395]  0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf
>> 0000000000000000
>> [241912.497080]  0000000000000000 ffff8808d104bcf8 ffffffffbd085615
>> ffff8808d104bd08
>> [241912.565113]  000016465a3410a8 00000047a0000000 0000000000000000
>> ffff8808469e2088
>> [241912.634105] Call Trace:
>> [241912.702992]  [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
>> [241912.773473]  [<ffffffffbd085615>] __warn+0xe5/0x100
>> [241912.844339]  [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
>> [241912.916083]  [<ffffffffc058cb2a>]
>> btrfs_free_block_groups+0x40a/0x410 [btrfs]
>> [241912.989103]  [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
>> [241913.062672]  [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
>> [241913.136364]  [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
>> [241913.208701]  [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
>> [241913.279194]  [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
>> [241913.348065]  [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
>> [241913.415082]  [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
>> [241913.479841]  [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
>> [241913.543353]  [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
>> [241913.605959]  [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
>> [241913.667542]  [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
>> [241913.729612]  [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
>> [241913.791203]  [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
>> [241913.852485] ---[ end trace fae017546778f2b1 ]---
>> [241913.913638] ------------[ cut here ]------------
>> [241913.974871] WARNING: CPU: 1 PID: 26664 at
>> fs/btrfs/extent-tree.c:10013 btrfs_free_block_groups+0x2ba/0x410 [btrfs]
>> [241914.039315] Modules linked in: netconsole mpt3sas ipt_REJECT
>> raid_class nf_reject_ipv4 scsi_transport_sas xt_multiport 8021q garp
>> iptable_filter ip_tables x_tables bonding coretemp loop usbhid ehci_pci
>> i2c_i801 ehci_hcd usbcore i2c_core shpchp usb_common ipmi_si
>> ipmi_msghandler button btrfs dm_mod raid1 raid456 async_raid6_recov
>> async_memcpy async_pq async_xor async_tx xor raid6_pq md_mod sg sd_mod
>> ixgbe i40e mdio ptp ahci libahci pps_core megaraid_sas
>> [241914.315918] CPU: 1 PID: 26664 Comm: umount Tainted: G        W 4.7.0-rc6-29043-g8b8b08c #1
>> [241914.388096] Hardware name: Supermicro X10DRH/X10DRH-iT, BIOS 1.0c 02/18/2015
>> [241914.460679]  0000000000000000 ffff8808d104bca8 ffffffffbd3d83cf 0000000000000000
>> [241914.534126]  0000000000000000 ffff8808d104bcf8 ffffffffbd085615 ffff8808d104bce8
>> [241914.607523]  0000271dbd3dac8c ffff88085184aac8 0000000000000038 0000000000000000
>> [241914.681318] Call Trace:
>> [241914.754437]  [<ffffffffbd3d83cf>] dump_stack+0x63/0x84
>> [241914.828796]  [<ffffffffbd085615>] __warn+0xe5/0x100
>> [241914.902953]  [<ffffffffbd08564d>] warn_slowpath_null+0x1d/0x20
>> [241914.977271]  [<ffffffffc058c9da>] btrfs_free_block_groups+0x2ba/0x410 [btrfs]
>> [241915.052041]  [<ffffffffc059e7ab>] close_ctree+0x15b/0x330 [btrfs]
>> [241915.126282]  [<ffffffffc056f089>] btrfs_put_super+0x19/0x20 [btrfs]
>> [241915.200758]  [<ffffffffbd1deaff>] generic_shutdown_super+0x6f/0x100
>> [241915.273872]  [<ffffffffbd1df026>] kill_anon_super+0x16/0x30
>> [241915.345132]  [<ffffffffc05720fa>] btrfs_kill_super+0x1a/0xb0 [btrfs]
>> [241915.414703]  [<ffffffffbd1df1f1>] deactivate_locked_super+0x51/0x90
>> [241915.482488]  [<ffffffffbd1dfb8e>] deactivate_super+0x4e/0x70
>> [241915.547994]  [<ffffffffbd1fba73>] cleanup_mnt+0x43/0x90
>> [241915.611962]  [<ffffffffbd1fbb12>] __cleanup_mnt+0x12/0x20
>> [241915.674717]  [<ffffffffbd0a1f61>] task_work_run+0x81/0xb0
>> [241915.736398]  [<ffffffffbd07ffcd>] exit_to_usermode_loop+0x66/0x95
>> [241915.798592]  [<ffffffffbd002a7d>] do_syscall_64+0x10d/0x150
>> [241915.860295]  [<ffffffffbd6d9ca1>] entry_SYSCALL64_slow_path+0x25/0x25
>> [241915.921642] ---[ end trace fae017546778f2b2 ]---
>> [241915.982893] BTRFS: space_info 4 has 114577997824 free, is not full
>> [241916.045103] BTRFS: space_info total=307627032576, used=193048903680, pinned=0, reserved=0, may_use=688537059328, readonly=131072
>>
>> Greets,
>> Stefan
>>
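The two `space_info` lines at the end of the unmount trace can be checked by hand. The following is a minimal Python sketch, assuming that the 4.7-era `dump_space_info()` prints "free" as total minus used, pinned, reserved and readonly bytes (with `may_use` not subtracted), and that flag value 4 is `BTRFS_BLOCK_GROUP_METADATA`. The dictionary and helper names are hypothetical. With the logged numbers, the arithmetic reproduces the reported 114577997824 free bytes and shows that `may_use` is still more than twice the total size of the metadata space_info at unmount. That appears to be what the unmount-time `WARN_ON` in `btrfs_free_block_groups()` reacts to.

```python
# Numbers copied from the two "BTRFS: space_info" lines logged at unmount.
# Assumption: flag bit values as in the kernel's BTRFS_BLOCK_GROUP_* defines.
BLOCK_GROUP_TYPES = {0x1: "DATA", 0x2: "SYSTEM", 0x4: "METADATA"}

space_info = {
    "flags": 4,
    "total": 307627032576,
    "used": 193048903680,
    "pinned": 0,
    "reserved": 0,
    "may_use": 688537059328,
    "readonly": 131072,
}


def free_bytes(si):
    """'free' as printed by dump_space_info() (assumed 4.7 formula):
    total minus used, pinned, reserved and readonly; may_use is NOT subtracted."""
    return si["total"] - si["used"] - si["pinned"] - si["reserved"] - si["readonly"]


def leftover_at_unmount(si):
    """Counters expected to be zero once the filesystem is being unmounted."""
    return {k: si[k] for k in ("pinned", "reserved", "may_use") if si[k] > 0}


if __name__ == "__main__":
    print("type:", BLOCK_GROUP_TYPES.get(space_info["flags"], "unknown"))
    print("free:", free_bytes(space_info))
    print("may_use / total: %.2f" % (space_info["may_use"] / space_info["total"]))
    print("non-zero at unmount:", leftover_at_unmount(space_info))
```

A non-zero `may_use` at unmount suggests a reservation that was never released, which fits Omar's suggestion of a space accounting bug rather than the disk genuinely running out of space.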
>> On 10.08.2016 at 23:31, Stefan Priebe - Profihost AG wrote:
>>> Hi Josef,
>>>
>>> Same again with Chris's next branch:
>>>
>>> ERROR: error during balancing '/vmbackup/': No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>> Dumping filters: flags 0x7, state 0x0, force is off
>>>   DATA (flags 0x2): balancing, usage=5
>>>   METADATA (flags 0x2): balancing, usage=5
>>>   SYSTEM (flags 0x2): balancing, usage=5
>>>
>>> dmesg:
>>> [203784.411189] BTRFS info (device dm-0): 114 enospc errors during balance
>>>
>>> uname -r 4.7.0-rc6-29043-g8b8b08c
>>>
>>> Greets,
>>> Stefan
>>>
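The "Dumping filters" lines of the failed balance are bit flags. The sketch below decodes them, assuming the `BTRFS_BALANCE_*` and `BTRFS_BALANCE_ARGS_*` bit values from the kernel's `include/uapi/linux/btrfs.h`; the tables and the `decode()` helper are illustrative and not part of btrfs-progs.

```python
# Assumption: bit values as defined in include/uapi/linux/btrfs.h.
BALANCE_TYPES = {0x1: "DATA", 0x2: "SYSTEM", 0x4: "METADATA"}   # BTRFS_BALANCE_*
BALANCE_ARGS = {0x1: "PROFILES", 0x2: "USAGE", 0x4: "DEVID",     # BTRFS_BALANCE_ARGS_*
                0x8: "DRANGE", 0x10: "VRANGE"}


def decode(flags, table):
    """Return the names of all bits from `table` that are set in `flags`."""
    return [name for bit, name in sorted(table.items()) if flags & bit]


print(decode(0x7, BALANCE_TYPES))  # ['DATA', 'SYSTEM', 'METADATA']
print(decode(0x2, BALANCE_ARGS))   # ['USAGE']
```

So `flags 0x7` selects data, system and metadata chunks, and each chunk type uses only the usage filter. As we understand that filter, `usage=5` restricts the balance to chunks that are less than about 5% full.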
>>> On 08.08.2016 at 08:17, Stefan Priebe - Profihost AG wrote:
>>>> On 04.08.2016 at 13:40, Stefan Priebe - Profihost AG wrote:
>>>>> On 29.07.2016 at 23:03, Josef Bacik wrote:
>>>>>> On 07/29/2016 03:14 PM, Omar Sandoval wrote:
>>>>>>> On Fri, Jul 29, 2016 at 12:11:53PM -0700, Omar Sandoval wrote:
>>>>>>>> On Fri, Jul 29, 2016 at 08:40:26PM +0200, Stefan Priebe - Profihost
>>>>>>>> AG wrote:
>>>>>>>>> Dear list,
>>>>>>>>>
>>>>>>>>> I'm frequently seeing btrfs "no space" messages on big filesystems
>>>>>>>>> (> 30TB).
>>>>>>>>>
>>>>>>>>> In all cases I'm getting a trace like this one, plus a space_info
>>>>>>>>> warning (since commit [1]). Could someone please be so kind as to help
>>>>>>>>> me debug / fix this bug? I'm using space_cache=v2 on all those
>>>>>>>>> systems.
>>>>>>>>
>>>>>>>> Hm, so I think this indicates a bug in space accounting somewhere else
>>>>>>>> rather than the free space tree itself. I haven't debugged one of these
>>>>>>>> issues before, I'll see if I can reproduce it. Cc'ing Josef, too.
>>>>>>>
>>>>>>> I should've asked, what sort of filesystem activity triggers this?
>>>>>>>
>>>>>>
>>>>>> Chris just fixed this, I think; try his next branch from his git tree
>>>>>>
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
>>>>>
>>>>> Thanks, I'm now running 4.4 with those patches backported. If that
>>>>> still shows an error, I will try that vanilla tree.
>>>>
>>>> OK, this didn't work. I'll try the linux-btrfs next branch and see
>>>> if that helps.
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Stefan
>>>>>
>>>>>> and see if it still happens.  Thanks,
>>>>>>
>>>>>> Josef


end of thread

Thread overview: 14+ messages
2016-07-29 18:40 memory overflow or undeflow in free space tree / space_info? Stefan Priebe - Profihost AG
2016-07-29 19:11 ` Omar Sandoval
2016-07-29 19:14   ` Omar Sandoval
2016-07-29 19:40     ` Stefan Priebe - Profihost AG
2016-07-29 21:03     ` Josef Bacik
2016-07-29 22:57       ` Holger Hoffstätte
2016-07-29 23:09         ` Holger Hoffstätte
2016-08-04 11:40       ` Stefan Priebe - Profihost AG
2016-08-08  6:17         ` Stefan Priebe - Profihost AG
2016-08-10 21:31           ` Stefan Priebe - Profihost AG
2016-08-11  6:09             ` Stefan Priebe - Profihost AG
2016-08-14 15:22               ` Stefan Priebe - Profihost AG
2016-08-29 14:02                 ` Stefan Priebe - Profihost AG
2016-07-29 19:39   ` Stefan Priebe - Profihost AG
