linux-btrfs.vger.kernel.org archive mirror
* 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
@ 2014-10-02  7:27 Tomasz Chmielewski
  2014-10-03 18:17 ` Josef Bacik
  2014-10-13 15:15 ` 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931! Rich Freeman
  0 siblings, 2 replies; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-10-02  7:27 UTC (permalink / raw)
  To: linux-btrfs

Got this when running balance with 3.17.0-rc7:

(...)
[173394.571080] BTRFS info (device sdd1): relocating block group 
4391666974720 flags 17
[173405.407779] BTRFS info (device sdd1): found 52296 extents
[173441.235837] BTRFS info (device sdd1): found 52296 extents
[173442.266918] BTRFS info (device sdd1): relocating block group 
4390593232896 flags 17
[173451.515002] BTRFS info (device sdd1): found 22314 extents
[173473.761612] BTRFS info (device sdd1): found 22314 extents
[173474.498414] BTRFS info (device sdd1): relocating block group 
4389519491072 flags 20
[173475.410657] ------------[ cut here ]------------
[173475.410717] kernel BUG at fs/btrfs/relocation.c:931!
[173475.410774] invalid opcode: 0000 [#1] SMP
[173475.410829] Modules linked in: ipt_MASQUERADE iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq 
zlib_deflate coretemp hwmon loop i2c_i801 i2c_core pcspkr battery 
tpm_infineon tpm_tis tpm parport_pc parport video lpc_ich mfd_core 
ehci_pci ehci_hcd button acpi_cpufreq ext4 crc16 jbd2 mbcache raid1 sg 
sd_mod ahci libahci libata scsi_mod r8169 mii
[173475.411284] CPU: 1 PID: 5512 Comm: btrfs Not tainted 3.17.0-rc7 #1
[173475.411341] Hardware name: System manufacturer System Product 
Name/P8H77-M PRO, BIOS 1101 02/04/2013
[173475.411450] task: ffff8807f1744830 ti: ffff88076e9b0000 task.ti: 
ffff88076e9b0000
[173475.411555] RIP: 0010:[<ffffffffa02ef1ae>]  [<ffffffffa02ef1ae>] 
build_backref_tree+0x64a/0xe77 [btrfs]
[173475.411684] RSP: 0018:ffff88076e9b3888  EFLAGS: 00010287
[173475.411740] RAX: ffff8805abb30480 RBX: ffff880589dfcf00 RCX: 
0000000000000003
[173475.411845] RDX: 00000510a31b8000 RSI: ffff880589dfcac0 RDI: 
ffff8804c69f8800
[173475.411949] RBP: ffff88076e9b3988 R08: 00000000000143e0 R09: 
0000000000000000
[173475.412053] R10: ffff8807c97366f0 R11: 0000000000000000 R12: 
ffff8804c69f8800
[173475.412157] R13: ffff880589dfca80 R14: 0000000000000000 R15: 
ffff88065a3b0000
[173475.412262] FS:  00007f320e446840(0000) GS:ffff88081fa40000(0000) 
knlGS:0000000000000000
[173475.413687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[173475.413744] CR2: ffffffffff600400 CR3: 00000007ea08f000 CR4: 
00000000001407e0
[173475.413849] Stack:
[173475.413899]  ffff8807c820a000 0000000000000000 ffff880589dfcf00 
ffff8801a22bf7c0
[173475.414007]  ffff8801a22bfb60 ffff880589dfcac0 ffff88065a3b0124 
0000000000000001
[173475.414114]  ffff88065a3b0120 0000000000000003 ffff8805abb30480 
ffff8805dbbb7240
[173475.414223] Call Trace:
[173475.414291]  [<ffffffffa02f0480>] relocate_tree_blocks+0x1b7/0x532 
[btrfs]
[173475.414364]  [<ffffffffa02cac81>] ? free_extent_buffer+0x6f/0x7c 
[btrfs]
[173475.414434]  [<ffffffffa02ebcf3>] ? tree_insert+0x49/0x50 [btrfs]
[173475.414501]  [<ffffffffa02eea1b>] ? add_tree_block+0x13a/0x162 
[btrfs]
[173475.414570]  [<ffffffffa02f1861>] relocate_block_group+0x275/0x4de 
[btrfs]
[173475.414640]  [<ffffffffa02f1c22>] 
btrfs_relocate_block_group+0x158/0x278 [btrfs]
[173475.414763]  [<ffffffffa02ce79c>] 
btrfs_relocate_chunk.isra.62+0x58/0x5f7 [btrfs]
[173475.414884]  [<ffffffffa02dd99f>] ? 
btrfs_set_lock_blocking_rw+0x68/0x95 [btrfs]
[173475.414995]  [<ffffffffa028eb04>] ? 
btrfs_set_path_blocking+0x23/0x54 [btrfs]
[173475.415107]  [<ffffffffa0293517>] ? btrfs_search_slot+0x7bc/0x816 
[btrfs]
[173475.415177]  [<ffffffffa02cac81>] ? free_extent_buffer+0x6f/0x7c 
[btrfs]
[173475.415248]  [<ffffffffa02d1679>] btrfs_balance+0xa7b/0xc80 [btrfs]
[173475.415318]  [<ffffffffa02d7177>] btrfs_ioctl_balance+0x220/0x29f 
[btrfs]
[173475.415388]  [<ffffffffa02dc1e4>] btrfs_ioctl+0x10bd/0x2281 [btrfs]
[173475.415448]  [<ffffffff810d5152>] ? handle_mm_fault+0x44d/0xa00
[173475.415507]  [<ffffffff81173e76>] ? avc_has_perm+0x2e/0xf7
[173475.415566]  [<ffffffff810d7c6d>] ? __vm_enough_memory+0x25/0x13c
[173475.415625]  [<ffffffff8110d72d>] do_vfs_ioctl+0x3f2/0x43c
[173475.415682]  [<ffffffff8110d7c5>] SyS_ioctl+0x4e/0x7d
[173475.415740]  [<ffffffff81030a71>] ? do_page_fault+0xc/0xf
[173475.415798]  [<ffffffff813b0652>] system_call_fastpath+0x16/0x1b
[173475.415856] Code: ff ff 01 e9 50 02 00 00 48 63 8d 48 ff ff ff 48 8b 
85 50 ff ff ff 48 83 3c c8 00 75 46 48 8b 53 18 49 39 94 24 d8 00 00 00 
74 02 <0f> 0b 4c 89 e7 e8 9b e8 ff ff 85 c0 74 21 48 8b 55 98 48 8d 43
[173475.416080] RIP  [<ffffffffa02ef1ae>] build_backref_tree+0x64a/0xe77 
[btrfs]
[173475.416151]  RSP <ffff88076e9b3888>
[173475.416482] ---[ end trace 17e512e0d6dc91d7 ]---
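
As an aside, the "flags" value in the relocation messages above is the block-group flags bitmask, so flags 17 is DATA|RAID1 and flags 20 is METADATA|RAID1. A minimal decoder, using the BTRFS_BLOCK_GROUP_* bit values from the kernel headers of this era (treat the table as an assumption for other kernel versions):

```python
# Block-group flag bits as defined in the kernel's btrfs headers
# (BTRFS_BLOCK_GROUP_* in fs/btrfs/ctree.h around v3.17).
BLOCK_GROUP_FLAGS = {
    0x01: "DATA",
    0x02: "SYSTEM",
    0x04: "METADATA",
    0x08: "RAID0",
    0x10: "RAID1",
    0x20: "DUP",
    0x40: "RAID10",
}

def decode_flags(flags: int) -> str:
    """Return the |-joined names of the set block-group flag bits."""
    names = [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items())
             if flags & bit]
    return "|".join(names) or "NONE"

print(decode_flags(17))  # DATA|RAID1   (the data block groups above)
print(decode_flags(20))  # METADATA|RAID1 (the group being relocated at the crash)
```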



-- 
Tomasz Chmielewski
http://www.sslrack.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-02  7:27 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931! Tomasz Chmielewski
@ 2014-10-03 18:17 ` Josef Bacik
  2014-10-03 22:06   ` Tomasz Chmielewski
  2014-10-13 15:15 ` 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931! Rich Freeman
  1 sibling, 1 reply; 25+ messages in thread
From: Josef Bacik @ 2014-10-03 18:17 UTC (permalink / raw)
  To: Tomasz Chmielewski, linux-btrfs

On 10/02/2014 03:27 AM, Tomasz Chmielewski wrote:
> Got this when running balance with 3.17.0-rc7:
>

Give these two patches a try

https://patchwork.kernel.org/patch/4938281/
https://patchwork.kernel.org/patch/4939761/
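
For anyone wanting to try the same fix: patchwork normally serves each patch in mbox form under an /mbox/ suffix (an assumption about this patchwork instance's URL scheme), which can then be applied on top of a 3.17-rc7 tree with git am. A rough sketch:

```shell
# Hypothetical helper: build the mbox URL for a patchwork patch page.
# The trailing /mbox/ suffix is an assumption about this patchwork
# instance's URL layout.
mbox_url() {
  printf '%smbox/' "$1"
}

for p in https://patchwork.kernel.org/patch/4938281/ \
         https://patchwork.kernel.org/patch/4939761/; do
  echo "$(mbox_url "$p")"
  # In a kernel git checkout one would then apply it with e.g.:
  #   curl -fsSL "$(mbox_url "$p")" | git am
done
```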

Thanks,

Josef

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-03 18:17 ` Josef Bacik
@ 2014-10-03 22:06   ` Tomasz Chmielewski
  2014-10-03 22:09     ` Josef Bacik
  2014-11-25 22:33     ` Tomasz Chmielewski
  0 siblings, 2 replies; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-10-03 22:06 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 2014-10-03 20:17 (Fri), Josef Bacik wrote:
> On 10/02/2014 03:27 AM, Tomasz Chmielewski wrote:
>> Got this when running balance with 3.17.0-rc7:
>> 
> 
> Give these two patches a try
> 
> https://patchwork.kernel.org/patch/4938281/
> https://patchwork.kernel.org/patch/4939761/

With these two patches applied on top of 3.17-rc7, it BUGs somewhere 
else now:

[ 2030.858792] BTRFS info (device sdd1): relocating block group 
6469424513024 flags 17
[ 2039.674077] BTRFS info (device sdd1): found 20937 extents
[ 2066.726661] BTRFS info (device sdd1): found 20937 extents
[ 2068.048208] BTRFS info (device sdd1): relocating block group 
6468350771200 flags 17
[ 2080.796412] BTRFS info (device sdd1): found 46927 extents
[ 2092.703850] parent transid verify failed on 5568935395328 wanted 
70315 found 71183
[ 2092.714622] parent transid verify failed on 5568935395328 wanted 
70315 found 71183
[ 2092.725269] parent transid verify failed on 5568935395328 wanted 
70315 found 71183
[ 2092.725680] ------------[ cut here ]------------
[ 2092.725740] kernel BUG at fs/btrfs/relocation.c:242!
[ 2092.725800] invalid opcode: 0000 [#1] SMP
[ 2092.725860] Modules linked in: ipt_MASQUERADE iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
ip_tables x_tables cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq 
zlib_deflate coretemp hwmon loop i2c_i801 parport_pc pcspkr i2c_core 
parport video battery tpm_infineon tpm_tis tpm lpc_ich mfd_core ehci_pci 
ehci_hcd acpi_cpufreq button ext4 crc16 jbd2 mbcache raid1 sg sd_mod 
ahci libahci libata scsi_mod r8169 mii
[ 2092.727740] CPU: 3 PID: 3937 Comm: btrfs Not tainted 3.17.0-rc7 #3
[ 2092.727801] Hardware name: System manufacturer System Product 
Name/P8H77-M PRO, BIOS 1101 02/04/2013
[ 2092.727917] task: ffff8800c7883020 ti: ffff8800c7d04000 task.ti: 
ffff8800c7d04000
[ 2092.728029] RIP: 0010:[<ffffffffa0322a4a>]  [<ffffffffa0322a4a>] 
relocate_block_group+0x432/0x4de [btrfs]
[ 2092.728169] RSP: 0018:ffff8800c7d07a58  EFLAGS: 00010206
[ 2092.728229] RAX: ffff8806c69a18f8 RBX: ffff8806c69a1800 RCX: 
0000000180200000
[ 2092.728292] RDX: ffff8806c69a18d8 RSI: ffff8806c69a18e8 RDI: 
ffff8807ff403900
[ 2092.728356] RBP: ffff8800c7d07ac8 R08: 0000000000000001 R09: 
0000000000000000
[ 2092.728419] R10: 0000000000000003 R11: ffffffffa031eb54 R12: 
ffff8805d515c240
[ 2092.728482] R13: ffff8806c69a1908 R14: 00000000fffffff4 R15: 
ffff8806c69a1820
[ 2092.728546] FS:  00007f4f251d0840(0000) GS:ffff88081fac0000(0000) 
knlGS:0000000000000000
[ 2092.728660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2092.728721] CR2: ffffffffff600400 CR3: 00000000c7fb0000 CR4: 
00000000001407e0
[ 2092.728783] Stack:
[ 2092.728837]  ffffea0002edf300 ffff8806c69a18e8 ffffea0002edf000 
0000000000000000
[ 2092.728952]  ffffea0002edf080 00ffea0002edf0c0 a8000005e22b2a30 
0000000000001000
[ 2092.729067]  ffff8807d969f870 ffff8806c69a1800 0000000000000000 
ffff8807f3f285b0
[ 2092.729183] Call Trace:
[ 2092.729256]  [<ffffffffa0322c4e>] 
btrfs_relocate_block_group+0x158/0x278 [btrfs]
[ 2092.729385]  [<ffffffffa02ff79c>] 
btrfs_relocate_chunk.isra.62+0x58/0x5f7 [btrfs]
[ 2092.729512]  [<ffffffffa030e99f>] ? 
btrfs_set_lock_blocking_rw+0x68/0x95 [btrfs]
[ 2092.729632]  [<ffffffffa02bfb04>] ? btrfs_set_path_blocking+0x23/0x54 
[btrfs]
[ 2092.729704]  [<ffffffffa02c4517>] ? btrfs_search_slot+0x7bc/0x816 
[btrfs]
[ 2092.729782]  [<ffffffffa02fbc81>] ? free_extent_buffer+0x6f/0x7c 
[btrfs]
[ 2092.729859]  [<ffffffffa0302679>] btrfs_balance+0xa7b/0xc80 [btrfs]
[ 2092.729935]  [<ffffffffa0308177>] btrfs_ioctl_balance+0x220/0x29f 
[btrfs]
[ 2092.730012]  [<ffffffffa030d1e4>] btrfs_ioctl+0x10bd/0x2281 [btrfs]
[ 2092.730076]  [<ffffffff810d5152>] ? handle_mm_fault+0x44d/0xa00
[ 2092.730140]  [<ffffffff81173e76>] ? avc_has_perm+0x2e/0xf7
[ 2092.730202]  [<ffffffff810d7c6d>] ? __vm_enough_memory+0x25/0x13c
[ 2092.730266]  [<ffffffff8110d72d>] do_vfs_ioctl+0x3f2/0x43c
[ 2092.730328]  [<ffffffff8110d7c5>] SyS_ioctl+0x4e/0x7d
[ 2092.730389]  [<ffffffff81030a71>] ? do_page_fault+0xc/0xf
[ 2092.730452]  [<ffffffff813b0652>] system_call_fastpath+0x16/0x1b
[ 2092.730512] Code: 00 00 00 48 39 83 f8 00 00 00 74 02 0f 0b 4c 39 ab 
08 01 00 00 74 02 0f 0b 48 83 7b 20 00 74 02 0f 0b 83 bb 20 01 00 00 00 
74 02 <0f> 0b 83 bb 24 01 00 00 00 74 02 0f 0b 48 8b 73 18 48 8b 7b 08
[ 2092.730759] RIP  [<ffffffffa0322a4a>] 
relocate_block_group+0x432/0x4de [btrfs]
[ 2092.730885]  RSP <ffff8800c7d07a58>
[ 2092.731233] ---[ end trace 16c7709ebf2c379c ]---
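
The "parent transid verify failed" lines above mean the parent node's pointer recorded generation 70315 for that block, while the block on disk carries generation 71183, i.e. the pointer is stale. A minimal sketch of the comparison (simplified; the real verification lives in the kernel's btree read path):

```python
def parent_transid_ok(wanted: int, found: int) -> bool:
    """A tree block is accepted only if its on-disk generation matches
    the transid its parent recorded for it; any mismatch fails the
    verify (simplified model of the kernel check)."""
    return wanted == found

# Values from the dmesg lines above: the parent recorded generation
# 70315, but the block on disk was last written in generation 71183.
assert not parent_transid_ok(wanted=70315, found=71183)
```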


-- 
Tomasz Chmielewski
http://www.sslrack.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-03 22:06   ` Tomasz Chmielewski
@ 2014-10-03 22:09     ` Josef Bacik
  2014-10-04 21:47       ` Tomasz Chmielewski
  2014-11-25 22:33     ` Tomasz Chmielewski
  1 sibling, 1 reply; 25+ messages in thread
From: Josef Bacik @ 2014-10-03 22:09 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

Can you make a btrfs-image of this fs and send it to me?  Thanks,

Josef

Tomasz Chmielewski <tch@virtall.com> wrote:


(...)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-03 22:09     ` Josef Bacik
@ 2014-10-04 21:47       ` Tomasz Chmielewski
  2014-10-04 22:07         ` Josef Bacik
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-10-04 21:47 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

Hi,

is btrfs-image with a single -s (sanitize file names) flag OK? I.e.

btrfs-image -s -c 9 -t 32 /dev/sdc1 /root/btrfs-2.img

?

Tomasz Chmielewski


On 2014-10-04 00:09 (Sat), Josef Bacik wrote:
> Can you make a btrfs-image of this fs and send it to me?  Thanks,
> 
> Josef
> 
> (...)

Tomasz Chmielewski
http://www.sslrack.com


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-04 21:47       ` Tomasz Chmielewski
@ 2014-10-04 22:07         ` Josef Bacik
  0 siblings, 0 replies; 25+ messages in thread
From: Josef Bacik @ 2014-10-04 22:07 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

Yup, I don't need filenames for a balance.  Thanks,

Josef

Tomasz Chmielewski <tch@virtall.com> wrote:


Hi,

is btrfs-image with single -s flag OK? I.e.

btrfs-image -s -c 9 -t 32 /dev/sdc1 /root/btrfs-2.img

?

Tomasz Chmielewski


(...)


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-02  7:27 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931! Tomasz Chmielewski
  2014-10-03 18:17 ` Josef Bacik
@ 2014-10-13 15:15 ` Rich Freeman
  1 sibling, 0 replies; 25+ messages in thread
From: Rich Freeman @ 2014-10-13 15:15 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On Thu, Oct 2, 2014 at 3:27 AM, Tomasz Chmielewski <tch@virtall.com> wrote:
> Got this when running balance with 3.17.0-rc7:
>
> [173475.410717] kernel BUG at fs/btrfs/relocation.c:931!

I just posted on another thread with this exact same issue on
3.17.0. It hit when I started a balance after adding a new drive.

[453046.291762] BTRFS info (device sde2): relocating block group
10367073779712 flags 17
[453062.494151] BTRFS info (device sde2): found 13 extents
[453069.283368] ------------[ cut here ]------------
[453069.283468] kernel BUG at
/data/src/linux-3.17.0-gentoo/fs/btrfs/relocation.c:931!
[453069.283590] invalid opcode: 0000 [#1] SMP
[453069.283666] Modules linked in: vhost_net vhost macvtap macvlan tun
ipt_MASQUERADE xt_conntrack veth nfsd auth_rpcgss oid_registry lockd
iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables it87
hwmon_vid hid_logitech_dj nxt200x cx88_dvb videobuf_dvb dvb_core
cx88_vp3054_i2c tuner_simple tuner_types tuner mousedev hid_generic
usbhid cx88_alsa radeon cx8800 cx8802 cx88xx snd_hda_codec_realtek
btcx_risc snd_hda_codec_generic videobuf_dma_sg videobuf_core kvm_amd
tveeprom kvm rc_core v4l2_common cfbfillrect fbcon videodev cfbimgblt
snd_hda_intel bitblit snd_hda_controller cfbcopyarea softcursor font
tileblit i2c_algo_bit k10temp snd_hda_codec backlight drm_kms_helper
snd_hwdep i2c_piix4 ttm snd_pcm snd_timer drm snd soundcore 8250 evdev
[453069.285043]  serial_core ext4 crc16 jbd2 mbcache zram lz4_compress
zsmalloc ata_generic pata_acpi btrfs xor zlib_deflate atkbd raid6_pq
ohci_pci firewire_ohci firewire_core crc_itu_t pata_atiixp ehci_pci
ohci_hcd ehci_hcd usbcore usb_common r8169 mii sunrpc dm_mirror
dm_region_hash dm_log dm_mod
[453069.285552] CPU: 1 PID: 17270 Comm: btrfs Not tainted 3.17.0-gentoo #1
[453069.285657] Hardware name: Gigabyte Technology Co., Ltd.
GA-880GM-UD2H/GA-880GM-UD2H, BIOS F8 10/11/2010
[453069.285806] task: ffff88040ec556e0 ti: ffff88010cf94000 task.ti:
ffff88010cf94000
[453069.285925] RIP: 0010:[<ffffffffa02ddd62>]  [<ffffffffa02ddd62>]
build_backref_tree+0x1152/0x11b0 [btrfs]
[453069.286137] RSP: 0018:ffff88010cf97848  EFLAGS: 00010206
[453069.286223] RAX: ffff8800ae67c800 RBX: ffff880122e94000 RCX:
ffff880122e949c0
[453069.286336] RDX: 000009270788d000 RSI: ffff880054c3fbc0 RDI:
ffff8800ae67c800
[453069.286449] RBP: ffff88010cf97958 R08: 00000000000159a0 R09:
ffff880122e94000
[453069.286561] R10: 0000000000000003 R11: 0000000000000000 R12:
ffff8802da313000
[453069.286674] R13: ffff8802da313c60 R14: ffff880122e94780 R15:
ffff88040c277000
[453069.286787] FS:  00007f175ac51880(0000) GS:ffff880427c40000(0000)
knlGS:00000000f7333b40
[453069.286913] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[453069.287005] CR2: 00007f208de58000 CR3: 00000003b0a9c000 CR4:
00000000000007e0
[453069.287116] Stack:
[453069.287151]  ffff88010cf97868 ffff880122e94000 01ff880122e94300
ffff880342156060
[453069.287282]  ffff880122e94780 ffff8802da313c60 ffff880122e94600
ffff8800ae67c800
[453069.287412]  ffff880122e947c0 ffff8802da313000 ffff88040c277120
ffff880100000005
[453069.287542] Call Trace:
[453069.287640]  [<ffffffffa02ddfa3>] relocate_tree_blocks+0x1e3/0x630 [btrfs]
[453069.287796]  [<ffffffffa02e0550>] relocate_block_group+0x3d0/0x650 [btrfs]
[453069.287951]  [<ffffffffa02e0958>]
btrfs_relocate_block_group+0x188/0x2a0 [btrfs]
[453069.288113]  [<ffffffffa02b48f0>]
btrfs_relocate_chunk.isra.61+0x70/0x780 [btrfs]
[453069.288276]  [<ffffffffa02c7fd0>] ?
btrfs_set_lock_blocking_rw+0x70/0xc0 [btrfs]
[453069.288438]  [<ffffffffa02b0e79>] ? free_extent_buffer+0x59/0xb0 [btrfs]
[453069.288590]  [<ffffffffa02b8e99>] btrfs_balance+0x829/0xf40 [btrfs]
[453069.288738]  [<ffffffffa02bf80f>] btrfs_ioctl_balance+0x1af/0x510 [btrfs]
[453069.288890]  [<ffffffffa02c59e4>] btrfs_ioctl+0xa54/0x2950 [btrfs]
[453069.288995]  [<ffffffff8111d016>] ?
lru_cache_add_active_or_unevictable+0x26/0x90
[453069.289119]  [<ffffffff8113a061>] ? handle_mm_fault+0xbe1/0xdb0
[453069.289219]  [<ffffffff811ffdce>] ? cred_has_capability+0x5e/0x100
[453069.289323]  [<ffffffff8104065c>] ? __do_page_fault+0x1fc/0x4f0
[453069.289422]  [<ffffffff8117d80e>] do_vfs_ioctl+0x7e/0x4f0
[453069.289513]  [<ffffffff811ff64f>] ? file_has_perm+0x8f/0xa0
[453069.289606]  [<ffffffff8117dd09>] SyS_ioctl+0x89/0xa0
[453069.289692]  [<ffffffff81040a1c>] ? do_page_fault+0xc/0x10
[453069.289785]  [<ffffffff814f5752>] system_call_fastpath+0x16/0x1b
[453069.289881] Code: ff ff 48 8b 9d 20 ff ff ff e9 11 ff ff ff 0f 0b
be ec 03 00 00 48 c7 c7 d0 f0 30 a0 e8 28 00 d7 e0 e9 06 f3 ff ff e8
c4 42 02 00 <0f> 0b 3c b0 0f 84 72 f1 ff ff be 22 03 00 00 48 c7 c7 d0
f0 30
[453069.290429] RIP  [<ffffffffa02ddd62>]
build_backref_tree+0x1152/0x11b0 [btrfs]
[453069.290591]  RSP <ffff88010cf97848>
[453069.316194] ---[ end trace 5fdc0af4cc62bf41 ]---

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931!
  2014-10-03 22:06   ` Tomasz Chmielewski
  2014-10-03 22:09     ` Josef Bacik
@ 2014-11-25 22:33     ` Tomasz Chmielewski
  2014-12-12 14:37       ` 3.18.0: kernel BUG at fs/btrfs/relocation.c:242! Tomasz Chmielewski
  1 sibling, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-11-25 22:33 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

I'm still seeing this when running balance with 3.18-rc6:

[95334.066898] BTRFS info (device sdd1): relocating block group 
6468350771200 flags 17
[95344.384279] BTRFS info (device sdd1): found 5371 extents
[95373.555640] BTRFS (device sdd1): parent transid verify failed on 
5568935395328 wanted 70315 found 89269
[95373.574208] BTRFS (device sdd1): parent transid verify failed on 
5568935395328 wanted 70315 found 89269
[95373.574483] ------------[ cut here ]------------
[95373.574542] kernel BUG at fs/btrfs/relocation.c:242!
[95373.574601] invalid opcode: 0000 [#1] SMP
[95373.574661] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip_tables x_tables cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats nfsd auth_rpcgss oid_registry exportfs 
nfs_acl nfs lockd grace fscache sunrpc ipv6 btrfs xor raid6_pq 
zlib_deflate coretemp hwmon loop pcspkr i2c_i801 i2c_core battery 
tpm_infineon tpm_tis tpm 8250_fintek video parport_pc parport ehci_pci 
lpc_ich ehci_hcd mfd_core button acpi_cpufreq ext4 crc16 jbd2 mbcache 
raid1 sg sd_mod ahci libahci libata scsi_mod r8169 mii
[95373.576506] CPU: 1 PID: 6089 Comm: btrfs Not tainted 3.18.0-rc6 #1
[95373.576568] Hardware name: System manufacturer System Product 
Name/P8H77-M PRO, BIOS 1101 02/04/2013
[95373.576683] task: ffff8807e9b91810 ti: ffff8807da1b8000 task.ti: 
ffff8807da1b8000
[95373.576794] RIP: 0010:[<ffffffffa0323144>]  [<ffffffffa0323144>] 
relocate_block_group+0x432/0x4de [btrfs]
[95373.576933] RSP: 0018:ffff8807da1bbb18  EFLAGS: 00010202
[95373.576993] RAX: ffff8806327a70f8 RBX: ffff8806327a7000 RCX: 
0000000180200000
[95373.577056] RDX: ffff8806327a70d8 RSI: ffff8806327a70e8 RDI: 
ffff8807ff403900
[95373.577118] RBP: ffff8807da1bbb88 R08: 0000000000000001 R09: 
0000000000000000
[95373.577181] R10: 0000000000000003 R11: ffffffffa031f2aa R12: 
ffff8804601de5a0
[95373.577243] R13: ffff8806327a7108 R14: 00000000fffffff4 R15: 
ffff8806327a7020
[95373.577307] FS:  00007f9ccfa99840(0000) GS:ffff88081fa40000(0000) 
knlGS:0000000000000000
[95373.577418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[95373.577479] CR2: 00007f98c4133000 CR3: 00000007dd7bf000 CR4: 
00000000001407e0
[95373.577540] Stack:
[95373.577594]  ffffea0004962e80 ffff8806327a70e8 ffffea000c7fdb80 
0000000000000000
[95373.577708]  ffffea000d289600 00ffea000d289640 a8000005e22b2a30 
0000000000001000
[95373.577822]  ffff8802eb7b0240 ffff8806327a7000 0000000000000000 
ffff8807f3b5a5a8
[95373.577937] Call Trace:
[95373.578009]  [<ffffffffa0323348>] 
btrfs_relocate_block_group+0x158/0x278 [btrfs]
[95373.578137]  [<ffffffffa0300fd4>] 
btrfs_relocate_chunk.isra.70+0x35/0xa5 [btrfs]
[95373.578263]  [<ffffffffa03025d4>] btrfs_balance+0xa66/0xc6b [btrfs]
[95373.578329]  [<ffffffff810bd63a>] ? 
__alloc_pages_nodemask+0x137/0x702
[95373.578407]  [<ffffffffa0308485>] btrfs_ioctl_balance+0x220/0x29f 
[btrfs]
[95373.578483]  [<ffffffffa030d586>] btrfs_ioctl+0x1134/0x22f6 [btrfs]
[95373.578547]  [<ffffffff810d5d90>] ? handle_mm_fault+0x44d/0xa00
[95373.578610]  [<ffffffff81175856>] ? avc_has_perm+0x2e/0xf7
[95373.578672]  [<ffffffff810d88a9>] ? __vm_enough_memory+0x25/0x13c
[95373.578736]  [<ffffffff8110f05d>] do_vfs_ioctl+0x3f2/0x43c
[95373.578798]  [<ffffffff8110f0f5>] SyS_ioctl+0x4e/0x7d
[95373.578859]  [<ffffffff81030ab3>] ? do_page_fault+0xc/0x11
[95373.578920]  [<ffffffff813b58d2>] system_call_fastpath+0x12/0x17
[95373.578981] Code: 00 00 00 48 39 83 f8 00 00 00 74 02 0f 0b 4c 39 ab 
08 01 00 00 74 02 0f 0b 48 83 7b 20 00 74 02 0f 0b 83 bb 20 01 00 00 00 
74 02 <0f> 0b 83 bb 24 01 00 00 00 74 02 0f 0b 48 8b 73 18 48 8b 7b 08
[95373.579226] RIP  [<ffffffffa0323144>] 
relocate_block_group+0x432/0x4de [btrfs]
[95373.579352]  RSP <ffff8807da1bbb18>




On 2014-10-04 00:06, Tomasz Chmielewski wrote:
> On 2014-10-03 20:17 (Fri), Josef Bacik wrote:
>> On 10/02/2014 03:27 AM, Tomasz Chmielewski wrote:
>>> Got this when running balance with 3.17.0-rc7:
>>> 
>> 
>> Give these two patches a try
>> 
>> https://patchwork.kernel.org/patch/4938281/
>> https://patchwork.kernel.org/patch/4939761/
> 
> With these two patches applied on top of 3.17-rc7, it BUGs somewhere 
> else now:
> 
> [ 2030.858792] BTRFS info (device sdd1): relocating block group
> 6469424513024 flags 17
> [ 2039.674077] BTRFS info (device sdd1): found 20937 extents
> [ 2066.726661] BTRFS info (device sdd1): found 20937 extents
> [ 2068.048208] BTRFS info (device sdd1): relocating block group
> 6468350771200 flags 17
> [ 2080.796412] BTRFS info (device sdd1): found 46927 extents
> [ 2092.703850] parent transid verify failed on 5568935395328 wanted
> 70315 found 71183
> [ 2092.714622] parent transid verify failed on 5568935395328 wanted
> 70315 found 71183
> [ 2092.725269] parent transid verify failed on 5568935395328 wanted
> 70315 found 71183
> [ 2092.725680] ------------[ cut here ]------------
> [ 2092.725740] kernel BUG at fs/btrfs/relocation.c:242!
> [ 2092.725800] invalid opcode: 0000 [#1] SMP
> [ 2092.725860] Modules linked in: ipt_MASQUERADE iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> ip_tables x_tables cpufreq_ondemand cpufreq_conservative
> cpufreq_powersave cpufreq_stats bridge stp llc ipv6 btrfs xor raid6_pq
> zlib_deflate coretemp hwmon loop i2c_i801 parport_pc pcspkr i2c_core
> parport video battery tpm_infineon tpm_tis tpm lpc_ich mfd_core
> ehci_pci ehci_hcd acpi_cpufreq button ext4 crc16 jbd2 mbcache raid1 sg
> sd_mod ahci libahci libata scsi_mod r8169 mii
> [ 2092.727740] CPU: 3 PID: 3937 Comm: btrfs Not tainted 3.17.0-rc7 #3
> [ 2092.727801] Hardware name: System manufacturer System Product
> Name/P8H77-M PRO, BIOS 1101 02/04/2013
> [ 2092.727917] task: ffff8800c7883020 ti: ffff8800c7d04000 task.ti:
> ffff8800c7d04000
> [ 2092.728029] RIP: 0010:[<ffffffffa0322a4a>]  [<ffffffffa0322a4a>]
> relocate_block_group+0x432/0x4de [btrfs]
> [ 2092.728169] RSP: 0018:ffff8800c7d07a58  EFLAGS: 00010206
> [ 2092.728229] RAX: ffff8806c69a18f8 RBX: ffff8806c69a1800 RCX: 
> 0000000180200000
> [ 2092.728292] RDX: ffff8806c69a18d8 RSI: ffff8806c69a18e8 RDI: 
> ffff8807ff403900
> [ 2092.728356] RBP: ffff8800c7d07ac8 R08: 0000000000000001 R09: 
> 0000000000000000
> [ 2092.728419] R10: 0000000000000003 R11: ffffffffa031eb54 R12: 
> ffff8805d515c240
> [ 2092.728482] R13: ffff8806c69a1908 R14: 00000000fffffff4 R15: 
> ffff8806c69a1820
> [ 2092.728546] FS:  00007f4f251d0840(0000) GS:ffff88081fac0000(0000)
> knlGS:0000000000000000
> [ 2092.728660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2092.728721] CR2: ffffffffff600400 CR3: 00000000c7fb0000 CR4: 
> 00000000001407e0
> [ 2092.728783] Stack:
> [ 2092.728837]  ffffea0002edf300 ffff8806c69a18e8 ffffea0002edf000
> 0000000000000000
> [ 2092.728952]  ffffea0002edf080 00ffea0002edf0c0 a8000005e22b2a30
> 0000000000001000
> [ 2092.729067]  ffff8807d969f870 ffff8806c69a1800 0000000000000000
> ffff8807f3f285b0
> [ 2092.729183] Call Trace:
> [ 2092.729256]  [<ffffffffa0322c4e>]
> btrfs_relocate_block_group+0x158/0x278 [btrfs]
> [ 2092.729385]  [<ffffffffa02ff79c>]
> btrfs_relocate_chunk.isra.62+0x58/0x5f7 [btrfs]
> [ 2092.729512]  [<ffffffffa030e99f>] ?
> btrfs_set_lock_blocking_rw+0x68/0x95 [btrfs]
> [ 2092.729632]  [<ffffffffa02bfb04>] ? 
> btrfs_set_path_blocking+0x23/0x54 [btrfs]
> [ 2092.729704]  [<ffffffffa02c4517>] ? btrfs_search_slot+0x7bc/0x816 
> [btrfs]
> [ 2092.729782]  [<ffffffffa02fbc81>] ? free_extent_buffer+0x6f/0x7c 
> [btrfs]
> [ 2092.729859]  [<ffffffffa0302679>] btrfs_balance+0xa7b/0xc80 [btrfs]
> [ 2092.729935]  [<ffffffffa0308177>] btrfs_ioctl_balance+0x220/0x29f 
> [btrfs]
> [ 2092.730012]  [<ffffffffa030d1e4>] btrfs_ioctl+0x10bd/0x2281 [btrfs]
> [ 2092.730076]  [<ffffffff810d5152>] ? handle_mm_fault+0x44d/0xa00
> [ 2092.730140]  [<ffffffff81173e76>] ? avc_has_perm+0x2e/0xf7
> [ 2092.730202]  [<ffffffff810d7c6d>] ? __vm_enough_memory+0x25/0x13c
> [ 2092.730266]  [<ffffffff8110d72d>] do_vfs_ioctl+0x3f2/0x43c
> [ 2092.730328]  [<ffffffff8110d7c5>] SyS_ioctl+0x4e/0x7d
> [ 2092.730389]  [<ffffffff81030a71>] ? do_page_fault+0xc/0xf
> [ 2092.730452]  [<ffffffff813b0652>] system_call_fastpath+0x16/0x1b
> [ 2092.730512] Code: 00 00 00 48 39 83 f8 00 00 00 74 02 0f 0b 4c 39
> ab 08 01 00 00 74 02 0f 0b 48 83 7b 20 00 74 02 0f 0b 83 bb 20 01 00
> 00 00 74 02 <0f> 0b 83 bb 24 01 00 00 00 74 02 0f 0b 48 8b 73 18 48 8b
> 7b 08
> [ 2092.730759] RIP  [<ffffffffa0322a4a>]
> relocate_block_group+0x432/0x4de [btrfs]
> [ 2092.730885]  RSP <ffff8800c7d07a58>
> [ 2092.731233] ---[ end trace 16c7709ebf2c379c ]---

Tomasz Chmielewski
http://www.sslrack.com



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-11-25 22:33     ` Tomasz Chmielewski
@ 2014-12-12 14:37       ` Tomasz Chmielewski
  2014-12-12 21:36         ` Robert White
                           ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-12 14:37 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).

# time btrfs balance start /mnt/lxc2
Segmentation fault

real    322m32.153s
user    0m0.000s
sys     16m0.930s


[20182.461873] BTRFS info (device sdd1): relocating block group 
6915027369984 flags 17
[20194.050641] BTRFS info (device sdd1): found 4819 extents
[20286.243576] BTRFS info (device sdd1): found 4819 extents
[20287.143471] BTRFS info (device sdd1): relocating block group 
6468350771200 flags 17
[20295.756934] BTRFS info (device sdd1): found 3613 extents
[20306.981773] BTRFS (device sdd1): parent transid verify failed on 
5568935395328 wanted 70315 found 102416
[20306.983962] BTRFS (device sdd1): parent transid verify failed on 
5568935395328 wanted 70315 found 102416
[20307.029841] BTRFS (device sdd1): parent transid verify failed on 
5568935395328 wanted 70315 found 102416
[20307.030037] ------------[ cut here ]------------
[20307.030083] kernel BUG at fs/btrfs/relocation.c:242!
[20307.030130] invalid opcode: 0000 [#1] SMP
[20307.030175] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip_tables x_tables cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave cpufreq_stats nfsd auth_rpcgss oid_registry exportfs 
nfs_acl nfs lockd grace fscache sunrpc ipv6 btrfs xor raid6_pq 
zlib_deflate coretemp hwmon loop pcspkr i2c_i801 i2c_core lpc_ich 
mfd_core 8250_fintek battery parport_pc parport tpm_infineon tpm_tis tpm 
ehci_pci ehci_hcd video button acpi_cpufreq ext4 crc16 jbd2 mbcache 
raid1 sg sd_mod r8169 mii ahci libahci libata scsi_mod
[20307.030587] CPU: 3 PID: 4218 Comm: btrfs Not tainted 3.18.0 #1
[20307.030634] Hardware name: System manufacturer System Product 
Name/P8H77-M PRO, BIOS 1101 02/04/2013
[20307.030724] task: ffff8807f2cac830 ti: ffff8807e9198000 task.ti: 
ffff8807e9198000
[20307.030811] RIP: 0010:[<ffffffffa02e8240>]  [<ffffffffa02e8240>] 
relocate_block_group+0x432/0x4de [btrfs]
[20307.030914] RSP: 0018:ffff8807e919bb18  EFLAGS: 00010202
[20307.030960] RAX: ffff8805f06c40f8 RBX: ffff8805f06c4000 RCX: 
0000000180200003
[20307.031008] RDX: ffff8805f06c40d8 RSI: ffff8805f06c40e8 RDI: 
ffff8807ff403900
[20307.031056] RBP: ffff8807e919bb88 R08: 0000000000000001 R09: 
0000000000000000
[20307.031105] R10: 0000000000000003 R11: ffffffffa02e43a6 R12: 
ffff8807e637f090
[20307.031153] R13: ffff8805f06c4108 R14: 00000000fffffff4 R15: 
ffff8805f06c4020
[20307.031201] FS:  00007f1bdb4ba880(0000) GS:ffff88081fac0000(0000) 
knlGS:0000000000000000
[20307.031289] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20307.031336] CR2: 00007f5672e18070 CR3: 00000007e99cc000 CR4: 
00000000001407e0
[20307.031384] Stack:
[20307.031426]  ffffea0016296680 ffff8805f06c40e8 ffffea0016296380 
0000000000000000
[20307.031515]  ffffea0016296400 00ffea0016296440 a8000005e22b2a30 
0000000000001000
[20307.031604]  ffff8804d86963f0 ffff8805f06c4000 0000000000000000 
ffff8807f2d785a8
[20307.031693] Call Trace:
[20307.031743]  [<ffffffffa02e8444>] 
btrfs_relocate_block_group+0x158/0x278 [btrfs]
[20307.031838]  [<ffffffffa02c5fd4>] 
btrfs_relocate_chunk.isra.70+0x35/0xa5 [btrfs]
[20307.031931]  [<ffffffffa02c75d4>] btrfs_balance+0xa66/0xc6b [btrfs]
[20307.031981]  [<ffffffff810bd63a>] ? 
__alloc_pages_nodemask+0x137/0x702
[20307.032036]  [<ffffffffa02cd485>] btrfs_ioctl_balance+0x220/0x29f 
[btrfs]
[20307.032089]  [<ffffffffa02d2586>] btrfs_ioctl+0x1134/0x22f6 [btrfs]
[20307.032138]  [<ffffffff810d5d83>] ? handle_mm_fault+0x44d/0xa00
[20307.032186]  [<ffffffff81175862>] ? avc_has_perm+0x2e/0xf7
[20307.032234]  [<ffffffff810d889d>] ? __vm_enough_memory+0x25/0x13c
[20307.032282]  [<ffffffff8110f05d>] do_vfs_ioctl+0x3f2/0x43c
[20307.032329]  [<ffffffff8110f0f5>] SyS_ioctl+0x4e/0x7d
[20307.032376]  [<ffffffff81030ab3>] ? do_page_fault+0xc/0x11
[20307.032424]  [<ffffffff813b5992>] system_call_fastpath+0x12/0x17
[20307.032488] Code: 00 00 00 48 39 83 f8 00 00 00 74 02 0f 0b 4c 39 ab 
08 01 00 00 74 02 0f 0b 48 83 7b 20 00 74 02 0f 0b 83 bb 20 01 00 00 00 
74 02 <0f> 0b 83 bb 24 01 00 00 00 74 02 0f 0b 48 8b 73 18 48 8b 7b 08
[20307.032660] RIP  [<ffffffffa02e8240>] 
relocate_block_group+0x432/0x4de [btrfs]
[20307.032754]  RSP <ffff8807e919bb18>
[20307.033068] ---[ end trace 18be77360e49d59d ]---



On 2014-11-25 23:33, Tomasz Chmielewski wrote:
> I'm still seeing this when running balance with 3.18-rc6:
> 
> (...)
> 
> Tomasz Chmielewski
> http://www.sslrack.com


* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 14:37       ` 3.18.0: kernel BUG at fs/btrfs/relocation.c:242! Tomasz Chmielewski
@ 2014-12-12 21:36         ` Robert White
  2014-12-12 21:46           ` Tomasz Chmielewski
  2014-12-15 20:07         ` Josef Bacik
  2014-12-19 21:47         ` Josef Bacik
  2 siblings, 1 reply; 25+ messages in thread
From: Robert White @ 2014-12-12 21:36 UTC (permalink / raw)
  To: Tomasz Chmielewski, Josef Bacik; +Cc: linux-btrfs

On 12/12/2014 06:37 AM, Tomasz Chmielewski wrote:
> FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).
>
> # time btrfs balance start /mnt/lxc2
> Segmentation fault
>
> real    322m32.153s
> user    0m0.000s
> sys     16m0.930s

> (...)

> [20306.981773] BTRFS (device sdd1): parent transid verify failed on
> 5568935395328 wanted 70315 found 102416
> [20306.983962] BTRFS (device sdd1): parent transid verify failed on
> 5568935395328 wanted 70315 found 102416

Uh... isn't fixing an invalid transaction id a job for btrfsck? I don't 
see anything in linux/fs/btrfs/*.c that would fix this sort of semantic 
error, like ever.

I think that this is a case of thing_a points to thing_b and thing_b is 
much newer (transaction 102416) than thing_a thinks it should be 
(transaction 70315).

In another thread [that was discussing SMART] you talked about replacing 
a drive and then needing to do some patching-up of the result because of 
drive failures. Is this the same filesystem where that happened? That 
kind of work could leave you in this state if thing_a was one of the 
damaged bits and the system had to fall back to an earlier version.

So I'd run a btrfsck from the very recent btrfs-tools package. If it 
tells you to run it again with --repair, then do that.

By my reading balance is simply refusing to touch an extent that doesn't 
seem to make sense because it can't be sure it wouldn't undermine some 
active data if it relocated the block.



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 21:36         ` Robert White
@ 2014-12-12 21:46           ` Tomasz Chmielewski
  2014-12-12 22:34             ` Robert White
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-12 21:46 UTC (permalink / raw)
  To: Robert White; +Cc: Josef Bacik, linux-btrfs

On 2014-12-12 22:36, Robert White wrote:

> In another thread [that was discussing SMART] you talked about
> replacing a drive and then needing to do some patching-up of the
> result because of drive failures. Is this the same filesystem where
> that happened?

Nope, it was on a different server.

-- 
Tomasz Chmielewski
http://www.sslrack.com



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 21:46           ` Tomasz Chmielewski
@ 2014-12-12 22:34             ` Robert White
  2014-12-12 22:46               ` Tomasz Chmielewski
  0 siblings, 1 reply; 25+ messages in thread
From: Robert White @ 2014-12-12 22:34 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Josef Bacik, linux-btrfs

On 12/12/2014 01:46 PM, Tomasz Chmielewski wrote:
> On 2014-12-12 22:36, Robert White wrote:
>
>> In another thread [that was discussing SMART] you talked about
>> replacing a drive and then needing to do some patching-up of the
>> result because of drive failures. Is this the same filesystem where
>> that happened?
>
> Nope, it was on a different server.
>

okay, so how did the btrfsck turn out?




* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 22:34             ` Robert White
@ 2014-12-12 22:46               ` Tomasz Chmielewski
  2014-12-12 22:58                 ` Robert White
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-12 22:46 UTC (permalink / raw)
  To: Robert White; +Cc: Josef Bacik, linux-btrfs

On 2014-12-12 23:34, Robert White wrote:
> On 12/12/2014 01:46 PM, Tomasz Chmielewski wrote:
>> On 2014-12-12 22:36, Robert White wrote:
>> 
>>> In another thread [that was discussing SMART] you talked about
>>> replacing a drive and then needing to do some patching-up of the
>>> result because of drive failures. Is this the same filesystem where
>>> that happened?
>> 
>> Nope, it was on a different server.
>> 
> 
> okay, so how did the btrfsck turn out?

# time btrfsck /dev/sdc1 &>/root/btrfsck.log

real    22m0.140s
user    0m3.090s
sys     0m6.120s

root@bkp010 /usr/src/btrfs-progs # echo $?
1

# cat /root/btrfsck.log
root item for root 8681, current bytenr 5568935395328, current gen 
70315, current level 2, new bytenr 5569014104064, new gen 70316, new 
level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.


Now, I'm a bit afraid to run --repair - as far as I remember, some time 
ago, it used to do all sorts of weird things except the actual repair.
Is it better nowadays? I'm using latest clone from 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git


-- 
Tomasz Chmielewski
http://www.sslrack.com



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 22:46               ` Tomasz Chmielewski
@ 2014-12-12 22:58                 ` Robert White
  2014-12-13  8:16                   ` Tomasz Chmielewski
  0 siblings, 1 reply; 25+ messages in thread
From: Robert White @ 2014-12-12 22:58 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Josef Bacik, linux-btrfs

On 12/12/2014 02:46 PM, Tomasz Chmielewski wrote:
> (...)
>
>
> Now, I'm a bit afraid to run --repair - as far as I remember, some time
> ago, it used to do all weird things except the actual repair.
> Is it better nowadays? I'm using latest clone from
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
>
>

I don't have the history to answer this definitively, but I don't think 
you have a choice. Nothing else is going to touch that error.

I have not seen any "oh my god, btrfsck just ate my filesystem" errors 
since I joined the list -- but I am a relative newcomer.

I know that you, of course, as a conscientious and well-traveled system 
administrator, already have a current backup since you are doing storage 
maintenance... right? 8-)
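If you do go for it, a cautious sequence is to capture a metadata image 
and run a read-only pass before letting --repair loose. A sketch - the 
device path and image path are examples, and the filesystem should be 
unmounted:

```shell
# metadata-only image, so a bad repair can still be analyzed later
btrfs-image -c9 -t4 /dev/sdc1 /root/sdc1-metadata.img

# read-only check first; review its output...
btrfsck /dev/sdc1

# ...and only then repair
btrfsck --repair /dev/sdc1
```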



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 22:58                 ` Robert White
@ 2014-12-13  8:16                   ` Tomasz Chmielewski
  2014-12-13  9:39                     ` Robert White
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-13  8:16 UTC (permalink / raw)
  To: Robert White; +Cc: Josef Bacik, linux-btrfs

On 2014-12-12 23:58, Robert White wrote:

> I don't have the history to answer this definitively, but I don't
> think you have a choice. Nothing else is going to touch that error.
> 
> I have not seen any "oh my god, btrfsck just ate my filesystem" errors
> since I joined the list -- but I am a relative newcomer.
> 
> I know that you, of course, as a conscientious and well-traveled system
> administrator, already have a current backup since you are doing
> storage maintenance... right? 8-)

Who needs backups with btrfs, right? :)

So apparently btrfsck --repair fixed some issues, the fs is still 
mountable and looks fine.

Running balance again, but that will take many days there.

# btrfsck --repair /dev/sdc1
fixing root item for root 8681, current bytenr 5568935395328, current 
gen 70315, current level 2, new bytenr 5569014104064, new gen 70316, new 
level 2
Fixed 1 roots.
checking extents
checking free space cache
checking fs roots
root 696 inode 2765103 errors 400, nbytes wrong
root 696 inode 2831256 errors 400, nbytes wrong
root 9466 inode 2831256 errors 400, nbytes wrong
root 9505 inode 2831256 errors 400, nbytes wrong
root 10139 inode 2831256 errors 400, nbytes wrong
root 10525 inode 2831256 errors 400, nbytes wrong
root 10561 inode 2831256 errors 400, nbytes wrong
root 10633 inode 2765103 errors 400, nbytes wrong
root 10633 inode 2831256 errors 400, nbytes wrong
root 10650 inode 2765103 errors 400, nbytes wrong
root 10650 inode 2831256 errors 400, nbytes wrong
root 10680 inode 2765103 errors 400, nbytes wrong
root 10680 inode 2831256 errors 400, nbytes wrong
root 10681 inode 2765103 errors 400, nbytes wrong
root 10681 inode 2831256 errors 400, nbytes wrong
root 10701 inode 2765103 errors 400, nbytes wrong
root 10701 inode 2831256 errors 400, nbytes wrong
root 10718 inode 2765103 errors 400, nbytes wrong
root 10718 inode 2831256 errors 400, nbytes wrong
root 10735 inode 2765103 errors 400, nbytes wrong
root 10735 inode 2831256 errors 400, nbytes wrong
enabling repair mode
Checking filesystem on /dev/sdc1
UUID: 371af1dc-d88b-4dee-90ba-91fec2bee6c3
cache and super generation don't match, space cache will be invalidated
found 942113871627 bytes used err is 1
total csum bytes: 2445349244
total tree bytes: 28743073792
total fs tree bytes: 22880043008
total extent tree bytes: 2890547200
btree space waste bytes: 5339534781
file data blocks allocated: 2779865800704
  referenced 3446026993664
Btrfs v3.17.3

real    76m27.845s
user    19m1.470s
sys     2m55.690s


-- 
Tomasz Chmielewski
http://www.sslrack.com



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-13  8:16                   ` Tomasz Chmielewski
@ 2014-12-13  9:39                     ` Robert White
  2014-12-13 13:53                       ` Tomasz Chmielewski
  0 siblings, 1 reply; 25+ messages in thread
From: Robert White @ 2014-12-13  9:39 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Josef Bacik, linux-btrfs

On 12/13/2014 12:16 AM, Tomasz Chmielewski wrote:
> On 2014-12-12 23:58, Robert White wrote:
>
>> I don't have the history to answer this definitively, but I don't
>> think you have a choice. Nothing else is going to touch that error.
>>
>> I have not seen any "oh my god, btrfsck just ate my filesystem" errors
>> since I joined the list -- but I am a relative newcomer.
>>
>> I know that you, of course, as a conscientious and well-traveled system
>> administrator, already have a current backup since you are doing
>> storage maintenance... right? 8-)
>
> Who needs backups with btrfs, right? :)
>
> So apparently btrfsck --repair fixed some issues, the fs is still
> mountable and looks fine.
>
> Running balance again, but that will take many days there.

Might I ask why you are running balance? After a persistent error I'd 
understand going straight to scrub, but balance is usually for 
transformation or to redistribute things after atypical use.

An entire generation of folks has grown used to defragging Windows boxes 
and all, but if you've already got an array that is going to take "many 
days" to balance, what benefit do you actually expect to receive?


Defrag -- used for "I think I'm getting a lot of unnecessary head seek 
in this application, these files need to be brought into closer order".

Scrub -- used for defensive checking a-la checkdisk. "I suspect that 
after that unexpected power outage something may be a little off", or 
alternately "I think my disks are giving me bitrot, I better check".

Btrfsck -- used for "I suspect structural problems caused by real world 
events like power hits or that one time when the cat knocked over my 
tower case while I was vacuuming all my sql tables." (often reserved for 
"hey, I'm getting weird messages from the kernel about things in my 
filesystem".)

Balance -- primary -- used for "Well, I used to use this filesystem for 
a small number of large files, but now I am processing a large number of 
small files and I'm running out of metadata even though I've got a lot 
of space" (or vice versa).

Balance -- other -- used for "I just changed the geometry of my 
filesystem by adding or removing a disk and I want to spread out."

Balance -- (conversion/restructuring) -- used for "single is okay, but 
I'd rather raid-0 to spread out my load across these many disks" or 
"gee, I'd like some redundancy now that I have the room."



Frequent balancing of a Copy On Write filesystem will tend to make 
things somewhat anti-optimal. You are burping the natural working space 
out of the natural layout.

Since COW implies mandatory movement of data, every time you burp out 
all the slack and pack all the data together you are taking your 
regularly modified files and moving them far, far away from the places 
where frequently modified files are most happy (e.g. the 
only-partly-full data region they were just living in).

Similarly two files that usually get modified at the same time (say a 
database file and its rollback log) will tend to end up in the same 
active data extent as time goes on, and if balance decides it can "clean 
up" that extent it will likely give those two files a data-extent 
divorce and force them to the opposite ends of dataland.

COW systems are inherently somewhat chaotic. If you fight that too 
aggressively you will, at best, be wasting the maintenance time.

It may be a decrease in performance measured in very small quanta, but 
so is the expected benefit of most maintenance.


From the wiki:

https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F

btrfs filesystem balance is an operation which simply takes all of the 
data and metadata on the filesystem and re-writes it in a different 
place on the disks, passing it through the allocator algorithm on the 
way. It was originally designed for multi-device filesystems, to spread 
data more evenly across the devices (i.e. to "balance" their usage). 
This is particularly useful when adding new devices to a nearly-full 
filesystem.

Due to the way that balance works, it also has some useful side effects:

- If there are a lot of allocated but unused data or metadata chunks, a 
balance may reclaim some of that allocated space. This is the main 
reason for running a balance on a single-device filesystem.

- On a filesystem with damaged replication (e.g. a RAID-1 FS with a dead 
and removed disk), it will force the FS to rebuild the missing copy of 
the data on one of the currently active devices, restoring the RAID-1 
capability of the filesystem.
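The reclaim case usually doesn't need a full balance: the usage filters 
restrict it to nearly-empty chunks, which is much cheaper. A sketch - 
the mount point is an example:

```shell
# rewrite only data chunks that are less than 5% used
btrfs balance start -dusage=5 /mnt

# the metadata equivalent
btrfs balance start -musage=5 /mnt
```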




* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-13  9:39                     ` Robert White
@ 2014-12-13 13:53                       ` Tomasz Chmielewski
  2014-12-13 20:54                         ` Robert White
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-13 13:53 UTC (permalink / raw)
  To: Robert White; +Cc: linux-btrfs

On 2014-12-13 10:39, Robert White wrote:

> Might I ask why you are running balance? After a persistent error I'd
> understand going straight to scrub, but balance is usually for
> transformation or to redistribute things after atypical use.

There were several reasons for running balance on this system:

1) I was getting "no space left" errors, even though there were hundreds 
of GBs free. Not sure if this still applies to current kernels (3.18 and 
later), but it was certainly a problem in the past.

2) The system was regularly freezing - I'd say once a week was the norm. 
Sometimes I was getting btrfs traces logged in syslog.
After a few freezes the fs would get corrupted to varying degrees. At 
some point it was so bad that it was only possible to use it read-only, 
so I had to get the data off, reformat, and copy back... It would start 
crashing again after a few weeks of usage.

My usage case is quite simple:

- skinny extents, extended inode refs
- mount compress-force=zlib
- rsync many remote data sources (-a -H --inplace --partial) + snapshot
- around 500 snapshots in total, from 20 or so subvolumes
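Spelled out, one backup cycle in this setup looks roughly like the 
following - a sketch, with device, host, and path names as examples:

```shell
mount -o compress-force=zlib /dev/sdc1 /backup
rsync -a -H --inplace --partial root@source:/data/ /backup/data/
btrfs subvolume snapshot -r /backup/data \
    "/backup/snapshots/data-$(date +%Y%m%d)"
```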

Especially rsync's --inplace option combined with many snapshots and 
large fragmentation was deadly for btrfs - I was seeing system freezes 
right when rsyncing a highly fragmented, large file.

Then, running balance on the "corrupted" filesystem was more of an 
exercise (if scrub passes fine, I would expect balance to pass as well). 
Some BUGs it triggered were fixed in newer kernels, some not (btrfsck 
was not really usable a few months back).

3) I had mixed luck recovering btrfs after a failed drive (in RAID-1). 
Sometimes it worked as expected; sometimes the fs got broken so badly I 
had to rsync the data off and format from scratch (mdraid would kick a 
drive after getting write errors - that's not the case with btrfs, and 
weird things can happen).
Sometimes, running "btrfs device delete missing" (which is balance in 
principle, I think) would take weeks, during which a second drive could 
easily die.
Again, running balance would be more of an exercise there, to see if the 
newer kernel still crashes.


> An entire generation of folks has grown used to defragging Windows
> boxes and all, but if you've already got an array that is going to
> take "many days" to balance, what benefit do you actually expect to
> receive?

For me - it's a good test to see if btrfs is finally getting stable 
(some cases explained above).


> Defrag -- used for "I think I'm getting a lot of unnecessary head seek
> in this application, these files need to be brought into closer
> order".

Fragmentation was an issue for btrfs, at least a few kernels back (as 
explained above, with rsync's --inplace).
However, I'm not running autodefrag anywhere - I'm not sure how it 
affects snapshots.


> Scrub -- used for defensive checking a-la checkdisk. "I suspect that
> after that unexpected power outage something may be a little off", or
> alternately "I think my disks are giving me bitrot, I better check".

For me, it was passing fine, where balance was crashing the kernel.


Again, my main rationale for running balance is to see if btrfs is 
behaving stably. While I have systems with btrfs which have been running 
fine for months, I also have ones which will crash after 1-2 weeks (once 
the system grows in size / complexity).

So hopefully btrfsck has fixed that fs - once it has been running stable 
for a week or two, I might be brave enough to re-enable btrfs quotas 
(another system freezer, at least a few kernels back).


-- 
Tomasz Chmielewski
http://www.sslrack.com



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-13 13:53                       ` Tomasz Chmielewski
@ 2014-12-13 20:54                         ` Robert White
  2014-12-13 21:52                           ` Tomasz Chmielewski
  0 siblings, 1 reply; 25+ messages in thread
From: Robert White @ 2014-12-13 20:54 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On 12/13/2014 05:53 AM, Tomasz Chmielewski wrote:
> My usage case is quite simple:
>
> - skinny extents, extended inode refs
okay

> - mount compress-force=zlib
I'd, personally, "never" force compression. It can increase the size of 
a file by five or more percent if the file is inherently incompressible. 
And while it is easy to deliberately create a file that tricks the 
compression check logic into not compressing something that would 
benefit, that does _not_ happen by chance very often at all.

> - rsync many remote data sources (-a -H --inplace --partial) + snapshot

Using --inplace on a Copy On Write filesystem has only one effect: it 
increases fragmentation... a lot... Every new block is going to get 
written to a new area anyway. So if you have enough slack space to keep 
one new copy of the file - which you will probably use up anyway in the 
COW event - laying in a fresh copy in a likely more contiguous way will 
tend to make things cleaner over time.

--inplace is doubly useless with compression as compression is perturbed 
by default if one byte changes in the original file.

The only time --inplace might be helpful is if the file is NOCOW... 
except...


> - around 500 snapshots in total, from 20 or so subvolumes

That's a lot of snapshots and subvolumes. Not an impossibly high number, 
but a lot. That needs its own use-case evaluation. But regardless...

Even if you set the NOCOW option on a file to make the --inplace rsync 
work, if that file is snapshotted between the rsync modification events 
it will be in 1COW mode because of the snapshot anyway, and you are back 
to the default anti-optimal conditions.


> Especially rsync's --inplace option combined with many snapshots and
> large fragmentation was deadly for btrfs - I was seeing system freezes
> right when rsyncing a highly fragmented, large file.

You are kind of doing all of that to yourself. By combining _forced_ 
compression with denying the file its natural opportunity to be 
re-written to nicely contiguous "new locations", and then pinning it all 
in place with multiple snapshots, you've created the worst of all 
possible worlds.

The more you use gross-behavior options on some sorts of things, the 
more you are fighting the "natural organization" of the system. That is, 
every system is designed around a set of core assumptions, and 
behavioral options tend to invalidate the mainline assumptions. Some 
options, like "recursive", are naturally part of those assumptions and 
play into them; other options, particularly things with "force" in the 
name, tend to be "if you really think you must, sure, I'll do what you 
say, but if it turns out bad it's on _your_ head" options. Which options 
are which is a judgment call, but the combination you've chosen is 
definitely working in that bad area.

And keep repeating this to yourself :: "balance does not reorganize 
anything, it just moves the existing disorder to a new location". This 
is not a perfect summation, and it's clearly wrong if you are using 
"convert", but it's the correct way to view what's happening while 
asking yourself "should I balance?".



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-13 20:54                         ` Robert White
@ 2014-12-13 21:52                           ` Tomasz Chmielewski
  2014-12-13 23:56                             ` Robert White
  0 siblings, 1 reply; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-13 21:52 UTC (permalink / raw)
  To: Robert White; +Cc: linux-btrfs

On 2014-12-13 21:54, Robert White wrote:

>> - rsync many remote data sources (-a -H --inplace --partial) + 
>> snapshot
> 
> Using --inplace on a Copy On Write filesystem has only one effect, it
> increases fragmentation... a lot...

...if the file was changed.


> Every new block is going to get
> written to a new area anyway,

Exactly - "every new block". But that's true with and without --inplace.
Also - without --inplace, it is "every block". In other words, without 
--inplace, the file is likely to be rewritten by rsync to a new one, and 
CoW is lost (more below).


> so if you have enough slack space to
> keep the one new copy of the new file, which you will probably use up
> anyway in the COW event, laying in the fresh copy in a likely more
> contiguous way will tend to make things cleaner over time.
> 
> --inplace is doubly useless with compression as compression is
> perturbed by default if one byte changes in the original file.

No. If you change 1 byte in a 100 MB file, or perhaps a 1 GB file, you 
will likely lose only a few kB of CoW sharing. The whole file is 
certainly not rewritten if you use --inplace. However, it will be wholly 
rewritten if you don't use --inplace.


> The only time --inplace might be helpful is if the file is NOCOW... 
> except...

No, you're wrong.
By default, rsync creates a new file if it detects any file modification 
- like "touch file".

Consider this experiment:

# create a "large file"
dd if=/dev/urandom of=bigfile bs=1M count=3000

# copy it with rsync
rsync -a -v --progress bigfile bigfile2

# copy it again - blazing fast, no change
rsync -a -v --progress bigfile bigfile2

# "touch" the original file
touch bigfile

# try copying again with rsync - notice rsync creates a temp file, like 
.bigfile2.J79ta2
# No change to the file except the timestamp, but good bye your CoW.
rsync -a -v --progress bigfile bigfile2

# Now try the same with --inplace; compare data written to disk with 
iostat -m in both cases.


The same goes for append-only files - even if they are compressed, most 
CoW will be shared. I'd say it will be similar for lightly modified 
files (changed data will be CoW-unshared, some compressed "overhead" 
will be unshared, but the rest will be untouched / shared by CoW between 
the snapshots).
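One way to check this instead of arguing about it is to compare shared 
vs. exclusive bytes after each variant. A sketch - the paths are 
examples, and `btrfs filesystem du` needs a newer btrfs-progs than the 
v3.17 shown earlier in this thread:

```shell
# shared vs. exclusive bytes for the backup tree and its snapshots
btrfs filesystem du -s /backup/data /backup/snapshots/*

# or inspect the physical extent layout of one file directly
filefrag -v /backup/data/bigfile2
```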



>> - around 500 snapshots in total, from 20 or so subvolumes
> 
> That's a lot of snapshots and subvolumes. Not an impossibly high
> number, but a lot. That needs it's own use-case evaluation. But
> regardless...
> 
> Even if you set the NOCOW option on a file to make the --inplace rsync
> work, if that file is snapshotted (snapshot?) between the rsync
> modification events it will be in 1COW mode because of the snapshot
> anyway and you are back to the default anti-optimal conditions.

Again - if the file was changed a lot, it doesn't matter if it's 
--inplace or not. If the file data was not changed, or changed little - 
--inplace will help preserve CoW.


>> Especially rsync's --inplace option combined with many snapshots and
>> large fragmentation was deadly for btrfs - I was seeing system freezes
>> right when rsyncing a highly fragmented, large file.
> 
> You are kind of doing all that to yourself.

To clarify - freezes - I mean kernel bugs exposed and machine freezing.
I think we all agree that whatever userspace is doing in the filesystem, 
it should not result in a kernel BUG / freeze.


> Combining _forced_
> compression with denying the natural opportunity for the re-write of
> the file to move it to nicely contiguous "new locations" and then
> pinning it all in place with multiple snapshots you've created the
> worst of all possible worlds.

I disagree. It's quite compact for my data usage. If I needed blazing 
fast file access, I wouldn't be using a CoW filesystem or snapshots in 
the first place. For data mostly stored and rarely read, it is OK.


(...)

> And keep repeating this to yourself :: "balance does not reorganize
> anything, it just moves the existing disorder to a new location". This
> is not a perfect summation, and it's clearly wrong if you are using
> "convert", but it's the correct way to view what's happening while
> asking yourself "should I balance?".

I agree - I don't run it unless I need to (or I'm curious to see if it 
will expose some more bugs).
It would be quite a step back for a filesystem to need that kind of 
periodic maintenance, after all.

Also, I'm of the opinion that balance should not cause the kernel to BUG 
- it should abort, possibly remount the fs read-only etc. (and suggest 
running btrfsck, if there is enough confidence in that tool), but 
definitely not BUG.


-- 
Tomasz Chmielewski
http://www.sslrack.com



* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-13 21:52                           ` Tomasz Chmielewski
@ 2014-12-13 23:56                             ` Robert White
  2014-12-14  8:45                               ` Robert White
  0 siblings, 1 reply; 25+ messages in thread
From: Robert White @ 2014-12-13 23:56 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On 12/13/2014 01:52 PM, Tomasz Chmielewski wrote:
> On 2014-12-13 21:54, Robert White wrote:
>
>>> - rsync many remote data sources (-a -H --inplace --partial) + snapshot
>>
>> Using --inplace on a Copy On Write filesystem has only one effect, it
>> increases fragmentation... a lot...
>
> ...if the file was changed.

If the file hasn't changed then it won't be transferred, by definition. 
So the un-changed file is not terribly interesting.

And I did think about the rest of (most of) your points right after 
sending the original email, particularly since I don't know your actual 
use case. But there is no "un-send", which I suddenly realized I wanted 
to do... because I needed to change my answer. Like ten seconds later. 
/sigh.

I'm still strongly against forcing compression.

That said, my knee-jerk reaction to using --inplace is still strong for 
almost all file types.

And it remains almost absolute in your case simply because you are 
finding yourself needing to balance and whatnot.

E.g. the theoretical model of efficient partial copies as you present it 
is fine... up until we get back to your original complaint about what a 
mess it makes.

The ruling precept here is Ben Franklin's "penny wise, pound foolish". 
What you _might_ be saving up-front with --inplace is charging you 
double on the back-end with maintenance.

>> Every new block is going to get
>> written to a new area anyway,
>
> Exactly - "every new block". But that's true with and without --inplace.
> Also - without --inplace, it is "every block". In other words, without
> --inplace, the file is likely to be rewritten by rsync to a new one, and
> CoW is lost (more below).

I don't know the nature of the particular files you are translating, but 
I do know a lot about rsync and file layout in general for lots of 
different types of files.

(rsync details here for readers-along :: 
http://rsync.samba.org/how-rsync-works.html )

Now I am assuming you took this advice from something like the manual 
page [QUOTE] This [--inplace] option  is useful for transferring large 
files with block-based changes or appended data, and also on systems 
that are disk bound, not network bound.  It can also help keep a 
copy-on-write filesystem snapshot from diverging the entire contents of 
a file that only has minor changes.[/QUOTE] Though maybe not, since the 
description goes on to say that --inplace implies --partial, so 
specifying both is redundant.

But here's the thing: those files are really rare. Way more rare than 
you might think. They consist almost entirely of block-based database 
extents (like an Oracle tablespace file), logfiles (such as 
/var/log/messages etc.), and VM disk image files (particularly raw 
images); ISO images that are _only_ modified by adding tracks may fall 
into this category as well.


So we've already skipped the unchanged files...

So, inserting a single byte into, or removing a single byte from, any 
file will cause a re-write from that point on: rsync will re-send the 
file from the block boundary containing that byte onward. Just about 
anything with a header and a history is going to get re-sent almost 
completely. This includes the output of any word processing program you 
are likely to encounter.

Anything with linear compression (such as Open Document Format, which is 
basically a ZIP file) will be resent entirely.

All compiled program binaries will be resent entirely if the program 
changed at all (the headers again, the changes in text segments, and the 
changes in layout that a single-byte difference in size causes the ELF 
or DLL formats to juggle significantly).

And I could go on at length, but I'll skip that...

And _then_ the forced compression comes into play.

Rsync is going to impose its default block size to frame changes (see 
--block-size=) and then BTRFS is going to impose its compression frame 
sizes (presuming it is done by block size). If these are not exactly the 
same size, any rsync block that updates will result in one or two 
"extra" compression blocks being re-written by the tiling overlap 
effect.
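A back-of-the-envelope sketch of that tiling overlap. All sizes here are 
illustrative assumptions, not measured values: rsync's real block size 
scales with file size, and 128 KiB is btrfs's compression granularity.

```shell
rsync_block=$((700 * 1024))   # one changed rsync block (example size)
comp_chunk=$((128 * 1024))    # btrfs compresses in 128 KiB chunks
offset=$((300 * 1024))        # example offset of the changed block

# every compression chunk the changed block touches must be rewritten
first=$((offset / comp_chunk))
last=$(((offset + rsync_block - 1) / comp_chunk))
echo "compression chunks rewritten: $((last - first + 1))"
# -> compression chunks rewritten: 6
```

A ~5.5-chunk-sized change dirties 6 chunks here purely because it is not 
aligned to the compression boundaries.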

>> so if you have enough slack space to
>> keep the one new copy of the new file, which you will probably use up
>> anyway in the COW event, laying in the fresh copy in a likely more
>> contiguous way will tend to make things cleaner over time.
>>
>> --inplace is doubly useless with compression as compression is
>> perturbed by default if one byte changes in the original file.
>
> No. If you change 1 byte in a 100 MB file, or perhaps 1 GB file, you
> will likely loose a few kBs of CoW. The whole file is certainly not
> rewritten if you use --inplace. However it will be wholly rewritten if
> you don't use --inplace.
>
>
>> The only time --inplace might be helpful is if the file is NOCOW...
>> except...
>
> No, you're wrong.
> By default, rsync creates a new file if it detects any file modification
> - like "touch file".
>
> Consider this experiment:
>
> # create a "large file"
> dd if=/dev/urandom of=bigfile bs=1M count=3000
>
> # copy it with rsync
> rsync -a -v --progress bigfile bigfile2
>
> # copy it again - blazing fast, no change
> rsync -a -v --progress bigfile bigfile2
>
> # "touch" the original file
> touch bigfile

Touching an unchanged file is cheating... and would be better addressed 
by the --checksum argument (unless you have something that really 
depends on the dates and you've already assured that any restores won't 
mess up the dates later anyway). --checksum, of course, slows down the 
file selection process.

>
> # try copying again with rsync - notice rsync creates a temp file, like
> .bigfile2.J79ta2
> # No change to the file except the timestamp, but good bye your CoW.
> rsync -a -v --progress bigfile bigfile2
>
> # Now try the same with --inplace; compare data written to disk with
> iostat -m in both cases.
>
>
> Same goes for append files - even if they are compressed, most CoW will
> be shared. I'd say it will be similar for lightly modified files
> (changed data will be CoW-unshared, some compressed "overhead" will be
> unshared, but the rest will be untouched / shared by CoW between the
> snapshots).

So while it is apparent that we both know how rsync works, I wonder if 
you've checked how much of your data load actually has a chance to 
benefit from --inplace and compared it to how much fragmentation it's 
likely to cause.

===

The basic problem I have with --inplace is, space permitting, you end up 
"better off" over long periods of time in a COW filesystem if you don't 
use it.

Consider any append-mode file. With each incremental append amidst a 
bulk transfer, it will tend to have each increment separated from the 
next by one or more raw allocation extents. That, if nothing else, will 
cause those extents to _never_ reach the empty state where they can be 
reclaimed automatically.

If we were making an infographic-style representation of your disk 
storage the files with long histories would tend to look like lightning 
strikes "down the page" or "vertical scribbling up-an-down" instead of 
solid bars across the little chunks.

As snapshots come and go, the copied-and-replaced little bars would be 
memorialized for a time and then go away. The up and down result is 
semi-permanent, requiring you to do internal maintenance nonsense to try 
to coalesce the scribbles into bars.

So every time you defrag and balance the drive you are just taking steps 
to undo the geographical harm that rsync --inplace caused in the first 
place. That inflates the total effective write cost of the in-place 
transfer into the full copy cost _plus_ the incremental 
copy-on-overwrite cost, spread over repeated activities (defrags, 
balances, etc.).

Without --inplace you'd definitely be using up more room for the copies 
(at least until internal data de-duplication comes along, presuming it 
does), but those copies will go away as the snapshots age off leaving 
larger chunks available for future allocation. The result of that will 
be smaller trees (not that that matters) and larger gaps (which really 
does matter) that the system can work with more optimally as time rolls 
forward forever.

In computer science (and other disciplines) "The Principle(s) of 
Locality" are huge actors. It's the basis of caching at all levels and 
it features strongly in graph theory and simple mechanics.

So yes, ten snapshots of a 30GiB VM disk image that hardly changed at 
all would _suck_, and might be worth their own selective rsync for the 
subdirectories where such things happen; but turning a backup copy of 
your browser history file into a fifty-segment snail-trail wandering all 
through your data extents is not to be taken lightly either.

The middle ground is to selectively rsync the (usually) very few 
directories that contain the files you _know_ will explicitly benefit 
from --inplace, such as /home/some_user/Virtual_Machines/*; then rsync 
the whole tree without the option. [The already-synchronized directories 
will automatically be seen as current and you'll get optimal results.]
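In rsync terms, that middle ground is a two-pass run. A sketch - the 
paths are examples; the second pass sees the pass-one files as already 
current and skips them:

```shell
# pass 1: only the trees known to benefit from --inplace
rsync -a -H --inplace \
    /src/home/some_user/Virtual_Machines/ \
    /backup/home/some_user/Virtual_Machines/

# pass 2: the whole tree, without --inplace
rsync -a -H --delete /src/ /backup/
```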


ASIDE :: I hope you are also using --delete when you rsync your backups. 8-)


* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-13 23:56                             ` Robert White
@ 2014-12-14  8:45                               ` Robert White
  0 siblings, 0 replies; 25+ messages in thread
From: Robert White @ 2014-12-14  8:45 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On 12/13/2014 03:56 PM, Robert White wrote:
> ...

Dangit... On re-reading I think I was still less than optimally clear. I 
kept using the word "resent" when I should have been using a word like 
"re-written" or "re-stored" (as opposed to "restored"). On re-reading 
I'm not sure what the least confusing word would be.

So here is a contrived example (with seriously simplified assumptions):

Let's say every day rsync coincidentally sends 1GiB and the receiving 
filesystem is otherwise almost quiescent, so as a side effect the 
receiving filesystem monotonically creates one 1GiB data extent per day. 
A snapshot is taken every day after the rsync. (This is all just to make 
the mental picture easier.)

Let's say there is a file Aardvark that just happens to be the first 
file considered every time, and that also happens to grow by exactly 
1MiB in pure append each day, having started out at 1MiB. After ten days 
Aardvark is stored across ten extents. After 100 days it is stored 
across 100 extents. Each successive 1MiB is exactly 1023MiB away from 
its predecessor and successor.

Now consider file Badger, the second file. It is 100MiB in size. It is 
also modified each day such that five percent of its total bytes are 
rewritten, as exactly five records of exactly 1MiB aligned on 1MiB 
boundaries, all on convenient rsync boundaries. On the first day a 
100MiB chunk lands square in the first data extent right next to 
Aardvark. On the second and every successive day 5MiB lands next to 
Aardvark in the next extent. But that 5MiB is not contiguous in the 
file: it is five 1MiB holes punched, in a completely fair distribution, 
across all the active fragments of Badger wherever they lie.

A linear read of Aardvark gets monotonically worse with each rsync. A 
linear read of Badger decays towards being 100 head seeks for every 
linear read.
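The Aardvark/Badger decay above can be sketched as a toy simulation (my own illustration of the model, not actual BTRFS allocator behavior; the "one extent per day" and "five fair-random 1MiB rewrites" assumptions are the magical ones from above):

```python
import random

random.seed(0)  # deterministic for the example
N_BLOCKS, PER_DAY, DAYS = 100, 5, 60

# extent[i] = which day's 1GiB data extent currently holds Badger's
# i-th 1MiB block; on day 1 all 100 blocks land contiguously in extent 1
extent = [1] * N_BLOCKS
for day in range(2, DAYS + 1):
    # five distinct 1MiB records rewritten, fairly at random; the new
    # copies land in today's extent, punching holes in the old ones
    for blk in random.sample(range(N_BLOCKS), PER_DAY):
        extent[blk] = day

# a linear read of Badger takes a head seek every time the next block
# lives in a different extent than the one before it
seeks = sum(1 for a, b in zip(extent, extent[1:]) if a != b)
print("Badger blocks still current in extent 1:", extent.count(1))
print("extent transitions in one linear read:", seeks)
```

After a couple of months of this, nearly every adjacent pair of Badger blocks lives in a different extent, which is the "100 head seeks per linear read" decay.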

Now how does rsync work? It does a linear read of each file. All of 
Aardvark, then all of Badger (etc) to create the progressive checksum 
stream that it uses to determine if a block needs to be transmitted or not.
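That checksum pass can be illustrated with a minimal sketch of the rsync-style weak rolling checksum (this is the textbook two-part sum; the real rsync implementation differs in details, and also uses a strong hash alongside it):

```python
M = 1 << 16

def weak_sum(block):
    """rsync-style weak checksum: two 16-bit sums, cheap to compute
    and cheap to 'roll' one byte at a time along a stream."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, blocklen):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - blocklen * out_byte + a) % M
    return a, b

data = bytes(range(50)) * 3
BLK = 16
a, b = weak_sum(data[0:BLK])
for k in range(1, 20):
    a, b = roll(a, b, data[k - 1], data[k - 1 + BLK], BLK)
    assert (a, b) == weak_sum(data[k:k + BLK])  # rolled == recomputed
print("rolling checksum matches direct computation")
```

The cheap roll is why rsync must read every byte of every file linearly, which is exactly the access pattern the fragmentation above punishes.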

Now if we start "aging off" (and deleting) snapshots, we start realizing 
the holes in the oldest copies of Badger. There is a very high 
probability that the next chunk of Aardvark is going to end up somewhere 
in Badger-of-day-one. Worse still, some parts of Badger are going to end 
up in Badger-of-day-one but nicely out of order.

At this point the model starts to get too complex for my understanding 
(I don't know how BTRFS selects which data extent to put any one chunk 
of data in relative to the rest of the file contents, or whether it tries 
to fill the fullest chunk, the least-full chunk, or does some 
other best-fit for this case, so I have to stop that half of the example 
there.)

Additionally: After (N*log(N))^2 days (where I think N is 5) [because of 
fair randomness] {so just shy of two months?} there is a high 
probability that no _current_ part of Badger is still mapped to data 
extent 1. But it is still impossible for snapshot removal to result in a 
reclaim of data extent 1... Aardvark's first block is there forever.
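A back-of-envelope check of that probability, under the same toy assumptions (5 of 100 blocks rewritten fairly at random each day, blocks treated as roughly independent): the median lands around three months, so the same ballpark as the guess above.

```python
import math

# P(a given day-1 block of Badger is never rewritten in d days) = 0.95**d.
# Treating the 100 blocks as roughly independent, the median day on which
# *no* current Badger block remains in extent 1 solves:
#     (1 - 0.95**d)**100 = 0.5
target = 1 - 0.5 ** (1 / 100)   # required per-block survival prob at the median
d = math.log(target) / math.log(95 / 100)
print(f"median days until extent 1 holds no current Badger block: {d:.0f}")
```

Either way the punchline stands: long before then, extent 1 is pinned by Aardvark's first block and can never be reclaimed by snapshot deletion.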

Now compare this to doing the copy.

A linear write of a file is supposed to be (if I understand what I'm 
reading here) laid out as closely as possible to a linear extent on the 
disk. Not guaranteed, but it's a goal.  This would be "more true" if the 
application doing the writing called fallocate(). [I don't know if rsync 
does fallocate(), I'm just saying.]
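For what it's worth, here is what preallocation looks like from an application, sketched with Python's posix_fallocate wrapper (Linux/Unix only; whether any given rsync build does the equivalent, e.g. via the --preallocate option in newer versions, is a separate question):

```python
import os
import tempfile

# An application that knows a file's final size can reserve the space up
# front, giving the allocator its best chance at a contiguous extent.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    # reserve 1MiB starting at offset 0; the file size grows to match
    os.posix_fallocate(f.fileno(), 0, 1 << 20)

print("allocated size:", os.stat(path).st_size)
os.remove(path)
```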

So now on day one, Aardvark is one 1MiB chunk in Data extent 1, followed 
by all of Badger.

On day two Aardvark is one 2MiB chunk in Data extent 2, followed by all 
of Badger.

(magical adjustments take place in the source data stream so that we are 
still, by incredible coincidence, using up exactly one extent every day. 
[it's like one of those physics problems where we get to ignore 
friction. 8-)])

On every rsync pass, both the growing Aardvark and the active working 
set of Badger are available as linear reads while making the rolling 
checksums.

If Aardvark and/or Badger need to be used for any purpose from one or 
more of the snapshots, they will also benefit from locality and linear 
read optimization.

When we get around to deleting the first snapshot, all of the active 
parts of Aardvark and Badger are long gone (and since this is magical 
fairy land, data extent one is reclaimed!).

---

How realistic is this? Well clearly magical fairies were involved in the 
making of this play. But the role of Badger will be played by a database 
tablespace, and his friend Aardvark will be played by the associated 
update journal. Meaning that both of those file behaviors are 
real-world examples (notwithstanding the cartoonish monotonic update 
profile).

And _clearly_ once you start deleting older snapshots the orderly 
picture would fall apart piecewise.

Then again, according to grep, my /usr/bin/rsync contains the string 
"fallocate". Not a guarantee it's being used, but a strong indicator. 
Any use of fallocate tends to imply that later defrag would not change 
efficiency, so there's another task you wouldn't need to undertake.

So it's a classic trade-off of efficiencies of space vs order.

Once you achieve your dynamic balance, with whole copies of things 
tending to find their homes, your _overall_ performance should become 
more stable over time as it bounces back and forth about a mean 
performance for the first few cycles (a cycle being completed when the 
snapshot is deleted).

Right now you are reporting that it was becoming less stable over time.

So there is your deal right there.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 14:37       ` 3.18.0: kernel BUG at fs/btrfs/relocation.c:242! Tomasz Chmielewski
  2014-12-12 21:36         ` Robert White
@ 2014-12-15 20:07         ` Josef Bacik
  2014-12-15 23:27           ` Tomasz Chmielewski
  2014-12-19 21:47         ` Josef Bacik
  2 siblings, 1 reply; 25+ messages in thread
From: Josef Bacik @ 2014-12-15 20:07 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On 12/12/2014 09:37 AM, Tomasz Chmielewski wrote:
> FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).
>
> # time btrfs balance start /mnt/lxc2
> Segmentation fault
>
> real    322m32.153s
> user    0m0.000s
> sys     16m0.930s
>
>

Sorry Tomasz, you are now at the top of the list.  I assume the images 
you sent me before are still good for reproducing this?  Thanks,

Josef


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-15 20:07         ` Josef Bacik
@ 2014-12-15 23:27           ` Tomasz Chmielewski
  0 siblings, 0 replies; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-15 23:27 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 2014-12-15 21:07, Josef Bacik wrote:
> On 12/12/2014 09:37 AM, Tomasz Chmielewski wrote:
>> FYI, still seeing this with 3.18 (scrub passes fine on this 
>> filesystem).
>> 
>> # time btrfs balance start /mnt/lxc2
>> Segmentation fault
>> 
>> real    322m32.153s
>> user    0m0.000s
>> sys     16m0.930s
>> 
>> 
> 
> Sorry Tomasz, you are now at the top of the list.  I assume the images
> you sent me before are still good for reproducing this?  Thanks,

I've sent you two URLs back then; they should still work. One of these 
filesystems did not crash the 3.18.0 kernel anymore (though there were 
many files changed / added / removed since I uploaded the images); 
the other still did.


Tomasz


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-12 14:37       ` 3.18.0: kernel BUG at fs/btrfs/relocation.c:242! Tomasz Chmielewski
  2014-12-12 21:36         ` Robert White
  2014-12-15 20:07         ` Josef Bacik
@ 2014-12-19 21:47         ` Josef Bacik
  2014-12-19 23:18           ` Tomasz Chmielewski
  2 siblings, 1 reply; 25+ messages in thread
From: Josef Bacik @ 2014-12-19 21:47 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs

On 12/12/2014 09:37 AM, Tomasz Chmielewski wrote:
> FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).
>
> # time btrfs balance start /mnt/lxc2
> Segmentation fault
>

Ok now I remember why I haven't fixed this yet, the images you gave me 
restore but then they don't mount because the extent tree is corrupted 
for some reason.  Could you re-image this fs and send it to me and I 
promise to spend all of my time on the problem until it's fixed.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!
  2014-12-19 21:47         ` Josef Bacik
@ 2014-12-19 23:18           ` Tomasz Chmielewski
  0 siblings, 0 replies; 25+ messages in thread
From: Tomasz Chmielewski @ 2014-12-19 23:18 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On 2014-12-19 22:47, Josef Bacik wrote:
> On 12/12/2014 09:37 AM, Tomasz Chmielewski wrote:
>> FYI, still seeing this with 3.18 (scrub passes fine on this 
>> filesystem).
>> 
>> # time btrfs balance start /mnt/lxc2
>> Segmentation fault
>> 
> 
> Ok now I remember why I haven't fixed this yet, the images you gave me
> restore but then they don't mount because the extent tree is corrupted
> for some reason.  Could you re-image this fs and send it to me and I
> promise to spend all of my time on the problem until it's fixed.

(un)fortunately one filesystem stopped crashing on balance with some 
kernel update, and the other one I had crashing on balance was fixed 
with btrfs - so I'm not able to reproduce anymore / produce an image 
which is crashing.

-- 
Tomasz Chmielewski
http://www.sslrack.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-12-19 23:18 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-02  7:27 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931! Tomasz Chmielewski
2014-10-03 18:17 ` Josef Bacik
2014-10-03 22:06   ` Tomasz Chmielewski
2014-10-03 22:09     ` Josef Bacik
2014-10-04 21:47       ` Tomasz Chmielewski
2014-10-04 22:07         ` Josef Bacik
2014-11-25 22:33     ` Tomasz Chmielewski
2014-12-12 14:37       ` 3.18.0: kernel BUG at fs/btrfs/relocation.c:242! Tomasz Chmielewski
2014-12-12 21:36         ` Robert White
2014-12-12 21:46           ` Tomasz Chmielewski
2014-12-12 22:34             ` Robert White
2014-12-12 22:46               ` Tomasz Chmielewski
2014-12-12 22:58                 ` Robert White
2014-12-13  8:16                   ` Tomasz Chmielewski
2014-12-13  9:39                     ` Robert White
2014-12-13 13:53                       ` Tomasz Chmielewski
2014-12-13 20:54                         ` Robert White
2014-12-13 21:52                           ` Tomasz Chmielewski
2014-12-13 23:56                             ` Robert White
2014-12-14  8:45                               ` Robert White
2014-12-15 20:07         ` Josef Bacik
2014-12-15 23:27           ` Tomasz Chmielewski
2014-12-19 21:47         ` Josef Bacik
2014-12-19 23:18           ` Tomasz Chmielewski
2014-10-13 15:15 ` 3.17.0-rc7: kernel BUG at fs/btrfs/relocation.c:931! Rich Freeman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).