linux-btrfs.vger.kernel.org archive mirror
* Periodic kernel freezes
@ 2015-10-30 16:25 Alex Adriaanse
From: Alex Adriaanse @ 2015-10-30 16:25 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 2504 bytes --]

I have an EC2 instance on AWS that tends to freeze several times per week. When it freezes it stops responding to network traffic, disk I/O stops, and CPU goes to 100%. The system comes back fine after a reboot. I was finally able to get a kernel backtrace from when this happened today, which I have attached to this email.

The VM in question runs Debian Jessie, and has 3 BTRFS filesystems, including the root filesystem. Details are included below.

Any ideas?

Thanks,

Alex



# uname -a
Linux prod-docker-1-a 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux

# btrfs --version
Btrfs v3.17

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda       8.0G  1.3G  6.4G  17% /
udev             10M     0   10M   0% /dev
tmpfs           3.0G  8.6M  3.0G   1% /run
tmpfs           7.5G   12K  7.5G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.5G     0  7.5G   0% /sys/fs/cgroup
/dev/xvdb        50G  3.9G   45G   9% /var/lib/docker
/dev/xvdc       200G   70G  130G  35% /srv/volumes


# btrfs fi show
Label: none  uuid: 8a293966-5c19-485c-a819-a6b801a1085d
	Total devices 1 FS bytes used 1.21GiB
	devid    1 size 8.00GiB used 3.28GiB path /dev/xvda

Label: 'docker'  uuid: 5bf935e0-4519-43d9-b2e9-b3fb19374b72
	Total devices 1 FS bytes used 3.70GiB
	devid    1 size 50.00GiB used 6.04GiB path /dev/xvdb

Label: 'volumes'  uuid: 2d121370-7879-4485-8fd5-1fe0db5a0c12
	Total devices 1 FS bytes used 68.82GiB
	devid    1 size 200.00GiB used 124.04GiB path /dev/xvdc

Btrfs v3.17


# btrfs fi df /
Data, single: total=2.85GiB, used=1.17GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=204.75MiB, used=38.03MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

# btrfs fi df /var/lib/docker
Data, single: total=4.01GiB, used=3.52GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=179.58MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=64.00MiB, used=0.00B

# btrfs fi df /srv/volumes
Data, single: total=122.01GiB, used=68.55GiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=1.00GiB, used=277.20MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=96.00MiB, used=0.00B

[-- Attachment #2: btrfs-kernel-backtrace.txt --]
[-- Type: text/plain, Size: 9135 bytes --]

[344317.872151] ------------[ cut here ]------------
[344317.876091] kernel BUG at /build/linux-xkTWug/linux-3.16.7-ckt11/mm/page_alloc.c:1011!
[344317.876091] invalid opcode: 0000 [#1] SMP 
[344317.876091] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc crc32_pclmul ppdev ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev psmouse serio_raw parport_pc parport ttm drm_kms_helper drm i2c_piix4 i2c_core processor thermal_sys button autofs4 btrfs xor raid6_pq ata_generic xen_blkfront crct10dif_pclmul crct10dif_common crc32c_intel ata_piix libata scsi_mod ixgbevf(O)
[344317.876091] CPU: 0 PID: 9842 Comm: kworker/u30:7 Tainted: G           O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u5
[344317.876091] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
[344317.876091] Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
[344317.876091] task: ffff8800eb30b630 ti: ffff880001a08000 task.ti: ffff880001a08000
[344317.876091] RIP: 0010:[<ffffffff811421e7>]  [<ffffffff811421e7>] move_freepages+0x107/0x110
[344317.876091] RSP: 0018:ffff880001a0b918  EFLAGS: 00010006
[344317.876091] RAX: ffff8803e08fb000 RBX: 0000000000000000 RCX: 0000000000000001
[344317.876091] RDX: ffffea000d922fc8 RSI: ffffea000d91c000 RDI: ffff8803e08fbe00
[344317.876091] RBP: 0000000000000001 R08: ffff8803e08fbe00 R09: 0000000000000000
[344317.876091] R10: 0000000000000000 R11: ffff8803e08fbeb0 R12: ffffea000d91cbd0
[344317.876091] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8803e08fbe00
[344317.876091] FS:  0000000000000000(0000) GS:ffff8803e0400000(0000) knlGS:0000000000000000
[344317.876091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[344317.876091] CR2: 00007fd0a085fc00 CR3: 000000035f1d5000 CR4: 00000000001406f0
[344317.876091] Stack:
[344317.876091]  ffffffff81143c1c 0000000000000000 00000002115a8000 ffffea000d91cbf0
[344317.876091]  ffff8803e08fbe90 ffff8803e0412f78 ffff8800eb30b698 ffff8803e08fbe00
[344317.876091]  000000000000001f 0000000000000000 0000000000000001 000000000000001a
[344317.876091] Call Trace:
[344317.876091]  [<ffffffff81143c1c>] ? __rmqueue+0x37c/0x460
[344317.876091]  [<ffffffff81145e15>] ? get_page_from_freelist+0x685/0x910
[344317.876091]  [<ffffffff8114620d>] ? __alloc_pages_nodemask+0x16d/0xb30
[344317.876091]  [<ffffffff8114620d>] ? __alloc_pages_nodemask+0x16d/0xb30
[344317.876091]  [<ffffffffa013583a>] ? btrfs_find_space_for_alloc+0x22a/0x270 [btrfs]
[344317.876091]  [<ffffffffa00d6d57>] ? btrfs_update_reserved_bytes+0x37/0x110 [btrfs]
[344317.876091]  [<ffffffff8118ca6b>] ? kmem_getpages+0x5b/0x110
[344317.876091]  [<ffffffff8118dd5b>] ? cache_grow+0x21b/0x240
[344317.876091]  [<ffffffff8118e793>] ? kmem_cache_alloc+0x183/0x450
[344317.876091]  [<ffffffffa010a070>] ? __btrfs_add_ordered_extent+0x40/0x370 [btrfs]
[344317.876091]  [<ffffffffa010a3bd>] ? btrfs_add_ordered_extent+0x1d/0x30 [btrfs]
[344317.876091]  [<ffffffffa00f8b01>] ? cow_file_range+0x231/0x450 [btrfs]
[344317.876091]  [<ffffffffa00f9da9>] ? submit_compressed_extents+0x1d9/0x490 [btrfs]
[344317.876091]  [<ffffffffa00fa060>] ? submit_compressed_extents+0x490/0x490 [btrfs]
[344317.876091]  [<ffffffffa01200ab>] ? normal_work_helper+0x17b/0x290 [btrfs]
[344317.876091]  [<ffffffff81081662>] ? process_one_work+0x172/0x420
[344317.876091]  [<ffffffff81081cf3>] ? worker_thread+0x113/0x4f0
[344317.876091]  [<ffffffff81081be0>] ? rescuer_thread+0x2d0/0x2d0
[344317.876091]  [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[344317.876091]  [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[344317.876091]  [<ffffffff815115d8>] ? ret_from_fork+0x58/0x90
[344317.876091]  [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[344317.876091] Code: e1 4c 89 56 10 4d 63 c1 44 89 c9 4e 8d 0c c5 00 00 00 00 49 c1 e0 06 01 c8 4d 29 c8 4c 01 c6 48 39 f2 73 89 5b 5d 41 5c 41 5d c3 <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 49 b9 00 00 00 00 00 
[344317.876091] RIP  [<ffffffff811421e7>] move_freepages+0x107/0x110
[344317.876091]  RSP <ffff880001a0b918>
[344317.876091] ---[ end trace 337c316a9f97426c ]---
[344318.061974] BUG: unable to handle kernel paging request at ffffffffffffffd8
[344318.065536] IP: [<ffffffff8108850c>] kthread_data+0xc/0x20
[344318.065836] PGD 1816067 PUD 1818067 PMD 0 
[344318.065836] Oops: 0000 [#2] SMP 
[344318.065836] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc crc32_pclmul ppdev ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd evdev psmouse serio_raw parport_pc parport ttm drm_kms_helper drm i2c_piix4 i2c_core processor thermal_sys button autofs4 btrfs xor raid6_pq ata_generic xen_blkfront crct10dif_pclmul crct10dif_common crc32c_intel ata_piix libata scsi_mod ixgbevf(O)
[344318.065836] CPU: 0 PID: 9842 Comm: kworker/u30:7 Tainted: G      D    O  3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u5
[344318.065836] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
[344318.065836] task: ffff8800eb30b630 ti: ffff880001a08000 task.ti: ffff880001a08000
[344318.065836] RIP: 0010:[<ffffffff8108850c>]  [<ffffffff8108850c>] kthread_data+0xc/0x20
[344318.065836] RSP: 0018:ffff880001a0b6b8  EFLAGS: 00010002
[344318.065836] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000f
[344318.065836] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8800eb30b630
[344318.065836] RBP: ffff8800eb30b630 R08: 0000000000000001 R09: 000000000000b66e
[344318.065836] R10: 000000000000002f R11: ffff8803cd30002f R12: ffff8803e0412f00
[344318.065836] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800eb30b630
[344318.065836] FS:  0000000000000000(0000) GS:ffff8803e0400000(0000) knlGS:0000000000000000
[344318.065836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[344318.065836] CR2: 0000000000000028 CR3: 000000035f1d5000 CR4: 00000000001406f0
[344318.065836] Stack:
[344318.065836]  ffffffff8108213d ffff8800eb30ba90 ffffffff8150db8d 0000000000012f00
[344318.065836]  ffff880001a0bfd8 0000000000012f00 ffff8800eb30b630 ffff8800eb30bcb0
[344318.065836]  ffff8800eb30b988 ffff8800eb30b620 ffff8803de5752b0 ffff8800eb30b620
[344318.065836] Call Trace:
[344318.065836]  [<ffffffff8108213d>] ? wq_worker_sleeping+0xd/0x80
[344318.065836]  [<ffffffff8150db8d>] ? __schedule+0x45d/0x710
[344318.065836]  [<ffffffff81069f3f>] ? do_exit+0x6df/0xa50
[344318.065836]  [<ffffffff81016397>] ? oops_end+0x97/0xe0
[344318.065836]  [<ffffffff810137f0>] ? do_error_trap+0x70/0xe0
[344318.065836]  [<ffffffff811421e7>] ? move_freepages+0x107/0x110
[344318.065836]  [<ffffffffa00d6d57>] ? btrfs_update_reserved_bytes+0x37/0x110 [btrfs]
[344318.065836]  [<ffffffff815130be>] ? invalid_op+0x1e/0x30
[344318.065836]  [<ffffffff811421e7>] ? move_freepages+0x107/0x110
[344318.065836]  [<ffffffff81143c1c>] ? __rmqueue+0x37c/0x460
[344318.065836]  [<ffffffff81145e15>] ? get_page_from_freelist+0x685/0x910
[344318.065836]  [<ffffffff8114620d>] ? __alloc_pages_nodemask+0x16d/0xb30
[344318.065836]  [<ffffffff8114620d>] ? __alloc_pages_nodemask+0x16d/0xb30
[344318.065836]  [<ffffffffa013583a>] ? btrfs_find_space_for_alloc+0x22a/0x270 [btrfs]
[344318.065836]  [<ffffffffa00d6d57>] ? btrfs_update_reserved_bytes+0x37/0x110 [btrfs]
[344318.065836]  [<ffffffff8118ca6b>] ? kmem_getpages+0x5b/0x110
[344318.065836]  [<ffffffff8118dd5b>] ? cache_grow+0x21b/0x240
[344318.065836]  [<ffffffff8118e793>] ? kmem_cache_alloc+0x183/0x450
[344318.065836]  [<ffffffffa010a070>] ? __btrfs_add_ordered_extent+0x40/0x370 [btrfs]
[344318.065836]  [<ffffffffa010a3bd>] ? btrfs_add_ordered_extent+0x1d/0x30 [btrfs]
[344318.065836]  [<ffffffffa00f8b01>] ? cow_file_range+0x231/0x450 [btrfs]
[344318.065836]  [<ffffffffa00f9da9>] ? submit_compressed_extents+0x1d9/0x490 [btrfs]
[344318.065836]  [<ffffffffa00fa060>] ? submit_compressed_extents+0x490/0x490 [btrfs]
[344318.065836]  [<ffffffffa01200ab>] ? normal_work_helper+0x17b/0x290 [btrfs]
[344318.065836]  [<ffffffff81081662>] ? process_one_work+0x172/0x420
[344318.065836]  [<ffffffff81081cf3>] ? worker_thread+0x113/0x4f0
[344318.065836]  [<ffffffff81081be0>] ? rescuer_thread+0x2d0/0x2d0
[344318.065836]  [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[344318.065836]  [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[344318.065836]  [<ffffffff815115d8>] ? ret_from_fork+0x58/0x90
[344318.065836]  [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[344318.065836] Code: 08 04 00 00 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 08 04 00 00 <48> 8b 40 d8 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 
[344318.065836] RIP  [<ffffffff8108850c>] kthread_data+0xc/0x20
[344318.065836]  RSP <ffff880001a0b6b8>
[344318.065836] CR2: ffffffffffffffd8
[344318.065836] ---[ end trace 337c316a9f97426d ]---
[344318.065836] Fixing recursive fault but reboot is needed!


* Re: Periodic kernel freezes
@ 2015-10-30 20:06 ` David Goodwin
From: David Goodwin @ 2015-10-30 20:06 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org


On 30/10/2015 16:25, Alex Adriaanse wrote:
> I have an EC2 instance on AWS that tends to freeze several times per
> week. When it freezes it stops responding to network traffic, disk
> I/O stops, and CPU goes to 100%. The system comes back fine after a
> reboot. I was finally able to get a kernel backtrace from when this
> happened today, which I have attached to this email.
>
> The VM in question runs Debian Jessie, and has 3 BTRFS filesystems,
> including the root filesystem. Details are included below.
>
> Any ideas?
>

Hi Alex -

I kept experiencing problems with the Jessie 3.16.x kernel on EC2 (and 
elsewhere) with BTRFS.

Out of 8 nodes, one managed an uptime of 90 days, while the average was 
about 21 days.

Crashes were seemingly random, and it was difficult to get stack traces.

For the stack traces I did get, it wasn't always obvious that the 
problem lay with BTRFS.
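
One rough way to see whether a given trace passes through BTRFS is to list the frames the kernel tagged with the module name. The following is only a sketch: it assumes the oops text has been saved to a file, and the function name btrfs_frames is made up for illustration. Frames marked "?" are not guaranteed to be part of the real call chain, so treat the result as a hint, not proof.

```shell
#!/bin/sh
# Hypothetical helper: print the unique btrfs symbols found in a saved oops.
# Argument: path to a text file containing the kernel backtrace.
btrfs_frames() {
    # Module frames end in " [btrfs]"; stack frames also contain "+0x".
    # This skips the "Workqueue:" line, which names btrfs but is not a frame.
    grep -F '[btrfs]' "$1" | grep -F '+0x' |
        # Strip timestamp, address and "?" marker, keep only the symbol name.
        sed -E 's/^.*\] \? ([A-Za-z0-9_.]+)\+0x.*$/\1/' |
        # Drop repeated symbols while preserving first-seen order.
        awk '!seen[$0]++'
}

# Run against a file given on the command line, if one was supplied.
if [ -r "${1:-}" ]; then
    btrfs_frames "$1"
fi
```

Run against the backtrace attached to the original message, this would list symbols such as cow_file_range and submit_compressed_extents, which suggests the crashing worker was writing out compressed extents at the time.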

Reboots normally needed to be forceful.

I'd suggest upgrading to a backports kernel (I compiled various 4.1.x 
kernels, but there's now 4.2.x in jessie-backports).

You might also want to turn off compression...
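
To find out which filesystems are currently mounted with compression, something along these lines may help. It is a sketch, not a tested procedure: the function name btrfs_compressed_mounts is hypothetical, and it assumes the mount table is in the usual /proc/mounts format, where the active compress= or compress-force= option appears in the fourth field.

```shell
#!/bin/sh
# Hypothetical helper: print mount points of btrfs filesystems that have a
# compress= or compress-force= mount option set.
# Optional argument: a file in /proc/mounts format (defaults to /proc/mounts).
btrfs_compressed_mounts() {
    awk '$3 == "btrfs" && $4 ~ /(^|,)compress(-force)?=/ { print $2 }' \
        "${1:-/proc/mounts}"
}

# Only query the live mount table where it exists (i.e. on Linux).
if [ -r /proc/mounts ]; then
    btrfs_compressed_mounts
fi
```

For any mount point it reports, remounting with something like "mount -o remount,compress=no /srv/volumes" should stop new writes from being compressed on kernels that accept compress=no, and the option would also need to be removed from /etc/fstab to persist across reboots. Data that is already compressed on disk stays compressed until it is rewritten.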

David.
