* Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
@ 2014-07-04 1:19 Marc MERLIN
2014-07-04 4:33 ` Russell Coker
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-04 1:19 UTC (permalink / raw)
To: linux-btrfs
I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
running out of memory and deadlocking (panic= doesn't even work).
I downgraded back to 3.14, but I already had the problem once since then.
The OOM killer comes in even though I have 0 swap used and, AFAIK, not all my
RAM is gone; it then fails to kill enough stuff and eventually the box dies like this:
[80943.542209] Swap cache stats: add 814596, delete 814595, find 2567491/2808869
[80943.565106] Free swap = 15612448kB
[80943.577607] Total swap = 15616764kB
[80943.589766] 2021665 pages RAM
[80943.600281] 0 pages HighMem/MovableOnly
[80943.613284] 28468 pages reserved
[80943.624330] 0 pages hwpoisoned
[80943.634824] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[80943.659669] [ 918] 0 918 855 0 5 236 -1000 udevd
[80943.684789] [ 8022] 0 8022 3074 0 5 89 -1000 auditd
[80943.710154] [ 8253] 0 8253 1813 0 6 123 -1000 sshd
[80943.735024] [12001] 0 12001 854 0 5 241 -1000 udevd
[80943.760152] [18969] 0 18969 854 0 5 223 -1000 udevd
[80943.785293] Kernel panic - not syncing: Out of memory and no killable processes...
Here is my more recent capture on 3.14 when I was able to catch it
before the panic and dump a bunch of sysrq data.
http://marc.merlins.org/tmp/btrfs-oom.txt
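For anyone who wants to capture the same kind of state on their own box, a
minimal sketch of a logging loop that can be left running on a console; the
log path and interval are arbitrary, and it assumes sysrq is enabled
(kernel.sysrq=1):

#!/bin/sh
# Snapshot memory state periodically so the last sample before a
# crash survives on disk.
while true; do
    {
        date
        cat /proc/meminfo
        echo m > /proc/sysrq-trigger    # dump Mem-Info into dmesg
        dmesg | tail -n 100
    } >> /var/log/memlog 2>&1
    sync        # push the sample to disk before the box can wedge
    sleep 60
done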
Things to note in that log:
[90621.895715] 2962 total pagecache pages
[90621.895716] 5 pages in swap cache
[90621.895717] Swap cache stats: add 145004, delete 144999, find 3314901/3316382
[90621.895718] Free swap = 15230536kB
[90621.895718] Total swap = 15616764kB
[90621.895719] 2021665 pages RAM
[90621.895720] 0 pages HighMem/MovableOnly
[90621.895720] 28468 pages reserved
I'm not a VM person so I don't know how to read this, but am I out of RAM
but not out of swap (since clearly none was used), or am I out of a specific
memory region that is causing me problems?
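One quick way to check the "specific memory region" theory without waiting
for an OOM dump is to look at the per-zone state directly; a minimal sketch
(the zone names match the DMA/DMA32/Normal zones in the OOM output above):

# Free pages per allocation order in each zone; a zone whose counts
# are all near zero is the one under pressure:
cat /proc/buddyinfo

# Per-zone watermarks and detailed counters, to compare "pages free"
# against the min/low/high watermarks:
cat /proc/zoneinfo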
I'm not 100% certain btrfs is to blame, but it's suspect: upgrading to 3.15
brought btrfs problems, and then my 3.14.0 kernel, which had been running
fine for 3 months, also started dying with the same OOM problems.
Then again, I understand it could be a red herring. Suggestions either way
are appreciated :)
I tried raising this:
gargamel:~# echo 100 > /proc/sys/vm/swappiness
But so far I have too much unused RAM for any swap to be touched.
gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:       7894792    4938596    2956196          0       2396    2909204
-/+ buffers/cache:    2026996    5867796
Swap:     15616764          0   15616764
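Since swap stays untouched, the split that matters here is user vs. kernel
memory rather than RAM vs. swap; a sketch of what to watch (the counter
names are from /proc/meminfo, slabtop comes with procps):

# If Slab/SUnreclaim or VmallocUsed keeps growing while AnonPages
# stays flat, the leak is in the kernel, not user space:
grep -E '^(MemFree|Cached|AnonPages|Slab|SReclaimable|SUnreclaim|VmallocUsed):' /proc/meminfo

# Top slab caches sorted by cache size (one-shot, needs root):
slabtop -o -s c | head -n 20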
The log is too big to paste here, but you can grep it for:
[90817.715833] SysRq : Show Memory
[90893.571151] SysRq : Show backtrace of all active CPUs
[90921.781599] SysRq : Show Blocked State
[91075.976611] SysRq : Show State
[91406.972046] SysRq : Terminate All Tasks
[91410.771584] SysRq : Emergency Remount R/O
[91413.222483] SysRq : Emergency Sync
[91430.316955] SysRq : Power Off
^^^^^^^^^^^^^^^^^^^^^^^
note the kernel was wedged enough that Power Off didn't work, apparently
because kswapd0 itself was stuck:
[91447.490142] CPU: 3 PID: 48 Comm: kswapd0 Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[91447.490143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
[91447.490145] task: ffff8802126b6490 ti: ffff8802126e4000 task.ti: ffff8802126e4000
[91447.490146] RIP: 0010:[<ffffffff810898f5>] [<ffffffff810898f5>] do_raw_spin_lock+0x23/0x27
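A hedged aside on the knobs involved: panic= only fires if the kernel
actually reaches panic(), which a wedged-but-alive box never does. Settings
along these lines (the values are just examples) make more of these states
end in a reboot instead of a hang:

# Reboot 10 seconds after any panic:
echo 10 > /proc/sys/kernel/panic
# Turn 120s-blocked tasks (like the traces below) into panics:
echo 1 > /proc/sys/kernel/hung_task_panic
# Panic instead of OOM-killing (drastic; for debugging only):
echo 1 > /proc/sys/vm/panic_on_oom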
Right after the OOM killer started kicking in, the console showed apparent
deadlocks in btrfs. But is it possible that btrfs is then also eating all my
memory somehow?
You can find the long details here:
http://marc.merlins.org/tmp/btrfs-oom.txt
[90801.680821] INFO: task btrfs-transacti:3433 blocked for more than 120 seconds.
[90801.712345] Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[90801.734394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90801.882691] btrfs-transacti D ffff88021387e800 0 3433 2 0x00000000
[90801.904863] ffff88020b20de10 0000000000000046 ffff88020b20dfd8 ffff88021387e2d0
[90801.928448] 00000000000141c0 ffff88021387e2d0 ffff880211e94800 ffff880029c00dc0
[90801.952015] ffff8802009d1f28 ffff8802009d1ed0 0000000000000000 ffff88020b20de20
[90801.975701] Call Trace:
[90801.984443] [<ffffffff8160d2a1>] schedule+0x73/0x75
[90802.000438] [<ffffffff8122b575>] btrfs_commit_transaction+0x330/0x849
[90802.021140] [<ffffffff81085116>] ? finish_wait+0x65/0x65
[90802.038438] [<ffffffff81227c48>] transaction_kthread+0xf8/0x1ab
[90802.057571] [<ffffffff81227b50>] ? btrfs_cleanup_transaction+0x43f/0x43f
[90802.079092] [<ffffffff8106bc62>] kthread+0xae/0xb6
[90802.094838] [<ffffffff8106bbb4>] ? __kthread_parkme+0x61/0x61
[90802.113485] [<ffffffff8161637c>] ret_from_fork+0x7c/0xb0
[90802.130821] [<ffffffff8106bbb4>] ? __kthread_parkme+0x61/0x61
[90802.153881] INFO: task bash:6863 blocked for more than 120 seconds.
[90802.174582] Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[90802.195291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90802.220747] bash D ffff88016177e780 0 6863 6862 0x20120086
[90802.268599] ffff88006f7eb918 0000000000000046 ffff88006f7ebfd8 ffff88016177e250
[90802.292124] 00000000000141c0 ffff88016177e250 ffff88006f7eb980 ffff8801ad7a484c
[90802.315631] 0000000001ddc000 ffff88006f7eb998 ffff8801ad7a4830 ffff88006f7eb928
[90802.339144] Call Trace:
[90802.347518] [<ffffffff8160d2a1>] schedule+0x73/0x75
[90802.363446] [<ffffffff81242468>] lock_extent_bits+0x11e/0x195
[90802.382008] [<ffffffff81085116>] ? finish_wait+0x65/0x65
[90802.399262] [<ffffffff81238674>] lock_and_cleanup_extent_if_need+0x6c/0x198
[90802.421485] [<ffffffff81239a21>] __btrfs_buffered_write+0x1e9/0x427
[90802.441622] [<ffffffff8160dd00>] ? mutex_unlock+0x16/0x18
[90802.459170] [<ffffffff8123a028>] btrfs_file_aio_write+0x3c9/0x4b7
[90802.478803] [<ffffffff811552b8>] do_sync_write+0x59/0x78
[90802.496102] [<ffffffff810b3677>] do_acct_process+0x314/0x39c
[90802.514452] [<ffffffff810b3ca7>] acct_process+0x77/0x92
[90802.531520] [<ffffffff810520f4>] do_exit+0x3a0/0x938
[90802.547785] [<ffffffff812986dd>] ? security_task_wait+0x16/0x18
[90802.566914] [<ffffffff8105c571>] ? __dequeue_signal+0x1c/0x101
[90802.585790] [<ffffffff81052731>] do_group_exit+0x6f/0xa2
[90802.603094] [<ffffffff8105ed02>] get_signal_to_deliver+0x4e6/0x533
[90802.623011] [<ffffffff8100f38f>] do_signal+0x49/0x5dc
[90802.639553] [<ffffffff810b6499>] ? C_SYSC_wait4+0x28/0xba
[90802.657105] [<ffffffff813b6353>] ? tty_ldisc_deref+0x16/0x18
[90802.675433] [<ffffffff8115d770>] ? path_put+0x1e/0x21
[90802.691916] [<ffffffff8100f957>] do_notify_resume+0x35/0x72
[90802.709952] [<ffffffff816166ea>] int_signal+0x12/0x17
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04  1:19 Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14? Marc MERLIN
@ 2014-07-04  4:33 ` Russell Coker
  2014-07-04  6:04   ` Marc MERLIN
  2014-07-04  6:23   ` Satoru Takeuchi
  2014-07-05 13:47 ` Andrew E. Mileski
  2014-07-05 14:27 ` Andrew E. Mileski
  2 siblings, 2 replies; 23+ messages in thread
From: Russell Coker @ 2014-07-04 4:33 UTC (permalink / raw)
To: Marc MERLIN; +Cc: linux-btrfs

On Thu, 3 Jul 2014 18:19:38 Marc MERLIN wrote:
> I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
> running out of memory and deadlocking (panic= doesn't even work).
> I downgraded back to 3.14, but I already had the problem once since then.

Is there any correlation between such problems and BTRFS operations such as
creating snapshots or running a scrub/balance?

Back in ~3.10 days I had serious problems with BTRFS memory use when
removing multiple snapshots or balancing. But at about 3.13 they all seemed
to get fixed.

I usually didn't have a kernel panic when I had such problems (although I
sometimes had a system lock up solid, such that I couldn't even determine
what its problem was). Usually the OOM handler started killing big
processes such as chromium when it shouldn't have needed to.

Note that I haven't verified that BTRFS memory use is reasonable in all
such situations, merely that it doesn't use enough to kill my systems.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04  4:33 ` Russell Coker
@ 2014-07-04  6:04   ` Marc MERLIN
  2014-07-04  6:23   ` Satoru Takeuchi
  1 sibling, 0 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-04 6:04 UTC (permalink / raw)
To: Russell Coker; +Cc: linux-btrfs

On Fri, Jul 04, 2014 at 02:33:06PM +1000, Russell Coker wrote:
> On Thu, 3 Jul 2014 18:19:38 Marc MERLIN wrote:
> > I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
> > running out of memory and deadlocking (panic= doesn't even work).
> > I downgraded back to 3.14, but I already had the problem once since then.
>
> Is there any correlation between such problems and BTRFS operations such as
> creating snapshots or running a scrub/balance?

So far I don't have timing data on when it happened, but I'm now running
new code to do better logging, which will hopefully give me a hint of when
this is happening. However, since the problem takes 6 to 48H to trigger,
it'll be a little while before I can report back.

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04  4:33 ` Russell Coker
  2014-07-04  6:04   ` Marc MERLIN
@ 2014-07-04  6:23   ` Satoru Takeuchi
  2014-07-04 14:24     ` Marc MERLIN
  1 sibling, 1 reply; 23+ messages in thread
From: Satoru Takeuchi @ 2014-07-04 6:23 UTC (permalink / raw)
To: russell, Marc MERLIN; +Cc: linux-btrfs

Hi,

(2014/07/04 13:33), Russell Coker wrote:
> On Thu, 3 Jul 2014 18:19:38 Marc MERLIN wrote:
>> I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
>> running out of memory and deadlocking (panic= doesn't even work).
>> I downgraded back to 3.14, but I already had the problem once since then.
>
> Is there any correlation between such problems and BTRFS operations such as
> creating snapshots or running a scrub/balance?

Were you running scrub, Marc?

http://marc.merlins.org/tmp/btrfs-oom.txt:
===
...
[90621.895922] [ 8034]     0  8034     1315      164       5       46             0 btrfs-scrub
...
===

In this case, you may be hitting a kernel memory leak. However, I can't
tell what the root cause is from this log alone.

Marc, did you change
- software and its settings,
- operations,
- hardware configuration,
or anything else just before detecting the first OOM?

You have 8GB RAM and there is plenty of swap space.

===============================================================================
[90621.895719] 2021665 pages RAM
...
[90621.895718] Free swap  = 15230536kB
===============================================================================

Here is the available memory at each OOM-killer invocation.

1st OOM:
===============================================================================
[90622.074758] Out of memory: Kill process 11452 (mh) score 2 or sacrifice child
[90622.074760] Killed process 11452 (mh) total-vm:66208kB, anon-rss:0kB, file-rss:872kB
[90622.425826] rfx-xpl-static invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
                                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~

It failed to acquire an order=0 (2^0=1) page, so this is not a
kernel-memory-fragmentation case. Since __GFP_IO(0x40) and __GFP_FS(0x80)
are set in gfp_mask, it can swap out anon/file pages to swap/filesystems to
free up memory.

[90622.425829] rfx-xpl-static cpuset=/ mems_allowed=0
[90622.425832] CPU: 2 PID: 748 Comm: rfx-xpl-static Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[90622.425833] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
[90622.425834]  0000000000000000 ffff8801414a79d8 ffffffff8160a06d ffff8801434b2050
[90622.425838]  ffff8801414a7a68 ffffffff81607078 0000000000000000 ffffffff8160dd00
[90622.425841]  ffff8801414a7a08 ffffffff810501b4 ffff8801414a7a48 ffffffff8109cb05
[90622.425844] Call Trace:
[90622.425846]  [<ffffffff8160a06d>] dump_stack+0x4e/0x7a
[90622.425851]  [<ffffffff81607078>] dump_header+0x7f/0x206
[90622.425854]  [<ffffffff8160dd00>] ? mutex_unlock+0x16/0x18
[90622.425857]  [<ffffffff810501b4>] ? put_online_cpus+0x6c/0x6e
[90622.425861]  [<ffffffff8109cb05>] ? rcu_oom_notify+0xb3/0xc6
[90622.425865]  [<ffffffff81101a7f>] oom_kill_process+0x6e/0x30e
[90622.425869]  [<ffffffff811022b6>] out_of_memory+0x42e/0x461
[90622.425872]  [<ffffffff81106dfe>] __alloc_pages_nodemask+0x673/0x854
[90622.425876]  [<ffffffff8113b654>] alloc_pages_vma+0xd1/0x116
[90622.425880]  [<ffffffff81130ab7>] read_swap_cache_async+0x74/0x13b
[90622.425883]  [<ffffffff81130cc1>] swapin_readahead+0x143/0x152
[90622.425886]  [<ffffffff810fede9>] ? find_get_page+0x69/0x75
[90622.425889]  [<ffffffff81122adf>] handle_mm_fault+0x56b/0x9b0
[90622.425892]  [<ffffffff81612de6>] __do_page_fault+0x381/0x3cd
[90622.425895]  [<ffffffff81078cfc>] ? wake_up_state+0x12/0x12
[90622.425899]  [<ffffffff8115d770>] ? path_put+0x1e/0x21
[90622.425903]  [<ffffffff81612e57>] do_page_fault+0x25/0x27
[90622.425906]  [<ffffffff816103f8>] page_fault+0x28/0x30
[90622.425910] Mem-Info:
[90622.425910] Node 0 DMA per-cpu:
[90622.425913] CPU    0: hi:    0, btch:   1 usd:   0
[90622.425914] CPU    1: hi:    0, btch:   1 usd:   0
[90622.425915] CPU    2: hi:    0, btch:   1 usd:   0
[90622.425916] CPU    3: hi:    0, btch:   1 usd:   0
[90622.425916] Node 0 DMA32 per-cpu:
[90622.425919] CPU    0: hi:  186, btch:  31 usd:  24
[90622.425920] CPU    1: hi:  186, btch:  31 usd:   1
[90622.425921] CPU    2: hi:  186, btch:  31 usd:   0
[90622.425922] CPU    3: hi:  186, btch:  31 usd:   0
[90622.425923] Node 0 Normal per-cpu:
[90622.425924] CPU    0: hi:  186, btch:  31 usd:   0
[90622.425925] CPU    1: hi:  186, btch:  31 usd:   0
[90622.425926] CPU    2: hi:  186, btch:  31 usd:   0
[90622.425928] CPU    3: hi:  186, btch:  31 usd:   0
[90622.425932] active_anon:57 inactive_anon:92 isolated_anon:0
[90622.425932]  active_file:987 inactive_file:1232 isolated_file:0
[90622.425932]  unevictable:1389 dirty:590 writeback:1 unstable:0
[90622.425932]  free:25102 slab_reclaimable:9147 slab_unreclaimable:30944

There are few anon/file pages, in other words few reclaimable pages.
The system would be almost full of kernel memory. As I said, a kernel
memory leak would explain this.

[90622.425932]  mapped:771 shmem:104 pagetables:1487 bounce:0
[90622.425932]  free_cma:0
[90622.425933] Node 0 DMA free:15360kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
                                                               ~~~~~~~~~~~~~~~~~~~~~~

"all_unreclaimable? == yes" means "page reclaim did its best and there
is nothing more it can do".

[90622.425940] lowmem_reserve[]: 0 3204 7691 7691
[90622.425943] Node 0 DMA32 free:45816kB min:28100kB low:35124kB high:42148kB active_anon:0kB inactive_anon:88kB active_file:1336kB inactive_file:1624kB unevictable:1708kB isolated(anon):0kB isolated(file):0kB present:3362328kB managed:3284952kB mlocked:1708kB dirty:244kB writeback:0kB mapped:964kB shmem:0kB slab_reclaimable:128kB slab_unreclaimable:4712kB kernel_stack:1824kB pagetables:2096kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4807 all_unreclaimable? yes
[90622.425950] lowmem_reserve[]: 0 0 4486 4486
[90622.425953] Node 0 Normal free:39232kB min:39348kB low:49184kB high:59020kB active_anon:228kB inactive_anon:280kB active_file:2612kB inactive_file:3304kB unevictable:3848kB isolated(anon):0kB isolated(file):0kB present:4708352kB managed:4594480kB mlocked:3848kB dirty:2116kB writeback:4kB mapped:2120kB shmem:416kB slab_reclaimable:36460kB slab_unreclaimable:119064kB kernel_stack:2040kB pagetables:3852kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:9683 all_unreclaimable? yes
[90622.425959] lowmem_reserve[]: 0 0 0 0
[90622.425962] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15360kB
[90622.425973] Node 0 DMA32: 10492*4kB (UEM) 2*8kB (U) 0*16kB 0*32kB 1*64kB (R) 1*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 0*4096kB = 46016kB
[90622.425985] Node 0 Normal: 8763*4kB (UEM) 33*8kB (UE) 2*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 39444kB
[90622.425997] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[90622.425998] 3257 total pagecache pages
[90622.425999] 53 pages in swap cache
[90622.426000] Swap cache stats: add 145114, delete 145061, find 3322456/3324032
[90622.426001] Free swap  = 15277320kB
[90622.426002] Total swap = 15616764kB
[90622.426002] 2021665 pages RAM
[90622.426003] 0 pages HighMem/MovableOnly
[90622.426004] 28468 pages reserved
[90622.426004] 0 pages hwpoisoned
[90622.426005] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[90622.426011] [  917]     0   917      754       96       5      135         -1000 udevd
[90622.426014] [ 1634]     0  1634      592       81       5       50             0 bootlogd
[90622.426016] [ 1635]     0  1635      510       50       4       15             0 startpar
[90622.426020] [ 4336]     0  4336     1257      153       5      260             0 pinggw
[90622.426022] [ 7130]     0  7130      677       99       5       57             0 rpcbind
[90622.426024] [ 7160]   122  7160      746      152       5      100             0 rpc.statd
[90622.426026] [ 7195]     0  7195      757       74       5       44             0 rpc.idmapd
[90622.426028] [ 7604]     0  7604      753       87       5      136         -1000 udevd
[90622.426030] [ 8016]     0  8016      564      144       4       24             0 getty
===============================================================================

All of the processes above use very little memory, because their pages were
already evicted to swap/filesystem beforehand.

Thanks,
Satoru

>
> Back in ~3.10 days I had serious problems with BTRFS memory use when removing
> multiple snapshots or balancing. But at about 3.13 they all seemed to get
> fixed.
>
> I usually didn't have a kernel panic when I had such problems (although I
> sometimes had a system lock up solid such that I couldn't even determine what
> its problem was). Usually the OOM handler started killing big processes such
> as chromium when it shouldn't have needed to.
>
> Note that I haven't verified that BTRFS memory use is reasonable in all
> such situations. Merely that it doesn't use enough to kill my systems.
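Satoru's gfp_mask decoding can be done mechanically; a small sketch with
flag values copied from a 3.14-era include/linux/gfp.h (these values move
around between releases, so check your own tree):

#!/bin/sh
# Usage: gfpdecode 0x200da  -- prints the __GFP_* bits set in the mask.
mask=$(( $1 ))
for pair in 0x01:DMA 0x02:HIGHMEM 0x04:DMA32 0x08:MOVABLE 0x10:WAIT \
            0x20:HIGH 0x40:IO 0x80:FS 0x20000:HARDWALL; do
    bit=$(( ${pair%%:*} ))
    [ $(( mask & bit )) -ne 0 ] && echo "__GFP_${pair##*:}"
done

Run on the mask above, 0x200da comes out as
__GFP_HIGHMEM|__GFP_MOVABLE|__GFP_WAIT|__GFP_IO|__GFP_FS|__GFP_HARDWALL,
i.e. GFP_HIGHUSER_MOVABLE if I read the 3.14 headers right: an ordinary
user-space page allocation, consistent with the swap-in path in the trace.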
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04  6:23 ` Satoru Takeuchi
@ 2014-07-04 14:24   ` Marc MERLIN
  2014-07-04 14:45     ` Russell Coker
  2014-07-04 22:13     ` Duncan
  0 siblings, 2 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-04 14:24 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: russell, linux-btrfs

Thank you for your answer.
I'll put the conclusion and questions at the top for easier reading:

So, should I understand that
1) I have enough RAM in my system, but all of it disappears, apparently
   claimed by the kernel and not released
2) this could be a kernel memory leak in btrfs or somewhere else; there
   is no good way to know

If so, in a case like this, is there additional output I can capture to
figure out how the memory is lost, and help find out which part of the
kernel is eating the memory without releasing it?
While btrfs is likely to blame, for now it's really just a guess and it
would be good to confirm.

On Fri, Jul 04, 2014 at 03:23:41PM +0900, Satoru Takeuchi wrote:
> >Is there any correlation between such problems and BTRFS operations such
> >as creating snapshots or running a scrub/balance?
>
> Were you running scrub, Marc?

Yes, I was, due to the other problem I was discussing on the list. I
wanted to know if scrub would find any problem (it did not).
I think I'll now try to read every file of every filesystem to see what
btrfs does (this will take a while, that's around 100 million files).

But the last times I had this OOM problem with 3.15.1 it was happening
within 6 hours sometimes, and I was not starting scrub every time the
system booted, so scrub may be partially responsible, but it's not the
core problem.
(Also, I run scrub on this system every few weeks and it hadn't OOM
crashed in the past.)

> Marc, did you change
>
> - software and its settings,
> - operations,
> - hardware configuration,
>
> or anything else just before detecting the first OOM?

Those are 3 good questions; I asked myself the same thing.
From what I remember, though, all I did was going from 3.14 to 3.15.
However this machine has many cronjobs, it does rsyncs to and from
remote systems, it has btrfs send/receive going to and from it, and
snapshots every hour.
Those are not new, but if any of them changed in a small way, I guess
they could trigger bugs.

> You have 8GB RAM and there is plenty of swap space.

Correct.

> ===============================================================================
> [90621.895719] 2021665 pages RAM
> ...
> [90621.895718] Free swap  = 15230536kB
> ===============================================================================
>
> Here is the available memory at each OOM-killer invocation.
>
> 1st OOM:
> ===============================================================================
> [90622.074758] Out of memory: Kill process 11452 (mh) score 2 or sacrifice child
> [90622.074760] Killed process 11452 (mh) total-vm:66208kB, anon-rss:0kB, file-rss:872kB
> [90622.425826] rfx-xpl-static invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
>                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> It failed to acquire an order=0 (2^0=1) page, so this is not a
> kernel-memory-fragmentation case. Since __GFP_IO(0x40) and __GFP_FS(0x80)
> are set in gfp_mask, it can swap out anon/file pages to swap/filesystems to
> free up memory.

Thanks for explaining.

> [90622.425932] active_anon:57 inactive_anon:92 isolated_anon:0
> [90622.425932]  active_file:987 inactive_file:1232 isolated_file:0
> [90622.425932]  unevictable:1389 dirty:590 writeback:1 unstable:0
> [90622.425932]  free:25102 slab_reclaimable:9147 slab_unreclaimable:30944
>
> There are few anon/file pages, in other words few reclaimable pages.
> The system would be almost full of kernel memory. As I said, a kernel
> memory leak would explain this.
>
> [90622.425932]  mapped:771 shmem:104 pagetables:1487 bounce:0
> [90622.425932]  free_cma:0
> [90622.425933] Node 0 DMA free:15360kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15360kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>                                                                ~~~~~~~~~~~~~~~~~~~~~~
>
> "all_unreclaimable? == yes" means "page reclaim did its best and there
> is nothing more it can do".

Understood. I moved my question that was here to the top.

Thank you,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
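On the "additional output" question above: a kernel built with
CONFIG_DEBUG_KMEMLEAK=y will track suspected leaks itself and name the
allocation sites, which answers exactly the "which part of the kernel is
eating the memory" question. A minimal sketch, assuming debugfs at the
usual mount point:

# Needs CONFIG_DEBUG_KMEMLEAK=y; has runtime overhead, debug use only.
mount -t debugfs none /sys/kernel/debug 2>/dev/null
echo scan > /sys/kernel/debug/kmemleak    # trigger a scan now
cat /sys/kernel/debug/kmemleak            # unreferenced objects + stacks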
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04 14:24 ` Marc MERLIN
@ 2014-07-04 14:45   ` Russell Coker
  2014-07-04 15:07     ` Marc MERLIN
  1 sibling, 1 reply; 23+ messages in thread
From: Russell Coker @ 2014-07-04 14:45 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Satoru Takeuchi, linux-btrfs

On Fri, 4 Jul 2014 07:24:16 Marc MERLIN wrote:
> On Fri, Jul 04, 2014 at 03:23:41PM +0900, Satoru Takeuchi wrote:
> > > Is there any correlation between such problems and BTRFS operations
> > > such as creating snapshots or running a scrub/balance?
> >
> > Were you running scrub, Marc?
>
> Yes, I was, due to the other problem I was discussing on the list. I
> wanted to know if scrub would find any problem (it did not).
> I think I'll now try to read every file of every filesystem to see what
> btrfs does (this will take a while, that's around 100 million files).
>
> But the last times I had this OOM problem with 3.15.1 it was happening
> within 6 hours sometimes, and I was not starting scrub every time the
> system booted, so scrub may be partially responsible but it's not the
> core problem.

It would be a good idea to run a few scrubs and see if this is a repeatable
problem. If it's a repeatable problem then it's something to fix regardless
of whether it's the only issue you have.

> > Marc, did you change
> >
> > - software and its settings,
> > - operations,
> > - hardware configuration,
> >
> > or anything else just before detecting the first OOM?
>
> Those are 3 good questions; I asked myself the same thing.
> From what I remember, though, all I did was going from 3.14 to 3.15.
> However this machine has many cronjobs, it does rsyncs to and from
> remote systems, it has btrfs send/receive going to and from it, and
> snapshots every hour.
> Those are not new, but if any of them changed in a small way, I guess
> they could trigger bugs.

If a scrub can reliably trigger the problem it would be good to test 3.14
for the same behavior. Knowing whether it's a regression would help the
developers.

> > You have 8GB RAM and there is plenty of swap space.
>
> Correct.

Even without much swap, 8G should be plenty. My main workstation has 4G of
RAM and 6G of swap. I almost never use more than 3G of swap because the
system becomes so slow as to be almost unusable when swap gets to 4G
(Chromium is to blame). However, that is for a 120G non-RAID filesystem.
Presumably a RAID array will need some more kernel memory, and a larger
filesystem will also need a little more, but it still shouldn't be that
much.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
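A reproduction loop along these lines would make the scrub correlation
measurable (the mount point is a placeholder; -B makes scrub block until
the pass finishes):

#!/bin/sh
# Scrub repeatedly and log kernel-memory counters after each pass;
# if SUnreclaim climbs monotonically across passes, scrub is implicated.
for i in 1 2 3; do
    btrfs scrub start -B /mnt/pool || break
    date
    grep -E '^(Slab|SUnreclaim|VmallocUsed):' /proc/meminfo
done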
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04 14:45 ` Russell Coker
@ 2014-07-04 15:07   ` Marc MERLIN
  0 siblings, 0 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-04 15:07 UTC (permalink / raw)
To: Russell Coker; +Cc: Satoru Takeuchi, linux-btrfs

On Sat, Jul 05, 2014 at 12:45:55AM +1000, Russell Coker wrote:
> > But the last times I had this OOM problem with 3.15.1 it was happening
> > within 6 hours sometimes, and I was not starting scrub every time the
> > system booted, so scrub may be partially responsible but it's not the
> > core problem.
>
> It would be a good idea to run a few scrubs and see if this is a repeatable
> problem. If it's a repeatable problem then it's something to fix regardless
> of whether it's the only issue you have.

Fair point. Re-running scrub now. That will take 36H or so :)

> If a scrub can reliably trigger the problem it would be good to test 3.14 for
> the same behavior. Knowing whether it's a regression would help the
> developers.

I'm already back to 3.14. 3.15 was dying about once a day without scrub,
enough that this was causing me real problems (it's a server that is
supposed to do work :) ).

> Even without much swap 8G should be plenty. My main workstation has 4G of
> RAM and 6G of swap. I almost never use more than 3G of swap because the
> system becomes so slow as to be almost unusable when swap gets to 4G
> (Chromium is to blame). However that is for a 120G non-RAID filesystem.
> Presumably a RAID array will need some more kernel memory and a larger
> filesystem will also need a little more, but it still shouldn't be that much.

You're correct. I only have more swap than RAM as a habit, both in case I
ever want to hibernate a system (I don't with this one) and in case some
userland stuff leaks a lot (I've had versions of Xorg leak, and the extra
swap would extend the time before I had to restart X and lose all my window
state :) ).

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04 14:24 ` Marc MERLIN
  2014-07-04 14:45   ` Russell Coker
@ 2014-07-04 22:13   ` Duncan
  1 sibling, 0 replies; 23+ messages in thread
From: Duncan @ 2014-07-04 22:13 UTC (permalink / raw)
To: linux-btrfs

Marc MERLIN posted on Fri, 04 Jul 2014 07:24:16 -0700 as excerpted:

> So, should I understand that
> 1) I have enough RAM in my system, but all of it disappears, apparently
>    claimed by the kernel and not released
> 2) this could be a kernel memory leak in btrfs or somewhere else; there
>    is no good way to know

Generally speaking, if you're running out of RAM and swap is unused, it's
because something is leaking memory that cannot be swapped. The kernel is
the most likely suspect, as kernel memory doesn't swap, especially so if
the OOM-killer kills everything it can and the problem is still there.

So it's almost certainly the kernel, and btrfs is a good candidate, but I'm
not a dev, and at least from that vantage point I didn't see enough
information to conclusively pin it on btrfs. If the devs can't either, then
either turning on additional debugging options or applying further
debugging patches would seem to be in order.

So the above seems correct from here, yes.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04  1:19 Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14? Marc MERLIN
  2014-07-04  4:33 ` Russell Coker
@ 2014-07-05 13:47 ` Andrew E. Mileski
  2014-07-05 14:43   ` Marc MERLIN
  2014-07-05 14:27 ` Andrew E. Mileski
  2 siblings, 1 reply; 23+ messages in thread
From: Andrew E. Mileski @ 2014-07-05 13:47 UTC (permalink / raw)
To: linux-btrfs

On 2014-07-03 9:19 PM, Marc MERLIN wrote:
> I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
> running out of memory and deadlocking (panic= doesn't even work).
> I downgraded back to 3.14, but I already had the problem once since then.

I didn't see any mention of the btrfs utility version in this thread
(I may be blind though).

My server was suffering from frequent panics upon scrub / defrag /
balance, until I updated the btrfs utility. That resolved all my issues.

My server has two filesystems of ~25 TiB, and 4 GB of RAM (15% used).
A hardware upgrade is imminent, just waiting for the backup to complete.

~~ Andrew E. Mileski
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-05 13:47 ` Andrew E. Mileski
@ 2014-07-05 14:43   ` Marc MERLIN
  2014-07-05 15:17     ` Andrew E. Mileski
  2014-07-06 14:58     ` Marc MERLIN
  0 siblings, 2 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-05 14:43 UTC (permalink / raw)
To: Andrew E. Mileski; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5353 bytes --]

On Sat, Jul 05, 2014 at 09:47:09AM -0400, Andrew E. Mileski wrote:
> On 2014-07-03 9:19 PM, Marc MERLIN wrote:
> > I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
> > running out of memory and deadlocking (panic= doesn't even work).
> > I downgraded back to 3.14, but I already had the problem once since then.
>
> I didn't see any mention of the btrfs utility version in this thread
> (I may be blind though).
>
> My server was suffering from frequent panics upon scrub / defrag /
> balance, until I updated the btrfs utility. That resolved all my issues.

Really? The userland tool should only send ioctls to the kernel, I
really can't see how it would cause the kernel code to panic or not.

gargamel:~# btrfs --version
Btrfs v3.14.1
which is the latest in debian unstable.

As an update, after 1.7 days of scrubbing, the system has started getting
sluggish, I'm getting synchronization problems/crashes in some of my tools
that talk to serial ports (likely due to mini deadlocks in the kernel), and
I'm now getting a few btrfs hangs.

A lot of my RAM seems gone now (1GB free, but only 500MB actually used by
user space), but the system is still up for now, just somewhat sluggish.
I've been told meminfo, slabinfo and zoneinfo may be useful to debug this,
so I'll attach them. I see an interesting line about nf_conntrack and
vmalloc in there. When my scrub is done, I'll reboot and disable
nf_conntrack just to see what happens, and I'll also wait a bit to see how
stable the RAM is before I start more scrubs, to see if scrubs do create
memory leaks.

slabinfo (attached) doesn't seem to show that btrfs owns a lot of memory,
but maybe I'm not sure how to read it.

gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:       7894792    7370872     523920          0         28     564272
-/+ buffers/cache:    6806572    1088220
Swap:     15616764     483792   15132972

This one could be interesting, although it may just be a symptom of the
memory pressure:
[144323.739212] nf_conntrack: falling back to vmalloc.
[144323.757139] nf_conntrack: falling back to vmalloc.
[144323.778092] nf_conntrack: falling back to vmalloc.
[144323.797785] nf_conntrack: falling back to vmalloc.

Is it possible for btrfs to be hung on getting memory allocations? Is a
2-minute wait even possible for the right memory block to become available?
[149171.560084] INFO: task btrfs:17366 blocked for more than 120 seconds.
[149171.582499]       Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[149171.604251] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149171.630523] btrfs           D ffff88006897c980     0 17366  18362 0x00000080
[149171.654267]  ffff880083a51720 0000000000000086 ffff880083a51fd8 ffff88006897c450
[149171.679265]  00000000000141c0 ffff88006897c450 ffff88021f3941c0 ffff88006897c450
[149171.704141]  ffff880083a517c0 ffffffff810fe7a2 0000000000000002 ffff880083a51730
[149171.729093] Call Trace:
[149171.738754]  [<ffffffff810fe7a2>] ? wait_on_page_read+0x3c/0x3c
[149171.759582]  [<ffffffff8160d2a1>] schedule+0x73/0x75
[149171.776663]  [<ffffffff8160d446>] io_schedule+0x60/0x7a
[149171.794400]  [<ffffffff810fe7b0>] sleep_on_page+0xe/0x12
[149171.812292]  [<ffffffff8160d6d8>] __wait_on_bit+0x48/0x7a
[149171.830401]  [<ffffffff810fe750>] wait_on_page_bit+0x7a/0x7c
[149171.849307]  [<ffffffff8108514a>] ? autoremove_wake_function+0x34/0x34
[149171.871229]  [<ffffffff81246320>] read_extent_buffer_pages+0x1bf/0x204
[149171.892813]  [<ffffffff81223dc0>] ? free_root_pointers+0x5b/0x5b
[149171.912642]  [<ffffffff81224ac2>] btree_read_extent_buffer_pages.constprop.45+0x66/0x100
[149171.938737]  [<ffffffff81225a17>] read_tree_block+0x2f/0x47
[149171.957308]  [<ffffffff8120eb66>] read_block_for_search.isra.26+0x24a/0x287
[149171.980128]  [<ffffffff812103a7>] btrfs_search_slot+0x4f4/0x6bb
[149171.999679]  [<ffffffff8127174d>] scrub_stripe+0x43d/0xc9e
[149172.017930]  [<ffffffff81085116>] ? finish_wait+0x65/0x65
[149172.035901]  [<ffffffff81272084>] scrub_chunk.isra.9+0xd6/0x10d
[149172.055614]  [<ffffffff8127232f>] scrub_enumerate_chunks+0x274/0x418
[149172.076546]  [<ffffffff81085100>] ? finish_wait+0x4f/0x65
[149172.094506]  [<ffffffff81272a73>] btrfs_scrub_dev+0x254/0x3cb
[149172.113493]  [<ffffffff8116e31e>] ? __mnt_want_write+0x62/0x78
[149172.132714]  [<ffffffff81256314>] btrfs_ioctl+0x1114/0x24bd
[149172.151111]  [<ffffffff81015f07>] ? paravirt_sched_clock+0x9/0xd
[149172.170794]  [<ffffffff810164ad>] ? sched_clock+0x9/0xb
[149172.188107]  [<ffffffff8107b2fe>] ? sched_clock_cpu+0x1a/0xc5
[149172.207380]  [<ffffffff811409bd>] ? ____cache_alloc+0x1c/0x29b
[149172.226446]  [<ffffffff81140d2b>] ? kmem_cache_alloc_node+0xef/0x179
[149172.247033]  [<ffffffff8160f97b>] ? _raw_spin_unlock+0x17/0x2a
[149172.266018]  [<ffffffff81163ffe>] do_vfs_ioctl+0x3d2/0x41d
[149172.283939]  [<ffffffff8116c1f0>] ? __fget+0x6f/0x79
[149172.300268]  [<ffffffff81164099>] SyS_ioctl+0x50/0x7b
[149172.316818]  [<ffffffff8161642d>] system_call_fastpath+0x1a/0x1f

--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

[-- Attachment #2: all-memory --]
[-- Type: text/plain, Size: 31328 bytes --]

Sat Jul 5 07:30:09 PDT 2014
MemTotal:        7894792 kB
MemFree:          489800 kB
MemAvailable:     929864 kB
Buffers:              28 kB
Cached:           581012 kB
SwapCached:        33024 kB
Active:           368028 kB
Inactive:         388080 kB
Active(anon):      81496 kB
Inactive(anon):    97180 kB
Active(file):     286532 kB
Inactive(file):   290900 kB
Unevictable:        5352 kB
Mlocked:            5352 kB
SwapTotal:      15616764 kB
SwapFree:       15136100 kB
Dirty:              1672 kB
Writeback:          3396 kB
AnonPages:        171784 kB
Mapped:            29572 kB
Shmem:               660 kB
Slab:             186660 kB
SReclaimable:      63132 kB
SUnreclaim:       123528 kB
KernelStack:        5184 kB
PageTables:        10040 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    19564160 kB
Committed_AS:    1717256 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      358996 kB
VmallocChunk:   34359281468 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      144920 kB
DirectMap2M:     7942144 kB

slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
pid_2 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
nfsd_drc 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0 0
nfsd4_delegations 0 0 368 11 1 : tunables 54 27 8 : slabdata 0 0 0
nfsd4_stateids 1 33 120 33 1 : tunables 120 60 8 : slabdata 1 1 0
nfsd4_files 1 31 128 31 1 : tunables 120 60 8 : slabdata 1 1 0
nfsd4_lockowners 0 0 384 10 1 : tunables 54 27 8 : slabdata 0 0 0
nfsd4_openowners 1 10 392 10 1 : tunables 54 27 8 : slabdata 1 1 0
nfs_direct_cache 0 0 208 19 1 : tunables 120 60 8 : slabdata 0 0 0
nfs_commit_data 4 11 704 11 2 : tunables 54 27 8 : slabdata 1 1 0
nfs_write_data 32 32 960 4 1 : tunables 54 27 8 : slabdata 8 8 0
nfs_read_data 0 0 896 4 1 : tunables 54 27 8 : slabdata 0 0 0
nfs_inode_cache 0 0 1016 4 1 : tunables 54 27 8 : slabdata 0 0 0
nfs_page 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
fscache_cookie_jar 1 48 80 48 1 : tunables 120 60 8 : slabdata 1 1 0
rpc_inode_cache 15 18 640 6 1 : tunables 54 27 8 : slabdata 3 3 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8 : slabdata 4 4 0
rpc_tasks 8 15 256 15 1 : tunables 120 60 8 : slabdata 1 1 0
ext4_groupinfo_1k 61 62 128 31 1 : tunables 120 60 8 : slabdata 2 2 0
jbd2_1k 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
dm_snap_pending_exception 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
dm_exception 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
raid5-md8 16384 16384 1936 2 1 : tunables 24 12 8 : slabdata 8192 8192 0
nf_conntrack_ffffffff81cc7380 269 507 304 13 1 : tunables 54 27 8 : slabdata 39 39 0
nf_conntrack_expect 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
fuse_request 9 9 408 9 1 : tunables 54 27 8 : slabdata 1 1 0
fuse_inode 94 100 768 5 1 : tunables 54 27 8 : slabdata 20 20 0
kvm_async_pf 0 0 136 29 1 : tunables 120 60 8 : slabdata 0 0 0
kvm_vcpu 0 0 16128 1 4 : tunables 8 4 0 : slabdata 0 0 0
kvm_mmu_page_header 0 0 168 23 1 : tunables 120 60 8 : slabdata 0 0 0
pte_list_desc 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
bio-2 690 1530 256 15 1 : tunables 120 60 8 : slabdata 101 102 360
dm_crypt_io 708 1743 184 21 1 : tunables 120 60 8 : slabdata 82 83 300
kcopyd_job 0 0 3312 2 2 : tunables 24 12 8 : slabdata 0 0 0
io 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
dm_uevent 0 0 2608 3 2 : tunables 24 12 8 : slabdata 0 0 0
dm_rq_target_io 0 0 424 9 1 : tunables 54 27 8 : slabdata 0 0 0
dm_io 711 2511 40 93 1 : tunables 120 60 8 : slabdata 27 27 420
raid5-md5 16384 16384 1936 2 1 : tunables 24 12 8 : slabdata 8192 8192 0
btrfs_prelim_ref 4 48 80 48 1 : tunables 120 60 8 : slabdata 1 1 0
btrfs_delayed_extent_op 279 279 40 93 1 : tunables 120 60 8 : slabdata 3 3 0
btrfs_delayed_data_ref 96 200 96 40 1 : tunables 120 60 8 : slabdata 5 5 0
btrfs_delayed_tree_ref 264 264 88 44 1 : tunables 120 60 8 : slabdata 6 6 0
btrfs_delayed_ref_head 312 312 160 24 1 : tunables 120 60 8 : slabdata 13 13 60
btrfs_inode_defrag 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
btrfs_delayed_node 447 871 304 13 1 : tunables 54 27 8 : slabdata 67 67 0
btrfs_ordered_extent 8 60 376 10 1 : tunables 54 27 8 : slabdata 4 6 0
btrfs_extent_map 46900 61854 152 26 1 : tunables 120 60 8 : slabdata 2379 2379 420
bio-1 333 912 320 12 1 : tunables 54 27 8 : slabdata 75 76 189
btrfs_extent_buffer 14930 16536 320 12 1 : tunables 54 27 8 : slabdata 1378 1378 162
btrfs_extent_state 3716 10400 96 40 1 : tunables 120 60 8 : slabdata 260 260 300
btrfs_delalloc_work 1 29 136 29 1 : tunables 120 60 8 : slabdata 1 1 0
btrfs_free_space 50084 54780 64 60 1 : tunables 120 60 8 : slabdata 913 913 0
btrfs_path 238 297 144 27 1 : tunables 120 60 8 : slabdata 11 11 0
btrfs_transaction 24 48 240 16 1 : tunables 120 60 8 : slabdata 3 3 0
btrfs_trans_handle 48 48 160 24 1 : tunables 120 60 8 : slabdata 2 2 0
btrfs_inode 12932 12932 1008 4 1 : tunables 54 27 8 : slabdata 3233 3233 0
fib6_nodes 13 60 64 60 1 : tunables 120 60 8 : slabdata 1 1 0
ip6_dst_cache 12 30 384 10 1 : tunables 54 27 8 : slabdata 3 3 0
ip6_mrt_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
PINGv6 0 0 1088 7 2 : tunables 24 12 8 : slabdata 0 0 0
RAWv6 7 7 1088 7 2 : tunables 24 12 8 : slabdata 1 1 0
UDPLITEv6 0 0 1088 7 2 : tunables 24 12 8 : slabdata 0 0 0
UDPv6 12 35 1088 7 2 : tunables 24 12 8 : slabdata 5 5 0
tw_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
request_sock_TCPv6 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
TCPv6 30 30 1920 2 1 : tunables 24 12 8 : slabdata 15 15 0
flow_cache 0 0 104 37 1 : tunables 120 60 8 : slabdata 0 0 0
scsi_sense_cache 341 341 128 31 1 : tunables 120 60 8 : slabdata 11 11 180
scsi_cmd_cache 190 190 384 10 1 : tunables 54 27 8 : slabdata 19 19 108
sd_ext_cdb 2 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
i915_gem_object 9 10 384 10 1 : tunables 54 27 8 : slabdata 1 1 0
btree_node 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
cfq_io_cq 828 924 120 33 1 : tunables 120 60 8 : slabdata 28 28 0
cfq_queue 859 884 232 17 1 : tunables 120 60 8 : slabdata 52 52 0
bsg_cmd 0 0 312 12 1 : tunables 54 27 8 : slabdata 0 0 0
mqueue_inode_cache 1 4 896 4 1 : tunables 54 27 8 : slabdata 1 1 0
hugetlbfs_inode_cache 1 7 584 7 1 : tunables 54 27 8 : slabdata 1 1 0
jbd2_transaction_s 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
jbd2_inode 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
jbd2_journal_handle 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
jbd2_journal_head 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0 0
jbd2_revoke_table_s 2 204 16 204 1 : tunables 120 60 8 : slabdata 1 1 0
jbd2_revoke_record_s 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_inode_cache 349 352 968 4 1 : tunables 54 27 8 : slabdata 88 88 0
ext4_xattr 0 0 88 44 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_free_data 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_allocation_context 0 0 136 29 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_prealloc_space 0 0 104 37 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_system_zone 0 0 40 93 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_io_end 0 0 72 53 1 : tunables 120 60 8 : slabdata 0 0 0
ext4_extent_status 8 93 40 93 1 : tunables 120 60 8 : slabdata 1 1 0
configfs_dir_cache 1 44 88 44 1 : tunables 120 60 8 : slabdata 1 1 0
dquot 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
kioctx 12 12 640 6 1 : tunables 54 27 8 : slabdata 2 2 0
kiocb 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
fanotify_event_info 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
fanotify_response_event 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
fsnotify_mark 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0 0
inotify_inode_mark 122 165 120 33 1 : tunables 120 60 8 : slabdata 5 5 0
dnotify_mark 1 33 120 33 1 : tunables 120 60 8 : slabdata 1 1 0
dnotify_struct 1 113 32 113 1 : tunables 120 60 8 : slabdata 1 1 0
dio 0 0 640 6 1 : tunables 54 27 8 : slabdata 0 0 0
fasync_cache 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
pid_namespace 0 0 2192 3 2 : tunables 24 12 8 : slabdata 0 0 0
user_namespace 0 0 264 15 1 : tunables 54 27 8 : slabdata 0 0 0
posix_timers_cache 0 0 248 16 1 : tunables 120 60 8 : slabdata 0 0 0
uid_cache 28 31 128 31 1 : tunables 120 60 8 : slabdata 1 1 0
UNIX 204 204 896 4 1 : tunables 54 27 8 : slabdata 51 51 0
ip_mrt_cache 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
UDP-Lite 0 0 896 4 1 : tunables 54 27 8 : slabdata 0 0 0
tcp_bind_bucket 170 300 64 60 1 : tunables 120 60 8 : slabdata 5 5 0
inet_peer_cache 248 740 192 20 1 : tunables 120 60 8 : slabdata 37 37 0
secpath_cache 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
xfrm_dst_cache 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
ip_fib_trie 22 68 56 68 1 : tunables 120 60 8 : slabdata 1 1 0
ip_fib_alias 25 78 48 78 1 : tunables 120 60 8 : slabdata 1 1 0
ip_dst_cache 73 140 192 20 1 : tunables 120 60 8 : slabdata 7 7 0
PING 0 0 896 4 1 : tunables 54 27 8 : slabdata 0 0 0
RAW 6 8 896 4 1 : tunables 54 27 8 : slabdata 2 2 0
UDP 75 76 896 4 1 : tunables 54 27 8 : slabdata 19 19 0
tw_sock_TCP 38 120 192 20 1 : tunables 120 60 8 : slabdata 6 6 0
request_sock_TCP 30 30 256 15 1 : tunables 120 60 8 : slabdata 2 2 0
TCP 108 108 1792 2 1 : tunables 24 12 8 : slabdata 54 54 0
eventpoll_pwq 265 265 72 53 1 : tunables 120 60 8 : slabdata 5 5 0
eventpoll_epi 120 248 128 31 1 : tunables 120 60 8 : slabdata 8 8 0
sgpool-128 49 49 4096 1 1 : tunables 24 12 8 : slabdata 49 49 12
sgpool-64 64 64 2048 2 1 : tunables 24 12 8 : slabdata 32 32 24
sgpool-32 112 112 1024 4 1 : tunables 54 27 8 : slabdata 28 28 0
sgpool-16 96 96 512 8 1 : tunables 54 27 8 : slabdata 12 12 27
sgpool-8 300 300 256 15 1 : tunables 120 60 8 : slabdata 20 20 120
scsi_data_buffer 0 0 24 146 1 : tunables 120 60 8 : slabdata 0 0 0
blkdev_integrity 0 0 112 35 1 : tunables 120 60 8 : slabdata 0 0 0
blkdev_queue 22 24 1896 2 1 : tunables 24 12 8 : slabdata 12 12 0
blkdev_requests 334 490 384 10 1 : tunables 54 27 8 : slabdata 49 49 189
blkdev_ioc 472 518 104 37 1 : tunables 120 60 8 : slabdata 14 14 0
bio-0 2430 3480 256 15 1 : tunables 120 60 8 : slabdata 232 232 420
biovec-256 315 376 4096 1 1 : tunables 24 12 8 : slabdata 315 376 96
biovec-128 109 160 2048 2 1 : tunables 24 12 8 : slabdata 80 80 60
biovec-64 341 352 1024 4 1 : tunables 54 27 8 : slabdata 88 88 189
biovec-16 210 210 256 15 1 : tunables 120 60 8 : slabdata 14 14 60
bio_integrity_payload 4 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
khugepaged_mm_slot 0 0 40 93 1 : tunables 120 60 8 : slabdata 0 0 0
ksm_mm_slot 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
ksm_stable_node 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
ksm_rmap_item 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
dmaengine-unmap-256 1 3 2112 3 2 : tunables 24 12 8 : slabdata 1 1 0
dmaengine-unmap-128 1 7 1088 7 2 : tunables 24 12 8 : slabdata 1 1 0
dmaengine-unmap-16 1 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
dmaengine-unmap-2 1 60 64 60 1 : tunables 120 60 8 : slabdata 1 1 0
sock_inode_cache 444 444 640 6 1 : tunables 54 27 8 : slabdata 74 74 0
skbuff_fclone_cache 56 56 512 7 1 : tunables 54 27 8 : slabdata 8 8 0
skbuff_head_cache 465 555 256 15 1 : tunables 120 60 8 : slabdata 37 37 180
file_lock_cache 120 120 192 20 1 : tunables 120 60 8 : slabdata 6 6 0
net_namespace 0 0 4352 1 2 : tunables 8 4 0 : slabdata 0 0 0
shmem_inode_cache 1745 1800 656 6 1 : tunables 54 27 8 : slabdata 300 300 0
ftrace_event_file 1197 1232 88 44 1 : tunables 120 60 8 : slabdata 28 28 0
ftrace_event_field 2781 2808 48 78 1 : tunables 120 60 8 : slabdata 36 36 0
pool_workqueue 39 75 256 15 1 : tunables 120 60 8 : slabdata 5 5 0
task_delay_info 880 910 112 35 1 : tunables 120 60 8 : slabdata 26 26 0
taskstats 36 36 328 12 1 : tunables 54 27 8 : slabdata 3 3 0
proc_inode_cache 4038 4038 632 6 1 : tunables 54 27 8 : slabdata 673 673 0
sigqueue 235 288 160 24 1 : tunables 120 60 8 : slabdata 12 12 7
bdev_cache 41 48 832 4 1 : tunables 54 27 8 : slabdata 12 12 0
kernfs_node_cache 29369 29535 120 33 1 : tunables 120 60 8 : slabdata 895 895 0
mnt_cache 58 72 320 12 1 : tunables 54 27 8 : slabdata 6 6 0
filp 5715 6000 256 15 1 : tunables 120 60 8 : slabdata 400 400 480
inode_cache 9527 10283 568 7 1 : tunables 54 27 8 : slabdata 1469 1469 0
dentry 46720 46720 192 20 1 : tunables 120 60 8 : slabdata 2336 2336 0
names_cache 77 77 4096 1 1 : tunables 24 12 8 : slabdata 77 77 24
key_jar 4 20 192 20 1 : tunables 120 60 8 : slabdata 1 1 0
buffer_head 13 333 104 37 1 : tunables 120 60 8 : slabdata 9 9 0
nsproxy 1 78 48 78 1 : tunables 120 60 8 : slabdata 1 1 0
vm_area_struct 16239 17073 184 21 1 : tunables 120 60 8 : slabdata 813 813 360
mm_struct 364 364 960 4 1 : tunables 54 27 8 : slabdata 91 91 0
fs_cache 505 540 64 60 1 : tunables 120 60 8 : slabdata 9 9 0
files_cache 360 360 640 6 1 : tunables 54 27 8 : slabdata 60 60 0
signal_cache 602 602 1088 7 2 : tunables 24 12 8 : slabdata 86 86 0
sighand_cache 576 576 2112 3 2 : tunables 24 12 8 : slabdata 192 192 0
task_xstate 729 729 832 9 2 : tunables 54 27 8 : slabdata 81 81 0
task_struct 658 674 6304 1 2 : tunables 8 4 0 : slabdata 658 674 0
cred_jar 1408 1580 192 20 1 : tunables 120 60 8 : slabdata 79 79 60
Acpi-Operand 3606 3657 72 53 1 : tunables 120 60 8 : slabdata 69 69 0
Acpi-ParseExt 16 53 72 53 1 : tunables 120 60 8 : slabdata 1 1 0
Acpi-Parse 16 78 48 78 1 : tunables 120 60 8 : slabdata 1 1 0
Acpi-State 16 48 80 48 1 : tunables 120 60 8 : slabdata 1 1 0
Acpi-Namespace 1870 1953 40 93 1 : tunables 120 60 8 : slabdata 21 21 0
anon_vma_chain 17601 20220 64 60 1 : tunables 120 60 8 : slabdata 337 337 360
anon_vma 10261 11016 56 68 1 : tunables 120 60 8 : slabdata 162 162 180
pid 851 961 128 31 1 : tunables 120 60 8 : slabdata 31 31 0
shared_policy_node 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0
numa_policy 3 146 24 146 1 : tunables 120 60 8 : slabdata 1 1 0
radix_tree_node 19286 20566 560 7 1 : tunables 54 27 8 : slabdata 2938 2938 135
idr_layer_cache 184 195 2112 3 2 : tunables 24 12 8 : slabdata 65 65 0
dma-kmalloc-4194304 0 0 4194304 1 1024 : tunables 1 1 0 : slabdata 0 0 0
dma-kmalloc-2097152 0 0 2097152 1 512 : tunables 1 1 0 : slabdata 0 0 0
dma-kmalloc-1048576 0 0 1048576 1 256 : tunables 1 1 0 : slabdata 0 0 0
dma-kmalloc-524288 0 0 524288 1 128 : tunables 1 1 0 : slabdata 0 0 0
dma-kmalloc-262144 0 0 262144 1 64 : tunables 1 1 0 : slabdata 0 0 0
dma-kmalloc-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
dma-kmalloc-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
dma-kmalloc-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
dma-kmalloc-16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
dma-kmalloc-8192 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
dma-kmalloc-4096 0 0 4096 1 1 : tunables 24 12 8 : slabdata 0 0 0
dma-kmalloc-2048 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
dma-kmalloc-1024 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 15 1 : tunables 120 60 8 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 60 1 : tunables 120 60 8 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 113 1 : tunables 120 60 8 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 20 1 : tunables 120 60 8 : slabdata 0 0 0
dma-kmalloc-96 0 0 128 31 1 : tunables 120 60 8 : slabdata 0 0 0
kmalloc-4194304 0 0 4194304 1 1024 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-2097152 0 0 2097152 1 512 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-1048576 0 0 1048576 1 256 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-524288 0 0 524288 1 128 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-262144 0 0 262144 1 64 : tunables 1 1 0 : slabdata 0 0 0
kmalloc-131072 4 4 131072 1 32 : tunables 8 4 0 : slabdata 4 4 0
kmalloc-65536 7 7 65536 1 16 : tunables 8 4 0 : slabdata 7 7 0
kmalloc-32768 2 2 32768 1 8 : tunables 8 4 0 : slabdata 2 2 0
kmalloc-16384 287 287 16384 1 4 : tunables 8 4 0 : slabdata 287 287 0
kmalloc-8192 37 37 8192 1 2 : tunables 8 4 0 : slabdata 37 37 0
kmalloc-4096 1355 1357 4096 1 1 : tunables 24 12 8 : slabdata 1355 1357 36
kmalloc-2048 947 956 2048 2 1 : tunables 24 12 8 : slabdata 478 478 12
kmalloc-1024 2864 2864 1024 4 1 : tunables 54 27 8 : slabdata 716 716 0
kmalloc-512 2439 2632 512 8 1 : tunables 54 27 8 : slabdata 329 329 27
kmalloc-256 11945 12105 256 15 1 : tunables 120 60 8 : slabdata 807 807 0
kmalloc-192 6880 6940 192 20 1 : tunables 120 60 8 : slabdata 347 347 120
kmalloc-96 3410 5208 128 31 1 : tunables 120 60 8 : slabdata 168 168 480
kmalloc-64 28688 34080 64 60 1 : tunables 120 60 8 : slabdata 568 568 180
kmalloc-128 12130 13795 128 31 1 : tunables 120 60 8 : slabdata 445 445 120
kmalloc-32 20274 25764 32 113 1 : tunables 120 60 8 : slabdata 228 228 480
kmem_cache 233 260 192 20 1 : tunables 120 60 8 : slabdata 13 13 0

Node 0, zone      DMA
  pages free     3840
        min      32
        low      40
        high     48
        scanned  0
        spanned  4095
        present  3995
        managed  3840
    nr_free_pages 3840
    nr_alloc_batch 8
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock 0
    nr_anon_pages 0
    nr_mapped 0
    nr_file_pages 0
    nr_dirty 0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem 0
    nr_dirtied 0
    nr_written 0
    numa_hit 16
    numa_miss 0
    numa_foreign 0
    numa_interleave 0
    numa_local 16
    numa_other 0
    nr_anon_transparent_hugepages 0
    nr_free_cma 0
        protection: (0, 3204, 7691, 7691)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
    cpu: 2
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
    cpu: 3
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 2
  all_unreclaimable: 1
  start_pfn:         1
  inactive_ratio:    1
Node 0, zone    DMA32
  pages free     106302
        min      7025
        low      8781
        high     10537
        scanned  0
        spanned  1044480
        present  840582
        managed  821238
    nr_free_pages 106302
    nr_alloc_batch 460
    nr_inactive_anon 23190
    nr_active_anon 19470
    nr_inactive_file 71852
    nr_active_file 70899
    nr_unevictable 569
    nr_mlock 569
    nr_anon_pages 41807
    nr_mapped 6552
    nr_file_pages 150284
    nr_dirty 328
    nr_writeback 0
    nr_slab_reclaimable 2314
    nr_slab_unreclaimable 2274
    nr_page_table_pages 1068
    nr_kernel_stack 376
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 138697
    nr_vmscan_immediate_reclaim 29869
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem 62
    nr_dirtied 22352669
    nr_written 22020249
    numa_hit 1744799762
    numa_miss 0
    numa_foreign 0
    numa_interleave 0
    numa_local 1744799762
    numa_other 0
    nr_anon_transparent_hugepages 0
    nr_free_cma 0
        protection: (0, 0, 4486, 4486)
  pagesets
    cpu: 0
              count: 76
              high:  186
              batch: 31
  vm stats threshold: 36
    cpu: 1
              count: 92
              high:  186
              batch: 31
  vm stats threshold: 36
    cpu: 2
              count: 180
              high:  186
              batch: 31
  vm stats threshold: 36
    cpu: 3
              count: 60
              high:  186
              batch: 31
  vm stats threshold: 36
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    5
Node 0, zone   Normal
  pages free     12279
        min      9837
        low      12296
        high     14755
        scanned  0
        spanned  1177088
        present  1177088
        managed  1148620
    nr_free_pages 12279
    nr_alloc_batch 2448
    nr_inactive_anon 1105
    nr_active_anon 904
    nr_inactive_file 873
    nr_active_file 734
    nr_unevictable 769
    nr_mlock 769
    nr_anon_pages 1139
    nr_mapped 841
    nr_file_pages 3232
    nr_dirty 90
    nr_writeback 849
    nr_slab_reclaimable 13469
    nr_slab_unreclaimable 28608
    nr_page_table_pages 1442
    nr_kernel_stack 272
    nr_unstable 0
    nr_bounce 0
    nr_vmscan_write 36540130
    nr_vmscan_immediate_reclaim 430109
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem 103
    nr_dirtied 14670364
    nr_written 45400987
    numa_hit 2124679820
    numa_miss 0
    numa_foreign 0
    numa_interleave 11035
    numa_local 2124679820
    numa_other 0
    nr_anon_transparent_hugepages 0
    nr_free_cma 0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 69
              high:  186
              batch: 31
  vm stats threshold: 42
    cpu: 1
              count: 166
              high:  186
              batch: 31
  vm stats threshold: 42
    cpu: 2
              count: 122
              high:  186
              batch: 31
  vm stats threshold: 42
    cpu: 3
              count: 157
              high:  186
              batch: 31
  vm stats threshold: 42
  all_unreclaimable: 0
  start_pfn:         1048576
  inactive_ratio:    6
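On "maybe I'm not sure how to read it": in slabinfo 2.1, <num_objs> times
<objsize> is roughly the bytes a cache is holding, so a sketch like this
ranks the caches (column numbers per the header in the attachment above):

# Top 15 slab caches by approximate memory held (num_objs * objsize):
awk 'NR > 2 { printf "%10.1f MB  %s\n", $3 * $4 / 1048576, $1 }' \
    /proc/slabinfo | sort -rn | head -n 15

Against the attached dump, that puts the two raid5 stripe caches (~30 MB
each) and btrfs_inode (~12 MB) at the top: real but modest, which matches
the observation that the slab total (Slab: 186660 kB) doesn't account for
the missing gigabytes.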
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-05 14:43         ` Marc MERLIN
@ 2014-07-05 15:17           ` Andrew E. Mileski
  2014-07-06 14:58           ` Marc MERLIN
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew E. Mileski @ 2014-07-05 15:17 UTC (permalink / raw)
  To: linux-btrfs

On 2014-07-05 10:43 AM, Marc MERLIN wrote:
> On Sat, Jul 05, 2014 at 09:47:09AM -0400, Andrew E. Mileski wrote:
>> On 2014-07-03 9:19 PM, Marc MERLIN wrote:
>>> I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
>>> running out of memory and deadlocking (panic= doesn't even work).
>>> I downgraded back to 3.14, but I already had the problem once since then.
>>
>> I didn't see any mention of the btrfs utility version in this thread
>> (I may be blind though).
>>
>> My server was suffering from frequent panics upon scrub / defrag /
>> balance, until I updated the btrfs utility. That resolved all my
>> issues.
>
> Really? The userland tool should only send ioctls to the kernel, I
> really can't see how it would cause the kernel code to panic or not.
>
> gargamel:~# btrfs --version
> Btrfs v3.14.1
> which is the latest in debian unstable.

It reached the point that I got so frustrated that I built the utilities
from git, as an up-to-date package wasn't available at the time. I'm
currently using v3.14.2 but v3.14.1 was the version that resolved my
server's panics.

I've not looked into the code very deeply yet, so I can't theorize on why
it resolved my server's panics. I can only state with complete certainty
that it did. I too was somewhat surprised.

However, it seems we've probably now ruled it out in your case.

~~ Andrew E. Mileski

^ permalink raw reply	[flat|nested] 23+ messages in thread
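For anyone wanting to do the same, building the utilities from git is quick. A sketch, assuming the btrfs-progs integration repository on kernel.org and the usual build dependencies (libuuid, libblkid, zlib headers):

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
  cd btrfs-progs
  make
  ./btrfs --version    # check the freshly built binary before running 'make install'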
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-05 14:43         ` Marc MERLIN
  2014-07-05 15:17           ` Andrew E. Mileski
@ 2014-07-06 14:58           ` Marc MERLIN
  2014-07-13 14:29             ` btrfs is " Marc MERLIN
  2014-07-16  0:45             ` Is btrfs " Jérôme Poulin
  1 sibling, 2 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-06 14:58 UTC (permalink / raw)
  To: Andrew E. Mileski; +Cc: linux-btrfs

On Sat, Jul 05, 2014 at 07:43:18AM -0700, Marc MERLIN wrote:
> On Sat, Jul 05, 2014 at 09:47:09AM -0400, Andrew E. Mileski wrote:
> > On 2014-07-03 9:19 PM, Marc MERLIN wrote:
> > >I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
> > >running out of memory and deadlocking (panic= doesn't even work).
> > >I downgraded back to 3.14, but I already had the problem once since then.
> >
> > I didn't see any mention of the btrfs utility version in this thread
> > (I may be blind though).
> >
> > My server was suffering from frequent panics upon scrub / defrag /
> > balance, until I updated the btrfs utility. That resolved all my
> > issues.
>
> Really? The userland tool should only send ioctls to the kernel, I
> really can't see how it would cause the kernel code to panic or not.
>
> gargamel:~# btrfs --version
> Btrfs v3.14.1
> which is the latest in debian unstable.
>
> As an update, after 1.7 days of scrubbing, the system has started
> getting sluggish, I'm getting synchronization problems/crashes in some of
> my tools that talk to serial ports (likely due to mini deadlocks in the
> kernel), and I'm now getting a few btrfs hangs.

Predictably, it died yesterday afternoon after going into memory death
(it was answering pings, but userspace was dead, and even sysrq-o did
not respond; I had to power cycle the power outlet).

This happened just before my 3rd scrub finished, so I'm now 2 out of 2:
running scrub on my 3 filesystems kills the system halfway through the
3rd scrub.

This is the last memory log that reached the disk:
http://marc.merlins.org/tmp/btrfs-oom2.txt

Do those logs point to any possible culprit? Or can a kernel memory leak
not be traced back to its source, because the kernel loses track of who
requested the memory that leaked?

Excerpt here:
Sat Jul  5 14:25:04 PDT 2014
             total       used       free     shared    buffers     cached
Mem:       7894792    7712384     182408          0         28     227480
-/+ buffers/cache:    7484876     409916
Swap:     15616764     463732   15153032

Userspace is using 345MB according to ps

Sat Jul  5 14:25:04 PDT 2014
MemTotal:        7894792 kB
MemFree:          184556 kB
MemAvailable:     269568 kB
Buffers:              28 kB
Cached:           228164 kB
SwapCached:        18296 kB
Active:           178196 kB
Inactive:         187016 kB
Active(anon):      70068 kB
Inactive(anon):    71100 kB
Active(file):     108128 kB
Inactive(file):   115916 kB
Unevictable:        5624 kB
Mlocked:            5624 kB
SwapTotal:      15616764 kB
SwapFree:       15152768 kB
Dirty:             17716 kB
Writeback:           516 kB
AnonPages:        140588 kB
Mapped:            21940 kB
Shmem:               688 kB
Slab:             181708 kB
SReclaimable:      59808 kB
SUnreclaim:       121900 kB
KernelStack:        4728 kB
PageTables:         8480 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    19564160 kB
Committed_AS:    1633204 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      358996 kB
VmallocChunk:   34359281468 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      144920 kB
DirectMap2M:     7942144 kB

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 23+ messages in thread
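A sanity check worth doing on an excerpt like the one above: add up everything /proc/meminfo can attribute and compare against MemTotal; the remainder is memory the kernel holds but does not track in any of those counters. A rough sketch, assuming 3.x field names:

  awk '/^MemTotal:/ {t=$2}
       /^(MemFree|Buffers|Cached|SwapCached|AnonPages|Slab|PageTables|KernelStack):/ {s+=$2}
       END {printf "attributed %d MB of %d MB, unaccounted %d MB\n", s/1024, t/1024, (t-s)/1024}' /proc/meminfo

With the numbers above this leaves roughly 7 GB unaccounted for, consistent with kernel-side allocations that bypass the slab counters (vmalloc, direct page allocations, and so on).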
* Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-06 14:58           ` Marc MERLIN
@ 2014-07-13 14:29             ` btrfs is " Marc MERLIN
  2014-07-13 15:37               ` Marc MERLIN
  2014-07-14  1:24               ` btrfs is " Qu Wenruo
  2014-07-16  0:45             ` Is btrfs " Jérôme Poulin
  1 sibling, 2 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-13 14:29 UTC (permalink / raw)
  To: Andrew E. Mileski; +Cc: linux-btrfs

On Sun, Jul 06, 2014 at 07:58:15AM -0700, Marc MERLIN wrote:
> > As an update, after 1.7 days of scrubbing, the system has started
> > getting sluggish, I'm getting synchronization problems/crashes in some of
> > my tools that talk to serial ports (likely due to mini deadlocks in the
> > kernel), and I'm now getting a few btrfs hangs.
>
> Predictably, it died yesterday afternoon after going into memory death
> (it was answering pings, but userspace was dead, and even sysrq-o did
> not respond, I had to power cycle the power outlet).
>
> This happened just before my 3rd scrub finished, so I'm now 2 out of 2:
> running scrub on my 3 filesystems kills the system half way through the
> 3rd scrub.

Ok, I now changed the subject line to confirm that btrfs is to blame.

I had my system booted 6 days and it held steady at 6.4GB free.
I mounted 2 of my 4 btrfs filesystems (one by one) and waited a few
days, and no problems with RAM going down.

Then I mounted my 3rd btrfs filesystem, the one that holds many files
and has rsync backups running, and I lost 1.4GB of RAM overnight.
I've just umounted one of its mountpoints, the one where all the backups
happen, while leaving its main pool mounted, and will see if RAM keeps
going down or not (i.e. whether the memory leak is due to rsync activity
or some other background btrfs process).

But generally, is there a tool to locate which kernel function allocated
all that RAM that seems to get allocated and forgotten?

Is /proc/slabinfo supposed to show anything useful?

This is the filesystem in question:
gargamel:~# btrfs fi df /mnt/btrfs_pool2/
Data, single: total=3.34TiB, used=3.32TiB
System, DUP: total=8.00MiB, used=400.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=77.50GiB, used=59.87GiB
Metadata, single: total=8.00MiB, used=0.00

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 23+ messages in thread
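On the "is there a tool" question: besides kmemleak (suggested further down the thread), the kmem tracepoints can at least show who is doing the allocating, though not who forgot to free. A sketch, assuming a kernel built with tracepoints and a matching perf tool installed:

  perf record -a -g -e kmem:kmalloc -e kmem:kmem_cache_alloc sleep 60
  perf report --sort symbol

If one btrfs call site dominates the profile while free memory keeps sinking, that is a strong hint about where the leak lives.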
* Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-13 14:29             ` btrfs is " Marc MERLIN
@ 2014-07-13 15:37               ` Marc MERLIN
  2014-07-13 15:45                 ` btrfs quotas " Marc MERLIN
  2014-07-14  1:24               ` btrfs is " Qu Wenruo
  1 sibling, 1 reply; 23+ messages in thread
From: Marc MERLIN @ 2014-07-13 15:37 UTC (permalink / raw)
  To: Andrew E. Mileski; +Cc: linux-btrfs

On Sun, Jul 13, 2014 at 07:29:18AM -0700, Marc MERLIN wrote:
> Is /proc/slabinfo supposed to show anything useful?
>
> This is the filesystem in question:
> gargamel:~# btrfs fi df /mnt/btrfs_pool2/
> Data, single: total=3.34TiB, used=3.32TiB
> System, DUP: total=8.00MiB, used=400.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=77.50GiB, used=59.87GiB
> Metadata, single: total=8.00MiB, used=0.00

Mmmh, now that I think of it, I do have quota enabled on that
filesystem.
Due to the many many files, this may be what's causing the problem.
I thought quotas were supposed to work with 3.15, but maybe there is
still a leak?

I just turned quotas off and I'm going to let my server run for a while
to see if the leak stops.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 23+ messages in thread
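One low-tech way to tell whether disabling quotas actually stopped the leak is to log the kernel-side counters over time and compare before and after. A minimal sketch (/var/tmp/memwatch.log is an arbitrary path):

  while sleep 60; do
      echo "$(date +%s) $(grep -E '^(MemFree|Slab|SUnreclaim):' /proc/meminfo | tr -s ' \n' ' ')"
  done >> /var/tmp/memwatch.log

If SUnreclaim climbed steadily before 'btrfs quota disable' and stays flat afterwards, the quota code was the culprit.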
* Re: btrfs quotas related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-13 15:37               ` Marc MERLIN
@ 2014-07-13 15:45                 ` Marc MERLIN
  2014-07-14  1:36                   ` Qu Wenruo
  0 siblings, 1 reply; 23+ messages in thread
From: Marc MERLIN @ 2014-07-13 15:45 UTC (permalink / raw)
  To: Andrew E. Mileski, Duncan, Russell Coker, Satoru Takeuchi, linux-btrfs

On Sun, Jul 13, 2014 at 08:37:34AM -0700, Marc MERLIN wrote:
> On Sun, Jul 13, 2014 at 07:29:18AM -0700, Marc MERLIN wrote:
> > Is /proc/slabinfo supposed to show anything useful?
> >
> > This is the filesystem in question:
> > gargamel:~# btrfs fi df /mnt/btrfs_pool2/
> > Data, single: total=3.34TiB, used=3.32TiB
> > System, DUP: total=8.00MiB, used=400.00KiB
> > System, single: total=4.00MiB, used=0.00
> > Metadata, DUP: total=77.50GiB, used=59.87GiB
> > Metadata, single: total=8.00MiB, used=0.00
>
> Mmmh, now that I think of it, I do have quota enabled on that
> filesystem.
> Due to the many many files, this may be what's causing the problem.
> I thought quotas were supposed to work with 3.15, but maybe there is
> still a leak?
>
> I just turned quotas off and I'm going to let my server run for a while to
> see if the leak stops.

Mmmh, look what I found in my quota output before I turned it off.
Many many unknown subvolumes.

If I were to guess, when I rotate snapshots and delete them, their
qgroup does not get deleted, and a lot of crap stays behind. Not sure if
that is causing the memory leak, but that can't be good.

Hopefully quota disable will have deleted all that cruft and will stop the leak.

gargamel:/mnt/btrfs_pool2# btrfs-quota.py .
subvol                                        group      total   unshared
-------------------------------------------------------------------------------
(unknown)                                     0/5        0.00G      0.00G
backup                                        0/257     -0.00G      0.00G
Soft                                          0/258     59.27G      0.00G
Win                                           0/259    112.60G      0.00G
backup/debian32                               0/262   -166.13G     -8.72G
backup/debian64                               0/263    644.62G      0.00G
backup/ubuntu                                 0/264    326.59G     -0.49G
backup-test                                   0/265      0.00G      0.00G
backup/0Notmachines                           0/266    268.16G     -0.13G
backup/1Appliances                            0/566      2.09G      0.00G
backup/win                                    0/570    461.61G      0.00G
(unknown)                                     0/1039     0.00G      0.00G
(unknown)                                     0/1041     0.00G      0.00G
(unknown)                                     0/1044     0.00G      0.00G
(unknown)                                     0/1045     0.00G      0.00G
(unknown)                                     0/1046     0.00G      0.00G
(unknown)                                     0/1047     0.00G      0.00G
(unknown)                                     0/1048     0.00G      0.00G
(unknown)                                     0/1049     0.00G      0.00G
(unknown)                                     0/1050     0.00G      0.00G
(unknown)                                     0/1051     0.00G      0.00G
(unknown)                                     0/1052     0.00G      0.00G
(unknown)                                     0/1053     0.00G      0.00G
(unknown)                                     0/1054   300.95G      0.00G
(unknown)                                     0/1055   300.95G      0.00G
(unknown)                                     0/1056     2.09G      0.00G
(unknown)                                     0/1057     2.09G      0.00G
(unknown)                                     0/1058   461.61G      0.00G
(unknown)                                     0/1059   461.61G      0.00G
(unknown)                                     0/1518     0.00G     -0.00G
(unknown)                                     0/1519     0.00G      0.00G
(unknown)                                     0/1520    59.26G      0.00G
(unknown)                                     0/1521    59.27G      0.00G
(unknown)                                     0/1522   112.60G      0.00G
(unknown)                                     0/1523   112.60G      0.00G
(unknown)                                     0/1524     0.00G      0.00G
(unknown)                                     0/1525     0.00G      0.00G
(unknown)                                     0/1526   278.62G     -2.52G
(unknown)                                     0/1527   282.31G      0.00G

(1600 lines of unknown snipped)

legolas/tmp_ggm_daily_ro.20140527_10:03:17    0/4125     0.22G      0.00G
legolas/tmp_ggm_daily_ro.20140527_10:03:17_daily_20140528_00:03:01 0/4258 0.22G 0.00G
legolas/var_ggm_daily_ro.20140529_10:19:37    0/4565   297.75G      0.00G
legolas/var_ggm_daily_ro.20140530_10:10:24    0/4599   301.64G      0.01G
legolas/var_ggm_daily_ro.20140603_10:22:18    0/4710   303.93G      0.94G
legolas/var_ggm_daily_ro.20140605_10:20:02    0/4764   304.09G      0.00G
legolas/var_ggm_daily_ro.20140606_10:23:59    0/4792   305.97G      0.41G
legolas/tmp_ggm_daily_ro.20140611_10:40:08    0/4938     0.29G      0.01G
legolas/var_ggm_daily_ro.20140613_10:30:23    0/4997   288.92G      0.00G
legolas/var_ggm_daily_ro.20140614_10:26:37    0/5030   289.02G      0.00G
legolas/var_ggm_daily_rw.20140614_10:26:37    0/5032   289.02G      0.00G
backup_weekly_20140615_00:04:01               0/5051    -0.00G      0.00G
Soft_weekly_20140615_00:04:01                 0/5052    59.27G      0.00G
Win_weekly_20140615_00:04:01                  0/5053   112.60G      0.00G
backup/debian32_weekly_20140615_00:04:01      0/5054    77.47G      0.44G
backup/debian64_weekly_20140615_00:04:01      0/5055   787.45G      0.00G
backup/ubuntu_weekly_20140615_00:04:01        0/5056   340.51G      0.11G
backup/0Notmachines_weekly_20140615_00:04:01  0/5057   271.30G      0.15G
backup/1Appliances_weekly_20140615_00:04:01   0/5058     2.09G      0.00G
backup/win_weekly_20140615_00:04:01           0/5059   461.61G      0.00G
legolas/var_ggm_daily_ro.20140615_11:17:36    0/5068   288.89G      0.00G
legolas/var_ggm_daily_rw.20140615_11:17:36    0/5069   288.89G      0.00G
legolas/var_ggm_daily_ro.20140616_11:59:11    0/5095   288.28G      0.00G
legolas/var_ggm_daily_rw.20140616_11:59:11    0/5096   288.28G      0.00G
legolas/var_ggm_daily_rw.20140617_10:26:39    0/5124   288.42G      0.00G
legolas/var_ggm_daily_ro.20140618_10:27:06    0/5151   288.97G      1.05G
legolas/var_ggm_daily_ro.20140619_10:18:57    0/5178   289.15G      1.26G
legolas/var_ggm_daily_ro.20140621_11:00:21    0/5229   288.12G      0.30G
backup_weekly_20140622_00:04:01               0/5248    -0.00G      0.00G
Soft_weekly_20140622_00:04:01                 0/5249    59.27G      0.00G
Win_weekly_20140622_00:04:01                  0/5250   112.60G      0.00G
backup/debian32_weekly_20140622_00:04:01      0/5251    20.62G      0.18G
backup/debian64_weekly_20140622_00:04:01      0/5252   716.66G      0.00G
backup/ubuntu_weekly_20140622_00:04:01        0/5253   336.38G      0.04G
backup/0Notmachines_weekly_20140622_00:04:01  0/5254   270.47G      0.01G
backup/1Appliances_weekly_20140622_00:04:01   0/5255     2.09G      0.00G
backup/win_weekly_20140622_00:04:01           0/5256   461.61G      0.00G
legolas/var_ggm_daily_ro.20140622_10:16:05    0/5265   287.50G      0.20G
legolas/var_ggm_daily_ro.20140624_10:17:10    0/5310   299.05G      0.00G
legolas/var_ggm_daily_rw.20140624_10:17:10    0/5311   299.05G      0.00G
legolas/var_ggm_daily_ro.20140625_10:21:06    0/5343   299.50G      1.46G
legolas/var_ggm_daily_ro.20140627_12:14:51    0/5382   295.13G      3.17G
legolas/var_ggm_daily_ro.20140628_11:16:05    0/5420   299.93G      2.05G
backup_weekly_20140629_00:04:01               0/5438    -0.00G      0.00G
Soft_weekly_20140629_00:04:01                 0/5440    59.27G      0.00G
Win_weekly_20140629_00:04:01                  0/5442   112.60G      0.00G
backup/debian32_weekly_20140629_00:04:01      0/5445   -33.15G      0.77G
backup/debian64_weekly_20140629_00:04:01      0/5446   709.35G      0.00G
backup/ubuntu_weekly_20140629_00:04:01        0/5447   333.13G      0.26G
backup/0Notmachines_weekly_20140629_00:04:01  0/5448   269.65G      0.00G
backup/1Appliances_weekly_20140629_00:04:01   0/5449     2.09G      0.00G
backup/win_weekly_20140629_00:04:01           0/5450   461.61G      0.00G
legolas/var_ggm_daily_ro.20140629_10:28:56    0/5459   300.07G      2.30G
legolas/home_ggm_daily_ro.20140630_10:02:02   0/5478    77.76G      0.00G
legolas/var_ggm_daily_ro.20140630_10:19:42    0/5486   300.08G      2.31G
legolas/home_ggm_daily_rw.20140701_10:02:05   0/5506    78.30G      0.00G
legolas/var_ggm_daily_ro.20140701_10:49:22    0/5513   299.19G      0.40G
legolas/home_ggm_daily_ro.20140702_10:02:06   0/5532    78.38G      0.00G
legolas/home_ggm_daily_rw.20140702_10:02:06   0/5533    78.38G      0.00G
legolas/root_ggm_daily_ro.20140702_10:20:09   0/5534     1.66G      0.00G
legolas/root_ggm_daily_rw.20140702_10:20:09   0/5535     1.66G      0.00G
legolas/tmp_ggm_daily_ro.20140702_10:20:50    0/5536     0.08G      0.00G
legolas/tmp_ggm_daily_rw.20140702_10:20:50    0/5537     0.08G      0.00G
legolas/usr_ggm_daily_ro.20140702_10:21:11    0/5538    10.36G      0.00G
legolas/usr_ggm_daily_rw.20140702_10:21:11    0/5539    10.36G      0.00G
legolas/var_ggm_daily_ro.20140702_10:21:31    0/5540   299.19G      0.41G
backup_hourly_20140703_00:03:01               0/5541    -0.00G      0.00G
backup_daily_20140703_00:03:01                0/5542    -0.00G      0.00G
Soft_hourly_20140703_00:03:01                 0/5543    59.27G      0.00G
Soft_daily_20140703_00:03:01                  0/5544    59.27G      0.00G
Win_hourly_20140703_00:03:01                  0/5545   112.60G      0.00G
Win_daily_20140703_00:03:01                   0/5546   112.60G      0.00G
backup/debian32_hourly_20140703_00:03:01      0/5547  -103.86G      0.00G
backup/debian32_daily_20140703_00:03:01       0/5548  -103.86G      0.00G
backup/debian64_daily_20140703_00:03:01       0/5549   643.32G      0.00G
backup/debian64_hourly_20140703_00:03:01      0/5550   643.32G      0.00G
backup/ubuntu_hourly_20140703_00:03:01        0/5551   328.48G      0.00G
backup/ubuntu_daily_20140703_00:03:01         0/5552   328.48G      0.00G
backup/0Notmachines_hourly_20140703_00:03:01  0/5553   268.91G      0.00G
backup/0Notmachines_daily_20140703_00:03:01   0/5554   268.91G      0.00G
backup/1Appliances_hourly_20140703_00:03:01   0/5555     2.09G      0.00G
backup/1Appliances_daily_20140703_00:03:01    0/5556     2.09G      0.00G
backup/win_hourly_20140703_00:03:01           0/5557   461.61G      0.00G
backup/win_daily_20140703_00:03:01            0/5558   461.61G      0.00G
legolas/home_ggm_daily_ro.20140703_10:02:03   0/5559    78.41G      0.00G
legolas/home_ggm_daily_rw.20140703_10:02:03   0/5560    78.41G      0.00G
legolas/root_ggm_daily_ro.20140703_10:19:21   0/5561     1.66G      0.00G
legolas/root_ggm_daily_rw.20140703_10:19:21   0/5562     1.66G      0.00G
legolas/tmp_ggm_daily_ro.20140703_10:20:12    0/5563     0.08G      0.00G
legolas/tmp_ggm_daily_rw.20140703_10:20:12    0/5564     0.08G      0.00G
legolas/usr_ggm_daily_ro.20140703_10:21:08    0/5565    10.36G      0.00G
legolas/usr_ggm_daily_rw.20140703_10:21:08    0/5566    10.36G      0.00G
legolas/var_ggm_daily_ro.20140703_10:21:56    0/5567   301.30G      4.01G
backup_hourly_20140704_00:03:01               0/5568    -0.00G      0.00G
backup_daily_20140704_00:03:01                0/5569    -0.00G      0.00G
Soft_daily_20140704_00:03:01                  0/5570    59.27G      0.00G
Soft_hourly_20140704_00:03:01                 0/5571    59.27G      0.00G
Win_daily_20140704_00:03:01                   0/5572   112.60G      0.00G
Win_hourly_20140704_00:03:01                  0/5573   112.60G      0.00G
backup/debian32_daily_20140704_00:03:01       0/5574  -101.81G      0.00G
backup/debian32_hourly_20140704_00:03:01      0/5575  -101.81G      0.00G
backup/debian64_hourly_20140704_00:03:01      0/5576   643.32G      0.00G
backup/debian64_daily_20140704_00:03:01       0/5577   643.32G      0.00G
backup/ubuntu_daily_20140704_00:03:01         0/5578   328.48G      0.00G
backup/ubuntu_hourly_20140704_00:03:01        0/5579   328.48G      0.00G
backup/0Notmachines_daily_20140704_00:03:01   0/5580   268.90G      0.00G
backup/0Notmachines_hourly_20140704_00:03:01  0/5581   268.90G      0.00G
backup/1Appliances_hourly_20140704_00:03:01   0/5582     2.09G      0.00G
backup/1Appliances_daily_20140704_00:03:01    0/5583     2.09G      0.00G
backup/win_hourly_20140704_00:03:01           0/5584   461.61G      0.00G
backup/win_daily_20140704_00:03:01            0/5585   461.61G      0.00G
legolas/home_ggm_daily_ro.20140704_10:02:03   0/5586    77.52G      0.00G
legolas/home_ggm_daily_rw.20140704_10:02:03   0/5587    77.52G      0.00G
legolas/root_ggm_daily_ro.20140704_10:16:59   0/5588     1.66G      0.00G
legolas/root_ggm_daily_rw.20140704_10:16:59   0/5589     1.66G      0.00G
legolas/tmp_ggm_daily_ro.20140704_10:17:07    0/5590     0.08G      0.00G
legolas/tmp_ggm_daily_rw.20140704_10:17:07    0/5591     0.08G      0.00G
legolas/usr_ggm_daily_ro.20140704_10:17:24    0/5592    10.36G      0.00G
legolas/usr_ggm_daily_rw.20140704_10:17:24    0/5593    10.36G      0.00G
legolas/var_ggm_daily_ro.20140704_10:17:39    0/5594   304.28G     19.49G
backup_hourly_20140705_00:03:00               0/5595    -0.00G      0.00G
backup_daily_20140705_00:03:00                0/5596    -0.00G      0.00G
Soft_daily_20140705_00:03:00                  0/5597    59.27G      0.00G
Soft_hourly_20140705_00:03:00                 0/5598    59.27G      0.00G
Win_hourly_20140705_00:03:00                  0/5599   112.60G      0.00G
Win_daily_20140705_00:03:00                   0/5600   112.60G      0.00G
backup/debian32_daily_20140705_00:03:00       0/5601  -118.33G      0.00G
backup/debian32_hourly_20140705_00:03:00      0/5602  -118.33G      0.00G
backup/debian64_daily_20140705_00:03:00       0/5603   643.76G      0.00G
backup/debian64_hourly_20140705_00:03:00      0/5604   643.76G      0.00G
backup/ubuntu_daily_20140705_00:03:00         0/5605   328.12G      0.00G
backup/ubuntu_hourly_20140705_00:03:00        0/5606   328.12G      0.00G
backup/0Notmachines_daily_20140705_00:03:00   0/5607   268.78G      0.00G
backup/0Notmachines_hourly_20140705_00:03:00  0/5608   268.78G      0.00G
backup/1Appliances_daily_20140705_00:03:00    0/5609     2.09G      0.00G
backup/1Appliances_hourly_20140705_00:03:00   0/5610     2.09G      0.00G
backup/win_daily_20140705_00:03:00            0/5611   461.61G      0.00G
backup/win_hourly_20140705_00:03:00           0/5612   461.61G      0.00G
legolas/home_ggm_daily_ro.20140705_10:02:09   0/5613    77.54G      0.10G
legolas/root_ggm_daily_ro.20140705_14:07:48   0/5614     1.66G      0.00G
legolas/root_ggm_daily_rw.20140705_14:07:48   0/5615     1.66G      0.00G
legolas/tmp_ggm_daily_ro.20140705_14:08:41    0/5616     0.08G      0.00G
legolas/tmp_ggm_daily_rw.20140705_14:08:41    0/5617     0.08G      0.00G
legolas/usr_ggm_daily_ro.20140705_14:09:13    0/5618    10.36G      0.00G
legolas/usr_ggm_daily_rw.20140705_14:09:13    0/5619    10.36G      0.00G
legolas/var_ggm_daily_ro.20140705_14:10:09    0/5620   299.50G      0.79G
backup_daily_20140706_00:03:01                0/5621    -0.00G      0.00G
backup_hourly_20140706_00:03:01               0/5622    -0.00G      0.00G
Soft_daily_20140706_00:03:01                  0/5623    59.27G      0.00G
Soft_hourly_20140706_00:03:01                 0/5624    59.27G      0.00G
Win_daily_20140706_00:03:01                   0/5625   112.60G      0.00G
Win_hourly_20140706_00:03:01                  0/5626   112.60G      0.00G
backup/debian32_daily_20140706_00:03:01       0/5627  -129.94G      0.00G
backup/debian32_hourly_20140706_00:03:01      0/5628  -129.94G      0.00G
backup/debian64_daily_20140706_00:03:01       0/5629   644.62G      0.00G
backup/debian64_hourly_20140706_00:03:01      0/5630   644.62G      0.00G
backup/ubuntu_hourly_20140706_00:03:01        0/5631   328.12G      0.00G
backup/ubuntu_daily_20140706_00:03:01         0/5632   328.12G      0.00G
backup/0Notmachines_daily_20140706_00:03:01   0/5633   268.65G      0.00G
backup/0Notmachines_hourly_20140706_00:03:01  0/5634   268.65G      0.00G
backup/1Appliances_daily_20140706_00:03:01    0/5635     2.09G      0.00G
backup/1Appliances_hourly_20140706_00:03:01   0/5636     2.09G      0.00G
backup/win_daily_20140706_00:03:01            0/5637   461.61G      0.00G
backup/win_hourly_20140706_00:03:01           0/5638   461.61G      0.00G
backup_weekly_20140706_00:04:01               0/5639    -0.00G      0.00G
Soft_weekly_20140706_00:04:01                 0/5640    59.27G      0.00G
Win_weekly_20140706_00:04:01                  0/5641   112.60G      0.00G
backup/debian32_weekly_20140706_00:04:01      0/5642  -129.94G      0.00G
backup/debian64_weekly_20140706_00:04:01      0/5643   644.62G      0.00G
backup/ubuntu_weekly_20140706_00:04:01        0/5644   328.12G      0.00G
backup/0Notmachines_weekly_20140706_00:04:01  0/5645   268.65G      0.00G
backup/1Appliances_weekly_20140706_00:04:01   0/5646     2.09G      0.00G
backup/win_weekly_20140706_00:04:01           0/5647   461.61G      0.00G
legolas/home_ggm_daily_ro.20140706_10:02:01   0/5648    77.53G      0.00G
legolas/home_ggm_daily_rw.20140706_10:02:01   0/5649    77.53G      0.00G
legolas/root_ggm_daily_ro.20140706_11:08:26   0/5650     1.67G      0.00G
legolas/root_ggm_daily_rw.20140706_11:08:26   0/5651     1.67G      0.00G
legolas/tmp_ggm_daily_ro.20140706_11:09:31    0/5652     0.08G      0.00G
legolas/tmp_ggm_daily_rw.20140706_11:09:31    0/5653     0.08G      0.00G
legolas/usr_ggm_daily_ro.20140706_11:10:34    0/5654    10.36G      0.00G
legolas/usr_ggm_daily_rw.20140706_11:10:34    0/5655    10.36G      0.00G
legolas/var_ggm_daily_ro.20140706_11:11:30    0/5656   301.21G      4.53G
gargamel:/mnt/btrfs_pool2# btrfs quota disable .
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 23+ messages in thread
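The "(unknown)" rows above can be identified mechanically: any 0/N qgroup whose N no longer appears in 'btrfs subvolume list' belongs to a deleted subvolume or snapshot. A sketch, assuming bash (for the process substitutions) and a btrfs-progs that has 'qgroup show':

  comm -23 <(btrfs qgroup show . | awk '$1 ~ /^0\// {sub(/^0\//, "", $1); print $1}' | sort) \
           <(btrfs subvolume list . | awk '{print $2}' | sort)

Whatever it prints is the list of stale qgroup IDs.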
* Re: btrfs quotas related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-13 15:45                 ` btrfs quotas " Marc MERLIN
@ 2014-07-14  1:36                   ` Qu Wenruo
  2014-07-14  2:43                     ` Marc MERLIN
  0 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2014-07-14  1:36 UTC (permalink / raw)
  To: Marc MERLIN, Andrew E. Mileski, Duncan, Russell Coker, Satoru Takeuchi, linux-btrfs

-------- Original Message --------
Subject: Re: btrfs quotas related to OOM death problems on my 8GB server with 3.15.1 and 3.14?
From: Marc MERLIN <marc@merlins.org>
To: Andrew E. Mileski <andrewm@isoar.ca>, Duncan <1i5t5.duncan@cox.net>, Russell Coker <russell@coker.com.au>, Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>, linux-btrfs@vger.kernel.org
Date: 2014-07-13 23:45

> On Sun, Jul 13, 2014 at 08:37:34AM -0700, Marc MERLIN wrote:
>> On Sun, Jul 13, 2014 at 07:29:18AM -0700, Marc MERLIN wrote:
>>> Is /proc/slabinfo supposed to show anything useful?
>>>
>>> This is the filesystem in question:
>>> gargamel:~# btrfs fi df /mnt/btrfs_pool2/
>>> Data, single: total=3.34TiB, used=3.32TiB
>>> System, DUP: total=8.00MiB, used=400.00KiB
>>> System, single: total=4.00MiB, used=0.00
>>> Metadata, DUP: total=77.50GiB, used=59.87GiB
>>> Metadata, single: total=8.00MiB, used=0.00
>> Mmmh, now that I think of it, I do have quota enabled on that
>> filesystem.
>> Due to the many many files, this may be what's causing the problem.
>> I thought quotas were supposed to work with 3.15, but maybe there is
>> still a leak?
>>
>> I just turned quotas off and I'm going to let my server run for a while to
>> see if the leak stops.
> Mmmh, look what I found in my quota output before I turned it off.
> Many many unknown subvolumes.
>
> If I were to guess, when I rotate snapshots and delete them, their qgroup
> does not get deleted, and a lot of crap stays behind. Not sure if that is
> causing the memory leak, but that can't be good.

This seems to be a known issue.

When you enable quota and create a subvolume, a qgroup (0/<subvol id>)
will be created and bound to the newly created subvolume.
But on the other hand, when you delete the subvolume, the qgroup will
*not* be deleted automatically.
So you need to remove the qgroup manually.

More info can be found in the btrfs wiki:
https://btrfs.wiki.kernel.org/index.php/Quota_support#Known_issues

Thanks,
Qu

> Hopefully quota disable will have deleted all that cruft and will stop the leak.
>
> gargamel:/mnt/btrfs_pool2# btrfs-quota.py .
> subvol                                        group      total   unshared
> -------------------------------------------------------------------------------
> [the full qgroup listing, quoted verbatim from the parent message, snipped]
> gargamel:/mnt/btrfs_pool2# btrfs quota disable .

^ permalink raw reply	[flat|nested] 23+ messages in thread
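Following up on the advice above, the manual removal can be scripted. A sketch, assuming bash and a btrfs-progs that has 'btrfs qgroup destroy' (worth trying on a scratch filesystem first):

  cd /mnt/btrfs_pool2
  for id in $(comm -23 <(btrfs qgroup show . | awk '$1 ~ /^0\// {sub(/^0\//, "", $1); print $1}' | sort) \
                       <(btrfs subvolume list . | awk '{print $2}' | sort)); do
      btrfs qgroup destroy 0/$id .    # drop the qgroup left behind by a deleted subvolume
  done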
* Re: btrfs quotas related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-14  1:36                   ` Qu Wenruo
@ 2014-07-14  2:43                     ` Marc MERLIN
  0 siblings, 0 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-14  2:43 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Andrew E. Mileski, Duncan, Russell Coker, Satoru Takeuchi, linux-btrfs

On Mon, Jul 14, 2014 at 09:36:28AM +0800, Qu Wenruo wrote:
> When you enable quota and create a subvolume, a qgroup (0/<subvol
> id>) will be created and bound to the newly created subvolume.
> But on the other hand, when you delete the subvolume, the qgroup
> will *not* be deleted automatically.
> So you need to remove the qgroup manually.
>
> More info can be found in the btrfs wiki:
> https://btrfs.wiki.kernel.org/index.php/Quota_support#Known_issues

This was updated since I last looked at it, thanks for pointing that out.

So there must be a leak that is related to how many qgroups you have,
because even if I had 1600 stale qgroups, they shouldn't slowly eat up
6GB of RAM (that would be almost 4MB per qgroup), correct?

That said, I'll try the memleak detector you pointed me to, and read up
on how it's supposed to work.

Thanks for your help,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-13 14:29             ` btrfs is " Marc MERLIN
  2014-07-13 15:37               ` Marc MERLIN
@ 2014-07-14  1:24               ` Qu Wenruo
  2014-07-16  0:36                 ` Jérôme Poulin
  2014-07-16 15:55                 ` Marc MERLIN
  1 sibling, 2 replies; 23+ messages in thread
From: Qu Wenruo @ 2014-07-14  1:24 UTC (permalink / raw)
  To: Marc MERLIN, Andrew E. Mileski; +Cc: linux-btrfs

-------- Original Message --------
Subject: Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
From: Marc MERLIN <marc@merlins.org>
To: Andrew E. Mileski <andrewm@isoar.ca>
Date: 2014-07-13 22:29

> On Sun, Jul 06, 2014 at 07:58:15AM -0700, Marc MERLIN wrote:
>>> As an update, after 1.7 days of scrubbing, the system has started
>>> getting sluggish, I'm getting synchronization problems/crashes in some of
>>> my tools that talk to serial ports (likely due to mini deadlocks in the
>>> kernel), and I'm now getting a few btrfs hangs.
>> Predictably, it died yesterday afternoon after going into memory death
>> (it was answering pings, but userspace was dead, and even sysrq-o did
>> not respond, I had to power cycle the power outlet).
>>
>> This happened just before my 3rd scrub finished, so I'm now 2 out of 2:
>> running scrub on my 3 filesystems kills the system half way through the
>> 3rd scrub.
> Ok, I now changed the subject line to confirm that btrfs is to blame.
>
> I had my system booted 6 days and it held steady at 6.4GB free.
> I mounted 2 of my 4 btrfs filesystems (one by one) and waited a few
> days, and no problems with RAM going down.
>
> Then I mounted my 3rd btrfs filesystem, the one that holds many files
> that has rsync backups running, and I lost 1.4GB of RAM overnight.
> I've just umounted one of its mountpoints, the one where all the backups
> happen while leaving its main pool mounted and will see if RAM keeps
> going down or not (i.e. whether the memory leak is due to rsync activity
> or some other background btrfs process).
>
> But generally, is there a tool to locate which kernel function allocated
> all that RAM that seems to get allocated and forgotten?

This can be done by the kernel memory leak detector (kmemleak).
Location in the kernel config:
  -> Kernel hacking
    -> Memory Debugging
      -> Kernel memory leak detector

Then you can check /sys/kernel/debug/kmemleak to see which function call
caused the problem.

Thanks,
Qu

> Is /proc/slabinfo supposed to show anything useful?
>
> This is the filesystem in question:
> gargamel:~# btrfs fi df /mnt/btrfs_pool2/
> Data, single: total=3.34TiB, used=3.32TiB
> System, DUP: total=8.00MiB, used=400.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, DUP: total=77.50GiB, used=59.87GiB
> Metadata, single: total=8.00MiB, used=0.00
>
> Thanks,
> Marc

^ permalink raw reply	[flat|nested] 23+ messages in thread
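For reference, a typical kmemleak session once that config option is enabled looks like this (assuming debugfs is mounted at the usual place):

  mount -t debugfs none /sys/kernel/debug    # only if not already mounted
  echo scan > /sys/kernel/debug/kmemleak     # trigger an immediate scan
  cat /sys/kernel/debug/kmemleak             # list suspected leaks with allocation backtraces
  echo clear > /sys/kernel/debug/kmemleak    # forget current suspects before re-testing

The kernel also rescans on its own every ten minutes or so, logging "new suspected memory leaks" lines like the ones quoted later in this thread.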
* Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-14  1:24               ` btrfs is " Qu Wenruo
@ 2014-07-16  0:36                 ` Jérôme Poulin
  2014-07-16 15:55                 ` Marc MERLIN
  1 sibling, 0 replies; 23+ messages in thread
From: Jérôme Poulin @ 2014-07-16 0:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Marc MERLIN, Andrew E. Mileski, linux-btrfs

Hi,

I'd just like to comment that I currently have the same problem: quotas
are enabled, and the server can't stay up more than 15 days without
hogging all of its memory. The use case is a torrent server, which does
more reads than writes, and a desktop.

Kmemleak does not detect this leak after 10 days and at least 2 GB used.
On my system, the only leaks found were from Nvidia and AppArmor (Ubuntu
14.04, kernel 3.13, with the Nvidia proprietary driver).

On Sun, Jul 13, 2014 at 9:24 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> -------- Original Message --------
> Subject: Re: btrfs is related to OOM death problems on my 8GB server with
> both 3.15.1 and 3.14?
> From: Marc MERLIN <marc@merlins.org>
> To: Andrew E. Mileski <andrewm@isoar.ca>
> Date: 2014-07-13 22:29
>>
>> On Sun, Jul 06, 2014 at 07:58:15AM -0700, Marc MERLIN wrote:
>>>>
>>>> As an update, after 1.7 days of scrubbing, the system has started
>>>> getting sluggish, I'm getting synchronization problems/crashes in some
>>>> of my tools that talk to serial ports (likely due to mini deadlocks in
>>>> the kernel), and I'm now getting a few btrfs hangs.
>>>
>>> Predictably, it died yesterday afternoon after going into memory death
>>> (it was answering pings, but userspace was dead, and even sysrq-o did
>>> not respond, I had to power cycle the power outlet).
>>>
>>> This happened just before my 3rd scrub finished, so I'm now 2 out of 2:
>>> running scrub on my 3 filesystems kills the system half way through the
>>> 3rd scrub.
>>
>> Ok, I now changed the subject line to confirm that btrfs is to blame.
>>
>> I had my system booted 6 days and it held steady at 6.4GB free.
>> I mounted 2 of my 4 btrfs filesystems (one by one) and waited a few
>> days, and no problems with RAM going down.
>>
>> Then I mounted my 3rd btrfs filesystem, the one that holds many files
>> that has rsync backups running, and I lost 1.4GB of RAM overnight.
>> I've just umounted one of its mountpoints, the one where all the backups
>> happen while leaving its main pool mounted and will see if RAM keeps
>> going down or not (i.e. whether the memory leak is due to rsync activity
>> or some other background btrfs process).
>>
>> But generally, is there a tool to locate which kernel function allocated
>> all that RAM that seems to get allocated and forgotten?
>
> This can be done by the kernel memory leak detector (kmemleak).
> Location in the kernel config:
>   -> Kernel hacking
>     -> Memory Debugging
>       -> Kernel memory leak detector
>
> Then you can check /sys/kernel/debug/kmemleak to see which function call
> caused the problem.
>
> Thanks,
> Qu
>
>> Is /proc/slabinfo supposed to show anything useful?
>>
>> This is the filesystem in question:
>> gargamel:~# btrfs fi df /mnt/btrfs_pool2/
>> Data, single: total=3.34TiB, used=3.32TiB
>> System, DUP: total=8.00MiB, used=400.00KiB
>> System, single: total=4.00MiB, used=0.00
>> Metadata, DUP: total=77.50GiB, used=59.87GiB
>> Metadata, single: total=8.00MiB, used=0.00
>>
>> Thanks,
>> Marc
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-14  1:24               ` btrfs is " Qu Wenruo
  2014-07-16  0:36                 ` Jérôme Poulin
@ 2014-07-16 15:55                 ` Marc MERLIN
  2014-07-17  2:22                   ` Marc MERLIN
  1 sibling, 1 reply; 23+ messages in thread
From: Marc MERLIN @ 2014-07-16 15:55 UTC (permalink / raw)
  To: Qu Wenruo, Jérôme Poulin; +Cc: Andrew E. Mileski, linux-btrfs

On Mon, Jul 14, 2014 at 09:24:11AM +0800, Qu Wenruo wrote:
> > But generally, is there a tool to locate which kernel function allocated
> > all that RAM that seems to get allocated and forgotten?
> This can be done by the kernel memory leak detector (kmemleak).
> Location in the kernel config:
>   -> Kernel hacking
>     -> Memory Debugging
>       -> Kernel memory leak detector
>
> Then you can check /sys/kernel/debug/kmemleak to see which function
> call caused the problem.

I wanted to report back on this.
Unfortunately my laptop with 3.15.5 does not stay up more than 30
minutes, sometimes less, when kmemleak is on. When I turn it off from
the boot command line, the laptop seems to behave ok.

On Tue, Jul 15, 2014 at 08:45:47PM -0400, Jérôme Poulin wrote:
> Hi,
>
> For this same problem I once got into single user after 2 weeks of
> utilisation and killed all, umounted all FS except root which is ext4,
> rmmod'ed all modules, and see for yourself:
> http://i39.tinypic.com/2rrrjtl.jpg
>
> For those who want it textually:
> 15 days uptime
> 10 user mode processes (systemd, top and bash)
> 2 GB memory usage, 4 GB total memory, 17 MB cache, 32 KB buffers.

Very interesting, thanks for confirming. So even after removing the
btrfs module, the memory was not reclaimed, correct?

For what it's worth, my server seems ok now that I turned off quotas on it.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 23+ messages in thread
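The boot-time switch referred to above is the 'kmemleak=off' kernel parameter, which disables the detector while leaving it compiled in; for example, appended to a GRUB-style kernel line (paths illustrative):

  linux /vmlinuz-3.15.5 root=/dev/sda1 ro kmemleak=off

Booting with 'kmemleak=on' (or simply omitting the parameter, on a default-on build) re-enables it on the next boot.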
* Re: btrfs is related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-16 15:55                 ` Marc MERLIN
@ 2014-07-17  2:22                   ` Marc MERLIN
  0 siblings, 0 replies; 23+ messages in thread
From: Marc MERLIN @ 2014-07-17 2:22 UTC (permalink / raw)
  To: Qu Wenruo, Jérôme Poulin; +Cc: Andrew E. Mileski, linux-btrfs

On Wed, Jul 16, 2014 at 08:55:32AM -0700, Marc MERLIN wrote:
> On Mon, Jul 14, 2014 at 09:24:11AM +0800, Qu Wenruo wrote:
> > > But generally, is there a tool to locate which kernel function allocated
> > > all that RAM that seems to get allocated and forgotten?
> > This can be done by the kernel memory leak detector (kmemleak).
> > Location in the kernel config:
> >   -> Kernel hacking
> >     -> Memory Debugging
> >       -> Kernel memory leak detector
> >
> > Then you can check /sys/kernel/debug/kmemleak to see which function
> > call caused the problem.
>
> I wanted to report back on this.
> Unfortunately my laptop with 3.15.5 does not stay up more than 30
> minutes, sometimes less, when kmemleak is on. When I turn it off from
> the boot command line, the laptop seems to behave ok.

I forgot to attach the logs of what I get soon after enabling kmemleak:
http://marc.merlins.org/tmp/btrfs_hang_kmemleak.txt

Highlights: first what got logged as stuff started hanging, and later
the top of sysrq-w (search for SysRq to get there more quickly):

kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

INFO: task kworker/u16:4:95 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:4   D 0000000000000000     0    95      2 0x00000000
Workqueue: btrfs-flush_delalloc normal_work_helper
 ffff880407297b60 0000000000000046 ffff880407297b30 ffff880407297fd8
 ffff8804075f44d0 00000000000141c0 ffff88041e2941c0 ffff8804075f44d0
 ffff880407297c00 0000000000000002 ffffffff810fdda8 ffff880407297b70
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff810731c8>] ? ttwu_stat+0x97/0xcc
 [<ffffffff8162223d>] ? _raw_spin_unlock_irqrestore+0x1f/0x32
 [<ffffffff81078164>] ? try_to_wake_up+0x1b3/0x1c5
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810fff50>] filemap_flush+0x1c/0x1e
 [<ffffffff81231945>] btrfs_run_delalloc_work+0x32/0x69
 [<ffffffff81252437>] normal_work_helper+0xfe/0x240
 [<ffffffff81065e29>] process_one_work+0x195/0x2d2
 [<ffffffff810660cb>] worker_thread+0x136/0x205
 [<ffffffff81065f95>] ? process_scheduled_works+0x2f/0x2f
 [<ffffffff8106b564>] kthread+0xae/0xb6
 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
 [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:548 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0000000000000000     0   548      2 0x00000000
 ffff880404bbfc80 0000000000000046 000000000000e5fc ffff880404bbffd8
 ffff880404bb82d0 00000000000141c0 7fffffffffffffff ffff8802099d2888
 0000000000000002 ffffffff8161f048 7fffffffffffffff ffff880404bbfc90
Call Trace:
 [<ffffffff8161f048>] ? sock_rps_reset_flow+0x37/0x37
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161f081>] schedule_timeout+0x39/0x129
 [<ffffffff810765ed>] ? get_parent_ip+0xd/0x3c
 [<ffffffff81625a8f>] ? preempt_count_add+0x7a/0x8d
 [<ffffffff81620244>] __wait_for_common+0x11a/0x159
 [<ffffffff810781bf>] ? wake_up_state+0x12/0x12
 [<ffffffff816202a7>] wait_for_completion+0x24/0x26
 [<ffffffff8123981f>] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [<ffffffff812417f8>] btrfs_run_ordered_operations+0x1e7/0x21e
 [<ffffffff8122cbad>] btrfs_commit_transaction+0x27/0x8b0
 [<ffffffff8122966a>] transaction_kthread+0xf8/0x1ab
 [<ffffffff81229572>] ? btrfs_cleanup_transaction+0x44c/0x44c
 [<ffffffff8106b564>] kthread+0xae/0xb6
 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
 [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
INFO: task pidgin:11402 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pidgin          D 0000000000000006     0 11402   5650 0x00000080
 ffff8803c4307c30 0000000000000082 ffff8803c4307c00 ffff8803c4307fd8
 ffff8803cb684050 00000000000141c0 ffff88041e3941c0 ffff8803cb684050
 ffff8803c4307cd0 0000000000000002 ffffffff810fdda8 ffff8803c4307c40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81317ac9>] ? debug_smp_processor_id+0x17/0x19
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3
 [<ffffffff8132254e>] ? __percpu_counter_add+0x8c/0xa6
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c46a>] SyS_fsync+0x10/0x14
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task BrowserBlocking:15468 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
BrowserBlocking D 0000000000000000     0 15468  15395 0x00000080
 ffff88038ba6fc30 0000000000000086 ffff88038ba6fc00 ffff88038ba6ffd8
 ffff8800b394c310 00000000000141c0 ffff88041e2141c0 ffff8800b394c310
 ffff88038ba6fcd0 0000000000000002 ffffffff810fdda8 ffff88038ba6fc40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3
 [<ffffffff81158d94>] ? __sb_end_write+0x2d/0x5b
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c46a>] SyS_fsync+0x10/0x14
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task Chrome_DBThread:15469 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Chrome_DBThread D 0000000000000004     0 15469  15395 0x00000080
 ffff8803967afc30 0000000000000086 ffff8803967afc00 ffff8803967affd8
 ffff880395892290 00000000000141c0 ffff88041e3141c0 ffff880395892290
 ffff8803967afcd0 0000000000000002 ffffffff810fdda8 ffff8803967afc40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff8111d502>] ? copy_page_to_iter+0x163/0x26b
 [<ffffffff810fdf7d>] ? file_accessed+0x13/0x15
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task Chrome_FileThre:15470 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Chrome_FileThre D 0000000000000005     0 15470  15395 0x00000080
 ffff88038bd7fc30 0000000000000086 ffff88038bd7fc00 ffff88038bd7ffd8
 ffff8803a065c250 00000000000141c0 ffff88041e3541c0 ffff8803a065c250
 ffff88038bd7fcd0 0000000000000002 ffffffff810fdda8 ffff88038bd7fc40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff816204bc>] ? mutex_unlock+0x16/0x18
 [<ffffffff8123bb5d>] ? btrfs_file_aio_write+0x3e9/0x4b6
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3
 [<ffffffff81158d94>] ? __sb_end_write+0x2d/0x5b
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c46a>] SyS_fsync+0x10/0x14
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task IndexedDB:15475 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
IndexedDB       D 0000000000000007     0 15475  15395 0x00000080
 ffff88038ba87c30 0000000000000086 ffff88038ba87c00 ffff88038ba87fd8
 ffff88038bda6150 00000000000141c0 ffff88041e3d41c0 ffff88038bda6150
 ffff88038ba87cd0 0000000000000002 ffffffff810fdda8 ffff88038ba87c40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff816204bc>] ? mutex_unlock+0x16/0x18
 [<ffffffff8123bb5d>] ? btrfs_file_aio_write+0x3e9/0x4b6
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff8123a743>] btrfs_sync_file+0xa8/0x2b3
 [<ffffffff8132254e>] ? __percpu_counter_add+0x8c/0xa6
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task BrowserBlocking:15487 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
BrowserBlocking D ffff8804044d4080     0 15487  15395 0x00000080
 ffff88038ba0bc30 0000000000000086 ffff88038ba0bc00 ffff88038ba0bfd8
 ffff8800b4256350 00000000000141c0 ffff88041e2941c0 ffff8800b4256350
 ffff88038ba0bcd0 0000000000000002 ffffffff810fdda8 ffff88038ba0bc40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff8111d38c>] ? pagefault_enable+0xe/0x21
 [<ffffffff8111d502>] ? copy_page_to_iter+0x163/0x26b
 [<ffffffff810fdf7d>] ? file_accessed+0x13/0x15
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [intermediate frames garbled in the captured log]
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task BrowserBlocking:15501 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
BrowserBlocking D 0000000000000002     0 15501  15395 0x00000080
 ffff8800af733c30 0000000000000086 ffff8800af733c00 ffff8800af733fd8
 ffff8800af73c3d0 00000000000141c0 ffff88041e2941c0 ffff8800af73c3d0
 ffff8800af733cd0 0000000000000002 ffffffff810fdda8 ffff8800af733c40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4
 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a
 [<ffffffff8110873d>] do_writepages+0x1e/0x2c
 [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15
 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3
 [<ffffffff81158d94>] ? __sb_end_write+0x2d/0x5b
 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22
 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e
 [<ffffffff8117c261>] do_fsync+0x2c/0x4c
 [<ffffffff8117c46a>] SyS_fsync+0x10/0x14
 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f
INFO: task Chrome_HistoryT:15507 blocked for more than 120 seconds.
      Tainted: G           O 3.15.5-amd64-i915-preempt-20140714 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Chrome_HistoryT D 0000000000000000     0 15507  15395 0x00000080
 ffff88038772fc30 0000000000000086 ffff88038772fc00 ffff88038772ffd8
 ffff88038ba0e1d0 00000000000141c0 ffff88041e2141c0 ffff88038ba0e1d0
 ffff88038772fcd0 0000000000000002 ffffffff810fdda8 ffff88038772fc40
Call Trace:
 [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
 [<ffffffff8161fa5e>] schedule+0x73/0x75
 [<ffffffff8161fc03>] io_schedule+0x60/0x7a
 [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
 [<ffffffff810fde71>] __lock_page+0x69/0x6b
 [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
 [<ffffffff81242aaf>] lock_page+0x1e/0x21
 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
 [<ffffffff813085a7>] ? radix_tree_lookup_slot+0x10/0x1f
 [<ffffffff810fe4bc>] ? find_get_entry+0x69/0x75
 [<ffffffff810fdf7d>] ? file_accessed+0x13/0x15
 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c
 [<ffffffff81230d08>] ?
btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f SysRq : Show Blocked State task PC stack pid father kworker/u16:4 D 0000000000000000 0 95 2 0x00000000 Workqueue: btrfs-flush_delalloc normal_work_helper ffff880407297b60 0000000000000046 ffff880407297b30 ffff880407297fd8 ffff8804075f44d0 00000000000141c0 ffff88041e2941c0 ffff8804075f44d0 ffff880407297c00 0000000000000002 ffffffff810fdda8 ffff880407297b70 Call Trace: [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff810731c8>] ? ttwu_stat+0x97/0xcc [<ffffffff8162223d>] ? _raw_spin_unlock_irqrestore+0x1f/0x32 [<ffffffff81078164>] ? try_to_wake_up+0x1b3/0x1c5 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810fff50>] filemap_flush+0x1c/0x1e [<ffffffff81231945>] btrfs_run_delalloc_work+0x32/0x69 [<ffffffff81252437>] normal_work_helper+0xfe/0x240 [<ffffffff81065e29>] process_one_work+0x195/0x2d2 [<ffffffff810660cb>] worker_thread+0x136/0x205 [<ffffffff81065f95>] ? process_scheduled_works+0x2f/0x2f [<ffffffff8106b564>] kthread+0xae/0xb6 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 btrfs-transacti D 0000000000000000 0 548 2 0x00000000 ffff880404bbfc80 0000000000000046 000000000000e5fc ffff880404bbffd8 ffff880404bb82d0 00000000000141c0 7fffffffffffffff ffff8802099d2888 0000000000000002 ffffffff8161f048 7fffffffffffffff ffff880404bbfc90 Call Trace: [<ffffffff8161f048>] ? sock_rps_reset_flow+0x37/0x37 [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161f081>] schedule_timeout+0x39/0x129 [<ffffffff810765ed>] ? get_parent_ip+0xd/0x3c [<ffffffff81625a8f>] ? preempt_count_add+0x7a/0x8d [<ffffffff81620244>] __wait_for_common+0x11a/0x159 [<ffffffff810781bf>] ? wake_up_state+0x12/0x12 [<ffffffff816202a7>] wait_for_completion+0x24/0x26 [<ffffffff8123981f>] btrfs_wait_and_free_delalloc_work+0x16/0x28 [<ffffffff812417f8>] btrfs_run_ordered_operations+0x1e7/0x21e [<ffffffff8122cbad>] btrfs_commit_transaction+0x27/0x8b0 [<ffffffff8122966a>] transaction_kthread+0xf8/0x1ab [<ffffffff81229572>] ? btrfs_cleanup_transaction+0x44c/0x44c [<ffffffff8106b564>] kthread+0xae/0xb6 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106b4b6>] ? 
__kthread_parkme+0x61/0x61 pidgin D 0000000000000006 0 11402 5650 0x00000080 ffff8803c4307c30 0000000000000082 ffff8803c4307c00 ffff8803c4307fd8 ffff8803cb684050 00000000000141c0 ffff88041e3941c0 ffff8803cb684050 ffff8803c4307cd0 0000000000000002 ffffffff810fdda8 ffff8803c4307c40 Call Trace: [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81317ac9>] ? debug_smp_processor_id+0x17/0x19 [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3 [<ffffffff8132254e>] ? __percpu_counter_add+0x8c/0xa6 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c46a>] SyS_fsync+0x10/0x14 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f BrowserBlocking D 0000000000000000 0 15468 15395hedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff8111d502>] ? copy_page_to_iter+0x163/0x26b [<ffffffff810fdf7d>] ? file_accessed+0x13/0x15 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f Chrome_FileThre D 0000000000000005 0 15470 15395 0x00000080 ffff88038bd7fc30 0000000000000086 ffff88038bd7fc00 ffff88038bd7ffd8 ffff8803a065c250 00000000000141c0 ffff88041e3541c0 ffff8803a065c250 ffff88038bd7fcd0 0000000000000002 ffffffff810fdda8 ffff88038bd7fc40 Call Trace: [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff816204bc>] ? mutex_unlock+0x16/0x18 [<ffffffff8123bb5d>] ? btrfs_file_aio_write+0x3e9/0x4b6 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? 
btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3 [<ffffffff81158d94>] ? __sb_end_write+0x2d/0x5b [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c46a>] SyS_fsync+0x10/0x14 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f IndexedDB D 0000000000000007 0 15475 15395 0x00000080 ffff88038ba87c30 0000000000000086 ffff88038ba87c00 ffff88038ba87fd8 ffff88038bda6150 00000000000141c0 ffff88041e3d41c0 ffff88038bda6150 ffff88038ba87cd0 0000000000000002 ffffffff810fdda8 ffff88038ba87c40 Call Trace: [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake/0x21 [<ffffffff8111d502>] ? copy_page_to_iter+0x163/0x26b [<ffffffff810fdf7d>] ? file_accessed+0x13/0x15 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a743>] btrfs_sync_file+0xa8/0x2b3 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f BrowserBlocking D 0000000000000002 0 15501 15395 0x00000080 ffff8800af733c30 0000000000000086 ffff8800af733c00 ffff8800af733fd8 ffff8800af73c3d0 00000000000141c0 ffff88041e2941c0 ffff8800af73c3d0 ffff8800af733cd0 0000000000000002 ffffffff810fdda8 ffff8800af733c40 Call Trace: [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3 [<ffffffff81158d94>] ? __sb_end_write+0x2d/0x5b [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c46a>] SyS_fsync+0x10/0x14 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f Chrome_HistoryT D 0000000000000000 0 15507 15395 0x00000080 ffff88038772fc30 0000000000000086 ffff88038772fc00 ffff88038772ffd8 ffff88038ba0e1d0 00000000000141c0 ffff88041e2141c0 ffff88038ba0e1d0 ffff88038772fcd0 0000000000000002 ffffffff810fdda8 ffff88038772fc40 Call Trace: [<ffffffff810fdda8>] ? 
wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff813085a7>] ? radix_tree_lookup_slot+0x10/0x1f [<ffffffff810fe4bc>] ? find_get_entry+0x69/0x75 [<ffffffff810fdf7d>] ? file_accessed+0x13/0x15 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_wrrange+0x55/0x57 [<ffffffff810ff57d>] filemap_fdatawrite_range+0x13/0x15 [<ffffffff8123a726>] btrfs_sync_file+0x8b/0x2b3 [<ffffffff8117c02f>] vfs_fsync_range+0x18/0x22 [<ffffffff8117c055>] vfs_fsync+0x1c/0x1e [<ffffffff8117c261>] do_fsync+0x2c/0x4c [<ffffffff8117c481>] SyS_fdatasync+0x13/0x17 [<ffffffff81628e2d>] system_call_fastpath+0x1a/0x1f kworker/u16:5 D 0000000000000000 0 15410 2 0x00000080 Workqueue: btrfs-delalloc normal_work_helper ffff88039b863860 0000000000000046 ffff88039b863830 ffff88039b863fd8 ffff88039f87a0d0 00000000000141c0 ffff88041e3141c0 ffff88039f87a0d0 ffff88039b863900 0000000000000002 ffffffff810fdda8 ffff88039b863870 Call Trace: [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff810fe757>] lock_page+0x19/0x1c [<ffffffff810fe7b3>] find_lock_entry+0x33/0x55 [<ffffffff810fe7e3>] find_lock_page+0xe/0x1b [<ffffffff810feccc>] find_or_create_page+0x31/0x83 [<ffffffff81260ce7>] io_ctl_prepare_pages+0x49/0x11c [<ffffffff81262897>] __load_free_space_cache+0x1be/0x56c [<ffffffff81262d2b>] load_free_space_cache+0xe6/0x199 [<ffffffff810765ed>] ? get_parent_ip+0xd/0x3c [<ffffffff81217d49>] cache_block_group+0x1c4/0x343 [<ffffffff8108489d>] ? finish_wait+0x65/0x65 [<ffffffff8121c7ed>] find_free_extent+0x391/0x89e [<ffffffff8121ce7a>] btrfs_reserve_extent+0x70/0x114 [<ffffffff81232765>] cow_file_range+0x1b0/0x388 [<ffffffff81233414>] submit_compressed_extents+0x102/0x40f [<ffffffff81152ae3>] ? delete_object_full+0x29/0x2c [<ffffffff81231910>] ? async_cow_free+0x24/0x27 [<ffffffff812337a7>] async_cow_submit+0x86/0x8b [<ffffffff812524cd>] normal_work_helper+0x194/0x240 [<ffffffff81065e29>] process_one_work+0x195/0x2d2 [<ffffffff810660cb>] worker_thread+0x136/0x205 [<ffffffff81065f95>] ? process_scheduled_works+0x2f/0x2f [<ffffffff8106b564>] kthread+0xae/0xb6 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 kworker/u16:11 D 0000000000000000 0 15419 2 0x00000080 Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1) ffff8800b3823a00 0000000000000046 ffff8800b38239d0 ffff8800b3823fd8 ffff8800b6f60110 00000000000141c0 ffff88041e3541c0 ffff8800b6f60110 ffff8800b3823aa0 0000000000000002 ffffffff810fdda8 ffff8800b3823a10 Call Trace: [<ffffffff810fdda8>] ? 
wait_on_page_read+0x3c/0x3c [<ffffffff8161fa5e>] schedule+0x73/0x75 [<ffffffff8161fc03>] io_schedule+0x60/0x7a [<ffffffff810fddb6>] sleep_on_page+0xe/0x12 [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a [<ffffffff810fde71>] __lock_page+0x69/0x6b [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34 [<ffffffff81242aaf>] lock_page+0x1e/0x21 [<ffffffff812465ba>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6 [<ffffffff81246a18>] extent_writepages+0x4b/0x5c [<ffffffff81230d08>] ? btrfs_submit_direct+0x3f4/0x3f4 [<ffffffff81625a8c>] ? preempt_count_add+0x77/0x8d [<ffffffff8122f2e3>] btrfs_writepages+0x28/0x2a [<ffffffff8110873d>] do_writepages+0x1e/0x2c [<ffffffff81177d7f>] __writeback_single_inode+0x7d/0x238 [<ffffffff81178ab7>] writeback_sb_inodes+0x1eb/0x339 [<ffffffff81178c79>] __writeback_inodes_wb+0x74/0xb7 [<ffffffff81178df4>] wb_writeback+0x138/0x293 [<ffffffff8117942c>] bdi_writeback_workfn+0x19a/0x329 [<ffffffff8100d047>] ? load_TLS+0xb/0xf [<ffffffff81065e29>] process_one_work+0x195/0x2d2 [<ffffffff810660cb>] worker_thread+0x136/0x205 [<ffffffff81065f95>] ? process_scheduled_works+0x2f/0x2f [<ffffffff8106b564>] kthread+0xae/0xb6 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61 -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems .... .... what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ ^ permalink raw reply [flat|nested] 23+ messages in thread
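
[The dumps above come from the hung-task watchdog and from sysrq-w ("Show
Blocked State"). For anyone wanting to capture the same data before a box
wedges completely, a minimal sketch of a capture loop follows; the log
path and the 60-second interval are illustrative choices, not something
taken from this thread:

    # Make the hung-task watchdog fire sooner than the default 120s.
    echo 60 > /proc/sys/kernel/hung_task_timeout_secs

    # Enable all sysrq functions so sysrq-w works from a script.
    echo 1 > /proc/sys/kernel/sysrq

    while true; do
        # 'w' dumps tasks stuck in uninterruptible (D) state to the kernel log.
        echo w > /proc/sysrq-trigger
        dmesg | tail -n 200 >> /var/log/blocked-tasks.log
        sync    # push the log out while the disk still responds
        sleep 60
    done
]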
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-06 14:58     ` Marc MERLIN
  2014-07-13 14:29       ` btrfs is " Marc MERLIN
@ 2014-07-16  0:45       ` Jérôme Poulin
  1 sibling, 0 replies; 23+ messages in thread
From: Jérôme Poulin @ 2014-07-16 0:45 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Andrew E. Mileski, linux-btrfs

Hi,

For this same problem, I once dropped into single-user mode after 2 weeks
of use, killed everything, unmounted all filesystems except the ext4 root,
and rmmod'ed all modules; see for yourself:
http://i39.tinypic.com/2rrrjtl.jpg

For those who want it textually:
15 days uptime
10 user mode processes (systemd, top and bash)
2 GB memory usage, 4 GB total memory, 17 MB cache, 32 KB buffers.

On Sun, Jul 6, 2014 at 10:58 AM, Marc MERLIN <marc@merlins.org> wrote:
> On Sat, Jul 05, 2014 at 07:43:18AM -0700, Marc MERLIN wrote:
>> On Sat, Jul 05, 2014 at 09:47:09AM -0400, Andrew E. Mileski wrote:
>> > On 2014-07-03 9:19 PM, Marc MERLIN wrote:
>> > >I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
>> > >running out of memory and deadlocking (panic= doesn't even work).
>> > >I downgraded back to 3.14, but I already had the problem once since then.
>> >
>> > I didn't see any mention of the btrfs utility version in this thread
>> > (I may be blind though).
>> >
>> > My server was suffering from frequent panics upon scrub / defrag /
>> > balance, until I updated the btrfs utility. That resolved all my
>> > issues.
>>
>> Really? The userland tool should only send ioctls to the kernel, I
>> really can't see how it would cause the kernel code to panic or not.
>>
>> gargamel:~# btrfs --version
>> Btrfs v3.14.1
>> which is the latest in debian unstable.
>>
>> As an update, after 1.7 days of scrubbing, the system has started
>> getting sluggish, I'm getting synchronization problems/crashes in some of
>> my tools that talk to serial ports (likely due to mini deadlocks in the
>> kernel), and I'm now getting a few btrfs hangs.
>
> Predictably, it died yesterday afternoon after going into memory death
> (it was answering pings, but userspace was dead, and even sysrq-o did
> not respond, I had to power cycle the power outlet).
>
> This happened just before my 3rd scrub finished, so I'm now 2 out of 2:
> running scrub on my 3 filesystems kills the system half way through the
> 3rd scrub.
>
> This is the last memory log that reached the disk:
> http://marc.merlins.org/tmp/btrfs-oom2.txt
>
> Do those logs point to any possible culprit, or a kernel memory leak
> cannot be pointed to its source because the kernel loses track of who
> requested the memory that leaked?
>
>
> Excerpt here:
> Sat Jul  5 14:25:04 PDT 2014
>              total       used       free     shared    buffers     cached
> Mem:       7894792    7712384     182408          0         28     227480
> -/+ buffers/cache:    7484876     409916
> Swap:     15616764     463732   15153032
>
> Userspace is using 345MB according to ps
>
> Sat Jul  5 14:25:04 PDT 2014
> MemTotal:        7894792 kB
> MemFree:          184556 kB
> MemAvailable:     269568 kB
> Buffers:              28 kB
> Cached:           228164 kB
> SwapCached:        18296 kB
> Active:           178196 kB
> Inactive:         187016 kB
> Active(anon):      70068 kB
> Inactive(anon):    71100 kB
> Active(file):     108128 kB
> Inactive(file):   115916 kB
> Unevictable:        5624 kB
> Mlocked:            5624 kB
> SwapTotal:      15616764 kB
> SwapFree:       15152768 kB
> Dirty:             17716 kB
> Writeback:           516 kB
> AnonPages:        140588 kB
> Mapped:            21940 kB
> Shmem:               688 kB
> Slab:             181708 kB
> SReclaimable:      59808 kB
> SUnreclaim:       121900 kB
> KernelStack:        4728 kB
> PageTables:         8480 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    19564160 kB
> Committed_AS:    1633204 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      358996 kB
> VmallocChunk:   34359281468 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:         0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:      144920 kB
> DirectMap2M:     7942144 kB
>
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
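
[Worth noting in the quoted numbers: with only ~345MB accounted to
userspace, Slab (181708 kB) and especially SUnreclaim (121900 kB) are
where leaked kernel memory would show up first. A rough way to attribute
it, sketched here with standard interfaces; nothing below is taken from
Marc's actual logs, and the cache names these commands print are just
whatever the running kernel has:

    # Rank slab caches by size (slabtop ships with procps).
    slabtop -o -s c | head -20

    # Roughly the same from /proc/slabinfo: cache name, approx memory in KB
    # (num_objs * objsize), sorted largest first.
    awk 'NR > 2 { printf "%-28s %10.0f KB\n", $1, $3 * $4 / 1024 }' \
        /proc/slabinfo | sort -rn -k2 | head

    # If the leak is not in a named cache, a kernel built with
    # CONFIG_DEBUG_KMEMLEAK can point at the allocation sites:
    mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak
]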
* Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
  2014-07-04  1:19 Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14? Marc MERLIN
  2014-07-04  4:33 ` Russell Coker
  2014-07-05 13:47 ` Andrew E. Mileski
@ 2014-07-05 14:27 ` Andrew E. Mileski
  2 siblings, 0 replies; 23+ messages in thread
From: Andrew E. Mileski @ 2014-07-05 14:27 UTC (permalink / raw)
To: linux-btrfs

On 2014-07-03 9:19 PM, Marc MERLIN wrote:
> I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
> running out of memory and deadlocking (panic= doesn't even work).
> I downgraded back to 3.14, but I already had the problem once since then.

I didn't see any mention of the btrfs utility version in this thread
(I may be blind though).

My server was suffering from frequent panics upon scrub / defrag /
balance, until I updated the btrfs utility. That resolved all my
issues.

My server has two BTRFS filesystems of ~25 TiB each, and 4 GB of RAM
(15% used). A hardware upgrade is imminent; just waiting for the
backup to complete.

~~ Andrew E. Mileski
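
[Andrew's fix is easy to test on any box: compare the installed userspace
tools against the running kernel before suspecting either. A trivial
check; the example output is illustrative:

    btrfs --version   # userspace tools, e.g. "Btrfs v3.14.1"
    uname -r          # running kernel version
]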
end of thread, other threads:[~2014-07-17  2:22 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-04  1:19 Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14? Marc MERLIN
2014-07-04  4:33 ` Russell Coker
2014-07-04  6:04   ` Marc MERLIN
2014-07-04  6:23     ` Satoru Takeuchi
2014-07-04 14:24       ` Marc MERLIN
2014-07-04 14:45   ` Russell Coker
2014-07-04 15:07     ` Marc MERLIN
2014-07-04 22:13       ` Duncan
2014-07-05 13:47 ` Andrew E. Mileski
2014-07-05 14:43   ` Marc MERLIN
2014-07-05 15:17     ` Andrew E. Mileski
2014-07-06 14:58       ` Marc MERLIN
2014-07-13 14:29         ` btrfs is " Marc MERLIN
2014-07-13 15:37           ` Marc MERLIN
2014-07-13 15:45             ` btrfs quotas " Marc MERLIN
2014-07-14  1:36               ` Qu Wenruo
2014-07-14  2:43                 ` Marc MERLIN
2014-07-14  1:24           ` btrfs is " Qu Wenruo
2014-07-16  0:36             ` Jérôme Poulin
2014-07-16 15:55               ` Marc MERLIN
2014-07-17  2:22                 ` Marc MERLIN
2014-07-16  0:45         ` Is btrfs " Jérôme Poulin
2014-07-05 14:27 ` Andrew E. Mileski