* doing lots of disk writes causes oom killer to kill processes
@ 2013-02-08 16:31 Michal Suchanek
2013-03-11 13:15 ` Michal Suchanek
0 siblings, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-02-08 16:31 UTC (permalink / raw)
To: linux-mm, 699277
[-- Attachment #1: Type: text/plain, Size: 941 bytes --]
Hello,
I am dealing with VM disk images, and performing something like wiping
free space to prepare an image for compression and storage on a server, or
copying it to an external USB disk, causes
1) a system lockup on the order of a few tens of seconds, during which all
CPU cores are 100% used by the system and the machine is basically unusable
2) the oom killer killing processes
This is all on a system with 8 GB of RAM, so there should be plenty of space to work with.
This happens with kernels 3.6.4 and 3.7.1.
With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
problem, even with less RAM.
I have vm.swappiness = 0 set for a long time already.
Presumably the kernel should stop the process that is writing to disk from
creating more buffers, and write some of them out, rather than start killing
processes to free memory. It might also use some swap, even with zero
swappiness, when there is no other way to free more space. Earlier
kernels certainly did that.
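For illustration, the free-space wipe is roughly the following (the mount
point and file name are made up; dd simply runs until the filesystem fills):

  dd if=/dev/zero of=/mnt/image/zerofill bs=1M   # exits with ENOSPC when full
  sync
  rm /mnt/image/zerofill

and swappiness is set with:

  sysctl -w vm.swappiness=0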
Thanks
Michal
[-- Attachment #2: dmesg.oom-kill.txt --]
[-- Type: text/plain, Size: 18561 bytes --]
[773518.533786] init cpuset=/ mems_allowed=0
[773518.533790] Pid: 1, comm: init Not tainted 3.7-trunk-amd64 #1 Debian 3.7.3-1~experimental.1
[773518.533791] Call Trace:
[773518.533802] [<ffffffff81373554>] ? dump_header+0x70/0x1aa
[773518.533807] [<ffffffff8103fbff>] ? put_online_cpus+0x1f/0x66
[773518.533811] [<ffffffff810a0dd4>] ? rcu_oom_notify+0xc6/0xd8
[773518.533815] [<ffffffff810c428b>] ? oom_kill_process+0x72/0x2c5
[773518.533819] [<ffffffff810612eb>] ? should_resched+0x5/0x23
[773518.533821] [<ffffffff810c49ca>] ? out_of_memory+0x384/0x3d5
[773518.533825] [<ffffffff810c8873>] ? __alloc_pages_nodemask+0x5c0/0x74c
[773518.533829] [<ffffffff810f9e4e>] ? kmem_getpages+0x54/0x124
[773518.533832] [<ffffffff810fa778>] ? fallback_alloc+0x125/0x1ed
[773518.533834] [<ffffffff810fb3a2>] ? kmem_cache_alloc+0x7b/0xfd
[773518.533838] [<ffffffff8111099d>] ? getname_flags.part.23+0x22/0x10a
[773518.533841] [<ffffffff81113523>] ? user_path_at_empty+0x35/0xa8
[773518.533845] [<ffffffff81372269>] ? __bad_area_nosemaphore+0x95/0x1ec
[773518.533849] [<ffffffff8137b4f4>] ? __do_page_fault+0x2fa/0x376
[773518.533851] [<ffffffff810612eb>] ? should_resched+0x5/0x23
[773518.533854] [<ffffffff813778f2>] ? _cond_resched+0x6/0x1b
[773518.533857] [<ffffffff8110bc9c>] ? cp_new_stat+0x11d/0x130
[773518.533861] [<ffffffff81042e59>] ? timespec_add_safe+0x1e/0x4d
[773518.533864] [<ffffffff8110ba31>] ? vfs_fstatat+0x2d/0x63
[773518.533867] [<ffffffff81115973>] ? poll_select_copy_remaining+0xea/0xff
[773518.533869] [<ffffffff8110bd46>] ? sys_newstat+0x12/0x2d
[773518.533872] [<ffffffff81378918>] ? page_fault+0x28/0x30
[773518.533875] [<ffffffff8137d6e9>] ? system_call_fastpath+0x16/0x1b
[773518.533877] Mem-Info:
[773518.533879] Node 0 DMA per-cpu:
[773518.533882] CPU 0: hi: 0, btch: 1 usd: 0
[773518.533883] CPU 1: hi: 0, btch: 1 usd: 0
[773518.533885] CPU 2: hi: 0, btch: 1 usd: 0
[773518.533887] CPU 3: hi: 0, btch: 1 usd: 0
[773518.533888] Node 0 DMA32 per-cpu:
[773518.533890] CPU 0: hi: 186, btch: 31 usd: 0
[773518.533891] CPU 1: hi: 186, btch: 31 usd: 0
[773518.533893] CPU 2: hi: 186, btch: 31 usd: 0
[773518.533895] CPU 3: hi: 186, btch: 31 usd: 0
[773518.533896] Node 0 Normal per-cpu:
[773518.533898] CPU 0: hi: 186, btch: 31 usd: 0
[773518.533899] CPU 1: hi: 186, btch: 31 usd: 0
[773518.533901] CPU 2: hi: 186, btch: 31 usd: 0
[773518.533902] CPU 3: hi: 186, btch: 31 usd: 0
[773518.533907] active_anon:439845 inactive_anon:131886 isolated_anon:0
active_file:611682 inactive_file:680137 isolated_file:0
unevictable:1641 dirty:680244 writeback:0 unstable:0
free:25754 slab_reclaimable:80416 slab_unreclaimable:7512
mapped:9807 shmem:11579 pagetables:4334 bounce:0
free_cma:0
[773518.533911] Node 0 DMA free:15904kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[773518.533917] lowmem_reserve[]: 0 2998 7921 7921
[773518.533920] Node 0 DMA32 free:45188kB min:25528kB low:31908kB high:38292kB active_anon:16040kB inactive_anon:127684kB active_file:1324868kB inactive_file:1352336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3070172kB mlocked:0kB dirty:1352548kB writeback:0kB mapped:1360kB shmem:15220kB slab_reclaimable:165324kB slab_unreclaimable:1936kB kernel_stack:8kB pagetables:808kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5267308 all_unreclaimable? yes
[773518.533927] lowmem_reserve[]: 0 0 4923 4923
[773518.533929] Node 0 Normal free:41924kB min:41924kB low:52404kB high:62884kB active_anon:1743340kB inactive_anon:399860kB active_file:1121860kB inactive_file:1368212kB unevictable:6564kB isolated(anon):0kB isolated(file):0kB present:5041920kB mlocked:6564kB dirty:1368428kB writeback:0kB mapped:37868kB shmem:31096kB slab_reclaimable:156340kB slab_unreclaimable:28112kB kernel_stack:2376kB pagetables:16528kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10328235 all_unreclaimable? yes
[773518.533936] lowmem_reserve[]: 0 0 0 0
[773518.533938] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
[773518.533946] Node 0 DMA32: 65*4kB 2176*8kB 940*16kB 242*32kB 10*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 0*4096kB = 45188kB
[773518.533953] Node 0 Normal: 530*4kB 824*8kB 234*16kB 73*32kB 34*64kB 15*128kB 8*256kB 5*512kB 12*1024kB 1*2048kB 1*4096kB = 41928kB
[773518.533961] 1304075 total pagecache pages
[773518.533962] 0 pages in swap cache
[773518.533964] Swap cache stats: add 0, delete 0, find 0/0
[773518.533965] Free swap = 8387576kB
[773518.533966] Total swap = 8387576kB
[773518.573684] 2064368 pages RAM
[773518.573686] 47751 pages reserved
[773518.573688] 2211781 pages shared
[773518.573689] 849932 pages non-shared
[773518.573690] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[773518.573705] [ 445] 0 445 5456 385 16 0 -1000 udevd
[773518.573710] [ 3075] 0 3075 2489 716 8 0 -1000 dhclient
[773518.573713] [ 3088] 113 3088 7324 245 18 0 0 dnsmasq
[773518.573716] [ 3946] 0 3946 7089 251 20 0 0 nodm
[773518.573719] [ 3963] 0 3963 98955 37868 158 0 0 Xorg
[773518.573723] [ 3986] 0 3986 29676 434 23 0 0 rsyslogd
[773518.573725] [ 4129] 0 4129 4167 82 12 0 0 atd
[773518.573728] [ 4189] 0 4189 3651 166 13 0 0 inetutils-inetd
[773518.573731] [ 4219] 110 4219 19846 625 30 0 0 dictd
[773518.573734] [ 4256] 114 4256 1021 78 8 0 0 uml_switch
[773518.573737] [ 4274] 0 4274 1685 92 9 0 0 gpm
[773518.573741] [ 4377] 103 4377 7512 291 19 0 0 dbus-daemon
[773518.573744] [ 4408] 0 4408 5085 228 15 0 0 cron
[773518.573747] [ 4435] 0 4435 10731 538 23 0 0 apache
[773518.573750] [ 4490] 0 4490 1061 196 8 0 0 acpid
[773518.573753] [ 4608] 0 4608 20551 457 42 0 0 winbindd
[773518.573756] [ 4648] 109 4648 15146 6221 36 0 0 tor
[773518.573759] [ 4726] 0 4726 20551 369 41 0 0 winbindd
[773518.573762] [ 4757] 33 4757 11156 282 25 0 0 lighttpd
[773518.573765] [ 5044] 0 5044 12462 285 27 0 -1000 sshd
[773518.573768] [ 5050] 1000 5050 17146 257 37 0 0 nodm
[773518.573771] [ 5052] 0 5052 48274 651 30 0 0 console-kit-dae
[773518.573774] [ 5054] 0 5054 983 20 7 0 0 acpi_fakekeyd
[773518.573777] [ 5134] 0 5134 30736 452 29 0 0 polkitd
[773518.573780] [ 5143] 1000 5143 10193 202 25 0 0 ck-launch-sessi
[773518.573783] [ 5157] 1000 5157 3096 130 10 0 0 ssh-agent
[773518.573785] [ 5164] 1000 5164 6047 117 17 0 0 dbus-launch
[773518.573789] [ 5165] 1000 5165 7450 96 18 0 0 dbus-daemon
[773518.573792] [ 5226] 1000 5226 59673 2395 85 0 0 gkrellm
[773518.573795] [ 5257] 0 5257 3173 194 11 0 0 getty
[773518.573797] [ 5258] 0 5258 3173 193 13 0 0 getty
[773518.573801] [ 5259] 0 5259 3173 193 12 0 0 getty
[773518.573803] [ 5260] 0 5260 3173 193 12 0 0 getty
[773518.573807] [ 5261] 0 5261 3173 194 12 0 0 getty
[773518.573810] [ 5262] 0 5262 3173 194 12 0 0 getty
[773518.573813] [ 5310] 1000 5310 15322 623 36 0 0 xscreensaver
[773518.573816] [ 5311] 1000 5311 56136 2213 94 0 0 lxpanel
[773518.573819] [ 5314] 1000 5314 10703 369 27 0 0 menu-cached
[773518.573822] [ 5316] 1000 5316 3639 163 12 0 0 uim-helper-serv
[773518.573825] [ 5344] 1000 5344 17962 4954 40 0 0 xmonad-x86_64-l
[773518.573828] [ 5347] 1000 5347 6047 117 16 0 0 dbus-launch
[773518.573831] [ 5348] 1000 5348 7648 368 20 0 0 dbus-daemon
[773518.573834] [ 5351] 1000 5351 24138 1870 51 0 0 uim-xim
[773518.573836] [ 5352] 1000 5352 56268 1524 75 0 0 uim-toolbar-gtk
[773518.573839] [ 5359] 1000 5359 15465 399 35 0 0 gvfsd
[773518.573842] [ 5361] 1000 5361 20768 334 36 0 0 gvfs-fuse-daemo
[773518.573845] [ 5384] 1000 5384 28001 2911 58 0 0 urxvt
[773518.573848] [ 5386] 1000 5386 31756 1295 65 0 0 uim-candwin-gtk
[773518.573850] [ 5387] 1000 5387 5522 1070 14 0 0 bash
[773518.573853] [ 5458] 1000 5458 744949 486898 1657 0 0 firefox
[773518.573856] [ 5609] 1000 5609 27075 1980 54 0 0 urxvt
[773518.573859] [ 5610] 1000 5610 5416 909 16 0 0 bash
[773518.573861] [ 3831] 1000 3831 7450 149 19 0 -1000 dbus-daemon
[773518.573864] [ 3860] 1000 3860 6605 431 17 0 -1000 tmux
[773518.573867] [ 3861] 1000 3861 5877 1560 16 0 -1000 bash
[773518.573870] [15917] 0 15917 23575 534 50 0 -1000 sshd
[773518.573873] [15922] 1000 15922 24426 973 50 0 -1000 sshd
[773518.573876] [21553] 1000 21553 25417 475 51 0 0 usmb
[773518.573878] [25733] 1000 25733 28656 4048 58 0 0 urxvt
[773518.573881] [25734] 1000 25734 5416 939 15 0 0 bash
[773518.573884] [11439] 1000 11439 40328 758 42 0 0 gvfs-gdu-volume
[773518.573888] [11441] 0 11441 32507 757 30 0 0 udisks-daemon
[773518.573891] [11442] 0 11442 11853 164 27 0 0 udisks-daemon
[773518.573894] [11448] 1000 11448 15112 430 35 0 0 gvfs-gphoto2-vo
[773518.573896] [11450] 1000 11450 36103 389 39 0 0 gvfs-afc-volume
[773518.573899] [25534] 1000 25534 11735 453 27 0 0 gvfsd-metadata
[773518.573902] [30298] 1000 30298 26677 1508 55 0 0 urxvt
[773518.573905] [30299] 1000 30299 5416 939 15 0 0 bash
[773518.573908] [ 9122] 33 9122 10731 203 23 0 0 apache
[773518.573911] [ 9123] 33 9123 10731 203 23 0 0 apache
[773518.573914] [ 9124] 33 9124 10731 203 23 0 0 apache
[773518.573917] [ 9125] 33 9125 10731 203 23 0 0 apache
[773518.573920] [ 9127] 33 9127 10731 203 23 0 0 apache
[773518.573923] [ 9180] 108 9180 5444 256 15 0 0 privoxy
[773518.573926] [10394] 1000 10394 10886 535 26 0 0 ssh
[773518.573929] [10581] 1000 10581 33217 443 30 0 0 dconf-service
[773518.573932] [10585] 1000 10585 10852 472 26 0 0 ssh
[773518.573935] [10643] 1000 10643 61337 961 81 0 0 pulseaudio
[773518.573938] [14858] 1000 14858 26929 1824 55 0 0 urxvt
[773518.573940] [14859] 1000 14859 5455 1016 15 0 0 bash
[773518.573943] [15138] 1000 15138 28981 2280 58 0 0 urxvt
[773518.573946] [15139] 1000 15139 5416 922 15 0 0 bash
[773518.573949] [15203] 1000 15203 10852 493 26 0 0 ssh
[773518.573952] [ 9933] 1000 9933 16635 584 39 0 0 gvfsd-trash
[773518.573954] [26409] 1000 26409 4387 246 13 0 0 tmux
[773518.573958] [26434] 0 26434 5455 327 15 0 -1000 udevd
[773518.573961] [26435] 0 26435 5455 326 15 0 -1000 udevd
[773518.573964] [26465] 1000 26465 5868 1392 16 0 -1000 bash
[773518.573967] [ 7970] 0 7970 4711 1642 16 0 0 atop
[773518.573974] [25248] 1000 25248 4574 237 14 0 -1000 cp
[773518.573976] Out of memory: Kill process 5458 (firefox) score 118 or sacrifice child
[773518.573986] Killed process 5458 (firefox) total-vm:2979796kB, anon-rss:1924664kB, file-rss:22928kB
[773519.093622] firefox: page allocation failure: order:0, mode:0x280da
[773519.093627] Pid: 5458, comm: firefox Not tainted 3.7-trunk-amd64 #1 Debian 3.7.3-1~experimental.1
[773519.093629] Call Trace:
[773519.093639] [<ffffffff810c5d51>] ? warn_alloc_failed+0x111/0x123
[773519.093643] [<ffffffff810c8963>] ? __alloc_pages_nodemask+0x6b0/0x74c
[773519.093647] [<ffffffff810f5936>] ? alloc_pages_vma+0x110/0x12d
[773519.093650] [<ffffffff810e0356>] ? handle_pte_fault+0x15e/0x7dd
[773519.093653] [<ffffffff810c7919>] ? free_hot_cold_page+0x42/0x102
[773519.093657] [<ffffffff810dd8d4>] ? pte_offset_kernel+0xc/0x38
[773519.093661] [<ffffffff8137b528>] ? __do_page_fault+0x32e/0x376
[773519.093664] [<ffffffff810f9572>] ? kmem_cache_free+0x2d/0x69
[773519.093669] [<ffffffff81066342>] ? __dequeue_entity+0x18/0x2b
[773519.093673] [<ffffffff8100d025>] ? paravirt_write_msr+0xb/0xe
[773519.093676] [<ffffffff8100d652>] ? __switch_to+0x1db/0x3f8
[773519.093679] [<ffffffff81067b88>] ? pick_next_task_fair+0xe3/0x13b
[773519.093682] [<ffffffff8105fad7>] ? mmdrop+0xd/0x1c
[773519.093684] [<ffffffff8106138c>] ? finish_task_switch+0x83/0xb4
[773519.093689] [<ffffffff81377881>] ? __schedule+0x4b2/0x4e0
[773519.093691] [<ffffffff81378918>] ? page_fault+0x28/0x30
[773519.093693] Mem-Info:
[773519.093694] Node 0 DMA per-cpu:
[773519.093696] CPU 0: hi: 0, btch: 1 usd: 0
[773519.093698] CPU 1: hi: 0, btch: 1 usd: 0
[773519.093700] CPU 2: hi: 0, btch: 1 usd: 0
[773519.093701] CPU 3: hi: 0, btch: 1 usd: 0
[773519.093702] Node 0 DMA32 per-cpu:
[773519.093704] CPU 0: hi: 186, btch: 31 usd: 0
[773519.093706] CPU 1: hi: 186, btch: 31 usd: 0
[773519.093708] CPU 2: hi: 186, btch: 31 usd: 0
[773519.093709] CPU 3: hi: 186, btch: 31 usd: 0
[773519.093710] Node 0 Normal per-cpu:
[773519.093712] CPU 0: hi: 186, btch: 31 usd: 0
[773519.093714] CPU 1: hi: 186, btch: 31 usd: 0
[773519.093715] CPU 2: hi: 186, btch: 31 usd: 0
[773519.093717] CPU 3: hi: 186, btch: 31 usd: 0
[773519.093721] active_anon:439845 inactive_anon:131886 isolated_anon:0
active_file:611682 inactive_file:680137 isolated_file:0
unevictable:1641 dirty:680244 writeback:0 unstable:0
free:25754 slab_reclaimable:80416 slab_unreclaimable:7512
mapped:9807 shmem:11579 pagetables:4334 bounce:0
free_cma:0
[773519.093725] Node 0 DMA free:15904kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[773519.093731] lowmem_reserve[]: 0 2998 7921 7921
[773519.093735] Node 0 DMA32 free:45188kB min:25528kB low:31908kB high:38292kB active_anon:16040kB inactive_anon:127684kB active_file:1324868kB inactive_file:1352336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3070172kB mlocked:0kB dirty:1352548kB writeback:0kB mapped:1360kB shmem:15220kB slab_reclaimable:165324kB slab_unreclaimable:1936kB kernel_stack:8kB pagetables:808kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[773519.093741] lowmem_reserve[]: 0 0 4923 4923
[773519.093744] Node 0 Normal free:41924kB min:41924kB low:52404kB high:62884kB active_anon:1743340kB inactive_anon:399860kB active_file:1121860kB inactive_file:1368212kB unevictable:6564kB isolated(anon):0kB isolated(file):0kB present:5041920kB mlocked:6564kB dirty:1368428kB writeback:0kB mapped:37868kB shmem:31096kB slab_reclaimable:156340kB slab_unreclaimable:28112kB kernel_stack:2376kB pagetables:16528kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:736 all_unreclaimable? no
[773519.093750] lowmem_reserve[]: 0 0 0 0
[773519.093752] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
[773519.093760] Node 0 DMA32: 64*4kB 2175*8kB 941*16kB 242*32kB 10*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 0*4096kB = 45192kB
[773519.093767] Node 0 Normal: 537*4kB 824*8kB 234*16kB 74*32kB 34*64kB 15*128kB 8*256kB 5*512kB 12*1024kB 1*2048kB 1*4096kB = 41988kB
[773519.093775] 1304075 total pagecache pages
[773519.093777] 0 pages in swap cache
[773519.093778] Swap cache stats: add 0, delete 0, find 0/0
[773519.093780] Free swap = 8387576kB
[773519.093781] Total swap = 8387576kB
[773519.139790] 2064368 pages RAM
[773519.139793] 47751 pages reserved
[773519.139794] 2211758 pages shared
[773519.139796] 849916 pages non-shared
* Re: doing lots of disk writes causes oom killer to kill processes
2013-02-08 16:31 Michal Suchanek
@ 2013-03-11 13:15 ` Michal Suchanek
0 siblings, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-03-11 13:15 UTC (permalink / raw)
To: linux-mm, 699277
On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> I am dealing with VM disk images and performing something like wiping
> free space to prepare image for compressing and storing on server or
> copying it to external USB disk causes
>
> 1) system lockup in order of a few tens of seconds when all CPU cores
> are 100% used by system and the machine is basicaly unusable
>
> 2) oom killer killing processes
>
> This all on system with 8G ram so there should be plenty space to work with.
>
> This happens with kernels 3.6.4 or 3.7.1
>
> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> problem even with less ram.
>
> I have vm.swappiness = 0 set for a long time already.
>
>
I did some testing with 3.7.1, and with swappiness as high as 75 the
kernel still causes all cores to loop somewhere in system time when writing
lots of data to disk.
With swappiness as high as 90, processes still get killed on large disk writes.
Given that the maximum is 100, the interval in which the mm works at all is
going to be very narrow, less than 10% of the parameter range. This is
a severe regression, as is the CPU time consumed by the kernel.
The IO scheduler is the default cfq.
If you have any idea what to try other than downgrading to an earlier,
unaffected kernel, I would like to hear it.
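For reference, the values were changed at runtime like this (/dev/sda is an
assumption; substitute the actual device):

  sysctl -w vm.swappiness=75              # later raised to 90
  cat /sys/block/sda/queue/scheduler      # prints e.g. "noop deadline [cfq]"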
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
@ 2013-03-12 2:15 Hillf Danton
2013-03-12 9:03 ` Michal Suchanek
2013-08-26 13:51 ` Michal Suchanek
0 siblings, 2 replies; 15+ messages in thread
From: Hillf Danton @ 2013-03-12 2:15 UTC (permalink / raw)
To: Michal Suchanek, LKML, Linux-MM
>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> Hello,
>>
>> I am dealing with VM disk images and performing something like wiping
>> free space to prepare image for compressing and storing on server or
>> copying it to external USB disk causes
>>
>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> are 100% used by system and the machine is basicaly unusable
>>
>> 2) oom killer killing processes
>>
>> This all on system with 8G ram so there should be plenty space to work with.
>>
>> This happens with kernels 3.6.4 or 3.7.1
>>
>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> problem even with less ram.
>>
>> I have vm.swappiness = 0 set for a long time already.
>>
>>
>I did some testing with 3.7.1 and with swappiness as much as 75 the
>kernel still causes all cores to loop somewhere in system when writing
>lots of data to disk.
>
>With swappiness as much as 90 processes still get killed on large disk writes.
>
>Given that the max is 100 the interval in which mm works at all is
>going to be very narrow, less than 10% of the paramater range. This is
>a severe regression as is the cpu time consumed by the kernel.
>
>The io scheduler is the default cfq.
>
>If you have any idea what to try other than downgrading to an earlier
>unaffected kernel I would like to hear.
>
Can you try commit 3cf23841b4b7 (mm/vmscan.c: avoid possible
deadlock caused by too_many_isolated())?
Or try 3.8 and/or 3.9, additionally?
Hillf
* Re: doing lots of disk writes causes oom killer to kill processes
2013-03-12 2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
@ 2013-03-12 9:03 ` Michal Suchanek
2013-08-26 13:51 ` Michal Suchanek
1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-03-12 9:03 UTC (permalink / raw)
To: Hillf Danton; +Cc: LKML, Linux-MM
On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>> are 100% used by system and the machine is basicaly unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This all on system with 8G ram so there should be plenty space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have vm.swappiness = 0 set for a long time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100 the interval in which mm works at all is
>>going to be very narrow, less than 10% of the paramater range. This is
>>a severe regression as is the cpu time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?
Hello,
in the meantime I tried setting the IO scheduler to deadline, because I
remember using that one in my self-built kernels due to cfq breaking
some obscure block driver.
With the deadline IO scheduler I can set swappiness back to 0 and the
system works normally even for a moderate amount of IO - restoring disk
images from the network. This would cause lockups and the oom killer running
loose with the cfq scheduler.
So I guess I found what breaks the system, and it is not so much the
kernel version. It's using pre-built kernels with the default
scheduler.
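For reference, the switch can be made at runtime per block device (/dev/sda is
an assumption here), or for all devices via the kernel command line:

  echo deadline > /sys/block/sda/queue/scheduler
  # or boot with: elevator=deadline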
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-03-12 2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
2013-03-12 9:03 ` Michal Suchanek
@ 2013-08-26 13:51 ` Michal Suchanek
2013-09-05 10:12 ` Michal Suchanek
1 sibling, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-08-26 13:51 UTC (permalink / raw)
To: Hillf Danton; +Cc: LKML, Linux-MM
On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>> are 100% used by system and the machine is basicaly unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This all on system with 8G ram so there should be plenty space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have vm.swappiness = 0 set for a long time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100 the interval in which mm works at all is
>>going to be very narrow, less than 10% of the paramater range. This is
>>a severe regression as is the cpu time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?
>
Hello,
with the deadline IO scheduler I experience this issue less often, but it
still happens.
I am on the 3.9.6 Debian kernel, so 3.8 did not fix this problem.
Do you have some idea what to log so that useful information about the
lockup is gathered?
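One thing I can capture during a stall is a dump of blocked tasks via magic
SysRq - a sketch, assuming SysRq is available on the machine:

  echo 1 > /proc/sys/kernel/sysrq    # make sure magic SysRq is enabled
  echo w > /proc/sysrq-trigger       # dump stacks of blocked (D state) tasks
  dmesg | tail -n 200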
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-08-26 13:51 ` Michal Suchanek
@ 2013-09-05 10:12 ` Michal Suchanek
2013-09-17 13:31 ` Michal Suchanek
0 siblings, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-09-05 10:12 UTC (permalink / raw)
To: Hillf Danton; +Cc: LKML, Linux-MM
Hello
On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I am dealing with VM disk images and performing something like wiping
>>>> free space to prepare image for compressing and storing on server or
>>>> copying it to external USB disk causes
>>>>
>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>> are 100% used by system and the machine is basicaly unusable
>>>>
>>>> 2) oom killer killing processes
>>>>
>>>> This all on system with 8G ram so there should be plenty space to work with.
>>>>
>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>
>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>> problem even with less ram.
>>>>
>>>> I have vm.swappiness = 0 set for a long time already.
>>>>
>>>>
>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>kernel still causes all cores to loop somewhere in system when writing
>>>lots of data to disk.
>>>
>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>
>>>Given that the max is 100 the interval in which mm works at all is
>>>going to be very narrow, less than 10% of the paramater range. This is
>>>a severe regression as is the cpu time consumed by the kernel.
>>>
>>>The io scheduler is the default cfq.
>>>
>>>If you have any idea what to try other than downgrading to an earlier
>>>unaffected kernel I would like to hear.
>>>
>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> deadlock caused by too_many_isolated())?
>>
>> Or try 3.8 and/or 3.9, additionally?
>>
>
> Hello,
>
> with deadline IO scheduler I experience this issue less often but it
> still happens.
>
> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>
> Do you have some idea what to log so that useful information about the
> lockup is gathered?
>
This appears to be fixed in the vanilla 3.11 kernel.
I still get short intermittent lockups and CPU usage spikes of up to 20%
on a core, but nowhere near the minute+ long lockups with all cores at
100% on earlier kernels.
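The spikes are easy to see with something as simple as:

  vmstat 1    # watch the sy (system) and wa (IO wait) columns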
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-09-05 10:12 ` Michal Suchanek
@ 2013-09-17 13:31 ` Michal Suchanek
2013-09-17 21:13 ` Jan Kara
0 siblings, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-09-17 13:31 UTC (permalink / raw)
To: Hillf Danton; +Cc: LKML, Linux-MM
On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello
>
> On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I am dealing with VM disk images and performing something like wiping
>>>>> free space to prepare image for compressing and storing on server or
>>>>> copying it to external USB disk causes
>>>>>
>>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>>> are 100% used by system and the machine is basicaly unusable
>>>>>
>>>>> 2) oom killer killing processes
>>>>>
>>>>> This all on system with 8G ram so there should be plenty space to work with.
>>>>>
>>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>>
>>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>>> problem even with less ram.
>>>>>
>>>>> I have vm.swappiness = 0 set for a long time already.
>>>>>
>>>>>
>>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>>kernel still causes all cores to loop somewhere in system when writing
>>>>lots of data to disk.
>>>>
>>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>>
>>>>Given that the max is 100 the interval in which mm works at all is
>>>>going to be very narrow, less than 10% of the paramater range. This is
>>>>a severe regression as is the cpu time consumed by the kernel.
>>>>
>>>>The io scheduler is the default cfq.
>>>>
>>>>If you have any idea what to try other than downgrading to an earlier
>>>>unaffected kernel I would like to hear.
>>>>
>>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>>> deadlock caused by too_many_isolated())?
>>>
>>> Or try 3.8 and/or 3.9, additionally?
>>>
>>
>> Hello,
>>
>> with deadline IO scheduler I experience this issue less often but it
>> still happens.
>>
>> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>>
>> Do you have some idea what to log so that useful information about the
>> lockup is gathered?
>>
>
> This appears to be fixed in vanilla 3.11 kernel.
>
> I still get short intermittent lockups and cpu usage spikes up to 20%
> on a core but nowhere near the minute+ long lockups with all cores
> 100% on earlier kernels.
>
So I did more testing on the 3.11 kernel, and while it works OK with
tar, you can get severe lockups with mc or kvm. The difference is
probably the fact that sane tools do fsync() on files they close,
forcing the file to be written out and the kernel to return possible write
errors before they move on to the next file.
With kvm writing to a file used as a virtual disk, the system would stall
indefinitely until the disk driver in the emulated system timed
out, returned a disk IO error, and the emulated system stopped writing.
In top I see all CPU cores 90%+ in wait. The system is unusable. With mc
the lockups would be indefinite, probably because there is no timeout
on writing a file in mc.
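To make the kvm case concrete, the guest is started roughly like this (the
image path is made up); with cache=writeback the guest's writes all land in
the host page cache as dirty pages, which is exactly what piles up here:

  kvm -m 2048 \
      -drive file=/var/lib/images/test.img,format=raw,cache=writeback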
I tried tuning swappiness and elevators, but the basic problem is
solved by neither: the dirty buffers fill up memory and the system stalls
trying to resolve the situation.
Obviously the kernel puts off writing any dirty buffers until the
memory pressure is overwhelming and the VM flops.
At least the OOM killer does not get invoked anymore, since there is
lots of memory - Linux just does not know how to use it.
The solution to this problem is quite simple - use the ancient
userspace bdflush daemon, or whatever it was called. I emulate it with
{ while true ; do sleep 5; sync ; done } &
The system performance suddenly increases - back to the awesome Debian stable levels.
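A slightly more explicit version of the same workaround, plus a way to watch
how much dirty data accumulates while it runs:

  # periodic flusher, same idea as the one-liner above
  while true; do sleep 5; sync; done &

  # watch the dirty/writeback counters
  watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'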
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-09-17 13:31 ` Michal Suchanek
@ 2013-09-17 21:13 ` Jan Kara
2013-09-17 22:22 ` Michal Suchanek
2013-09-18 14:56 ` Michal Suchanek
0 siblings, 2 replies; 15+ messages in thread
From: Jan Kara @ 2013-09-17 21:13 UTC (permalink / raw)
To: Michal Suchanek; +Cc: Hillf Danton, LKML, Linux-MM
Hello,
On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am dealing with VM disk images and performing something like wiping
> >>>>> free space to prepare image for compressing and storing on server or
> >>>>> copying it to external USB disk causes
> >>>>>
> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
> >>>>> are 100% used by system and the machine is basicaly unusable
> >>>>>
> >>>>> 2) oom killer killing processes
> >>>>>
> >>>>> This all on system with 8G ram so there should be plenty space to work with.
> >>>>>
> >>>>> This happens with kernels 3.6.4 or 3.7.1
> >>>>>
> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> >>>>> problem even with less ram.
> >>>>>
> >>>>> I have vm.swappiness = 0 set for a long time already.
> >>>>>
> >>>>>
> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
> >>>>kernel still causes all cores to loop somewhere in system when writing
> >>>>lots of data to disk.
> >>>>
> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
> >>>>
> >>>>Given that the max is 100 the interval in which mm works at all is
> >>>>going to be very narrow, less than 10% of the paramater range. This is
> >>>>a severe regression as is the cpu time consumed by the kernel.
> >>>>
> >>>>The io scheduler is the default cfq.
> >>>>
> >>>>If you have any idea what to try other than downgrading to an earlier
> >>>>unaffected kernel I would like to hear.
> >>>>
> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> >>> deadlock caused by too_many_isolated())?
> >>>
> >>> Or try 3.8 and/or 3.9, additionally?
> >>>
> >>
> >> Hello,
> >>
> >> with deadline IO scheduler I experience this issue less often but it
> >> still happens.
> >>
> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
> >>
> >> Do you have some idea what to log so that useful information about the
> >> lockup is gathered?
> >>
> >
> > This appears to be fixed in vanilla 3.11 kernel.
> >
> > I still get short intermittent lockups and cpu usage spikes up to 20%
> > on a core but nowhere near the minute+ long lockups with all cores
> > 100% on earlier kernels.
> >
>
> So I did more testing on the 3.11 kernel and while it works OK with
> tar you can get severe lockups with mc or kvm. The difference is
> probably the fact that sane tools do fsync() on files they close
> forcing the file to write out and the kernel returning possible write
> errors before they move on to next file.
Sorry for chiming in a bit late, but is this really writing to a normal
disk? A SATA drive, or something else?
> With kvm writing to a file used as virtual disk the system would stall
> indefinitely until the disk driver in the emulated system would time
> out, return disk IO error, and the emulated system would stop writing.
> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
> the lockups would be indefinite, probably because there is no timeout
> on writing a file in mc.
>
> I tried tuning swappiness and eleveators but the the basic problem is
> solved by neither: the dirty buffers fill up memory and system stalls
> trying to resolve the situation.
This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
the amount of dirty memory. By default it is set to 20% of memory, which tends
to be too much for an 8 GB machine. Can you set it to something like 5% and
/proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
sizing (assuming a standard SATA drive). Does it change anything?
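Concretely, something like the following (the settings do not persist across
a reboot):

  echo 5 > /proc/sys/vm/dirty_ratio
  echo 2 > /proc/sys/vm/dirty_background_ratio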
If the problem doesn't go away, can you install systemtap on your system
and run the attached script? It should report where exactly processes stall and
for how long, which should help us address the issue. Thanks.
> Obviously the kernel puts off writing any dirty buffers until the
> memory pressure is overwhelming and the vmm flops.
>
> At least the OOM killer does not get invoked anymore since there is
> lots of memory - just Linux does not know how to use it.
>
> The solution to this problem is quite simple - use the ancient
> userspace bdflushd or what it was called. I emulate it with
> { while true ; do sleep 5; sync ; done } &
>
> The system performance suddenly increases - to the awesome Debian stable levels.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
* Re: doing lots of disk writes causes oom killer to kill processes
2013-09-17 21:13 ` Jan Kara
@ 2013-09-17 22:22 ` Michal Suchanek
2013-09-18 14:56 ` Michal Suchanek
1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-09-17 22:22 UTC (permalink / raw)
To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM
On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
>> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by system and the machine is basicaly unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This all on system with 8G ram so there should be plenty space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have vm.swappiness = 0 set for a long time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
>> >>>>
>> >>>>Given that the max is 100 the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the paramater range. This is
>> >>>>a severe regression as is the cpu time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to write out and the kernel returning possible write
>> errors before they move on to next file.
> Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?
It's an LVM volume on a SATA drive. I sometimes use USB disks as well,
but most of the time it's SATA or eSATA.
>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and eleveators but the the basic problem is
>> solved by neither: the dirty buffers fill up memory and system stalls
>> trying to resolve the situation.
> This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?
I can try that, but I don't really mind if the kernel uses 2 GB of RAM for
buffers. The problem is that it cannot manage those buffers. Does some
kernel structure grow out of proportion when the buffers reach this
size, or something?
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-09-17 21:13 ` Jan Kara
2013-09-17 22:22 ` Michal Suchanek
@ 2013-09-18 14:56 ` Michal Suchanek
2013-09-19 10:13 ` Jan Kara
[not found] ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
1 sibling, 2 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-09-18 14:56 UTC (permalink / raw)
To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM
On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
> Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
>> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by system and the machine is basicaly unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This all on system with 8G ram so there should be plenty space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have vm.swappiness = 0 set for a long time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
>> >>>>
>> >>>>Given that the max is 100 the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the paramater range. This is
>> >>>>a severe regression as is the cpu time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to write out and the kernel returning possible write
>> errors before they move on to next file.
> Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?
>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and eleveators but the the basic problem is
>> solved by neither: the dirty buffers fill up memory and system stalls
>> trying to resolve the situation.
> This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?
The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
these to 5/2 gives about the same result as running the script that
syncs every 5s. Setting them to 30/10 gives larger data chunks and an
intermittent lockup before every chunk is written.
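The values were checked and changed the same way as before, for example:

  sysctl vm.dirty_ratio vm.dirty_background_ratio           # current values
  sysctl -w vm.dirty_ratio=30 vm.dirty_background_ratio=10  # one of the tests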
It is quite possible to set kernel parameters that kill the kernel, but
1) this is the default
2) the parameter is set in units that do not prevent the issue in
general (% of RAM vs. number of blocks)
3) WTH is the system doing? It's a 4-core 3 GHz CPU, so it can handle
traversing a structure holding 800 MB of data in the background. Something
is seriously rotten somewhere.
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-09-18 14:56 ` Michal Suchanek
@ 2013-09-19 10:13 ` Jan Kara
2013-10-09 14:19 ` Michal Suchanek
[not found] ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
1 sibling, 1 reply; 15+ messages in thread
From: Jan Kara @ 2013-09-19 10:13 UTC (permalink / raw)
To: Michal Suchanek; +Cc: Jan Kara, Hillf Danton, LKML, Linux-MM
[-- Attachment #1: Type: text/plain, Size: 6403 bytes --]
On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
> > Hello,
> >
> > On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
> >> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> >> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
> >> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >>>>> Hello,
> >> >>>>>
> >> >>>>> I am dealing with VM disk images and performing something like wiping
> >> >>>>> free space to prepare image for compressing and storing on server or
> >> >>>>> copying it to external USB disk causes
> >> >>>>>
> >> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
> >> >>>>> are 100% used by system and the machine is basicaly unusable
> >> >>>>>
> >> >>>>> 2) oom killer killing processes
> >> >>>>>
> >> >>>>> This all on system with 8G ram so there should be plenty space to work with.
> >> >>>>>
> >> >>>>> This happens with kernels 3.6.4 or 3.7.1
> >> >>>>>
> >> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> >> >>>>> problem even with less ram.
> >> >>>>>
> >> >>>>> I have vm.swappiness = 0 set for a long time already.
> >> >>>>>
> >> >>>>>
> >> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
> >> >>>>kernel still causes all cores to loop somewhere in system when writing
> >> >>>>lots of data to disk.
> >> >>>>
> >> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
> >> >>>>
> >> >>>>Given that the max is 100 the interval in which mm works at all is
> >> >>>>going to be very narrow, less than 10% of the paramater range. This is
> >> >>>>a severe regression as is the cpu time consumed by the kernel.
> >> >>>>
> >> >>>>The io scheduler is the default cfq.
> >> >>>>
> >> >>>>If you have any idea what to try other than downgrading to an earlier
> >> >>>>unaffected kernel I would like to hear.
> >> >>>>
> >> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> >> >>> deadlock caused by too_many_isolated())?
> >> >>>
> >> >>> Or try 3.8 and/or 3.9, additionally?
> >> >>>
> >> >>
> >> >> Hello,
> >> >>
> >> >> with deadline IO scheduler I experience this issue less often but it
> >> >> still happens.
> >> >>
> >> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
> >> >>
> >> >> Do you have some idea what to log so that useful information about the
> >> >> lockup is gathered?
> >> >>
> >> >
> >> > This appears to be fixed in vanilla 3.11 kernel.
> >> >
> >> > I still get short intermittent lockups and cpu usage spikes up to 20%
> >> > on a core but nowhere near the minute+ long lockups with all cores
> >> > 100% on earlier kernels.
> >> >
> >>
> >> So I did more testing on the 3.11 kernel and while it works OK with
> >> tar you can get severe lockups with mc or kvm. The difference is
> >> probably the fact that sane tools do fsync() on files they close
> >> forcing the file to write out and the kernel returning possible write
> >> errors before they move on to next file.
> > Sorry for chiming in a bit late. But is this really writing to a normal
> > disk? SATA drive or something else?
> >
> >> With kvm writing to a file used as virtual disk the system would stall
> >> indefinitely until the disk driver in the emulated system would time
> >> out, return disk IO error, and the emulated system would stop writing.
> >> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
> >> the lockups would be indefinite, probably because there is no timeout
> >> on writing a file in mc.
> >>
> >> I tried tuning swappiness and eleveators but the the basic problem is
> >> solved by neither: the dirty buffers fill up memory and system stalls
> >> trying to resolve the situation.
> > This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> > amount of dirty memory. By default it is set to 20% of memory which tends
> > to be too much for 8 GB machine. Can you set it to something like 5% and
> > /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> > sizing (assuming standard SATA drive). Does it change anything?
>
> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
to better accommodate some workloads, but 60/40 on an 8 GB machine with a
SATA drive really seems too much. That is going to give memory management a
headache.
The problem is that a good SATA drive can do ~100 MB/s if we are
lucky and the IO is sequential. Thus if you have 5 GB of dirty data to write,
it takes 50 s at best to write it; with more random IO to an image file it can
easily take several minutes to write. That may cause some increased latency
when memory reclaim waits for writeback to clean some pages.
> these to 5/2 gives about the same result as running the script that
> syncs every 5s. Setting to 30/10 gives larger data chunks and
> intermittent lockup before every chunk is written.
>
> It is quite possible to set kernel parameters that kill the kernel but
>
> 1) this is the default
Not the upstream one, so you should raise this with Debian I guess. 60/40
looks way out of the reasonable range for today's machines.
> 2) the parameter is set in units that do not prevent the issue in
> general (% RAM vs #blocks)
You can set the number of bytes instead of a percentage -
/proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
sizing depends on the amount of memory, the storage HW, and the workload, so it's
more of an administrative task to set this tunable properly.
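For example, to cap dirty data at roughly 500 MB / 200 MB regardless of RAM
size (the numbers are only illustrative; note that writing the *_bytes files
zeroes the corresponding *_ratio ones and vice versa):

  echo $((500 * 1024 * 1024)) > /proc/sys/vm/dirty_bytes
  echo $((200 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes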
> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
> traversing a structure holding 800M data in the background. Something
> is seriously rotten somewhere.
Likely processes are waiting in direct reclaim for IO to finish. But that
is just guessing. Try running the attached script (I forgot to attach it to
the previous email). You will need systemtap and kernel debuginfo installed.
The script doesn't work with all versions of systemtap (as it is sadly a
moving target), so if it fails, tell me your version of systemtap and I'll
update the script accordingly.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: watch-dstate.pl --]
[-- Type: application/x-perl, Size: 11084 bytes --]
* Re: doing lots of disk writes causes oom killer to kill processes
[not found] ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
@ 2013-09-20 11:20 ` Michal Suchanek
0 siblings, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-09-20 11:20 UTC (permalink / raw)
To: Hillf Danton, Linux-MM, Linux Kernel Mailing List, Jan Kara
Hello,
On 19 September 2013 10:07, Hillf Danton <dhillf@gmail.com> wrote:
> Hello Michal
>
> Take it easy please, the kernel is made by human hands.
>
> Can you please try the diff(and sorry if mail agent reformats it)?
>
> Best Regards
> Hillf
>
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
> * implies that pages are cycling through the LRU faster than
> * they are written so also forcibly stall.
> */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
> congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
> }
>
> /*
> --
I applied the patch and raised the dirty ratios to 30/10 and then to the
default 60/40 while imaging a VM, and did not observe any problems, so I
guess this solves it.
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-09-19 10:13 ` Jan Kara
@ 2013-10-09 14:19 ` Michal Suchanek
2013-10-15 14:15 ` Michal Suchanek
2014-07-07 11:34 ` Michal Suchanek
0 siblings, 2 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-10-09 14:19 UTC (permalink / raw)
To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM
Hello,
On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>> > Hello,
>>
>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
> Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
> to better accommodate some workloads, but 60/40 on an 8 GB machine with a
> SATA drive really seems too much. That is going to give memory management a
> headache.
>
> The problem is that a good SATA drive can do ~100 MB/s if we are
> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
> it takes 50s at best to write it, with more random IO to image file it can
> well take several minutes to write. That may cause some increased latency
> when memory reclaim waits for writeback to clean some pages.
>
>> these to 5/2 gives about the same result as running the script that
>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>> intermittent lockup before every chunk is written.
>>
>> It is quite possible to set kernel parameters that kill the kernel but
>>
>> 1) this is the default
> Not the upstream one, so you should raise this with Debian, I guess. 60/40
> looks way out of the reasonable range for today's machines.
>
>> 2) the parameter is set in units that do not prevent the issue in
>> general (% RAM vs #blocks)
> You can set the number of bytes instead of percentage -
> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
> sizing depends on amount of memory, storage HW, workload. So it's more an
> administrative task to set this tunable properly.
>
>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>> traversing a structure holding 800M data in the background. Something
>> is seriously rotten somewhere.
> Likely processes are waiting in direct reclaim for IO to finish. But that
> is just guessing. Try running attached script (forgot to attach it to
> previous email). You will need systemtap and kernel debuginfo installed.
> The script doesn't work with all versions of systemtap (as it is sadly a
> moving target) so if it fails, tell me your version of systemtap and I'll
> update the script accordingly.
This was fixed for me by the patch posted earlier by Hillf Danton so I
guess this answers what the system was (not) doing:
--- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
+++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
@@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
* implies that pages are cycling through the LRU faster than
* they are written so also forcibly stall.
*/
- if (nr_unqueued_dirty == nr_taken || nr_immediate)
+ if (nr_unqueued_dirty == nr_taken || nr_immediate) {
+ if (current_is_kswapd())
+ wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
congestion_wait(BLK_RW_ASYNC, HZ/10);
+ }
}
/*
Also, commit 75485363 is hopefully addressing this issue in mainline.
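(For anyone checking their own tree: assuming that is an abbreviated mainline
commit id, something like
  git describe --contains 75485363
in a kernel git checkout tells you which release first contains it.)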
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-10-09 14:19 ` Michal Suchanek
@ 2013-10-15 14:15 ` Michal Suchanek
2014-07-07 11:34 ` Michal Suchanek
1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-10-15 14:15 UTC (permalink / raw)
To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM
On 9 October 2013 16:19, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>>> > Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>> Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads, but 60/40 on an 8 GB machine with a
>> SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>> Not the upstream one, so you should raise this with Debian, I guess. 60/40
>> looks way out of the reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>> You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>> Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
> * implies that pages are cycling through the LRU faster than
> * they are written so also forcibly stall.
> */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
> congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
> }
>
> /*
>
> Also 75485363 is hopefully addressing this issue in mainline.
>
Actually, that commit was in 3.11 already and it did make the behaviour a
bit better, but it was not enough.
So is something like the vmscan.c patch going to make it into the
mainline kernel?
Thanks
Michal
* Re: doing lots of disk writes causes oom killer to kill processes
2013-10-09 14:19 ` Michal Suchanek
2013-10-15 14:15 ` Michal Suchanek
@ 2014-07-07 11:34 ` Michal Suchanek
1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2014-07-07 11:34 UTC (permalink / raw)
To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM
On 9 October 2013 16:19, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>>> > Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>> Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads, but 60/40 on an 8 GB machine with a
>> SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>> Not the upstream one, so you should raise this with Debian, I guess. 60/40
>> looks way out of the reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>> You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>> Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
> * implies that pages are cycling through the LRU faster than
> * they are written so also forcibly stall.
> */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
> congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
> }
>
> /*
>
Hello,
Is this being addressed somehow?
It seems the 3.15 kernel still has this issue... unless it happens to
lock up for some other reason in similar situations.
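A crude way to check what a given source tree does at that point is to grep
around the condition from the hunk quoted above (line numbers differ between
releases):
  grep -n -A 6 'nr_unqueued_dirty == nr_taken' mm/vmscan.c
If the branch only does congestion_wait() and never wakes the flusher
threads, the tree behaves like the unpatched kernels discussed here.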
Thanks
Michal
end of thread (newest message: 2014-07-07 11:34 UTC)
Thread overview: 15+ messages
2013-03-12 2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
2013-03-12 9:03 ` Michal Suchanek
2013-08-26 13:51 ` Michal Suchanek
2013-09-05 10:12 ` Michal Suchanek
2013-09-17 13:31 ` Michal Suchanek
2013-09-17 21:13 ` Jan Kara
2013-09-17 22:22 ` Michal Suchanek
2013-09-18 14:56 ` Michal Suchanek
2013-09-19 10:13 ` Jan Kara
2013-10-09 14:19 ` Michal Suchanek
2013-10-15 14:15 ` Michal Suchanek
2014-07-07 11:34 ` Michal Suchanek
[not found] ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
2013-09-20 11:20 ` Michal Suchanek
-- strict thread matches above, loose matches on Subject: below --
2013-02-08 16:31 Michal Suchanek
2013-03-11 13:15 ` Michal Suchanek