linux-mm.kvack.org archive mirror
* doing lots of disk writes causes oom killer to kill processes
@ 2013-02-08 16:31 Michal Suchanek
  2013-03-11 13:15 ` Michal Suchanek
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-02-08 16:31 UTC (permalink / raw)
  To: linux-mm, 699277

[-- Attachment #1: Type: text/plain, Size: 941 bytes --]

Hello,

I am dealing with VM disk images, and performing something like wiping
free space to prepare an image for compression and storage on a server,
or copying it to an external USB disk, causes

1) system lockups on the order of a few tens of seconds, during which all
CPU cores are 100% used by the system and the machine is basically unusable

2) the OOM killer killing processes

This is all on a system with 8 GB of RAM, so there should be plenty of
space to work with.

This happens with kernels 3.6.4 and 3.7.1.

With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
problem even with less RAM.

I have had vm.swappiness = 0 set for a long time already.
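
For completeness, that is just the usual sysctl knob: vm.swappiness = 0 in
/etc/sysctl.conf, applied to the running kernel with

  sysctl -w vm.swappiness=0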

Presumably the kernel should stop the process writing to disk from
creating more buffers and write some of them out, rather than start
killing processes to free memory. It might also use some swap, even with
zero swappiness, when there is no other way to free more space. Earlier
kernels certainly did that.

Thanks

Michal

[-- Attachment #2: dmesg.oom-kill.txt --]
[-- Type: text/plain, Size: 18561 bytes --]

[773518.533786] init cpuset=/ mems_allowed=0
[773518.533790] Pid: 1, comm: init Not tainted 3.7-trunk-amd64 #1 Debian 3.7.3-1~experimental.1
[773518.533791] Call Trace:
[773518.533802]  [<ffffffff81373554>] ? dump_header+0x70/0x1aa
[773518.533807]  [<ffffffff8103fbff>] ? put_online_cpus+0x1f/0x66
[773518.533811]  [<ffffffff810a0dd4>] ? rcu_oom_notify+0xc6/0xd8
[773518.533815]  [<ffffffff810c428b>] ? oom_kill_process+0x72/0x2c5
[773518.533819]  [<ffffffff810612eb>] ? should_resched+0x5/0x23
[773518.533821]  [<ffffffff810c49ca>] ? out_of_memory+0x384/0x3d5
[773518.533825]  [<ffffffff810c8873>] ? __alloc_pages_nodemask+0x5c0/0x74c
[773518.533829]  [<ffffffff810f9e4e>] ? kmem_getpages+0x54/0x124
[773518.533832]  [<ffffffff810fa778>] ? fallback_alloc+0x125/0x1ed
[773518.533834]  [<ffffffff810fb3a2>] ? kmem_cache_alloc+0x7b/0xfd
[773518.533838]  [<ffffffff8111099d>] ? getname_flags.part.23+0x22/0x10a
[773518.533841]  [<ffffffff81113523>] ? user_path_at_empty+0x35/0xa8
[773518.533845]  [<ffffffff81372269>] ? __bad_area_nosemaphore+0x95/0x1ec
[773518.533849]  [<ffffffff8137b4f4>] ? __do_page_fault+0x2fa/0x376
[773518.533851]  [<ffffffff810612eb>] ? should_resched+0x5/0x23
[773518.533854]  [<ffffffff813778f2>] ? _cond_resched+0x6/0x1b
[773518.533857]  [<ffffffff8110bc9c>] ? cp_new_stat+0x11d/0x130
[773518.533861]  [<ffffffff81042e59>] ? timespec_add_safe+0x1e/0x4d
[773518.533864]  [<ffffffff8110ba31>] ? vfs_fstatat+0x2d/0x63
[773518.533867]  [<ffffffff81115973>] ? poll_select_copy_remaining+0xea/0xff
[773518.533869]  [<ffffffff8110bd46>] ? sys_newstat+0x12/0x2d
[773518.533872]  [<ffffffff81378918>] ? page_fault+0x28/0x30
[773518.533875]  [<ffffffff8137d6e9>] ? system_call_fastpath+0x16/0x1b
[773518.533877] Mem-Info:
[773518.533879] Node 0 DMA per-cpu:
[773518.533882] CPU    0: hi:    0, btch:   1 usd:   0
[773518.533883] CPU    1: hi:    0, btch:   1 usd:   0
[773518.533885] CPU    2: hi:    0, btch:   1 usd:   0
[773518.533887] CPU    3: hi:    0, btch:   1 usd:   0
[773518.533888] Node 0 DMA32 per-cpu:
[773518.533890] CPU    0: hi:  186, btch:  31 usd:   0
[773518.533891] CPU    1: hi:  186, btch:  31 usd:   0
[773518.533893] CPU    2: hi:  186, btch:  31 usd:   0
[773518.533895] CPU    3: hi:  186, btch:  31 usd:   0
[773518.533896] Node 0 Normal per-cpu:
[773518.533898] CPU    0: hi:  186, btch:  31 usd:   0
[773518.533899] CPU    1: hi:  186, btch:  31 usd:   0
[773518.533901] CPU    2: hi:  186, btch:  31 usd:   0
[773518.533902] CPU    3: hi:  186, btch:  31 usd:   0
[773518.533907] active_anon:439845 inactive_anon:131886 isolated_anon:0
 active_file:611682 inactive_file:680137 isolated_file:0
 unevictable:1641 dirty:680244 writeback:0 unstable:0
 free:25754 slab_reclaimable:80416 slab_unreclaimable:7512
 mapped:9807 shmem:11579 pagetables:4334 bounce:0
 free_cma:0
[773518.533911] Node 0 DMA free:15904kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[773518.533917] lowmem_reserve[]: 0 2998 7921 7921
[773518.533920] Node 0 DMA32 free:45188kB min:25528kB low:31908kB high:38292kB active_anon:16040kB inactive_anon:127684kB active_file:1324868kB inactive_file:1352336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3070172kB mlocked:0kB dirty:1352548kB writeback:0kB mapped:1360kB shmem:15220kB slab_reclaimable:165324kB slab_unreclaimable:1936kB kernel_stack:8kB pagetables:808kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5267308 all_unreclaimable? yes
[773518.533927] lowmem_reserve[]: 0 0 4923 4923
[773518.533929] Node 0 Normal free:41924kB min:41924kB low:52404kB high:62884kB active_anon:1743340kB inactive_anon:399860kB active_file:1121860kB inactive_file:1368212kB unevictable:6564kB isolated(anon):0kB isolated(file):0kB present:5041920kB mlocked:6564kB dirty:1368428kB writeback:0kB mapped:37868kB shmem:31096kB slab_reclaimable:156340kB slab_unreclaimable:28112kB kernel_stack:2376kB pagetables:16528kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10328235 all_unreclaimable? yes
[773518.533936] lowmem_reserve[]: 0 0 0 0
[773518.533938] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
[773518.533946] Node 0 DMA32: 65*4kB 2176*8kB 940*16kB 242*32kB 10*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 0*4096kB = 45188kB
[773518.533953] Node 0 Normal: 530*4kB 824*8kB 234*16kB 73*32kB 34*64kB 15*128kB 8*256kB 5*512kB 12*1024kB 1*2048kB 1*4096kB = 41928kB
[773518.533961] 1304075 total pagecache pages
[773518.533962] 0 pages in swap cache
[773518.533964] Swap cache stats: add 0, delete 0, find 0/0
[773518.533965] Free swap  = 8387576kB
[773518.533966] Total swap = 8387576kB
[773518.573684] 2064368 pages RAM
[773518.573686] 47751 pages reserved
[773518.573688] 2211781 pages shared
[773518.573689] 849932 pages non-shared
[773518.573690] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[773518.573705] [  445]     0   445     5456      385      16        0         -1000 udevd
[773518.573710] [ 3075]     0  3075     2489      716       8        0         -1000 dhclient
[773518.573713] [ 3088]   113  3088     7324      245      18        0             0 dnsmasq
[773518.573716] [ 3946]     0  3946     7089      251      20        0             0 nodm
[773518.573719] [ 3963]     0  3963    98955    37868     158        0             0 Xorg
[773518.573723] [ 3986]     0  3986    29676      434      23        0             0 rsyslogd
[773518.573725] [ 4129]     0  4129     4167       82      12        0             0 atd
[773518.573728] [ 4189]     0  4189     3651      166      13        0             0 inetutils-inetd
[773518.573731] [ 4219]   110  4219    19846      625      30        0             0 dictd
[773518.573734] [ 4256]   114  4256     1021       78       8        0             0 uml_switch
[773518.573737] [ 4274]     0  4274     1685       92       9        0             0 gpm
[773518.573741] [ 4377]   103  4377     7512      291      19        0             0 dbus-daemon
[773518.573744] [ 4408]     0  4408     5085      228      15        0             0 cron
[773518.573747] [ 4435]     0  4435    10731      538      23        0             0 apache
[773518.573750] [ 4490]     0  4490     1061      196       8        0             0 acpid
[773518.573753] [ 4608]     0  4608    20551      457      42        0             0 winbindd
[773518.573756] [ 4648]   109  4648    15146     6221      36        0             0 tor
[773518.573759] [ 4726]     0  4726    20551      369      41        0             0 winbindd
[773518.573762] [ 4757]    33  4757    11156      282      25        0             0 lighttpd
[773518.573765] [ 5044]     0  5044    12462      285      27        0         -1000 sshd
[773518.573768] [ 5050]  1000  5050    17146      257      37        0             0 nodm
[773518.573771] [ 5052]     0  5052    48274      651      30        0             0 console-kit-dae
[773518.573774] [ 5054]     0  5054      983       20       7        0             0 acpi_fakekeyd
[773518.573777] [ 5134]     0  5134    30736      452      29        0             0 polkitd
[773518.573780] [ 5143]  1000  5143    10193      202      25        0             0 ck-launch-sessi
[773518.573783] [ 5157]  1000  5157     3096      130      10        0             0 ssh-agent
[773518.573785] [ 5164]  1000  5164     6047      117      17        0             0 dbus-launch
[773518.573789] [ 5165]  1000  5165     7450       96      18        0             0 dbus-daemon
[773518.573792] [ 5226]  1000  5226    59673     2395      85        0             0 gkrellm
[773518.573795] [ 5257]     0  5257     3173      194      11        0             0 getty
[773518.573797] [ 5258]     0  5258     3173      193      13        0             0 getty
[773518.573801] [ 5259]     0  5259     3173      193      12        0             0 getty
[773518.573803] [ 5260]     0  5260     3173      193      12        0             0 getty
[773518.573807] [ 5261]     0  5261     3173      194      12        0             0 getty
[773518.573810] [ 5262]     0  5262     3173      194      12        0             0 getty
[773518.573813] [ 5310]  1000  5310    15322      623      36        0             0 xscreensaver
[773518.573816] [ 5311]  1000  5311    56136     2213      94        0             0 lxpanel
[773518.573819] [ 5314]  1000  5314    10703      369      27        0             0 menu-cached
[773518.573822] [ 5316]  1000  5316     3639      163      12        0             0 uim-helper-serv
[773518.573825] [ 5344]  1000  5344    17962     4954      40        0             0 xmonad-x86_64-l
[773518.573828] [ 5347]  1000  5347     6047      117      16        0             0 dbus-launch
[773518.573831] [ 5348]  1000  5348     7648      368      20        0             0 dbus-daemon
[773518.573834] [ 5351]  1000  5351    24138     1870      51        0             0 uim-xim
[773518.573836] [ 5352]  1000  5352    56268     1524      75        0             0 uim-toolbar-gtk
[773518.573839] [ 5359]  1000  5359    15465      399      35        0             0 gvfsd
[773518.573842] [ 5361]  1000  5361    20768      334      36        0             0 gvfs-fuse-daemo
[773518.573845] [ 5384]  1000  5384    28001     2911      58        0             0 urxvt
[773518.573848] [ 5386]  1000  5386    31756     1295      65        0             0 uim-candwin-gtk
[773518.573850] [ 5387]  1000  5387     5522     1070      14        0             0 bash
[773518.573853] [ 5458]  1000  5458   744949   486898    1657        0             0 firefox
[773518.573856] [ 5609]  1000  5609    27075     1980      54        0             0 urxvt
[773518.573859] [ 5610]  1000  5610     5416      909      16        0             0 bash
[773518.573861] [ 3831]  1000  3831     7450      149      19        0         -1000 dbus-daemon
[773518.573864] [ 3860]  1000  3860     6605      431      17        0         -1000 tmux
[773518.573867] [ 3861]  1000  3861     5877     1560      16        0         -1000 bash
[773518.573870] [15917]     0 15917    23575      534      50        0         -1000 sshd
[773518.573873] [15922]  1000 15922    24426      973      50        0         -1000 sshd
[773518.573876] [21553]  1000 21553    25417      475      51        0             0 usmb
[773518.573878] [25733]  1000 25733    28656     4048      58        0             0 urxvt
[773518.573881] [25734]  1000 25734     5416      939      15        0             0 bash
[773518.573884] [11439]  1000 11439    40328      758      42        0             0 gvfs-gdu-volume
[773518.573888] [11441]     0 11441    32507      757      30        0             0 udisks-daemon
[773518.573891] [11442]     0 11442    11853      164      27        0             0 udisks-daemon
[773518.573894] [11448]  1000 11448    15112      430      35        0             0 gvfs-gphoto2-vo
[773518.573896] [11450]  1000 11450    36103      389      39        0             0 gvfs-afc-volume
[773518.573899] [25534]  1000 25534    11735      453      27        0             0 gvfsd-metadata
[773518.573902] [30298]  1000 30298    26677     1508      55        0             0 urxvt
[773518.573905] [30299]  1000 30299     5416      939      15        0             0 bash
[773518.573908] [ 9122]    33  9122    10731      203      23        0             0 apache
[773518.573911] [ 9123]    33  9123    10731      203      23        0             0 apache
[773518.573914] [ 9124]    33  9124    10731      203      23        0             0 apache
[773518.573917] [ 9125]    33  9125    10731      203      23        0             0 apache
[773518.573920] [ 9127]    33  9127    10731      203      23        0             0 apache
[773518.573923] [ 9180]   108  9180     5444      256      15        0             0 privoxy
[773518.573926] [10394]  1000 10394    10886      535      26        0             0 ssh
[773518.573929] [10581]  1000 10581    33217      443      30        0             0 dconf-service
[773518.573932] [10585]  1000 10585    10852      472      26        0             0 ssh
[773518.573935] [10643]  1000 10643    61337      961      81        0             0 pulseaudio
[773518.573938] [14858]  1000 14858    26929     1824      55        0             0 urxvt
[773518.573940] [14859]  1000 14859     5455     1016      15        0             0 bash
[773518.573943] [15138]  1000 15138    28981     2280      58        0             0 urxvt
[773518.573946] [15139]  1000 15139     5416      922      15        0             0 bash
[773518.573949] [15203]  1000 15203    10852      493      26        0             0 ssh
[773518.573952] [ 9933]  1000  9933    16635      584      39        0             0 gvfsd-trash
[773518.573954] [26409]  1000 26409     4387      246      13        0             0 tmux
[773518.573958] [26434]     0 26434     5455      327      15        0         -1000 udevd
[773518.573961] [26435]     0 26435     5455      326      15        0         -1000 udevd
[773518.573964] [26465]  1000 26465     5868     1392      16        0         -1000 bash
[773518.573967] [ 7970]     0  7970     4711     1642      16        0             0 atop
[773518.573974] [25248]  1000 25248     4574      237      14        0         -1000 cp
[773518.573976] Out of memory: Kill process 5458 (firefox) score 118 or sacrifice child
[773518.573986] Killed process 5458 (firefox) total-vm:2979796kB, anon-rss:1924664kB, file-rss:22928kB
[773519.093622] firefox: page allocation failure: order:0, mode:0x280da
[773519.093627] Pid: 5458, comm: firefox Not tainted 3.7-trunk-amd64 #1 Debian 3.7.3-1~experimental.1
[773519.093629] Call Trace:
[773519.093639]  [<ffffffff810c5d51>] ? warn_alloc_failed+0x111/0x123
[773519.093643]  [<ffffffff810c8963>] ? __alloc_pages_nodemask+0x6b0/0x74c
[773519.093647]  [<ffffffff810f5936>] ? alloc_pages_vma+0x110/0x12d
[773519.093650]  [<ffffffff810e0356>] ? handle_pte_fault+0x15e/0x7dd
[773519.093653]  [<ffffffff810c7919>] ? free_hot_cold_page+0x42/0x102
[773519.093657]  [<ffffffff810dd8d4>] ? pte_offset_kernel+0xc/0x38
[773519.093661]  [<ffffffff8137b528>] ? __do_page_fault+0x32e/0x376
[773519.093664]  [<ffffffff810f9572>] ? kmem_cache_free+0x2d/0x69
[773519.093669]  [<ffffffff81066342>] ? __dequeue_entity+0x18/0x2b
[773519.093673]  [<ffffffff8100d025>] ? paravirt_write_msr+0xb/0xe
[773519.093676]  [<ffffffff8100d652>] ? __switch_to+0x1db/0x3f8
[773519.093679]  [<ffffffff81067b88>] ? pick_next_task_fair+0xe3/0x13b
[773519.093682]  [<ffffffff8105fad7>] ? mmdrop+0xd/0x1c
[773519.093684]  [<ffffffff8106138c>] ? finish_task_switch+0x83/0xb4
[773519.093689]  [<ffffffff81377881>] ? __schedule+0x4b2/0x4e0
[773519.093691]  [<ffffffff81378918>] ? page_fault+0x28/0x30
[773519.093693] Mem-Info:
[773519.093694] Node 0 DMA per-cpu:
[773519.093696] CPU    0: hi:    0, btch:   1 usd:   0
[773519.093698] CPU    1: hi:    0, btch:   1 usd:   0
[773519.093700] CPU    2: hi:    0, btch:   1 usd:   0
[773519.093701] CPU    3: hi:    0, btch:   1 usd:   0
[773519.093702] Node 0 DMA32 per-cpu:
[773519.093704] CPU    0: hi:  186, btch:  31 usd:   0
[773519.093706] CPU    1: hi:  186, btch:  31 usd:   0
[773519.093708] CPU    2: hi:  186, btch:  31 usd:   0
[773519.093709] CPU    3: hi:  186, btch:  31 usd:   0
[773519.093710] Node 0 Normal per-cpu:
[773519.093712] CPU    0: hi:  186, btch:  31 usd:   0
[773519.093714] CPU    1: hi:  186, btch:  31 usd:   0
[773519.093715] CPU    2: hi:  186, btch:  31 usd:   0
[773519.093717] CPU    3: hi:  186, btch:  31 usd:   0
[773519.093721] active_anon:439845 inactive_anon:131886 isolated_anon:0
 active_file:611682 inactive_file:680137 isolated_file:0
 unevictable:1641 dirty:680244 writeback:0 unstable:0
 free:25754 slab_reclaimable:80416 slab_unreclaimable:7512
 mapped:9807 shmem:11579 pagetables:4334 bounce:0
 free_cma:0
[773519.093725] Node 0 DMA free:15904kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[773519.093731] lowmem_reserve[]: 0 2998 7921 7921
[773519.093735] Node 0 DMA32 free:45188kB min:25528kB low:31908kB high:38292kB active_anon:16040kB inactive_anon:127684kB active_file:1324868kB inactive_file:1352336kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3070172kB mlocked:0kB dirty:1352548kB writeback:0kB mapped:1360kB shmem:15220kB slab_reclaimable:165324kB slab_unreclaimable:1936kB kernel_stack:8kB pagetables:808kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[773519.093741] lowmem_reserve[]: 0 0 4923 4923
[773519.093744] Node 0 Normal free:41924kB min:41924kB low:52404kB high:62884kB active_anon:1743340kB inactive_anon:399860kB active_file:1121860kB inactive_file:1368212kB unevictable:6564kB isolated(anon):0kB isolated(file):0kB present:5041920kB mlocked:6564kB dirty:1368428kB writeback:0kB mapped:37868kB shmem:31096kB slab_reclaimable:156340kB slab_unreclaimable:28112kB kernel_stack:2376kB pagetables:16528kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:736 all_unreclaimable? no
[773519.093750] lowmem_reserve[]: 0 0 0 0
[773519.093752] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
[773519.093760] Node 0 DMA32: 64*4kB 2175*8kB 941*16kB 242*32kB 10*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 2*2048kB 0*4096kB = 45192kB
[773519.093767] Node 0 Normal: 537*4kB 824*8kB 234*16kB 74*32kB 34*64kB 15*128kB 8*256kB 5*512kB 12*1024kB 1*2048kB 1*4096kB = 41988kB
[773519.093775] 1304075 total pagecache pages
[773519.093777] 0 pages in swap cache
[773519.093778] Swap cache stats: add 0, delete 0, find 0/0
[773519.093780] Free swap  = 8387576kB
[773519.093781] Total swap = 8387576kB
[773519.139790] 2064368 pages RAM
[773519.139793] 47751 pages reserved
[773519.139794] 2211758 pages shared
[773519.139796] 849916 pages non-shared

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-02-08 16:31 Michal Suchanek
@ 2013-03-11 13:15 ` Michal Suchanek
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-03-11 13:15 UTC (permalink / raw)
  To: linux-mm, 699277

On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> I am dealing with VM disk images and performing something like wiping
> free space to prepare image for compressing and storing on server or
> copying it to external USB disk causes
>
> 1) system lockup in order of a few tens of seconds when all CPU cores
> are 100% used by system and the machine is basicaly unusable
>
> 2) oom killer killing processes
>
> This all on system with 8G ram so there should be plenty space to work with.
>
> This happens with kernels 3.6.4 or 3.7.1
>
> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> problem even with less ram.
>
> I have  vm.swappiness = 0 set for a long  time already.
>
>
I did some testing with 3.7.1, and with swappiness as high as 75 the
kernel still causes all cores to loop somewhere in the kernel when writing
lots of data to disk.

With swappiness as high as 90, processes still get killed on large disk writes.

Given that the maximum is 100, the interval in which the mm works at all is
going to be very narrow, less than 10% of the parameter range. This is
a severe regression, as is the CPU time consumed by the kernel.

The IO scheduler is the default cfq.

If you have any idea what to try other than downgrading to an earlier,
unaffected kernel, I would like to hear it.

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
@ 2013-03-12  2:15 Hillf Danton
  2013-03-12  9:03 ` Michal Suchanek
  2013-08-26 13:51 ` Michal Suchanek
  0 siblings, 2 replies; 15+ messages in thread
From: Hillf Danton @ 2013-03-12  2:15 UTC (permalink / raw)
  To: Michal Suchanek, LKML, Linux-MM

>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> Hello,
>>
>> I am dealing with VM disk images and performing something like wiping
>> free space to prepare image for compressing and storing on server or
>> copying it to external USB disk causes
>>
>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> are 100% used by system and the machine is basicaly unusable
>>
>> 2) oom killer killing processes
>>
>> This all on system with 8G ram so there should be plenty space to work with.
>>
>> This happens with kernels 3.6.4 or 3.7.1
>>
>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> problem even with less ram.
>>
>> I have  vm.swappiness = 0 set for a long  time already.
>>
>>
>I did some testing with 3.7.1 and with swappiness as much as 75 the
>kernel still causes all cores to loop somewhere in system when writing
>lots of data to disk.
>
>With swappiness as much as 90 processes still get killed on large disk writes.
>
>Given that the max is 100 the interval in which mm works at all is
>going to be very narrow, less than 10% of the paramater range. This is
>a severe regression as is the cpu time consumed by the kernel.
>
>The io scheduler is the default cfq.
>
>If you have any idea what to try other than downgrading to an earlier
>unaffected kernel I would like to hear.
>
Can you try commit 3cf23841b4b7 (mm/vmscan.c: avoid possible
deadlock caused by too_many_isolated())?

Or try 3.8 and/or 3.9, additionally?
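
A quick way to check whether a given tree already contains that commit
(run inside a Linux git checkout):

  git log --oneline -1 3cf23841b4b7       # show the commit itself
  git describe --contains 3cf23841b4b7    # first tag that contains it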

Hillf

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-03-12  2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
@ 2013-03-12  9:03 ` Michal Suchanek
  2013-08-26 13:51 ` Michal Suchanek
  1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-03-12  9:03 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>> are 100% used by system and the machine is basicaly unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This all on system with 8G ram so there should be plenty space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have  vm.swappiness = 0 set for a long  time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100 the interval in which mm works at all is
>>going to be very narrow, less than 10% of the paramater range. This is
>>a severe regression as is the cpu time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?

Hello,

in the meantime I tried setting the IO scheduler to deadline, because I
remember using that one in my self-built kernels due to cfq breaking
some obscure block driver.

With the deadline IO scheduler I can set swappiness back to 0 and the
system works normally even for a moderate amount of IO, such as restoring
disk images from the network. With the cfq scheduler this would cause
lockups and the OOM killer running loose.
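
For reference, this is roughly how the scheduler is switched at runtime;
the device name sda is only an example:

  cat /sys/block/sda/queue/scheduler              # e.g. "noop deadline [cfq]"
  echo deadline > /sys/block/sda/queue/scheduler

The same default can be chosen at boot with the elevator=deadline kernel
parameter.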

So I guess I found what breaks the system, and it is not so much the
kernel version as using pre-built kernels with the default scheduler.

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-03-12  2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
  2013-03-12  9:03 ` Michal Suchanek
@ 2013-08-26 13:51 ` Michal Suchanek
  2013-09-05 10:12   ` Michal Suchanek
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-08-26 13:51 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>> are 100% used by system and the machine is basicaly unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This all on system with 8G ram so there should be plenty space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have  vm.swappiness = 0 set for a long  time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100 the interval in which mm works at all is
>>going to be very narrow, less than 10% of the paramater range. This is
>>a severe regression as is the cpu time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?
>

Hello,

with the deadline IO scheduler I experience this issue less often, but it
still happens.

I am on the 3.9.6 Debian kernel, so 3.8 did not fix this problem.

Do you have some idea of what to log so that useful information about the
lockup is gathered?

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-08-26 13:51 ` Michal Suchanek
@ 2013-09-05 10:12   ` Michal Suchanek
  2013-09-17 13:31     ` Michal Suchanek
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-09-05 10:12 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

Hello

On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I am dealing with VM disk images and performing something like wiping
>>>> free space to prepare image for compressing and storing on server or
>>>> copying it to external USB disk causes
>>>>
>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>> are 100% used by system and the machine is basicaly unusable
>>>>
>>>> 2) oom killer killing processes
>>>>
>>>> This all on system with 8G ram so there should be plenty space to work with.
>>>>
>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>
>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>> problem even with less ram.
>>>>
>>>> I have  vm.swappiness = 0 set for a long  time already.
>>>>
>>>>
>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>kernel still causes all cores to loop somewhere in system when writing
>>>lots of data to disk.
>>>
>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>
>>>Given that the max is 100 the interval in which mm works at all is
>>>going to be very narrow, less than 10% of the paramater range. This is
>>>a severe regression as is the cpu time consumed by the kernel.
>>>
>>>The io scheduler is the default cfq.
>>>
>>>If you have any idea what to try other than downgrading to an earlier
>>>unaffected kernel I would like to hear.
>>>
>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> deadlock caused by too_many_isolated())?
>>
>> Or try 3.8 and/or 3.9, additionally?
>>
>
> Hello,
>
> with deadline IO scheduler I experience this issue less often but it
> still happens.
>
> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>
> Do you have some idea what to log so that useful information about the
> lockup is gathered?
>

This appears to be fixed in the vanilla 3.11 kernel.

I still get short intermittent lockups and CPU usage spikes of up to 20%
on a core, but nowhere near the minute-plus lockups with all cores at
100% seen on earlier kernels.

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-05 10:12   ` Michal Suchanek
@ 2013-09-17 13:31     ` Michal Suchanek
  2013-09-17 21:13       ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Suchanek @ 2013-09-17 13:31 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello
>
> On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I am dealing with VM disk images and performing something like wiping
>>>>> free space to prepare image for compressing and storing on server or
>>>>> copying it to external USB disk causes
>>>>>
>>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>>> are 100% used by system and the machine is basicaly unusable
>>>>>
>>>>> 2) oom killer killing processes
>>>>>
>>>>> This all on system with 8G ram so there should be plenty space to work with.
>>>>>
>>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>>
>>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>>> problem even with less ram.
>>>>>
>>>>> I have  vm.swappiness = 0 set for a long  time already.
>>>>>
>>>>>
>>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>>kernel still causes all cores to loop somewhere in system when writing
>>>>lots of data to disk.
>>>>
>>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>>
>>>>Given that the max is 100 the interval in which mm works at all is
>>>>going to be very narrow, less than 10% of the paramater range. This is
>>>>a severe regression as is the cpu time consumed by the kernel.
>>>>
>>>>The io scheduler is the default cfq.
>>>>
>>>>If you have any idea what to try other than downgrading to an earlier
>>>>unaffected kernel I would like to hear.
>>>>
>>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>>> deadlock caused by too_many_isolated())?
>>>
>>> Or try 3.8 and/or 3.9, additionally?
>>>
>>
>> Hello,
>>
>> with deadline IO scheduler I experience this issue less often but it
>> still happens.
>>
>> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>>
>> Do you have some idea what to log so that useful information about the
>> lockup is gathered?
>>
>
> This appears to be fixed in vanilla 3.11 kernel.
>
> I still get short intermittent lockups and cpu usage spikes up to 20%
> on a core but nowhere near the minute+ long lockups with all cores
> 100% on earlier kernels.
>

So I did more testing on the 3.11 kernel, and while it works OK with
tar, you can get severe lockups with mc or kvm. The difference is
probably that sane tools fsync() the files they close, forcing the file
to be written out and the kernel to return possible write errors before
they move on to the next file.

With kvm writing to a file used as a virtual disk, the system would stall
until the disk driver in the emulated system timed out, returned a disk
IO error, and the emulated system stopped writing. In top I see all CPU
cores 90%+ in wait; the system is unusable. With mc the lockups would be
indefinite, probably because there is no timeout on writing a file in mc.

I tried tuning swappiness and elevators, but neither solves the basic
problem: the dirty buffers fill up memory and the system stalls trying
to resolve the situation.
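
A convenient way to watch this happening is to keep an eye on the dirty
and writeback counters while the writes are running, e.g.:

  watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'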

Obviously the kernel puts off writing any dirty buffers until the
memory pressure is overwhelming and the VM subsystem flops.

At least the OOM killer does not get invoked anymore, since there is
lots of memory; Linux just does not know how to use it.

The solution to this problem is quite simple: use the ancient userspace
bdflushd, or whatever it was called. I emulate it with
{ while true ; do sleep 5; sync ; done } &

System performance suddenly increases, back to the awesome Debian stable
levels.
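
Spelled out as a tiny script (the 5 second interval is arbitrary):

  #!/bin/sh
  # crude stand-in for the old user-space flush daemon: force dirty data
  # out every few seconds so it never piles up
  while true; do
          sleep 5
          sync
  done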

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-17 13:31     ` Michal Suchanek
@ 2013-09-17 21:13       ` Jan Kara
  2013-09-17 22:22         ` Michal Suchanek
  2013-09-18 14:56         ` Michal Suchanek
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kara @ 2013-09-17 21:13 UTC (permalink / raw)
  To: Michal Suchanek; +Cc: Hillf Danton, LKML, Linux-MM

  Hello,

On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am dealing with VM disk images and performing something like wiping
> >>>>> free space to prepare image for compressing and storing on server or
> >>>>> copying it to external USB disk causes
> >>>>>
> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
> >>>>> are 100% used by system and the machine is basicaly unusable
> >>>>>
> >>>>> 2) oom killer killing processes
> >>>>>
> >>>>> This all on system with 8G ram so there should be plenty space to work with.
> >>>>>
> >>>>> This happens with kernels 3.6.4 or 3.7.1
> >>>>>
> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> >>>>> problem even with less ram.
> >>>>>
> >>>>> I have  vm.swappiness = 0 set for a long  time already.
> >>>>>
> >>>>>
> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
> >>>>kernel still causes all cores to loop somewhere in system when writing
> >>>>lots of data to disk.
> >>>>
> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
> >>>>
> >>>>Given that the max is 100 the interval in which mm works at all is
> >>>>going to be very narrow, less than 10% of the paramater range. This is
> >>>>a severe regression as is the cpu time consumed by the kernel.
> >>>>
> >>>>The io scheduler is the default cfq.
> >>>>
> >>>>If you have any idea what to try other than downgrading to an earlier
> >>>>unaffected kernel I would like to hear.
> >>>>
> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> >>> deadlock caused by too_many_isolated())?
> >>>
> >>> Or try 3.8 and/or 3.9, additionally?
> >>>
> >>
> >> Hello,
> >>
> >> with deadline IO scheduler I experience this issue less often but it
> >> still happens.
> >>
> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
> >>
> >> Do you have some idea what to log so that useful information about the
> >> lockup is gathered?
> >>
> >
> > This appears to be fixed in vanilla 3.11 kernel.
> >
> > I still get short intermittent lockups and cpu usage spikes up to 20%
> > on a core but nowhere near the minute+ long lockups with all cores
> > 100% on earlier kernels.
> >
> 
> So I did more testing on the 3.11 kernel and while it works OK with
> tar you can get severe lockups with mc or kvm. The difference is
> probably the fact that sane tools do fsync() on files they close
> forcing the file to write out and the kernel returning possible write
> errors before they move on to next file.
  Sorry for chiming in a bit late. But is this really writing to a normal
disk? SATA drive or something else?

> With kvm writing to a file used as virtual disk the system would stall
> indefinitely until the disk driver in the emulated system would time
> out, return disk IO error, and the emulated system would stop writing.
> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
> the lockups would be indefinite, probably because there is no timeout
> on writing a file in mc.
> 
> I tried tuning swappiness and eleveators but the the basic problem is
> solved by neither: the dirty buffers fill up memory and system stalls
> trying to resolve the situation.
  This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
the amount of dirty memory. By default it is set to 20% of memory, which
tends to be too much for an 8 GB machine. Can you set it to something like
5% and /proc/sys/vm/dirty_background_ratio to 2%? That would be a more
appropriate sizing (assuming a standard SATA drive). Does it change anything?
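
That is, something like this (takes effect immediately, but does not
survive a reboot):

  sysctl -w vm.dirty_ratio=5
  sysctl -w vm.dirty_background_ratio=2

or, equivalently:

  echo 5 > /proc/sys/vm/dirty_ratio
  echo 2 > /proc/sys/vm/dirty_background_ratio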

If the problem doesn't go away, can you install systemtap on your system
and run the attached script? It should report exactly where processes stall
and for how long, which should help us address the issue. Thanks.

> Obviously the kernel puts off writing any dirty buffers until the
> memory pressure is overwhelming and the vmm flops.
> 
> At least the OOM killer does not get invoked anymore since there is
> lots of memory - just Linux does not know how to use it.
> 
> The solution to this problem is quite simple - use the ancient
> userspace bdflushd or what it was called. I emulate it with
> { while true ; do sleep 5; sync ; done } &
> 
> The system performance suddenly increases - to the awesome Debian stable levels.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-17 21:13       ` Jan Kara
@ 2013-09-17 22:22         ` Michal Suchanek
  2013-09-18 14:56         ` Michal Suchanek
  1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-09-17 22:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>   Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
>> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by system and the machine is basicaly unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This all on system with 8G ram so there should be plenty space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have  vm.swappiness = 0 set for a long  time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
>> >>>>
>> >>>>Given that the max is 100 the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the paramater range. This is
>> >>>>a severe regression as is the cpu time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to write out and the kernel returning possible write
>> errors before they move on to next file.
>   Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?

It's an LVM volume on a SATA drive. I sometimes use USB disks as well,
but most of the time it's SATA or eSATA.

>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and eleveators but the the basic problem is
>> solved by neither: the dirty buffers fill up memory and system stalls
>> trying to resolve the situation.
>   This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?

I can try that, but I don't really mind if the kernel uses 2 GB of RAM
for buffers. The problem is that it cannot manage those buffers. Does some
kernel structure grow out of proportion when the buffers reach this size,
or something?

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-17 21:13       ` Jan Kara
  2013-09-17 22:22         ` Michal Suchanek
@ 2013-09-18 14:56         ` Michal Suchanek
  2013-09-19 10:13           ` Jan Kara
       [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
  1 sibling, 2 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-09-18 14:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>   Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
>> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by system and the machine is basicaly unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This all on system with 8G ram so there should be plenty space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have  vm.swappiness = 0 set for a long  time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
>> >>>>
>> >>>>Given that the max is 100 the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the paramater range. This is
>> >>>>a severe regression as is the cpu time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to write out and the kernel returning possible write
>> errors before they move on to next file.
>   Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?
>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and eleveators but the the basic problem is
>> solved by neither: the dirty buffers fill up memory and system stalls
>> trying to resolve the situation.
>   This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?

The default for dirty_ratio/dirty_background_ratio here is 60/40. Setting
these to 5/2 gives about the same result as running the script that
syncs every 5 s. Setting them to 30/10 gives larger data chunks and an
intermittent lockup before every chunk is written.

It is quite possible to set kernel parameters that kill the kernel, but

1) this is the default
2) the parameter is set in units that do not prevent the issue in
general (% of RAM vs. number of blocks)
3) WTH is the system doing? It's a 4-core 3 GHz CPU, so it can handle
traversing a structure holding 800 MB of data in the background. Something
is seriously rotten somewhere.

Thanks

Michal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-18 14:56         ` Michal Suchanek
@ 2013-09-19 10:13           ` Jan Kara
  2013-10-09 14:19             ` Michal Suchanek
       [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Kara @ 2013-09-19 10:13 UTC (permalink / raw)
  To: Michal Suchanek; +Cc: Jan Kara, Hillf Danton, LKML, Linux-MM

[-- Attachment #1: Type: text/plain, Size: 6403 bytes --]

On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
> >   Hello,
> >
> > On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
> >> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> >> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
> >> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >>>>> Hello,
> >> >>>>>
> >> >>>>> I am dealing with VM disk images and performing something like wiping
> >> >>>>> free space to prepare image for compressing and storing on server or
> >> >>>>> copying it to external USB disk causes
> >> >>>>>
> >> >>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
> >> >>>>> are 100% used by system and the machine is basicaly unusable
> >> >>>>>
> >> >>>>> 2) oom killer killing processes
> >> >>>>>
> >> >>>>> This all on system with 8G ram so there should be plenty space to work with.
> >> >>>>>
> >> >>>>> This happens with kernels 3.6.4 or 3.7.1
> >> >>>>>
> >> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> >> >>>>> problem even with less ram.
> >> >>>>>
> >> >>>>> I have  vm.swappiness = 0 set for a long  time already.
> >> >>>>>
> >> >>>>>
> >> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
> >> >>>>kernel still causes all cores to loop somewhere in system when writing
> >> >>>>lots of data to disk.
> >> >>>>
> >> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
> >> >>>>
> >> >>>>Given that the max is 100 the interval in which mm works at all is
> >> >>>>going to be very narrow, less than 10% of the paramater range. This is
> >> >>>>a severe regression as is the cpu time consumed by the kernel.
> >> >>>>
> >> >>>>The io scheduler is the default cfq.
> >> >>>>
> >> >>>>If you have any idea what to try other than downgrading to an earlier
> >> >>>>unaffected kernel I would like to hear.
> >> >>>>
> >> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> >> >>> deadlock caused by too_many_isolated())?
> >> >>>
> >> >>> Or try 3.8 and/or 3.9, additionally?
> >> >>>
> >> >>
> >> >> Hello,
> >> >>
> >> >> with deadline IO scheduler I experience this issue less often but it
> >> >> still happens.
> >> >>
> >> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
> >> >>
> >> >> Do you have some idea what to log so that useful information about the
> >> >> lockup is gathered?
> >> >>
> >> >
> >> > This appears to be fixed in vanilla 3.11 kernel.
> >> >
> >> > I still get short intermittent lockups and cpu usage spikes up to 20%
> >> > on a core but nowhere near the minute+ long lockups with all cores
> >> > 100% on earlier kernels.
> >> >
> >>
> >> So I did more testing on the 3.11 kernel and while it works OK with
> >> tar you can get severe lockups with mc or kvm. The difference is
> >> probably the fact that sane tools do fsync() on files they close
> >> forcing the file to write out and the kernel returning possible write
> >> errors before they move on to next file.
> >   Sorry for chiming in a bit late. But is this really writing to a normal
> > disk? SATA drive or something else?
> >
> >> With kvm writing to a file used as a virtual disk, the system would
> >> stall indefinitely until the disk driver in the emulated system timed
> >> out, returned a disk IO error, and the emulated system stopped writing.
> >> In top I see all CPU cores 90%+ in wait, and the system is unusable.
> >> With mc the lockups would be indefinite, probably because there is no
> >> timeout on writing a file in mc.
> >>
> >> I tried tuning swappiness and elevators but the basic problem is
> >> solved by neither: dirty buffers fill up memory and the system stalls
> >> trying to resolve the situation.
> >   This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> > amount of dirty memory. By default it is set to 20% of memory which tends
> > to be too much for 8 GB machine. Can you set it to something like 5% and
> > /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> > sizing (assuming standard SATA drive). Does it change anything?
> 
> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
  Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
to better accommodate some workloads, but 60/40 on an 8 GB machine with a
SATA drive really seems too much. That is going to give memory management a
headache.

The problem is that a good SATA drive can do ~100 MB/s if we are
lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
it takes 50s at best to write it; with more random IO to an image file it
can well take several minutes to write. That may cause some increased
latency when memory reclaim waits for writeback to clean some pages.
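
As a rough illustration of the arithmetic above, here is a minimal C sketch
(assuming the same 8 GB of RAM and ~100 MB/s sequential SATA throughput used
in this thread; both numbers are illustrative, not measured) that prints the
worst-case amount of dirty data and the time needed to flush it for a few
dirty_ratio settings:

/*
 * Back-of-envelope only: worst-case dirty backlog and flush time for
 * several dirty_ratio values, assuming 8 GB of RAM and ~100 MB/s of
 * sequential disk throughput.
 */
#include <stdio.h>

int main(void)
{
        const double ram_mb = 8 * 1024;       /* 8 GB machine */
        const double disk_mb_per_s = 100.0;   /* optimistic sequential SATA rate */
        const int ratios[] = { 10, 20, 40, 60 };
        size_t i;

        for (i = 0; i < sizeof(ratios) / sizeof(ratios[0]); i++) {
                double dirty_mb = ram_mb * ratios[i] / 100.0;

                printf("dirty_ratio=%2d%% -> up to %5.0f MB dirty, ~%3.0f s to flush\n",
                       ratios[i], dirty_mb, dirty_mb / disk_mb_per_s);
        }
        return 0;
}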

> these to 5/2 gives about the same result as running the script that
> syncs every 5s. Setting to 30/10 gives larger data chunks and
> intermittent lockup before every chunk is written.
> 
> It is quite possible to set kernel parameters that kill the kernel but
> 
> 1) this is the default
  Not the upstream one, so you should raise this with Debian I guess. 60/40
looks way out of the reasonable range for today's machines.

> 2) the parameter is set in units that do not prevent the issue in
> general (% RAM vs #blocks)
  You can set the number of bytes instead of a percentage -
/proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
sizing depends on the amount of memory, storage HW, and workload, so it's
more of an administrative task to set these tunables properly.
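
As an illustration only, here is a minimal C sketch of setting byte-based
limits from a small root-run helper; the 200 MB / 100 MB values are an
assumption picked for an 8 GB machine with a single SATA drive, not a
recommendation, and the same effect can be had with sysctl or a plain echo
into the two files:

#include <stdio.h>
#include <stdlib.h>

/*
 * Write a single numeric value into a /proc/sys file.  Writing
 * vm.dirty_bytes makes vm.dirty_ratio read back as 0 (and vice versa),
 * since only one of each pair is in effect at a time.
 */
static int write_value(const char *path, long long value)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return -1;
        }
        fprintf(f, "%lld\n", value);
        return fclose(f);
}

int main(void)
{
        const long long mb = 1024LL * 1024LL;

        /* Illustrative sizes only; proper values depend on RAM, disk and workload. */
        if (write_value("/proc/sys/vm/dirty_bytes", 200 * mb) ||
            write_value("/proc/sys/vm/dirty_background_bytes", 100 * mb))
                return EXIT_FAILURE;
        return EXIT_SUCCESS;
}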

> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
> traversing a structure holding 800M data in the background. Something
> is seriously rotten somewhere.
  Likely processes are waiting in direct reclaim for IO to finish, but that
is just guessing. Try running the attached script (I forgot to attach it to
the previous email). You will need systemtap and kernel debuginfo installed.
The script doesn't work with all versions of systemtap (as it is sadly a
moving target), so if it fails, tell me your version of systemtap and I'll
update the script accordingly.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: watch-dstate.pl --]
[-- Type: application/x-perl, Size: 11084 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
       [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
@ 2013-09-20 11:20             ` Michal Suchanek
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-09-20 11:20 UTC (permalink / raw)
  To: Hillf Danton, Linux-MM, Linux Kernel Mailing List, Jan Kara

Hello,

On 19 September 2013 10:07, Hillf Danton <dhillf@gmail.com> wrote:
> Hello Michal
>
> Take it easy please, the kernel is made by human hands.
>
> Can you please try the diff (and sorry if the mail agent reformats it)?
>
> Best Regards
> Hillf
>
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>   * implies that pages are cycling through the LRU faster than
>   * they are written so also forcibly stall.
>   */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>   congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
>   }
>
>   /*
> --

I applied the patch, raised the dirty ratios to 30/10 and then to the
default 60/40 while imaging a VM, and did not observe any problems, so I
guess this solves it.
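
For readability (the mail agent dropped the indentation), the patched branch
amounts to roughly the following; the comments are my paraphrase of the
surrounding code's intent and are not part of the patch:

        /*
         * Either every page taken off the inactive LRU was dirty but not
         * yet queued for writeback, or pages under writeback are cycling
         * through the LRU faster than they can be written (nr_immediate),
         * i.e. reclaim is outrunning the flusher threads.
         */
        if (nr_unqueued_dirty == nr_taken || nr_immediate) {
                /* From kswapd, explicitly kick the flusher threads... */
                if (current_is_kswapd())
                        wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
                /* ...and in any case stall briefly so writeback can catch up. */
                congestion_wait(BLK_RW_ASYNC, HZ/10);
        }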

Thanks

Michal


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-19 10:13           ` Jan Kara
@ 2013-10-09 14:19             ` Michal Suchanek
  2013-10-15 14:15               ` Michal Suchanek
  2014-07-07 11:34               ` Michal Suchanek
  0 siblings, 2 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-10-09 14:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

Hello,

On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>> >   Hello,
>>
>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>   Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
> to better accommodate some workloads, but 60/40 on an 8 GB machine with a
> SATA drive really seems too much. That is going to give memory management a
> headache.
>
> The problem is that a good SATA drive can do ~100 MB/s if we are
> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
> it takes 50s at best to write it, with more random IO to image file it can
> well take several minutes to write. That may cause some increased latency
> when memory reclaim waits for writeback to clean some pages.
>
>> these to 5/2 gives about the same result as running the script that
>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>> intermittent lockup before every chunk is written.
>>
>> It is quite possible to set kernel parameters that kill the kernel but
>>
>> 1) this is the default
>   Not the upstream one, so you should raise this with Debian I guess. 60/40
> looks way out of the reasonable range for today's machines.
>
>> 2) the parameter is set in units that do not prevent the issue in
>> general (% RAM vs #blocks)
>   You can set the number of bytes instead of percentage -
> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
> sizing depends on amount of memory, storage HW, workload. So it's more an
> administrative task to set this tunable properly.
>
>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>> traversing a structure holding 800M data in the background. Something
>> is seriously rotten somewhere.
>   Likely processes are waiting in direct reclaim for IO to finish. But that
> is just guessing. Try running attached script (forgot to attach it to
> previous email). You will need systemtap and kernel debuginfo installed.
> The script doesn't work with all versions of systemtap (as it is sadly a
> moving target) so if it fails, tell me your version of systemtap and I'll
> update the script accordingly.

This was fixed for me by the patch posted earlier by Hillf Danton so I
guess this answers what the system was (not) doing:

--- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
+++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
@@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
  * implies that pages are cycling through the LRU faster than
  * they are written so also forcibly stall.
  */
- if (nr_unqueued_dirty == nr_taken || nr_immediate)
+ if (nr_unqueued_dirty == nr_taken || nr_immediate) {
+ if (current_is_kswapd())
+ wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
  congestion_wait(BLK_RW_ASYNC, HZ/10);
+ }
  }

  /*

Also, commit 75485363 is hopefully addressing this issue in mainline.

Thanks

Michal


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-10-09 14:19             ` Michal Suchanek
@ 2013-10-15 14:15               ` Michal Suchanek
  2014-07-07 11:34               ` Michal Suchanek
  1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2013-10-15 14:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 9 October 2013 16:19, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>>> >   Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>>   Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads, but 60/40 on an 8 GB machine with a
>> SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>>   Not the upstream one, so you should raise this with Debian I guess. 60/40
>> looks way out of the reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>>   You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>>   Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>   * implies that pages are cycling through the LRU faster than
>   * they are written so also forcibly stall.
>   */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>   congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
>   }
>
>   /*
>
> Also 75485363 is hopefully addressing this issue in mainline.
>

Actually, that commit was in 3.11 already, and it did make the behaviour a
bit better, but it was not enough.

So is something like the vmscan.c patch going to make it into the
mainline kernel?

Thanks

Michal


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: doing lots of disk writes causes oom killer to kill processes
  2013-10-09 14:19             ` Michal Suchanek
  2013-10-15 14:15               ` Michal Suchanek
@ 2014-07-07 11:34               ` Michal Suchanek
  1 sibling, 0 replies; 15+ messages in thread
From: Michal Suchanek @ 2014-07-07 11:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 9 October 2013 16:19, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>>> >   Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>>   Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads, but 60/40 on an 8 GB machine with a
>> SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>>   Not the upstream one, so you should raise this with Debian I guess. 60/40
>> looks way out of the reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>>   You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>>   Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>   * implies that pages are cycling through the LRU faster than
>   * they are written so also forcibly stall.
>   */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>   congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
>   }
>
>   /*
>

Hello,

Is this being addressed somehow?

It seems the 3.15 kernel still has this issue, unless it happens to
lock up for some other reason in similar situations.

Thanks

Michal


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-07-07 11:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-12  2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
2013-03-12  9:03 ` Michal Suchanek
2013-08-26 13:51 ` Michal Suchanek
2013-09-05 10:12   ` Michal Suchanek
2013-09-17 13:31     ` Michal Suchanek
2013-09-17 21:13       ` Jan Kara
2013-09-17 22:22         ` Michal Suchanek
2013-09-18 14:56         ` Michal Suchanek
2013-09-19 10:13           ` Jan Kara
2013-10-09 14:19             ` Michal Suchanek
2013-10-15 14:15               ` Michal Suchanek
2014-07-07 11:34               ` Michal Suchanek
     [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
2013-09-20 11:20             ` Michal Suchanek
  -- strict thread matches above, loose matches on Subject: below --
2013-02-08 16:31 Michal Suchanek
2013-03-11 13:15 ` Michal Suchanek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).