linux-mm.kvack.org archive mirror
* isolate_freepages_block and excessive CPU usage by OSD process
@ 2014-11-15 11:48 Andrey Korolyov
  2014-11-15 16:32 ` Vlastimil Babka
  0 siblings, 1 reply; 36+ messages in thread
From: Andrey Korolyov @ 2014-11-15 11:48 UTC
  To: ceph-users@lists.ceph.com; +Cc: riel, Mark Nelson, linux-mm

[-- Attachment #1: Type: text/plain, Size: 2957 bytes --]

Hello,

I recently found that, under certain conditions (moderate VM pressure,
moderate I/O, slightly altered vm settings), the OSD daemons can go
into a loop involving isolate_freepages and noticeably degrade Ceph
cluster performance. I found this thread,
https://lkml.org/lkml/2012/6/27/545, but a significant decrease of
bdi max_ratio did not help at all. Although about half of physical
memory is available for cache-like use, the problem with mm persists,
so I would like to try suggestions from other people. In the current
testing iteration I decreased vfs_cache_pressure to 10 and raised
vm.dirty_ratio and vm.dirty_background_ratio to 15 and 10 respectively
(because the default values are too spiky for my workloads). The host
kernel is linux-stable 3.10.

Non-default VM settings are:
vm.swappiness = 5
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
bdi max_ratio was 100% and is now 20%. At a glance the situation looks
worse, because an unstable OSD host causes a domino-like effect on
other hosts, which start to flap too, and only a cache flush via
drop_caches helps.
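
To make the settings above easy to compare between hosts, here is a
minimal Python sketch that reads them back from the running system. It
assumes the usual Linux locations /proc/sys/vm/<name> and
/sys/class/bdi/<device>/max_ratio; the function name read_vm_settings
and its two root parameters are hypothetical and exist only so the
sketch can be pointed at a copied directory tree.

```python
from pathlib import Path

# Tunables mentioned in this message (file names under /proc/sys/vm).
VM_KEYS = ("swappiness", "dirty_ratio", "dirty_background_ratio",
           "vfs_cache_pressure")


def read_vm_settings(proc_root="/proc", sys_root="/sys"):
    """Return {name: int} for the vm.* sysctls and per-device bdi max_ratio."""
    settings = {}

    vm_dir = Path(proc_root) / "sys" / "vm"
    for key in VM_KEYS:
        path = vm_dir / key
        if path.is_file():  # skip tunables this kernel does not expose
            settings["vm." + key] = int(path.read_text().strip())

    bdi_dir = Path(sys_root) / "class" / "bdi"
    if bdi_dir.is_dir():
        for dev in sorted(bdi_dir.iterdir()):
            ratio = dev / "max_ratio"
            if ratio.is_file():
                settings["bdi." + dev.name + ".max_ratio"] = int(
                    ratio.read_text().strip())

    return settings
```

Calling read_vm_settings() with no arguments reads the live values;
printing the returned dictionary on each OSD host gives a quick way to
spot a host whose settings differ.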

Unfortunately there is no slab info from the "exhausted" state due to
the sporadic nature of this bug; I will try to capture it next time.

slabtop (normal state):
 Active / Total Objects (% used)    : 8675843 / 8965833 (96.8%)
 Active / Total Slabs (% used)      : 224858 / 224858 (100.0%)
 Active / Total Caches (% used)     : 86 / 132 (65.2%)
 Active / Total Size (% used)       : 1152171.37K / 1253116.37K (91.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.14K / 15.75K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
6890130 6889185  99%    0.10K 176670       39    706680K buffer_head
751232 721707  96%    0.06K  11738       64     46952K kmalloc-64
251636 226228  89%    0.55K   8987       28    143792K radix_tree_node
121696  45710  37%    0.25K   3803       32     30424K kmalloc-256
113022  80618  71%    0.19K   2691       42     21528K dentry
112672  35160  31%    0.50K   3521       32     56336K kmalloc-512
 73136  72800  99%    0.07K   1306       56      5224K Acpi-ParseExt
 61696  58644  95%    0.02K    241      256       964K kmalloc-16
 54348  36649  67%    0.38K   1294       42     20704K ip6_dst_cache
 53136  51787  97%    0.11K   1476       36      5904K sysfs_dir_cache
 51200  50724  99%    0.03K    400      128      1600K kmalloc-32
 49120  46105  93%    1.00K   1535       32     49120K xfs_inode
 30702  30702 100%    0.04K    301      102      1204K Acpi-Namespace
 28224  25742  91%    0.12K    882       32      3528K kmalloc-128
 28028  22691  80%    0.18K    637       44      5096K vm_area_struct
 28008  28008 100%    0.22K    778       36      6224K xfs_ili
 18944  18944 100%    0.01K     37      512       148K kmalloc-8
 16576  15154  91%    0.06K    259       64      1036K anon_vma
 16475  14200  86%    0.16K    659       25      2636K sigqueue
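
For comparing this snapshot with one taken in the "exhausted" state, a
small Python sketch that parses slabtop rows like the ones above and
ranks caches by size may help. It assumes the eight-column row layout
shown here, with sizes suffixed by "K"; the function names are
hypothetical.

```python
def parse_size_k(text):
    """Convert a slabtop size such as '706680K' or '0.10K' to a float (K)."""
    return float(text.rstrip("K"))


def parse_slabtop_rows(text):
    """Return one dict per cache row; summary and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 8 columns and start with an object count.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        rows.append({
            "objs": int(fields[0]),
            "active": int(fields[1]),
            "use_pct": int(fields[2].rstrip("%")),
            "obj_size_k": parse_size_k(fields[3]),
            "slabs": int(fields[4]),
            "obj_per_slab": int(fields[5]),
            "cache_size_k": parse_size_k(fields[6]),
            "name": fields[7],
        })
    return rows


def largest_caches(rows, n=3):
    """Return the n rows with the largest CACHE SIZE, biggest first."""
    return sorted(rows, key=lambda r: r["cache_size_k"], reverse=True)[:n]
```

On the table above this puts buffer_head first, at 706680K. Running
the same functions on a capture from the bad state would show which
caches grew.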

zoneinfo (normal state, attached)

[-- Attachment #2: zoneinfo --]
[-- Type: application/octet-stream, Size: 15098 bytes --]

Node 0, zone      DMA
  pages free     3973
        min      5
        low      6
        high     7
        scanned  0
        spanned  4095
        present  3994
        managed  3973
    nr_free_pages 3973
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    0
    nr_file_pages 0
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   0
    nr_written   0
    numa_hit     0
    numa_miss    0
    numa_foreign 0
    numa_interleave 0
    numa_local   0
    numa_other   0
    nr_anon_transparent_hugepages 0
    nr_free_cma  0
        protection: (0, 1914, 32121, 32121)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 2
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 3
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 4
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 5
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 6
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 7
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 8
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 9
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 10
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 11
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 12
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 13
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 14
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 15
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 16
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 17
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 18
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 19
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 20
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 21
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 22
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 23
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
  all_unreclaimable: 1
  start_pfn:         1
  inactive_ratio:    1
Node 0, zone    DMA32
  pages free     32223
        min      669
        low      836
        high     1003
        scanned  0
        spanned  1044480
        present  511926
        managed  490239
    nr_free_pages 32223
    nr_inactive_anon 277
    nr_active_anon 45533
    nr_inactive_file 227698
    nr_active_file 122112
    nr_unevictable 4760
    nr_mlock     4760
    nr_anon_pages 49781
    nr_mapped    133
    nr_file_pages 350087
    nr_dirty     160
    nr_writeback 0
    nr_slab_reclaimable 20418
    nr_slab_unreclaimable 30228
    nr_page_table_pages 190
    nr_kernel_stack 436
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 2
    nr_vmscan_immediate_reclaim 3499
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     277
    nr_dirtied   609807631
    nr_written   609734467
    numa_hit     6979761185
    numa_miss    3941324201
    numa_foreign 0
    numa_interleave 0
    numa_local   6979751851
    numa_other   3941333535
    nr_anon_transparent_hugepages 1
    nr_free_cma  0
        protection: (0, 0, 30206, 30206)
  pagesets
    cpu: 0
              count: 12
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 1
              count: 8
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 2
              count: 60
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 3
              count: 45
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 4
              count: 12
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 5
              count: 3
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 6
              count: 49
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 7
              count: 28
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 8
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 9
              count: 5
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 10
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 11
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 12
              count: 19
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 13
              count: 1
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 14
              count: 12
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 15
              count: 162
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 16
              count: 14
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 17
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 18
              count: 3
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 19
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 20
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 21
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 22
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
    cpu: 23
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 50
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    3
Node 0, zone   Normal
  pages free     32960
        min      10568
        low      13210
        high     15852
        scanned  0
        spanned  7864320
        present  7864320
        managed  7732828
    nr_free_pages 32960
    nr_inactive_anon 11191
    nr_active_anon 3036913
    nr_inactive_file 3223885
    nr_active_file 1127966
    nr_unevictable 4086
    nr_mlock     4086
    nr_anon_pages 2363745
    nr_mapped    34191
    nr_file_pages 4358872
    nr_dirty     2926
    nr_writeback 0
    nr_slab_reclaimable 82623
    nr_slab_unreclaimable 24026
    nr_page_table_pages 12611
    nr_kernel_stack 1842
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 59
    nr_vmscan_immediate_reclaim 29602
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     6348
    nr_dirtied   8347305401
    nr_written   8343222456
    numa_hit     49594613817
    numa_miss    635457096
    numa_foreign 391251876
    numa_interleave 20063
    numa_local   49594490600
    numa_other   635580313
    nr_anon_transparent_hugepages 1331
    nr_free_cma  0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 58
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 1
              count: 161
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 2
              count: 159
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 3
              count: 170
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 4
              count: 159
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 5
              count: 78
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 6
              count: 64
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 7
              count: 151
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 8
              count: 182
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 9
              count: 173
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 10
              count: 164
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 11
              count: 165
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 12
              count: 176
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 13
              count: 156
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 14
              count: 157
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 15
              count: 135
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 16
              count: 158
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 17
              count: 172
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 18
              count: 167
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 19
              count: 171
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 20
              count: 169
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 21
              count: 157
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 22
              count: 177
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 23
              count: 161
              high:  186
              batch: 31
  vm stats threshold: 90
  all_unreclaimable: 0
  start_pfn:         1048576
  inactive_ratio:    17
Node 1, zone   Normal
  pages free     14880
        min      11284
        low      14105
        high     16926
        scanned  0
        spanned  8388608
        present  8388608
        managed  8257056
    nr_free_pages 14880
    nr_inactive_anon 13140
    nr_active_anon 2569269
    nr_inactive_file 3715797
    nr_active_file 1659970
    nr_unevictable 15464
    nr_mlock     15464
    nr_anon_pages 1310698
    nr_mapped    45301
    nr_file_pages 5387102
    nr_dirty     3551
    nr_writeback 0
    nr_slab_reclaimable 135572
    nr_slab_unreclaimable 24093
    nr_page_table_pages 6677
    nr_kernel_stack 775
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 57854
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     10317
    nr_dirtied   13325911763
    nr_written   13320630581
    numa_hit     43510008565
    numa_miss    391251876
    numa_foreign 4576781297
    numa_interleave 19867
    numa_local   43509973410
    numa_other   391287031
    nr_anon_transparent_hugepages 2492
    nr_free_cma  0
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 155
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 1
              count: 173
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 2
              count: 104
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 3
              count: 168
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 4
              count: 158
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 5
              count: 169
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 6
              count: 53
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 7
              count: 81
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 8
              count: 63
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 9
              count: 168
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 10
              count: 46
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 11
              count: 28
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 12
              count: 161
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 13
              count: 177
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 14
              count: 155
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 15
              count: 181
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 16
              count: 164
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 17
              count: 185
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 18
              count: 69
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 19
              count: 75
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 20
              count: 151
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 21
              count: 91
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 22
              count: 51
              high:  186
              batch: 31
  vm stats threshold: 90
    cpu: 23
              count: 56
              high:  186
              batch: 31
  vm stats threshold: 90
  all_unreclaimable: 0
  start_pfn:         8912896
  inactive_ratio:    17


Thread overview: 36+ messages
2014-11-15 11:48 isolate_freepages_block and excessive CPU usage by OSD process Andrey Korolyov
2014-11-15 16:32 ` Vlastimil Babka
2014-11-15 17:10   ` Andrey Korolyov
2014-11-15 18:45     ` Vlastimil Babka
2014-11-15 18:52       ` Andrey Korolyov
     [not found] <CABYiri-do2YdfBx=r+u1kwXkEwN4v+yeRSHB-ODXo4gMFgW-Fg.mail.gmail.com>
2014-11-19  1:21 ` Christian Marie
2014-11-19 18:03   ` Andrey Korolyov
2014-11-19 21:20     ` Christian Marie
2014-11-19 23:10       ` Vlastimil Babka
2014-11-19 23:49         ` Andrey Korolyov
2014-11-20  3:30         ` Christian Marie
2014-11-21  2:35         ` Christian Marie
2014-11-23  9:33           ` Christian Marie
2014-11-24 21:48             ` Andrey Korolyov
2014-11-28  8:03               ` Joonsoo Kim
2014-11-28  9:26                 ` Vlastimil Babka
2014-12-01  8:31                   ` Joonsoo Kim
2014-12-02  1:47                     ` Christian Marie
2014-12-02  4:53                       ` Joonsoo Kim
2014-12-02  5:06                         ` Christian Marie
2014-12-03  4:04                           ` Christian Marie
2014-12-03  8:05                             ` Joonsoo Kim
2014-12-04 23:30                             ` Vlastimil Babka
2014-12-05  5:50                               ` Christian Marie
2014-12-03  7:57                           ` Joonsoo Kim
2014-12-04  7:30                             ` Christian Marie
2014-12-04  7:51                               ` Christian Marie
2014-12-05  1:07                               ` Joonsoo Kim
2014-12-05  5:55                                 ` Christian Marie
2014-12-08  7:19                                   ` Joonsoo Kim
2014-12-10 15:06                                 ` Vlastimil Babka
2014-12-11  3:08                                   ` Joonsoo Kim
2014-12-02 15:46                         ` Vlastimil Babka
2014-12-03  7:49                           ` Joonsoo Kim
2014-12-03 12:43                             ` Vlastimil Babka
2014-12-04  6:53                               ` Joonsoo Kim
