From: CAI Qian <caiqian@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: linux-mm <linux-mm@kvack.org>, Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: known oom issues on numa in -mm tree?
Date: Fri, 28 Jan 2011 09:33:48 -0500 (EST)
Message-ID: <1939528112.209753.1296225228879.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
In-Reply-To: <alpine.DEB.2.00.1101280227440.28081@chino.kir.corp.google.com>
----- Original Message -----
> On Fri, 28 Jan 2011, CAI Qian wrote:
>
> > I can still reproduce this similar failure on both AMD and Intel NUMA
> > systems using the latest linus tree with the commit you mentioned.
> > Unfortunately, I can't get a clear sysrq/console output of it but only
> > a part of it (screenshot attached).
> >
> > It is at least very easy for me to reproduce it by running the LTP oom01
> > test on both Magny-Cours and Nehalem-EX NUMA systems.
> >
>
> Are you sure this is the same issue? The picture you provided doesn't
> show the top of the stack so I don't know what it's doing, but the
> original report had this:
>
> oom02 R running task 0 2023 1969 0x00000088
> 0000000000000282 ffff88041d219df0 ffff88041fbf8ef0 ffffffff81100800
> ffff880418ab5b18 0000000000000282 ffffffff8100c9ee ffff880418ab5ba8
> 0000000087654321 0000000000000000 ffff880000000000 0000000000000001
> Call Trace:
> [<ffffffff81100800>] ? drain_local_pages+0x0/0x20
> [<ffffffff8100c9ee>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff81097ea6>] ? smp_call_function_many+0x1b6/0x210
> [<ffffffff81097e82>] ? smp_call_function_many+0x192/0x210
> [<ffffffff81100800>] ? drain_local_pages+0x0/0x20
> [<ffffffff81097f22>] ? smp_call_function+0x22/0x30
> [<ffffffff81068184>] ? on_each_cpu+0x24/0x50
> [<ffffffff810fe68c>] ? drain_all_pages+0x1c/0x20
> [<ffffffff81100d04>] ? __alloc_pages_nodemask+0x4e4/0x840
> [<ffffffff81138e09>] ? alloc_page_vma+0x89/0x140
> [<ffffffff8111c481>] ? handle_mm_fault+0x871/0xd80
> [<ffffffff814a4ecd>] ? schedule+0x3fd/0x980
> [<ffffffff8100c9ee>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff8100c9ee>] ? apic_timer_interrupt+0xe/0x20
> [<ffffffff814aadd3>] ? do_page_fault+0x143/0x4b0
> [<ffffffff8100a7b4>] ? __switch_to+0x194/0x320
> [<ffffffff814a4ecd>] ? schedule+0x3fd/0x980
> [<ffffffff814a7ad5>] ? page_fault+0x25/0x30
>
> and the reported symptom was kswapd running excessively. I'm pretty sure
> I fixed that with 2ff754fa8f41 (mm: clear pages_scanned only if draining a
> pcp adds pages to the buddy allocator).
>
> Absent the dmesg, it's going to be very difficult to diagnose an issue
> that isn't a panic.
I have finally been able to capture sysrq-t output while oom01 was allocating
memory and used/free swap remained unchanged. All kswapd threads were stopped
at zone_reclaim.
# free -m
             total       used       free     shared    buffers     cached
Mem:         48392      48050        341          0         26         31
-/+ buffers/cache:      47993        399
Swap:        50447      29560      20887
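To check the "swap remained unchanged" observation without re-running free by
hand, something like the following could be used. It is only a minimal sketch:
it assumes the SwapTotal and SwapFree fields of /proc/meminfo, and the function
names (parse_meminfo, swap_used_kb, watch_swap) are made up for illustration.

```python
import time


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap in use, derived from SwapTotal and SwapFree (both in kB)."""
    return fields["SwapTotal"] - fields["SwapFree"]


def watch_swap(samples=5, interval=1.0, path="/proc/meminfo"):
    """Sample swap usage a few times and return the readings in kB."""
    readings = []
    for _ in range(samples):
        with open(path) as f:
            readings.append(swap_used_kb(parse_meminfo(f.read())))
        time.sleep(interval)
    return readings


if __name__ == "__main__":
    readings = watch_swap()
    print("swap used (kB):", readings)
    print("unchanged" if len(set(readings)) == 1 else "changing")
```

Running it while oom01 is still allocating should show whether swap usage is
moving at all between samples.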
oom01 R running task 0 14534 14249 0x00000088
ffff88063fc17d58 0000000000000086 ffff88063fc17d20 000000020000000e
0000000000014d40 ffffea0013cc72d8 ffffea0013cc7188 ffffea0013cc7188
0000000000000297 ffff88063fc17d20 ffff8806371a7788 0000000000000297
Call Trace:
[<ffffffff811043be>] ? release_pages+0x24e/0x260
[<ffffffff81223591>] ? cpumask_any_but+0x31/0x50
[<ffffffff81042552>] ? flush_tlb_mm+0x42/0xa0
[<ffffffff811048c6>] ? __pagevec_release+0x26/0x40
[<ffffffff8110890f>] ? move_active_pages_to_lru+0x19f/0x1d0
[<ffffffff8100c96e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8100c96e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8100c96e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8100c96e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff810fc51f>] ? zone_watermark_ok+0x1f/0x30
[<ffffffff81139efc>] ? compaction_suitable+0x3c/0xc0
[<ffffffff81109647>] ? shrink_zone+0x1a7/0x520
[<ffffffff811095cd>] ? shrink_zone+0x12d/0x520
[<ffffffff8108cc43>] ? ktime_get_ts+0xb3/0xf0
[<ffffffff81109a7f>] ? do_try_to_free_pages+0xbf/0x4a0
[<ffffffff8110a0d2>] ? try_to_free_pages+0x92/0x130
[<ffffffff81100ccf>] ? __alloc_pages_nodemask+0x45f/0x850
[<ffffffff811395d3>] ? alloc_pages_vma+0x93/0x150
[<ffffffff81148bda>] ? do_huge_pmd_anonymous_page+0x13a/0x330
[<ffffffff8111e79d>] ? handle_mm_fault+0x24d/0x320
[<ffffffff814b21a3>] ? do_page_fault+0x143/0x4b0
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff814aef15>] ? page_fault+0x25/0x30
kswapd0 S ffff88022e28d000 0 275 2 0x00000000
ffff88022e2edda0 0000000000000046 ffffffff81e642c0 ffffffff00000000
0000000000014d40 ffff88022e28ca70 ffff88022e28d000 ffff88022e2edfd8
ffff88022e28d008 0000000000014d40 ffff88022e2ec010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd1 S ffff88022e28da30 0 276 2 0x00000000
ffff88022e2efda0 0000000000000046 ffff880337174000 ffff880300000000
0000000000014d40 ffff88022e28d4a0 ffff88022e28da30 ffff88022e2effd8
ffff88022e28da38 0000000000014d40 ffff88022e2ee010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd2 S ffff88022e282690 0 277 2 0x00000000
ffff88022e2d9da0 0000000000000046 ffff88052f904000 ffff880500000000
0000000000014d40 ffff88022e282100 ffff88022e282690 ffff88022e2d9fd8
ffff88022e282698 0000000000014d40 ffff88022e2d8010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd3 S ffff88022e2830c0 0 278 2 0x00000000
ffff88022e2dbda0 0000000000000046 ffff880637124000 ffff880600000000
0000000000014d40 ffff88022e282b30 ffff88022e2830c0 ffff88022e2dbfd8
ffff88022e2830c8 0000000000014d40 ffff88022e2da010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd4 S ffff88022e283af0 0 279 2 0x00000000
ffff88022e301da0 0000000000000046 ffff88082f908000 ffff880800000000
0000000000014d40 ffff88022e283560 ffff88022e283af0 ffff88022e301fd8
ffff88022e283af8 0000000000014d40 ffff88022e300010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd5 S ffff88022e278650 0 280 2 0x00000000
ffff88022e303da0 0000000000000046 ffff880937134000 ffff880900000000
0000000000014d40 ffff88022e2780c0 ffff88022e278650 ffff88022e303fd8
ffff88022e278658 0000000000014d40 ffff88022e302010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd6 S ffff88022e279080 0 281 2 0x00000000
ffff88022e2c5da0 0000000000000046 ffff880a37134000 ffff880a00000000
0000000000014d40 ffff88022e278af0 ffff88022e279080 ffff88022e2c5fd8
ffff88022e279088 0000000000014d40 ffff88022e2c4010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kswapd7 S ffff88022e279ab0 0 282 2 0x00000000
ffff88022e289da0 0000000000000046 ffff880c2f934000 ffff880c00000000
0000000000014d40 ffff88022e279520 ffff88022e279ab0 ffff88022e289fd8
ffff88022e279ab8 0000000000014d40 ffff88022e288010 0000000000014d40
Call Trace:
[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400
[<ffffffff8110b196>] kswapd+0xb16/0xc10
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff814ac0ce>] ? schedule+0x44e/0xa10
[<ffffffff81082fa0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110a680>] ? kswapd+0x0/0xc10
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
migration/25 R running task 0 90 2 0x00000000
ffff88022f1d9de0 0000000000000046 ffff88083fc54d40 0000000000000000
0000000000014d40 ffff88022f1d2ab0 ffff88022f1d3040 ffff88022f1d9fd8
ffff88022f1d3048 0000000000014d40 ffff88022f1d8010 0000000000014d40
Call Trace:
[<ffffffff810b43d0>] ? stop_machine_cpu_stop+0x0/0xe0
[<ffffffff810b433d>] cpu_stopper_thread+0x13d/0x1d0
[<ffffffff810b4200>] ? cpu_stopper_thread+0x0/0x1d0
[<ffffffff810b4200>] ? cpu_stopper_thread+0x0/0x1d0
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
kworker/25:0 S ffff88022f1d2610 0 91 2 0x00000000
ffff88022f1fbe50 0000000000000046 ffff88063fc112c8 0000000000000082
0000000000014d40 ffff88022f1d2080 ffff88022f1d2610 ffff88022f1fbfd8
ffff88022f1d2618 0000000000014d40 ffff88022f1fa010 0000000000014d40
Call Trace:
[<ffffffff8107e301>] worker_thread+0x261/0x3c0
[<ffffffff8107e0a0>] ? worker_thread+0x0/0x3c0
[<ffffffff81082916>] kthread+0x96/0xa0
[<ffffffff8100cdc4>] kernel_thread_helper+0x4/0x10
[<ffffffff81082880>] ? kthread+0x0/0xa0
[<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
...
runnable tasks:
            task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
----------------------------------------------------------------------------------------------------------
    migration/25    90      2150.833956        29     0      2150.833956         0.000920         0.000000 /
R          oom01 14534    582482.371064     96026   120    582482.371064    519502.918710     91644.125770 /
    kworker/25:2 14592    582470.371064      4811   120    582470.371064       108.496370    207368.125892 /
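
Since the full sysrq-t dump is long, a rough way to summarize it is to take the
first call-trace frame of every task. The sketch below is only an illustration
under assumptions: the regular expressions are guesses fitted to the trace
layout shown in this message, and first_frames/summarize are hypothetical
helper names. Note that a "?" in front of a symbol means the kernel found the
address on the stack but could not verify it as part of the call chain, so the
sketch records that as well.

```python
import re
import sys
from collections import Counter

# Task header, e.g. "kswapd0 S ffff88022e28d000 0 275 2 0x00000000"
TASK_RE = re.compile(r"^(\S+)\s+([A-Za-z])\s+.*\b0x[0-9a-f]{8}$")
# Frame, e.g. "[<ffffffff8110a600>] ? zone_reclaim+0x380/0x400"
FRAME_RE = re.compile(r"^\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")


def first_frames(dump):
    """Map task name -> (state, first frame symbol, frame is reliable)."""
    tasks = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        m = TASK_RE.match(line)
        if m:
            current = m.group(1)
            tasks[current] = (m.group(2), None, False)
            continue
        m = FRAME_RE.match(line)
        if m and current is not None and tasks[current][1] is None:
            # group(1) is the optional "? " marker for an unverified frame
            tasks[current] = (tasks[current][0], m.group(2), m.group(1) is None)
    return tasks


def summarize(tasks, prefix=""):
    """Count first-frame symbols for tasks whose name starts with prefix."""
    return Counter(
        frame for name, (_, frame, _) in tasks.items()
        if name.startswith(prefix) and frame is not None
    )


if __name__ == "__main__":
    tasks = first_frames(sys.stdin.read())
    for symbol, count in summarize(tasks, prefix="kswapd").most_common():
        print(count, symbol)
```

Feeding it the saved console log on stdin prints how many kswapd threads share
each first frame.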