From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: sergey.senozhatsky@gmail.com, pmladek@suse.com, jack@suse.com,
tj@kernel.org, kyle@kernel.org, davej@codemonkey.org.uk,
calvinowens@fb.com, akpm@linux-foundation.org,
linux-mm@kvack.org
Subject: Re: How to avoid printk() delay caused by cond_resched() ?
Date: Wed, 2 Mar 2016 23:04:19 +0900
Message-ID: <20160302140419.GA614@swordfish>
In-Reply-To: <201603022101.CAH73907.OVOOMFHFFtQJSL@I-love.SAKURA.ne.jp>
Hello Tetsuo,
On (03/02/16 21:01), Tetsuo Handa wrote:
> I'm trying to dump information of all threads which might be relevant
> to stalling inside memory allocator. But it seems to me that since this
> patch changed to allow calling cond_resched() from printk() if it is
> safe to do so, it is now possible that the thread which invoked the OOM
> killer can sleep for minutes with the oom_lock mutex held when my dump is
> in progress. I want to release oom_lock mutex as soon as possible so
> that other threads can call out_of_memory() to get TIF_MEMDIE and exit
> their allocations.
correct, otherwise chances are that you will trigger a softlockup or an RCU
stall -- console_unlock() does not stop as long as the printk logbuf has
data in it (and other CPUs can keep appending data concurrently). this fact
was the motivation for adding cond_resched() to console_unlock() in the
first place, and for the later patch that permits some of the printk()
callers to do the same.
> So, how can I prevent printk() triggered by out_of_memory() from sleeping
> for minutes with oom_lock mutex held? Guard it with preempt_disable() /
> preempt_enable() ? Guard it with rcu_read_lock() / rcu_read_unlock() ?
correct, both preempt_disable() and rcu_read_lock() will certainly forbid
cond_resched() in printk()->console_unlock(), but they will also increase
the softlockup and/or RCU stall probability, depending on how many threads
you have and how many stack traces you need to print. can we instead play
dirty tricks with the frequency of cond_resched()? for example, do
cond_resched() not after every printed message, but only once per
LOG_LINE_MAX printed chars. something like the patch below:
 kernel/printk/printk.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9917f69..5467b1e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2209,6 +2209,7 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool do_cond_resched, retry;
+	int num_printed;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2229,6 +2230,7 @@ void console_unlock(void)
 	console_may_schedule = 0;
 
 again:
+	num_printed = 0;
 	/*
 	 * We released the console_sem lock, so we need to recheck if
 	 * cpu is online and (if not) is there at least one CON_ANYTIME
@@ -2310,8 +2312,11 @@ skip:
 		start_critical_timings();
 		local_irq_restore(flags);
 
-		if (do_cond_resched)
+		num_printed += len + ext_len;
+		if (do_cond_resched && num_printed > LOG_LINE_MAX) {
 			cond_resched();
+			num_printed = 0;
+		}
 	}
 	console_locked = 0;
---
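(and just for completeness, the guard you are asking about would be
something along these lines -- an untested sketch; if I read the code right,
disabling preemption makes console_trylock() clear console_may_schedule, so
console_unlock() will not reschedule in the middle of your dump.
dump_memalloc_info() is only a placeholder for whatever loop prints your
per-task data:)

	/*
	 * untested sketch: with preemption disabled (rcu_read_lock() would
	 * have the same effect), console_may_schedule stays 0, so the
	 * console_unlock() invoked from printk() will not cond_resched()
	 * while oom_lock is still held.
	 */
	preempt_disable();
	dump_memalloc_info();	/* placeholder for the actual dump loop */
	preempt_enable();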
several questions:

do you use some sort of oom-kill reproducer? ... the problem with printk
is that you never know which one of the CPUs will end up doing the printing
job; any guess there is highly unreliable. but, assuming that you have
executed that oom-kill reproducer many times before the patch in question,
do you have any rough numbers to compare how many seconds it used to take
to dump_stack() all of the tasks?
hm, a bit puzzled -- how much does OOM add to the cond_resched() latency
here? is it because at least some of the tasks we switch to can do direct
reclaim, so we get bigger "pauses" between call_console_drivers()
iterations? in that case, even after offloading the printk job to system_wq
we will still suffer, in some sense. the oom mutex is going to be released
sooner, that's true, but console_unlock() will still cond_resched() after
every call_console_drivers(), so the logs can potentially be printed with a
huge delay. we probably cond_resched() in console_unlock() too often.
I haven't noticed anything like that during my tests.
-ss
> ----------
> [ 460.893958] tgid=11161 invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), order=0, oom_score_adj=1000
> [ 463.892897] tgid=11161 cpuset=/ mems_allowed=0
> [ 463.894724] CPU: 1 PID: 12346 Comm: tgid=11161 Not tainted 4.5.0-rc6-next-20160302+ #318
> [ 463.897026] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> [ 463.899841] 0000000000000286 00000000733ef955 ffff8800778a79b0 ffffffff813a2ded
> [ 463.902164] 0000000000000000 ffff8800778a7be0 ffff8800778a7a50 ffffffff811c24d0
> [ 463.904474] 0000000000000206 ffffffff81810c30 ffff8800778a79f0 ffffffff810bf839
> [ 463.906801] Call Trace:
> [ 463.908101] [<ffffffff813a2ded>] dump_stack+0x85/0xc8
> [ 463.909921] [<ffffffff811c24d0>] dump_header+0x5b/0x3b0
> [ 463.911759] [<ffffffff810bf839>] ? trace_hardirqs_on_caller+0xf9/0x1c0
> [ 463.913833] [<ffffffff810bf90d>] ? trace_hardirqs_on+0xd/0x10
> [ 463.915751] [<ffffffff81146f2d>] oom_kill_process+0x37d/0x570
> [ 463.918024] [<ffffffff81147366>] out_of_memory+0x1f6/0x5a0
> [ 463.919890] [<ffffffff81147424>] ? out_of_memory+0x2b4/0x5a0
> [ 463.921784] [<ffffffff8114d041>] __alloc_pages_nodemask+0xc91/0xeb0
> [ 463.923788] [<ffffffff81196726>] alloc_pages_current+0x96/0x1b0
> [ 463.925729] [<ffffffff8114188d>] __page_cache_alloc+0x12d/0x160
> [ 463.927682] [<ffffffff8114537a>] filemap_fault+0x48a/0x6a0
> [ 463.929547] [<ffffffff81145247>] ? filemap_fault+0x357/0x6a0
> [ 463.931409] [<ffffffff812b9e09>] xfs_filemap_fault+0x39/0x60
> [ 463.933255] [<ffffffff8116f19d>] __do_fault+0x6d/0x150
> [ 463.934974] [<ffffffff81175b1d>] handle_mm_fault+0xecd/0x1800
> [ 463.936791] [<ffffffff81174ca3>] ? handle_mm_fault+0x53/0x1800
> [ 463.938617] [<ffffffff8105a796>] __do_page_fault+0x1e6/0x520
> [ 463.940380] [<ffffffff8105ab00>] do_page_fault+0x30/0x80
> [ 463.942074] [<ffffffff817113e8>] page_fault+0x28/0x30
> [ 463.943750] Mem-Info:
> (...snipped...)
> [ 554.754959] MemAlloc: tgid=11161(12346) flags=0x400040 switches=865 seq=169 gfp=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) order=0 delay=81602
> [ 554.754962] tgid=11161 R running task 0 12346 11056 0x00000080
> [ 554.754965] ffff8800778a7838 ffff880034510000 ffff8800778b0000 ffff8800778a8000
> [ 554.754966] 0000000000000091 ffffffff82a86fbc 0000000000000004 0000000000000000
> [ 554.754967] ffff8800778a7850 ffffffff8170a5dd 0000000000000296 ffff8800778a7860
> [ 554.754968] Call Trace:
> [ 554.754974] [<ffffffff8170a5dd>] preempt_schedule_common+0x1f/0x42
> [ 554.754975] [<ffffffff8170a617>] _cond_resched+0x17/0x20
> [ 554.754978] [<ffffffff810d17b9>] console_unlock+0x509/0x5c0
> [ 554.754979] [<ffffffff810d1b93>] vprintk_emit+0x323/0x540
> [ 554.754981] [<ffffffff810d1f0a>] vprintk_default+0x1a/0x20
> [ 554.754983] [<ffffffff8114069e>] printk+0x58/0x6f
> [ 554.754986] [<ffffffff813aaede>] show_mem+0x1e/0xe0
> [ 554.754988] [<ffffffff811c24ec>] dump_header+0x77/0x3b0
> [ 554.754990] [<ffffffff810bf839>] ? trace_hardirqs_on_caller+0xf9/0x1c0
> [ 554.754991] [<ffffffff810bf90d>] ? trace_hardirqs_on+0xd/0x10
> [ 554.754993] [<ffffffff81146f2d>] oom_kill_process+0x37d/0x570
> [ 554.754995] [<ffffffff81147366>] out_of_memory+0x1f6/0x5a0
> [ 554.754996] [<ffffffff81147424>] ? out_of_memory+0x2b4/0x5a0
> [ 554.754997] [<ffffffff8114d041>] __alloc_pages_nodemask+0xc91/0xeb0
> [ 554.755000] [<ffffffff81196726>] alloc_pages_current+0x96/0x1b0
> [ 554.755001] [<ffffffff8114188d>] __page_cache_alloc+0x12d/0x160
> [ 554.755003] [<ffffffff8114537a>] filemap_fault+0x48a/0x6a0
> [ 554.755004] [<ffffffff81145247>] ? filemap_fault+0x357/0x6a0
> [ 554.755006] [<ffffffff812b9e09>] xfs_filemap_fault+0x39/0x60
> [ 554.755007] [<ffffffff8116f19d>] __do_fault+0x6d/0x150
> [ 554.755009] [<ffffffff81175b1d>] handle_mm_fault+0xecd/0x1800
> [ 554.755010] [<ffffffff81174ca3>] ? handle_mm_fault+0x53/0x1800
> [ 554.755012] [<ffffffff8105a796>] __do_page_fault+0x1e6/0x520
> [ 554.755013] [<ffffffff8105ab00>] do_page_fault+0x30/0x80
> [ 554.755015] [<ffffffff817113e8>] page_fault+0x28/0x30
> ----------
>
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> CONFIG_PREEMPT_COUNT=y
>