* continuous oom caused system deadlock
From: caiqian @ 2010-12-04 7:30 UTC (permalink / raw)
To: linux-mm
Running this LTP test a few times on the mmotm tree caused the system to hang hard:
http://people.redhat.com/qcai/oom01.c
I tried to bisect, but only found that the problem was also present in the tree from a few months back.
SysRq-W output indicated that kswapd0 might be stuck:
[ 373.943002] kswapd0 R running task 0 34 2 0x00000000
[ 373.943002] ffff88022abdbc80 ffffffff8146e4ce ffff88022abdbcb0 ffffffff81232698
[ 373.943002] 0000000000000001 ffffffff81a248f0 0000000000000000 0000000000000000
[ 373.943002] ffff88022abdbcc0 ffffffff8112d59d ffff88022abdbcd0 ffffffff8146e4ce
[ 373.943002] Call Trace:
[ 373.943002] [<ffffffff8146e4ce>] ? _raw_spin_lock+0xe/0x10
[ 373.943002] [<ffffffff81232698>] ? __percpu_counter_sum+0x4d/0x63
[ 373.943002] [<ffffffff8112d59d>] ? get_nr_inodes_unused+0x15/0x23
[ 373.943002] [<ffffffff8146e4ce>] ? _raw_spin_lock+0xe/0x10
[ 373.943002] [<ffffffff8146ea0e>] ? common_interrupt+0xe/0x13
[ 373.943002] [<ffffffff810e2add>] ? balance_pgdat+0x29b/0x417
[ 373.943002] [<ffffffff810e2e83>] ? kswapd+0x22a/0x240
[ 373.943002] [<ffffffff8106af63>] ? autoremove_wake_function+0x0/0x39
[ 373.943002] [<ffffffff810e2c59>] ? kswapd+0x0/0x240
[ 373.943002] [<ffffffff8106aaae>] ? kthread+0x82/0x8a
[ 373.943002] [<ffffffff8100bae4>] ? kernel_thread_helper+0x4/0x10
[ 373.943002] [<ffffffff8106aa2c>] ? kthread+0x0/0x8a
[ 373.943002] [<ffffffff8100bae0>] ? kernel_thread_helper+0x0/0x10
full SysRq-W output:
[ 373.943002] Sched Debug Version: v0.09, 2.6.37-rc3+ #1
[ 373.943002] now at 381511.273166 msecs
[ 373.943002] .jiffies : 4295041238
[ 373.943002] .sysctl_sched_latency : 18.000000
[ 373.943002] .sysctl_sched_min_granularity : 2.250000
[ 373.943002] .sysctl_sched_wakeup_granularity : 3.000000
[ 373.943002] .sysctl_sched_child_runs_first : 0
[ 373.943002] .sysctl_sched_features : 31855
[ 373.943002] .sysctl_sched_tunable_scaling : 1 (logaritmic)
[ 373.943002]
[ 373.943002] cpu#0, 2826.236 MHz
[ 373.943002] .nr_running : 1
[ 373.943002] .load : 1024
[ 373.943002] .nr_switches : 69769
[ 373.943002] .nr_load_updates : 115459
[ 373.943002] .nr_uninterruptible : 0
[ 373.943002] .next_balance : 4295.041289
[ 373.943002] .curr->pid : 34
[ 373.943002] .clock : 373942.002254
[ 373.943002] .cpu_load[0] : 1024
[ 373.943002] .cpu_load[1] : 1024
[ 373.943002] .cpu_load[2] : 1024
[ 373.943002] .cpu_load[3] : 1024
[ 373.943002] .cpu_load[4] : 1024
[ 373.943002] .yld_count : 100
[ 373.943002] .sched_switch : 0
[ 373.943002] .sched_count : 82123
[ 373.943002] .sched_goidle : 26687
[ 373.943002] .avg_idle : 1000000
[ 373.943002] .ttwu_count : 30804
[ 373.943002] .ttwu_local : 8525
[ 373.943002] .bkl_count : 0
[ 373.943002]
[ 373.943002] cfs_rq[0]:/
[ 373.943002] .exec_clock : 107322.196661
[ 373.943002] .MIN_vruntime : 0.000001
[ 373.943002] .min_vruntime : 55990.920524
[ 373.943002] .max_vruntime : 0.000001
[ 373.943002] .spread : 0.000000
[ 373.943002] .spread0 : 0.000000
[ 373.943002] .nr_running : 1
[ 373.943002] .load : 1024
[ 373.943002] .nr_spread_over : 9
[ 373.943002] .shares : 0
[ 373.943002]
[ 373.943002] rt_rq[0]:/
[ 373.943002] .rt_nr_running : 0
[ 373.943002] .rt_throttled : 0
[ 373.943002] .rt_time : 0.000000
[ 373.943002] .rt_runtime : 1000.000000
[ 373.943002]
[ 373.943002] runnable tasks:
[ 373.943002] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[ 373.943002] ----------------------------------------------------------------------------------------------------------
[ 373.943002] R kswapd0 34 55990.920524 43568 120 55990.920524 38575.283576 287944.752314 /
[ 373.943002]
[ 373.943002] cpu#1, 2826.236 MHz
[ 373.943002] .nr_running : 2
[ 373.943002] .load : 2048
[ 373.943002] .nr_switches : 80939
[ 373.943002] .nr_load_updates : 141862
[ 373.943002] .nr_uninterruptible : 1
[ 373.943002] .next_balance : 4295.041423
[ 373.943002] .curr->pid : 925
[ 373.943002] .clock : 382530.001465
[ 373.943002] .cpu_load[0] : 2048
[ 373.943002] .cpu_load[1] : 1920
[ 373.943002] .cpu_load[2] : 1806
[ 373.943002] .cpu_load[3] : 1743
[ 373.943002] .cpu_load[4] : 1716
[ 373.943002] .yld_count : 127
[ 373.943002] .sched_switch : 0
[ 373.943002] .sched_count : 87429
[ 373.943002] .sched_goidle : 29877
[ 373.943002] .avg_idle : 1000000
[ 373.943002] .ttwu_count : 33588
[ 373.943002] .ttwu_local : 9295
[ 373.943002] .bkl_count : 0
[ 373.943002]
[ 373.943002] cfs_rq[1]:/
[ 373.943002] .exec_clock : 132931.075561
[ 373.943002] .MIN_vruntime : 66573.481283
[ 373.943002] .min_vruntime : 66573.481283
[ 373.943002] .max_vruntime : 66573.481283
[ 373.943002] .spread : 0.000000
[ 373.943002] .spread0 : 10582.560759
[ 373.943002] .nr_running : 2
[ 373.943002] .load : 2048
[ 373.943002] .nr_spread_over : 10
[ 373.943002] .shares : 0
[ 373.943002]
[ 373.943002] rt_rq[1]:/
[ 373.943002] .rt_nr_running : 0
[ 373.943002] .rt_throttled : 0
[ 373.943002] .rt_time : 0.000000
[ 373.943002] .rt_runtime : 850.000000
[ 373.943002]
[ 373.943002] runnable tasks:
[ 373.943002] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[ 373.943002] ----------------------------------------------------------------------------------------------------------
[ 373.943002] R rpcbind 925 75167.155023 3118 120 75167.155023 33682.358086 277604.838691 /
[ 373.943002] console-kit-dae 1328 66573.481283 716 120 66573.481283 2306.020280 277814.482610 /
[ 373.943002]
[ 373.943002] cpu#2, 2826.236 MHz
[ 373.943002] .nr_running : 1
[ 373.943002] .load : 1024
[ 373.943002] .nr_switches : 25657
[ 373.943002] .nr_load_updates : 133265
[ 373.943002] .nr_uninterruptible : 6
[ 373.943002] .next_balance : 4295.041381
[ 373.943002] .curr->pid : 1473
[ 373.943002] .clock : 382530.001959
[ 373.943002] .cpu_load[0] : 1024
[ 373.943002] .cpu_load[1] : 732
[ 373.943002] .cpu_load[2] : 703
[ 373.943002] .cpu_load[3] : 726
[ 373.943002] .cpu_load[4] : 777
[ 373.943002] .yld_count : 143
[ 373.943002] .sched_switch : 0
[ 373.943002] .sched_count : 33466
[ 373.943002] .sched_goidle : 5814
[ 373.943002] .avg_idle : 1000000
[ 373.943002] .ttwu_count : 9228
[ 373.943002] .ttwu_local : 6942
[ 373.943002] .bkl_count : 0
[ 373.943002]
[ 373.943002] cfs_rq[2]:/
[ 373.943002] .exec_clock : 125235.081389
[ 373.943002] .MIN_vruntime : 0.000001
[ 373.943002] .min_vruntime : 64653.378538
[ 373.943002] .max_vruntime : 0.000001
[ 373.943002] .spread : 0.000000
[ 373.943002] .spread0 : 8662.458014
[ 373.943002] .nr_running : 1
[ 373.943002] .load : 1024
[ 373.943002] .nr_spread_over : 28
[ 373.943002] .shares : 0
[ 373.943002]
[ 373.943002] rt_rq[2]:/
[ 373.943002] .rt_nr_running : 0
[ 373.943002] .rt_throttled : 0
[ 373.943002] .rt_time : 0.000000
[ 373.943002] .rt_runtime : 1000.000000
[ 373.943002]
[ 373.943002] runnable tasks:
[ 373.943002] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[ 373.943002] ----------------------------------------------------------------------------------------------------------
[ 373.943002] R oom01 1473 64653.378538 3405 120 64653.378538 44153.912865 3897.833338 /
[ 373.943002]
[ 373.943002] cpu#3, 2826.236 MHz
[ 373.943002] .nr_running : 2
[ 373.943002] .load : 2048
[ 373.943002] .nr_switches : 27316
[ 373.943002] .nr_load_updates : 137905
[ 373.943002] .nr_uninterruptible : 5
[ 373.943002] .next_balance : 4295.041253
[ 373.943002] .curr->pid : 1336
[ 373.943002] .clock : 382530.002311
[ 373.943002] .cpu_load[0] : 2048
[ 373.943002] .cpu_load[1] : 1980
[ 373.943002] .cpu_load[2] : 1820
[ 373.943002] .cpu_load[3] : 1754
[ 373.943002] .cpu_load[4] : 1790
[ 373.943002] .yld_count : 9
[ 373.943002] .sched_switch : 0
[ 373.943002] .sched_count : 36031
[ 373.943002] .sched_goidle : 6309
[ 373.943002] .avg_idle : 1000000
[ 373.943002] .ttwu_count : 9803
[ 373.943002] .ttwu_local : 7501
[ 373.943002] .bkl_count : 0
[ 373.943002]
[ 373.943002] cfs_rq[3]:/
[ 373.943002] .exec_clock : 131690.185382
[ 373.943002] .MIN_vruntime : 72546.296158
[ 373.943002] .min_vruntime : 72546.296158
[ 373.943002] .max_vruntime : 72546.296158
[ 373.943002] .spread : 0.000000
[ 373.943002] .spread0 : 16555.375634
[ 373.943002] .nr_running : 2
[ 373.943002] .load : 2048
[ 373.943002] .nr_spread_over : 4
[ 373.943002] .shares : 0
[ 373.943002]
[ 373.943002] rt_rq[3]:/
[ 373.943002] .rt_nr_running : 0
[ 373.943002] .rt_throttled : 0
[ 373.943002] .rt_time : 0.000000
[ 373.943002] .rt_runtime : 950.000000
[ 373.943002]
[ 373.943002] runnable tasks:
[ 373.943002] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[ 373.943002] ----------------------------------------------------------------------------------------------------------
[ 373.943002] irqbalance 908 72546.296158 5882 120 72546.296158 30782.122083 264942.728048 /
[ 373.943002] R bash 1336 81138.830657 744 120 81138.830657 10827.322162 278614.352123 /
[ 373.943002]
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: continuous oom caused system deadlock
From: CAI Qian @ 2010-12-08 18:11 UTC (permalink / raw)
To: linux-mm; +Cc: rientjes, kamezawa.hiroyu, kosaki.motohiro, akpm
Bisect indicated that this is the first bad commit,
commit 696d3cd5fb318c070dc757fe109e04e398138172
Author: David Rientjes <rientjes@google.com>
Date: Fri Jun 11 22:45:17 2010 +0200
__out_of_memory() only has a single caller, so fold it into
out_of_memory() and add a comment about locking for its call to
oom_kill_process().
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index cba18c0..26ae697 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -628,41 +628,6 @@ static void clear_system_oom(void)
 	spin_unlock(&zone_scan_lock);
 }
-
-/*
- * Must be called with tasklist_lock held for read.
- */
-static void __out_of_memory(gfp_t gfp_mask, int order, const nodemask_t *mask)
-{
-	struct task_struct *p;
-	unsigned long points;
-
-	if (sysctl_oom_kill_allocating_task)
-		if (!oom_kill_process(current, gfp_mask, order, 0, NULL,
-				"Out of memory (oom_kill_allocating_task)"))
-			return;
-retry:
-	/*
-	 * Rambo mode: Shoot down a process and hope it solves whatever
-	 * issues we may have.
-	 */
-	p = select_bad_process(&points, NULL, mask);
-
-	if (PTR_ERR(p) == -1UL)
-		return;
-
-	/* Found nothing?!?! Either we hang forever, or we panic. */
-	if (!p) {
-		dump_header(NULL, gfp_mask, order, NULL);
-		read_unlock(&tasklist_lock);
-		panic("Out of memory and no killable processes...\n");
-	}
-
-	if (oom_kill_process(p, gfp_mask, order, points, NULL,
-			"Out of memory"))
-		goto retry;
-}
-
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @zonelist: zonelist pointer
@@ -678,7 +643,9 @@ retry:
 void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 		int order, nodemask_t *nodemask)
 {
+	struct task_struct *p;
 	unsigned long freed = 0;
+	unsigned long points;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
@@ -703,10 +670,36 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	if (zonelist)
 		constraint = constrained_alloc(zonelist, gfp_mask, nodemask);
 	check_panic_on_oom(constraint, gfp_mask, order);
+
 	read_lock(&tasklist_lock);
-	__out_of_memory(gfp_mask, order,
+	if (sysctl_oom_kill_allocating_task) {
+		/*
+		 * oom_kill_process() needs tasklist_lock held. If it returns
+		 * non-zero, current could not be killed so we must fallback to
+		 * the tasklist scan.
+		 */
+		if (!oom_kill_process(current, gfp_mask, order, 0, NULL,
+				"Out of memory (oom_kill_allocating_task)"))
+			return;
+	}
+
+retry:
+	p = select_bad_process(&points, NULL,
 			constraint == CONSTRAINT_MEMORY_POLICY ? nodemask :
 								 NULL);
+	if (PTR_ERR(p) == -1UL)
+		return;
+
+	/* Found nothing?!?! Either we hang forever, or we panic. */
+	if (!p) {
+		dump_header(NULL, gfp_mask, order, NULL);
+		read_unlock(&tasklist_lock);
+		panic("Out of memory and no killable processes...\n");
+	}
+
+	if (oom_kill_process(p, gfp_mask, order, points, NULL,
+			"Out of memory"))
+		goto retry;
 	read_unlock(&tasklist_lock);
 	/*
* Re: continuous oom caused system deadlock
From: David Rientjes @ 2010-12-08 21:48 UTC (permalink / raw)
To: CAI Qian; +Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton
On Wed, 8 Dec 2010, CAI Qian wrote:
> Bisect indicated that this is the first bad commit,
>
> commit 696d3cd5fb318c070dc757fe109e04e398138172
> Author: David Rientjes <rientjes@google.com>
> Date: Fri Jun 11 22:45:17 2010 +0200
>
> __out_of_memory() only has a single caller, so fold it into
> out_of_memory() and add a comment about locking for its call to
> oom_kill_process().
>
> Signed-off-by: David Rientjes <rientjes@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
This commit dropped the releasing of tasklist_lock when the oom killer
chooses not to act because it finds another task that has already been
killed but has yet to exit. That's fixed by b52723c5, so this bisect
isn't the source of your problem.
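To make the failure mode described above concrete, here is a minimal user-space sketch, not kernel code. A toy reader counter stands in for tasklist_lock, and the names toy_rwlock, scan_leaky, scan_balanced and the victim_already_dying flag are hypothetical. It assumes only what this message and the quoted diff show: after the fold, the early return taken when another task is already being killed no longer passes through read_unlock().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for tasklist_lock: it only counts active readers. */
struct toy_rwlock {
	int readers;
};

static void toy_read_lock(struct toy_rwlock *l)
{
	l->readers++;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/*
 * Shape of the folded function as shown in the quoted diff: the early
 * return skips the unlock, so the read lock stays held.
 */
static void scan_leaky(struct toy_rwlock *l, bool victim_already_dying)
{
	toy_read_lock(l);
	if (victim_already_dying)
		return;		/* lock leaked here */
	/* ... select and kill a task (omitted in this sketch) ... */
	toy_read_unlock(l);
}

/* Balanced variant: every exit path drops the lock through one label. */
static void scan_balanced(struct toy_rwlock *l, bool victim_already_dying)
{
	toy_read_lock(l);
	if (victim_already_dying)
		goto out;
	/* ... select and kill a task (omitted in this sketch) ... */
out:
	toy_read_unlock(l);
}
```

In this model, one call to scan_leaky() with victim_already_dying set leaves the reader count at 1 for good, so anything that later needs the lock for writing would wait indefinitely. The balanced variant only illustrates the general pattern; the actual change made by b52723c5 is not shown in this thread.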
You didn't report the specific mmotm kernel that this was happening on, so
trying to diagnose or reproduce it is difficult. Could you try 2.6.37-rc5
with your test case? If it works fine, could you try
mmotm-2010-12-02-16-34?
* Re: continuous oom caused system deadlock
From: caiqian @ 2010-12-09 2:52 UTC (permalink / raw)
To: David Rientjes
Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton
> > Bisect indicated that this is the first bad commit,
> >
> > commit 696d3cd5fb318c070dc757fe109e04e398138172
> > Author: David Rientjes <rientjes@google.com>
> > Date: Fri Jun 11 22:45:17 2010 +0200
> >
> > __out_of_memory() only has a single caller, so fold it into
> > out_of_memory() and add a comment about locking for its call to
> > oom_kill_process().
> >
> > Signed-off-by: David Rientjes <rientjes@google.com>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >
>
> This commit dropped the releasing of tasklist_lock when the oom killer
> chooses not to act because it finds another task that has already been
> killed but has yet to exit. That's fixed by b52723c5, so this bisect
> isn't the source of your problem.
>
> You didn't report the specific mmotm kernel that this was happening
> on, so trying to diagnose or reproduce it is difficult. Could you try
> 2.6.37-rc5 with your test case? If it works fine, could you try
> mmotm-2010-12-02-16-34?
The version is 2010-11-23-16-12 which included b52723c5 you mentioned. 2.6.37-rc5 had the same problem.
CAI Qian
* Re: continuous oom caused system deadlock
From: David Rientjes @ 2010-12-09 21:34 UTC (permalink / raw)
To: caiqian; +Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton
On Wed, 8 Dec 2010, caiqian@redhat.com wrote:
> The version is 2010-11-23-16-12 which included b52723c5 you mentioned.
> 2.6.37-rc5 had the same problem.
>
The problem with your bisect is that you're bisecting in between 696d3cd5
and b52723c5 and identifying a problem that has already been fixed.
* Re: continuous oom caused system deadlock
From: caiqian @ 2010-12-10 1:36 UTC (permalink / raw)
To: David Rientjes
Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton
> > The version is 2010-11-23-16-12 which included b52723c5 you mentioned.
> > 2.6.37-rc5 had the same problem.
> >
> The problem with your bisect is that you're bisecting in between 696d3cd5
> and b52723c5 and identifying a problem that has already been fixed.
Both 2010-11-23-16-12 and 2.6.37-rc5 have b52723c5 but still have the problem with OOM testing. If I went back to one commit before 696d3cd5, it had no problem. Could it be that b52723c5 did not fix the problem fully?
CAI Qian
* Re: continuous oom caused system deadlock
From: David Rientjes @ 2010-12-11 0:30 UTC (permalink / raw)
To: caiqian; +Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton
On Thu, 9 Dec 2010, caiqian@redhat.com wrote:
> > > The version is 2010-11-23-16-12 which included b52723c5 you mentioned.
> > > 2.6.37-rc5 had the same problem.
> > >
> > The problem with your bisect is that you're bisecting in between 696d3cd5
> > and b52723c5 and identifying a problem that has already been fixed.
> Both 2010-11-23-16-12 and 2.6.37-rc5 have b52723c5 but still have the
> problem with OOM testing. If I went back to one commit before 696d3cd5, it
> had no problem. Could it be that b52723c5 did not fix the problem fully?
>
When a bisect identifies a commit in between a known-broken patch and fix
for that broken patch, you need to revert your tree back to the fix
(b52723c5) and retest. If the problem persists, then 696d3cd5 is the bad
commit. Otherwise, you need to bisect between the fix (by labeling it
with "git bisect good") and HEAD.
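The procedure above can be illustrated with a small model, offered as a sketch rather than a description of how git implements bisection. Commits are numbered 0 to 39 on a linear history. The indices LEAK_INTRODUCED, LEAK_FIXED and OTHER_REGRESSION, and the second regression itself, are hypothetical and chosen only for illustration. The model assumes the test fails both inside the already-fixed window and after the hypothetical later regression.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical commit indices on a linear history of 40 commits. */
enum {
	LEAK_INTRODUCED  = 10,	/* plays the role of 696d3cd5 */
	LEAK_FIXED       = 20,	/* plays the role of b52723c5 */
	OTHER_REGRESSION = 30,	/* assumed unrelated later bug */
};

/* The test fails inside the leak window and after the later regression. */
static bool test_fails(int commit)
{
	bool in_leak_window = commit >= LEAK_INTRODUCED && commit < LEAK_FIXED;

	return in_leak_window || commit >= OTHER_REGRESSION;
}

/*
 * Plain binary search for the first failing commit, given one commit
 * known to pass (good) and a later one known to fail (bad).
 */
static int bisect_first_bad(int good, int bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (test_fails(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

In this model, bisect_first_bad(0, 39) stops at LEAK_INTRODUCED, the commit whose breakage is already fixed, while bisect_first_bad(LEAK_FIXED, 39) stops at OTHER_REGRESSION. Marking the fix as good is what keeps the search out of the masked window.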
* Re: continuous oom caused system deadlock
From: CAI Qian @ 2010-12-17 4:16 UTC (permalink / raw)
To: David Rientjes
Cc: linux-mm, KAMEZAWA Hiroyuki, KOSAKI Motohiro, Andrew Morton
Hi David,
> When a bisect identifies a commit in between a known-broken patch and fix
> for that broken patch, you need to revert your tree back to the fix
> (b52723c5) and retest. If the problem persists, then 696d3cd5 is the bad
> commit. Otherwise, you need to bisect between the fix (by labeling it
> with "git bisect good") and HEAD.
It turned out that this bug is not always reproducible after your fix. Sorry for the false alarm.
CAI Qian