linux-mm.kvack.org archive mirror
* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
       [not found] <4FBC1618.5010408@fold.natur.cuni.cz>
@ 2012-05-22 23:28 ` Andrew Morton
  2012-05-22 23:29   ` Andrew Morton
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andrew Morton @ 2012-05-22 23:28 UTC (permalink / raw)
  To: Martin Mokrejs
  Cc: LKML, khlebnikov, markus, hughd, kamezawa.hiroyu, oleg,
	Michal Hocko, linux-mm

On Wed, 23 May 2012 00:41:28 +0200
Martin Mokrejs <mmokrejs@fold.natur.cuni.cz> wrote:

> Hi Andrew,
>   while shutting down my laptop (Dell Vostro 3550 with 16GB RAM, core i7) with 3.4-rc7 I got:
> 
> May 23 00:07:54 vostro kernel: [352687.968267] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
> May 23 00:07:54 vostro kernel: [352687.968312] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:2 val:59
> May 23 00:07:55 vostro acpid: exiting
> May 23 00:07:55 vostro syslog-ng[2838]: syslog-ng shutting down; version='3.3.4'
> 
>   I found by Google the below thread and thought that maybe it is related?
> http://comments.gmane.org/gmane.linux.kernel.mm/76459
>
> ...
>


Well hopefully the below will fix this?

I notice that I don't have this tagged for -stable backporting.  That
seems wrong.  Konstantin, do we know for how long this bug has been in
there?



From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Subject: mm: correctly synchronize rss-counters at exit/exec

mm->rss_stat counters have per-task delta: task->rss_stat.  Before
changing task->mm pointer the kernel must flush this delta with
sync_mm_rss().

do_exit() already calls sync_mm_rss() to flush the rss-counters before
committing the rss statistics into task->signal->maxrss, taskstats, audit
and other stuff.  Unfortunately the kernel does this before calling
mm_release(), which can call put_user() for processing
task->clear_child_tid.  So at this point we can trigger page-faults and
task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
inconsistent and check_mm() will print something like this:

| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1

This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
out of do_exit() and calls it earlier.  After mm_release() there should be
no pagefaults.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/exec.c     |    1 -
 kernel/exit.c |   13 ++++++++-----
 kernel/fork.c |    8 ++++++++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
--- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/fs/exec.c
@@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
--- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/exit.c
@@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
 	 * user space pages.  We don't need them, and if we didn't close them
 	 * they would be locked into memory.
 	 */
+	mm_release(current, current->mm);
 	exit_mm(current);
 	/*
 	 * We don't want to get frozen, in case system-wide hibernation
@@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
 	struct mm_struct *mm = tsk->mm;
 	struct core_state *core_state;
 
-	mm_release(tsk, mm);
 	if (!mm)
 		return;
 	/*
@@ -959,9 +959,13 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
+
+	/* Set exit_code before complete_vfork_done() in mm_release() */
+	tsk->exit_code = code;
+
+	/* Release mm and sync mm's RSS info before statistics gathering */
+	mm_release(tsk, tsk->mm);
+
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
@@ -974,7 +978,6 @@ void do_exit(long code)
 		tty_audit_exit();
 	audit_free(tsk);
 
-	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
 	exit_mm(tsk);
diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
--- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
+++ a/kernel/fork.c
@@ -809,6 +809,14 @@ void mm_release(struct task_struct *tsk,
 		}
 		tsk->clear_child_tid = NULL;
 	}
+
+	/*
+	 * Final rss-counter synchronization. After this point there must be
+	 * no pagefaults into this mm from the current context.  Otherwise
+	 * mm->rss_stat will be inconsistent.
+	 */
+	if (mm)
+		sync_mm_rss(mm);
 }
 
 /*
_
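
The ordering problem described in the changelog above can be reproduced in
a few lines of user-space C. This is only a sketch under stated assumptions:
every sim_* name is hypothetical and none of this is kernel code. It assumes
that idx:1 and idx:2 in the warning are the anonymous-page and swap-entry
counters, and that a swap-in fault records "+1 anon, -1 swap" in the
per-task delta. The fault count is a free parameter chosen to match the
shape of the reported values, not a claim about what happened on the
reporter's machine.

```c
#include <assert.h>
#include <stdio.h>

/* Counter indices; only the numbering matters for this sketch. */
enum { SIM_FILEPAGES, SIM_ANONPAGES, SIM_SWAPENTS, SIM_NR_COUNTERS };

struct sim_mm {
	long count[SIM_NR_COUNTERS];	/* stand-in for mm->rss_stat */
};

struct sim_task {
	long delta[SIM_NR_COUNTERS];	/* stand-in for task->rss_stat */
	struct sim_mm *mm;
};

/* Stand-in for sync_mm_rss(): fold the per-task delta into the mm. */
static void sim_sync_rss(struct sim_task *t)
{
	assert(t->mm != NULL);
	for (int i = 0; i < SIM_NR_COUNTERS; i++) {
		t->mm->count[i] += t->delta[i];
		t->delta[i] = 0;
	}
}

/* A swap-in fault: one swap entry becomes one resident anon page.
 * The change lands in the task delta only, not in the mm. */
static void sim_swapin_fault(struct sim_task *t)
{
	t->delta[SIM_ANONPAGES]++;
	t->delta[SIM_SWAPENTS]--;
}

/* Stand-in for address-space teardown followed by check_mm(): subtract
 * the pages that are really mapped, then report every non-zero counter.
 * Returns the number of bad counters. */
static int sim_teardown_and_check(struct sim_mm *mm, long mapped_anon)
{
	int bad = 0;

	mm->count[SIM_ANONPAGES] -= mapped_anon;
	for (int i = 0; i < SIM_NR_COUNTERS; i++) {
		if (mm->count[i] != 0) {
			printf("Bad rss-counter state idx:%d val:%ld\n",
			       i, mm->count[i]);
			bad++;
		}
	}
	return bad;
}

/* Simulate an exit in which `faults` swap-in faults happen during the
 * release step. sync_after_release == 0 models the old ordering (sync
 * first, fault later); non-zero models the patched ordering. The final
 * mm counters are copied to `out`. */
int sim_exit(int faults, int sync_after_release, long out[SIM_NR_COUNTERS])
{
	struct sim_mm mm = { .count = { 0, 0, faults } };
	struct sim_task t = { .delta = { 0 }, .mm = &mm };
	int bad;

	if (!sync_after_release)
		sim_sync_rss(&t);		/* old: sync before release */
	for (int i = 0; i < faults; i++)
		sim_swapin_fault(&t);		/* faults during release */
	if (sync_after_release)
		sim_sync_rss(&t);		/* patched: final sync */

	/* The task detaches here; any delta still unsynced is lost. */
	bad = sim_teardown_and_check(&mm, faults);
	for (int i = 0; i < SIM_NR_COUNTERS; i++)
		out[i] = mm.count[i];
	return bad;
}
```

With the old ordering, sim_exit(59, 0, out) leaves the anon counter at -59
and the swap counter at 59, the same pattern as in the report. With the
patched ordering, sim_exit(59, 1, out) leaves every counter at zero.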

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-22 23:28 ` 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59 Andrew Morton
@ 2012-05-22 23:29   ` Andrew Morton
  2012-05-23 17:21     ` Oleg Nesterov
  2012-05-23  6:07   ` Konstantin Khlebnikov
  2012-05-23 17:04   ` Martin Mokrejs
  2 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-05-22 23:29 UTC (permalink / raw)
  To: Martin Mokrejs, LKML, khlebnikov, markus, hughd, kamezawa.hiroyu,
	oleg, Michal Hocko, linux-mm

On Tue, 22 May 2012 16:28:35 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> I notice that I don't have this tagged for -stable backporting.  That
> seems wrong.  Konstantin, do we know for how long this bug has been in
> there?

Also, I have a note here that Oleg was unhappy with the patch.  Oleg
happiness is important.  Has he cheered up yet?


* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-22 23:28 ` 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59 Andrew Morton
  2012-05-22 23:29   ` Andrew Morton
@ 2012-05-23  6:07   ` Konstantin Khlebnikov
  2012-05-30  8:25     ` Martin Mokrejs
  2012-05-23 17:04   ` Martin Mokrejs
  2 siblings, 1 reply; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-05-23  6:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Mokrejs, LKML, markus@trippelsdorf.de, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, oleg@redhat.com, Michal Hocko,
	linux-mm@kvack.org

Andrew Morton wrote:
> On Wed, 23 May 2012 00:41:28 +0200
> Martin Mokrejs<mmokrejs@fold.natur.cuni.cz>  wrote:
>
>> Hi Andrew,
>>    while shutting down my laptop (Dell Vostro 3550 with 16GB RAM, core i7) with 3.4-rc7 I got:
>>
>> May 23 00:07:54 vostro kernel: [352687.968267] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
>> May 23 00:07:54 vostro kernel: [352687.968312] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:2 val:59
>> May 23 00:07:55 vostro acpid: exiting
>> May 23 00:07:55 vostro syslog-ng[2838]: syslog-ng shutting down; version='3.3.4'
>>
>>    I found by Google the below thread and thought that maybe it is related?
>> http://comments.gmane.org/gmane.linux.kernel.mm/76459
>>
>> ...
>>
>
>
> Well hopefully the below will fix this?
>
> I notice that I don't have this tagged for -stable backporting.  That
> seems wrong.  Konstantin, do we know for how long this bug has been in
> there?

It has been there for years; by itself it is mostly harmless.
The warning was added in commit c3f0327f8e9d7a503f0d64573c311eddd61f197d,
so only v3.4 prints it. I expected this fix to be merged before the release.



* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-22 23:28 ` 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59 Andrew Morton
  2012-05-22 23:29   ` Andrew Morton
  2012-05-23  6:07   ` Konstantin Khlebnikov
@ 2012-05-23 17:04   ` Martin Mokrejs
  2012-05-24 10:36     ` Konstantin Khlebnikov
  2 siblings, 1 reply; 21+ messages in thread
From: Martin Mokrejs @ 2012-05-23 17:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: LKML, khlebnikov, markus, hughd, kamezawa.hiroyu, oleg,
	Michal Hocko, linux-mm

Hi,
  I rebooted the laptop twice today after only brief use, and the messages did not
appear in the logs.

I have now applied the patch below, and the messages did not appear during two
further reboots either. Do I have to use the computer for longer to reproduce the issue? ;-)

I will keep running 3.4-rc7 with the patch applied, and if the BUG: message reappears
I will let you know. At the moment, though, I cannot confirm that the patch really helped.
Any clues on how to reproduce it? ;)
Martin

Andrew Morton wrote:
> On Wed, 23 May 2012 00:41:28 +0200
> Martin Mokrejs <mmokrejs@fold.natur.cuni.cz> wrote:
> 
>> Hi Andrew,
>>   while shutting down my laptop (Dell Vostro 3550 with 16GB RAM, core i7) with 3.4-rc7 I got:
>>
>> May 23 00:07:54 vostro kernel: [352687.968267] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
>> May 23 00:07:54 vostro kernel: [352687.968312] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:2 val:59
>> May 23 00:07:55 vostro acpid: exiting
>> May 23 00:07:55 vostro syslog-ng[2838]: syslog-ng shutting down; version='3.3.4'
>>
>>   I found by Google the below thread and thought that maybe it is related?
>> http://comments.gmane.org/gmane.linux.kernel.mm/76459
>>
>> ...
>>
> 
> 
> Well hopefully the below will fix this?
> 
> I notice that I don't have this tagged for -stable backporting.  That
> seems wrong.  Konstantin, do we know for how long this bug has been in
> there?
> 
> 
> 
> From: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Subject: mm: correctly synchronize rss-counters at exit/exec
> 
> mm->rss_stat counters have per-task delta: task->rss_stat.  Before
> changing task->mm pointer the kernel must flush this delta with
> sync_mm_rss().
> 
> do_exit() already calls sync_mm_rss() to flush the rss-counters before
> committing the rss statistics into task->signal->maxrss, taskstats, audit
> and other stuff.  Unfortunately the kernel does this before calling
> mm_release(), which can call put_user() for processing
> task->clear_child_tid.  So at this point we can trigger page-faults and
> task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
> inconsistent and check_mm() will print something like this:
> 
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
> 
> This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
> out of do_exit() and calls it earlier.  After mm_release() there should be
> no pagefaults.
> 
> [akpm@linux-foundation.org: tweak comment]
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  fs/exec.c     |    1 -
>  kernel/exit.c |   13 ++++++++-----
>  kernel/fork.c |    8 ++++++++
>  3 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
> --- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
> +++ a/fs/exec.c
> @@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
>  	/* Notify parent that we're no longer interested in the old VM */
>  	tsk = current;
>  	old_mm = current->mm;
> -	sync_mm_rss(old_mm);
>  	mm_release(tsk, old_mm);
>  
>  	if (old_mm) {
> diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
> --- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
> +++ a/kernel/exit.c
> @@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
>  	 * user space pages.  We don't need them, and if we didn't close them
>  	 * they would be locked into memory.
>  	 */
> +	mm_release(current, current->mm);
>  	exit_mm(current);
>  	/*
>  	 * We don't want to get frozen, in case system-wide hibernation
> @@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
>  	struct mm_struct *mm = tsk->mm;
>  	struct core_state *core_state;
>  
> -	mm_release(tsk, mm);
>  	if (!mm)
>  		return;
>  	/*
> @@ -959,9 +959,13 @@ void do_exit(long code)
>  				preempt_count());
>  
>  	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
> +
> +	/* Set exit_code before complete_vfork_done() in mm_release() */
> +	tsk->exit_code = code;
> +
> +	/* Release mm and sync mm's RSS info before statistics gathering */
> +	mm_release(tsk, tsk->mm);
> +
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
>  		hrtimer_cancel(&tsk->signal->real_timer);
> @@ -974,7 +978,6 @@ void do_exit(long code)
>  		tty_audit_exit();
>  	audit_free(tsk);
>  
> -	tsk->exit_code = code;
>  	taskstats_exit(tsk, group_dead);
>  
>  	exit_mm(tsk);
> diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
> --- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
> +++ a/kernel/fork.c
> @@ -809,6 +809,14 @@ void mm_release(struct task_struct *tsk,
>  		}
>  		tsk->clear_child_tid = NULL;
>  	}
> +
> +	/*
> +	 * Final rss-counter synchronization. After this point there must be
> +	 * no pagefaults into this mm from the current context.  Otherwise
> +	 * mm->rss_stat will be inconsistent.
> +	 */
> +	if (mm)
> +		sync_mm_rss(mm);
>  }
>  
>  /*
> _
> 
> .
> 


* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-22 23:29   ` Andrew Morton
@ 2012-05-23 17:21     ` Oleg Nesterov
  2012-05-29 20:18       ` Konstantin Khlebnikov
  2012-05-30  9:54       ` Martin Mokrejs
  0 siblings, 2 replies; 21+ messages in thread
From: Oleg Nesterov @ 2012-05-23 17:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Martin Mokrejs, LKML, khlebnikov, markus, hughd, kamezawa.hiroyu,
	Michal Hocko, linux-mm

On 05/22, Andrew Morton wrote:
>
> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
> happiness is important.  Has he cheered up yet?

Well, yes, I do not really like this patch ;) Because I think there is
a simpler, more straightforward fix, see below. In my opinion it also
makes the original code simpler.

But. Obviously this is subjective, I can't prove my patch is "better",
and I didn't try to test it.

So I won't argue with Konstantin who dislikes my patch, although I
would like to know the reason.

Oleg.


--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
 	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
 	mm = get_task_mm(p);
 	if (mm) {
+		sync_mm_rss(mm);
 		/* adjust to KB unit */
 		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
 		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
 	mm_release(tsk, mm);
 	if (!mm)
 		return;
+
+	sync_mm_rss(mm);
 	/*
 	 * Serialize with any possible pending coredump.
 	 * We must hold mmap_sem around checking core_state
@@ -960,9 +962,6 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
+		sync_mm_rss(old_mm);
 		/*
 		 * Make sure that if there is a core dump in progress
 		 * for the old mm, we get out and die instead of going
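
One detail of the diff above is the extra sync_mm_rss() call in
xacct_add_tsk(). As I read it, once the early sync is dropped from
do_exit(), the statistics reader has to flush the per-task delta itself,
otherwise it reports from stale counters. A minimal user-space sketch of
that idea follows. The demo_* names are hypothetical, and the high-water
helper is only loosely modelled on the kernel's approach of taking the
larger of the recorded peak and the current rss.

```c
#include <assert.h>
#include <stddef.h>

struct demo_mm {
	long rss;		/* counter shared by all tasks using this mm */
	long hiwater_rss;	/* recorded peak */
};

struct demo_task {
	long rss_delta;		/* per-task delta not yet folded into the mm */
	struct demo_mm *mm;
};

/* Stand-in for sync_mm_rss(): flush the per-task delta into the mm. */
static void demo_sync_rss(struct demo_task *t)
{
	assert(t->mm != NULL);
	t->mm->rss += t->rss_delta;
	t->rss_delta = 0;
}

/* High-water mark: the larger of the recorded peak and the current rss. */
static long demo_hiwater_rss(const struct demo_mm *mm)
{
	return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}

/* Statistics reader. sync_first != 0 models the line the diff adds to
 * xacct_add_tsk(); sync_first == 0 reads without flushing. */
long demo_report_hiwater(struct demo_task *t, int sync_first)
{
	if (sync_first)
		demo_sync_rss(t);
	return demo_hiwater_rss(t->mm);
}
```

For an mm with rss and hiwater_rss both at 100 and a task holding a delta
of 5, the reader returns 100 without the sync and 105 with it.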



* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-23 17:04   ` Martin Mokrejs
@ 2012-05-24 10:36     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-05-24 10:36 UTC (permalink / raw)
  To: Martin Mokrejs
  Cc: Andrew Morton, LKML, markus@trippelsdorf.de, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, oleg@redhat.com, Michal Hocko,
	linux-mm@kvack.org

Martin Mokrejs wrote:
> Hi,
>    I rebooted the laptop twice today after just brief uses and the messages did not
> appear in the logs.
>
> Now I just applied the below patch and during two reboots it did not appear either.
> Do I have to use the computer for some longer while to reproduce the issue? ;-)

Yes, some data must be in swap to reproduce this, so memory pressure is required.

>
> I will stay with the patch applied over 3.4-rc7 and would the BUG: re-appear I will
> let you know. But I doubt at the moment I could confirm it really helped.
> Clues how to reproduce? ;)
> Martin
>


* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-23 17:21     ` Oleg Nesterov
@ 2012-05-29 20:18       ` Konstantin Khlebnikov
  2012-05-29 20:26         ` Andrew Morton
  2012-05-30 17:11         ` Oleg Nesterov
  2012-05-30  9:54       ` Martin Mokrejs
  1 sibling, 2 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-05-29 20:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Martin Mokrejs, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

Oleg Nesterov wrote:
> On 05/22, Andrew Morton wrote:
>>
>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>> happiness is important.  Has he cheered up yet?
>
> Well, yes, I do not really like this patch ;) Because I think there is
> a more simple/straightforward fix, see below. In my opinion it also
> makes the original code simpler.
>
> But. Obviously this is subjective, I can't prove my patch is "better",
> and I didn't try to test it.
>
> So I won't argue with Konstantin who dislikes my patch, although I
> would like to know the reason.

I don't remember why I dislike your patch.
For now I can only say ACK )



* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-29 20:18       ` Konstantin Khlebnikov
@ 2012-05-29 20:26         ` Andrew Morton
  2012-05-29 21:59           ` Martin Mokrejs
  2012-05-30 17:11         ` Oleg Nesterov
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2012-05-29 20:26 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Oleg Nesterov, Martin Mokrejs, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

On Wed, 30 May 2012 00:18:31 +0400
Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:

> Oleg Nesterov wrote:
> > On 05/22, Andrew Morton wrote:
> >>
> >> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
> >> happiness is important.  Has he cheered up yet?
> >
> > Well, yes, I do not really like this patch ;) Because I think there is
> > a more simple/straightforward fix, see below. In my opinion it also
> > makes the original code simpler.
> >
> > But. Obviously this is subjective, I can't prove my patch is "better",
> > and I didn't try to test it.
> >
> > So I won't argue with Konstantin who dislikes my patch, although I
> > would like to know the reason.
> 
> I don't remember why I dislike your patch.
> For now I can only say ACK )

We'll need a changelogged signed-off patch, please Oleg.  And some evidence
that it was tested would be nice ;)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-29 20:26         ` Andrew Morton
@ 2012-05-29 21:59           ` Martin Mokrejs
  2012-05-30 11:39             ` Konstantin Khlebnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Mokrejs @ 2012-05-29 21:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, Oleg Nesterov, LKML,
	markus@trippelsdorf.de, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko, linux-mm@kvack.org

Andrew Morton wrote:
> On Wed, 30 May 2012 00:18:31 +0400
> Konstantin Khlebnikov <khlebnikov@openvz.org> wrote:
> 
>> Oleg Nesterov wrote:
>>> On 05/22, Andrew Morton wrote:
>>>>
>>>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>>>> happiness is important.  Has he cheered up yet?
>>>
>>> Well, yes, I do not really like this patch ;) Because I think there is
>>> a more simple/straightforward fix, see below. In my opinion it also
>>> makes the original code simpler.
>>>
>>> But. Obviously this is subjective, I can't prove my patch is "better",
>>> and I didn't try to test it.
>>>
>>> So I won't argue with Konstantin who dislikes my patch, although I
>>> would like to know the reason.
>>
>> I don't remember why I dislike your patch.
>> For now I can only say ACK )
> 
> We'll need a changelogged signed-off patch, please Oleg.  And some evidence
> that it was tested would be nice ;)

I will reboot in a few hours, finally, after a few days ... I am running the first
patch. I will try to test the second/alternative patch more quickly. Sorry for
the delay.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-23  6:07   ` Konstantin Khlebnikov
@ 2012-05-30  8:25     ` Martin Mokrejs
  0 siblings, 0 replies; 21+ messages in thread
From: Martin Mokrejs @ 2012-05-30  8:25 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, LKML, markus@trippelsdorf.de, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, oleg@redhat.com, Michal Hocko,
	linux-mm@kvack.org



Konstantin Khlebnikov wrote:
> Andrew Morton wrote:
>> On Wed, 23 May 2012 00:41:28 +0200
>> Martin Mokrejs<mmokrejs@fold.natur.cuni.cz>  wrote:
>>
>>> Hi Andrew,
>>>    while shutting down my laptop (Dell Vostro 3550 with 16GB RAM, core i7) with 3.4-rc7 I got:
>>>
>>> May 23 00:07:54 vostro kernel: [352687.968267] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
>>> May 23 00:07:54 vostro kernel: [352687.968312] BUG: Bad rss-counter state mm:ffff88040b56f800 idx:2 val:59
>>> May 23 00:07:55 vostro acpid: exiting
>>> May 23 00:07:55 vostro syslog-ng[2838]: syslog-ng shutting down; version='3.3.4'
>>>
>>>    I found by Google the below thread and thought that maybe it is related?
>>> http://comments.gmane.org/gmane.linux.kernel.mm/76459
>>>
>>> ...
>>>
>>
>>
>> Well hopefully the below will fix this?
>>
>> I notice that I don't have this tagged for -stable backporting.  That
>> seems wrong.  Konstantin, do we know for how long this bug has been in
>> there?
> 
> It has been there for years; by itself it is mostly harmless.
> This warning was added in c3f0327f8e9d7a503f0d64573c311eddd61f197d,
> so only v3.4 has it. I thought this fix would be there before the release.
> 
>>
>>
>>
>> From: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> Subject: mm: correctly synchronize rss-counters at exit/exec
>>
>> mm->rss_stat counters have per-task delta: task->rss_stat.  Before
>> changing task->mm pointer the kernel must flush this delta with
>> sync_mm_rss().
>>
>> do_exit() already calls sync_mm_rss() to flush the rss-counters before
>> committing the rss statistics into task->signal->maxrss, taskstats, audit
>> and other stuff.  Unfortunately the kernel does this before calling
>> mm_release(), which can call put_user() for processing
>> task->clear_child_tid.  So at this point we can trigger page-faults and
>> task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
>> inconsistent and check_mm() will print something like this:
>>
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
>> | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
>>
>> This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
>> out of do_exit() and calls it earlier.  After mm_release() there should be
>> no pagefaults.
>>
>> [akpm@linux-foundation.org: tweak comment]
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> Reported-by: Markus Trippelsdorf<markus@trippelsdorf.de>
>> Cc: Hugh Dickins<hughd@google.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Oleg Nesterov<oleg@redhat.com>
>> Signed-off-by: Andrew Morton<akpm@linux-foundation.org>
>> ---
>>
>>   fs/exec.c     |    1 -
>>   kernel/exit.c |   13 ++++++++-----
>>   kernel/fork.c |    8 ++++++++
>>   3 files changed, 16 insertions(+), 6 deletions(-)
>>
>> diff -puN fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec fs/exec.c
>> --- a/fs/exec.c~mm-correctly-synchronize-rss-counters-at-exit-exec
>> +++ a/fs/exec.c
>> @@ -823,7 +823,6 @@ static int exec_mmap(struct mm_struct *m
>>       /* Notify parent that we're no longer interested in the old VM */
>>       tsk = current;
>>       old_mm = current->mm;
>> -    sync_mm_rss(old_mm);
>>       mm_release(tsk, old_mm);
>>
>>       if (old_mm) {
>> diff -puN kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/exit.c
>> --- a/kernel/exit.c~mm-correctly-synchronize-rss-counters-at-exit-exec
>> +++ a/kernel/exit.c
>> @@ -423,6 +423,7 @@ void daemonize(const char *name, ...)
>>        * user space pages.  We don't need them, and if we didn't close them
>>        * they would be locked into memory.
>>        */
>> +    mm_release(current, current->mm);
>>       exit_mm(current);
>>       /*
>>        * We don't want to get frozen, in case system-wide hibernation
>> @@ -640,7 +641,6 @@ static void exit_mm(struct task_struct *
>>       struct mm_struct *mm = tsk->mm;
>>       struct core_state *core_state;
>>
>> -    mm_release(tsk, mm);
>>       if (!mm)
>>           return;
>>       /*
>> @@ -959,9 +959,13 @@ void do_exit(long code)
>>                   preempt_count());
>>
>>       acct_update_integrals(tsk);
>> -    /* sync mm's RSS info before statistics gathering */
>> -    if (tsk->mm)
>> -        sync_mm_rss(tsk->mm);
>> +
>> +    /* Set exit_code before complete_vfork_done() in mm_release() */
>> +    tsk->exit_code = code;
>> +
>> +    /* Release mm and sync mm's RSS info before statistics gathering */
>> +    mm_release(tsk, tsk->mm);
>> +
>>       group_dead = atomic_dec_and_test(&tsk->signal->live);
>>       if (group_dead) {
>>           hrtimer_cancel(&tsk->signal->real_timer);
>> @@ -974,7 +978,6 @@ void do_exit(long code)
>>           tty_audit_exit();
>>       audit_free(tsk);
>>
>> -    tsk->exit_code = code;
>>       taskstats_exit(tsk, group_dead);
>>
>>       exit_mm(tsk);
>> diff -puN kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec kernel/fork.c
>> --- a/kernel/fork.c~mm-correctly-synchronize-rss-counters-at-exit-exec
>> +++ a/kernel/fork.c
>> @@ -809,6 +809,14 @@ void mm_release(struct task_struct *tsk,
>>           }
>>           tsk->clear_child_tid = NULL;
>>       }
>> +
>> +    /*
>> +     * Final rss-counter synchronization. After this point there must be
>> +     * no pagefaults into this mm from the current context.  Otherwise
>> +     * mm->rss_stat will be inconsistent.
>> +     */
>> +    if (mm)
>> +        sync_mm_rss(mm);
>>   }
>>
>>   /*
>> _
>>

I made my system allocate some 3 million blocks in swap, according to vmstat(1),
and rebooted.  It took the system about 6 minutes to kill 7 gimp images, 2.2GB
(16000x8000px at 1200dpi each), and a python session holding some huge lists in memory.
I have 16GB of RAM. There were no errors, warnings or Oopses logged in /var/log/messages,
so I conclude this patch from Konstantin Khlebnikov works for me on 3.4-rc7.

May 30 09:57:57 vostro syslog-ng[2534]: syslog-ng shutting down; version='3.3.4'
May 30 10:06:31 vostro syslog-ng[2519]: syslog-ng starting up; version='3.3.4'

Tested-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>

--

Will try the other patch from Oleg Nesterov now.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-23 17:21     ` Oleg Nesterov
  2012-05-29 20:18       ` Konstantin Khlebnikov
@ 2012-05-30  9:54       ` Martin Mokrejs
  1 sibling, 0 replies; 21+ messages in thread
From: Martin Mokrejs @ 2012-05-30  9:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, LKML, khlebnikov, markus, hughd, kamezawa.hiroyu,
	Michal Hocko, linux-mm



Oleg Nesterov wrote:
> On 05/22, Andrew Morton wrote:
>>
>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>> happiness is important.  Has he cheered up yet?
> 
> Well, yes, I do not really like this patch ;) Because I think there is
> a more simple/straightforward fix, see below. In my opinion it also
> makes the original code simpler.
> 
> But. Obviously this is subjective, I can't prove my patch is "better",
> and I didn't try to test it.
> 
> So I won't argue with Konstantin who dislikes my patch, although I
> would like to know the reason.
> 
> Oleg.
> 
> 
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *sta
>  	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
>  	mm = get_task_mm(p);
>  	if (mm) {
> +		sync_mm_rss(mm);
>  		/* adjust to KB unit */
>  		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
>  		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -643,6 +643,8 @@ static void exit_mm(struct task_struct *
>  	mm_release(tsk, mm);
>  	if (!mm)
>  		return;
> +
> +	sync_mm_rss(mm);
>  	/*
>  	 * Serialize with any possible pending coredump.
>  	 * We must hold mmap_sem around checking core_state
> @@ -960,9 +962,6 @@ void do_exit(long code)
>  				preempt_count());
>  
>  	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
>  	group_dead = atomic_dec_and_test(&tsk->signal->live);
>  	if (group_dead) {
>  		hrtimer_cancel(&tsk->signal->real_timer);
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *m
>  	/* Notify parent that we're no longer interested in the old VM */
>  	tsk = current;
>  	old_mm = current->mm;
> -	sync_mm_rss(old_mm);
>  	mm_release(tsk, old_mm);
>  
>  	if (old_mm) {
> +		sync_mm_rss(old_mm);
>  		/*
>  		 * Make sure that if there is a core dump in progress
>  		 * for the old mm, we get out and die instead of going
> 
> 

Tested-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>

This patch works as well for me as the other patch proposed earlier by Konstantin
Khlebnikov.

Had both patches included a debug printk() showing that the code really did kick
in, I would be more assured that they had a chance to do their job. But
in both cases I made the system use up all RAM and start to swap, so if that was
enough to trigger the situation, as you said earlier, then they are both fine.

Finally, I re-tested the patch from Konstantin because the several-minute
delay in shutdown puzzled me and I did not get it with this patch
from Oleg. I conclude it was probably related to my initial attempts to also copy
/home/blah to /tmp (I thought it was an in-memory filesystem, so I could easily drain
memory resources, but it seems I was wrong). Maybe this was the reason why the
shutdown took so long. I am still not sure, because the init.d/ scripts clean up /tmp
on startup on Gentoo ... but I was not able to reproduce the long delay on the second
attempt, using purely python to eat my memory by holding some huge lists.

For those also wondering why the long delay on shutdown happened, here are my
mounts:

# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext3 (rw,noatime,commit=0)
devtmpfs on /dev type devtmpfs (rw,relatime,size=8184896k,nr_inodes=2046224,mode=755)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
rc-svcdir on /lib64/rc/init.d type tmpfs (rw,nosuid,nodev,noexec,relatime,size=1024k,mode=755)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib64/rc/sh/cgroup-release-agent.sh,name=openrc)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
#


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-29 21:59           ` Martin Mokrejs
@ 2012-05-30 11:39             ` Konstantin Khlebnikov
  2012-05-30 11:59               ` Martin Mokrejs
  0 siblings, 1 reply; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-05-30 11:39 UTC (permalink / raw)
  To: Martin Mokrejs
  Cc: Andrew Morton, Oleg Nesterov, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

Martin Mokrejs wrote:
> Andrew Morton wrote:
>> On Wed, 30 May 2012 00:18:31 +0400
>> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>>
>>> Oleg Nesterov wrote:
>>>> On 05/22, Andrew Morton wrote:
>>>>>
>>>>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>>>>> happiness is important.  Has he cheered up yet?
>>>>
>>>> Well, yes, I do not really like this patch ;) Because I think there is
>>>> a more simple/straightforward fix, see below. In my opinion it also
>>>> makes the original code simpler.
>>>>
>>>> But. Obviously this is subjective, I can't prove my patch is "better",
>>>> and I didn't try to test it.
>>>>
>>>> So I won't argue with Konstantin who dislikes my patch, although I
>>>> would like to know the reason.
>>>
>>> I don't remember why I dislike your patch.
>>> For now I can only say ACK )
>>
>> We'll need a changelogged signed-off patch, please Oleg.  And some evidence
>> that it was tested would be nice ;)
>
> I will reboot in few hours, finally after few days ... I am running this first
> patch. I will try to test the second/alternative patch more quickly. Sorry for
> the delay.
>

The easiest way to trigger this bug:

#define _GNU_SOURCE
#include <unistd.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/mman.h>

static inline int sys_clone(unsigned long flags, void *stack, int *ptid, int *ctid)
{
	return syscall(SYS_clone, flags, stack, ptid, ctid);
}

int main(int argc, char **argv)
{
	void *page;

	page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	sys_clone(CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, page);
}


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-30 11:39             ` Konstantin Khlebnikov
@ 2012-05-30 11:59               ` Martin Mokrejs
  2012-05-30 12:22                 ` Konstantin Khlebnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Martin Mokrejs @ 2012-05-30 11:59 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Oleg Nesterov, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org



Konstantin Khlebnikov wrote:
> Martin Mokrejs wrote:
>> Andrew Morton wrote:
>>> On Wed, 30 May 2012 00:18:31 +0400
>>> Konstantin Khlebnikov<khlebnikov@openvz.org>  wrote:
>>>
>>>> Oleg Nesterov wrote:
>>>>> On 05/22, Andrew Morton wrote:
>>>>>>
>>>>>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>>>>>> happiness is important.  Has he cheered up yet?
>>>>>
>>>>> Well, yes, I do not really like this patch ;) Because I think there is
>>>>> a more simple/straightforward fix, see below. In my opinion it also
>>>>> makes the original code simpler.
>>>>>
>>>>> But. Obviously this is subjective, I can't prove my patch is "better",
>>>>> and I didn't try to test it.
>>>>>
>>>>> So I won't argue with Konstantin who dislikes my patch, although I
>>>>> would like to know the reason.
>>>>
>>>> I don't remember why I dislike your patch.
>>>> For now I can only say ACK )
>>>
>>> We'll need a changelogged signed-off patch, please Oleg.  And some evidence
>>> that it was tested would be nice ;)
>>
>> I will reboot in few hours, finally after few days ... I am running this first
>> patch. I will try to test the second/alternative patch more quickly. Sorry for
>> the delay.
>>
> 
> easiest way trigger this bug:
> 
> #define _GNU_SOURCE
> #include <unistd.h>
> #include <sched.h>
> #include <sys/syscall.h>
> #include <sys/mman.h>
> 
> static inline int sys_clone(unsigned long flags, void *stack, int *ptid, int *ctid)
> {
>     return syscall(SYS_clone, flags, stack, ptid, ctid);
> }
> 
> int main(int argc, char **argv)
> {
>     void *page;
> 
>     page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>     sys_clone(CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, page);
> }
> 

I am getting segfaults with this.

(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00007f430f70a7e0 in __elf_set___libc_subfreeres_element_free_mem__ () from /lib64/libc.so.6
#2  0x00007f430f70a7e8 in __elf_set___libc_atexit_element__IO_cleanup__ () from /lib64/libc.so.6
#3  0x0000000000000001 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb)

What number should I give it as an argument? ;-)

Martin


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-30 11:59               ` Martin Mokrejs
@ 2012-05-30 12:22                 ` Konstantin Khlebnikov
  2012-05-30 12:54                   ` Konstantin Khlebnikov
  0 siblings, 1 reply; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-05-30 12:22 UTC (permalink / raw)
  To: Martin Mokrejs
  Cc: Andrew Morton, Oleg Nesterov, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

Martin Mokrejs wrote:
>
>
> Konstantin Khlebnikov wrote:
>> Martin Mokrejs wrote:
>>> Andrew Morton wrote:
>>>> On Wed, 30 May 2012 00:18:31 +0400
>>>> Konstantin Khlebnikov<khlebnikov@openvz.org>   wrote:
>>>>
>>>>> Oleg Nesterov wrote:
>>>>>> On 05/22, Andrew Morton wrote:
>>>>>>>
>>>>>>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>>>>>>> happiness is important.  Has he cheered up yet?
>>>>>>
>>>>>> Well, yes, I do not really like this patch ;) Because I think there is
>>>>>> a more simple/straightforward fix, see below. In my opinion it also
>>>>>> makes the original code simpler.
>>>>>>
>>>>>> But. Obviously this is subjective, I can't prove my patch is "better",
>>>>>> and I didn't try to test it.
>>>>>>
>>>>>> So I won't argue with Konstantin who dislikes my patch, although I
>>>>>> would like to know the reason.
>>>>>
>>>>> I don't remember why I dislike your patch.
>>>>> For now I can only say ACK )
>>>>
>>>> We'll need a changelogged signed-off patch, please Oleg.  And some evidence
>>>> that it was tested would be nice ;)
>>>
>>> I will reboot in few hours, finally after few days ... I am running this first
>>> patch. I will try to test the second/alternative patch more quickly. Sorry for
>>> the delay.
>>>
>>
>> easiest way trigger this bug:
>>
>> #define _GNU_SOURCE
>> #include<unistd.h>
>> #include<sched.h>
>> #include<sys/syscall.h>
>> #include<sys/mman.h>
>>
>> static inline int sys_clone(unsigned long flags, void *stack, int *ptid, int *ctid)
>> {
>>      return syscall(SYS_clone, flags, stack, ptid, ctid);
>> }
>>
>> int main(int argc, char **argv)
>> {
>>      void *page;
>>
>>      page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>>      sys_clone(CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, page);
>> }
>>
>
> I am getting segfaults with this.
>
> (gdb) where
> #0  0x0000000000000000 in ?? ()
> #1  0x00007f430f70a7e0 in __elf_set___libc_subfreeres_element_free_mem__ () from /lib64/libc.so.6
> #2  0x00007f430f70a7e8 in __elf_set___libc_atexit_element__IO_cleanup__ () from /lib64/libc.so.6
> #3  0x0000000000000001 in ?? ()
> #4  0x0000000000000000 in ?? ()
> (gdb)
>
> What number should I give it as an argument? ;-)

There are no arguments.

Yeah, it corrupts the stack. I'm too lazy to write it properly =)
But on a non-patched kernel it also triggers this bug:
[206732.025131] BUG: Bad rss-counter state mm:ffff88000d8a6c80 idx:1 val:-1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-30 12:22                 ` Konstantin Khlebnikov
@ 2012-05-30 12:54                   ` Konstantin Khlebnikov
  2012-05-30 14:20                     ` Martin Mokrejs
  0 siblings, 1 reply; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-05-30 12:54 UTC (permalink / raw)
  To: Martin Mokrejs
  Cc: Andrew Morton, Oleg Nesterov, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

Konstantin Khlebnikov wrote:
> Martin Mokrejs wrote:
>>
>>
>> Konstantin Khlebnikov wrote:
>>> Martin Mokrejs wrote:
>>>> Andrew Morton wrote:
>>>>> On Wed, 30 May 2012 00:18:31 +0400
>>>>> Konstantin Khlebnikov<khlebnikov@openvz.org>    wrote:
>>>>>
>>>>>> Oleg Nesterov wrote:
>>>>>>> On 05/22, Andrew Morton wrote:
>>>>>>>>
>>>>>>>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>>>>>>>> happiness is important.  Has he cheered up yet?
>>>>>>>
>>>>>>> Well, yes, I do not really like this patch ;) Because I think there is
>>>>>>> a more simple/straightforward fix, see below. In my opinion it also
>>>>>>> makes the original code simpler.
>>>>>>>
>>>>>>> But. Obviously this is subjective, I can't prove my patch is "better",
>>>>>>> and I didn't try to test it.
>>>>>>>
>>>>>>> So I won't argue with Konstantin who dislikes my patch, although I
>>>>>>> would like to know the reason.
>>>>>>
>>>>>> I don't remember why I dislike your patch.
>>>>>> For now I can only say ACK )
>>>>>
>>>>> We'll need a changelogged signed-off patch, please Oleg.  And some evidence
>>>>> that it was tested would be nice ;)
>>>>
>>>> I will reboot in few hours, finally after few days ... I am running this first
>>>> patch. I will try to test the second/alternative patch more quickly. Sorry for
>>>> the delay.
>>>>
>>>
>>> easiest way trigger this bug:
>>>
>>> #define _GNU_SOURCE
>>> #include<unistd.h>
>>> #include<sched.h>
>>> #include<sys/syscall.h>
>>> #include<sys/mman.h>
>>>
>>> static inline int sys_clone(unsigned long flags, void *stack, int *ptid, int *ctid)
>>> {
>>>       return syscall(SYS_clone, flags, stack, ptid, ctid);
>>> }
>>>
>>> int main(int argc, char **argv)
>>> {
>>>       void *page;
>>>
>>>       page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>>>       sys_clone(CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, page);
>>> }
>>>
>>
>> I am getting segfaults with this.
>>
>> (gdb) where
>> #0  0x0000000000000000 in ?? ()
>> #1  0x00007f430f70a7e0 in __elf_set___libc_subfreeres_element_free_mem__ () from /lib64/libc.so.6
>> #2  0x00007f430f70a7e8 in __elf_set___libc_atexit_element__IO_cleanup__ () from /lib64/libc.so.6
>> #3  0x0000000000000001 in ?? ()
>> #4  0x0000000000000000 in ?? ()
>> (gdb)
>>
>> What number should I give it as an argument? ;-)
>
> there is no arguments.
>
> yeah it corrupts stack. I'm too lazy to write it properly =)
> but on non-patched kernel it also triggers this bug:
> [206732.025131] BUG: Bad rss-counter state mm:ffff88000d8a6c80 idx:1 val:-1
>

this version works without segfaults =)

#define _GNU_SOURCE
#include <stdlib.h>
#include <sched.h>
#include <sys/mman.h>

int child(void *arg)
{
	return 0;
}

char stack[4096];

int main(int argc, char **argv)
{
	void *page;

	page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	clone(child, stack + sizeof(stack), CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, NULL, page);
	return 0;
}
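
A toy user-space model of the ordering problem, as a rough sketch only.
It is not kernel code: the toy_* names and the single-counter structs are
hypothetical stand-ins for mm->rss_stat, task->rss_stat, sync_mm_rss() and
mm_release(), and it assumes that exactly one page is faulted in by the
put_user() on clear_child_tid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the real kernel structures differ. */
struct toy_mm {
	long rss;			/* models one mm->rss_stat counter */
};

struct toy_task {
	long rss_delta;			/* models the per-task task->rss_stat delta */
	struct toy_mm *mm;
};

/* Models sync_mm_rss(): fold the per-task delta into the mm counter. */
static void toy_sync_mm_rss(struct toy_task *t)
{
	assert(t->mm != NULL);
	t->mm->rss += t->rss_delta;
	t->rss_delta = 0;
}

/*
 * Models mm_release(): the put_user() for clear_child_tid faults in one
 * page, which is accounted in the per-task delta (an assumption of this toy).
 */
static void toy_mm_release(struct toy_task *t)
{
	t->rss_delta += 1;
}

/* Models mm teardown: unmapping the page decrements the mm counter directly. */
static void toy_unmap_page(struct toy_mm *mm)
{
	mm->rss -= 1;
}

/*
 * Returns the counter value that a check at mm teardown would see.
 * sync_after_release == 0: sync first, then release (unpatched order).
 * sync_after_release != 0: release first, then sync (order used by both fixes).
 */
long toy_exit(int sync_after_release)
{
	struct toy_mm mm = { 0 };
	struct toy_task t = { 0, &mm };

	if (!sync_after_release)
		toy_sync_mm_rss(&t);
	toy_mm_release(&t);
	if (sync_after_release)
		toy_sync_mm_rss(&t);

	/* The task drops its mm here; any unsynced delta is never flushed. */
	toy_unmap_page(&mm);
	return mm.rss;
}
```

Under these assumptions toy_exit(0) returns -1, the same sign and size as the
val:-1 in the report, and toy_exit(1) returns 0. The only difference between
the two calls is whether the flush happens before or after the release step.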


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-30 12:54                   ` Konstantin Khlebnikov
@ 2012-05-30 14:20                     ` Martin Mokrejs
  0 siblings, 0 replies; 21+ messages in thread
From: Martin Mokrejs @ 2012-05-30 14:20 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Oleg Nesterov, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org



Konstantin Khlebnikov wrote:
> Konstantin Khlebnikov wrote:
>> Martin Mokrejs wrote:
>>>
>>>
>>> Konstantin Khlebnikov wrote:
>>>> Martin Mokrejs wrote:
>>>>> Andrew Morton wrote:
>>>>>> On Wed, 30 May 2012 00:18:31 +0400
>>>>>> Konstantin Khlebnikov<khlebnikov@openvz.org>    wrote:
>>>>>>
>>>>>>> Oleg Nesterov wrote:
>>>>>>>> On 05/22, Andrew Morton wrote:
>>>>>>>>>
>>>>>>>>> Also, I have a note here that Oleg was unhappy with the patch.  Oleg
>>>>>>>>> happiness is important.  Has he cheered up yet?
>>>>>>>>
>>>>>>>> Well, yes, I do not really like this patch ;) Because I think there is
>>>>>>>> a more simple/straightforward fix, see below. In my opinion it also
>>>>>>>> makes the original code simpler.
>>>>>>>>
>>>>>>>> But. Obviously this is subjective, I can't prove my patch is "better",
>>>>>>>> and I didn't try to test it.
>>>>>>>>
>>>>>>>> So I won't argue with Konstantin who dislikes my patch, although I
>>>>>>>> would like to know the reason.
>>>>>>>
>>>>>>> I don't remember why I dislike your patch.
>>>>>>> For now I can only say ACK )
>>>>>>
>>>>>> We'll need a changelogged signed-off patch, please Oleg.  And some evidence
>>>>>> that it was tested would be nice ;)
>>>>>
>>>>> I will reboot in few hours, finally after few days ... I am running this first
>>>>> patch. I will try to test the second/alternative patch more quickly. Sorry for
>>>>> the delay.
>>>>>
>>>>
>>>> easiest way trigger this bug:
>>>>
>>>> #define _GNU_SOURCE
>>>> #include<unistd.h>
>>>> #include<sched.h>
>>>> #include<sys/syscall.h>
>>>> #include<sys/mman.h>
>>>>
>>>> static inline int sys_clone(unsigned long flags, void *stack, int *ptid, int *ctid)
>>>> {
>>>>       return syscall(SYS_clone, flags, stack, ptid, ctid);
>>>> }
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>       void *page;
>>>>
>>>>       page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>>>>       sys_clone(CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, page);
>>>> }
>>>>
>>>
>>> I am getting segfaults with this.
>>>
>>> (gdb) where
>>> #0  0x0000000000000000 in ?? ()
>>> #1  0x00007f430f70a7e0 in __elf_set___libc_subfreeres_element_free_mem__ () from /lib64/libc.so.6
>>> #2  0x00007f430f70a7e8 in __elf_set___libc_atexit_element__IO_cleanup__ () from /lib64/libc.so.6
>>> #3  0x0000000000000001 in ?? ()
>>> #4  0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>> What number should I give it as an argument? ;-)
>>
>> there is no arguments.
>>
>> yeah it corrupts stack. I'm too lazy to write it properly =)
>> but on non-patched kernel it also triggers this bug:
>> [206732.025131] BUG: Bad rss-counter state mm:ffff88000d8a6c80 idx:1 val:-1
>>
> 
> this version works without segfaults =)
> 
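> #define _GNU_SOURCE
> #include <stdlib.h>
> #include <sched.h>
> #include <sys/mman.h>
> 
> /* The child exits immediately; the kernel then clears the ctid word. */
> int child(void *arg)
> {
>     return 0;
> }
> 
> char stack[4096];
> 
> int main(int argc, char **argv)
> {
>     void *page;
> 
>     /* Untouched anonymous page: the first write to it must fault. */
>     page = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>     if (page == MAP_FAILED)
>         return 1;
>     /* ctid points at that page, so the exit-time put_user() faults it in. */
>     clone(child, stack + sizeof(stack), CLONE_VFORK | CLONE_VM | CLONE_CHILD_CLEARTID, NULL, NULL, NULL, page);
>     return 0;
> }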
> 

Thanks, this app does not crash anymore. Re-confirming that both patches fix the issue on my system.

Martin


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-29 20:18       ` Konstantin Khlebnikov
  2012-05-29 20:26         ` Andrew Morton
@ 2012-05-30 17:11         ` Oleg Nesterov
  2012-06-07  7:59           ` Konstantin Khlebnikov
  1 sibling, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2012-05-30 17:11 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, Martin Mokrejs, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

On 05/30, Konstantin Khlebnikov wrote:
>
> I don't remember why I dislike your patch.
> For now I can only say ACK )

Great.

Thanks Konstantin, thanks Martin!

I'll write the changelog and send the patch tomorrow.

Oleg.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-05-30 17:11         ` Oleg Nesterov
@ 2012-06-07  7:59           ` Konstantin Khlebnikov
  2012-06-07  8:23             ` richard -rw- weinberger
  2012-06-07 13:18             ` Oleg Nesterov
  0 siblings, 2 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-06-07  7:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Martin Mokrejs, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

Oleg Nesterov wrote:
> On 05/30, Konstantin Khlebnikov wrote:
>>
>> I don't remember why I dislike your patch.
>> For now I can only say ACK )
>
> Great.
>
> Thanks Konstantin, thanks Martin!
>
> I'll write the changelog and send the patch tomorrow.

Ding! Week is over, or I missed something? )

>
> Oleg.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-06-07  7:59           ` Konstantin Khlebnikov
@ 2012-06-07  8:23             ` richard -rw- weinberger
  2012-06-07 13:18             ` Oleg Nesterov
  1 sibling, 0 replies; 21+ messages in thread
From: richard -rw- weinberger @ 2012-06-07  8:23 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Oleg Nesterov, Andrew Morton, Martin Mokrejs, LKML,
	markus@trippelsdorf.de, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko, linux-mm@kvack.org

On Thu, Jun 7, 2012 at 9:59 AM, Konstantin Khlebnikov
<khlebnikov@openvz.org> wrote:
> Oleg Nesterov wrote:
>>
>> On 05/30, Konstantin Khlebnikov wrote:
>>>
>>>
>>> I don't remember why I dislike your patch.
>>> For now I can only say ACK )
>>
>>
>> Great.
>>
>> Thanks Konstantin, thanks Martin!
>>
>> I'll write the changelog and send the patch tomorrow.
>
>
> Ding! Week is over, or I missed something? )

FWIW, I see the same issue also on UML (3.5-rc1).

-- 
Thanks,
//richard


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-06-07  7:59           ` Konstantin Khlebnikov
  2012-06-07  8:23             ` richard -rw- weinberger
@ 2012-06-07 13:18             ` Oleg Nesterov
  2012-06-07 13:53               ` Konstantin Khlebnikov
  1 sibling, 1 reply; 21+ messages in thread
From: Oleg Nesterov @ 2012-06-07 13:18 UTC (permalink / raw)
  To: Konstantin Khlebnikov, Andrew Morton
  Cc: Martin Mokrejs, LKML, markus@trippelsdorf.de, hughd@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko, linux-mm@kvack.org

On 06/07, Konstantin Khlebnikov wrote:
>
> Oleg Nesterov wrote:
>>
>> I'll write the changelog and send the patch tomorrow.
>
> Ding! Week is over, or I missed something? )

Pong ;)

I sent the patch on May 31, see
http://marc.info/?l=linux-kernel&m=133848759505805
Also attached below, just in case.

Initially I sent 2 patches, see
http://marc.info/?l=linux-kernel&m=133848784705941
but 2/2 (your patch) was already merged.

-------------------------------------------------------------------------------
[PATCH] correctly synchronize rss-counters at exit/exec

A simplified version of Konstantin Khlebnikov's patch.

do_exit() and exec_mmap() call sync_mm_rss() before mm_release()
does put_user(clear_child_tid) which can update task->rss_stat
and thus make mm->rss_stat inconsistent. This triggers the "BUG:"
printk in check_mm().

- Move the final sync_mm_rss() from do_exit() to exit_mm(), and
  change exec_mmap() to call sync_mm_rss() after mm_release() to
  make check_mm() happy.

  Perhaps we should simply move it into mm_release() and call it
  unconditionally to catch the "task->rss_stat != 0 && !task->mm"
  bugs.

- Since taskstats_exit() is called before exit_mm(), add another
  sync_mm_rss() into xacct_add_tsk(), which actually uses rss_stat.

  Probably we should also shift acct_update_integrals().

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 fs/exec.c       |    2 +-
 kernel/exit.c   |    5 ++---
 kernel/tsacct.c |    1 +
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 52c9e2f..e49e3c2 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *mm)
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
-	sync_mm_rss(old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
+		sync_mm_rss(old_mm);
 		/*
 		 * Make sure that if there is a core dump in progress
 		 * for the old mm, we get out and die instead of going
diff --git a/kernel/exit.c b/kernel/exit.c
index ab972a7..b3a84b5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -655,6 +655,8 @@ static void exit_mm(struct task_struct * tsk)
 	mm_release(tsk, mm);
 	if (!mm)
 		return;
+
+	sync_mm_rss(mm);
 	/*
 	 * Serialize with any possible pending coredump.
 	 * We must hold mmap_sem around checking core_state
@@ -965,9 +967,6 @@ void do_exit(long code)
 				preempt_count());
 
 	acct_update_integrals(tsk);
-	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
 	group_dead = atomic_dec_and_test(&tsk->signal->live);
 	if (group_dead) {
 		hrtimer_cancel(&tsk->signal->real_timer);
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 23b4d78..a64ee90 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
 	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
 	mm = get_task_mm(p);
 	if (mm) {
+		sync_mm_rss(mm);
 		/* adjust to KB unit */
 		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
 		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59
  2012-06-07 13:18             ` Oleg Nesterov
@ 2012-06-07 13:53               ` Konstantin Khlebnikov
  0 siblings, 0 replies; 21+ messages in thread
From: Konstantin Khlebnikov @ 2012-06-07 13:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Martin Mokrejs, LKML, markus@trippelsdorf.de,
	hughd@google.com, kamezawa.hiroyu@jp.fujitsu.com, Michal Hocko,
	linux-mm@kvack.org

Oleg Nesterov wrote:
> On 06/07, Konstantin Khlebnikov wrote:
>>
>> Oleg Nesterov wrote:
>>>
>>> I'll write the changelog and send the patch tomorrow.
>>
>> Ding! Week is over, or I missed something? )
>
> Pong ;)
>
> I sent the patch on May 31, see
> http://marc.info/?l=linux-kernel&m=133848759505805
> Also attached below, just in case.
>
> Initially I sent 2 patches, see
> http://marc.info/?l=linux-kernel&m=133848784705941
> but 2/2 (your patch) was already merged.

Hmm, ok. Thanks.

I think the rss fix should go into stable-3.4.x: that "BUG..." message can alarm users.
Plus, via this bug any application can drive its rss counter down to zero =)

>
> -------------------------------------------------------------------------------
> [PATCH] correctly synchronize rss-counters at exit/exec
>
> A simplified version of Konstantin Khlebnikov's patch.
>
> do_exit() and exec_mmap() call sync_mm_rss() before mm_release()
> does put_user(clear_child_tid) which can update task->rss_stat
> and thus make mm->rss_stat inconsistent. This triggers the "BUG:"
> printk in check_mm().
>
> - Move the final sync_mm_rss() from do_exit() to exit_mm(), and
>    change exec_mmap() to call sync_mm_rss() after mm_release() to
>    make check_mm() happy.
>
>    Perhaps we should simply move it into mm_release() and call it
>    unconditionally to catch the "task->rss_stat != 0 && !task->mm"
>    bugs.
>
> - Since taskstats_exit() is called before exit_mm(), add another
>    sync_mm_rss() into xacct_add_tsk(), which actually uses rss_stat.
>
>    Probably we should also shift acct_update_integrals().
>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Tested-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>   fs/exec.c       |    2 +-
>   kernel/exit.c   |    5 ++---
>   kernel/tsacct.c |    1 +
>   3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 52c9e2f..e49e3c2 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -823,10 +823,10 @@ static int exec_mmap(struct mm_struct *mm)
>   	/* Notify parent that we're no longer interested in the old VM */
>   	tsk = current;
>   	old_mm = current->mm;
> -	sync_mm_rss(old_mm);
>   	mm_release(tsk, old_mm);
>
>   	if (old_mm) {
> +		sync_mm_rss(old_mm);
>   		/*
>   		 * Make sure that if there is a core dump in progress
>   		 * for the old mm, we get out and die instead of going
> diff --git a/kernel/exit.c b/kernel/exit.c
> index ab972a7..b3a84b5 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -655,6 +655,8 @@ static void exit_mm(struct task_struct * tsk)
>   	mm_release(tsk, mm);
>   	if (!mm)
>   		return;
> +
> +	sync_mm_rss(mm);
>   	/*
>   	 * Serialize with any possible pending coredump.
>   	 * We must hold mmap_sem around checking core_state
> @@ -965,9 +967,6 @@ void do_exit(long code)
>   				preempt_count());
>
>   	acct_update_integrals(tsk);
> -	/* sync mm's RSS info before statistics gathering */
> -	if (tsk->mm)
> -		sync_mm_rss(tsk->mm);
>   	group_dead = atomic_dec_and_test(&tsk->signal->live);
>   	if (group_dead) {
>   		hrtimer_cancel(&tsk->signal->real_timer);
> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
> index 23b4d78..a64ee90 100644
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -91,6 +91,7 @@ void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
>   	stats->virtmem = p->acct_vm_mem1 * PAGE_SIZE / MB;
>   	mm = get_task_mm(p);
>   	if (mm) {
> +		sync_mm_rss(mm);
>   		/* adjust to KB unit */
>   		stats->hiwater_rss   = get_mm_hiwater_rss(mm) * PAGE_SIZE / KB;
>   		stats->hiwater_vm    = get_mm_hiwater_vm(mm)  * PAGE_SIZE / KB;


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-06-07 13:53 UTC | newest]

Thread overview: 21+ messages
     [not found] <4FBC1618.5010408@fold.natur.cuni.cz>
2012-05-22 23:28 ` 3.4-rc7: BUG: Bad rss-counter state mm:ffff88040b56f800 idx:1 val:-59 Andrew Morton
2012-05-22 23:29   ` Andrew Morton
2012-05-23 17:21     ` Oleg Nesterov
2012-05-29 20:18       ` Konstantin Khlebnikov
2012-05-29 20:26         ` Andrew Morton
2012-05-29 21:59           ` Martin Mokrejs
2012-05-30 11:39             ` Konstantin Khlebnikov
2012-05-30 11:59               ` Martin Mokrejs
2012-05-30 12:22                 ` Konstantin Khlebnikov
2012-05-30 12:54                   ` Konstantin Khlebnikov
2012-05-30 14:20                     ` Martin Mokrejs
2012-05-30 17:11         ` Oleg Nesterov
2012-06-07  7:59           ` Konstantin Khlebnikov
2012-06-07  8:23             ` richard -rw- weinberger
2012-06-07 13:18             ` Oleg Nesterov
2012-06-07 13:53               ` Konstantin Khlebnikov
2012-05-30  9:54       ` Martin Mokrejs
2012-05-23  6:07   ` Konstantin Khlebnikov
2012-05-30  8:25     ` Martin Mokrejs
2012-05-23 17:04   ` Martin Mokrejs
2012-05-24 10:36     ` Konstantin Khlebnikov
