From: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Markus Trippelsdorf <markus@trippelsdorf.de>,
	akpm@linux-foundation.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	khlebnikov@openvz.org, hughd@google.com, stable@vger.kernel.org
Subject: Re: [patch 12/12] mm: correctly synchronize rss-counters at exit/exec
Date: Mon, 11 Jun 2012 19:25:21 +0900
Message-ID: <4FD5C791.9090902@jp.fujitsu.com>
In-Reply-To: <20120608121816.GA23147@redhat.com>

(2012/06/08 21:18), Oleg Nesterov wrote:
> On 06/07, Linus Torvalds wrote:
>>
>> It does totally insane things in xacct_add_tsk(). You can't call
>> "sync_mm_rss(mm)" on somebody elses mm,
>
> Damn, I am stupid. Yes, I forgot about fill_stats_for_pid().
> And I didn't bother to look at get_task_mm() which clearly
> shows that this tsk can be !current.
>
> We can add the "p == current" check as Hugh suggested.
>
> But,
>
>> Doing it
>> *anywhere* where mm is not clearly "current->mm" is wrong.
>
> Agreed.
>
> How about v2? It adds sync_mm_rss() into taskstats_exit(). Note
> that it preserves the "tsk->mm != NULL" check we currently have.
> I think it should be removed (see the changelog), but even if I
> am right I'd prefer to do this in a separate patch.
>

I'm sorry I've been silent. Another fix I can think of is
this kind of change to sync_mm_rss(). What do you think?

==
 From be49ed6843b09ae33d758f2a51cf8357f7502512 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Date: Mon, 11 Jun 2012 19:45:09 +0900
Subject: [PATCH] fix sync_mm_rss() leakage.

Any page fault after sync_mm_rss() in do_exit() causes a problem
in check_mm(). It happens because the task's rss counter is not
synchronized after the last sync_mm_rss().

This patch replaces the last sync_mm_rss() with finalize_mm_rss()
and disallows per-task rss count caching after finalization.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
  fs/exec.c          |    3 ++-
  include/linux/mm.h |   10 ++++++++++
  kernel/exit.c      |    3 +--
  mm/memory.c        |   21 ++++++++++++++++++---
  4 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index a79786a..3e47772 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -819,7 +819,7 @@ static int exec_mmap(struct mm_struct *mm)
  	/* Notify parent that we're no longer interested in the old VM */
  	tsk = current;
  	old_mm = current->mm;
-	sync_mm_rss(old_mm);
+	finalize_mm_rss();
  	mm_release(tsk, old_mm);
  
  	if (old_mm) {
@@ -851,6 +851,7 @@ static int exec_mmap(struct mm_struct *mm)
  		return 0;
  	}
  	mmdrop(active_mm);
+	initialize_mm_rss();
  	return 0;
  }
  
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..995d7ff 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1129,10 +1129,20 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
  
  #if defined(SPLIT_RSS_COUNTING)
  void sync_mm_rss(struct mm_struct *mm);
+void finalize_mm_rss(void);
+void initialize_mm_rss(void);
  #else
+static inline void finalize_mm_rss(void)
+{
+}
+
  static inline void sync_mm_rss(struct mm_struct *mm)
  {
  }
+
+static inline void initialize_mm_rss(void)
+{
+}
  #endif
  
  int vma_wants_writenotify(struct vm_area_struct *vma);
diff --git a/kernel/exit.c b/kernel/exit.c
index 34867cc..2111879 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -961,8 +961,7 @@ void do_exit(long code)
  
  	acct_update_integrals(tsk);
  	/* sync mm's RSS info before statistics gathering */
-	if (tsk->mm)
-		sync_mm_rss(tsk->mm);
+	finalize_mm_rss();
  	group_dead = atomic_dec_and_test(&tsk->signal->live);
  	if (group_dead) {
  		hrtimer_cancel(&tsk->signal->real_timer);
diff --git a/mm/memory.c b/mm/memory.c
index 1b7dc66..07aa887d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -125,6 +125,20 @@ core_initcall(init_zero_pfn);
  
  #if defined(SPLIT_RSS_COUNTING)
  
+void initialize_mm_rss(void)
+{
+	current->rss_stat.events = 0;
+}
+
+void finalize_mm_rss(void)
+{
+	current->rss_stat.events = -1;
+	if (current->mm)
+		sync_mm_rss(current->mm);
+}
+
+#define rss_count_finalized(task)	((task)->rss_stat.events < 0)
+
  void sync_mm_rss(struct mm_struct *mm)
  {
  	int i;
@@ -135,14 +149,15 @@ void sync_mm_rss(struct mm_struct *mm)
  			current->rss_stat.count[i] = 0;
  		}
  	}
-	current->rss_stat.events = 0;
+	if (!rss_count_finalized(current))
+		current->rss_stat.events = 0;
  }
  
  static void add_mm_counter_fast(struct mm_struct *mm, int member, int val)
  {
  	struct task_struct *task = current;
  
-	if (likely(task->mm == mm))
+	if (likely(task->mm == mm && !rss_count_finalized(task)))
  		task->rss_stat.count[member] += val;
  	else
  		add_mm_counter(mm, member, val);
@@ -154,7 +169,7 @@ static void add_mm_counter_fast(struct mm_struct *mm, int member, int val)
  #define TASK_RSS_EVENTS_THRESH	(64)
  static void check_sync_rss_stat(struct task_struct *task)
  {
-	if (unlikely(task != current))
+	if (unlikely(task != current || rss_count_finalized(task)))
  		return;
  	if (unlikely(task->rss_stat.events++ > TASK_RSS_EVENTS_THRESH))
  		sync_mm_rss(task->mm);
-- 
1.7.4.1
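
To make the idea easier to check outside the kernel, here is a minimal
userspace sketch of the same logic. It is only a model, not kernel code:
struct mm_model, struct task_model, NR_COUNTERS and late_fault_leak() are
hypothetical names invented for this illustration, and locking, atomics
and the check_sync_rss_stat() threshold are left out. It assumes, as the
patch does, that a negative events value marks the task as finalized.

```c
#include <assert.h>

#define NR_COUNTERS 2	/* hypothetical stand-in for the kernel's counter count */

/* Shared per-mm counters (the kernel uses atomics; plain longs here). */
struct mm_model {
	long count[NR_COUNTERS];
};

/* Per-task cache of counter deltas, modelled on task->rss_stat. */
struct task_model {
	struct mm_model *mm;
	int events;			/* < 0 means "finalized" */
	long cached[NR_COUNTERS];
};

static int finalized(const struct task_model *t)
{
	return t->events < 0;
}

/* Flush the per-task cache into the shared mm counters. */
static void sync_rss(struct task_model *t)
{
	for (int i = 0; i < NR_COUNTERS; i++) {
		t->mm->count[i] += t->cached[i];
		t->cached[i] = 0;
	}
	if (!finalized(t))
		t->events = 0;	/* keep the finalized marker intact */
}

/* Last flush: mark the task first so later updates bypass the cache. */
static void finalize_rss(struct task_model *t)
{
	t->events = -1;
	if (t->mm)
		sync_rss(t);
}

/* Fast path: cache the delta unless the task is finalized. */
static void add_counter_fast(struct task_model *t, int member, long val)
{
	if (!finalized(t))
		t->cached[member] += val;
	else
		t->mm->count[member] += val;
}

/*
 * Returns what is left in the per-task cache after a "late fault",
 * i.e. an update that arrives after the last flush.
 */
long late_fault_leak(int use_finalize)
{
	struct mm_model mm = { { 0 } };
	struct task_model t = { &mm, 0, { 0 } };

	add_counter_fast(&t, 0, 3);	/* faults before exit: cached */
	if (use_finalize)
		finalize_rss(&t);
	else
		sync_rss(&t);
	add_counter_fast(&t, 0, 1);	/* late fault after the last flush */

	/* Nothing is lost in total; the question is where it ends up. */
	assert(mm.count[0] + t.cached[0] == 4);
	return t.cached[0];
}
```

In this model, late_fault_leak(0) leaves one page stranded in the
per-task cache, which is the kind of mismatch check_mm() complains
about, while late_fault_leak(1) leaves nothing behind because the late
update goes straight to the mm counters.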



