linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-10  7:15 Jianyu Zhan
  2014-05-10  7:17 ` [PATCH 2/3] mm: use a light-weight __mod_zone_page_state in mlocked_vma_newpage() Jianyu Zhan
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jianyu Zhan @ 2014-05-10  7:15 UTC (permalink / raw)
  To: akpm, mgorman, cody, liuj97, zhangyanfei, srivatsa.bhat, dave,
	iamjoonsoo.kim, n-horiguchi, kirill.shutemov, schwidefsky,
	nasa4836, gorcunov, riel, cl, toshi.kani, paul.gortmaker
  Cc: linux-mm, linux-kernel

__mod_zone_page_stat() is not irq-safe, so it should be used carefully.
And it is not appropirately documented now. This patch adds comment for
it, and also documents for some of its call sites.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/page_alloc.c |  2 ++
 mm/rmap.c       |  6 ++++++
 mm/vmstat.c     | 16 +++++++++++++++-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5dba293..9d6f474 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -659,6 +659,8 @@ static inline int free_pages_check(struct page *page)
  *
  * And clear the zone's pages_scanned counter, to hold off the "all pages are
  * pinned" detection logic.
+ *
+ * Note: this function should be used with irq disabled.
  */
 static void free_pcppages_bulk(struct zone *zone, int count,
 					struct per_cpu_pages *pcp)
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..6078a30 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -979,6 +979,8 @@ void page_add_anon_rmap(struct page *page,
 /*
  * Special version of the above for do_swap_page, which often runs
  * into pages that are exclusively owned by the current process.
+ * So we could use the irq-unsafe version __{inc|mod}_zone_page_stat
+ * here without others racing change it in between.
  * Everybody else should continue to use page_add_anon_rmap above.
  */
 void do_page_add_anon_rmap(struct page *page,
@@ -1077,6 +1079,10 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * And we are the last user of this page, so it is safe to use
+	 * the irq-unsafe version __{mod|dec}_zone_page here, since we
+	 * have no racer.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..778f154 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,21 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * Optimized modificatoin function.
+ *
+ * The code basically does the modification in two steps:
+ *
+ *  1. read the current counter based on the processor number
+ *  2. modificate the counter write it back.
+ *
+ * So this function should be used with the guarantee that
+ *
+ *  1. interrupts are disabled, or
+ *  2. interrupts are enabled, but no other sites would race to
+ *     modify this counter in between.
+ *
+ * Otherwise, an irq-safe version mod_zone_page_state() should
+ * be used instead.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-11 12:31 Jianyu Zhan
  2014-05-12 14:01 ` Christoph Lameter
  0 siblings, 1 reply; 15+ messages in thread
From: Jianyu Zhan @ 2014-05-11 12:31 UTC (permalink / raw)
  To: akpm, hughd, riel, nasa4836, mgorman, zhangyanfei, aarcange, fabf,
	sasha.levin, oleg, n-horiguchi, iamjoonsoo.kim, kirill.shutemov,
	gorcunov, cl, dave.hansen, toshi.kani, paul.gortmaker,
	srivatsa.bhat
  Cc: linux-mm, linux-kernel

>Your original __mod_zone_page_state happened to be correct;
>but you have no understanding of why it was correct, so its
>comment was very wrong, even after you changed the irq wording.
>
>This series just propagates your misunderstanding further,
>while providing an object lesson in how not to present a series.
>
>Sorry, you have quickly developed an unenviable reputation for
>patches which waste developers' time: please consider your
>patches much more carefully before posting them.

Hi, Hugh.

I'm sorry for my misunderstanding in the previous patches. I should
have given them more thought. I'm really sorry. {palmface} X 3

And I am really appreciated for your detailed comments and your patience
with me such a noob;-) Today I've spent some time on digging into
the history to figure out what is under the hood.

>I'd prefer to let Christoph write the definitive version,
>but my first stab at it would be:
>
>/*
> * For use when we know that interrupts are disabled,
> * or when we know that preemption is disabled and that
> * particular counter cannot be updated from interrupt context.
> */

 Seconded. Christoph, would you please write a comment? I've written
 a new one based on Hugh's, would you please also take a look? Thanks.

>Once upon a time, from 2.6.16 to 2.6.32, there was indeed a relevant
>and helpful comment in __page_set_anon_rmap():
>        /*
>         * nr_mapped state can be updated without turning off
>         * interrupts because it is not modified via interrupt.
>         */
>        __inc_page_state(nr_mapped);
>
>The comment survived the replacement of nr_mapped, but eventually
>it got cleaned away completely.
>
>It is safe to use the irq-unsafe __mod_zone_page_stat on counters
>which are never modified via interrupt.
>You are right that the comment is not good enough, but I don't trust
>your version either.  Since percpu variables are involved, it's important
>that preemption be disabled too (see comment above __inc_zone_state).

Thanks for clarifying this. I checked the history, this is my understanding
, correct me if I am wrong, thanks!

__mod_zone_page_stat() should be called with irq-off, this is a strongest
gurantee for safety. For a weaker gurantee, preemte-disable is needed and
in this situation, the item counter in question should not be modified in
interrupt context.

It is essential to have such gurantees, because __mod_zone_page_stat()
is a two-step operation : read-percpu-couter-then-modify-it.
(Need comments. Christoph, do I misunderstand it?)

For for all other call sites of __mod_zone_page_stat() in the patch,
I think your comment is right, it is because they are not modified in
interrupt context, so we could safely use __mod_zone_page_stat().
 
But for the call site in page_add_new_anon_rmap(), I think my previous
wording is appropriate: mlocked_vma_newpage() is only called in fault path
by page_add_new_anon_rmap() on a *new* page, which is initially only visible
via the pagetables, and the pte is locked while calling page_add_new_anon_rmap(),
so we are not afraid of others seeing it a this point, not even modifying it.
so whether is could be modified in interrupt context or not does matter, but it
is not the key point here.

I've renewed the patch. And in this patch, I also change page_add_new_anon_rmap to
use __mod_zone_page_stat(), since they are quite relative to be in one patch.

Thanks for your review. I am appreciated for your comments. Andrew, Christoph,
would you please review it? Thansk.

-----<8-----
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is only called in fault path by
page_add_new_anon_rmap(), which is called on a *new* page.
And such page is initially only visible via the pagetables, and the
pte is locked while calling page_add_new_anon_rmap(), so we need not
use an irq-safe mod_zone_page_state() here, using a light-weight version
__mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h |  9 ++++++++-
 mm/rmap.c     | 11 +++++++++++
 mm/vmstat.c   |  5 ++++-
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..7140c9b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,14 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * 1. this counter is not modified in interrupt context, and
+		 * 2. pte lock is held, and this a newpage, which is initially
+		 *    only visible via the pagetables. So this would exclude
+		 *    racy processes from preemting us and to modify it.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..0700253 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,12 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * 1. these counters are not modified in interrupt context, and
+		 * 2. pte lock is held, this would exclude racy processes from
+		 *    preemting us and to modify these counters.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1083,11 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * 1. these counters are not modified in interrupt context, and
+	 * 2. pte lock is held, this would exclude racy processes from
+	 *    preemting us and to modify these counters.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..e6db98d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,10 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that either
+ *  1. interrupts are disabled, or
+ *  2. preemption is disabled and that particular counter cannot be
+ *     updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-12 16:33 Jianyu Zhan
  2014-05-12 16:57 ` Christoph Lameter
  2014-05-14  3:59 ` Hugh Dickins
  0 siblings, 2 replies; 15+ messages in thread
From: Jianyu Zhan @ 2014-05-12 16:33 UTC (permalink / raw)
  To: cl, akpm, riel, aarcange, oleg, cldu, fabf, nasa4836, sasha.levin,
	zhangyanfei, iamjoonsoo.kim, n-horiguchi, kirill.shutemov, liwanp,
	gorcunov, minchan, dave.hansen, toshi.kani, paul.gortmaker,
	srivatsa.bhat
  Cc: linux-mm, linux-kernel, Hugh Dickins

>> This means they guarantee that even they are preemted the vm
>> counter won't be modified incorrectly.  Because the counter is page-related
>> (e.g., a new anon page added), and they are exclusively hold the pte lock.
>
>But there are multiple pte locks for numerous page. Another process could
>modify the counter because the pte lock for a different page was
>available which would cause counter corruption.
>
>
>> So, as you concludes in the other mail that __modd_zone_page_stat
>> couldn't be used.
>> in mlocked_vma_newpage, then what qualifies other call sites for using
>> it, in the same situation?

Thanks, now everything is clear.

I've renewed the patch, would you please review it? Thanks!

---<8---
mm: use the light version __mod_zone_page_state in mlocked_vma_newpage()

mlocked_vma_newpage() is called with pte lock held(a spinlock), which
implies preemtion disabled, and the vm stat counter is not modified from
interrupt context, so we need not use an irq-safe mod_zone_page_state() here,
using a light-weight version __mod_zone_page_state() would be OK.

This patch also documents __mod_zone_page_state() and some of its
callsites. The comment above __mod_zone_page_state() is from Hugh
Dickins, and acked by Christoph.

Most credits to Hugh and Christoph for the clarification on the usage of
the __mod_zone_page_state().

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
---
 mm/internal.h |  7 ++++++-
 mm/rmap.c     | 10 ++++++++++
 mm/vmstat.c   |  4 +++-
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..53d439e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -196,7 +196,12 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
+		/*
+		 * We use the irq-unsafe __mod_zone_page_stat because
+		 * this counter is not modified from interrupt context, and the
+		 * pte lock is held(spinlock), which implies preemtion disabled.
+		 */
+		__mod_zone_page_state(page_zone(page), NR_MLOCK,
 				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index 9c3e773..2fa4375 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -986,6 +986,11 @@ void do_page_add_anon_rmap(struct page *page,
 {
 	int first = atomic_inc_and_test(&page->_mapcount);
 	if (first) {
+		/*
+		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+		 * these counters are not modified in interrupt context, and
+		 * pte lock(a spinlock) is held, which implies preemtion disabled.
+		 */
 		if (PageTransHuge(page))
 			__inc_zone_page_state(page,
 					      NR_ANON_TRANSPARENT_HUGEPAGES);
@@ -1077,6 +1082,11 @@ void page_remove_rmap(struct page *page)
 	/*
 	 * Hugepages are not counted in NR_ANON_PAGES nor NR_FILE_MAPPED
 	 * and not charged by memcg for now.
+	 *
+	 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
+	 * these counters are not modified in interrupt context, and
+	 * these counters are not modified in interrupt context, and
+	 * pte lock(a spinlock) is held, which implies preemtion disabled.
 	 */
 	if (unlikely(PageHuge(page)))
 		goto out;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 302dd07..704928e 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -207,7 +207,9 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
 }
 
 /*
- * For use when we know that interrupts are disabled.
+ * For use when we know that interrupts are disabled,
+ * or when we know that preemption is disabled and that
+ * particular counter cannot be updated from interrupt context.
  */
 void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
 				int delta)
-- 
2.0.0-rc1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] mm: add comment for __mod_zone_page_stat
@ 2014-05-12 17:36 Jianyu Zhan
  0 siblings, 0 replies; 15+ messages in thread
From: Jianyu Zhan @ 2014-05-12 17:36 UTC (permalink / raw)
  To: cl, akpm; +Cc: linux-mm, linux-kernel, nasa4836, Hugh Dickins

>Preemption is mis-spelled throughout.
>
>Otherwise
>
>Reviewed-by: Christoph Lameter <cl@linux.com>

 Oops, corrected. Thanks.

-----<8-----

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-05-14  4:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-10  7:15 [PATCH 1/3] mm: add comment for __mod_zone_page_stat Jianyu Zhan
2014-05-10  7:17 ` [PATCH 2/3] mm: use a light-weight __mod_zone_page_state in mlocked_vma_newpage() Jianyu Zhan
2014-05-10 20:16   ` Hugh Dickins
2014-05-10  7:18 ` [PATCH 3/3] mm: rename mlocked_vma_newpage to newpage_in_mlocked_vma Jianyu Zhan
2014-05-10 20:21   ` Hugh Dickins
2014-05-10 19:51 ` [PATCH 1/3] mm: add comment for __mod_zone_page_stat Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2014-05-11 12:31 Jianyu Zhan
2014-05-12 14:01 ` Christoph Lameter
2014-05-12 15:19   ` Jianyu Zhan
2014-05-12 16:05     ` Christoph Lameter
2014-05-12 16:07     ` Christoph Lameter
2014-05-12 16:33 Jianyu Zhan
2014-05-12 16:57 ` Christoph Lameter
2014-05-14  3:59 ` Hugh Dickins
2014-05-12 17:36 Jianyu Zhan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).