All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andi Kleen <ak@suse.de>
Subject: Re: [RFC] Event counters [1/3]: Basic counter functionality
Date: Sat, 31 Dec 2005 18:26:02 -0200	[thread overview]
Message-ID: <20051231202602.GC3903@dmt.cnet> (raw)
In-Reply-To: <43B63931.6000307@yahoo.com.au>

Hi Nick!

On Sat, Dec 31, 2005 at 06:54:25PM +1100, Nick Piggin wrote:
> Marcelo Tosatti wrote:
> 
> >
> >What about this addition to the documentation above, to make it a little 
> >more verbose:
> >
> >	The possible race scenario is restricted to kernel preemption,
> >	and could happen as follows:
> >
> >	thread A				thread B
> >a)	movl    xyz(%ebp), %eax			movl    xyz(%ebp), %eax
> >b)	incl    %eax				incl    %eax
> >c)	movl    %eax, xyz(%ebp)			movl    %eax, xyz(%ebp)
> >
> >Thread A can be preempted in b), and thread B succesfully increments the
> >counter, writing it back to memory. Now thread A resumes execution, with
> >its stale copy of the counter, and overwrites the current counter.
> >
> >Resulting in increments lost.
> >
> >However that should be relatively rare condition.
> >
> 
> Hi Guys,
> 
> I've been waiting for some mm/ patches to clear from -mm before commenting
> too much... however I see that this patch is actually against -mm itself,
> with my __mod_page_state stuff in it... that makes the page state accounting
> much lighter weight AND is not racy.

It is racy with reference to preempt (please refer to the race condition
described above):

diff -puN mm/rmap.c~mm-page_state-opt mm/rmap.c
--- devel/mm/rmap.c~mm-page_state-opt   2005-12-13 22:25:01.000000000 -0800
+++ devel-akpm/mm/rmap.c        2005-12-13 22:25:01.000000000 -0800
@@ -451,7 +451,11 @@ static void __page_set_anon_rmap(struct 

        page->index = linear_page_index(vma, address);
 
-       inc_page_state(nr_mapped);
+       /*
+        * nr_mapped state can be updated without turning off
+        * interrupts because it is not modified via interrupt.
+        */
+       __inc_page_state(nr_mapped);
 }

And since "nr_mapped" is not a counter for debugging purposes only, you 
can't be lazy with reference to its consistency.

I would argue that you need a preempt save version for this important
counters, surrounded by preempt_disable/preempt_enable (which vanish 
if one selects !CONFIG_PREEMPT).

As Christoph notes, debugging counter consistency can be lazy, not even
requiring correct preempt locking (hum, this is debatable, needs careful
verification).
 
> So I'm not exactly sure why such a patch as this is wanted now? Are there
> any more xxx_page_state hotspots? (I admit to only looking at page faults,
> page allocator, and page reclaim).

A consolidation of the good parts of both would be interesting.

I don't see much point in Christoph's naming change to "event_counter", 
why are you doing that?

And follows an addition to your's mm-page_state-opt-docs.patch. Still
need to verify "nr_dirty" and "nr_unstable".

Happy new year!

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 343083f..f173e0f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -83,10 +83,14 @@
 struct page_state {
 	unsigned long nr_dirty;		/* Dirty writeable pages */
 	unsigned long nr_writeback;	/* Pages under writeback */
+					/* also modified from IRQ context */
 	unsigned long nr_unstable;	/* NFS unstable pages */
 	unsigned long nr_page_table_pages;/* Pages used for pagetables */
+					/* only modified from process context */
 	unsigned long nr_mapped;	/* mapped into pagetables */
+					/* only modified from process context */
 	unsigned long nr_slab;		/* In slab */
+					/* also modified from IRQ context */
 #define GET_PAGE_STATE_LAST nr_slab
 
 	/*

WARNING: multiple messages have this Message-ID (diff)
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <clameter@sgi.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andi Kleen <ak@suse.de>
Subject: Re: [RFC] Event counters [1/3]: Basic counter functionality
Date: Sat, 31 Dec 2005 18:26:02 -0200	[thread overview]
Message-ID: <20051231202602.GC3903@dmt.cnet> (raw)
In-Reply-To: <43B63931.6000307@yahoo.com.au>

Hi Nick!

On Sat, Dec 31, 2005 at 06:54:25PM +1100, Nick Piggin wrote:
> Marcelo Tosatti wrote:
> 
> >
> >What about this addition to the documentation above, to make it a little 
> >more verbose:
> >
> >	The possible race scenario is restricted to kernel preemption,
> >	and could happen as follows:
> >
> >	thread A				thread B
> >a)	movl    xyz(%ebp), %eax			movl    xyz(%ebp), %eax
> >b)	incl    %eax				incl    %eax
> >c)	movl    %eax, xyz(%ebp)			movl    %eax, xyz(%ebp)
> >
> >Thread A can be preempted in b), and thread B succesfully increments the
> >counter, writing it back to memory. Now thread A resumes execution, with
> >its stale copy of the counter, and overwrites the current counter.
> >
> >Resulting in increments lost.
> >
> >However that should be relatively rare condition.
> >
> 
> Hi Guys,
> 
> I've been waiting for some mm/ patches to clear from -mm before commenting
> too much... however I see that this patch is actually against -mm itself,
> with my __mod_page_state stuff in it... that makes the page state accounting
> much lighter weight AND is not racy.

It is racy with reference to preempt (please refer to the race condition
described above):

diff -puN mm/rmap.c~mm-page_state-opt mm/rmap.c
--- devel/mm/rmap.c~mm-page_state-opt   2005-12-13 22:25:01.000000000 -0800
+++ devel-akpm/mm/rmap.c        2005-12-13 22:25:01.000000000 -0800
@@ -451,7 +451,11 @@ static void __page_set_anon_rmap(struct 

        page->index = linear_page_index(vma, address);
 
-       inc_page_state(nr_mapped);
+       /*
+        * nr_mapped state can be updated without turning off
+        * interrupts because it is not modified via interrupt.
+        */
+       __inc_page_state(nr_mapped);
 }

And since "nr_mapped" is not a counter for debugging purposes only, you 
can't be lazy with reference to its consistency.

I would argue that you need a preempt save version for this important
counters, surrounded by preempt_disable/preempt_enable (which vanish 
if one selects !CONFIG_PREEMPT).

As Christoph notes, debugging counter consistency can be lazy, not even
requiring correct preempt locking (hum, this is debatable, needs careful
verification).
 
> So I'm not exactly sure why such a patch as this is wanted now? Are there
> any more xxx_page_state hotspots? (I admit to only looking at page faults,
> page allocator, and page reclaim).

A consolidation of the good parts of both would be interesting.

I don't see much point in Christoph's naming change to "event_counter", 
why are you doing that?

And follows an addition to your's mm-page_state-opt-docs.patch. Still
need to verify "nr_dirty" and "nr_unstable".

Happy new year!

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 343083f..f173e0f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -83,10 +83,14 @@
 struct page_state {
 	unsigned long nr_dirty;		/* Dirty writeable pages */
 	unsigned long nr_writeback;	/* Pages under writeback */
+					/* also modified from IRQ context */
 	unsigned long nr_unstable;	/* NFS unstable pages */
 	unsigned long nr_page_table_pages;/* Pages used for pagetables */
+					/* only modified from process context */
 	unsigned long nr_mapped;	/* mapped into pagetables */
+					/* only modified from process context */
 	unsigned long nr_slab;		/* In slab */
+					/* also modified from IRQ context */
 #define GET_PAGE_STATE_LAST nr_slab
 
 	/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2005-12-31 22:26 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-20 23:57 [RFC] Event counters [1/3]: Basic counter functionality Christoph Lameter
2005-12-20 23:57 ` Christoph Lameter
2005-12-20 23:57 ` [RFC] Event counters [2/3]: Convert inc_page_state -> count_event Christoph Lameter
2005-12-20 23:57   ` Christoph Lameter
2005-12-20 23:57 ` [RFC] Event counters [3/3]: Convert NUMA counters to event counters Christoph Lameter
2005-12-20 23:57   ` Christoph Lameter
2005-12-21 22:24   ` Christoph Lameter
2005-12-21 22:24     ` Christoph Lameter
2005-12-31  6:46 ` [RFC] Event counters [1/3]: Basic counter functionality Marcelo Tosatti
2005-12-31  6:46   ` Marcelo Tosatti
2005-12-31  7:54   ` Nick Piggin
2005-12-31  7:54     ` Nick Piggin
2005-12-31 20:26     ` Marcelo Tosatti [this message]
2005-12-31 20:26       ` Marcelo Tosatti
2006-01-02 21:40       ` Marcelo Tosatti
2006-01-02 21:40         ` Marcelo Tosatti
2006-01-03  5:11         ` Nick Piggin
2006-01-03  5:11           ` Nick Piggin
2006-01-03 10:11           ` Marcelo Tosatti
2006-01-03 10:11             ` Marcelo Tosatti
2006-01-03 13:21             ` Nick Piggin
2006-01-03 13:21               ` Nick Piggin
2006-01-03 12:06               ` Marcelo Tosatti
2006-01-03 12:06                 ` Marcelo Tosatti
2006-01-03  5:01       ` Nick Piggin
2006-01-03  5:01         ` Nick Piggin
2006-01-03 19:08     ` Christoph Lameter
2006-01-03 19:08       ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20051231202602.GC3903@dmt.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=ak@suse.de \
    --cc=clameter@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.