All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux kernel <linux-kernel@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mingming Cao <cmm@us.ibm.com>, "Theodore Ts'o" <tytso@mit.edu>,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH] percpu_counter: Fix __percpu_counter_sum()
Date: Sun, 07 Dec 2008 14:28:00 +0100	[thread overview]
Message-ID: <493BCF60.1080409@cosmosbay.com> (raw)
In-Reply-To: <20081206202233.3b74febc.akpm@linux-foundation.org>

Andrew Morton a écrit :
> On Wed, 03 Dec 2008 21:24:36 +0100 Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
>> Eric Dumazet a __crit :
>>
>> 1) __percpu_counter_sum() is buggy, it should not write
>> on per_cpu_ptr(fbc->counters, cpu), or another cpu
>> could get its changes lost.
>>
>> __percpu_counter_sum should be read only (const struct percpu_counter *fbc),
>> and no locking needed.
> 
> No, we can't do this - it will break ext4.
> 
> Take a closer look at 1f7c14c62ce63805f9574664a6c6de3633d4a354 and at
> e8ced39d5e8911c662d4d69a342b9d053eaaac4e.
> 
> I suggest that what we do is to revert both those changes.  We can
> worry about the possibly-unneeded spin_lock later, in a separate patch.
> 
> It should have been a separate patch anyway.  It's conceptually
> unrelated and is not a bugfix, but it was mixed in with a bugfix.
> 
> Mingming, this needs urgent consideration, please.  Note that I had to
> make additional changes to ext4 due to the subsequent introduction of
> the dirty_blocks counter.
> 
> 
> Please read the below changelogs carefully and check that I have got my
> head around this correctly - I may not have done.
> 


Hum... e8ced39d5e8911c662d4d69a342b9d053eaaac4e is probably following
the wrong path, but I see the intent. Even in the 'nr_files' case, it could
help to reduce excessive calls to percpu_counter_sum()

What we can do is to use two s64 counters (only in SMP):

s64 reference_count
s64 shadow_count

One that is guaranteed to be touched with appropriate locking
in __percpu_counter_add()

Another one that might be changed by percpu_counter_sum(), without
any locking, acting as a shadow.

Thanks

[PATCH] percpu_counter: Fix __percpu_counter_sum()

commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e (percpu_counter: 
new function percpu_counter_sum_and_set) was to make __percpu_counter_sum()
being able to recompute the estimate of a percpu_counter value.

Problem is that we cannot write on other cpus counters without racing.

What we can do is to use two s64 counter, one acting as a reference
that we should not change in __percpu_counter_sum(), another one, shadowing
the reference.

percpu_counter_read() is reading the shadow
percpu_counter_sum() reads the reference and recompute the shadow.

If a given percpu_counter is never 'summed', then its shadow_counter
is always equal to its reference.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 include/linux/percpu_counter.h |    9 +++++----
 lib/percpu_counter.c           |   27 +++++++++++++++++----------
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 9007ccd..71b5c5d 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -17,7 +17,8 @@
 
 struct percpu_counter {
 	spinlock_t lock;
-	s64 count;
+	s64	   reference_count;
+	s64	   shadow_count;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
@@ -55,7 +56,7 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return fbc->shadow_count;
 }
 
 /*
@@ -65,9 +66,9 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
-	barrier();		/* Prevent reloads of fbc->count */
+	barrier();		/* Prevent reloads of fbc->shadow_count */
 	if (ret >= 0)
 		return ret;
 	return 1;
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index a866389..44ec857 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -14,6 +14,9 @@ static LIST_HEAD(percpu_counters);
 static DEFINE_MUTEX(percpu_counters_lock);
 #endif
 
+/*
+ * Note : This function is racy
+ */
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
@@ -23,7 +26,8 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
+	fbc->reference_count = amount;
+	fbc->shadow_count = amount;
 	spin_unlock(&fbc->lock);
 }
 EXPORT_SYMBOL(percpu_counter_set);
@@ -38,7 +42,8 @@ void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 	count = *pcount + amount;
 	if (count >= batch || count <= -batch) {
 		spin_lock(&fbc->lock);
-		fbc->count += count;
+		fbc->reference_count += count;
+		fbc->shadow_count += count;
 		*pcount = 0;
 		spin_unlock(&fbc->lock);
 	} else {
@@ -57,16 +62,16 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 	s64 ret;
 	int cpu;
 
-	spin_lock(&fbc->lock);
-	ret = fbc->count;
+	ret = fbc->reference_count;
 	for_each_online_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
-		*pcount = 0;
 	}
-	fbc->count = ret;
-
-	spin_unlock(&fbc->lock);
+	/*
+	 * Update fbc->shadow_count so that percpu_counter_read()
+	 * can have a better idea of this counter 'value'
+	 */
+	fbc->shadow_count = ret;
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
@@ -76,7 +81,8 @@ static struct lock_class_key percpu_counter_irqsafe;
 int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
 {
 	spin_lock_init(&fbc->lock);
-	fbc->count = amount;
+	fbc->shadow_count = amount;
+	fbc->reference_count = amount;
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
@@ -132,7 +138,8 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 
 		spin_lock_irqsave(&fbc->lock, flags);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		fbc->reference_count += *pcount;
+		fbc->shadow_count += *pcount;
 		*pcount = 0;
 		spin_unlock_irqrestore(&fbc->lock, flags);
 	}

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

WARNING: multiple messages have this Message-ID (diff)
From: Eric Dumazet <dada1@cosmosbay.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux kernel <linux-kernel@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mingming Cao <cmm@us.ibm.com>, "Theodore Ts'o" <tytso@mit.edu>,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH] percpu_counter: Fix __percpu_counter_sum()
Date: Sun, 07 Dec 2008 14:28:00 +0100	[thread overview]
Message-ID: <493BCF60.1080409@cosmosbay.com> (raw)
In-Reply-To: <20081206202233.3b74febc.akpm@linux-foundation.org>

Andrew Morton a écrit :
> On Wed, 03 Dec 2008 21:24:36 +0100 Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
>> Eric Dumazet a __crit :
>>
>> 1) __percpu_counter_sum() is buggy, it should not write
>> on per_cpu_ptr(fbc->counters, cpu), or another cpu
>> could get its changes lost.
>>
>> __percpu_counter_sum should be read only (const struct percpu_counter *fbc),
>> and no locking needed.
> 
> No, we can't do this - it will break ext4.
> 
> Take a closer look at 1f7c14c62ce63805f9574664a6c6de3633d4a354 and at
> e8ced39d5e8911c662d4d69a342b9d053eaaac4e.
> 
> I suggest that what we do is to revert both those changes.  We can
> worry about the possibly-unneeded spin_lock later, in a separate patch.
> 
> It should have been a separate patch anyway.  It's conceptually
> unrelated and is not a bugfix, but it was mixed in with a bugfix.
> 
> Mingming, this needs urgent consideration, please.  Note that I had to
> make additional changes to ext4 due to the subsequent introduction of
> the dirty_blocks counter.
> 
> 
> Please read the below changelogs carefully and check that I have got my
> head around this correctly - I may not have done.
> 


Hum... e8ced39d5e8911c662d4d69a342b9d053eaaac4e is probably following
the wrong path, but I see the intent. Even in the 'nr_files' case, it could
help to reduce excessive calls to percpu_counter_sum()

What we can do is to use two s64 counters (only in SMP):

s64 reference_count
s64 shadow_count

One that is guaranteed to be touched with appropriate locking
in __percpu_counter_add()

Another one that might be changed by percpu_counter_sum(), without
any locking, acting as a shadow.

Thanks

[PATCH] percpu_counter: Fix __percpu_counter_sum()

commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e (percpu_counter: 
new function percpu_counter_sum_and_set) was to make __percpu_counter_sum()
being able to recompute the estimate of a percpu_counter value.

Problem is that we cannot write on other cpus counters without racing.

What we can do is to use two s64 counter, one acting as a reference
that we should not change in __percpu_counter_sum(), another one, shadowing
the reference.

percpu_counter_read() is reading the shadow
percpu_counter_sum() reads the reference and recompute the shadow.

If a given percpu_counter is never 'summed', then its shadow_counter
is always equal to its reference.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
 include/linux/percpu_counter.h |    9 +++++----
 lib/percpu_counter.c           |   27 +++++++++++++++++----------
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 9007ccd..71b5c5d 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -17,7 +17,8 @@
 
 struct percpu_counter {
 	spinlock_t lock;
-	s64 count;
+	s64	   reference_count;
+	s64	   shadow_count;
 #ifdef CONFIG_HOTPLUG_CPU
 	struct list_head list;	/* All percpu_counters are on a list */
 #endif
@@ -55,7 +56,7 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 
 static inline s64 percpu_counter_read(struct percpu_counter *fbc)
 {
-	return fbc->count;
+	return fbc->shadow_count;
 }
 
 /*
@@ -65,9 +66,9 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
  */
 static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
 {
-	s64 ret = fbc->count;
+	s64 ret = percpu_counter_read(fbc);
 
-	barrier();		/* Prevent reloads of fbc->count */
+	barrier();		/* Prevent reloads of fbc->shadow_count */
 	if (ret >= 0)
 		return ret;
 	return 1;
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index a866389..44ec857 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -14,6 +14,9 @@ static LIST_HEAD(percpu_counters);
 static DEFINE_MUTEX(percpu_counters_lock);
 #endif
 
+/*
+ * Note : This function is racy
+ */
 void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 {
 	int cpu;
@@ -23,7 +26,8 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		*pcount = 0;
 	}
-	fbc->count = amount;
+	fbc->reference_count = amount;
+	fbc->shadow_count = amount;
 	spin_unlock(&fbc->lock);
 }
 EXPORT_SYMBOL(percpu_counter_set);
@@ -38,7 +42,8 @@ void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 	count = *pcount + amount;
 	if (count >= batch || count <= -batch) {
 		spin_lock(&fbc->lock);
-		fbc->count += count;
+		fbc->reference_count += count;
+		fbc->shadow_count += count;
 		*pcount = 0;
 		spin_unlock(&fbc->lock);
 	} else {
@@ -57,16 +62,16 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 	s64 ret;
 	int cpu;
 
-	spin_lock(&fbc->lock);
-	ret = fbc->count;
+	ret = fbc->reference_count;
 	for_each_online_cpu(cpu) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
-		*pcount = 0;
 	}
-	fbc->count = ret;
-
-	spin_unlock(&fbc->lock);
+	/*
+	 * Update fbc->shadow_count so that percpu_counter_read()
+	 * can have a better idea of this counter 'value'
+	 */
+	fbc->shadow_count = ret;
 	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
@@ -76,7 +81,8 @@ static struct lock_class_key percpu_counter_irqsafe;
 int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
 {
 	spin_lock_init(&fbc->lock);
-	fbc->count = amount;
+	fbc->shadow_count = amount;
+	fbc->reference_count = amount;
 	fbc->counters = alloc_percpu(s32);
 	if (!fbc->counters)
 		return -ENOMEM;
@@ -132,7 +138,8 @@ static int __cpuinit percpu_counter_hotcpu_callback(struct notifier_block *nb,
 
 		spin_lock_irqsave(&fbc->lock, flags);
 		pcount = per_cpu_ptr(fbc->counters, cpu);
-		fbc->count += *pcount;
+		fbc->reference_count += *pcount;
+		fbc->shadow_count += *pcount;
 		*pcount = 0;
 		spin_unlock_irqrestore(&fbc->lock, flags);
 	}


  parent reply	other threads:[~2008-12-07 13:28 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-03 18:40 [PATCH] percpu_counter: fix CPU unplug race in percpu_counter_destroy() Eric Dumazet
2008-12-03 20:24 ` [PATCH] percpu_counter: Fix __percpu_counter_sum() Eric Dumazet
2008-12-03 20:45   ` Peter Zijlstra
2008-12-04  6:14   ` David Miller
2008-12-07  4:22   ` Andrew Morton
2008-12-07  4:22     ` Andrew Morton
2008-12-07 10:25     ` Peter Zijlstra
2008-12-07 13:28     ` Eric Dumazet [this message]
2008-12-07 13:28       ` Eric Dumazet
2008-12-07 17:28       ` Andrew Morton
2008-12-07 18:00         ` Eric Dumazet
2008-12-07 18:00           ` Eric Dumazet
2008-12-08  4:52           ` Andrew Morton
2008-12-08 22:12             ` Theodore Tso
2008-12-08 22:20               ` Peter Zijlstra
2008-12-08 23:00                 ` Theodore Tso
2008-12-08 23:05                   ` Peter Zijlstra
2008-12-08 23:08                     ` Peter Zijlstra
2008-12-09  8:12                     ` Eric Dumazet
2008-12-09  8:12                       ` Eric Dumazet
2008-12-09  8:34                       ` Peter Zijlstra
2008-12-09  8:34                         ` Peter Zijlstra
2008-12-10  5:09                         ` Eric Dumazet
2008-12-10  5:49                           ` Andrew Morton
2008-12-10 22:56                             ` Eric Dumazet
2008-12-10 22:56                               ` Eric Dumazet
2008-12-12  8:17                               ` Rusty Russell
2008-12-12  8:22                                 ` Eric Dumazet
2008-12-12  8:22                                   ` Eric Dumazet
2008-12-12 11:08                                 ` [PATCH] percpu_counter: use local_t and atomic_long_t if possible Eric Dumazet
2008-12-12 11:08                                   ` Eric Dumazet
2008-12-12 11:29                                   ` Peter Zijlstra
2008-12-23 11:43                                   ` Peter Zijlstra
2008-12-25 13:26                                     ` Rusty Russell
2008-12-15 12:53                             ` [PATCH] percpu_counter: Fix __percpu_counter_sum() Rusty Russell
2008-12-16 20:16                               ` Ingo Molnar
2008-12-10  7:12                           ` Peter Zijlstra
2008-12-08 23:07                   ` Andrew Morton
2008-12-08 23:49                     ` Theodore Tso
2008-12-08 22:22               ` Andrew Morton
2008-12-08 22:44               ` Mingming Cao
2008-12-08 22:44                 ` Mingming Cao
2008-12-07 22:24         ` [PATCH] atomic: fix a typo in atomic_long_xchg() Eric Dumazet
2008-12-07 22:24           ` Eric Dumazet
2008-12-07 15:28     ` [PATCH] percpu_counter: Fix __percpu_counter_sum() Theodore Tso
2008-12-08  4:42       ` Andrew Morton
2008-12-08 17:55         ` Mingming Cao
2008-12-08 17:55           ` Mingming Cao
2008-12-11 16:32           ` [stable] " Greg KH
2008-12-08 17:44     ` Mingming Cao
2008-12-08 17:44       ` Mingming Cao
2008-12-04  6:13 ` [PATCH] percpu_counter: fix CPU unplug race in percpu_counter_destroy() David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=493BCF60.1080409@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=cmm@us.ibm.com \
    --cc=davem@davemloft.net \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.