* [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
@ 2006-04-10 17:58 Mingming Cao
2006-04-11 17:09 ` Christoph Lameter
2006-04-21 14:59 ` [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 " Mingming Cao
0 siblings, 2 replies; 17+ messages in thread
From: Mingming Cao @ 2006-04-10 17:58 UTC (permalink / raw)
To: akpm; +Cc: kiran, Laurent Vivier, linux-kernel, ext2-devel, linux-fsdevel
Here are the proposed patches to allow the ext3 free block accounting
works with more than 8TB storage.
[PATCH 1] - Tries to fix the per cpu counter to handle the "overflow"
when dealing with unsigned long counters.
[PATCH 2] - Currently percpu_counter_read_positive() always return 1 if
the counter(singed type) is negative. This leads the ext3 always get
free blocks as 1 if there are more than 2**31 free blocks, thus prevent
non-root users to write(file creation) to the filesystem. This patch
fixed this by using percpu_counter_read() instead.
[PATCH 3] - Changes the places in ext3 when updating the free blocks
counter to use percpu_counter_mod_ll()(added in patch 1) to prevent
overflow.
patches against 2.6.16-mm2. Tested on a freshly created 10TB ext3,
filled the first 8TB storage with 6000 parallel dd (direct IO) first,
then tested the rest 2TB with overnight fsx.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-10 17:58 [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter Mingming Cao
@ 2006-04-11 17:09 ` Christoph Lameter
2006-04-11 19:01 ` Mingming Cao
2006-04-21 14:59 ` [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 " Mingming Cao
1 sibling, 1 reply; 17+ messages in thread
From: Christoph Lameter @ 2006-04-11 17:09 UTC (permalink / raw)
To: Mingming Cao
Cc: akpm, kiran, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Mon, 10 Apr 2006, Mingming Cao wrote:
> Here are the proposed patches to allow the ext3 free block accounting
> works with more than 8TB storage.
Umm.. This is an issue on 32 bit platforms only. 64bit platforms x86_64,
ia64 etc do not need this. Would you make it arch specific?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-11 17:09 ` Christoph Lameter
@ 2006-04-11 19:01 ` Mingming Cao
2006-04-11 22:20 ` Ravikiran G Thirumalai
0 siblings, 1 reply; 17+ messages in thread
From: Mingming Cao @ 2006-04-11 19:01 UTC (permalink / raw)
To: Christoph Lameter
Cc: akpm, kiran, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Tue, 2006-04-11 at 10:09 -0700, Christoph Lameter wrote:
> On Mon, 10 Apr 2006, Mingming Cao wrote:
>
> > Here are the proposed patches to allow the ext3 free block accounting
> > works with more than 8TB storage.
>
> Umm.. This is an issue on 32 bit platforms only. 64bit platforms x86_64,
> ia64 etc do not need this. Would you make it arch specific?
Yes, make sense. I will update my patch soon. Thanks.
Mingming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-11 19:01 ` Mingming Cao
@ 2006-04-11 22:20 ` Ravikiran G Thirumalai
2006-04-12 21:28 ` [Ext2-devel] " Mingming Cao
0 siblings, 1 reply; 17+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-11 22:20 UTC (permalink / raw)
To: Mingming Cao
Cc: Christoph Lameter, akpm, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Tue, Apr 11, 2006 at 12:01:13PM -0700, Mingming Cao wrote:
> On Tue, 2006-04-11 at 10:09 -0700, Christoph Lameter wrote:
> > On Mon, 10 Apr 2006, Mingming Cao wrote:
> >
> > > Here are the proposed patches to allow the ext3 free block accounting
> > > works with more than 8TB storage.
> >
> > Umm.. This is an issue on 32 bit platforms only. 64bit platforms x86_64,
> > ia64 etc do not need this. Would you make it arch specific?
>
> Yes, make sense. I will update my patch soon. Thanks.
Or make the counter s64? so that it stays 64 bit on all arches?
OR
why not change the global per-cpu counter type to unsigned long (as we
discussed earlier), so we don't need the extra "ul" flags and interfaces,
and all arches get a standard unsigned long return type? We could also
do away with percpu_read_positive then no? The applications for per-cpu
counters is going to be upcounters always methinks...
Thanks,
Kiran
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Ext2-devel] Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-11 22:20 ` Ravikiran G Thirumalai
@ 2006-04-12 21:28 ` Mingming Cao
2006-04-12 21:50 ` Andreas Dilger
2006-04-13 19:02 ` Ravikiran G Thirumalai
0 siblings, 2 replies; 17+ messages in thread
From: Mingming Cao @ 2006-04-12 21:28 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: Christoph Lameter, akpm, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Tue, 2006-04-11 at 15:20 -0700, Ravikiran G Thirumalai wrote:
> On Tue, Apr 11, 2006 at 12:01:13PM -0700, Mingming Cao wrote:
> > On Tue, 2006-04-11 at 10:09 -0700, Christoph Lameter wrote:
> > > On Mon, 10 Apr 2006, Mingming Cao wrote:
> > >
> > > > Here are the proposed patches to allow the ext3 free block accounting
> > > > works with more than 8TB storage.
> > >
> > > Umm.. This is an issue on 32 bit platforms only. 64bit platforms x86_64,
> > > ia64 etc do not need this. Would you make it arch specific?
> >
> > Yes, make sense. I will update my patch soon. Thanks.
>
I was thinking something like this:
+void percpu_counter_mod_ul(struct percpu_counter *fbc, long amount)
+{
+ preempt_disable();
+ __percpu_counter_mod(fbc, amount, BITS_PER_LONG<=32);
+ preempt_enable();
+}
+EXPORT_SYMBOL(percpu_counter_mod_ul);
where the check for unsigned long overflow is only turned on 32 bit
platforms.
> Or make the counter s64? so that it stays 64 bit on all arches?
>
Well, don't we have the problem : 64 bit counter add/dec/update is not
always atomic on all 32 bit platforms? There are risk that we will get
bogus global value.
> OR
> why not change the global per-cpu counter type to unsigned long (as we
> discussed earlier), so we don't need the extra "ul" flags and interfaces,
> and all arches get a standard unsigned long return type?
> We could also
> do away with percpu_read_positive then no? The applications for per-cpu
> counters is going to be upcounters always methinks...
>
yeah...I am not so happy with the extra "ul" checking flags either. But
as you have mentioned before, the signed global counter type is there
for some cases when the global counter becomes temporally negative
( although the counter in real life should always positive). What should
we do if the global counter is a unsigned value, was initialized to 0,
and now we add -5 to it(-5 is from one local counter, then we will get a
bogus big value)?
ext3 free blocks counter is always initialized to a big positive value
so I doubt above concern is a real issue for ext3. Perhaps you thought
this through that it's okay to change the global counter to unsigned
long for other applications of PCP counters?
Thanks,
Mingming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Ext2-devel] Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-12 21:28 ` [Ext2-devel] " Mingming Cao
@ 2006-04-12 21:50 ` Andreas Dilger
2006-04-13 19:02 ` Ravikiran G Thirumalai
1 sibling, 0 replies; 17+ messages in thread
From: Andreas Dilger @ 2006-04-12 21:50 UTC (permalink / raw)
To: Mingming Cao
Cc: Ravikiran G Thirumalai, Christoph Lameter, akpm, Laurent Vivier,
linux-kernel, ext2-devel, linux-fsdevel
On Apr 12, 2006 14:28 -0700, Mingming Cao wrote:
> where the check for unsigned long overflow is only turned on 32 bit
> platforms.
>
> > Or make the counter s64? so that it stays 64 bit on all arches?
> >
>
> Well, don't we have the problem : 64 bit counter add/dec/update is not
> always atomic on all 32 bit platforms? There are risk that we will get
> bogus global value.
My thought here is that the per-cpu counter could still be a 32-bit counter
and the global value could be a 64-bit value. That way, we don't need to
mess with 64-bit math in the common case, and we can still have a 64-bit
global value. The minor drawback would be that we can't have a per-cpu
delta of more than 2^31 at a time, but I don't think this is a worry here.
> > why not change the global per-cpu counter type to unsigned long (as we
> > discussed earlier), so we don't need the extra "ul" flags and interfaces,
> > and all arches get a standard unsigned long return type?
> > We could also
> > do away with percpu_read_positive then no? The applications for per-cpu
> > counters is going to be upcounters always methinks...
The "percpu_read_positive" usage is broken in any case, since it doesn't
correctly handle the case where there is no space in the filesystem at
all. The calling code (ext3_statfs) really needs to just call percpu_read()
and then return zero if this is negative.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Ext2-devel] Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-12 21:28 ` [Ext2-devel] " Mingming Cao
2006-04-12 21:50 ` Andreas Dilger
@ 2006-04-13 19:02 ` Ravikiran G Thirumalai
2006-04-13 22:25 ` Mingming Cao
1 sibling, 1 reply; 17+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-13 19:02 UTC (permalink / raw)
To: Mingming Cao
Cc: Christoph Lameter, akpm, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Wed, Apr 12, 2006 at 02:28:35PM -0700, Mingming Cao wrote:
> On Tue, 2006-04-11 at 15:20 -0700, Ravikiran G Thirumalai wrote:
> > On Tue, Apr 11, 2006 at 12:01:13PM -0700, Mingming Cao wrote:
> > > On Tue, 2006-04-11 at 10:09 -0700, Christoph Lameter wrote:
>
>
> where the check for unsigned long overflow is only turned on 32 bit
> platforms.
>
> > Or make the counter s64? so that it stays 64 bit on all arches?
> >
>
> Well, don't we have the problem : 64 bit counter add/dec/update is not
> always atomic on all 32 bit platforms? There are risk that we will get
> bogus global value.
Yes, but the global counter is modified with a lock in the SMP case, and the
local counters are modified by their respective cpus only, Although there
might be more subtle issues .....
>
> > OR
> > why not change the global per-cpu counter type to unsigned long (as we
> > discussed earlier), so we don't need the extra "ul" flags and interfaces,
> > and all arches get a standard unsigned long return type?
> > We could also
> > do away with percpu_read_positive then no? The applications for per-cpu
> > counters is going to be upcounters always methinks...
> >
>
> yeah...I am not so happy with the extra "ul" checking flags either. But
> as you have mentioned before, the signed global counter type is there
> for some cases when the global counter becomes temporally negative
> ( although the counter in real life should always positive). What should
> we do if the global counter is a unsigned value, was initialized to 0,
> and now we add -5 to it(-5 is from one local counter, then we will get a
> bogus big value)?
I thought the solution to this was to have a global unsigned counter, and
signed local counter, and defer updates to the global if it is going to be a
large value due to the case above. This way the global counter remains an up
counter no?
Thanks,
Kiran
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Ext2-devel] Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-13 19:02 ` Ravikiran G Thirumalai
@ 2006-04-13 22:25 ` Mingming Cao
2006-04-14 0:20 ` Ravikiran G Thirumalai
0 siblings, 1 reply; 17+ messages in thread
From: Mingming Cao @ 2006-04-13 22:25 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: Christoph Lameter, akpm, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Thu, 2006-04-13 at 12:02 -0700, Ravikiran G Thirumalai wrote:
> On Wed, Apr 12, 2006 at 02:28:35PM -0700, Mingming Cao wrote:
> > On Tue, 2006-04-11 at 15:20 -0700, Ravikiran G Thirumalai wrote:
> > > On Tue, Apr 11, 2006 at 12:01:13PM -0700, Mingming Cao wrote:
> > > > On Tue, 2006-04-11 at 10:09 -0700, Christoph Lameter wrote:
> >
> >
> > where the check for unsigned long overflow is only turned on 32 bit
> > platforms.
> >
> > > Or make the counter s64? so that it stays 64 bit on all arches?
> > >
> >
> > Well, don't we have the problem : 64 bit counter add/dec/update is not
> > always atomic on all 32 bit platforms? There are risk that we will get
> > bogus global value.
>
> Yes, but the global counter is modified with a lock in the SMP case, and the
> local counters are modified by their respective cpus only, Although there
> might be more subtle issues .....
>
Hmm, I was worried about the read to the 64 bit global counter, there is
no lock to protect it from getting an half updated 64 bit counter. But I
think the window is probably small, and with what Andreas suggested
(keep the local per cpu counter as 32 bit value), we would avoid this
non-atomic math in most cases.
Well,anyway this counter is just an approximate value, and currently
(with 32 bit global counter) we could still read an old global counter
value while it's being updated since no lock is required while read it.
> >
> > > OR
> > > why not change the global per-cpu counter type to unsigned long (as we
> > > discussed earlier), so we don't need the extra "ul" flags and interfaces,
> > > and all arches get a standard unsigned long return type?
> > > We could also
> > > do away with percpu_read_positive then no? The applications for per-cpu
> > > counters is going to be upcounters always methinks...
> > >
> >
> > yeah...I am not so happy with the extra "ul" checking flags either. But
> > as you have mentioned before, the signed global counter type is there
> > for some cases when the global counter becomes temporally negative
> > ( although the counter in real life should always positive). What should
> > we do if the global counter is a unsigned value, was initialized to 0,
> > and now we add -5 to it(-5 is from one local counter, then we will get a
> > bogus big value)?
>
> I thought the solution to this was to have a global unsigned counter, and
> signed local counter, and defer updates to the global if it is going to be a
> large value due to the case above. This way the global counter remains an up
> counter no?
>
I think that will work, and we could get rid of the cheating
percpu_counter_read_positive() totally;)
So I think we could combine these two together: make the global counter
an unsigned 64 bit (u64) and keep the per cpu counter still signed 32
bit type, and also, defer updates the global counter to the global if
the end result is larger that before. This way we could have remove the
need for percpu_counter_read_positive(), and also allow the counter
being used for 64 bit accounting on 32 bit arch.
Comments?
Mingming
> Thanks,
> Kiran
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Ext2-devel] Re: [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter
2006-04-13 22:25 ` Mingming Cao
@ 2006-04-14 0:20 ` Ravikiran G Thirumalai
0 siblings, 0 replies; 17+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-14 0:20 UTC (permalink / raw)
To: Mingming Cao
Cc: Christoph Lameter, akpm, Laurent Vivier, linux-kernel, ext2-devel,
linux-fsdevel
On Thu, Apr 13, 2006 at 03:25:39PM -0700, Mingming Cao wrote:
> On Thu, 2006-04-13 at 12:02 -0700, Ravikiran G Thirumalai wrote:
> > ...
> > Yes, but the global counter is modified with a lock in the SMP case, and the
> > local counters are modified by their respective cpus only, Although there
> > might be more subtle issues .....
> >
> Hmm, I was worried about the read to the 64 bit global counter, there is
> no lock to protect it from getting an half updated 64 bit counter. But I
> think the window is probably small, and with what Andreas suggested
> (keep the local per cpu counter as 32 bit value), we would avoid this
> non-atomic math in most cases.
>
> Well,anyway this counter is just an approximate value, and currently
> (with 32 bit global counter) we could still read an old global counter
> value while it's being updated since no lock is required while read it.
Yup, and percpu_counter_exceeds should take care of the race in global
counter reads where it matters.
> >
> > I thought the solution to this was to have a global unsigned counter, and
> > signed local counter, and defer updates to the global if it is going to be a
> > large value due to the case above. This way the global counter remains an up
> > counter no?
> >
>
> I think that will work, and we could get rid of the cheating
> percpu_counter_read_positive() totally;)
>
> So I think we could combine these two together: make the global counter
> an unsigned 64 bit (u64) and keep the per cpu counter still signed 32
> bit type, and also, defer updates the global counter to the global if
> the end result is larger that before. This way we could have remove the
> need for percpu_counter_read_positive(), and also allow the counter
> being used for 64 bit accounting on 32 bit arch.
>
> Comments?
Sounds OK to me.
Thanks,
Kiran
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-10 17:58 [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter Mingming Cao
2006-04-11 17:09 ` Christoph Lameter
@ 2006-04-21 14:59 ` Mingming Cao
2006-04-21 22:09 ` Andrew Morton
` (2 more replies)
1 sibling, 3 replies; 17+ messages in thread
From: Mingming Cao @ 2006-04-21 14:59 UTC (permalink / raw)
To: akpm; +Cc: kiran, LaurentVivier, sct, linux-kernel, ext2-devel,
linux-fsdevel
The following patches are to fix the percpu counter issue support more
than 2**31 blocks for ext3, i.e. allow the ext3 free block accounting
works with more than 8TB storage.
[PATCH 1] - Generic perpcu longlong type counter support: global counter
type changed from long to long long. The local counter is still remains
32 bit (long type), so we could avoid doing 64 bit update in most cases.
Fixed the percpu_counter_read_positive() to handle the 0 value counter
correctly;Add support to initialize the global counter to a value that
are greater than 2**32.
[PATCH 2] - ext3 part of the changes: make use of the new support to
initialize the free blocks counter, instead of using percpu_counter_mod
() indirectly.
Patches against 2.6.17-rc1-mm2. Tested on a freshly created 10TB ext3.
Here is Patch 1.
Signed-Off-By: Mingming Cao <cmm@us.ibm.com>
---
linux-2.6.16-cmm/include/linux/percpu_counter.h | 42 +++++++++++++++++-------
linux-2.6.16-cmm/lib/percpu_counter.c | 8 ++--
2 files changed, 34 insertions(+), 16 deletions(-)
diff -puN include/linux/percpu_counter.h~percpu_longlong_counter include/linux/percpu_counter.h
--- linux-2.6.16/include/linux/percpu_counter.h~percpu_longlong_counter 2006-04-21 00:01:57.000000000 -0700
+++ linux-2.6.16-cmm/include/linux/percpu_counter.h 2006-04-21 00:02:39.000000000 -0700
@@ -17,7 +17,7 @@
struct percpu_counter {
spinlock_t lock;
- long count;
+ long long count;
long *counters;
};
@@ -34,17 +34,25 @@ static inline void percpu_counter_init(s
fbc->counters = alloc_percpu(long);
}
+static inline void
+percpu_counter_ll_init(struct percpu_counter *fbc, long long amount)
+{
+ spin_lock_init(&fbc->lock);
+ fbc->count = amount;
+ fbc->counters = alloc_percpu(long);
+}
+
static inline void percpu_counter_destroy(struct percpu_counter *fbc)
{
free_percpu(fbc->counters);
}
void percpu_counter_mod(struct percpu_counter *fbc, long amount);
-long percpu_counter_sum(struct percpu_counter *fbc);
+long long percpu_counter_sum(struct percpu_counter *fbc);
void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount);
-int percpu_counter_exceeds(struct percpu_counter *fbc, long limit);
+int percpu_counter_exceeds(struct percpu_counter *fbc, long long limit);
-static inline long percpu_counter_read(struct percpu_counter *fbc)
+static inline long long percpu_counter_read(struct percpu_counter *fbc)
{
return fbc->count;
}
@@ -52,13 +60,14 @@ static inline long percpu_counter_read(s
/*
* It is possible for the percpu_counter_read() to return a small negative
* number for some counter which should never be negative.
+ *
*/
-static inline long percpu_counter_read_positive(struct percpu_counter *fbc)
+static inline long long percpu_counter_read_positive(struct percpu_counter *fbc)
{
- long ret = fbc->count;
+ long long ret = fbc->count;
barrier(); /* Prevent reloads of fbc->count */
- if (ret > 0)
+ if (ret >= 0)
return ret;
return 1;
}
@@ -66,7 +75,7 @@ static inline long percpu_counter_read_p
#else
struct percpu_counter {
- long count;
+ long long count;
};
static inline void percpu_counter_init(struct percpu_counter *fbc)
@@ -74,6 +83,13 @@ static inline void percpu_counter_init(s
fbc->count = 0;
}
+static inline void
+percpu_counter_ll_init(struct percpu_counter *fbc, long long amount)
+{
+ fbc->count = amount;
+}
+
+
static inline void percpu_counter_destroy(struct percpu_counter *fbc)
{
}
@@ -86,17 +102,18 @@ percpu_counter_mod(struct percpu_counter
preempt_enable();
}
-static inline long percpu_counter_read(struct percpu_counter *fbc)
+static inline long long percpu_counter_read(struct percpu_counter *fbc)
{
return fbc->count;
}
-static inline long percpu_counter_read_positive(struct percpu_counter *fbc)
+static inline long long percpu_counter_read_positive(struct percpu_counter *fbc)
{
return fbc->count;
}
-static inline long percpu_counter_sum(struct percpu_counter *fbc)
+static inline
+unsigned long long percpu_counter_sum(struct percpu_counter *fbc)
{
return percpu_counter_read_positive(fbc);
}
@@ -108,7 +125,8 @@ static inline void percpu_counter_mod_bh
local_bh_enable();
}
-static inline int percpu_counter_exceeds(struct percpu_counter *fbc, long limit)
+static inline int
+percpu_counter_exceeds(struct percpu_counter *fbc, long long limit)
{
return percpu_counter_read(fbc) > limit;
}
diff -puN lib/percpu_counter.c~percpu_longlong_counter lib/percpu_counter.c
--- linux-2.6.16/lib/percpu_counter.c~percpu_longlong_counter 2006-04-21 00:01:57.000000000 -0700
+++ linux-2.6.16-cmm/lib/percpu_counter.c 2006-04-21 00:01:57.000000000 -0700
@@ -7,7 +7,7 @@
static void __percpu_counter_mod(struct percpu_counter *fbc, long amount)
{
- long count;
+ long long count;
long *pcount;
int cpu = smp_processor_id();
@@ -43,9 +43,9 @@ EXPORT_SYMBOL(percpu_counter_mod_bh);
* Add up all the per-cpu counts, return the result. This is a more accurate
* but much slower version of percpu_counter_read_positive()
*/
-long percpu_counter_sum(struct percpu_counter *fbc)
+long long percpu_counter_sum(struct percpu_counter *fbc)
{
- long ret;
+ long long ret;
int cpu;
spin_lock(&fbc->lock);
@@ -69,7 +69,7 @@ EXPORT_SYMBOL(percpu_counter_sum);
* it turns out that the limit wasn't exceeded, there will be no more calls to
* percpu_counter_sum() until significant counter skew has reoccurred.
*/
-int percpu_counter_exceeds(struct percpu_counter *fbc, long limit)
+int percpu_counter_exceeds(struct percpu_counter *fbc, long long limit)
{
if (percpu_counter_read(fbc) > limit)
if (percpu_counter_sum(fbc) > limit)
_
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-21 14:59 ` [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 " Mingming Cao
@ 2006-04-21 22:09 ` Andrew Morton
2006-04-24 17:48 ` Mingming Cao
2006-04-22 0:56 ` Ravikiran G Thirumalai
2006-04-24 22:51 ` [RESEND][PATCH 1/2] percpu counter data type changes " Mingming Cao
2 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2006-04-21 22:09 UTC (permalink / raw)
To: cmm; +Cc: kiran, LaurentVivier, sct, linux-kernel, ext2-devel,
linux-fsdevel
Mingming Cao <cmm@us.ibm.com> wrote:
>
> The following patches are to fix the percpu counter issue support more
> than 2**31 blocks for ext3, i.e. allow the ext3 free block accounting
> works with more than 8TB storage.
>
> [PATCH 1] - Generic perpcu longlong type counter support: global counter
> type changed from long to long long. The local counter is still remains
> 32 bit (long type), so we could avoid doing 64 bit update in most cases.
> Fixed the percpu_counter_read_positive() to handle the 0 value counter
> correctly;Add support to initialize the global counter to a value that
> are greater than 2**32.
I think it would be saner to explicitly specify the size of the field.
That means using s32 and s64 throughout this code.
That'll actually save space on 64-bit machines, where we're presently doing
alloc_percpu(long) when all we need is alloc_percpu(s32).
We'd need to review all users of this interface to make sure that they
handle the changed sizes appropriately, too.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-21 14:59 ` [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 " Mingming Cao
2006-04-21 22:09 ` Andrew Morton
@ 2006-04-22 0:56 ` Ravikiran G Thirumalai
2006-04-24 17:49 ` Mingming Cao
2006-04-24 22:51 ` [RESEND][PATCH 1/2] percpu counter data type changes " Mingming Cao
2 siblings, 1 reply; 17+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-22 0:56 UTC (permalink / raw)
To: Mingming Cao
Cc: akpm, LaurentVivier, sct, linux-kernel, ext2-devel, linux-fsdevel
On Fri, Apr 21, 2006 at 07:59:06AM -0700, Mingming Cao wrote:
> The following patches are to fix the percpu counter issue support more
> than 2**31 blocks for ext3, i.e. allow the ext3 free block accounting
> works with more than 8TB storage.
>
> [PATCH 1] - Generic perpcu longlong type counter support: global counter
> type changed from long to long long. The local counter is still remains
> 32 bit (long type), so we could avoid doing 64 bit update in most cases.
> Fixed the percpu_counter_read_positive() to handle the 0 value counter
> correctly;Add support to initialize the global counter to a value that
> are greater than 2**32.
>
> [PATCH 2] - ext3 part of the changes: make use of the new support to
> initialize the free blocks counter, instead of using percpu_counter_mod
> () indirectly.
>
> Patches against 2.6.17-rc1-mm2. Tested on a freshly created 10TB ext3.
>
> Here is Patch 1.
>
> Signed-Off-By: Mingming Cao <cmm@us.ibm.com>
>
> ...
> +static inline void
> +percpu_counter_ll_init(struct percpu_counter *fbc, long long amount)
> +{
> + spin_lock_init(&fbc->lock);
> + fbc->count = amount;
> + fbc->counters = alloc_percpu(long);
> +}
> +
Do we need another interface for this? Why not change percpu_counter_init
and users to use 'amount' as an additional argument instead?
Thanks,
Kiran
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-21 22:09 ` Andrew Morton
@ 2006-04-24 17:48 ` Mingming Cao
2006-04-24 18:26 ` Ravikiran G Thirumalai
0 siblings, 1 reply; 17+ messages in thread
From: Mingming Cao @ 2006-04-24 17:48 UTC (permalink / raw)
To: Andrew Morton
Cc: kiran, LaurentVivier, sct, linux-kernel, ext2-devel,
linux-fsdevel
On Fri, 2006-04-21 at 15:09 -0700, Andrew Morton wrote:
> Mingming Cao <cmm@us.ibm.com> wrote:
> >
> > The following patches are to fix the percpu counter issue support more
> > than 2**31 blocks for ext3, i.e. allow the ext3 free block accounting
> > works with more than 8TB storage.
> >
> > [PATCH 1] - Generic perpcu longlong type counter support: global counter
> > type changed from long to long long. The local counter is still remains
> > 32 bit (long type), so we could avoid doing 64 bit update in most cases.
> > Fixed the percpu_counter_read_positive() to handle the 0 value counter
> > correctly;Add support to initialize the global counter to a value that
> > are greater than 2**32.
>
> I think it would be saner to explicitly specify the size of the field.
> That means using s32 and s64 throughout this code.
>
Agree. Will use s64 in this code. As s32 has the same issue with what we
have(unsigned long) on 32 bit machine today: it is not enough for ext3
to support more than 2**31 free blocks, and also obviously not enough
for 64 bit ext3 that Laurent is working on.
> That'll actually save space on 64-bit machines, where we're presently doing
> alloc_percpu(long) when all we need is alloc_percpu(s32).
>
> We'd need to review all users of this interface to make sure that they
> handle the changed sizes appropriately, too.
I looked at the all users of percpu counter that are currently in
mainline(2.6.17-rc1) and in mm tree(2.6.17-rc1-mm2), they are:
1. ext2 free blocks/inodes/dirs
(int type, to be changed to unsinged long)
2. ext3 free blocks/inodes/dirs
(int type, changing to unsigned long or unsigned long long)
3. nr_files
(currently int type)
4. decnet_memory allocated
(was atomic_t type in mainline, changed to percpu counter type in mm)
5. tcp_memory allocated
(was atomic_t type, changed to percpu counter type in mm tree)
I could be wrong, but I think there will be no effect to change the size
of the global counter from "long" to s64 for above percpu counter users,
except gives the counter more room to grow. Kiran, what do you think?
Did I miss any other users of the perpcu counters?
Mingming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-22 0:56 ` Ravikiran G Thirumalai
@ 2006-04-24 17:49 ` Mingming Cao
0 siblings, 0 replies; 17+ messages in thread
From: Mingming Cao @ 2006-04-24 17:49 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: akpm, LaurentVivier, sct, linux-kernel, ext2-devel, linux-fsdevel
On Fri, 2006-04-21 at 17:56 -0700, Ravikiran G Thirumalai wrote:
> On Fri, Apr 21, 2006 at 07:59:06AM -0700, Mingming Cao wrote:
> > The following patches are to fix the percpu counter issue support more
> > than 2**31 blocks for ext3, i.e. allow the ext3 free block accounting
> > works with more than 8TB storage.
> >
> > [PATCH 1] - Generic perpcu longlong type counter support: global counter
> > type changed from long to long long. The local counter is still remains
> > 32 bit (long type), so we could avoid doing 64 bit update in most cases.
> > Fixed the percpu_counter_read_positive() to handle the 0 value counter
> > correctly;Add support to initialize the global counter to a value that
> > are greater than 2**32.
> >
> > [PATCH 2] - ext3 part of the changes: make use of the new support to
> > initialize the free blocks counter, instead of using percpu_counter_mod
> > () indirectly.
> >
> > Patches against 2.6.17-rc1-mm2. Tested on a freshly created 10TB ext3.
> >
> > Here is Patch 1.
> >
> > Signed-Off-By: Mingming Cao <cmm@us.ibm.com>
> >
> > ...
> > +static inline void
> > +percpu_counter_ll_init(struct percpu_counter *fbc, long long amount)
> > +{
> > + spin_lock_init(&fbc->lock);
> > + fbc->count = amount;
> > + fbc->counters = alloc_percpu(long);
> > +}
> > +
>
> Do we need another interface for this? Why not change percpu_counter_init
> and users to use 'amount' as an additional argument instead?
Suggestion accepted.:-)
Mingming
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-24 17:48 ` Mingming Cao
@ 2006-04-24 18:26 ` Ravikiran G Thirumalai
2006-04-24 19:13 ` Mingming Cao
0 siblings, 1 reply; 17+ messages in thread
From: Ravikiran G Thirumalai @ 2006-04-24 18:26 UTC (permalink / raw)
To: Mingming Cao
Cc: Andrew Morton, LaurentVivier, sct, linux-kernel, ext2-devel,
linux-fsdevel
On Mon, Apr 24, 2006 at 10:48:32AM -0700, Mingming Cao wrote:
> On Fri, 2006-04-21 at 15:09 -0700, Andrew Morton wrote:
> > >
> > I think it would be saner to explicitly specify the size of the field.
> > That means using s32 and s64 throughout this code.
> >
>
> Agree. Will use s64 in this code. As s32 has the same issue with what we
> have(unsigned long) on 32 bit machine today: it is not enough for ext3
> to support more than 2**31 free blocks, and also obviously not enough
> for 64 bit ext3 that Laurent is working on.
I think Andrew's suggestion was to change global counter to s64 and local
counter to s32. That way we avoid allocating a 64 bit local counter on 64
bit systems when we could do with a 32 bit counter. (although there is no
real space savings with current alloc_percpu ;), but hopefully that will
change in the future)
>
> I looked at the all users of percpu counter that are currently in
> mainline(2.6.17-rc1) and in mm tree(2.6.17-rc1-mm2), they are:
>
> 1. ext2 free blocks/inodes/dirs
> (int type, to be changed to unsinged long)
> 2. ext3 free blocks/inodes/dirs
> (int type, changing to unsigned long or unsigned long long)
> 3. nr_files
> (currently int type)
> 4. decnet_memory allocated
> (was atomic_t type in mainline, changed to percpu counter type in mm)
> 5. tcp_memory allocated
> (was atomic_t type, changed to percpu counter type in mm tree)
>
> I could be wrong, but I think there will be no effect to change the size
> of the global counter from "long" to s64 for above percpu counter users,
> except gives the counter more room to grow. Kiran, what do you think?
Agree. Since the counters were earlier int/atomic_t, s32 on local and s64 on
global should be OK.
> Did I miss any other users of the perpcu counters?
No, AFAIK, these are the only users as of now.
Thanks,
Kiran
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 free blocks counter
2006-04-24 18:26 ` Ravikiran G Thirumalai
@ 2006-04-24 19:13 ` Mingming Cao
0 siblings, 0 replies; 17+ messages in thread
From: Mingming Cao @ 2006-04-24 19:13 UTC (permalink / raw)
To: Ravikiran G Thirumalai
Cc: Andrew Morton, LaurentVivier, sct, linux-kernel, ext2-devel,
linux-fsdevel
On Mon, 2006-04-24 at 11:26 -0700, Ravikiran G Thirumalai wrote:
> On Mon, Apr 24, 2006 at 10:48:32AM -0700, Mingming Cao wrote:
> > On Fri, 2006-04-21 at 15:09 -0700, Andrew Morton wrote:
> > > >
> > > I think it would be saner to explicitly specify the size of the field.
> > > That means using s32 and s64 throughout this code.
> > >
> >
> > Agree. Will use s64 in this code. As s32 has the same issue with what we
> > have(unsigned long) on 32 bit machine today: it is not enough for ext3
> > to support more than 2**31 free blocks, and also obviously not enough
> > for 64 bit ext3 that Laurent is working on.
>
> I think Andrew's suggestion was to change global counter to s64 and local
> counter to s32. That way we avoid allocating a 64 bit local counter on 64
> bit systems when we could do with a 32 bit counter. (although there is no
> real space savings with current alloc_percpu ;), but hopefully that will
> change in the future)
>
Oh, I see. :-) I shall have the updated patch soon.
Mingming
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RESEND][PATCH 1/2] percpu counter data type changes to suppport for more than 2**31 ext3 free blocks counter
2006-04-21 14:59 ` [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 " Mingming Cao
2006-04-21 22:09 ` Andrew Morton
2006-04-22 0:56 ` Ravikiran G Thirumalai
@ 2006-04-24 22:51 ` Mingming Cao
2 siblings, 0 replies; 17+ messages in thread
From: Mingming Cao @ 2006-04-24 22:51 UTC (permalink / raw)
To: akpm; +Cc: kiran, LaurentVivier, sct, linux-kernel, ext2-devel,
linux-fsdevel
The percpu counter data type are changed in this set of patches to
support more users like ext3 who need more than 32 bit to store the free
blocks total in the filesystem.
[PATCH 1] - Generic perpcu counters data type changes. The size of the global
counter and local counter were explictly specified using s64 and s32
The global counter is changed from long to s64, while the local counter
is changed from long to s32, so we could avoid doing 64 bit update in most cases.
[PATCH 2] - Make use of the new percpu_counter_init() in the applications
of percpu counters, to able to pass the initial value of the global counter.
Patch 1 is included here.
Signed-Off-By: Mingming Cao <cmm@us.ibm.com>
---
linux-2.6.16-cmm/include/linux/percpu_counter.h | 47 ++++++++++++------------
linux-2.6.16-cmm/lib/percpu_counter.c | 18 ++++-----
2 files changed, 34 insertions(+), 31 deletions(-)
diff -puN include/linux/percpu_counter.h~percpu_longlong_counter include/linux/percpu_counter.h
--- linux-2.6.16/include/linux/percpu_counter.h~percpu_longlong_counter 2006-04-21 00:01:57.000000000 -0700
+++ linux-2.6.16-cmm/include/linux/percpu_counter.h 2006-04-24 13:50:26.000000000 -0700
@@ -17,8 +17,8 @@
struct percpu_counter {
spinlock_t lock;
- long count;
- long *counters;
+ s64 count;
+ s32 *counters;
};
#if NR_CPUS >= 16
@@ -27,11 +27,11 @@ struct percpu_counter {
#define FBC_BATCH (NR_CPUS*4)
#endif
-static inline void percpu_counter_init(struct percpu_counter *fbc)
+static inline void percpu_counter_init(struct percpu_counter *fbc, s64 amount)
{
spin_lock_init(&fbc->lock);
- fbc->count = 0;
- fbc->counters = alloc_percpu(long);
+ fbc->count = amount;
+ fbc->counters = alloc_percpu(s32);
}
static inline void percpu_counter_destroy(struct percpu_counter *fbc)
@@ -39,12 +39,12 @@ static inline void percpu_counter_destro
free_percpu(fbc->counters);
}
-void percpu_counter_mod(struct percpu_counter *fbc, long amount);
-long percpu_counter_sum(struct percpu_counter *fbc);
-void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount);
-int percpu_counter_exceeds(struct percpu_counter *fbc, long limit);
+void percpu_counter_mod(struct percpu_counter *fbc, s32 amount);
+s64 percpu_counter_sum(struct percpu_counter *fbc);
+void percpu_counter_mod_bh(struct percpu_counter *fbc, s32 amount);
+int percpu_counter_exceeds(struct percpu_counter *fbc, s64 limit);
-static inline long percpu_counter_read(struct percpu_counter *fbc)
+static inline s64 percpu_counter_read(struct percpu_counter *fbc)
{
return fbc->count;
}
@@ -52,13 +52,14 @@ static inline long percpu_counter_read(s
/*
* It is possible for the percpu_counter_read() to return a small negative
* number for some counter which should never be negative.
+ *
*/
-static inline long percpu_counter_read_positive(struct percpu_counter *fbc)
+static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
{
- long ret = fbc->count;
+ s64 ret = fbc->count;
barrier(); /* Prevent reloads of fbc->count */
- if (ret > 0)
+ if (ret >= 0)
return ret;
return 1;
}
@@ -66,12 +67,12 @@ static inline long percpu_counter_read_p
#else
struct percpu_counter {
- long count;
+ s64 count;
};
-static inline void percpu_counter_init(struct percpu_counter *fbc)
+static inline void percpu_counter_init(struct percpu_counter *fbc, s64 amount)
{
- fbc->count = 0;
+ fbc->count = amount;
}
static inline void percpu_counter_destroy(struct percpu_counter *fbc)
@@ -79,36 +80,38 @@ static inline void percpu_counter_destro
}
static inline void
-percpu_counter_mod(struct percpu_counter *fbc, long amount)
+percpu_counter_mod(struct percpu_counter *fbc, s32 amount)
{
preempt_disable();
fbc->count += amount;
preempt_enable();
}
-static inline long percpu_counter_read(struct percpu_counter *fbc)
+static inline s64 percpu_counter_read(struct percpu_counter *fbc)
{
return fbc->count;
}
-static inline long percpu_counter_read_positive(struct percpu_counter *fbc)
+static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
{
return fbc->count;
}
-static inline long percpu_counter_sum(struct percpu_counter *fbc)
+static inline
+unsigned s64 percpu_counter_sum(struct percpu_counter *fbc)
{
return percpu_counter_read_positive(fbc);
}
-static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+static inline void percpu_counter_mod_bh(struct percpu_counter *fbc, s32 amount)
{
local_bh_disable();
fbc->count += amount;
local_bh_enable();
}
-static inline int percpu_counter_exceeds(struct percpu_counter *fbc, long limit)
+static inline int
+percpu_counter_exceeds(struct percpu_counter *fbc, s64 limit)
{
return percpu_counter_read(fbc) > limit;
}
diff -puN lib/percpu_counter.c~percpu_longlong_counter lib/percpu_counter.c
--- linux-2.6.16/lib/percpu_counter.c~percpu_longlong_counter 2006-04-21 00:01:57.000000000 -0700
+++ linux-2.6.16-cmm/lib/percpu_counter.c 2006-04-24 13:52:09.000000000 -0700
@@ -5,10 +5,10 @@
#include <linux/percpu_counter.h>
#include <linux/module.h>
-static void __percpu_counter_mod(struct percpu_counter *fbc, long amount)
+static void __percpu_counter_mod(struct percpu_counter *fbc, s32 amount)
{
- long count;
- long *pcount;
+ s64 count;
+ s32 *pcount;
int cpu = smp_processor_id();
pcount = per_cpu_ptr(fbc->counters, cpu);
@@ -23,7 +23,7 @@ static void __percpu_counter_mod(struct
}
}
-void percpu_counter_mod(struct percpu_counter *fbc, long amount)
+void percpu_counter_mod(struct percpu_counter *fbc, s32 amount)
{
preempt_disable();
__percpu_counter_mod(fbc, amount);
@@ -31,7 +31,7 @@ void percpu_counter_mod(struct percpu_co
}
EXPORT_SYMBOL(percpu_counter_mod);
-void percpu_counter_mod_bh(struct percpu_counter *fbc, long amount)
+void percpu_counter_mod_bh(struct percpu_counter *fbc, s32 amount)
{
local_bh_disable();
__percpu_counter_mod(fbc, amount);
@@ -43,15 +43,15 @@ EXPORT_SYMBOL(percpu_counter_mod_bh);
* Add up all the per-cpu counts, return the result. This is a more accurate
* but much slower version of percpu_counter_read_positive()
*/
-long percpu_counter_sum(struct percpu_counter *fbc)
+s64 percpu_counter_sum(struct percpu_counter *fbc)
{
- long ret;
+ s64 ret;
int cpu;
spin_lock(&fbc->lock);
ret = fbc->count;
for_each_possible_cpu(cpu) {
- long *pcount = per_cpu_ptr(fbc->counters, cpu);
+ s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
ret += *pcount;
}
spin_unlock(&fbc->lock);
@@ -69,7 +69,7 @@ EXPORT_SYMBOL(percpu_counter_sum);
* it turns out that the limit wasn't exceeded, there will be no more calls to
* percpu_counter_sum() until significant counter skew has reoccurred.
*/
-int percpu_counter_exceeds(struct percpu_counter *fbc, long limit)
+int percpu_counter_exceeds(struct percpu_counter *fbc, s64 limit)
{
if (percpu_counter_read(fbc) > limit)
if (percpu_counter_sum(fbc) > limit)
_
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2006-04-24 22:51 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-10 17:58 [RFC][PATCH 0/3] ext3 percpu counter fixes to suppport for ext3 unsigned long type free blocks counter Mingming Cao
2006-04-11 17:09 ` Christoph Lameter
2006-04-11 19:01 ` Mingming Cao
2006-04-11 22:20 ` Ravikiran G Thirumalai
2006-04-12 21:28 ` [Ext2-devel] " Mingming Cao
2006-04-12 21:50 ` Andreas Dilger
2006-04-13 19:02 ` Ravikiran G Thirumalai
2006-04-13 22:25 ` Mingming Cao
2006-04-14 0:20 ` Ravikiran G Thirumalai
2006-04-21 14:59 ` [PATCH 1/2] ext3 percpu counter fixes to suppport for more than 2**31 ext3 " Mingming Cao
2006-04-21 22:09 ` Andrew Morton
2006-04-24 17:48 ` Mingming Cao
2006-04-24 18:26 ` Ravikiran G Thirumalai
2006-04-24 19:13 ` Mingming Cao
2006-04-22 0:56 ` Ravikiran G Thirumalai
2006-04-24 17:49 ` Mingming Cao
2006-04-24 22:51 ` [RESEND][PATCH 1/2] percpu counter data type changes " Mingming Cao
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).