* [PATCH] ext2 statfs improvement for block and inode free count @ 2007-07-14 1:36 Badari Pulavarty 2007-07-19 3:18 ` Andrew Morton 2007-10-16 22:00 ` Andrew Morton 0 siblings, 2 replies; 7+ messages in thread From: Badari Pulavarty @ 2007-07-14 1:36 UTC (permalink / raw) To: Andrew Morton; +Cc: Andreas Dilger, lkml, ext4 Andrew, Can you include it in -mm ? BTW, this patch is against mainline, won't apply cleanly to -mm, due to other statfs() improvements. Thanks, Badari More statfs() improvements for ext2. ext2 already maintains percpu counters for free blocks and inodes. Derive free block count and inode count by summing up percpu counters, instead of counting up all the groups in the filesystem each time. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com> fs/ext2/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6.22/fs/ext2/super.c =================================================================== --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.000000000 -0700 +++ linux-2.6.22/fs/ext2/super.c 2007-07-13 20:06:51.000000000 -0700 @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * buf->f_type = EXT2_SUPER_MAGIC; buf->f_bsize = sb->s_blocksize; buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead; - buf->f_bfree = ext2_count_free_blocks(sb); + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter); buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) buf->f_bavail = 0; buf->f_files = le32_to_cpu(es->s_inodes_count); - buf->f_ffree = ext2_count_free_inodes(sb); + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter); buf->f_namelen = EXT2_NAME_LEN; fsid = le64_to_cpup((void *)es->s_uuid) ^ le64_to_cpup((void *)es->s_uuid + sizeof(u64)); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2 statfs improvement for block and inode free count 2007-07-14 1:36 [PATCH] ext2 statfs improvement for block and inode free count Badari Pulavarty @ 2007-07-19 3:18 ` Andrew Morton 2007-07-19 15:18 ` Badari Pulavarty 2007-10-16 22:00 ` Andrew Morton 1 sibling, 1 reply; 7+ messages in thread From: Andrew Morton @ 2007-07-19 3:18 UTC (permalink / raw) To: Badari Pulavarty; +Cc: Andreas Dilger, lkml, ext4 On Fri, 13 Jul 2007 18:36:54 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote: > More statfs() improvements for ext2. ext2 already maintains > percpu counters for free blocks and inodes. Derive free > block count and inode count by summing up percpu counters, > instead of counting up all the groups in the filesystem > each time. > hm, another speedup patch with no measurements which demonstrate its benefit. > > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> > Acked-by: Andreas Dilger <adilger@clusterfs.com> > > fs/ext2/super.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: linux-2.6.22/fs/ext2/super.c > =================================================================== > --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.000000000 -0700 > +++ linux-2.6.22/fs/ext2/super.c 2007-07-13 20:06:51.000000000 -0700 > @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * > buf->f_type = EXT2_SUPER_MAGIC; > buf->f_bsize = sb->s_blocksize; > buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead; > - buf->f_bfree = ext2_count_free_blocks(sb); > + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter); > buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); > if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) > buf->f_bavail = 0; > buf->f_files = le32_to_cpu(es->s_inodes_count); > - buf->f_ffree = ext2_count_free_inodes(sb); > + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter); > buf->f_namelen = EXT2_NAME_LEN; > fsid = le64_to_cpup((void *)es->s_uuid) ^ > le64_to_cpup((void *)es->s_uuid + sizeof(u64)); > Well there's a tradeoff here. At large CPU counts, percpu_counter_sum() becomes quite expensive - it takes a global lock and then goes off fishing in every CPU's percpu_alloced memory. So there is some value of (num_online_cpus / sb->s_groups_count) at which this change becomes a loss. Where does that value lie? Bear in mind that the global lock in percpu_counter_sum() will tilt the scales quite a bit. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2 statfs improvement for block and inode free count 2007-07-19 3:18 ` Andrew Morton @ 2007-07-19 15:18 ` Badari Pulavarty 2007-07-20 9:01 ` Andreas Dilger 0 siblings, 1 reply; 7+ messages in thread From: Badari Pulavarty @ 2007-07-19 15:18 UTC (permalink / raw) To: Andrew Morton; +Cc: Andreas Dilger, lkml, ext4 On Wed, 2007-07-18 at 20:18 -0700, Andrew Morton wrote: > On Fri, 13 Jul 2007 18:36:54 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote: > > > More statfs() improvements for ext2. ext2 already maintains > > percpu counters for free blocks and inodes. Derive free > > block count and inode count by summing up percpu counters, > > instead of counting up all the groups in the filesystem > > each time. > > > > hm, another speedup patch with no measurements which demonstrate its > benefit. In my setups (4 & 8-way), I didn't measure any significant performance improvements (in any reasonable workload). I see some decent improvements on cooked-up (1 million stats) tests :( > > > > > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> > > Acked-by: Andreas Dilger <adilger@clusterfs.com> > > > > fs/ext2/super.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > Index: linux-2.6.22/fs/ext2/super.c > > =================================================================== > > --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.000000000 -0700 > > +++ linux-2.6.22/fs/ext2/super.c 2007-07-13 20:06:51.000000000 -0700 > > @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * > > buf->f_type = EXT2_SUPER_MAGIC; > > buf->f_bsize = sb->s_blocksize; > > buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead; > > - buf->f_bfree = ext2_count_free_blocks(sb); > > + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter); > > buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); > > if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) > > buf->f_bavail = 0; > > buf->f_files = le32_to_cpu(es->s_inodes_count); > > - buf->f_ffree = ext2_count_free_inodes(sb); > > + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter); > > buf->f_namelen = EXT2_NAME_LEN; > > fsid = le64_to_cpup((void *)es->s_uuid) ^ > > le64_to_cpup((void *)es->s_uuid + sizeof(u64)); > > > > Well there's a tradeoff here. At large CPU counts, percpu_counter_sum() > becomes quite expensive - it takes a global lock and then goes off fishing > in every CPU's percpu_alloced memory. > > So there is some value of (num_online_cpus / sb->s_groups_count) at which > this change becomes a loss. Where does that value lie? Yes. I debated long time whether I should submit this or not - due to very reason. Old code wasn't holding any locks. I don't have any high count CPU machine (>8way) with me. I will request for time on one. > > Bear in mind that the global lock in percpu_counter_sum() will tilt the > scales quite a bit. Noticed that too. I added WARN_ON() to see if percpu sum doesn't match computed sum. I saw few stacks in a 24 hour run of fsx runs. Thanks, Badari ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2 statfs improvement for block and inode free count 2007-07-19 15:18 ` Badari Pulavarty @ 2007-07-20 9:01 ` Andreas Dilger 0 siblings, 0 replies; 7+ messages in thread From: Andreas Dilger @ 2007-07-20 9:01 UTC (permalink / raw) To: Badari Pulavarty; +Cc: Andrew Morton, lkml, ext4 On Jul 19, 2007 08:18 -0700, Badari Pulavarty wrote: > In my setups (4 & 8-way), I didn't measure any significant performance > improvements (in any reasonable workload). I see some decent > improvements on cooked-up (1 million stats) tests :( I don't have any numbers to publish, but this did help internally for large filesystems. > > Well there's a tradeoff here. At large CPU counts, percpu_counter_sum() > > becomes quite expensive - it takes a global lock and then goes off fishing > > in every CPU's percpu_alloced memory. > > > > So there is some value of (num_online_cpus / sb->s_groups_count) at which > > this change becomes a loss. Where does that value lie? > > Yes. I debated long time whether I should submit this or not - due to > very reason. Old code wasn't holding any locks. I don't have any high > count CPU machine (>8way) with me. I will request for time on one. Considering that large (16TB) filesystems have 128000 groups in them, I'd suspect that iterating over all of the group descriptors is a loss until you have an unusually huge number of CPUs. What was even worse was before caching "overhead" it walked the groups list twice (once for "overhead" and once for the per-group summary). It might be that for the filesystem sizes ext2 is used on this isn't a win, but I didn't submit the patch for ext2... > I added WARN_ON() to see if percpu sum doesn't match > computed sum. I saw few stacks in a 24 hour run of fsx runs. That could just be because the filesystem is changing between these two checks... Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2 statfs improvement for block and inode free count 2007-07-14 1:36 [PATCH] ext2 statfs improvement for block and inode free count Badari Pulavarty 2007-07-19 3:18 ` Andrew Morton @ 2007-10-16 22:00 ` Andrew Morton 2007-10-16 22:18 ` Badari 2007-10-17 23:15 ` Andreas Dilger 1 sibling, 2 replies; 7+ messages in thread From: Andrew Morton @ 2007-10-16 22:00 UTC (permalink / raw) To: Badari Pulavarty; +Cc: adilger, linux-kernel, linux-ext4 On Fri, 13 Jul 2007 18:36:54 -0700 Badari Pulavarty <pbadari@us.ibm.com> wrote: > > More statfs() improvements for ext2. ext2 already maintains > percpu counters for free blocks and inodes. Derive free > block count and inode count by summing up percpu counters, > instead of counting up all the groups in the filesystem > each time. > > > Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> > Acked-by: Andreas Dilger <adilger@clusterfs.com> > > fs/ext2/super.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: linux-2.6.22/fs/ext2/super.c > =================================================================== > --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.000000000 -0700 > +++ linux-2.6.22/fs/ext2/super.c 2007-07-13 20:06:51.000000000 -0700 > @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * > buf->f_type = EXT2_SUPER_MAGIC; > buf->f_bsize = sb->s_blocksize; > buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead; > - buf->f_bfree = ext2_count_free_blocks(sb); > + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter); > buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); > if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) > buf->f_bavail = 0; > buf->f_files = le32_to_cpu(es->s_inodes_count); > - buf->f_ffree = ext2_count_free_inodes(sb); > + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter); > buf->f_namelen = EXT2_NAME_LEN; > fsid = le64_to_cpup((void *)es->s_uuid) ^ > le64_to_cpup((void *)es->s_uuid + sizeof(u64)); > Guys, I have this patch in a stalled state pending some convincing demonstration/argument that it's actually a worthwhile change. How much hurt will it cause on large-numa, and how much benefit do we get for that hurt? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2 statfs improvement for block and inode free count 2007-10-16 22:00 ` Andrew Morton @ 2007-10-16 22:18 ` Badari 2007-10-17 23:15 ` Andreas Dilger 1 sibling, 0 replies; 7+ messages in thread From: Badari @ 2007-10-16 22:18 UTC (permalink / raw) To: Andrew Morton; +Cc: adilger, linux-kernel, linux-ext4 Andrew Morton wrote: > On Fri, 13 Jul 2007 18:36:54 -0700 > Badari Pulavarty <pbadari@us.ibm.com> wrote: > > >> More statfs() improvements for ext2. ext2 already maintains >> percpu counters for free blocks and inodes. Derive free >> block count and inode count by summing up percpu counters, >> instead of counting up all the groups in the filesystem >> each time. >> >> >> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> >> Acked-by: Andreas Dilger <adilger@clusterfs.com> >> >> fs/ext2/super.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> Index: linux-2.6.22/fs/ext2/super.c >> =================================================================== >> --- linux-2.6.22.orig/fs/ext2/super.c 2007-07-13 20:06:38.000000000 -0700 >> +++ linux-2.6.22/fs/ext2/super.c 2007-07-13 20:06:51.000000000 -0700 >> @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * >> buf->f_type = EXT2_SUPER_MAGIC; >> buf->f_bsize = sb->s_blocksize; >> buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead; >> - buf->f_bfree = ext2_count_free_blocks(sb); >> + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter); >> buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); >> if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) >> buf->f_bavail = 0; >> buf->f_files = le32_to_cpu(es->s_inodes_count); >> - buf->f_ffree = ext2_count_free_inodes(sb); >> + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter); >> buf->f_namelen = EXT2_NAME_LEN; >> fsid = le64_to_cpup((void *)es->s_uuid) ^ >> le64_to_cpup((void *)es->s_uuid + sizeof(u64)); >> >> > > Guys, I have this patch in a stalled state pending some convincing > demonstration/argument that it's actually a worthwhile change. > > How much hurt will it cause on large-numa, and how much benefit do we get > for that hurt? > Unfortunately, I couldn't find any noticeable significant improvements with or without this patch. May be none of my filesystems are large enough on a large enough NUMA box to verify. Lets drop it for now and re-visit it if needed. Thanks, Badari ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] ext2 statfs improvement for block and inode free count 2007-10-16 22:00 ` Andrew Morton 2007-10-16 22:18 ` Badari @ 2007-10-17 23:15 ` Andreas Dilger 1 sibling, 0 replies; 7+ messages in thread From: Andreas Dilger @ 2007-10-17 23:15 UTC (permalink / raw) To: Andrew Morton; +Cc: Badari Pulavarty, linux-kernel, linux-ext4 On Oct 16, 2007 15:00 -0700, Andrew Morton wrote: > On Fri, 13 Jul 2007 18:36:54 -0700 > Badari Pulavarty <pbadari@us.ibm.com> wrote: > > @@ -1136,12 +1136,12 @@ static int ext2_statfs (struct dentry * > > buf->f_type = EXT2_SUPER_MAGIC; > > buf->f_bsize = sb->s_blocksize; > > buf->f_blocks = le32_to_cpu(es->s_blocks_count) - overhead; > > - buf->f_bfree = ext2_count_free_blocks(sb); > > + buf->f_bfree = percpu_counter_sum(&sbi->s_freeblocks_counter); > > buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); > > if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) > > buf->f_bavail = 0; > > buf->f_files = le32_to_cpu(es->s_inodes_count); > > - buf->f_ffree = ext2_count_free_inodes(sb); > > + buf->f_ffree = percpu_counter_sum(&sbi->s_freeinodes_counter); > > buf->f_namelen = EXT2_NAME_LEN; > > fsid = le64_to_cpup((void *)es->s_uuid) ^ > > le64_to_cpup((void *)es->s_uuid + sizeof(u64)); > > > > Guys, I have this patch in a stalled state pending some convincing > demonstration/argument that it's actually a worthwhile change. Is it just this part of the change, or the whole "overhead" calculation? It looks like this is also missing the "overhead_last" part of the change? The filesystem overhead is static outside of fs resize and there is no point in recalculating it. In fact there was a cond_resched() added in there expressly because it was causing hiccups in the low-latency tests, so it is a non-trivial CPU user. > How much hurt will it cause on large-numa, and how much benefit do we get > for that hurt? To be honest, I suspect the tradeoff depends on the size of the filesystem and the number of CPUs. It might not make sense for ext2, but it likely does for ext4. If you consider that ext2_count_free_*() will walk all of the block group descriptors (1 per 128MB on a 4kB filesystem, 1 per 8MB on a 1kB filesystem) and look at average filesystem sizes these days (80GB let's say for small ones, 8TB for big ones) that means 640 loops for an average fs, 64k loops for a big fs. We might also consider using percpu_counter_read() and ensure the value is >= 0, since these values do not have to be exact. We can't use percpu_counter_read_positive() since it will return 1 block free if the percpu value is negative. Using just percpu_counter_read() means we are inaccurate by at most (+/-)NR_CPUS * num_online_cpus * 2 blocks, but on average this will be 0 blocks. What's also missing from these patches somehow (was in the original patch I sent) is the update of the superblock fields. buf->f_bsize = sb->s_blocksize; buf->f_blocks = le32_to_cpu(es->s_blocks_count) - sbi->s_overhead_last; buf->f_bfree = ext3_count_free_blocks (sb); + es->s_free_blocks_count = cpu_to_le32(buf->f_bfree); buf->f_bavail = buf->f_bfree - le32_to_cpu(es->s_r_blocks_count); if (buf->f_bfree < le32_to_cpu(es->s_r_blocks_count)) buf->f_bavail = 0; buf->f_files = le32_to_cpu(es->s_inodes_count); buf->f_ffree = ext3_count_free_inodes (sb); + es->s_free_inodes_count = cpu_to_le32(buf->f_ffree); buf->f_namelen = EXT3_NAME_LEN; return 0; Without this, e2fsck will report spurious when doing a read-only check, even if the fs is idle: Pass 5: Checking group summary information Free blocks count wrong (83302, counted=83276). Fix? no Free inodes count wrong (99987, counted=99972). Yes, I know you aren't guaranteed consistency on a read-only e2fsck, but we should at least allow the problem to be addressed if it is easy. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-10-17 23:15 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-14 1:36 [PATCH] ext2 statfs improvement for block and inode free count Badari Pulavarty 2007-07-19 3:18 ` Andrew Morton 2007-07-19 15:18 ` Badari Pulavarty 2007-07-20 9:01 ` Andreas Dilger 2007-10-16 22:00 ` Andrew Morton 2007-10-16 22:18 ` Badari 2007-10-17 23:15 ` Andreas Dilger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).