* [RFC PATCH 0/6] Optimize e2fsck for large file systems
@ 2012-11-25 0:36 Theodore Ts'o
2012-11-25 0:36 ` [PATCH 1/6] libext2fs: optimize rb_set_bmap_range() Theodore Ts'o
` (7 more replies)
0 siblings, 8 replies; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
This patch series optimizes e2fsck for large file systems (where large
is 4TB or more). Previously checking a 4TB file system when it was
mostly full could take upwards of six minutes of wall clock time, and
e2fsck would be mostly CPU bound. With this patch series, the same 4TB
file system can now be checked in less than 50 seconds and approximately
20 seconds of userspace CPU time. (Previously, it was consuming over
15 times as much CPU time.)
The speed ups come in three places:
1) Reducing the CPU time while reading the block bitmap in from disk.
This was done by speeding up rb_set_bmap_range, and it significantly
improves e2fsck's pass 5 operation.
2) Reducing the CPU time in e2fsck pass1 while constructing the
block_found_map (which is the in-core block allocation bitmap as
found by interating over all of the inodes).
3) Further speed up e2fsck's pass 5 by comparing the block allocation
bitmap one bitmap block at a time, instead of a bit-at-a-time.
Before....
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 712k/31500k (622k/91k), time: 194.99/179.09/ 0.04
Pass 1: I/O read: 8MB, write: 0MB, rate: 0.04MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 712k/62064k (605k/108k), time: 1.03/ 0.01/ 0.02
Pass 2: I/O read: 4MB, write: 0MB, rate: 3.89MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 712k/62064k (605k/108k), time: 197.31/180.37/ 0.07
Pass 3A: Memory used: 712k/62064k (626k/87k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 712k/62064k (594k/119k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 639.39MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 712k/936k (533k/180k), time: 7.82/ 7.81/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 832k/936k (510k/323k), time: 172.99/161.13/ 0.41
Pass 5: I/O read: 118MB, write: 0MB, rate: 0.68MB/s
Memory used: 832k/936k (510k/323k), time: 378.21/349.38/ 0.48
I/O read: 129MB, write: 0MB, rate: 0.34MB/s
... and after....
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 916k/31500k (719k/198k), time: 17.83/ 2.79/ 0.05
Pass 1: I/O read: 8MB, write: 0MB, rate: 0.45MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 916k/62064k (704k/212k), time: 0.45/ 0.01/ 0.01
Pass 2: I/O read: 4MB, write: 0MB, rate: 8.82MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 916k/62064k (704k/212k), time: 18.89/ 3.39/ 0.07
Pass 3A: Memory used: 916k/62064k (735k/181k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 916k/62064k (693k/224k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 1265.82MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 916k/936k (601k/315k), time: 5.69/ 5.67/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 1048k/936k (569k/480k), time: 22.76/10.49/ 0.53
Pass 5: I/O read: 117MB, write: 0MB, rate: 5.14MB/s
Memory used: 1048k/936k (569k/480k), time: 47.38/19.57/ 0.60
I/O read: 129MB, write: 0MB, rate: 2.72MB/s
For slower CPU's (i.e., bookshelf NAS servers with underpowered, wimpy
ARM processors) or for larger RAID arrays, the speed ups would of course
be even better.
Theodore Ts'o (6):
libext2fs: optimize rb_set_bmap_range()
e2fsck: optimize pass1 for CPU time
libext2fs: add ext2fs_bitcount() function
libext2fs: optimize rb_get_bmap_range()
libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps
e2fsck: optimize pass 5 for CPU utilization
e2fsck/pass1.c | 18 +++++++++--
e2fsck/pass5.c | 55 +++++++++++++++++++++++++++++++--
lib/ext2fs/bitops.c | 35 +++++++++++++++++++++
lib/ext2fs/bitops.h | 1 +
lib/ext2fs/blkmap64_rb.c | 76 ++++++++++++++++++++++++++++++++++------------
lib/ext2fs/tst_bitmaps.c | 1 +
lib/ext2fs/tst_bitmaps_exp | 3 ++
7 files changed, 165 insertions(+), 24 deletions(-)
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/6] libext2fs: optimize rb_set_bmap_range()
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
@ 2012-11-25 0:36 ` Theodore Ts'o
2012-11-26 9:40 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 2/6] e2fsck: optimize pass1 for CPU time Theodore Ts'o
` (6 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
This speeds up reading bitmaps from disk for very large (and full)
disks by significant amounts (i.e., up to two CPU minutes for a 4T
file system).
Addresses-Google-Bug: #7534813
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
lib/ext2fs/blkmap64_rb.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index a42eda1..816f44f 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -674,16 +674,42 @@ static errcode_t rb_set_bmap_range(ext2fs_generic_bitmap bitmap,
__u64 start, size_t num, void *in)
{
struct ext2fs_rb_private *bp;
+ unsigned char *cp;
size_t i;
+ int first_set = -1;
int ret;
bp = (struct ext2fs_rb_private *) bitmap->private;
for (i = 0; i < num; i++) {
- ret = ext2fs_test_bit(i, in);
- if (ret)
- rb_insert_extent(start + i - bitmap->start, 1, bp);
+ if (i & 7 == 0) {
+ unsigned char c = cp[i/8];
+ if (c == 0xFF) {
+ if (first_set == -1)
+ first_set = i;
+ i += 7;
+ continue;
+ }
+ if ((c == 0x00) && (first_set == -1)) {
+ i += 7;
+ continue;
+ }
+ }
+ if (ext2fs_test_bit(i, in)) {
+ if (first_set == -1)
+ first_set = i;
+ continue;
+ }
+ if (first_set == -1)
+ continue;
+
+ rb_insert_extent(start + first_set - bitmap->start,
+ i - first_set, bp);
+ first_set = -1;
}
+ if (first_set != -1)
+ rb_insert_extent(start + first_set - bitmap->start,
+ num - first_set, bp);
return 0;
}
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/6] e2fsck: optimize pass1 for CPU time
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
2012-11-25 0:36 ` [PATCH 1/6] libext2fs: optimize rb_set_bmap_range() Theodore Ts'o
@ 2012-11-25 0:36 ` Theodore Ts'o
2012-11-26 10:06 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 3/6] libext2fs: add ext2fs_bitcount() function Theodore Ts'o
` (5 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
Optimize e2fsck pass 1 by marking entire extents as being in use at a
time, instead of block by block. This optimization only works for
non-bigalloc file systems for now (it's tricky to handle bigalloc file
systems since this code is also responsible for dealing with blocks
that are not correctly aligned within a cluster). When the
optimization works, the CPU savings can be significant: up to two or
three CPU minutes for a full 4T disk.
Addresses-Google-Bug: #7534813
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
e2fsck/pass1.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 78fbe8d..cc00e0f 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -1432,6 +1432,16 @@ static _INLINE_ void mark_block_used(e2fsck_t ctx, blk64_t block)
}
}
+static _INLINE_ void mark_blocks_used(e2fsck_t ctx, blk64_t block,
+ unsigned int num)
+{
+ if (ext2fs_test_block_bitmap_range2(ctx->block_found_map, block, num))
+ ext2fs_mark_block_bitmap_range2(ctx->block_found_map, block, num);
+ else
+ while (num--)
+ mark_block_used(ctx, block++);
+}
+
/*
* Adjust the extended attribute block's reference counts at the end
* of pass 1, either by subtracting out references for EA blocks that
@@ -1867,11 +1877,15 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
goto failed_add_dir_block;
}
}
+ if (!ctx->fs->cluster_ratio_bits) {
+ mark_blocks_used(ctx, extent.e_pblk, extent.e_len);
+ pb->num_blocks += extent.e_len;
+ }
for (blk = extent.e_pblk, blockcnt = extent.e_lblk, i = 0;
i < extent.e_len;
blk++, blockcnt++, i++) {
- if (!(ctx->fs->cluster_ratio_bits &&
- pb->previous_block &&
+ if (ctx->fs->cluster_ratio_bits &&
+ !(pb->previous_block &&
(EXT2FS_B2C(ctx->fs, blk) ==
EXT2FS_B2C(ctx->fs, pb->previous_block)) &&
(blk & EXT2FS_CLUSTER_MASK(ctx->fs)) ==
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 3/6] libext2fs: add ext2fs_bitcount() function
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
2012-11-25 0:36 ` [PATCH 1/6] libext2fs: optimize rb_set_bmap_range() Theodore Ts'o
2012-11-25 0:36 ` [PATCH 2/6] e2fsck: optimize pass1 for CPU time Theodore Ts'o
@ 2012-11-25 0:36 ` Theodore Ts'o
2012-11-26 10:30 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 4/6] libext2fs: optimize rb_get_bmap_range() Theodore Ts'o
` (4 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
This function efficiently counts the number of bits in a block of
memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
lib/ext2fs/bitops.c | 35 +++++++++++++++++++++++++++++++++++
lib/ext2fs/bitops.h | 1 +
lib/ext2fs/tst_bitmaps.c | 1 +
lib/ext2fs/tst_bitmaps_exp | 3 +++
4 files changed, 40 insertions(+)
diff --git a/lib/ext2fs/bitops.c b/lib/ext2fs/bitops.c
index 9322a35..7f5b66f 100644
--- a/lib/ext2fs/bitops.c
+++ b/lib/ext2fs/bitops.c
@@ -116,3 +116,38 @@ int ext2fs_test_bit64(__u64 nr, const void * addr)
return (mask & *ADDR);
}
+static unsigned int popcount8(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55);
+ res = (res & 0x33) + ((res >> 2) & 0x33);
+ return (res + (res >> 4)) & 0x0F;
+}
+
+static unsigned int popcount32(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55555555);
+ res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+ res = (res + (res >> 4)) & 0x0F0F0F0F;
+ res = res + (res >> 8);
+ return (res + (res >> 16)) & 0x000000FF;
+}
+
+unsigned int ext2fs_bitcount(const void *addr, unsigned int count)
+{
+ const unsigned char *cp = addr;
+ const __u32 *p = addr;
+ unsigned int res = 0;
+
+ if ((((unsigned long) addr) & 3) == 0) {
+ while (count > 4) {
+ res += popcount32(*p++);
+ count -= 4;
+ }
+ cp = (const unsigned char *) p;
+ }
+ while (count > 0) {
+ res += popcount8(*cp++);
+ count--;
+ }
+ return res;
+}
diff --git a/lib/ext2fs/bitops.h b/lib/ext2fs/bitops.h
index 526870f..17e707c 100644
--- a/lib/ext2fs/bitops.h
+++ b/lib/ext2fs/bitops.h
@@ -686,6 +686,7 @@ extern int ext2fs_test_bit(unsigned int nr, const void * addr);
extern int ext2fs_set_bit64(__u64 nr,void * addr);
extern int ext2fs_clear_bit64(__u64 nr, void * addr);
extern int ext2fs_test_bit64(__u64 nr, const void * addr);
+extern unsigned int ext2fs_bitcount(const void *addr, unsigned int count);
#ifdef NO_INLINE_FUNCS
extern void ext2fs_fast_set_bit(unsigned int nr,void * addr);
diff --git a/lib/ext2fs/tst_bitmaps.c b/lib/ext2fs/tst_bitmaps.c
index 5da3693..2a76292 100644
--- a/lib/ext2fs/tst_bitmaps.c
+++ b/lib/ext2fs/tst_bitmaps.c
@@ -270,6 +270,7 @@ void dump_bitmap(ext2fs_generic_bitmap bmap, unsigned int start, unsigned num)
for (i=0; i < len; i++)
printf("%02x", buf[i]);
printf("\n");
+ printf("bits set: %u\n", ext2fs_bitcount(buf, len));
free(buf);
}
diff --git a/lib/ext2fs/tst_bitmaps_exp b/lib/ext2fs/tst_bitmaps_exp
index 2d406ce..893f315 100644
--- a/lib/ext2fs/tst_bitmaps_exp
+++ b/lib/ext2fs/tst_bitmaps_exp
@@ -36,6 +36,7 @@ tst_bitmaps: testb 16
Block 16 is set
tst_bitmaps: dump_bb
block bitmap: 00f80000000000000000000000000000
+bits set: 5
tst_bitmaps: ffzb 11 16
First unmarked block is 11
tst_bitmaps: ffzb 12 16
@@ -64,6 +65,7 @@ tst_bitmaps: setb 12 7
Marking blocks 12 to 18
tst_bitmaps: dump_bb
block bitmap: 00f80300000000000000000000000000
+bits set: 7
tst_bitmaps: seti 2
Setting inode 2, was clear before
tst_bitmaps: seti 5
@@ -82,6 +84,7 @@ tst_bitmaps: testi 1
Inode 1 is clear
tst_bitmaps: dump_ib
inode bitmap: 1e000000
+bits set: 4
tst_bitmaps: ffzi 1 6
First unmarked inode is 1
tst_bitmaps: ffzi 2 5
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 4/6] libext2fs: optimize rb_get_bmap_range()
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
` (2 preceding siblings ...)
2012-11-25 0:36 ` [PATCH 3/6] libext2fs: add ext2fs_bitcount() function Theodore Ts'o
@ 2012-11-25 0:36 ` Theodore Ts'o
2012-11-26 10:46 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps Theodore Ts'o
` (3 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
This simplifies the rb_get_bmap_range() function and speeds it up for
the case where most of the bitmap is zero.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
lib/ext2fs/blkmap64_rb.c | 25 ++++++++-----------------
1 file changed, 8 insertions(+), 17 deletions(-)
diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index 816f44f..0705cf2 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -741,32 +741,23 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
break;
}
- pos = start;
+ memset(out, 0, (num + 7) >> 3);
+
for (; parent != NULL; parent = next) {
next = ext2fs_rb_next(parent);
ext = ext2fs_rb_entry(parent, struct bmap_rb_extent, node);
- while (((pos - start) < num) &&
- (pos < ext->start)) {
- ext2fs_fast_clear_bit64((pos - start), out);
- pos++;
- }
-
- if ((pos - start) >= num)
- return 0;
+ pos = ext->start;
+ if (pos < start)
+ pos = start;
- while (((pos - start) < num) &&
- (pos < (ext->start + ext->count))) {
+ while (pos < (ext->start + ext->count)) {
+ if ((pos - start) >= num)
+ return 0;
ext2fs_fast_set_bit64((pos - start), out);
pos++;
}
}
-
- while ((pos - start) < num) {
- ext2fs_fast_clear_bit64((pos - start), out);
- pos++;
- }
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
` (3 preceding siblings ...)
2012-11-25 0:36 ` [PATCH 4/6] libext2fs: optimize rb_get_bmap_range() Theodore Ts'o
@ 2012-11-25 0:36 ` Theodore Ts'o
2012-11-26 11:22 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization Theodore Ts'o
` (2 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
This optimizies the CPU utilization of the rb_get_bmap_range()
function when most of the bitmap is allocated.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
lib/ext2fs/blkmap64_rb.c | 29 ++++++++++++++++++++++++-----
1 file changed, 24 insertions(+), 5 deletions(-)
diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
index 0705cf2..3a5518b 100644
--- a/lib/ext2fs/blkmap64_rb.c
+++ b/lib/ext2fs/blkmap64_rb.c
@@ -721,6 +721,7 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
struct rb_node *parent = NULL, *next, **n;
struct ext2fs_rb_private *bp;
struct bmap_rb_extent *ext;
+ int count;
__u64 pos;
bp = (struct ext2fs_rb_private *) bitmap->private;
@@ -748,14 +749,32 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
ext = ext2fs_rb_entry(parent, struct bmap_rb_extent, node);
pos = ext->start;
- if (pos < start)
+ count = ext->count;
+ if (pos > start+num)
+ break;
+ if (pos < start) {
+ count -= start - pos;
+ if (count < 0)
+ continue;
pos = start;
-
- while (pos < (ext->start + ext->count)) {
- if ((pos - start) >= num)
- return 0;
+ }
+ if (pos+count > start+num)
+ count = start+num - pos;
+
+ while (count > 0) {
+ if ((count >= 8) &&
+ ((pos - start) % 8) == 0) {
+ int nbytes = count >> 3;
+ int offset = (pos -start) >> 3;
+
+ memset(out + offset, 0xFF, nbytes);
+ pos += nbytes << 3;
+ count -= nbytes << 3;
+ continue;
+ }
ext2fs_fast_set_bit64((pos - start), out);
pos++;
+ count--;
}
}
return 0;
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
` (4 preceding siblings ...)
2012-11-25 0:36 ` [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps Theodore Ts'o
@ 2012-11-25 0:36 ` Theodore Ts'o
2012-11-26 11:59 ` Lukáš Czerner
2012-11-25 5:09 ` [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
2012-11-26 9:19 ` Lukáš Czerner
7 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 0:36 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
Add a fast past optimization in e2fsck's pass 5 for the common case
where the block bitmap is correct. The optimization works by
extracting each block group's block allocation bitmap into a memory
buffer, and comparing it with the expected allocation bitmap using
memcmp(). If it matches, then we can just update the free block
counts and be on our way, and skip checking each bit individually.
Addresses-Google-Bug: #7534813
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
e2fsck/pass5.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 2 deletions(-)
diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
index 8312fe0..5aff55c 100644
--- a/e2fsck/pass5.c
+++ b/e2fsck/pass5.c
@@ -212,6 +212,12 @@ static void check_block_bitmaps(e2fsck_t ctx)
int cmp_block = 0;
int redo_flag = 0;
blk64_t super_blk, old_desc_blk, new_desc_blk;
+ char *actual_buf, *bitmap_buf;
+
+ actual_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize,
+ "actual bitmap buffer");
+ bitmap_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize,
+ "bitmap block buffer");
clear_problem_context(&pctx);
free_array = (unsigned int *) e2fsck_allocate_memory(ctx,
@@ -259,11 +265,53 @@ redo_counts:
for (i = B2C(fs->super->s_first_data_block);
i < ext2fs_blocks_count(fs->super);
i += EXT2FS_CLUSTER_RATIO(fs)) {
+ int first_block_in_bg = (B2C(i) -
+ B2C(fs->super->s_first_data_block)) %
+ fs->super->s_clusters_per_group == 0;
+ int n, nbytes = fs->super->s_clusters_per_group / 8;
+
actual = ext2fs_fast_test_block_bitmap2(ctx->block_found_map, i);
+ /*
+ * Try to optimize pass5 by extracting a bitmap block
+ * as expected from what we have on disk, and then
+ * comparing the two. If they are identical, then
+ * update the free block counts and go on to the next
+ * block group. This is much faster than doing the
+ * individual bit-by-bit comparison. The one downside
+ * is that this doesn't work if we are asking e2fsck
+ * to do a discard operation.
+ */
+ if (!first_block_in_bg ||
+ (group == (int)fs->group_desc_count - 1) ||
+ (ctx->options & E2F_OPT_DISCARD))
+ goto no_optimize;
+
+ retval = ext2fs_get_block_bitmap_range2(ctx->block_found_map,
+ B2C(i), fs->super->s_clusters_per_group,
+ actual_buf);
+ if (retval)
+ goto no_optimize;
+ if (ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT))
+ memset(bitmap_buf, 0, nbytes);
+ else {
+ retval = ext2fs_get_block_bitmap_range2(fs->block_map,
+ B2C(i), fs->super->s_clusters_per_group,
+ bitmap_buf);
+ if (retval)
+ goto no_optimize;
+ }
+ if (memcmp(actual_buf, bitmap_buf, nbytes) != 0)
+ goto no_optimize;
+ n = ext2fs_bitcount(actual_buf, nbytes);
+ group_free = fs->super->s_clusters_per_group - n;
+ free_blocks += group_free;
+ i += fs->super->s_clusters_per_group - 1;
+ goto next_group;
+ no_optimize:
+
if (skip_group) {
- if ((B2C(i) - B2C(fs->super->s_first_data_block)) %
- fs->super->s_clusters_per_group == 0) {
+ if (first_block_in_bg) {
super_blk = 0;
old_desc_blk = 0;
new_desc_blk = 0;
@@ -401,6 +449,7 @@ redo_counts:
if (!bitmap && i >= first_free)
e2fsck_discard_blocks(ctx, first_free,
(i - first_free) + 1);
+ next_group:
first_free = ext2fs_blocks_count(fs->super);
free_array[group] = group_free;
@@ -475,6 +524,8 @@ redo_counts:
}
errout:
ext2fs_free_mem(&free_array);
+ ext2fs_free_mem(&actual_buf);
+ ext2fs_free_mem(&bitmap_buf);
}
static void check_inode_bitmaps(e2fsck_t ctx)
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/6] Optimize e2fsck for large file systems
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
` (5 preceding siblings ...)
2012-11-25 0:36 ` [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization Theodore Ts'o
@ 2012-11-25 5:09 ` Theodore Ts'o
2012-11-26 9:19 ` Lukáš Czerner
7 siblings, 0 replies; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-25 5:09 UTC (permalink / raw)
To: Ext4 Developers List
Oops, sorry, I screwed up the "before" numbers. Here are the proper
(apples-to-apples) comparisons; the results are not quite as
eye-popping, but there is still a significant (3x in wall clock time)
improvement...
Before...
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 916k/31500k (719k/198k), time: 96.48/81.26/ 0.10
Pass 1: I/O read: 8MB, write: 0MB, rate: 0.08MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 916k/62064k (704k/212k), time: 0.46/ 0.01/ 0.03
Pass 2: I/O read: 4MB, write: 0MB, rate: 8.72MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 916k/62064k (705k/212k), time: 97.56/81.88/ 0.13
Pass 3A: Memory used: 916k/62064k (736k/181k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 916k/62064k (693k/223k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 1257.86MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 916k/936k (602k/315k), time: 6.59/ 6.55/ 0.02
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 1048k/936k (569k/480k), time: 93.39/80.92/ 0.69
Pass 5: I/O read: 117MB, write: 0MB, rate: 1.25MB/s
Memory used: 1048k/936k (569k/480k), time: 197.58/169.36/ 0.84
I/O read: 129MB, write: 0MB, rate: 0.65MB/s
169.36user 0.90system 3:17.66elapsed 86%CPU (0avgtext+0avgdata 254192maxresident)k
0inputs+0outputs (12major+16066minor)pagefaults 0swaps
.... and after....
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 916k/31500k (719k/198k), time: 17.83/ 2.79/ 0.05
Pass 1: I/O read: 8MB, write: 0MB, rate: 0.45MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 916k/62064k (704k/212k), time: 0.45/ 0.01/ 0.01
Pass 2: I/O read: 4MB, write: 0MB, rate: 8.82MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 916k/62064k (704k/212k), time: 18.89/ 3.39/ 0.07
Pass 3A: Memory used: 916k/62064k (735k/181k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 916k/62064k (693k/224k), time: 0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 1265.82MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 916k/936k (601k/315k), time: 5.69/ 5.67/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 1048k/936k (569k/480k), time: 22.76/10.49/ 0.53
Pass 5: I/O read: 117MB, write: 0MB, rate: 5.14MB/s
Memory used: 1048k/936k (569k/480k), time: 47.38/19.57/ 0.60
I/O read: 129MB, write: 0MB, rate: 2.72MB/s
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 0/6] Optimize e2fsck for large file systems
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
` (6 preceding siblings ...)
2012-11-25 5:09 ` [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
@ 2012-11-26 9:19 ` Lukáš Czerner
7 siblings, 0 replies; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 9:19 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:28 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [RFC PATCH 0/6] Optimize e2fsck for large file systems
>
> This patch series optimizes e2fsck for large file systems (where large
> is 4TB or more). Previously checking a 4TB file system when it was
> mostly full could take upwards of six minutes of wall clock time, and
> e2fsck would be mostly CPU bound. With this patch series, the same 4TB
> file system can now be checked in less than 50 seconds and approximately
> 20 seconds of userspace CPU time. (Previously, it was consuming over
> 15 times as much CPU time.)
>
> The speed ups come in three places:
>
> 1) Reducing the CPU time while reading the block bitmap in from disk.
> This was done by speeding up rb_set_bmap_range, and it significantly
> improves e2fsck's pass 5 operation.
>
> 2) Reducing the CPU time in e2fsck pass1 while constructing the
> block_found_map (which is the in-core block allocation bitmap as
> found by interating over all of the inodes).
>
> 3) Further speed up e2fsck's pass 5 by comparing the block allocation
> bitmap one bitmap block at a time, instead of a bit-at-a-time.
>
> Before....
>
> Pass 1: Checking inodes, blocks, and sizes
> Pass 1: Memory used: 712k/31500k (622k/91k), time: 194.99/179.09/ 0.04
> Pass 1: I/O read: 8MB, write: 0MB, rate: 0.04MB/s
> Pass 2: Checking directory structure
> Pass 2: Memory used: 712k/62064k (605k/108k), time: 1.03/ 0.01/ 0.02
> Pass 2: I/O read: 4MB, write: 0MB, rate: 3.89MB/s
> Pass 3: Checking directory connectivity
> Peak memory: Memory used: 712k/62064k (605k/108k), time: 197.31/180.37/ 0.07
> Pass 3A: Memory used: 712k/62064k (626k/87k), time: 0.00/ 0.00/ 0.00
> Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
> Pass 3: Memory used: 712k/62064k (594k/119k), time: 0.00/ 0.00/ 0.00
> Pass 3: I/O read: 1MB, write: 0MB, rate: 639.39MB/s
> Pass 4: Checking reference counts
> Pass 4: Memory used: 712k/936k (533k/180k), time: 7.82/ 7.81/ 0.00
> Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
> Pass 5: Checking group summary information
> Pass 5: Memory used: 832k/936k (510k/323k), time: 172.99/161.13/ 0.41
> Pass 5: I/O read: 118MB, write: 0MB, rate: 0.68MB/s
> Memory used: 832k/936k (510k/323k), time: 378.21/349.38/ 0.48
> I/O read: 129MB, write: 0MB, rate: 0.34MB/s
>
> ... and after....
>
> Pass 1: Checking inodes, blocks, and sizes
> Pass 1: Memory used: 916k/31500k (719k/198k), time: 17.83/ 2.79/ 0.05
> Pass 1: I/O read: 8MB, write: 0MB, rate: 0.45MB/s
> Pass 2: Checking directory structure
> Pass 2: Memory used: 916k/62064k (704k/212k), time: 0.45/ 0.01/ 0.01
> Pass 2: I/O read: 4MB, write: 0MB, rate: 8.82MB/s
> Pass 3: Checking directory connectivity
> Peak memory: Memory used: 916k/62064k (704k/212k), time: 18.89/ 3.39/ 0.07
> Pass 3A: Memory used: 916k/62064k (735k/181k), time: 0.00/ 0.00/ 0.00
> Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
> Pass 3: Memory used: 916k/62064k (693k/224k), time: 0.00/ 0.00/ 0.00
> Pass 3: I/O read: 1MB, write: 0MB, rate: 1265.82MB/s
> Pass 4: Checking reference counts
> Pass 4: Memory used: 916k/936k (601k/315k), time: 5.69/ 5.67/ 0.00
> Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
> Pass 5: Checking group summary information
> Pass 5: Memory used: 1048k/936k (569k/480k), time: 22.76/10.49/ 0.53
> Pass 5: I/O read: 117MB, write: 0MB, rate: 5.14MB/s
> Memory used: 1048k/936k (569k/480k), time: 47.38/19.57/ 0.60
> I/O read: 129MB, write: 0MB, rate: 2.72MB/s
That's very impressive speed up Ted! I'll give it a try and take a
look at the patches.
Thanks!
-Lukas
>
> For slower CPU's (i.e., bookshelf NAS servers with underpowered, wimpy
> ARM processors) or for larger RAID arrays, the speed ups would of course
> be even better.
>
> Theodore Ts'o (6):
> libext2fs: optimize rb_set_bmap_range()
> e2fsck: optimize pass1 for CPU time
> libext2fs: add ext2fs_bitcount() function
> libext2fs: optimize rb_get_bmap_range()
> libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps
> e2fsck: optimize pass 5 for CPU utilization
>
> e2fsck/pass1.c | 18 +++++++++--
> e2fsck/pass5.c | 55 +++++++++++++++++++++++++++++++--
> lib/ext2fs/bitops.c | 35 +++++++++++++++++++++
> lib/ext2fs/bitops.h | 1 +
> lib/ext2fs/blkmap64_rb.c | 76 ++++++++++++++++++++++++++++++++++------------
> lib/ext2fs/tst_bitmaps.c | 1 +
> lib/ext2fs/tst_bitmaps_exp | 3 ++
> 7 files changed, 165 insertions(+), 24 deletions(-)
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/6] libext2fs: optimize rb_set_bmap_range()
2012-11-25 0:36 ` [PATCH 1/6] libext2fs: optimize rb_set_bmap_range() Theodore Ts'o
@ 2012-11-26 9:40 ` Lukáš Czerner
2012-11-26 13:36 ` Theodore Ts'o
0 siblings, 1 reply; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 9:40 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:29 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH 1/6] libext2fs: optimize rb_set_bmap_range()
>
> This speeds up reading bitmaps from disk for very large (and full)
> disks by significant amounts (i.e., up to two CPU minutes for a 4T
> file system).
>
> Addresses-Google-Bug: #7534813
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> lib/ext2fs/blkmap64_rb.c | 32 +++++++++++++++++++++++++++++---
> 1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
> index a42eda1..816f44f 100644
> --- a/lib/ext2fs/blkmap64_rb.c
> +++ b/lib/ext2fs/blkmap64_rb.c
> @@ -674,16 +674,42 @@ static errcode_t rb_set_bmap_range(ext2fs_generic_bitmap bitmap,
> __u64 start, size_t num, void *in)
> {
> struct ext2fs_rb_private *bp;
> + unsigned char *cp;
> size_t i;
> + int first_set = -1;
> int ret;
>
> bp = (struct ext2fs_rb_private *) bitmap->private;
>
> for (i = 0; i < num; i++) {
> - ret = ext2fs_test_bit(i, in);
> - if (ret)
> - rb_insert_extent(start + i - bitmap->start, 1, bp);
> + if (i & 7 == 0) {
> + unsigned char c = cp[i/8];
I do not see cp initialized anywhere. I suppose it should map to the
'in' bitmap ?
I guess that 8 will always be aliquot part of 'num', by maybe we
could explicitly check for that to avoid access to uninitialized
memory ?
> + if (c == 0xFF) {
> + if (first_set == -1)
> + first_set = i;
> + i += 7;
> + continue;
> + }
> + if ((c == 0x00) && (first_set == -1)) {
> + i += 7;
> + continue;
> + }
> + }
> + if (ext2fs_test_bit(i, in)) {
> + if (first_set == -1)
> + first_set = i;
> + continue;
> + }
> + if (first_set == -1)
> + continue;
> +
> + rb_insert_extent(start + first_set - bitmap->start,
> + i - first_set, bp);
> + first_set = -1;
> }
> + if (first_set != -1)
> + rb_insert_extent(start + first_set - bitmap->start,
> + num - first_set, bp);
>
> return 0;
> }
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/6] e2fsck: optimize pass1 for CPU time
2012-11-25 0:36 ` [PATCH 2/6] e2fsck: optimize pass1 for CPU time Theodore Ts'o
@ 2012-11-26 10:06 ` Lukáš Czerner
0 siblings, 0 replies; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 10:06 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:30 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH 2/6] e2fsck: optimize pass1 for CPU time
>
> Optimize e2fsck pass 1 by marking entire extents as being in use at a
> time, instead of block by block. This optimization only works for
> non-bigalloc file systems for now (it's tricky to handle bigalloc file
> systems since this code is also responsible for dealing with blocks
> that are not correctly aligned within a cluster). When the
> optimization works, the CPU savings can be significant: up to two or
> three CPU minutes for a full 4T disk.
>
> Addresses-Google-Bug: #7534813
Looks good.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> e2fsck/pass1.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> index 78fbe8d..cc00e0f 100644
> --- a/e2fsck/pass1.c
> +++ b/e2fsck/pass1.c
> @@ -1432,6 +1432,16 @@ static _INLINE_ void mark_block_used(e2fsck_t ctx, blk64_t block)
> }
> }
>
> +static _INLINE_ void mark_blocks_used(e2fsck_t ctx, blk64_t block,
> + unsigned int num)
> +{
> + if (ext2fs_test_block_bitmap_range2(ctx->block_found_map, block, num))
> + ext2fs_mark_block_bitmap_range2(ctx->block_found_map, block, num);
> + else
> + while (num--)
> + mark_block_used(ctx, block++);
> +}
> +
> /*
> * Adjust the extended attribute block's reference counts at the end
> * of pass 1, either by subtracting out references for EA blocks that
> @@ -1867,11 +1877,15 @@ static void scan_extent_node(e2fsck_t ctx, struct problem_context *pctx,
> goto failed_add_dir_block;
> }
> }
> + if (!ctx->fs->cluster_ratio_bits) {
> + mark_blocks_used(ctx, extent.e_pblk, extent.e_len);
> + pb->num_blocks += extent.e_len;
> + }
> for (blk = extent.e_pblk, blockcnt = extent.e_lblk, i = 0;
> i < extent.e_len;
> blk++, blockcnt++, i++) {
> - if (!(ctx->fs->cluster_ratio_bits &&
> - pb->previous_block &&
> + if (ctx->fs->cluster_ratio_bits &&
> + !(pb->previous_block &&
> (EXT2FS_B2C(ctx->fs, blk) ==
> EXT2FS_B2C(ctx->fs, pb->previous_block)) &&
> (blk & EXT2FS_CLUSTER_MASK(ctx->fs)) ==
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/6] libext2fs: add ext2fs_bitcount() function
2012-11-25 0:36 ` [PATCH 3/6] libext2fs: add ext2fs_bitcount() function Theodore Ts'o
@ 2012-11-26 10:30 ` Lukáš Czerner
2012-11-26 14:06 ` Theodore Ts'o
0 siblings, 1 reply; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 10:30 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:31 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH 3/6] libext2fs: add ext2fs_bitcount() function
>
> This function efficiently counts the number of bits in a block of
> memory.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> lib/ext2fs/bitops.c | 35 +++++++++++++++++++++++++++++++++++
> lib/ext2fs/bitops.h | 1 +
> lib/ext2fs/tst_bitmaps.c | 1 +
> lib/ext2fs/tst_bitmaps_exp | 3 +++
> 4 files changed, 40 insertions(+)
>
> diff --git a/lib/ext2fs/bitops.c b/lib/ext2fs/bitops.c
> index 9322a35..7f5b66f 100644
> --- a/lib/ext2fs/bitops.c
> +++ b/lib/ext2fs/bitops.c
> @@ -116,3 +116,38 @@ int ext2fs_test_bit64(__u64 nr, const void * addr)
> return (mask & *ADDR);
> }
>
> +static unsigned int popcount8(unsigned int w)
> +{
> + unsigned int res = w - ((w >> 1) & 0x55);
> + res = (res & 0x33) + ((res >> 2) & 0x33);
> + return (res + (res >> 4)) & 0x0F;
> +}
> +
> +static unsigned int popcount32(unsigned int w)
> +{
> + unsigned int res = w - ((w >> 1) & 0x55555555);
> + res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
> + res = (res + (res >> 4)) & 0x0F0F0F0F;
> + res = res + (res >> 8);
> + return (res + (res >> 16)) & 0x000000FF;
> +}
> +
> +unsigned int ext2fs_bitcount(const void *addr, unsigned int count)
> +{
> + const unsigned char *cp = addr;
> + const __u32 *p = addr;
> + unsigned int res = 0;
> +
Again it is assumed that 8 will always be aliquot of 'count', but
it might be worth having a check for that ?
Otherwise it looks good.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
> + if ((((unsigned long) addr) & 3) == 0) {
> + while (count > 4) {
> + res += popcount32(*p++);
> + count -= 4;
> + }
> + cp = (const unsigned char *) p;
> + }
> + while (count > 0) {
> + res += popcount8(*cp++);
> + count--;
> + }
> + return res;
> +}
> diff --git a/lib/ext2fs/bitops.h b/lib/ext2fs/bitops.h
> index 526870f..17e707c 100644
> --- a/lib/ext2fs/bitops.h
> +++ b/lib/ext2fs/bitops.h
> @@ -686,6 +686,7 @@ extern int ext2fs_test_bit(unsigned int nr, const void * addr);
> extern int ext2fs_set_bit64(__u64 nr,void * addr);
> extern int ext2fs_clear_bit64(__u64 nr, void * addr);
> extern int ext2fs_test_bit64(__u64 nr, const void * addr);
> +extern unsigned int ext2fs_bitcount(const void *addr, unsigned int count);
>
> #ifdef NO_INLINE_FUNCS
> extern void ext2fs_fast_set_bit(unsigned int nr,void * addr);
> diff --git a/lib/ext2fs/tst_bitmaps.c b/lib/ext2fs/tst_bitmaps.c
> index 5da3693..2a76292 100644
> --- a/lib/ext2fs/tst_bitmaps.c
> +++ b/lib/ext2fs/tst_bitmaps.c
> @@ -270,6 +270,7 @@ void dump_bitmap(ext2fs_generic_bitmap bmap, unsigned int start, unsigned num)
> for (i=0; i < len; i++)
> printf("%02x", buf[i]);
> printf("\n");
> + printf("bits set: %u\n", ext2fs_bitcount(buf, len));
> free(buf);
> }
>
> diff --git a/lib/ext2fs/tst_bitmaps_exp b/lib/ext2fs/tst_bitmaps_exp
> index 2d406ce..893f315 100644
> --- a/lib/ext2fs/tst_bitmaps_exp
> +++ b/lib/ext2fs/tst_bitmaps_exp
> @@ -36,6 +36,7 @@ tst_bitmaps: testb 16
> Block 16 is set
> tst_bitmaps: dump_bb
> block bitmap: 00f80000000000000000000000000000
> +bits set: 5
> tst_bitmaps: ffzb 11 16
> First unmarked block is 11
> tst_bitmaps: ffzb 12 16
> @@ -64,6 +65,7 @@ tst_bitmaps: setb 12 7
> Marking blocks 12 to 18
> tst_bitmaps: dump_bb
> block bitmap: 00f80300000000000000000000000000
> +bits set: 7
> tst_bitmaps: seti 2
> Setting inode 2, was clear before
> tst_bitmaps: seti 5
> @@ -82,6 +84,7 @@ tst_bitmaps: testi 1
> Inode 1 is clear
> tst_bitmaps: dump_ib
> inode bitmap: 1e000000
> +bits set: 4
> tst_bitmaps: ffzi 1 6
> First unmarked inode is 1
> tst_bitmaps: ffzi 2 5
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/6] libext2fs: optimize rb_get_bmap_range()
2012-11-25 0:36 ` [PATCH 4/6] libext2fs: optimize rb_get_bmap_range() Theodore Ts'o
@ 2012-11-26 10:46 ` Lukáš Czerner
0 siblings, 0 replies; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 10:46 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:32 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH 4/6] libext2fs: optimize rb_get_bmap_range()
>
> This simplifies the rb_get_bmap_range() function and speeds it up for
> the case where most of the bitmap is zero.
Looks good
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> lib/ext2fs/blkmap64_rb.c | 25 ++++++++-----------------
> 1 file changed, 8 insertions(+), 17 deletions(-)
>
> diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
> index 816f44f..0705cf2 100644
> --- a/lib/ext2fs/blkmap64_rb.c
> +++ b/lib/ext2fs/blkmap64_rb.c
> @@ -741,32 +741,23 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
> break;
> }
>
> - pos = start;
> + memset(out, 0, (num + 7) >> 3);
> +
> for (; parent != NULL; parent = next) {
> next = ext2fs_rb_next(parent);
> ext = ext2fs_rb_entry(parent, struct bmap_rb_extent, node);
>
> - while (((pos - start) < num) &&
> - (pos < ext->start)) {
> - ext2fs_fast_clear_bit64((pos - start), out);
> - pos++;
> - }
> -
> - if ((pos - start) >= num)
> - return 0;
> + pos = ext->start;
> + if (pos < start)
> + pos = start;
>
> - while (((pos - start) < num) &&
> - (pos < (ext->start + ext->count))) {
> + while (pos < (ext->start + ext->count)) {
> + if ((pos - start) >= num)
> + return 0;
> ext2fs_fast_set_bit64((pos - start), out);
> pos++;
> }
> }
> -
> - while ((pos - start) < num) {
> - ext2fs_fast_clear_bit64((pos - start), out);
> - pos++;
> - }
> -
> return 0;
> }
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps
2012-11-25 0:36 ` [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps Theodore Ts'o
@ 2012-11-26 11:22 ` Lukáš Czerner
2012-11-26 14:17 ` Theodore Ts'o
0 siblings, 1 reply; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 11:22 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:33 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly
> allocated bmaps
>
> This optimizies the CPU utilization of the rb_get_bmap_range()
> function when most of the bitmap is allocated.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> lib/ext2fs/blkmap64_rb.c | 29 ++++++++++++++++++++++++-----
> 1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/lib/ext2fs/blkmap64_rb.c b/lib/ext2fs/blkmap64_rb.c
> index 0705cf2..3a5518b 100644
> --- a/lib/ext2fs/blkmap64_rb.c
> +++ b/lib/ext2fs/blkmap64_rb.c
> @@ -721,6 +721,7 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
> struct rb_node *parent = NULL, *next, **n;
> struct ext2fs_rb_private *bp;
> struct bmap_rb_extent *ext;
> + int count;
> __u64 pos;
>
> bp = (struct ext2fs_rb_private *) bitmap->private;
> @@ -748,14 +749,32 @@ static errcode_t rb_get_bmap_range(ext2fs_generic_bitmap bitmap,
> ext = ext2fs_rb_entry(parent, struct bmap_rb_extent, node);
>
> pos = ext->start;
> - if (pos < start)
> + count = ext->count;
> + if (pos > start+num)
Missing spacing around '+'.
Also I think that the condition should rather be:
if (pos >= start + num)
> + break;
> + if (pos < start) {
> + count -= start - pos;
> + if (count < 0)
> + continue;
> pos = start;
> -
> - while (pos < (ext->start + ext->count)) {
> - if ((pos - start) >= num)
> - return 0;
> + }
> + if (pos+count > start+num)
> + count = start+num - pos;
missing spacing around '+'
> +
> + while (count > 0) {
> + if ((count >= 8) &&
> + ((pos - start) % 8) == 0) {
> + int nbytes = count >> 3;
> + int offset = (pos -start) >> 3;
missing spacing around '-'
> +
> + memset(out + offset, 0xFF, nbytes);
> + pos += nbytes << 3;
> + count -= nbytes << 3;
> + continue;
> + }
> ext2fs_fast_set_bit64((pos - start), out);
> pos++;
> + count--;
> }
> }
> return 0;
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization
2012-11-25 0:36 ` [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization Theodore Ts'o
@ 2012-11-26 11:59 ` Lukáš Czerner
0 siblings, 0 replies; 20+ messages in thread
From: Lukáš Czerner @ 2012-11-26 11:59 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
On Sat, 24 Nov 2012, Theodore Ts'o wrote:
> Date: Sat, 24 Nov 2012 19:36:34 -0500
> From: Theodore Ts'o <tytso@mit.edu>
> To: Ext4 Developers List <linux-ext4@vger.kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Subject: [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization
>
> Add a fast past optimization in e2fsck's pass 5 for the common case
> where the block bitmap is correct. The optimization works by
> extracting each block group's block allocation bitmap into a memory
> buffer, and comparing it with the expected allocation bitmap using
> memcmp(). If it matches, then we can just update the free block
> counts and be on our way, and skip checking each bit individually.
>
> Addresses-Google-Bug: #7534813
Looks good, we can always move the discard operation at the end of
the function before we free the bitmap, but that's further
optimization.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> e2fsck/pass5.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> index 8312fe0..5aff55c 100644
> --- a/e2fsck/pass5.c
> +++ b/e2fsck/pass5.c
> @@ -212,6 +212,12 @@ static void check_block_bitmaps(e2fsck_t ctx)
> int cmp_block = 0;
> int redo_flag = 0;
> blk64_t super_blk, old_desc_blk, new_desc_blk;
> + char *actual_buf, *bitmap_buf;
> +
> + actual_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize,
> + "actual bitmap buffer");
> + bitmap_buf = (char *) e2fsck_allocate_memory(ctx, fs->blocksize,
> + "bitmap block buffer");
>
> clear_problem_context(&pctx);
> free_array = (unsigned int *) e2fsck_allocate_memory(ctx,
> @@ -259,11 +265,53 @@ redo_counts:
> for (i = B2C(fs->super->s_first_data_block);
> i < ext2fs_blocks_count(fs->super);
> i += EXT2FS_CLUSTER_RATIO(fs)) {
> + int first_block_in_bg = (B2C(i) -
> + B2C(fs->super->s_first_data_block)) %
> + fs->super->s_clusters_per_group == 0;
> + int n, nbytes = fs->super->s_clusters_per_group / 8;
> +
> actual = ext2fs_fast_test_block_bitmap2(ctx->block_found_map, i);
>
> + /*
> + * Try to optimize pass5 by extracting a bitmap block
> + * as expected from what we have on disk, and then
> + * comparing the two. If they are identical, then
> + * update the free block counts and go on to the next
> + * block group. This is much faster than doing the
> + * individual bit-by-bit comparison. The one downside
> + * is that this doesn't work if we are asking e2fsck
> + * to do a discard operation.
> + */
> + if (!first_block_in_bg ||
> + (group == (int)fs->group_desc_count - 1) ||
> + (ctx->options & E2F_OPT_DISCARD))
> + goto no_optimize;
> +
> + retval = ext2fs_get_block_bitmap_range2(ctx->block_found_map,
> + B2C(i), fs->super->s_clusters_per_group,
> + actual_buf);
> + if (retval)
> + goto no_optimize;
> + if (ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT))
> + memset(bitmap_buf, 0, nbytes);
> + else {
> + retval = ext2fs_get_block_bitmap_range2(fs->block_map,
> + B2C(i), fs->super->s_clusters_per_group,
> + bitmap_buf);
> + if (retval)
> + goto no_optimize;
> + }
> + if (memcmp(actual_buf, bitmap_buf, nbytes) != 0)
> + goto no_optimize;
> + n = ext2fs_bitcount(actual_buf, nbytes);
> + group_free = fs->super->s_clusters_per_group - n;
> + free_blocks += group_free;
> + i += fs->super->s_clusters_per_group - 1;
> + goto next_group;
> + no_optimize:
> +
> if (skip_group) {
> - if ((B2C(i) - B2C(fs->super->s_first_data_block)) %
> - fs->super->s_clusters_per_group == 0) {
> + if (first_block_in_bg) {
> super_blk = 0;
> old_desc_blk = 0;
> new_desc_blk = 0;
> @@ -401,6 +449,7 @@ redo_counts:
> if (!bitmap && i >= first_free)
> e2fsck_discard_blocks(ctx, first_free,
> (i - first_free) + 1);
> + next_group:
> first_free = ext2fs_blocks_count(fs->super);
>
> free_array[group] = group_free;
> @@ -475,6 +524,8 @@ redo_counts:
> }
> errout:
> ext2fs_free_mem(&free_array);
> + ext2fs_free_mem(&actual_buf);
> + ext2fs_free_mem(&bitmap_buf);
> }
>
> static void check_inode_bitmaps(e2fsck_t ctx)
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/6] libext2fs: optimize rb_set_bmap_range()
2012-11-26 9:40 ` Lukáš Czerner
@ 2012-11-26 13:36 ` Theodore Ts'o
0 siblings, 0 replies; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-26 13:36 UTC (permalink / raw)
To: Lukáš Czerner; +Cc: Ext4 Developers List
On Mon, Nov 26, 2012 at 10:40:28AM +0100, Lukáš Czerner wrote:
> I do not see cp initialized anywhere. I suppose it should map to the
> 'in' bitmap ?
Oops, I missed this when moving the patch over. Thanks for pointing
this out! Yes, it should have been initialized:
unsigned char *cp = in;
> I guess that 8 will always be aliquot part of 'num', by maybe we
> could explicitly check for that to avoid access to uninitialized
> memory ?
It is true that ext2fs_[sg]et_bmap_range() always gets called with num
as a multiple of 8, but it should work correctly even if it isn't
here, since when we check the uninitialized bits in the last byte in
the memory range, the optimization will just fail, and then we'll fall
back to the old slow path for the last bits in the bitmap.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/6] libext2fs: add ext2fs_bitcount() function
2012-11-26 10:30 ` Lukáš Czerner
@ 2012-11-26 14:06 ` Theodore Ts'o
2012-11-26 14:10 ` [PATCH 3/6 -v2] " Theodore Ts'o
0 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-26 14:06 UTC (permalink / raw)
To: Lukáš Czerner; +Cc: Ext4 Developers List
On Mon, Nov 26, 2012 at 11:30:55AM +0100, Lukáš Czerner wrote:
> > +unsigned int ext2fs_bitcount(const void *addr, unsigned int count)
> > +{
> > + const unsigned char *cp = addr;
> > + const __u32 *p = addr;
> > + unsigned int res = 0;
> > +
>
> Again it is assumed that 8 will always be aliquot of 'count', but
> it might be worth having a check for that ?
In this case "count" is in bytes, since that's the only thing that
makes sense. I'll rename the variable to nbytes to make this clear.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/6 -v2] libext2fs: add ext2fs_bitcount() function
2012-11-26 14:06 ` Theodore Ts'o
@ 2012-11-26 14:10 ` Theodore Ts'o
2012-11-26 14:12 ` [PATCH 4/6 -v3] " Theodore Ts'o
0 siblings, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-26 14:10 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: lczerner, Theodore Ts'o
This function efficiently counts the number of bits in a block of
memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
---
Renamed function signature to make it clear that ext2fs_bitcount takes
the number of bytes, not number of bits.
Enhance the unit test to make sure the popcount function is more
thoroughly tested.
lib/ext2fs/bitops.c | 35 ++++++++++++++++
lib/ext2fs/bitops.h | 1 +
lib/ext2fs/tst_bitmaps.c | 1 +
lib/ext2fs/tst_bitmaps_cmds | 42 ++++++++++++++++++++
lib/ext2fs/tst_bitmaps_exp | 97 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 176 insertions(+)
diff --git a/lib/ext2fs/bitops.c b/lib/ext2fs/bitops.c
index 9322a35..7f5b66f 100644
--- a/lib/ext2fs/bitops.c
+++ b/lib/ext2fs/bitops.c
@@ -116,3 +116,38 @@ int ext2fs_test_bit64(__u64 nr, const void * addr)
return (mask & *ADDR);
}
+static unsigned int popcount8(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55);
+ res = (res & 0x33) + ((res >> 2) & 0x33);
+ return (res + (res >> 4)) & 0x0F;
+}
+
+static unsigned int popcount32(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55555555);
+ res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+ res = (res + (res >> 4)) & 0x0F0F0F0F;
+ res = res + (res >> 8);
+ return (res + (res >> 16)) & 0x000000FF;
+}
+
+unsigned int ext2fs_bitcount(const void *addr, unsigned int count)
+{
+ const unsigned char *cp = addr;
+ const __u32 *p = addr;
+ unsigned int res = 0;
+
+ if ((((unsigned long) addr) & 3) == 0) {
+ while (count > 4) {
+ res += popcount32(*p++);
+ count -= 4;
+ }
+ cp = (const unsigned char *) p;
+ }
+ while (count > 0) {
+ res += popcount8(*cp++);
+ count--;
+ }
+ return res;
+}
diff --git a/lib/ext2fs/bitops.h b/lib/ext2fs/bitops.h
index 526870f..17e707c 100644
--- a/lib/ext2fs/bitops.h
+++ b/lib/ext2fs/bitops.h
@@ -686,6 +686,7 @@ extern int ext2fs_test_bit(unsigned int nr, const void * addr);
extern int ext2fs_set_bit64(__u64 nr,void * addr);
extern int ext2fs_clear_bit64(__u64 nr, void * addr);
extern int ext2fs_test_bit64(__u64 nr, const void * addr);
+extern unsigned int ext2fs_bitcount(const void *addr, unsigned int count);
#ifdef NO_INLINE_FUNCS
extern void ext2fs_fast_set_bit(unsigned int nr,void * addr);
diff --git a/lib/ext2fs/tst_bitmaps.c b/lib/ext2fs/tst_bitmaps.c
index 5da3693..2a76292 100644
--- a/lib/ext2fs/tst_bitmaps.c
+++ b/lib/ext2fs/tst_bitmaps.c
@@ -270,6 +270,7 @@ void dump_bitmap(ext2fs_generic_bitmap bmap, unsigned int start, unsigned num)
for (i=0; i < len; i++)
printf("%02x", buf[i]);
printf("\n");
+ printf("bits set: %u\n", ext2fs_bitcount(buf, len));
free(buf);
}
diff --git a/lib/ext2fs/tst_bitmaps_cmds b/lib/ext2fs/tst_bitmaps_cmds
index 9a4f3d0..31e2a60 100644
--- a/lib/ext2fs/tst_bitmaps_cmds
+++ b/lib/ext2fs/tst_bitmaps_cmds
@@ -53,5 +53,47 @@ cleari 5
testi 17
testi 6
testi 4
+clearb 7 12
+dump_bb
+setb 1
+dump_bb
+setb 2
+dump_bb
+setb 3
+dump_bb
+setb 4
+dump_bb
+setb 5
+dump_bb
+setb 6
+dump_bb
+setb 7
+dump_bb
+setb 8
+dump_bb
+setb 10
+setb 12
+setb 14
+setb 17
+setb 19
+setb 24
+setb 26
+setb 27
+setb 30
+setb 31
+setb 32
+setb 35
+setb 39
+setb 40
+setb 44
+setb 46
+setb 47
+setb 49
+setb 51
+setb 52
+clearb 2
+clearb 3
+clearb 7
+dump_bb
quit
diff --git a/lib/ext2fs/tst_bitmaps_exp b/lib/ext2fs/tst_bitmaps_exp
index 2d406ce..2d62b66 100644
--- a/lib/ext2fs/tst_bitmaps_exp
+++ b/lib/ext2fs/tst_bitmaps_exp
@@ -36,6 +36,7 @@ tst_bitmaps: testb 16
Block 16 is set
tst_bitmaps: dump_bb
block bitmap: 00f80000000000000000000000000000
+bits set: 5
tst_bitmaps: ffzb 11 16
First unmarked block is 11
tst_bitmaps: ffzb 12 16
@@ -64,6 +65,7 @@ tst_bitmaps: setb 12 7
Marking blocks 12 to 18
tst_bitmaps: dump_bb
block bitmap: 00f80300000000000000000000000000
+bits set: 7
tst_bitmaps: seti 2
Setting inode 2, was clear before
tst_bitmaps: seti 5
@@ -82,6 +84,7 @@ tst_bitmaps: testi 1
Inode 1 is clear
tst_bitmaps: dump_ib
inode bitmap: 1e000000
+bits set: 4
tst_bitmaps: ffzi 1 6
First unmarked inode is 1
tst_bitmaps: ffzi 2 5
@@ -110,5 +113,99 @@ tst_bitmaps: testi 6
Inode 6 is clear
tst_bitmaps: testi 4
Inode 4 is clear
+tst_bitmaps: clearb 7 12
+Clearing blocks 7 to 18
+tst_bitmaps: dump_bb
+block bitmap: 00000000000000000000000000000000
+bits set: 0
+tst_bitmaps: setb 1
+Setting block 1, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 01000000000000000000000000000000
+bits set: 1
+tst_bitmaps: setb 2
+Setting block 2, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 03000000000000000000000000000000
+bits set: 2
+tst_bitmaps: setb 3
+Setting block 3, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 07000000000000000000000000000000
+bits set: 3
+tst_bitmaps: setb 4
+Setting block 4, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 0f000000000000000000000000000000
+bits set: 4
+tst_bitmaps: setb 5
+Setting block 5, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 1f000000000000000000000000000000
+bits set: 5
+tst_bitmaps: setb 6
+Setting block 6, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 3f000000000000000000000000000000
+bits set: 6
+tst_bitmaps: setb 7
+Setting block 7, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 7f000000000000000000000000000000
+bits set: 7
+tst_bitmaps: setb 8
+Setting block 8, was clear before
+tst_bitmaps: dump_bb
+block bitmap: ff000000000000000000000000000000
+bits set: 8
+tst_bitmaps: setb 10
+Setting block 10, was clear before
+tst_bitmaps: setb 12
+Setting block 12, was clear before
+tst_bitmaps: setb 14
+Setting block 14, was clear before
+tst_bitmaps: setb 17
+Setting block 17, was clear before
+tst_bitmaps: setb 19
+Setting block 19, was clear before
+tst_bitmaps: setb 24
+Setting block 24, was clear before
+tst_bitmaps: setb 26
+Setting block 26, was clear before
+tst_bitmaps: setb 27
+Setting block 27, was clear before
+tst_bitmaps: setb 30
+Setting block 30, was clear before
+tst_bitmaps: setb 31
+Setting block 31, was clear before
+tst_bitmaps: setb 32
+Setting block 32, was clear before
+tst_bitmaps: setb 35
+Setting block 35, was clear before
+tst_bitmaps: setb 39
+Setting block 39, was clear before
+tst_bitmaps: setb 40
+Setting block 40, was clear before
+tst_bitmaps: setb 44
+Setting block 44, was clear before
+tst_bitmaps: setb 46
+Setting block 46, was clear before
+tst_bitmaps: setb 47
+Setting block 47, was clear before
+tst_bitmaps: setb 49
+Setting block 49, was clear before
+tst_bitmaps: setb 51
+Setting block 51, was clear before
+tst_bitmaps: setb 52
+Setting block 52, was clear before
+tst_bitmaps: clearb 2
+Clearing block 2, was set before
+tst_bitmaps: clearb 3
+Clearing block 3, was set before
+tst_bitmaps: clearb 7
+Clearing block 7, was set before
+tst_bitmaps: dump_bb
+block bitmap: b92a85e6c4680d000000000000000000
+bits set: 25
tst_bitmaps: quit
tst_bitmaps:
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 4/6 -v3] libext2fs: add ext2fs_bitcount() function
2012-11-26 14:10 ` [PATCH 3/6 -v2] " Theodore Ts'o
@ 2012-11-26 14:12 ` Theodore Ts'o
0 siblings, 0 replies; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-26 14:12 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: lczerner, Theodore Ts'o
This function efficiently counts the number of bits in a block of
memory.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
---
(Once more, with the correct patch...)
Renamed function signature to make it clear that ext2fs_bitcount takes
the number of bytes, not number of bits.
Enhance the unit test to make sure the popcount function is more
thoroughly tested.
lib/ext2fs/bitops.c | 35 ++++++++++++++++
lib/ext2fs/bitops.h | 1 +
lib/ext2fs/tst_bitmaps.c | 1 +
lib/ext2fs/tst_bitmaps_cmds | 42 ++++++++++++++++++++
lib/ext2fs/tst_bitmaps_exp | 97 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 176 insertions(+)
diff --git a/lib/ext2fs/bitops.c b/lib/ext2fs/bitops.c
index 9322a35..bd9e768 100644
--- a/lib/ext2fs/bitops.c
+++ b/lib/ext2fs/bitops.c
@@ -116,3 +116,38 @@ int ext2fs_test_bit64(__u64 nr, const void * addr)
return (mask & *ADDR);
}
+static unsigned int popcount8(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55);
+ res = (res & 0x33) + ((res >> 2) & 0x33);
+ return (res + (res >> 4)) & 0x0F;
+}
+
+static unsigned int popcount32(unsigned int w)
+{
+ unsigned int res = w - ((w >> 1) & 0x55555555);
+ res = (res & 0x33333333) + ((res >> 2) & 0x33333333);
+ res = (res + (res >> 4)) & 0x0F0F0F0F;
+ res = res + (res >> 8);
+ return (res + (res >> 16)) & 0x000000FF;
+}
+
+unsigned int ext2fs_bitcount(const void *addr, unsigned int nbytes)
+{
+ const unsigned char *cp = addr;
+ const __u32 *p = addr;
+ unsigned int res = 0;
+
+ if ((((unsigned long) addr) & 3) == 0) {
+ while (nbytes > 4) {
+ res += popcount32(*p++);
+ nbytes -= 4;
+ }
+ cp = (const unsigned char *) p;
+ }
+ while (nbytes > 0) {
+ res += popcount8(*cp++);
+ nbytes--;
+ }
+ return res;
+}
diff --git a/lib/ext2fs/bitops.h b/lib/ext2fs/bitops.h
index 526870f..406999b 100644
--- a/lib/ext2fs/bitops.h
+++ b/lib/ext2fs/bitops.h
@@ -686,6 +686,7 @@ extern int ext2fs_test_bit(unsigned int nr, const void * addr);
extern int ext2fs_set_bit64(__u64 nr,void * addr);
extern int ext2fs_clear_bit64(__u64 nr, void * addr);
extern int ext2fs_test_bit64(__u64 nr, const void * addr);
+extern unsigned int ext2fs_bitcount(const void *addr, unsigned int nbytes);
#ifdef NO_INLINE_FUNCS
extern void ext2fs_fast_set_bit(unsigned int nr,void * addr);
diff --git a/lib/ext2fs/tst_bitmaps.c b/lib/ext2fs/tst_bitmaps.c
index 5da3693..2a76292 100644
--- a/lib/ext2fs/tst_bitmaps.c
+++ b/lib/ext2fs/tst_bitmaps.c
@@ -270,6 +270,7 @@ void dump_bitmap(ext2fs_generic_bitmap bmap, unsigned int start, unsigned num)
for (i=0; i < len; i++)
printf("%02x", buf[i]);
printf("\n");
+ printf("bits set: %u\n", ext2fs_bitcount(buf, len));
free(buf);
}
diff --git a/lib/ext2fs/tst_bitmaps_cmds b/lib/ext2fs/tst_bitmaps_cmds
index 9a4f3d0..31e2a60 100644
--- a/lib/ext2fs/tst_bitmaps_cmds
+++ b/lib/ext2fs/tst_bitmaps_cmds
@@ -53,5 +53,47 @@ cleari 5
testi 17
testi 6
testi 4
+clearb 7 12
+dump_bb
+setb 1
+dump_bb
+setb 2
+dump_bb
+setb 3
+dump_bb
+setb 4
+dump_bb
+setb 5
+dump_bb
+setb 6
+dump_bb
+setb 7
+dump_bb
+setb 8
+dump_bb
+setb 10
+setb 12
+setb 14
+setb 17
+setb 19
+setb 24
+setb 26
+setb 27
+setb 30
+setb 31
+setb 32
+setb 35
+setb 39
+setb 40
+setb 44
+setb 46
+setb 47
+setb 49
+setb 51
+setb 52
+clearb 2
+clearb 3
+clearb 7
+dump_bb
quit
diff --git a/lib/ext2fs/tst_bitmaps_exp b/lib/ext2fs/tst_bitmaps_exp
index 2d406ce..2d62b66 100644
--- a/lib/ext2fs/tst_bitmaps_exp
+++ b/lib/ext2fs/tst_bitmaps_exp
@@ -36,6 +36,7 @@ tst_bitmaps: testb 16
Block 16 is set
tst_bitmaps: dump_bb
block bitmap: 00f80000000000000000000000000000
+bits set: 5
tst_bitmaps: ffzb 11 16
First unmarked block is 11
tst_bitmaps: ffzb 12 16
@@ -64,6 +65,7 @@ tst_bitmaps: setb 12 7
Marking blocks 12 to 18
tst_bitmaps: dump_bb
block bitmap: 00f80300000000000000000000000000
+bits set: 7
tst_bitmaps: seti 2
Setting inode 2, was clear before
tst_bitmaps: seti 5
@@ -82,6 +84,7 @@ tst_bitmaps: testi 1
Inode 1 is clear
tst_bitmaps: dump_ib
inode bitmap: 1e000000
+bits set: 4
tst_bitmaps: ffzi 1 6
First unmarked inode is 1
tst_bitmaps: ffzi 2 5
@@ -110,5 +113,99 @@ tst_bitmaps: testi 6
Inode 6 is clear
tst_bitmaps: testi 4
Inode 4 is clear
+tst_bitmaps: clearb 7 12
+Clearing blocks 7 to 18
+tst_bitmaps: dump_bb
+block bitmap: 00000000000000000000000000000000
+bits set: 0
+tst_bitmaps: setb 1
+Setting block 1, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 01000000000000000000000000000000
+bits set: 1
+tst_bitmaps: setb 2
+Setting block 2, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 03000000000000000000000000000000
+bits set: 2
+tst_bitmaps: setb 3
+Setting block 3, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 07000000000000000000000000000000
+bits set: 3
+tst_bitmaps: setb 4
+Setting block 4, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 0f000000000000000000000000000000
+bits set: 4
+tst_bitmaps: setb 5
+Setting block 5, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 1f000000000000000000000000000000
+bits set: 5
+tst_bitmaps: setb 6
+Setting block 6, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 3f000000000000000000000000000000
+bits set: 6
+tst_bitmaps: setb 7
+Setting block 7, was clear before
+tst_bitmaps: dump_bb
+block bitmap: 7f000000000000000000000000000000
+bits set: 7
+tst_bitmaps: setb 8
+Setting block 8, was clear before
+tst_bitmaps: dump_bb
+block bitmap: ff000000000000000000000000000000
+bits set: 8
+tst_bitmaps: setb 10
+Setting block 10, was clear before
+tst_bitmaps: setb 12
+Setting block 12, was clear before
+tst_bitmaps: setb 14
+Setting block 14, was clear before
+tst_bitmaps: setb 17
+Setting block 17, was clear before
+tst_bitmaps: setb 19
+Setting block 19, was clear before
+tst_bitmaps: setb 24
+Setting block 24, was clear before
+tst_bitmaps: setb 26
+Setting block 26, was clear before
+tst_bitmaps: setb 27
+Setting block 27, was clear before
+tst_bitmaps: setb 30
+Setting block 30, was clear before
+tst_bitmaps: setb 31
+Setting block 31, was clear before
+tst_bitmaps: setb 32
+Setting block 32, was clear before
+tst_bitmaps: setb 35
+Setting block 35, was clear before
+tst_bitmaps: setb 39
+Setting block 39, was clear before
+tst_bitmaps: setb 40
+Setting block 40, was clear before
+tst_bitmaps: setb 44
+Setting block 44, was clear before
+tst_bitmaps: setb 46
+Setting block 46, was clear before
+tst_bitmaps: setb 47
+Setting block 47, was clear before
+tst_bitmaps: setb 49
+Setting block 49, was clear before
+tst_bitmaps: setb 51
+Setting block 51, was clear before
+tst_bitmaps: setb 52
+Setting block 52, was clear before
+tst_bitmaps: clearb 2
+Clearing block 2, was set before
+tst_bitmaps: clearb 3
+Clearing block 3, was set before
+tst_bitmaps: clearb 7
+Clearing block 7, was set before
+tst_bitmaps: dump_bb
+block bitmap: b92a85e6c4680d000000000000000000
+bits set: 25
tst_bitmaps: quit
tst_bitmaps:
--
1.7.12.rc0.22.gcdd159b
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps
2012-11-26 11:22 ` Lukáš Czerner
@ 2012-11-26 14:17 ` Theodore Ts'o
0 siblings, 0 replies; 20+ messages in thread
From: Theodore Ts'o @ 2012-11-26 14:17 UTC (permalink / raw)
To: Lukáš Czerner; +Cc: Ext4 Developers List
On Mon, Nov 26, 2012 at 12:22:42PM +0100, Lukáš Czerner wrote:
> > + if (pos > start+num)
>
> Missing spacing around '+'.
I'll fix all of the spacing comments, thanks.
> Also I think that the condition should rather be:
>
> if (pos >= start + num)
Yes. I believe the case where pos = start + num gets caught later on
when we check the count variable, but it's cleaner/faster to bail out
here.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-11-26 14:17 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-25 0:36 [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
2012-11-25 0:36 ` [PATCH 1/6] libext2fs: optimize rb_set_bmap_range() Theodore Ts'o
2012-11-26 9:40 ` Lukáš Czerner
2012-11-26 13:36 ` Theodore Ts'o
2012-11-25 0:36 ` [PATCH 2/6] e2fsck: optimize pass1 for CPU time Theodore Ts'o
2012-11-26 10:06 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 3/6] libext2fs: add ext2fs_bitcount() function Theodore Ts'o
2012-11-26 10:30 ` Lukáš Czerner
2012-11-26 14:06 ` Theodore Ts'o
2012-11-26 14:10 ` [PATCH 3/6 -v2] " Theodore Ts'o
2012-11-26 14:12 ` [PATCH 4/6 -v3] " Theodore Ts'o
2012-11-25 0:36 ` [PATCH 4/6] libext2fs: optimize rb_get_bmap_range() Theodore Ts'o
2012-11-26 10:46 ` Lukáš Czerner
2012-11-25 0:36 ` [PATCH 5/6] libext2fs: optimize rb_get_bmap_range() for mostly allocated bmaps Theodore Ts'o
2012-11-26 11:22 ` Lukáš Czerner
2012-11-26 14:17 ` Theodore Ts'o
2012-11-25 0:36 ` [PATCH 6/6] e2fsck: optimize pass 5 for CPU utilization Theodore Ts'o
2012-11-26 11:59 ` Lukáš Czerner
2012-11-25 5:09 ` [RFC PATCH 0/6] Optimize e2fsck for large file systems Theodore Ts'o
2012-11-26 9:19 ` Lukáš Czerner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).