linux-ext4.vger.kernel.org archive mirror
* Ext4 devel interlock meeting minutes (April 23, 2007)
From: Avantika Mathur @ 2007-04-23 23:35 UTC
  To: linux-ext4

Ext4 Developer Interlock Call: 04/23/2007 Meeting Minutes

Attendees: Mingming Cao, Dave Kleikamp, Avantika Mathur, Ted Ts'o,
Suparna Bhattacharya, Jean-Pierre Dion, Jean Noel Cordenner,
Valérie Clément, Jose Santos

Minutes can be accessed at: 
http://ext4.wiki.kernel.org/index.php/Ext4_Developer%27s_Conference_Call

- Mingming proposed moving back to the 8am PST meeting time, since the 
6am time is inconvenient for a few people.  This discussion will 
continue over email, to find a time which works for everyone.

- Next week's meeting will be canceled unless someone requests one.

PATCH STATUS

git-tree
- Mingming will be updating the git tree with the extents-fix patches 
from Alex, the i_flags patch from Honza, and the i_extra_isize patch 
from Kalpak.

Uninitialized Block Groups:
- The patch sent out by Andreas is against 2.6.16 and ext3.  It needs to 
be ported to current ext4, tested, and then added to the git-tree.  
Avantika will ask Andreas if he needs help with this.

JBD statistics:
- There is a patch to export JBD statistics to /proc.  In order to get 
this patch into mainline, there needs to be discussion about the correct 
place for the statistics: /proc or perhaps debugfs.

e2fsprogs:
- Ted will post the current e2fsprogs patches in progress. Ted has been 
working with these patches and making changes.
- Main work areas for making e2fsprogs compatible with extents and 64-bit:
    - block iterator: make the block iterator work with both extent and 
non-extent code, so that code which is oblivious to extents will still 
work with it.  This has been written by Andreas Dilger.
    - extents: in order to preserve ABI compatibility, add support for a 
new extents interface which uses 64-bit logical and physical block 
numbers.  The block iterator then translates from the on-disk to the 
in-memory format.  This will allow for possible future increases of 
physical and logical block sizes in extents, without breaking the ABI.
    - bitmaps in e2fsprogs: this will be discussed in more detail at the 
next meeting, after people have a chance to read related email.

preallocation:
- fallocate syscall interface:  the current plan, based on discussions 
on the mailing list, is to create a separate wrapper for s390 in glibc.  
All other architectures will use the regular parameter ordering, with a 
different order on s390.  Jakub Jelinek has said that the changes in 
glibc can be made pretty easily.
- The preallocation patches in the ext4 git-tree are outdated and still 
use the ioctl interface.  Once Amit re-posts the patches with the syscall 
interface, they will be updated in the git-tree as well.
- Mingming mentioned the need to flush preallocation metadata changes to 
disk if file size or file content is being tested.  The group discussed 
doing an fsync at bmap time.

TESTING
- extents testing
    - Discussed methods for testing extents on highly fragmented 
filesystems.
    - Jose will look into possible tests, including perhaps using the 
'aged' option in FFSB
    - Ted suggested creating a mount option that enables a deliberately 
poor block allocator, one that jumps to a new block group every 8 
blocks.  This would force a very large number of extents, and may be a 
good test for extents.

- large filesystem
    - We would like to perform more testing on large (>16TB) filesystems.
    - Currently, hardware limitations are preventing this testing.  We 
have tested 10TB RAID disks and 16TB loopback devices.  Avantika will 
look into creating very large sparse devices for testing.

- Large file deletion
    - Valerie had recently tested large file deletion on ext3/4, but did 
not see the expected performance gain with ext4 due to compact metadata 
when using extents.
    - Valerie will try re-running the test.  Jose will also be looking 
into this test.
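
The allocator suggested under "extents testing" above can be pictured with
a minimal sketch. This is not kernel code and not a proposed patch: the
function names scatter_block and extents_needed are hypothetical, groups
are visited round-robin (one possible reading of "jumps to a new block
group"), and metadata blocks at the start of each group are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define RUN_LEN 8	/* blocks placed in one group before jumping */

/*
 * Hypothetical helper: physical block chosen for the n-th logical block
 * of a file when the allocator moves to the next block group after
 * every RUN_LEN blocks, visiting the groups round-robin.
 */
uint64_t scatter_block(uint64_t n, uint64_t first_data_block,
		       uint64_t blocks_per_group, uint64_t ngroups)
{
	assert(ngroups > 0 && blocks_per_group >= RUN_LEN);

	uint64_t run = n / RUN_LEN;		/* index of the 8-block run */
	uint64_t group = run % ngroups;		/* group that receives this run */
	uint64_t pass = run / ngroups;		/* completed sweeps over all groups */
	uint64_t offset = pass * RUN_LEN + n % RUN_LEN;

	assert(offset < blocks_per_group);	/* the group must still have room */
	return first_data_block + group * blocks_per_group + offset;
}

/*
 * With at least two groups, consecutive runs are never physically
 * adjacent, so each run needs its own extent.
 */
uint64_t extents_needed(uint64_t nblocks)
{
	return (nblocks + RUN_LEN - 1) / RUN_LEN;
}
```

Under these assumptions a file needs one extent per 8 blocks, which is
what makes the scheme attractive as a stress test for the extent tree.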

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Alex Tomas @ 2007-04-24  6:00 UTC
  To: Avantika Mathur; +Cc: linux-ext4

Avantika Mathur wrote:
> TESTING
> - extents testing
>    - Discussed methods for testing extents on highly fragmented 
> filesystems.
>    - Jose will look into possible tests, including perhaps using the 
> 'aged' option in FFSB
>    - Ted suggested creating a mount option that enables a deliberately 
> poor block allocator, one that jumps to a new block group every 8 
> blocks.  This would force a very large number of extents, and may be a 
> good test for extents.

there is AGGRESSIVE_TEST define which limits number of entries in index/leaf.

> - Large file deletion
>    - Valerie had recently tested large file deletion on ext3/4, but did 
> not see the expected performance gain with ext4 due to compact metadata 
> when using extents.

any details?

thanks, Alex

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Valerie Clement @ 2007-04-24 14:04 UTC
  To: Alex Tomas; +Cc: Avantika Mathur, linux-ext4, Mingming Cao

Alex Tomas wrote:

>> - Large file deletion
>>    - Valerie had recently tested large file deletion on ext3/4, but 
>> did not see the expected performance gain with ext4 due to compact 
>> metadata when using extents.
> 
> any details?
> 

Ok, I found my mistake. There was a typo in my test script and the 
pagecache was not flushed between the file creation and the deletion.

Here are the results I obtain with a 2.6.17-rc7 kernel to delete a 100GB 
file:

ext3 : real  2m35.048s    user  0m0.000s     sys  0m6.424s
ext4 : real  0m11.160s    user  0m0.000s     sys  0m5.532s
xfs :  real  0m0.377s     user  0m0.004s     sys  0m0.004s

The performance gain with ext4 is much larger when running a good test...
Sorry for the wrong information,

    Valérie
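
The test script itself was not posted, so the following is only a minimal
sketch of the step that was missing: flushing the page cache between
creating the file and deleting it. It assumes a Linux kernel that provides
/proc/sys/vm/drop_caches (2.6.16 or later) and a caller running as root.
The function name drop_page_cache and its path parameter are hypothetical;
the path is a parameter only so the helper can be exercised against an
ordinary file.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write back dirty data, then ask the kernel to drop clean page cache,
 * dentries and inodes by writing "3" to the drop_caches control file.
 * ctl_path is normally "/proc/sys/vm/drop_caches".
 * Returns 0 on success, -1 on error.
 */
int drop_page_cache(const char *ctl_path)
{
	assert(ctl_path != NULL);

	/* drop_caches only discards clean pages, so flush dirty ones first */
	sync();

	int fd = open(ctl_path, O_WRONLY);
	if (fd < 0)
		return -1;

	ssize_t n = write(fd, "3\n", 2);
	int rc = close(fd);

	return (n == 2 && rc == 0) ? 0 : -1;
}
```

A test would call drop_page_cache("/proc/sys/vm/drop_caches") after the
file is created and before the timed deletion, so that the deletion has
to read the file's metadata from disk.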

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Alex Tomas @ 2007-04-24 14:21 UTC
  To: Valerie Clement; +Cc: Avantika Mathur, linux-ext4, Mingming Cao

Valerie Clement wrote:
> Here are the results I obtain with a 2.6.17-rc7 kernel to delete a 100GB 
> file:
> 
> ext3 : real  2m35.048s    user  0m0.000s     sys  0m6.424s
> ext4 : real  0m11.160s    user  0m0.000s     sys  0m5.532s
> xfs :  real  0m0.377s     user  0m0.004s     sys  0m0.004s

would be very interesting to know how much IO was done to remove the file
and actual fragmentation in all the cases.

thanks, Alex

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Eric Sandeen @ 2007-04-24 14:27 UTC
  To: Avantika Mathur; +Cc: linux-ext4

Avantika Mathur wrote:
> - large filesystem
>     - We would like to perform more testing on large (>16TB) filesystems
>     - currently hardware limitations are preventing this testing.  We 
> have tested 10TB RAID disks and 16TB loopback devices.  Avantika will 
> look into creating very large sparse devices for testing.

I've been hacking up some ext3@16T testing scripts to use sparse
devicemapper devices which make use of snapshots... loopback files don't
work for testing, at least not hosted on ext[234], because we still
can't do these large file offsets.

(Documentation/device-mapper/zero.txt in the kernel tree describes these
sparse dm devices)

Testing the whole range as a sparse snapshot can be slow, since
devicemapper has to do all the exception handling etc, and I think
essentially creates a fragmented block device.

I've been playing with something like this:

# 90% of the real device size is used for a "real" 1:1 mapping
# The other 10% is sparsely mapped out to add up to totalsize.
# i.e. -

#                          [large sparse-ish device]
#
# +----------------------~  ~-----------------------------------------+
# |                     sparse                |         real          |
# +----------------------~  ~-----------------------------------------+
#
# |<------------ SPARSE_SIZE ---------------->|<----- REAL_SIZE ----->|

# is mapped on top of:

#                           [real block device]
#                      +----------------------------+
#                      | sp |       real            |
#                      +----------------------------+

and then marking the sparse range as full (maybe via lazy_bg, or other
methods).  You could then also put a dm-error target under the "full"
sections so that any IO that may stray there will fail.

This way you can direct the real IO to the 1:1 mapping portion of the
large dm device, and shouldn't get the snapshot slowdowns.

Anyway, just something I've been playing with...

-eric
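
The size arithmetic behind the layout in the diagram can be sketched as
follows. This is an illustration only, not the actual test scripts: the
struct and function names are hypothetical, sizes are in 512-byte sectors,
and the device names passed in are placeholders. The only device-mapper
detail relied on is the linear target's table line format,
"<start> <length> linear <device> <offset>". The sparse device itself (a
snapshot on top of a zero target, as described in zero.txt) is assumed to
have been created separately.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* All sizes in 512-byte sectors. */
struct dm_layout {
	uint64_t cow_sectors;		/* 10% of the real device, backs the sparse part */
	uint64_t real_sectors;		/* 90% of the real device, mapped 1:1 */
	uint64_t sparse_sectors;	/* SPARSE_SIZE = total size - REAL_SIZE */
};

/* Split the real device 10/90 and size the sparse region. Returns 0 or -1. */
int plan_layout(uint64_t dev_sectors, uint64_t total_sectors,
		struct dm_layout *out)
{
	assert(out != NULL);

	uint64_t cow = dev_sectors / 10;
	uint64_t real = dev_sectors - cow;

	if (cow == 0 || total_sectors <= real)
		return -1;

	out->cow_sectors = cow;
	out->real_sectors = real;
	out->sparse_sectors = total_sectors - real;
	return 0;
}

/*
 * Format the table of the large device: the sparse region first, then the
 * 1:1 region, which starts after the 10% "sp" area on the real device.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_top_table(const struct dm_layout *l, const char *sparse_dev,
		     const char *real_dev, char *buf, size_t len)
{
	assert(l != NULL && buf != NULL);
	assert(strlen(sparse_dev) > 0 && strlen(real_dev) > 0);

	int n = snprintf(buf, len,
			 "0 %" PRIu64 " linear %s 0\n"
			 "%" PRIu64 " %" PRIu64 " linear %s %" PRIu64 "\n",
			 l->sparse_sectors, sparse_dev,
			 l->sparse_sectors, l->real_sectors, real_dev,
			 l->cow_sectors);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The resulting text is the kind of table that would be handed to dmsetup
when creating the large device; real IO then lands in the second (linear,
1:1) segment.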

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Valerie Clement @ 2007-04-24 14:51 UTC
  To: Alex Tomas; +Cc: Avantika Mathur, linux-ext4, Mingming Cao

Alex Tomas wrote:
> Valerie Clement wrote:
>> Here are the results I obtain with a 2.6.17-rc7 kernel to delete a 
>> 100GB file:
>>
>> ext3 : real  2m35.048s    user  0m0.000s     sys  0m6.424s
>> ext4 : real  0m11.160s    user  0m0.000s     sys  0m5.532s
>> xfs :  real  0m0.377s     user  0m0.004s     sys  0m0.004s
> 
> would be very interesting to know how much IO was done to remove the file
> and actual fragmentation in all the cases.
> 
> thanks, Alex
> 
Ok, I will do it.

   Valérie

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Aneesh Kumar @ 2007-04-30 11:06 UTC
  To: Avantika Mathur; +Cc: linux-ext4

On 4/24/07, Avantika Mathur <mathur@linux.vnet.ibm.com> wrote:
> Ext4 Developer Interlock Call: 04/23/2007 Meeting Minutes
>
> TESTING
> - extents testing
>     - Discussed methods for testing extents on highly fragmented
> filesystems.
>     - Jose will look into possible tests, including perhaps using the
> 'aged' option in FFSB
>     - Ted suggested creating a mount option that enables a deliberately
> poor block allocator, one that jumps to a new block group every 8
> blocks.  This would force a very large number of extents, and may be a
> good test for extents.


What i am doing for creating a large number of extents is

dd if=/dev/zero of=myfile count=10
seek=20
while [ 1 ]; do dd if=/dev/zero of=myfile count=10 seek=$seek;
seek=`expr $seek + 20`; done


-aneesh
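
For reference, a minimal C sketch of the same write pattern as the dd loop
above, with a bounded number of chunks instead of running until
interrupted. It assumes dd's default 512-byte block size; the function
name write_strided is hypothetical. How many extents result depends on the
filesystem block size, since the 5120-byte holes only separate extents
where they cover at least one whole filesystem block.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DD_BLOCK	512	/* dd's default block size */
#define CHUNK_BLOCKS	10	/* count=10 */
#define STRIDE_BLOCKS	20	/* seek advances by 20 blocks per iteration */

/*
 * Write nchunks zero-filled chunks of CHUNK_BLOCKS blocks each, leaving a
 * hole between consecutive chunks. Returns the final file size, or -1 on
 * error.
 */
off_t write_strided(const char *path, int nchunks)
{
	assert(path != NULL && nchunks > 0);

	char buf[DD_BLOCK * CHUNK_BLOCKS];
	memset(buf, 0, sizeof(buf));

	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	for (int k = 0; k < nchunks; k++) {
		off_t off = (off_t)k * STRIDE_BLOCKS * DD_BLOCK;

		if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf)) {
			close(fd);
			return -1;
		}
	}

	struct stat st;
	int rc = fstat(fd, &st);

	if (close(fd) != 0 || rc != 0)
		return -1;
	return st.st_size;
}
```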

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Alex Tomas @ 2007-04-30 11:13 UTC
  To: Aneesh Kumar; +Cc: Avantika Mathur, linux-ext4

Aneesh Kumar wrote:
> What i am doing for creating a large number of extents is
> 
> dd if=/dev/zero of=myfile count=10
> seek=20
> while [ 1 ]; do dd if=/dev/zero of=myfile count=10 seek=$seek;
> seek=`expr $seek + 20`; done

with AGGRESSIVE_TEST defined in include/linux/ext4_fs_extents.h you may
get much more extents and index blocks.

* Re: Ext4 devel interlock meeting minutes (April 23, 2007)
From: Kalpak Shah @ 2007-05-01 12:08 UTC
  To: Aneesh Kumar; +Cc: Avantika Mathur, linux-ext4

On Mon, 2007-04-30 at 16:36 +0530, Aneesh Kumar wrote:
> On 4/24/07, Avantika Mathur <mathur@linux.vnet.ibm.com> wrote:
> > Ext4 Developer Interlock Call: 04/23/2007 Meeting Minutes
> >
> > TESTING
> > - extents testing
> >     - Discussed methods for testing extents on highly fragmented
> > filesystems.
> >     - Jose will look into possible tests, including perhaps using the
> > 'aged' option in FFSB
> > >     - Ted suggested creating a mount option that enables a deliberately
> > > poor block allocator, one that jumps to a new block group every 8
> > > blocks.  This would force a very large number of extents, and may be a
> > > good test for extents.
> 
> 
> What i am doing for creating a large number of extents is
> 
> dd if=/dev/zero of=myfile count=10
> seek=20
> while [ 1 ]; do dd if=/dev/zero of=myfile count=10 seek=$seek;
> seek=`expr $seek + 20`; done
> 
> 

I had written a simple tool "bitmap_manip" with which you can actually
manipulate the number of free chunks and their sizes in a filesystem. It
uses libext2fs to set the bits in block bitmaps thereby leaving the
desired free extents. I wrote it to test the allocator's
performance.

It can be used as:
 ./bitmap_manip /dev/sda9 1MA 4 16K 1 12K 3 8K 4 4K 6
 
This will leave only one 16K chunk, three 12K chunks, and so on free in
the filesystem. "1MA 4" gives four free 1MB ALIGNED chunks.

It isn't very beautiful code since it was only used for testing but
maybe it can help.

Thanks,
Kalpak.
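
As a compact illustration of the argument syntax described above ("16K",
"1MA" and so on), here is a minimal parsing sketch. It is not taken from
the attached tool: the names chunk_spec and parse_chunk_size are
hypothetical, and it is stricter than the attachment in that it rejects
anything it does not understand.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct chunk_spec {
	unsigned long blocks;	/* chunk size in filesystem blocks */
	int aligned;		/* 1 if the argument carried the 'A' suffix */
};

/*
 * Parse "<number><K|M>[A]", e.g. "16K" or "1MA", into a size in blocks.
 * Returns 0 on success, -1 if the argument is malformed or the size is
 * smaller than one block.
 */
int parse_chunk_size(const char *arg, unsigned int blocksize,
		     struct chunk_spec *out)
{
	assert(arg != NULL && out != NULL && blocksize > 0);

	if (!isdigit((unsigned char)arg[0]))
		return -1;

	char *end;
	unsigned long n = strtoul(arg, &end, 10);
	unsigned long unit;

	switch (toupper((unsigned char)*end)) {
	case 'K':
		unit = 1024UL;
		break;
	case 'M':
		unit = 1024UL * 1024UL;
		break;
	default:
		return -1;	/* missing or unknown unit */
	}
	end++;

	out->aligned = 0;
	if (toupper((unsigned char)*end) == 'A') {
		out->aligned = 1;
		end++;
	}
	if (*end != '\0')
		return -1;	/* trailing characters after the suffix */

	out->blocks = n * unit / blocksize;
	return out->blocks > 0 ? 0 : -1;
}
```

With a 4K block size, "16K" would parse to 4 blocks and "1MA" to 256
aligned blocks.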


[-- Attachment #2: bitmap_manip.c --]
[-- Type: text/x-csrc, Size: 5074 bytes --]

/* Manipulate block bitmap directly for mballoc testing */

/* USAGE:
 * ./bitmap_manip /dev/volmballoc/test 16K 1 12K 3 8K 4 4K 6
 * This will leave 1 16K chunk, 3 12K chunks, .... in the filesystem specified.
 * Ideally give the inputs in ascending order.
 * 1MA 4 will get us 4 1MB ALIGNED chunks.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>	/* strcpy, strlen */
#include <ctype.h>	/* toupper */
#include <fcntl.h>
#include <ext2fs/ext2fs.h>
#include <ext2fs/ext2_types.h>

#define ONE_MB (1024 * 1024)
#define ONE_KB 1024

#define SETTING 0
#define FREEING 1

#define NO_ALIGN 0
#define ALIGN 1

struct chunk_arg {
	int chunk_size;
	int num_chunks;
	int align;
};

int main(int argc, char **argv)
{
	ext2_filsys fs;
	ext2fs_block_bitmap *map = NULL;
	int bg_num = 0, retval, arg_num, multiply, chunk_num;
	int i, start_blk, set_bit, test_bit, j;
	struct chunk_arg chunk[50];
	int free_blocks_req = 0, free_blocks_avail, num_of_chunks_req = 0, group;
	char str[10];
	float orig_avail_req, avail_req;
	int set_till_now, free_till_now, num_blks_to_set, num_blks_to_free, phase;
	int  current, align_flag = 0, align = 0, curr = 0;

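	if (argc < 2) {
		printf("Please give name of a filesystem. Exiting...\n");
		return 1;
	}

	/* argc includes the program name, so an odd argc means the user gave
	 * an even number of arguments; the device must be followed by
	 * complete <size> <count> pairs */
	if (argc & 0x01) {
		printf("Usage: %s <device> <size> <count> [<size> <count> ...]\n", argv[0]);
		return 1;
	}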

	if ((retval = ext2fs_open(argv[1], EXT2_FLAG_RW, 0, 0, unix_io_manager, &fs))) {
		com_err("ext2fs open:", retval, "while opening %s\n", argv[1]);
		return retval;
	}	

	srand(1234567);
	chunk_num = 0;
	for (arg_num = 2; arg_num < argc; arg_num += 2, chunk_num++) {
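		if (chunk_num >= 50) {
			printf("Too many chunk sizes given (maximum is 50).\n");
			return 1;
		}
		/* Reject arguments that are too short to hold a number and a
		 * unit, or too long to fit in str */
		if (strlen(argv[arg_num]) < 2 || strlen(argv[arg_num]) >= sizeof(str)) {
			printf("Invalid chunk size: %s\n", argv[arg_num]);
			return 1;
		}
		strcpy(str, argv[arg_num]);

		/* Check if we have to align */
		if (toupper(str[strlen(str) - 1]) == 'A') {
			chunk[chunk_num].align = ALIGN;
			str[strlen(str) - 1] = '\0';
			align = 1;
		}
		else
			chunk[chunk_num].align = NO_ALIGN;

		/* The remaining suffix selects the unit */
		if (toupper(str[strlen(str) - 1]) == 'K')
			multiply = ONE_KB;
		else if (toupper(str[strlen(str) - 1]) == 'M')
			multiply = ONE_MB;
		else {
			printf("Chunk size %s must end in K or M.\n", argv[arg_num]);
			return 1;
		}

		str[strlen(str) - 1] = '\0';
		chunk[chunk_num].chunk_size = ((strtod(str, NULL)) * multiply) / (fs->blocksize);
		chunk[chunk_num].num_chunks = strtod(argv[arg_num + 1], NULL);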
			
		free_blocks_req += chunk[chunk_num].chunk_size * chunk[chunk_num].num_chunks;
		num_of_chunks_req += chunk[chunk_num].num_chunks;
	}

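	if ((retval = ext2fs_read_block_bitmap(fs))) {
		com_err("ext2fs_read_block_bitmap:", retval, "while reading %s\n", argv[1]);
		return retval;
	}
	map = &fs->block_map;

	start_blk = fs->super->s_first_data_block;
	free_blocks_avail = fs->super->s_free_blocks_count;

	/* Without any requested chunks the ratio below would divide by zero */
	if (free_blocks_req <= 0) {
		printf("No free chunks requested. Exiting...\n");
		return 1;
	}
	/* Cast so the ratio is not truncated by integer division */
	orig_avail_req = (float)free_blocks_avail / free_blocks_req;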
	current = 0;
	i = start_blk;
	
	num_blks_to_set = (orig_avail_req / 4) * chunk[current].chunk_size;
	num_blks_to_free = chunk[current].chunk_size;
	phase = SETTING;
	do {
		test_bit = i;
		if (!ext2fs_fast_test_block_bitmap(*map, test_bit)) {
			if (phase == SETTING) {				
				if (chunk[current].align == ALIGN && chunk[current].num_chunks > 0) {
					if (align_flag == 0) {
						num_blks_to_set = (i / chunk[current].chunk_size + 1) * 
							chunk[current].chunk_size - i;
						align_flag = 1;
					}
					else if (i % chunk[current].chunk_size == 0) {
						num_blks_to_set = 0;
						phase = FREEING;
					}
				}
				set_bit = i;
		                ext2fs_mark_block_bitmap(*map, set_bit); 
				group = (set_bit - fs->super->s_first_data_block) / fs->super->s_blocks_per_group;
				fs->group_desc[group].bg_free_blocks_count--;
				fs->super->s_free_blocks_count--;		
				num_blks_to_set--;
				if (num_blks_to_set == 0) {
					phase = FREEING;
					align_flag = 0;
				}
			}
			else if (phase == FREEING) { 
				free_blocks_req--;
				num_blks_to_free--;
				if (num_blks_to_free == 0) {
					/* Decide how many blocks to set */
					phase = SETTING;
					num_of_chunks_req--;
					chunk[current].num_chunks--;
					
					/* No more free chunks required*/
					if (num_of_chunks_req == 0) {
						num_blks_to_set = free_blocks_avail;
					}
					else {
						for (j = 0; j < chunk_num; j++) {	
							if (chunk[j].num_chunks > 0) {	
								if (free_blocks_req > chunk[j].num_chunks * 
 									    chunk[j].chunk_size && current == j) {
									continue;
								}
								else {
									current = j;
									break;	
								}
							}
						}							
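						/* Cast so the ratio is not truncated by integer division */
						avail_req = (float)free_blocks_avail / free_blocks_req;
						if (align != 1)
							num_blks_to_set = (avail_req / 4) *
								    chunk[current].chunk_size;
						else
							num_blks_to_set = 20;
						num_blks_to_free = chunk[current].chunk_size;
						/* Make sure a free chunk does not straddle a block group
						 * boundary: curr is the first block of the next group */
						group = (i - fs->super->s_first_data_block) / fs->super->s_blocks_per_group;
						curr = fs->super->s_first_data_block +
							(group + 1) * fs->super->s_blocks_per_group;
						if (i + num_blks_to_set + num_blks_to_free > curr && i < curr)
							num_blks_to_set = curr - i;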
					}
				}
			}												
			free_blocks_avail--;	
		}
                i++;
	}while(i <= (fs->super->s_blocks_count - 1) || free_blocks_avail != 0);

        ext2fs_mark_bb_dirty(fs);
        ext2fs_mark_super_dirty(fs);

	if (i == fs->super->s_blocks_count && free_blocks_req != 0) {
		printf("Block manipulation failed. Sorry.\n");
		/* Return non-zero so that callers can detect the failure */
		return 1;
	}

	ext2fs_close(fs);	
	return 0;
}
