* Reiser Fragmentation Effects (was compression vs performance)
@ 2004-04-08 17:00 Burnes, James
  2004-04-09  5:53 ` Hans Reiser
  0 siblings, 1 reply; 8+ messages in thread
From: Burnes, James @ 2004-04-08 17:00 UTC (permalink / raw)
  To: Stewart Smith, Tom Vier; +Cc: Scott Young, reiserfs-list

I thought nearly all filesystems designed since Berkeley FFS were nearly
immune to fragmentation problems.

After reading the following analysis at Harvard, it seems that
fragmentation is still a problem.

http://www.eecs.harvard.edu/~keith/research/tr94.html

At least with FFS it seems that fragmentation is significantly worse
with smaller files.  That makes a certain intuitive sense.

Of course the Harvard guys are claiming worst case FFS fragmentation
incurs a 30% performance hit.  It would be nice if they could fix that,
but everything is relative.  I remember badly fragmented FAT filesystems
with probably closer to 90% performance hit. 

Apparently worst case is with file systems loaded down with a lot of
small files like news and mail servers.  Since Reiser tends to be used
in situations that call for a lot of small file creation and deletion I
thought this would be pertinent.  Also Reiser is radically different
internally than FFS.

I know Hans is super busy right now so I don't expect him to comment,
but maybe one of the core people could comment about the effects in
Reiser3 and 4 if they have a spare moment.

jim burnes
security engineer
great-west, denver
 

> -----Original Message-----
> From: Stewart Smith [mailto:stewart@flamingspork.com]
> Sent: Thursday, April 08, 2004 5:48 AM
> To: Tom Vier
> Cc: Scott Young; reiserfs-list@namesys.com
> Subject: Re: Can compression at filesystem level improve overall
> performance?
>
> On Tue, 2004-03-30 at 14:53, Tom Vier wrote:
> > an online defragger is an interesting idea. i think i remember the
> > topic coming up for ext2 a long time ago. iirc, reiserfs can lose
> > performance over time (usage, actually), too.
>
> XFS has this, xfs_fsr (part of xfsdump package on debian, might be
> called that on other distros too...)
>
> Although... it's pretty hard to get XFS to fragment in the first
> place, which is the best way to do things - but it's a hard way :)
> 
> --
> Stewart Smith (stewart@flamingspork.com)
> http://www.flamingspork.com/



* RE: Reiser Fragmentation Effects (was compression vs performance)
@ 2004-04-08 17:07 Burnes, James
  2004-04-08 17:24 ` Dieter Nützel
  0 siblings, 1 reply; 8+ messages in thread
From: Burnes, James @ 2004-04-08 17:07 UTC (permalink / raw)
  To: Burnes, James, Stewart Smith, Tom Vier; +Cc: Scott Young, reiserfs-list

Little update.  That research is 10 years old, so I don't know how valid
it is anymore.

jim burnes
security engineer
great-west, denver
 




* Re: Reiser Fragmentation Effects (was compression vs performance)
  2004-04-08 17:07 Burnes, James
@ 2004-04-08 17:24 ` Dieter Nützel
  0 siblings, 0 replies; 8+ messages in thread
From: Dieter Nützel @ 2004-04-08 17:24 UTC (permalink / raw)
  To: reiserfs-list; +Cc: Burnes, James, Stewart Smith, Tom Vier, Scott Young

On Thursday, 8 April 2004 19:07, Burnes, James wrote:
> Little update.  That research is 10 years old, so I don't know how valid
> it is anymore.

It is ;-)

Have a look in the ReiserFS archive.
For example at:
http://marc.theaimsgroup.com

Greetings,
	Dieter

PS A repacker for Reiser4 is under way.


* Re: Reiser Fragmentation Effects (was compression vs performance)
  2004-04-08 17:00 Reiser Fragmentation Effects (was compression vs performance) Burnes, James
@ 2004-04-09  5:53 ` Hans Reiser
  2004-04-09 18:13   ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Hans Reiser @ 2004-04-09  5:53 UTC (permalink / raw)
  To: Burnes, James; +Cc: Stewart Smith, Tom Vier, Scott Young, reiserfs-list

Burnes, James wrote:

>I thought nearly all filesystems designed since Berkeley FFS were nearly
>immune to fragmentation problems.
>
>After reading the following analysis at Harvard, it seems that
>fragmentation is still a problem.
>
>http://www.eecs.harvard.edu/~keith/research/tr94.html
>
Yeah, I wish I had read this in 94.  V3 suffers from the same problems
as FFS does, as described in the abstract (which is all that I read, sorry
about that; I really am a bit busy, so unless someone suggests I should
read more...).  V4 cures it though.

-- 
Hans



* Re: Reiser Fragmentation Effects (was compression vs performance)
  2004-04-09  5:53 ` Hans Reiser
@ 2004-04-09 18:13   ` Chris Mason
  2004-04-10  6:09     ` Hans Reiser
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2004-04-09 18:13 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Burnes, James, Stewart Smith, Tom Vier, Scott Young,
	reiserfs-list

On Fri, 2004-04-09 at 01:53, Hans Reiser wrote:
> Burnes, James wrote:
> 
> >I thought nearly all filesystems designed since Berkeley FFS were nearly
> >immune to fragmentation problems.
> >
> >After reading the following analysis at Harvard, it seems that
> >fragmentation is still a problem.
> >
> >http://www.eecs.harvard.edu/~keith/research/tr94.html
> >
> Yeah, I wish I had read this in 94.  V3 suffers from the same problems 
> as FFS does as described in the abstract (all that I read, sorry about 
> that, I really am a bit busy, so unless someone suggests I should read 
> more.... ) .  V4 cures it though.

I put out some patches last week that try to deal with this in v3.  Take
a look through the archives for mail from me.

The v3 patches are an attempt to do better under common workloads.  I
think they are a big improvement, and I doubt there's much more that can
(or should) be done beyond simple tweaking.

v4 does a better job, and even if it doesn't, it should at least have
enough info in the metadata such that any problems can be fixed.

-chris




* Re: Reiser Fragmentation Effects (was compression vs performance)
  2004-04-09 18:13   ` Chris Mason
@ 2004-04-10  6:09     ` Hans Reiser
  2004-04-10 20:02       ` Chris Mason
  0 siblings, 1 reply; 8+ messages in thread
From: Hans Reiser @ 2004-04-10  6:09 UTC (permalink / raw)
  To: Chris Mason
  Cc: Burnes, James, Stewart Smith, Tom Vier, Scott Young,
	reiserfs-list

Chris Mason wrote:

>On Fri, 2004-04-09 at 01:53, Hans Reiser wrote:
>
>>Burnes, James wrote:
>>
>>>I thought nearly all filesystems designed since Berkeley FFS were nearly
>>>immune to fragmentation problems.
>>>
>>>After reading the following analysis at Harvard, it seems that
>>>fragmentation is still a problem.
>>>
>>>http://www.eecs.harvard.edu/~keith/research/tr94.html
>>
>>Yeah, I wish I had read this in 94.  V3 suffers from the same problems
>>as FFS does as described in the abstract (all that I read, sorry about
>>that, I really am a bit busy, so unless someone suggests I should read
>>more.... ) .  V4 cures it though.
>
>I put out some patches last week that try to deal with this in v3.
>
Describe the algorithmic changes please.

> Take
>a look through the archives for mail from me.
>
>The v3 patches are an attempt to do better under common workloads.  I
>think they are a big improvement, and I doubt there's much more that can
>(or should) be done beyond simple tweaking.
>
>v4 does a better job, and even if it doesn't, it should at least have
>enough info in the metadata such that any problems can be fixed.
>
>-chris


-- 
Hans



* Re: Reiser Fragmentation Effects (was compression vs performance)
  2004-04-10  6:09     ` Hans Reiser
@ 2004-04-10 20:02       ` Chris Mason
  2004-04-11 15:40         ` Hans Reiser
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Mason @ 2004-04-10 20:02 UTC (permalink / raw)
  To: Hans Reiser
  Cc: Burnes, James, Stewart Smith, Tom Vier, Scott Young,
	reiserfs-list

On Sat, 2004-04-10 at 02:09, Hans Reiser wrote:

> >I put out some patches last week that try to deal with this in v3. 
> >
> Describe the algorithmic changes please.

These are the patches that Jeff and I started working on back in 2.4.20
or so.  The top of the patch documents the basic ideas.  Note that even
though I use the term bitmap group, this is just a logical entity
calculated from a hash of the packing locality or object id.  

v3 has always had options for using hashes to find areas of the disk for
allocation; the big difference is that I hashed into 64MB chunks of the
disk instead of into an individual starting block.  This keeps data
blocks together for the common case (files created one at a time in a
directory), but doesn't bunch everything at the start of the disk.
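
As a rough illustration of the difference between hashing to an individual starting block and hashing into 64MB chunks, here is a minimal Python sketch. Only the 64MB chunk size comes from the description above; the 4096-byte block size, the multiplicative hash, and all function names are assumptions for illustration and are not taken from the patch.

```python
BLOCK_SIZE = 4096                      # assumed block size in bytes
CHUNK_BYTES = 64 * 1024 * 1024         # 64MB chunks, as described above
BLOCKS_PER_CHUNK = CHUNK_BYTES // BLOCK_SIZE


def toy_hash(value: int) -> int:
    """Stand-in 32-bit multiplicative hash; NOT the hash used by the patch."""
    return (value * 2654435761) & 0xFFFFFFFF


def per_block_start(key: int, total_blocks: int) -> int:
    """Older style: hash straight to an individual starting block."""
    return toy_hash(key) % total_blocks


def chunk_start_block(key: int, total_blocks: int) -> int:
    """Chunked style: hash to a 64MB chunk, then start at the chunk's first block.

    `key` stands for a packing locality or object id.
    """
    num_chunks = max(1, total_blocks // BLOCKS_PER_CHUNK)
    chunk = toy_hash(key) % num_chunks
    return chunk * BLOCKS_PER_CHUNK
```

In this sketch every object that shares a key starts its search at the same chunk boundary, so files created one after another under that key can be laid out next to each other, while different keys are spread over the whole disk rather than piling up at the start.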

Rest of the info below:

The current reiserfs allocator pretty much allocates things sequentially
from the start of the disk.  It works very nicely for desktop loads, but
once you've got more than one proc doing io, data files can fragment badly.

One obvious solution is something like ext2's bitmap groups, which put
file data into different areas of the disk based on which subdirectory
they are in.  The problem with bitmap groups is that if you've got a
group of subdirectories their contents will be spread out all over the
disk, leading to lots of seeks during a sequential read.

This allocator patch uses the packing locality to determine which bitmap
group to allocate from, but when you create a file it looks in the btree
to see how 'full' that packing locality already is.  If it hasn't been
heavily used yet, the packing locality is inherited from the parent
directory, putting files in new subdirs close to the parent subdir;
otherwise it is the inode number of the parent directory, putting new
files far away from the parent subdir.
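
The decision described above can be sketched roughly as follows. This is a simplified Python illustration, not the patch's code: the notion of "fullness" as an item count, the threshold value, and the function and parameter names are all assumptions.

```python
def choose_packing_locality(parent_packing_locality: int,
                            parent_object_id: int,
                            items_in_parent_locality: int,
                            busy_threshold: int = 1000) -> int:
    """Pick the packing locality for a newly created object.

    items_in_parent_locality: hypothetical measure of how 'full' the parent's
    packing locality already is (the text says this is found by looking in
    the btree; an item count is an assumption).
    busy_threshold: hypothetical cutoff for 'heavily used'.
    """
    if items_in_parent_locality < busy_threshold:
        # Not heavily used yet: inherit, so the new object stays close
        # to the parent subdir.
        return parent_packing_locality
    # Heavily used: key on the parent directory's own id instead, which
    # places new files far away from the parent subdir.
    return parent_object_id
```

For example, `choose_packing_locality(2, 57, items_in_parent_locality=10)` keeps the new object in locality 2, while the same call with a count above the threshold starts using 57.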

The end result is fewer packing localities for the same working set.  For
example, one test data set created by 20 procs running in parallel has
6822 subdirs.  And with vanilla reiserfs that would mean 6822
packing localities.  This patch turns that into 2970 packing localities.
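
To put those two numbers in proportion, a small Python calculation (the function name is just for illustration) gives the fractional reduction in packing localities, which works out to roughly 56% for this test data set:

```python
def locality_reduction(before: int, after: int) -> float:
    """Fraction by which the number of packing localities shrinks."""
    return 1 - after / before


# Figures quoted above: 6822 localities with vanilla reiserfs, 2970 with the patch.
reduction = locality_reduction(6822, 2970)
print(f"{reduction:.1%} fewer packing localities")
```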

This makes sequential reads of big directory trees more efficient, but
it also makes the btree more efficient in general.  Things end up sorted
better because groups of subdirs end up with similar keys in the btree,
instead of being spread out all over.

The patch does not change any of the defaults; you need special mount
options to enable things.  I suggest starting here:

mount -o alloc=skip_busy:dirid_groups,packing_groups

mount -o alloc=dirid_groups will turn on the bitmap groups
mount -o packing_groups turns on the packing locality reduction code
mount -o alloc=skip_busy is the default
mount -o alloc=skip_busy:dirid_groups turns on both dirid_groups and
skip_busy

Finally the patch adds a mount -o alloc=oid_groups, which puts files into
bitmap groups based on a hash of their objectid.  This would be used for
databases or other situations where you have a limited number of very
large files.

This command will tell you how many packing localities are actually in
use:

debugreiserfs -d /dev/xxx | grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l
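
For readers who want to see what each stage of that pipeline does, here is a Python sketch of the same text processing applied to lines already captured from the tool. The function name is hypothetical, and the sample lines in the usage note are invented to show the filtering; they are not real debugreiserfs output.

```python
from typing import Iterable


def count_packing_localities(lines: Iterable[str]) -> int:
    """Mimic: grep '^|.*SD' | sed 's/^.....//' | awk '{print $1}' | sort -u | wc -l"""
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        # grep '^|.*SD': keep lines that start with '|' and contain 'SD' after it
        if not line.startswith("|") or "SD" not in line[1:]:
            continue
        # sed 's/^.....//': drop the first five characters when the line has them
        rest = line[5:] if len(line) >= 5 else line
        # awk '{print $1}': first whitespace-separated field (empty if none)
        fields = rest.split()
        seen.add(fields[0] if fields else "")
    # sort -u | wc -l: number of distinct values
    return len(seen)
```

Usage with made-up input: given the lines `"|  0|1 2 0x0 SD"`, `"|  1|1 2 0x1 DIR"`, `"|  2|2 7 0x0 SD"`, and `"|  3|2 9 0x0 SD"`, the function keeps the three SD lines, extracts the first fields `1`, `2`, and `2`, and counts two distinct values.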

-chris




* Re: Reiser Fragmentation Effects (was compression vs performance)
  2004-04-10 20:02       ` Chris Mason
@ 2004-04-11 15:40         ` Hans Reiser
  0 siblings, 0 replies; 8+ messages in thread
From: Hans Reiser @ 2004-04-11 15:40 UTC (permalink / raw)
  To: Chris Mason
  Cc: Burnes, James, Stewart Smith, Tom Vier, Scott Young,
	reiserfs-list

Chris Mason wrote:

>On Sat, 2004-04-10 at 02:09, Hans Reiser wrote:
>
>>>I put out some patches last week that try to deal with this in v3.
>>
>>Describe the algorithmic changes please.
>
>These are the patches that Jeff and I started working on back in 2.4.20
>or so.  The top of the patch documents the basic ideas.  Note that even
>though I use the term bitmap group, this is just a logical entity
>calculated from a hash of the packing locality or object id.  
>
>v3 has always had options for using hashes to find areas of the disk for
>allocation, the big difference is that I hashed into 64MB chunks of the
>disk instead of into an individual starting block.  This keeps data
>blocks together for the common case (files created one at a time in a
>directory), but doesn't bunch everything at the start of the disk.
>
>Rest of the info below:
>
>The current reiserfs allocator pretty much allocates things sequentially
>from the start of the disk.  It works very nicely for desktop loads, but
>once you've got more than one proc doing io, data files can fragment badly.
>
>One obvious solution is something like ext2's bitmap groups, which put
>file data into different areas of the disk based on which subdirectory
>they are in.  The problem with bitmap groups is that if you've got a
>group of subdirectories their contents will be spread out all over the
>disk, leading to lots of seeks during a sequential read.
>
>This allocator patch uses the packing locality to determine which bitmap
>group to allocate from, but when you create a file it looks in the btree
>to see how 'full' that packing locality already is.  If it hasn't been
>heavily used yet, the packing locality is inherited from the parent
>directory putting files in new subdirs close to the parent subdir,
>  
>
this seems like a very good idea, to determine whether to go to a new 
area of the disk based on how full the current one is

>otherwise it is the inode number of the parent directory putting new
>files far away from the parent subdir.


-- 
Hans

