* Reducing memory requirements for high extent xfs files
From: Michael Nishimoto @ 2007-05-30 16:49 UTC
To: xfs

Hello,

Has anyone done any work or had thoughts on changes required
to reduce the total memory footprint of high extent xfs files?

Obviously, it is important to reduce fragmentation as files
are generated and to regularly defrag files, but both of these
alternatives are not complete solutions.

To reduce memory consumption, xfs could bring in extents
from disk as needed (or just before needed) and could free
up mappings when certain extent ranges have not been recently
accessed. A solution should become more aggressive about
reclaiming extent mapping memory as free memory becomes limited.

Michael
* Re: Reducing memory requirements for high extent xfs files
From: David Chinner @ 2007-05-30 22:55 UTC
To: Michael Nishimoto; +Cc: xfs

On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> Hello,
>
> Has anyone done any work or had thoughts on changes required
> to reduce the total memory footprint of high extent xfs files?

We changed the way we do memory allocation to avoid needing
large contiguous chunks of memory a bit over a year ago;
that solved the main OOM problem we were getting reported
with highly fragmented files.

> Obviously, it is important to reduce fragmentation as files
> are generated and to regularly defrag files, but both of these
> alternatives are not complete solutions.
>
> To reduce memory consumption, xfs could bring in extents
> from disk as needed (or just before needed) and could free
> up mappings when certain extent ranges have not been recently
> accessed. A solution should become more aggressive about
> reclaiming extent mapping memory as free memory becomes limited.

Yes, it could, but that's a pretty major overhaul of the extent
interface which currently assumes everywhere that the entire
extent tree is in core.

Can you describe the problem you are seeing that leads you to
ask this question? What's the problem you need to solve?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Reducing memory requirements for high extent xfs files
From: Michael Nishimoto @ 2007-06-05 22:23 UTC
To: David Chinner; +Cc: Michael Nishimoto, xfs

David Chinner wrote:
> On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> > Hello,
> >
> > Has anyone done any work or had thoughts on changes required
> > to reduce the total memory footprint of high extent xfs files?
>
> We changed the way we do memory allocation to avoid needing
> large contiguous chunks of memory a bit over a year ago;
> that solved the main OOM problem we were getting reported
> with highly fragmented files.
>
> > Obviously, it is important to reduce fragmentation as files
> > are generated and to regularly defrag files, but both of these
> > alternatives are not complete solutions.
> >
> > To reduce memory consumption, xfs could bring in extents
> > from disk as needed (or just before needed) and could free
> > up mappings when certain extent ranges have not been recently
> > accessed. A solution should become more aggressive about
> > reclaiming extent mapping memory as free memory becomes limited.
>
> Yes, it could, but that's a pretty major overhaul of the extent
> interface which currently assumes everywhere that the entire
> extent tree is in core.
>
> Can you describe the problem you are seeing that leads you to
> ask this question? What's the problem you need to solve?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

I realize that this work won't be trivial which is why I asked if anyone
has thought about all relevant issues.

When using NFS over XFS, slowly growing files (can be ascii log files)
tend to fragment quite a bit. One system had several hundred files
which required more than one page to store the extents. Quite a few
files had extent counts greater than 10k, and one file had 120k extents.
Besides the memory consumption, latency to return the first byte of the
file can get noticeable.

Michael
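For a sense of scale: each packed extent record (xfs_bmbt_rec_t) is 16
bytes, so a 4k page holds roughly 256 of them. A rough, illustrative
calculation of the in-core cost of the files described above (a sketch
only; the real per-inode overhead is somewhat higher than the bare
record size):

    #include <stdio.h>

    int main(void)
    {
            /* assumes the packed 16-byte extent record format; the
             * in-core working structures cost somewhat more */
            const unsigned long rec = 16, page = 4096;
            const unsigned long nextents[] = { 256, 10000, 120000 };
            unsigned int i;

            for (i = 0; i < 3; i++)
                    printf("%6lu extents -> %7lu bytes (%3lu pages)\n",
                           nextents[i], nextents[i] * rec,
                           (nextents[i] * rec + page - 1) / page);
            return 0;
    }

A file with 120k extents therefore pins on the order of 2MB of extent
mappings in core for as long as its inode is cached, and a few hundred
such files adds up quickly.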
* Re: Reducing memory requirements for high extent xfs files
From: Vlad Apostolov @ 2007-06-05 23:11 UTC
To: Michael Nishimoto; +Cc: David Chinner, Michael Nishimoto, xfs

Michael Nishimoto wrote:
> David Chinner wrote:
> > On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> > > Hello,
> > >
> > > Has anyone done any work or had thoughts on changes required
> > > to reduce the total memory footprint of high extent xfs files?
> >
> > We changed the way we do memory allocation to avoid needing
> > large contiguous chunks of memory a bit over a year ago;
> > that solved the main OOM problem we were getting reported
> > with highly fragmented files.
> >
> > > Obviously, it is important to reduce fragmentation as files
> > > are generated and to regularly defrag files, but both of these
> > > alternatives are not complete solutions.
> > >
> > > To reduce memory consumption, xfs could bring in extents
> > > from disk as needed (or just before needed) and could free
> > > up mappings when certain extent ranges have not been recently
> > > accessed. A solution should become more aggressive about
> > > reclaiming extent mapping memory as free memory becomes limited.
> >
> > Yes, it could, but that's a pretty major overhaul of the extent
> > interface which currently assumes everywhere that the entire
> > extent tree is in core.
> >
> > Can you describe the problem you are seeing that leads you to
> > ask this question? What's the problem you need to solve?
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > Principal Engineer
> > SGI Australian Software Group
>
> I realize that this work won't be trivial which is why I asked if anyone
> has thought about all relevant issues.
>
> When using NFS over XFS, slowly growing files (can be ascii log files)
> tend to fragment quite a bit. One system had several hundred files
> which required more than one page to store the extents. Quite a few
> files had extent counts greater than 10k, and one file had 120k extents.
> Besides the memory consumption, latency to return the first byte of the
> file can get noticeable.
>
> Michael

Hi Michael,

You could use the XFS_XFLAG_EXTSIZE and XFS_XFLAG_RTINHERIT flags to
set an extent size hint, which would reduce the file fragmentation in
this scenario. Please check the xfsctl man page for more details.

Regards,
Vlad
* Re: Reducing memory requirements for high extent xfs files
From: Vlad Apostolov @ 2007-06-05 23:17 UTC
To: Vlad Apostolov; +Cc: Michael Nishimoto, David Chinner, Michael Nishimoto, xfs

Vlad Apostolov wrote:
> Michael Nishimoto wrote:
> > David Chinner wrote:
> > > On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> > > > Hello,
> > > >
> > > > Has anyone done any work or had thoughts on changes required
> > > > to reduce the total memory footprint of high extent xfs files?
> > >
> > > We changed the way we do memory allocation to avoid needing
> > > large contiguous chunks of memory a bit over a year ago;
> > > that solved the main OOM problem we were getting reported
> > > with highly fragmented files.
> > >
> > > > Obviously, it is important to reduce fragmentation as files
> > > > are generated and to regularly defrag files, but both of these
> > > > alternatives are not complete solutions.
> > > >
> > > > To reduce memory consumption, xfs could bring in extents
> > > > from disk as needed (or just before needed) and could free
> > > > up mappings when certain extent ranges have not been recently
> > > > accessed. A solution should become more aggressive about
> > > > reclaiming extent mapping memory as free memory becomes limited.
> > >
> > > Yes, it could, but that's a pretty major overhaul of the extent
> > > interface which currently assumes everywhere that the entire
> > > extent tree is in core.
> > >
> > > Can you describe the problem you are seeing that leads you to
> > > ask this question? What's the problem you need to solve?
> > >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > Principal Engineer
> > > SGI Australian Software Group
> >
> > I realize that this work won't be trivial which is why I asked if anyone
> > has thought about all relevant issues.
> >
> > When using NFS over XFS, slowly growing files (can be ascii log files)
> > tend to fragment quite a bit. One system had several hundred files
> > which required more than one page to store the extents. Quite a few
> > files had extent counts greater than 10k, and one file had 120k extents.
> > Besides the memory consumption, latency to return the first byte of the
> > file can get noticeable.
> >
> > Michael
>
> Hi Michael,
>
> You could use the XFS_XFLAG_EXTSIZE and XFS_XFLAG_RTINHERIT flags to
> set an extent size hint, which would reduce the file fragmentation in
> this scenario. Please check the xfsctl man page for more details.
>
> Regards,
> Vlad

I meant XFS_XFLAG_EXTSZINHERIT, not XFS_XFLAG_RTINHERIT. This one
should be set on a parent directory.

Regards,
Vlad
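For concreteness, the following is a rough sketch of setting these
hints from C rather than with xfs_io. It assumes the ioctl definitions
installed by xfsprogs (<xfs/xfs_fs.h>); the paths and the 16MB hint are
illustrative only, and the per-file hint has to be applied while the
file is still empty:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <xfs/xfs_fs.h>   /* struct fsxattr, XFS_IOC_*, XFS_XFLAG_* */

    /* set an extent size hint, in bytes (a multiple of the fs block size) */
    static int set_hint(int fd, unsigned int bytes, unsigned int flag)
    {
            struct fsxattr fsx;

            if (ioctl(fd, XFS_IOC_FSGETXATTR, &fsx) < 0)
                    return -1;
            fsx.fsx_xflags |= flag;         /* EXTSIZE or EXTSZINHERIT */
            fsx.fsx_extsize = bytes;
            return ioctl(fd, XFS_IOC_FSSETXATTR, &fsx);
    }

    int main(void)
    {
            int fd  = open("/mnt/scratch1/temp/foo", O_RDWR | O_CREAT, 0600);
            int dfd = open("/mnt/scratch1/temp", O_RDONLY);

            if (fd < 0 || dfd < 0)
                    return 1;
            /* per-file hint: set it before the file has any data blocks */
            if (set_hint(fd, 16 * 1024 * 1024, XFS_XFLAG_EXTSIZE) < 0)
                    perror("file extsize");
            /* per-directory hint: new files in the directory inherit it */
            if (set_hint(dfd, 16 * 1024 * 1024, XFS_XFLAG_EXTSZINHERIT) < 0)
                    perror("dir extsize inherit");
            return 0;
    }

Note that, as discussed below, the hint changes how much gets allocated
per write but does not by itself stop the speculative allocation beyond
EOF from being truncated when the NFS server closes the file.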
* Re: Reducing memory requirements for high extent xfs files
From: David Chinner @ 2007-06-06 1:36 UTC
To: Michael Nishimoto; +Cc: David Chinner, Michael Nishimoto, xfs

On Tue, Jun 05, 2007 at 03:23:50PM -0700, Michael Nishimoto wrote:
> David Chinner wrote:
> > On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> > > Hello,
> > >
> > > Has anyone done any work or had thoughts on changes required
> > > to reduce the total memory footprint of high extent xfs files?
.....
> > Yes, it could, but that's a pretty major overhaul of the extent
> > interface which currently assumes everywhere that the entire
> > extent tree is in core.
> >
> > Can you describe the problem you are seeing that leads you to
> > ask this question? What's the problem you need to solve?
>
> I realize that this work won't be trivial which is why I asked if anyone
> has thought about all relevant issues.
>
> When using NFS over XFS, slowly growing files (can be ascii log files)
> tend to fragment quite a bit.

Oh, that problem.

The issue is that allocation beyond EOF (the normal way we prevent
fragmentation in this case) gets truncated off on file close.

Every NFS request is processed by doing:

    open
    write
    close

And so XFS truncates the allocation beyond EOF on close. Hence
the next write requires a new allocation and that results in
a non-contiguous file because the adjacent blocks have already
been used....

Options:

  - NFS server open file cache to avoid the close.
  - add detection to XFS to determine if the caller is an NFS thread
    and don't truncate on close.
  - use preallocation.
  - preallocation on the file once will result in the
    XFS_DIFLAG_PREALLOC flag being set on the inode and it won't
    truncate on close.
  - append only flag will work in the same way as the prealloc flag
    w.r.t. preventing truncation on close.
  - run xfs_fsr

Note - I don't think extent size hints alone will help as they
don't prevent EOF truncation on close.

> One system had several hundred files
> which required more than one page to store the extents.

I don't consider that a problem as such. We'll always get some
level of fragmentation if we don't preallocate.

> Quite a few
> files had extent counts greater than 10k, and one file had 120k extents.

You should run xfs_fsr occasionally....

> Besides the memory consumption, latency to return the first byte of the
> file can get noticeable.

Yes, that too :/

However, I think we should be trying to fix the root cause of this
worst case fragmentation rather than trying to make the rest of the
filesystem accommodate an extreme corner case efficiently. i.e.
let's look at the test cases and determine what piece of logic we
need to add or remove to prevent this cause of fragmentation.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
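To make the preallocation options concrete, space can be reserved as
unwritten extents with the RESVSP ioctl; per Dave's note above, doing
this once also leaves XFS_DIFLAG_PREALLOC set on the inode, so the tail
is no longer trimmed on close. A minimal sketch, again assuming the
xfsprogs headers (the file name and the 64MB reservation are
illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <xfs/xfs_fs.h>   /* xfs_flock64_t, XFS_IOC_RESVSP64 */

    int main(void)
    {
            int fd = open("logfile", O_RDWR | O_CREAT | O_APPEND, 0644);
            xfs_flock64_t fl = {
                    .l_whence = SEEK_SET,   /* absolute range... */
                    .l_start  = 0,          /* ...from offset 0... */
                    .l_len    = 64 << 20,   /* ...for 64MB */
            };

            if (fd < 0)
                    return 1;
            /* reserve unwritten space; the file size is unchanged and
             * the reserved range reads back as zeros until written */
            if (ioctl(fd, XFS_IOC_RESVSP64, &fl) < 0)
                    perror("XFS_IOC_RESVSP64");
            return 0;
    }

The trade-off is exactly the one raised in the next message: reserved
space that is never written looks like leaked space until something
truncates it away.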
* Re: Reducing memory requirements for high extent xfs files
From: Vlad Apostolov @ 2007-06-06 2:00 UTC
To: David Chinner; +Cc: Michael Nishimoto, Michael Nishimoto, xfs

David Chinner wrote:
> On Tue, Jun 05, 2007 at 03:23:50PM -0700, Michael Nishimoto wrote:
> > David Chinner wrote:
> > > On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> > > > Hello,
> > > >
> > > > Has anyone done any work or had thoughts on changes required
> > > > to reduce the total memory footprint of high extent xfs files?
> .....
> > > Yes, it could, but that's a pretty major overhaul of the extent
> > > interface which currently assumes everywhere that the entire
> > > extent tree is in core.
> > >
> > > Can you describe the problem you are seeing that leads you to
> > > ask this question? What's the problem you need to solve?
> >
> > I realize that this work won't be trivial which is why I asked if anyone
> > has thought about all relevant issues.
> >
> > When using NFS over XFS, slowly growing files (can be ascii log files)
> > tend to fragment quite a bit.
>
> Oh, that problem.
>
> The issue is that allocation beyond EOF (the normal way we prevent
> fragmentation in this case) gets truncated off on file close.
>
> Every NFS request is processed by doing:
>
>     open
>     write
>     close
>
> And so XFS truncates the allocation beyond EOF on close. Hence
> the next write requires a new allocation and that results in
> a non-contiguous file because the adjacent blocks have already
> been used....
>
> Options:
>
>   - NFS server open file cache to avoid the close.
>   - add detection to XFS to determine if the caller is an NFS thread
>     and don't truncate on close.
>   - use preallocation.
>   - preallocation on the file once will result in the
>     XFS_DIFLAG_PREALLOC flag being set on the inode and it won't
>     truncate on close.
>   - append only flag will work in the same way as the prealloc flag
>     w.r.t. preventing truncation on close.
>   - run xfs_fsr
>
> Note - I don't think extent size hints alone will help as they
> don't prevent EOF truncation on close.

Dave,

I think an extent size hint should help in this situation. Here is an
example of writing 4 chars to a file with an extent size hint of 16KB.
The file ends up with a size of 4 bytes and an allocation of 8 basic
blocks (512 bytes each) in one extent.

emu:/mnt/scratch1/temp # xfs_io -c "extsize 16384" -f foo
emu:/mnt/scratch1/temp # ls -al foo
-rw------- 1 root root 0 2007-06-06 12:33 foo
emu:/mnt/scratch1/temp # xfs_bmap -l -v foo
foo: no extents
emu:/mnt/scratch1/temp # echo "abc" > foo
emu:/mnt/scratch1/temp # ls -al foo
-rw------- 1 root root 4 2007-06-06 12:35 foo
emu:/mnt/scratch1/temp # xfs_bmap -l -v foo
foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          326088..326095    0 (326088..326095)     8

Just a warning that the extent hint works at the moment only for
contiguous files. There are problems with sparse files (with holes) and
the extent hint.

Regards,
Vlad
* Re: Reducing memory requirements for high extent xfs files
From: Vlad Apostolov @ 2007-06-06 2:05 UTC
To: Vlad Apostolov; +Cc: David Chinner, Michael Nishimoto, Michael Nishimoto, xfs

Vlad Apostolov wrote:

No, Dave is right. The example worked because the extent hint was the
same size as the filesystem block.

Regards,
Vlad

> > Note - I don't think extent size hints alone will help as they
> > don't prevent EOF truncation on close.
>
> Dave,
>
> I think an extent size hint should help in this situation. Here is an
> example of writing 4 chars to a file with an extent size hint of 16KB.
> The file ends up with a size of 4 bytes and an allocation of 8 basic
> blocks (512 bytes each) in one extent.
>
> emu:/mnt/scratch1/temp # xfs_io -c "extsize 16384" -f foo
> emu:/mnt/scratch1/temp # ls -al foo
> -rw------- 1 root root 0 2007-06-06 12:33 foo
> emu:/mnt/scratch1/temp # xfs_bmap -l -v foo
> foo: no extents
> emu:/mnt/scratch1/temp # echo "abc" > foo
> emu:/mnt/scratch1/temp # ls -al foo
> -rw------- 1 root root 4 2007-06-06 12:35 foo
> emu:/mnt/scratch1/temp # xfs_bmap -l -v foo
> foo:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>    0: [0..7]:          326088..326095    0 (326088..326095)     8
>
> Just a warning that the extent hint works at the moment only for
> contiguous files. There are problems with sparse files (with holes) and
> the extent hint.
>
> Regards,
> Vlad
* Re: Reducing memory requirements for high extent xfs files
From: Michael Nishimoto @ 2007-06-06 17:18 UTC
To: David Chinner; +Cc: xfs

David Chinner wrote:
> On Tue, Jun 05, 2007 at 03:23:50PM -0700, Michael Nishimoto wrote:
> > David Chinner wrote:
> > > On Wed, May 30, 2007 at 09:49:38AM -0700, Michael Nishimoto wrote:
> > > > Hello,
> > > >
> > > > Has anyone done any work or had thoughts on changes required
> > > > to reduce the total memory footprint of high extent xfs files?
> .....
> > > Yes, it could, but that's a pretty major overhaul of the extent
> > > interface which currently assumes everywhere that the entire
> > > extent tree is in core.
> > >
> > > Can you describe the problem you are seeing that leads you to
> > > ask this question? What's the problem you need to solve?
> >
> > I realize that this work won't be trivial which is why I asked if anyone
> > has thought about all relevant issues.
> >
> > When using NFS over XFS, slowly growing files (can be ascii log files)
> > tend to fragment quite a bit.
>
> Oh, that problem.
>
> The issue is that allocation beyond EOF (the normal way we prevent
> fragmentation in this case) gets truncated off on file close.
>
> Every NFS request is processed by doing:
>
>     open
>     write
>     close
>
> And so XFS truncates the allocation beyond EOF on close. Hence
> the next write requires a new allocation and that results in
> a non-contiguous file because the adjacent blocks have already
> been used....

Yes, we diagnosed this same issue.

> Options:
>
>   1 NFS server open file cache to avoid the close.
>   2 add detection to XFS to determine if the caller is an NFS thread
>     and don't truncate on close.
>   3 use preallocation.
>   4 preallocation on the file once will result in the
>     XFS_DIFLAG_PREALLOC flag being set on the inode and it won't
>     truncate on close.
>   5 append only flag will work in the same way as the prealloc flag
>     w.r.t. preventing truncation on close.
>   6 run xfs_fsr

We have discussed doing number 1. The problem with numbers 2, 3, 4, & 5
is that we ended up with a bunch of files which appeared to leak space.
If the truncate isn't done at file close time, the extra space sits
around forever.

> Note - I don't think extent size hints alone will help as they
> don't prevent EOF truncation on close.
>
> > One system had several hundred files
> > which required more than one page to store the extents.
>
> I don't consider that a problem as such. We'll always get some
> level of fragmentation if we don't preallocate.
>
> > Quite a few
> > files had extent counts greater than 10k, and one file had 120k extents.
>
> You should run xfs_fsr occasionally....
>
> > Besides the memory consumption, latency to return the first byte of the
> > file can get noticeable.
>
> Yes, that too :/
>
> However, I think we should be trying to fix the root cause of this
> worst case fragmentation rather than trying to make the rest of the
> filesystem accommodate an extreme corner case efficiently. i.e.
> let's look at the test cases and determine what piece of logic we
> need to add or remove to prevent this cause of fragmentation.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

I guess there are multiple ways to look at this problem. I have been
going under the assumption that xfs' inability to handle a large number
of extents is the root cause. When a filesystem is full, defragmentation
might not be possible. Also, should we consider a file with 1MB extents
as fragmented? A 100GB file with 1MB extents has 100k extents. As disks
and, hence, filesystems get larger, it's possible to have a larger
number of such files in a filesystem.

I still think that trying not to fragment up front is required, as is
running xfs_fsr, but I don't think those alone can be a complete
solution.

Getting back to the original question, has there ever been serious
thought about what it might take to handle large extent files? What
might be involved with trying to page extent blocks? I'm most concerned
about the potential locking consequences and streaming performance
implications.

thanks,

Michael
* Re: Reducing memory requirements for high extent xfs files
From: David Chinner @ 2007-06-06 23:47 UTC
To: Michael Nishimoto; +Cc: David Chinner, xfs

On Wed, Jun 06, 2007 at 10:18:14AM -0700, Michael Nishimoto wrote:
> David Chinner wrote:
> > On Tue, Jun 05, 2007 at 03:23:50PM -0700, Michael Nishimoto wrote:
> > > When using NFS over XFS, slowly growing files (can be ascii log files)
> > > tend to fragment quite a bit.
> >
> > Oh, that problem.
> .....
> > And so XFS truncates the allocation beyond EOF on close. Hence
> > the next write requires a new allocation and that results in
> > a non-contiguous file because the adjacent blocks have already
> > been used....
>
> Yes, we diagnosed this same issue.
>
> > Options:
> >
> >   1 NFS server open file cache to avoid the close.
> >   2 add detection to XFS to determine if the caller is an NFS thread
> >     and don't truncate on close.
> >   3 use preallocation.
> >   4 preallocation on the file once will result in the
> >     XFS_DIFLAG_PREALLOC flag being set on the inode and it won't
> >     truncate on close.
> >   5 append only flag will work in the same way as the prealloc flag
> >     w.r.t. preventing truncation on close.
> >   6 run xfs_fsr
>
> We have discussed doing number 1.

So has the community - there may even be patches floating around...

> The problem with numbers 2, 3, 4, & 5 is that we ended up with a bunch
> of files which appeared to leak space. If the truncate isn't done at
> file close time, the extra space sits around forever.

That's not a problem for slowly growing log files - they will eventually
use the space. I'm not saying that the truncate should be avoided on all
files, just the slow growing ones that get fragmented....

> > However, I think we should be trying to fix the root cause of this
> > worst case fragmentation rather than trying to make the rest of the
> > filesystem accommodate an extreme corner case efficiently. i.e.
> > let's look at the test cases and determine what piece of logic we
> > need to add or remove to prevent this cause of fragmentation.
>
> I guess there are multiple ways to look at this problem. I have been
> going under the assumption that xfs' inability to handle a large number
> of extents is the root cause.

Fair enough.

> When a filesystem is full, defragmentation
> might not be possible.

Yes, that's true.

> Also, should we consider a file with 1MB extents as
> fragmented? A 100GB file with 1MB extents has 100k extents.

Yes, that's fragmented - it has 4 orders of magnitude more extents
than optimal - and the extents are too small to allow reads or
writes to achieve full bandwidth on high end raid configs....

> As disks
> and, hence, filesystems get larger, it's possible to have a larger
> number of such files in a filesystem.

Yes. But as disks get larger, there's more space available from which
to allocate contiguous ranges, and so that sort of problem is less
likely to occur (until the filesystem gets full).

> I still think that trying not to fragment up front is required, as is
> running xfs_fsr, but I don't think those alone can be a complete
> solution.
>
> Getting back to the original question, has there ever been serious
> thought about what it might take to handle large extent files?

Yes, I've thought about it from a relatively high level, but enough to
indicate real problems that breed complexity.

> What might be involved
> with trying to page extent blocks?

  - Rewriting all of the incore extent handling code to support missing
    extent ranges (it currently uses deltas from the previous block for
    the file offset).
  - changing the bmap btree code to convert to the incore, uncompressed
    format on a block by block basis rather than into a global table
  - add code to demand read the extent list
      - needs to use cursors to pin blocks in memory while doing
        traversals
      - needs to work in ENOMEM conditions
  - convert xfs_buf.c to be able to use mempools for both the xfs_buf_t
    and the block dev page cache so that we can read blocks when ENOMEM
    in the writeback path
  - convert the in-core extent structures to use mempools so we can
    read blocks when ENOMEM in the writeback path
  - any newly allocated structures will also have to use mempools
  - add memory shaker interfaces

> I'm most concerned about the potential locking consequences and streaming
> performance implications.

In reality, the worst problem is writeback at ENOMEM. Who cares about
locking and performance if it's fundamentally unworkable when the
machine is out of memory?

Even using mempools, we may not be able to demand page extent blocks
safely in all cases. This is my big worry about it, and the more I
thought about it, the less demand paging made sense - it gets
horrendously complex when you have to start playing by mempool rules,
and given that the lifetime of modified buffers is determined by the
log and AIL flushing behaviour, we have serious problems guaranteeing
when objects would be returned to the mempool.

This is a showstopper issue, IMO. I'm happy to be proven wrong, but it
looks *extremely* messy and complex at this point....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Reducing memory requirements for high extent xfs files
From: Michael Nishimoto @ 2007-06-22 23:58 UTC
To: David Chinner; +Cc: xfs

> > Also, should we consider a file with 1MB extents as
> > fragmented? A 100GB file with 1MB extents has 100k extents.
>
> Yes, that's fragmented - it has 4 orders of magnitude more extents
> than optimal - and the extents are too small to allow reads or
> writes to achieve full bandwidth on high end raid configs....

Fair enough, so multiply those numbers by 100 -- a 10TB file with 100MB
extents.

It seems to me that we can look at the negative effects of fragmentation
in two ways here. First, (regardless of size) if a file has a large
number of extents, then it is too fragmented. Second, if a file's
extents are so small that we can't get full bandwidth, then it is too
fragmented.

If the second case were of primary concern, then it would be reasonable
to have 1000s of extents as long as each of the extents were big enough
to amortize disk latencies across a large amount of data. We've been
assuming that a good write is one which can send 2MB of data to a
single drive; so with an 8+1 raid device, we need 16MB of write data to
achieve high disk utilization. In particular, there are flexibility
advantages if high extent count files can still achieve good
performance.

Michael
* Re: Reducing memory requirements for high extent xfs files
From: David Chinner @ 2007-06-25 2:47 UTC
To: Michael Nishimoto; +Cc: xfs

On Fri, Jun 22, 2007 at 04:58:06PM -0700, Michael Nishimoto wrote:
> > > Also, should we consider a file with 1MB extents as
> > > fragmented? A 100GB file with 1MB extents has 100k extents.
> >
> > Yes, that's fragmented - it has 4 orders of magnitude more extents
> > than optimal - and the extents are too small to allow reads or
> > writes to achieve full bandwidth on high end raid configs....
>
> Fair enough, so multiply those numbers by 100 -- a 10TB file with 100MB
> extents.

If you've got 10TB of free space, the allocator should be doing a better
job than that ;)

> It seems to me that we can look at the negative effects of fragmentation
> in two ways here. First, (regardless of size) if a file has a large
> number of extents, then it is too fragmented. Second, if a file's
> extents are so small that we can't get full bandwidth, then it is too
> fragmented.

Yes, that is a fair observation. The first case is really only a concern
when the maximum extent size (8GB on a 4k fsb) becomes the limiting
factor. That's at file sizes in the hundreds of TB, so we are not really
in trouble there yet.

> If the second case were of primary concern, then it would be reasonable
> to have 1000s of extents as long as each of the extents were big enough
> to amortize disk latencies across a large amount of data.

*nod*

> We've been assuming that a good write is one which can send
> 2MB of data to a single drive; so with an 8+1 raid device, we need
> 16MB of write data to achieve high disk utilization.

Sure, and if you want really good write performance, you don't want any
seek between two lots of 16MB in the one file, which means that the
extent size really needs to be much larger than 16MB....

> In particular,
> there are flexibility advantages if high extent count files can
> still achieve good performance.

Sure. But there are many, many different options here that will have an
impact:

  - larger extent btree block size - reduces seeks to read the tree
  - btree defragmentation to reduce seek distance
  - smarter readahead to reduce I/O latency
  - special casing extent zero and the extents in that first block to
    allow it to be brought in without the rest of the tree - critical
    block first retrieval
  - demand paging

Of all of these options, demand paging is the most complex and intrusive
of the solutions. We should explore the simpler options first to
determine if they will solve your immediate problem.

FWIW, before we go changing any btree code, we really should be unifying
the various btree implementations in XFS.....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: Reducing memory requirements for high extent xfs files
From: Nathan Scott @ 2007-06-26 1:26 UTC
To: Michael Nishimoto; +Cc: David Chinner, xfs

Hi Mike,

On Fri, 2007-06-22 at 16:58 -0700, Michael Nishimoto wrote:
> > > Also, should we consider a file with 1MB extents as
> > > fragmented? A 100GB file with 1MB extents has 100k extents.
> >
> > Yes, that's fragmented - it has 4 orders of magnitude more extents
> > than optimal - and the extents are too small to allow reads or
> > writes to achieve full bandwidth on high end raid configs....
>
> Fair enough, so multiply those numbers by 100 -- a 10TB file ...

This seems a flawed way to look at this to me - in practice, almost no
one would have files that large. While filesystem sizes increase and can
be expected to continue to increase, I'd expect individual file sizes do
not tend to increase anywhere near as much - file sizes tend to be an
application property, and apps want to work for all filesystems. So,
people want to store _more_ files in their larger filesystems, not
_larger_ files, AFAICT.

So, IMO, this isn't a good place to invest effort - there are a lot of
bigger bang-for-buck places where XFS could change to make it generally
much better. The biggest is probably the amount of log traffic that XFS
generates ... that really needs to be tackled.

cheers.

--
Nathan
Thread overview: 13+ messages (newest: 2007-06-26 1:27 UTC)

2007-05-30 16:49 Reducing memory requirements for high extent xfs files Michael Nishimoto
2007-05-30 22:55 ` David Chinner
2007-06-05 22:23 ` Michael Nishimoto
2007-06-05 23:11 ` Vlad Apostolov
2007-06-05 23:17 ` Vlad Apostolov
2007-06-06  1:36 ` David Chinner
2007-06-06  2:00 ` Vlad Apostolov
2007-06-06  2:05 ` Vlad Apostolov
2007-06-06 17:18 ` Michael Nishimoto
2007-06-06 23:47 ` David Chinner
2007-06-22 23:58 ` Michael Nishimoto
2007-06-25  2:47 ` David Chinner
2007-06-26  1:26 ` Nathan Scott