public inbox for linux-xfs@vger.kernel.org
From: Ming Zhang <mingz@ele.uri.edu>
To: Chris Wedgwood <cw@f00f.org>
Cc: Peter Grandi <pg_xfs@xfs.for.sabi.co.UK>,
	Linux XFS <linux-xfs@oss.sgi.com>
Subject: Re: stable xfs
Date: Sun, 23 Jul 2006 21:14:36 -0400
Message-ID: <1153703676.6963.42.camel@localhost.localdomain>
In-Reply-To: <20060721180707.GB13892@tuatara.stupidest.org>

On Fri, 2006-07-21 at 11:07 -0700, Chris Wedgwood wrote:
> On Fri, Jul 21, 2006 at 01:00:44PM -0400, Ming Zhang wrote:
> 
> > what u mean overlay fs over small fs? like a unionfs?
> 
> sorta not really, it's userspace libraries which create a virtual
> filesystem over real filesystems with some database (Berkeley DB).
> it sorta evolved from an attempt to unify several filesystems spread
> over cheap PCs into something that pretended to be one larger fs

fancy word for this is NAS virtualization i guess.


> 
> > but other than fsr. there is no better way for this right?
> 
> not publicly, you could patch fsr or nag me for my patches if that
> helps

i will run some tests about fsr and see if i need to bug you about
patches.


> 
> > of course, preallocate is always good. but i do not have control
> > over applications.
> 
> well, in some cases you could use LD_PRELOAD and influence things,  it
> depends on the application and what you need from it
> 
> fwiw, most modern p2p applications have terrible access patterns which
> cause horrible fragmentation (on all fs's, not just XFS)
> 
> > sounds like a useful patch. :P will it be merged into fsr code?
> 
> no, because it's ugly and i don't think i ever decoupled it from other
> changes and posted it
> 
> > what kind of assistance you mean?
> 
> [WARNING: lots of hand waving ahead, plenty of minor, but important,
> details ignored]
> 

i read about this and feel it will be VERY hard to build, especially
considering the transaction issue.

could something like this be easier?

* analyze the fs to find out which file(s) should be defragmented;
* create a temp file and begin to copy, preallocating the space so it
is contiguous;
* after the first round of copying, keep a trace table of the blocks
that changed and do a second round on just those blocks;
* lock and switch the old file with the new file.
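
a rough sketch of these steps in Python, under loud assumptions:
rewrite_contiguously is a made-up name, os.posix_fallocate stands in for
any XFS-specific preallocation, the "trace table" is simplified to
re-checking size and mtime and redoing the whole copy, and os.replace
stands in for the lock-and-switch step. unlike an extent swap, the rename
gives the file a new inode, so a process that already has the old file
open keeps seeing the old data.

```python
import os
import shutil
import tempfile


def rewrite_contiguously(path, max_rounds=3):
    """Copy path into a preallocated temp file, then rename it over path.

    Returns True on success, False if the file kept changing.
    Hypothetical helper, not part of fsr or any XFS tool.
    """
    directory = os.path.dirname(os.path.abspath(path))
    for _ in range(max_rounds):
        before = os.stat(path)
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
                if before.st_size > 0:
                    # Reserve the full size up front so the allocator has
                    # a chance to hand out one contiguous extent.
                    os.posix_fallocate(dst.fileno(), 0, before.st_size)
                shutil.copyfileobj(src, dst)
                dst.flush()
                os.fsync(dst.fileno())
            after = os.stat(path)
            unchanged = (after.st_size, after.st_mtime_ns) == (
                before.st_size,
                before.st_mtime_ns,
            )
            if unchanged:
                # Simplification: no per-block trace table, the source is
                # only accepted if it did not change during the copy.
                shutil.copystat(path, tmp)
                os.replace(tmp, path)
                tmp = None
                return True
        finally:
            # Drop the temp file if it was not renamed into place.
            if tmp is not None and os.path.exists(tmp):
                os.unlink(tmp)
    return False
```

there is still a small window between the last stat and the rename where
a writer could slip in, which is the same locking/transaction problem
discussed in this thread.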


> if you wanted much smarter defragmentation semantics, it would
> probably make sense to
> 
>   * bulkstat the entire volume, this will give you the inode cluster
>     locations and enough information to start building a tree of where
>     all the files are (XFS_IOC_FSGEOMETRY details obviously)
> 
>   * opendir/read to build a full directory tree
> 
>   * use XFS_IOC_GETBMAP & XFS_IOC_GETBMAPA to figure out which blocks
>     are occupied by which files
> 
> you would now have a pretty good idea of what is using what parts of
> the disk, except of course it could be constantly changing underneath
> you to make things harder
> 
> also, doing this using the existing interfaces is (when i tried it)
> really really painfully slow if you have a large filesystem with a lot
> of small files (even when you try to optimize your accesses to
> minimize seeking by sorting by inode number and submitting several
> requests in parallel to try and help the elevator merge accesses)
> 
> 
> once you have some overall picture of the disk, you can decide what you
> want to move to achieve your goal, typically this would be to reduce
> the fragmentation of the largest files, and this would be
> relocating some or all of those blocks to another place
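
the "decide what to move" step could be as simple as ranking files by
extent count. a minimal sketch in Python, assuming the per-file size and
extent count were already collected (for example from XFS_IOC_GETBMAP
output). FileInfo, pick_candidates and the thresholds are made-up names
and values for illustration only:

```python
from typing import NamedTuple


class FileInfo(NamedTuple):
    path: str
    size: int      # bytes
    extents: int   # number of extents, gathered elsewhere


def pick_candidates(files, min_size=64 * 1024 * 1024, min_extents=2, limit=10):
    """Return large, fragmented files, most extents first."""
    eligible = [
        f for f in files if f.size >= min_size and f.extents >= min_extents
    ]
    # Rank by extent count; break ties by size, larger first.
    eligible.sort(key=lambda f: (f.extents, f.size), reverse=True)
    return eligible[:limit]
```

small files and files that already sit in one extent are skipped, so the
effort goes to the large files mentioned above.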
> 
> if you want to allocate space in a given AG, you open/creat a
> temporary file in a directory in that AG (create multiple dirs as
> needed to ensure you have one or more of these), and preallocate the
> space --- there you can copy the file over
> 
> we could also add ioctls to further bias XFS's allocation strategies,
> like telling it to never allocate in some AGs (needed for an online
> shrink if someone wanted to make such a thing) or simply bias strongly
> away from some places, then add other ioctls to allow you to
> specifically allocate space in those AGs so you can bias what is
> allocated where
> 
> another useful ioctl would be a variation of XFS_IOC_SWAPEXT which
> would swap only some extents.  there is no internal support for this
> now except we do have code for XFS_IOC_UNRESVSP64 and XFS_IOC_RESVSP64
> so perhaps the idea would be to swap some (but not all) blocks of a
> file by creating a function that does the equivalent of 'punch a hole'
> where we want to replace the blocks, and then 'allocate new blocks
> given some i already have elsewhere' (however, making that all work as
> one transaction might be very very difficult)
> 
> it's a lot of effort for what for many people would only be
> marginal gains
