From: Ming Zhang <mingz@ele.uri.edu>
To: Chris Wedgwood <cw@f00f.org>
Cc: Peter Grandi <pg_xfs@xfs.for.sabi.co.UK>,
Linux XFS <linux-xfs@oss.sgi.com>
Subject: Re: stable xfs
Date: Sun, 23 Jul 2006 21:14:36 -0400
Message-ID: <1153703676.6963.42.camel@localhost.localdomain>
In-Reply-To: <20060721180707.GB13892@tuatara.stupidest.org>
On Fri, 2006-07-21 at 11:07 -0700, Chris Wedgwood wrote:
> On Fri, Jul 21, 2006 at 01:00:44PM -0400, Ming Zhang wrote:
>
> > what do you mean by overlay fs over small fs? like a unionfs?
>
> sort of, but not really: it's userspace libraries which create a virtual
> filesystem over real filesystems, backed by a database (Berkeley DB).
> it sort of evolved from an attempt to unify several filesystems spread
> over cheap PCs into something that pretended to be one larger fs.
the fancy word for this is NAS virtualization, i guess.
>
> > but other than fsr, there is no better way to do this, right?
>
> not publicly; you could patch fsr, or nag me for my patches if that
> helps
i will run some tests with fsr and see if i need to bug you about the
patches.
>
> > of course, preallocation is always good, but i do not have control
> > over the applications.
>
> well, in some cases you could use LD_PRELOAD to influence things; it
> depends on the application and what you need from it
>
> fwiw, most modern p2p applications have terrible access patterns which
> cause horrible fragmentation (on all fs's, not just XFS)
>
> > sounds like a useful patch. :P will it be merged into the fsr code?
>
> no, because it's ugly and i don't think i ever decoupled it from other
> changes and posted it
>
> > what kind of assistance do you mean?
>
> [WARNING: lots of hand waving ahead, plenty of minor, but important,
> details ignored]
>
i read through this and feel it will be VERY hard to build, especially
considering the transaction issue.
could it be done more simply, like this?
* analyze the fs to find out which file(s) need to be defragmented;
* create a temp file and begin copying, preallocating the space so it is
contiguous;
* after the first round of copying, keep a trace table of the blocks that
changed and do a second round over just those blocks;
* lock the file and swap the old file with the new one.
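The two-round copy above can be sketched as a toy, in-memory model. All names here (model_file, model_write, copy_round1, copy_round2, the 4-byte block size) are hypothetical. It shows only the trace-table logic: writes that land while round 1 is running are recorded, and round 2 recopies just those blocks. It assumes the file is locked before round 2 so no further writes arrive; how changed blocks would really be tracked on a live XFS file is left open, as it is in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLK_SIZE 4   /* toy block size, for illustration only */
#define MAX_BLKS 64

/* Toy "file": data blocks plus the trace table of changed blocks. */
struct model_file {
    size_t nblocks;
    char data[MAX_BLKS][BLK_SIZE];
    bool dirty[MAX_BLKS];   /* trace table */
    bool tracking;          /* true while a copy round is in progress */
};

/* An application write; recorded in the trace table while tracking. */
static void model_write(struct model_file *f, size_t blk, const char *buf)
{
    assert(blk < f->nblocks);
    memcpy(f->data[blk], buf, BLK_SIZE);
    if (f->tracking)
        f->dirty[blk] = true;
}

/* Round 1: clear the trace table, start tracking, copy every block. */
static void copy_round1(struct model_file *src, struct model_file *dst)
{
    assert(src->nblocks <= MAX_BLKS);
    memset(src->dirty, 0, sizeof src->dirty);
    src->tracking = true;
    dst->nblocks = src->nblocks;
    for (size_t b = 0; b < src->nblocks; b++)
        memcpy(dst->data[b], src->data[b], BLK_SIZE);
}

/* Round 2: assumes the file is now locked (no more writes); recopies
 * only the changed blocks and returns how many were recopied. */
static size_t copy_round2(struct model_file *src, struct model_file *dst)
{
    size_t n = 0;
    src->tracking = false;
    for (size_t b = 0; b < src->nblocks; b++) {
        if (src->dirty[b]) {
            memcpy(dst->data[b], src->data[b], BLK_SIZE);
            src->dirty[b] = false;
            n++;
        }
    }
    return n;
}
```

In a real tool the second round would be followed by the swap itself (e.g. XFS_IOC_SWAPEXT) while the lock is still held.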
> if you wanted much smarter defragmentation semantics, it would
> probably make sense to
>
> * bulkstat the entire volume; this will give you the inode cluster
> locations and enough information to start building a tree of where
> all the files are (with the XFS_IOC_FSGEOMETRY details, obviously)
>
> * opendir/readdir to build a full directory tree
>
> * use XFS_IOC_GETBMAP & XFS_IOC_GETBMAPA to figure out which blocks
> are occupied by which files
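A small worked example of combining the geometry with the block maps, under stated assumptions: that GETBMAP reports bmv_block in 512-byte units (negative for holes and delayed allocations) and that XFS disk addresses run linearly through the AGs. The AG holding a block is then its address divided by the AG size in 512-byte units. The geom struct is a hypothetical stand-in for the few XFS_IOC_FSGEOMETRY fields needed, and daddr_to_agno is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the geometry fields this needs; real values
 * would come from XFS_IOC_FSGEOMETRY. */
struct geom {
    uint32_t blocksize;  /* fs block size in bytes */
    uint32_t agblocks;   /* fs blocks per AG */
    uint32_t agcount;    /* number of AGs */
};

#define BBSIZE 512  /* assumed unit of GETBMAP's bmv_block */

/* Returns the AG holding 512-byte disk address daddr, or -1 for a hole or
 * delayed allocation (negative daddr) or an address past the last AG. */
static int64_t daddr_to_agno(const struct geom *g, int64_t daddr)
{
    assert(g->agblocks > 0);
    assert(g->blocksize >= BBSIZE && g->blocksize % BBSIZE == 0);
    if (daddr < 0)
        return -1;
    int64_t bb_per_ag = (int64_t)g->agblocks * (g->blocksize / BBSIZE);
    int64_t agno = daddr / bb_per_ag;
    return agno < (int64_t)g->agcount ? agno : -1;
}
```

For example, with 4096-byte blocks and 1000 blocks per AG, each AG spans 8000 basic blocks, so address 8000 is the first block of AG 1.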
>
> you would now have a pretty good idea of what is using which parts of
> the disk, except of course it could be constantly changing underneath
> you, which makes things harder
>
> also, doing this using the existing interfaces is (when i tried it)
> really, really painfully slow if you have a large filesystem with a lot
> of small files (even when you try to optimize your accesses to
> minimize seeking by sorting by inode number and submitting several
> requests in parallel to try and help the elevator merge accesses)
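For the opendir/readdir step, a minimal sketch of reading one directory and sorting its entries by inode number, the ordering mentioned above for cutting seeks. It uses only POSIX opendir/readdir (d_ino) and qsort; list_by_inode and struct dent are hypothetical names, and a full tree walk would recurse into subdirectories.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct dent {
    ino_t ino;
    char name[256];
};

static int cmp_ino(const void *a, const void *b)
{
    ino_t x = ((const struct dent *)a)->ino;
    ino_t y = ((const struct dent *)b)->ino;
    return (x > y) - (x < y);
}

/* Reads one directory (skipping "." and "..") and returns its entries
 * sorted by inode number; the caller frees. Returns NULL on error. */
static struct dent *list_by_inode(const char *path, size_t *count)
{
    assert(path != NULL && count != NULL);
    DIR *d = opendir(path);
    if (!d)
        return NULL;
    size_t n = 0, cap = 16;
    struct dent *v = malloc(cap * sizeof *v);
    struct dirent *e;
    while (v && (e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        if (n == cap) {   /* grow the array geometrically */
            struct dent *t = realloc(v, 2 * cap * sizeof *v);
            if (!t) { free(v); v = NULL; break; }
            v = t;
            cap *= 2;
        }
        v[n].ino = e->d_ino;
        snprintf(v[n].name, sizeof v[n].name, "%s", e->d_name);
        n++;
    }
    closedir(d);
    if (!v)
        return NULL;
    qsort(v, n, sizeof *v, cmp_ino);  /* inode order, roughly disk order */
    *count = n;
    return v;
}
```

Sorting by inode number only approximates on-disk order, which is part of why the approach stays slow on filesystems with many small files.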
>
>
> once you have some overall picture of the disk, you can decide what you
> want to move to achieve your goal; typically this would be to reduce
> the fragmentation of the largest files, which means relocating some
> or all of their blocks to another place
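One possible selection rule, sketched under assumptions: given a per-file extent count (for example, counted from the GETBMAP output), keep the files with at least min_extents extents and order them largest first. struct file_frag, pick_candidates and the threshold are illustrative names, not taken from fsr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct file_frag {
    uint64_t ino;
    uint64_t size;      /* bytes */
    uint32_t nextents;  /* e.g. counted from a GETBMAP walk */
};

/* qsort comparator: larger files first. */
static int by_size_desc(const void *a, const void *b)
{
    uint64_t x = ((const struct file_frag *)a)->size;
    uint64_t y = ((const struct file_frag *)b)->size;
    return (x < y) - (x > y);
}

/* Moves files with at least min_extents extents to the front of v,
 * sorts them by size (largest first) and returns how many there are. */
static size_t pick_candidates(struct file_frag *v, size_t n,
                              uint32_t min_extents)
{
    size_t k = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (v[i].nextents >= min_extents) {
            struct file_frag t = v[k];
            v[k] = v[i];
            v[i] = t;
            k++;
        }
    }
    qsort(v, k, sizeof *v, by_size_desc);
    return k;
}
```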
>
> if you want to allocate space in a given AG, you open/creat a
> temporary file in a directory in that AG (create multiple dirs as
> needed to ensure you have one or more of these) and preallocate the
> space; then you can copy the file over
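A sketch of that temp-file step, with a plainly swapped-in technique: it uses the generic posix_fallocate() instead of XFS_IOC_RESVSP64 for the preallocation, and assumes dir is a directory that already lives in the target AG. copy_to_prealloc_tmp and the .defrag.XXXXXX name are hypothetical, and the final swap with the original file is not shown.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies src into a new temp file under dir, reserving the full size up
 * front. The temp path is written to out. Returns 0 on success, -1 on
 * error (the temp file is removed on failure). */
static int copy_to_prealloc_tmp(const char *src, const char *dir,
                                char *out, size_t outlen)
{
    struct stat st;
    char buf[65536];
    int in = -1, tmp = -1, rc = -1;
    assert(src && dir && out);
    if (snprintf(out, outlen, "%s/.defrag.XXXXXX", dir) >= (int)outlen)
        return -1;
    if ((in = open(src, O_RDONLY)) < 0 || fstat(in, &st) < 0)
        goto done;
    if ((tmp = mkstemp(out)) < 0)
        goto done;
    /* one up-front reservation gives the allocator a chance to hand out
     * contiguous space before any data is written */
    if (st.st_size > 0 && posix_fallocate(tmp, 0, st.st_size) != 0)
        goto done;
    for (;;) {
        ssize_t r = read(in, buf, sizeof buf);
        if (r < 0)
            goto done;
        if (r == 0)
            break;
        for (ssize_t off = 0; off < r;) {   /* handle short writes */
            ssize_t w = write(tmp, buf + off, (size_t)(r - off));
            if (w < 0)
                goto done;
            off += w;
        }
    }
    rc = fsync(tmp);
done:
    if (in >= 0)
        close(in);
    if (tmp >= 0)
        close(tmp);
    if (rc != 0 && tmp >= 0)
        unlink(out);
    return rc;
}
```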
>
> we could also add ioctls to further bias XFS's allocation strategies,
> like telling it to never allocate in some AGs (needed for an online
> shrink if someone wanted to make such a thing) or simply bias strongly
> away from some places, then add other ioctls to allow you to
> specifically allocate space in those AGs so you can bias what is
> allocated where
>
> another useful ioctl would be a variation of XFS_IOC_SWAPEXT which
> would swap only some extents. there is no internal support for this
> now, but we do have code for XFS_IOC_UNRESVSP64 and XFS_IOC_RESVSP64,
> so perhaps the idea would be to swap some (but not all) blocks of a
> file by creating a function that does the equivalent of 'punch a hole'
> where we want to replace the blocks, and then 'allocate new blocks
> given some i already have elsewhere' (however, making that all work as
> one transaction might be very, very difficult)
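To make the partial-swap idea concrete, a toy model where a file's block map is an array of physical block numbers (HOLE for unmapped). Swapping a range with a donor's mapping is the in-memory equivalent of punching a hole in the file and mapping in blocks the donor already owns; the donor is left holding the old blocks, to be freed later. It deliberately ignores the hard part called out above, doing all of it as one transaction. swap_range, count_extents and HOLE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOLE INT64_C(-1)   /* unmapped logical block */

/* Counts extents: maximal runs of physically contiguous, mapped blocks. */
static size_t count_extents(const int64_t *map, size_t n)
{
    size_t ext = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i] != HOLE &&
            (i == 0 || map[i - 1] == HOLE || map[i] != map[i - 1] + 1))
            ext++;
    return ext;
}

/* Exchanges the mappings of logical blocks [off, off + len) between a file
 * and a donor. Per block: "punch a hole" in the file, then map in the block
 * the donor already owns; the donor keeps the file's old block.
 * Not atomic: a real implementation would need one transaction. */
static void swap_range(int64_t *file_map, int64_t *donor_map, size_t n,
                       size_t off, size_t len)
{
    assert(off <= n && len <= n - off);
    for (size_t i = off; i < off + len; i++) {
        int64_t old = file_map[i];
        file_map[i] = donor_map[i];
        donor_map[i] = old;
    }
}
```

For instance, a file mapped {100, 101, 500, 501, 502, 503} has two extents; swapping logical blocks 2 to 5 with a donor preallocated at {102, 103, 104, 105} leaves the file as a single extent and the donor holding 500 to 503.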
>
> it's a lot of effort for what, for many people, would only be
> marginal gains