From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: with ECARTIS (v1.0.0; list xfs); Sun, 23 Jul 2006 18:24:53 -0700 (PDT)
Received: from orca.ele.uri.edu (orca.ele.uri.edu [131.128.51.63])
	by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with ESMTP id k6O1OZDW009928
	for ; Sun, 23 Jul 2006 18:24:37 -0700
Subject: Re: stable xfs
From: Ming Zhang
Reply-To: mingz@ele.uri.edu
In-Reply-To: <20060721180707.GB13892@tuatara.stupidest.org>
References: <20060720061527.GB18135@tuatara.stupidest.org>
	<1153404502.2768.50.camel@localhost.localdomain>
	<20060720161707.GB26748@tuatara.stupidest.org>
	<1153413481.2768.65.camel@localhost.localdomain>
	<20060720190401.GA28836@tuatara.stupidest.org>
	<1153441178.2768.158.camel@localhost.localdomain>
	<20060721032632.GA4138@tuatara.stupidest.org>
	<1153487431.2841.8.camel@localhost.localdomain>
	<20060721160709.GB12347@tuatara.stupidest.org>
	<1153501244.2841.50.camel@localhost.localdomain>
	<20060721180707.GB13892@tuatara.stupidest.org>
Content-Type: text/plain
Date: Sun, 23 Jul 2006 21:14:36 -0400
Message-Id: <1153703676.6963.42.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: xfs-bounce@oss.sgi.com
Errors-To: xfs-bounce@oss.sgi.com
List-Id: xfs
To: Chris Wedgwood
Cc: Peter Grandi, Linux XFS

On Fri, 2006-07-21 at 11:07 -0700, Chris Wedgwood wrote:
> On Fri, Jul 21, 2006 at 01:00:44PM -0400, Ming Zhang wrote:
>
> > what u mean overlay fs over small fs? like a unionfs?
>
> sorta not really, it's userspace libraries which create a virtual
> filesystem over real filesystems with some database (bezerkely db).
> it sorta evolved from an attempt to unify several filesystems spread
> over cheap PCs into something that pretended to be one larger fs

fancy word for this is NAS virtualization i guess.

>
> > but other than fsr. there is no better way for this right?
>
> not publicly, you could patch fsr or nag me for my patches if that
> helps

i will run some tests about fsr and see if i need to bug you about
patches.

>
> > of course, preallocate is always good. but i do not have control
> > over applications.
>
> well, in some cases you could use LD_PRELOAD and influence things, it
> depends on the application and what you need from it
>
> fwiw, most modern p2p applications have terrible access patterns which
> cause horrible fragmentation (on all fs's, not just XFS)
>
> > sounds like a useful patch. :P will it be merged into fsr code?
>
> no, because it's ugly and i don't think i ever decoupled it from other
> changes and posted it
>
> > what kind of assistance you mean?
>
> [WARNING: lots of hand waving ahead, plenty of minor, but important,
> details ignored]
>

read about this and feel this will be VERY hard to be built, especially
considering the transaction issue.

can this be easier?

* analyze the fs to find out which file(s) to be defrag;
* create a temp file and begin to copy, preserve the space so it is
  continuous;
* after first round of copy, for changed blocks have a trace table and a
  second round on changed blocks.
* lock and switch the old file with new file.

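a rough userspace sketch of the four steps above, in Python, just to make
the idea concrete. everything named here is an assumption for
illustration: BLOCK is an arbitrary copy granularity, the
changed_after_first_pass argument stands in for the "trace table" (nothing
here actually tracks writes), and os.replace only switches names, it is not
the extent swap that XFS_IOC_SWAPEXT does and it does not keep the inode.

```python
import os
import tempfile

BLOCK = 4096  # assumed copy granularity, not an XFS constant


def copy_blocks(src_fd, dst_fd, blocks):
    """Copy the given block indexes from src_fd to the same offsets in dst_fd."""
    for b in blocks:
        data = os.pread(src_fd, BLOCK, b * BLOCK)
        if data:
            os.pwrite(dst_fd, data, b * BLOCK)


def rewrite_file(path, changed_after_first_pass=()):
    """Two-pass copy into a preallocated temp file, then switch names.

    changed_after_first_pass is a hypothetical trace table: indexes of
    blocks that were modified while the first pass was running.
    """
    size = os.path.getsize(path)
    nblocks = (size + BLOCK - 1) // BLOCK
    directory = os.path.dirname(os.path.abspath(path))
    tmp_fd, tmp_path = tempfile.mkstemp(dir=directory)
    src_fd = os.open(path, os.O_RDONLY)
    try:
        if size:
            try:
                # Reserve the whole range up front so the allocator has a
                # chance to hand out contiguous space.
                os.posix_fallocate(tmp_fd, 0, size)
            except OSError:
                pass  # preallocation not supported here; the copy still works
        copy_blocks(src_fd, tmp_fd, range(nblocks))                    # first round
        copy_blocks(src_fd, tmp_fd, sorted(changed_after_first_pass))  # second round
        os.ftruncate(tmp_fd, size)
        os.fsync(tmp_fd)
        os.replace(tmp_path, path)  # stand-in for "lock and switch"
    except BaseException:
        os.unlink(tmp_path)
        raise
    finally:
        os.close(src_fd)
        os.close(tmp_fd)
```

the hard parts are exactly the ones the sketch skips: knowing which blocks
changed, and making the final switch safe against a writer that still has
the old file open.
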
> if you wanted much smarter defragmentation semantics, it would
> probably make sense to
>
>   * bulkstat the entire volume, this will give you the inode cluster
>     locations and enough information to start building a tree of where
>     all the files are (XFS_IOC_FSGEOMETRY details obviously)
>
>   * opendir/read to build a full directory tree
>
>   * use XFS_IOC_GETBMAP & XFS_IOC_GETBMAPA to figure out which blocks
>     are occupied by which files
>
> you would now have a pretty good idea of what is using what parts of
> the disk, except of course it could be constantly changing underneath
> you to make things harder
>
> also, doing this using the existing interfaces is (when i tried it)
> really really painfully slow if you have a large filesystem with a lot
> of small files (even when you try to optimize your accesses to
> minimize seeking by sorting by inode number and submitting several
> requests in parallel to try and help the elevator merge accesses)
>
>
> once you have some overall picture of the disk, you can decide what you
> want to move to achieve your goal, typically this would be to reduce
> the fragmentation of the largest files, and this would be
> relocating some or all of those blocks to another place
>
> if you want to allocate space in a given AG, you open/creat a
> temporary file in a directory in that AG (create multiple dirs as
> needed to ensure you have one or more of these), and preallocate the
> space --- there you can copy the file over
>
> we could also add ioctls to further bias XFS's allocation strategies,
> like telling it to never allocate in some AGs (needed for an online
> shrink if someone wanted to make such a thing) or simply bias strongly
> away from some places, then add other ioctls to allow you to
> specifically allocate space in those AGs so you can bias what is
> allocated where
>
> another useful ioctl would be a variation of XFS_IOC_SWAPEXT which
> would swap only some extents.
> there is no internal support for this
> now except we do have code for XFS_IOC_UNRESVSP64 and XFS_IOC_RESVSP64
> so perhaps the idea would be to swap some (but not all) blocks of a
> file by creating a function that does the equivalent of 'punch a hole'
> where we want to replace the blocks, and then 'allocate new blocks
> given some i already have elsewhere' (however, making that all work as
> one transaction might be very very difficult)
>
> it's a lot of effort for what for many people would only have
> marginal gains
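
for the "decide what you want to move" step, a small sketch of how the
candidates could be ranked once the block maps are in hand. this is only an
illustration under assumptions: the (file_offset, disk_block, length)
tuples are hypothetical input modelled loosely on what XFS_IOC_GETBMAP
reports, no ioctl is called here, and "largest fragmented files first" is
just the policy described in the quoted text.

```python
from typing import Dict, List, Tuple

# (file_offset, disk_block, length), all in the same block unit (assumed)
Extent = Tuple[int, int, int]


def count_fragments(extents: List[Extent]) -> int:
    """Number of physically discontiguous runs in a file's extent list."""
    runs = 0
    expected_next = None
    for _offset, start, length in sorted(extents):
        if start != expected_next:
            runs += 1  # this extent does not continue the previous one on disk
        expected_next = start + length
    return runs


def pick_candidates(files: Dict[str, List[Extent]], limit: int = 10) -> List[str]:
    """Names of fragmented files, largest first."""
    ranked = []
    for name, extents in files.items():
        fragments = count_fragments(extents)
        if fragments > 1:
            total = sum(length for _o, _s, length in extents)
            ranked.append((total, fragments, name))
    ranked.sort(reverse=True)
    return [name for _total, _fragments, name in ranked[:limit]]
```

a file whose extents happen to sit back to back on disk counts as one
fragment and is skipped, so only files that would actually benefit from
being copied are returned.
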