From: jim owens
Subject: Re: [RFC][PATCH 0/5] Fiemap, an extent mapping ioctl
Date: Tue, 27 May 2008 12:52:31 -0400
Message-ID: <483C3C4F.1090903@hp.com>
In-Reply-To: <200805270948.51898.chris.mason@oracle.com>
References: <20080525000148.GJ8325@wotan.suse.de> <20080525194203.GB24328@infradead.org> <200805270948.51898.chris.mason@oracle.com>
To: linux-fsdevel@vger.kernel.org
Cc: Chris Mason, Christoph Hellwig, Mark Fasheh, Andreas Dilger, Kalpak Shah, Eric Sandeen, Josef Bacik

For what it is worth, a few comments from a newbie who has
experience with a non-Linux filesystem that has a similar API
and supports files spread across multiple devices.

Mark Fasheh wrote:
>
> * FIEMAP_FLAG_LUN_ORDER
> If the file system stripes file data, this will return contiguous
> regions of physical allocation, sorted by LUN. Logical offsets may not
> make sense if this flag is passed. If the file system does not support
> multiple LUNs, this flag will be ignored.

This should return an error (EOPNOTSUPP ?) if the FS does not
support multiple devices OR does not support sorting by LUN order,
so the caller does not count on the info being sorted.

Even an FS that supports multiple devices per file may be
unable to sort the extents by on-disk order without consuming
an ugly set of resources.

Christoph Hellwig wrote:
>> __u32 fe_lun; /* logical device number for extent (starting at 0)*/
>
> Again this lun thing is horribly ill-defined. There is no such thing
> as a logic device number in our filesystem terminology.

I agree that LUN is confusing.
In my opinion the words "logical" and "number" are overused and
meaningless. As Brad suggested, "device" would be preferable, or
"unit", but unfortunately every word I can think of has some other
definition too :)  Our term was "volume"... an awful designation.

Chris Mason wrote:
> For btrfs I would return the logical extents via fiemap (just like the file
> were on lvm) and make btrfs specific ioctls for details about where the file
> actually lived.
>
> fiemap alone isn't a great way to describe raid levels or complex storage
> topologies. To include physical information I would also have to encode the
> raid level used and information about all the devices the data is replicated
> on (raid1/10)

fiemap by itself is useful for programs that want to determine
how fragmented a file is, or where the sparse areas are so they
can be skipped.

At least one more generic API is needed to map each device
number to a device (path name, inode, socket, ... ?). In our
case this was only used for clusters.

For the complex case you describe, it might be possible to
have an "enumerate" API that could be used to traverse each
layer for more detail. I hope this is done generically by
someone.

A final thought on this:

> __u32 fe_lun; /* logical device number for extent (starting at 0)*/

While the flags field can be used to tell the validity of this
number, we found that starting at 0 was not a good practice.
We started at 1 so that 0 always meant "not valid". One way this
can be useful: with delayed allocation, you can indicate the
"intended device" with a non-0 number. Of course another value
such as max_int could be defined as "invalid" instead.

Another point to document is whether this number is a contiguous
series (1, 2, 3, ... N) defining the location based on the current
device list, or possibly a sparse series (1, 2, 6) because the FS
tracks devices that have been removed. In our implementation
both views were present for different consumers.
The sparse series was native and the contiguous series a translation.

jim
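
To make the numbering point concrete, here is a minimal sketch of
the two views described above: a sparse native series and a
contiguous 1..N translation, with 0 reserved as "not valid".
Everything in it (DEV_ID_INVALID, struct dev_table, both function
names) is hypothetical and made up for illustration. It is not
taken from the fiemap patches or from any real filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed convention: 0 never names a device. */
#define DEV_ID_INVALID 0u

/*
 * Hypothetical device table. native_ids lists the filesystem's own
 * ids for the devices currently present, e.g. {1, 2, 6} once
 * devices 3..5 have been removed.
 */
struct dev_table {
	const uint32_t *native_ids;
	size_t count;
};

/*
 * Translate a sparse native id into its 1-based position in the
 * current device list. Returns DEV_ID_INVALID if the id is 0 or
 * the device is no longer present.
 */
static uint32_t dev_native_to_contig(const struct dev_table *t,
				     uint32_t native)
{
	assert(t != NULL);
	if (native == DEV_ID_INVALID)
		return DEV_ID_INVALID;
	for (size_t i = 0; i < t->count; i++)
		if (t->native_ids[i] == native)
			return (uint32_t)(i + 1);
	return DEV_ID_INVALID;
}

/*
 * Translate a 1-based contiguous position back to the native id.
 * Returns DEV_ID_INVALID if the position is 0 or out of range.
 */
static uint32_t dev_contig_to_native(const struct dev_table *t,
				     uint32_t contig)
{
	assert(t != NULL);
	if (contig == DEV_ID_INVALID || contig > t->count)
		return DEV_ID_INVALID;
	return t->native_ids[contig - 1];
}
```

With a table of {1, 2, 6}, native id 6 maps to contiguous
position 3, and a removed id such as 3 maps to 0, so a caller
can tell "no such device" apart from any real device without
consulting a separate flag.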