From: Colin Ngam
Date: Mon, 02 Feb 2009 08:56:15 -0600
Subject: [Lustre-devel] SAM-QFS, ADM, and Lustre HSM
In-Reply-To: <8585251D-41D7-4B42-99F9-BDBFA2CF88C1@Sun.COM>
References: <3DF0F4AF-F4D6-476E-98F7-CD912C49FC18@Sun.COM>
 <2734A30F-2C76-4725-9F3A-29AD4245B7E8@Sun.COM> <496FCA67.6000500@sun.com>
 <48D329C0-242E-4A5A-94C1-DF493BB25C2F@Sun.COM> <496FE8D4.2090908@sun.com>
 <4977647D.5010503@sun.com> <4977E5BD.7000706@sun.com>
 <4978DB1E.30507@sun.com> <497A144C.4000408@Sun.COM>
 <20090126193548.GF3652@webber.adilger.int> <497E35A3.3080603@sun.com>
 <4983998D.1030601@sun.com> <8585251D-41D7-4B42-99F9-BDBFA2CF88C1@Sun.COM>
Message-ID: <4987098F.402@Sun.COM>
To: lustre-devel@lists.lustre.org

Harriet G. Coverston wrote:

Hi,

> Nathan,
>
> On Jan 30, 2009, at 6:21 PM, Nathaniel Rutman wrote:
>
>> LEIBOVICI Thomas wrote:
>>> At CEA, we are using our own copytool that directly uses the HPSS API.
>>> It already exists and has been in production for years.
>>> I think few modifications will be needed to adapt it for Lustre-HSM
>>> purposes (basically, adding the fid <-> HSM id mapping and backup of
>>> attributes, path, stripe...).
>> So then the QFS copytool will indeed be a new tool, and should be
>> scheduled accordingly.
>> Features:
>> 1. "cp --preserve"-like functionality (include metadata attributes in
>>    cp)
>> 2. add EAs (create mini-tarball)
>> 3. implement FID hash to subdivide namespace
>> 4. periodic status reporting (via ioctl on file)
>>
>> Harriet G. Coverston wrote:
>>>> There is a mechanism to get the current full pathname for a given
>>>> fid from userspace, so an HSM-specific copytool could find it out,
>>>> but a central tenet of the design here is that as far as the HSM is
>>>> concerned, the entire Lustre FS is a flat namespace of FIDs.
>>>
>>> Be careful here. We are a file system.
>>> We don't have a limit on the number of files in one directory, but we
>>> don't recommend more than 500,000 files in a single directory or you
>>> will start to see some performance problems. You will have to create
>>> a tree, not use a flat namespace.
>> Yes, a tree based on a hash of the fid.
>> The other option is to use the actual filename for storage, but from
>> Lustre's point of view this gets extremely tricky. For example:
>> Send /foo/bar to archive. Client A opens /foo/bar. Client B renames
>> /foo/bar to /abc/xyz, but this change hasn't propagated to the
>> archive yet. Client A now tries to read its open file handle, which
>> tells Lustre to read the offline file FID 123, which it translates to
>> /abc/xyz currently, which the archive doesn't know about yet. Not
>> just xyz, but renames on any ancestor path element cause similar
>> misses. Since the FID remains constant throughout the life of a
>> file, we don't have to worry about any namespace changes (file or
>> parents). If there were an alternate way of bypassing the archive's
>> namespace to directly access a file, we could conceivably store e.g.
>> an archive-specific identifier within the Lustre stripe EA, and pass
>> this down to the copytool when reading an offline file, but this
>> presupposes that such a thing exists, is of reasonable size, has a
>> userspace method to access it, etc.
>
> Yes, we have a FID-like concept in SAM-QFS. It is called the file ID.
> It is 64 bits and consists of the inode/generation number. It is
> unique. You can store it. You can issue an ioctl to open the ID. You
> can issue an ioctl to do an ID stat, etc. It is much more efficient
> than using the filename (expensive lookup). This means if you store
> and use the ID, you can cover the rename window and still be
> guaranteed that you will get the right file. Note, we don't re-archive
> on a rename.

I believe this facility exists only on the Metadata Server node and not
on the Linux/Solaris clients. Am I correct?
Thanks.

colin

>
> I really think a replicated namespace will be much more intuitive and
> will solve restore. If you prefer to build a tar container, that is OK,
> too. The tar file can have a suffix and then you know it is tar and
> you can tar it back.
>>
>>>
>>>> You can get a full pathname if you want to for catastrophe
>>>> recovery, but Lustre itself will only speak to the HSM with FIDs.
>>>> As I said in the other email, although SAM-QFS can do name-based
>>>> policies, the "name" as far as QFS is concerned is just the FID,
>>>> so name-based policies at the copytool level are worthless.
>>>> Unless we a.) add the path/filename back to the file (EA, or use a
>>>> tarball wrapper), and b.) modify the SAM policy engine to use the
>>>> "real" path/filename instead of the FID.
>>>
>>> Currently, we don't support policy using EA (extended attributes are
>>> in 5.0). We have had lots of requests for this, especially from our
>>> digital preservation customers.
>> Ah, policy based on EAs would be the general case, yes.
> Yes, this would be a nice feature for us.
>
> - Harriet
>
> Harriet G. Coverston
> Solaris, Storage Software | Email: harriet.coverston at sun.com
> Sun Microsystems, Inc. | AT&T: 651-554-1515
> 1270 Eagan Industrial Rd., Suite 160 | Fax: 651-554-1540
> Eagan, MN 55121-1231
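
Nathaniel's "tree based on a hash of the fid" (feature 3 in his list), which answers Harriet's 500,000-files-per-directory warning, could look roughly like the following. This is a minimal Python sketch, assuming the archive stores each file under a name derived from its Lustre FID (64-bit sequence, 32-bit object id, 32-bit version). The root directory `/archive/lustre`, the two-level layout, the use of SHA-1, and the names `Fid` and `archive_path` are hypothetical choices for illustration, not part of Lustre, SAM-QFS, or any existing copytool.

```python
import hashlib
from typing import NamedTuple


class Fid(NamedTuple):
    """A Lustre FID: 64-bit sequence, 32-bit object id, 32-bit version."""
    seq: int
    oid: int
    ver: int

    def __str__(self) -> str:
        # Bracketed hex form, e.g. [0x200000400:0x1:0x0]
        return f"[0x{self.seq:x}:0x{self.oid:x}:0x{self.ver:x}]"


def archive_path(fid: Fid, root: str = "/archive/lustre", levels: int = 2) -> str:
    """Map a FID to a path in a hashed directory tree on the archive side.

    Hypothetical layout: each directory level is one byte of a SHA-1
    digest of the FID, giving 256 subdirectories per level, so files
    spread evenly instead of piling into one flat directory.
    """
    digest = hashlib.sha1(str(fid).encode("ascii")).digest()
    # One two-hex-digit directory name per level, e.g. "3f/a0".
    dirs = [f"{digest[i]:02x}" for i in range(levels)]
    # The leaf name is the FID itself, so the archive path never changes
    # when the file (or any parent directory) is renamed in Lustre.
    leaf = f"0x{fid.seq:x}:0x{fid.oid:x}:0x{fid.ver:x}"
    return "/".join([root.rstrip("/"), *dirs, leaf])


if __name__ == "__main__":
    print(archive_path(Fid(0x200000400, 0x1, 0x0)))
```

With two levels there are 65,536 leaf directories, so even one billion archived files average about 15,000 entries per directory, well under the 500,000 guideline. Because the path depends only on the FID, the rename race Nathaniel describes (client B renaming /foo/bar while client A reads the offline file) cannot send the copytool to a stale name.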