From: Malcolm Haak
Subject: Re: HSM
Date: Wed, 20 Nov 2013 22:09:03 +1000
Message-ID: <528CA65F.7030709@sgi.com>
References: <3472A07E6605974CBC9BC573F1BC02E4AE69DEBF@PLOXCHG03.cern.ch>
To: Dmitry Borodaenko, Andreas Joachim Peters
Cc: Sage Weil, ceph-devel@vger.kernel.org

It is, except that it might not be. DMAPI only works if you are the one in charge of both the HSM and the filesystem. In a DMF solution, for example, the filesystem mounted with DMAPI options is on your NFS head node, and your HSM software is installed there too.

Things get a bit odder when you look at DMAPI on clustered systems. If we are talking CephFS, you would need HSM agents on every client node.

This is also true with the Lustre solution. The Lustre clients have no idea this stuff is happening. This is how it should work: it means the software requirement for the bulk of your clients stays what it is today, a working kernel or FUSE module.

On 19/11/13 05:22, Dmitry Borodaenko wrote:
> On Tue, Nov 12, 2013 at 1:47 AM, Andreas Joachim Peters wrote:
>> I think you need to support the following functionality to support HSM (file-based, not block-based):
>>
>> 1 implement a trigger on file creation/modification/deletion
>>
>> 2 store the additional HSM identifier for recall as a file attribute
>>
>> 3 policy-based purging of file-related blocks (LRU cache etc.)
>>
>> 4 implement an optional trigger to recall a purged file and block the IO (our experience is that automatic recalls are problematic for huge installations if the aggregation window for desired recalls is short, because they create inefficient and chaotic access on tapes)
>>
>> 5 either snapshot a file before migration, take an exclusive lock, or freeze it to avoid modifications during migration (you need a unique enough identifier for a file; either inode/path + checksum or inode/path + modification time works)
>
> DMAPI seems to be the natural choice for items 1 & 4 above.
>
>> FYI: there was a paper by IBM two years ago about migration policy scanning performance:
>> http://domino.watson.ibm.com/library/CyberDig.nsf/papers/4A50C2D66A1F90F7852578E3005A2034/$File/rj10484.pdf
>
> An important omission in that paper is the exact ILM policy that was
> used to scan the file system. I strongly suspect that it was a
> catch-all policy that matches every file without examining any
> metadata. Adding conditions that check file metadata would increase
> scan time, probably by a few orders of magnitude.
>
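Items 3 and 5 from Andreas's list can be made a little more concrete with a small sketch. The Python below is illustrative only and assumes a POSIX view of the filesystem (for example a CephFS kernel or FUSE mount). `FileIdentity`, `identity_of`, `migrate`, `purge_candidates`, and the `copy_to_archive` callback are hypothetical names, not part of Ceph, DMAPI, DMF, or any HSM product. It uses inode + modification time as the "unique enough" identifier from item 5 (adding file size is an extra assumption of this sketch), and oldest-access-time-first selection as a simple stand-in for the LRU purge policy in item 3.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FileIdentity:
    """A "unique enough" identifier: device + inode + modification time.

    Size is included as a cheap extra guard (an assumption of this sketch).
    """
    dev: int
    ino: int
    mtime_ns: int
    size: int


def identity_of(path: str) -> FileIdentity:
    st = os.stat(path)
    return FileIdentity(st.st_dev, st.st_ino, st.st_mtime_ns, st.st_size)


def migrate(path: str, copy_to_archive: Callable[[str], None]) -> bool:
    """Copy a file to the archive tier via a hypothetical callback.

    Returns True only if the file's identity is unchanged after the copy,
    i.e. it was not modified mid-migration, so its blocks may be purged.
    """
    before = identity_of(path)
    copy_to_archive(path)
    after = identity_of(path)
    return before == after


def purge_candidates(paths: Iterable[str], now: float, min_idle_s: float) -> List[str]:
    """LRU-style policy: files idle for at least min_idle_s, oldest atime first."""
    idle = []
    for p in paths:
        atime = os.stat(p).st_atime  # one metadata lookup per file scanned
        if now - atime >= min_idle_s:
            idle.append((atime, p))
    return [p for _, p in sorted(idle)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old, new = os.path.join(d, "old.dat"), os.path.join(d, "new.dat")
        for p in (old, new):
            with open(p, "wb") as f:
                f.write(b"payload")
        os.utime(old, (1_000, 1_000))  # pretend: last accessed long ago
        os.utime(new, (9_000, 9_000))  # accessed recently relative to `now`
        print(purge_candidates([old, new], now=10_000, min_idle_s=5_000))

        def modifying_copy(path: str) -> None:
            with open(path, "ab") as f:  # simulates a writer racing the copy
                f.write(b"more")

        # Unmodified copy is safe to purge; a racing write is detected.
        print(migrate(old, lambda p: None), migrate(old, modifying_copy))
```

Note that `purge_candidates` stats every file it is given, which is exactly the per-file metadata cost Dmitry expects to dominate a real policy scan. Access times are also only meaningful if the mount actually updates them (a `noatime` mount does not).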