* [Lustre-devel] How store HSM metadata in MDT ?
@ 2008-07-03 11:43 Aurelien Degremont
  2008-07-03 21:10 ` Peter Braam
  0 siblings, 1 reply; 21+ messages in thread
From: Aurelien Degremont @ 2008-07-03 11:43 UTC (permalink / raw)
  To: lustre-devel

HSM MetaData

For the Lustre HSM project, we will need to store, for each file, a
list of information describing how many copies the file has in the HSM,
their HSM IDs, the copy dates, and so on. This data could easily
reach 500 bytes (I think we will need between 40 and 50 bytes per HSM
copy, and we should be able to save at least 10 copies, surely more).
The question is: where could we store this data on the MDT (in an EA?),
and how should we manage it?
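
As a rough check of the size estimate above, here is a minimal sketch
that packs one per-copy record and multiplies it out. The record layout
(a 32-byte opaque HSM ID, an 8-byte copy date, 4 bytes of flags) is an
assumption chosen only to land in the 40 to 50 byte range; it is not a
proposed on-disk format.

```python
import struct

# Hypothetical per-copy record; the field choice is an assumption.
# "<" means little-endian with no padding between fields.
#   32s : opaque HSM identifier (32 bytes)
#   Q   : copy date, seconds since the epoch (8 bytes)
#   I   : flags (4 bytes)
COPY_RECORD = struct.Struct("<32sQI")


def ea_size(copies: int) -> int:
    """Bytes needed to store `copies` records back to back."""
    return COPY_RECORD.size * copies


print(COPY_RECORD.size)  # 44
print(ea_size(10))       # 440
```

With this layout, ten copies already need 440 bytes, which is the order
of magnitude that makes EA space on ldiskfs a concern.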

We had a discussion about this with Andreas and Nathan, and it is not
very clear what the best solution is, given that:
- We must keep in mind that there are two available backends for the MDT,
ldiskfs and ZFS, and both must be supported.
- EA space is limited on ldiskfs and already used by several other
features (striping, ACLs, ...).
- Clients will need to read this data, so the RPC mechanism must be
available and large enough to handle it.

Moreover, we will store a purged data range on the OST and MDT. This could
easily fit in an EA.

What are the possible solutions here?

-- 
Aurelien Degremont
CEA


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-03 11:43 [Lustre-devel] How store HSM metadata in MDT ? Aurelien Degremont
@ 2008-07-03 21:10 ` Peter Braam
  2008-07-04 14:37   ` Aurelien Degremont
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Braam @ 2008-07-03 21:10 UTC (permalink / raw)
  To: lustre-devel

I do understand that we need HSM related metadata, but I learned more from
Rick Matthews (cc'd) who is the architect of Sun's ADM project.  Now I am
not sure I am in agreement with what has been discussed so far.

If there is more than one copy in the archive, it would be preferable if the
archive could maintain a mapping from the Lustre fid of the file to the
archived copies.  Associated with the FID of the data would then be a list
of archived copies, timestamps etc.

Can that be done in HPSS?

If not, policy related operations like purging older files etc will become
very complex and not scalable.  For example, a search to find older files in
the archive would require an e2scan operation to find the inodes and then
the objects in the archive.  If the file system was not available anymore
(for whatever reason), it is not even clear that such a purge could still
happen.

With an archive based database this can be an indexed search in the archive,
which is faster and more appropriate.
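
To illustrate what such an archive-side database could look like, here
is a minimal sketch using SQLite. The table name, column names, and the
idea of treating the FID and archive ID as opaque strings are all
assumptions for illustration; nothing here reflects an actual HPSS or
ADM schema.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a hypothetical catalog of archived copies."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS copies ("
        " fid TEXT NOT NULL,"         # Lustre FID, treated as opaque
        " archive_id TEXT NOT NULL,"  # identifier of the copy in the archive
        " copied_at INTEGER NOT NULL)"  # copy date, seconds since the epoch
    )
    # One index for "all copies of this FID", one for "copies older than X".
    db.execute("CREATE INDEX IF NOT EXISTS copies_by_fid ON copies (fid)")
    db.execute("CREATE INDEX IF NOT EXISTS copies_by_age ON copies (copied_at)")
    return db


def record_copy(db, fid, archive_id, copied_at):
    db.execute("INSERT INTO copies VALUES (?, ?, ?)", (fid, archive_id, copied_at))


def copies_older_than(db, cutoff):
    """Return (fid, archive_id) pairs copied before `cutoff`, oldest first."""
    rows = db.execute(
        "SELECT fid, archive_id FROM copies WHERE copied_at < ? ORDER BY copied_at",
        (cutoff,),
    )
    return rows.fetchall()
```

A purge policy would then call something like copies_older_than() against
the catalog, without touching the Lustre file system at all.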

Clearly this has a major impact on how much attribute space we need.

Thoughts?

Peter






* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-03 21:10 ` Peter Braam
@ 2008-07-04 14:37   ` Aurelien Degremont
  2008-07-05 16:50     ` Andreas Dilger
  2008-07-06  3:24     ` Peter Braam
  0 siblings, 2 replies; 21+ messages in thread
From: Aurelien Degremont @ 2008-07-04 14:37 UTC (permalink / raw)
  To: lustre-devel

Peter Braam wrote:
> If there is more than one copy in the archive, it would be preferable if the
> archive could maintain a mapping from the Lustre fid of the file to the
> archived copies.  Associated with the FID of the data would then be a list
> of archived copies, timestamps etc.

Do you mean that the HSM will be aware of the various versions of the same
file, identified in Lustre by a FID?
Or will this be masked by the archiving tool, doing some tricks to
simulate it?

> Can that be done in HPSS?

HPSS alone cannot currently do versioning on its files.


> If not, policy related operations like purging older files etc will become
> very complex and not scalable.  For example, a search to find older files in
> the archive would require an e2scan operation to find the inodes and then
> the objects in the archive.  If the file system was not available anymore
> (for whatever reason), it is not even clear that such a purge could still
> happen.
> 
> With an archive based database this can be an indexed search in the archive,
> which is faster and more appropriate.

By purging, do you mean purging in Lustre or in the HSM?
There's no issue with purging in Lustre because this does not involve the HSM.
And removal of the oldest copies in the HSM could be done asynchronously,
slowly.

I'm not sure I see what you mean here.


-- 
Aurelien Degremont
CEA


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-04 14:37   ` Aurelien Degremont
@ 2008-07-05 16:50     ` Andreas Dilger
  2008-07-06  3:20       ` Peter Braam
  2008-07-06  3:24     ` Peter Braam
  1 sibling, 1 reply; 21+ messages in thread
From: Andreas Dilger @ 2008-07-05 16:50 UTC (permalink / raw)
  To: lustre-devel

On Jul 04, 2008  16:37 +0200, Aurelien Degremont wrote:
> Peter Braam wrote:
> > If there is more than one copy in the archive, it would be preferable if the
> > archive could maintain a mapping from the Lustre fid of the file to the
> > archived copies.  Associated with the FID of the data would then be a list
> > of archived copies, timestamps etc.
> 
> Do you mean that the HSM will be aware of the various versions of the same
> file, identified in Lustre by a FID?
> Or will this be masked by the archiving tool, doing some tricks to
> simulate it?
> 
> > Can that be done in HPSS?
> 
> HPSS alone cannot currently do versioning on its files.

When HPSS acts as both backup and HSM, is it still dependent on an external
space/backup manager to track all of the files for the filesystem, or does
it have a space manager built into it?

> > If not, policy related operations like purging older files etc will become
> > very complex and not scalable.  For example, a search to find older files in
> > the archive would require an e2scan operation to find the inodes and then
> > the objects in the archive.  If the file system was not available anymore
> > (for whatever reason), it is not even clear that such a purge could still
> > happen.
> > 
> > With an archive based database this can be an indexed search in the archive,
> > which is faster and more appropriate.
> 
> By purging, do you mean purging in Lustre or in the HSM?

Purging old backups of the file in the offline storage (it isn't quite
right to call this the HSM at this point, because there are multiple
backup copies of the file, not strictly a hierarchy).

> There's no issue with purging in Lustre because this does not involve the HSM.
> And removal of the oldest copies in the HSM could be done asynchronously,
> slowly.

What manages removal of the older copies in HPSS?  If HPSS can purge older
files based on policy (leaving at least the most recent copy always), then
it would be possible to defer the backup policy to HPSS and Lustre would
only ever need to reference a single offline file.  Any queries for listing
older versions of the file would be passed on from Lustre to HPSS in that
case.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-05 16:50     ` Andreas Dilger
@ 2008-07-06  3:20       ` Peter Braam
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Braam @ 2008-07-06  3:20 UTC (permalink / raw)
  To: lustre-devel




On 7/5/08 10:50 AM, "Andreas Dilger" <adilger@sun.com> wrote:

> What manages removal of the older copies in HPSS?  If HPSS can purge older
> files based on policy (leaving at least the most recent copy always), then
> it would be possible to defer the backup policy to HPSS and Lustre would
> only ever need to reference a single offline file.  Any queries for listing
> older versions of the file would be passed on from Lustre to HPSS in that
> case.
> 

The point is that there is a proposal here to have multiple pointers PER
INODE - that is not a good idea.

Regardless of what policies are in place and where they are managed, they
should not affect every inode with a new pointer for every backup object of
that inode.

Peter





* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-04 14:37   ` Aurelien Degremont
  2008-07-05 16:50     ` Andreas Dilger
@ 2008-07-06  3:24     ` Peter Braam
  2008-07-06 19:24       ` Lee Ward
  2008-07-11 14:31       ` Jacques-Charles Lafoucriere
  1 sibling, 2 replies; 21+ messages in thread
From: Peter Braam @ 2008-07-06  3:24 UTC (permalink / raw)
  To: lustre-devel




On 7/4/08 8:37 AM, "Aurelien Degremont" <aurelien.degremont@cea.fr> wrote:

> Peter Braam wrote:
>> If there is more than one copy in the archive, it would be preferable if the
>> archive could maintain a mapping from the Lustre fid of the file to the
>> archived copies.  Associated with the FID of the data would then be a list
>> of archived copies, timestamps etc.
> 
> Do you mean that the HSM will be aware of the various versions of the same
> file, identified in Lustre by a FID?
> Or will this be masked by the archiving tool, doing some tricks to
> simulate it?
> 
>> Can that be done in HPSS?
> 
> HPSS alone cannot currently do versioning on its files.

But your archiving utility that copies from Lustre to HPSS can maintain
a database of these objects - no need to store anything in Lustre.


> 
> 
>> If not, policy related operations like purging older files etc will become
>> very complex and not scalable.  For example, a search to find older files in
>> the archive would require an e2scan operation to find the inodes and then
>> the objects in the archive.  If the file system was not available anymore
>> (for whatever reason), it is not even clear that such a purge could still
>> happen.
>> 
>> With an archive based database this can be an indexed search in the archive,
>> which is faster and more appropriate.
> 
> By purging, do you mean purging in Lustre or in the HSM?

The HSM.

> There's no issue with purging in Lustre because this does not involve the HSM.
> And removal of the oldest copies in the HSM could be done asynchronously,
> slowly.

There is a rule in Lustre - no scanning, ever.  This rule will not be broken
by HSM.  

So, you have to move your management of the IDs of the archived copies outside
of Lustre, into some database.  This will actually save you time - doing this
in the MDS will be no fun.

The MDS should only get attributes to indicate if and what version of a file
is in the archive and a cursor (maybe other information) in relation with
ongoing restores.
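
As a sketch of how small such a per-file attribute could be, the record
below holds only a flags word, the archived version, and a restore
cursor. The field names, widths, and the FLAG_ARCHIVED bit are
assumptions for illustration, not a Lustre format.

```python
import struct
from typing import NamedTuple

# Hypothetical fixed-size layout: flags (4 bytes), archived version
# (8 bytes), restore cursor as a byte offset (8 bytes); no padding.
HSM_ATTR = struct.Struct("<IQQ")
FLAG_ARCHIVED = 0x1  # hypothetical bit: a copy exists in the archive


class HsmAttr(NamedTuple):
    flags: int
    archived_version: int
    restore_cursor: int

    def pack(self) -> bytes:
        """Serialize to the fixed-size byte layout."""
        return HSM_ATTR.pack(*self)

    @classmethod
    def unpack(cls, raw: bytes) -> "HsmAttr":
        """Rebuild the record from its byte layout."""
        return cls(*HSM_ATTR.unpack(raw))


attr = HsmAttr(FLAG_ARCHIVED, archived_version=7, restore_cursor=4096)
print(len(attr.pack()))  # 20
```

At 20 bytes per file, this is far below the roughly 500 bytes estimated
for a full list of copies, whatever the number of copies in the archive.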

Peter


> 
> I'm not sure I see what you mean here
> 


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-06  3:24     ` Peter Braam
@ 2008-07-06 19:24       ` Lee Ward
  2008-07-06 22:53         ` Peter Braam
  2008-07-08  8:52         ` Aurelien Degremont
  2008-07-11 14:31       ` Jacques-Charles Lafoucriere
  1 sibling, 2 replies; 21+ messages in thread
From: Lee Ward @ 2008-07-06 19:24 UTC (permalink / raw)
  To: lustre-devel

Are you all talking about HSM, really, or simply backup?

If backup, read no further.

If HSM, then, do you intend that the user be allowed to specify *which*
version of the file content is desired?

If yes and you also want the standard API and utilities to function,
seamlessly, then the version must be exposed in the name space, no? I.e.
For any file named "foo" with 3 versions, for instance, there would be
foo;1, foo;2, foo;3, and "foo" which is an alias for "foo;1".

If no, then, you'll have to craft a special API that will motivate
special tools. However, HPSS already has this API and set of tools so
what's the point? Wouldn't it be better to just modify HPSS to
understand versions?

If HSM, then, do you intend that two users might be allowed to work with
two, or more, versions of the file content simultaneously?

If yes then same problem as above since those two versions might need to
be in the same directory, at the same time, right?

No matter what you do, you have problems that can't be resolved when
mixing a POSIX name space with file versions, I believe. Since POSIX
reserves no characters you can't pick a scheme that includes version
information in the name without at least being confusing and the API
provides no other way to specify the version, no?
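
The ambiguity can be shown with a small sketch. The "base;N" convention
and the split_version() helper are hypothetical; the point is only that
";" is a legal character in a file name, so a name-based version scheme
cannot tell an ordinary file called "foo;1" from version 1 of "foo".

```python
import os
import tempfile


def split_version(name: str):
    """Naive parse of 'base;N' into (base, N); (name, None) if no suffix."""
    base, sep, suffix = name.rpartition(";")
    if sep and base and suffix.isdecimal():
        return base, int(suffix)
    return name, None


with tempfile.TemporaryDirectory() as d:
    # An ordinary file whose literal name happens to end in ';1'.
    path = os.path.join(d, "foo;1")
    with open(path, "w") as f:
        f.write("not a version, just a name\n")
    created = os.listdir(d)

print(created)                  # ['foo;1']
print(split_version("foo;1"))   # ('foo', 1)
```

The file system accepts the name as-is, while the naive parser reads it
as version 1 of "foo".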

My personal choice would be to shy off direct version support by the
native file system. It doesn't seem to have a reasonable solution
without involving the user somehow to specify names or naming schemes.
That kind of involvement just begs for a special utility and, once
there, relieves the file system of the need to support any but the most
recent version itself, anyway.

		--Lee



* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-06 19:24       ` Lee Ward
@ 2008-07-06 22:53         ` Peter Braam
  2008-07-08 12:06           ` Rick Matthews
  2008-07-08  8:52         ` Aurelien Degremont
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Braam @ 2008-07-06 22:53 UTC (permalink / raw)
  To: lustre-devel

Lee - Thank you for this clear explanation.

If solely the HSM can store multiple versions, we already have some
difficulties.  One might imagine setting a particular version in the HSM as
the primary one, meaning that this primary one will be transparently
restored or that a pre-staging utility will select this by default.

If the file is fully absent from the file system, staging or restoring it
will work correctly.  However, if a part of the file remains in the file
system, this HSM versioning becomes complicated because the file will again
have to remember what HSM versions the fragments belong to, and we are
almost back where we were.

I think the emails so far make it clear that we don't want to have one
Lustre inode be associated with multiple objects in the HSM.

If the HSM system is used as a backup then the restore operations will have
user or operator involvement and this objection to storing multiple versions
in the HSM does not apply. However, we still don't want to store a
pointer to each version in the file system; that belongs in the HSM/backup
metadata store.

However, I don't want to end the discussion right here.

With DMU (or otherwise) we will get file systems where snapshots become
possible and common, and these snapshots will contain different versions of
the same file.  The way the namespace distinguishes these is that in the
pair (fsid, fid) the fsid is different for each snapshot.   So probably the
id in the HSM should allow for an fsid component.
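
A minimal sketch of that idea: the archive keys its entries on the pair
(fsid, fid) rather than on the FID alone. The ArchiveKey type, the
integer fsid, and the string FID are assumptions for illustration.

```python
from typing import Dict, NamedTuple


class ArchiveKey(NamedTuple):
    fsid: int  # identifies the file system or snapshot (hypothetical integer form)
    fid: str   # Lustre FID, treated as opaque


def record_copy(catalog: Dict[ArchiveKey, str], fsid: int, fid: str,
                archive_id: str) -> None:
    """Register the archive object holding this (fsid, fid) version."""
    catalog[ArchiveKey(fsid, fid)] = archive_id


catalog: Dict[ArchiveKey, str] = {}
record_copy(catalog, 1, "fid-A", "obj-live")  # file in the live file system
record_copy(catalog, 2, "fid-A", "obj-snap")  # same FID seen through a snapshot
print(len(catalog))  # 2
```

The same FID under two different fsids yields two distinct archive
entries, so snapshot versions do not overwrite each other.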

Now DMU snapshot versions of one inode share blocks, and this leads to the
question if/how we can efficiently share blocks in the HSM also.  This
discussion would probably equally apply to upcoming "dedup" efforts for the
DMU, which the virtualization and "email attachment" community think are
very important.

Rick, Jeff  - how will we handle this?

Peter







* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-06 19:24       ` Lee Ward
  2008-07-06 22:53         ` Peter Braam
@ 2008-07-08  8:52         ` Aurelien Degremont
  2008-07-08 17:41           ` Peter Braam
  1 sibling, 1 reply; 21+ messages in thread
From: Aurelien Degremont @ 2008-07-08  8:52 UTC (permalink / raw)
  To: lustre-devel

Lee Ward wrote:
> If HSM, then, do you intend that the user be allowed to specify *which*
> version of the file content is desired?

The user could say:
   "overwrite the current version of this file with this older copy,
which was made some time ago."

- The current file content is lost.
- That is the only way to access the content of older copies.

There are no namespace tricks and no huge API changes; there is always one
version of a file in Lustre, with just a few functions added to the 'lfs'
command.

The purpose is simply to use the HSM infrastructure to add a few
features for people asking us for backup functionality, but this will not
be a true backup system. That kind of utility requires much more
development.


-- 
Aurelien Degremont
CEA


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-06 22:53         ` Peter Braam
@ 2008-07-08 12:06           ` Rick Matthews
  0 siblings, 0 replies; 21+ messages in thread
From: Rick Matthews @ 2008-07-08 12:06 UTC (permalink / raw)
  To: lustre-devel

Peter and Lee,
  Lee, you are correct in pointing out that versioning a file through its
backup copies is a backup-style function. It is one that is desirable to
users of backup and some HSM products, but it is still primarily driven by
the "coincidence" that older copies remain and that reference to them may
be desired. (In most instances, these references are used to either
"restore this copy to that directory" or "restore this directory tree to
its prior state".) So, while primarily a backup function, it is one that
may be important in the future if an HSM is the basis for backup copies.

  HSM as the basis of backup copies is a desirable trait IMHO. The HSM is
already retaining an instance of the file, one which could easily be
captured as a "backup" copy. That said, HSM and snapshots seem to bring a
better mix, particularly for the user. A snapshot of the file system
presents a consistent view, and the backup would only need to include data
(metadata) from files previously resident in the HSM.

  As for HSM and deduplication, I see deduplication as an optimization
tradeoff against consumed space. On relatively expensive random access
media (like disk), deduplication provides a reduced total data footprint
while not affecting the retrieval rate significantly. When the media is
sequentially oriented and relatively less expensive (like tape),
deduplication does not seem to make as much sense. So, I see deduplication
as important for disk based archive copies, and not all that useful in
tape archiving. Of course, tape striping is important, but is still a
sequential store/retrieve. Also, if it is convenient to deduplicate full
sequential images of a file (while not violating a number-of-copies
policy), that should be done on the sequentially oriented media. There may
also be some policy (sequential affinity) reasons where even full-image
deduplication is not desirable.
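
To make the tradeoff concrete, here is a minimal sketch of block-level
deduplication by content hash. The 4-byte block size, the in-memory pool,
and the function names are assumptions chosen to keep the example
readable; real systems use much larger blocks and persistent stores.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def store(data: bytes, pool: dict) -> list:
    """Split data into blocks, keep one copy of each distinct block in
    pool, and return the list of digests needed to rebuild data."""
    recipe = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        pool.setdefault(digest, block)  # stored only if not already present
        recipe.append(digest)
    return recipe


def restore(recipe: list, pool: dict) -> bytes:
    """Rebuild a file by looking up each block; one lookup per block."""
    return b"".join(pool[d] for d in recipe)


pool = {}
v1 = store(b"AAAABBBBCCCC", pool)
v2 = store(b"AAAABBBBDDDD", pool)
print(len(pool))  # 4 distinct blocks stored instead of 6
```

The space saving comes at the cost of one lookup per block on restore,
which is cheap on random access media and is what makes the approach
less attractive on a sequential medium such as tape.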

  Thank you for letting me participate in this discussion.
--
Rick

Peter Braam wrote:
> Lee - Thank you for this clear explanation.
>
> If solely the HSM can store multiple versions, we have already some
> difficulties.  One might imagine setting a particular version in the HSM as
> the primary one, meaning that this primary one will be transparently
> restored or that a pre-staging utility will select this by default.
>
> If the file is fully absent in the file system staging or restoring it will
> work correctly.  However, if a part of the file remains in the file system,
> this HSM versioning becomes complicated because the file will again have to
> remember what HSM versions the fragments belong to, and we are almost back
> where we were.
>
> I think the emails so far make it clear that we don't want to have one
> Lustre inode be associated with multiple objects in the HSM.
>
> If the HSM system is used as a backup then the restore operations will have
> user or operator involvement and this objection to storing multiple versions
> in the HSM does not apply. However, the we still don't' want to store a
> pointer to each version in the file system, that belongs in the HSM/backup
> metadata store.
>
> However, I don't want to end the discussion right here.
>
> With DMU (or otherwise) we will get file systems where snapshots become
> possible and common, and these snapshots will contain different versions of
> the same file.  The way the namespace distinguishes these is that in the
> pair (fsid, fid) the fsid is different for each snapshot.   So probably the
> id in the HSM should allow for an fsid component.
>
> Now DMU snapshot versions of one inode share blocks, and this leads to the
> question if/how we can efficiently share blocks in the HSM also.  This
> discussion would probably equally apply to upcoming "dedup" efforts for the
> DMU, which the virtualization and "email attachment" community think are
> very important.
>
> Rick, Jeff  - how will we handle this?
>
> Peter
>
>
>
>
>
> On 7/6/08 1:24 PM, "Lee Ward" <lee@sandia.gov> wrote:
>
>   
>> Are you all talking about HSM, really, or simply backup?
>>
>> If backup, read no further.
>>
>> If HSM, then, do you intend that the user be allowed to specify *which*
>> version of the file content is desired?
>>
>> If yes and you also want the standard API and utilities to function,
>> seamlessly, then the version must be exposed in the name space, no? I.e.
>> For any file named "foo" with 3 versions, for instance, there would be
>> foo;1, foo;2, foo;3, and "foo" which is an alias for "foo;1".
>>
>> If no, then, you'll have to craft a special API that will motivate
>> special tools. However, HPSS already has this API and set of tools so
>> what's the point? Wouldn't it be better to just modify HPSS to
>> understand versions?
>>
>> If HSM, then, do you intend that two users might be allowed to work with
>> two, or more, versions of the file content simultaneously?
>>
>> If yes then same problem as above since those two versions might need to
>> be in the same directory, at the same time, right?
>>
>> No matter what you do, you have problems that can't be resolved when
>> mixing a POSIX name space with file versions, I believe. Since POSIX
>> reserves no characters you can't pick a scheme that includes version
>> information in the name without at least being confusing and the API
>> provides no other way to specify the version, no?
>>
>> My personal choice would be to shy off direct version support by the
>> native file system. It doesn't seem to have a reasonable solution
>> without involving the user somehow to specify names or naming schemes.
>> That kind of involvement just begs for a special utility and, once
>> there, relieves the file system of the need to support any but the most
>> recent version itself, anyway.
>>
>> --Lee
>>
>> On Sat, 2008-07-05 at 21:24 -0600, Peter Braam wrote:
>>     
>>> On 7/4/08 8:37 AM, "Aurelien Degremont" <aurelien.degremont@cea.fr> wrote:
>>>
>>>       
>>>> Peter Braam a ?crit :
>>>>         
>>>>> If there is more than one copy in the archive, it would be preferable if
>>>>> the
>>>>> archive could maintain a mapping from the Lustre fid of the file to the
>>>>> archived copies.  Associated with the FID of the data would then be a list
>>>>> of archived copies, timestamps etc.
>>>>>           
>>>> Do you mean that the HSM will be aware of various versions of one same
>>>> file, identified in Lustre by a FID ?
>>>> Or this will be masked by the archiving tool , doing some tricks to
>>>> simulate it ?
>>>>
>>>>         
>>>>> Can that be done in HPSS?
>>>>>           
>>>> HPSS alone cannot do versioning on its files presently.
>>>>         
>>> But your archiving utility that copies from Lustre to HPSS can maintain
>>> database of these objects - no need to store anything in Lustre.
>>>
>>>
>>>       
>>>>         
>>>>> If not, policy related operations like purging older files etc will become
>>>>> very complex and not scalable.  For example, a search to find older files
>>>>> in
>>>>> the archive would require an e2scan operation to find the inodes and then
>>>>> the objects in the archive.  If the file system was not available anymore
>>>>> (for whatever reason), it is not even clear that such a purge could still
>>>>> happen.
>>>>>
>>>>> With an archive based database this can be an indexed search in the
>>>>> archive,
>>>>> which is faster and more appropriate.
>>>>>           
>>>> By purging, do you mean purging in Lustre or in the HSM?
>>>>         
>>> The HSM.
>>>
>>>       
>>>> There's no issue with purging in Lustre because this does not involve the HSM.
>>>> And removal of the oldest copies in the HSM could be done asynchronously,
>>>> slowly.
>>>>         
>>> There is a rule in Lustre - no scanning, ever.  This rule will not be broken
>>> by HSM.
>>>
>>> So, you have to move your management of IDs of the archived copies outside
>>> of Lustre, in some database.  This will actually save you time - doing this
>>> in the MDS will be no fun.
>>>
>>> The MDS should only get attributes to indicate if and what version of a file
>>> is in the archive and a cursor (maybe other information) in relation with
>>> ongoing restores.
>>>
>>> Peter
>>>
>>>
>>>       
>>>> I'm not sure I see what you mean here
>>>>
>>>>         


-- 
---------------------------------------------------------------------
Rick Matthews                           email: Rick.Matthews at sun.com
Sun Microsystems, Inc.                  phone:+1(651) 554-1518
1270 Eagan Industrial Road              phone(internal): 54418
Suite 160                               fax:  +1(651) 554-1540
Eagan, MN 55121-1231 USA                main: +1(651) 554-1500		
---------------------------------------------------------------------


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-08  8:52         ` Aurelien Degremont
@ 2008-07-08 17:41           ` Peter Braam
  2008-07-09 13:25             ` Aurelien Degremont
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Braam @ 2008-07-08 17:41 UTC (permalink / raw)
  To: lustre-devel

I think we have come to the following conclusions:

1. The HSM or a database associated with it implements a table to map FIDs
to stored HSM versions of a file, with other metadata it may need to
maintain its archives.

2. An HSM utility can query and learn about the versions stored for a fid
(or file name).  A "restore" function can copy any version out of the HSM
and place it in the file system.  This is similar to restoring a file from a
backup archive.

3. The file system only has attributes to indicate the state of the primary
archived copy (probably the last fully archived copy of the file), and can
retrieve that file on demand (without user intervention).

4. The HSM database will allow files in snapshots to be encoded with (fsid,
fid) or something similar.

5. For now we ignore block-level dedup in the HSM.
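
A minimal sketch of the table from points 1, 2 and 4, assuming an SQLite
database kept alongside the HSM. The table name hsm_copy, its columns and the
helper functions are hypothetical and chosen only for illustration; nothing
here is part of Lustre or of any HSM API.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the (hypothetical) archive index keyed by (fsid, fid, version)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hsm_copy ("
        " fsid TEXT NOT NULL,"         # file system (or snapshot) id
        " fid TEXT NOT NULL,"          # Lustre FID of the file
        " version INTEGER NOT NULL,"   # 1 = oldest archived copy
        " hsm_id TEXT NOT NULL,"       # identifier of the copy inside the HSM
        " archived INTEGER NOT NULL,"  # copy date, seconds since the epoch
        " PRIMARY KEY (fsid, fid, version))"
    )
    return db

def record_copy(db, fsid, fid, hsm_id, archived):
    """Add a new version for (fsid, fid) and return its version number."""
    row = db.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM hsm_copy"
        " WHERE fsid = ? AND fid = ?",
        (fsid, fid),
    ).fetchone()
    version = row[0]
    db.execute(
        "INSERT INTO hsm_copy VALUES (?, ?, ?, ?, ?)",
        (fsid, fid, version, hsm_id, archived),
    )
    db.commit()
    return version

def list_versions(db, fsid, fid):
    """Return (version, hsm_id, archived) tuples, oldest first."""
    return db.execute(
        "SELECT version, hsm_id, archived FROM hsm_copy"
        " WHERE fsid = ? AND fid = ? ORDER BY version",
        (fsid, fid),
    ).fetchall()
```

A "restore" utility would call list_versions() to let the user pick a version;
the file system itself never needs to read this table.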

Can the owner of the HLD make updates?  Please also read on - I have some
more questions below.


On 7/8/08 2:52 AM, "Aurelien Degremont" <aurelien.degremont@cea.fr> wrote:

> Lee Ward wrote:
>> If HSM, then, do you intend that the user be allowed to specify *which*
>> version of the file content is desired?
> 
> A user could say:
>    "overwrite the current version of this file with this older copy,
> which was made some time ago."
> 
> -The current file content is lost.
> -That is the only way to access the older copies content.

Yes, that is reasonable.
 
> There are no namespace tricks, no huge API changes, always one version of
> a file in Lustre, just a few functions added to the 'lfs' command.

NO - this will not be an lfs command.  This is an HSM command.

> The purpose is simply to use the HSM infrastructure to add a few
> features for people asking us for backup capabilities, but this will not
> be a true backup system. That kind of utility requires much more
> development.

I think it would be good to review one more time the following aspects of
the design:

1. how is a bare metal restore arranged (ie. How is metadata moved into the
HSM)?  Can this restore put files in a file system different than Lustre?

2. how are small files grouped then "tar'd up" and how are we setting the
attributes of the inodes of the files that have been placed in the HSM after
this?  How does the index entry for the fids in the HSM database function?

3. how are multiple coordinators and agents utilized to distribute load so
that the HSM can keep up with massive small file creation?

For all of these we have seen sketchy answers in the past, let's dig in and
make sure that we have this right.

Regards,

Peter


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-08 17:41           ` Peter Braam
@ 2008-07-09 13:25             ` Aurelien Degremont
  2008-07-09 13:49               ` Peter Braam
  0 siblings, 1 reply; 21+ messages in thread
From: Aurelien Degremont @ 2008-07-09 13:25 UTC (permalink / raw)
  To: lustre-devel

Peter Braam wrote:
> 1. The HSM or a database associated with it implements a table to map FIDs
> to stored HSM versions of a file, with other metadata it may need to
> maintain its archives.

Ok

> 2. An HSM utility can query and learn about the versions stored for a fid
> (or file name).  A "restore" function can copy any version out of the HSM
> and place it in the file system.  This is similar to restoring a file from a
> backup archive.

Ok, that's copy-in.

> 3. The file system only has attributes to indicate the state of the primary
> archived copy (probably the last fully archived copy of the file), and can
> retrieve that file on demand (without user intervention).

OK. We still need to store the purge window on the MDT and OST to raise cache
misses.
How will Lustre update this information if a user can run an HSM command
directly, bypassing Lustre? The user could change the file copies present in
the HSM without Lustre knowing it.

> 4. The HSM database will allow files in snapshots to be encoded with (fsid,
> fid) or something similar.

Can we consider that there is always a default snapshot?
Will the ID always be built from FSID+FID? Or should we consider a
special case when snapshotting is not enabled?

>> There are no namespace tricks, no huge API changes, always one version of
>> a file in Lustre, just a few functions added to the 'lfs' command.
> 
> NO - this will not be an lfs command.  This is an HSM command.

Could you present a use case showing how a user would explicitly make backups
and restore an older copy using the HSM command and no Lustre component?

To do this, the client nodes would have to communicate with the HSM
infrastructure, using specific network protocols, and so on.
You would need to set up your Lustre network and then your HSM network
even if the HSM only needs to talk to the Lustre agent.

> 1. how is a bare metal restore arranged (ie. How is metadata moved into the
> HSM)?  Can this restore put files in a file system different than Lustre?

Until now, the metadata was stored inside Lustre, so this was not
needed. Now, we must add a way for the archiving tool to "setattr" this
data when restoring a file.

As for a different filesystem, this will depend on the features the
archiving tool uses to copy back the data and metadata. If those are
standard, the file could be put in a different filesystem.

> 2. how are small files grouped then "tar'd up" and how are we setting the
> attributes of the inodes of the files that have been placed in the HSM after
> this?  How does the index entry for the fids in the HSM database function?

Presently, only the archiving tool was supposed to support such a feature,
to avoid having to recode the tools later (various tools will be needed for
the various existing HSMs and their development won't be centralized)
when we add this kind of feature.
There is presently no defined mechanism for grouping files into the HSM.

> 3. how are multiple coordinators and agents utilized to distribute load so
> that the HSM can keep up with massive small file creation?

One coordinator per MDT.
The coordinator deals only with the files of its MDT.
The coordinator dispatches its requests to the agents in
round-robin order. Agents can refuse requests if they cannot handle them (too
busy). The coordinator then tries another one. If no agent is available, it
postpones the request.
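
The dispatch scheme described above can be sketched as follows. This is only
an illustration: the Agent and Coordinator classes, the capacity limit used to
model a "too busy" agent, and the postponed list are all assumptions, not
existing Lustre components.

```python
from collections import deque

class Agent:
    """Hypothetical agent that accepts at most `capacity` requests."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.accepted = []

    def offer(self, request):
        """Return True if the request is accepted, False if the agent is too busy."""
        if len(self.accepted) >= self.capacity:
            return False
        self.accepted.append(request)
        return True

class Coordinator:
    """Hypothetical coordinator dispatching requests round-robin."""
    def __init__(self, agents):
        self.agents = deque(agents)  # rotation order for round-robin
        self.postponed = []          # requests that no agent could take

    def dispatch(self, request):
        """Offer the request to each agent once; postpone it if all refuse."""
        for _ in range(len(self.agents)):
            agent = self.agents[0]
            self.agents.rotate(-1)   # the next offer goes to the following agent
            if agent.offer(request):
                return agent.name
        self.postponed.append(request)
        return None
```

A postponed request would be retried later, for example when an agent reports
that it has finished a copy.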


-- 
Aurelien Degremont
CEA


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-09 13:25             ` Aurelien Degremont
@ 2008-07-09 13:49               ` Peter Braam
  2008-07-11 14:32                 ` Jacques-Charles Lafoucriere
  2008-07-11 14:37                 ` Jacques-Charles Lafoucriere
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Braam @ 2008-07-09 13:49 UTC (permalink / raw)
  To: lustre-devel




On 7/9/08 7:25 AM, "Aurelien Degremont" <aurelien.degremont@cea.fr> wrote:
> 
>> 3. The file system only has attributes to indicate the state of the primary
>> archived copy (probably the last fully archived copy of the file), and can
>> retrieve that file on demand (without user intervention).
> 
> OK. We still need to store the purge window on the MDT and OST to raise cache
> misses.

Yes.

> How will Lustre update this information if a user can run an HSM command
> directly, bypassing Lustre? The user could change the file copies present in
> the HSM without Lustre knowing it.

NO - we said that the only operation we do is placing an entire file into
Lustre.


> 
>> 4. The HSM database will allow files in snapshots to be encoded with (fsid,
>> fid) or something similar.
> 
> Can we consider that there is always a default snapshot?
> Will the ID always be built from FSID+FID? Or should we consider a
> special case when snapshotting is not enabled?

Why would you?  You need to make sure that the index field is large enough.
Almost all our customers have more than one file system anyway, regardless
of snapshots.

> 
>>> There are no namespace tricks, no huge API changes, always one version of
>>> a file in Lustre, just a few functions added to the 'lfs' command.
>> 
>> NO - this will not be an lfs command.  This is an HSM command.
> 
> Could you present a use case showing how a user would explicitly make backups
> and restore an older copy using the HSM command and no Lustre component?

Hsm_copy_to_fs  <FID>   /mnt/lustre/braams_lost_file
 
> To do this, the client nodes would have to communicate with the HSM
> infrastructure, using specific network protocols, and so on.
> You would need to set up your Lustre network and then your HSM network
> even if the HSM only needs to talk to the Lustre agent.

The utility for restore is not essentially different from what the agent
invokes as a mover.

> 
>> 1. how is a bare metal restore arranged (ie. How is metadata moved into the
>> HSM)?  Can this restore put files in a file system different than Lustre?
> 
> Until now, the metadata was stored inside Lustre, so this was not
> needed. Now, we must add a way for the archiving tool to "setattr" this
> data when restoring a file.
> 
> As for a different filesystem, this will depend on the features the
> archiving tool uses to copy back the data and metadata. If those are
> standard, the file could be put in a different filesystem.

Hmm.  This description has no content.  If you don't want to do this, say
so, or describe the entire process in detail.

> 
>> 2. how are small files grouped then "tar'd up" and how are we setting the
>> attributes of the inodes of the files that have been placed in the HSM after
>> this?  How does the index entry for the fids in the HSM database function?
> 
> Presently, just the archiving tool

HOW?

> was supposed to support such a feature,
> to avoid having to recode the tools later (various tools will be needed for
> the various existing HSMs and their development won't be centralized)
> when we add this kind of feature.
> There is presently no defined mechanism for grouping files into the HSM.

You need to describe this in detail - so far you are just repeating my
questions pretending they are answers.  What events are generated for small
files?  How are they grouped into something that is "tarred up"?  What
happens to all the individual inodes when the tarball hits the HSM?
 
>> 3. how are multiple coordinators and agents utilized to distribute load so
>> that the HSM can keep up with massive small file creation?
> 
> One coordinator per MDT.

No - these must be independent considerations.  A coordinator may be much
slower than an MDS node in handling a single file.  I say this because this
has been the experience in the industry so far - with small files the HSM
can not at all keep up.

> The coordinator deals only with the files of its MDT.
> The coordinator dispatches its requests to the agents in
> round-robin order.

No, I think a more sophisticated policy is needed.  Eg. Small files to these
agents, big files to others.
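
A size-based policy of this kind might look like the sketch below. The 1 MiB
threshold, the pool names "small" and "large", and the function names are
assumptions made for the example; the thread does not define them.

```python
SMALL_FILE_LIMIT = 1 << 20  # 1 MiB; an arbitrary threshold chosen for the example

def pick_pool(size_bytes, limit=SMALL_FILE_LIMIT):
    """Return the name of the agent pool that should handle a file of this size."""
    return "small" if size_bytes < limit else "large"

def route(requests, pools):
    """Assign each (fid, size) request to an agent, round-robin within its pool."""
    counters = {name: 0 for name in pools}
    plan = []
    for fid, size in requests:
        pool = pick_pool(size)
        agents = pools[pool]
        agent = agents[counters[pool] % len(agents)]
        counters[pool] += 1
        plan.append((fid, agent))
    return plan
```

Keeping the pool choice in one small function makes it easy to swap in a
richer policy later without touching the dispatch loop.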


> Agents can refuse requests if they cannot handle them (too
> busy).

NO NO NO.

> The coordinator then tries another one. If no agent is available, it
> postpones the request.
> 

Please take time to respond with details.

Thanks.

Peter


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-06  3:24     ` Peter Braam
  2008-07-06 19:24       ` Lee Ward
@ 2008-07-11 14:31       ` Jacques-Charles Lafoucriere
  2008-07-11 21:57         ` Peter Braam
  1 sibling, 1 reply; 21+ messages in thread
From: Jacques-Charles Lafoucriere @ 2008-07-11 14:31 UTC (permalink / raw)
  To: lustre-devel



Peter Braam wrote:
>
> There is a rule in Lustre - no scanning, ever.  This rule will not be broken
> by HSM.  
>   
If a site wants to connect an HSM to an existing file system (one that is
close to full),
how do we manage without fast scanning?
We cannot restrict HSM binding to new file systems only.

JC

* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-09 13:49               ` Peter Braam
@ 2008-07-11 14:32                 ` Jacques-Charles Lafoucriere
  2008-07-11 22:03                   ` Peter Braam
  2008-07-11 14:37                 ` Jacques-Charles Lafoucriere
  1 sibling, 1 reply; 21+ messages in thread
From: Jacques-Charles Lafoucriere @ 2008-07-11 14:32 UTC (permalink / raw)
  To: lustre-devel



Peter Braam wrote:
>>> 3. how are multiple coordinators and agents utilized to distribute load so
>>> that the HSM can keep up with massive small file creation?
>>>       
>> One coordinator per MDT.
>>     
>
> No - these must be independent considerations.  A coordinator may be much
> slower than an MDS node in handling a single file.  I say this because this
> has been the experience in the industry so far - with small files the HSM
> can not at all keep up.
>   

Why should a coordinator be on a slower node than an MDS?
The coordinator is a Lustre service like other Lustre services, so it will
run on appropriate hardware.

Do you mean a coordinator is not part of the Lustre cluster?

JC

* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-09 13:49               ` Peter Braam
  2008-07-11 14:32                 ` Jacques-Charles Lafoucriere
@ 2008-07-11 14:37                 ` Jacques-Charles Lafoucriere
  2008-07-11 22:12                   ` Peter Braam
  1 sibling, 1 reply; 21+ messages in thread
From: Jacques-Charles Lafoucriere @ 2008-07-11 14:37 UTC (permalink / raw)
  To: lustre-devel

Hello

Following the latest discussions, I understand a large change is coming to
Lustre/HSM interactions.
In the HLD, the HSM follows Lustre requests:
- Lustre triggers copy-out and copy-in
- all the copy requests are made under the coordinator's control (so the
Hsm_copy_to_fs command is a command line interface to the coordinator).
Note that this Hsm_copy_to_fs is different from the copy tool.
The central role of the coordinator allows us to control all the requests
and avoid duplicate requests to the copy tool (and gives a global view).

Now it seems Lustre will have to be able to follow HSM requests to send
files back into Lustre, independently of the coordinator (in a
previous email it was requested that Hsm_copy_to_fs trigger a copy
independently of Lustre).
I do not agree with this change because the HSM has to be seen as backend
storage for Lustre, and the decisions to copy have to be made in Lustre.
Lustre must not be subjected to the HSM but must use it.

To manage this new requirement, the copy tool has to implement a central
entity that will:
- avoid duplicate requests
- choose which agent has to make the copy
This will have to be duplicated for each supported HSM (or backend) and
will also duplicate the coordinator's role, so I think it is better to have it
in Lustre instead of in the copy tool.

About file grouping, the planned features are:
- for copy-out: in one request, a list of files can be provided to the copy
tool so it can choose to group them in one HSM "group archive".
- for copy-in: if a file is in an HSM "group archive", the copy tool will
copy back only this file into Lustre (and not the whole archive file)
- grouped requests can come from a user request or from the space manager
(for a generic policy)

The space manager design is on standby today because of the lack of
information on changelogs, feeds, and the Lustre policy engine.

I think there is a very strong need in Lustre for a generic policy
database that can be used to allocate files, copy out files, purge
files, choose which agent will copy files, and so on.
One use case for this database is to give users an interface to say:
I want all my *.avi files to be striped on 6 OSTs and all other files
to be not striped.
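
The *.avi use case could be expressed as an ordered rule list, as in the
sketch below. The rule format and the stripe_count() helper are hypothetical;
this is not an existing Lustre interface, only one way such a policy database
might be queried.

```python
from fnmatch import fnmatchcase

# Ordered rules: the first matching pattern wins. Values are stripe counts.
RULES = [
    ("*.avi", 6),  # stripe video files over 6 OSTs
    ("*", 1),      # everything else: a single stripe (not striped)
]

def stripe_count(filename, rules=RULES):
    """Return the stripe count of the first rule whose pattern matches."""
    for pattern, count in rules:
        if fnmatchcase(filename, pattern):
            return count
    raise LookupError(f"no rule matches {filename!r}")
```

The same first-match lookup could return other attributes (a purge delay, an
agent pool) if the rule values were records instead of plain integers.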

JC


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-11 14:31       ` Jacques-Charles Lafoucriere
@ 2008-07-11 21:57         ` Peter Braam
  2008-07-16 10:26           ` Jacques-Charles Lafoucriere
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Braam @ 2008-07-11 21:57 UTC (permalink / raw)
  To: lustre-devel

A one time event to build a database or log is acceptable.

However, the proposal so far would require frequent scans because the
metadata would be in the wrong place, that is what I want to avoid.

Maintaining that metadata is very simple, and can solve some problems that
you cannot easily solve without maintaining a database in conjunction with
the HSM (such as cleaning partial, aborted copies into the HSM).  You'd
simply ask the moving script to contact a database and make a record.  The
database may not be able to keep up with all file creations in the FS but it
can probably keep up with the activities that one moving node does to/from
the HSM.
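
As a rough sketch of "contact a database and make a record", assuming SQLite
and a hypothetical copy_log table: the mover writes a row when a copy starts
and updates it when the copy completes, so rows left in the 'started' state
identify partial, aborted copies. All names below are illustrative.

```python
import sqlite3

def open_log(path=":memory:"):
    """Create the (hypothetical) log of copies made by one moving node."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS copy_log ("
        " fid TEXT NOT NULL, hsm_id TEXT NOT NULL,"
        " state TEXT NOT NULL, started INTEGER NOT NULL,"
        " PRIMARY KEY (fid, hsm_id))"
    )
    return db

def begin_copy(db, fid, hsm_id, now):
    """Record that a copy into the HSM has started."""
    db.execute(
        "INSERT INTO copy_log VALUES (?, ?, 'started', ?)", (fid, hsm_id, now)
    )
    db.commit()

def finish_copy(db, fid, hsm_id):
    """Mark the copy as complete once the mover has written all the data."""
    db.execute(
        "UPDATE copy_log SET state = 'done' WHERE fid = ? AND hsm_id = ?",
        (fid, hsm_id),
    )
    db.commit()

def stale_copies(db, now, max_age):
    """Copies still 'started' after max_age seconds: candidates for cleanup."""
    return db.execute(
        "SELECT fid, hsm_id FROM copy_log"
        " WHERE state = 'started' AND started <= ? ORDER BY started",
        (now - max_age,),
    ).fetchall()
```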

Peter


On 7/11/08 8:31 AM, "Jacques-Charles Lafoucriere" <jc.lafoucriere@cea.fr>
wrote:

> 
> 
> Peter Braam wrote:
>>  
>> 
>> There is a rule in Lustre - no scanning, ever.  This rule will not be broken
>> by HSM.  
>>   
> If a site wants to connect an HSM to an existing file system (one that is
> close to full),
> how do we manage without fast scanning?
> We cannot restrict HSM binding to new file systems only.
> 
> JC
> 


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-11 14:32                 ` Jacques-Charles Lafoucriere
@ 2008-07-11 22:03                   ` Peter Braam
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Braam @ 2008-07-11 22:03 UTC (permalink / raw)
  To: lustre-devel




On 7/11/08 8:32 AM, "Jacques-Charles Lafoucriere" <jc.lafoucriere@cea.fr>
wrote:

> 
> 
> Peter Braam wrote:
>>  
>>>  
>>>>  
>>>> 3. how are multiple coordinators and agents utilized to distribute load so
>>>> that the HSM can keep up with massive small file creation?
>>>>       
>>>>  
>>>  
>>> One coordinator per MDT.
>>>     
>>>  
>>  
>> 
>> No - these must be independent considerations.  A coordinator may be much
>> slower than an MDS node in handling a single file.  I say this because this
>> has been the experience in the industry so far - with small files the HSM
>> can not at all keep up.
>>   
> 
> Why should a coordinator be on a slower node than an MDS?
> The coordinator is a Lustre service like other Lustre services, so it will
> run on appropriate hardware.
> 
I see no reason whatsoever to couple them to MDTs.  Keeping them de-coupled
is more flexible and can scale better.  Am I missing issues here?

As for performance, consider the following: a coordinator may have a lot of
work to do to track which files still need to be handled by agents and which
are in progress already, and it may have a fair amount of interaction with
agents (not with HSM to tape but when a coordinator is handling a re-striping
migration it will). But its interaction with the MDTs is very limited.  So we
could degrade the performance of an MDS by placing this on the same node.
Let's keep this flexible please.

Peter
> 
> 
> Do you mean a coordinator is not part of the Lustre cluster ?
> 
> JC
> 


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-11 14:37                 ` Jacques-Charles Lafoucriere
@ 2008-07-11 22:12                   ` Peter Braam
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Braam @ 2008-07-11 22:12 UTC (permalink / raw)
  To: lustre-devel




On 7/11/08 8:37 AM, "Jacques-Charles Lafoucriere" <jc.lafoucriere@cea.fr>
wrote:

> Hello
> 
> Following the latest discussions, I understand a large change is coming to
> Lustre/HSM interactions.
> In the HLD, the HSM follows Lustre requests:
> - Lustre triggers copy-out and copy-in

We can also feed a list to the coordinator to pre-stage ("primary versions").

> - all the copy requests are made under the coordinator's control (so the
> Hsm_copy_to_fs command is a command line interface to the coordinator).

No.  I think we need to design this, but it will simply create a new file in
the file system; that doesn't require a coordinator.

> Note that this Hsm_copy_to_fs is different from the copy tool.
> The central role of the coordinator allows us to control all the requests
> and avoid duplicate requests to the copy tool (and gives a global view).

The coordinator will do copy-in from the kernel, automatically triggered by
a cache miss or by feeding it a list to restore HSM primary copies.  The
hsm_copy_to_fs tool is ONLY needed when secondary copies held in the HSM are
being restored.


> 
> Now it seems Lustre will have to be able to follow HSM requests to send
> files back into Lustre, independently of the coordinator (in a
> previous email it was requested that Hsm_copy_to_fs trigger a copy
> independently of Lustre).
> I do not agree with this change because the HSM has to be seen as backend
> storage for Lustre, and the decisions to copy have to be made in Lustre.
> Lustre must not be subjected to the HSM but must use it.

No - I think you perhaps misunderstood the proposal.


> 
> To manage this new requirement, the copy tool has to implement a central
> entity that will:
> - avoid duplicate requests
> - choose which agent has to make the copy
> This will have to be duplicated for each supported HSM (or backend) and
> will also duplicate the coordinator's role, so I think it is better to have it
> in Lustre instead of in the copy tool.

No.

> 
> About file grouping, the planned features are:
> - for copy-out: in one request, a list of files can be provided to the copy
> tool so it can choose to group them in one HSM "group archive".

This is again just rephrasing my question.  HOW is a list of small files
formed?

> - for copy-in: if a file is in an HSM "group archive", the copy tool will
> copy back only this file into Lustre (and not the whole archive file)

No, because with your proposal restoring 1000 small files will cause 1000
tape actions to get the archive.  I think the entire archive should come
back in one blow.
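
To illustrate the difference, the sketch below groups restore requests by the
group archive that holds each file, so that each archive is fetched once. The
plan_restores() helper and the archive_of mapping (FID to archive name) are
assumptions for the example; the thread does not say where that mapping lives.

```python
from collections import defaultdict

def plan_restores(requested_fids, archive_of):
    """Group requested FIDs by the group archive that holds them."""
    by_archive = defaultdict(list)
    for fid in requested_fids:
        by_archive[archive_of[fid]].append(fid)
    return dict(by_archive)
```

With one fetch per key of the returned dictionary, the number of tape actions
is the number of distinct archives rather than the number of files.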

> - grouped requests can come from a user request or from the space manager
> (for a generic policy)

Yes, how is the metadata handled?  This is the case where the HSM DB does
see significant load to make a mapping for each lustre fid to the archived
file.

> 
> The space manager design is on standby today because of the lack of
> information on changelogs, feeds, and the Lustre policy engine.

We will get there shortly.

> 
> I think there is a very strong need in Lustre for a generic policy
> database that can be used to allocate files, copy out files, purge
> files, choose which agent will copy files, and so on.

Policy database yes, but NOT a database with HSM related data.

There is a secondary side of policy which is how to treat the data that is
held in the HSM and that doesn't belong in Lustre, but belongs in user
space.  

Yet, this should be two parts of one policy management interface.  If Lustre
runs with Sun HSM it would be one tool, if Lustre runs with HPSS the tool
would have two sides - one to Lustre from Sun and one to HPSS from the HPSS
community.

> One use case for this database is to give users an interface to say:
> I want all my *.avi files to be striped on 6 OSTs and all other files
> to be not striped.

That is not an HSM policy, this should be a pool data placement policy and I
agree we need it.  The HSM should have sufficient metadata to restore files
in this manner if bare metal restore takes place.

The good news is that I see no serious disagreements, just some minor
misunderstandings.  Agree?

Happy quatorze juillet!!

Regards

Peter


> 
> JC
> 
> 


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-11 21:57         ` Peter Braam
@ 2008-07-16 10:26           ` Jacques-Charles Lafoucriere
  2008-07-16 19:00             ` Peter Braam
  0 siblings, 1 reply; 21+ messages in thread
From: Jacques-Charles Lafoucriere @ 2008-07-16 10:26 UTC (permalink / raw)
  To: lustre-devel

Space Manager needs are:
1) generate a candidate list for copy out (pre-migration)
2) generate a candidate list for purge

For 1) the criterion is: not up to date in HSM and not recently modified
For 2) the criterion is: up to date in HSM and not recently accessed
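
The two criteria can be written down as simple filters, as in the sketch
below. The FileState record, the archived_mtime field used to decide whether
the HSM copy is up to date, and the idle threshold are assumptions made for
the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileState:
    fid: str
    mtime: float                     # last modification, seconds since the epoch
    atime: float                     # last access, seconds since the epoch
    archived_mtime: Optional[float]  # mtime of the HSM copy, None if no copy

def up_to_date(f):
    """True if the HSM holds a copy at least as recent as the file."""
    return f.archived_mtime is not None and f.archived_mtime >= f.mtime

def copy_out_candidates(files, now, idle):
    """1) not up to date in HSM and not modified in the last `idle` seconds."""
    return [f.fid for f in files if not up_to_date(f) and now - f.mtime >= idle]

def purge_candidates(files, now, idle):
    """2) up to date in HSM and not accessed in the last `idle` seconds."""
    return [f.fid for f in files if up_to_date(f) and now - f.atime >= idle]
```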

The changelog events needed are "modifications" such as:
- file creation
- mtime change
- atime change

The things I do not like about the event mode are:
- if a file is created, filled and removed before copy-out (like a
temporary file), we will have useless interaction with the space manager
(and useless load)
- if for some reason events are missed, we will have files in Lustre that
are unknown to the HSM. To resolve this, we can use a scan or find a way to
guarantee we will never miss an event.
This last point is a strong constraint because Lustre should be able to
operate with a dead space manager.

I agree, I am not fond of scanning, but a low-priority background scan
will solve these 2 issues.

For me, the space manager and its DB are common to all HSMs and will have
no HSM-specific information.
HSM-specific rules (such as which HSM internal storage class I will put
a file in) will be managed by the HSM copy tool.
Do you agree?

JC


* [Lustre-devel] How store HSM metadata in MDT ?
  2008-07-16 10:26           ` Jacques-Charles Lafoucriere
@ 2008-07-16 19:00             ` Peter Braam
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Braam @ 2008-07-16 19:00 UTC (permalink / raw)
  To: lustre-devel

This continues the Lustre design discussion for HSM.

On 7/16/08 4:26 AM, "Jacques-Charles Lafoucriere" <jc.lafoucriere@cea.fr>
wrote:

> Space Manager needs are:
> 1) generate a candidate list for copy out (pre-migration)
> 2) generate a candidate list for purge
> 
> For 1) the criterion is: not up to date in HSM and not recently modified
> For 2) the criterion is: up to date in HSM and not recently accessed
> 
> The changelog events needed are "modifications" such as:
> - file creation
> - mtime change
> - atime change

1) the files are in the log (and in ZFS the log can be reconstructed through
a fast search)

The issue here that makes me worried is the following.  Is the coordinator
managing "archiving" or is the space manager?

Whatever entity does it, it needs to WAIT until a file is quiescent for some
time.  ADM's event manager can do that, but how do we do it with HPSS?

Now interestingly the Size On MDS (SOM) project does almost precisely this,
it monitors a file going idle and transfers size from OSS's / clients to the
MDS inode.  So Lustre is pretty close, but this completes too quickly;
archiving is commonly postponed 20 minutes or so.
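
The "wait until quiescent" step could be sketched as below, assuming the
entity doing the archiving receives modify and unlink events. The
QuiescenceTracker class and its method names are hypothetical; only the
20 minute delay comes from the text above.

```python
ARCHIVE_DELAY = 20 * 60  # seconds; the "20 minutes or so" mentioned above

class QuiescenceTracker:
    """Tracks the last modification seen per FID and reports idle files."""
    def __init__(self, delay=ARCHIVE_DELAY):
        self.delay = delay
        self.last_change = {}

    def on_modify(self, fid, now):
        self.last_change[fid] = now      # any new write restarts the wait

    def on_unlink(self, fid):
        self.last_change.pop(fid, None)  # removed files are never archived

    def ready(self, now):
        """Return and forget the FIDs that have been idle for at least `delay`."""
        idle = sorted(
            fid for fid, t in self.last_change.items() if now - t >= self.delay
        )
        for fid in idle:
            del self.last_change[fid]
        return idle
```

Dropping unlinked files from the tracker also covers the temporary-file case:
a file removed before the delay expires never produces an archive request.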


> 
> The things I do not like about the event mode are:
> - if a file is created, filled and removed before copy-out (like a
> temporary file), we will have useless interaction with the space manager
> (and useless load)

A stat call to the file is quick and required anyway to eliminate race
conditions.

> - if for some reason events are missed, we will have files in Lustre that
> are unknown to the HSM. To resolve this, we can use a scan or find a way to
> guarantee we will never miss an event.

Lustre logs and ZFS searches are guaranteed NOT to miss anything.  No finds
are necessary.

> This last point is a strong constraint because Lustre should be able to
> operate with a dead space manager.
> 
> I agree, I am not fond of scanning, but a low-priority background scan
> will solve these 2 issues.

We only need the scan for 2), and as indicated earlier it can be a rare
scan.

I will not accept scanning for 1).

> 
> For me, the space manager and its DB are common to all HSMs and will have
> no HSM-specific information.

And they will be a major bottleneck.   I definitely want to avoid a DB.

It is fair to state that all events belong with Lustre.  Lustre should
define adequate high performance features for distributed storage of events.

The logs or ZFS searches are a good example; similarly, file sets or
collecting small files might be good examples.

A key consideration for the design is that by integrating it into Lustre we
control the performance of these event management systems much better than
through upcalls and databases.  Keeping two systems in sync has proven to be
the problem when archiving small files: current HSMs can archive only a
ridiculously small number of small files (hundreds), and we need MILLIONS.
So our architecture has to plan a major improvement here.  The database
should go away.

> HSM-specific rules (such as which HSM internal storage class I will put
> a file in) will be managed by the HSM copy tool.
> Do you agree?

Yes. 

Peter
> 
> JC
> 
> 


Thread overview: 21+ messages
2008-07-03 11:43 [Lustre-devel] How store HSM metadata in MDT ? Aurelien Degremont
2008-07-03 21:10 ` Peter Braam
2008-07-04 14:37   ` Aurelien Degremont
2008-07-05 16:50     ` Andreas Dilger
2008-07-06  3:20       ` Peter Braam
2008-07-06  3:24     ` Peter Braam
2008-07-06 19:24       ` Lee Ward
2008-07-06 22:53         ` Peter Braam
2008-07-08 12:06           ` Rick Matthews
2008-07-08  8:52         ` Aurelien Degremont
2008-07-08 17:41           ` Peter Braam
2008-07-09 13:25             ` Aurelien Degremont
2008-07-09 13:49               ` Peter Braam
2008-07-11 14:32                 ` Jacques-Charles Lafoucriere
2008-07-11 22:03                   ` Peter Braam
2008-07-11 14:37                 ` Jacques-Charles Lafoucriere
2008-07-11 22:12                   ` Peter Braam
2008-07-11 14:31       ` Jacques-Charles Lafoucriere
2008-07-11 21:57         ` Peter Braam
2008-07-16 10:26           ` Jacques-Charles Lafoucriere
2008-07-16 19:00             ` Peter Braam
