* [Lustre-devel] Summary of our HSM discussion
@ 2008-08-04 18:06 Peter Braam
  2008-08-12 18:40 ` Kevan
       [not found] ` <48B85E2D.8000109@sun.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Braam @ 2008-08-04 18:06 UTC (permalink / raw)
  To: lustre-devel

We spoke about the HSM plans some 10 days ago.  I think that the conclusions
are roughly as follows:

1. It is desirable to reach a first implementation as soon as possible.
2. Some design puzzles remain to ensure that HSM can keep up with Lustre
metadata clusters.

The steps to reach a first implementation can be summarized as:

1. Include file closes in the changelog, if the file was opened for write.
Include timestamps in the changelog entries.  This allows the changelog
processor to see files that have become inactive and pass them on for
archiving. 
2. Build an open call that blocks for file retrieval and adapts timeouts to
avoid error returns.
3. Until a least-recently-used log is built, use the e2scan utility to
generate lists of candidates for purging.
4. Translate events and scan results into a form that can be understood
by ADM.
5. Work with a single coordinator, whose role it is to avoid getting
multiple "close" records for the same file (a basic filter for events).
6. Do not use initiators - these can come later and assist with load
balancing and freeing space on demand (both of which we can ignore for the
first release).
7. Do not use multiple agents - the agents can move stripes of files etc.,
and this is not needed with a basic user-level solution based on consuming
the log.  The only thing the agent must do in release one is get the
attention of a data mover to restore files on demand.
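
A minimal sketch of how steps 1 and 5 might fit together, in Python.  The
record layout (ChangelogRecord), the event name "CLOSE" and the idle
threshold are assumptions for illustration; none of them comes from the
Lustre changelog format.  It keeps only the latest close-after-write per
file, which doubles as the basic coordinator filter, and yields files that
have been inactive long enough to pass on for archiving.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class ChangelogRecord:
    """Hypothetical record shape; real changelog entries will differ."""
    fid: str                       # file identifier
    event: str                     # e.g. "CLOSE", "CREATE", "RENAME"
    timestamp: float               # seconds since the epoch
    opened_for_write: bool = False


def archive_candidates(records: Iterable[ChangelogRecord],
                       now: float,
                       idle_seconds: float) -> Iterator[str]:
    """Yield each FID once if its latest close-after-write is old enough.

    Many close records for the same file collapse into one candidate,
    which is the basic event filter described for the coordinator.
    """
    last_close = {}
    for rec in records:
        if rec.event == "CLOSE" and rec.opened_for_write:
            # Keep only the most recent close per file.
            prev = last_close.get(rec.fid, float("-inf"))
            last_close[rec.fid] = max(prev, rec.timestamp)
    for fid, closed_at in last_close.items():
        if now - closed_at >= idle_seconds:
            yield fid
```

Deriving candidates from the latest close rather than from every close is
what keeps a file that is written repeatedly from being queued repeatedly.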

Peter


* [Lustre-devel] Summary of our HSM discussion
  2008-08-04 18:06 [Lustre-devel] Summary of our HSM discussion Peter Braam
@ 2008-08-12 18:40 ` Kevan
  2008-08-14 16:38   ` Peter Braam
       [not found] ` <48B85E2D.8000109@sun.com>
  1 sibling, 1 reply; 6+ messages in thread
From: Kevan @ 2008-08-12 18:40 UTC (permalink / raw)
  To: lustre-devel

Peter,

Apologies, but I am having some difficulty determining where the boundary
between Lustre and the HSM lies within the first release.  Plus I have
a few newbie Lustre questions.

I think your #4 is saying that Lustre will still provide a Space Manager in
the first release, responsible for monitoring filesystem fullness, using
e2scan to pick archive/purge candidates and issuing archive/purge requests
to the HSM, and that these tasks are not performed by the HSM itself.  True?
Is the Space Manager logic part of the Coordinator, or is it a separate
entity?  Is there one Coordinator/Space Manager pair per filesystem or one
total per site?

Users will need commands that allow them to archive and recall their files
and/or directory subtrees, and they will want commands like ls and find
that show them the current HSM archive/purge state of their files so that
they can pre-stage purged files before they are needed, and so that they
can purge unneeded files to effectively manage their own quotas.  Will these
commands be provided by Lustre, or by the HSM?

Given that files are only recalled on open, this implies that a file which
is open for either read or write by any client can never be purged, correct?
And a file open for write by any client should never be archived since it
could be silently changing while the archive is in progress.  And if
a file is opened for write after an archive has begun, the HSM will
be sent a cancel request?  Is the necessary information available to the
Space Manager and/or Coordinator so that these rules can be enforced?

The HSM data mover needs to be able to open a file by FID without
encountering the adaptive timeout that other users are seeing.  The data
mover's I/Os must not change the file's read and write timestamps.  The
data mover needs a get_extents(int fd) function to read the file's extent
map so that it can find the location of holes in sparse files and preserve
those holes within its HSM copies.  Is there an interface available that
provides this functionality?
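
For illustration only: on a local Linux filesystem, a get_extents-style
helper can be approximated with lseek and SEEK_DATA/SEEK_HOLE.  This is a
generic sketch in Python, not the Lustre interface asked about above; the
function name data_extents is made up here.

```python
import errno
import os
from typing import List, Tuple


def data_extents(fd: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each data region of an open file.

    Uses lseek with SEEK_DATA/SEEK_HOLE.  On filesystems that do not
    report holes, the whole file comes back as a single extent.
    Note: this moves the file offset of fd.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after offset
                break
            raise
        # There is always an implicit hole at end of file.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end - start))
        offset = end
    return extents
```

A data mover could copy only the returned ranges and seek over the gaps to
keep its copy sparse.  On Linux, opening with os.O_NOATIME avoids
access-time updates on read for files the caller owns; that is not a
substitute for the invisible I/O discussed in this thread.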

In the FID HLD I find mention of an object version field within the FID,
which apparently gets incremented with each modification of the file.
Is that currently implemented in Lustre?  I'm thinking of the case where
a file is archived, recalled, modified, archived, recalled, modified...
The HSM will need a way to map the correct HSM copies to the correct version
of the file, so hopefully the version field is already supported.

Does Lustre already support snapshot capabilities, or will it in the future?
When a snapshot is made, each archived/purged file within the snapshot
effectively creates another reference to its copies within the HSM database.
An HSM file copy cannot be removed until it is known that no references
remain to that particular version of the file, either within the live
filesystem or within any snapshot.  Will the Coordinator be able to see the
snapshot references, and avoid sending delete requests to the HSM until all
snapshot references for a particular file version have been removed?
Are snapshots read-write or read-only?  If read-only, how do you intend to
have users access purged files in snapshots?

I haven't been able to figure out how backup/restore works, or will work,
in Lustre.  Standard utilities like tar will wreak havoc by triggering
file recall storms within the HSM.  Better is an intelligent backup package
which understands that the HSM already has multiple copies of the file data,
and so the backup program only needs to back up the metadata.  The problem
here again is that new references to the HSM copies are being created, yet
those references are not visible to the HSM; it doesn't know they exist.  So
methods are needed to ensure that seemingly obsolete HSM copies are not
deleted before the backups that reference them have also been deleted.
If you could provide a short description of how you intend backup/restore to
work in combination with an HSM, or if you could provide pointers, that would
be great.

Regards, Kevan


* [Lustre-devel] Summary of our HSM discussion
  2008-08-12 18:40 ` Kevan
@ 2008-08-14 16:38   ` Peter Braam
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Braam @ 2008-08-14 16:38 UTC (permalink / raw)
  To: lustre-devel

There are rather a lot of questions here - let's give this a go.


On 8/12/08 12:40 PM, "Kevan" <kfr@sgi.com> wrote:

> Peter,
> 
> Apologies, but I am having some difficulty determining where the boundary
> between Lustre and the HSM lies within the first release.  Plus I have
> a few newbie Lustre questions.
> 
> I think your #4 is saying that Lustre will still provide a Space Manager in
> the first release, responsible for monitoring filesystem fullness, using
> e2scan to pick archive/purge candidates and issuing archive/purge requests
> to the HSM, and that these tasks are not performed by the HSM itself.

The list will be generated by e2scan initially, in due course by a more
efficient and scalable LRU log (see HLD).

The list will be digested and acted upon by the HSM policy manager.

> True?
> Is the Space Manager logic part of the Coordinator, or is it a
> separate entity?

separate

> Is there one Coordinator/Space Manager pair per filesystem or one
> total per site?

List generation will probably be per server target (per MDT or OST tbd).
Rick Matthews can tell us if the policy manager manages sites or file
systems.


> 
> Users will need commands that allow them to archive and recall their files
> and/or directory subtrees, and they will want commands like ls and find
> that show them the current HSM archive/purge state of their files so that
> they can pre-stage purged files before they are needed, and so that they
> can purge unneeded files to effectively manage their own quotas.  Will these
> commands be provided by Lustre, or by the HSM?

These will be commands issued to Lustre, extensions of the "lfs" commands.

> 
> Given that files are only recalled on open, this implies that a file which
> is open for either read or write by any client can never be purged, correct?

yes

> And a file open for write by any client should never be archived since it
> could be silently changing while the archive is in progress.

If the HSM is used for backup, one probably wants to back up the file anyway,
and this is a decision of the policy manager.

> And if a file is opened for write after an archive has begun, the HSM
> will be sent a cancel request?

The file system will generate events; the policy manager can decide how to
act on them.


> Is the necessary information available to the
> Space Manager and/or Coordinator so that these rules can be enforced?
> 
> The HSM data mover needs to be able to open a file by FID without
> encountering the adaptive timeout that other users are seeing.  The data
> mover's I/Os must not change the file's read and write timestamps.  The
> data mover needs a get_extents(int fd) function to read the file's extent
> map so that it can find the location of holes in sparse files and preserve
> those holes within its HSM copies.  Is there an interface available that
> provides this functionality?


Planned in detail.  See HLD.

> 
> In the FID HLD I find mention of an object version field within the FID,
> which apparently gets incremented with each modification of the file.
> Is that currently implemented in Lustre?

Yes.

> I'm thinking of the case where
> a file is archived, recalled, modified, archived, recalled, modified...
> The HSM will need a way to map the correct HSM copies to the correct version
> of the file, so hopefully the version field is already supported.

Only one version of a file is present in the file system.  The version is
merely a unique indicator that a file has changed.


> 
> Does Lustre already support snapshot capabilities, or will it in the future?
> When a snapshot is made, each archived/purged file within the snapshot
> effectively creates another reference to its copies within the HSM database.
> An HSM file copy cannot be removed until it is known that no references
> remain to that particular version of the file, either within the live
> filesystem or within any snapshot.  Will the Coordinator be able to see the
> snapshot references, and avoid sending delete requests to the HSM until all
> snapshot references for a particular file version have been removed?
> Are snapshots read-write or read-only?  If read-only, how do you intend to
> have users access purged files in snapshots?

TBD.  The key issue with snapshots is where multiple files in snapshots have
shared blocks.  Dedup in ZFS brings similar issues.

> 
> I haven't been able to figure out how backup/restore works, or will work,
> in Lustre.  Standard utilities like tar will wreak havoc by triggering
> file recall storms within the HSM.  Better is an intelligent backup package
> which understands that the HSM already has multiple copies of the file data,
> and so the backup program only needs to back up the metadata.  The problem
> here again is that new references to the HSM copies are being created, yet
> those references are not visible to the HSM; it doesn't know they exist.  So
> methods are needed to ensure that seemingly obsolete HSM copies are not
> deleted before the backups that reference them have also been deleted.
> If you could provide a short description of how you intend backup/restore to
> work in combination with an HSM, or if you could provide pointers, that would
> be great.

The HSM should have a metadata database to implement "tape side" (as opposed
to file system side) policy. That database might hold all metadata and
manage references.
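
One way to picture the reference management described above is a catalog
that tracks who still points at each archived copy.  The sketch below is
hypothetical (the class name, the (fid, version) key and the holder labels
are invented for illustration) and says nothing about how ADM or any real
HSM stores its metadata.

```python
from collections import defaultdict


class CopyCatalog:
    """Hypothetical 'tape side' catalog keyed by (fid, version)."""

    def __init__(self):
        # (fid, version) -> set of holders, e.g. "live", "snapshot:a",
        # "backup:42".  A copy may be deleted only when the set is empty.
        self._refs = defaultdict(set)

    def add_reference(self, fid, version, holder):
        self._refs[(fid, version)].add(holder)

    def drop_reference(self, fid, version, holder):
        """Drop one holder; return True if the copy may now be deleted."""
        key = (fid, version)
        self._refs[key].discard(holder)
        if not self._refs[key]:
            del self._refs[key]
            return True
        return False

    def holders(self, fid, version):
        """Return the current holders without creating an entry."""
        return frozenset(self._refs.get((fid, version), ()))
```

With this shape, a snapshot or a metadata-only backup registers itself as a
holder, so an unlink in the live filesystem alone never frees the copy.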

Examples of such policies are compliance policies (e.g. delete files from
this year) and backup policies (e.g. retain this or that set of files).  I
expect that, as in future file systems, a new concept of fileset is required
to be very flexible in what policies are applied to.

Rick ...


Peter


> Regards, Kevan

* [Lustre-devel] Summary of our HSM discussion
       [not found]   ` <48B86F3C.9080909@Sun.COM>
@ 2008-08-29 22:42     ` Nathaniel Rutman
  2008-09-03 20:58       ` Alex Kulyavtsev
  0 siblings, 1 reply; 6+ messages in thread
From: Nathaniel Rutman @ 2008-08-29 22:42 UTC (permalink / raw)
  To: lustre-devel

Rick Matthews wrote:
> On 08/29/08 15:38, Nathaniel Rutman wrote:
>> Rick - I'm finally getting a chance to look at the ADM docs.
>> Most notably as far as I'm concerned, it looks like ADM depends on 
>> DMAPI filesystem interfaces.
>> What we have in Lustre at the moment is a changelog, which includes
>> all namespace ops (file create, destroy, rename, etc.), and will
>> include the closed-after-write (#1 below); and e2scan which can be
>> used to semi-efficiently walk the filesystem gathering mtime/atime
>> info (#3).
> DMAPI is an implementation choice. You are correct in assuming what it
> needs is event information from which an informed decision is made.
> If the necessary information is not with the event (because of later
> change, or efficiency) the event/policy piece will gather the needed
> info. I don't think there is anything outside of standard POSIX needed.
>> We'll have to add a flag into the lov_ea indicating "in HSM", and 
>> then block for file retrieval (#2).
> Correct...with a small twist...the HSM holds copies of data even when
> they continue to exist in native disk. The "release" of this space
> then doesn't need to wait for a slower data mover. So, change "in HSM"
> to "only in HSM" and you are correct.
right, that's what I had in mind.
>> So we need to take these three items and provide some kind of 
>> interface that ADM is comfortable with, while not strictly following 
>> the DMAPI "check with us for every system event" paradigm.
>> The only synchronous event here is #2, where we are requesting a file 
>> out of HSM.
> Yep.
>>
>> From the ADM spec:
>> Changes to ZFS will be fasttracked separately and putback to the ONNV
>> gate. Much of DMAPI's interaction with ZFS for dm_xxx APIs is done
>> through VFS interfaces. Imported VFS interfaces are in the table below.
>> A few additional changes are necessary, such as calling DMAPI to send
>> events, and not updating timestamps for invisible IO. The plan and
>> current prototype adds a flag value (FINVIS) to be passed into the
>> VOP_READ, VOP_WRITE, and VOP_SPACE interfaces for invisible IO.
>>
>> If I'm understanding things correctly, if Lustre just honors the 
>> open(...,O_WRONLY | FINVIS) call, and sends the cache miss request 
>> (#2), that is sufficient interaction to pull an HSM file back into 
>> Lustre.
>> We would need a second element that would read the changelogs and 
>> e2scan results to determine when/which files to archive, and the 
>> open(..., O_RDONLY | FINVIS) call to get the data. This element could 
>> be userspace and is asynchronous. Would this talk directly to ADM? 
>> Use DMAPI calls?
> Correct...we would create an interface for consuming your events. (By
> we, I mean some subset of the two teams). Our DMAPI implementation relies
> heavily on filtering to prevent event floods. As we've discussed,
> since filters just remove unwanted things, they can occur in the
> "kernel" / log generation, and in user space without impact on the
> resulting event chain. The "invisible I/O" just prevents additional
> events size and modtime/access time changes.
> Need not be DMAPI.
OK, so this is the "event/policy piece". JC, I think this is a
subcomponent of the "coordinator" piece from the old HSM HLD.  I see no
reason why this can't be a userspace program.  I imagine this piece
feeds the events/LRU list into the ADM policy engine, and then somebody
(this piece? ADM itself?) starts doing the FINVIS copyouts into ADM.

Does it make sense to send the cache miss request to this same 
event/policy piece, or to ADM directly?  Somebody needs to do the FINVIS 
copyin.

Thinking a little more about the Lustre internals for step #2: instead
of blocking the open call on the MDT, maybe it makes sense to grant the
open lock to the client, who receives the "only in HSM" flagged LOV md
and locally blocks read/write requests until the LOV md has been updated
(maybe signalled through a lock callback on the file).  We've talked about
adding a file layout lock in the past; maybe that is appropriate here.
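
The client-side alternative can be sketched with a condition variable: the
open succeeds immediately and I/O waits locally until the layout is
updated.  This is a userspace Python analogy only; FileLayout and its
methods are hypothetical names, and the real mechanism would presumably be
a lock callback inside the Lustre client.

```python
import threading


class FileLayout:
    """Hypothetical client-side view of a file's layout state."""

    def __init__(self, only_in_hsm: bool):
        self._only_in_hsm = only_in_hsm
        self._cond = threading.Condition()

    def mark_restored(self) -> None:
        """Called when the updated layout arrives (e.g. a lock callback)."""
        with self._cond:
            self._only_in_hsm = False
            self._cond.notify_all()

    def wait_until_resident(self, timeout=None) -> bool:
        """Block a read/write until the data is back; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: not self._only_in_hsm, timeout)
```

Because only reads and writes call wait_until_resident, an open followed by
stat or close never has to wait for the data mover.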


>>
>> Does this sound right?
> Yep.

* [Lustre-devel] Summary of our HSM discussion
  2008-08-29 22:42     ` Nathaniel Rutman
@ 2008-09-03 20:58       ` Alex Kulyavtsev
  2008-09-04  7:59         ` Peter Braam
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Kulyavtsev @ 2008-09-03 20:58 UTC (permalink / raw)
  To: lustre-devel

Hello,
Sorry for breaking into the discussion. Please find my comments inline.

Nathaniel Rutman wrote:
> Rick Matthews wrote:
>   
>> On 08/29/08 15:38, Nathaniel Rutman wrote:
>>     
(Snip)
>>
>>> We'll have to add a flag into the lov_ea indicating "in HSM", and 
>>> then block for file retrieval (#2).
>>>       
>> Correct...with a small twist...the HSM holds copies of data even when 
>> they continue to exist in native disk. The "release" of this space 
>> then doesn't need to
>> wait for a slower data mover. So, change "in HSM" to "only in HSM" and 
>> you are correct.
>>     
> right, that's what I had in mind.
>   
- What is the definition of "ONLY in HSM"?
- Are these flags exposed to the end user?

Consider this use case:
The user has someFile striped across two OSTs, OST1 and OST2. The file is in
the HSM as well.
OST2 is down. The user reads the file and reaches a stripe residing on OST2
(or open() checks the OST status).
In this case it would be nice to stage the file from tape, either as a whole
or only the stripes residing on OST2.
Also, when OST2 restarts it should remove the stale stripes, and the MDT
should point to the right OST set after retrieval.
I realize this makes things more complicated and adds more triggers to #2
for file retrieval.

Back to the flag definitions:
Staging from tape may thus be triggered by several conditions, including
 (  File_is_Resident ) and (File_is_in_HSM) and (OST_is_Not_Available)
in addition to
 ( ! File_is_Resident ) and (File_is_in_HSM)

It may be worth keeping the flags (File_is_Resident) and (File_is_in_HSM)
separate, as "File_is_in_HSM" is a fundamental file property indicating
"permanent" storage of the file, while the other flags reflect file state
(file is resident on disk) or a transient condition (OST is down).
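
The two trigger conditions above can be written as a single predicate over
separate flags.  A small Python sketch; FileState and needs_staging are
illustrative names, and the flags are modeled as plain booleans rather than
as whatever on-disk representation is eventually chosen.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    resident: bool        # File_is_Resident: data present on the OSTs
    in_hsm: bool          # File_is_in_HSM: a current copy exists in the HSM
    osts_available: bool  # False if any OST holding a stripe is down


def needs_staging(state: FileState) -> bool:
    """True when a read should trigger retrieval from the HSM."""
    if not state.in_hsm:
        return False  # nothing to stage from
    # Either the data was released, or it is resident but unreachable.
    return (not state.resident) or (not state.osts_available)
```

Keeping the three inputs separate means the transient OST condition can
change without touching the two persistent flags.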

The other use case is when an end user writes a file to the Lustre/HSM system
and waits until the file reaches tape before deleting the original, checking
the file status from time to time.
This can be done if the "File_is_in_HSM" flag is exposed to the end user by
some command, or if the HSM fileID is set in the EA.
In this case the user wants to know the "is in HSM" part of the flag
regardless of whether the file is resident on disk. Keeping the flags
separate will help with logic and synchronization.

Best regards, Alex.


* [Lustre-devel] Summary of our HSM discussion
  2008-09-03 20:58       ` Alex Kulyavtsev
@ 2008-09-04  7:59         ` Peter Braam
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Braam @ 2008-09-04  7:59 UTC (permalink / raw)
  To: lustre-devel




On 9/3/08 10:58 PM, "Alex Kulyavtsev" <aik@fnal.gov> wrote:

> Hello,
> sorry for breaking into discussion. Please find inlined
> 
> Nathaniel Rutman wrote:
>> Rick Matthews wrote:
>>   
>>> On 08/29/08 15:38, Nathaniel Rutman wrote:
>>>     
> (Snip)
>>> 
>>>> We'll have to add a flag into the lov_ea indicating "in HSM", and
>>>> then block for file retrieval (#2).
>>>>       
>>> Correct...with a small twist...the HSM holds copies of data even when
>>> they continue to exist in native disk. The "release" of this space
>>> then doesn't need to
>>> wait for a slower data mover. So, change "in HSM" to "only in HSM" and
>>> you are correct.
>>>     
>> right, that's what I had in mind.
>>   
> - What is the definition of "ONLY in HSM"?
> - Are these flags exposed to the end user?

They will be extended attributes accessible with the xattr utilities.

If there is a standard for such attributes, we should use it to avoid
introducing yet another set of product specific attributes.

> 
> Consider this use case:
> The user has someFile striped across two OSTs, OST1 and OST2. The file is in
> the HSM as well.
> OST2 is down. The user reads the file and reaches a stripe residing on OST2
> (or open() checks the OST status).
> In this case it would be nice to stage the file from tape, either as a whole
> or only the stripes residing on OST2.
> Also, when OST2 restarts it should remove the stale stripes, and the MDT
> should point to the right OST set after retrieval.
> I realize this makes things more complicated and adds more triggers to #2
> for file retrieval.

Nice idea, but building this into the FS is really a refinement that we
should not be going after too soon.  When OST2 returns to the cluster, we
have cleanup work, and building all the administration infrastructure for
this is a lot of work.

With a copy_from_hsm command users should be able to do this.

lsxattr <pathname> -- see file is on tape
lfs getfid <pathname> -- get its fid
copy_from_hsm <fid> <path> -- copy it in

> 
> Back to the flag definitions:
> Staging from tape may thus be triggered by several conditions, including
>  (  File_is_Resident ) and (File_is_in_HSM) and (OST_is_Not_Available)
> in addition to
>  ( ! File_is_Resident ) and (File_is_in_HSM)
> 
> It may be worth keeping the flags (File_is_Resident) and (File_is_in_HSM)
> separate, as "File_is_in_HSM" is a fundamental file property indicating
> "permanent" storage of the file, while the other flags reflect file state
> (file is resident on disk) or a transient condition (OST is down).

Reminder: File_is_in_HSM needs to be cleared if the file changes again.
Files change on the OSS, not on the MDS, where are the flags? We can trust
version propagation from OSS to MDS only when SOM is present.
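
A sketch of the invalidation rule, assuming (hypothetically) that a per-file
version counter is visible wherever the flag lives.  As noted above, that is
exactly the hard part, since changes happen on the OSS and version
propagation to the MDS depends on SOM.  The names below are invented for
illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HsmState:
    current_version: int                    # bumped on every modification
    archived_version: Optional[int] = None  # version last copied to the HSM

    def record_archive(self) -> None:
        self.archived_version = self.current_version

    def record_modification(self) -> None:
        self.current_version += 1

    @property
    def in_hsm(self) -> bool:
        """Derived flag: any modification after archiving clears it."""
        return self.archived_version == self.current_version
```

Deriving the flag from two version numbers avoids a separate "clear the
flag" step that could be missed.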

> 
> The other use case is when an end user writes a file to the Lustre/HSM system
> and waits until the file reaches tape before deleting the original, checking
> the file status from time to time.

Yes.

> This can be done if the "File_is_in_HSM" flag is exposed to the end user by
> some command, or if the HSM fileID is set in the EA.

The HSM fileID will NOT be in the EA at all.  The flag can be exposed.

> In this case the user wants to know the "is in HSM" part of the flag
> regardless of whether the file is resident on disk. Keeping the flags
> separate will help with logic and synchronization.

We want a flag meaning "file is NOT resident on disk", since otherwise we
need to tag all files, but that is a detail.

Peter

> 
> Best regards, Alex.
> 
