* [Lustre-devel] Replication
@ 2008-05-05 14:41 Peter Braam
0 siblings, 0 replies; 4+ messages in thread
From: Peter Braam @ 2008-05-05 14:41 UTC (permalink / raw)
To: lustre-devel
Hi Nathan -
I talked through the design with Nikita. Once he had understood our
constraints and I had understood his issues, it all narrowed down to one
important improvement that Nikita suggests: we need a fast way to
compute the pathname of a FID. The scanning and searching I suggested,
which works without an index, is not tenable.
We had a couple of suggestions, such as storing the parent fid and a name
in an EA, or storing similar information in a large directory file.
Can you connect with Nikita and do this?
Peter
* [Lustre-devel] Replication
[not found] <482098C9.4020403@sun.com>
@ 2008-05-07 5:57 ` Peter Braam
2008-05-08 14:48 ` Nikita Danilov
0 siblings, 1 reply; 4+ messages in thread
From: Peter Braam @ 2008-05-07 5:57 UTC (permalink / raw)
To: lustre-devel
On 5/6/08 11:43 AM, "Nathaniel Rutman" <Nathan.Rutman@Sun.COM> wrote:
> Peter Braam wrote:
>> Hi Nathan -
>>
>> I talked through the design with Nikita. Once he had understood our
>> constraints and I had understood his issues, it all narrowed down to
>> one important improvement that Nikita suggests: we need a fast
>> way to compute the pathname of a FID. The scanning and searching I
>> suggested, which works without an index, is not tenable.
>>
>> We had a couple of suggestions, such as storing the parent fid and a name
>> in an EA, or storing similar information in a large directory file.
>>
>> Can you connect with Nikita and do this?
>
> We talked yesterday afternoon.
> Nikita has three concerns:
>
> 1. Global lock on namespace during pathname reconstruction.
> I think we can eliminate this the following way:
> a. look up full path from fid, parent fid (remember the list of fids for
> the entire path also)
> b. look up last transno
> c. verify traversing down the full path name results in the same branch
> and leaf fids all the way back down
> i. if they don't match, repeat from a
> ii. if they do match, we can backtrack starting from the transno in b
> to regenerate the original name
>
> 2. Directory name lookup given the parent fid - this may be inefficient
> if we have to read the parent directory in order to get the name (parent
> object is not likely to be cached at lookup time).
>
> 3. Someone deletes one of the parents of a hardlinked file. If we only
> store one parent, there's no way to regenerate a pathname if that parent
> is the one that gets removed.
>
> For 2 and 3, we could store the directory name for each directory in an
> EA, and all the fids for all the parents in some other manner.
> But it seems to make more sense at this point to put all this
> information (fid, name, parent list) in a database file stored on the
> MDT. Then we just look through this database to generate our full path
> information; no need to lookup info in the file objects or EAs.
> Generating this database should be no more time consuming than writing
> the changelogs themselves, assuming a reasonable database structure like
> IAM.
>
Yes I agree with all of this.
Peter
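The lock-free lookup in steps a to c above can be sketched as follows. This is only a toy illustration, not Lustre code: the Namespace class, ROOT_FID, and the function names are all hypothetical, and the in-memory dictionaries stand in for whatever (parent fid, name) store ends up being used. The final backtracking from the transno (step c.ii) is not shown; the sketch just returns the transno captured in step b.

```python
ROOT_FID = 1  # hypothetical fid of the filesystem root


class Namespace:
    """Toy in-memory namespace (assumption: stands in for the MDT)."""

    def __init__(self):
        self.parent = {}                 # fid -> (parent_fid, name)
        self.children = {ROOT_FID: {}}   # dir fid -> {name: fid}
        self.last_transno = 0

    def link(self, parent_fid, name, fid, is_dir=False):
        self.children[parent_fid][name] = fid
        self.parent[fid] = (parent_fid, name)
        if is_dir:
            self.children.setdefault(fid, {})
        self.last_transno += 1

    def rename(self, fid, new_parent, new_name):
        old_parent, old_name = self.parent[fid]
        del self.children[old_parent][old_name]
        self.children[new_parent][new_name] = fid
        self.parent[fid] = (new_parent, new_name)
        self.last_transno += 1


def walk_up(ns, fid):
    """Step a: follow parent pointers to the root, remembering names and fids."""
    names, fids = [], []
    while fid != ROOT_FID:
        parent_fid, name = ns.parent[fid]
        names.append(name)
        fids.append(fid)
        fid = parent_fid
    names.reverse()
    fids.reverse()
    return names, fids


def walk_down(ns, names):
    """Step c: traverse the path from the root, returning the fids seen."""
    fids, cur = [], ROOT_FID
    for name in names:
        cur = ns.children.get(cur, {}).get(name)
        if cur is None:
            return None
        fids.append(cur)
    return fids


def fid_to_path(ns, fid, max_retries=10):
    """Optimistic path reconstruction without a global namespace lock."""
    for _ in range(max_retries):
        names, fids = walk_up(ns, fid)       # step a
        transno = ns.last_transno            # step b
        if walk_down(ns, names) == fids:     # step c
            return "/" + "/".join(names), transno   # step c.ii starts here
        # step c.i: the namespace changed underneath us, repeat from a
    raise RuntimeError("namespace kept changing; giving up")
```

In this single-threaded toy the verification always succeeds on the first pass; the retry loop only matters when another thread renames a path component between the upward and downward walks.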
* [Lustre-devel] Replication
2008-05-07 5:57 ` Peter Braam
@ 2008-05-08 14:48 ` Nikita Danilov
2008-05-08 14:57 ` Peter Braam
0 siblings, 1 reply; 4+ messages in thread
From: Nikita Danilov @ 2008-05-08 14:48 UTC (permalink / raw)
To: lustre-devel
Peter Braam writes:
> On 5/6/08 11:43 AM, "Nathaniel Rutman" <Nathan.Rutman@Sun.COM> wrote:
[...]
> >
> > For 2 and 3, we could store the directory name for each directory in an
> > EA, and all the fids for all the parents in some other manner.
> > But it seems to make more sense at this point to put all this
> > information (fid, name, parent list) in a database file stored on the
> > MDT. Then we just look through this database to generate our full path
One advantage an EA has over a global database is that the EA is more
resilient against file system corruption. This becomes more important if
we ever plan to use (parent-fid, name) information for things like fsck.
> > information; no need to lookup info in the file objects or EAs.
> > Generating this database should be no more time consuming than writing
> > the changelogs themselves, assuming a reasonable database structure like
> > IAM.
On a lower-level note, I think that changelogs and the parent database are
better implemented as a new layer separate from mdd:
- mdd code is already complicated enough,
- a separate layer can be inserted into the stack optionally, avoiding
run-time cost if changelogs are not needed (although currently there is
no way to insert a layer after initial configuration completes).
> >
>
> Yes I agree with all of this.
>
> Peter
>
Nikita.
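A rough sketch of the layering idea, assuming a much simpler interface than the real stack: BaseLayer, ChangelogLayer, and build_stack are hypothetical names, and BaseLayer only plays the role that mdd plays here. The wrapper records a changelog and a per-fid parent list (so a hardlinked file keeps a usable name when one parent entry goes away), and it is simply left out of the stack when changelogs are not needed.

```python
from collections import defaultdict


class BaseLayer:
    """Hypothetical lower metadata layer (the role mdd plays in the thread)."""

    def __init__(self):
        self.entries = {}  # (parent_fid, name) -> fid

    def link(self, parent_fid, name, fid):
        self.entries[(parent_fid, name)] = fid

    def unlink(self, parent_fid, name):
        return self.entries.pop((parent_fid, name))


class ChangelogLayer:
    """Optional layer: forwards each call, then logs it and updates parents."""

    def __init__(self, lower):
        self.lower = lower
        self.log = []                    # (op, parent_fid, name, fid)
        self.parents = defaultdict(set)  # fid -> {(parent_fid, name)}

    def link(self, parent_fid, name, fid):
        self.lower.link(parent_fid, name, fid)
        self.log.append(("link", parent_fid, name, fid))
        self.parents[fid].add((parent_fid, name))

    def unlink(self, parent_fid, name):
        fid = self.lower.unlink(parent_fid, name)
        self.log.append(("unlink", parent_fid, name, fid))
        self.parents[fid].discard((parent_fid, name))
        return fid


def build_stack(changelog_enabled):
    """Insert the changelog layer only when it is wanted (at setup time)."""
    base = BaseLayer()
    return ChangelogLayer(base) if changelog_enabled else base
```

With the layer left out, callers talk to BaseLayer directly and pay no logging cost. As in the message above, the choice is made once when the stack is built, not afterwards.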
* [Lustre-devel] Replication
2008-05-08 14:48 ` Nikita Danilov
@ 2008-05-08 14:57 ` Peter Braam
0 siblings, 0 replies; 4+ messages in thread
From: Peter Braam @ 2008-05-08 14:57 UTC (permalink / raw)
To: lustre-devel
On 5/8/08 8:48 AM, "Nikita Danilov" <Nikita.Danilov@Sun.COM> wrote:
> Peter Braam writes:
>> On 5/6/08 11:43 AM, "Nathaniel Rutman" <Nathan.Rutman@Sun.COM> wrote:
>
> [...]
>
>>>
>>> For 2 and 3, we could store the directory name for each directory in an
>>> EA, and all the fids for all the parents in some other manner.
>>> But it seems to make more sense at this point to put all this
>>> information (fid, name, parent list) in a database file stored on the
>>> MDT. Then we just look through this database to generate our full path
>
> One advantage an EA has over a global database is that the EA is more
> resilient against file system corruption. This becomes more important if
> we ever plan to use (parent-fid, name) information for things like fsck.
>
>>> information; no need to lookup info in the file objects or EAs.
>>> Generating this database should be no more time consuming than writing
>>> the changelogs themselves, assuming a reasonable database structure like
>>> IAM.
>
> On a lower-level note, I think that changelogs and the parent database are
> better implemented as a new layer separate from mdd:
>
> - mdd code is already complicated enough,
>
> - a separate layer can be inserted into the stack optionally, avoiding
> run-time cost if changelogs are not needed (although currently there is
> no way to insert a layer after initial configuration completes).
Yes, find a good place.
Just remember that things like pNFS integrated with the Lustre servers also
need to replicate. In fact having this log purely at the DMU / ZFS level
would be a valuable feature - there are no good replication solutions even
for laptops today!
Peter
>
>>>
>>
>> Yes I agree with all of this.
>>
>> Peter
>>
>
> Nikita.