* [Lustre-devel] Global generic database
2008-02-13 18:23 [Lustre-devel] Global generic database Nathaniel Rutman
@ 2008-02-13 20:35 ` Canon, Richard Shane
2008-02-14 12:58 ` Aurelien Degremont
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Canon, Richard Shane @ 2008-02-13 20:35 UTC (permalink / raw)
To: lustre-devel
JC mentioned this when we were talking about the Space Manager component
in the HSM design. The Space Manager would have some type of policy
language, and the policies need to be stored somewhere. Another use
case would be QoS policies when that starts to appear.
--Shane
-----Original Message-----
From: lustre-devel-bounces@lists.lustre.org
[mailto:lustre-devel-bounces at lists.lustre.org] On Behalf Of Nathaniel
Rutman
Sent: Wednesday, February 13, 2008 1:24 PM
To: lustre-devel at lists.lustre.org; aurelien.degremont at cea.fr;
Eric.Barton at Sun.COM
Subject: [Lustre-devel] Global generic database
The design of various new features in Lustre calls for global (filesystem
wide) databases, accessible from
clients or other servers:
A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool policies
(all .jpg files to pool #1)
B. filesets - fileset policies (log creates on fileset #1 to feed "foo")
C. HSM - (Aurelien - what was the use case here?)
We've already implemented at least 2 of these:
D. Fid Location Database - (is this done?)
E. configuration parameters - stored in MGS llogs
Rather than continue with one-off implementations, I think it's time we
came up with a consistent, global, generic database mechanism for A-C as
well as other future uses.
Needs to be:
1. Fast. We need to cache database entries locally, which also means
having them under locks.
a. local caching
b. locks
2. Generic. Store any kind of data, not limited to 8k page boundaries,
etc.
3. Transactional. Power loss doesn't lead to inconsistent state.
4. Recoverable. Client changes are replayed if need be.
5. Remotely accessible, from a client or other servers.
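
As an illustration of the kind of record such a database would hold, the
pool description from item A ("pool #1 = OSTs 1-10,30-60") could be parsed
into a membership bitmap. This is only a sketch: pool_parse_osts(),
pool_has_ost() and the 64-OST limit are assumptions made for the example,
not existing Lustre interfaces.

```c
#include <stdint.h>
#include <stdlib.h>

#define POOL_MAX_OST 64   /* hypothetical limit so the set fits in one word */

/* Parse a list such as "1-10,30-60" into a bitmap of OST indices.
 * Returns 0 on success, -1 on malformed input or an out-of-range index. */
int pool_parse_osts(const char *spec, uint64_t *mask)
{
    uint64_t m = 0;
    const char *p = spec;

    while (*p != '\0') {
        char *end;
        unsigned long lo, hi;

        if (*p < '0' || *p > '9')
            return -1;
        lo = strtoul(p, &end, 10);
        hi = lo;
        p = end;
        if (*p == '-') {                /* "lo-hi" range */
            p++;
            if (*p < '0' || *p > '9')
                return -1;
            hi = strtoul(p, &end, 10);
            p = end;
        }
        if (lo > hi || hi >= POOL_MAX_OST)
            return -1;
        for (unsigned long i = lo; i <= hi; i++)
            m |= (uint64_t)1 << i;
        if (*p == ',') {
            p++;
            if (*p == '\0')             /* trailing comma */
                return -1;
        } else if (*p != '\0') {
            return -1;
        }
    }
    *mask = m;
    return 0;
}

/* Test whether OST index 'ost' is a member of the pool. */
int pool_has_ost(uint64_t mask, unsigned int ost)
{
    return ost < POOL_MAX_OST && ((mask >> ost) & 1);
}
```

A pool policy such as "all .jpg files to pool #1" would then only need to
name the pool; the membership test itself is a single bit operation.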
_______________________________________________
Lustre-devel mailing list
Lustre-devel at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-devel
* [Lustre-devel] Global generic database
2008-02-13 18:23 [Lustre-devel] Global generic database Nathaniel Rutman
2008-02-13 20:35 ` Canon, Richard Shane
@ 2008-02-14 12:58 ` Aurelien Degremont
2008-02-14 14:57 ` Peter J Braam
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Aurelien Degremont @ 2008-02-14 12:58 UTC (permalink / raw)
To: lustre-devel
Nathaniel Rutman wrote:
> The design of various new features in Lustre call for global (filesystem
> wide) databases, accessible from
> clients or other servers:
> A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool policies
> (all .jpg files to pool #1)
> B. filesets - fileset policies (log creates on fileset #1 to feed "foo")
> C. HSM - (aureleien - what was the use case here?)
- pre-stage files when they have been unused for 2 weeks
- purge files when fs occupancy reaches 95%
- do not purge files whose names match ".local/*"
and so on...
There are many possibilities, combining:
- events: when to run a check?
  e.g. fs occupancy reaches N%, ...
- filters: which objects are concerned?
  e.g. file attributes (path, name, size, age, user, ...)
- actions: what to do with them?
  e.g. copy in, copy out, purge, ...
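
The event/filter/action split above could be represented roughly as
follows. This is a sketch only: the structure, the first-match-wins
ordering and all names (hsm_rule, hsm_decide, example_rules) are
assumptions made for illustration, not part of any HSM design document.

```c
#include <fnmatch.h>
#include <stddef.h>

enum hsm_action { HSM_NONE, HSM_PRESTAGE, HSM_PURGE };

struct file_info {
    const char *path;
    unsigned int idle_days;      /* days since the file was last used */
};

/* One rule = event (fs occupancy threshold) + filter + action. */
struct hsm_rule {
    unsigned int min_fs_pct;     /* event: applies when occupancy >= this */
    const char *glob;            /* filter: path pattern, NULL matches any */
    unsigned int min_idle_days;  /* filter: minimum idle age */
    enum hsm_action action;      /* HSM_NONE acts as an exclusion */
};

/* The three example policies; the exclusion is listed before the purge. */
const struct hsm_rule example_rules[] = {
    { 95, ".local/*", 0,  HSM_NONE     },  /* do not purge .local/* */
    { 95, NULL,       0,  HSM_PURGE    },  /* purge at 95% occupancy */
    { 0,  NULL,       14, HSM_PRESTAGE },  /* pre-stage after 2 weeks */
};

/* Return the action of the first rule whose event and filters match. */
enum hsm_action hsm_decide(const struct hsm_rule *rules, size_t n,
                           const struct file_info *f, unsigned int fs_pct)
{
    for (size_t i = 0; i < n; i++) {
        const struct hsm_rule *r = &rules[i];

        if (fs_pct < r->min_fs_pct)
            continue;
        if (f->idle_days < r->min_idle_days)
            continue;
        if (r->glob != NULL && fnmatch(r->glob, f->path, 0) != 0)
            continue;
        return r->action;
    }
    return HSM_NONE;
}
```

Rules of this shape are small, but a site may have many of them, which is
the reason they need to be stored somewhere that can be queried.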
--
Aurelien Degremont
CEA
* [Lustre-devel] Global generic database
2008-02-13 18:23 [Lustre-devel] Global generic database Nathaniel Rutman
2008-02-13 20:35 ` Canon, Richard Shane
2008-02-14 12:58 ` Aurelien Degremont
@ 2008-02-14 14:57 ` Peter J Braam
2008-02-14 19:56 ` Nathaniel Rutman
2008-02-15 20:40 ` Nikita Danilov
2008-02-15 17:50 ` Alexander Zarochentsev
2008-02-18 21:57 ` Yuriy Umanets
4 siblings, 2 replies; 11+ messages in thread
From: Peter J Braam @ 2008-02-14 14:57 UTC (permalink / raw)
To: lustre-devel
Hmm ... here are my thoughts.
1. The word scalable is missing below.
2. Any database that relates to file system policies and file system
objects (HSM?) should be a separate mechanism coupled to the file
system, so that you can pick up the server disks and the policies.
3. I think all updates to the database should be made on the server, and
the use cases should be restricted (e.g. this is for relatively small
databases).
4. Imho pools belong in the configuration log.
5. Fileset attributes belong with the file system (see 2) - either these
are implemented as special directory files and/or EA's (does the design
specify the purpose and items that need to be stored in databases?).
Hmm, so can we revisit why we need a new database mechanism?
- Peter -
Nathaniel Rutman wrote:
> The design of various new features in Lustre call for global (filesystem
> wide) databases, accessible from
> clients or other servers:
> A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool policies
> (all .jpg files to pool #1)
> B. filesets - fileset policies (log creates on fileset #1 to feed "foo")
> C. HSM - (aureleien - what was the use case here?)
>
> We've already implemented at least 2 of these:
> D. Fid Location Database - (is this done?)
> E. configuration parameters - stored in MGS llogs
>
> Rather than continue 1-off implementations, I think it's time we came up
> with a consistent,
> global, generic database mechanism for A-C as well as other future uses.
> Needs to be:
> 1. Fast. We need to cache database entries locally, which also means
> having them under locks.
> a. local caching
> b. locks
> 2. Generic. Store any kind of data, not limited to 8k page boundaries, etc.
> 3. Transactional. Power loss doesn't lead to inconsistent state.
> 4. Recoverable. Client changes are replayed if need be.
> 5. Remotely accessible, from a client or other servers.
* [Lustre-devel] Global generic database
2008-02-14 14:57 ` Peter J Braam
@ 2008-02-14 19:56 ` Nathaniel Rutman
2008-02-15 3:32 ` Peter J Braam
2008-02-15 20:40 ` Nikita Danilov
1 sibling, 1 reply; 11+ messages in thread
From: Nathaniel Rutman @ 2008-02-14 19:56 UTC (permalink / raw)
To: lustre-devel
Peter J Braam wrote:
> Hmm ... here are my thoughts.
>
> 1. The word scalable is missing below.
That is implicit in any Lustre design :)
>
> 2. Any database that relates to file system policies and file system
> objects (HSM?) should be a separate mechanism coupled to the file
> system, so that you can pick up the server disks and the policies.
What I am trying to avoid is having multiple mechanisms; the goal is to
reduce the number of database implementations we have to write/maintain.
>
> 3. I think all updates to the database should be made on the server,
> and the use cases should be restricted (e.g. this is for relatively
> small databases).
Maybe updates can only be made on the server, but the data needs to be
readable from anywhere.
>
> 4. Imho pools belong in the configuration log.
Pool definitions can easily be put in the configuration logs - but pool
policies can be complex ("all .mov files greater than 10GB go
to pool 7") and malleable - configuration logs are not easily
accessible, not random access (config log records are arbitrary size, so
we must walk the file from the beginning to find a record). If they
grow too big performance will suffer.
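
To illustrate the random-access point: with variable-size records, reaching
record N means reading every record header before it. The layout used
below (a 4-byte host-endian length prefix per record) and the names
log_append() and log_find() are assumptions made for this sketch; this is
not the actual llog format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one record (length prefix + payload). Returns 0, or -1 if full.
 * Assumes *size <= cap on entry. */
int log_append(unsigned char *log, size_t cap, size_t *size,
               const void *data, uint32_t len)
{
    if (cap - *size < sizeof(len) || cap - *size - sizeof(len) < len)
        return -1;
    memcpy(log + *size, &len, sizeof(len));
    memcpy(log + *size + sizeof(len), data, len);
    *size += sizeof(len) + len;
    return 0;
}

/* Find record number 'idx' by walking from the start of the log.
 * Stores the payload length in *len and the number of headers read in
 * *steps; returns a pointer to the payload, or NULL if not found. */
const unsigned char *log_find(const unsigned char *log, size_t log_size,
                              size_t idx, uint32_t *len, size_t *steps)
{
    size_t off = 0;

    *steps = 0;
    for (size_t i = 0; ; i++) {
        uint32_t rec_len;

        if (log_size - off < sizeof(rec_len))
            return NULL;                 /* ran off the end of the log */
        memcpy(&rec_len, log + off, sizeof(rec_len));
        off += sizeof(rec_len);
        (*steps)++;
        if (rec_len > log_size - off)
            return NULL;                 /* truncated record */
        if (i == idx) {
            *len = rec_len;
            return log + off;
        }
        off += rec_len;                  /* skip payload, go to next header */
    }
}
```

The cost of log_find() grows linearly with the record index, which is the
behaviour that hurts once policies are numerous or large.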
> 5. Fileset attributes belong with the file system (see 2) - either
> these are implemented as special directory files and/or EA's (does the
> design specify the purpose and items that need to be stored in
> databases?).
Fileset membership is stored with the filesystem (EAs), but fileset
policies may again be larger, complex entities that should probably be
stored once in a central database, and looked up as needed. For the
10,000 fileset case, clearly we don't want to read in 10,000 fileset
policies from the config log at startup; they should be loaded on-demand
as needed.
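
A minimal sketch of the on-demand idea, assuming a small direct-mapped
cache in front of a remote lookup. policy_fetch() is a stand-in for the
remote database read and is hypothetical, as are the other names; the
locking and invalidation that real cached entries would need are left out.

```c
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8
#define POLICY_LEN  64

struct policy_slot {
    int valid;
    unsigned int fileset_id;
    char policy[POLICY_LEN];
};

static struct policy_slot cache[CACHE_SLOTS];
unsigned int policy_fetch_count;     /* number of simulated remote reads */

/* Stand-in for a remote database read (hypothetical). */
static void policy_fetch(unsigned int id, char *out, size_t len)
{
    policy_fetch_count++;
    snprintf(out, len, "policy-for-fileset-%u", id);
}

/* Return the policy for a fileset, fetching it only on first use
 * (or after its slot has been reused by another fileset). */
const char *policy_get(unsigned int id)
{
    struct policy_slot *s = &cache[id % CACHE_SLOTS];

    if (!s->valid || s->fileset_id != id) {
        policy_fetch(id, s->policy, sizeof(s->policy));
        s->fileset_id = id;
        s->valid = 1;
    }
    return s->policy;
}
```

With this shape, startup cost is independent of the number of filesets;
only the policies that are actually consulted are ever read.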
>
> Hmm, so can we revisit why we need a new database mechanism?
>
> - Peter -
>
>
>
> Nathaniel Rutman wrote:
>> The design of various new features in Lustre call for global
>> (filesystem wide) databases, accessible from
>> clients or other servers:
>> A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool
>> policies (all .jpg files to pool #1)
>> B. filesets - fileset policies (log creates on fileset #1 to feed "foo")
>> C. HSM - (aureleien - what was the use case here?)
Space manager policies
>>
>> We've already implemented at least 2 of these:
>> D. Fid Location Database - (is this done?)
>> E. configuration parameters - stored in MGS llogs
>>
>> Rather than continue 1-off implementations, I think it's time we came
>> up with a consistent,
>> global, generic database mechanism for A-C as well as other future uses.
>> Needs to be:
>> 1. Fast. We need to cache database entries locally, which also means
>> having them under locks.
>> a. local caching
>> b. locks
>> 2. Generic. Store any kind of data, not limited to 8k page
>> boundaries, etc.
>> 3. Transactional. Power loss doesn't lead to inconsistent state.
>> 4. Recoverable. Client changes are replayed if need be.
>> 5. Remotely accessible, from a client or other servers.
* [Lustre-devel] Global generic database
2008-02-14 19:56 ` Nathaniel Rutman
@ 2008-02-15 3:32 ` Peter J Braam
0 siblings, 0 replies; 11+ messages in thread
From: Peter J Braam @ 2008-02-15 3:32 UTC (permalink / raw)
To: lustre-devel
Nathaniel Rutman wrote:
> Peter J Braam wrote:
>> Hmm ... here are my thoughts.
>>
>> 1. The word scalable is missing below.
> That is implicit in any Lustre design :)
>>
>> 2. Any database that relates to file system policies and file system
>> objects (HSM?) should be a separate mechanism coupled to the file
>> system, so that you can pick up the server disks and the policies.
> What I am trying to avoid is multiple mechanisms to reduce the number
> of database implementations we have to write/maintain.
>>
>> 3. I think all updates to the database should be made on the server,
>> and the use cases should be restricted (e.g. this is for relatively
>> small databases).
> Maybe updates can only be made on the server, but the data needs to be
> readable from anywhere.
>>
>> 4. Imho pools belong in the configuration log.
> Pool definitions can easily be put in the configuration logs - but
> pool policies can be complex ("all .mov files greater than 10GB go
> to pool 7") and malleable - configuration logs are not easily
> accessible, not random access (config log records are arbitrary size,
> so we must walk the file from the beginning to find a record). If
> they grow too big performance will suffer.
>> 5. Fileset attributes belong with the file system (see 2) - either
>> these are implemented as special directory files and/or EA's (does
>> the design specify the purpose and items that need to be stored in
>> databases?).
> Fileset membership is stored with the filesystem (EAs), but fileset
> policies may again be larger, complex entities that should probably be
> stored once in a central database, and looked up as needed. For the
> 10,000 fileset case, clearly we don't want to read in 10,000 fileset
> policies from the config log at startup; they should be loaded
> on-demand as needed.
They need to be in the filesystem, not on the management server.
- Peter -
>>
>> Hmm, so can we revisit why we need a new database mechanism?
>>
>> - Peter -
>>
>>
>>
>> Nathaniel Rutman wrote:
>>> The design of various new features in Lustre call for global
>>> (filesystem wide) databases, accessible from
>>> clients or other servers:
>>> A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool
>>> policies (all .jpg files to pool #1)
>>> B. filesets - fileset policies (log creates on fileset #1 to feed
>>> "foo")
>>> C. HSM - (aureleien - what was the use case here?)
> Space manager policies
>>>
>>> We've already implemented at least 2 of these:
>>> D. Fid Location Database - (is this done?)
>>> E. configuration parameters - stored in MGS llogs
>>>
>>> Rather than continue 1-off implementations, I think it's time we
>>> came up with a consistent,
>>> global, generic database mechanism for A-C as well as other future
>>> uses.
>>> Needs to be:
>>> 1. Fast. We need to cache database entries locally, which also means
>>> having them under locks.
>>> a. local caching
>>> b. locks
>>> 2. Generic. Store any kind of data, not limited to 8k page
>>> boundaries, etc.
>>> 3. Transactional. Power loss doesn't lead to inconsistent state.
>>> 4. Recoverable. Client changes are replayed if need be.
>>> 5. Remotely accessible, from a client or other servers.
* [Lustre-devel] Global generic database
2008-02-14 14:57 ` Peter J Braam
2008-02-14 19:56 ` Nathaniel Rutman
@ 2008-02-15 20:40 ` Nikita Danilov
1 sibling, 0 replies; 11+ messages in thread
From: Nikita Danilov @ 2008-02-15 20:40 UTC (permalink / raw)
To: lustre-devel
Peter J Braam writes:
> Hmm ... here are my thoughts.
Can we use our existing directory/lookup/read/write mechanism to
implement this database? That is, imagine that clients somehow get a
special fid (DB_FID) representing a directory not visible through the
normal namespace (this can be implemented as a /DB directory on the MDS
local file system, alongside the /ROOT directory). Typical use of that
would be something along the lines of

        /* Sketch only: object_by_fid() and lookup() stand for the existing
         * fid-to-object and name lookup paths. */
        ssize_t db_value_get(const char *key, void *buf, size_t count)
        {
                struct dt_object *topdir = object_by_fid(DB_FID);
                int fd = lookup(topdir, key);
                ssize_t nread;

                if (fd < 0)
                        return fd;      /* key does not exist */
                nread = read(fd, buf, count);
                close(fd);
                return nread;           /* bytes read, or negative on error */
        }

        db_value_get("filesets.FOO.policy", buf, BUFSIZE);
        db_value_get("pools.BAR.width", &pool_width, sizeof pool_width);

etc.
Main advantage of this approach is of course that all code is already
here, moreover...
>
> 1. The word scalable is missing below.
fixed through the standard means: CMD, placement policies, split
directories, pdirops locking,
>
> 2. Any database that relates to file system policies and file system
> objects (HSM?) should be a separate mechanism coupled to the file
> system, so that you can pick up the server disks and the policies.
achieved automatically (if I understand the issue correctly),
>
> 3. I think all updates to the database should be made on the server, and
> the use cases should be restricted (e.g. this is for relatively small
> databases).
>
> 4. Imho pools belong in the configuration log.
>
> 5. Fileset attributes belong with the file system (see 2) - either these
> are implemented as special directory files and/or EA's (does the design
> specify the purpose and items that need to be stored in databases?).
>
[...]
> > Needs to be:
> > 1. Fast. We need to cache database entries locally, which also means
hopefully fast. :-) Caching is already here,
> > having them under locks.
> > a. local caching
already here,
> > b. locks
already here,
> > 2. Generic. Store any kind of data, not limited to 8k page boundaries, etc.
already here,
> > 3. Transactional. Power loss doesn't lead to inconsistent state.
already here,
> > 4. Recoverable. Client changes are replayed if need be.
already here,
> > 5. Remotely accessible, from a client or other servers.
already here.
Plus, we can allow clients to mount DB_FID as a separate file system, so
that usual tools can be used to maintain the database.
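
If the database directory were mounted like that, a userland reader would
need nothing beyond ordinary POSIX calls. A minimal sketch, assuming one
file per key directly under the mount point; the function name
db_value_get_at(), the dbroot argument and the DB_PATH_MAX limit are
hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DB_PATH_MAX 4096   /* assumed upper bound on the full path length */

/* Read the value stored under 'key' below 'dbroot' (the assumed mount
 * point of the database directory). Returns the number of bytes read,
 * or -1 on error. Keys containing '/' are rejected so that a key cannot
 * name anything outside the database directory. */
ssize_t db_value_get_at(const char *dbroot, const char *key,
                        void *buf, size_t count)
{
    char path[DB_PATH_MAX];
    ssize_t n;
    int len, fd;

    if (key[0] == '\0' || strchr(key, '/') != NULL)
        return -1;
    len = snprintf(path, sizeof(path), "%s/%s", dbroot, key);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;                  /* path would be truncated */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;                  /* no such key */
    n = read(fd, buf, count);
    close(fd);
    return n;
}
```

The same keys could then be inspected with ls and cat, or set by writing
to the files, which is the point about reusing the usual tools.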
Nikita.
* [Lustre-devel] Global generic database
2008-02-13 18:23 [Lustre-devel] Global generic database Nathaniel Rutman
` (2 preceding siblings ...)
2008-02-14 14:57 ` Peter J Braam
@ 2008-02-15 17:50 ` Alexander Zarochentsev
2008-02-16 7:40 ` Andreas Dilger
2008-02-18 21:57 ` Yuriy Umanets
4 siblings, 1 reply; 11+ messages in thread
From: Alexander Zarochentsev @ 2008-02-15 17:50 UTC (permalink / raw)
To: lustre-devel
Hello,
On 13 February 2008 21:23:50 Nathaniel Rutman wrote:
> The design of various new features in Lustre call for global
> (filesystem wide) databases, accessible from
> clients or other servers:
> A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool
> policies (all .jpg files to pool #1)
> B. filesets - fileset policies (log creates on fileset #1 to feed
> "foo")
> C. HSM - (aureleien - what was the use case here?)
>
> We've already implemented at least 2 of these:
> D. Fid Location Database - (is this done?)
> E. configuration parameters - stored in MGS llogs
Could the same (file?) interface be used for anything Lustre-specific
under /proc? In any case, we need a /proc replacement for user-level
Lustre servers.
> Rather than continue 1-off implementations, I think it's time we came
> up with a consistent,
> global, generic database mechanism for A-C as well as other future
> uses.
> Needs to be:
> 1. Fast. We need to cache database entries locally, which also means
> having them under locks.
> a. local caching
> b. locks
> 2. Generic. Store any kind of data, not limited to 8k page
> boundaries, etc.
> 3. Transactional. Power loss doesn't lead to inconsistent state.
> 4. Recoverable. Client changes are replayed if need be.
> 5. Remotely accessible, from a client or other servers.
Thanks,
Zam.
* [Lustre-devel] Global generic database
2008-02-15 17:50 ` Alexander Zarochentsev
@ 2008-02-16 7:40 ` Andreas Dilger
2008-02-17 11:27 ` Alex Lyashkov
0 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2008-02-16 7:40 UTC (permalink / raw)
To: lustre-devel
On Feb 15, 2008 20:50 +0300, Alexander Zarochentsev wrote:
> On 13 February 2008 21:23:50 Nathaniel Rutman wrote:
> > The design of various new features in Lustre call for global
> > (filesystem wide) databases, accessible from
> > clients or other servers:
> > A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool
> > policies (all .jpg files to pool #1)
> > B. filesets - fileset policies (log creates on fileset #1 to feed
> > "foo")
> > C. HSM - (aureleien - what was the use case here?)
> >
> > We've already implemented at least 2 of these:
> > D. Fid Location Database - (is this done?)
> > E. configuration parameters - stored in MGS llogs
>
> Could the same (file?) interface be used for anything Lustre-specific
> under /proc? In any case, we need a /proc replacement for user-level
> Lustre servers.
There won't immediately be a /proc replacement for uOSS. Instead, there
will be a new "lctl {get,set}_param" command that reads/writes the
same proc entries. At some later time we will make a .lustre/proc
directory which will allow access to the files currently in /proc,
possibly also allowing access to /proc values on all lustre nodes.
That hasn't been designed yet, but would definitely be convenient.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* [Lustre-devel] Global generic database
2008-02-16 7:40 ` Andreas Dilger
@ 2008-02-17 11:27 ` Alex Lyashkov
0 siblings, 0 replies; 11+ messages in thread
From: Alex Lyashkov @ 2008-02-17 11:27 UTC (permalink / raw)
To: lustre-devel
On Sat, 2008-02-16 at 00:40 -0700, Andreas Dilger wrote:
>
> There won't immediately be a /proc replacement for uOSS. Instead,
> there
> will be a new "lctl {get,set}_param" command that reads/writes the
> same proc entries. At some later time we will make a .lustre/proc
> directory which will allow access to the files currently in /proc,
> possibly also allowing access to /proc values on all lustre nodes.
> That hasn't been designed yet, but would definitely be convenient.
I think we can put all parameters into the sysctl tree and write a
simple wrapper for userland tools, which adds functions to register,
list and access sysctl nodes.
--
Alex Lyashkov <Alexey.lyashkov@sun.com>
Lustre Group, Sun Microsystems
* [Lustre-devel] Global generic database
2008-02-13 18:23 [Lustre-devel] Global generic database Nathaniel Rutman
` (3 preceding siblings ...)
2008-02-15 17:50 ` Alexander Zarochentsev
@ 2008-02-18 21:57 ` Yuriy Umanets
4 siblings, 0 replies; 11+ messages in thread
From: Yuriy Umanets @ 2008-02-18 21:57 UTC (permalink / raw)
To: lustre-devel
Nathaniel Rutman wrote:
> The design of various new features in Lustre call for global (filesystem
> wide) databases, accessible from
> clients or other servers:
> A. pools - pool descriptions (pool #1 = OSTs 1-10,30-60), pool policies
> (all .jpg files to pool #1)
> B. filesets - fileset policies (log creates on fileset #1 to feed "foo")
> C. HSM - (aureleien - what was the use case here?)
>
> We've already implemented at least 2 of these:
> D. Fid Location Database - (is this done?)
>
This is a basic service in the new MDS stack, though its implementation
lacks the DHT support of the original design. Currently it uses
round-robin as the policy for spreading parts of the FLD over the MDS
nodes in a cluster. Another issue is that the original intention was to
make the whole of Lustre FLD-aware, but for now only MDS nodes store
parts of the FLD, as only the MDS uses the new fids.
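
For readers unfamiliar with the idea, round-robin placement of FID
sequence ranges over MDS nodes can be sketched as below. This is an
illustration under assumed names (fld_alloc, fld_range, fld_assign,
fld_lookup) and is not the actual FLD code.

```c
#include <stdint.h>

struct fld_alloc {
    unsigned int mds_count;   /* number of MDS nodes in the cluster */
    unsigned int next_mds;    /* node that receives the next range */
    uint64_t next_seq;        /* first sequence not yet assigned */
};

struct fld_range {
    uint64_t start, end;      /* sequences in [start, end) */
    unsigned int mds;         /* index of the owning MDS */
};

/* Hand the next 'width' sequences to the next MDS in turn. */
struct fld_range fld_assign(struct fld_alloc *a, uint64_t width)
{
    struct fld_range r = { a->next_seq, a->next_seq + width, a->next_mds };

    a->next_seq += width;
    a->next_mds = (a->next_mds + 1) % a->mds_count;
    return r;
}

/* Return the MDS owning sequence 'seq', or -1 if no range covers it. */
int fld_lookup(const struct fld_range *tbl, unsigned int n, uint64_t seq)
{
    for (unsigned int i = 0; i < n; i++)
        if (seq >= tbl[i].start && seq < tbl[i].end)
            return (int)tbl[i].mds;
    return -1;
}
```

Unlike a DHT, the owner of a sequence cannot be computed from the sequence
alone here; it has to be looked up in the table of assigned ranges.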
Thanks.
--
umka