* [Lustre-devel] Wide area use of Lustre and client caches
@ 2008-05-09 4:55 Peter Braam
2008-05-09 14:25 ` Brian J. Murrell
0 siblings, 1 reply; 8+ messages in thread
From: Peter Braam @ 2008-05-09 4:55 UTC (permalink / raw)
To: lustre-devel
During the LUG I was approached by a customer who wants to use a Lustre file
system at the far end of a WAN link. Since the situation may be of general
interest, I thought I would post a short report of the discussion here.
His use pattern was interesting: a number of Windows clients must be
browsing files stored in Lustre in this remote location. It was expected
that the files would be fairly large, would be viewed by multiple clients,
and that few or no modifications would be made.
After some discussion we proposed a solution that involved a deployment as
follows:
1. A single Lustre client with lots of RAM. The settings on the client
would be (1) that the memory available for caching by Lustre is large, (2)
that the number of locks that can be held by this client is fairly large, and
(3) that this client uses the "open cache".
2. A samba server on this Lustre client.
With the settings above, we can expect that many of the files can be cached
in the Lustre client, hence after the initial read, I/O would be local in
the remote site. With the open file cache enabled, even the open and close
traffic will not go to the servers, but can be handled by the client. We
think that this will lead to a very good solution, that can work today.
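A toy model may help make the expected behaviour concrete. The sketch below is not Lustre code: `ClientCache`, `read`, and `replay` are hypothetical names, and whole-file LRU eviction is an assumption standing in for the client's real page cache and lock LRU. It only illustrates that, once the working set fits in the client cache, repeat reads stay local to the remote site.

```python
from collections import OrderedDict

GIB = 1 << 30


class ClientCache:
    """Toy whole-file LRU cache (hypothetical; not how Lustre caches pages)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.files = OrderedDict()  # name -> size, least recently used first
        self.wan_reads = 0
        self.local_reads = 0

    def read(self, name, size):
        """Return "local" on a cache hit, "wan" when the servers must be asked."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.local_reads += 1
            return "local"
        self.wan_reads += 1
        if size > self.capacity_bytes:
            return "wan"  # too large to cache at all
        # Evict least recently used files until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self.files.popitem(last=False)
            self.used_bytes -= evicted_size
        self.files[name] = size
        self.used_bytes += size
        return "wan"


def replay(cache, accesses):
    """Feed a list of (name, size) reads through the cache."""
    return [cache.read(name, size) for name, size in accesses]


accesses = [("a", GIB), ("b", GIB), ("a", GIB), ("b", GIB), ("c", GIB), ("a", GIB)]
big = replay(ClientCache(3 * GIB), accesses)
small = replay(ClientCache(2 * GIB), accesses)
```

With room for all three files, only the first read of each file crosses the WAN. With room for two, the arrival of the third file evicts one that is read again later, which is why the "lots of RAM" part of the proposal matters.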
A refinement is possible, that requires some development. There is a
feature in the Linux kernel to use a disk partition as a cache for a file
system; it is called cachefs. This requires a few hooks in Lustre to
store chunks of files that are transferred to the client into this cache,
and cache invalidation calls to remove them. It allows us to achieve the
same performance as with the solution above, except that the disk will be a
bit slower than memory, but it can also be much larger.
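To make the two kinds of hooks concrete, here is a minimal sketch of a disk-backed chunk cache with a store path and an invalidation path. It does not use the kernel cachefs interface: `DiskChunkCache`, its methods, the one-file-per-chunk layout, and the 1 MiB chunk size are all assumptions chosen for illustration, and `file_id` is assumed to be safe to use in a file name.

```python
import os
import tempfile


class DiskChunkCache:
    """Toy on-disk cache of fixed-size file chunks (hypothetical design)."""

    def __init__(self, cache_dir, chunk_size=1 << 20):
        self.cache_dir = cache_dir
        self.chunk_size = chunk_size
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, file_id, chunk_index):
        return os.path.join(self.cache_dir, f"{file_id}.{chunk_index}")

    def store(self, file_id, chunk_index, data):
        """Hook 1: called when a chunk has been transferred to the client."""
        with open(self._path(file_id, chunk_index), "wb") as f:
            f.write(data)

    def lookup(self, file_id, chunk_index):
        """Return the cached bytes, or None if the chunk is not cached."""
        try:
            with open(self._path(file_id, chunk_index), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def invalidate(self, file_id, start, end):
        """Hook 2: drop every cached chunk overlapping bytes [start, end)."""
        if end <= start:
            return 0
        first = start // self.chunk_size
        last = (end - 1) // self.chunk_size
        removed = 0
        for index in range(first, last + 1):
            try:
                os.remove(self._path(file_id, index))
                removed += 1
            except FileNotFoundError:
                pass  # chunk was never cached
        return removed


def demo():
    """Store three tiny chunks, then invalidate one byte in the middle one."""
    with tempfile.TemporaryDirectory() as d:
        cache = DiskChunkCache(d, chunk_size=4)
        for i, data in enumerate([b"aaaa", b"bbbb", b"cccc"]):
            cache.store("f", i, data)
        removed = cache.invalidate("f", 5, 6)
        return removed, cache.lookup("f", 0), cache.lookup("f", 1)
```

In a real integration the invalidation call would presumably be driven by lock cancellation on the client, so that the disk cache never outlives the lock protecting the data; that coupling is not modelled here.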
We are eagerly awaiting the results of testing this configuration!
- peter -
* [Lustre-devel] Wide area use of Lustre and client caches
2008-05-09 4:55 [Lustre-devel] Wide area use of Lustre and client caches Peter Braam
@ 2008-05-09 14:25 ` Brian J. Murrell
2008-05-09 15:08 ` Peter Braam
0 siblings, 1 reply; 8+ messages in thread
From: Brian J. Murrell @ 2008-05-09 14:25 UTC (permalink / raw)
To: lustre-devel
On Thu, 2008-05-08 at 22:55 -0600, Peter Braam wrote:
>
> His use pattern was interesting: a number of Windows clients must be
> browsing files stored in Lustre in this remote location. It was
> expected that the files would be fairly large, would be viewed by
> multiple clients, and that few or no modifications would be made.
Even still it's useful during implementation to think of the use case of
that remote client having read a file and caching and holding a read
lock on that file, say 1GB in size, and then another client wanting to
update say, 1KB in the middle of the file. It would be beneficial for
that 1GB file to have a small (but still practical) stripe size so that
the amount of cache that needs to be thrown away to accommodate the
write is relatively small.
b.
* [Lustre-devel] Wide area use of Lustre and client caches
2008-05-09 14:25 ` Brian J. Murrell
@ 2008-05-09 15:08 ` Peter Braam
2008-05-09 17:08 ` Brian J. Murrell
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Peter Braam @ 2008-05-09 15:08 UTC (permalink / raw)
To: lustre-devel
Nono - striping should only be used to get more bandwidth from servers. The
correct solution to the problem you point out is a lock conversion, planned
long ago, still far away maybe (Nikita?).
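One way to picture what such a conversion would buy: instead of cancelling the whole read lock when a conflicting write arrives, the held extent is shrunk around the conflict and the rest of the cached data stays valid. The sketch below is only an illustration of that idea under simplifying assumptions; `shrink_read_lock` is a hypothetical helper, extents are plain half-open byte ranges, and real extent locks would be aligned to page boundaries at least.

```python
GIB = 1 << 30


def shrink_read_lock(held, conflict):
    """Return the pieces of read extent `held` left after removing `conflict`.

    Both arguments are half-open (start, end) byte ranges.
    """
    h_start, h_end = held
    c_start, c_end = conflict
    if c_end <= h_start or c_start >= h_end:
        return [held]  # no overlap, nothing to give up
    remaining = []
    if c_start > h_start:
        remaining.append((h_start, c_start))  # part before the write
    if c_end < h_end:
        remaining.append((c_end, h_end))  # part after the write
    return remaining


def bytes_covered(extents):
    """Total number of bytes covered by a list of extents."""
    return sum(end - start for start, end in extents)


# A 1 GB file held under one read lock, and a 1 KB write in the middle.
held = (0, GIB)
write = (GIB // 2, GIB // 2 + 1024)
kept = shrink_read_lock(held, write)
```

In this model the reader keeps two extents covering all but 1 KB of the file, whereas cancelling the whole lock would discard the full 1 GB of cache.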
Peter
On 5/9/08 8:25 AM, "Brian J. Murrell" <Brian.Murrell@Sun.COM> wrote:
> On Thu, 2008-05-08 at 22:55 -0600, Peter Braam wrote:
>>
>> His use pattern was interesting: a number of Windows clients must be
>> browsing files stored in Lustre in this remote location. It was
>> expected that the files would be fairly large, would be viewed by
>> multiple clients, and that few or no modifications would be made.
>
> Even still it's useful during implementation to think of the use case of
> that remote client having read a file and caching and holding a read
> lock on that file, say 1GB in size, and then another client wanting to
> update say, 1KB in the middle of the file. It would be beneficial for
> that 1GB file to have a small (but still practical) stripe size so that
> the amount of cache that needs to be thrown away to accommodate the
> write is relatively small.
>
> b.
* [Lustre-devel] Wide area use of Lustre and client caches
2008-05-09 15:08 ` Peter Braam
@ 2008-05-09 17:08 ` Brian J. Murrell
2008-05-09 18:17 ` Nikita Danilov
2008-05-09 23:01 ` Andreas Dilger
2 siblings, 0 replies; 8+ messages in thread
From: Brian J. Murrell @ 2008-05-09 17:08 UTC (permalink / raw)
To: lustre-devel
On Fri, 2008-05-09 at 09:08 -0600, Peter Braam wrote:
> Nono - striping should only be used to get more bandwidth from servers. The
> correct solution to the problem you point out is a lock conversion,
Indeed, I have heard that term being used and figured that that is what
it was all about, however...
> planned
> long ago, still far away maybe (Nikita?).
Given that it's not yet available and won't be for some time, isn't
breaking a file up into many objects via striping a sufficient
workaround for the problem?
Are there other problems caused by striping a file (that technically
doesn't need more bandwidth) that outweigh the benefits of not having to
toss so much cache away when a file is written to (i.e. assuming the
cost of reading the file is high such as it would be on a WAN link)?
b.
* [Lustre-devel] Wide area use of Lustre and client caches
2008-05-09 15:08 ` Peter Braam
2008-05-09 17:08 ` Brian J. Murrell
@ 2008-05-09 18:17 ` Nikita Danilov
2008-05-09 23:01 ` Andreas Dilger
2 siblings, 0 replies; 8+ messages in thread
From: Nikita Danilov @ 2008-05-09 18:17 UTC (permalink / raw)
To: lustre-devel
Peter Braam writes:
> Nono - striping should only be used to get more bandwidth from servers. The
> correct solution to the problem you point out is a lock conversion, planned
> long ago, still far away maybe (Nikita?).
Yes, it's far away. Interestingly, similar lock "conversion" for
meta-data locks is required for write-back cache, when a sub-tree lock
is split into a set of sub-sub-tree sub-locks.
>
> Peter
Nikita.
* [Lustre-devel] Wide area use of Lustre and client caches
2008-05-09 15:08 ` Peter Braam
2008-05-09 17:08 ` Brian J. Murrell
2008-05-09 18:17 ` Nikita Danilov
@ 2008-05-09 23:01 ` Andreas Dilger
2 siblings, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2008-05-09 23:01 UTC (permalink / raw)
To: lustre-devel
On May 09, 2008 09:08 -0600, Peter J. Braam wrote:
> Nono - striping should only be used to get more bandwidth from servers. The
> correct solution to the problem you point out is a lock conversion, planned
> long ago, still far away maybe (Nikita?).
This task is something that Oleg has been working on occasionally.
I think there are patches around, but they are fairly old (though not as
bad as might be expected, because LDLM code changes relatively slowly).
> On 5/9/08 8:25 AM, "Brian J. Murrell" <Brian.Murrell@Sun.COM> wrote:
> > Even still it's useful during implementation to think of the use case of
> > that remote client having read a file and caching and holding a read
> > lock on that file, say 1GB in size, and then another client wanting to
> > update say, 1KB in the middle of the file. It would be beneficial for
> > that 1GB file to have a small (but still practical) stripe size so that
> > the amount of cache that needs to be thrown away to accommodate the
> > write is relatively small.
Having multiple stripes would save invalidation of (nstripes - 1) / nstripes
of the file, but in general the "update in the middle" paradigm is very
rare in real life, so in practice I don't think this will help much.
Even the "add a byte in the middle of a text file" case always causes
the whole file to be rewritten because of backing up the old file.
The only common applications I'm aware of that do partial-file read/write
operations are databases and peer-to-peer file sharing.
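The (nstripes - 1) / nstripes figure can be checked with a little arithmetic. The sketch below assumes round-robin striping of fixed-size chunks across the stripe objects, and that a write invalidates the reader's cache for every object it touches and nothing else; the function names are hypothetical.

```python
MIB = 1 << 20


def stripe_objects_touched(offset, length, stripe_size, stripe_count):
    """Indices of the stripe objects covering bytes [offset, offset + length)."""
    if length <= 0:
        return set()
    first_chunk = offset // stripe_size
    last_chunk = (offset + length - 1) // stripe_size
    touched = set()
    for chunk in range(first_chunk, last_chunk + 1):
        touched.add(chunk % stripe_count)  # round-robin placement
        if len(touched) == stripe_count:
            break  # every object is already affected
    return touched


def fraction_kept(offset, length, stripe_size, stripe_count):
    """Fraction of the stripe objects whose cached data survives the write."""
    touched = stripe_objects_touched(offset, length, stripe_size, stripe_count)
    return (stripe_count - len(touched)) / stripe_count


# 1 KB write in the middle of a 1 GB file, 1 MiB stripe size, 4 stripes.
middle = 512 * MIB
aligned = fraction_kept(middle, 1024, MIB, 4)
# The same write shifted so that it straddles a stripe boundary.
straddling = fraction_kept(middle - 512, 1024, MIB, 4)
# An unstriped file loses everything.
unstriped = fraction_kept(middle, 1024, MIB, 1)
```

With four stripes the aligned write keeps 3/4 of the objects cached, matching (nstripes - 1) / nstripes. A write that straddles a stripe boundary touches two objects and keeps only half, so the formula is a best case for small writes.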
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* [Lustre-devel] Wide area use of Lustre and client caches
[not found] <372618435.977041214909597153.JavaMail.root@dahlback.prod.local>
@ 2008-07-01 11:03 ` Daire Byrne
2008-07-01 14:56 ` Peter Braam
0 siblings, 1 reply; 8+ messages in thread
From: Daire Byrne @ 2008-07-01 11:03 UTC (permalink / raw)
To: lustre-devel
Peter,
I assume the same rationale holds for NFS exporting too? I'm toying with the idea of putting lots of RAM in a server and exporting our LustreFS over NFS. We have some workloads which do a lot of seeking through a reasonably small set (~32 GB worth) of files, which may perform better if an NFS server caches the dataset and consequently doesn't have to do any disk seeks. Obviously this is not particularly scalable (cheaply), but in small-scale tests it seems to perform better than seeking directly from Lustre.
The "open lock" stuff you mention is the work going on in #14975, right? Using Lustre 1.6.5 server/client, it seems like I can already get line speed (GigE) reads over NFS for a single file once the Lustre client on the NFS server has cached it. But I have not tested this at scale with many clients and files simultaneously.
While we wait for Lustre caching (I assume the work done in #12182 is dead in the water?), this may be the best way for us to deal with heavy seek+read workloads. Our use of SATA-based hardware RAID arrays doesn't help our seek performance either.
Daire
* [Lustre-devel] Wide area use of Lustre and client caches
2008-07-01 11:03 ` Daire Byrne
@ 2008-07-01 14:56 ` Peter Braam
0 siblings, 0 replies; 8+ messages in thread
From: Peter Braam @ 2008-07-01 14:56 UTC (permalink / raw)
To: lustre-devel
Yes, it should help a lot - and it should apply in the same way. The open
cache is automatically used by the NFS server (which runs in kernel space).
Any volunteers to write the cachefs interfaces for us?
Peter