* [Lustre-devel] Reducing amount of glimpses
@ 2008-02-02 6:26 Oleg Drokin
2008-02-02 6:39 ` Eric Barton
2008-02-02 11:22 ` Nikita Danilov
0 siblings, 2 replies; 5+ messages in thread
From: Oleg Drokin @ 2008-02-02 6:26 UTC (permalink / raw)
To: lustre-devel
Hello!
Doing some large-scale testing at ORNL, an interesting pattern came up.
Suppose we are doing large-scale IOR testing on a shared file.
Some unlucky client does its writing at the highest offset (or, at the
beginning, was unlucky enough to grab a whole-object PW lock).
As other clients do their writes, they do glimpses first to find out
the file size. Now those glimpses turn into thousands of glimpse
requests to that poor client, many of them actually arriving in parallel.
So I was thinking: perhaps it would be nice for filter_intent_policy()
to check whether there are any glimpse requests already in flight to
that client for that same lock, and if there are, just wait for that
request to return and use the data from it?
Of course, a potential caveat here is that we have no way to tell whether
the request reached the client by the time we started our processing,
so potentially we might get size data that is a bit stale, but I wonder
if this is critical enough in our case?
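To make the proposal concrete, here is a minimal user-space sketch of the coalescing idea in Python. It is not Lustre code: `GlimpseCoalescer`, `send_glimpse_ast`, and the `lock_id` key are hypothetical names chosen for illustration, and the real change would presumably live in the server-side intent policy. The sketch only shows the pattern: the first requester for a lock sends the glimpse, and anyone arriving while it is in flight waits for that same reply.

```python
import threading
from concurrent.futures import Future


class GlimpseCoalescer:
    """Share one in-flight glimpse per lock among concurrent requesters."""

    def __init__(self, send_glimpse_ast):
        # send_glimpse_ast(lock_id) -> size is a hypothetical stand-in for
        # the RPC the server sends to the client holding the lock.
        self._send = send_glimpse_ast
        self._mutex = threading.Lock()
        self._in_flight = {}   # lock_id -> Future holding the pending reply
        self.asts_sent = 0     # glimpse ASTs actually sent
        self.coalesced = 0     # requests that reused an in-flight AST

    def glimpse(self, lock_id):
        with self._mutex:
            fut = self._in_flight.get(lock_id)
            leader = fut is None
            if leader:
                fut = Future()
                self._in_flight[lock_id] = fut
                self.asts_sent += 1
            else:
                self.coalesced += 1

        if not leader:
            # Wait for the reply to the AST that is already in flight.
            return fut.result()

        try:
            size = self._send(lock_id)
        except BaseException as exc:
            with self._mutex:
                del self._in_flight[lock_id]
            fut.set_exception(exc)
            raise
        # Drop the entry before publishing, so that a request arriving
        # after the reply triggers a fresh AST instead of reusing old data.
        with self._mutex:
            del self._in_flight[lock_id]
        fut.set_result(size)
        return size
```

With this shape, the staleness is bounded by the lifetime of one request: a waiter can only receive a size that was sampled after the first AST was sent, never a cached value from an earlier, completed glimpse.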
Any ideas?
Bye,
Oleg
* [Lustre-devel] Reducing amount of glimpses
2008-02-02 6:26 [Lustre-devel] Reducing amount of glimpses Oleg Drokin
@ 2008-02-02 6:39 ` Eric Barton
2008-02-02 7:00 ` Oleg Drokin
2008-02-02 11:22 ` Nikita Danilov
1 sibling, 1 reply; 5+ messages in thread
From: Eric Barton @ 2008-02-02 6:39 UTC (permalink / raw)
To: lustre-devel
Oleg,
Reduction (as you describe) is an absolutely essential strategy
to achieve scalability. So without having checked through all the
details of your idea, it feels like absolutely the right solution.
If your idea not only changes latencies but also significant orderings,
I'll feel less secure. Have you thought it all through?
Cheers,
Eric
* [Lustre-devel] Reducing amount of glimpses
2008-02-02 6:39 ` Eric Barton
@ 2008-02-02 7:00 ` Oleg Drokin
2008-02-02 7:40 ` Andreas Dilger
0 siblings, 1 reply; 5+ messages in thread
From: Oleg Drokin @ 2008-02-02 7:00 UTC (permalink / raw)
To: lustre-devel
Hello!
On Feb 2, 2008, at 1:39 AM, Eric Barton wrote:
> Reduction (as you describe) is an absolutely essential strategy
> to achieve scalability. So without having checked through all the
> details of your idea, it feels like absolutely the right solution.
Yes, I am just afraid that slightly stale size data might be reported,
and I am not yet sure how critical that is to us.
> If your idea not only changes latencies but also significant
> orderings,
> I'll feel less secure. Have you thought it all through?
There are no ordering changes other than what I would consider a
perfectly normal race.
Let me explain with a control flow:
client3 holds the highest write lock. Clients 1 and 2 do glimpses.
Right now (racy scenario):
client1 sends glimpse -> server on its behalf sends glimpse AST to
client3 -> client3 receives AST, sends reply
client2 sends glimpse -> server on its behalf sends glimpse AST to
client3 ->
client3 writes some more data
glimpse on behalf of client2 arrives at client3 -> client3 sends in
updated size
server now sends different sizes back to client1 and client2.
If we consolidate, it will look like this:
client1 sends glimpse -> server on its behalf sends glimpse AST to
client3 -> client3 receives AST, sends reply
client2 sends glimpse -> server notices there is a glimpse AST in flight
already to client3, and waits for it
client3 writes some more data
server receives reply and sends identical sizes to client1 and client2.
I tend to think this is a normal race (kind of like two processes
doing stat on a local file while a third writes to the file constantly:
when the two stats are done close together, they may come back with
an identical size), but I just want to check whether there are any
objections to it.
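The two timelines above can be replayed as a tiny deterministic model. This is only an illustration of the ordering argument, not a simulation of the real protocol; the function name `replay` and the sizes 100 and 50 are arbitrary assumptions.

```python
def replay(consolidate):
    """Replay the timeline; return (size for client1, size for client2, ASTs sent)."""
    file_size = 100   # size client3 knows when the first AST arrives (arbitrary)
    asts_sent = 0

    # client1 glimpses: the server sends an AST to client3, which replies
    # with its current size. The reply is now on its way back.
    asts_sent += 1
    reply_for_client1 = file_size

    # client2 glimpses while that first reply is still in flight.
    if not consolidate:
        asts_sent += 1    # today: a second AST goes out to client3

    # client3 writes some more data.
    file_size += 50

    if consolidate:
        # The server reuses the reply to the AST that was already in flight.
        reply_for_client2 = reply_for_client1
    else:
        # The second AST reaches client3 after the write.
        reply_for_client2 = file_size

    return reply_for_client1, reply_for_client2, asts_sent


print(replay(consolidate=False))   # (100, 150, 2)
print(replay(consolidate=True))    # (100, 100, 1)
```

In both runs each client sees a size that was valid at some point after its glimpse was issued or while an equivalent one was pending; consolidation only removes the second AST.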
Bye,
Oleg
* [Lustre-devel] Reducing amount of glimpses
2008-02-02 7:00 ` Oleg Drokin
@ 2008-02-02 7:40 ` Andreas Dilger
0 siblings, 0 replies; 5+ messages in thread
From: Andreas Dilger @ 2008-02-02 7:40 UTC (permalink / raw)
To: lustre-devel
On Feb 02, 2008 02:00 -0500, Oleg Drokin wrote:
> On Feb 2, 2008, at 1:39 AM, Eric Barton wrote:
> > Reduction (as you describe) is an absolutely essential strategy
> > to achieve scalability. So without having checked through all the
> > details of your idea, it feels like absolutely the right solution.
>
> Yes, I am just afraid that slightly stale size data might be reported,
> and I am not yet sure how critical that is to us.
A glimpse enqueue is already inherently racy, so I don't think this
is a serious problem.
> > If your idea not only changes latencies but also significant
> > orderings, I'll feel less secure. Have you thought it all through?
>
> There are no ordering changes other than what I would consider a
> perfectly normal race. Let me explain with a control flow:
>
> client3 holds highest write lock. clients 1 and 2 do glimpse.
> Right now (racy scenario):
>
> client 1 sends glimpse->server on its behalf sends glimpse ast to
> client3 -> client3 receives ast, sends reply
> client 2 sends glimpse->server on its behalf sends glimpse ast to
> client3 ->
> client3 writes some more data
> glimpse on behalf of client2 arrives to client3 -> client3 sends in
> updated size
> server now sends different sizes back to client 1 and client2.
>
> If we consolidate, it will look like this:
> client 1 sends glimpse->server on its behalf sends glimpse ast to
> client3 -> client3 receives ast, sends reply
> client 2 sends glimpse->server notices there is glimpse ast in flight
> already to client3, and waits for it
> client3 writes some more data
> server receives reply and sends identical sizes to client1 and client2.
>
> I tend to think this is normal race (kind of like with 2 processes
> doing stat on a local file and 3rd writing to file constantly,
> it is possible that when two stats are done closely apart, they would
> come with identical size), but just want to check if
> there are no objections about it.
The uses of glimpse are only in a few cases:
- ls -l (inherently racy)
- read
- write
In the read and write cases the glimpse return is used to determine whether
the file size is before or after the given client's lock. Since the client
that is getting the glimpse is holding the EOF lock, this can't change the
relative positions of the locks.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* [Lustre-devel] Reducing amount of glimpses
2008-02-02 6:26 [Lustre-devel] Reducing amount of glimpses Oleg Drokin
2008-02-02 6:39 ` Eric Barton
@ 2008-02-02 11:22 ` Nikita Danilov
1 sibling, 0 replies; 5+ messages in thread
From: Nikita Danilov @ 2008-02-02 11:22 UTC (permalink / raw)
To: lustre-devel
Oleg Drokin writes:
> Hello!
>
> Doing some large scale testing at ORNL, interesting pattern came up.
> Suppose we are doing large-scale IOR testing on a shared file.
> Some unlucky client does its writing at highest offset (or, at the
> beginning, was unlucky enough to grab whole-object PW lock).
> As other clients do their writes, they would do glimpses first to
> find out file size. Now those glimpses turn into thousands of glimpse
> requests to that poor client. And many of them actually coming in
> parallel.
I think there is a way to further reduce the number of glimpses sent in
the read/write paths. When a client is doing a read and the region being
read ends in a certain stripe, it is often enough to send a glimpse
request only to the OST where this stripe is located. Indeed, if the
stripe is not a hole, then the size of this stripe alone is sufficient to
determine whether we are in a short-read situation or not. Of course, if
the read happens to end in a hole, the client needs to send more glimpses,
but in the most common case of a non-sparse file, 1 RPC seems to be enough.
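As a rough sketch of the arithmetic, assuming a plain round-robin (RAID-0 style) layout with a fixed stripe size and stripe count: the helpers `locate` and `needs_more_glimpses` and the callback `object_size_of` are hypothetical names for illustration, not Lustre functions, and the read length is assumed to be positive.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a file offset to (object index, offset inside that object)."""
    stripe_no = offset // stripe_size
    obj_index = stripe_no % stripe_count
    obj_offset = (stripe_no // stripe_count) * stripe_size + offset % stripe_size
    return obj_index, obj_offset


def needs_more_glimpses(read_start, read_len, stripe_size, stripe_count,
                        object_size_of):
    """Glimpse only the object holding the last byte of the read.

    object_size_of(obj_index) stands in for a single glimpse RPC.
    Returns (object index glimpsed, whether further glimpses are needed).
    """
    last_byte = read_start + read_len - 1
    obj_index, obj_offset = locate(last_byte, stripe_size, stripe_count)
    size = object_size_of(obj_index)
    # If the object extends past the last byte, the file does too, so the
    # read is not short. Otherwise this may be a hole or the real EOF,
    # and the other objects have to be asked as well.
    return obj_index, size <= obj_offset


MiB = 1 << 20
# Read of [5 MiB, 6 MiB) on a 4-object file with 1 MiB stripes; every
# object is assumed to report a size of 2 MiB.
print(needs_more_glimpses(5 * MiB, MiB, MiB, 4, lambda i: 2 * MiB))   # (1, False)
```

The `size <= obj_offset` branch is the sparse case mentioned above: one object alone cannot distinguish a hole from end-of-file, so the client falls back to glimpsing the remaining objects.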
>
> Bye,
> Oleg
Nikita.