[Lustre-devel] 2 primitives for the NRS

From: Eric Barton <eeb@sun.com>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] 2 primitives for the NRS
Date: Fri, 04 Jan 2008 18:19:44 +0000
Message-ID: <002001c84efe$61cec660$0281a8c0@ebpc>
In-Reply-To: <477E662E.30803@gmail.com>

Wonderful questions! 

> I am trying to plug an I/O request scheduler into the OSS before
> read/write requests get dispatched to the obdfilter. 

If you base your code off b1_6, you can take a free ride on the initial
request processing done by Nathan's adaptive timeout code.

> What I am using
> is hashed bins with a basic rb tree, assuming that would be
> reasonable for the number of I/O requests that can reach an
> OSS. My interface calls are very similar to yours, except for the
> lack of a plug-in comparison. I don't have more to suggest on this,
> but I do have a couple of questions, in case you have any thoughts
> on them.
> 
> (1) How are you going to order the requests, say the read/write
> ones? I assume you made it flexible with a plug-in compare(). 

Yes - because I don't know yet what's going to work best.  And
different services may want different orders.  And generalisation
when it doesn't hurt performance is a hard habit to break :)
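A minimal sketch of what a plug-in compare() could look like, assuming a binary heap of request pointers rather than the hashed bins and rb tree mentioned above. All names here (struct nrs_req, nrs_compare_t, nrs_heap_*, nrs_cmp_object_offset) are hypothetical, not Lustre APIs, and the object-ID/offset ordering is just one example policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request descriptor; a real RPC carries far more state. */
struct nrs_req {
	unsigned long client_id;
	unsigned long object_id;
	unsigned long offset;
};

/* Plug-in ordering policy: return < 0 if a should be served before b. */
typedef int (*nrs_compare_t)(const struct nrs_req *a, const struct nrs_req *b);

/* Binary min-heap of request pointers, ordered by the plug-in. */
struct nrs_heap {
	struct nrs_req **slots;
	size_t           count;
	size_t           capacity;
	nrs_compare_t    compare;
};

static int nrs_heap_init(struct nrs_heap *h, size_t capacity, nrs_compare_t cmp)
{
	assert(cmp != NULL);
	h->slots = calloc(capacity, sizeof(*h->slots));
	h->count = 0;
	h->capacity = capacity;
	h->compare = cmp;
	return h->slots == NULL ? -1 : 0;
}

static void nrs_swap(struct nrs_req **a, struct nrs_req **b)
{
	struct nrs_req *tmp = *a;
	*a = *b;
	*b = tmp;
}

/* Insert and sift up, O(log n).  Returns -1 if the heap is full. */
static int nrs_heap_push(struct nrs_heap *h, struct nrs_req *req)
{
	size_t i;

	if (h->count == h->capacity)
		return -1;
	i = h->count++;
	h->slots[i] = req;
	while (i > 0) {
		size_t parent = (i - 1) / 2;
		if (h->compare(h->slots[i], h->slots[parent]) >= 0)
			break;
		nrs_swap(&h->slots[i], &h->slots[parent]);
		i = parent;
	}
	return 0;
}

/* Remove the request the plug-in ranks first; NULL if empty. */
static struct nrs_req *nrs_heap_pop(struct nrs_heap *h)
{
	struct nrs_req *top;
	size_t i = 0;

	if (h->count == 0)
		return NULL;
	top = h->slots[0];
	h->slots[0] = h->slots[--h->count];
	for (;;) {		/* sift the moved element down */
		size_t l = 2 * i + 1, r = l + 1, best = i;
		if (l < h->count && h->compare(h->slots[l], h->slots[best]) < 0)
			best = l;
		if (r < h->count && h->compare(h->slots[r], h->slots[best]) < 0)
			best = r;
		if (best == i)
			break;
		nrs_swap(&h->slots[i], &h->slots[best]);
		i = best;
	}
	return top;
}

/* Example plug-in: order by object ID, then by offset within it. */
static int nrs_cmp_object_offset(const struct nrs_req *a, const struct nrs_req *b)
{
	if (a->object_id != b->object_id)
		return a->object_id < b->object_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}
```

Changing policy means passing a different comparator to nrs_heap_init(); the heap code itself stays the same.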

My first thoughts are about fairness to all clients in the face of
unfairness elsewhere - e.g. a gridlocked network - so I'm thinking of
something that picks buffered RPC requests round-robin on client ID.
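A rough sketch of round-robin selection on client ID, assuming client IDs map directly onto a small fixed array of per-client FIFO queues. That mapping is a hypothetical simplification (a real server would need a lookup structure for an open-ended set of clients), and rr_sched, rr_enqueue and rr_dequeue are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

#define RR_MAX_CLIENTS 8   /* hypothetical: client IDs 0..7 */
#define RR_QUEUE_DEPTH 16  /* hypothetical per-client buffer limit */

struct rr_queue {
	int    reqs[RR_QUEUE_DEPTH];  /* buffered request IDs, FIFO ring */
	size_t head;
	size_t count;
};

struct rr_sched {
	struct rr_queue clients[RR_MAX_CLIENTS];
	size_t          next;  /* client to consider first on next dequeue */
};

/* Buffer a request from client_id; -1 if that client's queue is full. */
static int rr_enqueue(struct rr_sched *s, size_t client_id, int req)
{
	struct rr_queue *q;

	assert(client_id < RR_MAX_CLIENTS);
	q = &s->clients[client_id];
	if (q->count == RR_QUEUE_DEPTH)
		return -1;
	q->reqs[(q->head + q->count) % RR_QUEUE_DEPTH] = req;
	q->count++;
	return 0;
}

/* Take one request from the next client with work pending, so each
 * backlogged client gets one request per round however many it has
 * buffered.  Returns -1 when every queue is empty. */
static int rr_dequeue(struct rr_sched *s)
{
	for (size_t i = 0; i < RR_MAX_CLIENTS; i++) {
		size_t c = (s->next + i) % RR_MAX_CLIENTS;
		struct rr_queue *q = &s->clients[c];

		if (q->count > 0) {
			int req = q->reqs[q->head];

			q->head = (q->head + 1) % RR_QUEUE_DEPTH;
			q->count--;
			s->next = (c + 1) % RR_MAX_CLIENTS;
			return req;
		}
	}
	return -1;
}
```

A client that floods the server with requests still gets only one turn per round, so it cannot starve clients that have just one request buffered.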

This is probably good for workloads made up of large numbers of
single-client jobs, since it ensures that no individual client can be
starved.  However, I'd argue it's also good for I/O performed by a
single job spread over many clients.

I base this on the idea that a good backend filesystem should and can
optimize disk utilisation.  When a file is written, the file<->disk
offset mapping is fixed for subsequent reads, so I want the NRS to
make I/O request execution order repeatable in the face of network
"noise" and races between clients.  Without this repeatability, we
have to fall back on the disk elevator to re-create the "good" disk
I/O stream on subsequent reads.  Surely it cannot do as good a job
as the NRS, since it has orders of magnitude fewer requests to play
with: bulk buffers must already be allocated by the time it sees them.
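To illustrate the repeatability argument, here is a sketch that sorts a batch of buffered requests on a deterministic key, assumed here to be (object ID, offset); the key, struct io_req and nrs_order_batch() are hypothetical. Two different arrival orders of the same requests come out in the same execution order. Note that qsort() is not stable, so this only holds when keys are distinct, and a real NRS only sees the requests currently buffered, not the whole I/O pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffered I/O request. */
struct io_req {
	unsigned long object_id;
	unsigned long offset;
	int           client_id;
};

/* qsort() comparator: deterministic order by (object_id, offset). */
static int io_req_cmp(const void *pa, const void *pb)
{
	const struct io_req *a = pa, *b = pb;

	if (a->object_id != b->object_id)
		return a->object_id < b->object_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Sort a batch in place.  Whatever order the network delivered the
 * requests in, distinct keys always come out in the same sequence;
 * requests with equal keys may come out in either order. */
static void nrs_order_batch(struct io_req *batch, size_t n)
{
	assert(batch != NULL || n == 0);
	qsort(batch, n, sizeof(*batch), io_req_cmp);
}
```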

See http://arch.lustre.org/index.php?title=Network_Request_Scheduler

I'm afraid I don't yet have anything even half baked to say on
write v. read order etc.  I'd still want some empirical evidence first.

> Would the order of the I/O requests based on object ID have some
> relevance to their locality on the disks?

I think it might make more sense for the backend F/S to use a job ID
to help it create sequential disk offsets for the whole I/O pattern
rather than anything coming from one individual client.

> I was assuming at least the requests
> can get smoothed out with the objID ordering.
> 
> (2) Have you checked the overhead when there are many concurrent
> threads competing for the locks associated with your heap?  The
> performance impact thereof?

I've only done sequential timings so far.  NRS ops could be "Amdahl
moments" (serialized sections that limit scaling) for the whole
server, so fat SMPs may need more careful handling.
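For reference, Amdahl's law bounds the speedup S on N CPUs when a fraction s of the work is serialized, for example time spent holding a single NRS lock. The s = 0.05 and N = 16 below are purely illustrative numbers, not measurements:

```latex
S(N) = \frac{1}{s + \frac{1 - s}{N}}, \qquad
S(16)\big|_{s = 0.05} = \frac{1}{0.05 + 0.059375} \approx 9.14, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{s} = 20
```

Even a small serialized fraction caps the benefit of adding CPUs, which is why lock contention around the scheduler matters on large SMPs.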

> (3) Do you anticipate merging the requests in any way, or possibly
> batch-executing them?

Yes, but I'm such a lazy sod that I hope the disk elevator will smile
on me.  If not, and I have to roll up my sleeves, so be it.
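If the elevator doesn't oblige, one simple form of merging is coalescing requests that are exactly contiguous on the same object once they are in (object, offset) order. This is a sketch under that assumption; struct io_extent and nrs_merge_sorted() are hypothetical names, and the code tracks extents only, not the bulk data that would also have to be stitched together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte range of one I/O request on one object. */
struct io_extent {
	unsigned long object_id;
	unsigned long offset;
	unsigned long len;
};

/* Coalesce extents already sorted by (object_id, offset): a neighbour
 * on the same object that starts exactly where the current extent ends
 * is folded into it.  Works in place and returns the new count. */
static size_t nrs_merge_sorted(struct io_extent *ext, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++) {
		struct io_extent *cur = &ext[out];

		/* Input must be sorted; merging relies on it. */
		assert(ext[i].object_id > cur->object_id ||
		       (ext[i].object_id == cur->object_id &&
			ext[i].offset >= cur->offset));
		if (ext[i].object_id == cur->object_id &&
		    ext[i].offset == cur->offset + cur->len)
			cur->len += ext[i].len;
		else
			ext[++out] = ext[i];
	}
	return out + 1;
}
```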

    Cheers,
              Eric

Thread overview: 8+ messages
2008-01-04 13:37 [Lustre-devel] 2 primitives for the NRS Eric Barton
2008-01-04 17:00 ` Weikuan Yu
2008-01-04 18:19   ` Eric Barton [this message]
2008-01-04 17:40 ` Lee Ward
2008-01-04 17:51   ` Eric Barton
2008-01-04 18:36   ` Eric Barton
2008-01-05  0:36     ` Lee Ward
2008-01-05 12:19 ` Alex Lyashkov
