From: Olaf Weber <olaf@sgi.com>
To: lustre-devel@lists.lustre.org
Subject: [lustre-devel] Multi-rail networking for Lustre
Date: Fri, 22 Jan 2016 15:31:14 +0100
Message-ID: <56A23D32.9080508@sgi.com>
In-Reply-To: <CAJ2e-W1rGCimMaEmdG3h2+PLJvcggZBV6j8sV+hfb04Q2qkvEg@mail.gmail.com>

On 22-01-16 10:08, Alexey Lyashkov wrote:
>
>
> On Thu, Jan 21, 2016 at 11:30 PM, Olaf Weber <olaf@sgi.com> wrote:
>
>     On 21-01-16 20:16, Alexey Lyashkov wrote:

[...]

> In Lustre terms each mount point is a separate client. It has its own cache,
> its own structures, and is completely separate from every other one.
> One exception is the ldlm cache, which lives in the global object ID space.

Another exception is flock deadlock detection, which is always a global 
operation. This is why ldlm_flock_deadlock() inspects c_peer.nid.

[...]

> The whole Lustre stack operates on UUIDs, and it makes no difference where a
> UUID lives. We may migrate a service / client from one network address to
> another without a logical reconnect. That is my main objection to your ideas.
> If a node has several addresses, LNet should be responsible for reliable
> delivery of one-way requests, which logically connect to PtlRPC. If a node
> needs to use different routing and different NIDs for communication, that
> should be hidden in LNet, and LNet should provide as high-level an API as
> possible.

The basic idea behind the multi-rail design is that LNet figures out how to 
send a message to a peer. But the user of LNet can provide a hint to 
indicate that for a specific message a specific path is preferred.
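
To make the division of labour concrete, here is a minimal sketch of the
selection step. None of this is actual LNet code: the types, NID_ANY, and
select_path() are hypothetical names made up for illustration, and "fewest
queued messages" is just a stand-in for whatever policy LNet ends up using.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the real LNet definitions. */
typedef uint64_t nid_t;
#define NID_ANY ((nid_t)0)      /* hypothetical "no preference" hint */

struct peer_path {
    nid_t    nid;               /* peer NID reachable over this path */
    int      healthy;           /* non-zero if the path is believed usable */
    unsigned queued;            /* messages currently queued on this path */
};

/*
 * Pick a path to a peer. If the caller's hint names a healthy path, use it.
 * Otherwise fall back to the healthy path with the fewest queued messages
 * (an assumed policy). Returns the index of the chosen path, or -1 if no
 * path is usable.
 */
int select_path(const struct peer_path *paths, size_t n, nid_t hint)
{
    int best = -1;

    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!paths[i].healthy)
            continue;
        if (hint != NID_ANY && paths[i].nid == hint)
            return (int)i;      /* the hint wins when it is usable */
        if (best < 0 || paths[i].queued < paths[(size_t)best].queued)
            best = (int)i;
    }
    return best;
}
```

The point of the sketch is that the hint is only a preference: when the hinted
path is down, the selection silently falls back to another path, and the
caller needs no additional API to handle that case.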

One of our goals is to keep changes to the LNet API small.

>         I expect you know about the situation where one DNS name has several
>         addresses,
>         like several 'A' records in a DNS zone file.
>
>
>     Sure, but when one name points to several machines, it does not help me
>     balance traffic over the interfaces of just one machine.
>
>
> Simple balancing may be DNS based - just round robin, as we have now in the
> IB / sock LNDs. Isn't that balancing?
> If you are talking about something more serious, you should start from good
> flow control between nodes. Probably ideas from the RIP and LACK protocols
> will help.

There is bonding/balancing in socklnd. There is none in o2iblnd.

[...]

>     A PtlRPC RPC has structure. The first LNetPut() transmits just the
>     header information. Then one or more LNetPut() or LNetGet() messages are
>     done to transmit the rest of the request. Then the response follows,
>     which also consists of several LNetPut() or LNetGet() messages.
>
> That's wrong. It looks like you are mixing up RPCs and bulk transfers.

Difference in terminology: I tend to think of an RPC as a request/response 
pair (if there is a response), and these in turn include all traffic related 
to the RPC, including any bulk transfers.

[...]

>     The lustre_uuid_to_peer() function enumerates all NIDs associated with
>     the UUID. This includes the primary NID, but also includes the other
>     NIDs. So we find a preferred peer NID based on that. Then we modify the
>     code like this:
>
> Why should PtlRPC know those low-level details? Currently we have a
> problem: when one of the destination NIDs is unreachable, the transfer
> initiator needs a full ptlrpc reconnect to resend to a different NID. But as
> you should be have a resend

Within LNet a resend can be triggered from lnet_finalize() after a failed 
attempt to send the message has been decommitted. (Otherwise multiple send 
attempts will need to be tracked at the same time.)
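
As a rough sketch of that ordering (decommit first, then retry), with the
caveat that struct msg, msg_send(), msg_finalize() and msg_deliver() are
hypothetical names for illustration and not the real lnet_finalize() code
path:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PATHS 4

/* Hypothetical message state; not the real LNet message structure. */
struct msg {
    int paths[MAX_PATHS];   /* simulated outcome per path: 1 = works, 0 = fails */
    int npaths;
    int committed;          /* path the message is committed to, -1 if none */
    int attempts;           /* number of send attempts made so far */
    int done;               /* 1 = delivered, -1 = failed for good, 0 = in flight */
};

/* Commit the message to one path. Returns -1 if there is no such path. */
int msg_send(struct msg *m, int path)
{
    /* At most one attempt is ever outstanding for a message. */
    assert(m != NULL && m->committed == -1);
    if (path >= m->npaths)
        return -1;
    m->committed = path;
    m->attempts++;
    return 0;
}

/* Completion handler: decommit first, then decide whether to resend. */
void msg_finalize(struct msg *m, int status)
{
    int failed_path = m->committed;

    m->committed = -1;                      /* decommit the finished attempt */
    if (status == 0) {
        m->done = 1;
        return;
    }
    if (msg_send(m, failed_path + 1) != 0)  /* try the next path, if any */
        m->done = -1;
}

/* Driver that simulates completions until the message is done. */
int msg_deliver(struct msg *m)
{
    if (msg_send(m, 0) != 0)
        return -1;
    while (m->done == 0)
        msg_finalize(m, m->paths[m->committed] ? 0 : -1);
    return m->done == 1 ? 0 : -1;
}
```

Because the resend only happens after the failed attempt has been decommitted,
a single "committed" field is enough; there is never a need to track two
attempts for the same message.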

>     The call of LNetPrimaryNID() gives the primary peer NID for the peer
>     NID. For this to work a handful of calls to LNetPrimaryNID() must be
>     added. After that it is up to LNet to find the best route.
>
>
> Per the comment, the primary NID will be changed after we find the best one.
> Do you think the loop is useful if you replace the loop's result in any case?
> On the other hand, ptlrpc_uuid_to_peer is called only in a few cases; the
> rest of the time ptlrpc has the results cached in the ptlrpc connection info.

The main benefit of the loop becomes detecting whether the node is sending 
to itself, in which case the loopback interface must be used. Though I do 
worry about degenerate or bad configurations where not all the IP addresses 
belong to the same node.
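
A sketch of what such a check could look like, including the bad-configuration
case. The nid_t type, the enum, and both function names are hypothetical and
only serve the illustration; the real loop in PtlRPC looks different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_t;     /* hypothetical stand-in for the real NID type */

enum peer_kind {
    PEER_REMOTE = 0,        /* none of the peer's NIDs is local */
    PEER_SELF   = 1,        /* every NID is local: use loopback */
    PEER_MIXED  = -1        /* some local, some not: bad configuration */
};

/* Return 1 if nid is one of this node's own NIDs, 0 otherwise. */
int nid_is_local(nid_t nid, const nid_t *local, size_t nlocal)
{
    for (size_t i = 0; i < nlocal; i++)
        if (local[i] == nid)
            return 1;
    return 0;
}

/* Walk all NIDs listed for a UUID and classify the peer. */
enum peer_kind classify_peer(const nid_t *peer, size_t npeer,
                             const nid_t *local, size_t nlocal)
{
    size_t hits = 0;

    assert(peer != NULL || npeer == 0);
    for (size_t i = 0; i < npeer; i++)
        hits += (size_t)nid_is_local(peer[i], local, nlocal);

    if (hits == 0)
        return PEER_REMOTE;
    return hits == npeer ? PEER_SELF : PEER_MIXED;
}
```

Whether a mixed result should be treated as an error or merely logged is left
open here; the sketch only shows that the case can be detected in the same
pass over the NIDs.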

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696796
                            Veldzigt 2b       Fax:    +31(0)30-6696799
Sr Software Engineer       3454 PW de Meern  Vnet:   955-6796
Storage Software           The Netherlands   Email:  olaf at sgi.com
