* [Lustre-devel] extend lnet_notify to public LNet API
@ 2010-11-16 16:00 Nic Henke
  2010-11-17  3:00 ` liang Zhen
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nic Henke @ 2010-11-16 16:00 UTC (permalink / raw)
  To: lustre-devel

We'd like to allow upper layers (Lustre, Cray DVS, etc) to register a 
callback that would be called from lnet_notify. This will allow them to 
be notified when the lower layers have seen network problems between 
NIDs and let them take appropriate action. The upper layer could also be 
notified when that peer has returned to 'network health' after the LND 
gets its act together.

This would help allow upper layers to aggressively resend/reconnect in 
the cases where all TX have completed successfully (meaning no LNet -EIO 
on LND errors) but there are LNET_MSG_ACK or other REPLY traffic 
outstanding.

Initial proposal is on the verbose side, giving all data that 
lnet_notify sees:
- lnet_nid_t
- is_alive (boolean)
- cfs_time_t when (unsigned long on Linux) - jiffies when last alive
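
As a rough illustration only, the three values listed above could be handed to the upper layer through a callback of roughly the following shape. Every name prefixed with `sketch_` is hypothetical, and the two typedefs merely stand in for `lnet_nid_t` and `cfs_time_t`, whose real definitions may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the LNet types named in the proposal. */
typedef uint64_t      sketch_nid_t;   /* stands in for lnet_nid_t */
typedef unsigned long sketch_time_t;  /* stands in for cfs_time_t (jiffies) */

/* Assumed callback shape: it receives everything lnet_notify sees. */
typedef void (*sketch_notify_cb_t)(sketch_nid_t nid, int is_alive,
                                   sketch_time_t when);

/* Example upper-layer state: remembers the most recent notification. */
struct sketch_last_event {
	sketch_nid_t  nid;
	int           is_alive;
	sketch_time_t when;
	int           seen;      /* number of notifications received */
};

static struct sketch_last_event sketch_last;

/* Example upper-layer handler matching sketch_notify_cb_t. */
static void sketch_upper_layer_cb(sketch_nid_t nid, int is_alive,
                                  sketch_time_t when)
{
	assert(is_alive == 0 || is_alive == 1);  /* boolean by contract */
	sketch_last.nid = nid;
	sketch_last.is_alive = is_alive;
	sketch_last.when = when;
	sketch_last.seen++;
}
```

An upper layer such as Lustre or DVS would supply its own handler in place of `sketch_upper_layer_cb`; how it gets registered is left open here.
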

Is this workable and likely to be accepted up-stream ?

Nic

* [Lustre-devel] extend lnet_notify to public LNet API
  2010-11-16 16:00 [Lustre-devel] extend lnet_notify to public LNet API Nic Henke
@ 2010-11-17  3:00 ` liang Zhen
  2010-11-17  7:52 ` Alexey Lyashkov
  2010-11-22 17:29 ` Nic Henke
  2 siblings, 0 replies; 7+ messages in thread
From: liang Zhen @ 2010-11-17  3:00 UTC (permalink / raw)
  To: lustre-devel

Nic,

Are you suggesting we provide a new API like:

int LNetNotificationAttach(lnet_notification_callback_t callback);

to register a global callback for LNet, which would be called on 
any lnet_notify_locked? If so, I don't see any reason we can't do this, 
at least from my point of view. One concern is that we can't get such 
a notification for remote peers, because the LNDs have no direct 
connection with them. We can only get notifications for routers, and 
the upper layers wouldn't be very interested in routers.

Also, it seems to me this is a much bigger change in the upper layers than in LNet.
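
A minimal sketch of what a single global attach call like the one suggested above might look like, assuming one callback slot and a stand-in for the point in lnet_notify_locked where the hook would run. All `sketch_` names, the callback signature, and the error codes are assumptions for illustration, not existing LNet API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_nid_t;  /* hypothetical stand-in for lnet_nid_t */
typedef void (*sketch_notification_callback_t)(sketch_nid_t nid, int alive,
                                               unsigned long when);

/* The single global slot. */
static sketch_notification_callback_t sketch_global_cb;

/* Modeled on the suggested attach call: only one callback at a time. */
static int sketch_notification_attach(sketch_notification_callback_t cb)
{
	if (cb == NULL)
		return -EINVAL;
	if (sketch_global_cb != NULL)
		return -EBUSY;   /* slot already taken */
	sketch_global_cb = cb;
	return 0;
}

/* Stand-in for the place where the notification would be delivered. */
static void sketch_notify(sketch_nid_t nid, int alive, unsigned long when)
{
	assert(alive == 0 || alive == 1);
	if (sketch_global_cb != NULL)
		sketch_global_cb(nid, alive, when);
}

/* Example callback: counts "peer down" notifications. */
static int sketch_down_events;

static void sketch_count_down(sketch_nid_t nid, int alive, unsigned long when)
{
	(void)nid;
	(void)when;
	if (!alive)
		sketch_down_events++;
}
```

With a single slot, a second caller simply gets an error back, so this form only works if exactly one upper layer wants the notifications.
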

Regards
Liang

On 11/17/10 12:00 AM, Nic Henke wrote:
> We'd like to allow upper layers (Lustre, Cray DVS, etc) to register a
> callback that would be called from lnet_notify. This will allow them to
> be notified when the lower layers have seen network problems between
> NIDs and let them take appropriate action. The upper layer could also be
> notified when that peer has returned to 'network health' after the LND
> gets its act together.
>
> This would help allow upper layers to aggressively resend/reconnect in
> the cases where all TX have completed successfully (meaning no LNet -EIO
> on LND errors) but there are LNET_MSG_ACK or other REPLY traffic
> outstanding.
>
> Initial proposal is on the verbose side, giving all data that
> lnet_notify sees:
> - lnet_nid_t
> - is_alive (boolean)
> - cfs_time_t when (unsigned long on Linux) - jiffies when last alive
>
> Is this workable and likely to be accepted up-stream ?
>
> Nic

* [Lustre-devel] extend lnet_notify to public LNet API
  2010-11-16 16:00 [Lustre-devel] extend lnet_notify to public LNet API Nic Henke
  2010-11-17  3:00 ` liang Zhen
@ 2010-11-17  7:52 ` Alexey Lyashkov
  2010-11-17 14:59   ` Nic Henke
  2010-11-22 17:29 ` Nic Henke
  2 siblings, 1 reply; 7+ messages in thread
From: Alexey Lyashkov @ 2010-11-17  7:52 UTC (permalink / raw)
  To: lustre-devel

Nic,

That idea was discussed some time ago (with green and maxim, as I remember), but there were some objections.
Currently LNet hides network flaps from the ptlrpc layer, and LNet will resend a request without notifying ptlrpc about the flap until the ptlrpc request times out.
But if ptlrpc sees a node-down event, it will try to reconnect. That produces extra overhead, because it needs to resend all the requests on the sending and delayed lists instead of only the requests affected during the network flap.
So you need to separate a network flap from a node-down situation before implementing that.
Currently a node is marked down if it doesn't respond to a request within the ptlrpc timeout, which includes network transmit and processing times, but that is different from the LNet message timeout.


On Nov 16, 2010, at 19:00, Nic Henke wrote:

> We'd like to allow upper layers (Lustre, Cray DVS, etc) to register a 
> callback that would be called from lnet_notify. This will allow them to 
> be notified when the lower layers have seen network problems between 
> NIDs and let them take appropriate action. The upper layer could also be 
> notified when that peer has returned to 'network health' after the LND 
> gets its act together.
> 
> This would help allow upper layers to aggressively resend/reconnect in 
> the cases where all TX have completed successfully (meaning no LNet -EIO 
> on LND errors) but there are LNET_MSG_ACK or other REPLY traffic 
> outstanding.
> 
> Initial proposal is on the verbose side, giving all data that 
> lnet_notify sees:
> - lnet_nid_t
> - is_alive (boolean)
> - cfs_time_t when (unsigned long on Linux) - jiffies when last alive
> 
> Is this workable and likely to be accepted up-stream ?
> 
> Nic

* [Lustre-devel] extend lnet_notify to public LNet API
  2010-11-17  7:52 ` Alexey Lyashkov
@ 2010-11-17 14:59   ` Nic Henke
  2010-11-22 17:23     ` Nic Henke
  0 siblings, 1 reply; 7+ messages in thread
From: Nic Henke @ 2010-11-17 14:59 UTC (permalink / raw)
  To: lustre-devel

On 11/17/2010 01:52 AM, Alexey Lyashkov wrote:
> Nic,
>
> That idea was discussed some time ago (with green and maxim, as I remember), but there were some objections.
> Currently LNet hides network flaps from the ptlrpc layer, and LNet will resend a request without notifying ptlrpc about the flap until the ptlrpc request times out.

I'm missing something - to my knowledge, LNet never retries messages.

> But if ptlrpc sees a node-down event, it will try to reconnect. That produces extra overhead, because it needs to resend all the requests on the sending and delayed lists instead of only the requests affected during the network flap.
> So you need to separate a network flap from a node-down situation before implementing that.
> Currently a node is marked down if it doesn't respond to a request within the ptlrpc timeout, which includes network transmit and processing times, but that is different from the LNet message timeout.

I think that is a valid upper layer decision to make, but separate from 
implementing the LNet callbacks on network 'flap'. I wouldn't want to 
force ptlrpc to use it.

Cheers,
Nic

* [Lustre-devel] extend lnet_notify to public LNet API
  2010-11-17 14:59   ` Nic Henke
@ 2010-11-22 17:23     ` Nic Henke
  0 siblings, 0 replies; 7+ messages in thread
From: Nic Henke @ 2010-11-22 17:23 UTC (permalink / raw)
  To: lustre-devel

On 11/17/2010 08:59 AM, Nic Henke wrote:
> On 11/17/2010 01:52 AM, Alexey Lyashkov wrote:
>> Nic,
>>
>> That idea was discussed some time ago (with green and maxim, as I remember), but there were some objections.
>> Currently LNet hides network flaps from the ptlrpc layer, and LNet will resend a request without notifying ptlrpc about the flap until the ptlrpc request times out.
>
> I'm missing something - to my knowledge, LNet never retries messages.
>
>> But if ptlrpc sees a node-down event, it will try to reconnect. That produces extra overhead, because it needs to resend all the requests on the sending and delayed lists instead of only the requests affected during the network flap.
>> So you need to separate a network flap from a node-down situation before implementing that.
>> Currently a node is marked down if it doesn't respond to a request within the ptlrpc timeout, which includes network transmit and processing times, but that is different from the LNet message timeout.
>
> I think that is a valid upper layer decision to make, but separate from
> implementing the LNet callbacks on network 'flap'. I wouldn't want to
> force ptlrpc to use it.

Any response to this ?

Nic

* [Lustre-devel] extend lnet_notify to public LNet API
  2010-11-16 16:00 [Lustre-devel] extend lnet_notify to public LNet API Nic Henke
  2010-11-17  3:00 ` liang Zhen
  2010-11-17  7:52 ` Alexey Lyashkov
@ 2010-11-22 17:29 ` Nic Henke
  2010-11-24 13:10   ` Eric Barton
  2 siblings, 1 reply; 7+ messages in thread
From: Nic Henke @ 2010-11-22 17:29 UTC (permalink / raw)
  To: lustre-devel

On 11/16/2010 10:00 AM, Nic Henke wrote:
> We'd like to allow upper layers (Lustre, Cray DVS, etc) to register a
> callback that would be called from lnet_notify. This will allow them to
> be notified when the lower layers have seen network problems between
> NIDs and let them take appropriate action. The upper layer could also be
> notified when that peer has returned to 'network health' after the LND
> gets its act together.
>
> This would help allow upper layers to aggressively resend/reconnect in
> the cases where all TX have completed successfully (meaning no LNet -EIO
> on LND errors) but there are LNET_MSG_ACK or other REPLY traffic
> outstanding.
>
> Initial proposal is on the verbose side, giving all data that
> lnet_notify sees:
> - lnet_nid_t
> - is_alive (boolean)
> - cfs_time_t when (unsigned long on Linux) - jiffies when last alive
>

One oddity - if the LND has peer_health disabled (no ni_peertimeout 
value), there doesn't seem to be anything that'd set the peer back to 
'up'. Am I missing something or is this as desired ?

Nic

* [Lustre-devel] extend lnet_notify to public LNet API
  2010-11-22 17:29 ` Nic Henke
@ 2010-11-24 13:10   ` Eric Barton
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Barton @ 2010-11-24 13:10 UTC (permalink / raw)
  To: lustre-devel

Nic,

There's no fundamental reason we wouldn't want to add the ability to
register callbacks to be called from lnet_notify().  But we should
take care of the following details...

0. You will never receive notifications about NIDs that are not on
   your local LNDs.  For example, if you're a client on a routed
   network, you'll only find out about router health, never about
   server health.

1. Don't just assume only 1 callback can be registered - if this is
   generally useful, several LNET users might want such notification
   and so the API and implementation should allow for it.

2. lnet_notify_locked() is where all notifications arrive.  It's
   called (as the name suggests) holding LNET locks, so any callback
   handling must complete in very well bounded time just like event
   callbacks.
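
Points 1 and 2 above can be sketched together, under stated assumptions: a small fixed table lets several users register, and the example callback does nothing but copy the event into a fixed-size ring so that it finishes in bounded time, leaving the real work to a later drain step outside the lock. All `sketch_` names, the table and ring sizes, and the error codes are hypothetical, and the sketch takes no actual locks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CBS   4
#define SKETCH_RING_SIZE 8   /* power of two; no allocation in the callback */

typedef uint64_t sketch_nid_t;  /* hypothetical stand-in for lnet_nid_t */
typedef void (*sketch_cb_t)(sketch_nid_t nid, int alive, unsigned long when);

struct sketch_event {
	sketch_nid_t  nid;
	int           alive;
	unsigned long when;
};

static sketch_cb_t sketch_cbs[SKETCH_MAX_CBS];

/* Point 1: more than one user may register. */
static int sketch_register(sketch_cb_t cb)
{
	size_t i;

	if (cb == NULL)
		return -EINVAL;
	for (i = 0; i < SKETCH_MAX_CBS; i++) {
		if (sketch_cbs[i] == NULL) {
			sketch_cbs[i] = cb;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Stand-in for the locked notification path: calls every registrant. */
static void sketch_notify_locked(sketch_nid_t nid, int alive,
                                 unsigned long when)
{
	size_t i;

	assert(alive == 0 || alive == 1);
	for (i = 0; i < SKETCH_MAX_CBS; i++)
		if (sketch_cbs[i] != NULL)
			sketch_cbs[i](nid, alive, when);
}

static struct sketch_event sketch_ring[SKETCH_RING_SIZE];
static unsigned int sketch_head, sketch_tail, sketch_dropped;

/* Point 2: O(1) callback that only records the event (or counts a drop). */
static void sketch_enqueue_cb(sketch_nid_t nid, int alive, unsigned long when)
{
	struct sketch_event *ev;

	if (sketch_head - sketch_tail == SKETCH_RING_SIZE) {
		sketch_dropped++;
		return;
	}
	ev = &sketch_ring[sketch_head % SKETCH_RING_SIZE];
	ev->nid = nid;
	ev->alive = alive;
	ev->when = when;
	sketch_head++;
}

/* Second example registrant: just counts calls. */
static int sketch_calls;

static void sketch_count_cb(sketch_nid_t nid, int alive, unsigned long when)
{
	(void)nid;
	(void)alive;
	(void)when;
	sketch_calls++;
}

/* Run later, outside the lock: returns 1 and fills *out if an event exists. */
static int sketch_drain(struct sketch_event *out)
{
	if (sketch_tail == sketch_head)
		return 0;
	*out = sketch_ring[sketch_tail % SKETCH_RING_SIZE];
	sketch_tail++;
	return 1;
}
```

Dropping events when the ring is full is one possible policy; a real user would have to decide whether a lost notification is acceptable or whether it needs to resynchronise state instead.
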

Shadow wrote...

> That idea was discussed some time ago (with green and maxim, as I
> remember), but there were some objections.  Currently LNet hides
> network flaps from the ptlrpc layer, and LNet will resend a request
> without notifying ptlrpc about the flap until the ptlrpc request
> times out.

LNET does not resend.  LNet itself doesn't know about "network flaps"
- all it has to go on is whether communications with individual peer
nodes succeed or fail.  

Isaac (correct me if this is too broad brush) most recently worked on
fixing some pathological issues with dead peers to ensure that
communications with a _known_ dead peer complete quickly with failure.
Now, the only time communications can be blocked for the whole LND
timeout should be on initial connection establishment to a peer in an
unknown state, or at the point that the peer dies.

> But if ptlrpc sees a node-down event, it will try to reconnect.
> That produces extra overhead, because it needs to resend all the
> requests on the sending and delayed lists instead of only the
> requests affected during the network flap.

Ptlrpc reconnect to a known dead peer will fail immediately.  However
this communication attempt will cause the LND to attempt connection
re-establishment and if this is successful, LNET will mark the peer
alive.  Subsequent communication attempts should now succeed.
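
The sequence just described can be modelled as a tiny state machine. This is a simplified sketch, not the LND code: the reconnect is treated as synchronous, `reachable` stands for whatever the LND would discover when it retries, and the struct, function name, and error code are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_peer {
	int alive;      /* what LNet currently believes about the peer */
	int reachable;  /* whether a reconnect attempt would succeed */
};

/*
 * Send to a peer. A send to a known-dead peer fails at once, but it
 * also triggers a reconnect attempt; if that succeeds the peer is
 * marked alive, so the next send goes through.
 */
static int sketch_send(struct sketch_peer *p)
{
	assert(p != NULL);

	if (p->alive)
		return 0;

	/* Known dead: fail immediately, but let the LND try to reconnect. */
	if (p->reachable)
		p->alive = 1;

	return -EHOSTUNREACH;
}
```

In this model the first send after the peer comes back still fails, and only the following one succeeds, which matches the "subsequent communication attempts" wording above.
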

> So you need to separate a network flap from a node-down situation
> before implementing that.  Currently a node is marked down if it
> doesn't respond to a request within the ptlrpc timeout, which
> includes network transmit and processing times, but that is
> different from the LNet message timeout.

Indeed.  This is further complicated by LNET routing.  Communications
buffered in a router are lost when the router fails, so you have to
regard routed LNETs as potentially lossy.

Nic wrote...

> One oddity - if the LND has peer_health disabled (no ni_peertimeout 
> value), there doesn't seem to be anything that'd set the peer back to 
> 'up'. Am I missing something or is this as desired ?

Hmmm - I originally implemented peer aliveness only for LNET routers
to ensure known dead routers were avoided - so the status of
non-router peers was not used or maintained properly (mea maxima
culpa).

Isaac fixed this more generally as I mentioned above, but it looks
like only socklnd and o2iblnd have the support.  Shouldn't be too hard
to add for ptllnd though :)

-- 

                Cheers,
                        Eric
