* [PATCH 0/13] IB-mgmt: Port madeye to userspace
@ 2010-11-03 23:13 Hefty, Sean
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B837B38B-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2010-11-03 23:13 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
This series ports the kernel madeye debug utility to
userspace. It depends on the kernel MAD snooping functionality
patches recently submitted to this list. The series
adds the following to the ib-mgmt tree:
* Adds the ability to snoop MADs to libibumad.
* Adds new header files to libibumad that define various MAD
data structures and definitions. The headers define a minimal
number of definitions, basically only what was needed for madeye.
* Starts the process of updating libibmad and opensm to use the
new definitions. mad.h and ib_types.h are updated to reference
values in the new header file, but the names of existing defines
did not change.
* Adds madeye as a new ib-diag.
The new header files define the MAD headers, but do not define any
of the data fields, such as SA attributes. Those are left for
further discussion. The patch series is basically the same as that
submitted for the RFC. Fixes were added to madeye based on more
extensive testing, and a new patch was added to the series to
identify which fields in libibumad are in network versus host
order. I added that separately because it touches existing
structures, as well as the new ones that are defined. Otherwise,
the definitions for new structures follow the existing conventions.
Porting madeye to user space is a quick and useful way to verify
that the snooping capabilities work. However, an alternative goal
of these patches is to allow ibacm and similar applications to
detect and react to SA and CM timeouts.
Signed-off-by: Sean Hefty <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B837B38B-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-11-04 8:25 ` Or Gerlitz
[not found] ` <4CD26E04.3060408-smomgflXvOZWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2010-11-04 8:25 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hefty, Sean wrote:
> [...] an alternative goal of these patches is to allow ibacm and similar applications to detect and react to SA and CM timeouts.
Hi Sean,
As far as I understand, a CM timeout is an event, not a mad... when
referring to detecting/reacting to CM timeouts, did you mean detecting
mads like "CM retries" and reacting to them?
Or.
* RE: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <4CD26E04.3060408-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2010-11-04 14:56 ` Hefty, Sean
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B837B65F-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2010-11-04 14:56 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> As far as I understand, a CM timeout is an event, not a mad... when
> referring to detecting/reacting to CM timeouts, did you mean detecting
> mads like "CM retries" and reacting to them?
CM timeout 'events' would be detected and reported as sent MADs that complete in error. For ibacm, the only CM MAD of interest is the REQ, since it contains SLID/DLID information that can be used to update a path record cache.
- Sean
* Re: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B837B65F-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-11-04 16:31 ` Or Gerlitz
[not found] ` <4CD2DFD3.7040900-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2010-11-04 16:31 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hefty, Sean wrote:
> sent MADs that complete in error. For ibacm, the only CM MAD of interest is the REQ,
> since it contains SLID/DLID information that can be used to update a path record cache.
Still, I am not sure I follow you; the mads used by the CM aren't reliable, correct?
So I don't see why/how a mad containing e.g. a junk DLID completes with an error...
Or.
* RE: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <4CD2DFD3.7040900-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
@ 2010-11-04 16:51 ` Hefty, Sean
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B837B857-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2010-11-04 16:51 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Still, I am not sure I follow you; the mads used by the CM aren't
> reliable, correct?
> So I don't see why/how a mad containing e.g. a junk DLID completes with
> an error...
CM mads aren't reliable, however they are retried. If a CM REQ does not receive a response after so many retries (usually 15), the REQ fails (status is timeout). The mad layer reports the timeout to the cm module. With snooping in place, a user will be notified that a mad send has failed and be given a copy of the mad.
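The retry-then-timeout behaviour described above can be modelled in a few lines. This is a minimal sketch in Python, not the ib_mad implementation: `Mad`, `send_with_retries`, `transmit`, and the `snoopers` list are hypothetical names introduced for illustration, the default of 15 retries is taken from the "usually 15" figure above, and counting the first send separately from the retries is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Mad:
    """Hypothetical stand-in for a MAD; only the fields this sketch needs."""
    attr: str   # e.g. "REQ"
    slid: int
    dlid: int


def send_with_retries(
    mad: Mad,
    transmit: Callable[[Mad], bool],
    snoopers: List[Callable[[Mad, str], None]],
    retries: int = 15,
) -> str:
    """Send `mad`, retrying until a response arrives or retries run out.

    `transmit(mad)` is assumed to return True when a response was received
    before the per-attempt timeout. If every attempt fails, each registered
    snooper is handed the failed MAD together with the status "timeout".
    """
    for _ in range(1 + retries):  # first send plus `retries` retries
        if transmit(mad):
            return "success"
    for snoop in snoopers:
        snoop(mad, "timeout")
    return "timeout"
```

A snooper here is just a callable taking the failed MAD and its status; a service such as ibacm would register one and inspect the SLID/DLID of any REQ it is handed.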
At a higher level, this would be one usage model:
1. App calls rdma_getaddrinfo()
2. The librdmacm contacts the ibacm for path record data.
3. ibacm returns a path record. The path record _may_ have come from cached data.
4. The librdmacm tries to establish a connection.
5. The kernel ib_cm module issues REQ.
6. The ib_mad module retries the REQ until it times out.
7. The mad timeout is reported to any users wishing to capture errors.
In this example, the ibacm service would be registered and receive a copy of the failed REQ. The ibacm can look at the data in the REQ, see if it has cached path record data which matches, and remove the cached data if so. If the REQ data cannot be found (for example, someone sent a REQ with a junk DLID), it simply discards the captured mad.
8. The librdmacm will see a connection failure.
9. The librdmacm can request a new path from the ibacm and retry.
- Sean
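The cache handling in steps 7 to 9 of the usage model above might look roughly like this. It is a sketch under stated assumptions, not ibacm code: `PathCache` and its methods are hypothetical, the cache is keyed on the (SLID, DLID) pair mentioned earlier in the thread, and a path record is represented as a plain dict.

```python
from typing import Dict, Optional, Tuple

PathKey = Tuple[int, int]  # (SLID, DLID), as carried in a CM REQ


class PathCache:
    """Hypothetical path record cache keyed by (SLID, DLID)."""

    def __init__(self) -> None:
        self._paths: Dict[PathKey, dict] = {}

    def add(self, slid: int, dlid: int, record: dict) -> None:
        self._paths[(slid, dlid)] = record

    def lookup(self, slid: int, dlid: int) -> Optional[dict]:
        return self._paths.get((slid, dlid))

    def on_failed_req(self, slid: int, dlid: int) -> bool:
        """Handle a snooped REQ that completed with a timeout.

        Removes the matching cached path, if any, and returns True.
        Returns False when nothing matches (for example a REQ sent with
        a junk DLID), in which case the captured MAD is simply dropped.
        """
        return self._paths.pop((slid, dlid), None) is not None


cache = PathCache()
cache.add(slid=1, dlid=7, record={"sl": 0})
cache.on_failed_req(slid=1, dlid=7)    # stale entry removed
cache.on_failed_req(slid=1, dlid=99)   # no match, nothing to do
```

After the invalidation, the next lookup for that pair misses, so a retry from the librdmacm would cause a fresh path to be resolved rather than the stale one being returned again.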
* Re: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B837B857-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-11-08 10:14 ` Or Gerlitz
[not found] ` <4CD7CD7B.2020003-smomgflXvOZWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Or Gerlitz @ 2010-11-08 10:14 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hefty, Sean wrote:
> CM mads aren't reliable, however they are retried. If a CM REQ does not receive a response after so many retries (usually 15), the REQ fails (status is timeout). The mad layer reports the timeout to the cm module. With snooping in place, a user will be notified that a mad send has failed and be given a copy of the mad.
mmm, got that - I also see that ib_mad_send_wc has both the status and
the content of the mad, upon which you base the design
> 3. ibacm returns a path record. The path record _may_ have come from cached data.
> 4. The librdmacm tries to establish a connection.
> 5. The kernel ib_cm module issues REQ.
> 6. The ib_mad module retries the REQ until it times out.
> 7. The mad timeout is reported to any users wishing to capture errors.
> In this example, the ibacm service would be registered and receive a copy of the failed REQ. The ibacm can look at the data in the REQ, see if it has cached path record data which matches, and remove the cached data if so.
> 8. The librdmacm will see a connection failure.
so the usage of mad snooping would be for cache invalidations, I wonder
if registering on GID/MGID IN/OUT traps would be sufficient for the same purpose?
Or.
* RE: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <4CD7CD7B.2020003-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2010-11-08 15:53 ` Hefty, Sean
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B83D6B54-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Hefty, Sean @ 2010-11-08 15:53 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> so the usage of mad snooping would be for cache invalidations, I wonder
> if registering on GID/MGID IN/OUT traps would be sufficient for the same purpose?
That requires registration with the SA. The intent is to avoid using a centralized service when possible. Otherwise, we end up with all nodes registering for a trap, the SA needing to notify all nodes of in/out of service, and all nodes updating their caches at the same time. The CM timeout approach avoids this; the only nodes that need to update their caches are ones which are actively trying to connect to a specific node.
Consider a case where one or more nodes are removed from an MPI run. (Maybe the software on the nodes is being updated.) Relying on traps would require SA communication to and from all nodes, even though the nodes won't be used. Additionally, once the nodes go back online, their path information may not have changed, so none of the work was even needed.
The focus of this patch is on thousands to tens of thousands of nodes. At that scale, the likelihood of a node going up/down at any given point is high.
- Sean
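The scaling argument above can be put into rough numbers. The sketch below is a back-of-the-envelope model only: both function names are hypothetical, the node counts are illustrative rather than measured, and it ignores registration traffic, MAD retries, and any batching an SA might do.

```python
def trap_notifications(registered_nodes: int, events: int) -> int:
    """Trap model: the SA notifies every registered node of every
    in-service or out-of-service event."""
    return registered_nodes * events


def timeout_invalidations(connecting_nodes: int) -> int:
    """CM-timeout model: only nodes whose REQ to the absent node timed
    out touch their caches, one invalidation each."""
    return connecting_nodes


# Illustrative numbers: 10,000 nodes, one node taken offline and brought
# back (two events), and 3 peers that tried to connect while it was down.
print(trap_notifications(10_000, events=2))  # 20000
print(timeout_invalidations(3))              # 3
```

If no peer tries to reach the node while it is offline, the timeout model does no work at all, which matches the MPI example above.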
* Re: [PATCH 0/13] IB-mgmt: Port madeye to userspace
[not found] ` <CF9C39F99A89134C9CF9C4CCB68B8DDF25B83D6B54-osO9UTpF0USkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2010-11-08 16:17 ` Or Gerlitz
0 siblings, 0 replies; 8+ messages in thread
From: Or Gerlitz @ 2010-11-08 16:17 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hefty, Sean wrote:
> That requires registration with the SA. The intent is to avoid using a centralized service when possible.
yep, makes sense, looks like this design finally went the decentralized way... cool
Or.