From: Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
To: Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: Alex Netes <alexne-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
"linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH V2] opensm: on MAD error call back print DR PATH or LID of request MAD
Date: Thu, 15 Dec 2011 09:49:56 -0800 [thread overview]
Message-ID: <20111215094956.32f1abad.weiny2@llnl.gov> (raw)
In-Reply-To: <4EEA00F5.7040505-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
On Thu, 15 Dec 2011 06:15:17 -0800
Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> On 12/14/2011 10:18 PM, Ira Weiny wrote:
> >
> > In addition print transaction ID of all DR PATH dumps to make sure we know
> > which MAD's they refer to.
>
> A note on this approach is that this splits the logging of send errors
> between the vendor layer and SM rather than keeping it all at one layer
> of the implementation. That's the tradeoff to not fixing the bug in
> umad_receiver in terms of printing the DR path in ERR 5411.
Yes I guess it could be viewed this way but I really thought of it more as
adding to the already existing logging in sm_mad_ctrl_send_err_cb and fixing a
bug in the logging of umad_receiver.
As I responded in the other thread I did not take out any logging in
umad_receiver which I think is still valid. In addition I just added logging
in the error callback regarding the request which timed out.
>
> > Signed-off-by: Ira Weiny <weiny2-i2BcT+NCU+M@public.gmane.org>
> > ---
> > libvendor/osm_vendor_ibumad.c | 2 --
> > opensm/osm_helper.c | 5 +++--
> > opensm/osm_sm_mad_ctrl.c | 16 ++++++++++++++--
> > 3 files changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c
> > index e2ebd8e..b2872c8 100644
> > --- a/libvendor/osm_vendor_ibumad.c
> > +++ b/libvendor/osm_vendor_ibumad.c
> > @@ -348,8 +348,6 @@ static void *umad_receiver(void *p_ptr)
> > ", Hop Ptr: 0x%X\n",
> > mad->method, cl_ntoh16(mad->attr_id),
> > cl_ntoh64(mad->trans_id), smp->hop_ptr);
> > - osm_dump_smp_dr_path(p_vend->p_log, smp,
> > - OSM_LOG_ERROR);
>
> If you're going this direction, why not remove the logging of error 5411
> above it which means eliminate the else clause there ? Isn't that
> redundant with your change below to sm_mad_ctrl_send_err_cb ?
Technically, yes it is redundant as the "response" is not really a response.
(I think.) But my intention was not to remove any logging except that which
was "useless".
>
> Also, shouldn't another related change to umad_receiver be done:
>
> Where it is:
> if (mad->mgmt_class != IB_MCLASS_SUBN_DIR) {
> it should now be:
> if ((mad->mgmt_class != IB_MCLASS_SUBN_DIR) &&
> (mad->mgmt_class != IB_MCLASS_SUBN_LID)) {
>
> to go along with SM class being logged in the SM send_err callback
> rather than at umad layer.
I am not sure I follow here. Why would you care about the other classes which
timeout? Wouldn't they have the same issue of a response which is "fake"?
If we want to remove the logging at this layer I think we should consider
this.
diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c
index b2872c8..b352cef 100644
--- a/libvendor/osm_vendor_ibumad.c
+++ b/libvendor/osm_vendor_ibumad.c
@@ -327,29 +327,6 @@ static void *umad_receiver(void *p_ptr)
/* if status != 0 then we are handling recv timeout on send */
if (umad_status(p_madw->vend_wrap.umad)) {
- if (mad->mgmt_class != IB_MCLASS_SUBN_DIR) {
- /* LID routed */
- OSM_LOG(p_vend->p_log, OSM_LOG_ERROR, "ERR 5410: "
- "Send completed with error -- dropping\n"
- "\t\t\tClass 0x%x, Method 0x%X, Attr 0x%X, "
- "TID 0x%" PRIx64 ", LID %u\n",
- mad->mgmt_class, mad->method,
- cl_ntoh16(mad->attr_id),
- cl_ntoh64(mad->trans_id),
- cl_ntoh16(ib_mad_addr->lid));
- } else {
- ib_smp_t *smp;
-
- /* Direct routed SMP */
- smp = (ib_smp_t *) mad;
- OSM_LOG(p_vend->p_log, OSM_LOG_ERROR, "ERR 5411: "
- "DR SMP Send completed with error -- dropping\n"
- "\t\t\tMethod 0x%X, Attr 0x%X, TID 0x%" PRIx64
- ", Hop Ptr: 0x%X\n",
- mad->method, cl_ntoh16(mad->attr_id),
- cl_ntoh64(mad->trans_id), smp->hop_ptr);
- }
-
if (!(p_req_madw = get_madw(p_vend, &mad->trans_id))) {
OSM_LOG(p_vend->p_log, OSM_LOG_ERROR,
"ERR 5412: "
But I felt that was a bit draconian, and it was not my initial intent.
Ira
>
> -- Hal
>
> > }
> >
> > if (!(p_req_madw = get_madw(p_vend, &mad->trans_id))) {
> > diff --git a/opensm/osm_helper.c b/opensm/osm_helper.c
> > index f9f3d9d..b968679 100644
> > --- a/opensm/osm_helper.c
> > +++ b/opensm/osm_helper.c
> > @@ -2059,8 +2059,9 @@ void osm_dump_smp_dr_path(IN osm_log_t * p_log, IN const ib_smp_t * p_smp,
> > char buf[BUF_SIZE];
> > unsigned n;
> >
> > - n = sprintf(buf, "Received SMP on a %u hop path: "
> > - "Initial path = ", p_smp->hop_count);
> > + n = sprintf(buf, " DR SMP (TID 0x%" PRIx64 ") on a %u hop path: "
> > + "Initial path = ",
> > + cl_ntoh64(p_smp->trans_id), p_smp->hop_count);
> > n += sprint_uint8_arr(buf + n, sizeof(buf) - n,
> > p_smp->initial_path,
> > p_smp->hop_count + 1);
> > diff --git a/opensm/osm_sm_mad_ctrl.c b/opensm/osm_sm_mad_ctrl.c
> > index ee92c66..a3b444a 100644
> > --- a/opensm/osm_sm_mad_ctrl.c
> > +++ b/opensm/osm_sm_mad_ctrl.c
> > @@ -704,6 +704,7 @@ Exit:
> > */
> > static void (IN void *context, IN osm_madw_t * p_madw)
> > {
> > + char lidstr[8];
> > osm_sm_mad_ctrl_t *p_ctrl = context;
> > ib_api_status_t status;
> > ib_smp_t *p_smp;
> > @@ -713,13 +714,24 @@ static void sm_mad_ctrl_send_err_cb(IN void *context, IN osm_madw_t * p_madw)
> > CL_ASSERT(p_madw);
> >
> > p_smp = osm_madw_get_smp_ptr(p_madw);
> > +
> > + if (p_smp->mgmt_class == IB_MCLASS_SUBN_DIR)
> > + lidstr[0] = '\0';
> > + else
> > + snprintf(lidstr, 8, " DLID %u",
> > + cl_ntoh16(p_madw->mad_addr.dest_lid));
> > +
> > OSM_LOG(p_ctrl->p_log, OSM_LOG_ERROR, "ERR 3113: "
> > "MAD completed in error (%s): "
> > - "%s(%s), attr_mod 0x%x, TID 0x%" PRIx64 "\n",
> > + "%s(%s), attr_mod 0x%x, TID 0x%" PRIx64 " %s\n",
> > ib_get_err_str(p_madw->status),
> > ib_get_sm_method_str(p_smp->method),
> > ib_get_sm_attr_str(p_smp->attr_id), cl_ntoh32(p_smp->attr_mod),
> > - cl_ntoh64(p_smp->trans_id));
> > + cl_ntoh64(p_smp->trans_id),
> > + lidstr);
> > +
> > + if (p_smp->mgmt_class == IB_MCLASS_SUBN_DIR)
> > + osm_dump_smp_dr_path(p_ctrl->p_log, p_smp, OSM_LOG_ERROR);
> >
> > /*
> > If this was a SubnSet MAD, then this error might indicate a problem
>
--
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
weiny2-i2BcT+NCU+M@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2011-12-15 17:49 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-15 3:18 [PATCH V2] opensm: on MAD error call back print DR PATH or LID of request MAD Ira Weiny
[not found] ` <20111214191843.35cecd64.weiny2-i2BcT+NCU+M@public.gmane.org>
2011-12-15 14:15 ` Hal Rosenstock
[not found] ` <4EEA00F5.7040505-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2011-12-15 17:49 ` Ira Weiny [this message]
[not found] ` <20111215094956.32f1abad.weiny2-i2BcT+NCU+M@public.gmane.org>
2011-12-15 22:20 ` Hal Rosenstock
[not found] ` <4EEA72AC.7060103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2011-12-16 0:33 ` Ira Weiny
[not found] ` <20111215163346.a26d5c22.weiny2-i2BcT+NCU+M@public.gmane.org>
2011-12-16 13:03 ` Hal Rosenstock
[not found] ` <4EEB41AE.4050504-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2011-12-16 17:37 ` Ira Weiny
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111215094956.32f1abad.weiny2@llnl.gov \
--to=weiny2-i2bct+ncu+m@public.gmane.org \
--cc=alexne-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox