From: Line Holen <Line.Holen-UdXhSnd/wVw@public.gmane.org>
To: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] opensm/osm_sa_path_record.c: livelock in pr_rcv_get_path_parms
Date: Mon, 19 Apr 2010 20:48:32 +0200
Message-ID: <4BCCA580.3060700@Sun.COM>
In-Reply-To: <j2uf0e08f231004191120oc1e78130l683b9ae0ca51003a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 04/19/10 08:20 PM, Hal Rosenstock wrote:
> On Mon, Apr 19, 2010 at 5:15 AM, Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org> wrote:
>> SA path request handling can end up in a livelock in pr_rcv_get_path_parms().
>> This can happen if a path request is handled while LFT updates to the fabric
>> are in progress.
>> The LFT of the switch data structure is updated as part of the LFT response
>> processing. So while the SM is busy pushing the LFT updates, some switches have
>> up-to-date LFT info while others are not yet updated and contain the LFT of
>> the previous routing. For a (short) time interval there is a potential for
>> loops in the fabric. The livelock occurs if a path request is received during
>> this time interval.
>> Both LFT response handling and path request processing need the SM lock.
>> When the livelock occurs the LFT response handling blocks forever waiting for
>> the lock to be released.
>>
>> The suggested fix is simply to introduce a max number of hops that should
>> be traversed while handling the path request. If this max is reached then
>> the request will return with NO_RECORD response
>
> To me, this begs the question of whether this should return a BUSY
> status rather than no record (and whether SA clients should handle
> those two differently) but that is a bigger change (and may require
> some end node change as well).
I think the fundamental issue here is that the path request handling is operating
on inconsistent data: a mixture of old and new LFT setup. A proper fix would
be to use a consistent LFT setup (either old or new) or to deny service (return BUSY)
while LFT updates are in progress. A check on the number of hops still makes sense
though, because the routing could generate loops too.
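To illustrate the two ideas together, here is a minimal sketch on a toy
next-hop table, not on the real OpenSM switch structures. Everything in it
(toy_fabric, toy_walk_path, the WALK_* codes, the lft_update_in_progress flag
and the example tables) is hypothetical and made up for illustration; only
MAX_HOPS and its value come from the patch. The walk refuses to run while an
update is flagged, and otherwise gives up once more than MAX_HOPS hops have
been traversed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOPS 128	/* same bound as in the patch */
#define NO_ROUTE (-1)	/* hypothetical marker: no forwarding entry */

/* Hypothetical result codes, loosely mirroring OK / NO_RECORD / BUSY */
enum walk_status { WALK_OK, WALK_NOT_FOUND, WALK_BUSY };

/* Toy fabric: next_hop[cur * num_switches + dest] is the next switch
 * on the way from switch 'cur' to switch 'dest'. */
struct toy_fabric {
	int num_switches;
	const int *next_hop;
	int lft_update_in_progress;	/* assumed flag, not an OpenSM field */
};

/* Walk from src to dest. Returns WALK_BUSY while an update is flagged,
 * WALK_NOT_FOUND on a missing entry or when more than MAX_HOPS hops
 * are needed (for example because the tables contain a loop). */
static enum walk_status toy_walk_path(const struct toy_fabric *f,
				      int src, int dest, int *hops_out)
{
	int cur = src;
	int hops = 0;

	assert(f != NULL);
	if (src < 0 || src >= f->num_switches ||
	    dest < 0 || dest >= f->num_switches)
		return WALK_NOT_FOUND;

	/* deny service instead of reading half-updated tables */
	if (f->lft_update_in_progress)
		return WALK_BUSY;

	while (cur != dest) {
		int next = f->next_hop[cur * f->num_switches + dest];

		if (next < 0 || next >= f->num_switches)
			return WALK_NOT_FOUND;
		cur = next;

		/* update number of hops traversed */
		hops++;
		if (hops > MAX_HOPS)
			return WALK_NOT_FOUND;
	}

	if (hops_out != NULL)
		*hops_out = hops;
	return WALK_OK;
}

/* Example tables for three switches (0, 1, 2). */
static const int toy_consistent_lft[9] = {
	0, 1, 1,	/* switch 0: reaches 1 and 2 via switch 1 */
	0, 1, 2,	/* switch 1: directly connected to 0 and 2 */
	1, 1, 2,	/* switch 2: reaches 0 and 1 via switch 1 */
};

/* Switch 1 still has an old entry sending traffic for 2 back to 0. */
static const int toy_mixed_lft[9] = {
	0, 1, 1,
	0, 1, 0,
	1, 1, 2,
};

static const struct toy_fabric toy_consistent = { 3, toy_consistent_lft, 0 };
static const struct toy_fabric toy_mixed = { 3, toy_mixed_lft, 0 };
static const struct toy_fabric toy_updating = { 3, toy_mixed_lft, 1 };
```

With these tables, a walk from switch 0 to switch 2 succeeds on
toy_consistent, hits the hop limit on toy_mixed (0 and 1 keep forwarding to
each other), and is refused up front on toy_updating. In the real code the
bound only guarantees that the walk terminates and the SM lock is released;
it does not make the answer any more consistent.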
>
> Also, should a similar change be made in SA MPR mpr_rcv_get_path_parms ?
Could be. I haven't checked that code.
Line
>
> -- Hal
>
>> and release the SM lock.
>> This way the LFT processing will be able to complete.
>>
>> Signed-off-by: Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org>
>>
>> ---
>>
>> diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
>> index c4c3f86..b399b70 100644
>> --- a/opensm/opensm/osm_sa_path_record.c
>> +++ b/opensm/opensm/osm_sa_path_record.c
>> @@ -4,6 +4,7 @@
>> * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>> * Copyright (c) 2008 Xsigo Systems Inc. All rights reserved.
>> * Copyright (c) 2009 HNR Consulting. All rights reserved.
>> + * Copyright (c) 2010 Sun Microsystems, Inc. All rights reserved.
>> *
>> * This software is available to you under a choice of one of two
>> * licenses. You may choose to be licensed under the terms of the GNU
>> @@ -69,6 +70,9 @@
>> #include <opensm/osm_prefix_route.h>
>> #include <opensm/osm_ucast_lash.h>
>>
>> +
>> +#define MAX_HOPS 128
>> +
>> typedef struct osm_pr_item {
>> cl_list_item_t list_item;
>> ib_path_rec_t path_rec;
>> @@ -178,6 +182,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>> osm_qos_level_t *p_qos_level = NULL;
>> uint16_t valid_sl_mask = 0xffff;
>> int is_lash;
>> + int hops = 0;
>>
>> OSM_LOG_ENTER(sa->p_log);
>>
>> @@ -369,6 +374,25 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>> goto Exit;
>> }
>> }
>> +
>> + /* update number of hops traversed */
>> + hops++;
>> + if (hops > MAX_HOPS) {
>> +
>> + OSM_LOG(sa->p_log, OSM_LOG_ERROR,
>> + "Path from GUID 0x%016" PRIx64 " (%s) to lid %u GUID 0x%016"
>> + PRIx64 " (%s) needs more than %d hops, "
>> + "max %d hops allowed\n",
>> + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)),
>> + p_src_physp->p_node->print_desc,
>> + dest_lid_ho,
>> + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp)),
>> + p_dest_physp->p_node->print_desc,
>> + hops, MAX_HOPS);
>> +
>> + status = IB_NOT_FOUND;
>> + goto Exit;
>> + }
>> }
>>
>> /*
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>