From: Line Holen <Line.Holen-UdXhSnd/wVw@public.gmane.org>
To: Hal Rosenstock <hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] opensm/osm_sa_path_record.c: livelock in pr_rcv_get_path_parms
Date: Mon, 19 Apr 2010 20:48:32 +0200
Message-ID: <4BCCA580.3060700@Sun.COM>
In-Reply-To: <j2uf0e08f231004191120oc1e78130l683b9ae0ca51003a-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 04/19/10 08:20 PM, Hal Rosenstock wrote:
> On Mon, Apr 19, 2010 at 5:15 AM, Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org> wrote:
>> SA path request handling can end up in a livelock in pr_rcv_get_path_parms().
>> This can happen if a path request is handled while LFT updates to the fabric
>> are in progress.
>> The LFT of the switch data structure is updated as part of the LFT response
>> processing. So while the SM is busy pushing the LFT updates, some switches have
>> up-to-date LFT info while others are not yet updated and contain the LFT of
>> the previous routing. For a (short) time interval there is a potential for
>> loops in the fabric. The livelock occurs if a path request is received during
>> this time interval.
>> Both LFT response handling and path request processing need the SM lock.
>> When the livelock occurs, the path walk never terminates while holding the
>> lock, so the LFT response handling blocks forever waiting for the lock to be
>> released.
>>
>> The suggested fix is simply to introduce a maximum number of hops that should
>> be traversed while handling the path request. If this maximum is reached, the
>> request will return a NO_RECORD response
>
> To me, this raises the question of whether this should return a BUSY
> status rather than NO_RECORD (and whether SA clients should handle
> those two differently), but that is a bigger change (and may require
> some end node changes as well).
I think the fundamental issue here is that the path request handling is operating
on inconsistent data: a mixture of the old and new LFT setup. A proper fix would
be to use a consistent LFT setup (either old or new) or deny service (return BUSY)
while LFT updates are in progress. A check on the number of hops still makes sense,
though, because the routing itself could generate loops too.
>
> Also, should a similar change be made in SA MPR mpr_rcv_get_path_parms ?
Could be. I haven't checked that code.
Line
>
> -- Hal
>
>> and release the SM lock.
>> This way the LFT processing will be able to complete.
>>
>> Signed-off-by: Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org>
>>
>> ---
>>
>> diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
>> index c4c3f86..b399b70 100644
>> --- a/opensm/opensm/osm_sa_path_record.c
>> +++ b/opensm/opensm/osm_sa_path_record.c
>> @@ -4,6 +4,7 @@
>> * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>> * Copyright (c) 2008 Xsigo Systems Inc. All rights reserved.
>> * Copyright (c) 2009 HNR Consulting. All rights reserved.
>> + * Copyright (c) 2010 Sun Microsystems, Inc. All rights reserved.
>> *
>> * This software is available to you under a choice of one of two
>> * licenses. You may choose to be licensed under the terms of the GNU
>> @@ -69,6 +70,9 @@
>> #include <opensm/osm_prefix_route.h>
>> #include <opensm/osm_ucast_lash.h>
>>
>> +
>> +#define MAX_HOPS 128
>> +
>> typedef struct osm_pr_item {
>> cl_list_item_t list_item;
>> ib_path_rec_t path_rec;
>> @@ -178,6 +182,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>> osm_qos_level_t *p_qos_level = NULL;
>> uint16_t valid_sl_mask = 0xffff;
>> int is_lash;
>> + int hops = 0;
>>
>> OSM_LOG_ENTER(sa->p_log);
>>
>> @@ -369,6 +374,25 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>> goto Exit;
>> }
>> }
>> +
>> + /* update number of hops traversed */
>> + hops++;
>> + if (hops > MAX_HOPS) {
>> +
>> + OSM_LOG(sa->p_log, OSM_LOG_ERROR,
>> + "Path from GUID 0x%016" PRIx64 " (%s) to lid %u GUID 0x%016"
>> + PRIx64 " (%s) needs more than %d hops, "
>> + "max %d hops allowed\n",
>> + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)),
>> + p_src_physp->p_node->print_desc,
>> + dest_lid_ho,
>> + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp)),
>> + p_dest_physp->p_node->print_desc,
>> + hops, MAX_HOPS);
>> +
>> + status = IB_NOT_FOUND;
>> + goto Exit;
>> + }
>> }
>>
>> /*
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>