From: Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org>
To: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] opensm/osm_sa_path_record.c: livelock in pr_rcv_get_path_parms
Date: Mon, 19 Apr 2010 20:32:37 +0200
Message-ID: <4BCCA1C5.5000904@Sun.COM>
In-Reply-To: <20100419153421.GB23994@me>
On 04/19/10 05:34 PM, Sasha Khapyorsky wrote:
> On 11:15 Mon 19 Apr , Line Holen wrote:
>> SA path request handling can end up in a livelock in pr_rcv_get_path_parms().
>> This can happen if a path request is handled while LFT updates to the fabric
>> are in progress.
>> The LFT of the switch data structure is updated as part of the LFT response
>> processing. So while the SM is busy pushing the LFT updates, some switches have
>> up-to-date LFT info while others are not yet updated and contain the LFT of
>> the previous routing. For a (short) time interval there is a potential for
>> loops in the fabric. The livelock occurs if a path request is received during
>> this time interval.
>> Both LFT response handling and path request processing need the SM lock.
>> When the livelock occurs the LFT response handling blocks forever waiting for
>> the lock to be released.
>>
>> The suggested fix is simply to introduce a max number of hops that should
>> be traversed while handling the path request. If this max is reached then
>> the request will return with NO_RECORD response and release the SM lock.
>> This way the LFT processing will be able to complete.
>>
>> Signed-off-by: Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org>
>
> Applied. Thanks. See minor question/note below.
>
>> ---
>>
>> diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
>> index c4c3f86..b399b70 100644
>> --- a/opensm/opensm/osm_sa_path_record.c
>> +++ b/opensm/opensm/osm_sa_path_record.c
>> @@ -4,6 +4,7 @@
>> * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>> * Copyright (c) 2008 Xsigo Systems Inc. All rights reserved.
>> * Copyright (c) 2009 HNR Consulting. All rights reserved.
>> + * Copyright (c) 2010 Sun Microsystems, Inc. All rights reserved.
>> *
>> * This software is available to you under a choice of one of two
>> * licenses. You may choose to be licensed under the terms of the GNU
>> @@ -69,6 +70,9 @@
>> #include <opensm/osm_prefix_route.h>
>> #include <opensm/osm_ucast_lash.h>
>>
>> +
>> +#define MAX_HOPS 128
>
> IB spec defines maximal number of hops for a fabric which is 64. Would
> it be better to use this value here?
>
> Sasha
The value of 128 was chosen as 2x max DR path allowing the SM to be in
the middle of a fabric. But I have no problem lowering to 64.
Line
>
>> +
>> typedef struct osm_pr_item {
>> cl_list_item_t list_item;
>> ib_path_rec_t path_rec;
>> @@ -178,6 +182,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>> osm_qos_level_t *p_qos_level = NULL;
>> uint16_t valid_sl_mask = 0xffff;
>> int is_lash;
>> + int hops = 0;
>>
>> OSM_LOG_ENTER(sa->p_log);
>>
>> @@ -369,6 +374,25 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>> goto Exit;
>> }
>> }
>> +
>> + /* update number of hops traversed */
>> + hops++;
>> + if (hops > MAX_HOPS) {
>> +
>> + OSM_LOG(sa->p_log, OSM_LOG_ERROR,
>> + "Path from GUID 0x%016" PRIx64 " (%s) to lid %u GUID 0x%016"
>> + PRIx64 " (%s) needs more than %d hops, "
>> + "max %d hops allowed\n",
>> + cl_ntoh64(osm_physp_get_port_guid(p_src_physp)),
>> + p_src_physp->p_node->print_desc,
>> + dest_lid_ho,
>> + cl_ntoh64(osm_physp_get_port_guid(p_dest_physp)),
>> + p_dest_physp->p_node->print_desc,
>> + hops, MAX_HOPS);
>> +
>> + status = IB_NOT_FOUND;
>> + goto Exit;
>> + }
>> }
>>
>> /*
>>
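For readers following the thread, here is a minimal standalone sketch of the idea behind the fix. It is not the OpenSM code: walk_path(), the flat next_hop[] table and the two sample tables are hypothetical, and MAX_HOPS is 128 only because that is the value in the patch as posted. The walk follows the forwarding entries toward the destination and gives up once the hop budget is exceeded, which is what lets a request return instead of spinning when stale entries form a loop.

```c
#include <assert.h>

#define MAX_HOPS 128	/* value from the posted patch; 64 was also discussed */

/*
 * Hypothetical toy model: next_hop[sw] is the switch that switch 'sw'
 * forwards to for one fixed destination. Returns the number of hops
 * from src to dest, or -1 if no path is found within MAX_HOPS.
 */
static int walk_path(const int *next_hop, int n, int src, int dest)
{
	int cur = src;
	int hops = 0;

	assert(src >= 0 && src < n && dest >= 0 && dest < n);

	while (cur != dest) {
		cur = next_hop[cur];
		if (cur < 0 || cur >= n)
			return -1;	/* no usable forwarding entry */

		/* update number of hops traversed */
		hops++;
		if (hops > MAX_HOPS)
			return -1;	/* probable loop: give up */
	}
	return hops;
}

/* All switches updated: 0 -> 1 -> 2 -> 3 */
static const int consistent_lft[4] = { 1, 2, 3, 3 };

/* Switch 2 still holds an old entry: 0 -> 1 -> 2 -> 1 -> 2 -> ... */
static const int stale_lft[4] = { 1, 2, 1, 3 };
```

With the consistent table the walk from switch 0 to switch 3 takes three hops. With the stale table it never reaches switch 3, and the hop limit turns what would be an endless loop into a "not found" result.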