public inbox for linux-rdma@vger.kernel.org
From: Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
To: Line Holen <Line.Holen-UdXhSnd/wVw@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] opensm/osm_sa_path_record.c: livelock in pr_rcv_get_path_parms
Date: Mon, 19 Apr 2010 18:34:21 +0300
Message-ID: <20100419153421.GB23994@me>
In-Reply-To: <4BCC1F3F.5080000-UdXhSnd/wVw@public.gmane.org>

On 11:15 Mon 19 Apr, Line Holen wrote:
> SA path request handling can end up in a livelock in pr_rcv_get_path_parms().
> This can happen if a path request is handled while LFT updates to the fabric
> are in progress. 
> The LFT of the switch data structure is updated as part of the LFT response 
> processing. So while the SM is busy pushing the LFT updates, some switches have
> up-to-date LFT info while others are not yet updated and contain the LFT of
> the previous routing. For a (short) time interval there is a potential for 
> loops in the fabric. The livelock occurs if a path request is received during
> this time interval.
> Both LFT response handling and path request processing need the SM lock.
> When the livelock occurs the LFT response handling blocks forever waiting for 
> the lock to be released.
> 
> The suggested fix is simply to introduce a max number of hops that should
> be traversed while handling the path request. If this max is reached then
> the request will return with NO_RECORD response and release the SM lock.
> This way the LFT processing will be able to complete.
> 
> Signed-off-by: Line Holen <Line.Holen-xsfywfwIY+M@public.gmane.org>
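
For readers following the thread, the hop-limit idea described above can be
sketched in isolation. This is only a toy model, not the OpenSM code: the
next_hop array, count_hops() and NO_ROUTE are hypothetical names made up for
this illustration, and each switch is assumed to have a single forwarding
entry for one fixed destination. MAX_HOPS takes the value from the patch.

```c
#include <stddef.h>

#define MAX_HOPS 128	/* same bound as in the patch */
#define NO_ROUTE (-1)	/* hypothetical "no path found" result */

/*
 * Toy model (assumption): next_hop[i] is the index of the switch that
 * switch i forwards to for one fixed destination. Returns the number of
 * hops from src to dest, or NO_ROUTE if the walk leaves the table or
 * exceeds MAX_HOPS, which is what happens when the tables contain a loop.
 */
static int count_hops(const int *next_hop, size_t n_switches,
		      int src, int dest)
{
	int cur = src;
	int hops = 0;

	while (cur != dest) {
		if (cur < 0 || (size_t)cur >= n_switches)
			return NO_ROUTE;

		cur = next_hop[cur];

		/* update number of hops traversed */
		hops++;
		if (hops > MAX_HOPS)
			return NO_ROUTE;	/* give up instead of spinning */
	}

	return hops;
}
```

With consistent tables the walk ends at the destination. With a mix of old
and new entries that form a cycle, the counter ends the walk after MAX_HOPS
steps, so the caller can return and release whatever lock it holds.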

Applied. Thanks. See minor question/note below.

> 
> ---
> 
> diff --git a/opensm/opensm/osm_sa_path_record.c b/opensm/opensm/osm_sa_path_record.c
> index c4c3f86..b399b70 100644
> --- a/opensm/opensm/osm_sa_path_record.c
> +++ b/opensm/opensm/osm_sa_path_record.c
> @@ -4,6 +4,7 @@
>   * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
>   * Copyright (c) 2008 Xsigo Systems Inc. All rights reserved.
>   * Copyright (c) 2009 HNR Consulting. All rights reserved.
> + * Copyright (c) 2010 Sun Microsystems, Inc. All rights reserved.
>   *
>   * This software is available to you under a choice of one of two
>   * licenses.  You may choose to be licensed under the terms of the GNU
> @@ -69,6 +70,9 @@
>  #include <opensm/osm_prefix_route.h>
>  #include <opensm/osm_ucast_lash.h>
>  
> +
> +#define MAX_HOPS 128

The IB spec defines a maximum number of hops for a fabric, which is 64.
Would it be better to use this value here?

Sasha

> +
>  typedef struct osm_pr_item {
>  	cl_list_item_t list_item;
>  	ib_path_rec_t path_rec;
> @@ -178,6 +182,7 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  	osm_qos_level_t *p_qos_level = NULL;
>  	uint16_t valid_sl_mask = 0xffff;
>  	int is_lash;
> +	int hops = 0;
>  
>  	OSM_LOG_ENTER(sa->p_log);
>  
> @@ -369,6 +374,25 @@ static ib_api_status_t pr_rcv_get_path_parms(IN osm_sa_t * sa,
>  				goto Exit;
>  			}
>  		}
> +
> +		/* update number of hops traversed */
> +		hops++;
> +		if (hops > MAX_HOPS) {
> +
> +			OSM_LOG(sa->p_log, OSM_LOG_ERROR,
> +			    "Path from GUID 0x%016" PRIx64 " (%s) to lid %u GUID 0x%016"
> +			    PRIx64 " (%s) needs more than %d hops, "
> +			    "max %d hops allowed\n",
> +			    cl_ntoh64(osm_physp_get_port_guid(p_src_physp)),
> +			    p_src_physp->p_node->print_desc,
> +			    dest_lid_ho,
> +			    cl_ntoh64(osm_physp_get_port_guid(p_dest_physp)),
> +			    p_dest_physp->p_node->print_desc,
> +			    hops, MAX_HOPS);
> +
> +			status = IB_NOT_FOUND;
> +			goto Exit;
> +		}
>  	}
>  
>  	/*
> 

