From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-qt1-f182.google.com (mail-qt1-f182.google.com [209.85.160.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BA48628E3F for ; Fri, 4 Apr 2025 14:40:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743777636; cv=none; b=e1779eHe129VET0X3Sn08X0hzuDakyVQarawN5PzDX6lwC490FXtCL/89jaxf+1MB4od5wPxwNn3MzfCWrgWGczHpPc7NCbV1QQwVMz5XmnLs0mpOq/VbGcXr3uSkR1VZcPa6NFPV/wSCfFfh3zcm7Wo+GGTNcs1nmU8SeGFwk8= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743777636; c=relaxed/simple; bh=stnd6o+2AvOBAu7HQXgEMBXtbfZ6rBxBc8cLkK/oApc=; h=Date:From:To:Cc:Message-ID:In-Reply-To:References:Subject: Mime-Version:Content-Type; b=Ms2VKYS+DsRpyliPczwy+MpIfIukT94v5kr7fME18HhsfC0p3AhrO27dGsv7zpW5WCUtb8bq+sXZR2N4qG+UUJr8Ok2L692X21NVEONKrFjJM5x4RSj/Vl158D4CTdmtvghGvn5dDIbkDxDnclVQPeaUFLNwD5aACbabV/AClzc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Lu2VHb1R; arc=none smtp.client-ip=209.85.160.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Lu2VHb1R" Received: by mail-qt1-f182.google.com with SMTP id d75a77b69052e-476805acddaso23468101cf.1 for ; Fri, 04 Apr 2025 07:40:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743777633; x=1744382433; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=LTwv+NvIv12iz2xRTuUQl78g8x0HeiF4zDUv6VqO33w=; b=Lu2VHb1Rkz4SMPmBTyzJbf1URTDFcOeSmYX2Js56l4kD1tr3zKFBRxdSXEb0HPEyYo 74G81cvd6hug2MR7ht0c5/fQ0RXLEPmneVBSKXMdNBPL2JjpExsYqz6yUkWOUmJ9AYSl rUce4JYdZ6hrGKamgc+PYaU1NPL9FJIp54OplRPT3HALKL1G+Rv3G5ZgKWUtFj1siwH/ bWtuuzcmrT/E/8an0315RLJByxIjY5z6GiKZqudu2jzzJuJU1mw31K6fJTZJJR58T3fL hyreLpOgHFj239p86a/DSBKZfstHXNMB6cfY0WyYiyXzUt95sAkkllkSXkJb1sQ4eB0C MrVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743777633; x=1744382433; h=content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=LTwv+NvIv12iz2xRTuUQl78g8x0HeiF4zDUv6VqO33w=; b=Q+0BNuZjETyOVFw6uW/kvJvTCuMCkDAKrLajFkYJLDTvD7Tj8vKnHCJqzn5DMGhLWI wy1ItHQykXFXaVIVnM3cbKHdtxju6FtyraEVy18DfONMpbh3oGYE3KhDhIvfNRfv9RUR 2Ol7i8hD8H/Zyx5y7GmVXveo1u/dcZu5syHt+d6VL68F+FF04vjXhbNuBN9BM3WhBOHr HPBs0bEFaA9jDiUzTjqdwUTj2hrMl6Dp9k6m3BMIEF/MsZdMC2tQZZkNX02fNst29tfP c+hisrUJ/egqqFY1Qi+s5y3iP51NXWO2wpyND8WZr0iw4dtoo1OQIo+S7cwhBMWUZNwn hWag== X-Forwarded-Encrypted: i=1; AJvYcCXuCjMHoi10btTN1pSU/+/zx8sI8ibcB1oPDvR4LadRoyV0DH++d33NCV4iDbqIro8hHcrK2hM=@vger.kernel.org X-Gm-Message-State: AOJu0YxJ8xLbI1UR0/B4A3dgtLo+qMPPOH80c8hfU7NldmRIS9AMFlm+ 5yGlCS2NjCasY34sNlgY6KBRvssF4Vdqo4zRxzddy/6u/8BqBLXx X-Gm-Gg: ASbGncuyCJM5miZePkjuYTJOhmNEAjZZy0fP3zH13EPtYHICC68I4fEYBrFFq32t5pC jMjy1D0Jwb6/JT8zgfnx89cwJnjPD5O7FTyNA3VpILb4jHXHHm9zy4l3ZBWWdVA+2Q3k4VUzrrL wZDzunAILfsV/EabT9yb5nIVFy7XP+cUkQ7eRHwwdzQ9JsGZYVHS7Mie8BCTaLjA8m6McMXyAnp TvBS0tk/gpecfigU4XOy+nz50fj4KB2lX4tF0mRE0lLdTDmnd8EsGHMo5vq4w4Er5Kj29C1U1dF d17/r0t1JctB+SWBXZIiu2kLoBrbkEx73THIzMg1Gf8JqXrDa8Rc5KDAR2k5Pye3emHoQZ2dN/Y kq5KNVwJEVYFOSP49P8FlfQ== X-Google-Smtp-Source: AGHT+IFA23JVEvYRuXkWQQwUHiJkRu5z3zSXhg8k4En0oFmi6bPIo4/H5D4B/XuEK+PQCoMsbSAaEg== X-Received: by 2002:ac8:7f84:0:b0:476:8f75:b885 with SMTP id d75a77b69052e-47925a791c6mr47916791cf.44.1743777633497; Fri, 04 Apr 2025 07:40:33 -0700 (PDT) Received: from localhost (86.235.150.34.bc.googleusercontent.com. [34.150.235.86]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-4791b088544sm23340891cf.45.2025.04.04.07.40.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 04 Apr 2025 07:40:33 -0700 (PDT) Date: Fri, 04 Apr 2025 10:40:32 -0400 From: Willem de Bruijn To: Ido Schimmel , netdev@vger.kernel.org Cc: davem@davemloft.net, pabeni@redhat.com, edumazet@google.com, dsahern@kernel.org, horms@kernel.org, gnault@redhat.com, stfomichev@gmail.com, Ido Schimmel Message-ID: <67efef607bc41_1ddca82948c@willemb.c.googlers.com.notmuch> In-Reply-To: <20250402114224.293392-2-idosch@nvidia.com> References: <20250402114224.293392-1-idosch@nvidia.com> <20250402114224.293392-2-idosch@nvidia.com> Subject: Re: [PATCH net 1/2] ipv6: Start path selection from the first nexthop Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Ido Schimmel wrote: > Cited commit transitioned IPv6 path selection to use hash-threshold > instead of modulo-N. With hash-threshold, each nexthop is assigned a > region boundary in the multipath hash function's output space and a > nexthop is chosen if the calculated hash is smaller than the nexthop's > region boundary. > > Hash-threshold does not work correctly if path selection does not start > with the first nexthop. For example, if fib6_select_path() is always > passed the last nexthop in the group, then it will always be chosen > because its region boundary covers the entire hash function's output > space. > > Fix this by starting the selection process from the first nexthop and do > not consider nexthops for which rt6_score_route() provided a negative > score. > > Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") > Reported-by: Stanislav Fomichev > Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/ > Signed-off-by: Ido Schimmel > --- > net/ipv6/route.c | 38 +++++++++++++++++++++++++++++++++++--- > 1 file changed, 35 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index c3406a0d45bd..864f0002034b 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -412,11 +412,35 @@ static bool rt6_check_expired(const struct rt6_info *rt) > return false; > } > > +static struct fib6_info * > +rt6_multipath_first_sibling_rcu(const struct fib6_info *rt) > +{ > + struct fib6_info *iter; > + struct fib6_node *fn; > + > + fn = rcu_dereference(rt->fib6_node); > + if (!fn) > + goto out; > + iter = rcu_dereference(fn->leaf); > + if (!iter) > + goto out; > + > + while (iter) { > + if (iter->fib6_metric == rt->fib6_metric && > + rt6_qualify_for_ecmp(iter)) > + return iter; > + iter = rcu_dereference(iter->fib6_next); > + } > + > +out: > + return NULL; > +} The rcu counterpart to rt6_multipath_first_sibling, which is used when computing the ranges in rt6_multipath_rebalance. > + > void fib6_select_path(const struct net *net, struct fib6_result *res, > struct flowi6 *fl6, int oif, bool have_oif_match, > const struct sk_buff *skb, int strict) > { > - struct fib6_info *match = res->f6i; > + struct fib6_info *first, *match = res->f6i; > struct fib6_info *sibling; > > if (!match->nh && (!match->fib6_nsiblings || have_oif_match)) > @@ -440,10 +464,18 @@ void fib6_select_path(const struct net *net, struct fib6_result *res, > return; > } > > - if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound)) > + first = rt6_multipath_first_sibling_rcu(match); > + if (!first) > goto out; > > - list_for_each_entry_rcu(sibling, &match->fib6_siblings, > + if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) && > + rt6_score_route(first->fib6_nh, first->fib6_flags, oif, > + strict) >= 0) { Does this fix address two issues in one patch: start from the first sibling, and check validity of the sibling? The behavior on negative score for the first_sibling appears different from that on subsequent siblings in the for_each below: in that case the loop breaks, while for the first it skips? if (fl6->mp_hash > nh_upper_bound) continue; if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0) break; match = sibling; break; Am I reading that correct and is that intentional? > + match = first; > + goto out; > + } > + > + list_for_each_entry_rcu(sibling, &first->fib6_siblings, > fib6_siblings) { > const struct fib6_nh *nh = sibling->fib6_nh; > int nh_upper_bound; > -- > 2.49.0 >