All of lore.kernel.org
 help / color / mirror / Atom feed
From: roopa <roopa@cumulusnetworks.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Robert Shearman <rshearma@brocade.com>,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH net-next v2] mpls: support for dead routes
Date: Tue, 03 Nov 2015 15:08:10 -0800	[thread overview]
Message-ID: <56393E5A.6080108@cumulusnetworks.com> (raw)
In-Reply-To: <878u6erk01.fsf@x220.int.ebiederm.org>

On 11/3/15, 10:54 AM, Eric W. Biederman wrote:
> roopa <roopa@cumulusnetworks.com> writes:
>
>> On 11/3/15, 7:08 AM, Robert Shearman wrote:
>>> On 03/11/15 05:13, Roopa Prabhu wrote:
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>>
>>>> Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
>>>> routes due to link events. Also adds code to ignore dead
>>>> routes during route selection
>>>>
>>>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>> ---
>>>> RFC to v1:
>>>>          Addressed a few comments from Eric and Robert:
>>>>          - remove support for weighted nexthops
>>>>          - Use rt_nhn_alive in the rt structure to keep count of alive
>>>>          routes.
>>>>          What i have not done is: sort nexthops on link events.
>>>>          I am not comfortable recreating or sorting nexthops on
>>>>          every carrier change. This leaves scope for optimizing in the future
>>>>
>>>> v1 to v2:
>>>>     Fix dead nexthop checks as suggested by dave
>>>>
>>>>   net/mpls/af_mpls.c  | 191 ++++++++++++++++++++++++++++++++++++++++++++--------
>>>>   net/mpls/internal.h |   3 +
>>>>   2 files changed, 166 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>>> index c70d750..5e88118 100644
>>>> --- a/net/mpls/af_mpls.c
>>>> +++ b/net/mpls/af_mpls.c
>>>> @@ -96,22 +96,15 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
>>>>
>>>> -static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>> -                         struct sk_buff *skb, bool bos)
>>>> +static u32 mpls_multipath_hash(struct mpls_route *rt,
>>>> +                   struct sk_buff *skb, bool bos)
>>>>   {
>>>>       struct mpls_entry_decoded dec;
>>>>       struct mpls_shim_hdr *hdr;
>>>>       bool eli_seen = false;
>>>>       int label_index;
>>>> -    int nh_index = 0;
>>>>       u32 hash = 0;
>>>>
>>>> -    /* No need to look further into packet if there's only
>>>> -     * one path
>>>> -     */
>>>> -    if (rt->rt_nhn == 1)
>>>> -        goto out;
>>>> -
>>>>       for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
>>>>            label_index++) {
>>>>           if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
>>>> @@ -165,9 +158,37 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>>           }
>>>>       }
>>>>
>>>> -    nh_index = hash % rt->rt_nhn;
>>>> +    return hash;
>>>> +}
>>>> +
>>>> +static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>>>> +                         struct sk_buff *skb, bool bos)
>>>> +{
>>>> +    u32 hash = 0;
>>>> +    int nh_index;
>>>> +    int n = 0;
>>>> +
>>>> +    /* No need to look further into packet if there's only
>>>> +     * one path
>>>> +     */
>>>> +    if (rt->rt_nhn == 1)
>>>> +        goto out;
>>>> +
>>>> +    if (rt->rt_nhn_alive <= 0)
>>>> +        return NULL;
>>>> +
>>>> +    hash = mpls_multipath_hash(rt, skb, bos);
>>>> +    nh_index = hash % rt->rt_nhn_alive;
>>>> +    for_nexthops(rt) {
>>> This should be optimised in the common rt->rt_nhn_alive == rt->rt_nhn case by doing the direct array index and avoid the nh walk.
>> sure, that is an optimization.  i will add that.
>>>> +        if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
>>>> +            continue;
>>>> +        if (n == nh_index)
>>>> +            return nh;
>>>> +        n++;
>>>> +    } endfor_nexthops(rt);
>>>> +
>>>>   out:
>>>> -    return &rt->rt_nh[nh_index];
>>>> +    return &rt->rt_nh[0];
>>>>   }
>>>>
>>>>   static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>>>> @@ -365,6 +386,7 @@ static struct mpls_route *mpls_rt_alloc(int num_nh, u8 max_alen)
>>>>                GFP_KERNEL);
>>>>       if (rt) {
>>>>           rt->rt_nhn = num_nh;
>>>> +        rt->rt_nhn_alive = num_nh;
>>>>           rt->rt_max_alen = max_alen_aligned;
>>>>       }
>>>>
>>>> @@ -536,6 +558,15 @@ static int mpls_nh_assign_dev(struct net *net, struct mpls_route *rt,
>>>>
>>>>       RCU_INIT_POINTER(nh->nh_dev, dev);
>>>>
>>>> +    if (!netif_carrier_ok(dev))
>>>> +        nh->nh_flags |= RTNH_F_LINKDOWN;
>>>> +
>>>> +    if (!(dev->flags & IFF_UP))
>>>> +        nh->nh_flags |= RTNH_F_DEAD;
>>> Below you're testing IFF_RUNNING | IFF_LOWER_UP. Is the difference intended here?
>> Not really. I can change it. This is during adding the route. I tried to keep the checks consistent with ipv4.
>>
>>>> +
>>>> +    if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
>>>> +        rt->rt_nhn_alive--;
>>> You don't update your new rt->rt_flags field here. This highlights the issue with duplicating data, which you're doing with the rt_flags field.
>> I am not sure I understand completely. Would you prefer i loop and derive the rt_flags from nh_flags during dumps than storing them in rt_flags ?. Sure I can do that. I did not think it was such a big deal because, all routes (ipv4 and others) have rt_flags and all routes today carry RTNH_ flags and you have to send them to userspace in rtm_flags anyways.
>> I am just trying to keep the use and semantics of RTNH_F flags for mpls routes consistent with the other family routes.
>>
>>>> +
>>>>       return 0;
>>>>
>>>>   errout:
>>>> @@ -577,7 +608,7 @@ errout:
>>>>   }
>>>>
>>>>   static int mpls_nh_build(struct net *net, struct mpls_route *rt,
>>>> -             struct mpls_nh *nh, int oif,
>>>> +             struct mpls_nh *nh, int oif, int hops,
>>> This change isn't required.
>> ack, this is a leftover from my attempt to add weights. will remove it.
>>>>                struct nlattr *via, struct nlattr *newdst)
>>>>   {
>>>>       int err = -ENOMEM;
>>>> @@ -666,7 +697,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
>>>>           /* neither weighted multipath nor any flags
>>>>            * are supported
>>>>            */
>>>> -        if (rtnh->rtnh_hops || rtnh->rtnh_flags)
>>>> +        if (rtnh->rtnh_flags || rtnh->rtnh_flags)
>>> As the build bot has pointed out, this change is not entirely correct.
>> yes, this was fixed later.
>>>>               goto errout;
>>>>
>>>>           attrlen = rtnh_attrlen(rtnh);
>>>> @@ -681,8 +712,8 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
>>>>               goto errout;
>>>>
>>>>           err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
>>>> -                    rtnh->rtnh_ifindex, nla_via,
>>>> -                    nla_newdst);
>>>> +                    rtnh->rtnh_ifindex, rtnh->rtnh_hops,
>>>> +                    nla_via, nla_newdst);
>>> This change isn't required.
>> yep, ack. same here left over from weights. will remove it.
>>
>>>>           if (err)
>>>>               goto errout;
>>>>
>>>> @@ -875,34 +906,100 @@ free:
>>>>       return ERR_PTR(err);
>>>>   }
>>>>
>>>> -static void mpls_ifdown(struct net_device *dev)
>>>> +static void mpls_ifdown(struct net_device *dev, int event)
>>>>   {
>>>>       struct mpls_route __rcu **platform_label;
>>>>       struct net *net = dev_net(dev);
>>>> -    struct mpls_dev *mdev;
>>>>       unsigned index;
>>>> +    int dead;
>>>>
>>>>       platform_label = rtnl_dereference(net->mpls.platform_label);
>>>>       for (index = 0; index < net->mpls.platform_labels; index++) {
>>>>           struct mpls_route *rt = rtnl_dereference(platform_label[index]);
>>>> +
>>>>           if (!rt)
>>>>               continue;
>>>> +        dead = 0;
>>>>           for_nexthops(rt) {
>>>> +            if ((event == NETDEV_DOWN &&
>>>> +                 (nh->nh_flags & RTNH_F_DEAD)) ||
>>>> +                 (event == NETDEV_CHANGE &&
>>>> +                 (nh->nh_flags & RTNH_F_LINKDOWN))) {
>>>> +                dead++;
>>>> +                continue;
>>>> +            }
>>>> +
>>>>               if (rtnl_dereference(nh->nh_dev) != dev)
>>>>                   continue;
>>>> -            nh->nh_dev = NULL;
>>>> +            switch (event) {
>>>> +            case NETDEV_DOWN:
>>>> +            case NETDEV_UNREGISTER:
>>>> +                nh->nh_flags |= RTNH_F_DEAD;
>>>> +                /* fall through */
>>>> +            case NETDEV_CHANGE:
>>>> +                nh->nh_flags |= RTNH_F_LINKDOWN;
>>>> +                rt->rt_nhn_alive--;
>>> Are these operations atomic on all architectures?
>> I think so.
> Ugh.  I don't see how.
> A) You are doing read-modify-write. 
> B) You are modifying multiple fields.
>
> So while the individual field writes may be atomic the changes certainly
> are not atomic.
>
>>> Even if they are, I'm a bit worried about the RCU-correctness of this. I think the fallthrough case in mpls_select_multipath saves you, causing traffic to get via nh0 instead of the one selected by the hash (which is fine transiently).
>>>
> I share your concern.  In-place modification sounds good in principle
> for handling RCU, but there is a reason why the original version of
> RCU was called Read-Copy-Update.   Given that there are multiple fields
> that seem to depend upon each other I am not certain we can perform rcu
> safe modifications without creating a fresh copy of the route.
>
I misread the initial comment. For some reason i thought the concern pointed out was between multiple updaters. And currently those are all under rtnl. I now realize it was the reader in the packet path you were talking about which may not see the update atomically.  I see how this will need to be a fresh copy and mpls_route_update on every link state change.

  reply	other threads:[~2015-11-03 23:08 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-03  5:13 [PATCH net-next v2] mpls: support for dead routes Roopa Prabhu
2015-11-03 15:08 ` Robert Shearman
2015-11-03 17:21   ` roopa
2015-11-03 18:54     ` Eric W. Biederman
2015-11-03 23:08       ` roopa [this message]
2015-11-03 15:28 ` David Miller
2015-11-03 16:00   ` roopa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56393E5A.6080108@cumulusnetworks.com \
    --to=roopa@cumulusnetworks.com \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=netdev@vger.kernel.org \
    --cc=rshearma@brocade.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.