Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware

From: roopa <roopa@cumulusnetworks.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Scott Feldman <sfeldma@gmail.com>, Jiri Pirko <jiri@resnulli.us>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	Andy Gospodarek <andy@greyhouse.net>
Subject: Re: [PATCH net v2] switchdev: don't abort hardware ipv4 fib offload on failure to program fib entry in hardware
Date: Thu, 28 May 2015 22:37:14 -0700
Message-ID: <5567FB0A.1060900@cumulusnetworks.com>
In-Reply-To: <55673DF5.7060401@gmail.com>

On 5/28/15, 9:10 AM, John Fastabend wrote:
> On 05/28/2015 08:40 AM, Scott Feldman wrote:
>> On Thu, May 28, 2015 at 2:42 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Mon, May 18, 2015 at 10:19:16PM CEST, davem@davemloft.net wrote:
>>>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>>> Date: Sun, 17 May 2015 16:42:05 -0700
>>>>
>>>>> On most systems where you can offload routes to hardware,
>>>>> doing routing in software is not an option (the cpu limitations
>>>>> make routing impossible in software).
>>>>
>>>> You absolutely do not get to determine this policy, none of us
>>>> do.
>>>>
>>>> What matters is that by default the damn switch device being there
>>>> is %100 transparent to the user.
>>>>
>>>> And the way to achieve that default is to do software routes as
>>>> a fallback.
>>>>
>>>> I am not going to entertain changes of this nature which fail
>>>> route loading by default just because we've exceeded a device's
>>>> HW capacity to offload.
>>>>
>>>> I thought I was _really_ clear about this at netdev 0.1
>>>
>>> I certainly agree that by default, transparency 1:1 sw:hw mapping is
>>> what we need for fib. The current code is a good start!
>>>
>>> I see couple of issues regarding switchdev_fib_ipv4_abort:
>>> 1) If user adds an entry and switchdev_fib_ipv4_add fails, abort is
>>>     executed and an error is returned. I would expect that the route
>>>     entry should be added in this case. The next attempt of adding the
>>>     same entry will be successful.
>>>     The current behaviour breaks the transparency you are referring to.
>>> 2) When switchdev_fib_ipv4_abort happens to be executed, the offload is
>>>     disabled for good (until reboot). That is certainly not nice,
>>>     although I understand that is the easiest solution for now.
>>>
>>> I believe that we all agree that the 1:1 transparency, although it is a
>>> default, may not be optimal for real-life usage. HW resources are
>>> limited and user does not know them. The danger of hitting _abort and
>>> screwing-up the whole system is huge, unacceptable.
>>>
>>> So here, there are a couple of more or less simple things that I
>>> suggest to do in order to move a little bit forward:
>>> 1) Introduce a system-wide option to switch _abort to just plain fail.
>>>     When HW does not have capacity, do not flush and fall back to sw,
>>>     but rather just fail to add the entry. This would not break anything.
>>>     Userspace has to be prepared that entry add could fail.
>>> 2) Introduce a way to propagate resources to userspace. The driver knows
>>>     about resources used/available/potentially_available. Switchdev
>>>     infra could be extended in order to propagate the info to the user.
>>> 3) Introduce a couple of flags for entry add that would alter the default
>>>     behaviour. Something like:
>>>          NLM_F_SKIP_KERNEL
>>>          NLM_F_SKIP_OFFLOAD
>>>     Again, this does not break the current users. On the other hand,
>>>     this gives new users leverage to instruct the kernel where the
>>>     entry should be added to (or not added to).
>>>
>>> Any thoughts? Objections?
>>
>> I don't like these.  They break transparency and force the user into
>> the position of having to know hardware failure modes (unique to each
>> hardware device).  I presented an option d) which avoids these issues;
>> was it not understood?
>>
>
> Hi Scott,
>
> I understood your proposal. One caveat I had is in response to this,
>
> "Actually, now that I think of it, the device/driver could decide which
> related-prefix to evict from HW, if driver/device wanted to have a
> sense of which routes are more important to offload than other"
>
> hardware/driver/device shouldn't have a sense of which routes are more
> important than others. 

Correct. The routing daemons know this best.

> I think this is where the NLM_F_* flags come in.
> If userspace _wants_ to push policy into the kernel about what is
> important it can. If it doesn't we get a sensible heuristic that does
> a reasonable job offloading rules transparently. This is how we did
> L2 and I think that seems to work fairly well. At least for me but,
> always interested to hear other use cases though.

Agreed.

>
> Also I guess I'm not seeing the multitude of hardware failure modes. I
> see two: either the hardware doesn't support the operation or it is out
> of resources. Both can be learned if the hardware exports a model of its
> capabilities and resources.

Agreed. A switchdev API to query hardware resources and capabilities is due.
We can start with rocker. That gives an app the choice to control the policy.
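
As a rough illustration of what such a query could let an app do, here is a toy
Python model, not kernel code. `FibResources` and `can_offload` are made-up
names, and the used/available split is an assumption borrowed from Jiri's
point 2 quoted above; no such switchdev structure exists today.

```python
from dataclasses import dataclass


@dataclass
class FibResources:
    """Hypothetical snapshot a driver might report for its IPv4 FIB table."""
    capacity: int  # total route slots the hardware offers (assumed field)
    used: int      # slots currently occupied (assumed field)

    @property
    def available(self) -> int:
        # Slots still free for new offloaded routes.
        return self.capacity - self.used


def can_offload(res: FibResources, new_routes: int = 1) -> bool:
    """Let the app check headroom before it tries to add routes."""
    return res.available >= new_routes
```

With something like this, a routing daemon could decide for itself which
routes to push to hardware instead of discovering the limit through a failed
add.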

But for our use cases and deployments today, I am more interested in a
system-wide policy, because it is easier on my apps.
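
The two system-wide behaviours discussed in this thread could be modelled
roughly as below. This is a toy Python sketch, not switchdev code:
`OffloadPolicy`, `Fib`, and `HwFull` are hypothetical names, the hardware is
reduced to a fixed number of route slots, and ABORT here models the
transparent case where the route still lands in software.

```python
from enum import Enum


class OffloadPolicy(Enum):
    ABORT = "abort"  # flush hardware routes, route everything in software
    FAIL = "fail"    # reject the new route, keep existing offload intact


class HwFull(Exception):
    """Raised under the FAIL policy when hardware has no room left."""


class Fib:
    def __init__(self, hw_capacity: int, policy: OffloadPolicy):
        self.hw_capacity = hw_capacity  # assumed fixed slot count
        self.policy = policy
        self.sw_routes = set()
        self.hw_routes = set()
        self.offload_enabled = True

    def add_route(self, prefix: str) -> None:
        hw_full = len(self.hw_routes) >= self.hw_capacity
        if self.offload_enabled and hw_full:
            if self.policy is OffloadPolicy.FAIL:
                # Plain failure: nothing is flushed, the caller sees an error.
                raise HwFull(prefix)
            # Abort: drop all offloaded routes and stop offloading.
            self.hw_routes.clear()
            self.offload_enabled = False
        self.sw_routes.add(prefix)
        if self.offload_enabled:
            self.hw_routes.add(prefix)
```

Under FAIL the app gets an immediate error and the routes already in hardware
keep forwarding there, which is the behaviour that is easier for my apps to
handle.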

thanks.
