* switchdev fib offload issues
From: Jiri Pirko @ 2016-04-18 15:47 UTC (permalink / raw)
  To: netdev
  Cc: davem, idosch, eladr, yotamg, ogerlitz, roopa, nikolay, jhs,
	john.fastabend, rami.rosen, gospo, stephen, sfeldma, dsa,
	f.fainelli, andrew, vivien.didelot, tgraf, aduyck

Hi all.

The current state of fib offloading is not good, and I believe we need
to make some changes, which is why I'm writing this email. Please read,
think and comment.

Currently, for every fib entry inserted into a table, there is a call
to switchdev:
 fib_table_insert->switchdev_fib_ipv4_add
The driver then pushes the fib entry down to HW. So far so good.

However, if the switchdev add operation fails for any reason, an abort
function is called (switchdev_fib_ipv4_abort). This function does two
things, both of which are unfortunate in many use cases:
1) evicts all fib entries from HW, leaving all processing to the kernel
    - For the Spectrum ASIC this means that all traffic running at 100G
      between all ports is immediately downgraded to ~1-3Gbits
    - This also happens silently: the user gets no indication that
      anything went wrong, only forwarding performance suddenly sucks.

2) sets net->ipv4.fib_offload_disabled = true
    - As a result, no other fib entry is offloaded, ever, until the
      net is removed and added again; in the case of init_ns, a
      machine reboot is required
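
To make the two effects above concrete, here is a minimal userspace
sketch. It is not kernel code: every model_* name, the fixed-size table
and the capacity-based failure are assumptions made up for illustration.
Only the behaviour (evict everything, set the disabled flag, report
nothing to the user) is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ROUTES 8

/* Hypothetical stand-in for one fib entry. */
struct model_route {
	unsigned int prefix;
	bool offloaded;
};

/* Hypothetical stand-in for the per-netns state. */
struct model_net {
	struct model_route routes[MODEL_MAX_ROUTES];
	size_t nroutes;
	size_t hw_capacity;		/* entries the modelled HW can hold */
	size_t hw_used;
	bool fib_offload_disabled;	/* mirrors the flag named above */
};

/* Modelled driver call: fails once the HW table is full. */
static int model_hw_add(struct model_net *net)
{
	if (net->hw_used >= net->hw_capacity)
		return -ENOBUFS;
	net->hw_used++;
	return 0;
}

/* Modelled abort: evict everything and disable offload for good. */
static void model_abort(struct model_net *net)
{
	for (size_t i = 0; i < net->nroutes; i++)
		net->routes[i].offloaded = false;
	net->hw_used = 0;
	net->fib_offload_disabled = true;
}

/* Modelled insert: the route always lands in the software table. */
static int model_fib_insert(struct model_net *net, unsigned int prefix)
{
	struct model_route *r;

	assert(net != NULL);
	if (net->nroutes >= MODEL_MAX_ROUTES)
		return -ENOBUFS;

	r = &net->routes[net->nroutes++];
	r->prefix = prefix;
	r->offloaded = false;

	if (net->fib_offload_disabled)
		return 0;	/* never offloaded again */

	if (model_hw_add(net) == 0)
		r->offloaded = true;
	else
		model_abort(net);	/* silent: caller still sees 0 */
	return 0;
}
```

With hw_capacity = 2, the third insert in this model returns 0 yet
leaves no route offloaded, and every later insert skips the HW
entirely.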

These two issues make fib offload completely unusable. So I propose
that we start thinking about fixing this.

I believe that although the current behaviour might be a good default,
the user should be able to change it by setting a different policy.
This policy would allow the offload error to be propagated to the user.

Note that the user already has to handle fib add errors which are
independent of the particular fib entry, such as the case of
insufficient memory (-ENOBUFS). In fact, when offload fails, it is most
likely also due to insufficient resources in HW.

Proposed solutions (ideas):
1) per-netns. Add a procfs file:
	/proc/sys/net/ipv4/route/fib_offload_error_policy
	  with values: "evict" - default, current behaviour
	               "fail"  - propagate offload error to user
	The policy value would be stored in struct net.
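
A rough userspace sketch of how such a policy value could be parsed and
applied. The two strings "evict" and "fail" come from the proposal
above; the enum, the function names and the tolerance for a trailing
newline are assumptions for illustration only, not an existing kernel
interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy values matching the two proposed strings. */
enum model_offload_policy {
	MODEL_POLICY_EVICT,	/* default, current behaviour */
	MODEL_POLICY_FAIL,	/* propagate offload error to user */
};

/* Parse a written value; one trailing newline (as from echo) is ok. */
static int model_policy_parse(const char *buf,
			      enum model_offload_policy *out)
{
	size_t len;

	assert(buf != NULL && out != NULL);
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == 5 && strncmp(buf, "evict", 5) == 0) {
		*out = MODEL_POLICY_EVICT;
		return 0;
	}
	if (len == 4 && strncmp(buf, "fail", 4) == 0) {
		*out = MODEL_POLICY_FAIL;
		return 0;
	}
	return -EINVAL;
}

/*
 * Decide what the user sees when the HW add failed with hw_err.
 * "fail": hand the error back and leave offload enabled.
 * "evict": hide the error and disable offload, as is done today.
 */
static int model_offload_error(enum model_offload_policy policy,
			       int hw_err, bool *offload_disabled)
{
	if (policy == MODEL_POLICY_FAIL)
		return hw_err;

	*offload_disabled = true;
	return 0;
}
```

Under "fail" the caller gets the HW error, e.g. -ENOBUFS, back from the
route add and can react to it; under "evict" the add still reports
success.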

2) per-VRF/table
	When the user creates a VRF master, he specifies the table ID
	this VRF is going to use. I propose to extend this so
	he can also pass a policy ("evict"/"fail").
	The policy value would be stored in struct fib_table or
	struct fib6_table. The problem is that vrf only saves the
	table ID and allocates a dst, but does not actually create
	the table. That might be created later. But I think this
	could be resolved.

3) per-VRF/master_netdev
	In this case, the policy would also be set during
	the creation of the VRF master. From the user's perspective,
	this looks the same as 2)
	The policy value would be stored in struct net_vrf (vrf private).
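
For the "table might be created later" problem in 2), one possible way
out is to remember the policy by table ID when the VRF master is
created and have the table pick it up whenever it appears. The sketch
below is a userspace model of that idea only; all names and the
fixed-size slot array are made up for illustration and do not
correspond to existing kernel structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_tbl_policy { MODEL_TBL_EVICT = 0, MODEL_TBL_FAIL };

/* Hypothetical record: policy remembered per table ID. */
struct model_tbl_slot {
	unsigned int table_id;
	enum model_tbl_policy policy;
	bool known;	/* a VRF master or a table uses this ID */
	bool created;	/* the table itself exists */
};

#define MODEL_MAX_TABLES 4

struct model_tables {
	struct model_tbl_slot slots[MODEL_MAX_TABLES];
};

/* Look up a slot by table ID; NULL if the ID was never seen. */
static struct model_tbl_slot *model_tbl_find(struct model_tables *t,
					     unsigned int id)
{
	assert(t != NULL);
	for (size_t i = 0; i < MODEL_MAX_TABLES; i++)
		if (t->slots[i].known && t->slots[i].table_id == id)
			return &t->slots[i];
	return NULL;
}

/* Find the slot for this ID, or claim a free one (default: evict). */
static struct model_tbl_slot *model_tbl_get(struct model_tables *t,
					    unsigned int id)
{
	struct model_tbl_slot *s = model_tbl_find(t, id);

	for (size_t i = 0; !s && i < MODEL_MAX_TABLES; i++) {
		if (t->slots[i].known)
			continue;
		s = &t->slots[i];
		s->table_id = id;
		s->policy = MODEL_TBL_EVICT;
		s->known = true;
		s->created = false;
	}
	return s;
}

/* VRF master creation: only the ID and the policy are recorded. */
static int model_vrf_master_create(struct model_tables *t,
				   unsigned int id,
				   enum model_tbl_policy policy)
{
	struct model_tbl_slot *s = model_tbl_get(t, id);

	if (!s)
		return -ENOBUFS;
	s->policy = policy;
	return 0;
}

/* Later table creation: keeps any policy recorded for its ID. */
static int model_table_create(struct model_tables *t, unsigned int id)
{
	struct model_tbl_slot *s = model_tbl_get(t, id);

	if (!s)
		return -ENOBUFS;
	s->created = true;
	return 0;
}

/* Policy in effect for a table; unknown IDs fall back to evict. */
static enum model_tbl_policy model_table_policy(struct model_tables *t,
						unsigned int id)
{
	struct model_tbl_slot *s = model_tbl_find(t, id);

	return s ? s->policy : MODEL_TBL_EVICT;
}
```

In this model a table created without a VRF master keeps the "evict"
default, so behaviour for existing setups would not change.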

Thoughts?

Thanks!

Jiri
