Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC] ethtool: Support for driver private ioctl's
From: Andrew Lunn @ 2018-04-06 14:47 UTC (permalink / raw)
  To: Jose Abreu
  Cc: Florian Fainelli, David Miller, Jakub Jelinek, Jeff Garzik,
	Tim Hockin, Eli Kupermann, Chris Leech, Scott Feldman,
	Ben Hutchings, netdev, Joao Pinto
In-Reply-To: <27c05ec7-1f85-8e4a-be06-70d6d80e8a10@synopsys.com>

On Fri, Apr 06, 2018 at 02:51:15PM +0100, Jose Abreu wrote:
> Hi Florian,
> 
> On 05-04-2018 16:50, Florian Fainelli wrote:
> >
> > On 04/05/2018 03:47 AM, Jose Abreu wrote:
> >> Hi All,
> >>
> >> I would like to know your opinion regarding adding support for
> >> driver private ioctl's in ethtool.
> >>
> >> Background: Synopsys Ethernet IP's have a certain number of
> >> features which can be reconfigured at runtime. Giving you two
> >> examples: One of the most recent one is the safety features,
> >> which can be enabled/disabled and forced at runtime.

Hi Jose

Is there a reason somebody would decide to use the Ethernet in
'unsafe' mode? Cannot you just turn it on by default?

	 Andrew

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Eric W. Biederman @ 2018-04-06 14:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Kirill Tkhai, davem, gregkh, netdev, linux-kernel, avagin, serge
In-Reply-To: <20180406130704.GB9263@gmail.com>

Christian Brauner <christian.brauner@canonical.com> writes:

> On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
>> Christian Brauner <christian.brauner@canonical.com> writes:
>> 
>> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
>> >> On 05.04.2018 17:07, Christian Brauner wrote:
>> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
>> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
>> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
>> >> >>>
>> >> >>> enabled sending hotplug events into all network namespaces back in 2010.
>> >> >>> Over time the set of uevents that get sent into all network namespaces has
>> >> >>> shrunk. We have now reached the point where hotplug events for all devices
>> >> >>> that carry a namespace tag are filtered according to that namespace.
>> >> >>>
>> >> >>> Specifically, they are filtered whenever the namespace tag of the kobject
>> >> >>> does not match the namespace tag of the netlink socket. One example are
>> >> >>> network devices. Uevents for network devices only show up in the network
>> >> >>> namespaces these devices are moved to or created in.
>> >> >>>
>> >> >>> However, any uevent for a kobject that does not have a namespace tag
>> >> >>> associated with it will not be filtered and we will *try* to broadcast it
>> >> >>> into all network namespaces.
>> >> >>>
>> >> >>> The original patchset was written in 2010 before user namespaces were a
>> >> >>> thing. With the introduction of user namespaces sending out uevents became
>> >> >>> partially isolated as they were filtered by user namespaces:
>> >> >>>
>> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
>> >> >>>
>> >> >>> if (!net_eq(sock_net(sk), p->net)) {
>> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>> >> >>>                 return;
>> >> >>>
>> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
>> >> >>>                 return;
>> >> >>>
>> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >> >>>                              CAP_NET_BROADCAST))
>> >> >>>         j       return;
>> >> >>> }
>> >> >>>
>> >> >>> The file_ns_capable() check will check whether the caller had
>> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
>> >> >>> namespace of interest. This check is fine in general but seems insufficient
>> >> >>> to me when paired with uevents. The reason is that devices always belong to
>> >> >>> the initial user namespace so uevents for kobjects that do not carry a
>> >> >>> namespace tag should never be sent into another user namespace. This has
>> >> >>> been the intention all along. But there's one case where this breaks,
>> >> >>> namely if a new user namespace is created by root on the host and an
>> >> >>> identity mapping is established between root on the host and root in the
>> >> >>> new user namespace. Here's a reproducer:
>> >> >>>
>> >> >>>  sudo unshare -U --map-root
>> >> >>>  udevadm monitor -k
>> >> >>>  # Now change to initial user namespace and e.g. do
>> >> >>>  modprobe kvm
>> >> >>>  # or
>> >> >>>  rmmod kvm
>> >> >>>
>> >> >>> will allow the non-initial user namespace to retrieve all uevents from the
>> >> >>> host. This seems very anecdotal given that in the general case user
>> >> >>> namespaces do not see any uevents and also can't really do anything useful
>> >> >>> with them.
>> >> >>>
>> >> >>> Additionally, it is now possible to send uevents from userspace. As such we
>> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
>> >> >>> namespace of the network namespace of the netlink socket) userspace process
>> >> >>> make a decision what uevents should be sent.
>> >> >>>
>> >> >>> This makes me think that we should simply ensure that uevents for kobjects
>> >> >>> that do not carry a namespace tag are *always* filtered by user namespace
>> >> >>> in kobj_bcast_filter(). Specifically:
>> >> >>> - If the owning user namespace of the uevent socket is not init_user_ns the
>> >> >>>   event will always be filtered.
>> >> >>> - If the network namespace the uevent socket belongs to was created in the
>> >> >>>   initial user namespace but was opened from a non-initial user namespace
>> >> >>>   the event will be filtered as well.
>> >> >>> Put another way, uevents for kobjects not carrying a namespace tag are now
>> >> >>> always only sent to the initial user namespace. The regression potential
>> >> >>> for this is near to non-existent since user namespaces can't really do
>> >> >>> anything with interesting devices.
>> >> >>>
>> >> >>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
>> >> >>> ---
>> >> >>>  lib/kobject_uevent.c | 10 +++++++++-
>> >> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> >> >>>
>> >> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> >> >>> index 15ea216a67ce..cb98cddb6e3b 100644
>> >> >>> --- a/lib/kobject_uevent.c
>> >> >>> +++ b/lib/kobject_uevent.c
>> >> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>> >> >>>  		return sock_ns != ns;
>> >> >>>  	}
>> >> >>>  
>> >> >>> -	return 0;
>> >> >>> +	/*
>> >> >>> +	 * The kobject does not carry a namespace tag so filter by user
>> >> >>> +	 * namespace below.
>> >> >>> +	 */
>> >> >>> +	if (sock_net(dsk)->user_ns != &init_user_ns)
>> >> >>> +		return 1;
>> >> >>> +
>> >> >>> +	/* Check if socket was opened from non-initial user namespace. */
>> >> >>> +	return sk_user_ns(dsk) != &init_user_ns;
>> >> >>>  }
>> >> >>>  #endif
>> >> >>
>> >> >> So, this prohibits to listen events of all devices except network-related
>> >> >> in containers? If it's so, I don't think it's a good solution. Uevents is not
>> >> > 
>> >> > No, this is not correct: As it is right now *without my patch* no
>> >> > non-initial user namespace is receiving *any uevents* but those
>> >> > specifically namespaced such as those for network devices. This patch
>> >> > doesn't change that at all. The commit message outlines this in detail
>> >> > how this comes about.
>> >> > There is only one case where this currently breaks and this is as I
>> >> > outlined explicitly in my commit message when you create a new user
>> >> > namespace and map container(0) -> host(0). This patch fixes this.
>> >> 
>> >> Could you please point the place, where non-initial user namespaces are filtered?
>> >> I only see the kobj_bcast_filter() logic, and it used to return 0, which means "accepted".
>> >> Now it will return 1 sometimes.
>> >
>> > Oh sure, it's in the commit message though. The callchain is
>> > lib/kobject_uevent.c:kobject_uevent_net_broadcast() ->
>> > nnet/netlink/af_netlink.c:netlink_broadcast_filtered() ->
>> > net/netlink/af_netlink.c:do_one_broadcast():
>> >
>> > This codepiece will check whether the openened socket holds
>> > CAP_NET_BROADCAST in the user namespace of the target network namespace
>> > which it won't because we don't have device namespaces and all devices
>> > belong to the initial set of namespaces.
>> >
>> >         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>> >                              CAP_NET_BROADCAST))
>> >         j       return;
>> >
>> 
>> The above that only applies if someone has set NETLINK_F_LISTEN_ALL_NSID
>> on their socket and has had someone with the appropriate privileges
>> assign a peerrnetid.
>> 
>> All of which is to say that file_ns_capable is not nearly as applicable
>> as it might be, and if you can pass the other two checks I think it is
>> pointless (because the peernet attributes are not generated for
>> kobj_uevents) but valid to receive events from outside your network
>> namespace.
>> 
>> 
>> I might be missing something but I don't see anything excluding network
>> namespaces owned by !init_user_ns excluded from the kobject_uevent
>> logic.
>> 
>> The uevent_sock_list has one entry per network namespace. And
>> kobject_uevent_net_broacast appears to walk each one.
>
> Yeah, it definitely does.
>
>> 
>> I had a memory of filtering uevent messages and I had a memory
>> that the netlink_has_listeners had a per network namespace effect.
>> Neither seems true from my inspection of the code tonight.
>
> Yeah, I drew the same conclusion.
>
>> 
>> If we are not filtering ordinary uevents at least at the user namespace
>> level that does seem to be at least a nuisance.
>
> This patch would filter based on user namespace and bump it from "we're
> accidently doing the right thing" to "we're doing the right on purpose"
> and without accidently leaking uevents in some corner cases.
> Using kobj_bcast_filter() for this seems like the right thing to do.
> This sounds like you don't disagree with the approach I'm taking here?

How are we accidentally filtering?  If we can not describe how the code
currently works to achieve something we don't understand it well enough
to change it.



> On a sidenote, it also very much feels like we're leaking information if
> we're not filtering based on user namespaces on purpose. Non-initial
> user namespaces should not be able to receive information about device
> attribues simply via netlink uevent messages. At least not *trivially*
> [1] by opening a netlink uevent socket.

This is not at all about isolation.  Devices don't belong to user
namespaces.  This is about it being useless/silly to send events
to those sockets as the receiver could not do anything with the uevent.
At a practical level there should be no receivers.  Plus performance
issues.  At least my memory is that any unprivileged user on the system
is allowed to listen to those events.

At least at one point udev listening and reacting to events in every
network namespace in containers was a significant slow down on the
system.

The report was that adding netlink_has_listeners greatly sped up uevent
sending.   That testing appears faulty.  I think the real gain was
userspace not listening to uevents in every container.

Which means uevent injection for containers has some real technical
challenges to overcome before it can be wide spread.

>> Christian can you dig a little deeper into this.  I have a feeling that
>> there are some real efficiency improvements that we could make to
>> kobject_uevent_net_broadcast if nothing else.
>
> Yeah, for sure! I already started doing this. This patch here is
> basically a preparatory patch for that work which is on my todo.

Limiting the network namespaces on uevent_sock_list to just network
namespaces owned by the initial user namespaces would be a lot better
than your patch.


At a practical level I am concerned what will happen when we stand up a
uevent listener aka udev in say 100 or maybe 1000 unprivileged
containers on a system.  History suggests the current data structures
won't scale to the problem, and the fixes that were put in place kernel
side only did not change anything.  It was only the userspace fixes
that made a difference.

Which suggests either improving the isolation between network namespaces
for netlink multicast sockets or putting:


	if (!net_eq(sock_net(sk), p->net)) {
		if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
			return;

		if (!peernet_has_id(sock_net(sk), p->net))
			return;

		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
				     CAP_NET_BROADCAST))
			return;
	}

In a filter and having an appropriate filter so that
netlink_broadcast_filtered only needs to be called once from
kobject_uevent_net_broadcast.

It is more difficult but in practice I expect we have far more to gain
by fixing the mc_list and the netlink_has_listeners function to be per
network namespace basis.  Handling the NETLINK_F_LISTEN_ALL_NSID will
be trickier in that case but the current semantics appear correct
for that case.

But before we do anything we have to test and see if the assertion
that we don't make it to netlink listeners in network namespaces
that are outside of the init_user_ns is true.   If it is true we need to
find the code that makes it true.

Eric

^ permalink raw reply

* Re: [PATCH net 1/3] lan78xx: PHY DSP registers initialization to address EEE link drop issues with long cables
From: Andrew Lunn @ 2018-04-06 14:43 UTC (permalink / raw)
  To: Raghuram Chary J; +Cc: davem, netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180406061204.18257-2-raghuramchary.jallipalli@microchip.com>

On Fri, Apr 06, 2018 at 11:42:02AM +0530, Raghuram Chary J wrote:
> The patch is to configure DSP registers of PHY device
> to handle Gbe-EEE failures with >40m cable length.
> 
> Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
> Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
> ---
>  drivers/net/phy/microchip.c  | 123 ++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/microchipphy.h |   8 +++
>  2 files changed, 130 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
> index 0f293ef28935..174ae9808722 100644
> --- a/drivers/net/phy/microchip.c
> +++ b/drivers/net/phy/microchip.c
> @@ -20,6 +20,7 @@
>  #include <linux/ethtool.h>
>  #include <linux/phy.h>
>  #include <linux/microchipphy.h>
> +#include <linux/delay.h>
>  
>  #define DRIVER_AUTHOR	"WOOJUNG HUH <woojung.huh@microchip.com>"
>  #define DRIVER_DESC	"Microchip LAN88XX PHY driver"
> @@ -66,6 +67,107 @@ static int lan88xx_suspend(struct phy_device *phydev)
>  	return 0;
>  }
>  
> +static void lan88xx_TR_reg_set(struct phy_device *phydev, u16 regaddr,
> +			       u32 data)
> +{
> +	int val;
> +	u16 buf;
> +
> +	/* Get access to token ring page */
> +	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
> +		  LAN88XX_EXT_PAGE_ACCESS_TR);

Hi Raghuram

You might want to look at phy_read_paged(), phy_write_paged(), etc.

There can be race conditions with paged access.

      Andrew

^ permalink raw reply

* Re: [PATCH net 2/3] lan78xx: Add support to dump lan78xx registers
From: Andrew Lunn @ 2018-04-06 14:40 UTC (permalink / raw)
  To: Raghuram Chary J; +Cc: davem, netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180406061204.18257-3-raghuramchary.jallipalli@microchip.com>

> +static int lan78xx_get_regs_len(struct net_device *netdev)
> +{
> +	return (sizeof(lan78xx_regs) + PHY_REG_SIZE);

If there is no PHY attached, you probably should not include
PHY_REG_SIZE here.

     Andrew

^ permalink raw reply

* Re: [PATCH net 3/3] lan78xx: Lan7801 Support for Fixed PHY
From: Andrew Lunn @ 2018-04-06 14:35 UTC (permalink / raw)
  To: Raghuram Chary J; +Cc: davem, netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180406061204.18257-4-raghuramchary.jallipalli@microchip.com>

On Fri, Apr 06, 2018 at 11:42:04AM +0530, Raghuram Chary J wrote:
> Adding Fixed PHY support to the lan78xx driver.

Hi Raghuram

What do you expect is connected to the MAC if there is no PHY?

     Andrew

^ permalink raw reply

* Re: Enable and configure storm prevention in a network device
From: Andrew Lunn @ 2018-04-06 14:30 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: David Miller, m-karicheri2, netdev
In-Reply-To: <75eb56a4-ded2-5ed5-116c-776312f93cf3@gmail.com>

On Thu, Apr 05, 2018 at 03:35:06PM -0700, Florian Fainelli wrote:
> On 04/05/2018 01:20 PM, David Miller wrote:
> > From: Murali Karicheri <m-karicheri2@ti.com>
> > Date: Thu, 5 Apr 2018 16:14:49 -0400
> > 
> >> Is there a standard way to implement and configure storm prevention
> >> in a Linux network device?
> > 
> > What kind of "storm", an interrupt storm?
> > 
> 
> I would assume Murali is referring to L2 broadcast storms which is
> common in switches. There is not an API for that AFAICT and I am not
> sure what a proper API would look like.

tc?

The Marvell switches have leaky buckets, which can be used for
limiting broadcast and multicast packets, as well as traffic shaping
in general. Storm prevention is just a form of traffic shaping, so if
we have generic traffic shaping, it can be used for storm prevention.

   Andrew

^ permalink raw reply

* Re: [PATCH net] net: dsa: Discard frames from unused ports
From: Andrew Lunn @ 2018-04-06 14:25 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: David Miller, netdev, Vivien Didelot
In-Reply-To: <9421e733-ed1a-0e2e-b587-4f32dad6d8cb@gmail.com>

On Thu, Apr 05, 2018 at 03:20:14PM -0700, Florian Fainelli wrote:
> On 04/04/2018 07:17 PM, Andrew Lunn wrote:
> > On Wed, Apr 04, 2018 at 05:49:10PM -0700, Florian Fainelli wrote:
> >> On 04/04/2018 04:56 PM, Andrew Lunn wrote:
> >>> The Marvell switches under some conditions will pass a frame to the
> >>> host with the port being the CPU port. Such frames are invalid, and
> >>> should be dropped. Not dropping them can result in a crash when
> >>> incrementing the receive statistics for an invalid port.
> >>>
> >>> Reported-by: Chris Healy <cphealy@gmail.com>
> >>> Fixes: 5f6b4e14cada ("net: dsa: User per-cpu 64-bit statistics")
> >>
> >> Are you sure this is the commit that introduced the problem?
> > 
> > Hi Florian
> > 
> > Well, the problem is it crashes when trying to update the
> > statistics. The CPU port is not allocated a p->stats64, only slave
> > ports get those. So before this patch, there was no crash and the
> > frame would be delivered to the master interface. This in itself is
> > probably not correct, but also not fatal. Talking to Chris, it seems
> > this behaviour has existing for a long while. I needed to use lldpd to
> > trigger the issue, because i assume the Marvell switch sees these as
> > special frames and forwards them to the CPU. The other thing is, the
> > code got refactored recently. So this fix will not rebase to too many
> > earlier versions. It needs a fix per tagging protocol for before the
> > common dsa_master_find_slave() was added.
> 
> Yes what you are explaining makes sense, but does not that mean we would
> just be accessing a garbage memory location before as well?

Humm, yes. I actually picked the wrong patch. It took two attempts to
get the stats64 working. I should of picked the first one.

Before stats64, we just used skb->dev. I need to look back at older
code, but skb->dev is valid in the versions i tested. It points to the
master device. So we don't crash.

However, i agree, we should fix this for the LTS kernels.

	 Andrew

^ permalink raw reply

* Re: [PATCH] dp83640: Ensure against premature access to PHY registers after reset
From: Andrew Lunn @ 2018-04-06 14:14 UTC (permalink / raw)
  To: Esben Haabendal
  Cc: netdev, Esben Haabendal, Richard Cochran, Florian Fainelli,
	linux-kernel
In-Reply-To: <20180406140540.13511-1-esben.haabendal@gmail.com>

On Fri, Apr 06, 2018 at 04:05:40PM +0200, Esben Haabendal wrote:
> From: Esben Haabendal <eha@deif.com>
> 
> Signed-off-by: Esben Haabendal <eha@deif.com>
> ---
>  drivers/net/phy/dp83640.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
> index 654f42d00092..48403170096a 100644
> --- a/drivers/net/phy/dp83640.c
> +++ b/drivers/net/phy/dp83640.c
> @@ -1207,6 +1207,22 @@ static void dp83640_remove(struct phy_device *phydev)
>  	kfree(dp83640);
>  }
>  
> +static int dp83640_soft_reset(struct phy_device *phydev)
> +{
> +	int ret;
> +
> +	ret = genphy_soft_reset(phydev);
> +	if (ret < 0)
> +		return ret;
> +
> +	/* From DP83640 datasheet: "Software driver code must wait 3 us
> +	 * following a software reset before allowing further serial MII
> +	 * operations with the DP83640." */
> +	udelay(3);

Hi Esben

The accuracy of udelay() is not guaranteed. So you probably want to be
a bit pessimistic, and use 10.

  Andrew

^ permalink raw reply

* Re: [PATCH net] net/sched: fix NULL dereference in the error path of tcf_bpf_init()
From: Lucas Bates @ 2018-04-06 14:11 UTC (permalink / raw)
  To: Davide Caratti
  Cc: David S. Miller, Jamal Hadi Salim, Cong Wang,
	Linux Kernel Network Developers
In-Reply-To: <906cf9da70a7d933a1c64814ce8524502dac2b0f.1522968919.git.dcaratti@redhat.com>

On Thu, Apr 5, 2018 at 7:19 PM, Davide Caratti <dcaratti@redhat.com> wrote:
> when tcf_bpf_init_from_ops() fails (e.g. because of program having invalid
> number of instructions), tcf_bpf_cfg_cleanup() calls bpf_prog_put(NULL) or
> bpf_prog_destroy(NULL). Unless CONFIG_BPF_SYSCALL is unset, this causes
> the following error:
>
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
>  PGD 800000007345a067 P4D 800000007345a067 PUD 340e1067 PMD 0
>  Oops: 0000 [#1] SMP PTI
>  Modules linked in: act_bpf(E) ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd glue_helper cryptd joydev snd_timer snd virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console i2c_core crc32c_intel serio_raw virtio_pci ata_piix libata virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf]
>  CPU: 3 PID: 5654 Comm: tc Tainted: G            E    4.16.0.bpf_test+ #408
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:__bpf_prog_put+0xc/0xc0
>  RSP: 0018:ffff9594003ef728 EFLAGS: 00010202
>  RAX: 0000000000000000 RBX: ffff9594003ef758 RCX: 0000000000000024
>  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
>  RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
>  R10: 0000000000000220 R11: ffff8a7ab9f17131 R12: 0000000000000000
>  R13: ffff8a7ab7c3c8e0 R14: 0000000000000001 R15: ffff8a7ab88f1054
>  FS:  00007fcb2f17c740(0000) GS:ffff8a7abfd80000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000020 CR3: 000000007c888006 CR4: 00000000001606e0
>  Call Trace:
>   tcf_bpf_cfg_cleanup+0x2f/0x40 [act_bpf]
>   tcf_bpf_cleanup+0x4c/0x70 [act_bpf]
>   __tcf_idr_release+0x79/0x140
>   tcf_bpf_init+0x125/0x330 [act_bpf]
>   tcf_action_init_1+0x2cc/0x430
>   ? get_page_from_freelist+0x3f0/0x11b0
>   tcf_action_init+0xd3/0x1b0
>   tc_ctl_action+0x18b/0x240
>   rtnetlink_rcv_msg+0x29c/0x310
>   ? _cond_resched+0x15/0x30
>   ? __kmalloc_node_track_caller+0x1b9/0x270
>   ? rtnl_calcit.isra.29+0x100/0x100
>   netlink_rcv_skb+0xd2/0x110
>   netlink_unicast+0x17c/0x230
>   netlink_sendmsg+0x2cd/0x3c0
>   sock_sendmsg+0x30/0x40
>   ___sys_sendmsg+0x27a/0x290
>   ? mem_cgroup_commit_charge+0x80/0x130
>   ? page_add_new_anon_rmap+0x73/0xc0
>   ? do_anonymous_page+0x2a2/0x560
>   ? __handle_mm_fault+0xc75/0xe20
>   __sys_sendmsg+0x58/0xa0
>   do_syscall_64+0x6e/0x1a0
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>  RIP: 0033:0x7fcb2e58eba0
>  RSP: 002b:00007ffc93c496c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>  RAX: ffffffffffffffda RBX: 00007ffc93c497f0 RCX: 00007fcb2e58eba0
>  RDX: 0000000000000000 RSI: 00007ffc93c49740 RDI: 0000000000000003
>  RBP: 000000005ac6a646 R08: 0000000000000002 R09: 0000000000000000
>  R10: 00007ffc93c49120 R11: 0000000000000246 R12: 0000000000000000
>  R13: 00007ffc93c49804 R14: 0000000000000001 R15: 000000000066afa0
>  Code: 5f 00 48 8b 43 20 48 c7 c7 70 2f 7c b8 c7 40 10 00 00 00 00 5b e9 a5 8b 61 00 0f 1f 44 00 00 0f 1f 44 00 00 41 54 55 48 89 fd 53 <48> 8b 47 20 f0 ff 08 74 05 5b 5d 41 5c c3 41 89 f4 0f 1f 44 00
>  RIP: __bpf_prog_put+0xc/0xc0 RSP: ffff9594003ef728
>  CR2: 0000000000000020
>
> Fix it in tcf_bpf_cfg_cleanup(), ensuring that bpf_prog_{put,destroy}(f)
> is called only when f is not NULL.
>
> Fixes: bbc09e7842a5 ("net/sched: fix idr leak on the error path of tcf_bpf_init()")
> Reported-by: Lucas Bates <lucasb@mojatatu.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

That does the trick, thanks Davide.

Tested-by: Lucas Bates <lucasb@mojatatu.com>

^ permalink raw reply

* [PATCH] dp83640: Ensure against premature access to PHY registers after reset
From: Esben Haabendal @ 2018-04-06 14:05 UTC (permalink / raw)
  To: netdev
  Cc: Esben Haabendal, Richard Cochran, Andrew Lunn, Florian Fainelli,
	linux-kernel

From: Esben Haabendal <eha@deif.com>

Signed-off-by: Esben Haabendal <eha@deif.com>
---
 drivers/net/phy/dp83640.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 654f42d00092..48403170096a 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -1207,6 +1207,22 @@ static void dp83640_remove(struct phy_device *phydev)
 	kfree(dp83640);
 }
 
+static int dp83640_soft_reset(struct phy_device *phydev)
+{
+	int ret;
+
+	ret = genphy_soft_reset(phydev);
+	if (ret < 0)
+		return ret;
+
+	/* From DP83640 datasheet: "Software driver code must wait 3 us
+	 * following a software reset before allowing further serial MII
+	 * operations with the DP83640." */
+	udelay(3);
+
+	return 0;
+}
+
 static int dp83640_config_init(struct phy_device *phydev)
 {
 	struct dp83640_private *dp83640 = phydev->priv;
@@ -1501,6 +1517,7 @@ static struct phy_driver dp83640_driver = {
 	.flags		= PHY_HAS_INTERRUPT,
 	.probe		= dp83640_probe,
 	.remove		= dp83640_remove,
+	.soft_reset	= dp83640_soft_reset,
 	.config_init	= dp83640_config_init,
 	.ack_interrupt  = dp83640_ack_interrupt,
 	.config_intr    = dp83640_config_intr,
-- 
2.16.3

^ permalink raw reply related

* [PATCH v3] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-06 14:04 UTC (permalink / raw)
  To: sgoutham, sunil.kovvuri, rric, linux-arm-kernel, netdev,
	linux-kernel
  Cc: dnelson, Vadim Lomovtsev
In-Reply-To: <20180406111425.14636-1-Vadim.Lomovtsev@caviumnetworks.com>

From: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>

It is too expensive to pass u64 values via linked list, instead
allocate array for them by overall number of mac addresses from netdev.

This eventually removes multiple kmalloc() calls, aviod memory
fragmentation and allow to put single null check on kmalloc
return value in order to prevent a potential null pointer dereference.

Addresses-Coverity-ID: 1467429 ("Dereference null return value")
Fixes: 37c3347eb247 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
---
Changes from v1 to v2:
 - C99 syntax: update xcast_addr_list struct field mc[0] -> mc[];
Changes from v2 to v3:
 - update commit description with 'Reported-by: Dan Carpenter';
 - update size calculations for mc list to offsetof() call
   instead of explicit arithmetic;
---
 drivers/net/ethernet/cavium/thunder/nic.h        |  7 +-----
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 28 +++++++++---------------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 5fc46c5a4f36..448d1fafc827 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -265,14 +265,9 @@ struct nicvf_drv_stats {
 
 struct cavium_ptp;
 
-struct xcast_addr {
-	struct list_head list;
-	u64              addr;
-};
-
 struct xcast_addr_list {
-	struct list_head list;
 	int              count;
+	u64              mc[];
 };
 
 struct nicvf_work {
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 1e9a31fef729..7d9e58533a83 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 						  work.work);
 	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
 	union nic_mbx mbx = {};
-	struct xcast_addr *xaddr, *next;
+	u8 idx = 0;
 
 	if (!vf_work)
 		return;
@@ -1956,16 +1956,10 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 	/* check if we have any specific MACs to be added to PF DMAC filter */
 	if (vf_work->mc) {
 		/* now go through kernel list of MACs and add them one by one */
-		list_for_each_entry_safe(xaddr, next,
-					 &vf_work->mc->list, list) {
+		for (idx = 0; idx < vf_work->mc->count; idx++) {
 			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
-			mbx.xcast.data.mac = xaddr->addr;
+			mbx.xcast.data.mac = vf_work->mc->mc[idx];
 			nicvf_send_msg_to_pf(nic, &mbx);
-
-			/* after receiving ACK from PF release memory */
-			list_del(&xaddr->list);
-			kfree(xaddr);
-			vf_work->mc->count--;
 		}
 		kfree(vf_work->mc);
 	}
@@ -1996,17 +1990,15 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
 			mode |= BGX_XCAST_MCAST_FILTER;
 			/* here we need to copy mc addrs */
 			if (netdev_mc_count(netdev)) {
-				struct xcast_addr *xaddr;
-
-				mc_list = kmalloc(sizeof(*mc_list), GFP_ATOMIC);
-				INIT_LIST_HEAD(&mc_list->list);
+				mc_list = kmalloc(offsetof(typeof(*mc_list),
+							   mc[netdev_mc_count(netdev)]),
+						  GFP_ATOMIC);
+				if (unlikely(!mc_list))
+					return;
+				mc_list->count = 0;
 				netdev_hw_addr_list_for_each(ha, &netdev->mc) {
-					xaddr = kmalloc(sizeof(*xaddr),
-							GFP_ATOMIC);
-					xaddr->addr =
+					mc_list->mc[mc_list->count] =
 						ether_addr_to_u64(ha->addr);
-					list_add_tail(&xaddr->list,
-						      &mc_list->list);
 					mc_list->count++;
 				}
 			}
-- 
2.14.3

^ permalink raw reply related

* Re: [RFC] ethtool: Support for driver private ioctl's
From: Jose Abreu @ 2018-04-06 13:57 UTC (permalink / raw)
  To: Michal Kubecek, netdev
  Cc: Florian Fainelli, Jose Abreu, David Miller, Jakub Jelinek,
	Jeff Garzik, Tim Hockin, Eli Kupermann, Chris Leech,
	Scott Feldman, Ben Hutchings, Joao Pinto
In-Reply-To: <20180406090702.mgzozmkkh3vuhe7w@unicorn.suse.cz>

Hi Michal,

On 06-04-2018 10:07, Michal Kubecek wrote:
> On Thu, Apr 05, 2018 at 08:50:49AM -0700, Florian Fainelli wrote:
>> On 04/05/2018 03:47 AM, Jose Abreu wrote:
>>> Background: Synopsys Ethernet IP's have a certain number of
>>> features which can be reconfigured at runtime. Giving you two
>>> examples: One of the most recent one is the safety features,
>>> which can be enabled/disabled and forced at runtime. Another one
>>> is a Flexible RX Parser which can route specific packets to
>>> specific RX DMA channels. Given that these are features specific
>>> to our IP's it would not be useful to add an uniform API for this
>>> because the users would only be one or two drivers ...
>> Parsing of packets and directing the matched packets to specific
>> queues/channels can be done through ethtool rxnfc API, tc/cls_flower as
>> well, so you should really check whether those APIs don't already allow
>> you to do what you want.
>>
>> ethtool already supports a concept of private  flags, not ioctl() though
>> which allows you to toggle boolean values for instance (or technically
>> up to how many bits a "flag" is used to represent) is that enough or do
>> you need to turn on/off the feature as well as pass configuration
>> parameters?
> Perhaps introducing "driver/device specific tunables" (i.e. something
> like tunables or PHY tunables but specific to a particular device) could
> be a way. But it could get out of control quickly and users wouldn't be
> happy if they had to set the same (or almost the same) parameter under
> five different names for five NIC vendors.

Yeah, that wouldn't be good but I think this should be a
responsibility to developer: To see if there is an existing
API/ethtool entry before implementing the "tunable". I think a
big concern, for me at least, is that ethtool already has a lot
of options and introducing even more would lead the user to
confusion ...

Thanks and Best Regards,
Jose Miguel Abreu

>
> Michal Kubecek

^ permalink raw reply

* Re: [RFC] ethtool: Support for driver private ioctl's
From: Jose Abreu @ 2018-04-06 13:51 UTC (permalink / raw)
  To: Florian Fainelli, Jose Abreu, David Miller, Jakub Jelinek,
	Jeff Garzik, Tim Hockin, Eli Kupermann, Chris Leech,
	Scott Feldman, Ben Hutchings
  Cc: netdev, Joao Pinto
In-Reply-To: <df7773c4-8695-38dd-bedd-39b88f4aaec1@gmail.com>

Hi Florian,

On 05-04-2018 16:50, Florian Fainelli wrote:
>
> On 04/05/2018 03:47 AM, Jose Abreu wrote:
>> Hi All,
>>
>> I would like to know your opinion regarding adding support for
>> driver private ioctl's in ethtool.
>>
>> Background: Synopsys Ethernet IP's have a certain number of
>> features which can be reconfigured at runtime. Giving you two
>> examples: One of the most recent one is the safety features,
>> which can be enabled/disabled and forced at runtime. Another one
>> is a Flexible RX Parser which can route specific packets to
>> specific RX DMA channels. Given that these are features specific
>> to our IP's it would not be useful to add an uniform API for this
>> because the users would only be one or two drivers ...
> Parsing of packets and directing the matched packets to specific
> queues/channels can be done through ethtool rxnfc API, tc/cls_flower as
> well, so you should really check whether those APIs don't already allow
> you to do what you want.

Hmm, but in our case this is directly done by HW, we just have to
program a kind of a table which will route automatically the
packets. Does this API support this?

>
> ethtool already supports a concept of private  flags, not ioctl() though
> which allows you to toggle boolean values for instance (or technically
> up to how many bits a "flag" is used to represent) is that enough or do
> you need to turn on/off the feature as well as pass configuration
> parameters?

Some of them I can just turn on/off but the remaining need
configuration and sometimes the configuration is extensive (like
in the case of RX Parser when we have to pass the routing table).

>
>> This new feature would change the help usage for ethtool so that
>> each driver private option would be shown, and then each driver
>> specific file would have a structure with all the available
>> options. Finally, each driver would have to handle the private
>> IOCTL's.
>>
>> We already have this working locally and now I would like to know
>> your opinion about upstreaming this ... Do you think this can be
>> useful for anyone else? Or should we change direction to use, for
>> example, debugfs/configfs?
> In general, even if there is only one driver implementing a particular
> feature, the approach chosen is to come up with an API that is as
> generic as possible. Even if there is a single user of that API in tree,
> having something that was thought to be generic is better than allowing
> uncontrolled private ioctl() implementations.

I understand your point of view but this seems like an overkill
to the -net subsystem because its specific to our IP, or are you
just mentioning a new ethtool entry? i.e. adding a new #define to
the list, plus -net handling ...

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply

* [PATCH net-next 5/5] net: stmmac: Switch stmmac_mode_ops to generic HW Interface Helpers
From: Jose Abreu @ 2018-04-06 13:08 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <cover.1523019346.git.joabreu@synopsys.com>

Switch stmmac_mode_ops to generic Hardware Interface Helpers instead of
using hard-coded callbacks. This makes the code more readable and more
flexible.

No functional change.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/chain_mode.c  | 20 +++++------
 drivers/net/ethernet/stmicro/stmmac/common.h      | 12 -------
 drivers/net/ethernet/stmicro/stmmac/hwif.h        | 27 ++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/ring_mode.c   | 24 ++++++-------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 43 ++++++++++-------------
 5 files changed, 68 insertions(+), 58 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
index ca0f9c2..b9c9003 100644
--- a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
@@ -24,7 +24,7 @@
 
 #include "stmmac.h"
 
-static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
+static int jumbo_frm(void *p, struct sk_buff *skb, int csum)
 {
 	struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)p;
 	unsigned int nopaged_len = skb_headlen(skb);
@@ -93,7 +93,7 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 	return entry;
 }
 
-static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
+static unsigned int is_jumbo_frm(int len, int enh_desc)
 {
 	unsigned int ret = 0;
 
@@ -105,7 +105,7 @@ static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
 	return ret;
 }
 
-static void stmmac_init_dma_chain(void *des, dma_addr_t phy_addr,
+static void init_dma_chain(void *des, dma_addr_t phy_addr,
 				  unsigned int size, unsigned int extend_desc)
 {
 	/*
@@ -135,7 +135,7 @@ static void stmmac_init_dma_chain(void *des, dma_addr_t phy_addr,
 	}
 }
 
-static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
+static void refill_desc3(void *priv_ptr, struct dma_desc *p)
 {
 	struct stmmac_rx_queue *rx_q = (struct stmmac_rx_queue *)priv_ptr;
 	struct stmmac_priv *priv = rx_q->priv_data;
@@ -151,7 +151,7 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
 				      sizeof(struct dma_desc)));
 }
 
-static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
+static void clean_desc3(void *priv_ptr, struct dma_desc *p)
 {
 	struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)priv_ptr;
 	struct stmmac_priv *priv = tx_q->priv_data;
@@ -169,9 +169,9 @@ static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
 }
 
 const struct stmmac_mode_ops chain_mode_ops = {
-	.init = stmmac_init_dma_chain,
-	.is_jumbo_frm = stmmac_is_jumbo_frm,
-	.jumbo_frm = stmmac_jumbo_frm,
-	.refill_desc3 = stmmac_refill_desc3,
-	.clean_desc3 = stmmac_clean_desc3,
+	.init = init_dma_chain,
+	.is_jumbo_frm = is_jumbo_frm,
+	.jumbo_frm = jumbo_frm,
+	.refill_desc3 = refill_desc3,
+	.clean_desc3 = clean_desc3,
 };
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 7291561..59673c6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -405,18 +405,6 @@ struct mii_regs {
 	unsigned int clk_csr_mask;
 };
 
-/* Helpers to manage the descriptors for chain and ring modes */
-struct stmmac_mode_ops {
-	void (*init) (void *des, dma_addr_t phy_addr, unsigned int size,
-		      unsigned int extend_desc);
-	unsigned int (*is_jumbo_frm) (int len, int ehn_desc);
-	int (*jumbo_frm)(void *priv, struct sk_buff *skb, int csum);
-	int (*set_16kib_bfsize)(int mtu);
-	void (*init_desc3)(struct dma_desc *p);
-	void (*refill_desc3) (void *priv, struct dma_desc *p);
-	void (*clean_desc3) (void *priv, struct dma_desc *p);
-};
-
 struct mac_device_info {
 	const struct stmmac_ops *mac;
 	const struct stmmac_desc_ops *desc;
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index e23c0a3..f81ded4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -391,4 +391,31 @@ struct stmmac_hwtimestamp {
 #define stmmac_get_systime(__priv, __args...) \
 	stmmac_do_void_callback(__priv, ptp, get_systime, __args)
 
+/* Helpers to manage the descriptors for chain and ring modes */
+struct stmmac_mode_ops {
+	void (*init) (void *des, dma_addr_t phy_addr, unsigned int size,
+		      unsigned int extend_desc);
+	unsigned int (*is_jumbo_frm) (int len, int ehn_desc);
+	int (*jumbo_frm)(void *priv, struct sk_buff *skb, int csum);
+	int (*set_16kib_bfsize)(int mtu);
+	void (*init_desc3)(struct dma_desc *p);
+	void (*refill_desc3) (void *priv, struct dma_desc *p);
+	void (*clean_desc3) (void *priv, struct dma_desc *p);
+};
+
+#define stmmac_mode_init(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mode, init, __args)
+#define stmmac_is_jumbo_frm(__priv, __args...) \
+	stmmac_do_callback(__priv, mode, is_jumbo_frm, __args)
+#define stmmac_jumbo_frm(__priv, __args...) \
+	stmmac_do_callback(__priv, mode, jumbo_frm, __args)
+#define stmmac_set_16kib_bfsize(__priv, __args...) \
+	stmmac_do_callback(__priv, mode, set_16kib_bfsize, __args)
+#define stmmac_init_desc3(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mode, init_desc3, __args)
+#define stmmac_refill_desc3(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mode, refill_desc3, __args)
+#define stmmac_clean_desc3(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mode, clean_desc3, __args)
+
 #endif /* __STMMAC_HWIF_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
index 808ca75..a7ffc73 100644
--- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
@@ -24,7 +24,7 @@
 
 #include "stmmac.h"
 
-static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
+static int jumbo_frm(void *p, struct sk_buff *skb, int csum)
 {
 	struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)p;
 	unsigned int nopaged_len = skb_headlen(skb);
@@ -99,7 +99,7 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 	return entry;
 }
 
-static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
+static unsigned int is_jumbo_frm(int len, int enh_desc)
 {
 	unsigned int ret = 0;
 
@@ -109,7 +109,7 @@ static unsigned int stmmac_is_jumbo_frm(int len, int enh_desc)
 	return ret;
 }
 
-static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
+static void refill_desc3(void *priv_ptr, struct dma_desc *p)
 {
 	struct stmmac_priv *priv = (struct stmmac_priv *)priv_ptr;
 
@@ -119,12 +119,12 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
 }
 
 /* In ring mode we need to fill the desc3 because it is used as buffer */
-static void stmmac_init_desc3(struct dma_desc *p)
+static void init_desc3(struct dma_desc *p)
 {
 	p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB);
 }
 
-static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
+static void clean_desc3(void *priv_ptr, struct dma_desc *p)
 {
 	struct stmmac_tx_queue *tx_q = (struct stmmac_tx_queue *)priv_ptr;
 	struct stmmac_priv *priv = tx_q->priv_data;
@@ -137,7 +137,7 @@ static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
 		p->des3 = 0;
 }
 
-static int stmmac_set_16kib_bfsize(int mtu)
+static int set_16kib_bfsize(int mtu)
 {
 	int ret = 0;
 	if (unlikely(mtu >= BUF_SIZE_8KiB))
@@ -146,10 +146,10 @@ static int stmmac_set_16kib_bfsize(int mtu)
 }
 
 const struct stmmac_mode_ops ring_mode_ops = {
-	.is_jumbo_frm = stmmac_is_jumbo_frm,
-	.jumbo_frm = stmmac_jumbo_frm,
-	.refill_desc3 = stmmac_refill_desc3,
-	.init_desc3 = stmmac_init_desc3,
-	.clean_desc3 = stmmac_clean_desc3,
-	.set_16kib_bfsize = stmmac_set_16kib_bfsize,
+	.is_jumbo_frm = is_jumbo_frm,
+	.jumbo_frm = jumbo_frm,
+	.refill_desc3 = refill_desc3,
+	.init_desc3 = init_desc3,
+	.clean_desc3 = clean_desc3,
+	.set_16kib_bfsize = set_16kib_bfsize,
 };
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 896a1e7..90363a8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1161,9 +1161,8 @@ static int stmmac_init_rx_buffers(struct stmmac_priv *priv, struct dma_desc *p,
 	else
 		p->des2 = cpu_to_le32(rx_q->rx_skbuff_dma[i]);
 
-	if ((priv->hw->mode->init_desc3) &&
-	    (priv->dma_buf_sz == BUF_SIZE_16KiB))
-		priv->hw->mode->init_desc3(p);
+	if (priv->dma_buf_sz == BUF_SIZE_16KiB)
+		stmmac_init_desc3(priv, p);
 
 	return 0;
 }
@@ -1229,13 +1228,14 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 	u32 rx_count = priv->plat->rx_queues_to_use;
-	unsigned int bfsize = 0;
 	int ret = -ENOMEM;
+	int bfsize = 0;
 	int queue;
 	int i;
 
-	if (priv->hw->mode->set_16kib_bfsize)
-		bfsize = priv->hw->mode->set_16kib_bfsize(dev->mtu);
+	bfsize = stmmac_set_16kib_bfsize(priv, dev->mtu);
+	if (bfsize < 0)
+		bfsize = 0;
 
 	if (bfsize < BUF_SIZE_16KiB)
 		bfsize = stmmac_set_bfsize(dev->mtu, priv->dma_buf_sz);
@@ -1279,13 +1279,11 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 		/* Setup the chained descriptor addresses */
 		if (priv->mode == STMMAC_CHAIN_MODE) {
 			if (priv->extend_desc)
-				priv->hw->mode->init(rx_q->dma_erx,
-						     rx_q->dma_rx_phy,
-						     DMA_RX_SIZE, 1);
+				stmmac_mode_init(priv, rx_q->dma_erx,
+						rx_q->dma_rx_phy, DMA_RX_SIZE, 1);
 			else
-				priv->hw->mode->init(rx_q->dma_rx,
-						     rx_q->dma_rx_phy,
-						     DMA_RX_SIZE, 0);
+				stmmac_mode_init(priv, rx_q->dma_rx,
+						rx_q->dma_rx_phy, DMA_RX_SIZE, 0);
 		}
 	}
 
@@ -1332,13 +1330,11 @@ static int init_dma_tx_desc_rings(struct net_device *dev)
 		/* Setup the chained descriptor addresses */
 		if (priv->mode == STMMAC_CHAIN_MODE) {
 			if (priv->extend_desc)
-				priv->hw->mode->init(tx_q->dma_etx,
-						     tx_q->dma_tx_phy,
-						     DMA_TX_SIZE, 1);
+				stmmac_mode_init(priv, tx_q->dma_etx,
+						tx_q->dma_tx_phy, DMA_TX_SIZE, 1);
 			else
-				priv->hw->mode->init(tx_q->dma_tx,
-						     tx_q->dma_tx_phy,
-						     DMA_TX_SIZE, 0);
+				stmmac_mode_init(priv, tx_q->dma_tx,
+						tx_q->dma_tx_phy, DMA_TX_SIZE, 0);
 		}
 
 		for (i = 0; i < DMA_TX_SIZE; i++) {
@@ -1886,8 +1882,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
 			tx_q->tx_skbuff_dma[entry].map_as_page = false;
 		}
 
-		if (priv->hw->mode->clean_desc3)
-			priv->hw->mode->clean_desc3(tx_q, p);
+		stmmac_clean_desc3(priv, tx_q, p);
 
 		tx_q->tx_skbuff_dma[entry].last_segment = false;
 		tx_q->tx_skbuff_dma[entry].is_jumbo = false;
@@ -3099,11 +3094,11 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	enh_desc = priv->plat->enh_desc;
 	/* To program the descriptors according to the size of the frame */
 	if (enh_desc)
-		is_jumbo = priv->hw->mode->is_jumbo_frm(skb->len, enh_desc);
+		is_jumbo = stmmac_is_jumbo_frm(priv, skb->len, enh_desc);
 
 	if (unlikely(is_jumbo) && likely(priv->synopsys_id <
 					 DWMAC_CORE_4_00)) {
-		entry = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
+		entry = stmmac_jumbo_frm(priv, tx_q, skb, csum_insertion);
 		if (unlikely(entry < 0))
 			goto dma_map_err;
 	}
@@ -3332,8 +3327,8 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 			} else {
 				p->des2 = cpu_to_le32(rx_q->rx_skbuff_dma[entry]);
 			}
-			if (priv->hw->mode->refill_desc3)
-				priv->hw->mode->refill_desc3(rx_q, p);
+
+			stmmac_refill_desc3(priv, rx_q, p);
 
 			if (rx_q->rx_zeroc_thresh > 0)
 				rx_q->rx_zeroc_thresh--;
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 4/5] net: stmmac: Switch stmmac_hwtimestamp to generic HW Interface Helpers
From: Jose Abreu @ 2018-04-06 13:08 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <cover.1523019346.git.joabreu@synopsys.com>

Switch stmmac_hwtimestamp to generic Hardware Interface Helpers instead
of using hard-coded callbacks. This makes the code more readable and
more flexible.

No functional change.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h       | 12 --------
 drivers/net/ethernet/stmicro/stmmac/hwif.h         | 25 ++++++++++++++++
 .../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c  | 34 ++++++++++++----------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 17 +++++------
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c   | 18 ++++--------
 5 files changed, 56 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 0e0b6f1..7291561 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -383,18 +383,6 @@ extern const struct stmmac_desc_ops ndesc_ops;
 
 struct mac_device_info;
 
-/* PTP and HW Timer helpers */
-struct stmmac_hwtimestamp {
-	void (*config_hw_tstamping) (void __iomem *ioaddr, u32 data);
-	u32 (*config_sub_second_increment)(void __iomem *ioaddr, u32 ptp_clock,
-					   int gmac4);
-	int (*init_systime) (void __iomem *ioaddr, u32 sec, u32 nsec);
-	int (*config_addend) (void __iomem *ioaddr, u32 addend);
-	int (*adjust_systime) (void __iomem *ioaddr, u32 sec, u32 nsec,
-			       int add_sub, int gmac4);
-	 u64(*get_systime) (void __iomem *ioaddr);
-};
-
 extern const struct stmmac_hwtimestamp stmmac_ptp;
 extern const struct stmmac_mode_ops dwmac4_ring_mode_ops;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index 9575135..e23c0a3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -366,4 +366,29 @@ struct stmmac_ops {
 #define stmmac_safety_feat_dump(__priv, __args...) \
 	stmmac_do_callback(__priv, mac, safety_feat_dump, __args)
 
+/* PTP and HW Timer helpers */
+struct stmmac_hwtimestamp {
+	void (*config_hw_tstamping) (void __iomem *ioaddr, u32 data);
+	void (*config_sub_second_increment)(void __iomem *ioaddr, u32 ptp_clock,
+					   int gmac4, u32 *ssinc);
+	int (*init_systime) (void __iomem *ioaddr, u32 sec, u32 nsec);
+	int (*config_addend) (void __iomem *ioaddr, u32 addend);
+	int (*adjust_systime) (void __iomem *ioaddr, u32 sec, u32 nsec,
+			       int add_sub, int gmac4);
+	void (*get_systime) (void __iomem *ioaddr, u64 *systime);
+};
+
+#define stmmac_config_hw_tstamping(__priv, __args...) \
+	stmmac_do_void_callback(__priv, ptp, config_hw_tstamping, __args)
+#define stmmac_config_sub_second_increment(__priv, __args...) \
+	stmmac_do_void_callback(__priv, ptp, config_sub_second_increment, __args)
+#define stmmac_init_systime(__priv, __args...) \
+	stmmac_do_callback(__priv, ptp, init_systime, __args)
+#define stmmac_config_addend(__priv, __args...) \
+	stmmac_do_callback(__priv, ptp, config_addend, __args)
+#define stmmac_adjust_systime(__priv, __args...) \
+	stmmac_do_callback(__priv, ptp, adjust_systime, __args)
+#define stmmac_get_systime(__priv, __args...) \
+	stmmac_do_void_callback(__priv, ptp, get_systime, __args)
+
 #endif /* __STMMAC_HWIF_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
index 08c19eb..8d9cc21 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c
@@ -24,13 +24,13 @@
 #include "common.h"
 #include "stmmac_ptp.h"
 
-static void stmmac_config_hw_tstamping(void __iomem *ioaddr, u32 data)
+static void config_hw_tstamping(void __iomem *ioaddr, u32 data)
 {
 	writel(data, ioaddr + PTP_TCR);
 }
 
-static u32 stmmac_config_sub_second_increment(void __iomem *ioaddr,
-					      u32 ptp_clock, int gmac4)
+static void config_sub_second_increment(void __iomem *ioaddr,
+		u32 ptp_clock, int gmac4, u32 *ssinc)
 {
 	u32 value = readl(ioaddr + PTP_TCR);
 	unsigned long data;
@@ -57,10 +57,11 @@ static u32 stmmac_config_sub_second_increment(void __iomem *ioaddr,
 
 	writel(reg_value, ioaddr + PTP_SSIR);
 
-	return data;
+	if (ssinc)
+		*ssinc = data;
 }
 
-static int stmmac_init_systime(void __iomem *ioaddr, u32 sec, u32 nsec)
+static int init_systime(void __iomem *ioaddr, u32 sec, u32 nsec)
 {
 	int limit;
 	u32 value;
@@ -85,7 +86,7 @@ static int stmmac_init_systime(void __iomem *ioaddr, u32 sec, u32 nsec)
 	return 0;
 }
 
-static int stmmac_config_addend(void __iomem *ioaddr, u32 addend)
+static int config_addend(void __iomem *ioaddr, u32 addend)
 {
 	u32 value;
 	int limit;
@@ -109,8 +110,8 @@ static int stmmac_config_addend(void __iomem *ioaddr, u32 addend)
 	return 0;
 }
 
-static int stmmac_adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
-				 int add_sub, int gmac4)
+static int adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
+		int add_sub, int gmac4)
 {
 	u32 value;
 	int limit;
@@ -152,7 +153,7 @@ static int stmmac_adjust_systime(void __iomem *ioaddr, u32 sec, u32 nsec,
 	return 0;
 }
 
-static u64 stmmac_get_systime(void __iomem *ioaddr)
+static void get_systime(void __iomem *ioaddr, u64 *systime)
 {
 	u64 ns;
 
@@ -161,14 +162,15 @@ static u64 stmmac_get_systime(void __iomem *ioaddr)
 	/* Get the TSS and convert sec time value to nanosecond */
 	ns += readl(ioaddr + PTP_STSR) * 1000000000ULL;
 
-	return ns;
+	if (systime)
+		*systime = ns;
 }
 
 const struct stmmac_hwtimestamp stmmac_ptp = {
-	.config_hw_tstamping = stmmac_config_hw_tstamping,
-	.init_systime = stmmac_init_systime,
-	.config_sub_second_increment = stmmac_config_sub_second_increment,
-	.config_addend = stmmac_config_addend,
-	.adjust_systime = stmmac_adjust_systime,
-	.get_systime = stmmac_get_systime,
+	.config_hw_tstamping = config_hw_tstamping,
+	.init_systime = init_systime,
+	.config_sub_second_increment = config_sub_second_increment,
+	.config_addend = config_addend,
+	.adjust_systime = adjust_systime,
+	.get_systime = get_systime,
 };
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 2b521035..896a1e7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -707,18 +707,18 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)
 	priv->hwts_tx_en = config.tx_type == HWTSTAMP_TX_ON;
 
 	if (!priv->hwts_tx_en && !priv->hwts_rx_en)
-		priv->hw->ptp->config_hw_tstamping(priv->ptpaddr, 0);
+		stmmac_config_hw_tstamping(priv, priv->ptpaddr, 0);
 	else {
 		value = (PTP_TCR_TSENA | PTP_TCR_TSCFUPDT | PTP_TCR_TSCTRLSSR |
 			 tstamp_all | ptp_v2 | ptp_over_ethernet |
 			 ptp_over_ipv6_udp | ptp_over_ipv4_udp | ts_event_en |
 			 ts_master_en | snap_type_sel);
-		priv->hw->ptp->config_hw_tstamping(priv->ptpaddr, value);
+		stmmac_config_hw_tstamping(priv, priv->ptpaddr, value);
 
 		/* program Sub Second Increment reg */
-		sec_inc = priv->hw->ptp->config_sub_second_increment(
-			priv->ptpaddr, priv->plat->clk_ptp_rate,
-			priv->plat->has_gmac4);
+		stmmac_config_sub_second_increment(priv,
+				priv->ptpaddr, priv->plat->clk_ptp_rate,
+				priv->plat->has_gmac4, &sec_inc);
 		temp = div_u64(1000000000ULL, sec_inc);
 
 		/* calculate default added value:
@@ -728,15 +728,14 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)
 		 */
 		temp = (u64)(temp << 32);
 		priv->default_addend = div_u64(temp, priv->plat->clk_ptp_rate);
-		priv->hw->ptp->config_addend(priv->ptpaddr,
-					     priv->default_addend);
+		stmmac_config_addend(priv, priv->ptpaddr, priv->default_addend);
 
 		/* initialize system time */
 		ktime_get_real_ts64(&now);
 
 		/* lower 32 bits of tv_sec are safe until y2106 */
-		priv->hw->ptp->init_systime(priv->ptpaddr, (u32)now.tv_sec,
-					    now.tv_nsec);
+		stmmac_init_systime(priv, priv->ptpaddr,
+				(u32)now.tv_sec, now.tv_nsec);
 	}
 
 	return copy_to_user(ifr->ifr_data, &config,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index e471a90..7d3a5c7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -49,9 +49,7 @@ static int stmmac_adjust_freq(struct ptp_clock_info *ptp, s32 ppb)
 	addend = neg_adj ? (addend - diff) : (addend + diff);
 
 	spin_lock_irqsave(&priv->ptp_lock, flags);
-
-	priv->hw->ptp->config_addend(priv->ptpaddr, addend);
-
+	stmmac_config_addend(priv, priv->ptpaddr, addend);
 	spin_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
@@ -84,10 +82,8 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta)
 	nsec = reminder;
 
 	spin_lock_irqsave(&priv->ptp_lock, flags);
-
-	priv->hw->ptp->adjust_systime(priv->ptpaddr, sec, nsec, neg_adj,
-				      priv->plat->has_gmac4);
-
+	stmmac_adjust_systime(priv, priv->ptpaddr, sec, nsec, neg_adj,
+			priv->plat->has_gmac4);
 	spin_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
@@ -110,9 +106,7 @@ static int stmmac_get_time(struct ptp_clock_info *ptp, struct timespec64 *ts)
 	u64 ns;
 
 	spin_lock_irqsave(&priv->ptp_lock, flags);
-
-	ns = priv->hw->ptp->get_systime(priv->ptpaddr);
-
+	stmmac_get_systime(priv, priv->ptpaddr, &ns);
 	spin_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	*ts = ns_to_timespec64(ns);
@@ -137,9 +131,7 @@ static int stmmac_set_time(struct ptp_clock_info *ptp,
 	unsigned long flags;
 
 	spin_lock_irqsave(&priv->ptp_lock, flags);
-
-	priv->hw->ptp->init_systime(priv->ptpaddr, ts->tv_sec, ts->tv_nsec);
-
+	stmmac_init_systime(priv, priv->ptpaddr, ts->tv_sec, ts->tv_nsec);
 	spin_unlock_irqrestore(&priv->ptp_lock, flags);
 
 	return 0;
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 3/5] net: stmmac: Switch stmmac_ops to generic HW Interface Helpers
From: Jose Abreu @ 2018-04-06 13:08 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <cover.1523019346.git.joabreu@synopsys.com>

Switch stmmac_ops to generic Hardware Interface Helpers instead of using
hard-coded callbacks. This makes the code more readable and more
flexible.

No functional change.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h       |  70 -----------
 drivers/net/ethernet/stmicro/stmmac/dwmac5.c       |  19 +--
 drivers/net/ethernet/stmicro/stmmac/dwmac5.h       |   6 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h         | 138 +++++++++++++++++++++
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  68 ++++------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 122 +++++++++---------
 6 files changed, 235 insertions(+), 188 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index b27221b..0e0b6f1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -383,76 +383,6 @@ extern const struct stmmac_desc_ops ndesc_ops;
 
 struct mac_device_info;
 
-/* Helpers to program the MAC core */
-struct stmmac_ops {
-	/* MAC core initialization */
-	void (*core_init)(struct mac_device_info *hw, struct net_device *dev);
-	/* Enable the MAC RX/TX */
-	void (*set_mac)(void __iomem *ioaddr, bool enable);
-	/* Enable and verify that the IPC module is supported */
-	int (*rx_ipc)(struct mac_device_info *hw);
-	/* Enable RX Queues */
-	void (*rx_queue_enable)(struct mac_device_info *hw, u8 mode, u32 queue);
-	/* RX Queues Priority */
-	void (*rx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
-	/* TX Queues Priority */
-	void (*tx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
-	/* RX Queues Routing */
-	void (*rx_queue_routing)(struct mac_device_info *hw, u8 packet,
-				 u32 queue);
-	/* Program RX Algorithms */
-	void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
-	/* Program TX Algorithms */
-	void (*prog_mtl_tx_algorithms)(struct mac_device_info *hw, u32 tx_alg);
-	/* Set MTL TX queues weight */
-	void (*set_mtl_tx_queue_weight)(struct mac_device_info *hw,
-					u32 weight, u32 queue);
-	/* RX MTL queue to RX dma mapping */
-	void (*map_mtl_to_dma)(struct mac_device_info *hw, u32 queue, u32 chan);
-	/* Configure AV Algorithm */
-	void (*config_cbs)(struct mac_device_info *hw, u32 send_slope,
-			   u32 idle_slope, u32 high_credit, u32 low_credit,
-			   u32 queue);
-	/* Dump MAC registers */
-	void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
-	/* Handle extra events on specific interrupts hw dependent */
-	int (*host_irq_status)(struct mac_device_info *hw,
-			       struct stmmac_extra_stats *x);
-	/* Handle MTL interrupts */
-	int (*host_mtl_irq_status)(struct mac_device_info *hw, u32 chan);
-	/* Multicast filter setting */
-	void (*set_filter)(struct mac_device_info *hw, struct net_device *dev);
-	/* Flow control setting */
-	void (*flow_ctrl)(struct mac_device_info *hw, unsigned int duplex,
-			  unsigned int fc, unsigned int pause_time, u32 tx_cnt);
-	/* Set power management mode (e.g. magic frame) */
-	void (*pmt)(struct mac_device_info *hw, unsigned long mode);
-	/* Set/Get Unicast MAC addresses */
-	void (*set_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
-			      unsigned int reg_n);
-	void (*get_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
-			      unsigned int reg_n);
-	void (*set_eee_mode)(struct mac_device_info *hw,
-			     bool en_tx_lpi_clockgating);
-	void (*reset_eee_mode)(struct mac_device_info *hw);
-	void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
-	void (*set_eee_pls)(struct mac_device_info *hw, int link);
-	void (*debug)(void __iomem *ioaddr, struct stmmac_extra_stats *x,
-		      u32 rx_queues, u32 tx_queues);
-	/* PCS calls */
-	void (*pcs_ctrl_ane)(void __iomem *ioaddr, bool ane, bool srgmi_ral,
-			     bool loopback);
-	void (*pcs_rane)(void __iomem *ioaddr, bool restart);
-	void (*pcs_get_adv_lp)(void __iomem *ioaddr, struct rgmii_adv *adv);
-	/* Safety Features */
-	int (*safety_feat_config)(void __iomem *ioaddr, unsigned int asp);
-	bool (*safety_feat_irq_status)(struct net_device *ndev,
-			void __iomem *ioaddr, unsigned int asp,
-			struct stmmac_safety_stats *stats);
-	const char *(*safety_feat_dump)(struct stmmac_safety_stats *stats,
-			int index, unsigned long *count);
-};
-
 /* PTP and HW Timer helpers */
 struct stmmac_hwtimestamp {
 	void (*config_hw_tstamping) (void __iomem *ioaddr, u32 data);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
index 860de39..2978550 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.c
@@ -237,15 +237,16 @@ int dwmac5_safety_feat_config(void __iomem *ioaddr, unsigned int asp)
 	return 0;
 }
 
-bool dwmac5_safety_feat_irq_status(struct net_device *ndev,
+int dwmac5_safety_feat_irq_status(struct net_device *ndev,
 		void __iomem *ioaddr, unsigned int asp,
 		struct stmmac_safety_stats *stats)
 {
-	bool ret = false, err, corr;
+	bool err, corr;
 	u32 mtl, dma;
+	int ret = 0;
 
 	if (!asp)
-		return false;
+		return -EINVAL;
 
 	mtl = readl(ioaddr + MTL_SAFETY_INT_STATUS);
 	dma = readl(ioaddr + DMA_SAFETY_INT_STATUS);
@@ -282,17 +283,19 @@ static const struct dwmac5_error {
 	{ dwmac5_dma_errors },
 };
 
-const char *dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
-			int index, unsigned long *count)
+int dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
+			int index, unsigned long *count, const char **desc)
 {
 	int module = index / 32, offset = index % 32;
 	unsigned long *ptr = (unsigned long *)stats;
 
 	if (module >= ARRAY_SIZE(dwmac5_all_errors))
-		return NULL;
+		return -EINVAL;
 	if (!dwmac5_all_errors[module].desc[offset].valid)
-		return NULL;
+		return -EINVAL;
 	if (count)
 		*count = *(ptr + index);
-	return dwmac5_all_errors[module].desc[offset].desc;
+	if (desc)
+		*desc = dwmac5_all_errors[module].desc[offset].desc;
+	return 0;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
index a0d2c44..bd4c466 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
@@ -43,10 +43,10 @@
 #define DMA_ECC_INT_STATUS		0x00001088
 
 int dwmac5_safety_feat_config(void __iomem *ioaddr, unsigned int asp);
-bool dwmac5_safety_feat_irq_status(struct net_device *ndev,
+int dwmac5_safety_feat_irq_status(struct net_device *ndev,
 		void __iomem *ioaddr, unsigned int asp,
 		struct stmmac_safety_stats *stats);
-const char *dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
-			int index, unsigned long *count);
+int dwmac5_safety_feat_dump(struct stmmac_safety_stats *stats,
+			int index, unsigned long *count, const char **desc);
 
 #endif /* __DWMAC5_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index e1a9ae6..9575135 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -228,4 +228,142 @@ struct stmmac_dma_ops {
 #define stmmac_enable_tso(__priv, __args...) \
 	stmmac_do_void_callback(__priv, dma, enable_tso, __args)
 
+struct mac_device_info;
+struct net_device;
+struct rgmii_adv;
+struct stmmac_safety_stats;
+
+/* Helpers to program the MAC core */
+struct stmmac_ops {
+	/* MAC core initialization */
+	void (*core_init)(struct mac_device_info *hw, struct net_device *dev);
+	/* Enable the MAC RX/TX */
+	void (*set_mac)(void __iomem *ioaddr, bool enable);
+	/* Enable and verify that the IPC module is supported */
+	int (*rx_ipc)(struct mac_device_info *hw);
+	/* Enable RX Queues */
+	void (*rx_queue_enable)(struct mac_device_info *hw, u8 mode, u32 queue);
+	/* RX Queues Priority */
+	void (*rx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
+	/* TX Queues Priority */
+	void (*tx_queue_prio)(struct mac_device_info *hw, u32 prio, u32 queue);
+	/* RX Queues Routing */
+	void (*rx_queue_routing)(struct mac_device_info *hw, u8 packet,
+				 u32 queue);
+	/* Program RX Algorithms */
+	void (*prog_mtl_rx_algorithms)(struct mac_device_info *hw, u32 rx_alg);
+	/* Program TX Algorithms */
+	void (*prog_mtl_tx_algorithms)(struct mac_device_info *hw, u32 tx_alg);
+	/* Set MTL TX queues weight */
+	void (*set_mtl_tx_queue_weight)(struct mac_device_info *hw,
+					u32 weight, u32 queue);
+	/* RX MTL queue to RX dma mapping */
+	void (*map_mtl_to_dma)(struct mac_device_info *hw, u32 queue, u32 chan);
+	/* Configure AV Algorithm */
+	void (*config_cbs)(struct mac_device_info *hw, u32 send_slope,
+			   u32 idle_slope, u32 high_credit, u32 low_credit,
+			   u32 queue);
+	/* Dump MAC registers */
+	void (*dump_regs)(struct mac_device_info *hw, u32 *reg_space);
+	/* Handle extra events on specific interrupts hw dependent */
+	int (*host_irq_status)(struct mac_device_info *hw,
+			       struct stmmac_extra_stats *x);
+	/* Handle MTL interrupts */
+	int (*host_mtl_irq_status)(struct mac_device_info *hw, u32 chan);
+	/* Multicast filter setting */
+	void (*set_filter)(struct mac_device_info *hw, struct net_device *dev);
+	/* Flow control setting */
+	void (*flow_ctrl)(struct mac_device_info *hw, unsigned int duplex,
+			  unsigned int fc, unsigned int pause_time, u32 tx_cnt);
+	/* Set power management mode (e.g. magic frame) */
+	void (*pmt)(struct mac_device_info *hw, unsigned long mode);
+	/* Set/Get Unicast MAC addresses */
+	void (*set_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
+			      unsigned int reg_n);
+	void (*get_umac_addr)(struct mac_device_info *hw, unsigned char *addr,
+			      unsigned int reg_n);
+	void (*set_eee_mode)(struct mac_device_info *hw,
+			     bool en_tx_lpi_clockgating);
+	void (*reset_eee_mode)(struct mac_device_info *hw);
+	void (*set_eee_timer)(struct mac_device_info *hw, int ls, int tw);
+	void (*set_eee_pls)(struct mac_device_info *hw, int link);
+	void (*debug)(void __iomem *ioaddr, struct stmmac_extra_stats *x,
+		      u32 rx_queues, u32 tx_queues);
+	/* PCS calls */
+	void (*pcs_ctrl_ane)(void __iomem *ioaddr, bool ane, bool srgmi_ral,
+			     bool loopback);
+	void (*pcs_rane)(void __iomem *ioaddr, bool restart);
+	void (*pcs_get_adv_lp)(void __iomem *ioaddr, struct rgmii_adv *adv);
+	/* Safety Features */
+	int (*safety_feat_config)(void __iomem *ioaddr, unsigned int asp);
+	int (*safety_feat_irq_status)(struct net_device *ndev,
+			void __iomem *ioaddr, unsigned int asp,
+			struct stmmac_safety_stats *stats);
+	int (*safety_feat_dump)(struct stmmac_safety_stats *stats,
+			int index, unsigned long *count, const char **desc);
+};
+
+#define stmmac_core_init(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, core_init, __args)
+#define stmmac_mac_set(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_mac, __args)
+#define stmmac_rx_ipc(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, rx_ipc, __args)
+#define stmmac_rx_queue_enable(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, rx_queue_enable, __args)
+#define stmmac_rx_queue_prio(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, rx_queue_prio, __args)
+#define stmmac_tx_queue_prio(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, tx_queue_prio, __args)
+#define stmmac_rx_queue_routing(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, rx_queue_routing, __args)
+#define stmmac_prog_mtl_rx_algorithms(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, prog_mtl_rx_algorithms, __args)
+#define stmmac_prog_mtl_tx_algorithms(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, prog_mtl_tx_algorithms, __args)
+#define stmmac_set_mtl_tx_queue_weight(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_mtl_tx_queue_weight, __args)
+#define stmmac_map_mtl_to_dma(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, map_mtl_to_dma, __args)
+#define stmmac_config_cbs(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, config_cbs, __args)
+#define stmmac_dump_mac_regs(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, dump_regs, __args)
+#define stmmac_host_irq_status(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, host_irq_status, __args)
+#define stmmac_host_mtl_irq_status(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, host_mtl_irq_status, __args)
+#define stmmac_set_filter(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_filter, __args)
+#define stmmac_flow_ctrl(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, flow_ctrl, __args)
+#define stmmac_pmt(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, pmt, __args)
+#define stmmac_set_umac_addr(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_umac_addr, __args)
+#define stmmac_get_umac_addr(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, get_umac_addr, __args)
+#define stmmac_set_eee_mode(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_eee_mode, __args)
+#define stmmac_reset_eee_mode(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, reset_eee_mode, __args)
+#define stmmac_set_eee_timer(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_eee_timer, __args)
+#define stmmac_set_eee_pls(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, set_eee_pls, __args)
+#define stmmac_mac_debug(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, debug, __args)
+#define stmmac_pcs_ctrl_ane(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, pcs_ctrl_ane, __args)
+#define stmmac_pcs_rane(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, pcs_rane, __args)
+#define stmmac_pcs_get_adv_lp(__priv, __args...) \
+	stmmac_do_void_callback(__priv, mac, pcs_get_adv_lp, __args)
+#define stmmac_safety_feat_config(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, safety_feat_config, __args)
+#define stmmac_safety_feat_irq_status(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, safety_feat_irq_status, __args)
+#define stmmac_safety_feat_dump(__priv, __args...) \
+	stmmac_do_callback(__priv, mac, safety_feat_dump, __args)
+
 #endif /* __STMMAC_HWIF_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 7009718..6d82b3e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -291,11 +291,9 @@ static int stmmac_ethtool_get_link_ksettings(struct net_device *dev,
 		cmd->base.speed = priv->xstats.pcs_speed;
 
 		/* Get and convert ADV/LP_ADV from the HW AN registers */
-		if (!priv->hw->mac->pcs_get_adv_lp)
+		if (stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv))
 			return -EOPNOTSUPP;	/* should never happen indeed */
 
-		priv->hw->mac->pcs_get_adv_lp(priv->ioaddr, &adv);
-
 		/* Encoding of PSE bits is defined in 802.3z, 37.2.1.4 */
 
 		ethtool_convert_link_mode_to_legacy_u32(
@@ -393,11 +391,7 @@ stmmac_ethtool_set_link_ksettings(struct net_device *dev,
 			ADVERTISED_10baseT_Full);
 
 		spin_lock(&priv->lock);
-
-		if (priv->hw->mac->pcs_ctrl_ane)
-			priv->hw->mac->pcs_ctrl_ane(priv->ioaddr, 1,
-						    priv->hw->ps, 0);
-
+		stmmac_pcs_ctrl_ane(priv, priv->ioaddr, 1, priv->hw->ps, 0);
 		spin_unlock(&priv->lock);
 
 		return 0;
@@ -442,7 +436,7 @@ static void stmmac_ethtool_gregs(struct net_device *dev,
 
 	memset(reg_space, 0x0, REG_SPACE_SIZE);
 
-	priv->hw->mac->dump_regs(priv->hw, reg_space);
+	stmmac_dump_mac_regs(priv, priv->hw, reg_space);
 	stmmac_dump_dma_regs(priv, priv->ioaddr, reg_space);
 	/* Copy DMA registers to where ethtool expects them */
 	memcpy(&reg_space[ETHTOOL_DMA_OFFSET], &reg_space[DMA_BUS_MODE / 4],
@@ -454,15 +448,13 @@ stmmac_get_pauseparam(struct net_device *netdev,
 		      struct ethtool_pauseparam *pause)
 {
 	struct stmmac_priv *priv = netdev_priv(netdev);
+	struct rgmii_adv adv_lp;
 
 	pause->rx_pause = 0;
 	pause->tx_pause = 0;
 
-	if (priv->hw->pcs && priv->hw->mac->pcs_get_adv_lp) {
-		struct rgmii_adv adv_lp;
-
+	if (priv->hw->pcs && !stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv_lp)) {
 		pause->autoneg = 1;
-		priv->hw->mac->pcs_get_adv_lp(priv->ioaddr, &adv_lp);
 		if (!adv_lp.pause)
 			return;
 	} else {
@@ -488,12 +480,10 @@ stmmac_set_pauseparam(struct net_device *netdev,
 	u32 tx_cnt = priv->plat->tx_queues_to_use;
 	struct phy_device *phy = netdev->phydev;
 	int new_pause = FLOW_OFF;
+	struct rgmii_adv adv_lp;
 
-	if (priv->hw->pcs && priv->hw->mac->pcs_get_adv_lp) {
-		struct rgmii_adv adv_lp;
-
+	if (priv->hw->pcs && !stmmac_pcs_get_adv_lp(priv, priv->ioaddr, &adv_lp)) {
 		pause->autoneg = 1;
-		priv->hw->mac->pcs_get_adv_lp(priv->ioaddr, &adv_lp);
 		if (!adv_lp.pause)
 			return -EOPNOTSUPP;
 	} else {
@@ -515,27 +505,24 @@ stmmac_set_pauseparam(struct net_device *netdev,
 			return phy_start_aneg(phy);
 	}
 
-	priv->hw->mac->flow_ctrl(priv->hw, phy->duplex, priv->flow_ctrl,
-				 priv->pause, tx_cnt);
+	stmmac_flow_ctrl(priv, priv->hw, phy->duplex, priv->flow_ctrl,
+			priv->pause, tx_cnt);
 	return 0;
 }
 
 static void stmmac_get_ethtool_stats(struct net_device *dev,
 				 struct ethtool_stats *dummy, u64 *data)
 {
-	const char *(*dump)(struct stmmac_safety_stats *stats, int index,
-			unsigned long *count);
 	struct stmmac_priv *priv = netdev_priv(dev);
 	u32 rx_queues_count = priv->plat->rx_queues_to_use;
 	u32 tx_queues_count = priv->plat->tx_queues_to_use;
 	unsigned long count;
 	int i, j = 0, ret;
 
-	if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
-		dump = priv->hw->mac->safety_feat_dump;
-
+	if (priv->dma_cap.asp) {
 		for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
-			if (dump(&priv->sstats, i, &count))
+			if (!stmmac_safety_feat_dump(priv, &priv->sstats, i,
+						&count, NULL))
 				data[j++] = count;
 		}
 	}
@@ -563,11 +550,10 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
 				priv->xstats.phy_eee_wakeup_error_n = val;
 		}
 
-		if ((priv->hw->mac->debug) &&
-		    (priv->synopsys_id >= DWMAC_CORE_3_50))
-			priv->hw->mac->debug(priv->ioaddr,
-					     (void *)&priv->xstats,
-					     rx_queues_count, tx_queues_count);
+		if (priv->synopsys_id >= DWMAC_CORE_3_50)
+			stmmac_mac_debug(priv, priv->ioaddr,
+					(void *)&priv->xstats,
+					rx_queues_count, tx_queues_count);
 	}
 	for (i = 0; i < STMMAC_STATS_LEN; i++) {
 		char *p = (char *)priv + stmmac_gstrings_stats[i].stat_offset;
@@ -579,8 +565,6 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
 static int stmmac_get_sset_count(struct net_device *netdev, int sset)
 {
 	struct stmmac_priv *priv = netdev_priv(netdev);
-	const char *(*dump)(struct stmmac_safety_stats *stats, int index,
-			unsigned long *count);
 	int i, len, safety_len = 0;
 
 	switch (sset) {
@@ -589,11 +573,11 @@ static int stmmac_get_sset_count(struct net_device *netdev, int sset)
 
 		if (priv->dma_cap.rmon)
 			len += STMMAC_MMC_STATS_LEN;
-		if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
-			dump = priv->hw->mac->safety_feat_dump;
-
+		if (priv->dma_cap.asp) {
 			for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
-				if (dump(&priv->sstats, i, NULL))
+				if (!stmmac_safety_feat_dump(priv,
+							&priv->sstats, i,
+							NULL, NULL))
 					safety_len++;
 			}
 
@@ -611,17 +595,15 @@ static void stmmac_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 	int i;
 	u8 *p = data;
 	struct stmmac_priv *priv = netdev_priv(dev);
-	const char *(*dump)(struct stmmac_safety_stats *stats, int index,
-			unsigned long *count);
 
 	switch (stringset) {
 	case ETH_SS_STATS:
-		if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
-			dump = priv->hw->mac->safety_feat_dump;
+		if (priv->dma_cap.asp) {
 			for (i = 0; i < STMMAC_SAFETY_FEAT_SIZE; i++) {
-				const char *desc = dump(&priv->sstats, i, NULL);
-
-				if (desc) {
+				const char *desc;
+				if (!stmmac_safety_feat_dump(priv,
+							&priv->sstats, i,
+							NULL, &desc)) {
 					memcpy(p, desc, ETH_GSTRING_LEN);
 					p += ETH_GSTRING_LEN;
 				}
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7f2aeae..2b521035 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -336,8 +336,8 @@ static void stmmac_enable_eee_mode(struct stmmac_priv *priv)
 
 	/* Check and enter in LPI mode */
 	if (!priv->tx_path_in_lpi_mode)
-		priv->hw->mac->set_eee_mode(priv->hw,
-					    priv->plat->en_tx_lpi_clockgating);
+		stmmac_set_eee_mode(priv, priv->hw,
+				priv->plat->en_tx_lpi_clockgating);
 }
 
 /**
@@ -348,7 +348,7 @@ static void stmmac_enable_eee_mode(struct stmmac_priv *priv)
  */
 void stmmac_disable_eee_mode(struct stmmac_priv *priv)
 {
-	priv->hw->mac->reset_eee_mode(priv->hw);
+	stmmac_reset_eee_mode(priv, priv->hw);
 	del_timer_sync(&priv->eee_ctrl_timer);
 	priv->tx_path_in_lpi_mode = false;
 }
@@ -411,8 +411,8 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
 			if (priv->eee_active) {
 				netdev_dbg(priv->dev, "disable EEE\n");
 				del_timer_sync(&priv->eee_ctrl_timer);
-				priv->hw->mac->set_eee_timer(priv->hw, 0,
-							     tx_lpi_timer);
+				stmmac_set_eee_timer(priv, priv->hw, 0,
+						tx_lpi_timer);
 			}
 			priv->eee_active = 0;
 			spin_unlock_irqrestore(&priv->lock, flags);
@@ -427,12 +427,11 @@ bool stmmac_eee_init(struct stmmac_priv *priv)
 			mod_timer(&priv->eee_ctrl_timer,
 				  STMMAC_LPI_T(eee_timer));
 
-			priv->hw->mac->set_eee_timer(priv->hw,
-						     STMMAC_DEFAULT_LIT_LS,
-						     tx_lpi_timer);
+			stmmac_set_eee_timer(priv, priv->hw,
+					STMMAC_DEFAULT_LIT_LS, tx_lpi_timer);
 		}
 		/* Set HW EEE according to the speed */
-		priv->hw->mac->set_eee_pls(priv->hw, ndev->phydev->link);
+		stmmac_set_eee_pls(priv, priv->hw, ndev->phydev->link);
 
 		ret = true;
 		spin_unlock_irqrestore(&priv->lock, flags);
@@ -796,8 +795,8 @@ static void stmmac_mac_flow_ctrl(struct stmmac_priv *priv, u32 duplex)
 {
 	u32 tx_cnt = priv->plat->tx_queues_to_use;
 
-	priv->hw->mac->flow_ctrl(priv->hw, duplex, priv->flow_ctrl,
-				 priv->pause, tx_cnt);
+	stmmac_flow_ctrl(priv, priv->hw, duplex, priv->flow_ctrl,
+			priv->pause, tx_cnt);
 }
 
 /**
@@ -1663,7 +1662,7 @@ static void stmmac_mac_enable_rx_queues(struct stmmac_priv *priv)
 
 	for (queue = 0; queue < rx_queues_count; queue++) {
 		mode = priv->plat->rx_queues_cfg[queue].mode_to_use;
-		priv->hw->mac->rx_queue_enable(priv->hw, mode, queue);
+		stmmac_rx_queue_enable(priv, priv->hw, mode, queue);
 	}
 }
 
@@ -2000,18 +1999,19 @@ static void stmmac_set_dma_operation_mode(struct stmmac_priv *priv, u32 txmode,
 
 static bool stmmac_safety_feat_interrupt(struct stmmac_priv *priv)
 {
-	bool ret = false;
+	int ret = false;
 
 	/* Safety features are only available in cores >= 5.10 */
 	if (priv->synopsys_id < DWMAC_CORE_5_10)
 		return ret;
-	if (priv->hw->mac->safety_feat_irq_status)
-		ret = priv->hw->mac->safety_feat_irq_status(priv->dev,
-				priv->ioaddr, priv->dma_cap.asp, &priv->sstats);
-
-	if (ret)
+	ret = stmmac_safety_feat_irq_status(priv, priv->dev,
+			priv->ioaddr, priv->dma_cap.asp, &priv->sstats);
+	if (ret && (ret != -EINVAL)) {
 		stmmac_global_err(priv);
-	return ret;
+		return true;
+	}
+
+	return false;
 }
 
 /**
@@ -2177,8 +2177,7 @@ static int stmmac_get_hw_features(struct stmmac_priv *priv)
 static void stmmac_check_ether_addr(struct stmmac_priv *priv)
 {
 	if (!is_valid_ether_addr(priv->dev->dev_addr)) {
-		priv->hw->mac->get_umac_addr(priv->hw,
-					     priv->dev->dev_addr, 0);
+		stmmac_get_umac_addr(priv, priv->hw, priv->dev->dev_addr, 0);
 		if (!is_valid_ether_addr(priv->dev->dev_addr))
 			eth_hw_addr_random(priv->dev);
 		netdev_info(priv->dev, "device MAC address %pM\n",
@@ -2332,7 +2331,7 @@ static void stmmac_set_tx_queue_weight(struct stmmac_priv *priv)
 
 	for (queue = 0; queue < tx_queues_count; queue++) {
 		weight = priv->plat->tx_queues_cfg[queue].weight;
-		priv->hw->mac->set_mtl_tx_queue_weight(priv->hw, weight, queue);
+		stmmac_set_mtl_tx_queue_weight(priv, priv->hw, weight, queue);
 	}
 }
 
@@ -2353,7 +2352,7 @@ static void stmmac_configure_cbs(struct stmmac_priv *priv)
 		if (mode_to_use == MTL_QUEUE_DCB)
 			continue;
 
-		priv->hw->mac->config_cbs(priv->hw,
+		stmmac_config_cbs(priv, priv->hw,
 				priv->plat->tx_queues_cfg[queue].send_slope,
 				priv->plat->tx_queues_cfg[queue].idle_slope,
 				priv->plat->tx_queues_cfg[queue].high_credit,
@@ -2375,7 +2374,7 @@ static void stmmac_rx_queue_dma_chan_map(struct stmmac_priv *priv)
 
 	for (queue = 0; queue < rx_queues_count; queue++) {
 		chan = priv->plat->rx_queues_cfg[queue].chan;
-		priv->hw->mac->map_mtl_to_dma(priv->hw, queue, chan);
+		stmmac_map_mtl_to_dma(priv, priv->hw, queue, chan);
 	}
 }
 
@@ -2395,7 +2394,7 @@ static void stmmac_mac_config_rx_queues_prio(struct stmmac_priv *priv)
 			continue;
 
 		prio = priv->plat->rx_queues_cfg[queue].prio;
-		priv->hw->mac->rx_queue_prio(priv->hw, prio, queue);
+		stmmac_rx_queue_prio(priv, priv->hw, prio, queue);
 	}
 }
 
@@ -2415,7 +2414,7 @@ static void stmmac_mac_config_tx_queues_prio(struct stmmac_priv *priv)
 			continue;
 
 		prio = priv->plat->tx_queues_cfg[queue].prio;
-		priv->hw->mac->tx_queue_prio(priv->hw, prio, queue);
+		stmmac_tx_queue_prio(priv, priv->hw, prio, queue);
 	}
 }
 
@@ -2436,7 +2435,7 @@ static void stmmac_mac_config_rx_queues_routing(struct stmmac_priv *priv)
 			continue;
 
 		packet = priv->plat->rx_queues_cfg[queue].pkt_route;
-		priv->hw->mac->rx_queue_routing(priv->hw, packet, queue);
+		stmmac_rx_queue_routing(priv, priv->hw, packet, queue);
 	}
 }
 
@@ -2450,50 +2449,47 @@ static void stmmac_mtl_configuration(struct stmmac_priv *priv)
 	u32 rx_queues_count = priv->plat->rx_queues_to_use;
 	u32 tx_queues_count = priv->plat->tx_queues_to_use;
 
-	if (tx_queues_count > 1 && priv->hw->mac->set_mtl_tx_queue_weight)
+	if (tx_queues_count > 1)
 		stmmac_set_tx_queue_weight(priv);
 
 	/* Configure MTL RX algorithms */
-	if (rx_queues_count > 1 && priv->hw->mac->prog_mtl_rx_algorithms)
-		priv->hw->mac->prog_mtl_rx_algorithms(priv->hw,
-						priv->plat->rx_sched_algorithm);
+	if (rx_queues_count > 1)
+		stmmac_prog_mtl_rx_algorithms(priv, priv->hw,
+				priv->plat->rx_sched_algorithm);
 
 	/* Configure MTL TX algorithms */
-	if (tx_queues_count > 1 && priv->hw->mac->prog_mtl_tx_algorithms)
-		priv->hw->mac->prog_mtl_tx_algorithms(priv->hw,
-						priv->plat->tx_sched_algorithm);
+	if (tx_queues_count > 1)
+		stmmac_prog_mtl_tx_algorithms(priv, priv->hw,
+				priv->plat->tx_sched_algorithm);
 
 	/* Configure CBS in AVB TX queues */
-	if (tx_queues_count > 1 && priv->hw->mac->config_cbs)
+	if (tx_queues_count > 1)
 		stmmac_configure_cbs(priv);
 
 	/* Map RX MTL to DMA channels */
-	if (priv->hw->mac->map_mtl_to_dma)
-		stmmac_rx_queue_dma_chan_map(priv);
+	stmmac_rx_queue_dma_chan_map(priv);
 
 	/* Enable MAC RX Queues */
-	if (priv->hw->mac->rx_queue_enable)
-		stmmac_mac_enable_rx_queues(priv);
+	stmmac_mac_enable_rx_queues(priv);
 
 	/* Set RX priorities */
-	if (rx_queues_count > 1 && priv->hw->mac->rx_queue_prio)
+	if (rx_queues_count > 1)
 		stmmac_mac_config_rx_queues_prio(priv);
 
 	/* Set TX priorities */
-	if (tx_queues_count > 1 && priv->hw->mac->tx_queue_prio)
+	if (tx_queues_count > 1)
 		stmmac_mac_config_tx_queues_prio(priv);
 
 	/* Set RX routing */
-	if (rx_queues_count > 1 && priv->hw->mac->rx_queue_routing)
+	if (rx_queues_count > 1)
 		stmmac_mac_config_rx_queues_routing(priv);
 }
 
 static void stmmac_safety_feat_configuration(struct stmmac_priv *priv)
 {
-	if (priv->hw->mac->safety_feat_config && priv->dma_cap.asp) {
+	if (priv->dma_cap.asp) {
 		netdev_info(priv->dev, "Enabling Safety Features\n");
-		priv->hw->mac->safety_feat_config(priv->ioaddr,
-				priv->dma_cap.asp);
+		stmmac_safety_feat_config(priv, priv->ioaddr, priv->dma_cap.asp);
 	} else {
 		netdev_info(priv->dev, "No Safety Features support found\n");
 	}
@@ -2528,7 +2524,7 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 	}
 
 	/* Copy the MAC addr into the HW  */
-	priv->hw->mac->set_umac_addr(priv->hw, dev->dev_addr, 0);
+	stmmac_set_umac_addr(priv, priv->hw, dev->dev_addr, 0);
 
 	/* PS and related bits will be programmed according to the speed */
 	if (priv->hw->pcs) {
@@ -2544,7 +2540,7 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 	}
 
 	/* Initialize the MAC Core */
-	priv->hw->mac->core_init(priv->hw, dev);
+	stmmac_core_init(priv, priv->hw, dev);
 
 	/* Initialize MTL*/
 	if (priv->synopsys_id >= DWMAC_CORE_4_00)
@@ -2554,7 +2550,7 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 	if (priv->synopsys_id >= DWMAC_CORE_5_10)
 		stmmac_safety_feat_configuration(priv);
 
-	ret = priv->hw->mac->rx_ipc(priv->hw);
+	ret = stmmac_rx_ipc(priv, priv->hw);
 	if (!ret) {
 		netdev_warn(priv->dev, "RX IPC Checksum Offload disabled\n");
 		priv->plat->rx_coe = STMMAC_RX_COE_NONE;
@@ -2562,7 +2558,7 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 	}
 
 	/* Enable the MAC Rx/Tx */
-	priv->hw->mac->set_mac(priv->ioaddr, true);
+	stmmac_mac_set(priv, priv->ioaddr, true);
 
 	/* Set the HW DMA mode and the COE */
 	stmmac_dma_operation_mode(priv);
@@ -2598,8 +2594,8 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 			priv->rx_riwt = MAX_DMA_RIWT;
 	}
 
-	if (priv->hw->pcs && priv->hw->mac->pcs_ctrl_ane)
-		priv->hw->mac->pcs_ctrl_ane(priv->hw, 1, priv->hw->ps, 0);
+	if (priv->hw->pcs)
+		stmmac_pcs_ctrl_ane(priv, priv->hw, 1, priv->hw->ps, 0);
 
 	/* set TX and RX rings length */
 	stmmac_set_rings_length(priv);
@@ -2778,7 +2774,7 @@ static int stmmac_release(struct net_device *dev)
 	free_dma_desc_resources(priv);
 
 	/* Disable the MAC Rx/Tx */
-	priv->hw->mac->set_mac(priv->ioaddr, false);
+	stmmac_mac_set(priv, priv->ioaddr, false);
 
 	netif_carrier_off(dev);
 
@@ -3614,7 +3610,7 @@ static void stmmac_set_rx_mode(struct net_device *dev)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 
-	priv->hw->mac->set_filter(priv->hw, dev);
+	stmmac_set_filter(priv, priv->hw, dev);
 }
 
 /**
@@ -3687,7 +3683,7 @@ static int stmmac_set_features(struct net_device *netdev,
 	/* No check needed because rx_coe has been set before and it will be
 	 * fixed in case of issue.
 	 */
-	priv->hw->mac->rx_ipc(priv->hw);
+	stmmac_rx_ipc(priv, priv->hw);
 
 	return 0;
 }
@@ -3731,8 +3727,7 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
 
 	/* To handle GMAC own interrupts */
 	if ((priv->plat->has_gmac) || (priv->plat->has_gmac4)) {
-		int status = priv->hw->mac->host_irq_status(priv->hw,
-							    &priv->xstats);
+		int status = stmmac_host_irq_status(priv, priv->hw, &priv->xstats);
 
 		if (unlikely(status)) {
 			/* For LPI we need to save the tx status */
@@ -3747,9 +3742,8 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
 				struct stmmac_rx_queue *rx_q =
 				&priv->rx_queue[queue];
 
-				status |=
-				priv->hw->mac->host_mtl_irq_status(priv->hw,
-								   queue);
+				status |= stmmac_host_mtl_irq_status(priv,
+						priv->hw, queue);
 
 				if (status & CORE_IRQ_MTL_RX_OVERFLOW)
 					stmmac_set_rx_tail_ptr(priv,
@@ -3829,7 +3823,7 @@ static int stmmac_set_mac_address(struct net_device *ndev, void *addr)
 	if (ret)
 		return ret;
 
-	priv->hw->mac->set_umac_addr(priv->hw, ndev->dev_addr, 0);
+	stmmac_set_umac_addr(priv, priv->hw, ndev->dev_addr, 0);
 
 	return ret;
 }
@@ -4418,7 +4412,7 @@ int stmmac_dvr_remove(struct device *dev)
 
 	stmmac_stop_all_dma(priv);
 
-	priv->hw->mac->set_mac(priv->ioaddr, false);
+	stmmac_mac_set(priv, priv->ioaddr, false);
 	netif_carrier_off(ndev);
 	unregister_netdev(ndev);
 	if (priv->plat->stmmac_rst)
@@ -4467,10 +4461,10 @@ int stmmac_suspend(struct device *dev)
 
 	/* Enable Power down mode by programming the PMT regs */
 	if (device_may_wakeup(priv->device)) {
-		priv->hw->mac->pmt(priv->hw, priv->wolopts);
+		stmmac_pmt(priv, priv->hw, priv->wolopts);
 		priv->irq_wake = 1;
 	} else {
-		priv->hw->mac->set_mac(priv->ioaddr, false);
+		stmmac_mac_set(priv, priv->ioaddr, false);
 		pinctrl_pm_select_sleep_state(priv->device);
 		/* Disable clock in case of PWM is off */
 		clk_disable(priv->plat->pclk);
@@ -4534,7 +4528,7 @@ int stmmac_resume(struct device *dev)
 	 */
 	if (device_may_wakeup(priv->device)) {
 		spin_lock_irqsave(&priv->lock, flags);
-		priv->hw->mac->pmt(priv->hw, 0);
+		stmmac_pmt(priv, priv->hw, 0);
 		spin_unlock_irqrestore(&priv->lock, flags);
 		priv->irq_wake = 0;
 	} else {
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 2/5] net: stmmac: Switch stmmac_dma_ops to generic HW Interface Helpers
From: Jose Abreu @ 2018-04-06 13:08 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <cover.1523019346.git.joabreu@synopsys.com>

Switch stmmac_dma_ops to generic Hardware Interface Helpers instead of
using hard-coded callbacks. This makes the code more readable and more
flexible.

No functional change.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h       |  50 -------
 drivers/net/ethernet/stmicro/stmmac/hwif.h         | 106 +++++++++++++++
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  14 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 147 +++++++++------------
 4 files changed, 172 insertions(+), 145 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 2c50d8c..b27221b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -381,56 +381,6 @@ struct dma_features {
 extern const struct stmmac_desc_ops enh_desc_ops;
 extern const struct stmmac_desc_ops ndesc_ops;
 
-/* Specific DMA helpers */
-struct stmmac_dma_ops {
-	/* DMA core initialization */
-	int (*reset)(void __iomem *ioaddr);
-	void (*init)(void __iomem *ioaddr, struct stmmac_dma_cfg *dma_cfg,
-		     u32 dma_tx, u32 dma_rx, int atds);
-	void (*init_chan)(void __iomem *ioaddr,
-			  struct stmmac_dma_cfg *dma_cfg, u32 chan);
-	void (*init_rx_chan)(void __iomem *ioaddr,
-			     struct stmmac_dma_cfg *dma_cfg,
-			     u32 dma_rx_phy, u32 chan);
-	void (*init_tx_chan)(void __iomem *ioaddr,
-			     struct stmmac_dma_cfg *dma_cfg,
-			     u32 dma_tx_phy, u32 chan);
-	/* Configure the AXI Bus Mode Register */
-	void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
-	/* Dump DMA registers */
-	void (*dump_regs)(void __iomem *ioaddr, u32 *reg_space);
-	/* Set tx/rx threshold in the csr6 register
-	 * An invalid value enables the store-and-forward mode */
-	void (*dma_mode)(void __iomem *ioaddr, int txmode, int rxmode,
-			 int rxfifosz);
-	void (*dma_rx_mode)(void __iomem *ioaddr, int mode, u32 channel,
-			    int fifosz, u8 qmode);
-	void (*dma_tx_mode)(void __iomem *ioaddr, int mode, u32 channel,
-			    int fifosz, u8 qmode);
-	/* To track extra statistic (if supported) */
-	void (*dma_diagnostic_fr) (void *data, struct stmmac_extra_stats *x,
-				   void __iomem *ioaddr);
-	void (*enable_dma_transmission) (void __iomem *ioaddr);
-	void (*enable_dma_irq)(void __iomem *ioaddr, u32 chan);
-	void (*disable_dma_irq)(void __iomem *ioaddr, u32 chan);
-	void (*start_tx)(void __iomem *ioaddr, u32 chan);
-	void (*stop_tx)(void __iomem *ioaddr, u32 chan);
-	void (*start_rx)(void __iomem *ioaddr, u32 chan);
-	void (*stop_rx)(void __iomem *ioaddr, u32 chan);
-	int (*dma_interrupt) (void __iomem *ioaddr,
-			      struct stmmac_extra_stats *x, u32 chan);
-	/* If supported then get the optional core features */
-	void (*get_hw_feature)(void __iomem *ioaddr,
-			       struct dma_features *dma_cap);
-	/* Program the HW RX Watchdog */
-	void (*rx_watchdog)(void __iomem *ioaddr, u32 riwt, u32 number_chan);
-	void (*set_tx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
-	void (*set_rx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
-	void (*set_rx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
-	void (*set_tx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
-	void (*enable_tso)(void __iomem *ioaddr, bool en, u32 chan);
-};
-
 struct mac_device_info;
 
 /* Helpers to program the MAC core */
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
index 4994677..e1a9ae6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.h
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -122,4 +122,110 @@ struct stmmac_desc_ops {
 #define stmmac_set_mss(__priv, __args...) \
 	stmmac_do_void_callback(__priv, desc, set_mss, __args)
 
+struct stmmac_dma_cfg;
+struct dma_features;
+
+/* Specific DMA helpers */
+struct stmmac_dma_ops {
+	/* DMA core initialization */
+	int (*reset)(void __iomem *ioaddr);
+	void (*init)(void __iomem *ioaddr, struct stmmac_dma_cfg *dma_cfg,
+		     u32 dma_tx, u32 dma_rx, int atds);
+	void (*init_chan)(void __iomem *ioaddr,
+			  struct stmmac_dma_cfg *dma_cfg, u32 chan);
+	void (*init_rx_chan)(void __iomem *ioaddr,
+			     struct stmmac_dma_cfg *dma_cfg,
+			     u32 dma_rx_phy, u32 chan);
+	void (*init_tx_chan)(void __iomem *ioaddr,
+			     struct stmmac_dma_cfg *dma_cfg,
+			     u32 dma_tx_phy, u32 chan);
+	/* Configure the AXI Bus Mode Register */
+	void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
+	/* Dump DMA registers */
+	void (*dump_regs)(void __iomem *ioaddr, u32 *reg_space);
+	/* Set tx/rx threshold in the csr6 register
+	 * An invalid value enables the store-and-forward mode */
+	void (*dma_mode)(void __iomem *ioaddr, int txmode, int rxmode,
+			 int rxfifosz);
+	void (*dma_rx_mode)(void __iomem *ioaddr, int mode, u32 channel,
+			    int fifosz, u8 qmode);
+	void (*dma_tx_mode)(void __iomem *ioaddr, int mode, u32 channel,
+			    int fifosz, u8 qmode);
+	/* To track extra statistic (if supported) */
+	void (*dma_diagnostic_fr) (void *data, struct stmmac_extra_stats *x,
+				   void __iomem *ioaddr);
+	void (*enable_dma_transmission) (void __iomem *ioaddr);
+	void (*enable_dma_irq)(void __iomem *ioaddr, u32 chan);
+	void (*disable_dma_irq)(void __iomem *ioaddr, u32 chan);
+	void (*start_tx)(void __iomem *ioaddr, u32 chan);
+	void (*stop_tx)(void __iomem *ioaddr, u32 chan);
+	void (*start_rx)(void __iomem *ioaddr, u32 chan);
+	void (*stop_rx)(void __iomem *ioaddr, u32 chan);
+	int (*dma_interrupt) (void __iomem *ioaddr,
+			      struct stmmac_extra_stats *x, u32 chan);
+	/* If supported then get the optional core features */
+	void (*get_hw_feature)(void __iomem *ioaddr,
+			       struct dma_features *dma_cap);
+	/* Program the HW RX Watchdog */
+	void (*rx_watchdog)(void __iomem *ioaddr, u32 riwt, u32 number_chan);
+	void (*set_tx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
+	void (*set_rx_ring_len)(void __iomem *ioaddr, u32 len, u32 chan);
+	void (*set_rx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
+	void (*set_tx_tail_ptr)(void __iomem *ioaddr, u32 tail_ptr, u32 chan);
+	void (*enable_tso)(void __iomem *ioaddr, bool en, u32 chan);
+};
+
+#define stmmac_reset(__priv, __args...) \
+	stmmac_do_callback(__priv, dma, reset, __args)
+#define stmmac_dma_init(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, init, __args)
+#define stmmac_init_chan(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, init_chan, __args)
+#define stmmac_init_rx_chan(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, init_rx_chan, __args)
+#define stmmac_init_tx_chan(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, init_tx_chan, __args)
+#define stmmac_axi(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, axi, __args)
+#define stmmac_dump_dma_regs(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, dump_regs, __args)
+#define stmmac_dma_mode(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, dma_mode, __args)
+#define stmmac_dma_rx_mode(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, dma_rx_mode, __args)
+#define stmmac_dma_tx_mode(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, dma_tx_mode, __args)
+#define stmmac_dma_diagnostic_fr(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, dma_diagnostic_fr, __args)
+#define stmmac_enable_dma_transmission(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, enable_dma_transmission, __args)
+#define stmmac_enable_dma_irq(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, enable_dma_irq, __args)
+#define stmmac_disable_dma_irq(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, disable_dma_irq, __args)
+#define stmmac_start_tx(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, start_tx, __args)
+#define stmmac_stop_tx(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, stop_tx, __args)
+#define stmmac_start_rx(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, start_rx, __args)
+#define stmmac_stop_rx(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, stop_rx, __args)
+#define stmmac_dma_interrupt_status(__priv, __args...) \
+	stmmac_do_callback(__priv, dma, dma_interrupt, __args)
+#define stmmac_get_hw_feature(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, get_hw_feature, __args)
+#define stmmac_rx_watchdog(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, rx_watchdog, __args)
+#define stmmac_set_tx_ring_len(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, set_tx_ring_len, __args)
+#define stmmac_set_rx_ring_len(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, set_rx_ring_len, __args)
+#define stmmac_set_rx_tail_ptr(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, set_rx_tail_ptr, __args)
+#define stmmac_set_tx_tail_ptr(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, set_tx_tail_ptr, __args)
+#define stmmac_enable_tso(__priv, __args...) \
+	stmmac_do_void_callback(__priv, dma, enable_tso, __args)
+
 #endif /* __STMMAC_HWIF_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 2c6ed47..7009718 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -443,7 +443,7 @@ static void stmmac_ethtool_gregs(struct net_device *dev,
 	memset(reg_space, 0x0, REG_SPACE_SIZE);
 
 	priv->hw->mac->dump_regs(priv->hw, reg_space);
-	priv->hw->dma->dump_regs(priv->ioaddr, reg_space);
+	stmmac_dump_dma_regs(priv, priv->ioaddr, reg_space);
 	/* Copy DMA registers to where ethtool expects them */
 	memcpy(&reg_space[ETHTOOL_DMA_OFFSET], &reg_space[DMA_BUS_MODE / 4],
 	       NUM_DWMAC1000_DMA_REGS * 4);
@@ -529,7 +529,7 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
 	u32 rx_queues_count = priv->plat->rx_queues_to_use;
 	u32 tx_queues_count = priv->plat->tx_queues_to_use;
 	unsigned long count;
-	int i, j = 0;
+	int i, j = 0, ret;
 
 	if (priv->dma_cap.asp && priv->hw->mac->safety_feat_dump) {
 		dump = priv->hw->mac->safety_feat_dump;
@@ -541,11 +541,9 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
 	}
 
 	/* Update the DMA HW counters for dwmac10/100 */
-	if (priv->hw->dma->dma_diagnostic_fr)
-		priv->hw->dma->dma_diagnostic_fr(&dev->stats,
-						 (void *) &priv->xstats,
-						 priv->ioaddr);
-	else {
+	ret = stmmac_dma_diagnostic_fr(priv, &dev->stats, (void *) &priv->xstats,
+			priv->ioaddr);
+	if (ret) {
 		/* If supported, for new GMAC chips expose the MMC counters */
 		if (priv->dma_cap.rmon) {
 			dwmac_mmc_read(priv->mmcaddr, &priv->mmc);
@@ -810,7 +808,7 @@ static int stmmac_set_coalesce(struct net_device *dev,
 	priv->tx_coal_frames = ec->tx_max_coalesced_frames;
 	priv->tx_coal_timer = ec->tx_coalesce_usecs;
 	priv->rx_riwt = rx_riwt;
-	priv->hw->dma->rx_watchdog(priv->ioaddr, priv->rx_riwt, rx_cnt);
+	stmmac_rx_watchdog(priv, priv->ioaddr, priv->rx_riwt, rx_cnt);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c44e19d..7f2aeae 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1677,7 +1677,7 @@ static void stmmac_mac_enable_rx_queues(struct stmmac_priv *priv)
 static void stmmac_start_rx_dma(struct stmmac_priv *priv, u32 chan)
 {
 	netdev_dbg(priv->dev, "DMA RX processes started in channel %d\n", chan);
-	priv->hw->dma->start_rx(priv->ioaddr, chan);
+	stmmac_start_rx(priv, priv->ioaddr, chan);
 }
 
 /**
@@ -1690,7 +1690,7 @@ static void stmmac_start_rx_dma(struct stmmac_priv *priv, u32 chan)
 static void stmmac_start_tx_dma(struct stmmac_priv *priv, u32 chan)
 {
 	netdev_dbg(priv->dev, "DMA TX processes started in channel %d\n", chan);
-	priv->hw->dma->start_tx(priv->ioaddr, chan);
+	stmmac_start_tx(priv, priv->ioaddr, chan);
 }
 
 /**
@@ -1703,7 +1703,7 @@ static void stmmac_start_tx_dma(struct stmmac_priv *priv, u32 chan)
 static void stmmac_stop_rx_dma(struct stmmac_priv *priv, u32 chan)
 {
 	netdev_dbg(priv->dev, "DMA RX processes stopped in channel %d\n", chan);
-	priv->hw->dma->stop_rx(priv->ioaddr, chan);
+	stmmac_stop_rx(priv, priv->ioaddr, chan);
 }
 
 /**
@@ -1716,7 +1716,7 @@ static void stmmac_stop_rx_dma(struct stmmac_priv *priv, u32 chan)
 static void stmmac_stop_tx_dma(struct stmmac_priv *priv, u32 chan)
 {
 	netdev_dbg(priv->dev, "DMA TX processes stopped in channel %d\n", chan);
-	priv->hw->dma->stop_tx(priv->ioaddr, chan);
+	stmmac_stop_tx(priv, priv->ioaddr, chan);
 }
 
 /**
@@ -1807,19 +1807,18 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
 		for (chan = 0; chan < rx_channels_count; chan++) {
 			qmode = priv->plat->rx_queues_cfg[chan].mode_to_use;
 
-			priv->hw->dma->dma_rx_mode(priv->ioaddr, rxmode, chan,
-						   rxfifosz, qmode);
+			stmmac_dma_rx_mode(priv, priv->ioaddr, rxmode, chan,
+					rxfifosz, qmode);
 		}
 
 		for (chan = 0; chan < tx_channels_count; chan++) {
 			qmode = priv->plat->tx_queues_cfg[chan].mode_to_use;
 
-			priv->hw->dma->dma_tx_mode(priv->ioaddr, txmode, chan,
-						   txfifosz, qmode);
+			stmmac_dma_tx_mode(priv, priv->ioaddr, txmode, chan,
+					txfifosz, qmode);
 		}
 	} else {
-		priv->hw->dma->dma_mode(priv->ioaddr, txmode, rxmode,
-					rxfifosz);
+		stmmac_dma_mode(priv, priv->ioaddr, txmode, rxmode, rxfifosz);
 	}
 }
 
@@ -1927,16 +1926,6 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
 	netif_tx_unlock(priv->dev);
 }
 
-static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv, u32 chan)
-{
-	priv->hw->dma->enable_dma_irq(priv->ioaddr, chan);
-}
-
-static inline void stmmac_disable_dma_irq(struct stmmac_priv *priv, u32 chan)
-{
-	priv->hw->dma->disable_dma_irq(priv->ioaddr, chan);
-}
-
 /**
  * stmmac_tx_err - to manage the tx error
  * @priv: driver private structure
@@ -2000,13 +1989,12 @@ static void stmmac_set_dma_operation_mode(struct stmmac_priv *priv, u32 txmode,
 	txfifosz /= tx_channels_count;
 
 	if (priv->synopsys_id >= DWMAC_CORE_4_00) {
-		priv->hw->dma->dma_rx_mode(priv->ioaddr, rxmode, chan,
-					   rxfifosz, rxqmode);
-		priv->hw->dma->dma_tx_mode(priv->ioaddr, txmode, chan,
-					   txfifosz, txqmode);
+		stmmac_dma_rx_mode(priv, priv->ioaddr, rxmode, chan, rxfifosz,
+				rxqmode);
+		stmmac_dma_tx_mode(priv, priv->ioaddr, txmode, chan, txfifosz,
+				txqmode);
 	} else {
-		priv->hw->dma->dma_mode(priv->ioaddr, txmode, rxmode,
-					rxfifosz);
+		stmmac_dma_mode(priv, priv->ioaddr, txmode, rxmode, rxfifosz);
 	}
 }
 
@@ -2050,16 +2038,15 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
 	 * all tx queues rather than just a single tx queue.
 	 */
 	for (chan = 0; chan < channels_to_check; chan++)
-		status[chan] = priv->hw->dma->dma_interrupt(priv->ioaddr,
-							    &priv->xstats,
-							    chan);
+		status[chan] = stmmac_dma_interrupt_status(priv, priv->ioaddr,
+				&priv->xstats, chan);
 
 	for (chan = 0; chan < rx_channel_count; chan++) {
 		if (likely(status[chan] & handle_rx)) {
 			struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
 
 			if (likely(napi_schedule_prep(&rx_q->napi))) {
-				stmmac_disable_dma_irq(priv, chan);
+				stmmac_disable_dma_irq(priv, priv->ioaddr, chan);
 				__napi_schedule(&rx_q->napi);
 				poll_scheduled = true;
 			}
@@ -2080,7 +2067,8 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
 					&priv->rx_queue[0];
 
 				if (likely(napi_schedule_prep(&rx_q->napi))) {
-					stmmac_disable_dma_irq(priv, chan);
+					stmmac_disable_dma_irq(priv,
+							priv->ioaddr, chan);
 					__napi_schedule(&rx_q->napi);
 				}
 				break;
@@ -2176,15 +2164,7 @@ static void stmmac_selec_desc_mode(struct stmmac_priv *priv)
  */
 static int stmmac_get_hw_features(struct stmmac_priv *priv)
 {
-	u32 ret = 0;
-
-	if (priv->hw->dma->get_hw_feature) {
-		priv->hw->dma->get_hw_feature(priv->ioaddr,
-					      &priv->dma_cap);
-		ret = 1;
-	}
-
-	return ret;
+	return stmmac_get_hw_feature(priv, priv->ioaddr, &priv->dma_cap) == 0;
 }
 
 /**
@@ -2234,7 +2214,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
 		atds = 1;
 
-	ret = priv->hw->dma->reset(priv->ioaddr);
+	ret = stmmac_reset(priv, priv->ioaddr);
 	if (ret) {
 		dev_err(priv->device, "Failed to reset the dma\n");
 		return ret;
@@ -2242,51 +2222,48 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 
 	if (priv->synopsys_id >= DWMAC_CORE_4_00) {
 		/* DMA Configuration */
-		priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
-				    dummy_dma_tx_phy, dummy_dma_rx_phy, atds);
+		stmmac_dma_init(priv, priv->ioaddr, priv->plat->dma_cfg,
+				dummy_dma_tx_phy, dummy_dma_rx_phy, atds);
 
 		/* DMA RX Channel Configuration */
 		for (chan = 0; chan < rx_channels_count; chan++) {
 			rx_q = &priv->rx_queue[chan];
 
-			priv->hw->dma->init_rx_chan(priv->ioaddr,
-						    priv->plat->dma_cfg,
-						    rx_q->dma_rx_phy, chan);
+			stmmac_init_rx_chan(priv, priv->ioaddr,
+					priv->plat->dma_cfg, rx_q->dma_rx_phy,
+					chan);
 
 			rx_q->rx_tail_addr = rx_q->dma_rx_phy +
 				    (DMA_RX_SIZE * sizeof(struct dma_desc));
-			priv->hw->dma->set_rx_tail_ptr(priv->ioaddr,
-						       rx_q->rx_tail_addr,
-						       chan);
+			stmmac_set_rx_tail_ptr(priv, priv->ioaddr,
+					rx_q->rx_tail_addr, chan);
 		}
 
 		/* DMA TX Channel Configuration */
 		for (chan = 0; chan < tx_channels_count; chan++) {
 			tx_q = &priv->tx_queue[chan];
 
-			priv->hw->dma->init_chan(priv->ioaddr,
-						 priv->plat->dma_cfg,
-						 chan);
+			stmmac_init_chan(priv, priv->ioaddr,
+					priv->plat->dma_cfg, chan);
 
-			priv->hw->dma->init_tx_chan(priv->ioaddr,
-						    priv->plat->dma_cfg,
-						    tx_q->dma_tx_phy, chan);
+			stmmac_init_tx_chan(priv, priv->ioaddr,
+					priv->plat->dma_cfg, tx_q->dma_tx_phy,
+					chan);
 
 			tx_q->tx_tail_addr = tx_q->dma_tx_phy +
 				    (DMA_TX_SIZE * sizeof(struct dma_desc));
-			priv->hw->dma->set_tx_tail_ptr(priv->ioaddr,
-						       tx_q->tx_tail_addr,
-						       chan);
+			stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
+					tx_q->tx_tail_addr, chan);
 		}
 	} else {
 		rx_q = &priv->rx_queue[chan];
 		tx_q = &priv->tx_queue[chan];
-		priv->hw->dma->init(priv->ioaddr, priv->plat->dma_cfg,
-				    tx_q->dma_tx_phy, rx_q->dma_rx_phy, atds);
+		stmmac_dma_init(priv, priv->ioaddr, priv->plat->dma_cfg,
+				tx_q->dma_tx_phy, rx_q->dma_rx_phy, atds);
 	}
 
-	if (priv->plat->axi && priv->hw->dma->axi)
-		priv->hw->dma->axi(priv->ioaddr, priv->plat->axi);
+	if (priv->plat->axi)
+		stmmac_axi(priv, priv->ioaddr, priv->plat->axi);
 
 	return ret;
 }
@@ -2332,18 +2309,14 @@ static void stmmac_set_rings_length(struct stmmac_priv *priv)
 	u32 chan;
 
 	/* set TX ring length */
-	if (priv->hw->dma->set_tx_ring_len) {
-		for (chan = 0; chan < tx_channels_count; chan++)
-			priv->hw->dma->set_tx_ring_len(priv->ioaddr,
-						       (DMA_TX_SIZE - 1), chan);
-	}
+	for (chan = 0; chan < tx_channels_count; chan++)
+		stmmac_set_tx_ring_len(priv, priv->ioaddr,
+				(DMA_TX_SIZE - 1), chan);
 
 	/* set RX ring length */
-	if (priv->hw->dma->set_rx_ring_len) {
-		for (chan = 0; chan < rx_channels_count; chan++)
-			priv->hw->dma->set_rx_ring_len(priv->ioaddr,
-						       (DMA_RX_SIZE - 1), chan);
-	}
+	for (chan = 0; chan < rx_channels_count; chan++)
+		stmmac_set_rx_ring_len(priv, priv->ioaddr,
+				(DMA_RX_SIZE - 1), chan);
 }
 
 /**
@@ -2619,9 +2592,10 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 
 	priv->tx_lpi_timer = STMMAC_DEFAULT_TWT_LS;
 
-	if ((priv->use_riwt) && (priv->hw->dma->rx_watchdog)) {
-		priv->rx_riwt = MAX_DMA_RIWT;
-		priv->hw->dma->rx_watchdog(priv->ioaddr, MAX_DMA_RIWT, rx_cnt);
+	if (priv->use_riwt) {
+		ret = stmmac_rx_watchdog(priv, priv->ioaddr, MAX_DMA_RIWT, rx_cnt);
+		if (!ret)
+			priv->rx_riwt = MAX_DMA_RIWT;
 	}
 
 	if (priv->hw->pcs && priv->hw->mac->pcs_ctrl_ane)
@@ -2633,7 +2607,7 @@ static int stmmac_hw_setup(struct net_device *dev, bool init_ptp)
 	/* Enable TSO */
 	if (priv->tso) {
 		for (chan = 0; chan < tx_cnt; chan++)
-			priv->hw->dma->enable_tso(priv->ioaddr, 1, chan);
+			stmmac_enable_tso(priv, priv->ioaddr, 1, chan);
 	}
 
 	return 0;
@@ -3058,8 +3032,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
 
-	priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, tx_q->tx_tail_addr,
-				       queue);
+	stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
 	return NETDEV_TX_OK;
 
@@ -3271,10 +3244,10 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
 
 	if (priv->synopsys_id < DWMAC_CORE_4_00)
-		priv->hw->dma->enable_dma_transmission(priv->ioaddr);
+		stmmac_enable_dma_transmission(priv, priv->ioaddr);
 	else
-		priv->hw->dma->set_tx_tail_ptr(priv->ioaddr, tx_q->tx_tail_addr,
-					       queue);
+		stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr,
+				queue);
 
 	return NETDEV_TX_OK;
 
@@ -3608,7 +3581,7 @@ static int stmmac_poll(struct napi_struct *napi, int budget)
 	work_done = stmmac_rx(priv, budget, rx_q->queue_index);
 	if (work_done < budget) {
 		napi_complete_done(napi, work_done);
-		stmmac_enable_dma_irq(priv, chan);
+		stmmac_enable_dma_irq(priv, priv->ioaddr, chan);
 	}
 	return work_done;
 }
@@ -3778,11 +3751,11 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
 				priv->hw->mac->host_mtl_irq_status(priv->hw,
 								   queue);
 
-				if (status & CORE_IRQ_MTL_RX_OVERFLOW &&
-				    priv->hw->dma->set_rx_tail_ptr)
-					priv->hw->dma->set_rx_tail_ptr(priv->ioaddr,
-								rx_q->rx_tail_addr,
-								queue);
+				if (status & CORE_IRQ_MTL_RX_OVERFLOW)
+					stmmac_set_rx_tail_ptr(priv,
+							priv->ioaddr,
+							rx_q->rx_tail_addr,
+							queue);
 			}
 		}
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 1/5] net: stmmac: Switch stmmac_desc_ops to generic HW Interface Helpers
From: Jose Abreu @ 2018-04-06 13:08 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <cover.1523019346.git.joabreu@synopsys.com>

Switch stmmac_desc_ops to generic Hardware Interface Helpers instead of
using hard-coded callbacks. This makes the code more readable and more
flexible.

No functional change.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/chain_mode.c   |  14 +--
 drivers/net/ethernet/stmicro/stmmac/common.h       |  55 +--------
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c |   4 +-
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c     |   4 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h         | 125 +++++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c    |   4 +-
 drivers/net/ethernet/stmicro/stmmac/ring_mode.c    |  15 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 110 +++++++++---------
 8 files changed, 195 insertions(+), 136 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/hwif.h

diff --git a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
index e93c40b..ca0f9c2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/chain_mode.c
@@ -51,8 +51,8 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 	tx_q->tx_skbuff_dma[entry].buf = des2;
 	tx_q->tx_skbuff_dma[entry].len = bmax;
 	/* do not close the descriptor and do not set own bit */
-	priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum, STMMAC_CHAIN_MODE,
-					0, false, skb->len);
+	stmmac_prepare_tx_desc(priv, desc, 1, bmax, csum, STMMAC_CHAIN_MODE,
+			0, false, skb->len);
 
 	while (len != 0) {
 		tx_q->tx_skbuff[entry] = NULL;
@@ -68,9 +68,8 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 				return -1;
 			tx_q->tx_skbuff_dma[entry].buf = des2;
 			tx_q->tx_skbuff_dma[entry].len = bmax;
-			priv->hw->desc->prepare_tx_desc(desc, 0, bmax, csum,
-							STMMAC_CHAIN_MODE, 1,
-							false, skb->len);
+			stmmac_prepare_tx_desc(priv, desc, 0, bmax, csum,
+					STMMAC_CHAIN_MODE, 1, false, skb->len);
 			len -= bmax;
 			i++;
 		} else {
@@ -83,9 +82,8 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 			tx_q->tx_skbuff_dma[entry].buf = des2;
 			tx_q->tx_skbuff_dma[entry].len = len;
 			/* last descriptor can be set now */
-			priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
-							STMMAC_CHAIN_MODE, 1,
-							true, skb->len);
+			stmmac_prepare_tx_desc(priv, desc, 0, len, csum,
+					STMMAC_CHAIN_MODE, 1, true, skb->len);
 			len = 0;
 		}
 	}
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index ad2388a..2c50d8c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -32,6 +32,7 @@
 #endif
 
 #include "descs.h"
+#include "hwif.h"
 #include "mmc.h"
 
 /* Synopsys Core versions */
@@ -377,60 +378,6 @@ struct dma_features {
 
 #define JUMBO_LEN		9000
 
-/* Descriptors helpers */
-struct stmmac_desc_ops {
-	/* DMA RX descriptor ring initialization */
-	void (*init_rx_desc) (struct dma_desc *p, int disable_rx_ic, int mode,
-			      int end);
-	/* DMA TX descriptor ring initialization */
-	void (*init_tx_desc) (struct dma_desc *p, int mode, int end);
-
-	/* Invoked by the xmit function to prepare the tx descriptor */
-	void (*prepare_tx_desc) (struct dma_desc *p, int is_fs, int len,
-				 bool csum_flag, int mode, bool tx_own,
-				 bool ls, unsigned int tot_pkt_len);
-	void (*prepare_tso_tx_desc)(struct dma_desc *p, int is_fs, int len1,
-				    int len2, bool tx_own, bool ls,
-				    unsigned int tcphdrlen,
-				    unsigned int tcppayloadlen);
-	/* Set/get the owner of the descriptor */
-	void (*set_tx_owner) (struct dma_desc *p);
-	int (*get_tx_owner) (struct dma_desc *p);
-	/* Clean the tx descriptor as soon as the tx irq is received */
-	void (*release_tx_desc) (struct dma_desc *p, int mode);
-	/* Clear interrupt on tx frame completion. When this bit is
-	 * set an interrupt happens as soon as the frame is transmitted */
-	void (*set_tx_ic)(struct dma_desc *p);
-	/* Last tx segment reports the transmit status */
-	int (*get_tx_ls) (struct dma_desc *p);
-	/* Return the transmit status looking at the TDES1 */
-	int (*tx_status) (void *data, struct stmmac_extra_stats *x,
-			  struct dma_desc *p, void __iomem *ioaddr);
-	/* Get the buffer size from the descriptor */
-	int (*get_tx_len) (struct dma_desc *p);
-	/* Handle extra events on specific interrupts hw dependent */
-	void (*set_rx_owner) (struct dma_desc *p);
-	/* Get the receive frame size */
-	int (*get_rx_frame_len) (struct dma_desc *p, int rx_coe_type);
-	/* Return the reception status looking at the RDES1 */
-	int (*rx_status) (void *data, struct stmmac_extra_stats *x,
-			  struct dma_desc *p);
-	void (*rx_extended_status) (void *data, struct stmmac_extra_stats *x,
-				    struct dma_extended_desc *p);
-	/* Set tx timestamp enable bit */
-	void (*enable_tx_timestamp) (struct dma_desc *p);
-	/* get tx timestamp status */
-	int (*get_tx_timestamp_status) (struct dma_desc *p);
-	/* get timestamp value */
-	 u64(*get_timestamp) (void *desc, u32 ats);
-	/* get rx timestamp status */
-	int (*get_rx_timestamp_status)(void *desc, void *next_desc, u32 ats);
-	/* Display ring */
-	void (*display_ring)(void *head, unsigned int size, bool rx);
-	/* set MSS via context descriptor */
-	void (*set_mss)(struct dma_desc *p, unsigned int mss);
-};
-
 extern const struct stmmac_desc_ops enh_desc_ops;
 extern const struct stmmac_desc_ops ndesc_ops;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index 2a6521d..65ed896c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -223,7 +223,7 @@ static int dwmac4_wrback_get_tx_timestamp_status(struct dma_desc *p)
 	return 0;
 }
 
-static inline u64 dwmac4_get_timestamp(void *desc, u32 ats)
+static inline void dwmac4_get_timestamp(void *desc, u32 ats, u64 *ts)
 {
 	struct dma_desc *p = (struct dma_desc *)desc;
 	u64 ns;
@@ -232,7 +232,7 @@ static inline u64 dwmac4_get_timestamp(void *desc, u32 ats)
 	/* convert high/sec time stamp value to nanosecond */
 	ns += le32_to_cpu(p->des1) * 1000000000ULL;
 
-	return ns;
+	*ts = ns;
 }
 
 static int dwmac4_rx_check_timestamp(void *desc)
diff --git a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
index 6768a25..3bfb3f5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/enh_desc.c
@@ -382,7 +382,7 @@ static int enh_desc_get_tx_timestamp_status(struct dma_desc *p)
 	return (le32_to_cpu(p->des0) & ETDES0_TIME_STAMP_STATUS) >> 17;
 }
 
-static u64 enh_desc_get_timestamp(void *desc, u32 ats)
+static void enh_desc_get_timestamp(void *desc, u32 ats, u64 *ts)
 {
 	u64 ns;
 
@@ -397,7 +397,7 @@ static u64 enh_desc_get_timestamp(void *desc, u32 ats)
 		ns += le32_to_cpu(p->des3) * 1000000000ULL;
 	}
 
-	return ns;
+	*ts = ns;
 }
 
 static int enh_desc_get_rx_timestamp_status(void *desc, void *next_desc,
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.h b/drivers/net/ethernet/stmicro/stmmac/hwif.h
new file mode 100644
index 0000000..4994677
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.h
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+// Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
+// stmmac HW Interface Callbacks
+
+#ifndef __STMMAC_HWIF_H__
+#define __STMMAC_HWIF_H__
+
+#define stmmac_do_void_callback(__priv, __module, __cname,  __arg0, __args...) \
+({ \
+	int __result = -EINVAL; \
+	if ((__priv)->hw->__module->__cname) { \
+		(__priv)->hw->__module->__cname((__arg0), ##__args); \
+		__result = 0; \
+	} \
+	__result; \
+})
+#define stmmac_do_callback(__priv, __module, __cname,  __arg0, __args...) \
+({ \
+	int __result = -EINVAL; \
+	if ((__priv)->hw->__module->__cname) \
+		__result = (__priv)->hw->__module->__cname((__arg0), ##__args); \
+	__result; \
+})
+
+struct stmmac_extra_stats;
+struct stmmac_safety_stats;
+struct dma_desc;
+struct dma_extended_desc;
+
+/* Descriptors helpers */
+struct stmmac_desc_ops {
+	/* DMA RX descriptor ring initialization */
+	void (*init_rx_desc)(struct dma_desc *p, int disable_rx_ic, int mode,
+			int end);
+	/* DMA TX descriptor ring initialization */
+	void (*init_tx_desc)(struct dma_desc *p, int mode, int end);
+	/* Invoked by the xmit function to prepare the tx descriptor */
+	void (*prepare_tx_desc)(struct dma_desc *p, int is_fs, int len,
+			bool csum_flag, int mode, bool tx_own, bool ls,
+			unsigned int tot_pkt_len);
+	void (*prepare_tso_tx_desc)(struct dma_desc *p, int is_fs, int len1,
+			int len2, bool tx_own, bool ls, unsigned int tcphdrlen,
+			unsigned int tcppayloadlen);
+	/* Set/get the owner of the descriptor */
+	void (*set_tx_owner)(struct dma_desc *p);
+	int (*get_tx_owner)(struct dma_desc *p);
+	/* Clean the tx descriptor as soon as the tx irq is received */
+	void (*release_tx_desc)(struct dma_desc *p, int mode);
+	/* Clear interrupt on tx frame completion. When this bit is
+	 * set an interrupt happens as soon as the frame is transmitted */
+	void (*set_tx_ic)(struct dma_desc *p);
+	/* Last tx segment reports the transmit status */
+	int (*get_tx_ls)(struct dma_desc *p);
+	/* Return the transmit status looking at the TDES1 */
+	int (*tx_status)(void *data, struct stmmac_extra_stats *x,
+			struct dma_desc *p, void __iomem *ioaddr);
+	/* Get the buffer size from the descriptor */
+	int (*get_tx_len)(struct dma_desc *p);
+	/* Handle extra events on specific interrupts hw dependent */
+	void (*set_rx_owner)(struct dma_desc *p);
+	/* Get the receive frame size */
+	int (*get_rx_frame_len)(struct dma_desc *p, int rx_coe_type);
+	/* Return the reception status looking at the RDES1 */
+	int (*rx_status)(void *data, struct stmmac_extra_stats *x,
+			struct dma_desc *p);
+	void (*rx_extended_status)(void *data, struct stmmac_extra_stats *x,
+			struct dma_extended_desc *p);
+	/* Set tx timestamp enable bit */
+	void (*enable_tx_timestamp) (struct dma_desc *p);
+	/* get tx timestamp status */
+	int (*get_tx_timestamp_status) (struct dma_desc *p);
+	/* get timestamp value */
+	void (*get_timestamp)(void *desc, u32 ats, u64 *ts);
+	/* get rx timestamp status */
+	int (*get_rx_timestamp_status)(void *desc, void *next_desc, u32 ats);
+	/* Display ring */
+	void (*display_ring)(void *head, unsigned int size, bool rx);
+	/* set MSS via context descriptor */
+	void (*set_mss)(struct dma_desc *p, unsigned int mss);
+};
+
+#define stmmac_init_rx_desc(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, init_rx_desc, __args)
+#define stmmac_init_tx_desc(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, init_tx_desc, __args)
+#define stmmac_prepare_tx_desc(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, prepare_tx_desc, __args)
+#define stmmac_prepare_tso_tx_desc(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, prepare_tso_tx_desc, __args)
+#define stmmac_set_tx_owner(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, set_tx_owner, __args)
+#define stmmac_get_tx_owner(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_tx_owner, __args)
+#define stmmac_release_tx_desc(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, release_tx_desc, __args)
+#define stmmac_set_tx_ic(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, set_tx_ic, __args)
+#define stmmac_get_tx_ls(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_tx_ls, __args)
+#define stmmac_tx_status(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, tx_status, __args)
+#define stmmac_get_tx_len(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_tx_len, __args)
+#define stmmac_set_rx_owner(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, set_rx_owner, __args)
+#define stmmac_get_rx_frame_len(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_rx_frame_len, __args)
+#define stmmac_rx_status(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, rx_status, __args)
+#define stmmac_rx_extended_status(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, rx_extended_status, __args)
+#define stmmac_enable_tx_timestamp(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, enable_tx_timestamp, __args)
+#define stmmac_get_tx_timestamp_status(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_tx_timestamp_status, __args)
+#define stmmac_get_timestamp(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, get_timestamp, __args)
+#define stmmac_get_rx_timestamp_status(__priv, __args...) \
+	stmmac_do_callback(__priv, desc, get_rx_timestamp_status, __args)
+#define stmmac_display_ring(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, display_ring, __args)
+#define stmmac_set_mss(__priv, __args...) \
+	stmmac_do_void_callback(__priv, desc, set_mss, __args)
+
+#endif /* __STMMAC_HWIF_H__ */
diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index ebd9e5e..7b1d901 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -253,7 +253,7 @@ static int ndesc_get_tx_timestamp_status(struct dma_desc *p)
 	return (le32_to_cpu(p->des0) & TDES0_TIME_STAMP_STATUS) >> 17;
 }
 
-static u64 ndesc_get_timestamp(void *desc, u32 ats)
+static void ndesc_get_timestamp(void *desc, u32 ats, u64 *ts)
 {
 	struct dma_desc *p = (struct dma_desc *)desc;
 	u64 ns;
@@ -262,7 +262,7 @@ static u64 ndesc_get_timestamp(void *desc, u32 ats)
 	/* convert high/sec time stamp value to nanosecond */
 	ns += le32_to_cpu(p->des3) * 1000000000ULL;
 
-	return ns;
+	*ts = ns;
 }
 
 static int ndesc_get_rx_timestamp_status(void *desc, void *next_desc, u32 ats)
diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
index 28e4b5d..808ca75 100644
--- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
@@ -58,9 +58,8 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 		tx_q->tx_skbuff_dma[entry].is_jumbo = true;
 
 		desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB);
-		priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum,
-						STMMAC_RING_MODE, 0,
-						false, skb->len);
+		stmmac_prepare_tx_desc(priv, desc, 1, bmax, csum,
+				STMMAC_RING_MODE, 0, false, skb->len);
 		tx_q->tx_skbuff[entry] = NULL;
 		entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
 
@@ -79,9 +78,8 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 		tx_q->tx_skbuff_dma[entry].is_jumbo = true;
 
 		desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB);
-		priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
-						STMMAC_RING_MODE, 1,
-						true, skb->len);
+		stmmac_prepare_tx_desc(priv, desc, 0, len, csum,
+				STMMAC_RING_MODE, 1, true, skb->len);
 	} else {
 		des2 = dma_map_single(priv->device, skb->data,
 				      nopaged_len, DMA_TO_DEVICE);
@@ -92,9 +90,8 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
 		tx_q->tx_skbuff_dma[entry].len = nopaged_len;
 		tx_q->tx_skbuff_dma[entry].is_jumbo = true;
 		desc->des3 = cpu_to_le32(des2 + BUF_SIZE_4KiB);
-		priv->hw->desc->prepare_tx_desc(desc, 1, nopaged_len, csum,
-						STMMAC_RING_MODE, 0,
-						true, skb->len);
+		stmmac_prepare_tx_desc(priv, desc, 1, nopaged_len, csum,
+				STMMAC_RING_MODE, 0, true, skb->len);
 	}
 
 	tx_q->cur_tx = entry;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 9a16931..c44e19d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -50,6 +50,7 @@
 #include <linux/reset.h>
 #include <linux/of_mdio.h>
 #include "dwmac1000.h"
+#include "hwif.h"
 
 #define STMMAC_ALIGN(x)	L1_CACHE_ALIGN(x)
 #define	TSO_MAX_BUFF_SIZE	(SZ_16K - 1)
@@ -464,9 +465,9 @@ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv,
 		return;
 
 	/* check tx tstamp status */
-	if (priv->hw->desc->get_tx_timestamp_status(p)) {
+	if (stmmac_get_tx_timestamp_status(priv, p)) {
 		/* get the valid tstamp */
-		ns = priv->hw->desc->get_timestamp(p, priv->adv_ts);
+		stmmac_get_timestamp(priv, p, priv->adv_ts, &ns);
 
 		memset(&shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps));
 		shhwtstamp.hwtstamp = ns_to_ktime(ns);
@@ -502,8 +503,8 @@ static void stmmac_get_rx_hwtstamp(struct stmmac_priv *priv, struct dma_desc *p,
 		desc = np;
 
 	/* Check if timestamp is available */
-	if (priv->hw->desc->get_rx_timestamp_status(p, np, priv->adv_ts)) {
-		ns = priv->hw->desc->get_timestamp(desc, priv->adv_ts);
+	if (stmmac_get_rx_timestamp_status(priv, p, np, priv->adv_ts)) {
+		stmmac_get_timestamp(priv, desc, priv->adv_ts, &ns);
 		netdev_dbg(priv->dev, "get valid RX hw timestamp %llu\n", ns);
 		shhwtstamp = skb_hwtstamps(skb);
 		memset(shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps));
@@ -1008,7 +1009,7 @@ static void stmmac_display_rx_rings(struct stmmac_priv *priv)
 			head_rx = (void *)rx_q->dma_rx;
 
 		/* Display RX ring */
-		priv->hw->desc->display_ring(head_rx, DMA_RX_SIZE, true);
+		stmmac_display_ring(priv, head_rx, DMA_RX_SIZE, true);
 	}
 }
 
@@ -1029,7 +1030,7 @@ static void stmmac_display_tx_rings(struct stmmac_priv *priv)
 		else
 			head_tx = (void *)tx_q->dma_tx;
 
-		priv->hw->desc->display_ring(head_tx, DMA_TX_SIZE, false);
+		stmmac_display_ring(priv, head_tx, DMA_TX_SIZE, false);
 	}
 }
 
@@ -1073,13 +1074,13 @@ static void stmmac_clear_rx_descriptors(struct stmmac_priv *priv, u32 queue)
 	/* Clear the RX descriptors */
 	for (i = 0; i < DMA_RX_SIZE; i++)
 		if (priv->extend_desc)
-			priv->hw->desc->init_rx_desc(&rx_q->dma_erx[i].basic,
-						     priv->use_riwt, priv->mode,
-						     (i == DMA_RX_SIZE - 1));
+			stmmac_init_rx_desc(priv, &rx_q->dma_erx[i].basic,
+					priv->use_riwt, priv->mode,
+					(i == DMA_RX_SIZE - 1));
 		else
-			priv->hw->desc->init_rx_desc(&rx_q->dma_rx[i],
-						     priv->use_riwt, priv->mode,
-						     (i == DMA_RX_SIZE - 1));
+			stmmac_init_rx_desc(priv, &rx_q->dma_rx[i],
+					priv->use_riwt, priv->mode,
+					(i == DMA_RX_SIZE - 1));
 }
 
 /**
@@ -1097,13 +1098,11 @@ static void stmmac_clear_tx_descriptors(struct stmmac_priv *priv, u32 queue)
 	/* Clear the TX descriptors */
 	for (i = 0; i < DMA_TX_SIZE; i++)
 		if (priv->extend_desc)
-			priv->hw->desc->init_tx_desc(&tx_q->dma_etx[i].basic,
-						     priv->mode,
-						     (i == DMA_TX_SIZE - 1));
+			stmmac_init_tx_desc(priv, &tx_q->dma_etx[i].basic,
+					priv->mode, (i == DMA_TX_SIZE - 1));
 		else
-			priv->hw->desc->init_tx_desc(&tx_q->dma_tx[i],
-						     priv->mode,
-						     (i == DMA_TX_SIZE - 1));
+			stmmac_init_tx_desc(priv, &tx_q->dma_tx[i],
+					priv->mode, (i == DMA_TX_SIZE - 1));
 }
 
 /**
@@ -1851,9 +1850,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
 		else
 			p = tx_q->dma_tx + entry;
 
-		status = priv->hw->desc->tx_status(&priv->dev->stats,
-						      &priv->xstats, p,
-						      priv->ioaddr);
+		status = stmmac_tx_status(priv, &priv->dev->stats,
+				&priv->xstats, p, priv->ioaddr);
 		/* Check if the descriptor is owned by the DMA */
 		if (unlikely(status & tx_dma_own))
 			break;
@@ -1904,7 +1902,7 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
 			tx_q->tx_skbuff[entry] = NULL;
 		}
 
-		priv->hw->desc->release_tx_desc(p, priv->mode);
+		stmmac_release_tx_desc(priv, p, priv->mode);
 
 		entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
 	}
@@ -1957,13 +1955,11 @@ static void stmmac_tx_err(struct stmmac_priv *priv, u32 chan)
 	dma_free_tx_skbufs(priv, chan);
 	for (i = 0; i < DMA_TX_SIZE; i++)
 		if (priv->extend_desc)
-			priv->hw->desc->init_tx_desc(&tx_q->dma_etx[i].basic,
-						     priv->mode,
-						     (i == DMA_TX_SIZE - 1));
+			stmmac_init_tx_desc(priv, &tx_q->dma_etx[i].basic,
+					priv->mode, (i == DMA_TX_SIZE - 1));
 		else
-			priv->hw->desc->init_tx_desc(&tx_q->dma_tx[i],
-						     priv->mode,
-						     (i == DMA_TX_SIZE - 1));
+			stmmac_init_tx_desc(priv, &tx_q->dma_tx[i],
+					priv->mode, (i == DMA_TX_SIZE - 1));
 	tx_q->dirty_tx = 0;
 	tx_q->cur_tx = 0;
 	tx_q->mss = 0;
@@ -2851,10 +2847,10 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, unsigned int des,
 		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ?
 			    TSO_MAX_BUFF_SIZE : tmp_len;
 
-		priv->hw->desc->prepare_tso_tx_desc(desc, 0, buff_size,
-			0, 1,
-			(last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
-			0, 0);
+		stmmac_prepare_tso_tx_desc(priv, desc, 0, buff_size,
+				0, 1,
+				(last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
+				0, 0);
 
 		tmp_len -= TSO_MAX_BUFF_SIZE;
 	}
@@ -2926,7 +2922,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	/* set new MSS value if needed */
 	if (mss != tx_q->mss) {
 		mss_desc = tx_q->dma_tx + tx_q->cur_tx;
-		priv->hw->desc->set_mss(mss_desc, mss);
+		stmmac_set_mss(priv, mss_desc, mss);
 		tx_q->mss = mss;
 		tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, DMA_TX_SIZE);
 		WARN_ON(tx_q->tx_skbuff[tx_q->cur_tx]);
@@ -3012,7 +3008,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 			  STMMAC_COAL_TIMER(priv->tx_coal_timer));
 	} else {
 		priv->tx_count_frames = 0;
-		priv->hw->desc->set_tx_ic(desc);
+		stmmac_set_tx_ic(priv, desc);
 		priv->xstats.tx_set_ic_bit++;
 	}
 
@@ -3022,11 +3018,11 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 		     priv->hwts_tx_en)) {
 		/* declare that device is doing timestamping */
 		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
-		priv->hw->desc->enable_tx_timestamp(first);
+		stmmac_enable_tx_timestamp(priv, first);
 	}
 
 	/* Complete the first descriptor before granting the DMA */
-	priv->hw->desc->prepare_tso_tx_desc(first, 1,
+	stmmac_prepare_tso_tx_desc(priv, first, 1,
 			proto_hdr_len,
 			pay_len,
 			1, tx_q->tx_skbuff_dma[first_entry].last_segment,
@@ -3040,7 +3036,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 		 * sure that MSS's own bit is the last thing written.
 		 */
 		dma_wmb();
-		priv->hw->desc->set_tx_owner(mss_desc);
+		stmmac_set_tx_owner(priv, mss_desc);
 	}
 
 	/* The own bit must be the latest setting done when prepare the
@@ -3054,8 +3050,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 			__func__, tx_q->cur_tx, tx_q->dirty_tx, first_entry,
 			tx_q->cur_tx, first, nfrags);
 
-		priv->hw->desc->display_ring((void *)tx_q->dma_tx, DMA_TX_SIZE,
-					     0);
+		stmmac_display_ring(priv, (void *)tx_q->dma_tx, DMA_TX_SIZE, 0);
 
 		pr_info(">>> frame to be transmitted: ");
 		print_pkt(skb->data, skb_headlen(skb));
@@ -3174,9 +3169,8 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		tx_q->tx_skbuff_dma[entry].last_segment = last_segment;
 
 		/* Prepare the descriptor and set the own bit too */
-		priv->hw->desc->prepare_tx_desc(desc, 0, len, csum_insertion,
-						priv->mode, 1, last_segment,
-						skb->len);
+		stmmac_prepare_tx_desc(priv, desc, 0, len, csum_insertion,
+				priv->mode, 1, last_segment, skb->len);
 	}
 
 	/* Only the last descriptor gets to point to the skb. */
@@ -3203,7 +3197,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		else
 			tx_head = (void *)tx_q->dma_tx;
 
-		priv->hw->desc->display_ring(tx_head, DMA_TX_SIZE, false);
+		stmmac_display_ring(priv, tx_head, DMA_TX_SIZE, false);
 
 		netdev_dbg(priv->dev, ">>> frame to be transmitted: ");
 		print_pkt(skb->data, skb->len);
@@ -3228,7 +3222,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 			  STMMAC_COAL_TIMER(priv->tx_coal_timer));
 	} else {
 		priv->tx_count_frames = 0;
-		priv->hw->desc->set_tx_ic(desc);
+		stmmac_set_tx_ic(priv, desc);
 		priv->xstats.tx_set_ic_bit++;
 	}
 
@@ -3259,13 +3253,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 			     priv->hwts_tx_en)) {
 			/* declare that device is doing timestamping */
 			skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
-			priv->hw->desc->enable_tx_timestamp(first);
+			stmmac_enable_tx_timestamp(priv, first);
 		}
 
 		/* Prepare the first descriptor setting the OWN bit too */
-		priv->hw->desc->prepare_tx_desc(first, 1, nopaged_len,
-						csum_insertion, priv->mode, 1,
-						last_segment, skb->len);
+		stmmac_prepare_tx_desc(priv, first, 1, nopaged_len,
+				csum_insertion, priv->mode, 1, last_segment,
+				skb->len);
 
 		/* The own bit must be the latest setting done when prepare the
 		 * descriptor and then barrier is needed to make sure that
@@ -3382,9 +3376,9 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 		dma_wmb();
 
 		if (unlikely(priv->synopsys_id >= DWMAC_CORE_4_00))
-			priv->hw->desc->init_rx_desc(p, priv->use_riwt, 0, 0);
+			stmmac_init_rx_desc(priv, p, priv->use_riwt, 0, 0);
 		else
-			priv->hw->desc->set_rx_owner(p);
+			stmmac_set_rx_owner(priv, p);
 
 		dma_wmb();
 
@@ -3418,7 +3412,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 		else
 			rx_head = (void *)rx_q->dma_rx;
 
-		priv->hw->desc->display_ring(rx_head, DMA_RX_SIZE, true);
+		stmmac_display_ring(priv, rx_head, DMA_RX_SIZE, true);
 	}
 	while (count < limit) {
 		int status;
@@ -3431,8 +3425,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 			p = rx_q->dma_rx + entry;
 
 		/* read the status of the incoming frame */
-		status = priv->hw->desc->rx_status(&priv->dev->stats,
-						   &priv->xstats, p);
+		status = stmmac_rx_status(priv, &priv->dev->stats,
+				&priv->xstats, p);
 		/* check if managed by the DMA otherwise go ahead */
 		if (unlikely(status & dma_own))
 			break;
@@ -3449,11 +3443,9 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 
 		prefetch(np);
 
-		if ((priv->extend_desc) && (priv->hw->desc->rx_extended_status))
-			priv->hw->desc->rx_extended_status(&priv->dev->stats,
-							   &priv->xstats,
-							   rx_q->dma_erx +
-							   entry);
+		if (priv->extend_desc)
+			stmmac_rx_extended_status(priv, &priv->dev->stats,
+					&priv->xstats, rx_q->dma_erx + entry);
 		if (unlikely(status == discard_frame)) {
 			priv->dev->stats.rx_errors++;
 			if (priv->hwts_rx_en && !priv->extend_desc) {
@@ -3479,7 +3471,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 			else
 				des = le32_to_cpu(p->des2);
 
-			frame_len = priv->hw->desc->get_rx_frame_len(p, coe);
+			frame_len = stmmac_get_rx_frame_len(priv, p, coe);
 
 			/*  If frame length is greater than skb buffer size
 			 *  (preallocated during init) then the packet is
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next 0/5] net: stmmac: Stop using hard-coded callbacks
From: Jose Abreu @ 2018-04-06 13:08 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue

This a starting point for a cleanup and re-organization of stmmac.

In this series we stop using hard-coded callbacks along the code and use
instead helpers which are defined in a single place ("hwif.h").

This brings several advantages:
	1) Less typing :)
	2) Guaranteed function pointer check
	3) More flexibility

By 2) we stop using the repeated pattern of:
	if (priv->hw->mac->some_func)
		priv->hw->mac->some_func(...)

I didn't check but I expect the final .ko will be bigger with this series
because *all* of function pointers are checked.

Anyway, I hope this can make the code more readable and more flexible now.

Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>

Jose Abreu (5):
  net: stmmac: Switch stmmac_desc_ops to generic HW Interface Helpers
  net: stmmac: Switch stmmac_dma_ops to generic HW Interface Helpers
  net: stmmac: Switch stmmac_ops to generic HW Interface Helpers
  net: stmmac: Switch stmmac_hwtimestamp to generic HW Interface Helpers
  net: stmmac: Switch stmmac_mode_ops to generic HW Interface Helpers

 drivers/net/ethernet/stmicro/stmmac/chain_mode.c   |  34 +-
 drivers/net/ethernet/stmicro/stmmac/common.h       | 199 +---------
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c |   4 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac5.c       |  19 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac5.h       |   6 +-
 drivers/net/ethernet/stmicro/stmmac/enh_desc.c     |   4 +-
 drivers/net/ethernet/stmicro/stmmac/hwif.h         | 421 ++++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c    |   4 +-
 drivers/net/ethernet/stmicro/stmmac/ring_mode.c    |  39 +-
 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   |  82 ++--
 .../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c  |  34 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 439 +++++++++------------
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c   |  18 +-
 13 files changed, 726 insertions(+), 577 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/hwif.h

-- 
2.9.3

^ permalink raw reply

* Re: [PATCH net-next] netns: filter uevents correctly
From: Christian Brauner @ 2018-04-06 13:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kirill Tkhai, davem, gregkh, netdev, linux-kernel, avagin, serge
In-Reply-To: <87in953ryi.fsf@xmission.com>

On Thu, Apr 05, 2018 at 10:59:49PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
> 
> > On Thu, Apr 05, 2018 at 05:26:59PM +0300, Kirill Tkhai wrote:
> >> On 05.04.2018 17:07, Christian Brauner wrote:
> >> > On Thu, Apr 05, 2018 at 04:01:03PM +0300, Kirill Tkhai wrote:
> >> >> On 04.04.2018 22:48, Christian Brauner wrote:
> >> >>> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
> >> >>>
> >> >>> enabled sending hotplug events into all network namespaces back in 2010.
> >> >>> Over time the set of uevents that get sent into all network namespaces has
> >> >>> shrunk. We have now reached the point where hotplug events for all devices
> >> >>> that carry a namespace tag are filtered according to that namespace.
> >> >>>
> >> >>> Specifically, they are filtered whenever the namespace tag of the kobject
> >> >>> does not match the namespace tag of the netlink socket. One example are
> >> >>> network devices. Uevents for network devices only show up in the network
> >> >>> namespaces these devices are moved to or created in.
> >> >>>
> >> >>> However, any uevent for a kobject that does not have a namespace tag
> >> >>> associated with it will not be filtered and we will *try* to broadcast it
> >> >>> into all network namespaces.
> >> >>>
> >> >>> The original patchset was written in 2010 before user namespaces were a
> >> >>> thing. With the introduction of user namespaces sending out uevents became
> >> >>> partially isolated as they were filtered by user namespaces:
> >> >>>
> >> >>> net/netlink/af_netlink.c:do_one_broadcast()
> >> >>>
> >> >>> if (!net_eq(sock_net(sk), p->net)) {
> >> >>>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> >> >>>                 return;
> >> >>>
> >> >>>         if (!peernet_has_id(sock_net(sk), p->net))
> >> >>>                 return;
> >> >>>
> >> >>>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >> >>>                              CAP_NET_BROADCAST))
> >> >>>         j       return;
> >> >>> }
> >> >>>
> >> >>> The file_ns_capable() check will check whether the caller had
> >> >>> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> >> >>> namespace of interest. This check is fine in general but seems insufficient
> >> >>> to me when paired with uevents. The reason is that devices always belong to
> >> >>> the initial user namespace so uevents for kobjects that do not carry a
> >> >>> namespace tag should never be sent into another user namespace. This has
> >> >>> been the intention all along. But there's one case where this breaks,
> >> >>> namely if a new user namespace is created by root on the host and an
> >> >>> identity mapping is established between root on the host and root in the
> >> >>> new user namespace. Here's a reproducer:
> >> >>>
> >> >>>  sudo unshare -U --map-root
> >> >>>  udevadm monitor -k
> >> >>>  # Now change to initial user namespace and e.g. do
> >> >>>  modprobe kvm
> >> >>>  # or
> >> >>>  rmmod kvm
> >> >>>
> >> >>> will allow the non-initial user namespace to retrieve all uevents from the
> >> >>> host. This seems very anecdotal given that in the general case user
> >> >>> namespaces do not see any uevents and also can't really do anything useful
> >> >>> with them.
> >> >>>
> >> >>> Additionally, it is now possible to send uevents from userspace. As such we
> >> >>> can let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
> >> >>> namespace of the network namespace of the netlink socket) userspace process
> >> >>> make a decision what uevents should be sent.
> >> >>>
> >> >>> This makes me think that we should simply ensure that uevents for kobjects
> >> >>> that do not carry a namespace tag are *always* filtered by user namespace
> >> >>> in kobj_bcast_filter(). Specifically:
> >> >>> - If the owning user namespace of the uevent socket is not init_user_ns the
> >> >>>   event will always be filtered.
> >> >>> - If the network namespace the uevent socket belongs to was created in the
> >> >>>   initial user namespace but was opened from a non-initial user namespace
> >> >>>   the event will be filtered as well.
> >> >>> Put another way, uevents for kobjects not carrying a namespace tag are now
> >> >>> always only sent to the initial user namespace. The regression potential
> >> >>> for this is near to non-existent since user namespaces can't really do
> >> >>> anything with interesting devices.
> >> >>>
> >> >>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> >> >>> ---
> >> >>>  lib/kobject_uevent.c | 10 +++++++++-
> >> >>>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> >>>
> >> >>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> >>> index 15ea216a67ce..cb98cddb6e3b 100644
> >> >>> --- a/lib/kobject_uevent.c
> >> >>> +++ b/lib/kobject_uevent.c
> >> >>> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >> >>>  		return sock_ns != ns;
> >> >>>  	}
> >> >>>  
> >> >>> -	return 0;
> >> >>> +	/*
> >> >>> +	 * The kobject does not carry a namespace tag so filter by user
> >> >>> +	 * namespace below.
> >> >>> +	 */
> >> >>> +	if (sock_net(dsk)->user_ns != &init_user_ns)
> >> >>> +		return 1;
> >> >>> +
> >> >>> +	/* Check if socket was opened from non-initial user namespace. */
> >> >>> +	return sk_user_ns(dsk) != &init_user_ns;
> >> >>>  }
> >> >>>  #endif
> >> >>
> >> >> So, this prohibits to listen events of all devices except network-related
> >> >> in containers? If it's so, I don't think it's a good solution. Uevents is not
> >> > 
> >> > No, this is not correct: As it is right now *without my patch* no
> >> > non-initial user namespace is receiving *any uevents* but those
> >> > specifically namespaced such as those for network devices. This patch
> >> > doesn't change that at all. The commit message outlines this in detail
> >> > how this comes about.
> >> > There is only one case where this currently breaks and this is as I
> >> > outlined explicitly in my commit message when you create a new user
> >> > namespace and map container(0) -> host(0). This patch fixes this.
> >> 
> >> Could you please point the place, where non-initial user namespaces are filtered?
> >> I only see the kobj_bcast_filter() logic, and it used to return 0, which means "accepted".
> >> Now it will return 1 sometimes.
> >
> > Oh sure, it's in the commit message though. The callchain is
> > lib/kobject_uevent.c:kobject_uevent_net_broadcast() ->
> > nnet/netlink/af_netlink.c:netlink_broadcast_filtered() ->
> > net/netlink/af_netlink.c:do_one_broadcast():
> >
> > This codepiece will check whether the openened socket holds
> > CAP_NET_BROADCAST in the user namespace of the target network namespace
> > which it won't because we don't have device namespaces and all devices
> > belong to the initial set of namespaces.
> >
> >         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> >                              CAP_NET_BROADCAST))
> >         j       return;
> >
> 
> The above that only applies if someone has set NETLINK_F_LISTEN_ALL_NSID
> on their socket and has had someone with the appropriate privileges
> assign a peerrnetid.
> 
> All of which is to say that file_ns_capable is not nearly as applicable
> as it might be, and if you can pass the other two checks I think it is
> pointless (because the peernet attributes are not generated for
> kobj_uevents) but valid to receive events from outside your network
> namespace.
> 
> 
> I might be missing something but I don't see anything excluding network
> namespaces owned by !init_user_ns excluded from the kobject_uevent
> logic.
> 
> The uevent_sock_list has one entry per network namespace. And
> kobject_uevent_net_broacast appears to walk each one.

Yeah, it definitely does.

> 
> I had a memory of filtering uevent messages and I had a memory
> that the netlink_has_listeners had a per network namespace effect.
> Neither seems true from my inspection of the code tonight.

Yeah, I drew the same conclusion.

> 
> If we are not filtering ordinary uevents at least at the user namespace
> level that does seem to be at least a nuisance.

This patch would filter based on user namespace and bump it from "we're
accidently doing the right thing" to "we're doing the right on purpose"
and without accidently leaking uevents in some corner cases.
Using kobj_bcast_filter() for this seems like the right thing to do.
This sounds like you don't disagree with the approach I'm taking here?

On a sidenote, it also very much feels like we're leaking information if
we're not filtering based on user namespaces on purpose. Non-initial
user namespaces should not be able to receive information about device
attribues simply via netlink uevent messages. At least not *trivially*
[1] by opening a netlink uevent socket.

> 
> 
> Christian can you dig a little deeper into this.  I have a feeling that
> there are some real efficiency improvements that we could make to
> kobject_uevent_net_broadcast if nothing else.

Yeah, for sure! I already started doing this. This patch here is
basically a preparatory patch for that work which is on my todo.

Christian

> 
> Perhaps you could see where uevents are broadcast by poking
> the sysfs uevent of an existing device, and triggering a retransmit.
> 
> Eric
> 
>                                                      
[1]: I mean they technically could very likely still try to gather the
     same info by constantly parsing all of sysfs and try to detect new
     PCI paths/directories and so on but it shouldn't be as easy as
     opening a netlink uevent socket.

^ permalink raw reply

* Re: [RFC PATCH net-next v5 2/4] net: Introduce generic bypass module
From: Jiri Pirko @ 2018-04-06 12:57 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <1522962503-3963-3-git-send-email-sridhar.samudrala@intel.com>

Thu, Apr 05, 2018 at 11:08:21PM CEST, sridhar.samudrala@intel.com wrote:
>This provides a generic interface for paravirtual drivers to listen
>for netdev register/unregister/link change events from pci ethernet
>devices with the same MAC and takeover their datapath. The notifier and
>event handling code is based on the existing netvsc implementation. A
>paravirtual driver can use this module by registering a set of ops and
>each instance of the device when it is probed.
>
>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>---
> include/net/bypass.h |  80 ++++++++++
> net/Kconfig          |  18 +++
> net/core/Makefile    |   1 +
> net/core/bypass.c    | 406 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 505 insertions(+)
> create mode 100644 include/net/bypass.h
> create mode 100644 net/core/bypass.c
>
>diff --git a/include/net/bypass.h b/include/net/bypass.h
>new file mode 100644
>index 000000000000..e2dd122f951a
>--- /dev/null
>+++ b/include/net/bypass.h
>@@ -0,0 +1,80 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/* Copyright (c) 2018, Intel Corporation. */
>+
>+#ifndef _NET_BYPASS_H
>+#define _NET_BYPASS_H
>+
>+#include <linux/netdevice.h>
>+
>+struct bypass_ops {

Perhaps "net_bypass_" would be better prefix for this module structs
and functions. No strong opinion though.


>+	int (*register_child)(struct net_device *bypass_netdev,
>+			      struct net_device *child_netdev);

We have master/slave upper/lower netdevices. This adds "child". Consider
using some existing names. Not sure if possible without loss of meaning.


>+	int (*join_child)(struct net_device *bypass_netdev,
>+			  struct net_device *child_netdev);
>+	int (*unregister_child)(struct net_device *bypass_netdev,
>+				struct net_device *child_netdev);
>+	int (*release_child)(struct net_device *bypass_netdev,
>+			     struct net_device *child_netdev);
>+	int (*update_link)(struct net_device *bypass_netdev,
>+			   struct net_device *child_netdev);
>+	rx_handler_result_t (*handle_frame)(struct sk_buff **pskb);
>+};
>+
>+struct bypass_instance {
>+	struct list_head list;
>+	struct net_device __rcu *bypass_netdev;
>+	struct bypass *bypass;
>+};
>+
>+struct bypass {
>+	struct list_head list;
>+	const struct bypass_ops *ops;
>+	const struct net_device_ops *netdev_ops;
>+	struct list_head instance_list;
>+	struct mutex lock;
>+};
>+
>+#if IS_ENABLED(CONFIG_NET_BYPASS)
>+
>+struct bypass *bypass_register_driver(const struct bypass_ops *ops,
>+				      const struct net_device_ops *netdev_ops);
>+void bypass_unregister_driver(struct bypass *bypass);
>+
>+int bypass_register_instance(struct bypass *bypass, struct net_device *dev);
>+int bypass_unregister_instance(struct bypass *bypass, struct net_device	*dev);
>+
>+int bypass_unregister_child(struct net_device *child_netdev);
>+
>+#else
>+
>+static inline
>+struct bypass *bypass_register_driver(const struct bypass_ops *ops,
>+				      const struct net_device_ops *netdev_ops)
>+{
>+	return NULL;
>+}
>+
>+static inline void bypass_unregister_driver(struct bypass *bypass)
>+{
>+}
>+
>+static inline int bypass_register_instance(struct bypass *bypass,
>+					   struct net_device *dev)
>+{
>+	return 0;
>+}
>+
>+static inline int bypass_unregister_instance(struct bypass *bypass,
>+					     struct net_device *dev)
>+{
>+	return 0;
>+}
>+
>+static inline int bypass_unregister_child(struct net_device *child_netdev)
>+{
>+	return 0;
>+}
>+
>+#endif
>+
>+#endif /* _NET_BYPASS_H */
>diff --git a/net/Kconfig b/net/Kconfig
>index 0428f12c25c2..994445f4a96a 100644
>--- a/net/Kconfig
>+++ b/net/Kconfig
>@@ -423,6 +423,24 @@ config MAY_USE_DEVLINK
> 	  on MAY_USE_DEVLINK to ensure they do not cause link errors when
> 	  devlink is a loadable module and the driver using it is built-in.
> 
>+config NET_BYPASS
>+	tristate "Bypass interface"
>+	---help---
>+	  This provides a generic interface for paravirtual drivers to listen
>+	  for netdev register/unregister/link change events from pci ethernet
>+	  devices with the same MAC and takeover their datapath. This also
>+	  enables live migration of a VM with direct attached VF by failing
>+	  over to the paravirtual datapath when the VF is unplugged.
>+
>+config MAY_USE_BYPASS
>+	tristate
>+	default m if NET_BYPASS=m
>+	default y if NET_BYPASS=y || NET_BYPASS=n
>+	help
>+	  Drivers using the bypass infrastructure should have a dependency
>+	  on MAY_USE_BYPASS to ensure they do not cause link errors when
>+	  bypass is a loadable module and the driver using it is built-in.
>+
> endif   # if NET
> 
> # Used by archs to tell that they support BPF JIT compiler plus which flavour.
>diff --git a/net/core/Makefile b/net/core/Makefile
>index 6dbbba8c57ae..a9727ed1c8fc 100644
>--- a/net/core/Makefile
>+++ b/net/core/Makefile
>@@ -30,3 +30,4 @@ obj-$(CONFIG_DST_CACHE) += dst_cache.o
> obj-$(CONFIG_HWBM) += hwbm.o
> obj-$(CONFIG_NET_DEVLINK) += devlink.o
> obj-$(CONFIG_GRO_CELLS) += gro_cells.o
>+obj-$(CONFIG_NET_BYPASS) += bypass.o
>diff --git a/net/core/bypass.c b/net/core/bypass.c
>new file mode 100644
>index 000000000000..7bde962ec3d4
>--- /dev/null
>+++ b/net/core/bypass.c
>@@ -0,0 +1,406 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/* Copyright (c) 2018, Intel Corporation. */
>+
>+/* A common module to handle registrations and notifications for paravirtual
>+ * drivers to enable accelerated datapath and support VF live migration.
>+ *
>+ * The notifier and event handling code is based on netvsc driver.
>+ */
>+
>+#include <linux/netdevice.h>
>+#include <linux/etherdevice.h>
>+#include <linux/ethtool.h>
>+#include <linux/module.h>
>+#include <linux/slab.h>
>+#include <linux/netdevice.h>
>+#include <linux/netpoll.h>
>+#include <linux/rtnetlink.h>
>+#include <linux/if_vlan.h>
>+#include <net/sch_generic.h>
>+#include <uapi/linux/if_arp.h>
>+#include <net/bypass.h>
>+
>+static LIST_HEAD(bypass_list);
>+
>+static DEFINE_MUTEX(bypass_mutex);

Why mutex? Apparently you don't need to sleep while holding a lock.
Simple spinlock would do.


>+
>+struct bypass_instance *bypass_instance_alloc(struct net_device *dev)
>+{
>+	struct bypass_instance *bypass_instance;
>+
>+	bypass_instance = kzalloc(sizeof(*bypass_instance), GFP_KERNEL);
>+	if (!bypass_instance)
>+		return NULL;
>+
>+	dev_hold(dev);
>+	rcu_assign_pointer(bypass_instance->bypass_netdev, dev);
>+
>+	return bypass_instance;
>+}
>+
>+void bypass_instance_free(struct bypass_instance *bypass_instance)
>+{
>+	struct net_device *bypass_netdev;
>+
>+	bypass_netdev = rcu_dereference(bypass_instance->bypass_netdev);
>+
>+	dev_put(bypass_netdev);
>+	kfree(bypass_instance);
>+}
>+
>+static struct bypass_instance *bypass_get_instance_bymac(u8 *mac)
>+{
>+	struct bypass_instance *bypass_instance;
>+	struct net_device *bypass_netdev;
>+	struct bypass *bypass;
>+
>+	list_for_each_entry(bypass, &bypass_list, list) {
>+		mutex_lock(&bypass->lock);
>+		list_for_each_entry(bypass_instance, &bypass->instance_list,
>+				    list) {
>+			bypass_netdev =
>+				rcu_dereference(bypass_instance->bypass_netdev);
>+			if (ether_addr_equal(bypass_netdev->perm_addr, mac)) {
>+				mutex_unlock(&bypass->lock);
>+				goto out;
>+			}
>+		}
>+		mutex_unlock(&bypass->lock);
>+	}
>+
>+	bypass_instance = NULL;
>+out:
>+	return bypass_instance;
>+}
>+
>+static int bypass_register_child(struct net_device *child_netdev)
>+{
>+	struct bypass_instance *bypass_instance;
>+	struct bypass *bypass;
>+	struct net_device *bypass_netdev;
>+	int ret, orig_mtu;
>+
>+	ASSERT_RTNL();
>+
>+	mutex_lock(&bypass_mutex);
>+	bypass_instance = bypass_get_instance_bymac(child_netdev->perm_addr);
>+	if (!bypass_instance) {
>+		mutex_unlock(&bypass_mutex);
>+		goto done;
>+	}
>+
>+	bypass_netdev = rcu_dereference(bypass_instance->bypass_netdev);
>+	bypass = bypass_instance->bypass;
>+	mutex_unlock(&bypass_mutex);
>+
>+	if (!bypass->ops->register_child)
>+		goto done;
>+
>+	ret = bypass->ops->register_child(bypass_netdev, child_netdev);
>+	if (ret != 0)
>+		goto done;
>+
>+	ret = netdev_rx_handler_register(child_netdev,
>+					 bypass->ops->handle_frame,
>+					 bypass_netdev);
>+	if (ret != 0) {
>+		netdev_err(child_netdev,
>+			   "can not register bypass rx handler (err = %d)\n",
>+			   ret);
>+		goto rx_handler_failed;
>+	}
>+
>+	ret = netdev_upper_dev_link(child_netdev, bypass_netdev, NULL);
>+	if (ret != 0) {
>+		netdev_err(child_netdev,

No line-wrap is needed here and in other cases like this.


>+			   "can not set master device %s (err = %d)\n",
>+			   bypass_netdev->name, ret);
>+		goto upper_link_failed;
>+	}
>+
>+	child_netdev->flags |= IFF_SLAVE;

Don't reuse IFF_SLAVE. That is bonding-specific thing. I know that
netvsc uses it, it is wrong.
Please rather introduce:
IFF_BYPASS for master
and IFF_BYPASS_SLAVE for slaves.




>+
>+	if (netif_running(bypass_netdev)) {
>+		ret = dev_open(child_netdev);
>+		if (ret && (ret != -EBUSY)) {
>+			netdev_err(bypass_netdev,
>+				   "Opening child %s failed ret:%d\n",
>+				   child_netdev->name, ret);
>+			goto err_interface_up;
>+		}
>+	}
>+
>+	/* Align MTU of child with master */
>+	orig_mtu = child_netdev->mtu;
>+	ret = dev_set_mtu(child_netdev, bypass_netdev->mtu);
>+	if (ret != 0) {
>+		netdev_err(bypass_netdev,
>+			   "unable to change mtu of %s to %u register failed\n",
>+			   child_netdev->name, bypass_netdev->mtu);
>+		goto err_set_mtu;
>+	}
>+
>+	ret = bypass->ops->join_child(bypass_netdev, child_netdev);
>+	if (ret != 0)
>+		goto err_join;
>+
>+	call_netdevice_notifiers(NETDEV_JOIN, child_netdev);
>+
>+	goto done;
>+
>+err_join:
>+	dev_set_mtu(child_netdev, orig_mtu);
>+err_set_mtu:
>+	dev_close(child_netdev);
>+err_interface_up:
>+	netdev_upper_dev_unlink(child_netdev, bypass_netdev);
>+	child_netdev->flags &= ~IFF_SLAVE;
>+upper_link_failed:
>+	netdev_rx_handler_unregister(child_netdev);
>+rx_handler_failed:
>+	bypass->ops->unregister_child(bypass_netdev, child_netdev);
>+
>+done:
>+	return NOTIFY_DONE;
>+}
>+
>+int bypass_unregister_child(struct net_device *child_netdev)
>+{
>+	struct bypass_instance *bypass_instance;
>+	struct net_device *bypass_netdev;
>+	struct bypass *bypass;
>+	int ret;
>+
>+	ASSERT_RTNL();
>+
>+	mutex_lock(&bypass_mutex);
>+	bypass_instance = bypass_get_instance_bymac(child_netdev->perm_addr);
>+	if (!bypass_instance) {
>+		mutex_unlock(&bypass_mutex);
>+		goto done;
>+	}
>+
>+	bypass_netdev = rcu_dereference(bypass_instance->bypass_netdev);
>+	bypass = bypass_instance->bypass;
>+	mutex_unlock(&bypass_mutex);
>+
>+	ret = bypass->ops->release_child(bypass_netdev, child_netdev);
>+	if (ret != 0)
>+		goto done;
>+
>+	netdev_rx_handler_unregister(child_netdev);
>+	netdev_upper_dev_unlink(child_netdev, bypass_netdev);
>+	child_netdev->flags &= ~IFF_SLAVE;
>+
>+	if (!bypass->ops->unregister_child)
>+		goto done;
>+
>+	bypass->ops->unregister_child(bypass_netdev, child_netdev);
>+
>+done:
>+	return NOTIFY_DONE;
>+}
>+EXPORT_SYMBOL(bypass_unregister_child);

Please use "EXPORT_SYMBOL_GPL" for all exported symbols.


>+
>+static int bypass_update_link(struct net_device *child_netdev)
>+{
>+	struct bypass_instance *bypass_instance;
>+	struct net_device *bypass_netdev;
>+	struct bypass *bypass;
>+
>+	ASSERT_RTNL();
>+
>+	mutex_lock(&bypass_mutex);
>+	bypass_instance = bypass_get_instance_bymac(child_netdev->perm_addr);

You don't really need this lookup. The kernel knows about the master
device, you can just use netdev_master_upper_dev_get_rcu() to get it.


>+	if (!bypass_instance) {
>+		mutex_unlock(&bypass_mutex);
>+		goto done;
>+	}
>+
>+	bypass_netdev = rcu_dereference(bypass_instance->bypass_netdev);
>+	bypass = bypass_instance->bypass;
>+	mutex_unlock(&bypass_mutex);
>+
>+	if (!bypass->ops->update_link)
>+		goto done;
>+
>+	bypass->ops->update_link(bypass_netdev, child_netdev);
>+
>+done:
>+	return NOTIFY_DONE;
>+}
>+
>+static bool bypass_validate_child_dev(struct net_device *dev)
>+{
>+	/* Avoid non-Ethernet type devices */
>+	if (dev->type != ARPHRD_ETHER)
>+		return false;
>+
>+	/* Avoid Vlan dev with same MAC registering as VF */
>+	if (is_vlan_dev(dev))
>+		return false;
>+
>+	/* Avoid Bonding master dev with same MAC registering as child dev */
>+	if ((dev->priv_flags & IFF_BONDING) && (dev->flags & IFF_MASTER))
>+		return false;
>+
>+	return true;
>+}
>+
>+static int
>+bypass_event(struct notifier_block *this, unsigned long event, void *ptr)
>+{
>+	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
>+	struct bypass *bypass;
>+
>+	/* Skip Parent events */
>+	mutex_lock(&bypass_mutex);
>+	list_for_each_entry(bypass, &bypass_list, list) {
>+		if (event_dev->netdev_ops == bypass->netdev_ops) {
>+			mutex_unlock(&bypass_mutex);
>+			return NOTIFY_DONE;
>+		}

What you need instead of this is an identification helper
netif_is_bypass_master()
similar to
netif_is_team_master()
netif_is_bridge_master()
etc

 

>+	}
>+	mutex_unlock(&bypass_mutex);
>+
>+	if (!bypass_validate_child_dev(event_dev))
>+		return NOTIFY_DONE;
>+
>+	switch (event) {
>+	case NETDEV_REGISTER:
>+		return bypass_register_child(event_dev);
>+	case NETDEV_UNREGISTER:
>+		return bypass_unregister_child(event_dev);
>+	case NETDEV_UP:
>+	case NETDEV_DOWN:
>+	case NETDEV_CHANGE:
>+		return bypass_update_link(event_dev);
>+	default:
>+		return NOTIFY_DONE;
>+	}
>+}
>+
>+static struct notifier_block bypass_notifier = {
>+	.notifier_call = bypass_event,
>+};
>+
>+static void bypass_register_existing_child(struct net_device *bypass_netdev)
>+{
>+	struct net *net = dev_net(bypass_netdev);
>+	struct net_device *dev;
>+
>+	rtnl_lock();
>+	for_each_netdev(net, dev) {
>+		if (dev == bypass_netdev)
>+			continue;
>+		if (!bypass_validate_child_dev(dev))
>+			continue;
>+		if (ether_addr_equal(bypass_netdev->perm_addr, dev->perm_addr))
>+			bypass_register_child(dev);
>+	}
>+	rtnl_unlock();
>+}
>+
>+int bypass_register_instance(struct bypass *bypass, struct net_device *dev)
>+{
>+	struct bypass_instance *bypass_instance;

No need to allocate this instace here. You can just have is embedded
inside netdevice priv and pass pointer to it here. You can pass the
pointer back to the driver when you call ops as the driver can get priv
back by it.

I would also call it "struct bypass_master" and this function
"bypass_master_register".

It should contain the ops pointer too.


>+	struct net_device *bypass_netdev;
>+	int ret = 0;
>+
>+	mutex_lock(&bypass->lock);
>+	list_for_each_entry(bypass_instance, &bypass->instance_list, list) {
>+		bypass_netdev = rcu_dereference(bypass_instance->bypass_netdev);
>+		if (bypass_netdev == dev) {

This means the driver registered one netdev twice. That is a bug in
driver, so WARN_ON would be nice here to point that out.


>+			ret = -EEXIST;
>+			goto done;
>+		}
>+	}
>+
>+	bypass_instance = bypass_instance_alloc(dev);
>+	if (!bypass_instance) {
>+		ret = -ENOMEM;
>+		goto done;
>+	}
>+
>+	bypass_instance->bypass = bypass;
>+	list_add_tail(&bypass_instance->list, &bypass->instance_list);
>+
>+done:
>+	mutex_unlock(&bypass->lock);
>+	bypass_register_existing_child(dev);
>+	return ret;
>+}
>+EXPORT_SYMBOL(bypass_register_instance);
>+
>+int bypass_unregister_instance(struct bypass *bypass, struct net_device *dev)
>+{
>+	struct bypass_instance *bypass_instance;
>+	struct net_device *bypass_netdev;
>+	int ret = 0;
>+
>+	mutex_lock(&bypass->lock);
>+	list_for_each_entry(bypass_instance, &bypass->instance_list, list) {
>+		bypass_netdev = rcu_dereference(bypass_instance->bypass_netdev);
>+		if (bypass_netdev == dev) {
>+			list_del(&bypass_instance->list);
>+			bypass_instance_free(bypass_instance);
>+			goto done;
>+		}
>+	}
>+
>+	ret = -ENOENT;
>+done:
>+	mutex_unlock(&bypass->lock);
>+	return ret;
>+}
>+EXPORT_SYMBOL(bypass_unregister_instance);
>+
>+struct bypass *bypass_register_driver(const struct bypass_ops *ops,
>+				      const struct net_device_ops *netdev_ops)

I don't see why you need a list of drivers. What you need is just a list
of instances - bypass masters (probably to call them like that in the
code as well). Well, you can use the common netdevice list for that
purpose with the identification helper I mentioned above. Then you need
no lists and no mutexes/spinlocks.


>+{
>+	struct bypass *bypass;
>+
>+	bypass = kzalloc(sizeof(*bypass), GFP_KERNEL);
>+	if (!bypass)
>+		return NULL;
>+
>+	bypass->ops = ops;
>+	bypass->netdev_ops = netdev_ops;
>+	INIT_LIST_HEAD(&bypass->instance_list);
>+
>+	mutex_lock(&bypass_mutex);
>+	list_add_tail(&bypass->list, &bypass_list);
>+	mutex_unlock(&bypass_mutex);
>+
>+	return bypass;
>+}
>+EXPORT_SYMBOL_GPL(bypass_register_driver);
>+
>+void bypass_unregister_driver(struct bypass *bypass)
>+{
>+	mutex_lock(&bypass_mutex);
>+	list_del(&bypass->list);
>+	mutex_unlock(&bypass_mutex);
>+
>+	kfree(bypass);
>+}
>+EXPORT_SYMBOL_GPL(bypass_unregister_driver);
>+
>+static __init int
>+bypass_init(void)
>+{
>+	register_netdevice_notifier(&bypass_notifier);
>+
>+	return 0;
>+}
>+module_init(bypass_init);
>+
>+static __exit
>+void bypass_exit(void)
>+{
>+	unregister_netdevice_notifier(&bypass_notifier);
>+}
>+module_exit(bypass_exit);
>+
>+MODULE_DESCRIPTION("Bypass infrastructure/interface for Paravirtual drivers");
>+MODULE_LICENSE("GPL v2");
>-- 
>2.14.3
>

^ permalink raw reply

* Re: [RFC PATCH net-next v5 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Jiri Pirko @ 2018-04-06 12:48 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <1522962503-3963-4-git-send-email-sridhar.samudrala@intel.com>

Thu, Apr 05, 2018 at 11:08:22PM CEST, sridhar.samudrala@intel.com wrote:
>This patch enables virtio_net to switch over to a VF datapath when a VF
>netdev is present with the same MAC address. It allows live migration
>of a VM with a direct attached VF without the need to setup a bond/team
>between a VF and virtio net device in the guest.
>
>The hypervisor needs to enable only one datapath at any time so that
>packets don't get looped back to the VM over the other datapath. When a VF
>is plugged, the virtio datapath link state can be marked as down. The
>hypervisor needs to unplug the VF device from the guest on the source host
>and reset the MAC filter of the VF to initiate failover of datapath to
>virtio before starting the migration. After the migration is completed,
>the destination hypervisor sets the MAC filter on the VF and plugs it back
>to the guest to switch over to VF datapath.
>
>When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>created that acts as a master device and tracks the state of the 2 lower
>netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>passthru device with the same MAC is registered as 'active' netdev.
>
>This patch is based on the discussion initiated by Jesse on this thread.
>https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>
>Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>---
> drivers/net/Kconfig      |   1 +
> drivers/net/virtio_net.c | 612 ++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 612 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>index 891846655000..9e2cf61fd1c1 100644
>--- a/drivers/net/Kconfig
>+++ b/drivers/net/Kconfig
>@@ -331,6 +331,7 @@ config VETH
> config VIRTIO_NET
> 	tristate "Virtio network driver"
> 	depends on VIRTIO
>+	depends on MAY_USE_BYPASS
> 	---help---
> 	  This is the virtual network driver for virtio.  It can be used with
> 	  QEMU based VMMs (like KVM or Xen).  Say Y or M.
>diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>index befb5944f3fd..86b2f8f2947d 100644
>--- a/drivers/net/virtio_net.c
>+++ b/drivers/net/virtio_net.c
>@@ -30,8 +30,11 @@
> #include <linux/cpu.h>
> #include <linux/average.h>
> #include <linux/filter.h>
>+#include <linux/netdevice.h>
>+#include <linux/pci.h>
> #include <net/route.h>
> #include <net/xdp.h>
>+#include <net/bypass.h>
> 
> static int napi_weight = NAPI_POLL_WEIGHT;
> module_param(napi_weight, int, 0444);
>@@ -206,6 +209,9 @@ struct virtnet_info {
> 	u32 speed;
> 
> 	unsigned long guest_offloads;
>+
>+	/* upper netdev created when BACKUP feature enabled */
>+	struct net_device __rcu *bypass_netdev;
> };
> 
> struct padded_vnet_hdr {
>@@ -2275,6 +2281,22 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> 	}
> }
> 
>+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>+				      size_t len)
>+{
>+	struct virtnet_info *vi = netdev_priv(dev);
>+	int ret;
>+
>+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>+		return -EOPNOTSUPP;
>+
>+	ret = snprintf(buf, len, "_bkup");
>+	if (ret >= len)
>+		return -EOPNOTSUPP;
>+
>+	return 0;
>+}
>+
> static const struct net_device_ops virtnet_netdev = {
> 	.ndo_open            = virtnet_open,
> 	.ndo_stop   	     = virtnet_close,
>@@ -2292,6 +2314,7 @@ static const struct net_device_ops virtnet_netdev = {
> 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
> 	.ndo_xdp_flush		= virtnet_xdp_flush,
> 	.ndo_features_check	= passthru_features_check,
>+	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
> };
> 
> static void virtnet_config_changed_work(struct work_struct *work)
>@@ -2689,6 +2712,576 @@ static int virtnet_validate(struct virtio_device *vdev)
> 	return 0;
> }
> 
>+/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
>+ * When BACKUP feature is enabled, an additional netdev(bypass netdev)
>+ * is created that acts as a master device and tracks the state of the
>+ * 2 lower netdevs. The original virtio_net netdev is registered as
>+ * 'backup' netdev and a passthru device with the same MAC is registered
>+ * as 'active' netdev.
>+ */
>+
>+/* bypass state maintained when BACKUP feature is enabled */
>+struct virtnet_bypass_info {
>+	/* passthru netdev with same MAC */
>+	struct net_device __rcu *active_netdev;
>+
>+	/* virtio_net netdev */
>+	struct net_device __rcu *backup_netdev;
>+
>+	/* active netdev stats */
>+	struct rtnl_link_stats64 active_stats;
>+
>+	/* backup netdev stats */
>+	struct rtnl_link_stats64 backup_stats;
>+
>+	/* aggregated stats */
>+	struct rtnl_link_stats64 bypass_stats;
>+
>+	/* spinlock while updating stats */
>+	spinlock_t stats_lock;
>+};
>+
>+static int virtnet_bypass_open(struct net_device *dev)
>+{
>+	struct virtnet_bypass_info *vbi = netdev_priv(dev);
>+	struct net_device *active_netdev, *backup_netdev;
>+	int err;
>+
>+	netif_carrier_off(dev);
>+	netif_tx_wake_all_queues(dev);
>+
>+	active_netdev = rtnl_dereference(vbi->active_netdev);
>+	if (active_netdev) {
>+		err = dev_open(active_netdev);
>+		if (err)
>+			goto err_active_open;
>+	}
>+
>+	backup_netdev = rtnl_dereference(vbi->backup_netdev);
>+	if (backup_netdev) {
>+		err = dev_open(backup_netdev);
>+		if (err)
>+			goto err_backup_open;
>+	}

This should be moved to bypass module.
See "***" below.

>+
>+	return 0;
>+
>+err_backup_open:
>+	dev_close(active_netdev);
>+err_active_open:
>+	netif_tx_disable(dev);
>+	return err;
>+}
>+
>+static int virtnet_bypass_close(struct net_device *dev)
>+{
>+	struct virtnet_bypass_info *vi = netdev_priv(dev);
>+	struct net_device *child_netdev;
>+
>+	netif_tx_disable(dev);
>+
>+	child_netdev = rtnl_dereference(vi->active_netdev);
>+	if (child_netdev)
>+		dev_close(child_netdev);
>+
>+	child_netdev = rtnl_dereference(vi->backup_netdev);
>+	if (child_netdev)
>+		dev_close(child_netdev);

This should be moved to bypass module.

>+
>+	return 0;
>+}
>+
>+static netdev_tx_t virtnet_bypass_drop_xmit(struct sk_buff *skb,
>+					    struct net_device *dev)
>+{
>+	atomic_long_inc(&dev->tx_dropped);
>+	dev_kfree_skb_any(skb);
>+	return NETDEV_TX_OK;
>+}
>+
>+static bool virtnet_bypass_xmit_ready(struct net_device *dev)
>+{
>+	return netif_running(dev) && netif_carrier_ok(dev);
>+}
>+
>+static netdev_tx_t virtnet_bypass_start_xmit(struct sk_buff *skb,
>+					     struct net_device *dev)
>+{
>+	struct virtnet_bypass_info *vbi = netdev_priv(dev);
>+	struct net_device *xmit_dev;
>+
>+	/* Try xmit via active netdev followed by backup netdev */
>+	xmit_dev = rcu_dereference_bh(vbi->active_netdev);
>+	if (!xmit_dev || !virtnet_bypass_xmit_ready(xmit_dev)) {
>+		xmit_dev = rcu_dereference_bh(vbi->backup_netdev);

This should be moved to bypass module.
	
>+		if (!xmit_dev || !virtnet_bypass_xmit_ready(xmit_dev))
>+			return virtnet_bypass_drop_xmit(skb, dev);
>+	}
>+
>+	skb->dev = xmit_dev;
>+	skb->queue_mapping = qdisc_skb_cb(skb)->slave_dev_queue_mapping;
>+
>+	return dev_queue_xmit(skb);
>+}
>+
>+static u16 virtnet_bypass_select_queue(struct net_device *dev,
>+				       struct sk_buff *skb, void *accel_priv,
>+				       select_queue_fallback_t fallback)
>+{
>+	/* This helper function exists to help dev_pick_tx get the correct
>+	 * destination queue.  Using a helper function skips a call to
>+	 * skb_tx_hash and will put the skbs in the queue we expect on their
>+	 * way down to the bonding driver.
>+	 */
>+	u16 txq = skb_rx_queue_recorded(skb) ? skb_get_rx_queue(skb) : 0;
>+
>+	/* Save the original txq to restore before passing to the driver */
>+	qdisc_skb_cb(skb)->slave_dev_queue_mapping = skb->queue_mapping;
>+
>+	if (unlikely(txq >= dev->real_num_tx_queues)) {
>+		do {
>+			txq -= dev->real_num_tx_queues;
>+		} while (txq >= dev->real_num_tx_queues);
>+	}
>+
>+	return txq;
>+}
>+
>+/* fold stats, assuming all rtnl_link_stats64 fields are u64, but
>+ * that some drivers can provide 32bit values only.
>+ */
>+static void virtnet_bypass_fold_stats(struct rtnl_link_stats64 *_res,
>+				      const struct rtnl_link_stats64 *_new,
>+				      const struct rtnl_link_stats64 *_old)
>+{
>+	const u64 *new = (const u64 *)_new;
>+	const u64 *old = (const u64 *)_old;
>+	u64 *res = (u64 *)_res;
>+	int i;
>+
>+	for (i = 0; i < sizeof(*_res) / sizeof(u64); i++) {
>+		u64 nv = new[i];
>+		u64 ov = old[i];
>+		s64 delta = nv - ov;
>+
>+		/* detects if this particular field is 32bit only */
>+		if (((nv | ov) >> 32) == 0)
>+			delta = (s64)(s32)((u32)nv - (u32)ov);
>+
>+		/* filter anomalies, some drivers reset their stats
>+		 * at down/up events.
>+		 */
>+		if (delta > 0)
>+			res[i] += delta;
>+	}
>+}
>+
>+static void virtnet_bypass_get_stats(struct net_device *dev,
>+				     struct rtnl_link_stats64 *stats)
>+{
>+	struct virtnet_bypass_info *vbi = netdev_priv(dev);
>+	const struct rtnl_link_stats64 *new;
>+	struct rtnl_link_stats64 temp;
>+	struct net_device *child_netdev;
>+
>+	spin_lock(&vbi->stats_lock);
>+	memcpy(stats, &vbi->bypass_stats, sizeof(*stats));
>+
>+	rcu_read_lock();
>+
>+	child_netdev = rcu_dereference(vbi->active_netdev);
>+	if (child_netdev) {
>+		new = dev_get_stats(child_netdev, &temp);
>+		virtnet_bypass_fold_stats(stats, new, &vbi->active_stats);
>+		memcpy(&vbi->active_stats, new, sizeof(*new));
>+	}
>+
>+	child_netdev = rcu_dereference(vbi->backup_netdev);
>+	if (child_netdev) {
>+		new = dev_get_stats(child_netdev, &temp);
>+		virtnet_bypass_fold_stats(stats, new, &vbi->backup_stats);
>+		memcpy(&vbi->backup_stats, new, sizeof(*new));
>+	}
>+
>+	rcu_read_unlock();
>+
>+	memcpy(&vbi->bypass_stats, stats, sizeof(*stats));
>+	spin_unlock(&vbi->stats_lock);
>+}

This should be moved to bypass module.

>+
>+static int virtnet_bypass_change_mtu(struct net_device *dev, int new_mtu)
>+{
>+	struct virtnet_bypass_info *vbi = netdev_priv(dev);
>+	struct net_device *active_netdev, *backup_netdev;
>+	int ret = 0;
>+
>+	active_netdev = rcu_dereference(vbi->active_netdev);
>+	if (active_netdev) {
>+		ret = dev_set_mtu(active_netdev, new_mtu);
>+		if (ret)
>+			return ret;
>+	}
>+
>+	backup_netdev = rcu_dereference(vbi->backup_netdev);
>+	if (backup_netdev) {
>+		ret = dev_set_mtu(backup_netdev, new_mtu);
>+		if (ret) {
>+			dev_set_mtu(active_netdev, dev->mtu);
>+			return ret;
>+		}
>+	}
>+
>+	dev->mtu = new_mtu;
>+	return 0;
>+}

This should be moved to bypass module.

	
>+
>+static void virtnet_bypass_set_rx_mode(struct net_device *dev)
>+{
>+	struct virtnet_bypass_info *vbi = netdev_priv(dev);
>+	struct net_device *child_netdev;
>+
>+	rcu_read_lock();
>+
>+	child_netdev = rcu_dereference(vbi->active_netdev);
>+	if (child_netdev) {
>+		dev_uc_sync_multiple(child_netdev, dev);
>+		dev_mc_sync_multiple(child_netdev, dev);
>+	}
>+
>+	child_netdev = rcu_dereference(vbi->backup_netdev);
>+	if (child_netdev) {
>+		dev_uc_sync_multiple(child_netdev, dev);
>+		dev_mc_sync_multiple(child_netdev, dev);
>+	}
>+
>+	rcu_read_unlock();
>+}

This should be moved to bypass module.


>+
>+static const struct net_device_ops virtnet_bypass_netdev_ops = {
>+	.ndo_open		= virtnet_bypass_open,
>+	.ndo_stop		= virtnet_bypass_close,
>+	.ndo_start_xmit		= virtnet_bypass_start_xmit,
>+	.ndo_select_queue	= virtnet_bypass_select_queue,
>+	.ndo_get_stats64	= virtnet_bypass_get_stats,
>+	.ndo_change_mtu		= virtnet_bypass_change_mtu,
>+	.ndo_set_rx_mode	= virtnet_bypass_set_rx_mode,
>+	.ndo_validate_addr	= eth_validate_addr,
>+	.ndo_features_check	= passthru_features_check,
>+};
>+
>+static int
>+virtnet_bypass_ethtool_get_link_ksettings(struct net_device *dev,
>+					  struct ethtool_link_ksettings *cmd)
>+{
>+	struct virtnet_bypass_info *vbi = netdev_priv(dev);
>+	struct net_device *child_netdev;
>+
>+	child_netdev = rtnl_dereference(vbi->active_netdev);
>+	if (!child_netdev || !virtnet_bypass_xmit_ready(child_netdev)) {
>+		child_netdev = rtnl_dereference(vbi->backup_netdev);
>+		if (!child_netdev || !virtnet_bypass_xmit_ready(child_netdev)) {
>+			cmd->base.duplex = DUPLEX_UNKNOWN;
>+			cmd->base.port = PORT_OTHER;
>+			cmd->base.speed = SPEED_UNKNOWN;
>+
>+			return 0;
>+		}
>+	}
>+
>+	return __ethtool_get_link_ksettings(child_netdev, cmd);
>+}
>+
>+#define BYPASS_DRV_NAME "virtnet_bypass"
>+#define BYPASS_DRV_VERSION "0.1"
>+
>+static void virtnet_bypass_ethtool_get_drvinfo(struct net_device *dev,
>+					       struct ethtool_drvinfo *drvinfo)
>+{
>+	strlcpy(drvinfo->driver, BYPASS_DRV_NAME, sizeof(drvinfo->driver));
>+	strlcpy(drvinfo->version, BYPASS_DRV_VERSION, sizeof(drvinfo->version));
>+}
>+
>+static const struct ethtool_ops virtnet_bypass_ethtool_ops = {
>+	.get_drvinfo            = virtnet_bypass_ethtool_get_drvinfo,
>+	.get_link               = ethtool_op_get_link,
>+	.get_link_ksettings     = virtnet_bypass_ethtool_get_link_ksettings,
>+};
>+
>+static int virtnet_bypass_join_child(struct net_device *bypass_netdev,
>+				     struct net_device *child_netdev)
>+{
>+	struct virtnet_bypass_info *vbi;
>+	bool backup;
>+
>+	vbi = netdev_priv(bypass_netdev);
>+	backup = (child_netdev->dev.parent == bypass_netdev->dev.parent);
>+	if (backup ? rtnl_dereference(vbi->backup_netdev) :
>+			rtnl_dereference(vbi->active_netdev)) {
>+		netdev_info(bypass_netdev,
>+			    "%s attempting to join bypass dev when %s already present\n",
>+			    child_netdev->name, backup ? "backup" : "active");

Bypass module should check if there is already some other netdev
enslaved and refuse right there.

The active/backup terminology is quite confusing. From the bonding world
that means active is the one which is currently used for tx of the
packets. And it depends on link and other things what netdev is declared
active. However here, it is different. Backup is always the virtio_net
instance even when it is active. Odd. Please change the terminology.
For "active" I suggest to use name "stolen".

*** Also, the 2 slave netdev pointers should be stored in the bypass
module instance, not in the drivers.



>+		return -EEXIST;
>+	}
>+
>+	dev_hold(child_netdev);
>+
>+	if (backup) {
>+		rcu_assign_pointer(vbi->backup_netdev, child_netdev);
>+		dev_get_stats(vbi->backup_netdev, &vbi->backup_stats);
>+	} else {
>+		rcu_assign_pointer(vbi->active_netdev, child_netdev);
>+		dev_get_stats(vbi->active_netdev, &vbi->active_stats);
>+		bypass_netdev->min_mtu = child_netdev->min_mtu;
>+		bypass_netdev->max_mtu = child_netdev->max_mtu;
>+	}
>+
>+	netdev_info(bypass_netdev, "child:%s joined\n", child_netdev->name);
>+
>+	return 0;
>+}
>+
>+static int virtnet_bypass_register_child(struct net_device *bypass_netdev,
>+					 struct net_device *child_netdev)
>+{
>+	struct virtnet_bypass_info *vbi;
>+	bool backup;
>+
>+	vbi = netdev_priv(bypass_netdev);
>+	backup = (child_netdev->dev.parent == bypass_netdev->dev.parent);
>+	if (backup ? rtnl_dereference(vbi->backup_netdev) :
>+			rtnl_dereference(vbi->active_netdev)) {
>+		netdev_info(bypass_netdev,
>+			    "%s attempting to register bypass dev when %s already present\n",
>+			    child_netdev->name, backup ? "backup" : "active");
>+		return -EEXIST;
>+	}
>+
>+	/* Avoid non pci devices as active netdev */
>+	if (!backup && (!child_netdev->dev.parent ||
>+			!dev_is_pci(child_netdev->dev.parent)))
>+		return -EINVAL;
>+
>+	netdev_info(bypass_netdev, "child:%s registered\n", child_netdev->name);
>+
>+	return 0;
>+}
>+
>+static int virtnet_bypass_release_child(struct net_device *bypass_netdev,
>+					struct net_device *child_netdev)
>+{
>+	struct net_device *backup_netdev, *active_netdev;
>+	struct virtnet_bypass_info *vbi;
>+
>+	vbi = netdev_priv(bypass_netdev);
>+	active_netdev = rtnl_dereference(vbi->active_netdev);
>+	backup_netdev = rtnl_dereference(vbi->backup_netdev);
>+
>+	if (child_netdev != active_netdev && child_netdev != backup_netdev)
>+		return -EINVAL;
>+
>+	netdev_info(bypass_netdev, "child:%s released\n", child_netdev->name);
>+
>+	return 0;
>+}
>+
>+static int virtnet_bypass_unregister_child(struct net_device *bypass_netdev,
>+					   struct net_device *child_netdev)
>+{
>+	struct net_device *backup_netdev, *active_netdev;
>+	struct virtnet_bypass_info *vbi;
>+
>+	vbi = netdev_priv(bypass_netdev);
>+	active_netdev = rtnl_dereference(vbi->active_netdev);
>+	backup_netdev = rtnl_dereference(vbi->backup_netdev);
>+
>+	if (child_netdev != active_netdev && child_netdev != backup_netdev)
>+		return -EINVAL;
>+
>+	if (child_netdev == backup_netdev) {
>+		RCU_INIT_POINTER(vbi->backup_netdev, NULL);
>+	} else {
>+		RCU_INIT_POINTER(vbi->active_netdev, NULL);
>+		if (backup_netdev) {
>+			bypass_netdev->min_mtu = backup_netdev->min_mtu;
>+			bypass_netdev->max_mtu = backup_netdev->max_mtu;
>+		}
>+	}
>+
>+	dev_put(child_netdev);
>+
>+	netdev_info(bypass_netdev, "child:%s unregistered\n",
>+		    child_netdev->name);
>+
>+	return 0;
>+}
>+
>+static int virtnet_bypass_update_link(struct net_device *bypass_netdev,
>+				      struct net_device *child_netdev)
>+{
>+	struct net_device *active_netdev, *backup_netdev;
>+	struct virtnet_bypass_info *vbi;
>+
>+	if (!netif_running(bypass_netdev))
>+		return 0;
>+
>+	vbi = netdev_priv(bypass_netdev);
>+
>+	active_netdev = rtnl_dereference(vbi->active_netdev);
>+	backup_netdev = rtnl_dereference(vbi->backup_netdev);
>+
>+	if (child_netdev != active_netdev && child_netdev != backup_netdev)
>+		return -EINVAL;
>+
>+	if ((active_netdev && virtnet_bypass_xmit_ready(active_netdev)) ||
>+	    (backup_netdev && virtnet_bypass_xmit_ready(backup_netdev))) {
>+		netif_carrier_on(bypass_netdev);
>+		netif_tx_wake_all_queues(bypass_netdev);
>+	} else {
>+		netif_carrier_off(bypass_netdev);
>+		netif_tx_stop_all_queues(bypass_netdev);
>+	}
>+
>+	return 0;
>+}
>+
>+/* Called when child dev is injecting data into network stack.
>+ * Change the associated network device from lower dev to virtio.
>+ * note: already called with rcu_read_lock
>+ */
>+static rx_handler_result_t virtnet_bypass_handle_frame(struct sk_buff **pskb)
>+{
>+	struct sk_buff *skb = *pskb;
>+	struct net_device *ndev = rcu_dereference(skb->dev->rx_handler_data);
>+
>+	skb->dev = ndev;
>+
>+	return RX_HANDLER_ANOTHER;
>+}

Hmm, you have the rx_handler defined in drivers and you register it in
bypass module. It is odd because here you assume that the bypass module
passed "ndev" and rx_handler_data. Also, you don't need advanced
features rx_handler provides.
Instead, just register a common rx_handler defined in the bypass module
and have simple skb_rx callback here (static void).


>+
>+static const struct bypass_ops virtnet_bypass_ops = {
>+	.register_child		= virtnet_bypass_register_child,
>+	.join_child		= virtnet_bypass_join_child,
>+	.unregister_child	= virtnet_bypass_unregister_child,
>+	.release_child		= virtnet_bypass_release_child,
>+	.update_link		= virtnet_bypass_update_link,
>+	.handle_frame		= virtnet_bypass_handle_frame,
>+};
>+
>+static struct bypass *virtnet_bypass;
>+
>+static int virtnet_bypass_create(struct virtnet_info *vi)
>+{
>+	struct net_device *backup_netdev = vi->dev;
>+	struct device *dev = &vi->vdev->dev;
>+	struct net_device *bypass_netdev;
>+	int res;
>+
>+	/* Alloc at least 2 queues, for now we are going with 16 assuming
>+	 * that most devices being bonded won't have too many queues.
>+	 */
>+	bypass_netdev = alloc_etherdev_mq(sizeof(struct virtnet_bypass_info),
>+					  16);
>+	if (!bypass_netdev) {
>+		dev_err(dev, "Unable to allocate bypass_netdev!\n");
>+		return -ENOMEM;
>+	}
>+
>+	dev_net_set(bypass_netdev, dev_net(backup_netdev));
>+	SET_NETDEV_DEV(bypass_netdev, dev);
>+
>+	bypass_netdev->netdev_ops = &virtnet_bypass_netdev_ops;
>+	bypass_netdev->ethtool_ops = &virtnet_bypass_ethtool_ops;
>+
>+	/* Initialize the device options */
>+	bypass_netdev->flags |= IFF_MASTER;

I think I pointed that out already. Don't use "IFF_MASTER". That is
specific to bonding. As I suggested in the reply to the patch #2, you
should introduce IFF_BYPASS. Also, this flag should be set by the bypass
module. Just create the netdev and do things specific to virtio and then
call to bypass module, pass the netdev so it can do the rest. I think
that the flags, features etc would be also fine to set there.


>+	bypass_netdev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
>+	bypass_netdev->priv_flags &= ~(IFF_XMIT_DST_RELEASE |
>+				       IFF_TX_SKB_SHARING);
>+
>+	/* don't acquire bypass netdev's netif_tx_lock when transmitting */
>+	bypass_netdev->features |= NETIF_F_LLTX;
>+
>+	/* Don't allow bypass devices to change network namespaces. */
>+	bypass_netdev->features |= NETIF_F_NETNS_LOCAL;
>+
>+	bypass_netdev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
>+				     NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |
>+				     NETIF_F_HIGHDMA | NETIF_F_LRO;
>+
>+	bypass_netdev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
>+	bypass_netdev->features |= bypass_netdev->hw_features;
>+
>+	/* For now treat bypass netdev as VLAN challenged since we
>+	 * cannot assume VLAN functionality with a VF

Why? I don't see such drivers. But to be 100% correct, you should check
the NETIF_F_VLAN_CHALLENGED feature in bypass module during VF enslave
and forbid to do so if it is on.


>+	 */
>+	bypass_netdev->features |= NETIF_F_VLAN_CHALLENGED;
>+
>+	memcpy(bypass_netdev->dev_addr, backup_netdev->dev_addr,
>+	       bypass_netdev->addr_len);
>+
>+	bypass_netdev->min_mtu = backup_netdev->min_mtu;
>+	bypass_netdev->max_mtu = backup_netdev->max_mtu;
>+
>+	res = register_netdev(bypass_netdev);
>+	if (res < 0) {
>+		dev_err(dev, "Unable to register bypass_netdev!\n");
>+		goto err_register_netdev;
>+	}
>+
>+	netif_carrier_off(bypass_netdev);
>+
>+	res = bypass_register_instance(virtnet_bypass, bypass_netdev);
>+	if (res < 0)
>+		goto err_bypass;
>+
>+	rcu_assign_pointer(vi->bypass_netdev, bypass_netdev);
>+
>+	return 0;
>+
>+err_bypass:
>+	unregister_netdev(bypass_netdev);
>+err_register_netdev:
>+	free_netdev(bypass_netdev);
>+
>+	return res;
>+}
>+
>+static void virtnet_bypass_destroy(struct virtnet_info *vi)
>+{
>+	struct net_device *bypass_netdev;
>+	struct virtnet_bypass_info *vbi;
>+	struct net_device *child_netdev;
>+
>+	bypass_netdev = rcu_dereference(vi->bypass_netdev);
>+	/* no device found, nothing to free */
>+	if (!bypass_netdev)
>+		return;
>+
>+	vbi = netdev_priv(bypass_netdev);
>+
>+	netif_device_detach(bypass_netdev);
>+
>+	rtnl_lock();
>+
>+	child_netdev = rtnl_dereference(vbi->active_netdev);
>+	if (child_netdev)
>+		bypass_unregister_child(child_netdev);
>+
>+	child_netdev = rtnl_dereference(vbi->backup_netdev);
>+	if (child_netdev)
>+		bypass_unregister_child(child_netdev);
>+
>+	unregister_netdevice(bypass_netdev);
>+
>+	bypass_unregister_instance(virtnet_bypass, bypass_netdev);
>+
>+	rtnl_unlock();
>+
>+	free_netdev(bypass_netdev);
>+}
>+
>+/* END of functions supporting VIRTIO_NET_F_BACKUP feature. */
>+
> static int virtnet_probe(struct virtio_device *vdev)
> {
> 	int i, err = -ENOMEM;
>@@ -2839,10 +3432,15 @@ static int virtnet_probe(struct virtio_device *vdev)
> 
> 	virtnet_init_settings(dev);
> 
>+	if (virtio_has_feature(vdev, VIRTIO_NET_F_BACKUP)) {
>+		if (virtnet_bypass_create(vi) != 0)

You need to do:
		err = virtnet_bypass_create(vi);
		if (err)
otherwise you ignore err and virtnet_probe would return 0;


>+			goto free_vqs;
>+	}
>+
> 	err = register_netdev(dev);
> 	if (err) {
> 		pr_debug("virtio_net: registering device failed\n");
>-		goto free_vqs;
>+		goto free_bypass;
> 	}
> 
> 	virtio_device_ready(vdev);
>@@ -2879,6 +3477,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> 	vi->vdev->config->reset(vdev);
> 
> 	unregister_netdev(dev);
>+free_bypass:
>+	virtnet_bypass_destroy(vi);
> free_vqs:
> 	cancel_delayed_work_sync(&vi->refill);
> 	free_receive_page_frags(vi);
>@@ -2913,6 +3513,8 @@ static void virtnet_remove(struct virtio_device *vdev)
> 
> 	unregister_netdev(vi->dev);
> 
>+	virtnet_bypass_destroy(vi);
>+
> 	remove_vq_common(vi);
> 
> 	free_netdev(vi->dev);
>@@ -2996,6 +3598,11 @@ static __init int virtio_net_driver_init(void)
> {
> 	int ret;
> 
>+	virtnet_bypass = bypass_register_driver(&virtnet_bypass_ops,
>+						&virtnet_bypass_netdev_ops);
>+	if (!virtnet_bypass)
>+		return -ENOMEM;

If CONFIG_NET_BYPASS is undefined, you will always return -ENOMEM here.


>+
> 	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "virtio/net:online",
> 				      virtnet_cpu_online,
> 				      virtnet_cpu_down_prep);
>@@ -3010,12 +3617,14 @@ static __init int virtio_net_driver_init(void)
>         ret = register_virtio_driver(&virtio_net_driver);
> 	if (ret)
> 		goto err_virtio;
>+
> 	return 0;
> err_virtio:
> 	cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
> err_dead:
> 	cpuhp_remove_multi_state(virtionet_online);
> out:
>+	bypass_unregister_driver(virtnet_bypass);
> 	return ret;
> }
> module_init(virtio_net_driver_init);
>@@ -3025,6 +3634,7 @@ static __exit void virtio_net_driver_exit(void)
> 	unregister_virtio_driver(&virtio_net_driver);
> 	cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
> 	cpuhp_remove_multi_state(virtionet_online);
>+	bypass_unregister_driver(virtnet_bypass);
> }
> module_exit(virtio_net_driver_exit);
> 
>-- 
>2.14.3
>

^ permalink raw reply

* [PATCH] ARM: dts: ls1021a: Specify TBIPA register address
From: Esben Haabendal @ 2018-04-06 12:46 UTC (permalink / raw)
  To: netdev
  Cc: Rob Herring, Mark Rutland, devicetree, linux-kernel,
	Claudiu Manoil, David S . Miller, Gustavo A . R . Silva,
	Esben Haabendal
In-Reply-To: <20180406123836.12019-2-esben.haabendal@gmail.com>

From: Esben Haabendal <eha@deif.com>

The current (mildly evil) fsl_pq_mdio code uses an undocumented shadow of
the TBIPA register on LS1021A, which happens to be read-only.
Changing TBI PHY address therefore does not work on LS1021A.

The real (and documented) address of the TBIPA registere lies in the eTSEC
block and not in MDIO/MII, which is read/write, so using that fixes
the problem.

Signed-off-by: Esben Haabendal <eha@deif.com>
---
 arch/arm/boot/dts/ls1021a.dtsi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index c31dad98f989..728e206009ea 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -587,7 +587,8 @@
 			device_type = "mdio";
 			#address-cells = <1>;
 			#size-cells = <0>;
-			reg = <0x0 0x2d24000 0x0 0x4000>;
+			reg = <0x0 0x2d24000 0x0 0x4000>,
+			      <0x0 0x2d10030 0x0 0x4>;
 		};
 
 		ptp_clock@2d10e00 {
-- 
2.16.3

^ permalink raw reply related

* [PATCH 1/2] net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
From: Esben Haabendal @ 2018-04-06 12:38 UTC (permalink / raw)
  To: netdev
  Cc: Esben Haabendal, Claudiu Manoil, Rob Herring, Mark Rutland,
	David S. Miller, Gustavo A. R. Silva, devicetree, linux-kernel

From: Esben Haabendal <eha@deif.com>

This introduces a simpler and generic method for for finding (and mapping)
the TBIPA register.

Instead of relying of complicated logic for finding the TBIPA register
address based on the MDIO or MII register block base
address, which even in some cases relies on undocumented shadow registers,
a second "reg" entry for the mdio bus devicetree node specifies the TBIPA
register.

Backwards compatibility is kept, as the existing logic is applied when
only a single "reg" mapping is specified.

Signed-off-by: Esben Haabendal <eha@deif.com>
---
 .../devicetree/bindings/net/fsl-tsec-phy.txt       |  6 ++-
 drivers/net/ethernet/freescale/fsl_pq_mdio.c       | 50 +++++++++++++++-------
 2 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
index 594982c6b9f9..79bf352e659c 100644
--- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
+++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
@@ -6,7 +6,11 @@ the definition of the PHY node in booting-without-of.txt for an example
 of how to define a PHY.
 
 Required properties:
-  - reg : Offset and length of the register set for the device
+  - reg : Offset and length of the register set for the device, and optionally
+          the offset and length of the TBIPA register (TBI PHY address
+	  register).  If TBIPA register is not specified, the driver will
+	  attempt to infer it from the register set specified (your mileage may
+	  vary).
   - compatible : Should define the compatible device type for the
     mdio. Currently supported strings/devices are:
 	- "fsl,gianfar-tbi"
diff --git a/drivers/net/ethernet/freescale/fsl_pq_mdio.c b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
index 80ad16acf0f1..ac2c3f6a12bc 100644
--- a/drivers/net/ethernet/freescale/fsl_pq_mdio.c
+++ b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
@@ -377,6 +377,38 @@ static const struct of_device_id fsl_pq_mdio_match[] = {
 };
 MODULE_DEVICE_TABLE(of, fsl_pq_mdio_match);
 
+static void set_tbipa(const u32 tbipa_val, struct platform_device *pdev,
+		      uint32_t __iomem * (*get_tbipa)(void __iomem *),
+		      void __iomem *reg_map, struct resource *reg_res)
+{
+	struct device_node *np = pdev->dev.of_node;
+	uint32_t __iomem *tbipa;
+	bool tbipa_mapped;
+
+	tbipa = of_iomap(np, 1);
+	if (tbipa) {
+		tbipa_mapped = true;
+	} else {
+		tbipa_mapped = false;
+		tbipa = (*get_tbipa)(reg_map);
+
+		/*
+		 * Add consistency check to make sure TBI is contained within
+		 * the mapped range (not because we would get a segfault,
+		 * rather to catch bugs in computing TBI address). Print error
+		 * message but continue anyway.
+		 */
+		if ((void *)tbipa > reg_map + resource_size(reg_res) - 4)
+			dev_err(&pdev->dev, "invalid register map (should be at least 0x%04zx to contain TBI address)\n",
+				((void *)tbipa - reg_map) + 4);
+	}
+
+	iowrite32be(be32_to_cpu(tbipa_val), tbipa);
+
+	if (tbipa_mapped)
+		iounmap(tbipa);
+}
+
 static int fsl_pq_mdio_probe(struct platform_device *pdev)
 {
 	const struct of_device_id *id =
@@ -450,8 +482,6 @@ static int fsl_pq_mdio_probe(struct platform_device *pdev)
 
 		if (tbi) {
 			const u32 *prop = of_get_property(tbi, "reg", NULL);
-			uint32_t __iomem *tbipa;
-
 			if (!prop) {
 				dev_err(&pdev->dev,
 					"missing 'reg' property in node %pOF\n",
@@ -459,20 +489,8 @@ static int fsl_pq_mdio_probe(struct platform_device *pdev)
 				err = -EBUSY;
 				goto error;
 			}
-
-			tbipa = data->get_tbipa(priv->map);
-
-			/*
-			 * Add consistency check to make sure TBI is contained
-			 * within the mapped range (not because we would get a
-			 * segfault, rather to catch bugs in computing TBI
-			 * address). Print error message but continue anyway.
-			 */
-			if ((void *)tbipa > priv->map + resource_size(&res) - 4)
-				dev_err(&pdev->dev, "invalid register map (should be at least 0x%04zx to contain TBI address)\n",
-					((void *)tbipa - priv->map) + 4);
-
-			iowrite32be(be32_to_cpup(prop), tbipa);
+			set_tbipa(*prop, pdev,
+				  data->get_tbipa, priv->map, &res);
 		}
 	}
 
-- 
2.16.3

^ permalink raw reply related

* Re: [PATCH v2] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-06 12:07 UTC (permalink / raw)
  To: Robin Murphy
  Cc: sgoutham, sunil.kovvuri, rric, linux-arm-kernel, netdev,
	linux-kernel, davem, dnelson, Vadim Lomovtsev, gustavo
In-Reply-To: <32d75865-6407-9e9e-a78d-38ba761b2575@arm.com>

Hi Robin,

On Fri, Apr 06, 2018 at 12:48:40PM +0100, Robin Murphy wrote:
> On 06/04/18 12:14, Vadim Lomovtsev wrote:
> > From: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
> > 
> > It is too expensive to pass u64 values via linked list, instead
> > allocate array for them by overall number of mac addresses from netdev.
> > 
> > This eventually removes multiple kmalloc() calls, aviod memory
> > fragmentation and allow to put single null check on kmalloc
> > return value in order to prevent a potential null pointer dereference.
> > 
> > Addresses-Coverity-ID: 1467429 ("Dereference null return value")
> > Fixes: 37c3347eb247 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
> > Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
> > ---
> > Changes from v1 to v2:
> >   - C99 syntax: update xcast_addr_list struct field mc[0] -> mc[];
> > 
> > ---
> >   drivers/net/ethernet/cavium/thunder/nic.h        |  7 +-----
> >   drivers/net/ethernet/cavium/thunder/nicvf_main.c | 28 +++++++++---------------
> >   2 files changed, 11 insertions(+), 24 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
> > index 5fc46c5a4f36..448d1fafc827 100644
> > --- a/drivers/net/ethernet/cavium/thunder/nic.h
> > +++ b/drivers/net/ethernet/cavium/thunder/nic.h
> > @@ -265,14 +265,9 @@ struct nicvf_drv_stats {
> >   struct cavium_ptp;
> > -struct xcast_addr {
> > -	struct list_head list;
> > -	u64              addr;
> > -};
> > -
> >   struct xcast_addr_list {
> > -	struct list_head list;
> >   	int              count;
> > +	u64              mc[];
> >   };
> >   struct nicvf_work {
> > diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> > index 1e9a31fef729..a26d8bc92e01 100644
> > --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> > +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
> > @@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
> >   						  work.work);
> >   	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
> >   	union nic_mbx mbx = {};
> > -	struct xcast_addr *xaddr, *next;
> > +	u8 idx = 0;
> >   	if (!vf_work)
> >   		return;
> > @@ -1956,16 +1956,10 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
> >   	/* check if we have any specific MACs to be added to PF DMAC filter */
> >   	if (vf_work->mc) {
> >   		/* now go through kernel list of MACs and add them one by one */
> > -		list_for_each_entry_safe(xaddr, next,
> > -					 &vf_work->mc->list, list) {
> > +		for (idx = 0; idx < vf_work->mc->count; idx++) {
> >   			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
> > -			mbx.xcast.data.mac = xaddr->addr;
> > +			mbx.xcast.data.mac = vf_work->mc->mc[idx];
> >   			nicvf_send_msg_to_pf(nic, &mbx);
> > -
> > -			/* after receiving ACK from PF release memory */
> > -			list_del(&xaddr->list);
> > -			kfree(xaddr);
> > -			vf_work->mc->count--;
> >   		}
> >   		kfree(vf_work->mc);
> >   	}
> > @@ -1996,17 +1990,15 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
> >   			mode |= BGX_XCAST_MCAST_FILTER;
> >   			/* here we need to copy mc addrs */
> >   			if (netdev_mc_count(netdev)) {
> > -				struct xcast_addr *xaddr;
> > -
> > -				mc_list = kmalloc(sizeof(*mc_list), GFP_ATOMIC);
> > -				INIT_LIST_HEAD(&mc_list->list);
> > +				mc_list = kmalloc(sizeof(*mc_list) +
> > +						  sizeof(u64) * netdev_mc_count(netdev),
> 
> FWIW if you really wanted to disambiguate that it's a structure and not just
> an array being allocated, then it could be expressed without explicit
> arithmetic:
> 
> 	size = offsetof(typeof(*mc_list), mc[netdev_mc_count(netdev)]);
> 
> but that's probably just a matter of personal preference at this point.
> 
> Robin.
> 

Thanks for reviewing it and for hint. From style perspective, I believe,
I should get rid off direct types names also, and use your suggestion here.

I'll update patch to v3, test and re-post.
Thank you for your time.

WBR,
Vadim

> > +						  GFP_ATOMIC);
> > +				if (unlikely(!mc_list))
> > +					return;
> > +				mc_list->count = 0;
> >   				netdev_hw_addr_list_for_each(ha, &netdev->mc) {
> > -					xaddr = kmalloc(sizeof(*xaddr),
> > -							GFP_ATOMIC);
> > -					xaddr->addr =
> > +					mc_list->mc[mc_list->count] =
> >   						ether_addr_to_u64(ha->addr);
> > -					list_add_tail(&xaddr->list,
> > -						      &mc_list->list);
> >   					mc_list->count++;
> >   				}
> >   			}
> > 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox