Netdev List

* Re: [PATCH v3] powerpc: Implement csum_ipv6_magic in assembly
From: Christophe LEROY @ 2018-05-24  6:20 UTC
  To: Segher Boessenkool
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev, netdev
In-Reply-To: <20180523183447.GV17342@gate.crashing.org>



On 23/05/2018 at 20:34, Segher Boessenkool wrote:
> On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
>> The generic csum_ipv6_magic() generates a pretty bad result
> 
> <snip>
> 
> Please try with a more recent compiler, what you used is pretty ancient.
> It's not like recent compilers do great on this either, but it's not
> *that* bad anymore ;-)
> 
>> --- a/arch/powerpc/lib/checksum_32.S
>> +++ b/arch/powerpc/lib/checksum_32.S
>> @@ -293,3 +293,36 @@ dst_error:
>>   	EX_TABLE(51b, dst_error);
>>   
>>   EXPORT_SYMBOL(csum_partial_copy_generic)
>> +
>> +/*
>> + * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
>> + *				      const struct in6_addr *daddr,
>> + *				      __u32 len, __u8 proto, __wsum sum)
>> + */
>> +
>> +_GLOBAL(csum_ipv6_magic)
>> +	lwz	r8, 0(r3)
>> +	lwz	r9, 4(r3)
>> +	lwz	r10, 8(r3)
>> +	lwz	r11, 12(r3)
>> +	addc	r0, r5, r6
>> +	adde	r0, r0, r7
>> +	adde	r0, r0, r8
>> +	adde	r0, r0, r9
>> +	adde	r0, r0, r10
>> +	adde	r0, r0, r11
>> +	lwz	r8, 0(r4)
>> +	lwz	r9, 4(r4)
>> +	lwz	r10, 8(r4)
>> +	lwz	r11, 12(r4)
>> +	adde	r0, r0, r8
>> +	adde	r0, r0, r9
>> +	adde	r0, r0, r10
>> +	adde	r0, r0, r11
>> +	addze	r0, r0
>> +	rotlwi	r3, r0, 16
>> +	add	r3, r0, r3
>> +	not	r3, r3
>> +	rlwinm	r3, r3, 16, 16, 31
>> +	blr
>> +EXPORT_SYMBOL(csum_ipv6_magic)
> 
> Clustering the loads and carry insns together is pretty much the worst you
> can do on most 32-bit CPUs.

Oh, really? __csum_partial is written that way too.

Right, now I tried interleaving the lwz and adde. I get no improvement at
all on an 885, but I get a 15% improvement on an 8321.

Christophe

> 
> 
> Segher
> 
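
For readers following the thread, here is a minimal C model of what the assembly computes. It is a sketch, not the kernel implementation: the names `load_be32`, `fold_plain`, `fold_rotate` and `ipv6_magic_model` are hypothetical, and the words are read in big-endian order on the assumption that this matches what the lwz loads see on a big-endian 32-bit powerpc. The 64-bit accumulator stands in for the addc/adde/addze carry chain.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big-endian (network order) word from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Conventional fold: add the 16-bit halves until no carry remains. */
static uint16_t fold_plain(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Fold as the assembly does: rotate by 16, add, keep the upper half. */
static uint16_t fold_rotate(uint32_t sum)
{
	uint32_t rot = (sum << 16) | (sum >> 16);

	return (uint16_t)((sum + rot) >> 16);
}

/* Hypothetical portable model of csum_ipv6_magic(). */
static uint16_t ipv6_magic_model(const uint8_t saddr[16],
				 const uint8_t daddr[16],
				 uint32_t len, uint8_t proto, uint32_t sum)
{
	uint64_t acc = (uint64_t)len + proto + sum;
	uint32_t lo32;
	int i;

	assert(saddr && daddr);
	for (i = 0; i < 16; i += 4)
		acc += (uint64_t)load_be32(saddr + i) + load_be32(daddr + i);

	/* End-around carry: fold the 64-bit total back into 32 bits. */
	acc = (acc & 0xffffffffu) + (acc >> 32);
	acc = (acc & 0xffffffffu) + (acc >> 32);
	lo32 = (uint32_t)acc;

	/* Both folds must agree; the rotate form saves an instruction. */
	assert(fold_rotate(lo32) == fold_plain(lo32));
	return (uint16_t)~fold_rotate(lo32);
}
```

The model says nothing about instruction scheduling, which is what the interleaving discussion above is about; it only pins down the arithmetic result.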

* [PATCH net-next 1/1] bnx2x: Collect the device debug information during Tx timeout.
From: Sudarsana Reddy Kalluru @ 2018-05-24  6:21 UTC
  To: davem; +Cc: netdev

A Tx timeout is usually caused by an issue in the device. In such cases,
a debug dump helps identify the cause of the issue.
This patch dumps the debug data during the Tx timeout. The
bnx2x_panic_dump() API is used instead of bnx2x_panic(), since we still
want to give the Tx-timeout recovery a chance to succeed.

Please consider applying this to "net-next".

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 95871576..182d5e1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4962,8 +4962,13 @@ void bnx2x_tx_timeout(struct net_device *dev)
 {
 	struct bnx2x *bp = netdev_priv(dev);

-#ifdef BNX2X_STOP_ON_ERROR
+	/* We want the information of the dump logged,
+	 * but calling bnx2x_panic() would kill all chances of recovery.
+	 */
 	if (!bp->panic)
+#ifndef BNX2X_STOP_ON_ERROR
+		bnx2x_panic_dump(bp, false);
+#else
 		bnx2x_panic();
 #endif

-- 
1.8.3.1
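
The intent described in the commit message can be modelled in plain C. This is only a sketch under stated assumptions: `fake_bp`, `fake_panic_dump`, `fake_panic` and `fake_tx_timeout` are hypothetical stand-ins, not driver functions, and the driver selects the path at compile time with BNX2X_STOP_ON_ERROR, whereas the sketch uses a runtime flag so that both paths can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_bp {
	bool panic;       /* set once the driver has given up on the device */
	int dumps_taken;  /* number of debug dumps logged so far */
};

/* Stand-in for bnx2x_panic_dump(): log state, leave recovery possible. */
static void fake_panic_dump(struct fake_bp *bp)
{
	bp->dumps_taken++;
	printf("debug dump #%d logged\n", bp->dumps_taken);
}

/* Stand-in for bnx2x_panic(): mark the device as dead, then dump. */
static void fake_panic(struct fake_bp *bp)
{
	bp->panic = true;
	fake_panic_dump(bp);
}

/* Returns true when Tx-timeout recovery may still be attempted. */
static bool fake_tx_timeout(struct fake_bp *bp, bool stop_on_error)
{
	assert(bp != NULL);
	if (!bp->panic) {
		if (stop_on_error)
			fake_panic(bp);       /* no recovery afterwards */
		else
			fake_panic_dump(bp);  /* dump only, keep going */
	}
	return !bp->panic;
}
```

With `stop_on_error` false the dump is logged and recovery can proceed; once the panic flag is set, later timeouts neither dump again nor recover.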

* Re: [PATCH bpf-next v3 01/15] net: initial AF_XDP skeleton
From: Björn Töpel @ 2018-05-24  6:38 UTC
  To: Stephen Hemminger
  Cc: Karlsson, Magnus, Duyck, Alexander H, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Jesper Dangaard Brouer,
	Willem de Bruijn, Daniel Borkmann, Michael S. Tsirkin, Netdev,
	Björn Töpel, michael.lundkvist, Brandeburg, Jesse,
	Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <20180523155047.6c136279@xeon-e3>

2018-05-24 0:50 GMT+02:00 Stephen Hemminger <stephen@networkplumber.org>:
> On Wed,  2 May 2018 13:01:22 +0200
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
>> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
>> new file mode 100644
>> index 000000000000..90e4a7152854
>> --- /dev/null
>> +++ b/net/xdp/Kconfig
>> @@ -0,0 +1,7 @@
>> +config XDP_SOCKETS
>> +     bool "XDP sockets"
>> +     depends on BPF_SYSCALL
>> +     default n
>> +     help
>> +       XDP sockets allows a channel between XDP programs and
>> +       userspace applications.
>
> Why is XDP not supported as a module?
> Most distributions will want it to be a module so that it is not loaded
> unless used, and AF_XDP could also be disabled by blacklisting the module.

Yes, all good points, and The Grand Plan is adding module support.
Unfortunately, it's not there yet.

* Re: [patch iproute2/net-next 2/2] devlink: introduce support for showing port number and split subport number
From: Jiri Pirko @ 2018-05-24  6:39 UTC
  To: David Ahern
  Cc: netdev, idosch, jakub.kicinski, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, simon.horman,
	pieter.jansenvanvuuren, john.hurley, dirk.vandermerwe,
	alexander.h.duyck, ogerlitz, vijaya.guvva, satananda.burla,
	raghu.vatsavayi, felix.manlunas, gospo, sathya.perla,
	vasundhara-v.volam, tariqt, eranbe, jeffrey.t.kirsher, roopa
In-Reply-To: <fdd0bb05-b5c1-7805-e8e8-30b6580ca4fd@gmail.com>

Wed, May 23, 2018 at 10:05:49PM CEST, dsahern@gmail.com wrote:
>On 5/20/18 2:15 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  devlink/devlink.c            | 6 ++++++
>>  include/uapi/linux/devlink.h | 2 ++
>>  2 files changed, 8 insertions(+)
>> 
>> diff --git a/devlink/devlink.c b/devlink/devlink.c
>> index df2c66dac1c7..b0ae17767dab 100644
>> --- a/devlink/devlink.c
>> +++ b/devlink/devlink.c
>> @@ -1737,9 +1737,15 @@ static void pr_out_port(struct dl *dl, struct nlattr **tb)
>>  
>>  		pr_out_str(dl, "flavour", port_flavour_name(port_flavour));
>>  	}
>> +	if (tb[DEVLINK_ATTR_PORT_NUMBER])
>> +		pr_out_uint(dl, "number",
>> +			    mnl_attr_get_u32(tb[DEVLINK_ATTR_PORT_NUMBER]));
>
>"number" as a label means nothing. "port" is more descriptive.

The attribute name is "port_number". Since the other attributes are
named "port_something" and only the "something" is printed here,
"number" is consistent with them. Each line represents a port with a list
of attributes.

>
># ./devlink port
>pci/0000:03:00.0/1: type eth netdev swp17 flavour physical number 17
>pci/0000:03:00.0/3: type eth netdev swp18 flavour physical number 18
>pci/0000:03:00.0/5: type eth netdev swp19 flavour physical number 19
>pci/0000:03:00.0/7: type eth netdev swp20 flavour physical number 20
>pci/0000:03:00.0/9: type eth netdev swp21 flavour physical number 21
>...
>pci/0000:03:00.0/61: type eth netdev swp1s0 flavour physical number 1
>split_group 1 subport 0
>pci/0000:03:00.0/62: type eth netdev swp1s1 flavour physical number 1
>split_group 1 subport 1
>

* Re: [PATCH net-next v4 2/2] openvswitch: Support conntrack zone limit
From: Pravin Shelar @ 2018-05-24  6:49 UTC
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <1526948165-32443-3-git-send-email-yihung.wei@gmail.com>

On Mon, May 21, 2018 at 5:16 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> Currently, nf_conntrack_max limits the maximum number of conntrack
> entries in the conntrack table for every network namespace.  VMs and
> containers that reside in the same namespace share the same conntrack
> table, so the total number of conntrack entries for all of them is
> limited by nf_conntrack_max.  In this case, if one VM or container
> abuses its use of conntrack entries, it blocks the others from
> committing valid conntrack entries into the conntrack table.  Even if
> we can put each VM in a different network namespace, the current
> nf_conntrack_max configuration is rigid: we cannot give different VMs
> or containers different limits on the number of conntrack entries.
>
> To address this issue, this patch proposes a fine-grained mechanism
> that further limits the number of conntrack entries per zone.  For
> example, we can assign a different zone to each VM and set a conntrack
> limit for each zone.  With this isolation, a misbehaving VM only
> consumes the conntrack entries in its own zone and does not affect
> other well-behaved VMs.  Moreover, users can set different conntrack
> limits for different zones based on their preference.
>
> The proposed implementation uses Netfilter's nf_conncount backend to
> count the number of connections in a particular zone.  If the number of
> connections is above the configured limit, ovs returns ENOMEM to
> userspace.  If userspace does not configure the zone limit, the limit
> defaults to zero, which means no limit and is backward compatible with
> the behavior without this patch.
>
> The following high-level APIs are provided to userspace:
>   - OVS_CT_LIMIT_CMD_SET:
>     * set default connection limit for all zones
>     * set the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_DEL:
>     * remove the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_GET:
>     * get the default connection limit for all zones
>     * get the connection limit for a particular zone
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>

I have a few comments, but otherwise the patch looks good.
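
The per-zone limit described in the commit message can be illustrated with a small user-space model. This is a sketch under stated assumptions: `zone_limits`, `zone_limit_get` and `zone_commit` are hypothetical names, a fixed array replaces the patch's RCU hash table, and a plain counter replaces nf_conncount. Only the decision logic (a zone's own limit, else the default, with zero meaning unlimited) is taken from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define LIMIT_UNLIMITED 0u  /* mirrors OVS_CT_LIMIT_UNLIMITED in the patch */
#define MAX_ZONES 8         /* hypothetical small table for the sketch */

struct zone_limits {
	uint32_t default_limit;        /* applies to zones without an entry */
	uint32_t limit[MAX_ZONES];     /* per-zone limit */
	uint8_t has_limit[MAX_ZONES];  /* nonzero if the zone has its own limit */
	uint32_t count[MAX_ZONES];     /* stand-in for the conncount counter */
};

/* A zone uses its own limit if one was set, else the default. */
static uint32_t zone_limit_get(const struct zone_limits *zl, uint16_t zone)
{
	return zl->has_limit[zone] ? zl->limit[zone] : zl->default_limit;
}

/* Try to commit one more connection in a zone: 0 on success, -ENOMEM if
 * the zone is already at its limit. Unlimited zones are not counted. */
static int zone_commit(struct zone_limits *zl, uint16_t zone)
{
	uint32_t limit;

	assert(zl && zone < MAX_ZONES);
	limit = zone_limit_get(zl, zone);
	if (limit == LIMIT_UNLIMITED)
		return 0;
	if (zl->count[zone] + 1 > limit)
		return -ENOMEM;
	zl->count[zone]++;
	return 0;
}
```

In this model a zone that exhausts its own limit gets -ENOMEM while other zones keep committing, which is the isolation property the patch is after.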
> ---
>  net/openvswitch/Kconfig     |   3 +-
>  net/openvswitch/conntrack.c | 541 +++++++++++++++++++++++++++++++++++++++++++-
>  net/openvswitch/conntrack.h |   9 +-
>  net/openvswitch/datapath.c  |   7 +-
>  net/openvswitch/datapath.h  |   3 +
>  5 files changed, 557 insertions(+), 6 deletions(-)
>
> diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
> index 2650205cdaf9..89da9512ec1e 100644
> --- a/net/openvswitch/Kconfig
> +++ b/net/openvswitch/Kconfig
> @@ -9,7 +9,8 @@ config OPENVSWITCH
>                    (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
>                                      (!NF_NAT || NF_NAT) && \
>                                      (!NF_NAT_IPV4 || NF_NAT_IPV4) && \
> -                                    (!NF_NAT_IPV6 || NF_NAT_IPV6)))
> +                                    (!NF_NAT_IPV6 || NF_NAT_IPV6) && \
> +                                    (!NETFILTER_CONNCOUNT || NETFILTER_CONNCOUNT)))
>         select LIBCRC32C
>         select MPLS
>         select NET_MPLS_GSO
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index 02fc343feb66..e8bb91420ca9 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -16,8 +16,11 @@
>  #include <linux/tcp.h>
>  #include <linux/udp.h>
>  #include <linux/sctp.h>
> +#include <linux/static_key.h>
>  #include <net/ip.h>
> +#include <net/genetlink.h>
>  #include <net/netfilter/nf_conntrack_core.h>
> +#include <net/netfilter/nf_conntrack_count.h>
>  #include <net/netfilter/nf_conntrack_helper.h>
>  #include <net/netfilter/nf_conntrack_labels.h>
>  #include <net/netfilter/nf_conntrack_seqadj.h>
> @@ -76,6 +79,31 @@ struct ovs_conntrack_info {
>  #endif
>  };
>
> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +#define OVS_CT_LIMIT_UNLIMITED 0
> +#define OVS_CT_LIMIT_DEFAULT OVS_CT_LIMIT_UNLIMITED
> +#define CT_LIMIT_HASH_BUCKETS 512
> +static DEFINE_STATIC_KEY_FALSE(ovs_ct_limit_enabled);
> +
> +struct ovs_ct_limit {
> +       /* Elements in ovs_ct_limit_info->limits hash table */
> +       struct hlist_node hlist_node;
> +       struct rcu_head rcu;
> +       u16 zone;
> +       u32 limit;
> +};
> +
> +struct ovs_ct_limit_info {
> +       u32 default_limit;
> +       struct hlist_head *limits;
> +       struct nf_conncount_data *data __aligned(8);

Why does it need explicit alignment attribute?

> +};
> +
> +static const struct nla_policy ct_limit_policy[OVS_CT_LIMIT_ATTR_MAX + 1] = {
> +       [OVS_CT_LIMIT_ATTR_ZONE_LIMIT] = { .type = NLA_NESTED, },
> +};
> +#endif
> +
>  static bool labels_nonzero(const struct ovs_key_ct_labels *labels);

...

> +static int ovs_ct_check_limit(struct net *net,
> +                             const struct ovs_conntrack_info *info,
> +                             const struct nf_conntrack_tuple *tuple)
> +{
> +       struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
> +       const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +       u32 per_zone_limit, connections;
> +       u32 conncount_key[5];

If the key size is a single u32, why is an array of 5 defined for the key?

> +
> +       conncount_key[0] = info->zone.id;
> +
> +       per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
> +       if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED)
> +               return 0;
> +
> +       connections = nf_conncount_count(net, ct_limit_info->data,
> +                                        conncount_key, tuple, &info->zone);
> +       if (connections > per_zone_limit)
> +               return -ENOMEM;
> +
> +       return 0;
...

> -void ovs_ct_init(struct net *net)
> +#if    IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT)
> +static int ovs_ct_limit_init(struct net *net, struct ovs_net *ovs_net)
> +{
> +       int i, err;
> +
> +       ovs_net->ct_limit_info = kmalloc(sizeof(*ovs_net->ct_limit_info),
> +                                        GFP_KERNEL);
> +       if (!ovs_net->ct_limit_info)
> +               return -ENOMEM;
> +
> +       ovs_net->ct_limit_info->default_limit = OVS_CT_LIMIT_DEFAULT;
> +       ovs_net->ct_limit_info->limits =
> +               kmalloc_array(CT_LIMIT_HASH_BUCKETS, sizeof(struct hlist_head),
> +                             GFP_KERNEL);
> +       if (!ovs_net->ct_limit_info->limits) {
> +               kfree(ovs_net->ct_limit_info);
> +               return -ENOMEM;
> +       }
> +
> +       for (i = 0; i < CT_LIMIT_HASH_BUCKETS; i++)
> +               INIT_HLIST_HEAD(&ovs_net->ct_limit_info->limits[i]);
> +
> +       ovs_net->ct_limit_info->data =
> +               nf_conncount_init(net, NFPROTO_INET, sizeof(u32));
> +
> +       if (IS_ERR(ovs_net->ct_limit_info->data)) {

Can you print an error message? Otherwise it would be really hard to debug a
namespace launch failure caused by this issue.

> +               err = PTR_ERR(ovs_net->ct_limit_info->data);
> +               kfree(ovs_net->ct_limit_info->limits);
> +               kfree(ovs_net->ct_limit_info);
> +               return err;
> +       }
> +       return 0;
> +}
> +
....
> +static int ovs_ct_limit_del_zone_limit(struct nlattr *nla_zone_limit,
> +                                      struct ovs_ct_limit_info *info)
> +{
> +       struct ovs_zone_limit *zone_limit;
> +       int rem;
> +       u16 zone;
> +
> +       rem = NLA_ALIGN(nla_len(nla_zone_limit));
> +       zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit);
> +
> +       while (rem >= sizeof(*zone_limit)) {
> +               if (unlikely(!check_zone_id(zone_limit->zone_id, &zone))) {
> +                       OVS_NLERR(true, "zone id is out of range");

There is no need to check whether the zone id is out of range when we are
deleting it, since the hash table lookup would fail anyway.

> +               } else {
> +                       ovs_lock();
> +                       ct_limit_del(info, zone);
> +                       ovs_unlock();
> +               }
> +               rem -= NLA_ALIGN(sizeof(*zone_limit));
> +               zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit +
> +                               NLA_ALIGN(sizeof(*zone_limit)));
> +       }
> +

This API does not handle deletion of the default limit.

> +       if (rem)
> +               OVS_NLERR(true, "del zone limit has %d unknown bytes", rem);
> +
> +       return 0;
> +}
> +
> +static int ovs_ct_limit_get_default_limit(struct ovs_ct_limit_info *info,
> +                                         struct sk_buff *reply)
> +{
> +       struct ovs_zone_limit zone_limit;
> +       int err;
> +
> +       zone_limit.zone_id = -1;

This is part of the UAPI. Can you define a constant in openvswitch.h for the
default zone id (-1)?
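
As an illustration of the two points above (a named constant for the default zone, and walking a packed list of requests), here is a user-space sketch. Everything in it is an assumption made for illustration: `ZONE_DEFAULT` is a hypothetical name, `zone_req` is only modelled on the fields the patch uses, `zone_id_valid` guesses at what check_zone_id() does, and `ALIGN4` reproduces the 4-byte rounding of NLA_ALIGN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZONE_DEFAULT (-1)  /* hypothetical name for the UAPI constant */
#define ALIGN4(n) (((size_t)(n) + 3) & ~(size_t)3)  /* 4-byte rounding */

/* Modelled on the zone_id/limit/count fields used in the patch. */
struct zone_req {
	int32_t zone_id;
	uint32_t limit;
	uint32_t count;
};

/* Returns 1 and stores the zone if zone_id fits in a u16, else 0. */
static int zone_id_valid(int32_t zone_id, uint16_t *zone)
{
	if (zone_id < 0 || zone_id > UINT16_MAX)
		return 0;
	*zone = (uint16_t)zone_id;
	return 1;
}

struct walk_result {
	int defaults;     /* requests naming the default zone */
	int zones;        /* requests naming a valid zone */
	int invalid;      /* requests with an out-of-range zone id */
	size_t leftover;  /* trailing bytes too short for a request */
};

/* Walk a buffer of back-to-back requests, as the get/del loops do. */
static struct walk_result walk_requests(const uint8_t *buf, size_t len)
{
	struct walk_result r = { 0, 0, 0, 0 };
	size_t rem = len;

	assert(buf != NULL || len == 0);
	while (rem >= sizeof(struct zone_req)) {
		struct zone_req req;
		uint16_t zone;

		memcpy(&req, buf, sizeof(req));
		if (req.zone_id == ZONE_DEFAULT)
			r.defaults++;
		else if (zone_id_valid(req.zone_id, &zone))
			r.zones++;
		else
			r.invalid++;
		buf += ALIGN4(sizeof(req));
		rem -= ALIGN4(sizeof(req));
	}
	r.leftover = rem;
	return r;
}
```

Testing against a named constant rather than a bare -1 keeps the kernel and userspace in agreement on which id means "default".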

> +       zone_limit.limit = info->default_limit;
> +       err = nla_put_nohdr(reply, sizeof(zone_limit), &zone_limit);
> +       if (err)
> +               return err;
> +
> +       return 0;
> +}
> +
> +static int ovs_ct_limit_get_zone_limit(struct net *net,
> +                                      struct nlattr *nla_zone_limit,
> +                                      struct ovs_ct_limit_info *info,
> +                                      struct sk_buff *reply)
> +{
> +       struct nf_conntrack_zone ct_zone;
> +       struct ovs_zone_limit *zone_limit;
> +       int rem, err;
> +       u32 conncount_key[5];
> +       u16 zone;
> +
> +       rem = NLA_ALIGN(nla_len(nla_zone_limit));
> +       zone_limit = (struct ovs_zone_limit *)nla_data(nla_zone_limit);
> +
> +       while (rem >= sizeof(*zone_limit)) {
> +               if (unlikely(zone_limit->zone_id == -1)) {
> +                       err = ovs_ct_limit_get_default_limit(info, reply);
> +                       if (err)
> +                               return err;
> +               } else if (unlikely(!check_zone_id(zone_limit->zone_id,
> +                                                       &zone))) {
> +                       OVS_NLERR(true, "zone id is out of range");
> +               } else {
> +                       rcu_read_lock();
> +                       zone_limit->limit = ct_limit_get(info, zone);
> +                       rcu_read_unlock();
> +
> +                       nf_ct_zone_init(&ct_zone, zone, NF_CT_DEFAULT_ZONE_DIR,
> +                                       0);
> +                       conncount_key[0] = zone;
> +                       zone_limit->count = nf_conncount_count(
> +                               net, info->data, conncount_key, NULL, &ct_zone);
> +                       err = nla_put_nohdr(reply, sizeof(*zone_limit),
> +                                           zone_limit);
> +                       if (err)
> +                               return err;
> +               }
> +               rem -= NLA_ALIGN(sizeof(*zone_limit));
> +               zone_limit = (struct ovs_zone_limit *)((u8 *)zone_limit +
> +                               NLA_ALIGN(sizeof(*zone_limit)));
> +       }
> +
> +       if (rem)
> +               OVS_NLERR(true, "get zone limit has %d unknown bytes", rem);
> +
> +       return 0;
> +}
> +
> +static int ovs_ct_limit_get_all_zone_limit(struct net *net,
> +                                          struct ovs_ct_limit_info *info,
> +                                          struct sk_buff *reply)
> +{
> +       struct nf_conntrack_zone ct_zone;
> +       struct ovs_zone_limit zone_limit;
> +       struct ovs_ct_limit *ct_limit;
> +       struct hlist_head *head;
> +       u32 conncount_key[5];
> +       int i, err = 0;
> +
> +       err = ovs_ct_limit_get_default_limit(info, reply);
> +       if (err)
> +               return err;
> +
> +       rcu_read_lock();
> +       for (i = 0; i < CT_LIMIT_HASH_BUCKETS; ++i) {
> +               head = &info->limits[i];
> +               hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
> +                       zone_limit.zone_id = ct_limit->zone;
> +                       zone_limit.limit = ct_limit->limit;
> +                       nf_ct_zone_init(&ct_zone, ct_limit->zone,
> +                                       NF_CT_DEFAULT_ZONE_DIR, 0);
> +
> +                       conncount_key[0] = ct_limit->zone;
> +                       zone_limit.count = nf_conncount_count(net, info->data,
> +                                       conncount_key, NULL, &ct_zone);
> +                       err = nla_put_nohdr(reply, sizeof(zone_limit),
> +                                       &zone_limit);
> +                       if (err)
> +                               goto exit_err;
Can you write a single helper function that builds the reply zone_limit
object and can be used in both ovs_ct_limit_get_zone_limit() and
ovs_ct_limit_get_all_zone_limit()?

> +               }
> +       }
> +
> +exit_err:
> +       rcu_read_unlock();
> +       return err;
> +}
> +
...

* [PATCH 0/4] RFC CPSW switchdev mode
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas

Hello, 

This series adds a new mode to the cpsw driver, based on switchdev.
To enable it, you need to enable CONFIG_NET_SWITCHDEV,
CONFIG_BRIDGE_VLAN_FILTERING and CONFIG_TI_CPSW_SWITCHDEV,
and add the following to the udev config:

SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \
        ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"
Since the phys_switch_id is based on the cpsw version, users with a different
version will need to run 'ip -d link show dev sw0p0 | grep switchid' and
replace the value above with the correct one.
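
The udev value 0f011900 is consistent with the bytes of CPSW_VERSION_3 (0x19010f) printed least significant byte first. The sketch below shows that correspondence. It is an assumption about how the ID is derived, not something stated in this cover letter, and `switch_id_hex` and `switch_id_matches` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CPSW_VERSION_2 0x19010cu  /* values as defined by the driver */
#define CPSW_VERSION_3 0x19010fu

/* Format a 32-bit version as hex of its bytes, least significant first. */
static void switch_id_hex(uint32_t version, char out[9])
{
	unsigned int b[4];
	int i;

	assert(out != NULL);
	for (i = 0; i < 4; i++)
		b[i] = (version >> (8 * i)) & 0xffu;
	snprintf(out, 9, "%02x%02x%02x%02x", b[0], b[1], b[2], b[3]);
}

/* Returns 1 if the version would yield the given switch ID string. */
static int switch_id_matches(uint32_t version, const char *expected)
{
	char id[9];

	switch_id_hex(version, id);
	return strcmp(id, expected) == 0;
}
```

Under the same assumption, hardware reporting CPSW_VERSION_2 would need 0c011900 in the udev rule; the 'ip -d link show' output remains the authoritative source.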

This patch creates 3 ports: sw0p0, sw0p1 and sw0p2.
sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices,
while sw0p0 is the switch 'cpu facing port'.
sw0p0 cannot receive or transmit traffic, and it is not 100% within
switchdev scope, but it is used to configure the switch cpu port individually, as
this is needed for various switch features and configuration scenarios.

Bridge setup:
ip link add name br0 type bridge
ip link set dev br0 type bridge ageing_time 1000
ip link set dev br0 type bridge vlan_filtering 1

ip link set dev sw0p1 up
ip link set dev sw0p2 up
ip link set dev sw0p0 up
ip link set dev sw0p0 master br0
ip link set dev sw0p2 master br0
ip link set dev sw0p1 master br0

ip link set br0 address $(cat /sys/class/net/sw0p1/address)
ifconfig br0 up

VLAN config:
untagged:
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 pvid untagged master

tagged:
bridge vlan add dev sw0p1 vid 100 master
bridge vlan add dev sw0p2 vid 100 master

IP address on br0:
bridge vlan add dev br0 vid 100 pvid untagged self
bridge vlan add dev sw0p0 vid 100 pvid untagged master
udhcpc -i br0

FDBs:
bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master

MDBs:
single vlan:
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100

all vlans:
bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent
bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent

Multicast:
setting multicast on and off will affect registered multicast
setting allmulti on and off will affect unregistered multicast
This must occur before adding VLANs on the interfaces. If you change the
flag after the VLAN configuration, you need to re-issue the VLAN config
commands.

Promiscuous mode:
Adding/removing sw0p0 on the bridge will enable/disable ALE_P0_UNI_FLOOD

NFS:
The only way for NFS to work is to chroot to a minimal environment whenever
switch configuration that affects connectivity is needed.
The scripts below assume you are booting over NFS with the eth1 interface (they
are hacky and are only there to prove NFS is doable).

setup.sh:
#!/bin/sh
mkdir proc
mount -t proc none /proc

ifconfig br0  > /dev/null
if [ $? -ne 0 ]; then
        echo "Setting up bridge"
        ip link add name br0 type bridge
        ip link set dev br0 type bridge ageing_time 1000
        ip link set dev br0 type bridge vlan_filtering 1

        ip link set eth1 down 
        ip link set eth1 name sw0p1 
        ip link set dev sw0p1 up
        ip link set dev sw0p2 up
        ip link set dev sw0p0 up
        ip link set dev sw0p0 master br0 
        ip link set dev sw0p2 master br0
        ip link set dev sw0p1 master br0
        ifconfig sw0p1 0.0.0.0
        udhcpc -i br0
fi
umount /proc

run_nfs.sh:
#!/bin/sh
mkdir /tmp/root/bin -p
mkdir /tmp/root/lib -p

cp -r /lib/ /tmp/root/
cp -r /bin/ /tmp/root/
cp /sbin/ip /tmp/root/bin
cp /sbin/bridge /tmp/root/bin
cp /sbin/ifconfig /tmp/root/bin
cp /sbin/udhcpc /tmp/root/bin
cp /path/to/setup.sh /tmp/root/bin
chroot /tmp/root/ busybox sh /bin/setup.sh

run ./run_nfs.sh

This is on top of 4.17-rc2 tree.

P.S.: I am not 100% sure that the promiscuity handling is correct.
Please let me know if I should change anything there.

Ilias Apalodimas (4):
  cpsw: move common headers definitions to cpsw_priv.h
  cpsw_ale: add support functions for switchdev
  cpsw_switchdev: add switchdev support files
  cpsw: add switchdev support

 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw.c           | 610 ++++++++++++++++++++++---------
 drivers/net/ethernet/ti/cpsw_ale.c       | 129 +++++++
 drivers/net/ethernet/ti/cpsw_ale.h       |   8 +
 drivers/net/ethernet/ti/cpsw_priv.h      | 148 ++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 8 files changed, 1039 insertions(+), 169 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

-- 
2.7.4

* [PATCH 1/4] cpsw: move common headers definitions to cpsw_priv.h
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas
In-Reply-To: <1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org>

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c      | 111 +---------------------------
 drivers/net/ethernet/ti/cpsw_priv.h | 141 ++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+), 110 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3037127..b16e7cf 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -41,6 +41,7 @@
 
 #include "cpsw.h"
 #include "cpsw_ale.h"
+#include "cpsw_priv.h"
 #include "cpts.h"
 #include "davinci_cpdma.h"
 
@@ -88,7 +89,6 @@ do {								\
 #define CPSW_VERSION_3		0x19010f
 #define CPSW_VERSION_4		0x190112
 
-#define HOST_PORT_NUM		0
 #define CPSW_ALE_PORTS_NUM	3
 #define SLIVER_SIZE		0x40
 
@@ -309,16 +309,6 @@ struct cpsw_ss_regs {
 #define CPSW_MAX_BLKS_TX_SHIFT		4
 #define CPSW_MAX_BLKS_RX		5
 
-struct cpsw_host_regs {
-	u32	max_blks;
-	u32	blk_cnt;
-	u32	tx_in_ctl;
-	u32	port_vlan;
-	u32	tx_pri_map;
-	u32	cpdma_tx_pri_map;
-	u32	cpdma_rx_chan_map;
-};
-
 struct cpsw_sliver_regs {
 	u32	id_ver;
 	u32	mac_control;
@@ -370,105 +360,6 @@ struct cpsw_hw_stats {
 	u32	rxdmaoverruns;
 };
 
-struct cpsw_slave_data {
-	struct device_node *phy_node;
-	char		phy_id[MII_BUS_ID_SIZE];
-	int		phy_if;
-	u8		mac_addr[ETH_ALEN];
-	u16		dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
-};
-
-struct cpsw_platform_data {
-	struct cpsw_slave_data	*slave_data;
-	u32	ss_reg_ofs;	/* Subsystem control register offset */
-	u32	channels;	/* number of cpdma channels (symmetric) */
-	u32	slaves;		/* number of slave cpgmac ports */
-	u32	active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
-	u32	ale_entries;	/* ale table size */
-	u32	bd_ram_size;  /*buffer descriptor ram size */
-	u32	mac_control;	/* Mac control register */
-	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
-	bool	dual_emac;	/* Enable Dual EMAC mode */
-};
-
-struct cpsw_slave {
-	void __iomem			*regs;
-	struct cpsw_sliver_regs __iomem	*sliver;
-	int				slave_num;
-	u32				mac_control;
-	struct cpsw_slave_data		*data;
-	struct phy_device		*phy;
-	struct net_device		*ndev;
-	u32				port_vlan;
-};
-
-static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
-{
-	return readl_relaxed(slave->regs + offset);
-}
-
-static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
-{
-	writel_relaxed(val, slave->regs + offset);
-}
-
-struct cpsw_vector {
-	struct cpdma_chan *ch;
-	int budget;
-};
-
-struct cpsw_common {
-	struct device			*dev;
-	struct cpsw_platform_data	data;
-	struct napi_struct		napi_rx;
-	struct napi_struct		napi_tx;
-	struct cpsw_ss_regs __iomem	*regs;
-	struct cpsw_wr_regs __iomem	*wr_regs;
-	u8 __iomem			*hw_stats;
-	struct cpsw_host_regs __iomem	*host_port_regs;
-	u32				version;
-	u32				coal_intvl;
-	u32				bus_freq_mhz;
-	int				rx_packet_max;
-	struct cpsw_slave		*slaves;
-	struct cpdma_ctlr		*dma;
-	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
-	struct cpsw_vector		rxv[CPSW_MAX_QUEUES];
-	struct cpsw_ale			*ale;
-	bool				quirk_irq;
-	bool				rx_irq_disabled;
-	bool				tx_irq_disabled;
-	u32 irqs_table[IRQ_NUM];
-	struct cpts			*cpts;
-	int				rx_ch_num, tx_ch_num;
-	int				speed;
-	int				usage_count;
-};
-
-struct cpsw_priv {
-	struct net_device		*ndev;
-	struct device			*dev;
-	u32				msg_enable;
-	u8				mac_addr[ETH_ALEN];
-	bool				rx_pause;
-	bool				tx_pause;
-	u32 emac_port;
-	struct cpsw_common *cpsw;
-};
-
-struct cpsw_stats {
-	char stat_string[ETH_GSTRING_LEN];
-	int type;
-	int sizeof_stat;
-	int stat_offset;
-};
-
-enum {
-	CPSW_STATS,
-	CPDMA_RX_STATS,
-	CPDMA_TX_STATS,
-};
-
 #define CPSW_STAT(m)		CPSW_STATS,				\
 				sizeof(((struct cpsw_hw_stats *)0)->m), \
 				offsetof(struct cpsw_hw_stats, m)
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
new file mode 100644
index 0000000..3b02a83
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -0,0 +1,141 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+
+#define HOST_PORT_NUM		0
+#define IRQ_NUM			2
+#define CPSW_MAX_QUEUES		8
+
+#define CPSW_VERSION_1		0x19010a
+#define CPSW_VERSION_2		0x19010c
+#define CPSW_VERSION_3		0x19010f
+#define CPSW_VERSION_4		0x190112
+
+/* CPSW_PORT_V1 */
+#define CPSW1_MAX_BLKS      0x00 /* Maximum FIFO Blocks */
+#define CPSW1_BLK_CNT       0x04 /* FIFO Block Usage Count (Read Only) */
+#define CPSW1_TX_IN_CTL     0x08 /* Transmit FIFO Control */
+#define CPSW1_PORT_VLAN     0x0c /* VLAN Register */
+#define CPSW1_TX_PRI_MAP    0x10 /* Tx Header Priority to Switch Pri Mapping */
+#define CPSW1_TS_CTL        0x14 /* Time Sync Control */
+#define CPSW1_TS_SEQ_LTYPE  0x18 /* Time Sync Sequence ID Offset and Msg Type */
+#define CPSW1_TS_VLAN       0x1c /* Time Sync VLAN1 and VLAN2 */
+
+/* CPSW_PORT_V2 */
+#define CPSW2_CONTROL       0x00 /* Control Register */
+#define CPSW2_MAX_BLKS      0x08 /* Maximum FIFO Blocks */
+#define CPSW2_BLK_CNT       0x0c /* FIFO Block Usage Count (Read Only) */
+#define CPSW2_TX_IN_CTL     0x10 /* Transmit FIFO Control */
+#define CPSW2_PORT_VLAN     0x14 /* VLAN Register */
+#define CPSW2_TX_PRI_MAP    0x18 /* Tx Header Priority to Switch Pri Mapping */
+#define CPSW2_TS_SEQ_MTYPE  0x1c /* Time Sync Sequence ID Offset and Msg Type */
+
+struct cpsw_slave_data {
+	struct	device_node *phy_node;
+	char	phy_id[MII_BUS_ID_SIZE];
+	int	phy_if;
+	u8	mac_addr[ETH_ALEN];
+	u16	dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
+};
+
+struct cpsw_platform_data {
+	struct cpsw_slave_data	*slave_data;
+	u32	ss_reg_ofs;	/* Subsystem control register offset */
+	u32	channels;	/* number of cpdma channels (symmetric) */
+	u32	slaves;		/* number of slave cpgmac ports */
+	u32	active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
+	u32	ale_entries;	/* ale table size */
+	u32	bd_ram_size;  /*buffer descriptor ram size */
+	u32	mac_control;	/* Mac control register */
+	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
+	bool	dual_emac;	/* Enable Dual EMAC mode */
+};
+
+struct cpsw_slave {
+	void __iomem			*regs;
+	struct cpsw_sliver_regs __iomem	*sliver;
+	int				slave_num;
+	u32				mac_control;
+	struct cpsw_slave_data		*data;
+	struct phy_device		*phy;
+	struct net_device		*ndev;
+	u32				port_vlan;
+};
+
+struct cpsw_vector {
+	struct cpdma_chan *ch;
+	int budget;
+};
+
+struct cpsw_common {
+	struct device			*dev;
+	struct cpsw_platform_data	data;
+	struct napi_struct		napi_rx;
+	struct napi_struct		napi_tx;
+	struct cpsw_ss_regs __iomem	*regs;
+	struct cpsw_wr_regs __iomem	*wr_regs;
+	u8 __iomem			*hw_stats;
+	struct cpsw_host_regs __iomem	*host_port_regs;
+	u32				version;
+	u32				coal_intvl;
+	u32				bus_freq_mhz;
+	int				rx_packet_max;
+	struct cpsw_slave		*slaves;
+	struct cpdma_ctlr		*dma;
+	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
+	struct cpsw_vector		rxv[CPSW_MAX_QUEUES];
+	struct cpsw_ale			*ale;
+	bool				quirk_irq;
+	bool				rx_irq_disabled;
+	bool				tx_irq_disabled;
+	u32				irqs_table[IRQ_NUM];
+	struct cpts			*cpts;
+	int				rx_ch_num, tx_ch_num;
+	int				speed;
+	int				usage_count;
+};
+
+struct cpsw_priv {
+	struct net_device	*ndev;
+	struct device		*dev;
+	u32			msg_enable;
+	u8			mac_addr[ETH_ALEN];
+	bool			rx_pause;
+	bool			tx_pause;
+	u8			port_state[3];
+	u32			emac_port;
+	struct cpsw_common	*cpsw;
+};
+
+struct cpsw_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int type;
+	int sizeof_stat;
+	int stat_offset;
+};
+
+enum {
+	CPSW_STATS,
+	CPDMA_RX_STATS,
+	CPDMA_TX_STATS,
+};
+
+struct cpsw_host_regs {
+	u32	max_blks;
+	u32	blk_cnt;
+	u32	tx_in_ctl;
+	u32	port_vlan;
+	u32	tx_pri_map;
+	u32	cpdma_tx_pri_map;
+	u32	cpdma_rx_chan_map;
+};
+
+static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
+{
+	return readl_relaxed(slave->regs + offset);
+}
+
+static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
+{
+	writel_relaxed(val, slave->regs + offset);
+}
-- 
2.7.4


* [PATCH 2/4] cpsw_ale: add support functions for switchdev
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas
In-Reply-To: <1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org>

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
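Note: the *_add_modify/*_del_modify helpers below are read-modify-write
wrappers around the existing ALE add/del calls. They read the current member
bitmap for the entry (an entry that is not found reads as an empty bitmap,
since the local ale_entry buffer is zero-initialised), then OR in or mask out
the requested ports. A minimal standalone sketch of just that mask arithmetic,
assuming a hypothetical demo_entry type and demo_* names that are not part of
the driver:

```c
#include <stdint.h>

/* Hypothetical stand-in for an ALE entry: only the port member bitmap. */
struct demo_entry {
	uint32_t members;	/* bit n set => port n is a member */
};

/* Add: keep every existing member and OR in the requested ports. */
static uint32_t demo_add_modify(const struct demo_entry *e, uint32_t port_mask)
{
	return e->members | port_mask;
}

/* Delete: clear only the requested ports, leave the others untouched. */
static uint32_t demo_del_modify(const struct demo_entry *e, uint32_t port_mask)
{
	return e->members & ~port_mask;
}
```

With members = 0x3 (ports 0 and 1), adding port 2 yields 0x7 and removing
port 0 yields 0x2. The resulting bitmap is what the helpers in this patch
hand to cpsw_ale_add_mcast()/cpsw_ale_del_mcast() and their VLAN
counterparts.
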
 drivers/net/ethernet/ti/cpsw_ale.c | 129 +++++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_ale.h |   8 +++
 2 files changed, 137 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 93dc05c..0b7383f 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -409,6 +409,41 @@ int cpsw_ale_del_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_mcast);
 
+static int cpsw_ale_read_mc(struct cpsw_ale *ale, u8 *addr, int flags, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_port_mask(ale_entry, ale->port_mask_bits);
+}
+
+int cpsw_ale_mcast_add_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid, int mcast_state)
+{
+	int mcast_members, ret;
+
+	mcast_members = cpsw_ale_read_mc(ale, addr, flags, vid) | port_mask;
+	ret = cpsw_ale_add_mcast(ale, addr, mcast_members, flags, vid, 0);
+
+	return ret;
+}
+
+int cpsw_ale_mcast_del_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid)
+{
+	int mcast_members, ret;
+
+	mcast_members = cpsw_ale_read_mc(ale, addr, flags, vid) & ~port_mask;
+	ret = cpsw_ale_del_mcast(ale, addr, mcast_members, flags, vid);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_mcast_del_modify);
+
 /* ALE NetCP NU switch specific vlan functions */
 static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 				    int reg_mcast, int unreg_mcast)
@@ -424,6 +459,52 @@ static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 	writel(unreg_mcast, ale->params.ale_regs + ALE_VLAN_MASK_MUX(idx));
 }
 
+static int cpsw_ale_read_untagged(struct cpsw_ale *ale, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_vlan_untag_force(ale_entry, ale->vlan_field_bits);
+}
+
+/* returns mask of current members for specified vlan */
+static int cpsw_ale_read_vlan_members(struct cpsw_ale *ale, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_vlan_member_list(ale_entry, ale->vlan_field_bits);
+}
+
+/* returns the registered or unregistered multicast flood mask of a vlan */
+static int cpsw_ale_read_reg_unreg_mc(struct cpsw_ale *ale, u16 vid, bool unreg)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+	int ret;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	if (unreg)
+		ret = cpsw_ale_get_vlan_unreg_mcast(ale_entry,
+						    ale->vlan_field_bits);
+	else
+		ret = cpsw_ale_get_vlan_reg_mcast(ale_entry,
+						  ale->vlan_field_bits);
+
+	return ret;
+}
+
 int cpsw_ale_add_vlan(struct cpsw_ale *ale, u16 vid, int port, int untag,
 		      int reg_mcast, int unreg_mcast)
 {
@@ -482,6 +563,54 @@ int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port_mask)
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_vlan);
 
+int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
+			     int untag_mask, int reg_mask, int unreg_mask)
+{
+	int ret = 0;
+	int vlan_members = cpsw_ale_read_vlan_members(ale, vid) & ~port_mask;
+	int reg_mcast_members =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 0) & ~port_mask;
+	int unreg_mcast_members =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 1) & ~port_mask;
+	int untag_members = cpsw_ale_read_untagged(ale, vid) & ~port_mask;
+
+	vlan_members |= port_mask;
+	untag_members |= untag_mask;
+	reg_mcast_members |= reg_mask;
+	unreg_mcast_members |= unreg_mask;
+
+	ret = cpsw_ale_add_vlan(ale, vid, vlan_members, untag_members,
+				reg_mcast_members, unreg_mcast_members);
+	if (ret) {
+		dev_err(ale->params.dev, "Unable to add vlan\n");
+		return ret;
+	}
+	dev_dbg(ale->params.dev, "port mask 0x%x untag 0x%x\n", vlan_members,
+		untag_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_vlan_add_modify);
+
+int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask)
+{
+	int ret = 0;
+	int vlan_members;
+
+	vlan_members = cpsw_ale_read_vlan_members(ale, vid);
+	vlan_members &= ~port_mask;
+
+	ret = cpsw_ale_del_vlan(ale, vid, vlan_members);
+	if (ret) {
+		dev_err(ale->params.dev, "Unable to del vlan\n");
+		return ret;
+	}
+	dev_dbg(ale->params.dev, "port mask 0x%x\n", port_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_vlan_del_modify);
+
 void cpsw_ale_set_allmulti(struct cpsw_ale *ale, int allmulti)
 {
 	u32 ale_entry[ALE_ENTRY_WORDS];
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index d4fe901..bc29616 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -123,4 +123,12 @@ int cpsw_ale_control_set(struct cpsw_ale *ale, int port,
 			 int control, int value);
 void cpsw_ale_dump(struct cpsw_ale *ale, u32 *data);
 
+int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
+			     int untag_mask, int reg_mcast, int unreg_mcast);
+int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask);
+int cpsw_ale_mcast_add_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid, int mcast_state);
+int cpsw_ale_mcast_del_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid);
+
 #endif
-- 
2.7.4


* [PATCH 3/4] cpsw_switchdev: add switchdev support files
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas
In-Reply-To: <1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org>

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
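Note: cpsw_set_pvid()/cpsw_get_pvid() below pack and unpack the port VLAN
register as VID in bits 0-11, CFI in bit 12 and the priority (cos) in bits
13-15. A minimal standalone sketch of that layout, assuming hypothetical
DEMO_*/demo_* names; unlike the patch, the sketch also masks the VID to 12
bits when packing:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VID_MASK	0xfffu		/* bits 0-11: VLAN ID */
#define DEMO_CFI_BIT	(1u << 12)	/* bit 12: CFI */
#define DEMO_COS_SHIFT	13		/* bits 13-15: priority */

/* Build the register value from its three fields. */
static uint32_t demo_pack_port_vlan(uint16_t vid, bool cfi, uint32_t cos)
{
	uint32_t val = vid & DEMO_VID_MASK;

	if (cfi)
		val |= DEMO_CFI_BIT;
	val |= (cos & 0x7u) << DEMO_COS_SHIFT;

	return val;
}

/* Recover only the VLAN ID, as cpsw_get_pvid() does with "& 0xfff". */
static uint16_t demo_unpack_vid(uint32_t val)
{
	return (uint16_t)(val & DEMO_VID_MASK);
}
```

For example, vid 5 with CFI set and cos 3 packs to 0x7005, and unpacking
that value gives back vid 5.
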
 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw_switchdev.c | 299 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 4 files changed, 313 insertions(+)
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 48a541e..b22ae7d 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -73,6 +73,15 @@ config TI_CPSW
 	  To compile this driver as a module, choose M here: the module
 	  will be called cpsw.
 
+config TI_CPSW_SWITCHDEV
+	bool "TI CPSW switchdev support"
+	depends on TI_CPSW
+	depends on NET_SWITCHDEV
+	help
+	  Enable switchdev support on TI's CPSW Ethernet Switch.
+
+	  This will allow you to configure the switch using standard tools.
+
 config TI_CPTS
 	bool "TI Common Platform Time Sync (CPTS) Support"
 	depends on TI_CPSW || TI_KEYSTONE_NETCP
diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
index 0be551d..3926c6a 100644
--- a/drivers/net/ethernet/ti/Makefile
+++ b/drivers/net/ethernet/ti/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_TI_CPSW_PHY_SEL) += cpsw-phy-sel.o
 obj-$(CONFIG_TI_CPSW_ALE) += cpsw_ale.o
 obj-$(CONFIG_TI_CPTS_MOD) += cpts.o
 obj-$(CONFIG_TI_CPSW) += ti_cpsw.o
+ti_cpsw-objs:= cpsw_switchdev.o
 ti_cpsw-y := cpsw.o
 
 obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
new file mode 100644
index 0000000..bf8c1bf
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Texas Instruments switchdev Driver
+ *
+ * Copyright (C) 2018 Texas Instruments
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/if_bridge.h>
+#include <net/switchdev.h>
+#include "cpsw.h"
+#include "cpsw_priv.h"
+#include "cpsw_ale.h"
+
+static u32 cpsw_switchdev_get_ver(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	return cpsw->version;
+}
+
+static int cpsw_port_attr_set(struct net_device *dev,
+			      const struct switchdev_attr *attr,
+			      struct switchdev_trans *trans)
+{
+	return -EOPNOTSUPP;
+}
+
+static int cpsw_port_attr_get(struct net_device *dev,
+			      struct switchdev_attr *attr)
+{
+	u32 cpsw_ver;
+	int err = 0;
+
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+		cpsw_ver = cpsw_switchdev_get_ver(dev);
+		attr->u.ppid.id_len = sizeof(cpsw_ver);
+		memcpy(&attr->u.ppid.id, &cpsw_ver, attr->u.ppid.id_len);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
+static u16 cpsw_get_pvid(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	u32 __iomem *port_vlan_reg;
+	u32 pvid;
+
+	if (priv->emac_port) {
+		int reg = CPSW2_PORT_VLAN;
+
+		if (cpsw->version == CPSW_VERSION_1)
+			reg = CPSW1_PORT_VLAN;
+		pvid = slave_read(cpsw->slaves + (priv->emac_port - 1), reg);
+	} else {
+		port_vlan_reg = &cpsw->host_port_regs->port_vlan;
+		pvid = readl(port_vlan_reg);
+	}
+
+	pvid = pvid & 0xfff;
+
+	return pvid;
+}
+
+static void cpsw_set_pvid(struct cpsw_priv *priv, u16 vid, bool cfi, u32 cos)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	void __iomem *port_vlan_reg;
+	u32 pvid;
+
+	pvid = vid;
+	pvid |= cfi ? BIT(12) : 0;
+	pvid |= (cos & 0x7) << 13;
+
+	if (priv->emac_port) {
+		int reg = CPSW2_PORT_VLAN;
+
+		if (cpsw->version == CPSW_VERSION_1)
+			reg = CPSW1_PORT_VLAN;
+		/* no barrier */
+		slave_write(cpsw->slaves + (priv->emac_port - 1), pvid, reg);
+	} else {
+		/* CPU port */
+		port_vlan_reg = &cpsw->host_port_regs->port_vlan;
+		writel(pvid, port_vlan_reg);
+	}
+}
+
+static int cpsw_port_vlan_add(struct cpsw_priv *priv, bool untag, bool pvid,
+			      u16 vid)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask = BIT(priv->emac_port);
+	int unreg_mcast_mask = 0;
+	int reg_mcast_mask = 0;
+	int untag_mask = 0;
+	int ret = 0;
+
+	if (priv->ndev->flags & IFF_ALLMULTI)
+		unreg_mcast_mask = port_mask;
+
+	if (priv->ndev->flags & IFF_MULTICAST)
+		reg_mcast_mask = port_mask;
+
+	if (untag)
+		untag_mask = port_mask;
+
+	ret = cpsw_ale_vlan_add_modify(cpsw->ale, vid, port_mask, untag_mask,
+				       reg_mcast_mask, unreg_mcast_mask);
+	if (ret) {
+		dev_err(priv->dev, "Unable to add vlan\n");
+		return ret;
+	}
+
+	if (!pvid)
+		return ret;
+
+	cpsw_set_pvid(priv, vid, 0, 0);
+
+	dev_dbg(priv->dev, "VID: %u dev: %s port: %u\n", vid,
+		priv->ndev->name, priv->emac_port);
+
+	return ret;
+}
+
+static int cpsw_port_vlan_del(struct cpsw_priv *priv, u16 vid)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask = BIT(priv->emac_port);
+	int ret = 0;
+
+	ret = cpsw_ale_vlan_del_modify(cpsw->ale, vid, port_mask);
+	if (ret != 0)
+		return ret;
+
+	ret = cpsw_ale_del_ucast(cpsw->ale, priv->mac_addr,
+				 HOST_PORT_NUM, ALE_VLAN, vid);
+
+	if (vid == cpsw_get_pvid(priv))
+		cpsw_set_pvid(priv, 0, 0, 0);
+
+	if (ret != 0) {
+		dev_dbg(priv->dev, "Failed to delete unicast entry\n");
+		ret = 0;
+	}
+
+	ret = cpsw_ale_del_mcast(cpsw->ale, priv->ndev->broadcast,
+				 0, ALE_VLAN, vid);
+	if (ret != 0) {
+		dev_dbg(priv->dev, "Failed to delete multicast entry\n");
+		ret = 0;
+	}
+
+	return ret;
+}
+
+static int cpsw_port_vlans_add(struct cpsw_priv *priv,
+			       const struct switchdev_obj_port_vlan *vlan,
+			       struct switchdev_trans *trans)
+{
+	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
+	bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
+	u16 vid;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
+		int err;
+
+		err = cpsw_port_vlan_add(priv, untagged, pvid, vid);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int cpsw_port_vlans_del(struct cpsw_priv *priv,
+			       const struct switchdev_obj_port_vlan *vlan)
+
+{
+	u16 vid;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
+		int err;
+
+		err = cpsw_port_vlan_del(priv, vid);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int cpsw_port_mdb_add(struct cpsw_priv *priv,
+			     struct switchdev_obj_port_mdb *mdb,
+			     struct switchdev_trans *trans)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask;
+	int err;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	port_mask = BIT(priv->emac_port);
+	err = cpsw_ale_mcast_add_modify(cpsw->ale, mdb->addr, port_mask,
+					ALE_VLAN, mdb->vid, 0);
+
+	return err;
+}
+
+static int cpsw_port_mdb_del(struct cpsw_priv *priv,
+			     struct switchdev_obj_port_mdb *mdb)
+
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	int del_mask;
+	int err;
+
+	del_mask = BIT(priv->emac_port);
+	err = cpsw_ale_mcast_del_modify(cpsw->ale, mdb->addr, del_mask,
+					ALE_VLAN, mdb->vid);
+
+	return err;
+}
+
+static int cpsw_port_obj_add(struct net_device *ndev,
+			     const struct switchdev_obj *obj,
+			     struct switchdev_trans *trans)
+{
+	struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+	struct switchdev_obj_port_mdb *mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	int err = 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		err = cpsw_port_vlans_add(priv, vlan, trans);
+		break;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+		err = cpsw_port_mdb_add(priv, mdb, trans);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static int cpsw_port_obj_del(struct net_device *ndev,
+			     const struct switchdev_obj *obj)
+{
+	struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	int err = 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		err = cpsw_port_vlans_del(priv, vlan);
+		break;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+		err = cpsw_port_mdb_del(priv, SWITCHDEV_OBJ_PORT_MDB(obj));
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static const struct switchdev_ops cpsw_port_switchdev_ops = {
+	.switchdev_port_attr_set	= cpsw_port_attr_set,
+	.switchdev_port_attr_get	= cpsw_port_attr_get,
+	.switchdev_port_obj_add		= cpsw_port_obj_add,
+	.switchdev_port_obj_del		= cpsw_port_obj_del,
+};
+
+void cpsw_port_switchdev_init(struct net_device *ndev)
+{
+	ndev->switchdev_ops = &cpsw_port_switchdev_ops;
+}
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.h b/drivers/net/ethernet/ti/cpsw_switchdev.h
new file mode 100644
index 0000000..4940462
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.h
@@ -0,0 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <net/switchdev.h>
+
+void cpsw_port_switchdev_init(struct net_device *ndev);
-- 
2.7.4


* [PATCH 4/4] cpsw: add switchdev support
From: Ilias Apalodimas @ 2018-05-24  6:56 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera
  Cc: francois.ozog, yogeshs, spatton, Ilias Apalodimas
In-Reply-To: <1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org>

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
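Note: with this patch priv->emac_port holds the switch port number rather
than the slave array index, so slave lookups use emac_port - 1 and the host
port has no slave behind it. A minimal standalone sketch of the mapping done
by the new cpsw_slave_index(), assuming HOST_PORT_NUM is 0 (its definition is
not part of this diff) and using hypothetical demo_* names; the
switchdev_built flag stands in for IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV):

```c
#include <stdbool.h>

#define DEMO_HOST_PORT_NUM	0	/* assumption: host (CPU) port is port 0 */

/*
 * Map a port number to an index into the slaves array.
 * Returns -1 for the host port when switchdev support is built in.
 */
static int demo_slave_index(bool switchdev_built, int switch_mode,
			    int emac_port, int active_slave)
{
	if (switchdev_built && emac_port == DEMO_HOST_PORT_NUM)
		return -1;

	/* Non-zero switch_mode: ports are 1-based, slaves are 0-based. */
	return switch_mode ? emac_port - 1 : active_slave;
}
```

Callers that can be reached for the host port check for a negative index
before touching cpsw->slaves[], which is what the added "slave_no < 0" tests
in the ethtool and ioctl paths do.
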
 drivers/net/ethernet/ti/cpsw.c      | 503 +++++++++++++++++++++++++++++++-----
 drivers/net/ethernet/ti/cpsw_priv.h |   9 +-
 2 files changed, 450 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index b16e7cf..8f8ebd8 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -18,12 +18,10 @@
 #include <linux/clk.h>
 #include <linux/timer.h>
 #include <linux/module.h>
-#include <linux/platform_device.h>
 #include <linux/irqreturn.h>
 #include <linux/interrupt.h>
 #include <linux/if_ether.h>
 #include <linux/etherdevice.h>
-#include <linux/netdevice.h>
 #include <linux/net_tstamp.h>
 #include <linux/phy.h>
 #include <linux/workqueue.h>
@@ -42,6 +40,7 @@
 #include "cpsw.h"
 #include "cpsw_ale.h"
 #include "cpsw_priv.h"
+#include "cpsw_switchdev.h"
 #include "cpts.h"
 #include "davinci_cpdma.h"
 
@@ -146,9 +145,6 @@ do {								\
 #define CPSW_CMINTMAX_INTVL	(1000 / CPSW_CMINTMIN_CNT)
 #define CPSW_CMINTMIN_INTVL	((1000 / CPSW_CMINTMAX_CNT) + 1)
 
-#define cpsw_slave_index(cpsw, priv)				\
-		((cpsw->data.dual_emac) ? priv->emac_port :	\
-		cpsw->data.active_slave)
 #define IRQ_NUM			2
 #define CPSW_MAX_QUEUES		8
 #define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
@@ -360,6 +356,13 @@ struct cpsw_hw_stats {
 	u32	rxdmaoverruns;
 };
 
+struct cpsw_switchdev_event_work {
+	struct work_struct work;
+	struct switchdev_notifier_fdb_info fdb_info;
+	struct cpsw_priv *priv;
+	unsigned long event;
+};
+
 #define CPSW_STAT(m)		CPSW_STATS,				\
 				sizeof(((struct cpsw_hw_stats *)0)->m), \
 				offsetof(struct cpsw_hw_stats, m)
@@ -433,18 +436,22 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 		struct cpsw_slave *slave;				\
 		struct cpsw_common *cpsw = (priv)->cpsw;		\
 		int n;							\
-		if (cpsw->data.dual_emac)				\
-			(func)((cpsw)->slaves + priv->emac_port, ##arg);\
-		else							\
+		if (cpsw->data.switch_mode) {				\
+			if (priv->emac_port == HOST_PORT_NUM)		\
+				break;					\
+			(func)((cpsw)->slaves + (priv->emac_port - 1),  \
+			       ##arg);\
+		} else {						\
 			for (n = cpsw->data.slaves,			\
 					slave = cpsw->slaves;		\
 					n; n--)				\
 				(func)(slave++, ##arg);			\
+		}							\
 	} while (0)
 
 #define cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb)		\
 	do {								\
-		if (!cpsw->data.dual_emac)				\
+		if (!cpsw->data.switch_mode)				\
 			break;						\
 		if (CPDMA_RX_SOURCE_PORT(status) == 1) {		\
 			ndev = cpsw->slaves[0].ndev;			\
@@ -456,11 +463,13 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 	} while (0)
 #define cpsw_add_mcast(cpsw, priv, addr)				\
 	do {								\
-		if (cpsw->data.dual_emac) {				\
+		if (cpsw->data.switch_mode) {				\
 			struct cpsw_slave *slave = cpsw->slaves +	\
-						priv->emac_port;	\
+						(priv->emac_port - 1);	\
 			int slave_port = cpsw_get_slave_port(		\
 						slave->slave_num);	\
+			if (priv->emac_port == HOST_PORT_NUM)		\
+				break;					\
 			cpsw_ale_add_mcast(cpsw->ale, addr,		\
 				1 << slave_port | ALE_PORT_HOST,	\
 				ALE_VLAN, slave->port_vlan, 0);		\
@@ -476,13 +485,41 @@ static inline int cpsw_get_slave_port(u32 slave_num)
 	return slave_num + 1;
 }
 
+static int cpsw_is_dual_mac(u8 switch_mode)
+{
+	return switch_mode == CPSW_DUAL_EMAC;
+}
+
+static int cpsw_is_switchdev(u8 switch_mode)
+{
+	return switch_mode == CPSW_SWITCHDEV;
+}
+
+static int cpsw_is_switch(u8 switch_mode)
+{
+	return switch_mode == CPSW_TI_SWITCH;
+}
+
+static int cpsw_slave_index(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (priv->emac_port == HOST_PORT_NUM)
+		return -1;
+#endif
+
+	return cpsw->data.switch_mode ? priv->emac_port - 1 :
+		cpsw->data.active_slave;
+}
+
 static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 {
 	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
 	struct cpsw_ale *ale = cpsw->ale;
 	int i;
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		bool flag = false;
 
 		/* Enabling promiscuous mode for one interface will be
@@ -508,7 +545,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			cpsw_ale_control_set(ale, 0, ALE_BYPASS, 0);
 			dev_dbg(&ndev->dev, "promiscuity disabled\n");
 		}
-	} else {
+	} else if (cpsw_is_switch(cpsw->data.switch_mode)) {
 		if (enable) {
 			unsigned long timeout = jiffies + HZ;
 
@@ -548,6 +585,18 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			}
 			dev_dbg(&ndev->dev, "promiscuity disabled\n");
 		}
+	} else if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		/* When interfaces are placed into a bridge they'll switch to
+		 * promiscuous mode. In the switchdev case ALE_P0_UNI_FLOOD
+		 * depends on whether the cpu port participates in the bridge
+		 */
+		struct cpsw_priv *priv = netdev_priv(ndev);
+		int slave_idx = cpsw_slave_index(priv);
+		int slave_num;
+
+		slave_num = cpsw_get_slave_port(slave_idx);
+		cpsw_ale_control_set(ale, slave_num, ALE_PORT_NOLEARN, 0);
+		cpsw_ale_control_set(ale, slave_num, ALE_PORT_NO_SA_UPDATE, 0);
 	}
 }
 
@@ -555,10 +604,11 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	int vid;
 
-	if (cpsw->data.dual_emac)
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode))
+		vid = cpsw->slaves[slave_no].port_vlan;
 	else
 		vid = cpsw->data.default_vlan;
 
@@ -629,8 +679,9 @@ static void cpsw_tx_handler(void *token, int len, int status)
 static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 {
 	struct cpsw_priv *priv = netdev_priv(skb->dev);
-	struct cpsw_common *cpsw = priv->cpsw;
 	u32 rx_vlan_encap_hdr = *((u32 *)skb->data);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	u16 vtag, vid, prio, pkt_type;
 
 	/* Remove VLAN header encapsulation word */
@@ -651,8 +702,8 @@ static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 	if (!vid)
 		return;
 	/* Ignore default vlans in dual mac mode */
-	if (cpsw->data.dual_emac &&
-	    vid == cpsw->slaves[priv->emac_port].port_vlan)
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode) &&
+	    vid == cpsw->slaves[slave_no].port_vlan)
 		return;
 
 	prio = (rx_vlan_encap_hdr >>
@@ -681,9 +732,9 @@ static void cpsw_rx_handler(void *token, int len, int status)
 	cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb);
 
 	if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
-		/* In dual emac mode check for all interfaces */
-		if (cpsw->data.dual_emac && cpsw->usage_count &&
-		    (status >= 0)) {
+		/* In any mode other than switch mode, check all interfaces */
+		if (!cpsw_is_switch(cpsw->data.switch_mode) &&
+		    cpsw->usage_count && status >= 0) {
 			/* The packet received is for the interface which
 			 * is already down and the other interface is up
 			 * and running, instead of freeing which results
@@ -699,6 +750,11 @@ static void cpsw_rx_handler(void *token, int len, int status)
 		return;
 	}
 
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		skb->offload_fwd_mark = 1;
+#endif
+
 	new_skb = netdev_alloc_skb_ip_align(ndev, cpsw->rx_packet_max);
 	if (new_skb) {
 		skb_copy_queue_mapping(new_skb, skb);
@@ -1206,11 +1262,10 @@ static inline int cpsw_tx_packet_submit(struct cpsw_priv *priv,
 					struct sk_buff *skb,
 					struct cpdma_chan *txch)
 {
-	struct cpsw_common *cpsw = priv->cpsw;
-
 	skb_tx_timestamp(skb);
+
 	return cpdma_chan_submit(txch, skb, skb->data, skb->len,
-				 priv->emac_port + cpsw->data.dual_emac);
+				 priv->emac_port);
 }
 
 static inline void cpsw_add_dual_emac_def_ale_entries(
@@ -1283,7 +1338,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 
 	slave_port = cpsw_get_slave_port(slave->slave_num);
 
-	if (cpsw->data.dual_emac)
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode))
 		cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
 	else
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
@@ -1362,8 +1417,8 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	control_reg = readl(&cpsw->regs->control);
 	control_reg |= CPSW_VLAN_AWARE | CPSW_RX_VLAN_ENCAP;
 	writel(control_reg, &cpsw->regs->control);
-	fifo_mode = (cpsw->data.dual_emac) ? CPSW_FIFO_DUAL_MAC_MODE :
-		     CPSW_FIFO_NORMAL_MODE;
+	fifo_mode = cpsw_is_dual_mac(cpsw->data.switch_mode) ?
+		CPSW_FIFO_DUAL_MAC_MODE : CPSW_FIFO_NORMAL_MODE;
 	writel(fifo_mode, &cpsw->host_port_regs->tx_in_ctl);
 
 	/* setup host port priority mapping */
@@ -1374,7 +1429,7 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM,
 			     ALE_PORT_STATE, ALE_PORT_STATE_FORWARD);
 
-	if (!cpsw->data.dual_emac) {
+	if (!cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		cpsw_ale_add_ucast(cpsw->ale, priv->mac_addr, HOST_PORT_NUM,
 				   0, 0);
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
@@ -1474,14 +1529,19 @@ static int cpsw_ndo_open(struct net_device *ndev)
 	/* Initialize host and slave ports */
 	if (!cpsw->usage_count)
 		cpsw_init_host_port(priv);
-	for_each_slave(priv, cpsw_slave_open, priv);
 
-	/* Add default VLAN */
-	if (!cpsw->data.dual_emac)
-		cpsw_add_default_vlan(priv);
-	else
-		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
-				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
+	if (!IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV) ||
+	    (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV) &&
+	     priv->emac_port != HOST_PORT_NUM)) {
+		for_each_slave(priv, cpsw_slave_open, priv);
+
+		/* Add default VLAN */
+		if (cpsw_is_dual_mac(cpsw->data.switch_mode))
+			cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
+					  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
+		else
+			cpsw_add_default_vlan(priv);
+	}
 
 	/* initialize shared resources for every ndev */
 	if (!cpsw->usage_count) {
@@ -1575,6 +1635,13 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 	struct cpdma_chan *txch;
 	int ret, q_idx;
 
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (priv->emac_port == HOST_PORT_NUM) {
+		dev_kfree_skb_any(skb);
+		return NETDEV_TX_OK;
+	}
+#endif
+
 	if (skb_padto(skb, CPSW_MIN_PACKET_SIZE)) {
 		cpsw_err(priv, tx_err, "packet pad failed\n");
 		ndev->stats.tx_dropped++;
@@ -1655,8 +1722,12 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv)
 	struct cpsw_slave *slave;
 	struct cpsw_common *cpsw = priv->cpsw;
 	u32 ctrl, mtype;
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return;
 
-	slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+	slave = &cpsw->slaves[slave_no];
 
 	ctrl = slave_read(slave, CPSW2_CONTROL);
 	switch (cpsw->version) {
@@ -1791,11 +1862,14 @@ static int cpsw_ndo_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
 {
 	struct cpsw_priv *priv = netdev_priv(dev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
 
 	if (!netif_running(dev))
 		return -EINVAL;
 
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
+
 	switch (cmd) {
 	case SIOCSHWTSTAMP:
 		return cpsw_hwtstamp_set(dev, req);
@@ -1832,6 +1906,7 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct sockaddr *addr = (struct sockaddr *)p;
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	int flags = 0;
 	u16 vid = 0;
 	int ret;
@@ -1845,8 +1920,8 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+		vid = cpsw->slaves[slave_no].port_vlan;
 		flags = ALE_VLAN;
 	}
 
@@ -1884,8 +1959,11 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 	u32 port_mask;
 	struct cpsw_common *cpsw = priv->cpsw;
 
-	if (cpsw->data.dual_emac) {
-		port_mask = (1 << (priv->emac_port + 1)) | ALE_PORT_HOST;
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		return -EOPNOTSUPP;
+
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+		port_mask = (1 << priv->emac_port) | ALE_PORT_HOST;
 
 		if (priv->ndev->flags & IFF_ALLMULTI)
 			unreg_mcast_mask = port_mask;
@@ -1929,6 +2007,9 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		return 0;
+
 	if (vid == cpsw->data.default_vlan)
 		return 0;
 
@@ -1938,7 +2019,7 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		/* In dual EMAC, reserved VLAN id should not be used for
 		 * creating VLAN interfaces as this can break the dual
 		 * EMAC port separation
@@ -1965,6 +2046,9 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		return 0;
+
 	if (vid == cpsw->data.default_vlan)
 		return 0;
 
@@ -1974,7 +2058,7 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
@@ -1999,6 +2083,24 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 	return ret;
 }
 
+static int cpsw_ndo_get_phys_port_name(struct net_device *ndev, char *name,
+				       size_t len)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int err;
+
+	if (!cpsw_is_switchdev(cpsw->data.switch_mode))
+		return -EOPNOTSUPP;
+
+	err = snprintf(name, len, "p%d", priv->emac_port);
+
+	if (err >= len)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
@@ -2065,6 +2167,7 @@ static const struct net_device_ops cpsw_netdev_ops = {
 #endif
 	.ndo_vlan_rx_add_vid	= cpsw_ndo_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= cpsw_ndo_vlan_rx_kill_vid,
+	.ndo_get_phys_port_name = cpsw_ndo_get_phys_port_name,
 };
 
 static int cpsw_get_regs_len(struct net_device *ndev)
@@ -2152,7 +2255,10 @@ static int cpsw_get_link_ksettings(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (!cpsw->slaves[slave_no].phy)
 		return -EOPNOTSUPP;
@@ -2166,7 +2272,10 @@ static int cpsw_set_link_ksettings(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_ksettings_set(cpsw->slaves[slave_no].phy,
@@ -2179,7 +2288,10 @@ static void cpsw_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return;
 
 	wol->supported = 0;
 	wol->wolopts = 0;
@@ -2192,7 +2304,10 @@ static int cpsw_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_set_wol(cpsw->slaves[slave_no].phy, wol);
@@ -2451,7 +2566,10 @@ static int cpsw_get_eee(struct net_device *ndev, struct ethtool_eee *edata)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_get_eee(cpsw->slaves[slave_no].phy, edata);
@@ -2463,7 +2581,10 @@ static int cpsw_set_eee(struct net_device *ndev, struct ethtool_eee *edata)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_set_eee(cpsw->slaves[slave_no].phy, edata);
@@ -2475,7 +2596,10 @@ static int cpsw_nway_reset(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return genphy_restart_aneg(cpsw->slaves[slave_no].phy);
@@ -2626,7 +2750,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 	data->mac_control = prop;
 
 	if (of_property_read_bool(node, "dual_emac"))
-		data->dual_emac = 1;
+		data->switch_mode = CPSW_DUAL_EMAC;
+
+	/* switchdev overrides DTS */
+	if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
+		data->switch_mode = CPSW_SWITCHDEV;
 
 	/*
 	 * Populate all the child nodes here...
@@ -2707,7 +2835,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 			if (ret)
 				return ret;
 		}
-		if (data->dual_emac) {
+		if (cpsw_is_dual_mac(data->switch_mode)) {
 			if (of_property_read_u32(slave_node, "dual_emac_res_vlan",
 						 &prop)) {
 				dev_err(&pdev->dev, "Missing dual_emac_res_vlan in DT.\n");
@@ -2787,9 +2915,13 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	}
 	memcpy(ndev->dev_addr, priv_sl2->mac_addr, ETH_ALEN);
 
-	priv_sl2->emac_port = 1;
+	priv_sl2->emac_port = 2;
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ndev->features |= NETIF_F_NETNS_LOCAL;
+		cpsw_port_switchdev_init(ndev);
+	}
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
@@ -2806,6 +2938,49 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	return ret;
 }
 
+static int cpsw_probe_cpu_port(struct cpsw_common *cpsw)
+{
+	struct cpsw_priv *priv_sl2;
+	struct net_device *ndev;
+	int ret = 0;
+
+	ndev = alloc_etherdev_mq(sizeof(struct cpsw_priv), CPSW_MAX_QUEUES);
+	if (!ndev) {
+		dev_err(cpsw->dev, "cpsw: error allocating net_device\n");
+		return -ENOMEM;
+	}
+
+	priv_sl2 = netdev_priv(ndev);
+	priv_sl2->cpsw = cpsw;
+	priv_sl2->ndev = ndev;
+	priv_sl2->dev  = &ndev->dev;
+	priv_sl2->msg_enable = netif_msg_init(debug_level, CPSW_DEBUG);
+
+	random_ether_addr(priv_sl2->mac_addr);
+	dev_info(cpsw->dev, "cpu port: Random MACID = %pM\n",
+		 priv_sl2->mac_addr);
+
+	memcpy(ndev->dev_addr, priv_sl2->mac_addr, ETH_ALEN);
+
+	priv_sl2->emac_port = HOST_PORT_NUM;
+	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_NETNS_LOCAL;
+
+	ndev->netdev_ops = &cpsw_netdev_ops;
+
+	/* register the network device */
+	SET_NETDEV_DEV(ndev, cpsw->dev);
+	cpsw_port_switchdev_init(ndev);
+	ret = register_netdev(ndev);
+	if (ret) {
+		dev_err(cpsw->dev, "cpsw: error registering net device\n");
+		free_netdev(ndev);
+		ret = -ENODEV;
+	}
+	cpsw->master = ndev;
+
+	return ret;
+}
+
 #define CPSW_QUIRK_IRQ		BIT(0)
 
 static const struct platform_device_id cpsw_devtype[] = {
@@ -2844,6 +3019,187 @@ static const struct of_device_id cpsw_of_mtable[] = {
 };
 MODULE_DEVICE_TABLE(of, cpsw_of_mtable);
 
+static bool cpsw_port_dev_check(const struct net_device *dev)
+{
+	return dev->netdev_ops == &cpsw_netdev_ops;
+}
+
+static void cpsw_fdb_offload_notify(struct net_device *ndev,
+				    struct switchdev_notifier_fdb_info *rcv)
+{
+	struct switchdev_notifier_fdb_info info;
+
+	info.addr = rcv->addr;
+	info.vid = rcv->vid;
+	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				 ndev, &info.info);
+}
+
+static void cpsw_switchdev_event_work(struct work_struct *work)
+{
+	struct cpsw_switchdev_event_work *switchdev_work =
+		container_of(work, struct cpsw_switchdev_event_work, work);
+	struct cpsw_priv *priv = switchdev_work->priv;
+	struct switchdev_notifier_fdb_info *fdb;
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	rtnl_lock();
+	switch (switchdev_work->event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+		fdb = &switchdev_work->fdb_info;
+		cpsw_ale_add_ucast(cpsw->ale, (u8 *)fdb->addr, priv->emac_port,
+				   ALE_VLAN | ALE_SECURE, fdb->vid);
+		cpsw_fdb_offload_notify(priv->ndev, fdb);
+		break;
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		fdb = &switchdev_work->fdb_info;
+		cpsw_ale_del_ucast(cpsw->ale, (u8 *)fdb->addr, priv->emac_port,
+				   ALE_VLAN | ALE_SECURE, fdb->vid);
+		break;
+	default:
+		break;
+	}
+	rtnl_unlock();
+
+	kfree(switchdev_work->fdb_info.addr);
+	kfree(switchdev_work);
+	dev_put(priv->ndev);
+}
+
+/* called under rcu_read_lock() */
+static int cpsw_switchdev_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = switchdev_notifier_info_to_dev(ptr);
+	struct switchdev_notifier_fdb_info *fdb_info = ptr;
+	struct cpsw_switchdev_event_work *switchdev_work;
+	struct cpsw_priv *priv = netdev_priv(ndev);
+
+	if (!cpsw_port_dev_check(ndev))
+		return NOTIFY_DONE;
+
+	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
+	if (WARN_ON(!switchdev_work))
+		return NOTIFY_BAD;
+
+	INIT_WORK(&switchdev_work->work, cpsw_switchdev_event_work);
+	switchdev_work->priv = priv;
+	switchdev_work->event = event;
+
+	switch (event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		memcpy(&switchdev_work->fdb_info, ptr,
+		       sizeof(switchdev_work->fdb_info));
+		switchdev_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
+		ether_addr_copy((u8 *)switchdev_work->fdb_info.addr,
+				fdb_info->addr);
+		dev_hold(ndev);
+		break;
+	default:
+		kfree(switchdev_work);
+		return NOTIFY_DONE;
+	}
+
+	queue_work(system_long_wq, &switchdev_work->work);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpsw_switchdev_notifier = {
+	.notifier_call = cpsw_switchdev_event,
+};
+
+static void cpsw_netdevice_port_link(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	if (priv->emac_port != HOST_PORT_NUM)
+		return;
+
+	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_P0_UNI_FLOOD, 1);
+}
+
+static void cpsw_netdevice_port_unlink(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	if (priv->emac_port != HOST_PORT_NUM)
+		return;
+
+	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_P0_UNI_FLOOD, 0);
+}
+
+/* netdev notifier */
+static int cpsw_netdevice_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
+	struct netdev_notifier_changeupper_info *info;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		info = ptr;
+		if (!info->master)
+			goto out;
+		if (info->linking)
+			cpsw_netdevice_port_link(ndev);
+		else
+			cpsw_netdevice_port_unlink(ndev);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+out:
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpsw_netdevice_nb __read_mostly = {
+	.notifier_call = cpsw_netdevice_event,
+};
+
+static int cpsw_register_notifiers(struct cpsw_priv *priv)
+{
+	int ret;
+
+	ret = register_netdevice_notifier(&cpsw_netdevice_nb);
+	if (ret) {
+		cpsw_err(priv, probe, "can't register netdevice notifier\n");
+		return ret;
+	}
+
+	ret = register_switchdev_notifier(&cpsw_switchdev_notifier);
+	if (ret) {
+		cpsw_err(priv, probe, "can't register switchdev notifier\n");
+		goto unreg_netdevice;
+	}
+
+	return ret;
+
+unreg_netdevice:
+	ret = unregister_netdevice_notifier(&cpsw_netdevice_nb);
+
+	return ret;
+}
+
+static int cpsw_unregister_notifiers(struct cpsw_priv *priv)
+{
+	int ret;
+
+	ret = unregister_switchdev_notifier(&cpsw_switchdev_notifier);
+	if (ret)
+		dev_err(priv->dev, "can't unregister switchdev notifier\n");
+
+	ret += unregister_netdevice_notifier(&cpsw_netdevice_nb);
+	if (ret)
+		dev_err(priv->dev, "can't unregister netdevice notifier\n");
+
+	return ret;
+}
+
 static int cpsw_probe(struct platform_device *pdev)
 {
 	struct clk			*clk;
@@ -2935,7 +3291,11 @@ static int cpsw_probe(struct platform_device *pdev)
 		cpsw->slaves[i].slave_num = i;
 
 	cpsw->slaves[0].ndev = ndev;
-	priv->emac_port = 0;
+
+	if (cpsw_is_switch(cpsw->data.switch_mode))
+		priv->emac_port = HOST_PORT_NUM;
+	else
+		priv->emac_port = 1;
 
 	clk = devm_clk_get(&pdev->dev, "fck");
 	if (IS_ERR(clk)) {
@@ -3076,8 +3436,17 @@ static int cpsw_probe(struct platform_device *pdev)
 			cpsw->quirk_irq = true;
 	}
 
-	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ret = cpsw_probe_cpu_port(cpsw);
+		if (ret) {
+			cpsw_err(priv, probe, "error probe cpu interface\n");
+			goto clean_dma_ret;
+		}
+		cpsw_port_switchdev_init(ndev);
+		ndev->features |= NETIF_F_NETNS_LOCAL;
+	}
 
+	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
 	netif_napi_add(ndev, &cpsw->napi_rx, cpsw_rx_poll, CPSW_POLL_WEIGHT);
@@ -3093,7 +3462,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (!cpsw_is_switch(cpsw->data.switch_mode)) {
 		ret = cpsw_probe_dual_emac(priv);
 		if (ret) {
 			cpsw_err(priv, probe, "error probe slave 2 emac interface\n");
@@ -3139,6 +3508,12 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ret = cpsw_register_notifiers(priv);
+		if (ret)
+			goto clean_dma_ret;
+	}
+
 	cpsw_notice(priv, probe,
 		    "initialized device (regs %pa, irq %d, pool size %d)\n",
 		    &ss_res->start, ndev->irq, dma_params.descs_pool_size);
@@ -3164,7 +3539,8 @@ static int cpsw_probe(struct platform_device *pdev)
 static int cpsw_remove(struct platform_device *pdev)
 {
 	struct net_device *ndev = platform_get_drvdata(pdev);
-	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
 	ret = pm_runtime_get_sync(&pdev->dev);
@@ -3173,7 +3549,10 @@ static int cpsw_remove(struct platform_device *pdev)
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac)
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		ret = cpsw_unregister_notifiers(priv);
+
+	if (cpsw->data.switch_mode)
 		unregister_netdev(cpsw->slaves[1].ndev);
 	unregister_netdev(ndev);
 
@@ -3182,8 +3561,10 @@ static int cpsw_remove(struct platform_device *pdev)
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
-	if (cpsw->data.dual_emac)
+	if (cpsw->data.switch_mode)
 		free_netdev(cpsw->slaves[1].ndev);
+	if (cpsw->master)
+		free_netdev(cpsw->master);
 	free_netdev(ndev);
 	return 0;
 }
@@ -3195,7 +3576,7 @@ static int cpsw_suspend(struct device *dev)
 	struct net_device	*ndev = platform_get_drvdata(pdev);
 	struct cpsw_common	*cpsw = ndev_to_cpsw(ndev);
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw->data.switch_mode) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
@@ -3224,7 +3605,7 @@ static int cpsw_resume(struct device *dev)
 
 	/* shut up ASSERT_RTNL() warning in netif_set_real_num_tx/rx_queues */
 	rtnl_lock();
-	if (cpsw->data.dual_emac) {
+	if (cpsw->data.switch_mode) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 3b02a83..4be5ffc 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -30,6 +30,12 @@
 #define CPSW2_TX_PRI_MAP    0x18 /* Tx Header Priority to Switch Pri Mapping */
 #define CPSW2_TS_SEQ_MTYPE  0x1c /* Time Sync Sequence ID Offset and Msg Type */
 
+enum {
+	CPSW_TI_SWITCH,
+	CPSW_DUAL_EMAC,
+	CPSW_SWITCHDEV,
+};
+
 struct cpsw_slave_data {
 	struct	device_node *phy_node;
 	char	phy_id[MII_BUS_ID_SIZE];
@@ -48,7 +54,7 @@ struct cpsw_platform_data {
 	u32	bd_ram_size;  /*buffer descriptor ram size */
 	u32	mac_control;	/* Mac control register */
 	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
-	bool	dual_emac;	/* Enable Dual EMAC mode */
+	u8	switch_mode;    /* Enable Dual EMAC/switchdev mode */
 };
 
 struct cpsw_slave {
@@ -80,6 +86,7 @@ struct cpsw_common {
 	u32				coal_intvl;
 	u32				bus_freq_mhz;
 	int				rx_packet_max;
+	struct net_device		*master; /* used for switchdev */
 	struct cpsw_slave		*slaves;
 	struct cpdma_ctlr		*dma;
 	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next v2 0/3] net: sfp: small improvements
From: Antoine Tenart @ 2018-05-24  6:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, linux, netdev, linux-kernel,
	thomas.petazzoni, maxime.chevallier, gregory.clement,
	miquel.raynal, nadavh, stefanc, ymarkman, mw
In-Reply-To: <decfdf6b-6047-4338-5b81-2b8ef9bc8e48@gmail.com>

Hi Florian,

On Wed, May 23, 2018 at 11:40:50AM -0700, Florian Fainelli wrote:
> 
> Antoine, can you please do CC the people who worked on that code before,
> arguably, send an update to MAINTAINERS file to create a specific
> section for PHYLINK.

My bad, sorry for that, I'll make sure to Cc everyone involved next
time. As for an update to MAINTAINERS, I believe the one listed for a
particular file should do it himself.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* [PATCH bpf-next v4 00/10] bpf: enhancements for multi-function programs
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski

[1] Support for bpf-to-bpf function calls in the powerpc64 JIT compiler.

[2] Provide a way for resolving function calls because of the way JITed
    images are allocated in powerpc64.

[3] Fix to get JITed instruction dumps for multi-function programs from
    the bpf system call.

[4] Fix for bpftool to show delimited multi-function JITed image dumps.

v4:
 - Incorporate review comments from Jakub.
 - Fix JSON output for bpftool.

v3:
 - Change base tree tag to bpf-next.
 - Incorporate review comments from Alexei, Daniel and Jakub.
 - Make sure that the JITed image does not grow or shrink after
   the last pass due to the way the instruction sequence used
   to load a callee's address may be optimized.
 - Make additional changes to the bpf system call and bpftool to
   make multi-function JITed dumps easier to correlate.

v2:
 - Incorporate review comments from Jakub.

Sandipan Das (10):
  bpf: support 64-bit offsets for bpf function calls
  bpf: powerpc64: pad function address loads with NOPs
  bpf: powerpc64: add JIT support for multi-function programs
  bpf: get kernel symbol addresses via syscall
  tools: bpf: sync bpf uapi header
  tools: bpftool: resolve calls without using imm field
  bpf: fix multi-function JITed dump obtained via syscall
  bpf: get JITed image lengths of functions via syscall
  tools: bpf: sync bpf uapi header
  tools: bpftool: add delimiters to multi-function JITed dumps

 arch/powerpc/net/bpf_jit_comp64.c | 110 ++++++++++++++++++++++++++++++--------
 include/uapi/linux/bpf.h          |   4 ++
 kernel/bpf/syscall.c              |  82 ++++++++++++++++++++++++++--
 kernel/bpf/verifier.c             |  22 +++++---
 tools/bpf/bpftool/prog.c          |  97 ++++++++++++++++++++++++++++++++-
 tools/bpf/bpftool/xlated_dumper.c |  14 +++--
 tools/bpf/bpftool/xlated_dumper.h |   3 ++
 tools/include/uapi/linux/bpf.h    |   4 ++
 8 files changed, 301 insertions(+), 35 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [PATCH bpf-next v4 01/10] bpf: support 64-bit offsets for bpf function calls
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

The imm field of a bpf instruction is a signed 32-bit integer.
For JITed bpf-to-bpf function calls, it holds the offset of the
start address of the callee's JITed image from __bpf_call_base.

For some architectures, such as powerpc64, this offset may be
as large as 64 bits and cannot be accommodated in the imm field
without truncation.

We resolve this by:

[1] Additionally using the auxiliary data of each function to
    keep a list of start addresses of the JITed images for all
    functions determined by the verifier.

[2] Retaining the subprog id inside the off field of the call
    instructions and using it to index into the list mentioned
    above and look up the callee's address.

To make sure that the existing JIT compilers continue to work
without requiring changes, we keep the imm field as it is.
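
The lookup described in [1] and [2] can be sketched roughly as below.
This is only an illustration, not code from the patch: struct toy_prog,
struct toy_aux and toy_callee_addr() are hypothetical stand-ins for
struct bpf_prog, struct bpf_prog_aux and whatever a JIT compiler does
with the off field.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_prog;

/* Hypothetical stand-in for the auxiliary data of a function */
struct toy_aux {
	struct toy_prog **func;	/* one entry per function in the program */
	uint32_t func_cnt;	/* number of entries in func[] */
};

/* Hypothetical stand-in for a JITed function */
struct toy_prog {
	uint64_t bpf_func;	/* start address of the JITed image */
	struct toy_aux *aux;
};

/*
 * Return the callee's start address for the subprog id kept in the
 * off field of a call instruction, or 0 if the id is out of range.
 */
static uint64_t toy_callee_addr(const struct toy_prog *caller, int16_t off)
{
	const struct toy_aux *aux = caller->aux;

	if (!aux || !aux->func || off < 0 || (uint32_t)off >= aux->func_cnt)
		return 0;

	return aux->func[off]->bpf_func;
}
```

Since the address comes from the list and not from imm, a full 64-bit
value survives without truncation.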

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 kernel/bpf/verifier.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a9e4b1372da6..559cb74ba29e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 			    insn->src_reg != BPF_PSEUDO_CALL)
 				continue;
 			subprog = insn->off;
-			insn->off = 0;
 			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
 				func[subprog]->bpf_func -
 				__bpf_call_base;
 		}
+
+		/* we use the aux data to keep a list of the start addresses
+		 * of the JITed images for each function in the program
+		 *
+		 * for some architectures, such as powerpc64, the imm field
+		 * might not be large enough to hold the offset of the start
+		 * address of the callee's JITed image from __bpf_call_base
+		 *
+		 * in such cases, we can lookup the start address of a callee
+		 * by using its subprog id, available from the off field of
+		 * the call instruction, as an index for this list
+		 */
+		func[i]->aux->func = func;
+		func[i]->aux->func_cnt = env->subprog_cnt;
 	}
 	for (i = 0; i < env->subprog_cnt; i++) {
 		old_bpf_func = func[i]->bpf_func;
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 02/10] bpf: powerpc64: pad function address loads with NOPs
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

For multi-function programs, loading the address of a callee
function to a register requires emitting instructions whose
count varies from one to five depending on the nature of the
address.

Since we come to know of the callee's address only before the
extra pass, the number of instructions required to load this
address may vary from what was previously generated. This can
make the JITed image grow or shrink.

To avoid this, we should generate a constant five-instruction
sequence when loading function addresses by padding the
optimized load sequence with NOPs.
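
As a rough model of the padding idea (not the JIT code itself):
toy_load_imm64() below is a made-up variable-length loader that emits
between one and four placeholder words, and toy_load_func_addr() pads
whatever it emitted up to five words. Only the NOP encoding
(ori 0,0,0 = 0x60000000) is a real powerpc value; every other name and
opcode here is an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NOP			0x60000000u	/* ori 0,0,0 */
#define TOY_LOAD_SEQ_LEN	5

struct toy_ctx {
	unsigned int idx;	/* index of the next instruction word */
};

/* Emit one word; a NULL image only counts, like a scouting pass */
static void toy_emit(uint32_t *image, struct toy_ctx *ctx, uint32_t insn)
{
	if (image)
		image[ctx->idx] = insn;
	ctx->idx++;
}

/*
 * Hypothetical loader: one placeholder word per non-zero 16-bit chunk
 * of val, and at least one word overall. The words are not real opcodes.
 */
static void toy_load_imm64(uint32_t *image, struct toy_ctx *ctx, uint64_t val)
{
	unsigned int start = ctx->idx;
	int shift;

	for (shift = 48; shift >= 0; shift -= 16) {
		uint16_t chunk = (uint16_t)(val >> shift);

		if (chunk)
			toy_emit(image, ctx, 0x10000u | chunk);
	}
	if (ctx->idx == start)
		toy_emit(image, ctx, 0x10000u);
}

/* Load a function address using a fixed-length sequence */
static void toy_load_func_addr(uint32_t *image, struct toy_ctx *ctx,
			       uint64_t func)
{
	unsigned int i, start = ctx->idx;

	toy_load_imm64(image, ctx, func);

	/* pad with NOPs so that the sequence length never changes */
	for (i = ctx->idx - start; i < TOY_LOAD_SEQ_LEN; i++)
		toy_emit(image, ctx, TOY_NOP);
}
```

With this, loading 0 during an early pass and loading the real address
during the extra pass both advance the index by five words, so the
offsets computed earlier stay valid.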

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp64.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 1bdb1aff0619..e4582744a31d 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -167,25 +167,37 @@ static void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 
 static void bpf_jit_emit_func_call(u32 *image, struct codegen_context *ctx, u64 func)
 {
+	unsigned int i, ctx_idx = ctx->idx;
+
+	/* Load function address into r12 */
+	PPC_LI64(12, func);
+
+	/* For bpf-to-bpf function calls, the callee's address is unknown
+	 * until the last extra pass. As seen above, we use PPC_LI64() to
+	 * load the callee's address, but this may optimize the number of
+	 * instructions required based on the nature of the address.
+	 *
+	 * Since we don't want the number of instructions emitted to change,
+	 * we pad the optimized PPC_LI64() call with NOPs to guarantee that
+	 * we always have a five-instruction sequence, which is the maximum
+	 * that PPC_LI64() can emit.
+	 */
+	for (i = ctx->idx - ctx_idx; i < 5; i++)
+		PPC_NOP();
+
 #ifdef PPC64_ELF_ABI_v1
-	/* func points to the function descriptor */
-	PPC_LI64(b2p[TMP_REG_2], func);
-	/* Load actual entry point from function descriptor */
-	PPC_BPF_LL(b2p[TMP_REG_1], b2p[TMP_REG_2], 0);
-	/* ... and move it to LR */
-	PPC_MTLR(b2p[TMP_REG_1]);
 	/*
 	 * Load TOC from function descriptor at offset 8.
 	 * We can clobber r2 since we get called through a
 	 * function pointer (so caller will save/restore r2)
 	 * and since we don't use a TOC ourself.
 	 */
-	PPC_BPF_LL(2, b2p[TMP_REG_2], 8);
-#else
-	/* We can clobber r12 */
-	PPC_FUNC_ADDR(12, func);
-	PPC_MTLR(12);
+	PPC_BPF_LL(2, 12, 8);
+	/* Load actual entry point from function descriptor */
+	PPC_BPF_LL(12, 12, 0);
 #endif
+
+	PPC_MTLR(12);
 	PPC_BLRL();
 }
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 03/10] bpf: powerpc64: add JIT support for multi-function programs
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

This adds support for bpf-to-bpf function calls in the powerpc64
JIT compiler. The JIT compiler converts the bpf call instructions
to native branch instructions. After a round of the usual passes,
the start addresses of the JITed images for the callee functions
are known. Finally, to fix up the branch target addresses, we need
to perform an extra pass.

Because of the address range in which JITed images are allocated
on powerpc64, the offsets of the start addresses of these images
from __bpf_call_base are as large as 64 bits. So, for a function
call, we cannot use the imm field of the instruction to determine
the callee's address. Instead, we use the alternative method of
getting it from the list of function addresses in the auxiliary
data of the caller by using the off field as an index.
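
The way state is carried from the usual passes over to the extra pass
can be modelled roughly as follows. This is a simplified sketch under
assumptions: struct toy_jit_data, struct toy_prog and toy_jit_compile()
are hypothetical, user-space stand-ins, and the actual code generation
is reduced to a placeholder.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-program state kept between two compile calls */
struct toy_jit_data {
	unsigned int proglen;
	bool valid;		/* set once the usual passes have run */
};

struct toy_prog {
	struct toy_jit_data *jit_data;
	bool is_func;		/* part of a multi-function program */
	unsigned int extra_passes;
};

/* Returns 0 on success and -1 if the allocation fails */
static int toy_jit_compile(struct toy_prog *fp)
{
	struct toy_jit_data *jd = fp->jit_data;
	bool extra_pass = false;

	if (!jd) {
		jd = calloc(1, sizeof(*jd));
		if (!jd)
			return -1;
		fp->jit_data = jd;
	}

	if (jd->valid) {
		/* second call: reuse saved state, only fix up call targets */
		extra_pass = true;
		fp->extra_passes++;
	} else {
		/* first call: placeholder for the usual passes */
		jd->proglen = 64;
	}

	if (!fp->is_func || extra_pass) {
		/* nothing more to do for this program, drop the state */
		free(jd);
		fp->jit_data = NULL;
	} else {
		/* keep the state around for the extra pass */
		jd->valid = true;
	}

	return 0;
}
```

A standalone program frees its state right away, while each function of
a multi-function program keeps it until it is compiled a second time.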

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v3:
 - Fix memory leak for jit_data when we fail to allocate addrs.
 - Remove unnecessary bpf_jit_binary_lock_ro() call.
---
 arch/powerpc/net/bpf_jit_comp64.c | 76 +++++++++++++++++++++++++++++++++------
 1 file changed, 66 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index e4582744a31d..f1c95779843b 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -268,7 +268,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 /* Assemble the body code between the prologue & epilogue */
 static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			      struct codegen_context *ctx,
-			      u32 *addrs)
+			      u32 *addrs, bool extra_pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -724,11 +724,25 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			break;
 
 		/*
-		 * Call kernel helper
+		 * Call kernel helper or bpf function
 		 */
 		case BPF_JMP | BPF_CALL:
 			ctx->seen |= SEEN_FUNC;
-			func = (u8 *) __bpf_call_base + imm;
+
+			/* bpf function call */
+			if (insn[i].src_reg == BPF_PSEUDO_CALL)
+				if (!extra_pass)
+					func = NULL;
+				else if (fp->aux->func && off < fp->aux->func_cnt)
+					/* use the subprog id from the off
+					 * field to lookup the callee address
+					 */
+					func = (u8 *) fp->aux->func[off]->bpf_func;
+				else
+					return -EINVAL;
+			/* kernel helper call */
+			else
+				func = (u8 *) __bpf_call_base + imm;
 
 			bpf_jit_emit_func_call(image, ctx, (u64)func);
 
@@ -876,6 +890,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 	return 0;
 }
 
+struct powerpc64_jit_data {
+	struct bpf_binary_header *header;
+	u32 *addrs;
+	u8 *image;
+	u32 proglen;
+	struct codegen_context ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
 	u32 proglen;
@@ -883,6 +905,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	u8 *image = NULL;
 	u32 *code_base;
 	u32 *addrs;
+	struct powerpc64_jit_data *jit_data;
 	struct codegen_context cgctx;
 	int pass;
 	int flen;
@@ -890,6 +913,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	struct bpf_prog *org_fp = fp;
 	struct bpf_prog *tmp_fp;
 	bool bpf_blinded = false;
+	bool extra_pass = false;
 
 	if (!fp->jit_requested)
 		return org_fp;
@@ -903,11 +927,32 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		fp = tmp_fp;
 	}
 
+	jit_data = fp->aux->jit_data;
+	if (!jit_data) {
+		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+		if (!jit_data) {
+			fp = org_fp;
+			goto out;
+		}
+		fp->aux->jit_data = jit_data;
+	}
+
 	flen = fp->len;
+	addrs = jit_data->addrs;
+	if (addrs) {
+		cgctx = jit_data->ctx;
+		image = jit_data->image;
+		bpf_hdr = jit_data->header;
+		proglen = jit_data->proglen;
+		alloclen = proglen + FUNCTION_DESCR_SIZE;
+		extra_pass = true;
+		goto skip_init_ctx;
+	}
+
 	addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
 	if (addrs == NULL) {
 		fp = org_fp;
-		goto out;
+		goto out_addrs;
 	}
 
 	memset(&cgctx, 0, sizeof(struct codegen_context));
@@ -916,10 +961,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
-		goto out;
+		goto out_addrs;
 	}
 
 	/*
@@ -937,9 +982,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 			bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
 		fp = org_fp;
-		goto out;
+		goto out_addrs;
 	}
 
+skip_init_ctx:
 	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
 
 	/* Code generation passes 1-2 */
@@ -947,7 +993,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
 		bpf_jit_build_epilogue(code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
@@ -973,10 +1019,20 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	fp->jited_len = alloclen;
 
 	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+	if (!fp->is_func || extra_pass) {
+out_addrs:
+		kfree(addrs);
+		kfree(jit_data);
+		fp->aux->jit_data = NULL;
+	} else {
+		jit_data->addrs = addrs;
+		jit_data->ctx = cgctx;
+		jit_data->proglen = proglen;
+		jit_data->image = image;
+		jit_data->header = bpf_hdr;
+	}
 
 out:
-	kfree(addrs);
-
 	if (bpf_blinded)
 		bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 04/10] bpf: get kernel symbol addresses via syscall
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

This adds two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of kernel symbol addresses for all functions in a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.

When bpf_jit_kallsyms is enabled, we can get the address
of the corresponding kernel symbol for a callee function
and resolve the symbol's name. The address is determined
by adding the value of the call instruction's imm field
to __bpf_call_base. This offset gets assigned to the imm
field by the verifier.

For some architectures, such as powerpc64, the imm field
is not large enough to hold this offset.

We resolve this by:

[1] Assigning the subprog id to the imm field of a call
    instruction in the verifier instead of the offset of
    the callee's symbol's address from __bpf_call_base.

[2] Determining the address of a callee's corresponding
    symbol by using the imm field as an index for the
    list of kernel symbol addresses now available from
    the program info.
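
From the userspace side, step [2] might look something like the sketch
below. toy_resolve_call() is a hypothetical helper, not part of bpftool
or of this patch, and the fallback to call_base + imm when no address
list is available is an assumption made here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * ksyms/nr_ksyms: address list copied out of the program info
 * imm:            imm field of the call instruction (subprog id)
 * call_base:      address of __bpf_call_base, used for the fallback
 */
static uint64_t toy_resolve_call(const uint64_t *ksyms, uint32_t nr_ksyms,
				 int32_t imm, uint64_t call_base)
{
	/* use imm as an index into the list when it is usable */
	if (ksyms && imm >= 0 && (uint32_t)imm < nr_ksyms)
		return ksyms[imm];

	/* assumed fallback: treat imm as an offset from call_base */
	return call_base + (uint64_t)(int64_t)imm;
}
```

The returned address can then be matched against kallsyms to get the
name of the callee's symbol.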

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v3:
 - Copy addresses to jited_ksyms only if bpf_dump_raw_ok()
   is true.
 - Move new fields to the end of bpf_prog_info to avoid
   breaking userspace.
---
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/syscall.c     | 25 +++++++++++++++++++++++++
 kernel/bpf/verifier.c    |  7 +------
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c3e502d06bc3..0be90965867d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2205,6 +2205,8 @@ struct bpf_prog_info {
 	__u32 gpl_compatible:1;
 	__u64 netns_dev;
 	__u64 netns_ino;
+	__u32 nr_jited_ksyms;
+	__aligned_u64 jited_ksyms;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0b4c94551001..068a4fc79ddb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 	if (!capable(CAP_SYS_ADMIN)) {
 		info.jited_prog_len = 0;
 		info.xlated_prog_len = 0;
+		info.nr_jited_ksyms = 0;
 		goto done;
 	}
 
@@ -1981,6 +1982,30 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 		}
 	}
 
+	ulen = info.nr_jited_ksyms;
+	info.nr_jited_ksyms = prog->aux->func_cnt;
+	if (info.nr_jited_ksyms && ulen) {
+		if (bpf_dump_raw_ok()) {
+			u64 __user *user_ksyms;
+			ulong ksym_addr;
+			u32 i;
+
+			/* copy the address of the kernel symbol
+			 * corresponding to each function
+			 */
+			ulen = min_t(u32, info.nr_jited_ksyms, ulen);
+			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
+			for (i = 0; i < ulen; i++) {
+				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
+				ksym_addr &= PAGE_MASK;
+				if (put_user((u64) ksym_addr, &user_ksyms[i]))
+					return -EFAULT;
+			}
+		} else {
+			info.jited_ksyms = 0;
+		}
+	}
+
 done:
 	if (copy_to_user(uinfo, &info, info_len) ||
 	    put_user(info_len, &uattr->info.info_len))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 559cb74ba29e..8c4d9d0fd3ab 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	 * later look the same as if they were interpreted only.
 	 */
 	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
-		unsigned long addr;
-
 		if (insn->code != (BPF_JMP | BPF_CALL) ||
 		    insn->src_reg != BPF_PSEUDO_CALL)
 			continue;
 		insn->off = env->insn_aux_data[i].call_imm;
 		subprog = find_subprog(env, i + insn->off + 1);
-		addr  = (unsigned long)func[subprog]->bpf_func;
-		addr &= PAGE_MASK;
-		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
-			    addr - __bpf_call_base;
+		insn->imm = subprog;
 	}
 
 	prog->jited = 1;
-- 
2.14.3

^ permalink raw reply related
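
Editor's sketch of step [2] in the patch above: once the verifier stores the subprog id in imm, a consumer can treat imm as an index into the jited_ksyms array from the program info. This is a minimal userspace illustration, not code from the series; resolve_callee() and its return-0-on-miss convention are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): resolve the address of the
 * callee of a BPF-to-BPF call. imm holds the subprog id, which is used as
 * an index into the jited_ksyms array reported in the program info.
 * Returns 0 when no addresses were exported or the index is out of range.
 */
static uint64_t resolve_callee(const uint64_t *jited_ksyms,
			       uint32_t nr_jited_ksyms, int32_t imm)
{
	/* The unsigned cast also rejects negative imm values. */
	if (!jited_ksyms || (uint32_t)imm >= nr_jited_ksyms)
		return 0;
	return jited_ksyms[imm];
}
```

The range check mirrors the "(__u32) insn->imm < dd->nr_jited_ksyms" test that bpftool performs in patch 06 before it indexes the array.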

* [PATCH bpf-next v4 05/10] tools: bpf: sync bpf uapi header
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
addresses of the kernel symbols corresponding to each
function in a program.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v3:
 - Move new fields to the end of bpf_prog_info to avoid
   breaking userspace.
---
 tools/include/uapi/linux/bpf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c3e502d06bc3..0be90965867d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2205,6 +2205,8 @@ struct bpf_prog_info {
 	__u32 gpl_compatible:1;
 	__u64 netns_dev;
 	__u64 netns_ino;
+	__u32 nr_jited_ksyms;
+	__aligned_u64 jited_ksyms;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 06/10] tools: bpftool: resolve calls without using imm field
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

Currently, we resolve the callee's address for a JITed function
call by using the imm field of the call instruction as an offset
from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
use this address to get the callee's kernel symbol's name.

For some architectures, such as powerpc64, the imm field is not
large enough to hold this offset. So, instead of assigning this
offset to the imm field, the verifier now assigns the subprog
id. Also, a list of kernel symbol addresses for all the JITed
functions is provided in the program info. We now use the imm
field as an index for this list to look up a callee's symbol's
address and resolve its name.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
v3:
 - Avoid using redundant pointers.
 - Fix indentation.

v2:
 - Order variables from longest to shortest.
 - Make sure that ksyms_ptr and ksyms_len are always initialized.
 - Simplify code.
---
 tools/bpf/bpftool/prog.c          | 24 ++++++++++++++++++++++++
 tools/bpf/bpftool/xlated_dumper.c | 10 +++++++++-
 tools/bpf/bpftool/xlated_dumper.h |  2 ++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 9bdfdf2d3fbe..e05ab58d39e2 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -420,7 +420,9 @@ static int do_show(int argc, char **argv)
 
 static int do_dump(int argc, char **argv)
 {
+	unsigned long *func_ksyms = NULL;
 	struct bpf_prog_info info = {};
+	unsigned int nr_func_ksyms;
 	struct dump_data dd = {};
 	__u32 len = sizeof(info);
 	unsigned int buf_size;
@@ -496,10 +498,22 @@ static int do_dump(int argc, char **argv)
 		return -1;
 	}
 
+	nr_func_ksyms = info.nr_jited_ksyms;
+	if (nr_func_ksyms) {
+		func_ksyms = malloc(nr_func_ksyms * sizeof(__u64));
+		if (!func_ksyms) {
+			p_err("mem alloc failed");
+			close(fd);
+			goto err_free;
+		}
+	}
+
 	memset(&info, 0, sizeof(info));
 
 	*member_ptr = ptr_to_u64(buf);
 	*member_len = buf_size;
+	info.jited_ksyms = ptr_to_u64(func_ksyms);
+	info.nr_jited_ksyms = nr_func_ksyms;
 
 	err = bpf_obj_get_info_by_fd(fd, &info, &len);
 	close(fd);
@@ -513,6 +527,11 @@ static int do_dump(int argc, char **argv)
 		goto err_free;
 	}
 
+	if (info.nr_jited_ksyms > nr_func_ksyms) {
+		p_err("too many addresses returned");
+		goto err_free;
+	}
+
 	if ((member_len == &info.jited_prog_len &&
 	     info.jited_prog_insns == 0) ||
 	    (member_len == &info.xlated_prog_len &&
@@ -558,6 +577,9 @@ static int do_dump(int argc, char **argv)
 			dump_xlated_cfg(buf, *member_len);
 	} else {
 		kernel_syms_load(&dd);
+		dd.nr_jited_ksyms = info.nr_jited_ksyms;
+		dd.jited_ksyms = (__u64 *) info.jited_ksyms;
+
 		if (json_output)
 			dump_xlated_json(&dd, buf, *member_len, opcodes);
 		else
@@ -566,10 +588,12 @@ static int do_dump(int argc, char **argv)
 	}
 
 	free(buf);
+	free(func_ksyms);
 	return 0;
 
 err_free:
 	free(buf);
+	free(func_ksyms);
 	return -1;
 }
 
diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
index 7a3173b76c16..efdc8fecf2bb 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -174,7 +174,11 @@ static const char *print_call_pcrel(struct dump_data *dd,
 				    unsigned long address,
 				    const struct bpf_insn *insn)
 {
-	if (sym)
+	if (!dd->nr_jited_ksyms)
+		/* Do not show address for interpreted programs */
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			"%+d", insn->off);
+	else if (sym)
 		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
 			 "%+d#%s", insn->off, sym->name);
 	else
@@ -203,6 +207,10 @@ static const char *print_call(void *private_data,
 	unsigned long address = dd->address_call_base + insn->imm;
 	struct kernel_sym *sym;
 
+	if (insn->src_reg == BPF_PSEUDO_CALL &&
+	    (__u32) insn->imm < dd->nr_jited_ksyms)
+		address = dd->jited_ksyms[insn->imm];
+
 	sym = kernel_syms_search(dd, address);
 	if (insn->src_reg == BPF_PSEUDO_CALL)
 		return print_call_pcrel(dd, sym, address, insn);
diff --git a/tools/bpf/bpftool/xlated_dumper.h b/tools/bpf/bpftool/xlated_dumper.h
index b34affa7ef2d..eafbb49c8d0b 100644
--- a/tools/bpf/bpftool/xlated_dumper.h
+++ b/tools/bpf/bpftool/xlated_dumper.h
@@ -49,6 +49,8 @@ struct dump_data {
 	unsigned long address_call_base;
 	struct kernel_sym *sym_mapping;
 	__u32 sym_count;
+	__u64 *jited_ksyms;
+	__u32 nr_jited_ksyms;
 	char scratch_buff[SYM_MAX_NAME + 8];
 };
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 07/10] bpf: fix multi-function JITed dump obtained via syscall
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

Currently, for multi-function programs, we cannot get the JITed
instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
command. Because of this, userspace tools such as bpftool fail
to identify a multi-function program as being JITed or not.

With the JIT enabled and the test program running, this can be
verified as follows:

  # cat /proc/sys/net/core/bpf_jit_enable
  1

Before applying this patch:

  # bpftool prog list
  1: kprobe  name foo  tag b811aab41a39ad3d  gpl
          loaded_at 2018-05-16T11:43:38+0530  uid 0
          xlated 216B  not jited  memlock 65536B
  ...

  # bpftool prog dump jited id 1
  no instructions returned

After applying this patch:

  # bpftool prog list
  1: kprobe  name foo  tag b811aab41a39ad3d  gpl
          loaded_at 2018-05-16T12:13:01+0530  uid 0
          xlated 216B  jited 308B  memlock 65536B
  ...

  # bpftool prog dump jited id 1
     0:   nop
     4:   nop
     8:   mflr    r0
     c:   std     r0,16(r1)
    10:   stdu    r1,-112(r1)
    14:   std     r31,104(r1)
    18:   addi    r31,r1,48
    1c:   li      r3,10
  ...

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v4:
 - If the image allocated in userspace is not large enough,
   fill it up till the end even if the last JITed image can
   only be copied partially.
---
 kernel/bpf/syscall.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 068a4fc79ddb..c8e987a612b5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1970,13 +1970,44 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 	 * for offload.
 	 */
 	ulen = info.jited_prog_len;
-	info.jited_prog_len = prog->jited_len;
+	if (prog->aux->func_cnt) {
+		u32 i;
+
+		info.jited_prog_len = 0;
+		for (i = 0; i < prog->aux->func_cnt; i++)
+			info.jited_prog_len += prog->aux->func[i]->jited_len;
+	} else {
+		info.jited_prog_len = prog->jited_len;
+	}
+
 	if (info.jited_prog_len && ulen) {
 		if (bpf_dump_raw_ok()) {
 			uinsns = u64_to_user_ptr(info.jited_prog_insns);
 			ulen = min_t(u32, info.jited_prog_len, ulen);
-			if (copy_to_user(uinsns, prog->bpf_func, ulen))
-				return -EFAULT;
+
+			/* for multi-function programs, copy the JITed
+			 * instructions for all the functions
+			 */
+			if (prog->aux->func_cnt) {
+				u32 len, free, i;
+				u8 *img;
+
+				free = ulen;
+				for (i = 0; i < prog->aux->func_cnt; i++) {
+					len = prog->aux->func[i]->jited_len;
+					len = min_t(u32, len, free);
+					img = (u8 *) prog->aux->func[i]->bpf_func;
+					if (copy_to_user(uinsns, img, len))
+						return -EFAULT;
+					uinsns += len;
+					free -= len;
+					if (!free)
+						break;
+				}
+			} else {
+				if (copy_to_user(uinsns, prog->bpf_func, ulen))
+					return -EFAULT;
+			}
 		} else {
 			info.jited_prog_insns = 0;
 		}
-- 
2.14.3

^ permalink raw reply related
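
Editor's sketch of the copy loop in the patch above: the per-function images are written back to back into a bounded buffer, and the last image is truncated if it does not fit. This is a plain userspace analogue using memcpy() in place of copy_to_user(); struct func_image and copy_images() are hypothetical names introduced only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one JITed function image. */
struct func_image {
	const uint8_t *insns;
	uint32_t len;
};

/* Copy the images back to back into dst without writing more than dst_len
 * bytes. An image that does not fit completely is copied partially, and
 * the loop stops once the buffer is full. Returns the bytes written.
 */
static uint32_t copy_images(uint8_t *dst, uint32_t dst_len,
			    const struct func_image *funcs, uint32_t cnt)
{
	uint32_t room = dst_len;
	uint32_t i;

	for (i = 0; i < cnt && room; i++) {
		uint32_t len = funcs[i].len < room ? funcs[i].len : room;

		memcpy(dst, funcs[i].insns, len);
		dst += len;
		room -= len;
	}
	return dst_len - room;
}
```

Tracking the remaining room, rather than re-deriving it from the destination pointer, keeps the truncation rule in one place, which is also how the patch uses its "free" counter.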

* [PATCH bpf-next v4 08/10] bpf: get JITed image lengths of functions via syscall
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

This adds two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of the JITed image lengths of each function for a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.

This can be used by userspace applications like bpftool
to split up the contiguous JITed dump, also obtained via
the system call, into more relatable chunks corresponding
to each function.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/syscall.c     | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0be90965867d..344d2ddcef49 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2206,7 +2206,9 @@ struct bpf_prog_info {
 	__u64 netns_dev;
 	__u64 netns_ino;
 	__u32 nr_jited_ksyms;
+	__u32 nr_jited_func_lens;
 	__aligned_u64 jited_ksyms;
+	__aligned_u64 jited_func_lens;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c8e987a612b5..788456c18617 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2037,6 +2037,26 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 		}
 	}
 
+	ulen = info.nr_jited_func_lens;
+	info.nr_jited_func_lens = prog->aux->func_cnt;
+	if (info.nr_jited_func_lens && ulen) {
+		if (bpf_dump_raw_ok()) {
+			u32 __user *user_lens;
+			u32 func_len, i;
+
+			/* copy the JITed image lengths for each function */
+			ulen = min_t(u32, info.nr_jited_func_lens, ulen);
+			user_lens = u64_to_user_ptr(info.jited_func_lens);
+			for (i = 0; i < ulen; i++) {
+				func_len = prog->aux->func[i]->jited_len;
+				if (put_user(func_len, &user_lens[i]))
+					return -EFAULT;
+			}
+		} else {
+			info.jited_func_lens = 0;
+		}
+	}
+
 done:
 	if (copy_to_user(uinfo, &info, info_len) ||
 	    put_user(info_len, &uattr->info.info_len))
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 09/10] tools: bpf: sync bpf uapi header
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

Syncing the bpf.h uapi header with tools so that struct
bpf_prog_info has the two new fields for passing on the
JITed image lengths of each function in a multi-function
program.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 tools/include/uapi/linux/bpf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0be90965867d..344d2ddcef49 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2206,7 +2206,9 @@ struct bpf_prog_info {
 	__u64 netns_dev;
 	__u64 netns_ino;
 	__u32 nr_jited_ksyms;
+	__u32 nr_jited_func_lens;
 	__aligned_u64 jited_ksyms;
+	__aligned_u64 jited_func_lens;
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
-- 
2.14.3

^ permalink raw reply related

* [PATCH bpf-next v4 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Sandipan Das @ 2018-05-24  6:56 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao, jakub.kicinski
In-Reply-To: <cover.1527143877.git.sandipan@linux.vnet.ibm.com>

This splits up the contiguous JITed dump obtained via the bpf
system call into more relatable chunks for each function in
the program. If the kernel symbols corresponding to these are
known, they are printed in the header for each JIT image dump;
otherwise, the masked start address is printed.

Before applying this patch:

  # bpftool prog dump jited id 1

     0:	push   %rbp
     1:	mov    %rsp,%rbp
  ...
    70:	leaveq
    71:	retq
    72:	push   %rbp
    73:	mov    %rsp,%rbp
  ...
    dd:	leaveq
    de:	retq

  # bpftool -p prog dump jited id 1

  [{
          "pc": "0x0",
          "operation": "push",
          "operands": ["%rbp"
          ]
      },{
  ...
      },{
          "pc": "0x71",
          "operation": "retq",
          "operands": [null
          ]
      },{
          "pc": "0x72",
          "operation": "push",
          "operands": ["%rbp"
          ]
      },{
  ...
      },{
          "pc": "0xde",
          "operation": "retq",
          "operands": [null
          ]
      }
  ]

After applying this patch:

  # echo 0 > /proc/sys/net/core/bpf_jit_kallsyms
  # bpftool prog dump jited id 1

  0xffffffffc02c7000:
     0:	push   %rbp
     1:	mov    %rsp,%rbp
  ...
    70:	leaveq
    71:	retq

  0xffffffffc02cf000:
     0:	push   %rbp
     1:	mov    %rsp,%rbp
  ...
    6b:	leaveq
    6c:	retq

  # bpftool -p prog dump jited id 1

  [{
          "name": "0xffffffffc02c7000",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x71",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      },{
          "name": "0xffffffffc02cf000",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x6c",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      }
  ]

  # echo 1 > /proc/sys/net/core/bpf_jit_kallsyms
  # bpftool prog dump jited id 1

  bpf_prog_b811aab41a39ad3d_foo:
     0:	push   %rbp
     1:	mov    %rsp,%rbp
  ...
    70:	leaveq
    71:	retq

  bpf_prog_cf418ac8b67bebd9_F:
     0:	push   %rbp
     1:	mov    %rsp,%rbp
  ...
    6b:	leaveq
    6c:	retq

  # bpftool -p prog dump jited id 1

  [{
          "name": "bpf_prog_b811aab41a39ad3d_foo",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x71",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      },{
          "name": "bpf_prog_cf418ac8b67bebd9_F",
          "insns": [{
                  "pc": "0x0",
                  "operation": "push",
                  "operands": ["%rbp"
                  ]
              },{
  ...
              },{
                  "pc": "0x6c",
                  "operation": "retq",
                  "operands": [null
                  ]
              }
          ]
      }
  ]

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v4:
 - Fix JSON output.
---
 tools/bpf/bpftool/prog.c          | 73 ++++++++++++++++++++++++++++++++++++++-
 tools/bpf/bpftool/xlated_dumper.c |  4 +--
 tools/bpf/bpftool/xlated_dumper.h |  1 +
 3 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e05ab58d39e2..39b88e760367 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -422,7 +422,9 @@ static int do_dump(int argc, char **argv)
 {
 	unsigned long *func_ksyms = NULL;
 	struct bpf_prog_info info = {};
+	unsigned int *func_lens = NULL;
 	unsigned int nr_func_ksyms;
+	unsigned int nr_func_lens;
 	struct dump_data dd = {};
 	__u32 len = sizeof(info);
 	unsigned int buf_size;
@@ -508,12 +510,24 @@ static int do_dump(int argc, char **argv)
 		}
 	}
 
+	nr_func_lens = info.nr_jited_func_lens;
+	if (nr_func_lens) {
+		func_lens = malloc(nr_func_lens * sizeof(__u32));
+		if (!func_lens) {
+			p_err("mem alloc failed");
+			close(fd);
+			goto err_free;
+		}
+	}
+
 	memset(&info, 0, sizeof(info));
 
 	*member_ptr = ptr_to_u64(buf);
 	*member_len = buf_size;
 	info.jited_ksyms = ptr_to_u64(func_ksyms);
 	info.nr_jited_ksyms = nr_func_ksyms;
+	info.jited_func_lens = ptr_to_u64(func_lens);
+	info.nr_jited_func_lens = nr_func_lens;
 
 	err = bpf_obj_get_info_by_fd(fd, &info, &len);
 	close(fd);
@@ -532,6 +546,11 @@ static int do_dump(int argc, char **argv)
 		goto err_free;
 	}
 
+	if (info.nr_jited_func_lens > nr_func_lens) {
+		p_err("too many values returned");
+		goto err_free;
+	}
+
 	if ((member_len == &info.jited_prog_len &&
 	     info.jited_prog_insns == 0) ||
 	    (member_len == &info.xlated_prog_len &&
@@ -569,7 +588,57 @@ static int do_dump(int argc, char **argv)
 				goto err_free;
 		}
 
-		disasm_print_insn(buf, *member_len, opcodes, name);
+		if (info.nr_jited_func_lens && info.jited_func_lens) {
+			struct kernel_sym *sym = NULL;
+			char sym_name[SYM_MAX_NAME];
+			unsigned char *img = buf;
+			__u64 *ksyms = NULL;
+			__u32 *lens;
+			__u32 i;
+
+			if (info.nr_jited_ksyms) {
+				kernel_syms_load(&dd);
+				ksyms = (__u64 *) info.jited_ksyms;
+			}
+
+			if (json_output)
+				jsonw_start_array(json_wtr);
+
+			lens = (__u32 *) info.jited_func_lens;
+			for (i = 0; i < info.nr_jited_func_lens; i++) {
+				if (ksyms) {
+					sym = kernel_syms_search(&dd, ksyms[i]);
+					if (sym)
+						sprintf(sym_name, "%s", sym->name);
+					else
+						sprintf(sym_name, "0x%016llx", ksyms[i]);
+				} else {
+					strcpy(sym_name, "unknown");
+				}
+
+				if (json_output) {
+					jsonw_start_object(json_wtr);
+					jsonw_name(json_wtr, "name");
+					jsonw_string(json_wtr, sym_name);
+					jsonw_name(json_wtr, "insns");
+				} else {
+					printf("%s:\n", sym_name);
+				}
+
+				disasm_print_insn(img, lens[i], opcodes, name);
+				img += lens[i];
+
+				if (json_output)
+					jsonw_end_object(json_wtr);
+				else
+					printf("\n");
+			}
+
+			if (json_output)
+				jsonw_end_array(json_wtr);
+		} else {
+			disasm_print_insn(buf, *member_len, opcodes, name);
+		}
 	} else if (visual) {
 		if (json_output)
 			jsonw_null(json_wtr);
@@ -589,11 +658,13 @@ static int do_dump(int argc, char **argv)
 
 	free(buf);
 	free(func_ksyms);
+	free(func_lens);
 	return 0;
 
 err_free:
 	free(buf);
 	free(func_ksyms);
+	free(func_lens);
 	return -1;
 }
 
diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
index efdc8fecf2bb..b97f1da60dd1 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -102,8 +102,8 @@ void kernel_syms_destroy(struct dump_data *dd)
 	free(dd->sym_mapping);
 }
 
-static struct kernel_sym *kernel_syms_search(struct dump_data *dd,
-					     unsigned long key)
+struct kernel_sym *kernel_syms_search(struct dump_data *dd,
+				      unsigned long key)
 {
 	struct kernel_sym sym = {
 		.address = key,
diff --git a/tools/bpf/bpftool/xlated_dumper.h b/tools/bpf/bpftool/xlated_dumper.h
index eafbb49c8d0b..33d86e2b369b 100644
--- a/tools/bpf/bpftool/xlated_dumper.h
+++ b/tools/bpf/bpftool/xlated_dumper.h
@@ -56,6 +56,7 @@ struct dump_data {
 
 void kernel_syms_load(struct dump_data *dd);
 void kernel_syms_destroy(struct dump_data *dd);
+struct kernel_sym *kernel_syms_search(struct dump_data *dd, unsigned long key);
 void dump_xlated_json(struct dump_data *dd, void *buf, unsigned int len,
 		      bool opcodes);
 void dump_xlated_plain(struct dump_data *dd, void *buf, unsigned int len,
-- 
2.14.3

^ permalink raw reply related
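
Editor's sketch of the splitting step in the patch above: the contiguous JITed dump is cut into one chunk per function using the list of image lengths. The names split_dump() and struct chunk are hypothetical. The check against the total dump size is an addition made for this example; the patch itself advances by lens[i] without such a check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one function's slice of the dump. */
struct chunk {
	size_t off;	/* offset of the image inside the dump */
	uint32_t len;	/* length of the image in bytes */
};

/* Fill out[] with the offset and length of each function image inside a
 * contiguous dump of `total` bytes. Stops early if a length would run
 * past the end of the dump. Returns the number of chunks filled in.
 */
static uint32_t split_dump(size_t total, const uint32_t *lens, uint32_t nr,
			   struct chunk *out)
{
	size_t off = 0;
	uint32_t i;

	for (i = 0; i < nr; i++) {
		if (lens[i] > total - off)
			break;
		out[i].off = off;
		out[i].len = lens[i];
		off += lens[i];
	}
	return i;
}
```

With the x86 example from the commit message, lengths of 0x72 and 0x6d bytes cover a 0xdf-byte dump, so the second function starts at offset 0x72, which is where the second "push %rbp" appears in the unsplit output.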

* Re: [PATCH bpf-next v4 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Jakub Kicinski @ 2018-05-24  7:04 UTC (permalink / raw)
  To: Sandipan Das; +Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao
In-Reply-To: <477bc3286831947c2eb774e8787f72680cf378cf.1527143877.git.sandipan@linux.vnet.ibm.com>

On Thu, 24 May 2018 12:26:54 +0530, Sandipan Das wrote:
> This splits up the contiguous JITed dump obtained via the bpf
> system call into more relatable chunks for each function in
> the program. If the kernel symbols corresponding to these are
> known, they are printed in the header for each JIT image dump
> otherwise the masked start address is printed.

...
 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Thank you!

^ permalink raw reply

* [PATCH v2 00/13] ARM: pxa: switch to DMA slave maps
From: Robert Jarzmik @ 2018-05-24  7:06 UTC (permalink / raw)
  To: Daniel Mack, Haojian Zhuang, Robert Jarzmik, Ezequiel Garcia,
	Boris Brezillon, David Woodhouse, Brian Norris, Marek Vasut,
	Richard Weinberger, Liam Girdwood, Mark Brown, Arnd Bergmann
  Cc: alsa-devel, netdev, linux-mmc, linux-kernel, linux-ide, linux-mtd,
	dmaengine, linux-arm-kernel, linux-media

The v1 cover letter is quoted in [1].

For maintainers, the list below should help you focus on the patches targeted at you and ignore the rest.

The differences since v1, by maintainer / topic / patch:
- Arnd and Daniel / PXA topic / 0002
  devices.c split into pxa25x.c, pxa27x.c and pxa3xx.c
- Boris and Daniel / MTD topic / 0005
  Review and ack of this one
- Arnd and netdev / NET topic / 0006 and 0007
  Arnd's comment taken, review and ack of these two
- Mark and alsa-devel / ASoC topic / 0008
  Mark, I couldn't keep your former Ack because :
    - I changed one line so that the cpu device provides the DMA
    - I added pxa2xx-i2s which was forgotten
  Therefore I need a new ack
- Arnd and Daniel / SSP topic / 0013
  Review and ack of this one

Happy review.

--
Robert

Robert Jarzmik (13):
  dmaengine: pxa: use a dma slave map
  ARM: pxa: add dma slave map
  mmc: pxamci: remove the dmaengine compat need
  media: pxa_camera: remove the dmaengine compat need
  mtd: rawnand: marvell: remove the dmaengine compat need
  net: smc911x: remove the dmaengine compat need
  net: smc91x: remove the dmaengine compat need
  ASoC: pxa: remove the dmaengine compat need
  ata: pata_pxa: remove the dmaengine compat need
  dmaengine: pxa: document pxad_param
  dmaengine: pxa: make the filter function internal
  ARM: pxa: remove the DMA IO resources
  ARM: pxa: change SSP DMA channels allocation

 arch/arm/mach-pxa/devices.c           | 148 +---------------------------------
 arch/arm/mach-pxa/devices.h           |   6 +-
 arch/arm/mach-pxa/pxa25x.c            |  41 +++++++++-
 arch/arm/mach-pxa/pxa27x.c            |  42 +++++++++-
 arch/arm/mach-pxa/pxa3xx.c            |  44 +++++++++-
 arch/arm/plat-pxa/ssp.c               |  47 -----------
 drivers/ata/pata_pxa.c                |  10 +--
 drivers/dma/pxa_dma.c                 |  13 ++-
 drivers/media/platform/pxa_camera.c   |  22 +----
 drivers/mmc/host/pxamci.c             |  29 +------
 drivers/mtd/nand/raw/marvell_nand.c   |  17 +---
 drivers/net/ethernet/smsc/smc911x.c   |  16 +---
 drivers/net/ethernet/smsc/smc91x.c    |  12 +--
 drivers/net/ethernet/smsc/smc91x.h    |   1 -
 include/linux/dma/pxa-dma.h           |  20 +++--
 include/linux/platform_data/mmp_dma.h |   4 +
 include/linux/pxa2xx_ssp.h            |   2 -
 sound/arm/pxa2xx-ac97.c               |  14 +---
 sound/arm/pxa2xx-pcm-lib.c            |   6 +-
 sound/soc/pxa/pxa-ssp.c               |   5 +-
 sound/soc/pxa/pxa2xx-ac97.c           |  32 ++------
 sound/soc/pxa/pxa2xx-i2s.c            |   6 +-
 22 files changed, 180 insertions(+), 357 deletions(-)

-- 
2.11.0

---
[1] Former v1 cover letter
This series aims to remove the dmaengine slave compat use and to transfer
knowledge of the DMA requestors into architecture code.

This was discussed/advised by Arnd a couple of years back, and it's almost time.

The series is divided into 3 phases:
 - phase 1 : patch 1/15 and patch 2/15
   => this is the preparation work
 - phase 2 : patches 3/15 .. 10/15
   => this is the switch of all the drivers
   => this one will require either an Ack of the maintainers or be taken by them
      once phase 1 is merged
 - phase 3 : patches 11/15
   => this is the last part, cleanup and removal of export of the DMA filter
      function

As this looks like a patch bomb, each maintainer who answers for his tree with
either an Ack or "I want to take this through my tree" will be spared the next
iterations of this series.

Several of these changes have been tested on actual hardware, including :
 - pxamci
 - pxa_camera
 - smc*
 - ASoC and SSP

^ permalink raw reply

* [PATCH v2 01/13] dmaengine: pxa: use a dma slave map
From: Robert Jarzmik @ 2018-05-24  7:06 UTC (permalink / raw)
  To: Daniel Mack, Haojian Zhuang, Robert Jarzmik, Ezequiel Garcia,
	Boris Brezillon, David Woodhouse, Brian Norris, Marek Vasut,
	Richard Weinberger, Liam Girdwood, Mark Brown, Arnd Bergmann
  Cc: alsa-devel, netdev, linux-mmc, linux-kernel, linux-ide, linux-mtd,
	dmaengine, linux-arm-kernel, linux-media
In-Reply-To: <20180524070703.11901-1-robert.jarzmik@free.fr>

In order to remove the specific knowledge of the dma mapping from PXA
drivers, add a default slave map for pxa architectures.

This won't impact the MMP architecture; it is aimed only at PXA boards.

This is the first step, and once all drivers are converted,
pxad_filter_fn() will be made static, and the DMA resources removed from
devices.c.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vinod Koul <vkoul@kernel.org>
---
 drivers/dma/pxa_dma.c                 | 10 +++++++++-
 include/linux/platform_data/mmp_dma.h |  4 ++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/pxa_dma.c b/drivers/dma/pxa_dma.c
index b53fb618bbf6..9505334f9c6e 100644
--- a/drivers/dma/pxa_dma.c
+++ b/drivers/dma/pxa_dma.c
@@ -179,6 +179,8 @@ static unsigned int pxad_drcmr(unsigned int line)
 	return 0x1000 + line * 4;
 }
 
+bool pxad_filter_fn(struct dma_chan *chan, void *param);
+
 /*
  * Debug fs
  */
@@ -1396,9 +1398,10 @@ static int pxad_probe(struct platform_device *op)
 {
 	struct pxad_device *pdev;
 	const struct of_device_id *of_id;
+	const struct dma_slave_map *slave_map = NULL;
 	struct mmp_dma_platdata *pdata = dev_get_platdata(&op->dev);
 	struct resource *iores;
-	int ret, dma_channels = 0, nb_requestors = 0;
+	int ret, dma_channels = 0, nb_requestors = 0, slave_map_cnt = 0;
 	const enum dma_slave_buswidth widths =
 		DMA_SLAVE_BUSWIDTH_1_BYTE   | DMA_SLAVE_BUSWIDTH_2_BYTES |
 		DMA_SLAVE_BUSWIDTH_4_BYTES;
@@ -1429,6 +1432,8 @@ static int pxad_probe(struct platform_device *op)
 	} else if (pdata && pdata->dma_channels) {
 		dma_channels = pdata->dma_channels;
 		nb_requestors = pdata->nb_requestors;
+		slave_map = pdata->slave_map;
+		slave_map_cnt = pdata->slave_map_cnt;
 	} else {
 		dma_channels = 32;	/* default 32 channel */
 	}
@@ -1440,6 +1445,9 @@ static int pxad_probe(struct platform_device *op)
 	pdev->slave.device_prep_dma_memcpy = pxad_prep_memcpy;
 	pdev->slave.device_prep_slave_sg = pxad_prep_slave_sg;
 	pdev->slave.device_prep_dma_cyclic = pxad_prep_dma_cyclic;
+	pdev->slave.filter.map = slave_map;
+	pdev->slave.filter.mapcnt = slave_map_cnt;
+	pdev->slave.filter.fn = pxad_filter_fn;
 
 	pdev->slave.copy_align = PDMA_ALIGNMENT;
 	pdev->slave.src_addr_widths = widths;
diff --git a/include/linux/platform_data/mmp_dma.h b/include/linux/platform_data/mmp_dma.h
index d1397c8ed94e..6397b9c8149a 100644
--- a/include/linux/platform_data/mmp_dma.h
+++ b/include/linux/platform_data/mmp_dma.h
@@ -12,9 +12,13 @@
 #ifndef MMP_DMA_H
 #define MMP_DMA_H
 
+struct dma_slave_map;
+
 struct mmp_dma_platdata {
 	int dma_channels;
 	int nb_requestors;
+	int slave_map_cnt;
+	const struct dma_slave_map *slave_map;
 };
 
 #endif /* MMP_DMA_H */
-- 
2.11.0

^ permalink raw reply related
