Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v4 iproute2 net-next] gre6: add collect metadata support
From: Stephen Hemminger @ 2017-12-15  5:20 UTC (permalink / raw)
  To: William Tu; +Cc: netdev, Daniel Borkmann
In-Reply-To: <1513131772-87005-1-git-send-email-u9012063@gmail.com>

On Tue, 12 Dec 2017 18:22:52 -0800
William Tu <u9012063@gmail.com> wrote:

> The patch adds 'external' option to support collect metadata
> gre6 tunnel.  The 'external' keyword is already used to set the
> device into collect metadata mode such as vxlan, geneve, ipip,
> etc.  This patch extends support for ipv6 gre and gretap.
> Example of L3 and L2 gre device:
> bash:~# ip link add dev ip6gre123 type ip6gre external
> bash:~# ip link add dev ip6gretap123 type ip6gretap external
> 
> Signed-off-by: William Tu <u9012063@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>

Applied to net-next

^ permalink raw reply

* Re: [patch iproute2] tc: fix command "tc actions del" hang issue
From: Stephen Hemminger @ 2017-12-15  5:16 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, jiri
In-Reply-To: <20171214090900.14336-1-chrism@mellanox.com>

On Thu, 14 Dec 2017 18:09:00 +0900
Chris Mi <chrism@mellanox.com> wrote:

> If command is RTM_DELACTION, a non-NULL pointer is passed to rtnl_talk().
> Then flag NLM_F_ACK is not set on n->nlmsg_flags and netlink_ack() will
> not be called. Command tc will wait for the reply for ever.
> 
> Fixes: 86bf43c7c2fd ("lib/libnetlink: update rtnl_talk to support malloc
> buff at run time")
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>

Thanks for fixing this.
Applied, but please don't linewrap the fixes tag.

^ permalink raw reply

* [PATCH net-next] net/ncsi: Don't take any action on HNCDSC AEN
From: Samuel Mendoza-Jonas @ 2017-12-15  5:16 UTC (permalink / raw)
  To: netdev; +Cc: Samuel Mendoza-Jonas, David S . Miller, linux-kernel, openbmc

The current HNCDSC handler takes the status flag from the AEN packet and
will update or change the current channel based on this flag and the
current channel status.

However the flag from the HNCDSC packet merely represents the host link
state. While the state of the host interface is potentially interesting
information it should not affect the state of the NCSI link. Indeed the
NCSI specification makes no mention of any recommended action related to
the host network controller driver state.

Update the HNCDSC handler to record the host network driver status but
take no other action.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
---
 net/ncsi/ncsi-aen.c | 35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
index 67e708e98ccf..e7b05de1e6d1 100644
--- a/net/ncsi/ncsi-aen.c
+++ b/net/ncsi/ncsi-aen.c
@@ -143,43 +143,14 @@ static int ncsi_aen_handler_hncdsc(struct ncsi_dev_priv *ndp,
 	if (!nc)
 		return -ENODEV;
 
-	/* If the channel is active one, we need reconfigure it */
 	spin_lock_irqsave(&nc->lock, flags);
 	ncm = &nc->modes[NCSI_MODE_LINK];
 	hncdsc = (struct ncsi_aen_hncdsc_pkt *)h;
 	ncm->data[3] = ntohl(hncdsc->status);
-	netdev_info(ndp->ndev.dev, "NCSI: HNCDSC AEN - channel %u state %s\n",
-		    nc->id, ncm->data[3] & 0x3 ? "up" : "down");
-	if (!list_empty(&nc->link) ||
-	    nc->state != NCSI_CHANNEL_ACTIVE) {
-		spin_unlock_irqrestore(&nc->lock, flags);
-		return 0;
-	}
-
-	spin_unlock_irqrestore(&nc->lock, flags);
-	if (!(ndp->flags & NCSI_DEV_HWA) && !(ncm->data[3] & 0x1))
-		ndp->flags |= NCSI_DEV_RESHUFFLE;
-
-	/* If this channel is the active one and the link doesn't
-	 * work, we have to choose another channel to be active one.
-	 * The logic here is exactly similar to what we do when link
-	 * is down on the active channel.
-	 *
-	 * On the other hand, we need configure it when host driver
-	 * state on the active channel becomes ready.
-	 */
-	ncsi_stop_channel_monitor(nc);
-
-	spin_lock_irqsave(&nc->lock, flags);
-	nc->state = (ncm->data[3] & 0x1) ? NCSI_CHANNEL_INACTIVE :
-					   NCSI_CHANNEL_ACTIVE;
 	spin_unlock_irqrestore(&nc->lock, flags);
-
-	spin_lock_irqsave(&ndp->lock, flags);
-	list_add_tail_rcu(&nc->link, &ndp->channel_queue);
-	spin_unlock_irqrestore(&ndp->lock, flags);
-
-	ncsi_process_next_channel(ndp);
+	netdev_printk(KERN_DEBUG, ndp->ndev.dev,
+		      "NCSI: host driver %srunning on channel %u\n",
+		      ncm->data[3] & 0x1 ? "" : "not ", nc->id);
 
 	return 0;
 }
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH v2 iproute2 1/1] ss: add missing path MTU parameter
From: Stephen Hemminger @ 2017-12-15  5:15 UTC (permalink / raw)
  To: Roman Mashak; +Cc: netdev, jhs
In-Reply-To: <1513281551-31687-1-git-send-email-mrv@mojatatu.com>

On Thu, 14 Dec 2017 14:59:11 -0500
Roman Mashak <mrv@mojatatu.com> wrote:

> v2:
>    Print the path MTU immediately after the MSS, as it is easier to parse
>    for humans (suggested by Neal Cardwell).
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>

Thanks for the patch, it looks like a good field to show.

Unfortunately, it does not apply cleanly to current iproute2 master.
Please rebase and resubmit.

^ permalink raw reply

* Re: [RFC][PATCH] Add primitives for manipulating bitfields both in host- and fixed-endian.
From: Jakub Kicinski @ 2017-12-15  5:07 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, netdev, linux-kernel
In-Reply-To: <20171215023343.GG21978@ZenIV.linux.org.uk>

Looks great to me!

On Fri, 15 Dec 2017 02:33:43 +0000, Al Viro wrote:
> The following primitives are defined in linux/bitfield.h:
> 
> * u32 le32_get_bits(__le32 val, u32 field) extracts the contents of the
>   bitfield specified by @field in little-endian 32bit value @val and
>   converts it to host-endian.
> 
> * void le32p_replace_bits(__le32 *p, u32 v, u32 field) replaces
>   the contents of the bitfield specified by @field in little-endian
>   32bit object pointet to by *p with the value of @v.  New value is
>   given in host-endian and stored as little-endian.
> 
> * __le32 le32_replace_bits(__le32 old, u32 v, u32 field) is equivalent to
>   ({__le32 tmp = old; le32p_replace_bits(&old, v, field); tmp;})
>   In other words, instead of modifying an object in memory, it takes
>   the initial value and returns the modified one.

the current macros take filed/mask as first param, not sure if it's
worth maintaining the order

^ permalink raw reply

* [PATCH v3 net-next 3/3] net: dsa: mediatek: update MAINTAINERS entry with MediaTek switch driver
From: sean.wang @ 2017-12-15  4:47 UTC (permalink / raw)
  To: davem, andrew, f.fainelli, vivien.didelot
  Cc: netdev, linux-kernel, linux-mediatek, Sean Wang
In-Reply-To: <cover.1513312600.git.sean.wang@mediatek.com>

From: Sean Wang <sean.wang@mediatek.com>

I work for MediaTek and maintain SoC targeting to home gateway and
also will keep extending and testing the function from MediaTek
switch.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c86781b..90ca7f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8728,6 +8728,13 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/mediatek/
 
+MEDIATEK SWITCH DRIVER
+M:	Sean Wang <sean.wang@mediatek.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/dsa/mt7530.*
+F:	net/dsa/tag_mtk.c
+
 MEDIATEK JPEG DRIVER
 M:	Rick Chang <rick.chang@mediatek.com>
 M:	Bin Liu <bin.liu@mediatek.com>
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 net-next 2/3] net: dsa: mediatek: combine MediaTek tag with VLAN tag
From: sean.wang @ 2017-12-15  4:47 UTC (permalink / raw)
  To: davem, andrew, f.fainelli, vivien.didelot
  Cc: netdev, linux-kernel, linux-mediatek, Sean Wang
In-Reply-To: <cover.1513312600.git.sean.wang@mediatek.com>

From: Sean Wang <sean.wang@mediatek.com>

In order to let MT7530 switch can recognize well those egress packets
having both special tag and VLAN tag, the information about the special
tag should be carried on the existing VLAN tag. On the other hand, it's
unnecessary for extra handling for ingress packets when VLAN tag is
present since it is able to put the VLAN tag after the special tag and
then follow the existing way to parse.

Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/tag_mtk.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index 8475434..11535bc 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -13,10 +13,13 @@
  */
 
 #include <linux/etherdevice.h>
+#include <linux/if_vlan.h>
 
 #include "dsa_priv.h"
 
 #define MTK_HDR_LEN		4
+#define MTK_HDR_XMIT_UNTAGGED		0
+#define MTK_HDR_XMIT_TAGGED_TPID_8100	1
 #define MTK_HDR_RECV_SOURCE_PORT_MASK	GENMASK(2, 0)
 #define MTK_HDR_XMIT_DP_BIT_MASK	GENMASK(5, 0)
 
@@ -25,20 +28,37 @@ static struct sk_buff *mtk_tag_xmit(struct sk_buff *skb,
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
 	u8 *mtk_tag;
+	bool is_vlan_skb = true;
 
-	if (skb_cow_head(skb, MTK_HDR_LEN) < 0)
-		return NULL;
-
-	skb_push(skb, MTK_HDR_LEN);
+	/* Build the special tag after the MAC Source Address. If VLAN header
+	 * is present, it's required that VLAN header and special tag is
+	 * being combined. Only in this way we can allow the switch can parse
+	 * the both special and VLAN tag at the same time and then look up VLAN
+	 * table with VID.
+	 */
+	if (!skb_vlan_tagged(skb)) {
+		if (skb_cow_head(skb, MTK_HDR_LEN) < 0)
+			return NULL;
 
-	memmove(skb->data, skb->data + MTK_HDR_LEN, 2 * ETH_ALEN);
+		skb_push(skb, MTK_HDR_LEN);
+		memmove(skb->data, skb->data + MTK_HDR_LEN, 2 * ETH_ALEN);
+		is_vlan_skb = false;
+	}
 
-	/* Build the tag after the MAC Source Address */
 	mtk_tag = skb->data + 2 * ETH_ALEN;
-	mtk_tag[0] = 0;
+
+	/* Mark tag attribute on special tag insertion to notify hardware
+	 * whether that's a combined special tag with 802.1Q header.
+	 */
+	mtk_tag[0] = is_vlan_skb ? MTK_HDR_XMIT_TAGGED_TPID_8100 :
+		     MTK_HDR_XMIT_UNTAGGED;
 	mtk_tag[1] = (1 << dp->index) & MTK_HDR_XMIT_DP_BIT_MASK;
-	mtk_tag[2] = 0;
-	mtk_tag[3] = 0;
+
+	/* Tag control information is kept for 802.1Q */
+	if (!is_vlan_skb) {
+		mtk_tag[2] = 0;
+		mtk_tag[3] = 0;
+	}
 
 	return skb;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 net-next 1/3] net: dsa: mediatek: add VLAN support for MT7530
From: sean.wang-NuS5LvNUpcJWk0Htik3J/w @ 2017-12-15  4:47 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, andrew-g2DYL2Zd6BY,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	vivien.didelot-4ysUXcep3aM1wj+D4I0NRVaTQe2KTcn/
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Sean Wang,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <cover.1513312600.git.sean.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>

From: Sean Wang <sean.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>

MT7530 can treat each port as either VLAN-unaware port or VLAN-aware port
through the implementation of port matrix mode or port security mode on
the ingress port, respectively. On one hand, Each port has been acting as
the VLAN-unaware one whenever the device is created in the initial or
certain port joins or leaves into/from the bridge at the runtime. On the
other hand, the patch just filling the required callbacks for VLAN
operations is achieved via extending the port to be into port security
mode when the port is configured as VLAN-aware port. Which mode can make
the port be able to recognize VID from incoming packets and look up VLAN
table to validate and judge which port it should be going to. And the
range for VID from 1 to 4094 is valid for the hardware.

Signed-off-by: Sean Wang <sean.wang-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
Reviewed-by: Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org>
---
 drivers/net/dsa/mt7530.c | 288 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/dsa/mt7530.h |  83 +++++++++++++-
 2 files changed, 364 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 2820d69..8a0bb00 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -805,6 +805,69 @@ mt7530_port_bridge_join(struct dsa_switch *ds, int port,
 }
 
 static void
+mt7530_port_set_vlan_unaware(struct dsa_switch *ds, int port)
+{
+	struct mt7530_priv *priv = ds->priv;
+	bool all_user_ports_removed = true;
+	int i;
+
+	/* When a port is removed from the bridge, the port would be set up
+	 * back to the default as is at initial boot which is a VLAN-unaware
+	 * port.
+	 */
+	mt7530_rmw(priv, MT7530_PCR_P(port), PCR_PORT_VLAN_MASK,
+		   MT7530_PORT_MATRIX_MODE);
+	mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK,
+		   VLAN_ATTR(MT7530_VLAN_TRANSPARENT));
+
+	priv->ports[port].vlan_filtering = false;
+
+	for (i = 0; i < MT7530_NUM_PORTS; i++) {
+		if (dsa_is_user_port(ds, i) &&
+		    priv->ports[i].vlan_filtering) {
+			all_user_ports_removed = false;
+			break;
+		}
+	}
+
+	/* CPU port also does the same thing until all user ports belonging to
+	 * the CPU port get out of VLAN filtering mode.
+	 */
+	if (all_user_ports_removed) {
+		mt7530_write(priv, MT7530_PCR_P(MT7530_CPU_PORT),
+			     PCR_MATRIX(dsa_user_ports(priv->ds)));
+		mt7530_write(priv, MT7530_PVC_P(MT7530_CPU_PORT),
+			     PORT_SPEC_TAG);
+	}
+}
+
+static void
+mt7530_port_set_vlan_aware(struct dsa_switch *ds, int port)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	/* The real fabric path would be decided on the membership in the
+	 * entry of VLAN table. PCR_MATRIX set up here with ALL_MEMBERS
+	 * means potential VLAN can be consisting of certain subset of all
+	 * ports.
+	 */
+	mt7530_rmw(priv, MT7530_PCR_P(port),
+		   PCR_MATRIX_MASK, PCR_MATRIX(MT7530_ALL_MEMBERS));
+
+	/* Trapped into security mode allows packet forwarding through VLAN
+	 * table lookup.
+	 */
+	mt7530_rmw(priv, MT7530_PCR_P(port), PCR_PORT_VLAN_MASK,
+		   MT7530_PORT_SECURITY_MODE);
+
+	/* Set the port as a user port which is to be able to recognize VID
+	 * from incoming packets before fetching entry within the VLAN table.
+	 */
+	mt7530_rmw(priv, MT7530_PVC_P(port), VLAN_ATTR_MASK,
+		   VLAN_ATTR(MT7530_VLAN_USER));
+}
+
+static void
 mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
 			 struct net_device *bridge)
 {
@@ -817,8 +880,11 @@ mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
 		/* Remove this port from the port matrix of the other ports
 		 * in the same bridge. If the port is disabled, port matrix
 		 * is kept and not being setup until the port becomes enabled.
+		 * And the other port's port matrix cannot be broken when the
+		 * other port is still a VLAN-aware port.
 		 */
-		if (dsa_is_user_port(ds, i) && i != port) {
+		if (!priv->ports[i].vlan_filtering &&
+		    dsa_is_user_port(ds, i) && i != port) {
 			if (dsa_to_port(ds, i)->bridge_dev != bridge)
 				continue;
 			if (priv->ports[i].enable)
@@ -836,6 +902,8 @@ mt7530_port_bridge_leave(struct dsa_switch *ds, int port,
 			   PCR_MATRIX(BIT(MT7530_CPU_PORT)));
 	priv->ports[port].pm = PCR_MATRIX(BIT(MT7530_CPU_PORT));
 
+	mt7530_port_set_vlan_unaware(ds, port);
+
 	mutex_unlock(&priv->reg_mutex);
 }
 
@@ -906,6 +974,220 @@ mt7530_port_fdb_dump(struct dsa_switch *ds, int port,
 	return 0;
 }
 
+static int
+mt7530_vlan_cmd(struct mt7530_priv *priv, enum mt7530_vlan_cmd cmd, u16 vid)
+{
+	struct mt7530_dummy_poll p;
+	u32 val;
+	int ret;
+
+	val = VTCR_BUSY | VTCR_FUNC(cmd) | vid;
+	mt7530_write(priv, MT7530_VTCR, val);
+
+	INIT_MT7530_DUMMY_POLL(&p, priv, MT7530_VTCR);
+	ret = readx_poll_timeout(_mt7530_read, &p, val,
+				 !(val & VTCR_BUSY), 20, 20000);
+	if (ret < 0) {
+		dev_err(priv->dev, "poll timeout\n");
+		return ret;
+	}
+
+	val = mt7530_read(priv, MT7530_VTCR);
+	if (val & VTCR_INVALID) {
+		dev_err(priv->dev, "read VTCR invalid\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+mt7530_port_vlan_filtering(struct dsa_switch *ds, int port,
+			   bool vlan_filtering)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	priv->ports[port].vlan_filtering = vlan_filtering;
+
+	if (vlan_filtering) {
+		/* The port is being kept as VLAN-unaware port when bridge is
+		 * set up with vlan_filtering not being set, Otherwise, the
+		 * port and the corresponding CPU port is required the setup
+		 * for becoming a VLAN-aware port.
+		 */
+		mt7530_port_set_vlan_aware(ds, port);
+		mt7530_port_set_vlan_aware(ds, MT7530_CPU_PORT);
+	}
+
+	return 0;
+}
+
+static int
+mt7530_port_vlan_prepare(struct dsa_switch *ds, int port,
+			 const struct switchdev_obj_port_vlan *vlan)
+{
+	/* nothing needed */
+
+	return 0;
+}
+
+static void
+mt7530_hw_vlan_add(struct mt7530_priv *priv,
+		   struct mt7530_hw_vlan_entry *entry)
+{
+	u8 new_members;
+	u32 val;
+
+	new_members = entry->old_members | BIT(entry->port) |
+		      BIT(MT7530_CPU_PORT);
+
+	/* Validate the entry with independent learning, create egress tag per
+	 * VLAN and joining the port as one of the port members.
+	 */
+	val = IVL_MAC | VTAG_EN | PORT_MEM(new_members) | VLAN_VALID;
+	mt7530_write(priv, MT7530_VAWD1, val);
+
+	/* Decide whether adding tag or not for those outgoing packets from the
+	 * port inside the VLAN.
+	 */
+	val = entry->untagged ? MT7530_VLAN_EGRESS_UNTAG :
+				MT7530_VLAN_EGRESS_TAG;
+	mt7530_rmw(priv, MT7530_VAWD2,
+		   ETAG_CTRL_P_MASK(entry->port),
+		   ETAG_CTRL_P(entry->port, val));
+
+	/* CPU port is always taken as a tagged port for serving more than one
+	 * VLANs across and also being applied with egress type stack mode for
+	 * that VLAN tags would be appended after hardware special tag used as
+	 * DSA tag.
+	 */
+	mt7530_rmw(priv, MT7530_VAWD2,
+		   ETAG_CTRL_P_MASK(MT7530_CPU_PORT),
+		   ETAG_CTRL_P(MT7530_CPU_PORT,
+			       MT7530_VLAN_EGRESS_STACK));
+}
+
+static void
+mt7530_hw_vlan_del(struct mt7530_priv *priv,
+		   struct mt7530_hw_vlan_entry *entry)
+{
+	u8 new_members;
+	u32 val;
+
+	new_members = entry->old_members & ~BIT(entry->port);
+
+	val = mt7530_read(priv, MT7530_VAWD1);
+	if (!(val & VLAN_VALID)) {
+		dev_err(priv->dev,
+			"Cannot be deleted due to invalid entry\n");
+		return;
+	}
+
+	/* If certain member apart from CPU port is still alive in the VLAN,
+	 * the entry would be kept valid. Otherwise, the entry is got to be
+	 * disabled.
+	 */
+	if (new_members && new_members != BIT(MT7530_CPU_PORT)) {
+		val = IVL_MAC | VTAG_EN | PORT_MEM(new_members) |
+		      VLAN_VALID;
+		mt7530_write(priv, MT7530_VAWD1, val);
+	} else {
+		mt7530_write(priv, MT7530_VAWD1, 0);
+		mt7530_write(priv, MT7530_VAWD2, 0);
+	}
+}
+
+static void
+mt7530_hw_vlan_update(struct mt7530_priv *priv, u16 vid,
+		      struct mt7530_hw_vlan_entry *entry,
+		      mt7530_vlan_op vlan_op)
+{
+	u32 val;
+
+	/* Fetch entry */
+	mt7530_vlan_cmd(priv, MT7530_VTCR_RD_VID, vid);
+
+	val = mt7530_read(priv, MT7530_VAWD1);
+
+	entry->old_members = (val >> PORT_MEM_SHFT) & PORT_MEM_MASK;
+
+	/* Manipulate entry */
+	vlan_op(priv, entry);
+
+	/* Flush result to hardware */
+	mt7530_vlan_cmd(priv, MT7530_VTCR_WR_VID, vid);
+}
+
+static void
+mt7530_port_vlan_add(struct dsa_switch *ds, int port,
+		     const struct switchdev_obj_port_vlan *vlan)
+{
+	bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
+	bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
+	struct mt7530_hw_vlan_entry new_entry;
+	struct mt7530_priv *priv = ds->priv;
+	u16 vid;
+
+	/* The port is kept as VLAN-unaware if bridge with vlan_filtering not
+	 * being set.
+	 */
+	if (!priv->ports[port].vlan_filtering)
+		return;
+
+	mutex_lock(&priv->reg_mutex);
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+		mt7530_hw_vlan_entry_init(&new_entry, port, untagged);
+		mt7530_hw_vlan_update(priv, vid, &new_entry,
+				      mt7530_hw_vlan_add);
+	}
+
+	if (pvid) {
+		mt7530_rmw(priv, MT7530_PPBV1_P(port), G0_PORT_VID_MASK,
+			   G0_PORT_VID(vlan->vid_end));
+		priv->ports[port].pvid = vlan->vid_end;
+	}
+
+	mutex_unlock(&priv->reg_mutex);
+}
+
+static int
+mt7530_port_vlan_del(struct dsa_switch *ds, int port,
+		     const struct switchdev_obj_port_vlan *vlan)
+{
+	struct mt7530_hw_vlan_entry target_entry;
+	struct mt7530_priv *priv = ds->priv;
+	u16 vid, pvid;
+
+	/* The port is kept as VLAN-unaware if bridge with vlan_filtering not
+	 * being set.
+	 */
+	if (!priv->ports[port].vlan_filtering)
+		return 0;
+
+	mutex_lock(&priv->reg_mutex);
+
+	pvid = priv->ports[port].pvid;
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+		mt7530_hw_vlan_entry_init(&target_entry, port, 0);
+		mt7530_hw_vlan_update(priv, vid, &target_entry,
+				      mt7530_hw_vlan_del);
+
+		/* PVID is being restored to the default whenever the PVID port
+		 * is being removed from the VLAN.
+		 */
+		if (pvid == vid)
+			pvid = G0_PORT_VID_DEF;
+	}
+
+	mt7530_rmw(priv, MT7530_PPBV1_P(port), G0_PORT_VID_MASK, pvid);
+	priv->ports[port].pvid = pvid;
+
+	mutex_unlock(&priv->reg_mutex);
+
+	return 0;
+}
+
 static enum dsa_tag_protocol
 mtk_get_tag_protocol(struct dsa_switch *ds, int port)
 {
@@ -1035,6 +1317,10 @@ static const struct dsa_switch_ops mt7530_switch_ops = {
 	.port_fdb_add		= mt7530_port_fdb_add,
 	.port_fdb_del		= mt7530_port_fdb_del,
 	.port_fdb_dump		= mt7530_port_fdb_dump,
+	.port_vlan_filtering	= mt7530_port_vlan_filtering,
+	.port_vlan_prepare	= mt7530_port_vlan_prepare,
+	.port_vlan_add		= mt7530_port_vlan_add,
+	.port_vlan_del		= mt7530_port_vlan_del,
 };
 
 static int
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 74db982..d9b407a 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -17,6 +17,7 @@
 #define MT7530_NUM_PORTS		7
 #define MT7530_CPU_PORT			6
 #define MT7530_NUM_FDB_RECORDS		2048
+#define MT7530_ALL_MEMBERS		0xff
 
 #define	NUM_TRGMII_CTRL			5
 
@@ -88,21 +89,42 @@ enum mt7530_fdb_cmd {
 /* Register for vlan table control */
 #define MT7530_VTCR			0x90
 #define  VTCR_BUSY			BIT(31)
-#define  VTCR_FUNC			(((x) & 0xf) << 12)
-#define  VTCR_FUNC_RD_VID		0x1
-#define  VTCR_FUNC_WR_VID		0x2
-#define  VTCR_FUNC_INV_VID		0x3
-#define  VTCR_FUNC_VAL_VID		0x4
+#define  VTCR_INVALID			BIT(16)
+#define  VTCR_FUNC(x)			(((x) & 0xf) << 12)
 #define  VTCR_VID			((x) & 0xfff)
 
+enum mt7530_vlan_cmd {
+	/* Read/Write the specified VID entry from VAWD register based
+	 * on VID.
+	 */
+	MT7530_VTCR_RD_VID = 0,
+	MT7530_VTCR_WR_VID = 1,
+};
+
 /* Register for setup vlan and acl write data */
 #define MT7530_VAWD1			0x94
 #define  PORT_STAG			BIT(31)
+/* Independent VLAN Learning */
 #define  IVL_MAC			BIT(30)
+/* Per VLAN Egress Tag Control */
+#define  VTAG_EN			BIT(28)
+/* VLAN Member Control */
 #define  PORT_MEM(x)			(((x) & 0xff) << 16)
-#define  VALID				BIT(1)
+/* VLAN Entry Valid */
+#define  VLAN_VALID			BIT(0)
+#define  PORT_MEM_SHFT			16
+#define  PORT_MEM_MASK			0xff
 
 #define MT7530_VAWD2			0x98
+/* Egress Tag Control */
+#define  ETAG_CTRL_P(p, x)		(((x) & 0x3) << ((p) << 1))
+#define  ETAG_CTRL_P_MASK(p)		ETAG_CTRL_P(p, 3)
+
+enum mt7530_vlan_egress_attr {
+	MT7530_VLAN_EGRESS_UNTAG = 0,
+	MT7530_VLAN_EGRESS_TAG = 2,
+	MT7530_VLAN_EGRESS_STACK = 3,
+};
 
 /* Register for port STP state control */
 #define MT7530_SSP_P(x)			(0x2000 + ((x) * 0x100))
@@ -120,11 +142,23 @@ enum mt7530_stp_state {
 /* Register for port control */
 #define MT7530_PCR_P(x)			(0x2004 + ((x) * 0x100))
 #define  PORT_VLAN(x)			((x) & 0x3)
+
+enum mt7530_port_mode {
+	/* Port Matrix Mode: Frames are forwarded by the PCR_MATRIX members. */
+	MT7530_PORT_MATRIX_MODE = PORT_VLAN(0),
+
+	/* Security Mode: Discard any frame due to ingress membership
+	 * violation or VID missed on the VLAN table.
+	 */
+	MT7530_PORT_SECURITY_MODE = PORT_VLAN(3),
+};
+
 #define  PCR_MATRIX(x)			(((x) & 0xff) << 16)
 #define  PORT_PRI(x)			(((x) & 0x7) << 24)
 #define  EG_TAG(x)			(((x) & 0x3) << 28)
 #define  PCR_MATRIX_MASK		PCR_MATRIX(0xff)
 #define  PCR_MATRIX_CLR			PCR_MATRIX(0)
+#define  PCR_PORT_VLAN_MASK		PORT_VLAN(3)
 
 /* Register for port security control */
 #define MT7530_PSC_P(x)			(0x200c + ((x) * 0x100))
@@ -134,10 +168,20 @@ enum mt7530_stp_state {
 #define MT7530_PVC_P(x)			(0x2010 + ((x) * 0x100))
 #define  PORT_SPEC_TAG			BIT(5)
 #define  VLAN_ATTR(x)			(((x) & 0x3) << 6)
+#define  VLAN_ATTR_MASK			VLAN_ATTR(3)
+
+enum mt7530_vlan_port_attr {
+	MT7530_VLAN_USER = 0,
+	MT7530_VLAN_TRANSPARENT = 3,
+};
+
 #define  STAG_VPID			(((x) & 0xffff) << 16)
 
 /* Register for port port-and-protocol based vlan 1 control */
 #define MT7530_PPBV1_P(x)		(0x2014 + ((x) * 0x100))
+#define  G0_PORT_VID(x)			(((x) & 0xfff) << 0)
+#define  G0_PORT_VID_MASK		G0_PORT_VID(0xfff)
+#define  G0_PORT_VID_DEF		G0_PORT_VID(1)
 
 /* Register for port MAC control register */
 #define MT7530_PMCR_P(x)		(0x3000 + ((x) * 0x100))
@@ -345,9 +389,20 @@ struct mt7530_fdb {
 	bool noarp;
 };
 
+/* struct mt7530_port -	This is the main data structure for holding the state
+ *			of the port.
+ * @enable:	The status used for show port is enabled or not.
+ * @pm:		The matrix used to show all connections with the port.
+ * @pvid:	The VLAN specified is to be considered a PVID at ingress.  Any
+ *		untagged frames will be assigned to the related VLAN.
+ * @vlan_filtering: The flags indicating whether the port that can recognize
+ *		    VLAN-tagged frames.
+ */
 struct mt7530_port {
 	bool enable;
 	u32 pm;
+	u16 pvid;
+	bool vlan_filtering;
 };
 
 /* struct mt7530_priv -	This is the main data structure for holding the state
@@ -382,6 +437,22 @@ struct mt7530_priv {
 	struct mutex reg_mutex;
 };
 
+struct mt7530_hw_vlan_entry {
+	int port;
+	u8  old_members;
+	bool untagged;
+};
+
+static inline void mt7530_hw_vlan_entry_init(struct mt7530_hw_vlan_entry *e,
+					     int port, bool untagged)
+{
+	e->port = port;
+	e->untagged = untagged;
+}
+
+typedef void (*mt7530_vlan_op)(struct mt7530_priv *,
+			       struct mt7530_hw_vlan_entry *);
+
 struct mt7530_hw_stats {
 	const char	*string;
 	u16		reg;
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 net-next 0/3] add VLAN support to DSA MT7530
From: sean.wang @ 2017-12-15  4:46 UTC (permalink / raw)
  To: davem, andrew, f.fainelli, vivien.didelot
  Cc: netdev, linux-kernel, linux-mediatek, Sean Wang

From: Sean Wang <sean.wang@mediatek.com>

Changes sicne v2:
update to the latest code base from net-next and fix up all building
errors with -Werror.

Changes since v1:
- fix up the typo
- prefer ordering declarations longest to shortest
- update that vlan_prepare callback should not change any state
- use lower case letter for function naming

The patchset extends DSA MT7530 to VLAN support through filling required
callbacks in patch 1 and merging the special tag with VLAN tag in patch 2
for allowing that the hardware can handle these packets with VID from the
CPU port.

Sean Wang (3):
  net: dsa: mediatek: add VLAN support for MT7530
  net: dsa: mediatek: combine MediaTek tag with VLAN tag
  net: dsa: mediatek: update MAINTAINERS entry with MediaTek switch
    driver

 MAINTAINERS              |   7 ++
 drivers/net/dsa/mt7530.c | 288 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/dsa/mt7530.h |  83 +++++++++++++-
 net/dsa/tag_mtk.c        |  38 +++++--
 4 files changed, 400 insertions(+), 16 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH] ip6_gre: fix a pontential issue in ip6erspan_rcv
From: Haishuang Yan @ 2017-12-15  2:46 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan, William Tu

pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
the right place.

Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv6/ip6_gre.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 68e7eef..eab4b56 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -506,12 +506,12 @@ static int ip6erspan_rcv(struct sk_buff *skb, int gre_hdr_len,
 	struct ip6_tnl *tunnel;
 	__be32 index;
 
-	ipv6h = ipv6_hdr(skb);
-	ershdr = (struct erspanhdr *)skb->data;
-
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr))))
 		return PACKET_REJECT;
 
+	ipv6h = ipv6_hdr(skb);
+	ershdr = (struct erspanhdr *)skb->data;
+
 	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
 	index = ershdr->md.index;
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH] ip_gre: fix wrong return value of erspan_rcv
From: Haishuang Yan @ 2017-12-15  2:46 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: netdev, linux-kernel, Haishuang Yan, William Tu

If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 net/ipv4/ip_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 61ee014..d747d06 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -267,7 +267,7 @@ static int erspan_rcv(struct sk_buff *skb, struct tnl_ptk_info *tpi,
 	len = gre_hdr_len + sizeof(*ershdr);
 
 	if (unlikely(!pskb_may_pull(skb, len)))
-		return -ENOMEM;
+		return PACKET_REJECT;
 
 	iph = ip_hdr(skb);
 	ershdr = (struct erspanhdr *)(skb->data + gre_hdr_len);
-- 
1.8.3.1

^ permalink raw reply related

* [RFC][PATCH] Add primitives for manipulating bitfields both in host- and fixed-endian.
From: Al Viro @ 2017-12-15  2:33 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Linus Torvalds, netdev, linux-kernel
In-Reply-To: <20171213174554.GE21978@ZenIV.linux.org.uk>

The following primitives are defined in linux/bitfield.h:

* u32 le32_get_bits(__le32 val, u32 field) extracts the contents of the
  bitfield specified by @field in little-endian 32bit value @val and
  converts it to host-endian.

* void le32p_replace_bits(__le32 *p, u32 v, u32 field) replaces
  the contents of the bitfield specified by @field in little-endian
  32bit object pointet to by *p with the value of @v.  New value is
  given in host-endian and stored as little-endian.

* __le32 le32_replace_bits(__le32 old, u32 v, u32 field) is equivalent to
  ({__le32 tmp = old; le32p_replace_bits(&old, v, field); tmp;})
  In other words, instead of modifying an object in memory, it takes
  the initial value and returns the modified one.

Such set of helpers is defined for each of little-, big- and host-endian
types; e.g. u64_get_bits(val, field) will return the contents of the bitfield
specified by @field in host-endian 64bit value @val, etc.  Of course, for
host-endian no conversion is involved.

Fields to access are specified as GENMASK() values - an N-bit field
starting at bit #M is encoded as GENMASK(M + N - 1, N).  Note that
bit numbers refer to endianness of the object we are working with -
e.g. GENMASK(11, 0) in __be16 refers to the second byte and the lower
4 bits of the first byte.  In __le16 it would refer to the first byte
and the lower 4 bits of the second byte, etc.

Field specification must be a constant; __builtin_constant_p() doesn't
have to be true for it, but compiler must be able to evaluate it at
build time.  If it cannot or if the value does not encode any bitfield,
the build will fail.

If the value being stored in ..._replace_bits() is a constant that does
not fit into bitfield, a warning will be generated at compile time.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
index 1030651f8309..4b4f0531a79c 100644
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -16,6 +16,7 @@
 #define _LINUX_BITFIELD_H

 #include <linux/build_bug.h>
+#include <asm/byteorder.h>

 /*
  * Bitfield access macros
@@ -103,4 +104,50 @@
 		(typeof(_mask))(((_reg) & (_mask)) >> __bf_shf(_mask));	\
 	})

+extern void __compiletime_warning("value doesn't fit into mask")
+__field_overflow(void);
+extern void __compiletime_error("bad bitfield mask")
+__bad_mask(void);
+static __always_inline u64 mask_to_multiplier(u64 mask)
+{
+	if ((mask | (mask - 1)) & ((mask | (mask - 1)) + 1))
+		__bad_mask();
+	return mask & -mask;
+}
+
+#define ____MAKE_OP(type,base,to,from)					\
+static __always_inline __##type type##_replace_bits(__##type old,	\
+					base val, base field)		\
+{									\
+	__##type m = to(field);						\
+        if (__builtin_constant_p(val) &&				\
+		    (val & ~(field/mask_to_multiplier(field))))		\
+				    __field_overflow();			\
+	return (old & ~m) |						\
+		(to(val * mask_to_multiplier(field)) & m);		\
+}									\
+static __always_inline void type##p_replace_bits(__##type *p,		\
+					base val, base field)		\
+{									\
+	__##type m = to(field);						\
+        if (__builtin_constant_p(val) &&				\
+		    (val & ~(field/mask_to_multiplier(field))))		\
+				    __field_overflow();			\
+	*p = (*p & ~m) |						\
+		(to(val * mask_to_multiplier(field)) & m);		\
+}									\
+static __always_inline base type##_get_bits(__##type v, base field)	\
+{									\
+	return (from(v) & field)/mask_to_multiplier(field);		\
+}
+#define __MAKE_OP(size)							\
+	____MAKE_OP(le##size,u##size,cpu_to_le##size,le##size##_to_cpu)	\
+	____MAKE_OP(be##size,u##size,cpu_to_be##size,be##size##_to_cpu)	\
+	____MAKE_OP(u##size,u##size,,)
+__MAKE_OP(16)
+__MAKE_OP(32)
+__MAKE_OP(64)
+#undef __MAKE_OP
+#undef ____MAKE_OP
+
 #endif

^ permalink raw reply related

* Re: [PATCH iproute2] iplink: validate maximum gso_max_size
From: Stephen Hemminger @ 2017-12-15  2:24 UTC (permalink / raw)
  To: Solio Sarabia; +Cc: netdev
In-Reply-To: <1513117522-5111-1-git-send-email-solio.sarabia@intel.com>

On Tue, 12 Dec 2017 14:25:22 -0800
Solio Sarabia <solio.sarabia@intel.com> wrote:

> Validate the upper limit for gso_max_size, valid range is [0-65,536]
> inclusive. Fix minor whitespace in iplink man page.
> 
> Signed-off-by: Solio Sarabia <solio.sarabia@intel.com>
> ---

Applied to net-next

^ permalink raw reply

* [PATCH iproute] ip: validate vlan value for vlan info
From: Stephen Hemminger @ 2017-12-15  2:19 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

The VLAN tag must be 0..4095 to be valid.
Better to trap it here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/iplink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index e83f1477e7bc..1e685cc23805 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -276,11 +276,13 @@ static void iplink_parse_vf_vlan_info(int vf, int *argcp, char ***argvp,
 {
 	int argc = *argcp;
 	char **argv = *argvp;
+	unsigned int vci;
 
 	NEXT_ARG();
-	if (get_unsigned(&ivvip->vlan, *argv, 0))
+	if (get_unsigned(&vci, *argv, 0) || vci > 4095)
 		invarg("Invalid \"vlan\" value\n", *argv);
 
+	ivvip->vlan = vci;
 	ivvip->vf = vf;
 	ivvip->qos = 0;
 	ivvip->vlan_proto = htons(ETH_P_8021Q);
-- 
2.11.0

^ permalink raw reply related

* [PATCH v3 2/2] ARM64: dts: meson-axg: enable ethernet for A113D S400 board
From: Yixun Lan @ 2017-12-15  2:10 UTC (permalink / raw)
  To: devicetree-u79uwXL29TY76Z2rM5mHXA, Kevin Hilman
  Cc: Neil Armstrong, Jerome Brunet, Giuseppe Cavallaro,
	Alexandre Torgue, Carlo Caione, Yixun Lan,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20171215021014.231308-1-yixun.lan-LpR1jeaWuhtBDgjK7y7TUQ@public.gmane.org>

This is tested in the S400 dev board which use a RTL8211F PHY,
and the pins connect to the 'eth_rgmii_y_pins' group.

Reviewed-by: Neil Armstrong <narmstrong-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yixun Lan <yixun.lan-LpR1jeaWuhtBDgjK7y7TUQ@public.gmane.org>
---
 arch/arm64/boot/dts/amlogic/meson-axg-s400.dts | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts b/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts
index 70eca1f8736a..b8c4f1913d28 100644
--- a/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts
@@ -20,3 +20,10 @@
 &uart_AO {
 	status = "okay";
 };
+
+&ethmac {
+	status = "okay";
+	phy-mode = "rgmii";
+	pinctrl-0 = <&eth_rgmii_y_pins>;
+	pinctrl-names = "default";
+};
-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 1/2] ARM64: dts: meson-axg: add ethernet mac controller
From: Yixun Lan @ 2017-12-15  2:10 UTC (permalink / raw)
  To: devicetree, Kevin Hilman
  Cc: Neil Armstrong, Jerome Brunet, Giuseppe Cavallaro,
	Alexandre Torgue, Carlo Caione, Yixun Lan, linux-amlogic,
	linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20171215021014.231308-1-yixun.lan@amlogic.com>

Add DT info for the stmmac ethernet MAC which found in
the Amlogic's Meson-AXG SoC, also describe the ethernet
pinctrl & clock information here.

Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Yixun Lan <yixun.lan@amlogic.com>
---
 arch/arm64/boot/dts/amlogic/meson-axg.dtsi | 54 ++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
index d356ce74ad89..94c4972222b7 100644
--- a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
@@ -7,6 +7,7 @@
 #include <dt-bindings/gpio/gpio.h>
 #include <dt-bindings/interrupt-controller/irq.h>
 #include <dt-bindings/interrupt-controller/arm-gic.h>
+#include <dt-bindings/clock/axg-clkc.h>
 
 / {
 	compatible = "amlogic,meson-axg";
@@ -148,6 +149,19 @@
 			#address-cells = <0>;
 		};
 
+		ethmac: ethernet@ff3f0000 {
+			compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
+			reg = <0x0 0xff3f0000 0x0 0x10000
+				0x0 0xff634540 0x0 0x8>;
+			interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>;
+			interrupt-names = "macirq";
+			clocks = <&clkc CLKID_ETH>,
+				 <&clkc CLKID_FCLK_DIV2>,
+				 <&clkc CLKID_MPLL2>;
+			clock-names = "stmmaceth", "clkin0", "clkin1";
+			status = "disabled";
+		};
+
 		hiubus: bus@ff63c000 {
 			compatible = "simple-bus";
 			reg = <0x0 0xff63c000 0x0 0x1c00>;
@@ -194,6 +208,46 @@
 					#gpio-cells = <2>;
 					gpio-ranges = <&pinctrl_periphs 0 0 86>;
 				};
+
+				eth_rgmii_x_pins: eth-x-rgmii {
+					mux {
+						groups = "eth_mdio_x",
+						       "eth_mdc_x",
+						       "eth_rgmii_rx_clk_x",
+						       "eth_rx_dv_x",
+						       "eth_rxd0_x",
+						       "eth_rxd1_x",
+						       "eth_rxd2_rgmii",
+						       "eth_rxd3_rgmii",
+						       "eth_rgmii_tx_clk",
+						       "eth_txen_x",
+						       "eth_txd0_x",
+						       "eth_txd1_x",
+						       "eth_txd2_rgmii",
+						       "eth_txd3_rgmii";
+						function = "eth";
+					};
+				};
+
+				eth_rgmii_y_pins: eth-y-rgmii {
+					mux {
+						groups = "eth_mdio_y",
+						       "eth_mdc_y",
+						       "eth_rgmii_rx_clk_y",
+						       "eth_rx_dv_y",
+						       "eth_rxd0_y",
+						       "eth_rxd1_y",
+						       "eth_rxd2_rgmii",
+						       "eth_rxd3_rgmii",
+						       "eth_rgmii_tx_clk",
+						       "eth_txen_y",
+						       "eth_txd0_y",
+						       "eth_txd1_y",
+						       "eth_txd2_rgmii",
+						       "eth_txd3_rgmii";
+						function = "eth";
+					};
+				};
 			};
 		};
 
-- 
2.15.1

^ permalink raw reply related

* [PATCH v3 0/2] Add ethernet support for Meson-AXG SoC
From: Yixun Lan @ 2017-12-15  2:10 UTC (permalink / raw)
  To: devicetree-u79uwXL29TY76Z2rM5mHXA, Kevin Hilman
  Cc: Neil Armstrong, Jerome Brunet, Giuseppe Cavallaro,
	Alexandre Torgue, Carlo Caione, Yixun Lan,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

This series try to add support for the ethernet MAC controller
found in Meson-AXG SoC, and also enable it in the S400 board.

Changes in v3 since [4]:
 - put clock DT info in soc.dtsi
 - separate DT for 'add support for the controller' vs 'enable in board'

Changes in v2 since [1]:
 - rebase to kevin's v4.16/dt64 branch
 - add Neil's Reviewed-by
 - move clock info to board.dts instead of in soc.dtsi
 - drop "meson-axg-dwmac" compatible string, since we didn't use this
   we could re-add it later when we really need.
 - note: to make ethernet work properly,it depend on clock & pinctrl[2],
   to compile the DTS, the patch [3] is required.
   the code part will be taken via clock & pinctrl subsystem tree.

[1]
http://lists.infradead.org/pipermail/linux-amlogic/2017-November/005301.html

[4]
http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005768.html

[2]
http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005735.html
http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005694.html

[3]
http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005738.html

Yixun Lan (2):
  ARM64: dts: meson-axg: add ethernet mac controller
  ARM64: dts: meson-axg: enable ethernet for A113D S400 board

 arch/arm64/boot/dts/amlogic/meson-axg-s400.dts |  7 ++++
 arch/arm64/boot/dts/amlogic/meson-axg.dtsi     | 54 ++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

-- 
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] net: phy: broadcom: Add entry for 5395 switch PHYs
From: Chris Healy @ 2017-12-15  2:00 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, Andrew Lunn
In-Reply-To: <1513302496-22888-1-git-send-email-f.fainelli@gmail.com>

Tested-by: Chris Healy <cphealy@gmail.com>

Successfully tested on a machine with a Broadcom BCM5395 switch.

On Thu, Dec 14, 2017 at 5:48 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Add an entry for the builtin PHYs present in the Broadcom BCM5395 switch. This
> allows us to retrieve the PHY statistics among other things.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  drivers/net/phy/broadcom.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/brcmphy.h    |  1 +
>  2 files changed, 43 insertions(+)
>
> diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
> index a8f69c5777bc..3bb6b66dc7bf 100644
> --- a/drivers/net/phy/broadcom.c
> +++ b/drivers/net/phy/broadcom.c
> @@ -540,6 +540,37 @@ static int brcm_fet_config_intr(struct phy_device *phydev)
>         return err;
>  }
>
> +struct bcm53xx_phy_priv {
> +       u64     *stats;
> +};
> +
> +static int bcm53xx_phy_probe(struct phy_device *phydev)
> +{
> +       struct bcm53xx_phy_priv *priv;
> +
> +       priv = devm_kzalloc(&phydev->mdio.dev, sizeof(*priv), GFP_KERNEL);
> +       if (!priv)
> +               return -ENOMEM;
> +
> +       phydev->priv = priv;
> +
> +       priv->stats = devm_kcalloc(&phydev->mdio.dev,
> +                                  bcm_phy_get_sset_count(phydev), sizeof(u64),
> +                                  GFP_KERNEL);
> +       if (!priv->stats)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +static void bcm53xx_phy_get_stats(struct phy_device *phydev,
> +                                 struct ethtool_stats *stats, u64 *data)
> +{
> +       struct bcm53xx_phy_priv *priv = phydev->priv;
> +
> +       bcm_phy_get_stats(phydev, priv->stats, stats, data);
> +}
> +
>  static struct phy_driver broadcom_drivers[] = {
>  {
>         .phy_id         = PHY_ID_BCM5411,
> @@ -679,6 +710,16 @@ static struct phy_driver broadcom_drivers[] = {
>         .config_init    = brcm_fet_config_init,
>         .ack_interrupt  = brcm_fet_ack_interrupt,
>         .config_intr    = brcm_fet_config_intr,
> +}, {
> +       .phy_id         = PHY_ID_BCM5395,
> +       .phy_id_mask    = 0xfffffff0,
> +       .name           = "Broadcom BCM5395",
> +       .flags          = PHY_IS_INTERNAL,
> +       .features       = PHY_GBIT_FEATURES,
> +       .get_sset_count = bcm_phy_get_sset_count,
> +       .get_strings    = bcm_phy_get_strings,
> +       .get_stats      = bcm53xx_phy_get_stats,
> +       .probe          = bcm53xx_phy_probe,
>  } };
>
>  module_phy_driver(broadcom_drivers);
> @@ -699,6 +740,7 @@ static struct mdio_device_id __maybe_unused broadcom_tbl[] = {
>         { PHY_ID_BCM57780, 0xfffffff0 },
>         { PHY_ID_BCMAC131, 0xfffffff0 },
>         { PHY_ID_BCM5241, 0xfffffff0 },
> +       { PHY_ID_BCM5395, 0xfffffff0 },
>         { }
>  };
>
> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> index 8ff86b4c1b8a..d3339dd48b1a 100644
> --- a/include/linux/brcmphy.h
> +++ b/include/linux/brcmphy.h
> @@ -14,6 +14,7 @@
>  #define PHY_ID_BCM5241                 0x0143bc30
>  #define PHY_ID_BCMAC131                        0x0143bc70
>  #define PHY_ID_BCM5481                 0x0143bca0
> +#define PHY_ID_BCM5395                 0x0143bcf0
>  #define PHY_ID_BCM54810                        0x03625d00
>  #define PHY_ID_BCM5482                 0x0143bcb0
>  #define PHY_ID_BCM5411                 0x00206070
> --
> 2.7.4
>

^ permalink raw reply

* [PATCH net-next] net: dsa: bcm_sf2: Update compatible string for 7278B0
From: Florian Fainelli @ 2017-12-15  1:59 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: Florian Fainelli, Rob Herring, Mark Rutland, Andrew Lunn,
	Vivien Didelot, David S. Miller,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	open list

Update the compatible string and Device Tree binding document for
7278B0.

Signed-off-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt | 5 ++++-
 drivers/net/dsa/bcm_sf2.c                                          | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt b/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt
index 9a734d808aa7..b7336b9d6a3c 100644
--- a/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt
+++ b/Documentation/devicetree/bindings/net/brcm,bcm7445-switch-v4.0.txt
@@ -2,7 +2,10 @@
 
 Required properties:
 
-- compatible: should be "brcm,bcm7445-switch-v4.0" or "brcm,bcm7278-switch-v4.0"
+- compatible: should be one of
+	"brcm,bcm7445-switch-v4.0"
+	"brcm,bcm7278-switch-v4.0"
+	"brcm,bcm7278-switch-v4.8"
 - reg: addresses and length of the register sets for the device, must be 6
   pairs of register addresses and lengths
 - interrupts: interrupts for the devices, must be two interrupts
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8ea0cd494ea0..0378eded31f2 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -948,6 +948,9 @@ static const struct of_device_id bcm_sf2_of_match[] = {
 	{ .compatible = "brcm,bcm7278-switch-v4.0",
 	  .data = &bcm_sf2_7278_data
 	},
+	{ .compatible = "brcm,bcm7278-switch-v4.8",
+	  .data = &bcm_sf2_7278_data
+	},
 	{ /* sentinel */ },
 };
 MODULE_DEVICE_TABLE(of, bcm_sf2_of_match);
-- 
2.14.1

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH] ethtool: fix MFLCN register dump for 82599 and newer
From: Zhang Kang @ 2017-12-15  1:56 UTC (permalink / raw)
  To: linville, netdev; +Cc: Zhang Kang, Gao Wayne, Wei Net

Use MFLCN for 82599 and X540 HW instead of FCTRL.

Signed-off-by: Zhang Kang <tjbroadroad@163.com>
Signed-off-by: Gao Wayne <Wayne.Gao@emc.com>
Signed-off-by: Wei Net <gw_net@163.com>
---
 ixgbe.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ixgbe.c b/ixgbe.c
index ff0e769..c632137 100644
--- a/ixgbe.c
+++ b/ixgbe.c
@@ -232,9 +232,9 @@ ixgbe_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 		"       Receive Priority Flow Control Packets:         %s\n",
 		reg,
 		reg & IXGBE_MFLCN_RFCE    ? "enabled"  : "disabled",
-		reg & IXGBE_FCTRL_DPF     ? "enabled"  : "disabled",
-		reg & IXGBE_FCTRL_PMCF    ? "enabled"  : "disabled",
-		reg & IXGBE_FCTRL_RPFCE   ? "enabled"  : "disabled");
+		reg & IXGBE_MFLCN_DPF     ? "enabled"  : "disabled",
+		reg & IXGBE_MFLCN_PMCF    ? "enabled"  : "disabled",
+		reg & IXGBE_MFLCN_RPFCE   ? "enabled"  : "disabled");
 	}
 
 	reg = regs_buff[516];
-- 
2.10.2.windows.1

^ permalink raw reply related

* Re: b53 tags on bpi-r1 (bcm53125)
From: Florian Fainelli @ 2017-12-15  1:55 UTC (permalink / raw)
  To: Jochen Friedrich; +Cc: netdev
In-Reply-To: <20171123201445.Horde.3FbuUT5FWgWtZrA-KljDJ7W@webmail.scram.de>

Hi Jochen,

On 11/23/2017 11:14 AM, Jochen Friedrich wrote:
> Hi Florian,
> 
>> With Broadcom tags (or any type of switch tagging protocol), eth0
>> becomes a conduit interface and no longer a "normal" network device for
>> applications/socket to use. This means that if you were obtaining an IP
>> address through a DHCP client using e.g: dhclient eth0, this now needs
>> to be replaced using dhclient lan1 for instance.
> 
> Yes, that's what I would expect.
> 
>> If you use the per-port network devices, do you actually see Broadcom
>> tags coming out of these interfaces, or is it just when you use eth0
>> like what you used before that you see that happening?
> 
> For outgoing Packet, I see the tags on eth0, that's expected, but also
> an a laptop connected to any of the external ports (no matter which one,
> wan or lan1-4).
> Incoming packets are seen on eth0 untagged and don't get forwarded to
> the corresponding child Interface.>
> So it seems the switch operates in dumb forwarding mode.

Humm, what if you comment that part from b53_set_forwarding:

        b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_CTRL, &mgmt);
        mgmt |= B53_MII_DUMB_FWDG_EN;
        b53_write8(dev, B53_CTRL_PAGE, B53_SWITCH_CTRL, mgmt);

does this help?

> 
> I also tried a pull down on LED Port 0 (mii_dumb_fwdg_en), but the
> switch still remains in dumb mode. However, I would have to double check
> I picked the correct pin, the Bpi-r1 schematics is not completely error
> free :-(.
> 
> Thanks, Jochen

-- 
Florian

^ permalink raw reply

* [PATCH bpf-next 05/13] selftests/bpf: add tests for stack_zero tracking
From: Alexei Starovoitov @ 2017-12-15  1:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Daniel Borkmann, John Fastabend, Edward Cree, Jakub Kicinski,
	netdev, kernel-team
In-Reply-To: <20171215015517.409513-1-ast@kernel.org>

From: Alexei Starovoitov <ast@fb.com>

adjust two tests, since verifier got smarter
and add new one to test stack_zero logic

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/testing/selftests/bpf/test_verifier.c | 66 ++++++++++++++++++++++++++++-
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 88f389c6ec48..eaf294822a8f 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -5649,7 +5649,7 @@ static struct bpf_test tests[] = {
 		"helper access to variable memory: size > 0 not allowed on NULL (ARG_PTR_TO_MEM_OR_NULL)",
 		.insns = {
 			BPF_MOV64_IMM(BPF_REG_1, 0),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 1),
 			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128),
 			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128),
 			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64),
@@ -5884,7 +5884,7 @@ static struct bpf_test tests[] = {
 			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24),
 			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
 			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 1),
 			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128),
 			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128),
 			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 63),
@@ -9056,6 +9056,68 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
+	{
+		"calls: caller stack init to zero or map_value_or_null",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+			/* fetch map_value_or_null or const_zero from stack */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+			/* store into map_value */
+			BPF_ST_MEM(BPF_W, BPF_REG_0, 0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			/* if (ctx == 0) return; */
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
+			/* else bpf_map_lookup() and *(fp - 8) = r0 */
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_2),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_map_lookup_elem),
+			/* write map_value_ptr_or_null into stack frame of main prog at fp-8 */
+			BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map1 = { 13 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_XDP,
+	},
+	{
+		"calls: stack init to zero and pruning",
+		.insns = {
+			/* first make allocated_stack 16 byte */
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, 0),
+			/* now fork the execution such that the false branch
+			 * of JGT insn will be verified second and it skisp zero
+			 * init of fp-8 stack slot. If stack liveness marking
+			 * is missing live_read marks from call map_lookup
+			 * processing then pruning will incorrectly assume
+			 * that fp-8 stack slot was unused in the fall-through
+			 * branch and will accept the program incorrectly
+			 */
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_1, 2, 2),
+			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_map_lookup_elem),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 6 },
+		.errstr = "invalid indirect read from stack off -8+0 size 8",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_XDP,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next 06/13] libbpf: add support for bpf_call
From: Alexei Starovoitov @ 2017-12-15  1:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Daniel Borkmann, John Fastabend, Edward Cree, Jakub Kicinski,
	netdev, kernel-team
In-Reply-To: <20171215015517.409513-1-ast@kernel.org>

From: Alexei Starovoitov <ast@fb.com>

- recognize relocation emitted by llvm
- since all regular function will be kept in .text section and llvm
  takes care of pc-relative offsets in bpf_call instruction
  simply copy all of .text to relevant program section while adjusting
  bpf_call instructions in program section to point to newly copied
  body of instructions from .text
- do so for all programs in the elf file
- set all programs types to the one passed to bpf_prog_load()

Note for elf files with multiple programs that use different
functions in .text section we need to do 'linker' style logic.
This work is still TBD

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/uapi/linux/bpf.h |   6 ++
 tools/lib/bpf/bpf.h            |   2 +-
 tools/lib/bpf/libbpf.c         | 170 ++++++++++++++++++++++++++++++-----------
 3 files changed, 134 insertions(+), 44 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index cf446c25c0ec..db1b0923a308 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -197,8 +197,14 @@ enum bpf_attach_type {
  */
 #define BPF_F_STRICT_ALIGNMENT	(1U << 0)
 
+/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
 #define BPF_PSEUDO_MAP_FD	1
 
+/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
+ * offset to another bpf function
+ */
+#define BPF_PSEUDO_CALL		1
+
 /* flags for BPF_MAP_UPDATE_ELEM command */
 #define BPF_ANY		0 /* create new element or update existing */
 #define BPF_NOEXIST	1 /* create new element if it didn't exist */
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6534889e2b2f..9f44c196931e 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -40,7 +40,7 @@ int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
 			  __u32 map_flags);
 
 /* Recommend log buffer size */
-#define BPF_LOG_BUF_SIZE 65536
+#define BPF_LOG_BUF_SIZE (256 * 1024)
 int bpf_load_program_name(enum bpf_prog_type type, const char *name,
 			  const struct bpf_insn *insns,
 			  size_t insns_cnt, const char *license,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 65d0d0aff4fa..5b83875b3594 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -174,12 +174,19 @@ struct bpf_program {
 	char *name;
 	char *section_name;
 	struct bpf_insn *insns;
-	size_t insns_cnt;
+	size_t insns_cnt, main_prog_cnt;
 	enum bpf_prog_type type;
 
-	struct {
+	struct reloc_desc {
+		enum {
+			RELO_LD64,
+			RELO_CALL,
+		} type;
 		int insn_idx;
-		int map_idx;
+		union {
+			int map_idx;
+			int text_off;
+		};
 	} *reloc_desc;
 	int nr_reloc;
 
@@ -234,6 +241,7 @@ struct bpf_object {
 		} *reloc;
 		int nr_reloc;
 		int maps_shndx;
+		int text_shndx;
 	} efile;
 	/*
 	 * All loaded bpf_object is linked in a list, which is
@@ -375,9 +383,13 @@ bpf_object__init_prog_names(struct bpf_object *obj)
 	size_t pi, si;
 
 	for (pi = 0; pi < obj->nr_programs; pi++) {
-		char *name = NULL;
+		const char *name = NULL;
 
 		prog = &obj->programs[pi];
+		if (prog->idx == obj->efile.text_shndx) {
+			name = ".text";
+			goto skip_search;
+		}
 
 		for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name;
 		     si++) {
@@ -405,7 +417,7 @@ bpf_object__init_prog_names(struct bpf_object *obj)
 				   prog->section_name);
 			return -EINVAL;
 		}
-
+skip_search:
 		prog->name = strdup(name);
 		if (!prog->name) {
 			pr_warning("failed to allocate memory for prog sym %s\n",
@@ -795,6 +807,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 		} else if ((sh.sh_type == SHT_PROGBITS) &&
 			   (sh.sh_flags & SHF_EXECINSTR) &&
 			   (data->d_size > 0)) {
+			if (strcmp(name, ".text") == 0)
+				obj->efile.text_shndx = idx;
 			err = bpf_object__add_program(obj, data->d_buf,
 						      data->d_size, name, idx);
 			if (err) {
@@ -856,11 +870,14 @@ bpf_object__find_prog_by_idx(struct bpf_object *obj, int idx)
 }
 
 static int
-bpf_program__collect_reloc(struct bpf_program *prog,
-			   size_t nr_maps, GElf_Shdr *shdr,
-			   Elf_Data *data, Elf_Data *symbols,
-			   int maps_shndx, struct bpf_map *maps)
+bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
+			   Elf_Data *data, struct bpf_object *obj)
 {
+	Elf_Data *symbols = obj->efile.symbols;
+	int text_shndx = obj->efile.text_shndx;
+	int maps_shndx = obj->efile.maps_shndx;
+	struct bpf_map *maps = obj->maps;
+	size_t nr_maps = obj->nr_maps;
 	int i, nrels;
 
 	pr_debug("collecting relocating info for: '%s'\n",
@@ -893,8 +910,10 @@ bpf_program__collect_reloc(struct bpf_program *prog,
 				   GELF_R_SYM(rel.r_info));
 			return -LIBBPF_ERRNO__FORMAT;
 		}
+		pr_debug("relo for %ld value %ld name %d\n",
+			 rel.r_info >> 32, sym.st_value, sym.st_name);
 
-		if (sym.st_shndx != maps_shndx) {
+		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
 			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
 				   prog->section_name, sym.st_shndx);
 			return -LIBBPF_ERRNO__RELOC;
@@ -903,6 +922,17 @@ bpf_program__collect_reloc(struct bpf_program *prog,
 		insn_idx = rel.r_offset / sizeof(struct bpf_insn);
 		pr_debug("relocation: insn_idx=%u\n", insn_idx);
 
+		if (insns[insn_idx].code == (BPF_JMP | BPF_CALL)) {
+			if (insns[insn_idx].src_reg != BPF_PSEUDO_CALL) {
+				pr_warning("incorrect bpf_call opcode\n");
+				return -LIBBPF_ERRNO__RELOC;
+			}
+			prog->reloc_desc[i].type = RELO_CALL;
+			prog->reloc_desc[i].insn_idx = insn_idx;
+			prog->reloc_desc[i].text_off = sym.st_value;
+			continue;
+		}
+
 		if (insns[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
 			pr_warning("bpf: relocation: invalid relo for insns[%d].code 0x%x\n",
 				   insn_idx, insns[insn_idx].code);
@@ -924,6 +954,7 @@ bpf_program__collect_reloc(struct bpf_program *prog,
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
+		prog->reloc_desc[i].type = RELO_LD64;
 		prog->reloc_desc[i].insn_idx = insn_idx;
 		prog->reloc_desc[i].map_idx = map_idx;
 	}
@@ -963,27 +994,76 @@ bpf_object__create_maps(struct bpf_object *obj)
 }
 
 static int
+bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
+			struct reloc_desc *relo)
+{
+	struct bpf_insn *insn, *new_insn;
+	struct bpf_program *text;
+	size_t new_cnt;
+
+	if (relo->type != RELO_CALL)
+		return -LIBBPF_ERRNO__RELOC;
+
+	if (prog->idx == obj->efile.text_shndx) {
+		pr_warning("relo in .text insn %d into off %d\n",
+			   relo->insn_idx, relo->text_off);
+		return -LIBBPF_ERRNO__RELOC;
+	}
+
+	if (prog->main_prog_cnt == 0) {
+		text = bpf_object__find_prog_by_idx(obj, obj->efile.text_shndx);
+		if (!text) {
+			pr_warning("no .text section found yet relo into text exist\n");
+			return -LIBBPF_ERRNO__RELOC;
+		}
+		new_cnt = prog->insns_cnt + text->insns_cnt;
+		new_insn = realloc(prog->insns, new_cnt * sizeof(*insn));
+		if (!new_insn) {
+			pr_warning("oom in prog realloc\n");
+			return -ENOMEM;
+		}
+		memcpy(new_insn + prog->insns_cnt, text->insns,
+		       text->insns_cnt * sizeof(*insn));
+		prog->insns = new_insn;
+		prog->main_prog_cnt = prog->insns_cnt;
+		prog->insns_cnt = new_cnt;
+	}
+	insn = &prog->insns[relo->insn_idx];
+	insn->imm += prog->main_prog_cnt - relo->insn_idx;
+	pr_debug("added %zd insn from %s to prog %s\n",
+		 text->insns_cnt, text->section_name, prog->section_name);
+	return 0;
+}
+
+static int
 bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 {
-	int i;
+	int i, err;
 
 	if (!prog || !prog->reloc_desc)
 		return 0;
 
 	for (i = 0; i < prog->nr_reloc; i++) {
-		int insn_idx, map_idx;
-		struct bpf_insn *insns = prog->insns;
+		if (prog->reloc_desc[i].type == RELO_LD64) {
+			struct bpf_insn *insns = prog->insns;
+			int insn_idx, map_idx;
 
-		insn_idx = prog->reloc_desc[i].insn_idx;
-		map_idx = prog->reloc_desc[i].map_idx;
+			insn_idx = prog->reloc_desc[i].insn_idx;
+			map_idx = prog->reloc_desc[i].map_idx;
 
-		if (insn_idx >= (int)prog->insns_cnt) {
-			pr_warning("relocation out of range: '%s'\n",
-				   prog->section_name);
-			return -LIBBPF_ERRNO__RELOC;
+			if (insn_idx >= (int)prog->insns_cnt) {
+				pr_warning("relocation out of range: '%s'\n",
+					   prog->section_name);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+			insns[insn_idx].imm = obj->maps[map_idx].fd;
+		} else {
+			err = bpf_program__reloc_text(prog, obj,
+						      &prog->reloc_desc[i]);
+			if (err)
+				return err;
 		}
-		insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
-		insns[insn_idx].imm = obj->maps[map_idx].fd;
 	}
 
 	zfree(&prog->reloc_desc);
@@ -1026,7 +1106,6 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 		Elf_Data *data = obj->efile.reloc[i].data;
 		int idx = shdr->sh_info;
 		struct bpf_program *prog;
-		size_t nr_maps = obj->nr_maps;
 
 		if (shdr->sh_type != SHT_REL) {
 			pr_warning("internal error at %d\n", __LINE__);
@@ -1040,11 +1119,9 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
-		err = bpf_program__collect_reloc(prog, nr_maps,
+		err = bpf_program__collect_reloc(prog,
 						 shdr, data,
-						 obj->efile.symbols,
-						 obj->efile.maps_shndx,
-						 obj->maps);
+						 obj);
 		if (err)
 			return err;
 	}
@@ -1197,6 +1274,8 @@ bpf_object__load_progs(struct bpf_object *obj)
 	int err;
 
 	for (i = 0; i < obj->nr_programs; i++) {
+		if (obj->programs[i].idx == obj->efile.text_shndx)
+			continue;
 		err = bpf_program__load(&obj->programs[i],
 					obj->license,
 					obj->kern_version);
@@ -1859,7 +1938,7 @@ long libbpf_get_error(const void *ptr)
 int bpf_prog_load(const char *file, enum bpf_prog_type type,
 		  struct bpf_object **pobj, int *prog_fd)
 {
-	struct bpf_program *prog;
+	struct bpf_program *prog, *first_prog = NULL;
 	struct bpf_object *obj;
 	int err;
 
@@ -1867,25 +1946,30 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
 	if (IS_ERR(obj))
 		return -ENOENT;
 
-	prog = bpf_program__next(NULL, obj);
-	if (!prog) {
-		bpf_object__close(obj);
-		return -ENOENT;
-	}
-
-	/*
-	 * If type is not specified, try to guess it based on
-	 * section name.
-	 */
-	if (type == BPF_PROG_TYPE_UNSPEC) {
-		type = bpf_program__guess_type(prog);
+	bpf_object__for_each_program(prog, obj) {
+		/*
+		 * If type is not specified, try to guess it based on
+		 * section name.
+		 */
 		if (type == BPF_PROG_TYPE_UNSPEC) {
-			bpf_object__close(obj);
-			return -EINVAL;
+			type = bpf_program__guess_type(prog);
+			if (type == BPF_PROG_TYPE_UNSPEC) {
+				bpf_object__close(obj);
+				return -EINVAL;
+			}
 		}
+
+		bpf_program__set_type(prog, type);
+		if (prog->idx != obj->efile.text_shndx && !first_prog)
+			first_prog = prog;
+	}
+
+	if (!first_prog) {
+		pr_warning("object file doesn't contain bpf program\n");
+		bpf_object__close(obj);
+		return -ENOENT;
 	}
 
-	bpf_program__set_type(prog, type);
 	err = bpf_object__load(obj);
 	if (err) {
 		bpf_object__close(obj);
@@ -1893,6 +1977,6 @@ int bpf_prog_load(const char *file, enum bpf_prog_type type,
 	}
 
 	*pobj = obj;
-	*prog_fd = bpf_program__fd(prog);
+	*prog_fd = bpf_program__fd(first_prog);
 	return 0;
 }
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next 13/13] selftests/bpf: additional bpf_call tests
From: Alexei Starovoitov @ 2017-12-15  1:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Daniel Borkmann, John Fastabend, Edward Cree, Jakub Kicinski,
	netdev, kernel-team
In-Reply-To: <20171215015517.409513-1-ast@kernel.org>

From: Daniel Borkmann <daniel@iogearbox.net>

Add some additional checks for few more corner cases.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/testing/selftests/bpf/test_verifier.c | 597 ++++++++++++++++++++++++++++
 1 file changed, 597 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index eaf294822a8f..3bacff0d6f91 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -8111,6 +8111,180 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 	},
 	{
+		"calls: not on unpriviledged",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 2),
+			BPF_EXIT_INSN(),
+		},
+		.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
+		.result_unpriv = REJECT,
+		.result = ACCEPT,
+	},
+	{
+		"calls: overlapping caller/callee",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "last insn is not an exit or jmp",
+		.result = REJECT,
+	},
+	{
+		"calls: wrong recursive calls",
+		.insns = {
+			BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -2),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -2),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -2),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "jump out of range",
+		.result = REJECT,
+	},
+	{
+		"calls: wrong src reg",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 2, 0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "BPF_CALL uses reserved fields",
+		.result = REJECT,
+	},
+	{
+		"calls: wrong off value",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, -1, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 2),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "BPF_CALL uses reserved fields",
+		.result = REJECT,
+	},
+	{
+		"calls: jump back loop",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -1),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "back-edge from insn 0 to 0",
+		.result = REJECT,
+	},
+	{
+		"calls: conditional call",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 2),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "jump out of range",
+		.result = REJECT,
+	},
+	{
+		"calls: conditional call 2",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 2),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 3),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.result = ACCEPT,
+	},
+	{
+		"calls: conditional call 3",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 4),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -6),
+			BPF_MOV64_IMM(BPF_REG_0, 3),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -6),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "back-edge from insn",
+		.result = REJECT,
+	},
+	{
+		"calls: conditional call 4",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -5),
+			BPF_MOV64_IMM(BPF_REG_0, 3),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.result = ACCEPT,
+	},
+	{
+		"calls: conditional call 5",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_0, 1),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -6),
+			BPF_MOV64_IMM(BPF_REG_0, 3),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "back-edge from insn",
+		.result = REJECT,
+	},
+	{
+		"calls: conditional call 6",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, -2),
+			BPF_EXIT_INSN(),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "back-edge from insn",
+		.result = REJECT,
+	},
+	{
 		"calls: using r0 returned by callee",
 		.insns = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
@@ -8122,6 +8296,17 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 	},
 	{
+		"calls: using uninit r0 from callee",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "!read_ok",
+		.result = REJECT,
+	},
+	{
 		"calls: callee is using r1",
 		.insns = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
@@ -8224,6 +8409,71 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 	},
 	{
+		"calls: calls with stack arith",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -64),
+			BPF_MOV64_IMM(BPF_REG_0, 42),
+			BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"calls: calls with misaligned stack access",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -63),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -61),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -63),
+			BPF_MOV64_IMM(BPF_REG_0, 42),
+			BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.flags = F_LOAD_WITH_STRICT_ALIGNMENT,
+		.errstr = "misaligned stack access",
+		.result = REJECT,
+	},
+	{
+		"calls: calls control flow, jump test",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 42),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 43),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+			BPF_JMP_IMM(BPF_JA, 0, 0, -3),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"calls: calls control flow, jump test 2",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_0, 42),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 2),
+			BPF_MOV64_IMM(BPF_REG_0, 43),
+			BPF_JMP_IMM(BPF_JA, 0, 0, 1),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, -3),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "jump out of range from insn 1 to 4",
+		.result = REJECT,
+	},
+	{
 		"calls: two calls with bad jump",
 		.insns = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
@@ -8298,6 +8548,18 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 	},
 	{
+		"calls: invalid call 2",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
+			BPF_EXIT_INSN(),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 0x7fffffff),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+		.errstr = "invalid destination",
+		.result = REJECT,
+	},
+	{
 		"calls: jumping across function bodies. test1",
 		.insns = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
@@ -8367,6 +8629,30 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 	},
 	{
+		"calls: ld_abs with changing ctx data in callee",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_LD_ABS(BPF_B, 0),
+			BPF_LD_ABS(BPF_H, 0),
+			BPF_LD_ABS(BPF_W, 0),
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_6),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5),
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_7),
+			BPF_LD_ABS(BPF_B, 0),
+			BPF_LD_ABS(BPF_H, 0),
+			BPF_LD_ABS(BPF_W, 0),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_IMM(BPF_REG_2, 1),
+			BPF_MOV64_IMM(BPF_REG_3, 2),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_skb_vlan_push),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "BPF_LD_[ABS|IND] instructions cannot be mixed",
+		.result = REJECT,
+	},
+	{
 		"calls: two calls with bad fallthrough",
 		.insns = {
 			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 1),
@@ -8460,6 +8746,36 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 	},
 	{
+		"calls: write into caller stack frame",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+			BPF_EXIT_INSN(),
+			BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_XDP,
+		.result = ACCEPT,
+	},
+	{
+		"calls: write into callee stack frame",
+		.insns = {
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, -8),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_XDP,
+		.errstr = "cannot return stack pointer",
+		.result = REJECT,
+	},
+	{
 		"calls: two calls with stack write and void return",
 		.insns = {
 			/* main prog */
@@ -9057,6 +9373,287 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
+		"calls: pkt_ptr spill into caller stack 2",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			/* Marking is still kept, but not in all cases safe. */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_ST_MEM(BPF_W, BPF_REG_4, 0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			/* spill unchecked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 2),
+			/* now the pkt range is verified, read pkt_ptr from stack */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_4, 0),
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "invalid access to packet",
+		.result = REJECT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 3",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			/* Marking is still kept and safe here. */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_ST_MEM(BPF_W, BPF_REG_4, 0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			/* spill unchecked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 3),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* now the pkt range is verified, read pkt_ptr from stack */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_4, 0),
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 4",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			/* Check marking propagated. */
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_ST_MEM(BPF_W, BPF_REG_4, 0, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			/* spill unchecked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 2),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* don't read back pkt_ptr from stack here */
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 5",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_1, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_4, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 3),
+			/* spill checked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* don't read back pkt_ptr from stack here */
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "same insn cannot be used with different",
+		.result = REJECT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 6",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_4, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 3),
+			/* spill checked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* don't read back pkt_ptr from stack here */
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "R4 invalid mem access",
+		.result = REJECT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 7",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_4, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 3),
+			/* spill checked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* don't read back pkt_ptr from stack here */
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "R4 invalid mem access",
+		.result = REJECT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 8",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_4, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 3),
+			/* spill checked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* don't read back pkt_ptr from stack here */
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"calls: pkt_ptr spill into caller stack 9",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1),
+			BPF_EXIT_INSN(),
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_4, 0),
+			BPF_EXIT_INSN(),
+
+			/* subprog 1 */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			/* spill unchecked pkt_ptr into stack of caller */
+			BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 2),
+			BPF_MOV64_IMM(BPF_REG_5, 1),
+			/* don't read back pkt_ptr from stack here */
+			/* write 4 bytes into packet */
+			BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_5),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "invalid access to packet",
+		.result = REJECT,
+	},
+	{
 		"calls: caller stack init to zero or map_value_or_null",
 		.insns = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
-- 
2.9.5

^ permalink raw reply related

* [PATCH bpf-next 08/13] selftests/bpf: add xdp noinline test
From: Alexei Starovoitov @ 2017-12-15  1:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Daniel Borkmann, John Fastabend, Edward Cree, Jakub Kicinski,
	netdev, kernel-team
In-Reply-To: <20171215015517.409513-1-ast@kernel.org>

From: Alexei Starovoitov <ast@fb.com>

add large semi-artificial XDP test with 18 functions to stress test
bpf call verification logic

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/testing/selftests/bpf/Makefile            |   3 +-
 tools/testing/selftests/bpf/test_progs.c        |  81 +++
 tools/testing/selftests/bpf/test_xdp_noinline.c | 833 ++++++++++++++++++++++++
 3 files changed, 916 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_noinline.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 6970d073df5b..7ef9601d04bf 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -18,7 +18,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
 	sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
-	test_l4lb_noinline.o
+	test_l4lb_noinline.o test_xdp_noinline.o
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
 	test_offload.py
@@ -54,6 +54,7 @@ CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
 	      -Wno-compare-distinct-pointer-types
 
 $(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline
+$(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
 
 %.o: %.c
 	$(CLANG) $(CLANG_FLAGS) \
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index abff83bf8d40..6472ca98690e 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -257,6 +257,86 @@ static void test_l4lb_all(void)
 	test_l4lb(file2);
 }
 
+static void test_xdp_noinline(void)
+{
+	const char *file = "./test_xdp_noinline.o";
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	struct vip key = {.protocol = 6};
+	struct vip_meta {
+		__u32 flags;
+		__u32 vip_num;
+	} value = {.vip_num = VIP_NUM};
+	__u32 stats_key = VIP_NUM;
+	struct vip_stats {
+		__u64 bytes;
+		__u64 pkts;
+	} stats[nr_cpus];
+	struct real_definition {
+		union {
+			__be32 dst;
+			__be32 dstv6[4];
+		};
+		__u8 flags;
+	} real_def = {.dst = MAGIC_VAL};
+	__u32 ch_key = 11, real_num = 3;
+	__u32 duration, retval, size;
+	int err, i, prog_fd, map_fd;
+	__u64 bytes = 0, pkts = 0;
+	struct bpf_object *obj;
+	char buf[128];
+	u32 *magic = (u32 *)buf;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (err) {
+		error_cnt++;
+		return;
+	}
+
+	map_fd = bpf_find_map(__func__, obj, "vip_map");
+	if (map_fd < 0)
+		goto out;
+	bpf_map_update_elem(map_fd, &key, &value, 0);
+
+	map_fd = bpf_find_map(__func__, obj, "ch_rings");
+	if (map_fd < 0)
+		goto out;
+	bpf_map_update_elem(map_fd, &ch_key, &real_num, 0);
+
+	map_fd = bpf_find_map(__func__, obj, "reals");
+	if (map_fd < 0)
+		goto out;
+	bpf_map_update_elem(map_fd, &real_num, &real_def, 0);
+
+	err = bpf_prog_test_run(prog_fd, NUM_ITER, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || errno || retval != 1 || size != 54 ||
+	      *magic != MAGIC_VAL, "ipv4",
+	      "err %d errno %d retval %d size %d magic %x\n",
+	      err, errno, retval, size, *magic);
+
+	err = bpf_prog_test_run(prog_fd, NUM_ITER, &pkt_v6, sizeof(pkt_v6),
+				buf, &size, &retval, &duration);
+	CHECK(err || errno || retval != 1 || size != 74 ||
+	      *magic != MAGIC_VAL, "ipv6",
+	      "err %d errno %d retval %d size %d magic %x\n",
+	      err, errno, retval, size, *magic);
+
+	map_fd = bpf_find_map(__func__, obj, "stats");
+	if (map_fd < 0)
+		goto out;
+	bpf_map_lookup_elem(map_fd, &stats_key, stats);
+	for (i = 0; i < nr_cpus; i++) {
+		bytes += stats[i].bytes;
+		pkts += stats[i].pkts;
+	}
+	if (bytes != MAGIC_BYTES * NUM_ITER * 2 || pkts != NUM_ITER * 2) {
+		error_cnt++;
+		printf("test_xdp_noinline:FAIL:stats %lld %lld\n", bytes, pkts);
+	}
+out:
+	bpf_object__close(obj);
+}
+
 static void test_tcp_estats(void)
 {
 	const char *file = "./test_tcp_estats.o";
@@ -766,6 +846,7 @@ int main(void)
 	test_pkt_access();
 	test_xdp();
 	test_l4lb_all();
+	test_xdp_noinline();
 	test_tcp_estats();
 	test_bpf_obj_id();
 	test_pkt_md_access();
diff --git a/tools/testing/selftests/bpf/test_xdp_noinline.c b/tools/testing/selftests/bpf/test_xdp_noinline.c
new file mode 100644
index 000000000000..5e4aac74f9d0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_noinline.c
@@ -0,0 +1,833 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017 Facebook
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include "bpf_helpers.h"
+
+#define bpf_printk(fmt, ...)				\
+({							\
+	char ____fmt[] = fmt;				\
+	bpf_trace_printk(____fmt, sizeof(____fmt),	\
+			##__VA_ARGS__);			\
+})
+
+static __u32 rol32(__u32 word, unsigned int shift)
+{
+	return (word << shift) | (word >> ((-shift) & 31));
+}
+
+/* copy paste of jhash from kernel sources to make sure llvm
+ * can compile it into valid sequence of bpf instructions
+ */
+#define __jhash_mix(a, b, c)			\
+{						\
+	a -= c;  a ^= rol32(c, 4);  c += b;	\
+	b -= a;  b ^= rol32(a, 6);  a += c;	\
+	c -= b;  c ^= rol32(b, 8);  b += a;	\
+	a -= c;  a ^= rol32(c, 16); c += b;	\
+	b -= a;  b ^= rol32(a, 19); a += c;	\
+	c -= b;  c ^= rol32(b, 4);  b += a;	\
+}
+
+#define __jhash_final(a, b, c)			\
+{						\
+	c ^= b; c -= rol32(b, 14);		\
+	a ^= c; a -= rol32(c, 11);		\
+	b ^= a; b -= rol32(a, 25);		\
+	c ^= b; c -= rol32(b, 16);		\
+	a ^= c; a -= rol32(c, 4);		\
+	b ^= a; b -= rol32(a, 14);		\
+	c ^= b; c -= rol32(b, 24);		\
+}
+
+#define JHASH_INITVAL		0xdeadbeef
+
+typedef unsigned int u32;
+
+static __attribute__ ((noinline))
+u32 jhash(const void *key, u32 length, u32 initval)
+{
+	u32 a, b, c;
+	const unsigned char *k = key;
+
+	a = b = c = JHASH_INITVAL + length + initval;
+
+	while (length > 12) {
+		a += *(u32 *)(k);
+		b += *(u32 *)(k + 4);
+		c += *(u32 *)(k + 8);
+		__jhash_mix(a, b, c);
+		length -= 12;
+		k += 12;
+	}
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9:  c += k[8];
+	case 8:  b += (u32)k[7]<<24;
+	case 7:  b += (u32)k[6]<<16;
+	case 6:  b += (u32)k[5]<<8;
+	case 5:  b += k[4];
+	case 4:  a += (u32)k[3]<<24;
+	case 3:  a += (u32)k[2]<<16;
+	case 2:  a += (u32)k[1]<<8;
+	case 1:  a += k[0];
+		 __jhash_final(a, b, c);
+	case 0: /* Nothing left to add */
+		break;
+	}
+
+	return c;
+}
+
+static __attribute__ ((noinline))
+u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
+{
+	a += initval;
+	b += initval;
+	c += initval;
+	__jhash_final(a, b, c);
+	return c;
+}
+
+static __attribute__ ((noinline))
+u32 jhash_2words(u32 a, u32 b, u32 initval)
+{
+	return __jhash_nwords(a, b, 0, initval + JHASH_INITVAL + (2 << 2));
+}
+
+struct flow_key {
+	union {
+		__be32 src;
+		__be32 srcv6[4];
+	};
+	union {
+		__be32 dst;
+		__be32 dstv6[4];
+	};
+	union {
+		__u32 ports;
+		__u16 port16[2];
+	};
+	__u8 proto;
+};
+
+struct packet_description {
+	struct flow_key flow;
+	__u8 flags;
+};
+
+struct ctl_value {
+	union {
+		__u64 value;
+		__u32 ifindex;
+		__u8 mac[6];
+	};
+};
+
+struct vip_definition {
+	union {
+		__be32 vip;
+		__be32 vipv6[4];
+	};
+	__u16 port;
+	__u16 family;
+	__u8 proto;
+};
+
+struct vip_meta {
+	__u32 flags;
+	__u32 vip_num;
+};
+
+struct real_pos_lru {
+	__u32 pos;
+	__u64 atime;
+};
+
+struct real_definition {
+	union {
+		__be32 dst;
+		__be32 dstv6[4];
+	};
+	__u8 flags;
+};
+
+struct lb_stats {
+	__u64 v2;
+	__u64 v1;
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) vip_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(struct vip_definition),
+	.value_size = sizeof(struct vip_meta),
+	.max_entries = 512,
+	.map_flags = 0,
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) lru_cache = {
+	.type = BPF_MAP_TYPE_LRU_HASH,
+	.key_size = sizeof(struct flow_key),
+	.value_size = sizeof(struct real_pos_lru),
+	.max_entries = 300,
+	.map_flags = 1U << 1,
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) ch_rings = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 12 * 655,
+	.map_flags = 0,
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) reals = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(struct real_definition),
+	.max_entries = 40,
+	.map_flags = 0,
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) stats = {
+	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(struct lb_stats),
+	.max_entries = 515,
+	.map_flags = 0,
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(struct ctl_value),
+	.max_entries = 16,
+	.map_flags = 0,
+};
+
+struct eth_hdr {
+	unsigned char eth_dest[6];
+	unsigned char eth_source[6];
+	unsigned short eth_proto;
+};
+
+static inline __u64 calc_offset(bool is_ipv6, bool is_icmp)
+{
+	__u64 off = sizeof(struct eth_hdr);
+	if (is_ipv6) {
+		off += sizeof(struct ipv6hdr);
+		if (is_icmp)
+			off += sizeof(struct icmp6hdr) + sizeof(struct ipv6hdr);
+	} else {
+		off += sizeof(struct iphdr);
+		if (is_icmp)
+			off += sizeof(struct icmphdr) + sizeof(struct iphdr);
+	}
+	return off;
+}
+
+static __attribute__ ((noinline))
+bool parse_udp(void *data, void *data_end,
+	       bool is_ipv6, struct packet_description *pckt)
+{
+
+	bool is_icmp = !((pckt->flags & (1 << 0)) == 0);
+	__u64 off = calc_offset(is_ipv6, is_icmp);
+	struct udphdr *udp;
+	udp = data + off;
+
+	if (udp + 1 > data_end)
+		return 0;
+	if (!is_icmp) {
+		pckt->flow.port16[0] = udp->source;
+		pckt->flow.port16[1] = udp->dest;
+	} else {
+		pckt->flow.port16[0] = udp->dest;
+		pckt->flow.port16[1] = udp->source;
+	}
+	return 1;
+}
+
+static __attribute__ ((noinline))
+bool parse_tcp(void *data, void *data_end,
+	       bool is_ipv6, struct packet_description *pckt)
+{
+
+	bool is_icmp = !((pckt->flags & (1 << 0)) == 0);
+	__u64 off = calc_offset(is_ipv6, is_icmp);
+	struct tcphdr *tcp;
+
+	tcp = data + off;
+	if (tcp + 1 > data_end)
+		return 0;
+	if (tcp->syn)
+		pckt->flags |= (1 << 1);
+	if (!is_icmp) {
+		pckt->flow.port16[0] = tcp->source;
+		pckt->flow.port16[1] = tcp->dest;
+	} else {
+		pckt->flow.port16[0] = tcp->dest;
+		pckt->flow.port16[1] = tcp->source;
+	}
+	return 1;
+}
+
+static __attribute__ ((noinline))
+bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
+	      struct packet_description *pckt,
+	      struct real_definition *dst, __u32 pkt_bytes)
+{
+	struct eth_hdr *new_eth;
+	struct eth_hdr *old_eth;
+	struct ipv6hdr *ip6h;
+	__u32 ip_suffix;
+	void *data_end;
+	void *data;
+
+	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ipv6hdr)))
+		return 0;
+	data = (void *)(long)xdp->data;
+	data_end = (void *)(long)xdp->data_end;
+	new_eth = data;
+	ip6h = data + sizeof(struct eth_hdr);
+	old_eth = data + sizeof(struct ipv6hdr);
+	if (new_eth + 1 > data_end ||
+	    old_eth + 1 > data_end || ip6h + 1 > data_end)
+		return 0;
+	memcpy(new_eth->eth_dest, cval->mac, 6);
+	memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
+	new_eth->eth_proto = 56710;
+	ip6h->version = 6;
+	ip6h->priority = 0;
+	memset(ip6h->flow_lbl, 0, sizeof(ip6h->flow_lbl));
+
+	ip6h->nexthdr = IPPROTO_IPV6;
+	ip_suffix = pckt->flow.srcv6[3] ^ pckt->flow.port16[0];
+	ip6h->payload_len =
+	    __builtin_bswap16(pkt_bytes + sizeof(struct ipv6hdr));
+	ip6h->hop_limit = 4;
+
+	ip6h->saddr.in6_u.u6_addr32[0] = 1;
+	ip6h->saddr.in6_u.u6_addr32[1] = 2;
+	ip6h->saddr.in6_u.u6_addr32[2] = 3;
+	ip6h->saddr.in6_u.u6_addr32[3] = ip_suffix;
+	memcpy(ip6h->daddr.in6_u.u6_addr32, dst->dstv6, 16);
+	return 1;
+}
+
+static __attribute__ ((noinline))
+bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
+	      struct packet_description *pckt,
+	      struct real_definition *dst, __u32 pkt_bytes)
+{
+
+	__u32 ip_suffix = __builtin_bswap16(pckt->flow.port16[0]);
+	struct eth_hdr *new_eth;
+	struct eth_hdr *old_eth;
+	__u16 *next_iph_u16;
+	struct iphdr *iph;
+	__u32 csum = 0;
+	void *data_end;
+	void *data;
+
+	ip_suffix <<= 15;
+	ip_suffix ^= pckt->flow.src;
+	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct iphdr)))
+		return 0;
+	data = (void *)(long)xdp->data;
+	data_end = (void *)(long)xdp->data_end;
+	new_eth = data;
+	iph = data + sizeof(struct eth_hdr);
+	old_eth = data + sizeof(struct iphdr);
+	if (new_eth + 1 > data_end ||
+	    old_eth + 1 > data_end || iph + 1 > data_end)
+		return 0;
+	memcpy(new_eth->eth_dest, cval->mac, 6);
+	memcpy(new_eth->eth_source, old_eth->eth_dest, 6);
+	new_eth->eth_proto = 8;
+	iph->version = 4;
+	iph->ihl = 5;
+	iph->frag_off = 0;
+	iph->protocol = IPPROTO_IPIP;
+	iph->check = 0;
+	iph->tos = 1;
+	iph->tot_len = __builtin_bswap16(pkt_bytes + sizeof(struct iphdr));
+	/* don't update iph->daddr, since it will overwrite old eth_proto
+	 * and multiple iterations of bpf_prog_run() will fail
+	 */
+
+	iph->saddr = ((0xFFFF0000 & ip_suffix) | 4268) ^ dst->dst;
+	iph->ttl = 4;
+
+	next_iph_u16 = (__u16 *) iph;
+#pragma clang loop unroll(full)
+	for (int i = 0; i < sizeof(struct iphdr) >> 1; i++)
+		csum += *next_iph_u16++;
+	iph->check = ~((csum & 0xffff) + (csum >> 16));
+	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
+		return 0;
+	return 1;
+}
+
+static __attribute__ ((noinline))
+bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
+{
+	struct eth_hdr *new_eth;
+	struct eth_hdr *old_eth;
+
+	old_eth = *data;
+	new_eth = *data + sizeof(struct ipv6hdr);
+	memcpy(new_eth->eth_source, old_eth->eth_source, 6);
+	memcpy(new_eth->eth_dest, old_eth->eth_dest, 6);
+	if (inner_v4)
+		new_eth->eth_proto = 8;
+	else
+		new_eth->eth_proto = 56710;
+	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct ipv6hdr)))
+		return 0;
+	*data = (void *)(long)xdp->data;
+	*data_end = (void *)(long)xdp->data_end;
+	return 1;
+}
+
+static __attribute__ ((noinline))
+bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
+{
+	struct eth_hdr *new_eth;
+	struct eth_hdr *old_eth;
+
+	old_eth = *data;
+	new_eth = *data + sizeof(struct iphdr);
+	memcpy(new_eth->eth_source, old_eth->eth_source, 6);
+	memcpy(new_eth->eth_dest, old_eth->eth_dest, 6);
+	new_eth->eth_proto = 8;
+	if (bpf_xdp_adjust_head(xdp, (int)sizeof(struct iphdr)))
+		return 0;
+	*data = (void *)(long)xdp->data;
+	*data_end = (void *)(long)xdp->data_end;
+	return 1;
+}
+
+static __attribute__ ((noinline))
+int swap_mac_and_send(void *data, void *data_end)
+{
+	unsigned char tmp_mac[6];
+	struct eth_hdr *eth;
+
+	eth = data;
+	memcpy(tmp_mac, eth->eth_source, 6);
+	memcpy(eth->eth_source, eth->eth_dest, 6);
+	memcpy(eth->eth_dest, tmp_mac, 6);
+	return XDP_TX;
+}
+
+static __attribute__ ((noinline))
+int send_icmp_reply(void *data, void *data_end)
+{
+	struct icmphdr *icmp_hdr;
+	__u16 *next_iph_u16;
+	__u32 tmp_addr = 0;
+	struct iphdr *iph;
+	__u32 csum1 = 0;
+	__u32 csum = 0;
+	__u64 off = 0;
+
+	if (data + sizeof(struct eth_hdr)
+	     + sizeof(struct iphdr) + sizeof(struct icmphdr) > data_end)
+		return XDP_DROP;
+	off += sizeof(struct eth_hdr);
+	iph = data + off;
+	off += sizeof(struct iphdr);
+	icmp_hdr = data + off;
+	icmp_hdr->type = 0;
+	icmp_hdr->checksum += 0x0007;
+	iph->ttl = 4;
+	tmp_addr = iph->daddr;
+	iph->daddr = iph->saddr;
+	iph->saddr = tmp_addr;
+	iph->check = 0;
+	next_iph_u16 = (__u16 *) iph;
+#pragma clang loop unroll(full)
+	for (int i = 0; i < sizeof(struct iphdr) >> 1; i++)
+		csum += *next_iph_u16++;
+	iph->check = ~((csum & 0xffff) + (csum >> 16));
+	return swap_mac_and_send(data, data_end);
+}
+
+static __attribute__ ((noinline))
+int send_icmp6_reply(void *data, void *data_end)
+{
+	struct icmp6hdr *icmp_hdr;
+	struct ipv6hdr *ip6h;
+	__be32 tmp_addr[4];
+	__u64 off = 0;
+
+	if (data + sizeof(struct eth_hdr)
+	     + sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr) > data_end)
+		return XDP_DROP;
+	off += sizeof(struct eth_hdr);
+	ip6h = data + off;
+	off += sizeof(struct ipv6hdr);
+	icmp_hdr = data + off;
+	icmp_hdr->icmp6_type = 129;
+	icmp_hdr->icmp6_cksum -= 0x0001;
+	ip6h->hop_limit = 4;
+	memcpy(tmp_addr, ip6h->saddr.in6_u.u6_addr32, 16);
+	memcpy(ip6h->saddr.in6_u.u6_addr32, ip6h->daddr.in6_u.u6_addr32, 16);
+	memcpy(ip6h->daddr.in6_u.u6_addr32, tmp_addr, 16);
+	return swap_mac_and_send(data, data_end);
+}
+
+static __attribute__ ((noinline))
+int parse_icmpv6(void *data, void *data_end, __u64 off,
+		 struct packet_description *pckt)
+{
+	struct icmp6hdr *icmp_hdr;
+	struct ipv6hdr *ip6h;
+
+	icmp_hdr = data + off;
+	if (icmp_hdr + 1 > data_end)
+		return XDP_DROP;
+	if (icmp_hdr->icmp6_type == 128)
+		return send_icmp6_reply(data, data_end);
+	if (icmp_hdr->icmp6_type != 3)
+		return XDP_PASS;
+	off += sizeof(struct icmp6hdr);
+	ip6h = data + off;
+	if (ip6h + 1 > data_end)
+		return XDP_DROP;
+	pckt->flow.proto = ip6h->nexthdr;
+	pckt->flags |= (1 << 0);
+	memcpy(pckt->flow.srcv6, ip6h->daddr.in6_u.u6_addr32, 16);
+	memcpy(pckt->flow.dstv6, ip6h->saddr.in6_u.u6_addr32, 16);
+	return -1;
+}
+
+static __attribute__ ((noinline))
+int parse_icmp(void *data, void *data_end, __u64 off,
+	       struct packet_description *pckt)
+{
+	struct icmphdr *icmp_hdr;
+	struct iphdr *iph;
+
+	icmp_hdr = data + off;
+	if (icmp_hdr + 1 > data_end)
+		return XDP_DROP;
+	if (icmp_hdr->type == 8)
+		return send_icmp_reply(data, data_end);
+	if ((icmp_hdr->type != 3) || (icmp_hdr->code != 4))
+		return XDP_PASS;
+	off += sizeof(struct icmphdr);
+	iph = data + off;
+	if (iph + 1 > data_end)
+		return XDP_DROP;
+	if (iph->ihl != 5)
+		return XDP_DROP;
+	pckt->flow.proto = iph->protocol;
+	pckt->flags |= (1 << 0);
+	pckt->flow.src = iph->daddr;
+	pckt->flow.dst = iph->saddr;
+	return -1;
+}
+
+static __attribute__ ((noinline))
+__u32 get_packet_hash(struct packet_description *pckt,
+		      bool hash_16bytes)
+{
+	if (hash_16bytes)
+		return jhash_2words(jhash(pckt->flow.srcv6, 16, 12),
+				    pckt->flow.ports, 24);
+	else
+		return jhash_2words(pckt->flow.src, pckt->flow.ports,
+				    24);
+}
+
+__attribute__ ((noinline))
+static bool get_packet_dst(struct real_definition **real,
+			   struct packet_description *pckt,
+			   struct vip_meta *vip_info,
+			   bool is_ipv6, void *lru_map)
+{
+	struct real_pos_lru new_dst_lru = { };
+	bool hash_16bytes = is_ipv6;
+	__u32 *real_pos, hash, key;
+	__u64 cur_time;
+
+	if (vip_info->flags & (1 << 2))
+		hash_16bytes = 1;
+	if (vip_info->flags & (1 << 3)) {
+		pckt->flow.port16[0] = pckt->flow.port16[1];
+		memset(pckt->flow.srcv6, 0, 16);
+	}
+	hash = get_packet_hash(pckt, hash_16bytes);
+	if (hash != 0x358459b7 /* jhash of ipv4 packet */  &&
+	    hash != 0x2f4bc6bb /* jhash of ipv6 packet */)
+		return 0;
+	key = 2 * vip_info->vip_num + hash % 2;
+	real_pos = bpf_map_lookup_elem(&ch_rings, &key);
+	if (!real_pos)
+		return 0;
+	key = *real_pos;
+	*real = bpf_map_lookup_elem(&reals, &key);
+	if (!(*real))
+		return 0;
+	if (!(vip_info->flags & (1 << 1))) {
+		__u32 conn_rate_key = 512 + 2;
+		struct lb_stats *conn_rate_stats =
+		    bpf_map_lookup_elem(&stats, &conn_rate_key);
+
+		if (!conn_rate_stats)
+			return 1;
+		cur_time = bpf_ktime_get_ns();
+		if ((cur_time - conn_rate_stats->v2) >> 32 > 0xffFFFF) {
+			conn_rate_stats->v1 = 1;
+			conn_rate_stats->v2 = cur_time;
+		} else {
+			conn_rate_stats->v1 += 1;
+			if (conn_rate_stats->v1 >= 1)
+				return 1;
+		}
+		if (pckt->flow.proto == IPPROTO_UDP)
+			new_dst_lru.atime = cur_time;
+		new_dst_lru.pos = key;
+		bpf_map_update_elem(lru_map, &pckt->flow, &new_dst_lru, 0);
+	}
+	return 1;
+}
+
+__attribute__ ((noinline))
+static void connection_table_lookup(struct real_definition **real,
+				    struct packet_description *pckt,
+				    void *lru_map)
+{
+
+	struct real_pos_lru *dst_lru;
+	__u64 cur_time;
+	__u32 key;
+
+	dst_lru = bpf_map_lookup_elem(lru_map, &pckt->flow);
+	if (!dst_lru)
+		return;
+	if (pckt->flow.proto == IPPROTO_UDP) {
+		cur_time = bpf_ktime_get_ns();
+		if (cur_time - dst_lru->atime > 300000)
+			return;
+		dst_lru->atime = cur_time;
+	}
+	key = dst_lru->pos;
+	*real = bpf_map_lookup_elem(&reals, &key);
+}
+
+/* don't believe your eyes!
+ * below function has 6 arguments whereas bpf and llvm allow maximum of 5
+ * but since it's _static_ llvm can optimize one argument away
+ */
+__attribute__ ((noinline))
+static int process_l3_headers_v6(struct packet_description *pckt,
+				 __u8 *protocol, __u64 off,
+				 __u16 *pkt_bytes, void *data,
+				 void *data_end)
+{
+	struct ipv6hdr *ip6h;
+	__u64 iph_len;
+	int action;
+
+	ip6h = data + off;
+	if (ip6h + 1 > data_end)
+		return XDP_DROP;
+	iph_len = sizeof(struct ipv6hdr);
+	*protocol = ip6h->nexthdr;
+	pckt->flow.proto = *protocol;
+	*pkt_bytes = __builtin_bswap16(ip6h->payload_len);
+	off += iph_len;
+	if (*protocol == 45) {
+		return XDP_DROP;
+	} else if (*protocol == 59) {
+		action = parse_icmpv6(data, data_end, off, pckt);
+		if (action >= 0)
+			return action;
+	} else {
+		memcpy(pckt->flow.srcv6, ip6h->saddr.in6_u.u6_addr32, 16);
+		memcpy(pckt->flow.dstv6, ip6h->daddr.in6_u.u6_addr32, 16);
+	}
+	return -1;
+}
+
+__attribute__ ((noinline))
+static int process_l3_headers_v4(struct packet_description *pckt,
+				 __u8 *protocol, __u64 off,
+				 __u16 *pkt_bytes, void *data,
+				 void *data_end)
+{
+	struct iphdr *iph;
+	__u64 iph_len;
+	int action;
+
+	iph = data + off;
+	if (iph + 1 > data_end)
+		return XDP_DROP;
+	if (iph->ihl != 5)
+		return XDP_DROP;
+	*protocol = iph->protocol;
+	pckt->flow.proto = *protocol;
+	*pkt_bytes = __builtin_bswap16(iph->tot_len);
+	off += 20;
+	if (iph->frag_off & 65343)
+		return XDP_DROP;
+	if (*protocol == IPPROTO_ICMP) {
+		action = parse_icmp(data, data_end, off, pckt);
+		if (action >= 0)
+			return action;
+	} else {
+		pckt->flow.src = iph->saddr;
+		pckt->flow.dst = iph->daddr;
+	}
+	return -1;
+}
+
+__attribute__ ((noinline))
+static int process_packet(void *data, __u64 off, void *data_end,
+			  bool is_ipv6, struct xdp_md *xdp)
+{
+
+	struct real_definition *dst = NULL;
+	struct packet_description pckt = { };
+	struct vip_definition vip = { };
+	struct lb_stats *data_stats;
+	struct eth_hdr *eth = data;
+	void *lru_map = &lru_cache;
+	struct vip_meta *vip_info;
+	__u32 lru_stats_key = 513;
+	__u32 mac_addr_pos = 0;
+	__u32 stats_key = 512;
+	struct ctl_value *cval;
+	__u16 pkt_bytes;
+	__u64 iph_len;
+	__u8 protocol;
+	__u32 vip_num;
+	int action;
+
+	if (is_ipv6)
+		action = process_l3_headers_v6(&pckt, &protocol, off,
+					       &pkt_bytes, data, data_end);
+	else
+		action = process_l3_headers_v4(&pckt, &protocol, off,
+					       &pkt_bytes, data, data_end);
+	if (action >= 0)
+		return action;
+	protocol = pckt.flow.proto;
+	if (protocol == IPPROTO_TCP) {
+		if (!parse_tcp(data, data_end, is_ipv6, &pckt))
+			return XDP_DROP;
+	} else if (protocol == IPPROTO_UDP) {
+		if (!parse_udp(data, data_end, is_ipv6, &pckt))
+			return XDP_DROP;
+	} else {
+		return XDP_TX;
+	}
+
+	if (is_ipv6)
+		memcpy(vip.vipv6, pckt.flow.dstv6, 16);
+	else
+		vip.vip = pckt.flow.dst;
+	vip.port = pckt.flow.port16[1];
+	vip.proto = pckt.flow.proto;
+	vip_info = bpf_map_lookup_elem(&vip_map, &vip);
+	if (!vip_info) {
+		vip.port = 0;
+		vip_info = bpf_map_lookup_elem(&vip_map, &vip);
+		if (!vip_info)
+			return XDP_PASS;
+		if (!(vip_info->flags & (1 << 4)))
+			pckt.flow.port16[1] = 0;
+	}
+	if (data_end - data > 1400)
+		return XDP_DROP;
+	data_stats = bpf_map_lookup_elem(&stats, &stats_key);
+	if (!data_stats)
+		return XDP_DROP;
+	data_stats->v1 += 1;
+	if (!dst) {
+		if (vip_info->flags & (1 << 0))
+			pckt.flow.port16[0] = 0;
+		if (!(pckt.flags & (1 << 1)) && !(vip_info->flags & (1 << 1)))
+			connection_table_lookup(&dst, &pckt, lru_map);
+		if (dst)
+			goto out;
+		if (pckt.flow.proto == IPPROTO_TCP) {
+			struct lb_stats *lru_stats =
+			    bpf_map_lookup_elem(&stats, &lru_stats_key);
+
+			if (!lru_stats)
+				return XDP_DROP;
+			if (pckt.flags & (1 << 1))
+				lru_stats->v1 += 1;
+			else
+				lru_stats->v2 += 1;
+		}
+		if (!get_packet_dst(&dst, &pckt, vip_info, is_ipv6, lru_map))
+			return XDP_DROP;
+		data_stats->v2 += 1;
+	}
+out:
+	cval = bpf_map_lookup_elem(&ctl_array, &mac_addr_pos);
+	if (!cval)
+		return XDP_DROP;
+	if (dst->flags & (1 << 0)) {
+		if (!encap_v6(xdp, cval, &pckt, dst, pkt_bytes))
+			return XDP_DROP;
+	} else {
+		if (!encap_v4(xdp, cval, &pckt, dst, pkt_bytes))
+			return XDP_DROP;
+	}
+	vip_num = vip_info->vip_num;
+	data_stats = bpf_map_lookup_elem(&stats, &vip_num);
+	if (!data_stats)
+		return XDP_DROP;
+	data_stats->v1 += 1;
+	data_stats->v2 += pkt_bytes;
+
+	data = (void *)(long)xdp->data;
+	data_end = (void *)(long)xdp->data_end;
+	if (data + 4 > data_end)
+		return XDP_DROP;
+	*(u32 *)data = dst->dst;
+	return XDP_DROP;
+}
+
+__attribute__ ((section("xdp-test"), used))
+int balancer_ingress(struct xdp_md *ctx)
+{
+	void *data = (void *)(long)ctx->data;
+	void *data_end = (void *)(long)ctx->data_end;
+	struct eth_hdr *eth = data;
+	__u32 eth_proto;
+	__u32 nh_off;
+
+	nh_off = sizeof(struct eth_hdr);
+	if (data + nh_off > data_end)
+		return XDP_DROP;
+	eth_proto = eth->eth_proto;
+	if (eth_proto == 8)
+		return process_packet(data, nh_off, data_end, 0, ctx);
+	else if (eth_proto == 56710)
+		return process_packet(data, nh_off, data_end, 1, ctx);
+	else
+		return XDP_DROP;
+}
+
+char _license[] __attribute__ ((section("license"), used)) = "GPL";
+int _version __attribute__ ((section("version"), used)) = 1;
-- 
2.9.5

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox