Netdev List


* Re: [PATCH net] net: ipv6: regenerate host route if moved to gc list
From: David Ahern @ 2017-04-22 14:14 UTC (permalink / raw)
  To: Dmitry Vyukov, Martin KaFai Lau; +Cc: netdev, andreyknvl, mmanning
In-Reply-To: <CACT4Y+aRBZApshv2T-edaErPmiCsbMxNNkzC5Hs5jGJdvVwOAg@mail.gmail.com>

On 4/22/17 3:14 AM, Dmitry Vyukov wrote:
>> One small question. Why is cmpxchg needed instead
>> of an ip6_rt_put() and then an assignment?
>> Is it fixing another bug?
> cmpxchg here looks fishy.
> If there are no concurrent modifications, then it is not needed.
> If there are and cmpxchg fails, then we will put the installed rt and
> leak the new one.
> 

Yes, I need to convert that to changing the rt under a lock.

It is a leftover from the beginning of the investigation, when I suspected
locking and recalled Li's patch. I'll send a v2.
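Dmitry's objection can be made concrete with a small userspace model. This is a sketch under stated assumptions, not the IPv6 code: `struct rt_entry`, `struct rt_cache`, `rt_hold()`, `rt_put()` and `cache_replace()` are hypothetical, and a C11 `atomic_flag` spinlock stands in for whatever kernel lock v2 ends up using. When a cmpxchg fails, the caller no longer knows which entry it should release. When the read of the old pointer and the store of the new one happen under one lock, the entry released afterwards is always the one the slot actually held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted route entry (not struct rt6_info). */
struct rt_entry {
	atomic_int refcnt;
};

/* Hypothetical cache slot that owns one reference to its current entry. */
struct rt_cache {
	atomic_flag lock;	/* minimal spinlock standing in for a kernel lock */
	struct rt_entry *cur;
};

static void cache_init(struct rt_cache *c)
{
	atomic_flag_clear(&c->lock);
	c->cur = NULL;
}

static void rt_hold(struct rt_entry *rt)
{
	atomic_fetch_add(&rt->refcnt, 1);
}

static void rt_put(struct rt_entry *rt)
{
	int prev = atomic_fetch_sub(&rt->refcnt, 1);

	assert(prev > 0);	/* catches a put on an entry we never held */
	(void)prev;
}

/*
 * Install new_rt in the slot and drop the slot's reference on the entry it
 * replaced. The old pointer is read and the new one stored under the same
 * lock, so the entry we put afterwards is exactly the one we removed and
 * the reference taken on new_rt is never lost.
 */
static void cache_replace(struct rt_cache *c, struct rt_entry *new_rt)
{
	struct rt_entry *old;

	rt_hold(new_rt);
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the lock is free */
	old = c->cur;
	c->cur = new_rt;
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
	if (old)
		rt_put(old);
}
```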


* Why max netlink msg size is limited to 16k
From: prashantkumar dhotre @ 2017-04-22 14:13 UTC (permalink / raw)
  To: netdev

I am observing that the largest netlink message my kernel module can send
to a user app is close to 16K.

For larger sizes, genlmsg_unicast() succeeds but my app does not receive the data.

I have tried increasing the receive buffer size in my user app, but that does not help.

Regards
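The thread does not contain an answer at this point. One cause worth ruling out (an assumption, not a diagnosis) is that the buffer the application passes to recv()/recvmsg() is about 16 KiB: enlarging SO_RCVBUF does not help in that case, because the kernel copies at most the buffer length and sets MSG_TRUNC. The sketch below checks for that. `peek_datagram_len()` and `recv_was_truncated()` are hypothetical helpers, and an AF_UNIX datagram socket stands in for the generic netlink socket so the code runs without a kernel module; recv(2) documents MSG_TRUNC returning the real length for netlink sockets (since Linux 2.6.22) as well as UNIX datagram sockets (since Linux 3.4).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Return the length of the next queued datagram without consuming it.
 * MSG_PEEK leaves it in the queue; MSG_TRUNC makes recv() report the real
 * length even though only a one-byte buffer is supplied.
 */
static ssize_t peek_datagram_len(int fd)
{
	char dummy;

	return recv(fd, &dummy, sizeof(dummy), MSG_PEEK | MSG_TRUNC);
}

/*
 * Receive one datagram into buf and report whether it was cut short.
 * Returns 1 if truncated, 0 if it fit, -1 on error.
 */
static int recv_was_truncated(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	if (recvmsg(fd, &msg, 0) < 0)
		return -1;
	return (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
}
```

If the peeked length is larger than the application's buffer, growing that buffer (or allocating it from the peeked length) is the fix to try.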


* [PATCH] orinoco:  fix spelling mistake: "Registerred" -> "Registered"
From: Colin King @ 2017-04-22 13:21 UTC (permalink / raw)
  To: Kalle Valo, David S . Miller, Tobias Klauser, Jarod Wilson,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA

From: Colin Ian King <colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Trivial fix to spelling mistake in dev_dbg message.

Signed-off-by: Colin Ian King <colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 drivers/net/wireless/intersil/orinoco/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intersil/orinoco/main.c b/drivers/net/wireless/intersil/orinoco/main.c
index 28cf97489001..d9128bb25e85 100644
--- a/drivers/net/wireless/intersil/orinoco/main.c
+++ b/drivers/net/wireless/intersil/orinoco/main.c
@@ -2283,7 +2283,7 @@ int orinoco_if_add(struct orinoco_private *priv,
 	priv->ndev = dev;
 
 	/* Report what we've done */
-	dev_dbg(priv->dev, "Registerred interface %s.\n", dev->name);
+	dev_dbg(priv->dev, "Registered interface %s.\n", dev->name);
 
 	return 0;
 
-- 
2.11.0


* pull request: bluetooth-next 2017-04-22
From: Johan Hedberg @ 2017-04-22 13:13 UTC (permalink / raw)
  To: davem; +Cc: linux-bluetooth, netdev

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

Hi Dave,

Here are some more Bluetooth patches (and one 802.15.4 patch) in the
bluetooth-next tree targeting the 4.12 kernel. Most of them are pure
fixes.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-04-21 20:23:53 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to d160b74da85a4ec072b076db056e27ba7246eba0:

  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY (2017-04-22 10:28:40 +0200)

----------------------------------------------------------------
Arnd Bergmann (2):
      Bluetooth: try to improve CONFIG_SERIAL_DEV_BUS dependency
      ieee802154: don't select COMMON_CLK

Dean Jenkins (3):
      Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
      Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
      Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY

Sebastian Reichel (1):
      Bluetooth: hci_ll: Fix NULL pointer deref on FW upload failure

 drivers/bluetooth/Kconfig      | 8 +++++++-
 drivers/bluetooth/Makefile     | 2 +-
 drivers/bluetooth/hci_ldisc.c  | 7 ++++++-
 drivers/bluetooth/hci_ll.c     | 3 +--
 drivers/net/ieee802154/Kconfig | 2 +-
 5 files changed, 16 insertions(+), 6 deletions(-)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]


* [PATCH v2] net: natsemi: ns83820: add checks for dma mapping error
From: Alexey Khoroshilov @ 2017-04-22 13:06 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey Khoroshilov, netdev, linux-kernel, ldv-project
In-Reply-To: <20170417.151743.1779341653983811894.davem@davemloft.net>

The driver does not check whether DMA memory mapping succeeded.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
---
 drivers/net/ethernet/natsemi/ns83820.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/natsemi/ns83820.c b/drivers/net/ethernet/natsemi/ns83820.c
index 729095db3e08..dfc64e1e31f9 100644
--- a/drivers/net/ethernet/natsemi/ns83820.c
+++ b/drivers/net/ethernet/natsemi/ns83820.c
@@ -534,14 +534,19 @@ static inline int ns83820_add_rx_skb(struct ns83820 *dev, struct sk_buff *skb)
 		);
 #endif
 
+	buf = pci_map_single(dev->pci_dev, skb->data,
+			     REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
+	if (pci_dma_mapping_error(dev->pci_dev, buf)) {
+		kfree_skb(skb);
+		return 1;
+	}
+
 	sg = dev->rx_info.descs + (next_empty * DESC_SIZE);
 	BUG_ON(NULL != dev->rx_info.skbs[next_empty]);
 	dev->rx_info.skbs[next_empty] = skb;
 
 	dev->rx_info.next_empty = (next_empty + 1) % NR_RX_DESC;
 	cmdsts = REAL_RX_BUF_SIZE | CMDSTS_INTR;
-	buf = pci_map_single(dev->pci_dev, skb->data,
-			     REAL_RX_BUF_SIZE, PCI_DMA_FROMDEVICE);
 	build_rx_desc(dev, sg, 0, buf, cmdsts, 0);
 	/* update link of previous rx */
 	if (likely(next_empty != dev->rx_info.next_rx))
@@ -1068,6 +1073,7 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff *skb,
 	int stopped = 0;
 	int do_intr = 0;
 	volatile __le32 *first_desc;
+	volatile __le32 *desc;
 
 	dprintk("ns83820_hard_start_xmit\n");
 
@@ -1136,11 +1142,13 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff *skb,
 	if (nr_frags)
 		len -= skb->data_len;
 	buf = pci_map_single(dev->pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+	if (pci_dma_mapping_error(dev->pci_dev, buf))
+		goto dma_error_first;
 
 	first_desc = dev->tx_descs + (free_idx * DESC_SIZE);
 
 	for (;;) {
-		volatile __le32 *desc = dev->tx_descs + (free_idx * DESC_SIZE);
+		desc = dev->tx_descs + (free_idx * DESC_SIZE);
 
 		dprintk("frag[%3u]: %4u @ 0x%08Lx\n", free_idx, len,
 			(unsigned long long)buf);
@@ -1160,6 +1168,8 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff *skb,
 
 		buf = skb_frag_dma_map(&dev->pci_dev->dev, frag, 0,
 				       skb_frag_size(frag), DMA_TO_DEVICE);
+		if (dma_mapping_error(&dev->pci_dev->dev, buf))
+			goto dma_error;
 		dprintk("frag: buf=%08Lx  page=%08lx offset=%08lx\n",
 			(long long)buf, (long) page_to_pfn(frag->page),
 			frag->page_offset);
@@ -1183,6 +1193,32 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff *skb,
 		netif_start_queue(ndev);
 
 	return NETDEV_TX_OK;
+
+dma_error:
+	do {
+		free_idx = (free_idx + NR_TX_DESC - 1) % NR_TX_DESC;
+		desc = dev->tx_descs + (free_idx * DESC_SIZE);
+		cmdsts = le32_to_cpu(desc[DESC_CMDSTS]);
+		len = cmdsts & CMDSTS_LEN_MASK;
+		buf = desc_addr_get(desc + DESC_BUFPTR);
+		if (desc == first_desc)
+			pci_unmap_single(dev->pci_dev,
+					buf,
+					len,
+					PCI_DMA_TODEVICE);
+		else
+			pci_unmap_page(dev->pci_dev,
+					buf,
+					len,
+					PCI_DMA_TODEVICE);
+		desc[DESC_CMDSTS] = cpu_to_le32(0);
+		mb();
+	} while (desc != first_desc);
+
+dma_error_first:
+	dev_kfree_skb_any(skb);
+	ndev->stats.tx_errors++;
+	return NETDEV_TX_OK;
 }
 
 static void ns83820_update_stats(struct ns83820 *dev)
-- 
2.7.4
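The error path added to ns83820_hard_start_xmit() follows the usual map, check, unwind-in-reverse shape. The userspace model below shows only that control flow: `struct mapper`, `fake_map()`, `fake_unmap()` and `map_all()` are hypothetical stand-ins for pci_map_single()/skb_frag_dma_map() and the matching unmap calls, and the mapping failure is injected at a chosen index. A failure on the first buffer leaves nothing to undo, which corresponds to the patch's dma_error_first label.

```c
#include <stdbool.h>
#include <stddef.h>

#define NFRAGS 4

/* Hypothetical mapping state: mapped[i] is true while fragment i is mapped. */
struct mapper {
	bool mapped[NFRAGS];
	int fail_at;		/* index whose mapping fails, or -1 for none */
};

static bool fake_map(struct mapper *m, int i)
{
	if (i == m->fail_at)
		return false;	/* models dma_mapping_error() returning true */
	m->mapped[i] = true;
	return true;
}

static void fake_unmap(struct mapper *m, int i)
{
	m->mapped[i] = false;
}

/*
 * Map every fragment in order. On the first failure, unmap everything that
 * was already mapped, newest first, and report failure so the caller can
 * drop the packet (the patch frees the skb and bumps tx_errors).
 */
static bool map_all(struct mapper *m)
{
	int i;

	for (i = 0; i < NFRAGS; i++)
		if (!fake_map(m, i))
			goto unwind;
	return true;

unwind:
	/* i is the index that failed; undo i - 1 down to 0 */
	while (--i >= 0)
		fake_unmap(m, i);
	return false;
}
```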


* [PATCH iproute2 1/1] actions: Add support for user cookies
From: Jamal Hadi Salim @ 2017-04-22 12:36 UTC (permalink / raw)
  To: stephen; +Cc: netdev, Jamal Hadi Salim

From: Jamal Hadi Salim <jhs@mojatatu.com>

Make use of 128b user cookies

Introduce an optional 128-bit action cookie.
Like all other cookie schemes in the networking world (e.g. in protocols
like HTTP, or the existing kernel FIB protocol field) the idea is to
save user state that, when retrieved, serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.

Sample exercise (showing variable-length use of the cookie):

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 tc/m_action.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 05ef07e..6ebe85e 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)
 
 }
 
-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
 	int argc = *argc_p;
 	char **argv = *argv_p;
 	struct rtattr *tail, *tail2;
 	char k[16];
+	int act_ck_len = 0;
 	int ok = 0;
 	int eap = 0; /* expect action parameters */
 
 	int ret = 0;
 	int prio = 0;
+	unsigned char act_ck[TC_COOKIE_MAX_SIZE];
 
 	if (argc <= 0)
 		return -1;
@@ -215,16 +216,44 @@ done0:
 			addattr_l(n, MAX_MSG, ++prio, NULL, 0);
 			addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
 
-			ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, n);
+			ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+					    n);
 
 			if (ret < 0) {
 				fprintf(stderr, "bad action parsing\n");
 				goto bad_val;
 			}
+
+			if (*argv && strcmp(*argv, "cookie") == 0) {
+				size_t slen;
+
+				NEXT_ARG();
+				slen = strlen(*argv);
+				if (slen > TC_COOKIE_MAX_SIZE * 2) {
+					char cookie_err_m[128];
+
+					snprintf(cookie_err_m, 128,
+						 "%zd Max allowed size %d",
+						 slen, TC_COOKIE_MAX_SIZE*2);
+					invarg(cookie_err_m, *argv);
+				}
+
+				if (hex2mem(*argv, act_ck, slen / 2) < 0)
+					invarg("cookie must be a hex string\n",
+					       *argv);
+
				act_ck_len = slen / 2;
+				argc--;
+				argv++;
+			}
+
+			if (act_ck_len)
+				addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+					  &act_ck, act_ck_len);
+
 			tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
 			ok++;
 		}
-
 	}
 
 	if (eap > 0) {
@@ -245,8 +274,7 @@ bad_val:
 	return -1;
 }
 
-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {
 
 	struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +302,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
 		return err;
 
 	if (show_stats && tb[TCA_ACT_STATS]) {
+
 		fprintf(f, "\tAction statistics:\n");
 		print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+		if (tb[TCA_ACT_COOKIE]) {
+			int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+			char b1[strsz+1];
+
+			fprintf(f, "\n\tcookie len %d %s ", strsz,
+				hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+					      strsz, b1, sizeof(b1)));
+		}
 		fprintf(f, "\n");
 	}
 
-- 
1.9.1
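The cookie is given on the command line as a hex string, so N hex digits become N/2 bytes in the TCA_ACT_COOKIE attribute, and a 128-bit cookie allows at most 32 hex digits. The sketch below shows that conversion on its own. `hex_to_cookie()`, `hexval()` and `COOKIE_MAX` are hypothetical (they are not iproute2's hex2mem() or the kernel's TC_COOKIE_MAX_SIZE), and rejecting odd-length strings is a choice made here, not something the patch does.

```c
#include <stddef.h>
#include <string.h>

#define COOKIE_MAX 16	/* 128 bits; assumed to match the kernel's limit */

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Convert a hex string such as "a1b2c3d4" into bytes.
 * Returns the number of bytes written (strlen(hex) / 2), or -1 if the
 * string is empty, too long, of odd length, or contains a non-hex digit.
 */
static int hex_to_cookie(const char *hex, unsigned char out[COOKIE_MAX])
{
	size_t slen = strlen(hex);
	size_t i;

	if (slen == 0 || slen % 2 != 0 || slen > COOKIE_MAX * 2)
		return -1;
	for (i = 0; i < slen / 2; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)(hi << 4 | lo);
	}
	return (int)(slen / 2);
}
```

The returned byte count, not the string length, is what belongs in the attribute length.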


* compile issue in latest iproute2
From: Jamal Hadi Salim @ 2017-04-22 12:31 UTC (permalink / raw)
  To: Daniel Borkmann, Stephen Hemminger; +Cc: netdev@vger.kernel.org


I don't think this is a kernel uapi issue, but it was failing to compile
when HAVE_ELF is false.
-----
jhs@jhs-UX:~/git-trees/others/iproute-with-ck$ git diff include/bpf_util.h
diff --git a/include/bpf_util.h b/include/bpf_util.h
index 5361dab..edca339 100644
--- a/include/bpf_util.h
+++ b/include/bpf_util.h
@@ -266,7 +266,7 @@ int bpf_send_map_fds(const char *path, const char *obj);
  int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
                      unsigned int entries);
  #else
-static inline int bpf_send_map_fds(const char *path, const char *obj)
+inline int bpf_send_map_fds(const char *path, const char *obj)
  {
         return 0;
  }
-----

Let me know if you want a formal patch or feel free to take it.

cheers,
jamal


* Re: [PATCH 4/4] [DO NOT MERGE] arm64: allwinner: a64: enable RTL8211E PHY workaround
From: kbuild test robot @ 2017-04-22 12:27 UTC (permalink / raw)
  To: Icenowy Zheng
  Cc: kbuild-all-JC7UmRfGjtg, Andrew Lunn, Florian Fainelli,
	Rob Herring, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
In-Reply-To: <20170421232436.10924-5-icenowy-h8G6r0blFSE@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Hi Icenowy,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.11-rc7 next-20170421]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Icenowy-Zheng/net-phy-realtek-change-macro-name-for-page-select-register/20170422-144641
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

>> Error: arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-plus.dts:52.1-9 Label or path ext_phy not found
>> FATAL ERROR: Syntax error parsing input tree

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34529 bytes --]


* [PATCH] net: netcp: fix spelling mistake: "memomry" -> "memory"
From: Colin King @ 2017-04-22 12:22 UTC (permalink / raw)
  To: Wingman Kwok, Murali Karicheri, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in dev_err message and rejoin
line.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index eece3e2eec14..897176fc5043 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -3048,8 +3048,7 @@ static void init_secondary_ports(struct gbe_priv *gbe_dev,
 	for_each_child_of_node(node, port) {
 		slave = devm_kzalloc(dev, sizeof(*slave), GFP_KERNEL);
 		if (!slave) {
-			dev_err(dev,
-				"memomry alloc failed for secondary port(%s), skipping...\n",
+			dev_err(dev, "memory alloc failed for secondary port(%s), skipping...\n",
 				port->name);
 			continue;
 		}
-- 
2.11.0


* Re: r8169: Long link becomes ready times
From: Francois Romieu @ 2017-04-22 12:10 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Realtek linux nic maintainers, netdev, Florian Fainelli
In-Reply-To: <1492851908.2760.222.camel@users.sourceforge.net>

Paul Menzel <paulepanter@users.sourceforge.net> :
[...]
> The ASRock E350M1 has a Realtek ethernet controller.
> 
> It takes almost three seconds for the link to become ready. This is
> noticeable after resume from suspend, where the user wants to continue
> working but first has to wait for the network.
> 
> This test is done with Linux 4.10.
[...]
> The test below was done by removing the module and then inserting it again.
> 
> ```
> Apr 22 10:56:11.919311 myasrocke350m1 kernel: r8169 0000:03:00.0 eth0: RTL8168e/8111e at 0xf82ad000, bc:5f:f4:c8:d3:98, XID 0c200000 IRQ 26
> Apr 22 10:56:11.920631 myasrocke350m1 kernel: r8169 0000:03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> Apr 22 10:56:11.967396 myasrocke350m1 kernel: r8169 0000:03:00.0 eth6: renamed from eth0
> Apr 22 10:56:12.064323 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
> Apr 22 10:56:12.179106 myasrocke350m1 kernel: r8169 0000:03:00.0: firmware: direct-loading firmware rtl_nic/rtl8168e-2.fw
> Apr 22 10:56:12.247858 myasrocke350m1 kernel: r8169 0000:03:00.0 eth6: link down
> Apr 22 10:56:12.248593 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
> Apr 22 10:56:14.992108 myasrocke350m1 kernel: r8169 0000:03:00.0 eth6: link up
> Apr 22 10:56:14.993299 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
> ```
> 
> Is it possible to get this well below one second?

Gross as it is, the link detection is already irq driven. Most currently
used delays - see grep -E '(sleep|delay)' drivers/.../r8169.c - are
supposed to be busy waiting loops with a moderate (well...) delay per
loop. The iteration bound is not expected to make a difference.

So, unless there is a big crawling sleep/delay hidden somewhere, this part
of the r8169 driver should not induce huge delays.
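The point about iteration bounds can be illustrated with a generic bounded polling loop. This is a hypothetical model, not r8169 code: `struct fake_phy`, `fake_link_up()`, `fake_udelay()` and `poll_link()` are made up for illustration, and the delay is only accumulated into a counter so the example runs instantly. Because the loop returns as soon as the condition holds, the time spent depends on when the hardware is ready; `delay_us * max_iter` is merely the worst case, so raising the bound does not slow down a PHY that comes up early.

```c
#include <stdbool.h>

/* Hypothetical fake PHY: reports link after a fixed number of polls and
 * accumulates the (modeled) time spent waiting instead of really waiting. */
struct fake_phy {
	int polls_until_up;
	int polls;
	long waited_us;
};

static bool fake_link_up(struct fake_phy *phy)
{
	return ++phy->polls >= phy->polls_until_up;
}

static void fake_udelay(struct fake_phy *phy, long us)
{
	phy->waited_us += us;	/* a driver would busy-wait with udelay() here */
}

/*
 * Bounded poll: check the condition up to max_iter times, delaying
 * delay_us between checks. Returns true as soon as the condition holds,
 * false if the bound is exhausted.
 */
static bool poll_link(struct fake_phy *phy, long delay_us, int max_iter)
{
	int i;

	for (i = 0; i < max_iter; i++) {
		if (fake_link_up(phy))
			return true;
		fake_udelay(phy, delay_us);
	}
	return false;
}
```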

Realtek does not publish hardware documentation: neither a programming
specification nor a list of known bugs. Some PHY-related material may be
found on their site, but you would have to experiment a lot to check
whether things can behave differently.

I also see a ~3 s link down / link up transition with an 8168c connected
to a 3Com 4200G, and the same 2-3 s figures with an Intel 82578DC.
It looks similar with an 82574L.

I have never aimed for reliable autonegotiation well below one second (500 ms?).
The phy man may have a different vision.

-- 
Ueimor


* Re: [PATCH 2/2] ipv6: don't deliver packets with zero length to raw sockets
From: Jamie Bainbridge @ 2017-04-22 12:10 UTC (permalink / raw)
  To: David Miller
  Cc: Sabrina Dubroca, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <20170421.105332.1898209845543083717.davem@davemloft.net>

On Sat, Apr 22, 2017 at 12:53 AM, David Miller <davem@davemloft.net> wrote:
> From: Jamie Bainbridge <jbainbri@redhat.com>
> Date: Fri, 21 Apr 2017 21:18:00 +1000
>
>> I cannot see the use in delivering a skb with zero bytes after the
>> network header to a raw socket.
>
> Then it cannot be used to look at zero length UDP packets, which are
> completely legal and used.
>
> So we must deliver it.

Understood, thank you both for the clarification.

That would mean the pattern of select/ioctl/recvfrom is the incorrect
way to code an IPv6 raw socket application. I will let our user know.
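Why a select()/FIONREAD/recvfrom() loop goes wrong can be reproduced without raw IPv6 sockets. In the sketch below an AF_UNIX datagram socket stands in for the raw socket (an assumption, so the code runs unprivileged), and `pending_bytes()` and `read_one()` are hypothetical helpers. On Linux, after a zero-length datagram is queued, poll() reports the socket readable while FIONREAD reports 0 bytes. An application that treats 0 from FIONREAD as "nothing to read" never dequeues the datagram and keeps waking up; calling recv() whenever the socket is readable consumes it and returns 0.

```c
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Bytes FIONREAD reports for the socket. For a datagram socket on Linux
 * this is the size of the next queued datagram, which may legitimately be 0. */
static int pending_bytes(int fd)
{
	int n = -1;

	if (ioctl(fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}

/*
 * Wait up to timeout_ms for the socket to become readable, then always
 * dequeue one datagram, even if FIONREAD said 0. Returns the datagram
 * length (possibly 0), -2 on timeout, or -1 on error.
 */
static ssize_t read_one(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return -2;
	return recv(fd, buf, len, 0);
}
```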

How about the other patch in this series? That one fixes a valid bug
when skbs are paged in a certain way. It does not change behaviour;
it just allows ioctl to return the correct result whether the data is
linear or paged. Should I resubmit that patch on its own with a
revised commit message?

Jamie


* Re: [PATCH net-next v4 1/2] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
From: Jamal Hadi Salim @ 2017-04-22 11:30 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, jiri, netdev, xiyou.wangcong
In-Reply-To: <c6333422-ae6d-149c-09ec-c5dfb43d68b3@mojatatu.com>

[-- Attachment #1: Type: text/plain, Size: 1502 bytes --]

On 17-04-21 02:11 PM, Jamal Hadi Salim wrote:

> Please bear with me. I want to make sure to get this right.
>
> Let's say I updated the kernel today to reject transactions with
> bits it didn't understand. Let's call this "old kernel". A tc that
> understands/sets these bits and nothing else. Call it "old tc".
> 3 months later:
> I add one more bit setting to introduce a new feature in a new
> kernel version. Let's call this "new kernel". I update tc to
> understand the new bits. Call it "new tc".
>
> The possibilities:
> a) old tc + old kernel combo. No problem
> b) new tc + new kernel combo. No problem.
> c) old tc + new kernel combo. No problem.
> d) new tc + old kernel. Rejection.
>
> For #d, if I have a smart tc, it would retry with a new combination
> which restores its behavior to old tc level. Of course this means
> apps would have to be rewritten going forward to understand these
> mechanics.
> The alternative is to request capabilities first and then make a
> lowest-common-denominator request.
> But even that is a lot more code and crosses user/kernel twice.
>
> There is a simpler approach that would work going forward.
> How about letting the user choose their fate? Set something, maybe
> in the netlink header, to tell the kernel "if you don't understand
> something I am asking for, please ignore it and do what you can".
> This would maintain current behavior but would force the user to
> explicitly state so.
>


Tested patch that demonstrates this idea is attached.
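The rule proposed above, reject unknown bits unless the caller explicitly asks for liberal checking, can be modeled in a few lines. The constants mirror the values in the attached patch but are redefined locally for illustration, and `flags_ok()` is a hypothetical equivalent of the patch's tca_flags_valid() without its debug printk. Under this rule, a "new tc" talking to an "old kernel" (case d) is rejected unless it sets the liberal bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies for illustration; bit values follow the attached patch. */
#define FLAG_LARGE_DUMP_ON	(1u << 0)
#define FLAG_LIBERAL_CHECK_ON	(1u << 1)
#define KNOWN_FLAGS		(FLAG_LARGE_DUMP_ON | FLAG_LIBERAL_CHECK_ON)

/*
 * Strict by default: any bit this "kernel" does not know about causes
 * rejection. If the caller sets FLAG_LIBERAL_CHECK_ON, unknown bits are
 * tolerated and only the known ones would be acted upon.
 */
static bool flags_ok(uint32_t flags)
{
	if (flags & FLAG_LIBERAL_CHECK_ON)
		return true;
	return (flags & ~KNOWN_FLAGS) == 0;
}
```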

cheers,
jamal


[-- Attachment #2: patch-act-check-flags --]
[-- Type: text/plain, Size: 5985 bytes --]

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cce0613..48d3acb 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -674,10 +674,28 @@ struct tcamsg {
 	unsigned char	tca__pad1;
 	unsigned short	tca__pad2;
 };
+
+enum {
+	TCA_ROOT_UNSPEC,
+	TCA_ROOT_TAB,
+#define TCA_ACT_TAB TCA_ROOT_TAB
+	TCA_ROOT_FLAGS,
+	TCA_ROOT_COUNT,
+	__TCA_ROOT_MAX,
+#define	TCA_ROOT_MAX (__TCA_ROOT_MAX - 1)
+};
+
 #define TA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcamsg))))
 #define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
-#define TCA_ACT_TAB 1 /* attr type must be >=1 */	
-#define TCAA_MAX 1
+/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
+ *
+ * TCA_FLAG_LARGE_DUMP_ON user->kernel to request more than TCA_ACT_MAX_PRIO
+ * actions in a dump. All dump responses will contain the number of actions
+ * being dumped, stored in TCA_ROOT_COUNT for the user app's consumption
+ *
+ */
+#define TCA_FLAG_LARGE_DUMP_ON		(1 << 0)
+#define TCA_FLAG_LIBERAL_CHECK_ON	(1 << 1)
 
 /* New extended info filters for IFLA_EXT_MASK */
 #define RTEXT_FILTER_VF		(1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 9ce22b7..fbe96ae 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -83,6 +83,7 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
 			   struct netlink_callback *cb)
 {
 	int err = 0, index = -1, i = 0, s_i = 0, n_i = 0;
+	u32 act_flags = cb->args[2];
 	struct nlattr *nest;
 
 	spin_lock_bh(&hinfo->lock);
@@ -111,14 +112,18 @@ static int tcf_dump_walker(struct tcf_hashinfo *hinfo, struct sk_buff *skb,
 			}
 			nla_nest_end(skb, nest);
 			n_i++;
-			if (n_i >= TCA_ACT_MAX_PRIO)
+			if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
+			    n_i >= TCA_ACT_MAX_PRIO)
 				goto done;
 		}
 	}
 done:
 	spin_unlock_bh(&hinfo->lock);
-	if (n_i)
+	if (n_i) {
 		cb->args[0] += n_i;
+		if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
+			cb->args[1] = n_i;
+	}
 	return n_i;
 
 nla_put_failure:
@@ -993,11 +998,15 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
 	return tcf_add_notify(net, n, &actions, portid);
 }
 
+static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
+	[TCA_ROOT_FLAGS]      = { .type = NLA_U32 },
+};
+
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
 			 struct netlink_ext_ack *extack)
 {
 	struct net *net = sock_net(skb->sk);
-	struct nlattr *tca[TCAA_MAX + 1];
+	struct nlattr *tca[TCA_ROOT_MAX + 1];
 	u32 portid = skb ? NETLINK_CB(skb).portid : 0;
 	int ret = 0, ovr = 0;
 
@@ -1005,7 +1014,7 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
 	    !netlink_capable(skb, CAP_NET_ADMIN))
 		return -EPERM;
 
-	ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCAA_MAX, NULL,
+	ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCA_ROOT_MAX, NULL,
 			  extack);
 	if (ret < 0)
 		return ret;
@@ -1046,16 +1055,12 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n,
 	return ret;
 }
 
-static struct nlattr *find_dump_kind(const struct nlmsghdr *n)
+static struct nlattr *find_dump_kind(struct nlattr **nla)
 {
 	struct nlattr *tb1, *tb2[TCA_ACT_MAX + 1];
 	struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
-	struct nlattr *nla[TCAA_MAX + 1];
 	struct nlattr *kind;
 
-	if (nlmsg_parse(n, sizeof(struct tcamsg), nla, TCAA_MAX,
-			NULL, NULL) < 0)
-		return NULL;
 	tb1 = nla[TCA_ACT_TAB];
 	if (tb1 == NULL)
 		return NULL;
@@ -1073,6 +1078,24 @@ static struct nlattr *find_dump_kind(const struct nlmsghdr *n)
 	return kind;
 }
 
+#define valid_tca_flags (TCA_FLAG_LARGE_DUMP_ON | TCA_FLAG_LIBERAL_CHECK_ON)
+/* If user has set TCA_FLAG_LIBERAL_CHECK_ON we let the kernel continue
+ * its checking for valid bits only, otherwise we check for any unknown
+ * bits set and reject them
+ */
+static inline bool tca_flags_valid(u32 act_flags)
+{
+	u32 invalid_flags_mask  = ~valid_tca_flags;
+	printk("act flags 0x%x\n", act_flags);
+	if (act_flags & TCA_FLAG_LIBERAL_CHECK_ON)
+		return true;
+
+	if (act_flags & invalid_flags_mask)
+		return false;
+
+	return true;
+}
+
 static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1082,8 +1105,18 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 	struct tc_action_ops *a_o;
 	int ret = 0;
 	struct tcamsg *t = (struct tcamsg *) nlmsg_data(cb->nlh);
-	struct nlattr *kind = find_dump_kind(cb->nlh);
+	struct nlattr *count_attr = NULL;
+	struct nlattr *tb[TCA_ROOT_MAX + 1];
+	struct nlattr *kind = NULL;
+	u32 act_flags = 0;
+	u32 act_count = 0;
+
+	ret = nlmsg_parse(cb->nlh, sizeof(struct tcamsg), tb, TCA_ROOT_MAX,
+			  tcaa_policy, NULL);
+	if (ret < 0)
+		return ret;
 
+	kind = find_dump_kind(tb);
 	if (kind == NULL) {
 		pr_info("tc_dump_action: action bad kind\n");
 		return 0;
@@ -1093,14 +1126,28 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 	if (a_o == NULL)
 		return 0;
 
+	if (tb[TCA_ROOT_FLAGS])
+		act_flags = nla_get_u32(tb[TCA_ROOT_FLAGS]);
+
+	if (act_flags && !tca_flags_valid(act_flags)) {
+		printk("invalid flags 0x%x. Ask for liberal behavior\n",
+		       act_flags);
+		return -EINVAL;
+	}
+
 	nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
 			cb->nlh->nlmsg_type, sizeof(*t), 0);
 	if (!nlh)
 		goto out_module_put;
+
+	cb->args[2] = act_flags;
 	t = nlmsg_data(nlh);
 	t->tca_family = AF_UNSPEC;
 	t->tca__pad1 = 0;
 	t->tca__pad2 = 0;
+	count_attr = nla_reserve(skb, TCA_ROOT_COUNT, sizeof(u32));
+	if (!count_attr)
+		goto out_module_put;
 
 	nest = nla_nest_start(skb, TCA_ACT_TAB);
 	if (nest == NULL)
@@ -1113,6 +1160,9 @@ static int tc_dump_action(struct sk_buff *skb, struct netlink_callback *cb)
 	if (ret > 0) {
 		nla_nest_end(skb, nest);
 		ret = skb->len;
+		act_count = cb->args[1];
+		memcpy(nla_data(count_attr), &act_count, sizeof(u32));
+		cb->args[1] = 0;
 	} else
 		nlmsg_trim(skb, b);
 


* Re: [PATCH] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Julian Anastasov @ 2017-04-22 11:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netfilter-devel, lvs-devel, Wensong Zhang, Simon Horman, netdev
In-Reply-To: <2a5c88ca5670baade900e2b1f05a0dcc9927a29d.1492681122.git.pabeni@redhat.com>


	Hello,

On Thu, 20 Apr 2017, Paolo Abeni wrote:

> When creating a new ipvs service, ipv6 addresses are always accepted
> if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
> is not explicitly checked.
> 
> This allows the user-space to configure ipvs services even if the
> system is booted with ipv6.disable=1. On specific configuration, ipvs
> can try to call ipv6 routing code at setup time, causing the kernel to
> oops due to fib6_rules_ops being NULL.
> 
> This change addresses the issue adding a check for the ipv6
> module being enabled while validating ipv6 service operations and
> adding the same validation for dest operations.
> 
> According to git history, this issue is apparently present since
> the introduction of ipv6 support, and the oops can be triggered
> since commit 09571c7ae30865ad ("IPVS: Add function to determine
> if IPv6 address is local")
> 
> Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

	Looks good to me, but I see two places that could benefit
from such a check:

- in ip_vs_genl_new_daemon() if we do not want to create IPv6 sockets
for the sync protocol in make_send_sock() and make_receive_sock().
Not sure if this can lead to crashes.

- in ip_vs_proc_sync_conn(), if we do not want the backup server to accept
IPv6 conns, because they may be created even when dests are missing.
We may use retc = 10 there. Not fatal, but it may eat memory for
conns that will not be used.

Regards


* [PATCH net] ravb: Double free on error in ravb_start_xmit()
From: Dan Carpenter @ 2017-04-22 10:46 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: David S. Miller, Kazuya Mizuguchi, Simon Horman, Yoshihiro Kaneko,
	Masaru Nagai, Geert Uytterhoeven, Niklas Söderlund,
	Philippe Reynes, netdev, linux-renesas-soc, kernel-janitors

If skb_put_padto() fails, it has already freed the skb, so the "goto drop"
path frees it a second time. I shifted that code up a bit to make my
error handling a little simpler.

Fixes: a0d2f20650e8 ("Renesas Ethernet AVB PTP clock driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 8cfc4a54f2dc..3cd7989c007d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1516,11 +1516,12 @@ static netdev_tx_t ravb_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 		spin_unlock_irqrestore(&priv->lock, flags);
 		return NETDEV_TX_BUSY;
 	}
-	entry = priv->cur_tx[q] % (priv->num_tx_ring[q] * NUM_TX_DESC);
-	priv->tx_skb[q][entry / NUM_TX_DESC] = skb;
 
 	if (skb_put_padto(skb, ETH_ZLEN))
-		goto drop;
+		goto exit;
+
+	entry = priv->cur_tx[q] % (priv->num_tx_ring[q] * NUM_TX_DESC);
+	priv->tx_skb[q][entry / NUM_TX_DESC] = skb;
 
 	buffer = PTR_ALIGN(priv->tx_align[q], DPTR_ALIGN) +
 		 entry / NUM_TX_DESC * DPTR_ALIGN;
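The fix comes down to an ownership rule: skb_put_padto() consumes the skb when it fails, so the caller must not free it again. The userspace model below illustrates that contract. `struct buf`, `pad_or_free()`, `free_buf()` and `xmit()` are hypothetical stand-ins for the skb, skb_put_padto(), dev_kfree_skb_any() and ravb_start_xmit(), and the global `frees` counter exists only to make a double free observable.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer standing in for an skb. */
struct buf {
	size_t len;
	unsigned char data[64];
};

static int frees;	/* counts releases so a double free is observable */

static void free_buf(struct buf *b)
{
	frees++;
	free(b);
}

/*
 * Pad b with zeroes up to min_len. Like skb_put_padto(), this consumes the
 * buffer on failure: it frees b and returns nonzero, so the caller must
 * not touch or free b afterwards.
 */
static int pad_or_free(struct buf *b, size_t min_len)
{
	if (min_len > sizeof(b->data)) {
		free_buf(b);
		return -1;
	}
	if (b->len < min_len) {
		memset(b->data + b->len, 0, min_len - b->len);
		b->len = min_len;
	}
	return 0;
}

/* Transmit path after the fix: on pad failure, just bail out. */
static int xmit(struct buf *b, size_t min_len)
{
	if (pad_or_free(b, min_len))
		return -1;	/* b is already gone; no second free_buf() */
	/* ... hand the padded buffer to hardware; here we just release it ... */
	free_buf(b);
	return 0;
}
```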


* r8169: Long link becomes ready times
From: Paul Menzel @ 2017-04-22  9:05 UTC (permalink / raw)
  To: Realtek linux nic maintainers; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]

Dear Linux folks,


The ASRock E350M1 has a Realtek ethernet controller.

It takes almost three seconds for the link to become ready. This is
noticeable after resume from suspend, where the user wants to continue
working but first has to wait for the network.

This test is done with Linux 4.10.

```
$ sudo lspci -s 3:00.0 -nn -v
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
	Subsystem: ASRock Incorporation Motherboard (one of many) [1849:8168]
	Flags: bus master, fast devsel, latency 0, IRQ 26
	I/O ports at 1000 [size=256]
	Memory at f0004000 (64-bit, prefetchable) [size=4K]
	Memory at f0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
	Kernel driver in use: r8169
	Kernel modules: r8169
```

The test below was done by removing the module and then inserting it again.

```
Apr 22 10:56:11.919311 myasrocke350m1 kernel: r8169 0000:03:00.0 eth0: RTL8168e/8111e at 0xf82ad000, bc:5f:f4:c8:d3:98, XID 0c200000 IRQ 26
Apr 22 10:56:11.920631 myasrocke350m1 kernel: r8169 0000:03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Apr 22 10:56:11.967396 myasrocke350m1 kernel: r8169 0000:03:00.0 eth6: renamed from eth0
Apr 22 10:56:12.064323 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
Apr 22 10:56:12.179106 myasrocke350m1 kernel: r8169 0000:03:00.0: firmware: direct-loading firmware rtl_nic/rtl8168e-2.fw
Apr 22 10:56:12.247858 myasrocke350m1 kernel: r8169 0000:03:00.0 eth6: link down
Apr 22 10:56:12.248593 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth6: link is not ready
Apr 22 10:56:14.992108 myasrocke350m1 kernel: r8169 0000:03:00.0 eth6: link up
Apr 22 10:56:14.993299 myasrocke350m1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
```

Is it possible to get this well below one second?


Thanks,

Paul

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

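The delay can be quantified by subtracting the journal timestamps in the log above. This is a minimal sketch that assumes the `HH:MM:SS.ffffff` format shown and that both timestamps fall on the same day. The helper names `ts_seconds()` and `ts_delta()` are made up for this example.

```c
#include <stdio.h>

/* Parse "HH:MM:SS.ffffff" (journal time of day) into seconds since midnight.
 * Returns a negative value if the string does not match that format. */
static double ts_seconds(const char *ts)
{
	int h, m;
	double s;

	if (sscanf(ts, "%d:%d:%lf", &h, &m, &s) != 3)
		return -1.0;
	return h * 3600.0 + m * 60.0 + s;
}

/* Elapsed seconds between two timestamps taken on the same day. */
static double ts_delta(const char *from, const char *to)
{
	return ts_seconds(to) - ts_seconds(from);
}
```

For the log above, `ts_delta("10:56:12.064323", "10:56:14.993299")` gives about 2.93 s from the first NETDEV_UP message to "link becomes ready". `ts_delta("10:56:12.247858", "10:56:14.992108")` gives about 2.74 s from "link down" to "link up".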

* Re: [RFC] change the default Kconfig value of mlx5_en
From: Ian Kumlien @ 2017-04-22  9:28 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Leon Romanovsky, Matan Barak, David Miller, Saeed Mahameed,
	Linux Kernel Network Developers
In-Reply-To: <CALzJLG_+ubbpwYy+i=HixbT6C6PXB16ZA8_iySmggSsUD70bKA@mail.gmail.com>

On Sat, Apr 22, 2017 at 3:07 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Sat, Apr 22, 2017 at 3:47 AM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>> On Sat, Apr 22, 2017 at 2:34 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Sat, Apr 22, 2017 at 2:10 AM, Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>> Sorry,
>>>>
>>>> Back again, fighting cold, hot whiskey has been consumed...
>>>>
>>>> Something like this would perhaps be a better solution:
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>> index 60154a175bd3..fe192e247601 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>>>> @@ -1139,6 +1139,10 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
>>>>
>>>>  #ifdef CONFIG_MLX5_CORE_EN
>>>>         mlx5_eswitch_attach(dev->priv.eswitch);
>>>> +#else
>>>> +       if (MLX5_CAP_GEN(dev, port_type) == MLX5_CAP_PORT_TYPE_ETH) {
>>>> +               dev_info(&pdev->dev, "Ethernet device discovered but support not enabled in kernel.");
>>>> +       }
>>>>  #endif
>>>>
>>>
>>> Currently both MLX5_CORE=n and MLX5_CORE_EN=n by default, so the issue
>>> you are seeing can occur only if you explicitly set MLX5_CORE=y and
>>> MLX5_CORE_EN=n. Why would someone do this if he knows he wants Ethernet
>>> support as well? IMHO this print is redundant.
>>
>> Well, I'm running a prebuilt kernel - which was configured this way,
>> and since there is no mlx5_en module and it does state that the link
>> is "Ethernet", it just looks like the driver is broken or in some kind
>> of really weird state.
>>
>>> Anyway, Are you looking for RDMA support over ethernet (RoCE) ? and
>>> you are not interested to have ethernet netdev support ?
>>
>> ? RDMA is something we'll look at in the future; right now, having the
>> NICs actually work as NICs is the priority ;)
>>
>
> I see, I just wanted to understand your situation :)
>
>>> if yes, I think this is something that can be achieved, but the
>>> question is do we really need this ?
>>
>> It's really weird to see the driver load, to see everything register
>> and have no feedback.
>>
>
> So, in your case you have mlx5 core support without MLX5_CORE_EN which
> provides the eswitch and netdev functionality in ethernet.

Yes

> But you will still have mlx5_ib register an RDMA interface and
> theoretically it should work, the only thing you won't see is a
> netdevice.
>
> The weird thing is that you don't see a link up on the RDMA interface.
> Leon/Matan, can you please look into this? Do we really need a netdev
> to have a functioning RDMA logical link over Ethernet?

The switch we have does support RDMA but the manual is sparse (as in
nothing really there) wrt enabling/configuring the RDMA bit so something
might be missing.

I'll try to remember to do the same test when we setup the mellanox switches =)

>> Including no network devices, but if you run the InfiniBand commands,
>> they tell you that you are connected to Ethernet but that the device
>> is down and disabled.
>>
>> To me, down and disabled is not the same as "Ethernet support is
>> not included" =)
>>
>> Basically, I would hate for someone else to end up in the same
>> situation, since you only get guides on how to enable InfiniBand/RDMA,
>> but what you really want to do at that point is to disable it and see
>> if that gives you your network devices back =)
>>
>
> Yes, this is misleading. Maybe your kernel log warning is not so bad
> after all, but let me dig more into this.
> I will get back to you next week.

Thanks, I bet there are better ways to do it; this one was just
one of the first ones I found =)

>> I have had similar issues with some ConnectX-3 devices while playing at
>> home, but I suspect that it's just a limitation of the OFED packages
>> available for the distro I'm running.

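Ian's proposal boils down to a compile-time branch. When the Ethernet part of mlx5 is not built, an Ethernet-capable port produces a log line instead of silence. The userspace sketch below models only that decision. `CONFIG_MLX5_CORE_EN` is reused as a plain preprocessor switch. `enum port_type` and `mlx5_port_note()` are hypothetical names and are not part of the mlx5 driver API.

```c
#include <stddef.h>

/* Hypothetical port types, standing in for MLX5_CAP_GEN(dev, port_type). */
enum port_type { PORT_TYPE_IB, PORT_TYPE_ETH };

/*
 * Returns a message to log at load time, or NULL if none is needed.
 * Build with -DCONFIG_MLX5_CORE_EN to model a kernel that includes mlx5_en.
 */
static const char *mlx5_port_note(enum port_type type)
{
#ifdef CONFIG_MLX5_CORE_EN
	(void)type;
	return NULL; /* a netdev will be created, nothing to explain */
#else
	if (type == PORT_TYPE_ETH)
		return "Ethernet device discovered but support not enabled in kernel.";
	return NULL;
#endif
}
```

Without `-DCONFIG_MLX5_CORE_EN`, an Ethernet port yields the message and an InfiniBand port yields NULL. With the define, both yield NULL.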

* Re: [PATCH net] net: ipv6: regenerate host route if moved to gc list
From: Dmitry Vyukov @ 2017-04-22  9:14 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: David Ahern, netdev, andreyknvl, mmanning
In-Reply-To: <20170422055703.uo4ghu5jfbctxvud@kafai-mba.dhcp.thefacebook.com>

On Sat, Apr 22, 2017 at 7:57 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Fri, Apr 21, 2017 at 04:40:30PM -0700, David Ahern wrote:
>> Taking down the loopback device wreaks havoc on IPv6 routes. By
>> extension, taking down a VRF device wreaks havoc on its table.
>>
>> Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
>> FIB code while running syzkaller fuzzer. The root cause is a dead dst
>> that is on the garbage list gets reinserted into the IPv6 FIB. While on
>> the gc (or perhaps when it gets added to the gc list) the dst->next is
>> set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
>> out-of-bounds access.
> Thanks for the investigation and detailed explanation.
>
> It sounds like the dst is already in DST_OBSOLETE_DEAD during
> the second fib6_add().  Glad that the fib6_del() caught it.
>
>>
>> Andrey's reproducer was the key to getting to the bottom of this.
>>
>> With IPv6, host routes for an address have the dst->dev set to the
>> loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
>> a walk of the fib evicting routes with the 'lo' device which means all
>> host routes are removed. That process moves the dst which is attached to
>> an inet6_ifaddr to the gc list and marks it as dead.
>>
>> The recent change to keep global IPv6 addresses added a new function
>> fixup_permanent_addr that is called on admin up. That function restarts
>> dad for an inet6_ifaddr and when it completes the host route attached
>> to it is inserted into the fib. Since the route was marked dead and
>> moved to the gc list, we get the reported out-of-bounds accesses. If
>> the device with the address is taken down or the address is removed, the
>> WARN_ON in fib6_del is triggered.
>>
>> All of those faults are fixed by regenerating the host route if the
>> existing one has been moved to the gc list, something that can be
>> determined by checking if the rt6i_ref counter is 0.
>>
>> The update of the route on the ifp is done using cmpxchg as suggested
>> by Li RongQing in a patch a year ago.
>>
>> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
>> Reported-by: Dmitry Vyukov <dvyukov@google.com>
>> Reported-by: Andrey Konovalov <andreyknvl@google.com>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
>> ---
>>  net/ipv6/addrconf.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 08f9e8ea7a81..93555528a45b 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -3303,14 +3303,16 @@ static void addrconf_gre_config(struct net_device *dev)
>>  static int fixup_permanent_addr(struct inet6_dev *idev,
>>                               struct inet6_ifaddr *ifp)
>>  {
>> -     if (!ifp->rt) {
>> -             struct rt6_info *rt;
>> +     if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
>> +             struct rt6_info *rt, *prev;
>>
>>               rt = addrconf_dst_alloc(idev, &ifp->addr, false);
>>               if (unlikely(IS_ERR(rt)))
>>                       return PTR_ERR(rt);
>>
>> -             ifp->rt = rt;
>> +             prev = cmpxchg(&ifp->rt, ifp->rt, rt);
> One small question.  Why is cmpxchg needed instead
> of an ip6_rt_put() and then assign?
> Is it fixing another bug?

cmpxchg here looks fishy.
If there are no concurrent modifications, then it is not needed.
If there are and cmpxchg fails, then we will put the installed rt and
leak the new one.

>> +             if (prev)
>> +                     ip6_rt_put(prev);
>>       }
>>
>>       if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
>> --
>> 2.1.4
>>

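Dmitry's objection can be reproduced with C11 atomics. `atomic_compare_exchange_strong()` stores the new pointer only if the slot still holds the expected value. On failure it copies the current value into `expected` and leaves the slot unchanged. The kernel's cmpxchg() returns that old value instead of a bool, but the failure case behaves the same way.

The sketch below simulates a concurrent writer by changing the slot between reading `ifp->rt` and the exchange. `struct route`, `route_put()` and `replace_with_cmpxchg()` are hypothetical stand-ins for `rt6_info`, ip6_rt_put() and the v1 hunk.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted route, standing in for struct rt6_info. */
struct route {
	int refs;
};

static void route_put(struct route *r)
{
	r->refs--;
}

/*
 * Mimics: prev = cmpxchg(&ifp->rt, ifp->rt, rt); if (prev) ip6_rt_put(prev);
 * If 'racer' is non-NULL, it is installed by a simulated concurrent writer
 * after the expected value was read but before the exchange runs.
 */
static bool replace_with_cmpxchg(_Atomic(struct route *) *slot,
				 struct route *new_rt, struct route *racer)
{
	struct route *expected = atomic_load(slot); /* the ifp->rt argument */

	if (racer)
		atomic_store(slot, racer); /* concurrent modification */

	bool swapped = atomic_compare_exchange_strong(slot, &expected, new_rt);
	/* On failure 'expected' now holds the installed route, exactly what the
	 * kernel's cmpxchg() would return, and it gets released anyway. */
	if (expected)
		route_put(expected);
	return swapped;
}
```

In the racing case the installed route loses a reference it still needs. Meanwhile `new_rt` is never stored anywhere and is leaked. Doing the read, replace and put under a lock, as planned for v2, closes that window.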

* Re: [PATCH 3/4] net: phy: realtek: add disable RX delay hack for RTL8211E
From: kbuild test robot @ 2017-04-22  7:12 UTC (permalink / raw)
  To: Icenowy Zheng
  Cc: kbuild-all-JC7UmRfGjtg, Andrew Lunn, Florian Fainelli,
	Rob Herring, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
In-Reply-To: <20170421232436.10924-4-icenowy-h8G6r0blFSE@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

Hi Icenowy,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.11-rc7 next-20170421]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Icenowy-Zheng/net-phy-realtek-change-macro-name-for-page-select-register/20170422-144641
config: i386-randconfig-x070-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/phy/realtek.c: In function 'rtl8211e_config_init':
>> drivers/net/phy/realtek.c:147:3: error: expected ';' before 'phy_write'
      phy_write(phydev, RTL8211E_EXT_PAGE_SELECT, 0xa4);
      ^~~~~~~~~

vim +147 drivers/net/phy/realtek.c

   141			 *
   142			 * The datasheet of RTL8211E didn't cover this ext page.
   143			 *
   144			 * Select extension page 0xa4 here.
   145			 */
   146			phy_write(phydev, RTL8211_PAGE_SELECT, RTL8211E_EXT_PAGE)
 > 147			phy_write(phydev, RTL8211E_EXT_PAGE_SELECT, 0xa4);
   148	
   149			/* Write the magic number */
   150			phy_write(phydev, 0x1c, 0xb591);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24280 bytes --]

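The build error comes from a missing `;` after the phy_write() call on line 146, which leaves two calls run together. Below is the same register sequence with the semicolon restored, modeled in userspace so it can be compiled and checked. `struct phy_device` and `phy_write()` here are hypothetical stubs that only record writes; they are not the phylib API. The three register macros use assumed values (0x1f, 0x07, 0x1e) for illustration; the real ones live in drivers/net/phy/realtek.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed register numbers, for illustration only. */
#define RTL8211_PAGE_SELECT		0x1f
#define RTL8211E_EXT_PAGE		0x07
#define RTL8211E_EXT_PAGE_SELECT	0x1e

/* Hypothetical stand-in for struct phy_device: records every write. */
struct phy_device {
	uint16_t reg[8];
	uint16_t val[8];
	size_t n;
};

/* Hypothetical stub: logs (register, value) pairs, fails when the log is full. */
static int phy_write(struct phy_device *phydev, uint16_t regnum, uint16_t val)
{
	if (phydev->n == sizeof(phydev->reg) / sizeof(phydev->reg[0]))
		return -1;
	phydev->reg[phydev->n] = regnum;
	phydev->val[phydev->n] = val;
	phydev->n++;
	return 0;
}

/* The sequence from rtl8211e_config_init() with the missing ';' restored. */
static void select_ext_page_a4(struct phy_device *phydev)
{
	phy_write(phydev, RTL8211_PAGE_SELECT, RTL8211E_EXT_PAGE);
	phy_write(phydev, RTL8211E_EXT_PAGE_SELECT, 0xa4);

	/* Write the magic number */
	phy_write(phydev, 0x1c, 0xb591);
}
```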

* Re: [PATCH net] net: ipv6: regenerate host route if moved to gc list
From: Martin KaFai Lau @ 2017-04-22  5:57 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, dvyukov, andreyknvl, mmanning
In-Reply-To: <1492818030-17640-1-git-send-email-dsa@cumulusnetworks.com>

On Fri, Apr 21, 2017 at 04:40:30PM -0700, David Ahern wrote:
> Taking down the loopback device wreaks havoc on IPv6 routes. By
> extension, taking down a VRF device wreaks havoc on its table.
>
> Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
> FIB code while running syzkaller fuzzer. The root cause is a dead dst
> that is on the garbage list gets reinserted into the IPv6 FIB. While on
> the gc (or perhaps when it gets added to the gc list) the dst->next is
> set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
> out-of-bounds access.
Thanks for the investigation and detailed explanation.

It sounds like the dst is already in DST_OBSOLETE_DEAD during
the second fib6_add().  Glad that the fib6_del() caught it.

>
> Andrey's reproducer was the key to getting to the bottom of this.
>
> With IPv6, host routes for an address have the dst->dev set to the
> loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
> a walk of the fib evicting routes with the 'lo' device which means all
> host routes are removed. That process moves the dst which is attached to
> an inet6_ifaddr to the gc list and marks it as dead.
>
> The recent change to keep global IPv6 addresses added a new function
> fixup_permanent_addr that is called on admin up. That function restarts
> dad for an inet6_ifaddr and when it completes the host route attached
> to it is inserted into the fib. Since the route was marked dead and
> moved to the gc list, we get the reported out-of-bounds accesses. If
> the device with the address is taken down or the address is removed, the
> WARN_ON in fib6_del is triggered.
>
> All of those faults are fixed by regenerating the host route if the
> existing one has been moved to the gc list, something that can be
> determined by checking if the rt6i_ref counter is 0.
>
> The update of the route on the ifp is done using cmpxchg as suggested
> by Li RongQing in a patch a year ago.
>
> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
>  net/ipv6/addrconf.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 08f9e8ea7a81..93555528a45b 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3303,14 +3303,16 @@ static void addrconf_gre_config(struct net_device *dev)
>  static int fixup_permanent_addr(struct inet6_dev *idev,
>  				struct inet6_ifaddr *ifp)
>  {
> -	if (!ifp->rt) {
> -		struct rt6_info *rt;
> +	if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> +		struct rt6_info *rt, *prev;
>
>  		rt = addrconf_dst_alloc(idev, &ifp->addr, false);
>  		if (unlikely(IS_ERR(rt)))
>  			return PTR_ERR(rt);
>
> -		ifp->rt = rt;
> +		prev = cmpxchg(&ifp->rt, ifp->rt, rt);
One small question.  Why is cmpxchg needed instead
of an ip6_rt_put() and then assign?
Is it fixing another bug?


> +		if (prev)
> +			ip6_rt_put(prev);
>  	}
>
>  	if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
> --
> 2.1.4
>

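Martin's alternative (drop the old reference, then assign) is safe once the check and the swap happen under a single lock, which is what David plans for v2. The sketch also keeps the v1 condition: regenerate when there is no route, or when the route's FIB reference count has dropped to 0.

This minimal userspace version uses a pthread mutex. `struct route`, `struct ifaddr`, `route_alloc()`, `route_put()` and `fixup_route()` are hypothetical names. `fib_refs` models rt6i_ref and `refs` models the dst reference held by the address. Both are plain ints, which is safe here only because the address's fields are accessed under its lock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical route: fib_refs models rt6i_ref (references from the FIB
 * tree), refs models the dst reference held by the address. */
struct route {
	int fib_refs;
	int refs;
};

/* Hypothetical address, standing in for struct inet6_ifaddr. */
struct ifaddr {
	pthread_mutex_t lock;
	struct route *rt;
};

static struct route *route_alloc(void)
{
	struct route *r = calloc(1, sizeof(*r));

	if (r)
		r->refs = 1; /* reference owned by the address */
	return r;
}

static void route_put(struct route *r)
{
	if (--r->refs == 0)
		free(r);
}

/* Regenerate the host route if it is missing or no longer in the FIB.
 * Returns 0 on success, -1 if allocation fails. */
static int fixup_route(struct ifaddr *ifp)
{
	struct route *prev = NULL, *rt;
	int ret = 0;

	pthread_mutex_lock(&ifp->lock);
	if (!ifp->rt || ifp->rt->fib_refs == 0) {
		rt = route_alloc();
		if (!rt) {
			ret = -1;
		} else {
			prev = ifp->rt; /* check and swap happen atomically w.r.t. the lock */
			ifp->rt = rt;
		}
	}
	pthread_mutex_unlock(&ifp->lock);

	if (prev)
		route_put(prev); /* drop the address's old reference */
	return ret;
}
```

Because only the lock holder can replace `ifp->rt`, the route being released is always the one that was actually installed, and the new route is never lost.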

* Re: [PATCH] xprtrdma: use offset_in_page() macro
From: Chuck Lever @ 2017-04-22  3:37 UTC (permalink / raw)
  To: Geliang Tang
  Cc: J. Bruce Fields, Jeff Layton, Trond Myklebust, Anna Schumaker,
	David S. Miller, Sagi Grimberg, Linux NFS Mailing List, netdev,
	linux-kernel
In-Reply-To: <1b5bf5bd903ad97138a3a86aa6bcb9b2fc142c6e.1492762974.git.geliangtang@gmail.com>


> On Apr 21, 2017, at 9:21 PM, Geliang Tang <geliangtang@gmail.com> wrote:
> 
> Use offset_in_page() macro instead of open-coding.
> 
> Signed-off-by: Geliang Tang <geliangtang@gmail.com>
> ---
> net/sunrpc/xprtrdma/rpc_rdma.c        | 4 ++--
> net/sunrpc/xprtrdma/svc_rdma_sendto.c | 3 +--
> 2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index a044be2..429beea 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -540,7 +540,7 @@ rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
> 			goto out;
> 
> 		page = virt_to_page(xdr->tail[0].iov_base);
> -		page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
> +		page_base = offset_in_page(xdr->tail[0].iov_base);
> 
> 		/* If the content in the page list is an odd length,
> 		 * xdr_write_pages() has added a pad at the beginning
> @@ -587,7 +587,7 @@ rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
> 	 */
> 	if (xdr->tail[0].iov_len) {
> 		page = virt_to_page(xdr->tail[0].iov_base);
> -		page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
> +		page_base = offset_in_page(xdr->tail[0].iov_base);
> 		len = xdr->tail[0].iov_len;
> 
> map_tail:

There are several other sites that use PAGE_MASK in
rpc_rdma.c. Should those be included in this patch?

Do you have a way to test this change? If not I
can take it (once the above comment is addressed),
run it through the usual battery of NFS/RDMA
testing, and then pass it along to Anna.


> diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> index 1736337..60b3f29 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> @@ -306,12 +306,11 @@ static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
> 				unsigned char *base,
> 				unsigned int len)
> {
> -	unsigned long offset = (unsigned long)base & ~PAGE_MASK;
> 	struct ib_device *dev = rdma->sc_cm_id->device;
> 	dma_addr_t dma_addr;
> 
> 	dma_addr = ib_dma_map_page(dev, virt_to_page(base),
> -				   offset, len, DMA_TO_DEVICE);
> +				   offset_in_page(base), len, DMA_TO_DEVICE);
> 	if (ib_dma_mapping_error(dev, dma_addr))
> 		return -EIO;
> 

This hunk conflicts with a rewrite of svc_rdma_sendto.c that
Bruce has already accepted for v4.12. I would prefer this
be dropped.

The rewritten code also has this issue. I can submit a patch
separately that adds offset_in_page in the appropriate place.


--
Chuck Lever

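offset_in_page() computes the same value as the open-coded `(unsigned long)p & ~PAGE_MASK` that the patch replaces. PAGE_MASK keeps the page-frame bits, so its complement keeps only the offset within the page. The userspace sketch below defines the macros locally, modeled on the kernel's definitions. It assumes 4 KiB pages (PAGE_SHIFT 12). `page_start()` is a helper added only for this example.

```c
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))	/* keeps the page-frame bits */

/* Keep only the in-page offset bits of an address. */
#define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)

/* Start of the page containing p: the same address with the offset cleared. */
static uintptr_t page_start(const void *p)
{
	return (uintptr_t)p & PAGE_MASK;
}
```

For an address of 0x12345, offset_in_page() gives 0x345 and page_start() gives 0x12000, and the two add back up to the original address.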

* [PATCH 2/2] sparc64: Add eBPF JIT.
From: David Miller @ 2017-04-22  3:17 UTC (permalink / raw)
  To: sparclinux; +Cc: netdev, ast, daniel


This is an eBPF JIT for sparc64.  All major features are supported.

All tests under tools/testing/selftests/bpf/ pass.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/Kconfig                            |    3 +-
 arch/sparc/net/bpf_jit_32.h                   |    2 +-
 arch/sparc/net/{bpf_jit_32.h => bpf_jit_64.h} |   56 +-
 arch/sparc/net/bpf_jit_asm_32.S               |    7 -
 arch/sparc/net/bpf_jit_asm_64.S               |  162 +++-
 arch/sparc/net/bpf_jit_comp_32.c              |   49 -
 arch/sparc/net/bpf_jit_comp_64.c              | 1194 ++++++++++++++++++++++++-
 7 files changed, 1384 insertions(+), 89 deletions(-)
 copy arch/sparc/net/{bpf_jit_32.h => bpf_jit_64.h} (53%)

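Two helpers in the new bpf_jit_comp_64.c control how constants are encoded.

- is_simm13() relies on unsigned wraparound. Adding 0x1000 maps the signed 13-bit range [-4096, 4095] onto [0, 0x1fff], so a single unsigned compare suffices.
- Constants that do not fit are built with SETHI, which carries bits 31..10, followed by an OR of the low 10 bits. This is what the SETHI() and OR_LO() macros encode.

The sketch below checks both properties in userspace. `hi22()`, `lo10()` and `rebuild()` are names invented for this example, not functions from the JIT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same test as the JIT: true if value, read as signed, fits in 13 bits. */
static inline bool is_simm13(unsigned int value)
{
	return value + 0x1000 < 0x2000;
}

/* Immediate field of SETHI: bits 31..10 of the constant (22 bits). */
static uint32_t hi22(uint32_t k)
{
	return (k >> 10) & 0x3fffff;
}

/* Immediate of the following OR: the low 10 bits. */
static uint32_t lo10(uint32_t k)
{
	return k & 0x3ff;
}

/* Value left in the register by "sethi %hi(k), reg; or reg, %lo(k), reg". */
static uint32_t rebuild(uint32_t k)
{
	return (hi22(k) << 10) | lo10(k);
}
```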
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a59deae..7b16251 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -31,7 +31,8 @@ config SPARC
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select GENERIC_PCI_IOMAP
 	select HAVE_NMI_WATCHDOG if SPARC64
-	select HAVE_CBPF_JIT
+	select HAVE_CBPF_JIT if SPARC32
+	select HAVE_EBPF_JIT if SPARC64
 	select HAVE_DEBUG_BUGVERBOSE
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_CLOCKEVENTS
diff --git a/arch/sparc/net/bpf_jit_32.h b/arch/sparc/net/bpf_jit_32.h
index 33d6b37..d5c069b 100644
--- a/arch/sparc/net/bpf_jit_32.h
+++ b/arch/sparc/net/bpf_jit_32.h
@@ -39,7 +39,7 @@
 #define r_TMP2		G2
 #define r_OFF		G3
 
-/* assembly code in arch/sparc/net/bpf_jit_asm.S */
+/* assembly code in arch/sparc/net/bpf_jit_asm_32.S */
 extern u32 bpf_jit_load_word[];
 extern u32 bpf_jit_load_half[];
 extern u32 bpf_jit_load_byte[];
diff --git a/arch/sparc/net/bpf_jit_32.h b/arch/sparc/net/bpf_jit_64.h
similarity index 53%
copy from arch/sparc/net/bpf_jit_32.h
copy to arch/sparc/net/bpf_jit_64.h
index 33d6b37..74abd45 100644
--- a/arch/sparc/net/bpf_jit_32.h
+++ b/arch/sparc/net/bpf_jit_64.h
@@ -1,24 +1,13 @@
 #ifndef _BPF_JIT_H
 #define _BPF_JIT_H
 
-/* Conventions:
- *  %g1 : temporary
- *  %g2 : Secondary temporary used by SKB data helper stubs.
- *  %g3 : packet offset passed into SKB data helper stubs.
- *  %o0 : pointer to skb (first argument given to JIT function)
- *  %o1 : BPF A accumulator
- *  %o2 : BPF X accumulator
- *  %o3 : Holds saved %o7 so we can call helper functions without needing
- *        to allocate a register window.
- *  %o4 : skb->len - skb->data_len
- *  %o5 : skb->data
- */
-
 #ifndef __ASSEMBLER__
 #define G0		0x00
 #define G1		0x01
+#define G2		0x02
 #define G3		0x03
 #define G6		0x06
+#define G7		0x07
 #define O0		0x08
 #define O1		0x09
 #define O2		0x0a
@@ -27,19 +16,30 @@
 #define O5		0x0d
 #define SP		0x0e
 #define O7		0x0f
+#define L0		0x10
+#define L1		0x11
+#define L2		0x12
+#define L3		0x13
+#define L4		0x14
+#define L5		0x15
+#define L6		0x16
+#define L7		0x17
+#define I0		0x18
+#define I1		0x19
+#define I2		0x1a
+#define I3		0x1b
+#define I4		0x1c
+#define I5		0x1d
 #define FP		0x1e
+#define I7		0x1f
 
-#define r_SKB		O0
-#define r_A		O1
-#define r_X		O2
-#define r_saved_O7	O3
-#define r_HEADLEN	O4
-#define r_SKB_DATA	O5
+#define r_SKB		L0
+#define r_HEADLEN	L4
+#define r_SKB_DATA	L5
 #define r_TMP		G1
-#define r_TMP2		G2
-#define r_OFF		G3
+#define r_TMP2		G3
 
-/* assembly code in arch/sparc/net/bpf_jit_asm.S */
+/* assembly code in arch/sparc/net/bpf_jit_asm_64.S */
 extern u32 bpf_jit_load_word[];
 extern u32 bpf_jit_load_half[];
 extern u32 bpf_jit_load_byte[];
@@ -54,15 +54,13 @@ extern u32 bpf_jit_load_byte_negative_offset[];
 extern u32 bpf_jit_load_byte_msh_negative_offset[];
 
 #else
+#define r_RESULT	%o0
 #define r_SKB		%o0
-#define r_A		%o1
-#define r_X		%o2
-#define r_saved_O7	%o3
-#define r_HEADLEN	%o4
-#define r_SKB_DATA	%o5
+#define r_OFF		%o1
+#define r_HEADLEN	%l4
+#define r_SKB_DATA	%l5
 #define r_TMP		%g1
-#define r_TMP2		%g2
-#define r_OFF		%g3
+#define r_TMP2		%g3
 #endif
 
 #endif /* _BPF_JIT_H */
diff --git a/arch/sparc/net/bpf_jit_asm_32.S b/arch/sparc/net/bpf_jit_asm_32.S
index 5632cdc..dcc402f 100644
--- a/arch/sparc/net/bpf_jit_asm_32.S
+++ b/arch/sparc/net/bpf_jit_asm_32.S
@@ -2,17 +2,10 @@
 
 #include "bpf_jit_32.h"
 
-#ifdef CONFIG_SPARC64
-#define SAVE_SZ		176
-#define SCRATCH_OFF	STACK_BIAS + 128
-#define BE_PTR(label)	be,pn %xcc, label
-#define SIGN_EXTEND(reg)	sra reg, 0, reg
-#else
 #define SAVE_SZ		96
 #define SCRATCH_OFF	72
 #define BE_PTR(label)	be label
 #define SIGN_EXTEND(reg)
-#endif
 
 #define SKF_MAX_NEG_OFF	(-0x200000) /* SKF_LL_OFF from filter.h */
 
diff --git a/arch/sparc/net/bpf_jit_asm_64.S b/arch/sparc/net/bpf_jit_asm_64.S
index 6fb023f..3b3f146 100644
--- a/arch/sparc/net/bpf_jit_asm_64.S
+++ b/arch/sparc/net/bpf_jit_asm_64.S
@@ -1 +1,161 @@
-#include "bpf_jit_asm_32.S"
+#include <asm/ptrace.h>
+
+#include "bpf_jit_64.h"
+
+#define SAVE_SZ		176
+#define SCRATCH_OFF	STACK_BIAS + 128
+#define BE_PTR(label)	be,pn %xcc, label
+#define SIGN_EXTEND(reg)	sra reg, 0, reg
+
+#define SKF_MAX_NEG_OFF	(-0x200000) /* SKF_LL_OFF from filter.h */
+
+	.text
+	.globl	bpf_jit_load_word
+bpf_jit_load_word:
+	cmp	r_OFF, 0
+	bl	bpf_slow_path_word_neg
+	 nop
+	.globl	bpf_jit_load_word_positive_offset
+bpf_jit_load_word_positive_offset:
+	sub	r_HEADLEN, r_OFF, r_TMP
+	cmp	r_TMP, 3
+	ble	bpf_slow_path_word
+	 add	r_SKB_DATA, r_OFF, r_TMP
+	andcc	r_TMP, 3, %g0
+	bne	load_word_unaligned
+	 nop
+	retl
+	 ld	[r_TMP], r_RESULT
+load_word_unaligned:
+	ldub	[r_TMP + 0x0], r_OFF
+	ldub	[r_TMP + 0x1], r_TMP2
+	sll	r_OFF, 8, r_OFF
+	or	r_OFF, r_TMP2, r_OFF
+	ldub	[r_TMP + 0x2], r_TMP2
+	sll	r_OFF, 8, r_OFF
+	or	r_OFF, r_TMP2, r_OFF
+	ldub	[r_TMP + 0x3], r_TMP2
+	sll	r_OFF, 8, r_OFF
+	retl
+	 or	r_OFF, r_TMP2, r_RESULT
+
+	.globl	bpf_jit_load_half
+bpf_jit_load_half:
+	cmp	r_OFF, 0
+	bl	bpf_slow_path_half_neg
+	 nop
+	.globl	bpf_jit_load_half_positive_offset
+bpf_jit_load_half_positive_offset:
+	sub	r_HEADLEN, r_OFF, r_TMP
+	cmp	r_TMP, 1
+	ble	bpf_slow_path_half
+	 add	r_SKB_DATA, r_OFF, r_TMP
+	andcc	r_TMP, 1, %g0
+	bne	load_half_unaligned
+	 nop
+	retl
+	 lduh	[r_TMP], r_RESULT
+load_half_unaligned:
+	ldub	[r_TMP + 0x0], r_OFF
+	ldub	[r_TMP + 0x1], r_TMP2
+	sll	r_OFF, 8, r_OFF
+	retl
+	 or	r_OFF, r_TMP2, r_RESULT
+
+	.globl	bpf_jit_load_byte
+bpf_jit_load_byte:
+	cmp	r_OFF, 0
+	bl	bpf_slow_path_byte_neg
+	 nop
+	.globl	bpf_jit_load_byte_positive_offset
+bpf_jit_load_byte_positive_offset:
+	cmp	r_OFF, r_HEADLEN
+	bge	bpf_slow_path_byte
+	 nop
+	retl
+	 ldub	[r_SKB_DATA + r_OFF], r_RESULT
+
+#define bpf_slow_path_common(LEN)	\
+	save	%sp, -SAVE_SZ, %sp;	\
+	mov	%i0, %o0;		\
+	mov	%i1, %o1;		\
+	add	%fp, SCRATCH_OFF, %o2;	\
+	call	skb_copy_bits;		\
+	 mov	(LEN), %o3;		\
+	cmp	%o0, 0;			\
+	restore;
+
+bpf_slow_path_word:
+	bpf_slow_path_common(4)
+	bl	bpf_error
+	 ld	[%sp + SCRATCH_OFF], r_RESULT
+	retl
+	 nop
+bpf_slow_path_half:
+	bpf_slow_path_common(2)
+	bl	bpf_error
+	 lduh	[%sp + SCRATCH_OFF], r_RESULT
+	retl
+	 nop
+bpf_slow_path_byte:
+	bpf_slow_path_common(1)
+	bl	bpf_error
+	 ldub	[%sp + SCRATCH_OFF], r_RESULT
+	retl
+	 nop
+
+#define bpf_negative_common(LEN)			\
+	save	%sp, -SAVE_SZ, %sp;			\
+	mov	%i0, %o0;				\
+	mov	%i1, %o1;				\
+	SIGN_EXTEND(%o1);				\
+	call	bpf_internal_load_pointer_neg_helper;	\
+	 mov	(LEN), %o2;				\
+	mov	%o0, r_TMP;				\
+	cmp	%o0, 0;					\
+	BE_PTR(bpf_error);				\
+	 restore;
+
+bpf_slow_path_word_neg:
+	sethi	%hi(SKF_MAX_NEG_OFF), r_TMP
+	cmp	r_OFF, r_TMP
+	bl	bpf_error
+	 nop
+	.globl	bpf_jit_load_word_negative_offset
+bpf_jit_load_word_negative_offset:
+	bpf_negative_common(4)
+	andcc	r_TMP, 3, %g0
+	bne	load_word_unaligned
+	 nop
+	retl
+	 ld	[r_TMP], r_RESULT
+
+bpf_slow_path_half_neg:
+	sethi	%hi(SKF_MAX_NEG_OFF), r_TMP
+	cmp	r_OFF, r_TMP
+	bl	bpf_error
+	 nop
+	.globl	bpf_jit_load_half_negative_offset
+bpf_jit_load_half_negative_offset:
+	bpf_negative_common(2)
+	andcc	r_TMP, 1, %g0
+	bne	load_half_unaligned
+	 nop
+	retl
+	 lduh	[r_TMP], r_RESULT
+
+bpf_slow_path_byte_neg:
+	sethi	%hi(SKF_MAX_NEG_OFF), r_TMP
+	cmp	r_OFF, r_TMP
+	bl	bpf_error
+	 nop
+	.globl	bpf_jit_load_byte_negative_offset
+bpf_jit_load_byte_negative_offset:
+	bpf_negative_common(1)
+	retl
+	 ldub	[r_TMP], r_RESULT
+
+bpf_error:
+	/* Make the JIT program itself return zero. */
+	ret
+	restore	%g0, %g0, %o0
diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
index 83fc41d..d193748 100644
--- a/arch/sparc/net/bpf_jit_comp_32.c
+++ b/arch/sparc/net/bpf_jit_comp_32.c
@@ -17,24 +17,6 @@ static inline bool is_simm13(unsigned int value)
 	return value + 0x1000 < 0x2000;
 }
 
-static void bpf_flush_icache(void *start_, void *end_)
-{
-#ifdef CONFIG_SPARC64
-	/* Cheetah's I-cache is fully coherent.  */
-	if (tlb_type == spitfire) {
-		unsigned long start = (unsigned long) start_;
-		unsigned long end = (unsigned long) end_;
-
-		start &= ~7UL;
-		end = (end + 7UL) & ~7UL;
-		while (start < end) {
-			flushi(start);
-			start += 32;
-		}
-	}
-#endif
-}
-
 #define SEEN_DATAREF 1 /* might call external helpers */
 #define SEEN_XREG    2 /* ebx is used */
 #define SEEN_MEM     4 /* use mem[] for temporary storage */
@@ -82,11 +64,7 @@ static void bpf_flush_icache(void *start_, void *end_)
 #define BE		(F2(0, 2) | CONDE)
 #define BNE		(F2(0, 2) | CONDNE)
 
-#ifdef CONFIG_SPARC64
-#define BE_PTR		(F2(0, 1) | CONDE | (2 << 20))
-#else
 #define BE_PTR		BE
-#endif
 
 #define SETHI(K, REG)	\
 	(F2(0, 0x4) | RD(REG) | (((K) >> 10) & 0x3fffff))
@@ -116,13 +94,8 @@ static void bpf_flush_icache(void *start_, void *end_)
 #define LD64		F3(3, 0x0b)
 #define ST32		F3(3, 0x04)
 
-#ifdef CONFIG_SPARC64
-#define LDPTR		LD64
-#define BASE_STACKFRAME	176
-#else
 #define LDPTR		LD32
 #define BASE_STACKFRAME	96
-#endif
 
 #define LD32I		(LD32 | IMMED)
 #define LD8I		(LD8 | IMMED)
@@ -234,11 +207,7 @@ do {	BUILD_BUG_ON(FIELD_SIZEOF(STRUCT, FIELD) != sizeof(u8));	\
 	__emit_load8(BASE, STRUCT, FIELD, DEST);			\
 } while (0)
 
-#ifdef CONFIG_SPARC64
-#define BIAS (STACK_BIAS - 4)
-#else
 #define BIAS (-4)
-#endif
 
 #define emit_ldmem(OFF, DEST)						\
 do {	*prog++ = LD32I | RS1(SP) | S13(BIAS - (OFF)) | RD(DEST);	\
@@ -249,13 +218,8 @@ do {	*prog++ = ST32I | RS1(SP) | S13(BIAS - (OFF)) | RD(SRC);	\
 } while (0)
 
 #ifdef CONFIG_SMP
-#ifdef CONFIG_SPARC64
-#define emit_load_cpu(REG)						\
-	emit_load16(G6, struct thread_info, cpu, REG)
-#else
 #define emit_load_cpu(REG)						\
 	emit_load32(G6, struct thread_info, cpu, REG)
-#endif
 #else
 #define emit_load_cpu(REG)	emit_clear(REG)
 #endif
@@ -486,7 +450,6 @@ void bpf_jit_compile(struct bpf_prog *fp)
 				if (K == 1)
 					break;
 				emit_write_y(G0);
-#ifdef CONFIG_SPARC32
 				/* The Sparc v8 architecture requires
 				 * three instructions between a %y
 				 * register write and the first use.
@@ -494,31 +457,21 @@ void bpf_jit_compile(struct bpf_prog *fp)
 				emit_nop();
 				emit_nop();
 				emit_nop();
-#endif
 				emit_alu_K(DIV, K);
 				break;
 			case BPF_ALU | BPF_DIV | BPF_X:	/* A /= X; */
 				emit_cmpi(r_X, 0);
 				if (pc_ret0 > 0) {
 					t_offset = addrs[pc_ret0 - 1];
-#ifdef CONFIG_SPARC32
 					emit_branch(BE, t_offset + 20);
-#else
-					emit_branch(BE, t_offset + 8);
-#endif
 					emit_nop(); /* delay slot */
 				} else {
 					emit_branch_off(BNE, 16);
 					emit_nop();
-#ifdef CONFIG_SPARC32
 					emit_jump(cleanup_addr + 20);
-#else
-					emit_jump(cleanup_addr + 8);
-#endif
 					emit_clear(r_A);
 				}
 				emit_write_y(G0);
-#ifdef CONFIG_SPARC32
 				/* The Sparc v8 architecture requires
 				 * three instructions between a %y
 				 * register write and the first use.
@@ -526,7 +479,6 @@ void bpf_jit_compile(struct bpf_prog *fp)
 				emit_nop();
 				emit_nop();
 				emit_nop();
-#endif
 				emit_alu_X(DIV);
 				break;
 			case BPF_ALU | BPF_NEG:
@@ -797,7 +749,6 @@ cond_branch:			f_offset = addrs[i + filter[i].jf];
 		bpf_jit_dump(flen, proglen, pass + 1, image);
 
 	if (image) {
-		bpf_flush_icache(image, image + proglen);
 		fp->bpf_func = (void *)image;
 		fp->jited = 1;
 	}
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 49b5f65..4c59087 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1 +1,1193 @@
-#include "bpf_jit_comp_32.c"
+#include <linux/moduleloader.h>
+#include <linux/workqueue.h>
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/bpf.h>
+#include <linux/cache.h>
+#include <linux/if_vlan.h>
+
+#include <asm/cacheflush.h>
+#include <asm/ptrace.h>
+
+#include "bpf_jit_64.h"
+
+int bpf_jit_enable __read_mostly;
+
+static inline bool is_simm13(unsigned int value)
+{
+	return value + 0x1000 < 0x2000;
+}
+
+static void bpf_flush_icache(void *start_, void *end_)
+{
+	/* Cheetah's I-cache is fully coherent.  */
+	if (tlb_type == spitfire) {
+		unsigned long start = (unsigned long) start_;
+		unsigned long end = (unsigned long) end_;
+
+		start &= ~7UL;
+		end = (end + 7UL) & ~7UL;
+		while (start < end) {
+			flushi(start);
+			start += 32;
+		}
+	}
+}
+
+#define SEEN_DATAREF 1 /* might call external helpers */
+#define SEEN_XREG    2 /* ebx is used */
+#define SEEN_MEM     4 /* use mem[] for temporary storage */
+
+#define S13(X)		((X) & 0x1fff)
+#define IMMED		0x00002000
+#define RD(X)		((X) << 25)
+#define RS1(X)		((X) << 14)
+#define RS2(X)		((X))
+#define OP(X)		((X) << 30)
+#define OP2(X)		((X) << 22)
+#define OP3(X)		((X) << 19)
+#define COND(X)		((X) << 25)
+#define F1(X)		OP(X)
+#define F2(X, Y)	(OP(X) | OP2(Y))
+#define F3(X, Y)	(OP(X) | OP3(Y))
+#define ASI(X)		(((X) & 0xff) << 5)
+
+#define CONDN		COND(0x0)
+#define CONDE		COND(0x1)
+#define CONDLE		COND(0x2)
+#define CONDL		COND(0x3)
+#define CONDLEU		COND(0x4)
+#define CONDCS		COND(0x5)
+#define CONDNEG		COND(0x6)
+#define CONDVC		COND(0x7)
+#define CONDA		COND(0x8)
+#define CONDNE		COND(0x9)
+#define CONDG		COND(0xa)
+#define CONDGE		COND(0xb)
+#define CONDGU		COND(0xc)
+#define CONDCC		COND(0xd)
+#define CONDPOS		COND(0xe)
+#define CONDVS		COND(0xf)
+
+#define CONDGEU		CONDCC
+#define CONDLU		CONDCS
+
+#define WDISP22(X)	(((X) >> 2) & 0x3fffff)
+#define WDISP19(X)	(((X) >> 2) & 0x7ffff)
+
+#define ANNUL		(1 << 29)
+#define XCC		(1 << 21)
+
+#define BRANCH		(F2(0, 1) | XCC)
+
+#define BA		(BRANCH | CONDA)
+#define BG		(BRANCH | CONDG)
+#define BGU		(BRANCH | CONDGU)
+#define BLEU		(BRANCH | CONDLEU)
+#define BGE		(BRANCH | CONDGE)
+#define BGEU		(BRANCH | CONDGEU)
+#define BLU		(BRANCH | CONDLU)
+#define BE		(BRANCH | CONDE)
+#define BNE		(BRANCH | CONDNE)
+
+#define SETHI(K, REG)	\
+	(F2(0, 0x4) | RD(REG) | (((K) >> 10) & 0x3fffff))
+#define OR_LO(K, REG)	\
+	(F3(2, 0x02) | IMMED | RS1(REG) | ((K) & 0x3ff) | RD(REG))
+
+#define ADD		F3(2, 0x00)
+#define AND		F3(2, 0x01)
+#define ANDCC		F3(2, 0x11)
+#define OR		F3(2, 0x02)
+#define XOR		F3(2, 0x03)
+#define SUB		F3(2, 0x04)
+#define SUBCC		F3(2, 0x14)
+#define MUL		F3(2, 0x0a)
+#define MULX		F3(2, 0x09)
+#define UDIVX		F3(2, 0x0d)
+#define DIV		F3(2, 0x0e)
+#define SLL		F3(2, 0x25)
+#define SLLX		(F3(2, 0x25)|(1<<12))
+#define SRA		F3(2, 0x27)
+#define SRAX		(F3(2, 0x27)|(1<<12))
+#define SRL		F3(2, 0x26)
+#define SRLX		(F3(2, 0x26)|(1<<12))
+#define JMPL		F3(2, 0x38)
+#define SAVE		F3(2, 0x3c)
+#define RESTORE		F3(2, 0x3d)
+#define CALL		F1(1)
+#define BR		F2(0, 0x01)
+#define RD_Y		F3(2, 0x28)
+#define WR_Y		F3(2, 0x30)
+
+#define LD32		F3(3, 0x00)
+#define LD8		F3(3, 0x01)
+#define LD16		F3(3, 0x02)
+#define LD64		F3(3, 0x0b)
+#define LD64A		F3(3, 0x1b)
+#define ST8		F3(3, 0x05)
+#define ST16		F3(3, 0x06)
+#define ST32		F3(3, 0x04)
+#define ST64		F3(3, 0x0e)
+
+#define CAS		F3(3, 0x3c)
+#define CASX		F3(3, 0x3e)
+
+#define LDPTR		LD64
+#define BASE_STACKFRAME	176
+
+#define LD32I		(LD32 | IMMED)
+#define LD8I		(LD8 | IMMED)
+#define LD16I		(LD16 | IMMED)
+#define LD64I		(LD64 | IMMED)
+#define LDPTRI		(LDPTR | IMMED)
+#define ST32I		(ST32 | IMMED)
+
+struct jit_ctx {
+	struct bpf_prog		*prog;
+	unsigned int		*offset;
+	int			idx;
+	int			epilogue_offset;
+	bool 			tmp_1_used;
+	bool 			tmp_2_used;
+	bool 			tmp_3_used;
+	bool			saw_ld_abs_ind;
+	bool			saw_frame_pointer;
+	bool			saw_call;
+	bool			saw_tail_call;
+	u32			*image;
+};
+
+#define TMP_REG_1	(MAX_BPF_JIT_REG + 0)
+#define TMP_REG_2	(MAX_BPF_JIT_REG + 1)
+#define SKB_HLEN_REG	(MAX_BPF_JIT_REG + 2)
+#define SKB_DATA_REG	(MAX_BPF_JIT_REG + 3)
+#define TMP_REG_3	(MAX_BPF_JIT_REG + 4)
+
+/* Map BPF registers to SPARC registers */
+static const int bpf2sparc[] = {
+	/* return value from in-kernel function, and exit value from eBPF */
+	[BPF_REG_0] = O5,
+
+	/* arguments from eBPF program to in-kernel function */
+	[BPF_REG_1] = O0,
+	[BPF_REG_2] = O1,
+	[BPF_REG_3] = O2,
+	[BPF_REG_4] = O3,
+	[BPF_REG_5] = O4,
+
+	/* callee saved registers that in-kernel function will preserve */
+	[BPF_REG_6] = L0,
+	[BPF_REG_7] = L1,
+	[BPF_REG_8] = L2,
+	[BPF_REG_9] = L3,
+
+	/* read-only frame pointer to access stack */
+	[BPF_REG_FP] = L6,
+
+	[BPF_REG_AX] = G7,
+
+	/* temporary register for internal BPF JIT */
+	[TMP_REG_1] = G1,
+	[TMP_REG_2] = G2,
+	[TMP_REG_3] = G3,
+
+	[SKB_HLEN_REG] = L4,
+	[SKB_DATA_REG] = L5,
+};
+
+static void emit(const u32 insn, struct jit_ctx *ctx)
+{
+	if (ctx->image != NULL)
+		ctx->image[ctx->idx] = insn;
+
+	ctx->idx++;
+}
+
+static void emit_call(u32 *func, struct jit_ctx *ctx)
+{
+	if (ctx->image != NULL) {
+		void *here = &ctx->image[ctx->idx];
+		unsigned int off;
+
+		off = (void *)func - here;
+		ctx->image[ctx->idx] = CALL | ((off >> 2) & 0x3fffffff);
+	}
+	ctx->idx++;
+}
+
+static void emit_nop(struct jit_ctx *ctx)
+{
+	emit(SETHI(0, G0), ctx);
+}
+
+static void emit_reg_move(u32 from, u32 to, struct jit_ctx *ctx)
+{
+	emit(OR | RS1(G0) | RS2(from) | RD(to), ctx);
+}
+
+/* Emit 32-bit constant, zero extended. */
+static void emit_set_const(s32 K, u32 reg, struct jit_ctx *ctx)
+{
+	emit(SETHI(K, reg), ctx);
+	emit(OR_LO(K, reg), ctx);
+}
+
+/* Emit 32-bit constant, sign extended. */
+static void emit_set_const_sext(s32 K, u32 reg, struct jit_ctx *ctx)
+{
+	if (K >= 0) {
+		emit(SETHI(K, reg), ctx);
+		emit(OR_LO(K, reg), ctx);
+	} else {
+		u32 hbits = ~(u32) K;
+		u32 lbits = -0x400 | (u32) K;
+
+		emit(SETHI(hbits, reg), ctx);
+		emit(XOR | IMMED | RS1(reg) | S13(lbits) | RD(reg), ctx);
+	}
+}
+
+static void emit_alu(u32 opcode, u32 src, u32 dst, struct jit_ctx *ctx)
+{
+	emit(opcode | RS1(dst) | RS2(src) | RD(dst), ctx);
+}
+
+static void emit_alu3(u32 opcode, u32 a, u32 b, u32 c, struct jit_ctx *ctx)
+{
+	emit(opcode | RS1(a) | RS2(b) | RD(c), ctx);
+}
+
+static void emit_alu_K(unsigned int opcode, unsigned int dst, unsigned int imm,
+		       struct jit_ctx *ctx)
+{
+	bool small_immed = is_simm13(imm);
+	unsigned int insn = opcode;
+
+	insn |= RS1(dst) | RD(dst);
+	if (small_immed) {
+		emit(insn | IMMED | S13(imm), ctx);
+	} else {
+		unsigned int tmp = bpf2sparc[TMP_REG_1];
+
+		ctx->tmp_1_used = true;
+
+		emit_set_const_sext(imm, tmp, ctx);
+		emit(insn | RS2(tmp), ctx);
+	}
+}
+
+static void emit_alu3_K(unsigned int opcode, unsigned int src, unsigned int imm,
+			unsigned int dst, struct jit_ctx *ctx)
+{
+	bool small_immed = is_simm13(imm);
+	unsigned int insn = opcode;
+
+	insn |= RS1(src) | RD(dst);
+	if (small_immed) {
+		emit(insn | IMMED | S13(imm), ctx);
+	} else {
+		unsigned int tmp = bpf2sparc[TMP_REG_1];
+
+		ctx->tmp_1_used = true;
+
+		emit_set_const_sext(imm, tmp, ctx);
+		emit(insn | RS2(tmp), ctx);
+	}
+}
+
+static void emit_loadimm32(s32 K, unsigned int dest, struct jit_ctx *ctx)
+{
+	if (K >= 0 && is_simm13(K)) {
+		/* or %g0, K, DEST */
+		emit(OR | IMMED | RS1(G0) | S13(K) | RD(dest), ctx);
+	} else {
+		emit_set_const(K, dest, ctx);
+	}
+}
+
+static void emit_loadimm(s32 K, unsigned int dest, struct jit_ctx *ctx)
+{
+	if (is_simm13(K)) {
+		/* or %g0, K, DEST */
+		emit(OR | IMMED | RS1(G0) | S13(K) | RD(dest), ctx);
+	} else {
+		emit_set_const(K, dest, ctx);
+	}
+}
+
+static void emit_loadimm_sext(s32 K, unsigned int dest, struct jit_ctx *ctx)
+{
+	if (is_simm13(K)) {
+		/* or %g0, K, DEST */
+		emit(OR | IMMED | RS1(G0) | S13(K) | RD(dest), ctx);
+	} else {
+		emit_set_const_sext(K, dest, ctx);
+	}
+}
+
+static void emit_loadimm64(u64 K, unsigned int dest, struct jit_ctx *ctx)
+{
+	unsigned int tmp = bpf2sparc[TMP_REG_1];
+	u32 high_part = (K >> 32);
+	u32 low_part = (K & 0xffffffff);
+
+	ctx->tmp_1_used = true;
+
+	emit_set_const(high_part, tmp, ctx);
+	emit_set_const(low_part, dest, ctx);
+	emit_alu_K(SLLX, tmp, 32, ctx);
+	emit(OR | RS1(dest) | RS2(tmp) | RD(dest), ctx);
+}
+
+static void emit_branch(unsigned int br_opc, unsigned int from_idx, unsigned int to_idx,
+			struct jit_ctx *ctx)
+{
+	unsigned int off = to_idx - from_idx;
+
+	if (br_opc & XCC)
+		emit(br_opc | WDISP19(off << 2), ctx);
+	else
+		emit(br_opc | WDISP22(off << 2), ctx);
+}
+
+#define emit_read_y(REG, CTX)	emit(RD_Y | RD(REG), CTX)
+#define emit_write_y(REG, CTX)	emit(WR_Y | IMMED | RS1(REG) | S13(0), CTX)
+
+#define emit_cmp(R1, R2, CTX)				\
+	emit(SUBCC | RS1(R1) | RS2(R2) | RD(G0), CTX)
+
+#define emit_cmpi(R1, IMM, CTX)				\
+	emit(SUBCC | IMMED | RS1(R1) | S13(IMM) | RD(G0), CTX);
+
+#define emit_btst(R1, R2, CTX)				\
+	emit(ANDCC | RS1(R1) | RS2(R2) | RD(G0), CTX)
+
+#define emit_btsti(R1, IMM, CTX)			\
+	emit(ANDCC | IMMED | RS1(R1) | S13(IMM) | RD(G0), CTX)
+
+static void load_skb_regs(struct jit_ctx *ctx, u8 r_skb)
+{
+	const u8 r_headlen = bpf2sparc[SKB_HLEN_REG];
+	const u8 r_data = bpf2sparc[SKB_DATA_REG];
+	const u8 r_tmp = bpf2sparc[TMP_REG_1];
+	unsigned int off;
+
+	off = offsetof(struct sk_buff, len);
+	emit(LD32I | RS1(r_skb) | S13(off) | RD(r_headlen), ctx);
+
+	off = offsetof(struct sk_buff, data_len);
+	emit(LD32I | RS1(r_skb) | S13(off) | RD(r_tmp), ctx);
+
+	emit(SUB | RS1(r_headlen) | RS2(r_tmp) | RD(r_headlen), ctx);
+
+	off = offsetof(struct sk_buff, data);
+	emit(LDPTRI | RS1(r_skb) | S13(off) | RD(r_data), ctx);
+}
+
+/* Just skip the save instruction and the ctx register move.  */
+#define BPF_TAILCALL_PROLOGUE_SKIP	16
+#define BPF_TAILCALL_CNT_SP_OFF		(STACK_BIAS + 128)
+
+static void build_prologue(struct jit_ctx *ctx)
+{
+	s32 stack_needed = BASE_STACKFRAME;
+
+	if (ctx->saw_frame_pointer || ctx->saw_tail_call)
+		stack_needed += MAX_BPF_STACK;
+
+	if (ctx->saw_tail_call)
+		stack_needed += 8;
+
+	/* save %sp, -176, %sp */
+	emit(SAVE | IMMED | RS1(SP) | S13(-stack_needed) | RD(SP), ctx);
+
+	/* tail_call_cnt = 0 */
+	if (ctx->saw_tail_call) {
+		u32 off = BPF_TAILCALL_CNT_SP_OFF;
+
+		emit(ST32 | IMMED | RS1(SP) | S13(off) | RD(G0), ctx);
+	} else {
+		emit_nop(ctx);
+	}
+	if (ctx->saw_frame_pointer) {
+		const u8 vfp = bpf2sparc[BPF_REG_FP];
+
+		emit(ADD | IMMED | RS1(FP) | S13(STACK_BIAS) | RD(vfp), ctx);
+	}
+
+	emit_reg_move(I0, O0, ctx);
+	/* If you add anything here, adjust BPF_TAILCALL_PROLOGUE_SKIP above. */
+
+	if (ctx->saw_ld_abs_ind)
+		load_skb_regs(ctx, bpf2sparc[BPF_REG_1]);
+}
+
+static void build_epilogue(struct jit_ctx *ctx)
+{
+	ctx->epilogue_offset = ctx->idx;
+
+	/* ret (jmpl %i7 + 8, %g0) */
+	emit(JMPL | IMMED | RS1(I7) | S13(8) | RD(G0), ctx);
+
+	/* restore %i5, %g0, %o0 */
+	emit(RESTORE | RS1(bpf2sparc[BPF_REG_0]) | RS2(G0) | RD(O0), ctx);
+}
+
+static void emit_tail_call(struct jit_ctx *ctx)
+{
+	const u8 bpf_array = bpf2sparc[BPF_REG_2];
+	const u8 bpf_index = bpf2sparc[BPF_REG_3];
+	const u8 tmp = bpf2sparc[TMP_REG_1];
+	u32 off;
+
+	ctx->saw_tail_call = true;
+
+	off = offsetof(struct bpf_array, map.max_entries);
+	emit(LD32 | IMMED | RS1(bpf_array) | S13(off) | RD(tmp), ctx);
+	emit_cmp(bpf_index, tmp, ctx);
+#define OFFSET1 17
+	emit_branch(BGEU, ctx->idx, ctx->idx + OFFSET1, ctx);
+	emit_nop(ctx);
+
+	off = BPF_TAILCALL_CNT_SP_OFF;
+	emit(LD32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
+	emit_cmpi(tmp, MAX_TAIL_CALL_CNT, ctx);
+#define OFFSET2 13
+	emit_branch(BGU, ctx->idx, ctx->idx + OFFSET2, ctx);
+	emit_nop(ctx);
+
+	emit_alu_K(ADD, tmp, 1, ctx);
+	off = BPF_TAILCALL_CNT_SP_OFF;
+	emit(ST32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
+
+	emit_alu3_K(SLL, bpf_index, 3, tmp, ctx);
+	emit_alu(ADD, bpf_array, tmp, ctx);
+	off = offsetof(struct bpf_array, ptrs);
+	emit(LD64 | IMMED | RS1(tmp) | S13(off) | RD(tmp), ctx);
+
+	emit_cmpi(tmp, 0, ctx);
+#define OFFSET3 5
+	emit_branch(BE, ctx->idx, ctx->idx + OFFSET3, ctx);
+	emit_nop(ctx);
+
+	off = offsetof(struct bpf_prog, bpf_func);
+	emit(LD64 | IMMED | RS1(tmp) | S13(off) | RD(tmp), ctx);
+
+	off = BPF_TAILCALL_PROLOGUE_SKIP;
+	emit(JMPL | IMMED | RS1(tmp) | S13(off) | RD(G0), ctx);
+	emit_nop(ctx);
+}
+
+static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
+{
+	const u8 code = insn->code;
+	const u8 dst = bpf2sparc[insn->dst_reg];
+	const u8 src = bpf2sparc[insn->src_reg];
+	const int i = insn - ctx->prog->insnsi;
+	const s16 off = insn->off;
+	const s32 imm = insn->imm;
+	u32 *func;
+
+	if (insn->src_reg == BPF_REG_FP)
+		ctx->saw_frame_pointer = true;
+
+	switch (code) {
+	/* dst = src */
+	case BPF_ALU | BPF_MOV | BPF_X:
+		emit_alu3_K(SRL, src, 0, dst, ctx);
+		break;
+	case BPF_ALU64 | BPF_MOV | BPF_X:
+		emit_reg_move(src, dst, ctx);
+		break;
+	/* dst = dst OP src */
+	case BPF_ALU | BPF_ADD | BPF_X:
+	case BPF_ALU64 | BPF_ADD | BPF_X:
+		emit_alu(ADD, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_SUB | BPF_X:
+	case BPF_ALU64 | BPF_SUB | BPF_X:
+		emit_alu(SUB, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_AND | BPF_X:
+	case BPF_ALU64 | BPF_AND | BPF_X:
+		emit_alu(AND, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_OR | BPF_X:
+	case BPF_ALU64 | BPF_OR | BPF_X:
+		emit_alu(OR, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_XOR | BPF_X:
+	case BPF_ALU64 | BPF_XOR | BPF_X:
+		emit_alu(XOR, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_MUL | BPF_X:
+		emit_alu(MUL, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_MUL | BPF_X:
+		emit_alu(MULX, src, dst, ctx);
+		break;
+	case BPF_ALU | BPF_DIV | BPF_X:
+		emit_cmp(src, G0, ctx);
+		emit_branch(BE|ANNUL, ctx->idx, ctx->epilogue_offset, ctx);
+		emit_loadimm(0, bpf2sparc[BPF_REG_0], ctx);
+
+		emit_write_y(G0, ctx);
+		emit_alu(DIV, src, dst, ctx);
+		break;
+
+	case BPF_ALU64 | BPF_DIV | BPF_X:
+		emit_cmp(src, G0, ctx);
+		emit_branch(BE|ANNUL, ctx->idx, ctx->epilogue_offset, ctx);
+		emit_loadimm(0, bpf2sparc[BPF_REG_0], ctx);
+
+		emit_alu(UDIVX, src, dst, ctx);
+		break;
+
+	case BPF_ALU | BPF_MOD | BPF_X: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+
+		ctx->tmp_1_used = true;
+
+		emit_cmp(src, G0, ctx);
+		emit_branch(BE|ANNUL, ctx->idx, ctx->epilogue_offset, ctx);
+		emit_loadimm(0, bpf2sparc[BPF_REG_0], ctx);
+
+		emit_write_y(G0, ctx);
+		emit_alu3(DIV, dst, src, tmp, ctx);
+		emit_alu3(MULX, tmp, src, tmp, ctx);
+		emit_alu3(SUB, dst, tmp, dst, ctx);
+		goto do_alu32_trunc;
+	}
+	case BPF_ALU64 | BPF_MOD | BPF_X: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+
+		ctx->tmp_1_used = true;
+
+		emit_cmp(src, G0, ctx);
+		emit_branch(BE|ANNUL, ctx->idx, ctx->epilogue_offset, ctx);
+		emit_loadimm(0, bpf2sparc[BPF_REG_0], ctx);
+
+		emit_alu3(UDIVX, dst, src, tmp, ctx);
+		emit_alu3(MULX, tmp, src, tmp, ctx);
+		emit_alu3(SUB, dst, tmp, dst, ctx);
+		break;
+	}
+	case BPF_ALU | BPF_LSH | BPF_X:
+		emit_alu(SLL, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_LSH | BPF_X:
+		emit_alu(SLLX, src, dst, ctx);
+		break;
+	case BPF_ALU | BPF_RSH | BPF_X:
+		emit_alu(SRL, src, dst, ctx);
+		break;
+	case BPF_ALU64 | BPF_RSH | BPF_X:
+		emit_alu(SRLX, src, dst, ctx);
+		break;
+	case BPF_ALU | BPF_ARSH | BPF_X:
+		emit_alu(SRA, src, dst, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_ARSH | BPF_X:
+		emit_alu(SRAX, src, dst, ctx);
+		break;
+
+	/* dst = -dst */
+	case BPF_ALU | BPF_NEG:
+	case BPF_ALU64 | BPF_NEG:
+		emit(SUB | RS1(0) | RS2(dst) | RD(dst), ctx);
+		goto do_alu32_trunc;
+
+	case BPF_ALU | BPF_END | BPF_FROM_BE:
+		switch (imm) {
+		case 16:
+			emit_alu_K(SLL, dst, 16, ctx);
+			emit_alu_K(SRL, dst, 16, ctx);
+			break;
+		case 32:
+			emit_alu_K(SRL, dst, 0, ctx);
+			break;
+		case 64:
+			/* nop */
+			break;
+
+		}
+		break;
+
+	/* dst = BSWAP##imm(dst) */
+	case BPF_ALU | BPF_END | BPF_FROM_LE: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+		const u8 tmp2 = bpf2sparc[TMP_REG_2];
+
+		ctx->tmp_1_used = true;
+		switch (imm) {
+		case 16:
+			emit_alu3_K(AND, dst, 0xff, tmp, ctx);
+			emit_alu3_K(SRL, dst, 8, dst, ctx);
+			emit_alu3_K(AND, dst, 0xff, dst, ctx);
+			emit_alu3_K(SLL, tmp, 8, tmp, ctx);
+			emit_alu(OR, tmp, dst, ctx);
+			break;
+
+		case 32:
+			ctx->tmp_2_used = true;
+			emit_alu3_K(SRL, dst, 24, tmp, ctx);	/* tmp  = dst >> 24 */
+			emit_alu3_K(SRL, dst, 16, tmp2, ctx);	/* tmp2 = dst >> 16 */
+			emit_alu3_K(AND, tmp2, 0xff, tmp2, ctx);/* tmp2 = tmp2 & 0xff */
+			emit_alu3_K(SLL, tmp2, 8, tmp2, ctx);	/* tmp2 = tmp2 << 8 */
+			emit_alu(OR, tmp2, tmp, ctx);		/* tmp  = tmp | tmp2 */
+			emit_alu3_K(SRL, dst, 8, tmp2, ctx);	/* tmp2 = dst >> 8 */
+			emit_alu3_K(AND, tmp2, 0xff, tmp2, ctx);/* tmp2 = tmp2 & 0xff */
+			emit_alu3_K(SLL, tmp2, 16, tmp2, ctx);	/* tmp2 = tmp2 << 16 */
+			emit_alu(OR, tmp2, tmp, ctx);		/* tmp  = tmp | tmp2 */
+			emit_alu3_K(AND, dst, 0xff, dst, ctx);	/* dst	= dst & 0xff */
+			emit_alu3_K(SLL, dst, 24, dst, ctx);	/* dst  = dst << 24 */
+			emit_alu(OR, tmp, dst, ctx);		/* dst  = dst | tmp */
+			break;
+
+		case 64:
+			emit_alu3_K(ADD, SP, STACK_BIAS + 128, tmp, ctx);
+			emit(ST64 | RS1(tmp) | RS2(G0) | RD(dst), ctx);
+			emit(LD64A | ASI(ASI_PL) | RS1(tmp) | RS2(G0) | RD(dst), ctx);
+			break;
+		}
+		break;
+	}
+	/* dst = imm */
+	case BPF_ALU | BPF_MOV | BPF_K:
+		emit_loadimm32(imm, dst, ctx);
+		break;
+	case BPF_ALU64 | BPF_MOV | BPF_K:
+		emit_loadimm_sext(imm, dst, ctx);
+		break;
+	/* dst = dst OP imm */
+	case BPF_ALU | BPF_ADD | BPF_K:
+	case BPF_ALU64 | BPF_ADD | BPF_K:
+		emit_alu_K(ADD, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_SUB | BPF_K:
+	case BPF_ALU64 | BPF_SUB | BPF_K:
+		emit_alu_K(SUB, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_AND | BPF_K:
+	case BPF_ALU64 | BPF_AND | BPF_K:
+		emit_alu_K(AND, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_OR | BPF_K:
+	case BPF_ALU64 | BPF_OR | BPF_K:
+		emit_alu_K(OR, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_XOR | BPF_K:
+	case BPF_ALU64 | BPF_XOR | BPF_K:
+		emit_alu_K(XOR, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU | BPF_MUL | BPF_K:
+		emit_alu_K(MUL, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_MUL | BPF_K:
+		emit_alu_K(MULX, dst, imm, ctx);
+		break;
+	case BPF_ALU | BPF_DIV | BPF_K:
+		if (imm == 0)
+			return -EINVAL;
+
+		emit_write_y(G0, ctx);
+		emit_alu_K(DIV, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_DIV | BPF_K:
+		if (imm == 0)
+			return -EINVAL;
+
+		emit_alu_K(UDIVX, dst, imm, ctx);
+		break;
+	case BPF_ALU64 | BPF_MOD | BPF_K:
+	case BPF_ALU | BPF_MOD | BPF_K: {
+		const u8 tmp = bpf2sparc[TMP_REG_2];
+		unsigned int div;
+
+		if (imm == 0)
+			return -EINVAL;
+
+		div = (BPF_CLASS(code) == BPF_ALU64) ? UDIVX : DIV;
+
+		ctx->tmp_2_used = true;
+
+		if (BPF_CLASS(code) != BPF_ALU64)
+			emit_write_y(G0, ctx);
+		if (is_simm13(imm)) {
+			emit(div | IMMED | RS1(dst) | S13(imm) | RD(tmp), ctx);
+			emit(MULX | IMMED | RS1(tmp) | S13(imm) | RD(tmp), ctx);
+			emit(SUB | RS1(dst) | RS2(tmp) | RD(dst), ctx);
+		} else {
+			const u8 tmp1 = bpf2sparc[TMP_REG_1];
+
+			ctx->tmp_1_used = true;
+
+			emit_set_const_sext(imm, tmp1, ctx);
+			emit(div | RS1(dst) | RS2(tmp1) | RD(tmp), ctx);
+			emit(MULX | RS1(tmp) | RS2(tmp1) | RD(tmp), ctx);
+			emit(SUB | RS1(dst) | RS2(tmp) | RD(dst), ctx);
+		}
+		goto do_alu32_trunc;
+	}
+	case BPF_ALU | BPF_LSH | BPF_K:
+		emit_alu_K(SLL, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_LSH | BPF_K:
+		emit_alu_K(SLLX, dst, imm, ctx);
+		break;
+	case BPF_ALU | BPF_RSH | BPF_K:
+		emit_alu_K(SRL, dst, imm, ctx);
+		break;
+	case BPF_ALU64 | BPF_RSH | BPF_K:
+		emit_alu_K(SRLX, dst, imm, ctx);
+		break;
+	case BPF_ALU | BPF_ARSH | BPF_K:
+		emit_alu_K(SRA, dst, imm, ctx);
+		goto do_alu32_trunc;
+	case BPF_ALU64 | BPF_ARSH | BPF_K:
+		emit_alu_K(SRAX, dst, imm, ctx);
+		break;
+
+	do_alu32_trunc:
+		if (BPF_CLASS(code) == BPF_ALU)
+			emit_alu_K(SRL, dst, 0, ctx);
+		break;
+
+	/* JUMP off */
+	case BPF_JMP | BPF_JA:
+		emit_branch(BA, ctx->idx, ctx->offset[i + off], ctx);
+		emit_nop(ctx);
+		break;
+	/* IF (dst COND src) JUMP off */
+	case BPF_JMP | BPF_JEQ | BPF_X:
+	case BPF_JMP | BPF_JGT | BPF_X:
+	case BPF_JMP | BPF_JGE | BPF_X:
+	case BPF_JMP | BPF_JNE | BPF_X:
+	case BPF_JMP | BPF_JSGT | BPF_X:
+	case BPF_JMP | BPF_JSGE | BPF_X: {
+		u32 br_opcode;
+
+		emit_cmp(dst, src, ctx);
+emit_cond_jmp:
+		switch (BPF_OP(code)) {
+		case BPF_JEQ:
+			br_opcode = BE;
+			break;
+		case BPF_JGT:
+			br_opcode = BGU;
+			break;
+		case BPF_JGE:
+			br_opcode = BGEU;
+			break;
+		case BPF_JSET:
+		case BPF_JNE:
+			br_opcode = BNE;
+			break;
+		case BPF_JSGT:
+			br_opcode = BG;
+			break;
+		case BPF_JSGE:
+			br_opcode = BGE;
+			break;
+		default:
+			/* Make sure we don't leak kernel information to the
+			 * user.
+			 */
+			return -EFAULT;
+		}
+		emit_branch(br_opcode, ctx->idx, ctx->offset[i + off], ctx);
+		emit_nop(ctx);
+		break;
+	}
+	case BPF_JMP | BPF_JSET | BPF_X:
+		emit_btst(dst, src, ctx);
+		goto emit_cond_jmp;
+	/* IF (dst COND imm) JUMP off */
+	case BPF_JMP | BPF_JEQ | BPF_K:
+	case BPF_JMP | BPF_JGT | BPF_K:
+	case BPF_JMP | BPF_JGE | BPF_K:
+	case BPF_JMP | BPF_JNE | BPF_K:
+	case BPF_JMP | BPF_JSGT | BPF_K:
+	case BPF_JMP | BPF_JSGE | BPF_K:
+		if (is_simm13(imm)) {
+			emit_cmpi(dst, imm, ctx);
+		} else {
+			ctx->tmp_1_used = true;
+			emit_loadimm_sext(imm, bpf2sparc[TMP_REG_1], ctx);
+			emit_cmp(dst, bpf2sparc[TMP_REG_1], ctx);
+		}
+		goto emit_cond_jmp;
+	case BPF_JMP | BPF_JSET | BPF_K:
+		if (is_simm13(imm)) {
+			emit_btsti(dst, imm, ctx);
+		} else {
+			ctx->tmp_1_used = true;
+			emit_loadimm_sext(imm, bpf2sparc[TMP_REG_1], ctx);
+			emit_btst(dst, bpf2sparc[TMP_REG_1], ctx);
+		}
+		goto emit_cond_jmp;
+
+	/* function call */
+	case BPF_JMP | BPF_CALL:
+	{
+		u8 *func = ((u8 *)__bpf_call_base) + imm;
+
+		ctx->saw_call = true;
+
+		emit_call((u32 *)func, ctx);
+		emit_nop(ctx);
+
+		emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
+
+		if (bpf_helper_changes_pkt_data(func) && ctx->saw_ld_abs_ind)
+			load_skb_regs(ctx, bpf2sparc[BPF_REG_6]);
+		break;
+	}
+
+	/* tail call */
+	case BPF_JMP | BPF_CALL |BPF_X:
+		emit_tail_call(ctx);
+
+	/* function return */
+	case BPF_JMP | BPF_EXIT:
+		/* Optimization: when last instruction is EXIT,
+		   simply fallthrough to epilogue. */
+		if (i == ctx->prog->len - 1)
+			break;
+		emit_branch(BA, ctx->idx, ctx->epilogue_offset, ctx);
+		emit_nop(ctx);
+		break;
+
+	/* dst = imm64 */
+	case BPF_LD | BPF_IMM | BPF_DW:
+	{
+		const struct bpf_insn insn1 = insn[1];
+		u64 imm64;
+
+		imm64 = (u64)insn1.imm << 32 | (u32)imm;
+		emit_loadimm64(imm64, dst, ctx);
+
+		return 1;
+	}
+
+	/* LDX: dst = *(size *)(src + off) */
+	case BPF_LDX | BPF_MEM | BPF_W:
+	case BPF_LDX | BPF_MEM | BPF_H:
+	case BPF_LDX | BPF_MEM | BPF_B:
+	case BPF_LDX | BPF_MEM | BPF_DW: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+		u32 opcode = 0, rs2;
+
+		ctx->tmp_1_used = true;
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			opcode = LD32;
+			break;
+		case BPF_H:
+			opcode = LD16;
+			break;
+		case BPF_B:
+			opcode = LD8;
+			break;
+		case BPF_DW:
+			opcode = LD64;
+			break;
+		}
+
+		if (is_simm13(off)) {
+			opcode |= IMMED;
+			rs2 = S13(off);
+		} else {
+			emit_loadimm(off, tmp, ctx);
+			rs2 = RS2(tmp);
+		}
+		emit(opcode | RS1(src) | rs2 | RD(dst), ctx);
+		break;
+	}
+	/* ST: *(size *)(dst + off) = imm */
+	case BPF_ST | BPF_MEM | BPF_W:
+	case BPF_ST | BPF_MEM | BPF_H:
+	case BPF_ST | BPF_MEM | BPF_B:
+	case BPF_ST | BPF_MEM | BPF_DW: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+		const u8 tmp2 = bpf2sparc[TMP_REG_2];
+		u32 opcode = 0, rs2;
+
+		ctx->tmp_2_used = true;
+		emit_loadimm(imm, tmp2, ctx);
+
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			opcode = ST32;
+			break;
+		case BPF_H:
+			opcode = ST16;
+			break;
+		case BPF_B:
+			opcode = ST8;
+			break;
+		case BPF_DW:
+			opcode = ST64;
+			break;
+		}
+
+		if (is_simm13(off)) {
+			opcode |= IMMED;
+			rs2 = S13(off);
+		} else {
+			ctx->tmp_1_used = true;
+			emit_loadimm(off, tmp, ctx);
+			rs2 = RS2(tmp);
+		}
+		emit(opcode | RS1(dst) | rs2 | RD(tmp2), ctx);
+		break;
+	}
+
+	/* STX: *(size *)(dst + off) = src */
+	case BPF_STX | BPF_MEM | BPF_W:
+	case BPF_STX | BPF_MEM | BPF_H:
+	case BPF_STX | BPF_MEM | BPF_B:
+	case BPF_STX | BPF_MEM | BPF_DW: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+		u32 opcode = 0, rs2;
+
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			opcode = ST32;
+			break;
+		case BPF_H:
+			opcode = ST16;
+			break;
+		case BPF_B:
+			opcode = ST8;
+			break;
+		case BPF_DW:
+			opcode = ST64;
+			break;
+		}
+		if (is_simm13(off)) {
+			opcode |= IMMED;
+			rs2 = S13(off);
+		} else {
+			ctx->tmp_1_used = true;
+			emit_loadimm(off, tmp, ctx);
+			rs2 = RS2(tmp);
+		}
+		emit(opcode | RS1(dst) | rs2 | RD(src), ctx);
+		break;
+	}
+
+	/* STX XADD: lock *(u32 *)(dst + off) += src */
+	case BPF_STX | BPF_XADD | BPF_W: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+		const u8 tmp2 = bpf2sparc[TMP_REG_2];
+		const u8 tmp3 = bpf2sparc[TMP_REG_3];
+
+		ctx->tmp_1_used = true;
+		ctx->tmp_2_used = true;
+		ctx->tmp_3_used = true;
+		emit_loadimm(off, tmp, ctx);
+		emit_alu3(ADD, dst, tmp, tmp, ctx);
+
+		emit(LD32 | RS1(tmp) | RS2(G0) | RD(tmp2), ctx);
+		emit_alu3(ADD, tmp2, src, tmp3, ctx);
+		emit(CAS | ASI(ASI_P) | RS1(tmp) | RS2(tmp2) | RD(tmp3), ctx);
+		emit_cmp(tmp2, tmp3, ctx);
+		emit_branch(BNE, 4, 0, ctx);
+		emit_nop(ctx);
+		break;
+	}
+	/* STX XADD: lock *(u64 *)(dst + off) += src */
+	case BPF_STX | BPF_XADD | BPF_DW: {
+		const u8 tmp = bpf2sparc[TMP_REG_1];
+		const u8 tmp2 = bpf2sparc[TMP_REG_2];
+		const u8 tmp3 = bpf2sparc[TMP_REG_3];
+
+		ctx->tmp_1_used = true;
+		ctx->tmp_2_used = true;
+		ctx->tmp_3_used = true;
+		emit_loadimm(off, tmp, ctx);
+		emit_alu3(ADD, dst, tmp, tmp, ctx);
+
+		emit(LD64 | RS1(tmp) | RS2(G0) | RD(tmp2), ctx);
+		emit_alu3(ADD, tmp2, src, tmp3, ctx);
+		emit(CASX | ASI(ASI_P) | RS1(tmp) | RS2(tmp2) | RD(tmp3), ctx);
+		emit_cmp(tmp2, tmp3, ctx);
+		emit_branch(BNE, 4, 0, ctx);
+		emit_nop(ctx);
+		break;
+	}
+#define CHOOSE_LOAD_FUNC(K, func) \
+		((int)K < 0 ? ((int)K >= SKF_LL_OFF ? func##_negative_offset : func) : func##_positive_offset)
+
+	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
+	case BPF_LD | BPF_ABS | BPF_W:
+		func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_word);
+		goto common_load;
+	case BPF_LD | BPF_ABS | BPF_H:
+		func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_half);
+		goto common_load;
+	case BPF_LD | BPF_ABS | BPF_B:
+		func = CHOOSE_LOAD_FUNC(imm, bpf_jit_load_byte);
+		goto common_load;
+	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
+	case BPF_LD | BPF_IND | BPF_W:
+		func = bpf_jit_load_word;
+		goto common_load;
+	case BPF_LD | BPF_IND | BPF_H:
+		func = bpf_jit_load_half;
+		goto common_load;
+
+	case BPF_LD | BPF_IND | BPF_B:
+		func = bpf_jit_load_byte;
+	common_load:
+		ctx->saw_ld_abs_ind = true;
+
+		emit_reg_move(bpf2sparc[BPF_REG_6], O0, ctx);
+		emit_loadimm(imm, O1, ctx);
+
+		if (BPF_MODE(code) == BPF_IND)
+			emit_alu(ADD, src, O1, ctx);
+
+		emit_call(func, ctx);
+		emit_alu_K(SRA, O1, 0, ctx);
+
+		emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx);
+		break;
+
+	default:
+		pr_err_once("unknown opcode %02x\n", code);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int build_body(struct jit_ctx *ctx)
+{
+	const struct bpf_prog *prog = ctx->prog;
+	int i;
+
+	for (i = 0; i < prog->len; i++) {
+		const struct bpf_insn *insn = &prog->insnsi[i];
+		int ret;
+
+		ret = build_insn(insn, ctx);
+		ctx->offset[i] = ctx->idx;
+
+		if (ret > 0) {
+			i++;
+			continue;
+		}
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static void jit_fill_hole(void *area, unsigned int size)
+{
+	u32 *ptr;
+	/* We are guaranteed to have aligned memory. */
+	for (ptr = area; size >= sizeof(u32); size -= sizeof(u32))
+		*ptr++ = 0x91d02005; /* ta 5 */
+}
+
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
+{
+	struct bpf_prog *tmp, *orig_prog = prog;
+	struct bpf_binary_header *header;
+	bool tmp_blinded = false;
+	struct jit_ctx ctx;
+	u32 image_size;
+	u8 *image_ptr;
+	int pass;
+
+	if (!bpf_jit_enable)
+		return orig_prog;
+
+	tmp = bpf_jit_blind_constants(prog);
+	/* If blinding was requested and we failed during blinding,
+	 * we must fall back to the interpreter.
+	 */
+	if (IS_ERR(tmp))
+		return orig_prog;
+	if (tmp != prog) {
+		tmp_blinded = true;
+		prog = tmp;
+	}
+
+	memset(&ctx, 0, sizeof(ctx));
+	ctx.prog = prog;
+
+	ctx.offset = kcalloc(prog->len, sizeof(unsigned int), GFP_KERNEL);
+	if (ctx.offset == NULL) {
+		prog = orig_prog;
+		goto out;
+	}
+
+	/* Fake pass to detect features used, and get an accurate assessment
+	 * of what the final image size will be.
+	 */
+	if (build_body(&ctx)) {
+		prog = orig_prog;
+		goto out_off;
+	}
+	build_prologue(&ctx);
+	build_epilogue(&ctx);
+
+	/* Now we know the actual image size. */
+	image_size = sizeof(u32) * ctx.idx;
+	header = bpf_jit_binary_alloc(image_size, &image_ptr,
+				      sizeof(u32), jit_fill_hole);
+	if (header == NULL) {
+		prog = orig_prog;
+		goto out_off;
+	}
+
+	ctx.image = (u32 *)image_ptr;
+
+	for (pass = 1; pass < 3; pass++) {
+		ctx.idx = 0;
+
+		build_prologue(&ctx);
+
+		if (build_body(&ctx)) {
+			bpf_jit_binary_free(header);
+			prog = orig_prog;
+			goto out_off;
+		}
+
+		build_epilogue(&ctx);
+
+		if (bpf_jit_enable > 1)
+			pr_info("Pass %d: shrink = %d, seen = [%c%c%c%c%c%c%c]\n", pass,
+				image_size - (ctx.idx * 4),
+				ctx.tmp_1_used ? '1' : ' ',
+				ctx.tmp_2_used ? '2' : ' ',
+				ctx.tmp_3_used ? '3' : ' ',
+				ctx.saw_ld_abs_ind ? 'L' : ' ',
+				ctx.saw_frame_pointer ? 'F' : ' ',
+				ctx.saw_call ? 'C' : ' ',
+				ctx.saw_tail_call ? 'T' : ' ');
+	}
+
+	if (bpf_jit_enable > 1)
+		bpf_jit_dump(prog->len, image_size, pass, ctx.image);
+
+	bpf_flush_icache(header, (u8 *)header + (header->pages * PAGE_SIZE));
+
+	bpf_jit_binary_lock_ro(header);
+
+	prog->bpf_func = (void *)ctx.image;
+	prog->jited = 1;
+
+out_off:
+	kfree(ctx.offset);
+out:
+	if (tmp_blinded)
+		bpf_jit_prog_release_other(prog, prog == orig_prog ?
+					   tmp : orig_prog);
+	return prog;
+}
-- 
2.1.2.532.g19b5d50



* [PATCH 1/2] sparc: Split BPF JIT into 32-bit and 64-bit.
From: David Miller @ 2017-04-22  3:16 UTC (permalink / raw)
  To: sparclinux; +Cc: netdev, ast, daniel


This is in preparation for adding the 64-bit eBPF JIT.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/net/Makefile                              | 2 +-
 arch/sparc/net/{bpf_jit.h => bpf_jit_32.h}           | 0
 arch/sparc/net/{bpf_jit_asm.S => bpf_jit_asm_32.S}   | 2 +-
 arch/sparc/net/bpf_jit_asm_64.S                      | 1 +
 arch/sparc/net/{bpf_jit_comp.c => bpf_jit_comp_32.c} | 2 +-
 arch/sparc/net/bpf_jit_comp_64.c                     | 1 +
 6 files changed, 5 insertions(+), 3 deletions(-)
 rename arch/sparc/net/{bpf_jit.h => bpf_jit_32.h} (100%)
 rename arch/sparc/net/{bpf_jit_asm.S => bpf_jit_asm_32.S} (99%)
 create mode 100644 arch/sparc/net/bpf_jit_asm_64.S
 rename arch/sparc/net/{bpf_jit_comp.c => bpf_jit_comp_32.c} (99%)
 create mode 100644 arch/sparc/net/bpf_jit_comp_64.c

diff --git a/arch/sparc/net/Makefile b/arch/sparc/net/Makefile
index 1306a58..76fa8e9 100644
--- a/arch/sparc/net/Makefile
+++ b/arch/sparc/net/Makefile
@@ -1,4 +1,4 @@
 #
 # Arch-specific network modules
 #
-obj-$(CONFIG_BPF_JIT) += bpf_jit_asm.o bpf_jit_comp.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_asm_$(BITS).o bpf_jit_comp_$(BITS).o
diff --git a/arch/sparc/net/bpf_jit.h b/arch/sparc/net/bpf_jit_32.h
similarity index 100%
rename from arch/sparc/net/bpf_jit.h
rename to arch/sparc/net/bpf_jit_32.h
diff --git a/arch/sparc/net/bpf_jit_asm.S b/arch/sparc/net/bpf_jit_asm_32.S
similarity index 99%
rename from arch/sparc/net/bpf_jit_asm.S
rename to arch/sparc/net/bpf_jit_asm_32.S
index 8c83f4b..5632cdc 100644
--- a/arch/sparc/net/bpf_jit_asm.S
+++ b/arch/sparc/net/bpf_jit_asm_32.S
@@ -1,6 +1,6 @@
 #include <asm/ptrace.h>
 
-#include "bpf_jit.h"
+#include "bpf_jit_32.h"
 
 #ifdef CONFIG_SPARC64
 #define SAVE_SZ		176
diff --git a/arch/sparc/net/bpf_jit_asm_64.S b/arch/sparc/net/bpf_jit_asm_64.S
new file mode 100644
index 0000000..6fb023f
--- /dev/null
+++ b/arch/sparc/net/bpf_jit_asm_64.S
@@ -0,0 +1 @@
+#include "bpf_jit_asm_32.S"
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp_32.c
similarity index 99%
rename from arch/sparc/net/bpf_jit_comp.c
rename to arch/sparc/net/bpf_jit_comp_32.c
index a6d9204..83fc41d 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp_32.c
@@ -8,7 +8,7 @@
 #include <asm/cacheflush.h>
 #include <asm/ptrace.h>
 
-#include "bpf_jit.h"
+#include "bpf_jit_32.h"
 
 int bpf_jit_enable __read_mostly;
 
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
new file mode 100644
index 0000000..49b5f65
--- /dev/null
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -0,0 +1 @@
+#include "bpf_jit_comp_32.c"
-- 
2.1.2.532.g19b5d50



* [PATCH v2 net-next 0/2] sparc64 eBPF JIT
From: David Miller @ 2017-04-22  3:16 UTC (permalink / raw)
  To: sparclinux; +Cc: netdev, ast, daniel


This series adds an eBPF JIT for sparc64.

Sparc32 keeps its cBPF JIT after these changes.

Thanks to Daniel and Alexei for their invaluable feedback and review.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

since v1:
 -- add tail call support
 -- fix MOD generation to not emit the same constant load several times
 -- flush icache over entire buffer properly
 -- do STACK_BIAS adjustment of FP once in prologue
 -- remove useless sanity checks
 -- generate patches with -M -C
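SPARC has no integer remainder instruction, so a JIT has to synthesize BPF_MOD from a divide, a multiply and a subtract. A minimal sketch of that sequence, modeled in plain C with the constant divisor loaded into a temporary once and reused for both the divide and the multiply; `bpf_mod_k()` is a hypothetical name and this is not code from the sparc64 JIT:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (not the sparc64 JIT source) of the sequence a JIT
 * can emit for "dst %= imm" on a CPU without a remainder instruction:
 *   tmp  = imm         load the constant once
 *   quot = dst / tmp   unsigned divide
 *   quot = quot * tmp  multiply back
 *   dst  = dst - quot  what is left is the remainder
 * Operands are treated as unsigned 64-bit values. */
static uint64_t bpf_mod_k(uint64_t dst, uint32_t imm)
{
	uint64_t tmp, quot;

	/* A constant zero divisor never reaches the JIT: the BPF
	 * verifier rejects it. */
	assert(imm != 0);

	tmp = imm;		/* single constant load, reused twice */
	quot = dst / tmp;
	quot *= tmp;
	return dst - quot;
}
```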

since RFC:
 -- rework register allocation so that we don't have to emit move
    instructions to get the arguments into place for calls
 -- allow FP as source for 64-bit ALU move operations
 -- don't emit NOP instructions in prologue, instead do multiple
    passes so that all the branch offsets settle down
 -- remove unnecessary checks on imm64 loads that verifier does
 -- consistently use bpf2sparc[] instead of direct register references
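Doing extra passes instead of padding with NOPs works because each pass lays the image out using the instruction sizes found by the previous one, and the loop stops once nothing changes. The toy model below illustrates that fixed-point iteration (often called branch relaxation) under an invented encoding in which a branch takes one word if its displacement is within SHORT_RANGE words and three words otherwise; the encoding, names and constants are hypothetical and are not taken from the sparc64 JIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy encoding, invented for illustration only. */
#define SHORT_RANGE 4	/* max |displacement| in words for a short branch */
#define SHORT_LEN   1	/* words used by a short branch */
#define LONG_LEN    3	/* words used by a long branch */

struct insn {
	bool is_branch;
	size_t target;	/* target instruction index, 0..n (branches only) */
	size_t len;	/* encoded length in words, set by jit_relax() */
};

/* Lay out prog[0..n-1] repeatedly until no branch changes size.
 * offs needs n + 1 slots; on return offs[i] is the word offset of
 * instruction i and offs[n] is the image size. Returns the pass count. */
static int jit_relax(struct insn *prog, size_t n, size_t *offs)
{
	int pass = 0;
	bool changed;

	/* Start optimistic: every branch short, every other insn one word. */
	for (size_t i = 0; i < n; i++) {
		assert(!prog[i].is_branch || prog[i].target <= n);
		prog[i].len = prog[i].is_branch ? SHORT_LEN : 1;
	}

	do {
		changed = false;
		pass++;

		/* Compute offsets from the lengths chosen so far. */
		offs[0] = 0;
		for (size_t i = 0; i < n; i++)
			offs[i + 1] = offs[i] + prog[i].len;

		/* Promote short branches whose displacement no longer fits. */
		for (size_t i = 0; i < n; i++) {
			long disp;

			if (!prog[i].is_branch || prog[i].len == LONG_LEN)
				continue;
			disp = (long)offs[prog[i].target] - (long)offs[i];
			if (disp > SHORT_RANGE || disp < -SHORT_RANGE) {
				prog[i].len = LONG_LEN;
				changed = true;
			}
		}
	} while (changed);

	return pass;
}
```

Since branches only ever grow from short to long, every pass except the last promotes at least one branch, so the loop runs at most one more pass than there are branches.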


* [PATCH] xprtrdma: use offset_in_page() macro
From: Geliang Tang @ 2017-04-22  1:21 UTC (permalink / raw)
  To: J. Bruce Fields, Jeff Layton, Trond Myklebust, Anna Schumaker,
	David S. Miller, Chuck Lever, Sagi Grimberg
  Cc: Geliang Tang, linux-nfs, netdev, linux-kernel
In-Reply-To: <4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com>

Use offset_in_page() macro instead of open-coding.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 net/sunrpc/xprtrdma/rpc_rdma.c        | 4 ++--
 net/sunrpc/xprtrdma/svc_rdma_sendto.c | 3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index a044be2..429beea 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -540,7 +540,7 @@ rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
 			goto out;
 
 		page = virt_to_page(xdr->tail[0].iov_base);
-		page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
+		page_base = offset_in_page(xdr->tail[0].iov_base);
 
 		/* If the content in the page list is an odd length,
 		 * xdr_write_pages() has added a pad at the beginning
@@ -587,7 +587,7 @@ rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
 	 */
 	if (xdr->tail[0].iov_len) {
 		page = virt_to_page(xdr->tail[0].iov_base);
-		page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
+		page_base = offset_in_page(xdr->tail[0].iov_base);
 		len = xdr->tail[0].iov_len;
 
 map_tail:
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 1736337..60b3f29 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -306,12 +306,11 @@ static int svc_rdma_dma_map_buf(struct svcxprt_rdma *rdma,
 				unsigned char *base,
 				unsigned int len)
 {
-	unsigned long offset = (unsigned long)base & ~PAGE_MASK;
 	struct ib_device *dev = rdma->sc_cm_id->device;
 	dma_addr_t dma_addr;
 
 	dma_addr = ib_dma_map_page(dev, virt_to_page(base),
-				   offset, len, DMA_TO_DEVICE);
+				   offset_in_page(base), len, DMA_TO_DEVICE);
 	if (ib_dma_mapping_error(dev, dma_addr))
 		return -EIO;
 
-- 
2.9.3
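
In include/linux/mm.h, offset_in_page(p) is defined as `((unsigned long)(p) & ~PAGE_MASK)`, so each hunk above is a readability change with no change in behavior. The user-space sketch below mirrors that definition with local DEMO_ macros that assume 4 KiB pages and use uintptr_t in place of unsigned long (both are assumptions for the demo; the kernel takes PAGE_SIZE and PAGE_MASK from the architecture headers). It also shows that page start plus in-page offset gives back the original address, the same split that virt_to_page() and offset_in_page() express in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins, assuming 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((uintptr_t)1 << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* The mask trick only works for power-of-two page sizes. */
static_assert((DEMO_PAGE_SIZE & (DEMO_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* Same shape as the kernel macro: byte offset of p within its page. */
#define demo_offset_in_page(p) ((uintptr_t)(p) & ~DEMO_PAGE_MASK)

/* The open-coded form that the patch replaces. */
static uintptr_t open_coded_offset(const void *p)
{
	return (uintptr_t)p & ~DEMO_PAGE_MASK;
}

/* Address of the first byte of the page that contains p. */
static uintptr_t demo_page_start(const void *p)
{
	return (uintptr_t)p & DEMO_PAGE_MASK;
}
```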


* [PATCH] net: atheros: atl1: use offset_in_page() macro
From: Geliang Tang @ 2017-04-22  1:21 UTC (permalink / raw)
  To: Jay Cliburn, Chris Snook, David S. Miller, Eric Dumazet
  Cc: Geliang Tang, netdev, linux-kernel
In-Reply-To: <4dbc77ccaaed98b183cf4dba58a4fa325fd65048.1492758503.git.geliangtang@gmail.com>

Use offset_in_page() macro instead of open-coding.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
---
 drivers/net/ethernet/atheros/atlx/atl1.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atlx/atl1.c b/drivers/net/ethernet/atheros/atlx/atl1.c
index 022772e..83d2db2 100644
--- a/drivers/net/ethernet/atheros/atlx/atl1.c
+++ b/drivers/net/ethernet/atheros/atlx/atl1.c
@@ -1886,7 +1886,7 @@ static u16 atl1_alloc_rx_buffers(struct atl1_adapter *adapter)
 		buffer_info->skb = skb;
 		buffer_info->length = (u16) adapter->rx_buffer_len;
 		page = virt_to_page(skb->data);
-		offset = (unsigned long)skb->data & ~PAGE_MASK;
+		offset = offset_in_page(skb->data);
 		buffer_info->dma = pci_map_page(pdev, page, offset,
 						adapter->rx_buffer_len,
 						PCI_DMA_FROMDEVICE);
@@ -2230,7 +2230,7 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb,
 		hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
 		buffer_info->length = hdr_len;
 		page = virt_to_page(skb->data);
-		offset = (unsigned long)skb->data & ~PAGE_MASK;
+		offset = offset_in_page(skb->data);
 		buffer_info->dma = pci_map_page(adapter->pdev, page,
 						offset, hdr_len,
 						PCI_DMA_TODEVICE);
@@ -2254,9 +2254,8 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb,
 				data_len -= buffer_info->length;
 				page = virt_to_page(skb->data +
 					(hdr_len + i * ATL1_MAX_TX_BUF_LEN));
-				offset = (unsigned long)(skb->data +
-					(hdr_len + i * ATL1_MAX_TX_BUF_LEN)) &
-					~PAGE_MASK;
+				offset = offset_in_page(skb->data +
+					(hdr_len + i * ATL1_MAX_TX_BUF_LEN));
 				buffer_info->dma = pci_map_page(adapter->pdev,
 					page, offset, buffer_info->length,
 					PCI_DMA_TODEVICE);
@@ -2268,7 +2267,7 @@ static void atl1_tx_map(struct atl1_adapter *adapter, struct sk_buff *skb,
 		/* not TSO */
 		buffer_info->length = buf_len;
 		page = virt_to_page(skb->data);
-		offset = (unsigned long)skb->data & ~PAGE_MASK;
+		offset = offset_in_page(skb->data);
 		buffer_info->dma = pci_map_page(adapter->pdev, page,
 			offset, buf_len, PCI_DMA_TODEVICE);
 		if (++next_to_use == tpd_ring->count)
-- 
2.9.3
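
In the TSO path of atl1_tx_map(), each payload chunk is mapped from the address skb->data + hdr_len + i * ATL1_MAX_TX_BUF_LEN, and both the page and the in-page offset are derived from that one address. The user-space sketch below reproduces only that address arithmetic on a plain integer address; the 4 KiB page size, the helper name demo_split() and the chunk length passed in as max_len are assumptions for illustration, since the value of ATL1_MAX_TX_BUF_LEN is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE ((uintptr_t)4096)	/* assumed 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

struct demo_chunk {
	uintptr_t page;		/* start address of the page holding the chunk */
	uintptr_t offset;	/* what offset_in_page() would return */
	size_t len;		/* bytes in this chunk */
};

/* Hypothetical helper: split the payload that follows a hdr_len-byte
 * header into pieces of at most max_len bytes, recording the page and
 * in-page offset of each piece's first byte.
 * Returns the number of chunks written (at most max_out). */
static size_t demo_split(uintptr_t data, size_t hdr_len, size_t payload_len,
			 size_t max_len, struct demo_chunk *out, size_t max_out)
{
	size_t i;

	assert(max_len > 0);
	for (i = 0; payload_len > 0 && i < max_out; i++) {
		uintptr_t addr = data + hdr_len + i * max_len;

		out[i].page = addr & DEMO_PAGE_MASK;
		out[i].offset = addr & ~DEMO_PAGE_MASK;
		out[i].len = payload_len > max_len ? max_len : payload_len;
		payload_len -= out[i].len;
	}
	return i;
}
```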

