Netdev List
* Re: [PATCH net-next v7 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: kbuild test robot @ 2018-05-03  5:05 UTC
  To: Toke Høiland-Jørgensen; +Cc: kbuild-all, netdev, cake
In-Reply-To: <152527386316.14936.5409621935637217368.stgit@alrua-kau>

Hi Toke,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/sched-Add-Common-Applications-Kept-Enhanced-cake-qdisc/20180503-073002


coccinelle warnings: (new ones prefixed by >>)

>> net/sched/sch_cake.c:580:2-3: Unneeded semicolon

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* [PATCH] sched: fix semicolon.cocci warnings
From: kbuild test robot @ 2018-05-03  5:05 UTC
  To: Toke Høiland-Jørgensen; +Cc: kbuild-all, netdev, cake
In-Reply-To: <152527386316.14936.5409621935637217368.stgit@alrua-kau>

From: Fengguang Wu <fengguang.wu@intel.com>

net/sched/sch_cake.c:580:2-3: Unneeded semicolon


 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: 907a16741a03 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
CC: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---

 sch_cake.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -577,7 +577,7 @@ cake_hash(struct cake_tin_data *q, const
 	default:
 		dsthost_hash = 0;
 		srchost_hash = 0;
-	};
+	}
 
 	/* This *must* be after the above switch, since as a
 	 * side-effect it sorts the src and dst addresses.


* Re: [PATCH net] ipv4: fix fnhe usage by non-cached routes
From: Julian Anastasov @ 2018-05-03  5:32 UTC
  To: David Ahern; +Cc: David Miller, netdev, Martin KaFai Lau, kernel-team, Xin Long
In-Reply-To: <a20853ac-e177-0fc3-1537-2b973b5a1713@gmail.com>


	Hello,

On Wed, 2 May 2018, David Ahern wrote:

> On 5/2/18 12:41 AM, Julian Anastasov wrote:
> > Allow some non-cached routes to use non-expired fnhe:
> > 
> > 1. ip_del_fnhe: moved above and now called by find_exception.
> > The 4.5+ commit deed49df7390 expires fnhe only when caching
> > routes. Change that to:
> > 
> > 1.1. use fnhe for non-cached local output routes, with the help
> > from (2)
> > 
> > 1.2. allow __mkroute_input to detect expired fnhe (outdated
> > fnhe_gw, for example) when do_cache is false, eg. when itag!=0
> > for unicast destinations.
> > 
> > 2. __mkroute_output: keep fi to allow local routes with orig_oif != 0
> > to use fnhe info even when the new route will not be cached into fnhe.
> > After commit 839da4d98960 ("net: ipv4: set orig_oif based on fib
> > result for local traffic") it means all local routes will be affected
> > because they are not cached. This change is used to solve a PMTU
> > problem with IPVS (and probably Netfilter DNAT) setups that redirect
> > local clients from target local IP (local route to Virtual IP)
> > to new remote IP target, eg. IPVS TUN real server. Loopback has
> > 64K MTU and we need to create fnhe on the local route that will
> > keep the reduced PMTU for the Virtual IP. Without this change
> > fnhe_pmtu is updated from ICMP but never exposed to non-cached
> > local routes. This includes routes with flowi4_oif!=0 for 4.6+ and
> > with flowi4_oif=any for 4.14+.
> 
> Can you add a test case to tools/testing/selftests/net/pmtu.sh to cover
> this situation?

	Sure, I'll give it a try.

> > @@ -1310,8 +1340,14 @@ static struct fib_nh_exception *find_exception(struct fib_nh *nh, __be32 daddr)
> >  
> >  	for (fnhe = rcu_dereference(hash[hval].chain); fnhe;
> >  	     fnhe = rcu_dereference(fnhe->fnhe_next)) {
> > -		if (fnhe->fnhe_daddr == daddr)
> > +		if (fnhe->fnhe_daddr == daddr) {
> > +			if (fnhe->fnhe_expires &&
> > +			    time_after(jiffies, fnhe->fnhe_expires)) {
> > +				ip_del_fnhe(nh, daddr);
> 
> I'm surprised this is done in the fast path vs gc time. (the existing
> code does as well; your change is only moving the call to make the input
> and output paths the same)
> 
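
As an aside on the expiry test in the hunk above: time_after() is the
kernel's wraparound-safe way to compare jiffies values. A minimal
user-space sketch of the same idea follows. tick_t, tick_after() and
entry_expired() are hypothetical names for illustration, not kernel API,
and a 32-bit counter is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the type of the jiffies counter. */
typedef uint32_t tick_t;

/*
 * True when a is later than b, even if the counter wrapped in between.
 * Relies on the usual two's-complement conversion to int32_t, and is
 * only valid while a and b are less than half the counter range apart.
 */
static bool tick_after(tick_t a, tick_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Same shape as the check in the hunk: expires == 0 means "no expiry". */
static bool entry_expired(tick_t now, tick_t expires)
{
	return expires != 0 && tick_after(now, expires);
}
```

The signed difference keeps the comparison correct across a counter wrap,
which a plain "now > expires" would not.
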
> 
> The change looks correct to me and all of my functional tests passed.
> 
> Acked-by: David Ahern <dsahern@gmail.com>

	Thanks for the review!

Regards


* Re: [PATCH] net/xfrm: Fix lookups for states with spi == 0
From: Herbert Xu @ 2018-05-03  5:40 UTC
  To: Dmitry Safonov
  Cc: linux-kernel, 0x7f454c46, Steffen Klassert, David S. Miller,
	netdev, Masahide NAKAMURA, YOSHIFUJI Hideaki
In-Reply-To: <1525264896.14025.23.camel@arista.com>

On Wed, May 02, 2018 at 01:41:36PM +0100, Dmitry Safonov wrote:
>
> But it is still possible to create an IPsec state with a zero SPI.
> And it does not seem to make sense to search the SPI hash for a state
> if the request has a zero SPI.

Fair enough.  In fact a zero SPI is legal and defined for IPcomp.

The bug arose from this patch:

commit 7b4dc3600e4877178ba94c7fbf7e520421378aa6
Author: Masahide NAKAMURA <nakam@linux-ipv6.org>
Date:   Wed Sep 27 22:21:52 2006 -0700

    [XFRM]: Do not add a state whose SPI is zero to the SPI hash.
    
    SPI=0 is used for acquired IPsec SA and MIPv6 RO state.
    Such state should not be added to the SPI hash
    because we do not care about it on deleting path.
    
    Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

I think it would be better to revert this.
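
To illustrate the effect of that commit, here is a toy model (not the
xfrm code): a state with SPI 0 that is never inserted into the by-SPI
hash can never be found by a by-SPI lookup. All names below (toy_state,
toy_table, toy_insert, toy_lookup_byspi, TOY_BUCKETS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 8	/* hypothetical table size */

struct toy_state {
	uint32_t spi;
	struct toy_state *next;		/* chain within one bucket */
};

struct toy_table {
	struct toy_state *byspi[TOY_BUCKETS];
};

/* Add a state; hash_zero_spi == false mimics the 2006 behaviour. */
static void toy_insert(struct toy_table *t, struct toy_state *x,
		       bool hash_zero_spi)
{
	if (x->spi == 0 && !hash_zero_spi)
		return;		/* state exists but is invisible to SPI lookups */
	x->next = t->byspi[x->spi % TOY_BUCKETS];
	t->byspi[x->spi % TOY_BUCKETS] = x;
}

/* Walk the bucket chain for this SPI; NULL when nothing matches. */
static struct toy_state *toy_lookup_byspi(const struct toy_table *t,
					  uint32_t spi)
{
	struct toy_state *x;

	for (x = t->byspi[spi % TOY_BUCKETS]; x; x = x->next)
		if (x->spi == spi)
			return x;
	return NULL;
}
```

With hash_zero_spi set to true, which is roughly what a revert would
amount to in this model, the same lookup finds the SPI 0 state.
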

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [lkp-robot] 486ad79630 [ 15.532543] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
From: Andrew Morton @ 2018-05-03  5:44 UTC
  To: Cong Wang
  Cc: kernel test robot, kernel test robot,
	Linux Memory Management List, Johannes Weiner, LKP, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <CAM_iQpVDtrGCqd7NQ1vJXTuLMdz=GwbnN77vdkmY+PxtFmKHTw@mail.gmail.com>

On Wed, 2 May 2018 21:58:25 -0700 Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, May 2, 2018 at 9:27 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > So it's saying that something which got committed into Linus's tree
> > after 4.17-rc3 has caused a NULL deref in
> > sock_release->llc_ui_release+0x3a/0xd0
> 
> Do you mean it contains commit 3a04ce7130a7
> ("llc: fix NULL pointer deref for SOCK_ZAPPED")?

That was in 4.17-rc3 so if this report's bisection is correct, that
patch is innocent.

origin.patch (http://ozlabs.org/~akpm/mmots/broken-out/origin.patch)
contains no changes to net/llc/af_llc.c so perhaps this crash is also
occurring in 4.17-rc3 base.


* Re: INFO: rcu detected stall in __schedule
From: Tetsuo Handa @ 2018-05-03  5:45 UTC
  To: syzbot, syzkaller-bugs, dvyukov; +Cc: linux-kernel, linux-ppp, netdev, paulus
In-Reply-To: <000000000000d2fe62056b3ccca5@google.com>

I'm not sure whether this is a PPP bug.

As of uptime = 484, RCU says that it stalled for 125 seconds.

----------
[  484.407032] INFO: rcu_sched self-detected stall on CPU
[  484.412488] 	0-...!: (125000 ticks this GP) idle=f3e/1/4611686018427387906 softirq=112858/112858 fqs=0 
[  484.422300] 	 (t=125000 jiffies g=61626 c=61625 q=1534)
[  484.427663] rcu_sched kthread starved for 125000 jiffies! g61626 c61625 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
----------

484 - 125 = 359, which was about to start SND related fuzzing in that log.
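
The arithmetic can be written out as a small sketch. It assumes HZ=1000
(one jiffy per millisecond), which is only inferred from the report
equating 125000 jiffies with 125 seconds. ASSUMED_HZ and both helper
names are hypothetical.

```c
#include <assert.h>

/* Assumption: HZ=1000, inferred from "125000 jiffies" == "125 seconds". */
#define ASSUMED_HZ 1000

/* Convert a jiffies count to whole seconds under the assumed HZ. */
static long jiffies_to_seconds(long jiffies)
{
	return jiffies / ASSUMED_HZ;
}

/* Uptime, in whole seconds, at which the stalled grace period began. */
static long stall_start_uptime(long report_uptime_s, long stalled_jiffies)
{
	return report_uptime_s - jiffies_to_seconds(stalled_jiffies);
}
```
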

----------
2033/05/18 03:36:31 executing program 1:
r0 = socket(0x40000a, 0x5, 0x7)
setsockopt$inet_int(r0, 0x0, 0x18, &(0x7f0000000000)=0x200, 0x4)
bind$inet6(r0, &(0x7f00000000c0)={0xa, 0x0, 0x0, @loopback={0x0, 0x1}}, 0x1c)
perf_event_open(&(0x7f0000000040)={0x2, 0x70, 0x3e5}, 0x0, 0xffffffffffffffff, 0xffffffffffffffff, 0x0)
timer_create(0x0, &(0x7f00000001c0)={0x0, 0x15, 0x0, @thr={&(0x7f0000000440), &(0x7f0000000540)}}, &(0x7f0000000200))
timer_getoverrun(0x0)
perf_event_open(&(0x7f000025c000)={0x2, 0x78, 0x3e3}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
r1 = syz_open_dev$sndctrl(&(0x7f0000000200)='/dev/snd/controlC#\x00', 0x2, 0x0)
perf_event_open(&(0x7f0000001000)={0x0, 0x70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8ce, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffffffffffffff8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={&(0x7f0000005000), 0x2}, 0x1000000000c}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
ioctl$SNDRV_CTL_IOCTL_SUBSCRIBE_EVENTS(r1, 0xc0045516, &(0x7f00000000c0)=0x1)
r2 = syz_open_dev$sndpcmp(&(0x7f0000000100)='/dev/snd/pcmC#D#p\x00', 0x1, 0x4000)
ioctl$SNDRV_SEQ_IOCTL_GET_QUEUE_CLIENT(r2, 0xc04c5349, &(0x7f0000000240)={0x200, 0xfffffffffffffcdc, 0x1})
syz_open_dev$tun(&(0x7f00000003c0)='/dev/net/tun\x00', 0x0, 0x20402)
ioctl$SNDRV_CTL_IOCTL_PVERSION(r1, 0xc1105517, &(0x7f0000001000)=""/250)
ioctl$SNDRV_CTL_IOCTL_SUBSCRIBE_EVENTS(r1, 0xc0045516, &(0x7f0000000000))

2033/05/18 03:36:31 executing program 4:
syz_emit_ethernet(0x3e, &(0x7f00000000c0)={@broadcast=[0xff, 0xff, 0xff, 0xff, 0xff, 0xff], @empty=[0x0, 0x0, 0xb00000000000000], [], {@ipv4={0x800, {{0x5, 0x4, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x1, 0x0, @remote={0xac, 0x14, 0x14, 0xbb}, @dev={0xac, 0x14, 0x14}}, @icmp=@parameter_prob={0x5, 0x4, 0x0, 0x0, 0x0, 0x0, {0x5, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @local={0xac, 0x223, 0x14, 0xaa}, @dev={0xac, 0x14, 0x14}}}}}}}, &(0x7f0000000000)={0x0, 0x2, [0x0, 0x2e6]})

2033/05/18 03:36:31 executing program 1:
r0 = socket$pppoe(0x18, 0x1, 0x0)
connect$pppoe(r0, &(0x7f00000000c0)={0x18, 0x0, {0x1, @broadcast=[0xff, 0xff, 0xff, 0xff, 0xff, 0xff], 'ip6_vti0\x00'}}, 0x1e)
r1 = socket(0x3, 0xb, 0x80000001)
setsockopt$inet_sctp6_SCTP_ADAPTATION_LAYER(r1, 0x84, 0x7, &(0x7f0000000100)={0x2}, 0x4)
ioctl$sock_inet_SIOCGIFADDR(r0, 0x8915, &(0x7f0000000040)={'veth1_to_bridge\x00', {0x2, 0x4e21}})
r2 = syz_open_dev$admmidi(&(0x7f0000000000)='/dev/admmidi#\x00', 0x6, 0x8000)
setsockopt$SO_VM_SOCKETS_BUFFER_MAX_SIZE(r2, 0x28, 0x2, &(0x7f0000000080)=0xffffffffffffff00, 0x8)

[  359.306427] snd_virmidi snd_virmidi.0: control 112:0:0:�\b:0 is already present
----------


* Re: [PATCH 2/2] drivers core: multi-threading device shutdown
From: Tobin C. Harding @ 2018-05-03  5:54 UTC
  To: Pavel Tatashin
  Cc: steven.sistare, daniel.m.jordan, linux-kernel, jeffrey.t.kirsher,
	intel-wired-lan, netdev, gregkh
In-Reply-To: <20180503035931.22439-3-pasha.tatashin@oracle.com>

This code was a pleasure to read, super clean.

On Wed, May 02, 2018 at 11:59:31PM -0400, Pavel Tatashin wrote:
> When the system is rebooted, halted or kexeced, device_shutdown() is
> called.
> 
> This function shuts down every single device by calling either:
> 	dev->bus->shutdown(dev)
> 	dev->driver->shutdown(dev)
> 
> Even on a machine with just a moderate number of devices, device_shutdown()
> may take multiple seconds to complete, because many devices require
> specific delays to perform this operation.
> 
> Here is a sample analysis of the time it takes to call device_shutdown() on
> a two-socket Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz machine.
> 
> device_shutdown		2.95s
>  mlx4_shutdown		1.14s
>  megasas_shutdown	0.24s
>  ixgbe_shutdown		0.37s x 4 (four ixgbe devices on my machine).
>  the rest		0.09s
> 
> In mlx4 we spent the most time, but that is because there is a 1 second
> sleep:
> mlx4_shutdown
>  mlx4_unload_one
>   mlx4_free_ownership
>    msleep(1000)
> 
> With megasas we spend a quarter of a second, but sometimes longer (up to
> 0.5s), in this path:
> 
>     megasas_shutdown
>       megasas_flush_cache
>         megasas_issue_blocked_cmd
>           wait_event_timeout
> 
> Finally, with ixgbe_shutdown() it takes 0.37s for each device, but that time
> is spread all over the place, with the bigger offenders being:
> 
>     ixgbe_shutdown
>       __ixgbe_shutdown
>         ixgbe_close_suspend
>           ixgbe_down
>             ixgbe_init_hw_generic
>               ixgbe_reset_hw_X540
>                 msleep(100);                        0.104483472
>                 ixgbe_get_san_mac_addr_generic      0.048414851
>                 ixgbe_get_wwn_prefix_generic        0.048409893
>               ixgbe_start_hw_X540
>                 ixgbe_start_hw_generic
>                   ixgbe_clear_hw_cntrs_generic      0.048581502
>                   ixgbe_setup_fc_generic            0.024225800
> 
>     All the ixgbe_*generic functions end up calling:
>     ixgbe_read_eerd_X540()
>       ixgbe_acquire_swfw_sync_X540
>         usleep_range(5000, 6000);
>       ixgbe_release_swfw_sync_X540
>         usleep_range(5000, 6000);
> 
> While these are short sleeps, they end up being called over 24 times!
> 24 * 0.0055s = 0.132s, adding up to 0.528s for four devices.
> 
> While we should keep optimizing the individual device drivers, in some
> cases this is simply a hardware property that forces a specific delay, and
> we must wait.
> 
> So, the solution for this problem is to shut down devices in parallel.
> However, we must shut down children before shutting down parents, so a
> parent device must wait for its children to finish.
> 
> With this patch, on the same machine device_shutdown() takes 1.142s, and
> without the mlx4 one-second delay only 0.38s.
> 
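
A rough model of why parallel shutdown helps, using the timings quoted
above. It assumes, purely for illustration, that all listed devices are
independent siblings under one root that has no shutdown cost of its
own. struct toy_dev and both functions are hypothetical and are not part
of the patch; no real threads are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device node: own shutdown cost in ms plus its children. */
struct toy_dev {
	int own_ms;
	const struct toy_dev *const *children;
	size_t nchildren;
};

/* Serial walk: every shutdown cost in the tree adds up. */
static int serial_ms(const struct toy_dev *d)
{
	int total = d->own_ms;

	for (size_t i = 0; i < d->nchildren; i++)
		total += serial_ms(d->children[i]);
	return total;
}

/*
 * Parallel walk: children run concurrently, so the parent only waits
 * for the slowest child before paying its own cost.
 */
static int parallel_ms(const struct toy_dev *d)
{
	int slowest = 0;

	for (size_t i = 0; i < d->nchildren; i++) {
		int t = parallel_ms(d->children[i]);

		if (t > slowest)
			slowest = t;
	}
	return slowest + d->own_ms;
}
```

Under that assumption, feeding in 1140, 240, 4 x 370 and 90 ms gives
2950 ms serially and 1140 ms in parallel, in line with the 2.95s and
1.142s figures in the message.
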
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> ---
>  drivers/base/core.c | 238 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 189 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b610816eb887..f370369a303b 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -25,6 +25,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sysfs.h>
> +#include <linux/kthread.h>
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -2102,6 +2103,59 @@ const char *device_get_devnode(struct device *dev,
>  	return *tmp = s;
>  }
>  
> +/**
> + * device_children_count - device children count
> + * @parent: parent struct device.
> + *
> + * Returns number of children for this device or 0 if none.
> + */
> +static int device_children_count(struct device *parent)
> +{
> +	struct klist_iter i;
> +	int children = 0;
> +
> +	if (!parent->p)
> +		return 0;
> +
> +	klist_iter_init(&parent->p->klist_children, &i);
> +	while (next_device(&i))
> +		children++;
> +	klist_iter_exit(&i);
> +
> +	return children;
> +}
> +
> +/**
> + * device_get_child_by_index - Return child using the provided index.
> + * @parent: parent struct device.
> + * @index:  Index of the child, where 0 is the first child in the children list,
> + * and so on.
> + *
> + * Returns child or NULL if child with this index is not present.
> + */
> +static struct device *
> +device_get_child_by_index(struct device *parent, int index)
> +{
> +	struct klist_iter i;
> +	struct device *dev = NULL, *d;
> +	int child_index = 0;
> +
> +	if (!parent->p || index < 0)
> +		return NULL;
> +
> +	klist_iter_init(&parent->p->klist_children, &i);
> +	while ((d = next_device(&i)) != NULL) {

perhaps:
	while ((d = next_device(&i))) {

> +		if (child_index == index) {
> +			dev = d;
> +			break;
> +		}
> +		child_index++;
> +	}
> +	klist_iter_exit(&i);
> +
> +	return dev;
> +}
> +
>  /**
>   * device_for_each_child - device child iterator.
>   * @parent: parent struct device.
> @@ -2765,71 +2819,157 @@ int device_move(struct device *dev, struct device *new_parent,
>  }
>  EXPORT_SYMBOL_GPL(device_move);
>  
> +/*
> + * device_shutdown_one - call ->shutdown() for the device passed as
> + * argument.
> + */
> +static void device_shutdown_one(struct device *dev)
> +{
> +	/* Don't allow any more runtime suspends */
> +	pm_runtime_get_noresume(dev);
> +	pm_runtime_barrier(dev);
> +
> +	if (dev->class && dev->class->shutdown_pre) {
> +		if (initcall_debug)
> +			dev_info(dev, "shutdown_pre\n");
> +		dev->class->shutdown_pre(dev);
> +	}
> +	if (dev->bus && dev->bus->shutdown) {
> +		if (initcall_debug)
> +			dev_info(dev, "shutdown\n");
> +		dev->bus->shutdown(dev);
> +	} else if (dev->driver && dev->driver->shutdown) {
> +		if (initcall_debug)
> +			dev_info(dev, "shutdown\n");
> +		dev->driver->shutdown(dev);
> +	}
> +
> +	/* Release device lock, and decrement the reference counter */
> +	device_unlock(dev);
> +	put_device(dev);
> +}
> +
> +static DECLARE_COMPLETION(device_root_tasks_complete);
> +static void device_shutdown_tree(struct device *dev);
> +static atomic_t device_root_tasks;
> +
> +/*
> + * Passed as an argument to device_shutdown_task().
> + * child_next_index	the next available child index.
> + * tasks_running	number of tasks still running. Each task decrements it
> + *			when its job is finished and the last task signals that
> + *			the job is complete.
> + * complete		Used to signal job completion.
> + * parent		Parent device.
> + */
> +struct device_shutdown_task_data {
> +	atomic_t		child_next_index;
> +	atomic_t		tasks_running;
> +	struct completion	complete;
> +	struct device		*parent;
> +};
> +
> +static int device_shutdown_task(void *data)
> +{
> +	struct device_shutdown_task_data *tdata =
> +		(struct device_shutdown_task_data *)data;

perhaps:
	struct device_shutdown_task_data *tdata = data;

> +	int child_idx = atomic_inc_return(&tdata->child_next_index) - 1;
> +	struct device *dev = device_get_child_by_index(tdata->parent,
> +						       child_idx);

perhaps:
	struct device *dev = device_get_child_by_index(tdata->parent, child_idx);

This is over the 80 character limit but only by one character :)

> +
> +	if (dev)
> +		device_shutdown_tree(dev);
> +	if (atomic_dec_return(&tdata->tasks_running) == 0)
> +		complete(&tdata->complete);
> +	return 0;
> +}
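
For readers following the two counters in device_shutdown_task(), a
minimal user-space sketch using C11 atomics as a stand-in for the kernel
atomic_t helpers. The names toy_task_data, claim_child_index() and
finish_task() are hypothetical. atomic_fetch_add() returns the value
before the increment, which corresponds to atomic_inc_return() - 1 in
the quoted code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters in the task data. */
struct toy_task_data {
	atomic_int child_next_index;
	atomic_int tasks_running;
};

/* Each task claims a distinct child index: 0, 1, 2, ... */
static int claim_child_index(struct toy_task_data *t)
{
	return atomic_fetch_add(&t->child_next_index, 1);
}

/*
 * Returns true only for the task that drops the counter to zero, i.e.
 * the one that would signal the completion in the quoted code.
 */
static bool finish_task(struct toy_task_data *t)
{
	return atomic_fetch_sub(&t->tasks_running, 1) == 1;
}
```
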
> +
> +/*
> + * Shut down the device tree rooted at dev. If dev has no children,
> + * simply shut down only this device. If dev has children, recursively shut
> + * down the children first, and only then the parent. For performance reasons
> + * children are shut down in parallel using kernel threads.
> + */
> +static void device_shutdown_tree(struct device *dev)
> +{
> +	int children_count = device_children_count(dev);
> +
> +	if (children_count) {
> +		struct device_shutdown_task_data tdata;
> +		int i;
> +
> +		init_completion(&tdata.complete);
> +		atomic_set(&tdata.child_next_index, 0);
> +		atomic_set(&tdata.tasks_running, children_count);
> +		tdata.parent = dev;
> +
> +		for (i = 0; i < children_count; i++) {
> +			kthread_run(device_shutdown_task,
> +				    &tdata, "device_shutdown.%s",
> +				    dev_name(dev));
> +		}
> +		wait_for_completion(&tdata.complete);
> +	}
> +	device_shutdown_one(dev);
> +}
> +
> +/*
> + * On shutdown each root device (the one that does not have a parent) goes
> + * through this function.
> + */
> +static int
> +device_shutdown_root_task(void *data)
> +{
> +	struct device *dev = (struct device *)data;
> +
> +	device_shutdown_tree(dev);
> +	if (atomic_dec_return(&device_root_tasks) == 0)
> +		complete(&device_root_tasks_complete);
> +	return 0;
> +}
> +
>  /**
>   * device_shutdown - call ->shutdown() on each device to shutdown.
>   */
>  void device_shutdown(void)
>  {
> -	struct device *dev, *parent;
> +	struct list_head *pos, *next;
> +	int root_devices = 0;
> +	struct device *dev;
>  
>  	spin_lock(&devices_kset->list_lock);
>  	/*
> -	 * Walk the devices list backward, shutting down each in turn.
> -	 * Beware that device unplug events may also start pulling
> -	 * devices offline, even as the system is shutting down.
> +	 * Prepare devices for shutdown: lock, and increment references in every
> +	 * devices. Remove child devices from the list, and count number of root

         * Prepare devices for shutdown: lock, and increment reference in each
         * device. Remove child devices from the list, and count number of root


Hope this helps,
Tobin.


* Re: INFO: rcu detected stall in __schedule
From: Dmitry Vyukov @ 2018-05-03  6:07 UTC
  To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, LKML, linux-ppp, netdev, paulus
In-Reply-To: <1356afb7-80cf-d9d9-c282-e4e819807376@I-love.SAKURA.ne.jp>

On Thu, May 3, 2018 at 7:45 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> I'm not sure whether this is a PPP bug.
>
> As of uptime = 484, RCU says that it stalled for 125 seconds.
>
> ----------
> [  484.407032] INFO: rcu_sched self-detected stall on CPU
> [  484.412488]  0-...!: (125000 ticks this GP) idle=f3e/1/4611686018427387906 softirq=112858/112858 fqs=0
> [  484.422300]   (t=125000 jiffies g=61626 c=61625 q=1534)
> [  484.427663] rcu_sched kthread starved for 125000 jiffies! g61626 c61625 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> ----------
>
> 484 - 125 = 359, which was about to start SND related fuzzing in that log.
>
> ----------
> 2033/05/18 03:36:31 executing program 1:
> r0 = socket(0x40000a, 0x5, 0x7)
> setsockopt$inet_int(r0, 0x0, 0x18, &(0x7f0000000000)=0x200, 0x4)
> bind$inet6(r0, &(0x7f00000000c0)={0xa, 0x0, 0x0, @loopback={0x0, 0x1}}, 0x1c)
> perf_event_open(&(0x7f0000000040)={0x2, 0x70, 0x3e5}, 0x0, 0xffffffffffffffff, 0xffffffffffffffff, 0x0)
> timer_create(0x0, &(0x7f00000001c0)={0x0, 0x15, 0x0, @thr={&(0x7f0000000440), &(0x7f0000000540)}}, &(0x7f0000000200))
> timer_getoverrun(0x0)
> perf_event_open(&(0x7f000025c000)={0x2, 0x78, 0x3e3}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
> r1 = syz_open_dev$sndctrl(&(0x7f0000000200)='/dev/snd/controlC#\x00', 0x2, 0x0)
> perf_event_open(&(0x7f0000001000)={0x0, 0x70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8ce, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffffffffffffff8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={&(0x7f0000005000), 0x2}, 0x1000000000c}, 0x0, 0x0, 0xffffffffffffffff, 0x0)
> ioctl$SNDRV_CTL_IOCTL_SUBSCRIBE_EVENTS(r1, 0xc0045516, &(0x7f00000000c0)=0x1)
> r2 = syz_open_dev$sndpcmp(&(0x7f0000000100)='/dev/snd/pcmC#D#p\x00', 0x1, 0x4000)
> ioctl$SNDRV_SEQ_IOCTL_GET_QUEUE_CLIENT(r2, 0xc04c5349, &(0x7f0000000240)={0x200, 0xfffffffffffffcdc, 0x1})
> syz_open_dev$tun(&(0x7f00000003c0)='/dev/net/tun\x00', 0x0, 0x20402)
> ioctl$SNDRV_CTL_IOCTL_PVERSION(r1, 0xc1105517, &(0x7f0000001000)=""/250)
> ioctl$SNDRV_CTL_IOCTL_SUBSCRIBE_EVENTS(r1, 0xc0045516, &(0x7f0000000000))
>
> 2033/05/18 03:36:31 executing program 4:
> syz_emit_ethernet(0x3e, &(0x7f00000000c0)={@broadcast=[0xff, 0xff, 0xff, 0xff, 0xff, 0xff], @empty=[0x0, 0x0, 0xb00000000000000], [], {@ipv4={0x800, {{0x5, 0x4, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x1, 0x0, @remote={0xac, 0x14, 0x14, 0xbb}, @dev={0xac, 0x14, 0x14}}, @icmp=@parameter_prob={0x5, 0x4, 0x0, 0x0, 0x0, 0x0, {0x5, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @local={0xac, 0x223, 0x14, 0xaa}, @dev={0xac, 0x14, 0x14}}}}}}}, &(0x7f0000000000)={0x0, 0x2, [0x0, 0x2e6]})
>
> 2033/05/18 03:36:31 executing program 1:
> r0 = socket$pppoe(0x18, 0x1, 0x0)
> connect$pppoe(r0, &(0x7f00000000c0)={0x18, 0x0, {0x1, @broadcast=[0xff, 0xff, 0xff, 0xff, 0xff, 0xff], 'ip6_vti0\x00'}}, 0x1e)
> r1 = socket(0x3, 0xb, 0x80000001)
> setsockopt$inet_sctp6_SCTP_ADAPTATION_LAYER(r1, 0x84, 0x7, &(0x7f0000000100)={0x2}, 0x4)
> ioctl$sock_inet_SIOCGIFADDR(r0, 0x8915, &(0x7f0000000040)={'veth1_to_bridge\x00', {0x2, 0x4e21}})
> r2 = syz_open_dev$admmidi(&(0x7f0000000000)='/dev/admmidi#\x00', 0x6, 0x8000)
> setsockopt$SO_VM_SOCKETS_BUFFER_MAX_SIZE(r2, 0x28, 0x2, &(0x7f0000000080)=0xffffffffffffff00, 0x8)
>
> [  359.306427] snd_virmidi snd_virmidi.0: control 112:0:0:� :0 is already present
> ----------


It's the next one that caused the hang (the number in "Comm:
syz-executor1" matches with the number in "executing program 1"):

[  359.306427] snd_virmidi snd_virmidi.0: control 112:0:0:� :0 is
already present
2033/05/18 03:36:31 executing program 1:
r0 = openat$ptmx(0xffffffffffffff9c,
&(0x7f0000000140)='/dev/ptmx\x00', 0x0, 0x0)
ioctl$TCSETS(r0, 0x40045431, &(0x7f00005befdc))
r1 = syz_open_pts(r0, 0x20201)
fcntl$setstatus(r1, 0x4, 0x2800)
ioctl$TCXONC(r1, 0x540a, 0x0)
perf_event_open(&(0x7f000025c000)={0x2, 0x70, 0x3e5, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={&(0x7f000031f000)}}, 0x0,
0x0, 0xffffffffffffffff, 0x0)
write(r1, &(0x7f0000fd6000)='z', 0x1)
r2 = openat$ipvs(0xffffffffffffff9c,
&(0x7f0000000000)='/proc/sys/net/ipv4/vs/sync_ports\x00', 0x2, 0x0)
ioctl$ifreq_SIOCGIFINDEX_team(0xffffffffffffff9c, 0x8933,
&(0x7f00000012c0)={'team0\x00', <r3=>0x0})
bind$packet(r2, &(0x7f0000001300)={0x11, 0x1f, r3, 0x1, 0x0, 0x6,
@random="31e8917e98e6"}, 0x14)
ioctl$TIOCSETD(r1, 0x5423, &(0x7f00000000c0)=0x3)
ioctl$TCFLSH(r0, 0x540b, 0x0)
close(r0)


* [PATCH v6] bpf, x86_32: add eBPF JIT compiler for ia32
From: Wang YanQing @ 2018-05-03  6:10 UTC
  To: daniel
  Cc: ast, illusionist.neo, tglx, mingo, hpa, davem, x86, netdev,
	linux-kernel

The JIT compiler emits IA32 (32-bit x86) instructions. Currently, it supports
eBPF only; classic BPF is supported through the conversion done by the BPF core.

Almost all instructions from the eBPF ISA are supported except the following:
BPF_ALU64 | BPF_DIV | BPF_K
BPF_ALU64 | BPF_DIV | BPF_X
BPF_ALU64 | BPF_MOD | BPF_K
BPF_ALU64 | BPF_MOD | BPF_X
BPF_STX | BPF_XADD | BPF_W
BPF_STX | BPF_XADD | BPF_DW

It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.

IA32 has only a few general purpose registers: EAX|EDX|ECX|EBX|ESI|EDI. I use
EAX|EDX|ECX|EBX as temporary registers to simulate instructions in the eBPF
ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding. All other
eBPF registers, R0-R10, are simulated through scratch space on the stack.

The reasons behind the hardware register allocation policy are:
1:MUL needs EAX:EDX and shift operations need ECX, so they aren't fit
  for general eBPF 64bit register simulation.
2:We need at least 4 registers to simulate most eBPF ISA operations
  on register operands instead of on register&memory operands.
3:We need to put BPF_REG_AX in hardware registers, or constant blinding
  will degrade jit performance heavily.
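
To make "simulating" a 64bit eBPF register on 32bit hardware concrete,
here is a minimal C sketch of a 64bit add done on two 32bit halves. On
IA32 this is commonly done with an add followed by an adc; whether the
patch emits exactly that sequence is not shown in this excerpt. struct
reg64, add64() and to_u64() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one 64bit eBPF register kept as two 32bit words. */
struct reg64 {
	uint32_t lo;
	uint32_t hi;
};

/* dst + src, propagating the carry out of the low word by hand. */
static struct reg64 add64(struct reg64 dst, struct reg64 src)
{
	struct reg64 r;
	uint32_t carry;

	r.lo = dst.lo + src.lo;
	carry = r.lo < dst.lo;		/* unsigned wrap means a carry out */
	r.hi = dst.hi + src.hi + carry;
	return r;
}

/* Reassemble the halves, only used to check the result. */
static uint64_t to_u64(struct reg64 r)
{
	return ((uint64_t)r.hi << 32) | r.lo;
}
```

Each such operation needs the low and high words of both operands at
hand, which is one way to read the "at least 4 registers" point above.
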

Tested on PC (Intel(R) Core(TM) i5-5200U CPU).
Testing results on i5-5200U:
1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
2) test_progs: Summary: 83 PASSED, 0 FAILED.
3) test_lpm: OK
4) test_lru_map: OK
5) test_verifier: Summary: 828 PASSED, 0 FAILED.

The above tests were all run under the following two conditions separately:
1:bpf_jit_enable=1 and bpf_jit_harden=0
2:bpf_jit_enable=1 and bpf_jit_harden=2

Below are some numbers for this jit implementation:
Note:
  I ran test_progs in kselftest 100 times in a row for every condition;
  the numbers are in the format: total/times=avg.
  The numbers that test_bpf reports show almost the same relation.

a:jit_enable=0 and jit_harden=0            b:jit_enable=1 and jit_harden=0
  test_pkt_access:PASS:ipv4:15622/100=156    test_pkt_access:PASS:ipv4:10674/100=106
  test_pkt_access:PASS:ipv6:9130/100=91      test_pkt_access:PASS:ipv6:4855/100=48
  test_xdp:PASS:ipv4:240198/100=2401         test_xdp:PASS:ipv4:138912/100=1389
  test_xdp:PASS:ipv6:137326/100=1373         test_xdp:PASS:ipv6:68542/100=685
  test_l4lb:PASS:ipv4:61100/100=611          test_l4lb:PASS:ipv4:37302/100=373
  test_l4lb:PASS:ipv6:101000/100=1010        test_l4lb:PASS:ipv6:55030/100=550

c:jit_enable=1 and jit_harden=2
  test_pkt_access:PASS:ipv4:10558/100=105
  test_pkt_access:PASS:ipv6:5092/100=50
  test_xdp:PASS:ipv4:131902/100=1319
  test_xdp:PASS:ipv6:77932/100=779
  test_l4lb:PASS:ipv4:38924/100=389
  test_l4lb:PASS:ipv6:57520/100=575

The numbers show a 30%~50% improvement.
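
The range can be reproduced from the totals in the tables with a short
sketch. improvement_pct() is a hypothetical helper; the inputs are the
summed test_progs figures listed above (their unit is not stated in
this message).

```c
#include <assert.h>

/* Percentage by which `after` is lower than `before`, rounded down. */
static long improvement_pct(long before, long after)
{
	assert(before > 0 && after >= 0 && after <= before);
	return (before - after) * 100 / before;
}
```

For the jit_enable=1, jit_harden=0 column this gives 31 for
test_pkt_access ipv4 (15622 vs 10674) and 50 for test_xdp ipv6
(137326 vs 68542), the two ends of the range for that column.
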

See Documentation/networking/filter.txt for more information.

Signed-off-by: Wang YanQing <udknight@gmail.com>
---
 Changes v5-v6:
 1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for
   consistency.
 2:Clean up non-standard comments, reported by Daniel Borkmann.
 3:Fix a memory leak issue, reported by Daniel Borkmann.
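
 On item 1: the changelog gives consistency as the reason. As general
 background, the do {} while (0) wrapper makes a multi-statement macro
 behave as a single statement. A minimal sketch, with hypothetical
 macros and an `emitted` counter standing in for the EMIT* helpers:

```c
#include <assert.h>

static int emitted;	/* hypothetical stand-in for emitted bytes */

#define EMIT_ONE()		(emitted++)

/* Bare: expands to two statements, only the first is governed by `if`. */
#define EMIT_TWO_BARE()		EMIT_ONE(); EMIT_ONE()

/* Wrapped: expands to exactly one statement. */
#define EMIT_TWO_WRAPPED()	do { EMIT_ONE(); EMIT_ONE(); } while (0)

static int emit_bare(int cond)
{
	emitted = 0;
	if (cond)
		EMIT_TWO_BARE();	/* second EMIT_ONE() always runs */
	return emitted;
}

static int emit_wrapped(int cond)
{
	emitted = 0;
	if (cond)
		EMIT_TWO_WRAPPED();
	else				/* would not compile with the bare macro */
		emitted = -1;
	return emitted;
}
```

 Some compilers warn about the bare form when it is used under an `if`,
 which is the hazard the wrapper avoids.
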

 Changes v4-v5:
 1:Delete is_on_stack; BPF_REG_AX is the only one
   in real hardware registers, so just check against
   it.
 2:Apply commit 1612a981b766 ("bpf, x64: fix JIT emission
   for dead code"), suggested by Daniel Borkmann.
 
 Changes v3-v4:
 1:Fix changelog in commit.
   After installing llvm-6.0, test_progs no longer reports errors.
   I submitted another patch:
   "bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
   to fix another problem; with that patch, test_verifier no longer reports errors either.
 2:Fix clearing r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.
 
 Changes v2-v3:
 1:Move BPF_REG_AX to real hardware registers for performance reasons.
 3:Use bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
 4:Delete partial code in 1c2a088a6626, suggested by Daniel Borkmann.
 5:Some bug fixes and comment improvements.
 
 Changes v1-v2:
 1:Fix bug in emit_ia32_neg64.
 2:Fix bug in emit_ia32_arsh_r64.
 3:Delete filename in top level comment, suggested by Thomas Gleixner.
 4:Delete unnecessary boilerplate text, suggested by Thomas Gleixner.
 5:Rewrite some words in changelog.
 6:CodingStyle improvements and a few more comments.

 arch/x86/Kconfig                     |    2 +-
 arch/x86/include/asm/nospec-branch.h |   30 +-
 arch/x86/net/Makefile                |    9 +-
 arch/x86/net/bpf_jit_comp32.c        | 2553 ++++++++++++++++++++++++++++++++++
 4 files changed, 2588 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/net/bpf_jit_comp32.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c07f492..d51a71d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -138,7 +138,7 @@ config X86
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS
-	select HAVE_EBPF_JIT			if X86_64
+	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
 	select HAVE_EXIT_THREAD
 	select HAVE_FENTRY			if X86_64 || DYNAMIC_FTRACE
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f928ad9..2cd344d 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -291,16 +291,20 @@ static inline void indirect_branch_prediction_barrier(void)
  *    lfence
  *    jmp spec_trap
  *  do_rop:
- *    mov %rax,(%rsp)
+ *    mov %rax,(%rsp) for x86_64
+ *    mov %edx,(%esp) for x86_32
  *    retq
  *
  * Without retpolines configured:
  *
- *    jmp *%rax
+ *    jmp *%rax for x86_64
+ *    jmp *%edx for x86_32
  */
 #ifdef CONFIG_RETPOLINE
+#ifdef CONFIG_X86_64
 # define RETPOLINE_RAX_BPF_JIT_SIZE	17
 # define RETPOLINE_RAX_BPF_JIT()				\
+do {								\
 	EMIT1_off32(0xE8, 7);	 /* callq do_rop */		\
 	/* spec_trap: */					\
 	EMIT2(0xF3, 0x90);       /* pause */			\
@@ -308,11 +312,31 @@ static inline void indirect_branch_prediction_barrier(void)
 	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
 	/* do_rop: */						\
 	EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */	\
-	EMIT1(0xC3);             /* retq */
+	EMIT1(0xC3);             /* retq */			\
+} while (0)
 #else
+# define RETPOLINE_EDX_BPF_JIT()				\
+do {								\
+	EMIT1_off32(0xE8, 7);	 /* call do_rop */		\
+	/* spec_trap: */					\
+	EMIT2(0xF3, 0x90);       /* pause */			\
+	EMIT3(0x0F, 0xAE, 0xE8); /* lfence */			\
+	EMIT2(0xEB, 0xF9);       /* jmp spec_trap */		\
+	/* do_rop: */						\
+	EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */		\
+	EMIT1(0xC3);             /* ret */			\
+} while (0)
+#endif
+#else /* !CONFIG_RETPOLINE */
+
+#ifdef CONFIG_X86_64
 # define RETPOLINE_RAX_BPF_JIT_SIZE	2
 # define RETPOLINE_RAX_BPF_JIT()				\
 	EMIT2(0xFF, 0xE0);	 /* jmp *%rax */
+#else
+# define RETPOLINE_EDX_BPF_JIT()				\
+	EMIT2(0xFF, 0xE2) /* jmp *%edx */
+#endif
 #endif
 
 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
index fefb4b6..f54c9d4 100644
--- a/arch/x86/net/Makefile
+++ b/arch/x86/net/Makefile
@@ -1,6 +1,11 @@
 #
 # Arch-specific network modules
 #
-OBJECT_FILES_NON_STANDARD_bpf_jit.o += y
 
-obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
+
+ifeq ($(CONFIG_X86_32),y)
+        obj-$(CONFIG_BPF_JIT) += bpf_jit_comp32.o
+else
+        OBJECT_FILES_NON_STANDARD_bpf_jit.o += y
+        obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
+endif
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
new file mode 100644
index 0000000..61e6134
--- /dev/null
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -0,0 +1,2553 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Just-In-Time compiler for eBPF filters on IA32 (32bit x86)
+ *
+ * Author: Wang YanQing (udknight@gmail.com)
+ * The code is based on code and ideas from:
+ * Eric Dumazet (eric.dumazet@gmail.com)
+ * and from:
+ * Shubham Bansal <illusionist.neo@gmail.com>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <asm/cacheflush.h>
+#include <asm/set_memory.h>
+#include <asm/nospec-branch.h>
+#include <linux/bpf.h>
+
+/*
+ * eBPF prog stack layout:
+ *
+ *                         high
+ * original ESP =>        +-----+
+ *                        |     | callee saved registers
+ *                        +-----+
+ *                        | ... | eBPF JIT scratch space
+ * BPF_FP,IA32_EBP  =>    +-----+
+ *                        | ... | eBPF prog stack
+ *                        +-----+
+ *                        |RSVD | JIT scratchpad
+ * current ESP =>         +-----+
+ *                        |     |
+ *                        | ... | Function call stack
+ *                        |     |
+ *                        +-----+
+ *                          low
+ *
+ * The callee saved registers:
+ *
+ *                                high
+ * original ESP =>        +------------------+ \
+ *                        |        ebp       | |
+ * current EBP =>         +------------------+ } callee saved registers
+ *                        |    ebx,esi,edi   | |
+ *                        +------------------+ /
+ *                                low
+ */
+
+static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
+{
+	if (len == 1)
+		*ptr = bytes;
+	else if (len == 2)
+		*(u16 *)ptr = bytes;
+	else {
+		*(u32 *)ptr = bytes;
+		barrier();
+	}
+	return ptr + len;
+}
+
+#define EMIT(bytes, len) \
+	do { prog = emit_code(prog, bytes, len); cnt += len; } while (0)
+
+#define EMIT1(b1)		EMIT(b1, 1)
+#define EMIT2(b1, b2)		EMIT((b1) + ((b2) << 8), 2)
+#define EMIT3(b1, b2, b3)	EMIT((b1) + ((b2) << 8) + ((b3) << 16), 3)
+#define EMIT4(b1, b2, b3, b4)   \
+	EMIT((b1) + ((b2) << 8) + ((b3) << 16) + ((b4) << 24), 4)
+
+#define EMIT1_off32(b1, off) \
+	do { EMIT1(b1); EMIT(off, 4); } while (0)
+#define EMIT2_off32(b1, b2, off) \
+	do { EMIT2(b1, b2); EMIT(off, 4); } while (0)
+#define EMIT3_off32(b1, b2, b3, off) \
+	do { EMIT3(b1, b2, b3); EMIT(off, 4); } while (0)
+#define EMIT4_off32(b1, b2, b3, b4, off) \
+	do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0)
+
+#define jmp_label(label, jmp_insn_len) (label - cnt - jmp_insn_len)
+
+static bool is_imm8(int value)
+{
+	return value <= 127 && value >= -128;
+}
+
+static bool is_simm32(s64 value)
+{
+	return value == (s64) (s32) value;
+}
+
+#define STACK_OFFSET(k)	(k)
+#define TCALL_CNT	(MAX_BPF_JIT_REG + 0)	/* Tail Call Count */
+
+#define IA32_EAX	(0x0)
+#define IA32_EBX	(0x3)
+#define IA32_ECX	(0x1)
+#define IA32_EDX	(0x2)
+#define IA32_ESI	(0x6)
+#define IA32_EDI	(0x7)
+#define IA32_EBP	(0x5)
+#define IA32_ESP	(0x4)
+
+/*
+ * List of x86 cond jumps opcodes (. + s8)
+ * Add 0x10 (and an extra 0x0f) to generate far jumps (. + s32)
+ */
+#define IA32_JB  0x72
+#define IA32_JAE 0x73
+#define IA32_JE  0x74
+#define IA32_JNE 0x75
+#define IA32_JBE 0x76
+#define IA32_JA  0x77
+#define IA32_JL  0x7C
+#define IA32_JGE 0x7D
+#define IA32_JLE 0x7E
+#define IA32_JG  0x7F
+
+/*
+ * Map eBPF registers to IA32 32bit registers or stack scratch space.
+ *
+ * 1. All the registers, R0-R10, are mapped to scratch space on stack.
+ * 2. We need two 64 bit temp registers to do complex operations on eBPF
+ *    registers.
+ * 3. For performance reasons, BPF_REG_AX (used for constant blinding)
+ *    is mapped to a real hardware register pair, IA32_ESI and IA32_EDI.
+ *
+ * As the eBPF registers are all 64 bit registers and IA32 has only 32 bit
+ * registers, we have to map each eBPF register to two IA32 32 bit regs
+ * or scratch memory slots and build the eBPF 64 bit register from those.
+ *
+ * We use IA32_EAX, IA32_EDX, IA32_ECX, IA32_EBX as temporary registers.
+ */
+static const u8 bpf2ia32[][2] = {
+	/* Return value from in-kernel function, and exit value from eBPF */
+	[BPF_REG_0] = {STACK_OFFSET(0), STACK_OFFSET(4)},
+
+	/* The arguments from eBPF program to in-kernel function */
+	/* Stored on stack scratch space */
+	[BPF_REG_1] = {STACK_OFFSET(8), STACK_OFFSET(12)},
+	[BPF_REG_2] = {STACK_OFFSET(16), STACK_OFFSET(20)},
+	[BPF_REG_3] = {STACK_OFFSET(24), STACK_OFFSET(28)},
+	[BPF_REG_4] = {STACK_OFFSET(32), STACK_OFFSET(36)},
+	[BPF_REG_5] = {STACK_OFFSET(40), STACK_OFFSET(44)},
+
+	/* Callee saved registers that in-kernel function will preserve */
+	/* Stored on stack scratch space */
+	[BPF_REG_6] = {STACK_OFFSET(48), STACK_OFFSET(52)},
+	[BPF_REG_7] = {STACK_OFFSET(56), STACK_OFFSET(60)},
+	[BPF_REG_8] = {STACK_OFFSET(64), STACK_OFFSET(68)},
+	[BPF_REG_9] = {STACK_OFFSET(72), STACK_OFFSET(76)},
+
+	/* Read only Frame Pointer to access Stack */
+	[BPF_REG_FP] = {STACK_OFFSET(80), STACK_OFFSET(84)},
+
+	/* Temporary register for blinding constants. */
+	[BPF_REG_AX] = {IA32_ESI, IA32_EDI},
+
+	/* Tail call count. Stored on stack scratch space. */
+	[TCALL_CNT] = {STACK_OFFSET(88), STACK_OFFSET(92)},
+};
+
+#define dst_lo	dst[0]
+#define dst_hi	dst[1]
+#define src_lo	src[0]
+#define src_hi	src[1]
+
+#define STACK_ALIGNMENT	8
+/*
+ * Stack space for BPF_REG_0 through BPF_REG_9, BPF_REG_FP and the
+ * tail call count: 12 slots of 8 bytes each. BPF_REG_AX lives in
+ * IA32_ESI/IA32_EDI and needs no stack slot.
+ */
+#define SCRATCH_SIZE 96
+
+/* Total stack size used in JITed code */
+#define _STACK_SIZE \
+	(stack_depth + \
+	 SCRATCH_SIZE + \
+	 4 /* Extra space for skb_copy_bits buffer */)
+
+#define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
+
+/* Get the offset of eBPF REGISTERs stored on scratch space. */
+#define STACK_VAR(off) (off)
+
+/* Offset of skb_copy_bits buffer */
+#define SKB_BUFFER STACK_VAR(SCRATCH_SIZE)
+
+/* Encode 'dst_reg' register into IA32 opcode 'byte' */
+static u8 add_1reg(u8 byte, u32 dst_reg)
+{
+	return byte + dst_reg;
+}
+
+/* Encode 'dst_reg' and 'src_reg' registers into IA32 opcode 'byte' */
+static u8 add_2reg(u8 byte, u32 dst_reg, u32 src_reg)
+{
+	return byte + dst_reg + (src_reg << 3);
+}
+
+static void jit_fill_hole(void *area, unsigned int size)
+{
+	/* Fill whole space with int3 instructions */
+	memset(area, 0xcc, size);
+}
+
+static inline void emit_ia32_mov_i(const u8 dst, const u32 val, bool dstk,
+				   u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	if (dstk) {
+		if (val == 0) {
+			/* xor eax,eax */
+			EMIT2(0x33, add_2reg(0xC0, IA32_EAX, IA32_EAX));
+			/* mov dword ptr [ebp+off],eax */
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+			      STACK_VAR(dst));
+		} else {
+			EMIT3_off32(0xC7, add_1reg(0x40, IA32_EBP),
+				    STACK_VAR(dst), val);
+		}
+	} else {
+		if (val == 0)
+			EMIT2(0x33, add_2reg(0xC0, dst, dst));
+		else
+			EMIT2_off32(0xC7, add_1reg(0xC0, dst),
+				    val);
+	}
+	*pprog = prog;
+}
+
+/* dst = src (4 bytes) */
+static inline void emit_ia32_mov_r(const u8 dst, const u8 src, bool dstk,
+				   bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 sreg = sstk ? IA32_EAX : src;
+
+	if (sstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(src));
+	if (dstk)
+		/* mov dword ptr [ebp+off],sreg */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, sreg), STACK_VAR(dst));
+	else
+		/* mov dst,sreg */
+		EMIT2(0x89, add_2reg(0xC0, dst, sreg));
+
+	*pprog = prog;
+}
+
+/* dst = src */
+static inline void emit_ia32_mov_r64(const bool is64, const u8 dst[],
+				     const u8 src[], bool dstk,
+				     bool sstk, u8 **pprog)
+{
+	emit_ia32_mov_r(dst_lo, src_lo, dstk, sstk, pprog);
+	if (is64)
+		/* complete 8 byte move */
+		emit_ia32_mov_r(dst_hi, src_hi, dstk, sstk, pprog);
+	else
+		/* zero out high 4 bytes */
+		emit_ia32_mov_i(dst_hi, 0, dstk, pprog);
+}
+
+/* Sign extended move */
+static inline void emit_ia32_mov_i64(const bool is64, const u8 dst[],
+				     const u32 val, bool dstk, u8 **pprog)
+{
+	u32 hi = 0;
+
+	if (is64 && (val & (1<<31)))
+		hi = (u32)~0;
+	emit_ia32_mov_i(dst_lo, val, dstk, pprog);
+	emit_ia32_mov_i(dst_hi, hi, dstk, pprog);
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst * src
+ */
+static inline void emit_ia32_mul_r(const u8 dst, const u8 src, bool dstk,
+				   bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 sreg = sstk ? IA32_ECX : src;
+
+	if (sstk)
+		/* mov ecx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src));
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(dst));
+	else
+		/* mov eax,dst */
+		EMIT2(0x8B, add_2reg(0xC0, dst, IA32_EAX));
+
+	/* mul sreg */
+	EMIT2(0xF7, add_1reg(0xE0, sreg));
+
+	if (dstk)
+		/* mov dword ptr [ebp+off],eax */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst));
+	else
+		/* mov dst,eax */
+		EMIT2(0x89, add_2reg(0xC0, dst, IA32_EAX));
+
+	*pprog = prog;
+}
+
+static inline void emit_ia32_to_le_r64(const u8 dst[], s32 val,
+					 bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk && val != 64) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+	switch (val) {
+	case 16:
+		/*
+		 * Emit 'movzwl eax,ax' to zero extend 16-bit
+		 * into 64 bit
+		 */
+		EMIT2(0x0F, 0xB7);
+		EMIT1(add_2reg(0xC0, dreg_lo, dreg_lo));
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+		break;
+	case 32:
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+		break;
+	case 64:
+		/* nop */
+		break;
+	}
+
+	if (dstk && val != 64) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	*pprog = prog;
+}
+
+static inline void emit_ia32_to_be_r64(const u8 dst[], s32 val,
+				       bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+	switch (val) {
+	case 16:
+		/* Emit 'ror %ax, 8' to swap lower 2 bytes */
+		EMIT1(0x66);
+		EMIT3(0xC1, add_1reg(0xC8, dreg_lo), 8);
+
+		EMIT2(0x0F, 0xB7);
+		EMIT1(add_2reg(0xC0, dreg_lo, dreg_lo));
+
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+		break;
+	case 32:
+		/* Emit 'bswap eax' to swap lower 4 bytes */
+		EMIT1(0x0F);
+		EMIT1(add_1reg(0xC8, dreg_lo));
+
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+		break;
+	case 64:
+		/* Emit 'bswap eax' to swap lower 4 bytes */
+		EMIT1(0x0F);
+		EMIT1(add_1reg(0xC8, dreg_lo));
+
+		/* Emit 'bswap edx' to swap lower 4 bytes */
+		EMIT1(0x0F);
+		EMIT1(add_1reg(0xC8, dreg_hi));
+
+		/* mov ecx,dreg_hi */
+		EMIT2(0x89, add_2reg(0xC0, IA32_ECX, dreg_hi));
+		/* mov dreg_hi,dreg_lo */
+		EMIT2(0x89, add_2reg(0xC0, dreg_hi, dreg_lo));
+		/* mov dreg_lo,ecx */
+		EMIT2(0x89, add_2reg(0xC0, dreg_lo, IA32_ECX));
+
+		break;
+	}
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	*pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (div|mod) src
+ */
+static inline void emit_ia32_div_mod_r(const u8 op, const u8 dst, const u8 src,
+				       bool dstk, bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	if (sstk)
+		/* mov ecx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+		      STACK_VAR(src));
+	else if (src != IA32_ECX)
+		/* mov ecx,src */
+		EMIT2(0x8B, add_2reg(0xC0, src, IA32_ECX));
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst));
+	else
+		/* mov eax,dst */
+		EMIT2(0x8B, add_2reg(0xC0, dst, IA32_EAX));
+
+	/* xor edx,edx */
+	EMIT2(0x31, add_2reg(0xC0, IA32_EDX, IA32_EDX));
+	/* div ecx */
+	EMIT2(0xF7, add_1reg(0xF0, IA32_ECX));
+
+	if (op == BPF_MOD) {
+		if (dstk)
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+			      STACK_VAR(dst));
+		else
+			EMIT2(0x89, add_2reg(0xC0, dst, IA32_EDX));
+	} else {
+		if (dstk)
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+			      STACK_VAR(dst));
+		else
+			EMIT2(0x89, add_2reg(0xC0, dst, IA32_EAX));
+	}
+	*pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (shift) src
+ */
+static inline void emit_ia32_shift_r(const u8 op, const u8 dst, const u8 src,
+				     bool dstk, bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg = dstk ? IA32_EAX : dst;
+	u8 b2;
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(dst));
+
+	if (sstk)
+		/* mov ecx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src));
+	else if (src != IA32_ECX)
+		/* mov ecx,src */
+		EMIT2(0x8B, add_2reg(0xC0, src, IA32_ECX));
+
+	switch (op) {
+	case BPF_LSH:
+		b2 = 0xE0; break;
+	case BPF_RSH:
+		b2 = 0xE8; break;
+	case BPF_ARSH:
+		b2 = 0xF8; break;
+	default:
+		return;
+	}
+	EMIT2(0xD3, add_1reg(b2, dreg));
+
+	if (dstk)
+		/* mov dword ptr [ebp+off],dreg */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg), STACK_VAR(dst));
+	*pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (op) src
+ */
+static inline void emit_ia32_alu_r(const bool is64, const bool hi, const u8 op,
+				   const u8 dst, const u8 src, bool dstk,
+				   bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 sreg = sstk ? IA32_EAX : src;
+	u8 dreg = dstk ? IA32_EDX : dst;
+
+	if (sstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(src));
+
+	if (dstk)
+		/* mov edx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(dst));
+
+	switch (BPF_OP(op)) {
+	/* dst = dst + src */
+	case BPF_ADD:
+		if (hi && is64)
+			EMIT2(0x11, add_2reg(0xC0, dreg, sreg));
+		else
+			EMIT2(0x01, add_2reg(0xC0, dreg, sreg));
+		break;
+	/* dst = dst - src */
+	case BPF_SUB:
+		if (hi && is64)
+			EMIT2(0x19, add_2reg(0xC0, dreg, sreg));
+		else
+			EMIT2(0x29, add_2reg(0xC0, dreg, sreg));
+		break;
+	/* dst = dst | src */
+	case BPF_OR:
+		EMIT2(0x09, add_2reg(0xC0, dreg, sreg));
+		break;
+	/* dst = dst & src */
+	case BPF_AND:
+		EMIT2(0x21, add_2reg(0xC0, dreg, sreg));
+		break;
+	/* dst = dst ^ src */
+	case BPF_XOR:
+		EMIT2(0x31, add_2reg(0xC0, dreg, sreg));
+		break;
+	}
+
+	if (dstk)
+		/* mov dword ptr [ebp+off],dreg */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg),
+		      STACK_VAR(dst));
+	*pprog = prog;
+}
+
+/* ALU operation (64 bit) */
+static inline void emit_ia32_alu_r64(const bool is64, const u8 op,
+				     const u8 dst[], const u8 src[],
+				     bool dstk,  bool sstk,
+				     u8 **pprog)
+{
+	u8 *prog = *pprog;
+
+	emit_ia32_alu_r(is64, false, op, dst_lo, src_lo, dstk, sstk, &prog);
+	if (is64)
+		emit_ia32_alu_r(is64, true, op, dst_hi, src_hi, dstk, sstk,
+				&prog);
+	else
+		emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+	*pprog = prog;
+}
+
+/*
+ * ALU operation (32 bit)
+ * dst = dst (op) val
+ */
+static inline void emit_ia32_alu_i(const bool is64, const bool hi, const u8 op,
+				   const u8 dst, const s32 val, bool dstk,
+				   u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg = dstk ? IA32_EAX : dst;
+	u8 sreg = IA32_EDX;
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(dst));
+
+	if (!is_imm8(val))
+		/* mov edx,imm32 */
+		EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EDX), val);
+
+	switch (op) {
+	/* dst = dst + val */
+	case BPF_ADD:
+		if (hi && is64) {
+			if (is_imm8(val))
+				EMIT3(0x83, add_1reg(0xD0, dreg), val);
+			else
+				EMIT2(0x11, add_2reg(0xC0, dreg, sreg));
+		} else {
+			if (is_imm8(val))
+				EMIT3(0x83, add_1reg(0xC0, dreg), val);
+			else
+				EMIT2(0x01, add_2reg(0xC0, dreg, sreg));
+		}
+		break;
+	/* dst = dst - val */
+	case BPF_SUB:
+		if (hi && is64) {
+			if (is_imm8(val))
+				EMIT3(0x83, add_1reg(0xD8, dreg), val);
+			else
+				EMIT2(0x19, add_2reg(0xC0, dreg, sreg));
+		} else {
+			if (is_imm8(val))
+				EMIT3(0x83, add_1reg(0xE8, dreg), val);
+			else
+				EMIT2(0x29, add_2reg(0xC0, dreg, sreg));
+		}
+		break;
+	/* dst = dst | val */
+	case BPF_OR:
+		if (is_imm8(val))
+			EMIT3(0x83, add_1reg(0xC8, dreg), val);
+		else
+			EMIT2(0x09, add_2reg(0xC0, dreg, sreg));
+		break;
+	/* dst = dst & val */
+	case BPF_AND:
+		if (is_imm8(val))
+			EMIT3(0x83, add_1reg(0xE0, dreg), val);
+		else
+			EMIT2(0x21, add_2reg(0xC0, dreg, sreg));
+		break;
+	/* dst = dst ^ val */
+	case BPF_XOR:
+		if (is_imm8(val))
+			EMIT3(0x83, add_1reg(0xF0, dreg), val);
+		else
+			EMIT2(0x31, add_2reg(0xC0, dreg, sreg));
+		break;
+	case BPF_NEG:
+		EMIT2(0xF7, add_1reg(0xD8, dreg));
+		break;
+	}
+
+	if (dstk)
+		/* mov dword ptr [ebp+off],dreg */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg),
+		      STACK_VAR(dst));
+	*pprog = prog;
+}
+
+/* ALU operation (64 bit) */
+static inline void emit_ia32_alu_i64(const bool is64, const u8 op,
+				     const u8 dst[], const u32 val,
+				     bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	u32 hi = 0;
+
+	if (is64 && (val & (1<<31)))
+		hi = (u32)~0;
+
+	emit_ia32_alu_i(is64, false, op, dst_lo, val, dstk, &prog);
+	if (is64)
+		emit_ia32_alu_i(is64, true, op, dst_hi, hi, dstk, &prog);
+	else
+		emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+
+	*pprog = prog;
+}
+
+/* dst = -dst (64 bit) */
+static inline void emit_ia32_neg64(const u8 dst[], bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+
+	/*
+	 * Two's complement negate of the 64-bit pair: negate the low
+	 * word, fold the resulting carry into the high word, then
+	 * negate the high word. The carry must not be clobbered by a
+	 * flag-modifying instruction (such as xor) in between.
+	 */
+
+	/* neg dreg_lo */
+	EMIT2(0xF7, add_1reg(0xD8, dreg_lo));
+	/* adc dreg_hi,0x0 */
+	EMIT3(0x83, add_1reg(0xD0, dreg_hi), 0x00);
+	/* neg dreg_hi */
+	EMIT2(0xF7, add_1reg(0xD8, dreg_hi));
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	*pprog = prog;
+}
+
+/* dst = dst << src */
+static inline void emit_ia32_lsh_r64(const u8 dst[], const u8 src[],
+				     bool dstk, bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	static int jmp_label1 = -1;
+	static int jmp_label2 = -1;
+	static int jmp_label3 = -1;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+
+	if (sstk)
+		/* mov ecx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+		      STACK_VAR(src_lo));
+	else
+		/* mov ecx,src_lo */
+		EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_ECX));
+
+	/* cmp ecx,32 */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 32);
+	/* Jumps when >= 32 */
+	if (is_imm8(jmp_label(jmp_label1, 2)))
+		EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+	else
+		EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label1, 6));
+
+	/* < 32 */
+	/* shl dreg_hi,cl */
+	EMIT2(0xD3, add_1reg(0xE0, dreg_hi));
+	/* mov ebx,dreg_lo */
+	EMIT2(0x8B, add_2reg(0xC0, dreg_lo, IA32_EBX));
+	/* shl dreg_lo,cl */
+	EMIT2(0xD3, add_1reg(0xE0, dreg_lo));
+
+	/* IA32_ECX = -IA32_ECX + 32 */
+	/* neg ecx */
+	EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+	/* add ecx,32 */
+	EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+	/* shr ebx,cl */
+	EMIT2(0xD3, add_1reg(0xE8, IA32_EBX));
+	/* or dreg_hi,ebx */
+	EMIT2(0x09, add_2reg(0xC0, dreg_hi, IA32_EBX));
+
+	/* goto out; */
+	if (is_imm8(jmp_label(jmp_label3, 2)))
+		EMIT2(0xEB, jmp_label(jmp_label3, 2));
+	else
+		EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+	/* >= 32 */
+	if (jmp_label1 == -1)
+		jmp_label1 = cnt;
+
+	/* cmp ecx,64 */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 64);
+	/* Jumps when >= 64 */
+	if (is_imm8(jmp_label(jmp_label2, 2)))
+		EMIT2(IA32_JAE, jmp_label(jmp_label2, 2));
+	else
+		EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label2, 6));
+
+	/* >= 32 && < 64 */
+	/* sub ecx,32 */
+	EMIT3(0x83, add_1reg(0xE8, IA32_ECX), 32);
+	/* shl dreg_lo,cl */
+	EMIT2(0xD3, add_1reg(0xE0, dreg_lo));
+	/* mov dreg_hi,dreg_lo */
+	EMIT2(0x89, add_2reg(0xC0, dreg_hi, dreg_lo));
+
+	/* xor dreg_lo,dreg_lo */
+	EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+
+	/* goto out; */
+	if (is_imm8(jmp_label(jmp_label3, 2)))
+		EMIT2(0xEB, jmp_label(jmp_label3, 2));
+	else
+		EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+	/* >= 64 */
+	if (jmp_label2 == -1)
+		jmp_label2 = cnt;
+	/* xor dreg_lo,dreg_lo */
+	EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+	/* xor dreg_hi,dreg_hi */
+	EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+
+	if (jmp_label3 == -1)
+		jmp_label3 = cnt;
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	/* out: */
+	*pprog = prog;
+}
+
+/* dst = dst >> src (signed) */
+static inline void emit_ia32_arsh_r64(const u8 dst[], const u8 src[],
+				      bool dstk, bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	static int jmp_label1 = -1;
+	static int jmp_label2 = -1;
+	static int jmp_label3 = -1;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+
+	if (sstk)
+		/* mov ecx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+		      STACK_VAR(src_lo));
+	else
+		/* mov ecx,src_lo */
+		EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_ECX));
+
+	/* cmp ecx,32 */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 32);
+	/* Jumps when >= 32 */
+	if (is_imm8(jmp_label(jmp_label1, 2)))
+		EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+	else
+		EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label1, 6));
+
+	/* < 32 */
+	/* shr dreg_lo,cl */
+	EMIT2(0xD3, add_1reg(0xE8, dreg_lo));
+	/* mov ebx,dreg_hi */
+	EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+	/* sar dreg_hi,cl */
+	EMIT2(0xD3, add_1reg(0xF8, dreg_hi));
+
+	/* IA32_ECX = -IA32_ECX + 32 */
+	/* neg ecx */
+	EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+	/* add ecx,32 */
+	EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+	/* shl ebx,cl */
+	EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+	/* or dreg_lo,ebx */
+	EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+
+	/* goto out; */
+	if (is_imm8(jmp_label(jmp_label3, 2)))
+		EMIT2(0xEB, jmp_label(jmp_label3, 2));
+	else
+		EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+	/* >= 32 */
+	if (jmp_label1 == -1)
+		jmp_label1 = cnt;
+
+	/* cmp ecx,64 */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 64);
+	/* Jumps when >= 64 */
+	if (is_imm8(jmp_label(jmp_label2, 2)))
+		EMIT2(IA32_JAE, jmp_label(jmp_label2, 2));
+	else
+		EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label2, 6));
+
+	/* >= 32 && < 64 */
+	/* sub ecx,32 */
+	EMIT3(0x83, add_1reg(0xE8, IA32_ECX), 32);
+	/* sar dreg_hi,cl */
+	EMIT2(0xD3, add_1reg(0xF8, dreg_hi));
+	/* mov dreg_lo,dreg_hi */
+	EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+
+	/* sar dreg_hi,imm8 */
+	EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+
+	/* goto out; */
+	if (is_imm8(jmp_label(jmp_label3, 2)))
+		EMIT2(0xEB, jmp_label(jmp_label3, 2));
+	else
+		EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+	/* >= 64 */
+	if (jmp_label2 == -1)
+		jmp_label2 = cnt;
+	/* sar dreg_hi,imm8 */
+	EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+	/* mov dreg_lo,dreg_hi */
+	EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+
+	if (jmp_label3 == -1)
+		jmp_label3 = cnt;
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	/* out: */
+	*pprog = prog;
+}
+
+/* dst = dst >> src */
+static inline void emit_ia32_rsh_r64(const u8 dst[], const u8 src[], bool dstk,
+				     bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	static int jmp_label1 = -1;
+	static int jmp_label2 = -1;
+	static int jmp_label3 = -1;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+
+	if (sstk)
+		/* mov ecx,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+		      STACK_VAR(src_lo));
+	else
+		/* mov ecx,src_lo */
+		EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_ECX));
+
+	/* cmp ecx,32 */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 32);
+	/* Jumps when >= 32 */
+	if (is_imm8(jmp_label(jmp_label1, 2)))
+		EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+	else
+		EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label1, 6));
+
+	/* < 32 */
+	/* shr dreg_lo,cl */
+	EMIT2(0xD3, add_1reg(0xE8, dreg_lo));
+	/* mov ebx,dreg_hi */
+	EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+	/* shr dreg_hi,cl */
+	EMIT2(0xD3, add_1reg(0xE8, dreg_hi));
+
+	/* IA32_ECX = -IA32_ECX + 32 */
+	/* neg ecx */
+	EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+	/* add ecx,32 */
+	EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+	/* shl ebx,cl */
+	EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+	/* or dreg_lo,ebx */
+	EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+
+	/* goto out; */
+	if (is_imm8(jmp_label(jmp_label3, 2)))
+		EMIT2(0xEB, jmp_label(jmp_label3, 2));
+	else
+		EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+	/* >= 32 */
+	if (jmp_label1 == -1)
+		jmp_label1 = cnt;
+	/* cmp ecx,64 */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), 64);
+	/* Jumps when >= 64 */
+	if (is_imm8(jmp_label(jmp_label2, 2)))
+		EMIT2(IA32_JAE, jmp_label(jmp_label2, 2));
+	else
+		EMIT2_off32(0x0F, IA32_JAE + 0x10, jmp_label(jmp_label2, 6));
+
+	/* >= 32 && < 64 */
+	/* sub ecx,32 */
+	EMIT3(0x83, add_1reg(0xE8, IA32_ECX), 32);
+	/* shr dreg_hi,cl */
+	EMIT2(0xD3, add_1reg(0xE8, dreg_hi));
+	/* mov dreg_lo,dreg_hi */
+	EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+	/* xor dreg_hi,dreg_hi */
+	EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+
+	/* goto out; */
+	if (is_imm8(jmp_label(jmp_label3, 2)))
+		EMIT2(0xEB, jmp_label(jmp_label3, 2));
+	else
+		EMIT1_off32(0xE9, jmp_label(jmp_label3, 5));
+
+	/* >= 64 */
+	if (jmp_label2 == -1)
+		jmp_label2 = cnt;
+	/* xor dreg_lo,dreg_lo */
+	EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+	/* xor dreg_hi,dreg_hi */
+	EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+
+	if (jmp_label3 == -1)
+		jmp_label3 = cnt;
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	/* out: */
+	*pprog = prog;
+}
+
+/* dst = dst << val */
+static inline void emit_ia32_lsh_i64(const u8 dst[], const u32 val,
+				     bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+	/* Do LSH operation */
+	if (val < 32) {
+		/* shl dreg_hi,imm8 */
+		EMIT3(0xC1, add_1reg(0xE0, dreg_hi), val);
+		/* mov ebx,dreg_lo */
+		EMIT2(0x8B, add_2reg(0xC0, dreg_lo, IA32_EBX));
+		/* shl dreg_lo,imm8 */
+		EMIT3(0xC1, add_1reg(0xE0, dreg_lo), val);
+
+		/* IA32_ECX = 32 - val */
+		/* mov cl,val */
+		EMIT2(0xB1, val);
+		/* movzx ecx,cl */
+		EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+		/* neg ecx */
+		EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+		/* add ecx,32 */
+		EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+		/* shr ebx,cl */
+		EMIT2(0xD3, add_1reg(0xE8, IA32_EBX));
+		/* or dreg_hi,ebx */
+		EMIT2(0x09, add_2reg(0xC0, dreg_hi, IA32_EBX));
+	} else if (val >= 32 && val < 64) {
+		u32 value = val - 32;
+
+		/* shl dreg_lo,imm8 */
+		EMIT3(0xC1, add_1reg(0xE0, dreg_lo), value);
+		/* mov dreg_hi,dreg_lo */
+		EMIT2(0x89, add_2reg(0xC0, dreg_hi, dreg_lo));
+		/* xor dreg_lo,dreg_lo */
+		EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+	} else {
+		/* xor dreg_lo,dreg_lo */
+		EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+	}
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	*pprog = prog;
+}
+
+/* dst = dst >> val */
+static inline void emit_ia32_rsh_i64(const u8 dst[], const u32 val,
+				     bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+
+	/* Do RSH operation */
+	if (val < 32) {
+		/* shr dreg_lo,imm8 */
+		EMIT3(0xC1, add_1reg(0xE8, dreg_lo), val);
+		/* mov ebx,dreg_hi */
+		EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+		/* shr dreg_hi,imm8 */
+		EMIT3(0xC1, add_1reg(0xE8, dreg_hi), val);
+
+		/* IA32_ECX = 32 - val */
+		/* mov cl,val */
+		EMIT2(0xB1, val);
+		/* movzx ecx,cl */
+		EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+		/* neg ecx */
+		EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+		/* add ecx,32 */
+		EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+		/* shl ebx,cl */
+		EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+		/* or dreg_lo,ebx */
+		EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+	} else if (val >= 32 && val < 64) {
+		u32 value = val - 32;
+
+		/* shr dreg_hi,imm8 */
+		EMIT3(0xC1, add_1reg(0xE8, dreg_hi), value);
+		/* mov dreg_lo,dreg_hi */
+		EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+	} else {
+		/* xor dreg_lo,dreg_lo */
+		EMIT2(0x33, add_2reg(0xC0, dreg_lo, dreg_lo));
+		/* xor dreg_hi,dreg_hi */
+		EMIT2(0x33, add_2reg(0xC0, dreg_hi, dreg_hi));
+	}
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	*pprog = prog;
+}
+
+/* dst = dst >> val (signed) */
+static inline void emit_ia32_arsh_i64(const u8 dst[], const u32 val,
+				      bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+	u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+
+	if (dstk) {
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+		      STACK_VAR(dst_hi));
+	}
+	/* Do ARSH operation */
+	if (val < 32) {
+		/* shr dreg_lo,imm8 */
+		EMIT3(0xC1, add_1reg(0xE8, dreg_lo), val);
+		/* mov ebx,dreg_hi */
+		EMIT2(0x8B, add_2reg(0xC0, dreg_hi, IA32_EBX));
+		/* sar dreg_hi,imm8 */
+		EMIT3(0xC1, add_1reg(0xF8, dreg_hi), val);
+
+		/* IA32_ECX = 32 - val */
+		/* mov cl,val */
+		EMIT2(0xB1, val);
+		/* movzx ecx,cl */
+		EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+		/* neg ecx */
+		EMIT2(0xF7, add_1reg(0xD8, IA32_ECX));
+		/* add ecx,32 */
+		EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 32);
+
+		/* shl ebx,cl */
+		EMIT2(0xD3, add_1reg(0xE0, IA32_EBX));
+		/* or dreg_lo,ebx */
+		EMIT2(0x09, add_2reg(0xC0, dreg_lo, IA32_EBX));
+	} else if (val >= 32 && val < 64) {
+		u32 value = val - 32;
+
+		/* sar dreg_hi,imm8 */
+		EMIT3(0xC1, add_1reg(0xF8, dreg_hi), value);
+		/* mov dreg_lo,dreg_hi */
+		EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+
+		/* sar dreg_hi,31 */
+		EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+	} else {
+		/* sar dreg_hi,31 */
+		EMIT3(0xC1, add_1reg(0xF8, dreg_hi), 31);
+		/* mov dreg_lo,dreg_hi */
+		EMIT2(0x89, add_2reg(0xC0, dreg_lo, dreg_hi));
+	}
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],dreg_lo */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_lo),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],dreg_hi */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, dreg_hi),
+		      STACK_VAR(dst_hi));
+	}
+	*pprog = prog;
+}
+
+static inline void emit_ia32_mul_r64(const u8 dst[], const u8 src[], bool dstk,
+				     bool sstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_hi));
+	else
+		/* mov eax,dst_hi */
+		EMIT2(0x8B, add_2reg(0xC0, dst_hi, IA32_EAX));
+
+	if (sstk)
+		/* mul dword ptr [ebp+off] */
+		EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(src_lo));
+	else
+		/* mul src_lo */
+		EMIT2(0xF7, add_1reg(0xE0, src_lo));
+
+	/* mov ecx,eax */
+	EMIT2(0x89, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+	else
+		/* mov eax,dst_lo */
+		EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+	if (sstk)
+		/* mul dword ptr [ebp+off] */
+		EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(src_hi));
+	else
+		/* mul src_hi */
+		EMIT2(0xF7, add_1reg(0xE0, src_hi));
+
+	/* add ecx,eax */
+	EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+	if (dstk)
+		/* mov eax,dword ptr [ebp+off] */
+		EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+	else
+		/* mov eax,dst_lo */
+		EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+	if (sstk)
+		/* mul dword ptr [ebp+off] */
+		EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(src_lo));
+	else
+		/* mul src_lo */
+		EMIT2(0xF7, add_1reg(0xE0, src_lo));
+
+	/* add ecx,edx */
+	EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EDX));
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],eax */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],ecx */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_ECX),
+		      STACK_VAR(dst_hi));
+	} else {
+		/* mov dst_lo,eax */
+		EMIT2(0x89, add_2reg(0xC0, dst_lo, IA32_EAX));
+		/* mov dst_hi,ecx */
+		EMIT2(0x89, add_2reg(0xC0, dst_hi, IA32_ECX));
+	}
+
+	*pprog = prog;
+}
+
+static inline void emit_ia32_mul_i64(const u8 dst[], const u32 val,
+				     bool dstk, u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	u32 hi;
+
+	hi = val & (1<<31) ? (u32)~0 : 0;
+	/* movl eax,imm32 */
+	EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EAX), val);
+	if (dstk)
+		/* mul dword ptr [ebp+off] */
+		EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(dst_hi));
+	else
+		/* mul dst_hi */
+		EMIT2(0xF7, add_1reg(0xE0, dst_hi));
+
+	/* mov ecx,eax */
+	EMIT2(0x89, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+	/* movl eax,imm32 */
+	EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EAX), hi);
+	if (dstk)
+		/* mul dword ptr [ebp+off] */
+		EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(dst_lo));
+	else
+		/* mul dst_lo */
+		EMIT2(0xF7, add_1reg(0xE0, dst_lo));
+	/* add ecx,eax */
+	EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EAX));
+
+	/* movl eax,imm32 */
+	EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EAX), val);
+	if (dstk)
+		/* mul dword ptr [ebp+off] */
+		EMIT3(0xF7, add_1reg(0x60, IA32_EBP), STACK_VAR(dst_lo));
+	else
+		/* mul dst_lo */
+		EMIT2(0xF7, add_1reg(0xE0, dst_lo));
+
+	/* add ecx,edx */
+	EMIT2(0x01, add_2reg(0xC0, IA32_ECX, IA32_EDX));
+
+	if (dstk) {
+		/* mov dword ptr [ebp+off],eax */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+		      STACK_VAR(dst_lo));
+		/* mov dword ptr [ebp+off],ecx */
+		EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_ECX),
+		      STACK_VAR(dst_hi));
+	} else {
+		/* mov dst_lo,eax */
+		EMIT2(0x89, add_2reg(0xC0, dst_lo, IA32_EAX));
+		/* mov dst_hi,ecx */
+		EMIT2(0x89, add_2reg(0xC0, dst_hi, IA32_ECX));
+	}
+
+	*pprog = prog;
+}
+
+static int bpf_size_to_x86_bytes(int bpf_size)
+{
+	if (bpf_size == BPF_W)
+		return 4;
+	else if (bpf_size == BPF_H)
+		return 2;
+	else if (bpf_size == BPF_B)
+		return 1;
+	else if (bpf_size == BPF_DW)
+		return 4; /* imm32 */
+	else
+		return 0;
+}
+
+struct jit_context {
+	int cleanup_addr; /* Epilogue code offset */
+};
+
+/* Maximum number of bytes emitted while JITing one eBPF insn */
+#define BPF_MAX_INSN_SIZE	128
+#define BPF_INSN_SAFETY		64
+
+#define PROLOGUE_SIZE 35
+
+/*
+ * Emit prologue code for BPF program and check its size.
+ * bpf_tail_call helper will skip it while jumping into another program.
+ */
+static void emit_prologue(u8 **pprog, u32 stack_depth)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	const u8 *r1 = bpf2ia32[BPF_REG_1];
+	const u8 fplo = bpf2ia32[BPF_REG_FP][0];
+	const u8 fphi = bpf2ia32[BPF_REG_FP][1];
+	const u8 *tcc = bpf2ia32[TCALL_CNT];
+
+	/* push ebp */
+	EMIT1(0x55);
+	/* mov ebp,esp */
+	EMIT2(0x89, 0xE5);
+	/* push edi */
+	EMIT1(0x57);
+	/* push esi */
+	EMIT1(0x56);
+	/* push ebx */
+	EMIT1(0x53);
+
+	/* sub esp,STACK_SIZE */
+	EMIT2_off32(0x81, 0xEC, STACK_SIZE);
+	/* sub ebp,SCRATCH_SIZE+4+12*/
+	EMIT3(0x83, add_1reg(0xE8, IA32_EBP), SCRATCH_SIZE + 16);
+	/* xor ebx,ebx */
+	EMIT2(0x31, add_2reg(0xC0, IA32_EBX, IA32_EBX));
+
+	/* Set up BPF prog stack base register */
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBP), STACK_VAR(fplo));
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(fphi));
+
+	/* Move BPF_CTX (EAX) to BPF_REG_R1 */
+	/* mov dword ptr [ebp+off],eax */
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r1[0]));
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(r1[1]));
+
+	/* Initialize Tail Count */
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[0]));
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[1]));
+
+	BUILD_BUG_ON(cnt != PROLOGUE_SIZE);
+	*pprog = prog;
+}
+
+/* Emit epilogue code for BPF program */
+static void emit_epilogue(u8 **pprog, u32 stack_depth)
+{
+	u8 *prog = *pprog;
+	const u8 *r0 = bpf2ia32[BPF_REG_0];
+	int cnt = 0;
+
+	/* mov eax,dword ptr [ebp+off]*/
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r0[0]));
+	/* mov edx,dword ptr [ebp+off]*/
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(r0[1]));
+
+	/* add ebp,SCRATCH_SIZE+4+12*/
+	EMIT3(0x83, add_1reg(0xC0, IA32_EBP), SCRATCH_SIZE + 16);
+
+	/* mov ebx,dword ptr [ebp-12]*/
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), -12);
+	/* mov esi,dword ptr [ebp-8]*/
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ESI), -8);
+	/* mov edi,dword ptr [ebp-4]*/
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDI), -4);
+
+	EMIT1(0xC9); /* leave */
+	EMIT1(0xC3); /* ret */
+	*pprog = prog;
+}
+
+/*
+ * Generate the following code:
+ * ... bpf_tail_call(void *ctx, struct bpf_array *array, u64 index) ...
+ *   if (index >= array->map.max_entries)
+ *     goto out;
+ *   if (++tail_call_cnt > MAX_TAIL_CALL_CNT)
+ *     goto out;
+ *   prog = array->ptrs[index];
+ *   if (prog == NULL)
+ *     goto out;
+ *   goto *(prog->bpf_func + prologue_size);
+ * out:
+ */
+static void emit_bpf_tail_call(u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	const u8 *r1 = bpf2ia32[BPF_REG_1];
+	const u8 *r2 = bpf2ia32[BPF_REG_2];
+	const u8 *r3 = bpf2ia32[BPF_REG_3];
+	const u8 *tcc = bpf2ia32[TCALL_CNT];
+	u32 lo, hi;
+	static int jmp_label1 = -1;
+
+	/*
+	 * if (index >= array->map.max_entries)
+	 *     goto out;
+	 */
+	/* mov eax,dword ptr [ebp+off] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r2[0]));
+	/* mov edx,dword ptr [ebp+off] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(r3[0]));
+
+	/* cmp dword ptr [eax+off],edx */
+	EMIT3(0x39, add_2reg(0x40, IA32_EAX, IA32_EDX),
+	      offsetof(struct bpf_array, map.max_entries));
+	/* jbe out */
+	EMIT2(IA32_JBE, jmp_label(jmp_label1, 2));
+
+	/*
+	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 *     goto out;
+	 */
+	lo = (u32)MAX_TAIL_CALL_CNT;
+	hi = (u32)((u64)MAX_TAIL_CALL_CNT >> 32);
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(tcc[0]));
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[1]));
+
+	/* cmp ebx,hi */
+	EMIT3(0x83, add_1reg(0xF8, IA32_EBX), hi);
+	EMIT2(IA32_JNE, 3);
+	/* cmp ecx,lo */
+	EMIT3(0x83, add_1reg(0xF8, IA32_ECX), lo);
+
+	/* jae out */
+	EMIT2(IA32_JAE, jmp_label(jmp_label1, 2));
+
+	/* add ecx,0x1 */
+	EMIT3(0x83, add_1reg(0xC0, IA32_ECX), 0x01);
+	/* adc ebx,0x0 */
+	EMIT3(0x83, add_1reg(0xD0, IA32_EBX), 0x00);
+
+	/* mov dword ptr [ebp+off],ecx */
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(tcc[0]));
+	/* mov dword ptr [ebp+off],ebx */
+	EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EBX), STACK_VAR(tcc[1]));
+
+	/* prog = array->ptrs[index]; */
+	/* mov edx, [eax + edx * 4 + offsetof(...)] */
+	EMIT3_off32(0x8B, 0x94, 0x90, offsetof(struct bpf_array, ptrs));
+
+	/*
+	 * if (prog == NULL)
+	 *     goto out;
+	 */
+	/* test edx,edx */
+	EMIT2(0x85, add_2reg(0xC0, IA32_EDX, IA32_EDX));
+	/* je out */
+	EMIT2(IA32_JE, jmp_label(jmp_label1, 2));
+
+	/* goto *(prog->bpf_func + prologue_size); */
+	/* mov edx,dword ptr [edx+offsetof(struct bpf_prog, bpf_func)] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EDX, IA32_EDX),
+	      offsetof(struct bpf_prog, bpf_func));
+	/* add edx,prologue_size */
+	EMIT3(0x83, add_1reg(0xC0, IA32_EDX), PROLOGUE_SIZE);
+
+	/* mov eax,dword ptr [ebp+off] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX), STACK_VAR(r1[0]));
+
+	/*
+	 * Now we're ready to jump into next BPF program:
+	 * eax == ctx (1st arg)
+	 * edx == prog->bpf_func + prologue_size
+	 */
+	RETPOLINE_EDX_BPF_JIT();
+
+	if (jmp_label1 == -1)
+		jmp_label1 = cnt;
+
+	/* out: */
+	*pprog = prog;
+}
+
+/* Push the scratch stack register on top of the stack. */
+static inline void emit_push_r64(const u8 src[], u8 **pprog)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	/* mov ecx,dword ptr [ebp+off] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src_hi));
+	/* push ecx */
+	EMIT1(0x51);
+
+	/* mov ecx,dword ptr [ebp+off] */
+	EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX), STACK_VAR(src_lo));
+	/* push ecx */
+	EMIT1(0x51);
+
+	*pprog = prog;
+}
+
+static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
+		  int oldproglen, struct jit_context *ctx)
+{
+	struct bpf_insn *insn = bpf_prog->insnsi;
+	int insn_cnt = bpf_prog->len;
+	bool seen_exit = false;
+	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
+	int i, cnt = 0;
+	int proglen = 0;
+	u8 *prog = temp;
+
+	emit_prologue(&prog, bpf_prog->aux->stack_depth);
+
+	for (i = 0; i < insn_cnt; i++, insn++) {
+		const s32 imm32 = insn->imm;
+		const bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
+		const bool dstk = insn->dst_reg != BPF_REG_AX;
+		const bool sstk = insn->src_reg != BPF_REG_AX;
+		const u8 code = insn->code;
+		const u8 *dst = bpf2ia32[insn->dst_reg];
+		const u8 *src = bpf2ia32[insn->src_reg];
+		const u8 *r0 = bpf2ia32[BPF_REG_0];
+		s64 jmp_offset;
+		u8 jmp_cond;
+		int ilen;
+		u8 *func;
+
+		switch (code) {
+		/* ALU operations */
+		/* dst = src */
+		case BPF_ALU | BPF_MOV | BPF_K:
+		case BPF_ALU | BPF_MOV | BPF_X:
+		case BPF_ALU64 | BPF_MOV | BPF_K:
+		case BPF_ALU64 | BPF_MOV | BPF_X:
+			switch (BPF_SRC(code)) {
+			case BPF_X:
+				emit_ia32_mov_r64(is64, dst, src, dstk,
+						  sstk, &prog);
+				break;
+			case BPF_K:
+				/* Sign-extend immediate value to dst reg */
+				emit_ia32_mov_i64(is64, dst, imm32,
+						  dstk, &prog);
+				break;
+			}
+			break;
+		/* dst = dst + src/imm */
+		/* dst = dst - src/imm */
+		/* dst = dst | src/imm */
+		/* dst = dst & src/imm */
+		/* dst = dst ^ src/imm */
+		/* dst = dst * src/imm */
+		/* dst = dst << src */
+		/* dst = dst >> src */
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU64 | BPF_ADD | BPF_K:
+		case BPF_ALU64 | BPF_ADD | BPF_X:
+		case BPF_ALU64 | BPF_SUB | BPF_K:
+		case BPF_ALU64 | BPF_SUB | BPF_X:
+		case BPF_ALU64 | BPF_OR | BPF_K:
+		case BPF_ALU64 | BPF_OR | BPF_X:
+		case BPF_ALU64 | BPF_AND | BPF_K:
+		case BPF_ALU64 | BPF_AND | BPF_X:
+		case BPF_ALU64 | BPF_XOR | BPF_K:
+		case BPF_ALU64 | BPF_XOR | BPF_X:
+			switch (BPF_SRC(code)) {
+			case BPF_X:
+				emit_ia32_alu_r64(is64, BPF_OP(code), dst,
+						  src, dstk, sstk, &prog);
+				break;
+			case BPF_K:
+				emit_ia32_alu_i64(is64, BPF_OP(code), dst,
+						  imm32, dstk, &prog);
+				break;
+			}
+			break;
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+			switch (BPF_SRC(code)) {
+			case BPF_X:
+				emit_ia32_mul_r(dst_lo, src_lo, dstk,
+						sstk, &prog);
+				break;
+			case BPF_K:
+				/* mov ecx,imm32*/
+				EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX),
+					    imm32);
+				emit_ia32_mul_r(dst_lo, IA32_ECX, dstk,
+						false, &prog);
+				break;
+			}
+			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			break;
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_ARSH | BPF_K:
+		case BPF_ALU | BPF_ARSH | BPF_X:
+			switch (BPF_SRC(code)) {
+			case BPF_X:
+				emit_ia32_shift_r(BPF_OP(code), dst_lo, src_lo,
+						  dstk, sstk, &prog);
+				break;
+			case BPF_K:
+				/* mov ecx,imm32*/
+				EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX),
+					    imm32);
+				emit_ia32_shift_r(BPF_OP(code), dst_lo,
+						  IA32_ECX, dstk, false,
+						  &prog);
+				break;
+			}
+			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			break;
+		/* dst = dst / src(imm) */
+		/* dst = dst % src(imm) */
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			switch (BPF_SRC(code)) {
+			case BPF_X:
+				emit_ia32_div_mod_r(BPF_OP(code), dst_lo,
+						    src_lo, dstk, sstk, &prog);
+				break;
+			case BPF_K:
+				/* mov ecx,imm32*/
+				EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX),
+					    imm32);
+				emit_ia32_div_mod_r(BPF_OP(code), dst_lo,
+						    IA32_ECX, dstk, false,
+						    &prog);
+				break;
+			}
+			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			break;
+		case BPF_ALU64 | BPF_DIV | BPF_K:
+		case BPF_ALU64 | BPF_DIV | BPF_X:
+		case BPF_ALU64 | BPF_MOD | BPF_K:
+		case BPF_ALU64 | BPF_MOD | BPF_X:
+			goto notyet;
+		/* dst = dst >> imm */
+		/* dst = dst << imm */
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_K:
+			if (unlikely(imm32 > 31))
+				return -EINVAL;
+			/* mov ecx,imm32*/
+			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
+			emit_ia32_shift_r(BPF_OP(code), dst_lo, IA32_ECX, dstk,
+					  false, &prog);
+			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			break;
+		/* dst = dst << imm */
+		case BPF_ALU64 | BPF_LSH | BPF_K:
+			if (unlikely(imm32 > 63))
+				return -EINVAL;
+			emit_ia32_lsh_i64(dst, imm32, dstk, &prog);
+			break;
+		/* dst = dst >> imm */
+		case BPF_ALU64 | BPF_RSH | BPF_K:
+			if (unlikely(imm32 > 63))
+				return -EINVAL;
+			emit_ia32_rsh_i64(dst, imm32, dstk, &prog);
+			break;
+		/* dst = dst << src */
+		case BPF_ALU64 | BPF_LSH | BPF_X:
+			emit_ia32_lsh_r64(dst, src, dstk, sstk, &prog);
+			break;
+		/* dst = dst >> src */
+		case BPF_ALU64 | BPF_RSH | BPF_X:
+			emit_ia32_rsh_r64(dst, src, dstk, sstk, &prog);
+			break;
+		/* dst = dst >> src (signed) */
+		case BPF_ALU64 | BPF_ARSH | BPF_X:
+			emit_ia32_arsh_r64(dst, src, dstk, sstk, &prog);
+			break;
+		/* dst = dst >> imm (signed) */
+		case BPF_ALU64 | BPF_ARSH | BPF_K:
+			if (unlikely(imm32 > 63))
+				return -EINVAL;
+			emit_ia32_arsh_i64(dst, imm32, dstk, &prog);
+			break;
+		/* dst = -dst */
+		case BPF_ALU | BPF_NEG:
+			emit_ia32_alu_i(is64, false, BPF_OP(code),
+					dst_lo, 0, dstk, &prog);
+			emit_ia32_mov_i(dst_hi, 0, dstk, &prog);
+			break;
+		/* dst = -dst (64 bit) */
+		case BPF_ALU64 | BPF_NEG:
+			emit_ia32_neg64(dst, dstk, &prog);
+			break;
+		/* dst = dst * src/imm */
+		case BPF_ALU64 | BPF_MUL | BPF_X:
+		case BPF_ALU64 | BPF_MUL | BPF_K:
+			switch (BPF_SRC(code)) {
+			case BPF_X:
+				emit_ia32_mul_r64(dst, src, dstk, sstk, &prog);
+				break;
+			case BPF_K:
+				emit_ia32_mul_i64(dst, imm32, dstk, &prog);
+				break;
+			}
+			break;
+		/* dst = htole(dst) */
+		case BPF_ALU | BPF_END | BPF_FROM_LE:
+			emit_ia32_to_le_r64(dst, imm32, dstk, &prog);
+			break;
+		/* dst = htobe(dst) */
+		case BPF_ALU | BPF_END | BPF_FROM_BE:
+			emit_ia32_to_be_r64(dst, imm32, dstk, &prog);
+			break;
+		/* dst = imm64 */
+		case BPF_LD | BPF_IMM | BPF_DW: {
+			s32 hi, lo = imm32;
+
+			hi = insn[1].imm;
+			emit_ia32_mov_i(dst_lo, lo, dstk, &prog);
+			emit_ia32_mov_i(dst_hi, hi, dstk, &prog);
+			insn++;
+			i++;
+			break;
+		}
+		/* ST: *(u8*)(dst_reg + off) = imm */
+		case BPF_ST | BPF_MEM | BPF_H:
+		case BPF_ST | BPF_MEM | BPF_B:
+		case BPF_ST | BPF_MEM | BPF_W:
+		case BPF_ST | BPF_MEM | BPF_DW:
+			if (dstk)
+				/* mov eax,dword ptr [ebp+off] */
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(dst_lo));
+			else
+				/* mov eax,dst_lo */
+				EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				EMIT(0xC6, 1); break;
+			case BPF_H:
+				EMIT2(0x66, 0xC7); break;
+			case BPF_W:
+			case BPF_DW:
+				EMIT(0xC7, 1); break;
+			}
+
+			if (is_imm8(insn->off))
+				EMIT2(add_1reg(0x40, IA32_EAX), insn->off);
+			else
+				EMIT1_off32(add_1reg(0x80, IA32_EAX),
+					    insn->off);
+			EMIT(imm32, bpf_size_to_x86_bytes(BPF_SIZE(code)));
+
+			if (BPF_SIZE(code) == BPF_DW) {
+				u32 hi;
+
+				hi = imm32 & (1<<31) ? (u32)~0 : 0;
+				EMIT2_off32(0xC7, add_1reg(0x80, IA32_EAX),
+					    insn->off + 4);
+				EMIT(hi, 4);
+			}
+			break;
+
+		/* STX: *(u8*)(dst_reg + off) = src_reg */
+		case BPF_STX | BPF_MEM | BPF_B:
+		case BPF_STX | BPF_MEM | BPF_H:
+		case BPF_STX | BPF_MEM | BPF_W:
+		case BPF_STX | BPF_MEM | BPF_DW:
+			if (dstk)
+				/* mov eax,dword ptr [ebp+off] */
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(dst_lo));
+			else
+				/* mov eax,dst_lo */
+				EMIT2(0x8B, add_2reg(0xC0, dst_lo, IA32_EAX));
+
+			if (sstk)
+				/* mov edx,dword ptr [ebp+off] */
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+				      STACK_VAR(src_lo));
+			else
+				/* mov edx,src_lo */
+				EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_EDX));
+
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				EMIT(0x88, 1); break;
+			case BPF_H:
+				EMIT2(0x66, 0x89); break;
+			case BPF_W:
+			case BPF_DW:
+				EMIT(0x89, 1); break;
+			}
+
+			if (is_imm8(insn->off))
+				EMIT2(add_2reg(0x40, IA32_EAX, IA32_EDX),
+				      insn->off);
+			else
+				EMIT1_off32(add_2reg(0x80, IA32_EAX, IA32_EDX),
+					    insn->off);
+
+			if (BPF_SIZE(code) == BPF_DW) {
+				if (sstk)
+					/* mov edx,dword ptr [ebp+off] */
+					EMIT3(0x8B, add_2reg(0x40, IA32_EBP,
+							     IA32_EDX),
+					      STACK_VAR(src_hi));
+				else
+					/* mov edx,src_hi */
+					EMIT2(0x8B, add_2reg(0xC0, src_hi,
+							     IA32_EDX));
+				EMIT1(0x89);
+				if (is_imm8(insn->off + 4)) {
+					EMIT2(add_2reg(0x40, IA32_EAX,
+						       IA32_EDX),
+					      insn->off + 4);
+				} else {
+					EMIT1(add_2reg(0x80, IA32_EAX,
+						       IA32_EDX));
+					EMIT(insn->off + 4, 4);
+				}
+			}
+			break;
+
+		/* LDX: dst_reg = *(u8*)(src_reg + off) */
+		case BPF_LDX | BPF_MEM | BPF_B:
+		case BPF_LDX | BPF_MEM | BPF_H:
+		case BPF_LDX | BPF_MEM | BPF_W:
+		case BPF_LDX | BPF_MEM | BPF_DW:
+			if (sstk)
+				/* mov eax,dword ptr [ebp+off] */
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(src_lo));
+			else
+				/* mov eax,src_lo */
+				EMIT2(0x8B, add_2reg(0xC0, src_lo, IA32_EAX));
+
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+				EMIT2(0x0F, 0xB6); break;
+			case BPF_H:
+				EMIT2(0x0F, 0xB7); break;
+			case BPF_W:
+			case BPF_DW:
+				EMIT(0x8B, 1); break;
+			}
+
+			if (is_imm8(insn->off))
+				EMIT2(add_2reg(0x40, IA32_EAX, IA32_EDX),
+				      insn->off);
+			else
+				EMIT1_off32(add_2reg(0x80, IA32_EAX, IA32_EDX),
+					    insn->off);
+
+			if (dstk)
+				/* mov dword ptr [ebp+off],edx */
+				EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+				      STACK_VAR(dst_lo));
+			else
+				/* mov dst_lo,edx */
+				EMIT2(0x89, add_2reg(0xC0, dst_lo, IA32_EDX));
+			switch (BPF_SIZE(code)) {
+			case BPF_B:
+			case BPF_H:
+			case BPF_W:
+				if (dstk) {
+					EMIT3(0xC7, add_1reg(0x40, IA32_EBP),
+					      STACK_VAR(dst_hi));
+					EMIT(0x0, 4);
+				} else {
+					/* mov dst_hi,0 (0xC7 /0 takes an imm32) */
+					EMIT2_off32(0xC7,
+						    add_1reg(0xC0, dst_hi), 0);
+				}
+				break;
+			case BPF_DW:
+				EMIT2_off32(0x8B,
+					    add_2reg(0x80, IA32_EAX, IA32_EDX),
+					    insn->off + 4);
+				if (dstk)
+					EMIT3(0x89,
+					      add_2reg(0x40, IA32_EBP,
+						       IA32_EDX),
+					      STACK_VAR(dst_hi));
+				else
+					EMIT2(0x89,
+					      add_2reg(0xC0, dst_hi, IA32_EDX));
+				break;
+			default:
+				break;
+			}
+			break;
+		/* call */
+		case BPF_JMP | BPF_CALL:
+		{
+			const u8 *r1 = bpf2ia32[BPF_REG_1];
+			const u8 *r2 = bpf2ia32[BPF_REG_2];
+			const u8 *r3 = bpf2ia32[BPF_REG_3];
+			const u8 *r4 = bpf2ia32[BPF_REG_4];
+			const u8 *r5 = bpf2ia32[BPF_REG_5];
+
+			if (insn->src_reg == BPF_PSEUDO_CALL)
+				goto notyet;
+
+			func = (u8 *) __bpf_call_base + imm32;
+			jmp_offset = func - (image + addrs[i]);
+
+			if (!imm32 || !is_simm32(jmp_offset)) {
+				pr_err("unsupported BPF func %d addr %p image %p\n",
+				       imm32, func, image);
+				return -EINVAL;
+			}
+
+			/* mov eax,dword ptr [ebp+off] */
+			EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+			      STACK_VAR(r1[0]));
+			/* mov edx,dword ptr [ebp+off] */
+			EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+			      STACK_VAR(r1[1]));
+
+			emit_push_r64(r5, &prog);
+			emit_push_r64(r4, &prog);
+			emit_push_r64(r3, &prog);
+			emit_push_r64(r2, &prog);
+
+			EMIT1_off32(0xE8, jmp_offset + 9);
+
+			/* mov dword ptr [ebp+off],eax */
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+			      STACK_VAR(r0[0]));
+			/* mov dword ptr [ebp+off],edx */
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+			      STACK_VAR(r0[1]));
+
+			/* add esp,32 */
+			EMIT3(0x83, add_1reg(0xC0, IA32_ESP), 32);
+			break;
+		}
+		case BPF_JMP | BPF_TAIL_CALL:
+			emit_bpf_tail_call(&prog);
+			break;
+
+		/* cond jump */
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JNE | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JLT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_X:
+		case BPF_JMP | BPF_JLE | BPF_X:
+		case BPF_JMP | BPF_JSGT | BPF_X:
+		case BPF_JMP | BPF_JSLE | BPF_X:
+		case BPF_JMP | BPF_JSLT | BPF_X:
+		case BPF_JMP | BPF_JSGE | BPF_X: {
+			u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+			u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+			u8 sreg_lo = sstk ? IA32_ECX : src_lo;
+			u8 sreg_hi = sstk ? IA32_EBX : src_hi;
+
+			if (dstk) {
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(dst_lo));
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+				      STACK_VAR(dst_hi));
+			}
+
+			if (sstk) {
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+				      STACK_VAR(src_lo));
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX),
+				      STACK_VAR(src_hi));
+			}
+
+			/* cmp dreg_hi,sreg_hi */
+			EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi));
+			EMIT2(IA32_JNE, 2);
+			/* cmp dreg_lo,sreg_lo */
+			EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo));
+			goto emit_cond_jmp;
+		}
+		case BPF_JMP | BPF_JSET | BPF_X: {
+			u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+			u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+			u8 sreg_lo = sstk ? IA32_ECX : src_lo;
+			u8 sreg_hi = sstk ? IA32_EBX : src_hi;
+
+			if (dstk) {
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(dst_lo));
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+				      STACK_VAR(dst_hi));
+			}
+
+			if (sstk) {
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_ECX),
+				      STACK_VAR(src_lo));
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX),
+				      STACK_VAR(src_hi));
+			}
+			/* and dreg_lo,sreg_lo */
+			EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
+			/* and dreg_hi,sreg_hi */
+			EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
+			/* or dreg_lo,dreg_hi */
+			EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
+			goto emit_cond_jmp;
+		}
+		case BPF_JMP | BPF_JSET | BPF_K: {
+			u32 hi;
+			u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+			u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+			u8 sreg_lo = IA32_ECX;
+			u8 sreg_hi = IA32_EBX;
+
+			if (dstk) {
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(dst_lo));
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+				      STACK_VAR(dst_hi));
+			}
+			hi = imm32 & (1<<31) ? (u32)~0 : 0;
+
+			/* mov ecx,imm32 */
+			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
+			/* mov ebx,imm32 */
+			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi);
+
+			/* and dreg_lo,sreg_lo */
+			EMIT2(0x23, add_2reg(0xC0, sreg_lo, dreg_lo));
+			/* and dreg_hi,sreg_hi */
+			EMIT2(0x23, add_2reg(0xC0, sreg_hi, dreg_hi));
+			/* or dreg_lo,dreg_hi */
+			EMIT2(0x09, add_2reg(0xC0, dreg_lo, dreg_hi));
+			goto emit_cond_jmp;
+		}
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JNE | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JLT | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JLE | BPF_K:
+		case BPF_JMP | BPF_JSGT | BPF_K:
+		case BPF_JMP | BPF_JSLE | BPF_K:
+		case BPF_JMP | BPF_JSLT | BPF_K:
+		case BPF_JMP | BPF_JSGE | BPF_K: {
+			u32 hi;
+			u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
+			u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+			u8 sreg_lo = IA32_ECX;
+			u8 sreg_hi = IA32_EBX;
+
+			if (dstk) {
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+				      STACK_VAR(dst_lo));
+				EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX),
+				      STACK_VAR(dst_hi));
+			}
+
+			hi = imm32 & (1<<31) ? (u32)~0 : 0;
+			/* mov ecx,imm32 */
+			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_ECX), imm32);
+			/* mov ebx,imm32 */
+			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX), hi);
+
+			/* cmp dreg_hi,sreg_hi */
+			EMIT2(0x39, add_2reg(0xC0, dreg_hi, sreg_hi));
+			EMIT2(IA32_JNE, 2);
+			/* cmp dreg_lo,sreg_lo */
+			EMIT2(0x39, add_2reg(0xC0, dreg_lo, sreg_lo));
+
+emit_cond_jmp:		/* Convert BPF opcode to x86 */
+			switch (BPF_OP(code)) {
+			case BPF_JEQ:
+				jmp_cond = IA32_JE;
+				break;
+			case BPF_JSET:
+			case BPF_JNE:
+				jmp_cond = IA32_JNE;
+				break;
+			case BPF_JGT:
+				/* GT is unsigned '>', JA in x86 */
+				jmp_cond = IA32_JA;
+				break;
+			case BPF_JLT:
+				/* LT is unsigned '<', JB in x86 */
+				jmp_cond = IA32_JB;
+				break;
+			case BPF_JGE:
+				/* GE is unsigned '>=', JAE in x86 */
+				jmp_cond = IA32_JAE;
+				break;
+			case BPF_JLE:
+				/* LE is unsigned '<=', JBE in x86 */
+				jmp_cond = IA32_JBE;
+				break;
+			case BPF_JSGT:
+				/* Signed '>', GT in x86 */
+				jmp_cond = IA32_JG;
+				break;
+			case BPF_JSLT:
+				/* Signed '<', LT in x86 */
+				jmp_cond = IA32_JL;
+				break;
+			case BPF_JSGE:
+				/* Signed '>=', GE in x86 */
+				jmp_cond = IA32_JGE;
+				break;
+			case BPF_JSLE:
+				/* Signed '<=', LE in x86 */
+				jmp_cond = IA32_JLE;
+				break;
+			default: /* to silence GCC warning */
+				return -EFAULT;
+			}
+			jmp_offset = addrs[i + insn->off] - addrs[i];
+			if (is_imm8(jmp_offset)) {
+				EMIT2(jmp_cond, jmp_offset);
+			} else if (is_simm32(jmp_offset)) {
+				EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset);
+			} else {
+				pr_err("cond_jmp gen bug %llx\n", jmp_offset);
+				return -EFAULT;
+			}
+
+			break;
+		}
+		case BPF_JMP | BPF_JA:
+			if (insn->off == -1)
+				/* -1 jmp instructions will always jump
+				 * backwards two bytes. Explicitly handling
+				 * this case avoids wasting too many passes
+				 * when there are long sequences of replaced
+				 * dead code.
+				 */
+				jmp_offset = -2;
+			else
+				jmp_offset = addrs[i + insn->off] - addrs[i];
+
+			if (!jmp_offset)
+				/* Optimize out nop jumps */
+				break;
+emit_jmp:
+			if (is_imm8(jmp_offset)) {
+				EMIT2(0xEB, jmp_offset);
+			} else if (is_simm32(jmp_offset)) {
+				EMIT1_off32(0xE9, jmp_offset);
+			} else {
+				pr_err("jmp gen bug %llx\n", jmp_offset);
+				return -EFAULT;
+			}
+			break;
+
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+		{
+			int size;
+			const u8 *r6 = bpf2ia32[BPF_REG_6];
+
+			/* Setting up first argument */
+			/* mov eax,dword ptr [ebp+off] */
+			EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EAX),
+			      STACK_VAR(r6[0]));
+
+			/* Setting up second argument */
+			if (BPF_MODE(code) == BPF_ABS) {
+				/* mov %edx, imm32 */
+				EMIT1_off32(0xBA, imm32);
+			} else {
+				if (sstk)
+					/* mov edx,dword ptr [ebp+off] */
+					EMIT3(0x8B, add_2reg(0x40, IA32_EBP,
+							     IA32_EDX),
+					      STACK_VAR(src_lo));
+				else
+					/* mov edx,src_lo */
+					EMIT2(0x8B, add_2reg(0xC0, src_lo,
+							     IA32_EDX));
+				if (imm32) {
+					if (is_imm8(imm32))
+						/* add %edx,imm8 */
+						EMIT3(0x83, 0xC2, imm32);
+					else
+						/* add %edx,imm32 */
+						EMIT2_off32(0x81, 0xC2, imm32);
+				}
+			}
+
+			/* Setting up third argument */
+			switch (BPF_SIZE(code)) {
+			case BPF_W:
+				size = 4;
+				break;
+			case BPF_H:
+				size = 2;
+				break;
+			case BPF_B:
+				size = 1;
+				break;
+			default:
+				return -EINVAL;
+			}
+			/* mov cl,size */
+			EMIT2(0xB1, size);
+			/* movzx ecx,cl */
+			EMIT3(0x0F, 0xB6, add_2reg(0xC0, IA32_ECX, IA32_ECX));
+
+			/* mov ebx,ebp */
+			EMIT2(0x8B, add_2reg(0xC0, IA32_EBP, IA32_EBX));
+			/* add %ebx,imm8 */
+			EMIT3(0x83, add_1reg(0xC0, IA32_EBX), SKB_BUFFER);
+			/* push ebx */
+			EMIT1(0x53);
+
+			/* Setting up function pointer to call */
+			/* mov ebx,imm32 */
+			EMIT2_off32(0xC7, add_1reg(0xC0, IA32_EBX),
+				    (unsigned int)bpf_load_pointer);
+
+			EMIT2(0xFF, add_1reg(0xD0, IA32_EBX));
+			/* add %esp,4 */
+			EMIT3(0x83, add_1reg(0xC0, IA32_ESP), 4);
+			/* xor edx,edx */
+			EMIT2(0x33, add_2reg(0xC0, IA32_EDX, IA32_EDX));
+
+			/* mov dword ptr [ebp+off],edx */
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+			      STACK_VAR(r0[0]));
+			/* mov dword ptr [ebp+off],edx */
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EDX),
+			      STACK_VAR(r0[1]));
+
+			/*
+			 * Check whether the returned address is NULL.
+			 * If it is, jump to the epilogue; otherwise continue
+			 * and load the value from the returned address.
+			 */
+			EMIT3(0x83, add_1reg(0xF8, IA32_EAX), 0);
+			jmp_offset = ctx->cleanup_addr - addrs[i];
+
+			switch (BPF_SIZE(code)) {
+			case BPF_W:
+				jmp_offset += 7;
+				break;
+			case BPF_H:
+				jmp_offset += 10;
+				break;
+			case BPF_B:
+				jmp_offset += 6;
+				break;
+			}
+
+			EMIT2_off32(0x0F, IA32_JE + 0x10, jmp_offset);
+			/* Load value from the address */
+			switch (BPF_SIZE(code)) {
+			case BPF_W:
+				/* mov eax,[eax] */
+				EMIT2(0x8B, 0x0);
+				/* Emit 'bswap eax' */
+				EMIT2(0x0F, add_1reg(0xC8, IA32_EAX));
+				break;
+			case BPF_H:
+				EMIT3(0x0F, 0xB7, 0x0);
+				EMIT1(0x66);
+				EMIT3(0xC1, add_1reg(0xC8, IA32_EAX), 8);
+				break;
+			case BPF_B:
+				EMIT3(0x0F, 0xB6, 0x0);
+				break;
+			}
+
+			/* mov dword ptr [ebp+off],eax */
+			EMIT3(0x89, add_2reg(0x40, IA32_EBP, IA32_EAX),
+			      STACK_VAR(r0[0]));
+			break;
+		}
+		/* STX XADD: lock *(u32 *)(dst + off) += src */
+		case BPF_STX | BPF_XADD | BPF_W:
+		/* STX XADD: lock *(u64 *)(dst + off) += src */
+		case BPF_STX | BPF_XADD | BPF_DW:
+			goto notyet;
+		case BPF_JMP | BPF_EXIT:
+			if (seen_exit) {
+				jmp_offset = ctx->cleanup_addr - addrs[i];
+				goto emit_jmp;
+			}
+			seen_exit = true;
+			/* Update cleanup_addr */
+			ctx->cleanup_addr = proglen;
+			emit_epilogue(&prog, bpf_prog->aux->stack_depth);
+			break;
+notyet:
+			pr_info_once("*** NOT YET: opcode %02x ***\n", code);
+			return -EFAULT;
+		default:
+			/*
+			 * This error will be seen if new instruction was added
+			 * to interpreter, but not to JIT or if there is junk in
+			 * bpf_prog
+			 */
+			pr_err("bpf_jit: unknown opcode %02x\n", code);
+			return -EINVAL;
+		}
+
+		ilen = prog - temp;
+		if (ilen > BPF_MAX_INSN_SIZE) {
+			pr_err("bpf_jit: fatal insn size error\n");
+			return -EFAULT;
+		}
+
+		if (image) {
+			if (unlikely(proglen + ilen > oldproglen)) {
+				pr_err("bpf_jit: fatal error\n");
+				return -EFAULT;
+			}
+			memcpy(image + proglen, temp, ilen);
+		}
+		proglen += ilen;
+		addrs[i] = proglen;
+		prog = temp;
+	}
+	return proglen;
+}
+
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
+{
+	struct bpf_binary_header *header = NULL;
+	struct bpf_prog *tmp, *orig_prog = prog;
+	int proglen, oldproglen = 0;
+	struct jit_context ctx = {};
+	bool tmp_blinded = false;
+	u8 *image = NULL;
+	int *addrs;
+	int pass;
+	int i;
+
+	if (!prog->jit_requested)
+		return orig_prog;
+
+	tmp = bpf_jit_blind_constants(prog);
+	/*
+	 * If blinding was requested and we failed during blinding,
+	 * we must fall back to the interpreter.
+	 */
+	if (IS_ERR(tmp))
+		return orig_prog;
+	if (tmp != prog) {
+		tmp_blinded = true;
+		prog = tmp;
+	}
+
+	addrs = kmalloc(prog->len * sizeof(*addrs), GFP_KERNEL);
+	if (!addrs) {
+		prog = orig_prog;
+		goto out;
+	}
+
+	/*
+	 * Before first pass, make a rough estimation of addrs[]
+	 * each BPF instruction is translated to less than 64 bytes
+	 */
+	for (proglen = 0, i = 0; i < prog->len; i++) {
+		proglen += 64;
+		addrs[i] = proglen;
+	}
+	ctx.cleanup_addr = proglen;
+
+	/*
+	 * JITed image shrinks with every pass and the loop iterates
+	 * until the image stops shrinking. Very large BPF programs
+	 * may converge on the last pass. In such case do one more
+	 * pass to emit the final image.
+	 */
+	for (pass = 0; pass < 20 || image; pass++) {
+		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
+		if (proglen <= 0) {
+out_image:
+			image = NULL;
+			if (header)
+				bpf_jit_binary_free(header);
+			prog = orig_prog;
+			goto out_addrs;
+		}
+		if (image) {
+			if (proglen != oldproglen) {
+				pr_err("bpf_jit: proglen=%d != oldproglen=%d\n",
+				       proglen, oldproglen);
+				goto out_image;
+			}
+			break;
+		}
+		if (proglen == oldproglen) {
+			header = bpf_jit_binary_alloc(proglen, &image,
+						      1, jit_fill_hole);
+			if (!header) {
+				prog = orig_prog;
+				goto out_addrs;
+			}
+		}
+		oldproglen = proglen;
+		cond_resched();
+	}
+
+	if (bpf_jit_enable > 1)
+		bpf_jit_dump(prog->len, proglen, pass + 1, image);
+
+	if (image) {
+		bpf_jit_binary_lock_ro(header);
+		prog->bpf_func = (void *)image;
+		prog->jited = 1;
+		prog->jited_len = proglen;
+	} else {
+		prog = orig_prog;
+	}
+
+out_addrs:
+	kfree(addrs);
+out:
+	if (tmp_blinded)
+		bpf_jit_prog_release_other(prog, prog == orig_prog ?
+					   tmp : orig_prog);
+	return prog;
+}
-- 
1.8.5.6.2.g3d8a54e.dirty
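
A note for readers following the emit_jmp path above: the choice between
the two-byte short jump (opcode 0xEB, rel8) and the five-byte near jump
(opcode 0xE9, rel32) can be tried out in userspace. The sketch below is
not part of the patch. encode_jmp(), fits_imm8() and fits_simm32() are
hypothetical stand-ins for the EMIT2()/EMIT1_off32() macros and the
is_imm8()/is_simm32() helpers, and the displacement is assumed to be
relative to the end of the jump instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for is_imm8(): value fits a signed 8-bit field. */
static int fits_imm8(int64_t v)
{
	return v >= INT8_MIN && v <= INT8_MAX;
}

/* Hypothetical stand-in for is_simm32(): value fits a signed 32-bit field. */
static int fits_simm32(int64_t v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/*
 * Encode an unconditional relative jump into buf.
 * Returns the number of bytes written, 0 when the jump is a nop
 * (offset 0, optimized out), or -1 when the offset cannot be encoded
 * or buf is too small.
 */
static int encode_jmp(uint8_t *buf, size_t cap, int64_t off)
{
	uint32_t rel;

	assert(buf != NULL);

	if (off == 0)
		return 0;

	if (fits_imm8(off)) {
		if (cap < 2)
			return -1;
		buf[0] = 0xEB;			/* jmp rel8 */
		buf[1] = (uint8_t)off;
		return 2;
	}

	if (fits_simm32(off)) {
		if (cap < 5)
			return -1;
		rel = (uint32_t)off;
		buf[0] = 0xE9;			/* jmp rel32 */
		buf[1] = rel & 0xFF;		/* little-endian displacement */
		buf[2] = (rel >> 8) & 0xFF;
		buf[3] = (rel >> 16) & 0xFF;
		buf[4] = (rel >> 24) & 0xFF;
		return 5;
	}

	return -1;
}
```

With an offset of -2 the sketch produces the bytes EB FE, a two-byte jump
to itself, which is the case the comment in the patch handles explicitly.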

^ permalink raw reply related

* [PATCH net-next] cxgb4: update latest firmware version supported
From: Ganesh Goudar @ 2018-05-03  6:24 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, venkatesh, Ganesh Goudar

Update t4fw_version.h to bump the latest supported firmware version
number to 1.19.1.0.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
index 123e2c1..4eb15ce 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h
@@ -36,8 +36,8 @@
 #define __T4FW_VERSION_H__
 
 #define T4FW_VERSION_MAJOR 0x01
-#define T4FW_VERSION_MINOR 0x10
-#define T4FW_VERSION_MICRO 0x3F
+#define T4FW_VERSION_MINOR 0x13
+#define T4FW_VERSION_MICRO 0x01
 #define T4FW_VERSION_BUILD 0x00
 
 #define T4FW_MIN_VERSION_MAJOR 0x01
@@ -45,8 +45,8 @@
 #define T4FW_MIN_VERSION_MICRO 0x00
 
 #define T5FW_VERSION_MAJOR 0x01
-#define T5FW_VERSION_MINOR 0x10
-#define T5FW_VERSION_MICRO 0x3F
+#define T5FW_VERSION_MINOR 0x13
+#define T5FW_VERSION_MICRO 0x01
 #define T5FW_VERSION_BUILD 0x00
 
 #define T5FW_MIN_VERSION_MAJOR 0x00
@@ -54,8 +54,8 @@
 #define T5FW_MIN_VERSION_MICRO 0x00
 
 #define T6FW_VERSION_MAJOR 0x01
-#define T6FW_VERSION_MINOR 0x10
-#define T6FW_VERSION_MICRO 0x3F
+#define T6FW_VERSION_MINOR 0x13
+#define T6FW_VERSION_MICRO 0x01
 #define T6FW_VERSION_BUILD 0x00
 
 #define T6FW_MIN_VERSION_MAJOR 0x00
-- 
2.1.0
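
The defines are hexadecimal, so the new MINOR value 0x13 reads as 19 in
the dotted version string. A small userspace sketch of that arithmetic
follows. fw_version_str() and the FW_VERSION_* names are hypothetical and
are not part of the driver; only the numeric values come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values copied from the T4FW_VERSION_* defines after this patch. */
#define FW_VERSION_MAJOR 0x01
#define FW_VERSION_MINOR 0x13
#define FW_VERSION_MICRO 0x01
#define FW_VERSION_BUILD 0x00

/*
 * Hypothetical helper: format the four version fields as a dotted
 * decimal string. Returns whatever snprintf() returns.
 */
static int fw_version_str(char *buf, size_t len, unsigned int major,
			  unsigned int minor, unsigned int micro,
			  unsigned int build)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%u.%u.%u.%u", major, minor, micro, build);
}
```

Passing the old values (0x01, 0x10, 0x3F, 0x00) to the same helper gives
the decimal fields 1, 16, 63 and 0.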

^ permalink raw reply related

* Re: [PATCH net-next v3 2/2] openvswitch: Support conntrack zone limit
From: Pravin Shelar @ 2018-05-03  6:49 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <1525123713-38891-3-git-send-email-yihung.wei@gmail.com>

On Mon, Apr 30, 2018 at 2:28 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> Currently, nf_conntrack_max is used to limit the maximum number of
> conntrack entries in the conntrack table for every network namespace.
> For the VMs and containers that reside in the same namespace,
> they share the same conntrack table, and the total # of conntrack entries
> for all the VMs and containers are limited by nf_conntrack_max.  In this
> case, if one of the VMs/containers abuses the usage of the conntrack entries,
> it blocks the others from committing valid conntrack entries into the
> conntrack table.  Even if we can possibly put the VM in different network
> namespace, the current nf_conntrack_max configuration is kind of rigid
> that we cannot limit different VM/container to have different # conntrack
> entries.
>
> To address the aforementioned issue, this patch proposes to have a
> fine-grained mechanism that could further limit the # of conntrack entries
> per-zone.  For example, we can designate different zone to different VM,
> and set conntrack limit to each zone.  By providing this isolation, a
> mis-behaved VM only consumes the conntrack entries in its own zone, and
> it will not influence other well-behaved VMs.  Moreover, the users can
> set various conntrack limit to different zone based on their preference.
>
> The proposed implementation utilizes Netfilter's nf_conncount backend
> to count the number of connections in a particular zone.  If the number of
> connections is above the configured limit, ovs will return ENOMEM to the
> userspace.  If userspace does not configure the zone limit, the limit
> defaults to zero, which means no limit and is backward compatible with
> the behavior without this patch.
>
> The following high-level APIs are provided to the userspace:
>   - OVS_CT_LIMIT_CMD_SET:
>     * set default connection limit for all zones
>     * set the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_DEL:
>     * remove the connection limit for a particular zone
>   - OVS_CT_LIMIT_CMD_GET:
>     * get the default connection limit for all zones
>     * get the connection limit for a particular zone
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
>  net/openvswitch/Kconfig     |   3 +-
>  net/openvswitch/conntrack.c | 508 +++++++++++++++++++++++++++++++++++++++++++-
>  net/openvswitch/conntrack.h |   9 +-
>  net/openvswitch/datapath.c  |   7 +-
>  net/openvswitch/datapath.h  |   1 +
>  5 files changed, 522 insertions(+), 6 deletions(-)
>
..
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index c5904f629091..8234964889d9 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
...

> +/* Call with ovs_mutex */
> +static void ct_limit_del(const struct ovs_ct_limit_info *info, u16 zone)
> +{
> +       struct ovs_ct_limit *ct_limit;
> +       struct hlist_head *head;
> +
> +       head = ct_limit_hash_bucket(info, zone);
> +       hlist_for_each_entry_rcu(ct_limit, head, hlist_node) {
It would be better to use hlist_for_each_entry_safe() here.

> +               if (ct_limit->zone == zone) {
> +                       hlist_del_rcu(&ct_limit->hlist_node);
> +                       kfree_rcu(ct_limit, rcu);
> +                       return;
> +               }
> +       }
> +}
> +
....

> +static int ovs_ct_check_limit(struct net *net,
> +                             const struct ovs_conntrack_info *info,
> +                             const struct nf_conntrack_tuple *tuple)
> +{
> +       struct ovs_net *ovs_net = net_generic(net, ovs_net_id);
> +       const struct ovs_ct_limit_info *ct_limit_info = ovs_net->ct_limit_info;
> +       u32 per_zone_limit, connections;
> +       u32 conncount_key[5];
> +
> +       conncount_key[0] = info->zone.id;
> +
> +       rcu_read_lock();
This function is called with rcu_read_lock() held in the datapath, so
there is no need to take it again.

> +       per_zone_limit = ct_limit_get(ct_limit_info, info->zone.id);
> +       if (per_zone_limit == OVS_CT_LIMIT_UNLIMITED) {
> +               rcu_read_unlock();
> +               return 0;
> +       }
> +
> +       connections = nf_conncount_count(net, ct_limit_info->data,
> +                                        conncount_key, tuple, &info->zone);
> +       if (connections > per_zone_limit) {
> +               rcu_read_unlock();
> +               return -ENOMEM;
> +       }
> +
> +       rcu_read_unlock();
> +       return 0;
> +}
> +#endif
> +
....

>
>  static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
> @@ -2469,3 +2471,4 @@ MODULE_ALIAS_GENL_FAMILY(OVS_VPORT_FAMILY);
>  MODULE_ALIAS_GENL_FAMILY(OVS_FLOW_FAMILY);
>  MODULE_ALIAS_GENL_FAMILY(OVS_PACKET_FAMILY);
>  MODULE_ALIAS_GENL_FAMILY(OVS_METER_FAMILY);
> +MODULE_ALIAS_GENL_FAMILY(OVS_CT_LIMIT_FAMILY);
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 523d65526766..51bd4dcb6c8b 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -144,6 +144,7 @@ struct dp_upcall_info {
>  struct ovs_net {
>         struct list_head dps;
>         struct work_struct dp_notify_work;
> +       struct ovs_ct_limit_info *ct_limit_info;
>
Let's keep this struct and the hash table inside ovs_net to avoid an
indirection when accessing the hash table. This also needs a check for
IS_ENABLED(CONFIG_NETFILTER_CONNCOUNT).
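
To illustrate the hlist_for_each_entry_safe() suggestion above: the _safe
iterators cache the next pointer before the loop body runs, so the
current entry can be unlinked and freed without breaking the walk. Below
is a minimal userspace sketch of that pattern on a plain singly linked
list. The names zone_limit, zone_limit_add() and zone_limit_del() are
hypothetical, and neither RCU nor the hash buckets are modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-zone limit entry, loosely modelled on ovs_ct_limit. */
struct zone_limit {
	unsigned short zone;
	unsigned int limit;
	struct zone_limit *next;
};

/* Push a new entry at the head of the list. Returns 0, or -1 on OOM. */
static int zone_limit_add(struct zone_limit **head, unsigned short zone,
			  unsigned int limit)
{
	struct zone_limit *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->zone = zone;
	e->limit = limit;
	e->next = *head;
	*head = e;
	return 0;
}

/*
 * Remove every entry for the given zone and return how many were freed.
 * "next" is read before the current entry may be freed, which is the
 * same idea the kernel's _safe list iterators implement.
 */
static int zone_limit_del(struct zone_limit **head, unsigned short zone)
{
	struct zone_limit **link = head;
	struct zone_limit *cur, *next;
	int removed = 0;

	assert(head != NULL);

	for (cur = *head; cur; cur = next) {
		next = cur->next;	/* cache before cur can go away */
		if (cur->zone == zone) {
			*link = next;	/* unlink cur */
			free(cur);
			removed++;
		} else {
			link = &cur->next;
		}
	}
	return removed;
}
```

In the quoted ct_limit_del() the loop returns right after the first
match, so the walk never touches a freed entry; the _safe form mainly
keeps the loop correct if that early return is ever dropped.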

^ permalink raw reply

* Re: DSA switch
From: Ran Shalit @ 2018-05-03  6:50 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180502205620.GE24748@lunn.ch>

On Wed, May 2, 2018 at 11:56 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Wed, May 02, 2018 at 11:20:05PM +0300, Ran Shalit wrote:
>> Hello,
>>
>> Is it possible to use switch just like external real switch,
>> connecting all ports to the same subnet ?
>
> Yes. Just bridge all ports/interfaces together and put your host IP
> address on the bridge.
>
>         Andrew


Hi,

I get an error when trying to add a bridge.
I am trying to understand which configuration is probably missing in my
kernel. I ran strace, but I am not sure: does it point to any missing
configuration?

root@dm814x-evm:~# ip link add br0 type bridge
RTNETLINK answers: Operation not supported

root@dm814x-evm:~# ./strace ip link add br0 type bridge
execve("/bin/ip", ["ip", "link", "add", "br0", "type", "bridge"], [/*
11 vars */]) = 0
brk(0)                                  = 0x44000
uname({sys="Linux", node="dm814x-evm", ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x400c1000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = -1 ENOENT (No such file or directory)
open("/lib/tls/v7l/fast-mult/half/libresolv.so.2", O_RDONLY) = -1
ENOENT (No such file or directory)
stat64("/lib/tls/v7l/fast-mult/half", 0xbe8bb3c0) = -1 ENOENT (No such
file or directory)
open("/lib/tls/v7l/fast-mult/libresolv.so.2", O_RDONLY) = -1 ENOENT
(No such file or directory)
stat64("/lib/tls/v7l/fast-mult", 0xbe8bb3c0) = -1 ENOENT (No such file
or directory)
open("/lib/tls/v7l/half/libresolv.so.2", O_RDONLY) = -1 ENOENT (No
such file or directory)
stat64("/lib/tls/v7l/half", 0xbe8bb3c0) = -1 ENOENT (No such file or directory)
open("/lib/tls/v7l/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such
file or directory)
stat64("/lib/tls/v7l", 0xbe8bb3c0)      = -1 ENOENT (No such file or directory)
open("/lib/tls/fast-mult/half/libresolv.so.2", O_RDONLY) = -1 ENOENT
(No such file or directory)
stat64("/lib/tls/fast-mult/half", 0xbe8bb3c0) = -1 ENOENT (No such
file or directory)
open("/lib/tls/fast-mult/libresolv.so.2", O_RDONLY) = -1 ENOENT (No
such file or directory)
stat64("/lib/tls/fast-mult", 0xbe8bb3c0) = -1 ENOENT (No such file or directory)
open("/lib/tls/half/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such
file or directory)
stat64("/lib/tls/half", 0xbe8bb3c0)     = -1 ENOENT (No such file or directory)
open("/lib/tls/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such file or
directory)
stat64("/lib/tls", 0xbe8bb3c0)          = -1 ENOENT (No such file or directory)
open("/lib/v7l/fast-mult/half/libresolv.so.2", O_RDONLY) = -1 ENOENT
(No such file or directory)
stat64("/lib/v7l/fast-mult/half", 0xbe8bb3c0) = -1 ENOENT (No such
file or directory)
open("/lib/v7l/fast-mult/libresolv.so.2", O_RDONLY) = -1 ENOENT (No
such file or directory)
stat64("/lib/v7l/fast-mult", 0xbe8bb3c0) = -1 ENOENT (No such file or directory)
open("/lib/v7l/half/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such
file or directory)
stat64("/lib/v7l/half", 0xbe8bb3c0)     = -1 ENOENT (No such file or directory)
open("/lib/v7l/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such file or
directory)
stat64("/lib/v7l", 0xbe8bb3c0)          = -1 ENOENT (No such file or directory)
open("/lib/fast-mult/half/libresolv.so.2", O_RDONLY) = -1 ENOENT (No
such file or directory)
stat64("/lib/fast-mult/half", 0xbe8bb3c0) = -1 ENOENT (No such file or
directory)
open("/lib/fast-mult/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such
file or directory)
stat64("/lib/fast-mult", 0xbe8bb3c0)    = -1 ENOENT (No such file or directory)
open("/lib/half/libresolv.so.2", O_RDONLY) = -1 ENOENT (No such file
or directory)
stat64("/lib/half", 0xbe8bb3c0)         = -1 ENOENT (No such file or directory)
open("/lib/libresolv.so.2", O_RDONLY)   = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\234
\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=67624, ...}) = 0
mmap2(NULL, 108588, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x40164000
mprotect(0x40174000, 28672, PROT_NONE)  = 0
mmap2(0x4017b000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xf) = 0x4017b000
mmap2(0x4017d000, 6188, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4017d000
close(3)                                = 0
open("/lib/libdl.so.2", O_RDONLY)       = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0l\n\0\0004\0\0\0"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=18080, ...}) = 0
mmap2(NULL, 49364, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x400b2000
mprotect(0x400b6000, 28672, PROT_NONE)  = 0
mmap2(0x400bd000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x400bd000
close(3)                                = 0
open("/lib/libgcc_s.so.1", O_RDONLY)    = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0x'\0\0004\0\0\0"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=70650, ...}) = 0
mmap2(NULL, 79984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x400d1000
mprotect(0x400dd000, 28672, PROT_NONE)  = 0
mmap2(0x400e4000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb) = 0x400e4000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\240Q\1\0004\0\0\0"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1181160, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x400c2000
mmap2(NULL, 1217096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0x4017f000
mprotect(0x4029c000, 28672, PROT_NONE)  = 0
mmap2(0x402a3000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11c) = 0x402a3000
mmap2(0x402a6000, 8776, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x402a6000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x400e5000
set_tls(0x400e54a0, 0x400e5b70, 0x4002867c, 0x400e5b78, 0x40028050) = 0
mprotect(0x402a3000, 8192, PROT_READ)   = 0
mprotect(0x400bd000, 4096, PROT_READ)   = 0
mprotect(0x4017b000, 4096, PROT_READ)   = 0
mprotect(0x40027000, 4096, PROT_READ)   = 0
socket(PF_NETLINK, SOCK_RAW, 0)         = 3
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=1274, groups=00000000}, [12]) = 0
gettimeofday({1356950670, 688093}, NULL) = 0
send(3, " \0\0\0\20\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
32, 0) = 32
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"4\0\0\0\2\0\0\0\0\0\0\0\372\4\0\0\355\377\377\377
\0\0\0\20\0\5\0\0\0\0\0"..., 8192}], msg_controllen=0, msg_flags=0},
0) = 52
send(3, "\24\0\0\0\22\0\1\3\217l\341P\0\0\0\0\0\0\0\0", 20, 0) = 20
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"\254\1\0\0\20\0\2\0\217l\341P\372\4\0\0\0\0\4\3\1\0\0\0I\0\1\0\0\0\0\0"...,
16384}], msg_controllen=0, msg_flags=0}, 0) = 2664
brk(0)                                  = 0x44000
brk(0x65000)                            = 0x65000
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"\24\0\0\0\3\0\2\0\217l\341P\372\4\0\0\0\0\0\0\1\0\0\0I\0\1\0\0\0\0\0"...,
16384}], msg_controllen=0, msg_flags=0}, 0) = 20
open("/usr/lib//ip/link_bridge.so", O_RDONLY) = -1 ENOENT (No such
file or directory)
sendmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"8\0\0\0\20\0\5\6\220l\341P\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
56}], msg_controllen=0, msg_flags=0}, 0) = 56
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0,
groups=00000000},
msg_iov(1)=[{"L\0\0\0\2\0\0\0\220l\341P\372\4\0\0\241\377\377\3778\0\0\0\20\0\5\6\220l\341P"...,
16384}], msg_controllen=0, msg_flags=0}, 0) = 76
dup(2)                                  = 4
fcntl64(4, F_GETFL)                     = 0x20002 (flags O_RDWR|O_LARGEFILE)
fstat64(4, {st_mode=S_IFCHR|0600, st_rdev=makedev(252, 0), ...}) = 0
ioctl(4, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or
TCGETS, {B115200 opost isig icanon echo ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x4011b000
_llseek(4, 0, 0xbe8b7510, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
write(4, "RTNETLINK answers: Operation not"..., 43RTNETLINK answers:
Operation not supported
) = 43
close(4)                                = 0
munmap(0x4011b000, 4096)                = 0
exit_group(2)                           = ?
root@dm814x-evm:~#


Thank you,
ran

^ permalink raw reply

* Re: [PATCH RFC net-next] net: ipvs: Adjust gso_size for IPPROTO_TCP
From: Martin KaFai Lau @ 2018-05-03  7:01 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: netdev, David Ahern, Tom Herbert, Eric Dumazet, Nikita Shirokov,
	kernel-team, lvs-devel
In-Reply-To: <alpine.LFD.2.20.1805022143360.3301@ja.home.ssi.bg>

On Wed, May 02, 2018 at 10:30:32PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 2 May 2018, Martin KaFai Lau wrote:
> 
> > On Wed, May 02, 2018 at 09:38:43AM +0300, Julian Anastasov wrote:
> > > 
> > > - initial traffic for port 21 does not use GSO. But after
> > > every packet IPVS calls maybe_update_pmtu (rt->dst.ops->update_pmtu)
> > > to report the reduced MTU. These updates are stored in fnhe_pmtu
> > > but they do not go to any route, even if we try to get fresh
> > > output route. Why? Because the local routes are not cached, so
> > > they can not use the fnhe. This is what my patch for route.c
> > > will fix. With this fix FTP-DATA gets route with reduced PMTU.
> > For IPv6, the 'if (rt6->rt6i_flags & RTF_LOCAL)' gate in
> > __ip6_rt_update_pmtu() may need to be lifted also.
> 
> 	Probably. I completely forgot the IPv6 part
> but as I don't know the IPv6 code enough, it may take
> some time to understand what can be the problem there...
> I'm not sure whether everything started with commit 0a6b2a1dc2a2,
> so that in some configurations before that commit things
> worked and problem was not noticed.
> 
> 	I think, we should focus on such direction for IPv6:
> 
> - do we remember per-VIP PMTU for the local routes
IPv6 used not to create a cache route for a DST_HOST route, which
is a /128 route (that includes the local /128 route).

Because of this, it had a bug: a PMTU update for the DST_HOST
route would trigger dst.ops->update_pmtu(), which then set
an expiry on the permanent /128 route instead of on a cache
route.  The permanent route was then unexpectedly expired and
removed later.

The fix, in 653437d02f1f and 7035870d1219, was to allow creating
a /128 cache route as long as it is not RTF_LOCAL.  The
first post spelled out the problem better:
https://patchwork.ozlabs.org/patch/456050/

Later, when 45e4fd26683c changed the code to create a cache route
only after seeing a PMTU, this RTF_LOCAL check was carried over
to __ip6_rt_update_pmtu().

Off the top of my head, I don't see an issue with removing the
RTF_LOCAL check from __ip6_rt_update_pmtu().
DavidA, what do you think?

> 
> - when exactly we start to use the new PMTU, eg. what happens
> in case socket caches the route, whether route is killed via
> dst->obsolete. Or may be while the PMTU expiration is handled
> per-packet, the PMTU change is noticed only on ICMP...
Before sk can reuse its dst cache, the sk will notice
that its dst cache is no longer valid by calling dst_check().
dst_check() should return NULL, which is one of the side
effects of the earlier update_pmtu().  This dst_check()
is usually only called when the sk needs to do output,
so the new PMTU route (i.e. the RTF_CACHE IPv6 route)
only takes effect for later packets.
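
The sequence described above can be modelled in a few lines of userspace
C. This is only a toy model under stated assumptions, not kernel code:
toy_route, toy_sock, toy_update_pmtu(), toy_dst_check() and toy_output()
are hypothetical names, and a generation counter stands in for whatever
the kernel uses to invalidate a cached dst.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
	unsigned int pmtu;	/* current path MTU for this destination */
	unsigned int gen;	/* bumped whenever the PMTU changes */
};

struct toy_sock {
	int valid;		/* nonzero once a route has been cached */
	unsigned int cached_pmtu;
	unsigned int cached_gen;
};

/* Record a new PMTU; as a side effect, older cached copies go stale. */
static void toy_update_pmtu(struct toy_route *rt, unsigned int pmtu)
{
	assert(rt != NULL);
	rt->pmtu = pmtu;
	rt->gen++;
}

/* Returns nonzero while the socket's cached copy is still valid. */
static int toy_dst_check(const struct toy_sock *sk,
			 const struct toy_route *rt)
{
	return sk->valid && sk->cached_gen == rt->gen;
}

/*
 * Output path: validate the cache first and redo the lookup only when
 * the check fails. Returns the PMTU used for this packet.
 */
static unsigned int toy_output(struct toy_sock *sk,
			       const struct toy_route *rt)
{
	if (!toy_dst_check(sk, rt)) {
		sk->cached_pmtu = rt->pmtu;
		sk->cached_gen = rt->gen;
		sk->valid = 1;
	}
	return sk->cached_pmtu;
}
```

In this model the socket keeps the old value until its next
toy_output() call, which mirrors the point that the new PMTU only
applies to later packets.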

> 
> - as IPVS reports the PMTU via dst.ops->update_pmtu() long
> before any large packets are sent, do we propagate the
> PMTU. Also, for IPv4 __ip_rt_update_pmtu() has some protection
> from such per-packet updates that do not change the PMTU.
> 
> - if IPVS starts to send ICMP when gso_size exceeds PMTU,
> like in my draft patch, whether the PMTU is propagated
> to route and then to socket. As for the gso_size decrease,
> playing in IPVS is not very safe, at least, we need help
> from GSO experts to know how we should use it.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/2] bpf: enable stackmap with build_id in nmi context
From: Tobin C. Harding @ 2018-05-03  7:03 UTC (permalink / raw)
  To: Song Liu
  Cc: netdev, kernel-team, qinteng, Alexei Starovoitov, Daniel Borkmann,
	Peter Zijlstra
In-Reply-To: <20180502232030.3788284-2-songliubraving@fb.com>

On Wed, May 02, 2018 at 04:20:29PM -0700, Song Liu wrote:
> Currently, we cannot parse build_id in nmi context because of
> up_read(&current->mm->mmap_sem), this makes stackmap with build_id
> less useful. This patch enables parsing build_id in nmi by putting
> the up_read() call in irq_work. To avoid memory allocation in nmi
> context, we use per cpu variable for the irq_work. As a result, only
> one irq_work per cpu is allowed. If the irq_work is in-use, we
> fallback to only report ips.
> 
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  init/Kconfig          |  1 +
>  kernel/bpf/stackmap.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 54 insertions(+), 6 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index f013afc..480a4f2 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1391,6 +1391,7 @@ config BPF_SYSCALL
>  	bool "Enable bpf() system call"
>  	select ANON_INODES
>  	select BPF
> +	select IRQ_WORK
>  	default n
>  	help
>  	  Enable the bpf() system call that allows to manipulate eBPF
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3ba102b..51d4aea 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -11,6 +11,7 @@
>  #include <linux/perf_event.h>
>  #include <linux/elf.h>
>  #include <linux/pagemap.h>
> +#include <linux/irq_work.h>
>  #include "percpu_freelist.h"
>  
>  #define STACK_CREATE_FLAG_MASK					\
> @@ -32,6 +33,23 @@ struct bpf_stack_map {
>  	struct stack_map_bucket *buckets[];
>  };
>  
> +/* irq_work to run up_read() for build_id lookup in nmi context */
> +struct stack_map_irq_work {
> +	struct irq_work irq_work;
> +	struct rw_semaphore *sem;
> +};
> +
> +static void do_up_read(struct irq_work *entry)
> +{
> +	struct stack_map_irq_work *work = container_of(entry,
> +			struct stack_map_irq_work, irq_work);

perhaps:
	struct stack_map_irq_work *work;

	work = container_of(entry, struct stack_map_irq_work, irq_work);
> +	up_read(work->sem);
> +	work->sem = NULL;
> +}
> +
> +static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);
> +
>  static inline bool stack_map_use_build_id(struct bpf_map *map)
>  {
>  	return (map->map_flags & BPF_F_STACK_BUILD_ID);
> @@ -267,17 +285,27 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  {
>  	int i;
>  	struct vm_area_struct *vma;
> +	bool in_nmi_ctx = in_nmi();
> +	bool irq_work_busy = false;
> +	struct stack_map_irq_work *work;
> +
> +	if (in_nmi_ctx) {
> +		work = this_cpu_ptr(&up_read_work);
> +		if (work->irq_work.flags & IRQ_WORK_BUSY)
> +			/* cannot queue more up_read, fallback */
> +			irq_work_busy = true;
> +	}
>  
>  	/*
> -	 * We cannot do up_read() in nmi context, so build_id lookup is
> -	 * only supported for non-nmi events. If at some point, it is
> -	 * possible to run find_vma() without taking the semaphore, we
> -	 * would like to allow build_id lookup in nmi context.
> +	 * We cannot do up_read() in nmi context. To do build_id lookup
> +	 * in nmi context, we need to run up_read() in irq_work. We use
> +	 * a percpu variable to do the irq_work. If the irq_work is
> +	 * already used by another lookup, we fall back to report ips.
>  	 *
>  	 * Same fallback is used for kernel stack (!user) on a stackmap
>  	 * with build_id.
>  	 */
> -	if (!user || !current || !current->mm || in_nmi() ||
> +	if (!user || !current || !current->mm || irq_work_busy ||
>  	    down_read_trylock(&current->mm->mmap_sem) == 0) {
>  		/* cannot access current->mm, fall back to ips */
>  		for (i = 0; i < trace_nr; i++) {
> @@ -299,7 +327,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>  			- vma->vm_start;
>  		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
>  	}
> -	up_read(&current->mm->mmap_sem);
> +
> +	if (!in_nmi_ctx)
> +		up_read(&current->mm->mmap_sem);
> +	else {

perhaps:
	if (!in_nmi_ctx) {
		up_read(&current->mm->mmap_sem);
	} else {
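
For anyone unfamiliar with the container_of() idiom used in do_up_read():
the callback only receives a pointer to the embedded irq_work, and
container_of() recovers the enclosing structure from it. A minimal
userspace sketch follows. All toy_* names are hypothetical, the macro is
a simplified version without the kernel's type checking, and the extra
leading "cpu" member exists only to make the member offset non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(), without the kernel's type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_irq_work {
	void (*func)(struct toy_irq_work *entry);
};

/* Stand-in for a semaphore: just a reader count. */
struct toy_sem {
	int readers;
};

struct toy_stack_map_irq_work {
	int cpu;			/* hypothetical, see lead-in */
	struct toy_irq_work irq_work;
	struct toy_sem *sem;
};

/* Callback: sees only the embedded member, recovers the outer struct. */
static void toy_do_up_read(struct toy_irq_work *entry)
{
	struct toy_stack_map_irq_work *work;

	work = container_of(entry, struct toy_stack_map_irq_work, irq_work);
	assert(work->sem != NULL);
	work->sem->readers--;		/* models up_read() */
	work->sem = NULL;
}
```

Splitting the declaration from the container_of() assignment, as
suggested above, keeps both lines within the usual line length.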


Hope this helps,
Tobin.

^ permalink raw reply

* Re: DSA switch
From: Jiri Pirko @ 2018-05-03  7:11 UTC (permalink / raw)
  To: Ran Shalit; +Cc: Andrew Lunn, netdev
In-Reply-To: <CAJ2oMhLMSmzUHGSMS6AFhRfWSTYwjTJ1-E7Gsx0Pn9Opmtb5YA@mail.gmail.com>

Thu, May 03, 2018 at 08:50:52AM CEST, ranshalit@gmail.com wrote:
>On Wed, May 2, 2018 at 11:56 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> On Wed, May 02, 2018 at 11:20:05PM +0300, Ran Shalit wrote:
>>> Hello,
>>>
>>> Is it possible to use switch just like external real switch,
>>> connecting all ports to the same subnet ?
>>
>> Yes. Just bridge all ports/interfaces together and put your host IP
>> address on the bridge.
>>
>>         Andrew
>
>
>Hi,
>
>I get an error when trying to add a bridge.
>I am trying to understand which configuration is probably missing in my
>kernel. I ran strace, but I am not sure: does it point to any missing
>configuration?
>
>root@dm814x-evm:~# ip link add br0 type bridge

Is the bridge module enabled in the kernel config?


>RTNETLINK answers: Operation not supported

^ permalink raw reply

* Re: [PATCH v2 bpf-next 2/2] bpf: add selftest for stackmap with build_id in NMI context
From: Tobin C. Harding @ 2018-05-03  7:19 UTC (permalink / raw)
  To: Song Liu; +Cc: netdev, kernel-team, qinteng
In-Reply-To: <20180502232030.3788284-3-songliubraving@fb.com>

On Wed, May 02, 2018 at 04:20:30PM -0700, Song Liu wrote:
> This new test captures stackmap with build_id with hardware event
> PERF_COUNT_HW_CPU_CYCLES.
> 
> Because we only support one ips-to-build_id lookup per cpu in NMI
> context, stack_amap will not be able to do the lookup in this test.

           stack_map ?

> Therefore, we didn't do compare_stack_ips(), as it will always fail.
> 
> urandom_read.c is extended to run configurable cycles so that it can be
> caught by the perf event.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  tools/testing/selftests/bpf/test_progs.c   | 137 +++++++++++++++++++++++++++++
>  tools/testing/selftests/bpf/urandom_read.c |  10 ++-
>  2 files changed, 145 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index aa336f0..00bb08c 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -1272,6 +1272,142 @@ static void test_stacktrace_build_id(void)
>  	return;
>  }
>  
> +static void test_stacktrace_build_id_nmi(void)
> +{
> +	int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
> +	const char *file = "./test_stacktrace_build_id.o";
> +	int err, pmu_fd, prog_fd;
> +	struct perf_event_attr attr = {
> +		.sample_freq = 5000,
> +		.freq = 1,
> +		.type = PERF_TYPE_HARDWARE,
> +		.config = PERF_COUNT_HW_CPU_CYCLES,
> +	};
> +	__u32 key, previous_key, val, duration = 0;
> +	struct bpf_object *obj;
> +	char buf[256];
> +	int i, j;
> +	struct bpf_stack_build_id id_offs[PERF_MAX_STACK_DEPTH];
> +	int build_id_matches = 0;
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
> +	if (CHECK(err, "prog_load", "err %d errno %d\n", err, errno))
> +		goto out;
  		    
perhaps:
		return;

> +	pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
> +			 0 /* cpu 0 */, -1 /* group id */,
> +			 0 /* flags */);
> +	if (CHECK(pmu_fd < 0, "perf_event_open",
> +		  "err %d errno %d. Does the test host support PERF_COUNT_HW_CPU_CYCLES?\n",
> +		  pmu_fd, errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n",
> +		  err, errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n",
> +		  err, errno))
> +		goto disable_pmu;
> +
> +	/* find map fds */
> +	control_map_fd = bpf_find_map(__func__, obj, "control_map");
> +	if (CHECK(control_map_fd < 0, "bpf_find_map control_map",
> +		  "err %d errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	stackid_hmap_fd = bpf_find_map(__func__, obj, "stackid_hmap");
> +	if (CHECK(stackid_hmap_fd < 0, "bpf_find_map stackid_hmap",
> +		  "err %d errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	stackmap_fd = bpf_find_map(__func__, obj, "stackmap");
> +	if (CHECK(stackmap_fd < 0, "bpf_find_map stackmap", "err %d errno %d\n",
> +		  err, errno))
> +		goto disable_pmu;
> +
> +	stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
> +	if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
> +		  "err %d errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
> +	       == 0);
> +	assert(system("taskset 0x1 ./urandom_read 100000") == 0);
> +	/* disable stack trace collection */
> +	key = 0;
> +	val = 1;
> +	bpf_map_update_elem(control_map_fd, &key, &val, 0);
> +
> +	/* for every element in stackid_hmap, we can find a corresponding one
> +	 * in stackmap, and vise versa.
> +	 */
> +	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
> +	if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
> +		  "err %d errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
> +	if (CHECK(err, "compare_map_keys stackmap vs. stackid_hmap",
> +		  "err %d errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	err = extract_build_id(buf, 256);
> +
> +	if (CHECK(err, "get build_id with readelf",
> +		  "err %d errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	err = bpf_map_get_next_key(stackmap_fd, NULL, &key);
> +	if (CHECK(err, "get_next_key from stackmap",
> +		  "err %d, errno %d\n", err, errno))
> +		goto disable_pmu;
> +
> +	do {
> +		char build_id[64];
> +
> +		err = bpf_map_lookup_elem(stackmap_fd, &key, id_offs);
> +		if (CHECK(err, "lookup_elem from stackmap",
> +			  "err %d, errno %d\n", err, errno))
> +			goto disable_pmu;
> +		for (i = 0; i < PERF_MAX_STACK_DEPTH; ++i)
> +			if (id_offs[i].status == BPF_STACK_BUILD_ID_VALID &&
> +			    id_offs[i].offset != 0) {
> +				for (j = 0; j < 20; ++j)
> +					sprintf(build_id + 2 * j, "%02x",
> +						id_offs[i].build_id[j] & 0xff);
> +				if (strstr(buf, build_id) != NULL)
> +					build_id_matches = 1;
> +			}
> +		previous_key = key;
> +	} while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
> +
> +	if (CHECK(build_id_matches < 1, "build id match",
> +		  "Didn't find expected build ID from the map\n"))
> +		goto disable_pmu;
> +
> +	/*
> +	 * We intentionally skip compare_stack_ips(). This is because we
> +	 * only support one in_nmi() ips-to-build_id translation per cpu
> +	 * at any time, thus stack_amap here will always fallback to
> +	 * BPF_STACK_BUILD_ID_IP;
> +	 */
> +
> +disable_pmu:
> +	ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
> +
> +close_pmu:
> +	close(pmu_fd);
> +
> +close_prog:
> +	bpf_object__close(obj);
> +
> +out:
> +	return;
> +}

No real need for label 'out' right?  We can just return directly and
remove the last three lines of this function.

Hope this helps,
Tobin.

^ permalink raw reply

* Re: [RFC V3 PATCH 1/8] vhost: move get_rx_bufs to vhost.c
From: Jason Wang @ 2018-05-03  7:19 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann, wexu
In-Reply-To: <20180502080518.h52wme46fnqpyfpf@debian>



On 2018-05-02 16:05, Tiwei Bie wrote:
> On Mon, Apr 23, 2018 at 01:34:53PM +0800, Jason Wang wrote:
>> Move get_rx_bufs() to vhost.c and rename it to
>> vhost_get_rx_bufs(). This helps to hide vring internal layout from
> A small typo. Based on the code change in this patch, it
> seems that this function is renamed to vhost_get_bufs().
>
> Thanks
>

Right, let me fix it in the next version.

Thanks

^ permalink raw reply

* Re: DSA switch
From: Ran Shalit @ 2018-05-03  7:25 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Andrew Lunn, netdev
In-Reply-To: <20180503071124.GM19250@nanopsycho>

On Thu, May 3, 2018 at 10:11 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> Thu, May 03, 2018 at 08:50:52AM CEST, ranshalit@gmail.com wrote:
>>On Wed, May 2, 2018 at 11:56 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>> On Wed, May 02, 2018 at 11:20:05PM +0300, Ran Shalit wrote:
>>>> Hello,
>>>>
>>>> Is it possible to use switch just like external real switch,
>>>> connecting all ports to the same subnet ?
>>>
>>> Yes. Just bridge all ports/interfaces together and put your host IP
>>> address on the bridge.
>>>
>>>         Andrew
>>
>>
>>Hi,
>>
>>I get error on trying to add bridge.
>>I am trying to understand which configuration is missing probably in my kernel,
>> I ran strace, but not sure , does it point to any missing configuration ?
>>
>>root@dm814x-evm:~# ip link add br0 type bridge
>
> Is the bridge module enabled in the kernel config?
Yes, I've also added all the configuration options listed in
https://www.thelinuxfaq.com/355-rtnetlink-answers-operation-not-supported-on-centos
(we use an old kernel, 2.6.37, which supports TI's chip).

>
>
>>RTNETLINK answers: Operation not supported

I've managed to do it with brctl instead, and it seems to work fine:
ifconfig lan0 0.0.0.0
ifconfig lan1 0.0.0.0
ifconfig lan2 0.0.0.0
ifconfig lan3 0.0.0.0
brctl addbr br0
brctl addif br0 lan0
brctl addif br0 lan1
brctl addif br0 lan2
brctl addif br0 lan3
ifconfig br0 150.42.40.222

Yet the brctl command seems to take time (about a second until it
returns), and we have a requirement for fast boot.
So I wonder why the "ip link add br0 type bridge" command gave those errors.
I also noticed the following in the strace I've pasted here:
open("/usr/lib//ip/link_bridge.so", O_RDONLY) = -1 ENOENT (No such
file or directory)
There really is no such file, /usr/lib//ip/link_bridge.so, in my filesystem.
Why is it missing?


Thank you,
ranran

^ permalink raw reply

* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-05-03  7:25 UTC (permalink / raw)
  To: Tiwei Bie, Michael S. Tsirkin; +Cc: netdev, wexu, linux-kernel, virtualization
In-Reply-To: <20180503020949.5u3qz32gsk33z6vk@debian>



On 2018-05-03 10:09, Tiwei Bie wrote:
>>>> So how about we use the straightforward way then?
>>> You mean we do new += vq->vring_packed.num instead
>>> of event_idx -= vq->vring_packed.num before calling
>>> vring_need_event()?
>>>
>>> The problem is that, the second param (new_idx) of
>>> vring_need_event() will be used for:
>>>
>>> (__u16)(new_idx - event_idx - 1)
>>> (__u16)(new_idx - old)
>>>
>>> So if we change new, we will need to change old too.
>> I think that since we have a branch there anyway,
>> we are better off just special-casing if (wrap_counter != vq->wrap_counter).
>> Treat it differently and avoid casts.
>>
>>> And that would be an ugly hack..
>>>
>>> Best regards,
>>> Tiwei Bie
>> I consider casts and huge numbers with two's complement
>> games even uglier.
> The dependency on two's complement game is introduced
> since the split ring.
>
> In packed ring, old is calculated via:
>
> old = vq->next_avail_idx - vq->num_added;
>
> In split ring, old is calculated via:
>
> old = vq->avail_idx_shadow - vq->num_added;
>
> In both cases, when vq->num_added is bigger, old will
> be a big number.
>
> Best regards,
> Tiwei Bie
>

How about just doing something like vhost:

static u16 vhost_idx_diff(struct vhost_virtqueue *vq, u16 old, u16 new)
{
     if (new > old)
         return new - old;
     return  (new + vq->num - old);
}

static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
                       __u16 event_off, __u16 new,
                       __u16 old)
{
     return (__u16)(vhost_idx_diff(vq, new, event_off) - 1) <
            (__u16)vhost_idx_diff(vq, new, old);
}

?

^ permalink raw reply

* Re: [PATCH net-next v3 1/2] openvswitch: Add conntrack limit netlink definition
From: Pravin Shelar @ 2018-05-03  6:49 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: Linux Kernel Network Developers
In-Reply-To: <1525123713-38891-2-git-send-email-yihung.wei@gmail.com>

On Mon, Apr 30, 2018 at 2:28 PM, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> Define netlink messages and attributes to support user kernel
> communication that uses the conntrack limit feature.
>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
>  include/uapi/linux/openvswitch.h | 62 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
>
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 713e56ce681f..ca63c16375ce 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -937,4 +937,66 @@ enum ovs_meter_band_type {
>
>  #define OVS_METER_BAND_TYPE_MAX (__OVS_METER_BAND_TYPE_MAX - 1)
>
> +/* Conntrack limit */
> +#define OVS_CT_LIMIT_FAMILY  "ovs_ct_limit"
> +#define OVS_CT_LIMIT_MCGROUP "ovs_ct_limit"
> +#define OVS_CT_LIMIT_VERSION 0x1
> +
> +enum ovs_ct_limit_cmd {
> +       OVS_CT_LIMIT_CMD_UNSPEC,
> +       OVS_CT_LIMIT_CMD_SET,           /* Add or modify ct limit. */
> +       OVS_CT_LIMIT_CMD_DEL,           /* Delete ct limit. */
> +       OVS_CT_LIMIT_CMD_GET            /* Get ct limit. */
> +};
> +
> +enum ovs_ct_limit_attr {
> +       OVS_CT_LIMIT_ATTR_UNSPEC,
> +       OVS_CT_LIMIT_ATTR_OPTION,       /* Nested OVS_CT_LIMIT_ATTR_* */
> +       __OVS_CT_LIMIT_ATTR_MAX
> +};
> +
> +#define OVS_CT_LIMIT_ATTR_MAX (__OVS_CT_LIMIT_ATTR_MAX - 1)
> +
> +/**
> + * @OVS_CT_ZONE_LIMIT_ATTR_SET_REQ: Contains either
> + * OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT or a pair of
> + * OVS_CT_ZONE_LIMIT_ATTR_ZONE and OVS_CT_ZONE_LIMIT_ATTR_LIMIT.
> + * @OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ: Contains OVS_CT_ZONE_LIMIT_ATTR_ZONE.
> + * @OVS_CT_ZONE_LIMIT_ATTR_GET_REQ: Contains OVS_CT_ZONE_LIMIT_ATTR_ZONE.
> + * @OVS_CT_ZONE_LIMIT_ATTR_GET_RLY: Contains either
> + * OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT or a triple of
> + * OVS_CT_ZONE_LIMIT_ATTR_ZONE, OVS_CT_ZONE_LIMIT_ATTR_LIMIT and
> + * OVS_CT_ZONE_LIMIT_ATTR_COUNT.
> + */
> +enum ovs_ct_limit_option_attr {
> +       OVS_CT_LIMIT_OPTION_ATTR_UNSPEC,
> +       OVS_CT_ZONE_LIMIT_ATTR_SET_REQ, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
> +                                        * attributes. */
> +       OVS_CT_ZONE_LIMIT_ATTR_DEL_REQ, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
> +                                        * attributes. */
> +       OVS_CT_ZONE_LIMIT_ATTR_GET_REQ, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*
> +                                        * attributes. */
> +       OVS_CT_ZONE_LIMIT_ATTR_GET_RLY, /* Nested OVS_CT_ZONE_LIMIT_ATTR_*

This option looks redundant to me, can we just use ovs_ct_limit_cmd
and have nested attributes with ovs_ct_zone_limit_attr as parameters ?
I do not see need for ovs_ct_limit_attr either, These changes would
simplify the interface.

> +                                        * attributes. */
> +       __OVS_CT_LIMIT_OPTION_ATTR_MAX
> +};
> +
> +#define OVS_CT_LIMIT_OPTION_ATTR_MAX (__OVS_CT_LIMIT_OPTION_ATTR_MAX - 1)
> +
> +enum ovs_ct_zone_limit_attr {
> +       OVS_CT_ZONE_LIMIT_ATTR_UNSPEC,
> +       OVS_CT_ZONE_LIMIT_ATTR_DEFAULT_LIMIT,   /* u32 default conntrack limit
> +                                                * for all zones. */
> +       OVS_CT_ZONE_LIMIT_ATTR_ZONE,            /* u16 conntrack zone id. */
> +       OVS_CT_ZONE_LIMIT_ATTR_LIMIT,           /* u32 max number of conntrack
> +                                                * entries allowed in the
> +                                                * corresponding zone. */
> +       OVS_CT_ZONE_LIMIT_ATTR_COUNT,           /* u32 number of conntrack
> +                                                * entries in the corresponding
> +                                                * zone. */
> +       __OVS_CT_ZONE_LIMIT_ATTR_MAX
> +};
> +
> +#define OVS_CT_ZONE_LIMIT_ATTR_MAX (__OVS_CT_ZONE_LIMIT_ATTR_MAX - 1)
> +
>  #endif /* _LINUX_OPENVSWITCH_H */
> --
> 2.7.4
>

^ permalink raw reply

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Geert Uytterhoeven @ 2018-05-03  7:25 UTC (permalink / raw)
  To: Finn Thain; +Cc: David S. Miller, linux-m68k, netdev, Linux Kernel Mailing List
In-Reply-To: <S1751632AbeECEYA/20180503042400Z+254@vger.kernel.org>

Hi Finn,

On Thu, May 3, 2018 at 6:23 AM, Finn Thain <fthain@telegraphics.com.au> wrote:
> Set the device's coherent_dma_mask to avoid a WARNING splat.
> Please see commit 205e1b7f51e4 ("dma-mapping: warn when there is
> no coherent_dma_mask").
>
> Cc: linux-m68k@lists.linux-m68k.org
> Tested-by: Stan Johnson <userm57@yahoo.com>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

Thanks for your patch!

> --- a/drivers/net/ethernet/apple/macmace.c
> +++ b/drivers/net/ethernet/apple/macmace.c
> @@ -203,6 +203,10 @@ static int mace_probe(struct platform_device *pdev)
>         unsigned char checksum = 0;
>         int err;
>
> +       err = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> +       if (err)
> +               return err;
> +
>         dev = alloc_etherdev(PRIV_BYTES);
>         if (!dev)
>                 return -ENOMEM;

Shouldn't this be handled in the platform code that instantiates the device,
i.e. in arch/m68k/mac/config.c:mac_platform_init()?

Cfr. commit f61e64310b75733d ("m68k: set dma and coherent masks for platform
FEC ethernets").

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net] macsonic: Set platform device coherent_dma_mask
From: Geert Uytterhoeven @ 2018-05-03  7:25 UTC (permalink / raw)
  To: Finn Thain; +Cc: David S. Miller, linux-m68k, netdev, Linux Kernel Mailing List
In-Reply-To: <S1752057AbeECEYP/20180503042418Z+1168@vger.kernel.org>

Hi Finn,

On Thu, May 3, 2018 at 6:24 AM, Finn Thain <fthain@telegraphics.com.au> wrote:
> Set the device's coherent_dma_mask to avoid a WARNING splat.
> Please see commit 205e1b7f51e4 ("dma-mapping: warn when there is
> no coherent_dma_mask").
>
> Cc: linux-m68k@lists.linux-m68k.org
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

Thanks for your patch!

> --- a/drivers/net/ethernet/natsemi/macsonic.c
> +++ b/drivers/net/ethernet/natsemi/macsonic.c
> @@ -523,6 +523,10 @@ static int mac_sonic_platform_probe(struct platform_device *pdev)
>         struct sonic_local *lp;
>         int err;
>
> +       err = dma_coerce_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> +       if (err)
> +               return err;
> +
>         dev = alloc_etherdev(sizeof(struct sonic_local));
>         if (!dev)
>                 return -ENOMEM;

Shouldn't this be handled in the platform code that instantiates the device,
i.e. in arch/m68k/mac/config.c:mac_platform_init()?

Cfr. commit f61e64310b75733d ("m68k: set dma and coherent masks for platform
FEC ethernets").

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* [PATCH net-next 0/2] selftests: forwarding: Two enhancements
From: Ido Schimmel @ 2018-05-03  7:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, petrm, dsahern, mlxsw, Ido Schimmel

First patch increases the maximum deviation in the multipath tests which
proved to be too low in some cases.
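
As a rough illustration of the check being relaxed (a sketch, not the code
in router_multipath.sh): the test compares the relative deviation between
the expected and measured ratios against a threshold. The ratio_within
helper below is hypothetical, and it uses awk where the selftest uses bc,
purely to keep the sketch free of extra dependencies. The 1.00 and 1.11
values are the ones from the ECMP failure quoted in patch 1.

```shell
#!/bin/bash
# Hypothetical helper (not part of lib.sh): succeeds (exit status 0) when
# the measured ratio deviates from the expected ratio by no more than
# max_dev, given as a fraction (0.15 means 15%).
ratio_within()
{
	local expected=$1 measured=$2 max_dev=$3

	awk -v e="$expected" -v m="$measured" -v d="$max_dev" 'BEGIN {
		diff = e - m
		if (diff < 0)
			diff = -diff
		# Non-zero exit status when the relative deviation is too large.
		exit ((diff / e > d) ? 1 : 0)
	}'
}

# Same measurement, old and new thresholds.
if ratio_within 1.00 1.11 0.10; then echo "10%: pass"; else echo "10%: fail"; fi
if ratio_within 1.00 1.11 0.15; then echo "15%: pass"; else echo "15%: fail"; fi
```

With these inputs the relative deviation is 0.11, so the first call reports
a failure at the 10% threshold and the second a pass at 15%.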

Second patch allows user to run only specific tests from each file using
the TESTS environment variable. This granularity is needed in setups
where not all the tests can pass.
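
A minimal sketch of the selection mechanism, assuming each test is wrapped
in a shell function and listed in ALL_TESTS, as the second patch does. The
three test functions here are placeholders that only print their names;
tests_run has the same shape as the helper the patch adds to lib.sh.

```shell
#!/bin/bash
# Placeholder tests for illustration only; the real ones call ping_test,
# learning_test and so on.
ping_ipv4() { echo "ran ping_ipv4"; }
ping_ipv6() { echo "ran ping_ipv6"; }
learning()  { echo "ran learning"; }

ALL_TESTS="ping_ipv4 ping_ipv6 learning"

# Run the tests named in TESTS, or everything in ALL_TESTS when TESTS is
# unset or empty. The expansion is deliberately unquoted so that the list
# is split on whitespace into individual function names.
tests_run()
{
	local current_test

	for current_test in ${TESTS:-$ALL_TESTS}; do
		$current_test
	done
}

unset TESTS
tests_run			# runs all three placeholders

TESTS="ping_ipv6 learning"
tests_run			# runs only the two named placeholders
```

Because the names are invoked as commands, a name in TESTS that does not
match a defined function simply fails with "command not found"; the sketch
does no validation of its own.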

Ido Schimmel (2):
  selftests: forwarding: Increase maximum deviation in multipath test
  selftests: forwarding: Allow running specific tests

 .../selftests/net/forwarding/bridge_vlan_aware.sh  | 26 +++++++++++++---
 .../net/forwarding/bridge_vlan_unaware.sh          | 26 +++++++++++++---
 tools/testing/selftests/net/forwarding/lib.sh      |  9 ++++++
 .../testing/selftests/net/forwarding/mirror_gre.sh | 36 +++++++++++++++++-----
 .../selftests/net/forwarding/mirror_gre_bound.sh   | 23 +++++++++++---
 .../selftests/net/forwarding/mirror_gre_changes.sh | 29 ++++++++++++++---
 .../selftests/net/forwarding/mirror_gre_flower.sh  | 23 +++++++++++---
 .../selftests/net/forwarding/mirror_gre_neigh.sh   | 22 ++++++++++---
 .../selftests/net/forwarding/mirror_gre_nh.sh      |  8 +++--
 tools/testing/selftests/net/forwarding/router.sh   | 14 +++++++--
 .../selftests/net/forwarding/router_multipath.sh   | 17 +++++++---
 .../testing/selftests/net/forwarding/tc_actions.sh | 25 ++++++++++-----
 .../testing/selftests/net/forwarding/tc_chains.sh  |  7 ++---
 .../testing/selftests/net/forwarding/tc_flower.sh  | 14 +++------
 .../selftests/net/forwarding/tc_shblocks.sh        |  5 +--
 15 files changed, 220 insertions(+), 64 deletions(-)

-- 
2.14.3

^ permalink raw reply

* [PATCH net-next 1/2] selftests: forwarding: Increase maximum deviation in multipath test
From: Ido Schimmel @ 2018-05-03  7:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180503075133.17450-1-idosch@mellanox.com>

We sometimes observe failures in the test due to too large a discrepancy
between the measured and expected ratios. For example:

TEST: ECMP                                                          [FAIL]
        Too large discrepancy between expected and measured ratios
        INFO: Expected ratio 1.00 Measured ratio 1.11

Fix this by allowing a deviation of up to 15% between both ratios.

Another possibility is to increase the number of generated flows, but
this will prolong the execution time of the test, which is already quite
high.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/forwarding/router_multipath.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 3bc351008db6..2bd3d41354d0 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -191,7 +191,7 @@ multipath_eval()
        diff=$(echo $weights_ratio - $packets_ratio | bc -l)
        diff=${diff#-}
 
-       test "$(echo "$diff / $weights_ratio > 0.1" | bc -l)" -eq 0
+       test "$(echo "$diff / $weights_ratio > 0.15" | bc -l)" -eq 0
        check_err $? "Too large discrepancy between expected and measured ratios"
        log_test "$desc"
        log_info "Expected ratio $weights_ratio Measured ratio $packets_ratio"
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 2/2] selftests: forwarding: Allow running specific tests
From: Ido Schimmel @ 2018-05-03  7:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, petrm, dsahern, mlxsw, Ido Schimmel
In-Reply-To: <20180503075133.17450-1-idosch@mellanox.com>

Similar to commit a511858c7536 ("selftests: fib_tests: Allow user to run
a specific test"), allow user to run only a subset of the tests using
the TESTS environment variable.

This is useful when not all the tests can pass on a given system.

Example:
# export TESTS="ping_ipv4 ping_ipv6"
# ./bridge_vlan_aware.sh
TEST: ping					[PASS]
TEST: ping6					[PASS]

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../selftests/net/forwarding/bridge_vlan_aware.sh  | 26 +++++++++++++---
 .../net/forwarding/bridge_vlan_unaware.sh          | 26 +++++++++++++---
 tools/testing/selftests/net/forwarding/lib.sh      |  9 ++++++
 .../testing/selftests/net/forwarding/mirror_gre.sh | 36 +++++++++++++++++-----
 .../selftests/net/forwarding/mirror_gre_bound.sh   | 23 +++++++++++---
 .../selftests/net/forwarding/mirror_gre_changes.sh | 29 ++++++++++++++---
 .../selftests/net/forwarding/mirror_gre_flower.sh  | 23 +++++++++++---
 .../selftests/net/forwarding/mirror_gre_neigh.sh   | 22 ++++++++++---
 .../selftests/net/forwarding/mirror_gre_nh.sh      |  8 +++--
 tools/testing/selftests/net/forwarding/router.sh   | 14 +++++++--
 .../selftests/net/forwarding/router_multipath.sh   | 15 +++++++--
 .../testing/selftests/net/forwarding/tc_actions.sh | 25 ++++++++++-----
 .../testing/selftests/net/forwarding/tc_chains.sh  |  7 ++---
 .../testing/selftests/net/forwarding/tc_flower.sh  | 14 +++------
 .../selftests/net/forwarding/tc_shblocks.sh        |  5 +--
 15 files changed, 219 insertions(+), 63 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
index 75d922438bc9..d8313d0438b7 100755
--- a/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_aware.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="ping_ipv4 ping_ipv6 learning flooding"
 NUM_NETIFS=4
 CHECK_TC="yes"
 source lib.sh
@@ -75,14 +76,31 @@ cleanup()
 	vrf_cleanup
 }
 
+ping_ipv4()
+{
+	ping_test $h1 192.0.2.2
+}
+
+ping_ipv6()
+{
+	ping6_test $h1 2001:db8:1::2
+}
+
+learning()
+{
+	learning_test "br0" $swp1 $h1 $h2
+}
+
+flooding()
+{
+	flood_test $swp2 $h1 $h2
+}
+
 trap cleanup EXIT
 
 setup_prepare
 setup_wait
 
-ping_test $h1 192.0.2.2
-ping6_test $h1 2001:db8:1::2
-learning_test "br0" $swp1 $h1 $h2
-flood_test $swp2 $h1 $h2
+tests_run
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
index 1cddf06f691d..c15c6c85c984 100755
--- a/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_vlan_unaware.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="ping_ipv4 ping_ipv6 learning flooding"
 NUM_NETIFS=4
 source lib.sh
 
@@ -73,14 +74,31 @@ cleanup()
 	vrf_cleanup
 }
 
+ping_ipv4()
+{
+	ping_test $h1 192.0.2.2
+}
+
+ping_ipv6()
+{
+	ping6_test $h1 2001:db8:1::2
+}
+
+learning()
+{
+	learning_test "br0" $swp1 $h1 $h2
+}
+
+flooding()
+{
+	flood_test $swp2 $h1 $h2
+}
+
 trap cleanup EXIT
 
 setup_prepare
 setup_wait
 
-ping_test $h1 192.0.2.2
-ping6_test $h1 2001:db8:1::2
-learning_test "br0" $swp1 $h1 $h2
-flood_test $swp2 $h1 $h2
+tests_run
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index a066ca536ac4..061c87bbf77c 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -477,6 +477,15 @@ matchall_sink_create()
 	   action drop
 }
 
+tests_run()
+{
+	local current_test
+
+	for current_test in ${TESTS:-$ALL_TESTS}; do
+		$current_test
+	done
+}
+
 ##############################################################################
 # Tests
 
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre.sh b/tools/testing/selftests/net/forwarding/mirror_gre.sh
index a8abc736f67c..c6786d1b2b96 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre.sh
@@ -10,6 +10,14 @@
 # traffic. Test that the payload is what is expected (ICMP ping request or
 # reply, depending on test).
 
+ALL_TESTS="
+	test_gretap
+	test_ip6gretap
+	test_gretap_mac
+	test_ip6gretap_mac
+	test_two_spans
+"
+
 NUM_NETIFS=6
 source lib.sh
 source mirror_lib.sh
@@ -100,22 +108,36 @@ test_two_spans()
 	log_test "two simultaneously configured mirrors ($tcflags)"
 }
 
-test_all()
+test_gretap()
 {
-	slow_path_trap_install $swp1 ingress
-	slow_path_trap_install $swp1 egress
-
 	full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap"
-	full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
 	full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap"
+}
+
+test_ip6gretap()
+{
+	full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap"
 	full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap"
+}
 
+test_gretap_mac()
+{
 	test_span_gre_mac gt4 ingress ip "mirror to gretap"
-	test_span_gre_mac gt6 ingress ipv6 "mirror to ip6gretap"
 	test_span_gre_mac gt4 egress ip "mirror to gretap"
+}
+
+test_ip6gretap_mac()
+{
+	test_span_gre_mac gt6 ingress ipv6 "mirror to ip6gretap"
 	test_span_gre_mac gt6 egress ipv6 "mirror to ip6gretap"
+}
 
-	test_two_spans
+test_all()
+{
+	slow_path_trap_install $swp1 ingress
+	slow_path_trap_install $swp1 egress
+
+	tests_run
 
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
index 3708ac0f400a..360ca133bead 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_bound.sh
@@ -42,6 +42,11 @@
 # underlay manner, i.e. with a bound dummy device that marks underlay VRF where
 # the encapsulated packed should be routed.
 
+ALL_TESTS="
+	test_gretap
+	test_ip6gretap
+"
+
 NUM_NETIFS=6
 source lib.sh
 source mirror_lib.sh
@@ -178,6 +183,18 @@ cleanup()
 	vrf_cleanup
 }
 
+test_gretap()
+{
+	full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap w/ UL"
+	full_test_span_gre_dir gt4 egress  0 8 "mirror to gretap w/ UL"
+}
+
+test_ip6gretap()
+{
+	full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap w/ UL"
+	full_test_span_gre_dir gt6 egress  0 8 "mirror to ip6gretap w/ UL"
+}
+
 test_all()
 {
 	RET=0
@@ -185,11 +202,7 @@ test_all()
 	slow_path_trap_install $swp1 ingress
 	slow_path_trap_install $swp1 egress
 
-	full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap w/ UL"
-	full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap w/ UL"
-
-	full_test_span_gre_dir gt4 egress  0 8 "mirror to gretap w/ UL"
-	full_test_span_gre_dir gt6 egress  0 8 "mirror to ip6gretap w/ UL"
+	tests_run
 
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
index 0ed288ac76d2..fdb612f69613 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_changes.sh
@@ -7,6 +7,13 @@
 # Test how mirrors to gretap and ip6gretap react to changes to relevant
 # configuration.
 
+ALL_TESTS="
+	test_ttl
+	test_tun_up
+	test_egress_up
+	test_remote_ip
+"
+
 NUM_NETIFS=6
 source lib.sh
 source mirror_lib.sh
@@ -155,22 +162,36 @@ test_span_gre_remote_ip()
 	log_test "$what: remote address change ($tcflags)"
 }
 
-test_all()
+test_ttl()
 {
-	slow_path_trap_install $swp1 ingress
-	slow_path_trap_install $swp1 egress
-
 	test_span_gre_ttl gt4 gretap ip "mirror to gretap"
 	test_span_gre_ttl gt6 ip6gretap ipv6 "mirror to ip6gretap"
+}
 
+test_tun_up()
+{
 	test_span_gre_tun_up gt4 "mirror to gretap"
 	test_span_gre_tun_up gt6 "mirror to ip6gretap"
+}
 
+test_egress_up()
+{
 	test_span_gre_egress_up gt4 192.0.2.130 "mirror to gretap"
 	test_span_gre_egress_up gt6 2001:db8:2::2 "mirror to ip6gretap"
+}
 
+test_remote_ip()
+{
 	test_span_gre_remote_ip gt4 gretap 192.0.2.130 192.0.2.132 "mirror to gretap"
 	test_span_gre_remote_ip gt6 ip6gretap 2001:db8:2::2 2001:db8:2::4 "mirror to ip6gretap"
+}
+
+test_all()
+{
+	slow_path_trap_install $swp1 ingress
+	slow_path_trap_install $swp1 egress
+
+	tests_run
 
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh b/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
index 178a42d771aa..2e54407d8954 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_flower.sh
@@ -10,6 +10,11 @@
 # this address, mirroring takes place, whereas when pinging the other one,
 # there's no mirroring.
 
+ALL_TESTS="
+	test_gretap
+	test_ip6gretap
+"
+
 NUM_NETIFS=6
 source lib.sh
 source mirror_lib.sh
@@ -81,6 +86,18 @@ full_test_span_gre_dir_acl()
 	log_test "$direction $what ($tcflags)"
 }
 
+test_gretap()
+{
+	full_test_span_gre_dir_acl gt4 ingress 8 0 192.0.2.4 "ACL mirror to gretap"
+	full_test_span_gre_dir_acl gt4 egress 0 8 192.0.2.3 "ACL mirror to gretap"
+}
+
+test_ip6gretap()
+{
+	full_test_span_gre_dir_acl gt6 ingress 8 0 192.0.2.4 "ACL mirror to ip6gretap"
+	full_test_span_gre_dir_acl gt6 egress 0 8 192.0.2.3 "ACL mirror to ip6gretap"
+}
+
 test_all()
 {
 	RET=0
@@ -88,11 +105,7 @@ test_all()
 	slow_path_trap_install $swp1 ingress
 	slow_path_trap_install $swp1 egress
 
-	full_test_span_gre_dir_acl gt4 ingress 8 0 192.0.2.4 "ACL mirror to gretap"
-	full_test_span_gre_dir_acl gt6 ingress 8 0 192.0.2.4 "ACL mirror to ip6gretap"
-
-	full_test_span_gre_dir_acl gt4 egress 0 8 192.0.2.3 "ACL mirror to gretap"
-	full_test_span_gre_dir_acl gt6 egress 0 8 192.0.2.3 "ACL mirror to ip6gretap"
+	tests_run
 
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
index 1ca29ba4f338..fc0508e40fca 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_neigh.sh
@@ -9,6 +9,11 @@
 # is set up. Later on, the neighbor is deleted and it is expected to be
 # reinitialized using the usual ARP process, and the mirroring offload updated.
 
+ALL_TESTS="
+	test_gretap
+	test_ip6gretap
+"
+
 NUM_NETIFS=6
 source lib.sh
 source mirror_lib.sh
@@ -69,15 +74,24 @@ test_span_gre_neigh()
 	log_test "$direction $what: neighbor change ($tcflags)"
 }
 
-test_all()
+test_gretap()
 {
-	slow_path_trap_install $swp1 ingress
-	slow_path_trap_install $swp1 egress
-
 	test_span_gre_neigh 192.0.2.130 gt4 ingress "mirror to gretap"
 	test_span_gre_neigh 192.0.2.130 gt4 egress "mirror to gretap"
+}
+
+test_ip6gretap()
+{
 	test_span_gre_neigh 2001:db8:2::2 gt6 ingress "mirror to ip6gretap"
 	test_span_gre_neigh 2001:db8:2::2 gt6 egress "mirror to ip6gretap"
+}
+
+test_all()
+{
+	slow_path_trap_install $swp1 ingress
+	slow_path_trap_install $swp1 egress
+
+	tests_run
 
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
index 9ac70978541f..a0d1ad46a2bc 100755
--- a/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_gre_nh.sh
@@ -7,6 +7,11 @@
 # Test that gretap and ip6gretap mirroring works when the other tunnel endpoint
 # is reachable through a next-hop route (as opposed to directly-attached route).
 
+ALL_TESTS="
+	test_gretap
+	test_ip6gretap
+"
+
 NUM_NETIFS=6
 source lib.sh
 source mirror_lib.sh
@@ -92,8 +97,7 @@ test_all()
 	slow_path_trap_install $swp1 ingress
 	slow_path_trap_install $swp1 egress
 
-	test_gretap
-	test_ip6gretap
+	tests_run
 
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
diff --git a/tools/testing/selftests/net/forwarding/router.sh b/tools/testing/selftests/net/forwarding/router.sh
index cc6a14abfa87..a75cb51cc5bd 100755
--- a/tools/testing/selftests/net/forwarding/router.sh
+++ b/tools/testing/selftests/net/forwarding/router.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="ping_ipv4 ping_ipv6"
 NUM_NETIFS=4
 source lib.sh
 
@@ -114,12 +115,21 @@ cleanup()
 	vrf_cleanup
 }
 
+ping_ipv4()
+{
+	ping_test $h1 198.51.100.2
+}
+
+ping_ipv6()
+{
+	ping6_test $h1 2001:db8:2::2
+}
+
 trap cleanup EXIT
 
 setup_prepare
 setup_wait
 
-ping_test $h1 198.51.100.2
-ping6_test $h1 2001:db8:2::2
+tests_run
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/router_multipath.sh b/tools/testing/selftests/net/forwarding/router_multipath.sh
index 2bd3d41354d0..6c4376289695 100755
--- a/tools/testing/selftests/net/forwarding/router_multipath.sh
+++ b/tools/testing/selftests/net/forwarding/router_multipath.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="ping_ipv4 ping_ipv6 multipath_test"
 NUM_NETIFS=8
 source lib.sh
 
@@ -364,13 +365,21 @@ cleanup()
 	vrf_cleanup
 }
 
+ping_ipv4()
+{
+	ping_test $h1 198.51.100.2
+}
+
+ping_ipv6()
+{
+	ping6_test $h1 2001:db8:2::2
+}
+
 trap cleanup EXIT
 
 setup_prepare
 setup_wait
 
-ping_test $h1 198.51.100.2
-ping6_test $h1 2001:db8:2::2
-multipath_test
+tests_run
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index 3a6385ebd5d0..813d02d1939d 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="gact_drop_and_ok_test mirred_egress_redirect_test \
+	mirred_egress_mirror_test gact_trap_test"
 NUM_NETIFS=4
 source tc_common.sh
 source lib.sh
@@ -111,6 +113,10 @@ gact_trap_test()
 {
 	RET=0
 
+	if [[ "$tcflags" != "skip_sw" ]]; then
+		return 0
+	fi
+
 	tc filter add dev $swp1 ingress protocol ip pref 1 handle 101 flower \
 		skip_hw dst_ip 192.0.2.2 action drop
 	tc filter add dev $swp1 ingress protocol ip pref 3 handle 103 flower \
@@ -179,24 +185,29 @@ cleanup()
 	ip link set $swp1 address $swp1origmac
 }
 
+mirred_egress_redirect_test()
+{
+	mirred_egress_test "redirect"
+}
+
+mirred_egress_mirror_test()
+{
+	mirred_egress_test "mirror"
+}
+
 trap cleanup EXIT
 
 setup_prepare
 setup_wait
 
-gact_drop_and_ok_test
-mirred_egress_test "redirect"
-mirred_egress_test "mirror"
+tests_run
 
 tc_offload_check
 if [[ $? -ne 0 ]]; then
 	log_info "Could not test offloaded functionality"
 else
 	tcflags="skip_sw"
-	gact_drop_and_ok_test
-	mirred_egress_test "redirect"
-	mirred_egress_test "mirror"
-	gact_trap_test
+	tests_run
 fi
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_chains.sh b/tools/testing/selftests/net/forwarding/tc_chains.sh
index 2fd15226974b..d2c783e94df3 100755
--- a/tools/testing/selftests/net/forwarding/tc_chains.sh
+++ b/tools/testing/selftests/net/forwarding/tc_chains.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="unreachable_chain_test gact_goto_chain_test"
 NUM_NETIFS=2
 source tc_common.sh
 source lib.sh
@@ -107,16 +108,14 @@ trap cleanup EXIT
 setup_prepare
 setup_wait
 
-unreachable_chain_test
-gact_goto_chain_test
+tests_run
 
 tc_offload_check
 if [[ $? -ne 0 ]]; then
 	log_info "Could not test offloaded functionality"
 else
 	tcflags="skip_sw"
-	unreachable_chain_test
-	gact_goto_chain_test
+	tests_run
 fi
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_flower.sh b/tools/testing/selftests/net/forwarding/tc_flower.sh
index 0c54059f1875..20d1077e5a3d 100755
--- a/tools/testing/selftests/net/forwarding/tc_flower.sh
+++ b/tools/testing/selftests/net/forwarding/tc_flower.sh
@@ -1,6 +1,8 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="match_dst_mac_test match_src_mac_test match_dst_ip_test \
+	match_src_ip_test match_ip_flags_test"
 NUM_NETIFS=2
 source tc_common.sh
 source lib.sh
@@ -245,22 +247,14 @@ trap cleanup EXIT
 setup_prepare
 setup_wait
 
-match_dst_mac_test
-match_src_mac_test
-match_dst_ip_test
-match_src_ip_test
-match_ip_flags_test
+tests_run
 
 tc_offload_check
 if [[ $? -ne 0 ]]; then
 	log_info "Could not test offloaded functionality"
 else
 	tcflags="skip_sw"
-	match_dst_mac_test
-	match_src_mac_test
-	match_dst_ip_test
-	match_src_ip_test
-	match_ip_flags_test
+	tests_run
 fi
 
 exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tc_shblocks.sh b/tools/testing/selftests/net/forwarding/tc_shblocks.sh
index 077b98048ef4..b5b917203815 100755
--- a/tools/testing/selftests/net/forwarding/tc_shblocks.sh
+++ b/tools/testing/selftests/net/forwarding/tc_shblocks.sh
@@ -1,6 +1,7 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
+ALL_TESTS="shared_block_test"
 NUM_NETIFS=4
 source tc_common.sh
 source lib.sh
@@ -109,14 +110,14 @@ trap cleanup EXIT
 setup_prepare
 setup_wait
 
-shared_block_test
+tests_run
 
 tc_offload_check
 if [[ $? -ne 0 ]]; then
 	log_info "Could not test offloaded functionality"
 else
 	tcflags="skip_sw"
-	shared_block_test
+	tests_run
 fi
 
 exit $EXIT_STATUS
-- 
2.14.3
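
For readers following the conversions above: every script now declares its
test cases in ALL_TESTS and calls tests_run in place of the hand-written list
of calls. The helper is expected to come from lib.sh, which this part of the
patch does not show. The sketch below is only an assumption about how such a
helper could work, not the actual lib.sh code. The TESTS override and the
stub test bodies are hypothetical and are there only to make the example
runnable on its own.

```shell
#!/bin/bash
# Hypothetical stand-in for the tests_run helper that the converted scripts
# expect lib.sh to provide. The real helper is not part of this hunk.

ALL_TESTS="ping_ipv4 ping_ipv6"

tests_run()
{
	local current_test

	# Assumption: a TESTS variable, when set and non-empty, selects a subset;
	# otherwise every name listed in ALL_TESTS is called in order.
	# The expansion is left unquoted on purpose so it splits on whitespace.
	for current_test in ${TESTS:-$ALL_TESTS}; do
		$current_test
	done
}

# Stub test cases that only record that they ran. The real ones in router.sh
# call ping_test and ping6_test.
RAN=""
ping_ipv4() { RAN="$RAN ping_ipv4"; }
ping_ipv6() { RAN="$RAN ping_ipv6"; }

tests_run
echo "ran:$RAN"
```

Under that assumption, the tc_*.sh scripts can call tests_run a second time
after setting tcflags="skip_sw", and each test function reads tcflags itself.
That would be why gact_trap_test gains an early return when tcflags is not
"skip_sw": it is now listed in ALL_TESTS and would otherwise also run in the
first, non-offloaded pass.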

^ permalink raw reply related

