Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: Jesper Dangaard Brouer @ 2018-05-07 13:35 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend,
	brouer
In-Reply-To: <20180504025432.23451-9-dsahern@gmail.com>


On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 6877426c23a6..cf0d27acf1d1 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
[...]
> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
> +	.func		= bpf_xdp_fib_lookup,
> +	.gpl_only	= true,

Is it a deliberate choice to require BPF-progs using this helper to be
GPL licensed?

Asking as this seems to be the first network related helper with this
requirement, while this is typical for tracing related helpers.


> +	.ret_type	= RET_INTEGER,
> +	.arg1_type      = ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_PTR_TO_MEM,
> +	.arg3_type      = ARG_CONST_SIZE,
> +	.arg4_type	= ARG_ANYTHING,
> +};

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net-next] net-next/hinic: add pci device ids for 25ge and 100ge card
From: Zhao Chen @ 2018-05-07 13:21 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, aviad.krawczyk, zhaochen6, tony.qu,
	yin.yinshi, luoshaokai

This patch adds PCI device IDs to support 25GE and 100GE card:

1. Add device id 0x0201 for HINIC 100GE dual port card.
2. Add device id 0x0200 for HINIC 25GE dual port card.
3. Macro of device id 0x1822 is modified for HINIC 25GE quad port card.

Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index eb53bd93065e..5b122728dcb4 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -51,7 +51,9 @@ static unsigned int rx_weight = 64;
 module_param(rx_weight, uint, 0644);
 MODULE_PARM_DESC(rx_weight, "Number Rx packets for NAPI budget (default=64)");
 
-#define PCI_DEVICE_ID_HI1822_PF         0x1822
+#define HINIC_DEV_ID_QUAD_PORT_25GE     0x1822
+#define HINIC_DEV_ID_DUAL_PORT_25GE     0x0200
+#define HINIC_DEV_ID_DUAL_PORT_100GE    0x0201
 
 #define HINIC_WQ_NAME                   "hinic_dev"
 
@@ -1097,7 +1099,9 @@ static void hinic_remove(struct pci_dev *pdev)
 }
 
 static const struct pci_device_id hinic_pci_table[] = {
-	{ PCI_VDEVICE(HUAWEI, PCI_DEVICE_ID_HI1822_PF), 0},
+	{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_QUAD_PORT_25GE), 0},
+	{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_25GE), 0},
+	{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_100GE), 0},
 	{ 0, 0}
 };
 MODULE_DEVICE_TABLE(pci, hinic_pci_table);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Dmitry Vyukov @ 2018-05-07 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Easton, Jason Wang, KVM list, virtualization, netdev, LKML,
	syzkaller-bugs
In-Reply-To: <20180507155534-mutt-send-email-mst@kernel.org>

On Mon, May 7, 2018 at 3:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Apr 27, 2018 at 11:45:02AM -0400, Kevin Easton wrote:
>> The struct vhost_msg within struct vhost_msg_node is copied to userspace,
>> so it should be allocated with kzalloc() to ensure all structure padding
>> is zeroed.
>>
>> Signed-off-by: Kevin Easton <kevin@guarana.org>
>> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
>> ---
>>  drivers/vhost/vhost.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index f3bd8e9..1b84dcff 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
>>  /* Create a new message. */
>>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>>  {
>> -     struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>> +     struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
>>       if (!node)
>>               return NULL;
>>       node->vq = vq;
>
>
> Let's just init the msg though.
>
> OK it seems this is the best we can do for now,
> we need a new feature bit to fix it for 32 bit
> userspace on 64 bit kernels.
>
> Does the following help?

Hi Michael,

You can ask reporter (syzbot) to test:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#kmsan-bugs


> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e9..58d9aec 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2342,6 +2342,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>         struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>         if (!node)
>                 return NULL;
> +
> +       /* Make sure all padding within the structure is initialized. */
> +       memset(&node->msg, 0, sizeof node->msg);
>         node->vq = vq;
>         node->msg.type = type;
>         return node;
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180507155534-mutt-send-email-mst%40kernel.org.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* [PATCH net 2/2] net: aquantia: Limit number of vectors to actually allocated irqs
From: Igor Russkikh @ 2018-05-07 13:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1525360773.git.igor.russkikh@aquantia.com>

Driver should use pci_alloc_irq_vectors return value to correct number
of allocated vectors and napi instances. Otherwise it'll panic later
in pci_irq_vector.

Driver also should allow more than one MSI vectors to be allocated.

Error return path from pci_alloc_irq_vectors is also fixed to revert
resources in a correct sequence when error happens.

Reported-by: Long, Nicholas <nicholas.a.long@baesystems.com>
Fixes: 23ee07a ("net: aquantia: Cleanup pci functions module")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 20 ++++++++++----------
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 720760d..1a1a638 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -95,6 +95,7 @@ void aq_nic_cfg_start(struct aq_nic_s *self)
 	/*rss rings */
 	cfg->vecs = min(cfg->aq_hw_caps->vecs, AQ_CFG_VECS_DEF);
 	cfg->vecs = min(cfg->vecs, num_online_cpus());
+	cfg->vecs = min(cfg->vecs, self->irqvecs);
 	/* cfg->vecs should be power of 2 for RSS */
 	if (cfg->vecs >= 8U)
 		cfg->vecs = 8U;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
index 219b550..faa533a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
@@ -80,6 +80,7 @@ struct aq_nic_s {
 
 	struct pci_dev *pdev;
 	unsigned int msix_entry_mask;
+	u32 irqvecs;
 };
 
 static inline struct device *aq_nic_get_dev(struct aq_nic_s *self)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index ecc6306..a50e08b 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -267,16 +267,16 @@ static int aq_pci_probe(struct pci_dev *pdev,
 	numvecs = min(numvecs, num_online_cpus());
 	/*enable interrupts */
 #if !AQ_CFG_FORCE_LEGACY_INT
-	err = pci_alloc_irq_vectors(self->pdev, numvecs, numvecs,
-				    PCI_IRQ_MSIX);
-
-	if (err < 0) {
-		err = pci_alloc_irq_vectors(self->pdev, 1, 1,
-					    PCI_IRQ_MSI | PCI_IRQ_LEGACY);
-		if (err < 0)
-			goto err_hwinit;
+	numvecs = pci_alloc_irq_vectors(self->pdev, 1, numvecs,
+					PCI_IRQ_MSIX | PCI_IRQ_MSI |
+					PCI_IRQ_LEGACY);
+
+	if (numvecs < 0) {
+		err = numvecs;
+		goto err_hwinit;
 	}
 #endif
+	self->irqvecs = numvecs;
 
 	/* net device init */
 	aq_nic_cfg_start(self);
@@ -298,9 +298,9 @@ static int aq_pci_probe(struct pci_dev *pdev,
 	kfree(self->aq_hw);
 err_ioremap:
 	free_netdev(ndev);
-err_pci_func:
-	pci_release_regions(pdev);
 err_ndev:
+	pci_release_regions(pdev);
+err_pci_func:
 	pci_disable_device(pdev);
 	return err;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 1/2] net: aquantia: driver should correctly declare vlan_features bits
From: Igor Russkikh @ 2018-05-07 13:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1525360773.git.igor.russkikh@aquantia.com>

In particular, not reporting SG forced skbs to be linear for vlan
interfaces over atlantic NIC.

With this fix it is possible to enable SG feature on device and
therefore optimize performance.

Reported-by: Ma Yuying <yuma@redhat.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 32f6d2e..720760d 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -246,6 +246,8 @@ void aq_nic_ndev_init(struct aq_nic_s *self)

 	self->ndev->hw_features |= aq_hw_caps->hw_features;
 	self->ndev->features = aq_hw_caps->hw_features;
+	self->ndev->vlan_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+				     NETIF_F_RXHASH | NETIF_F_SG | NETIF_F_LRO;
 	self->ndev->priv_flags = aq_hw_caps->hw_priv_flags;
 	self->ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

-- 
2.7.4

^ permalink raw reply related

* [PATCH net 0/2] Aquantia various patches 2018-05
From: Igor Russkikh @ 2018-05-07 13:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh

These are two patches covering issues found during test cycles:

First is that driver should declare valid vlan_features
Second fix is about correct allocation of MSI interrupts on some systems.

Igor Russkikh (2):
  net: aquantia: driver should correctly declare vlan_features bits
  net: aquantia: Limit number of vectors to actually allocated irqs

 drivers/net/ethernet/aquantia/atlantic/aq_nic.c      |  3 +++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 20 ++++++++++----------
 3 files changed, 14 insertions(+), 10 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Jesper Dangaard Brouer @ 2018-05-07 13:09 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Alexei Starovoitov, Daniel Borkmann, Björn Töpel,
	Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Willem de Bruijn,
	Michael S. Tsirkin, Network Development, Björn Töpel,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z, brouer
In-Reply-To: <CAJ8uoz2L+3dw5OOHJJ0CyV-y5w1ooNAjKHzHAcAiKhTjAq-vzQ@mail.gmail.com>

On Mon, 7 May 2018 11:13:58 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Sat, May 5, 2018 at 2:34 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:  
> >> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:  
> >> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:  
> >> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:  
> >> >> > From: Björn Töpel <bjorn.topel@intel.com>
> >> >> >
> >> >> > This patch set introduces a new address family called AF_XDP that is
> >> >> > optimized for high performance packet processing and, in upcoming
> >> >> > patch sets, zero-copy semantics. In this patch set, we have removed
> >> >> > all zero-copy related code in order to make it smaller, simpler and
> >> >> > hopefully more review friendly. This patch set only supports copy-mode
> >> >> > for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> >> >> > for RX using the XDP_DRV path. Zero-copy support requires XDP and
> >> >> > driver changes that Jesper Dangaard Brouer is working on. Some of his
> >> >> > work has already been accepted. We will publish our zero-copy support
> >> >> > for RX and TX on top of his patch sets at a later point in time.  
> >> >>
> >> >> +1, would be great to see it land this cycle. Saw few minor nits here
> >> >> and there but nothing to hold it up, for the series:
> >> >>
> >> >> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> >> >>
> >> >> Thanks everyone!  
> >> >
> >> > Great stuff!
> >> >
> >> > Applied to bpf-next, with one condition.
> >> > Upcoming zero-copy patches for both RX and TX need to be posted
> >> > and reviewed within this release window.
> >> > If netdev community as a whole won't be able to agree on the zero-copy
> >> > bits we'd need to revert this feature before the next merge window.  
> >>
> >> Thanks everyone for reviewing this. Highly appreciated.
> >>
> >> Just so we understand the purpose correctly:
> >>
> >> 1: Do you want to see the ZC patches in order to verify that the user
> >> space API holds? If so, we can produce an additional RFC  patch set
> >> using a big chunk of code that we had in RFC V1. We are not proud of
> >> this code since it is clunky, but it hopefully proves the point with
> >> the uapi being the same.
> >>
> >> 2: And/Or are you worried about us all (the netdev community) not
> >> agreeing on a way to implement ZC internally in the drivers and the
> >> XDP infrastructure? This is not going to be possible to finish during
> >> this cycle since we do not like the implementation we had in RFC V1.
> >> Too intrusive and now we also have nicer abstractions from Jesper that
> >> we can use and extend to provide a (hopefully) much cleaner and less
> >> intrusive solution.  
> >
> > short answer: both.
> >
> > Cleanliness and performance of the ZC code is not as important as
> > getting API right. The main concern that during ZC review process
> > we will find out that existing API has issues, so we have to
> > do this exercise before the merge window.
> > And RFC won't fly. Send the patches for real. They have to go
> > through the proper code review. The hackers of netdev community
> > can accept a partial, or a bit unclean, or slightly inefficient
> > implementation, since it can be and will be improved later,
> > but API we cannot change once it goes into official release.
> >
> > Here is the example of API concern:
> > this patch set added shared umem concept. It sounds good in theory,
> > but will it perform well with ZC ? Earlier RFCs didn't have that
> > feature. If it won't perform well than it shouldn't be in the tree.
> > The key reason to let AF_XDP into the tree is its performance promise.
> > If it doesn't perform we should rip it out and redesign.  
> 
> That is a fair point. We will try to produce patch sets for zero-copy
> RX and TX using the latest interfaces within this merge window. Just
> note that we will focus on this for the next week(s) instead of the
> review items that you and Daniel Borkmann submitted. If we get those
> patch sets out in time and we agree that they are a possible way
> forward, then we produce patches with your fixes. It was mainly small
> items, so should be quick.

I would like to see that you create a new xdp_mem_type for this new
zero-copy type. This will allow other XDP redirect methods/types (e.g.
devmap and cpumap) to react appropriately when receiving a zero-copy
frame.

For devmap, I'm hoping we can allow/support using the ndo_xdp_xmit call
without (first) copying (into a newly allocated page).  By arguing that
if an xsk-userspace app modify a frame it's not allowed to, then it is
simply a bug in the program. (Note, this would also allow using
ndo_xdp_xmit call for TX from xsk-userspace).

For cpumap, it is hard to avoid a copy, but I'm hoping we could delay
the copy (and alloc of mem dest area) until on the remote CPU.  This is
already the principle of cpumap; of moving the allocation of the SKB to
the remote CPU.

For ZC to interact with XDP redirect-core and return API, the zero-copy
memory type/allocator, need to provide an area for the xdp_frame data
to be stored in (as we cannot allow using top-of-frame like
non-zero-copy variants), and extend xdp_frame with an ZC umem-id.
I imagine we can avoid any dynamic allocations, as we upfront (at bind
and XDP_UMEM_REG time) know the number of frames.  (e.g. pre-alloc in
xdp_umem_reg() call, and have xdp_umem_get_xdp_frame lookup func).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Michael S. Tsirkin @ 2018-05-07 13:03 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel,
	syzkaller-bugs
In-Reply-To: <20180427154502.GA22544@la.guarana.org>

On Fri, Apr 27, 2018 at 11:45:02AM -0400, Kevin Easton wrote:
> The struct vhost_msg within struct vhost_msg_node is copied to userspace,
> so it should be allocated with kzalloc() to ensure all structure padding
> is zeroed.
> 
> Signed-off-by: Kevin Easton <kevin@guarana.org>
> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
> ---
>  drivers/vhost/vhost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e9..1b84dcff 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
>  /* Create a new message. */
>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>  {
> -	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
> +	struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
>  	if (!node)
>  		return NULL;
>  	node->vq = vq;


Let's just init the msg though.

OK it seems this is the best we can do for now,
we need a new feature bit to fix it for 32 bit
userspace on 64 bit kernels.

Does the following help?

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f3bd8e9..58d9aec 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2342,6 +2342,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
 	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
 	if (!node)
 		return NULL;
+
+	/* Make sure all padding within the structure is initialized. */
+	memset(&node->msg, 0, sizeof node->msg);
 	node->vq = vq;
 	node->msg.type = type;
 	return node;

^ permalink raw reply related

* Re: The SO_BINDTODEVICE was set to the desired interface, but packets are received from all interfaces.
From: Paolo Abeni @ 2018-05-07 12:41 UTC (permalink / raw)
  To: Damir Mansurov, netdev, David Ahern
  Cc: Konstantin Ushakov, Alexandra N. Kossovsky, Andrey Dmitrov
In-Reply-To: <5a61e34b-75c2-0452-d6e2-6e4ea77d5ac2@oktetlabs.ru>

Hi,
On Mon, 2018-05-07 at 13:19 +0300, Damir Mansurov wrote:
> After successful call of the setsockopt(SO_BINDTODEVICE) function to set 
> data reception from only one interface, the data is still received from 
> all interfaces. Function setsockopt() returns 0 but then recv() receives 
> data from all available network interfaces.
> 
> The problem is reproducible on linux kernels 4.14 - 4.16, but it does 
> not on linux kernels 4.4, 4.13.

I think that the cause is commit:

commit fb74c27735f0a34e76dbf1972084e984ad2ea145
Author: David Ahern <dsahern@gmail.com>
Date:   Mon Aug 7 08:44:16 2017 -0700

    net: ipv4: add second dif to udp socket lookups

Something like the following should fix, but I'm unsure it preserves
the intended semathics for 'sdif'. David, can you please have a look?
Thanks!

Paolo
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index dd3102a37ef9..0d593d5c33cf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -401,9 +401,9 @@ static int compute_score(struct sock *sk, struct net *net,
 		bool dev_match = (sk->sk_bound_dev_if == dif ||
 				  sk->sk_bound_dev_if == sdif);

-		if (exact_dif && !dev_match)
+		if (!dev_match)
 			return -1;
-		if (sk->sk_bound_dev_if && dev_match)
+		if (sk->sk_bound_dev_if)
 			score += 4;
 	}

^ permalink raw reply related

* Re: [PATCH 09/18] net: mac80211.h: fix a bad comment line
From: Johannes Berg @ 2018-05-07 12:38 UTC (permalink / raw)
  To: Kalle Valo, Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet, David S. Miller, linux-wireless, netdev
In-Reply-To: <87fu334p5v.fsf@purkki.adurom.net>

On Mon, 2018-05-07 at 15:37 +0300, Kalle Valo wrote:
> Mauro Carvalho Chehab <mchehab+samsung@kernel.org> writes:
> 
> > Sphinx produces a lot of errors like this:
> > 	./include/net/mac80211.h:2083: warning: bad line:  >
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> 
> Randy already submitted a similar patch:
> 
> https://patchwork.kernel.org/patch/10367275/
> 
> But it seems Johannes has not applied that yet.

Yeah, I've been super busy preparing for the plugfest.

I'll make a pass over all the patches as soon as I can, hopefully today
or tomorrow.

johannes

^ permalink raw reply

* Re: [PATCH 09/18] net: mac80211.h: fix a bad comment line
From: Kalle Valo @ 2018-05-07 12:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Linux Doc Mailing List, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet, Johannes Berg, David S. Miller, linux-wireless,
	netdev
In-Reply-To: <1b14ef4f63547c86c544e112b74acfff2b41f4af.1525684985.git.mchehab+samsung@kernel.org>

Mauro Carvalho Chehab <mchehab+samsung@kernel.org> writes:

> Sphinx produces a lot of errors like this:
> 	./include/net/mac80211.h:2083: warning: bad line:  >
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Randy already submitted a similar patch:

https://patchwork.kernel.org/patch/10367275/

But it seems Johannes has not applied that yet.

-- 
Kalle Valo

^ permalink raw reply

* [PATCH net-next] dt-bindings: dsa: Remove unnecessary #address/#size-cells
From: Fabio Estevam @ 2018-05-07 12:17 UTC (permalink / raw)
  To: davem; +Cc: f.fainelli, andrew, robh+dt, netdev, devicetree, Fabio Estevam

From: Fabio Estevam <fabio.estevam@nxp.com>

If the example binding is used on a real dts file, the following DTC
warning is seen with W=1:
    
arch/arm/boot/dts/imx6q-b450v3.dtb: Warning (avoid_unnecessary_addr_size): /mdio-gpio/switch@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property

Remove unnecessary #address-cells/#size-cells to improve the binding
document examples.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index cfe8f64..3ceeb8d 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -82,8 +82,6 @@ linked into one DSA cluster.
 
 	switch0: switch0@0 {
 		compatible = "marvell,mv88e6085";
-		#address-cells = <1>;
-		#size-cells = <0>;
 		reg = <0>;
 
 		dsa,member = <0 0>;
@@ -135,8 +133,6 @@ linked into one DSA cluster.
 
 	switch1: switch1@0 {
 		compatible = "marvell,mv88e6085";
-		#address-cells = <1>;
-		#size-cells = <0>;
 		reg = <0>;
 
 		dsa,member = <0 1>;
@@ -204,8 +200,6 @@ linked into one DSA cluster.
 
 	switch2: switch2@0 {
 		compatible = "marvell,mv88e6085";
-		#address-cells = <1>;
-		#size-cells = <0>;
 		reg = <0>;
 
 		dsa,member = <0 2>;
-- 
2.7.4

^ permalink raw reply related

* [PATCH] trivial: fix inconsistent help texts
From: Georg Hofmann @ 2018-05-07 12:03 UTC (permalink / raw)
  Cc: Georg Hofmann, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Jiri Kosina, netdev, linux-kernel

This patch removes "experimental" from the help text where depends on
CONFIG_EXPERIMENTAL was already removed.

Signed-off-by: Georg Hofmann <georg@hofmannsweb.com>
---
 net/ipv6/Kconfig | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 6794ddf..11e4e80 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -34,16 +34,15 @@ config IPV6_ROUTE_INFO
 	bool "IPv6: Route Information (RFC 4191) support"
 	depends on IPV6_ROUTER_PREF
 	---help---
-	  This is experimental support of Route Information.
+	  Support of Route Information.
 
 	  If unsure, say N.
 
 config IPV6_OPTIMISTIC_DAD
 	bool "IPv6: Enable RFC 4429 Optimistic DAD"
 	---help---
-	  This is experimental support for optimistic Duplicate
-	  Address Detection.  It allows for autoconfigured addresses
-	  to be used more quickly.
+	  Support for optimistic Duplicate Address Detection. It allows for
+	  autoconfigured addresses to be used more quickly.
 
 	  If unsure, say N.
 
@@ -280,7 +279,7 @@ config IPV6_MROUTE
 	depends on IPV6
 	select IP_MROUTE_COMMON
 	---help---
-	  Experimental support for IPv6 multicast forwarding.
+	  Support for IPv6 multicast forwarding.
 	  If unsure, say N.
 
 config IPV6_MROUTE_MULTIPLE_TABLES
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCHv2 net] sctp: delay the authentication for the duplicated cookie-echo chunk
From: Neil Horman @ 2018-05-07 11:38 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Marcelo Ricardo Leitner
In-Reply-To: <2a815adb308826967c84cd8dc371c3c9b054c8da.1525503587.git.lucien.xin@gmail.com>

On Sat, May 05, 2018 at 02:59:47PM +0800, Xin Long wrote:
> Now sctp only delays the authentication for the normal cookie-echo
> chunk by setting chunk->auth_chunk in sctp_endpoint_bh_rcv(). But
> for the duplicated one with auth, in sctp_assoc_bh_rcv(), it does
> authentication first based on the old asoc, which will definitely
> fail due to the different auth info in the old asoc.
> 
> The duplicated cookie-echo chunk will create a new asoc with the
> auth info from this chunk, and the authentication should also be
> done with the new asoc's auth info for all of the collision 'A',
> 'B' and 'D'. Otherwise, the duplicated cookie-echo chunk with auth
> will never pass the authentication and create the new connection.
> 
> This issue exists since very beginning, and this fix is to make
> sctp_assoc_bh_rcv() follow the way sctp_endpoint_bh_rcv() does
> for the normal cookie-echo chunk to delay the authentication.
> 
> While at it, remove the unused params from sctp_sf_authenticate()
> and define sctp_auth_chunk_verify() used for all the places that
> do the delayed authentication.
> 
> v1->v2:
>   fix the typo in changelog as Marcelo noticed.
> 
> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/associola.c    | 30 ++++++++++++++++-
>  net/sctp/sm_statefuns.c | 86 ++++++++++++++++++++++++++-----------------------
>  2 files changed, 75 insertions(+), 41 deletions(-)
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 837806d..a47179d 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1024,8 +1024,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>  	struct sctp_endpoint *ep;
>  	struct sctp_chunk *chunk;
>  	struct sctp_inq *inqueue;
> -	int state;
> +	int first_time = 1;	/* is this the first time through the loop */
>  	int error = 0;
> +	int state;
>  
>  	/* The association should be held so we should be safe. */
>  	ep = asoc->ep;
> @@ -1036,6 +1037,30 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>  		state = asoc->state;
>  		subtype = SCTP_ST_CHUNK(chunk->chunk_hdr->type);
>  
> +		/* If the first chunk in the packet is AUTH, do special
> +		 * processing specified in Section 6.3 of SCTP-AUTH spec
> +		 */
> +		if (first_time && subtype.chunk == SCTP_CID_AUTH) {
> +			struct sctp_chunkhdr *next_hdr;
> +
> +			next_hdr = sctp_inq_peek(inqueue);
> +			if (!next_hdr)
> +				goto normal;
> +
> +			/* If the next chunk is COOKIE-ECHO, skip the AUTH
> +			 * chunk while saving a pointer to it so we can do
> +			 * Authentication later (during cookie-echo
> +			 * processing).
> +			 */
> +			if (next_hdr->type == SCTP_CID_COOKIE_ECHO) {
> +				chunk->auth_chunk = skb_clone(chunk->skb,
> +							      GFP_ATOMIC);
> +				chunk->auth = 1;
> +				continue;
> +			}
> +		}
> +
> +normal:
>  		/* SCTP-AUTH, Section 6.3:
>  		 *    The receiver has a list of chunk types which it expects
>  		 *    to be received only after an AUTH-chunk.  This list has
> @@ -1074,6 +1099,9 @@ static void sctp_assoc_bh_rcv(struct work_struct *work)
>  		/* If there is an error on chunk, discard this packet. */
>  		if (error && chunk)
>  			chunk->pdiscard = 1;
> +
> +		if (first_time)
> +			first_time = 0;
>  	}
>  	sctp_association_put(asoc);
>  }
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 28c070e..c9ae340 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -153,10 +153,7 @@ static enum sctp_disposition sctp_sf_violation_chunk(
>  					struct sctp_cmd_seq *commands);
>  
>  static enum sctp_ierror sctp_sf_authenticate(
> -					struct net *net,
> -					const struct sctp_endpoint *ep,
>  					const struct sctp_association *asoc,
> -					const union sctp_subtype type,
>  					struct sctp_chunk *chunk);
>  
>  static enum sctp_disposition __sctp_sf_do_9_1_abort(
> @@ -626,6 +623,38 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
>  	return SCTP_DISPOSITION_CONSUME;
>  }
>  
> +static bool sctp_auth_chunk_verify(struct net *net, struct sctp_chunk *chunk,
> +				   const struct sctp_association *asoc)
> +{
> +	struct sctp_chunk auth;
> +
> +	if (!chunk->auth_chunk)
> +		return true;
> +
> +	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
> +	 * is supposed to be authenticated and we have to do delayed
> +	 * authentication.  We've just recreated the association using
> +	 * the information in the cookie and now it's much easier to
> +	 * do the authentication.
> +	 */
> +
> +	/* Make sure that we and the peer are AUTH capable */
> +	if (!net->sctp.auth_enable || !asoc->peer.auth_capable)
> +		return false;
> +
> +	/* set-up our fake chunk so that we can process it */
> +	auth.skb = chunk->auth_chunk;
> +	auth.asoc = chunk->asoc;
> +	auth.sctp_hdr = chunk->sctp_hdr;
> +	auth.chunk_hdr = (struct sctp_chunkhdr *)
> +				skb_push(chunk->auth_chunk,
> +					 sizeof(struct sctp_chunkhdr));
> +	skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
> +	auth.transport = chunk->transport;
> +
> +	return sctp_sf_authenticate(asoc, &auth) == SCTP_IERROR_NO_ERROR;
> +}
> +
>  /*
>   * Respond to a normal COOKIE ECHO chunk.
>   * We are the side that is being asked for an association.
> @@ -763,37 +792,9 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
>  	if (error)
>  		goto nomem_init;
>  
> -	/* SCTP-AUTH:  auth_chunk pointer is only set when the cookie-echo
> -	 * is supposed to be authenticated and we have to do delayed
> -	 * authentication.  We've just recreated the association using
> -	 * the information in the cookie and now it's much easier to
> -	 * do the authentication.
> -	 */
> -	if (chunk->auth_chunk) {
> -		struct sctp_chunk auth;
> -		enum sctp_ierror ret;
> -
> -		/* Make sure that we and the peer are AUTH capable */
> -		if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
> -			sctp_association_free(new_asoc);
> -			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> -		}
> -
> -		/* set-up our fake chunk so that we can process it */
> -		auth.skb = chunk->auth_chunk;
> -		auth.asoc = chunk->asoc;
> -		auth.sctp_hdr = chunk->sctp_hdr;
> -		auth.chunk_hdr = (struct sctp_chunkhdr *)
> -					skb_push(chunk->auth_chunk,
> -						 sizeof(struct sctp_chunkhdr));
> -		skb_pull(chunk->auth_chunk, sizeof(struct sctp_chunkhdr));
> -		auth.transport = chunk->transport;
> -
> -		ret = sctp_sf_authenticate(net, ep, new_asoc, type, &auth);
> -		if (ret != SCTP_IERROR_NO_ERROR) {
> -			sctp_association_free(new_asoc);
> -			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> -		}
> +	if (!sctp_auth_chunk_verify(net, chunk, new_asoc)) {
> +		sctp_association_free(new_asoc);
> +		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>  	}
>  
>  	repl = sctp_make_cookie_ack(new_asoc, chunk);
> @@ -1797,13 +1798,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
>  	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>  		goto nomem;
>  
> +	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
> +		return SCTP_DISPOSITION_DISCARD;
> +
>  	/* Make sure no new addresses are being added during the
>  	 * restart.  Though this is a pretty complicated attack
>  	 * since you'd have to get inside the cookie.
>  	 */
> -	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands)) {
> +	if (!sctp_sf_check_restart_addrs(new_asoc, asoc, chunk, commands))
>  		return SCTP_DISPOSITION_CONSUME;
> -	}
>  
>  	/* If the endpoint is in the SHUTDOWN-ACK-SENT state and recognizes
>  	 * the peer has restarted (Action A), it MUST NOT setup a new
> @@ -1912,6 +1915,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
>  	if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
>  		goto nomem;
>  
> +	if (!sctp_auth_chunk_verify(net, chunk, new_asoc))
> +		return SCTP_DISPOSITION_DISCARD;
> +
>  	/* Update the content of current association.  */
>  	sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
>  	sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
> @@ -2009,6 +2015,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
>  	 * a COOKIE ACK.
>  	 */
>  
> +	if (!sctp_auth_chunk_verify(net, chunk, asoc))
> +		return SCTP_DISPOSITION_DISCARD;
> +
>  	/* Don't accidentally move back into established state. */
>  	if (asoc->state < SCTP_STATE_ESTABLISHED) {
>  		sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_STOP,
> @@ -4171,10 +4180,7 @@ enum sctp_disposition sctp_sf_eat_fwd_tsn_fast(
>   * The return value is the disposition of the chunk.
>   */
>  static enum sctp_ierror sctp_sf_authenticate(
> -					struct net *net,
> -					const struct sctp_endpoint *ep,
>  					const struct sctp_association *asoc,
> -					const union sctp_subtype type,
>  					struct sctp_chunk *chunk)
>  {
>  	struct sctp_shared_key *sh_key = NULL;
> @@ -4275,7 +4281,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
>  						  commands);
>  
>  	auth_hdr = (struct sctp_authhdr *)chunk->skb->data;
> -	error = sctp_sf_authenticate(net, ep, asoc, type, chunk);
> +	error = sctp_sf_authenticate(asoc, chunk);
>  	switch (error) {
>  	case SCTP_IERROR_AUTH_BAD_HMAC:
>  		/* Generate the ERROR chunk and discard the rest
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* [iproute2-next  1/1] tipc: Add support to set and get MTU for UDP bearer
From: GhantaKrishnamurthy MohanKrishna @ 2018-05-07 11:14 UTC (permalink / raw)
  To: tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem, dsahern

In this commit we introduce the ability to set and get
MTU for UDP media and bearer.

For set and get properties such as tolerance, window and priority,
we already do:

    $ tipc media set PPROPERTY media MEDIA
    $ tipc media get PPROPERTY media MEDIA

    $ tipc bearer set OPTION media MEDIA ARGS
    $ tipc bearer get [OPTION] media MEDIA ARGS

The same has been extended for MTU, with an exception to support
only media type UDP.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
---
 tipc/bearer.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++---------
 tipc/media.c  | 24 ++++++++++++++++++++++--
 2 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/tipc/bearer.c b/tipc/bearer.c
index 0d8457015062..27e8d77f5635 100644
--- a/tipc/bearer.c
+++ b/tipc/bearer.c
@@ -42,7 +42,8 @@ static void _print_bearer_opts(void)
 		"OPTIONS\n"
 		" priority              - Bearer link priority\n"
 		" tolerance             - Bearer link tolerance\n"
-		" window                - Bearer link window\n");
+		" window                - Bearer link window\n"
+		" mtu                   - Bearer link mtu\n");
 }
 
 void print_bearer_media(void)
@@ -194,6 +195,21 @@ static int nl_add_udp_enable_opts(struct nlmsghdr *nlh, struct opt *opts,
 	return 0;
 }
 
+static char *cmd_get_media_type(const struct cmd *cmd, struct cmdl *cmdl,
+				struct opt *opts)
+{
+	struct opt *opt;
+
+	if (!(opt = get_opt(opts, "media"))) {
+		if (help_flag)
+			(cmd->help)(cmdl);
+		else
+			fprintf(stderr, "error, missing bearer media\n");
+		return NULL;
+	}
+	return opt->val;
+}
+
 static int nl_add_bearer_name(struct nlmsghdr *nlh, const struct cmd *cmd,
 			      struct cmdl *cmdl, struct opt *opts,
 			      const struct tipc_sup_media *sup_media)
@@ -217,15 +233,8 @@ int cmd_get_unique_bearer_name(const struct cmd *cmd, struct cmdl *cmdl,
 	struct opt *opt;
 	const struct tipc_sup_media *entry;
 
-
-	if (!(opt = get_opt(opts, "media"))) {
-		if (help_flag)
-			(cmd->help)(cmdl);
-		else
-			fprintf(stderr, "error, missing bearer media\n");
+	if (!(media = cmd_get_media_type(cmd, cmdl, opts)))
 		return -EINVAL;
-	}
-	media = opt->val;
 
 	for (entry = sup_media; entry->media; entry++) {
 		if (strcmp(entry->media, media))
@@ -559,6 +568,8 @@ static int cmd_bearer_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		prop = TIPC_NLA_PROP_TOL;
 	else if ((strcmp(cmd->cmd, "window") == 0))
 		prop = TIPC_NLA_PROP_WIN;
+	else if ((strcmp(cmd->cmd, "mtu") == 0))
+		prop = TIPC_NLA_PROP_MTU;
 	else
 		return -EINVAL;
 
@@ -571,6 +582,17 @@ static int cmd_bearer_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 	if (parse_opts(opts, cmdl) < 0)
 		return -EINVAL;
 
+	if (prop == TIPC_NLA_PROP_MTU) {
+		char *media;
+
+		if (!(media = cmd_get_media_type(cmd, cmdl, opts)))
+			return -EINVAL;
+		else if (strcmp(media, "udp")) {
+			fprintf(stderr, "error, not supported for media\n");
+			return -EINVAL;
+		}
+	}
+
 	if (!(nlh = msg_init(buf, TIPC_NL_BEARER_SET))) {
 		fprintf(stderr, "error, message initialisation failed\n");
 		return -1;
@@ -597,6 +619,7 @@ static int cmd_bearer_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ "priority",	cmd_bearer_set_prop,	cmd_bearer_set_help },
 		{ "tolerance",	cmd_bearer_set_prop,	cmd_bearer_set_help },
 		{ "window",	cmd_bearer_set_prop,	cmd_bearer_set_help },
+		{ "mtu",	cmd_bearer_set_prop,	cmd_bearer_set_help },
 		{ NULL }
 	};
 
@@ -877,12 +900,25 @@ static int cmd_bearer_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		prop = TIPC_NLA_PROP_TOL;
 	else if ((strcmp(cmd->cmd, "window") == 0))
 		prop = TIPC_NLA_PROP_WIN;
+	else if ((strcmp(cmd->cmd, "mtu") == 0))
+		prop = TIPC_NLA_PROP_MTU;
 	else
 		return -EINVAL;
 
 	if (parse_opts(opts, cmdl) < 0)
 		return -EINVAL;
 
+	if (prop == TIPC_NLA_PROP_MTU) {
+		char *media;
+
+		if (!(media = cmd_get_media_type(cmd, cmdl, opts)))
+			return -EINVAL;
+		else if (strcmp(media, "udp")) {
+			fprintf(stderr, "error, not supported for media\n");
+			return -EINVAL;
+		}
+	}
+
 	if (!(nlh = msg_init(buf, TIPC_NL_BEARER_GET))) {
 		fprintf(stderr, "error, message initialisation failed\n");
 		return -1;
@@ -904,6 +940,7 @@ static int cmd_bearer_get(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ "priority",	cmd_bearer_get_prop,	cmd_bearer_get_help },
 		{ "tolerance",	cmd_bearer_get_prop,	cmd_bearer_get_help },
 		{ "window",	cmd_bearer_get_prop,	cmd_bearer_get_help },
+		{ "mtu",	cmd_bearer_get_prop,	cmd_bearer_get_help },
 		{ "media",	cmd_bearer_get_media,	cmd_bearer_get_help },
 		{ NULL }
 	};
diff --git a/tipc/media.c b/tipc/media.c
index 6e10c7e5d8e6..969ef6578b3b 100644
--- a/tipc/media.c
+++ b/tipc/media.c
@@ -103,6 +103,8 @@ static int cmd_media_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		prop = TIPC_NLA_PROP_TOL;
 	else if ((strcmp(cmd->cmd, "window") == 0))
 		prop = TIPC_NLA_PROP_WIN;
+	else if ((strcmp(cmd->cmd, "mtu") == 0))
+		prop = TIPC_NLA_PROP_MTU;
 	else
 		return -EINVAL;
 
@@ -123,6 +125,12 @@ static int cmd_media_get_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		fprintf(stderr, "error, missing media\n");
 		return -EINVAL;
 	}
+
+	if ((prop == TIPC_NLA_PROP_MTU) &&
+	    (strcmp(opt->val, "udp"))) {
+		fprintf(stderr, "error, not supported for media\n");
+		return -EINVAL;
+	}
 	nest = mnl_attr_nest_start(nlh, TIPC_NLA_MEDIA);
 	mnl_attr_put_strz(nlh, TIPC_NLA_MEDIA_NAME, opt->val);
 	mnl_attr_nest_end(nlh, nest);
@@ -136,7 +144,8 @@ static void cmd_media_get_help(struct cmdl *cmdl)
 		"PROPERTIES\n"
 		" tolerance             - Get media tolerance\n"
 		" priority              - Get media priority\n"
-		" window                - Get media window\n",
+		" window                - Get media window\n"
+		" mtu                   - Get media mtu\n",
 		cmdl->argv[0]);
 }
 
@@ -147,6 +156,7 @@ static int cmd_media_get(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ "priority",	cmd_media_get_prop,	cmd_media_get_help },
 		{ "tolerance",	cmd_media_get_prop,	cmd_media_get_help },
 		{ "window",	cmd_media_get_prop,	cmd_media_get_help },
+		{ "mtu",	cmd_media_get_prop,	cmd_media_get_help },
 		{ NULL }
 	};
 
@@ -159,7 +169,8 @@ static void cmd_media_set_help(struct cmdl *cmdl)
 		"PROPERTIES\n"
 		" tolerance TOLERANCE   - Set media tolerance\n"
 		" priority PRIORITY     - Set media priority\n"
-		" window WINDOW         - Set media window\n",
+		" window WINDOW         - Set media window\n"
+		" mtu MTU               - Set media mtu\n",
 		cmdl->argv[0]);
 }
 
@@ -183,6 +194,8 @@ static int cmd_media_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		prop = TIPC_NLA_PROP_TOL;
 	else if ((strcmp(cmd->cmd, "window") == 0))
 		prop = TIPC_NLA_PROP_WIN;
+	else if ((strcmp(cmd->cmd, "mtu") == 0))
+		prop = TIPC_NLA_PROP_MTU;
 	else
 		return -EINVAL;
 
@@ -210,6 +223,12 @@ static int cmd_media_set_prop(struct nlmsghdr *nlh, const struct cmd *cmd,
 		fprintf(stderr, "error, missing media\n");
 		return -EINVAL;
 	}
+
+	if ((prop == TIPC_NLA_PROP_MTU) &&
+	    (strcmp(opt->val, "udp"))) {
+		fprintf(stderr, "error, not supported for media\n");
+		return -EINVAL;
+	}
 	mnl_attr_put_strz(nlh, TIPC_NLA_MEDIA_NAME, opt->val);
 
 	props = mnl_attr_nest_start(nlh, TIPC_NLA_MEDIA_PROP);
@@ -228,6 +247,7 @@ static int cmd_media_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ "priority",	cmd_media_set_prop,	cmd_media_set_help },
 		{ "tolerance",	cmd_media_set_prop,	cmd_media_set_help },
 		{ "window",	cmd_media_set_prop,	cmd_media_set_help },
+		{ "mtu",	cmd_media_set_prop,	cmd_media_set_help },
 		{ NULL }
 	};
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH net] MAINTAINERS: Update the 3c59x network driver entry
From: Steffen Klassert @ 2018-05-07 10:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Replace my old E-Mail address with a working one.
While at it, change the maintainance status to
'Odd Fixes'. I'm still around with some knowledge,
but don't actively maintain it anymore.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b1ccabd0dbc3..b3cbf1c3ed07 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -137,9 +137,9 @@ Maintainers List (try to look for most precise areas first)
 		-----------------------------------
 
 3C59X NETWORK DRIVER
-M:	Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
+M:	Steffen Klassert <klassert@kernel.org>
 L:	netdev@vger.kernel.org
-S:	Maintained
+S:	Odd Fixes
 F:	Documentation/networking/vortex.txt
 F:	drivers/net/ethernet/3com/3c59x.c
 
-- 
2.14.1

^ permalink raw reply related

* Re: [RFC bpf-next 00/10] initial control flow support for eBPF verifier
From: Jiong Wang @ 2018-05-07 10:33 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: john.fastabend, netdev, oss-drivers
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

On 07/05/2018 11:22, Jiong Wang wrote:
> execution time
> ===
> test_l4lb_noinline:
>    existing check_subprog/check_cfg: ~55000 ns
>    new infrastructure: ~135000 ns
>
> test_xdp_noinline:
>    existing check_subprog/check_cfg: ~52000 ns
>    new infrastructure: ~120000 ns

Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

Regards,
Jiong

^ permalink raw reply

* The SO_BINDTODEVICE was set to the desired interface, but packets are received from all interfaces.
From: Damir Mansurov @ 2018-05-07 10:19 UTC (permalink / raw)
  To: netdev; +Cc: Konstantin Ushakov, Alexandra N. Kossovsky, Andrey Dmitrov

[-- Attachment #1: Type: text/plain, Size: 1871 bytes --]


Greetings,

After successful call of the setsockopt(SO_BINDTODEVICE) function to set 
data reception from only one interface, the data is still received from 
all interfaces. Function setsockopt() returns 0 but then recv() receives 
data from all available network interfaces.

The problem is reproducible on linux kernels 4.14 - 4.16, but it does 
not on linux kernels 4.4, 4.13.

I have written C-code to reproduce this issue (see attached files 
b2d_send.c and b2d_recv.c). See below explanation of tested configuration.


         PC-1                              PC-2
  -------------------               -------------------
  | b2d_send        |               | b2d_recv        |
  |                 |               |                 |
  |           ------|               |------           |
  |          | eth0 |---------------| eth0 |          |
  |           ------|               |------           |
  |                 |               |                 |
  |           ------|               |------           |
  |          | eth1 |---------------| eth1 |          |
  |           ------|               |------           |
  |                 |               |                 |
  -------------------               -------------------

Steps:
1. Copy b2d_recv.c to PC-2, compile it ("gcc -o b2d_recv b2d_recv.c") 
and run "./b2d_recv eth0 23777" to get derived data only from eth0 
interface. Port number in this example is 23777 only for sample.

2. Copy b2d_send.c to PC-1, compile it ("gcc -o b2d_send b2d_send.c") 
and run "./b2d_send ip1 ip2 23777" where ip1 and ip2 are ip addresses of 
interfaces eth0 and eth1 of PC-2.

3. Result:
- b2d_recv prints out data from eth0 and eth1 on linux kernels from 4.14 
up to 4.16.
- b2d_recv prints out data from only eth0 on linux kernels below 4.14.


******************
Thanks,
Damir Mansurov
dnman@oktetlabs.ru

[-- Attachment #2: b2d_recv.c --]
[-- Type: text/x-csrc, Size: 3108 bytes --]

/*
 * Receive udp packets from desired interface
 *
 * This tool is used to check that option SO_BINDTODEVICE works correctly
 * setsockop(SO_BINDTODEVICE)
 * Use together with b2d_send.c
 *
 * 1. Start b2d_recv on receiver PC
 * 2. Start b2d_send on sender PC
 * 3. Check that packets are received only from the selected interface
 *
 * usage:   ./b2d_recv interface port
 * example: ./b2d_recv eth0 23777
 */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <errno.h>

#define RX_BUFF_SIZE 1024

/* create recv socket, if error occured exit(EXIT_FAILURE) */
int create_recv_sock(const char * str_interface, const char * str_port,
                     struct sockaddr_in * sock_addr);

int
main(int argc, char *argv[])
{
    struct sockaddr_in sa_recv;

    if (argc != 3)
    {
        printf("usage: %s interface port\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    int sfd_recv  = create_recv_sock(argv[1], argv[2], &sa_recv);

    char rx_buff[RX_BUFF_SIZE] = {0};
    struct sockaddr_in src_addr;
    socklen_t addr_len;
    memset(&src_addr, 0, sizeof(struct sockaddr_in));

    while (1)
    {
        ssize_t res = recvfrom(sfd_recv, rx_buff, RX_BUFF_SIZE - 1, 0,
                       (struct sockaddr *)&src_addr, &addr_len);
        if (res < 0)
        {
            perror("recvfrom");
            exit(EXIT_FAILURE);
        }

        char buf[256];
        const char * x = inet_ntop(src_addr.sin_family, &src_addr.sin_addr,
                                   buf, addr_len);
        if (x == NULL)
        {
            perror("inet_ntop");
            exit(EXIT_FAILURE);
        }
        printf("recv %ld bytes from %s:%u: \"%s\"\n",
                res, buf, ntohs(src_addr.sin_port), rx_buff);
    }

    exit(EXIT_SUCCESS);
}


int
create_recv_sock(const char * str_interface, const char * str_port,
                       struct sockaddr_in * sock_addr)
{
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0)
    {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    memset(sock_addr, 0, sizeof(struct sockaddr_in));
    sock_addr->sin_family = AF_INET;
    sock_addr->sin_addr.s_addr = htonl(INADDR_ANY);
    sock_addr->sin_port = htons((uint16_t)atoi(str_port));

    int res = bind(sockfd, (struct sockaddr*)sock_addr,
                      sizeof(struct sockaddr_in));

    if (res < 0)
    {
        perror("bind");
        exit(EXIT_FAILURE);
    }

    if (strlen(str_interface) == 0)
    {
        puts("Data will be received from all interfaces");
    }
    else
    {
        res = setsockopt(sockfd, SOL_SOCKET, SO_BINDTODEVICE,
                         str_interface, strlen(str_interface) + 1);
        if (res < 0)
        {
            perror("setsockopt");
            exit(EXIT_FAILURE);
        }
        else
        {
            printf("success bind to device \"%s\"\n", str_interface);
        }
    }
    printf("recv port %s\n", str_port);
    return sockfd;

}/* create_recv_socket() */



[-- Attachment #3: b2d_send.c --]
[-- Type: text/x-csrc, Size: 2501 bytes --]

/*
 * Send udp packets from two various interfaces,
 *
 * This tool used to check correctly work option
 * setsockopt(SO_BINDTODEVICE)
 * Use together with b2d_recv.c
 * Detailed description in file b2d_recv.c
 *
 * usage    ./b2d_send ip1 ip2 port
 * example: ./b2d_send 192.168.44.2 192.168.45.2 23777
 */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <errno.h>

#define RX_BUFF_SIZE 1024

/* create sender socket, if error occured exit(EXIT_FAILURE) */
int create_sender_sock(const char * str_ip, const char * str_port,
                       struct sockaddr_in * sock_addr);

int
main(int argc, char *argv[])
{
    struct sockaddr_in sa_sender1, sa_sender2;

    if (argc != 4)
    {
        printf("usage: %s ip1 ip2 port\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    int sfd_sender1 = create_sender_sock(argv[1], argv[3], &sa_sender1);
    int sfd_sender2 = create_sender_sock(argv[2], argv[3], &sa_sender2);

    char tx_buff1[] = "Data from first socket";
    char tx_buff2[] = "Data from second socket";

    ssize_t res = send(sfd_sender1, tx_buff1, strlen(tx_buff1) + 1, 0);
    if (res < 0)
    {
        perror("sender1");
        exit(EXIT_FAILURE);
    }
    printf("success send %ld bytes to %s:%s \"%s\"\n",
            res, argv[1], argv[3], tx_buff1);

    res = send(sfd_sender2, tx_buff2, strlen(tx_buff2) + 1, 0);
    if (res < 0)
    {
        perror("sender2");
        exit(EXIT_FAILURE);
    }
    printf("success send %ld bytes to %s:%s \"%s\"\n",
            res, argv[2], argv[3], tx_buff2);

    exit(EXIT_SUCCESS);

}/* main() */


int
create_sender_sock(const char * str_ip, const char * str_port,
                   struct sockaddr_in * sock_addr)
{
    int res;
    int sockfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sockfd < 0)
    {
        perror("socket");
        exit(EXIT_FAILURE);
    }

    memset(sock_addr, 0, sizeof(struct sockaddr_in));
    sock_addr->sin_family = AF_INET;
    if (inet_pton(AF_INET, str_ip, &(sock_addr->sin_addr)) != 1)
    {
        perror("Bad ip_add");
        exit(EXIT_FAILURE);
    }
    sock_addr->sin_port = htons((uint16_t)atoi(str_port));

    res = connect(sockfd, (struct sockaddr*)sock_addr,
                  sizeof(struct sockaddr_in));
    if (res < 0)
    {
        perror("connect");
        exit(EXIT_FAILURE);
    }

    return sockfd;

}/* create_sender_socket() */



^ permalink raw reply

* Re: [PATCH 2/4] net: 3com: 3c59x: Move boomerang/vortex conditional into function
From: Steffen Klassert @ 2018-05-07 10:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, David S. Miller, tglx, Anna-Maria Gleixner
In-Reply-To: <20180504152228.o24gdwcf2oft52u4@linutronix.de>

On Fri, May 04, 2018 at 05:22:28PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-05-04 17:17:47 [+0200], To netdev@vger.kernel.org wrote:
> > From: Anna-Maria Gleixner <anna-maria@linutronix.de>
> > 
> > If vp->full_bus_master_tx is set, vp->full_bus_master_rx is set as well
> > (see vortex_probe1()). Therefore the conditionals for the decision if
> > boomerang or vortex ISR is executed have the same result. Instead of
> > repeating the explicit conditional execution of the boomerang/vortex ISR,
> > move it into an own function.
> > 
> > No functional change.
> > 
> > Cc: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
> 
> Steffen, this email address is still listed in the MAINTAINERS file for
> the driver and rejects emails.

Thanks for the hint, I'll update the address.

^ permalink raw reply

* [RFC bpf-next 10/10] bpf: cfg: reduce memory usage by using singly list + compat pointer
From: Jiong Wang @ 2018-05-07 10:22 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: john.fastabend, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

Because there are going to be quite a few nodes (bb, edge etc) for a
reasonable sized program, the size of cfg nodes matters.

The cfg traversal used at the moment are mostly via edge which contains
pointers to both src and dst basic block. The traversal is not via links
between basic blocks and edge nodes themselves, so link them with doubly
list_head is unnecessary, and is too heavy on 64-bit host as pointer will
take 8-bytes.

This patch reduce memory usage from two ways:
  1. use singly linked list to link basic block and edge nodes.
  2. introduce a compact pointer representation for pointers generated from
     memory pool. If a pointer is from pool, we could represent it using
     pool_base + pool_offset,

       struct cfg_node_link {
         u16 offset;
         s8 idx;
       };

     the compact pointer "cfg_node_link" will only take 3-bytes.
     the delink of the pointer will becomes:

     static void *cfg_node_delink(struct cfg_node_allocator *allocator,
                            struct cfg_node_link *link)
     {
       s8 idx = link->idx;

       if (idx == -1)
               return NULL;

       return allocator->base[idx] + link->offset;
     }

For example, basic blocks nodes changed from:

    struct bb_node {
       struct list_head l;
       struct list_head e_prevs;
       struct list_head e_succs;
       s16 head;
       u16 idx;
    };

into:

    struct bb_node {
	struct cfg_node_link link;
	struct cfg_node_link e_prevs;
	struct cfg_node_link e_succs;
	s16 head;
	u16 idx;
    };

We reduced node size from 52 to 16. We could further reduce node size to 14
if we reshuffle fields of link/e_prevs/e_succs to avoid alignment padding,
but that will with a cost of code readability.

>From benchmarks like test_xdp_noinline, this patch reduce peek memory usage
of new cfg infrastructure by more than 50%.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |   7 +-
 kernel/bpf/cfg.c             | 503 ++++++++++++++++++++++++-------------------
 kernel/bpf/cfg.h             |  23 +-
 kernel/bpf/verifier.c        |  33 ++-
 4 files changed, 312 insertions(+), 254 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 96337ba..c291461 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -176,8 +176,11 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 struct bpf_subprog_info {
 	u32 start; /* insn idx of function entry point */
 	u16 stack_depth; /* max. stack depth used by this function */
-	struct list_head callees; /* callees list. */
-	struct list_head bbs; /* basic blocks list. */
+	void *callees; /* callees list. */
+	struct {
+		void *entry_bb; /* basic blocks list head. */
+		void *exit_bb; /* basic blocks list tail. */
+	} bbs;
 	u32 bb_num; /* total basic block num. */
 	unsigned long *dtree;
 	bool dtree_avail;
diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
index 1aeac49..bda63ab 100644
--- a/kernel/bpf/cfg.c
+++ b/kernel/bpf/cfg.c
@@ -13,54 +13,55 @@
 
 #include "cfg.h"
 
+/* The size of cfg nodes matters, therefore we try to avoid using doubly linked
+ * list and we try to use base + offset to address node, by this we only need
+ * to keep offset.
+ */
+struct cfg_node_link {
+	u16 offset;
+	s8 idx;
+};
+
+static void *cfg_node_delink(struct cfg_node_allocator *allocator,
+			     struct cfg_node_link *link)
+{
+	s8 idx = link->idx;
+
+	if (idx == -1)
+		return NULL;
+
+	return allocator->base[idx] + link->offset;
+}
+
 struct cedge_node {
-	struct list_head l;
+	struct cfg_node_link link;
 	u8 caller_idx;
 	u8 callee_idx;
 };
 
 struct edge_node {
-	struct list_head l;
+	struct cfg_node_link link;
 	struct bb_node *src;
 	struct bb_node *dst;
 };
 
-struct edge_iter {
-	struct list_head *list_head;
-	struct edge_node *edge;
+struct bb_node {
+	struct cfg_node_link link;
+	struct cfg_node_link e_prevs;
+	struct cfg_node_link e_succs;
+	s16 head;
+	u16 idx;
 };
 
-#define first_edge(e_list)	list_first_entry(e_list, struct edge_node, l)
-#define last_edge(e_list)	list_last_entry(e_list, struct edge_node, l)
-#define next_edge(e)		list_next_entry(e, l)
+#define entry_bb(bb_list)		(struct bb_node *)(*bb_list)
+#define exit_bb(bb_list)		(struct bb_node *)(*(bb_list + 1))
 
-static bool ei_end_p(struct edge_iter *ei)
+static struct bb_node *bb_next(struct cfg_node_allocator *allocator,
+			       struct bb_node *bb)
 {
-	return &ei->edge->l == ei->list_head;
+	return (struct bb_node *)cfg_node_delink(allocator, &bb->link);
 }
 
-static void ei_next(struct edge_iter *ei)
-{
-	struct edge_node *e = ei->edge;
-
-	ei->edge = next_edge(e);
-}
-
-struct bb_node {
-	struct list_head l;
-	struct list_head e_prevs;
-	struct list_head e_succs;
-	u16 head;
-	u16 idx;
-};
-
-#define bb_prev(bb)		list_prev_entry(bb, l)
-#define bb_next(bb)		list_next_entry(bb, l)
-#define bb_first(bb_list)	list_first_entry(bb_list, struct bb_node, l)
-#define bb_last(bb_list)	list_last_entry(bb_list, struct bb_node, l)
-#define entry_bb(bb_list)	bb_first(bb_list)
-#define exit_bb(bb_list)	bb_last(bb_list)
-
 struct dom_info {
 	u16 *dfs_parent;
 	u16 *dfs_order;
@@ -82,9 +83,12 @@ struct dom_info {
 	u16 dfsnum;
 };
 
-struct node_pool {
-	struct list_head l;
-	void *data;
+struct mem_frag {
+	struct cfg_node_link link;
+	void *p;
+};
+
+struct pool_head {
 	u32 size;
 	u32 used;
 };
@@ -97,67 +101,103 @@ struct node_pool {
 static int cfg_node_allocator_grow(struct cfg_node_allocator *allocator,
 				   int min_grow_size)
 {
-	int s = min_grow_size;
-	struct node_pool *pool;
-	void *data;
+	int s = min_grow_size, pool_cnt = allocator->pool_cnt;
+	struct pool_head *pool;
 
-	s += sizeof(struct node_pool);
+	if (pool_cnt >= MAX_POOL_NUM)
+		return -E2BIG;
+
+	s += sizeof(struct pool_head);
 	s = ALIGN(s, MEM_CHUNK_SIZE);
-	data = kzalloc(s, GFP_KERNEL);
-	if (!data)
+	if (s > U16_MAX)
+		return -E2BIG;
+
+	pool = kzalloc(s, GFP_KERNEL);
+	if (!pool)
 		return -ENOMEM;
 
-	pool = (struct node_pool *)data;
-	pool->data = pool + 1;
-	pool->size = s - sizeof(struct node_pool);
-	pool->used = 0;
-	allocator->cur_free_pool = pool;
-	list_add_tail(&pool->l, &allocator->pools);
+	allocator->base[pool_cnt] = pool;
+	pool->size = s;
+	pool->used = sizeof(struct pool_head);
+	allocator->pool_cnt++;
 
 	return 0;
 }
 
-static void *cfg_node_alloc(struct cfg_node_allocator *allocator, int size)
+static int cfg_node_alloc(struct cfg_node_allocator *allocator,
+			  struct mem_frag *frag, int size)
 {
-	struct node_pool *pool = allocator->cur_free_pool;
+	int pool_idx = allocator->pool_cnt - 1;
+	struct pool_head *pool;
 	void *p;
 
+	pool = allocator->base[pool_idx];
 	if (pool->used + size > pool->size) {
 		int ret = cfg_node_allocator_grow(allocator, size);
 
 		if (ret < 0)
-			return NULL;
+			return ret;
 
-		pool = allocator->cur_free_pool;
+		pool_idx++;
+		pool = allocator->base[pool_idx];
 	}
 
-	p = pool->data + pool->used;
+	p = (void *)pool + pool->used;
+	frag->p = p;
+	frag->link.idx = pool_idx;
+	frag->link.offset = pool->used;
 	pool->used += size;
 
-	return p;
+	return 0;
 }
 
-static struct bb_node *get_single_bb_nodes(struct cfg_node_allocator *allocator)
+static int get_link_nodes(struct cfg_node_allocator *allocator,
+			  struct mem_frag *frag, int num, int elem_size)
 {
-	int size = sizeof(struct bb_node);
+	int i, ret;
+	struct cfg_node_link *link;
 
-	return (struct bb_node *)cfg_node_alloc(allocator, size);
+	ret = cfg_node_alloc(allocator, frag, num * elem_size);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < num; i++) {
+		link = frag->p + i * elem_size;
+		link->idx = -1;
+	}
+
+	return 0;
 }
 
-static struct edge_node *get_edge_nodes(struct cfg_node_allocator *allocator,
-					int num)
+static int get_bb_nodes(struct cfg_node_allocator *allocator,
+			struct mem_frag *frag, int num)
 {
-	int size = num * sizeof(struct edge_node);
+	struct bb_node *bb;
+	int i, ret;
+
+	ret = get_link_nodes(allocator, frag, num, sizeof(struct bb_node));
+	if (ret < 0)
+		return ret;
+
+	bb = frag->p;
+	for (i = 0; i < num; i++) {
+		bb[i].e_prevs.idx = -1;
+		bb[i].e_succs.idx = -1;
+	}
 
-	return (struct edge_node *)cfg_node_alloc(allocator, size);
+	return 0;
 }
 
-static struct cedge_node *
-get_single_cedge_node(struct cfg_node_allocator *allocator)
+static int get_edge_nodes(struct cfg_node_allocator *allocator,
+			  struct mem_frag *frag, int num)
 {
-	int size = sizeof(struct cedge_node);
+	return get_link_nodes(allocator, frag, num, sizeof(struct edge_node));
+}
 
-	return (struct cedge_node *)cfg_node_alloc(allocator, size);
+static int get_single_cedge_node(struct cfg_node_allocator *allocator,
+				 struct mem_frag *frag)
+{
+	return get_link_nodes(allocator, frag, 1, sizeof(struct cedge_node));
 }
 
 int cfg_node_allocator_init(struct cfg_node_allocator *allocator,
@@ -167,7 +207,8 @@ int cfg_node_allocator_init(struct cfg_node_allocator *allocator,
 
 	s += 2 * bb_num_esti * sizeof(struct edge_node);
 	s += cedge_num_esti * sizeof(struct cedge_node);
-	INIT_LIST_HEAD(&allocator->pools);
+
+	allocator->pool_cnt = 0;
 	ret = cfg_node_allocator_grow(allocator, s);
 	if (ret < 0)
 		return ret;
@@ -177,127 +218,123 @@ int cfg_node_allocator_init(struct cfg_node_allocator *allocator,
 
 void cfg_node_allocator_free(struct cfg_node_allocator *allocator)
 {
-	struct list_head *pools = &allocator->pools;
-	struct node_pool *pool, *tmp;
+	int i, cnt = allocator->pool_cnt;
 
-	pool = first_node_pool(pools);
-	list_for_each_entry_safe_from(pool, tmp, pools, l) {
-		list_del(&pool->l);
-		kfree(pool);
-	}
+	for (i = 0; i < cnt; i++)
+		kfree(allocator->base[i]);
 }
 
-int subprog_append_bb(struct cfg_node_allocator *allocator,
-		      struct list_head *bb_list, int head)
+int subprog_append_bb(struct cfg_node_allocator *allocator, void **bb_list,
+		      int head)
 {
-	struct bb_node *new_bb, *bb;
+	struct bb_node *cur = entry_bb(bb_list);
+	struct bb_node *prev = cur;
+	struct mem_frag frag;
+	int ret;
 
-	list_for_each_entry(bb, bb_list, l) {
-		if (bb->head == head)
+	while (cur) {
+		if (cur->head == head)
 			return 0;
-		else if (bb->head > head)
+		else if (cur->head > head)
 			break;
-	}
+		prev = cur;
+		cur = cfg_node_delink(allocator, &cur->link);
+	};
 
-	bb = bb_prev(bb);
-	new_bb = get_single_bb_nodes(allocator);
-	if (!new_bb)
-		return -ENOMEM;
+	ret = get_bb_nodes(allocator, &frag, 1);
+	if (ret < 0)
+		return ret;
 
-	INIT_LIST_HEAD(&new_bb->e_prevs);
-	INIT_LIST_HEAD(&new_bb->e_succs);
-	new_bb->head = head;
-	list_add(&new_bb->l, &bb->l);
+	cur = frag.p;
+	cur->head = head;
+	cur->link = prev->link;
+	prev->link = frag.link;
 
 	return 0;
 }
 
-int subprog_fini_bb(struct cfg_node_allocator *allocator,
-		    struct list_head *bb_list, int subprog_end)
+int subprog_init_bb(struct cfg_node_allocator *allocator, void **bb_list,
+		    int subprog_start, int subprog_end)
 {
-	struct bb_node *bb = get_single_bb_nodes(allocator);
-
-	if (!bb)
-		return -ENOMEM;
-	/* entry bb. */
-	bb->head = -1;
-	INIT_LIST_HEAD(&bb->e_prevs);
-	INIT_LIST_HEAD(&bb->e_succs);
-	list_add(&bb->l, bb_list);
-
-	bb = get_single_bb_nodes(allocator);
-	if (!bb)
-		return -ENOMEM;
-	/* exit bb. */
-	bb->head = subprog_end;
-	INIT_LIST_HEAD(&bb->e_prevs);
-	INIT_LIST_HEAD(&bb->e_succs);
-	list_add_tail(&bb->l, bb_list);
-
-	return 0;
-}
+	struct bb_node **list_head = (struct bb_node **)bb_list;
+	struct bb_node **list_tail = (struct bb_node **)(bb_list + 1);
+	struct bb_node *entry_bb, *first_bb, *exit_bb;
+	int ret, s = sizeof(struct bb_node);
+	struct mem_frag frag;
 
-int subprog_init_bb(struct cfg_node_allocator *allocator,
-		    struct list_head *bb_list, int subprog_start)
-{
-	int ret;
-
-	INIT_LIST_HEAD(bb_list);
-	ret = subprog_append_bb(allocator, bb_list, subprog_start);
+	ret = get_bb_nodes(allocator, &frag, 3);
 	if (ret < 0)
 		return ret;
 
+	entry_bb = frag.p;
+	*list_head = entry_bb;
+	entry_bb->head = -1;
+	first_bb = frag.p + s;
+	first_bb->head = subprog_start;
+	exit_bb = frag.p + 2 * s;
+	exit_bb->head = subprog_end;
+	entry_bb->link.idx = frag.link.idx;
+	entry_bb->link.offset = frag.link.offset + s;
+	first_bb->link.idx = frag.link.idx;
+	first_bb->link.offset = frag.link.offset + 2 * s;
+	*list_tail = exit_bb;
+
 	return 0;
 }
 
-static struct bb_node *search_bb_with_head(struct list_head *bb_list, int head)
+static struct bb_node *search_bb_with_head(struct cfg_node_allocator *allocator,
+					   void **bb_list, int head)
 {
-	struct bb_node *bb;
+	struct bb_node *cur = entry_bb(bb_list);
 
-	list_for_each_entry(bb, bb_list, l) {
-		if (bb->head == head)
-			return bb;
-	}
+	while (cur) {
+		if (cur->head == head)
+			return cur;
+
+		cur = cfg_node_delink(allocator, &cur->link);
+	};
 
 	return NULL;
 }
 
 int subprog_add_bb_edges(struct cfg_node_allocator *allocator,
-			 struct bpf_insn *insns, struct list_head *bb_list)
+			 struct bpf_insn *insns, void **bb_list)
 {
-	struct bb_node *bb, *exit_bb;
+	struct bb_node *bb = entry_bb(bb_list), *exit_bb;
 	struct edge_node *edge;
-	int bb_num;
+	struct mem_frag frag;
+	int ret, bb_num;
 
-	bb = entry_bb(bb_list);
-	edge = get_edge_nodes(allocator, 2);
-	if (!edge)
-		return -ENOMEM;
+	ret = get_edge_nodes(allocator, &frag, 2);
+	if (ret < 0)
+		return ret;
+	edge = frag.p;
 	edge->src = bb;
-	edge->dst = bb_next(bb);
-	list_add_tail(&edge->l, &bb->e_succs);
+	edge->dst = bb_next(allocator, bb);
+	bb->e_succs = frag.link;
 	edge[1].src = edge->src;
 	edge[1].dst = edge->dst;
-	list_add_tail(&edge[1].l, &edge[1].dst->e_prevs);
+	edge->dst->e_prevs = frag.link;
 	bb->idx = -1;
 
 	exit_bb = exit_bb(bb_list);
 	exit_bb->idx = -2;
-	bb = bb_next(bb);
+	bb = edge->dst;
 	bb_num = 0;
-	list_for_each_entry_from(bb, &exit_bb->l, l) {
+	while (bb && bb != exit_bb) {
+		struct bb_node *next_bb = bb_next(allocator, bb);
 		bool has_fallthrough, only_has_fallthrough;
 		bool has_branch, only_has_branch;
-		struct bb_node *next_bb = bb_next(bb);
 		int tail = next_bb->head - 1;
 		struct bpf_insn insn;
 		u8 code;
 
 		bb->idx = bb_num++;
 
-		edge = get_edge_nodes(allocator, 2);
-		if (!edge)
-			return -ENOMEM;
+		ret = get_edge_nodes(allocator, &frag, 2);
+		if (ret < 0)
+			return ret;
+		edge = frag.p;
 		edge->src = bb;
 		edge[1].src = bb;
 
@@ -322,8 +359,11 @@ int subprog_add_bb_edges(struct cfg_node_allocator *allocator,
 				next_bb = exit_bb;
 			edge->dst = next_bb;
 			edge[1].dst = next_bb;
-			list_add_tail(&edge->l, &bb->e_succs);
-			list_add_tail(&edge[1].l, &edge[1].dst->e_prevs);
+			edge->link = bb->e_succs;
+			bb->e_succs = frag.link;
+			frag.link.offset += sizeof(struct edge_node);
+			edge[1].link = next_bb->e_prevs;
+			next_bb->e_prevs = frag.link;
 			edge = NULL;
 		}
 
@@ -331,23 +371,29 @@ int subprog_add_bb_edges(struct cfg_node_allocator *allocator,
 			struct bb_node *tgt;
 
 			if (!edge) {
-				edge = get_edge_nodes(allocator, 2);
-				if (!edge)
-					return -ENOMEM;
+				ret = get_edge_nodes(allocator, &frag, 2);
+				if (ret < 0)
+					return ret;
+				edge = frag.p;
 				edge->src = bb;
 				edge[1].src = bb;
 			}
 
-			tgt = search_bb_with_head(bb_list,
+			tgt = search_bb_with_head(allocator, bb_list,
 						  tail + insn.off + 1);
 			if (!tgt)
 				return -EINVAL;
 
 			edge->dst = tgt;
 			edge[1].dst = tgt;
-			list_add_tail(&edge->l, &bb->e_succs);
-			list_add_tail(&edge[1].l, &tgt->e_prevs);
+			edge->link = bb->e_succs;
+			bb->e_succs = frag.link;
+			frag.link.offset += sizeof(struct edge_node);
+			edge[1].link = tgt->e_prevs;
+			tgt->e_prevs = frag.link;
 		}
+
+		bb = bb_next(allocator, bb);
 	}
 
 	return bb_num + 2;
@@ -451,13 +497,13 @@ static void link(struct dom_info *di, unsigned int v, unsigned int w)
 	}
 }
 
-static void calc_idoms(struct bpf_subprog_info *subprog, struct dom_info *di,
-		       bool reverse)
+static void
+calc_idoms(struct cfg_node_allocator *allocator,
+	   struct bpf_subprog_info *subprog, struct dom_info *di, bool reverse)
 {
 	u16 entry_bb_fake_idx = subprog->bb_num - 2, idx, w, k, par;
-	struct list_head *bb_list = &subprog->bbs;
+	void **bb_list = (void **)&subprog->bbs;
 	struct bb_node *entry_bb;
-	struct edge_iter ei;
 
 	if (reverse)
 		entry_bb = exit_bb(bb_list);
@@ -472,24 +518,21 @@ static void calc_idoms(struct bpf_subprog_info *subprog, struct dom_info *di,
 		par = di->dfs_parent[idx];
 		k = idx;
 
-		if (reverse) {
-			ei.edge = first_edge(&bb->e_succs);
-			ei.list_head = &bb->e_succs;
-		} else {
-			ei.edge = first_edge(&bb->e_prevs);
-			ei.list_head = &bb->e_prevs;
-		}
+		if (reverse)
+			e = cfg_node_delink(allocator, &bb->e_succs);
+		else
+			e = cfg_node_delink(allocator, &bb->e_prevs);
 
-		while (!ei_end_p(&ei)) {
+		while (e) {
 			struct bb_node *b;
 			u16 k1;
 
-			e = ei.edge;
 			if (reverse)
 				b = e->dst;
 			else
 				b = e->src;
-			ei_next(&ei);
+
+			e = cfg_node_delink(allocator, &e->link);
 
 			if (b == entry_bb)
 				k1 = di->dfs_order[entry_bb_fake_idx];
@@ -525,20 +568,22 @@ static void calc_idoms(struct bpf_subprog_info *subprog, struct dom_info *di,
 }
 
 static int
-calc_dfs_tree(struct bpf_verifier_env *env, struct bpf_subprog_info *subprog,
-	      struct dom_info *di, bool reverse)
+calc_dfs_tree(struct bpf_verifier_env *env,
+	      struct cfg_node_allocator *allocator,
+	      struct bpf_subprog_info *subprog, struct dom_info *di,
+	      bool reverse)
 {
 	u16 bb_num = subprog->bb_num, sp = 0, idx, parent_idx, i;
-	struct list_head *bb_list = &subprog->bbs;
+	void **bb_list = (void **)&subprog->bbs;
 	u16 entry_bb_fake_idx = bb_num - 2;
 	struct bb_node *entry_bb, *exit_bb;
-	struct edge_iter ei, *stack;
-	struct edge_node *e;
+	struct edge_node **stack, *e;
 	bool *visited;
 
 	di->dfs_order[entry_bb_fake_idx] = di->dfsnum;
 
-	stack = kmalloc_array(bb_num - 1, sizeof(struct edge_iter), GFP_KERNEL);
+	stack = kmalloc_array(bb_num - 1, sizeof(struct edge_node *),
+			      GFP_KERNEL);
 	if (!stack)
 		return -ENOMEM;
 
@@ -550,29 +595,26 @@ calc_dfs_tree(struct bpf_verifier_env *env, struct bpf_subprog_info *subprog,
 		entry_bb = exit_bb(bb_list);
 		exit_bb = entry_bb(bb_list);
 		di->dfs_to_bb[di->dfsnum++] = exit_bb;
-		ei.edge = first_edge(&entry_bb->e_prevs);
-		ei.list_head = &entry_bb->e_prevs;
+		e = cfg_node_delink(allocator, &entry_bb->e_prevs);
 	} else {
 		entry_bb = entry_bb(bb_list);
 		exit_bb = exit_bb(bb_list);
 		di->dfs_to_bb[di->dfsnum++] = entry_bb;
-		ei.edge = first_edge(&entry_bb->e_succs);
-		ei.list_head = &entry_bb->e_succs;
+		e = cfg_node_delink(allocator, &entry_bb->e_succs);
 	}
 
 	while (1) {
 		struct bb_node *bb_dst, *bb_src;
 
-		while (!ei_end_p(&ei)) {
-			e = ei.edge;
-
+		while (e) {
 			if (reverse) {
 				bb_dst = e->src;
 				if (bb_dst != exit_bb)
 					visited[bb_dst->idx] = true;
 				if (bb_dst == exit_bb ||
 				    di->dfs_order[bb_dst->idx]) {
-					ei_next(&ei);
+					e = cfg_node_delink(allocator,
+							    &e->link);
 					continue;
 				}
 				bb_src = e->dst;
@@ -582,7 +624,8 @@ calc_dfs_tree(struct bpf_verifier_env *env, struct bpf_subprog_info *subprog,
 					visited[bb_dst->idx] = true;
 				if (bb_dst == exit_bb ||
 				    di->dfs_order[bb_dst->idx]) {
-					ei_next(&ei);
+					e = cfg_node_delink(allocator,
+							    &e->link);
 					continue;
 				}
 				bb_src = e->src;
@@ -598,21 +641,20 @@ calc_dfs_tree(struct bpf_verifier_env *env, struct bpf_subprog_info *subprog,
 			di->dfs_to_bb[idx] = bb_dst;
 			di->dfs_parent[idx] = parent_idx;
 
-			stack[sp++] = ei;
-			if (reverse) {
-				ei.edge = first_edge(&bb_dst->e_prevs);
-				ei.list_head = &bb_dst->e_prevs;
-			} else {
-				ei.edge = first_edge(&bb_dst->e_succs);
-				ei.list_head = &bb_dst->e_succs;
-			}
+			stack[sp++] = e;
+			if (reverse)
+				e = cfg_node_delink(allocator,
+						    &bb_dst->e_prevs);
+			else
+				e = cfg_node_delink(allocator,
+						    &bb_dst->e_succs);
 		}
 
 		if (!sp)
 			break;
 
-		ei = stack[--sp];
-		ei_next(&ei);
+		e = stack[--sp];
+		e = cfg_node_delink(allocator, &e->link);
 	}
 
 	kfree(stack);
@@ -679,6 +721,7 @@ static int idoms_to_doms(struct bpf_subprog_info *subprog, struct dom_info *di)
  */
 
 int subprog_build_dom_info(struct bpf_verifier_env *env,
+			   struct cfg_node_allocator *allocator,
 			   struct bpf_subprog_info *subprog)
 {
 	struct dom_info di;
@@ -688,11 +731,11 @@ int subprog_build_dom_info(struct bpf_verifier_env *env,
 	if (ret < 0)
 		goto free_dominfo;
 
-	ret = calc_dfs_tree(env, subprog, &di, false);
+	ret = calc_dfs_tree(env, allocator, subprog, &di, false);
 	if (ret < 0)
 		goto free_dominfo;
 
-	calc_idoms(subprog, &di, false);
+	calc_idoms(allocator, subprog, &di, false);
 	ret = idoms_to_doms(subprog, &di);
 	if (ret < 0)
 		goto free_dominfo;
@@ -706,25 +749,33 @@ int subprog_build_dom_info(struct bpf_verifier_env *env,
 	return ret;
 }
 
-bool subprog_has_loop(struct bpf_subprog_info *subprog)
+bool subprog_has_loop(struct cfg_node_allocator *allocator,
+		      struct bpf_subprog_info *subprog)
 {
 	int lane_len = BITS_TO_LONGS(subprog->bb_num - 2);
-	struct list_head *bb_list = &subprog->bbs;
-	struct bb_node *bb, *entry_bb;
+	struct bb_node *bb, *entry_bb, *exit_bb;
+	void **bb_list = (void **)&subprog->bbs;
 	struct edge_node *e;
 
 	entry_bb = entry_bb(bb_list);
-	bb = bb_next(entry_bb);
-	list_for_each_entry_from(bb, &exit_bb(bb_list)->l, l)
-		list_for_each_entry(e, &bb->e_prevs, l) {
+	exit_bb = exit_bb(bb_list);
+	bb = bb_next(allocator, entry_bb);
+	while (bb && bb != exit_bb) {
+		e = cfg_node_delink(allocator, &bb->e_prevs);
+		while (e) {
 			struct bb_node *latch = e->src;
 
 			if (latch != entry_bb &&
 			    test_bit(bb->idx,
 				     subprog->dtree + latch->idx * lane_len))
 				return true;
+
+			e = cfg_node_delink(allocator, &e->link);
 		}
 
+		bb = bb_next(allocator, bb);
+	}
+
 	return false;
 }
 
@@ -768,37 +819,34 @@ int add_subprog(struct bpf_verifier_env *env, int off)
 }
 
 struct callee_iter {
-	struct list_head *list_head;
+	struct cedge_node *head;
 	struct cedge_node *callee;
 };
 
-#define first_callee(c_list)	list_first_entry(c_list, struct cedge_node, l)
-#define next_callee(c)		list_next_entry(c, l)
-
 static bool ci_end_p(struct callee_iter *ci)
 {
-	return &ci->callee->l == ci->list_head;
+	return !ci->callee;
 }
 
-static void ci_next(struct callee_iter *ci)
+static void ci_next(struct cfg_node_allocator *allocator,
+		    struct callee_iter *ci)
 {
 	struct cedge_node *c = ci->callee;
 
-	ci->callee = next_callee(c);
+	ci->callee = cfg_node_delink(allocator, &c->link);
 }
 
 #define EXPLORING	1
 #define EXPLORED	2
 int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
+				       struct cfg_node_allocator *allocator,
 				       struct bpf_subprog_info *subprog)
 {
-	struct callee_iter *stack;
+	int sp = 0, idx = 0, ret, *status;
+	struct callee_iter *stack, ci;
 	struct cedge_node *callee;
-	int sp = 0, idx = 0, ret;
-	struct callee_iter ci;
-	int *status;
 
-	stack = kmalloc_array(128, sizeof(struct callee_iter), GFP_KERNEL);
+	stack = kmalloc_array(64, sizeof(struct callee_iter), GFP_KERNEL);
 	if (!stack)
 		return -ENOMEM;
 	status = kcalloc(env->subprog_cnt, sizeof(int), GFP_KERNEL);
@@ -806,8 +854,8 @@ int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
 		kfree(stack);
 		return -ENOMEM;
 	}
-	ci.callee = first_callee(&subprog->callees);
-	ci.list_head = &subprog->callees;
+	ci.head = subprog->callees;
+	ci.callee = subprog->callees;
 	status[0] = EXPLORING;
 
 	while (1) {
@@ -822,20 +870,19 @@ int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
 
 			status[idx] = EXPLORING;
 
-			if (sp == 127) {
+			if (sp == 64) {
 				bpf_verifier_log_write(env, "cgraph - call frame too deep\n");
 				ret = -EINVAL;
 				goto err_free;
 			}
 
 			stack[sp++] = ci;
-			ci.callee = first_callee(&subprog[idx].callees);
-			ci.list_head = &subprog[idx].callees;
+			ci.head = subprog[idx].callees;
+			ci.callee = subprog[idx].callees;
 		}
 
-		if (!list_empty(ci.list_head))
-			status[first_callee(ci.list_head)->caller_idx] =
-				EXPLORED;
+		if (ci.head)
+			status[ci.head->caller_idx] = EXPLORED;
 		else
 			/* leaf func. */
 			status[idx] = EXPLORED;
@@ -844,7 +891,7 @@ int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
 			break;
 
 		ci = stack[--sp];
-		ci_next(&ci);
+		ci_next(allocator, &ci);
 	}
 
 	for (idx = 0; idx < env->subprog_cnt; idx++)
@@ -863,27 +910,37 @@ int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
 
 int subprog_append_callee(struct bpf_verifier_env *env,
 			  struct cfg_node_allocator *allocator,
-			  struct list_head *callees_list,
-			  int caller_idx, int off)
+			  void **callees_list, int caller_idx, int off)
 {
-	int callee_idx = find_subprog(env, off);
+	int callee_idx = find_subprog(env, off), ret;
 	struct cedge_node *new_callee, *callee;
+	struct mem_frag frag;
 
 	if (callee_idx < 0)
 		return callee_idx;
 
-	list_for_each_entry(callee, callees_list, l) {
+	callee = (struct cedge_node *)*callees_list;
+	while (callee) {
 		if (callee->callee_idx == callee_idx)
 			return 0;
+
+		callee = cfg_node_delink(allocator, &callee->link);
 	}
 
-	new_callee = get_single_cedge_node(allocator);
-	if (!new_callee)
-		return -ENOMEM;
+	ret = get_single_cedge_node(allocator, &frag);
+	if (ret < 0)
+		return ret;
 
+	new_callee = frag.p;
 	new_callee->caller_idx = caller_idx;
 	new_callee->callee_idx = callee_idx;
-	list_add_tail(&new_callee->l, callees_list);
+	callee = (struct cedge_node *)*callees_list;
+	if (!callee) {
+		*callees_list = new_callee;
+	} else {
+		new_callee->link = callee->link;
+		callee->link = frag.link;
+	}
 
 	return 0;
 }
diff --git a/kernel/bpf/cfg.h b/kernel/bpf/cfg.h
index 1868587..cb833bd 100644
--- a/kernel/bpf/cfg.h
+++ b/kernel/bpf/cfg.h
@@ -8,9 +8,11 @@
 #ifndef __BPF_CFG_H__
 #define __BPF_CFG_H__
 
+#define MAX_POOL_NUM	32
+
 struct cfg_node_allocator {
-	struct list_head pools;
-	struct node_pool *cur_free_pool;
+	void *base[MAX_POOL_NUM];
+	u8 pool_cnt;
 };
 
 int add_subprog(struct bpf_verifier_env *env, int off);
@@ -18,22 +20,23 @@ void cfg_node_allocator_free(struct cfg_node_allocator *allocator);
 int cfg_node_allocator_init(struct cfg_node_allocator *allocator,
 			    int bb_num_esti, int cedge_num_esti);
 int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
+				       struct cfg_node_allocator *allocator,
 				       struct bpf_subprog_info *subprog);
 int find_subprog(struct bpf_verifier_env *env, int off);
 int subprog_add_bb_edges(struct cfg_node_allocator *allocator,
-			 struct bpf_insn *insns, struct list_head *bb_list);
+			 struct bpf_insn *insns, void **bb_list);
 int subprog_append_bb(struct cfg_node_allocator *allocator,
-		      struct list_head *bb_list, int head);
+		      void **bb_list, int head);
 int subprog_append_callee(struct bpf_verifier_env *env,
 			  struct cfg_node_allocator *allocator,
-			  struct list_head *bb_list, int caller_idx, int off);
+			  void **callees, int caller_idx, int off);
 int subprog_build_dom_info(struct bpf_verifier_env *env,
+			   struct cfg_node_allocator *allocator,
 			   struct bpf_subprog_info *subprog);
-int subprog_fini_bb(struct cfg_node_allocator *allocator,
-		    struct list_head *bb_list, int subprog_end);
-bool subprog_has_loop(struct bpf_subprog_info *subprog);
-int subprog_init_bb(struct cfg_node_allocator *allocator,
-		    struct list_head *bb_list, int subprog_start);
+bool subprog_has_loop(struct cfg_node_allocator *allocator,
+		      struct bpf_subprog_info *subprog);
+int subprog_init_bb(struct cfg_node_allocator *allocator, void **bb_list,
+		    int subprog_start, int subprog_end);
 void subprog_free(struct bpf_subprog_info *subprog, int end_idx);
 
 #endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index db4a3ca..fad0975 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -741,9 +741,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
 	struct bpf_subprog_info *subprog = env->subprog_info;
-	struct list_head *cur_bb_list, *cur_callee_list;
 	struct bpf_insn *insn = env->prog->insnsi;
 	int cedge_num_esti = 0, bb_num_esti = 3;
+	void **cur_bb_list, **cur_callee_list;
 	struct cfg_node_allocator allocator;
 	int insn_cnt = env->prog->len;
 	u8 code;
@@ -788,24 +788,19 @@ static int check_subprogs(struct bpf_verifier_env *env)
 	 */
 	subprog[env->subprog_cnt].start = insn_cnt;
 
-	for (i = 0; i < env->subprog_cnt; i++) {
+	for (i = 0; i < env->subprog_cnt; i++)
 		if (env->log.level > 1)
 			verbose(env, "func#%d @%d\n", i, subprog[i].start);
 
-		/* Don't init callees inside add_subprog which will sort the
-		 * array which breaks list link.
-		 */
-		INIT_LIST_HEAD(&subprog[i].callees);
-	}
-
 	ret = cfg_node_allocator_init(&allocator, bb_num_esti,
 				      cedge_num_esti);
 	if (ret < 0)
 		return ret;
 	subprog_start = subprog[cur_subprog].start;
 	subprog_end = subprog[cur_subprog + 1].start;
-	cur_bb_list = &subprog[cur_subprog].bbs;
-	ret = subprog_init_bb(&allocator, cur_bb_list, subprog_start);
+	cur_bb_list = (void **)&subprog[cur_subprog].bbs;
+	ret = subprog_init_bb(&allocator, cur_bb_list, subprog_start,
+			      subprog_end);
 	if (ret < 0)
 		goto free_nodes;
 	cur_callee_list = &env->subprog_info[cur_subprog].callees;
@@ -884,20 +879,17 @@ static int check_subprogs(struct bpf_verifier_env *env)
 				goto free_nodes;
 			}
 
-			ret = subprog_fini_bb(&allocator, cur_bb_list,
-					      subprog_end);
-			if (ret < 0)
-				goto free_nodes;
 			ret = subprog_add_bb_edges(&allocator, insn,
 						   cur_bb_list);
 			if (ret < 0)
 				goto free_nodes;
 			subprog[cur_subprog].bb_num = ret;
-			ret = subprog_build_dom_info(env,
+			ret = subprog_build_dom_info(env, &allocator,
 						     &subprog[cur_subprog]);
 			if (ret < 0)
 				goto free_nodes;
-			if (subprog_has_loop(&subprog[cur_subprog])) {
+			if (subprog_has_loop(&allocator,
+					     &subprog[cur_subprog])) {
 				verbose(env, "cfg - loop detected");
 				ret = -EINVAL;
 				goto free_nodes;
@@ -906,9 +898,11 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			cur_subprog++;
 			if (cur_subprog < env->subprog_cnt) {
 				subprog_end = subprog[cur_subprog + 1].start;
-				cur_bb_list = &subprog[cur_subprog].bbs;
+				cur_bb_list =
+					(void **)&subprog[cur_subprog].bbs;
 				ret = subprog_init_bb(&allocator, cur_bb_list,
-						      subprog_start);
+						      subprog_start,
+						      subprog_end);
 				if (ret < 0)
 					goto free_nodes;
 				cur_callee_list = &subprog[cur_subprog].callees;
@@ -916,7 +910,8 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		}
 	}
 
-	ret = cgraph_check_recursive_unreachable(env, env->subprog_info);
+	ret = cgraph_check_recursive_unreachable(env, &allocator,
+						 env->subprog_info);
 	if (ret < 0)
 		goto free_nodes;
 
-- 
2.7.4

^ permalink raw reply related

* [RFC bpf-next 09/10] bpf: cfg: reduce k*alloc/free call by using memory pool for allocating nodes
From: Jiong Wang @ 2018-05-07 10:22 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: john.fastabend, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

During building control flow graph, we need to build basic block nodes
and edge nodes etc. These nodes needs to allocated dynamically as we
don't have pre-scan pass to know how many nodes we need accurately.
It is better to manage their allocation using memory pool which could
reduce calling of *alloc/kfree functions and also could simplify freeing
nodes.

This patch:
  - implements a simple memory pool based node allocator.
    nodes need dynamic allocation migrated to this allocator.
  - estimate node numbers inside subprog detection pass, asking allocator
    to do an initial allocation of estimated size (aligned to 2K). The pool
    will grow later if space are not enough.
  - There is no support on returning memory back to the pool.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/cfg.c      | 164 +++++++++++++++++++++++++++++++++++++-------------
 kernel/bpf/cfg.h      |  21 +++++--
 kernel/bpf/verifier.c |  39 +++++++++---
 3 files changed, 170 insertions(+), 54 deletions(-)

diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
index 174e3f0..1aeac49 100644
--- a/kernel/bpf/cfg.c
+++ b/kernel/bpf/cfg.c
@@ -82,7 +82,113 @@ struct dom_info {
 	u16 dfsnum;
 };
 
-int subprog_append_bb(struct list_head *bb_list, int head)
+struct node_pool {
+	struct list_head l;
+	void *data;
+	u32 size;
+	u32 used;
+};
+
+#define first_node_pool(pool_list)	\
+	list_first_entry(pool_list, struct node_pool, l)
+
+#define MEM_CHUNK_SIZE	(1024)
+
+static int cfg_node_allocator_grow(struct cfg_node_allocator *allocator,
+				   int min_grow_size)
+{
+	int s = min_grow_size;
+	struct node_pool *pool;
+	void *data;
+
+	s += sizeof(struct node_pool);
+	s = ALIGN(s, MEM_CHUNK_SIZE);
+	data = kzalloc(s, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	pool = (struct node_pool *)data;
+	pool->data = pool + 1;
+	pool->size = s - sizeof(struct node_pool);
+	pool->used = 0;
+	allocator->cur_free_pool = pool;
+	list_add_tail(&pool->l, &allocator->pools);
+
+	return 0;
+}
+
+static void *cfg_node_alloc(struct cfg_node_allocator *allocator, int size)
+{
+	struct node_pool *pool = allocator->cur_free_pool;
+	void *p;
+
+	if (pool->used + size > pool->size) {
+		int ret = cfg_node_allocator_grow(allocator, size);
+
+		if (ret < 0)
+			return NULL;
+
+		pool = allocator->cur_free_pool;
+	}
+
+	p = pool->data + pool->used;
+	pool->used += size;
+
+	return p;
+}
+
+static struct bb_node *get_single_bb_nodes(struct cfg_node_allocator *allocator)
+{
+	int size = sizeof(struct bb_node);
+
+	return (struct bb_node *)cfg_node_alloc(allocator, size);
+}
+
+static struct edge_node *get_edge_nodes(struct cfg_node_allocator *allocator,
+					int num)
+{
+	int size = num * sizeof(struct edge_node);
+
+	return (struct edge_node *)cfg_node_alloc(allocator, size);
+}
+
+static struct cedge_node *
+get_single_cedge_node(struct cfg_node_allocator *allocator)
+{
+	int size = sizeof(struct cedge_node);
+
+	return (struct cedge_node *)cfg_node_alloc(allocator, size);
+}
+
+int cfg_node_allocator_init(struct cfg_node_allocator *allocator,
+			    int bb_num_esti, int cedge_num_esti)
+{
+	int s = bb_num_esti * sizeof(struct bb_node), ret;
+
+	s += 2 * bb_num_esti * sizeof(struct edge_node);
+	s += cedge_num_esti * sizeof(struct cedge_node);
+	INIT_LIST_HEAD(&allocator->pools);
+	ret = cfg_node_allocator_grow(allocator, s);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+void cfg_node_allocator_free(struct cfg_node_allocator *allocator)
+{
+	struct list_head *pools = &allocator->pools;
+	struct node_pool *pool, *tmp;
+
+	pool = first_node_pool(pools);
+	list_for_each_entry_safe_from(pool, tmp, pools, l) {
+		list_del(&pool->l);
+		kfree(pool);
+	}
+}
+
+int subprog_append_bb(struct cfg_node_allocator *allocator,
+		      struct list_head *bb_list, int head)
 {
 	struct bb_node *new_bb, *bb;
 
@@ -94,7 +200,7 @@ int subprog_append_bb(struct list_head *bb_list, int head)
 	}
 
 	bb = bb_prev(bb);
-	new_bb = kzalloc(sizeof(*new_bb), GFP_KERNEL);
+	new_bb = get_single_bb_nodes(allocator);
 	if (!new_bb)
 		return -ENOMEM;
 
@@ -106,9 +212,10 @@ int subprog_append_bb(struct list_head *bb_list, int head)
 	return 0;
 }
 
-int subprog_fini_bb(struct list_head *bb_list, int subprog_end)
+int subprog_fini_bb(struct cfg_node_allocator *allocator,
+		    struct list_head *bb_list, int subprog_end)
 {
-	struct bb_node *bb = kzalloc(sizeof(*bb), GFP_KERNEL);
+	struct bb_node *bb = get_single_bb_nodes(allocator);
 
 	if (!bb)
 		return -ENOMEM;
@@ -118,7 +225,7 @@ int subprog_fini_bb(struct list_head *bb_list, int subprog_end)
 	INIT_LIST_HEAD(&bb->e_succs);
 	list_add(&bb->l, bb_list);
 
-	bb = kzalloc(sizeof(*bb), GFP_KERNEL);
+	bb = get_single_bb_nodes(allocator);
 	if (!bb)
 		return -ENOMEM;
 	/* exit bb. */
@@ -130,12 +237,13 @@ int subprog_fini_bb(struct list_head *bb_list, int subprog_end)
 	return 0;
 }
 
-int subprog_init_bb(struct list_head *bb_list, int subprog_start)
+int subprog_init_bb(struct cfg_node_allocator *allocator,
+		    struct list_head *bb_list, int subprog_start)
 {
 	int ret;
 
 	INIT_LIST_HEAD(bb_list);
-	ret = subprog_append_bb(bb_list, subprog_start);
+	ret = subprog_append_bb(allocator, bb_list, subprog_start);
 	if (ret < 0)
 		return ret;
 
@@ -154,14 +262,15 @@ static struct bb_node *search_bb_with_head(struct list_head *bb_list, int head)
 	return NULL;
 }
 
-int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list)
+int subprog_add_bb_edges(struct cfg_node_allocator *allocator,
+			 struct bpf_insn *insns, struct list_head *bb_list)
 {
 	struct bb_node *bb, *exit_bb;
 	struct edge_node *edge;
 	int bb_num;
 
 	bb = entry_bb(bb_list);
-	edge = kcalloc(2, sizeof(*edge), GFP_KERNEL);
+	edge = get_edge_nodes(allocator, 2);
 	if (!edge)
 		return -ENOMEM;
 	edge->src = bb;
@@ -186,7 +295,7 @@ int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list)
 
 		bb->idx = bb_num++;
 
-		edge = kcalloc(2, sizeof(*edge), GFP_KERNEL);
+		edge = get_edge_nodes(allocator, 2);
 		if (!edge)
 			return -ENOMEM;
 		edge->src = bb;
@@ -222,7 +331,7 @@ int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list)
 			struct bb_node *tgt;
 
 			if (!edge) {
-				edge = kcalloc(2, sizeof(*edge), GFP_KERNEL);
+				edge = get_edge_nodes(allocator, 2);
 				if (!edge)
 					return -ENOMEM;
 				edge->src = bb;
@@ -753,6 +862,7 @@ int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
 }
 
 int subprog_append_callee(struct bpf_verifier_env *env,
+			  struct cfg_node_allocator *allocator,
 			  struct list_head *callees_list,
 			  int caller_idx, int off)
 {
@@ -767,7 +877,7 @@ int subprog_append_callee(struct bpf_verifier_env *env,
 			return 0;
 	}
 
-	new_callee = kzalloc(sizeof(*new_callee), GFP_KERNEL);
+	new_callee = get_single_cedge_node(allocator);
 	if (!new_callee)
 		return -ENOMEM;
 
@@ -778,41 +888,11 @@ int subprog_append_callee(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static void subprog_free_edge(struct bb_node *bb)
-{
-	struct list_head *succs = &bb->e_succs;
-	struct edge_node *e, *tmp;
-
-	/* prevs and succs are allocated as pair, succs is the start addr. */
-	list_for_each_entry_safe(e, tmp, succs, l) {
-		list_del(&e->l);
-		kfree(e);
-	}
-}
-
 void subprog_free(struct bpf_subprog_info *subprog, int end_idx)
 {
 	int i = 0;
 
 	for (; i <= end_idx; i++) {
-		struct list_head *callees = &subprog[i].callees;
-		struct list_head *bbs = &subprog[i].bbs;
-		struct cedge_node *callee, *tmp_callee;
-		struct bb_node *bb, *tmp, *exit;
-
-		bb = entry_bb(bbs);
-		exit = exit_bb(bbs);
-		list_for_each_entry_safe_from(bb, tmp, &exit->l, l) {
-			subprog_free_edge(bb);
-			list_del(&bb->l);
-			kfree(bb);
-		}
-
-		list_for_each_entry_safe(callee, tmp_callee, callees, l) {
-			list_del(&callee->l);
-			kfree(callee);
-		}
-
 		if (subprog[i].dtree_avail)
 			kfree(subprog[i].dtree);
 	}
diff --git a/kernel/bpf/cfg.h b/kernel/bpf/cfg.h
index 577c22c..1868587 100644
--- a/kernel/bpf/cfg.h
+++ b/kernel/bpf/cfg.h
@@ -8,19 +8,32 @@
 #ifndef __BPF_CFG_H__
 #define __BPF_CFG_H__
 
+struct cfg_node_allocator {
+	struct list_head pools;
+	struct node_pool *cur_free_pool;
+};
+
 int add_subprog(struct bpf_verifier_env *env, int off);
+void cfg_node_allocator_free(struct cfg_node_allocator *allocator);
+int cfg_node_allocator_init(struct cfg_node_allocator *allocator,
+			    int bb_num_esti, int cedge_num_esti);
 int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
 				       struct bpf_subprog_info *subprog);
 int find_subprog(struct bpf_verifier_env *env, int off);
-int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list);
-int subprog_append_bb(struct list_head *bb_list, int head);
+int subprog_add_bb_edges(struct cfg_node_allocator *allocator,
+			 struct bpf_insn *insns, struct list_head *bb_list);
+int subprog_append_bb(struct cfg_node_allocator *allocator,
+		      struct list_head *bb_list, int head);
 int subprog_append_callee(struct bpf_verifier_env *env,
+			  struct cfg_node_allocator *allocator,
 			  struct list_head *bb_list, int caller_idx, int off);
 int subprog_build_dom_info(struct bpf_verifier_env *env,
 			   struct bpf_subprog_info *subprog);
-int subprog_fini_bb(struct list_head *bb_list, int subprog_end);
+int subprog_fini_bb(struct cfg_node_allocator *allocator,
+		    struct list_head *bb_list, int subprog_end);
 bool subprog_has_loop(struct bpf_subprog_info *subprog);
-int subprog_init_bb(struct list_head *bb_list, int subprog_start);
+int subprog_init_bb(struct cfg_node_allocator *allocator,
+		    struct list_head *bb_list, int subprog_start);
 void subprog_free(struct bpf_subprog_info *subprog, int end_idx);
 
 #endif
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 54f2fe3..db4a3ca 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -743,7 +743,10 @@ static int check_subprogs(struct bpf_verifier_env *env)
 	struct bpf_subprog_info *subprog = env->subprog_info;
 	struct list_head *cur_bb_list, *cur_callee_list;
 	struct bpf_insn *insn = env->prog->insnsi;
+	int cedge_num_esti = 0, bb_num_esti = 3;
+	struct cfg_node_allocator allocator;
 	int insn_cnt = env->prog->len;
+	u8 code;
 
 	/* Add entry function. */
 	ret = add_subprog(env, 0);
@@ -752,8 +755,18 @@ static int check_subprogs(struct bpf_verifier_env *env)
 
 	/* determine subprog starts. The end is one before the next starts */
 	for (i = 0; i < insn_cnt; i++) {
-		if (insn[i].code != (BPF_JMP | BPF_CALL))
+		code = insn[i].code;
+		if (BPF_CLASS(code) != BPF_JMP)
+			continue;
+		if (BPF_OP(code) == BPF_EXIT) {
+			if (i + 1 < insn_cnt)
+				bb_num_esti++;
 			continue;
+		}
+		if (BPF_OP(code) != BPF_CALL) {
+			bb_num_esti += 2;
+			continue;
+		}
 		if (insn[i].src_reg != BPF_PSEUDO_CALL)
 			continue;
 		if (!env->allow_ptr_leaks) {
@@ -764,6 +777,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			verbose(env, "function calls in offloaded programs are not supported yet\n");
 			return -EINVAL;
 		}
+		cedge_num_esti++;
 		ret = add_subprog(env, i + insn[i].imm + 1);
 		if (ret < 0)
 			return ret;
@@ -784,10 +798,14 @@ static int check_subprogs(struct bpf_verifier_env *env)
 		INIT_LIST_HEAD(&subprog[i].callees);
 	}
 
+	ret = cfg_node_allocator_init(&allocator, bb_num_esti,
+				      cedge_num_esti);
+	if (ret < 0)
+		return ret;
 	subprog_start = subprog[cur_subprog].start;
 	subprog_end = subprog[cur_subprog + 1].start;
 	cur_bb_list = &subprog[cur_subprog].bbs;
-	ret = subprog_init_bb(cur_bb_list, subprog_start);
+	ret = subprog_init_bb(&allocator, cur_bb_list, subprog_start);
 	if (ret < 0)
 		goto free_nodes;
 	cur_callee_list = &env->subprog_info[cur_subprog].callees;
@@ -800,7 +818,8 @@ static int check_subprogs(struct bpf_verifier_env *env)
 
 		if (BPF_OP(code) == BPF_EXIT) {
 			if (i + 1 < subprog_end) {
-				ret = subprog_append_bb(cur_bb_list, i + 1);
+				ret = subprog_append_bb(&allocator, cur_bb_list,
+							i + 1);
 				if (ret < 0)
 					goto free_nodes;
 			}
@@ -812,6 +831,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 				int target = i + insn[i].imm + 1;
 
 				ret = subprog_append_callee(env,
+							    &allocator,
 							    cur_callee_list,
 							    cur_subprog,
 							    target);
@@ -835,12 +855,12 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			goto free_nodes;
 		}
 
-		ret = subprog_append_bb(cur_bb_list, off);
+		ret = subprog_append_bb(&allocator, cur_bb_list, off);
 		if (ret < 0)
 			goto free_nodes;
 
 		if (i + 1 < subprog_end) {
-			ret = subprog_append_bb(cur_bb_list, i + 1);
+			ret = subprog_append_bb(&allocator, cur_bb_list, i + 1);
 			if (ret < 0)
 				goto free_nodes;
 		}
@@ -864,10 +884,12 @@ static int check_subprogs(struct bpf_verifier_env *env)
 				goto free_nodes;
 			}
 
-			ret = subprog_fini_bb(cur_bb_list, subprog_end);
+			ret = subprog_fini_bb(&allocator, cur_bb_list,
+					      subprog_end);
 			if (ret < 0)
 				goto free_nodes;
-			ret = subprog_add_bb_edges(insn, cur_bb_list);
+			ret = subprog_add_bb_edges(&allocator, insn,
+						   cur_bb_list);
 			if (ret < 0)
 				goto free_nodes;
 			subprog[cur_subprog].bb_num = ret;
@@ -885,7 +907,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			if (cur_subprog < env->subprog_cnt) {
 				subprog_end = subprog[cur_subprog + 1].start;
 				cur_bb_list = &subprog[cur_subprog].bbs;
-				ret = subprog_init_bb(cur_bb_list,
+				ret = subprog_init_bb(&allocator, cur_bb_list,
 						      subprog_start);
 				if (ret < 0)
 					goto free_nodes;
@@ -901,6 +923,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 	ret = 0;
 
 free_nodes:
+	cfg_node_allocator_free(&allocator);
 	subprog_free(subprog, cur_subprog == env->subprog_cnt ?
 				cur_subprog - 1 : cur_subprog);
 	return ret;
-- 
2.7.4

^ permalink raw reply related

* [RFC bpf-next 08/10] bpf: cfg: remove push_insn and check_cfg
From: Jiong Wang @ 2018-05-07 10:22 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: john.fastabend, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

As we have detected loop and unreachable insns based on domination
information and call graph, there is no need of check_cfg.

This patch removes check_cfg and it's associated push_insn.

state prune heuristic marking as moved to check_subprog.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 228 ++++----------------------------------------------
 1 file changed, 16 insertions(+), 212 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 12f5399..54f2fe3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -735,6 +735,8 @@ enum reg_arg_type {
 	DST_OP_NO_MARK	/* same as above, check only, don't mark */
 };
 
+#define STATE_LIST_MARK ((struct bpf_verifier_state_list *)-1L)
+
 static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
@@ -815,8 +817,14 @@ static int check_subprogs(struct bpf_verifier_env *env)
 							    target);
 				if (ret < 0)
 					return ret;
+
+				env->explored_states[target] = STATE_LIST_MARK;
+				env->explored_states[i] = STATE_LIST_MARK;
 			}
 
+			if (i + 1 < insn_cnt)
+				env->explored_states[i + 1] = STATE_LIST_MARK;
+
 			goto next;
 		}
 
@@ -836,6 +844,13 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			if (ret < 0)
 				goto free_nodes;
 		}
+
+		if (BPF_OP(code) != BPF_JA) {
+			env->explored_states[i] = STATE_LIST_MARK;
+			env->explored_states[off] = STATE_LIST_MARK;
+		} else if (i + 1 < insn_cnt) {
+			env->explored_states[i + 1] = STATE_LIST_MARK;
+		}
 next:
 		if (i == subprog_end - 1) {
 			/* to avoid fall-through from one subprog into another
@@ -3951,217 +3966,6 @@ static int check_return_code(struct bpf_verifier_env *env)
 	return 0;
 }
 
-/* non-recursive DFS pseudo code
- * 1  procedure DFS-iterative(G,v):
- * 2      label v as discovered
- * 3      let S be a stack
- * 4      S.push(v)
- * 5      while S is not empty
- * 6            t <- S.pop()
- * 7            if t is what we're looking for:
- * 8                return t
- * 9            for all edges e in G.adjacentEdges(t) do
- * 10               if edge e is already labelled
- * 11                   continue with the next edge
- * 12               w <- G.adjacentVertex(t,e)
- * 13               if vertex w is not discovered and not explored
- * 14                   label e as tree-edge
- * 15                   label w as discovered
- * 16                   S.push(w)
- * 17                   continue at 5
- * 18               else if vertex w is discovered
- * 19                   label e as back-edge
- * 20               else
- * 21                   // vertex w is explored
- * 22                   label e as forward- or cross-edge
- * 23           label t as explored
- * 24           S.pop()
- *
- * convention:
- * 0x10 - discovered
- * 0x11 - discovered and fall-through edge labelled
- * 0x12 - discovered and fall-through and branch edges labelled
- * 0x20 - explored
- */
-
-enum {
-	DISCOVERED = 0x10,
-	EXPLORED = 0x20,
-	FALLTHROUGH = 1,
-	BRANCH = 2,
-};
-
-#define STATE_LIST_MARK ((struct bpf_verifier_state_list *) -1L)
-
-static int *insn_stack;	/* stack of insns to process */
-static int cur_stack;	/* current stack index */
-static int *insn_state;
-
-/* t, w, e - match pseudo-code above:
- * t - index of current instruction
- * w - next instruction
- * e - edge
- */
-static int push_insn(int t, int w, int e, struct bpf_verifier_env *env)
-{
-	if (e == FALLTHROUGH && insn_state[t] >= (DISCOVERED | FALLTHROUGH))
-		return 0;
-
-	if (e == BRANCH && insn_state[t] >= (DISCOVERED | BRANCH))
-		return 0;
-
-	if (w < 0 || w >= env->prog->len) {
-		verbose(env, "jump out of range from insn %d to %d\n", t, w);
-		return -EINVAL;
-	}
-
-	if (e == BRANCH)
-		/* mark branch target for state pruning */
-		env->explored_states[w] = STATE_LIST_MARK;
-
-	if (insn_state[w] == 0) {
-		/* tree-edge */
-		insn_state[t] = DISCOVERED | e;
-		insn_state[w] = DISCOVERED;
-		if (cur_stack >= env->prog->len)
-			return -E2BIG;
-		insn_stack[cur_stack++] = w;
-		return 1;
-	} else if ((insn_state[w] & 0xF0) == DISCOVERED) {
-		verbose(env, "back-edge from insn %d to %d\n", t, w);
-		return -EINVAL;
-	} else if (insn_state[w] == EXPLORED) {
-		/* forward- or cross-edge */
-		insn_state[t] = DISCOVERED | e;
-	} else {
-		verbose(env, "insn state internal bug\n");
-		return -EFAULT;
-	}
-	return 0;
-}
-
-/* non-recursive depth-first-search to detect loops in BPF program
- * loop == back-edge in directed graph
- */
-static int check_cfg(struct bpf_verifier_env *env)
-{
-	struct bpf_insn *insns = env->prog->insnsi;
-	int insn_cnt = env->prog->len;
-	int ret = 0;
-	int i, t;
-
-	ret = check_subprogs(env);
-	if (ret < 0)
-		return ret;
-
-	insn_state = kcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
-	if (!insn_state)
-		return -ENOMEM;
-
-	insn_stack = kcalloc(insn_cnt, sizeof(int), GFP_KERNEL);
-	if (!insn_stack) {
-		kfree(insn_state);
-		return -ENOMEM;
-	}
-
-	insn_state[0] = DISCOVERED; /* mark 1st insn as discovered */
-	insn_stack[0] = 0; /* 0 is the first instruction */
-	cur_stack = 1;
-
-peek_stack:
-	if (cur_stack == 0)
-		goto check_state;
-	t = insn_stack[cur_stack - 1];
-
-	if (BPF_CLASS(insns[t].code) == BPF_JMP) {
-		u8 opcode = BPF_OP(insns[t].code);
-
-		if (opcode == BPF_EXIT) {
-			goto mark_explored;
-		} else if (opcode == BPF_CALL) {
-			ret = push_insn(t, t + 1, FALLTHROUGH, env);
-			if (ret == 1)
-				goto peek_stack;
-			else if (ret < 0)
-				goto err_free;
-			if (t + 1 < insn_cnt)
-				env->explored_states[t + 1] = STATE_LIST_MARK;
-			if (insns[t].src_reg == BPF_PSEUDO_CALL) {
-				env->explored_states[t] = STATE_LIST_MARK;
-				ret = push_insn(t, t + insns[t].imm + 1, BRANCH, env);
-				if (ret == 1)
-					goto peek_stack;
-				else if (ret < 0)
-					goto err_free;
-			}
-		} else if (opcode == BPF_JA) {
-			if (BPF_SRC(insns[t].code) != BPF_K) {
-				ret = -EINVAL;
-				goto err_free;
-			}
-			/* unconditional jump with single edge */
-			ret = push_insn(t, t + insns[t].off + 1,
-					FALLTHROUGH, env);
-			if (ret == 1)
-				goto peek_stack;
-			else if (ret < 0)
-				goto err_free;
-			/* tell verifier to check for equivalent states
-			 * after every call and jump
-			 */
-			if (t + 1 < insn_cnt)
-				env->explored_states[t + 1] = STATE_LIST_MARK;
-		} else {
-			/* conditional jump with two edges */
-			env->explored_states[t] = STATE_LIST_MARK;
-			ret = push_insn(t, t + 1, FALLTHROUGH, env);
-			if (ret == 1)
-				goto peek_stack;
-			else if (ret < 0)
-				goto err_free;
-
-			ret = push_insn(t, t + insns[t].off + 1, BRANCH, env);
-			if (ret == 1)
-				goto peek_stack;
-			else if (ret < 0)
-				goto err_free;
-		}
-	} else {
-		/* all other non-branch instructions with single
-		 * fall-through edge
-		 */
-		ret = push_insn(t, t + 1, FALLTHROUGH, env);
-		if (ret == 1)
-			goto peek_stack;
-		else if (ret < 0)
-			goto err_free;
-	}
-
-mark_explored:
-	insn_state[t] = EXPLORED;
-	if (cur_stack-- <= 0) {
-		verbose(env, "pop stack internal bug\n");
-		ret = -EFAULT;
-		goto err_free;
-	}
-	goto peek_stack;
-
-check_state:
-	for (i = 0; i < insn_cnt; i++) {
-		if (insn_state[i] != EXPLORED) {
-			verbose(env, "unreachable insn %d\n", i);
-			ret = -EINVAL;
-			goto err_free;
-		}
-	}
-	ret = 0; /* cfg looks good */
-
-err_free:
-	kfree(insn_state);
-	kfree(insn_stack);
-	return ret;
-}
-
 /* check %cur's range satisfies %old's */
 static bool range_within(struct bpf_reg_state *old,
 			 struct bpf_reg_state *cur)
@@ -5710,7 +5514,7 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
 
 	env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);
 
-	ret = check_cfg(env);
+	ret = check_subprogs(env);
 	if (ret < 0)
 		goto skip_full_check;
 
-- 
2.7.4

^ permalink raw reply related

* [RFC bpf-next 07/10] bpf: cfg: build call graph and detect unreachable/recursive call
From: Jiong Wang @ 2018-05-07 10:22 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: john.fastabend, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

This patch build call graph during insn scan inside check_subprogs.
Then do recursive and unreachable subprog detection using call graph.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |   1 +
 kernel/bpf/cfg.c             | 133 +++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/cfg.h             |   4 ++
 kernel/bpf/verifier.c        |  32 +++++++++--
 4 files changed, 166 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9c0a808..96337ba 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -176,6 +176,7 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 struct bpf_subprog_info {
 	u32 start; /* insn idx of function entry point */
 	u16 stack_depth; /* max. stack depth used by this function */
+	struct list_head callees; /* callees list. */
 	struct list_head bbs; /* basic blocks list. */
 	u32 bb_num; /* total basic block num. */
 	unsigned long *dtree;
diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
index a34a95c..174e3f0 100644
--- a/kernel/bpf/cfg.c
+++ b/kernel/bpf/cfg.c
@@ -13,6 +13,12 @@
 
 #include "cfg.h"
 
+struct cedge_node {
+	struct list_head l;
+	u8 caller_idx;
+	u8 callee_idx;
+};
+
 struct edge_node {
 	struct list_head l;
 	struct bb_node *src;
@@ -652,6 +658,126 @@ int add_subprog(struct bpf_verifier_env *env, int off)
 	return 0;
 }
 
+struct callee_iter {
+	struct list_head *list_head;
+	struct cedge_node *callee;
+};
+
+#define first_callee(c_list)	list_first_entry(c_list, struct cedge_node, l)
+#define next_callee(c)		list_next_entry(c, l)
+
+static bool ci_end_p(struct callee_iter *ci)
+{
+	return &ci->callee->l == ci->list_head;
+}
+
+static void ci_next(struct callee_iter *ci)
+{
+	struct cedge_node *c = ci->callee;
+
+	ci->callee = next_callee(c);
+}
+
+#define EXPLORING	1
+#define EXPLORED	2
+int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
+				       struct bpf_subprog_info *subprog)
+{
+	struct callee_iter *stack;
+	struct cedge_node *callee;
+	int sp = 0, idx = 0, ret;
+	struct callee_iter ci;
+	int *status;
+
+	stack = kmalloc_array(128, sizeof(struct callee_iter), GFP_KERNEL);
+	if (!stack)
+		return -ENOMEM;
+	status = kcalloc(env->subprog_cnt, sizeof(int), GFP_KERNEL);
+	if (!status) {
+		kfree(stack);
+		return -ENOMEM;
+	}
+	ci.callee = first_callee(&subprog->callees);
+	ci.list_head = &subprog->callees;
+	status[0] = EXPLORING;
+
+	while (1) {
+		while (!ci_end_p(&ci)) {
+			callee = ci.callee;
+			idx = callee->callee_idx;
+			if (status[idx] == EXPLORING) {
+				bpf_verifier_log_write(env, "cgraph - recursive call\n");
+				ret = -EINVAL;
+				goto err_free;
+			}
+
+			status[idx] = EXPLORING;
+
+			if (sp == 127) {
+				bpf_verifier_log_write(env, "cgraph - call frame too deep\n");
+				ret = -EINVAL;
+				goto err_free;
+			}
+
+			stack[sp++] = ci;
+			ci.callee = first_callee(&subprog[idx].callees);
+			ci.list_head = &subprog[idx].callees;
+		}
+
+		if (!list_empty(ci.list_head))
+			status[first_callee(ci.list_head)->caller_idx] =
+				EXPLORED;
+		else
+			/* leaf func. */
+			status[idx] = EXPLORED;
+
+		if (!sp)
+			break;
+
+		ci = stack[--sp];
+		ci_next(&ci);
+	}
+
+	for (idx = 0; idx < env->subprog_cnt; idx++)
+		if (status[idx] != EXPLORED) {
+			bpf_verifier_log_write(env, "cgraph - unreachable subprog\n");
+			ret = -EINVAL;
+			goto err_free;
+		}
+
+	ret = 0;
+err_free:
+	kfree(status);
+	kfree(stack);
+	return ret;
+}
+
+int subprog_append_callee(struct bpf_verifier_env *env,
+			  struct list_head *callees_list,
+			  int caller_idx, int off)
+{
+	int callee_idx = find_subprog(env, off);
+	struct cedge_node *new_callee, *callee;
+
+	if (callee_idx < 0)
+		return callee_idx;
+
+	list_for_each_entry(callee, callees_list, l) {
+		if (callee->callee_idx == callee_idx)
+			return 0;
+	}
+
+	new_callee = kzalloc(sizeof(*new_callee), GFP_KERNEL);
+	if (!new_callee)
+		return -ENOMEM;
+
+	new_callee->caller_idx = caller_idx;
+	new_callee->callee_idx = callee_idx;
+	list_add_tail(&new_callee->l, callees_list);
+
+	return 0;
+}
+
 static void subprog_free_edge(struct bb_node *bb)
 {
 	struct list_head *succs = &bb->e_succs;
@@ -669,7 +795,9 @@ void subprog_free(struct bpf_subprog_info *subprog, int end_idx)
 	int i = 0;
 
 	for (; i <= end_idx; i++) {
+		struct list_head *callees = &subprog[i].callees;
 		struct list_head *bbs = &subprog[i].bbs;
+		struct cedge_node *callee, *tmp_callee;
 		struct bb_node *bb, *tmp, *exit;
 
 		bb = entry_bb(bbs);
@@ -680,6 +808,11 @@ void subprog_free(struct bpf_subprog_info *subprog, int end_idx)
 			kfree(bb);
 		}
 
+		list_for_each_entry_safe(callee, tmp_callee, callees, l) {
+			list_del(&callee->l);
+			kfree(callee);
+		}
+
 		if (subprog[i].dtree_avail)
 			kfree(subprog[i].dtree);
 	}
diff --git a/kernel/bpf/cfg.h b/kernel/bpf/cfg.h
index 57eab9b..577c22c 100644
--- a/kernel/bpf/cfg.h
+++ b/kernel/bpf/cfg.h
@@ -9,9 +9,13 @@
 #define __BPF_CFG_H__
 
 int add_subprog(struct bpf_verifier_env *env, int off);
+int cgraph_check_recursive_unreachable(struct bpf_verifier_env *env,
+				       struct bpf_subprog_info *subprog);
 int find_subprog(struct bpf_verifier_env *env, int off);
 int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list);
 int subprog_append_bb(struct list_head *bb_list, int head);
+int subprog_append_callee(struct bpf_verifier_env *env,
+			  struct list_head *bb_list, int caller_idx, int off);
 int subprog_build_dom_info(struct bpf_verifier_env *env,
 			   struct bpf_subprog_info *subprog);
 int subprog_fini_bb(struct list_head *bb_list, int subprog_end);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 72bda84..12f5399 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -739,9 +739,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
 	struct bpf_subprog_info *subprog = env->subprog_info;
+	struct list_head *cur_bb_list, *cur_callee_list;
 	struct bpf_insn *insn = env->prog->insnsi;
 	int insn_cnt = env->prog->len;
-	struct list_head *cur_bb_list;
 
 	/* Add entry function. */
 	ret = add_subprog(env, 0);
@@ -772,16 +772,23 @@ static int check_subprogs(struct bpf_verifier_env *env)
 	 */
 	subprog[env->subprog_cnt].start = insn_cnt;
 
-	if (env->log.level > 1)
-		for (i = 0; i < env->subprog_cnt; i++)
+	for (i = 0; i < env->subprog_cnt; i++) {
+		if (env->log.level > 1)
 			verbose(env, "func#%d @%d\n", i, subprog[i].start);
 
+		/* Don't init callees inside add_subprog which will sort the
+		 * array which breaks list link.
+		 */
+		INIT_LIST_HEAD(&subprog[i].callees);
+	}
+
 	subprog_start = subprog[cur_subprog].start;
 	subprog_end = subprog[cur_subprog + 1].start;
 	cur_bb_list = &subprog[cur_subprog].bbs;
 	ret = subprog_init_bb(cur_bb_list, subprog_start);
 	if (ret < 0)
 		goto free_nodes;
+	cur_callee_list = &env->subprog_info[cur_subprog].callees;
 	/* now check that all jumps are within the same subprog */
 	for (i = 0; i < insn_cnt; i++) {
 		u8 code = insn[i].code;
@@ -798,8 +805,20 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			goto next;
 		}
 
-		if (BPF_OP(code) == BPF_CALL)
+		if (BPF_OP(code) == BPF_CALL) {
+			if (insn[i].src_reg == BPF_PSEUDO_CALL) {
+				int target = i + insn[i].imm + 1;
+
+				ret = subprog_append_callee(env,
+							    cur_callee_list,
+							    cur_subprog,
+							    target);
+				if (ret < 0)
+					return ret;
+			}
+
 			goto next;
+		}
 
 		off = i + insn[i].off + 1;
 		if (off < subprog_start || off >= subprog_end) {
@@ -855,10 +874,15 @@ static int check_subprogs(struct bpf_verifier_env *env)
 						      subprog_start);
 				if (ret < 0)
 					goto free_nodes;
+				cur_callee_list = &subprog[cur_subprog].callees;
 			}
 		}
 	}
 
+	ret = cgraph_check_recursive_unreachable(env, env->subprog_info);
+	if (ret < 0)
+		goto free_nodes;
+
 	ret = 0;
 
 free_nodes:
-- 
2.7.4

^ permalink raw reply related

* [RFC bpf-next 06/10] bpf: cfg: move find_subprog/add_subprog to cfg.c
From: Jiong Wang @ 2018-05-07 10:22 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: john.fastabend, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

This patch centre find_subprog and add_subprog to cfg.c.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/cfg.c      | 41 +++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/cfg.h      |  2 ++
 kernel/bpf/verifier.c | 42 ------------------------------------------
 3 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
index 7ce1472..a34a95c 100644
--- a/kernel/bpf/cfg.c
+++ b/kernel/bpf/cfg.c
@@ -6,8 +6,10 @@
  */
 
 #include <linux/bpf_verifier.h>
+#include <linux/bsearch.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/sort.h>
 
 #include "cfg.h"
 
@@ -611,6 +613,45 @@ bool subprog_has_loop(struct bpf_subprog_info *subprog)
 	return false;
 }
 
+static int cmp_subprogs(const void *a, const void *b)
+{
+	return ((struct bpf_subprog_info *)a)->start -
+	       ((struct bpf_subprog_info *)b)->start;
+}
+
+int find_subprog(struct bpf_verifier_env *env, int off)
+{
+	struct bpf_subprog_info *p;
+
+	p = bsearch(&off, env->subprog_info, env->subprog_cnt,
+		    sizeof(env->subprog_info[0]), cmp_subprogs);
+	if (!p)
+		return -ENOENT;
+	return p - env->subprog_info;
+}
+
+int add_subprog(struct bpf_verifier_env *env, int off)
+{
+	int insn_cnt = env->prog->len;
+	int ret;
+
+	if (off >= insn_cnt || off < 0) {
+		bpf_verifier_log_write(env, "call to invalid destination\n");
+		return -EINVAL;
+	}
+	ret = find_subprog(env, off);
+	if (ret >= 0)
+		return 0;
+	if (env->subprog_cnt >= BPF_MAX_SUBPROGS) {
+		bpf_verifier_log_write(env, "too many subprograms\n");
+		return -E2BIG;
+	}
+	env->subprog_info[env->subprog_cnt++].start = off;
+	sort(env->subprog_info, env->subprog_cnt,
+	     sizeof(env->subprog_info[0]), cmp_subprogs, NULL);
+	return 0;
+}
+
 static void subprog_free_edge(struct bb_node *bb)
 {
 	struct list_head *succs = &bb->e_succs;
diff --git a/kernel/bpf/cfg.h b/kernel/bpf/cfg.h
index 02729a9..57eab9b 100644
--- a/kernel/bpf/cfg.h
+++ b/kernel/bpf/cfg.h
@@ -8,6 +8,8 @@
 #ifndef __BPF_CFG_H__
 #define __BPF_CFG_H__
 
+int add_subprog(struct bpf_verifier_env *env, int off);
+int find_subprog(struct bpf_verifier_env *env, int off);
 int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list);
 int subprog_append_bb(struct list_head *bb_list, int head);
 int subprog_build_dom_info(struct bpf_verifier_env *env,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 46d5eae..72bda84 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20,8 +20,6 @@
 #include <linux/file.h>
 #include <linux/vmalloc.h>
 #include <linux/stringify.h>
-#include <linux/bsearch.h>
-#include <linux/sort.h>
 
 #include "cfg.h"
 #include "disasm.h"
@@ -737,46 +735,6 @@ enum reg_arg_type {
 	DST_OP_NO_MARK	/* same as above, check only, don't mark */
 };
 
-static int cmp_subprogs(const void *a, const void *b)
-{
-	return ((struct bpf_subprog_info *)a)->start -
-	       ((struct bpf_subprog_info *)b)->start;
-}
-
-static int find_subprog(struct bpf_verifier_env *env, int off)
-{
-	struct bpf_subprog_info *p;
-
-	p = bsearch(&off, env->subprog_info, env->subprog_cnt,
-		    sizeof(env->subprog_info[0]), cmp_subprogs);
-	if (!p)
-		return -ENOENT;
-	return p - env->subprog_info;
-
-}
-
-static int add_subprog(struct bpf_verifier_env *env, int off)
-{
-	int insn_cnt = env->prog->len;
-	int ret;
-
-	if (off >= insn_cnt || off < 0) {
-		verbose(env, "call to invalid destination\n");
-		return -EINVAL;
-	}
-	ret = find_subprog(env, off);
-	if (ret >= 0)
-		return 0;
-	if (env->subprog_cnt >= BPF_MAX_SUBPROGS) {
-		verbose(env, "too many subprograms\n");
-		return -E2BIG;
-	}
-	env->subprog_info[env->subprog_cnt++].start = off;
-	sort(env->subprog_info, env->subprog_cnt,
-	     sizeof(env->subprog_info[0]), cmp_subprogs, NULL);
-	return 0;
-}
-
 static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
-- 
2.7.4

^ permalink raw reply related

* [RFC bpf-next 05/10] bpf: cfg: detect unreachable basic blocks
From: Jiong Wang @ 2018-05-07 10:22 UTC (permalink / raw)
  To: alexei.starovoitov, daniel
  Cc: john.fastabend, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

Do unreachable basic blocks detection as a side-product of the dfs walk
when building domination information.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/cfg.c      | 31 ++++++++++++++++++++++++++-----
 kernel/bpf/cfg.h      |  3 ++-
 kernel/bpf/verifier.c |  3 ++-
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/cfg.c b/kernel/bpf/cfg.c
index 90692e4..7ce1472 100644
--- a/kernel/bpf/cfg.c
+++ b/kernel/bpf/cfg.c
@@ -407,15 +407,17 @@ static void calc_idoms(struct bpf_subprog_info *subprog, struct dom_info *di,
 			di->idom[idx] = di->idom[di->idom[idx]];
 }
 
-static int calc_dfs_tree(struct bpf_subprog_info *subprog, struct dom_info *di,
-			 bool reverse)
+static int
+calc_dfs_tree(struct bpf_verifier_env *env, struct bpf_subprog_info *subprog,
+	      struct dom_info *di, bool reverse)
 {
-	u16 bb_num = subprog->bb_num, sp = 0, idx, parent_idx;
+	u16 bb_num = subprog->bb_num, sp = 0, idx, parent_idx, i;
 	struct list_head *bb_list = &subprog->bbs;
 	u16 entry_bb_fake_idx = bb_num - 2;
 	struct bb_node *entry_bb, *exit_bb;
 	struct edge_iter ei, *stack;
 	struct edge_node *e;
+	bool *visited;
 
 	di->dfs_order[entry_bb_fake_idx] = di->dfsnum;
 
@@ -423,6 +425,10 @@ static int calc_dfs_tree(struct bpf_subprog_info *subprog, struct dom_info *di,
 	if (!stack)
 		return -ENOMEM;
 
+	visited = kcalloc(bb_num - 2, sizeof(bool), GFP_KERNEL);
+	if (!visited)
+		return -ENOMEM;
+
 	if (reverse) {
 		entry_bb = exit_bb(bb_list);
 		exit_bb = entry_bb(bb_list);
@@ -445,6 +451,8 @@ static int calc_dfs_tree(struct bpf_subprog_info *subprog, struct dom_info *di,
 
 			if (reverse) {
 				bb_dst = e->src;
+				if (bb_dst != exit_bb)
+					visited[bb_dst->idx] = true;
 				if (bb_dst == exit_bb ||
 				    di->dfs_order[bb_dst->idx]) {
 					ei_next(&ei);
@@ -453,6 +461,8 @@ static int calc_dfs_tree(struct bpf_subprog_info *subprog, struct dom_info *di,
 				bb_src = e->dst;
 			} else {
 				bb_dst = e->dst;
+				if (bb_dst != exit_bb)
+					visited[bb_dst->idx] = true;
 				if (bb_dst == exit_bb ||
 				    di->dfs_order[bb_dst->idx]) {
 					ei_next(&ei);
@@ -490,6 +500,16 @@ static int calc_dfs_tree(struct bpf_subprog_info *subprog, struct dom_info *di,
 
 	kfree(stack);
 
+	for (i = 0; i < bb_num - 2; i++) {
+		if (!visited[i]) {
+			bpf_verifier_log_write(env, "cfg - unreachable insn\n");
+			kfree(visited);
+			return -EINVAL;
+		}
+	}
+
+	kfree(visited);
+
 	return 0;
 }
 
@@ -541,7 +561,8 @@ static int idoms_to_doms(struct bpf_subprog_info *subprog, struct dom_info *di)
  * The implementation also referenced GNU GCC 3.0.
  */
 
-int subprog_build_dom_info(struct bpf_subprog_info *subprog)
+int subprog_build_dom_info(struct bpf_verifier_env *env,
+			   struct bpf_subprog_info *subprog)
 {
 	struct dom_info di;
 	int ret;
@@ -550,7 +571,7 @@ int subprog_build_dom_info(struct bpf_subprog_info *subprog)
 	if (ret < 0)
 		goto free_dominfo;
 
-	ret = calc_dfs_tree(subprog, &di, false);
+	ret = calc_dfs_tree(env, subprog, &di, false);
 	if (ret < 0)
 		goto free_dominfo;
 
diff --git a/kernel/bpf/cfg.h b/kernel/bpf/cfg.h
index c02c4cf..02729a9 100644
--- a/kernel/bpf/cfg.h
+++ b/kernel/bpf/cfg.h
@@ -10,7 +10,8 @@
 
 int subprog_add_bb_edges(struct bpf_insn *insns, struct list_head *bb_list);
 int subprog_append_bb(struct list_head *bb_list, int head);
-int subprog_build_dom_info(struct bpf_subprog_info *subprog);
+int subprog_build_dom_info(struct bpf_verifier_env *env,
+			   struct bpf_subprog_info *subprog);
 int subprog_fini_bb(struct list_head *bb_list, int subprog_end);
 bool subprog_has_loop(struct bpf_subprog_info *subprog);
 int subprog_init_bb(struct list_head *bb_list, int subprog_start);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a93aa43..46d5eae 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -879,7 +879,8 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			if (ret < 0)
 				goto free_nodes;
 			subprog[cur_subprog].bb_num = ret;
-			ret = subprog_build_dom_info(&subprog[cur_subprog]);
+			ret = subprog_build_dom_info(env,
+						     &subprog[cur_subprog]);
 			if (ret < 0)
 				goto free_nodes;
 			if (subprog_has_loop(&subprog[cur_subprog])) {
-- 
2.7.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox