Netdev List
* Re: [patch net 0/2] mlxsw: Fixes in GRE offloading
From: David Miller @ 2017-10-02 18:19 UTC
  To: jiri; +Cc: netdev, petrm, idosch, mlxsw
In-Reply-To: <20171002101457.1462-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Mon,  2 Oct 2017 12:14:55 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Petr says:
> 
> This patchset fixes a couple unrelated problems in offloading IP-in-IP tunnels
> in mlxsw driver.
> 
> - The first patch fixes a potential reference-counting problem that might lead
>   to a kernel crash.
> 
> - The second patch associates IPIP next hops with their loopback RIFs. Besides
>   being the right thing to do, it also fixes a problem where offloaded IPv6
>   routes that forward to IP-in-IP netdevices were not flagged as such.

Series applied.

* Re: [RFC net-next 1/5] net: dsa: Add infrastructure to support LAG
From: Florian Fainelli @ 2017-10-02 18:19 UTC
  To: Andrew Lunn
  Cc: netdev, vivien.didelot, jiri, idosch, Woojung.Huh, john,
	sean.wang
In-Reply-To: <20171002020327.GA21593@lunn.ch>

On 10/01/2017 07:03 PM, Andrew Lunn wrote:
> On Sun, Oct 01, 2017 at 12:46:35PM -0700, Florian Fainelli wrote:
>> Add the necessary logic to support network device events targeting LAG events,
>> this is loosely inspired from mlxsw/spectrum.c.
>>
>> In the process we change dsa_slave_changeupper() to be more generic and be called
>> from both LAG events as well as normal bridge enslaving events paths.
>>
>> The DSA layer takes care of managing the LAG group identifiers, how many LAGs
>> may be supported by a switch, and how many members per LAG are supported by a
>> switch device. When a LAG group is identified, the port is then configured to
>> be a part of that group. When a LAG group no longer has any users, we remove it
>> and we tell the drivers whether it is safe to disable trunking altogether.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  include/net/dsa.h  |  25 +++++++++
>>  net/dsa/dsa2.c     |  12 ++++
>>  net/dsa/dsa_priv.h |   7 +++
>>  net/dsa/port.c     |  92 +++++++++++++++++++++++++++++++
>>  net/dsa/slave.c    | 157 +++++++++++++++++++++++++++++++++++++++++++++++++----
>>  net/dsa/switch.c   |  30 ++++++++++
>>  6 files changed, 312 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index 10dceccd9ce8..247ea58add68 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -182,12 +182,20 @@ struct dsa_port {
>>  	u8			stp_state;
>>  	struct net_device	*bridge_dev;
>>  	struct devlink_port	devlink_port;
>> +	u8			lag_id;
>> +	bool			lagged;
>>  	/*
>>  	 * Original copy of the master netdev ethtool_ops
>>  	 */
>>  	const struct ethtool_ops *orig_ethtool_ops;
>>  };
>>  
>> +struct dsa_lag_group {
>> +	/* Used to know when we can disable lag on the switch */
>> +	unsigned int		ref_count;
> 
> Hi Florian
> 
> In what contexts is ref_count manipulated? Normally you would use
> refcount_t and the operations in linux/refcount.h. But if you know
> there is some other protection, e.g. rtnl, an unsigned int is O.K.
> Maybe scatter some ASSERT_RTNL() in the code?

Hi Andrew,

This is called with rtnl held, but this is a good point. In fact, I
don't think we need the reference count at all. What I am going to
propose now is that we just maintain a bitmask of port members per LAG
group (along with the reference to the LAG device). When the Hamming
weight of that bitmask is 1 at removal time, we are removing the last
port of the LAG group and we can stop using that LAG group. This also
allows us to remove the port_lag_member operation, since we would be
maintaining that at the DSA layer now.
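
A minimal sketch of the bitmask idea above, assuming at most 32 ports per
switch. The names lag_group_sketch, lag_join() and lag_leave() are
hypothetical and do not come from the patch; in the kernel the population
count would normally be hweight32() rather than the open-coded loop used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-LAG state; not taken from the patch. */
struct lag_group_sketch {
	const void *lag_dev;	/* stands in for the LAG struct net_device */
	uint32_t members;	/* bit N set => switch port N is in this LAG */
};

/* Portable Hamming weight (number of set bits). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Record that a port joined the LAG backed by lag_dev. */
static void lag_join(struct lag_group_sketch *lag, const void *lag_dev,
		     unsigned int port)
{
	assert(port < 32);
	lag->lag_dev = lag_dev;
	lag->members |= UINT32_C(1) << port;
}

/*
 * Remove a port. Returns true when that port was the last member, which is
 * the point where the LAG identifier can be released.
 */
static bool lag_leave(struct lag_group_sketch *lag, unsigned int port)
{
	uint32_t bit;
	bool last;

	assert(port < 32);
	bit = UINT32_C(1) << port;
	last = popcount32(lag->members) == 1 && (lag->members & bit) != 0;
	lag->members &= ~bit;
	if (last)
		lag->lag_dev = NULL;
	return last;
}
```

With this layout no separate reference count is needed: the membership
bitmask already encodes how many users the group has.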

> 
>> +static bool dsa_slave_lag_check(struct net_device *dev, struct net_device *lag_dev,
>> +				struct netdev_lag_upper_info *lag_upper_info)
>> +{
>> +	struct dsa_slave_priv *p = netdev_priv(dev);
>> +	u8 lag_id;
>> +
>> +	/* No more lag identifiers available or already in use */
>> +	if (dsa_switch_lag_get_index(p->dp->ds, lag_dev, &lag_id) != 0)
>> +		return false;
>> +
>> +	if (lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH)
>> +		return false;
> 
> I wonder if the driver needs to decide this? Can different hardware
> support different tx_types?

That is a valid point. For instance, the b53/bcm_sf2 switches can only
hash on MAC DA and SA, SA only, or DA only; they cannot hash on anything
above L2 addresses. So this does appear to be something that the driver
should indeed decide.
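
One way the per-driver decision could look, as a rough sketch only. The
enum, the ops structure and the callback name lag_tx_type_ok are all
hypothetical; they are not part of the DSA API or of this patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the LAG transmit policies a bond can request. */
enum lag_tx_type_sketch {
	LAG_TX_ACTIVEBACKUP,
	LAG_TX_HASH,
	LAG_TX_ROUNDROBIN,
	LAG_TX_TYPE_COUNT
};

/* Hypothetical driver ops: the driver, not the core, judges the tx type. */
struct switch_ops_sketch {
	/* NULL means the driver cannot offload LAG at all. */
	bool (*lag_tx_type_ok)(enum lag_tx_type_sketch type);
};

/* A switch that can only hash on L2 addresses accepts hash mode only. */
static bool l2_hash_only_ok(enum lag_tx_type_sketch type)
{
	return type == LAG_TX_HASH;
}

static const struct switch_ops_sketch l2_hash_switch = {
	.lag_tx_type_ok = l2_hash_only_ok,
};

static const struct switch_ops_sketch no_lag_switch = {
	.lag_tx_type_ok = NULL,
};

/* Core-side check: defer to the driver instead of hard-coding HASH. */
static bool core_lag_check(const struct switch_ops_sketch *ops,
			   enum lag_tx_type_sketch type)
{
	assert(type < LAG_TX_TYPE_COUNT);
	if (!ops->lag_tx_type_ok)
		return false;
	return ops->lag_tx_type_ok(type);
}
```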
-- 
Florian

* Re: [PATCH net-next] selftests: rtnetlink.sh: add vxlan and fou test cases
From: David Miller @ 2017-10-02 18:15 UTC
  To: fw; +Cc: netdev
In-Reply-To: <20171002100529.602-1-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Mon,  2 Oct 2017 12:05:29 +0200

> fou test lifted from ip-fou man page.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

I love seeing new testcases ;-)

Applied, thanks.

* Re: [kernel-hardening] [PATCH 0/2] capability controlled user-namespaces
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-10-02 18:12 UTC
  To: Serge E. Hallyn
  Cc: Mahesh Bandewar, LKML, Netdev, Kernel-hardening, Linux API,
	Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller
In-Reply-To: <20171002171410.GA19611-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>

On Mon, Oct 2, 2017 at 10:14 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Mahesh Bandewar (mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org):
>> From: Mahesh Bandewar <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>>
>> [Same as the previous RFC series sent on 9/21]
>>
>> TL;DR version
>> -------------
>> Creating a sandbox environment with namespaces is challenging
>> considering what these sandboxed processes can engage into. e.g.
>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
>> Current form of user-namespaces, however, if changed a bit can allow
>> us to create a sandbox environment without locking down user-
>> namespaces.
>>
>> Detailed version
>> ----------------
>
> Hi,
>
> still struggling with how I feel about the idea in general.
>
> So is the intent mainly that if/when there comes an 0-day which allows
> users with CAP_NET_ADMIN in any namespace to gain privilege on the host,
> then this can be used as a stop-gap measure until there is a proper fix?
>
Thanks for looking at this, Serge.

Yes, but at the same time it's not just limited to NET_ADMIN but could
be any of the current capabilities.

> Otherwise, do you have any guidance for how people should use this?
>
> IMO it should be heavily discouraged to use this tool as a regular
> day to day configuration, as I'm not sure there is any "educated"
> decision to be made, even by those who are in the know, about what
> to put in this set.
>
I think that really depends on the environment. For example, in certain
sandboxes a third-party / semi-trusted workload is executed that does not
use network resources. In that environment I can easily take off
NET_ADMIN and NET_RAW without affecting anything there. At the same
time I won't have to worry about 0-days related to these two
capabilities. I would say the admins at these places are in the best
position to decide what they can take off safely and what they cannot.
Even if they decide not to take off anything, having a tool at hand to
gain control is important when the next 0-day strikes that can be
exploited using any of the currently used capabilities.

However, you are absolutely right about using it as a stop-gap
measure to protect the environment until the bug is fixed, when the
capability in question cannot be safely taken off permanently without
hampering operations.
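
To make the discussion concrete, here is a rough sketch of the whitelist
gate for controlled user-namespaces. Only the capability numbers (12 for
CAP_NET_ADMIN, 13 for CAP_NET_RAW, as in linux/capability.h) are real; the
structure, the variable and the function names are hypothetical, and the
ordinary capability checks that would still apply are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in linux/capability.h. */
#define SKETCH_CAP_NET_ADMIN	12
#define SKETCH_CAP_NET_RAW	13

/* Hypothetical stand-in for a user namespace. */
struct userns_sketch {
	bool controlled;	/* set for namespaces created by sandboxed tasks */
};

/* Hypothetical global whitelist; all bits set keeps current behavior. */
static uint64_t cap_whitelist = UINT64_MAX;

/* Admin action: take one capability off the whitelist. */
static void whitelist_drop(unsigned int cap)
{
	assert(cap < 64);
	cap_whitelist &= ~(UINT64_C(1) << cap);
}

/*
 * Models only the extra gate: a controlled namespace may use a capability
 * only while it is on the whitelist. Uncontrolled namespaces are unaffected.
 */
static bool cap_allowed(const struct userns_sketch *ns, unsigned int cap)
{
	assert(cap < 64);
	if (!ns->controlled)
		return true;
	return ((cap_whitelist >> cap) & 1) != 0;
}
```

Dropping NET_ADMIN and NET_RAW this way affects only the sandboxed
namespaces, which is the property argued for above.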

thanks,
--mahesh..

[...]

* Re: [PATCH net-next 0/2] flow_dissector: dissect tunnel info
From: David Miller @ 2017-10-02 18:06 UTC
  To: simon.horman; +Cc: jiri, jhs, xiyou.wangcong, netdev, oss-drivers
In-Reply-To: <1506933676-20121-1-git-send-email-simon.horman@netronome.com>

From: Simon Horman <simon.horman@netronome.com>
Date: Mon,  2 Oct 2017 10:41:14 +0200

> Move dissection of tunnel info from the flower classifier to the flow
> dissector where all other dissection occurs.  This should not have any
> behavioural effect on other users of the flow dissector.

Series applied, thanks Simon.

* Re: [PATCH net-next 0/3] bridge: neigh msg proxy and flood suppression support
From: Stephen Hemminger @ 2017-10-02 18:02 UTC
  To: Roopa Prabhu
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Nikolay Aleksandrov,
	bridge
In-Reply-To: <CAJieiUim2XLMGAomb3S5AeWfYqjxV_raetedWcA_PBiaGPRHWg@mail.gmail.com>

On Mon, 2 Oct 2017 07:49:09 -0700
Roopa Prabhu <roopa@cumulusnetworks.com> wrote:

> On Sun, Oct 1, 2017 at 9:36 PM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> > From: Roopa Prabhu <roopa@cumulusnetworks.com>
> >
> > This series implements arp and nd suppression in the bridge
> > driver for ethernet vpns. It implements rfc7432, section 10
> > https://tools.ietf.org/html/rfc7432#section-10
> > for ethernet VPN deployments. It is similar to the existing
> > BR_ARP_PROXY flag but has a few semantic differences to conform
> > to EVPN standard. In case of EVPN, it is mainly used to avoid flooding to
> > tunnel ports like vxlan/mpls. Unlike the existing flags it suppresses flood
> > of all neigh discovery packets (arp, nd) to tunnel ports.
> >
> > Roopa Prabhu (3):
> >   bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd
> >     flood
> >   neigh arp suppress first
> >   bridge: suppress nd messages from going to BR_NEIGH_SUPPRESS ports
> >  
> 
> pls ignore, shows conflict applying over recent net-next bridge
> changes. Will rebase and submit v2.

Ok, but the concept looks good.

* Re: [PATCH net-next 02/12] qed: Add ll2 ability of opening a secondary queue
From: David Miller @ 2017-10-02 17:56 UTC
  To: Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA
In-Reply-To: <1506932638-26268-3-git-send-email-Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

From: Michal Kalderon <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Date: Mon, 2 Oct 2017 11:23:48 +0300

> When more than one ll2 queue is opened ( that is not an OOO queue )
> ll2 code does not have enough information to determine whether
> the queue is the main one or not, so a new field is added to the
> acquire input data to expose the control of determining whether
> the queue is the main queue or a secondary queue.
> 
> Signed-off-by: Michal Kalderon <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Ariel Elior <Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c | 7 ++++++-
>  drivers/net/ethernet/qlogic/qed/qed_ll2.h | 1 +
>  include/linux/qed/qed_ll2_if.h            | 1 +
>  3 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> index 10e3a43..1dd0cca 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
> @@ -894,7 +894,7 @@ static int qed_sp_ll2_rx_queue_start(struct qed_hwfn *p_hwfn,
>  	p_ramrod->drop_ttl0_flg = p_ll2_conn->input.rx_drop_ttl0_flg;
>  	p_ramrod->inner_vlan_removal_en = p_ll2_conn->input.rx_vlan_removal_en;
>  	p_ramrod->queue_id = p_ll2_conn->queue_id;
> -	p_ramrod->main_func_queue = (conn_type == QED_LL2_TYPE_OOO) ? 0 : 1;
> +	p_ramrod->main_func_queue = p_ll2_conn->main_func_queue;
>  
>  	if ((IS_MF_DEFAULT(p_hwfn) || IS_MF_SI(p_hwfn)) &&
>  	    p_ramrod->main_func_queue && (conn_type != QED_LL2_TYPE_ROCE) &&
> @@ -1265,6 +1265,11 @@ int qed_ll2_acquire_connection(void *cxt, struct qed_ll2_acquire_data *data)
>  
>  	p_ll2_info->tx_dest = (data->input.tx_dest == QED_LL2_TX_DEST_NW) ?
>  			      CORE_TX_DEST_NW : CORE_TX_DEST_LB;
> +	if (data->input.conn_type == QED_LL2_TYPE_OOO ||
> +	    data->input.secondary_queue)
> +		p_ll2_info->main_func_queue = false;
> +	else
> +		p_ll2_info->main_func_queue = true;
 ...
> +	u8 main_func_queue;

If these things are bools please use the 'bool' type.

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH net-next 01/12] qed: Add ll2 option to limit the number of bds per packet
From: David Miller @ 2017-10-02 17:56 UTC
  To: Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA
In-Reply-To: <1506932638-26268-2-git-send-email-Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>

From: Michal Kalderon <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Date: Mon, 2 Oct 2017 11:23:47 +0300

> +		p_pkt = (void *)((u8 *)p_tx->descq_array + desc_size * i);

Hmmm... this is definitely a red flag.

> @@ -63,17 +63,14 @@ struct qed_ll2_rx_packet {
>  struct qed_ll2_tx_packet {
>  	struct list_head list_entry;
>  	u16 bd_used;
> -	u16 vlan;
> -	u16 l4_hdr_offset_w;
> -	u8 bd_flags;
>  	bool notify_fw;
>  	void *cookie;
> -
> +	/* Flexible Array of bds_set determined by max_bds_per_packet */
>  	struct {
>  		struct core_tx_bd *txq_bd;
>  		dma_addr_t tx_frag;
>  		u16 frag_len;
> -	} bds_set[ETH_TX_MAX_BDS_PER_NON_LSO_PACKET];
> +	} bds_set[1];
>  };

If you do this then you have to make the ->descq_array a void pointer
or something.

Otherwise someone will try to access it as an array and it will
explode because the elements of the array are of a variable size.
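
A sketch of the safer layout being asked for, under stated assumptions:
the struct and function names are hypothetical, the fields are simplified,
and a C99 flexible array member is used where the patch used bds_set[1].
The point is that the backing store is opaque (void *), so the only way to
reach element i is through an accessor that applies the run-time stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for one buffer descriptor slot. */
struct bd_sketch {
	void *txq_bd;
	unsigned long tx_frag;
	unsigned short frag_len;
};

/* Packet descriptor whose size depends on max_bds chosen at run time. */
struct tx_packet_sketch {
	unsigned short bd_used;
	struct bd_sketch bds_set[];	/* flexible array member */
};

struct tx_queue_sketch {
	void *descq_mem;	/* opaque on purpose: cannot be indexed with [] */
	size_t desc_size;	/* stride of one variable-size descriptor */
	size_t count;
};

static int txq_alloc(struct tx_queue_sketch *q, size_t count, size_t max_bds)
{
	q->desc_size = sizeof(struct tx_packet_sketch) +
		       max_bds * sizeof(struct bd_sketch);
	q->count = count;
	q->descq_mem = calloc(count, q->desc_size);
	return q->descq_mem ? 0 : -1;
}

static void txq_free(struct tx_queue_sketch *q)
{
	free(q->descq_mem);
	q->descq_mem = NULL;
	q->count = 0;
}

/* The single place that knows how to step over variable-size elements. */
static struct tx_packet_sketch *txq_desc(const struct tx_queue_sketch *q,
					 size_t i)
{
	assert(i < q->count);
	return (struct tx_packet_sketch *)
		((unsigned char *)q->descq_mem + i * q->desc_size);
}
```

Because descq_mem is a void pointer, an accidental q->descq_mem[i] does not
compile, which removes the failure mode described above.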

* Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time
From: Stephen Hemminger @ 2017-10-02 17:37 UTC
  To: Hangbin Liu; +Cc: netdev, Michal Kubecek, Phil Sutter, Hangbin Liu
In-Reply-To: <1506605626-1744-3-git-send-email-haliu@redhat.com>

On Thu, 28 Sep 2017 21:33:46 +0800
Hangbin Liu <haliu@redhat.com> wrote:

> From: Hangbin Liu <liuhangbin@gmail.com>
> 
> This is an update for 460c03f3f3cc ("iplink: double the buffer size also in
> iplink_get()"). After update, we will not need to double the buffer size
> every time when VFs number increased.
> 
> With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
> length parameter.
> 
> With call like rtnl_talk(&rth, nlh, nlh, sizeof(req)), I add a new variable
> answer to avoid overwriting data in nlh, because it may have more info after
> nlh. Also this will avoid the nlh buffer not being large enough.
> 
> We need to free answer after using.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---

Most of the uses of rtnl_talk() don't need this peek and dynamic sizing.
Can only those places that need it be targeted?

* Re: [PATCH iproute2] iproute: build more easily on Android
From: Stephen Hemminger @ 2017-10-02 17:36 UTC
  To: Lorenzo Colitti; +Cc: netdev, enh
In-Reply-To: <20171002170337.42235-1-lorenzo@google.com>

On Tue,  3 Oct 2017 02:03:37 +0900
Lorenzo Colitti <lorenzo@google.com> wrote:

> iproute2 contains a bunch of kernel headers, including uapi ones.
> Android's libc uses uapi headers almost directly, and uses a
> script to fix kernel types that don't match what userspace
> expects.
> 
> For example: https://issuetracker.google.com/36987220 reports
> that our struct ip_mreq_source contains "__be32 imr_multiaddr"
> rather than "struct in_addr imr_multiaddr". The script addresses
> this by replacing the uapi struct definition with a #include
> <bits/ip_mreq.h> which contains the traditional userspace
> definition.
> 
> Unfortunately, when we compile iproute2, this definition
> conflicts with the one in iproute2's linux/in.h.
> 
> Historically we've just solved this problem by running "git rm"
> on all the iproute2 include/linux headers that break Android's
> libc.  However, deleting the files in this way makes it harder to
> keep up with upstream, because every upstream change to
> an include file causes a merge conflict with the delete.
> 
> This patch fixes the problem by moving the iproute2 linux headers
> from include/linux to include/uapi/linux.
> 
> Tested: compiles on ubuntu trusty (glibc)
> 
> Signed-off-by: Elliott Hughes <enh@google.com>
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

Rather than moving everything, why not make the kernel headers directory
configurable as part of the configure script setup process?

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 17:34 UTC
  To: Eric Dumazet
  Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <CANn89iKXGx2AmaYtqaD_CTvxgG2xC6vbuuNigixtUM82fExODQ@mail.gmail.com>

On Mon, Oct 02, 2017 at 10:27:15AM -0700, Eric Dumazet wrote:
> On Mon, Oct 2, 2017 at 10:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Mon, Oct 02, 2017 at 07:48:28AM -0700, Eric Dumazet wrote:
> >> Please try the following fool proof patch.
> >>
> >> This is what I had in my local tree back in August but could not
> >> conclude on the syzkaller bug I was working on.
> >>
> >> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> >> index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
> >> --- a/net/ipv4/icmp.c
> >> +++ b/net/ipv4/icmp.c
> >> @@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> >>               room = 576;
> >>       room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
> >>       room -= sizeof(struct icmphdr);
> >> -
> >> +     if (room < 0)
> >> +             goto ende;
> >>       icmp_param.data_len = skb_in->len - icmp_param.offset;
> >>       if (icmp_param.data_len > room)
> >>               icmp_param.data_len = room;
> >>
> >
> > Unfortunately, with this applied I still see the issue.
> >
> > Syzkaller came up with a minimized reproducer [1], which can trigger the
> > issue near instantly under syz-execprog. If there's anything that would
> > help to narrow this down, I'm more than happy to give it a go.
> >
> > Thanks,
> > Mark.
> >
> > [1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic/syzkaller.repro
> 
> Note that I was not trying to address the misaligned stuff.

Aargh, I put the reproducer in the wrong folder thanks to tab-completing
my kup command. :/

The reproducer linked above is for the kernel BUG at
net/core/skbuff.c:2626.

I've uploaded a copy into the relevant bug directory [1], but that'll
take a little while to sync out. I'll drop it from the misalignment bug
folder once that's visible to all.

Sorry about that!

Thanks,
Mark.

[1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skbuff-bug/

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 17:27 UTC
  To: Mark Rutland
  Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <20171002172131.GA3360@leverpostej>

On Mon, Oct 2, 2017 at 10:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Oct 02, 2017 at 07:48:28AM -0700, Eric Dumazet wrote:
>> Please try the following fool proof patch.
>>
>> This is what I had in my local tree back in August but could not
>> conclude on the syzkaller bug I was working on.
>>
>> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
>> index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
>> --- a/net/ipv4/icmp.c
>> +++ b/net/ipv4/icmp.c
>> @@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>>               room = 576;
>>       room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
>>       room -= sizeof(struct icmphdr);
>> -
>> +     if (room < 0)
>> +             goto ende;
>>       icmp_param.data_len = skb_in->len - icmp_param.offset;
>>       if (icmp_param.data_len > room)
>>               icmp_param.data_len = room;
>>
>
> Unfortunately, with this applied I still see the issue.
>
> Syzkaller came up with a minimized reproducer [1], which can trigger the
> issue near instantly under syz-execprog. If there's anything that would
> help to narrow this down, I'm more than happy to give it a go.
>
> Thanks,
> Mark.
>
> [1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic/syzkaller.repro

Note that I was not trying to address the misaligned stuff.

Only this :

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:2626!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
Hardware name: linux,dummy-virt (DT)
task: ffff80003a901a80 task.stack: ffff80003a908000
PC is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
LR is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 17:21 UTC
  To: Eric Dumazet
  Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <1506955708.8061.5.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Oct 02, 2017 at 07:48:28AM -0700, Eric Dumazet wrote:
> Please try the following fool proof patch.
>
> This is what I had in my local tree back in August but could not
> conclude on the syzkaller bug I was working on.
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
>  		room = 576;
>  	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
>  	room -= sizeof(struct icmphdr);
> -
> +	if (room < 0)
> +		goto ende;
>  	icmp_param.data_len = skb_in->len - icmp_param.offset;
>  	if (icmp_param.data_len > room)
>  		icmp_param.data_len = room;
> 

Unfortunately, with this applied I still see the issue.

Syzkaller came up with a minimized reproducer [1], which can trigger the
issue near instantly under syz-execprog. If there's anything that would
help to narrow this down, I'm more than happy to give it a go.

Thanks,
Mark.

[1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic/syzkaller.repro

* Re: [PATCH V4] r8152: add Linksys USB3GIGV1 id
From: Grant Grundler @ 2017-10-02 17:21 UTC
  To: David Miller
  Cc: Grant Grundler, Hayes Wang, Oliver Neukum,
	linux-usb@vger.kernel.org, LKML, netdev
In-Reply-To: <20171001.223954.160035131695050852.davem@davemloft.net>

On Sun, Oct 1, 2017 at 10:39 PM, David Miller <davem@davemloft.net> wrote:
> From: Grant Grundler <grundler@chromium.org>
> Date: Thu, 28 Sep 2017 11:35:00 -0700
>
>> This linksys dongle by default comes up in cdc_ether mode.
>> This patch allows r8152 to claim the device:
>>    Bus 002 Device 002: ID 13b1:0041 Linksys
>>
>> Signed-off-by: Grant Grundler <grundler@chromium.org>
>
> Applied, thanks.

thanks David, Doug, Oliver! :)

cheers,
grant

* Re: [kernel-hardening] [PATCH 0/2] capability controlled user-namespaces
From: Serge E. Hallyn @ 2017-10-02 17:14 UTC
  To: Mahesh Bandewar
  Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
	Serge Hallyn, Eric W . Biederman, Eric Dumazet, David Miller,
	Mahesh Bandewar
In-Reply-To: <20170929230952.29673-1-mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org>

Quoting Mahesh Bandewar (mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org):
> From: Mahesh Bandewar <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> [Same as the previous RFC series sent on 9/21]
> 
> TL;DR version
> -------------
> Creating a sandbox environment with namespaces is challenging
> considering what these sandboxed processes can engage into. e.g.
> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
> Current form of user-namespaces, however, if changed a bit can allow
> us to create a sandbox environment without locking down user-
> namespaces.
> 
> Detailed version
> ----------------

Hi,

still struggling with how I feel about the idea in general.

So is the intent mainly that if/when there comes an 0-day which allows
users with CAP_NET_ADMIN in any namespace to gain privilege on the host,
then this can be used as a stop-gap measure until there is a proper fix?

Otherwise, do you have any guidance for how people should use this?

IMO it should be heavily discouraged to use this tool as a regular
day to day configuration, as I'm not sure there is any "educated"
decision to be made, even by those who are in the know, about what
to put in this set.

> Problem
> -------
> User-namespaces in the current form have increased the attack surface as
> any process can acquire capabilities which are not available to them (by
> default) by performing combination of clone()/unshare()/setns() syscalls.
> 
>     #define _GNU_SOURCE
>     #include <stdio.h>
>     #include <sched.h>
>     #include <netinet/in.h>
> 
>     int main(int ac, char **av)
>     {
>         int sock = -1;
> 
>         printf("Attempting to open RAW socket before unshare()...\n");
>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>         if (sock < 0) {
>             perror("socket() SOCK_RAW failed: ");
>         } else {
>             printf("Successfully opened RAW-Sock before unshare().\n");
>             close(sock);
>             sock = -1;
>         }
> 
>         if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
>             perror("unshare() failed: ");
>             return 1;
>         }
> 
>         printf("Attempting to open RAW socket after unshare()...\n");
>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>         if (sock < 0) {
>             perror("socket() SOCK_RAW failed: ");
>         } else {
>             printf("Successfully opened RAW-Sock after unshare().\n");
>             close(sock);
>             sock = -1;
>         }
> 
>         return 0;
>     }
> 
> The above example shows how easy it is to acquire NET_RAW capabilities
> and once acquired, these processes could take benefit of above mentioned
> or similar issues discovered/undiscovered with malicious intent. Note
> that this is just an example and the problem/solution is not limited
> to NET_RAW capability *only*. 
> 
> The easiest fix one can apply here is to lock-down user-namespaces which
> many of the distros do (i.e. don't allow users to create user namespaces),
> but unfortunately that prevents everyone from using them.
> 
> Approach
> --------
> Introduce a notion of 'controlled' user-namespaces. Every process on
> the host is allowed to create user-namespaces (governed by the limit
> imposed by per-ns sysctl) however, mark user-namespaces created by
> sandboxed processes as 'controlled'. Use this 'mark' at the time of
> capability check in conjunction with a global capability whitelist.
> If the capability is not whitelisted, processes that belong to 
> controlled user-namespaces will not be allowed.
> 
> Once a user-ns is marked as 'controlled'; all its child user-
> namespaces are marked as 'controlled' too.
> 
> A global whitelist is list of capabilities governed by the
> sysctl which is available to (privileged) user in init-ns to modify
> while it's applicable to all controlled user-namespaces on the host.
> 
> Marking user-namespaces controlled without modifying the whitelist is
> equivalent of the current behavior. The default value of whitelist includes
> all capabilities so that the compatibility is maintained. However it gives
> admins fine-grained ability to control various capabilities system wide
> without locking down user-namespaces.
> 
> Please see individual patches in this series.
> 
> Mahesh Bandewar (2):
>   capability: introduce sysctl for controlled user-ns capability
>     whitelist
>   userns: control capabilities of some user namespaces
> 
>  Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
>  include/linux/capability.h      |  4 ++++
>  include/linux/user_namespace.h  | 20 ++++++++++++++++
>  kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
>  kernel/sysctl.c                 |  5 ++++
>  kernel/user_namespace.c         |  3 +++
>  security/commoncap.c            |  8 +++++++
>  7 files changed, 113 insertions(+)
> 
> -- 
> 2.14.2.822.g60be5d43e6-goog

* Re: [next-queue PATCH v2 3/5] net/sched: Introduce the user API for the CBS shaper
From: Vinicius Costa Gomes @ 2017-10-02 17:07 UTC
  To: Cong Wang
  Cc: Linux Kernel Network Developers, intel-wired-lan,
	Jamal Hadi Salim, Jiri Pirko, andre.guedes, Ivan Briano,
	Jesus Sanchez-Palencia, boon.leong.ong, richardcochran,
	Henrik Austad, levipearson, rodney.cummings
In-Reply-To: <CAM_iQpWGHE7hgwEZDO+oRGgWdrdYYofnHfuQq3fMOO-yFj7NSw@mail.gmail.com>

Hi,

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Fri, Sep 29, 2017 at 5:26 PM, Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
>> Export the API necessary for configuring the CBS shaper (implemented
>> in the next patch) via the tc tool.
>
> This one can be folded into patch 4/5.

Will do.


Cheers,

* Re: [PATCH 00/18] use ARRAY_SIZE macro
From: Zhi Wang @ 2017-10-02 17:05 UTC
  To: Jérémy Lefaure
  Cc: alsa-devel, nouveau, dri-devel, dm-devel, brcm80211-dev-list,
	devel, linux-scsi, linux-rdma, amd-gfx, Jason Gunthorpe,
	linux-acpi, linux-video, intel-wired-lan, linux-media, intel-gfx,
	ecryptfs, linux-nfs, linux-raid, openipmi-developer,
	intel-gvt-dev, devel, brcm80211-dev-list.pdl, netdev, linux-usb,
	linux-wireless, linux-kernel, linux-integrity
In-Reply-To: <20171001193101.8898-1-jeremy.lefaure@lse.epita.fr>


Thanks for the patch! :)

2017-10-01 22:30 GMT+03:00 Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>:

> Hi everyone,
> Using ARRAY_SIZE improves the code readability. I used coccinelle (I
> made a change to the array_size.cocci file [1]) to find several places
> where ARRAY_SIZE could be used instead of other macros or sizeof
> division.
>
> I tried to divide the changes into a patch per subsystem (excepted for
> staging). If one of the patch should be split into several patches, let
> me know.
>
> In order to reduce the size of the To: and Cc: lines, each patch of the
> series is sent only to the maintainers and lists concerned by the patch.
> This cover letter is sent to every list concerned by this series.
>
> This series is based on linux-next next-20170929. Each patch has been
> tested by building the relevant files with W=1.
>
> This series contains the following patches:
> [PATCH 01/18] sound: use ARRAY_SIZE
> [PATCH 02/18] tracing/filter: use ARRAY_SIZE
> [PATCH 03/18] media: use ARRAY_SIZE
> [PATCH 04/18] IB/mlx5: Use ARRAY_SIZE
> [PATCH 05/18] net: use ARRAY_SIZE
> [PATCH 06/18] drm: use ARRAY_SIZE
> [PATCH 07/18] scsi: bfa: use ARRAY_SIZE
> [PATCH 08/18] ecryptfs: use ARRAY_SIZE
> [PATCH 09/18] nfsd: use ARRAY_SIZE
> [PATCH 10/18] orangefs: use ARRAY_SIZE
> [PATCH 11/18] dm space map metadata: use ARRAY_SIZE
> [PATCH 12/18] x86: use ARRAY_SIZE
> [PATCH 13/18] tpm: use ARRAY_SIZE
> [PATCH 14/18] ipmi: use ARRAY_SIZE
> [PATCH 15/18] acpi: use ARRAY_SIZE
> [PATCH 16/18] media: staging: atomisp: use ARRAY_SIZE
> [PATCH 17/18] staging: rtl8723bs: use ARRAY_SIZE
> [PATCH 18/18] staging: rtlwifi: use ARRAY_SIZE
>
>
> [1]: https://lkml.org/lkml/2017/9/13/689
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

^ permalink raw reply

* [PATCH iproute2] iproute: build more easily on Android
From: Lorenzo Colitti @ 2017-10-02 17:03 UTC (permalink / raw)
  To: netdev; +Cc: stephen, enh, Lorenzo Colitti

iproute2 contains a bunch of kernel headers, including uapi ones.
Android's libc uses uapi headers almost directly, and uses a
script to fix kernel types that don't match what userspace
expects.

For example: https://issuetracker.google.com/36987220 reports
that our struct ip_mreq_source contains "__be32 imr_multiaddr"
rather than "struct in_addr imr_multiaddr". The script addresses
this by replacing the uapi struct definition with a #include
<bits/ip_mreq.h> which contains the traditional userspace
definition.
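
To make the mismatch concrete, here is a minimal sketch in C. The
struct names kernel_style_mreq_source and libc_style_mreq_source and
the helper layouts_agree() are hypothetical, made up for this
illustration; uint32_t stands in for the kernel's __be32. It only
shows that the two definitions have the same layout but different
member types, which is why both cannot be visible under the same
struct name in one translation unit.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: mirrors the uapi layout, where each member is a
 * bare 32-bit big-endian integer (__be32 in the kernel headers). */
struct kernel_style_mreq_source {
	uint32_t imr_multiaddr;
	uint32_t imr_interface;
	uint32_t imr_sourceaddr;
};

/* Hypothetical name: mirrors the traditional userspace layout, where
 * each member is a struct in_addr. */
struct libc_style_mreq_source {
	struct in_addr imr_multiaddr;
	struct in_addr imr_interface;
	struct in_addr imr_sourceaddr;
};

/* The two layouts are binary compatible: same size, same offsets. */
static_assert(sizeof(struct kernel_style_mreq_source) ==
	      sizeof(struct libc_style_mreq_source),
	      "sizes differ");
static_assert(offsetof(struct kernel_style_mreq_source, imr_sourceaddr) ==
	      offsetof(struct libc_style_mreq_source, imr_sourceaddr),
	      "offsets differ");

/* Copy a kernel-style struct into a libc-style one byte for byte and
 * check that the group address survives. Note the different access
 * syntax: a plain integer on one side, .s_addr on the other. That
 * type difference is what makes the two definitions clash at compile
 * time if they share a name. Returns 1 when the values match. */
int layouts_agree(uint32_t group_be)
{
	struct kernel_style_mreq_source k = { .imr_multiaddr = group_be };
	struct libc_style_mreq_source u;

	memcpy(&u, &k, sizeof(u));
	return u.imr_multiaddr.s_addr == group_be;
}
```

If both structs were instead named ip_mreq_source and reached the
compiler through two different headers, the build would be expected
to stop with a redefinition error, which matches the conflict
described here.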

Unfortunately, when we compile iproute2, this definition
conflicts with the one in iproute2's linux/in.h.

Historically we've just solved this problem by running "git rm"
on all the iproute2 include/linux headers that break Android's
libc.  However, deleting the files in this way makes it harder to
keep up with upstream, because every upstream change to
an include file causes a merge conflict with the delete.

This patch fixes the problem by moving the iproute2 linux headers
from include/linux to include/uapi/linux and adding
-I../include/uapi to CFLAGS, so existing builds still find them.

Tested: compiles on Ubuntu Trusty (glibc)

Signed-off-by: Elliott Hughes <enh@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 Makefile                                             | 2 +-
 include/{ => uapi}/linux/atm.h                       | 0
 include/{ => uapi}/linux/atmapi.h                    | 0
 include/{ => uapi}/linux/atmarp.h                    | 0
 include/{ => uapi}/linux/atmdev.h                    | 0
 include/{ => uapi}/linux/atmioc.h                    | 0
 include/{ => uapi}/linux/atmsap.h                    | 0
 include/{ => uapi}/linux/bpf.h                       | 0
 include/{ => uapi}/linux/bpf_common.h                | 0
 include/{ => uapi}/linux/can.h                       | 0
 include/{ => uapi}/linux/can/netlink.h               | 0
 include/{ => uapi}/linux/can/vxcan.h                 | 0
 include/{ => uapi}/linux/devlink.h                   | 0
 include/{ => uapi}/linux/elf-em.h                    | 0
 include/{ => uapi}/linux/fib_rules.h                 | 0
 include/{ => uapi}/linux/filter.h                    | 0
 include/{ => uapi}/linux/fou.h                       | 0
 include/{ => uapi}/linux/gen_stats.h                 | 0
 include/{ => uapi}/linux/genetlink.h                 | 0
 include/{ => uapi}/linux/hdlc/ioctl.h                | 0
 include/{ => uapi}/linux/icmpv6.h                    | 0
 include/{ => uapi}/linux/if.h                        | 0
 include/{ => uapi}/linux/if_addr.h                   | 0
 include/{ => uapi}/linux/if_addrlabel.h              | 0
 include/{ => uapi}/linux/if_alg.h                    | 0
 include/{ => uapi}/linux/if_arp.h                    | 0
 include/{ => uapi}/linux/if_bonding.h                | 0
 include/{ => uapi}/linux/if_bridge.h                 | 0
 include/{ => uapi}/linux/if_ether.h                  | 0
 include/{ => uapi}/linux/if_link.h                   | 0
 include/{ => uapi}/linux/if_macsec.h                 | 0
 include/{ => uapi}/linux/if_packet.h                 | 0
 include/{ => uapi}/linux/if_tun.h                    | 0
 include/{ => uapi}/linux/if_tunnel.h                 | 0
 include/{ => uapi}/linux/if_vlan.h                   | 0
 include/{ => uapi}/linux/ife.h                       | 0
 include/{ => uapi}/linux/ila.h                       | 0
 include/{ => uapi}/linux/in.h                        | 0
 include/{ => uapi}/linux/in6.h                       | 0
 include/{ => uapi}/linux/in_route.h                  | 0
 include/{ => uapi}/linux/inet_diag.h                 | 0
 include/{ => uapi}/linux/ip.h                        | 0
 include/{ => uapi}/linux/ip6_tunnel.h                | 0
 include/{ => uapi}/linux/ipsec.h                     | 0
 include/{ => uapi}/linux/kernel.h                    | 0
 include/{ => uapi}/linux/l2tp.h                      | 0
 include/{ => uapi}/linux/libc-compat.h               | 0
 include/{ => uapi}/linux/limits.h                    | 0
 include/{ => uapi}/linux/lwtunnel.h                  | 0
 include/{ => uapi}/linux/magic.h                     | 0
 include/{ => uapi}/linux/mpls.h                      | 0
 include/{ => uapi}/linux/mpls_iptunnel.h             | 0
 include/{ => uapi}/linux/neighbour.h                 | 0
 include/{ => uapi}/linux/net_namespace.h             | 0
 include/{ => uapi}/linux/netconf.h                   | 0
 include/{ => uapi}/linux/netdevice.h                 | 0
 include/{ => uapi}/linux/netfilter.h                 | 0
 include/{ => uapi}/linux/netfilter/ipset/ip_set.h    | 0
 include/{ => uapi}/linux/netfilter/x_tables.h        | 0
 include/{ => uapi}/linux/netfilter/xt_set.h          | 0
 include/{ => uapi}/linux/netfilter/xt_tcpudp.h       | 0
 include/{ => uapi}/linux/netfilter_ipv4.h            | 0
 include/{ => uapi}/linux/netfilter_ipv4/ip_tables.h  | 0
 include/{ => uapi}/linux/netfilter_ipv6.h            | 0
 include/{ => uapi}/linux/netfilter_ipv6/ip6_tables.h | 0
 include/{ => uapi}/linux/netlink.h                   | 0
 include/{ => uapi}/linux/netlink_diag.h              | 0
 include/{ => uapi}/linux/packet_diag.h               | 0
 include/{ => uapi}/linux/param.h                     | 0
 include/{ => uapi}/linux/pfkeyv2.h                   | 0
 include/{ => uapi}/linux/pkt_cls.h                   | 0
 include/{ => uapi}/linux/pkt_sched.h                 | 0
 include/{ => uapi}/linux/posix_types.h               | 0
 include/{ => uapi}/linux/rtnetlink.h                 | 0
 include/{ => uapi}/linux/sctp.h                      | 0
 include/{ => uapi}/linux/seg6.h                      | 0
 include/{ => uapi}/linux/seg6_genl.h                 | 0
 include/{ => uapi}/linux/seg6_hmac.h                 | 0
 include/{ => uapi}/linux/seg6_iptunnel.h             | 0
 include/{ => uapi}/linux/seg6_local.h                | 0
 include/{ => uapi}/linux/sock_diag.h                 | 0
 include/{ => uapi}/linux/socket.h                    | 0
 include/{ => uapi}/linux/sockios.h                   | 0
 include/{ => uapi}/linux/stddef.h                    | 0
 include/{ => uapi}/linux/sysinfo.h                   | 0
 include/{ => uapi}/linux/tc_act/tc_bpf.h             | 0
 include/{ => uapi}/linux/tc_act/tc_connmark.h        | 0
 include/{ => uapi}/linux/tc_act/tc_csum.h            | 0
 include/{ => uapi}/linux/tc_act/tc_defact.h          | 0
 include/{ => uapi}/linux/tc_act/tc_gact.h            | 0
 include/{ => uapi}/linux/tc_act/tc_ife.h             | 0
 include/{ => uapi}/linux/tc_act/tc_ipt.h             | 0
 include/{ => uapi}/linux/tc_act/tc_mirred.h          | 0
 include/{ => uapi}/linux/tc_act/tc_nat.h             | 0
 include/{ => uapi}/linux/tc_act/tc_pedit.h           | 0
 include/{ => uapi}/linux/tc_act/tc_sample.h          | 0
 include/{ => uapi}/linux/tc_act/tc_skbedit.h         | 0
 include/{ => uapi}/linux/tc_act/tc_skbmod.h          | 0
 include/{ => uapi}/linux/tc_act/tc_tunnel_key.h      | 0
 include/{ => uapi}/linux/tc_act/tc_vlan.h            | 0
 include/{ => uapi}/linux/tc_ematch/tc_em_cmp.h       | 0
 include/{ => uapi}/linux/tc_ematch/tc_em_meta.h      | 0
 include/{ => uapi}/linux/tc_ematch/tc_em_nbyte.h     | 0
 include/{ => uapi}/linux/tcp.h                       | 0
 include/{ => uapi}/linux/tcp_metrics.h               | 0
 include/{ => uapi}/linux/tipc.h                      | 0
 include/{ => uapi}/linux/tipc_netlink.h              | 0
 include/{ => uapi}/linux/types.h                     | 0
 include/{ => uapi}/linux/unix_diag.h                 | 0
 include/{ => uapi}/linux/veth.h                      | 0
 include/{ => uapi}/linux/xfrm.h                      | 0
 111 files changed, 1 insertion(+), 1 deletion(-)
 rename include/{ => uapi}/linux/atm.h (100%)
 rename include/{ => uapi}/linux/atmapi.h (100%)
 rename include/{ => uapi}/linux/atmarp.h (100%)
 rename include/{ => uapi}/linux/atmdev.h (100%)
 rename include/{ => uapi}/linux/atmioc.h (100%)
 rename include/{ => uapi}/linux/atmsap.h (100%)
 rename include/{ => uapi}/linux/bpf.h (100%)
 rename include/{ => uapi}/linux/bpf_common.h (100%)
 rename include/{ => uapi}/linux/can.h (100%)
 rename include/{ => uapi}/linux/can/netlink.h (100%)
 rename include/{ => uapi}/linux/can/vxcan.h (100%)
 rename include/{ => uapi}/linux/devlink.h (100%)
 rename include/{ => uapi}/linux/elf-em.h (100%)
 rename include/{ => uapi}/linux/fib_rules.h (100%)
 rename include/{ => uapi}/linux/filter.h (100%)
 rename include/{ => uapi}/linux/fou.h (100%)
 rename include/{ => uapi}/linux/gen_stats.h (100%)
 rename include/{ => uapi}/linux/genetlink.h (100%)
 rename include/{ => uapi}/linux/hdlc/ioctl.h (100%)
 rename include/{ => uapi}/linux/icmpv6.h (100%)
 rename include/{ => uapi}/linux/if.h (100%)
 rename include/{ => uapi}/linux/if_addr.h (100%)
 rename include/{ => uapi}/linux/if_addrlabel.h (100%)
 rename include/{ => uapi}/linux/if_alg.h (100%)
 rename include/{ => uapi}/linux/if_arp.h (100%)
 rename include/{ => uapi}/linux/if_bonding.h (100%)
 rename include/{ => uapi}/linux/if_bridge.h (100%)
 rename include/{ => uapi}/linux/if_ether.h (100%)
 rename include/{ => uapi}/linux/if_link.h (100%)
 rename include/{ => uapi}/linux/if_macsec.h (100%)
 rename include/{ => uapi}/linux/if_packet.h (100%)
 rename include/{ => uapi}/linux/if_tun.h (100%)
 rename include/{ => uapi}/linux/if_tunnel.h (100%)
 rename include/{ => uapi}/linux/if_vlan.h (100%)
 rename include/{ => uapi}/linux/ife.h (100%)
 rename include/{ => uapi}/linux/ila.h (100%)
 rename include/{ => uapi}/linux/in.h (100%)
 rename include/{ => uapi}/linux/in6.h (100%)
 rename include/{ => uapi}/linux/in_route.h (100%)
 rename include/{ => uapi}/linux/inet_diag.h (100%)
 rename include/{ => uapi}/linux/ip.h (100%)
 rename include/{ => uapi}/linux/ip6_tunnel.h (100%)
 rename include/{ => uapi}/linux/ipsec.h (100%)
 rename include/{ => uapi}/linux/kernel.h (100%)
 rename include/{ => uapi}/linux/l2tp.h (100%)
 rename include/{ => uapi}/linux/libc-compat.h (100%)
 rename include/{ => uapi}/linux/limits.h (100%)
 rename include/{ => uapi}/linux/lwtunnel.h (100%)
 rename include/{ => uapi}/linux/magic.h (100%)
 rename include/{ => uapi}/linux/mpls.h (100%)
 rename include/{ => uapi}/linux/mpls_iptunnel.h (100%)
 rename include/{ => uapi}/linux/neighbour.h (100%)
 rename include/{ => uapi}/linux/net_namespace.h (100%)
 rename include/{ => uapi}/linux/netconf.h (100%)
 rename include/{ => uapi}/linux/netdevice.h (100%)
 rename include/{ => uapi}/linux/netfilter.h (100%)
 rename include/{ => uapi}/linux/netfilter/ipset/ip_set.h (100%)
 rename include/{ => uapi}/linux/netfilter/x_tables.h (100%)
 rename include/{ => uapi}/linux/netfilter/xt_set.h (100%)
 rename include/{ => uapi}/linux/netfilter/xt_tcpudp.h (100%)
 rename include/{ => uapi}/linux/netfilter_ipv4.h (100%)
 rename include/{ => uapi}/linux/netfilter_ipv4/ip_tables.h (100%)
 rename include/{ => uapi}/linux/netfilter_ipv6.h (100%)
 rename include/{ => uapi}/linux/netfilter_ipv6/ip6_tables.h (100%)
 rename include/{ => uapi}/linux/netlink.h (100%)
 rename include/{ => uapi}/linux/netlink_diag.h (100%)
 rename include/{ => uapi}/linux/packet_diag.h (100%)
 rename include/{ => uapi}/linux/param.h (100%)
 rename include/{ => uapi}/linux/pfkeyv2.h (100%)
 rename include/{ => uapi}/linux/pkt_cls.h (100%)
 rename include/{ => uapi}/linux/pkt_sched.h (100%)
 rename include/{ => uapi}/linux/posix_types.h (100%)
 rename include/{ => uapi}/linux/rtnetlink.h (100%)
 rename include/{ => uapi}/linux/sctp.h (100%)
 rename include/{ => uapi}/linux/seg6.h (100%)
 rename include/{ => uapi}/linux/seg6_genl.h (100%)
 rename include/{ => uapi}/linux/seg6_hmac.h (100%)
 rename include/{ => uapi}/linux/seg6_iptunnel.h (100%)
 rename include/{ => uapi}/linux/seg6_local.h (100%)
 rename include/{ => uapi}/linux/sock_diag.h (100%)
 rename include/{ => uapi}/linux/socket.h (100%)
 rename include/{ => uapi}/linux/sockios.h (100%)
 rename include/{ => uapi}/linux/stddef.h (100%)
 rename include/{ => uapi}/linux/sysinfo.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_bpf.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_connmark.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_csum.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_defact.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_gact.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_ife.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_ipt.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_mirred.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_nat.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_pedit.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_sample.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_skbedit.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_skbmod.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_tunnel_key.h (100%)
 rename include/{ => uapi}/linux/tc_act/tc_vlan.h (100%)
 rename include/{ => uapi}/linux/tc_ematch/tc_em_cmp.h (100%)
 rename include/{ => uapi}/linux/tc_ematch/tc_em_meta.h (100%)
 rename include/{ => uapi}/linux/tc_ematch/tc_em_nbyte.h (100%)
 rename include/{ => uapi}/linux/tcp.h (100%)
 rename include/{ => uapi}/linux/tcp_metrics.h (100%)
 rename include/{ => uapi}/linux/tipc.h (100%)
 rename include/{ => uapi}/linux/tipc_netlink.h (100%)
 rename include/{ => uapi}/linux/types.h (100%)
 rename include/{ => uapi}/linux/unix_diag.h (100%)
 rename include/{ => uapi}/linux/veth.h (100%)
 rename include/{ => uapi}/linux/xfrm.h (100%)

diff --git a/Makefile b/Makefile
index 75c0e57006..6ad9610430 100644
--- a/Makefile
+++ b/Makefile
@@ -46,7 +46,7 @@ CCOPTS = -O2
 WFLAGS := -Wall -Wstrict-prototypes  -Wmissing-prototypes
 WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
 
-CFLAGS := $(WFLAGS) $(CCOPTS) -I../include $(DEFINES) $(CFLAGS)
+CFLAGS := $(WFLAGS) $(CCOPTS) -I../include -I../include/uapi $(DEFINES) $(CFLAGS)
 YACCFLAGS = -d -t -v
 
 SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma man
diff --git a/include/linux/atm.h b/include/uapi/linux/atm.h
similarity index 100%
rename from include/linux/atm.h
rename to include/uapi/linux/atm.h
diff --git a/include/linux/atmapi.h b/include/uapi/linux/atmapi.h
similarity index 100%
rename from include/linux/atmapi.h
rename to include/uapi/linux/atmapi.h
diff --git a/include/linux/atmarp.h b/include/uapi/linux/atmarp.h
similarity index 100%
rename from include/linux/atmarp.h
rename to include/uapi/linux/atmarp.h
diff --git a/include/linux/atmdev.h b/include/uapi/linux/atmdev.h
similarity index 100%
rename from include/linux/atmdev.h
rename to include/uapi/linux/atmdev.h
diff --git a/include/linux/atmioc.h b/include/uapi/linux/atmioc.h
similarity index 100%
rename from include/linux/atmioc.h
rename to include/uapi/linux/atmioc.h
diff --git a/include/linux/atmsap.h b/include/uapi/linux/atmsap.h
similarity index 100%
rename from include/linux/atmsap.h
rename to include/uapi/linux/atmsap.h
diff --git a/include/linux/bpf.h b/include/uapi/linux/bpf.h
similarity index 100%
rename from include/linux/bpf.h
rename to include/uapi/linux/bpf.h
diff --git a/include/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
similarity index 100%
rename from include/linux/bpf_common.h
rename to include/uapi/linux/bpf_common.h
diff --git a/include/linux/can.h b/include/uapi/linux/can.h
similarity index 100%
rename from include/linux/can.h
rename to include/uapi/linux/can.h
diff --git a/include/linux/can/netlink.h b/include/uapi/linux/can/netlink.h
similarity index 100%
rename from include/linux/can/netlink.h
rename to include/uapi/linux/can/netlink.h
diff --git a/include/linux/can/vxcan.h b/include/uapi/linux/can/vxcan.h
similarity index 100%
rename from include/linux/can/vxcan.h
rename to include/uapi/linux/can/vxcan.h
diff --git a/include/linux/devlink.h b/include/uapi/linux/devlink.h
similarity index 100%
rename from include/linux/devlink.h
rename to include/uapi/linux/devlink.h
diff --git a/include/linux/elf-em.h b/include/uapi/linux/elf-em.h
similarity index 100%
rename from include/linux/elf-em.h
rename to include/uapi/linux/elf-em.h
diff --git a/include/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
similarity index 100%
rename from include/linux/fib_rules.h
rename to include/uapi/linux/fib_rules.h
diff --git a/include/linux/filter.h b/include/uapi/linux/filter.h
similarity index 100%
rename from include/linux/filter.h
rename to include/uapi/linux/filter.h
diff --git a/include/linux/fou.h b/include/uapi/linux/fou.h
similarity index 100%
rename from include/linux/fou.h
rename to include/uapi/linux/fou.h
diff --git a/include/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
similarity index 100%
rename from include/linux/gen_stats.h
rename to include/uapi/linux/gen_stats.h
diff --git a/include/linux/genetlink.h b/include/uapi/linux/genetlink.h
similarity index 100%
rename from include/linux/genetlink.h
rename to include/uapi/linux/genetlink.h
diff --git a/include/linux/hdlc/ioctl.h b/include/uapi/linux/hdlc/ioctl.h
similarity index 100%
rename from include/linux/hdlc/ioctl.h
rename to include/uapi/linux/hdlc/ioctl.h
diff --git a/include/linux/icmpv6.h b/include/uapi/linux/icmpv6.h
similarity index 100%
rename from include/linux/icmpv6.h
rename to include/uapi/linux/icmpv6.h
diff --git a/include/linux/if.h b/include/uapi/linux/if.h
similarity index 100%
rename from include/linux/if.h
rename to include/uapi/linux/if.h
diff --git a/include/linux/if_addr.h b/include/uapi/linux/if_addr.h
similarity index 100%
rename from include/linux/if_addr.h
rename to include/uapi/linux/if_addr.h
diff --git a/include/linux/if_addrlabel.h b/include/uapi/linux/if_addrlabel.h
similarity index 100%
rename from include/linux/if_addrlabel.h
rename to include/uapi/linux/if_addrlabel.h
diff --git a/include/linux/if_alg.h b/include/uapi/linux/if_alg.h
similarity index 100%
rename from include/linux/if_alg.h
rename to include/uapi/linux/if_alg.h
diff --git a/include/linux/if_arp.h b/include/uapi/linux/if_arp.h
similarity index 100%
rename from include/linux/if_arp.h
rename to include/uapi/linux/if_arp.h
diff --git a/include/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
similarity index 100%
rename from include/linux/if_bonding.h
rename to include/uapi/linux/if_bonding.h
diff --git a/include/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
similarity index 100%
rename from include/linux/if_bridge.h
rename to include/uapi/linux/if_bridge.h
diff --git a/include/linux/if_ether.h b/include/uapi/linux/if_ether.h
similarity index 100%
rename from include/linux/if_ether.h
rename to include/uapi/linux/if_ether.h
diff --git a/include/linux/if_link.h b/include/uapi/linux/if_link.h
similarity index 100%
rename from include/linux/if_link.h
rename to include/uapi/linux/if_link.h
diff --git a/include/linux/if_macsec.h b/include/uapi/linux/if_macsec.h
similarity index 100%
rename from include/linux/if_macsec.h
rename to include/uapi/linux/if_macsec.h
diff --git a/include/linux/if_packet.h b/include/uapi/linux/if_packet.h
similarity index 100%
rename from include/linux/if_packet.h
rename to include/uapi/linux/if_packet.h
diff --git a/include/linux/if_tun.h b/include/uapi/linux/if_tun.h
similarity index 100%
rename from include/linux/if_tun.h
rename to include/uapi/linux/if_tun.h
diff --git a/include/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
similarity index 100%
rename from include/linux/if_tunnel.h
rename to include/uapi/linux/if_tunnel.h
diff --git a/include/linux/if_vlan.h b/include/uapi/linux/if_vlan.h
similarity index 100%
rename from include/linux/if_vlan.h
rename to include/uapi/linux/if_vlan.h
diff --git a/include/linux/ife.h b/include/uapi/linux/ife.h
similarity index 100%
rename from include/linux/ife.h
rename to include/uapi/linux/ife.h
diff --git a/include/linux/ila.h b/include/uapi/linux/ila.h
similarity index 100%
rename from include/linux/ila.h
rename to include/uapi/linux/ila.h
diff --git a/include/linux/in.h b/include/uapi/linux/in.h
similarity index 100%
rename from include/linux/in.h
rename to include/uapi/linux/in.h
diff --git a/include/linux/in6.h b/include/uapi/linux/in6.h
similarity index 100%
rename from include/linux/in6.h
rename to include/uapi/linux/in6.h
diff --git a/include/linux/in_route.h b/include/uapi/linux/in_route.h
similarity index 100%
rename from include/linux/in_route.h
rename to include/uapi/linux/in_route.h
diff --git a/include/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
similarity index 100%
rename from include/linux/inet_diag.h
rename to include/uapi/linux/inet_diag.h
diff --git a/include/linux/ip.h b/include/uapi/linux/ip.h
similarity index 100%
rename from include/linux/ip.h
rename to include/uapi/linux/ip.h
diff --git a/include/linux/ip6_tunnel.h b/include/uapi/linux/ip6_tunnel.h
similarity index 100%
rename from include/linux/ip6_tunnel.h
rename to include/uapi/linux/ip6_tunnel.h
diff --git a/include/linux/ipsec.h b/include/uapi/linux/ipsec.h
similarity index 100%
rename from include/linux/ipsec.h
rename to include/uapi/linux/ipsec.h
diff --git a/include/linux/kernel.h b/include/uapi/linux/kernel.h
similarity index 100%
rename from include/linux/kernel.h
rename to include/uapi/linux/kernel.h
diff --git a/include/linux/l2tp.h b/include/uapi/linux/l2tp.h
similarity index 100%
rename from include/linux/l2tp.h
rename to include/uapi/linux/l2tp.h
diff --git a/include/linux/libc-compat.h b/include/uapi/linux/libc-compat.h
similarity index 100%
rename from include/linux/libc-compat.h
rename to include/uapi/linux/libc-compat.h
diff --git a/include/linux/limits.h b/include/uapi/linux/limits.h
similarity index 100%
rename from include/linux/limits.h
rename to include/uapi/linux/limits.h
diff --git a/include/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
similarity index 100%
rename from include/linux/lwtunnel.h
rename to include/uapi/linux/lwtunnel.h
diff --git a/include/linux/magic.h b/include/uapi/linux/magic.h
similarity index 100%
rename from include/linux/magic.h
rename to include/uapi/linux/magic.h
diff --git a/include/linux/mpls.h b/include/uapi/linux/mpls.h
similarity index 100%
rename from include/linux/mpls.h
rename to include/uapi/linux/mpls.h
diff --git a/include/linux/mpls_iptunnel.h b/include/uapi/linux/mpls_iptunnel.h
similarity index 100%
rename from include/linux/mpls_iptunnel.h
rename to include/uapi/linux/mpls_iptunnel.h
diff --git a/include/linux/neighbour.h b/include/uapi/linux/neighbour.h
similarity index 100%
rename from include/linux/neighbour.h
rename to include/uapi/linux/neighbour.h
diff --git a/include/linux/net_namespace.h b/include/uapi/linux/net_namespace.h
similarity index 100%
rename from include/linux/net_namespace.h
rename to include/uapi/linux/net_namespace.h
diff --git a/include/linux/netconf.h b/include/uapi/linux/netconf.h
similarity index 100%
rename from include/linux/netconf.h
rename to include/uapi/linux/netconf.h
diff --git a/include/linux/netdevice.h b/include/uapi/linux/netdevice.h
similarity index 100%
rename from include/linux/netdevice.h
rename to include/uapi/linux/netdevice.h
diff --git a/include/linux/netfilter.h b/include/uapi/linux/netfilter.h
similarity index 100%
rename from include/linux/netfilter.h
rename to include/uapi/linux/netfilter.h
diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/uapi/linux/netfilter/ipset/ip_set.h
similarity index 100%
rename from include/linux/netfilter/ipset/ip_set.h
rename to include/uapi/linux/netfilter/ipset/ip_set.h
diff --git a/include/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h
similarity index 100%
rename from include/linux/netfilter/x_tables.h
rename to include/uapi/linux/netfilter/x_tables.h
diff --git a/include/linux/netfilter/xt_set.h b/include/uapi/linux/netfilter/xt_set.h
similarity index 100%
rename from include/linux/netfilter/xt_set.h
rename to include/uapi/linux/netfilter/xt_set.h
diff --git a/include/linux/netfilter/xt_tcpudp.h b/include/uapi/linux/netfilter/xt_tcpudp.h
similarity index 100%
rename from include/linux/netfilter/xt_tcpudp.h
rename to include/uapi/linux/netfilter/xt_tcpudp.h
diff --git a/include/linux/netfilter_ipv4.h b/include/uapi/linux/netfilter_ipv4.h
similarity index 100%
rename from include/linux/netfilter_ipv4.h
rename to include/uapi/linux/netfilter_ipv4.h
diff --git a/include/linux/netfilter_ipv4/ip_tables.h b/include/uapi/linux/netfilter_ipv4/ip_tables.h
similarity index 100%
rename from include/linux/netfilter_ipv4/ip_tables.h
rename to include/uapi/linux/netfilter_ipv4/ip_tables.h
diff --git a/include/linux/netfilter_ipv6.h b/include/uapi/linux/netfilter_ipv6.h
similarity index 100%
rename from include/linux/netfilter_ipv6.h
rename to include/uapi/linux/netfilter_ipv6.h
diff --git a/include/linux/netfilter_ipv6/ip6_tables.h b/include/uapi/linux/netfilter_ipv6/ip6_tables.h
similarity index 100%
rename from include/linux/netfilter_ipv6/ip6_tables.h
rename to include/uapi/linux/netfilter_ipv6/ip6_tables.h
diff --git a/include/linux/netlink.h b/include/uapi/linux/netlink.h
similarity index 100%
rename from include/linux/netlink.h
rename to include/uapi/linux/netlink.h
diff --git a/include/linux/netlink_diag.h b/include/uapi/linux/netlink_diag.h
similarity index 100%
rename from include/linux/netlink_diag.h
rename to include/uapi/linux/netlink_diag.h
diff --git a/include/linux/packet_diag.h b/include/uapi/linux/packet_diag.h
similarity index 100%
rename from include/linux/packet_diag.h
rename to include/uapi/linux/packet_diag.h
diff --git a/include/linux/param.h b/include/uapi/linux/param.h
similarity index 100%
rename from include/linux/param.h
rename to include/uapi/linux/param.h
diff --git a/include/linux/pfkeyv2.h b/include/uapi/linux/pfkeyv2.h
similarity index 100%
rename from include/linux/pfkeyv2.h
rename to include/uapi/linux/pfkeyv2.h
diff --git a/include/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
similarity index 100%
rename from include/linux/pkt_cls.h
rename to include/uapi/linux/pkt_cls.h
diff --git a/include/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
similarity index 100%
rename from include/linux/pkt_sched.h
rename to include/uapi/linux/pkt_sched.h
diff --git a/include/linux/posix_types.h b/include/uapi/linux/posix_types.h
similarity index 100%
rename from include/linux/posix_types.h
rename to include/uapi/linux/posix_types.h
diff --git a/include/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
similarity index 100%
rename from include/linux/rtnetlink.h
rename to include/uapi/linux/rtnetlink.h
diff --git a/include/linux/sctp.h b/include/uapi/linux/sctp.h
similarity index 100%
rename from include/linux/sctp.h
rename to include/uapi/linux/sctp.h
diff --git a/include/linux/seg6.h b/include/uapi/linux/seg6.h
similarity index 100%
rename from include/linux/seg6.h
rename to include/uapi/linux/seg6.h
diff --git a/include/linux/seg6_genl.h b/include/uapi/linux/seg6_genl.h
similarity index 100%
rename from include/linux/seg6_genl.h
rename to include/uapi/linux/seg6_genl.h
diff --git a/include/linux/seg6_hmac.h b/include/uapi/linux/seg6_hmac.h
similarity index 100%
rename from include/linux/seg6_hmac.h
rename to include/uapi/linux/seg6_hmac.h
diff --git a/include/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
similarity index 100%
rename from include/linux/seg6_iptunnel.h
rename to include/uapi/linux/seg6_iptunnel.h
diff --git a/include/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
similarity index 100%
rename from include/linux/seg6_local.h
rename to include/uapi/linux/seg6_local.h
diff --git a/include/linux/sock_diag.h b/include/uapi/linux/sock_diag.h
similarity index 100%
rename from include/linux/sock_diag.h
rename to include/uapi/linux/sock_diag.h
diff --git a/include/linux/socket.h b/include/uapi/linux/socket.h
similarity index 100%
rename from include/linux/socket.h
rename to include/uapi/linux/socket.h
diff --git a/include/linux/sockios.h b/include/uapi/linux/sockios.h
similarity index 100%
rename from include/linux/sockios.h
rename to include/uapi/linux/sockios.h
diff --git a/include/linux/stddef.h b/include/uapi/linux/stddef.h
similarity index 100%
rename from include/linux/stddef.h
rename to include/uapi/linux/stddef.h
diff --git a/include/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
similarity index 100%
rename from include/linux/sysinfo.h
rename to include/uapi/linux/sysinfo.h
diff --git a/include/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
similarity index 100%
rename from include/linux/tc_act/tc_bpf.h
rename to include/uapi/linux/tc_act/tc_bpf.h
diff --git a/include/linux/tc_act/tc_connmark.h b/include/uapi/linux/tc_act/tc_connmark.h
similarity index 100%
rename from include/linux/tc_act/tc_connmark.h
rename to include/uapi/linux/tc_act/tc_connmark.h
diff --git a/include/linux/tc_act/tc_csum.h b/include/uapi/linux/tc_act/tc_csum.h
similarity index 100%
rename from include/linux/tc_act/tc_csum.h
rename to include/uapi/linux/tc_act/tc_csum.h
diff --git a/include/linux/tc_act/tc_defact.h b/include/uapi/linux/tc_act/tc_defact.h
similarity index 100%
rename from include/linux/tc_act/tc_defact.h
rename to include/uapi/linux/tc_act/tc_defact.h
diff --git a/include/linux/tc_act/tc_gact.h b/include/uapi/linux/tc_act/tc_gact.h
similarity index 100%
rename from include/linux/tc_act/tc_gact.h
rename to include/uapi/linux/tc_act/tc_gact.h
diff --git a/include/linux/tc_act/tc_ife.h b/include/uapi/linux/tc_act/tc_ife.h
similarity index 100%
rename from include/linux/tc_act/tc_ife.h
rename to include/uapi/linux/tc_act/tc_ife.h
diff --git a/include/linux/tc_act/tc_ipt.h b/include/uapi/linux/tc_act/tc_ipt.h
similarity index 100%
rename from include/linux/tc_act/tc_ipt.h
rename to include/uapi/linux/tc_act/tc_ipt.h
diff --git a/include/linux/tc_act/tc_mirred.h b/include/uapi/linux/tc_act/tc_mirred.h
similarity index 100%
rename from include/linux/tc_act/tc_mirred.h
rename to include/uapi/linux/tc_act/tc_mirred.h
diff --git a/include/linux/tc_act/tc_nat.h b/include/uapi/linux/tc_act/tc_nat.h
similarity index 100%
rename from include/linux/tc_act/tc_nat.h
rename to include/uapi/linux/tc_act/tc_nat.h
diff --git a/include/linux/tc_act/tc_pedit.h b/include/uapi/linux/tc_act/tc_pedit.h
similarity index 100%
rename from include/linux/tc_act/tc_pedit.h
rename to include/uapi/linux/tc_act/tc_pedit.h
diff --git a/include/linux/tc_act/tc_sample.h b/include/uapi/linux/tc_act/tc_sample.h
similarity index 100%
rename from include/linux/tc_act/tc_sample.h
rename to include/uapi/linux/tc_act/tc_sample.h
diff --git a/include/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
similarity index 100%
rename from include/linux/tc_act/tc_skbedit.h
rename to include/uapi/linux/tc_act/tc_skbedit.h
diff --git a/include/linux/tc_act/tc_skbmod.h b/include/uapi/linux/tc_act/tc_skbmod.h
similarity index 100%
rename from include/linux/tc_act/tc_skbmod.h
rename to include/uapi/linux/tc_act/tc_skbmod.h
diff --git a/include/linux/tc_act/tc_tunnel_key.h b/include/uapi/linux/tc_act/tc_tunnel_key.h
similarity index 100%
rename from include/linux/tc_act/tc_tunnel_key.h
rename to include/uapi/linux/tc_act/tc_tunnel_key.h
diff --git a/include/linux/tc_act/tc_vlan.h b/include/uapi/linux/tc_act/tc_vlan.h
similarity index 100%
rename from include/linux/tc_act/tc_vlan.h
rename to include/uapi/linux/tc_act/tc_vlan.h
diff --git a/include/linux/tc_ematch/tc_em_cmp.h b/include/uapi/linux/tc_ematch/tc_em_cmp.h
similarity index 100%
rename from include/linux/tc_ematch/tc_em_cmp.h
rename to include/uapi/linux/tc_ematch/tc_em_cmp.h
diff --git a/include/linux/tc_ematch/tc_em_meta.h b/include/uapi/linux/tc_ematch/tc_em_meta.h
similarity index 100%
rename from include/linux/tc_ematch/tc_em_meta.h
rename to include/uapi/linux/tc_ematch/tc_em_meta.h
diff --git a/include/linux/tc_ematch/tc_em_nbyte.h b/include/uapi/linux/tc_ematch/tc_em_nbyte.h
similarity index 100%
rename from include/linux/tc_ematch/tc_em_nbyte.h
rename to include/uapi/linux/tc_ematch/tc_em_nbyte.h
diff --git a/include/linux/tcp.h b/include/uapi/linux/tcp.h
similarity index 100%
rename from include/linux/tcp.h
rename to include/uapi/linux/tcp.h
diff --git a/include/linux/tcp_metrics.h b/include/uapi/linux/tcp_metrics.h
similarity index 100%
rename from include/linux/tcp_metrics.h
rename to include/uapi/linux/tcp_metrics.h
diff --git a/include/linux/tipc.h b/include/uapi/linux/tipc.h
similarity index 100%
rename from include/linux/tipc.h
rename to include/uapi/linux/tipc.h
diff --git a/include/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
similarity index 100%
rename from include/linux/tipc_netlink.h
rename to include/uapi/linux/tipc_netlink.h
diff --git a/include/linux/types.h b/include/uapi/linux/types.h
similarity index 100%
rename from include/linux/types.h
rename to include/uapi/linux/types.h
diff --git a/include/linux/unix_diag.h b/include/uapi/linux/unix_diag.h
similarity index 100%
rename from include/linux/unix_diag.h
rename to include/uapi/linux/unix_diag.h
diff --git a/include/linux/veth.h b/include/uapi/linux/veth.h
similarity index 100%
rename from include/linux/veth.h
rename to include/uapi/linux/veth.h
diff --git a/include/linux/xfrm.h b/include/uapi/linux/xfrm.h
similarity index 100%
rename from include/linux/xfrm.h
rename to include/uapi/linux/xfrm.h
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related

* [PATCH net-next 2/2] tcp: clean up TFO server's initial tcp_rearm_rto() call
From: Wei Wang @ 2017-10-02 17:02 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Yuchung Cheng, Neal Cardwell, Eric Dumazet, Wei Wang

From: Wei Wang <weiwan@google.com>

This commit cleans up the TFO server case by moving the tcp_rearm_rto()
call to an earlier spot in tcp_rcv_state_process(), which makes the code
more compact.
This is only a cosmetic change.

Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bd3a35f5dbf2..c5b8d61846c2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5911,6 +5911,15 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		if (req) {
 			inet_csk(sk)->icsk_retransmits = 0;
 			reqsk_fastopen_remove(sk, req, false);
+			/* Re-arm the timer because data may have been sent out.
+			 * This is similar to the regular data transmission case
+			 * when new data has just been ack'ed.
+			 *
+			 * (TFO) - we could try to be more aggressive and
+			 * retransmitting any data sooner based on when they
+			 * are sent out.
+			 */
+			tcp_rearm_rto(sk);
 		} else {
 			tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
 			tp->copied_seq = tp->rcv_nxt;
@@ -5933,18 +5942,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		if (tp->rx_opt.tstamp_ok)
 			tp->advmss -= TCPOLEN_TSTAMP_ALIGNED;
 
-		if (req) {
-			/* Re-arm the timer because data may have been sent out.
-			 * This is similar to the regular data transmission case
-			 * when new data has just been ack'ed.
-			 *
-			 * (TFO) - we could try to be more aggressive and
-			 * retransmitting any data sooner based on when they
-			 * are sent out.
-			 */
-			tcp_rearm_rto(sk);
-		}
-
 		if (!inet_csk(sk)->icsk_ca_ops->cong_control)
 			tcp_update_pacing_rate(sk);
 
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related

* [PATCH net-next 1/2] tcp: uniform the set up of sockets after successful connection
From: Wei Wang @ 2017-10-02 17:01 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Yuchung Cheng, Neal Cardwell, Eric Dumazet, Wei Wang

From: Wei Wang <weiwan@google.com>

Currently in the TCP code, the initialization sequence for cached
metrics, congestion control, BPF, etc., after a successful connection
is very inconsistent. This introduces inconsistent behavior and is
prone to bugs. The current call sequence is as follows:

(1) for active case (tcp_finish_connect() case):
        tcp_mtup_init(sk);
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_init_metrics(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_init_buffer_space(sk);

(2) for passive case (tcp_rcv_state_process() TCP_SYN_RECV case):
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_mtup_init(sk);
        tcp_init_buffer_space(sk);
        tcp_init_metrics(sk);

(3) for TFO passive case (tcp_fastopen_create_child()):
        inet_csk(child)->icsk_af_ops->rebuild_header(child);
        tcp_init_congestion_control(child);
        tcp_mtup_init(child);
        tcp_init_metrics(child);
        tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
        tcp_init_buffer_space(child);

This commit unifies the above functions to use the following sequence:
        tcp_mtup_init(sk);
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_init_metrics(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_init_buffer_space(sk);

This sequence is the same as the (1) active case. We pick this sequence
because it lets BPF override the settings that come from the route,
including the congestion control module and the initial cwnd, and then
lets the CC module see those settings.

Suggested-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h       |  1 +
 net/ipv4/tcp.c          | 13 +++++++++++++
 net/ipv4/tcp_fastopen.c |  7 +------
 net/ipv4/tcp_input.c    | 21 +++------------------
 4 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 770b608c8439..f45fdc57d29d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -417,6 +417,7 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst);
 void tcp_disable_fack(struct tcp_sock *tp);
 void tcp_close(struct sock *sk, long timeout);
 void tcp_init_sock(struct sock *sk);
+void tcp_init_transfer(struct sock *sk, int bpf_op);
 unsigned int tcp_poll(struct file *file, struct socket *sock,
 		      struct poll_table_struct *wait);
 int tcp_getsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402720ab..a16445664644 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -456,6 +456,19 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
+void tcp_init_transfer(struct sock *sk, int bpf_op)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	tcp_mtup_init(sk);
+	icsk->icsk_af_ops->rebuild_header(sk);
+	tcp_init_metrics(sk);
+	tcp_call_bpf(sk, bpf_op);
+	tcp_init_congestion_control(sk);
+	tcp_init_buffer_space(sk);
+}
+EXPORT_SYMBOL(tcp_init_transfer);
+
 static void tcp_tx_timestamp(struct sock *sk, u16 tsflags, struct sk_buff *skb)
 {
 	if (tsflags && skb) {
diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index e3c33220c418..515a757f02a8 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -216,12 +216,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk,
 	refcount_set(&req->rsk_refcnt, 2);
 
 	/* Now finish processing the fastopen child socket. */
-	inet_csk(child)->icsk_af_ops->rebuild_header(child);
-	tcp_init_congestion_control(child);
-	tcp_mtup_init(child);
-	tcp_init_metrics(child);
-	tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
-	tcp_init_buffer_space(child);
+	tcp_init_transfer(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
 
 	tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index db9bb46b5776..bd3a35f5dbf2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5513,20 +5513,13 @@ void tcp_finish_connect(struct sock *sk, struct sk_buff *skb)
 		security_inet_conn_established(sk, skb);
 	}
 
-	/* Make sure socket is routed, for correct metrics.  */
-	icsk->icsk_af_ops->rebuild_header(sk);
-
-	tcp_init_metrics(sk);
-	tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
-	tcp_init_congestion_control(sk);
+	tcp_init_transfer(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
 
 	/* Prevent spurious tcp_cwnd_restart() on first data
 	 * packet.
 	 */
 	tp->lsndtime = tcp_jiffies32;
 
-	tcp_init_buffer_space(sk);
-
 	if (sock_flag(sk, SOCK_KEEPOPEN))
 		inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
 
@@ -5693,7 +5686,6 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 		if (tcp_is_sack(tp) && sysctl_tcp_fack)
 			tcp_enable_fack(tp);
 
-		tcp_mtup_init(sk);
 		tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
 		tcp_initialize_rcv_mss(sk);
 
@@ -5920,14 +5912,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 			inet_csk(sk)->icsk_retransmits = 0;
 			reqsk_fastopen_remove(sk, req, false);
 		} else {
-			/* Make sure socket is routed, for correct metrics. */
-			icsk->icsk_af_ops->rebuild_header(sk);
-			tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
-			tcp_init_congestion_control(sk);
-
-			tcp_mtup_init(sk);
+			tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
 			tp->copied_seq = tp->rcv_nxt;
-			tcp_init_buffer_space(sk);
 		}
 		smp_mb();
 		tcp_set_state(sk, TCP_ESTABLISHED);
@@ -5957,8 +5943,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 			 * are sent out.
 			 */
 			tcp_rearm_rto(sk);
-		} else
-			tcp_init_metrics(sk);
+		}
 
 		if (!inet_csk(sk)->icsk_ca_ops->cong_control)
 			tcp_update_pacing_rate(sk);
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related

* [PATCH net-next v2 2/2] libbpf: use map_flags when creating maps
From: Craig Gallek @ 2017-10-02 16:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev
In-Reply-To: <20171002164129.47986-1-kraigatgoog@gmail.com>

From: Craig Gallek <kraig@google.com>

This is required to use BPF_MAP_TYPE_LPM_TRIE or any other map type
which requires flags.
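
As an illustration only, a map definition that relies on the new field
might look like the sketch below. The SKETCH_* constants and the
sketch_* types are local stand-ins (assumptions, not libbpf names) that
mirror the kernel UAPI values so the fragment compiles without
<linux/bpf.h>; the LPM trie map type is expected to be created with the
no-preallocation flag set.

```c
#include <stdint.h>

/* Assumed local copies of kernel UAPI values (see <linux/bpf.h>). */
#define SKETCH_BPF_MAP_TYPE_LPM_TRIE	11
#define SKETCH_BPF_F_NO_PREALLOC	(1U << 0)

/* Hypothetical mirror of struct bpf_map_def with the added field. */
struct sketch_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* LPM trie key for IPv4: prefix length followed by the address bytes. */
struct sketch_lpm_key_v4 {
	uint32_t prefixlen;
	uint8_t  addr[4];
};

/* With map_flags forwarded to map creation, the flag below takes effect. */
static const struct sketch_map_def lpm_def = {
	.type		= SKETCH_BPF_MAP_TYPE_LPM_TRIE,
	.key_size	= sizeof(struct sketch_lpm_key_v4),
	.value_size	= sizeof(uint64_t),
	.max_entries	= 1024,
	.map_flags	= SKETCH_BPF_F_NO_PREALLOC,
};
```

Before this patch the equivalent definition would have been created
with flags of 0, regardless of what the object file requested.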

Signed-off-by: Craig Gallek <kraig@google.com>
---
 tools/lib/bpf/libbpf.c | 2 +-
 tools/lib/bpf/libbpf.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 28b300868ad7..5996e7565cc8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -968,7 +968,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 					   def->key_size,
 					   def->value_size,
 					   def->max_entries,
-					   0);
+					   def->map_flags);
 		if (*pfd < 0) {
 			size_t j;
 			int err = *pfd;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 7959086eb9c9..6e20003109e0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -207,6 +207,7 @@ struct bpf_map_def {
 	unsigned int key_size;
 	unsigned int value_size;
 	unsigned int max_entries;
+	unsigned int map_flags;
 };
 
 /*
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related

* [PATCH net-next v2 1/2] libbpf: parse maps sections of varying size
From: Craig Gallek @ 2017-10-02 16:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev
In-Reply-To: <20171002164129.47986-1-kraigatgoog@gmail.com>

From: Craig Gallek <kraig@google.com>

This library previously assumed a fixed-size map options structure.
Any new options were ignored.  In order to allow the options structure
to grow and to support parsing older programs, this patch updates
the maps section parsing to handle varying sizes.

Object files with maps sections smaller than expected will have the new
fields initialized to zero.  Object files which have larger than expected
maps sections will be rejected unless all of the unrecognized data is zero.

This change still assumes that each map definition in the maps section
is the same size.
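
The compatibility rule above can be sketched in isolation as follows.
This is a minimal userspace model, not the libbpf code itself: struct
map_def and copy_map_def() are hypothetical names, and the struct only
approximates libbpf's struct bpf_map_def.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the reader's view of a map definition. */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/*
 * Copy a map definition of src_sz bytes into *dst.
 *
 * Smaller source: copy what is there, leave the newer fields zero.
 * Larger source:  accept only if every byte past the known fields is
 *                 zero, then copy the known part.
 *
 * Returns 0 on success, -1 if unrecognized non-zero data is present.
 */
static int copy_map_def(struct map_def *dst, const void *src, size_t src_sz)
{
	const unsigned char *p = src;
	size_t i;

	assert(dst && src);
	memset(dst, 0, sizeof(*dst));

	if (src_sz <= sizeof(*dst)) {
		memcpy(dst, src, src_sz);
		return 0;
	}

	for (i = sizeof(*dst); i < src_sz; i++) {
		if (p[i] != 0)
			return -1;
	}
	memcpy(dst, src, sizeof(*dst));
	return 0;
}
```

Zero is treated as "option not used", which is why trailing zero bytes
from a newer object file can be dropped safely while any non-zero byte
has to be rejected.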

Signed-off-by: Craig Gallek <kraig@google.com>
---
 tools/lib/bpf/libbpf.c | 54 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4f402dcdf372..28b300868ad7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -580,7 +580,7 @@ bpf_object__init_kversion(struct bpf_object *obj,
 }
 
 static int
-bpf_object__validate_maps(struct bpf_object *obj)
+bpf_object__validate_maps(struct bpf_object *obj, int map_def_sz)
 {
 	int i;
 
@@ -595,9 +595,11 @@ bpf_object__validate_maps(struct bpf_object *obj)
 		const struct bpf_map *a = &obj->maps[i - 1];
 		const struct bpf_map *b = &obj->maps[i];
 
-		if (b->offset - a->offset < sizeof(struct bpf_map_def)) {
-			pr_warning("corrupted map section in %s: map \"%s\" too small\n",
-				   obj->path, a->name);
+		if (b->offset - a->offset < map_def_sz) {
+			pr_warning("corrupted map section in %s: map \"%s\" too small "
+				   "(%zd vs %d)\n",
+				   obj->path, a->name, b->offset - a->offset,
+				   map_def_sz);
 			return -EINVAL;
 		}
 	}
@@ -615,7 +617,7 @@ static int compare_bpf_map(const void *_a, const void *_b)
 static int
 bpf_object__init_maps(struct bpf_object *obj)
 {
-	int i, map_idx, nr_maps = 0;
+	int i, map_idx, map_def_sz, nr_maps = 0;
 	Elf_Scn *scn;
 	Elf_Data *data;
 	Elf_Data *symbols = obj->efile.symbols;
@@ -658,6 +660,15 @@ bpf_object__init_maps(struct bpf_object *obj)
 	if (!nr_maps)
 		return 0;
 
+	/* Assume equally sized map definitions */
+	map_def_sz = data->d_size / nr_maps;
+	if (!data->d_size || (data->d_size % nr_maps) != 0) {
+		pr_warning("unable to determine map definition size "
+			   "section %s, %d maps in %zd bytes\n",
+			   obj->path, nr_maps, data->d_size);
+		return -EINVAL;
+	}
+
 	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
 	if (!obj->maps) {
 		pr_warning("alloc maps for object failed\n");
@@ -690,7 +701,7 @@ bpf_object__init_maps(struct bpf_object *obj)
 				      obj->efile.strtabidx,
 				      sym.st_name);
 		obj->maps[map_idx].offset = sym.st_value;
-		if (sym.st_value + sizeof(struct bpf_map_def) > data->d_size) {
+		if (sym.st_value + map_def_sz > data->d_size) {
 			pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
 				   obj->path, map_name);
 			return -EINVAL;
@@ -704,12 +715,39 @@ bpf_object__init_maps(struct bpf_object *obj)
 		pr_debug("map %d is \"%s\"\n", map_idx,
 			 obj->maps[map_idx].name);
 		def = (struct bpf_map_def *)(data->d_buf + sym.st_value);
-		obj->maps[map_idx].def = *def;
+		/*
+		 * If the definition of the map in the object file fits in
+		 * bpf_map_def, copy it.  Any extra fields in our version
+		 * of bpf_map_def will default to zero as a result of the
+		 * calloc above.
+		 */
+		if (map_def_sz <= sizeof(struct bpf_map_def)) {
+			memcpy(&obj->maps[map_idx].def, def, map_def_sz);
+		} else {
+			/*
+			 * Here the map structure being read is bigger than what
+			 * we expect, truncate if the excess bits are all zero.
+			 * If they are not zero, reject this map as
+			 * incompatible.
+			 */
+			char *b;
+			for (b = ((char *)def) + sizeof(struct bpf_map_def);
+			     b < ((char *)def) + map_def_sz; b++) {
+				if (*b != 0) {
+					pr_warning("maps section in %s: \"%s\" "
+						   "has unrecognized, non-zero "
+						   "options\n",
+						   obj->path, map_name);
+					return -EINVAL;
+				}
+			}
+			obj->maps[map_idx].def = *def;
+		}
 		map_idx++;
 	}
 
 	qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
-	return bpf_object__validate_maps(obj);
+	return bpf_object__validate_maps(obj, map_def_sz);
 }
 
 static int bpf_object__elf_collect(struct bpf_object *obj)
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related

* [PATCH net-next v2 0/2] libbpf: support more map options
From: Craig Gallek @ 2017-10-02 16:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	David S . Miller
  Cc: Chonggang Li, netdev

From: Craig Gallek <kraig@google.com>

The functional change in this series is the ability to use flags when
creating maps from object files loaded by libbpf.  In order to do this,
the first patch updates the library to handle map definitions that
differ in size from libbpf's struct bpf_map_def.

For object files with a larger map definition, libbpf will continue to load
if the unknown fields are all zero, otherwise the map is rejected.  If the
map definition in the object file is smaller than expected, libbpf will use
zero as a default value in the missing fields.

Craig Gallek (2):
  libbpf: parse maps sections of varying size
  libbpf: use map_flags when creating maps

 tools/lib/bpf/libbpf.c | 56 ++++++++++++++++++++++++++++++++++++++++++--------
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 48 insertions(+), 9 deletions(-)

-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply

* Re: [PATCH 00/18] use ARRAY_SIZE macro
From: Mauro Carvalho Chehab @ 2017-10-02 16:37 UTC (permalink / raw)
  To: Jérémy Lefaure
  Cc: Tobin C. Harding, alsa-devel, nouveau, dri-devel, dm-devel,
	brcm80211-dev-list, devel, linux-scsi, linux-rdma, amd-gfx,
	Jason Gunthorpe, linux-acpi, linux-video, intel-wired-lan,
	linux-media, intel-gfx, ecryptfs, linux-nfs, linux-raid,
	openipmi-developer, intel-gvt-dev, devel, brcm80211-dev-list.pdl,
	netdev, linux-usb, linux-wireless
In-Reply-To: <20171001205220.10b78086@blatinox-laptop.localdomain>

Em Sun, 1 Oct 2017 20:52:20 -0400
Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> escreveu:

> Anyway, I can tell to each maintainer that they can apply the patches
> they're concerned about and next time I may send individual patches.

In the case of media, we'll handle it as if they were individual ones.

Thanks,
Mauro

^ permalink raw reply

* [net-next V3 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu
From: Jesper Dangaard Brouer @ 2017-10-02 16:05 UTC (permalink / raw)
  To: netdev
  Cc: jakub.kicinski, Michael S. Tsirkin, pavel.odintsov, Jason Wang,
	mchan, John Fastabend, peter.waskiewicz.jr,
	Jesper Dangaard Brouer, Daniel Borkmann, Alexei Starovoitov,
	Andy Gospodarek
In-Reply-To: <150696027949.24152.7507025809123255386.stgit@firesoul>

This sample program shows how to use cpumap and the associated
tracepoints.

It provides command line stats, which show how the XDP-RX process,
cpumap-enqueue and cpumap kthread dequeue cooperate on a per-CPU
basis.  It also uses the xdp_exception and xdp_redirect_err
tracepoints to let users quickly identify setup issues.

One issue with the ixgbe driver is that it resets the link when
loading XDP.  This resets the procfs smp_affinity settings.  Thus,
after loading the program, these must be reconfigured.  The easiest
workaround is to reduce the number of RX queues to e.g. two via:

 # ethtool --set-channels ixgbe1 combined 2

And then add CPUs above 0 and 1, like:

 # xdp_redirect_cpu --dev ixgbe1 --prog 2 --cpu 2 --cpu 3 --cpu 4

Another issue with ixgbe is that the page recycle mechanism is tied to
the RX-ring size, and the default setting of 512 elements is too
small.  The same issue affects regular devmap XDP_REDIRECT.
To overcome this I've been using an RX-ring size of 1024:

 # ethtool -G ixgbe1 rx 1024 tx 1024

V3:
 - whitespace cleanups
 - bpf tracepoint cannot access top part of struct

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 samples/bpf/Makefile                |    4 
 samples/bpf/xdp_redirect_cpu_kern.c |  619 +++++++++++++++++++++++++++++++++
 samples/bpf/xdp_redirect_cpu_user.c |  647 +++++++++++++++++++++++++++++++++++
 3 files changed, 1270 insertions(+)
 create mode 100644 samples/bpf/xdp_redirect_cpu_kern.c
 create mode 100644 samples/bpf/xdp_redirect_cpu_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ebc2ad69b62c..52c4dab2c153 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -39,6 +39,7 @@ hostprogs-y += per_socket_stats_example
 hostprogs-y += load_sock_ops
 hostprogs-y += xdp_redirect
 hostprogs-y += xdp_redirect_map
+hostprogs-y += xdp_redirect_cpu
 hostprogs-y += xdp_monitor
 hostprogs-y += syscall_tp
 
@@ -84,6 +85,7 @@ test_map_in_map-objs := bpf_load.o $(LIBBPF) test_map_in_map_user.o
 per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
 xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
 xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
+xdp_redirect_cpu-objs := bpf_load.o $(LIBBPF) xdp_redirect_cpu_user.o
 xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 
@@ -129,6 +131,7 @@ always += tcp_iw_kern.o
 always += tcp_clamp_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
+always += xdp_redirect_cpu_kern.o
 always += xdp_monitor_kern.o
 always += syscall_tp_kern.o
 
@@ -169,6 +172,7 @@ HOSTLOADLIBES_xdp_tx_iptunnel += -lelf
 HOSTLOADLIBES_test_map_in_map += -lelf
 HOSTLOADLIBES_xdp_redirect += -lelf
 HOSTLOADLIBES_xdp_redirect_map += -lelf
+HOSTLOADLIBES_xdp_redirect_cpu += -lelf
 HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
 
diff --git a/samples/bpf/xdp_redirect_cpu_kern.c b/samples/bpf/xdp_redirect_cpu_kern.c
new file mode 100644
index 000000000000..fd1f55cb5079
--- /dev/null
+++ b/samples/bpf/xdp_redirect_cpu_kern.c
@@ -0,0 +1,619 @@
+/*  XDP redirect to CPUs via cpumap (BPF_MAP_TYPE_CPUMAP)
+ *
+ *  GPLv2, Copyright(c) 2017 Jesper Dangaard Brouer, Red Hat, Inc.
+ */
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/if_vlan.h>
+#include <uapi/linux/ip.h>
+#include <uapi/linux/ipv6.h>
+#include <uapi/linux/in.h>
+#include <uapi/linux/tcp.h>
+#include <uapi/linux/udp.h>
+
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+#define MAX_CPUS 12 /* WARNING - sync with _user.c */
+
+/* Special map type that can XDP_REDIRECT frames to another CPU */
+struct bpf_map_def SEC("maps") cpu_map = {
+	.type		= BPF_MAP_TYPE_CPUMAP,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(u32),
+	.max_entries	= MAX_CPUS,
+};
+
+/* Common stats data record to keep userspace more simple */
+struct datarec {
+	__u64 processed;
+	__u64 dropped;
+	__u64 issue;
+};
+
+/* Count RX packets, as XDP bpf_prog doesn't get direct TX-success
+ * feedback.  Redirect TX errors can be caught via a tracepoint.
+ */
+struct bpf_map_def SEC("maps") rx_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 1,
+};
+
+/* Used by trace point */
+struct bpf_map_def SEC("maps") redirect_err_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 2,
+	/* TODO: have entries for all possible errno's */
+};
+
+/* Used by trace point */
+struct bpf_map_def SEC("maps") cpumap_enqueue_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= MAX_CPUS,
+};
+
+/* Used by trace point */
+struct bpf_map_def SEC("maps") cpumap_kthread_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 1,
+};
+
+/* Set of maps controlling available CPU, and for iterating through
+ * selectable redirect CPUs.
+ */
+struct bpf_map_def SEC("maps") cpus_available = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(u32),
+	.max_entries	= MAX_CPUS,
+};
+struct bpf_map_def SEC("maps") cpus_count = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(u32),
+	.max_entries	= 1,
+};
+struct bpf_map_def SEC("maps") cpus_iterator = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(u32),
+	.max_entries	= 1,
+};
+
+/* Used by trace point */
+struct bpf_map_def SEC("maps") exception_cnt = {
+	.type		= BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size	= sizeof(u32),
+	.value_size	= sizeof(struct datarec),
+	.max_entries	= 1,
+};
+
+/* Helper parse functions */
+
+/* Parse Ethernet layer 2, extract network layer 3 offset and protocol
+ *
+ * Returns false on error and non-supported ether-type
+ */
+struct vlan_hdr {
+	__be16 h_vlan_TCI;
+	__be16 h_vlan_encapsulated_proto;
+};
+
+static __always_inline
+bool parse_eth(struct ethhdr *eth, void *data_end,
+	       u16 *eth_proto, u64 *l3_offset)
+{
+	u16 eth_type;
+	u64 offset;
+
+	offset = sizeof(*eth);
+	if ((void *)eth + offset > data_end)
+		return false;
+
+	eth_type = eth->h_proto;
+
+	/* Skip non 802.3 Ethertypes */
+	if (unlikely(ntohs(eth_type) < ETH_P_802_3_MIN))
+		return false;
+
+	/* Handle VLAN tagged packet */
+	if (eth_type == htons(ETH_P_8021Q) || eth_type == htons(ETH_P_8021AD)) {
+		struct vlan_hdr *vlan_hdr;
+
+		vlan_hdr = (void *)eth + offset;
+		offset += sizeof(*vlan_hdr);
+		if ((void *)eth + offset > data_end)
+			return false;
+		eth_type = vlan_hdr->h_vlan_encapsulated_proto;
+	}
+	/* TODO: Handle double VLAN tagged packet */
+
+	*eth_proto = ntohs(eth_type);
+	*l3_offset = offset;
+	return true;
+}
+
+static __always_inline
+u16 get_dest_port_ipv4_udp(struct xdp_md *ctx, u64 nh_off)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct iphdr *iph = data + nh_off;
+	struct udphdr *udph;
+	u16 dport;
+
+	if (iph + 1 > data_end)
+		return 0;
+	if (!(iph->protocol == IPPROTO_UDP))
+		return 0;
+
+	udph = (void *)(iph + 1);
+	if (udph + 1 > data_end)
+		return 0;
+
+	dport = ntohs(udph->dest);
+	return dport;
+}
+
+static __always_inline
+int get_proto_ipv4(struct xdp_md *ctx, u64 nh_off)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct iphdr *iph = data + nh_off;
+
+	if (iph + 1 > data_end)
+		return 0;
+	return iph->protocol;
+}
+
+static __always_inline
+int get_proto_ipv6(struct xdp_md *ctx, u64 nh_off)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ipv6hdr *ip6h = data + nh_off;
+
+	if (ip6h + 1 > data_end)
+		return 0;
+	return ip6h->nexthdr;
+}
+
+SEC("xdp_cpu_map0")
+int  xdp_prognum0_no_touch(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct datarec *rec;
+	u32 *cpu_selected;
+	u32 cpu_dest;
+	u32 key = 0;
+
+	/* Only use first entry in cpus_available */
+	cpu_selected = bpf_map_lookup_elem(&cpus_available, &key);
+	if (!cpu_selected)
+		return XDP_ABORTED;
+	cpu_dest = *cpu_selected;
+
+	/* Count RX packet in map */
+	rec = bpf_map_lookup_elem(&rx_cnt, &key);
+	if (rec)
+		rec->processed++;
+
+	return bpf_redirect_map(&cpu_map, cpu_dest, 0);
+}
+
+SEC("xdp_cpu_map1_touch_data")
+int  xdp_prognum1_touch_data(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	struct datarec *rec;
+	u32 *cpu_selected;
+	u32 cpu_dest;
+	u16 eth_type;
+	u32 key = 0;
+
+	/* Only use first entry in cpus_available */
+	cpu_selected = bpf_map_lookup_elem(&cpus_available, &key);
+	if (!cpu_selected)
+		return XDP_ABORTED;
+	cpu_dest = *cpu_selected;
+
+	/* Validate packet length is minimum Eth header size */
+	if (eth + 1 > data_end)
+		return XDP_ABORTED;
+
+	/* Count RX packet in map */
+	rec = bpf_map_lookup_elem(&rx_cnt, &key);
+	if (!rec)
+		return XDP_ABORTED;
+	rec->processed++;
+
+	/* Read packet data, and use it (drop non 802.3 Ethertypes) */
+	eth_type = eth->h_proto;
+	if (ntohs(eth_type) < ETH_P_802_3_MIN) {
+		rec->dropped++;
+		return XDP_DROP;
+	}
+
+	return bpf_redirect_map(&cpu_map, cpu_dest, 0);
+}
+
+SEC("xdp_cpu_map2_round_robin")
+int  xdp_prognum2_round_robin(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	struct datarec *rec;
+	u32 cpu_dest;
+	u32 *cpu_lookup;
+	u32 key0 = 0;
+
+	u32 *cpu_selected;
+	u32 *cpu_iterator;
+	u32 *cpu_max;
+	u32 cpu_idx;
+
+	cpu_max = bpf_map_lookup_elem(&cpus_count, &key0);
+	if (!cpu_max)
+		return XDP_ABORTED;
+
+	cpu_iterator = bpf_map_lookup_elem(&cpus_iterator, &key0);
+	if (!cpu_iterator)
+		return XDP_ABORTED;
+	cpu_idx = *cpu_iterator;
+
+	*cpu_iterator += 1;
+	if (*cpu_iterator == *cpu_max)
+		*cpu_iterator = 0;
+
+	cpu_selected = bpf_map_lookup_elem(&cpus_available, &cpu_idx);
+	if (!cpu_selected)
+		return XDP_ABORTED;
+	cpu_dest = *cpu_selected;
+
+	/* Count RX packet in map */
+	rec = bpf_map_lookup_elem(&rx_cnt, &key0);
+	if (!rec)
+		return XDP_ABORTED;
+	rec->processed++;
+
+	/* Check cpu_dest is valid */
+	cpu_lookup = bpf_map_lookup_elem(&cpu_map, &cpu_dest);
+	if (!cpu_lookup) {
+		rec->issue++;
+		return XDP_DROP;
+	}
+
+	if (cpu_dest >= MAX_CPUS)
+		return XDP_ABORTED;
+
+	return bpf_redirect_map(&cpu_map, cpu_dest, 0);
+}
+
+SEC("xdp_cpu_map3_proto_separate")
+int  xdp_prognum3_proto_separate(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	u8 ip_proto = IPPROTO_UDP;
+	struct datarec *rec;
+	u16 eth_proto = 0;
+	u64 l3_offset = 0;
+	u32 cpu_dest = 0;
+	u32 cpu_idx = 0;
+	u32 *cpu_lookup;
+	u32 key = 0;
+
+	/* Count RX packet in map */
+	rec = bpf_map_lookup_elem(&rx_cnt, &key);
+	if (!rec)
+		return XDP_ABORTED;
+	rec->processed++;
+
+	if (!(parse_eth(eth, data_end, &eth_proto, &l3_offset)))
+		return XDP_PASS; /* Just skip */
+
+	/* Extract L4 protocol */
+	switch (eth_proto) {
+	case ETH_P_IP:
+		ip_proto = get_proto_ipv4(ctx, l3_offset);
+		break;
+	case ETH_P_IPV6:
+		ip_proto = get_proto_ipv6(ctx, l3_offset);
+		break;
+	case ETH_P_ARP:
+		cpu_idx = 0; /* ARP packet handled on separate CPU */
+		break;
+	default:
+		cpu_idx = 0;
+	}
+
+	/* Choose CPU based on L4 protocol */
+	switch (ip_proto) {
+	case IPPROTO_ICMP:
+	case IPPROTO_ICMPV6:
+		cpu_idx = 2;
+		break;
+	case IPPROTO_TCP:
+		cpu_idx = 0;
+		break;
+	case IPPROTO_UDP:
+		cpu_idx = 1;
+		break;
+	default:
+		cpu_idx = 0;
+	}
+
+	cpu_lookup = bpf_map_lookup_elem(&cpus_available, &cpu_idx);
+	if (!cpu_lookup)
+		return XDP_ABORTED;
+	cpu_dest = *cpu_lookup;
+
+	if (cpu_dest >= MAX_CPUS)
+		return XDP_ABORTED;
+
+	/* Check cpu_dest is valid */
+	cpu_lookup = bpf_map_lookup_elem(&cpu_map, &cpu_dest);
+	if (!cpu_lookup) {
+		rec->issue++;
+		return XDP_DROP;
+	}
+
+	return bpf_redirect_map(&cpu_map, cpu_dest, 0);
+}
+
+SEC("xdp_cpu_map4_ddos_filter_pktgen")
+int  xdp_prognum4_ddos_filter_pktgen(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	u8 ip_proto = IPPROTO_UDP;
+	struct datarec *rec;
+	u16 eth_proto = 0;
+	u64 l3_offset = 0;
+	u32 cpu_dest = 0;
+	u32 cpu_idx = 0;
+	u16 dest_port;
+	u32 *cpu_lookup;
+	u32 key = 0;
+
+	/* Count RX packet in map */
+	rec = bpf_map_lookup_elem(&rx_cnt, &key);
+	if (!rec)
+		return XDP_ABORTED;
+	rec->processed++;
+
+	if (!(parse_eth(eth, data_end, &eth_proto, &l3_offset)))
+		return XDP_PASS; /* Just skip */
+
+	/* Extract L4 protocol */
+	switch (eth_proto) {
+	case ETH_P_IP:
+		ip_proto = get_proto_ipv4(ctx, l3_offset);
+		break;
+	case ETH_P_IPV6:
+		ip_proto = get_proto_ipv6(ctx, l3_offset);
+		break;
+	case ETH_P_ARP:
+		cpu_idx = 0; /* ARP packet handled on separate CPU */
+		break;
+	default:
+		cpu_idx = 0;
+	}
+
+	/* Choose CPU based on L4 protocol */
+	switch (ip_proto) {
+	case IPPROTO_ICMP:
+	case IPPROTO_ICMPV6:
+		cpu_idx = 2;
+		break;
+	case IPPROTO_TCP:
+		cpu_idx = 0;
+		break;
+	case IPPROTO_UDP:
+		cpu_idx = 1;
+		/* DDoS filter UDP port 9 (pktgen) */
+		dest_port = get_dest_port_ipv4_udp(ctx, l3_offset);
+		if (dest_port == 9) {
+			if (rec)
+				rec->dropped++;
+			return XDP_DROP;
+		}
+		break;
+	default:
+		cpu_idx = 0;
+	}
+
+	cpu_lookup = bpf_map_lookup_elem(&cpus_available, &cpu_idx);
+	if (!cpu_lookup)
+		return XDP_ABORTED;
+	cpu_dest = *cpu_lookup;
+
+	if (cpu_dest >= MAX_CPUS)
+		return XDP_ABORTED;
+
+	/* Check cpu_dest is valid */
+	cpu_lookup = bpf_map_lookup_elem(&cpu_map, &cpu_dest);
+	if (!cpu_lookup) {
+		rec->issue++;
+		return XDP_DROP;
+	}
+
+	if (cpu_dest >= MAX_CPUS)
+		return XDP_ABORTED;
+
+	return bpf_redirect_map(&cpu_map, cpu_dest, 0);
+}
+
+
+char _license[] SEC("license") = "GPL";
+
+/*** Trace point code ***/
+
+/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_redirect/format
+ * Code in:                kernel/include/trace/events/xdp.h
+ */
+struct xdp_redirect_ctx {
+	u64 __pad;	// First 8 bytes are not accessible by bpf code
+	int prog_id;	//	offset:8;  size:4; signed:1;
+	u32 act;	//	offset:12; size:4; signed:0;
+	int ifindex;	//	offset:16; size:4; signed:1;
+	int err;	//	offset:20; size:4; signed:1;
+	int to_ifindex;	//	offset:24; size:4; signed:1;
+	u32 map_id;	//	offset:28; size:4; signed:0;
+	int map_index;	//	offset:32; size:4; signed:1;
+};			//	offset:36
+
+enum {
+	XDP_REDIRECT_SUCCESS = 0,
+	XDP_REDIRECT_ERROR = 1
+};
+
+static __always_inline
+int xdp_redirect_collect_stat(struct xdp_redirect_ctx *ctx)
+{
+	u32 key = XDP_REDIRECT_ERROR;
+	struct datarec *rec;
+	int err = ctx->err;
+
+	if (!err)
+		key = XDP_REDIRECT_SUCCESS;
+
+	rec = bpf_map_lookup_elem(&redirect_err_cnt, &key);
+	if (!rec)
+		return 0;
+	rec->dropped += 1;
+
+	return 0; /* Indicate event was filtered (no further processing) */
+	/*
+	 * Returning 1 here would allow e.g. a perf-record tracepoint
+	 * to see and record these events, but it doesn't work well
+	 * in practice, as stopping perf-record also unloads this
+	 * bpf_prog.  Plus, there is additional overhead of doing so.
+	 */
+}
+
+SEC("tracepoint/xdp/xdp_redirect_err")
+int trace_xdp_redirect_err(struct xdp_redirect_ctx *ctx)
+{
+	return xdp_redirect_collect_stat(ctx);
+}
+
+SEC("tracepoint/xdp/xdp_redirect_map_err")
+int trace_xdp_redirect_map_err(struct xdp_redirect_ctx *ctx)
+{
+	return xdp_redirect_collect_stat(ctx);
+}
+
+/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_exception/format
+ * Code in:                kernel/include/trace/events/xdp.h
+ */
+struct xdp_exception_ctx {
+	u64 __pad;	// First 8 bytes are not accessible by bpf code
+	int prog_id;	//	offset:8;  size:4; signed:1;
+	u32 act;	//	offset:12; size:4; signed:0;
+	int ifindex;	//	offset:16; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_exception")
+int trace_xdp_exception(struct xdp_exception_ctx *ctx)
+{
+	struct datarec *rec;
+	u32 key = 0;
+
+	rec = bpf_map_lookup_elem(&exception_cnt, &key);
+	if (!rec)
+		return 1;
+	rec->dropped += 1;
+
+	return 0;
+}
+
+/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_cpumap_enqueue/format
+ * Code in:         kernel/include/trace/events/xdp.h
+ */
+struct cpumap_enqueue_ctx {
+	u64 __pad;		// First 8 bytes are not accessible by bpf code
+	int map_id;		//	offset:8;  size:4; signed:1;
+	u32 act;		//	offset:12; size:4; signed:0;
+	int cpu;		//	offset:16; size:4; signed:1;
+	unsigned int drops;	//	offset:20; size:4; signed:0;
+	unsigned int processed;	//	offset:24; size:4; signed:0;
+	int to_cpu;		//	offset:28; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_cpumap_enqueue")
+int trace_xdp_cpumap_enqueue(struct cpumap_enqueue_ctx *ctx)
+{
+	u32 to_cpu = ctx->to_cpu;
+	struct datarec *rec;
+
+	if (to_cpu >= MAX_CPUS)
+		return 1;
+
+	rec = bpf_map_lookup_elem(&cpumap_enqueue_cnt, &to_cpu);
+	if (!rec)
+		return 0;
+	rec->processed += ctx->processed;
+	rec->dropped   += ctx->drops;
+
+	/* Detect misconfig. Redirecting to the "same" CPU makes no
+	 * sense and indicates the user of cpumap has not done proper
+	 * IRQ RXq setup.
+	 */
+	if (ctx->cpu == ctx->to_cpu)
+		rec->issue += ctx->processed;
+
+	/* Inception: It's possible to detect overload situations, via
+	 * this tracepoint.  This can be used for creating a feedback
+	 * loop to XDP, which can take appropriate actions to mitigate
+	 * this overload situation.
+	 */
+	return 0;
+}
+
+/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_cpumap_kthread/format
+ * Code in:         kernel/include/trace/events/xdp.h
+ */
+struct cpumap_kthread_ctx {
+	u64 __pad;		// First 8 bytes are not accessible by bpf code
+	int map_id;		//	offset:8;  size:4; signed:1;
+	u32 act;		//	offset:12; size:4; signed:0;
+	int cpu;		//	offset:16; size:4; signed:1;
+	unsigned int drops;	//	offset:20; size:4; signed:0;
+	unsigned int processed;	//	offset:24; size:4; signed:0;
+	int time_limit;		//	offset:28; size:4; signed:1;
+};
+
+SEC("tracepoint/xdp/xdp_cpumap_kthread")
+int trace_xdp_cpumap_kthread(struct cpumap_kthread_ctx *ctx)
+{
+	struct datarec *rec;
+	u32 key = 0;
+
+	rec = bpf_map_lookup_elem(&cpumap_kthread_cnt, &key);
+	if (!rec)
+		return 0;
+	rec->processed += ctx->processed;
+	rec->dropped   += ctx->drops;
+
+	/* Detect when time limit was exceeded, but queue was not-empty */
+	if (ctx->processed > 0 && ctx->time_limit)
+		rec->issue++;
+
+	return 0;
+}
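
The protocol-based CPU selection in xdp_prognum3_proto_separate can be
sketched in plain userspace C, outside the patch. This is only an
illustration under stated assumptions: proto_to_cpu_idx() and resolve_cpu()
are hypothetical names, and the cpus_available map is modelled as an
ordinary array rather than a BPF map.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>	/* IPPROTO_* constants */

#define MAX_CPUS 12	/* assumed to match the value used by the sample */

/* Hypothetical helper mirroring the L4 switch: ICMP -> idx 2,
 * UDP -> idx 1, TCP and everything else -> idx 0.
 */
static uint32_t proto_to_cpu_idx(uint8_t ip_proto)
{
	switch (ip_proto) {
	case IPPROTO_ICMP:
	case IPPROTO_ICMPV6:
		return 2;
	case IPPROTO_UDP:
		return 1;
	case IPPROTO_TCP:
	default:
		return 0;
	}
}

/* Hypothetical stand-in for the cpus_available lookup plus the bounds
 * check: returns the destination CPU, or -1 when the index has no
 * entry or the entry is outside the cpumap range.
 */
static int resolve_cpu(const uint32_t *cpus_available, uint32_t nr_avail,
		       uint8_t ip_proto)
{
	uint32_t idx = proto_to_cpu_idx(ip_proto);
	uint32_t cpu;

	assert(cpus_available);
	if (idx >= nr_avail)
		return -1;
	cpu = cpus_available[idx];
	if (cpu >= MAX_CPUS)
		return -1;
	return (int)cpu;
}
```

In the BPF program the two failure cases map to XDP_ABORTED; the sketch
folds them into a single -1 so it can be exercised without a kernel.
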
diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c
new file mode 100644
index 000000000000..15a25c23b195
--- /dev/null
+++ b/samples/bpf/xdp_redirect_cpu_user.c
@@ -0,0 +1,647 @@
+/* GPLv2 Copyright(c) 2017 Jesper Dangaard Brouer, Red Hat, Inc.
+ */
+static const char *__doc__ =
+	" XDP redirect with a CPU-map type \"BPF_MAP_TYPE_CPUMAP\"";
+
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <locale.h>
+#include <sys/resource.h>
+#include <getopt.h>
+#include <net/if.h>
+#include <time.h>
+
+#include <arpa/inet.h>
+#include <linux/if_link.h>
+
+#define MAX_CPUS 12 /* WARNING - sync with _kern.c */
+
+/* How many xdp_progs are defined in _kern.c */
+#define MAX_PROG 5
+
+/* Wanted to get rid of bpf_load.h and fake-"libbpf.h" (and instead
+ * use bpf/libbpf.h), but cannot as (currently) needed for XDP
+ * attaching to a device via set_link_xdp_fd()
+ */
+#include "libbpf.h"
+#include "bpf_load.h"
+
+#include "bpf_util.h"
+
+static int ifindex = -1;
+static char ifname_buf[IF_NAMESIZE];
+static char *ifname;
+
+static __u32 xdp_flags;
+
+/* Exit return codes */
+#define EXIT_OK		0
+#define EXIT_FAIL		1
+#define EXIT_FAIL_OPTION	2
+#define EXIT_FAIL_XDP		3
+#define EXIT_FAIL_BPF		4
+#define EXIT_FAIL_MEM		5
+
+static const struct option long_options[] = {
+	{"help",	no_argument,		NULL, 'h' },
+	{"dev",	required_argument,	NULL, 'd' },
+	{"skb-mode",	no_argument,		NULL, 'S' },
+	{"debug",	no_argument,		NULL, 'D' },
+	{"sec",	required_argument,	NULL, 's' },
+	{"prognum",	required_argument,	NULL, 'p' },
+	{"qsize",	required_argument,	NULL, 'q' },
+	{"cpu",	required_argument,	NULL, 'c' },
+	{"no-separators", no_argument,		NULL, 'z' },
+	{0, 0, NULL,  0 }
+};
+
+static void int_exit(int sig)
+{
+	fprintf(stderr,
+		"Interrupted: Removing XDP program on ifindex:%d device:%s\n",
+		ifindex, ifname);
+	if (ifindex > -1)
+		set_link_xdp_fd(ifindex, -1, xdp_flags);
+	exit(EXIT_OK);
+}
+
+static void usage(char *argv[])
+{
+	int i;
+
+	printf("\nDOCUMENTATION:\n%s\n", __doc__);
+	printf("\n");
+	printf(" Usage: %s (options-see-below)\n", argv[0]);
+	printf(" Listing options:\n");
+	for (i = 0; long_options[i].name != 0; i++) {
+		printf(" --%-12s", long_options[i].name);
+		if (long_options[i].flag != NULL)
+			printf(" flag (internal value:%d)",
+				*long_options[i].flag);
+		else
+			printf(" short-option: -%c",
+				long_options[i].val);
+		printf("\n");
+	}
+	printf("\n");
+}
+
+/* gettime returns a monotonic timestamp in nanoseconds.
+ * Cost: clock_gettime (ns) => 26ns (CLOCK_MONOTONIC)
+ *       clock_gettime (ns) =>  9ns (CLOCK_MONOTONIC_COARSE)
+ */
+#define NANOSEC_PER_SEC 1000000000 /* 10^9 */
+static __u64 gettime(void)
+{
+	struct timespec t;
+	int res;
+
+	res = clock_gettime(CLOCK_MONOTONIC, &t);
+	if (res < 0) {
+		fprintf(stderr, "Error with clock_gettime! (%i)\n", res);
+		exit(EXIT_FAIL);
+	}
+	return (__u64) t.tv_sec * NANOSEC_PER_SEC + t.tv_nsec;
+}
+
+/* Common stats data record shared with _kern.c */
+struct datarec {
+	__u64 processed;
+	__u64 dropped;
+	__u64 issue;
+};
+struct record {
+	__u64 timestamp;
+	struct datarec total;
+	struct datarec *cpu;
+};
+struct stats_record {
+	struct record rx_cnt;
+	struct record redir_err;
+	struct record kthread;
+	struct record exception;
+	struct record enq[MAX_CPUS];
+};
+
+static bool map_collect_percpu(int fd, __u32 key, struct record *rec)
+{
+	/* For percpu maps, userspace gets a value per possible CPU */
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	struct datarec values[nr_cpus];
+	__u64 sum_processed = 0;
+	__u64 sum_dropped = 0;
+	__u64 sum_issue = 0;
+	int i;
+
+	if ((bpf_map_lookup_elem(fd, &key, values)) != 0) {
+		fprintf(stderr,
+			"ERR: bpf_map_lookup_elem failed key:0x%X\n", key);
+		return false;
+	}
+	/* Get time as close as possible to reading map contents */
+	rec->timestamp = gettime();
+
+	/* Record and sum values from each CPU */
+	for (i = 0; i < nr_cpus; i++) {
+		rec->cpu[i].processed = values[i].processed;
+		sum_processed        += values[i].processed;
+		rec->cpu[i].dropped = values[i].dropped;
+		sum_dropped        += values[i].dropped;
+		rec->cpu[i].issue = values[i].issue;
+		sum_issue        += values[i].issue;
+	}
+	rec->total.processed = sum_processed;
+	rec->total.dropped   = sum_dropped;
+	rec->total.issue     = sum_issue;
+	return true;
+}
+
+static struct datarec *alloc_record_per_cpu(void)
+{
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	struct datarec *array;
+	size_t size;
+
+	size = sizeof(struct datarec) * nr_cpus;
+	array = malloc(size);
+	if (!array) {
+		fprintf(stderr, "Mem alloc error (nr_cpus:%u)\n", nr_cpus);
+		exit(EXIT_FAIL_MEM);
+	}
+	memset(array, 0, size);
+	return array;
+}
+
+static struct stats_record *alloc_stats_record(void)
+{
+	struct stats_record *rec;
+	int i;
+
+	rec = malloc(sizeof(*rec));
+	if (!rec) {
+		fprintf(stderr, "Mem alloc error\n");
+		exit(EXIT_FAIL_MEM);
+	}
+	memset(rec, 0, sizeof(*rec));
+	rec->rx_cnt.cpu    = alloc_record_per_cpu();
+	rec->redir_err.cpu = alloc_record_per_cpu();
+	rec->kthread.cpu   = alloc_record_per_cpu();
+	rec->exception.cpu = alloc_record_per_cpu();
+	for (i = 0; i < MAX_CPUS; i++)
+		rec->enq[i].cpu = alloc_record_per_cpu();
+
+	return rec;
+}
+
+static void free_stats_record(struct stats_record *r)
+{
+	int i;
+
+	for (i = 0; i < MAX_CPUS; i++)
+		free(r->enq[i].cpu);
+	free(r->exception.cpu);
+	free(r->kthread.cpu);
+	free(r->redir_err.cpu);
+	free(r->rx_cnt.cpu);
+	free(r);
+}
+
+static double calc_period(struct record *r, struct record *p)
+{
+	double period_ = 0;
+	__u64 period = 0;
+
+	period = r->timestamp - p->timestamp;
+	if (period > 0)
+		period_ = ((double) period / NANOSEC_PER_SEC);
+
+	return period_;
+}
+
+static __u64 calc_pps(struct datarec *r, struct datarec *p, double period_)
+{
+	__u64 packets = 0;
+	__u64 pps = 0;
+
+	if (period_ > 0) {
+		packets = r->processed - p->processed;
+		pps = packets / period_;
+	}
+	return pps;
+}
+
+static __u64 calc_drop_pps(struct datarec *r, struct datarec *p, double period_)
+{
+	__u64 packets = 0;
+	__u64 pps = 0;
+
+	if (period_ > 0) {
+		packets = r->dropped - p->dropped;
+		pps = packets / period_;
+	}
+	return pps;
+}
+
+static __u64 calc_errs_pps(struct datarec *r,
+			    struct datarec *p, double period_)
+{
+	__u64 packets = 0;
+	__u64 pps = 0;
+
+	if (period_ > 0) {
+		packets = r->issue - p->issue;
+		pps = packets / period_;
+	}
+	return pps;
+}
+
+static void stats_print(struct stats_record *stats_rec,
+			struct stats_record *stats_prev,
+			int prog_num)
+{
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	double pps = 0, drop = 0, err = 0;
+	struct record *rec, *prev;
+	int to_cpu;
+	double t;
+	int i;
+
+	/* Header */
+	printf("Running XDP/eBPF prog_num:%d\n", prog_num);
+	printf("%-15s %-7s %-14s %-11s %-9s\n",
+	       "XDP-cpumap", "CPU:to", "pps", "drop-pps", "extra-info");
+
+	/* XDP rx_cnt */
+	{
+		char *fmt_rx = "%-15s %-7d %'-14.0f %'-11.0f %'-10.0f %s\n";
+		char *fm2_rx = "%-15s %-7s %'-14.0f %'-11.0f\n";
+		char *errstr = "";
+
+		rec  = &stats_rec->rx_cnt;
+		prev = &stats_prev->rx_cnt;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps = calc_pps(r, p, t);
+			drop = calc_drop_pps(r, p, t);
+			err  = calc_errs_pps(r, p, t);
+			if (err > 0)
+				errstr = "cpu-dest/err";
+			if (pps > 0)
+				printf(fmt_rx, "XDP-RX",
+					i, pps, drop, err, errstr);
+		}
+		pps  = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop_pps(&rec->total, &prev->total, t);
+		err  = calc_errs_pps(&rec->total, &prev->total, t);
+		printf(fm2_rx, "XDP-RX", "total", pps, drop);
+	}
+
+	/* cpumap enqueue stats */
+	for (to_cpu = 0; to_cpu < MAX_CPUS; to_cpu++) {
+		char *fmt = "%-15s %3d:%-3d %'-14.0f %'-11.0f %'-10.0f %s\n";
+		char *fm2 = "%-15s %3s:%-3d %'-14.0f %'-11.0f %'-10.0f %s\n";
+		char *errstr = "";
+
+		rec  =  &stats_rec->enq[to_cpu];
+		prev = &stats_prev->enq[to_cpu];
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop_pps(r, p, t);
+			err  = calc_errs_pps(r, p, t);
+			if (err > 0)
+				errstr = "same-cpu/pps";
+			if (pps > 0)
+				printf(fmt, "cpumap-enqueue",
+				       i, to_cpu, pps, drop, err, errstr);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		if (pps > 0) {
+			drop = calc_drop_pps(&rec->total, &prev->total, t);
+			err  = calc_errs_pps(&rec->total, &prev->total, t);
+			printf(fm2, "cpumap-enqueue",
+			       "sum", to_cpu, pps, drop, err, errstr);
+		}
+	}
+
+	/* cpumap kthread stats */
+	{
+		char *fmt_k = "%-15s %-7d %'-14.0f %'-11.0f %-10.0f %s\n";
+		char *fm2_k = "%-15s %-7s %'-14.0f %'-11.0f\n";
+		char *errstr = "";
+
+		rec  = &stats_rec->kthread;
+		prev = &stats_prev->kthread;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop_pps(r, p, t);
+			err  = calc_errs_pps(r, p, t);
+			if (err > 0)
+				errstr = "time_exceed";
+			if (pps > 0)
+				printf(fmt_k, "cpumap_kthread",
+				       i, pps, drop, err, errstr);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop_pps(&rec->total, &prev->total, t);
+		printf(fm2_k, "cpumap_kthread", "total", pps, drop);
+	}
+
+	/* XDP redirect err tracepoints (very unlikely) */
+	{
+		char *fmt_err = "%-15s %-7d %'-14.0f %'-11.0f\n";
+		char *fm2_err = "%-15s %-7s %'-14.0f %'-11.0f\n";
+
+		rec  = &stats_rec->redir_err;
+		prev = &stats_prev->redir_err;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop_pps(r, p, t);
+			if (pps > 0)
+				printf(fmt_err, "redirect_err", i, pps, drop);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop_pps(&rec->total, &prev->total, t);
+		printf(fm2_err, "redirect_err", "total", pps, drop);
+	}
+
+	/* XDP general exception tracepoints */
+	{
+		char *fmt_err = "%-15s %-7d %'-14.0f %'-11.0f\n";
+		char *fm2_err = "%-15s %-7s %'-14.0f %'-11.0f\n";
+
+		rec  = &stats_rec->exception;
+		prev = &stats_prev->exception;
+		t = calc_period(rec, prev);
+		for (i = 0; i < nr_cpus; i++) {
+			struct datarec *r = &rec->cpu[i];
+			struct datarec *p = &prev->cpu[i];
+
+			pps  = calc_pps(r, p, t);
+			drop = calc_drop_pps(r, p, t);
+			if (pps > 0)
+				printf(fmt_err, "xdp_exception", i, pps, drop);
+		}
+		pps = calc_pps(&rec->total, &prev->total, t);
+		drop = calc_drop_pps(&rec->total, &prev->total, t);
+		printf(fm2_err, "xdp_exception", "total", pps, drop);
+	}
+
+	printf("\n");
+	fflush(stdout);
+}
+
+static void stats_collect(struct stats_record *rec)
+{
+	int fd, i;
+
+	fd = map_fd[1]; /* map: rx_cnt */
+	map_collect_percpu(fd, 0, &rec->rx_cnt);
+
+	fd = map_fd[2]; /* map: redirect_err_cnt */
+	map_collect_percpu(fd, 1, &rec->redir_err);
+
+	fd = map_fd[3]; /* map: cpumap_enqueue_cnt */
+	for (i = 0; i < MAX_CPUS; i++)
+		map_collect_percpu(fd, i, &rec->enq[i]);
+
+	fd = map_fd[4]; /* map: cpumap_kthread_cnt */
+	map_collect_percpu(fd, 0, &rec->kthread);
+
+	fd = map_fd[8]; /* map: exception_cnt */
+	map_collect_percpu(fd, 0, &rec->exception);
+}
+
+
+/* Pointer swap trick */
+static inline void swap(struct stats_record **a, struct stats_record **b)
+{
+	struct stats_record *tmp;
+
+	tmp = *a;
+	*a = *b;
+	*b = tmp;
+}
+
+static void stats_poll(int interval, bool use_separators, int prog_num)
+{
+	struct stats_record *record, *prev;
+
+	record = alloc_stats_record();
+	prev   = alloc_stats_record();
+	stats_collect(record);
+
+	/* Trick: pretty-print with thousands separators using %' */
+	if (use_separators)
+		setlocale(LC_NUMERIC, "en_US");
+
+	while (1) {
+		swap(&prev, &record);
+		stats_collect(record);
+		stats_print(record, prev, prog_num);
+		sleep(interval);
+	}
+
+	free_stats_record(record);
+	free_stats_record(prev);
+}
+
+static int create_cpu_entry(__u32 cpu, __u32 queue_size,
+			    __u32 avail_idx, bool new)
+{
+	__u32 curr_cpus_count = 0;
+	__u32 key = 0;
+	int ret;
+
+	/* Add a CPU entry to cpumap, as this allocates a cpu entry in
+	 * the kernel for the cpu.
+	 */
+	ret = bpf_map_update_elem(map_fd[0], &cpu, &queue_size, 0);
+	if (ret) {
+		fprintf(stderr, "Create CPU entry failed\n");
+		exit(EXIT_FAIL_BPF);
+	}
+
+	/* Inform bpf_prog's that a new CPU is available to select
+	 * from via some control maps.
+	 */
+	/* map_fd[5] = cpus_available */
+	ret = bpf_map_update_elem(map_fd[5], &avail_idx, &cpu, 0);
+	if (ret) {
+		fprintf(stderr, "Add to avail CPUs failed\n");
+		exit(EXIT_FAIL_BPF);
+	}
+
+	/* When not replacing/updating existing entry, bump the count */
+	/* map_fd[6] = cpus_count */
+	if (new) {
+		ret = bpf_map_lookup_elem(map_fd[6], &key, &curr_cpus_count);
+		if (ret) {
+			fprintf(stderr, "Failed reading curr cpus_count\n");
+			exit(EXIT_FAIL_BPF);
+		}
+		curr_cpus_count++;
+		ret = bpf_map_update_elem(map_fd[6], &key, &curr_cpus_count, 0);
+		if (ret) {
+			fprintf(stderr, "Failed write curr cpus_count\n");
+			exit(EXIT_FAIL_BPF);
+		}
+	}
+	/* map_fd[7] = cpus_iterator */
+	printf("%s CPU:%u as idx:%u cpus_count:%u\n",
+	       new ? "Add-new":"Replace", cpu, avail_idx, curr_cpus_count);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rlimit r = {10 * 1024*1024, RLIM_INFINITY};
+	bool use_separators = true;
+	char filename[256];
+	bool debug = false;
+	int added_cpus = 0;
+	int longindex = 0;
+	int interval = 2;
+	int prog_num = 0;
+	int add_cpu = -1;
+	__u32 qsize;
+	int opt;
+
+	/* Notice: choosing the queue size is very important with the
+	 * ixgbe driver, because its driver page recycling trick is
+	 * dependent on pages being returned quickly.  The number of
+	 * outstanding packets in the system must be less than 2x
+	 * RX-ring size.
+	 */
+	qsize = 128+64;
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+		perror("setrlimit(RLIMIT_MEMLOCK)");
+		return 1;
+	}
+
+	if (load_bpf_file(filename)) {
+		fprintf(stderr, "ERR in load_bpf_file(): %s", bpf_log_buf);
+		return EXIT_FAIL;
+	}
+
+	if (!prog_fd[0]) {
+		fprintf(stderr, "ERR: load_bpf_file: %s\n", strerror(errno));
+		return EXIT_FAIL;
+	}
+
+	/* Parse commands line args */
+	while ((opt = getopt_long(argc, argv, "hSd:s:p:q:c:Dz",
+				  long_options, &longindex)) != -1) {
+		switch (opt) {
+		case 'd':
+			if (strlen(optarg) >= IF_NAMESIZE) {
+				fprintf(stderr, "ERR: --dev name too long\n");
+				goto error;
+			}
+			ifname = (char *)&ifname_buf;
+			strncpy(ifname, optarg, IF_NAMESIZE);
+			ifindex = if_nametoindex(ifname);
+			if (ifindex == 0) {
+				fprintf(stderr,
+					"ERR: --dev name unknown err(%d):%s\n",
+					errno, strerror(errno));
+				goto error;
+			}
+			break;
+		case 's':
+			interval = atoi(optarg);
+			break;
+		case 'S':
+			xdp_flags |= XDP_FLAGS_SKB_MODE;
+			break;
+		case 'D':
+			debug = true;
+			break;
+		case 'z':
+			use_separators = false;
+			break;
+		case 'p':
+			/* Selecting eBPF prog to load */
+			prog_num = atoi(optarg);
+			if (prog_num < 0 || prog_num >= MAX_PROG) {
+				fprintf(stderr,
+					"--prognum too large err(%d):%s\n",
+					errno, strerror(errno));
+				goto error;
+			}
+			break;
+		case 'c':
+			/* Add multiple CPUs */
+			add_cpu = strtoul(optarg, NULL, 0);
+			if (add_cpu >= MAX_CPUS) {
+				fprintf(stderr,
+				"--cpu nr too large for cpumap err(%d):%s\n",
+					errno, strerror(errno));
+				goto error;
+			}
+			create_cpu_entry(add_cpu, qsize, added_cpus, true);
+			added_cpus++;
+			break;
+		case 'q':
+			qsize = atoi(optarg);
+			break;
+		case 'h':
+		error:
+		default:
+			usage(argv);
+			return EXIT_FAIL_OPTION;
+		}
+	}
+	/* Required option */
+	if (ifindex == -1) {
+		fprintf(stderr, "ERR: required option --dev missing\n");
+		usage(argv);
+		return EXIT_FAIL_OPTION;
+	}
+	/* Required option */
+	if (add_cpu == -1) {
+		fprintf(stderr, "ERR: required option --cpu missing\n");
+		fprintf(stderr, " Specify multiple --cpu option to add more\n");
+		usage(argv);
+		return EXIT_FAIL_OPTION;
+	}
+
+	/* Remove XDP program when program is interrupted */
+	signal(SIGINT, int_exit);
+
+	if (set_link_xdp_fd(ifindex, prog_fd[prog_num], xdp_flags) < 0) {
+		fprintf(stderr, "link set xdp fd failed\n");
+		return EXIT_FAIL_XDP;
+	}
+
+	if (debug) {
+		printf("Debug-mode reading trace pipe (fix #define DEBUG)\n");
+		read_trace_pipe();
+	}
+
+	stats_poll(interval, use_separators, prog_num);
+	return EXIT_OK;
+}
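
The rate calculation done by calc_period() and calc_pps() can be sketched
as a standalone example: take two counter snapshots, divide the counter
delta by the elapsed time. The struct snapshot type and both function names
are hypothetical and not part of the patch. Unlike calc_period(), this
sketch also treats a timestamp that went backwards as an unusable period.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSEC_PER_SEC 1000000000ULL /* 10^9 */

/* Hypothetical snapshot: one counter plus the time it was read. */
struct snapshot {
	uint64_t timestamp_ns;	/* monotonic clock, nanoseconds */
	uint64_t processed;	/* packets counted so far */
};

/* Seconds elapsed between two snapshots, 0 if the clock did not advance. */
static double snapshot_period(const struct snapshot *now,
			      const struct snapshot *prev)
{
	assert(now && prev);
	if (now->timestamp_ns <= prev->timestamp_ns)
		return 0;
	return (double)(now->timestamp_ns - prev->timestamp_ns) /
	       NANOSEC_PER_SEC;
}

/* Packets per second over the interval, 0 when the period is unusable. */
static uint64_t snapshot_pps(const struct snapshot *now,
			     const struct snapshot *prev)
{
	double period = snapshot_period(now, prev);

	if (period <= 0)
		return 0;
	return (uint64_t)((now->processed - prev->processed) / period);
}
```

Taking the timestamp right after reading the map, as map_collect_percpu()
does, keeps the period aligned with the counter values it is paired with.
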
