Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net: qualcomm: emac: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:01 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, timur, yang.wei9
In-Reply-To: <1549986597-4837-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:49:57 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in emac_mac_tx_process() when
> skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net] net: moxa: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:01 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, keescook, yang.wei9
In-Reply-To: <1549986960-5031-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:56:00 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in moxart_tx_finished() when
> skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] net: sis: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:02 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, romieu, venza, yang.wei9
In-Reply-To: <1549987144-5333-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:59:04 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called when skb xmit done. It makes
> drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net] net: macb: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:02 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, nicolas.ferre, yang.wei9
In-Reply-To: <1549987202-5393-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 00:00:02 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in at91ether_interrupt() when
> skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] net: ixp4xx_eth: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:02 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, khalasa, yang.wei9
In-Reply-To: <1549987271-5449-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Wed, 13 Feb 2019 00:01:11 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in eth_txdone_irq() when skb
> xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.

^ permalink raw reply

* Re: [PATCH] qed: fix indentation issue with statements in an if-block
From: David Miller @ 2019-02-14  5:04 UTC (permalink / raw)
  To: colin.king
  Cc: aelior, GR-everest-linux-l2, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20190212160153.12432-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Tue, 12 Feb 2019 16:01:53 +0000

> From: Colin Ian King <colin.king@canonical.com>
> 
> There are some statements in an if-block that are not correctly
> indented. Fix these.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH] qlge: fix some indentation issues
From: David Miller @ 2019-02-14  5:04 UTC (permalink / raw)
  To: colin.king
  Cc: manishc, GR-Linux-NIC-Dev, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20190212160807.12807-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Tue, 12 Feb 2019 16:08:07 +0000

> From: Colin Ian King <colin.king@canonical.com>
> 
> There are some statements that are indented incorrectly. Fix these.
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net] net: fix possible overflow in __sk_mem_raise_allocated()
From: David Miller @ 2019-02-14  5:05 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <20190212202627.184863-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue, 12 Feb 2019 12:26:27 -0800

> With many active TCP sockets, fat TCP sockets could fool
> __sk_mem_raise_allocated() thanks to an overflow.
> 
> They would increase their share of the memory, instead
> of decreasing it.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next] net: sched: flower: only return error from hw offload if skip_sw
From: David Miller @ 2019-02-14  5:07 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, pablo
In-Reply-To: <20190212213906.9368-1-vladbu@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>
Date: Tue, 12 Feb 2019 23:39:06 +0200

> Recently introduced tc_setup_flow_action() can fail when parsing tcf_exts
> on some unsupported action commands. However, this should not affect the
> case when user did not explicitly request hw offload by setting skip_sw
> flag. Modify tc_setup_flow_action() callers to only propagate the error if
> skip_sw flag is set for filter that is being offloaded, and set extack
> error message in that case.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Fixes: 3a7b68617de7 ("cls_api: add translator to flow_action representation")

Applied, thanks Vlad.

^ permalink raw reply

* Re: [PATCH net-next 1/1] flow_offload: fix block stats
From: David Miller @ 2019-02-14  5:08 UTC (permalink / raw)
  To: john.hurley; +Cc: jiri, netdev, pablo, oss-drivers
In-Reply-To: <1550017432-26306-1-git-send-email-john.hurley@netronome.com>

From: John Hurley <john.hurley@netronome.com>
Date: Wed, 13 Feb 2019 00:23:52 +0000

> With the introduction of flow_stats_update(), drivers now update the stats
> fields of the passed tc_cls_flower_offload struct, rather than call
> tcf_exts_stats_update() directly to update the stats of offloaded TC
> flower rules. However, if multiple qdiscs are registered to a TC shared
> block and a flower rule is applied, then, when getting stats for the rule,
> multiple callbacks may be made.
> 
> Take this into consideration by modifying flow_stats_update to gather the
> stats from all callbacks. Currently, the values in tc_cls_flower_offload
> only account for the last stats callback in the list.
> 
> Fixes: 3b1903ef97c0 ("flow_offload: add statistics retrieval infrastructure and use it")
> Signed-off-by: John Hurley <john.hurley@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH bpf-next v11 0/7] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-02-14  5:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Ahern, Alexei Starovoitov, Daniel Borkmann, netdev,
	Peter Oskolkov, Willem de Bruijn
In-Reply-To: <20190214042127.azcsxbrpzhgumiwa@ast-mbp>

On Wed, Feb 13, 2019 at 8:21 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Feb 13, 2019 at 08:44:51PM -0700, David Ahern wrote:
> > On 2/13/19 7:39 PM, Alexei Starovoitov wrote:
> > > On Wed, Feb 13, 2019 at 05:46:26PM -0700, David Ahern wrote:
> > >> On 2/13/19 12:53 PM, Peter Oskolkov wrote:
> > >>> This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> > >>> BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> > >>> and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> > >>> to packets (e.g. IP/GRE, GUE, IPIP).
> > >>>
> > >>> This is useful when thousands of different short-lived flows should be
> > >>> encapped, each with different and dynamically determined destination.
> > >>> Although lwtunnels can be used in some of these scenarios, the ability
> > >>> to dynamically generate encap headers adds more flexibility, e.g.
> > >>> when routing depends on the state of the host (reflected in global bpf
> > >>> maps).
> > >>>
> > >>
> > >>
> > >> For the set:
> > >> Reviewed-by: David Ahern <dsahern@gmail.com>
> > >
> > > Applied. Thanks everyone!
> > >
> >
> > Looks like a cleanup round is needed.
> >
> > I changed the routes to fail with unreachable:
> >
> > @@ -179,16 +175,16 @@
> >       ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local
> > ${IPv4_GRE} ttl 255
> >       ip -netns ${NS3} link set gre_dev up
> >       ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
> > -     ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
> > -     ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
> > +     ip -netns ${NS1} route add unreachable ${IPv4_GRE}/32
> > +     ip -netns ${NS2} route add unreachable ${IPv4_GRE}/32
> >
> >
> >       # configure IPv6 GRE device in NS3, and a route to it via the "bottom"
> > route
> >       ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote
> > ${IPv6_1} local ${IPv6_GRE} ttl 255
> >       ip -netns ${NS3} link set gre6_dev up
> >       ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
> > -     ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
> > -     ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
> > +     ip -netns ${NS1} -6 route add unreachable ${IPv6_GRE}/128
> > +     ip -netns ${NS2} -6 route add unreachable ${IPv6_GRE}/128
> >
> >       # rp_filter gets confused by what these tests are doing, so disable it
> >       ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
> > @@ -220,7 +216,6 @@
> >
> >
> > and then removed all of the set -e and exit 1's in the script (really
> > should let all of the tests run versus bailing on the first failure).
> >
> > With kmemleak enabled I see a lot of suspected memory leaks - some may
> > not be related to this change but it is triggering the suspected leak:
>
> argh. Thanks a lot for catching it.
> Let's figure out the fix quickly.

Reproduced. Looking.

> If it's too intrusive we can revert and reapply.
> I'm not going to send a pull-req to Dave with a known issue like this.
>

^ permalink raw reply

* Re: [PATCH 2/2] mac80211: Free mpath object when rhashtable insertion fails
From: Dan Carpenter @ 2019-02-14  5:52 UTC (permalink / raw)
  To: kbuild, Herbert Xu
  Cc: kbuild-all, David Miller, johannes, linux-wireless, netdev, j,
	tgraf, johannes.berg
In-Reply-To: <E1gtmu6-0001Vl-O7@gondobar>

Hi Herbert,

url:    https://github.com/0day-ci/linux/commits/Herbert-Xu/mac80211-Fix-incorrect-usage-of-rhashtable-walk-API/20190213-181325
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git master

smatch warnings:
net/mac80211/mesh_pathtbl.c:439 mesh_path_add() warn: passing zero to 'ERR_PTR'

# https://github.com/0day-ci/linux/commit/a0886e834aacf883ceaf0c34c842c4cdb4d318fd
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout a0886e834aacf883ceaf0c34c842c4cdb4d318fd
vim +/ERR_PTR +439 net/mac80211/mesh_pathtbl.c

b15dc38b9 Bob Copeland         2016-02-28  390  
eb2b9311f Luis Carlos Cobo     2008-02-23  391  /**
eb2b9311f Luis Carlos Cobo     2008-02-23  392   * mesh_path_add - allocate and add a new path to the mesh path table
bf7cd94dc Johannes Berg        2013-02-15  393   * @dst: destination address of the path (ETH_ALEN length)
f698d856f Jasper Bryant-Greene 2008-08-03  394   * @sdata: local subif
eb2b9311f Luis Carlos Cobo     2008-02-23  395   *
af901ca18 André Goddard Rosa   2009-11-14  396   * Returns: 0 on success
eb2b9311f Luis Carlos Cobo     2008-02-23  397   *
eb2b9311f Luis Carlos Cobo     2008-02-23  398   * State: the initial state of the new path is set to 0
eb2b9311f Luis Carlos Cobo     2008-02-23  399   */
ae76eef02 Bob Copeland         2013-03-29  400  struct mesh_path *mesh_path_add(struct ieee80211_sub_if_data *sdata,
ae76eef02 Bob Copeland         2013-03-29  401  				const u8 *dst)
eb2b9311f Luis Carlos Cobo     2008-02-23  402  {
349eb8cf4 Johannes Berg        2011-05-14  403  	struct mesh_table *tbl;
eb2b9311f Luis Carlos Cobo     2008-02-23  404  	struct mesh_path *mpath, *new_mpath;
60854fd94 Bob Copeland         2016-03-02  405  	int ret;
eb2b9311f Luis Carlos Cobo     2008-02-23  406  
b203ca391 Joe Perches          2012-05-08  407  	if (ether_addr_equal(dst, sdata->vif.addr))
eb2b9311f Luis Carlos Cobo     2008-02-23  408  		/* never add ourselves as neighbours */
ae76eef02 Bob Copeland         2013-03-29  409  		return ERR_PTR(-ENOTSUPP);
eb2b9311f Luis Carlos Cobo     2008-02-23  410  
eb2b9311f Luis Carlos Cobo     2008-02-23  411  	if (is_multicast_ether_addr(dst))
ae76eef02 Bob Copeland         2013-03-29  412  		return ERR_PTR(-ENOTSUPP);
eb2b9311f Luis Carlos Cobo     2008-02-23  413  
472dbc45d Johannes Berg        2008-09-11  414  	if (atomic_add_unless(&sdata->u.mesh.mpaths, 1, MESH_MAX_MPATHS) == 0)
ae76eef02 Bob Copeland         2013-03-29  415  		return ERR_PTR(-ENOSPC);
ae76eef02 Bob Copeland         2013-03-29  416  
b15dc38b9 Bob Copeland         2016-02-28  417  	new_mpath = mesh_path_new(sdata, dst, GFP_ATOMIC);
402d7752e Pavel Emelyanov      2008-05-06  418  	if (!new_mpath)
60854fd94 Bob Copeland         2016-03-02  419  		return ERR_PTR(-ENOMEM);
402d7752e Pavel Emelyanov      2008-05-06  420  
60854fd94 Bob Copeland         2016-03-02  421  	tbl = sdata->u.mesh.mesh_paths;
60854fd94 Bob Copeland         2016-03-02  422  	do {
60854fd94 Bob Copeland         2016-03-02  423  		ret = rhashtable_lookup_insert_fast(&tbl->rhead,
60854fd94 Bob Copeland         2016-03-02  424  						    &new_mpath->rhash,
60854fd94 Bob Copeland         2016-03-02  425  						    mesh_rht_params);
f84e71a94 Pavel Emelyanov      2008-05-06  426  
60854fd94 Bob Copeland         2016-03-02  427  		if (ret == -EEXIST)
60854fd94 Bob Copeland         2016-03-02  428  			mpath = rhashtable_lookup_fast(&tbl->rhead,
60854fd94 Bob Copeland         2016-03-02  429  						       dst,
60854fd94 Bob Copeland         2016-03-02  430  						       mesh_rht_params);
05b0152f1 Herbert Xu           2019-02-13  431  		else if (!ret)
05b0152f1 Herbert Xu           2019-02-13  432  			hlist_add_head(&new_mpath->walk_list, &tbl->walk_head);
60854fd94 Bob Copeland         2016-03-02  433  	} while (unlikely(ret == -EEXIST && !mpath));
f5ea9120b Johannes Berg        2009-08-07  434  
a0886e834 Herbert Xu           2019-02-13  435  	if (ret)
a0886e834 Herbert Xu           2019-02-13  436  		kfree(new_mpath);
a0886e834 Herbert Xu           2019-02-13  437  
a0886e834 Herbert Xu           2019-02-13  438  	if (ret != -EEXIST)
60854fd94 Bob Copeland         2016-03-02 @439  		return ERR_PTR(ret);

                                                         if (ret && ret != -EEXIST) ?

ae76eef02 Bob Copeland         2013-03-29  440  
60854fd94 Bob Copeland         2016-03-02  441  	/* At this point either new_mpath was added, or we found a
60854fd94 Bob Copeland         2016-03-02  442  	 * matching entry already in the table; in the latter case
60854fd94 Bob Copeland         2016-03-02  443  	 * free the unnecessary new entry.
60854fd94 Bob Copeland         2016-03-02  444  	 */
a0886e834 Herbert Xu           2019-02-13  445  	if (ret == -EEXIST)
60854fd94 Bob Copeland         2016-03-02  446  		new_mpath = mpath;
60854fd94 Bob Copeland         2016-03-02  447  	sdata->u.mesh.mesh_paths_generation++;
60854fd94 Bob Copeland         2016-03-02  448  	return new_mpath;
18889231e Javier Cardona       2009-08-10  449  }
eb2b9311f Luis Carlos Cobo     2008-02-23  450  

:::::: The code at line 439 was first introduced by commit
:::::: 60854fd94573f0d3b80b55b40cf0140a0430f3ab mac80211: mesh: convert path table to rhashtable

:::::: TO: Bob Copeland <me@bobcopeland.com>
:::::: CC: Johannes Berg <johannes.berg@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
From: Stanislav Fomichev @ 2019-02-14  5:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Willem de Bruijn, Stanislav Fomichev, Network Development,
	David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn
In-Reply-To: <20190214043931.hdpddeom57rnnlhm@ast-mbp>

On 02/13, Alexei Starovoitov wrote:
> On Tue, Feb 12, 2019 at 09:02:32AM -0800, Stanislav Fomichev wrote:
> > On 02/05, Stanislav Fomichev wrote:
> > > On 02/05, Alexei Starovoitov wrote:
> > > > On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > > > > > C-based flow dissector in this case.
> > > > > > > > > > >
> > > > > > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > > > > > work in the skb-less case.
> > > > > > > > > > >
> > > > > > > > > > > The series goes like this:
> > > > > > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > > > > > >    initialize temporary skb
> > > > > > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > > > > > >    associated with an skb
> > > > > > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > > > > > >    plumb through the callers
> > > > > > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > > > > > 
> > > > > > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > > > > > later.
> > > > > > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > > > > > 
> > > > > > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > > > > > on-stack skb).
> > > > > > > > 
> > > > > > > > imo extra 256 byte memset for every packet is non starter.
> > > > > > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > > > > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > > > > > about having multiple percpu for irq/softirq/process contexts.
> > > > > > > Any concerns with that approach?
> > > > > > > Any other possible concerns with the overall series?
> > > > > > 
> > > > > > I'm missing why the whole thing is needed.
> > > > > > You're saying:
> > > > > > " make BPF flow dissector programs be able to work in the skb-less case".
> > > > > > What does it mean specifically?
> > > > > > The only non-skb case is XDP.
> > > > > > Are you saying you want flow_dissector prog to be run in XDP?
> > > > > eth_get_headlen that drivers call on RX path on a chunk of data to
> > > > > guesstimate the length of the headers calls flow dissector without an skb
> > > > > (__skb_flow_dissect was a weird interface where it accepts skb or
> > > > > data+len). Right now, there is no way to trigger BPF flow dissector
> > > > > for this case (we don't have an skb to get associated namespace/etc/etc).
> > > > > The patch series tries to fix that to make sure that we always trigger
> > > > > BPF program if it's attached to a device's namespace.
> > > > 
> > > > then why not to create flow_dissector prog type that works without skb?
> > > > Why do you need to fake an skb?
> > > > XDP progs work just fine without it.
> > > What's the advantage of having another prog type? In this case we would have
> > > to write the same flow dissector program twice: first time against __skb_buff
> > > interface, second time against xdp_md.
> > > By using fake skb, we make the same flow dissector __sk_buff BPF program
> > > work in both contexts without a rewrite to an xdp interface (I don't
> > > think users should care whether flow dissector was called form "xdp" vs skb
> > > context; and we're sort of stuck with __sk_buff interface already).
> > Should I follow up with v2 where I address memset(,,256) for each packet?
> > Or you still have some questions/doubts/suggestions regarding the problem
> > I'm trying to solve?
> 
> sorry for delay. I'm still thinking what is the path forward here.
No worries, thanks for sticking with me :-)

> That 'stuck with __sk_buff' is what bothers me.
I might have use the wrong word here. I don't think there is another
option to be honest. Using __sk_buff makes flow dissector programs work
with fragmented packets; if we were to use xdp_meta instead, it would
not work in this case. Another point here: the fact that
eth_get_headlen calls flow dissector on a chunk of data instead of skb
feel like an implementation detail. Imo, application writers should not
care about this context; coding against __sk_buff feels like the best we
can do.

> It's an indication that api wasn't thought through if first thing
> it needs is this fake skb hack.
> If bpf_flow.c is a realistic example of such flow dissector prog
> it means that real skb fields are accessed.
> In particular skb->vlan_proto, skb->protocol.
I do manually set skb->protocol to eth->h_proto in my proposal. This is later
correctly handled by bpf_flow.c: parse_eth_proto() is called on skb->protocol
and we correctly handle bpf_htons(ETH_P_8021Q) there. So existing
bpf_flow.c works as expected.

Related: I was also thinking about moving this check out of bpf_flow.c
and pass n_proto directly. I don't see why bpf_flow_keys "export"
n_proto, we know it when we call flow dissector, there is no need to
test for skb->vlan_present in the bpf program.

> These fields in case of 'fake skb' will not be set, since eth_type_trans()
> isn't called yet.
Just to reiterate, I do set skb->protocol manually and
skb->vlan_preset == false (and bpf_flow.c handles this case).
We can also set correct vlan_preset with an additional check, but I
decided to not do it since bpf_flow.c handles that.

> So either flow_dissector needs a real __sk_buff and all of its fields
> should be real or it's a different flow_dissector prog type that
> needs ctx->data, ctx->data_end, ctx->flow_keys only.
> Either way going with fake skb is incorrect, since bpf_flow.c example
> will be broken and for program writers it will be hard to figure why
> it's broken.
It's fake in a sense that it's stack allocated in this particular case.
To the bpf_flow.c it looks like a real __sk_buff. I don't see why
bpf_flow.c example might be broken in this case, can you elaborate?

The goal of this patch series was to essentially make this skb/no-skb
context transparent to the bpf_flow.c (i.e. no changes from the user
flow programs). Adding another flow dissector for eth_get_headlen case
also seems as a no go.

^ permalink raw reply

* Re: [net-next PATCH V3 0/3] Fix page_pool API and dma address storage
From: David Miller @ 2019-02-14  6:02 UTC (permalink / raw)
  To: brouer
  Cc: netdev, linux-mm, toke, ilias.apalodimas, willy, saeedm,
	alexander.duyck, akpm, mgorman, tariqt
In-Reply-To: <155002290134.5597.6544755780651689517.stgit@firesoul>

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Wed, 13 Feb 2019 02:55:34 +0100

> As pointed out by David Miller in [1] the current page_pool implementation
> stores dma_addr_t in page->private. This won't work on 32-bit platforms with
> 64-bit DMA addresses since the page->private is an unsigned long and the
> dma_addr_t a u64.
> 
> Since no driver is yet using the DMA mapping capabilities of the API let's
> fix this by storing the information in 'struct page' and use that to store
> and retrieve DMA addresses from network drivers.
> 
> As long as the addresses returned from dma_map_page() are aligned the first
> bit, used by the compound pages code should not be set.
> 
> Ilias tested the first two patches on Espressobin driver mvneta, for which
> we have patches for using the DMA API of page_pool.
> 
> [1]: https://lore.kernel.org/netdev/20181207.230655.1261252486319967024.davem@davemloft.net/
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Series applied, thanks Jesper.

^ permalink raw reply

* Re: [PATCH bpf-next v11 7/7] selftests: bpf: add test_lwt_ip_encap selftest
From: Stanislav Fomichev @ 2019-02-14  6:03 UTC (permalink / raw)
  To: Peter Oskolkov
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, Peter Oskolkov,
	David Ahern, Willem de Bruijn
In-Reply-To: <20190213195341.184969-8-posk@google.com>

On 02/13, Peter Oskolkov wrote:
> This patch adds a bpf self-test to cover BPF_LWT_ENCAP_IP mode
> in bpf_lwt_push_encap.
> 
> Covered:
> - encapping in LWT_IN and LWT_XMIT
> - IPv4 and IPv6
> 
> A follow-up patch will add GSO and VRF-enabled tests.
> 
> Signed-off-by: Peter Oskolkov <posk@google.com>
> ---
>  tools/testing/selftests/bpf/Makefile          |   3 +-
>  .../selftests/bpf/progs/test_lwt_ip_encap.c   |  85 +++++
>  .../selftests/bpf/test_lwt_ip_encap.sh        | 311 ++++++++++++++++++
>  3 files changed, 398 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
>  create mode 100755 tools/testing/selftests/bpf/test_lwt_ip_encap.sh
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index c3edf47da05d..ccffaa0a0787 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -50,7 +50,8 @@ TEST_PROGS := test_kmod.sh \
>  	test_lirc_mode2.sh \
>  	test_skb_cgroup_id.sh \
>  	test_flow_dissector.sh \
> -	test_xdp_vlan.sh
> +	test_xdp_vlan.sh \
> +	test_lwt_ip_encap.sh
>  
>  TEST_PROGS_EXTENDED := with_addr.sh \
>  	with_tunnels.sh \
> diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
> new file mode 100644
> index 000000000000..c957d6dfe6d7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
> @@ -0,0 +1,85 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <stddef.h>
> +#include <string.h>
> +#include <linux/bpf.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include "bpf_helpers.h"
> +#include "bpf_endian.h"
> +
> +struct grehdr {
> +	__be16 flags;
> +	__be16 protocol;
> +};
> +
> +SEC("encap_gre")
> +int bpf_lwt_encap_gre(struct __sk_buff *skb)
> +{
> +	struct encap_hdr {
> +		struct iphdr iph;
> +		struct grehdr greh;
> +	} hdr;
> +	int err;
> +
> +	memset(&hdr, 0, sizeof(struct encap_hdr));
> +
> +	hdr.iph.ihl = 5;
> +	hdr.iph.version = 4;
> +	hdr.iph.ttl = 0x40;
> +	hdr.iph.protocol = 47;  /* IPPROTO_GRE */

[...]

> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +	hdr.iph.saddr = 0x640110ac;  /* 172.16.1.100 */
> +	hdr.iph.daddr = 0x641010ac;  /* 172.16.16.100 */
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> +	hdr.iph.saddr = 0xac100164;  /* 172.16.1.100 */
> +	hdr.iph.daddr = 0xac101064;  /* 172.16.16.100 */
> +#else
> +#error "Fix your compiler's __BYTE_ORDER__?!"
> +#endif

Nit, why not just:

	hdr.iph.saddr = bpf_htonl(0xac100164);  /* 172.16.1.100 */
	hdr.iph.daddr = bpf_htonl(0xac101064);  /* 172.16.16.100 */

?

> +	hdr.iph.tot_len = bpf_htons(skb->len + sizeof(struct encap_hdr));
> +
> +	hdr.greh.protocol = skb->protocol;
> +
> +	err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr,
> +				 sizeof(struct encap_hdr));
> +	if (err)
> +		return BPF_DROP;
> +
> +	return BPF_LWT_REROUTE;
> +}
> +
> +SEC("encap_gre6")
> +int bpf_lwt_encap_gre6(struct __sk_buff *skb)
> +{
> +	struct encap_hdr {
> +		struct ipv6hdr ip6hdr;
> +		struct grehdr greh;
> +	} hdr;
> +	int err;
> +
> +	memset(&hdr, 0, sizeof(struct encap_hdr));
> +
> +	hdr.ip6hdr.version = 6;
> +	hdr.ip6hdr.payload_len = bpf_htons(skb->len + sizeof(struct grehdr));
> +	hdr.ip6hdr.nexthdr = 47;  /* IPPROTO_GRE */
> +	hdr.ip6hdr.hop_limit = 0x40;
> +	/* fb01::1 */
> +	hdr.ip6hdr.saddr.s6_addr[0] = 0xfb;
> +	hdr.ip6hdr.saddr.s6_addr[1] = 1;
> +	hdr.ip6hdr.saddr.s6_addr[15] = 1;
> +	/* fb10::1 */
> +	hdr.ip6hdr.daddr.s6_addr[0] = 0xfb;
> +	hdr.ip6hdr.daddr.s6_addr[1] = 0x10;
> +	hdr.ip6hdr.daddr.s6_addr[15] = 1;
> +
> +	hdr.greh.protocol = skb->protocol;
> +
> +	err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr,
> +				 sizeof(struct encap_hdr));
> +	if (err)
> +		return BPF_DROP;
> +
> +	return BPF_LWT_REROUTE;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh
> new file mode 100755
> index 000000000000..4ca714e23ab0
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh
> @@ -0,0 +1,311 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Setup/topology:
> +#
> +#    NS1             NS2             NS3
> +#   veth1 <---> veth2   veth3 <---> veth4 (the top route)
> +#   veth5 <---> veth6   veth7 <---> veth8 (the bottom route)
> +#
> +#   each vethN gets IPv[4|6]_N address
> +#
> +#   IPv*_SRC = IPv*_1
> +#   IPv*_DST = IPv*_4
> +#
> +#   all tests test pings from IPv*_SRC to IPv*_DST
> +#
> +#   by default, routes are configured to allow packets to go
> +#   IP*_1 <=> IP*_2 <=> IP*_3 <=> IP*_4 (the top route)
> +#
> +#   a GRE device is installed in NS3 with IPv*_GRE, and
> +#   NS1/NS2 are configured to route packets to IPv*_GRE via IP*_8
> +#   (the bottom route)
> +#
> +# Tests:
> +#
> +#   1. routes NS2->IPv*_DST are brought down, so the only way a ping
> +#      from IP*_SRC to IP*_DST can work is via IPv*_GRE
> +#
> +#   2a. in an egress test, a bpf LWT_XMIT program is installed on veth1
> +#       that encaps the packets with an IP/GRE header to route to IPv*_GRE
> +#
> +#       ping: SRC->[encap at veth1:egress]->GRE:decap->DST
> +#       ping replies go DST->SRC directly
> +#
> +#   2b. in an ingress test, a bpf LWT_IN program is installed on veth2
> +#       that encaps the packets with an IP/GRE header to route to IPv*_GRE
> +#
> +#       ping: SRC->[encap at veth2:ingress]->GRE:decap->DST
> +#       ping replies go DST->SRC directly
> +
> +set -e  # exit on error
> +
> +if [[ $EUID -ne 0 ]]; then
> +	echo "This script must be run as root"
> +	echo "FAIL"
> +	exit 1
> +fi
> +
> +readonly NS1="ns1-$(mktemp -u XXXXXX)"
> +readonly NS2="ns2-$(mktemp -u XXXXXX)"
> +readonly NS3="ns3-$(mktemp -u XXXXXX)"
> +
> +readonly IPv4_1="172.16.1.100"
> +readonly IPv4_2="172.16.2.100"
> +readonly IPv4_3="172.16.3.100"
> +readonly IPv4_4="172.16.4.100"
> +readonly IPv4_5="172.16.5.100"
> +readonly IPv4_6="172.16.6.100"
> +readonly IPv4_7="172.16.7.100"
> +readonly IPv4_8="172.16.8.100"
> +readonly IPv4_GRE="172.16.16.100"
> +
> +readonly IPv4_SRC=$IPv4_1
> +readonly IPv4_DST=$IPv4_4
> +
> +readonly IPv6_1="fb01::1"
> +readonly IPv6_2="fb02::1"
> +readonly IPv6_3="fb03::1"
> +readonly IPv6_4="fb04::1"
> +readonly IPv6_5="fb05::1"
> +readonly IPv6_6="fb06::1"
> +readonly IPv6_7="fb07::1"
> +readonly IPv6_8="fb08::1"
> +readonly IPv6_GRE="fb10::1"
> +
> +readonly IPv6_SRC=$IPv6_1
> +readonly IPv6_DST=$IPv6_4
> +
> +setup() {
> +set -e  # exit on error
> +	# create devices and namespaces
> +	ip netns add "${NS1}"
> +	ip netns add "${NS2}"
> +	ip netns add "${NS3}"
> +
> +	ip link add veth1 type veth peer name veth2
> +	ip link add veth3 type veth peer name veth4
> +	ip link add veth5 type veth peer name veth6
> +	ip link add veth7 type veth peer name veth8
> +
> +	ip netns exec ${NS2} sysctl -wq net.ipv4.ip_forward=1
> +	ip netns exec ${NS2} sysctl -wq net.ipv6.conf.all.forwarding=1
> +
> +	ip link set veth1 netns ${NS1}
> +	ip link set veth2 netns ${NS2}
> +	ip link set veth3 netns ${NS2}
> +	ip link set veth4 netns ${NS3}
> +	ip link set veth5 netns ${NS1}
> +	ip link set veth6 netns ${NS2}
> +	ip link set veth7 netns ${NS2}
> +	ip link set veth8 netns ${NS3}
> +
> +	# configure addesses: the top route (1-2-3-4)
> +	ip -netns ${NS1}    addr add ${IPv4_1}/24  dev veth1
> +	ip -netns ${NS2}    addr add ${IPv4_2}/24  dev veth2
> +	ip -netns ${NS2}    addr add ${IPv4_3}/24  dev veth3
> +	ip -netns ${NS3}    addr add ${IPv4_4}/24  dev veth4
> +	ip -netns ${NS1} -6 addr add ${IPv6_1}/128 nodad dev veth1
> +	ip -netns ${NS2} -6 addr add ${IPv6_2}/128 nodad dev veth2
> +	ip -netns ${NS2} -6 addr add ${IPv6_3}/128 nodad dev veth3
> +	ip -netns ${NS3} -6 addr add ${IPv6_4}/128 nodad dev veth4
> +
> +	# configure addresses: the bottom route (5-6-7-8)
> +	ip -netns ${NS1}    addr add ${IPv4_5}/24  dev veth5
> +	ip -netns ${NS2}    addr add ${IPv4_6}/24  dev veth6
> +	ip -netns ${NS2}    addr add ${IPv4_7}/24  dev veth7
> +	ip -netns ${NS3}    addr add ${IPv4_8}/24  dev veth8
> +	ip -netns ${NS1} -6 addr add ${IPv6_5}/128 nodad dev veth5
> +	ip -netns ${NS2} -6 addr add ${IPv6_6}/128 nodad dev veth6
> +	ip -netns ${NS2} -6 addr add ${IPv6_7}/128 nodad dev veth7
> +	ip -netns ${NS3} -6 addr add ${IPv6_8}/128 nodad dev veth8
> +
> +
> +	ip -netns ${NS1} link set dev veth1 up
> +	ip -netns ${NS2} link set dev veth2 up
> +	ip -netns ${NS2} link set dev veth3 up
> +	ip -netns ${NS3} link set dev veth4 up
> +	ip -netns ${NS1} link set dev veth5 up
> +	ip -netns ${NS2} link set dev veth6 up
> +	ip -netns ${NS2} link set dev veth7 up
> +	ip -netns ${NS3} link set dev veth8 up
> +
> +	# configure routes: IP*_SRC -> veth1/IP*_2 (= top route) default;
> +	# the bottom route to specific bottom addresses
> +
> +	# NS1
> +	# top route
> +	ip -netns ${NS1}    route add ${IPv4_2}/32  dev veth1
> +	ip -netns ${NS1}    route add default dev veth1 via ${IPv4_2}  # go top by default
> +	ip -netns ${NS1} -6 route add ${IPv6_2}/128 dev veth1
> +	ip -netns ${NS1} -6 route add default dev veth1 via ${IPv6_2}  # go top by default
> +	# bottom route
> +	ip -netns ${NS1}    route add ${IPv4_6}/32  dev veth5
> +	ip -netns ${NS1}    route add ${IPv4_7}/32  dev veth5 via ${IPv4_6}
> +	ip -netns ${NS1}    route add ${IPv4_8}/32  dev veth5 via ${IPv4_6}
> +	ip -netns ${NS1} -6 route add ${IPv6_6}/128 dev veth5
> +	ip -netns ${NS1} -6 route add ${IPv6_7}/128 dev veth5 via ${IPv6_6}
> +	ip -netns ${NS1} -6 route add ${IPv6_8}/128 dev veth5 via ${IPv6_6}
> +
> +	# NS2
> +	# top route
> +	ip -netns ${NS2}    route add ${IPv4_1}/32  dev veth2
> +	ip -netns ${NS2}    route add ${IPv4_4}/32  dev veth3
> +	ip -netns ${NS2} -6 route add ${IPv6_1}/128 dev veth2
> +	ip -netns ${NS2} -6 route add ${IPv6_4}/128 dev veth3
> +	# bottom route
> +	ip -netns ${NS2}    route add ${IPv4_5}/32  dev veth6
> +	ip -netns ${NS2}    route add ${IPv4_8}/32  dev veth7
> +	ip -netns ${NS2} -6 route add ${IPv6_5}/128 dev veth6
> +	ip -netns ${NS2} -6 route add ${IPv6_8}/128 dev veth7
> +
> +	# NS3
> +	# top route
> +	ip -netns ${NS3}    route add ${IPv4_3}/32  dev veth4
> +	ip -netns ${NS3}    route add ${IPv4_1}/32  dev veth4 via ${IPv4_3}
> +	ip -netns ${NS3}    route add ${IPv4_2}/32  dev veth4 via ${IPv4_3}
> +	ip -netns ${NS3} -6 route add ${IPv6_3}/128 dev veth4
> +	ip -netns ${NS3} -6 route add ${IPv6_1}/128 dev veth4 via ${IPv6_3}
> +	ip -netns ${NS3} -6 route add ${IPv6_2}/128 dev veth4 via ${IPv6_3}
> +	# bottom route
> +	ip -netns ${NS3}    route add ${IPv4_7}/32  dev veth8
> +	ip -netns ${NS3}    route add ${IPv4_5}/32  dev veth8 via ${IPv4_7}
> +	ip -netns ${NS3}    route add ${IPv4_6}/32  dev veth8 via ${IPv4_7}
> +	ip -netns ${NS3} -6 route add ${IPv6_7}/128 dev veth8
> +	ip -netns ${NS3} -6 route add ${IPv6_5}/128 dev veth8 via ${IPv6_7}
> +	ip -netns ${NS3} -6 route add ${IPv6_6}/128 dev veth8 via ${IPv6_7}
> +
> +	# configure IPv4 GRE device in NS3, and a route to it via the "bottom" route
> +	ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local ${IPv4_GRE} ttl 255
> +	ip -netns ${NS3} link set gre_dev up
> +	ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
> +	ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
> +	ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
> +
> +
> +	# configure IPv6 GRE device in NS3, and a route to it via the "bottom" route
> +	ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote ${IPv6_1} local ${IPv6_GRE} ttl 255
> +	ip -netns ${NS3} link set gre6_dev up
> +	ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
> +	ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
> +	ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
> +
> +	# rp_filter gets confused by what these tests are doing, so disable it
> +	ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
> +	ip netns exec ${NS2} sysctl -wq net.ipv4.conf.all.rp_filter=0
> +	ip netns exec ${NS3} sysctl -wq net.ipv4.conf.all.rp_filter=0
> +}
> +
> +cleanup() {
> +	ip netns del ${NS1} 2> /dev/null
> +	ip netns del ${NS2} 2> /dev/null
> +	ip netns del ${NS3} 2> /dev/null
> +}
> +
> +trap cleanup EXIT
> +
> +test_ping() {
> +	local readonly PROTO=$1
> +	local readonly EXPECTED=$2
> +	local RET=0
> +
> +	set +e
> +	if [ "${PROTO}" == "IPv4" ] ; then
> +		ip netns exec ${NS1} ping  -c 1 -W 1 -I ${IPv4_SRC} ${IPv4_DST} 2>&1 > /dev/null
> +		RET=$?
> +	elif [ "${PROTO}" == "IPv6" ] ; then
> +		ip netns exec ${NS1} ping6 -c 1 -W 6 -I ${IPv6_SRC} ${IPv6_DST} 2>&1 > /dev/null
> +		RET=$?
> +	else
> +		echo "test_ping: unknown PROTO: ${PROTO}"
> +		exit 1
> +	fi
> +	set -e
> +
> +	if [ "0" != "${RET}" ]; then
> +		RET=1
> +	fi
> +
> +	if [ "${EXPECTED}" != "${RET}" ] ; then
> +		echo "FAIL: test_ping: ${RET}"
> +		exit 1
> +	fi
> +}
> +
> +test_egress() {
> +	local readonly ENCAP=$1
> +	echo "starting egress ${ENCAP} encap test"
> +	setup
> +
> +	# need to wait a bit for IPv6 to autoconf, otherwise
> +	# ping6 sometimes fails with "unable to bind to address"
> +
> +	# by default, pings work
> +	test_ping IPv4 0
> +	test_ping IPv6 0
> +
> +	# remove NS2->DST routes, ping fails
> +	ip -netns ${NS2}    route del ${IPv4_DST}/32  dev veth3
> +	ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3
> +	test_ping IPv4 1
> +	test_ping IPv6 1
> +
> +	# install replacement routes (LWT/eBPF), pings succeed
> +	if [ "${ENCAP}" == "IPv4" ] ; then
> +		ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre dev veth1
> +		ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre dev veth1
> +	elif [ "${ENCAP}" == "IPv6" ] ; then
> +		ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre6 dev veth1
> +		ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre6 dev veth1
> +	else
> +		echo "FAIL: unknown encap ${ENCAP}"
> +	fi
> +	test_ping IPv4 0
> +	test_ping IPv6 0
> +
> +	cleanup
> +	echo "PASS"
> +}
> +
> +test_ingress() {
> +	local readonly ENCAP=$1
> +	echo "starting ingress ${ENCAP} encap test"
> +	setup
> +
> +	# need to wait a bit for IPv6 to autoconf, otherwise
> +	# ping6 sometimes fails with "unable to bind to address"
> +
> +	# by default, pings work
> +	test_ping IPv4 0
> +	test_ping IPv6 0
> +
> +	# remove NS2->DST routes, pings fail
> +	ip -netns ${NS2}    route del ${IPv4_DST}/32  dev veth3
> +	ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3
> +	test_ping IPv4 1
> +	test_ping IPv6 1
> +
> +	# install replacement routes (LWT/eBPF), pings succeed
> +	if [ "${ENCAP}" == "IPv4" ] ; then
> +		ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre dev veth2
> +		ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre dev veth2
> +	elif [ "${ENCAP}" == "IPv6" ] ; then
> +		ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre6 dev veth2
> +		ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre6 dev veth2
> +	else
> +		echo "FAIL: unknown encap ${ENCAP}"
> +	fi
> +	test_ping IPv4 0
> +	test_ping IPv6 0
> +
> +	cleanup
> +	echo "PASS"
> +}
> +
> +test_egress IPv4
> +test_egress IPv6
> +
> +test_ingress IPv4
> +test_ingress IPv6
> +
> +echo "all tests passed"
> -- 
> 2.20.1.791.gb4d0f1c61a-goog
> 

^ permalink raw reply

* Re: [PATCH net-next] net: sched: remove duplicated include from cls_api.c
From: David Miller @ 2019-02-14  6:04 UTC (permalink / raw)
  To: yuehaibing; +Cc: jhs, xiyou.wangcong, jiri, netdev, kernel-janitors
In-Reply-To: <20190213014200.120305-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 13 Feb 2019 01:42:00 +0000

> Remove duplicated include.
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied, thanks.

^ permalink raw reply

* [PATCH bpf-next] bpf: fix memory leak in bpf_lwt_xmit_reroute
From: Peter Oskolkov @ 2019-02-14  6:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev
  Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov

On error the skb should be freed. Tested with diff/steps
provided by David Ahern.

Reported-by: David Ahern <dsahern@gmail.com>
Fixes: 3bd0b15281af ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c")
Signed-off-by: Peter Oskolkov <posk@google.com>
---
 net/core/lwt_bpf.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 32251f3fcda0..f3273cbb6b22 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -179,18 +179,19 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
 	struct net_device *l3mdev = l3mdev_master_dev_rcu(skb_dst(skb)->dev);
 	int oif = l3mdev ? l3mdev->ifindex : 0;
 	struct dst_entry *dst = NULL;
+	int err = -EAFNOSUPPORT;
 	struct sock *sk;
 	struct net *net;
 	bool ipv4;
-	int err;
 
 	if (skb->protocol == htons(ETH_P_IP))
 		ipv4 = true;
 	else if (skb->protocol == htons(ETH_P_IPV6))
 		ipv4 = false;
 	else
-		return -EAFNOSUPPORT;
+		goto err;
 
+	err = -EINVAL;
 	sk = sk_to_full_sk(skb->sk);
 	if (sk) {
 		if (sk->sk_bound_dev_if)
@@ -216,7 +217,7 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
 
 		rt = ip_route_output_key(net, &fl4);
 		if (IS_ERR(rt))
-			return -EINVAL;
+			goto err;
 		dst = &rt->dst;
 	} else {
 		struct ipv6hdr *iph6 = ipv6_hdr(skb);
@@ -231,12 +232,15 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
 		fl6.saddr = iph6->saddr;
 
 		err = ipv6_stub->ipv6_dst_lookup(net, skb->sk, &dst, &fl6);
-		if (err || IS_ERR(dst))
-			return -EINVAL;
+		if (err || IS_ERR(dst)) {
+			err = -EINVAL;
+			goto err;
+		}
 	}
 	if (unlikely(dst->error)) {
 		dst_release(dst);
-		return -EINVAL;
+		err = -EINVAL;
+		goto err;
 	}
 
 	/* Although skb header was reserved in bpf_lwt_push_ip_encap(), it
@@ -246,17 +250,21 @@ static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
 	 */
 	err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
 	if (unlikely(err))
-		return err;
+		goto err;
 
 	skb_dst_drop(skb);
 	skb_dst_set(skb, dst);
 
 	err = dst_output(dev_net(skb_dst(skb)->dev), skb->sk, skb);
 	if (unlikely(err))
-		return err;
+		goto err;
 
 	/* ip[6]_finish_output2 understand LWTUNNEL_XMIT_DONE */
 	return LWTUNNEL_XMIT_DONE;
+
+err:
+	kfree_skb(skb);
+	return err;
 }
 
 static int bpf_xmit(struct sk_buff *skb)
-- 
2.21.0.rc0.258.g878e2cd30e-goog


^ permalink raw reply related

* Re: [PATCH net-next 1/2] cxgb4/cxgb4vf: Add support for SGE doorbell queue timer
From: David Miller @ 2019-02-14  6:10 UTC (permalink / raw)
  To: vishal; +Cc: netdev, nirranjan, indranil, dt
In-Reply-To: <1550034844-10850-2-git-send-email-vishal@chelsio.com>

From: Vishal Kulkarni <vishal@chelsio.com>
Date: Wed, 13 Feb 2019 10:44:03 +0530

> T6 introduced a Timer Mechanism in SGE called the
> SGE Doorbell Queue Timer. With this we can now configure
> TX Queues to get CIDX Updates when:
> 
>     Time(CIDX == PIDX) >= Timer
> 
> Previously we rely on TX Queue Status Page updates by hardware
> for DMA completions. This will make Hardware/Firmware actually
> deliver the CIDX Updates as Ingress Queue messages with
> commensurate Interrupts.
> 
> So we now have a new RX Path component for processing CIDX Updates
> and reclaiming TX Descriptors faster.
> 
> Original work by: Casey Leedom <leedom@chelsio.com>
> 
> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>

Can you at least fix the reverse christmas tree situation for the functions
where you add or remove local variables?

Thank you.

^ permalink raw reply

* Re: [PATCH bpf-next v11 0/7] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-02-14  6:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Ahern, Alexei Starovoitov, Daniel Borkmann, netdev,
	Peter Oskolkov, Willem de Bruijn
In-Reply-To: <20190214042127.azcsxbrpzhgumiwa@ast-mbp>

On Wed, Feb 13, 2019 at 8:21 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Feb 13, 2019 at 08:44:51PM -0700, David Ahern wrote:
> > On 2/13/19 7:39 PM, Alexei Starovoitov wrote:
> > > On Wed, Feb 13, 2019 at 05:46:26PM -0700, David Ahern wrote:
> > >> On 2/13/19 12:53 PM, Peter Oskolkov wrote:
> > >>> This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> > >>> BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> > >>> and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> > >>> to packets (e.g. IP/GRE, GUE, IPIP).
> > >>>
> > >>> This is useful when thousands of different short-lived flows should be
> > >>> encapped, each with different and dynamically determined destination.
> > >>> Although lwtunnels can be used in some of these scenarios, the ability
> > >>> to dynamically generate encap headers adds more flexibility, e.g.
> > >>> when routing depends on the state of the host (reflected in global bpf
> > >>> maps).
> > >>>
> > >>
> > >>
> > >> For the set:
> > >> Reviewed-by: David Ahern <dsahern@gmail.com>
> > >
> > > Applied. Thanks everyone!
> > >
> >
> > Looks like a cleanup round is needed.
> >
> > I changed the routes to fail with unreachable:
> >
> > @@ -179,16 +175,16 @@
> >       ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local
> > ${IPv4_GRE} ttl 255
> >       ip -netns ${NS3} link set gre_dev up
> >       ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
> > -     ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
> > -     ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
> > +     ip -netns ${NS1} route add unreachable ${IPv4_GRE}/32
> > +     ip -netns ${NS2} route add unreachable ${IPv4_GRE}/32
> >
> >
> >       # configure IPv6 GRE device in NS3, and a route to it via the "bottom"
> > route
> >       ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote
> > ${IPv6_1} local ${IPv6_GRE} ttl 255
> >       ip -netns ${NS3} link set gre6_dev up
> >       ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
> > -     ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
> > -     ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
> > +     ip -netns ${NS1} -6 route add unreachable ${IPv6_GRE}/128
> > +     ip -netns ${NS2} -6 route add unreachable ${IPv6_GRE}/128
> >
> >       # rp_filter gets confused by what these tests are doing, so disable it
> >       ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
> > @@ -220,7 +216,6 @@
> >
> >
> > and then removed all of the set -e and exit 1's in the script (really
> > should let all of the tests run versus bailing on the first failure).
> >
> > With kmemleak enabled I see a lot of suspected memory leaks - some may
> > not be related to this change but it is triggering the suspected leak:
>
> argh. Thanks a lot for catching it.
> Let's figure out the fix quickly.

Sent a fix: https://patchwork.ozlabs.org/patch/1041836/
Sorry about that.

> If it's too intrusive we can revert and reapply.
> I'm not going to send a pull-req to Dave with a known issue like this.
>

^ permalink raw reply

* Re: [PATCH net-next] cxgb4vf: Few more link management changes.
From: David Miller @ 2019-02-14  6:12 UTC (permalink / raw)
  To: vishal; +Cc: netdev, nirranjan, indranil, dt
In-Reply-To: <1550035132-11025-1-git-send-email-vishal@chelsio.com>

From: Vishal Kulkarni <vishal@chelsio.com>
Date: Wed, 13 Feb 2019 10:48:52 +0530

> CR4_QSFP 10G Speed technology should be 10000baseKR_Full
> And also report available FEC modes.
> 
> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
From: David Miller @ 2019-02-14  6:13 UTC (permalink / raw)
  To: dan.carpenter; +Cc: andrew, f.fainelli, vivien.didelot, netdev, kernel-janitors
In-Reply-To: <20190213082304.GA14113@kadam>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 13 Feb 2019 11:23:04 +0300

> The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
> than or equal to DSA_MAX_PORTS.  The ds->ports[] array is used inside
> the dsa_is_user_port() and dsa_is_cpu_port() functions.  The ds->ports[]
> array is allocated in dsa_switch_alloc() and it has ds->num_ports
> elements so this leads to a static checker warning about a potential out
> of bounds read.
> 
> Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/3 net-next] lib: objagg: Fix an error code in objagg_hints_get()
From: David Miller @ 2019-02-14  6:13 UTC (permalink / raw)
  To: dan.carpenter; +Cc: jiri, netdev, idosch, kernel-janitors
In-Reply-To: <20190213085650.GB14113@kadam>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 13 Feb 2019 11:56:50 +0300

> We need to set the error code on this path otherwise we return
> ERR_PTR(0) which would result in a NULL dereference in the caller.
> 
> Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/3 net-next] test_objagg: Test the correct variable
From: David Miller @ 2019-02-14  6:13 UTC (permalink / raw)
  To: dan.carpenter; +Cc: jiri, netdev, kernel-janitors
In-Reply-To: <20190213085820.GC14113@kadam>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 13 Feb 2019 11:58:20 +0300

> There is a typo here.  We intended to check "objagg2" but we instead
> test "objagg" which is not an error pointer.
> 
> Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/3 net-next] test_objagg: Uninitialized variable in error handling
From: David Miller @ 2019-02-14  6:13 UTC (permalink / raw)
  To: dan.carpenter; +Cc: jiri, netdev, kernel-janitors
In-Reply-To: <20190213085931.GD14113@kadam>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 13 Feb 2019 11:59:31 +0300

> We need to set the error message on this path otherwise some of the
> callers, such as test_hints_case(), print from an uninitialized pointer.
> 
> We had a similar bug earlier and set "errmsg" to NULL in the caller,
> test_delta_action_item().  That code is no longer required so I have
> removed it.
> 
> Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied.

^ permalink raw reply

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Heiner Kallweit @ 2019-02-14  6:17 UTC (permalink / raw)
  To: David Chang; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
In-Reply-To: <20190214024527.GG7193@linux-kyyb.suse>

Hi David,

On 14.02.2019 03:45, David Chang wrote:
> Hi Heiner,
> 
> On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
>> Hi David,
>>
>> meanwhile there's the following bug report matching what reported.
>> It's even the same chip version (RTL8168h).
>> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>>
>> Symptom there is also a significant number of rx_missed packets.
>> Could you try what I mentioned there last:
>> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
>> end of rtl_hw_start_8168h_1() being disabled.
> 
> After disabled the aspm function that you mentioned, we finally got the
> positive testing result. And the rx_missed error was gone. If without
> the patch, the receive side get back to bad performance.
> 
Good to know, thanks. I also checked with Realtek, they confirmed that their Windows
driver uses some heuristics to disable ASPM under high load. So it seems like there
is some hw issue. Open so far is whether this affects certain chip versions only.
Let's see whether they can provide more information.
Disabling ASPM in general would hurt notebook users because based on some past
measurements we know ASPM can significantly save energy.

> kernel: r8169: loading out-of-tree module taints kernel.
> kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
> kernel: libphy: r8169: probed
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
> kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
> kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
> 
> NIC statistics:
>      tx_packets: 1653804
>      rx_packets: 1555966
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 0
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 1555884
>      broadcast: 78
>      multicast: 4
>      tx_aborted: 0
>      tx_underrun: 0
> 
> iperf receive:
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
> Accepted connection from 10.x.x.x, port 55516
> [  5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec
> [  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
> [  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec
> [  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
> [  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
> [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
> [  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
> [  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
> [  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec
> [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> [  5]  10.00-11.00  sec   112 MBytes   941 Mbits/sec
> [...]
> [  5]  50.00-51.00  sec   112 MBytes   941 Mbits/sec
> [  5]  51.00-52.00  sec   112 MBytes   941 Mbits/sec
> [  5]  52.00-53.00  sec   112 MBytes   942 Mbits/sec
> [  5]  53.00-54.00  sec   112 MBytes   941 Mbits/sec
> [  5]  54.00-55.00  sec   111 MBytes   934 Mbits/sec
> [  5]  55.00-56.00  sec   112 MBytes   942 Mbits/sec
> [  5]  56.00-57.00  sec   112 MBytes   937 Mbits/sec
> [  5]  57.00-58.00  sec   112 MBytes   941 Mbits/sec
> [  5]  58.00-59.00  sec   111 MBytes   932 Mbits/sec
> [  5]  59.00-60.00  sec   112 MBytes   942 Mbits/sec
> [  5]  60.00-60.04  sec  4.06 MBytes   939 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-60.04  sec  6.57 GBytes   940 Mbits/sec                  receiver
> 
> regards,
> David
> 
Heiner

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox