Netdev List


* [iproute PATCH] rdma: Don't pass garbage to rd_check_is_filtered()
From: Phil Sutter @ 2018-10-18 12:35 UTC
  To: Stephen Hemminger; +Cc: netdev

Variables 'src_port' and 'dst_port' are initialized only if attributes
RDMA_NLDEV_ATTR_RES_SRC_ADDR or RDMA_NLDEV_ATTR_RES_DST_ADDR are
present. Make sure to pass them over to rd_check_is_filtered() only if
that is the case.

Fixes: 9a362cc71a455 ("rdma: Add CM_ID resource tracking information")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 rdma/res.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/rdma/res.c b/rdma/res.c
index 074b9929a38b2..0d8c1c388c4ca 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -621,6 +621,8 @@ static int res_cm_id_parse_cb(const struct nlmsghdr *nlh, void *data)
 			if (rd_check_is_string_filtered(rd, "src-addr",
 							src_addr_str))
 				continue;
+			if (rd_check_is_filtered(rd, "src-port", src_port))
+				continue;
 		}
 
 		if (nla_line[RDMA_NLDEV_ATTR_RES_DST_ADDR]) {
@@ -630,14 +632,10 @@ static int res_cm_id_parse_cb(const struct nlmsghdr *nlh, void *data)
 			if (rd_check_is_string_filtered(rd, "dst-addr",
 							dst_addr_str))
 				continue;
+			if (rd_check_is_filtered(rd, "dst-port", dst_port))
+				continue;
 		}
 
-		if (rd_check_is_filtered(rd, "src-port", src_port))
-			continue;
-
-		if (rd_check_is_filtered(rd, "dst-port", dst_port))
-			continue;
-
 		if (nla_line[RDMA_NLDEV_ATTR_RES_PID]) {
 			pid = mnl_attr_get_u32(
 					nla_line[RDMA_NLDEV_ATTR_RES_PID]);
-- 
2.19.0


* Re: [PATCH] net: ethernet: fec: Add missing SPEED_
From: Heiner Kallweit @ 2018-10-18 20:41 UTC
  To: Florian Fainelli, LABBE Corentin
  Cc: andrew, davem, fugang.duan, linux-kernel, netdev
In-Reply-To: <1b784f69-3ec2-feb2-81e1-9a335cf477c3@gmail.com>

On 18.10.2018 22:10, Florian Fainelli wrote:
> On 10/18/2018 12:59 PM, LABBE Corentin wrote:
>> On Thu, Oct 18, 2018 at 12:38:32PM -0700, Florian Fainelli wrote:
>>> On 10/18/2018 12:16 PM, LABBE Corentin wrote:
>>>> On Thu, Oct 18, 2018 at 11:55:49AM -0700, Florian Fainelli wrote:
>>>>> On 10/18/2018 11:47 AM, LABBE Corentin wrote:
>>>>>> On Thu, Oct 18, 2018 at 11:39:24AM -0700, Florian Fainelli wrote:
>>>>>>> On 10/18/2018 08:05 AM, Corentin Labbe wrote:
>>>>>>>> Since commit 58056c1e1b0e ("net: ethernet: Use phy_set_max_speed() to limit advertised speed"), the fec driver is unable to get any link.
>>>>>>>> This is due to missing SPEED_.
>>>>>>>
>>>>>>> But SPEED_1000 is defined in include/uapi/linux/ethtool.h as 1000, so
>>>>>>> surely this would amount to the same code paths being taken or am I
>>>>>>> missing something here?
>>>>>>
>>>>>> The bisect session pointed to your patch; reverting it fixes the issue.
>>>>>> But since the fix seemed trivial, I sent the patch without testing it beyond a compile.
>>>>>> Sorry, I found out just a few minutes ago that it didn't fix the issue.
>>>>>>
>>>>>> But your patch is still the cause for sure.
>>>>>>
>>>>>
>>>>> What you are writing is really lowering the confidence level, first
>>>>> Andrew is the author of that patch, and second "just compiling" and
>>>>> pretending this fixes a problem when it does not is not quite what I
>>>>> would expect.
>>>>>
>>>>> I don't have a problem helping you find the solution or the right fix
>>>>> though, even if it is not my patch, but please get the author and actual
>>>>> problem right so we can move forward in confidence, thanks!
>>>>
>>>> Sorry again, I wanted to acknowledge my error but I did it too fast and too late.
>>>> And sorry for having confused you with Andrew.
>>>
>>> No worries, here to help, let us know what your bisection points to. Thanks
>>
>> I have added printing of phydev->supported
>> My working kernel (on top of 58056c1e1b0e + revert patch) got:
>> [    5.550838] fec_enet_mii_probe 2ff (gbit features)
>> [    5.555848] fec_enet_mii_probe 2ef (without 1000baseT_Half)
>> [    5.561620] fec_enet_mii_probe 22ef final (after pause)
>> [    5.566914] Micrel KSZ9021 Gigabit PHY 2188000.ethernet-1:06: attached PHY driver [Micrel KSZ9021 Gigabit PHY] (mii_bus:phy_addr=2188000.ethernet-1:06, irq=POLL)
>> [    8.730751] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
>> [    8.788311] Sending DHCP requests ., OK
>> [    8.832357] IP-Config: Got DHCP answer from 192.168.66.1, my address is 192.168.66.58
>>
>> the non-working kernel (next-20181015)
>> [    7.308917] fec_enet_mii_probe 62ff after phy_set_max_speed
>> [    7.314545] fec_enet_mii_probe 62ef after phy_remove_link_mode
>> [    7.320418] fec_enet_mii_probe 62ef after pause
>> and then no link
>>
>> So it seems that phy_set_max_speed adds bit 14 (ETHTOOL_LINK_MODE_Asym_Pause_BIT)
> 
> It's not masking it so it must be coming from phy_probe().
> 
See df8ed346d4a8 ("net: phy: fix flag masking in __set_phy_supported").
phy_set_max_speed() used to (unintentionally) mask the pause bits
and it seems that the fec driver used this bug as a feature.

>>
>> I have patched by adding:
>> phy_remove_link_mode(phy_dev, ETHTOOL_LINK_MODE_Asym_Pause_BIT);

Instead of programmatically removing the feature bit it should be
possible to do this in the PHY driver configuration. See also
this part of phy_probe().

	if (phydrv->features & (SUPPORTED_Pause | SUPPORTED_Asym_Pause)) {
		phydev->supported &= ~(SUPPORTED_Pause | SUPPORTED_Asym_Pause);
		phydev->supported |= phydrv->features &
				     (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
	} else {
		phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
	}

>> and got:
>> [    7.310559] fec_enet_mii_probe 62ff after phy_set_max_speed
>> [    7.316221] fec_enet_mii_probe 22ef after phy_remove_link_mode
>> [    7.322128] fec_enet_mii_probe 22ef after pause
>> [    7.326681] Micrel KSZ9021 Gigabit PHY 2188000.ethernet-1:06: attached PHY driver [Micrel KSZ9021 Gigabit PHY] (mii_bus:phy_addr=2188000.ethernet-1:06, irq=POLL)
>> [    7.611276] Waiting up to 3 more seconds for network.
>> [    7.881278] Waiting up to 2 more seconds for network.
>> [    8.131277] Waiting up to 2 more seconds for network.
>> [    8.401169] Waiting up to 2 more seconds for network.
>> [    8.671269] Waiting up to 2 more seconds for network.
>> [    8.941274] Waiting up to 1 more seconds for network.
>> [    9.211181] Waiting up to 1 more seconds for network.
>> [    9.481274] Waiting up to 1 more seconds for network.
>> [    9.751275] Waiting up to 1 more seconds for network.
>> [   10.021281] Waiting up to 0 more seconds for network.
>> [   10.291274] Waiting up to 0 more seconds for network.
>> [   10.381282] Sending DHCP requests .
>> [   10.473000] fec 2188000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
>> [   12.861267] ., OK
>> [   12.903405] IP-Config: Got DHCP answer from 192.168.66.1, my address is 192.168.66.58
>>
>> So at least I got a link, but the link still comes up late
> 
> The delay is likely something entirely different, it could be some of
> Heiner's recent changes to PHYLIB, Heiner do you have access to a system
> that polls the PHY?
> 
I don't think there's anything wrong with phylib. The time difference
between the fec_enet_mii_probe messages and the "link up" message
is a little bit more than 3s in both cases.
For a reason not visible here the fec_enet_mii_probe messages
come 2s later in the second case.
What happens after the "link up" message is out of control of phylib.


* Re: [PATCH net v2] net/sched: act_gact: properly init 'goto chain'
From: Jamal Hadi Salim @ 2018-10-18 12:52 UTC
  To: Davide Caratti, Jiri Pirko, Cong Wang, David S. Miller, netdev
In-Reply-To: <71d1b5b90ab0e678be30726d373a9d325572a125.1539863681.git.dcaratti@redhat.com>

On 2018-10-18 8:05 a.m., Davide Caratti wrote:
> the following script:
> 
>   # tc f a dev v0 egress chain 4 matchall action simple sdata "A triumph!"
>   # tc f a dev v0 egress matchall action pass random determ goto chain 4 5
> 
> produces the following crash:
> 
>   BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>   PGD 0 P4D 0
>   Oops: 0000 [#1] SMP PTI
>   CPU: 9 PID: 0 Comm: swapper/9 Not tainted 4.19.0-rc6.chainfix + #472
>   Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0  07/26/2013
>   RIP: 0010:tcf_action_exec+0xb8/0x100
>   Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
>   RSP: 0018:ffff9af96f843bf8 EFLAGS: 00010246
>   RAX: 000000002000002a RBX: ffff9af9679cf200 RCX: 000000000000005a
>   RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9af585e006c0
>   RBP: ffff9af96f843ca0 R08: 0000000016000000 R09: 0000000000000000
>   R10: 0000000000000000 R11: 0000000000000000 R12: ffff9af968db4400
>   R13: ffff9af968db4408 R14: 0000000000000001 R15: ffff9af585e006c0
>   FS:  0000000000000000(0000) GS:ffff9af96f840000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000000 CR3: 000000025980a001 CR4: 00000000001606e0
>   Call Trace:
>    <IRQ>
>    tcf_classify+0x89/0x140
>    __dev_queue_xmit+0x413/0x8a0
>    ? ip6_finish_output2+0x336/0x520
>    ip6_finish_output2+0x336/0x520
>    ? ip6_output+0x68/0x110
>    ip6_output+0x68/0x110
>    ? ip6_fragment+0x9e0/0x9e0
>    mld_sendpack+0x175/0x220
>    ? mld_gq_timer_expire+0x40/0x40
>    mld_dad_timer_expire+0x25/0x80
>    call_timer_fn+0x2b/0x120
>    run_timer_softirq+0x3e8/0x440
>    ? tick_sched_timer+0x37/0x70
>    ? __hrtimer_run_queues+0x118/0x290
>    __do_softirq+0xe3/0x2bd
>    irq_exit+0xe3/0xf0
>    smp_apic_timer_interrupt+0x74/0x130
>    apic_timer_interrupt+0xf/0x20
>    </IRQ>
>   RIP: 0010:cpuidle_enter_state+0xa5/0x320
>   Code: 71 82 5f 7e e8 bc 25 ab ff 48 89 c3 0f 1f 44 00 00 31 ff e8 3d 36 ab ff 80 7c 24 07 00 0f 85 28 02 00 00 fb 66 0f 1f 44 00 00 <4c> 29 f3 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
>   RSP: 0018:ffffafa1832cbe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>   RAX: ffff9af96f862600 RBX: 0000003ede349ac5 RCX: 000000000000001f
>   RDX: 0000003ede349ac5 RSI: 00000000313b14ef RDI: 0000000000000000
>   RBP: ffffcfa17fa40a00 R08: ffff9af96f85cdc0 R09: 000000000000afc8
>   R10: ffffafa1832cbe70 R11: 000000000000afc8 R12: 0000000000000004
>   R13: ffffffff82578bd8 R14: 0000003ec085dc50 R15: 0000000000000000
>    do_idle+0x200/0x280
>    cpu_startup_entry+0x6f/0x80
>    start_secondary+0x1a7/0x200
>    secondary_startup_64+0xa4/0xb0
>   Modules linked in: act_gact act_simple cls_matchall sch_ingress veth intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ipmi_ssif ghash_clmulni_intel pcbc aesni_intel ipmi_si iTCO_wdt crypto_simd iTCO_vendor_support cryptd mei_me ipmi_devintf glue_helper mei joydev ipmi_msghandler pcc_cpufreq sg lpc_ich pcspkr i2c_i801 ioatdma wmi ip_tables xfs libcrc32c mlx4_en sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm isci drm libsas igb ahci libahci scsi_transport_sas mlx4_core be2net crc32c_intel dca libata i2c_algo_bit i2c_core megaraid_sas devlink dm_mirror dm_region_hash dm_log dm_mod
>   CR2: 0000000000000000
> 
> when CONFIG_GACT_PROB is enabled, gact allows users to specify a fallback
> control action, that is stored in the action private data. 'goto chain x'
> never worked for that case, since goto_chain was never initialized. There
> is only one goto_chain handle per action: ensure that gact never contains
> more than one 'goto chain' in a rule. If the fallback control action is a
> 'goto chain', copy it to tcf_action to ensure that goto_chain is properly
> initialized.
> 
> v2: - fix breakage of TDC tests when 'p_parm' is not specified
>      - reject 'goto action' if it is specified twice in the same gact rule
> 
> Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
>   include/net/tc_act/tc_gact.h | 11 +++++++++--
>   net/sched/act_gact.c         | 19 ++++++++++++++-----
>   2 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
> index ef8dd0db70ce..0cbeb77349bc 100644
> --- a/include/net/tc_act/tc_gact.h
> +++ b/include/net/tc_act/tc_gact.h
> @@ -10,12 +10,19 @@ struct tcf_gact {
>   #ifdef CONFIG_GACT_PROB
>   	u16			tcfg_ptype;
>   	u16			tcfg_pval;
> +	int			tcfg_caction;
>   	int			tcfg_paction;
>   	atomic_t		packets;
>   #endif
>   };
>   #define to_gact(a) ((struct tcf_gact *)a)
>   
> +#ifdef CONFIG_GACT_PROB
> +#define GACT_PRIMARY_ACTION(g) ((g)->tcfg_caction)
> +#else
> +#define GACT_PRIMARY_ACTION(g) ((g)->tcf_action)
> +#endif
> +
>   static inline bool __is_tcf_gact_act(const struct tc_action *a, int act,
>   				     bool is_ext)
>   {
> @@ -26,8 +33,8 @@ static inline bool __is_tcf_gact_act(const struct tc_action *a, int act,
>   		return false;
>   
>   	gact = to_gact(a);
> -	if ((!is_ext && gact->tcf_action == act) ||
> -	    (is_ext && TC_ACT_EXT_CMP(gact->tcf_action, act)))
> +	if ((!is_ext && GACT_PRIMARY_ACTION(gact) == act) ||
> +	    (is_ext && TC_ACT_EXT_CMP(GACT_PRIMARY_ACTION(gact), act)))
>   		return true;
>   
>   #endif
> diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
> index cd1d9bd32ef9..49b32650efb9 100644
> --- a/net/sched/act_gact.c
> +++ b/net/sched/act_gact.c
> @@ -31,7 +31,7 @@ static int gact_net_rand(struct tcf_gact *gact)
>   {
>   	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
>   	if (prandom_u32() % gact->tcfg_pval)
> -		return gact->tcf_action;
> +		return gact->tcfg_caction;
>   	return gact->tcfg_paction;
>   }
>   
> @@ -41,7 +41,7 @@ static int gact_determ(struct tcf_gact *gact)
>   
>   	smp_rmb(); /* coupled with smp_wmb() in tcf_gact_init() */
>   	if (pack % gact->tcfg_pval)
> -		return gact->tcf_action;
> +		return gact->tcfg_caction;
>   	return gact->tcfg_paction;
>   }
>   
> @@ -88,6 +88,9 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
>   		p_parm = nla_data(tb[TCA_GACT_PROB]);
>   		if (p_parm->ptype >= MAX_RAND)
>   			return -EINVAL;
> +		if (TC_ACT_EXT_CMP(p_parm->paction, TC_ACT_GOTO_CHAIN) &&
> +		    TC_ACT_EXT_CMP(parm->action, TC_ACT_GOTO_CHAIN))
> +			return -EINVAL;

Rejection is a good solution[1].
Would be helpful to set an ext_ack to something like
"only one goto chain is supported currently"
I didn't follow why you needed to introduce tcfg_caction...

cheers,
jamal

[1] Actions should allow a multitude of return opcodes, not
just two. The proper solution seems to be to just let
the caller (cls_api, which is aware of chains)
deal with the chain selection. Sticking them in actions
speeds up lookup (and deals with refcnts), but I wonder,
given this breakage in abstraction, whether they belong
there...


* Re: [iproute PATCH] devlink: Fix error reporting in cmd_resource_set()
From: Jiri Pirko @ 2018-10-18 12:56 UTC
  To: Phil Sutter; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20181018112823.5220-1-phil@nwl.cc>

Thu, Oct 18, 2018 at 01:28:23PM CEST, phil@nwl.cc wrote:
>resource_path_parse() returns either zero or a negative error code,
>hence the negated value must be passed to strerror().
>
>Fixes: 8cd644095842a ("devlink: Add support for devlink resource abstraction")
>Signed-off-by: Phil Sutter <phil@nwl.cc>

Acked-by: Jiri Pirko <jiri@mellanox.com>


* Re: [net PATCH] net: sched: Fix for duplicate class dump
From: Jiri Pirko @ 2018-10-18 12:57 UTC
  To: Phil Sutter; +Cc: David Miller, netdev, Eric Dumazet
In-Reply-To: <20181018083426.6623-1-phil@nwl.cc>

Thu, Oct 18, 2018 at 10:34:26AM CEST, phil@nwl.cc wrote:
>When dumping classes by parent, kernel would return classes twice:
>
>| # tc qdisc add dev lo root prio
>| # tc class show dev lo
>| class prio 8001:1 parent 8001:
>| class prio 8001:2 parent 8001:
>| class prio 8001:3 parent 8001:
>| # tc class show dev lo parent 8001:
>| class prio 8001:1 parent 8001:
>| class prio 8001:2 parent 8001:
>| class prio 8001:3 parent 8001:
>| class prio 8001:1 parent 8001:
>| class prio 8001:2 parent 8001:
>| class prio 8001:3 parent 8001:
>
>This comes from qdisc_match_from_root() potentially returning the root
>qdisc itself if its handle matched. Though in that case, root's classes
>were already dumped a few lines above.
>
>Fixes: cb395b2010879 ("net: sched: optimize class dumps")
>Signed-off-by: Phil Sutter <phil@nwl.cc>

Reviewed-by: Jiri Pirko <jiri@mellanox.com>


* [PATCH bpf-next v3 0/7] Implement queue/stack maps
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu

Some applications need a pool of free elements, for example the list of
free L4 ports in a SNAT.  None of the current maps allows this, because it
is not possible to get an element without having the key it is associated
with.  Even if it were possible, the lack of locking mechanisms in eBPF
would make it almost impossible to implement without data races.

This patchset implements two new kinds of eBPF maps: queue and stack.
These maps provide the peek, push and pop operations to eBPF programs, and
a new bpf_map_lookup_and_delete_elem() is added for userspace applications.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>

v2 -> v3:
 - Remove "almost dead code" in syscall.c
 - Remove unnecessary copy_from_user in bpf_map_lookup_and_delete_elem
 - Rebase

v1 -> v2:
 - Put ARG_PTR_TO_UNINIT_MAP_VALUE logic into a separate patch
 - Fix missing __this_cpu_dec & preempt_enable calls in kernel/bpf/syscall.c

RFC v4 -> v1:
 - Remove roundup to power of 2 in memory allocation
 - Remove count and use a free slot to check if queue/stack is empty
 - Use if + assignment for wrapping indexes
 - Fix some minor style issues
 - Squash two patches together

RFC v3 -> RFC v4:
 - Revert renaming of kernel/bpf/stackmap.c
 - Remove restriction on value size
 - Remove len arguments from peek/pop helpers
 - Add new ARG_PTR_TO_UNINIT_MAP_VALUE

RFC v2 -> RFC v3:
 - Return elements by value instead of by reference
 - Implement queue/stack based on an array and head + tail indexes
 - Rename stack trace related files to avoid confusion and conflicts

RFC v1 -> RFC v2:
 - Create two separate maps instead of single one + flags
 - Implement bpf_map_lookup_and_delete syscall
 - Support peek operation
 - Define replacement policy through flags in the update() method
 - Add eBPF side tests

---

Mauricio Vasquez B (7):
      bpf: rename stack trace map operations
      bpf/syscall: allow key to be null in map functions
      bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUE
      bpf: add queue and stack maps
      bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall
      Sync uapi/bpf.h to tools/include
      selftests/bpf: add test cases for queue and stack maps


 include/linux/bpf.h                                |    7 
 include/linux/bpf_types.h                          |    4 
 include/uapi/linux/bpf.h                           |   30 ++
 kernel/bpf/Makefile                                |    2 
 kernel/bpf/core.c                                  |    3 
 kernel/bpf/helpers.c                               |   43 +++
 kernel/bpf/queue_stack_maps.c                      |  288 ++++++++++++++++++++
 kernel/bpf/stackmap.c                              |    2 
 kernel/bpf/syscall.c                               |   91 ++++++
 kernel/bpf/verifier.c                              |   28 ++
 net/core/filter.c                                  |    6 
 tools/include/uapi/linux/bpf.h                     |   30 ++
 tools/lib/bpf/bpf.c                                |   12 +
 tools/lib/bpf/bpf.h                                |    2 
 tools/testing/selftests/bpf/Makefile               |    5 
 tools/testing/selftests/bpf/bpf_helpers.h          |    7 
 tools/testing/selftests/bpf/test_maps.c            |  122 ++++++++
 tools/testing/selftests/bpf/test_progs.c           |   99 +++++++
 tools/testing/selftests/bpf/test_queue_map.c       |    4 
 tools/testing/selftests/bpf/test_queue_stack_map.h |   59 ++++
 tools/testing/selftests/bpf/test_stack_map.c       |    4 
 21 files changed, 834 insertions(+), 14 deletions(-)
 create mode 100644 kernel/bpf/queue_stack_maps.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_map.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h
 create mode 100644 tools/testing/selftests/bpf/test_stack_map.c


* [PATCH bpf-next v3 1/7] bpf: rename stack trace map operations
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

In the following patches queue and stack maps (FIFO and LIFO
data structures) will be implemented.  In order to avoid confusion and
a possible name clash, rename stack_map_ops to stack_trace_map_ops.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf_types.h |    2 +-
 kernel/bpf/stackmap.c     |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fa48343a5ea1..7bad4e1947ed 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -51,7 +51,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_HASH, htab_lru_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_PERCPU_HASH, htab_lru_percpu_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_LPM_TRIE, trie_map_ops)
 #ifdef CONFIG_PERF_EVENTS
-BPF_MAP_TYPE(BPF_MAP_TYPE_STACK_TRACE, stack_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_STACK_TRACE, stack_trace_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_of_maps_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index b2ade10f7ec3..90daf285de03 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -600,7 +600,7 @@ static void stack_map_free(struct bpf_map *map)
 	put_callchain_buffers();
 }
 
-const struct bpf_map_ops stack_map_ops = {
+const struct bpf_map_ops stack_trace_map_ops = {
 	.map_alloc = stack_map_alloc,
 	.map_free = stack_map_free,
 	.map_get_next_key = stack_map_get_next_key,


* [PATCH bpf-next v3 2/7] bpf/syscall: allow key to be null in map functions
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

This commit adds the logic required to allow the key to be NULL
when the key_size of the map is 0.

A new __bpf_copy_key helper function only copies the key from
userspace when key_size != 0; otherwise it enforces that the key must be
NULL.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/syscall.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f4ecd6ed2252..78d9dd95e25f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -651,6 +651,17 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return -ENOTSUPP;
 }
 
+static void *__bpf_copy_key(void __user *ukey, u64 key_size)
+{
+	if (key_size)
+		return memdup_user(ukey, key_size);
+
+	if (ukey)
+		return ERR_PTR(-EINVAL);
+
+	return NULL;
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value
 
@@ -678,7 +689,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -785,7 +796,7 @@ static int map_update_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -888,7 +899,7 @@ static int map_delete_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -941,7 +952,7 @@ static int map_get_next_key(union bpf_attr *attr)
 	}
 
 	if (ukey) {
-		key = memdup_user(ukey, map->key_size);
+		key = __bpf_copy_key(ukey, map->key_size);
 		if (IS_ERR(key)) {
 			err = PTR_ERR(key);
 			goto err_put;


* [PATCH bpf-next v3 3/7] bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUE
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone
used to save the value of a map.  It is basically the same as
ARG_PTR_TO_UNINIT_MEM, but the size does not have to be passed as an
extra argument.

This will be used in the following patch that implements some new
helpers that receive a pointer to be filled with a map value.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf.h   |    1 +
 kernel/bpf/verifier.c |    9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e60fff48288b..0f8b863e0229 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -138,6 +138,7 @@ enum bpf_arg_type {
 	ARG_CONST_MAP_PTR,	/* const argument used as pointer to bpf_map */
 	ARG_PTR_TO_MAP_KEY,	/* pointer to stack used as map key */
 	ARG_PTR_TO_MAP_VALUE,	/* pointer to stack used as map value */
+	ARG_PTR_TO_UNINIT_MAP_VALUE,	/* pointer to valid memory used to store a map value */
 
 	/* the following constraints used to prototype bpf_memcmp() and other
 	 * functions that access data on eBPF program stack
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3f93a548a642..d84c91ac3b70 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2117,7 +2117,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	}
 
 	if (arg_type == ARG_PTR_TO_MAP_KEY ||
-	    arg_type == ARG_PTR_TO_MAP_VALUE) {
+	    arg_type == ARG_PTR_TO_MAP_VALUE ||
+	    arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE) {
 		expected_type = PTR_TO_STACK;
 		if (!type_is_pkt_pointer(type) && type != PTR_TO_MAP_VALUE &&
 		    type != expected_type)
@@ -2187,7 +2188,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (arg_type == ARG_PTR_TO_MAP_VALUE) {
+	} else if (arg_type == ARG_PTR_TO_MAP_VALUE ||
+		   arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE) {
 		/* bpf_map_xxx(..., map_ptr, ..., value) call:
 		 * check [value, value + map->value_size) validity
 		 */
@@ -2196,9 +2198,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			verbose(env, "invalid map_ptr to access map->value\n");
 			return -EACCES;
 		}
+		meta->raw_mode = (arg_type == ARG_PTR_TO_UNINIT_MAP_VALUE);
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
-					      NULL);
+					      meta);
 	} else if (arg_type_is_mem_size(arg_type)) {
 		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 


* [PATCH bpf-next v3 5/7] bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

The previous patch implemented the bpf queue/stack maps, which
provide the peek/pop/push functions.  There is no direct
relationship between those functions and the current map
syscalls, hence a new MAP_LOOKUP_AND_DELETE_ELEM syscall is added.
It is mapped to the pop operation in the queue/stack maps
and is still to be implemented for other kinds of maps.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
---
 include/uapi/linux/bpf.h |    1 +
 kernel/bpf/syscall.c     |   66 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b8fc161c5b78..c8824d5364ff 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -103,6 +103,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1617407f9ee5..49ae64a26562 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -999,6 +999,69 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }
 
+#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
+
+static int map_lookup_and_delete_elem(union bpf_attr *attr)
+{
+	void __user *ukey = u64_to_user_ptr(attr->key);
+	void __user *uvalue = u64_to_user_ptr(attr->value);
+	int ufd = attr->map_fd;
+	struct bpf_map *map;
+	void *key, *value, *ptr;
+	u32 value_size;
+	struct fd f;
+	int err;
+
+	if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
+	key = __bpf_copy_key(ukey, map->key_size);
+	if (IS_ERR(key)) {
+		err = PTR_ERR(key);
+		goto err_put;
+	}
+
+	value_size = map->value_size;
+
+	err = -ENOMEM;
+	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
+	if (!value)
+		goto free_key;
+
+	if (map->map_type == BPF_MAP_TYPE_QUEUE ||
+	    map->map_type == BPF_MAP_TYPE_STACK) {
+		err = map->ops->map_pop_elem(map, value);
+	} else {
+		err = -ENOTSUPP;
+	}
+
+	if (err)
+		goto free_value;
+
+	if (copy_to_user(uvalue, value, value_size) != 0)
+		goto free_value;
+
+	err = 0;
+
+free_value:
+	kfree(value);
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2472,6 +2535,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_TASK_FD_QUERY:
 		err = bpf_task_fd_query(&attr, uattr);
 		break;
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+		err = map_lookup_and_delete_elem(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;


* [PATCH bpf-next v3 4/7] bpf: add queue and stack maps
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

Queue/stack maps implement a FIFO/LIFO data storage for eBPF programs.
These maps support peek, pop and push operations that are exposed to eBPF
programs through the new bpf_map_{peek,pop,push}_elem helpers.  Those
operations are exposed to userspace applications through the already
existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM            -> peek
BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop
BPF_MAP_UPDATE_ELEM            -> push

Queue/stack maps are implemented using a buffer, tail and head indexes,
hence BPF_F_NO_PREALLOC is not supported.

Unlike other maps, queue and stack maps do not use RCU to protect map
values.  The bpf_map_{peek,pop}_elem helpers take an
ARG_PTR_TO_UNINIT_MAP_VALUE argument, a pointer to a memory zone in which
the value of a map is saved.  It is basically the same as
ARG_PTR_TO_UNINIT_MEM, but the size does not have to be passed as an
extra argument.

Our main motivation for implementing queue/stack maps was to keep track
of a pool of elements, such as network ports in a SNAT.  However, we
foresee other use cases, for example saving the last N kernel events in a
map and then analysing them from userspace.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
---
 include/linux/bpf.h           |    6 +
 include/linux/bpf_types.h     |    2 
 include/uapi/linux/bpf.h      |   29 ++++
 kernel/bpf/Makefile           |    2 
 kernel/bpf/core.c             |    3 
 kernel/bpf/helpers.c          |   43 ++++++
 kernel/bpf/queue_stack_maps.c |  288 +++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c          |    6 +
 kernel/bpf/verifier.c         |   19 +++
 net/core/filter.c             |    6 +
 10 files changed, 401 insertions(+), 3 deletions(-)
 create mode 100644 kernel/bpf/queue_stack_maps.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0f8b863e0229..33014ae73103 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -39,6 +39,9 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
 	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	int (*map_delete_elem)(struct bpf_map *map, void *key);
+	int (*map_push_elem)(struct bpf_map *map, void *value, u64 flags);
+	int (*map_pop_elem)(struct bpf_map *map, void *value);
+	int (*map_peek_elem)(struct bpf_map *map, void *value);
 
 	/* funcs called by prog_array and perf_event_array map */
 	void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file,
@@ -811,6 +814,9 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map,
 extern const struct bpf_func_proto bpf_map_lookup_elem_proto;
 extern const struct bpf_func_proto bpf_map_update_elem_proto;
 extern const struct bpf_func_proto bpf_map_delete_elem_proto;
+extern const struct bpf_func_proto bpf_map_push_elem_proto;
+extern const struct bpf_func_proto bpf_map_pop_elem_proto;
+extern const struct bpf_func_proto bpf_map_peek_elem_proto;
 
 extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 7bad4e1947ed..44d9ab4809bd 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -69,3 +69,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops)
 #endif
 #endif
+BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f9187b41dff6..b8fc161c5b78 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -128,6 +128,8 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_CGROUP_STORAGE,
 	BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
 	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+	BPF_MAP_TYPE_QUEUE,
+	BPF_MAP_TYPE_STACK,
 };
 
 enum bpf_prog_type {
@@ -462,6 +464,28 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
+ * 	Description
+ * 		Push an element *value* in *map*. *flags* is one of:
+ *
+ * 		**BPF_EXIST**
+ * 		If the queue/stack is full, the oldest element is removed to
+ * 		make room for this.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_pop_elem(struct bpf_map *map, void *value)
+ * 	Description
+ * 		Pop an element from *map*.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_peek_elem(struct bpf_map *map, void *value)
+ * 	Description
+ * 		Get an element from *map* without removing it.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
  * int bpf_probe_read(void *dst, u32 size, const void *src)
  * 	Description
  * 		For tracing programs, safely attempt to read *size* bytes from
@@ -2303,7 +2327,10 @@ union bpf_attr {
 	FN(skb_ancestor_cgroup_id),	\
 	FN(sk_lookup_tcp),		\
 	FN(sk_lookup_udp),		\
-	FN(sk_release),
+	FN(sk_release),			\
+	FN(map_push_elem),		\
+	FN(map_pop_elem),		\
+	FN(map_peek_elem),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index ff8262626b8f..4c2fa3ac56f6 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -3,7 +3,7 @@ obj-y := core.o
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o
 ifeq ($(CONFIG_NET),y)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index defcf4df6d91..7c7eeea8cffc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1783,6 +1783,9 @@ BPF_CALL_0(bpf_user_rnd_u32)
 const struct bpf_func_proto bpf_map_lookup_elem_proto __weak;
 const struct bpf_func_proto bpf_map_update_elem_proto __weak;
 const struct bpf_func_proto bpf_map_delete_elem_proto __weak;
+const struct bpf_func_proto bpf_map_push_elem_proto __weak;
+const struct bpf_func_proto bpf_map_pop_elem_proto __weak;
+const struct bpf_func_proto bpf_map_peek_elem_proto __weak;
 
 const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6502115e8f55..ab0d5e3f9892 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -76,6 +76,49 @@ const struct bpf_func_proto bpf_map_delete_elem_proto = {
 	.arg2_type	= ARG_PTR_TO_MAP_KEY,
 };
 
+BPF_CALL_3(bpf_map_push_elem, struct bpf_map *, map, void *, value, u64, flags)
+{
+	return map->ops->map_push_elem(map, value, flags);
+}
+
+const struct bpf_func_proto bpf_map_push_elem_proto = {
+	.func		= bpf_map_push_elem,
+	.gpl_only	= false,
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_VALUE,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_2(bpf_map_pop_elem, struct bpf_map *, map, void *, value)
+{
+	return map->ops->map_pop_elem(map, value);
+}
+
+const struct bpf_func_proto bpf_map_pop_elem_proto = {
+	.func		= bpf_map_pop_elem,
+	.gpl_only	= false,
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+};
+
+BPF_CALL_2(bpf_map_peek_elem, struct bpf_map *, map, void *, value)
+{
+	return map->ops->map_peek_elem(map, value);
+}
+
+const struct bpf_func_proto bpf_map_peek_elem_proto = {
+	.func		= bpf_map_peek_elem,
+	.gpl_only	= false,
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MAP_VALUE,
+};
+
 const struct bpf_func_proto bpf_get_prandom_u32_proto = {
 	.func		= bpf_user_rnd_u32,
 	.gpl_only	= false,
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
new file mode 100644
index 000000000000..12a93fb37449
--- /dev/null
+++ b/kernel/bpf/queue_stack_maps.c
@@ -0,0 +1,288 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * queue_stack_maps.c: BPF queue and stack maps
+ *
+ * Copyright (c) 2018 Politecnico di Torino
+ */
+#include <linux/bpf.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include "percpu_freelist.h"
+
+#define QUEUE_STACK_CREATE_FLAG_MASK \
+	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
+
+struct bpf_queue_stack {
+	struct bpf_map map;
+	raw_spinlock_t lock;
+	u32 head, tail;
+	u32 size; /* max_entries + 1 */
+
+	char elements[0] __aligned(8);
+};
+
+static struct bpf_queue_stack *bpf_queue_stack(struct bpf_map *map)
+{
+	return container_of(map, struct bpf_queue_stack, map);
+}
+
+static bool queue_stack_map_is_empty(struct bpf_queue_stack *qs)
+{
+	return qs->head == qs->tail;
+}
+
+static bool queue_stack_map_is_full(struct bpf_queue_stack *qs)
+{
+	u32 head = qs->head + 1;
+
+	if (unlikely(head >= qs->size))
+		head = 0;
+
+	return head == qs->tail;
+}
+
+/* Called from syscall */
+static int queue_stack_map_alloc_check(union bpf_attr *attr)
+{
+	/* check sanity of attributes */
+	if (attr->max_entries == 0 || attr->key_size != 0 ||
+	    attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)
+		return -EINVAL;
+
+	if (attr->value_size > KMALLOC_MAX_SIZE)
+		/* if value_size is bigger, the user space won't be able to
+		 * access the elements.
+		 */
+		return -E2BIG;
+
+	return 0;
+}
+
+static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
+{
+	int ret, numa_node = bpf_map_attr_numa_node(attr);
+	struct bpf_queue_stack *qs;
+	u32 size, value_size;
+	u64 queue_size, cost;
+
+	size = attr->max_entries + 1;
+	value_size = attr->value_size;
+
+	queue_size = sizeof(*qs) + (u64) value_size * size;
+
+	cost = queue_size;
+	if (cost >= U32_MAX - PAGE_SIZE)
+		return ERR_PTR(-E2BIG);
+
+	cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+	ret = bpf_map_precharge_memlock(cost);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	qs = bpf_map_area_alloc(queue_size, numa_node);
+	if (!qs)
+		return ERR_PTR(-ENOMEM);
+
+	memset(qs, 0, sizeof(*qs));
+
+	bpf_map_init_from_attr(&qs->map, attr);
+
+	qs->map.pages = cost;
+	qs->size = size;
+
+	raw_spin_lock_init(&qs->lock);
+
+	return &qs->map;
+}
+
+/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
+static void queue_stack_map_free(struct bpf_map *map)
+{
+	struct bpf_queue_stack *qs = bpf_queue_stack(map);
+
+	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
+	 * so the programs (can be more than one that used this map) were
+	 * disconnected from events. Wait for outstanding critical sections in
+	 * these programs to complete
+	 */
+	synchronize_rcu();
+
+	bpf_map_area_free(qs);
+}
+
+static int __queue_map_get(struct bpf_map *map, void *value, bool delete)
+{
+	struct bpf_queue_stack *qs = bpf_queue_stack(map);
+	unsigned long flags;
+	int err = 0;
+	void *ptr;
+
+	raw_spin_lock_irqsave(&qs->lock, flags);
+
+	if (queue_stack_map_is_empty(qs)) {
+		err = -ENOENT;
+		goto out;
+	}
+
+	ptr = &qs->elements[qs->tail * qs->map.value_size];
+	memcpy(value, ptr, qs->map.value_size);
+
+	if (delete) {
+		if (unlikely(++qs->tail >= qs->size))
+			qs->tail = 0;
+	}
+
+out:
+	raw_spin_unlock_irqrestore(&qs->lock, flags);
+	return err;
+}
+
+
+static int __stack_map_get(struct bpf_map *map, void *value, bool delete)
+{
+	struct bpf_queue_stack *qs = bpf_queue_stack(map);
+	unsigned long flags;
+	int err = 0;
+	void *ptr;
+	u32 index;
+
+	raw_spin_lock_irqsave(&qs->lock, flags);
+
+	if (queue_stack_map_is_empty(qs)) {
+		err = -ENOENT;
+		goto out;
+	}
+
+	index = qs->head - 1;
+	if (unlikely(index >= qs->size))
+		index = qs->size - 1;
+
+	ptr = &qs->elements[index * qs->map.value_size];
+	memcpy(value, ptr, qs->map.value_size);
+
+	if (delete)
+		qs->head = index;
+
+out:
+	raw_spin_unlock_irqrestore(&qs->lock, flags);
+	return err;
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_map_peek_elem(struct bpf_map *map, void *value)
+{
+	return __queue_map_get(map, value, false);
+}
+
+/* Called from syscall or from eBPF program */
+static int stack_map_peek_elem(struct bpf_map *map, void *value)
+{
+	return __stack_map_get(map, value, false);
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_map_pop_elem(struct bpf_map *map, void *value)
+{
+	return __queue_map_get(map, value, true);
+}
+
+/* Called from syscall or from eBPF program */
+static int stack_map_pop_elem(struct bpf_map *map, void *value)
+{
+	return __stack_map_get(map, value, true);
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_stack_map_push_elem(struct bpf_map *map, void *value,
+				     u64 flags)
+{
+	struct bpf_queue_stack *qs = bpf_queue_stack(map);
+	unsigned long irq_flags;
+	int err = 0;
+	void *dst;
+
+	/* BPF_EXIST is used to force making room for a new element in case the
+	 * map is full
+	 */
+	bool replace = (flags & BPF_EXIST);
+
+	/* Check supported flags for queue and stack maps */
+	if (flags & BPF_NOEXIST || flags > BPF_EXIST)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&qs->lock, irq_flags);
+
+	if (queue_stack_map_is_full(qs)) {
+		if (!replace) {
+			err = -E2BIG;
+			goto out;
+		}
+		/* advance tail pointer to overwrite oldest element */
+		if (unlikely(++qs->tail >= qs->size))
+			qs->tail = 0;
+	}
+
+	dst = &qs->elements[qs->head * qs->map.value_size];
+	memcpy(dst, value, qs->map.value_size);
+
+	if (unlikely(++qs->head >= qs->size))
+		qs->head = 0;
+
+out:
+	raw_spin_unlock_irqrestore(&qs->lock, irq_flags);
+	return err;
+}
+
+/* Called from syscall or from eBPF program */
+static void *queue_stack_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	return NULL;
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_stack_map_update_elem(struct bpf_map *map, void *key,
+				       void *value, u64 flags)
+{
+	return -EINVAL;
+}
+
+/* Called from syscall or from eBPF program */
+static int queue_stack_map_delete_elem(struct bpf_map *map, void *key)
+{
+	return -EINVAL;
+}
+
+/* Called from syscall */
+static int queue_stack_map_get_next_key(struct bpf_map *map, void *key,
+					void *next_key)
+{
+	return -EINVAL;
+}
+
+const struct bpf_map_ops queue_map_ops = {
+	.map_alloc_check = queue_stack_map_alloc_check,
+	.map_alloc = queue_stack_map_alloc,
+	.map_free = queue_stack_map_free,
+	.map_lookup_elem = queue_stack_map_lookup_elem,
+	.map_update_elem = queue_stack_map_update_elem,
+	.map_delete_elem = queue_stack_map_delete_elem,
+	.map_push_elem = queue_stack_map_push_elem,
+	.map_pop_elem = queue_map_pop_elem,
+	.map_peek_elem = queue_map_peek_elem,
+	.map_get_next_key = queue_stack_map_get_next_key,
+};
+
+const struct bpf_map_ops stack_map_ops = {
+	.map_alloc_check = queue_stack_map_alloc_check,
+	.map_alloc = queue_stack_map_alloc,
+	.map_free = queue_stack_map_free,
+	.map_lookup_elem = queue_stack_map_lookup_elem,
+	.map_update_elem = queue_stack_map_update_elem,
+	.map_delete_elem = queue_stack_map_delete_elem,
+	.map_push_elem = queue_stack_map_push_elem,
+	.map_pop_elem = stack_map_pop_elem,
+	.map_peek_elem = stack_map_peek_elem,
+	.map_get_next_key = queue_stack_map_get_next_key,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 78d9dd95e25f..1617407f9ee5 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -727,6 +727,9 @@ static int map_lookup_elem(union bpf_attr *attr)
 		err = bpf_fd_htab_map_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
+		   map->map_type == BPF_MAP_TYPE_STACK) {
+		err = map->ops->map_peek_elem(map, value);
 	} else {
 		rcu_read_lock();
 		ptr = map->ops->map_lookup_elem(map, key);
@@ -857,6 +860,9 @@ static int map_update_elem(union bpf_attr *attr)
 		/* rcu_read_lock() is not needed */
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
 							 attr->flags);
+	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
+		   map->map_type == BPF_MAP_TYPE_STACK) {
+		err = map->ops->map_push_elem(map, value, attr->flags);
 	} else {
 		rcu_read_lock();
 		err = map->ops->map_update_elem(map, key, value, attr->flags);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d84c91ac3b70..7d6d9cf9ebd5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2324,6 +2324,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (func_id != BPF_FUNC_sk_select_reuseport)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_QUEUE:
+	case BPF_MAP_TYPE_STACK:
+		if (func_id != BPF_FUNC_map_peek_elem &&
+		    func_id != BPF_FUNC_map_pop_elem &&
+		    func_id != BPF_FUNC_map_push_elem)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -2380,6 +2387,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
 			goto error;
 		break;
+	case BPF_FUNC_map_peek_elem:
+	case BPF_FUNC_map_pop_elem:
+	case BPF_FUNC_map_push_elem:
+		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+		    map->map_type != BPF_MAP_TYPE_STACK)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -2675,7 +2689,10 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	if (func_id != BPF_FUNC_tail_call &&
 	    func_id != BPF_FUNC_map_lookup_elem &&
 	    func_id != BPF_FUNC_map_update_elem &&
-	    func_id != BPF_FUNC_map_delete_elem)
+	    func_id != BPF_FUNC_map_delete_elem &&
+	    func_id != BPF_FUNC_map_push_elem &&
+	    func_id != BPF_FUNC_map_pop_elem &&
+	    func_id != BPF_FUNC_map_peek_elem)
 		return 0;
 
 	if (meta->map_ptr == NULL) {
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..ea48ec789b5c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4876,6 +4876,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_map_update_elem_proto;
 	case BPF_FUNC_map_delete_elem:
 		return &bpf_map_delete_elem_proto;
+	case BPF_FUNC_map_push_elem:
+		return &bpf_map_push_elem_proto;
+	case BPF_FUNC_map_pop_elem:
+		return &bpf_map_pop_elem_proto;
+	case BPF_FUNC_map_peek_elem:
+		return &bpf_map_peek_elem_proto;
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_get_smp_processor_id:


* [PATCH bpf-next v3 6/7] Sync uapi/bpf.h to tools/include
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

Sync both files.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
---
 tools/include/uapi/linux/bpf.h |   30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f9187b41dff6..c8824d5364ff 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -103,6 +103,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 };
 
 enum bpf_map_type {
@@ -128,6 +129,8 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_CGROUP_STORAGE,
 	BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
 	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+	BPF_MAP_TYPE_QUEUE,
+	BPF_MAP_TYPE_STACK,
 };
 
 enum bpf_prog_type {
@@ -462,6 +465,28 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags)
+ * 	Description
+ * 		Push an element *value* in *map*. *flags* is one of:
+ *
+ * 		**BPF_EXIST**
+ * 		If the queue/stack is full, the oldest element is removed to
+ * 		make room for this.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_pop_elem(struct bpf_map *map, void *value)
+ * 	Description
+ * 		Pop an element from *map*.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_map_peek_elem(struct bpf_map *map, void *value)
+ * 	Description
+ * 		Get an element from *map* without removing it.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
  * int bpf_probe_read(void *dst, u32 size, const void *src)
  * 	Description
  * 		For tracing programs, safely attempt to read *size* bytes from
@@ -2303,7 +2328,10 @@ union bpf_attr {
 	FN(skb_ancestor_cgroup_id),	\
 	FN(sk_lookup_tcp),		\
 	FN(sk_lookup_udp),		\
-	FN(sk_release),
+	FN(sk_release),			\
+	FN(map_push_elem),		\
+	FN(map_pop_elem),		\
+	FN(map_peek_elem),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call


* [PATCH bpf-next v3 7/7] selftests/bpf: add test cases for queue and stack maps
From: Mauricio Vasquez B @ 2018-10-18 13:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, netdev; +Cc: Song Liu
In-Reply-To: <153986856416.9127.9618539079636149043.stgit@kernel>

test_maps:
Tests that queue/stack maps behave correctly, even in corner cases

test_progs:
Tests the new eBPF helpers

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
---
 tools/lib/bpf/bpf.c                                |   12 ++
 tools/lib/bpf/bpf.h                                |    2 
 tools/testing/selftests/bpf/Makefile               |    5 +
 tools/testing/selftests/bpf/bpf_helpers.h          |    7 +
 tools/testing/selftests/bpf/test_maps.c            |  122 ++++++++++++++++++++
 tools/testing/selftests/bpf/test_progs.c           |   99 ++++++++++++++++
 tools/testing/selftests/bpf/test_queue_map.c       |    4 +
 tools/testing/selftests/bpf/test_queue_stack_map.h |   59 ++++++++++
 tools/testing/selftests/bpf/test_stack_map.c       |    4 +
 9 files changed, 313 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_queue_map.c
 create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h
 create mode 100644 tools/testing/selftests/bpf/test_stack_map.c

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index d70a255cb05e..03f9bcc4ef50 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -278,6 +278,18 @@ int bpf_map_lookup_elem(int fd, const void *key, void *value)
 	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
 }
 
+int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value)
+{
+	union bpf_attr attr;
+
+	bzero(&attr, sizeof(attr));
+	attr.map_fd = fd;
+	attr.key = ptr_to_u64(key);
+	attr.value = ptr_to_u64(value);
+
+	return sys_bpf(BPF_MAP_LOOKUP_AND_DELETE_ELEM, &attr, sizeof(attr));
+}
+
 int bpf_map_delete_elem(int fd, const void *key)
 {
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 258c3c178333..26a51538213c 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -99,6 +99,8 @@ LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
 				   __u64 flags);
 
 LIBBPF_API int bpf_map_lookup_elem(int fd, const void *key, void *value);
+LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
+					      void *value);
 LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
 LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
 LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index d99dd6fc3fbe..e39dfb4e7970 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -37,7 +37,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
 	get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
 	test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o \
-	test_sk_lookup_kern.o test_xdp_vlan.o
+	test_sk_lookup_kern.o test_xdp_vlan.o test_queue_map.o test_stack_map.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -118,6 +118,9 @@ CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
 $(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline
 $(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
 
+$(OUTPUT)/test_queue_map.o: test_queue_stack_map.h
+$(OUTPUT)/test_stack_map.o: test_queue_stack_map.h
+
 BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
 BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
 BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index fda8c162d0df..6407a3df0f3b 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -16,6 +16,13 @@ static int (*bpf_map_update_elem)(void *map, void *key, void *value,
 	(void *) BPF_FUNC_map_update_elem;
 static int (*bpf_map_delete_elem)(void *map, void *key) =
 	(void *) BPF_FUNC_map_delete_elem;
+static int (*bpf_map_push_elem)(void *map, void *value,
+				unsigned long long flags) =
+	(void *) BPF_FUNC_map_push_elem;
+static int (*bpf_map_pop_elem)(void *map, void *value) =
+	(void *) BPF_FUNC_map_pop_elem;
+static int (*bpf_map_peek_elem)(void *map, void *value) =
+	(void *) BPF_FUNC_map_peek_elem;
 static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) =
 	(void *) BPF_FUNC_probe_read;
 static unsigned long long (*bpf_ktime_get_ns)(void) =
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 9b552c0fc47d..4db2116e52be 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -15,6 +15,7 @@
 #include <string.h>
 #include <assert.h>
 #include <stdlib.h>
+#include <time.h>
 
 #include <sys/wait.h>
 #include <sys/socket.h>
@@ -471,6 +472,122 @@ static void test_devmap(int task, void *data)
 	close(fd);
 }
 
+static void test_queuemap(int task, void *data)
+{
+	const int MAP_SIZE = 32;
+	__u32 vals[MAP_SIZE + MAP_SIZE/2], val;
+	int fd, i;
+
+	/* Fill test values to be used */
+	for (i = 0; i < MAP_SIZE + MAP_SIZE/2; i++)
+		vals[i] = rand();
+
+	/* Invalid key size */
+	fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 4, sizeof(val), MAP_SIZE,
+			    map_flags);
+	assert(fd < 0 && errno == EINVAL);
+
+	fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(val), MAP_SIZE,
+			    map_flags);
+	/* Queue map does not support BPF_F_NO_PREALLOC */
+	if (map_flags & BPF_F_NO_PREALLOC) {
+		assert(fd < 0 && errno == EINVAL);
+		return;
+	}
+	if (fd < 0) {
+		printf("Failed to create queuemap '%s'!\n", strerror(errno));
+		exit(1);
+	}
+
+	/* Push MAP_SIZE elements */
+	for (i = 0; i < MAP_SIZE; i++)
+		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
+
+	/* Check that element cannot be pushed due to max_entries limit */
+	assert(bpf_map_update_elem(fd, NULL, &val, 0) == -1 &&
+	       errno == E2BIG);
+
+	/* Peek element */
+	assert(bpf_map_lookup_elem(fd, NULL, &val) == 0 && val == vals[0]);
+
+	/* Replace half elements */
+	for (i = MAP_SIZE; i < MAP_SIZE + MAP_SIZE/2; i++)
+		assert(bpf_map_update_elem(fd, NULL, &vals[i], BPF_EXIST) == 0);
+
+	/* Pop all elements */
+	for (i = MAP_SIZE/2; i < MAP_SIZE + MAP_SIZE/2; i++)
+		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
+		       val == vals[i]);
+
+	/* Check that there are no elements left */
+	assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == -1 &&
+	       errno == ENOENT);
+
+	/* Check that unsupported functions set errno to EINVAL */
+	assert(bpf_map_delete_elem(fd, NULL) == -1 && errno == EINVAL);
+	assert(bpf_map_get_next_key(fd, NULL, NULL) == -1 && errno == EINVAL);
+
+	close(fd);
+}
+
+static void test_stackmap(int task, void *data)
+{
+	const int MAP_SIZE = 32;
+	__u32 vals[MAP_SIZE + MAP_SIZE/2], val;
+	int fd, i;
+
+	/* Fill test values to be used */
+	for (i = 0; i < MAP_SIZE + MAP_SIZE/2; i++)
+		vals[i] = rand();
+
+	/* Invalid key size */
+	fd = bpf_create_map(BPF_MAP_TYPE_STACK, 4, sizeof(val), MAP_SIZE,
+			    map_flags);
+	assert(fd < 0 && errno == EINVAL);
+
+	fd = bpf_create_map(BPF_MAP_TYPE_STACK, 0, sizeof(val), MAP_SIZE,
+			    map_flags);
+	/* Stack map does not support BPF_F_NO_PREALLOC */
+	if (map_flags & BPF_F_NO_PREALLOC) {
+		assert(fd < 0 && errno == EINVAL);
+		return;
+	}
+	if (fd < 0) {
+		printf("Failed to create stackmap '%s'!\n", strerror(errno));
+		exit(1);
+	}
+
+	/* Push MAP_SIZE elements */
+	for (i = 0; i < MAP_SIZE; i++)
+		assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0);
+
+	/* Check that element cannot be pushed due to max_entries limit */
+	assert(bpf_map_update_elem(fd, NULL, &val, 0) == -1 &&
+	       errno == E2BIG);
+
+	/* Peek element */
+	assert(bpf_map_lookup_elem(fd, NULL, &val) == 0 && val == vals[i - 1]);
+
+	/* Replace half elements */
+	for (i = MAP_SIZE; i < MAP_SIZE + MAP_SIZE/2; i++)
+		assert(bpf_map_update_elem(fd, NULL, &vals[i], BPF_EXIST) == 0);
+
+	/* Pop all elements */
+	for (i = MAP_SIZE + MAP_SIZE/2 - 1; i >= MAP_SIZE/2; i--)
+		assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 &&
+		       val == vals[i]);
+
+	/* Check that there are no elements left */
+	assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == -1 &&
+	       errno == ENOENT);
+
+	/* Check that unsupported functions set errno to EINVAL */
+	assert(bpf_map_delete_elem(fd, NULL) == -1 && errno == EINVAL);
+	assert(bpf_map_get_next_key(fd, NULL, NULL) == -1 && errno == EINVAL);
+
+	close(fd);
+}
+
 #include <sys/socket.h>
 #include <sys/ioctl.h>
 #include <arpa/inet.h>
@@ -1434,10 +1551,15 @@ static void run_all_tests(void)
 	test_map_wronly();
 
 	test_reuseport_array();
+
+	test_queuemap(0, NULL);
+	test_stackmap(0, NULL);
 }
 
 int main(void)
 {
+	srand(time(NULL));
+
 	map_flags = 0;
 	run_all_tests();
 
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e8becca9c521..2d3c04f45530 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1735,8 +1735,105 @@ static void test_reference_tracking()
 	bpf_object__close(obj);
 }
 
+enum {
+	QUEUE,
+	STACK,
+};
+
+static void test_queue_stack_map(int type)
+{
+	const int MAP_SIZE = 32;
+	__u32 vals[MAP_SIZE], duration, retval, size, val;
+	int i, err, prog_fd, map_in_fd, map_out_fd;
+	char file[32], buf[128];
+	struct bpf_object *obj;
+	struct iphdr *iph = (void *)buf + sizeof(struct ethhdr);
+
+	/* Fill test values to be used */
+	for (i = 0; i < MAP_SIZE; i++)
+		vals[i] = rand();
+
+	if (type == QUEUE)
+		strncpy(file, "./test_queue_map.o", sizeof(file));
+	else if (type == STACK)
+		strncpy(file, "./test_stack_map.o", sizeof(file));
+	else
+		return;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
+	if (err) {
+		error_cnt++;
+		return;
+	}
+
+	map_in_fd = bpf_find_map(__func__, obj, "map_in");
+	if (map_in_fd < 0)
+		goto out;
+
+	map_out_fd = bpf_find_map(__func__, obj, "map_out");
+	if (map_out_fd < 0)
+		goto out;
+
+	/* Push 32 elements to the input map */
+	for (i = 0; i < MAP_SIZE; i++) {
+		err = bpf_map_update_elem(map_in_fd, NULL, &vals[i], 0);
+		if (err) {
+			error_cnt++;
+			goto out;
+		}
+	}
+
+	/* The eBPF program pushes iph.saddr in the output map,
+	 * pops the input map and saves this value in iph.daddr
+	 */
+	for (i = 0; i < MAP_SIZE; i++) {
+		if (type == QUEUE) {
+			val = vals[i];
+			pkt_v4.iph.saddr = vals[i] * 5;
+		} else if (type == STACK) {
+			val = vals[MAP_SIZE - 1 - i];
+			pkt_v4.iph.saddr = vals[MAP_SIZE - 1 - i] * 5;
+		}
+
+		err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+					buf, &size, &retval, &duration);
+		if (err || retval || size != sizeof(pkt_v4) ||
+		    iph->daddr != val)
+			break;
+	}
+
+	CHECK(err || retval || size != sizeof(pkt_v4) || iph->daddr != val,
+	      "bpf_map_pop_elem",
+	      "err %d errno %d retval %d size %d iph->daddr %u\n",
+	      err, errno, retval, size, iph->daddr);
+
+	/* Queue is empty, program should return TC_ACT_SHOT */
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != 2 /* TC_ACT_SHOT */|| size != sizeof(pkt_v4),
+	      "check-queue-stack-map-empty",
+	      "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	/* Check that the program pushed elements correctly */
+	for (i = 0; i < MAP_SIZE; i++) {
+		err = bpf_map_lookup_and_delete_elem(map_out_fd, NULL, &val);
+		if (err || val != vals[i] * 5)
+			break;
+	}
+
+	CHECK(i != MAP_SIZE && (err || val != vals[i] * 5),
+	      "bpf_map_push_elem", "err %d value %u\n", err, val);
+
+out:
+	pkt_v4.iph.saddr = 0;
+	bpf_object__close(obj);
+}
+
 int main(void)
 {
+	srand(time(NULL));
+
 	jit_enabled = is_jit_enabled();
 
 	test_pkt_access();
@@ -1757,6 +1854,8 @@ int main(void)
 	test_task_fd_query_rawtp();
 	test_task_fd_query_tp();
 	test_reference_tracking();
+	test_queue_stack_map(QUEUE);
+	test_queue_stack_map(STACK);
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
diff --git a/tools/testing/selftests/bpf/test_queue_map.c b/tools/testing/selftests/bpf/test_queue_map.c
new file mode 100644
index 000000000000..87db1f9da33d
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_queue_map.c
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Politecnico di Torino
+#define MAP_TYPE BPF_MAP_TYPE_QUEUE
+#include "test_queue_stack_map.h"
diff --git a/tools/testing/selftests/bpf/test_queue_stack_map.h b/tools/testing/selftests/bpf/test_queue_stack_map.h
new file mode 100644
index 000000000000..295b9b3bc5c7
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_queue_stack_map.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (c) 2018 Politecnico di Torino
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/pkt_cls.h>
+#include "bpf_helpers.h"
+
+int _version SEC("version") = 1;
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) map_in = {
+	.type = MAP_TYPE,
+	.key_size = 0,
+	.value_size = sizeof(__u32),
+	.max_entries = 32,
+	.map_flags = 0,
+};
+
+struct bpf_map_def __attribute__ ((section("maps"), used)) map_out = {
+	.type = MAP_TYPE,
+	.key_size = 0,
+	.value_size = sizeof(__u32),
+	.max_entries = 32,
+	.map_flags = 0,
+};
+
+SEC("test")
+int _test(struct __sk_buff *skb)
+{
+	void *data_end = (void *)(long)skb->data_end;
+	void *data = (void *)(long)skb->data;
+	struct ethhdr *eth = (struct ethhdr *)(data);
+	__u32 value;
+	int err;
+
+	if (eth + 1 > data_end)
+		return TC_ACT_SHOT;
+
+	struct iphdr *iph = (struct iphdr *)(eth + 1);
+
+	if (iph + 1 > data_end)
+		return TC_ACT_SHOT;
+
+	err = bpf_map_pop_elem(&map_in, &value);
+	if (err)
+		return TC_ACT_SHOT;
+
+	iph->daddr = value;
+
+	err = bpf_map_push_elem(&map_out, &iph->saddr, 0);
+	if (err)
+		return TC_ACT_SHOT;
+
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_stack_map.c b/tools/testing/selftests/bpf/test_stack_map.c
new file mode 100644
index 000000000000..31c3880e6da0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_stack_map.c
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Politecnico di Torino
+#define MAP_TYPE BPF_MAP_TYPE_STACK
+#include "test_queue_stack_map.h"

^ permalink raw reply related
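
The selftest in the patch above expects a queue map to return values in insertion order (vals[i]) and a stack map to return them in reverse (vals[MAP_SIZE - 1 - i]). A minimal user-space sketch of that ordering follows. It is not the kernel implementation and does not use the BPF API; `toy_map`, `toy_push` and `toy_pop` are hypothetical names chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAP_SIZE 32

enum toy_type { TOY_QUEUE, TOY_STACK };

/* Hypothetical user-space model; not the kernel's queue/stack map. */
struct toy_map {
	enum toy_type type;
	uint32_t vals[TOY_MAP_SIZE];
	size_t head;	/* index of the oldest element */
	size_t count;	/* number of stored elements */
};

/* Returns 0 on success, -1 if the map is full. */
static int toy_push(struct toy_map *m, uint32_t v)
{
	if (m->count == TOY_MAP_SIZE)
		return -1;
	m->vals[(m->head + m->count) % TOY_MAP_SIZE] = v;
	m->count++;
	return 0;
}

/* Returns 0 on success, -1 if the map is empty. */
static int toy_pop(struct toy_map *m, uint32_t *out)
{
	if (m->count == 0)
		return -1;
	if (m->type == TOY_QUEUE) {
		/* FIFO: hand out the oldest element and advance the head. */
		*out = m->vals[m->head];
		m->head = (m->head + 1) % TOY_MAP_SIZE;
	} else {
		/* LIFO: hand out the most recently pushed element. */
		*out = m->vals[(m->head + m->count - 1) % TOY_MAP_SIZE];
	}
	m->count--;
	return 0;
}
```

Popping from an empty toy map returns -1, which loosely mirrors the case where the test program is expected to return TC_ACT_SHOT once the input map has been drained.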

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Rafael David Tinoco @ 2018-10-18 13:21 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Sebastian Andrzej Siewior, Jason Wang, netdev, virtualization,
	tglx, Michael S. Tsirkin, David S. Miller
In-Reply-To: <55f14915-744b-e11c-bc50-87a872218479@lab.ntt.co.jp>

On Thu, Oct 18, 2018 at 6:19 AM, Toshiaki Makita
<makita.toshiaki@lab.ntt.co.jp> wrote:
> On 2018/10/18 18:08, Sebastian Andrzej Siewior wrote:
>> On 2018-10-18 18:00:05 [+0900], Toshiaki Makita wrote:
>>> On 2018/10/18 17:47, Sebastian Andrzej Siewior wrote:
>>>> On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
>>>>>
>>>>> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
>>>>>> I'm not sure what condition triggered this warning.
>>>>
>>>> If the seqlock is acquired once in softirq and then in process context
>>>> again it is enough evidence for lockdep to trigger this warning.
>>>
>>> No. As I said that should not happen because of NAPI guard.
>> Again: lockdep saw the lock in softirq context once and in process
>> context once and this is what triggers the warning. It does not matter
>> if NAPI is enabled or not during the access in process context. If you
>> want to allow this you need further lockdep annotation…
>>
>> … but: refill_work() disables NAPI for &vi->rq[1] and refills + updates
>> stats while NAPI is enabled for &vi->rq[0].
>
> Do you mean this is false positive? rq[0] and rq[1] never race with each
> other...
>

I just came to this thread after having the same "false positive"
warning on an armhf kvm guest dmesg.

It appears to me that, at least for my case, the sequence:

u64_stats_update_begin() -> write_seqcount_begin() ->
write_seqcount_begin_nested() -> raw_write_seqcount_begin()

is only incrementing s->sequence++. With that, whenever we have:

CONFIG_TRACE_IRQ_FLAGS and CONFIG_DEBUG_LOCK_ALLOC enabled

we might face this false-positive warning since there are no locks,
but just a sequencer, right? So having a barrier after incrementing
the sequence, like I have now, won't prevent the other context from
"acquiring" the "same lock" (not a lock in this particular case), which
triggers the warning raised in "seqcount_acquire()".

Hope this helps the discussion.

Link: https://bugs.linaro.org/show_bug.cgi?id=4027

Thanks

Rafael Tinoco

^ permalink raw reply
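
The message above notes that the write side of a sequence counter only increments a counter rather than taking a lock. A toy, single-threaded sketch of that idea follows, under loud assumptions: `toy_seqcount` and its helpers are hypothetical, there are no memory barriers and no lockdep annotations, and the reader is simplified so that a read started during a write is reported as needing a retry instead of spinning.

```c
#include <stdbool.h>

/* Toy model of a sequence counter; illustration only, not seqcount_t. */
struct toy_seqcount {
	unsigned int sequence;
};

/* Writer entry: the counter becomes odd while an update is in progress. */
static void toy_write_begin(struct toy_seqcount *s)
{
	s->sequence++;
}

/* Writer exit: the counter becomes even again. */
static void toy_write_end(struct toy_seqcount *s)
{
	s->sequence++;
}

/* Reader entry: snapshot the counter before reading the protected data. */
static unsigned int toy_read_begin(const struct toy_seqcount *s)
{
	return s->sequence;
}

/* Reader exit: retry if a write was in progress or happened meanwhile. */
static bool toy_read_retry(const struct toy_seqcount *s, unsigned int start)
{
	return (start & 1) || s->sequence != start;
}
```

Nothing here ever blocks: a second writer entering between toy_write_begin() and toy_write_end() would simply corrupt the odd/even protocol, which is why the real write side has to be serialized by other means.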

* Re: [danielwa@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
From: Daniel Walker @ 2018-10-18 13:42 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: Hemant Ramdasi, netdev
In-Reply-To: <HE1PR04MB114545F641D276841BA2656F96F80@HE1PR04MB1145.eurprd04.prod.outlook.com>

On Thu, Oct 18, 2018 at 12:16:06PM +0000, Claudiu Manoil wrote:
> Hi,
> 
> Sorry but I never heard about the phy you're quoting, this m88e1101, what is it?
> Link mode? (SGMII, RGMII, ?)
> Our boards (the ones I know) have Vitesse or Atheros phys.
> If the maccfg2 setting you're mentioning really makes the difference, then it looks
> like your phy enters in 10/100 Mbit or half duplex operation mode after MAC reset,
> aka lower speed MII mode, whereas the INIT_SETTINGS set up the MAC to operate
> in 1000 full duplex mode (GMII mode) by default.
> Link speed settings for the MACCFG2 register should be later adjusted via adjust_link() callback,
> so that if the initial maccfg2 settings don't match with the phy settings they will be adjusted
> by phylib's adjust_link().  For some reason this doesn't seem to happen on your setup either.
> So, could you please confirm whether after MAC reset your phy enters lower speed mode (MII),
> and whether the adjust_link() callback is getting invoked after ifconfig up?
> 


It's a Marvell phy. This is not an eval board from NXP; it's custom hardware. The link on this board
is set up to run at 100Mbps. Here's a snippet of the logs during a test run.

IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready

fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
PING 10.126.154.1 (10.126.154.1): 56 data bytes
64 bytes from 10.126.154.1: seq=0 ttl=255 time=2.101 ms

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss


I can check if adjust_link() is running. This kernel has only very few changes to allow the hardware to work,
all isolated under arch/powerpc/, certainly no changes under drivers/. So if it's supposed to be running
there is no reason why it wouldn't be.

Daniel

^ permalink raw reply

* [iproute PATCH] ip-route: Fix parse_encap_seg6() srh parsing
From: Phil Sutter @ 2018-10-18 13:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Lebrun, netdev

In case caller did not specify 'segs' parameter, parse_srh() would read
garbage while iterating over 'segbuf'. Avoid this by initializing
'segbuf' to an empty string.

Fixes: e8493916a8ede ("iproute: add support for SR-IPv6 lwtunnel encapsulation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 ip/iproute_lwtunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index 85045d4fff742..4ebfaa7cd6826 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -494,7 +494,7 @@ static int parse_encap_seg6(struct rtattr *rta, size_t len, int *argcp,
 	struct seg6_iptunnel_encap *tuninfo;
 	struct ipv6_sr_hdr *srh;
 	char **argv = *argvp;
-	char segbuf[1024];
+	char segbuf[1024] = "";
 	int argc = *argcp;
 	int encap = -1;
 	__u32 hmac = 0;
-- 
2.19.0

^ permalink raw reply related
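
The one-line fix above relies on the C rule that initializing a char array with "" zero-fills the whole array, so a parser walking the buffer stops immediately when nothing was copied into it. A minimal sketch of that behaviour follows. `toy_count_segs` and `toy_parse` are hypothetical helpers, only loosely modelled on the idea of iterating over a comma-separated segment list; they are not the iproute2 code.

```c
#include <stddef.h>

/* Hypothetical helper: counts comma-separated, non-empty tokens. */
static int toy_count_segs(const char *segbuf)
{
	int n = 0;
	int in_token = 0;

	for (const char *p = segbuf; *p; p++) {
		if (*p == ',') {
			in_token = 0;
		} else if (!in_token) {
			in_token = 1;
			n++;
		}
	}
	return n;
}

/* Models a caller that only fills the buffer when "segs" was given. */
static int toy_parse(const char *segs_arg)
{
	char segbuf[1024] = "";	/* every byte of the array starts as zero */
	size_t i;

	if (segs_arg) {
		for (i = 0; segs_arg[i] && i < sizeof(segbuf) - 1; i++)
			segbuf[i] = segs_arg[i];
		segbuf[i] = '\0';
	}

	/* Without the initializer, this would read indeterminate bytes
	 * whenever segs_arg is NULL.
	 */
	return toy_count_segs(segbuf);
}
```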

* [PATCH] qed: fix spelling mistake "transcevier" -> "transceiver"
From: Colin King @ 2018-10-18 21:47 UTC (permalink / raw)
  To: Ariel Elior, everest-linux-l2, David S . Miller, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in DP_INFO message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 554d57ac1629..386ee5410237 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -2035,7 +2035,7 @@ int qed_mcp_trans_speed_mask(struct qed_hwfn *p_hwfn,
 		    NVM_CFG1_PORT_DRV_SPEED_CAPABILITY_MASK_1G;
 		break;
 	default:
-		DP_INFO(p_hwfn, "Unknown transcevier type 0x%x\n",
+		DP_INFO(p_hwfn, "Unknown transceiver type 0x%x\n",
 			transceiver_type);
 		*p_speed_mask = 0xff;
 		break;
-- 
2.19.1

^ permalink raw reply related

* [iproute PATCH] tipc: Drop unused variable 'genl'
From: Phil Sutter @ 2018-10-18 13:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Richard Alpe, netdev

Although initialized by call to libmnl, the variable is used only in a
call to sizeof(). Drop it and call sizeof with its type instead.

Fixes: f043759dd4928 ("tipc: add new TIPC configuration tool")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 tipc/node.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tipc/node.c b/tipc/node.c
index 0fa1064c72a17..2fec6753c974d 100644
--- a/tipc/node.c
+++ b/tipc/node.c
@@ -26,13 +26,12 @@
 
 static int node_list_cb(const struct nlmsghdr *nlh, void *data)
 {
-	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
 	struct nlattr *info[TIPC_NLA_MAX + 1] = {};
 	struct nlattr *attrs[TIPC_NLA_NODE_MAX + 1] = {};
 	char str[33] = {};
 	uint32_t addr;
 
-	mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
+	mnl_attr_parse(nlh, sizeof(struct genlmsghdr), parse_attrs, info);
 	if (!info[TIPC_NLA_NODE])
 		return MNL_CB_ERROR;
 
@@ -160,7 +159,6 @@ static int cmd_node_set_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 
 static int nodeid_get_cb(const struct nlmsghdr *nlh, void *data)
 {
-	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
 	struct nlattr *info[TIPC_NLA_MAX + 1] = {};
 	struct nlattr *attrs[TIPC_NLA_NET_MAX + 1] = {};
 	char str[33] = {0,};
@@ -168,7 +166,7 @@ static int nodeid_get_cb(const struct nlmsghdr *nlh, void *data)
 	uint64_t *w0 = (uint64_t *) &id[0];
 	uint64_t *w1 = (uint64_t *) &id[8];
 
-	mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
+	mnl_attr_parse(nlh, sizeof(struct genlmsghdr), parse_attrs, info);
 	if (!info[TIPC_NLA_NET])
 		return MNL_CB_ERROR;
 
@@ -207,11 +205,10 @@ static int cmd_node_get_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 
 static int netid_get_cb(const struct nlmsghdr *nlh, void *data)
 {
-	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
 	struct nlattr *info[TIPC_NLA_MAX + 1] = {};
 	struct nlattr *attrs[TIPC_NLA_NET_MAX + 1] = {};
 
-	mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
+	mnl_attr_parse(nlh, sizeof(struct genlmsghdr), parse_attrs, info);
 	if (!info[TIPC_NLA_NET])
 		return MNL_CB_ERROR;
 
-- 
2.19.0

^ permalink raw reply related
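
The patch above replaces sizeof(*genl) with sizeof(struct genlmsghdr). Both forms yield the same value because sizeof does not evaluate its operand (outside of variable-length arrays), so the pointer only served to name the type. A small sketch, using a hypothetical `struct toy_hdr` in place of the real header:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout, standing in for struct genlmsghdr. */
struct toy_hdr {
	uint8_t cmd;
	uint8_t version;
	uint16_t reserved;
};

/* Old style: needs a pointer variable that is otherwise unused. */
static size_t toy_offset_via_pointer(const void *payload)
{
	const struct toy_hdr *hdr = payload;

	return sizeof(*hdr);	/* operand is not evaluated */
}

/* New style: name the type directly, no variable required. */
static size_t toy_offset_via_type(void)
{
	return sizeof(struct toy_hdr);
}
```

Because the operand is never dereferenced, the pointer-based variant is safe even for a NULL payload; the type-based variant simply avoids the extra variable and the library call that initialized it.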

* [iproute PATCH] tc: Remove pointless assignments in batch()
From: Phil Sutter @ 2018-10-18 13:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Chris Mi, netdev

All these assignments are later overwritten without reading in between,
so just drop them.

Fixes: 485d0c6001c4a ("tc: Add batchsize feature for filter and actions")
Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 tc/tc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tc/tc.c b/tc/tc.c
index c493d5e92e0dd..eacd5c08573d4 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -325,11 +325,11 @@ static int batch(const char *name)
 	struct batch_buf *head = NULL, *tail = NULL, *buf_pool = NULL;
 	char *largv[100], *largv_next[100];
 	char *line, *line_next = NULL;
-	bool bs_enabled_next = false;
 	bool bs_enabled = false;
 	bool lastline = false;
 	int largc, largc_next;
 	bool bs_enabled_saved;
+	bool bs_enabled_next;
 	int batchsize = 0;
 	size_t len = 0;
 	int ret = 0;
@@ -358,7 +358,6 @@ static int batch(const char *name)
 		goto Exit;
 	largc = makeargs(line, largv, 100);
 	bs_enabled = batchsize_enabled(largc, largv);
-	bs_enabled_saved = bs_enabled;
 	do {
 		if (getcmdline(&line_next, &len, stdin) == -1)
 			lastline = true;
@@ -394,7 +393,6 @@ static int batch(const char *name)
 		len = 0;
 		bs_enabled_saved = bs_enabled;
 		bs_enabled = bs_enabled_next;
-		bs_enabled_next = false;
 
 		if (largc == 0) {
 			largc = largc_next;
-- 
2.19.0

^ permalink raw reply related

* [PATCH][ath10k-next] ath10k: fix some spelling mistakes
From: Colin King @ 2018-10-18 21:51 UTC (permalink / raw)
  To: Kalle Valo, David S . Miller, ath10k, linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to some spelling mistakes in ath10k_err and ath10k_dbg
messages:
"capablity" -> "capability"
"registed" -> "registered"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/ath/ath10k/qmi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/qmi.c b/drivers/net/wireless/ath/ath10k/qmi.c
index 56cb1831dcdf..c876f9057468 100644
--- a/drivers/net/wireless/ath/ath10k/qmi.c
+++ b/drivers/net/wireless/ath/ath10k/qmi.c
@@ -543,7 +543,7 @@ static int ath10k_qmi_cap_send_sync_msg(struct ath10k_qmi *qmi)
 		goto out;
 
 	if (resp->resp.result != QMI_RESULT_SUCCESS_V01) {
-		ath10k_err(ar, "capablity req rejected: %d\n", resp->resp.error);
+		ath10k_err(ar, "capability req rejected: %d\n", resp->resp.error);
 		ret = -EINVAL;
 		goto out;
 	}
@@ -623,7 +623,7 @@ static int ath10k_qmi_host_cap_send_sync(struct ath10k_qmi *qmi)
 		goto out;
 	}
 
-	ath10k_dbg(ar, ATH10K_DBG_QMI, "qmi host capablity request completed\n");
+	ath10k_dbg(ar, ATH10K_DBG_QMI, "qmi host capability request completed\n");
 	return 0;
 
 out:
@@ -657,7 +657,7 @@ ath10k_qmi_ind_register_send_sync_msg(struct ath10k_qmi *qmi)
 			       wlfw_ind_register_req_msg_v01_ei, &req);
 	if (ret < 0) {
 		qmi_txn_cancel(&txn);
-		ath10k_err(ar, "failed to send indication registed request: %d\n", ret);
+		ath10k_err(ar, "failed to send indication registered request: %d\n", ret);
 		goto out;
 	}
 
-- 
2.19.1

^ permalink raw reply related

* Re: [net PATCH] net: sched: Fix for duplicate class dump
From: Eric Dumazet @ 2018-10-18 13:57 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: phil, David Miller, netdev
In-Reply-To: <20181018125742.GE4558@nanopsycho.orion>

On Thu, Oct 18, 2018 at 6:03 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Thu, Oct 18, 2018 at 10:34:26AM CEST, phil@nwl.cc wrote:
> >When dumping classes by parent, kernel would return classes twice:
> >
> >| # tc qdisc add dev lo root prio
> >| # tc class show dev lo
> >| class prio 8001:1 parent 8001:
> >| class prio 8001:2 parent 8001:
> >| class prio 8001:3 parent 8001:
> >| # tc class show dev lo parent 8001:
> >| class prio 8001:1 parent 8001:
> >| class prio 8001:2 parent 8001:
> >| class prio 8001:3 parent 8001:
> >| class prio 8001:1 parent 8001:
> >| class prio 8001:2 parent 8001:
> >| class prio 8001:3 parent 8001:
> >
> >This comes from qdisc_match_from_root() potentially returning the root
> >qdisc itself if its handle matched. Though in that case, root's classes
> >were already dumped a few lines above.
> >
> >Fixes: cb395b2010879 ("net: sched: optimize class dumps")
> >Signed-off-by: Phil Sutter <phil@nwl.cc>
>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>

Good catch, thanks for the fix !

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH net-next] cxgb4: fix the error path of cxgb4_uld_register()
From: Ganesh Goudar @ 2018-10-18 14:04 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, dt, harsh, linux-crypto, Ganesh Goudar

On multi adapter setup if the uld registration fails even on
one adapter, the allocated resources for the uld on all the
adapters are freed, rendering the functioning adapters unusable.

This commit fixes the issue by freeing the allocated resources
only for the failed adapter.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/crypto/chelsio/chcr_core.c             |  4 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 46 ++++++--------------------
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |  2 +-
 3 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_core.c b/drivers/crypto/chelsio/chcr_core.c
index 04f277c..62249d4 100644
--- a/drivers/crypto/chelsio/chcr_core.c
+++ b/drivers/crypto/chelsio/chcr_core.c
@@ -237,9 +237,7 @@ static int chcr_uld_state_change(void *handle, enum cxgb4_state state)
 
 static int __init chcr_crypto_init(void)
 {
-	if (cxgb4_register_uld(CXGB4_ULD_CRYPTO, &chcr_uld_info))
-		pr_err("ULD register fail: No chcr crypto support in cxgb4\n");
-
+	cxgb4_register_uld(CXGB4_ULD_CRYPTO, &chcr_uld_info);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index 4bc2110..2673226 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -702,15 +702,14 @@ static void uld_attach(struct adapter *adap, unsigned int uld)
  *	about any presently available devices that support its type.  Returns
  *	%-EBUSY if a ULD of the same type is already registered.
  */
-int cxgb4_register_uld(enum cxgb4_uld type,
-		       const struct cxgb4_uld_info *p)
+void cxgb4_register_uld(enum cxgb4_uld type,
+			const struct cxgb4_uld_info *p)
 {
 	int ret = 0;
-	unsigned int adap_idx = 0;
 	struct adapter *adap;
 
 	if (type >= CXGB4_ULD_MAX)
-		return -EINVAL;
+		return;
 
 	mutex_lock(&uld_mutex);
 	list_for_each_entry(adap, &adapter_list, list_node) {
@@ -733,52 +732,29 @@ int cxgb4_register_uld(enum cxgb4_uld type,
 		}
 		if (adap->flags & FULL_INIT_DONE)
 			enable_rx_uld(adap, type);
-		if (adap->uld[type].add) {
-			ret = -EBUSY;
+		if (adap->uld[type].add)
 			goto free_irq;
-		}
 		ret = setup_sge_txq_uld(adap, type, p);
 		if (ret)
 			goto free_irq;
 		adap->uld[type] = *p;
 		uld_attach(adap, type);
-		adap_idx++;
-	}
-	mutex_unlock(&uld_mutex);
-	return 0;
-
+		continue;
 free_irq:
-	if (adap->flags & FULL_INIT_DONE)
-		quiesce_rx_uld(adap, type);
-	if (adap->flags & USING_MSIX)
-		free_msix_queue_irqs_uld(adap, type);
-free_rxq:
-	free_sge_queues_uld(adap, type);
-free_queues:
-	free_queues_uld(adap, type);
-out:
-
-	list_for_each_entry(adap, &adapter_list, list_node) {
-		if ((type == CXGB4_ULD_CRYPTO && !is_pci_uld(adap)) ||
-		    (type != CXGB4_ULD_CRYPTO && !is_offload(adap)))
-			continue;
-		if (type == CXGB4_ULD_ISCSIT && is_t4(adap->params.chip))
-			continue;
-		if (!adap_idx)
-			break;
-		adap->uld[type].handle = NULL;
-		adap->uld[type].add = NULL;
-		release_sge_txq_uld(adap, type);
 		if (adap->flags & FULL_INIT_DONE)
 			quiesce_rx_uld(adap, type);
 		if (adap->flags & USING_MSIX)
 			free_msix_queue_irqs_uld(adap, type);
+free_rxq:
 		free_sge_queues_uld(adap, type);
+free_queues:
 		free_queues_uld(adap, type);
-		adap_idx--;
+out:
+		dev_warn(adap->pdev_dev,
+			 "ULD registration failed for uld type %d\n", type);
 	}
 	mutex_unlock(&uld_mutex);
-	return ret;
+	return;
 }
 EXPORT_SYMBOL(cxgb4_register_uld);
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index de9ad31..5fa9a2d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -384,7 +384,7 @@ struct cxgb4_uld_info {
 	int (*tx_handler)(struct sk_buff *skb, struct net_device *dev);
 };
 
-int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
+void cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
 int cxgb4_unregister_uld(enum cxgb4_uld type);
 int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb);
 int cxgb4_immdata_send(struct net_device *dev, unsigned int idx,
-- 
2.1.0

^ permalink raw reply related
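
The commit message above describes moving the cleanup into the per-adapter loop, so that a failure on one adapter releases only that adapter's resources and the loop carries on with the next one. A reduced sketch of that control flow follows. `toy_adapter` and `toy_register_all` are hypothetical, and the boolean fields merely stand in for the queues and IRQs handled by the real driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an adapter. */
struct toy_adapter {
	bool setup_should_fail;	/* knob to simulate a setup error */
	bool resources_held;	/* queues, IRQs, ... in the real driver */
	bool attached;
};

/* Registers every adapter it can; returns how many attached. */
static int toy_register_all(struct toy_adapter *adaps, size_t n)
{
	int attached = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_adapter *a = &adaps[i];

		a->resources_held = true;	/* allocate for this adapter */

		if (a->setup_should_fail) {
			/* Undo only this adapter, then move on. */
			a->resources_held = false;
			continue;
		}

		a->attached = true;
		attached++;
	}
	return attached;
}
```

With three adapters where only the second fails, the first and third stay attached and keep their resources, which is the behaviour the patch is after.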

* Re: [danielwa@cisco.com: Re: gianfar: Implement MAC reset and reconfig procedure]
From: Daniel Walker @ 2018-10-18 14:05 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: Hemant Ramdasi, netdev
In-Reply-To: <HE1PR04MB114545F641D276841BA2656F96F80@HE1PR04MB1145.eurprd04.prod.outlook.com>

On Thu, Oct 18, 2018 at 12:16:06PM +0000, Claudiu Manoil wrote:
> Hi,
> 
> Sorry but I never heard about the phy you're quoting, this m88e1101, what is it?
> Link mode? (SGMII, RGMII, ?)
> Our boards (the ones I know) have Vitesse or Atheros phys.
> If the maccfg2 setting you're mentioning really makes the difference, then it looks
> like your phy enters in 10/100 Mbit or half duplex operation mode after MAC reset,
> aka lower speed MII mode, whereas the INIT_SETTINGS set up the MAC to operate
> in 1000 full duplex mode (GMII mode) by default.
> Link speed settings for the MACCFG2 register should be later adjusted via adjust_link() callback,
> so that if the initial maccfg2 settings don't match with the phy settings they will be adjusted
> by phylib's adjust_link().  For some reason this doesn't seem to happen on your setup either.
> So, could you please confirm whether after MAC reset your phy enters lower speed mode (MII),
> and whether the adjust_link() callback is getting invoked after ifconfig up?

Here are some parts of the logs. I added a dump_stack() into adjust_link(). It
does appear to be running, but it seems it's not working or not doing what you
think it should be doing. The signature of the issue is below: you bring up the
interface the first time and it works, then bring it down/up and no traffic.
You can see in the second ping there is 100% packet loss.

Seems the "Link is Up" lines indicate what adjust_link() changes.

IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
PING 10.126.154.1 (10.126.154.1): 56 data bytes
64 bytes from 10.126.154.1: seq=0 ttl=255 time=5.606 ms

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 5.606/5.606/5.606 ms
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430f1c] phy_state_machine+0x428/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.14.0-rc3 #174
Workqueue: events_power_efficient phy_state_machine
Call Trace:
[e81ffdb0] [c0008718] show_stack+0xfc/0x1bc (unreliable)
[e81ffe00] [c0602168] dump_stack+0x78/0xa0
[e81ffe10] [c0437b20] adjust_link+0x30/0x2b0
[e81ffe50] [c0430e60] phy_state_machine+0x36c/0x47c
[e81ffe70] [c0060a84] process_one_work+0x158/0x3c4
[e81ffea0] [c0061120] worker_thread+0x138/0x384
[e81ffed0] [c0068714] kthread+0xd0/0xe4
[e81fff40] [c0011bc8] ret_from_kernel_thread+0x5c/0x64
fsl-gianfar ff725000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
PING 10.126.154.1 (10.126.154.1): 56 data bytes

--- 10.126.154.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

^ permalink raw reply

* Re: [PATCH net-next] bnxt_en: Copy and paste bug in extended tx_stats
From: Michael Chan @ 2018-10-18 14:32 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: David Miller, Netdev, kernel-janitors
In-Reply-To: <20181018080239.z2egoh4bw4beb3r3@kili.mountain>

On Thu, Oct 18, 2018 at 1:02 AM Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> The struct type was copied from the line before but it should be "tx"
> instead of "rx".  I have reviewed the code and I can't immediately see
> that this bug causes a runtime issue.
>
> Fixes: 36e53349b60b ("bnxt_en: Add additional extended port statistics.")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Thanks.  Luckily, we did not use sizeof(*bp->hw_tx_port_stats_ext) to
allocate the memory, so there is no run-time issue.

Acked-by: Michael Chan <michael.chan@broadcom.com>

^ permalink raw reply

* Re: [PATCH] atm: eni: Move semicolon to a new line after empty for loop
From: David Miller @ 2018-10-18 22:41 UTC (permalink / raw)
  To: natechancellor; +Cc: 3chas3, linux-atm-general, netdev, linux-kernel
In-Reply-To: <20181017180334.8640-1-natechancellor@gmail.com>

From: Nathan Chancellor <natechancellor@gmail.com>
Date: Wed, 17 Oct 2018 11:03:34 -0700

> Clang warns:
> 
> drivers/atm/eni.c:244:48: error: for loop has empty body
> [-Werror,-Wempty-body]
>         for (order = 0; (1 << order) < *size; order++);
>                                                       ^
> drivers/atm/eni.c:244:48: note: put the semicolon on a separate line to
> silence this warning
> 
> In this case, that loop is expected to be empty so silence the warning
> in the way that Clang suggests.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/42
> Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Applied.

^ permalink raw reply
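
The Clang warning quoted above concerns a for loop whose body is intentionally empty. A small sketch of the same pattern with the semicolon moved to its own line follows. `toy_order_for_size` is a hypothetical helper, not the function from drivers/atm/eni.c, and it assumes the size is far below the range where the shift would overflow.

```c
/* Smallest order such that (1UL << order) >= size. */
static int toy_order_for_size(unsigned long size)
{
	int order;

	/* All the work happens in the loop header. */
	for (order = 0; (1UL << order) < size; order++)
		;	/* empty body on its own line, as Clang suggests */

	return order;
}
```

Placing the semicolon on a separate line signals that the empty body is deliberate, which is what silences -Wempty-body.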
