Netdev List
 help / color / mirror / Atom feed
* Re: pull-request: ieee802154 2018-05-08
From: David Miller @ 2018-05-09  0:06 UTC (permalink / raw)
  To: stefan; +Cc: s.schmidt, linux-wpan, alex.aring, netdev
In-Reply-To: <a61fa7c3-fa57-535c-7f00-b495d0a46bb0@osg.samsung.com>

From: Stefan Schmidt <stefan@osg.samsung.com>
Date: Tue, 8 May 2018 21:55:37 +0200

> On 05/08/2018 04:18 PM, David Miller wrote:
>> Please submit the -stable fix directly, you can feel free to CC: me.
> 
> Will do when the patch hits Linus git tree.
> 
> I have a quick question on the process here. From the netdev-faq document
> I was under the impression all stable patches under net/ and drivers/net
> should be brought up to you and would be handled by you.
> 
> Does this apply to the core part of net (I fully understand that ieee802154
> is rather a niche) or is there some other reason for this exception?
> 
> Both processes (the normal stable one as well as the slightly different one
> for net/) would be fine to go with for me. Just need to know which one I
> should use for future stable patches. :-)
> 
> regards
> Stefan Schmidt

Generally wireless and ipsec have been submitting them directly,
sometimes the Intel ethernet guys do too.

Sometimes this makes things easier for me, and I'll ask you to submit
things directly when that is the case like right now.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: sfp: handle cases where neither BR,min nor BR,max is given
From: David Miller @ 2018-05-09  0:14 UTC (permalink / raw)
  To: antoine.tenart
  Cc: linux, netdev, linux-kernel, thomas.petazzoni, maxime.chevallier,
	gregory.clement, miquel.raynal, nadavh, stefanc, ymarkman, mw
In-Reply-To: <20180504152103.18152-1-antoine.tenart@bootlin.com>

From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Fri,  4 May 2018 17:21:03 +0200

> When computing the bitrate using values read from an SFP module EEPROM,
> we use the nominal BR plus BR,min and BR,max to determine the
> boundaries. But in some cases BR,min and BR,max aren't provided, which
> led the SFP code to end up having the nominal value for both the minimum
> and maximum bitrate values. When using a passive cable, the nominal
> value should be used as the maximum one, and there is no minimum one
> so we should use 0.
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next 0/9] net: dsa: Plug in PHYLINK support
From: David Miller @ 2018-05-09  0:17 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, privat, andrew, vivien.didelot, rmk+kernel, sean.wang,
	Woojung.Huh, john, cphealy
In-Reply-To: <20180505190425.14378-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sat,  5 May 2018 12:04:16 -0700

> This patch series adds PHYLINK support to DSA which is necessary to
> support more complex PHY and pluggable modules setups.

The regression Andrew reported with patch #7 is holding up my
applying this series.

How would you like this to be handled?  Are you going to respin
once Andrew sends you the fix?

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/9] net: dsa: Plug in PHYLINK support
From: Florian Fainelli @ 2018-05-09  0:20 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, privat, andrew, vivien.didelot, rmk+kernel, sean.wang,
	Woojung.Huh, john, cphealy
In-Reply-To: <20180508.201709.1396958179924861550.davem@davemloft.net>

Le 05/08/18 à 17:17, David Miller a écrit :
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Sat,  5 May 2018 12:04:16 -0700
> 
>> This patch series adds PHYLINK support to DSA which is necessary to
>> support more complex PHY and pluggable modules setups.
> 
> The regression Andrew reported with patch #7 is holding up my
> applying this series.
> 
> How would you like this to be handled?  Are you going to respin
> once Andrew sends you the fix?

Andrew sent me the fix and I squashed it in patch 7, waiting for Russell
to give me his SoB for patch 3 since he originally sent that to me
privately and I changed it a bit.

Expect a v2 soon, should have made that clearer. Thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH] selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
From: Paul Moore @ 2018-05-09  0:25 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Alexey Kodanev, Richard Haines, selinux, Eric Paris,
	linux-security-module, netdev
In-Reply-To: <1eb10913-8802-e2dd-68f0-9483435cd949@tycho.nsa.gov>

On Tue, May 8, 2018 at 2:40 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On 05/08/2018 01:05 PM, Paul Moore wrote:
>> On Tue, May 8, 2018 at 10:05 AM, Alexey Kodanev
>> <alexey.kodanev@oracle.com> wrote:
>>> Commit d452930fd3b9 ("selinux: Add SCTP support") breaks compatibility
>>> with the old programs that can pass sockaddr_in with AF_UNSPEC and
>>> INADDR_ANY to bind(). As a result, bind() returns EAFNOSUPPORT error.
>>> It was found with LTP/asapi_01 test.
>>>
>>> Similar to commit 29c486df6a20 ("net: ipv4: relax AF_INET check in
>>> bind()"), which relaxed AF_INET check for compatibility, add AF_UNSPEC
>>> case to AF_INET and make sure that the address is INADDR_ANY.
>>>
>>> Also, in the end of selinux_socket_bind(), instead of adding AF_UNSPEC
>>> to 'address->sa_family == AF_INET', verify AF_INET6 first.
>>>
>>> Fixes: d452930fd3b9 ("selinux: Add SCTP support")
>>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>>> ---
>>>  security/selinux/hooks.c | 12 +++++++++---
>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> Thanks for finding and reporting this regression.
>>
>> I think I would prefer to avoid having to duplicate the
>> AF_UNSPEC/INADDR_ANY checking logic in the SELinux hook, even though
>> it is a small bit of code and unlikely to change.  I'm wondering if it
>> would be better to check both the socket and sockaddr address family
>> in the main if conditional inside selinux_socket_bind(), what do you
>> think?  Another option would be to go back to just checking the
>> soackaddr address family; we moved away from that for a reason which
>> escapes at the moment (code cleanliness?), but perhaps that was a
>> mistake.
>
> We've always used the sk->sk_family there; it was only the recent code from Richard that started
> using the socket address family.

Yes I know, I thought I was the one that suggested it at some point
(I'll take the blame) ... although now that I'm looking at the git
log, maybe I'm confusing it with something else.

>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 4cafe6a19167..a3789b167667 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -4577,6 +4577,7 @@ static int selinux_socket_bind(struct socket *sock, struc>
>> {
>>        struct sock *sk = sock->sk;
>>        u16 family;
>> +       u16 family_sa;
>>        int err;
>>
>>        err = sock_has_perm(sk, SOCKET__BIND);
>> @@ -4585,7 +4586,9 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>
>>        /* If PF_INET or PF_INET6, check name_bind permission for the port. */
>>        family = sk->sk_family;
>> -       if (family == PF_INET || family == PF_INET6) {
>> +       family_sa = address->sa_family;
>> +       if ((family == PF_INET || family == PF_INET6) &&
>> +           (family_sa == PF_INET || family_sa == PF_INET6)) {
>
> Wouldn't this allow bypassing the name_bind permission check by passing in AF_UNSPEC?

I believe these name_bind permission checkis skipped for AF_UNSPEC
already, isn't it?  The only way the name_bind check would be
triggered is if the source port, snum, was non-zero and I didn't think
that was really legal for AF_UNSPEC/INADDR_ANY, is it?

>>                char *addrp;
>>                struct sk_security_struct *sksec = sk->sk_security;
>>                struct common_audit_data ad;
>> @@ -4601,7 +4604,7 @@ static int selinux_socket_bind(struct socket *sock, struc>
>>                 * need to check address->sa_family as it is possible to have
>>                 * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>>                 */
>> -               switch (address->sa_family) {
>> +               switch (family_sa) {
>>                case AF_INET:
>>                        if (addrlen < sizeof(struct sockaddr_in))
>>                                return -EINVAL;
>>
>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>> index 4cafe6a..649a3be 100644
>>> --- a/security/selinux/hooks.c
>>> +++ b/security/selinux/hooks.c
>>> @@ -4602,10 +4602,16 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
>>>                  * sk->sk_family = PF_INET6 with addr->sa_family = AF_INET.
>>>                  */
>>>                 switch (address->sa_family) {
>>> +               case AF_UNSPEC:
>>>                 case AF_INET:
>>>                         if (addrlen < sizeof(struct sockaddr_in))
>>>                                 return -EINVAL;
>>>                         addr4 = (struct sockaddr_in *)address;
>>> +
>>> +                       if (address->sa_family == AF_UNSPEC &&
>>> +                           addr4->sin_addr.s_addr != htonl(INADDR_ANY))
>>> +                               return -EAFNOSUPPORT;
>>> +
>>>                         snum = ntohs(addr4->sin_port);
>>>                         addrp = (char *)&addr4->sin_addr.s_addr;
>>>                         break;
>>> @@ -4681,10 +4687,10 @@ static int selinux_socket_bind(struct socket *sock, struct sockaddr *address, in
>>>                 ad.u.net->sport = htons(snum);
>>>                 ad.u.net->family = family;
>>>
>>> -               if (address->sa_family == AF_INET)
>>> -                       ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
>>> -               else
>>> +               if (address->sa_family == AF_INET6)
>>>                         ad.u.net->v6info.saddr = addr6->sin6_addr;
>>> +               else
>>> +                       ad.u.net->v4info.saddr = addr4->sin_addr.s_addr;
>>>
>>>                 err = avc_has_perm(&selinux_state,
>>>                                    sksec->sid, sid,
>>> --
>>> 1.8.3.1
>>>
>>
>



-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC bpf-next 00/10] initial control flow support for eBPF verifier
From: David Miller @ 2018-05-09  0:28 UTC (permalink / raw)
  To: jiong.wang
  Cc: alexei.starovoitov, daniel, john.fastabend, netdev, oss-drivers
In-Reply-To: <1525688567-19618-1-git-send-email-jiong.wang@netronome.com>

From: Jiong Wang <jiong.wang@netronome.com>
Date: Mon,  7 May 2018 06:22:36 -0400

> This infrastructure comes with some memory usage and execution time cost.
> Memory pool based allocation is used to avoid frequently calling of
> kmalloc/free. Singly linked list and compact pointer are used to reduce
> various cfg node size.

The memory and execution time costs are unfortunate, but we will
certainly need a real control flow graph in the verifier whether we
like it or not.

^ permalink raw reply

* Re: [PATCH net-next] dt-bindings: dsa: Remove unnecessary #address/#size-cells
From: David Miller @ 2018-05-09  0:28 UTC (permalink / raw)
  To: festevam; +Cc: f.fainelli, andrew, robh+dt, netdev, devicetree, fabio.estevam
In-Reply-To: <1525695471-19984-1-git-send-email-festevam@gmail.com>

From: Fabio Estevam <festevam@gmail.com>
Date: Mon,  7 May 2018 09:17:51 -0300

> From: Fabio Estevam <fabio.estevam@nxp.com>
> 
> If the example binding is used on a real dts file, the following DTC
> warning is seen with W=1:
>     
> arch/arm/boot/dts/imx6q-b450v3.dtb: Warning (avoid_unnecessary_addr_size): /mdio-gpio/switch@0: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property
> 
> Remove unnecessary #address-cells/#size-cells to improve the binding
> document examples.
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>

Applied, thank you.

^ permalink raw reply

* Re: [net-next 0/6][pull request] 100GbE Intel Wired LAN Driver Updates 2018-05-07
From: David Miller @ 2018-05-09  0:29 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon,  7 May 2018 07:45:15 -0700

> This series contains updates to fm10k only.

Waiting for a respin after patch #2 is updated based upon feedback.

Thanks.

^ permalink raw reply

* [net-next  1/1] tipc: clean up removal of binding table items
From: Jon Maloy @ 2018-05-09  0:59 UTC (permalink / raw)
  To: davem, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	jon.maloy, canh.d.luu, ying.xue, tipc-discussion

In commit be47e41d77fb ("tipc: fix use-after-free in tipc_nametbl_stop")
we fixed a problem caused by premature release of service range items.

That fix is correct, and solved the problem. However, it doesn't address
the root of the problem, which is that we don't lookup the tipc_service
 -> service_range -> publication items in the correct hierarchical
order.

In this commit we try to make this right, and as a side effect obtain
some code simplification.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/name_table.c | 103 ++++++++++++++++++++++++++------------------------
 1 file changed, 53 insertions(+), 50 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index dd1c4fa..bebe88c 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -136,12 +136,12 @@ static struct tipc_service *tipc_service_create(u32 type, struct hlist_head *hd)
 }
 
 /**
- * tipc_service_find_range - find service range matching a service instance
+ * tipc_service_first_range - find first service range in tree matching instance
  *
  * Very time-critical, so binary search through range rb tree
  */
-static struct service_range *tipc_service_find_range(struct tipc_service *sc,
-						     u32 instance)
+static struct service_range *tipc_service_first_range(struct tipc_service *sc,
+						      u32 instance)
 {
 	struct rb_node *n = sc->ranges.rb_node;
 	struct service_range *sr;
@@ -158,6 +158,30 @@ static struct service_range *tipc_service_find_range(struct tipc_service *sc,
 	return NULL;
 }
 
+/*  tipc_service_find_range - find service range matching publication parameters
+ */
+static struct service_range *tipc_service_find_range(struct tipc_service *sc,
+						     u32 lower, u32 upper)
+{
+	struct rb_node *n = sc->ranges.rb_node;
+	struct service_range *sr;
+
+	sr = tipc_service_first_range(sc, lower);
+	if (!sr)
+		return NULL;
+
+	/* Look for exact match */
+	for (n = &sr->tree_node; n; n = rb_next(n)) {
+		sr = container_of(n, struct service_range, tree_node);
+		if (sr->upper == upper)
+			break;
+	}
+	if (!n || sr->lower != lower || sr->upper != upper)
+		return NULL;
+
+	return sr;
+}
+
 static struct service_range *tipc_service_create_range(struct tipc_service *sc,
 						       u32 lower, u32 upper)
 {
@@ -238,54 +262,19 @@ static struct publication *tipc_service_insert_publ(struct net *net,
 /**
  * tipc_service_remove_publ - remove a publication from a service
  */
-static struct publication *tipc_service_remove_publ(struct net *net,
-						    struct tipc_service *sc,
-						    u32 lower, u32 upper,
-						    u32 node, u32 key,
-						    struct service_range **rng)
+static struct publication *tipc_service_remove_publ(struct service_range *sr,
+						    u32 node, u32 key)
 {
-	struct tipc_subscription *sub, *tmp;
-	struct service_range *sr;
 	struct publication *p;
-	bool found = false;
-	bool last = false;
-	struct rb_node *n;
-
-	sr = tipc_service_find_range(sc, lower);
-	if (!sr)
-		return NULL;
 
-	/* Find exact matching service range */
-	for (n = &sr->tree_node; n; n = rb_next(n)) {
-		sr = container_of(n, struct service_range, tree_node);
-		if (sr->upper == upper)
-			break;
-	}
-	if (!n || sr->lower != lower || sr->upper != upper)
-		return NULL;
-
-	/* Find publication, if it exists */
 	list_for_each_entry(p, &sr->all_publ, all_publ) {
 		if (p->key != key || (node && node != p->node))
 			continue;
-		found = true;
-		break;
+		list_del(&p->all_publ);
+		list_del(&p->local_publ);
+		return p;
 	}
-	if (!found)
-		return NULL;
-
-	list_del(&p->all_publ);
-	list_del(&p->local_publ);
-	if (list_empty(&sr->all_publ))
-		last = true;
-
-	/* Notify any waiting subscriptions */
-	list_for_each_entry_safe(sub, tmp, &sc->subscriptions, service_list) {
-		tipc_sub_report_overlap(sub, p->lower, p->upper, TIPC_WITHDRAWN,
-					p->port, p->node, p->scope, last);
-	}
-	*rng = sr;
-	return p;
+	return NULL;
 }
 
 /**
@@ -376,17 +365,31 @@ struct publication *tipc_nametbl_remove_publ(struct net *net, u32 type,
 					     u32 node, u32 key)
 {
 	struct tipc_service *sc = tipc_service_find(net, type);
+	struct tipc_subscription *sub, *tmp;
 	struct service_range *sr = NULL;
 	struct publication *p = NULL;
+	bool last;
 
 	if (!sc)
 		return NULL;
 
 	spin_lock_bh(&sc->lock);
-	p = tipc_service_remove_publ(net, sc, lower, upper, node, key, &sr);
+	sr = tipc_service_find_range(sc, lower, upper);
+	if (!sr)
+		goto exit;
+	p = tipc_service_remove_publ(sr, node, key);
+	if (!p)
+		goto exit;
+
+	/* Notify any waiting subscriptions */
+	last = list_empty(&sr->all_publ);
+	list_for_each_entry_safe(sub, tmp, &sc->subscriptions, service_list) {
+		tipc_sub_report_overlap(sub, lower, upper, TIPC_WITHDRAWN,
+					p->port, node, p->scope, last);
+	}
 
 	/* Remove service range item if this was its last publication */
-	if (sr && list_empty(&sr->all_publ)) {
+	if (list_empty(&sr->all_publ)) {
 		rb_erase(&sr->tree_node, &sc->ranges);
 		kfree(sr);
 	}
@@ -396,6 +399,7 @@ struct publication *tipc_nametbl_remove_publ(struct net *net, u32 type,
 		hlist_del_init_rcu(&sc->service_list);
 		kfree_rcu(sc, rcu);
 	}
+exit:
 	spin_unlock_bh(&sc->lock);
 	return p;
 }
@@ -437,7 +441,7 @@ u32 tipc_nametbl_translate(struct net *net, u32 type, u32 instance, u32 *dnode)
 		goto not_found;
 
 	spin_lock_bh(&sc->lock);
-	sr = tipc_service_find_range(sc, instance);
+	sr = tipc_service_first_range(sc, instance);
 	if (unlikely(!sr))
 		goto no_match;
 
@@ -484,7 +488,7 @@ bool tipc_nametbl_lookup(struct net *net, u32 type, u32 instance, u32 scope,
 
 	spin_lock_bh(&sc->lock);
 
-	sr = tipc_service_find_range(sc, instance);
+	sr = tipc_service_first_range(sc, instance);
 	if (!sr)
 		goto no_match;
 
@@ -756,8 +760,7 @@ static void tipc_service_delete(struct net *net, struct tipc_service *sc)
 	spin_lock_bh(&sc->lock);
 	rbtree_postorder_for_each_entry_safe(sr, tmpr, &sc->ranges, tree_node) {
 		list_for_each_entry_safe(p, tmp, &sr->all_publ, all_publ) {
-			tipc_service_remove_publ(net, sc, p->lower, p->upper,
-						 p->node, p->key, &sr);
+			tipc_service_remove_publ(sr, p->node, p->key);
 			kfree_rcu(p, rcu);
 		}
 		rb_erase(&sr->tree_node, &sc->ranges);
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH ghak81 RFC V1 2/5] audit: convert sessionid unset to a macro
From: Richard Guy Briggs @ 2018-05-09  1:34 UTC (permalink / raw)
  To: Linux-Audit Mailing List, LKML,
	Linux NetDev Upstream Mailing List, Netfilter Devel List,
	Linux Security Module list, Integrity Measurement Architecture,
	SElinux list
  Cc: David Howells, Ingo Molnar
In-Reply-To: <91fd13c7a66718dc827d299fa101883e5d0a864f.1525466167.git.rgb@redhat.com>

On 2018-05-04 16:54, Richard Guy Briggs wrote:
> Use a macro, "AUDIT_SID_UNSET", to replace each instance of
> initialization and comparison to an audit session ID.
> 
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

There's a minor issue with this patch, adding a header include to
init/init_task.c in this patch and removing it from patch 5.  That'll be
in the next revision.

I have dynamic allocation working, so that has a good chance of
appearing  too.

> ---
>  include/linux/audit.h      | 2 +-
>  include/net/xfrm.h         | 2 +-
>  include/uapi/linux/audit.h | 1 +
>  init/init_task.c           | 2 +-
>  kernel/auditsc.c           | 4 ++--
>  5 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 75d5b03..5f86f7c 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -513,7 +513,7 @@ static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
>  }
>  static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
>  {
> -	return -1;
> +	return AUDIT_SID_UNSET;
>  }
>  static inline void audit_ipc_obj(struct kern_ipc_perm *ipcp)
>  { }
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index a872379..fcce8ee 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -751,7 +751,7 @@ static inline void xfrm_audit_helper_usrinfo(bool task_valid,
>  					    audit_get_loginuid(current) :
>  					    INVALID_UID);
>  	const unsigned int ses = task_valid ? audit_get_sessionid(current) :
> -		(unsigned int) -1;
> +		AUDIT_SID_UNSET;
>  
>  	audit_log_format(audit_buf, " auid=%u ses=%u", auid, ses);
>  	audit_log_task_context(audit_buf);
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 4e61a9e..04f9bd2 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -465,6 +465,7 @@ struct audit_tty_status {
>  };
>  
>  #define AUDIT_UID_UNSET (unsigned int)-1
> +#define AUDIT_SID_UNSET ((unsigned int)-1)
>  
>  /* audit_rule_data supports filter rules with both integer and string
>   * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
> diff --git a/init/init_task.c b/init/init_task.c
> index 3ac6e75..c788f91 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -119,7 +119,7 @@ struct task_struct init_task
>  	.thread_node	= LIST_HEAD_INIT(init_signals.thread_head),
>  #ifdef CONFIG_AUDITSYSCALL
>  	.loginuid	= INVALID_UID,
> -	.sessionid	= (unsigned int)-1,
> +	.sessionid	= AUDIT_SID_UNSET,
>  #endif
>  #ifdef CONFIG_PERF_EVENTS
>  	.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f3817d0..6e3ceb9 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2050,7 +2050,7 @@ static void audit_log_set_loginuid(kuid_t koldloginuid, kuid_t kloginuid,
>  int audit_set_loginuid(kuid_t loginuid)
>  {
>  	struct task_struct *task = current;
> -	unsigned int oldsessionid, sessionid = (unsigned int)-1;
> +	unsigned int oldsessionid, sessionid = AUDIT_SID_UNSET;
>  	kuid_t oldloginuid;
>  	int rc;
>  
> @@ -2064,7 +2064,7 @@ int audit_set_loginuid(kuid_t loginuid)
>  	/* are we setting or clearing? */
>  	if (uid_valid(loginuid)) {
>  		sessionid = (unsigned int)atomic_inc_return(&session_id);
> -		if (unlikely(sessionid == (unsigned int)-1))
> +		if (unlikely(sessionid == AUDIT_SID_UNSET))
>  			sessionid = (unsigned int)atomic_inc_return(&session_id);
>  	}
>  
> -- 
> 1.8.3.1
> 
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH bpf-next v3 3/6] bpf: Add IPv6 Segment Routing helpers
From: Alexei Starovoitov @ 2018-05-09  1:57 UTC (permalink / raw)
  To: Mathieu Xhonneux; +Cc: David Lebrun, netdev, David Miller
In-Reply-To: <CAKSCvkSNFDvRXXBV9GkxzbiQgZikRriKSD5tnhjsyx99m_x3Ww@mail.gmail.com>

On Mon, May 07, 2018 at 11:21:11PM +0100, Mathieu Xhonneux wrote:
> I'm not sure what would be the best approach here. These errors appear
> when CONFIG_IPV6=m and CONFIG_IPV6_SEG6_LWTUNNEL=y (which is bool and
> depends on IPv6, hence it is also modularized in this case), then
> IS_ENABLED(CONFIG_IPV6_SEG6_LWTUNNEL) returns true, even though the
> seg6_* symbols are not available when linking vmlinux. If I'm correct,
> since net/core/filter.c is always built-in, all functions it uses must
> also be built-in.
> 
> I didn't find any other dependency from net/core/filter.c using a
> feature which can be modularized, hence the only solution I see here
> is to create a new bool CONFIG variable, e.g. CONFIG_IPV6_SEG6_BPF,
> which would require CONFIG_IPV6=y and CONFIG_IPV6_SEG6_LWTUNNEL=y. I
> could then replace my #if IS_ENABLED(CONFIG_IPV6_SEG6_LWTUNNEL)
> conditions by #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF) in
> net/core/filter.c.

Requiring CONFIG_IPV6=y for this feature would unfortunate.
I think the least ugliest solution would be to add seg6 functions
to ipv6_bpf_stub and call them indirectly if they are not
in critical path.
If it is the fast path then CONFIG_IPV6=y is the only option,
but don't expose the new CONFIG_IPV6_SEG6_BPF to the user.
Make it automatic y/n depending on CONFIG_IPV6=y or (n|m)


> Any comment on this ?
> 
> Thanks.
> 
> 2018-05-07 0:29 GMT+01:00 kbuild test robot <lkp@intel.com>:
> > Hi Mathieu,
> >
> > Thank you for the patch! Yet something to improve:
> >
> > [auto build test ERROR on bpf-next/master]
> >
> > url:    https://github.com/0day-ci/linux/commits/Mathieu-Xhonneux/ipv6-sr-introduce-seg6local-End-BPF-action/20180506-233046
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> > config: s390-allmodconfig (attached as .config)
> > compiler: s390x-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> > reproduce:
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         make.cross ARCH=s390
> >
> > All errors (new ones prefixed by >>):
> >
> >    net/core/filter.o: In function `bpf_push_seg6_encap':
> >    filter.c:(.text+0xaf4c): undefined reference to `seg6_validate_srh'
> >    filter.c:(.text+0xaf8a): undefined reference to `seg6_do_srh_inline'
> >    filter.c:(.text+0xafc4): undefined reference to `seg6_do_srh_encap'
> >    filter.c:(.text+0xb016): undefined reference to `seg6_lookup_nexthop'
> >    net/core/filter.o: In function `bpf_lwt_seg6_store_bytes':
> >>> (.text+0xb106): undefined reference to `seg6_bpf_srh_states'
> >    net/core/filter.o: In function `bpf_lwt_seg6_action':
> >    (.text+0xb2b0): undefined reference to `seg6_bpf_srh_states'
> >>> (.text+0xb334): undefined reference to `seg6_validate_srh'
> >>> (.text+0xb394): undefined reference to `seg6_lookup_nexthop'
> >    (.text+0xb3c4): undefined reference to `seg6_lookup_nexthop'
> >    net/core/filter.o: In function `bpf_lwt_seg6_adjust_srh':
> >    (.text+0xb492): undefined reference to `seg6_bpf_srh_states'
> >
> > ---
> > 0-DAY kernel test infrastructure                Open Source Technology Center
> > https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH ghak81 RFC V1 4/5] audit: use inline function to set audit context
From: Tobin C. Harding @ 2018-05-09  2:07 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Linux-Audit Mailing List, LKML,
	Linux NetDev Upstream Mailing List, Netfilter Devel List,
	Linux Security Module list, Integrity Measurement Architecture,
	SElinux list, Eric Paris, Paul Moore, Steve Grubb, Ingo Molnar,
	David Howells
In-Reply-To: <2f0566af8ccafdaf400a3d002cb4aef9b80e44cf.1525466167.git.rgb@redhat.com>

On Fri, May 04, 2018 at 04:54:37PM -0400, Richard Guy Briggs wrote:
> Recognizing that the audit context is an internal audit value, use an
> access function to set the audit context pointer for the task
> rather than reaching directly into the task struct to set it.
> 
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h | 8 ++++++++
>  kernel/auditsc.c      | 6 +++---
>  kernel/fork.c         | 2 +-
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 93e4c61..dba0d45 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -235,6 +235,10 @@ extern void __audit_inode_child(struct inode *parent,
>  extern void __audit_seccomp(unsigned long syscall, long signr, int code);
>  extern void __audit_ptrace(struct task_struct *t);
>  
> +static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> +{
> +	task->audit_context = ctx;
> +}
>  static inline struct audit_context *audit_context(struct task_struct *task)
>  {
>  	return task->audit_context;
> @@ -472,6 +476,10 @@ static inline bool audit_dummy_context(void)
>  {
>  	return true;
>  }
> +static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
> +{
> +	task->audit_context = ctx;
> +}

If audit_context is an internal audit value why do we set it when
CONFIG_AUDITSYSCALL is not set?

thanks,
Tobin.

^ permalink raw reply

* pull-request Cavium liquidio firmware v1.7.2
From: Felix Manlunas @ 2018-05-09  2:21 UTC (permalink / raw)
  To: linux-firmware
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	felix.manlunas

The following changes since commit 8fc2d4e55685bf73b6f7752383da9067404a74bb:

  Merge git://git.marvell.com/mwifiex-firmware (2018-05-07 09:09:40 -0400)

are available in the git repository at:

  https://github.com/felix-cavium/linux-firmware.git for-upstreaming-v1.7.2

for you to fetch changes up to d3b6941e1a85cbff895a92aa9e36b50deaeac970:

  linux-firmware: liquidio: update firmware to v1.7.2 (2018-05-08 19:02:41 -0700)

Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
----------------------------------------------------------------
Felix Manlunas (1):
      linux-firmware: liquidio: update firmware to v1.7.2

 WHENCE                     |   8 ++++----
 liquidio/lio_210nv_nic.bin | Bin 1265368 -> 1281464 bytes
 liquidio/lio_210sv_nic.bin | Bin 1163128 -> 1179352 bytes
 liquidio/lio_23xx_nic.bin  | Bin 1271456 -> 1287264 bytes
 liquidio/lio_410nv_nic.bin | Bin 1265368 -> 1281464 bytes
 5 files changed, 4 insertions(+), 4 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] net:sched: add gkprio scheduler
From: Cong Wang @ 2018-05-09  2:24 UTC (permalink / raw)
  To: Michel Machado
  Cc: Nishanth Devarajan, Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers, Cody Doucette
In-Reply-To: <273f91db-7a2f-acda-b306-5a78dd948478@digirati.com.br>

On Tue, May 8, 2018 at 5:59 AM, Michel Machado <michel@digirati.com.br> wrote:
>>> Overall it looks good to me, just one thing below:
>>>
>>>> +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
>>>> +       .id             =       "gkprio",
>>>> +       .priv_size      =       sizeof(struct gkprio_sched_data),
>>>> +       .enqueue        =       gkprio_enqueue,
>>>> +       .dequeue        =       gkprio_dequeue,
>>>> +       .peek           =       qdisc_peek_dequeued,
>>>> +       .init           =       gkprio_init,
>>>> +       .reset          =       gkprio_reset,
>>>> +       .change         =       gkprio_change,
>>>> +       .dump           =       gkprio_dump,
>>>> +       .destroy        =       gkprio_destroy,
>>>> +       .owner          =       THIS_MODULE,
>>>> +};
>>>
>>>
>>> You probably want to add Qdisc_class_ops here so that you can
>>> dump the stats of each internal queue.
>
>
> Hi Cong,
>
>    In the production scenario we are targeting, this priority queue must be
> classless; being classful would only bloat the code for us. I don't see
> making this queue classful as a problem per se, but I suggest leaving it as
> a future improvement for when someone can come up with a useful scenario for
> it.


Take a look at sch_prio, it is fairly simple since your internal
queues are just an array... Per-queue stats are quite useful
in production, we definitely want to observe which queues are
full which are not.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov @ 2018-05-09  2:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Eric Paris, Matthew Auld, Josh Triplett, Kirill A. Shutemov,
	Joonas Lahtinen, Chris Wilson, Stephen Smalley, Eric W. Biederman,
	Mimi Zohar, Alexei Starovoitov, davem, daniel, torvalds, gregkh,
	luto, netdev, linux-kernel, kernel-team, Al Viro, David Howells,
	Kees Cook, Andrew Morton
In-Reply-To: <20180507183931.GV27853@wotan.suse.de>

On Mon, May 07, 2018 at 06:39:31PM +0000, Luis R. Rodriguez wrote:
> 
> > Are you saying make 'static struct vfsmount *shm_mnt;'
> > global and use it here? so no init_tmpfs() necessary?
> > I think that can work, but feels that having two
> > tmpfs mounts (one for shmem and one for umh) is cleaner.
> 
> No, but now that you mention it... if a shared vfsmount is not used the
> /sys/kernel/mm/transparent_hugepage/shmem_enabled knob for using huge pages
> would not be followed for umh modules. For the i915 driver this was *why*
> they ended up adding shmem_file_setup_with_mnt(), they wanted huge pages to
> support huge-gtt-pages. What is the rationale behind umh.c using it for
> umh modules?
> 
> Users of shmem_kernel_file_setup() spawned later out of the desire to
> *avoid* LSMs since it didn't make sense in their case as their inodes
> are never exposed to userspace. Such is the case for ipc/shm.c and
> security/keys/big_key.c. Refer to commit c7277090927a5 ("security: shmem:
> implement kernel private shmem inodes") and then commit e1832f2923ec9
> ("ipc: use private shmem or hugetlbfs inodes for shm segments").
> 
> In this new umh usermode modules case we are:
> 
>   a) actually mapping data already extracted by the kernel somehow from
>      a file somehow, presumably from /lib/modules/ path somewhere, but
>      again this is not visible to umc.c, as it just gets called with:
> 
> 	fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> 
>   b) Creating the respective tmpfs file with shmem_file_setup_with_mnt()
>      with our on our own shared struct vfsmount *umh_fs.
> 
>   c) Populating the file created and stuffing it with our data passed
> 
>   d) Calling do_execve_file() on it.
> 
> Its not clear to me why you used shmem_file_setup_with_mnt() in this case. What
> are the gains? It would make sense to use shmem_kernel_file_setup() to avoid an
> LSM check on step b) *but* only if we already had a proper LSM check on step
> a).
> 
> I checked how you use fork_usermode_blob() in a) and in this case the kernel
> module bpfilter would be loaded first, and for that we already have an LSM
> check / hook for that module. From my review then, the magic done on your
> second patch to stuff the userspace application into the module should be
> irrelevant to us from an LSM perspective.
> 
> So, I can see a reason to use shmem_kernel_file_setup() but not I cannot
> see a reason to be using      shmem_file_setup_with_mnt() at the moment.

That's a good idea. I will switch to using shmem_kernel_file_setup().
I guess I can use kernel_write() as well instead of populate_file().
I wonder why i915 had to use pagecache_write_begin() and the loop
instead of kernel_write()...
The only reason to copy into tmpfs file is to make that memory swappable.
All standard shmem knobs should apply.

> > > > +{
> > > > +	size_t offset = 0;
> > > > +	int err;
> > > > +
> > > > +	do {
> > > > +		unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
> > > > +		struct page *page;
> > > > +		void *pgdata, *vaddr;
> > > > +
> > > > +		err = pagecache_write_begin(file, file->f_mapping, offset, len,
> > > > +					    0, &page, &pgdata);
> > > > +		if (err < 0)
> > > > +			goto fail;
> > > > +
> > > > +		vaddr = kmap(page);
> > > > +		memcpy(vaddr, data, len);
> > > > +		kunmap(page);
> > > > +
> > > > +		err = pagecache_write_end(file, file->f_mapping, offset, len,
> > > > +					  len, page, pgdata);
> > > > +		if (err < 0)
> > > > +			goto fail;
> > > > +
> > > > +		size -= len;
> > > > +		data += len;
> > > > +		offset += len;
> > > > +	} while (size);
> > > 
> > > Character for character, this looks like a wonderful copy and paste from
> > > i915_gem_object_create_from_data()'s own loop which does the same exact
> > > thing. Perhaps its time for a helper on mm/filemap.c with an export so
> > > if a bug is fixed in one place its fixed in both places.
> > 
> > yes, of course, but not right now.
> > Once it all lands that will be the time to create common helper.
> > It's not a good idea to mess with i915 in one patch set.
> 
> Either way works with me, so long as its done.

Will be gone due to switch to kernel_write().

> 
> > > > +int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
> > > 
> > > Please use umh_fork_blob()
> > 
> > sorry, no. fork_usermode_blob() is much more descriptive name.
> 
> Prefixing new umh.c symbols with umh_*() makes it very clear this came from
> kernel/umh.c functionality. I've been dealing with other places in the kernel
> that have conflated their own use of kernel/umh.c functionality what they
> expose in both code and documentation, and correcting this has taken a long
> time. Best avoid future confusion and be consistent with new exported symbols
> for umc.c functionality.

There is no confusion today. The most known umh api is a family of
call_usermodehelper*()
In this case it's not a 'call', it's a 'fork', since part of kernel module
being forked as user mode process.
I considered naming this function fork_usermodehelper(),
but it's also not quite correct, since 'user mode helper' has predefined
meaning of something that has the path whereas here it's a blob of bytes.
Hence fork_usermode_blob() is more accurate and semantically correct name,
whereas umh_fork_blob() is not.

Notice I no longer call these new kernel modules as 'umh modules',
since that's the wrong name for the same reasons.
They are good ol' kernel modules.
The new functionality allowed by this patch is:
forking part of kernel module data as user mode process.
A lot of umh logic is reused, but 'user mode helpers' and
'user mode blobs' are distinct kernel features.

> I don't even want to test USB, I am just interesting in the *very* *very*
> basic aspects of it. A simple lib/test_umh_module.c would do with a respective
> check that its loaded, given this is a requirement from the API. It also helps
> folks understand the core basic knobs without having to look at bfilter code.

I agree that lib/test_usermode_blob.c must be available eventually.
Right now we cannot add it to the tree, since we need to figure how interface
between kernel and usermode_blob will work based on real world use case
of bpfilter. Once it gets further along that would be the time to say:
"look, here is the test for fork_usermode_blob() and here how others (usb drivers)
can and should use it".
Today is not the right time to fix the api. Such lib/test_usermode_blob.c
would have to be constantly adjusted as we tweak bpfilter side becoming
unnecessary burden of us bpfilter developers.
Everyone really need to think of these patches as work in progress
and internal details and api of fork_usermode_blob() will change.

^ permalink raw reply

* Re: [PATCH net-next] net:sched: add gkprio scheduler
From: Cong Wang @ 2018-05-09  2:27 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Michel Machado, Nishanth Devarajan, Jiri Pirko, David Miller,
	Linux Kernel Network Developers, Cody Doucette
In-Reply-To: <2855ce67-f1bc-c04c-81ae-70ae3fdc6b17@mojatatu.com>

On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> Have you considered using skb->prio instead of peeking into the packet
> header.
> Also have you looked at the dsmark qdisc?
>

dsmark modifies ds fields, while this one just maps ds fields into
different queues.

^ permalink raw reply

* Re: [PATCH net-next v10 3/4] netdev: octeon-ethernet: Add Cavium Octeon III support.
From: David Miller @ 2018-05-09  2:28 UTC (permalink / raw)
  To: steven.hill; +Cc: netdev, cmunoz, david.daney
In-Reply-To: <1525713731-27092-4-git-send-email-steven.hill@cavium.com>

From: "Steven J. Hill" <steven.hill@cavium.com>
Date: Mon,  7 May 2018 12:22:10 -0500

> +static atomic_t request_mgmt_once;
> +static atomic_t load_driver_once;
> +static atomic_t pki_id;
 ...
> +		/* One time request driver module */
> +		if (is_mix) {
> +			if (atomic_cmpxchg(&request_mgmt_once, 0, 1) == 0)
> +				request_module_nowait("octeon_mgmt");
> +		}
> +		if (is_pki) {
> +			if (atomic_cmpxchg(&load_driver_once, 0, 1) == 0)
> +				request_module_nowait("octeon3-ethernet");
> +		}

You're going to have to explain this, it makes no sense to me.
 
> +static int bgx_pki_ports_init(void)
> +{
> +	int i;
> +	int j;
> +	int k;

"int i, j, k;" please.

> +static int bgx_port_xgmii_set_link_up(struct bgx_port_priv *priv)
> +{
> +	u64 data;
> +	int timeout;

Please order from longest to shortest line for variable declarations.

> +static int bgx_port_sgmii_set_link_speed(struct bgx_port_priv *priv,
> +					 struct port_status status)
> +{
> +	int timeout;
> +	u64 miscx;
> +	u64 data;
> +	u64 prtx;

Please use "u64 miscx, data, prtx;" and put it on the first line.

> +static struct port_status bgx_port_get_xaui_link(struct bgx_port_priv *priv)
> +{
> +	struct port_status status;
> +	int lanes;
> +	int speed;
> +	u64 data;

"int lanes, speed;"

> +static int bgx_port_gser_27882(struct bgx_port_priv *priv)
> +{
> +	int timeout;
> +	u64 addr;
> +	u64 data;

"u64 addr, data;" and move to first line.

> +static int bgx_port_qlm_rx_equalization(struct bgx_port_priv *priv,
> +					int qlm, int lane)
> +{
> +	int max_lanes = bgx_port_get_max_qlm_lanes(qlm);
> +	int lane_mask;
> +	int timeout;
> +	int rc = 0;
> +	u64 lmode;
> +	u64 addr;
> +	u64 data;
> +	int i;

Please group these local variables.  Have some pity for people who
have not so much vertical space on their screen when they are reading
your code. :)

> +static int bgx_port_init_xaui_link(struct bgx_port_priv *priv)
> +{
> +	int use_training = 0;
> +	int use_ber = 0;
> +	int timeout;
> +	int rc = 0;
> +	u64 data;

Please group the int variables into a smaller number of lines.

> +		/* Wait for mac rx to be ready */
> +		timeout = 10000;
> +		do {
> +			data = oct_csr_read(BGX_SMU_RX_CTL(priv->node, priv->bgx, priv->index));
> +			data &= GENMASK_ULL(1, 0);
> +			if (!data)
> +				break;
> +			timeout--;
> +			udelay(1);
> +		} while (timeout);

This construct is repeated so many times, over and over.  Make a helper
function that performs this operation.

> +static void bgx_port_adjust_link(struct net_device *netdev)
> +{
> +	struct bgx_port_priv *priv = bgx_port_netdev2priv(netdev);
> +	bool link_changed = false;
> +	unsigned int duplex;
> +	unsigned int speed;
> +	unsigned int link;

Please group the unsigned ints.

> +static int bgx_port_probe(struct platform_device *pdev)
> +{
> +	struct bgx_port_priv *priv;
> +	const __be32 *reg;
> +	const u8 *mac;
> +	int numa_node;
> +	u32 index;
> +	u64 addr;
> +	int rc;

Please group variables of the same time into one line.

> +static int __init bgx_port_driver_init(void)
> +{
> +	int r;
> +	int i;
> +	int j;
> +	int k;

"int r, i, j, k;"

> +static inline u64 scratch_read64(u64 offset)

Do not use "inline" for functions in foo.c files, let the compiler
decide.

> +static inline void scratch_write64(u64 offset, u64 value)

Likewise.

> +static inline struct wr_ret octeon3_core_get_work_sync(int grp)
> +{
> +	u64 node = cvmx_get_node_num();
> +	struct wr_ret r;
> +	u64 response;
> +	u64 addr;

"u64 response, addr;"  Don't use inline.

> +static inline void octeon3_core_get_work_async(unsigned int grp)
> +{

Kill the inline.

> +static inline struct wr_ret octeon3_core_get_response_async(void)

Likewise.

> +static int octeon3_eth_tx_complete_hwtstamp(struct octeon3_ethernet *priv,
> +					    struct sk_buff *skb)
> +{
> +	struct skb_shared_hwtstamps shts;
> +	u64 hwts;
> +	u64 ns;

"u64 hwts, ns;"

> +static int octeon3_eth_tx_complete_worker(void *data)
> +{
> +	struct octeon3_ethernet_worker *worker = data;
> +	struct octeon3_ethernet_node *oen;
> +	int tx_complete_stop_thresh;
> +	int backlog_stop_thresh;
> +	int backlog;
> +	u64 aq_cnt;
> +	int order;
> +	int i;

Group the variable declarations, please.

> +static int octeon3_eth_common_ndo_init(struct net_device *netdev,
> +				       int extra_skip)
> +{
> +	struct octeon3_ethernet_node *oen;
> +	int base_rx_grp[MAX_RX_QUEUES];
> +	struct octeon3_ethernet *priv;
> +	int pki_chan;
> +	int aura;
> +	int dq;
> +	int i;
> +	int r;

"int pki_chan, aura, dq, i, r;"

> +static void octeon3_eth_ndo_get_stats64(struct net_device *netdev,
> +					struct rtnl_link_stats64 *s)
> +{
> +	struct octeon3_ethernet *priv = netdev_priv(netdev);
> +	u64 delta_packets;
> +	u64 delta_dropped;
> +	u64 delta_octets;
> +	u64 dropped;
> +	u64 packets;
> +	u64 octets;

Group the u64s please.

> +static int octeon3_eth_common_ndo_open(struct net_device *netdev)
> +{
> +	struct octeon3_ethernet *priv = netdev_priv(netdev);
> +	struct octeon3_rx *rx;
> +	int i;
> +	int r;

"int i, r;"

> +static inline u64 build_pko_send_ext_desc(struct sk_buff *skb)

Kill the inline.

> +static inline u64 build_pko_send_tso(struct sk_buff *skb, uint mtu)

Likewise.

> +static inline u64 build_pko_send_mem_sub(u64 addr)

Likewise.

> +static inline u64 build_pko_send_mem_ts(u64 addr)

Likewise.

> +static inline u64 build_pko_send_free(u64 addr)

Likewise.

> +static inline u64 build_pko_send_work(int grp, u64 addr)

Likewise.

> +static int octeon3_eth_ndo_start_xmit(struct sk_buff *skb,
> +				      struct net_device *netdev)
> +{
> +	struct octeon3_ethernet *priv = netdev_priv(netdev);
> +	struct octeon3_ethernet_node *oen;
> +	u64 scr_off = LMTDMA_SCR_OFFSET;
> +	struct sk_buff *skb_tmp;
> +	u64 pko_send_desc;
> +	u64 *lmtdma_addr;
> +	unsigned int mss;
> +	u64 lmtdma_data;
> +	u64 aq_cnt = 0;
> +	int frag_count;
> +	long backlog;
> +	u64 head_len;
> +	void **work;
> +	int grp;
> +	int i;

Please group these variables, this is crazy...

> +static int octeon3_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
> +{
> +	struct octeon3_ethernet	*priv;
> +	int neg_ppb = 0;
> +	u64 comp;
> +	u64 diff;

"u64 comp, diff;"
> +int octeon_fpa3_init(int node)
> +{
> +	static bool init_done[2];
> +	int aura_cnt;
> +	u64 data;
> +	int i;

"int aura_cnt, i; "

> +int octeon_fpa3_aura_init(int node, int pool, int aura_num,
> +			  int *aura, int num_bufs, unsigned int limit)
> +{
> +	struct global_resource_tag tag;
> +	unsigned int drop;
> +	unsigned int pass;
> +	char buf[16];
> +	int rc = 0;
> +	u64 shift;
> +	u64 data;

"unsigned int drop, pass;"
"u64 shift, data;"

> +void *octeon_fpa3_alloc(int node, int aura)
> +{
> +	void *buf = NULL;
> +	u64 buf_phys;
> +	u64 addr;

"u64 buf_phys, addr;"

> +void octeon_fpa3_free(int node, int aura, const void *buf)
> +{
> +	u64 buf_phys;
> +	u64 addr;

"u64 buf_phys, addr;"

> +int octeon_fpa3_mem_fill(int node, struct kmem_cache *cache,
> +			 int aura, int num_bufs)
> +{
> +	void *mem;
> +	int rc = 0;
> +	int i;

"int i, rc = 0;"

> +static int octeon3_pki_pcam_alloc_entry(int node, int entry, int bank)
> +{
> +	struct global_resource_tag tag;
> +	char buf[16];
> +	int num_clusters;
> +	int rc;
> +	int i;

"int num_clusters, rc, i;"

> +static int octeon3_pki_pcam_write_entry(int node,
> +					struct pcam_term_info *term_info)
> +{
> +	int num_clusters;
> +	u64 action;
> +	int entry;
> +	u64 match;
> +	int bank;
> +	u64 term;
> +	int i;

"u64 action, match, term;"
"int num_clusters, entry, bank, i;"

> +int octeon3_pki_set_ptp_skip(int node, int pknd, int skip)
> +{
> +	int num_clusters;
> +	u64 data;
> +	u64 i;

"u64 data, i;"

That's all I have the stomache for at the moment.

This thing is really large, making it nearly impossible to review
as one huge patch #3. Perhaps you can find a way to split it up
logically somehow?

^ permalink raw reply

* [PATCH bpf-next] bpf: sync tools bpf.h uapi header
From: Prashant Bhole @ 2018-05-09  2:04 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: Prashant Bhole, David S . Miller, netdev

sync the header from include/uapi/linux/bpf.h which was updated to add
fib lookup helper function. This fixes selftests/bpf build failure

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
---
 tools/include/uapi/linux/bpf.h | 84 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 83a95ae388dd..ddc566cb7492 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -10,6 +10,8 @@
 
 #include <linux/types.h>
 #include <linux/bpf_common.h>
+#include <linux/if_ether.h>
+#include <linux/in6.h>
 
 /* Extended instruction set based on top of classic BPF */
 
@@ -116,6 +118,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_DEVMAP,
 	BPF_MAP_TYPE_SOCKMAP,
 	BPF_MAP_TYPE_CPUMAP,
+	BPF_MAP_TYPE_XSKMAP,
 };
 
 enum bpf_prog_type {
@@ -1825,6 +1828,33 @@ union bpf_attr {
  * 	Return
  * 		0 on success, or a negative error in case of failure.
  *
+ *
+ * int bpf_fib_lookup(void *ctx, struct bpf_fib_lookup *params, int plen, u32 flags)
+ *	Description
+ *		Do FIB lookup in kernel tables using parameters in *params*.
+ *		If lookup is successful and result shows packet is to be
+ *		forwarded, the neighbor tables are searched for the nexthop.
+ *		If successful (ie., FIB lookup shows forwarding and nexthop
+ *		is resolved), the nexthop address is returned in ipv4_dst,
+ *		ipv6_dst or mpls_out based on family, smac is set to mac
+ *		address of egress device, dmac is set to nexthop mac address,
+ *		rt_metric is set to metric from route.
+ *
+ *             *plen* argument is the size of the passed in struct.
+ *             *flags* argument can be one or more BPF_FIB_LOOKUP_ flags:
+ *
+ *             **BPF_FIB_LOOKUP_DIRECT** means do a direct table lookup vs
+ *             full lookup using FIB rules
+ *             **BPF_FIB_LOOKUP_OUTPUT** means do lookup from an egress
+ *             perspective (default is ingress)
+ *
+ *             *ctx* is either **struct xdp_md** for XDP programs or
+ *             **struct sk_buff** tc cls_act programs.
+ *
+ *     Return
+ *             Egress device index on success, 0 if packet needs to continue
+ *             up the stack for further processing or a negative error in case
+ *             of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1895,7 +1925,8 @@ union bpf_attr {
 	FN(xdp_adjust_tail),		\
 	FN(skb_get_xfrm_state),		\
 	FN(get_stack),			\
-	FN(skb_load_bytes_relative),
+	FN(skb_load_bytes_relative),	\
+	FN(fib_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -2309,4 +2340,55 @@ struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
 
+/* DIRECT:  Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT:  Do lookup from egress perspective; default is ingress
+ */
+#define BPF_FIB_LOOKUP_DIRECT  BIT(0)
+#define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
+
+struct bpf_fib_lookup {
+	/* input */
+	__u8	family;   /* network family, AF_INET, AF_INET6, AF_MPLS */
+
+	/* set if lookup is to consider L4 data - e.g., FIB rules */
+	__u8	l4_protocol;
+	__be16	sport;
+	__be16	dport;
+
+	/* total length of packet from network header - used for MTU check */
+	__u16	tot_len;
+	__u32	ifindex;  /* L3 device index for lookup */
+
+	union {
+		/* inputs to lookup */
+		__u8	tos;		/* AF_INET  */
+		__be32	flowlabel;	/* AF_INET6 */
+
+		/* output: metric of fib result */
+		__u32 rt_metric;
+	};
+
+	union {
+		__be32		mpls_in;
+		__be32		ipv4_src;
+		struct in6_addr	ipv6_src;
+	};
+
+	/* input to bpf_fib_lookup, *dst is destination address.
+	 * output: bpf_fib_lookup sets to gateway address
+	 */
+	union {
+		/* return for MPLS lookups */
+		__be32		mpls_out[4];  /* support up to 4 labels */
+		__be32		ipv4_dst;
+		struct in6_addr	ipv6_dst;
+	};
+
+	/* output */
+	__be16	h_vlan_proto;
+	__be16	h_vlan_TCI;
+	__u8	smac[ETH_ALEN];
+	__u8	dmac[ETH_ALEN];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: Alexei Starovoitov @ 2018-05-09  2:29 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Alexei Starovoitov, davem, daniel, torvalds, gregkh, luto, netdev,
	linux-kernel, kernel-team, Juergen Gross, Eric Paris,
	Matthew Auld, Josh Triplett, Kirill A. Shutemov, Joonas Lahtinen,
	Chris Wilson, Stephen Smalley, Eric W. Biederman, Mimi Zohar,
	David Howells, Kees Cook, Andrew Morton
In-Reply-To: <20180507185124.GA18195@wotan.suse.de>

On Mon, May 07, 2018 at 06:51:24PM +0000, Luis R. Rodriguez wrote:
> > Notice that _binary_net_bpfilter_bpfilter_umh_start - end
> > is placed into .init.rodata section, so it's freed as soon as __init
> > function of bpfilter.ko is finished.
> > As part of __init the bpfilter.ko does first request/reply action
> > via two unix pipe provided by fork_usermode_blob() helper to
> > make sure that umh is healthy. If not it will kill it via pid.
> 
> It does this very fast, right away. On a really slow system how are you sure
> that this won't race and the execution of the check happens early on prior to
> letting the actual setup trigger? After all, we're calling the userpsace
> process in async mode. We could preempt it now.

I don't see an issue.
the kernel synchronously writes into a pipe. User space process reads.
Exactly the same as coredump logic with pipes.

> > +# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> > +# inside bpfilter_umh.o elf file referenced by
> > +# _binary_net_bpfilter_bpfilter_umh_start symbol
> > +# which bpfilter_kern.c passes further into umh blob loader at run-time
> > +quiet_cmd_copy_umh = GEN $@
> > +      cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> > +      $(OBJCOPY) -I binary -O $(CONFIG_OUTPUT_FORMAT) \
> > +      -B `$(OBJDUMP) -f $<|grep architecture|cut -d, -f1|cut -d' ' -f2` \
> > +      --rename-section .data=.init.rodata $< $@
> 
> Cool, but so our expectation is that the compiler sets this symbol, how
> are we sure it will always be set?

Compiler doesn't set it. objcopy does.

> > +
> > +	if (__bpfilter_process_sockopt(NULL, 0, 0, 0, 0) != 0) {
> 
> See, here, what if the userspace process gets preemtped and we run this
> check afterwards? Is that possible?

User space is a normal task. It can sleep and can be single stepped with GDB.

^ permalink raw reply

* Re: [net-next PATCH v3 0/6] UDP GSO Segmentation clean-ups
From: David Miller @ 2018-05-09  2:30 UTC (permalink / raw)
  To: alexander.duyck; +Cc: netdev, willemb
In-Reply-To: <CAKgT0UfRXs5SQjxN7gzZ=qo2yORteLXfdDOj_pOrGyYX_LouRA@mail.gmail.com>

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Mon, 7 May 2018 13:03:47 -0700

> On Mon, May 7, 2018 at 11:08 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> This patch set addresses a number of issues I found while sorting out
>> enabling UDP GSO Segmentation support for ixgbe/ixgbevf. Specifically there
>> were a number of issues related to the checksum and such that seemed to
>> cause either minor irregularities or kernel panics in the case of the
>> offload request being allowed to traverse between name spaces.
>>
>> With this set applied I am was able to get UDP GSO traffic to pass over
>> vxlan tunnels in both offloaded modes and non-offloaded modes for ixgbe and
>> ixgbevf.
>>
>> I submitted the driver specific patches earlier as an RFC:
>> https://patchwork.ozlabs.org/project/netdev/list/?series=42477&archive=both&state=*
>>
>> v2: Updated patches based on feedback from Eric Dumazet
>>     Split first patch into several patches based on feedback from Eric
>> v3: Drop patch that was calling pskb_may_pull as it was redundant.
>>     Added code to use MANGLED_0 in case of UDP checksum
>>     Drop patch adding NETIF_F_GSO_UDP_L4 to list of GSO software offloads
>>     Added Acked-by for patches reviewed by Willem and not changed
> 
> Just noticed I forgot to update the subject before sending out the
> cover page. I updated it for this reply. If needed I will submit a v4,
> but for now I will leave this out here to finish up review.

I thought it was kinda amusing, because it shows up as the series name
in patchwork too :-)

Series applied with header posting Subj fixed, thanks Alexander.

^ permalink raw reply

* [PATCH bpf-next 0/2] nfp: bpf: add programmable RSS support
From: Jakub Kicinski @ 2018-05-09  2:37 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: davem, netdev, oss-drivers, Jakub Kicinski

Hi!

This small series adds a feature which extends BPF offload beyond
a pure host processing offload and firmly into the realm of
heterogeneous processing.  Allowing offloaded XDP programs to set
the RX queue index opens the door for defining fully programmable
RSS/n-tuple filter replacement.  In fact the device datapath will
skip the RSS processing completely if BPF decided on the queue
already, making the XDP program replace part of the standard NIC
datapath.

We hope some day the entire NIC datapath will be defined by BPF :)

Jakub Kicinski (2):
  bpf: xdp: allow offloads to store into rx_queue_index
  nfp: bpf: support setting the RX queue index

 drivers/net/ethernet/netronome/nfp/bpf/fw.h   |  1 +
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 47 +++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 11 +++++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  8 ++++
 .../net/ethernet/netronome/nfp/bpf/verifier.c | 28 ++++++++++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  | 22 +++++----
 include/linux/bpf.h                           |  2 +-
 kernel/bpf/verifier.c                         |  2 +-
 net/core/filter.c                             |  9 +++-
 9 files changed, 115 insertions(+), 15 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [PATCH bpf-next 1/2] bpf: xdp: allow offloads to store into rx_queue_index
From: Jakub Kicinski @ 2018-05-09  2:37 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: davem, netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180509023707.23601-1-jakub.kicinski@netronome.com>

It's fairly easy for offloaded XDP programs to select the RX queue
packets go to.  We need a way of expressing this in the software.
Allow write to the rx_queue_index field of struct xdp_md for
device-bound programs.

Skip convert_ctx_access callback entirely for offloads.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 include/linux/bpf.h   | 2 +-
 kernel/bpf/verifier.c | 2 +-
 net/core/filter.c     | 9 ++++++++-
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 321969da67b7..a38e474bf7ee 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -627,7 +627,7 @@ bool bpf_offload_dev_match(struct bpf_prog *prog, struct bpf_map *map);
 #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
 int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
 
-static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
+static inline bool bpf_prog_is_dev_bound(const struct bpf_prog_aux *aux)
 {
 	return aux->offload_requested;
 }
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d5e1a6c4165d..d92d9c37affd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5215,7 +5215,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 	}
 
-	if (!ops->convert_ctx_access)
+	if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux))
 		return 0;
 
 	insn = env->prog->insnsi + delta;
diff --git a/net/core/filter.c b/net/core/filter.c
index cf0d27acf1d1..2336b90e8b26 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4911,8 +4911,15 @@ static bool xdp_is_valid_access(int off, int size,
 				const struct bpf_prog *prog,
 				struct bpf_insn_access_aux *info)
 {
-	if (type == BPF_WRITE)
+	if (type == BPF_WRITE) {
+		if (bpf_prog_is_dev_bound(prog->aux)) {
+			switch (off) {
+			case offsetof(struct xdp_md, rx_queue_index):
+				return __is_valid_xdp_access(off, size);
+			}
+		}
 		return false;
+	}
 
 	switch (off) {
 	case offsetof(struct xdp_md, data):
-- 
2.17.0

^ permalink raw reply related

* [PATCH bpf-next 2/2] nfp: bpf: support setting the RX queue index
From: Jakub Kicinski @ 2018-05-09  2:37 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: davem, netdev, oss-drivers, Jakub Kicinski
In-Reply-To: <20180509023707.23601-1-jakub.kicinski@netronome.com>

BPF has access to all internal FW datapath structures.  Including
the structure containing RX queue selection.  With little coordination
with the datapath we can let the offloaded BPF select the RX queue.
We just need a way to tell the datapath that queue selection has already
been done and it shouldn't overwrite it.  Define a bit to tell datapath
BPF already selected a queue (QSEL_SET), if the selected queue is not
enabled (>= number of enabled queues) datapath will perform normal RSS.

BPF queue selection on the NIC can be used to replace standard
datapath RSS with fully programmable BPF/XDP RSS.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/fw.h   |  1 +
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 47 +++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 11 +++++
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  8 ++++
 .../net/ethernet/netronome/nfp/bpf/verifier.c | 28 ++++++++++-
 drivers/net/ethernet/netronome/nfp/nfp_asm.h  | 22 +++++----
 6 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/fw.h b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
index 3dbc21653ce5..4c7972e3db63 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/fw.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/fw.h
@@ -50,6 +50,7 @@ enum bpf_cap_tlv_type {
 	NFP_BPF_CAP_TYPE_ADJUST_HEAD	= 2,
 	NFP_BPF_CAP_TYPE_MAPS		= 3,
 	NFP_BPF_CAP_TYPE_RANDOM		= 4,
+	NFP_BPF_CAP_TYPE_QUEUE_SELECT	= 5,
 };
 
 struct nfp_bpf_cap_tlv_func {
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 326a2085d650..a4d3da215863 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -42,6 +42,7 @@
 
 #include "main.h"
 #include "../nfp_asm.h"
+#include "../nfp_net_ctrl.h"
 
 /* --- NFP prog --- */
 /* Foreach "multiple" entries macros provide pos and next<n> pointers.
@@ -1470,6 +1471,38 @@ nfp_perf_event_output(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 	return 0;
 }
 
+static int
+nfp_queue_select(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	u32 jmp_tgt;
+
+	jmp_tgt = nfp_prog_current_offset(nfp_prog) + 5;
+
+	/* Make sure the queue id fits into FW field */
+	emit_alu(nfp_prog, reg_none(), reg_a(meta->insn.src_reg * 2),
+		 ALU_OP_AND_NOT_B, reg_imm(0xff));
+	emit_br(nfp_prog, BR_BEQ, jmp_tgt, 2);
+
+	/* Set the 'queue selected' bit and the queue value */
+	emit_shf(nfp_prog, pv_qsel_set(nfp_prog),
+		 pv_qsel_set(nfp_prog), SHF_OP_OR, reg_imm(1),
+		 SHF_SC_L_SHF, PKT_VEL_QSEL_SET_BIT);
+	emit_ld_field(nfp_prog,
+		      pv_qsel_val(nfp_prog), 0x1, reg_b(meta->insn.src_reg * 2),
+		      SHF_SC_NONE, 0);
+	/* Delay slots end here, we will jump over next instruction if queue
+	 * value fits into the field.
+	 */
+	emit_ld_field(nfp_prog,
+		      pv_qsel_val(nfp_prog), 0x1, reg_imm(NFP_NET_RXR_MAX),
+		      SHF_SC_NONE, 0);
+
+	if (!nfp_prog_confirm_current_offset(nfp_prog, jmp_tgt))
+		return -EINVAL;
+
+	return 0;
+}
+
 /* --- Callbacks --- */
 static int mov_reg64(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
@@ -2160,6 +2193,17 @@ mem_stx_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 			    false, wrp_lmem_store);
 }
 
+static int mem_stx_xdp(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+{
+	switch (meta->insn.off) {
+	case offsetof(struct xdp_md, rx_queue_index):
+		return nfp_queue_select(nfp_prog, meta);
+	}
+
+	WARN_ON_ONCE(1); /* verifier should have rejected bad accesses */
+	return -EOPNOTSUPP;
+}
+
 static int
 mem_stx(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	unsigned int size)
@@ -2186,6 +2230,9 @@ static int mem_stx2(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 
 static int mem_stx4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
 {
+	if (meta->ptr.type == PTR_TO_CTX)
+		if (nfp_prog->type == BPF_PROG_TYPE_XDP)
+			return mem_stx_xdp(nfp_prog, meta);
 	return mem_stx(nfp_prog, meta, 4);
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index d72f9e7f42da..f1846d8f59cc 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -334,6 +334,13 @@ nfp_bpf_parse_cap_random(struct nfp_app_bpf *bpf, void __iomem *value,
 	return 0;
 }
 
+static int
+nfp_bpf_parse_cap_qsel(struct nfp_app_bpf *bpf, void __iomem *value, u32 length)
+{
+	bpf->queue_select = true;
+	return 0;
+}
+
 static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 {
 	struct nfp_cpp *cpp = app->pf->cpp;
@@ -376,6 +383,10 @@ static int nfp_bpf_parse_capabilities(struct nfp_app *app)
 			if (nfp_bpf_parse_cap_random(app->priv, value, length))
 				goto err_release_free;
 			break;
+		case NFP_BPF_CAP_TYPE_QUEUE_SELECT:
+			if (nfp_bpf_parse_cap_qsel(app->priv, value, length))
+				goto err_release_free;
+			break;
 		default:
 			nfp_dbg(cpp, "unknown BPF capability: %d\n", type);
 			break;
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 82682378d57f..8b143546ae85 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -82,10 +82,16 @@ enum static_regs {
 enum pkt_vec {
 	PKT_VEC_PKT_LEN		= 0,
 	PKT_VEC_PKT_PTR		= 2,
+	PKT_VEC_QSEL_SET	= 4,
+	PKT_VEC_QSEL_VAL	= 6,
 };
 
+#define PKT_VEL_QSEL_SET_BIT	4
+
 #define pv_len(np)	reg_lm(1, PKT_VEC_PKT_LEN)
 #define pv_ctm_ptr(np)	reg_lm(1, PKT_VEC_PKT_PTR)
+#define pv_qsel_set(np)	reg_lm(1, PKT_VEC_QSEL_SET)
+#define pv_qsel_val(np)	reg_lm(1, PKT_VEC_QSEL_VAL)
 
 #define stack_reg(np)	reg_a(STATIC_REG_STACK)
 #define stack_imm(np)	imm_b(np)
@@ -139,6 +145,7 @@ enum pkt_vec {
  * @helpers.perf_event_output:	output perf event to a ring buffer
  *
  * @pseudo_random:	FW initialized the pseudo-random machinery (CSRs)
+ * @queue_select:	BPF can set the RX queue ID in packet vector
  */
 struct nfp_app_bpf {
 	struct nfp_app *app;
@@ -181,6 +188,7 @@ struct nfp_app_bpf {
 	} helpers;
 
 	bool pseudo_random;
+	bool queue_select;
 };
 
 enum nfp_bpf_map_use {
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index e163f3cfa47d..844a9be6e55a 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -467,6 +467,30 @@ nfp_bpf_check_ptr(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 	return 0;
 }
 
+static int
+nfp_bpf_check_store(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+		    struct bpf_verifier_env *env)
+{
+	const struct bpf_reg_state *reg = cur_regs(env) + meta->insn.dst_reg;
+
+	if (reg->type == PTR_TO_CTX) {
+		if (nfp_prog->type == BPF_PROG_TYPE_XDP) {
+			/* XDP ctx accesses must be 4B in size */
+			switch (meta->insn.off) {
+			case offsetof(struct xdp_md, rx_queue_index):
+				if (nfp_prog->bpf->queue_select)
+					goto exit_check_ptr;
+				pr_vlog(env, "queue selection not supported by FW\n");
+				return -EOPNOTSUPP;
+			}
+		}
+		pr_vlog(env, "unsupported store to context field\n");
+		return -EOPNOTSUPP;
+	}
+exit_check_ptr:
+	return nfp_bpf_check_ptr(nfp_prog, meta, env, meta->insn.dst_reg);
+}
+
 static int
 nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
 		   struct bpf_verifier_env *env)
@@ -522,8 +546,8 @@ nfp_verify_insn(struct bpf_verifier_env *env, int insn_idx, int prev_insn_idx)
 		return nfp_bpf_check_ptr(nfp_prog, meta, env,
 					 meta->insn.src_reg);
 	if (is_mbpf_store(meta))
-		return nfp_bpf_check_ptr(nfp_prog, meta, env,
-					 meta->insn.dst_reg);
+		return nfp_bpf_check_store(nfp_prog, meta, env);
+
 	if (is_mbpf_xadd(meta))
 		return nfp_bpf_check_xadd(nfp_prog, meta, env);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
index 5f2b2f24f4fa..faa4e131c136 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_asm.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -183,16 +183,18 @@ enum shf_sc {
 #define OP_ALU_DST_LMEXTN	0x80000000000ULL
 
 enum alu_op {
-	ALU_OP_NONE	= 0x00,
-	ALU_OP_ADD	= 0x01,
-	ALU_OP_NOT	= 0x04,
-	ALU_OP_ADD_2B	= 0x05,
-	ALU_OP_AND	= 0x08,
-	ALU_OP_SUB_C	= 0x0d,
-	ALU_OP_ADD_C	= 0x11,
-	ALU_OP_OR	= 0x14,
-	ALU_OP_SUB	= 0x15,
-	ALU_OP_XOR	= 0x18,
+	ALU_OP_NONE		= 0x00,
+	ALU_OP_ADD		= 0x01,
+	ALU_OP_NOT		= 0x04,
+	ALU_OP_ADD_2B		= 0x05,
+	ALU_OP_AND		= 0x08,
+	ALU_OP_AND_NOT_A	= 0x0c,
+	ALU_OP_SUB_C		= 0x0d,
+	ALU_OP_AND_NOT_B	= 0x10,
+	ALU_OP_ADD_C		= 0x11,
+	ALU_OP_OR		= 0x14,
+	ALU_OP_SUB		= 0x15,
+	ALU_OP_XOR		= 0x18,
 };
 
 enum alu_dst_ab {
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH 2/2] alx: add disable_wol paramenter
From: AceLan Kao @ 2018-05-09  2:40 UTC (permalink / raw)
  To: David Miller
  Cc: Andrew Lunn, James Cliburn, Chris Snook, rakesh, netdev,
	Linux-Kernel@Vger. Kernel. Org, Emily Chien
In-Reply-To: <CAFv23QmkLiuFA6C4YFsO3hoTEE--2bT4H2uxaMqoY30G4_QJGw@mail.gmail.com>

Hi,

I didn't get any response around one month.
I'm still here hoping you can consider accepting the WoL patch.

Without that patch, people have no chance to bump into the issue and
have no chance to fix it.
Moreover, it leads to the dkms package be spreaded around, and it'll
become a more annoying issue when UEFI secure boot is enabled[1].
Please re-consider it to enable WoL again and set it to disable by default,
so that we/user have a chance to examine the feature and have a chance
to find out a read fix for it.
Thanks.

1. https://bugzilla.kernel.org/show_bug.cgi?id=61651

Best regards,
AceLan Kao.

2018-04-24 11:45 GMT+08:00 AceLan Kao <acelan.kao@canonical.com>:
> Hi,
>
> May I know the final decision of this patch?
> Thanks.
>
> Best regards,
> AceLan Kao.
>
> 2018-04-10 10:40 GMT+08:00 AceLan Kao <acelan.kao@canonical.com>:
>> The problem is I don't have a machine with that wakeup issue, and I
>> need WoL feature.
>> Instead of spreading "alx with WoL" dkms package everywhere, I would
>> like to see it's supported in the driver and is disabled by default.
>>
>> Moreover, the wakeup issue may come from old Atheros chips, or result
>> from buggy BIOS.
>> With the WoL has been removed from the driver, no one will report
>> issue about that, and we don't have any chance to find a fix for it.
>>
>> Adding this feature back is not covering a paper on the issue, it
>> makes people have a chance to examine this feature.
>>
>> 2018-04-09 22:50 GMT+08:00 David Miller <davem@davemloft.net>:
>>> From: Andrew Lunn <andrew@lunn.ch>
>>> Date: Mon, 9 Apr 2018 14:39:10 +0200
>>>
>>>> On Mon, Apr 09, 2018 at 07:35:14PM +0800, AceLan Kao wrote:
>>>>> The WoL feature was reported broken and will lead to
>>>>> the system resume immediately after suspending.
>>>>> This symptom is not happening on every system, so adding
>>>>> disable_wol option and disable WoL by default to prevent the issue from
>>>>> happening again.
>>>>
>>>>>  const char alx_drv_name[] = "alx";
>>>>>
>>>>> +/* disable WoL by default */
>>>>> +bool disable_wol = 1;
>>>>> +module_param(disable_wol, bool, 0);
>>>>> +MODULE_PARM_DESC(disable_wol, "Disable Wake on Lan feature");
>>>>> +
>>>>
>>>> Hi AceLan
>>>>
>>>> This seems like you are papering over the cracks. And module
>>>> parameters are not liked.
>>>>
>>>> Please try to find the real problem.
>>>
>>> Agreed.

^ permalink raw reply

* Re: [PATCH bpf-next 0/2] nfp: bpf: add programmable RSS support
From: Alexei Starovoitov @ 2018-05-09  2:42 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: daniel, davem, netdev, oss-drivers
In-Reply-To: <20180509023707.23601-1-jakub.kicinski@netronome.com>

On Tue, May 08, 2018 at 07:37:05PM -0700, Jakub Kicinski wrote:
> Hi!
> 
> This small series adds a feature which extends BPF offload beyond
> a pure host processing offload and firmly into the realm of
> heterogeneous processing.  Allowing offloaded XDP programs to set
> the RX queue index opens the door for defining fully programmable
> RSS/n-tuple filter replacement.  In fact the device datapath will
> skip the RSS processing completely if BPF decided on the queue
> already, making the XDP program replace part of the standard NIC
> datapath.

Absolutely love it!
Huge feature enabled by such tiny diff.

For the set:
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox