Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH iproute2/net-next 1/2] tc: flower: document that *_ip parameters take a PREFIX as an argument.
From: Simon Horman @ 2016-12-16 13:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1481896477-13497-1-git-send-email-simon.horman@netronome.com>

* The argument to src_ip, dst_ip, enc_src_ip and enc_dst_ip take an
  optional prefix length which is used to provide a mask to limit the scope
  of matching.
* This is documented as a PREFIX in keeping with ip-route(8).

Example of uses of IPv4 and IPv6 prefixes

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 dst_ip 192.168.1.1 action drop
tc filter add dev eth0 protocol ip parent ffff: flower \
    indev eth0 src_ip 10.0.0.0/8 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 src_ip 2001:DB8:1::/48 action drop
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
    indev eth0 dst_ip 2001:DB8::1 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 man/man8/tc-flower.8 | 28 ++++++++++++++--------------
 tc/f_flower.c        |  8 ++++----
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 88df83360b89..a383b6584dc6 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -31,8 +31,8 @@ flower \- flow based traffic control filter
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | " sctp " | " icmp " | " icmpv6 " | "
 .IR IP_PROTO " } | { "
-.BR dst_ip " | " src_ip " } { "
-.IR ipv4_address " | " ipv6_address " } | { "
+.BR dst_ip " | " src_ip " } "
+.IR PREFIX " | { "
 .BR dst_port " | " src_port " } "
 .IR port_number " } | "
 .B enc_key_id
@@ -103,14 +103,14 @@ may be
 .BR tcp ", " udp ", " sctp ", " icmp ", " icmpv6
 or an unsigned 8bit value in hexadecimal format.
 .TP
-.BI dst_ip " ADDRESS"
+.BI dst_ip " PREFIX"
 .TQ
-.BI src_ip " ADDRESS"
+.BI src_ip " PREFIX"
 Match on source or destination IP address.
-.I ADDRESS
-must be a valid IPv4 or IPv6 address, depending on
-.BR protocol
-option of tc filter.
+.I PREFIX
+must be a valid IPv4 or IPv6 address, depending on the \fBprotocol\fR
+option to tc filter, optionally followed by a slash and the prefix length.
+If the prefix is missing, \fBtc\fR assumes a full-length host match.
 .TP
 .BI dst_port " NUMBER"
 .TQ
@@ -128,16 +128,16 @@ which have to be specified in beforehand.
 .TP
 .BI enc_key_id " NUMBER"
 .TQ
-.BI enc_dst_ip " ADDRESS"
+.BI enc_dst_ip " PREFIX"
 .TQ
-.BI enc_src_ip " ADDRESS"
-.TQ
-.BI enc_dst_port " NUMBER"
+.BI enc_src_ip " PREFIX"
 Match on IP tunnel metadata. Key id
 .I NUMBER
 is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
-.I ADDRESS
-must be a valid IPv4 or IPv6 address. Dst port
+.I PREFIX
+must be a valid IPv4 or IPv6 address optionally followed by a slash and the
+prefix length. If the prefix is missing, \fBtc\fR assumes a full-length
+host match.  Dst port
 .I NUMBER
 is a 16 bit UDP dst port.
 .SH NOTES
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 653dfefc060a..cdf74344f78f 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -48,14 +48,14 @@ static void explain(void)
 		"                       dst_mac MAC-ADDR |\n"
 		"                       src_mac MAC-ADDR |\n"
 		"                       ip_proto [tcp | udp | sctp | icmp | icmpv6 | IP-PROTO ] |\n"
-		"                       dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
-		"                       src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
+		"                       dst_ip PREFIX |\n"
+		"                       src_ip PREFIX |\n"
 		"                       dst_port PORT-NUMBER |\n"
 		"                       src_port PORT-NUMBER |\n"
 		"                       type ICMP-TYPE |\n"
 		"                       code ICMP-CODE }\n"
-		"                       enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
-		"                       enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n"
+		"                       enc_dst_ip PREFIX |\n"
+		"                       enc_src_ip PREFIX |\n"
 		"                       enc_key_id [ KEY-ID ] }\n"
 		"       FILTERID := X:Y:Z\n"
 		"       ACTION-SPEC := ... look at individual actions\n"
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* [PATCH iproute2/net-next 2/2] tc: flower: Allow *_mac options to accept a mask
From: Simon Horman @ 2016-12-16 13:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1481896477-13497-1-git-send-email-simon.horman@netronome.com>

* The argument to src_mac and dst_mac may now take an optional mask
  to limit the scope of matching.
* This address is is documented as a LLADDR in keeping with ip-link(8).
* The formats accepted match those already output when dumping flower
  filters from the kernel.

Example of use of LLADDR with and without a mask:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:01:00:00:00/ff:ff:00:00:00:01 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00/23 action drop
tc filter add dev eth0 protocol ip parent ffff: flower indev eth0 \
	src_mac 52:54:00:00:00:00 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 man/man8/tc-flower.8 | 13 +++++++++----
 tc/f_flower.c        | 43 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index a383b6584dc6..31c7d3b32f9b 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -22,7 +22,7 @@ flower \- flow based traffic control filter
 .BR skip_sw " | " skip_hw
 .R " | { "
 .BR dst_mac " | " src_mac " } "
-.IR mac_address " | "
+.IR MASKED_LLADDR " | "
 .B vlan_id
 .IR VID " | "
 .B vlan_prio
@@ -74,10 +74,15 @@ filter, or TC offload is not enabled for the interface, operation will fail.
 .BI skip_hw
 Do not process filter by hardware.
 .TP
-.BI dst_mac " mac_address"
+.BI dst_mac " MASKED_LLADDR"
 .TQ
-.BI src_mac " mac_address"
-Match on source or destination MAC address.
+.BI src_mac " MASKED_LLADDR"
+Match on source or destination MAC address.  A mask may be optionally
+provided to limit the bits of the address which are matched. A mask is
+provided by following the address with a slash and then the mask. It may be
+provided in LLADDR format, in which case it is a bitwise mask, or as a
+number of high bits to match. If the mask is missing then a match on all
+bits is assumed.
 .TP
 .BI vlan_id " VID"
 Match on vlan tag id.
diff --git a/tc/f_flower.c b/tc/f_flower.c
index cdf74344f78f..6d9a3b70afed 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -45,8 +45,8 @@ static void explain(void)
 		"                       vlan_id VID |\n"
 		"                       vlan_prio PRIORITY |\n"
 		"                       vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n"
-		"                       dst_mac MAC-ADDR |\n"
-		"                       src_mac MAC-ADDR |\n"
+		"                       dst_mac MASKED-LLADDR |\n"
+		"                       src_mac MASKED-LLADDR |\n"
 		"                       ip_proto [tcp | udp | sctp | icmp | icmpv6 | IP-PROTO ] |\n"
 		"                       dst_ip PREFIX |\n"
 		"                       src_ip PREFIX |\n"
@@ -58,6 +58,7 @@ static void explain(void)
 		"                       enc_src_ip PREFIX |\n"
 		"                       enc_key_id [ KEY-ID ] }\n"
 		"       FILTERID := X:Y:Z\n"
+		"       MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS }\n"
 		"       ACTION-SPEC := ... look at individual actions\n"
 		"\n"
 		"NOTE: CLASSID, IP-PROTO are parsed as hexadecimal input.\n"
@@ -68,16 +69,44 @@ static void explain(void)
 static int flower_parse_eth_addr(char *str, int addr_type, int mask_type,
 				 struct nlmsghdr *n)
 {
-	int ret;
-	char addr[ETH_ALEN];
+	int ret, err = -1;
+	char addr[ETH_ALEN], *slash;
+
+	slash = strchr(str, '/');
+	if (slash)
+		*slash = '\0';
 
 	ret = ll_addr_a2n(addr, sizeof(addr), str);
 	if (ret < 0)
-		return -1;
+		goto err;
 	addattr_l(n, MAX_MSG, addr_type, addr, sizeof(addr));
-	memset(addr, 0xff, ETH_ALEN);
+
+	if (slash) {
+		unsigned bits;
+
+		if (!get_unsigned(&bits, slash + 1, 10)) {
+			uint64_t mask;
+
+			/* Extra 16 bit shift to push mac address into
+			 * high bits of uint64_t
+			 */
+			mask = htonll(0xffffffffffffULL << (16 + 48 - bits));
+			memcpy(addr, &mask, ETH_ALEN);
+		} else {
+			ret = ll_addr_a2n(addr, sizeof(addr), slash + 1);
+			if (ret < 0)
+				goto err;
+		}
+	} else {
+		memset(addr, 0xff, ETH_ALEN);
+	}
 	addattr_l(n, MAX_MSG, mask_type, addr, sizeof(addr));
-	return 0;
+
+	err = 0;
+err:
+	if (slash)
+		*slash = '/';
+	return err;
 }
 
 static int flower_parse_vlan_eth_type(char *str, __be16 eth_type, int type,
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* Re: [PATCH perf/core REBASE 2/5] samples/bpf: Switch over to libbpf
From: Arnaldo Carvalho de Melo @ 2016-12-16 14:21 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Arnaldo Carvalho de Melo, LKML, netdev, Wang Nan, ast,
	Daniel Borkmann
In-Reply-To: <CAPWQB7EAN7Fg8+vO3Tn7WYWcZW_JpKQFNCJzY_K100ckab6JRg@mail.gmail.com>

Em Thu, Dec 15, 2016 at 05:48:31PM -0800, Joe Stringer escreveu:
> On 15 December 2016 at 14:00, Joe Stringer <joe@ovn.org> wrote:
> > On 15 December 2016 at 10:34, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >> So, I'm stopping here so that I can push what I have to Ingo, then I'll get
> >> back to this, hopefully by then you beat me and I have just to retest 8-)

> > OK, thanks for the report. Looks like there was another difference
> > between the two libbpfs - one used total program size for its
> > load_program API; the actual kernel API uses instruction count. This
> > incremental should do the trick:

> > https://github.com/joestringer/linux/commit/6ff7726f20077bed66fb725f5189c13690154b6a
 
> The full branch with this change (fast-forward from your tmp branch)
> is available here:
> https://github.com/joestringer/linux/tree/submit/libbpf_samples_v5
 
> I tried running every selftest and BPF sample I could get my hands on;
> there's one or two that I couldn't run, but seemed more to do with my
> versions of TC/iproute and kernel config rather than libbpf changes.
> Let me know if you see any further trouble.

Thanks for doing that! I'll try and reproduce your tests as soon as I'm
back to the office, it looks like it all will go together in my next
pull to Ingo.

- Arnaldo

^ permalink raw reply

* RE: [PATCH iproute2 v2 1/3] ifstat: Add extended statistics to ifstat
From: Nogah Frankel @ 2016-12-16 12:11 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev@vger.kernel.org, roopa@cumulusnetworks.com, Jiri Pirko,
	Elad Raz, Yotam Gigi, Ido Schimmel, Or Gerlitz
In-Reply-To: <20161215093327.2daf60c6@xeon-e3>


> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 15, 2016 7:33 PM
> To: Nogah Frankel <nogahf@mellanox.com>
> Cc: netdev@vger.kernel.org; roopa@cumulusnetworks.com; Jiri Pirko
> <jiri@mellanox.com>; Elad Raz <eladr@mellanox.com>; Yotam Gigi
> <yotamg@mellanox.com>; Ido Schimmel <idosch@mellanox.com>; Or Gerlitz
> <ogerlitz@mellanox.com>
> Subject: Re: [PATCH iproute2 v2 1/3] ifstat: Add extended statistics to ifstat
> 
> On Thu, 15 Dec 2016 15:00:43 +0200
> Nogah Frankel <nogahf@mellanox.com> wrote:
> 
> > Extended stats are part of the RTM_GETSTATS method. This patch adds them
> > to ifstat.
> > While extended stats can come in many forms, we support only the
> > rtnl_link_stats64 struct for them (which is the 64 bits version of struct
> > rtnl_link_stats).
> > We support stats in the main nesting level, or one lower.
> > The extension can be called by its name or any shorten of it. If there is
> > more than one matched, the first one will be picked.
> >
> > To get the extended stats the flag -x <stats type> is used.
> >
> > Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
> > Reviewed-by: Jiri Pirko <jiri@mellanox.com>
> > ---
> >  misc/ifstat.c | 161
> ++++++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 146 insertions(+), 15 deletions(-)
> >
> > diff --git a/misc/ifstat.c b/misc/ifstat.c
> > index 92d67b0..d17ae21 100644
> > --- a/misc/ifstat.c
> > +++ b/misc/ifstat.c
> > @@ -35,6 +35,7 @@
> >
> >  #include <SNAPSHOT.h>
> >
> > +#include "utils.h"
> >  int dump_zeros;
> >  int reset_history;
> >  int ignore_history;
> 
> Minor nit, please cleanup include order here (original code was wrong).
> 
> Standard practice is:
>  #include system headers (like stdio.h etc)
>  #include "xxx.h" local headers.
> 
> Should be:
> #include <getopt.h>
> 
> #include <linux/if.h>
> #include <linux/if_link.h>
> 
> #include "json_writer.h"
> #include "libnetlink.h"
> #include "utils.h"
> #include "SNAPSHOT.h"

Thanks, I'll fix it.

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-16 14:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Craig Gallek, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <aca378b5-aae4-49cf-0542-d1167a5768b8@stressinduktion.org>

On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa 
<hannes@stressinduktion.org> wrote:
> Hi Josef,
> 
> On 15.12.2016 19:53, Josef Bacik wrote:
>>  On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> 
>> wrote:
>>>  On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek 
>>> <kraigatgoog@gmail.com>
>>>  wrote:
>>>>   On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert 
>>>> <tom@herbertland.com>
>>>>  wrote:
>>>>>   I think there may be some suspicious code in inet_csk_get_port. 
>>>>> At
>>>>>   tb_found there is:
>>>>> 
>>>>>                   if (((tb->fastreuse > 0 && reuse) ||
>>>>>                        (tb->fastreuseport > 0 &&
>>>>>                         !rcu_access_pointer(sk->sk_reuseport_cb) 
>>>>> &&
>>>>>                         sk->sk_reuseport && uid_eq(tb->fastuid,
>>>>>  uid))) &&
>>>>>                       smallest_size == -1)
>>>>>                           goto success;
>>>>>                   if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk,
>>>>>  tb, true)) {
>>>>>                           if ((reuse ||
>>>>>                                (tb->fastreuseport > 0 &&
>>>>>                                 sk->sk_reuseport &&
>>>>> 
>>>>>  !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>>                                 uid_eq(tb->fastuid, uid))) &&
>>>>>                               smallest_size != -1 && --attempts 
>>>>> >= 0) {
>>>>>                                   spin_unlock_bh(&head->lock);
>>>>>                                   goto again;
>>>>>                           }
>>>>>                           goto fail_unlock;
>>>>>                   }
>>>>> 
>>>>>   AFAICT there is redundancy in these two conditionals.  The same 
>>>>> clause
>>>>>   is being checked in both: (tb->fastreuseport > 0 &&
>>>>>   !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>>   uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is 
>>>>> true the
>>>>>   first conditional should be hit, goto done,  and the second 
>>>>> will never
>>>>>   evaluate that part to true-- unless the sk is changed (do we 
>>>>> need
>>>>>   READ_ONCE for sk->sk_reuseport_cb?).
>>>>   That's an interesting point... It looks like this function also
>>>>   changed in 4.6 from using a single local_bh_disable() at the 
>>>> beginning
>>>>   with several spin_lock(&head->lock) to exclusively
>>>>   spin_lock_bh(&head->lock) at each locking point.  Perhaps the 
>>>> full bh
>>>>   disable variant was preventing the timers in your stack trace 
>>>> from
>>>>   running interleaved with this function before?
>>> 
>>>  Could be, although dropping the lock shouldn't be able to affect 
>>> the
>>>  search state. TBH, I'm a little lost in reading function, the
>>>  SO_REUSEPORT handling is pretty complicated. For instance,
>>>  rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in 
>>> that
>>>  function and also in every call to inet_csk_bind_conflict. I 
>>> wonder if
>>>  we can simply this under the assumption that SO_REUSEPORT is only
>>>  allowed if the port number (snum) is explicitly specified.
>> 
>>  Ok first I have data for you Hannes, here's the time distributions
>>  before during and after the lockup (with all the debugging in place 
>> the
>>  box eventually recovers).  I've attached it as a text file since it 
>> is
>>  long.
> 
> Thanks a lot!
> 
>>  Second is I was thinking about why we would spend so much time 
>> doing the
>>  ->owners list, and obviously it's because of the massive amount of
>>  timewait sockets on the owners list.  I wrote the following dumb 
>> patch
>>  and tested it and the problem has disappeared completely.  Now I 
>> don't
>>  know if this is right at all, but I thought it was weird we weren't
>>  copying the soreuseport option from the original socket onto the 
>> twsk.
>>  Is there are reason we aren't doing this currently?  Does this help
>>  explain what is happening?  Thanks,
> 
> The patch is interesting and a good clue, but I am immediately a bit
> concerned that we don't copy/tag the socket with the uid also to keep
> the security properties for SO_REUSEPORT. I have to think a bit more
> about this.
> 
> We have seen hangs during connect. I am afraid this patch wouldn't 
> help
> there while also guaranteeing uniqueness.


Yeah so I looked at the code some more and actually my patch is really 
bad.  If sk2->sk_reuseport is set we'll look at sk2->sk_reuseport_cb, 
which is outside of the timewait sock, so that's definitely bad.

But we should at least be setting it to 0 so that we don't do this 
normally.  Unfortunately simply setting it to 0 doesn't fix the 
problem.  So for some reason having ->sk_reuseport set to 1 on a 
timewait socket makes this problem non-existent, which is strange.

So back to the drawing board I guess.  I wonder if doing what craig 
suggested and batching the timewait timer expires so it hurts less 
would accomplish the same results.  Thanks,

Josef

^ permalink raw reply

* Re: [PATCH iproute2 v2 1/3] ifstat: Add extended statistics to ifstat
From: Rami Rosen @ 2016-12-16 15:21 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: Stephen Hemminger, netdev@vger.kernel.org,
	roopa@cumulusnetworks.com, Jiri Pirko, Elad Raz, Yotam Gigi,
	Ido Schimmel, Or Gerlitz
In-Reply-To: <DB5PR05MB1895991888E270CE790F1C16AC9C0@DB5PR05MB1895.eurprd05.prod.outlook.com>

Hi,

>Thanks, I'll fix it.
Another minor nit, on this occasion:

bool is_extanded should be: bool is_extended

Regards,
Rami Rosen

^ permalink raw reply

* Re: [PATCH perf/core REBASE 2/5] samples/bpf: Switch over to libbpf
From: Arnaldo Carvalho de Melo @ 2016-12-16 15:42 UTC (permalink / raw)
  To: Joe Stringer
  Cc: LKML, netdev, Wang Nan, ast, Daniel Borkmann,
	Arnaldo Carvalho de Melo
In-Reply-To: <CAPWQB7EAN7Fg8+vO3Tn7WYWcZW_JpKQFNCJzY_K100ckab6JRg@mail.gmail.com>

Em Thu, Dec 15, 2016 at 05:48:31PM -0800, Joe Stringer escreveu:
> On 15 December 2016 at 14:00, Joe Stringer <joe@ovn.org> wrote:
> > On 15 December 2016 at 10:34, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >> Em Thu, Dec 15, 2016 at 03:29:18PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>> Em Thu, Dec 15, 2016 at 12:50:22PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>> > Em Wed, Dec 14, 2016 at 02:43:39PM -0800, Joe Stringer escreveu:
> >>> > > Now that libbpf under tools/lib/bpf/* is synced with the version from
> >>> > > samples/bpf, we can get rid most of the libbpf library here.
> >>> > >
> >>> > > Signed-off-by: Joe Stringer <joe@ovn.org>
> >>> > > Cc: Alexei Starovoitov <ast@fb.com>
> >>> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> >>> > > Cc: Wang Nan <wangnan0@huawei.com>
> >>> > > Link: http://lkml.kernel.org/r/20161209024620.31660-6-joe@ovn.org
> >>> > > [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
> >>>
> >>> So, the above comment no longer applied to this adjusted patch from you,
> >>> as you removed one hunk too much, that, after applied, gets samples/bpf/
> >>> to build successfully:
> >>>
> >>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> >>> index add514e2984a..81b0ef2f7994 100644
> >>> --- a/samples/bpf/Makefile
> >>> +++ b/samples/bpf/Makefile
> >>> @@ -107,6 +107,7 @@ always += lwt_len_hist_kern.o
> >>>  always += xdp_tx_iptunnel_kern.o
> >>>
> >>>  HOSTCFLAGS += -I$(objtree)/usr/include
> >>> +HOSTCFLAGS += -I$(srctree)/tools/lib/
> >>>  HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
> >>>
> >>>  HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
> >>>
> >>> ---------------------
> >>>
> >>> I added it, continuing...
> >>
> >> But then, when I tried to run offwaketime with it, it fails:
> >>
> >> [root@jouet bpf]# ./offwaketime  ls
> >> bpf_load_program() err=22
> >> BPF_LDX uses reserved fields
> >> bpf_load_program() err=22
> >> BPF_LDX uses reserved fields
> >> [root@jouet bpf]#
> >>
> >> If I remove this patch and try again, it works:
> >>
> >> [root@jouet bpf]# ./offwaketime | head -4
> >> swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 46
> >> chrome;return_from_SYSCALL_64;do_syscall_64;exit_to_usermode_loop;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;do_syscall_64;return_from_SYSCALL_64;;Chrome_ChildIOT 1
> >> firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 3
> >> dockerd-current;entry_SYSCALL_64_fastpath;sys_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;futex_wake;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;dockerd-current 2
> >> [root@jouet bpf]#
> >>
> >>
> >> So, I'm stopping here so that I can push what I have to Ingo, then I'll get
> >> back to this, hopefully by then you beat me and I have just to retest 8-)
> >
> > OK, thanks for the report. Looks like there was another difference
> > between the two libbpfs - one used total program size for its
> > load_program API; the actual kernel API uses instruction count. This
> > incremental should do the trick:
> >
> > https://github.com/joestringer/linux/commit/6ff7726f20077bed66fb725f5189c13690154b6a
> 
> The full branch with this change (fast-forward from your tmp branch)
> is available here:
> https://github.com/joestringer/linux/tree/submit/libbpf_samples_v5

Can you please repost the patches you changed, and just those, sometimes
I'm with limited net connectivity, so not having to go use the github
interface, figure out how to download the patches, etc, is a win.

- Arnaldo
 
> I tried running every selftest and BPF sample I could get my hands on;
> there's one or two that I couldn't run, but seemed more to do with my
> versions of TC/iproute and kernel config rather than libbpf changes.
> Let me know if you see any further trouble.

^ permalink raw reply

* RE: [PATCH v5 2/4] siphash: add Nu{32,64} helpers
From: George Spelvin @ 2016-12-16 15:44 UTC (permalink / raw)
  To: ak, davem, David.Laight, ebiggers3, hannes, Jason,
	kernel-hardening, linux-crypto, linux-kernel, linux, luto, netdev,
	tom, torvalds, tytso, vegard.nossum
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0240EFA@AcuExch.aculab.com>

Jason A. Donenfeld wrote:
> Isn't that equivalent to:
>	v0 = key[0];
>	v1 = key[1];
>	v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL);
>	v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);

(Pre-XORing key[] with the first two constants which, if the constants
are random in the first place, can be a no-op.)  Other than the typo
in the v2 line, yes.  If they key is non-public, then you can xor an
arbitrary constant in to both halves to slightly speed up the startup.

(Nits: There's a typo in the v2 line, you don't need to parenthesize
associative operators like xor, and the "ull" suffix is redundant here.)

> Those constants also look like ASCII strings.

They are.  The ASCII is "somepseudorandomlygeneratedbytes".

> What cryptographic analysis has been done on the values?

They're "nothing up my sleeve numbers".

They're arbitrary numbers, and almost any other values would do exactly
as well.  The main properties are:

1) They're different (particulatly v0 != v2 and v1 != v3), and
2) Neither they, nor their xor, is rotationally symmetric like 0x55555555.
   (Because SipHash is mostly rotationally symmetric, broken only by the
   interruption of the carry chain at the msbit, it helps slightly
   to break this up at the beginning.)

Those exact values only matter for portability.  If you don't need anyone
else to be able to compute matching outputs, then you could use any other
convenient constants (like the MD5 round constants).

^ permalink raw reply

* Re: [PATCH net] sfc: skip ptp allocations on secondary interfaces
From: Jarod Wilson @ 2016-12-16 15:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-net-drivers, Edward Cree, Bert Kenward, netdev
In-Reply-To: <20161215203104.12714-1-jarod@redhat.com>

On 2016-12-15 3:31 PM, Jarod Wilson wrote:
> Rather than allocating efx_ptp_data, a buffer, a workqueue, etc., and then
> ultimately deciding not to call ptp_clock_register() for non-primary
> interfaces, just exit out of efx_ptp_prober() earlier. Saves a little
> memory and speeds up init and teardown slightly.
>
> CC: linux-net-drivers@solarflare.com
> CC: Edward Cree <ecree@solarflare.com>
> CC: Bert Kenward <bkenward@solarflare.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>

Self-Nak on this one. Talking to Bert off-list, he pointed out that the 
driver uses the PTP structure allocated here for the secondary interface 
to do internal time-stamping and the like. While it might be ideal not 
to intermingle PTP structs with non-PTP work, disabling this right now 
is obviously a non-starter.

-- 
Jarod Wilson
jarod@redhat.com

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jason A. Donenfeld @ 2016-12-16 15:51 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: George Spelvin, Andi Kleen, David Miller, David Laight,
	Eric Biggers, Hannes Frederic Sowa, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Theodore Ts'o, vegard.nossum,
	Daniel J . Bernstein
In-Reply-To: <CAGiyFddB_HT3H2yhYQ5rprYZ487rJ4iCaH9uPJQD57hiPbn9ng@mail.gmail.com>

Hi JP & George,

My function names:
- SipHash -> siphash
- HalfSipHash -> hsiphash

It appears that hsiphash can produce either 32-bit output or 64-bit
output, with the output length parameter as part of the hash algorithm
in there. When I code this for my kernel patchset, I very likely will
only implement one output length size. Right now I'm leaning toward
32-bit. Questions:

- Is this a reasonable choice?
- When hsiphash is desired due to its faster speed, are there any
circumstances in which producing a 64-bit output would actually be
useful? Namely, are there any hashtables that could benefit from a
64-bit functions?
- Are there reasons why hsiphash with 64-bit output would be
reasonable? Or will we be fine sticking with 32-bit output only?

With both hsiphash and siphash, the division of usage will probably become:
- Use 64-bit output 128-bit key siphash for keyed RNG-like things,
such as syncookies and sequence numbers
- Use 64-bit output 128-bit key siphash for hashtables that must
absolutely be secure to an extremely high bandwidth attacker, such as
userspace directly DoSing a kernel hashtable
- Use 32-bit output 64-bit key hsiphash for quick hashtable functions
that still must be secure but do not require as large of a security
margin

Sound good?

Jason

^ permalink raw reply

* Re: [PATCH v5 3/4] secure_seq: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-16 15:57 UTC (permalink / raw)
  To: David Laight
  Cc: Netdev, kernel-hardening@lists.openwall.com, LKML,
	linux-crypto@vger.kernel.org, Ted Tso, Hannes Frederic Sowa,
	Linus Torvalds, Eric Biggers, Tom Herbert, George Spelvin,
	Vegard Nossum, ak@linux.intel.com, davem@davemloft.net,
	luto@amacapital.net
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0240E66@AcuExch.aculab.com>

Hi David,

On Fri, Dec 16, 2016 at 10:59 AM, David Laight <David.Laight@aculab.com> wrote:
> You are still putting over-aligned data on stack.
> You only need to align it to the alignment of u64 (not the size of u64).
> If an on-stack item has a stronger alignment requirement than the stack
> the gcc has to generate two stack frames for the function.

Yesterday, folks were saying that sometimes 32-bit platforms need
8-byte alignment for certain 64-bit operations, so I shouldn't fall
back to 4-byte alignment there. But actually, looking at this more
closely, I can just make SIPHASH_ALIGNMENT == __alignof__(u64), which
will take care of all possible concerns, since gcc knows best which
platforms need what alignment. Thanks for making this clear to me with
"the alignment of u64 (not the size of u64)".

> Oh - and wait a bit longer between revisions.

Okay. We can be turtles.

Jason

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-16 15:21 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Craig Gallek, Eric Dumazet,
	Linux Kernel Network Developers
In-Reply-To: <1481900088.24490.6@smtp.office365.com>

On Fri, Dec 16, 2016 at 9:54 AM, Josef Bacik <jbacik@fb.com> wrote:
> On Thu, Dec 15, 2016 at 7:07 PM, Hannes Frederic Sowa 
> <hannes@stressinduktion.org> wrote:
>> Hi Josef,
>> 
>> On 15.12.2016 19:53, Josef Bacik wrote:
>>>  On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@herbertland.com> 
>>> wrote:
>>>>  On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek 
>>>> <kraigatgoog@gmail.com>
>>>>  wrote:
>>>>>   On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert 
>>>>> <tom@herbertland.com>
>>>>>  wrote:
>>>>>>   I think there may be some suspicious code in 
>>>>>> inet_csk_get_port. At
>>>>>>   tb_found there is:
>>>>>> 
>>>>>>                   if (((tb->fastreuse > 0 && reuse) ||
>>>>>>                        (tb->fastreuseport > 0 &&
>>>>>>                         !rcu_access_pointer(sk->sk_reuseport_cb) 
>>>>>> &&
>>>>>>                         sk->sk_reuseport && uid_eq(tb->fastuid,
>>>>>>  uid))) &&
>>>>>>                       smallest_size == -1)
>>>>>>                           goto success;
>>>>>>                   if 
>>>>>> (inet_csk(sk)->icsk_af_ops->bind_conflict(sk,
>>>>>>  tb, true)) {
>>>>>>                           if ((reuse ||
>>>>>>                                (tb->fastreuseport > 0 &&
>>>>>>                                 sk->sk_reuseport &&
>>>>>> 
>>>>>>  !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>>>                                 uid_eq(tb->fastuid, uid))) &&
>>>>>>                               smallest_size != -1 && --attempts 
>>>>>> >= 0) {
>>>>>>                                   spin_unlock_bh(&head->lock);
>>>>>>                                   goto again;
>>>>>>                           }
>>>>>>                           goto fail_unlock;
>>>>>>                   }
>>>>>> 
>>>>>>   AFAICT there is redundancy in these two conditionals.  The 
>>>>>> same clause
>>>>>>   is being checked in both: (tb->fastreuseport > 0 &&
>>>>>>   !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>>>   uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is 
>>>>>> true the
>>>>>>   first conditional should be hit, goto done,  and the second 
>>>>>> will never
>>>>>>   evaluate that part to true-- unless the sk is changed (do we 
>>>>>> need
>>>>>>   READ_ONCE for sk->sk_reuseport_cb?).
>>>>>   That's an interesting point... It looks like this function also
>>>>>   changed in 4.6 from using a single local_bh_disable() at the 
>>>>> beginning
>>>>>   with several spin_lock(&head->lock) to exclusively
>>>>>   spin_lock_bh(&head->lock) at each locking point.  Perhaps the 
>>>>> full bh
>>>>>   disable variant was preventing the timers in your stack trace 
>>>>> from
>>>>>   running interleaved with this function before?
>>>> 
>>>>  Could be, although dropping the lock shouldn't be able to affect 
>>>> the
>>>>  search state. TBH, I'm a little lost in reading function, the
>>>>  SO_REUSEPORT handling is pretty complicated. For instance,
>>>>  rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in 
>>>> that
>>>>  function and also in every call to inet_csk_bind_conflict. I 
>>>> wonder if
>>>>  we can simply this under the assumption that SO_REUSEPORT is only
>>>>  allowed if the port number (snum) is explicitly specified.
>>> 
>>>  Ok first I have data for you Hannes, here's the time distributions
>>>  before during and after the lockup (with all the debugging in 
>>> place the
>>>  box eventually recovers).  I've attached it as a text file since 
>>> it is
>>>  long.
>> 
>> Thanks a lot!
>> 
>>>  Second is I was thinking about why we would spend so much time 
>>> doing the
>>>  ->owners list, and obviously it's because of the massive amount of
>>>  timewait sockets on the owners list.  I wrote the following dumb 
>>> patch
>>>  and tested it and the problem has disappeared completely.  Now I 
>>> don't
>>>  know if this is right at all, but I thought it was weird we weren't
>>>  copying the soreuseport option from the original socket onto the 
>>> twsk.
>>>  Is there are reason we aren't doing this currently?  Does this help
>>>  explain what is happening?  Thanks,
>> 
>> The patch is interesting and a good clue, but I am immediately a bit
>> concerned that we don't copy/tag the socket with the uid also to keep
>> the security properties for SO_REUSEPORT. I have to think a bit more
>> about this.
>> 
>> We have seen hangs during connect. I am afraid this patch wouldn't 
>> help
>> there while also guaranteeing uniqueness.
> 
> 
> Yeah so I looked at the code some more and actually my patch is 
> really bad.  If sk2->sk_reuseport is set we'll look at 
> sk2->sk_reuseport_cb, which is outside of the timewait sock, so 
> that's definitely bad.
> 
> But we should at least be setting it to 0 so that we don't do this 
> normally.  Unfortunately simply setting it to 0 doesn't fix the 
> problem.  So for some reason having ->sk_reuseport set to 1 on a 
> timewait socket makes this problem non-existent, which is strange.
> 
> So back to the drawing board I guess.  I wonder if doing what craig 
> suggested and batching the timewait timer expires so it hurts less 
> would accomplish the same results.  Thanks,

Wait no I lied, we access the sk->sk_reuseport_cb, not sk2's.  This is 
the code

                        if ((!reuse || !sk2->sk_reuse ||
                            sk2->sk_state == TCP_LISTEN) &&
                            (!reuseport || !sk2->sk_reuseport ||
                             rcu_access_pointer(sk->sk_reuseport_cb) ||
                             (sk2->sk_state != TCP_TIME_WAIT &&
                             !uid_eq(uid, sock_i_uid(sk2))))) {

                                if (!sk2->sk_rcv_saddr || 
!sk->sk_rcv_saddr ||
                                    sk2->sk_rcv_saddr == 
sk->sk_rcv_saddr)
                                        break;
                        }

so in my patches case we now have reuseport == 1, sk2->sk_reuseport == 
1.  But now we are using reuseport, so sk->sk_reuseport_cb should be 
non-NULL right?  So really setting the timewait sock's sk_reuseport 
should have no bearing on how this loop plays out right?  Thanks,

Josef

^ permalink raw reply

* [PATCH] cgroup: Fix CGROUP_BPF config
From: Andy Lutomirski @ 2016-12-16 16:33 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Mack; +Cc: Network Development, Andy Lutomirski

CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually
enabled, making it rather challenging to turn CGROUP_BPF on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 init/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index aafafeb0c117..223b734abccd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1156,7 +1156,8 @@ config CGROUP_PERF
 
 config CGROUP_BPF
 	bool "Support for eBPF programs attached to cgroups"
-	depends on BPF_SYSCALL && SOCK_CGROUP_DATA
+	depends on BPF_SYSCALL
+	select SOCK_CGROUP_DATA
 	help
 	  Allow attaching eBPF programs to a cgroup using the bpf(2)
 	  syscall command BPF_PROG_ATTACH.
-- 
2.9.3

^ permalink raw reply related

* [PATCH] ip: vfinfo: remove code duplication for IFLA_VF_RSS_QUERY_EN
From: Julien Fortin @ 2016-12-16 16:36 UTC (permalink / raw)
  To: stephen, netdev; +Cc: phil, Julien Fortin

From: Julien Fortin <julien@cumulusnetworks.com>

Fixes: 4fb4a10e120b1 ("ipaddress: Print IFLA_VF_QUERY_RSS_EN setting”)

Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>
---
 ip/ipaddress.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index de64877..400ebb4 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -414,14 +414,6 @@ static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 			fprintf(fp, ", query_rss %s",
 			        rss_query->setting ? "on" : "off");
 	}
-	if (vf[IFLA_VF_RSS_QUERY_EN]) {
-		struct ifla_vf_rss_query_en *rss_query =
-			RTA_DATA(vf[IFLA_VF_RSS_QUERY_EN]);
-
-		if (rss_query->setting != -1)
-			fprintf(fp, ", query_rss %s",
-			        rss_query->setting ? "on" : "off");
-	}
 	if (vf[IFLA_VF_STATS] && show_stats)
 		print_vf_stats64(fp, vf[IFLA_VF_STATS]);
 }
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH] ip: vfinfo: remove code duplication for IFLA_VF_RSS_QUERY_EN
From: Phil Sutter @ 2016-12-16 16:43 UTC (permalink / raw)
  To: Julien Fortin; +Cc: stephen, netdev
In-Reply-To: <20161216163605.19728-1-julien@cumulusnetworks.com>

On Fri, Dec 16, 2016 at 05:36:05PM +0100, Julien Fortin wrote:
> From: Julien Fortin <julien@cumulusnetworks.com>
> 
> Fixes: 4fb4a10e120b1 ("ipaddress: Print IFLA_VF_QUERY_RSS_EN setting”)
> 
> Signed-off-by: Julien Fortin <julien@cumulusnetworks.com>

Oh! Looks like Git messed up during a rebase and the obviously wrong
result was overlooked.

Acked-by: Phil Sutter <phil@nwl.cc>

^ permalink raw reply

* "rfkill: Add rfkill-any LED trigger" causes deadlock
From: Михаил Кринкин @ 2016-12-16 16:46 UTC (permalink / raw)
  To: kernel, johannes.berg, linux-wireless, davem, netdev
In-Reply-To: <20161216163707.GA2629@gmail.com>

Hi,

with recent i can't load my thinkpad with recent linux-next, bisect points
at the commit 73f4f76a196d7adb ("rfkill: Add rfkill-any LED trigger").

Problem occurs because thinkapd_acpi rfkill set_block handler
tpacpi_rfk_hook_set_block calls rfkill_set_sw_state, which in turn calls
new rfkill_any_led_trigger_event. So when rfkill_set_block called from
rfkill_register we have deadlock.

I added WARN to __rfkill_any_led_trigger_event to see how deadlock occurs,
here is backtrace:

[    6.090079] WARNING: CPU: 2 PID: 307 at net/rfkill/core.c:184
rfkill_any_led_trigger_event+0x2d/0x40
[    6.090080] reached __rfkill_any_led_trigger_event
[    6.090081] Modules linked in:
[    6.090082]  snd_pcm sg thinkpad_acpi(+) mei_me nvram snd_seq mei
idma64 virt_dma intel_pch_thermal ucsi intel_lpss_pci snd_seq_device
hci_uart snd_timer snd btbcm soundcore btqca led_class btintel
bluetooth i8042 rtc_cmos serio evdev intel_lpss_acpi intel_lpss
acpi_pad tpm_infineon mfd_core kvm_intel kvm irqbypass ipv6 autofs4
ext4 jbd2 mbcache sd_mod i915 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm ahci libahci video
pinctrl_sunrisepoint pinctrl_intel
[    6.090112] CPU: 2 PID: 307 Comm: systemd-udevd Tainted: G        W
      4.9.0-01741-g73f4f76-dirty #106
[    6.090113] Hardware name: LENOVO 20GJ004ERT/20GJ004ERT, BIOS
R0CET21W (1.09 ) 03/07/2016
[    6.090114]  ffffb199c023f998 ffffffffb4334b6b ffffb199c023f9e8
0000000000000000
[    6.090117]  ffffb199c023f9d8 ffffffffb40658ab 000000b84dd75d28
0000000080000006
[    6.090119]  ffff99ea4dd75d28 0000000000000001 0000000000000001
0000000000000000
[    6.090122] Call Trace:
[    6.090126]  [<ffffffffb4334b6b>] dump_stack+0x4d/0x72
[    6.090128]  [<ffffffffb40658ab>] __warn+0xcb/0xf0
[    6.090130]  [<ffffffffb406592f>] warn_slowpath_fmt+0x5f/0x80
[    6.090132]  [<ffffffffb4631bfd>] rfkill_any_led_trigger_event+0x2d/0x40
[    6.090135]  [<ffffffffb46322d9>] rfkill_set_sw_state+0x89/0xd0
[    6.090142]  [<ffffffffc06c9921>]
tpacpi_rfk_update_swstate+0x31/0x50 [thinkpad_acpi]
[    6.090148]  [<ffffffffc06c9972>]
tpacpi_rfk_hook_set_block+0x32/0x70 [thinkpad_acpi]
[    6.090150]  [<ffffffffb46327f0>] rfkill_set_block+0x90/0x160
[    6.090152]  [<ffffffffb4632930>] __rfkill_switch_all+0x70/0xa0
[    6.090154]  [<ffffffffb4633028>] rfkill_register+0x278/0x2b0
[    6.090159]  [<ffffffffc06e1f0e>] tpacpi_new_rfkill+0xef/0x15d
[thinkpad_acpi]
[    6.090164]  [<ffffffffc06e2407>] bluetooth_init+0x15e/0x1a9 [thinkpad_acpi]
[    6.090169]  [<ffffffffc06e2fb5>]
thinkpad_acpi_module_init.part.47+0x5e7/0x996 [thinkpad_acpi]
[    6.090174]  [<ffffffffc06e3364>] ?
thinkpad_acpi_module_init.part.47+0x996/0x996 [thinkpad_acpi]
[    6.090179]  [<ffffffffc06e36af>]
thinkpad_acpi_module_init+0x34b/0xc9c [thinkpad_acpi]
[    6.090181]  [<ffffffffb4000450>] do_one_initcall+0x50/0x180
[    6.090184]  [<ffffffffb4177e12>] ? do_init_module+0x27/0x1ee
[    6.090186]  [<ffffffffb41cc4b6>] ? kmem_cache_alloc_trace+0x156/0x1a0
[    6.090188]  [<ffffffffb4177e4a>] do_init_module+0x5f/0x1ee
[    6.090191]  [<ffffffffb40ed562>] load_module+0x22c2/0x28d0
[    6.090192]  [<ffffffffb40ea0f0>] ? __symbol_put+0x60/0x60
[    6.090195]  [<ffffffffb42ee15d>] ? ima_post_read_file+0x7d/0xa0
[    6.090198]  [<ffffffffb42a809b>] ? security_kernel_post_read_file+0x6b/0x80
[    6.090200]  [<ffffffffb40eddcf>] SYSC_finit_module+0xdf/0x110
[    6.090202]  [<ffffffffb40ede1e>] SyS_finit_module+0xe/0x10
[    6.090204]  [<ffffffffb463d524>] entry_SYSCALL_64_fastpath+0x17/0x98
[    6.090206] ---[ end trace ea6da61c1ec208d1 ]---
[    6.090207] rfkill_any_led_trigger unlocked
[    6.091664] ------------[ cut here ]------------

^ permalink raw reply

* RE: [upstream-release] [PATCH net 2/4] fsl/fman: arm: call of_platform_populate() for arm64 platfrom
From: Madalin-Cristian Bucur @ 2016-12-16 17:01 UTC (permalink / raw)
  To: Scott Wood, netdev@vger.kernel.org, mpe@ellerman.id.au
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <DB5PR0401MB1928F163614C64900D1458ED919D0@DB5PR0401MB1928.eurprd04.prod.outlook.com>

> From: Scott Wood
> Sent: Thursday, December 15, 2016 8:42 PM
> 
> On 12/15/2016 07:11 AM, Madalin Bucur wrote:
> > From: Igal Liberman <igal.liberman@freescale.com>
> >
> > Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
> > ---
> >  drivers/net/ethernet/freescale/fman/fman.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/freescale/fman/fman.c
> b/drivers/net/ethernet/freescale/fman/fman.c
> > index dafd9e1..f36b4eb 100644
> > --- a/drivers/net/ethernet/freescale/fman/fman.c
> > +++ b/drivers/net/ethernet/freescale/fman/fman.c
> > @@ -2868,6 +2868,16 @@ static struct fman *read_dts_node(struct
> platform_device *of_dev)
> >
> >  	fman->dev = &of_dev->dev;
> >
> > +#ifdef CONFIG_ARM64
> > +	/* call of_platform_populate in order to probe sub-nodes on arm64 */
> > +	err = of_platform_populate(fm_node, NULL, NULL, &of_dev->dev);
> > +	if (err) {
> > +		dev_err(&of_dev->dev, "%s: of_platform_populate() failed\n",
> > +			__func__);
> > +		goto fman_free;
> > +	}
> > +#endif
> 
> Should we remove fsl,fman from the PPC of_device_ids[], so this doesn't
> need an ifdef?
> 
> Why is it #ifdef CONFIG_ARM64 rather than #ifndef CONFIG_PPC?
> 
> -Scott

Igal was working on adding ARM64 support when this patch was created, thus the
choice of #ifdef CONFIG_ARM64. Unifying this for PPC and ARM64 by always calling
of_platform_populate() sounds like the best approach. I would need to synchronize
the introduction of this code with the removal of the fsl,fman entry from the
of_device_ids[] array.

Dave, Michael, Scott, is it ok to add to v2 of this patch set the patch that removes
the compatible "fsl,fman" from arch/powerpc/platforms/85xx/corenet_generic.c?

Thanks,
Madalin

^ permalink raw reply

* RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: David Laight @ 2016-12-16 17:06 UTC (permalink / raw)
  To: 'George Spelvin', ak@linux.intel.com, davem@davemloft.net,
	ebiggers3@gmail.com, hannes@stressinduktion.org, Jason@zx2c4.com,
	jeanphilippe.aumasson@gmail.com,
	kernel-hardening@lists.openwall.com, linux-crypto@vger.kernel.org,
	linux-kernel@vger.kernel.org, luto@amacapital.net,
	netdev@vger.kernel.org, tom@herbertland.com,
	torvalds@linux-foundation.org, "tytso@mit.edu" <tytso
  Cc: djb@cr.yp.to
In-Reply-To: <20161215232840.22459.qmail@ns.sciencehorizons.net>

From: George Spelvin
> Sent: 15 December 2016 23:29
> > If a halved version of SipHash can bring significant performance boost
> > (with 32b words instead of 64b words) with an acceptable security level
> > (64-bit enough?) then we may design such a version.
> 
> I was thinking if the key could be pushed to 80 bits, that would be nice,
> but honestly 64 bits is fine.  This is DoS protection, and while it's
> possible to brute-force a 64 bit secret, there are more effective (DDoS)
> attacks possible for the same cost.

A 32bit hash would also remove all the issues about the alignment
of IP addresses (etc) on 64bit systems.

> (I'd suggest a name of "HalfSipHash" to convey the reduced security
> effectively.)
> 
> > Regarding output size, are 64 bits sufficient?
> 
> As a replacement for jhash, 32 bits are sufficient.  It's for
> indexing an in-memory hash table on a 32-bit machine.

It is also worth remembering that if the intent is to generate
a hash table index (not a unique fingerprint) you will always
get collisions on the final value.
Randomness could still give overlong hash chains - which might
still need rehashing with a different key.

	David

^ permalink raw reply

* Re: [PULL] virtio, vhost: new device, fixes, speedups
From: Michael S. Tsirkin @ 2016-12-16 17:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: arend.vanspriel, KVM list, Network Development,
	Linux Kernel Mailing List, virtualization
In-Reply-To: <CA+55aFxwG=jj6=2fKjNkHCOnbKr-mNvNucbA=CSdPvb4kzW3FA@mail.gmail.com>

On Thu, Dec 15, 2016 at 06:20:40PM -0800, Linus Torvalds wrote:
> On Thu, Dec 15, 2016 at 3:05 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
> 
> Pulled, but I wonder...
> 
> >  Documentation/translations/zh_CN/sparse.txt        |   7 +-
> >  arch/arm/plat-samsung/include/plat/gpio-cfg.h      |   2 +-
> >  drivers/crypto/virtio/virtio_crypto_common.h       | 128 +++++
> [...]
> 
> what are you generating these diffstats with? Because they are pretty bogus..
> 
> The end result is correct:
> 
> >  86 files changed, 2106 insertions(+), 280 deletions(-)
> 
> but the file order in the diffstat is completely random, which makes
> it very hard to compare with what I get. It also makes it hard to see
> what you changed, because it's not alphabetical like it should be
> (strictly speaking the git pathname ordering isnt' really
> alphabetical, since the '/' sorts as the NUL character, but close
> enough).
> 
> I can't see the logic to the re-ordering of the lines, so I'm
> intrigued how you even generated it.
> 
>             Linus


Oh, that's because I set orderfile globally rather than
just for the qemu project which wants it.
Fixed, sorry about that.

-- 
MST

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jason A. Donenfeld @ 2016-12-16 17:09 UTC (permalink / raw)
  To: David Laight
  Cc: George Spelvin, ak@linux.intel.com, davem@davemloft.net,
	ebiggers3@gmail.com, hannes@stressinduktion.org,
	jeanphilippe.aumasson@gmail.com,
	kernel-hardening@lists.openwall.com, linux-crypto@vger.kernel.org,
	linux-kernel@vger.kernel.org, luto@amacapital.net,
	netdev@vger.kernel.org, tom@herbertland.com,
	torvalds@linux-foundation.org, tytso@mit.edu,
	vegard.nossum@gmail.com, djb@cr.yp.to
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0241238@AcuExch.aculab.com>

Hi David,

On Fri, Dec 16, 2016 at 6:06 PM, David Laight <David.Laight@aculab.com> wrote:
> A 32bit hash would also remove all the issues about the alignment
> of IP addresses (etc) on 64bit systems.

The current replacements of md5_transform with siphash in the v6 patch
series will continue to use the original siphash, since the 128-bit
key is rather important for these kinds of secrets. Additionally,
64-bit siphash is already faster than the md5_transform that it
replaces. So the alignment concerns (now, non-issues; problems have
been solved, I believe) still remain.

Jason

^ permalink raw reply

* "rfkill: Add rfkill-any LED trigger" causes deadlock
From: Mike Krinkin @ 2016-12-16 17:09 UTC (permalink / raw)
  To: kernel-ePNcKBjznIDVItvQsEIGlw,
	johannes.berg-ral2JQCrhuEAvxtiuMwx3w,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CACpa5=_chGuU3zVcVR3nkNySwz1cGAQTs39v+BjwcDNSvOqhTw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Dec 16, 2016 at 07:46:06PM +0300, Михаил Кринкин wrote:
> Hi,
> 
> with recent i can't load my thinkpad with recent linux-next, bisect points
> at the commit 73f4f76a196d7adb ("rfkill: Add rfkill-any LED trigger").
> 
> Problem occurs because thinkapd_acpi rfkill set_block handler
> tpacpi_rfk_hook_set_block calls rfkill_set_sw_state, which in turn calls
> new rfkill_any_led_trigger_event. So when rfkill_set_block called from
> rfkill_register we have deadlock.
> 
> I added WARN to __rfkill_any_led_trigger_event to see how deadlock occurs,
> here is backtrace:
> 
> [    6.090079] WARNING: CPU: 2 PID: 307 at net/rfkill/core.c:184
> rfkill_any_led_trigger_event+0x2d/0x40
> [    6.090080] reached __rfkill_any_led_trigger_event
> [    6.090081] Modules linked in:
> [    6.090082]  snd_pcm sg thinkpad_acpi(+) mei_me nvram snd_seq mei
> idma64 virt_dma intel_pch_thermal ucsi intel_lpss_pci snd_seq_device
> hci_uart snd_timer snd btbcm soundcore btqca led_class btintel
> bluetooth i8042 rtc_cmos serio evdev intel_lpss_acpi intel_lpss
> acpi_pad tpm_infineon mfd_core kvm_intel kvm irqbypass ipv6 autofs4
> ext4 jbd2 mbcache sd_mod i915 i2c_algo_bit drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops drm ahci libahci video
> pinctrl_sunrisepoint pinctrl_intel
> [    6.090112] CPU: 2 PID: 307 Comm: systemd-udevd Tainted: G        W
>       4.9.0-01741-g73f4f76-dirty #106
> [    6.090113] Hardware name: LENOVO 20GJ004ERT/20GJ004ERT, BIOS
> R0CET21W (1.09 ) 03/07/2016
> [    6.090114]  ffffb199c023f998 ffffffffb4334b6b ffffb199c023f9e8
> 0000000000000000
> [    6.090117]  ffffb199c023f9d8 ffffffffb40658ab 000000b84dd75d28
> 0000000080000006
> [    6.090119]  ffff99ea4dd75d28 0000000000000001 0000000000000001
> 0000000000000000
> [    6.090122] Call Trace:
> [    6.090126]  [<ffffffffb4334b6b>] dump_stack+0x4d/0x72
> [    6.090128]  [<ffffffffb40658ab>] __warn+0xcb/0xf0
> [    6.090130]  [<ffffffffb406592f>] warn_slowpath_fmt+0x5f/0x80
> [    6.090132]  [<ffffffffb4631bfd>] rfkill_any_led_trigger_event+0x2d/0x40
> [    6.090135]  [<ffffffffb46322d9>] rfkill_set_sw_state+0x89/0xd0
> [    6.090142]  [<ffffffffc06c9921>]
> tpacpi_rfk_update_swstate+0x31/0x50 [thinkpad_acpi]
> [    6.090148]  [<ffffffffc06c9972>]
> tpacpi_rfk_hook_set_block+0x32/0x70 [thinkpad_acpi]
> [    6.090150]  [<ffffffffb46327f0>] rfkill_set_block+0x90/0x160
> [    6.090152]  [<ffffffffb4632930>] __rfkill_switch_all+0x70/0xa0
> [    6.090154]  [<ffffffffb4633028>] rfkill_register+0x278/0x2b0
> [    6.090159]  [<ffffffffc06e1f0e>] tpacpi_new_rfkill+0xef/0x15d
> [thinkpad_acpi]
> [    6.090164]  [<ffffffffc06e2407>] bluetooth_init+0x15e/0x1a9 [thinkpad_acpi]
> [    6.090169]  [<ffffffffc06e2fb5>]
> thinkpad_acpi_module_init.part.47+0x5e7/0x996 [thinkpad_acpi]
> [    6.090174]  [<ffffffffc06e3364>] ?
> thinkpad_acpi_module_init.part.47+0x996/0x996 [thinkpad_acpi]
> [    6.090179]  [<ffffffffc06e36af>]
> thinkpad_acpi_module_init+0x34b/0xc9c [thinkpad_acpi]
> [    6.090181]  [<ffffffffb4000450>] do_one_initcall+0x50/0x180
> [    6.090184]  [<ffffffffb4177e12>] ? do_init_module+0x27/0x1ee
> [    6.090186]  [<ffffffffb41cc4b6>] ? kmem_cache_alloc_trace+0x156/0x1a0
> [    6.090188]  [<ffffffffb4177e4a>] do_init_module+0x5f/0x1ee
> [    6.090191]  [<ffffffffb40ed562>] load_module+0x22c2/0x28d0
> [    6.090192]  [<ffffffffb40ea0f0>] ? __symbol_put+0x60/0x60
> [    6.090195]  [<ffffffffb42ee15d>] ? ima_post_read_file+0x7d/0xa0
> [    6.090198]  [<ffffffffb42a809b>] ? security_kernel_post_read_file+0x6b/0x80
> [    6.090200]  [<ffffffffb40eddcf>] SYSC_finit_module+0xdf/0x110
> [    6.090202]  [<ffffffffb40ede1e>] SyS_finit_module+0xe/0x10
> [    6.090204]  [<ffffffffb463d524>] entry_SYSCALL_64_fastpath+0x17/0x98
> [    6.090206] ---[ end trace ea6da61c1ec208d1 ]---
> [    6.090207] rfkill_any_led_trigger unlocked
> [    6.091664] ------------[ cut here ]------------

^ permalink raw reply

* Re: [PULL] virtio, vhost: new device, fixes, speedups
From: Linus Torvalds @ 2016-12-16 17:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: arend.vanspriel, KVM list, Network Development,
	Linux Kernel Mailing List, virtualization
In-Reply-To: <20161216184123-mutt-send-email-mst@kernel.org>

On Fri, Dec 16, 2016 at 9:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Oh, that's because I set orderfile globally rather than
> just for the qemu project which wants it.
> Fixed, sorry about that.

That explains it. I should have remembered, I think this came up once
before with somebody else.

Yeah, for the kernel it makes things much easier (at least for me) to
have everything just the default alphabetical ordering, particularly
because we use directory structure for maintenance areas.

So ordering the diffs by type ends up breaking my mental model for "is
this pull request touching the right files", which is why I reacted.

             Linus

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-16 17:36 UTC (permalink / raw)
  To: Jason, jeanphilippe.aumasson
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes, kernel-hardening,
	linux-crypto, linux-kernel, linux, luto, netdev, tom, torvalds,
	tytso, vegard.nossum
In-Reply-To: <CAHmME9rxCYfwyF6EADWqpAEt+yqCPgCLUVH0FPdAy7r-oPnrRg@mail.gmail.com>

> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.

A 128-bit output option was added to SipHash after the initial publication;
this is just the equivalent in 32-bit.

> - Is this a reasonable choice?

Yes.

> - Are there reasons why hsiphash with 64-bit output would be
>   reasonable? Or will we be fine sticking with 32-bit output only?

Personally, I'd put in a comment saying that "there's a 64-bit output
variant that's not implemented" and punt until someone find a need.

> With both hsiphash and siphash, the division of usage will probably become:
> - Use 64-bit output 128-bit key siphash for keyed RNG-like things,
>   such as syncookies and sequence numbers
> - Use 64-bit output 128-bit key siphash for hashtables that must
>   absolutely be secure to an extremely high bandwidth attacker, such as
>   userspace directly DoSing a kernel hashtable
> - Use 32-bit output 64-bit key hsiphash for quick hashtable functions
>   that still must be secure but do not require as large of a security
>   margin.

On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
should be used always.  Don't even compile the 32-bit code, to prevent
anyone accidentally using it, and make hsiphash an alias for siphash.

On a 32-bit machine, it's a much trickier case.  I'd be tempted to
use the 32-bit code always, but it needs examination.

Fortunately, the cost of brute-forcing hash functions can be fairly
exactly quantified, thanks to bitcoin miners.  It currently takes 2^70
hashes to create one bitcoin block, worth 25 bitcoins ($19,500).  Thus,
2^63 hashes cost $152.

Now, there are two factors that must be considered:
- That's a very very "wholesale" rate.  That's assuming you're doing
  large numbers of these and can put in the up-front effort designing
  silicon ASICs to do the attack.
- That's for a more difficult hash (double sha-256) than SipHash.
  That's a constant fator, but a pretty significant one.  If the wholesale
  assumption holds, that might bring the cost down another 6 or 7 bits,
  to $1-2 per break.

If you're not the NSA and limited to general-purpose silicon, let's
assume a state of the art GPU (Radeon HD 7970; AMD GPUs seem do to better
than nVidia).  The bitcoin mining rate for those is about 700M/second,
29.4 bits.  So 63 bits is 152502 GPU-days, divided by some factor
to account for SipHash's high speed compared to two rounds of SHA-2.
Call it 1000 GPU-days.

It's very doable, but also very non-trivial.  The question is, wouldn't
it be cheaper and easier just to do a brute-force flooding DDoS?

(This is why I wish the key size could be tweaked up to 80 bits.
That would take all these numbers out of the reasonable range.)

Let me consider your second example above, "secure against local users".
I should dig through your patchset and find the details, but what exactly
are the consequences of such an attack?  Hasn't a local user already
got much better ways to DoS the system?

The thing to remember is that we're worried only about the combination
of a *new* Linux kernel (new build or under active maintenance) and a
32-bit host.  You'd be hard-pressed to find a *single* machine fitting
that description which is hosting multiple users or VMs and is not 64-bit.

These days, 32-bit CPUs are for embedded applications: network appliances,
TVs, etc.  That means basically single-user.  Even phones are 64 bit.
Is this really a threat that needs to be defended against?

For your first case, network applications, the additional security
is definitely attractive.  Syncookies are only a DoS, but sequence
numbers are a real security issue; they can let you inject data into a
TCP connection.

Hash tables are much harder to attack.  The information you get back from
timing probes is statistical, and thus testing a key is more expensive.
With sequence numbers, large amounts (32 bits) the hash output is
directly observable.

I wish we could get away with 64-bit security, but given that the
modern internet involves attacks from NSA/Spetssvyaz/3PLA, I agree
it's just not enough.

^ permalink raw reply

* [TSN RFC v2 0/9] TSN driver for the kernel
From: henrik @ 2016-12-16 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Richard Cochran, henrik, Henrik Austad, linux-media, alsa-devel,
	netdev

From: Henrik Austad <haustad@cisco.com>


The driver is directed via ConfigFS as we need userspace to handle
stream-reservation (MSRP), discovery and enumeration (IEEE 1722.1) and
whatever other management is needed. This also includes running an
appropriate PTP daemon (TSN favors gPTP).

Once we have all the required attributes, we can create link using
mkdir, and use write() to set the attributes. Once ready, specify the
'shim' (basically a thin wrapper between TSN and another subsystem) and
we start pushing out frames.

The network part: it ties directly into the Rx-handler for receive and
writes skb's using dev_queue_xmit(). This could probably be improved.

2 new fields in netdev_ops have been introduced, and the Intel
igb-driver has been updated (as this an AVB-capable NIC which is
available as a PCI-e card).

What remains (tl;dr: a lot) a.k.a "Known problems" or "working on it!"
- tie to (g)PTP properly, currently using ktime_get() for presentation
  time
- get time from shim into TSN
- let shim create/manage buffer
- redo parts of the link-stuff using RCUs, the current setup is a bit
  clumsy.
- The igb-driver does not work properly when compiled with IGB_TSN, some
  details in setting the register values needs to be figured out. I am
  working on this, but as it stands, the best bet is to load tsn using
  in_debug=1 to bypass the capability-check. I have had e1000 and sky2
  running for several days without crashing, igb crashes and burns
  violently.
- The ALSA driver does not handle multiple devices very well and is a
  work in progress.

* v2: changes since v1

Changes since v1
- updated to latest upstream kernel (v4.8)
- set dedicated enabled-attribute and let shim be stored in own (support
  future plan for enabling per-shim attributes)
- fixed endianess issue in bitfields used in tsn-structs
- Updated some of the trace-events to use trace_class
- Fix various silly typos
- Handle disabling of link from hrtimer a bit more gracefully (that
  actually works-ish).
- use old skb and size of skb when that is set (Reporte by Nikita)
- Move PCP-codes to NIC and not in the link itself
- Allow TSN-capable card to be loaded even when in debug-mode (and do
  not enforce TSN behaviour)
- Start hooking into ALSA's get_time_info hooks (very much incomplete)
- use threads for sending frames, wake from hrtimer-callback.
  This also queues up awaiting timers if we fail to complete the
  transmit before another timer arrives, it will immediately execute
  another iteration, so no events should be lost. That being said,
  should this happen, it is a clear bug as we really should complete
  well before the next interval.
- Cleanup link-locking and differentiate between Talker and Listener (as
  Listener grab link-lock from IRQ context)
- Change list-lock to spinlock as we may need to take a link-lock whilst
  holding the master list-lock.
- Do a more careful dance holding the spinlocks to regions only doing
  actual update.

Network driver (I210 only)
- bring up all Tx-/Rx-queues when igb is in TSN-mode regardless of how
  many CPUs the system has for I210
- Correctly calculate the idle_slope in I210's configure hook
- Update igb-driver with queue-select and return correct queue when
  sending TSN-frames
- add IGB_FLAG_QAV_PRIO flag to igb_adapter (to handle proper config of
  tx-ring when adapter is brought up.
- add TXDCTL logic (part of preparatory work for TSN) to igb-driver
- Improve SR(A|B) accountingo

ALSA Shim
- Allow userspace to grab much smaller chunks of data (down to a single
  Class A frame for S16_LE 2ch 48kHz).
- Create the card with index/id pattern to avoid collision with other
  cards.
* v1

Before reading on - this is not even beta, but I'd really appreciate if
people would comment on the overall architecture and perhaps provide
some pointers to where I should improve/fix/update
- thanks!

This is a very early RFC for a TSN-driver in the kernel. It has been
floating around in my repo for a while and I would appreciate some
feedback on the overall design to avoid doing some major blunders.

There are at least one AVB-driver (the AV-part of TSN) in the kernel
already. This driver aims to solve a wider scope as TSN can do much more
than just audio. A very basic ALSA-driver is added to the end that
allows you to play music between 2 machines using aplay in one end and
arecord | aplay on the other (some fiddling required) We have plans for
doing the same for v4l2 eventually (but there are other fishes to fry
first). The same goes for a TSN_SOCK type approach as well.

Henrik Austad (9):
  igb: add missing fields to TXDCTL-register
  TSN: add documentation
  TSN: Add the standard formerly known as AVB to the kernel
  Adding TSN-driver to Intel I210 controller
  Add TSN header for the driver
  Add TSN machinery to drive the traffic from a shim over the network
  Add TSN event-tracing
  AVB ALSA - Add ALSA shim for TSN
  MAINTAINERS: add TSN/AVB-entries

 Documentation/TSN/tsn.txt                    |  345 ++++++++
 MAINTAINERS                                  |   14 +
 drivers/media/Kconfig                        |   15 +
 drivers/media/Makefile                       |    2 +-
 drivers/media/avb/Makefile                   |    5 +
 drivers/media/avb/avb_alsa.c                 |  793 +++++++++++++++++
 drivers/media/avb/tsn_iec61883.h             |  152 ++++
 drivers/net/ethernet/intel/Kconfig           |   18 +
 drivers/net/ethernet/intel/igb/Makefile      |    2 +-
 drivers/net/ethernet/intel/igb/e1000_82575.h |    4 +
 drivers/net/ethernet/intel/igb/igb.h         |   26 +
 drivers/net/ethernet/intel/igb/igb_main.c    |   39 +-
 drivers/net/ethernet/intel/igb/igb_tsn.c     |  468 ++++++++++
 include/linux/netdevice.h                    |   44 +
 include/linux/tsn.h                          |  952 +++++++++++++++++++++
 include/trace/events/tsn.h                   |  333 ++++++++
 net/Kconfig                                  |    1 +
 net/Makefile                                 |    1 +
 net/tsn/Kconfig                              |   32 +
 net/tsn/Makefile                             |    6 +
 net/tsn/tsn_configfs.c                       |  673 +++++++++++++++
 net/tsn/tsn_core.c                           | 1189 ++++++++++++++++++++++++++
 net/tsn/tsn_header.c                         |  162 ++++
 net/tsn/tsn_internal.h                       |  397 +++++++++
 net/tsn/tsn_net.c                            |  392 +++++++++
 25 files changed, 6061 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/TSN/tsn.txt
 create mode 100644 drivers/media/avb/Makefile
 create mode 100644 drivers/media/avb/avb_alsa.c
 create mode 100644 drivers/media/avb/tsn_iec61883.h
 create mode 100644 drivers/net/ethernet/intel/igb/igb_tsn.c
 create mode 100644 include/linux/tsn.h
 create mode 100644 include/trace/events/tsn.h
 create mode 100644 net/tsn/Kconfig
 create mode 100644 net/tsn/Makefile
 create mode 100644 net/tsn/tsn_configfs.c
 create mode 100644 net/tsn/tsn_core.c
 create mode 100644 net/tsn/tsn_header.c
 create mode 100644 net/tsn/tsn_internal.h
 create mode 100644 net/tsn/tsn_net.c

--
2.7.4

^ permalink raw reply

* [TSN RFC v2 1/9] igb: add missing fields to TXDCTL-register
From: henrik @ 2016-12-16 17:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Richard Cochran, henrik, linux-media, alsa-devel, netdev,
	Jeff Kirsher, intel-wired-lan, Henrik Austad
In-Reply-To: <1481911153-549-1-git-send-email-henrik@austad.us>

From: Henrik Austad <henrik@austad.us>

The current list of E1000_TXDCTL-registers is incomplete. This adds the
missing parts for the Transmit Descriptor Control (TXDCTL) register.

The rest of these values (threshold for descriptor read/write) for
TXDCTL seems to be defined in igb/igb.h, not sure why this is split
though.

It seems that this was left out in the commit that added support for
82575 Gigabit Ethernet driver 9d5c8243 (igb: PCI-Express 82575 Gigabit
Ethernet driver).

Cc: linux-kernel@vger.kernel.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Signed-off-by: Henrik Austad <haustad@cisco.com>
---
 drivers/net/ethernet/intel/igb/e1000_82575.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_82575.h b/drivers/net/ethernet/intel/igb/e1000_82575.h
index acf0605..7faa482 100644
--- a/drivers/net/ethernet/intel/igb/e1000_82575.h
+++ b/drivers/net/ethernet/intel/igb/e1000_82575.h
@@ -158,7 +158,11 @@ struct e1000_adv_tx_context_desc {
 
 /* Additional Transmit Descriptor Control definitions */
 #define E1000_TXDCTL_QUEUE_ENABLE  0x02000000 /* Enable specific Tx Queue */
+
+/* Transmit Software Flush, sw-triggered desc writeback */
+#define E1000_TXDCTL_SWFLSH        0x04000000
 /* Tx Queue Arbitration Priority 0=low, 1=high */
+#define E1000_TXDCTL_PRIORITY      0x08000000
 
 /* Additional Receive Descriptor Control definitions */
 #define E1000_RXDCTL_QUEUE_ENABLE  0x02000000 /* Enable specific Rx Queue */
-- 
2.7.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox