Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-22 19:24 UTC (permalink / raw)
  To: luto, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CALCETrUSM7KDNuVqh169241PNumJVOF5hrNQUq=k5Fnet6nSgA@mail.gmail.com>

> Having slept on this, I like it less.  The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value.  This probably breaks any attempt to apply
> security properties of the hash function.  For example, the internal
> state could easily contain a whole bunch of prior outputs it in
> verbatim.

The problem is, anti-backtracing is in severe tension with your desire
to use unmodified SipHash.

First of all, I'd like to repeat that it isn't a design goal of the current
generator and isn't necessary.

The current generator just returns hash[0] from the MD5 state, then
leaves the state stored.  The fact that it conceals earlier outputs is
an accident of the Davies-Meyer structure of md5_transform.

It isn't necessary, because every secret generated is stored unencrypted
for as long as it's of value.  A few values are used for retransmit
backoffs and random MAC addresses.  Those are revealed to the world as
soon as they're used.

Most values are used for ASLR.  These address are of interest to an
attacker trying to mount a buffer-overflow attack, but that only lasts
as long as the process is running to receive buffers.  After the process
exits, knowledge of its layout is worthless.

And this is stored as long as it's running in kernel-accessible data,
so storing a *second* copy in less conveniently kernel-accessible data
(the RNG state) doesn't make the situation any worse.

In addition to the above, if you're assuming a state capture, then
since we have (for necessary efficieny reasons) a negligible about of
fresh entropy, an attacker has the secret key and can predict *future*
outputs very easily.

Given that information, an attacker doesn't need to learn the layout of
vulnerable server X.  If they have a buffer overflow, they can crash
the current instance and wait for a fresh image to be started (with a
known address space) to launch their attack at.

Kernel state capture attacks are a very unlikely attack, mostly because
it's a narrow target a hair's breadth away from the much more interesting
outright kernel compromise (attacker gains write access as well as read)
which renders all this fancy cryptanaysis moot.

Now, the main point:  it's not likely to be solvable.

The problem with unmodified SipHash is that is has only 64 bits of
output.  No mix-back mechanism can get around the fundamental problem
that that's too small to prevent a brute-force guessing attack.  You need
wider mix-back.  And getting more output from unmodified SipHash requires
more finalization rounds, which is expensive.

(Aside: 64 bits does have the advantage that it can't be brute-forced on
the attacked machine.  It must be exfiltrated to the attacker, and the
solution returned to the attack code.  But doing this is getting easier
all the time.)

Adding antibacktracking to SipHash is trivial: just add a Davies-Meyer
structure around its internal state.  Remember the internal state before
hashing in the entropy and secret, generate the output, then add the
previous and final states together for storage.

This is a standard textbook construction, very cheap, and doesn't touch
the compression function which is the target of analysis and attacks,
but it requires poking around in SipHash's internal state.  (And people
who read the textbooks without understanding them will get upset because
the textbooks all talk about using this construction with block ciphers,
and SipHash's compression function is not advertised as a block cipher.)

Alternative designs exist; you could extract additional output from
earlier rounds of SipHash, using the duplex sponge construction you
mentioned earlier.  That output would be used for mixback purposes *only*,
so wouldn't affect the security proof of the "primary" output.
But this is also getting creative with SipHash's internals.

Now, you could use a completely *different* cryptographic primitive
to enforce one-way-ness, and apply SipHash as a strong output transform,
but that doesn't feel like good design, and is probably more expensive.

Finally, your discomfort about an attacker learning the internal state...
if an attacker knows the key and the input, they can construct the
internal state.  Yes, we could discard the internal state and construct
a fresh one on the next call to get_random_int, but what are you going
to key it with?  What are you going to feed it?  What keeps *that*
internal state any more secret from an attacker than one that's explicitly
stored?

Keeping the internal state around is a cacheing optimization, that's all.

*If* you're assuming a state capture, the only thing secret from the
attacker is any fresh entropy collected between the time of capture
and the time of generation.  Due to mandatory efficiency requirements,
this is very small.  

I really think you're wishing for the impossible here.

A final note: although I'm disagreeing with you, thank you very much for 
the informed discussion.  Knowing that someone will read and think about
this message carefully has forced me to think it through carefully myself.

For example, clearly stating the concern over starting new processes
with predictable layout, and the limits on the fresh entropy supply,
has made me realize that there *is* a possible source: each exec()
is passed 128 bits from get_random_bytes in the AT_RANDOM element of
its auxv.  Since get_random_int() accesses "current" anyway, we could
store some seed material there rather than using "pid".  While this is
not fresh for each call to get_random_int, it *is* fresh for each new
address space whose layout is being randomized.

^ permalink raw reply

* Re: [PATCH iproute2 0/2] Fix the usage of TC tunnel key
From: Stephen Hemminger @ 2016-12-22 19:03 UTC (permalink / raw)
  To: Hadar Hen Zion; +Cc: netdev, Simon Horman, Or Gerlitz, Roi Dayan
In-Reply-To: <1482394481-19624-1-git-send-email-hadarh@mellanox.com>

On Thu, 22 Dec 2016 10:14:39 +0200
Hadar Hen Zion <hadarh@mellanox.com> wrote:

> Add dest UDP port parameter to the usage of tc tunnle key action and
> classifcation.
> 
> 
> Hadar Hen Zion (2):
>   tc/cls_flower: Add to the usage encapsulation dest UDP port
>   tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
> 
>  tc/f_flower.c     | 5 +++--
>  tc/m_tunnel_key.c | 2 +-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 

Looks good, both applied

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Cong Wang @ 2016-12-22 19:05 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Shahar Klein, Or Gerlitz, Roi Dayan, Jiri Pirko,
	John Fastabend, Linux Kernel Network Developers
In-Reply-To: <585AEF24.4070800@iogearbox.net>

On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Ok, you mean for net. In that case I prefer the smaller sized fix to be
> honest. It also covers everything from the point where we fetch the chain
> via cops->tcf_chain() to the end of the function, which is where most of
> the complexity resides, and only the two mentioned commits do the relock,

I really wish the problem is only about relocking, but look at the code,
the deeper reason why we have this bug is the complexity of the logic
inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
idempotent; 2) the request logic itself is hard, because of tc filter design
and implementation.

This is why I worry more than just relocking.

> so as a fix I think it's fine as-is. As mentioned, if there's need to
> refactor tc_ctl_tfilter() net-next would be better, imho.

Refactor is a too strong word, just moving the replay out is not a refactor.
The end result will be still smaller than your commit d936377414fadbafb4,
which is already backported to stable.

^ permalink raw reply

* Re: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
From: Stephen Hemminger @ 2016-12-22 18:59 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: netdev, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch
In-Reply-To: <1482423795-6531-3-git-send-email-nogahf@mellanox.com>

On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel <nogahf@mellanox.com> wrote:
On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel <nogahf@mellanox.com> wrote:

>  }
> @@ -691,18 +804,22 @@ static const struct option longopts[] = {
>  	{ "interval", 1, 0, 't' },
>  	{ "version", 0, 0, 'V' },
>  	{ "zeros", 0, 0, 'z' },
> +	{ "extended", 1, 0, 'x'},
>  	{ 0 }
>  };
>  
> +
>  int main(int argc, char *argv[])

You let extra whitespace changes creep in.


> +		case 'x':
> +			is_extended = true;
> +			memset(stats_type, 0, 64);
> +			strncpy(stats_type, optarg, 63);
> +			break;

This seems like doing this either the paranoid or hard way.
Why not:
	const char *stats_type = NULL;
...

	case 'x':
		stats_type = optarg;
		break;
...
		if (stats_type)
			snprintf(hist_name, sizeof(hist_name),
				 "%s/.%s_ifstat.u%d", P_tmpdir, stats_type,
				 getuid());
		else
			snprintf(hist_name, sizeof(hist_name),
				 "%s/.ifstat.u%d", P_tmpdir, getuid());


Since:
	1) optarg points to area in argv that is persistent (avoid copy)
	2) don't need is_extended flag value then

Please cleanup and resubmit.

^ permalink raw reply

* [PATCH] tc: add missing limits.h header
From: Baruch Siach @ 2016-12-22 18:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach

This fixes under musl build issues like:

f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
                             ^

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 tc/tc_util.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tc/tc_util.h b/tc/tc_util.h
index f198a4ad5554..4db26c6d5e25 100644
--- a/tc/tc_util.h
+++ b/tc/tc_util.h
@@ -2,6 +2,7 @@
 #define _TC_UTIL_H_ 1
 
 #define MAX_MSG 16384
+#include <limits.h>
 #include <linux/pkt_sched.h>
 #include <linux/pkt_cls.h>
 #include <linux/gen_stats.h>
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Rob Herring @ 2016-12-22 18:48 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
	Samuel Ortiz, Mark Rutland, netdev,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Mark Greer,
	Justin Bronder
In-Reply-To: <CAO7Z3WKmEbJCN_=srxTVKXvtz84vHGi41RS+9T7fmkmhRB6nEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, Dec 19, 2016 at 5:23 PM, Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org> wrote:
> I can make that change, however, I worry that it may be a bit
> misleading, since there are only two supported clock frequencies, but
> a number like that to me implies that it could be set to any number
> you want.   I'm new at this, and so I'll go ahead and change it as you
> request, but I'd like to hear your thoughts on my concern.

Then the binding doc just needs to state what are the 2 valid frequencies.

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Jason A. Donenfeld @ 2016-12-22 18:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andy Lutomirski, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482425969.2673.5.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

Okay, so in that case, a weak hashing function like SHA1 could result
in a real vulnerability. Therefore, this SHA1 stuff needs to be
reverted immediately, pending a different implementation. If this has
ever shipped in a kernel version, it could even deserve a CVE. No
SHA1!

> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Jeepers creepers. So for some ungodly reason, LKML has invented yet
another homebrewed crypto primitive. This story really gets more
horrifying every day. No bueno.

So yea, let's revert and re-commit (repeal and replace? just
kidding...). Out with SHA-1, in with Blake2 or SHA2.

Jason

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 18:13 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Theodore Ts'o, kernel-hardening, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <32db19ab-0c19-628c-3475-2ccb8050563a@stressinduktion.org>

On Thu, Dec 22, 2016 at 7:08 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).

AFAIK, this is a real vulnerability that needs to be addressed.

Judging by Ted's inquiry about my siphash testing suite, I assume he's
probably tinkering around with it as we speak. :)

Meanwhile I've separated things into several trees:

1. chacha20 rng, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=random-next

2. md5 cleanup, not yet submitted:
https://git.zx2c4.com/linux-dev/log/?h=md5-cleanup

3. md4 cleanup, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=ext4-next-md4-cleanup

4. siphash and networking, not yet submitted as a x/4 series:
https://git.zx2c4.com/linux-dev/log/?h=net-next-siphash

I'll submit (4) in a couple of days, waiting for any comments on the
existing patch-set.

Jason

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Hannes Frederic Sowa @ 2016-12-22 18:08 UTC (permalink / raw)
  To: Theodore Ts'o, Jason A. Donenfeld, kernel-hardening,
	Andy Lutomirski, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <20161222155447.u3ayvw4gmorhswjv@thunk.org>

On 22.12.2016 16:54, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> following up on what appears to be a random subject: ;)
>>>
>>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>>> in the htree. siphash seems to fit this use case pretty good.
>>
>> I saw this too. I'll try to address it in v8 of this series.
> 
> This is a separate issue, and this series is getting a bit too
> complex.  So I'd suggest pushing this off to a separate change.
> 
> Changing the htree hash algorithm is an on-disk format change, and so
> we couldn't roll it out until e2fsprogs gets updated and rolled out
> pretty broadley.  In fact George sent me patches to add siphash as a
> hash algorithm for htree a while back (for both the kernel and
> e2fsprogs), but I never got around to testing and applying them,
> mainly because while it's technically faster, I had other higher
> priority issues to work on --- and see previous comments regarding
> pixel peeping.  Improving the hash algorithm by tens or even hundreds
> of nanoseconds isn't really going to matter since we only do a htree
> lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
> times will dominate.  And from the power perspective, saving
> microwatts of CPU power isn't going to matter if you're going to be
> spinning up the storage device....

I wasn't concerned about performance but more about DoS resilience. I
wonder how safe half md4 actually is in terms of allowing users to
generate long hash chains in the filesystem (in terms of length
extension attacks against half_md4).

In ext4, is it actually possible that a "disrupter" learns about the
hashing secret in the way how the inodes are returned during getdents?

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf
From: Alexander Duyck @ 2016-12-22 17:55 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Weilong Chen, Jeff Kirsher, intel-wired-lan, Netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <74e1e76d-40f6-db8e-74ac-90b0474d4e11@huawei.com>

Yes that is much more helpful.  So looking at it things are being
padded but the last 4 bytes always have this extra data in them.  I've
been trying to recreate the issue on an 82599 with an SR-IOV VF and I
haven't been having much luck reproducing the problem.

In your test environment is the 82599 connected directly to the
Windows machine or are there any
switches/routers/gateways/tunnels/vlans in between?  I've tried
several iterations but with the 82599 connected directly to another
NIC I have I am not able to get it to produce the garbage padding you
are seeing. It makes me wonder if there might be something that is
manipulating the packets in between the two systems.  For example if
there was a VLAN being associated with the VF that is later stripped
and then the packet handed raw to the test system it might explain
what is introducing the extra padding and reason for pulling in the
CRC, and your patch would mask the issue since it would push the
minimum frame size with a VLAN to 68 instead of 64.

- Alex

On Wed, Dec 21, 2016 at 6:00 PM, Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2016/12/21 10:20, Alexander Duyck wrote:
>> I find it curious that only the last 4 bytes have data in them.  I'm
>> wondering if the NIC/driver in the Windows/Nessus system is
>> interpreting the 4 byte CRC on the end of the frame as padding instead
>> of stripping it.
>>
>> Is there any chance you could capture the entire frame instead of just
>> the padding?  Maybe you could run something like wireshark without
>> enabling promiscuous mode on the VF and capture the frames it is
>> trying to send and receive.  What I want to verify is what the actual
>> amount of padding is that is needed to get to 60 bytes and where the
>> CRC should start.
>>
>> - Alex
>
> Here is the verbose output, is this useful?
> Or we will try according to your advice, thanks,
>
> D:\Program Files\Tenable\Nessus>nasl.exe -aX -t 192.169.0.151 etherleak.nasl
> --------------------------
>  ---[ ICMP ]---
> 0x00:  45 00 00 1D 20 81 00 00 40 01 D7 F3 C0 A9 00 97    E... ...@.......
> 0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00    ............x...
> 0x20:  00 00 00 00 00 00 00 00 00 00 98 E4 75 DF          ............u.
> --------------------------
>  ---[ ICMP ]---
> 0x00:  45 00 00 1D 20 85 00 00 40 01 D7 EF C0 A9 00 97    E... ...@.......
> 0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00    ............x...
> 0x20:  00 00 00 00 00 00 00 00 00 00 FB DA F8 13          ..............
> ---[ ether1 ]---
> 0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75    ...............u
> 0x10:  DF                                                 .
> ---[ ether2 ]---
> 0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8    ................
> 0x10:  13                                                 .
>
> Padding observed in one frame :
>
>   0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75    ...............u
>   0x10:  DF                                                 .
>
> Padding observed in another frame :
>
>   0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8    ................
>   0x10:  13
>
>

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: John Fastabend @ 2016-12-22 17:50 UTC (permalink / raw)
  To: David Miller, daniel
  Cc: xiyou.wangcong, shahark, gerlitz.or, roid, jiri, netdev
In-Reply-To: <20161222.115326.376675148706578174.davem@davemloft.net>

On 16-12-22 08:53 AM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Wed, 21 Dec 2016 22:07:48 +0100
> 
>> Ok, you mean for net. In that case I prefer the smaller sized fix to
>> be honest. It also covers everything from the point where we fetch
>> the chain via cops->tcf_chain() to the end of the function, which is
>> where most of the complexity resides, and only the two mentioned
>> commits do the relock, so as a fix I think it's fine as-is. As
>> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
>> would be better, imho.
> 
> Please, can you two work towards an agreement as to what fix should
> go in at this time?
> 
> Thanks.
> 

Thanks for fixing this!

I have a slight preference to push this patch into net as its been
tested already by Shahar and is not particularly invasive. Then for
net-next do a larger cleanup to get rid of some of the complexity per
Cong's suggestion.

.John

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Hannes Frederic Sowa @ 2016-12-22 17:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Borkmann, Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrW_T1v4qKPJDs5dXwAnAit3M52AWMH-K+GJLb1WoLMuRQ@mail.gmail.com>

On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann <daniel@iogearbox.net>
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > >     bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs being loaded based on the digest but
> > just to uniquely identify loaded programs from user space and match up
> > with their source.
> 
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Yeah, it still shouldn't crash the kernel but it could cause
malfunctions because assumptions are not met from user space thus it
could act in a strange way:

My personal biggest concern is that users of this API will at some
point in time assume this digist is unique (as a key itself for a
hashtables f.e.), while it is actually not (and not enforced so by the
kernel). If you can get an unpriv ebpf program inserted to the kernel
with the same weak hash, a controller daemon could pick it up and bind
it to another ebpf hook, probably outside of the unpriv realm the user
was in before. Only the sorting matters, which might be unstable and is
not guaranteed by anything in most hash table based data structures.

The API seems flawed to me.

> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.

To add to this, I am very much in favor of that. Right now it doesn't
have a name because it is a custom algorithm. ;)

> > > 
> > > Also:
> > > 
> > > +       result = (__force __be32 *)fp->digest;
> > > +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +               result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fix crypto so you can pull in the algorithm
> > > without pulling in the whole crypto core.
> > 
> > The hashing is not a proper sha1 neither, unfortunately. I think that
> > is why it will have a custom implementation in iproute2?
> 
> Putting on crypto hat:
> 
> NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
> 2016 when people know better and is going to handle it by porting that
> new primitive to userspace" is not a particularly good argument.
> 
> Okay, crypto hack back off.
> 
> > 
> > I wondered if bpf program loading should have used the module loading
> > infrastructure from the beginning...
> 
> That would be way too complicated and would be nasty for the unprivileged cases.

I was more or less just thinking about using the syscalls and user
space representation not the generic infrastructure, as it is anyway
too much concerned with real kernel modules (would probably also solve
your cgroup-ebpf thread, as it can be passed by unique name or name and
hash ;) ). Anyway...

> > > At the very least, there should be a separate function that calculates
> > > the hash of a buffer and that function should explicitly run itself
> > > against test vectors of various lengths to make sure that it's
> > > calculating what it claims to be calculating.  And it doesn't look
> > > like the userspace code has landed, so, if this thing isn't
> > > calculating SHA1 correctly, it's plausible that no one has noticed.
> > 
> > I hope this was known from the beginning, this is not sha1 unfortunately.
> > 
> > But ebpf elf programs also need preprocessing to get rid of some
> > embedded load-depending data, so maybe it was considered to be just
> > enough?
> 
> I suspect it was actually an accident.

Maybe, I don't know.

Bye,
Hannes

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 17:25 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482425969.2673.5.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> > > Hi Hannes,
>> > >
>> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>> > > <hannes@stressinduktion.org> wrote:
>> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > > > You don't want to give people new IPv6 addresses with the same stable
>> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > > > connectivity then and it is extra work?
>> > >
>> > > Ahh, too bad. So it goes.
>> >
>> > If no other users survive we can put it into the ipv6 module.
>> >
>> > > > The bpf hash stuff can be changed during this merge window, as it is
>> > > > not yet in a released kernel. Albeit I would probably have preferred
>> > > > something like sha256 here, which can be easily replicated by user
>> > > > space tools (minus the problem of patching out references to not
>> > > > hashable data, which must be zeroed).
>> > >
>> > > Oh, interesting, so time is of the essence then. Do you want to handle
>> > > changing the new eBPF code to something not-SHA1 before it's too late,
>> > > as part of a ne
>>
>> w patchset that can fast track itself to David? And
>> > > then I can preserve my large series for the next merge window.
>> >
>> > This algorithm should be a non-seeded algorithm, because the hashes
>> > should be stable and verifiable by user space tooling. Thus this would
>> > need a hashing algorithm that is hardened against pre-image
>> > attacks/collision resistance, which siphash is not. I would prefer some
>> > higher order SHA algorithm for that actually.
>> >
>>
>> You mean:
>>
>> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Author: Daniel Borkmann <daniel@iogearbox.net>
>> Date:   Sun Dec 4 23:19:41 2016 +0100
>>
>>     bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>
>> Yes, please!  This actually matters for security -- imagine a
>> malicious program brute-forcing a collision so that it gets loaded
>> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
>> (preferably the latter).  Speed basically doesn't matter here and
>> Blake2 is both less stable (didn't they slightly change it recently?)
>> and much less well studied.
>
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

The commit log talks about using the hash to see if the program has
already been compiled and JITted.  If that's done, then a collision
will directly cause the kernel to malfunction.

>> My inclination would have been to seed them with something that isn't
>> exposed to userspace for the precise reason that it would prevent user
>> code from making assumptions about what's in the hash.  But if there's
>> a use case for why user code needs to be able to calculate the hash on
>> its own, then that's fine.  But perhaps the actual fdinfo string
>> should be "sha256:abcd1234..." to give some flexibility down the road.
>>
>> Also:
>>
>> +       result = (__force __be32 *)fp->digest;
>> +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
>> +               result[i] = cpu_to_be32(fp->digest[i]);
>>
>> Everyone, please, please, please don't open-code crypto primitives.
>> Is this and the code above it even correct?  It might be but on a very
>> brief glance it looks wrong to me.  If you're doing this to avoid
>> depending on crypto, then fix crypto so you can pull in the algorithm
>> without pulling in the whole crypto core.
>
> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Putting on crypto hat:

NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
2016 when people know better and is going to handle it by porting that
new primitive to userspace" is not a particularly good argument.

Okay, crypto hack back off.

>
> I wondered if bpf program loading should have used the module loading
> infrastructure from the beginning...

That would be way too complicated and would be nasty for the unprivileged cases.

>
>> At the very least, there should be a separate function that calculates
>> the hash of a buffer and that function should explicitly run itself
>> against test vectors of various lengths to make sure that it's
>> calculating what it claims to be calculating.  And it doesn't look
>> like the userspace code has landed, so, if this thing isn't
>> calculating SHA1 correctly, it's plausible that no one has noticed.
>
> I hope this was known from the beginning, this is not sha1 unfortunately.
>
> But ebpf elf programs also need preprocessing to get rid of some
> embedded load-depending data, so maybe it was considered to be just
> enough?

I suspect it was actually an accident.

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Joao Pinto @ 2016-12-22 17:06 UTC (permalink / raw)
  To: Phil Reid, Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <816bb34f-bb4c-648a-179e-7e72cab975b0@electromag.com.au>

Às 4:57 PM de 12/22/2016, Phil Reid escreveu:
> On 22/12/2016 23:47, Joao Pinto wrote:
>>
>> Hello Phil,
>>
>> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>>> G'day Joao,
>>>
>>> On 22/12/2016 20:38, Joao Pinto wrote:
>>>> When testing stmmac with my QoS reference design I checked a problem in the
>>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>>
>>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>>> ---
>>>> changes v1->v2 (David Miller)
>>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>>> to make sense
>>>>
>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> index b21d03f..94223c8 100644
>>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>>> *ioaddr, int mcbins,
>>>>      mac->mii.reg_shift = 6;
>>>>      mac->mii.reg_mask = 0x000007C0;
>>>>      mac->mii.clk_csr_shift = 2;
>>>> -    mac->mii.clk_csr_mask = 0xF;
>>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>>
>>> Should this not be GENMASK(5,2)
>>
>> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>>
>> Bits: 4:2
>> 000 60-100 MHz clk_csr_i/42
>> 001 100-150 MHz clk_csr_i/62
>> 010 20-35 MHz clk_csr_i/16
>> 011 35-60 MHz clk_csr_i/26
>> 100 150-250 MHz clk_csr_i/102
>> 101 250-300 MHz clk_csr_i/124
>> 110, 111 Reserved
>>
>> So only bits 2, 3 and 4 should be masked.
>>
>> Thanks.
>>
> But this is a change in behaviour from what was there isn't.
> Previous mask was 4 bits. now it's 3.
> 
> And for example the altera socfgpa implementation in the cyclone V has valid values
> for values 0x8-0xf, using bit 5:2.

According to the databook, bit 5 is reserved and RO. When reserved, it is
possible to customize. Is that the case? If there is hardware using the 5th bit
we can put it GENMASK(5, 2) with no problem.

> 
> 
> 
> 
> 

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Hannes Frederic Sowa @ 2016-12-22 16:59 UTC (permalink / raw)
  To: Andy Lutomirski, Daniel Borkmann, Alexei Starovoitov
  Cc: Jason A. Donenfeld, kernel-hardening@lists.openwall.com,
	Theodore Ts'o, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrUATPeaCJTQZN2h=ct6HH-xLt4V8T0PfPHNFGFN2HWtTA@mail.gmail.com>

On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > Hi Hannes,
> > > 
> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > > <hannes@stressinduktion.org> wrote:
> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > > > You don't want to give people new IPv6 addresses with the same stable
> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > connectivity then and it is extra work?
> > > 
> > > Ahh, too bad. So it goes.
> > 
> > If no other users survive we can put it into the ipv6 module.
> > 
> > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > something like sha256 here, which can be easily replicated by user
> > > > space tools (minus the problem of patching out references to not
> > > > hashable data, which must be zeroed).
> > > 
> > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > as part of a ne
> 
> w patchset that can fast track itself to David? And
> > > then I can preserve my large series for the next merge window.
> > 
> > This algorithm should be a non-seeded algorithm, because the hashes
> > should be stable and verifiable by user space tooling. Thus this would
> > need a hashing algorithm that is hardened against pre-image
> > attacks/collision resistance, which siphash is not. I would prefer some
> > higher order SHA algorithm for that actually.
> > 
> 
> You mean:
> 
> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date:   Sun Dec 4 23:19:41 2016 +0100
> 
>     bpf: add prog_digest and expose it via fdinfo/netlink
> 
> 
> Yes, please!  This actually matters for security -- imagine a
> malicious program brute-forcing a collision so that it gets loaded
> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> (preferably the latter).  Speed basically doesn't matter here and
> Blake2 is both less stable (didn't they slightly change it recently?)
> and much less well studied.

We don't prevent ebpf programs being loaded based on the digest but
just to uniquely identify loaded programs from user space and match up
with their source.

There have been talks about signing bpf programs, thus this would
probably need another digest algorithm additionally to that one,
wasting probably instructions. Probably going somewhere in direction of
PKCS#7 might be the thing to do (which leads to the problem to make
PKCS#7 attackable by ordinary unpriv users, hmpf).

> My inclination would have been to seed them with something that isn't
> exposed to userspace for the precise reason that it would prevent user
> code from making assumptions about what's in the hash.  But if there's
> a use case for why user code needs to be able to calculate the hash on
> its own, then that's fine.  But perhaps the actual fdinfo string
> should be "sha256:abcd1234..." to give some flexibility down the road.
> 
> Also:
> 
> +       result = (__force __be32 *)fp->digest;
> +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
> +               result[i] = cpu_to_be32(fp->digest[i]);
> 
> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very
> brief glance it looks wrong to me.  If you're doing this to avoid
> depending on crypto, then fix crypto so you can pull in the algorithm
> without pulling in the whole crypto core.

The hashing is not a proper sha1 neither, unfortunately. I think that
is why it will have a custom implementation in iproute2?

I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

I hope this was known from the beginning, this is not sha1 unfortunately.

But ebpf elf programs also need preprocessing to get rid of some
embedded load-depending data, so maybe it was considered to be just
enough?

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Phil Reid @ 2016-12-22 16:57 UTC (permalink / raw)
  To: Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <efb22734-df9a-fde0-7cff-d1658105e2ec@synopsys.com>

On 22/12/2016 23:47, Joao Pinto wrote:
>
> Hello Phil,
>
> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>> G'day Joao,
>>
>> On 22/12/2016 20:38, Joao Pinto wrote:
>>> When testing stmmac with my QoS reference design I checked a problem in the
>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>
>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>> ---
>>> changes v1->v2 (David Miller)
>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>> to make sense
>>>
>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> index b21d03f..94223c8 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>> *ioaddr, int mcbins,
>>>      mac->mii.reg_shift = 6;
>>>      mac->mii.reg_mask = 0x000007C0;
>>>      mac->mii.clk_csr_shift = 2;
>>> -    mac->mii.clk_csr_mask = 0xF;
>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>
>> Should this not be GENMASK(5,2)
>
> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>
> Bits: 4:2
> 000 60-100 MHz clk_csr_i/42
> 001 100-150 MHz clk_csr_i/62
> 010 20-35 MHz clk_csr_i/16
> 011 35-60 MHz clk_csr_i/26
> 100 150-250 MHz clk_csr_i/102
> 101 250-300 MHz clk_csr_i/124
> 110, 111 Reserved
>
> So only bits 2, 3 and 4 should be masked.
>
> Thanks.
>
But this is a change in behaviour from what was there isn't.
Previous mask was 4 bits. now it's 3.

And for example the altera socfgpa implementation in the cyclone V has valid values
for values 0x8-0xf, using bit 5:2.





-- 
Regards
Phil Reid

^ permalink raw reply

* VERY IMPORTANT
From: info @ 2016-12-22  5:45 UTC (permalink / raw)


We have an inheritance of a deceased client with your surname. Kindly  
contact me via email:( gertjan.vlieghe@yandex.com ) with your full  
names for info

Best Regards.


Dr, Gertjan Vlieghe.
----------------------------------
Correo Corporativo Hospital Universitario del Valle E.S.E
*******************************************************************************

"Estamos re-dimensionandonos para crecer!"

******************************************************************************

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 16:53 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9pvPXiBu5TUYJL7gED7b=iXKrkXu45fXstnBFe77Esv5Q@mail.gmail.com>

On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi all,
>
> I don't know what your design requirements are for this. It looks like
> you're generating some kind of crypto digest of a program, and you
> need to avoid collisions. If you'd like to go with a PRF (keyed hash
> function) that uses some kernel secret key, then I'd strongly suggest
> using Keyed-Blake2. Alternatively, if you need for userspace to be
> able to calculate the same hash, and don't want to use some kernel
> secret, then I'd still suggest using Blake2, which will be fast and
> secure.
>
> If you can wait until January, I'll work on a commit adding the
> primitive to the tree. I've already written it and I just need to get
> things cleaned up.
>
>> Blake2 is both less stable (didn't they slightly change it recently?)
>
> No, Blake2 is very stable. It's also extremely secure and has been
> extensively studied. Not to mention it's faster than SHA2. And if you
> want to use it as a PRF, it's obvious better suited and faster to use
> Blake2's keyed PRF mode than HMAC-SHA2.
>
> If you don't care about performance, and you don't want to use a PRF,
> then just use SHA2-256. If you're particularly concerned about certain
> types of attacks, you could go with SHA2-512 truncated to 256 bytes,
> but somehow I doubt you need this.

I don't think this cares about performance.  (Well, it cares about
performance, but the verifier will likely dominiate the cost by such a
large margin that the hash algo doesn't matter.)  And staying
FIPS-compliant-ish is worth a little bit, so I'd advocate for
something in the SHA2 family.

> If userspace hasn't landed, can we get away with changing this code
> after 4.10? Or should we just fix it before 4.10? Or should we revert
> it before 4.10? Development-policy-things like this I have zero clue
> about, so I heed to your guidance.

I think it should be fixed or reverted before 4.10.

--Andy

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: David Miller @ 2016-12-22 16:53 UTC (permalink / raw)
  To: daniel
  Cc: xiyou.wangcong, shahark, gerlitz.or, roid, jiri, john.fastabend,
	netdev
In-Reply-To: <585AEF24.4070800@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 21 Dec 2016 22:07:48 +0100

> Ok, you mean for net. In that case I prefer the smaller sized fix to
> be honest. It also covers everything from the point where we fetch
> the chain via cops->tcf_chain() to the end of the function, which is
> where most of the complexity resides, and only the two mentioned
> commits do the relock, so as a fix I think it's fine as-is. As
> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
> would be better, imho.

Please, can you two work towards an agreement as to what fix should
go in at this time?

Thanks.

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Shahar Klein @ 2016-12-22 13:16 UTC (permalink / raw)
  To: Daniel Borkmann, davem
  Cc: shahark, xiyou.wangcong, gerlitz.or, roid, jiri, john.fastabend,
	netdev
In-Reply-To: <1954223e8b7aa90ff835357e8a3ef7537be33e43.1482337600.git.daniel@iogearbox.net>



On 12/21/2016 7:04 PM, Daniel Borkmann wrote:
> Shahar reported a soft lockup in tc_classify(), where we run into an
> endless loop when walking the classifier chain due to tp->next == tp
> which is a state we should never run into. The issue only seems to
> trigger under load in the tc control path.
>
> What happens is that in tc_ctl_tfilter(), thread A allocates a new
> tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
> with it. In that classifier callback we had to unlock/lock the rtnl
> mutex and returned with -EAGAIN. One reason why we need to drop there
> is, for example, that we need to request an action module to be loaded.
>
> This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
> after we loaded and found the requested action, we need to redo the
> whole request so we don't race against others. While we had to unlock
> rtnl in that time, thread B's request was processed next on that CPU.
> Thread B added a new tp instance successfully to the classifier chain.
> When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
> and destroying its tp instance which never got linked, we goto replay
> and redo A's request.
>
> This time when walking the classifier chain in tc_ctl_tfilter() for
> checking for existing tp instances we had a priority match and found
> the tp instance that was created and linked by thread B. Now calling
> again into tp->ops->change() with that tp was successful and returned
> without error.
>
> tp_created was never cleared in the second round, thus kernel thinks
> that we need to link it into the classifier chain (once again). tp and
> *back point to the same object due to the match we had earlier on. Thus
> for thread B's already public tp, we reset tp->next to tp itself and
> link it into the chain, which eventually causes the mentioned endless
> loop in tc_classify() once a packet hits the data path.
>
> Fix is to clear tp_created at the beginning of each request, also when
> we replay it. On the paths that can cause -EAGAIN we already destroy
> the original tp instance we had and on replay we really need to start
> from scratch. It seems that this issue was first introduced in commit
> 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
> and avoid kernel panic when we use cls_cgroup").
>
> Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
> Reported-by: Shahar Klein <shahark@mellanox.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  Shahar, you mentioned you wanted to run again later without the debug
>  printk's. Once you do that and come to the same result again, please
>  feel free to reply with a Tested-by or such. Thanks everyone!
>
>  net/sched/cls_api.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 3fbba79..1ecdf80 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
>  	unsigned long cl;
>  	unsigned long fh;
>  	int err;
> -	int tp_created = 0;
> +	int tp_created;
>
>  	if ((n->nlmsg_type != RTM_GETTFILTER) &&
>  	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
>  		return -EPERM;
>
>  replay:
> +	tp_created = 0;
> +
>  	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
>  	if (err < 0)
>  		return err;
>

I've tested this specific patch (without cong's patch and without the 
many prints) many times and in many test permutations today and it 
didn't lock

Thanks again Daniel!

Shahar

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 16:36 UTC (permalink / raw)
  To: kernel-hardening, Theodore Ts'o, Jason A. Donenfeld,
	Hannes Frederic Sowa, Andy Lutomirski, Netdev, LKML,
	Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <20161222163041.xb2ysx4uswpaqfev@thunk.org>

On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> I'd do this first, as one set.  Adding a new file to crypto is
> unlikely to cause merge conflicts.

Ack.

>
>> 2. convert char/random to use siphash. to: ted ts'o's random-next
>
> I'm confused, I thought you had agreed to the batched chacha20
> approach?

Sorry, I meant to write this. Long day, little sleep. Yes, of course.
Batched entropy.

>> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
>> inside of linux/cryptohash.h. to: ??'s ??-next
>
> This is cleanup, so it doesn't matter that much when it happens.  md5
> changes to crypto is also unlikely to cause conflicts, so we could do
> this at the same time as (2), if Herbert (the crypto maintainer) agrees.

Alright, sure.

>
>> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
>> entry inside of linux/cryptohash.c. to: td ts'o's ext-next
>
> This is definitely separate.

Okay, I'll submit it to you separately.

> One more thing.  Can you add some test cases to lib/siphash.h?
> Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
> some test inputs and known outputs?  I'm going to need to add a
> version of siphash to e2fsprogs, and I want to make sure the userspace
> version is implementing the same algorithm as the kernel siphash.

I've already written these. They're behind TEST_HASH. They currently
test every single line of code of all implementations of siphash. I
spent a long time on these. The test vectors themselves were taken
from the SipHash creators' reference publication. Check out
lib/test_siphash.c in my tree.

Jason

On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote:
>> Could you offer a bit of advice on how to manage dependencies between
>> patchsets during merge windows? I'm a bit new to the process.
>>
>> Specifically, we how have 4 parts:
>> 1. add siphash, and use it for some networking code. to: david miller's net-next
>
> I'd do this first, as one set.  Adding a new file to crypto is
> unlikely to cause merge conflicts.
>
>> 2. convert char/random to use siphash. to: ted ts'o's random-next
>
> I'm confused, I thought you had agreed to the batched chacha20
> approach?
>
>> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
>> inside of linux/cryptohash.h. to: ??'s ??-next
>
> This is cleanup, so it doesn't matter that much when it happens.  md5
> changes to crypto is also unlikely to cause conflicts, so we could do
> this at the same time as (2), if Herbert (the crypto maintainer) agrees.
>
>> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
>> entry inside of linux/cryptohash.c. to: td ts'o's ext-next
>
> This is definitely separate.
>
> One more thing.  Can you add some test cases to lib/siphash.h?
> Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
> some test inputs and known outputs?  I'm going to need to add a
> version of siphash to e2fsprogs, and I want to make sure the userspace
> version is implementing the same algorithm as the kernel siphash.
>
>                                               - Ted



-- 
Jason A. Donenfeld
Deep Space Explorer
fr: +33 6 51 90 82 66
us: +1 513 476 1200
www.jasondonenfeld.com
www.zx2c4.com
zx2c4.com/keys/AB9942E6D4A4CFC3412620A749FC7012A5DE03AE.asc

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Theodore Ts'o @ 2016-12-22 16:30 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9r3zCE6zLsL9Jrp3a2D3mpH6oYEnz3BBerAGZXyJpr2mQ@mail.gmail.com>

On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote:
> Could you offer a bit of advice on how to manage dependencies between
> patchsets during merge windows? I'm a bit new to the process.
> 
> Specifically, we how have 4 parts:
> 1. add siphash, and use it for some networking code. to: david miller's net-next

I'd do this first, as one set.  Adding a new file to crypto is
unlikely to cause merge conflicts.

> 2. convert char/random to use siphash. to: ted ts'o's random-next

I'm confused, I thought you had agreed to the batched chacha20
approach?

> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
> inside of linux/cryptohash.h. to: ??'s ??-next

This is cleanup, so it doesn't matter that much when it happens.  md5
changes to crypto is also unlikely to cause conflicts, so we could do
this at the same time as (2), if Herbert (the crypto maintainer) agrees.

> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
> entry inside of linux/cryptohash.c. to: td ts'o's ext-next

This is definitely separate.

One more thing.  Can you add some test cases to lib/siphash.h?
Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
some test inputs and known outputs?  I'm going to need to add a
version of siphash to e2fsprogs, and I want to make sure the userspace
version is implementing the same algorithm as the kernel siphash.

	   		    	 	      - Ted

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Jason A. Donenfeld @ 2016-12-22 16:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrUATPeaCJTQZN2h=ct6HH-xLt4V8T0PfPHNFGFN2HWtTA@mail.gmail.com>

Hi all,

I don't know what your design requirements are for this. It looks like
you're generating some kind of crypto digest of a program, and you
need to avoid collisions. If you'd like to go with a PRF (keyed hash
function) that uses some kernel secret key, then I'd strongly suggest
using Keyed-Blake2. Alternatively, if you need for userspace to be
able to calculate the same hash, and don't want to use some kernel
secret, then I'd still suggest using Blake2, which will be fast and
secure.

If you can wait until January, I'll work on a commit adding the
primitive to the tree. I've already written it and I just need to get
things cleaned up.

> Blake2 is both less stable (didn't they slightly change it recently?)

No, Blake2 is very stable. It's also extremely secure and has been
extensively studied. Not to mention it's faster than SHA2. And if you
want to use it as a PRF, it's obvious better suited and faster to use
Blake2's keyed PRF mode than HMAC-SHA2.

If you don't care about performance, and you don't want to use a PRF,
then just use SHA2-256. If you're particularly concerned about certain
types of attacks, you could go with SHA2-512 truncated to 256 bytes,
but somehow I doubt you need this.

Anyway, the take away is: stop using SHA1. It's time has come.

> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very

I shuttered looking at this too. How did this possibly make it past
review? The attitude toward crypto here is generally *terrifying*.
Unless you're a professional cryptographer, the only correct attitude
to have is a careful and conservative one.

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

If userspace hasn't landed, can we get away with changing this code
after 4.10? Or should we just fix it before 4.10? Or should we revert
it before 4.10? Development-policy-things like this I have zero clue
about, so I heed to your guidance.

Jason

^ permalink raw reply

* [PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat
From: Nogah Frankel @ 2016-12-22 16:23 UTC (permalink / raw)
  To: netdev
  Cc: stephen, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch,
	Nogah Frankel
In-Reply-To: <1482423795-6531-1-git-send-email-nogahf@mellanox.com>

Add support for extended statistics of SW only type, for counting only the
packets that went via the cpu. (useful for systems with forward
offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.

It is under the name 'software'
(or any shorten of it as 'soft' or simply 's')

For example:
ifstat -x s

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 misc/ifstat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index 8325ac7..79c0dba 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -730,7 +730,8 @@ static void xstat_usage(void)
 {
 	fprintf(stderr,
 "Usage: ifstat supported xstats:\n"
-"       64bits         default stats, with 64 bits support\n");
+"       64bits         default stats, with 64 bits support\n"
+"       software        SW stats. Counts only packets that went via the CPU\n");
 }
 
 struct extended_stats_options_t {
@@ -745,6 +746,7 @@ struct extended_stats_options_t {
  */
 static const struct extended_stats_options_t extended_stats_options[] = {
 	{"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE},
+	{"software",  IFLA_STATS_LINK_OFFLOAD_XSTATS, IFLA_OFFLOAD_XSTATS_CPU_HIT},
 };
 
 static bool get_filter_type(char *name)
-- 
2.4.3

^ permalink raw reply related

* [PATCH iproute2 v3 3/4] ifstat: Add 64 bits based stats to extended statistics
From: Nogah Frankel @ 2016-12-22 16:23 UTC (permalink / raw)
  To: netdev
  Cc: stephen, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch,
	Nogah Frankel
In-Reply-To: <1482423795-6531-1-git-send-email-nogahf@mellanox.com>

The default stats for ifstat are 32 bits based.
The kernel supports 64 bits based stats. (They are returned in struct
rtnl_link_stats64 which is an exact copy of struct rtnl_link_stats, in
which the "normal" stats are returned, but with fields of u64 instead of
u32). This patch adds them as an extended stats.

It is read with filter type IFLA_STATS_LINK_64 and no sub type.

It is under the name 64bits
(or any shorten of it as "64")

For example:
ifstat -x 64bit

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 misc/ifstat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index ce666b3..8325ac7 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -729,7 +729,8 @@ static int verify_forging(int fd)
 static void xstat_usage(void)
 {
 	fprintf(stderr,
-"Usage: ifstat supported xstats:\n");
+"Usage: ifstat supported xstats:\n"
+"       64bits         default stats, with 64 bits support\n");
 }
 
 struct extended_stats_options_t {
@@ -743,6 +744,7 @@ struct extended_stats_options_t {
  * Name length must be under 64 chars.
  */
 static const struct extended_stats_options_t extended_stats_options[] = {
+	{"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE},
 };
 
 static bool get_filter_type(char *name)
-- 
2.4.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox