Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Shahar Klein @ 2016-12-22 13:16 UTC (permalink / raw)
  To: Daniel Borkmann, davem
  Cc: shahark, xiyou.wangcong, gerlitz.or, roid, jiri, john.fastabend,
	netdev
In-Reply-To: <1954223e8b7aa90ff835357e8a3ef7537be33e43.1482337600.git.daniel@iogearbox.net>



On 12/21/2016 7:04 PM, Daniel Borkmann wrote:
> Shahar reported a soft lockup in tc_classify(), where we run into an
> endless loop when walking the classifier chain due to tp->next == tp
> which is a state we should never run into. The issue only seems to
> trigger under load in the tc control path.
>
> What happens is that in tc_ctl_tfilter(), thread A allocates a new
> tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
> with it. In that classifier callback we had to unlock/lock the rtnl
> mutex and returned with -EAGAIN. One reason why we need to drop there
> is, for example, that we need to request an action module to be loaded.
>
> This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
> after we loaded and found the requested action, we need to redo the
> whole request so we don't race against others. While we had to unlock
> rtnl in that time, thread B's request was processed next on that CPU.
> Thread B added a new tp instance successfully to the classifier chain.
> When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
> and destroying its tp instance which never got linked, we goto replay
> and redo A's request.
>
> This time when walking the classifier chain in tc_ctl_tfilter() for
> checking for existing tp instances we had a priority match and found
> the tp instance that was created and linked by thread B. Now calling
> again into tp->ops->change() with that tp was successful and returned
> without error.
>
> tp_created was never cleared in the second round, thus kernel thinks
> that we need to link it into the classifier chain (once again). tp and
> *back point to the same object due to the match we had earlier on. Thus
> for thread B's already public tp, we reset tp->next to tp itself and
> link it into the chain, which eventually causes the mentioned endless
> loop in tc_classify() once a packet hits the data path.
>
> Fix is to clear tp_created at the beginning of each request, also when
> we replay it. On the paths that can cause -EAGAIN we already destroy
> the original tp instance we had and on replay we really need to start
> from scratch. It seems that this issue was first introduced in commit
> 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
> and avoid kernel panic when we use cls_cgroup").
>
> Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
> Reported-by: Shahar Klein <shahark@mellanox.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  Shahar, you mentioned you wanted to run again later without the debug
>  printk's. Once you do that and come to the same result again, please
>  feel free to reply with a Tested-by or such. Thanks everyone!
>
>  net/sched/cls_api.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 3fbba79..1ecdf80 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
>  	unsigned long cl;
>  	unsigned long fh;
>  	int err;
> -	int tp_created = 0;
> +	int tp_created;
>
>  	if ((n->nlmsg_type != RTM_GETTFILTER) &&
>  	    !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
>  		return -EPERM;
>
>  replay:
> +	tp_created = 0;
> +
>  	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
>  	if (err < 0)
>  		return err;
>

I've tested this specific patch (without cong's patch and without the 
many prints) many times and in many test permutations today and it 
didn't lock

Thanks again Daniel!

Shahar

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: David Miller @ 2016-12-22 16:53 UTC (permalink / raw)
  To: daniel
  Cc: xiyou.wangcong, shahark, gerlitz.or, roid, jiri, john.fastabend,
	netdev
In-Reply-To: <585AEF24.4070800@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 21 Dec 2016 22:07:48 +0100

> Ok, you mean for net. In that case I prefer the smaller sized fix to
> be honest. It also covers everything from the point where we fetch
> the chain via cops->tcf_chain() to the end of the function, which is
> where most of the complexity resides, and only the two mentioned
> commits do the relock, so as a fix I think it's fine as-is. As
> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
> would be better, imho.

Please, can you two work towards an agreement as to what fix should
go in at this time?

Thanks.

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 16:53 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9pvPXiBu5TUYJL7gED7b=iXKrkXu45fXstnBFe77Esv5Q@mail.gmail.com>

On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi all,
>
> I don't know what your design requirements are for this. It looks like
> you're generating some kind of crypto digest of a program, and you
> need to avoid collisions. If you'd like to go with a PRF (keyed hash
> function) that uses some kernel secret key, then I'd strongly suggest
> using Keyed-Blake2. Alternatively, if you need for userspace to be
> able to calculate the same hash, and don't want to use some kernel
> secret, then I'd still suggest using Blake2, which will be fast and
> secure.
>
> If you can wait until January, I'll work on a commit adding the
> primitive to the tree. I've already written it and I just need to get
> things cleaned up.
>
>> Blake2 is both less stable (didn't they slightly change it recently?)
>
> No, Blake2 is very stable. It's also extremely secure and has been
> extensively studied. Not to mention it's faster than SHA2. And if you
> want to use it as a PRF, it's obvious better suited and faster to use
> Blake2's keyed PRF mode than HMAC-SHA2.
>
> If you don't care about performance, and you don't want to use a PRF,
> then just use SHA2-256. If you're particularly concerned about certain
> types of attacks, you could go with SHA2-512 truncated to 256 bytes,
> but somehow I doubt you need this.

I don't think this cares about performance.  (Well, it cares about
performance, but the verifier will likely dominiate the cost by such a
large margin that the hash algo doesn't matter.)  And staying
FIPS-compliant-ish is worth a little bit, so I'd advocate for
something in the SHA2 family.

> If userspace hasn't landed, can we get away with changing this code
> after 4.10? Or should we just fix it before 4.10? Or should we revert
> it before 4.10? Development-policy-things like this I have zero clue
> about, so I heed to your guidance.

I think it should be fixed or reverted before 4.10.

--Andy

^ permalink raw reply

* VERY IMPORTANT
From: info @ 2016-12-22  5:45 UTC (permalink / raw)


We have an inheritance of a deceased client with your surname. Kindly  
contact me via email:( gertjan.vlieghe@yandex.com ) with your full  
names for info

Best Regards.


Dr, Gertjan Vlieghe.
----------------------------------
Correo Corporativo Hospital Universitario del Valle E.S.E
*******************************************************************************

"Estamos re-dimensionandonos para crecer!"

******************************************************************************

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Phil Reid @ 2016-12-22 16:57 UTC (permalink / raw)
  To: Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <efb22734-df9a-fde0-7cff-d1658105e2ec@synopsys.com>

On 22/12/2016 23:47, Joao Pinto wrote:
>
> Hello Phil,
>
> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>> G'day Joao,
>>
>> On 22/12/2016 20:38, Joao Pinto wrote:
>>> When testing stmmac with my QoS reference design I checked a problem in the
>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>
>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>> ---
>>> changes v1->v2 (David Miller)
>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>> to make sense
>>>
>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> index b21d03f..94223c8 100644
>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>> *ioaddr, int mcbins,
>>>      mac->mii.reg_shift = 6;
>>>      mac->mii.reg_mask = 0x000007C0;
>>>      mac->mii.clk_csr_shift = 2;
>>> -    mac->mii.clk_csr_mask = 0xF;
>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>
>> Should this not be GENMASK(5,2)
>
> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>
> Bits: 4:2
> 000 60-100 MHz clk_csr_i/42
> 001 100-150 MHz clk_csr_i/62
> 010 20-35 MHz clk_csr_i/16
> 011 35-60 MHz clk_csr_i/26
> 100 150-250 MHz clk_csr_i/102
> 101 250-300 MHz clk_csr_i/124
> 110, 111 Reserved
>
> So only bits 2, 3 and 4 should be masked.
>
> Thanks.
>
But this is a change in behaviour from what was there isn't.
Previous mask was 4 bits. now it's 3.

And for example the altera socfgpa implementation in the cyclone V has valid values
for values 0x8-0xf, using bit 5:2.





-- 
Regards
Phil Reid

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Hannes Frederic Sowa @ 2016-12-22 16:59 UTC (permalink / raw)
  To: Andy Lutomirski, Daniel Borkmann, Alexei Starovoitov
  Cc: Jason A. Donenfeld, kernel-hardening@lists.openwall.com,
	Theodore Ts'o, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrUATPeaCJTQZN2h=ct6HH-xLt4V8T0PfPHNFGFN2HWtTA@mail.gmail.com>

On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > Hi Hannes,
> > > 
> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > > <hannes@stressinduktion.org> wrote:
> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > > > You don't want to give people new IPv6 addresses with the same stable
> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > connectivity then and it is extra work?
> > > 
> > > Ahh, too bad. So it goes.
> > 
> > If no other users survive we can put it into the ipv6 module.
> > 
> > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > something like sha256 here, which can be easily replicated by user
> > > > space tools (minus the problem of patching out references to not
> > > > hashable data, which must be zeroed).
> > > 
> > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > as part of a ne
> 
> w patchset that can fast track itself to David? And
> > > then I can preserve my large series for the next merge window.
> > 
> > This algorithm should be a non-seeded algorithm, because the hashes
> > should be stable and verifiable by user space tooling. Thus this would
> > need a hashing algorithm that is hardened against pre-image
> > attacks/collision resistance, which siphash is not. I would prefer some
> > higher order SHA algorithm for that actually.
> > 
> 
> You mean:
> 
> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> Author: Daniel Borkmann <daniel@iogearbox.net>
> Date:   Sun Dec 4 23:19:41 2016 +0100
> 
>     bpf: add prog_digest and expose it via fdinfo/netlink
> 
> 
> Yes, please!  This actually matters for security -- imagine a
> malicious program brute-forcing a collision so that it gets loaded
> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> (preferably the latter).  Speed basically doesn't matter here and
> Blake2 is both less stable (didn't they slightly change it recently?)
> and much less well studied.

We don't prevent ebpf programs being loaded based on the digest but
just to uniquely identify loaded programs from user space and match up
with their source.

There have been talks about signing bpf programs, thus this would
probably need another digest algorithm additionally to that one,
wasting probably instructions. Probably going somewhere in direction of
PKCS#7 might be the thing to do (which leads to the problem to make
PKCS#7 attackable by ordinary unpriv users, hmpf).

> My inclination would have been to seed them with something that isn't
> exposed to userspace for the precise reason that it would prevent user
> code from making assumptions about what's in the hash.  But if there's
> a use case for why user code needs to be able to calculate the hash on
> its own, then that's fine.  But perhaps the actual fdinfo string
> should be "sha256:abcd1234..." to give some flexibility down the road.
> 
> Also:
> 
> +       result = (__force __be32 *)fp->digest;
> +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
> +               result[i] = cpu_to_be32(fp->digest[i]);
> 
> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very
> brief glance it looks wrong to me.  If you're doing this to avoid
> depending on crypto, then fix crypto so you can pull in the algorithm
> without pulling in the whole crypto core.

The hashing is not a proper sha1 neither, unfortunately. I think that
is why it will have a custom implementation in iproute2?

I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

I hope this was known from the beginning, this is not sha1 unfortunately.

But ebpf elf programs also need preprocessing to get rid of some
embedded load-depending data, so maybe it was considered to be just
enough?

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Joao Pinto @ 2016-12-22 17:06 UTC (permalink / raw)
  To: Phil Reid, Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <816bb34f-bb4c-648a-179e-7e72cab975b0@electromag.com.au>

Às 4:57 PM de 12/22/2016, Phil Reid escreveu:
> On 22/12/2016 23:47, Joao Pinto wrote:
>>
>> Hello Phil,
>>
>> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>>> G'day Joao,
>>>
>>> On 22/12/2016 20:38, Joao Pinto wrote:
>>>> When testing stmmac with my QoS reference design I checked a problem in the
>>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>>
>>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>>> ---
>>>> changes v1->v2 (David Miller)
>>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>>> to make sense
>>>>
>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> index b21d03f..94223c8 100644
>>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>>> *ioaddr, int mcbins,
>>>>      mac->mii.reg_shift = 6;
>>>>      mac->mii.reg_mask = 0x000007C0;
>>>>      mac->mii.clk_csr_shift = 2;
>>>> -    mac->mii.clk_csr_mask = 0xF;
>>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>>
>>> Should this not be GENMASK(5,2)
>>
>> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>>
>> Bits: 4:2
>> 000 60-100 MHz clk_csr_i/42
>> 001 100-150 MHz clk_csr_i/62
>> 010 20-35 MHz clk_csr_i/16
>> 011 35-60 MHz clk_csr_i/26
>> 100 150-250 MHz clk_csr_i/102
>> 101 250-300 MHz clk_csr_i/124
>> 110, 111 Reserved
>>
>> So only bits 2, 3 and 4 should be masked.
>>
>> Thanks.
>>
> But this is a change in behaviour from what was there isn't.
> Previous mask was 4 bits. now it's 3.
> 
> And for example the altera socfgpa implementation in the cyclone V has valid values
> for values 0x8-0xf, using bit 5:2.

According to the databook, bit 5 is reserved and RO. When reserved, it is
possible to customize. Is that the case? If there is hardware using the 5th bit
we can put it GENMASK(5, 2) with no problem.

> 
> 
> 
> 
> 

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 17:25 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482425969.2673.5.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> > > Hi Hannes,
>> > >
>> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>> > > <hannes@stressinduktion.org> wrote:
>> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > > > You don't want to give people new IPv6 addresses with the same stable
>> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > > > connectivity then and it is extra work?
>> > >
>> > > Ahh, too bad. So it goes.
>> >
>> > If no other users survive we can put it into the ipv6 module.
>> >
>> > > > The bpf hash stuff can be changed during this merge window, as it is
>> > > > not yet in a released kernel. Albeit I would probably have preferred
>> > > > something like sha256 here, which can be easily replicated by user
>> > > > space tools (minus the problem of patching out references to not
>> > > > hashable data, which must be zeroed).
>> > >
>> > > Oh, interesting, so time is of the essence then. Do you want to handle
>> > > changing the new eBPF code to something not-SHA1 before it's too late,
>> > > as part of a ne
>>
>> w patchset that can fast track itself to David? And
>> > > then I can preserve my large series for the next merge window.
>> >
>> > This algorithm should be a non-seeded algorithm, because the hashes
>> > should be stable and verifiable by user space tooling. Thus this would
>> > need a hashing algorithm that is hardened against pre-image
>> > attacks/collision resistance, which siphash is not. I would prefer some
>> > higher order SHA algorithm for that actually.
>> >
>>
>> You mean:
>>
>> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Author: Daniel Borkmann <daniel@iogearbox.net>
>> Date:   Sun Dec 4 23:19:41 2016 +0100
>>
>>     bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>
>> Yes, please!  This actually matters for security -- imagine a
>> malicious program brute-forcing a collision so that it gets loaded
>> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
>> (preferably the latter).  Speed basically doesn't matter here and
>> Blake2 is both less stable (didn't they slightly change it recently?)
>> and much less well studied.
>
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

The commit log talks about using the hash to see if the program has
already been compiled and JITted.  If that's done, then a collision
will directly cause the kernel to malfunction.

>> My inclination would have been to seed them with something that isn't
>> exposed to userspace for the precise reason that it would prevent user
>> code from making assumptions about what's in the hash.  But if there's
>> a use case for why user code needs to be able to calculate the hash on
>> its own, then that's fine.  But perhaps the actual fdinfo string
>> should be "sha256:abcd1234..." to give some flexibility down the road.
>>
>> Also:
>>
>> +       result = (__force __be32 *)fp->digest;
>> +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
>> +               result[i] = cpu_to_be32(fp->digest[i]);
>>
>> Everyone, please, please, please don't open-code crypto primitives.
>> Is this and the code above it even correct?  It might be but on a very
>> brief glance it looks wrong to me.  If you're doing this to avoid
>> depending on crypto, then fix crypto so you can pull in the algorithm
>> without pulling in the whole crypto core.
>
> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Putting on crypto hat:

NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
2016 when people know better and is going to handle it by porting that
new primitive to userspace" is not a particularly good argument.

Okay, crypto hack back off.

>
> I wondered if bpf program loading should have used the module loading
> infrastructure from the beginning...

That would be way too complicated and would be nasty for the unprivileged cases.

>
>> At the very least, there should be a separate function that calculates
>> the hash of a buffer and that function should explicitly run itself
>> against test vectors of various lengths to make sure that it's
>> calculating what it claims to be calculating.  And it doesn't look
>> like the userspace code has landed, so, if this thing isn't
>> calculating SHA1 correctly, it's plausible that no one has noticed.
>
> I hope this was known from the beginning, this is not sha1 unfortunately.
>
> But ebpf elf programs also need preprocessing to get rid of some
> embedded load-depending data, so maybe it was considered to be just
> enough?

I suspect it was actually an accident.

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Hannes Frederic Sowa @ 2016-12-22 17:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Borkmann, Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrW_T1v4qKPJDs5dXwAnAit3M52AWMH-K+GJLb1WoLMuRQ@mail.gmail.com>

On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann <daniel@iogearbox.net>
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > >     bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs being loaded based on the digest but
> > just to uniquely identify loaded programs from user space and match up
> > with their source.
> 
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Yeah, it still shouldn't crash the kernel but it could cause
malfunctions because assumptions are not met from user space thus it
could act in a strange way:

My personal biggest concern is that users of this API will at some
point in time assume this digist is unique (as a key itself for a
hashtables f.e.), while it is actually not (and not enforced so by the
kernel). If you can get an unpriv ebpf program inserted to the kernel
with the same weak hash, a controller daemon could pick it up and bind
it to another ebpf hook, probably outside of the unpriv realm the user
was in before. Only the sorting matters, which might be unstable and is
not guaranteed by anything in most hash table based data structures.

The API seems flawed to me.

> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.

To add to this, I am very much in favor of that. Right now it doesn't
have a name because it is a custom algorithm. ;)

> > > 
> > > Also:
> > > 
> > > +       result = (__force __be32 *)fp->digest;
> > > +       for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +               result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fix crypto so you can pull in the algorithm
> > > without pulling in the whole crypto core.
> > 
> > The hashing is not a proper sha1 neither, unfortunately. I think that
> > is why it will have a custom implementation in iproute2?
> 
> Putting on crypto hat:
> 
> NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
> 2016 when people know better and is going to handle it by porting that
> new primitive to userspace" is not a particularly good argument.
> 
> Okay, crypto hack back off.
> 
> > 
> > I wondered if bpf program loading should have used the module loading
> > infrastructure from the beginning...
> 
> That would be way too complicated and would be nasty for the unprivileged cases.

I was more or less just thinking about using the syscalls and user
space representation not the generic infrastructure, as it is anyway
too much concerned with real kernel modules (would probably also solve
your cgroup-ebpf thread, as it can be passed by unique name or name and
hash ;) ). Anyway...

> > > At the very least, there should be a separate function that calculates
> > > the hash of a buffer and that function should explicitly run itself
> > > against test vectors of various lengths to make sure that it's
> > > calculating what it claims to be calculating.  And it doesn't look
> > > like the userspace code has landed, so, if this thing isn't
> > > calculating SHA1 correctly, it's plausible that no one has noticed.
> > 
> > I hope this was known from the beginning, this is not sha1 unfortunately.
> > 
> > But ebpf elf programs also need preprocessing to get rid of some
> > embedded load-depending data, so maybe it was considered to be just
> > enough?
> 
> I suspect it was actually an accident.

Maybe, I don't know.

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: John Fastabend @ 2016-12-22 17:50 UTC (permalink / raw)
  To: David Miller, daniel
  Cc: xiyou.wangcong, shahark, gerlitz.or, roid, jiri, netdev
In-Reply-To: <20161222.115326.376675148706578174.davem@davemloft.net>

On 16-12-22 08:53 AM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Wed, 21 Dec 2016 22:07:48 +0100
> 
>> Ok, you mean for net. In that case I prefer the smaller sized fix to
>> be honest. It also covers everything from the point where we fetch
>> the chain via cops->tcf_chain() to the end of the function, which is
>> where most of the complexity resides, and only the two mentioned
>> commits do the relock, so as a fix I think it's fine as-is. As
>> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
>> would be better, imho.
> 
> Please, can you two work towards an agreement as to what fix should
> go in at this time?
> 
> Thanks.
> 

Thanks for fixing this!

I have a slight preference to push this patch into net as its been
tested already by Shahar and is not particularly invasive. Then for
net-next do a larger cleanup to get rid of some of the complexity per
Cong's suggestion.

.John

^ permalink raw reply

* Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf
From: Alexander Duyck @ 2016-12-22 17:55 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Weilong Chen, Jeff Kirsher, intel-wired-lan, Netdev,
	linux-kernel@vger.kernel.org
In-Reply-To: <74e1e76d-40f6-db8e-74ac-90b0474d4e11@huawei.com>

Yes that is much more helpful.  So looking at it things are being
padded but the last 4 bytes always have this extra data in them.  I've
been trying to recreate the issue on an 82599 with an SR-IOV VF and I
haven't been having much luck reproducing the problem.

In your test environment is the 82599 connected directly to the
Windows machine or are there any
switches/routers/gateways/tunnels/vlans in between?  I've tried
several iterations but with the 82599 connected directly to another
NIC I have I am not able to get it to produce the garbage padding you
are seeing. It makes me wonder if there might be something that is
manipulating the packets in between the two systems.  For example if
there was a VLAN being associated with the VF that is later stripped
and then the packet handed raw to the test system it might explain
what is introducing the extra padding and reason for pulling in the
CRC, and your patch would mask the issue since it would push the
minimum frame size with a VLAN to 68 instead of 64.

- Alex

On Wed, Dec 21, 2016 at 6:00 PM, Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
> On 2016/12/21 10:20, Alexander Duyck wrote:
>> I find it curious that only the last 4 bytes have data in them.  I'm
>> wondering if the NIC/driver in the Windows/Nessus system is
>> interpreting the 4 byte CRC on the end of the frame as padding instead
>> of stripping it.
>>
>> Is there any chance you could capture the entire frame instead of just
>> the padding?  Maybe you could run something like wireshark without
>> enabling promiscuous mode on the VF and capture the frames it is
>> trying to send and receive.  What I want to verify is what the actual
>> amount of padding is that is needed to get to 60 bytes and where the
>> CRC should start.
>>
>> - Alex
>
> Here is the verbose output, is this useful?
> Or we will try according to your advice, thanks,
>
> D:\Program Files\Tenable\Nessus>nasl.exe -aX -t 192.169.0.151 etherleak.nasl
> --------------------------
>  ---[ ICMP ]---
> 0x00:  45 00 00 1D 20 81 00 00 40 01 D7 F3 C0 A9 00 97    E... ...@.......
> 0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00    ............x...
> 0x20:  00 00 00 00 00 00 00 00 00 00 98 E4 75 DF          ............u.
> --------------------------
>  ---[ ICMP ]---
> 0x00:  45 00 00 1D 20 85 00 00 40 01 D7 EF C0 A9 00 97    E... ...@.......
> 0x10:  C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00    ............x...
> 0x20:  00 00 00 00 00 00 00 00 00 00 FB DA F8 13          ..............
> ---[ ether1 ]---
> 0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75    ...............u
> 0x10:  DF                                                 .
> ---[ ether2 ]---
> 0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8    ................
> 0x10:  13                                                 .
>
> Padding observed in one frame :
>
>   0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75    ...............u
>   0x10:  DF                                                 .
>
> Padding observed in another frame :
>
>   0x00:  00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8    ................
>   0x10:  13
>
>

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Hannes Frederic Sowa @ 2016-12-22 18:08 UTC (permalink / raw)
  To: Theodore Ts'o, Jason A. Donenfeld, kernel-hardening,
	Andy Lutomirski, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <20161222155447.u3ayvw4gmorhswjv@thunk.org>

On 22.12.2016 16:54, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> following up on what appears to be a random subject: ;)
>>>
>>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>>> in the htree. siphash seems to fit this use case pretty good.
>>
>> I saw this too. I'll try to address it in v8 of this series.
> 
> This is a separate issue, and this series is getting a bit too
> complex.  So I'd suggest pushing this off to a separate change.
> 
> Changing the htree hash algorithm is an on-disk format change, and so
> we couldn't roll it out until e2fsprogs gets updated and rolled out
> pretty broadley.  In fact George sent me patches to add siphash as a
> hash algorithm for htree a while back (for both the kernel and
> e2fsprogs), but I never got around to testing and applying them,
> mainly because while it's technically faster, I had other higher
> priority issues to work on --- and see previous comments regarding
> pixel peeping.  Improving the hash algorithm by tens or even hundreds
> of nanoseconds isn't really going to matter since we only do a htree
> lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
> times will dominate.  And from the power perspective, saving
> microwatts of CPU power isn't going to matter if you're going to be
> spinning up the storage device....

I wasn't concerned about performance but more about DoS resilience. I
wonder how safe half md4 actually is in terms of allowing users to
generate long hash chains in the filesystem (in terms of length
extension attacks against half_md4).

In ext4, is it actually possible that a "disrupter" learns about the
hashing secret in the way how the inodes are returned during getdents?

Thanks,
Hannes

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 18:13 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Theodore Ts'o, kernel-hardening, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <32db19ab-0c19-628c-3475-2ccb8050563a@stressinduktion.org>

On Thu, Dec 22, 2016 at 7:08 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).

AFAIK, this is a real vulnerability that needs to be addressed.

Judging by Ted's inquiry about my siphash testing suite, I assume he's
probably tinkering around with it as we speak. :)


Meanwhile I've separated things into several trees:

1. chacha20 rng, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=random-next

2. md5 cleanup, not yet submitted:
https://git.zx2c4.com/linux-dev/log/?h=md5-cleanup

3. md4 cleanup, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=ext4-next-md4-cleanup

4. siphash and networking, not yet submitted as a x/4 series:
https://git.zx2c4.com/linux-dev/log/?h=net-next-siphash

I'll submit (4) in a couple of days, waiting for any comments on the
existing patch-set.

Jason

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Jason A. Donenfeld @ 2016-12-22 18:19 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andy Lutomirski, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482425969.2673.5.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

Okay, so in that case, a weak hashing function like SHA1 could result
in a real vulnerability. Therefore, this SHA1 stuff needs to be
reverted immediately, pending a different implementation. If this has
ever shipped in a kernel version, it could even deserve a CVE. No
SHA1!

> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Jeepers creepers. So for some ungodly reason, LKML has invented yet
another homebrewed crypto primitive. This story really gets more
horrifying every day. No bueno.

So yea, let's revert and re-commit (repeal and replace? just
kidding...). Out with SHA-1, in with Blake2 or SHA2.

Jason

^ permalink raw reply

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Rob Herring @ 2016-12-22 18:48 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
	Samuel Ortiz, Mark Rutland, netdev,
	devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Mark Greer,
	Justin Bronder
In-Reply-To: <CAO7Z3WKmEbJCN_=srxTVKXvtz84vHGi41RS+9T7fmkmhRB6nEw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, Dec 19, 2016 at 5:23 PM, Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org> wrote:
> I can make that change, however, I worry that it may be a bit
> misleading, since there are only two supported clock frequencies, but
> a number like that to me implies that it could be set to any number
> you want.   I'm new at this, and so I'll go ahead and change it as you
> request, but I'd like to hear your thoughts on my concern.

Then the binding doc just needs to state what are the 2 valid frequencies.

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] tc: add missing limits.h header
From: Baruch Siach @ 2016-12-22 18:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Baruch Siach

This fixes under musl build issues like:

f_matchall.c: In function ‘matchall_parse_opt’:
f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
            ^
f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in
f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function)
   if (h == LONG_MIN || h == LONG_MAX) {
                             ^

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 tc/tc_util.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tc/tc_util.h b/tc/tc_util.h
index f198a4ad5554..4db26c6d5e25 100644
--- a/tc/tc_util.h
+++ b/tc/tc_util.h
@@ -2,6 +2,7 @@
 #define _TC_UTIL_H_ 1
 
 #define MAX_MSG 16384
+#include <limits.h>
 #include <linux/pkt_sched.h>
 #include <linux/pkt_cls.h>
 #include <linux/gen_stats.h>
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
From: Stephen Hemminger @ 2016-12-22 18:59 UTC (permalink / raw)
  To: Nogah Frankel
  Cc: netdev, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch
In-Reply-To: <1482423795-6531-3-git-send-email-nogahf@mellanox.com>

On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel <nogahf@mellanox.com> wrote:
On Thu, 22 Dec 2016 18:23:13 +0200
Nogah Frankel <nogahf@mellanox.com> wrote:

>  }
> @@ -691,18 +804,22 @@ static const struct option longopts[] = {
>  	{ "interval", 1, 0, 't' },
>  	{ "version", 0, 0, 'V' },
>  	{ "zeros", 0, 0, 'z' },
> +	{ "extended", 1, 0, 'x'},
>  	{ 0 }
>  };
>  
> +
>  int main(int argc, char *argv[])

You let extra whitespace changes creep in.


> +		case 'x':
> +			is_extended = true;
> +			memset(stats_type, 0, 64);
> +			strncpy(stats_type, optarg, 63);
> +			break;

This seems like doing this either the paranoid or hard way.
Why not:
	const char *stats_type = NULL;
...

	case 'x':
		stats_type = optarg;
		break;
...
		if (stats_type)
			snprintf(hist_name, sizeof(hist_name),
				 "%s/.%s_ifstat.u%d", P_tmpdir, stats_type,
				 getuid());
		else
			snprintf(hist_name, sizeof(hist_name),
				 "%s/.ifstat.u%d", P_tmpdir, getuid());


Since:
	1) optarg points to area in argv that is persistent (avoid copy)
	2) don't need is_extended flag value then

Please cleanup and resubmit.

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Cong Wang @ 2016-12-22 19:05 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David Miller, Shahar Klein, Or Gerlitz, Roi Dayan, Jiri Pirko,
	John Fastabend, Linux Kernel Network Developers
In-Reply-To: <585AEF24.4070800@iogearbox.net>

On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Ok, you mean for net. In that case I prefer the smaller sized fix to be
> honest. It also covers everything from the point where we fetch the chain
> via cops->tcf_chain() to the end of the function, which is where most of
> the complexity resides, and only the two mentioned commits do the relock,

I really wish the problem is only about relocking, but look at the code,
the deeper reason why we have this bug is the complexity of the logic
inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
idempotent; 2) the request logic itself is hard, because of tc filter design
and implementation.

This is why I worry more than just relocking.

> so as a fix I think it's fine as-is. As mentioned, if there's need to
> refactor tc_ctl_tfilter() net-next would be better, imho.

Refactor is a too strong word, just moving the replay out is not a refactor.
The end result will be still smaller than your commit d936377414fadbafb4,
which is already backported to stable.

^ permalink raw reply

* Re: [PATCH iproute2 0/2] Fix the usage of TC tunnel key
From: Stephen Hemminger @ 2016-12-22 19:03 UTC (permalink / raw)
  To: Hadar Hen Zion; +Cc: netdev, Simon Horman, Or Gerlitz, Roi Dayan
In-Reply-To: <1482394481-19624-1-git-send-email-hadarh@mellanox.com>

On Thu, 22 Dec 2016 10:14:39 +0200
Hadar Hen Zion <hadarh@mellanox.com> wrote:

> Add dest UDP port parameter to the usage of tc tunnle key action and
> classifcation.
> 
> 
> Hadar Hen Zion (2):
>   tc/cls_flower: Add to the usage encapsulation dest UDP port
>   tc/m_tunnel_key: Add to the usage encapsulation dest UDP port
> 
>  tc/f_flower.c     | 5 +++--
>  tc/m_tunnel_key.c | 2 +-
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 

Looks good, both applied

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-22 19:24 UTC (permalink / raw)
  To: luto, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CALCETrUSM7KDNuVqh169241PNumJVOF5hrNQUq=k5Fnet6nSgA@mail.gmail.com>

> Having slept on this, I like it less.  The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value.  This probably breaks any attempt to apply
> security properties of the hash function.  For example, the internal
> state could easily contain a whole bunch of prior outputs it in
> verbatim.

The problem is, anti-backtracing is in severe tension with your desire
to use unmodified SipHash.

First of all, I'd like to repeat that it isn't a design goal of the current
generator and isn't necessary.

The current generator just returns hash[0] from the MD5 state, then
leaves the state stored.  The fact that it conceals earlier outputs is
an accident of the Davies-Meyer structure of md5_transform.

It isn't necessary, because every secret generated is stored unencrypted
for as long as it's of value.  A few values are used for retransmit
backoffs and random MAC addresses.  Those are revealed to the world as
soon as they're used.

Most values are used for ASLR.  These address are of interest to an
attacker trying to mount a buffer-overflow attack, but that only lasts
as long as the process is running to receive buffers.  After the process
exits, knowledge of its layout is worthless.

And this is stored as long as it's running in kernel-accessible data,
so storing a *second* copy in less conveniently kernel-accessible data
(the RNG state) doesn't make the situation any worse.


In addition to the above, if you're assuming a state capture, then
since we have (for necessary efficieny reasons) a negligible about of
fresh entropy, an attacker has the secret key and can predict *future*
outputs very easily.

Given that information, an attacker doesn't need to learn the layout of
vulnerable server X.  If they have a buffer overflow, they can crash
the current instance and wait for a fresh image to be started (with a
known address space) to launch their attack at.


Kernel state capture attacks are a very unlikely attack, mostly because
it's a narrow target a hair's breadth away from the much more interesting
outright kernel compromise (attacker gains write access as well as read)
which renders all this fancy cryptanaysis moot.


Now, the main point:  it's not likely to be solvable.

The problem with unmodified SipHash is that is has only 64 bits of
output.  No mix-back mechanism can get around the fundamental problem
that that's too small to prevent a brute-force guessing attack.  You need
wider mix-back.  And getting more output from unmodified SipHash requires
more finalization rounds, which is expensive.

(Aside: 64 bits does have the advantage that it can't be brute-forced on
the attacked machine.  It must be exfiltrated to the attacker, and the
solution returned to the attack code.  But doing this is getting easier
all the time.)

Adding antibacktracking to SipHash is trivial: just add a Davies-Meyer
structure around its internal state.  Remember the internal state before
hashing in the entropy and secret, generate the output, then add the
previous and final states together for storage.

This is a standard textbook construction, very cheap, and doesn't touch
the compression function which is the target of analysis and attacks,
but it requires poking around in SipHash's internal state.  (And people
who read the textbooks without understanding them will get upset because
the textbooks all talk about using this construction with block ciphers,
and SipHash's compression function is not advertised as a block cipher.)

Alternative designs exist; you could extract additional output from
earlier rounds of SipHash, using the duplex sponge construction you
mentioned earlier.  That output would be used for mixback purposes *only*,
so wouldn't affect the security proof of the "primary" output.
But this is also getting creative with SipHash's internals.


Now, you could use a completely *different* cryptographic primitive
to enforce one-way-ness, and apply SipHash as a strong output transform,
but that doesn't feel like good design, and is probably more expensive.


Finally, your discomfort about an attacker learning the internal state...
if an attacker knows the key and the input, they can construct the
internal state.  Yes, we could discard the internal state and construct
a fresh one on the next call to get_random_int, but what are you going
to key it with?  What are you going to feed it?  What keeps *that*
internal state any more secret from an attacker than one that's explicitly
stored?

Keeping the internal state around is a cacheing optimization, that's all.

*If* you're assuming a state capture, the only thing secret from the
attacker is any fresh entropy collected between the time of capture
and the time of generation.  Due to mandatory efficiency requirements,
this is very small.  

I really think you're wishing for the impossible here.


A final note: although I'm disagreeing with you, thank you very much for 
the informed discussion.  Knowing that someone will read and think about
this message carefully has forced me to think it through carefully myself.

For example, clearly stating the concern over starting new processes
with predictable layout, and the limits on the fresh entropy supply,
has made me realize that there *is* a possible source: each exec()
is passed 128 bits from get_random_bytes in the AT_RANDOM element of
its auxv.  Since get_random_int() accesses "current" anyway, we could
store some seed material there rather than using "pid".  While this is
not fresh for each call to get_random_int, it *is* fresh for each new
address space whose layout is being randomized.

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Andy Lutomirski @ 2016-12-22 19:32 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andrew Lutomirski, Andi Kleen, David S. Miller, David Laight,
	D. J. Bernstein, Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jason A. Donenfeld, Jean-Philippe Aumasson,
	kernel-hardening@lists.openwall.com, Linux Crypto Mailing List,
	linux-kernel@vger.kernel.org, Network Development, Tom Herbert,
	Linus Torvalds, Ted Ts'o
In-Reply-To: <20161222192445.963.qmail@ns.sciencehorizons.net>

On Thu, Dec 22, 2016 at 11:24 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
>> Having slept on this, I like it less.  The problem is that a
>> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
>> secret || ...) -- they learn the internal state of the hash function
>> that generates that value.  This probably breaks any attempt to apply
>> security properties of the hash function.  For example, the internal
>> state could easily contain a whole bunch of prior outputs it in
>> verbatim.
>
> The problem is, anti-backtracing is in severe tension with your desire
> to use unmodified SipHash.
>
> First of all, I'd like to repeat that it isn't a design goal of the current
> generator and isn't necessary.

Agreed.

> Now, the main point:  it's not likely to be solvable.
>
> The problem with unmodified SipHash is that is has only 64 bits of
> output.  No mix-back mechanism can get around the fundamental problem
> that that's too small to prevent a brute-force guessing attack.  You need
> wider mix-back.  And getting more output from unmodified SipHash requires
> more finalization rounds, which is expensive.

It could only mix the output back in every two calls, in which case
you can backtrack up to one call but you need to do 2^128 work to
backtrack farther.  But yes, this is getting excessively complicated.

> Finally, your discomfort about an attacker learning the internal state...
> if an attacker knows the key and the input, they can construct the
> internal state.  Yes, we could discard the internal state and construct
> a fresh one on the next call to get_random_int, but what are you going
> to key it with?  What are you going to feed it?  What keeps *that*
> internal state any more secret from an attacker than one that's explicitly
> stored?

I do tend to like Ted's version in which we use batched
get_random_bytes() output.  If it's fast enough, it's simpler and lets
us get the full strength of a CSPRNG.

(Aside: some day I want to move all that code from drivers/ to lib/
and teach it to be buildable in userspace, too, so it's easy to play
with it, feed it test vectors, confirm that it's equivalent to a
reference implementation, write up the actual design and try to get
real cryptographers to analyze it, etc.)

> For example, clearly stating the concern over starting new processes
> with predictable layout, and the limits on the fresh entropy supply,
> has made me realize that there *is* a possible source: each exec()
> is passed 128 bits from get_random_bytes in the AT_RANDOM element of
> its auxv.  Since get_random_int() accesses "current" anyway, we could
> store some seed material there rather than using "pid".  While this is
> not fresh for each call to get_random_int, it *is* fresh for each new
> address space whose layout is being randomized.

Hmm, interesting.  Although, for ASLR, we could use get_random_bytes()
directly and be done with it.  It won't be a bottleneck.

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Alexei Starovoitov @ 2016-12-22 19:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrW_T1v4qKPJDs5dXwAnAit3M52AWMH-K+GJLb1WoLMuRQ@mail.gmail.com>

On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> We don't prevent ebpf programs being loaded based on the digest but
>> just to uniquely identify loaded programs from user space and match up
>> with their source.
>
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Andy, please read the code.
we could have used jhash there just as well.
Collisions are fine.

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Theodore Ts'o @ 2016-12-22 19:50 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jason A. Donenfeld, kernel-hardening, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <32db19ab-0c19-628c-3475-2ccb8050563a@stressinduktion.org>

On Thu, Dec 22, 2016 at 07:08:37PM +0100, Hannes Frederic Sowa wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).
> 
> In ext4, is it actually possible that a "disrupter" learns about the
> hashing secret in the way how the inodes are returned during getdents?

They'd have to be a local user, who can execute telldir(3) --- in
which case there are plenty of other denial of service attacks one
could carry out that would be far more devastating.

It might also be an issue if the file system is exposed via NFS, but
again, there are so many other ways an attacker could DoS a NFS server
that I don't think of it as a much of a concern.

Keep in mind that worst someone can do is cause directory inserts to
fail with an ENOSPC, and there are plenty of other ways of doing that
--- such as consuming all of the blocks and inodes in the file system,
for example.

So it's a threat, but not a high priority one as far as I'm concerned.
And if this was a problem in actual practice, users could switch to
the TEA based hash, which should be far harder to attack, and
available today.

					- Ted

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 19:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Daniel Borkmann, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAADnVQ+memUT2zcc-wunP0QgWSLWpmnSJNRJZmfDdc+FBb=gEg@mail.gmail.com>

On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> We don't prevent ebpf programs being loaded based on the digest but
>>> just to uniquely identify loaded programs from user space and match up
>>> with their source.
>>
>> The commit log talks about using the hash to see if the program has
>> already been compiled and JITted.  If that's done, then a collision
>> will directly cause the kernel to malfunction.
>
> Andy, please read the code.
> we could have used jhash there just as well.
> Collisions are fine.

There's relevant in the code to read yet AFAICS.  The code exports it
via fdinfo, and userspace is expected to do something with it.  The
commit message says:

    When programs are pinned and retrieved by an ELF loader, the loader
    can check the program's digest through fdinfo and compare it against
    one that was generated over the ELF file's program section to see
    if the program needs to be reloaded.

I assume this means that a userspace component is expected to compare
the digest of a loaded program to a digest of a program it wants to
load and to use the result of the comparison to decide whether the
programs are the same.  If that's indeed the case (and it sure sounds
like it, and I fully expect CRIU to do very similar things when
support is added), then malicious collisions do matter.

It's also not quite clear to me why userspace needs to be able to
calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
command that takes a BPF program as input and hashes it would seem to
serve the same purpose, and that would allow the kernel to key the
digest and change the algorithm down the road without breaking things.

Regardless, adding a new hash algorithm that is almost-but-not-quite
SHA-1 and making it a stable interface to userspace is not a good
thing.

--Andy

^ permalink raw reply

* Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
From: Hannes Frederic Sowa @ 2016-12-22 20:00 UTC (permalink / raw)
  To: Josef Bacik; +Cc: davem, kraigatgoog, eric.dumazet, tom, netdev, kernel-team
In-Reply-To: <1482333414.24490.12@smtp.office365.com>

On 21.12.2016 16:16, Josef Bacik wrote:
> On Wed, Dec 21, 2016 at 10:06 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote:
>>>  The only difference between inet6_csk_bind_conflict and
>>> inet_csk_bind_conflict
>>>  is how they check the rcv_saddr.  Since we want to be able to check
>>> the saddr in
>>>  other places just drop the protocol specific ->bind_conflict and
>>> replace it with
>>>  ->rcv_saddr_equal, then make inet_csk_bind_conflict the one true
>>> bind conflict
>>>  function.
>>>
>>>  Signed-off-by: Josef Bacik <jbacik@fb.com>
>>>
>>
>>
>>
>>>  ---
>>>   include/net/inet6_connection_sock.h |  5 -----
>>>   include/net/inet_connection_sock.h  |  9 +++------
>>>   net/dccp/ipv4.c                     |  3 ++-
>>>   net/dccp/ipv6.c                     |  2 +-
>>>   net/ipv4/inet_connection_sock.c     | 22 +++++++-------------
>>>   net/ipv4/tcp_ipv4.c                 |  3 ++-
>>>   net/ipv4/udp.c                      |  1 +
>>>   net/ipv6/inet6_connection_sock.c    | 40
>>> -------------------------------------
>>>   net/ipv6/tcp_ipv6.c                 |  4 ++--
>>>   9 files changed, 18 insertions(+), 71 deletions(-)
>>>
>>>  diff --git a/include/net/inet6_connection_sock.h
>>> b/include/net/inet6_connection_sock.h
>>>  index 3212b39..8ec87b6 100644
>>>  --- a/include/net/inet6_connection_sock.h
>>>  +++ b/include/net/inet6_connection_sock.h
>>>  @@ -15,16 +15,11 @@
>>>
>>>   #include <linux/types.h>
>>>
>>>  -struct inet_bind_bucket;
>>>   struct request_sock;
>>>   struct sk_buff;
>>>   struct sock;
>>>   struct sockaddr;
>>>
>>>  -int inet6_csk_bind_conflict(const struct sock *sk,
>>>  -                const struct inet_bind_bucket *tb, bool relax,
>>>  -                bool soreuseport_ok);
>>>  -
>>>   struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct
>>> flowi6 *fl6,
>>>                         const struct request_sock *req, u8 proto);
>>>
>>>  diff --git a/include/net/inet_connection_sock.h
>>> b/include/net/inet_connection_sock.h
>>>  index ec0479a..9cd43c5 100644
>>>  --- a/include/net/inet_connection_sock.h
>>>  +++ b/include/net/inet_connection_sock.h
>>>  @@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops {
>>>                   char __user *optval, int __user *optlen);
>>>   #endif
>>>       void        (*addr2sockaddr)(struct sock *sk, struct sockaddr *);
>>>  -    int        (*bind_conflict)(const struct sock *sk,
>>>  -                     const struct inet_bind_bucket *tb,
>>>  -                     bool relax, bool soreuseport_ok);
>>>  +    int         (*rcv_saddr_equal)(const struct sock *sk1,
>>>  +                       const struct sock *sk2,
>>>  +                       bool match_wildcard);
>>>       void        (*mtu_reduced)(struct sock *sk);
>>>   };
>>>
>>>
>>
>> The patch looks as a nice code cleanup already!
>>
>> Have you looked if we can simply have one rcv_saddr_equal for both ipv4
>> and ipv6 that e.g. uses sk->sk_family instead of function pointers?
>> This could give us even more possibilities to remove some indirect
>> functions calls and thus might relieve some cycles?
> 
> I was going to do that but I'm not familiar enough with how sockets work
> to be comfortable.  My main concern is we have the ipv6_only() check on
> a socket, which seems to indicate to me that you can have a socket that
> can do both ipv4/ipv6, so what if we're specifically going through the
> ipv6 code, but we aren't ipv6_only() and we end up doing the ipv4
> address compare when we really need to do the ipv6 address compare?  If
> this can't happen (and honestly as I type it out it sounds crazier than
> it did in my head) then yeah I'll totally do that as well and we can
> just have a global function without the protocol specific callbacks, but
> I need you or somebody to tell me I'm crazy and that it would be ok to
> have it all in one function.  Thanks,

IPv6 sockets can do IPv4 via mapped addresses. The other way around
doesn't work, there are no IPv4 sockets that can speak IPv6. The
ipv6_only flags in IPv4 sockets should always stay 0 but they need to be
evaluated from there side for possible port conflicts.

My idea is to use sk->sk_family, which is in sync with
icsk->icsk_af_ops, which you use for the function pointer lookup, to
switch between those functions. Looking through a lot of callback, I
don't see this assumption violated so far.

This would also solve the problem which David described, your search
would be free of those indirect jumps.

Bye,
Hannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox