Netdev List


* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Hannes Frederic Sowa @ 2016-12-22 15:33 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kernel-hardening, Theodore Ts'o, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9pu-2CY2WRHevnpwo-9qnZcTpqQgC2voGFOpSjo+LPiUA@mail.gmail.com>

On Thu, 2016-12-22 at 16:29 +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > As a first step, I'm considering adding a patch to move halfmd4.c
> > inside the ext4 domain, or at the very least, simply remove it from
> > linux/cryptohash.h. That'll then leave the handful of bizarre sha1
> > usages to consider.
> 
> Specifically something like this:
> 
> https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=978213351f9633bd1e3d1fdc3f19d28e36eeac90
> 
> That only leaves two more uses of "cryptohash" to consider, but they
> require a bit of help. First, sha_transform in net/ipv6/addrconf.c.
> That might be a straight-forward conversion to SipHash, but perhaps
> not; I need to look closely and think about it. The next is
> sha_transform in kernel/bpf/core.c. I really have no idea what's going
> on with the eBPF stuff, so that will take a bit longer to study. Maybe
> sha1 is fine in the end there? I'm not sure yet.

IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
You don't want to give people new IPv6 addresses with the same stable
secret (across reboots) after a kernel upgrade. Maybe they lose
connectivity then and it is extra work?

The bpf hash stuff can be changed during this merge window, as it is
not yet in a released kernel. Albeit I would probably have preferred
something like sha256 here, which can be easily replicated by user
space tools (minus the problem of patching out references to not
hashable data, which must be zeroed).

Bye,
Hannes

^ permalink raw reply

* [PATCH net] net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
From: Lorenzo Colitti @ 2016-12-22 15:33 UTC (permalink / raw)
  To: netdev; +Cc: davem, Lorenzo Colitti

Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
protocols.") made ip_do_redirect call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.

Fix this by getting the network namespace from the skb instead.

Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fa5c037227..9eabf49013 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -798,6 +798,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf
 	struct rtable *rt;
 	struct flowi4 fl4;
 	const struct iphdr *iph = (const struct iphdr *) skb->data;
+	struct net *net = dev_net(skb->dev);
 	int oif = skb->dev->ifindex;
 	u8 tos = RT_TOS(iph->tos);
 	u8 prot = iph->protocol;
@@ -805,7 +806,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf
 
 	rt = (struct rtable *) dst;
 
-	__build_flow_key(sock_net(sk), &fl4, sk, iph, oif, tos, prot, mark, 0);
+	__build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0);
 	__ip_do_redirect(rt, skb, &fl4, true);
 }
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related
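
The idea behind the fix above can be sketched in plain userspace C. The
structures below are simplified, hypothetical stand-ins for the kernel's
struct sock, struct sk_buff and struct net_device (they are not the real
definitions), and flow_net() is an illustrative name. The point is only
that the namespace is reached through the skb's device, so a NULL socket
is never dereferenced.

```c
#include <stddef.h>

/* Simplified mock types; assumptions for illustration only. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sock { struct net *sk_net; };
struct sk_buff { struct net_device *dev; };

/*
 * Pick the namespace from the packet's device, in the spirit of
 * dev_net(skb->dev). The socket pointer is deliberately unused,
 * so passing sk == NULL is safe.
 */
static struct net *flow_net(const struct sock *sk, const struct sk_buff *skb)
{
	(void)sk;
	return skb->dev->nd_net;
}
```

This assumes skb->dev is always set on this path, which the patch also
relies on when it reads skb->dev->ifindex.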

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 15:41 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: kernel-hardening, Theodore Ts'o, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482420815.2673.1.camel@stressinduktion.org>

Hi Hannes,

On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> You don't want to give people new IPv6 addresses with the same stable
> secret (across reboots) after a kernel upgrade. Maybe they lose
> connectivity then and it is extra work?

Ahh, too bad. So it goes.

> The bpf hash stuff can be changed during this merge window, as it is
> not yet in a released kernel. Albeit I would probably have preferred
> something like sha256 here, which can be easily replicated by user
> space tools (minus the problem of patching out references to not
> hashable data, which must be zeroed).

Oh, interesting, so time is of the essence then. Do you want to handle
changing the new eBPF code to something not-SHA1 before it's too late,
as part of a new patchset that can fast track itself to David? And
then I can preserve my large series for the next merge window.

Jason

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Phil Reid @ 2016-12-22 15:42 UTC (permalink / raw)
  To: Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <7b395fd7dfd0c808243a744393473cbbf89b268a.1482410161.git.jpinto@synopsys.com>

G'day Joao,

On 22/12/2016 20:38, Joao Pinto wrote:
> When testing stmmac with my QoS reference design I found a problem in the
> CSR clock configuration that was preventing PHY discovery, since
> every read operation returned 0x0000ffff. This patch fixes the issue.
>
> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
> ---
> changes v1->v2 (David Miller)
> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
> to make sense
>
>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
> index b21d03f..94223c8 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
>  	mac->mii.reg_shift = 6;
>  	mac->mii.reg_mask = 0x000007C0;
>  	mac->mii.clk_csr_shift = 2;
> -	mac->mii.clk_csr_mask = 0xF;
> +	mac->mii.clk_csr_mask = GENMASK(4, 2);

Should this not be GENMASK(5, 2)?

>
>  	/* Get and dump the chip ID */
>  	*synopsys_id = stmmac_get_synopsys_id(hwid);
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
> index a1d582f..8a40e69 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
> @@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id)
>  	mac->mii.reg_shift = 6;
>  	mac->mii.reg_mask = 0x000007C0;
>  	mac->mii.clk_csr_shift = 2;
> -	mac->mii.clk_csr_mask = 0xF;
> +	mac->mii.clk_csr_mask = GENMASK(4, 2);
same as above?

>
>  	/* Synopsys Id is not available on old chips */
>  	*synopsys_id = 0;
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index 23322fd..fda01f7 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
>  	value |= (phyaddr << priv->hw->mii.addr_shift)
>  		& priv->hw->mii.addr_mask;
>  	value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
> -	value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
> -		<< priv->hw->mii.clk_csr_shift;
> +	value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
> +		& priv->hw->mii.clk_csr_mask;
>  	if (priv->plat->has_gmac4)
>  		value |= MII_GMAC4_READ;
>
> @@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
>  		& priv->hw->mii.addr_mask;
>  	value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
>
> -	value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
> -		<< priv->hw->mii.clk_csr_shift);
> +	value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
> +		& priv->hw->mii.clk_csr_mask;
>  	if (priv->plat->has_gmac4)
>  		value |= MII_GMAC4_WRITE;
>
>


-- 
Regards
Phil Reid

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Joao Pinto @ 2016-12-22 15:47 UTC (permalink / raw)
  To: Phil Reid, Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <15975894-6a5e-1706-ff9e-660c0bac3971@electromag.com.au>


Hello Phil,

On 12/22/2016 at 3:42 PM, Phil Reid wrote:
> G'day Joao,
> 
> On 22/12/2016 20:38, Joao Pinto wrote:
>> When testing stmmac with my QoS reference design I found a problem in the
>> CSR clock configuration that was preventing PHY discovery, since
>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>
>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>> ---
>> changes v1->v2 (David Miller)
>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>> to make sense
>>
>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> index b21d03f..94223c8 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>> *ioaddr, int mcbins,
>>      mac->mii.reg_shift = 6;
>>      mac->mii.reg_mask = 0x000007C0;
>>      mac->mii.clk_csr_shift = 2;
>> -    mac->mii.clk_csr_mask = 0xF;
>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
> 
> Should this not be GENMASK(5, 2)?

According to the Universal MAC databook (valid for MAC100 and MAC1000), we have:

Bits: 4:2
000       60-100 MHz    clk_csr_i/42
001       100-150 MHz   clk_csr_i/62
010       20-35 MHz     clk_csr_i/16
011       35-60 MHz     clk_csr_i/26
100       150-250 MHz   clk_csr_i/102
101       250-300 MHz   clk_csr_i/124
110, 111  Reserved

So only bits 2, 3 and 4 should be masked.

Thanks.
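
For readers following the mask discussion, here is a small userspace
sketch of why the mask definition and the expression that uses it have to
change together. GENMASK32() is a local 32-bit stand-in for the kernel's
GENMASK(), and the two helper names are made up for illustration. The
shift of 2 matches clk_csr_shift in the patch.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(), limited to 32 bits. */
#define GENMASK32(h, l) ((~0u << (l)) & (~0u >> (31 - (h))))

#define CLK_CSR_SHIFT 2

/* Old form: mask the raw value, then shift (expects an unshifted mask). */
static uint32_t csr_mask_then_shift(uint32_t clk_csr, uint32_t mask)
{
	return (clk_csr & mask) << CLK_CSR_SHIFT;
}

/* New form: shift first, then mask in register position. */
static uint32_t csr_shift_then_mask(uint32_t clk_csr, uint32_t mask)
{
	return (clk_csr << CLK_CSR_SHIFT) & mask;
}
```

With clk_csr = 5, both consistent pairings give 0x14: the old form with
mask 0xF and the new form with GENMASK32(4, 2) = 0x1C. Mixing them does
not: the old form with 0x1C gives 0x10, and the new form with 0xF gives
0x04. GENMASK32(5, 2) would be 0x3C, which also covers bit 5.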

> 
>>
>>      /* Get and dump the chip ID */
>>      *synopsys_id = stmmac_get_synopsys_id(hwid);
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> index a1d582f..8a40e69 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
>> @@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem
>> *ioaddr, int *synopsys_id)
>>      mac->mii.reg_shift = 6;
>>      mac->mii.reg_mask = 0x000007C0;
>>      mac->mii.clk_csr_shift = 2;
>> -    mac->mii.clk_csr_mask = 0xF;
>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
> same as above?
> 
>>
>>      /* Synopsys Id is not available on old chips */
>>      *synopsys_id = 0;
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> index 23322fd..fda01f7 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> @@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int
>> phyaddr, int phyreg)
>>      value |= (phyaddr << priv->hw->mii.addr_shift)
>>          & priv->hw->mii.addr_mask;
>>      value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
>> -    value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
>> -        << priv->hw->mii.clk_csr_shift;
>> +    value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>> +        & priv->hw->mii.clk_csr_mask;
>>      if (priv->plat->has_gmac4)
>>          value |= MII_GMAC4_READ;
>>
>> @@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int
>> phyaddr, int phyreg,
>>          & priv->hw->mii.addr_mask;
>>      value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
>>
>> -    value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
>> -        << priv->hw->mii.clk_csr_shift);
>> +    value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>> +        & priv->hw->mii.clk_csr_mask;
>>      if (priv->plat->has_gmac4)
>>          value |= MII_GMAC4_WRITE;
>>
>>
> 
> 

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Hannes Frederic Sowa @ 2016-12-22 15:51 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kernel-hardening, Theodore Ts'o, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9ok8iWfZybyDki13v6Xf3usRet1y8oUcDcy+5YwkARQPA@mail.gmail.com>

On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > You don't want to give people new IPv6 addresses with the same stable
> > secret (across reboots) after a kernel upgrade. Maybe they lose
> > connectivity then and it is extra work?
> 
> Ahh, too bad. So it goes.

If no other users survive we can put it into the ipv6 module.

> > The bpf hash stuff can be changed during this merge window, as it is
> > not yet in a released kernel. Albeit I would probably have preferred
> > something like sha256 here, which can be easily replicated by user
> > space tools (minus the problem of patching out references to not
> > hashable data, which must be zeroed).
> 
> Oh, interesting, so time is of the essence then. Do you want to handle
> changing the new eBPF code to something not-SHA1 before it's too late,
> as part of a new patchset that can fast track itself to David? And
> then I can preserve my large series for the next merge window.

This algorithm should be a non-seeded algorithm, because the hashes
should be stable and verifiable by user space tooling. Thus this would
need a hashing algorithm that is hardened against pre-image
attacks/collision resistance, which siphash is not. I would prefer some
higher order SHA algorithm for that actually.

Bye,
Hannes
 

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 15:53 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: kernel-hardening, Theodore Ts'o, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482421900.2673.3.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 4:51 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.

Right. SHA-256, SHA-512/256, Blake2s, or Blake2b would probably be
good candidates for this.

^ permalink raw reply

* Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: Alexander Duyck @ 2016-12-22 15:53 UTC (permalink / raw)
  To: maowenan
  Cc: Stephen Hemminger, netdev@vger.kernel.org,
	jeffrey.t.kirsher@intel.com, weiyongjun (A), Dingtianhong
In-Reply-To: <F95AC9340317A84688A5F0DF0246F3F20151F189@szxeml504-mbs.china.huawei.com>

On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com> wrote:
>
>
>> -----Original Message-----
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Thursday, December 22, 2016 9:28 AM
>> To: maowenan
>> Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
>> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
>>
>> On Thu, 8 Dec 2016 14:51:38 +0800
>> Mao Wenan <maowenan@huawei.com> wrote:
>>
>> > This patch provides one way to set/unset IXGBE NIC TX and RX relax
>> > ordering mode, which can be set by ethtool.
>> > Relaxed ordering is one mode of the 82599 NIC; enabling it can
>> > improve performance on some CPU architectures.
>>
>> Then it should be done by CPU architecture specific quirks (preferably in PCI
>> layer) so that all users get the option without having to do manual intervention.
>>
>> > example:
>> > ethtool -s enp1s0f0 relaxorder off
>> > ethtool -s enp1s0f0 relaxorder on
>>
>> Doing it via ethtool is a developer API (for testing) not something that makes
>> sense in production.
>
>
> This feature is not mandatory for all users; in fact, the default relaxed ordering configuration of the 82599 is 'disable'.
> So this patch gives one way to enable relaxed ordering when performance conditions call for it.

That isn't quite true.  The default for Sparc systems is to have it enabled.

Really this is something that is platform specific.  I agree with
Stephen that it would work better if this was handled as a series of
platform specific quirks handled at something like the PCI layer
rather than be a switch the user can toggle on and off.

With that being said there are changes being made that should help to
improve the situation.  Specifically I am looking at adding support
for the DMA_ATTR_WEAK_ORDERING which may also allow us to identify
cases where you might be able to specify the DMA behavior via the DMA
mapping instead of having to make the final decision in the device
itself.

- Alex

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Theodore Ts'o @ 2016-12-22 15:54 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kernel-hardening, Andy Lutomirski, Netdev, LKML,
	Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9r_zTHo=dxRRK6UrjJ_dKV14yYsZsxCc362z4CPoVkddw@mail.gmail.com>

On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > following up on what appears to be a random subject: ;)
> >
> > IIRC, ext4 code by default still uses half_md4 for hashing of filenames
> > in the htree. siphash seems to fit this use case pretty well.
> 
> I saw this too. I'll try to address it in v8 of this series.

This is a separate issue, and this series is getting a bit too
complex.  So I'd suggest pushing this off to a separate change.

Changing the htree hash algorithm is an on-disk format change, and so
we couldn't roll it out until e2fsprogs gets updated and rolled out
pretty broadly.  In fact George sent me patches to add siphash as a
hash algorithm for htree a while back (for both the kernel and
e2fsprogs), but I never got around to testing and applying them,
mainly because while it's technically faster, I had other higher
priority issues to work on --- and see previous comments regarding
pixel peeping.  Improving the hash algorithm by tens or even hundreds
of nanoseconds isn't really going to matter since we only do a htree
lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
times will dominate.  And from the power perspective, saving
microwatts of CPU power isn't going to matter if you're going to be
spinning up the storage device....

						- Ted

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Theodore Ts'o @ 2016-12-22 15:58 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kernel-hardening, Hannes Frederic Sowa, Andy Lutomirski, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9o=yLOLr2w3xYj2up-UW0tXtv=0A5ffiTiVCCHkv6Twxg@mail.gmail.com>

On Thu, Dec 22, 2016 at 07:03:29AM +0100, Jason A. Donenfeld wrote:
> I find this compelling. We'll have one csprng for both
> get_random_int/long and for /dev/urandom, and we'll be able to update
> that silly warning on the comment of get_random_int/long to read
> "gives output of either rdrand quality or of /dev/urandom quality",
> which makes it more useful for other things. It introduces less error
> prone code, and it lets the RNG analysis be spent on just one RNG, not
> two.
> 
> So, with your blessing, I'm going to move ahead with implementing a
> pretty version of this for v8.

Can we do this as a separate series, please?  At this point, it's a
completely separate change from a logical perspective, and we can take
in the change through the random.git tree.

Changes that touch files that are normally changed in several
different git trees lead to the potential for merge conflicts during
the linux-next integration and merge window processes.  Which is why
it's generally best to try to isolate changes as much as possible.

Cheers,

						- Ted

^ permalink raw reply

* BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-22 16:07 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Daniel Borkmann, Alexei Starovoitov
  Cc: Jason A. Donenfeld, kernel-hardening@lists.openwall.com,
	Theodore Ts'o, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson

On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> Hi Hannes,
>>
>> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > You don't want to give people new IPv6 addresses with the same stable
>> > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > connectivity then and it is extra work?
>>
>> Ahh, too bad. So it goes.
>
> If no other users survive we can put it into the ipv6 module.
>
>> > The bpf hash stuff can be changed during this merge window, as it is
>> > not yet in a released kernel. Albeit I would probably have preferred
>> > something like sha256 here, which can be easily replicated by user
>> > space tools (minus the problem of patching out references to not
>> > hashable data, which must be zeroed).
>>
>> Oh, interesting, so time is of the essence then. Do you want to handle
>> changing the new eBPF code to something not-SHA1 before it's too late,
>> as part of a new patchset that can fast track itself to David? And
>> then I can preserve my large series for the next merge window.
>
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.
>

You mean:

commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
Author: Daniel Borkmann <daniel@iogearbox.net>
Date:   Sun Dec 4 23:19:41 2016 +0100

    bpf: add prog_digest and expose it via fdinfo/netlink

Yes, please!  This actually matters for security -- imagine a
malicious program brute-forcing a collision so that it gets loaded
wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
(preferably the latter).  Speed basically doesn't matter here and
Blake2 is both less stable (didn't they slightly change it recently?)
and much less well studied.

My inclination would have been to seed them with something that isn't
exposed to userspace for the precise reason that it would prevent user
code from making assumptions about what's in the hash.  But if there's
a use case for why user code needs to be able to calculate the hash on
its own, then that's fine.  But perhaps the actual fdinfo string
should be "sha256:abcd1234..." to give some flexibility down the road.
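
The "sha256:abcd1234..." suggestion above can be sketched as a small
formatting helper. This is only an illustration in userspace C:
format_tagged_digest() is a hypothetical name, not an existing kernel or
libc function, and the digest bytes are assumed to come from whatever
hash is eventually chosen.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "<algo>:<hex digest>" into out.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small.
 */
static int format_tagged_digest(char *out, size_t out_len, const char *algo,
				const uint8_t *digest, size_t digest_len)
{
	int n = snprintf(out, out_len, "%s:", algo);
	size_t pos, i;

	if (n < 0 || (size_t)n >= out_len)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < digest_len; i++) {
		if (out_len - pos < 3)	/* two hex characters plus NUL */
			return -1;
		snprintf(out + pos, 3, "%02x", (unsigned int)digest[i]);
		pos += 2;
	}
	return (int)pos;
}
```

Carrying the algorithm name in the string means a later switch to a
different hash would not silently change the meaning of an existing
field.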

Also:

+       result = (__force __be32 *)fp->digest;
+       for (i = 0; i < SHA_DIGEST_WORDS; i++)
+               result[i] = cpu_to_be32(fp->digest[i]);

Everyone, please, please, please don't open-code crypto primitives.
Is this and the code above it even correct?  It might be but on a very
brief glance it looks wrong to me.  If you're doing this to avoid
depending on crypto, then fix crypto so you can pull in the algorithm
without pulling in the whole crypto core.

At the very least, there should be a separate function that calculates
the hash of a buffer and that function should explicitly run itself
against test vectors of various lengths to make sure that it's
calculating what it claims to be calculating.  And it doesn't look
like the userspace code has landed, so, if this thing isn't
calculating SHA1 correctly, it's plausible that no one has noticed.
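
As a point of comparison for the byte-order loop quoted above, here is a
portable userspace sketch of serializing 32-bit digest words as
big-endian bytes. It is not the kernel code: store_be32_words() is a
made-up name, and it writes into a separate output buffer instead of
converting in place, which keeps the check against published test
vectors simple.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Serialize n 32-bit words as big-endian bytes, independent of the
 * host's endianness. out must have room for 4 * n bytes.
 */
static void store_be32_words(uint8_t *out, const uint32_t *words, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		out[4 * i + 0] = (uint8_t)(words[i] >> 24);
		out[4 * i + 1] = (uint8_t)(words[i] >> 16);
		out[4 * i + 2] = (uint8_t)(words[i] >> 8);
		out[4 * i + 3] = (uint8_t)(words[i]);
	}
}
```

A self-test along the lines suggested above would feed known inputs of
several lengths through the full hash and compare the serialized output
byte for byte with the expected digests.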

--Andy

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Andy Lutomirski @ 2016-12-22 16:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: George Spelvin, Ted Ts'o, Andi Kleen, David S. Miller,
	David Laight, D. J. Bernstein, Eric Biggers, Eric Dumazet,
	Hannes Frederic Sowa, Jason A. Donenfeld, Jean-Philippe Aumasson,
	kernel-hardening@lists.openwall.com, Linux Crypto Mailing List,
	linux-kernel@vger.kernel.org, Network Development, Tom Herbert,
	Linus Torvalds
In-Reply-To: <CALCETrVn1tWBQx-RCSqCQ2ZcB6hPdioaV52q8vY+Mz1fRKsUXA@mail.gmail.com>

On Wed, Dec 21, 2016 at 6:07 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
> <linux@sciencehorizons.net> wrote:
>> As a separate message, to disentangle the threads, I'd like to
>> talk about get_random_long().
>>
>> After some thinking, I still like the "state-preserving" construct
>> that's equivalent to the current MD5 code.  Yes, we could just do
>> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
>> preserve a bit more.
>>
>> It requires library support from the SipHash code to return the full
>> SipHash state, but I hope that's a fair thing to ask for.
>
> I don't even think it needs that.  This is just adding a
> non-destructive final operation, right?
>
>>
>> Here's my current straw man design for comment.  It's very similar to
>> the current MD5-based design, but feeds all the seed material in the
>> "correct" way, as opposed to XORing directly into the MD5 state.
>>
>> * Each CPU has a (Half)SipHash state vector,
>>   "unsigned long get_random_int_hash[4]".  Unlike the current
>>   MD5 code, we take care to initialize it to an asymmetric state.
>>
>> * There's a global 256-bit random_int_secret (which we could
>>   reseed periodically).
>>
>> To generate a random number:
>> * If get_random_int_hash is all-zero, seed it with a fresh half-sized
>>   SipHash key and the appropriate XOR constants.
>> * Generate three words of random_get_entropy(), jiffies, and current->pid.
>>   (This is arbitrary seed material, copied from the current code.)
>> * Crank through that with (Half)SipHash-1-0.
>> * Crank through the random_int_secret with (Half)SipHash-1-0.
>> * Return v1 ^ v3.
>
> Just to clarify, if we replace SipHash with a black box, I think this
> effectively means, where "entropy" is random_get_entropy() || jiffies
> || current->pid:
>
> The first call returns H(random seed || entropy_0 || secret).  The
> second call returns H(random seed || entropy_0 || secret || entropy_1
> || secret).  Etc.

Having slept on this, I like it less.  The problem is that a
backtracking attacker doesn't just learn H(random seed || entropy_0 ||
secret || ...) -- they learn the internal state of the hash function
that generates that value.  This probably breaks any attempt to apply
security properties of the hash function.  For example, the internal
state could easily contain a whole bunch of prior outputs
verbatim.
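
To make the shape of the construct under discussion concrete, here is a
toy userspace sketch with the hash treated as a black box. Everything in
it is an assumption for illustration: absorb() is an FNV-1a style
placeholder, NOT SipHash and not secure, and struct rng_state and
next_output() are made-up names. It only shows that the state is carried
forward across calls rather than reset.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder mixing step (FNV-1a style). NOT SipHash, NOT secure. */
static uint64_t absorb(uint64_t state, const void *data, size_t len)
{
	const uint8_t *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		state ^= p[i];
		state *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
	}
	return state;
}

/* Per-CPU state in the proposal; a single instance here. */
struct rng_state {
	uint64_t h;
};

/*
 * One call: absorb fresh entropy, then the secret, and derive the
 * output from the state without resetting it. The next call therefore
 * continues from H(seed || entropy_0 || secret || ...).
 */
static uint64_t next_output(struct rng_state *s, uint64_t entropy,
			    const uint8_t secret[32])
{
	s->h = absorb(s->h, &entropy, sizeof(entropy));
	s->h = absorb(s->h, secret, 32);
	return s->h ^ (s->h >> 32);
}
```

The concern raised above is visible in the sketch: whoever can read
s->h learns the running internal state, not just one output of H.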

--Andy

^ permalink raw reply

* Re: [PATCH net] net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
From: David Miller @ 2016-12-22 16:13 UTC (permalink / raw)
  To: lorenzo; +Cc: netdev
In-Reply-To: <1482420837-30324-1-git-send-email-lorenzo@google.com>

From: Lorenzo Colitti <lorenzo@google.com>
Date: Fri, 23 Dec 2016 00:33:57 +0900

> Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
> protocols.") made ip_do_redirect call sock_net(sk) to determine
> the network namespace of the passed-in socket. This crashes if sk
> is NULL.
> 
> Fix this by getting the network namespace from the skb instead.
> 
> Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.")
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

Applied, thanks Lorenzo.

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 16:16 UTC (permalink / raw)
  To: Theodore Ts'o, Jason A. Donenfeld, kernel-hardening,
	Hannes Frederic Sowa, Andy Lutomirski, Netdev, LKML,
	Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <20161222155853.beqowf2qfg7igf23@thunk.org>

Hi Ted,

On Thu, Dec 22, 2016 at 4:58 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> Can we do this as a separate series, please?  At this point, it's a
> completely separate change from a logical perspective, and we can take
> in the change through the random.git tree.
>
> Changes that touch files that are normally changed in several
> different git trees lead to the potential for merge conflicts during
> the linux-next integration and merge window processes.  Which is why
> it's generally best to try to isolate changes as much as possible.

Sure, I can separate things out.

Could you offer a bit of advice on how to manage dependencies between
patchsets during merge windows? I'm a bit new to the process.

Specifically, we now have 4 parts:
1. add siphash, and use it for some networking code. to: david miller's net-next
2. convert char/random to use siphash. to: ted ts'o's random-next
3. move lib/md5.c to static function in crypto/md5.c, remove entry
inside of linux/cryptohash.h. to: ??'s ??-next
4. move lib/halfmd4.c to static function in fs/ext4/hash.c, remove
entry inside of linux/cryptohash.h. to: ted ts'o's ext-next

Problem: 2 depends on 1, 3 depends on 1 & 2. But this can be
simplified into 3 parts:

1. add siphash, and use it for some networking code. to: david miller's net-next
2. convert char/random to use siphash, move lib/md5.c to static
function in crypto/md5.c, remove entry inside of linux/cryptohash.h.
to: ted ts'o's random-next
3. move lib/halfmd4.c to static function in fs/ext4/hash.c, remove
entry inside of linux/cryptohash.h. to: ted ts'o's ext-next

Problem: 2 depends on 1. Is that okay with you?

Also, would you like me to merge (3) and (2) of the second list into
one series for you?

Jason

^ permalink raw reply

* Re: pull-request: wireless-drivers 2016-12-22
From: David Miller @ 2016-12-22 16:16 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <8760mcrobk.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Thu, 22 Dec 2016 17:11:27 +0200

> before the holidays a really small pull request for 4.10. I just want to
> have the regressions in rtlwifi and ath9k fixed quickly so I send this
> earlier than I normally would.
> 
> Please let me know if there are any problems.

Pulled, thanks.

^ permalink raw reply

* ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones @ 2016-12-22 16:16 UTC (permalink / raw)
  To: netdev; +Cc: Hannes Frederic Sowa

By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
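
As a rough userspace illustration of the pattern (not the kernel code
itself), the sketch below propagates a failed bounds-checked copy to the
caller instead of asserting on it. The helper names copy_bits() and
read_csum() are hypothetical stand-ins and do not exist in the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bounds-checked copy: returns 0 on success,
 * or -EFAULT when [offset, offset + len) falls outside the buffer.
 */
static int copy_bits(const unsigned char *buf, size_t buflen,
		     size_t offset, void *to, size_t len)
{
	if (offset > buflen || len > buflen - offset)
		return -EFAULT;
	memcpy(to, buf + offset, len);
	return 0;
}

/*
 * Hypothetical caller: instead of asserting that the copy succeeded,
 * it hands the error back so the caller can take its failure path.
 */
static int read_csum(const unsigned char *buf, size_t buflen,
		     size_t offset, unsigned short *csum)
{
	int err = copy_bits(buf, buflen, offset, csum, sizeof(*csum));

	if (err < 0)
		return err;	/* failure path: nothing was copied */
	return 0;
}
```

With an 8-byte buffer, reading two bytes at offset 6 succeeds, while
offset 7 returns -EFAULT rather than aborting.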

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc260e70..ea89073c8247 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -591,7 +591,11 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 	}
 
 	offset += skb_transport_offset(skb);
-	BUG_ON(skb_copy_bits(skb, offset, &csum, 2));
+	err = skb_copy_bits(skb, offset, &csum, 2);
+	if (err < 0) {
+		ip6_flush_pending_frames(sk);
+		goto out;
+	}
 
 	/* in case cksum was not initialized */
 	if (unlikely(csum))

^ permalink raw reply related

* Re: [PATCH v2 net] ipvlan: fix various issues in ipvlan_process_multicast()
From: David Miller @ 2016-12-22 16:20 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, maheshb
In-Reply-To: <1482372024.8944.79.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Dec 2016 18:00:24 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> 1) netif_rx() / dev_forward_skb() should not be called from process
> context.
> 
> 2) ipvlan_count_rx() should be called with preemption disabled.
> 
> 3) We should check if ipvlan->dev is up before feeding packets
> to netif_rx()
> 
> 4) We need to prevent device from disappearing if some packets
> are in the multicast backlog.
> 
> 5) One kfree_skb() should be a consume_skb() eventually
> 
> Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to
> a work-queue")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Mahesh Bandewar <maheshb@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] ipvlan: fix multicast processing
From: David Miller @ 2016-12-22 16:20 UTC (permalink / raw)
  To: mahesh; +Cc: netdev, edumazet, maheshb
In-Reply-To: <1482370216-14833-1-git-send-email-mahesh@bandewar.net>

From: Mahesh Bandewar <mahesh@bandewar.net>
Date: Wed, 21 Dec 2016 17:30:16 -0800

> From: Mahesh Bandewar <maheshb@google.com>
> 
> In an IPvlan setup when master is set in loopback mode e.g.
> 
>   ethtool -K eth0 set loopback on
> 
>   where eth0 is master device for IPvlan setup.
> 
> The failure is caused by the faulty logic that determines if the
> packet is from TX-path vs. RX-path by just looking at the mac-
> addresses on the packet while processing multicast packets.
> 
> In the loopback-mode where this crash was happening, the packets
> that are sent out are reflected by the NIC and are processed on
> the RX path, but mac-address check tricks into thinking this
> packet is from TX path and falsely uses dev_forward_skb() to pass
> packets to the slave (virtual) devices.
> 
> This patch records the path while queueing packets and eliminates
> logic of looking at mac-addresses for the same decision.
 ...
> Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue")
> Signed-off-by: Mahesh Bandewar <maheshb@google.com>

This looks a lot better, applied, thanks.

^ permalink raw reply

* Re: [PATCH net 1/1] tipc: don't send FIN message from connectionless socket
From: David Miller @ 2016-12-22 16:20 UTC (permalink / raw)
  To: jon.maloy; +Cc: netdev, tipc-discussion, viro
In-Reply-To: <1482409349-17081-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 22 Dec 2016 07:22:29 -0500

> In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
> check for socket type is in the wrong place, causing a closing socket
> to always send out a FIN message even when the socket was never
> connected. This is normally harmless, since the destination node for
> such messages most often is zero, and the message will be dropped, but
> it is still a wrong and confusing behavior.
> 
> We fix this in this commit.
> 
> Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: David Miller @ 2016-12-22 16:21 UTC (permalink / raw)
  To: Joao.Pinto
  Cc: peppe.cavallaro, seraphin.bonnaffe, hock.leong.kweh,
	niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <7b395fd7dfd0c808243a744393473cbbf89b268a.1482410161.git.jpinto@synopsys.com>

From: Joao Pinto <Joao.Pinto@synopsys.com>
Date: Thu, 22 Dec 2016 12:38:00 +0000

> When testing stmmac with my QoS reference design I checked a problem in the
> CSR clock configuration that was impossibilitating the phy discovery, since
> every read operation returned 0x0000ffff. This patch fixes the issue.
> 
> Signed-off-by: Joao Pinto <jpinto@synopsys.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] net/mlx4_en: Fix user prio field in XDP forward
From: David Miller @ 2016-12-22 16:21 UTC (permalink / raw)
  To: tariqt; +Cc: netdev, eranbe, saeedm, kafai
In-Reply-To: <1482409978-17590-1-git-send-email-tariqt@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>
Date: Thu, 22 Dec 2016 14:32:58 +0200

> The user prio field is wrong (and overflows) in the XDP forward
> flow.
> This is a result of a bad value for num_tx_rings_p_up, which should
> account all XDP TX rings, as they operate for the same user prio.
> 
> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> Reported-by: Martin KaFai Lau <kafai@fb.com>

Applied.

^ permalink raw reply

* [PATCH iproute2 v3 0/4] update ifstat for new stats
From: Nogah Frankel @ 2016-12-22 16:23 UTC (permalink / raw)
  To: netdev
  Cc: stephen, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch,
	Nogah Frankel

Previously, stats were fetched with RTM_GETLINK, which returns 32-bit
statistics and supports only one type of stats.
Recently a new method for fetching stats was added: RTM_GETSTATS. It lets
the caller choose the stats type, and its basic stats are 64-bit rather
than 32-bit.

This patchset gives ifstat the ability to get extended stats through this
method. It adds two types of extended stats:
64bits - the same as the "normal" stats, but gets the stats from the cpu
in a 64-bit struct.
SW - for packets that hit the cpu.
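
To illustrate why the counter width matters when computing per-interval
increments, here is a minimal sketch. The helper names delta32() and
delta64() are hypothetical and not taken from ifstat. With 32-bit
counters the difference has to be taken modulo 2^32 so that a single
wraparound between two samples still yields the right increment; with
64-bit counters a plain subtraction is enough in practice.

```c
#include <stdint.h>

/*
 * Increment between two samples of a 32-bit counter. Unsigned
 * subtraction is modulo 2^32, so one wraparound between the samples
 * is handled; more than one cannot be detected.
 */
static uint64_t delta32(uint32_t prev, uint32_t cur)
{
	return (uint32_t)(cur - prev);
}

/* Increment between two samples of a 64-bit counter. */
static uint64_t delta64(uint64_t prev, uint64_t cur)
{
	return cur - prev;
}
```

For example, a 32-bit counter that moves from 0xfffffff0 to 0x10 has
advanced by 0x20, which is what delta32() returns.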

---
v2->v3:
- patch 1/4:
 - add a new patch to reorder includes in misc/ifstat.c
- patch 2/4: (previously 1/3)
 - fix typos.
 - change error print to use fprintf.

v1->v2:
 - change from using RTM_GETSTATS always to using it only for extended
   stats.
 - Add 64bits extended stats type.

Nogah Frankel (4):
  ifstat: Includes reorder
  ifstat: Add extended statistics to ifstat
  ifstat: Add 64 bits based stats to extended statistics
  ifstat: Add "sw only" extended statistics to ifstat

 misc/ifstat.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 153 insertions(+), 18 deletions(-)

-- 
2.4.3

^ permalink raw reply

* [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
From: Nogah Frankel @ 2016-12-22 16:23 UTC (permalink / raw)
  To: netdev
  Cc: stephen, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch,
	Nogah Frankel
In-Reply-To: <1482423795-6531-1-git-send-email-nogahf@mellanox.com>

Extended stats are part of the RTM_GETSTATS method. This patch adds them
to ifstat.
While extended stats can come in many forms, we support only the
rtnl_link_stats64 struct for them (which is the 64 bits version of struct
rtnl_link_stats).
We support stats in the main nesting level, or one lower.
The extension can be selected by its name or any prefix of it. If more
than one name matches, the first one in the list is picked.

To get the extended stats the flag -x <stats type> is used.
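
The name lookup can be sketched roughly as below. This is a standalone
illustration, not the code from the patch: the struct xstat_opt, the
find_xstat() helper and the table entries "alpha", "alphabet" and "beta"
are all made up. It also shows why a name that is a prefix of another
has to come first in the table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option table entry: a name and an id. */
struct xstat_opt {
	const char *name;
	int id;
};

/*
 * Made-up entries. "alpha" is a prefix of "alphabet", so it is listed
 * first; otherwise the query "alpha" would select "alphabet".
 */
static const struct xstat_opt xstat_opts[] = {
	{ "alpha", 1 },
	{ "alphabet", 2 },
	{ "beta", 3 },
};

/*
 * Return the first entry whose name starts with the query, or NULL if
 * none does. Only strlen(query) characters are compared, so any prefix
 * of a name selects it. Note that an empty query matches the first
 * entry.
 */
static const struct xstat_opt *find_xstat(const char *query)
{
	size_t len = strlen(query);
	size_t i;

	for (i = 0; i < sizeof(xstat_opts) / sizeof(xstat_opts[0]); i++) {
		if (strncmp(query, xstat_opts[i].name, len) == 0)
			return &xstat_opts[i];
	}
	return NULL;
}
```

Here "al" and "alpha" both select "alpha", "alphab" selects "alphabet",
and "gamma" matches nothing.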

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 misc/ifstat.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 146 insertions(+), 15 deletions(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index 5bcbcc8..ce666b3 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -34,6 +34,7 @@
 #include "libnetlink.h"
 #include "json_writer.h"
 #include "SNAPSHOT.h"
+#include "utils.h"
 
 int dump_zeros;
 int reset_history;
@@ -48,17 +49,21 @@ int pretty;
 double W;
 char **patterns;
 int npatterns;
+bool is_extended;
+int filter_type;
+int sub_type;
 
 char info_source[128];
 int source_mismatch;
 
 #define MAXS (sizeof(struct rtnl_link_stats)/sizeof(__u32))
+#define NO_SUB_TYPE 0xffff
 
 struct ifstat_ent {
 	struct ifstat_ent	*next;
 	char			*name;
 	int			ifindex;
-	unsigned long long	val[MAXS];
+	__u64			val[MAXS];
 	double			rate[MAXS];
 	__u32			ival[MAXS];
 };
@@ -106,6 +111,48 @@ static int match(const char *id)
 	return 0;
 }
 
+static int get_nlmsg_extended(const struct sockaddr_nl *who,
+			      struct nlmsghdr *m, void *arg)
+{
+	struct if_stats_msg *ifsm = NLMSG_DATA(m);
+	struct rtattr *tb[IFLA_STATS_MAX+1];
+	int len = m->nlmsg_len;
+	struct ifstat_ent *n;
+
+	if (m->nlmsg_type != RTM_NEWSTATS)
+		return 0;
+
+	len -= NLMSG_LENGTH(sizeof(*ifsm));
+	if (len < 0)
+		return -1;
+
+	parse_rtattr(tb, IFLA_STATS_MAX, IFLA_STATS_RTA(ifsm), len);
+	if (tb[filter_type] == NULL)
+		return 0;
+
+	n = malloc(sizeof(*n));
+	if (!n)
+		abort();
+
+	n->ifindex = ifsm->ifindex;
+	n->name = strdup(ll_index_to_name(ifsm->ifindex));
+
+	if (sub_type == NO_SUB_TYPE) {
+		memcpy(&n->val, RTA_DATA(tb[filter_type]), sizeof(n->val));
+	} else {
+		struct rtattr *attr;
+
+		attr = parse_rtattr_one_nested(sub_type, tb[filter_type]);
+		if (attr == NULL)
+			return 0;
+		memcpy(&n->val, RTA_DATA(attr), sizeof(n->val));
+	}
+	memset(&n->rate, 0, sizeof(n->rate));
+	n->next = kern_db;
+	kern_db = n;
+	return 0;
+}
+
 static int get_nlmsg(const struct sockaddr_nl *who,
 		     struct nlmsghdr *m, void *arg)
 {
@@ -147,18 +194,34 @@ static void load_info(void)
 {
 	struct ifstat_ent *db, *n;
 	struct rtnl_handle rth;
+	__u32 filter_mask;
 
 	if (rtnl_open(&rth, 0) < 0)
 		exit(1);
 
-	if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) {
-		perror("Cannot send dump request");
-		exit(1);
-	}
+	if (is_extended) {
+		ll_init_map(&rth);
+		filter_mask = IFLA_STATS_FILTER_BIT(filter_type);
+		if (rtnl_wilddump_stats_req_filter(&rth, AF_UNSPEC, RTM_GETSTATS,
+						   filter_mask) < 0) {
+			perror("Cannot send dump request");
+			exit(1);
+		}
 
-	if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) {
-		fprintf(stderr, "Dump terminated\n");
-		exit(1);
+		if (rtnl_dump_filter(&rth, get_nlmsg_extended, NULL) < 0) {
+			fprintf(stderr, "Dump terminated\n");
+			exit(1);
+		}
+	} else {
+		if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) {
+			perror("Cannot send dump request");
+			exit(1);
+		}
+
+		if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) {
+			fprintf(stderr, "Dump terminated\n");
+			exit(1);
+		}
 	}
 
 	rtnl_close(&rth);
@@ -553,10 +616,17 @@ static void update_db(int interval)
 				}
 				for (i = 0; i < MAXS; i++) {
 					double sample;
-					unsigned long incr = h1->ival[i] - n->ival[i];
+					__u64 incr;
+
+					if (is_extended) {
+						incr = h1->val[i] - n->val[i];
+						n->val[i] = h1->val[i];
+					} else {
+						incr = (__u32) (h1->ival[i] - n->ival[i]);
+						n->val[i] += incr;
+						n->ival[i] = h1->ival[i];
+					}
 
-					n->val[i] += incr;
-					n->ival[i] = h1->ival[i];
 					sample = (double)(incr*1000)/interval;
 					if (interval >= scan_interval) {
 						n->rate[i] += W*(sample-n->rate[i]);
@@ -656,6 +726,48 @@ static int verify_forging(int fd)
 	return -1;
 }
 
+static void xstat_usage(void)
+{
+	fprintf(stderr,
+"Usage: ifstat supported xstats:\n");
+}
+
+struct extended_stats_options_t {
+	char *name;
+	int id;
+	int sub_type;
+};
+
+/* Note: if one xstat name is subset of another, it should be before it in this
+ * list.
+ * Name length must be under 64 chars.
+ */
+static const struct extended_stats_options_t extended_stats_options[] = {
+};
+
+static bool get_filter_type(char *name)
+{
+	int name_len;
+	int i;
+
+	name_len = strlen(name);
+	for (i = 0; i < ARRAY_SIZE(extended_stats_options); i++) {
+		const struct extended_stats_options_t *xstat;
+
+		xstat = &extended_stats_options[i];
+		if (strncmp(name, xstat->name, name_len) == 0) {
+			filter_type = xstat->id;
+			sub_type = xstat->sub_type;
+			strcpy(name, xstat->name);
+			return true;
+		}
+	}
+
+	fprintf(stderr, "invalid ifstat extension %s\n", name);
+	xstat_usage();
+	return false;
+}
+
 static void usage(void) __attribute__((noreturn));
 
 static void usage(void)
@@ -673,7 +785,8 @@ static void usage(void)
 "   -s, --noupdate       don't update history\n"
 "   -t, --interval=SECS  report average over the last SECS\n"
 "   -V, --version        output version information\n"
-"   -z, --zeros          show entries with zero activity\n");
+"   -z, --zeros          show entries with zero activity\n"
+"   -x, --extended=TYPE  show extended stats of TYPE\n");
 
 	exit(-1);
 }
@@ -691,18 +804,22 @@ static const struct option longopts[] = {
 	{ "interval", 1, 0, 't' },
 	{ "version", 0, 0, 'V' },
 	{ "zeros", 0, 0, 'z' },
+	{ "extended", 1, 0, 'x'},
 	{ 0 }
 };
 
+
 int main(int argc, char *argv[])
 {
 	char hist_name[128];
 	struct sockaddr_un sun;
 	FILE *hist_fp = NULL;
+	char stats_type[64];
 	int ch;
 	int fd;
 
-	while ((ch = getopt_long(argc, argv, "hjpvVzrnasd:t:e",
+	is_extended = false;
+	while ((ch = getopt_long(argc, argv, "hjpvVzrnasd:t:ex:",
 			longopts, NULL)) != EOF) {
 		switch (ch) {
 		case 'z':
@@ -743,6 +860,11 @@ int main(int argc, char *argv[])
 				exit(-1);
 			}
 			break;
+		case 'x':
+			is_extended = true;
+			memset(stats_type, 0, 64);
+			strncpy(stats_type, optarg, 63);
+			break;
 		case 'v':
 		case 'V':
 			printf("ifstat utility, iproute2-ss%s\n", SNAPSHOT);
@@ -757,6 +879,10 @@ int main(int argc, char *argv[])
 	argc -= optind;
 	argv += optind;
 
+	if (is_extended)
+		if (!get_filter_type(stats_type))
+			exit(-1);
+
 	sun.sun_family = AF_UNIX;
 	sun.sun_path[0] = 0;
 	sprintf(sun.sun_path+1, "ifstat%d", getuid());
@@ -795,8 +921,13 @@ int main(int argc, char *argv[])
 		snprintf(hist_name, sizeof(hist_name),
 			 "%s", getenv("IFSTAT_HISTORY"));
 	else
-		snprintf(hist_name, sizeof(hist_name),
-			 "%s/.ifstat.u%d", P_tmpdir, getuid());
+		if (!is_extended)
+			snprintf(hist_name, sizeof(hist_name),
+				 "%s/.ifstat.u%d", P_tmpdir, getuid());
+		else
+			snprintf(hist_name, sizeof(hist_name),
+				 "%s/.%s_ifstat.u%d", P_tmpdir, stats_type,
+				 getuid());
 
 	if (reset_history)
 		unlink(hist_name);
-- 
2.4.3

^ permalink raw reply related

* [PATCH iproute2 v3 1/4] ifstat: Includes reorder
From: Nogah Frankel @ 2016-12-22 16:23 UTC (permalink / raw)
  To: netdev
  Cc: stephen, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch,
	Nogah Frankel
In-Reply-To: <1482423795-6531-1-git-send-email-nogahf@mellanox.com>

Reorder the includes in misc/ifstat.c to match convention.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 misc/ifstat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index 92d67b0..5bcbcc8 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -28,12 +28,12 @@
 #include <math.h>
 #include <getopt.h>
 
-#include <libnetlink.h>
-#include <json_writer.h>
 #include <linux/if.h>
 #include <linux/if_link.h>
 
-#include <SNAPSHOT.h>
+#include "libnetlink.h"
+#include "json_writer.h"
+#include "SNAPSHOT.h"
 
 int dump_zeros;
 int reset_history;
-- 
2.4.3

^ permalink raw reply related

* [PATCH iproute2 v3 3/4] ifstat: Add 64 bits based stats to extended statistics
From: Nogah Frankel @ 2016-12-22 16:23 UTC (permalink / raw)
  To: netdev
  Cc: stephen, roopa, roszenrami, ogerlitz, jiri, eladr, yotamg, idosch,
	Nogah Frankel
In-Reply-To: <1482423795-6531-1-git-send-email-nogahf@mellanox.com>

The default stats for ifstat are 32-bit.
The kernel supports 64-bit stats. (They are returned in struct
rtnl_link_stats64, which is an exact copy of struct rtnl_link_stats, in
which the "normal" stats are returned, but with u64 fields instead of
u32.) This patch adds them as an extended stats type.

They are read with filter type IFLA_STATS_LINK_64 and no sub type.

They are available under the name 64bits
(or any prefix of it, such as "64").

For example:
ifstat -x 64bit

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 misc/ifstat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/misc/ifstat.c b/misc/ifstat.c
index ce666b3..8325ac7 100644
--- a/misc/ifstat.c
+++ b/misc/ifstat.c
@@ -729,7 +729,8 @@ static int verify_forging(int fd)
 static void xstat_usage(void)
 {
 	fprintf(stderr,
-"Usage: ifstat supported xstats:\n");
+"Usage: ifstat supported xstats:\n"
+"       64bits         default stats, with 64 bits support\n");
 }
 
 struct extended_stats_options_t {
@@ -743,6 +744,7 @@ struct extended_stats_options_t {
  * Name length must be under 64 chars.
  */
 static const struct extended_stats_options_t extended_stats_options[] = {
+	{"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE},
 };
 
 static bool get_filter_type(char *name)
-- 
2.4.3

^ permalink raw reply related
