Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-09-24 17:55 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <1380043818.4391.26.camel@localhost.localdomain>

On 09/24/2013 01:30 PM, Toshiaki Makita wrote:
> On Tue, 2013-09-24 at 09:35 -0400, Vlad Yasevich wrote:
>> On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
>>> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich wrote:
>>>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich wrote:
>>>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>>>> On Thu, 2013-09-12 at 16:00 -0400, David Miller wrote:
>>>>>>>> From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
>>>>>>>> Date: Tue, 10 Sep 2013 19:27:54 +0900
>>>>>>>>
>>>>>>>>> There seem to be some undesirable behaviors related with PVID.
>>>>>>>>> 1. It has no effect assigning PVID to a port. PVID cannot be applied
>>>>>>>>> to any frame regardless of whether we set it or not.
>>>>>>>>> 2. FDB entries learned via frames applied PVID are registered with
>>>>>>>>> VID 0 rather than VID value of PVID.
>>>>>>>>> 3. We can set 0 or 4095 as a PVID that are not allowed in IEEE 802.1Q.
>>>>>>>>> This leads interoperational problems such as sending frames with VID
>>>>>>>>> 4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
>>>>>>>>> 0 as they belong to VLAN 0, which is expected to be handled as they have
>>>>>>>>> no VID according to IEEE 802.1Q.
>>>>>>>>>
>>>>>>>>> Note: 2nd and 3rd problems are potential and not exposed unless 1st problem
>>>>>>>>> is fixed, because we cannot activate PVID due to it.
>>>>>>>>
>>>>>>>> Please work out the issues in patch #2 with Vlad and resubmit this
>>>>>>>> series.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>
>>>>>>> I'm hovering between whether we should fix the issue by changing vlan 0
>>>>>>> interface behavior in 8021q module or enabling a bridge port to sending
>>>>>>> priority-tagged frames, or another better way.
>>>>>>>
>>>>>>> If you could comment it, I'd appreciate it :)
>>>>>>>
>>>>>>>
>>>>>>> BTW, I think what is discussed in patch #2 is another problem about
>>>>>>> handling priority-tags, and it exists without this patch set applied.
>>>>>>> It looks like that we should prepare another patch set than this to fix
>>>>>>> that problem.
>>>>>>>
>>>>>>> Should I include patches that fix the priority-tags problem in this
>>>>>>> patch set and resubmit them all together?
>>>>>>>
>>>>>>
>>>>>> I am thinking that we might need to do it in bridge and it looks like
>>>>>> the simplest way to do it is to have default priority regeneration table
>>>>>> (table 6-5 from 802.1Q doc).
>>>>>>
>>>>>> That way I think we would conform to the spec.
>>>>>>
>>>>>> -vlad
>>>>>
>>>>> Unfortunately I don't think the default priority regeneration table
>>>>> resolves the problem because IEEE 802.1Q says that a VLAN-aware bridge
>>>>> can transmit untagged or VLAN-tagged frames only (the end of section 7.5
>>>>> and 8.1.7).
>>>>>
>>>>> No mechanism to send priority-tagged frames is found as far as I can see
>>>>> the standard. I think the regenerated priority is used for outgoing PCP
>>>>> field only if egress policy is not untagged (i.e. transmitting as
>>>>> VLAN-tagged), and unused if untagged (Section 6.9.2 3rd/4th Paragraph).
>>>>>
>>>>> If we want to transmit priority-tagged frames from a bridge port, I
>>>>> think we need to implement a new (optional) feature that is above the
>>>>> standard, as I stated previously.
>>>>>
>>>>> How do you feel about adding a per-port policy that enables a bridge to
>>>>> send priority-tagged frames instead of untagged frames when egress
>>>>> policy for the port is untagged?
>>>>> With this change, we can transmit frames for a given vlan as either all
>>>>> untagged, all priority-tagged or all VLAN-tagged.
>>>>
>>>> That would work.  What I am thinking is that we do it by special casing
>>>> the vid 0 egress policy specification.  Let it be untagged by default
>>>> and if it is tagged, then we preserve the priority field and forward
>>>> it on.
>>>>
>>>> This keeps the API stable and doesn't require user/admin from knowing
>>>> exactly what happens.  Default operation conforms to the spec and allows
>>>> simple change to make it backward-compatible.
>>>>
>>>> What do you think.  I've done a simple prototype of this an it seems to
>>>> work with the VMs I am testing with.
>>>
>>> Are you saying that
>>> - by default, set the 0th bit of untagged_bitmap; and
>>> - if we unset the 0th bit and set the "vid"th bit, we transmit frames
>>> classified as belonging to VLAN "vid" as priority-tagged?
>>>
>>> If so, though it's attractive to keep current API, I'm worried about if
>>> it could be a bit confusing and not intuitive for kernel/iproute2
>>> developers that VID 0 has a special meaning only in the egress policy.
>>> Wouldn't it be better to adding a new member to struct net_port_vlans
>>> instead of using VID 0 of untagged_bitmap?
>>>
>>> Or are you saying that we use a new flag in struct net_port_vlans but
>>> use the BRIDGE_VLAN_INFO_UNTAGGED bit with VID 0 in netlink to set the
>>> flag?
>>>
>>> Even in that case, I'm afraid that it might be confusing for developers
>>> for the same reason. We are going to prohibit to specify VID with 0 (and
>>> 4095) in adding/deleting a FDB entry or a vlan filtering entry, but it
>>> would allow us to use VID 0 only when a vlan filtering entry is
>>> configured.
>>> I am thinking a new nlattr is a straightforward approach to configure
>>> it.
>>
>> By making this an explicit attribute it makes vid 0 a special case for
>> any automatic tool that would provision such filtering.  Seeing vid 0
>> would mean that these tools would have to know that this would have to
>> be translated to a different attribute instead of setting the policy
>> values.
>
> Yes, I agree with you that we can do it by the way you explained.
> What I don't understand is the advantage of using vid 0 over another way
> such as adding a new nlattr.
> I think we can indicate transmitting priority-tags explicitly by such a
> nlattr. Using vid 0 seems to be easier to implement than a new nlattr,
> but, for me, it looks less intuitive and more difficult to maintain
> because we have to care about vid 0 instead of simply ignoring it.
>

The point I am trying to make is that regardless of the approach someone
has to know what to do when enabling priority tagged frames.  You 
proposal would require the administrator or config tool to have that 
knowledge.  Example is:
	Admin does: bridge vlan set priority on dev eth0
         Automated app:
		if (vid == 0)
			/* Turn on priority tagged frame support */

My proposal would require the bridge filtering implementation to have it.
	user tool: bridge vlan add vid 0 tagged
	Automated app:  No special case.

IMO its better to have 1 piece code handling the special case then 
putting it multiple places.

Thanks
-vlad

> Thanks,
>
> Toshiaki Makita
>
>>
>> How it is implemented internally in the kernel isn't as big of an issue.
>> We can do it as a separate flag or as part of existing policy.
>>
>> -vlad
>>
>>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>> -vlad
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Toshiaki Makita
>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>
>>>
>>
>
>

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-24 18:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, David Miller, Linux Netdev List,
	Brandeburg, Jesse
In-Reply-To: <1380044075.3165.108.camel@edumazet-glaptop>

On Tue, Sep 24, 2013 at 10:34 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-09-24 at 10:03 -0700, Tom Herbert wrote:
>> On Tue, Sep 24, 2013 at 9:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2013-09-24 at 09:35 -0700, Tom Herbert wrote:
>> >
>> >> We should really be using rxhash for that anyway, eliminate this
>> >> ehashfn.  This would entail adding rxhash argument in the various
>> >> udp_lookup functions.
>> >
>> > Nope : Some NICs provide UDP rxhash only using L3  (source IP,
>> > destination IP), not L4 (adding source & destination ports)
>> >
>> Then the NIC won't set l4_rxhash and we'll rehash over 4-tuple when
>> skb_get_rxhash is called.
>
> Yes, but then in this case you add cpu cycles for no reason.
>
> If you have multiqueue NIC, you do not use RPS/RFS, so skb->rxhash might
> be 0
>
> hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
>
> is faster than the whole flow dissection game.
>
But if we already have valid l4_rxhash we're just wasting time
recomputing it (very like the same value).  So just do:

if (skb->l4_hash)
    hash = skb->rxhash
else
    hash = inet_ehashfn(net, daddr, hnum, saddr, sport);

Even if we go the inet_ehashfn route in the packet, it's makes sense
to store this in skb->rxhash so that subsequent functions don't need
to compute the hash

By the way, I believe there is an insidious in using the same hash
value or even related hash values in multiple places for steering.
Since we typically do something like hash % numqueues, we introduce
bias if multiple steering levels look at the same bits.  With
SO_REUSEPORT this might actually be beneficial since we could get port
selection to match RSS locality, but this is not by design!

>
>
>
>

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24 18:03 UTC (permalink / raw)
  To: therbert; +Cc: David.Laight, netdev, jesse.brandeburg
In-Reply-To: <CA+mtBx_AcmcNBDhRKVB8YDkSNdOwgJ3CXxBqpgxDw23X3m2rvQ@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Tue, 24 Sep 2013 08:54:24 -0700

> On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Tue, 24 Sep 2013 08:22:55 -0700
>>
>>> We use this value for steering, and could use it for other uses like
>>> connection lookup.
>>
>> For security reasons we absolutely cannot use it for that purpose,
>> please stop claiming this.
>>
>> Any hash function which an attacker can reproduce is attackable.
> 
> The Toeplitz function uses a secret key whose length is based on the
> input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
> attacker can reproduce this if the key is random.  If the problem is
> that devices are not being configured with a sufficiently random key
> (some actually are using a fixed key :-( ), that's a separate issue
> that should be addressed.  It is possible to DoS attack through the
> steering mechanism.

All of them are using a fixed, defined, key.

^ permalink raw reply

* Re: [PATCH 04/10] amd/7990: Remove extern from function prototypes
From: David Miller @ 2013-09-24 18:06 UTC (permalink / raw)
  To: joe; +Cc: netdev, linux-kernel
In-Reply-To: <1380041755.3575.93.camel@joe-AO722>

From: Joe Perches <joe@perches.com>
Date: Tue, 24 Sep 2013 09:55:55 -0700

> Hey David.
> 
> This ended up in your tree as
> commit 44da5c2f523876a02336c18dbcce93449fdbf0ca
> without your signoff.
> 
> Is there something I need to do differently so
> your scripts add your sign-off properly or did
> this manage to just slip through via some manual
> git am without the -s?

I forgot to "--signoff", that's all.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-24 18:06 UTC (permalink / raw)
  To: David Miller; +Cc: David Laight, Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <20130924.140312.1944338200709799169.davem@redhat.com>

On Tue, Sep 24, 2013 at 11:03 AM, David Miller <davem@redhat.com> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Tue, 24 Sep 2013 08:54:24 -0700
>
>> On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
>>> From: Tom Herbert <therbert@google.com>
>>> Date: Tue, 24 Sep 2013 08:22:55 -0700
>>>
>>>> We use this value for steering, and could use it for other uses like
>>>> connection lookup.
>>>
>>> For security reasons we absolutely cannot use it for that purpose,
>>> please stop claiming this.
>>>
>>> Any hash function which an attacker can reproduce is attackable.
>>
>> The Toeplitz function uses a secret key whose length is based on the
>> input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
>> attacker can reproduce this if the key is random.  If the problem is
>> that devices are not being configured with a sufficiently random key
>> (some actually are using a fixed key :-( ), that's a separate issue
>> that should be addressed.  It is possible to DoS attack through the
>> steering mechanism.
>
> All of them are using a fixed, defined, key.

bnx2x, at least, seems to being trying to set a random key:

prandom_bytes(params.rss_key, T_ETH_RSS_KEY * 4);

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Ben Hutchings @ 2013-09-24 18:10 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, David.Laight, netdev, jesse.brandeburg
In-Reply-To: <20130924.140312.1944338200709799169.davem@redhat.com>

On Tue, 2013-09-24 at 14:03 -0400, David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Tue, 24 Sep 2013 08:54:24 -0700
> 
> > On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
> >> From: Tom Herbert <therbert@google.com>
> >> Date: Tue, 24 Sep 2013 08:22:55 -0700
> >>
> >>> We use this value for steering, and could use it for other uses like
> >>> connection lookup.
> >>
> >> For security reasons we absolutely cannot use it for that purpose,
> >> please stop claiming this.
> >>
> >> Any hash function which an attacker can reproduce is attackable.
> > 
> > The Toeplitz function uses a secret key whose length is based on the
> > input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
> > attacker can reproduce this if the key is random.  If the problem is
> > that devices are not being configured with a sufficiently random key
> > (some actually are using a fixed key :-( ), that's a separate issue
> > that should be addressed.  It is possible to DoS attack through the
> > steering mechanism.
> 
> All of them are using a fixed, defined, key.

This is certainly false, as I know sfc randomises the key.  And the
Microsoft RSS spec appears to require that the key is programmable.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: Vincent Li @ 2013-09-24 18:11 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, davem, Vincent Li

the current behavior is when an IP is added to an interface, the primary
or secondary attributes is depending on the order of ip added to the interface
the first IP will be primary and second, third,... or alias IP will be secondary
if the IP subnet matches

this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary'
(use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify
an IP address to be  primary or secondary.

the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip'
and also that each blade has its own management ip on the management interface, so whichever blade in the
cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic
originated from the primary blade source from the 'floating management ip' for consistency. but in this
case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating
management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would
like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary.

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
---
 net/ipv4/devinet.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a1b5bcb..bfc702a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 		return 0;
 	}
 
-	ifa->ifa_flags &= ~IFA_F_SECONDARY;
 	last_primary = &in_dev->ifa_list;
 
+	if((*last_primary) == NULL)
+		ifa->ifa_flags &= ~IFA_F_SECONDARY;
+
 	for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
 	     ifap = &ifa1->ifa_next) {
 		if (!(ifa1->ifa_flags & IFA_F_SECONDARY) &&
@@ -458,7 +460,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
 				inet_free_ifa(ifa);
 				return -EINVAL;
 			}
-			ifa->ifa_flags |= IFA_F_SECONDARY;
+                        if (!(ifa->ifa_flags & IFA_F_SECONDARY))
+                                ifa1->ifa_flags |= IFA_F_SECONDARY;
+                        else
+                                ifa->ifa_flags |= IFA_F_SECONDARY;
 		}
 	}
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-24 18:24 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, David Laight, Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <1380046245.2736.52.camel@bwh-desktop.uk.level5networks.com>

On Tue, Sep 24, 2013 at 11:10 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Tue, 2013-09-24 at 14:03 -0400, David Miller wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Tue, 24 Sep 2013 08:54:24 -0700
>>
>> > On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
>> >> From: Tom Herbert <therbert@google.com>
>> >> Date: Tue, 24 Sep 2013 08:22:55 -0700
>> >>
>> >>> We use this value for steering, and could use it for other uses like
>> >>> connection lookup.
>> >>
>> >> For security reasons we absolutely cannot use it for that purpose,
>> >> please stop claiming this.
>> >>
>> >> Any hash function which an attacker can reproduce is attackable.
>> >
>> > The Toeplitz function uses a secret key whose length is based on the
>> > input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
>> > attacker can reproduce this if the key is random.  If the problem is
>> > that devices are not being configured with a sufficiently random key
>> > (some actually are using a fixed key :-( ), that's a separate issue
>> > that should be addressed.  It is possible to DoS attack through the
>> > steering mechanism.
>>
>> All of them are using a fixed, defined, key.
>
> This is certainly false, as I know sfc randomises the key.  And the
> Microsoft RSS spec appears to require that the key is programmable.
>
There are some drivers are using fixed keys.  This is not good.  I'll
take a look at those.  It should be sufficient to randomly set the key
once at initialization.


> Ben.
>
> --
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>

^ permalink raw reply

* Re: SMSC 9303 support
From: Florian Fainelli @ 2013-09-24 18:29 UTC (permalink / raw)
  To: Gary Thomas; +Cc: Ben Hutchings, netdev
In-Reply-To: <5241C4F2.2040106@mlbassoc.com>

Hello,

2013/9/24 Gary Thomas <gary@mlbassoc.com>:
> On 2013-09-24 10:51, Ben Hutchings wrote:
>>
>> On Tue, 2013-09-24 at 06:21 -0600, Gary Thomas wrote:
>>>
>>> I need to support the SMSC9303 in an embedded system.  I'm not
>>> finding any [explicit] support for this device in the latest
>>> mainline kernel.  Did I miss something?
>>>
>>> To be clear, the SMSC9303 is a 3-port managed ethernet switch
>>> capable of supporting 802.1D/802.1Q directly. This switch is
>>> driven by a single MAC via MII/RMII and exposes the other two
>>> ports via physical PHYs.  What I need it to do is behave like
>>> two external, separate devices.  I was thinking that what I need
>>> to do is treat these as VLAN devices since the switch can manage
>>> the routing.
>>>
>>> Does this seem like a reasonable approach?
>>
>>
>> Linux has 'DSA' (Distributed Switch Architecture) which supports tagging
>> of packets to indicate which switch port they are sent or received
>> through.  This was originally added to support some Marvell switch chips
>> and I don't know whether it would be suitable or extensible for this
>> one.
>
>
> I've used the DSA stuff for years (worked directly with the Marvell folks
> when it was being developed).  It might work for this device, I'll think
> some more about using it although I was hoping for a lighter weight
> solution.

I do not think DSA is suitable for pure 802.1q switches such as this
one. OpenWrt has an out of tree patch which adds some switch-specific
operations that can be controlled over netlink (currently trying to
get them in a shape where they can be submitted for mainline
inclusion) [1], which I think is much more suitable than DSA or any
other proprietary switch tagging mechanism.

[1]: https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/net/phy/swconfig.c
-- 
Florian

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24 18:48 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, hannes, netdev, jesse.brandeburg
In-Reply-To: <CA+mtBx8PJj1HgaoiOoJjQL6VmU4Rn7ctBpscr1YxpfrWW9VSbA@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Tue, 24 Sep 2013 11:02:11 -0700

> But if we already have valid l4_rxhash we're just wasting time
> recomputing it (very like the same value).  So just do:

Honestly, you're arguing over 12 or so cycles.  That's about how much
jhash costs.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Jesse Brandeburg @ 2013-09-24 18:48 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, David.Laight, netdev
In-Reply-To: <20130924.140312.1944338200709799169.davem@redhat.com>

On Tue, 24 Sep 2013 14:03:12 -0400 David Miller <davem@redhat.com> wrote:
...
> >> For security reasons we absolutely cannot use it for that purpose,
> >> please stop claiming this.
> >>
> >> Any hash function which an attacker can reproduce is attackable.
> > 
...
> > that should be addressed.  It is possible to DoS attack through the
> > steering mechanism.
> 
> All of them are using a fixed, defined, key.

We selected the fixed key on purpose.  The existing mechanisms built
into the stack for preventing the impact of DOS attacks like NAPI
polling will prevent any actual damage even if someone sends lots of
packets on a single flow.  If someone overflows a receive queue that
CPU runs until it can't keep up and then hardware drops further
packets.  In this case even with a randomized seed key any single flow
can still be targeted at a queue, which is no different than a single
queue adapter.

I'm not convinced there is an actual impact in practice.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24 18:49 UTC (permalink / raw)
  To: bhutchings; +Cc: therbert, David.Laight, netdev, jesse.brandeburg
In-Reply-To: <1380046245.2736.52.camel@bwh-desktop.uk.level5networks.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 24 Sep 2013 19:10:45 +0100

> On Tue, 2013-09-24 at 14:03 -0400, David Miller wrote:
>> From: Tom Herbert <therbert@google.com>
>> Date: Tue, 24 Sep 2013 08:54:24 -0700
>> 
>> > On Tue, Sep 24, 2013 at 8:39 AM, David Miller <davem@redhat.com> wrote:
>> >> From: Tom Herbert <therbert@google.com>
>> >> Date: Tue, 24 Sep 2013 08:22:55 -0700
>> >>
>> >>> We use this value for steering, and could use it for other uses like
>> >>> connection lookup.
>> >>
>> >> For security reasons we absolutely cannot use it for that purpose,
>> >> please stop claiming this.
>> >>
>> >> Any hash function which an attacker can reproduce is attackable.
>> > 
>> > The Toeplitz function uses a secret key whose length is based on the
>> > input length.  96 bits in IPv4, 320 bits in IPv6.  I don't see how an
>> > attacker can reproduce this if the key is random.  If the problem is
>> > that devices are not being configured with a sufficiently random key
>> > (some actually are using a fixed key :-( ), that's a separate issue
>> > that should be addressed.  It is possible to DoS attack through the
>> > steering mechanism.
>> 
>> All of them are using a fixed, defined, key.
> 
> This is certainly false, as I know sfc randomises the key.  And the
> Microsoft RSS spec appears to require that the key is programmable.

Then I stand corrected.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-24 19:04 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: David Miller, David Laight, Linux Netdev List
In-Reply-To: <20130924114853.00003935@unknown>

On Tue, Sep 24, 2013 at 11:48 AM, Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
> On Tue, 24 Sep 2013 14:03:12 -0400 David Miller <davem@redhat.com> wrote:
> ...
>> >> For security reasons we absolutely cannot use it for that purpose,
>> >> please stop claiming this.
>> >>
>> >> Any hash function which an attacker can reproduce is attackable.
>> >
> ...
>> > that should be addressed.  It is possible to DoS attack through the
>> > steering mechanism.
>>
>> All of them are using a fixed, defined, key.
>
> We selected the fixed key on purpose.  The existing mechanisms built
> into the stack for preventing the impact of DOS attacks like NAPI
> polling will prevent any actual damage even if someone sends lots of
> packets on a single flow.  If someone overflows a receive queue that
> CPU runs until it can't keep up and then hardware drops further
> packets.  In this case even with a randomized seed key any single flow
> can still be targeted at a queue, which is no different than a single
> queue adapter.
>
> I'm not convinced there is an actual impact in practice.
>
But what is the purpose of using a fixed key, what are the benefits?
A clever attacker could attack a specific queue with different
4-tuples.  While this might be similar to a single flow attack, it
will be much harder to detect and be used to attack multiple servers
simultaneously in the say way (since they all share the same key).  I
don't think we could deploy a NIC with a static RSS key.

^ permalink raw reply

* Re: [net 5/6] i40e: better return values
From: Jesse Brandeburg @ 2013-09-24 19:09 UTC (permalink / raw)
  To: David Miller; +Cc: joe, jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <20130924.101231.1193830264265403478.davem@davemloft.net>

On Tue, 24 Sep 2013 10:12:31 -0400
David Miller <davem@davemloft.net> wrote:
> > Ick.  post_increment problem.
> > 
> > 	return ++num_tc;
> > 
> > There's nothing wrong with the original code
> > unless this is a bugfix which should be documented
> > better than "better return values".
> 
> Agreed, this style of coding is asking for a bug.
> 
> If you want to return "num_tc PLUS ONE" just say that:
> 
> 	return num_tc + 1;

Oops! Obviously the original post-inc was not the intent. That was my
bad. good catch Joe! will fix and resubmit.

^ permalink raw reply

* Re: SMSC 9303 support
From: Gary Thomas @ 2013-09-24 19:13 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Ben Hutchings, netdev
In-Reply-To: <CAGVrzcbF+24ouTGxMwsu3-QCi=JeHEcS7BqBceGBX7Jf+e7BdQ@mail.gmail.com>

On 2013-09-24 12:29, Florian Fainelli wrote:
> Hello,
>
> 2013/9/24 Gary Thomas <gary@mlbassoc.com>:
>> On 2013-09-24 10:51, Ben Hutchings wrote:
>>>
>>> On Tue, 2013-09-24 at 06:21 -0600, Gary Thomas wrote:
>>>>
>>>> I need to support the SMSC9303 in an embedded system.  I'm not
>>>> finding any [explicit] support for this device in the latest
>>>> mainline kernel.  Did I miss something?
>>>>
>>>> To be clear, the SMSC9303 is a 3-port managed ethernet switch
>>>> capable of supporting 802.1D/802.1Q directly. This switch is
>>>> driven by a single MAC via MII/RMII and exposes the other two
>>>> ports via physical PHYs.  What I need it to do is behave like
>>>> two external, separate devices.  I was thinking that what I need
>>>> to do is treat these as VLAN devices since the switch can manage
>>>> the routing.
>>>>
>>>> Does this seem like a reasonable approach?
>>>
>>>
>>> Linux has 'DSA' (Distributed Switch Architecture) which supports tagging
>>> of packets to indicate which switch port they are sent or received
>>> through.  This was originally added to support some Marvell switch chips
>>> and I don't know whether it would be suitable or extensible for this
>>> one.
>>
>>
>> I've used the DSA stuff for years (worked directly with the Marvell folks
>> when it was being developed).  It might work for this device, I'll think
>> some more about using it although I was hoping for a lighter weight
>> solution.
>
> I do not think DSA is suitable for pure 802.1q switches such as this
> one. OpenWrt has an out of tree patch which adds some switch-specific
> operations that can be controlled over netlink (currently trying to
> get them in a shape where they can be submitted for mainline
> inclusion) [1], which I think is much more suitable than DSA or any
> other proprietary switch tagging mechanism.
>
> [1]: https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/net/phy/swconfig.c
>

This looks interesting.  Do you have any more information on how to
integrate this and/or use it?

-- 
------------------------------------------------------------
Gary Thomas                 |  Consulting for the
MLB Associates              |    Embedded world
------------------------------------------------------------

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Eric Dumazet @ 2013-09-24 19:14 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Ben Hutchings, David Miller, David Laight, Linux Netdev List,
	Brandeburg, Jesse
In-Reply-To: <CA+mtBx83=X6k42oe4=-hD_coRoX72bJ05Kb5dDfRCDCdTzOH+A@mail.gmail.com>

On Tue, 2013-09-24 at 11:24 -0700, Tom Herbert wrote:
> >
> There are some drivers are using fixed keys.  This is not good.  I'll
> take a look at those.  It should be sufficient to randomly set the key
> once at initialization.
> 

If the software fallback (we need this for dumb NIC, or encapsulated
traffic) is more expensive than jhash, I do not think we will ever use
this Toeplitz thing.

We can not use jhash or Toeplitz.

^ permalink raw reply

* Re: [PATCH v4 net-next 06/27] bonding: modify bond_get_slave_by_dev() to use neighbours
From: David Miller @ 2013-09-24 19:19 UTC (permalink / raw)
  To: vfalico; +Cc: netdev, jiri, fubar, andy
In-Reply-To: <1380023227-9576-7-git-send-email-vfalico@redhat.com>

From: Veaceslav Falico <vfalico@redhat.com>
Date: Tue, 24 Sep 2013 13:46:46 +0200

> -	return NULL;
> +	return (struct slave *)netdev_lower_dev_get_private(bond->dev,
> +							    slave_dev);
>  }

void pointers never need to be cast, please remove this "struct slave *"
cast.

^ permalink raw reply

* Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: Stephen Hemminger @ 2013-09-24 19:21 UTC (permalink / raw)
  To: Vincent Li; +Cc: netdev, linux-kernel, davem
In-Reply-To: <1380046281-6012-1-git-send-email-vincent.mc.li@gmail.com>

On Tue, 24 Sep 2013 11:11:21 -0700
Vincent Li <vincent.mc.li@gmail.com> wrote:

> -	ifa->ifa_flags &= ~IFA_F_SECONDARY;
>  	last_primary = &in_dev->ifa_list;
>  
> +	if((*last_primary) == NULL)


Minor nit, paren's here are unnecessary

^ permalink raw reply

* Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
From: David Miller @ 2013-09-24 19:28 UTC (permalink / raw)
  To: vincent.mc.li; +Cc: netdev, linux-kernel
In-Reply-To: <1380046281-6012-1-git-send-email-vincent.mc.li@gmail.com>

From: Vincent Li <vincent.mc.li@gmail.com>
Date: Tue, 24 Sep 2013 11:11:21 -0700

> the current behavior is when an IP is added to an interface, the primary
> or secondary attributes is depending on the order of ip added to the interface
> the first IP will be primary and second, third,... or alias IP will be secondary
> if the IP subnet matches
> 
> this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary'
> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify
> an IP address to be  primary or secondary.
> 
> the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip'
> and also that each blade has its own management ip on the management interface, so whichever blade in the
> cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic
> originated from the primary blade source from the 'floating management ip' for consistency. but in this
> case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating
> management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would
> like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary.
> 
> Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>

When submitting a patch, please:

1) Specify an appropriate prefix for your subject line, indicating the
   subsystem.  "ipv4: " might be appropriate here.

2) Format your commit message so that lines do not exceed 80 columns.
   People will read using ASCII text based tools in 80 column
   terminals.

^ permalink raw reply

* Re: [PATCH v4 net-next 06/27] bonding: modify bond_get_slave_by_dev() to use neighbours
From: Veaceslav Falico @ 2013-09-24 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jiri, fubar, andy
In-Reply-To: <20130924.151943.1884204173373827600.davem@redhat.com>

On Tue, Sep 24, 2013 at 03:19:43PM -0400, David Miller wrote:
>From: Veaceslav Falico <vfalico@redhat.com>
>Date: Tue, 24 Sep 2013 13:46:46 +0200
>
>> -	return NULL;
>> +	return (struct slave *)netdev_lower_dev_get_private(bond->dev,
>> +							    slave_dev);
>>  }
>
>void pointers never need to be cast, please remove this "struct slave *"
>cast.

Yep, leftover from the sandbox versions, will change and submit the new
version. Sorry about that :-/.

^ permalink raw reply

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24 19:42 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Eric Dumazet, David Miller, Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <CA+mtBx8PJj1HgaoiOoJjQL6VmU4Rn7ctBpscr1YxpfrWW9VSbA@mail.gmail.com>

On Tue, Sep 24, 2013 at 11:02:11AM -0700, Tom Herbert wrote:
> On Tue, Sep 24, 2013 at 10:34 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2013-09-24 at 10:03 -0700, Tom Herbert wrote:
> >> On Tue, Sep 24, 2013 at 9:46 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Tue, 2013-09-24 at 09:35 -0700, Tom Herbert wrote:
> >> >
> >> >> We should really be using rxhash for that anyway, eliminate this
> >> >> ehashfn.  This would entail adding rxhash argument in the various
> >> >> udp_lookup functions.
> >> >
> >> > Nope : Some NICs provide UDP rxhash only using L3  (source IP,
> >> > destination IP), not L4 (adding source & destination ports)
> >> >
> >> Then the NIC won't set l4_rxhash and we'll rehash over 4-tuple when
> >> skb_get_rxhash is called.
> >
> > Yes, but then in this case you add cpu cycles for no reason.
> >
> > If you have multiqueue NIC, you do not use RPS/RFS, so skb->rxhash might
> > be 0
> >
> > hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
> >
> > is faster than the whole flow dissection game.
> >
> But if we already have valid l4_rxhash we're just wasting time
> recomputing it (very like the same value).  So just do:
> 
> if (skb->l4_hash)
>     hash = skb->rxhash
> else
>     hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
> 
> Even if we go the inet_ehashfn route in the packet, it's makes sense
> to store this in skb->rxhash so that subsequent functions don't need
> to compute the hash
> 
> By the way, I believe there is an insidious in using the same hash
> value or even related hash values in multiple places for steering.
> Since we typically do something like hash % numqueues, we introduce
> bias if multiple steering levels look at the same bits.  With
> SO_REUSEPORT this might actually be beneficial since we could get port
> selection to match RSS locality, but this is not by design!

Wouldn't this also need infrastructure to sync the keys over multiple network
cards to have best benefits?

^ permalink raw reply

* per-PID network stats files in /proc
From: Matthew Hall @ 2013-09-24 20:15 UTC (permalink / raw)
  To: netdev

Hello,

I have an application that I'd like to make self-tuning, based upon the values 
of some of the network stats counters. Thus I went reading through a copy of 
linux-3.11.1 to look for some more information, and began exploring procfs as 
well.

I discovered some system-wide counters in /proc/net/snmp which are pretty 
interesting so I was trying to use the per-PID counters in 
/proc/net/PID/net/snmp as well.

Unfortunately I found that all of these files seem to be identical on my own 
system running Linux 3.2.0:

$ md5sum /proc/net/snmp /proc/1/net/snmp /proc/2/net/snmp
8b92b426f860d0667a780cd8baabb7bf  /proc/net/snmp
8b92b426f860d0667a780cd8baabb7bf  /proc/1/net/snmp
8b92b426f860d0667a780cd8baabb7bf  /proc/2/net/snmp

I found the snmp_seq_show function in net/ipv4/proc.c which prints these 
statistics to see how it really worked, but then I was having a hard time 
finding out what's really calling this code since it hooks up to the /proc 
framework.

Is there some option one can change which enables gathering the network 
statistics per-PID, or per-socket? This would be a tremendous help for my 
application.

Thanks,
Matthew.

^ permalink raw reply

* Re: per-PID network stats files in /proc
From: Stephen Hemminger @ 2013-09-24 20:41 UTC (permalink / raw)
  To: Matthew Hall; +Cc: netdev
In-Reply-To: <20130924201536.GA3555@mhcomputing.net>

On Tue, 24 Sep 2013 13:15:37 -0700
Matthew Hall <mh@mhcomputing.net> wrote:

> Hello,
> 
> I have an application that I'd like to make self-tuning, based upon the values 
> of some of the network stats counters. Thus I went reading through a copy of 
> linux-3.11.1 to look for some more information, and began exploring procfs as 
> well.
> 
> I discovered some system-wide counters in /proc/net/snmp which are pretty 
> interesting so I was trying to use the per-PID counters in 
> /proc/net/PID/net/snmp as well.
> 
> Unfortunately I found that all of these files seem to be identical on my own 
> system running Linux 3.2.0:
> 
> $ md5sum /proc/net/snmp /proc/1/net/snmp /proc/2/net/snmp
> 8b92b426f860d0667a780cd8baabb7bf  /proc/net/snmp
> 8b92b426f860d0667a780cd8baabb7bf  /proc/1/net/snmp
> 8b92b426f860d0667a780cd8baabb7bf  /proc/2/net/snmp
> 
> I found the snmp_seq_show function in net/ipv4/proc.c which prints these 
> statistics to see how it really worked, but then I was having a hard time 
> finding out what's really calling this code since it hooks up to the /proc 
> framework.
> 
> Is there some option one can change which enables gathering the network 
> statistics per-PID, or per-socket? This would be a tremendous help for my 
> application.
> 
> Thanks,
> Matthew.

No. because most of these would be associated with global state.
Even sockets can be shared between PID's.

^ permalink raw reply

* [PATCH 1/2 v2] net: Add function to get SW rxhash
From: Tom Herbert @ 2013-09-24 20:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

Some uses of skb_get_rxhash expect that the function will return
a consistent value whether it is called on RX or TX paths. On RX
path, we will use the rxhash if provided by the NIC, so this
would not normally be the same result computed in TX path would be
a software calculation.

This patch adds skb_get_sw_rxhash to explicitly request a hash
calculated by the stack, disregarding the hash provided by NIC.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h    | 14 ++++++++++++--
 net/core/flow_dissector.c |  1 +
 net/core/skbuff.c         |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2ddb48d..fdde013 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -381,6 +381,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@ooo_okay: allow the mapping of a socket to a queue to be changed
  *	@l4_rxhash: indicate rxhash is a canonical 4-tuple hash over transport
  *		ports.
+ *	@sw_rxhash: indicate rxhash was computed by the stack
  *	@wifi_acked_valid: wifi_acked was set
  *	@wifi_acked: whether frame was acked on wifi or not
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
@@ -488,6 +489,7 @@ struct sk_buff {
 	__u8			pfmemalloc:1;
 	__u8			ooo_okay:1;
 	__u8			l4_rxhash:1;
+	__u8			sw_rxhash:1;
 	__u8			wifi_acked_valid:1;
 	__u8			wifi_acked:1;
 	__u8			no_fcs:1;
@@ -498,7 +500,7 @@ struct sk_buff {
 	 * headers if needed
 	 */
 	__u8			encapsulation:1;
-	/* 7/9 bit hole (depending on ndisc_nodetype presence) */
+	/* 6/8 bit hole (depending on ndisc_nodetype presence) */
 	kmemcheck_bitfield_end(flags2);
 
 #if defined CONFIG_NET_DMA || defined CONFIG_NET_RX_BUSY_POLL
@@ -720,7 +722,15 @@ extern unsigned int   skb_find_text(struct sk_buff *skb, unsigned int from,
 extern void __skb_get_rxhash(struct sk_buff *skb);
 static inline __u32 skb_get_rxhash(struct sk_buff *skb)
 {
-	if (!skb->l4_rxhash)
+	if (!skb->l4_rxhash && !skb->sw_rxhash)
+		__skb_get_rxhash(skb);
+
+	return skb->rxhash;
+}
+
+static inline __u32 skb_get_sw_rxhash(struct sk_buff *skb)
+{
+	if (!skb->sw_rxhash)
 		__skb_get_rxhash(skb);
 
 	return skb->rxhash;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1929af8..8979121 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -200,6 +200,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		hash = 1;
 
 	skb->rxhash = hash;
+	skb->sw_rxhash = 1;
 }
 EXPORT_SYMBOL(__skb_get_rxhash);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d81cff1..5021318 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -706,6 +706,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->rxhash		= old->rxhash;
 	new->ooo_okay		= old->ooo_okay;
 	new->l4_rxhash		= old->l4_rxhash;
+	new->sw_rxhash		= old->sw_rxhash;
 	new->no_fcs		= old->no_fcs;
 	new->encapsulation	= old->encapsulation;
 #ifdef CONFIG_XFRM
-- 
1.8.4

^ permalink raw reply related

* [PATCH 2/2 v2] tun: call skb_get_sw_rxhash
From: Tom Herbert @ 2013-09-24 20:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

tun calls skb_get_rxhash on both transmit and receive side with the
expectation that the calculation in symmetric. To ensure this
property call skb_get_sw_rxhash instead (ignores hash provided by NIC
on receive).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/tun.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 807815f..0619fdf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -358,7 +358,7 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb)
 	rcu_read_lock();
 	numqueues = ACCESS_ONCE(tun->numqueues);
 
-	txq = skb_get_rxhash(skb);
+	txq = skb_get_sw_rxhash(skb);
 	if (txq) {
 		e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
 		if (e)
@@ -1138,7 +1138,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	skb_reset_network_header(skb);
 	skb_probe_transport_header(skb, 0);
 
-	rxhash = skb_get_rxhash(skb);
+	rxhash = skb_get_sw_rxhash(skb);
 	netif_rx_ni(skb);
 
 	tun->dev->stats.rx_packets++;
-- 
1.8.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox