Netdev List
 help / color / mirror / Atom feed
* [patch net 2/3] mlxsw: spectrum_router: Don't reflect dead neighs
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis
In-Reply-To: <1482481970-2327-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

When a neighbour is considered to be dead, we should remove it from the
device's table regardless of its NUD state.

Without this patch, after setting a port to be administratively down we
get the following errors when we periodically try to update the kernel
about neighbours activity:

[  461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
matching neighbour for IP=192.168.100.2

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 53126bf..a0f9742 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -942,7 +942,7 @@ static void mlxsw_sp_router_neigh_update_hw(struct work_struct *work)
 	char rauht_pl[MLXSW_REG_RAUHT_LEN];
 	struct net_device *dev;
 	bool entry_connected;
-	u8 nud_state;
+	u8 nud_state, dead;
 	bool updating;
 	bool removing;
 	bool adding;
@@ -953,10 +953,11 @@ static void mlxsw_sp_router_neigh_update_hw(struct work_struct *work)
 	dip = ntohl(*((__be32 *) n->primary_key));
 	memcpy(neigh_entry->ha, n->ha, sizeof(neigh_entry->ha));
 	nud_state = n->nud_state;
+	dead = n->dead;
 	dev = n->dev;
 	read_unlock_bh(&n->lock);
 
-	entry_connected = nud_state & NUD_VALID;
+	entry_connected = nud_state & NUD_VALID && !dead;
 	adding = (!neigh_entry->offloaded) && entry_connected;
 	updating = neigh_entry->offloaded && entry_connected;
 	removing = neigh_entry->offloaded && !entry_connected;
@@ -1351,7 +1352,7 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp,
 	struct mlxsw_sp_neigh_entry *neigh_entry;
 	struct net_device *dev = fib_nh->nh_dev;
 	struct neighbour *n;
-	u8 nud_state;
+	u8 nud_state, dead;
 
 	/* Take a reference of neigh here ensuring that neigh would
 	 * not be detructed before the nexthop entry is finished.
@@ -1383,8 +1384,9 @@ static int mlxsw_sp_nexthop_init(struct mlxsw_sp *mlxsw_sp,
 	list_add_tail(&nh->neigh_list_node, &neigh_entry->nexthop_list);
 	read_lock_bh(&n->lock);
 	nud_state = n->nud_state;
+	dead = n->dead;
 	read_unlock_bh(&n->lock);
-	__mlxsw_sp_nexthop_neigh_update(nh, !(nud_state & NUD_VALID));
+	__mlxsw_sp_nexthop_neigh_update(nh, !(nud_state & NUD_VALID && !dead));
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* [patch net 1/3] neigh: Send netevent after marking neigh as dead
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis
In-Reply-To: <1482481970-2327-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.

This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.

Fix that by sending a netevent after marking the neighbour as dead.

Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/neighbour.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 782dd86..7bb12e0 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -100,6 +100,7 @@ static void neigh_cleanup_and_release(struct neighbour *neigh)
 		neigh->parms->neigh_cleanup(neigh);
 
 	__neigh_notify(neigh, RTM_DELNEIGH, 0);
+	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
 	neigh_release(neigh);
 }
 
-- 
2.7.4

^ permalink raw reply related

* [patch net 0/3] mlxsw: Router fixes
From: Jiri Pirko @ 2016-12-23  8:32 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis

From: Jiri Pirko <jiri@mellanox.com>

Ido says:

First two patches ensure we remove from the device's table neighbours
that are considered to be dead by the neighbour core.

The last patch removes nexthop groups from the device when they are no
longer valid.

Ido Schimmel (3):
  neigh: Send netevent after marking neigh as dead
  mlxsw: spectrum_router: Don't reflect dead neighs
  mlxsw: spectrum_router: Correctly remove nexthop groups

 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 13 +++++++++----
 net/core/neighbour.c                                  |  1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

-- 
2.7.4

^ permalink raw reply

* RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: maowenan @ 2016-12-23  6:14 UTC (permalink / raw)
  To: Jeff Kirsher, Alexander Duyck
  Cc: Stephen Hemminger, netdev@vger.kernel.org, weiyongjun (A),
	Dingtianhong
In-Reply-To: <1482455203.2481.31.camel@intel.com>



> -----Original Message-----
> From: Jeff Kirsher [mailto:jeffrey.t.kirsher@intel.com]
> Sent: Friday, December 23, 2016 9:07 AM
> To: maowenan; Alexander Duyck
> Cc: Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A);
> Dingtianhong
> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> 
> On Fri, 2016-12-23 at 00:40 +0000, maowenan wrote:
> > > -----Original Message-----
> > > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > > Sent: Thursday, December 22, 2016 11:54 PM
> > > To: maowenan
> > > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.
> > > com;
> > > weiyongjun (A); Dingtianhong
> > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > ordering mode
> > >
> > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> > > wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > > > To: maowenan
> > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set
> > > > > relax ordering mode
> > > > >
> > > > > On Thu, 8 Dec 2016 14:51:38 +0800 Mao Wenan
> > > > > <maowenan@huawei.com> wrote:
> > > > >
> > > > > > This patch provides one way to set/unset IXGBE NIC TX and RX
> > > > > > relax ordering mode, which can be set by ethtool.
> > > > > > Relax ordering is one mode of 82599 NIC, to enable this mode
> > > > > > can enhance the performance for some cpu architecure.
> > > > >
> > > > > Then it should be done by CPU architecture specific quirks
> > > > > (preferably in PCI
> > > > > layer) so that all users get the option without having to do
> > > > > manual
> > >
> > > intervention.
> > > > >
> > > > > > example:
> > > > > > ethtool -s enp1s0f0 relaxorder off ethtool -s enp1s0f0
> > > > > > relaxorder on
> > > > >
> > > > > Doing it via ethtool is a developer API (for testing) not
> > > > > something that makes sense in production.
> > > >
> > > >
> > > > This feature is not mandatory for all users, acturally relax
> > > > ordering default configuration of 82599 is 'disable', So this
> > > > patch gives one way to
> > >
> > > enable relax ordering to be selected in some performance condition.
> > >
> > > That isn't quite true.  The default for Sparc systems is to have it
> > > enabled.
> > >
> > > Really this is something that is platform specific.  I agree with
> > > Stephen that it would work better if this was handled as a series of
> > > platform specific quirks handled at something like the PCI layer
> > > rather than be a switch the user can toggle on and off.
> > >
> > > With that being said there are changes being made that should help
> > > to improve the situation.  Specifically I am looking at adding
> > > support for the DMA_ATTR_WEAK_ORDERING which may also allow us to
> > > identify cases where you might be able to specify the DMA behavior
> > > via the DMA mapping instead of having to make the final decision in
> > > the device itself.
> > >
> > > - Alex
> >
> > Yes, Sparc is a special case. From the NIC driver point of view, It is
> > no need for some ARCHs to do particular operation and compiling
> > branch, ethtool is a flexible method for user to make decision whether
> > on|off this feature.
> > I think Jeff as maintainer of 82599 has some comments about this.
> 
> My original comment/objection was that you attempted to do this change as a
> module parameter to the ixgbe driver, where I directed you to use ethtool so
> that other drivers could benefit from the ability to enable/disable relaxed
> ordering.  As far as how it gets implemented in ethtool or PCI layer, makes
> little difference to me, I only had issues with the driver specific module
> parameter implementation, which is not acceptable.


Thank you Jeff and Alex.
And then I have gone through mail thread about "i40e: enable PCIe relax ordering for SPARC",
It only works for SPARC, any other ARCH who wants to enable DMA_ATTR_WEAK_ORDERING 
feature, should define the new macro, recompile the driver module.

Because of the above reasons, we implement in ethtool to give the final user a convenient
way to on|off special feature, no need define new macro, easy to extend the new features,
and also good benefit for other driver as Jeff referred.


^ permalink raw reply

* Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Phil Reid @ 2016-12-23  1:09 UTC (permalink / raw)
  To: Joao Pinto, peppe.cavallaro, davem, seraphin.bonnaffe
  Cc: hock.leong.kweh, niklas.cassel, pavel, linux-kernel, netdev
In-Reply-To: <67213186-8583-9f9c-6f6a-6f56d2ba4ff0@synopsys.com>

G'day Joao,
On 23/12/2016 01:06, Joao Pinto wrote:
> Às 4:57 PM de 12/22/2016, Phil Reid escreveu:
>> On 22/12/2016 23:47, Joao Pinto wrote:
>>>
>>> Hello Phil,
>>>
>>> Às 3:42 PM de 12/22/2016, Phil Reid escreveu:
>>>> G'day Joao,
>>>>
>>>> On 22/12/2016 20:38, Joao Pinto wrote:
>>>>> When testing stmmac with my QoS reference design I checked a problem in the
>>>>> CSR clock configuration that was impossibilitating the phy discovery, since
>>>>> every read operation returned 0x0000ffff. This patch fixes the issue.
>>>>>
>>>>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
>>>>> ---
>>>>> changes v1->v2 (David Miller)
>>>>> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch
>>>>> to make sense
>>>>>
>>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +-
>>>>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c  | 2 +-
>>>>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c    | 8 ++++----
>>>>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> index b21d03f..94223c8 100644
>>>>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
>>>>> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem
>>>>> *ioaddr, int mcbins,
>>>>>      mac->mii.reg_shift = 6;
>>>>>      mac->mii.reg_mask = 0x000007C0;
>>>>>      mac->mii.clk_csr_shift = 2;
>>>>> -    mac->mii.clk_csr_mask = 0xF;
>>>>> +    mac->mii.clk_csr_mask = GENMASK(4, 2);
>>>>
>>>> Should this not be GENMASK(5,2)
>>>
>>> According to Universal MAC databook (valid for MAC100 and MAC1000), we have:
>>>
>>> Bits: 4:2
>>> 000 60-100 MHz clk_csr_i/42
>>> 001 100-150 MHz clk_csr_i/62
>>> 010 20-35 MHz clk_csr_i/16
>>> 011 35-60 MHz clk_csr_i/26
>>> 100 150-250 MHz clk_csr_i/102
>>> 101 250-300 MHz clk_csr_i/124
>>> 110, 111 Reserved
>>>
>>> So only bits 2, 3 and 4 should be masked.
>>>
>>> Thanks.
>>>
>> But this is a change in behaviour from what was there isn't.
>> Previous mask was 4 bits. now it's 3.
>>
>> And for example the altera socfgpa implementation in the cyclone V has valid values
>> for values 0x8-0xf, using bit 5:2.
>
> According to the databook, bit 5 is reserved and RO. When reserved, it is
> possible to customize. Is that the case? If there is hardware using the 5th bit
> we can put it GENMASK(5, 2) with no problem.
>
I've also checked the Aria 10 documentation and bit 5 is also RW.
The following options are documented and supported
     1000: CSR clock/4
     1001: CSR clock/6
     1010: CSR clock/8
     1011: CSR clock/10
     1100: CSR clock/12
     1101: CSR clock/14
     1110: CSR clock/16
     1111: CSR clock/18
They do mention that these values will probably be outside the IEEE 802.3 specified range.

But there's at least a couple of cores out there where GENMASK(5,2) is valid.
Can't say if anyone is using that setting thou.


-- 
Regards
Phil Reid

^ permalink raw reply

* Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: Jeff Kirsher @ 2016-12-23  1:06 UTC (permalink / raw)
  To: maowenan, Alexander Duyck
  Cc: Stephen Hemminger, netdev@vger.kernel.org, weiyongjun (A),
	Dingtianhong
In-Reply-To: <F95AC9340317A84688A5F0DF0246F3F20151FD1C@szxeml504-mbs.china.huawei.com>

[-- Attachment #1: Type: text/plain, Size: 3540 bytes --]

On Fri, 2016-12-23 at 00:40 +0000, maowenan wrote:
> > -----Original Message-----
> > From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> > Sent: Thursday, December 22, 2016 11:54 PM
> > To: maowenan
> > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.
> > com;
> > weiyongjun (A); Dingtianhong
> > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > ordering mode
> > 
> > On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> > wrote:
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Thursday, December 22, 2016 9:28 AM
> > > > To: maowenan
> > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> > > > ordering mode
> > > > 
> > > > On Thu, 8 Dec 2016 14:51:38 +0800
> > > > Mao Wenan <maowenan@huawei.com> wrote:
> > > > 
> > > > > This patch provides one way to set/unset IXGBE NIC TX and RX
> > > > > relax
> > > > > ordering mode, which can be set by ethtool.
> > > > > Relax ordering is one mode of 82599 NIC, to enable this mode can
> > > > > enhance the performance for some cpu architecure.
> > > > 
> > > > Then it should be done by CPU architecture specific quirks
> > > > (preferably in PCI
> > > > layer) so that all users get the option without having to do manual
> > 
> > intervention.
> > > > 
> > > > > example:
> > > > > ethtool -s enp1s0f0 relaxorder off
> > > > > ethtool -s enp1s0f0 relaxorder on
> > > > 
> > > > Doing it via ethtool is a developer API (for testing) not something
> > > > that makes sense in production.
> > > 
> > > 
> > > This feature is not mandatory for all users, acturally relax ordering
> > > default configuration of 82599 is 'disable', So this patch gives one
> > > way to
> > 
> > enable relax ordering to be selected in some performance condition.
> > 
> > That isn't quite true.  The default for Sparc systems is to have it
> > enabled.
> > 
> > Really this is something that is platform specific.  I agree with
> > Stephen that it
> > would work better if this was handled as a series of platform specific
> > quirks
> > handled at something like the PCI layer rather than be a switch the
> > user can
> > toggle on and off.
> > 
> > With that being said there are changes being made that should help to
> > improve
> > the situation.  Specifically I am looking at adding support for the
> > DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where
> > you might be able to specify the DMA behavior via the DMA mapping
> > instead of
> > having to make the final decision in the device itself.
> > 
> > - Alex
> 
> Yes, Sparc is a special case. From the NIC driver point of view, It is no
> need for 
> some ARCHs to do particular operation and compiling branch, ethtool is a
> flexible 
> method for user to make decision whether on|off this feature.
> I think Jeff as maintainer of 82599 has some comments about this.

My original comment/objection was that you attempted to do this change as a
module parameter to the ixgbe driver, where I directed you to use ethtool 
so that other drivers could benefit from the ability to enable/disable
relaxed ordering.  As far as how it gets implemented in ethtool or PCI
layer, makes little difference to me, I only had issues with the driver
specific module parameter implementation, which is not acceptable.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
From: maowenan @ 2016-12-23  0:40 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Stephen Hemminger, netdev@vger.kernel.org,
	jeffrey.t.kirsher@intel.com, weiyongjun (A), Dingtianhong
In-Reply-To: <CAKgT0Uc2ztu_iDn79zdNtsAdb-Ftu65V6YHV-sqXb6vTCTiGCQ@mail.gmail.com>



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Thursday, December 22, 2016 11:54 PM
> To: maowenan
> Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com;
> weiyongjun (A); Dingtianhong
> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> 
> On Wed, Dec 21, 2016 at 5:39 PM, maowenan <maowenan@huawei.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Thursday, December 22, 2016 9:28 AM
> >> To: maowenan
> >> Cc: netdev@vger.kernel.org; jeffrey.t.kirsher@intel.com
> >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax
> >> ordering mode
> >>
> >> On Thu, 8 Dec 2016 14:51:38 +0800
> >> Mao Wenan <maowenan@huawei.com> wrote:
> >>
> >> > This patch provides one way to set/unset IXGBE NIC TX and RX relax
> >> > ordering mode, which can be set by ethtool.
> >> > Relax ordering is one mode of 82599 NIC, to enable this mode can
> >> > enhance the performance for some cpu architecure.
> >>
> >> Then it should be done by CPU architecture specific quirks
> >> (preferably in PCI
> >> layer) so that all users get the option without having to do manual
> intervention.
> >>
> >> > example:
> >> > ethtool -s enp1s0f0 relaxorder off
> >> > ethtool -s enp1s0f0 relaxorder on
> >>
> >> Doing it via ethtool is a developer API (for testing) not something
> >> that makes sense in production.
> >
> >
> > This feature is not mandatory for all users, acturally relax ordering
> > default configuration of 82599 is 'disable', So this patch gives one way to
> enable relax ordering to be selected in some performance condition.
> 
> That isn't quite true.  The default for Sparc systems is to have it enabled.
> 
> Really this is something that is platform specific.  I agree with Stephen that it
> would work better if this was handled as a series of platform specific quirks
> handled at something like the PCI layer rather than be a switch the user can
> toggle on and off.
> 
> With that being said there are changes being made that should help to improve
> the situation.  Specifically I am looking at adding support for the
> DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where
> you might be able to specify the DMA behavior via the DMA mapping instead of
> having to make the final decision in the device itself.
> 
> - Alex

Yes, Sparc is a special case. From the NIC driver point of view, It is no need for 
some ARCHs to do particular operation and compiling branch, ethtool is a flexible 
method for user to make decision whether on|off this feature.
I think Jeff as maintainer of 82599 has some comments about this.
-Wenan


^ permalink raw reply

* [PATCH] brcmfmac: fix spelling mistakes on "Ivalid"
From: Colin King @ 2016-12-23  0:43 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	Pieter-Paul Giesberts, Rafał Miłecki, linux-wireless,
	brcm80211-dev-list.pdl, netdev
  Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fixes to spelling mistake "Ivalid" to "Invalid" in
brcmf_err error messages.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 7ffc4ab..15eaf72 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -3971,7 +3971,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp,
 			pval |= AES_ENABLED;
 			break;
 		default:
-			brcmf_err("Ivalid unicast security info\n");
+			brcmf_err("Invalid unicast security info\n");
 		}
 		offset++;
 	}
@@ -4015,7 +4015,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp,
 			wpa_auth |= WPA2_AUTH_1X_SHA256;
 			break;
 		default:
-			brcmf_err("Ivalid key mgmt info\n");
+			brcmf_err("Invalid key mgmt info\n");
 		}
 		offset++;
 	}
-- 
2.10.2

^ permalink raw reply related

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-23  0:26 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Shahar Klein, Or Gerlitz, Roi Dayan, Jiri Pirko,
	John Fastabend, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpW9Ly+fMEiBtHq4XwUgvt89-5uE2Bwdw1FXJGV3=eY1ZA@mail.gmail.com>

On 12/22/2016 08:05 PM, Cong Wang wrote:
> On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Ok, you mean for net. In that case I prefer the smaller sized fix to be
>> honest. It also covers everything from the point where we fetch the chain
>> via cops->tcf_chain() to the end of the function, which is where most of
>> the complexity resides, and only the two mentioned commits do the relock,
>
> I really wish the problem is only about relocking, but look at the code,
> the deeper reason why we have this bug is the complexity of the logic
> inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it
> idempotent; 2) the request logic itself is hard, because of tc filter design
> and implementation.
>
> This is why I worry more than just relocking.

But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to
me your argument is more about fear of complexity on tc framework itself.
I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it
would be good to reduce it's complexity into smaller pieces. But it's not
really related to the fix itself, reducing complexity requires significantly
more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next
to try to simplify it, sure, but I don't get why we have to discuss so much
on this matter in this context, really.

>> so as a fix I think it's fine as-is. As mentioned, if there's need to
>> refactor tc_ctl_tfilter() net-next would be better, imho.
>
> Refactor is a too strong word, just moving the replay out is not a refactor.
> The end result will be still smaller than your commit d936377414fadbafb4,
> which is already backported to stable.

Commit d936377414fa is a whole different thing, and not related to the
topic at all. The two-line changes with the initialization fix is quite
straight forward and fixes a bug in the logic. It's also less complex in
terms of lines but also in terms of complexity than to refactor or move the
replay out. Moving it out contextually means that you also wouldn't rule
out that things like nlmsg_parse(), or in-fact *any* of the __tc_ctl_tfilter()
return paths could potentially return an error that suddenly would require
a replay of the request. So, while moving it out fixes the bug, too, it also
adds more potential points where you would go and replay the request where
you didn't do so before and therefore also adds more complexity to the code
that needs review/audit when fixing bugs since you don't have these hard/direct
return paths anymore. Therefore I don't think it's better to go that way
for the fix.

Thanks,
Daniel

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-23  0:07 UTC (permalink / raw)
  To: hannes, linux, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <30d9513a-129b-d246-1461-2326130e118f@stressinduktion.org>

Hannes Frederic Sowa wrote:
> A lockdep test should still be done. ;)

Adding might_lock() annotations will improve coverage a lot.

> Yes, that does look nice indeed. Accounting for bits instead of bytes
> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> case you can't satisfy a request with one batched entropy block and have
> to consume randomness from two.

The bit granularity is also for the callers' convenience, so they don't
have to mask again.  Whether get_random_bits rounds up to byte boundaries
internally or not is something else.

When the current batch runs low, I was actually thinking of throwing
away the remaining bits and computing a new batch of 512.  But it's
whatever works best at implementation time.

>>> It could only mix the output back in every two calls, in which case
>>> you can backtrack up to one call but you need to do 2^128 work to
>>> backtrack farther.  But yes, this is getting excessively complicated.
> 
>> No, if you're willing to accept limited backtrack, this is a perfectly
>> acceptable solution, and not too complicated.  You could do it phase-less
>> if you like; store the previous output, then after generating the new
>> one, mix in both.  Then overwrite the previous output.  (But doing two
>> rounds of a crypto primtive to avoid one conditional jump is stupid,
>> so forget that.)

> Can you quickly explain why we lose the backtracking capability?

Sure.  An RNG is (state[i], output[i]) = f(state[i-1]).  The goal of
backtracking is to compute output[i], or better yet state[i-1], given
state[i].

For example, consider an OFB or CTR mode generator.  The state is a key
and and IV, and you encrypt the IV with the key to produce output, then
either replace the IV with the output, or increment it.  Either way,
since you still have the key, you can invert the transformation and
recover the previous IV.

The standard way around this is to use the Davies-Meyer construction:

IV[i] = IV[i-1] + E(IV[i-1], key)

This is the standard way to make a non-invertible random function
out of an invertible random permutation.

>From the sum, there's no easy way to find the ciphertext *or* the
plaintext that was encrypted.  Assuming the encryption is secure,
the only way to reverse it is brute force: guess IV[i-1] and run the
operation forward to see if the resultant IV[i] matches.

There are a variety of ways to organize this computation, since the
guess gives toy both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including
running E forward, backward, or starting from both ends to see if you
meet in the middle.

The way you add the encryption output to the IV is not very important.
It can be addition, xor, or some more complex invertible transformation.
In the case of SipHash, the "encryption" output is smaller than the
input, so we have to get a bit more creative, but it's still basically
the same thing.

The problem is that the output which is combined with the IV is too small.
With only 64 bits, trying all possible values is practical.  (The world's
Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
times per second.)

By basically doing two iterations at once and mixing in 128 bits of
output, the guessing attack is rendered impractical.  The only downside
is that you need to remember and store one result between when it's
computed and last used.  This is part of the state, so an attack can
find output[i-1], but not anything farther back.

> ChaCha as a block cipher gives a "perfect" permutation from the output
> of either the CRNG or the CPRNG, which actually itself has backtracking
> protection.

I'm not quite understanding.  The /dev/random implementation uses some
of the ChaCha output as a new ChaCha key (that's another way to mix output
back into the state) to prevent backtracking.  But this slows it down, and
again if you want to be efficient, you're generating and storing large batches
of entropy and storing it in the RNG state.

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-22 23:21 UTC (permalink / raw)
  To: John Fastabend, David Miller
  Cc: xiyou.wangcong, shahark, gerlitz.or, roid, jiri, netdev
In-Reply-To: <585C127E.7040509@gmail.com>

On 12/22/2016 06:50 PM, John Fastabend wrote:
> On 16-12-22 08:53 AM, David Miller wrote:
>> From: Daniel Borkmann <daniel@iogearbox.net>
>> Date: Wed, 21 Dec 2016 22:07:48 +0100
>>
>>> Ok, you mean for net. In that case I prefer the smaller sized fix to
>>> be honest. It also covers everything from the point where we fetch
>>> the chain via cops->tcf_chain() to the end of the function, which is
>>> where most of the complexity resides, and only the two mentioned
>>> commits do the relock, so as a fix I think it's fine as-is. As
>>> mentioned, if there's need to refactor tc_ctl_tfilter() net-next
>>> would be better, imho.
>>
>> Please, can you two work towards an agreement as to what fix should
>> go in at this time?
>>
>> Thanks.
>
> Thanks for fixing this!
>
> I have a slight preference to push this patch into net as its been
> tested already by Shahar and is not particularly invasive. Then for
> net-next do a larger cleanup to get rid of some of the complexity per
> Cong's suggestion.

My preference as well, thanks.

^ permalink raw reply

* Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann @ 2016-12-22 23:20 UTC (permalink / raw)
  To: Shahar Klein, davem
  Cc: xiyou.wangcong, gerlitz.or, roid, jiri, john.fastabend, netdev
In-Reply-To: <7033ed3d-0665-1a38-7631-a1346d46e1b4@mellanox.com>

On 12/22/2016 02:16 PM, Shahar Klein wrote:
> On 12/21/2016 7:04 PM, Daniel Borkmann wrote:
>> Shahar reported a soft lockup in tc_classify(), where we run into an
>> endless loop when walking the classifier chain due to tp->next == tp
>> which is a state we should never run into. The issue only seems to
>> trigger under load in the tc control path.
>>
>> What happens is that in tc_ctl_tfilter(), thread A allocates a new
>> tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
>> with it. In that classifier callback we had to unlock/lock the rtnl
>> mutex and returned with -EAGAIN. One reason why we need to drop there
>> is, for example, that we need to request an action module to be loaded.
>>
>> This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
>> after we loaded and found the requested action, we need to redo the
>> whole request so we don't race against others. While we had to unlock
>> rtnl in that time, thread B's request was processed next on that CPU.
>> Thread B added a new tp instance successfully to the classifier chain.
>> When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
>> and destroying its tp instance which never got linked, we goto replay
>> and redo A's request.
>>
>> This time when walking the classifier chain in tc_ctl_tfilter() for
>> checking for existing tp instances we had a priority match and found
>> the tp instance that was created and linked by thread B. Now calling
>> again into tp->ops->change() with that tp was successful and returned
>> without error.
>>
>> tp_created was never cleared in the second round, thus kernel thinks
>> that we need to link it into the classifier chain (once again). tp and
>> *back point to the same object due to the match we had earlier on. Thus
>> for thread B's already public tp, we reset tp->next to tp itself and
>> link it into the chain, which eventually causes the mentioned endless
>> loop in tc_classify() once a packet hits the data path.
>>
>> Fix is to clear tp_created at the beginning of each request, also when
>> we replay it. On the paths that can cause -EAGAIN we already destroy
>> the original tp instance we had and on replay we really need to start
>> from scratch. It seems that this issue was first introduced in commit
>> 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
>> and avoid kernel panic when we use cls_cgroup").
>>
>> Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
>> Reported-by: Shahar Klein <shahark@mellanox.com>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
[...]
> I've tested this specific patch (without cong's patch and without the many prints) many times and in many test permutations today and it didn't lock
>
> Thanks again Daniel!

Just catching up with email (traveling whole day), thanks a bunch for
your effort, Shahar! In that case lets then add:

Tested-by: Shahar Klein <shahark@mellanox.com>

^ permalink raw reply

* [PATCH net] inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
From: Willem de Bruijn @ 2016-12-22 23:19 UTC (permalink / raw)
  To: netdev; +Cc: davem, njagabar, samanthakumar, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/ip_sockglue.c | 2 +-
 net/ipv6/datagram.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 8b13881..9760734 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -148,7 +148,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
 	const struct iphdr *iph = ip_hdr(skb);
 	__be16 *ports = (__be16 *)skb_transport_header(skb);
 
-	if (skb_transport_offset(skb) + 4 > skb->len)
+	if (skb_transport_offset(skb) + 4 > (int)skb->len)
 		return;
 
 	/* All current transport protocols have the port numbers in the
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 0489e19..1407426 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -701,7 +701,7 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg,
 		struct sockaddr_in6 sin6;
 		__be16 *ports = (__be16 *) skb_transport_header(skb);
 
-		if (skb_transport_offset(skb) + 4 <= skb->len) {
+		if (skb_transport_offset(skb) + 4 <= (int)skb->len) {
 			/* All current transport protocols have the port numbers in the
 			 * first four bytes of the transport header and this function is
 			 * written with this assumption in mind.
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [PATCH] Fixed status entry in m_can documentation
From: Rob Herring @ 2016-12-22 23:14 UTC (permalink / raw)
  To: Vyacheslav V. Yurkov
  Cc: linux-can-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Jiri Kosina,
	Wolfgang Grandegger, Marc Kleine-Budde, Mark Rutland
In-Reply-To: <20161222104521.20683-1-uvv.mail-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Thu, Dec 22, 2016 at 11:45:21AM +0100, Vyacheslav V. Yurkov wrote:
> Use valid value for 'enabled' in status field
> 
> Signed-off-by: Vyacheslav V. Yurkov <uvv.mail-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/net/can/m_can.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt
> index 9e33177..5facaf5 100644
> --- a/Documentation/devicetree/bindings/net/can/m_can.txt
> +++ b/Documentation/devicetree/bindings/net/can/m_can.txt
> @@ -63,5 +63,5 @@ Board dts:
>  &m_can1 {
>  	pinctrl-names = "default";
>  	pinctrl-0 = <&pinctrl_m_can1>;
> -	status = "enabled";
> +	status = "okay";

Examples don't need to have status prop. Just remove.

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Rob Herring @ 2016-12-22 23:02 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo,
	mark.rutland, netdev, devicetree, linux-kernel, mgreer, justin
In-Reply-To: <1482380314-16440-2-git-send-email-geoff@kuvee.com>

On Wed, Dec 21, 2016 at 11:18:33PM -0500, Geoff Lansberry wrote:
> The TRF7970A has configuration options for supporting hardware designs
> with 1.8 Volt or 3.3 Volt IO.   This commit adds a device tree option,
> using a fixed regulator binding, for setting the io voltage to match
> the hardware configuration. If no option is supplied it defaults to
> 3.3 volt configuration.
> 
> Signed-off-by: Geoff Lansberry <geoff@kuvee.com>
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  2 ++

Acked-by: Rob Herring <robh@kernel.org>

>  drivers/nfc/trf7970a.c                             | 26 +++++++++++++++++++++-
>  2 files changed, 27 insertions(+), 1 deletion(-)

^ permalink raw reply

* Re: [PATCH v3 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Rob Herring @ 2016-12-22 23:01 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	lauro.venancio-430g2QfJUUCGglJvpFV4uA,
	aloisio.almeida-430g2QfJUUCGglJvpFV4uA,
	sameo-VuQAYsv1563Yd54FQh9/CA, mark.rutland-5wv7dgnIgG8,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	mgreer-luAo+O/VEmrlveNOaEYElw, justin-R+k406RtEhcAvxtiuMwx3w
In-Reply-To: <1482380314-16440-1-git-send-email-geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>

On Wed, Dec 21, 2016 at 11:18:32PM -0500, Geoff Lansberry wrote:
> The TRF7970A has configuration options to support hardware designs
> which use a 27.12MHz clock. This commit adds a device tree option
> 'clock-frequency' to support configuring the this chip for default
> 13.56MHz clock or the optional 27.12MHz clock.
> 
> Signed-off-by: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  2 +

Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

>  drivers/nfc/trf7970a.c                             | 50 +++++++++++++++++-----
>  2 files changed, 41 insertions(+), 11 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
From: Eric Dumazet @ 2016-12-22 22:43 UTC (permalink / raw)
  To: Josef Bacik; +Cc: davem, hannes, kraigatgoog, tom, netdev, kernel-team
In-Reply-To: <1482441998-28359-7-git-send-email-jbacik@fb.com>

On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote:
> If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
> never set it again.  Which means that in the future if we end up adding a bunch
> of reuseport sk's to that tb we'll have to do the expensive scan every time.
> Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
> Once one sk has made it onto the list we know that there are no potential bind
> conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
> common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT so
> we know we have to do an extra check for subsequent reuseport sockets and skip
> the expensive bind conflict check.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  include/net/inet_hashtables.h   |  4 ++++
>  net/ipv4/inet_connection_sock.c | 53 +++++++++++++++++++++++++++++++++++++----
>  2 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 756ed16..4ccc18f 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -74,12 +74,16 @@ struct inet_ehash_bucket {
>   * users logged onto your box, isn't it nice to know that new data
>   * ports are created in O(1) time?  I thought so. ;-)	-DaveM
>   */
> +#define FASTREUSEPORT_ANY	1
> +#define FASTREUSEPORT_STRICT	2
> +
>  struct inet_bind_bucket {
>  	possible_net_t		ib_net;
>  	unsigned short		port;
>  	signed char		fastreuse;
>  	signed char		fastreuseport;
>  	kuid_t			fastuid;
> +	struct sock_common	fastsock;
>  	int			num_owners;
>  	struct hlist_node	node;
>  	struct hlist_head	owners;

Please place this fat field at the end of inet_bind_bucket

Many sockets do not use SO_REUSEPORT and should not use this field,
while tb->owners need to be touched.

^ permalink raw reply

* Re: [PATCH 3/6 net-next] inet: kill smallest_size and smallest_port
From: Eric Dumazet @ 2016-12-22 22:41 UTC (permalink / raw)
  To: Josef Bacik; +Cc: davem, hannes, kraigatgoog, tom, netdev, kernel-team
In-Reply-To: <1482441998-28359-4-git-send-email-jbacik@fb.com>

On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote:
> In inet_csk_get_port we seem to be using smallest_port to figure out where the
> best place to look for a SO_REUSEPORT sk that matches with an existing set of
> SO_REUSEPORT's.  However if we get to the logic
> 
> if (smallest_size != -1) {
> 	port = smallest_port;
> 	goto have_port;
> }
> 
> we will do a useless search, because we would have already done the
> inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
> would have gone to found_tb and succeeded.  Since this logic makes us do yet
> another trip through inet_csk_bind_conflict for a port we know won't work just
> delete this code and save us the time.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Please remove tb->need_owners ;)

^ permalink raw reply

* Re: [PATCH v4 2/3] Bluetooth: btusb: Add out-of-band wakeup support
From: Rob Herring @ 2016-12-22 22:39 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Mark Rutland, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	Amitkumar Karwar, Wei-Ning Huang, Xinming Hu, netdev, devicetree,
	linux-bluetooth, Brian Norris, linux-kernel, rajatxjain
In-Reply-To: <1482352432-38302-2-git-send-email-rajatja@google.com>

On Wed, Dec 21, 2016 at 12:33:51PM -0800, Rajat Jain wrote:
> Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
> can be connected to a gpio on the CPU side, and can be used to wakeup
> the host out-of-band. This can be useful in situations where the
> in-band wakeup is not possible or not preferable (e.g. the in-band
> wakeup may require the USB host controller to remain active, and
> hence consuming more system power during system sleep).
> 
> The oob gpio interrupt to be used for wakeup on the CPU side, is
> read from the device tree node, (using standard interrupt descriptors).
> A devcie tree binding document is also added for the driver. The
> compatible string is in compliance with
> Documentation/devicetree/bindings/usb/usb-device.txt
> 
> Signed-off-by: Rajat Jain <rajatja@google.com>
> Reviewed-by: Brian Norris <briannorris@chromium.org>
> ---
> v4: Move the set_bit(BTUSB_OOB_WAKE_DISABLED,..) call to the beginning of
>     btusb_config_oob_wake() - caught by Brian.
> v3: Add Brian's "Reviewed-by"
> v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
>     * Leave it on device tree to specify IRQ flags (level /edge triggered)
>     * Mark the device as non wakeable on exit.
> 
>  Documentation/devicetree/bindings/net/btusb.txt | 40 ++++++++++++

Acked-by: Rob Herring <robh@kernel.org>

>  drivers/bluetooth/btusb.c                       | 84 +++++++++++++++++++++++++
>  2 files changed, 124 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/btusb.txt

^ permalink raw reply

* Re: [PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status
From: Rob Herring @ 2016-12-22 22:30 UTC (permalink / raw)
  To: Murali Karicheri
  Cc: netdev, linux-omap, grygorii.strashko, mugunthanvnm, linux-kernel,
	arnd, davem, devicetree, mark.rutland
In-Reply-To: <1482271793-7671-3-git-send-email-m-karicheri2@ti.com>

On Tue, Dec 20, 2016 at 05:09:45PM -0500, Murali Karicheri wrote:
> From: WingMan Kwok <w-kwok2@ti.com>
> 
> The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides
> functionality of a physical coding sublayer (PCS) on data being
> transferred between a demuxed XGMII and SerDes supporting a 16
> or 32 bit interface.  From the driver point of view, whether
> a ethernet link is up or not depends also on the status of the
> block-lock bit of the PCSR.  This patch adds the checking of that
> bit in order to determine the link status.

I would think this would be a common thing and the phy driver should 
provide the status, rather than trying to give the ethernet driver 
direct access to the phy registers. Is the PCSR the serdes phy or 
registers in addition to that?

Rob

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Hannes Frederic Sowa @ 2016-12-22 21:38 UTC (permalink / raw)
  To: George Spelvin, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <20161222211140.2816.qmail@ns.sciencehorizons.net>

On 22.12.2016 22:11, George Spelvin wrote:
>> I do tend to like Ted's version in which we use batched
>> get_random_bytes() output.  If it's fast enough, it's simpler and lets
>> us get the full strength of a CSPRNG.
> 
> With the ChaCha20 generator, that's fine, although note that this abandons
> anti-backtracking entirely.
> 
> It also takes locks, something the previous get_random_int code
> path avoided.  Do we need to audit the call sites to ensure that's safe?

We have spin_lock_irq* locks on the way. Of course they can hurt in when
contended. The situation should be the same as with get_random_bytes,
callable from every possible situation in the kernel without fear, so
this should also work for get_random_int. A lockdep test should still be
done. ;)

> And there is the issue that the existing callers assume that there's a
> fixed cost per word.  A good half of get_random_long calls are followed by
> "& ~PAGE_MASK" to extract the low 12 bits.  Or "& ((1ul << mmap_rnd_bits)
> - 1)" to extract the low 28.  If we have a buffer we're going to have to
> pay to refill, it would be nice to use less than 8 bytes to satisfy those.
> 
> But that can be a followup patch.  I'm thinking
> 
> unsigned long get_random_bits(unsigned bits)
> 	E.g. get_random_bits(PAGE_SHIFT),
> 	     get_random_bits(mmap_rnd_bits),
> 	u32 imm_rnd = get_random_bits(32)
> 
> unsigned get_random_mod(unsigned modulus)
> 	E.g. get_random_mod(hole) & ~(alignment - 1);
> 	     get_random_mod(port_scan_backoff)
> 	(Althogh probably drivers/s390/scsi/zfcp_fc.c should be changed
> 	to prandom.)
> 
> with, until the audit is completed:
> #define get_random_int() get_random_bits(32)
> #define get_random_long() get_random_bits(BITS_PER_LONG)

Yes, that does look nice indeed. Accounting for bits instead of bytes
shouldn't be a huge problem either. Maybe it gets a bit more verbose in
case you can't satisfy a request with one batched entropy block and have
to consume randomness from two.

>> It could only mix the output back in every two calls, in which case
>> you can backtrack up to one call but you need to do 2^128 work to
>> backtrack farther.  But yes, this is getting excessively complicated.
> 
> No, if you're willing to accept limited backtrack, this is a perfectly
> acceptable solution, and not too complicated.  You could do it phase-less
> if you like; store the previous output, then after generating the new
> one, mix in both.  Then overwrite the previous output.  (But doing two
> rounds of a crypto primtive to avoid one conditional jump is stupid,
> so forget that.)

Can you quickly explain why we lose the backtracking capability?

ChaCha as a block cipher gives a "perfect" permutation from the output
of either the CRNG or the CPRNG, which actually itself has backtracking
protection.

Thanks for explaining,
Hannes

^ permalink raw reply

* [PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
never set it again.  Which means that in the future if we end up adding a bunch
of reuseport sk's to that tb we'll have to do the expensive scan every time.
Instead add a sock_common to the tb so we know what reuseport sk succeeded last.
Once one sk has made it onto the list we know that there are no potential bind
conflicts on the owners list that match that sk's rcv_addr.  So copy the sk's
common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT so
we know we have to do an extra check for subsequent reuseport sockets and skip
the expensive bind conflict check.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/net/inet_hashtables.h   |  4 ++++
 net/ipv4/inet_connection_sock.c | 53 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 756ed16..4ccc18f 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -74,12 +74,16 @@ struct inet_ehash_bucket {
  * users logged onto your box, isn't it nice to know that new data
  * ports are created in O(1) time?  I thought so. ;-)	-DaveM
  */
+#define FASTREUSEPORT_ANY	1
+#define FASTREUSEPORT_STRICT	2
+
 struct inet_bind_bucket {
 	possible_net_t		ib_net;
 	unsigned short		port;
 	signed char		fastreuse;
 	signed char		fastreuseport;
 	kuid_t			fastuid;
+	struct sock_common	fastsock;
 	int			num_owners;
 	struct hlist_node	node;
 	struct hlist_head	owners;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 923bd60..814382f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -236,6 +236,32 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *
 	return head;
 }
 
+static inline int sk_reuseport_match(struct inet_bind_bucket *tb,
+				     struct sock *sk)
+{
+	struct sock *sk2 = (struct sock *)&tb->fastsock;
+	kuid_t uid = sock_i_uid(sk);
+
+	if (tb->fastreuseport <= 0)
+		return 0;
+	if (!sk->sk_reuseport)
+		return 0;
+	if (rcu_access_pointer(sk->sk_reuseport_cb))
+		return 0;
+	if (!uid_eq(tb->fastuid, uid))
+		return 0;
+	/* We only need to check the rcv_saddr if this tb was once marked
+	 * without fastreuseport and then was reset, as we can only know that
+	 * the fastsock has no potential bind conflicts with the rest of the
+	 * possible socks on the owners list.
+	 */
+	if (tb->fastreuseport == FASTREUSEPORT_ANY)
+		return 1;
+	if (!inet_rcv_saddr_equal(sk, sk2, true))
+		return 0;
+	return 1;
+}
+
 /* Obtain a reference to a local port for the given sock,
  * if snum is zero it means select any available local port.
  * We try to allocate an odd port (and leave even ports for connect())
@@ -275,9 +301,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			goto success;
 
 		if ((tb->fastreuse > 0 && reuse) ||
-		     (tb->fastreuseport > 0 &&
-		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
+		    sk_reuseport_match(tb, sk))
 			goto success;
 		if (inet_csk_bind_conflict(sk, tb, true, true))
 			goto fail_unlock;
@@ -288,14 +312,35 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = 1;
 			tb->fastuid = uid;
+			memcpy(&tb->fastsock, &sk->__sk_common,
+			       sizeof(struct sock_common));
 		} else {
 			tb->fastreuseport = 0;
 		}
 	} else {
 		if (!reuse)
 			tb->fastreuse = 0;
-		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+		if (sk->sk_reuseport) {
+			/* We didn't match or we don't have fastreuseport set on
+			 * the tb, but we have sk_reuseport set on this socket
+			 * and we know that there are no bind conflicts with
+			 * this socket in this tb, so reset our tb's reuseport
+			 * settings so that any subsequent sockets that match
+			 * our current socket will be put on the fast path.
+			 *
+			 * If we reset we need to set FASTREUSEPORT_STRICT so we
+			 * do extra checking for all subsequent sk_reuseport
+			 * socks.
+			 */
+			if (!sk_reuseport_match(tb, sk)) {
+				tb->fastreuseport = FASTREUSEPORT_STRICT;
+				tb->fastuid = uid;
+				memcpy(&tb->fastsock, &sk->__sk_common,
+				       sizeof(struct sock_common));
+			}
+		} else {
 			tb->fastreuseport = 0;
+		}
 	}
 	if (!inet_csk(sk)->icsk_bind_hash)
 		inet_bind_hash(sk, tb, port);
-- 
2.5.5

^ permalink raw reply related

* [PATCH 3/6 net-next] inet: kill smallest_size and smallest_port
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

In inet_csk_get_port we seem to be using smallest_port to figure out where the
best place to look for a SO_REUSEPORT sk that matches with an existing set of
SO_REUSEPORT's.  However if we get to the logic

if (smallest_size != -1) {
	port = smallest_port;
	goto have_port;
}

we will do a useless search, because we would have already done the
inet_csk_bind_conflict for that port and it would have returned 1, otherwise we
would have gone to found_tb and succeeded.  Since this logic makes us do yet
another trip through inet_csk_bind_conflict for a port we know won't work just
delete this code and save us the time.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 26 ++++----------------------
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a1c9055..d352366 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -165,7 +165,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
 	int ret = 1, attempts = 5, port = snum;
-	int smallest_size = -1, smallest_port;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
@@ -175,7 +174,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	bool reuseport_ok = !!snum;
 
 	if (port) {
-have_port:
 		head = &hinfo->bhash[inet_bhashfn(net, port,
 						  hinfo->bhash_size)];
 		spin_lock_bh(&head->lock);
@@ -209,8 +207,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 	 * We do the opposite to not pollute connect() users.
 	 */
 	offset |= 1U;
-	smallest_size = -1;
-	smallest_port = low; /* avoid compiler warning */
 
 other_parity_scan:
 	port = low + offset;
@@ -224,15 +220,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
-				if (((tb->fastreuse > 0 && reuse) ||
-				     (tb->fastreuseport > 0 &&
-				      sk->sk_reuseport &&
-				      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-				      uid_eq(tb->fastuid, uid))) &&
-				    (tb->num_owners < smallest_size || smallest_size == -1)) {
-					smallest_size = tb->num_owners;
-					smallest_port = port;
-				}
 				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
 					goto tb_found;
 				goto next_port;
@@ -243,10 +230,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		cond_resched();
 	}
 
-	if (smallest_size != -1) {
-		port = smallest_port;
-		goto have_port;
-	}
 	offset--;
 	if (!(offset & 1))
 		goto other_parity_scan;
@@ -268,19 +251,18 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		if (sk->sk_reuse == SK_FORCE_REUSE)
 			goto success;
 
-		if (((tb->fastreuse > 0 && reuse) ||
+		if ((tb->fastreuse > 0 && reuse) ||
 		     (tb->fastreuseport > 0 &&
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-		      sk->sk_reuseport && uid_eq(tb->fastuid, uid))) &&
-		    smallest_size == -1)
+		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
 		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
 			if ((reuse ||
 			     (tb->fastreuseport > 0 &&
 			      sk->sk_reuseport &&
 			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-			      uid_eq(tb->fastuid, uid))) &&
-			    !snum && smallest_size != -1 && --attempts >= 0) {
+			      uid_eq(tb->fastuid, uid))) && !snum &&
+			    --attempts >= 0) {
 				spin_unlock_bh(&head->lock);
 				goto again;
 			}
-- 
2.5.5

^ permalink raw reply related

* [PATCH 5/6 net-next] inet: split inet_csk_get_port into two functions
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

inet_csk_get_port does two different things, it either scans for an open port,
or it tries to see if the specified port is available for use.  Since these two
operations have different rules and are basically independent lets split them
into two different functions to make them both more readable.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 66 +++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f6a34bc..923bd60 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -156,33 +156,21 @@ static int inet_csk_bind_conflict(const struct sock *sk,
 	return sk2 != NULL;
 }
 
-/* Obtain a reference to a local port for the given sock,
- * if snum is zero it means select any available local port.
- * We try to allocate an odd port (and leave even ports for connect())
+/*
+ * Find an open port number for the socket.  Returns with the
+ * inet_bind_hashbucket lock held.
  */
-int inet_csk_get_port(struct sock *sk, unsigned short snum)
+static struct inet_bind_hashbucket *
+inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret)
 {
-	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-	int ret = 1, port = snum;
+	int port = 0;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
 	struct inet_bind_bucket *tb;
-	kuid_t uid = sock_i_uid(sk);
 	u32 remaining, offset;
-	bool reuseport_ok = !!snum;
 
-	if (port) {
-		head = &hinfo->bhash[inet_bhashfn(net, port,
-						  hinfo->bhash_size)];
-		spin_lock_bh(&head->lock);
-		inet_bind_bucket_for_each(tb, &head->chain)
-			if (net_eq(ib_net(tb), net) && tb->port == port)
-				goto tb_found;
-
-		goto tb_not_found;
-	}
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -221,11 +209,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
 				if (hlist_empty(&tb->owners))
 					goto success;
-				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
+				if (!inet_csk_bind_conflict(sk, tb, false, false))
 					goto success;
 				goto next_port;
 			}
-		goto tb_not_found;
+		tb = NULL;
+		goto success;
 next_port:
 		spin_unlock_bh(&head->lock);
 		cond_resched();
@@ -240,8 +229,41 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		attempt_half = 2;
 		goto other_half_scan;
 	}
-	return ret;
+	return NULL;
+success:
+	*port_ret = port;
+	*tb_ret = tb;
+	return head;
+}
+
+/* Obtain a reference to a local port for the given sock,
+ * if snum is zero it means select any available local port.
+ * We try to allocate an odd port (and leave even ports for connect())
+ */
+int inet_csk_get_port(struct sock *sk, unsigned short snum)
+{
+	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
+	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
+	int ret = 1, port = snum;
+	struct inet_bind_hashbucket *head;
+	struct net *net = sock_net(sk);
+	struct inet_bind_bucket *tb = NULL;
+	kuid_t uid = sock_i_uid(sk);
 
+	if (!port) {
+		head = inet_csk_find_open_port(sk, &tb, &port);
+		if (!head)
+			return ret;
+		if (!tb)
+			goto tb_not_found;
+		goto success;
+	}
+	head = &hinfo->bhash[inet_bhashfn(net, port,
+					  hinfo->bhash_size)];
+	spin_lock_bh(&head->lock);
+	inet_bind_bucket_for_each(tb, &head->chain)
+		if (net_eq(ib_net(tb), net) && tb->port == port)
+			goto tb_found;
 tb_not_found:
 	tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep,
 				     net, head, port);
@@ -257,7 +279,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
-		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
+		if (inet_csk_bind_conflict(sk, tb, true, true))
 			goto fail_unlock;
 	}
 success:
-- 
2.5.5

^ permalink raw reply related

* [PATCH 4/6 net-next] inet: don't check for bind conflicts twice when searching for a port
From: Josef Bacik @ 2016-12-22 21:26 UTC (permalink / raw)
  To: davem, hannes, kraigatgoog, eric.dumazet, tom, netdev,
	kernel-team
In-Reply-To: <1482441998-28359-1-git-send-email-jbacik@fb.com>

This is just wasted time, we've already found a tb that doesn't have a bind
conflict, and we don't drop the head lock so scanning again isn't going to give
us a different answer.  Instead move the tb->reuse setting logic outside of the
found_tb path and put it in the success: path.  Then make it so that we don't
goto again if we find a bind conflict in the found_tb path as we won't reach
this anymore when we are scanning for an ephemeral port.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 net/ipv4/inet_connection_sock.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index d352366..f6a34bc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -164,7 +164,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 {
 	bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN;
 	struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo;
-	int ret = 1, attempts = 5, port = snum;
+	int ret = 1, port = snum;
 	struct inet_bind_hashbucket *head;
 	struct net *net = sock_net(sk);
 	int i, low, high, attempt_half;
@@ -183,7 +183,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 
 		goto tb_not_found;
 	}
-again:
 	attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0;
 other_half_scan:
 	inet_get_local_port_range(net, &low, &high);
@@ -220,8 +219,10 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		spin_lock_bh(&head->lock);
 		inet_bind_bucket_for_each(tb, &head->chain)
 			if (net_eq(ib_net(tb), net) && tb->port == port) {
+				if (hlist_empty(&tb->owners))
+					goto success;
 				if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok))
-					goto tb_found;
+					goto success;
 				goto next_port;
 			}
 		goto tb_not_found;
@@ -256,23 +257,11 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		      !rcu_access_pointer(sk->sk_reuseport_cb) &&
 		      sk->sk_reuseport && uid_eq(tb->fastuid, uid)))
 			goto success;
-		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) {
-			if ((reuse ||
-			     (tb->fastreuseport > 0 &&
-			      sk->sk_reuseport &&
-			      !rcu_access_pointer(sk->sk_reuseport_cb) &&
-			      uid_eq(tb->fastuid, uid))) && !snum &&
-			    --attempts >= 0) {
-				spin_unlock_bh(&head->lock);
-				goto again;
-			}
+		if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok))
 			goto fail_unlock;
-		}
-		if (!reuse)
-			tb->fastreuse = 0;
-		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
-			tb->fastreuseport = 0;
-	} else {
+	}
+success:
+	if (!hlist_empty(&tb->owners)) {
 		tb->fastreuse = reuse;
 		if (sk->sk_reuseport) {
 			tb->fastreuseport = 1;
@@ -280,8 +269,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum)
 		} else {
 			tb->fastreuseport = 0;
 		}
+	} else {
+		if (!reuse)
+			tb->fastreuse = 0;
+		if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid))
+			tb->fastreuseport = 0;
 	}
-success:
 	if (!inet_csk(sk)->icsk_bind_hash)
 		inet_bind_hash(sk, tb, port);
 	WARN_ON(inet_csk(sk)->icsk_bind_hash != tb);
-- 
2.5.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox