Netdev List


* [PATCH] net: wireless: brcmsmac: Remove unnecessary parentheses
From: Varsha Rao @ 2018-06-01  2:14 UTC
  To: Nicholas Mc Guire, Lukas Bulwahn, Arend van Spriel, Franky Lin,
	Hante Meuleman, Chi-Hsien Lin, Wright Feng, Kalle Valo,
	David S. Miller, linux-wireless, brcm80211-dev-list.pdl,
	brcm80211-dev-list, netdev, linux-kernel
  Cc: Varsha Rao

This patch fixes the clang warning about extraneous parentheses, using
the following coccinelle script.

@@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s
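
For context, a minimal sketch of the code shape the script produces. clang
reports an equality comparison wrapped in a second, redundant pair of
parentheses under -Wparentheses-equality; the script strips the inner
pair. The function name and the register value below are hypothetical
stand-ins, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ID value; the real RADIO_IDCODE is defined in the driver. */
#define EX_RADIO_IDCODE 0x01u

/*
 * Shape of the condition after the semantic patch is applied.
 * Before the patch it read: if ((addr == EX_RADIO_IDCODE))
 */
uint16_t ex_read_radio_reg(uint16_t addr)
{
	if (addr == EX_RADIO_IDCODE)
		return 0xffff;

	return 0;
}
```

The behavior is identical with or without the inner parentheses; only the
warning goes away.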

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
index 3a13d176b221..35e3b101e5cf 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_cmn.c
@@ -159,7 +159,7 @@ u16 read_radio_reg(struct brcms_phy *pi, u16 addr)
 {
 	u16 data;
 
-	if ((addr == RADIO_IDCODE))
+	if (addr == RADIO_IDCODE)
 		return 0xffff;
 
 	switch (pi->pubpi.phy_type) {
-- 
2.17.0


* Re: [PATCH V4] mlx4_core: allocate ICM memory in page size chunks
From: Qing Huang @ 2018-06-01  2:04 UTC
  To: Michal Hocko, Eric Dumazet
  Cc: David Miller, tariqt, haakon.bugge, yanjun.zhu, netdev,
	linux-rdma, linux-kernel, gi-oh.kim, santosh.shilimkar@oracle.com
In-Reply-To: <20180531091022.GL15278@dhcp22.suse.cz>



On 5/31/2018 2:10 AM, Michal Hocko wrote:
> On Thu 31-05-18 10:55:32, Michal Hocko wrote:
>> On Thu 31-05-18 04:35:31, Eric Dumazet wrote:
> [...]
>>> I merely copied/pasted from alloc_skb_with_frags() :/
>> I will have a look at it. Thanks!
> OK, so this is an example of an incremental development ;).
>
> __GFP_NORETRY was added by ed98df3361f0 ("net: use __GFP_NORETRY for
> high order allocations") to prevent from OOM killer. Yet this was
> not enough because fb05e7a89f50 ("net: don't wait for order-3 page
> allocation") didn't want an excessive reclaim for non-costly orders
> so it made it completely NOWAIT while it preserved __GFP_NORETRY in
> place which is now redundant. Should I send a patch?
>

Just curious, how about the GFP_ATOMIC flag? Would it work in a similar
fashion? We experimented with it a bit in the past, but it seemed to
cause other issues in our tests. :-)

By the way, we didn't encounter any OOM killer events. It seemed that
mlx4_alloc_icm() triggered the slowpath. We still had about 2GB of free
memory, but it was highly fragmented.

  #0 [ffff8801f308b380] remove_migration_pte at ffffffff811f0e0b
  #1 [ffff8801f308b3e0] rmap_walk_file at ffffffff811cb890
  #2 [ffff8801f308b440] rmap_walk at ffffffff811cbaf2
  #3 [ffff8801f308b450] remove_migration_ptes at ffffffff811f0db0
  #4 [ffff8801f308b490] __unmap_and_move at ffffffff811f2ea6
  #5 [ffff8801f308b4e0] unmap_and_move at ffffffff811f2fc5
  #6 [ffff8801f308b540] migrate_pages at ffffffff811f3219
  #7 [ffff8801f308b5c0] compact_zone at ffffffff811b707e
  #8 [ffff8801f308b650] compact_zone_order at ffffffff811b735d
  #9 [ffff8801f308b6e0] try_to_compact_pages at ffffffff811b7485
#10 [ffff8801f308b770] __alloc_pages_direct_compact at ffffffff81195f96
#11 [ffff8801f308b7b0] __alloc_pages_slowpath at ffffffff811978a1
#12 [ffff8801f308b890] __alloc_pages_nodemask at ffffffff81197ec1
#13 [ffff8801f308b970] alloc_pages_current at ffffffff811e261f
#14 [ffff8801f308b9e0] mlx4_alloc_icm at ffffffffa01f39b2 [mlx4_core]

Thanks!


* [PATCH] net: ethernet: mlx4: Remove unnecessary parentheses
From: Varsha Rao @ 2018-06-01  2:00 UTC
  To: Tariq Toukan, David S. Miller, Nicholas Mc Guire, Lukas Bulwahn,
	netdev, linux-rdma, linux-kernel
  Cc: Varsha Rao

This patch fixes the clang warning about extraneous parentheses, using
the following coccinelle script.

@@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx4/port.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 3ef3406ff4cb..10fcc22f4590 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -614,9 +614,9 @@ int __mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan,
 		int index_at_dup_port = -1;
 
 		for (i = MLX4_VLAN_REGULAR; i < MLX4_MAX_VLAN_NUM; i++) {
-			if ((vlan == (MLX4_VLAN_MASK & be32_to_cpu(table->entries[i]))))
+			if (vlan == (MLX4_VLAN_MASK & be32_to_cpu(table->entries[i])))
 				index_at_port = i;
-			if ((vlan == (MLX4_VLAN_MASK & be32_to_cpu(dup_table->entries[i]))))
+			if (vlan == (MLX4_VLAN_MASK & be32_to_cpu(dup_table->entries[i])))
 				index_at_dup_port = i;
 		}
 		/* check that same vlan is not in the tables at different indices */
-- 
2.17.0


* Re: [PATCH v2 net] mlx4_core: restore optimal ICM memory allocation
From: Qing Huang @ 2018-06-01  1:51 UTC
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir,
	Daniel Jurgens, Zhu Yanjun
In-Reply-To: <20180531125224.97098-1-edumazet@google.com>



On 5/31/2018 5:52 AM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought two regressions caught in our regression suite.
>
> The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
> or 6.25 % which is unacceptable since ICM can be pretty large.
>
> This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
> per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
> (instead of prior 256KB)
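
The overhead figure in the quoted text can be checked with a small
sketch. It assumes, as the quoted message states, one 256-byte
bookkeeping structure per chunk; the helper name is hypothetical and the
result is expressed in basis points (hundredths of a percent) to stay in
integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Per-chunk bookkeeping overhead in basis points (1 bp = 0.01 %).
 * meta_bytes:  size of the per-chunk tracking structure
 * chunk_bytes: size of the memory chunk it describes
 */
unsigned long ex_icm_overhead_bp(size_t meta_bytes, size_t chunk_bytes)
{
	assert(chunk_bytes != 0);
	return (unsigned long)(meta_bytes * 10000 / chunk_bytes);
}
```

With 4KB chunks this gives 625 bp, i.e. the 6.25 % quoted above; with
256KB chunks the same structure costs about 9 bp (under 0.1 %).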

It would be great if you could share the test case that triggered the
KASAN report in your environment. Our QA has been running intensive
tests using an 8KB or 4KB chunk size configuration for some time, and
no one has reported memory corruption issues so far, although we are
not running the latest upstream kernel.

IMO, it's worthwhile to find out the root cause and fix the problem.


>
> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fallback to low-order allocations under high memory pressure.
>
> Most of these allocations happen right after boot time, when we get
> plenty of non fragmented memory, there is really no point being so
> pessimistic and break huge pages into order-0 ones just for fun.
>
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.

Just FYI, out of memory wasn't our original concern. We didn't encounter 
OOM killings.
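
The gfp_mask tweak described in the quoted commit message can be
sketched in user space as follows. The flag bits here are hypothetical
placeholders, not the kernel's real gfp_t values; only the masking logic
mirrors the quoted patch.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration only. */
#define EX_GFP_DIRECT_RECLAIM 0x1u
#define EX_GFP_KSWAPD_RECLAIM 0x2u

/*
 * For high-order attempts (order > 0), drop direct reclaim so a failed
 * attempt returns quickly and the caller can fall back to a lower
 * order. Order-0 attempts keep the caller's mask unchanged.
 */
unsigned int ex_mask_for_order(unsigned int gfp_mask, int order)
{
	unsigned int mask = gfp_mask;

	if (order)
		mask &= ~EX_GFP_DIRECT_RECLAIM;

	return mask;
}
```

The point of the design is that only the last-resort order-0 allocation
is allowed to block in reclaim or compaction.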


>
> The second regression is a KASAN fault that will need further investigation.
>
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Tariq Toukan <tariqt@mellanox.com>
> Cc: John Sperbeck <jsperbeck@google.com>
> Cc: Tarick Bedeir <tarick@google.com>
> Cc: Qing Huang <qing.huang@oracle.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>
> Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..5342bd8a3d0bfaa9e76bb9b6943790606c97b181 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> -	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   	struct mlx4_icm *icm;
>   	struct mlx4_icm_chunk *chunk = NULL;
>   	int cur_order;
> +	gfp_t mask;
>   	int ret;
>   
>   	/* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,17 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   		while (1 << cur_order > npages)
>   			--cur_order;
>   
> +		mask = gfp_mask;
> +		if (cur_order)
> +			mask &= ~__GFP_DIRECT_RECLAIM;
> +
>   		if (coherent)
>   			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
>   						      &chunk->mem[chunk->npages],
> -						      cur_order, gfp_mask);
> +						      cur_order, mask);
>   		else
>   			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> -						   cur_order, gfp_mask,
> +						   cur_order, mask,
>   						   dev->numa_node);
>   
>   		if (ret) {


* [PATCH net-next v2 2/2] qed: Fix use of incorrect shmem address.
From: Sudarsana Reddy Kalluru @ 2018-06-01  1:47 UTC
  To: davem; +Cc: netdev, Ariel.Elior, Michal.Kalderon
In-Reply-To: <20180601014737.6164-1-sudarsana.kalluru@cavium.com>

An incorrect shared memory address is used when deriving the values
for tc and pri_type. Use the shmem address corresponding to 'oem_cfg_func',
where the management firmware saves the tc/pri_type values.
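
A minimal sketch of the mask-and-shift extraction involved. The mask and
offset values in the comments and the helper name are hypothetical; the
real OEM_CFG_FUNC_* definitions live in qed_hsi.h. The bug class is that
the same extraction applied to the wrong source word silently returns
unrelated bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract a bit field: keep the bits selected by mask, then shift them
 * down so the field starts at bit 0.
 */
uint32_t ex_get_field(uint32_t word, uint32_t mask, unsigned int offset)
{
	return (word & mask) >> offset;
}
```

For example, with a hypothetical 4-bit field at offset 0 and a 2-bit
field at offset 4, the word 0x25 decodes to 5 and 2 respectively, while a
different word passed by mistake would decode to whatever happens to sit
in those bit positions.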

Fixes: cac6f691 ("qed: Add support for Unified Fabric Port")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index 2612e3e..6f9927d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -1514,9 +1514,10 @@ void qed_mcp_read_ufp_config(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)
 	}
 
 	qed_mcp_get_shmem_func(p_hwfn, p_ptt, &shmem_info, MCP_PF_ID(p_hwfn));
-	val = (port_cfg & OEM_CFG_FUNC_TC_MASK) >> OEM_CFG_FUNC_TC_OFFSET;
+	val = (shmem_info.oem_cfg_func & OEM_CFG_FUNC_TC_MASK) >>
+		OEM_CFG_FUNC_TC_OFFSET;
 	p_hwfn->ufp_info.tc = (u8)val;
-	val = (port_cfg & OEM_CFG_FUNC_HOST_PRI_CTRL_MASK) >>
+	val = (shmem_info.oem_cfg_func & OEM_CFG_FUNC_HOST_PRI_CTRL_MASK) >>
 		OEM_CFG_FUNC_HOST_PRI_CTRL_OFFSET;
 	if (val == OEM_CFG_FUNC_HOST_PRI_CTRL_VNIC) {
 		p_hwfn->ufp_info.pri_type = QED_UFP_PRI_VNIC;
-- 
1.8.3.1


* [PATCH net-next v2 1/2] qed: Fix shared memory inconsistency between driver and the MFW.
From: Sudarsana Reddy Kalluru @ 2018-06-01  1:47 UTC
  To: davem; +Cc: netdev, Ariel.Elior, Michal.Kalderon
In-Reply-To: <20180601014737.6164-1-sudarsana.kalluru@cavium.com>

The structure shared between the driver and the management firmware (MFW)
differs in size between the two. The additional field defined by the MFW
is not relevant to the current driver. Add a dummy field to the structure.
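
Why a placeholder field is enough can be sketched with three toy
layouts. The struct and field names are hypothetical and much smaller
than the real struct public_port; the sketch only shows how a missing
field shifts every later member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical firmware-side layout with one extra field. */
struct ex_fw_port {
	uint32_t eee_remote;
	uint32_t extra;         /* field the driver has no use for */
	uint32_t oem_cfg_port;
};

/* Driver layout without the field: oem_cfg_port sits 4 bytes too early. */
struct ex_drv_port_old {
	uint32_t eee_remote;
	uint32_t oem_cfg_port;
};

/* Driver layout with a placeholder: offsets line up again. */
struct ex_drv_port_new {
	uint32_t eee_remote;
	uint32_t reserved1;
	uint32_t oem_cfg_port;
};

/* Returns 1 when the given driver-side offset matches the firmware's. */
int ex_offset_matches_fw(size_t drv_offset)
{
	return drv_offset == offsetof(struct ex_fw_port, oem_cfg_port);
}
```

Reading oem_cfg_port through the old layout would return the contents of
the firmware's extra field instead.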

Fixes: cac6f691 ("qed: Add support for Unified Fabric Port")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_hsi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 8e1e6e1..beba930 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -11996,6 +11996,7 @@ struct public_port {
 #define EEE_REMOTE_TW_RX_MASK   0xffff0000
 #define EEE_REMOTE_TW_RX_OFFSET 16

+	u32 reserved1;
 	u32 oem_cfg_port;
 #define OEM_CFG_CHANNEL_TYPE_MASK                       0x00000003
 #define OEM_CFG_CHANNEL_TYPE_OFFSET                     0
-- 
1.8.3.1


* [PATCH net-next v2 0/2] qed: Fix issues in UFP feature commit 'cac6f691'.
From: Sudarsana Reddy Kalluru @ 2018-06-01  1:47 UTC
  To: davem; +Cc: netdev, Ariel.Elior, Michal.Kalderon, Sudarsana Reddy Kalluru

From: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>

This patch series fixes a couple of issues in the UFP feature commit,
   cac6f691: Add support for Unified Fabric Port.

Changes from previous version:
------------------------------
v2: Added "Fixes:" tag.

Please consider applying it to "net-next".

Sudarsana Reddy Kalluru (2):
  qed: Fix shared memory inconsistency between driver and the MFW.
  qed: Fix use of incorrect shmem address.

 drivers/net/ethernet/qlogic/qed/qed_hsi.h | 1 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
1.8.3.1


* Re: [PATCH net-next] net: phy: consider PHY_IGNORE_INTERRUPT in state machine PHY_NOLINK handling
From: David Miller @ 2018-06-01  1:26 UTC
  To: hkallweit1; +Cc: f.fainelli, andrew, netdev
In-Reply-To: <0a4e472d-cb7f-ef1f-420c-1327fa41e8cd@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 30 May 2018 22:13:20 +0200

> We can bail out immediately also in case of PHY_IGNORE_INTERRUPT because
> phy_mac_interrupt() informs us once the link is up.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

When state is PHY_NOLINK, the phy_mac_interrupt() code paths
will change the state to PHY_CHANGELINK before queueing up
the state machine invocation.

So I can't even see how we can enter phy_state_machine with
->state == PHY_NOLINK if the mac interrupt paths are being
used properly.
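
The argument can be modelled with a toy state machine. This is only a
sketch of the behaviour described in this message, not the phylib code:
all names are hypothetical, and the only rule taken from the text is that
the MAC interrupt path sets the change-link state before the state
machine runs.

```c
#include <assert.h>

/* Simplified stand-ins for the PHY states named in the message. */
enum ex_phy_state { EX_PHY_NOLINK, EX_PHY_CHANGELINK, EX_PHY_RUNNING };

struct ex_phy {
	enum ex_phy_state state;
	int link_up;     /* link status as reported by the MAC */
	int saw_nolink;  /* how often the machine was entered in NOLINK */
};

/* Toy state machine: counts NOLINK entries and resolves CHANGELINK. */
void ex_state_machine(struct ex_phy *phy)
{
	if (phy->state == EX_PHY_NOLINK)
		phy->saw_nolink++;
	else if (phy->state == EX_PHY_CHANGELINK)
		phy->state = phy->link_up ? EX_PHY_RUNNING : EX_PHY_NOLINK;
}

/* MAC interrupt path as described: set CHANGELINK, then run the machine. */
void ex_mac_interrupt(struct ex_phy *phy, int link_up)
{
	phy->link_up = link_up;
	phy->state = EX_PHY_CHANGELINK;
	ex_state_machine(phy);
}
```

In this model the machine is never entered in the NOLINK state as long as
every invocation comes through the interrupt path, which is the
observation behind the question below.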

Therefore it looks like the code as written is harmless.

Did you actually hit a problem with this test or is this
a change based purely upon code inspection?


* Re: [PATCH bpf-next] bpf: prevent non-IPv4 socket to be added into sock hash
From: Eric Dumazet @ 2018-06-01  1:00 UTC
  To: John Fastabend; +Cc: Wei Wang, netdev, Willem de Bruijn
In-Reply-To: <c6aae1fa-3ad1-9192-3e99-e177b1098f06@gmail.com>

On Thu, May 31, 2018 at 7:32 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
>
> Hi Wei,
>
> Thanks for the report and fix. It would be better to fix the
> root cause so that IPv6 works as intended.
>
> I'm testing the following now,
>
> Author: John Fastabend <john.fastabend@gmail.com>
> Date:   Thu May 31 14:38:59 2018 -0700
>
>     sockmap: fix crash when ipv6 sock is added by adding support for IPv6
>
>     Apparently we had a testing escape and missed IPv6. This fixes a crash
>     where we assign tcp_prot to IPv6 sockets instead of tcpv6_prot.
>
>     Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>

Hi John

In any case, please forward correct attribution for Wei's work, and
syzbot 'Reported-by'

Are you sure you are handling IPv4-mapped IPv6 sockets as well?

Thanks.


* Re: [PATCH net-next] net/ncsi: Avoid GFP_KERNEL in response handler
From: Samuel Mendoza-Jonas @ 2018-06-01  0:33 UTC
  To: Eric Dumazet, netdev; +Cc: David S . Miller, linux-kernel, openbmc
In-Reply-To: <69fcb143-00a2-2ddf-e2d4-c692b650f292@gmail.com>

On Thu, 2018-05-31 at 04:50 -0400, Eric Dumazet wrote:
> 
> On 05/31/2018 03:02 AM, Samuel Mendoza-Jonas wrote:
> > ncsi_rsp_handler_gc() allocates the filter arrays using GFP_KERNEL in
> > softirq context, causing the below backtrace. This allocation is only a
> > few dozen bytes during probing so allocate with GFP_ATOMIC instead.
> > 
> 
> Hi Samuel
> 
> You forgot to add
> 
> Fixes: 062b3e1b6d4f ("net/ncsi: Refactor MAC, VLAN filters")
> 
> size = (rsp->uc_cnt + rsp->mc_cnt + rsp->mixed_cnt) * ETH_ALEN;
> 
> -> seems to be able to reach more than few dozen bytes...

Hi Eric,

The NCSI spec (at least in the v1.1.0 version I'm looking at) sets the
total number of MAC address filters at 8, so we would be looking at a
maximum of 8 * ETH_ALEN = 48 bytes.
That said it shouldn't be too arduous to move the allocation to later in
the probe/configure cycle so if needed we could do that.
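
The size calculation can be sketched as below. The helper name is
hypothetical; ETH_ALEN is 6 bytes, and the total of 8 filters is the
figure cited above from the NCSI v1.1.0 spec. The split between unicast,
multicast and mixed filters in the example is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define EX_ETH_ALEN 6  /* bytes in an Ethernet MAC address */

/* Bytes needed for the MAC filter array, mirroring the quoted formula. */
size_t ex_filter_bytes(unsigned int uc_cnt, unsigned int mc_cnt,
		       unsigned int mixed_cnt)
{
	return (size_t)(uc_cnt + mc_cnt + mixed_cnt) * EX_ETH_ALEN;
}
```

For any split that adds up to 8 filters the result is 48 bytes; a
response carrying larger counts than the spec allows would produce a
correspondingly larger allocation.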

> 
> Also, what prevents ncsi_rsp_handler_gc() from being called multiple times?
> 
> nc->mac_filter.addrs & nc->vlan_filter.vids would be re-allocated and memory would leak.
> 

Good point, we should put a check there just in case to see if it's
allocated. We should be safe though, as ncsi_rsp_handler_gc() should only
be called via ncsi_probe_channel(), which only happens through
ncsi_start_dev(), and addrs/vids is cleaned up in ncsi_remove_channel().
Rogue packets shouldn't hit the ncsi_rsp_handler_gc() handler without an
outstanding request, but it is probably safer to check regardless.
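
A user-space sketch of the kind of guard being discussed, assuming a
hypothetical filter structure and using calloc() in place of the kernel
allocator. If the array is already allocated, the handler keeps it
instead of overwriting the pointer and leaking the old buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a channel's MAC filter table. */
struct ex_filter {
	unsigned char *addrs;
	size_t n_bytes;
};

/* Returns 0 on success (or if already allocated), -1 on allocation failure. */
int ex_filter_alloc(struct ex_filter *f, size_t n_bytes)
{
	if (f->addrs)
		return 0;  /* already set up: keep the existing buffer */

	f->addrs = calloc(1, n_bytes);
	if (!f->addrs)
		return -1;

	f->n_bytes = n_bytes;
	return 0;
}
```

Whether a repeated response should be ignored, as here, or treated as an
error is a policy choice the sketch does not try to settle.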

Regards,
Sam


* Re: [PATCH bpf-next] xsk: temporarily disable AF_XDP
From: Björn Töpel @ 2018-06-01  0:24 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Netdev
  Cc: Björn Töpel, Karlsson, Magnus, Magnus Karlsson
In-Reply-To: <20180531001754.15923-1-bjorn.topel@gmail.com>

Den ons 30 maj 2018 kl 17:22 skrev Björn Töpel <bjorn.topel@gmail.com>:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Temporarily disable AF_XDP sockets, and hide uapi.
>
[...]

Alexei/Daniel,

Ignore this patch, please.


Thanks,
Björn


* Re: [PATCH net-next v2 0/2] net: phy: improve PHY suspend/resume
From: Andrew Lunn @ 2018-06-01  0:10 UTC
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <f48b0978-7891-487b-d2b1-3f23b269578c@gmail.com>

> Configuring the different WoL options isn't handled by writing to
> the PHY registers but by writing to chip / MAC registers.
> Therefore phy_suspend() isn't able to figure out whether WoL is
> enabled or not. Only the parent has the full picture.

Hi Heiner

I think you need to look at your different runtime PM domains. If I
understand the code right, you runtime suspend if there is no
link. But for this to work correctly, your PHY needs to keep working.
You also cannot assume all accesses to the PHY go via the MAC. Some
calls will go direct to the PHY, and they can trigger MDIO bus
accesses. So I think you need two runtime PM domains: MAC and MDIO
bus. Maybe just the pll? An MDIO bus is a device, so it can have its
own PM callbacks. It is not clear what you need to resume in order to
make MDIO work.

It might also help if you do the phy_connect in .ndo_open and
disconnect in .ndo_stop. This is a common pattern in drivers. But some
also do it in probe and remove.

     Andrew


* Re: suspicius csum initialization in vmxnet3_rx_csum
From: Neil Horman @ 2018-05-31 23:32 UTC
  To: Ronak Doshi; +Cc: Paolo Abeni, Guolin Yang, Boon Ang, Louis Luo, netdev
In-Reply-To: <alpine.OSX.2.21.1805311048380.47071@doshir-m01.vmware.com>

On Thu, May 31, 2018 at 11:02:34AM -0700, Ronak Doshi wrote:
> 
> On Wed, 30 May 2018, Paolo Abeni wrote:
> 
> > Hi,
> > 
> > On Thu, 2018-05-24 at 21:48 +0000, Guolin Yang wrote:
> > > Yes, that code  is not correct, we should fix that code
> > 
> > Did you have any chance to address the issue and/or to give a more
> > in-depth look to the change proposed in my initial email?
> >  
> Hi Paolo,
> 
> Can you provide the esx build you are using? It can be found using 
> "vmware -vl" on ESX host.
> 
> Did you try your proposed fix and did it work? Are you sure the packet
> hits the below if block and not the else block? I still don't think the
> ICMP packet will go through the below if block.
> 
>    if (gdesc->rcd.csum) {
> 	skb->csum = htons(gdesc->rcd.csum);
> 	skb->ip_summed = CHECKSUM_PARTIAL;
>    } else {
> 	skb_checksum_none_assert(skb);
>    }
> 
> The vmxnet3 emulation does not calculate rcd.csum for ICMP packet and
> hence should go through the else block i.e. checksum none.
> 
What packet types will rcd.csum be set for?
Neil

> Thanks,
> Ronak
> 


* Re: [PATCH bpf-next] bpf: prevent non-IPv4 socket to be added into sock hash
From: John Fastabend @ 2018-05-31 23:31 UTC
  To: Wei Wang, netdev; +Cc: Eric Dumazet, Willem de Bruijn
In-Reply-To: <20180530212928.190650-1-tracywwnj@gmail.com>

On 05/30/2018 02:29 PM, Wei Wang wrote:
> From: Wei Wang <weiwan@google.com>
> 
> Sock hash only supports IPv4 socket proto right now.
> If a non-IPv4 socket gets stored in the BPF map, sk->sk_prot gets
> overwritten with the v4 tcp prot.
> 
> Syzkaller reported the following related issue on an IPv6 socket:
> BUG: KASAN: slab-out-of-bounds in ip6_dst_idev include/net/ip6_fib.h:203 [inline]
> BUG: KASAN: slab-out-of-bounds in ip6_xmit+0x2002/0x23f0 net/ipv6/ip6_output.c:264
> Read of size 8 at addr ffff8801b300edb0 by task syz-executor888/4522
> 
> CPU: 0 PID: 4522 Comm: syz-executor888 Not tainted 4.17.0-rc4+ #17
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>  ip6_dst_idev include/net/ip6_fib.h:203 [inline]
>  ip6_xmit+0x2002/0x23f0 net/ipv6/ip6_output.c:264
>  inet6_csk_xmit+0x377/0x630 net/ipv6/inet6_connection_sock.c:139
>  tcp_transmit_skb+0x1be0/0x3e40 net/ipv4/tcp_output.c:1159
>  tcp_send_syn_data net/ipv4/tcp_output.c:3441 [inline]
>  tcp_connect+0x2207/0x45a0 net/ipv4/tcp_output.c:3480
>  tcp_v4_connect+0x1934/0x1d50 net/ipv4/tcp_ipv4.c:272
>  __inet_stream_connect+0x943/0x1120 net/ipv4/af_inet.c:655
>  tcp_sendmsg_fastopen net/ipv4/tcp.c:1162 [inline]
>  tcp_sendmsg_locked+0x2859/0x3ee0 net/ipv4/tcp.c:1209
>  tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
>  inet_sendmsg+0x19f/0x690 net/ipv4/af_inet.c:798
>  sock_sendmsg_nosec net/socket.c:629 [inline]
>  sock_sendmsg+0xd5/0x120 net/socket.c:639
>  ___sys_sendmsg+0x805/0x940 net/socket.c:2117
>  __sys_sendmsg+0x115/0x270 net/socket.c:2155
>  __do_sys_sendmsg net/socket.c:2164 [inline]
>  __se_sys_sendmsg net/socket.c:2162 [inline]
>  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x43ff99
> RSP: 002b:00007ffc00bd1cf8 EFLAGS: 00000217 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ff99
> RDX: 0000000020000000 RSI: 0000000020000580 RDI: 0000000000000003
> RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
> R10: 00000000004002c8 R11: 0000000000000217 R12: 00000000004018c0
> R13: 0000000000401950 R14: 0000000000000000 R15: 0000000000000000
> 
> Fixes: 81110384441a ("bpf: sockmap, add hash map support")
> Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> ---

Hi Wei,

Thanks for the report and fix. It would be better to fix the
root cause so that IPv6 works as intended.

I'm testing the following now,

Author: John Fastabend <john.fastabend@gmail.com>
Date:   Thu May 31 14:38:59 2018 -0700

    sockmap: fix crash when ipv6 sock is added by adding support for IPv6
    
    Apparently we had a testing escape and missed IPv6. This fixes a crash
    where we assign tcp_prot to IPv6 sockets instead of tcpv6_prot.
    
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 52a91d8..e191122 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -41,6 +41,7 @@
 #include <linux/mm.h>
 #include <net/strparser.h>
 #include <net/tcp.h>
+#include <net/transp_v6.h>
 #include <linux/ptr_ring.h>
 #include <net/inet_common.h>
 #include <linux/sched/signal.h>
@@ -162,6 +163,8 @@ static bool bpf_tcp_stream_read(const struct sock *sk)
 }
 
 static struct proto tcp_bpf_proto;
+static struct proto tcpv6_bpf_proto;
+
 static int bpf_tcp_init(struct sock *sk)
 {
        struct smap_psock *psock;
@@ -182,13 +185,21 @@ static int bpf_tcp_init(struct sock *sk)
        psock->sk_proto = sk->sk_prot;
 
        if (psock->bpf_tx_msg) {
+               tcpv6_bpf_proto.sendmsg = bpf_tcp_sendmsg;
+               tcpv6_bpf_proto.sendpage = bpf_tcp_sendpage;
+               tcpv6_bpf_proto.recvmsg = bpf_tcp_recvmsg;
+               tcpv6_bpf_proto.stream_memory_read = bpf_tcp_stream_read;
                tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
                tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
                tcp_bpf_proto.recvmsg = bpf_tcp_recvmsg;
                tcp_bpf_proto.stream_memory_read = bpf_tcp_stream_read;
        }
 
-       sk->sk_prot = &tcp_bpf_proto;
+       if (sk->sk_family == AF_INET6)
+               sk->sk_prot = &tcpv6_bpf_proto;
+       else
+               sk->sk_prot = &tcp_bpf_proto;
+
        rcu_read_unlock();
        return 0;
 }
@@ -1113,6 +1124,8 @@ static int bpf_tcp_ulp_register(void)
 {
        tcp_bpf_proto = tcp_prot;
        tcp_bpf_proto.close = bpf_tcp_close;
+       tcpv6_bpf_proto = tcpv6_prot;
+       tcpv6_bpf_proto.close = bpf_tcp_close;
        /* Once BPF TX ULP is registered it is never unregistered. It
         * will be in the ULP list for the lifetime of the system. Doing
         * duplicate registers is not a problem.


* [PATCH net-next 2/2] selftests: forwarding: mirror_vlan: Change test description
From: Petr Machata @ 2018-05-31 22:37 UTC
  To: netdev, linux-kselftest; +Cc: davem, shuah, idosch
In-Reply-To: <cover.1527805500.git.petrm@mellanox.com>

The test description is displayed with the PASS/FAIL resolution after
the test is run. However, there is already another test described
exactly like this, which makes it unclear which of the tests passed or
failed. Make the description unique.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 tools/testing/selftests/net/forwarding/mirror_vlan.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/mirror_vlan.sh b/tools/testing/selftests/net/forwarding/mirror_vlan.sh
index c22e949..efade19 100755
--- a/tools/testing/selftests/net/forwarding/mirror_vlan.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_vlan.sh
@@ -129,7 +129,7 @@ test_tagged_vlan_dir()
 				  192.0.2.17 192.0.2.18
 	mirror_uninstall $swp1 $direction
 
-	log_test "$direction mirror to vlan ($tcflags)"
+	log_test "$direction mirror tagged to vlan ($tcflags)"
 }
 
 test_tagged_vlan()
-- 
2.4.11


* [PATCH net-next 1/2] selftests: forwarding: mirror_vlan: Uninstall trap
From: Petr Machata @ 2018-05-31 22:37 UTC
  To: netdev, linux-kselftest; +Cc: davem, shuah, idosch
In-Reply-To: <cover.1527805500.git.petrm@mellanox.com>

Instead of installing a trap before tests run and uninstalling it after
they run, mirror_vlan.sh installs it twice due to a typo. Fix the typo.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 tools/testing/selftests/net/forwarding/mirror_vlan.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/forwarding/mirror_vlan.sh b/tools/testing/selftests/net/forwarding/mirror_vlan.sh
index 1e10520..c22e949 100755
--- a/tools/testing/selftests/net/forwarding/mirror_vlan.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_vlan.sh
@@ -146,7 +146,7 @@ test_all()
 
 	tests_run
 
-	trap_install $h3 ingress
+	trap_uninstall $h3 ingress
 	slow_path_trap_uninstall $swp1 egress
 	slow_path_trap_uninstall $swp1 ingress
 }
-- 
2.4.11


* [PATCH net-next 0/2] selftests: forwarding: mirror_vlan: Fixlets
From: Petr Machata @ 2018-05-31 22:37 UTC
  To: netdev, linux-kselftest; +Cc: davem, shuah, idosch

This patchset includes two small fixes for the tests that were
introduced in commit 1bb58d2d3cbe ("Merge branch
'Mirroring-tests-involving-VLAN'").

In patch #1, a "tc action trap" is uninstalled after the suite runs,
instead of being installed again.

In patch #2, a test in the suite is renamed to differentiate it from
another test of the same name.

Petr Machata (2):
  selftests: forwarding: mirror_vlan: Uninstall trap
  selftests: forwarding: mirror_vlan: Change test description

 tools/testing/selftests/net/forwarding/mirror_vlan.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.4.11


* Re: [PATCH v2 net] ixgbe: fix parsing of TC actions for HW offload
From: Jeff Kirsher @ 2018-05-31 21:46 UTC
  To: Ondřej Hlavatý; +Cc: netdev, Andrew Bowers
In-Reply-To: <20180531212104.23572-1-ohlavaty@redhat.com>


On Thu, 2018-05-31 at 23:21 +0200, Ondřej Hlavatý wrote:
> The previous code was optimistic, accepting the offload of whole action
> chain when there was a single known action (drop/redirect). This results
> in offloading a rule which should not be offloaded, because its behavior
> cannot be reproduced in the hardware.
> 
> For example:
> 
> $ tc filter add dev eno1 parent ffff: protocol ip \
>     u32 ht 800: order 1 match tcp src 42 FFFF \
>     action mirred egress mirror dev enp1s16 pipe \
>     drop
> 
> The controller is unable to mirror the packet to a VF, but still
> offloads the rule by dropping the packet.
> 
> Change the approach of the function to a pessimistic one, rejecting the
> chain when an unknown action is found. This is better suited for future
> extensions.
> 
> Note that both recognized actions always return TC_ACT_SHOT, therefore
> it is safe to ignore actions behind them.
> 
> Signed-off-by: Ondřej Hlavatý <ohlavaty@redhat.com>

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Note- I am having our validation move to testing with GCC 8.1.1 or
later so that we can catch warnings like Dave found in the future.

Dave- Please go ahead and pick this up.



* [PATCH v2 net] ixgbe: fix parsing of TC actions for HW offload
From: Ondřej Hlavatý @ 2018-05-31 21:21 UTC
  To: Jeff Kirsher; +Cc: netdev, Andrew Bowers, Ondřej Hlavatý

The previous code was optimistic, accepting the offload of the whole
action chain when there was a single known action (drop/redirect). This
results in offloading a rule which should not be offloaded, because its
behavior cannot be reproduced in the hardware.

For example:

$ tc filter add dev eno1 parent ffff: protocol ip \
    u32 ht 800: order 1 match tcp src 42 FFFF \
    action mirred egress mirror dev enp1s16 pipe \
    drop

The controller is unable to mirror the packet to a VF, but still
offloads the rule by dropping the packet.

Change the approach of the function to a pessimistic one, rejecting the
chain when an unknown action is found. This is better suited for future
extensions.

Note that both recognized actions always return TC_ACT_SHOT, therefore
it is safe to ignore actions behind them.
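
The pessimistic approach can be sketched in isolation as follows. The
enum, the error constant and the function name are hypothetical; the real
code walks struct tc_action entries. As in the patch, the first action
decides: a recognized terminal action accepts the chain, anything else
(including an empty chain) rejects it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action kinds; only drop and redirect can be offloaded. */
enum ex_action { EX_ACT_DROP, EX_ACT_REDIRECT, EX_ACT_MIRROR };

#define EX_EINVAL 22

/* Returns 0 if the chain can be offloaded, -EX_EINVAL otherwise. */
int ex_parse_actions(const enum ex_action *chain, size_t n)
{
	if (n == 0)
		return -EX_EINVAL;  /* empty chain: nothing to offload */

	switch (chain[0]) {
	case EX_ACT_DROP:
	case EX_ACT_REDIRECT:
		/* Terminal actions: whatever follows is never reached. */
		return 0;
	default:
		/* Unknown first action: reject the whole chain. */
		return -EX_EINVAL;
	}
}
```

With this shape, the mirror-then-drop rule from the example above is
rejected instead of being offloaded as a plain drop.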

Signed-off-by: Ondřej Hlavatý <ohlavaty@redhat.com>

---

Changes from v1: Fix the introduced warning by rejecting hypothetical
empty action chains as well. Removed Cc's from description.

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index afadba99f7b8..2ecd55856c50 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9054,7 +9054,6 @@ static int parse_tc_actions(struct ixgbe_adapter *adapter,
 {
 	const struct tc_action *a;
 	LIST_HEAD(actions);
-	int err;

 	if (!tcf_exts_has_actions(exts))
 		return -EINVAL;
@@ -9075,11 +9074,11 @@ static int parse_tc_actions(struct ixgbe_adapter *adapter,

 			if (!dev)
 				return -EINVAL;
-			err = handle_redirect_action(adapter, dev->ifindex, queue,
-						     action);
-			if (err == 0)
-				return err;
+			return handle_redirect_action(adapter, dev->ifindex,
+						      queue, action);
 		}
+
+		return -EINVAL;
 	}

 	return -EINVAL;
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] rtnetlink: Add more well known protocol values
From: Stephen Hemminger @ 2018-05-31 21:13 UTC (permalink / raw)
  To: Donald Sharp; +Cc: netdev, dsahern
In-Reply-To: <20180530122732.3688-1-sharpd@cumulusnetworks.com>

On Wed, 30 May 2018 08:27:32 -0400
Donald Sharp <sharpd@cumulusnetworks.com> wrote:

> FRRouting installs routes into the kernel associated with
> the originating protocol.  Add these values to the well
> known values in rtnetlink.h.
> 
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
> v2: Fixed whitespace issues
>  include/uapi/linux/rtnetlink.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index cabb210c93af..7d8502313c99 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -254,6 +254,11 @@ enum {
>  #define RTPROT_DHCP	16      /* DHCP client */
>  #define RTPROT_MROUTED	17      /* Multicast daemon */
>  #define RTPROT_BABEL	42      /* Babel daemon */
> +#define RTPROT_BGP	186     /* BGP Routes */
> +#define RTPROT_ISIS	187     /* ISIS Routes */
> +#define RTPROT_OSPF	188     /* OSPF Routes */
> +#define RTPROT_RIP	189     /* RIP Routes */
> +#define RTPROT_EIGRP	192     /* EIGRP Routes */
>  
>  /* rtm_scope
>  

There is a matching table in iproute2 which should also be updated.
See etc/iproute2/rt_protos

There also seems to be some leftover gated garbage there as well...
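
As a rough sketch only: assuming the usual "number, tab, name" layout of
rt_protos, matching entries might look like the following. The numbers come
from the patch above; the lowercase names are an assumption, not something
stated in this thread.

```
186	bgp
187	isis
188	ospf
189	rip
192	eigrp
```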

^ permalink raw reply

* [PATCH] ixgbe: fix broken ipsec Rx with proper cast on spi
From: Shannon Nelson @ 2018-05-31 21:12 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev

Fix up a cast problem introduced by a sparse cleanup patch.  This fixes
a problem where the encrypted packets were not recognized on Rx and
subsequently dropped.

Fixes: 9cfbfa701b55 ("ixgbe: cleanup sparse warnings")
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
---
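
A userspace sketch of why the width of the cast matters. It assumes a 64-bit
build, that the Rx lookup hashes the SPI as a 32-bit key, and that the
kernel's hash_min() picks its hash function from sizeof(key). The constants
and helpers below mimic include/linux/hash.h; the 10-bit table size and the
bucket_*() names are hypothetical.

```c
#include <stdint.h>

#define GOLDEN_RATIO_32 0x61C88647u
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Multiplicative hash of a 32-bit key, keeping the top 'bits' bits. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Multiplicative hash of a 64-bit key, keeping the top 'bits' bits. */
static uint32_t hash_64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/* Like the kernel's hash_min(): the key's width selects the function. */
#define hash_min(val, bits) \
	(sizeof(val) <= 4 ? hash_32((uint32_t)(val), (bits)) \
			  : hash_64((uint64_t)(val), (bits)))

/* Insert side with the broken cast: the key is widened to 64 bits. */
static uint32_t bucket_insert_u64(uint32_t spi)
{
	return hash_min((uint64_t)spi, 10);
}

/* Lookup side: the key stays 32 bits wide. */
static uint32_t bucket_lookup_u32(uint32_t spi)
{
	return hash_min((uint32_t)spi, 10);
}
```

Small keys can happen to land in the same bucket either way, but keys with
high bits set, as a big-endian SPI read on a little-endian host typically
has, can hash to different buckets, so the lookup misses the entry.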
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index e1c9762..344a1f2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -663,7 +663,7 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 
 		/* hash the new entry for faster search in Rx path */
 		hash_add_rcu(ipsec->rx_sa_list, &ipsec->rx_tbl[sa_idx].hlist,
-			     (__force u64)rsa.xs->id.spi);
+			     (__force u32)rsa.xs->id.spi);
 	} else {
 		struct tx_sa tsa;
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next v12 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Siwei Liu @ 2018-05-31 20:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, venu.busireddy
  Cc: Stephen Hemminger, Sridhar Samudrala, David Miller, Netdev,
	virtualization, virtio-dev, Brandeburg, Jesse, Alexander Duyck,
	Jakub Kicinski, Jason Wang, Jiri Pirko, aaron.f.brown,
	anjali.singhai
In-Reply-To: <20180531212356-mutt-send-email-mst@kernel.org>

On Thu, May 31, 2018 at 11:35 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, May 30, 2018 at 10:06:35PM -0400, Stephen Hemminger wrote:
>> On Thu, 24 May 2018 09:55:14 -0700
>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>>
>> > Use the registration/notification framework supported by the generic
>> > failover infrastructure.
>> >
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>
>> Why was this merged? It was never signed off by any of the netvsc maintainers,
>> and there were still issues unresolved.
>>
>> There are also namespaces issues I am fixing and this breaks them.
>> Will start my patch set with a revert for this. Sorry
>
> As long as you finish the patch set with re-integrating with failover,
> that's fine IMHO.
>
> I suspect it's easier to add the code to failover though - namespace
> things likely affect virtio as well. Lookup by ID would be an optional
> feature for virtio, but probably a useful one - I won't ask you

I would think that for production use this is a required feature and
should be enabled by default.

Venu (cc'ed) is working on the group ID stuff currently for pairing
virtio and passthrough devices. Would appreciate feedback on the group
ID proposal on virtio-dev.

-Siwei

> to add it to virtio but it could be a mode in failover
> that virtio will activate down the road. And reducing the number of
> times we look cards up based on ID can only be a good thing.
>
> --
> MST

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] net: phy: improve PHY suspend/resume
From: Heiner Kallweit @ 2018-05-31 20:28 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <20180531183040.GA7378@lunn.ch>

On 31.05.2018 20:30, Andrew Lunn wrote:
>> By the way: The problem is related to an experimental patch series for
>> splitting r8169/r8168 drivers and switching r8168 to phylib.
>> Therefore the change to r8168.c won't apply to existing kernel code.
> 
> Hi Heiner
> 
> I still think you are trying to fix the wrong problem.
> 
> Let's take a look at these patches, particularly your code for
> interfacing to phylib.
> 
> Thanks
> 	Andrew
> 
Hi Andrew,

I'll skip some intermediate patches and list below just the patch
adding basic phylib support. The code shouldn't be a big surprise.

The network chip integrates subsystems like MAC and PHY, and the
driver takes care that in suspend / resume the different components
are suspended / resumed in the right order.
This includes actions like slowing the PHY down from 1Gbit/s to
preferably 10Mbit/s (to save energy) in case WoL is configured.

It causes issues like the one I face now if subsystems like the
MDIO bus suddenly trigger their own suspend / resume actions without
knowing the status of the other chip subsystems.
The main issue, I think, is that we lose control over what is done in
which order.

To provide just one example of typical issues:
Configuring the different WoL options isn't handled by writing to
the PHY registers but by writing to chip / MAC registers.
Therefore phy_suspend() isn't able to figure out whether WoL is
enabled or not. Only the parent has the full picture.

To consider dependencies and ensure the right order of suspend /
resume actions I think a parent should be allowed to request that
it takes care of PM of its subsystems.

Regards,
Heiner

---
 drivers/net/ethernet/realtek/Kconfig |   1 +
 drivers/net/ethernet/realtek/r8168.c | 140 +++++++++++++++++++++------
 drivers/net/ethernet/realtek/r8169.h |   2 +
 3 files changed, 116 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/realtek/Kconfig b/drivers/net/ethernet/realtek/Kconfig
index 404e1f288..f1e9e8cc2 100644
--- a/drivers/net/ethernet/realtek/Kconfig
+++ b/drivers/net/ethernet/realtek/Kconfig
@@ -105,6 +105,7 @@ config R8168
 	select R8169_COMMON
 	select FW_LOADER
 	select CRC32
+	select PHYLIB
 	select MII
 	help
 	  Say Y here if you have a Realtek 8168 PCI Gigabit Ethernet adapter.
diff --git a/drivers/net/ethernet/realtek/r8168.c b/drivers/net/ethernet/realtek/r8168.c
index 546115344..473a147ec 100644
--- a/drivers/net/ethernet/realtek/r8168.c
+++ b/drivers/net/ethernet/realtek/r8168.c
@@ -16,6 +16,7 @@
 #include <linux/delay.h>
 #include <linux/ethtool.h>
 #include <linux/mii.h>
+#include <linux/phy.h>
 #include <linux/if_vlan.h>
 #include <linux/in.h>
 #include <linux/ip.h>
@@ -868,25 +869,6 @@ static void rtl_link_chg_patch(struct rtl8169_private *tp)
 	}
 }
 
-static void rtl8169_check_link_status(struct net_device *dev,
-				      struct rtl8169_private *tp)
-{
-	struct device *d = tp_to_dev(tp);
-
-	if (rtl8169_xmii_link_ok(tp)) {
-		rtl_link_chg_patch(tp);
-		/* This is to cancel a scheduled suspend if there's one. */
-		pm_request_resume(d);
-		netif_carrier_on(dev);
-		if (net_ratelimit())
-			netif_info(tp, ifup, dev, "link up\n");
-	} else {
-		netif_carrier_off(dev);
-		netif_info(tp, ifdown, dev, "link down\n");
-		pm_runtime_idle(d);
-	}
-}
-
 #define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)
 
 static u32 __rtl8169_get_wol(struct rtl8169_private *tp)
@@ -4444,8 +4426,8 @@ static void rtl_reset_work(struct rtl8169_private *tp)
 
 	napi_enable(&tp->napi);
 	rtl8169_hw_start(tp);
+	phy_start(dev->phydev);
 	netif_wake_queue(dev);
-	rtl8169_check_link_status(dev, tp);
 }
 
 static bool rtl_test_hw_pad_bug(struct rtl8169_private *tp, struct sk_buff *skb)
@@ -4615,7 +4597,7 @@ static void rtl_slow_event_work(struct rtl8169_private *tp)
 		rtl8169_pcierr_interrupt(dev);
 
 	if (status & LinkChg)
-		rtl8169_check_link_status(dev, tp);
+		phy_mac_interrupt(dev->phydev);
 
 	rtl_irq_enable_all(tp);
 }
@@ -4658,6 +4640,8 @@ static void rtl8169_down(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 
+	phy_stop(dev->phydev);
+
 	del_timer_sync(&tp->timer);
 
 	napi_disable(&tp->napi);
@@ -4762,14 +4746,13 @@ static int rtl_open(struct net_device *dev)
 	if (!rtl8169_init_counter_offsets(tp))
 		netif_warn(tp, hw, dev, "counter reset/update failed\n");
 
+	phy_start(dev->phydev);
 	netif_start_queue(dev);
 
 	rtl_unlock_work(tp);
 
 	tp->saved_wolopts = 0;
 	pm_runtime_put_sync(&pdev->dev);
-
-	rtl8169_check_link_status(dev, tp);
 out:
 	return retval;
 
@@ -4796,6 +4779,7 @@ static void rtl8169_net_suspend(struct net_device *dev)
 	if (!netif_running(dev))
 		return;
 
+	phy_stop(dev->phydev);
 	netif_device_detach(dev);
 	netif_stop_queue(dev);
 
@@ -4827,6 +4811,8 @@ static void __rtl8169_resume(struct net_device *dev)
 
 	rtl_pll_power_up(tp);
 
+	phy_start(tp->dev->phydev);
+
 	rtl_lock_work(tp);
 	napi_enable(&tp->napi);
 	set_bit(RTL_FLAG_TASK_ENABLED, tp->wk.flags);
@@ -4979,6 +4965,8 @@ static void rtl_remove_one(struct pci_dev *pdev)
 	netif_napi_del(&tp->napi);
 
 	unregister_netdev(dev);
+	phy_disconnect(dev->phydev);
+	mdiobus_unregister(tp->mii_bus);
 
 	rtl_release_firmware(tp);
 
@@ -5041,6 +5029,92 @@ DECLARE_RTL_COND(rtl_rxtx_empty_cond)
 	return (RTL_R8(tp, MCU) & RXTX_EMPTY) == RXTX_EMPTY;
 }
 
+static int r8168_mdio_read_reg(struct mii_bus *mii_bus, int phyaddr, int phyreg)
+{
+	struct rtl8169_private *tp = mii_bus->priv;
+
+	return tp->mdio_ops.read(tp, phyreg);
+}
+
+static int r8168_mdio_write_reg(struct mii_bus *mii_bus, int phyaddr,
+				int phyreg, u16 val)
+{
+	struct rtl8169_private *tp = mii_bus->priv;
+
+	tp->mdio_ops.write(tp, phyreg, val);
+
+	return 0;
+}
+
+static int r8168_mdio_register(struct rtl8169_private *tp)
+{
+	struct pci_dev *pdev = tp->pci_dev;
+	struct mii_bus *new_bus;
+	int ret;
+
+	new_bus = devm_mdiobus_alloc(&pdev->dev);
+	if (!new_bus)
+		return -ENOMEM;
+
+	new_bus->name = "r8168";
+	new_bus->priv = tp;
+	new_bus->phy_mask = ~1;
+	new_bus->parent = &pdev->dev;
+	new_bus->irq[0] = PHY_IGNORE_INTERRUPT;
+	snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8168-%x",
+		 PCI_DEVID(pdev->bus->number, pdev->devfn));
+
+	new_bus->read = r8168_mdio_read_reg;
+	new_bus->write = r8168_mdio_write_reg;
+
+	ret = mdiobus_register(new_bus);
+	if (!ret)
+		tp->mii_bus = new_bus;
+
+	return ret;
+}
+
+static void r8168_phylink_handler(struct net_device *ndev)
+{
+	struct rtl8169_private *tp = netdev_priv(ndev);
+
+	if (netif_carrier_ok(ndev)) {
+		rtl_link_chg_patch(tp);
+		pm_request_resume(&tp->pci_dev->dev);
+	} else {
+		pm_runtime_idle(&tp->pci_dev->dev);
+	}
+
+	if (net_ratelimit())
+		phy_print_status(ndev->phydev);
+}
+
+static int r8168_phy_connect(struct rtl8169_private *tp)
+{
+	struct phy_device *phydev;
+	phy_interface_t phy_mode;
+	int ret;
+
+	phy_mode = tp->mii.supports_gmii ? PHY_INTERFACE_MODE_GMII :
+		   PHY_INTERFACE_MODE_MII;
+
+	phydev = phy_find_first(tp->mii_bus);
+	if (!phydev)
+		return -ENODEV;
+
+	if (!tp->mii.supports_gmii && phydev->supported & PHY_1000BT_FEATURES) {
+		netif_info(tp, probe, tp->dev, "Restrict PHY to 100Mbit because MAC doesn't support 1GBit\n");
+		phy_set_max_speed(phydev, SPEED_100);
+	}
+
+	ret = phy_connect_direct(tp->dev, phydev, r8168_phylink_handler,
+				 phy_mode);
+	if (!ret)
+		phy_attached_info(phydev);
+
+	return ret;
+}
+
 static void rtl_hw_init_8168g(struct rtl8169_private *tp)
 {
 	u32 data;
@@ -5287,10 +5361,18 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pci_set_drvdata(pdev, dev);
 
-	rc = register_netdev(dev);
-	if (rc < 0)
+	rc = r8168_mdio_register(tp);
+	if (rc)
 		return rc;
 
+	rc = r8168_phy_connect(tp);
+	if (rc)
+		goto err_mdio_unregister;
+
+	rc = register_netdev(dev);
+	if (rc)
+		goto err_phy_disconnect;
+
 	netif_info(tp, probe, dev, "%s, %pM, XID %08x, IRQ %d\n",
 		   rtl_chip_infos[chipset].name, dev->dev_addr,
 		   (u32)(RTL_R32(tp, TxConfig) & 0xfcf0f8ff),
@@ -5303,12 +5385,16 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (r8168_check_dash(tp))
 		rtl8168_driver_start(tp);
 
-	netif_carrier_off(dev);
-
 	if (pci_dev_run_wake(pdev))
 		pm_runtime_put_sync(&pdev->dev);
 
 	return 0;
+
+err_phy_disconnect:
+	phy_disconnect(dev->phydev);
+err_mdio_unregister:
+	mdiobus_unregister(tp->mii_bus);
+	return rc;
 }
 
 static struct pci_driver rtl8168_pci_driver = {
diff --git a/drivers/net/ethernet/realtek/r8169.h b/drivers/net/ethernet/realtek/r8169.h
index 355bbcd0f..2f18e5dee 100644
--- a/drivers/net/ethernet/realtek/r8169.h
+++ b/drivers/net/ethernet/realtek/r8169.h
@@ -14,6 +14,7 @@
 #include <linux/delay.h>
 #include <linux/ethtool.h>
 #include <linux/mii.h>
+#include <linux/phy.h>
 #include <linux/if_vlan.h>
 #include <linux/in.h>
 #include <linux/ip.h>
@@ -475,6 +476,7 @@ struct rtl8169_private {
 	} wk;
 
 	struct mii_if_info mii;
+	struct mii_bus *mii_bus;
 	dma_addr_t counters_phys_addr;
 	struct rtl8169_counters *counters;
 	struct rtl8169_tc_offsets tc_offset;
-- 
2.17.0

^ permalink raw reply related

* [net PATCH] net-sysfs: Fix memory leak in XPS configuration
From: Alexander Duyck @ 2018-05-31 19:59 UTC (permalink / raw)
  To: netdev, davem

This patch reorders the error cases in showing the XPS configuration so
that we hold off on memory allocation until after we have verified that we
can support XPS on a given ring.

Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
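
The reordering can be illustrated with a small userspace sketch. It is not
the kernel code: show_mask(), the queue_ok flag and the live_allocs counter
are hypothetical, and calloc()/free() stand in for zalloc_cpumask_var() and
free_cpumask_var().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter used only to observe outstanding allocations. */
static int live_allocs;

/*
 * Validate first, allocate second. If the allocation came first, the
 * early -EINVAL return below would exit without freeing the buffer.
 */
static int show_mask(int queue_ok)
{
	unsigned long *mask;

	if (!queue_ok)
		return -EINVAL;	/* nothing allocated yet, nothing to leak */

	mask = calloc(1, sizeof(*mask));
	if (!mask)
		return -ENOMEM;
	live_allocs++;

	/* ... fill in and print the mask here ... */

	free(mask);
	live_allocs--;
	return 0;
}
```

The alternative fix would be to free the mask on the error path; moving the
allocation keeps the error path free of cleanup instead.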
 net/core/net-sysfs.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index c476f07..bb7e80f 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1214,9 +1214,6 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 	cpumask_var_t mask;
 	unsigned long index;
 
-	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
-		return -ENOMEM;
-
 	index = get_netdev_queue_index(queue);
 
 	if (dev->num_tc) {
@@ -1226,6 +1223,9 @@ static ssize_t xps_cpus_show(struct netdev_queue *queue,
 			return -EINVAL;
 	}
 
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+		return -ENOMEM;
+
 	rcu_read_lock();
 	dev_maps = rcu_dereference(dev->xps_maps);
 	if (dev_maps) {

^ permalink raw reply related

* Re: [net] ixgbe: fix parsing of TC actions for HW offload
From: Ondřej Hlavatý @ 2018-05-31 20:01 UTC (permalink / raw)
  To: David Miller
  Cc: jeffrey.t.kirsher, netdev, nhorman, sassmann, jogreene, jhs, jiri
In-Reply-To: <20180531.154106.1949427114118876175.davem@davemloft.net>

On 31.05., David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Wed, 30 May 2018 11:01:34 -0700
> 
> > From: Ondřej Hlavatý <ohlavaty@redhat.com>
> > 
> > The previous code was optimistic, accepting the offload of the whole action
> > chain when there was a single known action (drop/redirect). This results
> > in offloading a rule which should not be offloaded, because its behavior
> > cannot be reproduced in the hardware.
> > 
> 
> This introduces a new warning with gcc-8.1.1 on Fedora 28.

Indeed. Sorry for not noticing that it was introduced by my patch. The
issue is pretty obvious: an empty action chain will not select a queue. As
the Flow Director rules always need to explicitly set a queue index when
they match (as far as I know), the correct solution should be to
just reject empty chains for offload as well. I will prepare v2 soon.

Ondřej

^ permalink raw reply
