* Re: broken behaviour of TC filter delete
From: Cong Wang @ 2018-08-24 18:11 UTC (permalink / raw)
To: Roman Mashak
Cc: Jiri Pirko, Linux Kernel Network Developers, Jiri Pirko,
Jamal Hadi Salim
In-Reply-To: <85sh33zqkh.fsf@mojatatu.com>
On Fri, Aug 24, 2018 at 9:21 AM Roman Mashak <mrv@mojatatu.com> wrote:
>
> So _before_ commit f71e0ca4db187af7c44987e9d21e9042c3046070 step 6 would
> return -ENOENT with "Error: Filter with specified priority/protocol not
> found." and _after_ the commit it returns -EINVAL (Error: Cannot find
> specified filter chain.)
>
> ENOENT seems to be more logical to return when there's no more filter to delete.
Yeah, at least we should keep ENOENT for compatibility.
The bug here is chain 0 is gone after the last filter is gone,
so when you delete the filter again, it treats it as you specify
chain 0 which does not exist, so it hits EINVAL before ENOENT.
I am not sure how to fix this properly.
^ permalink raw reply
* ethtool 4.18 released
From: John W. Linville @ 2018-08-24 18:38 UTC (permalink / raw)
To: netdev
ethtool version 4.18 has been released.
Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-4.18.tar.xz
Release notes:
* Feature: Add support for WAKE_FILTER (WoL using filters)
* Feature: Add support for action value -2 (wake-up filter)
* Fix: document WoL filters option also in help message
* Feature: ixgbe dump strings for security registers
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* [net 01/11] ixgb: use dma_zalloc_coherent instead of allocator/memset
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: YueHaibing, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: YueHaibing <yuehaibing@huawei.com>
Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgb/ixgb_main.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index 43664adf7a3c..d3e72d0f66ef 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -771,14 +771,13 @@ ixgb_setup_rx_resources(struct ixgb_adapter *adapter)
rxdr->size = rxdr->count * sizeof(struct ixgb_rx_desc);
rxdr->size = ALIGN(rxdr->size, 4096);
- rxdr->desc = dma_alloc_coherent(&pdev->dev, rxdr->size, &rxdr->dma,
- GFP_KERNEL);
+ rxdr->desc = dma_zalloc_coherent(&pdev->dev, rxdr->size, &rxdr->dma,
+ GFP_KERNEL);
if (!rxdr->desc) {
vfree(rxdr->buffer_info);
return -ENOMEM;
}
- memset(rxdr->desc, 0, rxdr->size);
rxdr->next_to_clean = 0;
rxdr->next_to_use = 0;
--
2.17.1
^ permalink raw reply related
* [net 05/11] igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Jia-Ju Bai, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Jia-Ju Bai <baijiaju1990@gmail.com>
igb_sw_init() is never called in atomic context.
It calls kzalloc() and kcalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 74416f8a9446..a32c576c1e65 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3873,7 +3873,7 @@ static int igb_sw_init(struct igb_adapter *adapter)
adapter->mac_table = kcalloc(hw->mac.rar_entry_count,
sizeof(struct igb_mac_addr),
- GFP_ATOMIC);
+ GFP_KERNEL);
if (!adapter->mac_table)
return -ENOMEM;
@@ -3883,7 +3883,7 @@ static int igb_sw_init(struct igb_adapter *adapter)
/* Setup and initialize a copy of the hw vlan table array */
adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32),
- GFP_ATOMIC);
+ GFP_KERNEL);
if (!adapter->shadow_vfta)
return -ENOMEM;
--
2.17.1
^ permalink raw reply related
* [net 02/11] e1000: check on netif_running() before calling e1000_up()
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Bo Chen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Bo Chen <chenbo@pdx.edu>
When the device is not up, the call to 'e1000_up()' from the error handling path
of 'e1000_set_ringparam()' causes a kernel oops with a null-pointer
dereference. The null-pointer dereference is triggered in function
'e1000_alloc_rx_buffers()' at line 'buffer_info = &rx_ring->buffer_info[i]'.
This bug was reported by COD, a tool for testing kernel module binaries I am
building. This bug was also detected by KFI from Dr. Kai Cong.
This patch fixes the bug by checking on 'netif_running()' before calling
'e1000_up()' in 'e1000_set_ringparam()'.
Signed-off-by: Bo Chen <chenbo@pdx.edu>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index bdb3f8e65ed4..c1e4e94f100f 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -644,7 +644,8 @@ static int e1000_set_ringparam(struct net_device *netdev,
err_alloc_rx:
kfree(txdr);
err_alloc_tx:
- e1000_up(adapter);
+ if (netif_running(adapter->netdev))
+ e1000_up(adapter);
err_setup:
clear_bit(__E1000_RESETTING, &adapter->flags);
return err;
--
2.17.1
^ permalink raw reply related
* [net 04/11] igb: Use an advanced ctx descriptor for launchtime
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem
Cc: Jesus Sanchez-Palencia, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
On i210, Launchtime (TxTime) requires the usage of an "Advanced
Transmit Context Descriptor" for retrieving the timestamp of a packet.
The igb driver correctly builds such descriptor on the segmentation
flow (i.e. igb_tso()) or on the checksum one (i.e. igb_tx_csum()), but the
feature is broken for AF_PACKET if the IGB_TX_FLAGS_VLAN is not set,
which happens due to an early return on igb_tx_csum().
This flag is only set by the kernel when a VLAN interface is used,
thus we can't just rely on it. Here we are fixing this issue by checking
if launchtime is enabled for the current tx_ring before performing the
early return.
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index d03c2f0d7592..74416f8a9446 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5816,7 +5816,8 @@ static void igb_tx_csum(struct igb_ring *tx_ring, struct igb_tx_buffer *first)
if (skb->ip_summed != CHECKSUM_PARTIAL) {
csum_failed:
- if (!(first->tx_flags & IGB_TX_FLAGS_VLAN))
+ if (!(first->tx_flags & IGB_TX_FLAGS_VLAN) &&
+ !tx_ring->launchtime_enable)
return;
goto no_csum;
}
--
2.17.1
^ permalink raw reply related
* [net 03/11] e1000: ensure to free old tx/rx rings in set_ringparam()
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Bo Chen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Bo Chen <chenbo@pdx.edu>
In 'e1000_set_ringparam()', the tx_ring and rx_ring are updated with new value
and the old tx/rx rings are freed only when the device is up. There are resource
leaks on old tx/rx rings when the device is not up. This bug is reported by COD,
a tool for testing kernel module binaries I am building.
This patch fixes the bug by always calling 'kfree()' on old tx/rx rings in
'e1000_set_ringparam()'.
Signed-off-by: Bo Chen <chenbo@pdx.edu>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index c1e4e94f100f..2569a168334c 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -624,14 +624,14 @@ static int e1000_set_ringparam(struct net_device *netdev,
adapter->tx_ring = tx_old;
e1000_free_all_rx_resources(adapter);
e1000_free_all_tx_resources(adapter);
- kfree(tx_old);
- kfree(rx_old);
adapter->rx_ring = rxdr;
adapter->tx_ring = txdr;
err = e1000_up(adapter);
if (err)
goto err_setup;
}
+ kfree(tx_old);
+ kfree(rx_old);
clear_bit(__E1000_RESETTING, &adapter->flags);
return 0;
--
2.17.1
^ permalink raw reply related
* [net 08/11] ixgbe: Prevent unsupported configurations with XDP
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem
Cc: Tony Nguyen, netdev, nhorman, sassmann, jogreene, Jakub Kicinski,
Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Tony Nguyen <anthony.l.nguyen@intel.com>
These changes address comments by Jakub Kicinski on
commit 38b7e7f8ae82 ("ixgbe: Do not allow LRO or MTU change with XDP").
Change the MTU check with XDP to allow any supported value and only
reject those outside of the range as opposed to rejecting any change
when XDP is active. In situations where MTU size is not supported,
return -EINVAL instead of -EPERM.
Add checks when enabling SRIOV, DCB, or adding L2FW offloaded device
as they are not supported with XDP.
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 +++++++++++++++++--
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 5 ++++
2 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 663d59ba527a..9a23d33a47ed 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6620,8 +6620,18 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
struct ixgbe_adapter *adapter = netdev_priv(netdev);
if (adapter->xdp_prog) {
- e_warn(probe, "MTU cannot be changed while XDP program is loaded\n");
- return -EPERM;
+ int new_frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN +
+ VLAN_HLEN;
+ int i;
+
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ struct ixgbe_ring *ring = adapter->rx_ring[i];
+
+ if (new_frame_size > ixgbe_rx_bufsz(ring)) {
+ e_warn(probe, "Requested MTU size is not supported with XDP\n");
+ return -EINVAL;
+ }
+ }
}
/*
@@ -8983,6 +8993,15 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
#ifdef CONFIG_IXGBE_DCB
if (tc) {
+ if (adapter->xdp_prog) {
+ e_warn(probe, "DCB is not supported with XDP\n");
+
+ ixgbe_init_interrupt_scheme(adapter);
+ if (netif_running(dev))
+ ixgbe_open(dev);
+ return -EINVAL;
+ }
+
netdev_set_num_tc(dev, tc);
ixgbe_set_prio_tc_map(adapter);
@@ -9934,6 +9953,11 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
int tcs = adapter->hw_tcs ? : 1;
int pool, err;
+ if (adapter->xdp_prog) {
+ e_warn(probe, "L2FW offload is not supported with XDP\n");
+ return ERR_PTR(-EINVAL);
+ }
+
/* The hardware supported by ixgbe only filters on the destination MAC
* address. In order to avoid issues we only support offloading modes
* where the hardware can actually provide the functionality.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 6f59933cdff7..9264a5f8a5d0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -53,6 +53,11 @@ static int __ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
struct ixgbe_hw *hw = &adapter->hw;
int i;
+ if (adapter->xdp_prog) {
+ e_warn(probe, "SRIOV is not supported with XDP\n");
+ return -EINVAL;
+ }
+
/* Enable VMDq flag so device will be set in VM mode */
adapter->flags |= IXGBE_FLAG_SRIOV_ENABLED |
IXGBE_FLAG_VMDQ_ENABLED;
--
2.17.1
^ permalink raw reply related
* [net 09/11] ixgbe: fix driver behaviour after issuing VFLR
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem
Cc: Sebastian Basierski, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Sebastian Basierski <sebastianx.basierski@intel.com>
Since VFLR doesn't clear VFMBMEM (VF Mailbox Memory)
and is not re-enabling queues correctly we should fix
this behavior.
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 26 +++++++++++++++++++
drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 1 +
2 files changed, 27 insertions(+)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 9264a5f8a5d0..3c6f01c41b78 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -693,8 +693,13 @@ static int ixgbe_set_vf_macvlan(struct ixgbe_adapter *adapter,
static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
{
struct ixgbe_hw *hw = &adapter->hw;
+ struct ixgbe_ring_feature *vmdq = &adapter->ring_feature[RING_F_VMDQ];
struct vf_data_storage *vfinfo = &adapter->vfinfo[vf];
+ u32 q_per_pool = __ALIGN_MASK(1, ~vmdq->mask);
u8 num_tcs = adapter->hw_tcs;
+ u32 reg_val;
+ u32 queue;
+ u32 word;
/* remove VLAN filters beloning to this VF */
ixgbe_clear_vf_vlans(adapter, vf);
@@ -731,6 +736,27 @@ static inline void ixgbe_vf_reset_event(struct ixgbe_adapter *adapter, u32 vf)
/* reset VF api back to unknown */
adapter->vfinfo[vf].vf_api = ixgbe_mbox_api_10;
+
+ /* Restart each queue for given VF */
+ for (queue = 0; queue < q_per_pool; queue++) {
+ unsigned int reg_idx = (vf * q_per_pool) + queue;
+
+ reg_val = IXGBE_READ_REG(hw, IXGBE_PVFTXDCTL(reg_idx));
+
+ /* Re-enabling only configured queues */
+ if (reg_val) {
+ reg_val |= IXGBE_TXDCTL_ENABLE;
+ IXGBE_WRITE_REG(hw, IXGBE_PVFTXDCTL(reg_idx), reg_val);
+ reg_val &= ~IXGBE_TXDCTL_ENABLE;
+ IXGBE_WRITE_REG(hw, IXGBE_PVFTXDCTL(reg_idx), reg_val);
+ }
+ }
+
+ /* Clear VF's mailbox memory */
+ for (word = 0; word < IXGBE_VFMAILBOX_SIZE; word++)
+ IXGBE_WRITE_REG_ARRAY(hw, IXGBE_PFMBMEM(vf), word, 0);
+
+ IXGBE_WRITE_FLUSH(hw);
}
static int ixgbe_set_vf_mac(struct ixgbe_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 44cfb2021145..41bcbb337e83 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -2518,6 +2518,7 @@ enum {
/* Translated register #defines */
#define IXGBE_PVFTDH(P) (0x06010 + (0x40 * (P)))
#define IXGBE_PVFTDT(P) (0x06018 + (0x40 * (P)))
+#define IXGBE_PVFTXDCTL(P) (0x06028 + (0x40 * (P)))
#define IXGBE_PVFTDWBAL(P) (0x06038 + (0x40 * (P)))
#define IXGBE_PVFTDWBAH(P) (0x0603C + (0x40 * (P)))
--
2.17.1
^ permalink raw reply related
* [net 07/11] ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Jia-Ju Bai, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Jia-Ju Bai <baijiaju1990@gmail.com>
ixgbe_fcoe_ddp_setup(), ixgbe_setup_fcoe_ddp_resources() and
ixgbe_sw_init() are never called in atomic context.
They call kmalloc(), dma_pool_alloc() and kzalloc() with GFP_ATOMIC,
which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Sebastian Basierski <sebastianx.basierski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 4 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 94b3165ff543..ccd852ad62a4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -192,7 +192,7 @@ static int ixgbe_fcoe_ddp_setup(struct net_device *netdev, u16 xid,
}
/* alloc the udl from per cpu ddp pool */
- ddp->udl = dma_pool_alloc(ddp_pool->pool, GFP_ATOMIC, &ddp->udp);
+ ddp->udl = dma_pool_alloc(ddp_pool->pool, GFP_KERNEL, &ddp->udp);
if (!ddp->udl) {
e_err(drv, "failed allocated ddp context\n");
goto out_noddp_unmap;
@@ -760,7 +760,7 @@ int ixgbe_setup_fcoe_ddp_resources(struct ixgbe_adapter *adapter)
return 0;
/* Extra buffer to be shared by all DDPs for HW work around */
- buffer = kmalloc(IXGBE_FCBUFF_MIN, GFP_ATOMIC);
+ buffer = kmalloc(IXGBE_FCBUFF_MIN, GFP_KERNEL);
if (!buffer)
return -ENOMEM;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index af4c9ae7f432..663d59ba527a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6201,7 +6201,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
adapter->mac_table = kcalloc(hw->mac.num_rar_entries,
sizeof(struct ixgbe_mac_addr),
- GFP_ATOMIC);
+ GFP_KERNEL);
if (!adapter->mac_table)
return -ENOMEM;
--
2.17.1
^ permalink raw reply related
* [net 10/11] i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Martyna Szapar, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Martyna Szapar <martyna.szapar@intel.com>
If interface is connected to switch port configured for DCB there are
TX timeouts when bringing up interface. Problem started appearing after
adding in i40e driver code mqprio hardware offload mode. In function
i40e_vsi_configure_bw_alloc was added resetting BW rate which should
be executing when mqprio qdisc is removed but was also when there was
no mqprio qdisc added and DCB was enabled. In this patch was added
additional check for DCB flag so now when DCB is enabled the correct
DCB configs from before mqprio patch are restored.
Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f2c622e78802..ac685ad4d877 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5122,15 +5122,17 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
u8 *bw_share)
{
struct i40e_aqc_configure_vsi_tc_bw_data bw_data;
+ struct i40e_pf *pf = vsi->back;
i40e_status ret;
int i;
- if (vsi->back->flags & I40E_FLAG_TC_MQPRIO)
+ /* There is no need to reset BW when mqprio mode is on. */
+ if (pf->flags & I40E_FLAG_TC_MQPRIO)
return 0;
- if (!vsi->mqprio_qopt.qopt.hw) {
+ if (!vsi->mqprio_qopt.qopt.hw && !(pf->flags & I40E_FLAG_DCB_ENABLED)) {
ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
if (ret)
- dev_info(&vsi->back->pdev->dev,
+ dev_info(&pf->pdev->dev,
"Failed to reset tx rate for vsi->seid %u\n",
vsi->seid);
return ret;
@@ -5139,12 +5141,11 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
bw_data.tc_bw_credits[i] = bw_share[i];
- ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, vsi->seid, &bw_data,
- NULL);
+ ret = i40e_aq_config_vsi_tc_bw(&pf->hw, vsi->seid, &bw_data, NULL);
if (ret) {
- dev_info(&vsi->back->pdev->dev,
+ dev_info(&pf->pdev->dev,
"AQ command Config VSI BW allocation per TC failed = %d\n",
- vsi->back->hw.aq.asq_last_status);
+ pf->hw.aq.asq_last_status);
return -EINVAL;
}
--
2.17.1
^ permalink raw reply related
* [net 00/11][pull request] Intel Wired LAN Driver Updates 2018-08-24
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene
This series contains fixes to e1000, igb, ixgb, ixgbe and i40e.
YueHaibing from Huawei provides a change to use dma_zalloc_coherent()
instead of calls to allocator followed by a memset for ixgb.
Bo Chen provides a couple of fixes for e1000, first by adding a check to
prevent a NULL pointer dereference. The second change is to clean up a
possible resource leak on old transmit and receive rings when the device
is not up.
Jesus fixes an issue in the usage of an advanced transmit context
descriptor for retrieving the timestamp of a packet for AF_PACKET if the
IGB_TX_FLAGS_VLAN is not set in igb.
Jia-Ju Bai provides several patches which replace GFP_ATOMIC with
GFP_KERNEL, when using kzalloc() and kcalloc() which is not necessary.
Also found an instance of mdelay() call which could be replaced with
msleep().
Tony fixes ixgbe to allow MTU changes with XDP, by adding checks to
ensure only supported values and return -EINVAL for when it is not
supported.
Sebastian fixed an issue that was not clearing VF mailbox memory and
ensure queues are re-enabled correctly.
Martyna fixes a transmit timeout when DCB is configured when bringing up
an interface.
Jake fixes a previous commit which accidentally reversed the check of
the data pointer, so we can accurately count the size of the stats.
The following are changes since commit ff0fadfffe681203bfe134e1041ab6ccb4aa3dff:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 1GbE
Bo Chen (2):
e1000: check on netif_running() before calling e1000_up()
e1000: ensure to free old tx/rx rings in set_ringparam()
Jacob Keller (1):
i40e: fix condition of WARN_ONCE for stat strings
Jesus Sanchez-Palencia (1):
igb: Use an advanced ctx descriptor for launchtime
Jia-Ju Bai (3):
igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
Martyna Szapar (1):
i40e: Fix for Tx timeouts when interface is brought up if DCB is
enabled
Sebastian Basierski (1):
ixgbe: fix driver behaviour after issuing VFLR
Tony Nguyen (1):
ixgbe: Prevent unsupported configurations with XDP
YueHaibing (1):
ixgb: use dma_zalloc_coherent instead of allocator/memset
.../net/ethernet/intel/e1000/e1000_ethtool.c | 7 +++--
.../net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 15 ++++-----
drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 7 +++--
drivers/net/ethernet/intel/ixgb/ixgb_main.c | 5 ++-
drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 4 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 30 ++++++++++++++++--
.../net/ethernet/intel/ixgbe/ixgbe_sriov.c | 31 +++++++++++++++++++
drivers/net/ethernet/intel/ixgbe/ixgbe_type.h | 1 +
10 files changed, 81 insertions(+), 23 deletions(-)
--
2.17.1
^ permalink raw reply
* [net 06/11] igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Jia-Ju Bai, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Jia-Ju Bai <baijiaju1990@gmail.com>
igb_integrated_phy_loopback() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep().
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index f92f7918112d..5acf3b743876 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1649,7 +1649,7 @@ static int igb_integrated_phy_loopback(struct igb_adapter *adapter)
if (hw->phy.type == e1000_phy_m88)
igb_phy_disable_receiver(adapter);
- mdelay(500);
+ msleep(500);
return 0;
}
--
2.17.1
^ permalink raw reply related
* [net 11/11] i40e: fix condition of WARN_ONCE for stat strings
From: Jeff Kirsher @ 2018-08-24 18:47 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180824184735.32175-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Commit 9b10df596bd4 ("i40e: use WARN_ONCE to replace the commented
BUG_ON size check") introduced a warning check to make sure
that the size of the stat strings was always the expected value. This
code accidentally inverted the check of the data pointer. Fix this so
that we accurately count the size of the stats we copied in.
This fixes an erroneous WARN kernel splat that occurs when requesting
ethtool statistics.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Mauro S M Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index abcd096ede14..5ff6caa83948 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2013,7 +2013,7 @@ static void i40e_get_stat_strings(struct net_device *netdev, u8 *data)
for (i = 0; i < I40E_MAX_USER_PRIORITY; i++)
i40e_add_stat_strings(&data, i40e_gstrings_pfc_stats, i);
- WARN_ONCE(p - data != i40e_get_stats_count(netdev) * ETH_GSTRING_LEN,
+ WARN_ONCE(data - p != i40e_get_stats_count(netdev) * ETH_GSTRING_LEN,
"stat strings count mismatch!");
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH v2 17/17] net: WireGuard secure network tunnel
From: Andrew Lunn @ 2018-08-24 23:00 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: linux-kernel, netdev, davem, Greg KH
In-Reply-To: <20180824213849.23647-18-Jason@zx2c4.com>
Hi Jason
It is normal to include after the --- what you have changed since the
previous version.
I ran checkpatch on this again.
Last time, we had:
total: 6 errors, 763 warnings, 6514 lines checked
This time, we have:
total: 8 errors, 196 warnings, 7470 lines checked
So much better with the warnings. Thanks.
The errors should be simple to fix:
ERROR: that open brace { should be on the previous line
ERROR: that open brace { should be on the previous line
ERROR: spaces required around that '=' (ctx:VxW)
ERROR: spaces required around that '=' (ctx:VxW)
ERROR: spaces required around that '=' (ctx:VxW)
ERROR: that open brace { should be on the previous line
ERROR: Use of __initconst requires a separate use of const
ERROR: trailing whitespace
One appears to be a false positive:
+static const struct {
+ bool result;
+ unsigned int msec_to_sleep_before;
+} expected_results[] __initconst = {
+ [0 ... PACKETS_BURSTABLE - 1] = { true, 0 },
+ [PACKETS_BURSTABLE] = { false, 0 },
+ [PACKETS_BURSTABLE + 1] = { true, MSEC_PER_SEC / PACKETS_PER_SECOND },
+ [PACKETS_BURSTABLE + 2] = { false, 0 },
+ [PACKETS_BURSTABLE + 3] = { true, (MSEC_PER_SEC / PACKETS_PER_SECOND) * 2 },
+ [PACKETS_BURSTABLE + 4] = { true, 0 },
+ [PACKETS_BURSTABLE + 5] = { false, 0 }
+};
Looking at some of the warnings:
WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#428: FILE: drivers/net/wireguard/allowedips.c:112:
+ BUG_ON(len >= 128);
WARNING: Macros with flow control statements should be avoided
#5905: FILE: drivers/net/wireguard/selftest/ratelimiter.h:87:
+#define ensure_time do { \
+ if (time_is_before_jiffies(loop_start_time + \
+ maximum_jiffies_at_index(i))) { \
+ if (++tries >= 5000) \
+ goto err; \
+ gc_entries(NULL); \
+ rcu_barrier(); \
+ msleep(500); \
+ goto restart; \
+ } \
+ } while (0)
WARNING: Macros with flow control statements should be avoided
#6948: FILE: drivers/net/wireguard/timers.c:29:
+#define peer_get_from_timer(timer_name) \
+ struct wireguard_peer *peer; \
+ rcu_read_lock_bh(); \
+ peer = peer_get_maybe_zero(from_timer(peer, timer, timer_name)); \
+ rcu_read_unlock_bh(); \
+ if (unlikely(!peer)) \
+ return;
are not very nice.
WARNING: Possible unnecessary 'out of memory' message
#5306: FILE: drivers/net/wireguard/selftest/allowedips.h:261:
+ if (!peers) {
+ pr_info("allowedips random self-test: out of memory\n");
kcalloc and friends are pretty noisy when they fails. No need to add
your own print.
I see we still have a lot of __always_inline in C files :-(
Andrew
^ permalink raw reply
* [Patch net] tipc: switch to rhashtable iterator
From: Cong Wang @ 2018-08-24 19:28 UTC (permalink / raw)
To: netdev; +Cc: tipc-discussion, Cong Wang, Jon Maloy, Ying Xue
syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
where tipc_group_fill_sock_diag() still reads tsk->group meanwhile
tipc_group_delete() just deletes it in tipc_release().
tipc_nl_sk_walk() aims to lock this sock when walking each sock
in the hash table to close race conditions with sock changes like
this one, by acquiring tsk->sk.sk_lock.slock spinlock, unfortunately
this doesn't work at all. All non-BH call path should take
lock_sock() instead to make it work.
tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
where RCU read lock is required, this is the reason why lock_sock()
can't be taken on this path. This could be resolved by switching to
rhashtable iterator API's, where taking a sleepable lock is possible.
Also, the iterator API's are friendly for restartable calls like
diag dump, the last position is remembered behind the scence,
all we need to do here is saving the iterator into cb->args[].
I tested this with parallel tipc diag dump and thousands of tipc
socket creation and release, no crash or memory leak.
Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
net/tipc/diag.c | 2 ++
net/tipc/netlink.c | 2 ++
net/tipc/socket.c | 76 +++++++++++++++++++++++++++++++++++-------------------
net/tipc/socket.h | 2 ++
4 files changed, 56 insertions(+), 26 deletions(-)
diff --git a/net/tipc/diag.c b/net/tipc/diag.c
index aaabb0b776dd..73137f4aeb68 100644
--- a/net/tipc/diag.c
+++ b/net/tipc/diag.c
@@ -84,7 +84,9 @@ static int tipc_sock_diag_handler_dump(struct sk_buff *skb,
if (h->nlmsg_flags & NLM_F_DUMP) {
struct netlink_dump_control c = {
+ .start = tipc_dump_start,
.dump = tipc_diag_dump,
+ .done = tipc_dump_done,
};
netlink_dump_start(net->diag_nlsk, skb, h, &c);
return 0;
diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c
index 6ff2254088f6..99ee419210ba 100644
--- a/net/tipc/netlink.c
+++ b/net/tipc/netlink.c
@@ -167,7 +167,9 @@ static const struct genl_ops tipc_genl_v2_ops[] = {
},
{
.cmd = TIPC_NL_SOCK_GET,
+ .start = tipc_dump_start,
.dumpit = tipc_nl_sk_dump,
+ .done = tipc_dump_done,
.policy = tipc_nl_policy,
},
{
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index c9a50b62c738..ab7a2a7178f7 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -3229,45 +3229,69 @@ int tipc_nl_sk_walk(struct sk_buff *skb, struct netlink_callback *cb,
struct netlink_callback *cb,
struct tipc_sock *tsk))
{
- struct net *net = sock_net(skb->sk);
- struct tipc_net *tn = tipc_net(net);
- const struct bucket_table *tbl;
- u32 prev_portid = cb->args[1];
- u32 tbl_id = cb->args[0];
- struct rhash_head *pos;
+ struct rhashtable_iter *iter = (void *)cb->args[0];
struct tipc_sock *tsk;
int err;
- rcu_read_lock();
- tbl = rht_dereference_rcu((&tn->sk_rht)->tbl, &tn->sk_rht);
- for (; tbl_id < tbl->size; tbl_id++) {
- rht_for_each_entry_rcu(tsk, pos, tbl, tbl_id, node) {
- spin_lock_bh(&tsk->sk.sk_lock.slock);
- if (prev_portid && prev_portid != tsk->portid) {
- spin_unlock_bh(&tsk->sk.sk_lock.slock);
+ rhashtable_walk_start(iter);
+ while ((tsk = rhashtable_walk_next(iter)) != NULL) {
+ if (IS_ERR(tsk)) {
+ err = PTR_ERR(tsk);
+ if (err == -EAGAIN) {
+ err = 0;
continue;
}
+ break;
+ }
- err = skb_handler(skb, cb, tsk);
- if (err) {
- prev_portid = tsk->portid;
- spin_unlock_bh(&tsk->sk.sk_lock.slock);
- goto out;
- }
-
- prev_portid = 0;
- spin_unlock_bh(&tsk->sk.sk_lock.slock);
+ sock_hold(&tsk->sk);
+ rhashtable_walk_stop(iter);
+ lock_sock(&tsk->sk);
+ err = skb_handler(skb, cb, tsk);
+ if (err) {
+ release_sock(&tsk->sk);
+ sock_put(&tsk->sk);
+ goto out;
}
+ release_sock(&tsk->sk);
+ rhashtable_walk_start(iter);
+ sock_put(&tsk->sk);
}
+ rhashtable_walk_stop(iter);
out:
- rcu_read_unlock();
- cb->args[0] = tbl_id;
- cb->args[1] = prev_portid;
-
return skb->len;
}
EXPORT_SYMBOL(tipc_nl_sk_walk);
+int tipc_dump_start(struct netlink_callback *cb)
+{
+ struct rhashtable_iter *iter = (void *)cb->args[0];
+ struct net *net = sock_net(cb->skb->sk);
+ struct tipc_net *tn = tipc_net(net);
+
+ if (!iter) {
+ iter = kmalloc(sizeof(*iter), GFP_KERNEL);
+ if (!iter)
+ return -ENOMEM;
+
+ cb->args[0] = (long)iter;
+ }
+
+ rhashtable_walk_enter(&tn->sk_rht, iter);
+ return 0;
+}
+EXPORT_SYMBOL(tipc_dump_start);
+
+int tipc_dump_done(struct netlink_callback *cb)
+{
+ struct rhashtable_iter *hti = (void *)cb->args[0];
+
+ rhashtable_walk_exit(hti);
+ kfree(hti);
+ return 0;
+}
+EXPORT_SYMBOL(tipc_dump_done);
+
int tipc_sk_fill_sock_diag(struct sk_buff *skb, struct netlink_callback *cb,
struct tipc_sock *tsk, u32 sk_filter_state,
u64 (*tipc_diag_gen_cookie)(struct sock *sk))
diff --git a/net/tipc/socket.h b/net/tipc/socket.h
index aff9b2ae5a1f..d43032e26532 100644
--- a/net/tipc/socket.h
+++ b/net/tipc/socket.h
@@ -68,4 +68,6 @@ int tipc_nl_sk_walk(struct sk_buff *skb, struct netlink_callback *cb,
int (*skb_handler)(struct sk_buff *skb,
struct netlink_callback *cb,
struct tipc_sock *tsk));
+int tipc_dump_start(struct netlink_callback *cb);
+int tipc_dump_done(struct netlink_callback *cb);
#endif
--
2.14.4
^ permalink raw reply related
* Re: Re: KASAN: use-after-free Read in __rhashtable_lookup (2)
From: syzbot @ 2018-08-24 23:05 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: davem, linux-kernel, linux-rdma, netdev, rds-devel,
santosh.shilimkar, sowmini.varadhan, syzkaller-bugs
In-Reply-To: <20180824230446.GA16891@oracle.com>
> #syz dup KASAN: use-after-free Read in rds_find_bound
unknown command "dup"
> The problem is a missing synchronize_net() in rds_release.
> References:
> https://www.spinics.net/lists/netdev/msg475074.html
> https://www.spinics.net/lists/netdev/msg507908.html
^ permalink raw reply
* Re: Re: KASAN: use-after-free Read in __rhashtable_lookup (2)
From: syzbot @ 2018-08-24 23:05 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: davem, linux-kernel, linux-rdma, netdev, rds-devel,
santosh.shilimkar, sowmini.varadhan, syzkaller-bugs
In-Reply-To: <20180824230446.GA16891@oracle.com>
> #syz dup KASAN: use-after-free Read in rds_find_bound
unknown command "dup"
> The problem is a missing synchronize_net() in rds_release.
> References:
> https://www.spinics.net/lists/netdev/msg475074.html
> https://www.spinics.net/lists/netdev/msg507908.html
> --
> You received this message because you are subscribed to the Google
> Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/20180824230446.GA16891%40oracle.com.
> For more options, visit https://groups.google.com/d/optout.
^ permalink raw reply
* Re: Re: KASAN: use-after-free Read in __rhashtable_lookup (2)
From: Dmitry Vyukov @ 2018-08-24 23:10 UTC (permalink / raw)
To: syzbot
Cc: Sowmini Varadhan, David Miller, LKML, linux-rdma, netdev,
rds-devel, Santosh Shilimkar, syzkaller-bugs
In-Reply-To: <000000000000d8879a05743667a2@google.com>
On Fri, Aug 24, 2018 at 4:05 PM, syzbot
<syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com> wrote:
>
>
>> #syz dup KASAN: use-after-free Read in rds_find_bound
>
>
> unknown command "dup"
You need a colon after dup.
But this can't possibly be a dup of that bug. That bug was fixed in February:
https://syzkaller.appspot.com/bug?extid=93a5839deb355537440f
And this crash happened a day ago. This is a different bug. If you dup
it to a closed one, this one will also be considered closed, and then
a new crash will immediately create a new bug.
>> The problem is a missing synchronize_net() in rds_release.
>> References:
>> https://www.spinics.net/lists/netdev/msg475074.html
>> https://www.spinics.net/lists/netdev/msg507908.html
>
>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "syzkaller-bugs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/syzkaller-bugs/20180824230446.GA16891%40oracle.com.
>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/000000000000d8879a05743667a2%40google.com.
>
> For more options, visit https://groups.google.com/d/optout.
^ permalink raw reply
* Re: KASAN: use-after-free Read in __rhashtable_lookup (2)
From: Sowmini Varadhan @ 2018-08-24 23:37 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzbot, David Miller, LKML, linux-rdma, netdev, rds-devel,
Santosh Shilimkar, syzkaller-bugs
In-Reply-To: <CACT4Y+a6P+q8jtRrd=cAf5hWbqQO4CQdWUaCCX-ygRbGsWHnGQ@mail.gmail.com>
On (08/24/18 16:10), Dmitry Vyukov wrote:
> You need a colon after dup.
I see.
> But this can't possibly be a dup of that bug. That bug was fixed in February:
Apologies, I unknowingly made a mistake in my
syzbot-tiquette in that case,
I did not intend to *close* the bug as a dup, I was
merely trying to indicate that this is yet another
manifestation of the original report with subject
"KASAN: use-after-free Read in rds_find_bound"
All of these problems are happening because we are missing
a synchronize_net() in rds_release.
A full description of the race conditon can be found in
https://www.spinics.net/lists/netdev/msg475074.html
If rds_release is going to nuke the rds_socket
it needs to sychronize_net() with other threads that may be
looking up the bind_hash_table. If we dont do that,
all of these "use-after-free" races can happen.
As long as we dont have the synchronize_net() we are
missing a big (and standard) bit of RCU synchronization.
--Sowmini
^ permalink raw reply
* [PATCH bpf 0/2] Two BPF sockmap fixes
From: Daniel Borkmann @ 2018-08-24 20:08 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
This series contains two fixes found while working with the BPF
sockmap code. One refcount leakage and one case for a small window
of use after free. See patches for more info. Thanks!
Daniel Borkmann (2):
bpf, sockmap: fix potential use after free in bpf_tcp_close
bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg
kernel/bpf/sockmap.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--
2.9.5
^ permalink raw reply
* [PATCH bpf 1/2] bpf, sockmap: fix potential use after free in bpf_tcp_close
From: Daniel Borkmann @ 2018-08-24 20:08 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
In-Reply-To: <20180824200851.12308-1-daniel@iogearbox.net>
bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop().
A parallel update on the sock hash map can happen between psock_map_pop()
and lookup_elem_raw() where we override the element under link->hash /
link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only
test whether an element is present, but we do not test whether the
element is infact the element we were looking for.
We lock the sock in bpf_tcp_close() during that time, so do we hold
the lock in sock_hash_update_elem(). However, the latter locks the
sock which is newly updated, not the one we're purging from the hash
table. This means that while one CPU is doing the lookup from bpf_tcp_close(),
another CPU is doing the map update in parallel, dropped our sock from
the hlist and released the psock.
Subsequently the first CPU will find the new sock and attempts to drop
and release the old sock yet another time. Fix is that we need to check
the elements for a match after lookup, similar as we do in the sock map.
Note that the hash tab elems are freed via RCU, so access to their
link->hash / link->key is fine since we're under RCU read side there.
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index cf5195c..01879e4 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -369,7 +369,7 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
/* If another thread deleted this object skip deletion.
* The refcnt on psock may or may not be zero.
*/
- if (l) {
+ if (l && l == link) {
hlist_del_rcu(&link->hash_node);
smap_release_sock(psock, link->sk);
free_htab_elem(htab, link);
--
2.9.5
^ permalink raw reply related
* [PATCH bpf 2/2] bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg
From: Daniel Borkmann @ 2018-08-24 20:08 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: john.fastabend, netdev, Daniel Borkmann
In-Reply-To: <20180824200851.12308-1-daniel@iogearbox.net>
In bpf_tcp_recvmsg() we first took a reference on the psock, however
once we find that there are skbs in the normal socket's receive queue
we return with processing them through tcp_recvmsg(). Problem is that
we leak the taken reference on the psock in that path. Given we don't
really do anything with the psock at this point, move the skb_queue_empty()
test before we fetch the psock to fix this case.
Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
kernel/bpf/sockmap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 01879e4..26d8a30 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -912,6 +912,8 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
if (unlikely(flags & MSG_ERRQUEUE))
return inet_recv_error(sk, msg, len, addr_len);
+ if (!skb_queue_empty(&sk->sk_receive_queue))
+ return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
rcu_read_lock();
psock = smap_psock_sk(sk);
@@ -922,9 +924,6 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
goto out;
rcu_read_unlock();
- if (!skb_queue_empty(&sk->sk_receive_queue))
- return tcp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
-
lock_sock(sk);
bytes_ready:
while (copied != len) {
--
2.9.5
^ permalink raw reply related
* [PATCH v2 00/17] WireGuard: Secure Network Tunnel
From: Jason A. Donenfeld @ 2018-08-24 21:38 UTC (permalink / raw)
To: linux-kernel, netdev, davem; +Cc: Jason A. Donenfeld
This patchset is available on git.kernel.org in this branch, where it may be
pulled directly for inclusion into net-next:
* https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=jd/wireguard
WireGuard is a secure network tunnel written especially for Linux, which
has faced around three years of serious development, deployment, and
scrutiny. It delivers excellent performance and is extremely easy to
use and configure. It has been designed with the primary goal of being
both easy to audit by virtue of being small and highly secure from a
cryptography and systems security perspective. WireGuard is used by some
massive companies pushing enormous amounts of traffic, and likely
already today you've consumed bytes that at some point transited through
a WireGuard tunnel. Even as an out-of-tree module, WireGuard has been
integrated into various userspace tools, Linux distributions, mobile
phones, and data centers. There are ports in several languages to
several operating systems, and even commercial hardware and services
sold integrating WireGuard. It is time, therefore, for WireGuard to be
properly integrated into Linux.
Ample information, including documentation, installation instructions,
and project details, is available at:
* https://www.wireguard.com/
* https://www.wireguard.com/papers/wireguard.pdf
As it is currently an out-of-tree module, it lives in its own git repo
and has its own mailing list, and every commit for the module is tested
against every stable kernel since 3.10 on a variety of architectures
using an extensive test suite:
* https://git.zx2c4.com/WireGuard
https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/WireGuard.git/
* https://lists.zx2c4.com/mailman/listinfo/wireguard
* https://www.wireguard.com/build-status/
The project has been broadly discussed at conferences, and was presented
to the Netdev developers in Seoul last November, where a paper was
released detailing some interesting aspects of the project. Dave asked
me after the talk if I would consider sending in a v1 "sooner rather
than later", hence this patchset. A decision is still waiting from the
LPC network track committee, but an update on these topics may be
presented in Vancouver in a few months. Presentations:
* https://www.wireguard.com/presentations/
* https://www.wireguard.com/papers/wireguard-netdev22.pdf
The cryptography in the protocol itself has been formally verified by
several independent academic teams with positive results, and I know of
two additional efforts on their way to further corroborate those
findings. The version 1 protocol is "complete", and so the purpose of
this review is to assess the implementation of the protocol. However, it
still may be of interest to know that the thing you're reviewing uses a
protocol with various nice security properties:
* https://www.wireguard.com/formal-verification/
This patchset is divided into three segments. The first introduces a very
simple helper for working with the FPU state for the purposes of amortizing
SIMD operations. The second segment is a small collection of cryptographic
primitives, split up into several commits by primitive and by hardware. The
third is WireGuard itself, presented as an unintrusive and self-contained
virtual network driver.
It is intended that this entire patch series enter the kernel through
DaveM's net-next tree.
Changes v1->v2:
- Zinc has been split into multiple patches.
- Zinc has been split into multiple config options.
- WireGuard has been wrapped at 80 characters.
- Numerous cleanups of inside WireGuard.
- Splitting out the simd helper into linux/.
- Simplification of bitwise arithmetic inside some Zinc primitives.
- Better handling of unaligned buffers inside Zinc.
- Expanded CC list.
Enjoy,
Jason
^ permalink raw reply
* [PATCH v2 03/17] zinc: ChaCha20 generic C implementation
From: Jason A. Donenfeld @ 2018-08-24 21:38 UTC (permalink / raw)
To: linux-kernel, netdev, davem
Cc: Jason A. Donenfeld, Andy Lutomirski, Greg KH, Samuel Neves,
Jean-Philippe Aumasson, linux-crypto
In-Reply-To: <20180824213849.23647-1-Jason@zx2c4.com>
This implements the ChaCha20 permutation as a single C statement, by way
of the comma operator, which the compiler is able to simplify
terrifically.
Information: https://cr.yp.to/chacha.html
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: linux-crypto@vger.kernel.org
---
include/zinc/chacha20.h | 47 ++++++++++++
lib/zinc/Kconfig | 4 +
lib/zinc/Makefile | 4 +
lib/zinc/chacha20/chacha20.c | 137 +++++++++++++++++++++++++++++++++++
lib/zinc/main.c | 5 ++
5 files changed, 197 insertions(+)
create mode 100644 include/zinc/chacha20.h
create mode 100644 lib/zinc/chacha20/chacha20.c
diff --git a/include/zinc/chacha20.h b/include/zinc/chacha20.h
new file mode 100644
index 000000000000..ba50a2424ad0
--- /dev/null
+++ b/include/zinc/chacha20.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_CHACHA20_H
+#define _ZINC_CHACHA20_H
+
+#include <linux/simd.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+enum {
+ CHACHA20_IV_SIZE = 16,
+ CHACHA20_KEY_SIZE = 32,
+ CHACHA20_BLOCK_SIZE = 64,
+ HCHACHA20_KEY_SIZE = 32,
+ HCHACHA20_NONCE_SIZE = 16
+};
+
+struct chacha20_ctx {
+ u32 key[8];
+ u32 counter[4];
+} __aligned(32);
+
+void chacha20_fpu_init(void);
+
+static inline void chacha20_init(struct chacha20_ctx *state, const u8 key[CHACHA20_KEY_SIZE], const u64 nonce)
+{
+ __le32 *le_key = (__le32 *)key;
+ state->key[0] = le32_to_cpu(le_key[0]);
+ state->key[1] = le32_to_cpu(le_key[1]);
+ state->key[2] = le32_to_cpu(le_key[2]);
+ state->key[3] = le32_to_cpu(le_key[3]);
+ state->key[4] = le32_to_cpu(le_key[4]);
+ state->key[5] = le32_to_cpu(le_key[5]);
+ state->key[6] = le32_to_cpu(le_key[6]);
+ state->key[7] = le32_to_cpu(le_key[7]);
+ state->counter[0] = state->counter[1] = 0;
+ state->counter[2] = nonce & U32_MAX;
+ state->counter[3] = nonce >> 32;
+}
+void chacha20(struct chacha20_ctx *state, u8 *dst, const u8 *src, u32 len, simd_context_t simd_context);
+
+void hchacha20(u8 derived_key[CHACHA20_KEY_SIZE], const u8 nonce[HCHACHA20_NONCE_SIZE], const u8 key[HCHACHA20_KEY_SIZE], simd_context_t simd_context);
+
+#endif /* _ZINC_CHACHA20_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index aa4f8d449d6b..5311a0d6ba8b 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -6,6 +6,10 @@ config ZINC
select NEON
select KERNEL_MODE_NEON
+config ZINC_CHACHA20
+ bool
+ select ZINC
+
config ZINC_DEBUG
bool "Zinc cryptography library debugging and self-tests"
depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 8e30115217db..6ec77feb4ab7 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -2,6 +2,10 @@ ccflags-y := -O3
ccflags-y += -Wframe-larger-than=8192
ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt'
+ifeq ($(CONFIG_ZINC_CHACHA20),y)
+zinc-y += chacha20/chacha20.o
+endif
+
zinc-y += main.o
obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
new file mode 100644
index 000000000000..29c487375ccd
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20.c
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/chacha20.h>
+
+#include <linux/kernel.h>
+#include <crypto/algapi.h>
+
+#ifndef HAVE_CHACHA20_ARCH_IMPLEMENTATION
+void __init chacha20_fpu_init(void) { }
+static inline bool chacha20_arch(u8 *out, const u8 *in, const size_t len, const u32 key[8], const u32 counter[4], simd_context_t simd_context) { return false; }
+static inline bool hchacha20_arch(u8 *derived_key, const u8 *nonce, const u8 *key, simd_context_t simd_context) { return false; }
+#endif
+
+#define EXPAND_32_BYTE_K 0x61707865U, 0x3320646eU, 0x79622d32U, 0x6b206574U
+
+#define QUARTER_ROUND(x, a, b, c, d) ( \
+ x[a] += x[b], \
+ x[d] = rol32((x[d] ^ x[a]), 16), \
+ x[c] += x[d], \
+ x[b] = rol32((x[b] ^ x[c]), 12), \
+ x[a] += x[b], \
+ x[d] = rol32((x[d] ^ x[a]), 8), \
+ x[c] += x[d], \
+ x[b] = rol32((x[b] ^ x[c]), 7) \
+)
+
+#define C(i, j) (i * 4 + j)
+
+#define DOUBLE_ROUND(x) ( \
+ /* Column Round */ \
+ QUARTER_ROUND(x, C(0, 0), C(1, 0), C(2, 0), C(3, 0)), \
+ QUARTER_ROUND(x, C(0, 1), C(1, 1), C(2, 1), C(3, 1)), \
+ QUARTER_ROUND(x, C(0, 2), C(1, 2), C(2, 2), C(3, 2)), \
+ QUARTER_ROUND(x, C(0, 3), C(1, 3), C(2, 3), C(3, 3)), \
+ /* Diagonal Round */ \
+ QUARTER_ROUND(x, C(0, 0), C(1, 1), C(2, 2), C(3, 3)), \
+ QUARTER_ROUND(x, C(0, 1), C(1, 2), C(2, 3), C(3, 0)), \
+ QUARTER_ROUND(x, C(0, 2), C(1, 3), C(2, 0), C(3, 1)), \
+ QUARTER_ROUND(x, C(0, 3), C(1, 0), C(2, 1), C(3, 2)) \
+)
+
+#define TWENTY_ROUNDS(x) ( \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x), \
+ DOUBLE_ROUND(x) \
+)
+
+static void chacha20_block_generic(__le32 *stream, u32 *state)
+{
+ u32 x[CHACHA20_BLOCK_SIZE / sizeof(u32)];
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(x); ++i)
+ x[i] = state[i];
+
+ TWENTY_ROUNDS(x);
+
+ for (i = 0; i < ARRAY_SIZE(x); ++i)
+ stream[i] = cpu_to_le32(x[i] + state[i]);
+
+ ++state[12];
+}
+
+static void chacha20_generic(u8 *out, const u8 *in, u32 len, const u32 key[8], const u32 counter[4])
+{
+ __le32 buf[CHACHA20_BLOCK_SIZE / sizeof(__le32)];
+ u32 x[] = {
+ EXPAND_32_BYTE_K,
+ key[0], key[1], key[2], key[3],
+ key[4], key[5], key[6], key[7],
+ counter[0], counter[1], counter[2], counter[3]
+ };
+
+ if (out != in)
+ memmove(out, in, len);
+
+ while (len >= CHACHA20_BLOCK_SIZE) {
+ chacha20_block_generic(buf, x);
+ crypto_xor(out, (u8 *)buf, CHACHA20_BLOCK_SIZE);
+ len -= CHACHA20_BLOCK_SIZE;
+ out += CHACHA20_BLOCK_SIZE;
+ }
+ if (len) {
+ chacha20_block_generic(buf, x);
+ crypto_xor(out, (u8 *)buf, len);
+ }
+}
+
+void chacha20(struct chacha20_ctx *state, u8 *dst, const u8 *src, u32 len, simd_context_t simd_context)
+{
+ if (!chacha20_arch(dst, src, len, state->key, state->counter, simd_context))
+ chacha20_generic(dst, src, len, state->key, state->counter);
+ state->counter[0] += (len + 63) / 64;
+}
+EXPORT_SYMBOL(chacha20);
+
+static void hchacha20_generic(u8 derived_key[CHACHA20_KEY_SIZE], const u8 nonce[HCHACHA20_NONCE_SIZE], const u8 key[HCHACHA20_KEY_SIZE])
+{
+ __le32 *out = (__force __le32 *)derived_key;
+ u32 x[] = {
+ EXPAND_32_BYTE_K,
+ le32_to_cpup((__le32 *)(key + 0)), le32_to_cpup((__le32 *)(key + 4)), le32_to_cpup((__le32 *)(key + 8)), le32_to_cpup((__le32 *)(key + 12)),
+ le32_to_cpup((__le32 *)(key + 16)), le32_to_cpup((__le32 *)(key + 20)), le32_to_cpup((__le32 *)(key + 24)), le32_to_cpup((__le32 *)(key + 28)),
+ le32_to_cpup((__le32 *)(nonce + 0)), le32_to_cpup((__le32 *)(nonce + 4)), le32_to_cpup((__le32 *)(nonce + 8)), le32_to_cpup((__le32 *)(nonce + 12))
+ };
+
+ TWENTY_ROUNDS(x);
+
+ out[0] = cpu_to_le32(x[0]);
+ out[1] = cpu_to_le32(x[1]);
+ out[2] = cpu_to_le32(x[2]);
+ out[3] = cpu_to_le32(x[3]);
+ out[4] = cpu_to_le32(x[12]);
+ out[5] = cpu_to_le32(x[13]);
+ out[6] = cpu_to_le32(x[14]);
+ out[7] = cpu_to_le32(x[15]);
+}
+
+void hchacha20(u8 derived_key[CHACHA20_KEY_SIZE], const u8 nonce[HCHACHA20_NONCE_SIZE], const u8 key[HCHACHA20_KEY_SIZE], simd_context_t simd_context)
+{
+ if (!hchacha20_arch(derived_key, nonce, key, simd_context))
+ hchacha20_generic(derived_key, nonce, key);
+}
+/* Deliberately not EXPORT_SYMBOL'd, since there are few reasons why somebody
+ * should be using this directly, rather than via xchacha20. Revisit only in
+ * the unlikely event that somebody has a good reason to export this.
+ */
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
index 590872563955..79b85045f22f 100644
--- a/lib/zinc/main.c
+++ b/lib/zinc/main.c
@@ -3,6 +3,8 @@
* Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
*/
+#include <zinc/chacha20.h>
+
#include <linux/init.h>
#include <linux/module.h>
@@ -17,6 +19,9 @@
static int __init mod_init(void)
{
+#ifdef CONFIG_ZINC_CHACHA20
+ chacha20_fpu_init();
+#endif
return 0;
}
--
2.18.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox