* [net-next 05/13] fm10k: fix typos on fall through comments
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Newer versions of GCC since version 7 now warn when a case statement may
fall through without an explicit comment. "Fallthough" does not count as
it is misspelled. Fix the typos for these comments to appease the new
warnings.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 4 ++--
drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 10 +++++-----
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
index 334088a101c3..244d3ad58ca7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
@@ -1,5 +1,5 @@
/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -1586,7 +1586,7 @@ s32 fm10k_pfvf_mbx_init(struct fm10k_hw *hw, struct fm10k_mbx_info *mbx,
mbx->mbmem_reg = FM10K_MBMEM_VF(id, 0);
break;
}
- /* fallthough */
+ /* fall through */
default:
return FM10K_MBX_ERR_NO_MBX;
}
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 40ee0242a80a..9e4fb3a44376 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1,5 +1,5 @@
/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -1334,19 +1334,19 @@ static u8 fm10k_iov_supported_xcast_mode_pf(struct fm10k_vf_info *vf_info,
case FM10K_XCAST_MODE_PROMISC:
if (vf_flags & FM10K_VF_FLAG_PROMISC_CAPABLE)
return FM10K_XCAST_MODE_PROMISC;
- /* fallthough */
+ /* fall through */
case FM10K_XCAST_MODE_ALLMULTI:
if (vf_flags & FM10K_VF_FLAG_ALLMULTI_CAPABLE)
return FM10K_XCAST_MODE_ALLMULTI;
- /* fallthough */
+ /* fall through */
case FM10K_XCAST_MODE_MULTI:
if (vf_flags & FM10K_VF_FLAG_MULTI_CAPABLE)
return FM10K_XCAST_MODE_MULTI;
- /* fallthough */
+ /* fall through */
case FM10K_XCAST_MODE_NONE:
if (vf_flags & FM10K_VF_FLAG_NONE_CAPABLE)
return FM10K_XCAST_MODE_NONE;
- /* fallthough */
+ /* fall through */
default:
break;
}
--
2.14.2
^ permalink raw reply related
* [net-next 09/13] fm10k: simplify reading PFVFLRE register
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
We're doing a really convoluted bitshift and read for the PFVFLRE
register. Just reading the PFVFLRE(1), shifting it by 32, then reading
PFVFLRE(0) should be sufficient.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index d8356c494f06..dfc88a463735 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -1,5 +1,5 @@
/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -67,10 +67,8 @@ s32 fm10k_iov_event(struct fm10k_intfc *interface)
/* read VFLRE to determine if any VFs have been reset */
do {
- vflre = fm10k_read_reg(hw, FM10K_PFVFLRE(0));
+ vflre = fm10k_read_reg(hw, FM10K_PFVFLRE(1));
vflre <<= 32;
- vflre |= fm10k_read_reg(hw, FM10K_PFVFLRE(1));
- vflre = (vflre << 32) | (vflre >> 32);
vflre |= fm10k_read_reg(hw, FM10K_PFVFLRE(0));
i = iov_data->num_vfs;
--
2.14.2
^ permalink raw reply related
* [net-next 03/13] fm10k: Use seq_putc() in fm10k_dbg_desc_break()
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Markus Elfring, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Markus Elfring <elfring@users.sourceforge.net>
Two single characters should be put into a sequence.
Thus use the corresponding function "seq_putc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
index 5116fd043630..14df09e2d964 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c
@@ -52,9 +52,9 @@ static void fm10k_dbg_desc_seq_stop(struct seq_file __always_unused *s,
static void fm10k_dbg_desc_break(struct seq_file *s, int i)
{
while (i--)
- seq_puts(s, "-");
+ seq_putc(s, '-');
- seq_puts(s, "\n");
+ seq_putc(s, '\n');
}
static int fm10k_dbg_tx_desc_seq_show(struct seq_file *s, void *v)
--
2.14.2
^ permalink raw reply related
* [net-next 04/13] fm10k: stop spurious link down messages when Tx FIFO is full
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
In fm10k_get_host_state_generic, we check the mailbox tx_read() function
to ensure that the mailbox is still open. This function also checks to
make sure we have space to transmit another message. Unfortunately, if
we just recently sent a bunch of messages (such as enabling hundreds of
VLANs on a VF) this can result in a race where the watchdog task thinks
the link went down just because we haven't had time to process all these
messages yet.
Instead, lets just check whether the mailbox is still open. This ensures
that we don't race with the Tx FIFO, and we only link down once the
mailbox is not open.
This is safe, because if the FIFO fills up and we're unable to send
a message for too long, we'll end up triggering the timeout detection
which results in a reset. Additionally, since we still check to ensure
the mailbox state is OPEN, we'll transition to link down whenever the
mailbox closes as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_common.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_common.c b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
index 62a6ad9b3eed..736a9f087bc9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_common.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
@@ -1,5 +1,5 @@
/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -517,8 +517,8 @@ s32 fm10k_get_host_state_generic(struct fm10k_hw *hw, bool *host_ready)
goto out;
}
- /* verify Mailbox is still valid */
- if (!mbx->ops.tx_ready(mbx, FM10K_VFMBX_MSG_MTU))
+ /* verify Mailbox is still open */
+ if (mbx->state != FM10K_STATE_OPEN)
goto out;
/* interface cannot receive traffic without logical ports */
--
2.14.2
^ permalink raw reply related
* [net-next 08/13] fm10k: avoid needless delay when loading driver
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
When we load the driver, we set the last_reset to be in the future,
which delays the initial driver reset. Additionally, the service task
isn't scheduled to run automatically until the timer runs out. This
causes a needless delay of the first reset to begin talking to the
switch manager.
We can avoid this by simply not setting last_reset and immediately
scheduling the service task while in probe. This allows the device to
wake up faster, and avoids this delay.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 9212b3fa3b62..6c2c4bffaedf 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1800,9 +1800,6 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
netdev->vlan_features |= NETIF_F_HIGHDMA;
}
- /* delay any future reset requests */
- interface->last_reset = jiffies + (10 * HZ);
-
/* reset and initialize the hardware so it is in a known state */
err = hw->mac.ops.reset_hw(hw);
if (err) {
@@ -2079,8 +2076,9 @@ static int fm10k_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
/* enable SR-IOV after registering netdev to enforce PF/VF ordering */
fm10k_iov_configure(pdev, 0);
- /* clear the service task disable bit to allow service task to start */
+ /* clear the service task disable bit and kick off service task */
clear_bit(__FM10K_SERVICE_DISABLE, interface->state);
+ fm10k_service_event_schedule(interface);
return 0;
--
2.14.2
^ permalink raw reply related
* [net-next 02/13] fm10k: reschedule service event if we stall the PF<->SM mailbox
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
When we are handling PF<->VF mailbox messages, it is possible that the
VF will send us so many messages that the PF<->SM FIFO will fill up. In
this case, we stop the loop and wait until the service event is
rescheduled.
Normally this should happen due to an interrupt. But it is possible that
we don't get another interrupt for a while and it isn't until the
service timer actually reschedules us. Instead, simply reschedule
immediately which will cause the service event to be run again as soon
as we exit.
This ensures that we promptly handle all of the PF<->VF messages with
minimal delay, while still giving time for the SM mailbox to drain.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 2ec49116fe91..d8356c494f06 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -143,6 +143,10 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
if (!hw->mbx.ops.tx_ready(&hw->mbx, FM10K_VFMBX_MSG_MTU)) {
/* keep track of how many times this occurs */
interface->hw_sm_mbx_full++;
+
+ /* make sure we try again momentarily */
+ fm10k_service_event_schedule(interface);
+
break;
}
--
2.14.2
^ permalink raw reply related
* [net-next 06/13] fm10k: avoid possible truncation of q_vector->name
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
New versions of GCC since version 7 began warning about possible
truncation of calls to snprintf. We can fix this and avoid false
positives. First, we should pass the full buffer size to snprintf,
because it guarantees a NULL character as part of its passed length, so
passing len-1 is simply wasting a byte of possible storage.
Second, if we make the ri and ti variables unsigned, the compiler is
able to correctly reason that the value never gets larger than 256, so
it doesn't need to warn about the full space required to print a signed
integer.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 63784576ae8b..9212b3fa3b62 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1544,7 +1544,7 @@ int fm10k_qv_request_irq(struct fm10k_intfc *interface)
struct net_device *dev = interface->netdev;
struct fm10k_hw *hw = &interface->hw;
struct msix_entry *entry;
- int ri = 0, ti = 0;
+ unsigned int ri = 0, ti = 0;
int vector, err;
entry = &interface->msix_entries[NON_Q_VECTORS(hw)];
@@ -1554,15 +1554,15 @@ int fm10k_qv_request_irq(struct fm10k_intfc *interface)
/* name the vector */
if (q_vector->tx.count && q_vector->rx.count) {
- snprintf(q_vector->name, sizeof(q_vector->name) - 1,
- "%s-TxRx-%d", dev->name, ri++);
+ snprintf(q_vector->name, sizeof(q_vector->name),
+ "%s-TxRx-%u", dev->name, ri++);
ti++;
} else if (q_vector->rx.count) {
- snprintf(q_vector->name, sizeof(q_vector->name) - 1,
- "%s-rx-%d", dev->name, ri++);
+ snprintf(q_vector->name, sizeof(q_vector->name),
+ "%s-rx-%u", dev->name, ri++);
} else if (q_vector->tx.count) {
- snprintf(q_vector->name, sizeof(q_vector->name) - 1,
- "%s-tx-%d", dev->name, ti++);
+ snprintf(q_vector->name, sizeof(q_vector->name),
+ "%s-tx-%u", dev->name, ti++);
} else {
/* skip this unused q_vector */
continue;
--
2.14.2
^ permalink raw reply related
* [net-next 01/13] fm10k: ensure we process SM mbx when processing VF mbx
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171002154236.84043-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
When we process VF mailboxes, the driver is likely going to also queue
up messages to the switch manager. This process merely queues up the
FIFO, but doesn't actually begin the transmission process. Because we
hold the mailbox lock during this VF processing, the PF<->SM mailbox is
not getting processed at this time. Ensure that we actually process the
PF<->SM mailbox in between each PF<->VF mailbox.
This should ensure prompt transmission of the messages queued up after
each VF message is received and handled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 5f4dac0d36ef..2ec49116fe91 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -126,6 +126,9 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
struct fm10k_mbx_info *mbx = &vf_info->mbx;
u16 glort = vf_info->glort;
+ /* process the SM mailbox first to drain outgoing messages */
+ hw->mbx.ops.process(hw, &hw->mbx);
+
/* verify port mapping is valid, if not reset port */
if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort))
hw->iov.ops.reset_lport(hw, vf_info);
--
2.14.2
^ permalink raw reply related
* [net-next 00/13][pull request] 100GbE Intel Wired LAN Driver Updates 2017-10-02
From: Jeff Kirsher @ 2017-10-02 15:42 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene
This series contains updates to fm10k only.
Jake provides all but one of the changes in this series. Most are small
fixes, starting with ensuring prompt transmission of messages queued up
after each VF message is received and handled. Fix a possible race
condition between the watchdog task and the processing of mailbox
messages by just checking whether the mailbox is still open. Fix a
couple of GCC v7 warnings, including misspelled "fall through" comments
and warnings about possible truncation of calls to snprintf(). Cleaned
up a convoluted bitshift and read for the PFVFLRE register. Fixed a
potential divide by zero when finding the proper r_idx.
Markus Elfring fixes an issue which was found using Coccinelle, where
we should have been using seq_putc() instead of seq_puts().
The following are changes since commit 0929567a7a2dab8455a7313956973ff0d339709a:
samples/bpf: fix warnings in xdp_monitor_user
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE
Jacob Keller (12):
fm10k: ensure we process SM mbx when processing VF mbx
fm10k: reschedule service event if we stall the PF<->SM mailbox
fm10k: stop spurious link down messages when Tx FIFO is full
fm10k: fix typos on fall through comments
fm10k: avoid possible truncation of q_vector->name
fm10k: add missing fall through comment
fm10k: avoid needless delay when loading driver
fm10k: simplify reading PFVFLRE register
fm10k: don't loop while resetting VFs due to VFLR event
fm10k: avoid divide by zero in rare cases when device is resetting
fm10k: move fm10k_prepare_for_reset and fm10k_handle_reset
fm10k: prevent race condition of __FM10K_SERVICE_SCHED
Markus Elfring (1):
fm10k: Use seq_putc() in fm10k_dbg_desc_break()
drivers/net/ethernet/intel/fm10k/fm10k_common.c | 6 +-
drivers/net/ethernet/intel/fm10k/fm10k_debugfs.c | 4 +-
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 35 ++++----
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 1 +
drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 4 +-
drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 8 +-
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 110 +++++++++++++----------
drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 10 +--
8 files changed, 101 insertions(+), 77 deletions(-)
--
2.14.2
^ permalink raw reply
* Re: [PATCH v2 3/6] staging: fsl-dpaa2/ethsw: Add ethtool support
From: Andrew Lunn @ 2017-10-02 15:37 UTC (permalink / raw)
To: Razvan Stefanescu
Cc: gregkh, devel, linux-kernel, netdev, agraf, arnd,
alexandru.marginean, bogdan.purcareata, ruxandra.radulescu,
laurentiu.tudor, stuyoder
In-Reply-To: <1506933380-12641-4-git-send-email-razvan.stefanescu@nxp.com>
Hi Razvan
> +static void ethsw_get_drvinfo(struct net_device *netdev,
> + struct ethtool_drvinfo *drvinfo)
> +{
> + struct ethsw_port_priv *port_priv = netdev_priv(netdev);
> + u16 version_major, version_minor;
> + int err;
> +
> + strlcpy(drvinfo->driver, KBUILD_MODNAME, sizeof(drvinfo->driver));
> + strlcpy(drvinfo->version, ethsw_drv_version, sizeof(drvinfo->version));
Software driver versions are mostly useless. I would suggest you
remove this.
Andrew
^ permalink raw reply
* Re: [PATCH net-next v2] net: core: decouple ifalias get/set from rtnl lock
From: Eric Dumazet @ 2017-10-02 15:19 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev
In-Reply-To: <20171002150936.GB30423@breakpoint.cc>
On Mon, 2017-10-02 at 17:09 +0200, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Just use RCU : A writer is supposed to work on a private copy, and
> > _then_ publish the new pointer, so that a reader can not see mangled
> > string.
> >
> > We either copy the 'old' name or the 'new' one.
> >
> > A seqcount is not needed, and wont prevent you from reading the value
> > right before a change anyway.
>
> Would you rather use kfree_rcu or unconditional synchronize_net()
> before releasing old memory?
kfree_rcu() please ;)
Adding 16 bytes for the rcu_head is acceptable I think.
^ permalink raw reply
* Re: [PATCH net-next v2] net: core: decouple ifalias get/set from rtnl lock
From: Florian Westphal @ 2017-10-02 15:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, netdev
In-Reply-To: <1506956029.8061.8.camel@edumazet-glaptop3.roam.corp.google.com>
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Just use RCU : A writer is supposed to work on a private copy, and
> _then_ publish the new pointer, so that a reader can not see mangled
> string.
>
> We either copy the 'old' name or the 'new' one.
>
> A seqcount is not needed, and wont prevent you from reading the value
> right before a change anyway.
Would you rather use kfree_rcu or unconditional synchronize_net()
before releasing old memory?
^ permalink raw reply
* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 15:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
David S. Miller, Willem de Bruijn
In-Reply-To: <1506955708.8061.5.camel@edumazet-glaptop3.roam.corp.google.com>
On Mon, Oct 02, 2017 at 07:48:28AM -0700, Eric Dumazet wrote:
> Please try the following fool proof patch.
>
> This is what I had in my local tree back in August but could not
> conclude on the syzkaller bug I was working on.
Thanks, I'll give this a go shortly.
I'm currently minimizing the Syzkaller log so that I can trigger the
issue more quickly (and have some confidence in a Tested-by)!
Thanks,
Mark.
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
> room = 576;
> room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
> room -= sizeof(struct icmphdr);
> -
> + if (room < 0)
> + goto ende;
> icmp_param.data_len = skb_in->len - icmp_param.offset;
> if (icmp_param.data_len > room)
> icmp_param.data_len = room;
>
>
>
^ permalink raw reply
* Re: [iproute PATCH v3 0/3] Check user supplied interface name lengths
From: Stephen Hemminger @ 2017-10-02 15:03 UTC (permalink / raw)
To: Phil Sutter; +Cc: netdev
In-Reply-To: <20171002114637.25703-1-phil@nwl.cc>
On Mon, 2 Oct 2017 13:46:34 +0200
Phil Sutter <phil@nwl.cc> wrote:
> This series adds explicit checks for user-supplied interface names to
> make sure they fit Linux's requirements.
>
> The first two patches simplify interface name parsing in some places -
> these are side-effects of working on the actual implementation provided
> in patch three.
>
> Changes since v2:
> - Changed patch 3 as suggested in review.
>
> Changes since v1:
> - Patches 1 and 2 introduced.
> - Changes to patch 3 are listed in there.
>
> Phil Sutter (3):
> ip{6,}tunnel: Avoid copying user-supplied interface name around
> tc: flower: No need to cache indev arg
> Check user supplied interface name lengths
>
> include/utils.h | 2 ++
> ip/ip6tunnel.c | 9 +++++----
> ip/ipl2tp.c | 4 +++-
> ip/iplink.c | 31 ++++++++++++-------------------
> ip/ipmaddr.c | 3 ++-
> ip/iprule.c | 10 ++++++++--
> ip/iptunnel.c | 29 +++++++++++++++--------------
> ip/iptuntap.c | 6 ++++--
> lib/utils.c | 29 +++++++++++++++++++++++++++++
> misc/arpd.c | 3 ++-
> tc/f_flower.c | 7 +++----
> 11 files changed, 85 insertions(+), 48 deletions(-)
>
Applied.
^ permalink raw reply
* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 15:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
Willem de Bruijn
In-Reply-To: <CANn89i+zQG=rjHRqzsvPzjg5tqW43Lcz-BJ9spLascP9Nt5z8Q@mail.gmail.com>
On Mon, Oct 02, 2017 at 07:42:17AM -0700, Eric Dumazet wrote:
> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > Just to check I've understood correctly, are you suggesting that the
> > IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
> > doesn't seem to exist today)?
>
> We have plenty of places this is checked.
>
> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
>
> Problem is : these checks are not fool proof yet.
>
> ( Only the admin was supposed to play these games )
Sorry, I meant that there was no constant called IP_MIN_MTU, and I was
just looking at the sites fixed up by c780a049f9bf4423.
I appreciate given that this requires admin privileges it's not exactly
high priority. I didn't mean for the above to sound like some kind of
accusation!
> > Otherwise, I do spot another potential issue. The writer side (e.g. most
> > net_device::ndo_change_mtu implementations and the __dev_set_mtu()
> > fallback) doesn't use WRITE_ONCE().
>
> It does not matter how many strange values can be observed by the reader :
> We must be fool proof anyway from reader point of view, so the
> WRITE_ONCE() is not strictly needed.
Ok. If we expect to always check somewhere on the reader side I guess
that makes sense.
Thanks,
Mark.
^ permalink raw reply
* Re: RFC iproute2 doc files
From: Stephen Hemminger @ 2017-10-02 14:56 UTC (permalink / raw)
To: Leon Romanovsky; +Cc: netdev
In-Reply-To: <20171002073109.GK2031@mtr-leonro.local>
[-- Attachment #1: Type: text/plain, Size: 716 bytes --]
On Mon, 2 Oct 2017 10:31:09 +0300
Leon Romanovsky <leon@kernel.org> wrote:
> On Wed, Sep 20, 2017 at 08:11:59AM -0700, Stephen Hemminger wrote:
> > I noticed that the iproute man pages are up to date but the LaTex documentation
> > is very out of date. Rarely updated since the Linux 2.2 days.
> >
> > Either someone needs to do a massive editing job on them, or they should just
> > be dropped. My preference would be to just drop everything in the doc/ directory.
> > The current versions are so old, they can't be helping.
>
> If my vote counts, I will say to drop it.
>
> Thanks
They are gone now in current git repo. If anyone wants them they can
resurrect them from git and revise them.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net 3/3] net: skb_queue_purge(): lock/unlock the queue only once
From: Stephen Hemminger @ 2017-10-02 14:55 UTC (permalink / raw)
To: Michael Witten
Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
Eric Dumazet, netdev, linux-kernel
In-Reply-To: <057dd5367241468691b2b9adbc38a3ba-mfwitten@gmail.com>
On Mon, 02 Oct 2017 05:15:32 -0000
Michael Witten <mfwitten@gmail.com> wrote:
> On Sun, 1 Oct 2017 17:59:09 -0700, Stephen Hemminger wrote:
>
> > On Sun, 01 Oct 2017 22:19:20 -0000 Michael Witten wrote:
> >
> >> + spin_lock_irqsave(&q->lock, flags);
> >> + skb = q->next;
> >> + __skb_queue_head_init(q);
> >> + spin_unlock_irqrestore(&q->lock, flags);
> >
> > Other code manipulating lists uses splice operation and
> > a sk_buff_head temporary on the stack. That would be easier
> > to understand.
> >
> > struct sk_buf_head head;
> >
> > __skb_queue_head_init(&head);
> > spin_lock_irqsave(&q->lock, flags);
> > skb_queue_splice_init(q, &head);
> > spin_unlock_irqrestore(&q->lock, flags);
> >
> >
> >> + while (skb != head) {
> >> + next = skb->next;
> >> kfree_skb(skb);
> >> + skb = next;
> >> + }
> >
> > It would be cleaner if you could use
> > skb_queue_walk_safe rather than open coding the loop.
> >
> > skb_queue_walk_safe(&head, skb, tmp)
> > kfree_skb(skb);
>
> I appreciate abstraction as much as anybody, but I do not believe
> that such abstractions would actually be an improvement here.
>
> * Splice-initing seems more like an idiom than an abstraction;
> at first blush, it wouldn't be clear to me what the intention
> is.
>
> * Such abstractions are fairly unnecessary.
>
> * The function as written is already so short as to be
> easily digested.
>
> * More to the point, this function is not some generic,
> higher-level algorithm that just happens to employ the
> socket buffer interface; rather, it is a function that
> implements part of that very interface, and may thus
> twiddle the intimate bits of these data structures
> without being accused of abusing a leaky abstraction.
>
> * Such abstractions add overhead, if only conceptually. In this
> case, a temporary socket buffer queue allocates *3* unnecessary
> struct members, including a whole `spinlock_t' member:
>
> prev
> qlen
> lock
>
> It's possible that the compiler will be smart enough to leave
> those out, but I have my suspicions that it won't, not only
> given that the interface contract requires that the temporary
> socket buffer queue be properly initialized before use, but
> also because splicing into the temporary will manipulate its
> `qlen'. Yet, why worry whether optimization happens? The whole
> issue can simply be avoided by exploiting the intimate details
> that are already philosophically available to us.
>
> Similarly, the function `skb_queue_walk_safe' is nice, but it
> loses value both because a temporary queue loses value (as just
> described), and because it ignores the fact that legitimate
> access to the internals of these data structures allows for
> setting up the requested loop in advance; that is to say, the
> two parts of the function that we are now debating can be woven
> together more tightly than `skb_queue_walk_safe' allows.
>
> For these reasons, I stand by the way that the patch currently
> implements this function; it does exactly what is desired, no more
> or less.
>
> Sincerely,
> Michael Witten
The point is that there was discussion in the past of replacing
the next/prev as used in skb with more generic code from list.h.
If the abstraction was used, then this code would just work.
The temporary skb_buff_head is on the stack, and any
access to updating those fields like qlen are in CPU cache
and therefore have very little impact on any peformance.
^ permalink raw reply
* Re: [PATCH net-next v2] net: core: decouple ifalias get/set from rtnl lock
From: Eric Dumazet @ 2017-10-02 14:53 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev
In-Reply-To: <20171002102745.3047-1-fw@strlen.de>
On Mon, 2017-10-02 at 12:27 +0200, Florian Westphal wrote:
> Device alias can be set by either rtnetlink (rtnl is held) or sysfs.
>
> rtnetlink hold the rtnl mutex, sysfs acquires it for this purpose.
> Add an extra mutex for it plus a seqcount to get a consistent snapshot
> of the alias buffer.
> +int dev_get_alias(const struct net_device *dev, char *alias, size_t len)
> +{
> + unsigned int seq;
> + int ret;
> +
> + for (;;) {
> + const char *name;
> +
> + ret = 0;
> + rcu_read_lock();
> + name = rcu_dereference(dev->ifalias);
> + seq = raw_seqcount_begin(&ifalias_rename_seq);
> + if (name)
> + ret = snprintf(alias, len, "%s", name);
> + rcu_read_unlock();
> +
> + if (!read_seqcount_retry(&ifalias_rename_seq, seq))
> + break;
> +
> + cond_resched();
> + }
> +
> + return ret;
> +}
I believe this too complex and not needed.
Just use RCU : A writer is supposed to work on a private copy, and
_then_ publish the new pointer, so that a reader can not see mangled
string.
We either copy the 'old' name or the 'new' one.
A seqcount is not needed, and wont prevent you from reading the value
right before a change anyway.
^ permalink raw reply
* Re: [BUG] bpf is broken in net-next
From: Stephen Hemminger @ 2017-10-02 14:52 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Martin KaFai Lau, netdev
In-Reply-To: <20171002013424.tlaictbu7btc4z56@ast-mbp>
On Sun, 1 Oct 2017 18:34:25 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Sun, Oct 01, 2017 at 02:02:30PM -0700, Stephen Hemminger wrote:
> > Recent regression in net-next building bpf.c in samples/bpf now broken.
> >
> > $ make samples/bpf/
> >
> >
> > HOSTCC samples/bpf/../../tools/lib/bpf/bpf.o
> > samples/bpf/../../tools/lib/bpf/bpf.c: In function ‘bpf_create_map_node’:
> > samples/bpf/../../tools/lib/bpf/bpf.c:76:13: error: ‘union bpf_attr’ has no member named ‘map_name’; did you mean ‘map_type’?
> > memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
> > ^
> > samples/bpf/../../tools/lib/bpf/bpf.c:76:44: error: ‘BPF_OBJ_NAME_LEN’ undeclared (first use in this function)
> > memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
>
> everything works fine for me...
> did you do 'make headers_install' ?
>
Yes, that was the problem. I had done make mrproper after changing branches.
^ permalink raw reply
* Re: [PATCH RFC] flow_dissector: Add FLOW_DISSECTOR_F_FLOWER
From: Jiri Pirko @ 2017-10-02 14:49 UTC (permalink / raw)
To: Tom Herbert; +Cc: davem, hannes, netdev, rohit
In-Reply-To: <20170929191343.11318-1-tom@quantonium.net>
Fri, Sep 29, 2017 at 09:13:42PM CEST, tom@quantonium.net wrote:
>This patch is RFC and would be applied after "flow_dissector:
>Protocol specific flow dissector offload"
>
>In order to maitain uAPI in flower, the FLOW_DISSECTOR_F_FLOWER flag
>is added to indicate to flow_dissector that the caller is flower.
>As new funtionality is addes to flow_dissector that would break
>the flower uAPI, the code can be wrapped in "if (!(flags &
>FLOW_DISSECTOR_F_FLOWER)).
>
>In this patch the conditional is use around protocol specific
>dissection (e.g. DPI into VXLAN) as well as the code that
>enforces a depth of parsing to prevent DPI. The latter was a
>recent patch that would introduce a parsing limit to flower that
>did not exist before (i.e. would break uAPI).
>
>Signed-off-by: Tom Herbert <tom@quantonium.net>
>---
> include/net/flow_dissector.h | 1 +
> net/core/flow_dissector.c | 17 +++++++++++------
> net/sched/cls_flow.c | 3 ++-
> 3 files changed, 14 insertions(+), 7 deletions(-)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index ad75bbfd1c9c..ca315107d147 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -214,6 +214,7 @@ enum flow_dissector_key_id {
> #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL BIT(2)
> #define FLOW_DISSECTOR_F_STOP_AT_ENCAP BIT(3)
> #define FLOW_DISSECTOR_F_STOP_AT_L4 BIT(4)
>+#define FLOW_DISSECTOR_F_FLOWER BIT(5)
I don't like flow_dissector to have any user-specific bits. Note that
the same dissection may be used not only from flower, but from other
code as well (OVS). Flow dissector should not care who the caller is.
^ permalink raw reply
* Re: [PATCH net-next 0/3] bridge: neigh msg proxy and flood suppression support
From: Roopa Prabhu @ 2017-10-02 14:49 UTC (permalink / raw)
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, Nikolay Aleksandrov,
stephen@networkplumber.org, bridge
In-Reply-To: <1506919018-27875-1-git-send-email-roopa@cumulusnetworks.com>
On Sun, Oct 1, 2017 at 9:36 PM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This series implements arp and nd suppression in the bridge
> driver for ethernet vpns. It implements rfc7432, section 10
> https://tools.ietf.org/html/rfc7432#section-10
> for ethernet VPN deployments. It is similar to the existing
> BR_ARP_PROXY flag but has a few semantic differences to conform
> to EVPN standard. In case of EVPN, it is mainly used to avoid flooding to
> tunnel ports like vxlan/mpls. Unlike the existing flags it suppresses flood
> of all neigh discovery packets (arp, nd) to tunnel ports.
>
> Roopa Prabhu (3):
> bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd
> flood
> neigh arp suppress first
> bridge: suppress nd messages from going to BR_NEIGH_SUPPRESS ports
>
pls ignore, shows conflict applying over recent net-next bridge
changes. Will rebase and submit v2.
^ permalink raw reply
* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 14:48 UTC (permalink / raw)
To: Mark Rutland
Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
David S. Miller, Willem de Bruijn
In-Reply-To: <20171002142156.GB21696@leverpostej>
On Mon, 2017-10-02 at 15:21 +0100, Mark Rutland wrote:
> Hi Eric,
>
> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
> > On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
> > > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
> > > skb_copy_and_csum_bits().
>
> > > kernel BUG at net/core/skbuff.c:2626!
>
> > > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> > > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
> > > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
> > > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
> > > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
> > > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
> > > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
> > > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
> > > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
> > > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> > > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
> > > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
> > > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
> > > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
> > > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
> > > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
> > > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
> > > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
> > > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
> > > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> > > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> > > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> > > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> > > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> > > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> > > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> > > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> > > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> > > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> > > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
Please try the following fool proof patch.
This is what I had in my local tree back in August but could not
conclude on the syzkaller bug I was working on.
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
room = 576;
room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
room -= sizeof(struct icmphdr);
-
+ if (room < 0)
+ goto ende;
icmp_param.data_len = skb_in->len - icmp_param.offset;
if (icmp_param.data_len > room)
icmp_param.data_len = room;
^ permalink raw reply related
* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 14:42 UTC (permalink / raw)
To: Mark Rutland
Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
Willem de Bruijn
In-Reply-To: <20171002142156.GB21696@leverpostej>
On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Eric,
>
> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>> > skb_copy_and_csum_bits().
>
>> > kernel BUG at net/core/skbuff.c:2626!
>
>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>
>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>> on loopback device, below minimum size of ipv4 MTU.
>
>> I tried to track it in August [1], but it seems hard to find all the
>> issues with this.
>>
>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>> Author: Eric Dumazet <edumazet@google.com>
>> Date: Wed Aug 16 11:09:12 2017 -0700
>>
>> ipv4: better IP_MAX_MTU enforcement
>>
>> While working on yet another syzkaller report, I found
>> that our IP_MAX_MTU enforcements were not properly done.
>>
>> gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>> final result can be bigger than IP_MAX_MTU :/
>>
>> This is a problem because device mtu can be changed on other cpus or
>> threads.
>>
>> While this patch does not fix the issue I am working on, it is
>> probably worth addressing it.
>
> Just to check I've understood correctly, are you suggesting that the
> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
> doesn't seem to exist today)?
We have plenty of places this is checked.
For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
Problem is : these checks are not fool proof yet.
( Only the admin was supposed to play these games )
>
> Otherwise, I do spot another potential issue. The writer side (e.g. most
> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
> fallback) doesn't use WRITE_ONCE().
It does not matter how many strange values can be observed by the reader :
We must be fool proof anyway from reader point of view, so the
WRITE_ONCE() is not strictly needed.
>
> IIUC, that means that the write could be torn across multiple accesses,
> and we could see dev->mtu < dev->min_mtu on the read side, even if we
> use READ_ONCE(), and sanity check the mtu value before calling
> __dev_set_mtu().
^ permalink raw reply
* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 14:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
Willem de Bruijn
In-Reply-To: <CANn89i++X2rzsaHkTRayMpCED_exL8WPq12=RwD_hXsZ6cN-wQ@mail.gmail.com>
Hi Eric,
On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
> > skb_copy_and_csum_bits().
> > kernel BUG at net/core/skbuff.c:2626!
> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
> This is most likely a bug caused by syzkaller setting a ridiculous MTU
> on loopback device, below minimum size of ipv4 MTU.
> I tried to track it in August [1], but it seems hard to find all the
> issues with this.
>
> commit c780a049f9bf442314335372c9abc4548bfe3e44
> Author: Eric Dumazet <edumazet@google.com>
> Date: Wed Aug 16 11:09:12 2017 -0700
>
> ipv4: better IP_MAX_MTU enforcement
>
> While working on yet another syzkaller report, I found
> that our IP_MAX_MTU enforcements were not properly done.
>
> gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
> final result can be bigger than IP_MAX_MTU :/
>
> This is a problem because device mtu can be changed on other cpus or
> threads.
>
> While this patch does not fix the issue I am working on, it is
> probably worth addressing it.
Just to check I've understood correctly, are you suggesting that the
IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
doesn't seem to exist today)?
Otherwise, I do spot another potential issue. The writer side (e.g. most
net_device::ndo_change_mtu implementations and the __dev_set_mtu()
fallback) doesn't use WRITE_ONCE().
IIUC, that means that the write could be torn across multiple accesses,
and we could see dev->mtu < dev->min_mtu on the read side, even if we
use READ_ONCE(), and sanity check the mtu value before calling
__dev_set_mtu().
Thanks,
Mark.
^ permalink raw reply
* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
From: Eric Dumazet @ 2017-10-02 13:56 UTC (permalink / raw)
To: James Chapman, svimik; +Cc: Stephen Hemminger, netdev
In-Reply-To: <CAEwTi7QkF73cc1hShzOoM9bzb5mBzyFW8t68OCn+XECrJ9d61Q@mail.gmail.com>
CC svimik@gmail.com so that he is aware of this netdev thread.
On Mon, 2017-10-02 at 14:32 +0100, James Chapman wrote:
> This seems to be a NULL pointer exception caused by tunnel->sock being
> NULL at the call to bh_lock_sock() in l2tp_xmit_skb() at
> l2tp_core.c:1135.
>
> tunnel->sock is set NULL in l2tp_core's tunnel socket destructor.
>
> At the moment, I don't understand how this happens because
> pppol2tp_xmit() does a sock_hold() on the tunnel socket before
> l2tp_xmit_skb() is called. I'm still looking at this.
>
> Has this problem only recently started happening?
>
>
>
>
>
> On 1 October 2017 at 18:21, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> >
> > Begin forwarded message:
> >
> > Date: Sun, 01 Oct 2017 16:22:33 +0000
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: stephen@networkplumber.org
> > Subject: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
> >
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=197099
> >
> > Bug ID: 197099
> > Summary: Kernel panic in interrupt [l2tp_ppp]
> > Product: Networking
> > Version: 2.5
> > Kernel Version: 4.8.13-1.el6.elrepo.x86_64
> > Hardware: x86-64
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: Other
> > Assignee: stephen@networkplumber.org
> > Reporter: svimik@gmail.com
> > Regression: No
> >
> > Created attachment 258685
> > --> https://bugzilla.kernel.org/attachment.cgi?id=258685&action=edit
> > stacktrace screenshot
> >
> > Hello!
> >
> > Getting kernel panics on multiple servers. Since it mentions l2tp_core,
> > l2tp_ppp and ppp_generic, I decided to report it to Networking (correct me if
> > I'm wrong).
> >
> > Unfortunately I'm still struggling with making kdump work, so the trace
> > screenshot is all I have at this moment. The only hope is that this stacktrace
> > means something to the guys that wrote the code.
> >
> > --
> > You are receiving this mail because:
> > You are the assignee for the bug.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox