* [PATCH 2/2] [v3] net: qcom/emac: add support for the Qualcomm Technologies QDF2400
From: Timur Tabi @ 2016-12-08 19:24 UTC
To: David Miller, netdev, alokc
In-Reply-To: <1481225061-30962-1-git-send-email-timur@codeaurora.org>
The QDF2432 and the QDF2400 have slightly different internal PHYs,
so there are some programming differences. Some of the registers in
the QDF2400 have moved, and some registers require different values
during initialization.
Because of the differences, and because HIDs are a scarce resource,
the ACPI tables specify the hardware version in an _HRV property.
Version 1 is the QDF2432, and version 2 is the QDF2400. Any future
SOC that has the same internal PHY but different programming
requirements will be assigned the next available version number.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
---
Notes:
v3: fix kbuild compilation error with !CONFIG_ACPI
drivers/net/ethernet/qualcomm/emac/Makefile | 3 +-
.../ethernet/qualcomm/emac/emac-sgmii-qdf2400.c | 217 +++++++++++++++++++++
drivers/net/ethernet/qualcomm/emac/emac-sgmii.c | 33 +++-
drivers/net/ethernet/qualcomm/emac/emac-sgmii.h | 1 +
4 files changed, 249 insertions(+), 5 deletions(-)
create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii-qdf2400.c
diff --git a/drivers/net/ethernet/qualcomm/emac/Makefile b/drivers/net/ethernet/qualcomm/emac/Makefile
index 204b787..7a66879 100644
--- a/drivers/net/ethernet/qualcomm/emac/Makefile
+++ b/drivers/net/ethernet/qualcomm/emac/Makefile
@@ -5,4 +5,5 @@
obj-$(CONFIG_QCOM_EMAC) += qcom-emac.o
qcom-emac-objs := emac.o emac-mac.o emac-phy.o emac-sgmii.o \
- emac-sgmii-fsm9900.o emac-sgmii-qdf2432.o
+ emac-sgmii-fsm9900.o emac-sgmii-qdf2432.o \
+ emac-sgmii-qdf2400.o
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii-qdf2400.c b/drivers/net/ethernet/qualcomm/emac/emac-sgmii-qdf2400.c
new file mode 100644
index 0000000..5b84194
--- /dev/null
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii-qdf2400.c
@@ -0,0 +1,217 @@
+/* Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/* Qualcomm Technologies, Inc. QDF2400 EMAC SGMII Controller driver.
+ */
+
+#include <linux/iopoll.h>
+#include "emac.h"
+
+/* EMAC_SGMII register offsets */
+#define EMAC_SGMII_PHY_TX_PWR_CTRL 0x000C
+#define EMAC_SGMII_PHY_LANE_CTRL1 0x0018
+#define EMAC_SGMII_PHY_CDR_CTRL0 0x0058
+#define EMAC_SGMII_PHY_POW_DWN_CTRL0 0x0080
+#define EMAC_SGMII_PHY_RESET_CTRL 0x00a8
+#define EMAC_SGMII_PHY_INTERRUPT_MASK 0x00b4
+
+/* SGMII digital lane registers */
+#define EMAC_SGMII_LN_DRVR_CTRL0 0x000C
+#define EMAC_SGMII_LN_DRVR_TAP_EN 0x0018
+#define EMAC_SGMII_LN_TX_MARGINING 0x001C
+#define EMAC_SGMII_LN_TX_PRE 0x0020
+#define EMAC_SGMII_LN_TX_POST 0x0024
+#define EMAC_SGMII_LN_TX_BAND_MODE 0x0060
+#define EMAC_SGMII_LN_LANE_MODE 0x0064
+#define EMAC_SGMII_LN_PARALLEL_RATE 0x007C
+#define EMAC_SGMII_LN_CML_CTRL_MODE0 0x00C0
+#define EMAC_SGMII_LN_MIXER_CTRL_MODE0 0x00D8
+#define EMAC_SGMII_LN_VGA_INITVAL 0x013C
+#define EMAC_SGMII_LN_UCDR_FO_GAIN_MODE0 0x0184
+#define EMAC_SGMII_LN_UCDR_SO_GAIN_MODE0 0x0190
+#define EMAC_SGMII_LN_UCDR_SO_CONFIG 0x019C
+#define EMAC_SGMII_LN_RX_BAND 0x01A4
+#define EMAC_SGMII_LN_RX_RCVR_PATH1_MODE0 0x01C0
+#define EMAC_SGMII_LN_RSM_CONFIG 0x01F8
+#define EMAC_SGMII_LN_SIGDET_ENABLES 0x0230
+#define EMAC_SGMII_LN_SIGDET_CNTRL 0x0234
+#define EMAC_SGMII_LN_SIGDET_DEGLITCH_CNTRL 0x0238
+#define EMAC_SGMII_LN_RX_EN_SIGNAL 0x02AC
+#define EMAC_SGMII_LN_RX_MISC_CNTRL0 0x02B8
+#define EMAC_SGMII_LN_DRVR_LOGIC_CLKDIV 0x02C8
+
+/* SGMII digital lane register values */
+#define UCDR_STEP_BY_TWO_MODE0 BIT(7)
+#define UCDR_xO_GAIN_MODE(x) ((x) & 0x7f)
+#define UCDR_ENABLE BIT(6)
+#define UCDR_SO_SATURATION(x) ((x) & 0x3f)
+
+#define SIGDET_LP_BYP_PS4 BIT(7)
+#define SIGDET_EN_PS0_TO_PS2 BIT(6)
+
+#define TXVAL_VALID_INIT BIT(4)
+#define KR_PCIGEN3_MODE BIT(0)
+
+#define MAIN_EN BIT(0)
+
+#define TX_MARGINING_MUX BIT(6)
+#define TX_MARGINING(x) ((x) & 0x3f)
+
+#define TX_PRE_MUX BIT(6)
+
+#define TX_POST_MUX BIT(6)
+
+#define CML_GEAR_MODE(x) (((x) & 7) << 3)
+#define CML2CMOS_IBOOST_MODE(x) ((x) & 7)
+
+#define MIXER_LOADB_MODE(x) (((x) & 0xf) << 2)
+#define MIXER_DATARATE_MODE(x) ((x) & 3)
+
+#define VGA_THRESH_DFE(x) ((x) & 0x3f)
+
+#define SIGDET_LP_BYP_PS0_TO_PS2 BIT(5)
+#define SIGDET_FLT_BYP BIT(0)
+
+#define SIGDET_LVL(x) (((x) & 0xf) << 4)
+
+#define SIGDET_DEGLITCH_CTRL(x) (((x) & 0xf) << 1)
+
+#define INVERT_PCS_RX_CLK BIT(7)
+
+#define DRVR_LOGIC_CLK_EN BIT(4)
+#define DRVR_LOGIC_CLK_DIV(x) ((x) & 0xf)
+
+#define PARALLEL_RATE_MODE0(x) ((x) & 0x3)
+
+#define BAND_MODE0(x) ((x) & 0x3)
+
+#define LANE_MODE(x) ((x) & 0x1f)
+
+#define CDR_PD_SEL_MODE0(x) (((x) & 0x3) << 5)
+#define EN_DLL_MODE0 BIT(4)
+#define EN_IQ_DCC_MODE0 BIT(3)
+#define EN_IQCAL_MODE0 BIT(2)
+
+#define BYPASS_RSM_SAMP_CAL BIT(1)
+#define BYPASS_RSM_DLL_CAL BIT(0)
+
+#define L0_RX_EQUALIZE_ENABLE BIT(6)
+
+#define PWRDN_B BIT(0)
+
+#define CDR_MAX_CNT(x) ((x) & 0xff)
+
+#define SERDES_START_WAIT_TIMES 100
+
+struct emac_reg_write {
+ unsigned int offset;
+ u32 val;
+};
+
+static void emac_reg_write_all(void __iomem *base,
+ const struct emac_reg_write *itr, size_t size)
+{
+ size_t i;
+
+ for (i = 0; i < size; ++itr, ++i)
+ writel(itr->val, base + itr->offset);
+}
+
+static const struct emac_reg_write sgmii_laned[] = {
+ /* CDR Settings */
+ {EMAC_SGMII_LN_UCDR_FO_GAIN_MODE0,
+ UCDR_STEP_BY_TWO_MODE0 | UCDR_xO_GAIN_MODE(10)},
+ {EMAC_SGMII_LN_UCDR_SO_GAIN_MODE0, UCDR_xO_GAIN_MODE(0)},
+ {EMAC_SGMII_LN_UCDR_SO_CONFIG, UCDR_ENABLE | UCDR_SO_SATURATION(12)},
+
+ /* TX/RX Settings */
+ {EMAC_SGMII_LN_RX_EN_SIGNAL, SIGDET_LP_BYP_PS4 | SIGDET_EN_PS0_TO_PS2},
+
+ {EMAC_SGMII_LN_DRVR_CTRL0, TXVAL_VALID_INIT | KR_PCIGEN3_MODE},
+ {EMAC_SGMII_LN_DRVR_TAP_EN, MAIN_EN},
+ {EMAC_SGMII_LN_TX_MARGINING, TX_MARGINING_MUX | TX_MARGINING(25)},
+ {EMAC_SGMII_LN_TX_PRE, TX_PRE_MUX},
+ {EMAC_SGMII_LN_TX_POST, TX_POST_MUX},
+
+ {EMAC_SGMII_LN_CML_CTRL_MODE0,
+ CML_GEAR_MODE(1) | CML2CMOS_IBOOST_MODE(1)},
+ {EMAC_SGMII_LN_MIXER_CTRL_MODE0,
+ MIXER_LOADB_MODE(12) | MIXER_DATARATE_MODE(1)},
+ {EMAC_SGMII_LN_VGA_INITVAL, VGA_THRESH_DFE(31)},
+ {EMAC_SGMII_LN_SIGDET_ENABLES,
+ SIGDET_LP_BYP_PS0_TO_PS2 | SIGDET_FLT_BYP},
+ {EMAC_SGMII_LN_SIGDET_CNTRL, SIGDET_LVL(8)},
+
+ {EMAC_SGMII_LN_SIGDET_DEGLITCH_CNTRL, SIGDET_DEGLITCH_CTRL(4)},
+ {EMAC_SGMII_LN_RX_MISC_CNTRL0, INVERT_PCS_RX_CLK},
+ {EMAC_SGMII_LN_DRVR_LOGIC_CLKDIV,
+ DRVR_LOGIC_CLK_EN | DRVR_LOGIC_CLK_DIV(4)},
+
+ {EMAC_SGMII_LN_PARALLEL_RATE, PARALLEL_RATE_MODE0(1)},
+ {EMAC_SGMII_LN_TX_BAND_MODE, BAND_MODE0(1)},
+ {EMAC_SGMII_LN_RX_BAND, BAND_MODE0(2)},
+ {EMAC_SGMII_LN_LANE_MODE, LANE_MODE(26)},
+ {EMAC_SGMII_LN_RX_RCVR_PATH1_MODE0, CDR_PD_SEL_MODE0(2) |
+ EN_DLL_MODE0 | EN_IQ_DCC_MODE0 | EN_IQCAL_MODE0},
+ {EMAC_SGMII_LN_RSM_CONFIG, BYPASS_RSM_SAMP_CAL | BYPASS_RSM_DLL_CAL},
+};
+
+static const struct emac_reg_write physical_coding_sublayer_programming[] = {
+ {EMAC_SGMII_PHY_POW_DWN_CTRL0, PWRDN_B},
+ {EMAC_SGMII_PHY_CDR_CTRL0, CDR_MAX_CNT(15)},
+ {EMAC_SGMII_PHY_TX_PWR_CTRL, 0},
+ {EMAC_SGMII_PHY_LANE_CTRL1, L0_RX_EQUALIZE_ENABLE},
+};
+
+int emac_sgmii_init_qdf2400(struct emac_adapter *adpt)
+{
+ struct emac_phy *phy = &adpt->phy;
+ void __iomem *phy_regs = phy->base;
+ void __iomem *laned = phy->digital;
+ unsigned int i;
+ u32 lnstatus;
+
+ /* PCS lane-x init */
+ emac_reg_write_all(phy->base, physical_coding_sublayer_programming,
+ ARRAY_SIZE(physical_coding_sublayer_programming));
+
+ /* SGMII lane-x init */
+ emac_reg_write_all(phy->digital, sgmii_laned, ARRAY_SIZE(sgmii_laned));
+
+ /* Power up PCS and start reset lane state machine */
+
+ writel(0, phy_regs + EMAC_SGMII_PHY_RESET_CTRL);
+ writel(1, laned + SGMII_LN_RSM_START);
+
+ /* Wait for c_ready assertion */
+ for (i = 0; i < SERDES_START_WAIT_TIMES; i++) {
+ lnstatus = readl(phy_regs + SGMII_PHY_LN_LANE_STATUS);
+ if (lnstatus & BIT(1))
+ break;
+ usleep_range(100, 200);
+ }
+
+ if (i == SERDES_START_WAIT_TIMES) {
+ netdev_err(adpt->netdev, "SGMII failed to start\n");
+ return -EIO;
+ }
+
+ /* Disable digital and SERDES loopback */
+ writel(0, phy_regs + SGMII_PHY_LN_BIST_GEN0);
+ writel(0, phy_regs + SGMII_PHY_LN_BIST_GEN2);
+ writel(0, phy_regs + SGMII_PHY_LN_CDR_CTRL1);
+
+ /* Mask out all the SGMII Interrupt */
+ writel(0, phy_regs + EMAC_SGMII_PHY_INTERRUPT_MASK);
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
index 07c872a..d999d5d 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
@@ -161,20 +161,45 @@ void emac_sgmii_reset(struct emac_adapter *adpt)
static int emac_sgmii_acpi_match(struct device *dev, void *data)
{
+#ifdef CONFIG_ACPI
static const struct acpi_device_id match_table[] = {
{
.id = "QCOM8071",
- .driver_data = (kernel_ulong_t)emac_sgmii_init_qdf2432,
},
{}
};
const struct acpi_device_id *id = acpi_match_device(match_table, dev);
emac_sgmii_initialize *initialize = data;
- if (id)
- *initialize = (emac_sgmii_initialize)id->driver_data;
+ if (id) {
+ acpi_handle handle = ACPI_HANDLE(dev);
+ unsigned long long hrv;
+ acpi_status status;
+
+ status = acpi_evaluate_integer(handle, "_HRV", NULL, &hrv);
+ if (status) {
+ if (status == AE_NOT_FOUND)
+ /* Older versions of the QDF2432 ACPI tables do
+ * not have an _HRV property.
+ */
+ hrv = 1;
+ else
+ /* Something is wrong with the tables */
+ return 0;
+ }
- return !!id;
+ switch (hrv) {
+ case 1:
+ *initialize = emac_sgmii_init_qdf2432;
+ return 1;
+ case 2:
+ *initialize = emac_sgmii_init_qdf2400;
+ return 1;
+ }
+ }
+#endif
+
+ return 0;
}
static const struct of_device_id emac_sgmii_dt_match[] = {
diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
index e2bef14..80ed3dc 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
@@ -21,5 +21,6 @@ void emac_sgmii_reset(struct emac_adapter *adpt);
int emac_sgmii_init_fsm9900(struct emac_adapter *adpt);
int emac_sgmii_init_qdf2432(struct emac_adapter *adpt);
+int emac_sgmii_init_qdf2400(struct emac_adapter *adpt);
#endif
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
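
The _HRV-based selection in the patch above can be modelled outside the kernel. What follows is a minimal userspace sketch, not the driver code: `select_sgmii_init`, `init_qdf2432`, `init_qdf2400` and the `hrv_found` flag are hypothetical stand-ins. It only models the "found" and "not found" outcomes of the _HRV lookup; the patch additionally rejects the device on any other ACPI error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's init routine type. */
typedef int (*sgmii_init_fn)(void);

/* Hypothetical init routines; the return values only identify them. */
static int init_qdf2432(void) { return 2432; }
static int init_qdf2400(void) { return 2400; }

/*
 * Pick an init routine from the _HRV value.
 * hrv_found == false models AE_NOT_FOUND: older QDF2432 tables have no
 * _HRV property, so the lookup falls back to version 1.
 * Returns NULL for versions this sketch does not know about.
 */
static sgmii_init_fn select_sgmii_init(bool hrv_found, unsigned long long hrv)
{
	if (!hrv_found)
		hrv = 1;

	switch (hrv) {
	case 1:
		return init_qdf2432;
	case 2:
		return init_qdf2400;
	default:
		return NULL;
	}
}
```

A caller would treat a NULL result as "no match", which mirrors how the patch returns 0 for _HRV values other than 1 and 2.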
* [PATCH 0/2 v3] net: qcom/emac: simplify support for different SOCs
From: Timur Tabi @ 2016-12-08 19:24 UTC
To: David Miller, netdev, alokc
On SOCs that have the Qualcomm EMAC network controller, the internal
PHY block is always different. Sometimes the differences are small,
sometimes it might be a completely different IP. Either way, using version
numbers to differentiate them and putting all of the init code in one
file does not scale.
This patchset does two things: the first patch breaks up the current code into
different files, and the second patch adds support for a third SOC, the
Qualcomm Technologies QDF2400 ARM Server SOC.
Timur Tabi (2):
[v2] net: qcom/emac: move phy init code to separate files
[v3] net: qcom/emac: add support for the Qualcomm Technologies QDF2400
drivers/net/ethernet/qualcomm/emac/Makefile | 4 +-
.../ethernet/qualcomm/emac/emac-sgmii-fsm9900.c | 245 +++++++++
.../ethernet/qualcomm/emac/emac-sgmii-qdf2400.c | 217 ++++++++
.../ethernet/qualcomm/emac/emac-sgmii-qdf2432.c | 210 ++++++++
drivers/net/ethernet/qualcomm/emac/emac-sgmii.c | 569 ++-------------------
drivers/net/ethernet/qualcomm/emac/emac-sgmii.h | 6 +-
6 files changed, 725 insertions(+), 526 deletions(-)
create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii-fsm9900.c
create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii-qdf2400.c
create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii-qdf2432.c
* Re: [PATCH v2 net-next 0/2] phy: lan78xx: add phy fixup unregister functions & LAN7801 update
From: David Miller @ 2016-12-08 19:22 UTC
To: Woojung.Huh; +Cc: f.fainelli, andrew, netdev, UNGLinuxDriver
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D4097998F@CHN-SV-EXMX02.mchp-main.com>
From: <Woojung.Huh@microchip.com>
Date: Wed, 7 Dec 2016 20:25:50 +0000
> V2 patch of adding phy fixup unregister function with use in LAN7801
> update.
Series applied, thanks.
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Edward Cree @ 2016-12-08 19:20 UTC
To: Paolo Abeni, Eric Dumazet
Cc: David Miller, netdev, Willem de Bruijn, Tom Herbert,
Jesper Dangaard Brouer
In-Reply-To: <1481097428.5535.12.camel@redhat.com>
On 07/12/16 07:57, Paolo Abeni wrote:
> We have some experimental patches to implement GRO for plain UDP
> connected sockets, using frag_list to preserve the individual skb len,
> and deliver the packet to user space individually. With that I got
> ~3mpps with a single queue/user space sink - before the recent udp
> improvements.
You might want to benchmark these against my batched receive patches
from a while ago[1], both seem to have broadly the same objective.
In my benchmarking (obviously with different hardware) I was using
multiple sink processes, but all (processes and irqs) on a single
core; the unpatched kernel was getting ~5Mpps. Then with my patches
I was getting ~6.4Mpps. (Limitations of my test scripts meant that
having a single sink process meant also having a single source
process, in which case I was TX limited to ~3Mpps, and using about
60% CPU on the RX side.)
Let me know if you're interested in doing this comparison; if so I'll
post updated patches against net-next. My own attempts to benchmark
them more have been held up by lack of time and not really knowing
what constitutes a realistic netfilter setup.
Of course if you're using a device other than sfc you'll need to add
your own equivalent of patch #2 to call the netif_receive_skb_list()
entry point from the driver.
-Ed
[1] https://www.spinics.net/lists/netdev/msg373769.html
* Re: [PATCH 2/3] i40e: Add XDP_TX support
From: kbuild test robot @ 2016-12-08 19:20 UTC
To: Björn Töpel
Cc: kbuild-all, jeffrey.t.kirsher, intel-wired-lan,
Björn Töpel, john.r.fastabend, magnus.karlsson, netdev
In-Reply-To: <20161208170022.11555-3-bjorn.topel@gmail.com>
Hi Björn,
[auto build test ERROR on next-20161208]
[cannot apply to jkirsher-next-queue/dev-queue v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.9-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Bj-rn-T-pel/i40e-Support-for-XDP/20161209-013138
config: sparc64-allmodconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64
All error/warnings (new ones prefixed by >>):
In file included from include/linux/cache.h:4:0,
from include/linux/printk.h:8,
from include/linux/kernel.h:13,
from include/linux/list.h:8,
from include/linux/timer.h:4,
from include/linux/workqueue.h:8,
from include/linux/bpf.h:11,
from drivers/net/ethernet/intel/i40e/i40e_txrx.c:27:
drivers/net/ethernet/intel/i40e/i40e_txrx.c: In function 'i40e_try_flip_rx_page':
>> drivers/net/ethernet/intel/i40e/i40e_txrx.c:1613:32: error: 'size' undeclared (first use in this function)
unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
^
include/uapi/linux/kernel.h:10:41: note: in definition of macro '__ALIGN_KERNEL_MASK'
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
^
include/linux/kernel.h:49:22: note: in expansion of macro '__ALIGN_KERNEL'
#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
^~~~~~~~~~~~~~
>> drivers/net/ethernet/intel/i40e/i40e_txrx.c:1613:26: note: in expansion of macro 'ALIGN'
unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
^~~~~
drivers/net/ethernet/intel/i40e/i40e_txrx.c:1613:32: note: each undeclared identifier is reported only once for each function it appears in
unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
^
include/uapi/linux/kernel.h:10:41: note: in definition of macro '__ALIGN_KERNEL_MASK'
#define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
^
include/linux/kernel.h:49:22: note: in expansion of macro '__ALIGN_KERNEL'
#define ALIGN(x, a) __ALIGN_KERNEL((x), (a))
^~~~~~~~~~~~~~
>> drivers/net/ethernet/intel/i40e/i40e_txrx.c:1613:26: note: in expansion of macro 'ALIGN'
unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
^~~~~
vim +/size +1613 drivers/net/ethernet/intel/i40e/i40e_txrx.c
1607 */
1608 static bool i40e_try_flip_rx_page(struct i40e_rx_buffer *rx_buffer)
1609 {
1610 #if (PAGE_SIZE < 8192)
1611 unsigned int truesize = I40E_RXBUFFER_2048;
1612 #else
> 1613 unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
1614 unsigned int last_offset = PAGE_SIZE - I40E_RXBUFFER_2048;
1615 #endif
1616
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 48617 bytes --]
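
The failure above only shows up where PAGE_SIZE is at least 8192 (sparc64 uses 8 KiB pages), because `size` is referenced only on that preprocessor branch. The following is a minimal userspace sketch of the cache-line round-up involved, with `size` passed in explicitly so that both paths compile. `rx_truesize`, `ALIGN_UP`, `CACHE_LINE` and `RXBUFFER_2048` are hypothetical names, the 64-byte cache line is an assumption, and a runtime `if` stands in for the driver's `#if`. It is not the fix that was applied to the driver.

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

#define CACHE_LINE    64    /* assumed; L1_CACHE_BYTES is arch-specific */
#define RXBUFFER_2048 2048  /* models I40E_RXBUFFER_2048 */

/*
 * Return the buffer "truesize" for a received frame of `size` bytes.
 * Small pages are split into fixed 2048-byte halves; on large pages
 * the frame length is rounded up to a whole number of cache lines.
 */
static size_t rx_truesize(size_t page_size, size_t size)
{
	if (page_size < 8192)
		return RXBUFFER_2048;
	return ALIGN_UP(size, CACHE_LINE);
}
```

Because `size` is a parameter here, a build that takes the large-page path cannot hit an undeclared identifier.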
* Re: [patch] ser_gigaset: return -ENOMEM on error instead of success
From: David Miller @ 2016-12-08 19:19 UTC
To: dan.carpenter
Cc: pebolle, tilman, isdn, gigaset307x-common, netdev,
kernel-janitors
In-Reply-To: <20161207112203.GC5507@elgon.mountain>
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 7 Dec 2016 14:22:03 +0300
> If we can't allocate the resources in gigaset_initdriver() then we
> should return -ENOMEM instead of zero.
>
> Fixes: 2869b23e4b95 ("[PATCH] drivers/isdn/gigaset: new M101 driver (v2)")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied and queued up for -stable, thanks.
* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: David Miller @ 2016-12-08 19:17 UTC
To: john.fastabend
Cc: daniel, mst, shm, tgraf, alexei.starovoitov, john.r.fastabend,
netdev, brouer
In-Reply-To: <20161207200139.28121.4811.stgit@john-Precision-Tower-5810>
From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 07 Dec 2016 12:10:47 -0800
> This implements virtio_net for the mergeable buffers and big_packet
> modes. I tested this with vhost_net running on qemu and did not see
> any issues. For testing num_buf > 1 I added a hack to vhost driver
> to only put 100 bytes per buffer.
...
So where are we with this?
I'm not too thrilled with the idea of making XDP_TX optional or
something like that. If someone enables XDP, there is a tradeoff.
I also have reservations about the idea to make jumbo frames work
without giving XDP access to the whole packet. If it wants to push or
pop a header, it might need to know the whole packet length. How will
you pass that to the XDP program?
Some kinds of encapsulation require trailers, so precluding access
to the entire packet precludes those kinds of transformations.
This is why we want simple, linear, buffer access for XDP.
Even the most seemingly minor exception turns into a huge complicated
mess.
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Tom Herbert @ 2016-12-08 19:15 UTC
To: Eric Dumazet; +Cc: Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481220136.4930.110.camel@edumazet-glaptop3.roam.corp.google.com>
On Thu, Dec 8, 2016 at 10:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-12-08 at 09:49 -0800, Tom Herbert wrote:
>
>> Of course that would only help on systems where no one enables encaps,
>> ie. looks good in the simple benchmarks but in real life if just
>> one socket enables encap everyone else takes the hit. Alternatively,
>> maybe we could do early demux when we do the lookup in GRO to
>> eliminate the extra lookup?
>
> Well, if you do the lookup in GRO, wont it be done for every incoming
> MSS, instead of once per GRO packet ?
We should be able to avoid that. We already do the lookup for every
UDP packet going into GRO, would only need to take the refcnt once for
the whole GRO packet.
>
> Anyway, the flooded UDP sockets out there are not normally connected
> ones.
We still should be able to use early demux in that case, just can't
avoid the route lookup. I wonder if we might be able to cache a soft
route maybe for the last local destination received to help the
unconnected sockets case...
In any case, I can take a look at doing early demux from within UDP GRO.
Tom
* Re: [PATCH 3/6] net: ethernet: ti: cpts: add support of cpts HW_TS_PUSH
From: Grygorii Strashko @ 2016-12-08 19:04 UTC
To: Richard Cochran
Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N,
Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-omap-u79uwXL29TY76Z2rM5mHXA, Rob Herring,
devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok
In-Reply-To: <20161203232130.GA17944@netboy>
On 12/03/2016 05:21 PM, Richard Cochran wrote:
> On Mon, Nov 28, 2016 at 05:04:25PM -0600, Grygorii Strashko wrote:
>> This also changes the overflow polling period when the HW_TS_PUSH feature is
>> enabled - overflow check work will be scheduled more often (every
>> 200ms) for proper HW_TS_PUSH events reporting.
>
> For proper reporting, you should make use of the interrupt. The small
> fifo (16 iirc) could very well overflow in 200 ms. The interrupt
> handler should read out the entire fifo at each interrupt.
>
Hmm. This does not seem like a good idea, because the MISC IRQ will be
triggered for *any* CPTS event and there is no way to enable it just for
HW_TS_PUSH. So it doesn't work well with the current code for RX/TX timestamping
(which uses polling mode), and triggering more interrupts adds runtime
overhead in net RX/TX.
Maybe the overflow check/polling timeout can be made configurable (module parameter).
--
regards,
-grygorii
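
The trade-off discussed in this exchange comes down to simple arithmetic on FIFO depth and polling period. Here is a minimal sketch, assuming the FIFO is drained completely on every poll and taking the depth of 16 from the quoted "16 iirc". `max_event_rate` and `max_poll_period_ms` are hypothetical helper names, not part of the cpts driver.

```c
/*
 * Highest event rate (events per second) that a FIFO of `depth` entries
 * can absorb without overflow when it is fully drained once every
 * `poll_ms` milliseconds.
 */
static unsigned int max_event_rate(unsigned int depth, unsigned int poll_ms)
{
	return depth * 1000u / poll_ms;
}

/*
 * Longest polling period (in milliseconds) that still keeps up with
 * `rate_hz` events per second for a FIFO of `depth` entries.
 */
static unsigned int max_poll_period_ms(unsigned int depth, unsigned int rate_hz)
{
	return depth * 1000u / rate_hz;
}
```

With a depth of 16 and a 200 ms period, the first helper gives 80 events per second; anything faster would need either a shorter period or interrupt-driven draining.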
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:52 UTC
To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <CANn89i+10fwMQ+oqs2AgVfE9CHnpZqecN_NxVqobyzD1riyMfg@mail.gmail.com>
On Thu, Dec 8, 2016 at 10:38 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Dec 8, 2016 at 10:36 AM, Eric Dumazet <edumazet@google.com> wrote:
>> On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>>
>>> Nice one! This sounds like a relevant improvement!
>>>
>>> I'm wondering if it may cause regressions with small value of
>>> sk_rcvbuf ?!? e.g. with:
>>>
>>> netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>>>
>>
>> Possibly, then simply we can refine the test to :
>>
>> size = up->forward_deficit;
>> if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(sk->sk_receive_buf))
>> return;
>
I will also add this patch:
This makes sure our changes to sk_forward_alloc won't be slowed
because producers see the change to sk_rmem_alloc too soon.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8400d6954558..6bdcbe103390 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1191,13 +1191,14 @@ static void udp_rmem_release(struct sock *sk, int size, int partial)
}
up->forward_deficit = 0;
- atomic_sub(size, &sk->sk_rmem_alloc);
sk->sk_forward_alloc += size;
amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);
sk->sk_forward_alloc -= amt;
if (amt)
__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
+
+ atomic_sub(size, &sk->sk_rmem_alloc);
}
/* Note: called with sk_receive_queue.lock held.
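
The accounting in the hunk above can be followed with a small userspace model. This is a sketch under stated assumptions, not the kernel function: `struct fake_sock` and `rmem_release` are hypothetical, plain ints replace the kernel's atomic counter, and the quantum is assumed to be a 4 KiB page.

```c
#define MEM_QUANTUM       4096 /* assumed: SK_MEM_QUANTUM equals a 4 KiB page */
#define MEM_QUANTUM_SHIFT 12

/* Hypothetical model of the fields the hunk touches. */
struct fake_sock {
	int rmem_alloc;      /* models sk_rmem_alloc (atomic in the kernel) */
	int forward_alloc;   /* models sk_forward_alloc */
	int allocated_pages; /* models the protocol-wide page accounting */
};

static void rmem_release(struct fake_sock *sk, int size, int partial)
{
	int amt;

	/* Return the released bytes to the socket's forward allocation. */
	sk->forward_alloc += size;

	/* Give back whole quanta only, optionally keeping a partial one. */
	amt = (sk->forward_alloc - partial) & ~(MEM_QUANTUM - 1);
	sk->forward_alloc -= amt;
	if (amt)
		sk->allocated_pages -= amt >> MEM_QUANTUM_SHIFT;

	/* Done last, as in the patch: producers see the freed space only now. */
	sk->rmem_alloc -= size;
}
```

Releasing 9000 bytes with 100 bytes already in the forward allocation and `partial` set to 1 returns two whole quanta and leaves 908 bytes in the forward allocation.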
* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Paolo Abeni @ 2016-12-08 18:50 UTC
To: Eric Dumazet; +Cc: David Miller, netdev, Willem de Bruijn, Tom Herbert
In-Reply-To: <1481120956.4930.7.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, 2016-12-07 at 06:29 -0800, Eric Dumazet wrote:
> On Wed, 2016-12-07 at 08:57 +0100, Paolo Abeni wrote:
>
> > We have some experimental patches to implement GRO for plain UDP
> > connected sockets, using frag_list to preserve the individual skb len,
> > and deliver the packet to user space individually. With that I got
> > ~3mpps with a single queue/user space sink - before the recent udp
> > improvements. I would like to present these patches on netdev soon (no
> > sooner than next week, anyway).
> >
>
> Make sure you handle properly all netfilter helpers :(
Thank you for the heads-up!
UDP GRO will be enabled by a specific netdev feature bit that is disabled by
default, so it should not affect any setup by default.
> Keeping frag_list means you keep one sk_buff per segment, so this really
> looks like a legacy UDP server (like a DNS server) wont benefit from
> this anyway.
I'm sorry, I do not follow.
UDP GRO will require a connected socket, so very likely not a DNS server. The
use case is an application using long-lived UDP sockets carrying a lot of
traffic, like FIX protocol feeds over UDP.
Thank you,
Paolo
* Re: [PATCH v4 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active
From: Saeed Mahameed @ 2016-12-08 18:47 UTC
To: Martin KaFai Lau
Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
Daniel Borkmann, David Miller, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed,
Tariq Toukan, Kernel Team
In-Reply-To: <1481154794-2311034-4-git-send-email-kafai@fb.com>
On Thu, Dec 8, 2016 at 1:53 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head()
> support. This patch only affects the code path when XDP is active.
>
> After testing, the tx_dropped counter is incremented if the xdp_prog sends
> more than wire MTU.
>
I guess this is the HW tx_dropped counter. As a future improvement, I suggest
dropping such packets in SW to save CPU and HW cycles.
As an extra benefit, the pages of those packets will be recycled immediately
if they are dropped in SW.
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
* Re: [PATCH] drivers: net: xgene: initialize slots
From: Iyappan Subramanian @ 2016-12-08 18:44 UTC
To: Colin King; +Cc: Keyur Chudgar, netdev, linux-kernel@vger.kernel.org
In-Reply-To: <20161208111754.9711-1-colin.king@canonical.com>
On Thu, Dec 8, 2016 at 3:17 AM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> static analysis using cppcheck detected that slots was uninitialized.
> Fix this by initializing it to buf_pool->slots - 1
>
> Found using static analysis with CoverityScan, CID #1387620
>
> Fixes: a9380b0f7be818 ("drivers: net: xgene: Add support for Jumbo frame")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> index 6c7eea8..899163c 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> @@ -636,6 +636,7 @@ static void xgene_enet_free_pagepool(struct xgene_enet_desc_ring *buf_pool,
>
> dev = ndev_to_dev(buf_pool->ndev);
> head = buf_pool->head;
> + slots = buf_pool->slots - 1;
>
> for (i = 0; i < 4; i++) {
> frag_size = xgene_enet_get_data_len(le64_to_cpu(desc[i ^ 1]));
Thanks, Colin.
Dan Carpenter <dan.carpenter@oracle.com> already posted the fix and it
was accepted.
http://marc.info/?l=linux-netdev&m=148110980224343&w=2
> --
> 2.10.2
>
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:38 UTC
To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <CANn89iL8r9UD=sGn3WxVFZ+Z_QJYYM6aXxCFvafwvJ-bEtNhKQ@mail.gmail.com>
On Thu, Dec 8, 2016 at 10:36 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>
>> Nice one! This sounds like a relevant improvement!
>>
>> I'm wondering if it may cause regressions with small value of
>> sk_rcvbuf ?!? e.g. with:
>>
>> netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>>
>
> Possibly, then simply we can refine the test to :
>
> size = up->forward_deficit;
> if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(sk->sk_receive_buf))
> return;
BTW, I tried :
lpaa6:~# ./netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
  4608     1024   10.00     4499400      0    3685.88
  2560            10.00     4498670           3685.28
So it looks like it is working.
However I have no doubt there might be a corner case for tiny
SO_RCVBUF values or for some message sizes.
* Re: [PATCH v4 net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
From: Saeed Mahameed @ 2016-12-08 18:36 UTC
To: Martin KaFai Lau
Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
Daniel Borkmann, David Miller, Jakub Kicinski,
Jesper Dangaard Brouer, John Fastabend, Saeed Mahameed,
Tariq Toukan, Kernel Team
In-Reply-To: <1481154794-2311034-3-git-send-email-kafai@fb.com>
On Thu, Dec 8, 2016 at 1:53 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> When XDP is active in mlx4, mlx4 is using one page/pkt.
> At the same time (i.e. when XDP is active), it is currently
> limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN)
> which is 1514 in x86. AFAICT, we can at least raise the MTU
> limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this
> patch is doing. It will be useful in the next patch which
> allows XDP program to extend the packet by adding new header(s).
>
> Note: In the earlier XDP patches, there is already existing guard
> to ensure the page/pkt scheme only applies when XDP is active
> in mlx4.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
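
The MTU limits quoted above follow from one subtraction. Here is a minimal sketch of that arithmetic; `xdp_max_mtu` is a hypothetical helper, the 1536-byte buffer is simply the size implied by the quoted 1514 figure, and the 4096-byte page is an assumption for x86.

```c
#define ETH_HLEN  14 /* Ethernet header length in bytes */
#define VLAN_HLEN 4  /* one 802.1Q tag in bytes */

/*
 * Largest MTU that fits in a receive buffer of `buf_size` bytes once
 * room for the Ethernet header and two VLAN tags is set aside.
 */
static int xdp_max_mtu(int buf_size)
{
	return buf_size - ETH_HLEN - 2 * VLAN_HLEN;
}
```

With a 1536-byte buffer this gives the current limit of 1514, and with a full 4096-byte page it gives 4074.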
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Eric Dumazet @ 2016-12-08 18:36 UTC
To: Paolo Abeni; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1481221491.6120.11.camel@redhat.com>
On Thu, Dec 8, 2016 at 10:24 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> Nice one! This sounds like a relevant improvement!
>
> I'm wondering if it may cause regressions with small value of
> sk_rcvbuf ?!? e.g. with:
>
> netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
>
Possibly. In that case we can simply refine the test to:
	size = up->forward_deficit;
	if (size < (sk->sk_rcvbuf >> 2) && !skb_queue_empty(&sk->sk_receive_queue))
		return;
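
To make the refined test concrete, here is a minimal userspace model of the batched release, covering only the partial (per-packet) path. The struct rx_model and rx_release() names are hypothetical; the fields only mirror sk_rcvbuf, sk_rmem_alloc and forward_deficit, and the queue is reduced to a packet count.

```c
/* Userspace model of the socket state involved in the heuristic. */
struct rx_model {
	int rcvbuf;          /* models sk->sk_rcvbuf */
	int rmem_alloc;      /* models sk->sk_rmem_alloc */
	int forward_deficit; /* models up->forward_deficit */
	int queued;          /* packets still waiting in the receive queue */
};

/*
 * Account for one consumed packet of 'size' bytes. The release is
 * deferred while the accumulated deficit is below 25% of rcvbuf and
 * packets are still queued. Returns the number of bytes handed back
 * to producers, or 0 when the release was deferred.
 */
static int rx_release(struct rx_model *m, int size)
{
	m->forward_deficit += size;
	size = m->forward_deficit;
	if (size < (m->rcvbuf >> 2) && m->queued > 0)
		return 0;

	m->forward_deficit = 0;
	m->rmem_alloc -= size;
	return size;
}
```

With rcvbuf set to 4096, two releases of 500 and 600 bytes are batched into a single 1100-byte release, while an empty queue forces an immediate release even for a small deficit. The empty-queue check is what keeps a tiny SO_RCVBUF from holding back memory that producers are waiting for.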
* Re: [PATCH net-next] bpf: fix state equivalence
From: David Miller @ 2016-12-08 18:31 UTC (permalink / raw)
To: ast; +Cc: daniel, jbacik, tgraf, netdev
In-Reply-To: <1481137079-2205635-1-git-send-email-ast@fb.com>
From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 7 Dec 2016 10:57:59 -0800
> Commits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
> and 484611357c19 ("bpf: allow access into map value arrays") by themselves
> are correct, but in combination they make state equivalence ignore the 'id' field
> of the register state, which can lead to accepting an invalid program.
>
> Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
> Fixes: 484611357c19 ("bpf: allow access into map value arrays")
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Applied.
* Re: [PATCH net-next] net: rfs: add a jump label
From: Eric Dumazet @ 2016-12-08 18:31 UTC (permalink / raw)
To: David Miller; +Cc: netdev, pabeni
In-Reply-To: <20161208.131900.434329215014851517.davem@davemloft.net>
On Thu, 2016-12-08 at 13:19 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 07 Dec 2016 08:29:10 -0800
>
> > From: Eric Dumazet <edumazet@google.com>
> >
> > RFS is not commonly used, so add a jump label to avoid some conditionals
> > in fast path.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Applied, but I wonder how effective this will really be in the long run.
I guess this applies to just about all jump labels.
As soon as the attribute is per namespace, we can no longer use them.
The cost of a conditional really depends on the expression complexity
(including cache line misses).
TCP stack might benefit from jump labels, like sysctl_tcp_low_latency
which is often set to 1 on hosts mostly using epoll()/poll()/select()
instead of blocking read()/recvmsg()
* Re: [PATCH net-next] net: do not read sk_drops if application does not care
From: David Miller @ 2016-12-08 18:31 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481133936.4930.51.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 10:05:36 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> sk_drops can be an often written field, do not read it unless
> application showed interest.
>
> Note that sk_drops can be read via inet_diag, so applications
> can avoid getting this info from every received packet.
>
> In the future, 'reading' sk_drops might require folding per node or per
> cpu fields, and thus become even more expensive than today.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied.
* Re: [PATCH] net: pch_gbe: Fix TX RX descriptor accesses for big endian systems
From: David Miller @ 2016-12-08 18:29 UTC (permalink / raw)
To: hassan.naveed; +Cc: netdev, paul.burton, matt.redfearn, fw, romieu
In-Reply-To: <1481133534-26224-1-git-send-email-hassan.naveed@imgtec.com>
From: Hassan Naveed <hassan.naveed@imgtec.com>
Date: Wed, 7 Dec 2016 09:58:54 -0800
> Fix pch_gbe driver for ethernet operations for a big endian CPU.
> Values written to and read from transmit and receive descriptors
> in the pch_gbe driver are byte swapped from the perspective of a
> big endian CPU, since the ethernet controller always operates in
> little endian mode. Rectify this by appropriately byte swapping
> these descriptor field values in the driver software.
>
> Signed-off-by: Hassan Naveed <hassan.naveed@imgtec.com>
> Reviewed-by: Paul Burton <paul.burton@imgtec.com>
> Reviewed-by: Matt Redfearn <matt.redfearn@imgtec.com>
As explained by Francois, you need to use the proper endian types in
the descriptor data structure.
Then please run sparse with endianness checking enabled on the build
of the driver.
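
As a rough illustration of what endian-typed descriptor fields buy, here is a userspace model. In the kernel the real tools are the __le32 type with cpu_to_le32()/le32_to_cpu(), checked by sparse; the le32_t wrapper, the *_model helpers and the descriptor layout below are hypothetical stand-ins, not the pch_gbe definitions.

```c
#include <stdint.h>

/*
 * Wrapper type for a 32-bit little-endian value. Because it is a
 * struct, assigning a plain uint32_t to it fails to compile, which
 * loosely mimics what sparse enforces for __le32 in the kernel.
 */
typedef struct { uint8_t b[4]; } le32_t;

/* Convert a CPU-order value to little-endian storage on any host. */
static le32_t cpu_to_le32_model(uint32_t v)
{
	le32_t r = { { (uint8_t)v, (uint8_t)(v >> 8),
		       (uint8_t)(v >> 16), (uint8_t)(v >> 24) } };
	return r;
}

/* Convert little-endian storage back to a CPU-order value. */
static uint32_t le32_to_cpu_model(le32_t v)
{
	return (uint32_t)v.b[0] | (uint32_t)v.b[1] << 8 |
	       (uint32_t)v.b[2] << 16 | (uint32_t)v.b[3] << 24;
}

/* Hypothetical descriptor: every field the device reads is le32_t. */
struct rx_desc_model {
	le32_t buffer_addr;
	le32_t length;
};
```

Every access to a descriptor field then has to go through a conversion helper, so a missing byte swap shows up as a type error instead of a silent bug on big endian hosts.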
* Re: [PATCH net v2 1/1] driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed
From: Mahesh Bandewar (महेश बंडेवार) @ 2016-12-08 18:29 UTC (permalink / raw)
To: fgao; +Cc: David Miller, Eric Dumazet, linux-netdev, gfree.wind
In-Reply-To: <1481167018-559-1-git-send-email-fgao@ikuai8.com>
On Wed, Dec 7, 2016 at 7:16 PM, <fgao@ikuai8.com> wrote:
> From: Gao Feng <fgao@ikuai8.com>
>
> When ipvlan_set_port_mode fails in ipvlan_link_new, the ipvlan dev
> must be unlinked from the upper dev.
>
> Signed-off-by: Gao Feng <fgao@ikuai8.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
> ---
> v2: Rename the label to unlink_netdev, per Mahesh Bandewar
> v1: Initial patch
>
> drivers/net/ipvlan/ipvlan_main.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 0fef178..dfbc4ef 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -546,13 +546,15 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> }
> err = ipvlan_set_port_mode(port, mode);
> if (err) {
> - goto unregister_netdev;
> + goto unlink_netdev;
> }
>
> list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
> netif_stacked_transfer_operstate(phy_dev, dev);
> return 0;
>
> +unlink_netdev:
> + netdev_upper_dev_unlink(phy_dev, dev);
> unregister_netdev:
> unregister_netdevice(dev);
> destroy_ipvlan_port:
> --
> 1.9.1
>
>
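
The fix relies on the usual goto-based unwinding pattern, where cleanup labels appear in reverse order of setup so that a failure at any step undoes exactly the steps that already succeeded. A minimal sketch, with hypothetical step numbers and an 'undone' bitmask standing in for the real setup and cleanup calls:

```c
/* Bits recording which cleanup actions ran. */
enum {
	UNDO_UNLINK       = 1, /* models netdev_upper_dev_unlink() */
	UNDO_UNREGISTER   = 2, /* models unregister_netdevice() */
	UNDO_DESTROY_PORT = 4  /* models destroying the ipvlan port */
};

/* Hypothetical setup step: fails when its number equals fail_step. */
static int step(int n, int fail_step)
{
	return n == fail_step ? -1 : 0;
}

static int link_new(int fail_step, int *undone)
{
	int err;

	*undone = 0;
	err = step(1, fail_step);	/* create port */
	if (err)
		return err;
	err = step(2, fail_step);	/* register netdevice */
	if (err)
		goto destroy_port;
	err = step(3, fail_step);	/* link to upper dev */
	if (err)
		goto unregister_netdev;
	err = step(4, fail_step);	/* set port mode */
	if (err)
		goto unlink_netdev;
	return 0;

	/* Labels fall through, so later failures run more cleanup. */
unlink_netdev:
	*undone |= UNDO_UNLINK;
unregister_netdev:
	*undone |= UNDO_UNREGISTER;
destroy_port:
	*undone |= UNDO_DESTROY_PORT;
	return err;
}
```

A failure in step 4 runs all three cleanup actions, which corresponds to the new unlink_netdev label in the patch. Jumping to unregister_netdev from that point, as the old code did, would skip the unlink.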
* Re: [PATCH net-next] udp: under rx pressure, try to condense skbs
From: David Miller @ 2016-12-08 18:26 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481131173.4930.36.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 09:19:33 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> Under UDP flood, many softirq producers try to add packets to
> UDP receive queue, and one user thread is burning one cpu trying
> to dequeue packets as fast as possible.
>
> Two parts of the per packet cost are :
> - copying payload from kernel space to user space,
> - freeing memory pieces associated with skb.
>
> If socket is under pressure, softirq handler(s) can try to pull in
> skb->head the payload of the packet if it fits.
>
> Meaning the softirq handler(s) can free/reuse the page fragment
> immediately, instead of letting udp_recvmsg() do this hundreds of usec
> later, possibly from another node.
>
>
> Additional gains :
> - We reduce skb->truesize and thus can store more packets per SO_RCVBUF
> - We avoid cache line misses at copyout() time and consume_skb() time,
> and avoid one put_page() with potential alien freeing on NUMA hosts.
>
> This comes at the cost of a copy, bounded to available tail room, which
> is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
> than necessary)
>
> This patch gave me about 5 % increase in throughput in my tests.
>
> skb_condense() helper could probably be used in other contexts.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
This is isolated to UDP, and would be easy to revert if it causes
problems. So applied, thanks Eric.
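
The condensing idea from the commit message can be modelled in userspace as follows. This is only a sketch under simplifying assumptions: struct pkt_model and pkt_condense() are hypothetical, a packet has at most one external fragment, and the 128-byte head size is arbitrary. It is not the kernel's skb_condense() implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy packet: a small linear head plus one optional external fragment. */
struct pkt_model {
	unsigned char head[128];	/* models the skb->head linear area */
	size_t head_len;		/* bytes already used in head */
	const unsigned char *frag;	/* external payload, NULL if none */
	size_t frag_len;
};

/*
 * Pull the fragment into the head when it fits in the tail room, so
 * the fragment's page could be released right away. Returns true if
 * the packet was condensed, false if it was left untouched.
 */
static bool pkt_condense(struct pkt_model *p)
{
	size_t tailroom = sizeof(p->head) - p->head_len;

	if (!p->frag || p->frag_len > tailroom)
		return false;

	memcpy(p->head + p->head_len, p->frag, p->frag_len);
	p->head_len += p->frag_len;
	p->frag = NULL;
	p->frag_len = 0;
	return true;
}
```

The copy is bounded by the tail room, which matches the cost described in the commit message: a small copy in exchange for freeing the fragment early.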
* Re: [PATCH net-next V3 0/7] liquidio VF data path
From: David Miller @ 2016-12-08 18:25 UTC (permalink / raw)
To: rvatsavayi; +Cc: netdev
In-Reply-To: <1481129677-10586-1-git-send-email-rvatsavayi@caviumnetworks.com>
From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Wed, 7 Dec 2016 08:54:30 -0800
> Following is V3 patch series that adds support for VF
> data path related features. It also has following changes
> related to previous comments:
> 1) Remove unnecessary "void *" casting.
> 2) Remove inline for functions and let gcc decide.
>
> Please apply patches in following order as some of them
> depend on earlier patches.
Series applied.
* Re: [PATCH v2 net-next 4/4] udp: add batching to udp_rmem_release()
From: Paolo Abeni @ 2016-12-08 18:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <1481218739-27089-5-git-send-email-edumazet@google.com>
On Thu, 2016-12-08 at 09:38 -0800, Eric Dumazet wrote:
> If udp_recvmsg() constantly releases sk_rmem_alloc
> for every read packet, it gives opportunity for
> producers to immediately grab spinlocks and desperately
> try adding another packet, causing false sharing.
>
> We can add a simple heuristic to give the signal
> by batches of ~25 % of the queue capacity.
>
> This patch considerably increases performance under
> flood by about 50 %, since the thread draining the queue
> is no longer slowed by false sharing.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/linux/udp.h | 3 +++
> net/ipv4/udp.c | 11 +++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index d1fd8cd39478..c0f530809d1f 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -79,6 +79,9 @@ struct udp_sock {
> int (*gro_complete)(struct sock *sk,
> struct sk_buff *skb,
> int nhoff);
> +
> + /* This field is dirtied by udp_recvmsg() */
> + int forward_deficit;
> };
>
> static inline struct udp_sock *udp_sk(const struct sock *sk)
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 880cd3d84abf..f0096d088104 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1177,8 +1177,19 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
> /* fully reclaim rmem/fwd memory allocated for skb */
> static void udp_rmem_release(struct sock *sk, int size, int partial)
> {
> + struct udp_sock *up = udp_sk(sk);
> int amt;
>
> + if (likely(partial)) {
> + up->forward_deficit += size;
> + size = up->forward_deficit;
> + if (size < (sk->sk_rcvbuf >> 2))
> + return;
> + } else {
> + size += up->forward_deficit;
> + }
> + up->forward_deficit = 0;
> +
> atomic_sub(size, &sk->sk_rmem_alloc);
> sk->sk_forward_alloc += size;
> amt = (sk->sk_forward_alloc - partial) & ~(SK_MEM_QUANTUM - 1);
Nice one! This sounds like a relevant improvement!
I'm wondering if it may cause regressions with small values of
sk_rcvbuf, e.g. with:
netperf -t UDP_STREAM -H 127.0.0.1 -- -s 1280 -S 1280 -m 1024 -M 1024
I'm sorry, I fear I will be unable to run any tests before next week.
Cheers,
Paolo
* Re: [PATCH net-next] net: rfs: add a jump label
From: David Miller @ 2016-12-08 18:19 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, pabeni
In-Reply-To: <1481128150.4930.25.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Dec 2016 08:29:10 -0800
> From: Eric Dumazet <edumazet@google.com>
>
> RFS is not commonly used, so add a jump label to avoid some conditionals
> in fast path.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Applied, but I wonder how effective this will really be in the long run.