* Re: [PATCH net-next V2 05/11] net/mlx5e: Support RX multi-packet WQE (Striding RQ)
From: Eric Dumazet @ 2016-04-18 14:17 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Or Gerlitz,
Tal Alon, Tariq Toukan, Eran Ben Elisha, Achiad Shochat
In-Reply-To: <CALzJLG_W9SkgMBQp86P0WDknw4Kc=DCBrvpPemAUbRX=r4r8Yg@mail.gmail.com>
On Mon, 2016-04-18 at 16:05 +0300, Saeed Mahameed wrote:
> On Mon, Apr 18, 2016 at 3:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sun, 2016-04-17 at 17:29 -0700, Eric Dumazet wrote:
> >
> >>
> >> If really you need to allocate physically contiguous memory, have you
> >> considered converting the order-5 pages into 32 order-0 ones ?
> >
> > Search for split_page() call sites for examples.
> >
> >
>
> Thanks Eric, we are already evaluating split_page as we speak.
>
> We did look but could not find any specific alloc_pages API that
> allocates many physically contiguous pages with order0 ! so we assume
> it is ok to use split_page.
Note: I have no idea how split_page() performs:
The buddy page allocator has to aggregate pages into order-5, then we would
undo that work, touching 32 cache lines.
You might first benchmark a simple loop doing
  loop 10,000,000 times
      Order-5 allocation
      split into 32 order-0
      free 32 pages
Another idea would be to have a way to control the max number of order-5
pages that a port would be using.
Since the driver always owns a ref on its order-5 pages, the idea would be to
maintain a circular ring of up to XXX such pages, so that we can detect
abnormal use and fall back to order-0 immediately.
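A rough userspace sketch of that ring idea, in case it helps the discussion. This is not driver code: RING_MAX stands in for the "XXX" limit, and struct tracked_page, struct order5_ring and both functions are hypothetical names. It assumes a refcount of 1 means only the driver still holds the page.

```c
#include <assert.h>
#include <stddef.h>

#define RING_MAX 4	/* stands in for the "XXX" limit; value is arbitrary */

/* Hypothetical stand-in for an order-5 page; refcount 1 == driver only. */
struct tracked_page {
	int refcount;
};

/* Circular ring of the order-5 pages a port currently owns. */
struct order5_ring {
	struct tracked_page *slot[RING_MAX];
	unsigned int head;	/* free-running count of pages added */
	unsigned int tail;	/* index of the oldest tracked page */
};

enum rx_alloc { RX_REUSE_ORDER5, RX_NEW_ORDER5, RX_FALLBACK_ORDER0 };

/* Decide what the next RX buffer allocation should use. */
static enum rx_alloc ring_next_alloc(const struct order5_ring *r)
{
	unsigned int used;

	assert(r != NULL);
	used = r->head - r->tail;

	/* Oldest page is idle again: the stack dropped all its refs. */
	if (used > 0 && r->slot[r->tail % RING_MAX]->refcount == 1)
		return RX_REUSE_ORDER5;
	/* Still under the per-port limit: a fresh order-5 page is fine. */
	if (used < RING_MAX)
		return RX_NEW_ORDER5;
	/* Limit reached and the oldest page is still pinned: abnormal use. */
	return RX_FALLBACK_ORDER0;
}

/* Record a newly allocated order-5 page in the ring. */
static void ring_track(struct order5_ring *r, struct tracked_page *p)
{
	assert(r != NULL && p != NULL);
	assert(r->head - r->tail < RING_MAX);
	r->slot[r->head % RING_MAX] = p;
	r->head++;
}
```

The point of keeping the decision in one function is that the fallback to order-0 happens as soon as the ring is full of pinned pages, without any extra accounting in the hot path.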
^ permalink raw reply
* Re: [PATCH net-next 2/5] qede: Add support for ethtool private flags
From: Sergei Shtylyov @ 2016-04-18 14:11 UTC (permalink / raw)
To: Yuval Mintz, davem, netdev
In-Reply-To: <1460921195-23352-3-git-send-email-Yuval.Mintz@qlogic.com>
Hello.
On 4/17/2016 10:26 PM, Yuval Mintz wrote:
> Adds a getter for the interfaces private flags.
> The only parameter currently supported is whether the interface is a
> coupled function [required for supporting 100g].
>
> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
> ---
> drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 27 +++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> index f87e83b..5ba6b2a 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
[...]
> @@ -185,6 +200,17 @@ static int qede_get_sset_count(struct net_device *dev, int stringset)
> }
> }
>
> +static u32 qede_get_priv_flags(struct net_device *dev)
> +{
> + struct qede_dev *edev = netdev_priv(dev);
> + u32 flags = 0;
> +
> + flags |= (!!(edev->dev_info.common.num_hwfns > 1)) <<
> + QEDE_PRI_FLAG_CMT;
Why not just '='?
> +
> + return flags;
... or direct return of the value above?
> +}
> +
> static int qede_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
> {
> struct qede_dev *edev = netdev_priv(dev);
[...]
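To illustrate both remarks, here is a small standalone sketch comparing the posted form with a direct return. The value of QEDE_PRI_FLAG_CMT is assumed (the enum is in the elided part of the patch), and the function names and the plain num_hwfns argument are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit index; the real definition is in the elided part of the patch. */
enum { QEDE_PRI_FLAG_CMT = 0 };

/* As posted: OR the bit into a zero-initialised local, then return it. */
static uint32_t priv_flags_posted(unsigned int num_hwfns)
{
	uint32_t flags = 0;

	flags |= (!!(num_hwfns > 1)) << QEDE_PRI_FLAG_CMT;
	return flags;
}

/* Direct return: a comparison already yields 0 or 1, so '!!' is not needed. */
static uint32_t priv_flags_direct(unsigned int num_hwfns)
{
	return (uint32_t)(num_hwfns > 1) << QEDE_PRI_FLAG_CMT;
}

/* Both forms agree for small hwfn counts. */
static void check_priv_flags(void)
{
	unsigned int n;

	for (n = 0; n < 4; n++)
		assert(priv_flags_posted(n) == priv_flags_direct(n));
}
```

If more private flags are expected later, keeping the local and the '|=' may still be the more convenient shape.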
MBR, Sergei
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Eli Cohen @ 2016-04-18 14:05 UTC (permalink / raw)
To: Sinan Kaya; +Cc: linux-rdma, timur, cov, Yishai Hadas, netdev, linux-kernel
In-Reply-To: <5714E6DA.4080008@codeaurora.org>
Sure, this is not the complete patch. As far as I know the problem
you're facing with arm is that virt_to_page() does not provide the
correct page descriptor so my suggestion will eliminate the need for
it.
On Mon, Apr 18, 2016 at 09:53:30AM -0400, Sinan Kaya wrote:
> On 4/18/2016 2:54 AM, Eli Cohen wrote:
> > Sinan,
> >
> > if we get rid of the part this code:
> >
> > if (BITS_PER_LONG == 64) {
> > struct page **pages;
> > pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
> > if (!pages)
> > goto err_free;
> > ...
> > ...
> > if (!buf->direct.buf)
> > goto err_free;
> > }
> >
> > Does that solve the arm issue?
>
> I will test. As far as I know, there is one more place these DMA addresses
> are called with vmap. This is in mlx4_en_map_buffer.
>
> I was trying to rearrange the allocation so that vmap actually works.
>
> What do you think about mlx4_en_map_buffer?
>
>
> --
> Sinan Kaya
> Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Christoph Hellwig @ 2016-04-18 13:59 UTC (permalink / raw)
To: Sinan Kaya
Cc: Christoph Hellwig, linux-rdma, timur, cov, Yishai Hadas, netdev,
linux-kernel
In-Reply-To: <5714E5D6.7050600@codeaurora.org>
On Mon, Apr 18, 2016 at 09:49:10AM -0400, Sinan Kaya wrote:
> Here is a good description of logical address vs. virtual address.
>
> https://www.quora.com/What-is-the-Kernel-logical-and-virtual-addresses-What-is-the-difference-between-them-What-is-the-type-of-addresses-listed-in-the-System-map
That's not how we use the terms in Linux. But it's not really the point
of my question either.
> > Is this correct?
> >
> No, the driver is plain broken without this patch. It causes a kernel panic
> during driver probe.
>
> This is the definition of vmap API.
>
> https://www.kernel.org/doc/htmldocs/kernel-api/API-vmap.html
Thanks for the pointer, but I'm actually the person who introduced vmap
to Linux a long time ago, and this is once again not my question.
> You cannot take several virtually mapped addresses returned by dma_alloc_coherent
> and try to make them virtually contiguous again.
But now we're getting closer to the issue: the mlx4_en driver is using
vmap on buffers allocated using dma_alloc_coherent if on a 64-bit
architecture, and that's obviously broken.
Now the big question is: why does it do that, given that
dma_alloc_coherent can be used for high order allocations anyway (and in
fact many architectures implement it using a version of vmap)?
Let's get some answers to these questions from the Mellanox folks and
work from there.
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Sinan Kaya @ 2016-04-18 13:53 UTC (permalink / raw)
To: Eli Cohen; +Cc: linux-rdma, timur, cov, Yishai Hadas, netdev, linux-kernel
In-Reply-To: <20160418065447.GA11539@x-vnc01.mtx.labs.mlnx>
On 4/18/2016 2:54 AM, Eli Cohen wrote:
> Sinan,
>
> if we get rid of the part this code:
>
> if (BITS_PER_LONG == 64) {
> struct page **pages;
> pages = kmalloc(sizeof *pages * buf->nbufs, gfp);
> if (!pages)
> goto err_free;
> ...
> ...
> if (!buf->direct.buf)
> goto err_free;
> }
>
> Does that solve the arm issue?
I will test. As far as I know, there is one more place where these DMA addresses
are passed to vmap. This is in mlx4_en_map_buffer.
I was trying to rearrange the allocation so that vmap actually works.
What do you think about mlx4_en_map_buffer?
--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
^ permalink raw reply
* [PATCH v2 1/1] drivers: net: cpsw: Prevent NULL pointer dereference with two PHYs
From: Andrew Goodbody @ 2016-04-18 13:53 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, Andrew Goodbody
In-Reply-To: <1460987606-18125-1-git-send-email-andrew.goodbody@cambrionix.com>
Adding a 2nd PHY to cpsw results in a NULL pointer dereference
as below. Fix by maintaining a reference to each PHY node in slave
struct instead of a single reference in the priv struct which was
overwritten by the 2nd PHY.
[ 17.870933] Unable to handle kernel NULL pointer dereference at virtual address 00000180
[ 17.879557] pgd = dc8bc000
[ 17.882514] [00000180] *pgd=9c882831, *pte=00000000, *ppte=00000000
[ 17.889213] Internal error: Oops: 17 [#1] ARM
[ 17.893838] Modules linked in:
[ 17.897102] CPU: 0 PID: 1657 Comm: connmand Not tainted 4.5.0-ge463dfb-dirty #11
[ 17.904947] Hardware name: Cambrionix whippet
[ 17.909576] task: dc859240 ti: dc968000 task.ti: dc968000
[ 17.915339] PC is at phy_attached_print+0x18/0x8c
[ 17.920339] LR is at phy_attached_info+0x14/0x18
[ 17.925247] pc : [<c042baec>] lr : [<c042bb74>] psr: 600f0113
[ 17.925247] sp : dc969cf8 ip : dc969d28 fp : dc969d18
[ 17.937425] r10: dda7a400 r9 : 00000000 r8 : 00000000
[ 17.942971] r7 : 00000001 r6 : ddb00480 r5 : ddb8cb34 r4 : 00000000
[ 17.949898] r3 : c0954cc0 r2 : c09562b0 r1 : 00000000 r0 : 00000000
[ 17.956829] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 17.964401] Control: 10c5387d Table: 9c8bc019 DAC: 00000051
[ 17.970500] Process connmand (pid: 1657, stack limit = 0xdc968210)
[ 17.977059] Stack: (0xdc969cf8 to 0xdc96a000)
[ 17.981692] 9ce0: dc969d28 dc969d08
[ 17.990386] 9d00: c038f9bc c038f6b4 ddb00480 dc969d34 dc969d28 c042bb74 c042bae4 00000000
[ 17.999080] 9d20: c09562b0 c0954cc0 dc969d5c dc969d38 c043ebfc c042bb6c 00000007 00000003
[ 18.007773] 9d40: ddb00000 ddb8cb58 ddb00480 00000001 dc969dec dc969d60 c0441614 c043ea68
[ 18.016465] 9d60: 00000000 00000003 00000000 fffffff4 dc969df4 0000000d 00000000 00000000
[ 18.025159] 9d80: dc969db4 dc969d90 c005dc08 c05839e0 dc969df4 0000000d ddb00000 00001002
[ 18.033851] 9da0: 00000000 00000000 dc969dcc dc969db8 c005ddf4 c005dbc8 00000000 00000118
[ 18.042544] 9dc0: dc969dec dc969dd0 ddb00000 c06db27c ffff9003 00001002 00000000 00000000
[ 18.051237] 9de0: dc969e0c dc969df0 c057c88c c04410dc dc969e0c ddb00000 ddb00000 00000001
[ 18.059930] 9e00: dc969e34 dc969e10 c057cb44 c057c7d8 ddb00000 ddb00138 00001002 beaeda20
[ 18.068622] 9e20: 00000000 00000000 dc969e5c dc969e38 c057cc28 c057cac0 00000000 dc969e80
[ 18.077315] 9e40: dda7a40c beaeda20 00000000 00000000 dc969ecc dc969e60 c05e36d0 c057cc14
[ 18.086007] 9e60: dc969e84 00000051 beaeda20 00000000 dda7a40c 00000014 ddb00000 00008914
[ 18.094699] 9e80: 30687465 00000000 00000000 00000000 00009003 00000000 00000000 00000000
[ 18.103391] 9ea0: 00001002 00008914 dd257ae0 beaeda20 c098a428 beaeda20 00000011 00000000
[ 18.112084] 9ec0: dc969edc dc969ed0 c05e4e54 c05e3030 dc969efc dc969ee0 c055f5ac c05e4cc4
[ 18.120777] 9ee0: beaeda20 dd257ae0 dc8ab4c0 00008914 dc969f7c dc969f00 c010b388 c055f45c
[ 18.129471] 9f00: c071ca40 dd257ac0 c00165e8 dc968000 dc969f3c dc969f20 dc969f64 dc969f28
[ 18.138164] 9f20: c0115708 c0683ec8 dd257ac0 dd257ac0 dc969f74 dc969f40 c055f350 c00fc66c
[ 18.146857] 9f40: dd82e4d0 00000011 00000000 00080000 dd257ac0 00000000 dc8ab4c0 dc8ab4c0
[ 18.155550] 9f60: 00008914 beaeda20 00000011 00000000 dc969fa4 dc969f80 c010bc34 c010b2fc
[ 18.164242] 9f80: 00000000 00000011 00000002 00000036 c00165e8 dc968000 00000000 dc969fa8
[ 18.172935] 9fa0: c00163e0 c010bbcc 00000000 00000011 00000011 00008914 beaeda20 00009003
[ 18.181628] 9fc0: 00000000 00000011 00000002 00000036 00081018 00000001 00000000 beaedc10
[ 18.190320] 9fe0: 00083188 beaeda1c 00043a5d b6d29c0c 600b0010 00000011 00000000 00000000
[ 18.198989] Backtrace:
[ 18.201621] [<c042bad8>] (phy_attached_print) from [<c042bb74>] (phy_attached_info+0x14/0x18)
[ 18.210664] r3:c0954cc0 r2:c09562b0 r1:00000000
[ 18.215588] r4:ddb00480
[ 18.218322] [<c042bb60>] (phy_attached_info) from [<c043ebfc>] (cpsw_slave_open+0x1a0/0x280)
[ 18.227293] [<c043ea5c>] (cpsw_slave_open) from [<c0441614>] (cpsw_ndo_open+0x544/0x674)
[ 18.235874] r7:00000001 r6:ddb00480 r5:ddb8cb58 r4:ddb00000
[ 18.241944] [<c04410d0>] (cpsw_ndo_open) from [<c057c88c>] (__dev_open+0xc0/0x128)
[ 18.249972] r9:00000000 r8:00000000 r7:00001002 r6:ffff9003 r5:c06db27c r4:ddb00000
[ 18.258255] [<c057c7cc>] (__dev_open) from [<c057cb44>] (__dev_change_flags+0x90/0x154)
[ 18.266745] r5:00000001 r4:ddb00000
[ 18.270575] [<c057cab4>] (__dev_change_flags) from [<c057cc28>] (dev_change_flags+0x20/0x50)
[ 18.279523] r9:00000000 r8:00000000 r7:beaeda20 r6:00001002 r5:ddb00138 r4:ddb00000
[ 18.287811] [<c057cc08>] (dev_change_flags) from [<c05e36d0>] (devinet_ioctl+0x6ac/0x76c)
[ 18.296483] r9:00000000 r8:00000000 r7:beaeda20 r6:dda7a40c r5:dc969e80 r4:00000000
[ 18.304762] [<c05e3024>] (devinet_ioctl) from [<c05e4e54>] (inet_ioctl+0x19c/0x1c8)
[ 18.312882] r10:00000000 r9:00000011 r8:beaeda20 r7:c098a428 r6:beaeda20 r5:dd257ae0
[ 18.321235] r4:00008914
[ 18.323956] [<c05e4cb8>] (inet_ioctl) from [<c055f5ac>] (sock_ioctl+0x15c/0x2d8)
[ 18.331829] [<c055f450>] (sock_ioctl) from [<c010b388>] (do_vfs_ioctl+0x98/0x8d0)
[ 18.339765] r7:00008914 r6:dc8ab4c0 r5:dd257ae0 r4:beaeda20
[ 18.345822] [<c010b2f0>] (do_vfs_ioctl) from [<c010bc34>] (SyS_ioctl+0x74/0x84)
[ 18.353573] r10:00000000 r9:00000011 r8:beaeda20 r7:00008914 r6:dc8ab4c0 r5:dc8ab4c0
[ 18.361924] r4:00000000
[ 18.364653] [<c010bbc0>] (SyS_ioctl) from [<c00163e0>] (ret_fast_syscall+0x0/0x3c)
[ 18.372682] r9:dc968000 r8:c00165e8 r7:00000036 r6:00000002 r5:00000011 r4:00000000
[ 18.380960] Code: e92dd810 e24cb010 e24dd010 e59b4004 (e5902180)
[ 18.387580] ---[ end trace c80529466223f3f3 ]---
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
---
v2 - Move allocation of memory for priv->slaves to inside cpsw_probe_dt so it
has data->slaves initialised first which is needed to calculate size
drivers/net/ethernet/ti/cpsw.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 42fdfd4..e62909c 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -349,6 +349,7 @@ struct cpsw_slave {
struct cpsw_slave_data *data;
struct phy_device *phy;
struct net_device *ndev;
+ struct device_node *phy_node;
u32 port_vlan;
u32 open_stat;
};
@@ -367,7 +368,6 @@ struct cpsw_priv {
spinlock_t lock;
struct platform_device *pdev;
struct net_device *ndev;
- struct device_node *phy_node;
struct napi_struct napi_rx;
struct napi_struct napi_tx;
struct device *dev;
@@ -1148,8 +1148,8 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
- if (priv->phy_node)
- slave->phy = of_phy_connect(priv->ndev, priv->phy_node,
+ if (slave->phy_node)
+ slave->phy = of_phy_connect(priv->ndev, slave->phy_node,
&cpsw_adjust_link, 0, slave->data->phy_if);
else
slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
@@ -1946,7 +1946,7 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
struct device_node *node = pdev->dev.of_node;
struct device_node *slave_node;
struct cpsw_platform_data *data = &priv->data;
- int i = 0, ret;
+ int i, ret;
u32 prop;
if (!node)
@@ -1958,6 +1958,14 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
}
data->slaves = prop;
+ priv->slaves = devm_kzalloc(&pdev->dev,
+ sizeof(struct cpsw_slave) * data->slaves,
+ GFP_KERNEL);
+ if (!priv->slaves)
+ return -ENOMEM;
+ for (i = 0; i < data->slaves; i++)
+ priv->slaves[i].slave_num = i;
+
if (of_property_read_u32(node, "active_slave", &prop)) {
dev_err(&pdev->dev, "Missing active_slave property in the DT.\n");
return -EINVAL;
@@ -2023,6 +2031,7 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
if (ret)
dev_warn(&pdev->dev, "Doesn't have any child node\n");
+ i = 0;
for_each_child_of_node(node, slave_node) {
struct cpsw_slave_data *slave_data = data->slave_data + i;
const void *mac_addr = NULL;
@@ -2033,7 +2042,8 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
if (strcmp(slave_node->name, "slave"))
continue;
- priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
+ priv->slaves[i].phy_node =
+ of_parse_phandle(slave_node, "phy-handle", 0);
parp = of_get_property(slave_node, "phy_id", &lenp);
if (of_phy_is_fixed_link(slave_node)) {
struct device_node *phy_node;
@@ -2292,16 +2302,6 @@ static int cpsw_probe(struct platform_device *pdev)
memcpy(ndev->dev_addr, priv->mac_addr, ETH_ALEN);
- priv->slaves = devm_kzalloc(&pdev->dev,
- sizeof(struct cpsw_slave) * data->slaves,
- GFP_KERNEL);
- if (!priv->slaves) {
- ret = -ENOMEM;
- goto clean_runtime_disable_ret;
- }
- for (i = 0; i < data->slaves; i++)
- priv->slaves[i].slave_num = i;
-
priv->slaves[0].ndev = ndev;
priv->emac_port = 0;
--
2.5.0
^ permalink raw reply related
* [PATCH v2 0/1] drivers: net: cpsw: Fix NULL pointer dereference with two slave PHYs
From: Andrew Goodbody @ 2016-04-18 13:53 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel, Andrew Goodbody
This is a fix for a NULL pointer dereference from cpsw which is triggered
by having two slave PHYs attached to a cpsw network device. The problem is
due to only maintaining a single reference to a PHY node in the private data
which gets overwritten by the second PHY probe. So move the PHY node
reference to the individual slave data so that there is now one per slave.
v1 had a problem that data->slaves was used before it had been filled in.
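The overwrite can be modelled in a few lines of standalone C. This is only a sketch of the failure mode, not cpsw code: struct phy_node stands in for struct device_node, and struct priv, struct slave, MAX_SLAVES and probe() are simplified, hypothetical versions of the driver structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 2

struct phy_node {	/* stand-in for struct device_node */
	int id;
};

struct slave {
	const struct phy_node *phy_node;	/* new layout: one per slave */
};

struct priv {
	const struct phy_node *phy_node;	/* old layout: shared by all slaves */
	struct slave slaves[MAX_SLAVES];
};

/* Probe loop: record the node parsed for each slave in both layouts. */
static void probe(struct priv *p, const struct phy_node **nodes, size_t n)
{
	size_t i;

	assert(p != NULL && n <= MAX_SLAVES);
	for (i = 0; i < n; i++) {
		p->phy_node = nodes[i];			/* overwritten every pass */
		p->slaves[i].phy_node = nodes[i];	/* kept per slave */
	}
}
```

After probing two slaves, the shared field only reflects the last one, while the per-slave fields keep both values, which is what the open path needs.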
Andrew Goodbody (1):
Prevent NULL pointer dereference with two PHYs on cpsw
drivers/net/ethernet/ti/cpsw.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
--
2.5.0
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Sinan Kaya @ 2016-04-18 13:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-rdma, timur, cov, Yishai Hadas, netdev, linux-kernel
In-Reply-To: <20160418131058.GA25421@infradead.org>
On 4/18/2016 9:10 AM, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 09:06:18AM -0400, okaya@codeaurora.org wrote:
>> On 2016-04-18 08:12, Christoph Hellwig wrote:
>>> On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
>>>> Current code is assuming that the address returned by dma_alloc_coherent
>>>> is a logical address. This is not true on ARM/ARM64 systems.
>>>
>>> Can you explain what you mean with a 'logical address' and what actual
>>> issue you're trying to solve?
>>
Here is a good description of logical address vs. virtual address.
https://www.quora.com/What-is-the-Kernel-logical-and-virtual-addresses-What-is-the-difference-between-them-What-is-the-type-of-addresses-listed-in-the-System-map
>> Vmap call is failing on arm64 systems because dma alloc api already returns
>> an address mapped with vmap.
>
> Please state your problem clearly. What I'm reverse engineering from
> your posts is: because dma_alloc_coherent uses vmap-like mappings on
> arm64 (all, some systems?)
All arm64 systems.
> a driver using a lot of them might run into
> limits of the vmap pool size.
>
> Is this correct?
>
No, the driver is plain broken without this patch. It causes a kernel panic
during driver probe.
This is the definition of vmap API.
https://www.kernel.org/doc/htmldocs/kernel-api/API-vmap.html
VMAP allows you to make several pages look contiguous to the CPU.
It can only be used against logical addresses returned from kmalloc
or alloc_page.
You cannot take several virtually mapped addresses returned by dma_alloc_coherent
and try to make them virtually contiguous again.
The code happens to work on other architectures by pure luck. AFAIK, dma_alloc_coherent
returns logical addresses on Intel systems until it runs out of DMA memory. After
that, the Intel arch will also start returning virtually mapped addresses and this code
will fail there too. ARM64, on the other hand, always returns a virtually mapped address.
The goal of this code is to allocate a number of page-sized buffers and make them look
contiguous. It is just using the wrong API. The correct approach is to allocate with either
kmalloc or alloc_page and map with dma_map_page, not to use dma_alloc_coherent.
The proper usage of dma_map_page requires the code to call the dma_sync API in the correct
places to be compatible with non-coherent systems. This code already assumes
coherency. It would be nice to have dma_sync calls in the right places. There is no
harm in calling the dma_sync API on coherent systems, as the calls are no-ops in the DMA
mapping layer, whereas they are cache flushes on non-coherent systems.
>>
>> Please see arch/arm64/mm directory.
> ---end quoted text---
>
I hope it is clear now. The previous email was the most I could type on my phone.
--
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
^ permalink raw reply
* Re: [patch -next] udp: fix if statement in SIOCINQ ioctl
From: Willem de Bruijn @ 2016-04-18 13:41 UTC (permalink / raw)
To: Eric Dumazet
Cc: Dan Carpenter, David S. Miller, Willem de Bruijn,
Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, Network Development, LKML, kernel-janitors
In-Reply-To: <1460981977.10638.105.camel@edumazet-glaptop3.roam.corp.google.com>
On Mon, Apr 18, 2016 at 8:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-04-18 at 11:44 +0300, Dan Carpenter wrote:
>> We deleted a line of code and accidentally made the "return put_user()"
>> part of the if statement when it's supposed to be unconditional.
>>
>> Fixes: 9f9a45beaa96 ('udp: do not expect udp headers on ioctl SIOCINQ')
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>
> Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Thanks for catching this.
^ permalink raw reply
* Re: [PATCH bluetooth-next 07/10] ipv6: introduce neighbour discovery ops
From: Alexander Aring @ 2016-04-18 13:28 UTC (permalink / raw)
To: linux-wpan
Cc: kernel, marcel, jukka.rissanen, hannes, stefan, mcr, werner,
linux-bluetooth, netdev, David S . Miller, Alexey Kuznetsov,
James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1460977108-4675-8-git-send-email-aar@pengutronix.de>
Hi,
On 04/18/2016 at 12:58 PM, Alexander Aring wrote:
> This patch introduces neighbour discovery ops callback structure. The
> structure contains at first receive and transmit handling for NS/NA and
> userspace option field functionality.
>
> These callback offers 6lowpan different handling, such as 802.15.4 short
> address handling or RFC6775 (Neighbor Discovery Optimization for IPv6 over
> 6LoWPANs).
>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: James Morris <jmorris@namei.org>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Alexander Aring <aar@pengutronix.de>
> ---
> include/linux/netdevice.h | 3 ++
> include/net/ndisc.h | 73 ++++++++++++++++++++++++++++++++++++++++++-----
> net/ipv6/addrconf.c | 1 +
> net/ipv6/ndisc.c | 71 +++++++++++++++++++++++++++++++--------------
> net/ipv6/route.c | 2 +-
> 5 files changed, 121 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0052c42..4f1b3f2 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1677,6 +1677,9 @@ struct net_device {
> #ifdef CONFIG_NET_L3_MASTER_DEV
> const struct l3mdev_ops *l3mdev_ops;
> #endif
> +#ifdef CONFIG_IPV6
> + const struct ndisc_ops *ndisc_ops;
> +#endif
This needs to change to:
#if IS_ENABLED(CONFIG_IPV6)
and likewise for the other config options that can be built as tristate.
Sorry for the noise, I will fix that in v2. :-)
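For reference, a standalone sketch of why the plain #ifdef misses the modular case. The macros below are a simplified imitation of the kconfig.h helpers with made-up names (MY_IS_ENABLED and friends), and CONFIG_IPV6_MODULE is defined by hand here to mimic a CONFIG_IPV6=m build.

```c
#include <assert.h>

/* Simplified imitation of the kconfig.h trick; names are local to this sketch. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED(x) IS_DEFINED_2(x)
#define MY_IS_ENABLED(opt) (IS_DEFINED(opt) || IS_DEFINED(opt##_MODULE))

/* A =m build defines only the _MODULE variant of the symbol. */
#define CONFIG_IPV6_MODULE 1

/* Compile-time check: a modular build must count as enabled. */
static_assert(MY_IS_ENABLED(CONFIG_IPV6), "module build counts as enabled");

static int seen_by_ifdef(void)
{
#ifdef CONFIG_IPV6
	return 1;
#else
	return 0;	/* taken: CONFIG_IPV6 itself is not defined for =m */
#endif
}

static int seen_by_is_enabled(void)
{
#if MY_IS_ENABLED(CONFIG_IPV6)
	return 1;	/* taken: the _MODULE variant is defined */
#else
	return 0;
#endif
}
```

With the #ifdef form, the ndisc_ops member would silently disappear from struct net_device whenever IPv6 is built as a module.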
- Alex
^ permalink raw reply
* Re: [PATCH RFC net-next 0/2] pskb_extract() helper function.
From: Marcelo Ricardo Leitner @ 2016-04-18 13:28 UTC (permalink / raw)
To: Sowmini Varadhan; +Cc: eric.dumazet, netdev
In-Reply-To: <cover.1460928360.git.sowmini.varadhan@oracle.com>
On Mon, Apr 18, 2016 at 06:21:07AM -0700, Sowmini Varadhan wrote:
> This patchset follows up on the discussion in
> https://www.mail-archive.com/netdev@vger.kernel.org/msg105090.html
>
> For RDS-TCP, we have to deal with the full gamut of
> nonlinear sk_buffs, including all the frag_list variants.
> Also, the parent skb has to remain unchanged, while the clone
> is queued for Rx on the PF_RDS socket.
>
> Patch 1 of this patchset adds a pskb_extract() function that
> does all this without the redundant memcpy's in pskb_expand_head()
> and __pskb_pull_tail().
>
> A further optimization is also possible by inlining pskb_trim()
> itself into pskb_carve() and thus avoiding the needless copy
> of trailer frags/pages that will then get trimmed away. I am
> deferring that optimization for the next iteration, and would
> like to get feedback on this first pass, which by itself gives
> a noticeable perf boost.
I like this idea. We can also make use of it in SCTP.
Marcelo
^ permalink raw reply
* [PATCH RFC net-next 0/2] pskb_extract() helper function.
From: Sowmini Varadhan @ 2016-04-18 13:21 UTC (permalink / raw)
To: eric.dumazet, netdev; +Cc: sowmini.varadhan
This patchset follows up on the discussion in
https://www.mail-archive.com/netdev@vger.kernel.org/msg105090.html
For RDS-TCP, we have to deal with the full gamut of
nonlinear sk_buffs, including all the frag_list variants.
Also, the parent skb has to remain unchanged, while the clone
is queued for Rx on the PF_RDS socket.
Patch 1 of this patchset adds a pskb_extract() function that
does all this without the redundant memcpy's in pskb_expand_head()
and __pskb_pull_tail().
A further optimization is also possible by inlining pskb_trim()
itself into pskb_carve() and thus avoiding the needless copy
of trailer frags/pages that will then get trimmed away. I am
deferring that optimization for the next iteration, and would
like to get feedback on this first pass, which by itself gives
a noticeable perf boost.
Sowmini Varadhan (2):
Add pskb_extract() helper function
Call pskb_extract() helper function
include/linux/skbuff.h | 2 +
net/core/skbuff.c | 248 ++++++++++++++++++++++++++++++++++++++++++++++++
net/rds/tcp_recv.c | 14 +--
3 files changed, 253 insertions(+), 11 deletions(-)
^ permalink raw reply
* [PATCH RFC net-next 2/2] RDS: TCP: Call pskb_extract() helper function
From: Sowmini Varadhan @ 2016-04-18 13:21 UTC (permalink / raw)
To: eric.dumazet, netdev; +Cc: sowmini.varadhan
In-Reply-To: <cover.1460928360.git.sowmini.varadhan@oracle.com>
rds-stress experiments with request size 256 bytes, 8K acks,
using 16 threads show a 40% improvement when pskb_extract()
replaces the {skb_clone(..); pskb_pull(..); pskb_trim(..);}
pattern in the Rx path, so we leverage the perf gain with
this commit.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/tcp_recv.c | 14 +++-----------
1 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index 27a9921..d75d8b5 100644
--- a/net/rds/tcp_recv.c
+++ b/net/rds/tcp_recv.c
@@ -207,22 +207,14 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
}
if (left && tc->t_tinc_data_rem) {
- clone = skb_clone(skb, arg->gfp);
+ to_copy = min(tc->t_tinc_data_rem, left);
+
+ clone = pskb_extract(skb, offset, to_copy, arg->gfp);
if (!clone) {
desc->error = -ENOMEM;
goto out;
}
- to_copy = min(tc->t_tinc_data_rem, left);
- if (!pskb_pull(clone, offset) ||
- pskb_trim(clone, to_copy)) {
- pr_warn("rds_tcp_data_recv: pull/trim failed "
- "left %zu data_rem %zu skb_len %d\n",
- left, tc->t_tinc_data_rem, skb->len);
- kfree_skb(clone);
- desc->error = -ENOMEM;
- goto out;
- }
skb_queue_tail(&tinc->ti_skb_list, clone);
rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "
--
1.7.1
^ permalink raw reply related
* [PATCH RFC net-next 1/2] skbuff: Add pskb_extract() helper function
From: Sowmini Varadhan @ 2016-04-18 13:21 UTC (permalink / raw)
To: eric.dumazet, netdev; +Cc: sowmini.varadhan
In-Reply-To: <cover.1460928360.git.sowmini.varadhan@oracle.com>
A pattern of skb usage seen in modules such as RDS-TCP is to
extract `to_copy' bytes from the received TCP segment, starting
at some offset `off' into a new skb `clone'. This is done in
the ->data_ready callback, where the clone skb is queued up for rx on
the PF_RDS socket, while the parent TCP segment is returned unchanged
back to the TCP engine.
The existing code uses the sequence
clone = skb_clone(..);
pskb_pull(clone, off, ..);
pskb_trim(clone, to_copy, ..);
with the intention of discarding the first `off' bytes. However,
skb_clone() + pskb_pull() implies pskb_expand_head(), which ends
up doing a redundant memcpy of bytes that will then get discarded
in __pskb_pull_tail().
To avoid this inefficiency, this commit adds pskb_extract() that
creates the clone, and memcpy's only the relevant header/frag/frag_list
to the start of `clone'. pskb_trim() is then invoked to trim clone
down to the requested to_copy bytes.
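As a loose analogy on flat buffers (no skb structures involved), the difference in copied bytes can be sketched as follows. Both function names are hypothetical, and the "old" variant only imitates the cost of copying data that is discarded afterwards; it is not a model of the real pskb_expand_head() path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Old pattern on a flat buffer: copy everything, then keep a slice.
 * Returns the number of bytes moved, or 0 if the scratch copy fails.
 */
static size_t extract_copy_all_then_cut(const unsigned char *src, size_t len,
					size_t off, size_t to_copy,
					unsigned char *dst)
{
	unsigned char *tmp;

	assert(off + to_copy <= len);
	tmp = malloc(len ? len : 1);
	if (!tmp)
		return 0;
	memcpy(tmp, src, len);			/* includes bytes we drop */
	memcpy(dst, tmp + off, to_copy);	/* "pull" off, "trim" the rest */
	free(tmp);
	return len + to_copy;
}

/* New pattern: copy only the requested range. */
static size_t extract_direct(const unsigned char *src, size_t len,
			     size_t off, size_t to_copy, unsigned char *dst)
{
	assert(off + to_copy <= len);
	memcpy(dst, src + off, to_copy);
	return to_copy;
}
```

For a 16-byte source with off 10 and to_copy 4, the first variant moves 20 bytes and the second moves 4, with identical output.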
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
include/linux/skbuff.h | 2 +
net/core/skbuff.c | 248 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 250 insertions(+), 0 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da0ace3..a1ce639 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2986,6 +2986,8 @@ struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
int skb_ensure_writable(struct sk_buff *skb, int write_len);
int skb_vlan_pop(struct sk_buff *skb);
int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
+struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
+ gfp_t gfp);
static inline int memcpy_from_msg(void *data, struct msghdr *msg, int len)
{
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4cc594c..e8b6d20 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4619,3 +4619,251 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
return NULL;
}
EXPORT_SYMBOL(alloc_skb_with_frags);
+
+/* carve out the first off bytes from skb when off < headlen */
+static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
+ const int headlen, gfp_t gfp_mask)
+{
+ int i;
+ int size = skb_end_offset(skb);
+ int new_hlen = headlen - off;
+ u8 *data;
+ int doff = 0;
+
+ size = SKB_DATA_ALIGN(size);
+
+ if (skb_pfmemalloc(skb))
+ gfp_mask |= __GFP_MEMALLOC;
+ data = kmalloc_reserve(size +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+ gfp_mask, NUMA_NO_NODE, NULL);
+ if (!data)
+ return -ENOMEM;
+
+ size = SKB_WITH_OVERHEAD(ksize(data));
+
+ /* Copy real data, and all frags */
+ skb_copy_from_linear_data_offset(skb, off, data, new_hlen);
+ skb->len -= off;
+
+ memcpy((struct skb_shared_info *)(data + size),
+ skb_shinfo(skb),
+ offsetof(struct skb_shared_info,
+ frags[skb_shinfo(skb)->nr_frags]));
+ if (skb_cloned(skb)) {
+ /* drop the old head gracefully */
+ if (skb_orphan_frags(skb, gfp_mask)) {
+ kfree(data);
+ return -ENOMEM;
+ }
+ for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+ skb_frag_ref(skb, i);
+ if (skb_has_frag_list(skb))
+ skb_clone_fraglist(skb);
+ skb_release_data(skb);
+ } else {
+ /* we can reuse the existing refcount; all we did was
+ * relocate values
+ */
+ skb_free_head(skb);
+ }
+
+ doff = (data - skb->head);
+ skb->head = data;
+ skb->data = data;
+ skb->head_frag = 0;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ skb->end = size;
+ doff = 0;
+#else
+ skb->end = skb->head + size;
+#endif
+ skb_set_tail_pointer(skb, skb_headlen(skb));
+ skb_headers_offset_update(skb, 0);
+ skb->cloned = 0;
+ skb->hdr_len = 0;
+ skb->nohdr = 0;
+ atomic_set(&skb_shinfo(skb)->dataref, 1);
+
+ return 0;
+}
+
+static int pskb_carve(struct sk_buff *skb, const u32 off, gfp_t gfp);
+
+/* carve out the first eat bytes from skb's frag_list. May recurse into
+ * pskb_carve()
+ */
+static int pskb_carve_frag_list(struct sk_buff *skb,
+ struct skb_shared_info *shinfo, int eat,
+ gfp_t gfp_mask)
+{
+ struct sk_buff *list = shinfo->frag_list;
+ struct sk_buff *clone = NULL;
+ struct sk_buff *insp = NULL;
+
+ do {
+ if (!list) {
+ pr_err("Not enough bytes to eat. Want %d\n", eat);
+ return -EFAULT;
+ }
+ if (list->len <= eat) {
+ /* Eaten as whole. */
+ eat -= list->len;
+ list = list->next;
+ insp = list;
+ } else {
+ /* Eaten partially. */
+ if (skb_shared(list)) {
+ clone = skb_clone(list, gfp_mask);
+ if (!clone)
+ return -ENOMEM;
+ insp = list->next;
+ list = clone;
+ } else {
+ /* This may be pulled without problems. */
+ insp = list;
+ }
+ if (pskb_carve(list, eat, gfp_mask) < 0) {
+ kfree_skb(clone);
+ return -ENOMEM;
+ }
+ break;
+ }
+ } while (eat);
+
+ /* Free pulled out fragments. */
+ while ((list = shinfo->frag_list) != insp) {
+ shinfo->frag_list = list->next;
+ kfree_skb(list);
+ }
+ /* And insert new clone at head. */
+ if (clone) {
+ clone->next = list;
+ shinfo->frag_list = clone;
+ }
+ return 0;
+}
+
+/* carve off first len bytes from skb. Split line (off) is in the
+ * non-linear part of skb
+ */
+static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
+ int pos, gfp_t gfp_mask)
+{
+ int i, k = 0;
+ int size = skb_end_offset(skb);
+ u8 *data;
+ const int nfrags = skb_shinfo(skb)->nr_frags;
+ struct skb_shared_info *shinfo;
+ int doff = 0;
+
+ size = SKB_DATA_ALIGN(size);
+
+ if (skb_pfmemalloc(skb))
+ gfp_mask |= __GFP_MEMALLOC;
+ data = kmalloc_reserve(size +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+ gfp_mask, NUMA_NO_NODE, NULL);
+ if (!data)
+ return -ENOMEM;
+
+ size = SKB_WITH_OVERHEAD(ksize(data));
+
+ memcpy((struct skb_shared_info *)(data + size),
+ skb_shinfo(skb), offsetof(struct skb_shared_info,
+ frags[skb_shinfo(skb)->nr_frags]));
+ if (skb_orphan_frags(skb, gfp_mask)) {
+ kfree(data);
+ return -ENOMEM;
+ }
+ shinfo = (struct skb_shared_info *)(data + size);
+ for (i = 0; i < nfrags; i++) {
+ int fsize = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+
+ if (pos + fsize > off) {
+ shinfo->frags[k] = skb_shinfo(skb)->frags[i];
+
+ if (pos < off) {
+ /* Split frag.
+ * We have two variants in this case:
+ * 1. Move all the frag to the second
+ * part, if it is possible. F.e.
+ * this approach is mandatory for TUX,
+ * where splitting is expensive.
+ * 2. Split it accurately. We do this.
+ */
+ shinfo->frags[0].page_offset += off - pos;
+ skb_frag_size_sub(&shinfo->frags[0], off - pos);
+ }
+ skb_frag_ref(skb, i);
+ k++;
+ }
+ pos += fsize;
+ }
+ shinfo->nr_frags = k;
+ if (skb_has_frag_list(skb))
+ skb_clone_fraglist(skb);
+
+ if (k == 0) {
+ /* split line is in frag list */
+ pskb_carve_frag_list(skb, shinfo, off - pos, gfp_mask);
+ }
+ skb_release_data(skb);
+
+ doff = (data - skb->head);
+ skb->head = data;
+ skb->head_frag = 0;
+ skb->data = data;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+ skb->end = size;
+ doff = 0;
+#else
+ skb->end = skb->head + size;
+#endif
+ skb_reset_tail_pointer(skb);
+ skb_headers_offset_update(skb, 0);
+ skb->cloned = 0;
+ skb->hdr_len = 0;
+ skb->nohdr = 0;
+ skb->len -= off;
+ skb->data_len = skb->len;
+ atomic_set(&skb_shinfo(skb)->dataref, 1);
+ return 0;
+}
+
+/* remove len bytes from the beginning of the skb */
+static int pskb_carve(struct sk_buff *skb, const u32 len, gfp_t gfp)
+{
+ int headlen = skb_headlen(skb);
+
+ if (len < headlen)
+ return pskb_carve_inside_header(skb, len, headlen, gfp);
+ else
+ return pskb_carve_inside_nonlinear(skb, len, headlen, gfp);
+}
+
+/* Extract to_copy bytes starting at off from skb, and return this in
+ * a new skb
+ */
+struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
+ int to_copy, gfp_t gfp)
+{
+ struct sk_buff *clone = skb_clone(skb, gfp);
+
+ if (!clone)
+ return NULL;
+
+ if (pskb_carve(clone, off, gfp) < 0) {
+ pr_warn("pskb_carve failed\n");
+ kfree_skb(clone);
+ return NULL;
+ }
+
+ if (pskb_trim(clone, to_copy)) {
+ pr_warn("pskb_trim failed\n");
+ kfree_skb(clone);
+ return NULL;
+ }
+ return clone;
+}
+EXPORT_SYMBOL(pskb_extract);
--
1.7.1
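
Note on the frag loop in pskb_carve_inside_nonlinear(): the following is a
minimal userspace sketch of that selection logic, not the kernel code itself.
The names struct frag and carve_frags() are hypothetical stand-ins for
skb_frag_t and the loop body. It keeps every frag that extends beyond the
split offset and trims the front of the one frag that straddles it. The patch
writes the trimmed frag to frags[0]; the sketch uses out[k], which is the same
slot because a straddling frag is always the first one kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for skb_frag_t: an offset and a size. */
struct frag {
    unsigned int page_offset;
    unsigned int size;
};

/*
 * Keep the part of frags[0..nfrags) that lies at or beyond byte `off`.
 * `pos` is the byte position at which frags[0] starts (the linear head
 * length in the patch). Returns the number of frags written to out[].
 */
static size_t carve_frags(const struct frag *frags, size_t nfrags,
                          unsigned int pos, unsigned int off,
                          struct frag *out)
{
    size_t i, k = 0;

    assert(frags != NULL && out != NULL);

    for (i = 0; i < nfrags; i++) {
        unsigned int fsize = frags[i].size;

        if (pos + fsize > off) {
            out[k] = frags[i];
            if (pos < off) {
                /* Split line falls inside this frag: drop its front part. */
                out[k].page_offset += off - pos;
                out[k].size -= off - pos;
            }
            k++;
        }
        pos += fsize;
    }
    return k;
}
```

If carve_frags() returns 0, no frag reaches past the split offset, which
corresponds to the "split line is in frag list" case in the patch.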
^ permalink raw reply related
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: Christoph Hellwig @ 2016-04-18 13:10 UTC (permalink / raw)
To: okaya
Cc: Christoph Hellwig, linux-rdma, timur, cov, Yishai Hadas, netdev,
linux-kernel
In-Reply-To: <0c6a430c5f0ec64f51d7c594ef9751dd@codeaurora.org>
On Mon, Apr 18, 2016 at 09:06:18AM -0400, okaya@codeaurora.org wrote:
> On 2016-04-18 08:12, Christoph Hellwig wrote:
> >On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
> >>Current code is assuming that the address returned by dma_alloc_coherent
> >>is a logical address. This is not true on ARM/ARM64 systems.
> >
> >Can you explain what you mean with a 'logical address' and what actual
> >issue you're trying to solve?
>
> Vmap call is failing on arm64 systems because dma alloc api already returns
> an address mapped with vmap.
Please state your problem clearly. What I'm reverse engineering from
your posts is: because dma_alloc_coherent uses vmap-like mappings on
arm64 (all, some systems?) a driver using a lot of them might run into
limits of the vmap pool size.
Is this correct?
>
> Please see arch/arm64/mm directory.
---end quoted text---
^ permalink raw reply
* Re: [PATCH V2] net: ethernet: mellanox: correct page conversion
From: okaya @ 2016-04-18 13:06 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-rdma, timur, cov, Yishai Hadas, netdev, linux-kernel
In-Reply-To: <20160418121247.GA25387@infradead.org>
On 2016-04-18 08:12, Christoph Hellwig wrote:
> On Sat, Apr 16, 2016 at 06:23:32PM -0400, Sinan Kaya wrote:
>> Current code is assuming that the address returned by
>> dma_alloc_coherent
>> is a logical address. This is not true on ARM/ARM64 systems.
>
> Can you explain what you mean with a 'logical address' and what actual
> issue you're trying to solve?
Vmap call is failing on arm64 systems because dma alloc api already
returns an address mapped with vmap.
Please see arch/arm64/mm directory.
^ permalink raw reply
* Re: [PATCH net-next V2 05/11] net/mlx5e: Support RX multi-packet WQE (Striding RQ)
From: Saeed Mahameed @ 2016-04-18 13:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: Saeed Mahameed, David S. Miller, Linux Netdev List, Or Gerlitz,
Tal Alon, Tariq Toukan, Eran Ben Elisha, Achiad Shochat
In-Reply-To: <1460983695.10638.113.camel@edumazet-glaptop3.roam.corp.google.com>
On Mon, Apr 18, 2016 at 3:48 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2016-04-17 at 17:29 -0700, Eric Dumazet wrote:
>
>>
>> If really you need to allocate physically contiguous memory, have you
>> considered converting the order-5 pages into 32 order-0 ones ?
>
> Search for split_page() call sites for examples.
>
>
Thanks Eric, we are already evaluating split_page as we speak.
We did look but could not find any specific alloc_pages API that
allocates many physically contiguous pages with order0 ! so we assume
it is ok to use split_page.
BTW our MPWQE solution doesn't totally rely on huge physically
contiguous memory, as you see in the next two patches we introduce a
fragmented MPWQE approach as a fallback, but we do understand your
concern for the normal flow.
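
For reference, the order arithmetic behind "order-5 into 32 order-0 pages" can
be sketched in plain userspace C. The helper names are hypothetical, and the
4096-byte page size used in the usage note is an assumption that does not come
from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Number of order-0 pages covered by one allocation of the given order. */
static size_t pages_per_order(unsigned int order)
{
    assert(order < 8 * sizeof(size_t));
    return (size_t)1 << order;
}

/* Bytes covered by one allocation of the given order, for a caller-supplied
 * page size (the page size is an input here, not something the sketch knows). */
static size_t bytes_per_order(unsigned int order, size_t page_size)
{
    return pages_per_order(order) * page_size;
}
```

With a 4096-byte page, bytes_per_order(5, 4096) gives 131072 bytes of
physically contiguous memory per order-5 allocation.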
^ permalink raw reply
* Re: [PATCH bluetooth-next 09/10] 6lowpan: introduce 6lowpan-nd
From: kbuild test robot @ 2016-04-18 13:04 UTC (permalink / raw)
To: Alexander Aring
Cc: kbuild-all-JC7UmRfGjtg, linux-wpan-u79uwXL29TY76Z2rM5mHXA,
kernel-bIcnvbaLZ9MEGnE8C9+IrQ, marcel-kz+m5ild9QBg9hUCZPvPmw,
jukka.rissanen-VuQAYsv1563Yd54FQh9/CA,
hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r,
stefan-JPH+aEBZ4P+UEJcrhfAQsw, mcr-SWp7JaYWvAQV+D8aMU/kSg,
werner-SEdMjqphH88wryQfseakQg,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, Alexander Aring, David S . Miller,
Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy
In-Reply-To: <1460977108-4675-10-git-send-email-aar-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 5783 bytes --]
Hi Alexander,
[auto build test ERROR on bluetooth-next/master]
url: https://github.com/0day-ci/linux/commits/Alexander-Aring/6lowpan-introduce-basic-6lowpan-nd/20160418-191825
base: https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
config: x86_64-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/debugfs.h:18,
from include/net/6lowpan.h:56,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h: In function 'ndisc_is_useropt':
include/net/ndisc.h:211:16: error: 'const struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->is_useropt))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/6lowpan.h:58,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h:212:13: error: 'const struct net_device' has no member named 'ndisc_ops'
return dev->ndisc_ops->is_useropt(opt);
^
In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/debugfs.h:18,
from include/net/6lowpan.h:56,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h: In function 'ndisc_send_na':
include/net/ndisc.h:223:16: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->send_na))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/6lowpan.h:58,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h:224:6: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops->send_na(dev, daddr, solicited_addr, router,
^
In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/debugfs.h:18,
from include/net/6lowpan.h:56,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h: In function 'ndisc_recv_na':
include/net/ndisc.h:230:21: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(skb->dev->ndisc_ops->recv_na))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/6lowpan.h:58,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h:231:11: error: 'struct net_device' has no member named 'ndisc_ops'
skb->dev->ndisc_ops->recv_na(skb);
^
In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/debugfs.h:18,
from include/net/6lowpan.h:56,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h: In function 'ndisc_send_ns':
include/net/ndisc.h:239:16: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->send_ns))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/6lowpan.h:58,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h:240:6: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops->send_ns(dev, solicit, daddr, saddr);
^
In file included from include/linux/linkage.h:4:0,
from include/linux/fs.h:4,
from include/linux/debugfs.h:18,
from include/net/6lowpan.h:56,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h: In function 'ndisc_recv_ns':
include/net/ndisc.h:245:21: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(skb->dev->ndisc_ops->recv_ns))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/6lowpan.h:58,
from net/6lowpan/ndisc.c:11:
include/net/ndisc.h:246:11: error: 'struct net_device' has no member named 'ndisc_ops'
skb->dev->ndisc_ops->recv_ns(skb);
^
net/6lowpan/ndisc.c: In function 'lowpan_register_ndisc_ops':
>> net/6lowpan/ndisc.c:632:5: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops = &lowpan_ndisc_ops;
^
vim +632 net/6lowpan/ndisc.c
626 .send_ns = lowpan_ndisc_send_ns,
627 .recv_ns = lowpan_ndisc_recv_ns,
628 };
629
630 void lowpan_register_ndisc_ops(struct net_device *dev)
631 {
> 632 dev->ndisc_ops = &lowpan_ndisc_ops;
633 }
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 54118 bytes --]
^ permalink raw reply
* Re: [PATCH bluetooth-next 07/10] ipv6: introduce neighbour discovery ops
From: kbuild test robot @ 2016-04-18 12:59 UTC (permalink / raw)
To: Alexander Aring
Cc: kbuild-all, linux-wpan, kernel, marcel, jukka.rissanen, hannes,
stefan, mcr, werner, linux-bluetooth, netdev, Alexander Aring,
David S . Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <1460977108-4675-8-git-send-email-aar@pengutronix.de>
[-- Attachment #1: Type: text/plain, Size: 13788 bytes --]
Hi Alexander,
[auto build test ERROR on bluetooth-next/master]
url: https://github.com/0day-ci/linux/commits/Alexander-Aring/6lowpan-introduce-basic-6lowpan-nd/20160418-191825
base: https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
config: x86_64-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All error/warnings (new ones prefixed by >>):
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/net/ethernet/atheros/alx/main.c:35:
include/net/ndisc.h: In function 'ndisc_is_useropt':
>> include/net/ndisc.h:201:16: error: 'const struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->is_useropt))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/inetpeer.h:15,
from include/net/route.h:28,
from include/net/ip.h:31,
from include/net/ip6_checksum.h:31,
from drivers/net/ethernet/atheros/alx/main.c:46:
include/net/ndisc.h:202:13: error: 'const struct net_device' has no member named 'ndisc_ops'
return dev->ndisc_ops->is_useropt(opt);
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/net/ethernet/atheros/alx/main.c:35:
include/net/ndisc.h: In function 'ndisc_send_na':
>> include/net/ndisc.h:213:16: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->send_na))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/inetpeer.h:15,
from include/net/route.h:28,
from include/net/ip.h:31,
from include/net/ip6_checksum.h:31,
from drivers/net/ethernet/atheros/alx/main.c:46:
include/net/ndisc.h:214:6: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops->send_na(dev, daddr, solicited_addr, router,
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/net/ethernet/atheros/alx/main.c:35:
include/net/ndisc.h: In function 'ndisc_recv_na':
include/net/ndisc.h:220:21: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(skb->dev->ndisc_ops->recv_na))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/inetpeer.h:15,
from include/net/route.h:28,
from include/net/ip.h:31,
from include/net/ip6_checksum.h:31,
from drivers/net/ethernet/atheros/alx/main.c:46:
include/net/ndisc.h:221:11: error: 'struct net_device' has no member named 'ndisc_ops'
skb->dev->ndisc_ops->recv_na(skb);
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/net/ethernet/atheros/alx/main.c:35:
include/net/ndisc.h: In function 'ndisc_send_ns':
include/net/ndisc.h:229:16: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->send_ns))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/inetpeer.h:15,
from include/net/route.h:28,
from include/net/ip.h:31,
from include/net/ip6_checksum.h:31,
from drivers/net/ethernet/atheros/alx/main.c:46:
include/net/ndisc.h:230:6: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops->send_ns(dev, solicit, daddr, saddr);
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/net/ethernet/atheros/alx/main.c:35:
include/net/ndisc.h: In function 'ndisc_recv_ns':
include/net/ndisc.h:235:21: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(skb->dev->ndisc_ops->recv_ns))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from include/net/inetpeer.h:15,
from include/net/route.h:28,
from include/net/ip.h:31,
from include/net/ip6_checksum.h:31,
from drivers/net/ethernet/atheros/alx/main.c:46:
include/net/ndisc.h:236:11: error: 'struct net_device' has no member named 'ndisc_ops'
skb->dev->ndisc_ops->recv_ns(skb);
^
--
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from net/ipv6/ndisc.c:32:
include/net/ndisc.h: In function 'ndisc_is_useropt':
>> include/net/ndisc.h:201:16: error: 'const struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->is_useropt))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from net/ipv6/ndisc.c:57:
include/net/ndisc.h:202:13: error: 'const struct net_device' has no member named 'ndisc_ops'
return dev->ndisc_ops->is_useropt(opt);
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from net/ipv6/ndisc.c:32:
include/net/ndisc.h: In function 'ndisc_send_na':
>> include/net/ndisc.h:213:16: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->send_na))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from net/ipv6/ndisc.c:57:
include/net/ndisc.h:214:6: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops->send_na(dev, daddr, solicited_addr, router,
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from net/ipv6/ndisc.c:32:
include/net/ndisc.h: In function 'ndisc_recv_na':
include/net/ndisc.h:220:21: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(skb->dev->ndisc_ops->recv_na))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from net/ipv6/ndisc.c:57:
include/net/ndisc.h:221:11: error: 'struct net_device' has no member named 'ndisc_ops'
skb->dev->ndisc_ops->recv_na(skb);
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from net/ipv6/ndisc.c:32:
include/net/ndisc.h: In function 'ndisc_send_ns':
include/net/ndisc.h:229:16: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(dev->ndisc_ops->send_ns))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from net/ipv6/ndisc.c:57:
include/net/ndisc.h:230:6: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops->send_ns(dev, solicit, daddr, saddr);
^
In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from net/ipv6/ndisc.c:32:
include/net/ndisc.h: In function 'ndisc_recv_ns':
include/net/ndisc.h:235:21: error: 'struct net_device' has no member named 'ndisc_ops'
if (likely(skb->dev->ndisc_ops->recv_ns))
^
include/linux/compiler.h:169:40: note: in definition of macro 'likely'
# define likely(x) __builtin_expect(!!(x), 1)
^
In file included from include/net/ipv6.h:20:0,
from net/ipv6/ndisc.c:57:
include/net/ndisc.h:236:11: error: 'struct net_device' has no member named 'ndisc_ops'
skb->dev->ndisc_ops->recv_ns(skb);
^
net/ipv6/ndisc.c: In function 'ip6_register_ndisc_ops':
>> net/ipv6/ndisc.c:1804:10: error: 'struct net_device' has no member named 'ndisc_ops'
if (dev->ndisc_ops) {
^
net/ipv6/ndisc.c:1809:7: error: 'struct net_device' has no member named 'ndisc_ops'
dev->ndisc_ops = &ip6_ndisc_ops;
^
In file included from include/net/ipv6.h:20:0,
from net/ipv6/ndisc.c:57:
include/net/ndisc.h: In function 'ndisc_is_useropt':
>> include/net/ndisc.h:205:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
vim +201 include/net/ndisc.h
195 void (*recv_ns)(struct sk_buff *skb);
196 };
197
198 static inline int ndisc_is_useropt(const struct net_device *dev,
199 struct nd_opt_hdr *opt)
200 {
> 201 if (likely(dev->ndisc_ops->is_useropt))
202 return dev->ndisc_ops->is_useropt(opt);
203 else
204 return 0;
> 205 }
206
207 static inline void ndisc_send_na(struct net_device *dev,
208 const struct in6_addr *daddr,
209 const struct in6_addr *solicited_addr,
210 bool router, bool solicited, bool override,
211 bool inc_opt)
212 {
> 213 if (likely(dev->ndisc_ops->send_na))
214 dev->ndisc_ops->send_na(dev, daddr, solicited_addr, router,
215 solicited, override, inc_opt);
216 }
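
The wrappers quoted above dispatch through a per-device ops table and only
compile if struct net_device actually has the ndisc_ops member, which is what
the errors report as missing in the tested tree. A minimal userspace sketch of
the same dispatch pattern follows. All *_model names are hypothetical stand-ins
and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct nd_opt;  /* opaque stand-in for struct nd_opt_hdr */

/* Hypothetical model of the ops table: one optional callback. */
struct ndisc_ops_model {
    int (*is_useropt)(const struct nd_opt *opt);
};

/* Hypothetical model of the device: it must carry the ops pointer,
 * otherwise the wrapper below cannot compile. */
struct net_device_model {
    const struct ndisc_ops_model *ndisc_ops;
};

/* Call the callback if the device provides one, else report "not a user option". */
static int model_is_useropt(const struct net_device_model *dev,
                            const struct nd_opt *opt)
{
    assert(dev != NULL && dev->ndisc_ops != NULL);
    if (dev->ndisc_ops->is_useropt)
        return dev->ndisc_ops->is_useropt(opt);
    return 0;
}

/* Example callback that accepts every option. */
static int always_useropt(const struct nd_opt *opt)
{
    (void)opt;
    return 1;
}
```

Note that the sketch, like the quoted wrappers, assumes the ops pointer itself
is non-NULL and only checks the individual callback.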
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 54118 bytes --]
^ permalink raw reply
* Re: [PATCH net-next V2 05/11] net/mlx5e: Support RX multi-packet WQE (Striding RQ)
From: Eric Dumazet @ 2016-04-18 12:48 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, netdev, Or Gerlitz, Tal Alon, Tariq Toukan,
Eran Ben Elisha, Achiad Shochat
In-Reply-To: <1460939371.10638.97.camel@edumazet-glaptop3.roam.corp.google.com>
On Sun, 2016-04-17 at 17:29 -0700, Eric Dumazet wrote:
>
> If really you need to allocate physically contiguous memory, have you
> considered converting the order-5 pages into 32 order-0 ones ?
Search for split_page() call sites for examples.
^ permalink raw reply
* Re: [patch -next] udp: fix if statement in SIOCINQ ioctl
From: Eric Dumazet @ 2016-04-18 12:19 UTC (permalink / raw)
To: Dan Carpenter
Cc: David S. Miller, Willem de Bruijn, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel,
kernel-janitors
In-Reply-To: <20160418084449.GA12410@mwanda>
On Mon, 2016-04-18 at 11:44 +0300, Dan Carpenter wrote:
> We deleted a line of code and accidentally made the "return put_user()"
> part of the if statement when it's supposed to be unconditional.
>
> Fixes: 9f9a45beaa96 ('udp: do not expect udp headers on ioctl SIOCINQ')
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* RE: Poorer networking performance in later kernels?
From: Butler, Peter @ 2016-04-18 12:02 UTC (permalink / raw)
To: Rick Jones, netdev@vger.kernel.org
In-Reply-To: <57116D21.8000807@hpe.com>
Just a minor clarification to my last paragraph ("When I perform the tests in this setup the 3.4.2 and 4.4.0 kernels perform identically - just as you would expect."). By this I don't mean that the 3.4.2 and 4.4.0 kernels on the VMs perform identically to the 3.4.2 and 4.4.0 kernels on the actual hardware; what I mean is that in VM-land the original problem is essentially gone, as I get the same throughput with either kernel.
-----Original Message-----
From: Butler, Peter
Sent: April-18-16 7:28 AM
To: 'Rick Jones' <rick.jones2@hpe.com>; netdev@vger.kernel.org
Subject: RE: Poorer networking performance in later kernels?
Hi Rick
Thanks for the reply.
Here is some hardware information, as requested (the two systems are identical, and are communicating with one another over a 10GB full-duplex Ethernet backplane):
- processor type: Intel(R) Xeon(R) CPU C5528 @ 2.13GHz
- NIC: Intel 82599EB 10GB XAUI/BX4
- NIC driver: ixgbe version 4.2.1-k (part of 4.4.0 kernel)
As for the buffer sizes, those rather large ones work fine for us with the 3.4.2 kernel. However, for the sake of being complete, I have re-tried the tests with the 'standard' 4.4.0 kernel parameters for all /proc/sys/net/* values, and the results still were extremely poor in comparison to the 3.4.2 kernel.
Our MTU is actually just the standard 1500 bytes, however the message size was chosen to mimic actual traffic which will be segmented.
I ran ethtool -k (indeed I checked all ethtool parameters, not just those via -k) and the only real difference I could find was in "large-receive-offload" which was ON in 3.4.2 but OFF in 4.4.0 - so I used ethtool to change this to match the 3.4.2 settings and re-ran the tests. Didn't help :-( It's possible of course that I have missed a parameter here or there in comparing the 3.4.2 setup to the 4.4.0 setup. I also tried running the ethtool config with the latest and greatest ethtool version (4.5) on the 4.4.0 kernel, as compared to the old 3.1 version on our 3.4.2 kernel.
I performed the TCP_RR test as requested and in that case, the results are much more comparable. The old kernel is still better, but now only around 10% better as opposed to 2-3x better.
However I still contend that the *_STREAM tests are giving us more pertinent data, since our product application is only getting 1/3 to 1/2 of the performance on the 4.4.0 kernel, and this is the same thing I see when I use netperf to test.
One other note: I tried running our 3.4.2 and 4.4.0 kernels in a VM environment on my workstation, so as to take the 'real' production hardware out of the equation. When I perform the tests in this setup the 3.4.2 and 4.4.0 kernels perform identically - just as you would expect.
Any other ideas? What can I be missing here?
Peter
-----Original Message-----
From: Rick Jones [mailto:rick.jones2@hpe.com]
Sent: April-15-16 6:37 PM
To: Butler, Peter <pbutler@sonusnet.com>; netdev@vger.kernel.org
Subject: Re: Poorer networking performance in later kernels?
On 04/15/2016 02:02 PM, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
>
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop
> in networking performance. Nothing was changed on the test systems,
> other than the kernel itself (and kernel modules). The identical
> .config used to build the 3.4.2 kernel was brought over into the
> 4.4.0 kernel source tree, and any configuration differences (e.g. new
> parameters, etc.) were taken as default values.
>
> The testing was performed on the same actual hardware for both kernel
> versions (i.e. take the existing 3.4.2 physical setup, simply boot
> into the (new) kernel and run the same test). The netperf utility was
> used for benchmarking and the testing was always performed on idle
> systems.
>
> TCP testing yielded the following results, where the 4.4.0 kernel only
> got about 1/2 of the throughput:
>
>         Recv     Send     Send                         Utilization     Service Demand
>         Socket   Socket   Message  Elapsed             Send    Recv    Send    Recv
>         Size     Size     Size     Time    Throughput  local   remote  local   remote
>         bytes    bytes    bytes    secs.   10^6bits/s  % S     % S     us/KB   us/KB
>
> 3.4.2   13631488 13631488 8952     30.01   9370.29     10.14   6.50    0.709   0.454
> 4.4.0   13631488 13631488 8952     30.02   5314.03     9.14    14.31   1.127   1.765
>
> SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:
>
>         Recv     Send     Send                         Utilization     Service Demand
>         Socket   Socket   Message  Elapsed             Send    Recv    Send    Recv
>         Size     Size     Size     Time    Throughput  local   remote  local   remote
>         bytes    bytes    bytes    secs.   10^6bits/s  % S     % S     us/KB   us/KB
>
> 3.4.2   13631488 13631488 8952     30.00   2306.22     13.87   13.19   3.941   3.747
> 4.4.0   13631488 13631488 8952     30.01   882.74      16.86   19.14   12.516  14.210
>
> The same tests were performed a multitude of times, and are always
> consistent (within a few percent). I've also tried playing with
> various run-time kernel parameters (/proc/sys/kernel/net/...) on the
> 4.4.0 kernel to alleviate the issue but have had no success at all.
>
> I'm at a loss as to what could possibly account for such a discrepancy...
>
I suspect I am not alone in being curious about the CPU(s) present in the systems and the model/whatnot of the NIC being used. I'm also curious as to why you have what at first glance seem like absurdly large socket buffer sizes.
That said, it looks like you have some Really Big (tm) increases in service demand. Many more CPU cycles being consumed per KB of data transferred.
Your message size makes me wonder if you were using a 9000 byte MTU.
Perhaps in the move from 3.4.2 to 4.4.0 you lost some or all of the stateless offloads for your NIC(s)? Running ethtool -k <interface> on both ends under both kernels might be good.
Also, if you did have a 9000 byte MTU under 3.4.2 are you certain you still had it under 4.4.0?
It would (at least to me) also be interesting to run a TCP_RR test comparing the two kernels. TCP_RR (at least with the default request/response size of one byte) doesn't really care about stateless offloads or MTUs and could show how much difference there is in basic path length (or I suppose in interrupt coalescing behaviour if the NIC in question has a mildly dodgy heuristic for such things).
happy benchmarking,
rick jones
^ permalink raw reply
* Re: Poorer networking performance in later kernels?
From: Eric Dumazet @ 2016-04-18 12:16 UTC (permalink / raw)
To: Butler, Peter; +Cc: netdev@vger.kernel.org
In-Reply-To: <1460759582.10638.79.camel@edumazet-glaptop3.roam.corp.google.com>
On Fri, 2016-04-15 at 15:33 -0700, Eric Dumazet wrote:
> On Fri, 2016-04-15 at 21:02 +0000, Butler, Peter wrote:
> > (Please keep me CC'd to all comments/responses)
> >
> > I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in networking performance. Nothing was changed on the test systems, other than the kernel itself (and kernel modules). The identical .config used to build the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, and any configuration differences (e.g. new parameters, etc.) were taken as default values.
> >
> > The testing was performed on the same actual hardware for both kernel versions (i.e. take the existing 3.4.2 physical setup, simply boot into the (new) kernel and run the same test). The netperf utility was used for benchmarking and the testing was always performed on idle systems.
> >
> > TCP testing yielded the following results, where the 4.4.0 kernel only got about 1/2 of the throughput:
> >
> > Kernel  Recv      Send      Send                          Utilization     Service Demand
> >         Socket    Socket    Message  Elapsed              Send    Recv    Send    Recv
> >         Size      Size      Size     Time     Throughput  local   remote  local   remote
> >         bytes     bytes     bytes    secs.    10^6bits/s  % S     % S     us/KB   us/KB
> >
> > 3.4.2   13631488  13631488  8952     30.01    9370.29     10.14   6.50    0.709   0.454
> > 4.4.0   13631488  13631488  8952     30.02    5314.03     9.14    14.31   1.127   1.765
> >
> > SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:
> >
> > Kernel  Recv      Send      Send                          Utilization     Service Demand
> >         Socket    Socket    Message  Elapsed              Send    Recv    Send    Recv
> >         Size      Size      Size     Time     Throughput  local   remote  local   remote
> >         bytes     bytes     bytes    secs.    10^6bits/s  % S     % S     us/KB   us/KB
> >
> > 3.4.2   13631488  13631488  8952     30.00    2306.22     13.87   13.19   3.941   3.747
> > 4.4.0   13631488  13631488  8952     30.01    882.74      16.86   19.14   12.516  14.210
> >
> > The same tests were performed a multitude of times, and are always consistent (within a few percent). I've also tried playing with various run-time kernel parameters (/proc/sys/kernel/net/...) on the 4.4.0 kernel to alleviate the issue but have had no success at all.
> >
> > I'm at a loss as to what could possibly account for such a discrepancy...
>
> Maybe new kernel is faster and you have drops somewhere ?
>
> nstat >/dev/null
> netperf -H ...
> nstat
>
> Would help
>
Are you receiving my mails, or simply ignoring them ?
Thanks.
^ permalink raw reply
* Re: [PATCH] cls_cgroup: get sk_classid only from full sockets
From: Eric Dumazet @ 2016-04-18 12:14 UTC (permalink / raw)
To: Konstantin Khlebnikov; +Cc: netdev, David S. Miller, cgroups, linux-kernel
In-Reply-To: <146097943086.15300.8997345346172526153.stgit@buzz>
On Mon, 2016-04-18 at 14:37 +0300, Konstantin Khlebnikov wrote:
> skb->sk could point to timewait or request socket which has no sk_classid.
> Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify".
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>
Thanks !
^ permalink raw reply