* Re: [PATCHSET net-next v2 00/07] Support for byte queue limits on various drivers
From: David Miller @ 2013-10-21 20:34 UTC (permalink / raw)
To: milky-kernel; +Cc: netdev
In-Reply-To: <1382292803-18875-1-git-send-email-milky-kernel@mcmilk.de>
I'm not applying any patches that add module parameters for this, sorry.
* Re: [PATCH net] tcp: initialize passive-side sk_pacing_rate after 3WHS
From: Eric Dumazet @ 2013-10-21 20:36 UTC (permalink / raw)
To: Neal Cardwell; +Cc: David Miller, netdev, Eric Dumazet, Yuchung Cheng
In-Reply-To: <1382384419-6081-1-git-send-email-ncardwell@google.com>
On Mon, 2013-10-21 at 15:40 -0400, Neal Cardwell wrote:
> For passive TCP connections, upon receiving the ACK that completes the
> 3WHS, make sure we set our pacing rate after we get our first RTT
> sample.
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Yuchung Cheng <ycheng@google.com>
> ---
> net/ipv4/tcp_input.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 53974c7..a16b01b 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5712,6 +5712,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
> } else
> tcp_init_metrics(sk);
>
> + tcp_update_pacing_rate(sk);
> +
> /* Prevent spurious tcp_cwnd_restart() on first data packet */
> tp->lsndtime = tcp_time_stamp;
>
Seems good to me, thanks !
Acked-by: Eric Dumazet <edumazet@google.com>
* Re: [PATCH v2.44 1/5] odp: Allow VLAN actions after MPLS actions
From: Ben Pfaff @ 2013-10-21 20:41 UTC (permalink / raw)
To: Simon Horman
Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
Joe Stringer
In-Reply-To: <1381972511-27221-2-git-send-email-horms@verge.net.au>
On Thu, Oct 17, 2013 at 10:15:07AM +0900, Simon Horman wrote:
> From: Joe Stringer <joe@wand.net.nz>
>
> OpenFlow 1.1, 1.2, and 1.3 differ in their handling of MPLS actions in the
> presence of VLAN tags. To allow correct behaviour to be committed in
> each situation, this patch adds a second round of VLAN tag action
> handling to commit_odp_actions(), which occurs after MPLS actions. This
> is implemented with a new field in 'struct xlate_in' called 'vlan_tci'.
>
> When a push_mpls action is composed, the flow's current VLAN state is
> stored into xin->vlan_tci, and flow->vlan_tci is set to 0 (pop_vlan). If
> a VLAN tag is present, it is stripped; if not, then there is no change.
> Any later modifications to the VLAN state are written to xin->vlan_tci.
> When committing the actions, flow->vlan_tci is used before MPLS actions,
> and xin->vlan_tci is used afterwards. This retains the current datapath
> behaviour, but allows VLAN actions to be applied in a more flexible
> manner.
>
> Both before and after this patch MPLS LSEs are pushed onto a packet after
> any VLAN tags that may be present. This is the behaviour described in
> OpenFlow 1.1 and 1.2. OpenFlow 1.3 specifies that MPLS LSEs should be
> pushed onto a packet before any VLAN tags that are present. Support
> for this will be added by a subsequent patch that makes use of
> the infrastructure added by this patch.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Simon Horman <horms@verge.net.au>
I think that this patch tries to track the VLAN tag inside the MPLS
label and the VLAN tag outside the MPLS label separately. But it does
it in an odd way, by testing whether those tags have the same value.
I'm not sure that's correct. If I set a VLAN, push an MPLS label
outside the VLAN, then push the same VLAN outside the MPLS label, does
it behave correctly? (Is there a test for this behavior in patch 3?
If so then I'm reassured.)
* Re: [patch net v2 0/3] UFO fixes
From: Hannes Frederic Sowa @ 2013-10-21 21:14 UTC (permalink / raw)
To: David Miller
Cc: jiri, netdev, eric.dumazet, jdmason, yoshfuji, kuznet, jmorris,
kaber, herbert
In-Reply-To: <20131021.162612.1795347771078720888.davem@davemloft.net>
On Mon, Oct 21, 2013 at 04:26:12PM -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Sun, 20 Oct 2013 05:26:17 +0200
>
> > Hi David!
> >
> > On Sat, Oct 19, 2013 at 07:21:47PM -0400, David Miller wrote:
> >> From: Jiri Pirko <jiri@resnulli.us>
> >> Date: Sat, 19 Oct 2013 12:29:14 +0200
> >>
> >> > Couple of patches fixing UFO functionality in different situations.
> >> >
> >> > v1->v2:
> >> > - minor if{}else{} coding style adjustment suggested by Sergei Shtylyov
> >>
> >> Series applied, thanks Jiri.
> >
> > I would propose that the patches
> >
> > "ip6_output: do skb ufo init for peeked non ufo skb as well"
> > (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b)
> >
> > and
> >
> > "ip_output: do skb ufo init for peeked non ufo skb as well"
> > (e93b7d748be887cd7639b113ba7d7ef792a7efb9)
> >
> > should go to stable because they solve a possible memory corruption
> > from userspace.
>
> I suppose... the reason I didn't automatically queue these up for -stable
> is that they are rather non-trivial.
This patch I proposed before is IMHO more simple. Would you consider
this a candidate for stable only? I would send a proper patch then.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6d56840..3565450 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1308,6 +1308,11 @@ static inline int skb_pagelen(const struct sk_buff *skb)
return len + skb_headlen(skb);
}
+static inline bool skb_has_frags(const struct sk_buff *skb)
+{
+ return skb_shinfo(skb)->nr_frags;
+}
+
/**
* __skb_fill_page_desc - initialise a paged fragment in an skb
* @skb: buffer containing fragment to be initialised
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7d8357b..8dc3d8d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -836,7 +836,7 @@ static int __ip_append_data(struct sock *sk,
csummode = CHECKSUM_PARTIAL;
cork->length += length;
- if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+ if (((length > mtu) || (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
err = ip_ufo_append_data(sk, queue, getfrag, from, length,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..ded4f6f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1227,7 +1227,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
skb = skb_peek_tail(&sk->sk_write_queue);
cork->length += length;
if (((length > mtu) ||
- (skb && skb_is_gso(skb))) &&
+ (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO)) {
err = ip6_ufo_append_data(sk, getfrag, from, length,
Greetings,
Hannes
* Re: [patch net v2 0/3] UFO fixes
From: David Miller @ 2013-10-21 21:15 UTC (permalink / raw)
To: hannes
Cc: jiri, netdev, eric.dumazet, jdmason, yoshfuji, kuznet, jmorris,
kaber, herbert
In-Reply-To: <20131021211426.GB24158@order.stressinduktion.org>
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 21 Oct 2013 23:14:26 +0200
> This patch I proposed before is IMHO more simple. Would you consider
> this a candidate for stable only? I would send a proper patch then.
Yes, I would.
Thanks.
* Re: [PATCH v2] net: remove function sk_reset_txq()
From: David Miller @ 2013-10-21 21:18 UTC (permalink / raw)
To: gamerh2o; +Cc: netdev, linux-kernel
In-Reply-To: <20131020013757.GA25265@will>
From: ZHAO Gang <gamerh2o@gmail.com>
Date: Sun, 20 Oct 2013 09:37:57 +0800
> All sk_reset_txq() does is call sk_tx_queue_reset(),
> and sk_reset_txq() is used only in sock.h, by dst_negative_advice().
> Let dst_negative_advice() call sk_tx_queue_reset() directly so we
> can remove the unneeded sk_reset_txq().
>
> Signed-off-by: ZHAO Gang <gamerh2o@gmail.com>
> change a typo in patch description: sock.c -> sock.h
This patch does not apply cleanly to net-next, please fix this up
and resubmit.
Thanks.
* Re: nf_tables*.h: Remove extern from function prototypes
From: David Miller @ 2013-10-21 21:19 UTC (permalink / raw)
To: joe; +Cc: netdev, linux-kernel
In-Reply-To: <1382245531.2041.37.camel@joe-AO722>
From: Joe Perches <joe@perches.com>
Date: Sat, 19 Oct 2013 22:05:31 -0700
> There is a mix of function prototypes with and without extern
> in the kernel sources. Standardize on not using extern for
> function prototypes.
>
> Function prototypes don't need to be written with extern.
> extern is assumed by the compiler. Its use is as unnecessary as
> using auto to declare automatic/local variables in a block.
>
> Signed-off-by: Joe Perches <joe@perches.com>
I'll apply this directly, thanks Joe.
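
As an illustration of the point in the quoted description, here is a minimal sketch showing that a function prototype declares the same thing with or without extern. The function name add_one is hypothetical and is not taken from the nf_tables headers.

```c
/*
 * Both prototypes below declare the same function with external linkage.
 * A C compiler accepts them side by side because the extern keyword is
 * implied on function declarations at file scope.
 * add_one() is a made-up name used only for this example.
 */
extern int add_one(int x);
int add_one(int x);

int add_one(int x)
{
	return x + 1;
}
```

The two declarations are compatible, so dropping extern from a prototype does not change linkage or behaviour.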
* Re: [PATCH 00/15] net: ethernet: remove unnecessary pci_set_drvdata() part 2
From: David Miller @ 2013-10-21 21:21 UTC (permalink / raw)
To: jg1.han; +Cc: netdev
In-Reply-To: <003801cece02$6abb0160$40310420$%han@samsung.com>
From: Jingoo Han <jg1.han@samsung.com>
Date: Mon, 21 Oct 2013 11:08:21 +0900
> Since commit 0998d0631001288a5974afc0b2a5f568bcdecb4d
> (device-core: Ensure drvdata = NULL when no driver is bound),
> the driver core clears the driver data to NULL after device_release
> or on probe failure. Thus, it is not needed to manually clear the
> device driver data to NULL.
Series applied, thanks.
* Re: [net PATCH 1/1] drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
From: David Miller @ 2013-10-21 21:23 UTC (permalink / raw)
To: mugunthanvnm; +Cc: netdev, linux-omap, bigeasy
In-Reply-To: <1382252546-16382-1-git-send-email-mugunthanvnm@ti.com>
From: Mugunthan V N <mugunthanvnm@ti.com>
Date: Sun, 20 Oct 2013 12:32:26 +0530
> When interrupt pacing is enabled, receive/transmit statistics are not
> updated properly by hardware, which leads the ISR to return IRQ_NONE
> and in turn the kernel disables the interrupt. This patch removes the
> checking of receive/transmit statistics from the ISR.
>
> This patch is verified with AM335x Beagle Bone Black and below is the
> kernel warn when interrupt pacing is enabled.
...
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Applied, thanks.
* [net-next PATCH] macvlan: resolve ENOENT errors on creation
From: John Fastabend @ 2013-10-21 21:28 UTC (permalink / raw)
To: vfalico, nhorman; +Cc: netdev
After the commit below attempting to create macvlan devices was
resulting in ENOENT errors,
# ip link add link p3p2 type macvlan
RTNETLINK answers: Invalid argument
This happens because netdev_upper_dev_link() is called before
register_netdevice() in the macvlan code. Through a call chain
this results in a call to __netdev_adjacent_dev_insert() and
finally a sysfs_create_link(). This requires the kobject of
the macvlan to be registered which is done in register_netdevice().
If there is no kobject which is the case here the ENOENT error
is seen on the command line.
To resolve this move the netdev_upper_dev_link() call below
the register_netdevice() call. This aligns with vlan driver
flow.
Regression introduced here,
commit 5831d66e8097aedfa3bc35941cf265ada2352317
Author: Veaceslav Falico <vfalico@redhat.com>
Date: Wed Sep 25 09:20:32 2013 +0200
net: create sysfs symlinks for neighbour devices
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
drivers/net/macvlan.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9bf46bd..cc9845e 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -828,22 +828,21 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
eth_hw_addr_inherit(dev, lowerdev);
}
+ port->count += 1;
+ err = register_netdevice(dev);
+ if (err < 0)
+ goto destroy_port;
+
err = netdev_upper_dev_link(lowerdev, dev);
if (err)
goto destroy_port;
- port->count += 1;
- err = register_netdevice(dev);
- if (err < 0)
- goto upper_dev_unlink;
list_add_tail_rcu(&vlan->list, &port->vlans);
netif_stacked_transfer_operstate(lowerdev, dev);
return 0;
-upper_dev_unlink:
- netdev_upper_dev_unlink(lowerdev, dev);
destroy_port:
port->count -= 1;
if (!port->count)
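
The ordering problem described in the commit message can be sketched in userspace as follows. This is a toy model only: toy_register() and toy_link() are hypothetical stand-ins for register_netdevice() and netdev_upper_dev_link(), and the model assumes, as the description states, that linking fails with ENOENT when the device has not been registered yet.

```c
#include <errno.h>

/* Toy device: tracks only whether it is registered and linked. */
struct toy_dev {
	int registered;
	int linked;
};

/* Stand-in for register_netdevice(): creates the "kobject". */
int toy_register(struct toy_dev *dev)
{
	dev->registered = 1;
	return 0;
}

/* Stand-in for netdev_upper_dev_link(): needs a registered device. */
int toy_link(struct toy_dev *dev)
{
	if (!dev->registered)
		return -ENOENT;	/* no kobject to create the symlink under */
	dev->linked = 1;
	return 0;
}

/* Order before the patch: link first, then register. */
int newlink_old_order(struct toy_dev *dev)
{
	int err = toy_link(dev);

	if (err)
		return err;
	return toy_register(dev);
}

/* Order after the patch: register first, then link. */
int newlink_new_order(struct toy_dev *dev)
{
	int err = toy_register(dev);

	if (err < 0)
		return err;
	return toy_link(dev);
}
```

In this model the old order fails with -ENOENT on a fresh device, while the new order succeeds. Error unwinding and the port reference count from the real patch are left out.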
* [PATCH net] netpoll: linearize skb before accessing its data
From: Antonio Quartulli @ 2013-10-21 21:31 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Antonio Quartulli
__netpoll_rx() assumes that the data buffer of the received
skb is linear and then passes it to rx_hook().
However this is not true because the skb has not been
linearized yet.
This can cause rx_hook() to access non allocated memory
while parsing the received data.
Fix __netpoll_rx() by explicitly linearising the skb.
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
I checked linux-3.0 and this bug seems to be already there. Please consider
queueing it for stable.
Regards,
net/core/netpoll.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index fc75c9e..97cff18 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -814,6 +814,9 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
if (pskb_trim_rcsum(skb, len))
goto out;
+ if (skb_linearize(skb))
+ goto out;
+
iph = (struct iphdr *)skb->data;
if (iph->protocol != IPPROTO_UDP)
goto out;
@@ -855,6 +858,8 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
goto out;
if (pskb_trim_rcsum(skb, len + sizeof(struct ipv6hdr)))
goto out;
+ if (skb_linearize(skb))
+ goto out;
ip6h = ipv6_hdr(skb);
if (!pskb_may_pull(skb, sizeof(struct udphdr)))
goto out;
--
1.8.4
* Re: [net-next PATCH] macvlan: resolve ENOENT errors on creation
From: Veaceslav Falico @ 2013-10-21 21:34 UTC (permalink / raw)
To: John Fastabend; +Cc: nhorman, netdev
In-Reply-To: <20131021212801.19330.69659.stgit@nitbit.x32>
On Mon, Oct 21, 2013 at 02:28:02PM -0700, John Fastabend wrote:
>After the commit below attempting to create macvlan devices was
>resulting in ENOENT errors,
>
># ip link add link p3p2 type macvlan
>RTNETLINK answers: Invalid argument
>
>This happens because netdev_upper_dev_link() is called before
>register_netdevice() in the macvlan code. Through a call chain
>this results in a call to __netdev_adjacent_dev_insert() and
>finally a sysfs_create_link(). This requires the kobject of
>the macvlan to be registered which is done in register_netdevice().
>If there is no kobject which is the case here the ENOENT error
>is seen on the command line.
>
>To resolve this move the netdev_upper_dev_link() call below
>the register_netdevice() call. This aligns with vlan driver
>flow.
Yep, changed the vlan code, but didn't see the macvlan. My cscope didn't
catch it for some reason :-/.
I've also checked - there are no users except bonding, vlan (both are ok),
and macvlan.
Acked-by: Veaceslav Falico <vfalico@redhat.com>
>
>Regression introduced here,
>
>commit 5831d66e8097aedfa3bc35941cf265ada2352317
>Author: Veaceslav Falico <vfalico@redhat.com>
>Date: Wed Sep 25 09:20:32 2013 +0200
>
> net: create sysfs symlinks for neighbour devices
>
>CC: Veaceslav Falico <vfalico@redhat.com>
>CC: Neil Horman <nhorman@tuxdriver.com>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
> drivers/net/macvlan.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
>index 9bf46bd..cc9845e 100644
>--- a/drivers/net/macvlan.c
>+++ b/drivers/net/macvlan.c
>@@ -828,22 +828,21 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
> eth_hw_addr_inherit(dev, lowerdev);
> }
>
>+ port->count += 1;
>+ err = register_netdevice(dev);
>+ if (err < 0)
>+ goto destroy_port;
>+
> err = netdev_upper_dev_link(lowerdev, dev);
> if (err)
> goto destroy_port;
>
>- port->count += 1;
>- err = register_netdevice(dev);
>- if (err < 0)
>- goto upper_dev_unlink;
>
> list_add_tail_rcu(&vlan->list, &port->vlans);
> netif_stacked_transfer_operstate(lowerdev, dev);
>
> return 0;
>
>-upper_dev_unlink:
>- netdev_upper_dev_unlink(lowerdev, dev);
> destroy_port:
> port->count -= 1;
> if (!port->count)
>
* Re: [net-next PATCH] macvlan: resolve ENOENT errors on creation
From: John Fastabend @ 2013-10-21 21:48 UTC (permalink / raw)
To: Veaceslav Falico; +Cc: nhorman, netdev
In-Reply-To: <20131021213441.GB18170@redhat.com>
On 10/21/2013 02:34 PM, Veaceslav Falico wrote:
> On Mon, Oct 21, 2013 at 02:28:02PM -0700, John Fastabend wrote:
>> After the commit below attempting to create macvlan devices was
>> resulting in ENOENT errors,
>>
>> # ip link add link p3p2 type macvlan
>> RTNETLINK answers: Invalid argument
>>
>> This happens because netdev_upper_dev_link() is called before
>> register_netdevice() in the macvlan code. Through a call chain
>> this results in a call to __netdev_adjacent_dev_insert() and
>> finally a sysfs_create_link(). This requires the kobject of
>> the macvlan to be registered which is done in register_netdevice().
>> If there is no kobject which is the case here the ENOENT error
>> is seen on the command line.
>>
>> To resolve this move the netdev_upper_dev_link() call below
>> the register_netdevice() call. This aligns with vlan driver
>> flow.
>
> Yep, changed the vlan code, but didn't see the macvlan. My cscope didn't
> catch it for some reason :-/.
>
> I've also checked - there are no users except bonding, vlan (both are ok),
> and macvlan.
>
The openvswitch code uses netdev_master_upper_dev_link() which
eventually calls __netdev_adjacent_dev_insert() as well. But from
a quick code inspection I think it should work. Anyways that is one
other user.
.John
--
John Fastabend Intel Corporation
* Re: [net-next PATCH] macvlan: resolve ENOENT errors on creation
From: Neil Horman @ 2013-10-21 21:54 UTC (permalink / raw)
To: John Fastabend; +Cc: vfalico, netdev
In-Reply-To: <20131021212801.19330.69659.stgit@nitbit.x32>
On Mon, Oct 21, 2013 at 02:28:02PM -0700, John Fastabend wrote:
> After the commit below attempting to create macvlan devices was
> resulting in ENOENT errors,
>
> # ip link add link p3p2 type macvlan
> RTNETLINK answers: Invalid argument
>
> This happens because netdev_upper_dev_link() is called before
> register_netdevice() in the macvlan code. Through a call chain
> this results in a call to __netdev_adjacent_dev_insert() and
> finally a sysfs_create_link(). This requires the kobject of
> the macvlan to be registered which is done in register_netdevice().
> If there is no kobject which is the case here the ENOENT error
> is seen on the command line.
>
> To resolve this move the netdev_upper_dev_link() call below
> the register_netdevice() call. This aligns with vlan driver
> flow.
>
> Regression introduced here,
>
> commit 5831d66e8097aedfa3bc35941cf265ada2352317
> Author: Veaceslav Falico <vfalico@redhat.com>
> Date: Wed Sep 25 09:20:32 2013 +0200
>
> net: create sysfs symlinks for neighbour devices
>
> CC: Veaceslav Falico <vfalico@redhat.com>
> CC: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
* [PATCH] net-bnx2x: Fix byte order problem on NVRAM writes
From: Nate Klein @ 2013-10-21 21:57 UTC (permalink / raw)
To: netdev; +Cc: eilong, nxk, linux-kernel
Tested:
ethtool -e eth0 raw on >first.nvram
ethtool -E eth0 <first.nvram
ethtool -e eth0 raw on >second.nvram
cmp first.nvram second.nvram || ethtool -E eth0 <second.nvram
(No output means pass.)
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 8213cc8..35671fb 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -1549,7 +1549,7 @@ static int bnx2x_nvram_write_dword(struct bnx2x *bp, u32 offset, u32 val,
REG_WR(bp, MCP_REG_MCPR_NVM_COMMAND, MCPR_NVM_COMMAND_DONE);
/* write the data */
- REG_WR(bp, MCP_REG_MCPR_NVM_WRITE, val);
+ REG_WR(bp, MCP_REG_MCPR_NVM_WRITE, cpu_to_be32(val));
/* address of the NVRAM to write to */
REG_WR(bp, MCP_REG_MCPR_NVM_ADDR,
--
1.8.4
* [PATCH stable] inet: fix possible memory corruption with UDP_CORK and UFO
From: Hannes Frederic Sowa @ 2013-10-21 22:07 UTC (permalink / raw)
To: netdev; +Cc: jiri, eric.dumazet, davem
This is a replacement patch only for stable which does fix the problems
handled by the following two commits in -net:
"ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9)
"ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b)
Three frames are written on a corked udp socket for which the output
netdevice has UFO enabled. If the first and third frame are smaller than
the mtu and the second one is bigger, we enqueue the second frame with
skb_append_datato_frags without initializing the gso fields. This leads
to the third frame being appended regularly, thus constructing an invalid skb.
This fixes the problem by always using skb_append_datato_frags as soon
as the first frag got enqueued to the skb without marking the packet
as SKB_GSO_UDP.
The problem with only two frames for ipv6 was fixed by "ipv6: udp
packets following an UFO enqueued packet need also be handled by UFO"
(2811ebac2521ceac84f2bdae402455baa6a7fb47).
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/linux/skbuff.h | 5 +++++
net/ipv4/ip_output.c | 2 +-
net/ipv6/ip6_output.c | 2 +-
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c2d8933..eee3d92 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1316,6 +1316,11 @@ static inline int skb_pagelen(const struct sk_buff *skb)
return len + skb_headlen(skb);
}
+static inline bool skb_has_frags(const struct sk_buff *skb)
+{
+ return skb_shinfo(skb)->nr_frags;
+}
+
/**
* __skb_fill_page_desc - initialise a paged fragment in an skb
* @skb: buffer containing fragment to be initialised
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 3982eab..13e617f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -841,7 +841,7 @@ static int __ip_append_data(struct sock *sk,
csummode = CHECKSUM_PARTIAL;
cork->length += length;
- if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+ if (((length > mtu) || (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
err = ip_ufo_append_data(sk, queue, getfrag, from, length,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 975624b..65f28be 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1230,7 +1230,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
skb = skb_peek_tail(&sk->sk_write_queue);
cork->length += length;
if (((length > mtu) ||
- (skb && skb_is_gso(skb))) &&
+ (skb && skb_has_frags(skb))) &&
(sk->sk_protocol == IPPROTO_UDP) &&
(rt->dst.dev->features & NETIF_F_UFO)) {
err = ip6_ufo_append_data(sk, getfrag, from, length,
--
1.8.3.1
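
The three-frame scenario from the description can be modelled in userspace roughly as below. This is a sketch under assumptions: struct toy_skb and both helper names are hypothetical, the old check mirrors skb_is_gso() as a test of gso_size, and the protocol and NETIF_F_UFO conditions from the real code are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy skb carrying only the two fields the checks look at. */
struct toy_skb {
	unsigned int nr_frags;	/* paged fragments already attached */
	unsigned int gso_size;	/* non-zero once the gso fields are set */
};

/* Condition before the patch: take the UFO path only for gso skbs. */
bool ufo_path_old(const struct toy_skb *tail, unsigned int length,
		  unsigned int mtu)
{
	return length > mtu || (tail && tail->gso_size != 0);
}

/* Condition after the patch: any skb with frags stays on the UFO path. */
bool ufo_path_new(const struct toy_skb *tail, unsigned int length,
		  unsigned int mtu)
{
	return length > mtu || (tail && tail->nr_frags != 0);
}
```

With a queued skb that has fragments but uninitialised gso fields (nr_frags = 1, gso_size = 0) and a third frame smaller than the MTU, the old condition falls back to the regular append path, while the new one keeps using the UFO append path.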
* Re: [net-next PATCH] macvlan: resolve ENOENT errors on creation
From: Veaceslav Falico @ 2013-10-21 22:05 UTC (permalink / raw)
To: John Fastabend; +Cc: nhorman, netdev
In-Reply-To: <5265A147.4010501@gmail.com>
On Mon, Oct 21, 2013 at 02:48:55PM -0700, John Fastabend wrote:
>On 10/21/2013 02:34 PM, Veaceslav Falico wrote:
>>On Mon, Oct 21, 2013 at 02:28:02PM -0700, John Fastabend wrote:
>>>After the commit below attempting to create macvlan devices was
>>>resulting in ENOENT errors,
>>>
>>># ip link add link p3p2 type macvlan
>>>RTNETLINK answers: Invalid argument
>>>
>>>This happens because netdev_upper_dev_link() is called before
>>>register_netdevice() in the macvlan code. Through a call chain
>>>this results in a call to __netdev_adjacent_dev_insert() and
>>>finally a sysfs_create_link(). This requires the kobject of
>>>the macvlan to be registered which is done in register_netdevice().
>>>If there is no kobject which is the case here the ENOENT error
>>>is seen on the command line.
>>>
>>>To resolve this move the netdev_upper_dev_link() call below
>>>the register_netdevice() call. This aligns with vlan driver
>>>flow.
>>
>>Yep, changed the vlan code, but didn't see the macvlan. My cscope didn't
>>catch it for some reason :-/.
>>
>>I've also checked - there are no users except bonding, vlan (both are ok),
>>and macvlan.
>>
>
>The openvswitch code uses netdev_master_upper_dev_link() which
>eventually calls __netdev_adjacent_dev_insert() as well. But from
>a quick code inspection I think it should work. Anyways that is one
>other user.
Yep, checked them also now:
team - links two already existing devices (team->dev and port_dev)
batadv - links existing device to another existing device, or creates it if
the latter doesn't exist (and calls register_netdev on creation)
bridge - links two already existing devices (bridge->dev and dev)
openvswitch - links two already existing devices (get_dpdev(vport->dp) and
the device found by dev_get_by_name()).
bonding - uses netdev_master_upper_dev_link_private(), and is also ok.
Hopefully we're safe.
>
>.John
>
>--
>John Fastabend Intel Corporation
* Re: [PATCH net] netpoll: linearize skb before accessing its data
From: David Miller @ 2013-10-21 22:23 UTC (permalink / raw)
To: antonio; +Cc: netdev
In-Reply-To: <1382391080-1607-1-git-send-email-antonio@meshcoding.com>
From: Antonio Quartulli <antonio@meshcoding.com>
Date: Mon, 21 Oct 2013 23:31:20 +0200
> __netpoll_rx() assumes that the data buffer of the received
> skb is linear and then passes it to rx_hook().
> However this is not true because the skb has not been
> linearized yet.
>
> This can cause rx_hook() to access non allocated memory
> while parsing the received data.
>
> Fix __netpoll_rx() by explicitly linearising the skb.
>
> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
It is rx_hook's obligation to access the SKB properly and not
assume that the SKB is linear. It is very expensive to
linearize every SKB just for the sake of improperly implemented
receive hooks.
In particular the rx hooks must make use of interfaces such
as pskb_may_pull(), just like every other protocol does
on packet input processing, to make sure the area they want
to access is in the linear area.
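
The access pattern described above can be sketched with a userspace toy model. Everything here is hypothetical: struct toy_skb, toy_may_pull() and toy_rx_hook() only imitate the idea behind pskb_may_pull() and are not the kernel API. The hook pulls just the bytes it needs into the linear area before reading them, instead of linearizing the whole buffer.

```c
#include <stddef.h>
#include <string.h>

/* Toy skb: a linear head plus one non-linear fragment. */
struct toy_skb {
	unsigned char head[64];		/* linear area */
	size_t headlen;			/* valid bytes in head */
	const unsigned char *frag;	/* remaining non-linear data */
	size_t fraglen;			/* bytes left in frag */
};

/*
 * Make sure the first len bytes are in the linear area, copying from
 * the fragment only if needed. Returns 1 on success, 0 on failure.
 */
int toy_may_pull(struct toy_skb *skb, size_t len)
{
	size_t need;

	if (len <= skb->headlen)
		return 1;
	need = len - skb->headlen;
	if (need > skb->fraglen || len > sizeof(skb->head))
		return 0;
	memcpy(skb->head + skb->headlen, skb->frag, need);
	skb->headlen += need;
	skb->frag += need;
	skb->fraglen -= need;
	return 1;
}

/* A well-behaved hook: check first, then read a 16-bit big-endian port. */
int toy_rx_hook(struct toy_skb *skb, unsigned int *port)
{
	if (!toy_may_pull(skb, 2))
		return -1;
	*port = ((unsigned int)skb->head[0] << 8) | skb->head[1];
	return 0;
}
```

Only the two bytes the hook reads are copied, which is the cost argument made above against linearizing every skb.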
* Re: [PATCH stable] inet: fix possible memory corruption with UDP_CORK and UFO
From: David Miller @ 2013-10-21 22:25 UTC (permalink / raw)
To: hannes; +Cc: netdev, jiri, eric.dumazet
In-Reply-To: <20131021220747.GC24158@order.stressinduktion.org>
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 22 Oct 2013 00:07:47 +0200
> This is a replacement patch only for stable which does fix the problems
> handled by the following two commits in -net:
>
> "ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9)
> "ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b)
>
> Three frames are written on a corked udp socket for which the output
> netdevice has UFO enabled. If the first and third frame are smaller than
> the mtu and the second one is bigger, we enqueue the second frame with
> skb_append_datato_frags without initializing the gso fields. This leads
> to the third frame being appended regularly, thus constructing an invalid skb.
>
> This fixes the problem by always using skb_append_datato_frags as soon
> as the first frag got enqueued to the skb without marking the packet
> as SKB_GSO_UDP.
>
> The problem with only two frames for ipv6 was fixed by "ipv6: udp
> packets following an UFO enqueued packet need also be handled by UFO"
> (2811ebac2521ceac84f2bdae402455baa6a7fb47).
>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Queued up for -stable, thanks Hannes.
* Re: [PATCH net] netpoll: linearize skb before accessing its data
From: Eric Dumazet @ 2013-10-21 22:25 UTC (permalink / raw)
To: Antonio Quartulli; +Cc: David S. Miller, netdev
In-Reply-To: <1382391080-1607-1-git-send-email-antonio@meshcoding.com>
On Mon, 2013-10-21 at 23:31 +0200, Antonio Quartulli wrote:
> __netpoll_rx() assumes that the data buffer of the received
> skb is linear and then passes it to rx_hook().
> However this is not true because the skb has not been
> linearized yet.
>
> This can cause rx_hook() to access non allocated memory
> while parsing the received data.
>
> Fix __netpoll_rx() by explicitly linearising the skb.
>
> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
> ---
>
> I checked linux-3.0 and this bug seems to be already there. Please consider
> queueing it for stable.
>
>
> Regards,
>
>
>
> net/core/netpoll.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index fc75c9e..97cff18 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -814,6 +814,9 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
> if (pskb_trim_rcsum(skb, len))
> goto out;
>
> + if (skb_linearize(skb))
> + goto out;
> +
> iph = (struct iphdr *)skb->data;
> if (iph->protocol != IPPROTO_UDP)
> goto out;
> @@ -855,6 +858,8 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
> goto out;
> if (pskb_trim_rcsum(skb, len + sizeof(struct ipv6hdr)))
> goto out;
> + if (skb_linearize(skb))
> + goto out;
> ip6h = ipv6_hdr(skb);
> if (!pskb_may_pull(skb, sizeof(struct udphdr)))
> goto out;
Well, if you linearize the skb, no need for pskb_may_pull(),
and it would be better to do it once at the beginning...
Anyway, I see nothing that sets rx_hook, what am I missing?
# git grep -n rx_hook
include/linux/netpoll.h:27: void (*rx_hook)(struct netpoll *, int, char *, int);
include/linux/netpoll.h:44: struct list_head rx_np; /* netpolls that registered an rx_hook */
net/core/netpoll.c:639: /* If there are several rx_hooks for the same address,
net/core/netpoll.c:722: /* If there are several rx_hooks for the same address,
net/core/netpoll.c:837: np->rx_hook(np, ntohs(uh->source),
net/core/netpoll.c:875: np->rx_hook(np, ntohs(uh->source),
net/core/netpoll.c:1065: if (np->rx_hook) {
* Re: [PATCH net v2 0/9] bnx2x: Bug fixes patch series
From: David Miller @ 2013-10-21 22:32 UTC (permalink / raw)
To: yuvalmin; +Cc: netdev, ariele, eilong
In-Reply-To: <1382280694-8428-1-git-send-email-yuvalmin@broadcom.com>
From: "Yuval Mintz" <yuvalmin@broadcom.com>
Date: Sun, 20 Oct 2013 16:51:25 +0200
> This patch series contains fixes for various flows - several SR-IOV issues
> are fixed, ethtool callbacks (coalescing and register dump) are corrected,
> null pointer dereference on error flows is prevented, etc.
>
> Changes from V1
> ---------------
> - Patch 2 "bnx2x: Prevent an illegal pointer dereference during panic"
> is revised, with improved handling of edge cases.
Series applied, thanks for strengthening the address validation in
patch #2.
* Re: [PATCH net] netpoll: linearize skb before accessing its data
From: David Miller @ 2013-10-21 22:33 UTC (permalink / raw)
To: eric.dumazet; +Cc: antonio, netdev
In-Reply-To: <1382394336.3284.92.camel@edumazet-glaptop.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 21 Oct 2013 15:25:36 -0700
> Anyway, I see nothing that sets rx_hook, what am I missing?
Only out of tree code makes use of this facility, and it's been
this way since the facility was introduced.
Yes, I'm disappointed and unhappy about this too.
* Re: [PATCH net 0/3] ipv6: use rt6i_gateway as nexthop
From: David Miller @ 2013-10-21 22:40 UTC (permalink / raw)
To: ja; +Cc: netdev, netfilter-devel, lvs-devel, yoshfuji
In-Reply-To: <1382272985-1528-1-git-send-email-ja@ssi.bg>
From: Julian Anastasov <ja@ssi.bg>
Date: Sun, 20 Oct 2013 15:43:02 +0300
> I see the following two alternatives for applying these
> patches:
>
> 1. Linger patch 2 in net-next to avoid surprises in the upcoming
> release. In this case patch 3 can be reworked not to depend on
> the new rt6_nexthop() definition in patch 2. I guess this is a
> better option, so that patch 2 can be reviewed and tested for
> longer time.
>
> 2. Include all 3 patches in net tree - more risky because this
> is my first attempt to change IPv6.
I have decided to merge all three patches into -net right now.
I've reviewed these patches several times and they look good
to me.
I'll let them cook upstream for at least a week before submitting them
to -stable to let any last minute errors show themselves and
subsequently get resolved.
Thanks!
* Re: [PATCH net 2/3] ipv6: fill rt6i_gateway with nexthop address
From: David Miller @ 2013-10-21 22:42 UTC (permalink / raw)
To: ja; +Cc: hannes, netdev, netfilter-devel, lvs-devel, yoshfuji
In-Reply-To: <alpine.LFD.2.03.1310211009270.1522@ssi.bg>
From: Julian Anastasov <ja@ssi.bg>
Date: Mon, 21 Oct 2013 10:31:06 +0300 (EEST)
> Thanks for the review! I don't mind removing rt6_nexthop
> either. For me it is 51% against 49% to keep it,
> as it denotes the places that use nexthop and not gateway.
> Maybe more opinions will help to decide because I don't know
> if there are any plans to use similar techniques as done for IPv4.
I have no strong opinion about removing rt6_nexthop.
If it is of low cost today and makes the code easier to
understand and grep, just leave it alone.
* Re: [PATCH 0/6] ipv4: tcp_memcontrol and userns sysctls
From: David Miller @ 2013-10-21 22:44 UTC (permalink / raw)
To: ebiederm-aS9lmoZGLiVWk0Htik3J/w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87r4bghml4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
Date: Sat, 19 Oct 2013 16:23:19 -0700
> While looking into allowing the ipv4 sysctls to be used in a network
> namespace I stumbled upon the mess that is tcp_memcontrol.
>
> I remove the dead code, broken code, and excessive abstraction in the
> tcp_memcontrols, then I clean up the per-net ipv4 sysctls and allow
> them in the user namespace.
I wish we hadn't installed this stuff in the first place, but better
to take care of it now than later.
Series applied, thanks Eric.