Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 2/2] net/ethernet: cpsw: DT read bool dual_emac
From: Mugunthan V N @ 2013-10-04 13:28 UTC (permalink / raw)
  To: Markus Pargmann, David S. Miller
  Cc: Florian Fainelli, linux-arm-kernel, netdev, kernel
In-Reply-To: <1380890680-30941-2-git-send-email-mpa@pengutronix.de>

On Friday 04 October 2013 07:44 AM, Markus Pargmann wrote:
> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>

Regards
Mugunthan V N

^ permalink raw reply

* Re: [PATCHv2] IPv6: Allow the MTU of ipip6 tunnel to be set below 1280
From: Hannes Frederic Sowa @ 2013-10-04 13:32 UTC (permalink / raw)
  To: Oussama Ghorbel
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1380808166-13280-1-git-send-email-ou.ghorbel@gmail.com>

On Thu, Oct 03, 2013 at 02:49:26PM +0100, Oussama Ghorbel wrote:
> The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
> However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
> More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530
> 
> This patch allows to check the minimum MTU for ipv6 tunnel according to these rules:
> -In case the tunnel is configured with ipip6 mode the minimum MTU is 68.
> -In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280.
> 
> Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
> ---
>  net/ipv6/ip6_tunnel.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index 46ba243..4b51b03 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -1429,9 +1429,17 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
>  static int
>  ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
>  {
> -	if (new_mtu < IPV6_MIN_MTU) {
> -		return -EINVAL;
> +	struct ip6_tnl *tnl = netdev_priv(dev);
> +
> +	if (tnl->parms.proto == IPPROTO_IPIP) {
> +		if (new_mtu < 68)
> +			return -EINVAL;
> +	} else {
> +		if (new_mtu < IPV6_MIN_MTU)
> +			return -EINVAL;
>  	}
> +	if (new_mtu > 0xFFF8 - dev->hard_header_len)
> +		return -EINVAL;
>  	dev->mtu = new_mtu;
>  	return 0;
>  }

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Anytime soon we should replace all the FFF8 with a symbolic constant.

Thanks,

  Hannes

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Eric Dumazet @ 2013-10-04 13:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Heiko Carstens, Paul Mackerras, H. Peter Anvin, sparclinux,
	Nicolas Dichtel, Alexei Starovoitov, linux-s390, Russell King,
	x86, James Morris, Ingo Molnar, Alexey Kuznetsov,
	Paul E. McKenney, Xi Wang, Matt Evans, Thomas Gleixner,
	linux-arm-kernel, Stelian Nirlu, Nicolas Schichan,
	Hideaki YOSHIFUJI, netdev, linux-kernel, David S. Miller,
	Mircea Gherzan, Daniel Borkmann <
In-Reply-To: <20131004075133.GA12313@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1306 bytes --]

1)
>
> I took a brief look at arch/x86/net/bpf_jit_comp.c while reviewing this
> patch.
>
> You need to split up bpf_jit_compile(), it's an obscenely large, ~600
> lines long function. We don't do that in modern, maintainable kernel code.
>
> 2)
>
> This 128 bytes extra padding:
>
>         /* Most of BPF filters are really small,
>          * but if some of them fill a page, allow at least
>          * 128 extra bytes to insert a random section of int3
>          */
>         sz = round_up(proglen + sizeof(*header) + 128, PAGE_SIZE);
>
> why is it done? It's not clear to me from the comment.
>

commit 314beb9bcabfd6b4542ccbced2402af2c6f6142a
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri May 17 16:37:03 2013 +0000

    x86: bpf_jit_comp: secure bpf jit against spraying attacks

    hpa bringed into my attention some security related issues
    with BPF JIT on x86.

    This patch makes sure the bpf generated code is marked read only,
    as other kernel text sections.

    It also splits the unused space (we vmalloc() and only use a fraction of
    the page) in two parts, so that the generated bpf code not starts at a
    known offset in the page, but a pseudo random one.

    Refs:

http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html

[-- Attachment #1.2: Type: text/html, Size: 2059 bytes --]

[-- Attachment #2: Type: text/plain, Size: 150 bytes --]

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* RE: [PATCH] xen-netback: Don't destroy the netdev until the vif is shut
From: Paul Durrant @ 2013-10-04 14:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, Wei Liu
In-Reply-To: <20131004.002248.789391156050342547.davem@davemloft.net>

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: 04 October 2013 05:23
> To: Paul Durrant
> Cc: netdev@vger.kernel.org; Wei Liu
> Subject: Re: [PATCH] xen-netback: Don't destroy the netdev until the vif is
> shut
> 
> 
> Can one of you do -stable backports of this change for v3.11 and any
> other active -stable branch where this fix is relevant?
> 

Ok. Will do.

  Paul

> I gave it a shot, and it got outside my realm of expertise.
> 
> Thanks!

^ permalink raw reply

* [PATCH] ipv4: fix ineffective source address selection
From: Jiri Benc @ 2013-10-04 15:04 UTC (permalink / raw)
  To: netdev

When sending out multicast messages, the source address in inet->mc_addr is
ignored and rewritten by an autoselected one. This is caused by a typo in
commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").

Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 net/ipv4/route.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 727f436..6011615 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2072,7 +2072,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 							      RT_SCOPE_LINK);
 			goto make_route;
 		}
-		if (fl4->saddr) {
+		if (!fl4->saddr) {
 			if (ipv4_is_multicast(fl4->daddr))
 				fl4->saddr = inet_select_addr(dev_out, 0,
 							      fl4->flowi4_scope);
-- 
1.7.6.5

^ permalink raw reply related

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Nicolas Dichtel @ 2013-10-04 15:21 UTC (permalink / raw)
  To: Masatake YAMATO, netdev
In-Reply-To: <1380859529-32351-1-git-send-email-yamato@redhat.com>

Le 04/10/2013 06:05, Masatake YAMATO a écrit :
> ip link has ability to show extra information of net work device if
> kernel provides sunh information. With this patch veth driver can
> provide its peer ifindex information to ip command via netlink
> interface.
But the ifindex can be interpreted only when you know the related netns and
veth peer is probably in another netns.
How to found the netdevice in userland in this case?

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: cpsw: Search childs for slave nodes
From: Peter Korsgaard @ 2013-10-04 15:24 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: David S. Miller, Mugunthan V N, netdev, kernel, Florian Fainelli,
	linux-arm-kernel
In-Reply-To: <1380890680-30941-1-git-send-email-mpa@pengutronix.de>

>>>>> "Markus" == Markus Pargmann <mpa@pengutronix.de> writes:

 Markus> The current implementation searches the whole DT for nodes named
 Markus> "slave".

 Markus> This patch changes it to search only child nodes for slaves.

 Markus> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>

Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

-- 
Bye, Peter Korsgaard

^ permalink raw reply

* Re: [PATCH 2/2] net/ethernet: cpsw: DT read bool dual_emac
From: Peter Korsgaard @ 2013-10-04 15:24 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: David S. Miller, Mugunthan V N, netdev, kernel, Florian Fainelli,
	linux-arm-kernel
In-Reply-To: <1380890680-30941-2-git-send-email-mpa@pengutronix.de>

>>>>> "Markus" == Markus Pargmann <mpa@pengutronix.de> writes:

 Markus> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>

Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

-- 
Bye, Peter Korsgaard

^ permalink raw reply

* Re: [PATCH] ipv4: fix ineffective source address selection
From: Eric Dumazet @ 2013-10-04 15:49 UTC (permalink / raw)
  To: Jiri Benc; +Cc: netdev
In-Reply-To: <245895a0777442b56ecea1453be041aa1b31c5a2.1380898983.git.jbenc@redhat.com>

On Fri, 2013-10-04 at 17:04 +0200, Jiri Benc wrote:
> When sending out multicast messages, the source address in inet->mc_addr is
> ignored and rewritten by an autoselected one. This is caused by a typo in
> commit 813b3b5db831 ("ipv4: Use caller's on-stack flowi as-is in output
> route lookups").
> 
> Signed-off-by: Jiri Benc <jbenc@redhat.com>
> ---

Nice catch !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [stable 3.0] add some CVE fixes
From: Greg KH @ 2013-10-04 16:03 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jiri Slaby, stable, netdev
In-Reply-To: <1380860287.1441.101.camel@deadeye.wl.decadent.org.uk>

On Fri, Oct 04, 2013 at 05:18:07AM +0100, Ben Hutchings wrote:
> On Thu, 2013-10-03 at 21:11 +0200, Jiri Slaby wrote:
> > On 10/03/2013 08:44 PM, Greg KH wrote:
> > > On Thu, Oct 03, 2013 at 11:20:28AM +0200, Jiri Slaby wrote:
> > >> Plus the backports that are replied to this mail?
> > > 
> > > I don't see any backports, did you forget to send them?
> > 
> > I don't think so, they were sent and this is a log of one of them:
> > OK. Log says:
> > Sendmail: /usr/sbin/sendmail -f jslaby@suse.cz -i stable@vger.kernel.org
> > jslaby@suse.cz
> > From: Jiri Slaby <jslaby@suse.cz>
> > To: <stable@vger.kernel.org>
> > Cc: jslaby@suse.cz
> > Subject: [PATCH 4/4] Tools: hv: verify origin of netlink connector message
> > Date: Thu,  3 Oct 2013 11:23:50 +0200
> > Message-Id: <1380792230-27255-4-git-send-email-jslaby@suse.cz>
> > X-Mailer: git-send-email 1.8.4
> > In-Reply-To: <1380792230-27255-1-git-send-email-jslaby@suse.cz>
> > References: <524D36DC.5070506@suse.cz>
> >  <1380792230-27255-1-git-send-email-jslaby@suse.cz>
> > 
> > Result: OK
> > 
> > Could you check your spam folder? Or I can bounce them directly to you?
> 
> They didn't hit the list either.  If they're applicable to 3.2 as well
> then could you send them this way?

I did not find these anywhere in any spam filter, and as they didn't hit
the list either, I'd blame an email server on your end somewhere.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH v2.42 1/5] odp: Allow VLAN actions after MPLS actions
From: Ben Pfaff @ 2013-10-04 16:21 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Ravi K, netdev-u79uwXL29TY76Z2rM5mHXA,
	Isaku Yamahata
In-Reply-To: <1380874200-8981-2-git-send-email-horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

On Fri, Oct 04, 2013 at 05:09:56PM +0900, Simon Horman wrote:
> From: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>
> 
> OpenFlow 1.1 and 1.2, and 1.3 differ in their handling of MPLS actions in the
> presence of VLAN tags. To allow correct behaviour to be committed in
> each situation, this patch adds a second round of VLAN tag action
> handling to commit_odp_actions(), which occurs after MPLS actions. This
> is implemented with a new field in 'struct xlate_in' called 'vlan_tci'.
> 
> When an push_mpls action is composed, the flow's current VLAN state is
> stored into xin->vlan_tci, and flow->vlan_tci is set to 0 (pop_vlan). If
> a VLAN tag is present, it is stripped; if not, then there is no change.
> Any later modifications to the VLAN state is written to xin->vlan_tci.
> When committing the actions, flow->vlan_tci is used before MPLS actions,
> and xin->vlan_tci is used afterwards. This retains the current datapath
> behaviour, but allows VLAN actions to be applied in a more flexible
> manner.
> 
> Both before and after this patch MPLS LSEs are pushed onto a packet after
> any VLAN tags that may be present. This is the behaviour described in
> OpenFlow 1.1 and 1.2. OpenFlow 1.3 specifies that MPLS LSEs should be
> pushed onto a packet before any VLAN tags that are present. Support
> for this will be added by a subsequent patch that makes use of
> the infrastructure added by this patch.
> 
> Signed-off-by: Joe Stringer <joe-Q1GJJQv1iO6lP80pJB477g@public.gmane.org>
> Signed-off-by: Simon Horman <horms-/R6kz+dDXgpPR4JQBCEnsQ@public.gmane.org>

I noticed a couple more minor points.

First, it seems to me that the "vlan_tci" member that this adds to
xlate_in could go in xlate_ctx just as well.  I would prefer that,
because (as far as I can tell) there is no reason for the client to use
any value other than flow->vlan_tci here, and putting it in xlate_ctx
hides it from the client.

Thanks for rearranging the code and updating the comment in
do_xlate_actions().  It makes the operation clearer.  But now that it's
clear I have an additional question.  Does it really make sense to have
'vlan_tci' as only a local variable in do_xlate_actions()?  Presumably,
MPLS and VLANs should interact the same way regardless of whether they
are separated by resubmits or goto_tables.  That is, I suspect that this
is xlate_ctx state, not local state.

Thanks,

Ben.

^ permalink raw reply

* [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
From: Paul Durrant @ 2013-10-04 16:26 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: Paul Durrant, Xi Xiong, Matt Wilson, Annie Li, Wei Liu,
	Ian Campbell

Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
xenvif_count_skb_slots() for skbs with a linear area spanning a page
boundary. The alignment of skb->data needs to be taken into account, not
just the head length. This patch fixes the issue by dry-running the code
from xenvif_gop_skb() (and adjusting the comment above the function to note
that).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Xi Xiong <xixiong@amazon.com>
Cc: Matt Wilson <msw@amazon.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>

---
 drivers/net/xen-netback/netback.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d0b0feb..6f680f4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -223,15 +223,28 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
 /*
  * Figure out how many ring slots we're going to need to send @skb to
  * the guest. This function is essentially a dry run of
- * xenvif_gop_frag_copy.
+ * xenvif_gop_skb.
  */
 unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
 {
+	unsigned char *data;
 	unsigned int count;
 	int i, copy_off;
 	struct skb_cb_overlay *sco;
 
-	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+	count = 0;
+
+	data = skb->data;
+	while (data < skb_tail_pointer(skb)) {
+		unsigned int offset = offset_in_page(data);
+		unsigned int len = PAGE_SIZE - offset;
+
+		if (data + len > skb_tail_pointer(skb))
+			len = skb_tail_pointer(skb) - data;
+
+		count++;
+		data += len;
+	}
 
 	copy_off = skb_headlen(skb) % PAGE_SIZE;
 
-- 
1.7.10.4

^ permalink raw reply related

* USB ethernet (smsc95xx) transmit timeout errors
From: David Laight @ 2013-10-04 16:28 UTC (permalink / raw)
  To: netdev

We are seeing some 'transmit timeout' errors from the smsc95xx driver.
I've found references to similar problems on the RPI, but no explicit
resolution.

Our systems are Intel i7 motherboards with the USB chip on a 'front panel'
board (together with three USB2 sockets) plugged into one on the
motherboard's USB3 sockets.
(The silicon has some kind of clever USB hub in it.)

Initial failures were with the 3.2 kernel from Ubuntu 12.04, but I've
updated one of the systems that failed to a 3.8 kernel and it still
fails.

Does this problem 'ring a bell' with anyone? and is it likely to have
been fixed in a more recent kernel?

Even with heavy traffic the systems might run for 24 hours before
failing - so proving a system is 'fixed' is difficult.

	David

^ permalink raw reply

* Re: [PATCH v2.42 2/5] ofp-actions: Add separate OpenFlow 1.3 action parser
From: Ben Pfaff @ 2013-10-04 16:31 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
	Joe Stringer
In-Reply-To: <1380874200-8981-3-git-send-email-horms@verge.net.au>

On Fri, Oct 04, 2013 at 05:09:57PM +0900, Simon Horman wrote:
> From: Joe Stringer <joe@wand.net.nz>
> 
> This patch adds new ofpact_from_openflow13() and
> ofpacts_from_openflow13() functions parallel to the existing ofpact
> handling code. In the OpenFlow 1.3 version, push_mpls is handled
> differently, but all other actions are handled by the existing code.
> 
> In the case of push_mpls for OpenFlow 1.3 the new mpls_before_vlan field of
> struct ofpact_push_mpls is set to true.  This will be used by a subsequent
> patch to allow allow the correct VLAN+MPLS datapath behaviour to be
> determined at odp translation time.
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Simon Horman <horms@verge.net.au>

The need for a huge comment on mpls_before_vlan suggests to me that
breaking it out as a separate type would be helpful.  How about this:

/* The position in the packet at which to insert an MPLS header.
 *
 * Used NXAST_PUSH_MPLS, OFPAT11_PUSH_MPLS. */
enum ofpact_mpls_position {
    /* Add the MPLS LSE after the Ethernet header but before any VLAN tags.
     * OpenFlow 1.3+ requires this behavior. */
    OFPACT_MPLS_BEFORE_VLAN,

    /* Add the MPLS LSE after the Ethernet header and any VLAN tags.
     * OpenFlow 1.1 and 1.2 require this behavior. */
    OFPACT_MPLS_AFTER_VLAN
};

/* OFPACT_PUSH_VLAN/MPLS/PBB
 *
 * Used for NXAST_PUSH_MPLS, OFPAT11_PUSH_MPLS. */
struct ofpact_push_mpls {
    struct ofpact ofpact;
    ovs_be16 ethertype;
    enum ofpact_mpls_position position;
};

Thanks,

Ben.

^ permalink raw reply

* Re: [PATCH v2.42 3/5] lib: Support pushing of MPLS LSE before or after VLAN tag
From: Ben Pfaff @ 2013-10-04 16:32 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
	Joe Stringer
In-Reply-To: <1380874200-8981-4-git-send-email-horms@verge.net.au>

On Fri, Oct 04, 2013 at 05:09:58PM +0900, Simon Horman wrote:
> From: Joe Stringer <joe@wand.net.nz>
> 
> This patch modifies the push_mpls behaviour to allow
> pushing of an MPLS LSE either before any VLAN tag that may be present.
> 
> Pushing the MPLS LSE before any VLAN tag that is present is the
> behaviour specified in OpenFlow 1.3.
> 
> Pushing the MPLS LSE after the any VLAN tag that is present is the
> behaviour specified in OpenFlow 1.1 and 1.2. This is the only behaviour
> that was supported prior to this patch.
> 
> When an push_mpls action has been inserted using OpenFlow 1.2 or earlier
> the behaviour of pushing the MPLS LSE before any VLAN tag that may be
> present is implemented by by inserting VLAN actions around the MPLS push
> action during odp translation; Pop VLAN tags before committing MPLS
> actions, and push the expected VLAN tag afterwards.
> 
> The trigger condition for the two different behaviours is the value of the
> mpls_before_vlan field of struct ofpact_push_mpls.  This field is set when
> parsing OpenFlow actions.
> 
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Simon Horman <horms@verge.net.au>

I'm happy with this, I think.  It will need a trivial update if you take
my suggestion on patch 2.

^ permalink raw reply

* Re: [PATCH 0/3] net: mv643xx_eth: various small fixes for v3.12
From: Ezequiel Garcia @ 2013-10-04 16:53 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: Jason Cooper, netdev, linux-kernel, Lennert Buytenhek,
	David Miller, linux-arm-kernel
In-Reply-To: <1380711442-24735-1-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:19PM +0200, Sebastian Hesselbarth wrote:
> This patch set comprises some one-liners to fix issues with repeated
> loading and unloading of a modular mv643xx_eth driver.
> 
> First two patches take care of the periodic port statistic timer, that
> updates statistics by reading port registers using add_timer/mod_timer.
> 
> Patch 1 moves timer re-schedule from mib_counters_update to the timer
> callback. As mib_counters_update is also called from non-timer context,
> this ensures the timer is reactivated from timer context only.
> 
> Patch 2 moves initial timer schedule from _probe() time to right before
> the port is actually started as the corresponding del_timer_sync is at
> _stop() time. This fixes a regression, where unloading the driver from a
> non-started eth device can cause the timer to access deallocated mem.
> 
> Patch 3 adds an assignment of the ports device_node to the corresponding
> self-created platform_device. This is required to allow fixups based on
> the device_node's compatible string later. Actually, it is also a potential
> regression because we already check compatible string for Kirkwood, but
> does not (yet) rely on the fixup.
> 
> All patches are based on v3.12-rc3 and have been tested on Kirkwood-based
> Seagate Dockstar.
> 
> Patches 1 and 2 can also possibly queued up for -stable.
> 
> Sebastian Hesselbarth (3):
>   net: mv643xx_eth: update statistics timer from timer context only
>   net: mv643xx_eth: fix orphaned statistics timer crash
>   net: mv643xx_eth: fix missing device_node for port devices
> 
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org

Sorry for missing this series, somehow my filters failed to catch it.

Anyway, I would expect these patches to have my Reported-by, but I'm
glad to see you've solved all the issues.

I see David has already applied them all, but I'll re-run my tests to
check there aren't any more issues if I can find some time next week.

Thanks for taking care of it!
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH net-next] dev: add support of flag IFF_NOPROC
From: David Miller @ 2013-10-04 17:29 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: stephen, netdev
In-Reply-To: <524EAF64.8000801@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Fri, 04 Oct 2013 14:07:00 +0200

> Of course optimizing /proc and /sysfs is a good option, but any
> optimizations
> will never be as fast as disabling them for some well known
> netdevices.
> 
> Note also that the memory consumption is significantly less with this
> flag:

It potentially breaks tools, it's a non-starter, sorry.

^ permalink raw reply

* [PATCH] tcp: do not forget FIN in tcp_shifted_skb()
From: Eric Dumazet @ 2013-10-04 17:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Neal Cardwell, Ilpo Järvinen, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

Yuchung found following problem :

 There are bugs in the SACK processing code, merging part in
 tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
 skbs FIN flag. When a receiver first SACK the FIN sequence, and later
 throw away ofo queue (e.g., sack-reneging), the sender will stop
 retransmitting the FIN flag, and hangs forever.

Following packetdrill test can be used to reproduce the bug.

$ cat sack-merge-bug.pkt
`sysctl -q net.ipv4.tcp_fack=0`

// Establish a connection and send 10 MSS.
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+.000 bind(3, ..., ...) = 0
+.000 listen(3, 1) = 0

+.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
+.001 < . 1:1(0) ack 1 win 1024
+.000 accept(3, ..., ...) = 4

+.100 write(4, ..., 12000) = 12000
+.000 shutdown(4, SHUT_WR) = 0
+.000 > . 1:10001(10000) ack 1
+.050 < . 1:1(0) ack 2001 win 257
+.000 > FP. 10001:12001(2000) ack 1
+.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
+.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
// SACK reneg
+.050 < . 1:1(0) ack 12001 win 257
+0 %{ print "unacked: ",tcpi_unacked }%
+5 %{ print "" }%


First, a typo inverted left/right of one OR operation, then
code forgot to advance end_seq if the merged skb carried FIN.

Bug was added in 2.6.29 by commit 832d11c5cd076ab
("tcp: Try to restore large SKBs while SACK processing")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 25a89ea..113dc5f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1284,7 +1284,10 @@ static bool tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 		tp->lost_cnt_hint -= tcp_skb_pcount(prev);
 	}
 
-	TCP_SKB_CB(skb)->tcp_flags |= TCP_SKB_CB(prev)->tcp_flags;
+	TCP_SKB_CB(prev)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
+	if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+		TCP_SKB_CB(prev)->end_seq++;
+
 	if (skb == tcp_highest_sack(sk))
 		tcp_advance_highest_sack(sk, skb);
 

^ permalink raw reply related

* Re: [PATCH 0/3] net: mv643xx_eth: various small fixes for v3.12
From: Sebastian Hesselbarth @ 2013-10-04 17:34 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Jason Cooper, netdev, linux-kernel, Lennert Buytenhek,
	David Miller, linux-arm-kernel
In-Reply-To: <20131004165354.GA9065@localhost>

On 10/04/2013 06:53 PM, Ezequiel Garcia wrote:
> On Wed, Oct 02, 2013 at 12:57:19PM +0200, Sebastian Hesselbarth wrote:
>> This patch set comprises some one-liners to fix issues with repeated
>> loading and unloading of a modular mv643xx_eth driver.
>>
>> First two patches take care of the periodic port statistic timer, that
>> updates statistics by reading port registers using add_timer/mod_timer.
>>
>> Patch 1 moves timer re-schedule from mib_counters_update to the timer
>> callback. As mib_counters_update is also called from non-timer context,
>> this ensures the timer is reactivated from timer context only.
>>
>> Patch 2 moves initial timer schedule from _probe() time to right before
>> the port is actually started as the corresponding del_timer_sync is at
>> _stop() time. This fixes a regression, where unloading the driver from a
>> non-started eth device can cause the timer to access deallocated mem.
>>
>> Patch 3 adds an assignment of the ports device_node to the corresponding
>> self-created platform_device. This is required to allow fixups based on
>> the device_node's compatible string later. Actually, it is also a potential
>> regression because we already check compatible string for Kirkwood, but
>> does not (yet) rely on the fixup.
>>
>> All patches are based on v3.12-rc3 and have been tested on Kirkwood-based
>> Seagate Dockstar.
>>
>> Patches 1 and 2 can also possibly queued up for -stable.
>>
>> Sebastian Hesselbarth (3):
>>    net: mv643xx_eth: update statistics timer from timer context only
>>    net: mv643xx_eth: fix orphaned statistics timer crash
>>    net: mv643xx_eth: fix missing device_node for port devices
>>
>>   drivers/net/ethernet/marvell/mv643xx_eth.c |    7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> ---
>> Cc: David Miller <davem@davemloft.net>
>> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
>> Cc: Jason Cooper <jason@lakedaemon.net>
>> Cc: netdev@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>
> Sorry for missing this series, somehow my filters failed to catch it.
>
> Anyway, I would expect these patches to have my Reported-by, but I'm
> glad to see you've solved all the issues.

True, thanks to Ezequiel for initially reporting the issues. I will
have to be more careful about the Reported-by next time.

> I see David has already applied them all, but I'll re-run my tests to
> check there aren't any more issues if I can find some time next week.
>
> Thanks for taking care of it!

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Ingo Molnar @ 2013-10-04 17:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Benjamin Herrenschmidt, Heiko Carstens, Paul Mackerras,
	H. Peter Anvin, sparclinux, Nicolas Dichtel, Alexei Starovoitov,
	linux-s390, Russell King, x86, James Morris, Ingo Molnar,
	Alexey Kuznetsov, Paul E. McKenney, Xi Wang, Matt Evans,
	Thomas Gleixner, linux-arm-kernel, Stelian Nirlu,
	Nicolas Schichan, Hideaki YOSHIFUJI, netdev, linux-kernel,
	David S. Miller, Mi
In-Reply-To: <CANn89iKkbR0_HaofvC_OVvqRv_Hqj3rATx-Z_4xXeusOasa56g@mail.gmail.com>


* Eric Dumazet <edumazet@google.com> wrote:

> 1)
> >
> > I took a brief look at arch/x86/net/bpf_jit_comp.c while reviewing this
> > patch.
> >
> > You need to split up bpf_jit_compile(), it's an obscenely large, ~600
> > lines long function. We don't do that in modern, maintainable kernel code.
> >
> > 2)
> >
> > This 128 bytes extra padding:
> >
> >         /* Most of BPF filters are really small,
> >          * but if some of them fill a page, allow at least
> >          * 128 extra bytes to insert a random section of int3
> >          */
> >         sz = round_up(proglen + sizeof(*header) + 128, PAGE_SIZE);
> >
> > why is it done? It's not clear to me from the comment.
> >
> 
> commit 314beb9bcabfd6b4542ccbced2402af2c6f6142a
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri May 17 16:37:03 2013 +0000
> 
>     x86: bpf_jit_comp: secure bpf jit against spraying attacks
> 
>     hpa bringed into my attention some security related issues
>     with BPF JIT on x86.
> 
>     This patch makes sure the bpf generated code is marked read only,
>     as other kernel text sections.
> 
>     It also splits the unused space (we vmalloc() and only use a fraction of
>     the page) in two parts, so that the generated bpf code not starts at a
>     known offset in the page, but a pseudo random one.

Thanks for the explanation - that makes sense.

	Ingo

^ permalink raw reply

* Let the moment last as much as you want.
From: JamesEPolk @ 2013-10-04 17:40 UTC (permalink / raw)



		Dear Customer,

Sometimes things happen much quicker than you would love to.

Let the moment last as much as you want.

http://translate.googleusercontent.com/translate_c?depth=1&hl=auto&sl=de&url=www.google.cn&u=http://onlineshop63.yolasite.com/store&usg=ALkJrhjuuBC9SIdn3nj6vfQ7-8vR_Bt35g

Best Regards, 
		

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04 17:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Benjamin Herrenschmidt, Heiko Carstens, Eric Dumazet,
	Paul Mackerras, H. Peter Anvin, sparclinux, Nicolas Dichtel,
	linux-s390, Russell King, x86, James Morris, Ingo Molnar,
	Alexey Kuznetsov, Paul E. McKenney, Xi Wang, Matt Evans,
	Thomas Gleixner, linux-arm-kernel, Stelian Nirlu,
	Nicolas Schichan, Hideaki YOSHIFUJI, netdev, linux-kernel,
	David S. Miller, Mircea 
In-Reply-To: <20131004075133.GA12313@gmail.com>

On Fri, Oct 4, 2013 at 12:51 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> +static void bpf_jit_free_deferred(struct work_struct *work)
>> +{
>> +     struct sk_filter *fp = container_of((void *)work, struct sk_filter,
>> +                                         insns);
>> +     unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
>> +     struct bpf_binary_header *header = (void *)addr;
>> +
>> +     set_memory_rw(addr, header->pages);
>> +     module_free(NULL, header);
>> +     kfree(fp);
>> +}
>
> Using the data type suggestions I make further below, this could be
> written in a simpler form, as:
>
>         struct sk_filter *fp = container_of(work, struct sk_filter, work);

yes. I've made it already as part of V4

> Also, a question, why do you mask with PAGE_MASK here:
>
>         unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
>
> ?
>
> AFAICS bpf_func is the module_alloc() result - and module code is page
> aligned. So ->bpf_func is always page aligned here. (The sk_run_filter
> special case cannot happen here.)

randomization of bpf_func start is a prevention of jit spraying
attacks as Eric pointed out.

>>  void bpf_jit_free(struct sk_filter *fp)
>>  {
>>       if (fp->bpf_func != sk_run_filter) {
>> -             unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
>> -             struct bpf_binary_header *header = (void *)addr;
>> -
>> -             set_memory_rw(addr, header->pages);
>> -             module_free(NULL, header);
>> +             struct work_struct *work = (struct work_struct *)fp->insns;
>> +             INIT_WORK(work, bpf_jit_free_deferred);
>
> Missing newline between local variables and statements.

yes. noted and fixed in V4.

>>       unsigned int            (*bpf_func)(const struct sk_buff *skb,
>>                                           const struct sock_filter *filter);
>> -     struct rcu_head         rcu;
>> +     /* insns start right after bpf_func, so that sk_run_filter() fetches
>> +      * first insn from the same cache line that was used to call into
>> +      * sk_run_filter()
>> +      */
>>       struct sock_filter      insns[0];
>
> Please use the customary (multi-line) comment style:
>
>   /*
>    * Comment .....
>    * ...... goes here.
>    */
>
> specified in Documentation/CodingStyle.

I believe filter.h belongs to networking comment style, that's why
checkpatch didn't complain.
But I removed the comment in V4

>>  static inline unsigned int sk_filter_len(const struct sk_filter *fp)
>>  {
>> -     return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
>> +     return max(fp->len * sizeof(struct sock_filter),
>> +                sizeof(struct work_struct)) + sizeof(*fp);
>
> So, "sizeof(struct work_struct)) + sizeof(*fp)" is a pattern that repeats
> a couple of times. Might make sense to stick that into a helper of its own
> - but in general this open coded overlay allocation method looks a bit
> fragile and not very obvious in isolation.
>
> So it could be done a bit cleaner, using an anonymous union:
>
>         /*
>          * These two overlay, the work struct is used during workqueue
>          * driven teardown, when the instructions are not used anymore:
>          */
>         union {
>                 struct sock_filter      insns[0];
>                 struct work_struct      work;
>         };
>
> And then all the sizeof() calculations become obvious and sk_filter_len()
> could be eliminated - I've marked the conversions in the code further
> below.

Eric made exactly the same comment. Agreed and fixed in V4

>>  #else
>> +#include <linux/slab.h>
>
> Inlines in the middle of header files are generally frowned upon.
>
> The standard pattern is to put them at the top, that way it's easier to
> see the dependencies and there's also less .config dependent inclusion,
> which makes header hell cleanup work easier.

Agree. I only followed the style that is already in filter.h 20 lines above.
#ifdef CONFIG_BPF_JIT
#include <stdarg.h>
#include <linux/linkage.h>
#include <linux/printk.h>

as part of the cleanup can move all of them to the top. In the separate commit?

>>       struct sk_filter *fp;
>>       unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
>> +     unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
>> +             + sizeof(*fp);
>
> Using the structure definition I suggested, this could be replaced with
> the more obvious:
>
>         unsigned int sk_fsize = max(fsize, sizeof(*fp));

with helper function it's even cleaner. Fixed in V4

> A couple of questions/suggestions:
>
> 1)
>
> I took a brief look at arch/x86/net/bpf_jit_comp.c while reviewing this
> patch.
>
> You need to split up bpf_jit_compile(), it's an obscenely large, ~600
> lines long function. We don't do that in modern, maintainable kernel code.

I had the same thought, therefore in my proposed generalization of bpf:
http://patchwork.ozlabs.org/patch/279280/
It is split into two. do_jit() is still a bit large at 400 lines. Can
split it further.

> 3)
>
> It's nice code altogether! Are there any plans to generalize its
> interfaces, to allow arbitrary bytecode to be used by other kernel
> subsystems as well? In particular tracing filters could make use of it,
> but it would also allow safe probe points.

That was exactly the reasons to generalize bpf as I proposed.
"extended BPF is a set of pseudo instructions that stitch kernel provided
data in the form of bpf_context with kernel provided set of functions in a safe
and deterministic way with minimal performance overhead vs native code"
Not sure what 'tracing filters' you have in mind, but I'm happy to try
them with extended bpf model.
imo existing bpf with two registers is too limiting for performance
and argument passing.
That's why going to 10 made it a lot more flexible and generically usable.
Sorry to hijack the thread.

Thanks
Alexei

^ permalink raw reply

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Stephen Hemminger @ 2013-10-04 17:55 UTC (permalink / raw)
  To: Masatake YAMATO; +Cc: netdev
In-Reply-To: <1380859529-32351-1-git-send-email-yamato@redhat.com>

On Fri,  4 Oct 2013 13:05:29 +0900
Masatake YAMATO <yamato@redhat.com> wrote:

> ip link has ability to show extra information of net work device if
> kernel provides sunh information. With this patch veth driver can
> provide its peer ifindex information to ip command via netlink
> interface.
> 
> Signed-off-by: Masatake YAMATO <yamato@redhat.com>
> ---

netlink API's are supposed to be symmetrical.

When creating veth, the VETH_INFO_PEER attribute is struct(ifinfomsg).
The fill_info should tack on the same data.

^ permalink raw reply

* Re: [PATCH 00/33] Netfilter updates for net-next
From: David Miller @ 2013-10-04 17:59 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1380875598-5250-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri,  4 Oct 2013 10:32:45 +0200

> The following patchset contains Netfilter updates for your net-next tree,
> mostly ipset improvements and enhancements features, they are:
 ...

Pulled, thanks a lot Pablo.

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Ingo Molnar @ 2013-10-04 18:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Benjamin Herrenschmidt, Heiko Carstens, Eric Dumazet,
	Paul Mackerras, H. Peter Anvin, sparclinux, Nicolas Dichtel,
	linux-s390, Russell King, x86, James Morris, Ingo Molnar,
	Alexey Kuznetsov, Paul E. McKenney, Xi Wang, Matt Evans,
	Thomas Gleixner, linux-arm-kernel, Stelian Nirlu,
	Nicolas Schichan, Hideaki YOSHIFUJI, netdev, linux-kernel,
	David S. Miller, Mircea 
In-Reply-To: <CAMEtUuy0mpdUHJjztK31mHjNVahnoZ8CHo2sLLyJmt=p5+gq0w@mail.gmail.com>


* Alexei Starovoitov <ast@plumgrid.com> wrote:

> >>  #else
> >> +#include <linux/slab.h>
> >
> > Inlines in the middle of header files are generally frowned upon.
> >
> > The standard pattern is to put them at the top, that way it's easier to
> > see the dependencies and there's also less .config dependent inclusion,
> > which makes header hell cleanup work easier.
> 
> Agree. I only followed the style that is already in filter.h 20 lines above.
>
> #ifdef CONFIG_BPF_JIT
> #include <stdarg.h>
> #include <linux/linkage.h>
> #include <linux/printk.h>
> 
> as part of the cleanup can move all of them to the top. In the separate commit?

Yeah, sure, that's fine.

> >>       struct sk_filter *fp;
> >>       unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
> >> +     unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
> >> +             + sizeof(*fp);
> >
> > Using the structure definition I suggested, this could be replaced with
> > the more obvious:
> >
> >         unsigned int sk_fsize = max(fsize, sizeof(*fp));
> 
> with helper function it's even cleaner. Fixed in V4

So my thought was that the helper function is perhaps too trivial and 
somewhat obscures the allocation pattern, but yeah. Either way is fine 
with me.

> > A couple of questions/suggestions:
> >
> > 1)
> >
> > I took a brief look at arch/x86/net/bpf_jit_comp.c while reviewing this
> > patch.
> >
> > You need to split up bpf_jit_compile(), it's an obscenely large, ~600
> > lines long function. We don't do that in modern, maintainable kernel code.
> 
> I had the same thought, therefore in my proposed generalization of bpf:
> http://patchwork.ozlabs.org/patch/279280/
> It is split into two. do_jit() is still a bit large at 400 lines. Can
> split it further.

Yeah, I think as long as you split out the loop iterator into a separate 
function it gets much better.

The actual instruction generation code within the iterator looks good in a 
single chunk - splitting it up further than that might in fact make it 
less readable.

> > 3)
> >
> > It's nice code altogether! Are there any plans to generalize its 
> > interfaces, to allow arbitrary bytecode to be used by other kernel 
> > subsystems as well? In particular tracing filters could make use of 
> > it, but it would also allow safe probe points.
> 
> That was exactly the reasons to generalize bpf as I proposed.

Ok, cool :-)

For the x86 bits:

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox