Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Nested GRE locking bug
From: Eric Dumazet @ 2010-10-25 20:08 UTC (permalink / raw)
  To: David Miller; +Cc: ben, netdev, beatrice.barbe, 599816
In-Reply-To: <20101025.125347.28807251.davem@davemloft.net>

Le lundi 25 octobre 2010 à 12:53 -0700, David Miller a écrit :

> I'll commit the following to upstream, and submit a combined
> patch to -stable.
> 
> --------------------
> net: Increase xmit RECURSION_LIMIT to 10.
> 
> Three is definitely too low, and we know from reports that GRE tunnels
> stacked as deeply as 37 levels cause stack overflows, so pick some
> reasonable value between those two.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/core/dev.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 78b5a89..2c7da3a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2213,7 +2213,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  }
>  
>  static DEFINE_PER_CPU(int, xmit_recursion);
> -#define RECURSION_LIMIT 3
> +#define RECURSION_LIMIT 10
>  
>  /**
>   *	dev_queue_xmit - transmit a buffer


Perfect, thanks !




^ permalink raw reply

* Re: [PATCH 1/4] vlan: rcu annotations
From: David Miller @ 2010-10-25 20:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1287991918.2658.5379.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 09:31:58 +0200

> (struct net_device)->vlgrp is rcu protected :
> 
> add __rcu annotation and proper rcu primitives.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/4] ipv6: ip6_ptr rcu annotations
From: David Miller @ 2010-10-25 20:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1287991925.2658.5380.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 09:32:05 +0200

> (struct net_device)->ip6_ptr is rcu protected :
> 
> add __rcu annotation and proper rcu primitives.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/4] net/802: add __rcu annotations
From: David Miller @ 2010-10-25 20:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1287991956.2658.5384.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 09:32:36 +0200

> (struct net_device)->garp_port is rcu protected :
> (struct garp_port)->applicants is rcu protected :
> 
> add __rcu annotation and proper rcu primitives.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 4/4] tunnels: add _rcu annotations
From: David Miller @ 2010-10-25 20:10 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1287991996.2658.5385.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 09:33:16 +0200

> (struct ip6_tnl)->next is rcu protected :
> (struct ip_tunnel)->next is rcu protected :
> (struct xfrm6_tunnel)->next is rcu protected :
> 
> add __rcu annotation and proper rcu primitives.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: b43legacy: fix compile error
From: Arnd Hannemann @ 2010-10-25 20:13 UTC (permalink / raw)
  To: Larry Finger
  Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Eric Dumazet
In-Reply-To: <4CC5D3A7.3080709-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>

Am 25.10.2010 20:59, schrieb Larry Finger:
> On 10/25/2010 01:44 PM, Arnd Hannemann wrote:
>> Am 25.10.2010 20:36, schrieb Larry Finger:
>>> On 10/25/2010 01:26 PM, Arnd Hannemann wrote:
>>>> Am 25.10.2010 17:32, schrieb Larry Finger:
>>>>> On 10/25/2010 09:41 AM, Arnd Hannemann wrote:
>>>>>> On todays linus tree the following compile error happened to me:
>>>>>>
>>>>>>   CC [M]  drivers/net/wireless/b43legacy/xmit.o
>>>>>> In file included from include/net/dst.h:11,
>>>>>>                  from drivers/net/wireless/b43legacy/xmit.c:31:
>>>>>> include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
>>>>>> include/net/dst_ops.h: In function 'dst_entries_get_fast':
>>>>>> include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>> include/net/dst_ops.h: In function 'dst_entries_get_slow':
>>>>>> include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>> include/net/dst_ops.h: In function 'dst_entries_add':
>>>>>> include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>> include/net/dst_ops.h: In function 'dst_entries_init':
>>>>>> include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>> include/net/dst_ops.h: In function 'dst_entries_destroy':
>>>>>> include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>> make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
>>>>>> make[3]: *** [drivers/net/wireless/b43legacy] Error 2
>>>>>> make[2]: *** [drivers/net/wireless] Error 2
>>>>>> make[1]: *** [drivers/net] Error 2
>>>>>> make: *** [drivers] Error 2
>>>>>>
>>>>>> This patch fixes this issue by adding "linux/cache.h" as an include to
>>>>>> "include/net/dst_ops.h".
>>>>>
>>>>> Strange. Compiling b43legacy from the linux-2.6.git tree (git describe is
>>>>> v2.6.36-4464-g229aebb) works fine on x86_64. I wonder what is different.
>>>>
>>>> Exactly the same git describe here.
>>>> Maybe your arch includes cache.h already, in my case its a compile for ARM (shmobile).
>>>
>>> That probably makes the difference. Using Eric's fix that removes the #include
>>> <linux/dst.h> should be better. Does it work for you?
>>>
>>> There are probably a lot more of the system includes that may not be needed. If
>>> I send you a patch removing them, could you test?
>>
>> As it turns out my card is not supported by b43legacy, but compilation testing,
>> sure I can test that.
> 
> If it is a Broadcom card, it is likely handled by b43.

Yes. It seems it should work with b43 (its an SDIO card) and it almost does...

> Attached is a trial removal of a number of include statements. Does it compile?

Nope:
NSTALL_MOD_PATH=/home/arnd/projekte/renesas-2/nfs modules
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
make[1]: `include/generated/mach-types.h' is up to date.
  CALL    scripts/checksyscalls.sh
  CC [M]  drivers/net/wireless/b43legacy/main.o
drivers/net/wireless/b43legacy/main.c: In function 'b43legacy_upload_microcode':
drivers/net/wireless/b43legacy/main.c:1688: error: implicit declaration of function 'signal_pending'
make[4]: *** [drivers/net/wireless/b43legacy/main.o] Error 1
make[3]: *** [drivers/net/wireless/b43legacy] Error 2
make[2]: *** [drivers/net/wireless] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

Thanks,
Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation
From: Francois Romieu @ 2010-10-25 20:17 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, davem, eric.dumazet, jpirko
In-Reply-To: <1288033566-2091-1-git-send-email-nhorman@tuxdriver.com>

nhorman@tuxdriver.com <nhorman@tuxdriver.com> :
[...]
> +		if (order > 0) {
> +			for (j = 0; j < pg_vec[i]->num_pages; j++)
> +				pages[j] = virt_to_page(pg_vec[i]->pages[j]);
> +
> +			if (unlikely(!pg_vec[i]))
> +				goto out_free_pgvec;

Check pg_vec[i] sooner ?

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation
From: Eric Dumazet @ 2010-10-25 20:38 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, davem, jpirko
In-Reply-To: <1288033566-2091-1-git-send-email-nhorman@tuxdriver.com>

Le lundi 25 octobre 2010 à 15:06 -0400, nhorman@tuxdriver.com a écrit :
> From: Neil Horman <nhorman@tuxdriver.com>
> 
> It was shown to me recently that systems under high load were driven very deep
> into swap when tcpdump was run.  The reason this happened was because the
> AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
> application to specify how many entries an AF_PACKET socket will have and how
> large each entry will be.  It seems the default setting for tcpdump is to set
> the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
> allocation.  Thats difficult under good circumstances, and horrid under memory
> pressure.
> 
> I thought it would be good to make that a bit more usable.  I was going to do a
> simple conversion of the ring buffer from contigous pages to iovecs, but
> unfortunately, the metadata which AF_PACKET places in these buffers can easily
> span a page boundary, and given that these buffers get mapped into user space,
> and the data layout doesn't easily allow for a change to padding between frames
> to avoid that, a simple iovec change is just going to break user space ABI
> consistency.
> 
> So instead I've done this.  This patch does the aforementioned change,
> allocating an array of pages instead of one contiguous chunk, and then vmaps the
> array into a contiguous memory space, so that it can still be accessed in the
> same way it was before.  This allows for a consisten user and kernel space
> behavior for memory mapped AF_PACKET sockets, which at the same time relieving
> the memory pressure placed on a system when tcpdump defaults are used.
> 
> Tested successfully by me.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> ---

Strange because last time I took a look at this stuff, libpcap was doing
several tries, reducing page orders until it got no allocation
failures...

(It tries to get high order pages, maybe to reduce TLB pressure...)

I remember adding __GFP_NOWARN to avoid a kernel message, while tcpdump
was actually working...




^ permalink raw reply

* Re: [PATCH v2 1/9] tproxy: split off ipv6 defragmentation to a separate module
From: Patrick McHardy @ 2010-10-25 20:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: KOVACS Krisztian, netdev, netfilter-devel, Balazs Scheidler,
	David Miller
In-Reply-To: <1288001640.2826.96.camel@edumazet-laptop>

Am 25.10.2010 12:14, schrieb Eric Dumazet:
> Le lundi 25 octobre 2010 à 11:38 +0200, KOVACS Krisztian a écrit :
>> Hi,
>>
>> On Fri, 2010-10-22 at 00:19 +0200, Eric Dumazet wrote:
>>> Le jeudi 21 octobre 2010 à 16:04 +0200, Patrick McHardy a écrit :
>>>> Am 21.10.2010 13:43, schrieb KOVACS Krisztian:
>>>>> tproxy: split off ipv6 defragmentation to a separate module
>>>>>     
>>>>>     Like with IPv4, TProxy needs IPv6 defragmentation but does not
>>>>>     require connection tracking. Since defragmentation was coupled
>>>>>     with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
>>>>>     similar to the already existing nf_defrag_ipv4.
>>>>
>>>> Applied, thanks.
>>>
>>> Hmm...
>>>
>>> CONFIG_IPV6=m
>>> CONFIG_NETFILTER_TPROXY=m
>>>
>>>
>>>   MODPOST 201 modules
>>> ERROR: "nf_defrag_ipv6_enable" [net/netfilter/xt_TPROXY.ko] undefined!
>>> ERROR: "ipv6_find_hdr" [net/netfilter/xt_TPROXY.ko] undefined!
>>>
>>> Sorry, it's late here, I wont fix this ;)
>>
>> Oops, I guess this is because you do have IPv6 support but don't have
>> ip6tables enabled in your config. Does the patch below fix the issue for
>> you? (For me it now compiles with and without IPv6 conntrack, ip6tables
>> and IPv6 support, too.)
>>
>>
> 
> I had ip6tables enabled, but not CONFIG_NF_CONNTRACK_IPV6 ;)
> 
>>
>> netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY
>>
>> One of the previous tproxy related patches split IPv6 defragmentation and
>> connection tracking, but did not correctly add Kconfig stanzas to handle the
>> new dependencies correctly. This patch fixes that by making the config options
>> mirror the setup we have for IPv4: a distinct config option for defragmentation
>> that is automatically selected by both connection tracking and
>> xt_TPROXY/xt_socket.
>>
>> The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket
>> and xt_TPROXY: we only compile these in case we have ip6tables support enabled.
>>
>> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> 
> Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>

Dave, please apply directly. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 1/9] tproxy: split off ipv6 defragmentation to a separate module
From: David Miller @ 2010-10-25 20:54 UTC (permalink / raw)
  To: kaber; +Cc: eric.dumazet, hidden, netdev, netfilter-devel, bazsi
In-Reply-To: <4CC5EBC8.9000701@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 25 Oct 2010 22:42:48 +0200

> Am 25.10.2010 12:14, schrieb Eric Dumazet:
>>> netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY
>>>
>>> One of the previous tproxy related patches split IPv6 defragmentation and
>>> connection tracking, but did not correctly add Kconfig stanzas to handle the
>>> new dependencies correctly. This patch fixes that by making the config options
>>> mirror the setup we have for IPv4: a distinct config option for defragmentation
>>> that is automatically selected by both connection tracking and
>>> xt_TPROXY/xt_socket.
>>>
>>> The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket
>>> and xt_TPROXY: we only compile these in case we have ip6tables support enabled.
>>>
>>> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
>> 
>> Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Dave, please apply directly. Thanks!

Will do, thanks Patrick!

^ permalink raw reply

* Re: Reproducible VLAN/e1000e crash in 2.6.36 vanilla.
From: Ben Greear @ 2010-10-25 21:18 UTC (permalink / raw)
  To: NetDev
In-Reply-To: <4CC5C51F.3000606@candelatech.com>

On 10/25/2010 10:57 AM, Ben Greear wrote:
>
> To re-create, setup 2 802.1q vlans on different physical interfaces on
> the same system,
> set up routing rules such that send-to-self works, and pass traffic
> (UDP/IPv4 in this case,
> but doesn't seem to matter).
> Stop traffic, then attempt to create additional 802.1q vlans on the same
> physical interfaces.
> The crash only appears to happen after having sent traffic on the
> interface.
>
> Likely it will also crash if one system is sending to another, but so
> far we've
> just tested sending-to-self.
>
> This appears very reproducible for us, and appears to be the same
> problem that
> I had reported against our hacked kernel here:
>
> http://www.spinics.net/lists/netdev/msg144748.html

Bleh, I think I see the problem.

If a NIC is in promis mode, it can receive VLAN packets for which there
are no VLAN devices.

static gro_result_t
vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
                 unsigned int vlan_tci, struct sk_buff *skb)
{
         struct sk_buff *p;
         struct net_device *vlan_dev;
         u16 vlan_id;

         if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
                 skb->deliver_no_wcard = 1;

         skb->skb_iif = skb->dev->ifindex;
         __vlan_hwaccel_put_tag(skb, vlan_tci);
         vlan_id = vlan_tci & VLAN_VID_MASK;
         vlan_dev = vlan_group_get_device(grp, vlan_id);

         if (vlan_dev)
                 skb->dev = vlan_dev;
         else if (vlan_id) {
                 if (!(skb->dev->flags & IFF_PROMISC))
                         goto drop;
                 skb->pkt_type = PACKET_OTHERHOST;
         }

You hit that else branch, and then skb->dev remains the physical
device.

Later, it's passed to:

int vlan_hwaccel_do_receive(struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct vlan_rx_stats     *rx_stats;

	skb->dev = vlan_dev_info(dev)->real_dev;
	netif_nit_deliver(skb);


which does no checking before assuming that skb->dev is a vlan
device.

Things go downhill rapidly after that.


Maybe this code in dev.c should check that skb->dev is
VLAN device before passing to the hwaccel code?

static int __netif_receive_skb(struct sk_buff *skb)
{
	struct packet_type *ptype, *pt_prev;
	rx_handler_func_t *rx_handler;
	struct net_device *orig_dev;
	struct net_device *master;
	struct net_device *null_or_orig;
	struct net_device *orig_or_bond;
	int ret = NET_RX_DROP;
	__be16 type;

	if (!netdev_tstamp_prequeue)
		net_timestamp_check(skb);

	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
		return NET_RX_SUCCESS;


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH] rps: add __rcu annotations
From: David Miller @ 2010-10-25 21:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288011722.2826.116.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 15:02:02 +0200

> Add __rcu annotations to :
> 	(struct netdev_rx_queue)->rps_map
> 	(struct netdev_rx_queue)->rps_flow_table
> 	struct rps_sock_flow_table *rps_sock_flow_table;
> 
> And use appropriate rcu primitives.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net_ns: add __rcu annotations
From: David Miller @ 2010-10-25 21:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288012811.2826.119.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 15:20:11 +0200

> add __rcu annotation to (struct net)->gen, and use
> rcu_dereference_protected() in net_assign_generic()
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ipv4: add __rcu annotations to ip_ra_chain
From: David Miller @ 2010-10-25 21:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288013564.2826.123.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 15:32:44 +0200

> Add __rcu annotations to :
>         (struct ip_ra_chain)->next
> 	struct ip_ra_chain *ip_ra_chain;
> 
> And use appropriate rcu primitives.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: add __rcu annotation to sk_filter
From: David Miller @ 2010-10-25 21:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288014425.2826.126.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 25 Oct 2010 15:47:05 +0200

> Add __rcu annotation to :
>         (struct sock)->sk_filter
> 
> And use appropriate rcu primitives to reduce sparse warnings if
> CONFIG_SPARSE_RCU_POINTER=y
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: Reproducible VLAN/e1000e crash in 2.6.36 vanilla.
From: John Fastabend @ 2010-10-25 21:34 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev
In-Reply-To: <4CC5F40D.8050302@candelatech.com>

On 10/25/2010 2:18 PM, Ben Greear wrote:
> On 10/25/2010 10:57 AM, Ben Greear wrote:
>>
>> To re-create, setup 2 802.1q vlans on different physical interfaces on
>> the same system,
>> set up routing rules such that send-to-self works, and pass traffic
>> (UDP/IPv4 in this case,
>> but doesn't seem to matter).
>> Stop traffic, then attempt to create additional 802.1q vlans on the same
>> physical interfaces.
>> The crash only appears to happen after having sent traffic on the
>> interface.
>>
>> Likely it will also crash if one system is sending to another, but so
>> far we've
>> just tested sending-to-self.
>>
>> This appears very reproducible for us, and appears to be the same
>> problem that
>> I had reported against our hacked kernel here:
>>
>> http://www.spinics.net/lists/netdev/msg144748.html
> 
> Bleh, I think I see the problem.
> 
> If a NIC is in promis mode, it can receive VLAN packets for which there
> are no VLAN devices.
> 
> static gro_result_t
> vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
>                  unsigned int vlan_tci, struct sk_buff *skb)
> {
>          struct sk_buff *p;
>          struct net_device *vlan_dev;
>          u16 vlan_id;
> 
>          if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
>                  skb->deliver_no_wcard = 1;
> 
>          skb->skb_iif = skb->dev->ifindex;
>          __vlan_hwaccel_put_tag(skb, vlan_tci);
>          vlan_id = vlan_tci & VLAN_VID_MASK;
>          vlan_dev = vlan_group_get_device(grp, vlan_id);
> 
>          if (vlan_dev)
>                  skb->dev = vlan_dev;
>          else if (vlan_id) {
>                  if (!(skb->dev->flags & IFF_PROMISC))
>                          goto drop;
>                  skb->pkt_type = PACKET_OTHERHOST;
>          }
> 
> You hit that else branch, and then skb->dev remains the physical
> device.
> 
> Later, it's passed to:
> 
> int vlan_hwaccel_do_receive(struct sk_buff *skb)
> {
> 	struct net_device *dev = skb->dev;
> 	struct vlan_rx_stats     *rx_stats;
> 
> 	skb->dev = vlan_dev_info(dev)->real_dev;
> 	netif_nit_deliver(skb);
> 

Looks like this should be fixed on net-next,

bool vlan_hwaccel_do_receive(struct sk_buff **skbp)
{
        struct sk_buff *skb = *skbp;
        u16 vlan_id = skb->vlan_tci & VLAN_VID_MASK;
        struct net_device *vlan_dev;
        struct vlan_rx_stats *rx_stats;

        vlan_dev = vlan_find_dev(skb->dev, vlan_id);
        if (!vlan_dev) {
                if (vlan_id)
                        skb->pkt_type = PACKET_OTHERHOST;
                return false;
        }

If the vlan_dev is not found do not set skb->dev and return false then
in __netif_receive_skb,

      if (vlan_tx_tag_present(skb)) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                if (vlan_hwaccel_do_receive(&skb)) {
                        ret = __netif_receive_skb(skb);
                        goto out;
                } else if (unlikely(!skb))
                        goto out;
        }


> 
> which does no checking before assuming that skb->dev is a vlan
> device.
> 
> Things go downhill rapidly after that.
> 
> 
> Maybe this code in dev.c should check that skb->dev is
> VLAN device before passing to the hwaccel code?
> 
> static int __netif_receive_skb(struct sk_buff *skb)
> {
> 	struct packet_type *ptype, *pt_prev;
> 	rx_handler_func_t *rx_handler;
> 	struct net_device *orig_dev;
> 	struct net_device *master;
> 	struct net_device *null_or_orig;
> 	struct net_device *orig_or_bond;
> 	int ret = NET_RX_DROP;
> 	__be16 type;
> 
> 	if (!netdev_tstamp_prequeue)
> 		net_timestamp_check(skb);
> 
> 	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
> 		return NET_RX_SUCCESS;
> 
> 
> Thanks,
> Ben
> 


^ permalink raw reply

* Question on code in __netif_receive_skb
From: Ben Greear @ 2010-10-25 21:37 UTC (permalink / raw)
  To: NetDev; +Cc: Patrick McHardy, Eric Dumazet

While poking at the VLAN crash, I noticed this code in dev.c, in
the __netif_receive_skb method.

	if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
		return NET_RX_SUCCESS;

As far as I can tell, this return can never happen because vlan_hwaccel_do_receive
always returns 0.

Maybe it should instead check for return-value == 0 and return success in that case?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: Reproducible VLAN/e1000e crash in 2.6.36 vanilla.
From: Eric Dumazet @ 2010-10-25 21:38 UTC (permalink / raw)
  To: John Fastabend; +Cc: Ben Greear, NetDev
In-Reply-To: <4CC5F7EB.7010307@intel.com>

Le lundi 25 octobre 2010 à 14:34 -0700, John Fastabend a écrit :
> On 10/25/2010 2:18 PM, Ben Greear wrote:
> > On 10/25/2010 10:57 AM, Ben Greear wrote:
> >>
> >> To re-create, setup 2 802.1q vlans on different physical interfaces on
> >> the same system,
> >> set up routing rules such that send-to-self works, and pass traffic
> >> (UDP/IPv4 in this case,
> >> but doesn't seem to matter).
> >> Stop traffic, then attempt to create additional 802.1q vlans on the same
> >> physical interfaces.
> >> The crash only appears to happen after having sent traffic on the
> >> interface.
> >>
> >> Likely it will also crash if one system is sending to another, but so
> >> far we've
> >> just tested sending-to-self.
> >>
> >> This appears very reproducible for us, and appears to be the same
> >> problem that
> >> I had reported against our hacked kernel here:
> >>
> >> http://www.spinics.net/lists/netdev/msg144748.html
> > 
> > Bleh, I think I see the problem.
> > 
> > If a NIC is in promis mode, it can receive VLAN packets for which there
> > are no VLAN devices.
> > 
> > static gro_result_t
> > vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
> >                  unsigned int vlan_tci, struct sk_buff *skb)
> > {
> >          struct sk_buff *p;
> >          struct net_device *vlan_dev;
> >          u16 vlan_id;
> > 
> >          if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
> >                  skb->deliver_no_wcard = 1;
> > 
> >          skb->skb_iif = skb->dev->ifindex;
> >          __vlan_hwaccel_put_tag(skb, vlan_tci);
> >          vlan_id = vlan_tci & VLAN_VID_MASK;
> >          vlan_dev = vlan_group_get_device(grp, vlan_id);
> > 
> >          if (vlan_dev)
> >                  skb->dev = vlan_dev;
> >          else if (vlan_id) {
> >                  if (!(skb->dev->flags & IFF_PROMISC))
> >                          goto drop;
> >                  skb->pkt_type = PACKET_OTHERHOST;
> >          }
> > 
> > You hit that else branch, and then skb->dev remains the physical
> > device.
> > 
> > Later, it's passed to:
> > 
> > int vlan_hwaccel_do_receive(struct sk_buff *skb)
> > {
> > 	struct net_device *dev = skb->dev;
> > 	struct vlan_rx_stats     *rx_stats;
> > 
> > 	skb->dev = vlan_dev_info(dev)->real_dev;
> > 	netif_nit_deliver(skb);
> > 
> 
> Looks like this should be fixed on net-next,
> 
> bool vlan_hwaccel_do_receive(struct sk_buff **skbp)
> {
>         struct sk_buff *skb = *skbp;
>         u16 vlan_id = skb->vlan_tci & VLAN_VID_MASK;
>         struct net_device *vlan_dev;
>         struct vlan_rx_stats *rx_stats;
> 
>         vlan_dev = vlan_find_dev(skb->dev, vlan_id);
>         if (!vlan_dev) {
>                 if (vlan_id)
>                         skb->pkt_type = PACKET_OTHERHOST;
>                 return false;
>         }
> 
> If the vlan_dev is not found do not set skb->dev and return false then
> in __netif_receive_skb,
> 
>       if (vlan_tx_tag_present(skb)) {
>                 if (pt_prev) {
>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>                         pt_prev = NULL;
>                 }
>                 if (vlan_hwaccel_do_receive(&skb)) {
>                         ret = __netif_receive_skb(skb);
>                         goto out;
>                 } else if (unlikely(!skb))
>                         goto out;
>         }
> 

Yes but net-next is totally different beast for vlans ;)

We should make a patch for 2.6.36, not bringing huge vlan stuff added
for 2.6.37 




^ permalink raw reply

* Re: [PATCH v2 10/14] ixgbe: Update ixgbe to use new vlan accleration.
From: Michał Mirosław @ 2010-10-25 21:40 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr
  Cc: Jesse Gross, David Miller, netdev@vger.kernel.org,
	Tantilov, Emil S, Kirsher, Jeffrey T
In-Reply-To: <1288029009.1808.1.camel@pjaxe>

W dniu 25 października 2010 19:50 użytkownik Peter P Waskiewicz Jr
<peter.p.waskiewicz.jr@intel.com> napisał:
> On Fri, 2010-10-22 at 06:24 -0700, Michał Mirosław wrote:
>> 2010/10/21 Jesse Gross <jesse@nicira.com>:
>> > Make the ixgbe driver use the new vlan accleration model.
>> [...]
>> > --- a/drivers/net/ixgbe/ixgbe_main.c
>> > +++ b/drivers/net/ixgbe/ixgbe_main.c
>> > @@ -954,17 +954,13 @@ static void ixgbe_receive_skb(struct ixgbe_q_vector *q_vector,
>> >        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
>> >        u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
>> >
>> > -       if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
>> > -               if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>> > -                       vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
>> > -               else
>> > -                       napi_gro_receive(napi, skb);
>> > -       } else {
>> > -               if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>> > -                       vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
>> > -               else
>> > -                       netif_rx(skb);
>> > -       }
>> > +       if (is_vlan && (tag & VLAN_VID_MASK))
>> > +               __vlan_hwaccel_put_tag(skb, tag);
>>
>> I know that this is carried over from the driver, but why tag == 0 is
>> treated differently here? VID0 is somewhat special, as normally it
>> means 802.1p packet, but i.e. in embedded world people are using it as
>> normal VID. It would be nice to have this handled consistently in the
>> VLAN core - deliver to base dev (tag stripped) if vlan 0 is not
>> configured and to vlan dev if it is.
>
> ixgbe handles VLAN 0 differently because that's the tag that's used when
> DCB is enabled, and no VLAN is configured.  We have to insert the 802.1p
> tag for DCB to work, but the OS won't know about the 802.1q tag, and
> ends up dropping the frame.  So we enable VLAN ID 0 in the HW and tell
> it to strip the tag, so we can still pass the frame up the stack.

So that's actually (part of) what I'm proposing but done at driver level.

BTW, What happens If you both configure VLAN 0 and enable DCB?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH v2 10/14] ixgbe: Update ixgbe to use new vlan accleration.
From: John Fastabend @ 2010-10-25 22:02 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: Waskiewicz Jr, Peter P, Jesse Gross, David Miller,
	netdev@vger.kernel.org, Tantilov, Emil S, Kirsher, Jeffrey T
In-Reply-To: <AANLkTi=R9f3oAdVAG1MyyTwEqBGRaBj8mtLPg15R2BjW@mail.gmail.com>

On 10/25/2010 2:40 PM, Michał Mirosław wrote:
> W dniu 25 października 2010 19:50 użytkownik Peter P Waskiewicz Jr
> <peter.p.waskiewicz.jr@intel.com> napisał:
>> On Fri, 2010-10-22 at 06:24 -0700, Michał Mirosław wrote:
>>> 2010/10/21 Jesse Gross <jesse@nicira.com>:
>>>> Make the ixgbe driver use the new vlan accleration model.
>>> [...]
>>>> --- a/drivers/net/ixgbe/ixgbe_main.c
>>>> +++ b/drivers/net/ixgbe/ixgbe_main.c
>>>> @@ -954,17 +954,13 @@ static void ixgbe_receive_skb(struct ixgbe_q_vector *q_vector,
>>>>        bool is_vlan = (status & IXGBE_RXD_STAT_VP);
>>>>        u16 tag = le16_to_cpu(rx_desc->wb.upper.vlan);
>>>>
>>>> -       if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL)) {
>>>> -               if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>>>> -                       vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
>>>> -               else
>>>> -                       napi_gro_receive(napi, skb);
>>>> -       } else {
>>>> -               if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>>>> -                       vlan_hwaccel_rx(skb, adapter->vlgrp, tag);
>>>> -               else
>>>> -                       netif_rx(skb);
>>>> -       }
>>>> +       if (is_vlan && (tag & VLAN_VID_MASK))
>>>> +               __vlan_hwaccel_put_tag(skb, tag);
>>>
>>> I know that this is carried over from the driver, but why tag == 0 is
>>> treated differently here? VID0 is somewhat special, as normally it
>>> means 802.1p packet, but i.e. in embedded world people are using it as
>>> normal VID. It would be nice to have this handled consistently in the
>>> VLAN core - deliver to base dev (tag stripped) if vlan 0 is not
>>> configured and to vlan dev if it is.
>>
>> ixgbe handles VLAN 0 differently because that's the tag that's used when
>> DCB is enabled, and no VLAN is configured.  We have to insert the 802.1p
>> tag for DCB to work, but the OS won't know about the 802.1q tag, and
>> ends up dropping the frame.  So we enable VLAN ID 0 in the HW and tell
>> it to strip the tag, so we can still pass the frame up the stack.
> 
> So that's actually (part of) what I'm proposing but done at driver level.
> 

Michal,

I agree this should be handled outside the driver. Something like this should
do,

--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -12,10 +12,12 @@ Hunk #1, a/net/8021q/vlan_core.c bool vlan_hwaccel_do_receive(struct sk_buff **skbp)
        struct vlan_rx_stats *rx_stats;

        vlan_dev = vlan_find_dev(skb->dev, vlan_id);
-       if (!vlan_dev) {
-               if (vlan_id)
-                       skb->pkt_type = PACKET_OTHERHOST;
+       if (!vlan_dev && vlan_id) {
+               skb->pkt_type = PACKET_OTHERHOST;
                return false;
+       } else if (!vlan_dev) {
+               skb->vlan_tci = 0;
+               return true;
        }

        skb = *skbp = skb_share_check(skb, GFP_ATOMIC);
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -134,14 +134,14 @@ Hunk #1, a/net/8021q/vlan_dev.c int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
         * wrapped proto: we do nothing here.
         */

-       if (!vlan_dev) {
+       if (!vlan_dev && vlan_id) {
                if (vlan_id) {
                        pr_debug("%s: ERROR: No net_device for VID: %u on dev: %s\n",
                                 __func__, vlan_id, dev->name);
                        goto err_unlock;
                }
                rx_stats = NULL;
-       } else {
+       } else if (vlan_id) {
                skb->dev = vlan_dev;

                rx_stats = this_cpu_ptr(vlan_dev_info(skb->dev)->vlan_rx_stats);



> BTW, What happens If you both configure VLAN 0 and enable DCB?
> 

Currently, on ixgbe VLAN0 will not work with DCB. We should just remove it
on net-next. I'll put together a formal patch for the first part.

Thanks,
John

> Best Regards,
> Michał Mirosław
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: __bad_udelay in network driver breaks build
From: Andi Kleen @ 2010-10-25 22:10 UTC (permalink / raw)
  To: David Miller; +Cc: andi, netdev, jeffrey.t.kirsher, alexander.h.duyck
In-Reply-To: <20101025.130513.102558756.davem@davemloft.net>

On Mon, Oct 25, 2010 at 01:05:13PM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Mon, 18 Oct 2010 13:52:30 +0200
> 
> > 
> > Current Linus master x86-64 allyesconfig fails with
> > 
> > drivers/built-in.o: In function `tms380tr_chipset_init':
> > tms380tr.c:(.text+0x10f02de): undefined reference to `__bad_udelay'
> > tms380tr.c:(.text+0x10f03ab): undefined reference to `__bad_udelay'
> > tms380tr.c:(.text+0x10f0400): undefined reference to `__bad_udelay'
> > tms380tr.c:(.text+0x10f07b2): undefined reference to `__bad_udelay'
> > tms380tr.c:(.text+0x10f08ed): undefined reference to `__bad_udelay'
> > make[2]: *** [.tmp_vmlinux1] Error 1
> 
> Let me know if this fixes things:

Fixes that problem, but now I get (with Linus' latest again and a gcc 4.6
snapshot)  

In file included from /home/lsrc/git/linux-2.6/drivers/net/igbvf/ethtool.c:36:0:
/home/lsrc/git/linux-2.6/drivers/net/igbvf/igbvf.h:129:15: error: duplicate member 'page'
make[5]: *** [drivers/net/igbvf/ethtool.o] Error 1
make[4]: *** [drivers/net/igbvf] Error 2

struct igbvf_buffer {
        dma_addr_t dma;
        struct sk_buff *skb;
        union {
                /* Tx */
                struct {
                        unsigned long time_stamp;
                        u16 length;
                        u16 next_to_watch;
                        u16 mapped_as_page;
                };
                /* Rx */
                struct {
                        struct page *page; <--------------- No 1
                        u64 page_dma;
                        unsigned int page_offset;
                };
        };
        struct page *page;         <------------ No 2
};

Hmm conflict of a member with a transparent union.
Maybe older gccs didn't catch that. But it looks very broken

Alexander, can you sort out which "page" member should be used where?

And there's another problem in SCSI which I'll report separately.

-Andi


^ permalink raw reply

* [PATCH] net: reset gso header when the copied skb is linearized
From: Flavio Leitner @ 2010-10-25 22:23 UTC (permalink / raw)
  To: netdev; +Cc: Flavio Leitner

The gso header is incorrect when the copied skb is
linearized. This patch creates another helper function
to copy the gso header when it is appropriated

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
 net/core/skbuff.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 104f844..54a2d3a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -649,6 +649,12 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	if (skb_mac_header_was_set(new))
 		new->mac_header	      += offset;
 #endif
+}
+
+static void copy_skb_header_gso(struct sk_buff *new, const struct sk_buff *old)
+{
+	copy_skb_header(new, old);
+
 	skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
 	skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
 	skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
@@ -740,7 +746,7 @@ struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 		skb_clone_fraglist(n);
 	}
 
-	copy_skb_header(n, skb);
+	copy_skb_header_gso(n, skb);
 out:
 	return n;
 }
-- 
1.7.3.1


^ permalink raw reply related

* Re: [PATCH] Enhance AF_PACKET implementation to not require high order contiguous memory allocation
From: Eric Dumazet @ 2010-10-25 22:30 UTC (permalink / raw)
  To: nhorman@tuxdriver.com; +Cc: netdev, davem, jpirko
In-Reply-To: <E1PAVIx-0001qL-EB@smtp.tuxdriver.com>

Le lundi 25 octobre 2010 à 18:14 -0400, nhorman@tuxdriver.com a écrit :
> I think I remember those changes and IIrc yes,  tcpdump will make
> several attempts to get buffers of an appropriate size.  But while it
> tries to do that it bogs the system trying to write out pagecahe,
> swap, etc.  And that activity doesn't guarantee success.  His does
> either, but getting 5 order 0 pages is far easier and less intrusive
> to a loaded system than trying to get 1 order 4 chunk.  That's all I'm
> trying to accomplish here.  Just making it easier to use af_packet
> sockets without interfering with system performance
> 

Actually, using vmalloc() would probably hurt performance, because of
extra TLB pressure.

Of course, on recent x86 hardware you dont notice that much...

If not, why af_packet would use such convoluted double array of
'compound pages' ?

Also, on x86_32, vmalloc()/vmap() space is small (128 MB) so you might
exhaust it pretty fast with several sniffers running.

I would try a two level thing : Try to get high order pages, and
fallback on low order pages, but normally libpcap does this for us ?

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/2] be2net: Adding an option to use INTx instead of MSI-X
From: Ben Hutchings @ 2010-10-25 22:38 UTC (permalink / raw)
  To: David Miller; +Cc: somnath.kotur, netdev, linux-pci
In-Reply-To: <20101025.120943.189699949.davem@davemloft.net>

David Miller wrote:
> From: Somnath Kotur <somnath.kotur@emulex.com>
> Date: Mon, 25 Oct 2010 16:42:35 +0530
> 
> > By default, be2net uses MSIx wherever possible.
> > Adding a module parameter to use INTx for users who do not want to use MSIx.
> > 
> > Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
> 
> Either add a new ethtool flag, or use the PCI subsystem facilities
> for tweaking things to implement this.
>
> Do not use a module option, otherwise every other networking driver
> author will get the same "cool" idea, give the module option
> different names, and the resulting user experience is terrible.

This has already happened, sadly.  So far as I can see it's mostly done
to allow users to work around systems with broken MSIs; I'm not aware of
any other reason to prefer legacy interrupts.  However, the PCI subsystem
already implements a blacklist and a kernel parameter for disabling MSIs
on these systems.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC][net-next-2.6 PATCH 1/4] net: consolidate 8021q tagging
From: Jesse Gross @ 2010-10-25 22:44 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

On Thu, Oct 21, 2010 at 3:10 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> Now that VLAN packets are tagged in dev_hard_start_xmit()
> at the bottom of the stack we no longer need to tag them
> in the 8021Q module (Except in the !VLAN_FLAG_REORDER_HDR
> case).
>
> This allows the accel path and non accel paths to be consolidated.
> Here the vlan_tci in the skb is always set and we allow the
> stack to add the actual tag in dev_hard_start_xmit().
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

This makes sense.  The only other thing that I would do is drop the
vlan_dev_info(dev)->cnt_encap_on_xmit and
vlan_dev_info(dev)->cnt_inc_headroom_on_tx variables.  Nothing will
ever increment them anymore.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox