Netdev List

* Re: [PATCH 4/8] af_unix: Allow SO_PEERCRED to work across namespaces.
From: Daniel Lezcano @ 2010-06-14 13:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, Serge Hallyn, Linux Containers, netdev,
	Pavel Emelyanov
In-Reply-To: <m1r5kbgivt.fsf@fess.ebiederm.org>

On 06/13/2010 03:30 PM, Eric W. Biederman wrote:
> Use struct pid and struct cred to store the peer credentials on struct
> sock.  This gives enough information to convert the peer credential
> information to a value relative to whatever namespace the socket is in
> at the time.
>
> This removes nasty surprises when using SO_PEERCRED on socket
> connections where the processes on either side are in different pid and
> user namespaces.
>
> Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
>    

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>


^ permalink raw reply

* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Eric Dumazet @ 2010-06-14 13:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: John Fastabend, fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <1276522483.2478.88.camel@edumazet-laptop>

Le lundi 14 juin 2010 à 15:34 +0200, Eric Dumazet a écrit :

> [PATCH] net: fix deliver_no_wcard regression on loopback device
> 
> deliver_no_wcard is not being set in skb_copy_header.
> In the skb_cloned case it is not being cleared and
> may cause the skb to be dropped when the loopback device
> pushes it back up the stack.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Oh I forgot :

Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>

> ---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9f07e74..bcf2fa3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
>  	new->ip_summed		= old->ip_summed;
>  	skb_copy_queue_mapping(new, old);
>  	new->priority		= old->priority;
> +	new->deliver_no_wcard	= old->deliver_no_wcard;
>  #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
>  	new->ipvs_property	= old->ipvs_property;
>  #endif
> 



^ permalink raw reply

* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Eric Dumazet @ 2010-06-14 13:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: John Fastabend, fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <20100614132120.GA24785@redhat.com>

From: John Fastabend <john.r.fastabend@intel.com>

Le lundi 14 juin 2010 à 16:21 +0300, Michael S. Tsirkin a écrit :
> On Thu, Jun 03, 2010 at 12:30:11PM -0700, John Fastabend wrote:
> > Currently, the accelerated receive path for VLANs will
> > drop packets if the real device is an inactive slave and
> > the packet is not one of the special pkts tested for in
> > skb_bond_should_drop().  This behavior is different than
> > the non-accelerated path and for pkts over a bonded vlan.
> > 
> > For example,
> > 
> > vlanx -> bond0 -> ethx
> > 
> > will be dropped in the vlan path and not delivered to any
> > packet handlers at all.  However,
> > 
> > bond0 -> vlanx -> ethx
> > 
> > and
> > 
> > bond0 -> ethx
> > 
> > will be delivered to handlers that match the exact dev,
> > because the VLAN path checks the real_dev which is not a
> > slave and netif_recv_skb() doesn't drop frames but only
> > delivers them to exact matches.
> > 
> > This patch adds a sk_buff flag which is used for tagging
> > skbs that would previously have been dropped and allows the
> > skb to continue to skb_netif_recv().  Here we add
> > logic to check for the deliver_no_wcard flag and, if it
> > is set, only deliver to handlers that match exactly.  This
> > makes both paths above consistent and gives pkt handlers
> > a way to identify skbs that come from inactive slaves.
> > Without this patch, in some configurations skbs will be
> > delivered to handlers with exact matches and in others
> > be dropped outright in the vlan path.
> > 
> > I have tested the following 4 configurations in failover modes
> > and load balancing modes.
> > 
> > # bond0 -> ethx
> > 
> > # vlanx -> bond0 -> ethx
> > 
> > # bond0 -> vlanx -> ethx
> > 
> > # bond0 -> ethx
> >             |
> >   vlanx -> --
> > 
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> I am using qemu with both tap and slirp (userspace) networking.
> This works fine under 2.6.35-rc2 but breaks under 2.6.35-rc3:
> ssh over slirp stops working sometimes right away
> and sometimes after a bit of use, connection times out.
> 
> Git bisect gave me this commit:
> 597a264b1a9c7e36d1728f677c66c5c1f7e3b837.
> 
> Reverting 597a264b1a9c7e36d1728f677c66c5c1f7e3b837 fixes the issue
> for me.
> 
> I'm short on time now, so I didn't debug this further.
> I opened a bugzilla to track this issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=16204
> 

A fix is already available, and this bug has already been reported multiple times.

http://lkml.org/lkml/2010/6/13/155

[PATCH] net: fix deliver_no_wcard regression on loopback device

deliver_no_wcard is not being set in skb_copy_header.
In the skb_cloned case it is not being cleared and
may cause the skb to be dropped when the loopback device
pushes it back up the stack.
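
To make the failure mode concrete, here is a minimal userspace sketch. It is
not the kernel code: struct toy_skb and the toy_* helpers are hypothetical
names, and the "recycled, non-zeroed destination" is an assumption used to
stand in for a clone whose flag field was never initialized.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for struct sk_buff; only the fields needed for the sketch. */
struct toy_skb {
	unsigned int priority;
	unsigned int ip_summed;
	unsigned int deliver_no_wcard;	/* 1: deliver to exact-match handlers only */
};

/* Header copy that forgets the flag, as before the patch. */
static void toy_copy_header_old(struct toy_skb *new, const struct toy_skb *old)
{
	new->ip_summed = old->ip_summed;
	new->priority = old->priority;
}

/* Header copy that carries the flag over, as after the patch. */
static void toy_copy_header_new(struct toy_skb *new, const struct toy_skb *old)
{
	toy_copy_header_old(new, old);
	new->deliver_no_wcard = old->deliver_no_wcard;
}

/*
 * Assumption for this sketch: the destination is recycled memory that is
 * not zeroed, so it may still hold a stale flag from an earlier user.
 */
static struct toy_skb toy_recycled(void)
{
	struct toy_skb s;

	memset(&s, 0, sizeof(s));
	s.deliver_no_wcard = 1;	/* stale value left by a previous user */
	return s;
}
```

With the old copy the destination keeps whatever flag value it already had;
with the new copy it always matches the source skb.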

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9f07e74..bcf2fa3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->ip_summed		= old->ip_summed;
 	skb_copy_queue_mapping(new, old);
 	new->priority		= old->priority;
+	new->deliver_no_wcard	= old->deliver_no_wcard;
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
 	new->ipvs_property	= old->ipvs_property;
 #endif



^ permalink raw reply related

* potential race in virtio ring?
From: Michael S. Tsirkin @ 2010-06-14 13:59 UTC (permalink / raw)
  To: virtualization, Rusty Russell, Jiri Pirko, Shirley Ma, netdev,
	linux-kernel

Hi!
I was going over the vring code and noticed that
the ring has this check:

irqreturn_t vring_interrupt(int irq, void *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);

        if (!more_used(vq)) {
                pr_debug("virtqueue interrupt with no work for %p\n", vq);
                return IRQ_NONE;

static inline bool more_used(const struct vring_virtqueue *vq)
{               
        return vq->last_used_idx != vq->vring.used->idx;
}               

My concern is that with virtio net, more_used is called
on a CPU different from the one that polls the vq.
This might mean that the last_used_idx value it reads is stale.
Could this lead to a missed interrupt?

Thanks,

-- 
MST

^ permalink raw reply

* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Shan Wei @ 2010-06-14 14:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <1276506144.2478.40.camel@edumazet-laptop>

Eric Dumazet wrote, at 06/14/2010 05:02 PM:
> Instead of doing one atomic operation per frag, we can factorize them.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

IPv6 netfilter implements its own queue to manage and reassemble fragments,
so you missed this one.

[PATCH 1/2] netfilter: defrag: remove one redundant atomic ops

Instead of doing one atomic operation per frag, we can factorize them.
Reported from Eric Dumazet.
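
As a rough illustration of "factorizing" the atomic operations, the
following userspace sketch compares the two shapes of the loop. It uses C11
atomics instead of the kernel's atomic_t, and struct toy_frag, toy_mem and
the toy_* functions are hypothetical names, not the netfilter code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_frag {
	unsigned int truesize;
	struct toy_frag *next;
};

static atomic_uint toy_mem;		/* hypothetical memory accounting counter */
static unsigned int toy_atomic_ops;	/* demo-only tally, single-threaded use */

static void toy_mem_sub(unsigned int n)
{
	atomic_fetch_sub(&toy_mem, n);
	toy_atomic_ops++;
}

/* Before: one atomic subtraction for the head and one per fragment. */
static unsigned int toy_reasm_per_frag(struct toy_frag *head)
{
	unsigned int total = head->truesize;
	struct toy_frag *fp;

	toy_mem_sub(head->truesize);
	for (fp = head->next; fp; fp = fp->next) {
		total += fp->truesize;
		toy_mem_sub(fp->truesize);
	}
	return total;
}

/* After: accumulate in a local variable, then subtract once. */
static unsigned int toy_reasm_batched(struct toy_frag *head)
{
	unsigned int total = head->truesize;
	struct toy_frag *fp;

	for (fp = head->next; fp; fp = fp->next)
		total += fp->truesize;
	toy_mem_sub(total);
	return total;
}
```

Both variants remove the same total from the counter; the batched one does
it with a single atomic operation regardless of the number of fragments.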

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
 net/ipv6/netfilter/nf_conntrack_reasm.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6fb8901..bc5b86d 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -442,7 +442,6 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 	skb_shinfo(head)->frag_list = head->next;
 	skb_reset_transport_header(head);
 	skb_push(head, head->data - skb_network_header(head));
-	atomic_sub(head->truesize, &nf_init_frags.mem);
 
 	for (fp=head->next; fp; fp = fp->next) {
 		head->data_len += fp->len;
@@ -452,8 +451,8 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
 		else if (head->ip_summed == CHECKSUM_COMPLETE)
 			head->csum = csum_add(head->csum, fp->csum);
 		head->truesize += fp->truesize;
-		atomic_sub(fp->truesize, &nf_init_frags.mem);
 	}
+	atomic_sub(head->truesize, &nf_init_frags.mem);
 
 	head->next = NULL;
 	head->dev = dev;

^ permalink raw reply related

* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Shan Wei @ 2010-06-14 14:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <1276507363.2478.43.camel@edumazet-laptop>

Eric Dumazet wrote, at 06/14/2010 05:22 PM:
> Third param (work) is unused, remove it.
> 
> Remove __inline__ and inline qualifiers.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

we also need to fix IPv6 netfilter.

[PATCH 2/2] netfilter: defrag: kill unused work parameter of frag_kfree_skb()

The parameter (work) is unused, remove it.
Reported from Eric Dumazet.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
 net/ipv6/netfilter/nf_conntrack_reasm.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index bc5b86d..9254008 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -114,10 +114,8 @@ static void nf_skb_free(struct sk_buff *skb)
 }
 
 /* Memory Tracking Functions. */
-static inline void frag_kfree_skb(struct sk_buff *skb, unsigned int *work)
+static void frag_kfree_skb(struct sk_buff *skb)
 {
-	if (work)
-		*work -= skb->truesize;
 	atomic_sub(skb->truesize, &nf_init_frags.mem);
 	nf_skb_free(skb);
 	kfree_skb(skb);
@@ -335,7 +333,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
 				fq->q.fragments = next;
 
 			fq->q.meat -= free_it->len;
-			frag_kfree_skb(free_it, NULL);
+			frag_kfree_skb(free_it);
 		}
 	}
 
-- 
1.6.3.3

^ permalink raw reply related

* Re: mpd client timeouts (bisected) 2.6.35-rc3
From: Michael S. Tsirkin @ 2010-06-14 14:13 UTC (permalink / raw)
  To: David Miller
  Cc: john.r.fastabend, markus, linux-kernel, netdev, yanmin_zhang,
	alex.shi, tim.c.chen
In-Reply-To: <20100613.171318.193707256.davem@davemloft.net>

On Sun, Jun 13, 2010 at 05:13:18PM -0700, David Miller wrote:
> From: John Fastabend <john.r.fastabend@intel.com>
> Date: Sun, 13 Jun 2010 13:36:30 -0700
> 
> > The wcard bit needs to be set in copy_skb_header; otherwise it will
> > not be cleared when called from skb_clone.  The skb then hits the
> > loopback device, gets pushed into the rx path, and is eventually
> > dropped. The following patch fixes this. Hopefully, this is easy and
> > fast enough for you, Dave.
> > 
> > 
> > [PATCH] net: fix deliver_no_wcard regression on loopback device
> > 
> > deliver_no_wcard is not being set in skb_copy_header.
> > In the skb_cloned case it is not being cleared and
> > may cause the skb to be dropped when the loopback device
> > pushes it back up the stack.
> > 
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> Applied, but your email client corrupted this patch in many
> ways.  Please correct this for next time, thanks.

FWIW:

Tested-by: Michael S. Tsirkin <mst@redhat.com>

-- 
MST

^ permalink raw reply

* Re: [PATCH v2] net: deliver skbs on inactive slaves to exact matches
From: Michael S. Tsirkin @ 2010-06-14 14:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: John Fastabend, fubar, davem, nhorman, bonding-devel, netdev
In-Reply-To: <1276522708.2478.89.camel@edumazet-laptop>

On Mon, Jun 14, 2010 at 03:38:28PM +0200, Eric Dumazet wrote:
> Le lundi 14 juin 2010 à 15:34 +0200, Eric Dumazet a écrit :
> 
> > [PATCH] net: fix deliver_no_wcard regression on loopback device
> > 
> > deliver_no_wcard is not being set in skb_copy_header.
> > In the skb_cloned case it is not being cleared and
> > may cause the skb to be dropped when the loopback device
> > pushes it back up the stack.
> > 
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Oh I forgot :
> 
> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>

Grr. I could have saved myself a bit of time if I had guessed
it was related.

> > ---
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 9f07e74..bcf2fa3 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -532,6 +532,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
> >  	new->ip_summed		= old->ip_summed;
> >  	skb_copy_queue_mapping(new, old);
> >  	new->priority		= old->priority;
> > +	new->deliver_no_wcard	= old->deliver_no_wcard;
> >  #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
> >  	new->ipvs_property	= old->ipvs_property;
> >  #endif
> > 
> 

^ permalink raw reply

* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Eric Dumazet @ 2010-06-14 14:18 UTC (permalink / raw)
  To: Shan Wei; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <4C163622.7080003@cn.fujitsu.com>

Le lundi 14 juin 2010 à 22:01 +0800, Shan Wei a écrit :
> Eric Dumazet wrote, at 06/14/2010 05:02 PM:
> > Instead of doing one atomic operation per frag, we can factorize them.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> IPv6 netfilter implements its own queue to manage and reassemble fragments,
> so you missed this one.
> 

Not exactly missed, it's only a different thing :)

I prefer to separate if possible net patches (David) and netfilter ones
(Patrick), because of delay between git trees.

> [PATCH 1/2] netfilter: defrag: remove one redundant atomic ops
> 
> Instead of doing one atomic operation per frag, we can factorize them.
> Reported from Eric Dumazet.
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

> ---
>  net/ipv6/netfilter/nf_conntrack_reasm.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index 6fb8901..bc5b86d 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -442,7 +442,6 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
>  	skb_shinfo(head)->frag_list = head->next;
>  	skb_reset_transport_header(head);
>  	skb_push(head, head->data - skb_network_header(head));
> -	atomic_sub(head->truesize, &nf_init_frags.mem);
>  
>  	for (fp=head->next; fp; fp = fp->next) {
>  		head->data_len += fp->len;
> @@ -452,8 +451,8 @@ nf_ct_frag6_reasm(struct nf_ct_frag6_queue *fq, struct net_device *dev)
>  		else if (head->ip_summed == CHECKSUM_COMPLETE)
>  			head->csum = csum_add(head->csum, fp->csum);
>  		head->truesize += fp->truesize;
> -		atomic_sub(fp->truesize, &nf_init_frags.mem);
>  	}
> +	atomic_sub(head->truesize, &nf_init_frags.mem);
>  
>  	head->next = NULL;
>  	head->dev = dev;



^ permalink raw reply

* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Eric Dumazet @ 2010-06-14 14:20 UTC (permalink / raw)
  To: Shan Wei; +Cc: David Miller, netdev, Patrick McHardy, netfilter-devel
In-Reply-To: <4C163643.1080906@cn.fujitsu.com>

Le lundi 14 juin 2010 à 22:01 +0800, Shan Wei a écrit :
> Eric Dumazet wrote, at 06/14/2010 05:22 PM:
> > Third param (work) is unused, remove it.
> > 
> > Remove __inline__ and inline qualifiers.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> we also need to fix IPv6 netfilter.
> 

well, 'fix' is not appropriate, there is no bug ;)

> [PATCH 2/2] netfilter: defrag: kill unused work parameter of frag_kfree_skb()
> 
> The parameter (work) is unused, remove it.
> Reported from Eric Dumazet.
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

> ---
>  net/ipv6/netfilter/nf_conntrack_reasm.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
> index bc5b86d..9254008 100644
> --- a/net/ipv6/netfilter/nf_conntrack_reasm.c
> +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
> @@ -114,10 +114,8 @@ static void nf_skb_free(struct sk_buff *skb)
>  }
>  
>  /* Memory Tracking Functions. */
> -static inline void frag_kfree_skb(struct sk_buff *skb, unsigned int *work)
> +static void frag_kfree_skb(struct sk_buff *skb)
>  {
> -	if (work)
> -		*work -= skb->truesize;
>  	atomic_sub(skb->truesize, &nf_init_frags.mem);
>  	nf_skb_free(skb);
>  	kfree_skb(skb);
> @@ -335,7 +333,7 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
>  				fq->q.fragments = next;
>  
>  			fq->q.meat -= free_it->len;
> -			frag_kfree_skb(free_it, NULL);
> +			frag_kfree_skb(free_it);
>  		}
>  	}
>  



^ permalink raw reply

* Re: [PATCH net-next-2.6] ip_frag: Remove some atomic ops
From: Patrick McHardy @ 2010-06-14 14:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shan Wei, David Miller, netdev, netfilter-devel
In-Reply-To: <1276525084.2478.92.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le lundi 14 juin 2010 à 22:01 +0800, Shan Wei a écrit :
>   
>> [PATCH 1/2] netfilter: defrag: remove one redundant atomic ops
>>
>> Instead of doing one atomic operation per frag, we can factorize them.
>> Reported from Eric Dumazet.
>>
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>>     
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>   

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipfrag : frag_kfree_skb() cleanup
From: Patrick McHardy @ 2010-06-14 14:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shan Wei, David Miller, netdev, netfilter-devel
In-Reply-To: <1276525212.2478.93.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le lundi 14 juin 2010 à 22:01 +0800, Shan Wei a écrit :
>   
>> [PATCH 2/2] netfilter: defrag: kill unused work parameter of frag_kfree_skb()
>>
>> The parameter (work) is unused, remove it.
>> Reported from Eric Dumazet.
>>
>> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
>>     
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>
>   
Also applied, thanks.

^ permalink raw reply

* [PATCH net-next-2.6] ipv6: avoid two atomics in ipv6_rthdr_rcv()
From: Eric Dumazet @ 2010-06-14 14:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Use __in6_dev_get() instead of in6_dev_get()/in6_dev_put()
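
For readers unfamiliar with the two accessors, a simplified userspace model
of where the two atomics go is sketched below. The toy_* names are
hypothetical, C11 atomics stand in for the kernel refcount, and RCU itself is
not modelled: the "borrowed" variant is only safe because, as in the patch,
the caller is assumed to already be inside an RCU read-side section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_idev {
	atomic_int refcnt;
	int accept_source_route;
};

struct toy_netdev {
	struct toy_idev *ip6_ptr;
};

static unsigned int toy_atomic_ops;	/* demo-only tally, single-threaded use */

/* Modelled on in6_dev_get(): look up the idev and take a reference. */
static struct toy_idev *toy_idev_get(struct toy_netdev *dev)
{
	struct toy_idev *idev = dev->ip6_ptr;

	if (idev) {
		atomic_fetch_add(&idev->refcnt, 1);
		toy_atomic_ops++;
	}
	return idev;
}

/* Modelled on in6_dev_put(): drop the reference again. */
static void toy_idev_put(struct toy_idev *idev)
{
	atomic_fetch_sub(&idev->refcnt, 1);
	toy_atomic_ops++;
}

/* Modelled on __in6_dev_get(): plain pointer read, no reference taken. */
static struct toy_idev *toy_idev_peek(struct toy_netdev *dev)
{
	return dev->ip6_ptr;
}

/* Old shape: get, read one field, put (two atomic ops). */
static int toy_limit_with_ref(struct toy_netdev *dev, int limit)
{
	struct toy_idev *idev = toy_idev_get(dev);

	if (idev) {
		if (limit > idev->accept_source_route)
			limit = idev->accept_source_route;
		toy_idev_put(idev);
	}
	return limit;
}

/* New shape: borrow the pointer for the duration of the read. */
static int toy_limit_borrowed(struct toy_netdev *dev, int limit)
{
	struct toy_idev *idev = toy_idev_peek(dev);

	if (idev && limit > idev->accept_source_route)
		limit = idev->accept_source_route;
	return limit;
}
```

Both helpers compute the same value; the second one simply never touches the
reference count.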

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/exthdrs.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 853a633..262f105 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -312,6 +312,7 @@ static int ipv6_destopt_rcv(struct sk_buff *skb)
   Routing header.
  ********************************/
 
+/* called with rcu_read_lock() */
 static int ipv6_rthdr_rcv(struct sk_buff *skb)
 {
 	struct inet6_skb_parm *opt = IP6CB(skb);
@@ -324,12 +325,9 @@ static int ipv6_rthdr_rcv(struct sk_buff *skb)
 	struct net *net = dev_net(skb->dev);
 	int accept_source_route = net->ipv6.devconf_all->accept_source_route;
 
-	idev = in6_dev_get(skb->dev);
-	if (idev) {
-		if (accept_source_route > idev->cnf.accept_source_route)
-			accept_source_route = idev->cnf.accept_source_route;
-		in6_dev_put(idev);
-	}
+	idev = __in6_dev_get(skb->dev);
+	if (idev && accept_source_route > idev->cnf.accept_source_route)
+		accept_source_route = idev->cnf.accept_source_route;
 
 	if (!pskb_may_pull(skb, skb_transport_offset(skb) + 8) ||
 	    !pskb_may_pull(skb, (skb_transport_offset(skb) +



^ permalink raw reply related

* Re: [PATCH v4] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-14 14:39 UTC (permalink / raw)
  To: netfilter-devel@vger.kernel.org
  Cc: netdev@vger.kernel.org, Jan Engelhardt, Patrick McHardy,
	Timo Teras
In-Reply-To: <1276264913-429-1-git-send-email-luciano.coelho@nokia.com>

On Fri, 2010-06-11 at 16:01 +0200, Coelho Luciano (Nokia-D/Helsinki)
wrote:
> v4: Fixed according to Jan's and Patrick's comments to v3
>     Changed to mutex locking instead of spin locks
>     Save the timer in the target info struct to avoid extra reads
>     Other small clean-ups

Does the patch look okay now? Any further comments?


-- 
Cheers,
Luca.


^ permalink raw reply

* [PATCH net-next-2.6] ipv6: RCU changes in ipv6_get_mtu() and ip6_dst_hoplimit()
From: Eric Dumazet @ 2010-06-14 14:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
and ip6_dst_hoplimit() 

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv6/route.c |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f770285..8f2d040 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1084,11 +1084,11 @@ static int ipv6_get_mtu(struct net_device *dev)
 	int mtu = IPV6_MIN_MTU;
 	struct inet6_dev *idev;
 
-	idev = in6_dev_get(dev);
-	if (idev) {
+	rcu_read_lock();
+	idev = __in6_dev_get(dev);
+	if (idev)
 		mtu = idev->cnf.mtu6;
-		in6_dev_put(idev);
-	}
+	rcu_read_unlock();
 	return mtu;
 }
 
@@ -1097,12 +1097,15 @@ int ip6_dst_hoplimit(struct dst_entry *dst)
 	int hoplimit = dst_metric(dst, RTAX_HOPLIMIT);
 	if (hoplimit < 0) {
 		struct net_device *dev = dst->dev;
-		struct inet6_dev *idev = in6_dev_get(dev);
-		if (idev) {
+		struct inet6_dev *idev;
+
+		rcu_read_lock();
+		idev = __in6_dev_get(dev);
+		if (idev)
 			hoplimit = idev->cnf.hop_limit;
-			in6_dev_put(idev);
-		} else
+		else
 			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
+		rcu_read_unlock();
 	}
 	return hoplimit;
 }



^ permalink raw reply related

* Re: [PATCH v4] netfilter: Xtables: idletimer target implementation
From: Patrick McHardy @ 2010-06-14 14:48 UTC (permalink / raw)
  To: Luciano Coelho; +Cc: netfilter, netdev, Jan Engelhardt, Timo Teras
In-Reply-To: <1276264913-429-1-git-send-email-luciano.coelho@nokia.com>

Luciano Coelho wrote:
> +static int idletimer_tg_create(struct idletimer_tg_info *info)
> +{
> +	int ret;
> +
> +	info->timer = kmalloc(sizeof(*info->timer), GFP_ATOMIC);
> +	if (!info->timer) {
> +		pr_debug("couldn't alloc timer\n");
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	info->timer->attr.attr.name = kstrdup(info->label, GFP_ATOMIC);
>   

These two allocations don't need GFP_ATOMIC AFAICT.

> +	if (!info->timer->attr.attr.name) {
> +		pr_debug("couldn't alloc attribute name\n");
> +		ret = -ENOMEM;
> +		goto out_free_timer;
> +	}
> +	info->timer->attr.attr.mode = S_IRUGO;
> +	info->timer->attr.show = idletimer_tg_show;
> +
> +	ret = sysfs_create_file(idletimer_tg_kobj, &info->timer->attr.attr);
> +	if (ret < 0) {
> +		pr_debug("couldn't add file to sysfs");
> +		goto out_free_attr;
> +	}
> +
> +	list_add(&info->timer->entry, &idletimer_tg_list);
> +
> +	setup_timer(&info->timer->timer, idletimer_tg_expired,
> +		    (unsigned long) info->timer);
> +	info->timer->refcnt = 1;
> +
> +	mod_timer(&info->timer->timer,
> +		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
> +
> +	INIT_WORK(&info->timer->work, idletimer_tg_work);
> +
> +	return 0;
> +
> +out_free_attr:
> +	kfree(info->timer->attr.attr.name);
> +out_free_timer:
> +	kfree(info->timer);
> +out:
> +	return ret;
> +}
> +
> +/*
> + * The actual xt_tables plugin.
> + */
> +static unsigned int idletimer_tg_target(struct sk_buff *skb,
> +					 const struct xt_action_param *par)
> +{
> +	const struct idletimer_tg_info *info = par->targinfo;
> +
> +	pr_debug("resetting timer %s, timeout period %u\n",
> +		 info->label, info->timeout);
> +
> +	mutex_lock(&list_mutex);
>   

You can't take the mutex in the target function. What is it supposed to
protect against?

> +
> +	BUG_ON(!info->timer);
> +
> +	mod_timer(&info->timer->timer,
> +		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
> +
> +	mutex_unlock(&list_mutex);
> +
> +	return XT_CONTINUE;
> +}
> +


^ permalink raw reply

* [PATCH net-next-2.6] be2net: enable ipv6 tso support
From: Ajit Khaparde @ 2010-06-14 14:56 UTC (permalink / raw)
  To: David Miller, netdev

Add IPv6 TSO support to the be2net driver.

Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
---
 drivers/net/benet/be_hw.h   |    2 +-
 drivers/net/benet/be_main.c |    6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/benet/be_hw.h b/drivers/net/benet/be_hw.h
index 063026d..0683967 100644
--- a/drivers/net/benet/be_hw.h
+++ b/drivers/net/benet/be_hw.h
@@ -192,7 +192,7 @@ struct amap_eth_hdr_wrb {
 	u8 event;
 	u8 crc;
 	u8 forward;
-	u8 ipsec;
+	u8 lso6;
 	u8 mgmt;
 	u8 ipcs;
 	u8 udpcs;
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 3225774..01eb447 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -373,10 +373,12 @@ static void wrb_fill_hdr(struct be_eth_hdr_wrb *hdr, struct sk_buff *skb,
 
 	AMAP_SET_BITS(struct amap_eth_hdr_wrb, crc, hdr, 1);
 
-	if (skb_shinfo(skb)->gso_segs > 1 && skb_shinfo(skb)->gso_size) {
+	if (skb_is_gso(skb)) {
 		AMAP_SET_BITS(struct amap_eth_hdr_wrb, lso, hdr, 1);
 		AMAP_SET_BITS(struct amap_eth_hdr_wrb, lso_mss,
 			hdr, skb_shinfo(skb)->gso_size);
+		if (skb_is_gso_v6(skb))
+			AMAP_SET_BITS(struct amap_eth_hdr_wrb, lso6, hdr, 1);
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		if (is_tcp_pkt(skb))
 			AMAP_SET_BITS(struct amap_eth_hdr_wrb, tcpcs, hdr, 1);
@@ -2186,7 +2188,7 @@ static void be_netdev_init(struct net_device *netdev)
 
 	netdev->features |= NETIF_F_SG | NETIF_F_HW_VLAN_RX | NETIF_F_TSO |
 		NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_FILTER | NETIF_F_HW_CSUM |
-		NETIF_F_GRO;
+		NETIF_F_GRO | NETIF_F_TSO6;
 
 	netdev->vlan_features |= NETIF_F_SG | NETIF_F_TSO | NETIF_F_HW_CSUM;
 
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH v4] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-14 14:59 UTC (permalink / raw)
  To: ext Patrick McHardy
  Cc: netfilter@vger.kernel.org, netdev@vger.kernel.org, Jan Engelhardt,
	Timo Teras
In-Reply-To: <4C164152.8080103@trash.net>

On Mon, 2010-06-14 at 16:48 +0200, ext Patrick McHardy wrote:
> Luciano Coelho wrote:
> > +static int idletimer_tg_create(struct idletimer_tg_info *info)
> > +{
> > +	int ret;
> > +
> > +	info->timer = kmalloc(sizeof(*info->timer), GFP_ATOMIC);
> > +	if (!info->timer) {
> > +		pr_debug("couldn't alloc timer\n");
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	info->timer->attr.attr.name = kstrdup(info->label, GFP_ATOMIC);
> >   
> 
> These two allocations don't need GFP_ATOMIC AFAICT.

You're right, I'll fix it in v5.


> 
> > +	if (!info->timer->attr.attr.name) {
> > +		pr_debug("couldn't alloc attribute name\n");
> > +		ret = -ENOMEM;
> > +		goto out_free_timer;
> > +	}
> > +	info->timer->attr.attr.mode = S_IRUGO;
> > +	info->timer->attr.show = idletimer_tg_show;
> > +
> > +	ret = sysfs_create_file(idletimer_tg_kobj, &info->timer->attr.attr);
> > +	if (ret < 0) {
> > +		pr_debug("couldn't add file to sysfs");
> > +		goto out_free_attr;
> > +	}
> > +
> > +	list_add(&info->timer->entry, &idletimer_tg_list);
> > +
> > +	setup_timer(&info->timer->timer, idletimer_tg_expired,
> > +		    (unsigned long) info->timer);
> > +	info->timer->refcnt = 1;
> > +
> > +	mod_timer(&info->timer->timer,
> > +		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
> > +
> > +	INIT_WORK(&info->timer->work, idletimer_tg_work);
> > +
> > +	return 0;
> > +
> > +out_free_attr:
> > +	kfree(info->timer->attr.attr.name);
> > +out_free_timer:
> > +	kfree(info->timer);
> > +out:
> > +	return ret;
> > +}
> > +
> > +/*
> > + * The actual xt_tables plugin.
> > + */
> > +static unsigned int idletimer_tg_target(struct sk_buff *skb,
> > +					 const struct xt_action_param *par)
> > +{
> > +	const struct idletimer_tg_info *info = par->targinfo;
> > +
> > +	pr_debug("resetting timer %s, timeout period %u\n",
> > +		 info->label, info->timeout);
> > +
> > +	mutex_lock(&list_mutex);
> >   
> 
> You can't take the mutex in the target function. What is it supposed to
> protect against?

Hmmmm... I was thinking that info->timer could be freed while this
function is executing.  But I guess the call to this function is already
protected against that, right?

I'll remove the mutex from here.


> > +
> > +	BUG_ON(!info->timer);
> > +
> > +	mod_timer(&info->timer->timer,
> > +		  msecs_to_jiffies(info->timeout * 1000) + jiffies);
> > +
> > +	mutex_unlock(&list_mutex);
> > +
> > +	return XT_CONTINUE;
> > +}
> > +
> 


-- 
Cheers,
Luca.


^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv6: RCU changes in ipv6_get_mtu() and ip6_dst_hoplimit()
From: YOSHIFUJI Hideaki @ 2010-06-14 15:18 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Eric Dumazet, YOSHIFUJI Hideaki
In-Reply-To: <1276526780.2478.101.camel@edumazet-laptop>

(2010/06/14 23:46), Eric Dumazet wrote:
> Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
> and ip6_dst_hoplimit()
>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

--yoshfuji

^ permalink raw reply

* [PATCH] net: Fix error in comment on net_device_ops::ndo_get_stats
From: Ben Hutchings @ 2010-06-14 15:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

ndo_get_stats still returns struct net_device_stats *; there is
no struct net_device_stats64.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 include/linux/netdevice.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d85a38e..4164285 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -671,7 +671,7 @@ struct netdev_rx_queue {
  *	1. Define @ndo_get_stats64 to update a rtnl_link_stats64 structure
  *	   (which should normally be dev->stats64) and return a ponter to
  *	   it. The structure must not be changed asynchronously.
- *	2. Define @ndo_get_stats to update a net_device_stats64 structure
+ *	2. Define @ndo_get_stats to update a net_device_stats structure
  *	   (which should normally be dev->stats) and return a pointer to
  *	   it. The structure may be changed asynchronously only if each
  *	   field is written atomically.
-- 
1.6.2.5

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH net-next-2.6] loopback: Implement 64bit stats on 32bit arches
From: Eric Dumazet @ 2010-06-14 15:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ben Hutchings

Use a seqcount_t to synchronize the stats producer and consumer for the
packets and bytes counters, which are now u64.

(The drops counter is rarely used, so it stays a native "unsigned long".)

No noticeable performance impact on x86, as it only adds two increments
per frame. It might be more expensive on arches where smp_wmb() is not
free.
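
For reference, the sequence-counter protocol can be modelled in userspace
roughly as follows. This is only a sketch with C11 atomics, not the kernel's
seqcount_t: the toy_* names are hypothetical, a single writer is assumed
(the driver gets that from per-cpu data with bh's disabled), and the
counters are accessed with relaxed atomics only to keep the sketch free of
data races under the C11 memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct toy_lstats {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Producer: the "two increments per frame" bracket the counter updates. */
static void toy_lstats_add(struct toy_lstats *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&s->bytes, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&s->packets, memory_order_relaxed);

	/* begin: make the sequence odd, ordered before the data stores */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->bytes, b + len, memory_order_relaxed);
	atomic_store_explicit(&s->packets, p + 1, memory_order_relaxed);

	/* end: make the sequence even again, ordered after the data stores */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Consumer: retry until a snapshot was taken with no update in between. */
static void toy_lstats_fetch_and_add(uint64_t *packets, uint64_t *bytes,
				     struct toy_lstats *s)
{
	uint64_t tpackets, tbytes;
	unsigned int seq0, seq1;

	do {
		seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		tpackets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		tbytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((seq0 & 1) || seq0 != seq1);

	*packets += tpackets;
	*bytes += tbytes;
}
```

The reader never blocks the writer; it only repeats its copy when the
sequence was odd or changed while the snapshot was being taken.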

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/loopback.c |   61 ++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 10 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 72b7949..09334f8 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -60,11 +60,51 @@
 #include <net/net_namespace.h>
 
 struct pcpu_lstats {
-	unsigned long packets;
-	unsigned long bytes;
+	u64 packets;
+	u64 bytes;
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+	seqcount_t seq;
+#endif
 	unsigned long drops;
 };
 
+#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
+static inline void lstats_update_begin(struct pcpu_lstats *lstats)
+{
+	write_seqcount_begin(&lstats->seq);
+}
+static inline void lstats_update_end(struct pcpu_lstats *lstats)
+{
+	write_seqcount_end(&lstats->seq);
+}
+static inline void lstats_fetch_and_add(u64 *packets, u64 *bytes, const struct pcpu_lstats *lstats)
+{
+	u64 tpackets, tbytes;
+	unsigned int seq;
+
+	do {
+		seq = read_seqcount_begin(&lstats->seq);
+		tpackets = lstats->packets;
+		tbytes = lstats->bytes;
+	} while (read_seqcount_retry(&lstats->seq, seq));
+
+	*packets += tpackets;
+	*bytes += tbytes;
+}
+#else
+static inline void lstats_update_begin(struct pcpu_lstats *lstats)
+{
+}
+static inline void lstats_update_end(struct pcpu_lstats *lstats)
+{
+}
+static inline void lstats_fetch_and_add(u64 *packets, u64 *bytes, const struct pcpu_lstats *lstats)
+{
+	*packets += lstats->packets;
+	*bytes += lstats->bytes;
+}
+#endif
+
 /*
  * The higher levels take care of making this non-reentrant (it's
  * called with bh's disabled).
@@ -86,21 +126,23 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 
 	len = skb->len;
 	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
+		lstats_update_begin(lb_stats);
 		lb_stats->bytes += len;
 		lb_stats->packets++;
+		lstats_update_end(lb_stats);
 	} else
 		lb_stats->drops++;
 
 	return NETDEV_TX_OK;
 }
 
-static struct net_device_stats *loopback_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev)
 {
 	const struct pcpu_lstats __percpu *pcpu_lstats;
-	struct net_device_stats *stats = &dev->stats;
-	unsigned long bytes = 0;
-	unsigned long packets = 0;
-	unsigned long drops = 0;
+	struct rtnl_link_stats64 *stats = &dev->stats64;
+	u64 bytes = 0;
+	u64 packets = 0;
+	u64 drops = 0;
 	int i;
 
 	pcpu_lstats = (void __percpu __force *)dev->ml_priv;
@@ -108,8 +150,7 @@ static struct net_device_stats *loopback_get_stats(struct net_device *dev)
 		const struct pcpu_lstats *lb_stats;
 
 		lb_stats = per_cpu_ptr(pcpu_lstats, i);
-		bytes   += lb_stats->bytes;
-		packets += lb_stats->packets;
+		lstats_fetch_and_add(&packets, &bytes, lb_stats);
 		drops   += lb_stats->drops;
 	}
 	stats->rx_packets = packets;
@@ -158,7 +199,7 @@ static void loopback_dev_free(struct net_device *dev)
 static const struct net_device_ops loopback_ops = {
 	.ndo_init      = loopback_dev_init,
 	.ndo_start_xmit= loopback_xmit,
-	.ndo_get_stats = loopback_get_stats,
+	.ndo_get_stats64 = loopback_get_stats64,
 };
 
 /*



^ permalink raw reply related
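
The sequence-counter scheme described in the loopback patch above can be
sketched in userspace C11. This is only an illustration under stated
assumptions, not the kernel implementation: the names lstats_sketch,
sketch_update and sketch_fetch_and_add are hypothetical, C11 atomics stand
in for seqcount_t and the kernel barriers, and the counters are declared
as relaxed atomics only to keep the sketch free of C11 data races (the
patch uses plain u64 fields). A single writer per structure is assumed,
as with the per-cpu stats.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for pcpu_lstats plus its seqcount_t. */
struct lstats_sketch {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Writer side: bump seq to odd, update both counters, bump seq to even. */
static void sketch_update(struct lstats_sketch *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&s->bytes, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&s->packets, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->bytes, b + len, memory_order_relaxed);
	atomic_store_explicit(&s->packets, p + 1, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both counters were read under one even seq. */
static void sketch_fetch_and_add(uint64_t *packets, uint64_t *bytes,
				 struct lstats_sketch *s)
{
	uint64_t tpackets, tbytes;
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		tpackets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		tbytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1) || begin != end);

	*packets += tpackets;
	*bytes += tbytes;
}
```

The writer pays two extra increments per update, which matches the cost
the patch description mentions; the reader only loops when it raced with
an update.
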

* Re: [PATCH 1/4] net/phy/marvell: Expose IDs and flags in a .h and add dns323 LEDs setup flag
From: David Miller @ 2010-06-14 16:13 UTC (permalink / raw)
  To: benh; +Cc: w.sang, netdev, hvr, linux-arm-kernel, nico
In-Reply-To: <1276408955.1962.578.camel@pasglop>

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Sun, 13 Jun 2010 16:02:35 +1000

> On Sun, 2010-06-13 at 07:28 +0200, Wolfram Sang wrote:
>> On Sun, Jun 13, 2010 at 11:10:23AM +1000, Benjamin Herrenschmidt wrote:
>> > This moves the various known Marvell PHY IDs to include/linux/marvell_phy.h
>> > along with dev_flags definitions for use by the driver.
>> > 
>> > I then added a flag that changes the PHY init code to setup the LEDs
>> > config to the values needed to operate a dns323 rev C1 NAS.
>> > 
>> > I moved the existing "resistance" flag to the .h as well, though I've
>> > been unable to find whoever sets this to convert it to use that constant.
>> > 
>> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> 
>> That should do for now.
>> 
>> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
> 
> Thanks.
> 
> Dave, any objection to having that go via the arm tree along with the
> rest of my patches to support the dns323 since they depend on this one ?

No problem:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH v2] ucc_geth: fix for RX skb buffers recycling
From: Sergey Matyukevich @ 2010-06-14 16:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, leoli, avorontsov
In-Reply-To: <20100609.180240.59675642.davem@davemloft.net>

Hello David,

Could you please consider the second, simplified version of the patch
for the ucc_geth driver, which fixes recycling of RX error skb buffers?


This patch resets RX skb buffers properly before recycling them.
Adjusting only skb->data is not enough, because skb->tail and skb->len
are then left with stale values.

Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
---
 drivers/net/ucc_geth.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 4a34833..807470e 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3215,6 +3215,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 					   __func__, __LINE__, (u32) skb);
 			if (skb) {
 				skb->data = skb->head + NET_SKB_PAD;
+				skb->len = 0;
+				skb_reset_tail_pointer(skb);
 				__skb_queue_head(&ugeth->rx_recycle, skb);
 			}
 
-- 
1.6.2.5


^ permalink raw reply related
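
Why resetting only skb->data leaves a recycled buffer inconsistent, as
argued in the ucc_geth patch above, can be seen in a small userspace
model. This is a sketch under assumptions, not kernel code: skb_sketch,
SKETCH_PAD and the sketch_* helpers are hypothetical stand-ins for
struct sk_buff, NET_SKB_PAD, skb_put() and skb_reset_tail_pointer(), and
the tail is modelled as a plain pointer.

```c
#include <stddef.h>

#define SKETCH_PAD 32	/* hypothetical stand-in for NET_SKB_PAD */

/* Hypothetical, heavily reduced model of the sk_buff fields involved. */
struct skb_sketch {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the payload */
	unsigned char *tail;	/* end of the payload */
	unsigned int len;	/* payload length, tail - data */
};

/* Model of appending n received bytes to the buffer. */
static void sketch_put(struct skb_sketch *skb, unsigned int n)
{
	skb->tail += n;
	skb->len += n;
}

/* Old behaviour: only data is rewound, tail and len keep stale values. */
static void sketch_reset_data_only(struct skb_sketch *skb)
{
	skb->data = skb->head + SKETCH_PAD;
}

/* Patched behaviour: data, len and tail all describe an empty buffer. */
static void sketch_reset_for_recycle(struct skb_sketch *skb)
{
	skb->data = skb->head + SKETCH_PAD;
	skb->len = 0;
	skb->tail = skb->data;
}
```

After sketch_reset_data_only() a buffer that held 100 bytes still reports
len == 100, so the next receive would append after stale data; after
sketch_reset_for_recycle() it looks freshly allocated.
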

* Re: New issues in 2.6.35-rc{1,2}
From: Borislav Petkov @ 2010-06-14 16:43 UTC (permalink / raw)
  To: Nico Schottelius; +Cc: LKML, netdev
In-Reply-To: <20100614161904.GE31910@schottelius.org>

From: Nico Schottelius <nico-linux-20100614@schottelius.org>
Date: Mon, Jun 14, 2010 at 06:19:04PM +0200

Ccing netdev.

> Hey devs!
> 
> In both of these versions, the update of the carrier flag seems
> to be "different" than before (linux-2.6.34-08528-gb3f2f6c):
> 
> wlan0 (iwlagn) does not recognize it's disconnected
> and eth0 (e1000e) does only react correctly after I restarted
> dhcpcd (isc):
> 
> [11:20] kr:pm# dhcpcd eth0 
> dhcpcd: version 5.2.2 starting
> dhcpcd: eth0: waiting for carrier
> ^Cdhcpcd: received SIGINT, stopping
> dhcpcd: eth0: removing interface
> [11:20] kr:pm# dhcpcd eth0
> dhcpcd: version 5.2.2 starting
> dhcpcd: eth0: broadcasting for a lease
> dhcpcd: eth0: offered 129.132.102.115 from 129.132.65.12
> dhcpcd: eth0: ignoring offer of 129.132.102.115 from 129.132.57.97
> dhcpcd: eth0: acknowledged 129.132.102.115 from 129.132.65.12
> dhcpcd: eth0: checking for 129.132.102.115
> dhcpcd: eth0: leased 129.132.102.115 for 86400 seconds
> dhcpcd: forking to background
> 
> It's known that iwlagn has problems after suspend/resume,
> but new is that both nics do not notify dhcpcd that the link
> has gone. I'm not sure what / where broke here, just see that
> dhcpcd noticed it before and now doesn't anymore.

Had the same dhcpcd issue here both with e1000e and bnx2 and definitely
after the .35 merge window. dhclient doesn't seem affected.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply

* Re: [PATCH] vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)
From: Pedro Garcia @ 2010-06-14 16:49 UTC (permalink / raw)
  To: netdev; +Cc: Ben Hutchings
In-Reply-To: <1276466190.14011.223.camel@localhost>

On Sun, 13 Jun 2010 22:56:30 +0100, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> I have no particular opinion on this change, but you need to read and
> follow Documentation/SubmittingPatches.
> 
> Ben.

Sorry, this is my first kernel patch and I did not know about it. I am
resubmitting in the correct style and format:

I am using kernel 2.6.26 in a linux box, and I have another box in the
network using 802.1p (priority tagging, but no VLAN).

Without the 8021q module loaded in the kernel, all 802.1p packets are
silently discarded (probably as expected, as the protocol is not loaded in
the kernel).

When I load the 8021q module, these packets are passed to the module, but
they are still discarded because VLAN 0 is not configured.

I think this should not be the default behaviour: VLAN 0 is not really a
VLAN, so it should be treated differently.

I could define VLAN 0 (ip link add link eth0 name eth0.dot1p type vlan
id 0), but then I run into problems with the ARP table: when I ping the
other box, outgoing traffic goes through eth0 but the incoming ARP reply
ends up on eth0.dot1p. In the end I cannot communicate with the box using
802.1p unless every host on the network (the linux box and all others)
uses 802.1p tagging for all traffic, which the spec does not require.

I have developed a patch for vlan_dev.c. If VLAN 0 has not been defined,
a VLAN 0 packet is simply passed back to netif_rx without its VLAN tag, so
the default behaviour is to ignore the tag and accept the packet as if it
were untagged. One can still define a device for VLAN 0 if something
different is desired, so the change is backwards compatible.

Signed-off-by: Pedro Garcia <pedro.netdev@dondevamos.com>
--- net/8021q/vlan_dev.c.orig   2008-07-13 23:51:29.000000000 +0200
+++ net/8021q/vlan_dev.c        2010-06-14 18:07:35.000000000 +0200
@@ -151,6 +151,7 @@ int vlan_skb_recv(struct sk_buff *skb, s
        struct vlan_hdr *vhdr;
        unsigned short vid;
        struct net_device_stats *stats;
+       struct net_device *vlan_dev;
        unsigned short vlan_TCI;

        skb = skb_share_check(skb, GFP_ATOMIC);
@@ -165,11 +166,23 @@ int vlan_skb_recv(struct sk_buff *skb, s
        vid = (vlan_TCI & VLAN_VID_MASK);

        rcu_read_lock();
-       skb->dev = __find_vlan_dev(dev, vid);
-       if (!skb->dev) {
+       vlan_dev = __find_vlan_dev(dev, vid);
+       if (vlan_dev) {
+               skb->dev = vlan_dev;
+       } else if (vid) {
                pr_debug("%s: ERROR: No net_device for VID: %u on dev: %s\n",
                         __func__, (unsigned int)vid, dev->name);
                goto err_unlock;
+       } else {
+               /* 2010-06-13: Pedro Garcia
+                  The packet is VLAN tagged, but VID is 0 and the user has
+                  not defined anything for VLAN 0, so it is a 802.1p packet.
+                  We will just netif_rx it later to the original interface,
+                  but with the skb->proto set to the wrapped proto, so we do
+                  nothing here. */
+
+               pr_debug("%s: INFO: VLAN 0 used as default VLAN on dev: %s\n",
+                        __func__, dev->name);
        }

        skb->dev->last_rx = jiffies;

^ permalink raw reply
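
The receive decision proposed in the VLAN 0 patch above can be summarised
as a small standalone function. This is a sketch, not the kernel code:
the sketch_* names and the verdict enum are hypothetical, and the only
outside fact relied on is the 802.1Q TCI layout (3 priority bits, 1
CFI/DEI bit, 12 VLAN ID bits).

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VID_MASK 0x0fff	/* low 12 bits of the TCI hold the VLAN ID */

/* Hypothetical outcomes of the receive path for a tagged frame. */
enum sketch_verdict {
	SKETCH_TO_VLAN_DEV,		/* a device exists for this VID */
	SKETCH_TO_PARENT_UNTAGGED,	/* priority-tagged frame, VID 0 */
	SKETCH_DROP			/* unknown non-zero VID */
};

/* Extract the VLAN ID from the tag control information field. */
static uint16_t sketch_vid(uint16_t tci)
{
	return tci & SKETCH_VID_MASK;
}

/* Extract the 802.1p priority, carried in the top 3 bits of the TCI. */
static uint8_t sketch_priority(uint16_t tci)
{
	return (uint8_t)(tci >> 13);
}

/*
 * Same ordering as the patch: a configured VLAN device always wins,
 * an unconfigured non-zero VID is dropped, and an unconfigured VID 0
 * is handed to the underlying device as if it were untagged.
 */
static enum sketch_verdict sketch_classify(uint16_t tci, bool vlan_dev_exists)
{
	if (vlan_dev_exists)
		return SKETCH_TO_VLAN_DEV;
	if (sketch_vid(tci) != 0)
		return SKETCH_DROP;
	return SKETCH_TO_PARENT_UNTAGGED;
}
```

Checking for a configured device first is what keeps the change backwards
compatible: a setup that already defines a VLAN 0 device sees no
difference.
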
