Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/5] rfkill-gpio: ACPI support
From: Johannes Berg @ 2013-10-17 14:26 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: John W. Linville, Rhyland Klein, Rafael J. Wysocki, linux-acpi,
	linux-wireless, netdev
In-Reply-To: <1381920823-15403-1-git-send-email-heikki.krogerus@linux.intel.com>

On Wed, 2013-10-16 at 13:53 +0300, Heikki Krogerus wrote:
> Hi,
> 
> The first patches prepare the driver for the support. The last patch
> can then add the support quite easily. With these patches, adding DT
> support later will be quite straight forward if someone needs it.

Applied. I'm totally relying on all the reviewers though, since I have
very little idea of what's going on. If anyone else wants to maintain
the rfkill-gpio driver I'd welcome that :-)

johannes


^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: Wei Liu @ 2013-10-17 14:06 UTC (permalink / raw)
  To: jianhai luan
  Cc: David Vrabel, Jan Beulich, ian.campbell, wei.liu2, xen-devel,
	annie.li, netdev
In-Reply-To: <525FED42.4040608@oracle.com>

On Thu, Oct 17, 2013 at 09:59:30PM +0800, jianhai luan wrote:
[...]
> >>>>If use time_after_eq64(), expire ,next_credit and other member will must
> >>>>be u64.
> >>>Yes, you'll need to store next_credit as a u64 in vif instead of
> >>>calculating it in tx_credit_exceeded from expires (which is only an
> >>>unsigned long).
> >>I know that.  Even we use u64, time_after_eq()  will also do wrong judge
> >>in theory (not in reality because need long long time).
> >If jiffies_64 has millisecond resolution that would be more than
> >500,000,000 years.
> 
> Yes, I agree the fact.
> >
> >>I think the two better fixed way is below:
> >>   - By time_before() to judge if now beyond MAX_ULONG/2
> >This is broken, so no.
> 
> Where is broken?  would you like to help me point it out.

I think David means you didn't actually fix the problem. Your solution is
merely a workaround.

> >
> >David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17 13:59 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FBC7F.9040800@citrix.com>


On 2013-10-17 18:31, David Vrabel wrote:
> On 17/10/13 11:19, jianhai luan wrote:
>> On 2013-10-17 17:15, David Vrabel wrote:
>>> On 17/10/13 10:02, jianhai luan wrote:
>>>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>>>
>>>>>> If netfront sends at a very low rate, the time between subsequent
>>>>>> calls
>>>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>>>> timer_after_eq() will be incorrect.  Credit will not be replenished
>>>>>> and
>>>>>> the guest may become unable to send (e.g., if prior to the long
>>>>>> gap, all
>>>>>> credit was exhausted).
>>>>>>
>>>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>>>> Because
>>>>>> the fact now must be not before than expire, time_before(now, expire)
>>>>>> == true
>>>>>> will verify the scenario.
>>>>>>        time_after_eq(now, next_credit) || time_before (now, expire)
>>>>>>        ==
>>>>>>        !time_in_range_open(now, expire, next_credit)
>>>>> So first of all this must be with a 32-bit netback. And the not
>>>>> coverable gap between activity is well over 240 days long. _If_
>>>>> this really needs dealing with, then why is extending this from
>>>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>>>> change to 64-bit jiffy values, and use time_after_eq64()?
>>>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>>>> MAX_ULONG/2 in 64-bit will need long long time)
>>>>
>>>> I think the gap should be think all environment even now extending 480+.
>>>> if now fall in the gap,  one timer will be pending and replenish will be
>>>> in time.  Please run the attachment test program.
>>>>
>>>> If use time_after_eq64(), expire ,next_credit and other member will must
>>>> be u64.
>>> Yes, you'll need to store next_credit as a u64 in vif instead of
>>> calculating it in tx_credit_exceeded from expires (which is only an
>>> unsigned long).
>> I know that.  Even we use u64, time_after_eq()  will also do wrong judge
>> in theory (not in reality because need long long time).
> If jiffies_64 has millisecond resolution that would be more than
> 500,000,000 years.

Yes, I agree the fact.
>
>> I think the two better fixed way is below:
>>    - By time_before() to judge if now beyond MAX_ULONG/2
> This is broken, so no.

Where is broken?  would you like to help me point it out.
>
> David

^ permalink raw reply

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Eric Dumazet @ 2013-10-17 13:44 UTC (permalink / raw)
  To: Peter Schmitt; +Cc: netdev@vger.kernel.org, Pravin B Shelar
In-Reply-To: <1382007190.35868.YahooMailNeo@web172006.mail.ir2.yahoo.com>

On Thu, 2013-10-17 at 11:53 +0100, Peter Schmitt wrote:
> Hi,
> 
> 
> > 
> > 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the probable
> > fix :
> > 
> > commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
> > ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
> > 
> > The problem is 3.10 do not pull the extra 4 bytes of WCCP
> > 
> > This is now properly done in iptunnel_pull_header()
> > 
> 
> First of all, many thanks for your help on this.
> 
> I tried to apply the fix to the 3.10.16 sources, as I would like to
> stay with the long-term line. Unfortunately it does not apply, as
> there are a lot of dependencies on other patches.
> 
> Do you think there will be a fix for the long-time 3.10 kernel line?
> Or can you guide me on how to apply this fix to the long-term kernel?

3.10 is right in the middle of GRE refactoring, and many bugs are in it.

It might be very painful to get a complete list of patches to backport.

^ permalink raw reply

* Re: [PATCH net-next] fib_trie: Send RTM_DELROUTE when link goes down
From: Sergei Shtylyov @ 2013-10-17 13:43 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: netdev
In-Reply-To: <1382008395-9869-1-git-send-email-kristian.evensen@gmail.com>

Hello.

On 17-10-2013 15:13, Kristian Evensen wrote:

> From: Kristian Evensen <kristian.evensen@gmail.com>

> When a link is set as down using for example "ip link set dev X down", routes
> are deleted, but RTM_DELROUTE messages are not sent to RTNLGRP_IPV4_ROUTE. This
> patch makes trie_flush_list() send a RTM_DELROUTE for each route that is
> removed, and makes rtnelink route deletion behavior consistent across commands.
> The parameter lists for trie_flush_list() and trie_flush_leaf() are expanded to
> include required information otherwise unavailable in trie_flush_list().

> One use case that is simplified by this patch is IPv4 WAN connection monitoring.
> Instead of listening for and handling both RTM_*ROUTE and RTM_*LINK-messages, it
> is sufficient to handle RTM_*ROUTE.

> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>

> ---
>   net/ipv4/fib_trie.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)

> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index ec9a9ef..acd5b3b 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1698,15 +1698,23 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
>   	return 0;
>   }
>
> -static int trie_flush_list(struct list_head *head)
> +static int trie_flush_list(struct list_head *head, u32 tb_id, t_key key,
> +		int plen)

    The continuation line should start right under *struct* on the first line, 
according to the networking coding style.

>   {
>   	struct fib_alias *fa, *fa_node;
>   	int found = 0;
> +	struct nl_info cfg;
> +
> +	memset(&cfg, 0, sizeof(cfg));
>
>   	list_for_each_entry_safe(fa, fa_node, head, fa_list) {
>   		struct fib_info *fi = fa->fa_info;
>
>   		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
> +			cfg.nl_net = fi->fib_net;
> +			rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb_id,
> +					&cfg, 0);

    Here the line should start right under RTM_DELROUTE.

WBR, Sergei

^ permalink raw reply

* Re: [PATCH ipsec v3] xfrm: prevent ipcomp scratch buffer race condition
From: Herbert Xu @ 2013-10-17 13:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michal Kubecek, Steffen Klassert, David S. Miller, netdev
In-Reply-To: <1382017117.2045.146.camel@edumazet-glaptop.roam.corp.google.com>

On Thu, Oct 17, 2013 at 06:38:37AM -0700, Eric Dumazet wrote:
> On Thu, 2013-10-17 at 15:07 +0200, Michal Kubecek wrote:
> > In ipcomp_compress(), sortirq is enabled too early, allowing the
> > per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> > (called on the same CPU in softirq context) between populating
> > the buffer and copying the compressed data to the skb.
> > 
> > v2: as pointed out by Steffen Klassert, if we also move the
> > local_bh_disable() before reading the per-cpu pointers, we can
> > get rid of get_cpu()/put_cpu().
> > 
> > v3: removed ipcomp_decompress part (as explained by Herbert Xu,
> > it cannot be called from process context), get rid of cpu
> > variable (thanks to Eric Dumazet)
> > 
> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> > ---
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH ipsec v3] xfrm: prevent ipcomp scratch buffer race condition
From: Eric Dumazet @ 2013-10-17 13:38 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, netdev
In-Reply-To: <20131017130740.D8741E8A5E@unicorn.suse.cz>

On Thu, 2013-10-17 at 15:07 +0200, Michal Kubecek wrote:
> In ipcomp_compress(), sortirq is enabled too early, allowing the
> per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> (called on the same CPU in softirq context) between populating
> the buffer and copying the compressed data to the skb.
> 
> v2: as pointed out by Steffen Klassert, if we also move the
> local_bh_disable() before reading the per-cpu pointers, we can
> get rid of get_cpu()/put_cpu().
> 
> v3: removed ipcomp_decompress part (as explained by Herbert Xu,
> it cannot be called from process context), get rid of cpu
> variable (thanks to Eric Dumazet)
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Michal Kubecek @ 2013-10-17 13:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Steffen Klassert, David S. Miller, netdev
In-Reply-To: <20131017110420.GA21027@gondor.apana.org.au>

On Thu, Oct 17, 2013 at 07:04:20PM +0800, Herbert Xu wrote:
V> On Thu, Oct 17, 2013 at 01:01:49PM +0200, Michal Kubecek wrote:
> > 
> > Am I wrong?
> 
> I hope so :) Because otherwise xfrm_input will surely dead-lock since
> it uses spin_lock.
> 
> The cases you mentioned should all go through netif_rx which will
> then do the processing in BH context.

You are right. Thank you for the correction.

Michal Kubecek

^ permalink raw reply

* [PATCH ipsec v3] xfrm: prevent ipcomp scratch buffer race condition
From: Michal Kubecek @ 2013-10-17 13:07 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Herbert Xu, David S. Miller, netdev, Eric Dumazet
In-Reply-To: <20131017095502.GD7660@secunet.com>

In ipcomp_compress(), sortirq is enabled too early, allowing the
per-cpu scratch buffer to be rewritten by ipcomp_decompress()
(called on the same CPU in softirq context) between populating
the buffer and copying the compressed data to the skb.

v2: as pointed out by Steffen Klassert, if we also move the
local_bh_disable() before reading the per-cpu pointers, we can
get rid of get_cpu()/put_cpu().

v3: removed ipcomp_decompress part (as explained by Herbert Xu,
it cannot be called from process context), get rid of cpu
variable (thanks to Eric Dumazet)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 net/xfrm/xfrm_ipcomp.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index 2906d52..3be02b6 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -141,14 +141,14 @@ static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb)
 	const int plen = skb->len;
 	int dlen = IPCOMP_SCRATCH_SIZE;
 	u8 *start = skb->data;
-	const int cpu = get_cpu();
-	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
-	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
+	struct crypto_comp *tfm;
+	u8 *scratch;
 	int err;
 
 	local_bh_disable();
+	scratch = *this_cpu_ptr(ipcomp_scratches);
+	tfm = *this_cpu_ptr(ipcd->tfms);
 	err = crypto_comp_compress(tfm, start, plen, scratch, &dlen);
-	local_bh_enable();
 	if (err)
 		goto out;
 
@@ -158,13 +158,13 @@ static int ipcomp_compress(struct xfrm_state *x, struct sk_buff *skb)
 	}
 
 	memcpy(start + sizeof(struct ip_comp_hdr), scratch, dlen);
-	put_cpu();
+	local_bh_enable();
 
 	pskb_trim(skb, dlen + sizeof(struct ip_comp_hdr));
 	return 0;
 
 out:
-	put_cpu();
+	local_bh_enable();
 	return err;
 }
 
-- 
1.8.1.4

^ permalink raw reply related

* Re: [PATCH v2 net 4/4] bridge: Fix updating FDB entries when the PVID is applied
From: Toshiaki Makita @ 2013-10-17 12:52 UTC (permalink / raw)
  To: vyasevic
  Cc: Stephen Hemminger, David S . Miller, netdev, Toshiaki Makita,
	Fernando Luis Vazquez Cao
In-Reply-To: <525EBABF.6030304@redhat.com>

On Wed, 2013-10-16 at 12:11 -0400, Vlad Yasevich wrote:
> On 10/16/2013 11:57 AM, Stephen Hemminger wrote:
> > On Wed, 16 Oct 2013 17:07:16 +0900
> > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> >
> >> We currently set the value that variable vid is pointing, which will be
> >> used in FDB later, to 0 at br_allowed_ingress() when we receive untagged
> >> or priority-tagged frames, even though the PVID is valid.
> >> This leads to FDB updates in such a wrong way that they are learned with
> >> VID 0.
> >> Update the value to that of PVID if the PVID is applied.
> >>
> >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> >> Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
> >> ---
> >>   net/bridge/br_vlan.c | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> >> index 5a9c44a..53f0990 100644
> >> --- a/net/bridge/br_vlan.c
> >> +++ b/net/bridge/br_vlan.c
> >> @@ -217,6 +217,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
> >>   		/* PVID is set on this port.  Any untagged or priority-tagged
> >>   		 * ingress frame is considered to belong to this vlan.
> >>   		 */
> >> +		*vid = pvid;
> >>   		if (likely(err))
> >>   			/* Untagged Frame. */
> >>   			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), pvid);
> >
> >
> > Ok, but side-effects seem like an indication of poor code logic
> > flow design. Not your fault but part of the the per-vlan filtering code.
> >
> 
> I'll see if I can re-work the code to get rid of the side-effects.

I'm thinking br_allowed_ingress() might have too many roles.
- Determine whether an incoming frame is acceptable.
- Update skb->vlan_tci if PVID is applied.
- Update the argument 'vid'.

Besides, 'vid' is actually updated by not br_allowed_ingress() but
br_vlan_get_tag().

I think this complicated implementation could lead to missing expected
code for updating vid.

At least we can remove the third role from br_allowed_ingress() because
the required vid is recorded in skb->vlan_tci when we exit the function.

So, we can write the caller of br_allowed_ingress() like
	...
	if (!br_allowed_ingress(br, v, skb))
		goto drop;
	vid = br_vlan_get_tag(skb);

(Assuming br_vlan_get_tag() has been changed to return vid.)

However, this will require br_vlan_get_tag() to check br->vlan_enabled.
Does this change reduce complexity of current implementation?

BTW, some codes in mdb, such as br_multicast_ipv4_rcv(), seem to call
br_vlan_get_tag() without checking br->vlan_enabled.
Is this right way?
Or does br_vlan_get_tag() originally need to check br->vlan_enabled?

Thanks,

Toshiaki Makita

> 
> -vlad

^ permalink raw reply

* Re: sun4i_handle_irq: WARNING at net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert
From: Maxime Ripard @ 2013-10-17 12:32 UTC (permalink / raw)
  To: Richard Genoud
  Cc: Stefan Roese, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <CACQ1gAinSoTCTtf9Y6TL=vtEm1jnJgis0n=y=frNtuPxue9-ng@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

Hi Richard

On Thu, Oct 17, 2013 at 01:46:49PM +0200, Richard Genoud wrote:
> > Hmmm, that seems pretty far from the network/interrupt drivers, and it
> > looks like other users have seen this on !ARM machines:
> > http://forums.gentoo.org/viewtopic-p-7379928.html
> > https://bugzilla.redhat.com/show_bug.cgi?id=989251
> >
> > There's a patch in the redhat's bugzilla, could you try to apply it and
> > see if it solves your problem?
> >
> > Maybe the netdev guys will have other ideas as well.
> For the record, I set:
> net.ipv4.tcp_early_retrans = 2
> net.ipv4.tcp_frto = 0
> And I didn't see the warning any more.
> It's seems that a patch is on it's way. I'll try it instead of setting
> the sysctl.
> http://thread.gmane.org/gmane.linux.network/286793

Ah, I wanted to ask you about the status of this.

Thanks for the follow-up,
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH v2 net 2/4] bridge: Apply the PVID to priority-tagged frames
From: Toshiaki Makita @ 2013-10-17 12:14 UTC (permalink / raw)
  To: vyasevic
  Cc: Stephen Hemminger, David S . Miller, netdev, Toshiaki Makita,
	Fernando Luis Vazquez Cao
In-Reply-To: <525EBBC9.8050809@redhat.com>

On Wed, 2013-10-16 at 12:16 -0400, Vlad Yasevich wrote:
> On 10/16/2013 11:55 AM, Stephen Hemminger wrote:
> > On Wed, 16 Oct 2013 17:07:14 +0900
> > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> >
> >> IEEE 802.1Q says that when we receive priority-tagged (VID 0) frames
> >> use the PVID for the port as its VID.
> >> (See IEEE 802.1Q-2011 6.9.1 and Table 9-2)
> >>
> >> Apply the PVID to not only untagged frames but also priority-tagged frames.
> >>
> >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> >> ---
> >>   net/bridge/br_vlan.c | 27 ++++++++++++++++++++-------
> >>   1 file changed, 20 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> >> index 21b6d21..5a9c44a 100644
> >> --- a/net/bridge/br_vlan.c
> >> +++ b/net/bridge/br_vlan.c
> >> @@ -189,6 +189,8 @@ out:
> >>   bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
> >>   			struct sk_buff *skb, u16 *vid)
> >>   {
> >> +	int err;
> >> +
> >>   	/* If VLAN filtering is disabled on the bridge, all packets are
> >>   	 * permitted.
> >>   	 */
> >> @@ -201,20 +203,31 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
> >>   	if (!v)
> >>   		return false;
> >>
> >> -	if (br_vlan_get_tag(skb, vid)) {
> >> +	err = br_vlan_get_tag(skb, vid);
> >> +	if (!*vid) {
> >>   		u16 pvid = br_get_pvid(v);
> >
> > Ok, but it looks like br_vlan_get_tag() could be cleaner if it just returned
> > the tag, and there was another br_vlan_tag_present() function.

Thank you for reviewing.
I agree with you.
I had been afraid that if it affects other codes because
br_vlan_get_tag() is used in many places else, but now I have decided
not to hesitate to change its signature and behavior.

> 
> I was just thinking about that as well.  If we make br_vlan_get_tag()
> return either the actual tag (if the packet is tagged), or the pvid
> if (untagged/prio_tagged), then we can skp most of this.

Hmm... maybe I don't fully understand you.

Is what you intend something like
	br_allowed_ingress(...) {
		...
		vid = br_vlan_get_tag(skb, v);
		if (!tagged(skb)) put_tag(skb, vid); /* untagged */
		else if (!get_vid(skb)) update_vid(skb, vid); /* prio_tagged */
		...
	}

	br_vlan_get_tag(skb, v) {
		if (tagged(skb)) {
			vid = get_vid(skb);
			if (!vid) return get_pvid(v); /* prio_tagged */
			return vid;
		}
		return get_pvid(v); /* untagged */
	}

This needs double check for prio_tagged at br_allowed_ingress() and
br_vlan_get_tag().

Or if we modify skb->vlan_tci at br_vlan_get_tag(), isn't it a little
dangerous to other codes that use this function in order to just get
vid?

I am thinking it makes things simple that br_vlan_get_tag() returns 0 if
(untagged/prio_tagged).

	br_allowed_ingress(...) {
		...
		vid = br_vlan_get_tag(skb);
		if (!vid) {
			vid = get_pvid(v);
			if (!tagged(skb)) put_tag(skb, vid);/* untagged */
			else update_vid(skb, vid); /* prio_tagged */
		}
		...
	}

	br_vlan_get_tag(skb) {
		if (tagged(skb)) return get_vid(skb);
		return 0;
	}

Thanks,

Toshiaki Makita

> 
> >
> > Also, does this still work if CONFIG_BRIDGE_VLAN_FILTERING is disabled?
> 
> Yes.  br_allowed_ingress becomes an inline if the config option is disabled.
> 
> -vlad

^ permalink raw reply

* Re: [PATCH] X.25: Fix address field length calculation
From: Kelleter, Günther @ 2013-10-17 12:09 UTC (permalink / raw)
  To: Andrew Hendry, David Laight
  Cc: Joe Perches, David Miller, linux-x25@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <CADo0ohh7jZhc_WJFkrYYxoYza8ZeSEadzwgwabJWwQ1TucdCcg@mail.gmail.com>


E.g. called address 7 digits and caller address 3 digits. Called DCE
answering without facilities
gives us this packet (hex):

37 12 34 56 71 23 00

then x25_parse_address_block()  tries to pull 1+7+3 = 11 bytes from the
packet (with pskb_may_pull())
which only has 7 bytes.
When facilities are included the wrong calculated length has no effect
since the facilities make this packet long enough to make pskb_may_pull
with wrong number
of bytes succeed. later x25_addr_ntoa() correctly pulls 6 bytes for
addresses from the packet.


Am 17.10.2013 13:02, schrieb Andrew Hendry:
> Sorry for the previous html mail.
> This appears to be correct, what length addresses are you getting back
> in the call accept when this happens?
>
> On Wed, Oct 16, 2013 at 7:56 PM, David Laight <David.Laight@aculab.com> wrote:
>>> On Tue, 2013-10-15 at 14:29 +0000, Kelleter, Günther wrote:
>>>> Addresses are BCD encoded, not ASCII. x25_addr_ntoa got it right.
>>> []
>>>> Wrong length calculation leads to rejection of CALL ACCEPT packets.
>>> []
>>>> diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
>>> []
>>>> @@ -98,7 +98,7 @@ int x25_parse_address_block(struct sk_buff *skb,
>>>>     }
>>>>     len = *skb->data;
>>>> -   needed = 1 + (len >> 4) + (len & 0x0f);
>>>> +   needed = 1 + ((len >> 4) + (len & 0x0f) + 1) / 2;
>>> This calculation looks odd.
>> Looks correct to me...
>> In X.25 the lengths (in digits) of the called and calling addresses
>> are encoded in the high and low nibbles of one byte and then
>> followed by both addresses with a digit in each nibble.
>> If the length of the first address is odd, the second one
>> isn't byte aligned.
>>
>>         David
>>
>>
>>


-- 

^ permalink raw reply

* Re: [PATCH 5/5] net: rfkill: gpio: add ACPI support
From: Rafael J. Wysocki @ 2013-10-17 11:48 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Heikki Krogerus, John W. Linville, Johannes Berg, Rhyland Klein,
	linux-acpi, linux-wireless, netdev
In-Reply-To: <20131017074426.GG3521@intel.com>

On Thursday, October 17, 2013 10:44:26 AM Mika Westerberg wrote:
> On Wed, Oct 16, 2013 at 10:55:01PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, October 16, 2013 01:53:43 PM Heikki Krogerus wrote:
> > > Including ACPI ID for Broadcom GPS receiver BCM4752.
> > > 
> > > Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> > > ---
> > >  net/rfkill/rfkill-gpio.c | 31 ++++++++++++++++++++++++++++++-
> > >  1 file changed, 30 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c
> > > index 2dd78c6..5620d3c 100644
> > > --- a/net/rfkill/rfkill-gpio.c
> > > +++ b/net/rfkill/rfkill-gpio.c
> > > @@ -24,6 +24,8 @@
> > >  #include <linux/platform_device.h>
> > >  #include <linux/clk.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/acpi.h>
> > > +#include <linux/acpi_gpio.h>
> > >  
> > >  #include <linux/rfkill-gpio.h>
> > >  
> > > @@ -70,6 +72,23 @@ static const struct rfkill_ops rfkill_gpio_ops = {
> > >  	.set_block = rfkill_gpio_set_power,
> > >  };
> > >  
> > > +static int rfkill_gpio_acpi_probe(struct device *dev,
> > > +				  struct rfkill_gpio_data *rfkill)
> > > +{
> > > +	const struct acpi_device_id *id;
> > > +
> > > +	id = acpi_match_device(dev->driver->acpi_match_table, dev);
> > > +	if (!id)
> > > +		return -ENODEV;
> > > +
> > > +	rfkill->name = dev_name(dev);
> > > +	rfkill->type = (unsigned)id->driver_data;
> > > +	rfkill->reset_gpio = acpi_get_gpio_by_index(dev, 0, NULL);
> > > +	rfkill->shutdown_gpio = acpi_get_gpio_by_index(dev, 1, NULL);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static int rfkill_gpio_probe(struct platform_device *pdev)
> > >  {
> > >  	struct rfkill_gpio_platform_data *pdata = pdev->dev.platform_data;
> > > @@ -82,7 +101,11 @@ static int rfkill_gpio_probe(struct platform_device *pdev)
> > >  	if (!rfkill)
> > >  		return -ENOMEM;
> > >  
> > > -	if (pdata) {
> > > +	if (ACPI_HANDLE(&pdev->dev)) {
> > > +		ret = rfkill_gpio_acpi_probe(&pdev->dev, rfkill);
> > > +		if (ret)
> > > +			return ret;
> > > +	} else if (pdata) {
> > >  		clk_name = pdata->power_clk_name;
> > >  		rfkill->name = pdata->name;
> > >  		rfkill->type = pdata->type;
> > > @@ -170,12 +193,18 @@ static int rfkill_gpio_remove(struct platform_device *pdev)
> > >  	return 0;
> > >  }
> > >  
> > > +static const struct acpi_device_id rfkill_acpi_match[] = {
> > > +	{ "BCM4752", RFKILL_TYPE_GPS },
> > > +	{ },
> > > +};
> > > +
> > >  static struct platform_driver rfkill_gpio_driver = {
> > >  	.probe = rfkill_gpio_probe,
> > >  	.remove = rfkill_gpio_remove,
> > >  	.driver = {
> > >  		.name = "rfkill_gpio",
> > >  		.owner = THIS_MODULE,
> > > +		.acpi_match_table = ACPI_PTR(rfkill_acpi_match),
> > >  	},
> > >  };
> > 
> > Looks good to me.
> > 
> > Has Mika seen this?
> 
> Yes, saw it now and looks good to me as well.
> 
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> for the whole series, for what it's worth.

OK, so

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

for the ACPI-related changes.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: sun4i_handle_irq: WARNING at net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert
From: Richard Genoud @ 2013-10-17 11:46 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Stefan Roese, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20130823182116.GE1230@lukather>

2013/8/23 Maxime Ripard <maxime.ripard@free-electrons.com>:
> Hi Richard,
>
> On Wed, Aug 21, 2013 at 11:47:20AM +0200, Richard GENOUD wrote:
>> Hi Maxime, Stephan
>>
>> I just realise that, *sometimes*, I have some warnings on my cubieboard
>> (6 since the 22 of july, and the board is runnning 24/24).
>
> Wow, I'm impressed it worked this fine actually :)
>
>> It has happened also on 3.10 + emac patches.
>> Here it is:
>> [27224.060000] ------------[ cut here ]------------
>> [27224.070000] WARNING: CPU: 0 PID: 0 at net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert+0x717/0x734()
>> [27224.080000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.11.0-rc6-00010-g54f11ec #8
>> [27224.080000] [<c000f231>] (unwind_backtrace+0x1/0x98) from [<c000e2b7>] (show_stack+0xb/0xc)
>> [27224.090000] [<c000e2b7>] (show_stack+0xb/0xc) from [<c0013cd3>] (warn_slowpath_common+0x43/0x5c)
>> [27224.100000] [<c0013cd3>] (warn_slowpath_common+0x43/0x5c) from [<c0013d6d>] (warn_slowpath_null+0x11/0x14)
>> [27224.110000] [<c0013d6d>] (warn_slowpath_null+0x11/0x14) from [<c015eadf>] (tcp_fastretrans_alert+0x717/0x734)
>> [27224.120000] [<c015eadf>] (tcp_fastretrans_alert+0x717/0x734) from [<c015f32b>] (tcp_ack+0x6af/0x984)
>> [27224.130000] [<c015f32b>] (tcp_ack+0x6af/0x984) from [<c015fcab>] (tcp_rcv_established+0x93/0x3ec)
>> [27224.140000] [<c015fcab>] (tcp_rcv_established+0x93/0x3ec) from [<c0165621>] (tcp_v4_do_rcv+0xa1/0x160)
>> [27224.150000] [<c0165621>] (tcp_v4_do_rcv+0xa1/0x160) from [<c0167129>] (tcp_v4_rcv+0x4e9/0x500)
>> [27224.160000] [<c0167129>] (tcp_v4_rcv+0x4e9/0x500) from [<c015025d>] (ip_local_deliver_finish+0x79/0x180)
>> [27224.170000] [<c015025d>] (ip_local_deliver_finish+0x79/0x180) from [<c01500a1>] (ip_rcv_finish+0xc9/0x20c)
>> [27224.180000] [<c01500a1>] (ip_rcv_finish+0xc9/0x20c) from [<c0129493>] (__netif_receive_skb_core+0x287/0x3dc)
>> [27224.190000] [<c0129493>] (__netif_receive_skb_core+0x287/0x3dc) from [<c0129a8b>] (process_backlog+0x5b/0xd0)
>> [27224.200000] [<c0129a8b>] (process_backlog+0x5b/0xd0) from [<c012bc6f>] (net_rx_action+0x5f/0xe8)
>> [27224.200000] [<c012bc6f>] (net_rx_action+0x5f/0xe8) from [<c0015fc9>] (__do_softirq+0x91/0x134)
>> [27224.210000] [<c0015fc9>] (__do_softirq+0x91/0x134) from [<c0016251>] (irq_exit+0x59/0x80)
>> [27224.220000] [<c0016251>] (irq_exit+0x59/0x80) from [<c000d00d>] (handle_IRQ+0x21/0x54)
>> [27224.230000] [<c000d00d>] (handle_IRQ+0x21/0x54) from [<c00083ef>] (sun4i_handle_irq+0x1f/0x30)
>> [27224.240000] [<c00083ef>] (sun4i_handle_irq+0x1f/0x30) from [<c000ea1b>] (__irq_svc+0x3b/0x5c)
>> [27224.250000] Exception stack(0xc0287f70 to 0xc0287fb8)
>> [27224.250000] 7f60:                                     00000000 00000000 00000000 00000000
>> [27224.260000] 7f80: c0286000 c02ad7fd 00000001 c02ad7fd c028e084 413fc082 00000000 00000000
>> [27224.270000] 7fa0: 01000000 c0287fb8 c000d115 c000d116 40000033 ffffffff
>> [27224.270000] [<c000ea1b>] (__irq_svc+0x3b/0x5c) from [<c000d116>] (arch_cpu_idle+0x16/0x1c)
>> [27224.280000] [<c000d116>] (arch_cpu_idle+0x16/0x1c) from [<c002e69d>] (cpu_startup_entry+0x35/0xa0)
>> [27224.290000] [<c002e69d>] (cpu_startup_entry+0x35/0xa0) from [<c026f75f>] (start_kernel+0x1af/0x1f0)
>> [27224.300000] ---[ end trace 15641276e08fba8a ]---
>
> Hmmm, that seems pretty far from the network/interrupt drivers, and it
> looks like other users have seen this on !ARM machines:
> http://forums.gentoo.org/viewtopic-p-7379928.html
> https://bugzilla.redhat.com/show_bug.cgi?id=989251
>
> There's a patch in the redhat's bugzilla, could you try to apply it and
> see if it solves your problem?
>
> Maybe the netdev guys will have other ideas as well.
For the record, I set:
net.ipv4.tcp_early_retrans = 2
net.ipv4.tcp_frto = 0
And I didn't see the warning any more.
It's seems that a patch is on it's way. I'll try it instead of setting
the sysctl.
http://thread.gmane.org/gmane.linux.network/286793

Richard.

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Antonio Quartulli @ 2013-10-17 11:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Antonio Quartulli, David S. Miller, netdev@vger.kernel.org,
	Stephen Hemminger
In-Reply-To: <20131017112857.GA11318@localhost>

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

On Thu, Oct 17, 2013 at 04:28:57AM -0700, Pablo Neira Ayuso wrote:
> Hi,
> > +
> > +/**
> > + * br_netfilter_skb_free - clean the NF bridge data in an skb
> > + * @skb: the skb which the data to free belongs to
> > + */
> > +void br_netfilter_skb_free(struct sk_buff *skb)
> > +{
> > +	nf_bridge_put(skb->nf_bridge);
> > +	skb->nf_bridge = NULL;
> > +}
> 
> This should be nf_reset.

You think I should directly use nf_reset instead of this function?

I see that nf_reset() cleans up the conntrack part too: does it also become
useless once the packet exits the bridge interface?


Cheers,

-- 
Antonio Quartulli

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Pablo Neira Ayuso @ 2013-10-17 11:28 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: David S. Miller, netdev, Antonio Quartulli, Stephen Hemminger
In-Reply-To: <1381791096-3561-1-git-send-email-antonio@meshcoding.com>

Hi,

On Tue, Oct 15, 2013 at 12:51:36AM +0200, Antonio Quartulli wrote:
> From: Antonio Quartulli <antonio@open-mesh.com>
> 
> Even if it is forbidden to enslave a bridge interface into
> another one, it is still possible to create a chain of
> virtual interfaces including two distinct bridges.
>
> In this case, the skb entering the second bridge could have
> the nf_bridge field already set due to a previous operation
> and consequently lead to wrong a processing of the packet
> itself.
> 
> To prevent this behaviour release and set to NULL the
> nf_bridge field of the skb when forwarding the packet.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
> ---
> 
> I know that not using "extern" when declaring the prototype is not consistent
> with the surrounding code, but checkpatch complaints about using "extern" in .h
> file and I prefer to not do something "wrong" even if stylistically ugly.
> 
> Cheers,
> 
> 
>  net/bridge/br_forward.c   |  2 ++
>  net/bridge/br_netfilter.c | 10 ++++++++++
>  net/bridge/br_private.h   |  2 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 4b81b14..62955f3 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -49,6 +49,8 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
>  	} else {
>  		skb_push(skb, ETH_HLEN);
>  		br_drop_fake_rtable(skb);
> +		br_netfilter_skb_free(skb);
> +
>  		dev_queue_xmit(skb);
>  	}
>  
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index f877362..7cad3e2 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -1086,3 +1086,13 @@ void br_netfilter_fini(void)
>  #endif
>  	dst_entries_destroy(&fake_dst_ops);
>  }
> +
> +/**
> + * br_netfilter_skb_free - clean the NF bridge data in an skb
> + * @skb: the skb which the data to free belongs to
> + */
> +void br_netfilter_skb_free(struct sk_buff *skb)
> +{
> +	nf_bridge_put(skb->nf_bridge);
> +	skb->nf_bridge = NULL;
> +}

This should be nf_reset.

^ permalink raw reply

* [PATCH net-next] fib_trie: Send RTM_DELROUTE when link goes down
From: Kristian Evensen @ 2013-10-17 11:13 UTC (permalink / raw)
  To: netdev; +Cc: Kristian Evensen

From: Kristian Evensen <kristian.evensen@gmail.com>

When a link is set as down using for example "ip link set dev X down", routes
are deleted, but RTM_DELROUTE messages are not sent to RTNLGRP_IPV4_ROUTE. This
patch makes trie_flush_list() send a RTM_DELROUTE for each route that is
removed, and makes rtnelink route deletion behavior consistent across commands.
The parameter lists for trie_flush_list() and trie_flush_leaf() are expanded to
include required information otherwise unavailable in trie_flush_list().

One use case that is simplified by this patch is IPv4 WAN connection monitoring.
Instead of listening for and handling both RTM_*ROUTE and RTM_*LINK-messages, it
is sufficient to handle RTM_*ROUTE.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>

---
 net/ipv4/fib_trie.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index ec9a9ef..acd5b3b 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1698,15 +1698,23 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }
 
-static int trie_flush_list(struct list_head *head)
+static int trie_flush_list(struct list_head *head, u32 tb_id, t_key key,
+		int plen)
 {
 	struct fib_alias *fa, *fa_node;
 	int found = 0;
+	struct nl_info cfg;
+
+	memset(&cfg, 0, sizeof(cfg));
 
 	list_for_each_entry_safe(fa, fa_node, head, fa_list) {
 		struct fib_info *fi = fa->fa_info;
 
 		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+			cfg.nl_net = fi->fib_net;
+			rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb_id,
+					&cfg, 0);
+
 			list_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
@@ -1716,7 +1724,7 @@ static int trie_flush_list(struct list_head *head)
 	return found;
 }
 
-static int trie_flush_leaf(struct leaf *l)
+static int trie_flush_leaf(struct leaf *l, u32 tb_id)
 {
 	int found = 0;
 	struct hlist_head *lih = &l->list;
@@ -1724,7 +1732,7 @@ static int trie_flush_leaf(struct leaf *l)
 	struct leaf_info *li = NULL;
 
 	hlist_for_each_entry_safe(li, tmp, lih, hlist) {
-		found += trie_flush_list(&li->falh);
+		found += trie_flush_list(&li->falh, tb_id, l->key, li->plen);
 
 		if (list_empty(&li->falh)) {
 			hlist_del_rcu(&li->hlist);
@@ -1813,7 +1821,7 @@ int fib_table_flush(struct fib_table *tb)
 	int found = 0;
 
 	for (l = trie_firstleaf(t); l; l = trie_nextleaf(l)) {
-		found += trie_flush_leaf(l);
+		found += trie_flush_leaf(l, tb->tb_id);
 
 		if (ll && hlist_empty(&ll->list))
 			trie_leaf_remove(t, ll);
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Herbert Xu @ 2013-10-17 11:04 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Steffen Klassert, David S. Miller, netdev
In-Reply-To: <20131017110149.GA6747@unicorn.suse.cz>

On Thu, Oct 17, 2013 at 01:01:49PM +0200, Michal Kubecek wrote:
>
> When I was thinking about this, I came with two situations when this
> might IMHO happen:
> 
> 1. We are sending a packet to ourselves; this could mean not only a
> local loop but also e.g. a packets between an LXC container and the host
> or between two LXC containers. While it doesn't make much sense to use
> compression in such case, it might happen while testing a solution which
> is going to be used in a normal setup later.
> 
> 2. The packet was passed to tun/tap device by a userspace application.
> 
> Am I wrong?

I hope so :) Because otherwise xfrm_input will surely dead-lock since
it uses spin_lock.

The cases you mentioned should all go through netif_rx which will
then do the processing in BH context.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] X.25: Fix address field length calculation
From: Andrew Hendry @ 2013-10-17 11:02 UTC (permalink / raw)
  To: David Laight
  Cc: Joe Perches, Kelleter, Günther, David Miller, linux-x25,
	netdev, linux-kernel
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B738F@saturn3.aculab.com>

Sorry for the previous html mail.
This appears to be correct, what length addresses are you getting back
in the call accept when this happens?

On Wed, Oct 16, 2013 at 7:56 PM, David Laight <David.Laight@aculab.com> wrote:
>> On Tue, 2013-10-15 at 14:29 +0000, Kelleter, Günther wrote:
>> > Addresses are BCD encoded, not ASCII. x25_addr_ntoa got it right.
>> []
>> > Wrong length calculation leads to rejection of CALL ACCEPT packets.
>> []
>> > diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
>> []
>> > @@ -98,7 +98,7 @@ int x25_parse_address_block(struct sk_buff *skb,
>> >     }
>> >     len = *skb->data;
>> > -   needed = 1 + (len >> 4) + (len & 0x0f);
>> > +   needed = 1 + ((len >> 4) + (len & 0x0f) + 1) / 2;
>>
>> This calculation looks odd.
>
> Looks correct to me...
> In X.25 the lengths (in digits) of the called and calling addresses
> are encoded in the high and low nibbles of one byte and then
> followed by both addresses with a digit in each nibble.
> If the length of the first address is odd, the second one
> isn't byte aligned.
>
>         David
>
>
>

^ permalink raw reply

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Peter Schmitt @ 2013-10-17 10:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org, Pravin B Shelar
In-Reply-To: <1381942996.2045.135.camel@edumazet-glaptop.roam.corp.google.com>

Hi,


> 
> 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the probable
> fix :
> 
> commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
> ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
> 
> The problem is 3.10 do not pull the extra 4 bytes of WCCP
> 
> This is now properly done in iptunnel_pull_header()
> 

First of all, many thanks for your help on this.

I tried to apply the fix to the 3.10.16 sources, as I would like to stay with the long-term line. Unfortunately it does not apply, as there are a lot of dependencies on other patches.

Do you think there will be a fix for the long-time 3.10 kernel line? Or can you guide me on how to apply this fix to the long-term kernel?

Best regards,
Peter

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Michal Kubecek @ 2013-10-17 11:01 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Steffen Klassert, David S. Miller, netdev
In-Reply-To: <20131016123205.GA9982@gondor.apana.org.au>

On Wed, Oct 16, 2013 at 08:32:05PM +0800, Herbert Xu wrote:
> On Mon, Oct 14, 2013 at 06:03:34PM +0200, Michal Kubecek wrote:
> 
> > Add similar protection into ipcomp_decompress() as it can be
> > called from process context as well (even if such scenario seems
> > a bit artificial).
> 
> I don't think this is possible or otherwise xfrm_input will
> dead-lock.

When I was thinking about this, I came with two situations when this
might IMHO happen:

1. We are sending a packet to ourselves; this could mean not only a
local loop but also e.g. a packets between an LXC container and the host
or between two LXC containers. While it doesn't make much sense to use
compression in such case, it might happen while testing a solution which
is going to be used in a normal setup later.

2. The packet was passed to tun/tap device by a userspace application.

Am I wrong?

                                                        Michal Kubecek

^ permalink raw reply

* Re: IPv6 path discovery oddities - flushing the routing cache resolves
From: Valentijn Sessink @ 2013-10-17 10:53 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20131016154841.GC18135@order.stressinduktion.org>

Hi,

Thanks for your fast response, please see below. Sorry for the rather 
long mail, due to the routing tables. Even though they're really short, 
the output of ipv6_route is extensive...

On 16-10-13 17:48, Hannes Frederic Sowa wrote:
> I do think these two issues are connected. Could you send me a the
> corresponding ip route output and /proc/net/ipv6_route output for when it
> works and mtus are correctly handled and when it does not work?

When it works (please note that the routing cache at first doesn't show 
an MTU, that only happens after a "too big" has been received, hence the 
difference before and after a "ps uaxww"), I have the following - this 
is from last night:

# ip -6 ro li
2a01:4f8:200:546b::/64 dev br0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev br0  proto kernel  metric 256
fe80::/64 dev vnet0  proto kernel  metric 256
fe80::/64 dev vnet1  proto kernel  metric 256
default via fe80::1 dev br0  metric 1024

# ip -6 ro li cache
2001:7b8:1529:0:cd7a:7681:b261:af09 via fe80::1 dev br0  metric 0
     cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache

# ps uaxww
[... output cut for brevity ...]

# ip -6 ro li cache
2001:7b8:1529:0:cd7a:7681:b261:af09 via fe80::1 dev br0  metric 0
     cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache  expires 577sec mtu 1280

# cat /proc/net/ipv6_route
200107b815290000cd7a7681b261af09 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000001 000062b2 01000023
   br0
20011af8ff030003021966fffe2606dd 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000001 00000002 01400023
   br0
2a0104f80000a0a10000000000020001 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000000 00000001 01000003
   br0
2a0104f80200546b0000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
  eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet1
00000000000000000000000000000000 00 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000400 00000000 00000000 00000003
   br0
00000000000000000000000000000000 00 00000000000000000000000000000000 00
00000000000000000000000000000000 ffffffff 00000001 0001b230 00200200
    lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000037 80200001
    lo
2a0104f80200546b0000000000000002 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000003 000e6bc4 80200001
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 0000003b 80200001
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
fe80000000000000fc5400fffe17ee5b 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
fe80000000000000fc5400fffe497447 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
  eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet1
00000000000000000000000000000000 00 00000000000000000000000000000000 00
00000000000000000000000000000000 ffffffff 00000001 0001b230 00200200
    lo

(Please note, while trying, I found a rather interesting output, showing 
negative time; if it's normal, please ignore:
# ip -statistics -6 ro li cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache  expires -23sec users 2 used 2 age 637sec mtu 1280)


Anyway, today, suddenly it doesn't work anymore:

# ip -6 ro li
2a01:4f8:200:546b::/64 dev br0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev br0  proto kernel  metric 256
fe80::/64 dev vnet0  proto kernel  metric 256
fe80::/64 dev vnet1  proto kernel  metric 256
fe80::/64 dev vnet2  proto kernel  metric 256
default via fe80::1 dev br0  metric 1024

# ip -6 ro li cache
2001:7b8:2ff:37a::2 via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529::f via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache
2a01:4f8:200:546b::5 via 2a01:4f8:200:546b::5 dev br0  metric 0
     cache
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache

# ip -statistics -6 ro li cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache  users 4 used 9448
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache  users 4 used 260467 age 0sec
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache  users 1 used 987 age 205sec
# ip -6 ro li cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache
(The second time suddenly a couple of routes were gone - I don't know 
why). After "cat /proc/net/ipv6_route > /tmp/route-bad" I typed a "route 
-6 flush cache" and TCP started flowing again. Here's ipv6_route:

200107b802ff037a0000000000000002 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000000 00000006 01000003 
   br0
200107b815290000000000000000000f 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000000 00000006 01000003 
   br0
200107b81529000002138ffffeda140c 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000004 000024aa 01000023 
   br0
200107b815290000b845bb255d992a07 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000004 0003f79f 01000023 
   br0
2a0104f80200546b0000000000000005 80 00000000000000000000000000000000 00 
2a0104f80200546b0000000000000005 00000000 00000000 00000018 01000001 
   br0
2a0104f80200546b0000000000002eef 80 00000000000000000000000000000000 00 
2a0104f80200546b0000000000002eef 00000000 00000001 000003db 01000001 
   br0
2a0104f80200546b0000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
  eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet1
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet2
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000400 00000000 00000000 00000003 
   br0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
00000000000000000000000000000000 ffffffff 00000001 00064566 00200200 
    lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000167 80200001 
    lo
2a0104f80200546b0000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
2a0104f80200546b0000000000000002 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 0000000b 002a4bfe 80200001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 000001ed 80200001 
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffe17ee5b 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffe497447 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffedd5a81 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
  eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet1
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet2
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
00000000000000000000000000000000 ffffffff 00000001 00064566 00200200 
    lo

I hope this helps.

Best regards,

Valentijn

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: David Vrabel @ 2013-10-17 10:31 UTC (permalink / raw)
  To: jianhai luan
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FB9BC.9010608@oracle.com>

On 17/10/13 11:19, jianhai luan wrote:
> 
> On 2013-10-17 17:15, David Vrabel wrote:
>> On 17/10/13 10:02, jianhai luan wrote:
>>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>>
>>>>> If netfront sends at a very low rate, the time between subsequent
>>>>> calls
>>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>>> timer_after_eq() will be incorrect.  Credit will not be replenished
>>>>> and
>>>>> the guest may become unable to send (e.g., if prior to the long
>>>>> gap, all
>>>>> credit was exhausted).
>>>>>
>>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>>> Because
>>>>> the fact now must be not before than expire, time_before(now, expire)
>>>>> == true
>>>>> will verify the scenario.
>>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>>       ==
>>>>>       !time_in_range_open(now, expire, next_credit)
>>>> So first of all this must be with a 32-bit netback. And the not
>>>> coverable gap between activity is well over 240 days long. _If_
>>>> this really needs dealing with, then why is extending this from
>>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>>> change to 64-bit jiffy values, and use time_after_eq64()?
>>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>>> MAX_ULONG/2 in 64-bit will need long long time)
>>>
>>> I think the gap should be think all environment even now extending 480+.
>>> if now fall in the gap,  one timer will be pending and replenish will be
>>> in time.  Please run the attachment test program.
>>>
>>> If use time_after_eq64(), expire ,next_credit and other member will must
>>> be u64.
>> Yes, you'll need to store next_credit as a u64 in vif instead of
>> calculating it in tx_credit_exceeded from expires (which is only an
>> unsigned long).
> 
> I know that.  Even we use u64, time_after_eq()  will also do wrong judge
> in theory (not in reality because need long long time).

If jiffies_64 has millisecond resolution that would be more than
500,000,000 years.

> I think the two better fixed way is below:
>   - By time_before() to judge if now beyond MAX_ULONG/2

This is broken, so no.

David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17 10:19 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FAABE.5080806@citrix.com>


On 2013-10-17 17:15, David Vrabel wrote:
> On 17/10/13 10:02, jianhai luan wrote:
>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>
>>>> If netfront sends at a very low rate, the time between subsequent calls
>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>>> credit was exhausted).
>>>>
>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>> Because
>>>> the fact now must be not before than expire, time_before(now, expire)
>>>> == true
>>>> will verify the scenario.
>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>       ==
>>>>       !time_in_range_open(now, expire, next_credit)
>>> So first of all this must be with a 32-bit netback. And the not
>>> coverable gap between activity is well over 240 days long. _If_
>>> this really needs dealing with, then why is extending this from
>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>> change to 64-bit jiffy values, and use time_after_eq64()?
>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>> MAX_ULONG/2 in 64-bit will need long long time)
>>
>> I think the gap should be think all environment even now extending 480+.
>> if now fall in the gap,  one timer will be pending and replenish will be
>> in time.  Please run the attachment test program.
>>
>> If use time_after_eq64(), expire ,next_credit and other member will must
>> be u64.
> Yes, you'll need to store next_credit as a u64 in vif instead of
> calculating it in tx_credit_exceeded from expires (which is only an
> unsigned long).

I know that.  Even we use u64, time_after_eq()  will also do wrong judge 
in theory (not in reality because need long long time).
I think the two better fixed way is below:
   - By time_before() to judge if now beyond MAX_ULONG/2
   - Add another timer to check and update expire in MAX_ULONG>>2 period.

Because second way isn't  be verified in practical (need more time to 
waiting jiffes increase),  I chose the first.
>
> David

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox