Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 5/5] net: rfkill: gpio: add ACPI support
From: Rafael J. Wysocki @ 2013-10-17 11:48 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Heikki Krogerus, John W. Linville, Johannes Berg, Rhyland Klein,
	linux-acpi, linux-wireless, netdev
In-Reply-To: <20131017074426.GG3521@intel.com>

On Thursday, October 17, 2013 10:44:26 AM Mika Westerberg wrote:
> On Wed, Oct 16, 2013 at 10:55:01PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, October 16, 2013 01:53:43 PM Heikki Krogerus wrote:
> > > Including ACPI ID for Broadcom GPS receiver BCM4752.
> > > 
> > > Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
> > > ---
> > >  net/rfkill/rfkill-gpio.c | 31 ++++++++++++++++++++++++++++++-
> > >  1 file changed, 30 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c
> > > index 2dd78c6..5620d3c 100644
> > > --- a/net/rfkill/rfkill-gpio.c
> > > +++ b/net/rfkill/rfkill-gpio.c
> > > @@ -24,6 +24,8 @@
> > >  #include <linux/platform_device.h>
> > >  #include <linux/clk.h>
> > >  #include <linux/slab.h>
> > > +#include <linux/acpi.h>
> > > +#include <linux/acpi_gpio.h>
> > >  
> > >  #include <linux/rfkill-gpio.h>
> > >  
> > > @@ -70,6 +72,23 @@ static const struct rfkill_ops rfkill_gpio_ops = {
> > >  	.set_block = rfkill_gpio_set_power,
> > >  };
> > >  
> > > +static int rfkill_gpio_acpi_probe(struct device *dev,
> > > +				  struct rfkill_gpio_data *rfkill)
> > > +{
> > > +	const struct acpi_device_id *id;
> > > +
> > > +	id = acpi_match_device(dev->driver->acpi_match_table, dev);
> > > +	if (!id)
> > > +		return -ENODEV;
> > > +
> > > +	rfkill->name = dev_name(dev);
> > > +	rfkill->type = (unsigned)id->driver_data;
> > > +	rfkill->reset_gpio = acpi_get_gpio_by_index(dev, 0, NULL);
> > > +	rfkill->shutdown_gpio = acpi_get_gpio_by_index(dev, 1, NULL);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static int rfkill_gpio_probe(struct platform_device *pdev)
> > >  {
> > >  	struct rfkill_gpio_platform_data *pdata = pdev->dev.platform_data;
> > > @@ -82,7 +101,11 @@ static int rfkill_gpio_probe(struct platform_device *pdev)
> > >  	if (!rfkill)
> > >  		return -ENOMEM;
> > >  
> > > -	if (pdata) {
> > > +	if (ACPI_HANDLE(&pdev->dev)) {
> > > +		ret = rfkill_gpio_acpi_probe(&pdev->dev, rfkill);
> > > +		if (ret)
> > > +			return ret;
> > > +	} else if (pdata) {
> > >  		clk_name = pdata->power_clk_name;
> > >  		rfkill->name = pdata->name;
> > >  		rfkill->type = pdata->type;
> > > @@ -170,12 +193,18 @@ static int rfkill_gpio_remove(struct platform_device *pdev)
> > >  	return 0;
> > >  }
> > >  
> > > +static const struct acpi_device_id rfkill_acpi_match[] = {
> > > +	{ "BCM4752", RFKILL_TYPE_GPS },
> > > +	{ },
> > > +};
> > > +
> > >  static struct platform_driver rfkill_gpio_driver = {
> > >  	.probe = rfkill_gpio_probe,
> > >  	.remove = rfkill_gpio_remove,
> > >  	.driver = {
> > >  		.name = "rfkill_gpio",
> > >  		.owner = THIS_MODULE,
> > > +		.acpi_match_table = ACPI_PTR(rfkill_acpi_match),
> > >  	},
> > >  };
> > 
> > Looks good to me.
> > 
> > Has Mika seen this?
> 
> Yes, saw it now and looks good to me as well.
> 
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> 
> for the whole series, for what it's worth.

OK, so

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

for the ACPI-related changes.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: sun4i_handle_irq: WARNING at net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert
From: Richard Genoud @ 2013-10-17 11:46 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: Stefan Roese, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20130823182116.GE1230@lukather>

2013/8/23 Maxime Ripard <maxime.ripard@free-electrons.com>:
> Hi Richard,
>
> On Wed, Aug 21, 2013 at 11:47:20AM +0200, Richard GENOUD wrote:
>> Hi Maxime, Stephan
>>
>> I just realise that, *sometimes*, I have some warnings on my cubieboard
>> (6 since the 22 of july, and the board is runnning 24/24).
>
> Wow, I'm impressed it worked this fine actually :)
>
>> It has happened also on 3.10 + emac patches.
>> Here it is:
>> [27224.060000] ------------[ cut here ]------------
>> [27224.070000] WARNING: CPU: 0 PID: 0 at net/ipv4/tcp_input.c:2711 tcp_fastretrans_alert+0x717/0x734()
>> [27224.080000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.11.0-rc6-00010-g54f11ec #8
>> [27224.080000] [<c000f231>] (unwind_backtrace+0x1/0x98) from [<c000e2b7>] (show_stack+0xb/0xc)
>> [27224.090000] [<c000e2b7>] (show_stack+0xb/0xc) from [<c0013cd3>] (warn_slowpath_common+0x43/0x5c)
>> [27224.100000] [<c0013cd3>] (warn_slowpath_common+0x43/0x5c) from [<c0013d6d>] (warn_slowpath_null+0x11/0x14)
>> [27224.110000] [<c0013d6d>] (warn_slowpath_null+0x11/0x14) from [<c015eadf>] (tcp_fastretrans_alert+0x717/0x734)
>> [27224.120000] [<c015eadf>] (tcp_fastretrans_alert+0x717/0x734) from [<c015f32b>] (tcp_ack+0x6af/0x984)
>> [27224.130000] [<c015f32b>] (tcp_ack+0x6af/0x984) from [<c015fcab>] (tcp_rcv_established+0x93/0x3ec)
>> [27224.140000] [<c015fcab>] (tcp_rcv_established+0x93/0x3ec) from [<c0165621>] (tcp_v4_do_rcv+0xa1/0x160)
>> [27224.150000] [<c0165621>] (tcp_v4_do_rcv+0xa1/0x160) from [<c0167129>] (tcp_v4_rcv+0x4e9/0x500)
>> [27224.160000] [<c0167129>] (tcp_v4_rcv+0x4e9/0x500) from [<c015025d>] (ip_local_deliver_finish+0x79/0x180)
>> [27224.170000] [<c015025d>] (ip_local_deliver_finish+0x79/0x180) from [<c01500a1>] (ip_rcv_finish+0xc9/0x20c)
>> [27224.180000] [<c01500a1>] (ip_rcv_finish+0xc9/0x20c) from [<c0129493>] (__netif_receive_skb_core+0x287/0x3dc)
>> [27224.190000] [<c0129493>] (__netif_receive_skb_core+0x287/0x3dc) from [<c0129a8b>] (process_backlog+0x5b/0xd0)
>> [27224.200000] [<c0129a8b>] (process_backlog+0x5b/0xd0) from [<c012bc6f>] (net_rx_action+0x5f/0xe8)
>> [27224.200000] [<c012bc6f>] (net_rx_action+0x5f/0xe8) from [<c0015fc9>] (__do_softirq+0x91/0x134)
>> [27224.210000] [<c0015fc9>] (__do_softirq+0x91/0x134) from [<c0016251>] (irq_exit+0x59/0x80)
>> [27224.220000] [<c0016251>] (irq_exit+0x59/0x80) from [<c000d00d>] (handle_IRQ+0x21/0x54)
>> [27224.230000] [<c000d00d>] (handle_IRQ+0x21/0x54) from [<c00083ef>] (sun4i_handle_irq+0x1f/0x30)
>> [27224.240000] [<c00083ef>] (sun4i_handle_irq+0x1f/0x30) from [<c000ea1b>] (__irq_svc+0x3b/0x5c)
>> [27224.250000] Exception stack(0xc0287f70 to 0xc0287fb8)
>> [27224.250000] 7f60:                                     00000000 00000000 00000000 00000000
>> [27224.260000] 7f80: c0286000 c02ad7fd 00000001 c02ad7fd c028e084 413fc082 00000000 00000000
>> [27224.270000] 7fa0: 01000000 c0287fb8 c000d115 c000d116 40000033 ffffffff
>> [27224.270000] [<c000ea1b>] (__irq_svc+0x3b/0x5c) from [<c000d116>] (arch_cpu_idle+0x16/0x1c)
>> [27224.280000] [<c000d116>] (arch_cpu_idle+0x16/0x1c) from [<c002e69d>] (cpu_startup_entry+0x35/0xa0)
>> [27224.290000] [<c002e69d>] (cpu_startup_entry+0x35/0xa0) from [<c026f75f>] (start_kernel+0x1af/0x1f0)
>> [27224.300000] ---[ end trace 15641276e08fba8a ]---
>
> Hmmm, that seems pretty far from the network/interrupt drivers, and it
> looks like other users have seen this on !ARM machines:
> http://forums.gentoo.org/viewtopic-p-7379928.html
> https://bugzilla.redhat.com/show_bug.cgi?id=989251
>
> There's a patch in the redhat's bugzilla, could you try to apply it and
> see if it solves your problem?
>
> Maybe the netdev guys will have other ideas as well.
For the record, I set:
net.ipv4.tcp_early_retrans = 2
net.ipv4.tcp_frto = 0
And I didn't see the warning any more.
It's seems that a patch is on it's way. I'll try it instead of setting
the sysctl.
http://thread.gmane.org/gmane.linux.network/286793

Richard.

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Antonio Quartulli @ 2013-10-17 11:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Antonio Quartulli, David S. Miller, netdev@vger.kernel.org,
	Stephen Hemminger
In-Reply-To: <20131017112857.GA11318@localhost>

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

On Thu, Oct 17, 2013 at 04:28:57AM -0700, Pablo Neira Ayuso wrote:
> Hi,
> > +
> > +/**
> > + * br_netfilter_skb_free - clean the NF bridge data in an skb
> > + * @skb: the skb which the data to free belongs to
> > + */
> > +void br_netfilter_skb_free(struct sk_buff *skb)
> > +{
> > +	nf_bridge_put(skb->nf_bridge);
> > +	skb->nf_bridge = NULL;
> > +}
> 
> This should be nf_reset.

You think I should directly use nf_reset instead of this function?

I see that nf_reset() cleans up the conntrack part too: does it also become
useless once the packet exits the bridge interface?


Cheers,

-- 
Antonio Quartulli

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH net] bridge: clean the nf_bridge status when forwarding the skb
From: Pablo Neira Ayuso @ 2013-10-17 11:28 UTC (permalink / raw)
  To: Antonio Quartulli
  Cc: David S. Miller, netdev, Antonio Quartulli, Stephen Hemminger
In-Reply-To: <1381791096-3561-1-git-send-email-antonio@meshcoding.com>

Hi,

On Tue, Oct 15, 2013 at 12:51:36AM +0200, Antonio Quartulli wrote:
> From: Antonio Quartulli <antonio@open-mesh.com>
> 
> Even if it is forbidden to enslave a bridge interface into
> another one, it is still possible to create a chain of
> virtual interfaces including two distinct bridges.
>
> In this case, the skb entering the second bridge could have
> the nf_bridge field already set due to a previous operation
> and consequently lead to wrong a processing of the packet
> itself.
> 
> To prevent this behaviour release and set to NULL the
> nf_bridge field of the skb when forwarding the packet.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
> ---
> 
> I know that not using "extern" when declaring the prototype is not consistent
> with the surrounding code, but checkpatch complaints about using "extern" in .h
> file and I prefer to not do something "wrong" even if stylistically ugly.
> 
> Cheers,
> 
> 
>  net/bridge/br_forward.c   |  2 ++
>  net/bridge/br_netfilter.c | 10 ++++++++++
>  net/bridge/br_private.h   |  2 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 4b81b14..62955f3 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -49,6 +49,8 @@ int br_dev_queue_push_xmit(struct sk_buff *skb)
>  	} else {
>  		skb_push(skb, ETH_HLEN);
>  		br_drop_fake_rtable(skb);
> +		br_netfilter_skb_free(skb);
> +
>  		dev_queue_xmit(skb);
>  	}
>  
> diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
> index f877362..7cad3e2 100644
> --- a/net/bridge/br_netfilter.c
> +++ b/net/bridge/br_netfilter.c
> @@ -1086,3 +1086,13 @@ void br_netfilter_fini(void)
>  #endif
>  	dst_entries_destroy(&fake_dst_ops);
>  }
> +
> +/**
> + * br_netfilter_skb_free - clean the NF bridge data in an skb
> + * @skb: the skb which the data to free belongs to
> + */
> +void br_netfilter_skb_free(struct sk_buff *skb)
> +{
> +	nf_bridge_put(skb->nf_bridge);
> +	skb->nf_bridge = NULL;
> +}

This should be nf_reset.

^ permalink raw reply

* [PATCH net-next] fib_trie: Send RTM_DELROUTE when link goes down
From: Kristian Evensen @ 2013-10-17 11:13 UTC (permalink / raw)
  To: netdev; +Cc: Kristian Evensen

From: Kristian Evensen <kristian.evensen@gmail.com>

When a link is set as down using for example "ip link set dev X down", routes
are deleted, but RTM_DELROUTE messages are not sent to RTNLGRP_IPV4_ROUTE. This
patch makes trie_flush_list() send a RTM_DELROUTE for each route that is
removed, and makes rtnelink route deletion behavior consistent across commands.
The parameter lists for trie_flush_list() and trie_flush_leaf() are expanded to
include required information otherwise unavailable in trie_flush_list().

One use case that is simplified by this patch is IPv4 WAN connection monitoring.
Instead of listening for and handling both RTM_*ROUTE and RTM_*LINK-messages, it
is sufficient to handle RTM_*ROUTE.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>

---
 net/ipv4/fib_trie.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index ec9a9ef..acd5b3b 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1698,15 +1698,23 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
 	return 0;
 }
 
-static int trie_flush_list(struct list_head *head)
+static int trie_flush_list(struct list_head *head, u32 tb_id, t_key key,
+		int plen)
 {
 	struct fib_alias *fa, *fa_node;
 	int found = 0;
+	struct nl_info cfg;
+
+	memset(&cfg, 0, sizeof(cfg));
 
 	list_for_each_entry_safe(fa, fa_node, head, fa_list) {
 		struct fib_info *fi = fa->fa_info;
 
 		if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+			cfg.nl_net = fi->fib_net;
+			rtmsg_fib(RTM_DELROUTE, htonl(key), fa, plen, tb_id,
+					&cfg, 0);
+
 			list_del_rcu(&fa->fa_list);
 			fib_release_info(fa->fa_info);
 			alias_free_mem_rcu(fa);
@@ -1716,7 +1724,7 @@ static int trie_flush_list(struct list_head *head)
 	return found;
 }
 
-static int trie_flush_leaf(struct leaf *l)
+static int trie_flush_leaf(struct leaf *l, u32 tb_id)
 {
 	int found = 0;
 	struct hlist_head *lih = &l->list;
@@ -1724,7 +1732,7 @@ static int trie_flush_leaf(struct leaf *l)
 	struct leaf_info *li = NULL;
 
 	hlist_for_each_entry_safe(li, tmp, lih, hlist) {
-		found += trie_flush_list(&li->falh);
+		found += trie_flush_list(&li->falh, tb_id, l->key, li->plen);
 
 		if (list_empty(&li->falh)) {
 			hlist_del_rcu(&li->hlist);
@@ -1813,7 +1821,7 @@ int fib_table_flush(struct fib_table *tb)
 	int found = 0;
 
 	for (l = trie_firstleaf(t); l; l = trie_nextleaf(l)) {
-		found += trie_flush_leaf(l);
+		found += trie_flush_leaf(l, tb->tb_id);
 
 		if (ll && hlist_empty(&ll->list))
 			trie_leaf_remove(t, ll);
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Herbert Xu @ 2013-10-17 11:04 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Steffen Klassert, David S. Miller, netdev
In-Reply-To: <20131017110149.GA6747@unicorn.suse.cz>

On Thu, Oct 17, 2013 at 01:01:49PM +0200, Michal Kubecek wrote:
>
> When I was thinking about this, I came with two situations when this
> might IMHO happen:
> 
> 1. We are sending a packet to ourselves; this could mean not only a
> local loop but also e.g. a packets between an LXC container and the host
> or between two LXC containers. While it doesn't make much sense to use
> compression in such case, it might happen while testing a solution which
> is going to be used in a normal setup later.
> 
> 2. The packet was passed to tun/tap device by a userspace application.
> 
> Am I wrong?

I hope so :) Because otherwise xfrm_input will surely dead-lock since
it uses spin_lock.

The cases you mentioned should all go through netif_rx which will
then do the processing in BH context.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] X.25: Fix address field length calculation
From: Andrew Hendry @ 2013-10-17 11:02 UTC (permalink / raw)
  To: David Laight
  Cc: Joe Perches, Kelleter, Günther, David Miller, linux-x25,
	netdev, linux-kernel
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B738F@saturn3.aculab.com>

Sorry for the previous html mail.
This appears to be correct, what length addresses are you getting back
in the call accept when this happens?

On Wed, Oct 16, 2013 at 7:56 PM, David Laight <David.Laight@aculab.com> wrote:
>> On Tue, 2013-10-15 at 14:29 +0000, Kelleter, Günther wrote:
>> > Addresses are BCD encoded, not ASCII. x25_addr_ntoa got it right.
>> []
>> > Wrong length calculation leads to rejection of CALL ACCEPT packets.
>> []
>> > diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
>> []
>> > @@ -98,7 +98,7 @@ int x25_parse_address_block(struct sk_buff *skb,
>> >     }
>> >     len = *skb->data;
>> > -   needed = 1 + (len >> 4) + (len & 0x0f);
>> > +   needed = 1 + ((len >> 4) + (len & 0x0f) + 1) / 2;
>>
>> This calculation looks odd.
>
> Looks correct to me...
> In X.25 the lengths (in digits) of the called and calling addresses
> are encoded in the high and low nibbles of one byte and then
> followed by both addresses with a digit in each nibble.
> If the length of the first address is odd, the second one
> isn't byte aligned.
>
>         David
>
>
>

^ permalink raw reply

* Re: PROBLEM: GRE forwarding not working with 3.10.x, WCCPv2 and Cisco 7200 Router showing IP0 bad-hlen 4 in tcpdump
From: Peter Schmitt @ 2013-10-17 10:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org, Pravin B Shelar
In-Reply-To: <1381942996.2045.135.camel@edumazet-glaptop.roam.corp.google.com>

Hi,

> 
> 3.10 is buggy (for ETH_P_WCCP support I mean), 3.11 has the probable
> fix :
> 
> commit 3d7b46cd20e300bd6989fb1f43d46f1b9645816e
> ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
> 
> The problem is 3.10 do not pull the extra 4 bytes of WCCP
> 
> This is now properly done in iptunnel_pull_header()
> 

First of all, many thanks for your help on this.

I tried to apply the fix to the 3.10.16 sources, as I would like to stay with the long-term line. Unfortunately it does not apply, as there are a lot of dependencies on other patches.

Do you think there will be a fix for the long-time 3.10 kernel line? Or can you guide me on how to apply this fix to the long-term kernel?

Best regards,
Peter

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Michal Kubecek @ 2013-10-17 11:01 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Steffen Klassert, David S. Miller, netdev
In-Reply-To: <20131016123205.GA9982@gondor.apana.org.au>

On Wed, Oct 16, 2013 at 08:32:05PM +0800, Herbert Xu wrote:
> On Mon, Oct 14, 2013 at 06:03:34PM +0200, Michal Kubecek wrote:
> 
> > Add similar protection into ipcomp_decompress() as it can be
> > called from process context as well (even if such scenario seems
> > a bit artificial).
> 
> I don't think this is possible or otherwise xfrm_input will
> dead-lock.

When I was thinking about this, I came with two situations when this
might IMHO happen:

1. We are sending a packet to ourselves; this could mean not only a
local loop but also e.g. a packets between an LXC container and the host
or between two LXC containers. While it doesn't make much sense to use
compression in such case, it might happen while testing a solution which
is going to be used in a normal setup later.

2. The packet was passed to tun/tap device by a userspace application.

Am I wrong?

                                                        Michal Kubecek

^ permalink raw reply

* Re: IPv6 path discovery oddities - flushing the routing cache resolves
From: Valentijn Sessink @ 2013-10-17 10:53 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20131016154841.GC18135@order.stressinduktion.org>

Hi,

Thanks for your fast response, please see below. Sorry for the rather 
long mail, due to the routing tables. Even though they're really short, 
the output of ipv6_route is extensive...

On 16-10-13 17:48, Hannes Frederic Sowa wrote:
> I do think these two issues are connected. Could you send me a the
> corresponding ip route output and /proc/net/ipv6_route output for when it
> works and mtus are correctly handled and when it does not work?

When it works (please note that the routing cache at first doesn't show 
an MTU, that only happens after a "too big" has been received, hence the 
difference before and after a "ps uaxww"), I have the following - this 
is from last night:

# ip -6 ro li
2a01:4f8:200:546b::/64 dev br0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev br0  proto kernel  metric 256
fe80::/64 dev vnet0  proto kernel  metric 256
fe80::/64 dev vnet1  proto kernel  metric 256
default via fe80::1 dev br0  metric 1024

# ip -6 ro li cache
2001:7b8:1529:0:cd7a:7681:b261:af09 via fe80::1 dev br0  metric 0
     cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache

# ps uaxww
[... output cut for brevity ...]

# ip -6 ro li cache
2001:7b8:1529:0:cd7a:7681:b261:af09 via fe80::1 dev br0  metric 0
     cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache  expires 577sec mtu 1280

# cat /proc/net/ipv6_route
200107b815290000cd7a7681b261af09 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000001 000062b2 01000023
   br0
20011af8ff030003021966fffe2606dd 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000001 00000002 01400023
   br0
2a0104f80000a0a10000000000020001 80 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000000 00000000 00000001 01000003
   br0
2a0104f80200546b0000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
  eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet1
00000000000000000000000000000000 00 00000000000000000000000000000000 00
fe800000000000000000000000000001 00000400 00000000 00000000 00000003
   br0
00000000000000000000000000000000 00 00000000000000000000000000000000 00
00000000000000000000000000000000 ffffffff 00000001 0001b230 00200200
    lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000037 80200001
    lo
2a0104f80200546b0000000000000002 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000003 000e6bc4 80200001
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 0000003b 80200001
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
fe80000000000000fc5400fffe17ee5b 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
fe80000000000000fc5400fffe497447 80 00000000000000000000000000000000 00
00000000000000000000000000000000 00000000 00000001 00000000 80200001
    lo
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
   br0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
  eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00
00000000000000000000000000000000 00000100 00000000 00000000 00000001
vnet1
00000000000000000000000000000000 00 00000000000000000000000000000000 00
00000000000000000000000000000000 ffffffff 00000001 0001b230 00200200
    lo

(Please note, while trying, I found a rather interesting output, showing 
negative time; if it's normal, please ignore:
# ip -statistics -6 ro li cache
2001:1af8:ff03:3:219:66ff:fe26:6dd via fe80::1 dev br0  metric 0
     cache  expires -23sec users 2 used 2 age 637sec mtu 1280)


Anyway, today, suddenly it doesn't work anymore:

# ip -6 ro li
2a01:4f8:200:546b::/64 dev br0  proto kernel  metric 256
fe80::/64 dev eth0  proto kernel  metric 256
fe80::/64 dev br0  proto kernel  metric 256
fe80::/64 dev vnet0  proto kernel  metric 256
fe80::/64 dev vnet1  proto kernel  metric 256
fe80::/64 dev vnet2  proto kernel  metric 256
default via fe80::1 dev br0  metric 1024

# ip -6 ro li cache
2001:7b8:2ff:37a::2 via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529::f via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache
2a01:4f8:200:546b::5 via 2a01:4f8:200:546b::5 dev br0  metric 0
     cache
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache

# ip -statistics -6 ro li cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache  users 4 used 9448
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache  users 4 used 260467 age 0sec
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache  users 1 used 987 age 205sec
# ip -6 ro li cache
2001:7b8:1529:0:213:8fff:feda:140c via fe80::1 dev br0  metric 0
     cache
2001:7b8:1529:0:b845:bb25:5d99:2a07 via fe80::1 dev br0  metric 0
     cache
2a01:4f8:200:546b::2eef via 2a01:4f8:200:546b::2eef dev br0  metric 0
     cache
(The second time suddenly a couple of routes were gone - I don't know 
why). After "cat /proc/net/ipv6_route > /tmp/route-bad" I typed a "route 
-6 flush cache" and TCP started flowing again. Here's ipv6_route:

200107b802ff037a0000000000000002 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000000 00000006 01000003 
   br0
200107b815290000000000000000000f 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000000 00000006 01000003 
   br0
200107b81529000002138ffffeda140c 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000004 000024aa 01000023 
   br0
200107b815290000b845bb255d992a07 80 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000000 00000004 0003f79f 01000023 
   br0
2a0104f80200546b0000000000000005 80 00000000000000000000000000000000 00 
2a0104f80200546b0000000000000005 00000000 00000000 00000018 01000001 
   br0
2a0104f80200546b0000000000002eef 80 00000000000000000000000000000000 00 
2a0104f80200546b0000000000002eef 00000000 00000001 000003db 01000001 
   br0
2a0104f80200546b0000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
  eth0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet0
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet1
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet2
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
fe800000000000000000000000000001 00000400 00000000 00000000 00000003 
   br0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
00000000000000000000000000000000 ffffffff 00000001 00064566 00200200 
    lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000167 80200001 
    lo
2a0104f80200546b0000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
2a0104f80200546b0000000000000002 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 0000000b 002a4bfe 80200001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe800000000000000000000000000000 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 00300001 
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 000001ed 80200001 
    lo
fe80000000000000d63d7efffee306e9 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffe17ee5b 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffe497447 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
fe80000000000000fc5400fffedd5a81 80 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000000 00000001 00000000 80200001 
    lo
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
   br0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
  eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet1
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 
00000000000000000000000000000000 00000100 00000000 00000000 00000001 
vnet2
00000000000000000000000000000000 00 00000000000000000000000000000000 00 
00000000000000000000000000000000 ffffffff 00000001 00064566 00200200 
    lo

I hope this helps.

Best regards,

Valentijn

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: David Vrabel @ 2013-10-17 10:31 UTC (permalink / raw)
  To: jianhai luan
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FB9BC.9010608@oracle.com>

On 17/10/13 11:19, jianhai luan wrote:
> 
> On 2013-10-17 17:15, David Vrabel wrote:
>> On 17/10/13 10:02, jianhai luan wrote:
>>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>>
>>>>> If netfront sends at a very low rate, the time between subsequent
>>>>> calls
>>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>>> timer_after_eq() will be incorrect.  Credit will not be replenished
>>>>> and
>>>>> the guest may become unable to send (e.g., if prior to the long
>>>>> gap, all
>>>>> credit was exhausted).
>>>>>
>>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>>> Because
>>>>> the fact now must be not before than expire, time_before(now, expire)
>>>>> == true
>>>>> will verify the scenario.
>>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>>       ==
>>>>>       !time_in_range_open(now, expire, next_credit)
>>>> So first of all this must be with a 32-bit netback. And the not
>>>> coverable gap between activity is well over 240 days long. _If_
>>>> this really needs dealing with, then why is extending this from
>>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>>> change to 64-bit jiffy values, and use time_after_eq64()?
>>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>>> MAX_ULONG/2 in 64-bit will need long long time)
>>>
>>> I think the gap should be think all environment even now extending 480+.
>>> if now fall in the gap,  one timer will be pending and replenish will be
>>> in time.  Please run the attachment test program.
>>>
>>> If use time_after_eq64(), expire ,next_credit and other member will must
>>> be u64.
>> Yes, you'll need to store next_credit as a u64 in vif instead of
>> calculating it in tx_credit_exceeded from expires (which is only an
>> unsigned long).
> 
> I know that.  Even we use u64, time_after_eq()  will also do wrong judge
> in theory (not in reality because need long long time).

If jiffies_64 has millisecond resolution that would be more than
500,000,000 years.

> I think the two better fixed way is below:
>   - By time_before() to judge if now beyond MAX_ULONG/2

This is broken, so no.

David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17 10:19 UTC (permalink / raw)
  To: David Vrabel
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FAABE.5080806@citrix.com>


On 2013-10-17 17:15, David Vrabel wrote:
> On 17/10/13 10:02, jianhai luan wrote:
>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>
>>>> If netfront sends at a very low rate, the time between subsequent calls
>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>>> credit was exhausted).
>>>>
>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>>> Because
>>>> the fact now must be not before than expire, time_before(now, expire)
>>>> == true
>>>> will verify the scenario.
>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>       ==
>>>>       !time_in_range_open(now, expire, next_credit)
>>> So first of all this must be with a 32-bit netback. And the not
>>> coverable gap between activity is well over 240 days long. _If_
>>> this really needs dealing with, then why is extending this from
>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>> change to 64-bit jiffy values, and use time_after_eq64()?
>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>> MAX_ULONG/2 in 64-bit will need long long time)
>>
>> I think the gap should be think all environment even now extending 480+.
>> if now fall in the gap,  one timer will be pending and replenish will be
>> in time.  Please run the attachment test program.
>>
>> If use time_after_eq64(), expire ,next_credit and other member will must
>> be u64.
> Yes, you'll need to store next_credit as a u64 in vif instead of
> calculating it in tx_credit_exceeded from expires (which is only an
> unsigned long).

I know that.  Even we use u64, time_after_eq()  will also do wrong judge 
in theory (not in reality because need long long time).
I think the two better fixed way is below:
   - By time_before() to judge if now beyond MAX_ULONG/2
   - Add another timer to check and update expire in MAX_ULONG>>2 period.

Because second way isn't  be verified in practical (need more time to 
waiting jiffes increase),  I chose the first.
>
> David

^ permalink raw reply

* Re: [PATCH] WAN: Adding support for Infineon PEF2256 E1 chipset
From: Mark Rutland @ 2013-10-17 10:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: rob.herring@calxeda.com, Pawel Moll, Stephen Warren, Ian Campbell,
	Rob Landley, grant.likely@linaro.org, Krzysztof Halasa,
	devicetree@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	jerome.chantelauze@c-s.fr
In-Reply-To: <201310161525.r9GFPZI5006238@localhost.localdomain>

On Wed, Oct 16, 2013 at 04:25:35PM +0100, Christophe Leroy wrote:
> The patch adds WAN support for Infineon PEF2256 E1 Chipset.
> 
> Signed-off-by: Jerome Chantelauze <jerome.chantelauze@c-s.fr>
> Acked-by: Christophe Leroy <christophe.leroy@c-s.fr>

> +static ssize_t fs_attr_mode_store(struct device *dev,
> +                       struct device_attribute *attr,  const char *buf,
> +                       size_t count)
> +{
> +       struct net_device *ndev = dev_get_drvdata(dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +       u32 value;
> +       int ret = kstrtol(buf, 10, (long int *)&value);

u32 is not the same as long int.

> +       int reconfigure = (value != priv->mode);
> +
> +       if (ret != 0)
> +               return ret;
> +
> +       if (value != MASTER_MODE && value != SLAVE_MODE)
> +               return -EINVAL;
> +
> +       priv->mode = value;
> +       if (reconfigure && priv->init_done) {
> +               pef2256_close(ndev);
> +               init_FALC(priv);
> +               pef2256_open(ndev);
> +       }
> +
> +       return count;

What if count is not the number of characters read?

[...]

> +
> +       /* TS 0 is reserved */
> +       if (value & 0x80000000)
> +               return -EINVAL;

Magic numbers should be turned into constants.

> +static ssize_t fs_attr_Rx_TS_show(struct device *dev,
> +                       struct device_attribute *attr, char *buf)
> +{
> +       struct net_device *ndev = dev_get_drvdata(dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +
> +       return sprintf(buf, "0x%08x\n", priv->Rx_TS);
> +}
> +
> +
> +static ssize_t fs_attr_Rx_TS_store(struct device *dev,
> +                       struct device_attribute *attr,  const char *buf,
> +                       size_t count)
> +{
> +       struct net_device *ndev = dev_get_drvdata(dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +       u32 value;
> +       int ret = kstrtol(buf, 10, (long int *)&value);

I'm not sure what the rules are regarding this, but why do we show this
in hexadecimal but read it in decimal?

[...]

> +int Config_HDLC(struct pef2256_dev_priv *priv)
> +{
> +       int i;
> +       int TS_idx;
> +       struct pef2256_regs *base_addr;

That sounds suspicious. Using structs for the offsets of registers isn't
very portable...

It would be preferable to #define the offsets.

> +       u8 dummy;
> +
> +       /* Set framer E1 address */
> +       base_addr = (struct pef2256_regs *)priv->base_addr;

That looks even more suspicious...

> +
> +       /* Read to remove pending IT */
> +       dummy = base_addr->ISR0;
> +       dummy = base_addr->ISR1;

You should use MMIO accessors here (readl, writel, etc). You have no
idea how the compiler may reorganise, coalese or throw away accesses,
nor how those accesses will be made. Additionally, without the requisite
barriers you have no guarantee the CPU won't reorder these accesses.

The compiler is within its rights here to throw away these accesses as
the results are never used. This is broken.

With some constants for the register offsets, this would be:

readb(base_addr + REG_ISR0);
readb(base_addr + REG_ISR1);

Which won't be reordered or thrown away by either the compiler or CPU.

> +
> +       /* Mask HDLC 1 Transmit IT */
> +       base_addr->IMR1 |= 1;
> +       base_addr->IMR1 |= 1 << 4;
> +       base_addr->IMR1 |= 1 << 5;
> +
> +       /* Mask HDLC 1 Receive IT */
> +       base_addr->IMR0 |= 1;
> +       base_addr->IMR0 |= 1 << 7;
> +       base_addr->IMR1 |= 1 << 6;
> +
> +       udelay((2 * 32) * 125);

Why the udelay, and how was the delay period (2 * 32 * 125) derived?

Is this to account for the lack of barriers, or does the hardware have a
requirement that there's a delay? 

If the former, please fix. If the later, please coment the udelay to
make this clear.

> +
> +       /* MODE.HRAC = 0 (Receiver inactive)
> +          MODE.DIV = 0 (Data normal operation)
> +          for FALC V2.2 : MODE.HDLCI = 0 (normal operation) */
> +       /* MODE.MDS2:0 = 100 (No address comparison) */
> +       /* MODE.HRAC = 1 (Receiver active) */
> +       out_8(&(base_addr->MODE), 1 << 3);

Why are you using an MMIO accessor here but not elsewhere?

Not all architectures seem to have out_8, but I think iowrite8/writeb
will work (though I'm not sure what the intended difference between
writeb and out_8 is).

> +       /* CCR1.EITS = 1 (Enable internal Time Slot 31:0 Signaling)
> +          CCR1.XMFA = 0 (No transmit multiframe alignment)
> +          CCR1.RFT1:0 = 00 (RFIFO sur 32 bytes) */
> +       /* setting up Interframe Time Fill */
> +       /* CCR1.ITF = 1 (Interframe Time Fill Continuous flag) */
> +       out_8(&(base_addr->CCR1), 0x10 | (1 << 3));
> +       /* CCR2.XCRC = 0 (Transmit CRC ON)
> +          CCR2.RCRC = 0 (Receive CRC ON, no write in RFIFO)
> +          CCR2.RADD = 0 (No write address in RFIFO) */
> +       out_8(&(base_addr->CCR2), 0x00);
> +
> +       udelay((2 * 32) * 125);

Please explain all udelay instances.

[...]

> +                               setbits8(&(base_addr->TTR1), 1 << i);

I'm not aware of a generic equivalent to setbits8, but it seems like
writeb(readb(ADDR) | bits), ADDR) would do the same.

[...]

> +static int pef2256_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
> +{
> +       int ret;
> +
> +       ret = hdlc_ioctl(dev, ifr, cmd);
> +       return ret;
> +}

This seems a bit useless -- can't you just assign hdlc_ioctl to
pef2256_ops::ndo_do_ioctl directly?

> +static const struct of_device_id pef2256_match[];
> +static int pef2256_probe(struct platform_device *ofdev)

s/ofdev/pdev -- platform_device has nothing to do with OF.

> +{
> +       const struct of_device_id *match;
> +       struct pef2256_dev_priv *priv;
> +       int ret = -ENOMEM;
> +       struct net_device *netdev;
> +       hdlc_device *hdlc;
> +       int sys_ret;
> +       struct pef2256_regs *base_addr;
> +       struct device_node *np = (&ofdev->dev)->of_node;
> +       const u32 *data;
> +       int len;
> +
> +       match = of_match_device(pef2256_match, &ofdev->dev);
> +       if (!match)
> +               return -EINVAL;

Why not:

if (!pdev->dev.of_node)
	return -EINVAL;

You shouldn't have an of_node unless one of your compatible strings
matched, and this way you don't have to iterate over the list again.

> +
> +       dev_err(&ofdev->dev, "Found PEF2256\n");
> +
> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +       if (!priv)
> +               return ret;
> +
> +       priv->dev = &ofdev->dev;
> +
> +       data = of_get_property(np, "data-rate", &len);
> +       if (!data || len != 4) {

Use of_property_read_u32.

> +               dev_err(&ofdev->dev, "failed to read data-rate -> using 8Mb\n");
> +               priv->data_rate = DATA_RATE_8M;
> +       } else
> +               priv->data_rate = *data;
> +
> +       data = of_get_property(np, "channel-phase", &len);
> +       if (!data || len != 4) {

Use of_property_read_u32.

> +               dev_err(&ofdev->dev, "failed to read channel phase -> using 0\n");
> +               priv->channel_phase = CHANNEL_PHASE_0;
> +       } else
> +               priv->channel_phase = *data;
> +
> +       data = of_get_property(np, "rising-edge-sync-pulse", NULL);

Use of_property_read_string.

> +       if (!data) {
> +               dev_err(&ofdev->dev, "failed to read rising edge sync pulse -> using \"transmit\"\n");
> +               strcpy(priv->rising_edge_sync_pulse, "transmit");
> +       } else if (strcmp((char *)data, "transmit") &&
> +                       strcmp((char *)data, "receive")) {
> +               dev_err(&ofdev->dev, "invalid rising edge sync pulse -> using \"transmit\"\n");
> +               strcpy(priv->rising_edge_sync_pulse, "transmit");
> +       } else
> +               strncpy(priv->rising_edge_sync_pulse, (char *)data, 10);
> +
> +       priv->irq = of_irq_to_resource(np, 0, NULL);
> +       if (!priv->irq) {
> +               dev_err(priv->dev, "no irq defined\n");
> +               return -EINVAL;
> +       }

The irq will have already been parsed, and will be in your
platform_device's set of resources. You can use platform_get_irq to get
at it rather than getting the of_ code to attempt to map it again.

Why are you storing the IRQ resource, rather than the irq itself? Surely
the irq number is easier to deal with?

> +       netdev = alloc_hdlcdev(priv);
> +       if (!netdev) {
> +               ret = -ENOMEM;
> +               return ret;

You leak priv and the priv->base_addr mapping here.

[...]

> +       ret = register_hdlc_device(netdev);
> +       if (ret < 0) {
> +               pr_err("unable to register\n");
> +               return ret;

You leak the priv, priv->base_addr, and netdev here.

> +       }
> +
> +       sys_ret = 0;
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_mode);
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_Tx_TS);
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_Rx_TS);
> +       sys_ret |= device_create_file(priv->dev, &dev_attr_regs);

Huh? can't any of these fail individually?

> +
> +       if (sys_ret) {
> +               device_remove_file(priv->dev, &dev_attr_mode);

What about the other files?

> +               unregister_hdlc_device(priv->netdev);
> +               free_netdev(priv->netdev);

What about priv and priv->base_addr?

Why is there not a return here? We'll fall out to the main body and
return 0, as if everything's OK...

> +       }
> +
> +       priv->init_done = 0;
> +
> +       return 0;
> +}

[...]

> +
> +
> +/*
> + * Suppression du module
> + */
> +static int pef2256_remove(struct platform_device *ofdev)
> +{
> +       struct net_device *ndev = dev_get_drvdata(&ofdev->dev);
> +       struct pef2256_dev_priv *priv = dev_to_hdlc(ndev)->priv;
> +
> +       device_remove_file(priv->dev, &dev_attr_Rx_TS);
> +       device_remove_file(priv->dev, &dev_attr_Tx_TS);
> +       device_remove_file(priv->dev, &dev_attr_mode);
> +
> +       unregister_hdlc_device(priv->netdev);
> +       free_netdev(priv->netdev);

What about priv and priv->base_addr?

> +
> +       /* Do E1 stuff */
> +
> +       dev_set_drvdata(&ofdev->dev, NULL);
> +       kfree(ofdev);

Is that meant to be done here? Isn't that the job of the core code?

[...]

> +static int __init pef2256_init(void)
> +{
> +       int ret;
> +       ret = platform_driver_register(&pef2256_driver);
> +       return ret;
> +}
> +module_init(pef2256_init);
> +
> +
> +static void __exit pef2256_exit(void)
> +{
> +       platform_driver_unregister(&pef2256_driver);
> +}
> +module_exit(pef2256_exit);

Use module_platform_driver?

> +/* Framer E1 registers */
> +union pef2256_Fifo {
> +       u8      XFIFO[sizeof(u16)];             /* Transmit FIFO */
> +       u8      RFIFO[sizeof(u16)];             /* Receive FIFO */
> +};

Huh? Why sizeof(u16) rather than 2?

> +struct pef2256_regs {
> +       union pef2256_Fifo      FIFO;   /* 0x00/0x01    FIFO (Tx or rx) */
> +       unsigned char   CMDR;   /* 0x02 Command Register */
> +       unsigned char   MODE;   /* 0x03 Mode Register */
> +       unsigned char   RAH1;   /* 0x04 Receive Address High 1 */
> +       unsigned char   RAH2;   /* 0x05 Receive Address High 2 */
> +       unsigned char   RAL1;   /* 0x06 Receive Address Low 1 */

[...]

Please do not use structures for calculation of register offsets.

> +       unsigned short  FEC;    /* 0x50/0x51 Framing Error Counter */
> +       unsigned short  CVC;    /* 0x52/0x53 Code Violation Counter */
> +       unsigned short  CEC1;   /* 0x54/0x55 CRC Error Counter 1 */
> +       unsigned short  EBC;    /* 0x56/0x57 E-Bit Error Counter */
> +       unsigned short  CEC2;   /* 0x58/0x59 CRC Error Counter 2 */
> +       unsigned short  CEC3;   /* 0x5A/0x5B CRC Error Counter 3 */

These may not be the size you expect.

> diff -urN a/Documentation/devicetree/bindings/net/pef2256.txt b/Documentation/devicetree/bindings/net/pef2256.txt
> --- a/Documentation/devicetree/bindings/net/pef2256.txt 1970-01-01 01:00:00.000000000 +0100
> +++ b/Documentation/devicetree/bindings/net/pef2256.txt 2013-10-13 15:05:42.000000000 +0200
> @@ -0,0 +1,29 @@
> +* Wan on Infineon pef2256 E1 controller

A brief description would be helpful. Is there any publicly available
documentation?

> +
> +Required properties:
> +- compatible: Should be "infineon,pef2256"

s/Should be/Should contain/ -- variants may exist in future.

> +- reg: Address and length of the register set for the device

Is there only the one register bank?

> +- interrupts: Should contain interrupts

How many? What do they correspond to?

> +
> +Optional properties:
> +- data-rate: Data rate on the system highway.
> +  Supported values are: 2, 4, 8, 16.
> +  8 if not defined.

What is the "system highway"? Is this configuration, or is this a
property of the device that cannot be probed?

> +- channel-phase: First time slot transmission channel phase.
> +  Supported values are: 0, 1, 2, 3, 4, 5, 6, 7.
> +  0 if not defined.

Similarly?

> +- rising-edge-sync-pulse: rising edge synchronous pulse.
> +  Supported values are: "receive", "transmit".
> +  "transmit" if not defined.

I'm not sure what this means. Could you elaborate?

Thanks,
Mark.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17  9:59 UTC (permalink / raw)
  To: Jan Beulich
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FC98002000078000FBBB5@nat28.tlf.novell.com>


On 2013-10-17 17:26, Jan Beulich wrote:
>>>> On 17.10.13 at 11:02, jianhai luan <jianhai.luan@oracle.com> wrote:
>> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>>
>>>> If netfront sends at a very low rate, the time between subsequent calls
>>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>>> credit was exhausted).
>>>>
>>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>> Because
>>>> the fact now must be not before than expire, time_before(now, expire) ==
>> true
>>>> will verify the scenario.
>>>>       time_after_eq(now, next_credit) || time_before (now, expire)
>>>>       ==
>>>>       !time_in_range_open(now, expire, next_credit)
>>> So first of all this must be with a 32-bit netback. And the not
>>> coverable gap between activity is well over 240 days long. _If_
>>> this really needs dealing with, then why is extending this from
>>> 240+ to 480+ days sufficient? I.e. why don't you simply
>>> change to 64-bit jiffy values, and use time_after_eq64()?
>> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
>> MAX_ULONG/2 in 64-bit will need long long time)
>>
>> I think the gap should be think all environment even now extending 480+.
>> if now fall in the gap,  one timer will be pending and replenish will be
>> in time.  Please run the attachment test program.
> Not sure what this is supposed to tell me. I recognize that there
> are overflow conditions not handled properly, but (a) I have a
> hard time thinking of a sensible guest that sits idle for over 240
> days (host uptime usually isn't even coming close to that due to
> maintenance requirements) and (b) if there is such a sensible
> guest, then I can't see why dealing with one being idle for over
> 480 days should be required too.

The issue can be reproduced when now beyond MAX_ULONG/2 (if the gust 
will send lesser package).
Jiffies beyond than MAX_UNLONG/2 will need below time:
     HZ         days
    100        248.55        (((0xffffffff/2)/HZ)/3600)/24
    250        99.42          (((0xffffffff/2)/HZ)/3600)/24
   1000       24.86          (((0xffffffff/2)/HZ)/3600)/24

Because we use 250,  the issue be found when uptime large than 100 days.

Jason
>> If use time_after_eq64(), expire ,next_credit and other member will must
>> be u64.
> Exactly - that's what I was telling you to do.
>
> Jan
>

^ permalink raw reply

* Re: [PATCH ipsec] xfrm: prevent ipcomp scratch buffer race condition
From: Steffen Klassert @ 2013-10-17  9:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Michal Kubecek, David S. Miller, netdev
In-Reply-To: <20131016123205.GA9982@gondor.apana.org.au>

On Wed, Oct 16, 2013 at 08:32:05PM +0800, Herbert Xu wrote:
> On Mon, Oct 14, 2013 at 06:03:34PM +0200, Michal Kubecek wrote:
> > In ipcomp_compress(), sortirq is enabled too early, allowing the
> > per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> > (called on the same CPU in softirq context) between populating
> > the buffer and copying the compressed data to the skb.
> 
> Good catch.
> 
> > Add similar protection into ipcomp_decompress() as it can be
> > called from process context as well (even if such scenario seems
> > a bit artificial).
> 
> I don't think this is possible or otherwise xfrm_input will
> dead-lock.
> 

Michal, please incorporate the feedback from Herbert and Eric,
I'll take it into the ipsec tree then. Thanks!

^ permalink raw reply

* Re: [PATCH net-next] {selinux, af_key} Rework pfkey_sadb2xfrm_user_sec_ctx
From: Steffen Klassert @ 2013-10-17  9:51 UTC (permalink / raw)
  To: Fan Du; +Cc: Paul Moore, davem, netdev
In-Reply-To: <525F3EBD.80406@windriver.com>

On Thu, Oct 17, 2013 at 09:34:53AM +0800, Fan Du wrote:
> 
> 
> On 2013年10月16日 23:15, Paul Moore wrote:
> >
> >The fact that you are now changing sadb_x_sec_ctx->sadb_x_sec_len whenever
> >pfkey_sadb2xfrm_user_sec_ctx() is called raises an eyebrow.  Can you elaborate
> >on why this is not a problem?
> >
> Thanks for your attention, Paul.
> 
> sadb_x_sec_ctx is extra headers passed down from user space, the usage of
> of this data structure falls down to one of pfkey_funcs function only for
> one time, more specifically speaking, it's only used by SELINUX for security
> checking for each operation. In other words, sadb_x_sec_ctx involves with a
> one shot business here. So the original codes seems do a lots of extra job
> which could easily be avoid using casting operation.
> 

Since the selinux people have to live with that change in the fist place,
I'd like to see an ack of one of the selinux maintainers before I take
in into ipsec-next, Paul?

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: Jan Beulich @ 2013-10-17  9:26 UTC (permalink / raw)
  To: jianhai luan
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FA79F.8060601@oracle.com>

>>> On 17.10.13 at 11:02, jianhai luan <jianhai.luan@oracle.com> wrote:
> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>
>>> If netfront sends at a very low rate, the time between subsequent calls
>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>> credit was exhausted).
>>>
>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2. 
> Because
>>> the fact now must be not before than expire, time_before(now, expire) == 
> true
>>> will verify the scenario.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>> So first of all this must be with a 32-bit netback. And the not
>> coverable gap between activity is well over 240 days long. _If_
>> this really needs dealing with, then why is extending this from
>> 240+ to 480+ days sufficient? I.e. why don't you simply
>> change to 64-bit jiffy values, and use time_after_eq64()?
> 
> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond 
> MAX_ULONG/2 in 64-bit will need long long time)
> 
> I think the gap should be think all environment even now extending 480+. 
> if now fall in the gap,  one timer will be pending and replenish will be 
> in time.  Please run the attachment test program.

Not sure what this is supposed to tell me. I recognize that there
are overflow conditions not handled properly, but (a) I have a
hard time thinking of a sensible guest that sits idle for over 240
days (host uptime usually isn't even coming close to that due to
maintenance requirements) and (b) if there is such a sensible
guest, then I can't see why dealing with one being idle for over
480 days should be required too.

> If use time_after_eq64(), expire ,next_credit and other member will must 
> be u64.

Exactly - that's what I was telling you to do.

Jan

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: David Vrabel @ 2013-10-17  9:15 UTC (permalink / raw)
  To: jianhai luan
  Cc: Jan Beulich, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FA79F.8060601@oracle.com>

On 17/10/13 10:02, jianhai luan wrote:
> 
> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>
>>> If netfront sends at a very low rate, the time between subsequent calls
>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>> the guest may become unable to send (e.g., if prior to the long gap, all
>>> credit was exhausted).
>>>
>>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2.
>>> Because
>>> the fact now must be not before than expire, time_before(now, expire)
>>> == true
>>> will verify the scenario.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>> So first of all this must be with a 32-bit netback. And the not
>> coverable gap between activity is well over 240 days long. _If_
>> this really needs dealing with, then why is extending this from
>> 240+ to 480+ days sufficient? I.e. why don't you simply
>> change to 64-bit jiffy values, and use time_after_eq64()?
> 
> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond
> MAX_ULONG/2 in 64-bit will need long long time)
> 
> I think the gap should be think all environment even now extending 480+.
> if now fall in the gap,  one timer will be pending and replenish will be
> in time.  Please run the attachment test program.
> 
> If use time_after_eq64(), expire ,next_credit and other member will must
> be u64.

Yes, you'll need to store next_credit as a u64 in vif instead of
calculating it in tx_credit_exceeded from expires (which is only an
unsigned long).

David

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17  9:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FA79F.8060601@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 2985 bytes --]


On 2013-10-17 17:02, jianhai luan wrote:
>
> On 2013-10-17 16:26, Jan Beulich wrote:
>>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>>
>>> If netfront sends at a very low rate, the time between subsequent calls
>>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>>> the guest may become unable to send (e.g., if prior to the long gap, 
>>> all
>>> credit was exhausted).
>>>
>>> We should add the scenario which now beyond 
>>> next_credit+MAX_UNLONG/2. Because
>>> the fact now must be not before than expire, time_before(now, 
>>> expire) == true
>>> will verify the scenario.
>>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>>      ==
>>>      !time_in_range_open(now, expire, next_credit)
>> So first of all this must be with a 32-bit netback. And the not
>> coverable gap between activity is well over 240 days long. _If_
>> this really needs dealing with, then why is extending this from
>> 240+ to 480+ days sufficient? I.e. why don't you simply
>> change to 64-bit jiffy values, and use time_after_eq64()?
>
> Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond 
> MAX_ULONG/2 in 64-bit will need long long time)
>
> I think the gap should be think all environment even now extending 
> 480+. if now fall in the gap,  one timer will be pending and replenish 
> will be in time.  Please run the attachment test program.
>

Sorry for miss the attachment in previous letter. Please check the 
attachment.
> If use time_after_eq64(), expire ,next_credit and other member will 
> must be u64.
>>
>> Jan
>>
>>> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
>>> ---
>>>   drivers/net/xen-netback/netback.c |    7 +++++--
>>>   1 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/xen-netback/netback.c
>>> b/drivers/net/xen-netback/netback.c
>>> index f3e591c..31eedaf 100644
>>> --- a/drivers/net/xen-netback/netback.c
>>> +++ b/drivers/net/xen-netback/netback.c
>>> @@ -1194,8 +1194,11 @@ static bool tx_credit_exceeded(struct xenvif 
>>> *vif,
>>> unsigned size)
>>>       if (timer_pending(&vif->credit_timeout))
>>>           return true;
>>>   -    /* Passed the point where we can replenish credit? */
>>> -    if (time_after_eq(now, next_credit)) {
>>> +    /* Credit should be replenished when now does not fall into the
>>> +     * range from expires to next_credit, and time_in_range_open()
>>> +     * is used to verify whether this case happens.
>>> +     */
>>> +    if (!time_in_range_open(now, vif->credit_timeout.expires, 
>>> next_credit)) {
>>>           vif->credit_timeout.expires = now;
>>>           tx_add_credit(vif);
>>>       }
>>> -- 
>>> 1.7.6.5
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>>
>>
>


[-- Attachment #2: main.c --]
[-- Type: text/plain, Size: 961 bytes --]

#include <stdio.h>

#define typecheck(type,x) \
({	type __dummy; \
	typeof(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

#define time_after(a, b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((b) - (a)) < 0))
#define time_before(a,b)	time_after(b,a)

#define time_after_eq(a,b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((a) -(b)) >= 0))
#define time_before_eq(a, b) time_after_eq(b,a)

void do_nothing()
{
	return;
}

int main()
{
	unsigned char expire, now, next;
	unsigned char delta = 10;
	int i, j;

	for(i = 0; i < 256; i++) {
		expire = i;
		next = expire + delta;

		printf("\n\n\n[%u ... %u]\n", expire, next);
		now = expire;
		for(j=0; j < 1024; j++, now++) {	
			if(j%256 == 0) printf("\n");

			if (time_after_eq(now, next) ||
				time_before(now, expire)) {
				do_nothing();
			}
			else {
				printf("    now=%d\n", (char)now);
			}
		}
	}
	
	return 0;
}

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: jianhai luan @ 2013-10-17  9:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <525FBB4F02000078000FBB30@nat28.tlf.novell.com>


On 2013-10-17 16:26, Jan Beulich wrote:
>>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
>> time_after_eq() only works if the delta is < MAX_ULONG/2.
>>
>> If netfront sends at a very low rate, the time between subsequent calls
>> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
>> timer_after_eq() will be incorrect.  Credit will not be replenished and
>> the guest may become unable to send (e.g., if prior to the long gap, all
>> credit was exhausted).
>>
>> We should add the scenario which now beyond next_credit+MAX_UNLONG/2. Because
>> the fact now must be not before than expire, time_before(now, expire) == true
>> will verify the scenario.
>>      time_after_eq(now, next_credit) || time_before (now, expire)
>>      ==
>>      !time_in_range_open(now, expire, next_credit)
> So first of all this must be with a 32-bit netback. And the not
> coverable gap between activity is well over 240 days long. _If_
> this really needs dealing with, then why is extending this from
> 240+ to 480+ days sufficient? I.e. why don't you simply
> change to 64-bit jiffy values, and use time_after_eq64()?

Yes, the issue only can be  reproduced in 32-bit Dom0 (Beyond 
MAX_ULONG/2 in 64-bit will need long long time)

I think the gap should be think all environment even now extending 480+. 
if now fall in the gap,  one timer will be pending and replenish will be 
in time.  Please run the attachment test program.

If use time_after_eq64(), expire ,next_credit and other member will must 
be u64.
>
> Jan
>
>> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
>> ---
>>   drivers/net/xen-netback/netback.c |    7 +++++--
>>   1 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/netback.c
>> b/drivers/net/xen-netback/netback.c
>> index f3e591c..31eedaf 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -1194,8 +1194,11 @@ static bool tx_credit_exceeded(struct xenvif *vif,
>> unsigned size)
>>   	if (timer_pending(&vif->credit_timeout))
>>   		return true;
>>   
>> -	/* Passed the point where we can replenish credit? */
>> -	if (time_after_eq(now, next_credit)) {
>> +	/* Credit should be replenished when now does not fall into the
>> +	 * range from expires to next_credit, and time_in_range_open()
>> +	 * is used to verify whether this case happens.
>> +	 */
>> +	if (!time_in_range_open(now, vif->credit_timeout.expires, next_credit)) {
>>   		vif->credit_timeout.expires = now;
>>   		tx_add_credit(vif);
>>   	}
>> -- 
>> 1.7.6.5
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
>

^ permalink raw reply

* RE: [Xen-devel] [PATCH net-next v4 0/5] xen-netback: IPv6 offload support
From: Paul Durrant @ 2013-10-17  8:42 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org
In-Reply-To: <1381942838.30409.26.camel@kazak.uk.xensource.com>

> -----Original Message-----
> From: Ian Campbell
> Sent: 16 October 2013 18:01
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net-next v4 0/5] xen-netback: IPv6 offload
> support
> 
> On Wed, 2013-10-16 at 17:53 +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Ian Campbell
> > > Sent: 16 October 2013 17:20
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org
> > > Subject: Re: [Xen-devel] [PATCH net-next v4 0/5] xen-netback: IPv6
> offload
> > > support
> > >
> > > On Fri, 2013-10-11 at 16:06 +0100, Paul Durrant wrote:
> > > > This patch series adds support for checksum and large packet offloads
> into
> > > > xen-netback.
> > > > Testing has mainly been done using the Microsoft network hardware
> > > > certification suite running in Server 2008R2 VMs with Citrix PV
> frontends.
> > >
> > > Are there any Linux netfront patches in existence/the pipeline to take
> > > advantage of this?
> > >
> >
> > I was waiting for the backend patches to be accepted first ;-)
> 
> I think it would be useful to get et least an RFC so others can try it
> etc.
> 

Well, everyone can build and use the Windows frontend :-) (https://github.com/xenserver/win-xenvif/tree/upstream)

I'll try to hack up something in xen-netfront soon.

  Paul

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-17  8:41 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131017003421.GA31470@hmsreliant.think-freely.org>


* Neil Horman <nhorman@tuxdriver.com> wrote:

> On Mon, Oct 14, 2013 at 03:18:47PM -0700, Eric Dumazet wrote:
> > On Mon, 2013-10-14 at 14:19 -0700, Eric Dumazet wrote:
> > > On Mon, 2013-10-14 at 16:28 -0400, Neil Horman wrote:
> > > 
> > > > So, early testing results today.  I wrote a test module that, allocated a 4k
> > > > buffer, initalized it with random data, and called csum_partial on it 100000
> > > > times, recording the time at the start and end of that loop.  Results on a 2.4
> > > > GHz Intel Xeon processor:
> > > > 
> > > > Without patch: Average execute time for csum_partial was 808 ns
> > > > With patch: Average execute time for csum_partial was 438 ns
> > > 
> > > Impressive, but could you try again with data out of cache ?
> > 
> > So I tried your patch on a GRE tunnel and got following results on a
> > single TCP flow. (short result : no visible difference)
> > 
> > 
> 
> So I went to reproduce these results, but was unable to (due to the fact that I
> only have a pretty jittery network to do testing accross at the moment with
> these devices).  So instead I figured that I would go back to just doing
> measurements with the module that I cobbled together (operating under the
> assumption that it would give me accurate, relatively jitter free results (I've
> attached the module code for reference below).  My results show slightly
> different behavior:
> 
> Base results runs:
> 89417240
> 85170397
> 85208407
> 89422794
> 91645494
> 103655144
> 86063791
> 75647774
> 83502921
> 85847372
> AVG = 875 ns
>
> Prefetch only runs:
> 70962849
> 77555099
> 81898170
> 68249290
> 72636538
> 83039294
> 78561494
> 83393369
> 85317556
> 79570951
> AVG = 781 ns
> 
> Parallel addition only runs:
> 42024233
> 44313064
> 48304416
> 64762297
> 42994259
> 41811628
> 55654282
> 64892958
> 55125582
> 42456403
> AVG = 510 ns
> 
> 
> Both prefetch and parallel addition:
> 41329930
> 40689195
> 61106622
> 46332422
> 49398117
> 52525171
> 49517101
> 61311153
> 43691814
> 49043084
> AVG = 494 ns
> 
> 
> For reference, each of the above large numbers is the number of 
> nanoseconds taken to compute the checksum of a 4kb buffer 100000 times.  
> To get my average results, I ran the test in a loop 10 times, averaged 
> them, and divided by 100000.
> 
> Based on these, prefetching is obviously a a good improvement, but not 
> as good as parallel execution, and the winner by far is doing both.

But in the actual usecase mentioned the packet data was likely cache-cold, 
it just arrived in the NIC and an IRQ got sent. Your testcase uses a 
super-hot 4K buffer that fits into the L1 cache. So it's apples to 
oranges.

To correctly simulate the workload you'd have to:

 - allocate a buffer larger than your L2 cache.

 - to measure the effects of the prefetches you'd also have to randomize
   the individual buffer positions. See how 'perf bench numa' implements a
   random walk via --data_rand_walk, in tools/perf/bench/numa.c.
   Otherwise the CPU might learn your simplistic stream direction and the
   L2 cache might hw-prefetch your data, interfering with any explicit 
   prefetches the code does. In many real-life usecases packet buffers are
   scattered.

Also, it would be nice to see standard deviation noise numbers when two 
averages are close to each other, to be able to tell whether differences 
are statistically significant or not.

For example 'perf stat --repeat' will output stddev for you:

  comet:~/tip> perf stat --repeat 20 --null bash -c 'usleep $((RANDOM*10))'

   Performance counter stats for 'bash -c usleep $((RANDOM*10))' (20 runs):

       0.189084480 seconds time elapsed                                          ( +- 11.95% )

The last '+-' percentage is the noise of the measurement.

Also note that you can inspect many cache behavior details of your 
algorithm via perf stat - the -ddd option will give you a laundry list:

  aldebaran:~> perf stat --repeat 20 -ddd perf bench sched messaging
  ...

     Total time: 0.095 [sec]

 Performance counter stats for 'perf bench sched messaging' (20 runs):

       1519.128721 task-clock (msec)         #   12.305 CPUs utilized            ( +-  0.34% )
            22,882 context-switches          #    0.015 M/sec                    ( +-  2.84% )
             3,927 cpu-migrations            #    0.003 M/sec                    ( +-  2.74% )
            16,616 page-faults               #    0.011 M/sec                    ( +-  0.17% )
     2,327,978,366 cycles                    #    1.532 GHz                      ( +-  1.61% ) [36.43%]
     1,715,561,189 stalled-cycles-frontend   #   73.69% frontend cycles idle     ( +-  1.76% ) [38.05%]
       715,715,454 stalled-cycles-backend    #   30.74% backend  cycles idle     ( +-  2.25% ) [39.85%]
     1,253,106,346 instructions              #    0.54  insns per cycle        
                                             #    1.37  stalled cycles per insn  ( +-  1.71% ) [49.68%]
       241,181,126 branches                  #  158.763 M/sec                    ( +-  1.43% ) [47.83%]
         4,232,053 branch-misses             #    1.75% of all branches          ( +-  1.23% ) [48.63%]
       431,907,354 L1-dcache-loads           #  284.313 M/sec                    ( +-  1.00% ) [48.37%]
        20,550,528 L1-dcache-load-misses     #    4.76% of all L1-dcache hits    ( +-  0.82% ) [47.61%]
         7,435,847 LLC-loads                 #    4.895 M/sec                    ( +-  0.94% ) [36.11%]
         2,419,201 LLC-load-misses           #   32.53% of all LL-cache hits     ( +-  2.93% ) [ 7.33%]
       448,638,547 L1-icache-loads           #  295.326 M/sec                    ( +-  2.43% ) [21.75%]
        22,066,490 L1-icache-load-misses     #    4.92% of all L1-icache hits    ( +-  2.54% ) [30.66%]
       475,557,948 dTLB-loads                #  313.047 M/sec                    ( +-  1.96% ) [37.96%]
         6,741,523 dTLB-load-misses          #    1.42% of all dTLB cache hits   ( +-  2.38% ) [37.05%]
     1,268,628,660 iTLB-loads                #  835.103 M/sec                    ( +-  1.75% ) [36.45%]
            74,192 iTLB-load-misses          #    0.01% of all iTLB cache hits   ( +-  2.88% ) [36.19%]
         4,466,526 L1-dcache-prefetches      #    2.940 M/sec                    ( +-  1.61% ) [36.17%]
         2,396,311 L1-dcache-prefetch-misses #    1.577 M/sec                    ( +-  1.55% ) [35.71%]

       0.123459566 seconds time elapsed                                          ( +-  0.58% )

There's also a number of prefetch counters that might be useful:

 aldebaran:~> perf list | grep prefetch
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  node-prefetches                                    [Hardware cache event]
  node-prefetch-misses                               [Hardware cache event]

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 2/3] ipvs: avoid rcu_barrier during netns cleanup
From: Simon Horman @ 2013-10-17  8:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Julian Anastasov, lvs-devel, netdev, netfilter-devel,
	Wensong Zhang
In-Reply-To: <20131017081142.GA5324@localhost>

On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > > 
> > > 	Hello,
> > > 
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > > 
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > > 
> > > > Let me know.
> > > 
> > > 	It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> > 
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
> 
> Will enqueue for nf-next.
> 
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!

Sure, will do.

^ permalink raw reply

* Re: [PATCH net-next] netfilter: xt_socket: use sock_gen_put()
From: Pablo Neira Ayuso @ 2013-10-17  8:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, netfilter-devel
In-Reply-To: <1381507405.4971.108.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, Oct 11, 2013 at 09:03:25AM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> TCP listener refactoring, part 7 :
> 
> Use sock_gen_put() instead of xt_socket_put_sk() for future
> SYN_RECV support.

Applied, thanks Eric.

^ permalink raw reply

* Re: [Xen-devel] [PATCH net] xen-netback: add the scenario which now beyond the range time_after_eq().
From: Jan Beulich @ 2013-10-17  8:26 UTC (permalink / raw)
  To: Jason Luan
  Cc: david.vrabel, ian.campbell, wei.liu2, xen-devel, annie.li, netdev
In-Reply-To: <1381944167-24918-1-git-send-email-jianhai.luan@oracle.com>

>>> On 16.10.13 at 19:22, Jason Luan <jianhai.luan@oracle.com> wrote:
> time_after_eq() only works if the delta is < MAX_ULONG/2.
> 
> If netfront sends at a very low rate, the time between subsequent calls
> to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for
> timer_after_eq() will be incorrect.  Credit will not be replenished and
> the guest may become unable to send (e.g., if prior to the long gap, all
> credit was exhausted).
> 
> We should add the scenario which now beyond next_credit+MAX_UNLONG/2. Because
> the fact now must be not before than expire, time_before(now, expire) == true
> will verify the scenario.
>     time_after_eq(now, next_credit) || time_before (now, expire)
>     ==
>     !time_in_range_open(now, expire, next_credit)

So first of all this must be with a 32-bit netback. And the not
coverable gap between activity is well over 240 days long. _If_
this really needs dealing with, then why is extending this from
240+ to 480+ days sufficient? I.e. why don't you simply
change to 64-bit jiffy values, and use time_after_eq64()?

Jan

> Signed-off-by: Jason Luan <jianhai.luan@oracle.com>
> ---
>  drivers/net/xen-netback/netback.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c 
> b/drivers/net/xen-netback/netback.c
> index f3e591c..31eedaf 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1194,8 +1194,11 @@ static bool tx_credit_exceeded(struct xenvif *vif, 
> unsigned size)
>  	if (timer_pending(&vif->credit_timeout))
>  		return true;
>  
> -	/* Passed the point where we can replenish credit? */
> -	if (time_after_eq(now, next_credit)) {
> +	/* Credit should be replenished when now does not fall into the
> +	 * range from expires to next_credit, and time_in_range_open()
> +	 * is used to verify whether this case happens.
> +	 */
> +	if (!time_in_range_open(now, vif->credit_timeout.expires, next_credit)) {
>  		vif->credit_timeout.expires = now;
>  		tx_add_credit(vif);
>  	}
> -- 
> 1.7.6.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org 
> http://lists.xen.org/xen-devel 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox