Netdev List


* [PATCH linux-3.10.y 3/3] ip6tnl: fix double free of fb_tnl_dev on exit
From: Nicolas Dichtel @ 2014-01-30 10:09 UTC
  To: rostedt
  Cc: linux-kernel, netdev, stable, williams, lclaudio, jkacur, willemb,
	Nicolas Dichtel
In-Reply-To: <1391076563-10798-1-git-send-email-nicolas.dichtel@6wind.com>

This problem was fixed upstream by commit 1e9f3d6f1c40 ("ip6tnl: fix use after
free of fb_tnl_dev").
The upstream patch depends on upstream commit 0bd8762824e7 ("ip6tnl: add x-netns
support"), which was not backported into 3.10 branch.

First, let me explain the problem: when the ip6_tunnel module is unloaded,
ip6_tunnel_cleanup() is called.
rmmod ip6_tunnel
=> ip6_tunnel_cleanup()
  => rtnl_link_unregister()
    => __rtnl_kill_links()
      => for_each_netdev(net, dev) {
        if (dev->rtnl_link_ops == ops)
        	ops->dellink(dev, &list_kill);
        }
At this point, the FB device is deleted (and all ip6tnl tunnels).
  => unregister_pernet_device()
    => unregister_pernet_operations()
      => ops_exit_list()
        => ip6_tnl_exit_net()
          => ip6_tnl_destroy_tunnels()
            => t = rtnl_dereference(ip6n->tnls_wc[0]);
               unregister_netdevice_queue(t->dev, &list);
We delete the FB device a second time here!

The previous fix removes these lines, which fixes this double free. But that
patch introduces a memory leak when a netns is destroyed, because the FB device
is never deleted. By adding a dellink rtnl op which deletes all ip6tnl devices
except the FB device, we can keep this explicit removal in
ip6_tnl_destroy_tunnels().
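
The two teardown orders can be replayed with a toy model in plain shell. This
is only a sketch, not kernel code: devices are names in a list, unregister()
counts an error when the name is already gone, and the device names and helper
functions are made up for illustration.

```sh
#!/bin/sh
# Toy model (illustration only): DEVICES stands in for the netdev list,
# FB_DEV for the fallback device. All names here are hypothetical.
DEVICES="ip6tnl0 tnl1 tnl2"
FB_DEV=ip6tnl0
DOUBLE_UNREGISTER=0

unregister ()
{
	case " $DEVICES " in
	*" $1 "*)
		# Rebuild the list without $1
		NEW=""
		for d in $DEVICES; do
			[ "$d" = "$1" ] || NEW="$NEW $d"
		done
		DEVICES=${NEW# }
		;;
	*)
		# Device already gone: this is the double unregister
		DOUBLE_UNREGISTER=$((DOUBLE_UNREGISTER + 1))
		;;
	esac
}

# Without a dellink op: every device of the link type is unregistered
dellink_default () { unregister "$1"; }

# With the dellink op from this patch: the FB device is skipped
dellink_skip_fb () { [ "$1" = "$FB_DEV" ] || unregister "$1"; }

# rmmod order: link ops are unregistered first, then the pernet exit runs
module_unload ()
{
	DELLINK=$1
	for d in $DEVICES; do
		$DELLINK "$d"
	done
	# Explicit FB removal, as in ip6_tnl_destroy_tunnels()
	unregister "$FB_DEV"
}
```

Calling "module_unload dellink_default" leaves DOUBLE_UNREGISTER at 1, while
"module_unload dellink_skip_fb" on a fresh list leaves it at 0 with every
device removed exactly once.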

CC: Steven Rostedt <rostedt@goodmis.org>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/ip6_tunnel.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 0516ebbea80b..f21cf476b00c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1617,6 +1617,15 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
 	return ip6_tnl_update(t, &p);
 }
 
+static void ip6_tnl_dellink(struct net_device *dev, struct list_head *head)
+{
+	struct net *net = dev_net(dev);
+	struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
+
+	if (dev != ip6n->fb_tnl_dev)
+		unregister_netdevice_queue(dev, head);
+}
+
 static size_t ip6_tnl_get_size(const struct net_device *dev)
 {
 	return
@@ -1681,6 +1690,7 @@ static struct rtnl_link_ops ip6_link_ops __read_mostly = {
 	.validate	= ip6_tnl_validate,
 	.newlink	= ip6_tnl_newlink,
 	.changelink	= ip6_tnl_changelink,
+	.dellink	= ip6_tnl_dellink,
 	.get_size	= ip6_tnl_get_size,
 	.fill_info	= ip6_tnl_fill_info,
 };
-- 
1.8.4.1

* Re: Help testing for USB ethernet/xHCI regression
From: renevant @ 2014-01-30 10:46 UTC
  To: David Laight
  Cc: 'Sarah Sharp', renevant@internode.on.net,
	linux-usb@vger.kernel.org, Mark Lord, Greg Kroah-Hartman,
	netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D0F6ACF20@AcuExch.aculab.com>

When using the ax88179 connected via the VIA-based card, the whole system gets
brought down after a while. I got this in my system log.

I'm going to take a break and see if I can narrow anything more down tomorrow. 
This log is in reverse because of the wonderful way journalctl works.

I suppose I could try using iommu=pt as a kernel boot parameter but that 
doesn't sound like a safe thing to do.


Jan 30 21:04:38 athas kernel: [<ffffffffc11281a0>] xhci_msi_irq [xhci_hcd]
Jan 30 21:04:38 athas kernel: handlers:
Jan 30 21:04:38 athas kernel:  [<ffffffffa74f9ee6>] system_call_fastpath+0x1a/0x1f
Jan 30 21:04:38 athas kernel:  [<ffffffffa715ef6a>] ? SyS_epoll_ctl+0x4fa/0xb00
Jan 30 21:04:38 athas kernel:  <EOI>  [<ffffffffa715f181>] ? SyS_epoll_ctl+0x711/0xb00
Jan 30 21:04:38 athas kernel:  [<ffffffffa74f982a>] common_interrupt+0x6a/0x6a
Jan 30 21:04:38 athas kernel:  [<ffffffffa700445a>] do_IRQ+0x4a/0xf0
Jan 30 21:04:38 athas kernel:  [<ffffffffa7004679>] handle_irq+0x19/0x30
Jan 30 21:04:38 athas kernel:  [<ffffffffa708200f>] handle_edge_irq+0x6f/0x120
Jan 30 21:04:38 athas kernel:  [<ffffffffa707f8b1>] handle_irq_event+0x31/0x50
Jan 30 21:04:38 athas kernel:  [<ffffffffa707f802>] handle_irq_event_percpu+0xc2/0x140
Jan 30 21:04:38 athas kernel:  [<ffffffffa7081b00>] note_interrupt+0xe0/0x1e0
Jan 30 21:04:38 athas kernel:  [<ffffffffa708176d>] __report_bad_irq+0x2d/0xc0
Jan 30 21:04:38 athas kernel:  <IRQ>  [<ffffffffa74efd3f>] dump_stack+0x45/0x56
Jan 30 21:04:38 athas kernel: Call Trace:
Jan 30 21:04:38 athas kernel:  0000000000000000 ffff88043edc3ed8 ffffffffa7081b00 0000000000000000
Jan 30 21:04:38 athas kernel:  ffff88043edc3e98 ffffffffa708176d ffff880425031b00 000000000000004d
Jan 30 21:04:38 athas kernel:  ffff880425031b84 ffff88043edc3e70 ffffffffa74efd3f ffff880425031b00
Jan 30 21:04:38 athas kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2201 11/22/2013
Jan 30 21:04:38 athas kernel: CPU: 7 PID: 1160 Comm: Chrome_IOThread Not tainted 3.13.0+ #13
Jan 30 21:04:38 athas kernel: irq event 77: bogus return value ffffff94
Jan 30 21:04:38 athas kernel: xhci_hcd 0000:02:00.0: Host not halted after 16000 microseconds.
Jan 30 21:04:38 athas kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x0019 address=0x00000000002b2000 flags=0x0000]
Jan 30 21:04:38 athas kernel: xhci_hcd 0000:02:00.0: WARNING: Host System Error


Regards,

Will Trives

* Re: IGMP joins come from the wrong SA/interface
From: Steinar H. Gunderson @ 2014-01-30 10:47 UTC
  To: Hannes Frederic Sowa; +Cc: netdev
In-Reply-To: <20140120184025.GA19972@sesse.net>

On Mon, Jan 20, 2014 at 07:40:25PM +0100, Steinar H. Gunderson wrote:
>> I currently only remember one commit 0a7e22609067ff ("ipv4: fix
>> ineffective source address selection") which did affect multicast source
>> address selection in recent times.
> I tried 3.10.27, just to check something older. I also tried 3.10.27 with
> 0a7e22609067ff reverted, and it's still wrong.
> 
> I am thinking this might have something to do with the machine switching to
> systemd, presumably changing the order of DHCP and static addresses being
> assigned...

Anything more I can do here?

/* Steinar */
-- 
Homepage: http://www.sesse.net/

* [PATCH net] net/ipv4: Use proper RCU APIs for writer-side in udp_offload.c
From: Or Gerlitz @ 2014-01-30 10:51 UTC
  To: davem; +Cc: netdev, edumazet, Shlomo Pongratz, Or Gerlitz

From: Shlomo Pongratz <shlomop@mellanox.com>

RCU writer side should use rcu_dereference_protected() and not
rcu_dereference(), fix that. This also removes the "suspicious RCU usage"
warning seen when running with CONFIG_PROVE_RCU.

Fixes: b582ef0 ('net: Add GRO support for UDP encapsulating protocols')
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
---
 net/ipv4/udp_offload.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 2ffea6f..1bf21d4 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -109,7 +109,8 @@ int udp_add_offload(struct udp_offload *uo)
 	new_offload->offload = uo;
 
 	spin_lock(&udp_offload_lock);
-	rcu_assign_pointer(new_offload->next, rcu_dereference(*head));
+	rcu_assign_pointer(new_offload->next,
+			   rcu_dereference_protected(*head, lockdep_is_held(&udp_offload_lock)));
 	rcu_assign_pointer(*head, new_offload);
 	spin_unlock(&udp_offload_lock);
 
@@ -130,12 +131,15 @@ void udp_del_offload(struct udp_offload *uo)
 
 	spin_lock(&udp_offload_lock);
 
-	uo_priv = rcu_dereference(*head);
+	uo_priv = rcu_dereference_protected(*head,
+					    lockdep_is_held(&udp_offload_lock));
 	for (; uo_priv != NULL;
-		uo_priv = rcu_dereference(*head)) {
-
+	     uo_priv = rcu_dereference_protected(*head,
+						 lockdep_is_held(&udp_offload_lock))) {
 		if (uo_priv->offload == uo) {
-			rcu_assign_pointer(*head, rcu_dereference(uo_priv->next));
+			rcu_assign_pointer(*head,
+					   rcu_dereference_protected(uo_priv->next,
+								     lockdep_is_held(&udp_offload_lock)));
 			goto unlock;
 		}
 		head = &uo_priv->next;
-- 
1.7.1

* [PATCH net] e100: Fix "disabling already-disabled device" warning
From: Michele Baldessari @ 2014-01-30 10:51 UTC
  To: netdev; +Cc: e1000-devel, idirectscm, David S. Miller, Michele Baldessari

In https://bugzilla.redhat.com/show_bug.cgi?id=994438 and
https://bugzilla.redhat.com/show_bug.cgi?id=970480  we
received different reports of e100 throwing the following
warning:

 [<c06a0ba5>] ? pci_disable_device+0x85/0x90
 [<c044a153>] warn_slowpath_fmt+0x33/0x40
 [<c06a0ba5>] pci_disable_device+0x85/0x90
 [<f7fdf7e0>] __e100_shutdown+0x80/0x120 [e100]
 [<c0476ca5>] ? check_preempt_curr+0x65/0x90
 [<f7fdf8d6>] e100_suspend+0x16/0x30 [e100]
 [<c06a1ebb>] pci_legacy_suspend+0x2b/0xb0
 [<c098fc0f>] ? wait_for_completion+0x1f/0xd0
 [<c06a2d50>] ? pci_pm_poweroff+0xb0/0xb0
 [<c06a2de4>] pci_pm_freeze+0x94/0xa0
 [<c0767bb7>] dpm_run_callback+0x37/0x80
 [<c076a204>] ? pm_wakeup_pending+0xc4/0x140
 [<c0767f12>] __device_suspend+0xb2/0x1f0
 [<c076806f>] async_suspend+0x1f/0x90
 [<c04706e5>] async_run_entry_fn+0x35/0x140
 [<c0478aef>] ? wake_up_process+0x1f/0x40
 [<c0464495>] process_one_work+0x115/0x370
 [<c0462645>] ? start_worker+0x25/0x30
 [<c0464dc5>] ? manage_workers.isra.27+0x1a5/0x250
 [<c0464f6e>] worker_thread+0xfe/0x330
 [<c0464e70>] ? manage_workers.isra.27+0x250/0x250
 [<c046a224>] kthread+0x94/0xa0
 [<c0997f37>] ret_from_kernel_thread+0x1b/0x28
 [<c046a190>] ? insert_kthread_work+0x30/0x30

This patch removes pci_disable_device() from __e100_shutdown().
pci_clear_master() is enough.

Signed-off-by: Michele Baldessari <michele@acksyn.org>
Tested-by: Mark Harig <idirectscm@aim.com>
---
 drivers/net/ethernet/intel/e100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e100.c b/drivers/net/ethernet/intel/e100.c
index cbaba44..bf7a01e 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -3034,7 +3034,7 @@ static void __e100_shutdown(struct pci_dev *pdev, bool *enable_wake)
 		*enable_wake = false;
 	}
 
-	pci_disable_device(pdev);
+	pci_clear_master(pdev);
 }
 
 static int __e100_power_off(struct pci_dev *pdev, bool wake)
-- 
1.8.5.3

* Re: [Patch net] net: allow setting mac address of loopback device
From: Neil Horman @ 2014-01-30 12:04 UTC
  To: Cong Wang; +Cc: netdev, Stephen Hemminger, Eric Dumazet, David S. Miller
In-Reply-To: <1391038731-7501-1-git-send-email-xiyou.wangcong@gmail.com>

On Wed, Jan 29, 2014 at 03:38:51PM -0800, Cong Wang wrote:
> We are trying to mirror the local traffic from lo to eth0,
> allowing setting mac address of lo to eth0 would make
> the ether addresses in these packets correct, so that
> we don't have to modify the ether header again.
> 
> Since usually no one cares about its mac address (all-zero),
> it is safe to allow those who care to set its mac address.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> 
> ---
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index c5011e0..a0ee030 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -160,6 +160,7 @@ static const struct net_device_ops loopback_ops = {
>  	.ndo_init      = loopback_dev_init,
>  	.ndo_start_xmit= loopback_xmit,
>  	.ndo_get_stats64 = loopback_get_stats64,
> +	.ndo_set_mac_address = eth_mac_addr,
>  };
>  
>  /*
> 
Seems reasonable
Acked-by: Neil Horman <nhorman@tuxdriver.com>

* IPv4 / IPv6 over IPv4 IPsec tunnel: setting the DF bit
From: Simon Schneider @ 2014-01-30 12:25 UTC
  To: netdev

Hi,
for the scenarios
- IPv4 over IPv4 IPsec tunnel
- IPv6 over IPv4 IPsec tunnel

I wonder how the DF bit of the outer (encrypted) packet is set.

There are generally three options:
- DF bit always 0
- DF bit always 1
- DF bit copied from inner packet

(the last case is obviously not applicable for the IPv6 case, as the IPv6 header does not have a DF bit).
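
To make the three options concrete, here is a small shell sketch. It only
restates the options listed above and says nothing about what Linux actually
does; the policy names (clear, set, inherit) and the function outer_df are made
up for illustration, and treating a missing inner DF bit as 0 is an assumption.

```sh
#!/bin/sh
# outer_df POLICY [INNER_DF]
# Prints the DF bit (0 or 1) the outer IPv4 header would carry.
# Hypothetical helper, not a kernel or iproute2 interface.
outer_df ()
{
	case "$1" in
	clear)   echo 0 ;;             # DF bit always 0
	set)     echo 1 ;;             # DF bit always 1
	inherit) echo "${2:-0}" ;;     # copy inner IPv4 DF; 0 if there is none (assumption)
	*)       return 1 ;;           # unknown policy
	esac
}
```

For example, "outer_df inherit 1" prints 1, and "outer_df inherit" with no
inner DF bit (the IPv6 case) falls back to 0 under the assumption above.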

How is this done in Linux?

When investigating, I stumbled over defines named TNL_F_DF_INHERIT / TNL_F_DF_DEFAULT.

Are these still supported?

Is it possible to configure the behavior at runtime or just at compile time?

I would very much appreciate it if someone could give an overview of this!

best regards, Simon

* Re: Help testing for USB ethernet/xHCI regression
From: renevant @ 2014-01-30 12:46 UTC
  To: renevant
  Cc: David Laight, 'Sarah Sharp', linux-usb@vger.kernel.org,
	Mark Lord, Greg Kroah-Hartman, netdev@vger.kernel.org
In-Reply-To: <1703720.2GNpoTUMCv@athas>

via vl800 pcie card 
kernel parameter iommu=pt
ethtool -K xxx sg off
ifconfig xxx mtu 4060 up

stable so far, it's way past the point that it usually crashes.
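
For anyone repeating this, a small sketch of the setup in shell. The helper
names cmdline_has and apply_workaround are made up, the interface name is a
placeholder just like "xxx" above, and apply_workaround needs root.

```sh
#!/bin/sh
# cmdline_has PARAM [CMDLINE]
# Returns 0 if PARAM is a whole word in CMDLINE; CMDLINE defaults to
# the running kernel's /proc/cmdline.
cmdline_has ()
{
	PARAM=$1
	CMDLINE=${2:-$(cat /proc/cmdline)}
	case " $CMDLINE " in
	*" $PARAM "*) return 0 ;;
	*)            return 1 ;;
	esac
}

# apply_workaround IFACE
# Same two commands as listed above: scatter-gather off, MTU 4060, link up.
apply_workaround ()
{
	IFACE=$1
	ethtool -K "$IFACE" sg off &&
	ifconfig "$IFACE" mtu 4060 up
}
```

Typical use would be "cmdline_has iommu=pt && apply_workaround xxx", so the
interface settings are only applied when the boot parameter is really active.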

i'll do proper testing tomorrow


iommu=pt   bah !

Regards,

Will Trives

On Thursday 30 January 2014 21:46:27 renevant@internode.on.net wrote:
> When using the ax88179 connected via the via based card the whole system
> gets brought down after a while i got this my system log.
> 
> I'm going to take a break and see if I can narrow anything more down
> tomorrow. This log is in reverse because of the wonderful way journalctl
> works.
> 
> I suppose I could try using iommu=pt as a kernel boot parameter but that
> doesn't sound like a safe thing to do.
> 
> 
> Jan 30 21:04:38 athas kernel: [<ffffffffc11281a0>] xhci_msi_irq [xhci_hcd]
> Jan 30 21:04:38 athas kernel: handlers:
> Jan 30 21:04:38 athas kernel:  [<ffffffffa74f9ee6>] system_call_fastpath+0x1a/0x1f
> Jan 30 21:04:38 athas kernel:  [<ffffffffa715ef6a>] ? SyS_epoll_ctl+0x4fa/0xb00
> Jan 30 21:04:38 athas kernel:  <EOI>  [<ffffffffa715f181>] ? SyS_epoll_ctl+0x711/0xb00
> Jan 30 21:04:38 athas kernel:  [<ffffffffa74f982a>] common_interrupt+0x6a/0x6a
> Jan 30 21:04:38 athas kernel:  [<ffffffffa700445a>] do_IRQ+0x4a/0xf0
> Jan 30 21:04:38 athas kernel:  [<ffffffffa7004679>] handle_irq+0x19/0x30
> Jan 30 21:04:38 athas kernel:  [<ffffffffa708200f>] handle_edge_irq+0x6f/0x120
> Jan 30 21:04:38 athas kernel:  [<ffffffffa707f8b1>] handle_irq_event+0x31/0x50
> Jan 30 21:04:38 athas kernel:  [<ffffffffa707f802>] handle_irq_event_percpu+0xc2/0x140
> Jan 30 21:04:38 athas kernel:  [<ffffffffa7081b00>] note_interrupt+0xe0/0x1e0
> Jan 30 21:04:38 athas kernel:  [<ffffffffa708176d>] __report_bad_irq+0x2d/0xc0
> Jan 30 21:04:38 athas kernel:  <IRQ>  [<ffffffffa74efd3f>] dump_stack+0x45/0x56
> Jan 30 21:04:38 athas kernel: Call Trace:
> Jan 30 21:04:38 athas kernel:  0000000000000000 ffff88043edc3ed8 ffffffffa7081b00 0000000000000000
> Jan 30 21:04:38 athas kernel:  ffff88043edc3e98 ffffffffa708176d ffff880425031b00 000000000000004d
> Jan 30 21:04:38 athas kernel:  ffff880425031b84 ffff88043edc3e70 ffffffffa74efd3f ffff880425031b00
> Jan 30 21:04:38 athas kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2201 11/22/2013
> Jan 30 21:04:38 athas kernel: CPU: 7 PID: 1160 Comm: Chrome_IOThread Not tainted 3.13.0+ #13
> Jan 30 21:04:38 athas kernel: irq event 77: bogus return value ffffff94
> Jan 30 21:04:38 athas kernel: xhci_hcd 0000:02:00.0: Host not halted after 16000 microseconds.
> Jan 30 21:04:38 athas kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=02:00.0 domain=0x0019 address=0x00000000002b2000 flags=0x0000]
> Jan 30 21:04:38 athas kernel: xhci_hcd 0000:02:00.0: WARNING: Host System Error
> 
> 
> Regards,
> 
> Will Trives

* Re: [PATCH net v2 5/9] bridge: Fix the way to check if a local fdb entry can be deleted
From: Toshiaki Makita @ 2014-01-30 12:50 UTC
  To: Stephen Hemminger, vyasevic, David S . Miller; +Cc: netdev
In-Reply-To: <1387526576.3475.5.camel@ubuntu-vm-makita>

On Fri, 2013-12-20 at 17:02 +0900, Toshiaki Makita wrote:
> On Thu, 2013-12-19 at 09:39 -0800, Stephen Hemminger wrote:
> ...
> > Could we make up as set of test case scripts to validate these changes.
> > Now that FDB table can be manipulated by iproute tools, should be possible
> > to have set of cases for validation.
> 
> Thank you for your suggestion.
> Maybe is it enough to make some test of port attach/detach and confirm
> that data/control plane doesn't result in any inconsistency or odd
> situation by tcpdump/bridge commands?

Sorry for replying to an old thread.

I tested traffic and fdb entries when attaching/detaching a bridge port,
and couldn't find any problem with this patch.

Instead, I found an additional undesirable behavior without this patch:
ping to a bridge device (with arp resolution) sometimes fails while
detaching a port.
The bridge device continues to have its mac address for a while when a
port is detached, but in the current implementation we immediately delete
the corresponding fdb entry even though that mac address is still used by
the bridge device. I think this causes the ping failures: frames sent to
the mac address in the arp reply can't reach the bridge device because the
fdb entry was deleted prematurely.
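
The rule I have in mind can be written down as a small shell sketch. This is
only my reading of the behavior described above, not the kernel code; the
function name fdb_local_deletable and its argument layout are made up for
illustration.

```sh
#!/bin/sh
# fdb_local_deletable MAC USER_MAC...
# Returns 0 (deletable) only when none of the remaining users, i.e. the
# bridge device or any port still attached, has address MAC.
# Hypothetical helper for illustration.
fdb_local_deletable ()
{
	MAC=$1
	shift
	for USER_MAC in "$@"; do
		# Someone still uses this address: keep the local entry
		[ "$USER_MAC" = "$MAC" ] && return 1
	done
	return 0
}
```

With the addresses from the test script below, detaching PORT0 while the bridge
device still has 12:34:56:78:90:ab would be the "keep" case: calling
"fdb_local_deletable 12:34:56:78:90:ab 12:34:56:78:90:ab aa:bb:cc:dd:ee:00"
returns 1.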

I need to rearrange this patch set, but I'm going to keep this change as
is.


The tests I did are:
- Attach/detach a port while sending traffic, and confirm that the
traffic changes to being [flooded/delivered to the bridge] as
expected. Also, confirm that the corresponding fdb entry is added/deleted
as expected.
- Attach/detach a port while flushing arp entry and pinging, and confirm
that ping doesn't fail.
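
For the forwarding test, the window is the difference between two stamps: one
taken with "date +%H%M%S%N" just before "ip link set", and the timestamp of the
first frame captured on the new path. A minimal sketch of that arithmetic,
separate from the script itself; the function names stamp_to_usec and
window_usec are made up for illustration, and the midnight handling is
simplified compared to the script.

```sh
#!/bin/sh
# stamp_to_usec STAMP
# STAMP is HHMMSS followed by fractional-second digits, as printed by
# `date +%H%M%S%N`. Prints microseconds since midnight.
stamp_to_usec ()
{
	echo "$1" | awk '{
		h = substr($0, 1, 2); m = substr($0, 3, 2)
		s = substr($0, 5, 2); us = substr($0, 7, 6)
		# %.0f avoids integer overflow in awk implementations with 32-bit %d
		printf "%.0f\n", (h * 3600 + m * 60 + s) * 1000000 + us
	}'
}

# window_usec START_STAMP END_STAMP
# Prints END - START in microseconds.
window_usec ()
{
	t1=`stamp_to_usec "$1"`
	t2=`stamp_to_usec "$2"`
	# If the end stamp looks earlier, assume the test crossed midnight
	[ "$t2" -lt "$t1" ] && t2=$((t2 + 86400000000))
	echo $((t2 - t1))
}
```

For example, "window_usec 120000000000000 120000050000000" prints 50000, i.e. a
50 ms window.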

The test script I used and the results are below.

---- Test script begin ----
#!/bin/bash

# This is a test script for validating fdb and traffic consistency
# when a bridge port is attached/detached.
#
# This script does two types of tests.
# Forwarding test:
#   Confirm that traffic to mac address of PORT0 is delivered to bridge
#   when PORT0 is attached and flooded when PORT0 is detached.
#   And measure the time window between "ip link set master" and actual traffic
#   change by "date" command (clock_gettime) and timestamp in tcpdump.
# ARP/ICMP test:
#   Confirm that ping can always succeed with arp resolution each time when
#   PORT0 gets attached/detached.
# In each test, this validates fdb entries as well, i.e. if attaching PORT0,
# we must have a local entry where its dst and address is those of PORT0,
# and if detaching PORT0, we must not have such an entry.
#
# This test script assumes environment below.
# BR0 has three ports. Two of them (PORT1/2) are veths to namespaces.
# PORT0 is subject for attaching/detaching.
#
# +----------------------------------------------+
# | +-------------+                              |
# | |     +-------|        +-----+  +---+  +-----|
# | | NS0 |NS0_IF0|--VETH--|PORT1|--|BR0|--|PORT0|
# | |     +-------|        +-----+  +---+  +-----|
# | +-------------+                   |          |
# |                                   |          |
# | +-------------+                   |          |
# | |     +-------|        +-----+    |          |
# | | NS1 |NS1_IF0|--VETH--|PORT2|----+          |
# | |     +-------|        +-----+               |
# | +-------------+                              |
# |                             Physical machine |
# +----------------------------------------------+
#
# Test scenarios are
# Forwarding test:
#   Send traffic from NS0 to the mac address of PORT0.
#   The traffic should be delivered to BR0 when attached, and to NS1 when
#   detached.
# ARP/ICMP test:
#   Invalidate the arp entry of the mac address of PORT0 and do "ping -c 1"
#   to the ip address of BR0.
#   Ping should always succeed even if attach/detach occurs.

export LC_ALL=C

# Number of tests
ITERATIONS=10
[ -n "$1" ] && ITERATIONS=$1

# Namespace
NS0=ns0
NS1=ns1
NS0CMD="ip netns exec $NS0"

# Veth interface
VETH_TO_NS0_IF=veth0
NS0_IF=ns0-eth0
VETH_TO_NS1_IF=veth1
NS1_IF=ns1-eth0

# Bridge
BR0=br0

# Bridge interfaces
PORT0=em1
PORT1=$VETH_TO_NS0_IF
PORT2=$VETH_TO_NS1_IF

# MAC addresses
MAC0=12:34:56:78:90:ab # for PORT0, smallest among all bridge ports
NS0_MAC=aa:bb:cc:dd:ee:00
NS1_MAC=aa:bb:cc:dd:ee:01
FLOOD_MAC=aa:bb:cc:dd:ee:02

# IP addresses
BR0_IP=192.168.0.1
NS0_IP=192.168.0.2
PREFIX=24

# Pktgen parameters
PGTHREAD=/proc/net/pktgen/kpktgend_0
PGDEV=/proc/net/pktgen/$NS0_IF
PGCTRL=/proc/net/pktgen/pgctrl

# Directories to store logs
RESULT_DIR=/tmp

# Test statistics
SEND_FAILS_ATTACH=0
NO_ENTRY_ATTACH=0
WRONG_ENTRY_ATTACH=0
UNDELETED_ENTRY_ATTACH=0
CAPTURE_DROPS_ATTACH=0
NO_CAPTURE_ATTACH=0
DELAYED_FLOW_CHANGE_ATTACH=0
DROPS_UNKNOWN_REASON_ATTACH=0

SEND_FAILS_DETACH=0
NO_ENTRY_DETACH=0
WRONG_ENTRY_DETACH=0
UNDELETED_ENTRY_DETACH=0
CAPTURE_DROPS_DETACH=0
NO_CAPTURE_DETACH=0
DELAYED_FLOW_CHANGE_DETACH=0
DROPS_UNKNOWN_REASON_DETACH=0

WINDOW_SUM_ATTACH=0
WINDOW_MAX_ATTACH=0
WINDOW_MIN_ATTACH=-1

WINDOW_SUM_DETACH=0
WINDOW_MAX_DETACH=0
WINDOW_MIN_DETACH=-1

PING_FAIL_COUNT_ATTACH=0
PING_ERROR_COUNT_ATTACH=0
PING_FAIL_COUNT_DETACH=0
PING_ERROR_COUNT_DETACH=0

FORWARD_TEST_COUNT_ATTACH=0
FORWARD_TEST_COUNT_DETACH=0
REPLY_TEST_COUNT_ATTACH=0
REPLY_TEST_COUNT_DETACH=0

prepare ()
{
	modprobe pktgen

	# Namespace settings
	for i in 0 1; do
		eval NS=\$NS$i
		eval NS_IF=\$NS${i}_IF
		eval VETH_TO_NS_IF=\$VETH_TO_NS${i}_IF
		eval NS_MAC=\$NS${i}_MAC

		if ip netns | grep -q $NS; then
			ip netns exec $NS ip link show dev $NS_IF > /dev/null 2>&1 && \
			ip netns exec $NS ip link del $NS_IF
			ip netns del $NS
		fi

		ip netns add $NS

		ip link show dev $VETH_TO_NS_IF > /dev/null 2>&1 && \
		ip link del $VETH_TO_NS_IF
		ip link show dev $NS_IF > /dev/null 2>&1 && ip link del $NS_IF
		ip link add $VETH_TO_NS_IF type veth peer name $NS_IF

		ip link set $NS_IF netns $NS
		ip netns exec $NS ip link set $NS_IF up
		ip link set $VETH_TO_NS_IF address $NS_MAC 
	done

	# PORT0 address setting
	ip link set $PORT0 address $MAC0

	# Bridge settings
	ip link show dev $BR0 > /dev/null 2>&1 && ip link del $BR0
	ip link add $BR0 type bridge

	ip link set $PORT1 master $BR0 # NS0
	ip link set $PORT2 master $BR0 # NS1

	# IP address settings
	ip addr add ${BR0_IP}/$PREFIX dev $BR0
	$NS0CMD ip addr add ${NS0_IP}/$PREFIX dev $NS0_IF

	# Enable bridge and ports
	ip link set $PORT0 up
	ip link set $PORT1 up
	ip link set $PORT2 up
	ip link set $BR0 up
}

# Set a pktgen parameter
pgset ()
{
	PGPATH=$2
	if [ x"$3" == x"bg" -a x"$PGPATH" == x"$PGCTRL" ]; then
		$NS0CMD sh -c "echo $1 > $PGPATH" &
		PG_PID=$!
		ERROR=$?
	else
		$NS0CMD sh -c "echo $1 > $PGPATH"
		ERROR=$?
		RESULT=`$NS0CMD cat $PGPATH`
		if ! echo "$RESULT" | fgrep -q "Result: OK:"; then
			echo "$RESULT" | fgrep Result: 1>&2
			ERROR=1
		fi
	fi
	[ $ERROR -ne 0 ] && return 1
	return 0
}

# Send frames using pktgen
send_frames ()
{
	COUNT=$1
	DST=$2
	BG=$3

	pgset "rem_device_all" $PGTHREAD || return 1
	pgset "add_device $NS0_IF" $PGTHREAD || return 1
	
	pgset "count $COUNT" $PGDEV || return 1
	pgset "pkt_size 60" $PGDEV || return 1
	pgset "dst_mac $DST" $PGDEV || return 1
	pgset "delay 0" $PGDEV || return 1

	pgset "start" $PGCTRL $BG || return 1
	return 0
}

# Send frames and attach/detach PORT0
do_forward_test ()
{
	ATTACH=$1
	if [ x"$ATTACH" == x"attach" ]; then
		MASTER="master $BR0"
	else
		MASTER="nomaster"
	fi

	DUMP_PROG=tcpdump
	DATE=`date +%H%M%S%N`
	BR0_DUMP=${RESULT_DIR}/${BR0}_${DATE}.dump
	BR0_DUMPLOG=${RESULT_DIR}/${BR0}_${DATE}.log
	NS1_IF_DUMP=${RESULT_DIR}/${NS1_IF}_${DATE}.dump
	NS1_IF_DUMPLOG=${RESULT_DIR}/${NS1_IF}_${DATE}.log
	FILTER="udp and dst port 9"

	# Start capturing
	$DUMP_PROG -p -i $BR0 -f "$FILTER" -s 64 \
	-w $BR0_DUMP 2> $BR0_DUMPLOG &
	DUMP_PIDS=$!

	ip netns exec $NS1 $DUMP_PROG -p -i $NS1_IF -f "$FILTER" -s 64 \
	-w $NS1_IF_DUMP 2> $NS1_IF_DUMPLOG &
	DUMP_PIDS="$DUMP_PIDS $!"

	# Wait for capturing start
	BR0_CAPTURE_OK=0
	NS1_IF_CAPTURE_OK=0
	while [ $BR0_CAPTURE_OK -eq 0 -o $NS1_IF_CAPTURE_OK -eq 0 ]; do
		[ $BR0_CAPTURE_OK -eq 0 ] &&  send_frames 1000 "$NS0_MAC" fg
		[ $NS1_IF_CAPTURE_OK -eq 0 ] && send_frames 1000 "$FLOOD_MAC" fg
		sleep 0.2
		[ $BR0_CAPTURE_OK -eq 0 ] && \
		[ `tshark -n -r $BR0_DUMP 2> /dev/null | wc -l` -ne 0 ] && \
		BR0_CAPTURE_OK=1
		[ $NS1_IF_CAPTURE_OK -eq 0 ] && \
		[ `tshark -n -r $NS1_IF_DUMP 2> /dev/null | wc -l` -ne 0 ] && \
		NS1_IF_CAPTURE_OK=1
	done

	# Do test
	PG_PID=""
	if send_frames 0 "$MAC0" bg; then
		# Wait for frame sending start
		sleep 0.2

		# Change traffic destination
		ADD_DEL_IF_TIME=`date +%H%M%S%N`
		ip link set $PORT0 $MASTER

		# Wait for traffic flow change
		sleep 0.1

		SEND_FAILED=0
	else
		SEND_FAILED=1
	fi

	# Stop capturing
	kill $DUMP_PIDS $PG_PID
	wait $DUMP_PIDS $PG_PID 2> /dev/null
	DUMP_PIDS=""
	PG_PID=""

	if [ $SEND_FAILED -eq 0 ]; then
		SEND_ERROR_COUNT=`$NS0CMD tail -1 $PGDEV | \
		sed -n 's/.*errors: \([0-9]\+\)/\1/p'`
		[ "$SEND_ERROR_COUNT" -ne 0 ] && SEND_FAILED=1
	fi
}

validate_fdb ()
{
	local AT_TYPE=$1

	FDB_ENTRY=`bridge fdb show | grep -v self | grep $MAC0`
	if [ -z "$FDB_ENTRY" ]; then
		# Retrieving a particular fdb entry might fail because
		# "bridge fdb show" could consist of multiple netlink
		# recvmsg()s and fdb might change between each recvmsg().
		# Check fdb again to make sure.
		FDB_ENTRY=`bridge fdb show | grep -v self | grep $MAC0`
	fi

	if [ x"$AT_TYPE" == x"ATTACH" ]; then
		if [ -z "$FDB_ENTRY" ]; then
			NO_ENTRY_ATTACH=`expr $NO_ENTRY_ATTACH + 1`
			return 1
		fi
	
		if ! echo $FDB_ENTRY | grep -q "permanent"; then
			WRONG_ENTRY_ATTACH=`expr $WRONG_ENTRY_ATTACH + 1`
			return 1
		fi
		if ! echo $FDB_ENTRY | grep -q "$PORT0"; then
			WRONG_ENTRY_ATTACH=`expr $WRONG_ENTRY_ATTACH + 1`
			return 1
		fi
	else
		if [ -n "$FDB_ENTRY" ]; then
			if ! echo $FDB_ENTRY | grep -q "$PORT0"; then
				WRONG_ENTRY_DETACH=`expr $WRONG_ENTRY_DETACH + 1`
				return 1
			fi
			UNDELETED_ENTRY_DETACH=`expr $UNDELETED_ENTRY_DETACH + 1`
			return 1
		fi
	fi

	return 0
}

validate_forward ()
{
	if [ x"$1" == x"attach" ]; then
		DUMP_BEFORE_CHANGE=$NS1_IF_DUMP
		DUMP_AFTER_CHANGE=$BR0_DUMP
		local AT_TYPE="ATTACH"
	else
		DUMP_BEFORE_CHANGE=$BR0_DUMP
		DUMP_AFTER_CHANGE=$NS1_IF_DUMP
		local AT_TYPE="DETACH"
	fi

	if [ $SEND_FAILED -eq 1 ]; then
		eval local SEND_FAILS=\$SEND_FAILS_$AT_TYPE
		eval SEND_FAILS_$AT_TYPE=`expr $SEND_FAILS + 1`
		return 2
	fi

	validate_fdb $AT_TYPE
	local RET=$?
	[ $RET -ne 0 ] && return $RET

	# Validate captured data
	BR0_CAPTURE_DROPS=`tail -1 $BR0_DUMPLOG | awk '{print $1}'`
	NS1_IF_CAPTURE_DROPS=`tail -1 $NS1_IF_DUMPLOG | awk '{print $1}'`
	if [ "$BR0_CAPTURE_DROPS" -ne 0 -o \
	     "$NS1_IF_CAPTURE_DROPS" -ne 0 ]; then
		# Kernel failed to store captured frames due to no space
		eval local CAPTURE_DROPS=\$CAPTURE_DROPS_$AT_TYPE
		eval CAPTURE_DROPS_$AT_TYPE=`expr $CAPTURE_DROPS + 1`
		return 2
	fi

	CAPTURED_BEFORE_CHANGE=`tshark -n -r $DUMP_BEFORE_CHANGE -T fields \
	-e frame.number -Y "eth.dst eq $MAC0" 2> /dev/null | wc -l`
	CAPTURED_AFTER_CHANGE=`tshark -n -r $DUMP_AFTER_CHANGE -T fields \
	-e frame.number -Y "eth.dst eq $MAC0" 2> /dev/null | wc -l`
	if [ $CAPTURED_BEFORE_CHANGE -eq 0 ]; then
		# Couldn't capture expected traffic
		# Maybe one of 'sleep's was too short
		eval local NO_CAPTURE=\$NO_CAPTURE_$AT_TYPE
		eval NO_CAPTURE_$AT_TYPE=`expr $NO_CAPTURE + 1`
		return 2
	fi
	if [ $CAPTURED_AFTER_CHANGE -eq 0 ]; then
		# This implies too late traffic flow change
		eval local DELAYED_FLOW_CHANGE=\$DELAYED_FLOW_CHANGE_$AT_TYPE
		eval DELAYED_FLOW_CHANGE_$AT_TYPE=`expr $DELAYED_FLOW_CHANGE + 1`
		return 1
	fi

	LAST_SEQ_BEFORE_CHANGE=`tshark -n -r $DUMP_BEFORE_CHANGE -T fields \
	-e pktgen.seqnum -Y "eth.dst eq $MAC0" 2> /dev/null | tail -1`
	FIRST_SEQ_AFTER_CHANGE=`tshark -n -r $DUMP_AFTER_CHANGE -T fields \
	-e pktgen.seqnum -Y "eth.dst eq $MAC0" 2> /dev/null | head -1`
	LAST_SEQ_AFTER_CHANGE=`tshark -n -r $DUMP_AFTER_CHANGE -T fields \
	-e pktgen.seqnum -Y "eth.dst eq $MAC0" 2> /dev/null | tail -1`

	# Check drops by unknown reason
	if [ `expr $FIRST_SEQ_AFTER_CHANGE - $LAST_SEQ_BEFORE_CHANGE` -ne 1 -o \
	     "$CAPTURED_BEFORE_CHANGE" -ne "$LAST_SEQ_BEFORE_CHANGE" -o \
	     "$CAPTURED_AFTER_CHANGE" -ne \
	     `expr $LAST_SEQ_AFTER_CHANGE - $LAST_SEQ_BEFORE_CHANGE` ]; then
		eval local DROPS_UNKNOWN_REASON=\$DROPS_UNKNOWN_REASON_$AT_TYPE
		eval DROPS_UNKNOWN_REASON_$AT_TYPE=`expr $DROPS_UNKNOWN_REASON + 1`
		return 1
	fi

	# Calculate window time
	TRAFFIC_CHANGED_TIME=`tshark -n -r $DUMP_AFTER_CHANGE -T fields \
	-e frame.time -Y "eth.dst eq $MAC0" 2> /dev/null | head -1 | \
	awk '{print $4}' | sed s/[:.]//g`

	h1=`echo $ADD_DEL_IF_TIME | cut -b 1-2`
	m1=`echo $ADD_DEL_IF_TIME | cut -b 3-4`
	s1=`echo $ADD_DEL_IF_TIME | cut -b 5-6`
	us1=`echo $ADD_DEL_IF_TIME | cut -b 7-12`
	TIME1=`expr $h1 \* 3600 + $m1 \* 60 + $s1`
	TIME1=`expr $TIME1 \* 1000000 + $us1`

	h2=`echo $TRAFFIC_CHANGED_TIME | cut -b 1-2`
	[ "$h2" -lt "$h1" ] && h2=`expr $h2 + 24`
	m2=`echo $TRAFFIC_CHANGED_TIME | cut -b 3-4`
	s2=`echo $TRAFFIC_CHANGED_TIME | cut -b 5-6`
	us2=`echo $TRAFFIC_CHANGED_TIME | cut -b 7-12`
	TIME2=`expr $h2 \* 3600 + $m2 \* 60 + $s2`
	TIME2=`expr $TIME2 \* 1000000 + $us2`

	WINDOW=`expr $TIME2 - $TIME1`
	eval local WINDOW_SUM=\$WINDOW_SUM_$AT_TYPE
	eval WINDOW_SUM_$AT_TYPE=`expr $WINDOW_SUM + $WINDOW`
	eval local WINDOW_MAX=\$WINDOW_MAX_$AT_TYPE
	[ "$WINDOW" -gt "$WINDOW_MAX" ] && eval WINDOW_MAX_$AT_TYPE=$WINDOW
	eval local WINDOW_MIN=\$WINDOW_MIN_$AT_TYPE
	[ "$WINDOW_MIN" -eq -1 -o "$WINDOW" -lt "$WINDOW_MIN" ] && \
	eval WINDOW_MIN_$AT_TYPE=$WINDOW

	return 0
}

cleanup_forward_logs ()
{
	/bin/rm -f "$BR0_DUMP"
	/bin/rm -f "$BR0_DUMPLOG"
	/bin/rm -f "$NS1_IF_DUMP"
	/bin/rm -f "$NS1_IF_DUMPLOG"
}

send_probes ()
{
	: > $PROBE_RESULT
	while :; do
		# Invalidate arp cache
		$NS0CMD ip neigh del $BR0_IP dev $NS0_IF 2> /dev/null
		# Send arp request and echo request.
		# This may fail when mac address of BR0 changes
		# between arp reply and echo request.
		$NS0CMD ping -c 1 -w 1 $BR0_IP > /dev/null 2>&1 || \
		echo >> $PROBE_RESULT
	done
}

do_reply_test ()
{
	ATTACH=$1
	if [ x"$ATTACH" == x"attach" ]; then
		MASTER="master $BR0"
	else
		MASTER="nomaster"
	fi

	DATE=`date +%H%M%S%N`
	PROBE_RESULT=${RESULT_DIR}/PROBE_${DATE}.log

	send_probes &
	PROBE_PID=$!
	sleep 0.1

	# Change traffic destination
	ip link set $PORT0 $MASTER

	# Wait for traffic flow change
	sleep 1.2

	# Stop probing
	kill $PROBE_PID
	wait $PROBE_PID 2> /dev/null
	PROBE_PID=""
}

validate_reply ()
{
	if [ x"$1" == x"attach" ]; then
		local AT_TYPE="ATTACH"
	else
		local AT_TYPE="DETACH"
	fi

	validate_fdb $AT_TYPE
	local RET=$?
	[ $RET -ne 0 ] && return $RET

	# validate ping result
	local PING_FAIL=`cat $PROBE_RESULT | wc -l`
	if [ $PING_FAIL -eq 1 ]; then
		# ping may fail if mac address changes between arp reply and
		# echo request.
		eval local PING_FAIL_COUNT=\$PING_FAIL_COUNT_$AT_TYPE
		eval PING_FAIL_COUNT_$AT_TYPE=`expr $PING_FAIL_COUNT + 1`
		return 2
	elif [ $PING_FAIL -gt 1 ]; then
		# ping should not fail two or more times.
		eval local PING_ERROR_COUNT=\$PING_ERROR_COUNT_$AT_TYPE
		eval PING_ERROR_COUNT_$AT_TYPE=`expr $PING_ERROR_COUNT + 1`
		return 1
	fi

	return 0
}

cleanup_reply_logs ()
{
	/bin/rm -f "$PROBE_RESULT"
}

output_results ()
{
	echo "" 1>&2

	local AT_TYPE
	for AT_TYPE in ATTACH DETACH; do
		eval local NO_ENTRY=\$NO_ENTRY_$AT_TYPE
		eval local WRONG_ENTRY=\$WRONG_ENTRY_$AT_TYPE
		eval local UNDELETED_ENTRY=\$UNDELETED_ENTRY_$AT_TYPE
		eval local DELAYED_FLOW_CHANGE=\$DELAYED_FLOW_CHANGE_$AT_TYPE
		eval local DROPS_UNKNOWN_REASON=\$DROPS_UNKNOWN_REASON_$AT_TYPE
		eval local PING_ERROR_COUNT=\$PING_ERROR_COUNT_$AT_TYPE
		ERROR_COUNT=`expr $NO_ENTRY + $WRONG_ENTRY + $UNDELETED_ENTRY + $DELAYED_FLOW_CHANGE + $DROPS_UNKNOWN_REASON + $PING_ERROR_COUNT`

		eval local SEND_FAILS=\$SEND_FAILS_$AT_TYPE
		eval local CAPTURE_DROPS=\$CAPTURE_DROPS_$AT_TYPE
		eval local NO_CAPTURE=\$NO_CAPTURE_$AT_TYPE
		eval local PING_FAIL_COUNT=\$PING_FAIL_COUNT_$AT_TYPE
		INFO_COUNT=`expr $SEND_FAILS + $CAPTURE_DROPS + $NO_CAPTURE + $PING_FAIL_COUNT`

		eval local FORWARD_TEST_COUNT=\$FORWARD_TEST_COUNT_$AT_TYPE
		eval local REPLY_TEST_COUNT=\$REPLY_TEST_COUNT_$AT_TYPE
		local TEST_COUNT=`expr $FORWARD_TEST_COUNT + $REPLY_TEST_COUNT`

		echo "$AT_TYPE test result:"
		if [ "$ERROR_COUNT" -ne 0 ]; then
			echo "Validation failed: $ERROR_COUNT"
			if [ "$NO_ENTRY" -ne 0 ]; then
				echo -e "\tExpected fdb entry not found: $NO_ENTRY"
			fi
			if [ "$WRONG_ENTRY" -ne 0 ]; then
				echo -e "\tExpected fdb entry had wrong attribute: $WRONG_ENTRY"
			fi
			if [ "$UNDELETED_ENTRY" -ne 0 ]; then
				echo -e "\tUnexpected fdb entry found: $UNDELETED_ENTRY"
			fi
			if [ "$DELAYED_FLOW_CHANGE" -ne 0 ]; then
				echo -e "\tWindow time was too long (over 100 ms): $DELAYED_FLOW_CHANGE"
			fi
			if [ "$DROPS_UNKNOWN_REASON" -ne 0 ]; then
				echo -e "\tCouldn't capture due to unknown reason: $DROPS_UNKNOWN_REASON"
			fi
			if [ "$PING_ERROR_COUNT" -ne 0 ]; then
				echo -e "\tPing failed two or more times: $PING_ERROR_COUNT"
			fi
		else
			if [ "$TEST_COUNT" -ne 0 ]; then
				echo "All validations succeeded."
				echo "Number of valid tests: $TEST_COUNT"
				echo -e "\tForwarding tests: $FORWARD_TEST_COUNT"
				echo -e "\tARP/ICMP tests: $REPLY_TEST_COUNT"
			else
				echo "No valid test was done."
			fi
		fi

		if [ "$FORWARD_TEST_COUNT" -ne 0 ]; then
			eval local WINDOW_SUM=\$WINDOW_SUM_$AT_TYPE
			eval local WINDOW_MAX=\$WINDOW_MAX_$AT_TYPE
			eval local WINDOW_MIN=\$WINDOW_MIN_$AT_TYPE

			echo "Window time summary:"

			WINDOW_AVG=`expr $WINDOW_SUM / $FORWARD_TEST_COUNT`
			WINDOW_AVG_MSEC=`expr $WINDOW_AVG / 1000`
			WINDOW_AVG_USEC=`expr $WINDOW_AVG % 1000 | xargs printf %03d`
			echo -e "\tAverage window time: ${WINDOW_AVG_MSEC}.$WINDOW_AVG_USEC msec"

			WINDOW_MAX_MSEC=`expr $WINDOW_MAX / 1000`
			WINDOW_MAX_USEC=`expr $WINDOW_MAX % 1000 | xargs printf %03d`
			echo -e "\tMax window time: ${WINDOW_MAX_MSEC}.$WINDOW_MAX_USEC msec"

			WINDOW_MIN_MSEC=`expr $WINDOW_MIN / 1000`
			WINDOW_MIN_USEC=`expr $WINDOW_MIN % 1000 | xargs printf %03d`
			echo -e "\tMin window time: ${WINDOW_MIN_MSEC}.$WINDOW_MIN_USEC msec"
		fi

		if [ "$INFO_COUNT" -ne 0 ]; then
			echo "INFO: Some tests failed to execute commands normally."
			echo "      These are not validation failures but test environment issues."
			if [ "$SEND_FAILS" -ne 0 ]; then
				echo -e "\tFrame send fails: $SEND_FAILS"
			fi
			if [ "$CAPTURE_DROPS" -ne 0 ]; then
				echo -e "\tCouldn't capture due to buffer overflow: $CAPTURE_DROPS"
			fi
			if [ "$NO_CAPTURE" -ne 0 ]; then
				echo -e "\tCouldn't capture due to delayed pktgen start: $NO_CAPTURE"
			fi
			if [ "$PING_FAIL_COUNT" -ne 0 ]; then
				echo -e "\tPing failed (only once): $PING_FAIL_COUNT"
				echo -e "\t  Maybe this happened due to timing issue where mac address changes between arp reply and echo request."
			fi
		fi

		echo ""
	done
}

cleanup_settings ()
{
	ip link del $BR0

	local i
	for i in 0 1; do
		eval NS=\$NS$i
		eval VETH_TO_NS_IF=\$VETH_TO_NS${i}_IF

		ip link del $VETH_TO_NS_IF
		ip netns del $NS
	done
}

output_exit ()
{
	output_results
	if [ -n "$DUMP_PIDS$PG_PID$PROBE_PID" ]; then
		kill $DUMP_PIDS $PG_PID $PROBE_PID
		wait $DUMP_PIDS $PG_PID $PROBE_PID 2> /dev/null
	fi
	cleanup_forward_logs
	cleanup_reply_logs
	cleanup_settings
	exit
}

forward_test ()
{
	local j
	for j in attach detach; do
		if [ x"$j" == x"attach" ]; then
			local AT_TYPE=ATTACH
		else
			local AT_TYPE=DETACH
		fi
		do_forward_test $j
		validate_forward $j
		local RET=$?
		if [ $RET -eq 0 ]; then
			echo -n "." 1>&2
			eval local TEST_COUNT=\$FORWARD_TEST_COUNT_$AT_TYPE
			eval FORWARD_TEST_COUNT_$AT_TYPE=`expr $TEST_COUNT + 1`
		elif [ $RET -eq 1 ]; then
			echo -n "!" 1>&2
		else
			echo -n "?" 1>&2
		fi
		cleanup_forward_logs
	done
}

reply_test ()
{
	local j
	for j in attach detach; do
		if [ x"$j" == x"attach" ]; then
			local AT_TYPE=ATTACH
		else
			local AT_TYPE=DETACH
		fi
		do_reply_test $j
		validate_reply $j
		local RET=$?
		if [ $RET -eq 0 ]; then
			echo -n "." 1>&2
			eval local TEST_COUNT=\$REPLY_TEST_COUNT_$AT_TYPE
			eval REPLY_TEST_COUNT_$AT_TYPE=`expr $TEST_COUNT + 1`
		elif [ $RET -eq 1 ]; then
			echo -n "!" 1>&2
		else
			echo -n "?" 1>&2
		fi
		cleanup_reply_logs
	done
}

prepare

trap output_exit INT TERM
trap "output_results 1>&2" USR1

for i in `seq $ITERATIONS`; do
	forward_test
	reply_test
done

output_exit

---- Test script end ----

---- Test result begin ----

* before this patch set

ATTACH test result:
All validations succeeded.
Number of valid tests: 2000
        Forwarding tests: 1000
        ARP/ICMP tests: 1000
Window time summary:
        Average window time: 1.547 msec
        Max window time: 2.899 msec
        Min window time: 1.328 msec

DETACH test result:
All validations succeeded.
Number of valid tests: 1982
        Forwarding tests: 1000
        ARP/ICMP tests: 982
Window time summary:
        Average window time: 1.276 msec
        Max window time: 4.030 msec
        Min window time: 1.056 msec
INFO: Some tests failed to execute commands normally.
      These are not validation failures but test environment issues.
        Ping failed (only once): 18
          Maybe this happened due to timing issue where mac address changes between arp reply and echo request.


* after this patch set

ATTACH test result:
All validations succeeded.
Number of valid tests: 2000
        Forwarding tests: 1000
        ARP/ICMP tests: 1000
Window time summary:
        Average window time: 1.455 msec
        Max window time: 2.651 msec
        Min window time: 1.256 msec

DETACH test result:
All validations succeeded.
Number of valid tests: 1999
        Forwarding tests: 1000
        ARP/ICMP tests: 999
Window time summary:
        Average window time: 1.382 msec
        Max window time: 4.015 msec
        Min window time: 1.159 msec
INFO: Some tests failed to execute commands normally.
      These are not validation failures but test environment issues.
        Ping failed (only once): 1
          Maybe this happened due to timing issue where mac address changes between arp reply and echo request.

---- Test result end ----

Note: ping still failed once even after applying this patch set.
I think the MAC address changed between the ARP reply and the echo request.
This can happen not only on a bridge device but in any network
environment, and it is not a bug.

Thanks,
Toshiaki Makita

^ permalink raw reply

* [PATCH] rtnetlink: return the newly created link in response to newlink
From: Tom Gundersen @ 2014-01-30 13:05 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, John Fastabend, Thomas Graf, Nicolas Dichtel,
	Vlad Yasevich, Tom Gundersen, Marcel Holtmann, David S. Miller

Userspace needs to reliably know the ifindex of the netdevs it creates,
as we cannot rely on the ifname staying unchanged.

Earlier, a simple NLMSG_ERROR would be returned, but this returns the
corresponding RTM_NEWLINK on success instead.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: David S. Miller <davem@davemloft.net>
---
 net/core/rtnetlink.c | 100 ++++++++++++++++++++++++++-------------------------
 1 file changed, 52 insertions(+), 48 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index cf67144..31c1322 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1725,6 +1725,54 @@ static int rtnl_group_changelink(struct net *net, int group,
 	return 0;
 }
 
+static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh)
+{
+	struct net *net = sock_net(skb->sk);
+	struct ifinfomsg *ifm;
+	char ifname[IFNAMSIZ];
+	struct nlattr *tb[IFLA_MAX+1];
+	struct net_device *dev = NULL;
+	struct sk_buff *nskb;
+	int err;
+	u32 ext_filter_mask = 0;
+
+	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[IFLA_IFNAME])
+		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
+
+	if (tb[IFLA_EXT_MASK])
+		ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
+
+	ifm = nlmsg_data(nlh);
+	if (ifm->ifi_index > 0)
+		dev = __dev_get_by_index(net, ifm->ifi_index);
+	else if (tb[IFLA_IFNAME])
+		dev = __dev_get_by_name(net, ifname);
+	else
+		return -EINVAL;
+
+	if (dev == NULL)
+		return -ENODEV;
+
+	nskb = nlmsg_new(if_nlmsg_size(dev, ext_filter_mask), GFP_KERNEL);
+	if (nskb == NULL)
+		return -ENOBUFS;
+
+	err = rtnl_fill_ifinfo(nskb, dev, RTM_NEWLINK, NETLINK_CB(skb).portid,
+			       nlh->nlmsg_seq, 0, 0, ext_filter_mask);
+	if (err < 0) {
+		/* -EMSGSIZE implies BUG in if_nlmsg_size */
+		WARN_ON(err == -EMSGSIZE);
+		kfree_skb(nskb);
+	} else
+		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+
+	return err;
+}
+
 static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1871,63 +1919,19 @@ replay:
 			goto out;
 		}
 
+		ifm->ifi_index = dev->ifindex;
+
 		err = rtnl_configure_link(dev, ifm);
 		if (err < 0)
 			unregister_netdevice(dev);
+		else
+			rtnl_getlink(skb, nlh);
 out:
 		put_net(dest_net);
 		return err;
 	}
 }
 
-static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh)
-{
-	struct net *net = sock_net(skb->sk);
-	struct ifinfomsg *ifm;
-	char ifname[IFNAMSIZ];
-	struct nlattr *tb[IFLA_MAX+1];
-	struct net_device *dev = NULL;
-	struct sk_buff *nskb;
-	int err;
-	u32 ext_filter_mask = 0;
-
-	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
-	if (err < 0)
-		return err;
-
-	if (tb[IFLA_IFNAME])
-		nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ);
-
-	if (tb[IFLA_EXT_MASK])
-		ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);
-
-	ifm = nlmsg_data(nlh);
-	if (ifm->ifi_index > 0)
-		dev = __dev_get_by_index(net, ifm->ifi_index);
-	else if (tb[IFLA_IFNAME])
-		dev = __dev_get_by_name(net, ifname);
-	else
-		return -EINVAL;
-
-	if (dev == NULL)
-		return -ENODEV;
-
-	nskb = nlmsg_new(if_nlmsg_size(dev, ext_filter_mask), GFP_KERNEL);
-	if (nskb == NULL)
-		return -ENOBUFS;
-
-	err = rtnl_fill_ifinfo(nskb, dev, RTM_NEWLINK, NETLINK_CB(skb).portid,
-			       nlh->nlmsg_seq, 0, 0, ext_filter_mask);
-	if (err < 0) {
-		/* -EMSGSIZE implies BUG in if_nlmsg_size */
-		WARN_ON(err == -EMSGSIZE);
-		kfree_skb(nskb);
-	} else
-		err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
-
-	return err;
-}
-
 static u16 rtnl_calcit(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
 	struct net *net = sock_net(skb->sk);
-- 
1.8.5.3

^ permalink raw reply related

* [PATCH] net: set default DEVTYPE for all ethernet based devices
From: Tom Gundersen @ 2014-01-30 13:20 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Stephen Hemminger, Avinash Kumar,
	Mauro Carvalho Chehab, Simon Horman, Tom Gundersen,
	Marcel Holtmann, Greg KH, Kay Sievers

In systemd's networkd and udevd, we would like to give the administrator a
simple way to filter net devices by their DEVTYPE [0][1]. Other software
such as ConnMan and NetworkManager already use similar filtering.

Currently, plain ethernet devices have DEVTYPE=(null). This patch sets the
devtype to "ethernet" instead. This avoids the need for special-casing the
DEVTYPE=(null) case in userspace, and also avoids false positives, as there
are several other types of netdevs that also have DEVTYPE=(null).

Notice that this is done, as suggested by Marcel, in alloc_etherdev_mqs(),
and as best I can tell it will not give any false positives. I considered
doing it in ether_setup() instead as that seemed more intuitive, but that
would give a lot of false positives indeed.
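
For illustration only, a userspace filter on DEVTYPE could look roughly like the sketch below. It assumes the caller has already read the text of /sys/class/net/<ifname>/uevent into memory; the helper name uevent_devtype() is hypothetical and not taken from systemd, ConnMan or NetworkManager. A missing DEVTYPE= line corresponds to the DEVTYPE=(null) case described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan uevent text (KEY=value lines) for DEVTYPE.
 * Copies the value into out (truncating if needed) and returns 1 if a
 * DEVTYPE= line exists; returns 0 and an empty string otherwise.
 */
static int uevent_devtype(const char *uevent, char *out, size_t outlen)
{
	static const char key[] = "DEVTYPE=";
	const size_t keylen = sizeof(key) - 1;
	const char *line = uevent;

	assert(uevent != NULL && out != NULL && outlen > 0);

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);

		if (len >= keylen && memcmp(line, key, keylen) == 0) {
			size_t vlen = len - keylen;

			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, line + keylen, vlen);
			out[vlen] = '\0';
			return 1;
		}
		if (!end)
			break;
		line = end + 1;
	}

	out[0] = '\0';
	return 0;
}
```

With this patch applied, a plain ethernet device would be expected to take the "found" path with the value "ethernet" instead of needing a special case for the missing key.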

[0]: <http://www.freedesktop.org/software/systemd/man/systemd-networkd.service.html#Type>
[1]: <http://www.freedesktop.org/software/systemd/man/udev.html#Type>

Signed-off-by: Tom Gundersen <teg@jklm.no>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Kay Sievers <kay@vrfy.org>
---
 net/ethernet/eth.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 8f032ba..b76dc17 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -369,6 +369,10 @@ void ether_setup(struct net_device *dev)
 }
 EXPORT_SYMBOL(ether_setup);

+static const struct device_type eth_type = {
+	.name = "ethernet",
+};
+
 /**
  * alloc_etherdev_mqs - Allocates and sets up an Ethernet device
  * @sizeof_priv: Size of additional driver-private structure to be allocated
@@ -387,7 +391,13 @@ EXPORT_SYMBOL(ether_setup);
 struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
 				      unsigned int rxqs)
 {
-	return alloc_netdev_mqs(sizeof_priv, "eth%d", ether_setup, txqs, rxqs);
+	struct net_device* dev;
+
+	dev = alloc_netdev_mqs(sizeof_priv, "eth%d", ether_setup, txqs, rxqs);
+	if (dev)
+		dev->dev.type = &eth_type;
+
+	return dev;
 }
 EXPORT_SYMBOL(alloc_etherdev_mqs);

-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH 0/2] [BUG FIXES - 3.10.27] sit: More backports
From: Steven Rostedt @ 2014-01-30 13:31 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: linux-kernel, netdev, stable, Clark Williams,
	Luis Claudio R. Goncalves, John Kacur, Willem de Bruijn
In-Reply-To: <52EA1B4B.5080403@6wind.com>

On Thu, 30 Jan 2014 10:28:43 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:


> Steve, I think the patch I sent yesterday is the right fix. In the end, it's
> a backport of Willem's patch. Note that he also acked that patch.
> The first version you sent (which removes
> unregister_netdevice_queue(sitn->fb_tunnel_dev, &list)) will introduce a
> memory leak when the user destroys a netns.

Hi Nicolas,

I reverted my patches and applied and tested your patches locally and
they passed my first line testing. I'm going to have them entered into
our test suite, after removing our other patches, and see if they solve
all the bugs that we were tripping over.

I'll let you know when these are finished.

Thanks!

-- Steve

^ permalink raw reply

* Re: [PATCH 0/2] [BUG FIXES - 3.10.27] sit: More backports
From: Nicolas Dichtel @ 2014-01-30 13:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, netdev, stable, Clark Williams,
	Luis Claudio R. Goncalves, John Kacur, Willem de Bruijn
In-Reply-To: <20140130083103.2bc68bad@gandalf.local.home>

Le 30/01/2014 14:31, Steven Rostedt a écrit :
> On Thu, 30 Jan 2014 10:28:43 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>
>> Steve, I think the patch I sent yesterday is the right fix. In the end, it's
>> a backport of Willem's patch. Note that he also acked that patch.
>> The first version you sent (which removes
>> unregister_netdevice_queue(sitn->fb_tunnel_dev, &list)) will introduce a
>> memory leak when the user destroys a netns.
>
> Hi Nicolas,
>
> I reverted my patches and applied and tested your patches locally and
> they passed my first line testing. I'm going to have them entered into
> our test suite, after removing our other patches, and see if they solve
> all the bugs that we were tripping over.
>
> I'll let you know when these are finished.
Thank you for testing.


Regards,
Nicolas

^ permalink raw reply

* Re: Fwd: RFC 7112 on Implications of Oversized IPv6 Header Chains
From: Ben Hutchings @ 2014-01-30 13:56 UTC (permalink / raw)
  To: Fernando Gont; +Cc: netdev
In-Reply-To: <52E955A3.7080408@gont.com.ar>

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]

On Wed, 2014-01-29 at 16:25 -0300, Fernando Gont wrote:
> Folks,
> 
> FYI. This one has important implications -- it allows stateless
> filtering in IPv6 (otherwise not really possible)
[...]

Still not possible unless you can trust that all hosts behind the
firewall will correctly drop overlapping fragments.

Ben.

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH 1/4] net: ethoc: implement basic ethtool operations
From: Ben Hutchings @ 2014-01-30 13:59 UTC (permalink / raw)
  To: Max Filippov
  Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
	Marc Gauthier
In-Reply-To: <1391025397-14965-2-git-send-email-jcmvbkbc@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1341 bytes --]

On Wed, 2014-01-29 at 23:56 +0400, Max Filippov wrote:
> The following methods are implemented:
> - get link state (standard implementation);
> - get timestamping info (standard implementation).
> 
> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

Reviewed-by: Ben Hutchings <ben@decadent.org.uk>

> ---
>  drivers/net/ethernet/ethoc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
> index 5854d41..6de6352 100644
> --- a/drivers/net/ethernet/ethoc.c
> +++ b/drivers/net/ethernet/ethoc.c
> @@ -900,6 +900,11 @@ out:
>  	return NETDEV_TX_OK;
>  }
>  
> +const struct ethtool_ops ethoc_ethtool_ops = {
> +	.get_link = ethtool_op_get_link,
> +	.get_ts_info = ethtool_op_get_ts_info,
> +};
> +
>  static const struct net_device_ops ethoc_netdev_ops = {
>  	.ndo_open = ethoc_open,
>  	.ndo_stop = ethoc_stop,
> @@ -1148,6 +1153,7 @@ static int ethoc_probe(struct platform_device *pdev)
>  	netdev->netdev_ops = &ethoc_netdev_ops;
>  	netdev->watchdog_timeo = ETHOC_TIMEOUT;
>  	netdev->features |= 0;
> +	netdev->ethtool_ops = &ethoc_ethtool_ops;
>  
>  	/* setup NAPI */
>  	netif_napi_add(netdev, &priv->napi, ethoc_poll, 64);

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH 3/4] net: ethoc: implement ethtool get registers
From: Ben Hutchings @ 2014-01-30 14:01 UTC (permalink / raw)
  To: Max Filippov
  Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
	Marc Gauthier
In-Reply-To: <1391025397-14965-4-git-send-email-jcmvbkbc@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1716 bytes --]

On Wed, 2014-01-29 at 23:56 +0400, Max Filippov wrote:
> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

Reviewed-by: Ben Hutchings <ben@decadent.org.uk>

> ---
>  drivers/net/ethernet/ethoc.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
> index 9518023..0bf297b 100644
> --- a/drivers/net/ethernet/ethoc.c
> +++ b/drivers/net/ethernet/ethoc.c
> @@ -52,6 +52,7 @@ MODULE_PARM_DESC(buffer_size, "DMA buffer allocation size");
>  #define	ETH_HASH0	0x48
>  #define	ETH_HASH1	0x4c
>  #define	ETH_TXCTRL	0x50
> +#define	ETH_END		0x54
>  
>  /* mode register */
>  #define	MODER_RXEN	(1 <<  0) /* receive enable */
> @@ -922,9 +923,28 @@ static int ethoc_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
>  	return phy_ethtool_sset(phydev, cmd);
>  }
>  
> +static int ethoc_get_regs_len(struct net_device *netdev)
> +{
> +	return ETH_END;
> +}
> +
> +static void ethoc_get_regs(struct net_device *dev, struct ethtool_regs *regs,
> +			   void *p)
> +{
> +	struct ethoc *priv = netdev_priv(dev);
> +	u32 *regs_buff = p;
> +	unsigned i;
> +
> +	regs->version = 0;
> +	for (i = 0; i < ETH_END / sizeof(u32); ++i)
> +		regs_buff[i] = ethoc_read(priv, i * sizeof(u32));
> +}
> +
>  const struct ethtool_ops ethoc_ethtool_ops = {
>  	.get_settings = ethoc_get_settings,
>  	.set_settings = ethoc_set_settings,
> +	.get_regs_len = ethoc_get_regs_len,
> +	.get_regs = ethoc_get_regs,
>  	.get_link = ethtool_op_get_link,
>  	.get_ts_info = ethtool_op_get_ts_info,
>  };

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH v2 4/4] net: ethoc: implement ethtool operations
From: Ben Hutchings @ 2014-01-30 14:04 UTC (permalink / raw)
  To: Max Filippov
  Cc: linux-xtensa@linux-xtensa.org, netdev, LKML, Chris Zankel,
	Marc Gauthier, David S. Miller, Florian Fainelli
In-Reply-To: <CAMo8Bf+9A-yh-_EBmZeJqMhUKrHN5+BPtckFj_gnZ44tfnX__w@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 994 bytes --]

On Thu, 2014-01-30 at 07:04 +0400, Max Filippov wrote:
> On Thu, Jan 30, 2014 at 5:59 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Wed, 2014-01-29 at 10:00 +0400, Max Filippov wrote:
[...]
> >> +     priv->num_tx = rounddown_pow_of_two(ring->tx_pending);
> >
> > Range check?
> 
> Can more be requested than the ring->tx_max_pending that we
> indicated in get_ringparam?

Yes, the ethtool core doesn't check that for you.

> >> +     priv->num_rx = priv->num_bd - priv->num_tx;
> >> +     if (priv->num_rx > ring->rx_pending)
> >> +             priv->num_rx = ring->rx_pending;
> >
> > So the RX ring may only ever be shrunk?!  Did you mean to compare with
> > priv->num_bd instead?
> 
> First all non-TX descriptors are made RX, and if that's more than user
> requested I trim it.
[...]

OK, I get it.  But it would be clearer if you used min().
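
As a rough userspace sketch of the two review points above (an explicit range check, and min() for the RX trim): the struct, the helper names and the exact limits below are assumptions made for illustration, not the driver's actual code. In the kernel, rounddown_pow_of_two() and min() would be used directly.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the descriptor counts kept in the driver's priv. */
struct ring_split {
	unsigned int num_bd;	/* total buffer descriptors */
	unsigned int num_tx;
	unsigned int num_rx;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Largest power of two that is <= n; n must be nonzero. */
static unsigned int rounddown_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n != 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Split num_bd descriptors between TX and RX. The limits (at least one
 * descriptor per direction, at most num_bd - 1) are an assumption.
 */
static int split_ring(struct ring_split *rs, unsigned int tx_pending,
		      unsigned int rx_pending)
{
	if (rs->num_bd < 2)
		return -EINVAL;
	if (tx_pending < 1 || tx_pending > rs->num_bd - 1 ||
	    rx_pending < 1 || rx_pending > rs->num_bd - 1)
		return -EINVAL;

	rs->num_tx = rounddown_pow2(tx_pending);
	/* All non-TX descriptors become RX, trimmed to the requested size. */
	rs->num_rx = min_uint(rs->num_bd - rs->num_tx, rx_pending);
	return 0;
}
```

Written this way, the "take everything left, then trim" intent is visible in a single expression.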

Ben.

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: IGMP joins come from the wrong SA/interface
From: Hannes Frederic Sowa @ 2014-01-30 14:17 UTC (permalink / raw)
  To: Steinar H. Gunderson; +Cc: netdev
In-Reply-To: <20140130104709.GA21178@sesse.net>

On Thu, Jan 30, 2014 at 11:47:09AM +0100, Steinar H. Gunderson wrote:
> On Mon, Jan 20, 2014 at 07:40:25PM +0100, Steinar H. Gunderson wrote:
> >> I currently only remember one commit 0a7e22609067ff ("ipv4: fix
> >> ineffective source address selection") which did affect multicast source
> >> address selection in recent times.
> > I tried 3.10.27, just to check something older. I also tried 3.10.27 with
> > 0a7e22609067ff reverted, and it's still wrong.
> > 
> > I am thinking this might have something to do with the machine switching to
> > systemd, presumably changing the order of DHCP and static addresses being
> > assigned...
> 
> Anything more I can do here?

Can you give a bit more background on which multicast application you are
running on the box, and also post the output of cat /proc/net/igmp?

(For the application, an strace showing how it joins the multicast address
would be interesting.)

I guess a workaround would be to bind the join to a specific interface.
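
A minimal sketch of such a workaround on Linux, assuming the application can be changed: pass struct ip_mreqn with imr_ifindex set, so the join is tied to one interface rather than left to routing and source address selection. The helper names are hypothetical, and the ifindex is assumed to come from if_nametoindex() or similar.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill a membership request for an IPv4 group on a specific interface.
 * Returns 0 on success, -1 if group is not a valid IPv4 multicast address.
 */
static int fill_mreqn(struct ip_mreqn *mreq, const char *group, int ifindex)
{
	assert(mreq != NULL && group != NULL);

	memset(mreq, 0, sizeof(*mreq));
	if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
		return -1;
	if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
		return -1;
	/* imr_address stays INADDR_ANY; the interface is chosen by index. */
	mreq->imr_ifindex = ifindex;
	return 0;
}

/* Join group on the given UDP socket, bound to one interface by index. */
static int join_group_on_ifindex(int fd, const char *group, int ifindex)
{
	struct ip_mreqn mreq;

	if (fill_mreqn(&mreq, group, ifindex) < 0)
		return -1;
	return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
			  &mreq, sizeof(mreq));
}
```

Whether this changes the source address of the IGMP report in the setup discussed here would still need to be checked with a packet capture.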

Greetings,

  Hannes

^ permalink raw reply

* Re: IPv4 / IPv6 over IPv4 IPsec tunnel: setting the DF bit
From: Hannes Frederic Sowa @ 2014-01-30 14:21 UTC (permalink / raw)
  To: Simon Schneider; +Cc: netdev
In-Reply-To: <trinity-bc5263ea-896d-4350-aa80-fd2895b54b3b-1391084710336@3capp-gmx-bs28>

On Thu, Jan 30, 2014 at 01:25:10PM +0100, Simon Schneider wrote:
> Hi,
> for the scenarios
> - IPv4 over IPv4 IPsec tunnel
> - IPv6 over IPv4 IPsec tunnel
> 
> I wonder how the DF bit of the outer (encrypted) packet is set.
> 
> There are generally three options:
> - DF bit always 0
> - DF bit always 1
> - DF bit copied from inner packet

There is a pmtudisc knob on ip tunnel ... to force the DF bit on outgoing packets,
but the DF bit should get copied from the inner packet up to the tunnel header in
every case.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH] rtnetlink: return the newly created link in response to newlink
From: Thomas Graf @ 2014-01-30 14:27 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: netdev, linux-kernel, John Fastabend, Nicolas Dichtel,
	Vlad Yasevich, Marcel Holtmann, David S. Miller
In-Reply-To: <1391087144-24490-1-git-send-email-teg@jklm.no>

On 01/30/14 at 02:05pm, Tom Gundersen wrote:
> Userspace needs to reliably know the ifindex of the netdevs it creates,
> as we cannot rely on the ifname staying unchanged.
> 
> Earlier, a simple NLMSG_ERROR would be returned, but this returns the
> corresponding RTM_NEWLINK on success instead.

This breaks existing Netlink applications in user space. User space
apps are not prepared to receive both a RTM_NEWLINK reply _and_
the ACK unless they have set NLM_F_ECHO in the original request.

You can already reliably retrieve the ifindex by listening to
RTNLGRP_LINK messages and be notified about the link created
including all follow-up renames.
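
A rough sketch of the listening approach, under stated assumptions: the function names are hypothetical, only the first message in a buffer is examined (a real listener would walk the datagram with NLMSG_OK()/NLMSG_NEXT()), and matching a notification to the device the application just created is left out.

```c
#include <sys/socket.h>
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Open a NETLINK_ROUTE socket subscribed to link notifications. */
static int open_link_listener(void)
{
	struct sockaddr_nl sa;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.nl_family = AF_NETLINK;
	sa.nl_groups = RTMGRP_LINK;	/* legacy bitmask form of RTNLGRP_LINK */
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Return the ifindex carried by an RTM_NEWLINK message, or -1 if the
 * buffer is too short or holds a different message type.
 */
static int newlink_ifindex(const struct nlmsghdr *nlh, size_t len)
{
	const struct ifinfomsg *ifm;

	assert(nlh != NULL);

	if (len < NLMSG_LENGTH(sizeof(*ifm)) ||
	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifm)) ||
	    nlh->nlmsg_len > len)
		return -1;
	if (nlh->nlmsg_type != RTM_NEWLINK)
		return -1;

	ifm = (const struct ifinfomsg *)((const char *)nlh + NLMSG_HDRLEN);
	return ifm->ifi_index;
}
```

The listener would recv() on the returned descriptor and pass each received buffer to newlink_ifindex(); since renames are also reported as RTM_NEWLINK, the ifindex stays the stable key.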

^ permalink raw reply

* [PATCH 0/5] can: sja1000: cleanups and new OF property
From: Florian Vaussard @ 2014-01-30 14:29 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can, netdev, linux-kernel, florian.vaussard

Hello,

The first part of this series performs several small cleanups
(patches 1 to 3).

The second part introduces the 'reg-io-width' binding (already used
by some other drivers) to perform a similar job as what was done
with IORESOURCE_MEM_XXBIT on the sja1000_platform. This is needed
on my system to correctly take into account the aliasing of the
address bus.

All patches were tested on my OMAP3 system with a memory-mapped
SJA1000.

Regards,
Florian

Florian Vaussard (5):
  can: sja1000: remove unused defines
  can: sja1000: convert printk to use netdev API
  can: sja1000: of: use devm_* APIs
  Documentation: devicetree: sja1000: add reg-io-width binding
  can: sja1000: of: add read/write routines for 8, 16 and 32-bit
    register access

 .../devicetree/bindings/net/can/sja1000.txt        |  4 ++
 drivers/net/can/sja1000/sja1000.c                  |  3 +-
 drivers/net/can/sja1000/sja1000_of_platform.c      | 66 ++++++++++++++--------
 3 files changed, 46 insertions(+), 27 deletions(-)

-- 
1.8.1.2

^ permalink raw reply

* [PATCH 1/5] can: sja1000: remove unused defines
From: Florian Vaussard @ 2014-01-30 14:29 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can, netdev, linux-kernel, florian.vaussard
In-Reply-To: <1391092168-21246-1-git-send-email-florian.vaussard@epfl.ch>

Remove unused defines for the OF platform.

Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
---
 drivers/net/can/sja1000/sja1000_of_platform.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/can/sja1000/sja1000_of_platform.c b/drivers/net/can/sja1000/sja1000_of_platform.c
index 047accd..2f29eb9 100644
--- a/drivers/net/can/sja1000/sja1000_of_platform.c
+++ b/drivers/net/can/sja1000/sja1000_of_platform.c
@@ -55,9 +55,6 @@ MODULE_LICENSE("GPL v2");
 
 #define SJA1000_OFP_CAN_CLOCK  (16000000 / 2)
 
-#define SJA1000_OFP_OCR        OCR_TX0_PULLDOWN
-#define SJA1000_OFP_CDR        (CDR_CBP | CDR_CLK_OFF)
-
 static u8 sja1000_ofp_read_reg(const struct sja1000_priv *priv, int reg)
 {
 	return ioread8(priv->reg_base + reg);
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH 2/5] can: sja1000: convert printk to use netdev API
From: Florian Vaussard @ 2014-01-30 14:29 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can, netdev, linux-kernel, florian.vaussard
In-Reply-To: <1391092168-21246-1-git-send-email-florian.vaussard@epfl.ch>

Use netdev_* where applicable.

Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
---
 drivers/net/can/sja1000/sja1000.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index f17c301..55cce47 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -106,8 +106,7 @@ static int sja1000_probe_chip(struct net_device *dev)
 	struct sja1000_priv *priv = netdev_priv(dev);
 
 	if (priv->reg_base && sja1000_is_absent(priv)) {
-		printk(KERN_INFO "%s: probing @0x%lX failed\n",
-		       DRV_NAME, dev->base_addr);
+		netdev_err(dev, "probing failed\n");
 		return 0;
 	}
 	return -1;
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH 3/5] can: sja1000: of: use devm_* APIs
From: Florian Vaussard @ 2014-01-30 14:29 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can, netdev, linux-kernel, florian.vaussard
In-Reply-To: <1391092168-21246-1-git-send-email-florian.vaussard@epfl.ch>

Simplify probe and remove functions by converting most of the resources
to use devm_* APIs.

Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
---
 drivers/net/can/sja1000/sja1000_of_platform.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/net/can/sja1000/sja1000_of_platform.c b/drivers/net/can/sja1000/sja1000_of_platform.c
index 2f29eb9..8ebb4af 100644
--- a/drivers/net/can/sja1000/sja1000_of_platform.c
+++ b/drivers/net/can/sja1000/sja1000_of_platform.c
@@ -69,18 +69,11 @@ static void sja1000_ofp_write_reg(const struct sja1000_priv *priv,
 static int sja1000_ofp_remove(struct platform_device *ofdev)
 {
 	struct net_device *dev = platform_get_drvdata(ofdev);
-	struct sja1000_priv *priv = netdev_priv(dev);
-	struct device_node *np = ofdev->dev.of_node;
-	struct resource res;
 
 	unregister_sja1000dev(dev);
 	free_sja1000dev(dev);
-	iounmap(priv->reg_base);
 	irq_dispose_mapping(dev->irq);
 
-	of_address_to_resource(np, 0, &res);
-	release_mem_region(res.start, resource_size(&res));
-
 	return 0;
 }
 
@@ -102,23 +95,22 @@ static int sja1000_ofp_probe(struct platform_device *ofdev)
 
 	res_size = resource_size(&res);
 
-	if (!request_mem_region(res.start, res_size, DRV_NAME)) {
+	if (!devm_request_mem_region(&ofdev->dev,
+				     res.start, res_size, DRV_NAME)) {
 		dev_err(&ofdev->dev, "couldn't request %pR\n", &res);
 		return -EBUSY;
 	}
 
-	base = ioremap_nocache(res.start, res_size);
+	base = devm_ioremap_nocache(&ofdev->dev, res.start, res_size);
 	if (!base) {
 		dev_err(&ofdev->dev, "couldn't ioremap %pR\n", &res);
-		err = -ENOMEM;
-		goto exit_release_mem;
+		return -ENOMEM;
 	}
 
 	irq = irq_of_parse_and_map(np, 0);
 	if (irq == 0) {
 		dev_err(&ofdev->dev, "no irq found\n");
-		err = -ENODEV;
-		goto exit_unmap_mem;
+		return -ENODEV;
 	}
 
 	dev = alloc_sja1000dev(0);
@@ -191,10 +183,6 @@ exit_free_sja1000:
 	free_sja1000dev(dev);
 exit_dispose_irq:
 	irq_dispose_mapping(irq);
-exit_unmap_mem:
-	iounmap(base);
-exit_release_mem:
-	release_mem_region(res.start, res_size);
 
 	return err;
 }
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH 4/5] Documentation: devicetree: sja1000: add reg-io-width binding
From: Florian Vaussard @ 2014-01-30 14:29 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can, netdev, linux-kernel, florian.vaussard, Grant Likely,
	Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala,
	devicetree
In-Reply-To: <1391092168-21246-1-git-send-email-florian.vaussard@epfl.ch>

Add the reg-io-width property to describe the width of the memory
accesses.
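
To illustrate what the property could select, here is a userspace sketch that picks a register accessor from the reg-io-width value. It is not the code from this series: the names are hypothetical, plain memory stands in for ioread8/16/32, and the assumption that register N sits at offset N * width is made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*reg_read_fn)(const volatile void *base, int reg);

/* 8-bit access: register N is byte N. */
static uint8_t read_reg8(const volatile void *base, int reg)
{
	assert(base != NULL);
	return ((const volatile uint8_t *)base)[reg];
}

/* 16-bit access: register N is assumed to be 16-bit word N (low byte used). */
static uint8_t read_reg16(const volatile void *base, int reg)
{
	assert(base != NULL);
	return (uint8_t)((const volatile uint16_t *)base)[reg];
}

/* 32-bit access: register N is assumed to be 32-bit word N (low byte used). */
static uint8_t read_reg32(const volatile void *base, int reg)
{
	assert(base != NULL);
	return (uint8_t)((const volatile uint32_t *)base)[reg];
}

/* Map a reg-io-width value (in bytes) to an accessor; NULL if invalid. */
static reg_read_fn pick_reader(unsigned int reg_io_width)
{
	switch (reg_io_width) {
	case 1:
		return read_reg8;
	case 2:
		return read_reg16;
	case 4:
		return read_reg32;
	default:
		return NULL;
	}
}
```

A caller would fall back to width 1 when the property is absent, matching the default stated in the binding.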

Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ian Campbell <ijc+devicetree@hellion.org.uk>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
---
 Documentation/devicetree/bindings/net/can/sja1000.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/can/sja1000.txt b/Documentation/devicetree/bindings/net/can/sja1000.txt
index f2105a4..b4a6d53 100644
--- a/Documentation/devicetree/bindings/net/can/sja1000.txt
+++ b/Documentation/devicetree/bindings/net/can/sja1000.txt
@@ -12,6 +12,10 @@ Required properties:
 
 Optional properties:
 
+- reg-io-width : Specify the size (in bytes) of the IO accesses that
+	should be performed on the device.  Valid value is 1, 2 or 4.
+	Default to 1 (8 bits).
+
 - nxp,external-clock-frequency : Frequency of the external oscillator
 	clock in Hz. Note that the internal clock frequency used by the
 	SJA1000 is half of that value. If not specified, a default value
-- 
1.8.1.2

^ permalink raw reply related
