Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-15 21:49 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David S. Miller, Pravin B Shelar, Jiri Pirko, Cong Wang,
	dev@openvswitch.org, netdev
In-Reply-To: <CAMEtUuxysgzNqeBpBQ-ajLtymYCvEF-GMiUcQuy1b-QA=dFhdw@mail.gmail.com>

On Tue, Oct 15, 2013 at 9:53 AM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Tue, Oct 15, 2013 at 8:31 AM, Jesse Gross <jesse@nicira.com> wrote:
>> On Sun, Oct 13, 2013 at 8:50 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>>> index c323567..ffa429a 100644
>>> --- a/net/openvswitch/dp_notify.c
>>> +++ b/net/openvswitch/dp_notify.c
>>> @@ -59,15 +59,9 @@ void ovs_dp_notify_wq(struct work_struct *work)
>>>                         struct hlist_node *n;
>>>
>>>                         hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
>>> -                               struct netdev_vport *netdev_vport;
>>> -
>>>                                 if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
>>>                                         continue;
>>> -
>>> -                               netdev_vport = netdev_vport_priv(vport);
>>> -                               if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
>>> -                                   netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
>>> -                                       dp_detach_port_notify(vport);
>>> +                               dp_detach_port_notify(vport);
>>
>> Doesn't this free *all* ports of type OVS_VPORT_TYPE_NETDEV when any
>> one of them is removed?
>
> sorry. not sure what I was thinking on Sunday evening. will respin

will take it back. the check was removed to prevent hang upon dev netns moves,
since reg_state will still be == netreg_registered,
but yes, different check is needed.
sending v4

^ permalink raw reply

* [PATCH v4 net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-15 21:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jesse Gross, Pravin B Shelar, Jiri Pirko, Cong Wang, dev, netdev

The combination of two commits:
commit 8e4e1713e4
("openvswitch: Simplify datapath locking.")
commit 2537b4dd0a
("openvswitch:: link upper device for port devices")

introduced a bug where upper_dev wasn't unlinked upon
netdev_unregister notification

The following steps:

  modprobe openvswitch
  ovs-dpctl add-dp test
  ip tuntap add dev tap1 mode tap
  ovs-dpctl add-if test tap1
  ip tuntap del dev tap1 mode tap

are causing multiple warnings:

[   62.747557] gre: GRE over IPv4 demultiplexor driver
[   62.749579] openvswitch: Open vSwitch switching datapath
[   62.755087] device test entered promiscuous mode
[   62.765911] device tap1 entered promiscuous mode
[   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
[   62.769017] ------------[ cut here ]------------
[   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
[   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
[   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
[   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
[   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
[   62.769059] Call Trace:
[   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
[   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
[   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
[   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
[   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
[   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
[   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
[   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
[   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
[   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
[   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
[   62.769093] ---[ end trace 838756c62e156ffb ]---
[   62.769481] ------------[ cut here ]------------
[   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769486] sysfs: can not remove 'master', no directory
[   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
[   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
[   62.769525] Call Trace:
[   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
[   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769573] ---[ end trace 838756c62e156ffc ]---
[   62.769574] ------------[ cut here ]------------
[   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
[   62.769577] sysfs: can not remove 'upper_test', no directory
[   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
[   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
[   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
[   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
[   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
[   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
[   62.769613] Call Trace:
[   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
[   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
[   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
[   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
[   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
[   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
[   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
[   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
[   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
[   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
[   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
[   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
[   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
[   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
[   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
[   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
[   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
[   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
[   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
[   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
[   62.769657] ---[ end trace 838756c62e156ffd ]---
[   62.769724] device tap1 left promiscuous mode

This patch also affects moving devices between net namespaces.

OVS used to ignore netns move notifications which caused problems.
Like:
  ovs-dpctl add-if test tap1
  ip link set tap1 netns 3512
and then removing tap1 inside the namespace will cause hang on missing dev_put.

With this patch OVS will detach dev upon receiving netns move event.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 net/openvswitch/dp_notify.c    |    7 +++++--
 net/openvswitch/vport-netdev.c |   16 +++++++++++++---
 net/openvswitch/vport-netdev.h |    1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index c323567..5c2dab2 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -65,8 +65,7 @@ void ovs_dp_notify_wq(struct work_struct *work)
 					continue;
 
 				netdev_vport = netdev_vport_priv(vport);
-				if (netdev_vport->dev->reg_state == NETREG_UNREGISTERED ||
-				    netdev_vport->dev->reg_state == NETREG_UNREGISTERING)
+				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
@@ -88,6 +87,10 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
 		return NOTIFY_DONE;
 
 	if (event == NETDEV_UNREGISTER) {
+		/* upper_dev_unlink and decrement promisc immediately */
+		ovs_netdev_detach_dev(vport);
+
+		/* schedule vport destroy, dev_put and genl notification */
 		ovs_net = net_generic(dev_net(dev), ovs_net_id);
 		queue_work(system_wq, &ovs_net->dp_notify_work);
 	}
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 09d93c1..d21f77d 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -150,15 +150,25 @@ static void free_port_rcu(struct rcu_head *rcu)
 	ovs_vport_free(vport_from_priv(netdev_vport));
 }
 
-static void netdev_destroy(struct vport *vport)
+void ovs_netdev_detach_dev(struct vport *vport)
 {
 	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
 
-	rtnl_lock();
+	ASSERT_RTNL();
 	netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
 	netdev_rx_handler_unregister(netdev_vport->dev);
-	netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
+	netdev_upper_dev_unlink(netdev_vport->dev,
+				netdev_master_upper_dev_get(netdev_vport->dev));
 	dev_set_promiscuity(netdev_vport->dev, -1);
+}
+
+static void netdev_destroy(struct vport *vport)
+{
+	struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
+
+	rtnl_lock();
+	if (netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH)
+		ovs_netdev_detach_dev(vport);
 	rtnl_unlock();
 
 	call_rcu(&netdev_vport->rcu, free_port_rcu);
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index dd298b5..8df01c11 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -39,5 +39,6 @@ netdev_vport_priv(const struct vport *vport)
 }
 
 const char *ovs_netdev_get_name(const struct vport *);
+void ovs_netdev_detach_dev(struct vport *);
 
 #endif /* vport_netdev.h */
-- 
1.7.9.5

^ permalink raw reply related

* Re: "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Damian Pietras @ 2013-10-15 22:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1381870957.2045.73.camel@edumazet-glaptop.roam.corp.google.com>

On 15.10.2013 23:02, Eric Dumazet wrote:
>> 703fb94ec58e0e8769380c2877a8a34aeb5b6c97
>> xfrm: Fix the gc threshold value for ipv4
>>
>> Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
>> and I don't really believe such simple case stayed broken for so long.
>> Em I missing something or there is really a bug?
>>
>> If smeone is interested in details of this configuration and commands
>> I'm running, just let me know. This was reproduced with few VMs under XEN.
>>
> 
> It looks like you need to tune /proc/sys/net/ipv4/xfrm4_gc_thresh to a
> sensible value given your workload.
> 
> try :
> 
> echo 65536 >/proc/sys/net/ipv4/xfrm4_gc_thresh
> 
> Presumably the 1024 default is really too small...

Now it's working in my test setup, I'm changing it on the production
boxes, thanks!


-- 
Damian Pietras

^ permalink raw reply

* [PATCH] sh_eth: add/use RMCR.RNC bit
From: Sergei Shtylyov @ 2013-10-15 22:29 UTC (permalink / raw)
  To: netdev; +Cc: nobuhiro.iwamatsu.yj, linux-sh, davem, horms

Declare 'enum EMCR_BIT' containing the single member for the RMCR.RNC bit and
replace bare numbers in the driver by  this mnemonic.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
This patch is against DaveM's 'net.git' repo but it intended for 'net-next.git'
repo -- it's  because 'net-next.git' doesn't contain the required Simon Horman's
patch yet.

 drivers/net/ethernet/renesas/sh_eth.c |    6 +++---
 drivers/net/ethernet/renesas/sh_eth.h |    3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

Index: net/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net/drivers/net/ethernet/renesas/sh_eth.c
@@ -483,7 +483,7 @@ static struct sh_eth_cpu_data sh7757_dat
 	.register_type	= SH_ETH_REG_FAST_SH4,
 
 	.eesipr_value	= DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
-	.rmcr_value	= 0x00000001,
+	.rmcr_value	= RMCR_RNC,
 
 	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
 	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -561,7 +561,7 @@ static struct sh_eth_cpu_data sh7757_dat
 			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
 			  EESR_TDE | EESR_ECI,
 	.fdr_value	= 0x0000072f,
-	.rmcr_value	= 0x00000001,
+	.rmcr_value	= RMCR_RNC,
 
 	.irq_flags	= IRQF_SHARED,
 	.apr		= 1,
@@ -689,7 +689,7 @@ static struct sh_eth_cpu_data r8a7740_da
 			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
 			  EESR_TDE | EESR_ECI,
 	.fdr_value	= 0x0000070f,
-	.rmcr_value	= 0x00000001,
+	.rmcr_value	= RMCR_RNC,
 
 	.apr		= 1,
 	.mpr		= 1,
Index: net/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net/drivers/net/ethernet/renesas/sh_eth.h
@@ -321,6 +321,9 @@ enum TD_STS_BIT {
 #define TD_TFP	(TD_TFP1|TD_TFP0)
 
 /* RMCR */
+enum RMCR_BIT {
+	RMCR_RNC = 0x00000001,
+};
 #define DEFAULT_RMCR_VALUE	0x00000000
 
 /* ECMR */

^ permalink raw reply

* Re: [PATCH ipsec v2] xfrm: prevent ipcomp scratch buffer race condition
From: Eric Dumazet @ 2013-10-15 22:44 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Steffen Klassert, Herbert Xu, David S. Miller, netdev
In-Reply-To: <20131015214030.B0D06E8A60@unicorn.suse.cz>

On Tue, 2013-10-15 at 23:40 +0200, Michal Kubecek wrote:
> In ipcomp_compress(), sortirq is enabled too early, allowing the
> per-cpu scratch buffer to be rewritten by ipcomp_decompress()
> (called on the same CPU in softirq context) between populating
> the buffer and copying the compressed data to the skb.
> 
> Add similar protection into ipcomp_decompress() as it can be
> called from process context as well (even if such scenario seems
> a bit artificial).
> 
> v2: as pointed out by Steffen Klassert, if we also move the
> local_bh_disable() before reading the per-cpu pointers, we can
> get rid of get_cpu()/put_cpu().
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> ---
>  net/xfrm/xfrm_ipcomp.c | 29 ++++++++++++++++++-----------
>  1 file changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
> index 2906d52..9fe4f42 100644
> --- a/net/xfrm/xfrm_ipcomp.c
> +++ b/net/xfrm/xfrm_ipcomp.c
> @@ -45,12 +45,17 @@ static int ipcomp_decompress(struct xfrm_state *x, struct sk_buff *skb)
>  	const int plen = skb->len;
>  	int dlen = IPCOMP_SCRATCH_SIZE;
>  	const u8 *start = skb->data;
> -	const int cpu = get_cpu();
> -	u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
> -	struct crypto_comp *tfm = *per_cpu_ptr(ipcd->tfms, cpu);
> -	int err = crypto_comp_decompress(tfm, start, plen, scratch, &dlen);
> +	struct crypto_comp *tfm;
> +	u8 *scratch;
> +	int cpu;
> +	int err;
>  	int len;
>  
> +	local_bh_disable();
> +	cpu = smp_processor_id();
> +	scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
> +	tfm = *per_cpu_ptr(ipcd->tfms, cpu);

Have you tried this_cpu_ptr() instead ?

^ permalink raw reply

* Re: "xfrm: Fix the gc threshold value for ipv4" broke my IPSec connections
From: Eric Dumazet @ 2013-10-15 22:51 UTC (permalink / raw)
  To: Damian Pietras, Steffen Klassert; +Cc: netdev
In-Reply-To: <525DBE65.1070707@daper.net>

On Wed, 2013-10-16 at 00:15 +0200, Damian Pietras wrote:
> On 15.10.2013 23:02, Eric Dumazet wrote:
> >> 703fb94ec58e0e8769380c2877a8a34aeb5b6c97
> >> xfrm: Fix the gc threshold value for ipv4
> >>
> >> Reverting it on 3.10.15 fixes my issue. This seems to be there from 3.7
> >> and I don't really believe such simple case stayed broken for so long.
> >> Em I missing something or there is really a bug?
> >>
> >> If smeone is interested in details of this configuration and commands
> >> I'm running, just let me know. This was reproduced with few VMs under XEN.
> >>
> > 
> > It looks like you need to tune /proc/sys/net/ipv4/xfrm4_gc_thresh to a
> > sensible value given your workload.
> > 
> > try :
> > 
> > echo 65536 >/proc/sys/net/ipv4/xfrm4_gc_thresh
> > 
> > Presumably the 1024 default is really too small...
> 
> Now it's working in my test setup, I'm changing it on the production
> boxes, thanks!
> 
> 

Steffen, what do you think ?

1024 seems really small, given we had much higher values.

(256 K on a 1GB host)

This sysctl also needs an entry in
Documentation/networking/ip-sysctl.txt

^ permalink raw reply

* Re: [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: David Miller @ 2013-10-15 23:14 UTC (permalink / raw)
  To: steffen.klassert; +Cc: netdev
In-Reply-To: <20131015073020.GV7660@secunet.com>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 15 Oct 2013 09:30:20 +0200

> We could fiddle something to get a terminating condition if the
> state is not resolved after some time, but my plan was to disable
> the larval_drop sysctl by default some day again. At best without
> any notable change to userspace. That's why I would prefer to
> remove the sleeping entirely.

Ok I have no problem with removing the sleeping stuff.

^ permalink raw reply

* Re: [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: Eric Dumazet @ 2013-10-15 23:58 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, netdev
In-Reply-To: <20131015073020.GV7660@secunet.com>

On Tue, 2013-10-15 at 09:30 +0200, Steffen Klassert wrote:

> Right, that's why I've limited the queue to 100 packets. We can
> queue the SYNs of up to 100 tcp connestions that want to use
> this IPsec state. It surely can happen that we queue multiple
> retransmitted SYNs if the IPsec resolution is slow. But the
> queueing code tries at least to get the packets out before
> the first tcp retransmit. I think there is still room for
> optimizations, maybe reducing the queue lenght or the queue
> timeout to avoid queueing retransmitted SYNs as much as possible.

Note that its totally possible to avoid retransmitting SYN if original
SYN is still in a host queue.

We currently increment a SNMP counter when we detect this, we could
do something else (like not queuing a copy of the packet)

http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/?id=0e280af026a5662ffd57c4e623b822df1f7f47ff

Another work in progress is to delay RTO arming at the time TCP
packet leaves the host queues, instead of at the enqueue time.

^ permalink raw reply

* [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  0:02 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明
  Cc: lvs-devel, netdev, Julian Anastasov, Mark Brooks, Simon Horman

In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
such that it creates and uses a neigh entry if none is found.
Subsequently the 'n' field was removed from struct rt6_info.

Unfortunately my analysis is that in the case of IPVS direct routing this
change leads to incorrect behaviour as in this case packets may be output
to a destination other than where they would be output according to the
route table. In particular, the destination address may actually be a local
address and empirically a neighbour lookup seems to result in it becoming
unreachable.

This patch resolves the problem by providing the destination address
determined by IPVS to ip6_finish_output2() in the skb callback.  Although
this seems to work I can see several problems with this approach:

* It is rather ugly, stuffing an IPVS exception right in
  the middle of IPv6 code. The overhead could be eliminated for many users
  by using a staic key. But none the less it is not attractive.

* The use of the skb callback is may not be valid
  as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
  alternative is to add a new field to struct sk_buff.

* This covers all IPv6 packets output by IPVS but actually
  only those output using IPVS Direct-Routing need this.  One way to
  resolve this would be to add a more fine-grained ipvs_property to
  struct sk_buff.

Reported-by: Mark Brooks <mark@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h             | 6 ++++++
 net/ipv6/ip6_output.c           | 9 +++++++--
 net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..11d90a6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
 		atomic_read(&dest->inactconns);
 }
 
+struct ipvs_skb_cb {
+	struct in6_addr *daddr;
+};
+
+#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
+
 #endif	/* _NET_IP_VS_H */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..a340180 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -52,6 +52,7 @@
 #include <net/addrconf.h>
 #include <net/rawv6.h>
 #include <net/icmp.h>
+#include <net/ip_vs.h>
 #include <net/xfrm.h>
 #include <net/checksum.h>
 #include <linux/mroute6.h>
@@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct neighbour *neigh;
-	struct in6_addr *nexthop;
+	struct in6_addr *nexthop, *daddr;
 	int ret;
 
 	skb->protocol = htons(ETH_P_IPV6);
@@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
+	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
+		daddr = IP_VS_SKB_CB(skb)->daddr;
+	else
+		daddr = &ipv6_hdr(skb)->daddr;
+	nexthop = rt6_nexthop((struct rt6_info *)dst, daddr);
 	neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop);
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&nd_tbl, nexthop, dst->dev, false);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index c47444e..054b679 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -391,6 +391,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
 		rt = (struct rt6_info *) dst;
 	}
 
+	IP_VS_SKB_CB(skb)->daddr = daddr;
+
 	local = __ip_vs_is_local_route6(rt);
 	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
 	      rt_mode)) {
-- 
1.8.4


^ permalink raw reply related

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  0:13 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381881751-6719-1-git-send-email-horms@verge.net.au>

On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
> 
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
> 
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> this seems to work I can see several problems with this approach:
> 
> * It is rather ugly, stuffing an IPVS exception right in
>   the middle of IPv6 code. The overhead could be eliminated for many users
>   by using a staic key. But none the less it is not attractive.
> 
> * The use of the skb callback is may not be valid
>   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
>   alternative is to add a new field to struct sk_buff.

Please no ;)

> 
> * This covers all IPv6 packets output by IPVS but actually
>   only those output using IPVS Direct-Routing need this.  One way to
>   resolve this would be to add a more fine-grained ipvs_property to
>   struct sk_buff.
> 
> Reported-by: Mark Brooks <mark@loadbalancer.org>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> ---
>  include/net/ip_vs.h             | 6 ++++++
>  net/ipv6/ip6_output.c           | 9 +++++++--
>  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 1c2e1b9..11d90a6 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
>  		atomic_read(&dest->inactconns);
>  }
>  
> +struct ipvs_skb_cb {
> +	struct in6_addr *daddr;
> +};

So we pass a reference.

> +
> +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> +
>  #endif	/* _NET_IP_VS_H */
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a54c45c..a340180 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -52,6 +52,7 @@
>  #include <net/addrconf.h>
>  #include <net/rawv6.h>
>  #include <net/icmp.h>
> +#include <net/ip_vs.h>
>  #include <net/xfrm.h>
>  #include <net/checksum.h>
>  #include <linux/mroute6.h>
> @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	struct dst_entry *dst = skb_dst(skb);
>  	struct net_device *dev = dst->dev;
>  	struct neighbour *neigh;
> -	struct in6_addr *nexthop;
> +	struct in6_addr *nexthop, *daddr;
>  	int ret;
>  
>  	skb->protocol = htons(ETH_P_IPV6);
> @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
>  	}
>  
>  	rcu_read_lock_bh();
> -	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> +	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> +		daddr = IP_VS_SKB_CB(skb)->daddr;

What guarantee do we have daddr points to valid memory (not already
freed/reused) ?

I guess things like NFQUEUE could happen ?




^ permalink raw reply

* Re: DomU's network interface will hung when Dom0 running 32bit
From: jianhai luan @ 2013-10-16  0:15 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI
In-Reply-To: <525D6C16.6010705@oracle.com>


On 2013-10-16 0:23, jianhai luan wrote:
>
> On 2013-10-16 0:03, Wei Liu wrote:
>> On Tue, Oct 15, 2013 at 11:19:42PM +0800, jianhai luan wrote:
>> [...]
>>>>>>> * time_after_eq(now, next_credit) -> false
>>>>>>> * time_before(now, expires) -> false
>>>>>> If now is placed in above environment, the result will be correct
>>>>>> (Sending package will be not allowed until next_credit).
>>>>> No, it is not necessarily correct. Keep in mind that "now" wraps 
>>>>> around,
>>>>> which is the issue you try to fix. You still have a window to 
>>>>> stall your
>>>>> frontend.
>>>> Remember that time_after_eq is supposed to work even with wraparound
>>>> occurring, so long as the two times are less than MAX_LONG/2 apart.
>>> Sorry for my misunderstand explanation. I mean that
>>>    * time_after_eq()/time_before_eq() fix the jiffies wraparound, so
>>> please think about  jiffies in line increasing.
>>>    * time_after_eq()/time_before_eq() have the range (0, MAX_LONG/2),
>>> the judge will be wrong if out of the range.
>>>
>>> So please think about three kind environment
>>>    -  expires        now        next_credit
>>>       --------time increases this direction ---------->
>>>
>>>    -  expires        [next_credit        now next_credit+MAX_LONG/2
>>>       --------time increase this direction ----------->
>>>
>>>    - expires        next_credit        next_credit+MAX_LONG/2 now
>>>       --------time increadse this direction ---------->
>>>
>>> The first environment should be netfront consume all credit_byte
>>> before next_credit, So we should pending one timer to calculator the
>>> new credit_byte, and don't transmit until next_credit.
>>>
>>> the second environment should be calculator the credit_byte because
>>> netfront don't consume all credit_byte before next_credit, and
>>> time_after_eq() do correct judge.
>>>
>>> the third environment should be calculator in time because netfront
>>> don't consume all credit_byte until next_credit.But time_after_eq do
>>> error judge (time_after_eq(now, next_credit) is false), so the
>>> remaining_byte isn't be increased.
>>>
>>> and I work on the third environment.  You know now >
>>> next_credit+MAX_LONG/2, time_before(now, expire) should be
>>> true(time_before(now, expire) is false in first environment)
>> Thanks for staighten this out for me. I'm just too dumb for this, please
>> be patient with me. :-)
>>
>> Could you prove that time_before(now, expire) is always true in third
>> case? That's where my main cencern lies. Is it because msecs_to_jiffies
>> always returns MAX_JIFFY_OFFSET (which is ((LONG_MAX >> 1)-1) ) at most?
>
> I have wrong judge in third environment. If now large than expires + 
> MAX_UNLONG, time_before(now, expires) will be false.
>   expires    next_credit    next_credit+MAX_UNLONG/2    expires + 
> MAX_UNLONG    now    next_credit+MAX_UNLONG
>   --------------------------------------------------------- time 
> increadse this direction  ---------------------------------->
>
>   In the above environment, time_before(now, expires) will return 
> false. But the jiffies elapsed more time and next_credit will be 
> reachable in soon(time_after_eq(now, next_credit) will be true).

After above talk, the window should be exist (expire+MAX_ULONG 
next_credit+MAX_ULONG, expire + 2MAX_ULONF next_credit+MAX_ULONG 
....,expire+<n>ULONG next_credit+<n>MAX_ULONG). Other time window should 
be not exist (maybe i don't think about).
  If so, please think about the below:
   * If no speed control, vif->credit_usec should be zero. expire = 
next_credit and the window is zero
   * If we have done speed control, the scenario should be very likely 
than first environment except the value is larger (the delta is 
<n>MAX_UNLONG)
      - If do speed control. the window should have been think about
                      speed            worse time
                      100M/s          40s
                      1000M/s       4s
      - the time should be acceptable.
>>
>> Wei.
>

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  0:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381882426.2045.85.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 05:13:46PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:02 +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> > 
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> > 
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> > this seems to work I can see several problems with this approach:
> > 
> > * It is rather ugly, stuffing an IPVS exception right in
> >   the middle of IPv6 code. The overhead could be eliminated for many users
> >   by using a staic key. But none the less it is not attractive.
> > 
> > * The use of the skb callback is may not be valid
> >   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> >   alternative is to add a new field to struct sk_buff.
> 
> Please no ;)

I thought of you when I wrote that comment :)

> > * This covers all IPv6 packets output by IPVS but actually
> >   only those output using IPVS Direct-Routing need this.  One way to
> >   resolve this would be to add a more fine-grained ipvs_property to
> >   struct sk_buff.
> > 
> > Reported-by: Mark Brooks <mark@loadbalancer.org>
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> > ---
> >  include/net/ip_vs.h             | 6 ++++++
> >  net/ipv6/ip6_output.c           | 9 +++++++--
> >  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
> >  3 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> > index 1c2e1b9..11d90a6 100644
> > --- a/include/net/ip_vs.h
> > +++ b/include/net/ip_vs.h
> > @@ -1649,4 +1649,10 @@ ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
> >  		atomic_read(&dest->inactconns);
> >  }
> >  
> > +struct ipvs_skb_cb {
> > +	struct in6_addr *daddr;
> > +};
> 
> So we pass a reference.
>
> > +
> > +#define IP_VS_SKB_CB(skb) ((struct ipvs_skb_cb *)&(skb)->cb)
> > +
> >  #endif	/* _NET_IP_VS_H */
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index a54c45c..a340180 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -52,6 +52,7 @@
> >  #include <net/addrconf.h>
> >  #include <net/rawv6.h>
> >  #include <net/icmp.h>
> > +#include <net/ip_vs.h>
> >  #include <net/xfrm.h>
> >  #include <net/checksum.h>
> >  #include <linux/mroute6.h>
> > @@ -61,7 +62,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
> >  	struct dst_entry *dst = skb_dst(skb);
> >  	struct net_device *dev = dst->dev;
> >  	struct neighbour *neigh;
> > -	struct in6_addr *nexthop;
> > +	struct in6_addr *nexthop, *daddr;
> >  	int ret;
> >  
> >  	skb->protocol = htons(ETH_P_IPV6);
> > @@ -105,7 +106,11 @@ static int ip6_finish_output2(struct sk_buff *skb)
> >  	}
> >  
> >  	rcu_read_lock_bh();
> > -	nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr);
> > +	if (unlikely(IS_ENABLED(CONFIG_IP_VS) && skb->ipvs_property))
> > +		daddr = IP_VS_SKB_CB(skb)->daddr;
> 
> What guarantee do we have daddr points to valid memory (not already
> freed/reused) ?

I can change that to passing a value if there is a risk
the reference could become invalid. To be honest I am more
worried that skb->cb might be clobbered entirely.

> I guess things like NFQUEUE could happen ?

Could you expand a little?

^ permalink raw reply

* Re: [PATCH] sh_eth: add/use RMCR.RNC bit
From: Simon Horman @ 2013-10-16  0:34 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, nobuhiro.iwamatsu.yj, linux-sh, davem
In-Reply-To: <201310160229.58735.sergei.shtylyov@cogentembedded.com>

On Wed, Oct 16, 2013 at 02:29:58AM +0400, Sergei Shtylyov wrote:
> Declare 'enum EMCR_BIT' containing the single member for the RMCR.RNC bit and
> replace bare numbers in the driver by  this mnemonic.
> 
> Suggested-by: David Miller <davem@davemloft.net>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Thanks Sergei,

this seems to move things in the right direction.

Reviewed-by: Simon Horman <horms+renesas@verge.net.au>

> ---
> This patch is against DaveM's 'net.git' repo but it intended for 'net-next.git'
> repo -- it's  because 'net-next.git' doesn't contain the required Simon Horman's
> patch yet.
> 
>  drivers/net/ethernet/renesas/sh_eth.c |    6 +++---
>  drivers/net/ethernet/renesas/sh_eth.h |    3 +++
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> Index: net/drivers/net/ethernet/renesas/sh_eth.c
> ===================================================================
> --- net.orig/drivers/net/ethernet/renesas/sh_eth.c
> +++ net/drivers/net/ethernet/renesas/sh_eth.c
> @@ -483,7 +483,7 @@ static struct sh_eth_cpu_data sh7757_dat
>  	.register_type	= SH_ETH_REG_FAST_SH4,
>  
>  	.eesipr_value	= DMAC_M_RFRMER | DMAC_M_ECI | 0x003fffff,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
>  	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
> @@ -561,7 +561,7 @@ static struct sh_eth_cpu_data sh7757_dat
>  			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>  			  EESR_TDE | EESR_ECI,
>  	.fdr_value	= 0x0000072f,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.irq_flags	= IRQF_SHARED,
>  	.apr		= 1,
> @@ -689,7 +689,7 @@ static struct sh_eth_cpu_data r8a7740_da
>  			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
>  			  EESR_TDE | EESR_ECI,
>  	.fdr_value	= 0x0000070f,
> -	.rmcr_value	= 0x00000001,
> +	.rmcr_value	= RMCR_RNC,
>  
>  	.apr		= 1,
>  	.mpr		= 1,
> Index: net/drivers/net/ethernet/renesas/sh_eth.h
> ===================================================================
> --- net.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net/drivers/net/ethernet/renesas/sh_eth.h
> @@ -321,6 +321,9 @@ enum TD_STS_BIT {
>  #define TD_TFP	(TD_TFP1|TD_TFP0)
>  
>  /* RMCR */
> +enum RMCR_BIT {
> +	RMCR_RNC = 0x00000001,
> +};
>  #define DEFAULT_RMCR_VALUE	0x00000000
>  
>  /* ECMR */
> 

^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  0:39 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016002807.GM22321@verge.net.au>

On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:

> > I guess things like NFQUEUE could happen ?
> 
> Could you expand a little?

This was to point that between IPVS and ipv6 stack we might have a
delay, and daddr was maybe pointed to a freed memory.

IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.




^ permalink raw reply

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Hannes Frederic Sowa @ 2013-10-16  0:53 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381881751-6719-1-git-send-email-horms@verge.net.au>

On Wed, Oct 16, 2013 at 09:02:31AM +0900, Simon Horman wrote:
> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.
> 
> Unfortunately my analysis is that in the case of IPVS direct routing this
> change leads to incorrect behaviour as in this case packets may be output
> to a destination other than where they would be output according to the
> route table. In particular, the destination address may actually be a local
> address and empirically a neighbour lookup seems to result in it becoming
> unreachable.
> 
> This patch resolves the problem by providing the destination address
> determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> this seems to work I can see several problems with this approach:
> 
> * It is rather ugly, stuffing an IPVS exception right in
>   the middle of IPv6 code. The overhead could be eliminated for many users
>   by using a staic key. But none the less it is not attractive.
> 
> * The use of the skb callback is may not be valid
>   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
>   alternative is to add a new field to struct sk_buff.
> 
> * This covers all IPv6 packets output by IPVS but actually
>   only those output using IPVS Direct-Routing need this.  One way to
>   resolve this would be to add a more fine-grained ipvs_property to
>   struct sk_buff.

Hmm, that reminds me on the following bug report which would be nice we could
solve in one go, too: http://www.spinics.net/lists/netdev/msg250785.html

Greetings,

  Hannes


^ permalink raw reply

* Re: [PATCH] bridge: Correctly clamp MAX forward_delay when enabling STP
From: Herbert Xu @ 2013-10-16  0:56 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Stephen Hemminger
In-Reply-To: <1381863465-27304-1-git-send-email-vyasevic@redhat.com>

On Tue, Oct 15, 2013 at 02:57:45PM -0400, Vlad Yasevich wrote:
> Commit be4f154d5ef0ca147ab6bcd38857a774133f5450
> 	bridge: Clamp forward_delay when enabling STP
> had a typo when attempting to clamp maximum forward delay.
> 
> It is possible to set bridge_forward_delay to be higher then
> permitted maximum when STP is off.  When turning STP on, the
> higher then allowed delay has to be clamed down to max value.
> 
> CC: Herbert Xu <herbert@gondor.apana.org.au>
> CC: Stephen Hemminger <shemminger@vyatta.com>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

Good catch!

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCHv3 net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-16  1:05 UTC (permalink / raw)
  To: Fan Du, nhorman; +Cc: steffen.klassert, davem, netdev
In-Reply-To: <1381828777-15894-1-git-send-email-fan.du@windriver.com>

On 10/15/2013 05:19 AM, Fan Du wrote:
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Looks good to me.

-vlad

> ---
> v3:
>    - Rename is_xfrm_armed by dst_xfrm
>    - Move this funtion in include/net/dst.h
>
> v2:
>    - Split v1 into two separate patches.
>
> ---
>   include/net/dst.h |   12 ++++++++++++
>   net/sctp/output.c |    3 ++-
>   2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 211dcf1..44995c1 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -478,10 +478,22 @@ static inline struct dst_entry *xfrm_lookup(struct net *net,
>   {
>   	return dst_orig;
>   }
> +
> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
> +{
> +	return NULL;
> +}
> +
>   #else
>   struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
>   			      const struct flowi *fl, struct sock *sk,
>   			      int flags);
> +
> +/* skb attached with this dst needs transformation if dst->xfrm is valid */
> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
> +{
> +	return dst->xfrm;
> +}
>   #endif
>
>   #endif /* _NET_DST_H */
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 0ac3a65..24b3718 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -536,7 +536,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>   	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>   	 */
>   	if (!sctp_checksum_disable) {
> -		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
> +		if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
> +			(dst_xfrm(dst) != NULL)) {
>   			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>
>   			/* 3) Put the resultant value into the checksum field in the
>

^ permalink raw reply

* Re: [PATCHv3 net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-16  1:52 UTC (permalink / raw)
  To: Fan Du, nhorman; +Cc: steffen.klassert, davem, netdev
In-Reply-To: <525DE666.10308@gmail.com>

On 10/15/2013 09:05 PM, Vlad Yasevich wrote:
> On 10/15/2013 05:19 AM, Fan Du wrote:
>> igb/ixgbe have hardware sctp checksum support, when this feature is
>> enabled
>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>> every thing
>> up and pack the 16bits result in the checksum field). The result is fail
>> establishment of sctp communication.
>>
>> Signed-off-by: Fan Du <fan.du@windriver.com>
>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
>
> Looks good to me.
>

Just tryied applying this and looks like you based this on net-next.
This fixes a rather ugly bug when checksum offloading is done.
I am going to rebase this and re-submit for net.

-vlad

> -vlad
>
>> ---
>> v3:
>>    - Rename is_xfrm_armed by dst_xfrm
>>    - Move this funtion in include/net/dst.h
>>
>> v2:
>>    - Split v1 into two separate patches.
>>
>> ---
>>   include/net/dst.h |   12 ++++++++++++
>>   net/sctp/output.c |    3 ++-
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 211dcf1..44995c1 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -478,10 +478,22 @@ static inline struct dst_entry
>> *xfrm_lookup(struct net *net,
>>   {
>>       return dst_orig;
>>   }
>> +
>> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
>> +{
>> +    return NULL;
>> +}
>> +
>>   #else
>>   struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry
>> *dst_orig,
>>                     const struct flowi *fl, struct sock *sk,
>>                     int flags);
>> +
>> +/* skb attached with this dst needs transformation if dst->xfrm is
>> valid */
>> +static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
>> +{
>> +    return dst->xfrm;
>> +}
>>   #endif
>>
>>   #endif /* _NET_DST_H */
>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>> index 0ac3a65..24b3718 100644
>> --- a/net/sctp/output.c
>> +++ b/net/sctp/output.c
>> @@ -536,7 +536,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>>        * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>>        */
>>       if (!sctp_checksum_disable) {
>> -        if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
>> +        if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
>> +            (dst_xfrm(dst) != NULL)) {
>>               __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>>
>>               /* 3) Put the resultant value into the checksum field in
>> the
>>
>

^ permalink raw reply

* Re: [PATCHv3 net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Fan Du @ 2013-10-16  1:58 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: nhorman, steffen.klassert, davem, netdev
In-Reply-To: <525DF141.6060302@gmail.com>



On 2013年10月16日 09:52, Vlad Yasevich wrote:
> On 10/15/2013 09:05 PM, Vlad Yasevich wrote:
>> On 10/15/2013 05:19 AM, Fan Du wrote:
>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>> enabled
>>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>> every thing
>>> up and pack the 16bits result in the checksum field). The result is fail
>>> establishment of sctp communication.
>>>
>>> Signed-off-by: Fan Du <fan.du@windriver.com>
>>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>>> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
>>
>> Looks good to me.
>>
>
> Just tryied applying this and looks like you based this on net-next.
> This fixes a rather ugly bug when checksum offloading is done.
> I am going to rebase this and re-submit for net.

:( Sorry for the inconvenience. Stable tree might hurts also.

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply

* [PATCH v4 net 0/3] sctp: Use software checksum under certain
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Vlad Yasevich

There are some cards that support SCTP checksum offloading.  When using
these cards with IPSec or forcing IP fragmentation of SCTP traffic,
the checksum is computed incorrectly due to the fact that xfrm and IP/IPv6
fragmentation code do not know that this is SCTP traffic and do not
know that checksum has to be computed differently.

To fix this, we let SCTP detect these conditions and perform software
checksum calculation.

Fan Du (1):
  sctp: Use software crc32 checksum when xfrm transform will happen.

Vlad Yasevich (2):
  net: dst: provide accessor function to dst->xfrm
  sctp: Perform software checksum if packet has to be fragmented.

 include/net/dst.h | 12 ++++++++++++
 net/sctp/output.c |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH v4 net 1/3] net: dst: provide accessor function to dst->xfrm
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Vlad Yasevich
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

dst->xfrm is conditionally defined.  Provide accessor funtion that
is always available.

Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 include/net/dst.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 3bc4865..3c4c944 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -479,10 +479,22 @@ static inline struct dst_entry *xfrm_lookup(struct net *net,
 {
 	return dst_orig;
 } 
+
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+	return NULL;
+}
+
 #else
 extern struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 				     const struct flowi *fl, struct sock *sk,
 				     int flags);
+
+/* skb attached with this dst needs transformation if dst->xfrm is valid */
+static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
+{
+	return dst->xfrm;
+}
 #endif
 
 #endif /* _NET_DST_H */
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 net 2/3] sctp: Use software crc32 checksum when xfrm transform will happen.
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Neil Horman, Steffen Klassert, Vlad Yasevich
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

From: Fan Du <fan.du@windriver.com>

igb/ixgbe have hardware sctp checksum support, when this feature is enabled
and also IPsec is armed to protect sctp traffic, ugly things happened as
xfrm_output checks CHECKSUM_PARTIAL to do checksum operation(sum every thing
up and pack the 16bits result in the checksum field). The result is fail
establishment of sctp communication.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0ac3a65..d35b54c 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -536,7 +536,8 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
 	 */
 	if (!sctp_checksum_disable) {
-		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
+		if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
+		    (dst_xfrm(dst) != NULL)) {
 			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
 			/* 3) Put the resultant value into the checksum field in the
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v4 net 3/3] sctp: Perform software checksum if packet has to be fragmented.
From: Vlad Yasevich @ 2013-10-16  2:01 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, fan.du, Vlad Yasevich
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

IP/IPv6 fragmentation knows how to compute only TCP/UDP checksum.
This causes problems if SCTP packets has to be fragmented and
ipsummed has been set to PARTIAL due to checksum offload support.
This condition can happen when retransmitting after MTU discover,
or when INIT or other control chunks are larger then MTU.
Check for the rare fragmentation condition in SCTP and use software
checksum calculation in this case.

CC: Fan Du <fan.du@windriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index d35b54c..3191373 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -537,7 +537,7 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 */
 	if (!sctp_checksum_disable) {
 		if (!(dst->dev->features & NETIF_F_SCTP_CSUM) ||
-		    (dst_xfrm(dst) != NULL)) {
+		    (dst_xfrm(dst) != NULL) || packet->ipfragok) {
 			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
 			/* 3) Put the resultant value into the checksum field in the
-- 
1.8.3.1

^ permalink raw reply related

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  2:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381883949.2045.97.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 05:39:09PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:
> 
> > > I guess things like NFQUEUE could happen ?
> > 
> > Could you expand a little?
> 
> This was to point that between IPVS and ipv6 stack we might have a
> delay, and daddr was maybe pointed to a freed memory.
> 
> IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.

That does seem very promising but while implementing it
I hit a problem.

struct tcp_skb_cb includes a field of type struct inet6_skb_parm.  And
expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
now larger than 48 bytes and no longer fits in skb->cb.

Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
suggests?

^ permalink raw reply

* Re: [PATCH V2] For for each TSN t being newly acked (Not only cumulatively, but also SELECTIVELY) cacc_saw_newack should be set to 1.
From: Vlad Yasevich @ 2013-10-16  2:13 UTC (permalink / raw)
  To: Chang Xiangzhong; +Cc: nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <1381860784-16481-1-git-send-email-changxiangzhong@gmail.com>

On 10/15/2013 02:13 PM, Chang Xiangzhong wrote:
> Signed-off-by: Xiangzhong Chang <changxiangzhong@gmail.com>

Your proposed solution is very nice, but it does 2 things in one
patch.
  1) It fixes the bug
  2) It refactors the code to improve the flow.

While (2) is very nice, it needs a much more careful review.
Can you please split this into 2 patches?  First patch can
make the code look like this:

	if (sctp_acked(sack, tsn)) {
		...
		if (!tchunk->tsn_gap_acked) {
			tchunk->tsn_gap_acked = 1;
			*highest_new_tsn_in_sack = tsn;
			bytes_acked += sctp_data_size(tchunk);
                         if (!tchunk->transport)
				migrate_bytes += sctp_data_size(tchunk);
			forward_progress = true;

			/*
			 * SFR-CACC algorithm:
			 * 2) If the SACK contains gap acks
			 * and the flag CHANGEOVER_ACTIVE is
			 * set the receiver of the SACK MUST
			 * take the following action:
			...
		}
	}

Then you can file a second patch to improve the flow/refactor the 
function.  You have to be very careful here though and be sure to
run through all the regression tests since you would be modifying
a very critcal part of the code.

Thanks
-vlad

> ---
>   net/sctp/outqueue.c |   76 ++++++++++++++++++++++++---------------------------
>   1 file changed, 35 insertions(+), 41 deletions(-)
>
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 94df758..84ef3b8 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1357,13 +1357,13 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>
>   		tsn = ntohl(tchunk->subh.data_hdr->tsn);
>   		if (sctp_acked(sack, tsn)) {
> -			/* If this queue is the retransmit queue, the
> -			 * retransmit timer has already reclaimed
> -			 * the outstanding bytes for this chunk, so only
> -			 * count bytes associated with a transport.
> -			 */
> -			if (transport) {
> -				/* If this chunk is being used for RTT
> +			if (!tchunk->tsn_gap_acked) {
> +				/* If this queue is the retransmit queue, the
> +				 * retransmit timer has already reclaimed
> +				 * the outstanding bytes for this chunk, so only
> +				 * count bytes associated with a transport.
> +				 *
> +				 * If this chunk is being used for RTT
>   				 * measurement, calculate the RTT and update
>   				 * the RTO using this value.
>   				 *
> @@ -1374,28 +1374,44 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>   				 * first instance of the packet or a later
>   				 * instance).
>   				 */
> -				if (!tchunk->tsn_gap_acked &&
> -				    tchunk->rtt_in_progress) {
> +				if (transport && tchunk->rtt_in_progress) {
>   					tchunk->rtt_in_progress = 0;
>   					rtt = jiffies - tchunk->sent_at;
>   					sctp_transport_update_rto(transport,
> -								  rtt);
> +						rtt);
>   				}
> -			}
>
> -			/* If the chunk hasn't been marked as ACKED,
> -			 * mark it and account bytes_acked if the
> -			 * chunk had a valid transport (it will not
> -			 * have a transport if ASCONF had deleted it
> -			 * while DATA was outstanding).
> -			 */
> -			if (!tchunk->tsn_gap_acked) {
> +				/* If the chunk hasn't been marked as ACKED,
> +				 * mark it and account bytes_acked if the
> +				 * chunk had a valid transport (it will not
> +				 * have a transport if ASCONF had deleted it
> +				 * while DATA was outstanding).
> +				 */
>   				tchunk->tsn_gap_acked = 1;
>   				*highest_new_tsn_in_sack = tsn;
>   				bytes_acked += sctp_data_size(tchunk);
>   				if (!tchunk->transport)
>   					migrate_bytes += sctp_data_size(tchunk);
>   				forward_progress = true;
> +
> +				/* SFR-CACC algorithm:
> +				 * 2) If the SACK contains gap acks
> +				 * and the flag CHANGEOVER_ACTIVE is
> +				 * set the receiver of the SACK MUST
> +				 * take the following action:
> +				 *
> +				 * B) For each TSN t being acked that
> +				 * has not been acked in any SACK so
> +				 * far, set cacc_saw_newack to 1 for
> +				 * the destination that the TSN was
> +				 * sent to.
> +				 */
> +				if (transport &&
> +					sack->num_gap_ack_blocks &&
> +					q->asoc->peer.primary_path->cacc.
> +					changeover_active
> +				   )
> +					transport->cacc.cacc_saw_newack = 1;
>   			}
>
>   			if (TSN_lte(tsn, sack_ctsn)) {
> @@ -1411,30 +1427,8 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>   				restart_timer = 1;
>   				forward_progress = true;
>
> -				if (!tchunk->tsn_gap_acked) {
> -					/*
> -					 * SFR-CACC algorithm:
> -					 * 2) If the SACK contains gap acks
> -					 * and the flag CHANGEOVER_ACTIVE is
> -					 * set the receiver of the SACK MUST
> -					 * take the following action:
> -					 *
> -					 * B) For each TSN t being acked that
> -					 * has not been acked in any SACK so
> -					 * far, set cacc_saw_newack to 1 for
> -					 * the destination that the TSN was
> -					 * sent to.
> -					 */
> -					if (transport &&
> -					    sack->num_gap_ack_blocks &&
> -					    q->asoc->peer.primary_path->cacc.
> -					    changeover_active)
> -						transport->cacc.cacc_saw_newack
> -							= 1;
> -				}
> -
>   				list_add_tail(&tchunk->transmitted_list,
> -					      &q->sacked);
> +					&q->sacked);
>   			} else {
>   				/* RFC2960 7.2.4, sctpimpguide-05 2.8.2
>   				 * M2) Each time a SACK arrives reporting
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox