Netdev List


* [PATCH 18/19] batman-adv: fix locking in hash_add()
From: Antonio Quartulli @ 2012-06-18 20:39 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Matthias Schiffer, Sven Eckelmann,
	Antonio Quartulli
In-Reply-To: <1340051963-14836-1-git-send-email-ordex@autistici.org>

From: Matthias Schiffer <mschiffer@universe-factory.net>

To ensure an entry isn't added twice, all comparisons have to be protected by
the hash line write spinlock. This doesn't really hurt: attempts to add an
element that is already present in the hash shouldn't occur very often, so in
most cases the lock would have to be taken anyway.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/hash.h |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/net/batman-adv/hash.h b/net/batman-adv/hash.h
index 93b3c71..3d67ce4 100644
--- a/net/batman-adv/hash.h
+++ b/net/batman-adv/hash.h
@@ -110,26 +110,23 @@ static inline int hash_add(struct hashtable_t *hash,
 	head = &hash->table[index];
 	list_lock = &hash->list_locks[index];
 
-	rcu_read_lock();
-	__hlist_for_each_rcu(node, head) {
+	spin_lock_bh(list_lock);
+
+	hlist_for_each(node, head) {
 		if (!compare(node, data))
 			continue;
 
 		ret = 1;
-		goto err_unlock;
+		goto unlock;
 	}
-	rcu_read_unlock();
 
 	/* no duplicate found in list, add new element */
-	spin_lock_bh(list_lock);
 	hlist_add_head_rcu(data_node, head);
-	spin_unlock_bh(list_lock);
 
 	ret = 0;
-	goto out;
 
-err_unlock:
-	rcu_read_unlock();
+unlock:
+	spin_unlock_bh(list_lock);
 out:
 	return ret;
 }
-- 
1.7.9.4

^ permalink raw reply related
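
The locking change in the patch above can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a C11 atomic_flag as a stand-in for the per-bucket spinlock; struct bucket, struct node and bucket_add() are hypothetical names, not batman-adv code. The point it shows is that the duplicate scan and the insertion happen under the same lock, so two concurrent callers cannot both conclude that the key is absent.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for one hash bucket. */
struct node {
	int key;
	struct node *next;
};

struct bucket {
	atomic_flag lock;	/* plays the role of the list spinlock */
	struct node *head;
};

static void bucket_lock(struct bucket *b)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&b->lock,
						 memory_order_acquire))
		;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Returns 0 if the node was added, 1 if an equal key was already present. */
static int bucket_add(struct bucket *b, struct node *new_node)
{
	struct node *n;
	int ret = 1;

	/* Take the lock before the duplicate scan, not after it. */
	bucket_lock(b);

	for (n = b->head; n; n = n->next)
		if (n->key == new_node->key)
			goto unlock;	/* duplicate: list stays untouched */

	/* No duplicate found; insert while still holding the lock. */
	new_node->next = b->head;
	b->head = new_node;
	ret = 0;

unlock:
	bucket_unlock(b);
	return ret;
}
```

A bucket would be initialised as `struct bucket b = { ATOMIC_FLAG_INIT, NULL };`. The sketch does not model the RCU readers that the kernel code still has to serve, which is why the patch keeps hlist_add_head_rcu() for the insertion itself.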

* [PATCH 19/19] batman-adv: only store changed gw_bandwidth values
From: Antonio Quartulli @ 2012-06-18 20:39 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Sven Eckelmann,
	Antonio Quartulli
In-Reply-To: <1340051963-14836-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/gateway_common.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/batman-adv/gateway_common.c b/net/batman-adv/gateway_common.c
index ca57ac7..6e3b052 100644
--- a/net/batman-adv/gateway_common.c
+++ b/net/batman-adv/gateway_common.c
@@ -162,6 +162,9 @@ ssize_t gw_bandwidth_set(struct net_device *net_dev, char *buff, size_t count)
 	 **/
 	gw_bandwidth_to_kbit((uint8_t)gw_bandwidth_tmp, &down, &up);
 
+	if (atomic_read(&bat_priv->gw_bandwidth) == gw_bandwidth_tmp)
+		return count;
+
 	gw_deselect(bat_priv);
 	bat_info(net_dev,
 		 "Changing gateway bandwidth from: '%i' to: '%ld' (propagating: %d%s/%d%s)\n",
-- 
1.7.9.4

^ permalink raw reply related

* Re: [PATCH 2/2 net-next] ixgbe: remove xmit length check
From: Jeff Kirsher @ 2012-06-18 20:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: e1000-devel, Bruce Allan, netdev, David S. Miller
In-Reply-To: <20120618115558.1ef26a48@nehalam.linuxnetplumber.net>


On Mon, 2012-06-18 at 11:55 -0700, Stephen Hemminger wrote:
> The check here is bogus. Since len is unsigned, it can never
> be negative. And it would be a bug in network stack to ever
> send a zero length packet to device.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> 

Thanks, I have added this patch as well to my queue.


^ permalink raw reply

* NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
From: Borislav Petkov @ 2012-06-18 20:43 UTC (permalink / raw)
  To: netdev; +Cc: lkml

Just got the below on -rc3 after resuming. No network afterwards.

Rebooting fixed it.

[15473.272994] Restarting tasks ... done.
[15473.685535] forcedeth 0000:00:08.0: eth0: link up
[15483.926079] ------------[ cut here ]------------
[15483.926095] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x16f/0x213()
[15483.926099] Hardware name:  
[15483.926103] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
[15483.926160] Modules linked in: nls_iso8859_15 nls_cp437 tun cpufreq_powersave cpufreq_userspace ip6table_filter cpufreq_conservative ip6_tables cpufreq_stats iptable_filter ip_tables x_tables binfmt_misc fuse dm_crypt ipv6 vfat fat dm_mod powernow_k8 mperf kvm_amd kvm radeon drm_kms_helper ttm edac_core k10temp microcode cfbfillrect cfbimgblt cfbcopyarea
[15483.926166] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc3+ #3
[15483.926170] Call Trace:
[15483.926180]  <IRQ>  [<ffffffff8137b500>] ? dev_watchdog+0x72/0x213
[15483.926189]  [<ffffffff8102b354>] warn_slowpath_common+0x83/0x9b
[15483.926195]  [<ffffffff8102b40f>] warn_slowpath_fmt+0x46/0x48
[15483.926203]  [<ffffffff8137b5fd>] dev_watchdog+0x16f/0x213
[15483.926211]  [<ffffffff8103a4a9>] run_timer_softirq+0x37b/0x58f
[15483.926217]  [<ffffffff8103a354>] ? run_timer_softirq+0x226/0x58f
[15483.926224]  [<ffffffff8105e56e>] ? local_clock+0x2a/0x3b
[15483.926231]  [<ffffffff81051842>] ? hrtimer_interrupt+0x118/0x1b4
[15483.926237]  [<ffffffff8137b48e>] ? pfifo_fast_dequeue+0xc2/0xc2
[15483.926245]  [<ffffffff81075638>] ? trace_hardirqs_off_caller+0x1f/0x10e
[15483.926256]  [<ffffffff81032c26>] __do_softirq+0x17f/0x32d
[15483.926262]  [<ffffffff810720c2>] ? clockevents_program_event+0xab/0xc7
[15483.926269]  [<ffffffff813fe40c>] call_softirq+0x1c/0x30
[15483.926277]  [<ffffffff810035bf>] do_softirq+0x3d/0x86
[15483.926282]  [<ffffffff81033021>] irq_exit+0x53/0xbb
[15483.926290]  [<ffffffff813fea2d>] smp_apic_timer_interrupt+0x8a/0x98
[15483.926296]  [<ffffffff813fdbdc>] apic_timer_interrupt+0x6c/0x80
[15483.926304]  <EOI>  [<ffffffff810532b0>] ? __atomic_notifier_call_chain+0xdb/0x109
[15483.926313]  [<ffffffff81009a40>] ? default_idle+0x113/0x24e
[15483.926319]  [<ffffffff81009a3e>] ? default_idle+0x111/0x24e
[15483.926326]  [<ffffffff81009d06>] amd_e400_idle+0xe5/0xe7
[15483.926332]  [<ffffffff8100a4ae>] cpu_idle+0x6c/0xc8
[15483.926339]  [<ffffffff813e044c>] rest_init+0x130/0x137
[15483.926345]  [<ffffffff813e031c>] ? csum_partial_copy_generic+0x16c/0x16c
[15483.926353]  [<ffffffff8186fa30>] start_kernel+0x2d2/0x2df
[15483.926359]  [<ffffffff8186f567>] ? repair_env_string+0x56/0x56
[15483.926366]  [<ffffffff8186f27a>] x86_64_start_reservations+0x7e/0x82
[15483.926373]  [<ffffffff8186f36e>] x86_64_start_kernel+0xf0/0xf7
[15483.926377] ---[ end trace bd46b3883a3ab819 ]---
[15483.926385] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15488.929896] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15498.921512] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15508.913144] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15518.904786] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15528.896409] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15538.888039] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15548.879678] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15558.871309] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15568.862929] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15578.854570] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15588.846189] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15598.837828] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15608.829441] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15618.821083] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15628.812722] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15638.804350] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15648.795974] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15658.787605] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15668.779222] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15678.770871] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15688.762501] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15698.754136] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15708.745764] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15718.737391] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15728.729019] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15738.720654] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15748.712280] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15754.570157] forcedeth 0000:00:08.0: eth0: link down
[15756.972392] forcedeth 0000:00:08.0: eth0: link up
[15762.700565] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15777.679993] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15787.671642] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15797.663285] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15802.667092] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15812.658701] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15822.650341] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15832.641980] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15882.600137] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15892.591755] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15917.562840] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15932.558276] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15962.533183] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15972.524806] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15982.516438] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[15992.508077] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[16002.499699] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[16012.491327] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[16022.482969] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020
[16032.474585] forcedeth 0000:00:08.0: eth0: Got tx_timeout. irq status: 00000020

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply

* RE: divide by 0 error in igbvf_set_coalesce - ab50a2a
From: Williams, Mitch A @ 2012-06-18 20:45 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev@vger.kernel.org
In-Reply-To: <4FDF706D.7080509@cisco.com>



> -----Original Message-----
> From: David Ahern [mailto:daahern@cisco.com]
> Sent: Monday, June 18, 2012 11:16 AM
> To: Williams, Mitch A
> Cc: netdev@vger.kernel.org
> Subject: divide by 0 error in igbvf_set_coalesce - ab50a2a
> 
> Mitch:
> 
> I have a VM using a 82576 based VF. Running:
> $ ethtool -C eth2 rx-usecs 0
> 
> generates the following trace on console:
> 
> [  894.683322] divide error: 0000 [#1] SMP [  894.684020] CPU 1 [

Thanks for letting me know, David. I'll look into it and get a patch out soon. Shouldn't be that big of a deal to fix.

In the meantime, my advice to you is, "Don't do that."

-Mitch



> 894.684020] Modules linked in: sunrpc virtio_net igbvf virtio_blk
> virtio_pci virtio_ring virtio [  894.684020] [  894.684020] Pid: 7310,
> comm: ethtool Not tainted 3.5.0-rc1 #0 Bochs Bochs [  894.684020] RIP:
> 0010:[<ffffffffa00259ec>]  [<ffffffffa00259ec>]
> igbvf_set_coalesce+0x5b/0x8b [igbvf] [  894.684020] RSP:
> 0018:ffff88003cd51c38  EFLAGS: 00010246 [  894.684020] RAX:
> 000000003b9aca00 RBX: ffff88003aa42000 RCX:
> 0000000000000000
> [  894.684020] RDX: 0000000000000000 RSI: ffff88003cd51c48 RDI:
> ffff88003aa42780
> [  894.684020] RBP: ffff88003cd51c38 R08: 0000000000000000 R09:
> 0000000000000000
> [  894.684020] R10: 0000000000000000 R11: ffff88003cd51ec8 R12:
> ffff88003cd51c48
> [  894.684020] R13: 000000000000000f R14: 00000000ffffffff R15:
> 00000000fffffff2
> [  894.684020] FS:  00007f8142ccb720(0000) GS:ffff88003fd00000(0000)
> knlGS:0000000000000000
> [  894.684020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> 894.684020] CR2: 000000000040466c CR3: 000000003c754000 CR4:
> 00000000000007e0
> [  894.684020] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  894.684020] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  894.684020] Process ethtool (pid: 7310, threadinfo ffff88003cd50000,
> task ffff88003b984470) [  894.684020] Stack:
> [  894.684020]  ffff88003cd51cb8 ffffffff8132bcf5 000000000000000f
> 0000000000000000
> [  894.684020]  0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [  894.684020]  0000000000000000 0000000000000000 0000000000000000
> 0000000000000000
> [  894.684020] Call Trace:
> [  894.684020]  [<ffffffff8132bcf5>] ethtool_set_coalesce+0x54/0x5d [
> 894.684020]  [<ffffffff8132d0c9>] dev_ethtool+0x5dc/0x175b [
> 894.684020]  [<ffffffff81057921>] ? need_resched+0x1e/0x28 [
> 894.684020]  [<ffffffff81057934>] ? should_resched+0x9/0x29 [
> 894.684020]  [<ffffffff813be822>] ? _cond_resched+0xe/0x22 [
> 894.684020]  [<ffffffff8132b227>] dev_ioctl+0x517/0x684 [  894.684020]
> [<ffffffff810ec91f>] ? pmd_offset+0x14/0x3b [  894.684020]
> [<ffffffff81315045>] sock_do_ioctl+0x3d/0x48 [  894.684020]
> [<ffffffff8131545f>] sock_ioctl+0x1f8/0x207 [  894.684020]
> [<ffffffff8111f6fa>] do_vfs_ioctl+0x475/0x4b6 [  894.684020]
> [<ffffffff811f8e70>] ?
> inode_has_perm.clone.19.clone.27+0x33/0x35
> [  894.684020]  [<ffffffff811f9298>] ? file_has_perm+0x73/0x7e [
> 894.684020]  [<ffffffff8110f6cf>] ? fd_install+0x57/0x60 [  894.684020]
> [<ffffffff8111f791>] sys_ioctl+0x56/0x79 [  894.684020]
> [<ffffffff813c5de9>] system_call_fastpath+0x16/0x1b [  894.684020] Code:
> 83 f8 02 77 0f c7 87 0c 03 00 00 e8 01 00 00 8b 46
> 04 eb 19 8d 04 8d 00 00 00 00 31 d2 89 87 0c 03 00 00 c1 e1 0a b8 00 ca
> 9a 3b <f7> f1 89 87 08 03 00 00 8b 97 0c 03 00 00 48 8b 87 80 03 00 00 [
> 894.684020] RIP  [<ffffffffa00259ec>] igbvf_set_coalesce+0x5b/0x8b
> [igbvf] [  894.684020]  RSP <ffff88003cd51c38> [  894.779474] ---[ end
> trace 162bed6b66df758d ]---
> 
> 
> This commit introduced the problem:
> 
> commit ab50a2a430693b0961dc7b7d9fe2a4bd77d11ea6
> Author: Mitch A Williams <mitch.a.williams@intel.com>
> Date:   Sat Jan 14 08:10:50 2012 +0000
> 
>      igbvf: refactor Interrupt Throttle Rate code
> 
> David

^ permalink raw reply
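
The report above comes down to a division whose divisor is derived from a user-supplied value that may be 0. As a hedged illustration only (this is neither the igbvf code nor the eventual fix, and usecs_to_irq_rate() is a hypothetical helper), a conversion from an interval to a rate can treat 0 as a special case before dividing:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a coalescing interval in microseconds
 * into an interrupt rate per second. An interval of 0 is mapped to a
 * rate of 0, meaning "no throttling" in this sketch, instead of being
 * used as a divisor.
 */
static uint32_t usecs_to_irq_rate(uint32_t usecs)
{
	if (usecs == 0)
		return 0;

	return 1000000u / usecs;
}
```

Whether 0 should mean "disable throttling" or be rejected with an error is a driver policy decision; the sketch only shows that the zero case has to be decided before the division runs.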

* hiii ..!!) 
From: Аллчик  Гуслякова @ 2012-06-18 18:38 UTC (permalink / raw)
  To: netdev

Hiii, Фома passed me your email address.))) 
open my page www.demo1.restaurant.edge7tech.com/picture.php I'm listed there as Марианчик Хорошилова.!) 
if you're bored and want to be friends..! 

^ permalink raw reply

* [PATCH] sctp: fix warning when compiling without IPv6
From: Daniel Halperin @ 2012-06-18 21:04 UTC (permalink / raw)
  To: netdev

net/sctp/protocol.c: In function ‘sctp_addr_wq_timeout_handler’:
net/sctp/protocol.c:676: warning: label ‘free_next’ defined but not used

Signed-off-by: Daniel Halperin <dhalperi@cs.washington.edu>
---
 net/sctp/protocol.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 5942d27..9c90811 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -673,7 +673,9 @@ void sctp_addr_wq_timeout_handler(unsigned long arg)
 				SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: sctp_asconf_mgmt failed\n");
 			sctp_bh_unlock_sock(sk);
 		}
+#if IS_ENABLED(CONFIG_IPV6)
 free_next:
+#endif
 		list_del(&addrw->list);
 		kfree(addrw);
 	}
-- 
1.7.0.4

^ permalink raw reply related

* Re: [PATCH 1/2 net-next] ixgbe: use skb_padto
From: Alexander Duyck @ 2012-06-18 21:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff Kirsher, Bruce Allan, Carolyn Wyborny, Don Skidmore,
	Greg Rose, Peter P Waskiewicz Jr, David S. Miller, e1000-devel,
	netdev
In-Reply-To: <20120618105816.5fdd0b90@nehalam.linuxnetplumber.net>

On 06/18/2012 10:58 AM, Stephen Hemminger wrote:
> The code to pad packets here is the same effective code as
> the existing inline function skb_padto(). There is a minor
> performance gain since skb_padto() also uses unlikely().
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
>
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c	2012-06-18 10:53:09.130376800 -0700
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c	2012-06-18 10:55:13.104540844 -0700
> @@ -6389,11 +6389,8 @@ static netdev_tx_t ixgbe_xmit_frame(stru
>  	 * The minimum packet size for olinfo paylen is 17 so pad the skb
>  	 * in order to meet this minimum size requirement.
>  	 */
> -	if (skb->len < 17) {
> -		if (skb_padto(skb, 17))
> -			return NETDEV_TX_OK;
> -		skb->len = 17;
> -	}
> +	if (skb_padto(skb, 17))
> +		return NETDEV_TX_OK;
>  
>  	tx_ring = adapter->tx_ring[skb->queue_mapping];
>  	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
I don't think this will work.  We need to update the skb->len and last I
knew skb_padto doesn't do that.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH] c_can_pci: generic module for C_CAN/D_CAN on PCI
From: Federico Vaga @ 2012-06-18 21:23 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Wolfgang Grandegger, Giancarlo Asnaghi, Alan Cox, linux-can,
	netdev, linux-kernel, Bhupesh SHARMA, AnilKumar Chimata,
	Alessandro Rubini
In-Reply-To: <4FDF83A5.6060703@pengutronix.de>

> I get this warning:
>
> socketcan/linux/drivers/net/can/c_can/c_can_pci.c: In function 'c_can_pci_probe':
> socketcan/linux/drivers/net/can/c_can/c_can_pci.c:71: warning: 'priv' may be used uninitialized in this function
>
> What about:
>
> -       pci_iounmap(pdev, priv->base);
> +       pci_iounmap(pdev, addr);

I didn't get this warning, but I reread the code and the warning
is correct, so pci_iounmap(pdev, addr) is the right way.

-- 
Federico Vaga

^ permalink raw reply

* Re: pull request: wireless 2012-06-18
From: David Miller @ 2012-06-18 21:48 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20120618195947.GA30590@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Mon, 18 Jun 2012 15:59:48 -0400

> This is a batch of fixes intended for 3.5...
> 
> This includes pulls from the mac80211 and bluetooth trees -- soon
> I'll be completely irrelevant!

Don't forget to keep taking all the credit, that's the trick :)

> Please let me know if there are problems!

Pulled, thanks a lot John.

^ permalink raw reply

* [PATCH] ipv6: Move ipv6 proc file registration to end of init order
From: Thomas Graf @ 2012-06-18 22:08 UTC (permalink / raw)
  To: davem; +Cc: netdev

/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.

This opens up a short time frame to access fib_table_hash with its pants
down.

Move the registration of the proc files to a later point in the init
order to avoid the race.

Tested :-)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/ipv6/route.c |   41 +++++++++++++++++++++++++++++++----------
 1 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 999a982..becb048 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2957,10 +2957,6 @@ static int __net_init ip6_route_net_init(struct net *net)
 	net->ipv6.sysctl.ip6_rt_mtu_expires = 10*60*HZ;
 	net->ipv6.sysctl.ip6_rt_min_advmss = IPV6_MIN_MTU - 20 - 40;
 
-#ifdef CONFIG_PROC_FS
-	proc_net_fops_create(net, "ipv6_route", 0, &ipv6_route_proc_fops);
-	proc_net_fops_create(net, "rt6_stats", S_IRUGO, &rt6_stats_seq_fops);
-#endif
 	net->ipv6.ip6_rt_gc_expire = 30*HZ;
 
 	ret = 0;
@@ -2981,10 +2977,6 @@ out_ip6_dst_ops:
 
 static void __net_exit ip6_route_net_exit(struct net *net)
 {
-#ifdef CONFIG_PROC_FS
-	proc_net_remove(net, "ipv6_route");
-	proc_net_remove(net, "rt6_stats");
-#endif
 	kfree(net->ipv6.ip6_null_entry);
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	kfree(net->ipv6.ip6_prohibit_entry);
@@ -2993,11 +2985,33 @@ static void __net_exit ip6_route_net_exit(struct net *net)
 	dst_entries_destroy(&net->ipv6.ip6_dst_ops);
 }
 
+static int __net_init ip6_route_net_init_late(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	proc_net_fops_create(net, "ipv6_route", 0, &ipv6_route_proc_fops);
+	proc_net_fops_create(net, "rt6_stats", S_IRUGO, &rt6_stats_seq_fops);
+#endif
+	return 0;
+}
+
+static void __net_exit ip6_route_net_exit_late(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	proc_net_remove(net, "ipv6_route");
+	proc_net_remove(net, "rt6_stats");
+#endif
+}
+
 static struct pernet_operations ip6_route_net_ops = {
 	.init = ip6_route_net_init,
 	.exit = ip6_route_net_exit,
 };
 
+static struct pernet_operations ip6_route_net_late_ops = {
+	.init = ip6_route_net_init_late,
+	.exit = ip6_route_net_exit_late,
+};
+
 static struct notifier_block ip6_route_dev_notifier = {
 	.notifier_call = ip6_route_dev_notify,
 	.priority = 0,
@@ -3047,19 +3061,25 @@ int __init ip6_route_init(void)
 	if (ret)
 		goto xfrm6_init;
 
+	ret = register_pernet_subsys(&ip6_route_net_late_ops);
+	if (ret)
+		goto fib6_rules_init;
+
 	ret = -ENOBUFS;
 	if (__rtnl_register(PF_INET6, RTM_NEWROUTE, inet6_rtm_newroute, NULL, NULL) ||
 	    __rtnl_register(PF_INET6, RTM_DELROUTE, inet6_rtm_delroute, NULL, NULL) ||
 	    __rtnl_register(PF_INET6, RTM_GETROUTE, inet6_rtm_getroute, NULL, NULL))
-		goto fib6_rules_init;
+		goto out_register_late_subsys;
 
 	ret = register_netdevice_notifier(&ip6_route_dev_notifier);
 	if (ret)
-		goto fib6_rules_init;
+		goto out_register_late_subsys;
 
 out:
 	return ret;
 
+out_register_late_subsys:
+	unregister_pernet_subsys(&ip6_route_net_late_ops);
 fib6_rules_init:
 	fib6_rules_cleanup();
 xfrm6_init:
@@ -3078,6 +3098,7 @@ out_kmem_cache:
 void ip6_route_cleanup(void)
 {
 	unregister_netdevice_notifier(&ip6_route_dev_notifier);
+	unregister_pernet_subsys(&ip6_route_net_late_ops);
 	fib6_rules_cleanup();
 	xfrm6_fini();
 	fib6_gc_cleanup();
-- 
1.7.7.6

^ permalink raw reply related
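
The ordering argument in the patch above (set up the data first, expose it to readers last, and tear down in the opposite order) can be sketched in user space. Everything below is hypothetical and is not the kernel's pernet_operations API: table_init() stands for allocating fib_table_hash, reader_init() stands for registering the proc files. In this sketch reader_init() fails when the table is missing, purely to make the dependency visible; in the kernel the handler would simply be reachable too early.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ordered list of init/exit pairs. */
struct init_step {
	const char *name;
	int (*init)(void);
	void (*exit)(void);
};

static int table_ready;		/* set once the backing table exists */
static int reader_visible;	/* set once readers can reach the table */

static int table_init(void)
{
	table_ready = 1;
	return 0;
}

static void table_exit(void)
{
	table_ready = 0;
}

/* Publishing is only valid once the data it exposes has been set up. */
static int reader_init(void)
{
	if (!table_ready)
		return -1;
	reader_visible = 1;
	return 0;
}

static void reader_exit(void)
{
	reader_visible = 0;
}

/* Run steps in order; on failure, undo the completed ones in reverse. */
static int run_steps(const struct init_step *steps, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = steps[i].init();
		if (ret) {
			while (i-- > 0)
				steps[i].exit();
			return ret;
		}
	}
	return 0;
}
```

Listing the table step before the reader step succeeds, while the reverse order fails at the first step. The patch achieves the same ordering by registering a second pernet subsystem after the existing ones and unregistering it first on cleanup.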

* Re: [RFC] Introduce to batch variants of accept() and epoll_ctl() syscall
From: Andi Kleen @ 2012-06-18 23:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Li Yu, Changli Gao, Linux Netdev List, Linux Kernel Mailing List,
	davidel
In-Reply-To: <1339750318.7491.70.camel@edumazet-glaptop>

Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> I believe accept() is the problem here, because it contends with the
> softirq processing the tcp session handshake.

The MOSBENCH people some time ago did a per CPU accept queue. This is
probably overkill, but there are clearly some scaling problems here
with enough cores.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply

* [PATCH net-next] ixgbe: simplify padding and length checks (v2)
From: Stephen Hemminger @ 2012-06-18 23:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jeff Kirsher, Bruce Allan, Carolyn Wyborny, Don Skidmore,
	Greg Rose, Peter P Waskiewicz Jr, David S. Miller, e1000-devel,
	netdev
In-Reply-To: <4FDF9B37.3030804@intel.com>

The check for length <= 0 is bogus because length is unsigned, and the network
stack never sends zero-length packets (unless it is totally broken).

The check for really small packets can be optimized by using unlikely()
and by calling skb_pad() directly.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c	2012-06-18 10:53:09.130376800 -0700
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c	2012-06-18 15:20:44.364951004 -0700
@@ -6380,17 +6380,12 @@ static netdev_tx_t ixgbe_xmit_frame(stru
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_ring *tx_ring;
 
-	if (skb->len <= 0) {
-		dev_kfree_skb_any(skb);
-		return NETDEV_TX_OK;
-	}
-
 	/*
 	 * The minimum packet size for olinfo paylen is 17 so pad the skb
 	 * in order to meet this minimum size requirement.
 	 */
-	if (skb->len < 17) {
-		if (skb_padto(skb, 17))
+	if (unlikely(skb->len < 17)) {
+		if (skb_pad(skb, 17 - skb->len))
 			return NETDEV_TX_OK;
 		skb->len = 17;
 	}

^ permalink raw reply
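
The point raised earlier in this thread, that padding the data is not enough unless the recorded length is updated as well, can be shown with a flat buffer. This is a minimal sketch under stated assumptions: struct pkt and pkt_pad_to_min() are hypothetical and far simpler than a real sk_buff, and only the minimum length of 17 is taken from the patch comment.

```c
#include <string.h>

#define MIN_TX_LEN 17u	/* minimum length quoted in the patch comment */

/* Hypothetical flat packet: a buffer, its used length and its capacity. */
struct pkt {
	unsigned char *data;
	unsigned int len;
	unsigned int cap;
};

/*
 * Zero-pad a short packet up to MIN_TX_LEN and update len to match.
 * Returns 0 on success, -1 if the buffer cannot hold the padding.
 */
static int pkt_pad_to_min(struct pkt *p)
{
	if (p->len >= MIN_TX_LEN)
		return 0;	/* common case: nothing to do */

	if (p->cap < MIN_TX_LEN)
		return -1;

	memset(p->data + p->len, 0, MIN_TX_LEN - p->len);
	p->len = MIN_TX_LEN;	/* the step that padding alone does not do */
	return 0;
}
```

Because len is unsigned here as well, a test such as `len <= 0` could only ever be true for `len == 0`, which is the observation behind dropping the first check in the patch.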

* [RFC] TCP:  Support configurable delayed-ack parameters.
From: greearb @ 2012-06-19  0:52 UTC (permalink / raw)
  To: netdev; +Cc: Ben Greear, Daniel Baluta

From: Ben Greear <greearb@candelatech.com>

RFC2581 (section 4.2) specifies when an ACK should be generated as follows:

" .. an ACK SHOULD be generated for at least every second
  full-sized segment, and MUST be generated within 500 ms
  of the arrival of the first unacknowledged packet.
"

We export the number of segments and the timeout limits
specified above, so that a user can tune them according
to their needs.

Specifically:
	* /proc/sys/net/ipv4/tcp_default_delack_segs, represents
	the threshold for the number of segments.
	* /proc/sys/net/ipv4/tcp_default_delack_min, specifies
	the minimum timeout value.
	* /proc/sys/net/ipv4/tcp_default_delack_max, specifies
	the maximum timeout value.

In addition, new TCP socket options are added to allow
per-socket configuration:

TCP_DELACK_SEGS
TCP_DELACK_MIN
TCP_DELACK_MAX

In order to keep a multiply out of the hot path, the segs * mss
computation is recalculated and cached whenever segs or mss changes.

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Ben Greear <greearb@candelatech.com>
---

Compile-tested only at this point.
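
The caching scheme described above (recompute segs * mss only when one of the two inputs changes, so the receive path does a single comparison) can be sketched in isolation. The struct and function names below are hypothetical stand-ins for the icsk_ack fields and inline helpers in this patch, and the "greater than" comparison is an assumption about how the threshold is consumed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the delayed-ACK fields added by the patch. */
struct delack_state {
	uint16_t rcv_mss;
	uint16_t delack_segs;
	uint32_t calc_thresh;	/* cached rcv_mss * delack_segs */
};

/* Slow path: refresh the cached product. */
static void delack_recalc(struct delack_state *s)
{
	/* Widen first so the product cannot overflow a signed int. */
	s->calc_thresh = (uint32_t)s->rcv_mss * s->delack_segs;
}

static void delack_set_mss(struct delack_state *s, uint16_t mss)
{
	s->rcv_mss = mss;
	delack_recalc(s);
}

static void delack_set_segs(struct delack_state *s, uint16_t segs)
{
	s->delack_segs = segs;
	delack_recalc(s);
}

/* Hot path: one comparison against the cached value, no multiply. */
static int delack_should_ack(const struct delack_state *s,
			     uint32_t unacked_bytes)
{
	return unacked_bytes > s->calc_thresh;
}
```

The design only stays correct if every write to either input goes through a setter, which is presumably why the patch renames the field to _rcv_mss and adds accessor functions.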


 Documentation/networking/ip-sysctl.txt |   13 +++++++++++++
 include/linux/tcp.h                    |    3 +++
 include/net/inet_connection_sock.h     |   31 ++++++++++++++++++++++++++++---
 include/net/tcp.h                      |   13 ++++++++++---
 net/dccp/output.c                      |    5 +++--
 net/dccp/timer.c                       |    2 +-
 net/ipv4/inet_connection_sock.c        |   13 +++++++++++++
 net/ipv4/sysctl_net_ipv4.c             |   21 +++++++++++++++++++++
 net/ipv4/tcp.c                         |   23 +++++++++++++++++++----
 net/ipv4/tcp_input.c                   |   24 ++++++++++++++----------
 net/ipv4/tcp_output.c                  |   22 +++++++++++++++-------
 net/ipv4/tcp_timer.c                   |    3 ++-
 12 files changed, 142 insertions(+), 31 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6f896b9..89675d8 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -551,6 +551,19 @@ tcp_thin_dupack - BOOLEAN
 	Documentation/networking/tcp-thin.txt
 	Default: 0
 
+tcp_default_delack_segs: - INTEGER
+	Sets the default minimal number of full-sized TCP segments
+	received after which an ACK should be sent.
+	Default: 1 (as specified in RFC2581, S4.2)
+
+tcp_default_delack_min:	- INTEGER
+	Sets the default minimum time (in milliseconds) to delay before sending an ACK.
+	Default: 40ms
+
+tcp_default_delack_max: - INTEGER
+	Sets the maximum time (in milliseconds) to delay before sending an ACK.
+	Default: 200ms
+
 UDP variables:
 
 udp_mem - vector of 3 INTEGERs: min, pressure, max
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 5f359db..bc73d8c 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -110,6 +110,9 @@ enum {
 #define TCP_REPAIR_QUEUE	20
 #define TCP_QUEUE_SEQ		21
 #define TCP_REPAIR_OPTIONS	22
+#define TCP_DELACK_SEGS         23 /* Number of segments per delayed ack */
+#define TCP_DELACK_MIN          24 /* minimum delayed ack, in milliseconds */
+#define TCP_DELACK_MAX          25 /* maximum delayed ack, in milliseconds */
 
 struct tcp_repair_opt {
 	__u32	opt_code;
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 7d83f90..2ada03c 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -113,7 +113,12 @@ struct inet_connection_sock {
 		unsigned long	  timeout;	 /* Currently scheduled timeout		   */
 		__u32		  lrcvtime;	 /* timestamp of last received data packet */
 		__u16		  last_seg_size; /* Size of last incoming segment	   */
-		__u16		  rcv_mss;	 /* MSS used for delayed ACK decisions	   */ 
+		__u16		  _rcv_mss;	 /* MSS used for delayed ACK decisions	   */
+		__u32		  calc_thresh;   /* rcv_mss * tcp_delack_segs          */
+		__u16		  tcp_delack_min; /* Minimum ack delay in ms               */
+		__u16		  tcp_delack_max; /* Maximum ack delay in ms               */
+		__u16		  tcp_delack_segs;/* Delay # of segs before sending ack    */
+		__u16		  UNUSED_HOLE;    /* Add new member(s) here                */
 	} icsk_ack;
 	struct {
 		int		  enabled;
@@ -171,11 +176,31 @@ static inline int inet_csk_ack_scheduled(const struct sock *sk)
 	return inet_csk(sk)->icsk_ack.pending & ICSK_ACK_SCHED;
 }
 
-static inline void inet_csk_delack_init(struct sock *sk)
+static inline __u16 inet_csk_get_rcv_mss(const struct sock *sk)
 {
-	memset(&inet_csk(sk)->icsk_ack, 0, sizeof(inet_csk(sk)->icsk_ack));
+	return inet_csk(sk)->icsk_ack._rcv_mss;
 }
 
+static inline void inet_csk_recalc_delack_thresh(struct sock *sk)
+{
+       struct inet_connection_sock *icsk = inet_csk(sk);
+       icsk->icsk_ack.calc_thresh =
+               icsk->icsk_ack._rcv_mss * icsk->icsk_ack.tcp_delack_segs;
+}
+
+static inline void inet_csk_set_rcv_mss(struct sock *sk, __u16 rcv_mss)
+{
+	inet_csk(sk)->icsk_ack._rcv_mss = rcv_mss;
+	inet_csk_recalc_delack_thresh(sk);
+}
+
+static inline u32 inet_csk_delack_thresh(const struct sock *sk)
+{
+       return inet_csk(sk)->icsk_ack.calc_thresh;
+}
+
+extern void inet_csk_delack_init(struct sock *sk);
+
 extern void inet_csk_delete_keepalive_timer(struct sock *sk);
 extern void inet_csk_reset_keepalive_timer(struct sock *sk, unsigned long timeout);
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e79aa48..d6cb650 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -113,14 +113,18 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
 				  * TIME-WAIT timer.
 				  */
 
-#define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */
+/* default maximum time to delay before sending an ACK */
+#define TCP_DELACK_MAX_DEFAULT	((unsigned)(HZ/5))
+
 #if HZ >= 100
-#define TCP_DELACK_MIN	((unsigned)(HZ/25))	/* minimal time to delay before sending an ACK */
+/* default minimum time to delay before sending an ACK */
+#define TCP_DELACK_MIN_DEFAULT	((unsigned)(HZ/25))
 #define TCP_ATO_MIN	((unsigned)(HZ/25))
 #else
-#define TCP_DELACK_MIN	4U
+#define TCP_DELACK_MIN_DEFAULT	4U
 #define TCP_ATO_MIN	4U
 #endif
+
 #define TCP_RTO_MAX	((unsigned)(120*HZ))
 #define TCP_RTO_MIN	((unsigned)(HZ/5))
 #define TCP_TIMEOUT_INIT ((unsigned)(1*HZ))	/* RFC6298 2.1 initial RTO value	*/
@@ -253,6 +257,9 @@ extern int sysctl_tcp_cookie_size;
 extern int sysctl_tcp_thin_linear_timeouts;
 extern int sysctl_tcp_thin_dupack;
 extern int sysctl_tcp_early_retrans;
+extern int sysctl_tcp_default_delack_segs;
+extern int sysctl_tcp_default_delack_min;
+extern int sysctl_tcp_default_delack_max;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/dccp/output.c b/net/dccp/output.c
index 7873673..984a19a 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -574,10 +574,11 @@ void dccp_send_ack(struct sock *sk)
 						GFP_ATOMIC);
 
 		if (skb == NULL) {
+			struct inet_connection_sock *icsk = inet_csk(sk);
 			inet_csk_schedule_ack(sk);
-			inet_csk(sk)->icsk_ack.ato = TCP_ATO_MIN;
+			icsk->icsk_ack.ato = TCP_ATO_MIN;
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
-						  TCP_DELACK_MAX,
+						  icsk->icsk_ack.tcp_delack_max,
 						  DCCP_RTO_MAX);
 			return;
 		}
diff --git a/net/dccp/timer.c b/net/dccp/timer.c
index 16f0b22..2fc883c 100644
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -203,7 +203,7 @@ static void dccp_delack_timer(unsigned long data)
 		icsk->icsk_ack.blocked = 1;
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED);
 		sk_reset_timer(sk, &icsk->icsk_delack_timer,
-			       jiffies + TCP_DELACK_MIN);
+			       jiffies + icsk->icsk_ack.tcp_delack_min);
 		goto out;
 	}
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f9ee741..4206b79 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -366,6 +366,19 @@ void inet_csk_reset_keepalive_timer(struct sock *sk, unsigned long len)
 }
 EXPORT_SYMBOL(inet_csk_reset_keepalive_timer);
 
+extern int sysctl_tcp_default_delack_min;
+extern int sysctl_tcp_default_delack_max;
+extern int sysctl_tcp_default_delack_segs;
+void inet_csk_delack_init(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	memset(&icsk->icsk_ack, 0, sizeof(icsk->icsk_ack));
+	icsk->icsk_ack.tcp_delack_min = sysctl_tcp_default_delack_min;
+	icsk->icsk_ack.tcp_delack_max = sysctl_tcp_default_delack_max;
+	icsk->icsk_ack.tcp_delack_segs = sysctl_tcp_default_delack_segs;
+}
+EXPORT_SYMBOL(inet_csk_delack_init);
+
 struct dst_entry *inet_csk_route_req(struct sock *sk,
 				     struct flowi4 *fl4,
 				     const struct request_sock *req)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ef32956..e898a2e 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -687,6 +687,27 @@ static struct ctl_table ipv4_table[] = {
 		.extra2		= &two,
 	},
 	{
+		.procname	= "tcp_default_delack_segs",
+		.data		= &sysctl_tcp_default_delack_segs,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
+		.procname	= "tcp_default_delack_min",
+		.data		= &sysctl_tcp_default_delack_min,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_ms_jiffies
+	},
+	{
+		.procname	= "tcp_default_delack_max",
+		.data		= &sysctl_tcp_default_delack_max,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_ms_jiffies
+	},
+	{
 		.procname	= "udp_mem",
 		.data		= &sysctl_udp_mem,
 		.maxlen		= sizeof(sysctl_udp_mem),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3ba605f..55a4597 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1305,8 +1305,9 @@ void tcp_cleanup_rbuf(struct sock *sk, int copied)
 		   /* Delayed ACKs frequently hit locked sockets during bulk
 		    * receive. */
 		if (icsk->icsk_ack.blocked ||
-		    /* Once-per-two-segments ACK was not sent by tcp_input.c */
-		    tp->rcv_nxt - tp->rcv_wup > icsk->icsk_ack.rcv_mss ||
+		    /* More than once-per-tcp_delack_segs-segments ACK
+		     * was not sent by tcp_input.c */
+		    tp->rcv_nxt - tp->rcv_wup > inet_csk_delack_thresh(sk) ||
 		    /*
 		     * If this read emptied read buffer, we send ACK, if
 		     * connection is not bidirectional, user drained
@@ -2436,7 +2437,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 	case TCP_NODELAY:
 		if (val) {
 			/* TCP_NODELAY is weaker than TCP_CORK, so that
-			 * this option on corked socket is remembered, but
+			 * this option on corked socket is remembered, but
 			 * it is not activated until cork is cleared.
 			 *
 			 * However, when TCP_NODELAY is set we make
@@ -2627,6 +2628,20 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		 */
 		icsk->icsk_user_timeout = msecs_to_jiffies(val);
 		break;
+
+	case TCP_DELACK_SEGS:
+		icsk->icsk_ack.tcp_delack_segs = val;
+		inet_csk_recalc_delack_thresh(sk);
+		break;
+
+	case TCP_DELACK_MIN:
+		icsk->icsk_ack.tcp_delack_min = val;
+		break;
+
+	case TCP_DELACK_MAX:
+		icsk->icsk_ack.tcp_delack_max = val;
+		break;
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -2693,7 +2708,7 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info)
 	info->tcpi_rto = jiffies_to_usecs(icsk->icsk_rto);
 	info->tcpi_ato = jiffies_to_usecs(icsk->icsk_ack.ato);
 	info->tcpi_snd_mss = tp->mss_cache;
-	info->tcpi_rcv_mss = icsk->icsk_ack.rcv_mss;
+	info->tcpi_rcv_mss = inet_csk_get_rcv_mss(sk);
 
 	if (sk->sk_state == TCP_LISTEN) {
 		info->tcpi_unacked = sk->sk_ack_backlog;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b224eb8..6c0f901 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_abc __read_mostly;
 int sysctl_tcp_early_retrans __read_mostly = 2;
 
+int sysctl_tcp_default_delack_segs __read_mostly = 1;
+
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
 #define FLAG_DATA_ACKED		0x04 /* This ACK acknowledged new data.		*/
@@ -139,8 +141,8 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 	 * sends good full-sized frames.
 	 */
 	len = skb_shinfo(skb)->gso_size ? : skb->len;
-	if (len >= icsk->icsk_ack.rcv_mss) {
-		icsk->icsk_ack.rcv_mss = len;
+	if (len >= inet_csk_get_rcv_mss(sk)) {
+		inet_csk_set_rcv_mss(sk, len);
 	} else {
 		/* Otherwise, we make more careful check taking into account,
 		 * that SACKs block is variable.
@@ -163,7 +165,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 			len -= tcp_sk(sk)->tcp_header_len;
 			icsk->icsk_ack.last_seg_size = len;
 			if (len == lss) {
-				icsk->icsk_ack.rcv_mss = len;
+				inet_csk_set_rcv_mss(sk, len);
 				return;
 			}
 		}
@@ -176,7 +178,8 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 static void tcp_incr_quickack(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
-	unsigned int quickacks = tcp_sk(sk)->rcv_wnd / (2 * icsk->icsk_ack.rcv_mss);
+	unsigned int quickacks;
+	quickacks = tcp_sk(sk)->rcv_wnd / (2 * inet_csk_get_rcv_mss(sk));
 
 	if (quickacks == 0)
 		quickacks = 2;
@@ -310,7 +313,7 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb)
 
 	while (tp->rcv_ssthresh <= window) {
 		if (truesize <= skb->len)
-			return 2 * inet_csk(sk)->icsk_ack.rcv_mss;
+			return 2 * inet_csk_get_rcv_mss(sk);
 
 		truesize >>= 1;
 		window >>= 1;
@@ -440,7 +443,7 @@ void tcp_initialize_rcv_mss(struct sock *sk)
 	hint = min(hint, TCP_MSS_DEFAULT);
 	hint = max(hint, TCP_MIN_MSS);
 
-	inet_csk(sk)->icsk_ack.rcv_mss = hint;
+	inet_csk_set_rcv_mss(sk, hint);
 }
 EXPORT_SYMBOL(tcp_initialize_rcv_mss);
 
@@ -510,7 +513,7 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 	struct tcp_sock *tp = tcp_sk(sk);
 	if (tp->rx_opt.rcv_tsecr &&
 	    (TCP_SKB_CB(skb)->end_seq -
-	     TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss))
+	     TCP_SKB_CB(skb)->seq >= inet_csk_get_rcv_mss(sk)))
 		tcp_rcv_rtt_update(tp, tcp_time_stamp - tp->rx_opt.rcv_tsecr, 0);
 }
 
@@ -5206,8 +5209,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	    /* More than one full frame received... */
-	if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
+	/* More than tcp_delack_segs full frame(s) received... */
+	if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk_delack_thresh(sk) &&
 	     /* ... and right edge of window advances far enough.
 	      * (tcp_recvmsg() will send ACK otherwise). Or...
 	      */
@@ -5909,7 +5912,8 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 			icsk->icsk_ack.lrcvtime = tcp_time_stamp;
 			tcp_enter_quickack_mode(sk);
 			inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
-						  TCP_DELACK_MAX, TCP_RTO_MAX);
+						  icsk->icsk_ack.tcp_delack_max,
+						  TCP_RTO_MAX);
 
 discard:
 			__kfree_skb(skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 803cbfe..25f4e45 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -65,6 +65,11 @@ int sysctl_tcp_slow_start_after_idle __read_mostly = 1;
 int sysctl_tcp_cookie_size __read_mostly = 0; /* TCP_COOKIE_MAX */
 EXPORT_SYMBOL_GPL(sysctl_tcp_cookie_size);
 
+int sysctl_tcp_default_delack_min __read_mostly = TCP_DELACK_MIN_DEFAULT;
+EXPORT_SYMBOL(sysctl_tcp_default_delack_min);
+
+int sysctl_tcp_default_delack_max __read_mostly = TCP_DELACK_MAX_DEFAULT;
+EXPORT_SYMBOL(sysctl_tcp_default_delack_max);
 
 /* Account for new data that has been sent to the network. */
 static void tcp_event_new_data_sent(struct sock *sk, const struct sk_buff *skb)
@@ -1927,7 +1932,7 @@ u32 __tcp_select_window(struct sock *sk)
 	 * but may be worse for the performance because of rcv_mss
 	 * fluctuations.  --SAW  1998/11/1
 	 */
-	int mss = icsk->icsk_ack.rcv_mss;
+	int mss = inet_csk_get_rcv_mss(sk);
 	int free_space = tcp_space(sk);
 	int full_space = min_t(int, tp->window_clamp, tcp_full_space(sk));
 	int window;
@@ -2699,14 +2704,14 @@ void tcp_send_delayed_ack(struct sock *sk)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int ato = icsk->icsk_ack.ato;
 	unsigned long timeout;
+	const struct tcp_sock *tp = tcp_sk(sk);
 
-	if (ato > TCP_DELACK_MIN) {
-		const struct tcp_sock *tp = tcp_sk(sk);
+	if (ato > icsk->icsk_ack.tcp_delack_min) {
 		int max_ato = HZ / 2;
 
 		if (icsk->icsk_ack.pingpong ||
 		    (icsk->icsk_ack.pending & ICSK_ACK_PUSHED))
-			max_ato = TCP_DELACK_MAX;
+			max_ato = icsk->icsk_ack.tcp_delack_max;
 
 		/* Slow path, intersegment interval is "high". */
 
@@ -2715,7 +2720,8 @@ void tcp_send_delayed_ack(struct sock *sk)
 		 * directly.
 		 */
 		if (tp->srtt) {
-			int rtt = max(tp->srtt >> 3, TCP_DELACK_MIN);
+			int rtt = max_t(unsigned, tp->srtt >> 3,
+					icsk->icsk_ack.tcp_delack_min);
 
 			if (rtt < max_ato)
 				max_ato = rtt;
@@ -2750,6 +2756,7 @@ void tcp_send_delayed_ack(struct sock *sk)
 void tcp_send_ack(struct sock *sk)
 {
 	struct sk_buff *buff;
+	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	/* If we have been reset, we may not send again. */
 	if (sk->sk_state == TCP_CLOSE)
@@ -2762,9 +2769,10 @@ void tcp_send_ack(struct sock *sk)
 	buff = alloc_skb(MAX_TCP_HEADER, GFP_ATOMIC);
 	if (buff == NULL) {
 		inet_csk_schedule_ack(sk);
-		inet_csk(sk)->icsk_ack.ato = TCP_ATO_MIN;
+		icsk->icsk_ack.ato = TCP_ATO_MIN;
 		inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
-					  TCP_DELACK_MAX, TCP_RTO_MAX);
+					  icsk->icsk_ack.tcp_delack_max,
+					  TCP_RTO_MAX);
 		return;
 	}
 
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index e911e6c..4bd85fd 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -216,7 +216,8 @@ static void tcp_delack_timer(unsigned long data)
 		/* Try again later. */
 		icsk->icsk_ack.blocked = 1;
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED);
-		sk_reset_timer(sk, &icsk->icsk_delack_timer, jiffies + TCP_DELACK_MIN);
+		sk_reset_timer(sk, &icsk->icsk_delack_timer,
+			       jiffies + icsk->icsk_ack.tcp_delack_min);
 		goto out_unlock;
 	}
 
-- 
1.7.7.6
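
Editor's note: the tcp_send_delayed_ack() hunks above replace the fixed
TCP_DELACK_MIN/TCP_DELACK_MAX bounds with per-socket values. The clamp they
feed can be modelled in userspace roughly as follows. This is only a sketch
under stated assumptions: struct delack_model, HZ_MODEL and delack_timeout()
are hypothetical names, the ICSK_ACK_PUSHED test is folded into the pingpong
flag, and the final "ato = min(ato, max_ato)" step is taken from the
surrounding kernel function rather than from the lines shown in the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's HZ (ticks per second). */
#define HZ_MODEL 1000

/* Hypothetical userspace model of the fields the clamp reads. */
struct delack_model {
	int ato;         /* current ACK timeout estimate, in ticks */
	int delack_min;  /* per-socket minimum delay, in ticks */
	int delack_max;  /* per-socket maximum delay, in ticks */
	int pingpong;    /* nonzero for interactive (or pushed) traffic */
	unsigned srtt8;  /* smoothed RTT scaled by 8, 0 if unknown */
};

/* Return the delay, in ticks, to use for the next delayed ACK. */
static int delack_timeout(const struct delack_model *m)
{
	int ato = m->ato;

	assert(m->delack_min > 0 && m->delack_min <= m->delack_max);

	if (ato > m->delack_min) {
		int max_ato = HZ_MODEL / 2;

		if (m->pingpong)
			max_ato = m->delack_max;

		/* A known RTT tightens the bound, but never below the minimum. */
		if (m->srtt8) {
			int rtt = (int)(m->srtt8 >> 3);

			if (rtt < m->delack_min)
				rtt = m->delack_min;
			if (rtt < max_ato)
				max_ato = rtt;
		}

		if (ato > max_ato)
			ato = max_ato;
	}
	return ato;
}
```

With delack_min = 40 and delack_max = 200, an interactive socket whose
estimate is 400 ticks is clamped to 200, while an estimate at or below the
minimum is returned unchanged.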

^ permalink raw reply related

* Re: [RFC] TCP:  Support configurable delayed-ack parameters.
From: Stephen Hemminger @ 2012-06-19  1:27 UTC (permalink / raw)
  To: greearb; +Cc: netdev, Daniel Baluta
In-Reply-To: <1340067163-29329-1-git-send-email-greearb@candelatech.com>

On Mon, 18 Jun 2012 17:52:43 -0700
greearb@candelatech.com wrote:

> From: Ben Greear <greearb@candelatech.com>
> 
> RFC2581 ($4.2) specifies when an ACK should be generated as follows:
> 
> " .. an ACK SHOULD be generated for at least every second
>   full-sized segment, and MUST be generated within 500 ms
>   of the arrival of the first unacknowledged packet.
> "
> 
> We export the number of segments and the timeout limits
> specified above, so that a user can tune them according
> to their needs.
> 
> Specifically:
> 	* /proc/sys/net/ipv4/tcp_default_delack_segs, represents
> 	the threshold for the number of segments.
> 	* /proc/sys/net/ipv4/tcp_default_delack_min, specifies
> 	the minimum timeout value
> 	* /proc/sys/net/ipv4/tcp_default_delack_max, specifies
> 	the maximum timeout value.
> 
> In addition, new TCP socket options are added to allow
> per-socket configuration:
> 
> TCP_DELACK_SEGS
> TCP_DELACK_MIN
> TCP_DELACK_MAX
> 
> In order to keep a multiply out of the hot path, the segs * mss
> computation is recalculated and cached whenever segs or mss changes.
> 
> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
> Signed-off-by: Ben Greear <greearb@candelatech.com>

What is the justification (other than the standard) for making this
tunable? Why would you want to do this? Why shouldn't the stack be adjusting
it for you (based on other heuristics)? Or is this just for testing interoperation
with TCP stacks that have wonky ACK policies? There are already too many TCP tunable
parameters for general usage.
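
Editor's note: the quoted description says the segs * mss product is
recalculated and cached whenever either input changes, so the hot path only
reads one field. A minimal userspace sketch of that idea, assuming 16-bit
inputs and a 32-bit cached product (struct ack_state and the ack_* helpers
are hypothetical names, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the cached delayed-ACK threshold. */
struct ack_state {
	uint16_t rcv_mss;      /* estimated peer segment size, in bytes */
	uint16_t delack_segs;  /* segments to receive before ACKing */
	uint32_t calc_thresh;  /* cached rcv_mss * delack_segs */
};

/* Recompute the cached product; called only from the setters. */
static void ack_recalc_thresh(struct ack_state *a)
{
	a->calc_thresh = (uint32_t)a->rcv_mss * a->delack_segs;
}

static void ack_set_rcv_mss(struct ack_state *a, uint16_t mss)
{
	a->rcv_mss = mss;
	ack_recalc_thresh(a);
}

static void ack_set_delack_segs(struct ack_state *a, uint16_t segs)
{
	assert(segs > 0);
	a->delack_segs = segs;
	ack_recalc_thresh(a);
}

/* Hot-path read: no multiply, just a load. */
static uint32_t ack_thresh(const struct ack_state *a)
{
	return a->calc_thresh;
}
```

Routing every write through a setter is what keeps the cache coherent; a
direct store to either input would leave calc_thresh stale.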

^ permalink raw reply

* Re: [PATCH] [XFRM] Fix unexpected SA hard expiration after changing date
From: Fan Du @ 2012-06-19  1:34 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: davem, herbert, netdev, fdu
In-Reply-To: <20120618110523.GA29295@secunet.com>



On 2012-06-18 19:05, Steffen Klassert wrote:
> On Mon, Jun 18, 2012 at 04:24:16PM +0800, fan.du wrote:
>> After an SA is set up, a timer is armed to detect soft/hard expiration;
>> however, the timer handler uses xtime to do the math. This makes hard
>> expiration occur before soft expiration after setting a new date
>> with a big interval. As a result, the new child SA is deleted before
>> rekeying the new one.
>>
>> Signed-off-by: fan.du<fan.du@windriver.com>
>> ---
>>   include/net/xfrm.h    |    2 ++
>>   net/xfrm/xfrm_state.c |   22 ++++++++++++++++++----
>>   2 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
>> index 2933d74..1734acc 100644
>> --- a/include/net/xfrm.h
>> +++ b/include/net/xfrm.h
>> @@ -214,6 +214,8 @@ struct xfrm_state
>>   	/* Private data of this transformer, format is opaque,
>>   	 * interpreted by xfrm_type methods. */
>>   	void			*data;
>> +	u32				flags;
>
> We already have the xflags field, it holds exactly one flag
> at the moment. So I think we don't need yet another u32 that
> holds one flag too.
>

Good point!
I will address this in the next version.
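
Editor's note: the review comment above is about packing a new flag into a
flags word that already exists instead of adding a second u32. A generic
sketch of that pattern follows; struct sa_model and both flag names are
hypothetical placeholders and do not correspond to actual xfrm definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits sharing one 32-bit word. */
#define SA_FLAG_EXISTING    (1u << 0)  /* placeholder for the flag already stored */
#define SA_FLAG_TIME_DEFER  (1u << 1)  /* placeholder for the newly needed flag */

struct sa_model {
	uint32_t xflags;  /* single word holding all flags */
};

static void sa_set_flag(struct sa_model *s, uint32_t flag)
{
	assert(flag != 0);
	s->xflags |= flag;
}

static void sa_clear_flag(struct sa_model *s, uint32_t flag)
{
	s->xflags &= ~flag;
}

static int sa_test_flag(const struct sa_model *s, uint32_t flag)
{
	return (s->xflags & flag) != 0;
}
```

Each flag occupies its own bit, so setting or clearing one leaves the other
untouched and the structure does not grow.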


-- 

Love each day!
--fan

^ permalink raw reply

* Re: [PATCH] ipv6: Move ipv6 proc file registration to end of init order
From: David Miller @ 2012-06-19  1:39 UTC (permalink / raw)
  To: tgraf; +Cc: netdev
In-Reply-To: <1432a6aff6ee203a688e78b06ed8000475385564.1340057223.git.tgraf@suug.ch>

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 19 Jun 2012 00:08:33 +0200

> /proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
> handler is installed in ip6_route_net_init() whereas fib_table_hash is
> allocated in fib6_net_init() _after_ the proc handler has been installed.
> 
> This opens up a short time frame to access fib_table_hash with its pants
> down.
> 
> Move the registration of the proc files to a later point in the init
> order to avoid the race.
> 
> Tested :-)
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

This looks a lot better, applied, thanks Thomas.
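
Editor's note: the race described in the quoted commit message comes from
publishing a reader before the data it reads has been allocated. The fix is
an ordering change, which can be sketched in userspace as below. All names
here (struct net_model, table_init(), proc_register(), net_init(),
net_exit()) are hypothetical and only model the ordering, not the real
ip6_route_net_init()/fib6_net_init() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of per-namespace state. */
struct net_model {
	int *table;        /* stands in for fib_table_hash */
	int proc_visible;  /* nonzero once the reader is registered */
};

static int table_init(struct net_model *n)
{
	n->table = calloc(16, sizeof(*n->table));
	return n->table ? 0 : -1;
}

/* Publishing the reader is only safe once the table exists. */
static void proc_register(struct net_model *n)
{
	assert(n->table != NULL);
	n->proc_visible = 1;
}

/* Init order: allocate first, expose to readers last. */
static int net_init(struct net_model *n)
{
	n->table = NULL;
	n->proc_visible = 0;

	if (table_init(n) != 0)
		return -1;
	proc_register(n);
	return 0;
}

/* Teardown mirrors init: hide the reader before freeing the data. */
static void net_exit(struct net_model *n)
{
	n->proc_visible = 0;
	free(n->table);
	n->table = NULL;
}
```

The assert in proc_register() documents the invariant: a reader must never
become visible while the table pointer is still NULL.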

^ permalink raw reply

* [PATCH] ipv4: Early TCP socket demux.
From: David Miller @ 2012-06-19  2:40 UTC (permalink / raw)
  To: netdev


You know you want it.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/protocol.h b/include/net/protocol.h
index 875f489..6c47bf8 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -34,6 +34,7 @@
 
 /* This is used to register protocols. */
 struct net_protocol {
+	int			(*early_demux)(struct sk_buff *skb);
 	int			(*handler)(struct sk_buff *skb);
 	void			(*err_handler)(struct sk_buff *skb, u32 info);
 	int			(*gso_send_check)(struct sk_buff *skb);
diff --git a/include/net/sock.h b/include/net/sock.h
index 4a45216..87b424a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -319,6 +319,7 @@ struct sock {
 	unsigned long 		sk_flags;
 	struct dst_entry	*sk_dst_cache;
 	spinlock_t		sk_dst_lock;
+	struct dst_entry	*sk_rx_dst;
 	atomic_t		sk_wmem_alloc;
 	atomic_t		sk_omem_alloc;
 	int			sk_sndbuf;
@@ -1426,6 +1427,7 @@ extern struct sk_buff		*sock_rmalloc(struct sock *sk,
 					      gfp_t priority);
 extern void			sock_wfree(struct sk_buff *skb);
 extern void			sock_rfree(struct sk_buff *skb);
+extern void			sock_edemux(struct sk_buff *skb);
 
 extern int			sock_setsockopt(struct socket *sock, int level,
 						int op, char __user *optval,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9332f34..6660ffc 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -325,6 +325,7 @@ extern void tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void tcp_shutdown (struct sock *sk, int how);
 
+extern int tcp_v4_early_demux(struct sk_buff *skb);
 extern int tcp_v4_rcv(struct sk_buff *skb);
 
 extern struct inet_peer *tcp_v4_get_peer(struct sock *sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index 9e5b71f..929bdcc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1465,6 +1465,11 @@ void sock_rfree(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(sock_rfree);
 
+void sock_edemux(struct sk_buff *skb)
+{
+	sock_put(skb->sk);
+}
+EXPORT_SYMBOL(sock_edemux);
 
 int sock_i_uid(struct sock *sk)
 {
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e4e8e00..a2bd2d2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -157,6 +157,7 @@ void inet_sock_destruct(struct sock *sk)
 
 	kfree(rcu_dereference_protected(inet->inet_opt, 1));
 	dst_release(rcu_dereference_check(sk->sk_dst_cache, 1));
+	dst_release(sk->sk_rx_dst);
 	sk_refcnt_debug_dec(sk);
 }
 EXPORT_SYMBOL(inet_sock_destruct);
@@ -1520,14 +1521,15 @@ static const struct net_protocol igmp_protocol = {
 #endif
 
 static const struct net_protocol tcp_protocol = {
-	.handler =	tcp_v4_rcv,
-	.err_handler =	tcp_v4_err,
-	.gso_send_check = tcp_v4_gso_send_check,
-	.gso_segment =	tcp_tso_segment,
-	.gro_receive =	tcp4_gro_receive,
-	.gro_complete =	tcp4_gro_complete,
-	.no_policy =	1,
-	.netns_ok =	1,
+	.early_demux	=	tcp_v4_early_demux,
+	.handler	=	tcp_v4_rcv,
+	.err_handler	=	tcp_v4_err,
+	.gso_send_check	=	tcp_v4_gso_send_check,
+	.gso_segment	=	tcp_tso_segment,
+	.gro_receive	=	tcp4_gro_receive,
+	.gro_complete	=	tcp4_gro_complete,
+	.no_policy	=	1,
+	.netns_ok	=	1,
 };
 
 static const struct net_protocol udp_protocol = {
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 8590144..cb883e1 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -324,19 +324,34 @@ static int ip_rcv_finish(struct sk_buff *skb)
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (skb_dst(skb) == NULL) {
-		int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					       iph->tos, skb->dev);
-		if (unlikely(err)) {
-			if (err == -EHOSTUNREACH)
-				IP_INC_STATS_BH(dev_net(skb->dev),
-						IPSTATS_MIB_INADDRERRORS);
-			else if (err == -ENETUNREACH)
-				IP_INC_STATS_BH(dev_net(skb->dev),
-						IPSTATS_MIB_INNOROUTES);
-			else if (err == -EXDEV)
-				NET_INC_STATS_BH(dev_net(skb->dev),
-						 LINUX_MIB_IPRPFILTER);
-			goto drop;
+		const struct net_protocol *ipprot;
+		int protocol = iph->protocol;
+		int hash, err;
+
+		hash = protocol & (MAX_INET_PROTOS - 1);
+
+		rcu_read_lock();
+		ipprot = rcu_dereference(inet_protos[hash]);
+		err = -ENOENT;
+		if (ipprot && ipprot->early_demux)
+			err = ipprot->early_demux(skb);
+		rcu_read_unlock();
+
+		if (err) {
+			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+						   iph->tos, skb->dev);
+			if (unlikely(err)) {
+				if (err == -EHOSTUNREACH)
+					IP_INC_STATS_BH(dev_net(skb->dev),
+							IPSTATS_MIB_INADDRERRORS);
+				else if (err == -ENETUNREACH)
+					IP_INC_STATS_BH(dev_net(skb->dev),
+							IPSTATS_MIB_INNOROUTES);
+				else if (err == -EXDEV)
+					NET_INC_STATS_BH(dev_net(skb->dev),
+							 LINUX_MIB_IPRPFILTER);
+				goto drop;
+			}
 		}
 	}
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fda2ca1..bd90181 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1671,6 +1671,50 @@ csum_err:
 }
 EXPORT_SYMBOL(tcp_v4_do_rcv);
 
+int tcp_v4_early_demux(struct sk_buff *skb)
+{
+	struct net *net = dev_net(skb->dev);
+	const struct iphdr *iph;
+	const struct tcphdr *th;
+	struct sock *sk;
+	int err;
+
+	err = -ENOENT;
+	if (skb->pkt_type != PACKET_HOST)
+		goto out_err;
+
+	if (!pskb_may_pull(skb, ip_hdrlen(skb) + sizeof(struct tcphdr)))
+		goto out_err;
+
+	iph = ip_hdr(skb);
+	th = (struct tcphdr *) ((char *)iph + ip_hdrlen(skb));
+
+	if (th->doff < sizeof(struct tcphdr) / 4)
+		goto out_err;
+
+	if (!pskb_may_pull(skb, ip_hdrlen(skb) + th->doff * 4))
+		goto out_err;
+
+	sk = __inet_lookup_established(net, &tcp_hashinfo,
+				       iph->saddr, th->source,
+				       iph->daddr, th->dest,
+				       skb->dev->ifindex);
+	if (sk) {
+		skb->sk = sk;
+		skb->destructor = sock_edemux;
+		if (sk->sk_state != TCP_TIME_WAIT) {
+			struct dst_entry *dst = sk->sk_rx_dst;
+			if (dst) {
+				skb_dst_set_noref(skb, dst);
+				err = 0;
+			}
+		}
+	}
+
+out_err:
+	return err;
+}
+
 /*
  *	From tcp_input.c
  */
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index cb01531..72b7c63 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -445,6 +445,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 		struct tcp_sock *oldtp = tcp_sk(sk);
 		struct tcp_cookie_values *oldcvp = oldtp->cookie_values;
 
+		newsk->sk_rx_dst = dst_clone(skb_dst(skb));
+
 		/* TCP Cookie Transactions require space for the cookie pair,
 		 * as it differs for each connection.  There is no need to
 		 * copy any s_data_payload stored at the original socket.
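
Editor's note: before looking up the socket, tcp_v4_early_demux() above
rejects packets whose TCP data offset is smaller than the fixed header and
makes sure the full header is available. That validation can be sketched in
isolation as follows; tcp_hdr_len_checked() and TCP_MIN_HDR_BYTES are
hypothetical names, and "avail" stands in for what pskb_may_pull() would
guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Size of a TCP header without options, in bytes. */
#define TCP_MIN_HDR_BYTES 20u

/*
 * Return the TCP header length in bytes, or 0 if the data offset is
 * invalid or the header does not fit in the available bytes.
 * doff is the 4-bit data offset field, counted in 32-bit words.
 */
static size_t tcp_hdr_len_checked(unsigned doff, size_t avail)
{
	size_t len;

	assert(doff <= 15);  /* the field is only 4 bits wide */

	if (doff < TCP_MIN_HDR_BYTES / 4)
		return 0;

	len = (size_t)doff * 4;
	if (len > avail)
		return 0;

	return len;
}
```

A data offset of 5 with 20 bytes available yields 20; an offset of 4 is
rejected regardless of how many bytes are present.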

^ permalink raw reply related

* Re: [RFC] TCP:  Support configurable delayed-ack parameters.
From: Ben Greear @ 2012-06-19  2:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Daniel Baluta
In-Reply-To: <20120618182716.5f8fb72f@nehalam.linuxnetplumber.net>

On 06/18/2012 06:27 PM, Stephen Hemminger wrote:
> On Mon, 18 Jun 2012 17:52:43 -0700
> greearb@candelatech.com wrote:
>
>> From: Ben Greear<greearb@candelatech.com>
>>
>> RFC2581 ($4.2) specifies when an ACK should be generated as follows:
>>
>> " .. an ACK SHOULD be generated for at least every second
>>    full-sized segment, and MUST be generated within 500 ms
>>    of the arrival of the first unacknowledged packet.
>> "
>>
>> We export the number of segments and the timeout limits
>> specified above, so that a user can tune them according
>> to their needs.
>>
>> Specifically:
>> 	* /proc/sys/net/ipv4/tcp_default_delack_segs, represents
>> 	the threshold for the number of segments.
>> 	* /proc/sys/net/ipv4/tcp_default_delack_min, specifies
>> 	the minimum timeout value
>> 	* /proc/sys/net/ipv4/tcp_default_delack_max, specifies
>> 	the maximum timeout value.
>>
>> In addition, new TCP socket options are added to allow
>> per-socket configuration:
>>
>> TCP_DELACK_SEGS
>> TCP_DELACK_MIN
>> TCP_DELACK_MAX
>>
>> In order to keep a multiply out of the hot path, the segs * mss
>> computation is recalculated and cached whenever segs or mss changes.
>>
>> Signed-off-by: Daniel Baluta<dbaluta@ixiacom.com>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>
> What is the justification (other than the standard) for making this
> tunable? Why would you want to do this? Why shouldn't the stack be adjusting
> it for you (based on other heuristics)? Or is this just for testing interoperation
> with TCP stacks that have wonky ACK policies? There are already too many TCP tunable
> parameters for general usage.

TCP over wifi performance sucks, and tuning it to delay acks by 10 or 20 segments
gives a decent performance boost.

It is beyond me to write something that auto-tunes this, but even if someone
did, it's virtually guaranteed that someone somewhere will get better results
by tuning their application directly.

I honestly didn't even read the RFC section in question..just stole the
description text from the original patch by Daniel Baluta.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
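
Editor's note: to make the "delay acks by 10 or 20 segments" tuning
concrete, here is a simplified model of the comparison the patch changes in
__tcp_ack_snd_check(): an immediate ACK becomes due once the unacknowledged
byte count exceeds segs * mss. The real check has further conditions (window
advance, quick-ack mode, out-of-order data) that are omitted here, and
ack_due() and delack_thresh() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold in bytes for a given segment size and segment count. */
static uint32_t delack_thresh(uint16_t mss, uint16_t segs)
{
	assert(segs > 0);
	return (uint32_t)mss * segs;
}

/*
 * Nonzero when more than thresh bytes have arrived since the last
 * window update. Unsigned subtraction keeps the result correct when
 * the 32-bit sequence space wraps around.
 */
static int ack_due(uint32_t rcv_nxt, uint32_t rcv_wup, uint32_t thresh)
{
	return (uint32_t)(rcv_nxt - rcv_wup) > thresh;
}
```

With mss = 1460 and segs = 10 the threshold is 14600 bytes, so the receiver
in this model sends roughly one ACK per ten full-sized segments instead of
one per segment when segs = 1.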

^ permalink raw reply

* [PATCH 0/4] netfilter updates for net-next (batch 3)
From: pablo @ 2012-06-19  3:16 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>

Hi David,

The following patchset provides fixes for issues that were recently introduced
by my new cthelper infrastructure. They have been spotted by Randy Dunlap,
Andrew Morton and Dan Carpenter.

The patches provide:

* compilation fixes if CONFIG_NF_CONNTRACK is disabled: I moved all the
  conntrack code from nfnetlink_queue.c to nfnetlink_queue_ct.c to avoid
  peppering the entire code with lots of ifdefs. I needed to rename
  nfnetlink_queue.c to nfnetlink_queue_core.c to get it working with the
  Makefile tweaks I've added.

* fix NULL pointer dereference via ctnetlink while trying to change the helper
  for an existing conntrack entry. I can't find any reasonable use case for
  changing from one helper to another at run time. Thus, ctnetlink now
  returns -EOPNOTSUPP for this operation.

* fix possible out-of-bounds zeroing of the conntrack extension area caused by
  the automatic helper assignment routine.

You can pull these changes from:

git://1984.lsi.us.es/nf-next master

Thanks!

Pablo Neira Ayuso (4):
  netfilter: ctnetlink: fix NULL dereference while trying to change helper
  netfilter: nf_ct_helper: disable automatic helper re-assignment of different type
  netfilter: fix compilation of the nfnl_cthelper if NF_CONNTRACK is unset
  netfilter: nfnetlink_queue: fix compilation with NF_CONNTRACK disabled

 include/net/netfilter/nfnetlink_queue.h            |   43 +++++++++
 net/netfilter/Kconfig                              |   29 ++++--
 net/netfilter/Makefile                             |    4 +-
 net/netfilter/nf_conntrack_helper.c                |    8 +-
 net/netfilter/nf_conntrack_netlink.c               |   24 ++---
 .../{nfnetlink_queue.c => nfnetlink_queue_core.c}  |   49 ++--------
 net/netfilter/nfnetlink_queue_ct.c                 |   97 ++++++++++++++++++++
 7 files changed, 187 insertions(+), 67 deletions(-)
 create mode 100644 include/net/netfilter/nfnetlink_queue.h
 rename net/netfilter/{nfnetlink_queue.c => nfnetlink_queue_core.c} (95%)
 create mode 100644 net/netfilter/nfnetlink_queue_ct.c

-- 
1.7.10

^ permalink raw reply

* [PATCH 2/4] netfilter: nf_ct_helper: disable automatic helper re-assignment of different type
From: pablo @ 2012-06-19  3:16 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1340075789-6196-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch modifies __nf_ct_try_assign_helper in a way that invalidates support
for the following scenario:

1) attach helper A for the first time when the conntrack is created
2) attach a new (different) helper B due to changes to the reply tuple caused
   by NAT

e.g. port redirection from TCP/21 to TCP/5060 with both FTP and SIP helpers
loaded, which seems to be quite an unorthodox scenario.

I can provide a more elaborate patch to support this scenario, but explicit
helper attachment provides a better solution since the user can now
attach the helpers consistently, without relying on the automatic helper
lookup magic.

This patch fixes a possible out-of-bounds zeroing of the conntrack helper
extension if helper B uses more memory for its private data than
helper A.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_helper.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
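
Editor's note: the overflow described above arises because the extension
area is sized for the first helper's private data, while the removed memset
used the second helper's length. A userspace sketch of the guarded
behaviour, using hypothetical types (struct helper_model, struct help_ext)
and a fixed 32-byte backing array purely for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a helper and its private data size. */
struct helper_model {
	const char *name;
	size_t data_len;
};

/* Hypothetical extension area, sized once for the first helper. */
struct help_ext {
	const struct helper_model *helper;
	size_t capacity;          /* bytes reserved at creation time */
	unsigned char data[32];   /* backing storage for this model only */
};

static void ext_init(struct help_ext *e, const struct helper_model *h)
{
	assert(h->data_len <= sizeof(e->data));
	e->helper = h;
	e->capacity = h->data_len;
	memset(e->data, 0, e->capacity);
}

/*
 * Re-assignment is only allowed for the same helper, because the area
 * cannot be resized. A different helper detaches the current one.
 * Returns 0 if the helper is kept, -1 if it was detached.
 */
static int ext_reassign(struct help_ext *e, const struct helper_model *h)
{
	if (e->helper != h) {
		e->helper = NULL;
		return -1;
	}
	return 0;
}
```

Zeroing h->data_len bytes for a larger helper B would write past the
capacity reserved for helper A; refusing the switch avoids that write
entirely.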

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 2918ec2..c4bc637 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -229,7 +229,13 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 			goto out;
 		}
 	} else {
-		memset(help->data, 0, helper->data_len);
+		/* We only allow helper re-assignment of the same sort since
+		 * we cannot reallocate the helper extension area.
+		 */
+		if (help->helper != helper) {
+			RCU_INIT_POINTER(help->helper, NULL);
+			goto out;
+		}
 	}

 	rcu_assign_pointer(help->helper, helper);
-- 
1.7.10

^ permalink raw reply related

* [PATCH 4/4] netfilter: nfnetlink_queue: fix compilation with NF_CONNTRACK disabled
From: pablo @ 2012-06-19  3:16 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1340075789-6196-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

In "9cb0176 netfilter: add glue code to integrate nfnetlink_queue and ctnetlink"
the compilation with NF_CONNTRACK disabled is broken. This patch fixes this
issue.

I have moved the conntrack part into nfnetlink_queue_ct.c to avoid
peppering the entire nfnetlink_queue.c code with ifdefs.

I also needed to rename nfnetlink_queue.c to nfnetlink_queue_core.c
so that net/netfilter/Makefile can support conditional compilation
of the conntrack integration.

This patch also adds CONFIG_NETFILTER_NETLINK_QUEUE_CT in case you want to
explicitly disable the integration between nf_conntrack and nfnetlink_queue.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nfnetlink_queue.h            |   43 +++++++++
 net/netfilter/Kconfig                              |    9 ++
 net/netfilter/Makefile                             |    2 +
 net/netfilter/nf_conntrack_netlink.c               |   11 +--
 .../{nfnetlink_queue.c => nfnetlink_queue_core.c}  |   49 ++--------
 net/netfilter/nfnetlink_queue_ct.c                 |   97 ++++++++++++++++++++
 6 files changed, 164 insertions(+), 47 deletions(-)
 create mode 100644 include/net/netfilter/nfnetlink_queue.h
 rename net/netfilter/{nfnetlink_queue.c => nfnetlink_queue_core.c} (95%)
 create mode 100644 net/netfilter/nfnetlink_queue_ct.c
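
Editor's note: the approach described above (real functions in a separate
file when the feature is built, no-op stubs in a header otherwise) keeps
callers free of ifdefs. A self-contained sketch of the pattern follows.
HAVE_CONNTRACK_MODEL, struct conn_model, queue_ct_get() and
enqueue_packet() are hypothetical names. Stubs defined in a shared header
are conventionally "static inline" so that each translation unit gets its
own copy and no duplicate symbols reach the linker.

```c
#include <assert.h>
#include <stddef.h>

struct conn_model {
	int id;
};

/* Leave HAVE_CONNTRACK_MODEL undefined to build without the feature. */
#ifdef HAVE_CONNTRACK_MODEL
/* Real implementation lives in a separate source file. */
struct conn_model *queue_ct_get(int pkt_id);
#else
/* Stub: the feature is compiled out, so there is never a conntrack. */
static inline struct conn_model *queue_ct_get(int pkt_id)
{
	(void)pkt_id;
	return NULL;
}
#endif

/*
 * Caller code is identical in both configurations.
 * Returns the number of attributes that would be emitted.
 */
static int enqueue_packet(int pkt_id)
{
	struct conn_model *ct;

	assert(pkt_id >= 0);

	ct = queue_ct_get(pkt_id);
	return ct ? 2 : 1;
}
```

With the feature compiled out, the stub folds away and enqueue_packet()
always takes the "no conntrack" branch.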

diff --git a/include/net/netfilter/nfnetlink_queue.h b/include/net/netfilter/nfnetlink_queue.h
new file mode 100644
index 0000000..9f8095c
--- /dev/null
+++ b/include/net/netfilter/nfnetlink_queue.h
@@ -0,0 +1,43 @@
+#ifndef _NET_NFNL_QUEUE_H_
+#define _NET_NFNL_QUEUE_H_
+
+#include <linux/netfilter/nf_conntrack_common.h>
+
+struct nf_conn;
+
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+struct nf_conn *nfqnl_ct_get(struct sk_buff *entskb, size_t *size,
+			     enum ip_conntrack_info *ctinfo);
+struct nf_conn *nfqnl_ct_parse(const struct sk_buff *skb,
+			       const struct nlattr *attr,
+			       enum ip_conntrack_info *ctinfo);
+int nfqnl_ct_put(struct sk_buff *skb, struct nf_conn *ct,
+		 enum ip_conntrack_info ctinfo);
+void nfqnl_ct_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
+			 enum ip_conntrack_info ctinfo, int diff);
+#else
+static inline struct nf_conn *
+nfqnl_ct_get(struct sk_buff *entskb, size_t *size, enum ip_conntrack_info *ctinfo)
+{
+	return NULL;
+}
+
+static inline struct nf_conn *nfqnl_ct_parse(const struct sk_buff *skb,
+					     const struct nlattr *attr,
+					     enum ip_conntrack_info *ctinfo)
+{
+	return NULL;
+}
+
+static inline int
+nfqnl_ct_put(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info ctinfo)
+{
+	return 0;
+}
+
+static inline void nfqnl_ct_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
+				       enum ip_conntrack_info ctinfo, int diff)
+{
+}
+#endif /* NF_CONNTRACK */
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index f1a52ba..c19b214 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -340,6 +340,7 @@ config NF_CT_NETLINK_HELPER
 	select NETFILTER_NETLINK
 	depends on NF_CT_NETLINK
 	depends on NETFILTER_NETLINK_QUEUE
+	depends on NETFILTER_NETLINK_QUEUE_CT
 	depends on NETFILTER_ADVANCED
 	help
 	  This option enables the user-space connection tracking helpers
@@ -347,6 +348,14 @@ config NF_CT_NETLINK_HELPER
 
 	  If unsure, say `N'.
 
+config NETFILTER_NETLINK_QUEUE_CT
+	bool "NFQUEUE integration with Connection Tracking"
+	default n
+	depends on NETFILTER_NETLINK_QUEUE
+	help
+	  If this option is enabled, NFQUEUE can include Connection Tracking
+	  information together with the packet that is enqueued via NFNETLINK.
+
 endif # NF_CONNTRACK
 
 # transparent proxy support
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7cc2019..1c5160f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -9,6 +9,8 @@ obj-$(CONFIG_NETFILTER) = netfilter.o
 
 obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o
 obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o
+nfnetlink_queue-y := nfnetlink_queue_core.o
+nfnetlink_queue-$(CONFIG_NETFILTER_NETLINK_QUEUE_CT) += nfnetlink_queue_ct.o
 obj-$(CONFIG_NETFILTER_NETLINK_QUEUE) += nfnetlink_queue.o
 obj-$(CONFIG_NETFILTER_NETLINK_LOG) += nfnetlink_log.o
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 76271a1..31d1d8f 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1627,8 +1627,7 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
 	return err;
 }
 
-#if defined(CONFIG_NETFILTER_NETLINK_QUEUE) ||	\
-    defined(CONFIG_NETFILTER_NETLINK_QUEUE_MODULE)
+#ifdef CONFIG_NETFILTER_NETLINK_QUEUE_CT
 static size_t
 ctnetlink_nfqueue_build_size(const struct nf_conn *ct)
 {
@@ -1762,7 +1761,7 @@ static struct nfq_ct_hook ctnetlink_nfqueue_hook = {
 	.seq_adjust	= nf_nat_tcp_seq_adjust,
 #endif
 };
-#endif /* CONFIG_NETFILTER_NETLINK_QUEUE */
+#endif /* CONFIG_NETFILTER_NETLINK_QUEUE_CT */
 
 /***********************************************************************
  * EXPECT
@@ -2568,8 +2567,7 @@ static int __init ctnetlink_init(void)
 		pr_err("ctnetlink_init: cannot register pernet operations\n");
 		goto err_unreg_exp_subsys;
 	}
-#if defined(CONFIG_NETFILTER_NETLINK_QUEUE) ||	\
-    defined(CONFIG_NETFILTER_NETLINK_QUEUE_MODULE)
+#ifdef CONFIG_NETFILTER_NETLINK_QUEUE_CT
 	/* setup interaction between nf_queue and nf_conntrack_netlink. */
 	RCU_INIT_POINTER(nfq_ct_hook, &ctnetlink_nfqueue_hook);
 #endif
@@ -2590,8 +2588,7 @@ static void __exit ctnetlink_exit(void)
 	unregister_pernet_subsys(&ctnetlink_net_ops);
 	nfnetlink_subsys_unregister(&ctnl_exp_subsys);
 	nfnetlink_subsys_unregister(&ctnl_subsys);
-#if defined(CONFIG_NETFILTER_NETLINK_QUEUE) ||	\
-    defined(CONFIG_NETFILTER_NETLINK_QUEUE_MODULE)
+#ifdef CONFIG_NETFILTER_NETLINK_QUEUE_CT
 	RCU_INIT_POINTER(nfq_ct_hook, NULL);
 #endif
 }
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue_core.c
similarity index 95%
rename from net/netfilter/nfnetlink_queue.c
rename to net/netfilter/nfnetlink_queue_core.c
index ff82c79..d36b95e 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -30,7 +30,7 @@
 #include <linux/list.h>
 #include <net/sock.h>
 #include <net/netfilter/nf_queue.h>
-#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nfnetlink_queue.h>
 
 #include <linux/atomic.h>
 
@@ -234,7 +234,6 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 	struct sk_buff *entskb = entry->skb;
 	struct net_device *indev;
 	struct net_device *outdev;
-	struct nfq_ct_hook *nfq_ct;
 	struct nf_conn *ct = NULL;
 	enum ip_conntrack_info uninitialized_var(ctinfo);
 
@@ -270,17 +269,8 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 		break;
 	}
 
-	/* rcu_read_lock()ed by __nf_queue already. */
-	nfq_ct = rcu_dereference(nfq_ct_hook);
-	if (nfq_ct != NULL && (queue->flags & NFQA_CFG_F_CONNTRACK)) {
-		ct = nf_ct_get(entskb, &ctinfo);
-		if (ct) {
-			if (!nf_ct_is_untracked(ct))
-				size += nfq_ct->build_size(ct);
-			else
-				ct = NULL;
-		}
-	}
+	if (queue->flags & NFQA_CFG_F_CONNTRACK)
+		ct = nfqnl_ct_get(entskb, &size, &ctinfo);
 
 	skb = alloc_skb(size, GFP_ATOMIC);
 	if (!skb)
@@ -404,23 +394,8 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
 			BUG();
 	}
 
-	if (ct) {
-		struct nlattr *nest_parms;
-		u_int32_t tmp;
-
-		nest_parms = nla_nest_start(skb, NFQA_CT | NLA_F_NESTED);
-		if (!nest_parms)
-			goto nla_put_failure;
-
-		if (nfq_ct->build(skb, ct) < 0)
-			goto nla_put_failure;
-
-		nla_nest_end(skb, nest_parms);
-
-		tmp = ctinfo;
-		if (nla_put_u32(skb, NFQA_CT_INFO, htonl(ctinfo)))
-			goto nla_put_failure;
-	}
+	if (ct && nfqnl_ct_put(skb, ct, ctinfo) < 0)
+		goto nla_put_failure;
 
 	nlh->nlmsg_len = skb->tail - old_tail;
 	return skb;
@@ -764,7 +739,6 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 	struct nfqnl_instance *queue;
 	unsigned int verdict;
 	struct nf_queue_entry *entry;
-	struct nfq_ct_hook *nfq_ct;
 	enum ip_conntrack_info uninitialized_var(ctinfo);
 	struct nf_conn *ct = NULL;
 
@@ -786,13 +760,8 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 		return -ENOENT;
 
 	rcu_read_lock();
-	nfq_ct = rcu_dereference(nfq_ct_hook);
-	if (nfq_ct != NULL &&
-	    (queue->flags & NFQA_CFG_F_CONNTRACK) && nfqa[NFQA_CT]) {
-		ct = nf_ct_get(entry->skb, &ctinfo);
-		if (ct && !nf_ct_is_untracked(ct))
-			nfq_ct->parse(nfqa[NFQA_CT], ct);
-	}
+	if (nfqa[NFQA_CT] && (queue->flags & NFQA_CFG_F_CONNTRACK))
+		ct = nfqnl_ct_parse(entry->skb, nfqa[NFQA_CT], &ctinfo);
 
 	if (nfqa[NFQA_PAYLOAD]) {
 		u16 payload_len = nla_len(nfqa[NFQA_PAYLOAD]);
@@ -802,8 +771,8 @@ nfqnl_recv_verdict(struct sock *ctnl, struct sk_buff *skb,
 				 payload_len, entry, diff) < 0)
 			verdict = NF_DROP;
 
-		if (ct && (ct->status & IPS_NAT_MASK) && diff)
-			nfq_ct->seq_adjust(skb, ct, ctinfo, diff);
+		if (ct)
+			nfqnl_ct_seq_adjust(skb, ct, ctinfo, diff);
 	}
 	rcu_read_unlock();
 
diff --git a/net/netfilter/nfnetlink_queue_ct.c b/net/netfilter/nfnetlink_queue_ct.c
new file mode 100644
index 0000000..68ef550
--- /dev/null
+++ b/net/netfilter/nfnetlink_queue_ct.c
@@ -0,0 +1,97 @@
+/*
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/skbuff.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nfnetlink.h>
+#include <linux/netfilter/nfnetlink_queue.h>
+#include <net/netfilter/nf_conntrack.h>
+
+struct nf_conn *nfqnl_ct_get(struct sk_buff *entskb, size_t *size,
+			     enum ip_conntrack_info *ctinfo)
+{
+	struct nfq_ct_hook *nfq_ct;
+	struct nf_conn *ct;
+
+	/* rcu_read_lock()ed by __nf_queue already. */
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct == NULL)
+		return NULL;
+
+	ct = nf_ct_get(entskb, ctinfo);
+	if (ct) {
+		if (!nf_ct_is_untracked(ct))
+			*size += nfq_ct->build_size(ct);
+		else
+			ct = NULL;
+	}
+	return ct;
+}
+
+struct nf_conn *
+nfqnl_ct_parse(const struct sk_buff *skb, const struct nlattr *attr,
+	       enum ip_conntrack_info *ctinfo)
+{
+	struct nfq_ct_hook *nfq_ct;
+	struct nf_conn *ct;
+
+	/* rcu_read_lock()ed by __nf_queue already. */
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct == NULL)
+		return NULL;
+
+	ct = nf_ct_get(skb, ctinfo);
+	if (ct && !nf_ct_is_untracked(ct))
+		nfq_ct->parse(attr, ct);
+
+	return ct;
+}
+
+int nfqnl_ct_put(struct sk_buff *skb, struct nf_conn *ct,
+		 enum ip_conntrack_info ctinfo)
+{
+	struct nfq_ct_hook *nfq_ct;
+	struct nlattr *nest_parms;
+	u_int32_t tmp;
+
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct == NULL)
+		return 0;
+
+	nest_parms = nla_nest_start(skb, NFQA_CT | NLA_F_NESTED);
+	if (!nest_parms)
+		goto nla_put_failure;
+
+	if (nfq_ct->build(skb, ct) < 0)
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest_parms);
+
+	tmp = ctinfo;
+	if (nla_put_be32(skb, NFQA_CT_INFO, htonl(tmp)))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+void nfqnl_ct_seq_adjust(struct sk_buff *skb, struct nf_conn *ct,
+			 enum ip_conntrack_info ctinfo, int diff)
+{
+	struct nfq_ct_hook *nfq_ct;
+
+	nfq_ct = rcu_dereference(nfq_ct_hook);
+	if (nfq_ct == NULL)
+		return;
+
+	if ((ct->status & IPS_NAT_MASK) && diff)
+		nfq_ct->seq_adjust(skb, ct, ctinfo, diff);
+}
-- 
1.7.10


^ permalink raw reply related

* [PATCH 1/4] netfilter: ctnetlink: fix NULL dereference while trying to change helper
From: pablo @ 2012-06-19  3:16 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1340075789-6196-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

The patch 1afc56794e03: "netfilter: nf_ct_helper: implement variable
length helper private data" from Jun 7, 2012, leads to the following
Smatch complaint:

net/netfilter/nf_conntrack_netlink.c:1231 ctnetlink_change_helper()
         error: we previously assumed 'help->helper' could be null (see line 1228)

This NULL dereference can be triggered with the following sequence:

1) attach the helper for the first time when the conntrack is created.
2) remove the helper module or detach the helper from the conntrack
   via ctnetlink.
3) attach a helper again (the same one or a different one, it does not
   matter) to that existing conntrack via ctnetlink.

This patch fixes the problem by removing the code path that allows you
to re-assign a helper to an existing conntrack entry via ctnetlink, since
I cannot find any practical use for it.
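
The resulting control flow can be modelled in user space roughly as follows.
This is only a sketch: demo_helper, demo_help and demo_change_helper() are
invented names, and the two conditions are assumed from the surrounding code,
which is not fully visible in the hunk below. A conntrack whose helper has been
detached (help->helper == NULL) now gets -EBUSY instead of falling through to
the memset() that dereferenced help->helper.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the helper and the per-conntrack extension. */
struct demo_helper {
	const char *name;
};

struct demo_help {
	const struct demo_helper *helper;	/* NULL once detached */
};

/*
 * Model of the simplified logic: a helper can only be "changed" to
 * itself, which in the kernel merely refreshes its private data.
 */
static int demo_change_helper(struct demo_help *help,
			      const struct demo_helper *requested)
{
	if (help) {
		if (help->helper == requested)
			return 0;	/* same helper, nothing to re-assign */
		else
			return -EBUSY;	/* different or detached helper */
	}

	/* we cannot set a helper for an existing conntrack */
	return -EOPNOTSUPP;
}
```

With this shape, step 3 of the sequence above no longer reaches any code that
touches help->helper after it has been found to be NULL.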

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |   13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index ae156df..76271a1 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1224,19 +1224,12 @@ ctnetlink_change_helper(struct nf_conn *ct, const struct nlattr * const cda[])
 			if (helper->from_nlattr && helpinfo)
 				helper->from_nlattr(helpinfo, ct);
 			return 0;
-		}
-		if (help->helper)
+		} else
 			return -EBUSY;
-		/* need to zero data of old helper */
-		memset(help->data, 0, help->helper->data_len);
-	} else {
-		/* we cannot set a helper for an existing conntrack */
-		return -EOPNOTSUPP;
 	}
 
-	rcu_assign_pointer(help->helper, helper);
-
-	return 0;
+	/* we cannot set a helper for an existing conntrack */
+	return -EOPNOTSUPP;
 }
 
 static inline int
-- 
1.7.10

^ permalink raw reply related

* [PATCH 3/4] netfilter: fix compilation of the nfnl_cthelper if NF_CONNTRACK is unset
From: pablo @ 2012-06-19  3:16 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1340075789-6196-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch fixes the compilation of net/netfilter/nfnetlink_cthelper.c
if CONFIG_NF_CONNTRACK is not set.

This patch also moves the definition of the cthelper infrastructure to
the scope of NF_CONNTRACK things.

I have also renamed NETFILTER_NETLINK_CTHELPER to NF_CT_NETLINK_HELPER,
so that the name is similar to other nf_conntrack_netlink extensions. Better
to do it now, since this has only been in David's tree for two days.

Two new dependencies have been added:

* NF_CT_NETLINK
* NETFILTER_NETLINK_QUEUE

This infrastructure requires both ctnetlink and nfqueue.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/Kconfig  |   20 ++++++++++++--------
 net/netfilter/Makefile |    2 +-
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index aae6c62..f1a52ba 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -12,14 +12,6 @@ tristate "Netfilter NFACCT over NFNETLINK interface"
 	  If this option is enabled, the kernel will include support
 	  for extended accounting via NFNETLINK.
 
-config NETFILTER_NETLINK_CTHELPER
-tristate "Netfilter CTHELPER over NFNETLINK interface"
-	depends on NETFILTER_ADVANCED
-	select NETFILTER_NETLINK
-	help
-	  If this option is enabled, the kernel will include support
-	  for user-space connection tracking helpers via NFNETLINK.
-
 config NETFILTER_NETLINK_QUEUE
 	tristate "Netfilter NFQUEUE over NFNETLINK interface"
 	depends on NETFILTER_ADVANCED
@@ -343,6 +335,18 @@ config NF_CT_NETLINK_TIMEOUT
 
 	  If unsure, say `N'.
 
+config NF_CT_NETLINK_HELPER
+	tristate 'Connection tracking helpers in user-space via Netlink'
+	select NETFILTER_NETLINK
+	depends on NF_CT_NETLINK
+	depends on NETFILTER_NETLINK_QUEUE
+	depends on NETFILTER_ADVANCED
+	help
+	  This option enables the user-space connection tracking helpers
+	  infrastructure.
+
+	  If unsure, say `N'.
+
 endif # NF_CONNTRACK
 
 # transparent proxy support
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 2f3bc0f..7cc2019 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -9,7 +9,6 @@ obj-$(CONFIG_NETFILTER) = netfilter.o
 
 obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o
 obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o
-obj-$(CONFIG_NETFILTER_NETLINK_CTHELPER) += nfnetlink_cthelper.o
 obj-$(CONFIG_NETFILTER_NETLINK_QUEUE) += nfnetlink_queue.o
 obj-$(CONFIG_NETFILTER_NETLINK_LOG) += nfnetlink_log.o
 
@@ -25,6 +24,7 @@ obj-$(CONFIG_NF_CT_PROTO_UDPLITE) += nf_conntrack_proto_udplite.o
 # netlink interface for nf_conntrack
 obj-$(CONFIG_NF_CT_NETLINK) += nf_conntrack_netlink.o
 obj-$(CONFIG_NF_CT_NETLINK_TIMEOUT) += nfnetlink_cttimeout.o
+obj-$(CONFIG_NF_CT_NETLINK_HELPER) += nfnetlink_cthelper.o
 
 # connection tracking helpers
 nf_conntrack_h323-objs := nf_conntrack_h323_main.o nf_conntrack_h323_asn1.o
-- 
1.7.10

^ permalink raw reply related

* Re: linux-next: Tree for Jun 18 (netfilter nfconntrack)
From: Pablo Neira Ayuso @ 2012-06-19  3:19 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, linux-next, LKML, netdev, netfilter-devel,
	coreteam
In-Reply-To: <4FDF65F6.6010002@xenotime.net>

On Mon, Jun 18, 2012 at 10:31:34AM -0700, Randy Dunlap wrote:
> On 06/17/2012 11:53 PM, Stephen Rothwell wrote:
> 
> > Hi all,
> > 
> > Changes since 20120615:
> 
> 
> 
> on i386 or x86_64:
> 
> # CONFIG_NF_CONNTRACK is not set
> 
>   CC [M]  net/netfilter/nfnetlink_cthelper.o
> In file included from include/net/netfilter/nf_conntrack_helper.h:12:0,
>                  from net/netfilter/nfnetlink_cthelper.c:23:
> include/net/netfilter/nf_conntrack.h:77:22: error: field 'ct_general' has incomplete type
> include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
> include/net/netfilter/nf_conntrack.h:157:30: error: 'const struct sk_buff' has no member named 'nfct'
> include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
> include/net/netfilter/nf_conntrack.h:164:2: error: implicit declaration of function 'nf_conntrack_put'
> make[3]: *** [net/netfilter/nfnetlink_cthelper.o] Error 1

I've sent a patch to David to solve this:

 netfilter: fix compilation of the nfnl_cthelper if NF_CONNTRACK is unset

It seems to resolve the issue for me here.

Thanks for the report.

^ permalink raw reply
