* Re: [GIT PULL nf-next-2.6] ipvs namespaces
From: Pablo Neira Ayuso @ 2011-01-12 20:29 UTC (permalink / raw)
To: Simon Horman
Cc: netfilter-devel, lvs-devel, netdev, Patrick McHardy,
Julian Anastasov, Hans Schillstrom
In-Reply-To: <1294294578-8601-1-git-send-email-horms@verge.net.au>
Hi Simon,
On 06/01/11 07:15, Simon Horman wrote:
> Hi Patrick,
>
> please consider pulling
> git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git master
> in order to get the netns changes to IPVS from Hans Schillstrom.
I'm hitting some conflicts when pulling from your tree.
>From git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6
* branch master -> FETCH_HEAD
Auto-merging include/linux/netfilter.h
Auto-merging include/linux/skbuff.h
Auto-merging include/net/netfilter/nf_conntrack.h
Auto-merging net/core/skbuff.c
Auto-merging net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
Auto-merging net/ipv6/netfilter/nf_conntrack_reasm.c
Auto-merging net/netfilter/core.c
Auto-merging net/netfilter/ipvs/ip_vs_ctl.c
CONFLICT (content): Merge conflict in net/netfilter/ipvs/ip_vs_ctl.c
^ permalink raw reply
* Re: [PATCH V8 02/13] ntp: add ADJ_SETOFFSET mode bit
From: Kuwahara,T. @ 2011-01-12 20:39 UTC (permalink / raw)
To: john stultz
Cc: Richard Cochran, linux-kernel, linux-api, netdev, Alan Cox,
Arnd Bergmann, Christoph Lameter, David Miller, Krzysztof Halasa,
Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner
In-Reply-To: <1294779309.3441.66.camel@work-vm>
On Wed, Jan 12, 2011 at 5:55 AM, john stultz <johnstul@us.ibm.com> wrote:
> So the kernel handles leap second insertion via a timer. Thus at the end
> of 23:59:59, it will inject a leapsecond by setting the time back by one
> second (back to 23:59:59) and setting the TIME_OOP flag.
>
> This timer is an absolute timer, so if someone moves the clock forward
> across the 23:59:59 boundary, the adjustment will still be made.
>
> The patch is not broken, nor useless.
It takes into account only one upcoming leap second, but ignores all
the others. That's not sufficient for arbitrary adjustments.
^ permalink raw reply
* Re: [PATCH net-next] netfilter: x_table: speedup compat operations
From: Pablo Neira Ayuso @ 2011-01-12 20:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: Patrick McHardy, David Miller, Netfilter Development Mailinglist,
netdev
In-Reply-To: <1292693715.18869.148.camel@edumazet-laptop>
On 18/12/10 18:35, Eric Dumazet wrote:
> One iptables invocation with 135000 rules takes 35 seconds of cpu time
> on a recent server, using a 32bit distro and a 64bit kernel.
>
> We eventually trigger NMI/RCU watchdog.
>
> INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)
>
> COMPAT mode has quadratic behavior and consume 16 bytes of memory per
> rule.
>
> Switch the xt_compat algos to use an array instead of list, and use a
> binary search to locate an offset in the sorted array.
>
> This halves memory need (8 bytes per rule), and removes quadratic
> behavior [ O(N*N) -> O(N*log2(N)) ]
>
> Time of iptables goes from 35 s to 150 ms.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> include/linux/netfilter/x_tables.h | 3
> net/bridge/netfilter/ebtables.c | 1
> net/ipv4/netfilter/arp_tables.c | 2
> net/ipv4/netfilter/ip_tables.c | 2
> net/ipv6/netfilter/ip6_tables.c | 2
> net/netfilter/x_tables.c | 82 +++++++++++++++------------
> 6 files changed, 57 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
> index 742bec0..0f04d98 100644
> --- a/include/linux/netfilter/x_tables.h
> +++ b/include/linux/netfilter/x_tables.h
> @@ -611,8 +611,9 @@ struct _compat_xt_align {
> extern void xt_compat_lock(u_int8_t af);
> extern void xt_compat_unlock(u_int8_t af);
>
> -extern int xt_compat_add_offset(u_int8_t af, unsigned int offset, short delta);
> +extern int xt_compat_add_offset(u_int8_t af, unsigned int offset, int delta);
> extern void xt_compat_flush_offsets(u_int8_t af);
> +extern void xt_compat_init_offsets(u_int8_t af, unsigned int number);
> extern int xt_compat_calc_jump(u_int8_t af, unsigned int offset);
>
> extern int xt_compat_match_offset(const struct xt_match *match);
> diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
> index cbc9f39..2bf2fb2 100644
> --- a/net/bridge/netfilter/ebtables.c
> +++ b/net/bridge/netfilter/ebtables.c
> @@ -1764,6 +1764,7 @@ static int compat_table_info(const struct ebt_table_info *info,
>
> newinfo->entries_size = size;
>
> + xt_compat_init_offsets(AF_INET, info->nentries);
> return EBT_ENTRY_ITERATE(entries, size, compat_calc_entry, info,
> entries, newinfo);
> }
> diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
> index 3fac340..47e5178 100644
> --- a/net/ipv4/netfilter/arp_tables.c
> +++ b/net/ipv4/netfilter/arp_tables.c
> @@ -883,6 +883,7 @@ static int compat_table_info(const struct xt_table_info *info,
> memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
> newinfo->initial_entries = 0;
> loc_cpu_entry = info->entries[raw_smp_processor_id()];
> + xt_compat_init_offsets(NFPROTO_ARP, info->number);
> xt_entry_foreach(iter, loc_cpu_entry, info->size) {
> ret = compat_calc_entry(iter, info, loc_cpu_entry, newinfo);
> if (ret != 0)
> @@ -1350,6 +1351,7 @@ static int translate_compat_table(const char *name,
> duprintf("translate_compat_table: size %u\n", info->size);
> j = 0;
> xt_compat_lock(NFPROTO_ARP);
> + xt_compat_init_offsets(NFPROTO_ARP, number);
> /* Walk through entries, checking offsets. */
> xt_entry_foreach(iter0, entry0, total_size) {
> ret = check_compat_entry_size_and_hooks(iter0, info, &size,
> diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> index a846d63..c5a75d7 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -1080,6 +1080,7 @@ static int compat_table_info(const struct xt_table_info *info,
> memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
> newinfo->initial_entries = 0;
> loc_cpu_entry = info->entries[raw_smp_processor_id()];
> + xt_compat_init_offsets(AF_INET, info->number);
> xt_entry_foreach(iter, loc_cpu_entry, info->size) {
> ret = compat_calc_entry(iter, info, loc_cpu_entry, newinfo);
> if (ret != 0)
> @@ -1681,6 +1682,7 @@ translate_compat_table(struct net *net,
> duprintf("translate_compat_table: size %u\n", info->size);
> j = 0;
> xt_compat_lock(AF_INET);
> + xt_compat_init_offsets(AF_INET, number);
> /* Walk through entries, checking offsets. */
> xt_entry_foreach(iter0, entry0, total_size) {
> ret = check_compat_entry_size_and_hooks(iter0, info, &size,
> diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
> index 4555823..0c9973a 100644
> --- a/net/ipv6/netfilter/ip6_tables.c
> +++ b/net/ipv6/netfilter/ip6_tables.c
> @@ -1093,6 +1093,7 @@ static int compat_table_info(const struct xt_table_info *info,
> memcpy(newinfo, info, offsetof(struct xt_table_info, entries));
> newinfo->initial_entries = 0;
> loc_cpu_entry = info->entries[raw_smp_processor_id()];
> + xt_compat_init_offsets(AF_INET6, info->number);
> xt_entry_foreach(iter, loc_cpu_entry, info->size) {
> ret = compat_calc_entry(iter, info, loc_cpu_entry, newinfo);
> if (ret != 0)
> @@ -1696,6 +1697,7 @@ translate_compat_table(struct net *net,
> duprintf("translate_compat_table: size %u\n", info->size);
> j = 0;
> xt_compat_lock(AF_INET6);
> + xt_compat_init_offsets(AF_INET6, number);
> /* Walk through entries, checking offsets. */
> xt_entry_foreach(iter0, entry0, total_size) {
> ret = check_compat_entry_size_and_hooks(iter0, info, &size,
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 8046350..75182ed 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -38,9 +38,8 @@ MODULE_DESCRIPTION("{ip,ip6,arp,eb}_tables backend module");
> #define SMP_ALIGN(x) (((x) + SMP_CACHE_BYTES-1) & ~(SMP_CACHE_BYTES-1))
>
> struct compat_delta {
> - struct compat_delta *next;
> - unsigned int offset;
> - int delta;
> + unsigned int offset; /* offset in kernel */
> + int delta; /* delta in 32bit user land */
> };
>
> struct xt_af {
> @@ -49,7 +48,9 @@ struct xt_af {
> struct list_head target;
> #ifdef CONFIG_COMPAT
> struct mutex compat_mutex;
> - struct compat_delta *compat_offsets;
> + struct compat_delta *compat_tab;
> + unsigned int number; /* number of slots in compat_tab[] */
> + unsigned int cur; /* number of used slots in compat_tab[] */
> #endif
> };
>
> @@ -414,54 +415,67 @@ int xt_check_match(struct xt_mtchk_param *par,
> EXPORT_SYMBOL_GPL(xt_check_match);
>
> #ifdef CONFIG_COMPAT
> -int xt_compat_add_offset(u_int8_t af, unsigned int offset, short delta)
> +int xt_compat_add_offset(u_int8_t af, unsigned int offset, int delta)
> {
> - struct compat_delta *tmp;
> + struct xt_af *xp = &xt[af];
>
> - tmp = kmalloc(sizeof(struct compat_delta), GFP_KERNEL);
> - if (!tmp)
> - return -ENOMEM;
> + if (!xp->compat_tab) {
> + if (!xp->number)
> + return -EINVAL;
> + xp->compat_tab = vmalloc(sizeof(struct compat_delta) * xp->number);
> + if (!xp->compat_tab)
> + return -ENOMEM;
> + xp->cur = 0;
> + }
>
> - tmp->offset = offset;
> - tmp->delta = delta;
> + if (xp->cur >= xp->number)
> + return -EINVAL;
minor glitch: We leak xp->compat_tab if this error condition above is true.
>
> - if (xt[af].compat_offsets) {
> - tmp->next = xt[af].compat_offsets->next;
> - xt[af].compat_offsets->next = tmp;
> - } else {
> - xt[af].compat_offsets = tmp;
> - tmp->next = NULL;
> - }
> + if (xp->cur)
> + delta += xp->compat_tab[xp->cur - 1].delta;
> + xp->compat_tab[xp->cur].offset = offset;
> + xp->compat_tab[xp->cur].delta = delta;
> + xp->cur++;
> return 0;
> }
> EXPORT_SYMBOL_GPL(xt_compat_add_offset);
>
> void xt_compat_flush_offsets(u_int8_t af)
> {
> - struct compat_delta *tmp, *next;
> -
> - if (xt[af].compat_offsets) {
> - for (tmp = xt[af].compat_offsets; tmp; tmp = next) {
> - next = tmp->next;
> - kfree(tmp);
> - }
> - xt[af].compat_offsets = NULL;
> + if (xt[af].compat_tab) {
> + vfree(xt[af].compat_tab);
> + xt[af].compat_tab = NULL;
> + xt[af].number = 0;
> }
> }
> EXPORT_SYMBOL_GPL(xt_compat_flush_offsets);
>
> int xt_compat_calc_jump(u_int8_t af, unsigned int offset)
> {
> - struct compat_delta *tmp;
> - int delta;
> -
> - for (tmp = xt[af].compat_offsets, delta = 0; tmp; tmp = tmp->next)
> - if (tmp->offset < offset)
> - delta += tmp->delta;
> - return delta;
> + struct compat_delta *tmp = xt[af].compat_tab;
> + int mid, left = 0, right = xt[af].cur - 1;
> +
> + while (left <= right) {
> + mid = (left + right) >> 1;
> + if (offset > tmp[mid].offset)
> + left = mid + 1;
> + else if (offset < tmp[mid].offset)
> + right = mid - 1;
> + else
> + return mid ? tmp[mid - 1].delta : 0;
> + }
> + WARN_ON_ONCE(1);
> + return 0;
> }
> EXPORT_SYMBOL_GPL(xt_compat_calc_jump);
>
> +void xt_compat_init_offsets(u_int8_t af, unsigned int number)
> +{
> + xt[af].number = number;
> + xt[af].cur = 0;
> +}
> +EXPORT_SYMBOL(xt_compat_init_offsets);
> +
> int xt_compat_match_offset(const struct xt_match *match)
> {
> u_int16_t csize = match->compatsize ? : match->matchsize;
> @@ -1337,7 +1351,7 @@ static int __init xt_init(void)
> mutex_init(&xt[i].mutex);
> #ifdef CONFIG_COMPAT
> mutex_init(&xt[i].compat_mutex);
> - xt[i].compat_offsets = NULL;
> + xt[i].compat_tab = NULL;
> #endif
> INIT_LIST_HEAD(&xt[i].target);
> INIT_LIST_HEAD(&xt[i].match);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH V8 02/13] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2011-01-12 20:55 UTC (permalink / raw)
To: Kuwahara,T.
Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
Thomas Gleixner
In-Reply-To: <AANLkTikYuYwb4BLsU3BF_=d9fAdcfb0AC2itDBeyFsNq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Thu, 2011-01-13 at 05:39 +0900, Kuwahara,T. wrote:
> On Wed, Jan 12, 2011 at 5:55 AM, john stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
> > So the kernel handles leap second insertion via a timer. Thus at the end
> > of 23:59:59, it will inject a leapsecond by setting the time back by one
> > second (back to 23:59:59) and setting the TIME_OOP flag.
> >
> > This timer is an absolute timer, so if someone moves the clock forward
> > across the 23:59:59 boundary, the adjustment will still be made.
> >
> > The patch is not broken, nor useless.
>
> It takes into account only one upcoming leap second, but ignores all
> the others. That's not sufficient for arbitrary adjustments.
If an application wants to manage the full historical table of
leapseconds and compensate appropriately, then that's fine. The
interface proposed still functions in a reasonable manner.
Again, I agree that leapseconds are annoying to deal with. It would be
great if time() was defined as TAI time instead of UTC. I'm actually
hoping to provide a CLOCK_TAI clockid someday.
thanks
-john
^ permalink raw reply
* Re: XT_MATCH_REALM Kconfig whinge...
From: Jan Engelhardt @ 2011-01-12 20:57 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Valdis.Kletnieks, Patrick McHardy, David S. Miller, linux-kernel,
netfilter-devel, netdev
In-Reply-To: <4D2E057D.30707@netfilter.org>
On Wednesday 2011-01-12 20:48, Pablo Neira Ayuso wrote:
>On 12/01/11 20:15, Valdis.Kletnieks@vt.edu wrote:
>> scripts/kconfig/conf --silentoldconfig Kconfig
>> warning: (NETFILTER_XT_MATCH_REALM) selects NET_CLS_ROUTE which has unmet direct dependencies (NET && NET_SCHED)
>> warning: (NETFILTER_XT_MATCH_REALM) selects NET_CLS_ROUTE which has unmet direct dependencies (NET && NET_SCHED)
>
>Does this fix your problem?
>
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 1534f2b..ae56764 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -886,7 +886,8 @@ config NETFILTER_XT_MATCH_RATEEST
config NETFILTER_XT_MATCH_REALM
tristate '"realm" match support'
depends on NETFILTER_ADVANCED
- select NET_CLS_ROUTE
+ depends on NET_SCHED
+ depends on NET_CLS_ROUTE
help
This option adds a `realm' match, which allows you to use the realm
key from the routing subsystem inside iptables.
This patch is not right. The select should just be removed, because
xt_realm is useful even without SCHED and CLS_ROUTE.
^ permalink raw reply related
* Re: [PATCH net-next] netfilter: x_table: speedup compat operations
From: Eric Dumazet @ 2011-01-12 21:03 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Patrick McHardy, David Miller, Netfilter Development Mailinglist,
netdev
In-Reply-To: <4D2E12B9.6090201@netfilter.org>
Le mercredi 12 janvier 2011 à 21:44 +0100, Pablo Neira Ayuso a écrit :
> minor glitch: We leak xp->compat_tab if this error condition above is true.
>
I dont think so, pointer is stored in xp->compat_tab
xt_compat_flush_offsets() will free it.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] net: remove dev_txq_stats_fold()
From: Jarek Poplawski @ 2011-01-12 21:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, David Brownell, Michał Nazarewicz, Neil Jones,
linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
Alexander Duyck, Jeff Kirsher
In-Reply-To: <1294844520.3981.84.camel@edumazet-laptop>
On Wed, Jan 12, 2011 at 04:02:00PM +0100, Eric Dumazet wrote:
> Le mercredi 12 janvier 2011 ?? 14:52 +0100, Eric Dumazet a écrit :
>
> > Or even better, remove these counters since there is no users left but
> > ixgbe.
> >
> > (vlans, tunnels, ... now use percpu stats)
> >
> >
>
> Thanks Jarek for the reminder :)
>
> [PATCH] net: remove dev_txq_stats_fold()
>
> After recent changes, (percpu stats on vlan/tunnels...), we dont need
> anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
>
> Only remaining users are ixgbe, sch_teql & macvlan :
And gianfar (not counting staging/bcm) but I'm not sure if that's all.
>
> 1) ixgbe can be converted to use existing tx_ring counters.
>
> 2) macvlan incremented txq->tx_dropped, it can use the
> dev->stats.tx_dropped counter.
>
> 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
> Now we have ndo_get_stats64(), use it.
Btw, why doesn't sch_teql need locking (for 32-bits)?
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: XT_MATCH_REALM Kconfig whinge...
From: Pablo Neira Ayuso @ 2011-01-12 21:08 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Valdis.Kletnieks, Patrick McHardy, David S. Miller, linux-kernel,
netfilter-devel, netdev
In-Reply-To: <alpine.LNX.2.01.1101122156310.27393@obet.zrqbmnf.qr>
[-- Attachment #1: Type: text/plain, Size: 1236 bytes --]
On 12/01/11 21:57, Jan Engelhardt wrote:
> On Wednesday 2011-01-12 20:48, Pablo Neira Ayuso wrote:
>
>> On 12/01/11 20:15, Valdis.Kletnieks@vt.edu wrote:
>>> scripts/kconfig/conf --silentoldconfig Kconfig
>>> warning: (NETFILTER_XT_MATCH_REALM) selects NET_CLS_ROUTE which has unmet direct dependencies (NET && NET_SCHED)
>>> warning: (NETFILTER_XT_MATCH_REALM) selects NET_CLS_ROUTE which has unmet direct dependencies (NET && NET_SCHED)
>>
>> Does this fix your problem?
>>
>
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index 1534f2b..ae56764 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -886,7 +886,8 @@ config NETFILTER_XT_MATCH_RATEEST
> config NETFILTER_XT_MATCH_REALM
> tristate '"realm" match support'
> depends on NETFILTER_ADVANCED
> - select NET_CLS_ROUTE
> + depends on NET_SCHED
> + depends on NET_CLS_ROUTE
> help
> This option adds a `realm' match, which allows you to use the realm
> key from the routing subsystem inside iptables.
>
>
> This patch is not right. The select should just be removed, because
> xt_realm is useful even without SCHED and CLS_ROUTE.
I wonder why NET_CLS_ROUTE has been there as dependency.
Then this patch should be fine.
[-- Attachment #2: realm.patch --]
[-- Type: text/x-patch, Size: 1136 bytes --]
netfilter: xt_realm: fix unmet direct dependencies
From: Pablo Neira Ayuso <pablo@netfilter.org>
scripts/kconfig/conf --silentoldconfig Kconfig
warning: (NETFILTER_XT_MATCH_REALM) selects NET_CLS_ROUTE which has unmet direct dependencies (NET && NET_SCHED)
warning: (NETFILTER_XT_MATCH_REALM) selects NET_CLS_ROUTE which has unmet direct dependencies (NET && NET_SCHED)
Jan Engelhardt spotted that NET_CLS_ROUTE is a superfluous dependency,
for that reason, this patch remove it.
Reported by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/Kconfig | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 1534f2b..8960260 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -886,7 +886,6 @@ config NETFILTER_XT_MATCH_RATEEST
config NETFILTER_XT_MATCH_REALM
tristate '"realm" match support'
depends on NETFILTER_ADVANCED
- select NET_CLS_ROUTE
help
This option adds a `realm' match, which allows you to use the realm
key from the routing subsystem inside iptables.
^ permalink raw reply related
* Re: [PATCH net-next] netfilter: x_table: speedup compat operations
From: Pablo Neira Ayuso @ 2011-01-12 21:11 UTC (permalink / raw)
To: Eric Dumazet
Cc: Patrick McHardy, David Miller, Netfilter Development Mailinglist,
netdev
In-Reply-To: <1294866229.3335.2.camel@edumazet-laptop>
On 12/01/11 22:03, Eric Dumazet wrote:
> Le mercredi 12 janvier 2011 à 21:44 +0100, Pablo Neira Ayuso a écrit :
>
>> minor glitch: We leak xp->compat_tab if this error condition above is true.
>>
>
> I dont think so, pointer is stored in xp->compat_tab
>
> xt_compat_flush_offsets() will free it.
Indeed, sorry. I'm going apply your patch. Thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next] netfilter: x_table: speedup compat operations
From: Eric Dumazet @ 2011-01-12 21:14 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Patrick McHardy, David Miller, Netfilter Development Mailinglist,
netdev
In-Reply-To: <4D2E18E5.4000902@netfilter.org>
Le mercredi 12 janvier 2011 à 22:11 +0100, Pablo Neira Ayuso a écrit :
> On 12/01/11 22:03, Eric Dumazet wrote:
> > Le mercredi 12 janvier 2011 à 21:44 +0100, Pablo Neira Ayuso a écrit :
> >
> >> minor glitch: We leak xp->compat_tab if this error condition above is true.
> >>
> >
> > I dont think so, pointer is stored in xp->compat_tab
> >
> > xt_compat_flush_offsets() will free it.
>
> Indeed, sorry. I'm going apply your patch. Thanks Eric.
Thanks ! I completely forgot about it, to be honest :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] net: remove dev_txq_stats_fold()
From: Jarek Poplawski @ 2011-01-12 21:18 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, David Brownell, Neil Jones,
linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
Alexander Duyck, Jeff Kirsher, Michal Nazarewicz
In-Reply-To: <20110112210518.GA2152-qLhEM4ewh5CNj9Bq2fkWzw@public.gmane.org>
Btw, I updated Michal's email (according to linux-kernel).
Jarek P.
On Wed, Jan 12, 2011 at 10:05:18PM +0100, Jarek Poplawski wrote:
> On Wed, Jan 12, 2011 at 04:02:00PM +0100, Eric Dumazet wrote:
> > Le mercredi 12 janvier 2011 ?? 14:52 +0100, Eric Dumazet a écrit :
> >
> > > Or even better, remove these counters since there is no users left but
> > > ixgbe.
> > >
> > > (vlans, tunnels, ... now use percpu stats)
> > >
> > >
> >
> > Thanks Jarek for the reminder :)
> >
> > [PATCH] net: remove dev_txq_stats_fold()
> >
> > After recent changes, (percpu stats on vlan/tunnels...), we dont need
> > anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
> >
> > Only remaining users are ixgbe, sch_teql & macvlan :
>
> And gianfar (not counting staging/bcm) but I'm not sure if that's all.
>
> >
> > 1) ixgbe can be converted to use existing tx_ring counters.
> >
> > 2) macvlan incremented txq->tx_dropped, it can use the
> > dev->stats.tx_dropped counter.
> >
> > 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
> > Now we have ndo_get_stats64(), use it.
>
> Btw, why doesn't sch_teql need locking (for 32-bits)?
>
> Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: TCP connections stall when station re-associates.
From: Ben Greear @ 2011-01-12 21:19 UTC (permalink / raw)
To: NetDev, linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <4D2CFDB8.3010907-my8/4N5VtI7c+919tysfdA@public.gmane.org>
On 01/11/2011 05:02 PM, Ben Greear wrote:
>
> We're seeing something funny while testing ath9k and ath5k with
> multiple VIFs. We are using send-to-self routing rules
> and have a TCP connection going between two station interfaces
> connected to the same AP. Kernel is ~2.6.37 (wireless-testing from
> a few days ago, plus some local patches). We tried commercial
> APs and ath9k/ath5k APs running 2.6.37 kernel and hostap with
> similar results.
When the stations re-connect, we re-run dhcp client, which involves
setting IP to zero and then re-acquiring the IP address. We
then set up new routing rules, etc to have it back to the way
it was before the station lost contact with the AP.
The TCP/IP connections are specifically bound to local IP and port.
Maybe it is invalid to expect TCP connections to survive this
IP change?
I'll work on reproducing this DHCP behaviour on some regular
wired interfaces and see if I can reproduce in a simpler test
case.
Thanks,
Ben
--
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc http://www.candelatech.com
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH 2/3] bna: Remove unnecessary memset(,0,)
From: Joe Perches @ 2011-01-12 21:21 UTC (permalink / raw)
To: Rasesh Mody, Debashis Dutt; +Cc: netdev, linux-kernel
In-Reply-To: <6f6d4a7a705949a3464471f364f77215f1e796f3.1294867098.git.joe@perches.com>
kzalloc'd memory doesn't need a memset to 0.
Signed-off-by: Joe Perches <joe@perches.com>
---
drivers/net/bna/bnad_ethtool.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/net/bna/bnad_ethtool.c b/drivers/net/bna/bnad_ethtool.c
index 99be5ae..142d604 100644
--- a/drivers/net/bna/bnad_ethtool.c
+++ b/drivers/net/bna/bnad_ethtool.c
@@ -275,7 +275,6 @@ bnad_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
ioc_attr = kzalloc(sizeof(*ioc_attr), GFP_KERNEL);
if (ioc_attr) {
- memset(ioc_attr, 0, sizeof(*ioc_attr));
spin_lock_irqsave(&bnad->bna_lock, flags);
bfa_nw_ioc_get_attr(&bnad->bna.device.ioc, ioc_attr);
spin_unlock_irqrestore(&bnad->bna_lock, flags);
--
1.7.3.3.398.g0b0cd.dirty
^ permalink raw reply related
* Re: [PATCH] net: remove dev_txq_stats_fold()
From: Eric Dumazet @ 2011-01-12 22:13 UTC (permalink / raw)
To: Jarek Poplawski
Cc: David Miller, David Brownell, Neil Jones, linux-usb, netdev,
Alexander Duyck, Jeff Kirsher, Michal Nazarewicz,
Sandeep Gopalpet
In-Reply-To: <20110112211806.GB2152@del.dom.local>
Le mercredi 12 janvier 2011 à 22:18 +0100, Jarek Poplawski a écrit :
> Btw, I updated Michal's email (according to linux-kernel).
>
> Jarek P.
>
> On Wed, Jan 12, 2011 at 10:05:18PM +0100, Jarek Poplawski wrote:
> > On Wed, Jan 12, 2011 at 04:02:00PM +0100, Eric Dumazet wrote:
> > > Le mercredi 12 janvier 2011 ?? 14:52 +0100, Eric Dumazet a écrit :
> > >
> > > > Or even better, remove these counters since there is no users left but
> > > > ixgbe.
> > > >
> > > > (vlans, tunnels, ... now use percpu stats)
> > > >
> > > >
> > >
> > > Thanks Jarek for the reminder :)
> > >
> > > [PATCH] net: remove dev_txq_stats_fold()
> > >
> > > After recent changes, (percpu stats on vlan/tunnels...), we dont need
> > > anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
> > >
> > > Only remaining users are ixgbe, sch_teql & macvlan :
> >
> > And gianfar (not counting staging/bcm) but I'm not sure if that's all.
> >
Hmm, I did a "allyesconfig", but on x86_64 ;)
> > >
> > > 1) ixgbe can be converted to use existing tx_ring counters.
> > >
> > > 2) macvlan incremented txq->tx_dropped, it can use the
> > > dev->stats.tx_dropped counter.
> > >
> > > 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
> > > Now we have ndo_get_stats64(), use it.
> >
> > Btw, why doesn't sch_teql need locking (for 32-bits)?
Yes you're right, thanks !
[PATCH v2] net: remove dev_txq_stats_fold()
After recent changes, (percpu stats on vlan/tunnels...), we dont need
anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
Only remaining users are ixgbe, sch_teql, gianfar & macvlan :
1) ixgbe can be converted to use existing tx_ring counters.
2) macvlan incremented txq->tx_dropped, it can use the
dev->stats.tx_dropped counter.
3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
Now we have ndo_get_stats64(), use it, even for "unsigned long"
fields (No need to bring back a struct net_device_stats)
4) gianfar adds a stats structure per tx queue to hold
tx_bytes/tx_packets
This removes a lockdep warning (and possible lockup) in rndis gadget,
calling dev_get_stats() from hard IRQ context.
Ref: http://www.spinics.net/lists/netdev/msg149202.html
Reported-by: Neil Jones <neiljay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
v2: added gianfar, removed u64 fields from sch_teql to avoid extra
locking
drivers/net/gianfar.c | 10 ++++------
drivers/net/gianfar.h | 10 ++++++++++
drivers/net/ixgbe/ixgbe_main.c | 23 ++++++++++++++++-------
drivers/net/macvtap.c | 2 +-
include/linux/netdevice.h | 5 -----
net/core/dev.c | 29 -----------------------------
net/sched/sch_teql.c | 26 +++++++++++++++++++++-----
7 files changed, 52 insertions(+), 53 deletions(-)
diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 6de4675..119aa20 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -434,7 +434,6 @@ static void gfar_init_mac(struct net_device *ndev)
static struct net_device_stats *gfar_get_stats(struct net_device *dev)
{
struct gfar_private *priv = netdev_priv(dev);
- struct netdev_queue *txq;
unsigned long rx_packets = 0, rx_bytes = 0, rx_dropped = 0;
unsigned long tx_packets = 0, tx_bytes = 0;
int i = 0;
@@ -450,9 +449,8 @@ static struct net_device_stats *gfar_get_stats(struct net_device *dev)
dev->stats.rx_dropped = rx_dropped;
for (i = 0; i < priv->num_tx_queues; i++) {
- txq = netdev_get_tx_queue(dev, i);
- tx_bytes += txq->tx_bytes;
- tx_packets += txq->tx_packets;
+ tx_bytes += priv->tx_queue[i]->stats.tx_bytes;
+ tx_packets += priv->tx_queue[i]->stats.tx_packets;
}
dev->stats.tx_bytes = tx_bytes;
@@ -2109,8 +2107,8 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
/* Update transmit stats */
- txq->tx_bytes += skb->len;
- txq->tx_packets ++;
+ tx_queue->stats.tx_bytes += skb->len;
+ tx_queue->stats.tx_packets++;
txbdp = txbdp_start = tx_queue->cur_tx;
lstatus = txbdp->lstatus;
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index 68984eb..54de413 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -907,12 +907,21 @@ enum {
MQ_MG_MODE
};
+/*
+ * Per TX queue stats
+ */
+struct tx_q_stats {
+ unsigned long tx_packets;
+ unsigned long tx_bytes;
+};
+
/**
* struct gfar_priv_tx_q - per tx queue structure
* @txlock: per queue tx spin lock
* @tx_skbuff:skb pointers
* @skb_curtx: to be used skb pointer
* @skb_dirtytx:the last used skb pointer
+ * @stats: bytes/packets stats
* @qindex: index of this queue
* @dev: back pointer to the dev structure
* @grp: back pointer to the group to which this queue belongs
@@ -934,6 +943,7 @@ struct gfar_priv_tx_q {
struct txbd8 *tx_bd_base;
struct txbd8 *cur_tx;
struct txbd8 *dirty_tx;
+ struct tx_q_stats stats;
struct net_device *dev;
struct gfar_priv_grp *grp;
u16 skb_curtx;
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a060610..602078b 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -6667,8 +6667,6 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
struct ixgbe_adapter *adapter,
struct ixgbe_ring *tx_ring)
{
- struct net_device *netdev = tx_ring->netdev;
- struct netdev_queue *txq;
unsigned int first;
unsigned int tx_flags = 0;
u8 hdr_len = 0;
@@ -6765,9 +6763,6 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
/* add the ATR filter if ATR is on */
if (test_bit(__IXGBE_TX_FDIR_INIT_DONE, &tx_ring->state))
ixgbe_atr(tx_ring, skb, tx_flags, protocol);
- txq = netdev_get_tx_queue(netdev, tx_ring->queue_index);
- txq->tx_bytes += skb->len;
- txq->tx_packets++;
ixgbe_tx_queue(tx_ring, tx_flags, count, skb->len, hdr_len);
ixgbe_maybe_stop_tx(tx_ring, DESC_NEEDED);
@@ -6925,8 +6920,6 @@ static struct rtnl_link_stats64 *ixgbe_get_stats64(struct net_device *netdev,
struct ixgbe_adapter *adapter = netdev_priv(netdev);
int i;
- /* accurate rx/tx bytes/packets stats */
- dev_txq_stats_fold(netdev, stats);
rcu_read_lock();
for (i = 0; i < adapter->num_rx_queues; i++) {
struct ixgbe_ring *ring = ACCESS_ONCE(adapter->rx_ring[i]);
@@ -6943,6 +6936,22 @@ static struct rtnl_link_stats64 *ixgbe_get_stats64(struct net_device *netdev,
stats->rx_bytes += bytes;
}
}
+
+ for (i = 0; i < adapter->num_tx_queues; i++) {
+ struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
+ u64 bytes, packets;
+ unsigned int start;
+
+ if (ring) {
+ do {
+ start = u64_stats_fetch_begin_bh(&ring->syncp);
+ packets = ring->stats.packets;
+ bytes = ring->stats.bytes;
+ } while (u64_stats_fetch_retry_bh(&ring->syncp, start));
+ stats->tx_packets += packets;
+ stats->tx_bytes += bytes;
+ }
+ }
rcu_read_unlock();
/* following stats updated by ixgbe_watchdog_task() */
stats->multicast = netdev->stats.multicast;
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 21845af..5933621 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -585,7 +585,7 @@ err:
rcu_read_lock_bh();
vlan = rcu_dereference(q->vlan);
if (vlan)
- netdev_get_tx_queue(vlan->dev, 0)->tx_dropped++;
+ vlan->dev->stats.tx_dropped++;
rcu_read_unlock_bh();
return err;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index be4957c..d971346 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -520,9 +520,6 @@ struct netdev_queue {
* please use this field instead of dev->trans_start
*/
unsigned long trans_start;
- u64 tx_bytes;
- u64 tx_packets;
- u64 tx_dropped;
} ____cacheline_aligned_in_smp;
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
@@ -2265,8 +2262,6 @@ extern void dev_load(struct net *net, const char *name);
extern void dev_mcast_init(void);
extern struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
struct rtnl_link_stats64 *storage);
-extern void dev_txq_stats_fold(const struct net_device *dev,
- struct rtnl_link_stats64 *stats);
extern int netdev_max_backlog;
extern int netdev_tstamp_prequeue;
diff --git a/net/core/dev.c b/net/core/dev.c
index a3ef808..83507c2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5523,34 +5523,6 @@ void netdev_run_todo(void)
}
}
-/**
- * dev_txq_stats_fold - fold tx_queues stats
- * @dev: device to get statistics from
- * @stats: struct rtnl_link_stats64 to hold results
- */
-void dev_txq_stats_fold(const struct net_device *dev,
- struct rtnl_link_stats64 *stats)
-{
- u64 tx_bytes = 0, tx_packets = 0, tx_dropped = 0;
- unsigned int i;
- struct netdev_queue *txq;
-
- for (i = 0; i < dev->num_tx_queues; i++) {
- txq = netdev_get_tx_queue(dev, i);
- spin_lock_bh(&txq->_xmit_lock);
- tx_bytes += txq->tx_bytes;
- tx_packets += txq->tx_packets;
- tx_dropped += txq->tx_dropped;
- spin_unlock_bh(&txq->_xmit_lock);
- }
- if (tx_bytes || tx_packets || tx_dropped) {
- stats->tx_bytes = tx_bytes;
- stats->tx_packets = tx_packets;
- stats->tx_dropped = tx_dropped;
- }
-}
-EXPORT_SYMBOL(dev_txq_stats_fold);
-
/* Convert net_device_stats to rtnl_link_stats64. They have the same
* fields in the same order, with only the type differing.
*/
@@ -5594,7 +5566,6 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev));
} else {
netdev_stats_to_stats64(storage, &dev->stats);
- dev_txq_stats_fold(dev, storage);
}
storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
return storage;
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index af9360d..8ab66a4 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -59,6 +59,10 @@ struct teql_master
struct net_device *dev;
struct Qdisc *slaves;
struct list_head master_list;
+ unsigned long tx_bytes;
+ unsigned long tx_packets;
+ unsigned long tx_errors;
+ unsigned long tx_dropped;
};
struct teql_sched_data
@@ -274,7 +278,6 @@ static inline int teql_resolve(struct sk_buff *skb,
static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct teql_master *master = netdev_priv(dev);
- struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
struct Qdisc *start, *q;
int busy;
int nores;
@@ -314,8 +317,8 @@ restart:
__netif_tx_unlock(slave_txq);
master->slaves = NEXT_SLAVE(q);
netif_wake_queue(dev);
- txq->tx_packets++;
- txq->tx_bytes += length;
+ master->tx_packets++;
+ master->tx_bytes += length;
return NETDEV_TX_OK;
}
__netif_tx_unlock(slave_txq);
@@ -342,10 +345,10 @@ restart:
netif_stop_queue(dev);
return NETDEV_TX_BUSY;
}
- dev->stats.tx_errors++;
+ master->tx_errors++;
drop:
- txq->tx_dropped++;
+ master->tx_dropped++;
dev_kfree_skb(skb);
return NETDEV_TX_OK;
}
@@ -398,6 +401,18 @@ static int teql_master_close(struct net_device *dev)
return 0;
}
+static struct rtnl_link_stats64 *teql_master_stats64(struct net_device *dev,
+ struct rtnl_link_stats64 *stats)
+{
+ struct teql_master *m = netdev_priv(dev);
+
+ stats->tx_packets = m->tx_packets;
+ stats->tx_bytes = m->tx_bytes;
+ stats->tx_errors = m->tx_errors;
+ stats->tx_dropped = m->tx_dropped;
+ return stats;
+}
+
static int teql_master_mtu(struct net_device *dev, int new_mtu)
{
struct teql_master *m = netdev_priv(dev);
@@ -422,6 +437,7 @@ static const struct net_device_ops teql_netdev_ops = {
.ndo_open = teql_master_open,
.ndo_stop = teql_master_close,
.ndo_start_xmit = teql_master_xmit,
+ .ndo_get_stats64 = teql_master_stats64,
.ndo_change_mtu = teql_master_mtu,
};
^ permalink raw reply related
* RE: [PATCH 2/3] bna: Remove unnecessary memset(,0,)
From: Rasesh Mody @ 2011-01-12 22:55 UTC (permalink / raw)
To: Joe Perches, Debashis Dutt
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <0ed96bd1a053e61c827f2ac09e38cba59e32a21b.1294867098.git.joe@perches.com>
>From: Joe Perches [mailto:joe@perches.com]
>Sent: Wednesday, January 12, 2011 1:21 PM
>
>kzalloc'd memory doesn't need a memset to 0.
>
>Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rasesh Mody <rmody@brocade.com>
>---
> drivers/net/bna/bnad_ethtool.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
>diff --git a/drivers/net/bna/bnad_ethtool.c
>b/drivers/net/bna/bnad_ethtool.c
>index 99be5ae..142d604 100644
>--- a/drivers/net/bna/bnad_ethtool.c
>+++ b/drivers/net/bna/bnad_ethtool.c
>@@ -275,7 +275,6 @@ bnad_get_drvinfo(struct net_device *netdev, struct
>ethtool_drvinfo *drvinfo)
>
> ioc_attr = kzalloc(sizeof(*ioc_attr), GFP_KERNEL);
> if (ioc_attr) {
>- memset(ioc_attr, 0, sizeof(*ioc_attr));
> spin_lock_irqsave(&bnad->bna_lock, flags);
> bfa_nw_ioc_get_attr(&bnad->bna.device.ioc, ioc_attr);
> spin_unlock_irqrestore(&bnad->bna_lock, flags);
>--
>1.7.3.3.398.g0b0cd.dirty
Joe,
Patch looks ok to me.
Thanks,
Rasesh
^ permalink raw reply
* RE: [E1000-devel] [e100] Page allocation failure warning(?) in 2.6.36.3
From: Chris Rankin @ 2011-01-12 23:27 UTC (permalink / raw)
To: JesseBrandeburg, Eric Dumazet
Cc: David Miller, e1000-devel@lists.sourceforge.net, Tushar NDave,
netdev@vger.kernel.org, Jeffrey TKirsher
In-Reply-To: <1294856052.3981.125.camel@edumazet-laptop>
Thanks, the problem has not reoccurred since I've rebooted the box with the new e100 module.
However, the problem *did* reoccur when I tried just stopping networking, unloading the old module, loading the new module and restarting networking... (I think there were some residual network packets still taking up memory in the system. Maybe.)
Jan 12 22:27:04 wellhouse kernel: ifconfig: page allocation failure. order:6, mode:0x8020
Jan 12 22:27:04 wellhouse kernel: Pid: 14968, comm: ifconfig Not tainted 2.6.36.3 #1
Jan 12 22:27:04 wellhouse kernel: Call Trace:
Jan 12 22:27:04 wellhouse kernel: [<c104b2a9>] ? __alloc_pages_nodemask+0x477/0x4a6
Jan 12 22:27:04 wellhouse kernel: [<c106177d>] ? __slab_alloc+0x1eb/0x396
Jan 12 22:27:04 wellhouse kernel: [<c1004ca6>] ? dma_generic_alloc_coherent+0x4e/0xac
Jan 12 22:27:04 wellhouse kernel: [<c105fb5c>] ? dma_pool_alloc+0xe5/0x1d9
Jan 12 22:27:04 wellhouse kernel: [<c1004c58>] ? dma_generic_alloc_coherent+0x0/0xac
Jan 12 22:27:04 wellhouse kernel: [<c66f67ee>] ? e100_rx_alloc_skb+0x82/0x11d [e100]
Jan 12 22:27:07 wellhouse kernel: [<c66f687e>] ? e100_rx_alloc_skb+0x112/0x11d [e100]
Jan 12 22:27:07 wellhouse kernel: [<c66f68d7>] ? e100_alloc_cbs+0x4e/0xfa [e100]
Jan 12 22:27:07 wellhouse kernel: [<c66f836e>] ? e100_up+0x1b/0xf1 [e100]
Jan 12 22:27:07 wellhouse kernel: [<c66f845b>] ? e100_open+0x17/0x3b [e100]
Jan 12 22:27:07 wellhouse kernel: [<c1121630>] ? __dev_open+0x7c/0xa0
Jan 12 22:27:07 wellhouse kernel: [<c11217ed>] ? __dev_change_flags+0x8b/0x100
Jan 12 22:27:07 wellhouse kernel: [<c11218c3>] ? dev_change_flags+0x10/0x3b
Jan 12 22:27:07 wellhouse kernel: [<c1159880>] ? devinet_ioctl+0x25a/0x532
Jan 12 22:27:07 wellhouse kernel: [<c11146d2>] ? sock_ioctl+0x1a8/0x1ca
Jan 12 22:27:07 wellhouse kernel: [<c111452a>] ? sock_ioctl+0x0/0x1ca
Jan 12 22:27:07 wellhouse kernel: [<c106e061>] ? do_vfs_ioctl+0x464/0x4a2
Jan 12 22:27:07 wellhouse kernel: [<c1014ce0>] ? do_page_fault+0x2d2/0x2ea
Jan 12 22:27:07 wellhouse kernel: [<c1014cc8>] ? do_page_fault+0x2ba/0x2ea
Jan 12 22:27:07 wellhouse kernel: [<c10636f6>] ? sys_faccessat+0x144/0x151
Jan 12 22:27:07 wellhouse kernel: [<c106e0cc>] ? sys_ioctl+0x2d/0x49
Jan 12 22:27:07 wellhouse kernel: [<c1177dd5>] ? syscall_call+0x7/0xb
Jan 12 22:27:07 wellhouse kernel: Mem-Info:
Jan 12 22:27:07 wellhouse kernel: DMA per-cpu:
Jan 12 22:27:07 wellhouse kernel: CPU 0: hi: 0, btch: 1 usd: 0
Jan 12 22:27:07 wellhouse kernel: Normal per-cpu:
Jan 12 22:27:07 wellhouse kernel: CPU 0: hi: 6, btch: 1 usd: 1
Jan 12 22:27:07 wellhouse kernel: active_anon:2698 inactive_anon:3626 isolated_anon:0
Jan 12 22:27:07 wellhouse kernel: active_file:2305 inactive_file:3574 isolated_file:0
Jan 12 22:27:07 wellhouse kernel: unevictable:0 dirty:17 writeback:0 unstable:0
Jan 12 22:27:07 wellhouse kernel: free:558 slab_reclaimable:484 slab_unreclaimable:1309
Jan 12 22:27:07 wellhouse kernel: mapped:1209 shmem:650 pagetables:129 bounce:0
Jan 12 22:27:07 wellhouse kernel: DMA free:1044kB min:248kB low:308kB high:372kB active_anon:3516kB inactive_anon:4004kB active_file:1784kB inactive_file:4216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB mlocked:0kB dirty:4kB writeback:0kB mapped:988kB shmem:8kB slab_reclaimable:288kB slab_unreclaimable:188kB kernel_stack:56kB pagetables:76kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 12 22:27:07 wellhouse kernel: lowmem_reserve[]: 0 47 47
Jan 12 22:27:07 wellhouse kernel: Normal free:1188kB min:764kB low:952kB high:1144kB active_anon:7276kB inactive_anon:10500kB active_file:7436kB inactive_file:10080kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:48768kB mlocked:0kB dirty:64kB writeback:0kB mapped:3848kB shmem:2592kB slab_reclaimable:1648kB slab_unreclaimable:5048kB kernel_stack:240kB pagetables:440kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 12 22:27:07 wellhouse kernel: lowmem_reserve[]: 0 0 0
Jan 12 22:27:07 wellhouse kernel: DMA: 87*4kB 9*8kB 5*16kB 3*32kB 1*64kB 3*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1044kB
Jan 12 22:27:07 wellhouse kernel: Normal: 123*4kB 11*8kB 0*16kB 1*32kB 1*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1188kB
Jan 12 22:27:07 wellhouse kernel: 6766 total pagecache pages
Jan 12 22:27:07 wellhouse kernel: 237 pages in swap cache
Jan 12 22:27:07 wellhouse kernel: Swap cache stats: add 1830, delete 1593, find 1018/1086
Jan 12 22:27:07 wellhouse kernel: Free swap = 2176336kB
Jan 12 22:27:07 wellhouse kernel: Total swap = 2179596kB
Jan 12 22:27:07 wellhouse kernel: 16383 pages RAM
Jan 12 22:27:07 wellhouse kernel: 826 pages reserved
Jan 12 22:27:07 wellhouse kernel: 5098 pages shared
Jan 12 22:27:07 wellhouse kernel: 12619 pages non-shared
I also noticed that the e100 is still using GFP_ATOMIC in one place. Does this mean that the problem is ultimately only truly fixable by freeing up some memory?
Cheers,
Chris
--- On Wed, 12/1/11, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Subject: RE: [E1000-devel] [e100] Page allocation failure warning(?) in 2.6.36.3
> To: "Brandeburg, Jesse" <jesse.brandeburg@intel.com>
> Cc: "David Miller" <davem@davemloft.net>, "Chris Rankin" <rankincj@yahoo.com>, "e1000-devel@lists.sourceforge.net" <e1000-devel@lists.sourceforge.net>, "Dave, Tushar N" <tushar.n.dave@intel.com>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>, "Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>
> Date: Wednesday, 12 January, 2011, 18:14
> Le mercredi 12 janvier 2011 à 10:05
> -0800, Brandeburg, Jesse a écrit :
>
> > First, I don't think the following comment should hold
> up this patch.
> >
> > As a policy question when I asked about using
> __GFP_NOWARN before in other
> > Intel ethernet drivers the consensus seemed to be that
> the warning
> > messages were useful. All our drivers correctly
> handle runtime memory
> > failures, but none of them are currently using
> __GFP_NOWARN.
> >
> > Can I submit patches to change our other drivers to
> __GFP_NOWARN? I think
> > it will make for quite a few less reports of
> non-issues to the list. All
> > our drivers that I would be patching already have
> ethtool counters that
> > count failed allocations.
> >
>
> If an allocation failure is really handled, in the sense
> NIC doesnt
> freeze but only lose one incoming frame, then probably
> yes.
>
> I think the warning message can be useful when driver is
> known to let
> things in a non working state ;)
>
> As you said, this can be done later, here is a respin
> without this bit.
>
> Thanks !
>
> [PATCH v2] e100: use GFP_KERNEL allocations at device init
> stage
>
> In lowmem conditions, e100 driver can fail its
> initialization, because
> of GFP_ATOMIC abuse.
>
> Switch to GFP_KERNEL were applicable.
>
> Reported-by: Chris Rankin <rankincj@yahoo.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> drivers/net/e100.c | 22
> +++++++++++++++++-----
> 1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/e100.c b/drivers/net/e100.c
> index b0aa9e6..c9a2126 100644
> --- a/drivers/net/e100.c
> +++ b/drivers/net/e100.c
> @@ -1880,9 +1880,21 @@ static inline void
> e100_start_receiver(struct nic *nic, struct rx *rx)
> }
>
> #define RFD_BUF_LEN (sizeof(struct rfd) +
> VLAN_ETH_FRAME_LEN)
> -static int e100_rx_alloc_skb(struct nic *nic, struct rx
> *rx)
> +
> +static struct sk_buff *e100_alloc_skb(struct net_device
> *dev, gfp_t flags)
> +{
> + struct sk_buff *skb;
> +
> + skb = __netdev_alloc_skb(dev,
> RFD_BUF_LEN + NET_IP_ALIGN, flags);
> + if (NET_IP_ALIGN && skb)
> + skb_reserve(skb,
> NET_IP_ALIGN);
> + return skb;
> +}
> +
> +static int e100_rx_alloc_skb(struct nic *nic, struct rx
> *rx, gfp_t flags)
> {
> - if (!(rx->skb =
> netdev_alloc_skb_ip_align(nic->netdev, RFD_BUF_LEN)))
> + rx->skb =
> e100_alloc_skb(nic->netdev, flags);
> + if (!rx->skb)
> return -ENOMEM;
>
> /* Init, and map the RFD. */
> @@ -2026,7 +2038,7 @@ static void e100_rx_clean(struct nic
> *nic, unsigned int *work_done,
>
> /* Alloc new skbs to refill list */
> for (rx = nic->rx_to_use;
> !rx->skb; rx = nic->rx_to_use = rx->next) {
> - if
> (unlikely(e100_rx_alloc_skb(nic, rx)))
> + if
> (unlikely(e100_rx_alloc_skb(nic, rx, GFP_ATOMIC)))
>
> break; /* Better luck next time (see watchdog) */
> }
>
> @@ -2102,13 +2114,13 @@ static int
> e100_rx_alloc_list(struct nic *nic)
> nic->rx_to_use = nic->rx_to_clean
> = NULL;
> nic->ru_running = RU_UNINITIALIZED;
>
> - if (!(nic->rxs = kcalloc(count,
> sizeof(struct rx), GFP_ATOMIC)))
> + if (!(nic->rxs = kcalloc(count,
> sizeof(struct rx), GFP_KERNEL)))
> return -ENOMEM;
>
> for (rx = nic->rxs, i = 0; i <
> count; rx++, i++) {
> rx->next = (i + 1
> < count) ? rx + 1 : nic->rxs;
> rx->prev = (i ==
> 0) ? nic->rxs + count - 1 : rx - 1;
> - if
> (e100_rx_alloc_skb(nic, rx)) {
> + if
> (e100_rx_alloc_skb(nic, rx, GFP_KERNEL)) {
>
> e100_rx_clean_list(nic);
>
> return -ENOMEM;
> }
>
>
>
^ permalink raw reply
* [PATCH] eth: fix new kernel-doc warning
From: Randy Dunlap @ 2011-01-13 0:50 UTC (permalink / raw)
To: netdev; +Cc: davem
From: Randy Dunlap <randy.dunlap@oracle.com>
Fix new kernel-doc warning (copy-paste typo):
Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
---
net/ethernet/eth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- linux-2.6.37-git9.orig/net/ethernet/eth.c
+++ linux-2.6.37-git9/net/ethernet/eth.c
@@ -351,7 +351,7 @@ EXPORT_SYMBOL(ether_setup);
* @sizeof_priv: Size of additional driver-private structure to be allocated
* for this Ethernet device
* @txqs: The number of TX queues this device has.
- * @txqs: The number of RX queues this device has.
+ * @rxqs: The number of RX queues this device has.
*
* Fill in the fields of the device structure with Ethernet-generic
* values. Basically does everything except registering the device.
^ permalink raw reply
* Re: [PATCH 2.6.36] vlan: Avoid hwaccel vlan packets when vid not used
From: Matt Carlson @ 2011-01-13 1:21 UTC (permalink / raw)
To: Jesse Gross
Cc: Matthew Carlson, Michael Leun, Michael Chan, Eric Dumazet,
David Miller, Ben Greear, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
In-Reply-To: <AANLkTi=N6jFE7T1=W-WLo_eHYWeE=oB6n8C8zPGoBNQF@mail.gmail.com>
On Thu, Jan 06, 2011 at 08:36:27PM -0800, Jesse Gross wrote:
> On Thu, Jan 6, 2011 at 10:24 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> > On Sat, Dec 18, 2010 at 07:38:00PM -0800, Jesse Gross wrote:
> >> On Tue, Dec 14, 2010 at 11:16 PM, Michael Leun
> >> <lkml20101129@newton.leun.net> wrote:
> >> > OK - all tests done on that DL320G5:
> >> >
> >> > For completeness, 2.6.37-rc5 unpatched:
> >> >
> >> > eth0, no vlan configured: totally broken - see double tagged vlans
> >> > without tag, single or untagged packets missing at all
> >>
> >> Random behavior? ?This one is somewhat hard to explain - maybe there
> >> are some other factors. ?eth0 has ASF on, so it always strips tags. ?I
> >> would expect it to behave like the vlan configured case.
> >>
> >> >
> >> > eth0, vlan configured: see packets without vlan tag (see double tagged
> >> > packets with one vlan tag)
> >>
> >> Both ASF and vlan group configured cause tag stripping to be enabled.
> >> Missing tag.
> >>
> >> >
> >> > eth1 same as originally reported:
> >> > without vlan configured see vlan tags (single and double tagged as
> >> > expected)
> >>
> >> No ASF and no vlan group means tag stripping is disabled. ?Have tag.
> >>
> >> > with vlan configured: see packets without vlan tag (see double tagged
> >> > packets with one vlan tag)
> >>
> >> Configuring vlan group causes stripping to be enabled. ?Missing tag.
> >>
> >> >
> >> >
> >> > 2.6.37-rc5, your tg3 use new vlan-code patch:
> >> >
> >> > eth0, no vlan configured: ?see packets without vlan tag (see double
> >> > tagged packets with one vlan tag)
> >>
> >> ASF enables tag stripping. ?Missing tag.
> >>
> >> > eth1, no vlan configured: see vlan tags (single and double tagged as
> >> > expected)
> >>
> >> No ASF, no vlan group means no stripping. ?Have tag.
> >>
> >> >
> >> >
> >> > eth0, vlan configured: as without vlan
> >>
> >> ASF enables stripping. ?Missing tag.
> >>
> >> > eth1, vlan configured: as without vlan
> >>
> >> With this patch vlan stripping is only enabled when ASF is on, so no
> >> stripping. ?Have tag.
> >>
> >> >
> >> > 2.6.37-rc5, your tg3 use new vlan-code patch with test patch ontop
> >> >
> >> > eth1 no vlan configured: see packets without vlan tag (see double tagged
> >> > packets with one vlan tag)
> >>
> >> With the second patch, vlan stripping is always enabled. ?Missing tag.
> >>
> >> > eth1 with vlan: the same
> >>
> >> Stripping still always enabled. ?Missing tag.
> >>
> >> The bottom line is whenever vlan stripping is enabled we're missing
> >> the outer tag. ?It might be worth adding some debugging in the area
> >> before napi_gro_receive/vlan_gro_receive (depending on version). ?My
> >> guess is that (desc->type_flags & RXD_FLAG_VLAN) is false even for
> >> vlan packets on this NIC.
> >>
> >> You said that everything works on the 5752? ?Matt, is it possible that
> >> the 5714 either has a problem with vlan stripping or a different way
> >> of reporting it?
> >
> > I don't think this is a 5714 specific issue. ?I think the problem is
> > rooted in the fact that the VLAN tag stripping is enabled.
>
> It's definitely related to vlan stripping being enabled. Other cards
> using tg3 seem to work fine with stripping though, which is why I
> thought it might be specific to the 5714.
I just tested this on a 5714S, using a net-next-2.6 snapshot obtained
today. It does the right thing in both cases (2nd tg3 patch ommited /
applied). The tag is always visible in the packet stream as seen from
tcpdump.
> > Your RXD_FLAG_VLAN idea sounds unlikely to me, but it's worth a check.
> >
> > The patch here is using __vlan_hwaccel_put_tag(), which informs the
> > stack a VLAN tag is present. ?If this is indeed a reporting problem, I'm
> > not sure what else the driver should be doing.
>
> The code to hand off the tag to the stack looks OK to me. Michael was
> seeing this on older versions of the kernel as well with this NIC,
> which predates both this patch and the larger vlan changes so it
> doesn't seem like a problem with passing the tag to the network stack.
> It's hard to know exactly what is going on though without seeing what
> the hardware is reporting.
When RX_MODE_KEEP_VLAN_TAG is set, the RXD_FLAG_VLAN flag will not be set
when receiving a packet. The driver skips the __vlan_hwaccel_put_tag()
call.
When RX_MODE_KEEP_VLAN_TAG is unset, the RXD_FLAG_VLAN flag is set, and
__vlan_hwaccel_put_tag() is called to reinject the packet.
^ permalink raw reply
* [PATCH] sched: remove unused backlog in RED stats
From: Stephen Hemminger @ 2011-01-13 1:42 UTC (permalink / raw)
To: David Miller; +Cc: netdev
The RED statistics structure includes backlog field which is not
set or used by any code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
--- a/include/net/red.h 2011-01-12 15:04:13.146731266 -0800
+++ b/include/net/red.h 2011-01-12 15:04:19.050731266 -0800
@@ -97,7 +97,6 @@ struct red_stats {
u32 forced_mark; /* Forced marks, qavg > max_thresh */
u32 pdrop; /* Drops due to queue limits */
u32 other; /* Drops due to drop() calls */
- u32 backlog;
};
struct red_parms {
^ permalink raw reply
* [PATCH 00/22] ipvs namespaces v3.3
From: Simon Horman @ 2011-01-13 1:52 UTC (permalink / raw)
To: netfilter-devel, lvs-devel, netdev
Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
Hans Schillstrom
Hi Pablo,
this changest includes the following changes since the v3.2 series
which was most recently posted as "[GIT PULL nf-next-2.6] ipvs namespaces".
* Remove several hunks that only make whitespace changes
* Add Acked-by: Julian Anastasov <ja@ssi.bg>
(It was an omission from v3.2)
* Fix merge conflicts
There are two changes that produce conflicts
* In the current net-next-2.6 tree but absent from the current nf-next-2.6 tree
there is "workqueue: convert
cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync()"
* And in the current nf-next-2.6 tree but absent from the current
net-next-2.6 tree there is "net: use the macros defined for the members
of flowi"
In order to create this series I merged net-next-2.6 into nf-next-2.6.
The result is at
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6 ipvs-netns3.3
However, I guess that you have already done your own merge and simply
pulling the branch above will create a bit of a mess. Please let me know
if you have a tree/branch that I should use as a base for a pull request.
^ permalink raw reply
* [PATCH 02/22] IPVS: netns to services part 1
From: Simon Horman @ 2011-01-13 1:52 UTC (permalink / raw)
To: netfilter-devel, lvs-devel, netdev
Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
Hans Schillstrom, Simon Horman
In-Reply-To: <1294883588-5683-1-git-send-email-horms@verge.net.au>
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Services hash tables got netns ptr a hash arg,
While Real Servers (rs) has been moved to ipvs struct.
Two new inline functions added to get net ptr from skb.
Since ip_vs is called from different contexts there is two
places to dig for the net ptr skb->dev or skb->sk
this is handled in skb_net() and skb_sknet()
Global functions, ip_vs_service_get() ip_vs_lookup_real_service()
etc have got struct net *net as first param.
If possible get net ptr skb etc,
- if not &init_net is used at this early stage of patching.
ip_vs_ctl.c procfs not ready for netns yet.
*v3
Comments by Julian
- __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path,
net_eq(svc->net, net) so the check is at the end now.
- net = skb_net(skb) in ip_vs_out moved after check for skb_dst.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 64 +++++++++-
include/net/netns/ip_vs.h | 8 +
net/netfilter/ipvs/ip_vs_conn.c | 2 +-
net/netfilter/ipvs/ip_vs_core.c | 4 +-
net/netfilter/ipvs/ip_vs_ctl.c | 232 +++++++++++++++++++--------------
net/netfilter/ipvs/ip_vs_proto_sctp.c | 5 +-
net/netfilter/ipvs/ip_vs_proto_tcp.c | 7 +-
net/netfilter/ipvs/ip_vs_proto_udp.c | 5 +-
net/netfilter/ipvs/ip_vs_sync.c | 2 +-
9 files changed, 214 insertions(+), 115 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index c1c2ece..d551e0d 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -37,6 +37,59 @@ static inline struct netns_ipvs *net_ipvs(struct net* net)
{
return net->ipvs;
}
+/*
+ * Get net ptr from skb in traffic cases
+ * use skb_sknet when call is from userland (ioctl or netlink)
+ */
+static inline struct net *skb_net(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_NS
+#ifdef CONFIG_IP_VS_DEBUG
+ /*
+ * This is used for debug only.
+ * Start with the most likely hit
+ * End with BUG
+ */
+ if (likely(skb->dev && skb->dev->nd_net))
+ return dev_net(skb->dev);
+ if (skb_dst(skb)->dev)
+ return dev_net(skb_dst(skb)->dev);
+ WARN(skb->sk, "Maybe skb_sknet should be used in %s() at line:%d\n",
+ __func__, __LINE__);
+ if (likely(skb->sk && skb->sk->sk_net))
+ return sock_net(skb->sk);
+ pr_err("There is no net ptr to find in the skb in %s() line:%d\n",
+ __func__, __LINE__);
+ BUG();
+#else
+ return dev_net(skb->dev ? : skb_dst(skb)->dev);
+#endif
+#else
+ return &init_net;
+#endif
+}
+
+static inline struct net *skb_sknet(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_NS
+#ifdef CONFIG_IP_VS_DEBUG
+ /* Start with the most likely hit */
+ if (likely(skb->sk && skb->sk->sk_net))
+ return sock_net(skb->sk);
+ WARN(skb->dev, "Maybe skb_net should be used instead in %s() line:%d\n",
+ __func__, __LINE__);
+ if (likely(skb->dev && skb->dev->nd_net))
+ return dev_net(skb->dev);
+ pr_err("There is no net ptr to find in the skb in %s() line:%d\n",
+ __func__, __LINE__);
+ BUG();
+#else
+ return sock_net(skb->sk);
+#endif
+#else
+ return &init_net;
+#endif
+}
/* Connections' size value needed by ip_vs_ctl.c */
extern int ip_vs_conn_tab_size;
@@ -496,6 +549,7 @@ struct ip_vs_service {
unsigned flags; /* service status flags */
unsigned timeout; /* persistent timeout in ticks */
__be32 netmask; /* grouping granularity */
+ struct net *net;
struct list_head destinations; /* real server d-linked list */
__u32 num_dests; /* number of servers */
@@ -896,7 +950,7 @@ extern int sysctl_ip_vs_sync_ver;
extern void ip_vs_sync_switch_mode(int mode);
extern struct ip_vs_service *
-ip_vs_service_get(int af, __u32 fwmark, __u16 protocol,
+ip_vs_service_get(struct net *net, int af, __u32 fwmark, __u16 protocol,
const union nf_inet_addr *vaddr, __be16 vport);
static inline void ip_vs_service_put(struct ip_vs_service *svc)
@@ -905,7 +959,7 @@ static inline void ip_vs_service_put(struct ip_vs_service *svc)
}
extern struct ip_vs_dest *
-ip_vs_lookup_real_service(int af, __u16 protocol,
+ip_vs_lookup_real_service(struct net *net, int af, __u16 protocol,
const union nf_inet_addr *daddr, __be16 dport);
extern int ip_vs_use_count_inc(void);
@@ -913,9 +967,9 @@ extern void ip_vs_use_count_dec(void);
extern int ip_vs_control_init(void);
extern void ip_vs_control_cleanup(void);
extern struct ip_vs_dest *
-ip_vs_find_dest(int af, const union nf_inet_addr *daddr, __be16 dport,
- const union nf_inet_addr *vaddr, __be16 vport, __u16 protocol,
- __u32 fwmark);
+ip_vs_find_dest(struct net *net, int af, const union nf_inet_addr *daddr,
+ __be16 dport, const union nf_inet_addr *vaddr, __be16 vport,
+ __u16 protocol, __u32 fwmark);
extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 12fe840..5b87d22 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -20,6 +20,14 @@ struct ctl_table_header;
struct netns_ipvs {
int gen; /* Generation */
+ /*
+ * Hash table: for real service lookups
+ */
+ #define IP_VS_RTAB_BITS 4
+ #define IP_VS_RTAB_SIZE (1 << IP_VS_RTAB_BITS)
+ #define IP_VS_RTAB_MASK (IP_VS_RTAB_SIZE - 1)
+
+ struct list_head rs_table[IP_VS_RTAB_SIZE];
};
#endif /* IP_VS_H_ */
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 7c1b502..7a0e79e 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -611,7 +611,7 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
struct ip_vs_dest *dest;
if ((cp) && (!cp->dest)) {
- dest = ip_vs_find_dest(cp->af, &cp->daddr, cp->dport,
+ dest = ip_vs_find_dest(&init_net, cp->af, &cp->daddr, cp->dport,
&cp->vaddr, cp->vport,
cp->protocol, cp->fwmark);
ip_vs_bind_dest(cp, dest);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 206f40c..d0616ea 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1031,6 +1031,7 @@ drop:
static unsigned int
ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
{
+ struct net *net = NULL;
struct ip_vs_iphdr iph;
struct ip_vs_protocol *pp;
struct ip_vs_conn *cp;
@@ -1054,6 +1055,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely(!skb_dst(skb)))
return NF_ACCEPT;
+ net = skb_net(skb);
ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
@@ -1119,7 +1121,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
sizeof(_ports), _ports);
if (pptr == NULL)
return NF_ACCEPT; /* Not for me */
- if (ip_vs_lookup_real_service(af, iph.protocol,
+ if (ip_vs_lookup_real_service(net, af, iph.protocol,
&iph.saddr,
pptr[0])) {
/*
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index ceeef43..2d7c96b 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -288,15 +288,6 @@ static struct list_head ip_vs_svc_table[IP_VS_SVC_TAB_SIZE];
static struct list_head ip_vs_svc_fwm_table[IP_VS_SVC_TAB_SIZE];
/*
- * Hash table: for real service lookups
- */
-#define IP_VS_RTAB_BITS 4
-#define IP_VS_RTAB_SIZE (1 << IP_VS_RTAB_BITS)
-#define IP_VS_RTAB_MASK (IP_VS_RTAB_SIZE - 1)
-
-static struct list_head ip_vs_rtable[IP_VS_RTAB_SIZE];
-
-/*
* Trash for destinations
*/
static LIST_HEAD(ip_vs_dest_trash);
@@ -311,9 +302,9 @@ static atomic_t ip_vs_nullsvc_counter = ATOMIC_INIT(0);
/*
* Returns hash value for virtual service
*/
-static __inline__ unsigned
-ip_vs_svc_hashkey(int af, unsigned proto, const union nf_inet_addr *addr,
- __be16 port)
+static inline unsigned
+ip_vs_svc_hashkey(struct net *net, int af, unsigned proto,
+ const union nf_inet_addr *addr, __be16 port)
{
register unsigned porth = ntohs(port);
__be32 addr_fold = addr->ip;
@@ -323,6 +314,7 @@ ip_vs_svc_hashkey(int af, unsigned proto, const union nf_inet_addr *addr,
addr_fold = addr->ip6[0]^addr->ip6[1]^
addr->ip6[2]^addr->ip6[3];
#endif
+ addr_fold ^= ((size_t)net>>8);
return (proto^ntohl(addr_fold)^(porth>>IP_VS_SVC_TAB_BITS)^porth)
& IP_VS_SVC_TAB_MASK;
@@ -331,13 +323,13 @@ ip_vs_svc_hashkey(int af, unsigned proto, const union nf_inet_addr *addr,
/*
* Returns hash value of fwmark for virtual service lookup
*/
-static __inline__ unsigned ip_vs_svc_fwm_hashkey(__u32 fwmark)
+static inline unsigned ip_vs_svc_fwm_hashkey(struct net *net, __u32 fwmark)
{
- return fwmark & IP_VS_SVC_TAB_MASK;
+ return (((size_t)net>>8) ^ fwmark) & IP_VS_SVC_TAB_MASK;
}
/*
- * Hashes a service in the ip_vs_svc_table by <proto,addr,port>
+ * Hashes a service in the ip_vs_svc_table by <netns,proto,addr,port>
* or in the ip_vs_svc_fwm_table by fwmark.
* Should be called with locked tables.
*/
@@ -353,16 +345,16 @@ static int ip_vs_svc_hash(struct ip_vs_service *svc)
if (svc->fwmark == 0) {
/*
- * Hash it by <protocol,addr,port> in ip_vs_svc_table
+ * Hash it by <netns,protocol,addr,port> in ip_vs_svc_table
*/
- hash = ip_vs_svc_hashkey(svc->af, svc->protocol, &svc->addr,
- svc->port);
+ hash = ip_vs_svc_hashkey(svc->net, svc->af, svc->protocol,
+ &svc->addr, svc->port);
list_add(&svc->s_list, &ip_vs_svc_table[hash]);
} else {
/*
- * Hash it by fwmark in ip_vs_svc_fwm_table
+ * Hash it by fwmark in svc_fwm_table
*/
- hash = ip_vs_svc_fwm_hashkey(svc->fwmark);
+ hash = ip_vs_svc_fwm_hashkey(svc->net, svc->fwmark);
list_add(&svc->f_list, &ip_vs_svc_fwm_table[hash]);
}
@@ -374,7 +366,7 @@ static int ip_vs_svc_hash(struct ip_vs_service *svc)
/*
- * Unhashes a service from ip_vs_svc_table/ip_vs_svc_fwm_table.
+ * Unhashes a service from svc_table / svc_fwm_table.
* Should be called with locked tables.
*/
static int ip_vs_svc_unhash(struct ip_vs_service *svc)
@@ -386,10 +378,10 @@ static int ip_vs_svc_unhash(struct ip_vs_service *svc)
}
if (svc->fwmark == 0) {
- /* Remove it from the ip_vs_svc_table table */
+ /* Remove it from the svc_table table */
list_del(&svc->s_list);
} else {
- /* Remove it from the ip_vs_svc_fwm_table table */
+ /* Remove it from the svc_fwm_table table */
list_del(&svc->f_list);
}
@@ -400,23 +392,24 @@ static int ip_vs_svc_unhash(struct ip_vs_service *svc)
/*
- * Get service by {proto,addr,port} in the service table.
+ * Get service by {netns, proto,addr,port} in the service table.
*/
static inline struct ip_vs_service *
-__ip_vs_service_find(int af, __u16 protocol, const union nf_inet_addr *vaddr,
- __be16 vport)
+__ip_vs_service_find(struct net *net, int af, __u16 protocol,
+ const union nf_inet_addr *vaddr, __be16 vport)
{
unsigned hash;
struct ip_vs_service *svc;
/* Check for "full" addressed entries */
- hash = ip_vs_svc_hashkey(af, protocol, vaddr, vport);
+ hash = ip_vs_svc_hashkey(net, af, protocol, vaddr, vport);
list_for_each_entry(svc, &ip_vs_svc_table[hash], s_list){
if ((svc->af == af)
&& ip_vs_addr_equal(af, &svc->addr, vaddr)
&& (svc->port == vport)
- && (svc->protocol == protocol)) {
+ && (svc->protocol == protocol)
+ && net_eq(svc->net, net)) {
/* HIT */
return svc;
}
@@ -430,16 +423,17 @@ __ip_vs_service_find(int af, __u16 protocol, const union nf_inet_addr *vaddr,
* Get service by {fwmark} in the service table.
*/
static inline struct ip_vs_service *
-__ip_vs_svc_fwm_find(int af, __u32 fwmark)
+__ip_vs_svc_fwm_find(struct net *net, int af, __u32 fwmark)
{
unsigned hash;
struct ip_vs_service *svc;
/* Check for fwmark addressed entries */
- hash = ip_vs_svc_fwm_hashkey(fwmark);
+ hash = ip_vs_svc_fwm_hashkey(net, fwmark);
list_for_each_entry(svc, &ip_vs_svc_fwm_table[hash], f_list) {
- if (svc->fwmark == fwmark && svc->af == af) {
+ if (svc->fwmark == fwmark && svc->af == af
+ && net_eq(svc->net, net)) {
/* HIT */
return svc;
}
@@ -449,7 +443,7 @@ __ip_vs_svc_fwm_find(int af, __u32 fwmark)
}
struct ip_vs_service *
-ip_vs_service_get(int af, __u32 fwmark, __u16 protocol,
+ip_vs_service_get(struct net *net, int af, __u32 fwmark, __u16 protocol,
const union nf_inet_addr *vaddr, __be16 vport)
{
struct ip_vs_service *svc;
@@ -459,14 +453,15 @@ ip_vs_service_get(int af, __u32 fwmark, __u16 protocol,
/*
* Check the table hashed by fwmark first
*/
- if (fwmark && (svc = __ip_vs_svc_fwm_find(af, fwmark)))
+ svc = __ip_vs_svc_fwm_find(net, af, fwmark);
+ if (fwmark && svc)
goto out;
/*
* Check the table hashed by <protocol,addr,port>
* for "full" addressed entries
*/
- svc = __ip_vs_service_find(af, protocol, vaddr, vport);
+ svc = __ip_vs_service_find(net, af, protocol, vaddr, vport);
if (svc == NULL
&& protocol == IPPROTO_TCP
@@ -476,7 +471,7 @@ ip_vs_service_get(int af, __u32 fwmark, __u16 protocol,
* Check if ftp service entry exists, the packet
* might belong to FTP data connections.
*/
- svc = __ip_vs_service_find(af, protocol, vaddr, FTPPORT);
+ svc = __ip_vs_service_find(net, af, protocol, vaddr, FTPPORT);
}
if (svc == NULL
@@ -484,7 +479,7 @@ ip_vs_service_get(int af, __u32 fwmark, __u16 protocol,
/*
* Check if the catch-all port (port zero) exists
*/
- svc = __ip_vs_service_find(af, protocol, vaddr, 0);
+ svc = __ip_vs_service_find(net, af, protocol, vaddr, 0);
}
out:
@@ -545,10 +540,10 @@ static inline unsigned ip_vs_rs_hashkey(int af,
}
/*
- * Hashes ip_vs_dest in ip_vs_rtable by <proto,addr,port>.
+ * Hashes ip_vs_dest in rs_table by <proto,addr,port>.
* should be called with locked tables.
*/
-static int ip_vs_rs_hash(struct ip_vs_dest *dest)
+static int ip_vs_rs_hash(struct netns_ipvs *ipvs, struct ip_vs_dest *dest)
{
unsigned hash;
@@ -562,19 +557,19 @@ static int ip_vs_rs_hash(struct ip_vs_dest *dest)
*/
hash = ip_vs_rs_hashkey(dest->af, &dest->addr, dest->port);
- list_add(&dest->d_list, &ip_vs_rtable[hash]);
+ list_add(&dest->d_list, &ipvs->rs_table[hash]);
return 1;
}
/*
- * UNhashes ip_vs_dest from ip_vs_rtable.
+ * UNhashes ip_vs_dest from rs_table.
* should be called with locked tables.
*/
static int ip_vs_rs_unhash(struct ip_vs_dest *dest)
{
/*
- * Remove it from the ip_vs_rtable table.
+ * Remove it from the rs_table table.
*/
if (!list_empty(&dest->d_list)) {
list_del(&dest->d_list);
@@ -588,10 +583,11 @@ static int ip_vs_rs_unhash(struct ip_vs_dest *dest)
* Lookup real service by <proto,addr,port> in the real service table.
*/
struct ip_vs_dest *
-ip_vs_lookup_real_service(int af, __u16 protocol,
+ip_vs_lookup_real_service(struct net *net, int af, __u16 protocol,
const union nf_inet_addr *daddr,
__be16 dport)
{
+ struct netns_ipvs *ipvs = net_ipvs(net);
unsigned hash;
struct ip_vs_dest *dest;
@@ -602,7 +598,7 @@ ip_vs_lookup_real_service(int af, __u16 protocol,
hash = ip_vs_rs_hashkey(af, daddr, dport);
read_lock(&__ip_vs_rs_lock);
- list_for_each_entry(dest, &ip_vs_rtable[hash], d_list) {
+ list_for_each_entry(dest, &ipvs->rs_table[hash], d_list) {
if ((dest->af == af)
&& ip_vs_addr_equal(af, &dest->addr, daddr)
&& (dest->port == dport)
@@ -652,7 +648,8 @@ ip_vs_lookup_dest(struct ip_vs_service *svc, const union nf_inet_addr *daddr,
* ip_vs_lookup_real_service() looked promissing, but
* seems not working as expected.
*/
-struct ip_vs_dest *ip_vs_find_dest(int af, const union nf_inet_addr *daddr,
+struct ip_vs_dest *ip_vs_find_dest(struct net *net, int af,
+ const union nf_inet_addr *daddr,
__be16 dport,
const union nf_inet_addr *vaddr,
__be16 vport, __u16 protocol, __u32 fwmark)
@@ -660,7 +657,7 @@ struct ip_vs_dest *ip_vs_find_dest(int af, const union nf_inet_addr *daddr,
struct ip_vs_dest *dest;
struct ip_vs_service *svc;
- svc = ip_vs_service_get(af, fwmark, protocol, vaddr, vport);
+ svc = ip_vs_service_get(net, af, fwmark, protocol, vaddr, vport);
if (!svc)
return NULL;
dest = ip_vs_lookup_dest(svc, daddr, dport);
@@ -768,6 +765,7 @@ static void
__ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
struct ip_vs_dest_user_kern *udest, int add)
{
+ struct netns_ipvs *ipvs = net_ipvs(svc->net);
int conn_flags;
/* set the weight and the flags */
@@ -780,11 +778,11 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
conn_flags |= IP_VS_CONN_F_NOOUTPUT;
} else {
/*
- * Put the real service in ip_vs_rtable if not present.
+ * Put the real service in rs_table if not present.
* For now only for NAT!
*/
write_lock_bh(&__ip_vs_rs_lock);
- ip_vs_rs_hash(dest);
+ ip_vs_rs_hash(ipvs, dest);
write_unlock_bh(&__ip_vs_rs_lock);
}
atomic_set(&dest->conn_flags, conn_flags);
@@ -1117,7 +1115,7 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
* Add a service into the service hash table
*/
static int
-ip_vs_add_service(struct ip_vs_service_user_kern *u,
+ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
struct ip_vs_service **svc_p)
{
int ret = 0;
@@ -1172,6 +1170,7 @@ ip_vs_add_service(struct ip_vs_service_user_kern *u,
svc->flags = u->flags;
svc->timeout = u->timeout * HZ;
svc->netmask = u->netmask;
+ svc->net = net;
INIT_LIST_HEAD(&svc->destinations);
rwlock_init(&svc->sched_lock);
@@ -1428,17 +1427,19 @@ static int ip_vs_del_service(struct ip_vs_service *svc)
/*
* Flush all the virtual services
*/
-static int ip_vs_flush(void)
+static int ip_vs_flush(struct net *net)
{
int idx;
struct ip_vs_service *svc, *nxt;
/*
- * Flush the service table hashed by <protocol,addr,port>
+ * Flush the service table hashed by <netns,protocol,addr,port>
*/
for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
- list_for_each_entry_safe(svc, nxt, &ip_vs_svc_table[idx], s_list) {
- ip_vs_unlink_service(svc);
+ list_for_each_entry_safe(svc, nxt, &ip_vs_svc_table[idx],
+ s_list) {
+ if (net_eq(svc->net, net))
+ ip_vs_unlink_service(svc);
}
}
@@ -1448,7 +1449,8 @@ static int ip_vs_flush(void)
for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry_safe(svc, nxt,
&ip_vs_svc_fwm_table[idx], f_list) {
- ip_vs_unlink_service(svc);
+ if (net_eq(svc->net, net))
+ ip_vs_unlink_service(svc);
}
}
@@ -1472,20 +1474,22 @@ static int ip_vs_zero_service(struct ip_vs_service *svc)
return 0;
}
-static int ip_vs_zero_all(void)
+static int ip_vs_zero_all(struct net *net)
{
int idx;
struct ip_vs_service *svc;
for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
- ip_vs_zero_service(svc);
+ if (net_eq(svc->net, net))
+ ip_vs_zero_service(svc);
}
}
for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
- ip_vs_zero_service(svc);
+ if (net_eq(svc->net, net))
+ ip_vs_zero_service(svc);
}
}
@@ -1763,6 +1767,7 @@ static struct ctl_table_header * sysctl_header;
#ifdef CONFIG_PROC_FS
struct ip_vs_iter {
+ struct seq_net_private p; /* Do not move this, netns depends upon it*/
struct list_head *table;
int bucket;
};
@@ -1789,6 +1794,7 @@ static inline const char *ip_vs_fwd_name(unsigned flags)
/* Get the Nth entry in the two lists */
static struct ip_vs_service *ip_vs_info_array(struct seq_file *seq, loff_t pos)
{
+ struct net *net = seq_file_net(seq);
struct ip_vs_iter *iter = seq->private;
int idx;
struct ip_vs_service *svc;
@@ -1796,7 +1802,7 @@ static struct ip_vs_service *ip_vs_info_array(struct seq_file *seq, loff_t pos)
/* look in hash by protocol */
for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
- if (pos-- == 0){
+ if (net_eq(svc->net, net) && pos-- == 0) {
iter->table = ip_vs_svc_table;
iter->bucket = idx;
return svc;
@@ -1807,7 +1813,7 @@ static struct ip_vs_service *ip_vs_info_array(struct seq_file *seq, loff_t pos)
/* keep looking in fwmark */
for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
- if (pos-- == 0) {
+ if (net_eq(svc->net, net) && pos-- == 0) {
iter->table = ip_vs_svc_fwm_table;
iter->bucket = idx;
return svc;
@@ -1961,7 +1967,7 @@ static const struct seq_operations ip_vs_info_seq_ops = {
static int ip_vs_info_open(struct inode *inode, struct file *file)
{
- return seq_open_private(file, &ip_vs_info_seq_ops,
+ return seq_open_net(inode, file, &ip_vs_info_seq_ops,
sizeof(struct ip_vs_iter));
}
@@ -2011,7 +2017,7 @@ static int ip_vs_stats_show(struct seq_file *seq, void *v)
static int ip_vs_stats_seq_open(struct inode *inode, struct file *file)
{
- return single_open(file, ip_vs_stats_show, NULL);
+ return single_open_net(inode, file, ip_vs_stats_show);
}
static const struct file_operations ip_vs_stats_fops = {
@@ -2113,6 +2119,7 @@ static void ip_vs_copy_udest_compat(struct ip_vs_dest_user_kern *udest,
static int
do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
{
+ struct net *net = sock_net(sk);
int ret;
unsigned char arg[MAX_ARG_LEN];
struct ip_vs_service_user *usvc_compat;
@@ -2147,7 +2154,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
if (cmd == IP_VS_SO_SET_FLUSH) {
/* Flush the virtual service */
- ret = ip_vs_flush();
+ ret = ip_vs_flush(net);
goto out_unlock;
} else if (cmd == IP_VS_SO_SET_TIMEOUT) {
/* Set timeout values for (tcp tcpfin udp) */
@@ -2174,7 +2181,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
if (cmd == IP_VS_SO_SET_ZERO) {
/* if no service address is set, zero counters in all */
if (!usvc.fwmark && !usvc.addr.ip && !usvc.port) {
- ret = ip_vs_zero_all();
+ ret = ip_vs_zero_all(net);
goto out_unlock;
}
}
@@ -2191,10 +2198,10 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
/* Lookup the exact service by <protocol, addr, port> or fwmark */
if (usvc.fwmark == 0)
- svc = __ip_vs_service_find(usvc.af, usvc.protocol,
+ svc = __ip_vs_service_find(net, usvc.af, usvc.protocol,
&usvc.addr, usvc.port);
else
- svc = __ip_vs_svc_fwm_find(usvc.af, usvc.fwmark);
+ svc = __ip_vs_svc_fwm_find(net, usvc.af, usvc.fwmark);
if (cmd != IP_VS_SO_SET_ADD
&& (svc == NULL || svc->protocol != usvc.protocol)) {
@@ -2207,7 +2214,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
if (svc != NULL)
ret = -EEXIST;
else
- ret = ip_vs_add_service(&usvc, &svc);
+ ret = ip_vs_add_service(net, &usvc, &svc);
break;
case IP_VS_SO_SET_EDIT:
ret = ip_vs_edit_service(svc, &usvc);
@@ -2267,7 +2274,8 @@ ip_vs_copy_service(struct ip_vs_service_entry *dst, struct ip_vs_service *src)
}
static inline int
-__ip_vs_get_service_entries(const struct ip_vs_get_services *get,
+__ip_vs_get_service_entries(struct net *net,
+ const struct ip_vs_get_services *get,
struct ip_vs_get_services __user *uptr)
{
int idx, count=0;
@@ -2278,7 +2286,7 @@ __ip_vs_get_service_entries(const struct ip_vs_get_services *get,
for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
/* Only expose IPv4 entries to old interface */
- if (svc->af != AF_INET)
+ if (svc->af != AF_INET || !net_eq(svc->net, net))
continue;
if (count >= get->num_services)
@@ -2297,7 +2305,7 @@ __ip_vs_get_service_entries(const struct ip_vs_get_services *get,
for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
/* Only expose IPv4 entries to old interface */
- if (svc->af != AF_INET)
+ if (svc->af != AF_INET || !net_eq(svc->net, net))
continue;
if (count >= get->num_services)
@@ -2317,7 +2325,7 @@ __ip_vs_get_service_entries(const struct ip_vs_get_services *get,
}
static inline int
-__ip_vs_get_dest_entries(const struct ip_vs_get_dests *get,
+__ip_vs_get_dest_entries(struct net *net, const struct ip_vs_get_dests *get,
struct ip_vs_get_dests __user *uptr)
{
struct ip_vs_service *svc;
@@ -2325,9 +2333,9 @@ __ip_vs_get_dest_entries(const struct ip_vs_get_dests *get,
int ret = 0;
if (get->fwmark)
- svc = __ip_vs_svc_fwm_find(AF_INET, get->fwmark);
+ svc = __ip_vs_svc_fwm_find(net, AF_INET, get->fwmark);
else
- svc = __ip_vs_service_find(AF_INET, get->protocol, &addr,
+ svc = __ip_vs_service_find(net, AF_INET, get->protocol, &addr,
get->port);
if (svc) {
@@ -2401,7 +2409,9 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
unsigned char arg[128];
int ret = 0;
unsigned int copylen;
+ struct net *net = sock_net(sk);
+ BUG_ON(!net);
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -2463,7 +2473,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
ret = -EINVAL;
goto out;
}
- ret = __ip_vs_get_service_entries(get, user);
+ ret = __ip_vs_get_service_entries(net, get, user);
}
break;
@@ -2476,10 +2486,11 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
entry = (struct ip_vs_service_entry *)arg;
addr.ip = entry->addr;
if (entry->fwmark)
- svc = __ip_vs_svc_fwm_find(AF_INET, entry->fwmark);
+ svc = __ip_vs_svc_fwm_find(net, AF_INET, entry->fwmark);
else
- svc = __ip_vs_service_find(AF_INET, entry->protocol,
- &addr, entry->port);
+ svc = __ip_vs_service_find(net, AF_INET,
+ entry->protocol, &addr,
+ entry->port);
if (svc) {
ip_vs_copy_service(entry, svc);
if (copy_to_user(user, entry, sizeof(*entry)) != 0)
@@ -2502,7 +2513,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
ret = -EINVAL;
goto out;
}
- ret = __ip_vs_get_dest_entries(get, user);
+ ret = __ip_vs_get_dest_entries(net, get, user);
}
break;
@@ -2722,11 +2733,12 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
int idx = 0, i;
int start = cb->args[0];
struct ip_vs_service *svc;
+ struct net *net = skb_sknet(skb);
mutex_lock(&__ip_vs_mutex);
for (i = 0; i < IP_VS_SVC_TAB_SIZE; i++) {
list_for_each_entry(svc, &ip_vs_svc_table[i], s_list) {
- if (++idx <= start)
+ if (++idx <= start || !net_eq(svc->net, net))
continue;
if (ip_vs_genl_dump_service(skb, svc, cb) < 0) {
idx--;
@@ -2737,7 +2749,7 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
for (i = 0; i < IP_VS_SVC_TAB_SIZE; i++) {
list_for_each_entry(svc, &ip_vs_svc_fwm_table[i], f_list) {
- if (++idx <= start)
+ if (++idx <= start || !net_eq(svc->net, net))
continue;
if (ip_vs_genl_dump_service(skb, svc, cb) < 0) {
idx--;
@@ -2753,7 +2765,8 @@ nla_put_failure:
return skb->len;
}
-static int ip_vs_genl_parse_service(struct ip_vs_service_user_kern *usvc,
+static int ip_vs_genl_parse_service(struct net *net,
+ struct ip_vs_service_user_kern *usvc,
struct nlattr *nla, int full_entry,
struct ip_vs_service **ret_svc)
{
@@ -2796,9 +2809,9 @@ static int ip_vs_genl_parse_service(struct ip_vs_service_user_kern *usvc,
}
if (usvc->fwmark)
- svc = __ip_vs_svc_fwm_find(usvc->af, usvc->fwmark);
+ svc = __ip_vs_svc_fwm_find(net, usvc->af, usvc->fwmark);
else
- svc = __ip_vs_service_find(usvc->af, usvc->protocol,
+ svc = __ip_vs_service_find(net, usvc->af, usvc->protocol,
&usvc->addr, usvc->port);
*ret_svc = svc;
@@ -2835,13 +2848,14 @@ static int ip_vs_genl_parse_service(struct ip_vs_service_user_kern *usvc,
return 0;
}
-static struct ip_vs_service *ip_vs_genl_find_service(struct nlattr *nla)
+static struct ip_vs_service *ip_vs_genl_find_service(struct net *net,
+ struct nlattr *nla)
{
struct ip_vs_service_user_kern usvc;
struct ip_vs_service *svc;
int ret;
- ret = ip_vs_genl_parse_service(&usvc, nla, 0, &svc);
+ ret = ip_vs_genl_parse_service(net, &usvc, nla, 0, &svc);
return ret ? ERR_PTR(ret) : svc;
}
@@ -2909,6 +2923,7 @@ static int ip_vs_genl_dump_dests(struct sk_buff *skb,
struct ip_vs_service *svc;
struct ip_vs_dest *dest;
struct nlattr *attrs[IPVS_CMD_ATTR_MAX + 1];
+ struct net *net;
mutex_lock(&__ip_vs_mutex);
@@ -2917,7 +2932,8 @@ static int ip_vs_genl_dump_dests(struct sk_buff *skb,
IPVS_CMD_ATTR_MAX, ip_vs_cmd_policy))
goto out_err;
- svc = ip_vs_genl_find_service(attrs[IPVS_CMD_ATTR_SERVICE]);
+ net = skb_sknet(skb);
+ svc = ip_vs_genl_find_service(net, attrs[IPVS_CMD_ATTR_SERVICE]);
if (IS_ERR(svc) || svc == NULL)
goto out_err;
@@ -3102,13 +3118,15 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
struct ip_vs_dest_user_kern udest;
int ret = 0, cmd;
int need_full_svc = 0, need_full_dest = 0;
+ struct net *net;
+ net = skb_sknet(skb);
cmd = info->genlhdr->cmd;
mutex_lock(&__ip_vs_mutex);
if (cmd == IPVS_CMD_FLUSH) {
- ret = ip_vs_flush();
+ ret = ip_vs_flush(net);
goto out;
} else if (cmd == IPVS_CMD_SET_CONFIG) {
ret = ip_vs_genl_set_config(info->attrs);
@@ -3133,7 +3151,7 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
goto out;
} else if (cmd == IPVS_CMD_ZERO &&
!info->attrs[IPVS_CMD_ATTR_SERVICE]) {
- ret = ip_vs_zero_all();
+ ret = ip_vs_zero_all(net);
goto out;
}
@@ -3143,7 +3161,7 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
if (cmd == IPVS_CMD_NEW_SERVICE || cmd == IPVS_CMD_SET_SERVICE)
need_full_svc = 1;
- ret = ip_vs_genl_parse_service(&usvc,
+ ret = ip_vs_genl_parse_service(net, &usvc,
info->attrs[IPVS_CMD_ATTR_SERVICE],
need_full_svc, &svc);
if (ret)
@@ -3173,7 +3191,7 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
switch (cmd) {
case IPVS_CMD_NEW_SERVICE:
if (svc == NULL)
- ret = ip_vs_add_service(&usvc, &svc);
+ ret = ip_vs_add_service(net, &usvc, &svc);
else
ret = -EEXIST;
break;
@@ -3211,7 +3229,9 @@ static int ip_vs_genl_get_cmd(struct sk_buff *skb, struct genl_info *info)
struct sk_buff *msg;
void *reply;
int ret, cmd, reply_cmd;
+ struct net *net;
+ net = skb_sknet(skb);
cmd = info->genlhdr->cmd;
if (cmd == IPVS_CMD_GET_SERVICE)
@@ -3240,7 +3260,8 @@ static int ip_vs_genl_get_cmd(struct sk_buff *skb, struct genl_info *info)
{
struct ip_vs_service *svc;
- svc = ip_vs_genl_find_service(info->attrs[IPVS_CMD_ATTR_SERVICE]);
+ svc = ip_vs_genl_find_service(net,
+ info->attrs[IPVS_CMD_ATTR_SERVICE]);
if (IS_ERR(svc)) {
ret = PTR_ERR(svc);
goto out_err;
@@ -3411,9 +3432,15 @@ static void ip_vs_genl_unregister(void)
*/
int __net_init __ip_vs_control_init(struct net *net)
{
+ int idx;
+ struct netns_ipvs *ipvs = net_ipvs(net);
+
if (!net_eq(net, &init_net)) /* netns not enabled yet */
return -EPERM;
+ for (idx = 0; idx < IP_VS_RTAB_SIZE; idx++)
+ INIT_LIST_HEAD(&ipvs->rs_table[idx]);
+
proc_net_fops_create(net, "ip_vs", 0, &ip_vs_info_fops);
proc_net_fops_create(net, "ip_vs_stats", 0, &ip_vs_stats_fops);
sysctl_header = register_net_sysctl_table(net, net_vs_ctl_path,
@@ -3445,43 +3472,48 @@ static struct pernet_operations ipvs_control_ops = {
int __init ip_vs_control_init(void)
{
- int ret;
int idx;
+ int ret;
EnterFunction(2);
- /* Initialize ip_vs_svc_table, ip_vs_svc_fwm_table, ip_vs_rtable */
+ /* Initialize svc_table, ip_vs_svc_fwm_table, rs_table */
for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
INIT_LIST_HEAD(&ip_vs_svc_table[idx]);
INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
}
- for(idx = 0; idx < IP_VS_RTAB_SIZE; idx++) {
- INIT_LIST_HEAD(&ip_vs_rtable[idx]);
+
+ ret = register_pernet_subsys(&ipvs_control_ops);
+ if (ret) {
+ pr_err("cannot register namespace.\n");
+ goto err;
}
- smp_wmb();
+
+ smp_wmb(); /* Do we really need it now ? */
ret = nf_register_sockopt(&ip_vs_sockopts);
if (ret) {
pr_err("cannot register sockopt.\n");
- return ret;
+ goto err_net;
}
ret = ip_vs_genl_register();
if (ret) {
pr_err("cannot register Generic Netlink interface.\n");
nf_unregister_sockopt(&ip_vs_sockopts);
- return ret;
+ goto err_net;
}
- ret = register_pernet_subsys(&ipvs_control_ops);
- if (ret)
- return ret;
-
/* Hook the defense timer */
schedule_delayed_work(&defense_work, DEFENSE_TIMER_PERIOD);
LeaveFunction(2);
return 0;
+
+err_net:
+ unregister_pernet_subsys(&ipvs_control_ops);
+err:
+ return ret;
}
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index a315159..521b827 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -12,6 +12,7 @@ static int
sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
int *verdict, struct ip_vs_conn **cpp)
{
+ struct net *net;
struct ip_vs_service *svc;
sctp_chunkhdr_t _schunkh, *sch;
sctp_sctphdr_t *sh, _sctph;
@@ -27,9 +28,9 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
sizeof(_schunkh), &_schunkh);
if (sch == NULL)
return 0;
-
+ net = skb_net(skb);
if ((sch->type == SCTP_CID_INIT) &&
- (svc = ip_vs_service_get(af, skb->mark, iph.protocol,
+ (svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
&iph.daddr, sh->dest))) {
int ignored;
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 1cdab12..c175d31 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -31,6 +31,7 @@ static int
tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
int *verdict, struct ip_vs_conn **cpp)
{
+ struct net *net;
struct ip_vs_service *svc;
struct tcphdr _tcph, *th;
struct ip_vs_iphdr iph;
@@ -42,11 +43,11 @@ tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
*verdict = NF_DROP;
return 0;
}
-
+ net = skb_net(skb);
/* No !th->ack check to allow scheduling on SYN+ACK for Active FTP */
if (th->syn &&
- (svc = ip_vs_service_get(af, skb->mark, iph.protocol, &iph.daddr,
- th->dest))) {
+ (svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
+ &iph.daddr, th->dest))) {
int ignored;
if (ip_vs_todrop()) {
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index cd398de..5ab54f6 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -31,6 +31,7 @@ static int
udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
int *verdict, struct ip_vs_conn **cpp)
{
+ struct net *net;
struct ip_vs_service *svc;
struct udphdr _udph, *uh;
struct ip_vs_iphdr iph;
@@ -42,8 +43,8 @@ udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_protocol *pp,
*verdict = NF_DROP;
return 0;
}
^ permalink raw reply related
* [PATCH 03/22] IPVS: netns awarness to lblcr sheduler
From: Simon Horman @ 2011-01-13 1:52 UTC (permalink / raw)
To: netfilter-devel, lvs-devel, netdev
Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
Hans Schillstrom, Simon Horman
In-Reply-To: <1294883588-5683-1-git-send-email-horms@verge.net.au>
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
var sysctl_ip_vs_lblcr_expiration moved to ipvs struct as
sysctl_lblcr_expiration
procfs updated to handle this.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/netns/ip_vs.h | 5 +++
net/netfilter/ipvs/ip_vs_lblcr.c | 54 +++++++++++++++++++++++++------------
2 files changed, 41 insertions(+), 18 deletions(-)
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 5b87d22..51a92ee 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -28,6 +28,11 @@ struct netns_ipvs {
#define IP_VS_RTAB_MASK (IP_VS_RTAB_SIZE - 1)
struct list_head rs_table[IP_VS_RTAB_SIZE];
+
+ /* ip_vs_lblcr */
+ int sysctl_lblcr_expiration;
+ struct ctl_table_header *lblcr_ctl_header;
+ struct ctl_table *lblcr_ctl_table;
};
#endif /* IP_VS_H_ */
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 7c7396a..61ae8cf 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -70,8 +70,6 @@
* entries that haven't been touched for a day.
*/
#define COUNT_FOR_FULL_EXPIRATION 30
-static int sysctl_ip_vs_lblcr_expiration = 24*60*60*HZ;
-
/*
* for IPVS lblcr entry hash table
@@ -296,7 +294,7 @@ struct ip_vs_lblcr_table {
static ctl_table vs_vars_table[] = {
{
.procname = "lblcr_expiration",
- .data = &sysctl_ip_vs_lblcr_expiration,
+ .data = NULL,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec_jiffies,
@@ -304,8 +302,6 @@ static ctl_table vs_vars_table[] = {
{ }
};
-static struct ctl_table_header * sysctl_header;
-
static inline void ip_vs_lblcr_free(struct ip_vs_lblcr_entry *en)
{
list_del(&en->list);
@@ -425,14 +421,15 @@ static inline void ip_vs_lblcr_full_check(struct ip_vs_service *svc)
unsigned long now = jiffies;
int i, j;
struct ip_vs_lblcr_entry *en, *nxt;
+ struct netns_ipvs *ipvs = net_ipvs(svc->net);
for (i=0, j=tbl->rover; i<IP_VS_LBLCR_TAB_SIZE; i++) {
j = (j + 1) & IP_VS_LBLCR_TAB_MASK;
write_lock(&svc->sched_lock);
list_for_each_entry_safe(en, nxt, &tbl->bucket[j], list) {
- if (time_after(en->lastuse+sysctl_ip_vs_lblcr_expiration,
- now))
+ if (time_after(en->lastuse
+ + ipvs->sysctl_lblcr_expiration, now))
continue;
ip_vs_lblcr_free(en);
@@ -664,6 +661,7 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
read_lock(&svc->sched_lock);
en = ip_vs_lblcr_get(svc->af, tbl, &iph.daddr);
if (en) {
+ struct netns_ipvs *ipvs = net_ipvs(svc->net);
/* We only hold a read lock, but this is atomic */
en->lastuse = jiffies;
@@ -675,7 +673,7 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
/* More than one destination + enough time passed by, cleanup */
if (atomic_read(&en->set.size) > 1 &&
time_after(jiffies, en->set.lastmod +
- sysctl_ip_vs_lblcr_expiration)) {
+ ipvs->sysctl_lblcr_expiration)) {
struct ip_vs_dest *m;
write_lock(&en->set.lock);
@@ -749,23 +747,43 @@ static struct ip_vs_scheduler ip_vs_lblcr_scheduler =
*/
static int __net_init __ip_vs_lblcr_init(struct net *net)
{
- if (!net_eq(net, &init_net)) /* netns not enabled yet */
- return -EPERM;
-
- sysctl_header = register_net_sysctl_table(net, net_vs_ctl_path,
- vs_vars_table);
- if (!sysctl_header)
- return -ENOMEM;
+ struct netns_ipvs *ipvs = net_ipvs(net);
+
+ if (!net_eq(net, &init_net)) {
+ ipvs->lblcr_ctl_table = kmemdup(vs_vars_table,
+ sizeof(vs_vars_table),
+ GFP_KERNEL);
+ if (ipvs->lblcr_ctl_table == NULL)
+ goto err_dup;
+ } else
+ ipvs->lblcr_ctl_table = vs_vars_table;
+ ipvs->sysctl_lblcr_expiration = 24*60*60*HZ;
+ ipvs->lblcr_ctl_table[0].data = &ipvs->sysctl_lblcr_expiration;
+
+ ipvs->lblcr_ctl_header =
+ register_net_sysctl_table(net, net_vs_ctl_path,
+ ipvs->lblcr_ctl_table);
+ if (!ipvs->lblcr_ctl_header)
+ goto err_reg;
return 0;
+
+err_reg:
+ if (!net_eq(net, &init_net))
+ kfree(ipvs->lblcr_ctl_table);
+
+err_dup:
+ return -ENOMEM;
}
static void __net_exit __ip_vs_lblcr_exit(struct net *net)
{
- if (!net_eq(net, &init_net)) /* netns not enabled yet */
- return;
+ struct netns_ipvs *ipvs = net_ipvs(net);
+
+ unregister_net_sysctl_table(ipvs->lblcr_ctl_header);
- unregister_net_sysctl_table(sysctl_header);
+ if (!net_eq(net, &init_net))
+ kfree(ipvs->lblcr_ctl_table);
}
static struct pernet_operations ip_vs_lblcr_ops = {
--
1.7.2.3
^ permalink raw reply related
* [PATCH 05/22] IPVS: netns, prepare protocol
From: Simon Horman @ 2011-01-13 1:52 UTC (permalink / raw)
To: netfilter-devel, lvs-devel, netdev
Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
Hans Schillstrom, Simon Horman
In-Reply-To: <1294883588-5683-1-git-send-email-horms@verge.net.au>
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Add support for protocol data per name-space.
in struct ip_vs_protocol, appcnt will be removed when all protos
are modified for network name-space.
This patch causes warnings of unused functions, they will be used
when next patch will be applied.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 20 +++++++++++-
include/net/netns/ip_vs.h | 3 ++
net/netfilter/ipvs/ip_vs_proto.c | 66 ++++++++++++++++++++++++++++++++++++++
3 files changed, 88 insertions(+), 1 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index d551e0d..88d4e40 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -352,6 +352,7 @@ struct iphdr;
struct ip_vs_conn;
struct ip_vs_app;
struct sk_buff;
+struct ip_vs_proto_data;
struct ip_vs_protocol {
struct ip_vs_protocol *next;
@@ -366,6 +367,10 @@ struct ip_vs_protocol {
void (*exit)(struct ip_vs_protocol *pp);
+ void (*init_netns)(struct net *net, struct ip_vs_proto_data *pd);
+
+ void (*exit_netns)(struct net *net, struct ip_vs_proto_data *pd);
+
int (*conn_schedule)(int af, struct sk_buff *skb,
struct ip_vs_protocol *pp,
int *verdict, struct ip_vs_conn **cpp);
@@ -417,7 +422,20 @@ struct ip_vs_protocol {
int (*set_state_timeout)(struct ip_vs_protocol *pp, char *sname, int to);
};
-extern struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto);
+/*
+ * protocol data per netns
+ */
+struct ip_vs_proto_data {
+ struct ip_vs_proto_data *next;
+ struct ip_vs_protocol *pp;
+ int *timeout_table; /* protocol timeout table */
+ atomic_t appcnt; /* counter of proto app incs. */
+ struct tcp_states_t *tcp_state_table;
+};
+
+extern struct ip_vs_protocol *ip_vs_proto_get(unsigned short proto);
+extern struct ip_vs_proto_data *ip_vs_proto_data_get(struct net *net,
+ unsigned short proto);
struct ip_vs_conn_param {
const union nf_inet_addr *caddr;
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index d14581c..6f4e089 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -28,6 +28,9 @@ struct netns_ipvs {
#define IP_VS_RTAB_MASK (IP_VS_RTAB_SIZE - 1)
struct list_head rs_table[IP_VS_RTAB_SIZE];
+ /* ip_vs_proto */
+ #define IP_VS_PROTO_TAB_SIZE 32 /* must be power of 2 */
+ struct ip_vs_proto_data *proto_data_table[IP_VS_PROTO_TAB_SIZE];
/* ip_vs_lblc */
int sysctl_lblc_expiration;
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 4539294..576e296 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -60,6 +60,31 @@ static int __used __init register_ip_vs_protocol(struct ip_vs_protocol *pp)
return 0;
}
+/*
+ * register an ipvs protocols netns related data
+ */
+static int
+register_ip_vs_proto_netns(struct net *net, struct ip_vs_protocol *pp)
+{
+ struct netns_ipvs *ipvs = net_ipvs(net);
+ unsigned hash = IP_VS_PROTO_HASH(pp->protocol);
+ struct ip_vs_proto_data *pd =
+ kzalloc(sizeof(struct ip_vs_proto_data), GFP_ATOMIC);
+
+ if (!pd) {
+ pr_err("%s(): no memory.\n", __func__);
+ return -ENOMEM;
+ }
+ pd->pp = pp; /* For speed issues */
+ pd->next = ipvs->proto_data_table[hash];
+ ipvs->proto_data_table[hash] = pd;
+ atomic_set(&pd->appcnt, 0); /* Init app counter */
+
+ if (pp->init_netns != NULL)
+ pp->init_netns(net, pd);
+
+ return 0;
+}
/*
* unregister an ipvs protocol
@@ -82,6 +107,29 @@ static int unregister_ip_vs_protocol(struct ip_vs_protocol *pp)
return -ESRCH;
}
+/*
+ * unregister an ipvs protocols netns data
+ */
+static int
+unregister_ip_vs_proto_netns(struct net *net, struct ip_vs_proto_data *pd)
+{
+ struct netns_ipvs *ipvs = net_ipvs(net);
+ struct ip_vs_proto_data **pd_p;
+ unsigned hash = IP_VS_PROTO_HASH(pd->pp->protocol);
+
+ pd_p = &ipvs->proto_data_table[hash];
+ for (; *pd_p; pd_p = &(*pd_p)->next) {
+ if (*pd_p == pd) {
+ *pd_p = pd->next;
+ if (pd->pp->exit_netns != NULL)
+ pd->pp->exit_netns(net, pd);
+ kfree(pd);
+ return 0;
+ }
+ }
+
+ return -ESRCH;
+}
/*
* get ip_vs_protocol object by its proto.
@@ -100,6 +148,24 @@ struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto)
}
EXPORT_SYMBOL(ip_vs_proto_get);
+/*
+ * get ip_vs_protocol object data by netns and proto
+ */
+struct ip_vs_proto_data *
+ip_vs_proto_data_get(struct net *net, unsigned short proto)
+{
+ struct netns_ipvs *ipvs = net_ipvs(net);
+ struct ip_vs_proto_data *pd;
+ unsigned hash = IP_VS_PROTO_HASH(proto);
+
+ for (pd = ipvs->proto_data_table[hash]; pd; pd = pd->next) {
+ if (pd->pp->protocol == proto)
+ return pd;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL(ip_vs_proto_data_get);
/*
* Propagate event for state change to all protocols
--
1.7.2.3
^ permalink raw reply related
* [PATCH 06/22] IPVS: netns preparation for proto_tcp
From: Simon Horman @ 2011-01-13 1:52 UTC (permalink / raw)
To: netfilter-devel, lvs-devel, netdev
Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
Hans Schillstrom, Simon Horman
In-Reply-To: <1294883588-5683-1-git-send-email-horms@verge.net.au>
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use all
ip_vs_proto_data
*v3
Removed unused function as sugested by Simon
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/ip_vs.h | 2 +-
include/net/netns/ip_vs.h | 8 +++
net/netfilter/ipvs/ip_vs_ftp.c | 8 ++-
net/netfilter/ipvs/ip_vs_proto.c | 13 ++++-
net/netfilter/ipvs/ip_vs_proto_tcp.c | 97 ++++++++++++++++++----------------
5 files changed, 79 insertions(+), 49 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 88d4e40..3c45a00 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -807,7 +807,7 @@ extern void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
extern const char * ip_vs_state_name(__u16 proto, int state);
-extern void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp);
+extern void ip_vs_tcp_conn_listen(struct net *net, struct ip_vs_conn *cp);
extern int ip_vs_check_template(struct ip_vs_conn *ct);
extern void ip_vs_random_dropentry(void);
extern int ip_vs_conn_init(void);
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 6f4e089..ac77363 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -31,6 +31,14 @@ struct netns_ipvs {
/* ip_vs_proto */
#define IP_VS_PROTO_TAB_SIZE 32 /* must be power of 2 */
struct ip_vs_proto_data *proto_data_table[IP_VS_PROTO_TAB_SIZE];
+ /* ip_vs_proto_tcp */
+#ifdef CONFIG_IP_VS_PROTO_TCP
+ #define TCP_APP_TAB_BITS 4
+ #define TCP_APP_TAB_SIZE (1 << TCP_APP_TAB_BITS)
+ #define TCP_APP_TAB_MASK (TCP_APP_TAB_SIZE - 1)
+ struct list_head tcp_apps[TCP_APP_TAB_SIZE];
+ spinlock_t tcp_app_lock;
+#endif
/* ip_vs_lblc */
int sysctl_lblc_expiration;
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 0e762f3..b38ae94 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -157,6 +157,7 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
int ret = 0;
enum ip_conntrack_info ctinfo;
struct nf_conn *ct;
+ struct net *net;
#ifdef CONFIG_IP_VS_IPV6
/* This application helper doesn't work with IPv6 yet,
@@ -257,8 +258,9 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
* would be adjusted twice.
*/
+ net = skb_net(skb);
cp->app_data = NULL;
- ip_vs_tcp_conn_listen(n_cp);
+ ip_vs_tcp_conn_listen(net, n_cp);
ip_vs_conn_put(n_cp);
return ret;
}
@@ -287,6 +289,7 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
union nf_inet_addr to;
__be16 port;
struct ip_vs_conn *n_cp;
+ struct net *net;
#ifdef CONFIG_IP_VS_IPV6
/* This application helper doesn't work with IPv6 yet,
@@ -378,7 +381,8 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
/*
* Move tunnel to listen state
*/
- ip_vs_tcp_conn_listen(n_cp);
+ net = skb_net(skb);
+ ip_vs_tcp_conn_listen(net, n_cp);
ip_vs_conn_put(n_cp);
return 1;
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 576e296..320c6a6 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -307,12 +307,23 @@ ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
*/
static int __net_init __ip_vs_protocol_init(struct net *net)
{
+#ifdef CONFIG_IP_VS_PROTO_TCP
+ register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
+#endif
return 0;
}
static void __net_exit __ip_vs_protocol_cleanup(struct net *net)
{
- /* empty */
+ struct netns_ipvs *ipvs = net_ipvs(net);
+ struct ip_vs_proto_data *pd;
+ int i;
+
+ /* unregister all the ipvs proto data for this netns */
+ for (i = 0; i < IP_VS_PROTO_TAB_SIZE; i++) {
+ while ((pd = ipvs->proto_data_table[i]) != NULL)
+ unregister_ip_vs_proto_netns(net, pd);
+ }
}
static struct pernet_operations ipvs_proto_ops = {
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index c175d31..9d9df3d 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -9,8 +9,12 @@
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
- * Changes:
+ * Changes: Hans Schillstrom <hans.schillstrom@ericsson.com>
*
+ * Network name space (netns) aware.
+ * Global data moved to netns i.e struct netns_ipvs
+ * tcp_timeouts table has copy per netns in a hash table per
+ * protocol ip_vs_proto_data and is handled by netns
*/
#define KMSG_COMPONENT "IPVS"
@@ -345,7 +349,7 @@ static const int tcp_state_off[IP_VS_DIR_LAST] = {
/*
* Timeout table[state]
*/
-static int tcp_timeouts[IP_VS_TCP_S_LAST+1] = {
+static const int tcp_timeouts[IP_VS_TCP_S_LAST+1] = {
[IP_VS_TCP_S_NONE] = 2*HZ,
[IP_VS_TCP_S_ESTABLISHED] = 15*60*HZ,
[IP_VS_TCP_S_SYN_SENT] = 2*60*HZ,
@@ -460,13 +464,6 @@ static void tcp_timeout_change(struct ip_vs_protocol *pp, int flags)
tcp_state_table = (on? tcp_states_dos : tcp_states);
}
-static int
-tcp_set_state_timeout(struct ip_vs_protocol *pp, char *sname, int to)
-{
- return ip_vs_set_state_timeout(pp->timeout_table, IP_VS_TCP_S_LAST,
- tcp_state_name_table, sname, to);
-}
-
static inline int tcp_state_idx(struct tcphdr *th)
{
if (th->rst)
@@ -487,6 +484,7 @@ set_tcp_state(struct ip_vs_protocol *pp, struct ip_vs_conn *cp,
int state_idx;
int new_state = IP_VS_TCP_S_CLOSE;
int state_off = tcp_state_off[direction];
+ struct ip_vs_proto_data *pd; /* Temp fix */
/*
* Update state offset to INPUT_ONLY if necessary
@@ -542,10 +540,13 @@ set_tcp_state(struct ip_vs_protocol *pp, struct ip_vs_conn *cp,
}
}
- cp->timeout = pp->timeout_table[cp->state = new_state];
+ pd = ip_vs_proto_data_get(&init_net, pp->protocol);
+ if (likely(pd))
+ cp->timeout = pd->timeout_table[cp->state = new_state];
+ else /* What to do ? */
+ cp->timeout = tcp_timeouts[cp->state = new_state];
}
-
/*
* Handle state transitions
*/
@@ -573,17 +574,6 @@ tcp_state_transition(struct ip_vs_conn *cp, int direction,
return 1;
}
-
-/*
- * Hash table for TCP application incarnations
- */
-#define TCP_APP_TAB_BITS 4
-#define TCP_APP_TAB_SIZE (1 << TCP_APP_TAB_BITS)
-#define TCP_APP_TAB_MASK (TCP_APP_TAB_SIZE - 1)
-
-static struct list_head tcp_apps[TCP_APP_TAB_SIZE];
-static DEFINE_SPINLOCK(tcp_app_lock);
-
static inline __u16 tcp_app_hashkey(__be16 port)
{
return (((__force u16)port >> TCP_APP_TAB_BITS) ^ (__force u16)port)
@@ -597,21 +587,23 @@ static int tcp_register_app(struct ip_vs_app *inc)
__u16 hash;
__be16 port = inc->port;
int ret = 0;
+ struct netns_ipvs *ipvs = net_ipvs(&init_net);
+ struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_TCP);
hash = tcp_app_hashkey(port);
- spin_lock_bh(&tcp_app_lock);
- list_for_each_entry(i, &tcp_apps[hash], p_list) {
+ spin_lock_bh(&ipvs->tcp_app_lock);
+ list_for_each_entry(i, &ipvs->tcp_apps[hash], p_list) {
if (i->port == port) {
ret = -EEXIST;
goto out;
}
}
- list_add(&inc->p_list, &tcp_apps[hash]);
- atomic_inc(&ip_vs_protocol_tcp.appcnt);
+ list_add(&inc->p_list, &ipvs->tcp_apps[hash]);
+ atomic_inc(&pd->pp->appcnt);
out:
- spin_unlock_bh(&tcp_app_lock);
+ spin_unlock_bh(&ipvs->tcp_app_lock);
return ret;
}
@@ -619,16 +611,20 @@ static int tcp_register_app(struct ip_vs_app *inc)
static void
tcp_unregister_app(struct ip_vs_app *inc)
{
- spin_lock_bh(&tcp_app_lock);
- atomic_dec(&ip_vs_protocol_tcp.appcnt);
+ struct netns_ipvs *ipvs = net_ipvs(&init_net);
+ struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_TCP);
+
+ spin_lock_bh(&ipvs->tcp_app_lock);
+ atomic_dec(&pd->pp->appcnt);
list_del(&inc->p_list);
- spin_unlock_bh(&tcp_app_lock);
+ spin_unlock_bh(&ipvs->tcp_app_lock);
}
static int
tcp_app_conn_bind(struct ip_vs_conn *cp)
{
+ struct netns_ipvs *ipvs = net_ipvs(&init_net);
int hash;
struct ip_vs_app *inc;
int result = 0;
@@ -640,12 +636,12 @@ tcp_app_conn_bind(struct ip_vs_conn *cp)
/* Lookup application incarnations and bind the right one */
hash = tcp_app_hashkey(cp->vport);
- spin_lock(&tcp_app_lock);
- list_for_each_entry(inc, &tcp_apps[hash], p_list) {
+ spin_lock(&ipvs->tcp_app_lock);
+ list_for_each_entry(inc, &ipvs->tcp_apps[hash], p_list) {
if (inc->port == cp->vport) {
if (unlikely(!ip_vs_app_inc_get(inc)))
break;
- spin_unlock(&tcp_app_lock);
+ spin_unlock(&ipvs->tcp_app_lock);
IP_VS_DBG_BUF(9, "%s(): Binding conn %s:%u->"
"%s:%u to app %s on port %u\n",
@@ -662,7 +658,7 @@ tcp_app_conn_bind(struct ip_vs_conn *cp)
goto out;
}
}
- spin_unlock(&tcp_app_lock);
+ spin_unlock(&ipvs->tcp_app_lock);
out:
return result;
@@ -672,24 +668,34 @@ tcp_app_conn_bind(struct ip_vs_conn *cp)
/*
* Set LISTEN timeout. (ip_vs_conn_put will setup timer)
*/
-void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp)
+void ip_vs_tcp_conn_listen(struct net *net, struct ip_vs_conn *cp)
{
+ struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_TCP);
+
spin_lock(&cp->lock);
cp->state = IP_VS_TCP_S_LISTEN;
- cp->timeout = ip_vs_protocol_tcp.timeout_table[IP_VS_TCP_S_LISTEN];
+ cp->timeout = (pd ? pd->timeout_table[IP_VS_TCP_S_LISTEN]
+ : tcp_timeouts[IP_VS_TCP_S_LISTEN]);
spin_unlock(&cp->lock);
}
-
-static void ip_vs_tcp_init(struct ip_vs_protocol *pp)
+/* ---------------------------------------------
+ * timeouts is netns related now.
+ * ---------------------------------------------
+ */
+static void __ip_vs_tcp_init(struct net *net, struct ip_vs_proto_data *pd)
{
- IP_VS_INIT_HASH_TABLE(tcp_apps);
- pp->timeout_table = tcp_timeouts;
-}
+ struct netns_ipvs *ipvs = net_ipvs(net);
+ ip_vs_init_hash_table(ipvs->tcp_apps, TCP_APP_TAB_SIZE);
+ spin_lock_init(&ipvs->tcp_app_lock);
+ pd->timeout_table = ip_vs_create_timeout_table((int *)tcp_timeouts,
+ sizeof(tcp_timeouts));
+}
-static void ip_vs_tcp_exit(struct ip_vs_protocol *pp)
+static void __ip_vs_tcp_exit(struct net *net, struct ip_vs_proto_data *pd)
{
+ kfree(pd->timeout_table);
}
@@ -699,8 +705,10 @@ struct ip_vs_protocol ip_vs_protocol_tcp = {
.num_states = IP_VS_TCP_S_LAST,
.dont_defrag = 0,
.appcnt = ATOMIC_INIT(0),
- .init = ip_vs_tcp_init,
- .exit = ip_vs_tcp_exit,
+ .init = NULL,
+ .exit = NULL,
+ .init_netns = __ip_vs_tcp_init,
+ .exit_netns = __ip_vs_tcp_exit,
.register_app = tcp_register_app,
.unregister_app = tcp_unregister_app,
.conn_schedule = tcp_conn_schedule,
@@ -714,5 +722,4 @@ struct ip_vs_protocol ip_vs_protocol_tcp = {
.app_conn_bind = tcp_app_conn_bind,
.debug_packet = ip_vs_tcpudp_debug_packet,
.timeout_change = tcp_timeout_change,
- .set_state_timeout = tcp_set_state_timeout,
};
--
1.7.2.3
^ permalink raw reply related
* [PATCH 08/22] IPVS: netns preparation for proto_sctp
From: Simon Horman @ 2011-01-13 1:52 UTC (permalink / raw)
To: netfilter-devel, lvs-devel, netdev
Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
Hans Schillstrom, Simon Horman
In-Reply-To: <1294883588-5683-1-git-send-email-horms@verge.net.au>
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
In this phase (one), all local vars will be moved to ipvs struct.
Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data
*v3
Removed unuset function set_state_timeout()
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
include/net/netns/ip_vs.h | 9 +++
net/netfilter/ipvs/ip_vs_proto.c | 3 +
net/netfilter/ipvs/ip_vs_proto_sctp.c | 121 ++++++++++++++++-----------------
3 files changed, 70 insertions(+), 63 deletions(-)
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 62b1448..58bd3fd 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -47,6 +47,15 @@ struct netns_ipvs {
struct list_head udp_apps[UDP_APP_TAB_SIZE];
spinlock_t udp_app_lock;
#endif
+ /* ip_vs_proto_sctp */
+#ifdef CONFIG_IP_VS_PROTO_SCTP
+ #define SCTP_APP_TAB_BITS 4
+ #define SCTP_APP_TAB_SIZE (1 << SCTP_APP_TAB_BITS)
+ #define SCTP_APP_TAB_MASK (SCTP_APP_TAB_SIZE - 1)
+ /* Hash table for SCTP application incarnations */
+ struct list_head sctp_apps[SCTP_APP_TAB_SIZE];
+ spinlock_t sctp_app_lock;
+#endif
/* ip_vs_lblc */
int sysctl_lblc_expiration;
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index cdc4142..001b2f8 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -313,6 +313,9 @@ static int __net_init __ip_vs_protocol_init(struct net *net)
#ifdef CONFIG_IP_VS_PROTO_UDP
register_ip_vs_proto_netns(net, &ip_vs_protocol_udp);
#endif
+#ifdef CONFIG_IP_VS_PROTO_SCTP
+ register_ip_vs_proto_netns(net, &ip_vs_protocol_sctp);
+#endif
return 0;
}
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 521b827..f826dd1 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -862,7 +862,7 @@ static struct ipvs_sctp_nextstate
/*
* Timeout table[state]
*/
-static int sctp_timeouts[IP_VS_SCTP_S_LAST + 1] = {
+static const int sctp_timeouts[IP_VS_SCTP_S_LAST + 1] = {
[IP_VS_SCTP_S_NONE] = 2 * HZ,
[IP_VS_SCTP_S_INIT_CLI] = 1 * 60 * HZ,
[IP_VS_SCTP_S_INIT_SER] = 1 * 60 * HZ,
@@ -906,18 +906,6 @@ static const char *sctp_state_name(int state)
return "?";
}
-static void sctp_timeout_change(struct ip_vs_protocol *pp, int flags)
-{
-}
-
-static int
-sctp_set_state_timeout(struct ip_vs_protocol *pp, char *sname, int to)
-{
-
-return ip_vs_set_state_timeout(pp->timeout_table, IP_VS_SCTP_S_LAST,
- sctp_state_name_table, sname, to);
-}
-
static inline int
set_sctp_state(struct ip_vs_protocol *pp, struct ip_vs_conn *cp,
int direction, const struct sk_buff *skb)
@@ -926,6 +914,7 @@ set_sctp_state(struct ip_vs_protocol *pp, struct ip_vs_conn *cp,
unsigned char chunk_type;
int event, next_state;
int ihl;
+ struct ip_vs_proto_data *pd;
#ifdef CONFIG_IP_VS_IPV6
ihl = cp->af == AF_INET ? ip_hdrlen(skb) : sizeof(struct ipv6hdr);
@@ -1001,10 +990,13 @@ set_sctp_state(struct ip_vs_protocol *pp, struct ip_vs_conn *cp,
}
}
}
+ pd = ip_vs_proto_data_get(&init_net, pp->protocol); /* tmp fix */
+ if (likely(pd))
+ cp->timeout = pd->timeout_table[cp->state = next_state];
+ else /* What to do ? */
+ cp->timeout = sctp_timeouts[cp->state = next_state];
- cp->timeout = pp->timeout_table[cp->state = next_state];
-
- return 1;
+ return 1;
}
static int
@@ -1020,16 +1012,6 @@ sctp_state_transition(struct ip_vs_conn *cp, int direction,
return ret;
}
-/*
- * Hash table for SCTP application incarnations
- */
-#define SCTP_APP_TAB_BITS 4
-#define SCTP_APP_TAB_SIZE (1 << SCTP_APP_TAB_BITS)
-#define SCTP_APP_TAB_MASK (SCTP_APP_TAB_SIZE - 1)
-
-static struct list_head sctp_apps[SCTP_APP_TAB_SIZE];
-static DEFINE_SPINLOCK(sctp_app_lock);
-
static inline __u16 sctp_app_hashkey(__be16 port)
{
return (((__force u16)port >> SCTP_APP_TAB_BITS) ^ (__force u16)port)
@@ -1042,34 +1024,40 @@ static int sctp_register_app(struct ip_vs_app *inc)
__u16 hash;
__be16 port = inc->port;
int ret = 0;
+ struct netns_ipvs *ipvs = net_ipvs(&init_net);
+ struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_SCTP);
hash = sctp_app_hashkey(port);
- spin_lock_bh(&sctp_app_lock);
- list_for_each_entry(i, &sctp_apps[hash], p_list) {
+ spin_lock_bh(&ipvs->sctp_app_lock);
+ list_for_each_entry(i, &ipvs->sctp_apps[hash], p_list) {
if (i->port == port) {
ret = -EEXIST;
goto out;
}
}
- list_add(&inc->p_list, &sctp_apps[hash]);
- atomic_inc(&ip_vs_protocol_sctp.appcnt);
+ list_add(&inc->p_list, &ipvs->sctp_apps[hash]);
+ atomic_inc(&pd->pp->appcnt);
out:
- spin_unlock_bh(&sctp_app_lock);
+ spin_unlock_bh(&ipvs->sctp_app_lock);
return ret;
}
static void sctp_unregister_app(struct ip_vs_app *inc)
{
- spin_lock_bh(&sctp_app_lock);
- atomic_dec(&ip_vs_protocol_sctp.appcnt);
+ struct netns_ipvs *ipvs = net_ipvs(&init_net);
+ struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_SCTP);
+
+ spin_lock_bh(&ipvs->sctp_app_lock);
+ atomic_dec(&pd->pp->appcnt);
list_del(&inc->p_list);
- spin_unlock_bh(&sctp_app_lock);
+ spin_unlock_bh(&ipvs->sctp_app_lock);
}
static int sctp_app_conn_bind(struct ip_vs_conn *cp)
{
+ struct netns_ipvs *ipvs = net_ipvs(&init_net);
int hash;
struct ip_vs_app *inc;
int result = 0;
@@ -1080,12 +1068,12 @@ static int sctp_app_conn_bind(struct ip_vs_conn *cp)
/* Lookup application incarnations and bind the right one */
hash = sctp_app_hashkey(cp->vport);
- spin_lock(&sctp_app_lock);
- list_for_each_entry(inc, &sctp_apps[hash], p_list) {
+ spin_lock(&ipvs->sctp_app_lock);
+ list_for_each_entry(inc, &ipvs->sctp_apps[hash], p_list) {
if (inc->port == cp->vport) {
if (unlikely(!ip_vs_app_inc_get(inc)))
break;
- spin_unlock(&sctp_app_lock);
+ spin_unlock(&ipvs->sctp_app_lock);
IP_VS_DBG_BUF(9, "%s: Binding conn %s:%u->"
"%s:%u to app %s on port %u\n",
@@ -1101,43 +1089,50 @@ static int sctp_app_conn_bind(struct ip_vs_conn *cp)
goto out;
}
}
- spin_unlock(&sctp_app_lock);
+ spin_unlock(&ipvs->sctp_app_lock);
out:
return result;
}
-static void ip_vs_sctp_init(struct ip_vs_protocol *pp)
+/* ---------------------------------------------
+ * timeouts is netns related now.
+ * ---------------------------------------------
+ */
+static void __ip_vs_sctp_init(struct net *net, struct ip_vs_proto_data *pd)
{
- IP_VS_INIT_HASH_TABLE(sctp_apps);
- pp->timeout_table = sctp_timeouts;
-}
+ struct netns_ipvs *ipvs = net_ipvs(net);
+ ip_vs_init_hash_table(ipvs->sctp_apps, SCTP_APP_TAB_SIZE);
+ spin_lock_init(&ipvs->tcp_app_lock);
+ pd->timeout_table = ip_vs_create_timeout_table((int *)sctp_timeouts,
+ sizeof(sctp_timeouts));
+}
-static void ip_vs_sctp_exit(struct ip_vs_protocol *pp)
+static void __ip_vs_sctp_exit(struct net *net, struct ip_vs_proto_data *pd)
{
-
+ kfree(pd->timeout_table);
}
struct ip_vs_protocol ip_vs_protocol_sctp = {
- .name = "SCTP",
- .protocol = IPPROTO_SCTP,
- .num_states = IP_VS_SCTP_S_LAST,
- .dont_defrag = 0,
- .appcnt = ATOMIC_INIT(0),
- .init = ip_vs_sctp_init,
- .exit = ip_vs_sctp_exit,
- .register_app = sctp_register_app,
+ .name = "SCTP",
+ .protocol = IPPROTO_SCTP,
+ .num_states = IP_VS_SCTP_S_LAST,
+ .dont_defrag = 0,
+ .init = NULL,
+ .exit = NULL,
+ .init_netns = __ip_vs_sctp_init,
+ .exit_netns = __ip_vs_sctp_exit,
+ .register_app = sctp_register_app,
.unregister_app = sctp_unregister_app,
- .conn_schedule = sctp_conn_schedule,
- .conn_in_get = ip_vs_conn_in_get_proto,
- .conn_out_get = ip_vs_conn_out_get_proto,
- .snat_handler = sctp_snat_handler,
- .dnat_handler = sctp_dnat_handler,
- .csum_check = sctp_csum_check,
- .state_name = sctp_state_name,
+ .conn_schedule = sctp_conn_schedule,
+ .conn_in_get = ip_vs_conn_in_get_proto,
+ .conn_out_get = ip_vs_conn_out_get_proto,
+ .snat_handler = sctp_snat_handler,
+ .dnat_handler = sctp_dnat_handler,
+ .csum_check = sctp_csum_check,
+ .state_name = sctp_state_name,
.state_transition = sctp_state_transition,
- .app_conn_bind = sctp_app_conn_bind,
- .debug_packet = ip_vs_tcpudp_debug_packet,
- .timeout_change = sctp_timeout_change,
- .set_state_timeout = sctp_set_state_timeout,
+ .app_conn_bind = sctp_app_conn_bind,
+ .debug_packet = ip_vs_tcpudp_debug_packet,
+ .timeout_change = NULL,
};
--
1.7.2.3
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox