Netdev List
* Re: [PATCH net-2.6] bridge: fix br_multicast_ipv6_rcv for paged skbs
From: Stephen Hemminger @ 2010-12-30 18:46 UTC (permalink / raw)
  To: Tomas Winkler; +Cc: davem, netdev, linux-wireless, Johannes Berg
In-Reply-To: <1293708753-17728-1-git-send-email-tomas.winkler@intel.com>

On Thu, 30 Dec 2010 13:32:33 +0200
Tomas Winkler <tomas.winkler@intel.com> wrote:

> use pskb_may_pull to access header correctly for paged skbs
> 
> the pskb_may_pull idiom is used in ipv6 header parsing
> but was omitted in the bridge code
> 
> this fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=25202
> 
> Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: authenticated
> Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: associated (aid 2)
> Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 RADIUS: starting accounting session 4D0608A3-00000005
> Dec 15 14:36:41 User-PC kernel: [175576.120287] ------------[ cut here ]------------
> Dec 15 14:36:41 User-PC kernel: [175576.120452] kernel BUG at include/linux/skbuff.h:1178!
> Dec 15 14:36:41 User-PC kernel: [175576.120609] invalid opcode: 0000 [#1] SMP
> Dec 15 14:36:41 User-PC kernel: [175576.120749] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
> Dec 15 14:36:41 User-PC kernel: [175576.121035] Modules linked in: oprofile binfmt_misc bridge stp llc parport_pc ppdev arc4 iwlagn snd_hda_codec_realtek iwlcore i915 snd_hda_intel mac80211 joydev snd_hda_codec snd_hwdep snd_pcm snd_seq_midi drm_kms_helper snd_rawmidi drm snd_seq_midi_event snd_seq snd_timer snd_seq_device cfg80211 eeepc_wmi usbhid psmouse intel_agp i2c_algo_bit intel_gtt uvcvideo agpgart videodev sparse_keymap snd shpchp v4l1_compat lp hid video serio_raw soundcore output snd_page_alloc ahci libahci atl1c
> Dec 15 14:36:41 User-PC kernel: [175576.122712]
> Dec 15 14:36:41 User-PC kernel: [175576.122769] Pid: 0, comm: kworker/0:0 Tainted: G        W   2.6.37-rc5-wl+ #3 1015PE/1016P
> Dec 15 14:36:41 User-PC kernel: [175576.123012] EIP: 0060:[<f83edd65>] EFLAGS: 00010283 CPU: 1
> Dec 15 14:36:41 User-PC kernel: [175576.123193] EIP is at br_multicast_rcv+0xc95/0xe1c [bridge]
> Dec 15 14:36:41 User-PC kernel: [175576.123362] EAX: 0000001c EBX: f5626318 ECX: 00000000 EDX: 00000000
> Dec 15 14:36:41 User-PC kernel: [175576.123550] ESI: ec512262 EDI: f5626180 EBP: f60b5ca0 ESP: f60b5bd8
> Dec 15 14:36:41 User-PC kernel: [175576.123737]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Dec 15 14:36:41 User-PC kernel: [175576.123902] Process kworker/0:0 (pid: 0, ti=f60b4000 task=f60a8000 task.ti=f60b0000)
> Dec 15 14:36:41 User-PC kernel: [175576.124137] Stack:
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  ec556500 f6d06800 f60b5be8 c01087d8 ec512262 00000030 00000024 f5626180
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  f572c200 ef463440 f5626300 3affffff f6d06dd0 e60766a4 000000c4 f6d06860
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  ffffffff ec55652c 00000001 f6d06844 f60b5c64 c0138264 c016e451 c013e47d
> Dec 15 14:36:41 User-PC kernel: [175576.124181] Call Trace:
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01087d8>] ? sched_clock+0x8/0x10
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0138264>] ? enqueue_entity+0x174/0x440
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e451>] ? sched_clock_cpu+0x131/0x190
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c013e47d>] ? select_task_rq_fair+0x2ad/0x730
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0524fc1>] ? nf_iterate+0x71/0x90
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4914>] ? br_handle_frame_finish+0x184/0x220 [bridge]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e46e9>] ? br_handle_frame+0x189/0x230 [bridge]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4560>] ? br_handle_frame+0x0/0x230 [bridge]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff026>] ? __netif_receive_skb+0x1b6/0x5b0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04f7a30>] ? skb_copy_bits+0x110/0x210
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0503a7f>] ? netif_receive_skb+0x6f/0x80
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cb74c>] ? ieee80211_deliver_skb+0x8c/0x1a0 [mac80211]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cc836>] ? ieee80211_rx_handlers+0xeb6/0x1aa0 [mac80211]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff1f0>] ? __netif_receive_skb+0x380/0x5b0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e242>] ? sched_clock_local+0xb2/0x190
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c012b688>] ? default_spin_lock_flags+0x8/0x10
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cd621>] ? ieee80211_prepare_and_rx_handle+0x201/0xa90 [mac80211]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82ce154>] ? ieee80211_rx+0x2a4/0x830 [mac80211]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f815a8d6>] ? iwl_update_stats+0xa6/0x2a0 [iwlcore]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8499212>] ? iwlagn_rx_reply_rx+0x292/0x3b0 [iwlagn]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8483697>] ? iwl_rx_handle+0xe7/0x350 [iwlagn]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8486ab7>] ? iwl_irq_tasklet+0xf7/0x5c0 [iwlagn]
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01aece1>] ? __rcu_process_callbacks+0x201/0x2d0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150d05>] ? tasklet_action+0xc5/0x100
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150a07>] ? __do_softirq+0x97/0x1d0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d910c>] ? nmi_stack_correct+0x2f/0x34
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150970>] ? __do_softirq+0x0/0x1d0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  <IRQ>
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01508f5>] ? irq_exit+0x65/0x70
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05df062>] ? do_IRQ+0x52/0xc0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01036b0>] ? common_interrupt+0x30/0x38
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c03a1fc2>] ? intel_idle+0xc2/0x160
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04daebb>] ? cpuidle_idle_call+0x6b/0x100
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0101dea>] ? cpu_idle+0x8a/0xf0
> Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d2702>] ? start_secondary+0x1e8/0x1ee
> 
> Cc:YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
> ---
>  net/bridge/br_multicast.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index f19e347..074c478 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1464,6 +1464,10 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
>  	if (offset < 0 || nexthdr != IPPROTO_ICMPV6)
>  		return 0;
>  
> +	if (!pskb_may_pull(skb,
> +		(skb_network_header(skb) + offset + 1 - skb->data))) 
> +                        return 0;
> +
>  	/* Okay, we found ICMPv6 header */
>  	skb2 = skb_clone(skb, GFP_ATOMIC);
>  	if (!skb2)

This doesn't look correct: the length passed to pskb_may_pull() is miscalculated.
Right after the skb_clone(), the value passed to skb_pull is "offset", so that is the length to check.
Also, the other checks return -EINVAL for an incorrectly formed packet.

--- a/net/bridge/br_multicast.c	2010-12-30 10:29:58.579510488 -0800
+++ b/net/bridge/br_multicast.c	2010-12-30 10:43:27.273386691 -0800
@@ -1464,6 +1464,9 @@ static int br_multicast_ipv6_rcv(struct
 	if (offset < 0 || nexthdr != IPPROTO_ICMPV6)
 		return 0;
 
+	if (!pskb_may_pull(skb, offset))
+		return -EINVAL;
+
 	/* Okay, we found ICMPv6 header */
 	skb2 = skb_clone(skb, GFP_ATOMIC);
 	if (!skb2)
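
To illustrate why the check has to cover exactly the bytes that are pulled afterwards, here is a minimal userspace sketch. It is not kernel code: struct toy_skb, toy_may_pull(), toy_pull() and toy_rcv() are hypothetical stand-ins for a paged skb, pskb_may_pull() and skb_pull(), and the 64-byte linear area is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a paged skb (hypothetical, not the kernel struct):
 * only bytes in linear[] can be parsed or pulled directly, the rest
 * sits in a separate fragment. */
struct toy_skb {
	unsigned char linear[64];  /* directly addressable header area */
	size_t head_len;           /* bytes currently valid in linear[] */
	const unsigned char *frag; /* paged data */
	size_t frag_len;           /* bytes left in the fragment */
	size_t pulled;             /* bytes already consumed from the front */
};

/* Analogue of pskb_may_pull(): make sure the first len bytes are in the
 * linear area, copying them in from the fragment when necessary. */
static bool toy_may_pull(struct toy_skb *skb, size_t len)
{
	size_t need;

	if (len <= skb->head_len)
		return true;
	if (len > skb->head_len + skb->frag_len || len > sizeof(skb->linear))
		return false; /* packet is shorter than the caller expects */

	need = len - skb->head_len;
	memcpy(skb->linear + skb->head_len, skb->frag, need);
	skb->head_len += need;
	skb->frag += need;
	skb->frag_len -= need;
	return true;
}

/* Analogue of skb_pull(): may only consume bytes that are linear. */
static void toy_pull(struct toy_skb *skb, size_t len)
{
	assert(skb->pulled + len <= skb->head_len);
	skb->pulled += len;
}

/* Check "offset" bytes, then pull the same "offset" bytes. */
static int toy_rcv(struct toy_skb *skb, size_t offset)
{
	if (!toy_may_pull(skb, offset))
		return -EINVAL; /* malformed or truncated packet */
	toy_pull(skb, offset);
	return 0;
}
```

In this model, a packet with 8 linear bytes and 48 paged bytes passes toy_rcv(&skb, 40) because the missing 32 bytes are first copied into the linear area. A packet too short to hold 40 bytes is rejected with -EINVAL instead of tripping the assertion in toy_pull().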



-- 


* Re: [PATCH net-next-2.6] sfq: fix slot_dequeue_head()
From: Jarek Poplawski @ 2010-12-30 17:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1293721368.7150.307.camel@edumazet-laptop>

On Thu, Dec 30, 2010 at 04:02:48PM +0100, Eric Dumazet wrote:
> On Wednesday 22 December 2010 at 07:32 +0000, Jarek Poplawski wrote:
> > > Also, slot_dequeue_tail() should make sure slot skb chain is correctly
> > > terminated, or sfq_dump_class_stats() can access freed skbs.
> > 
> > ...and a good hint for code reusing ;-)
> 
> Yes, and of course same fix is needed in slot_dequeue_head(), as further
> testing on my side made it pretty clear.
> 
> I was adding possibility to have more packets queued in SFQ (more
> packets than max number of flows) and got unexpected crashes.
> 
> Reverting to net-next-2.6, I still got crashes. Oops.
> 
> [PATCH net-next-2.6] sfq: fix slot_dequeue_head()
> 
> slot_dequeue_head() should make sure slot skb chain is correct in both
> ways, or we can crash if all possible flows are in use.

Nice scenario ;-) Of course, it's easy to guess I looked for something
like this after your previous fix and missed that :-| Btw, it looks
like slot_queue_init() could go back to sfq_init() now.

Jarek P.

> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Jarek Poplawski <jarkao2@gmail.com>
> ---
>  net/sched/sch_sfq.c |    1 +
>  1 files changed, 1 insertion(+)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 6a2f88f..3977e56 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -292,6 +292,7 @@ static inline struct sk_buff *slot_dequeue_head(struct sfq_slot *slot)
>  	struct sk_buff *skb = slot->skblist_next;
>  
>  	slot->skblist_next = skb->next;
> +	skb->next->prev = (struct sk_buff *)slot;
>  	skb->next = skb->prev = NULL;
>  	return skb;
>  }
> 
> 
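
The one-line fix can be illustrated with a small userspace model of a circular doubly linked queue whose head acts as a sentinel. This is only a sketch: struct toy_node and struct toy_slot are hypothetical, and the slot embeds a head node rather than casting the slot to a packet pointer as the patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued packet. */
struct toy_node {
	struct toy_node *next, *prev;
	int id;
};

/* Hypothetical stand-in for a slot: the embedded head node is the
 * sentinel that terminates the chain in both directions. */
struct toy_slot {
	struct toy_node head;
};

static void toy_slot_init(struct toy_slot *s)
{
	s->head.next = s->head.prev = &s->head;
}

/* Append n at the tail of the slot's queue. */
static void toy_slot_queue_add(struct toy_slot *s, struct toy_node *n)
{
	n->prev = s->head.prev;
	n->next = &s->head;
	s->head.prev->next = n;
	s->head.prev = n;
}

/* Remove the first node; the slot must not be empty. */
static struct toy_node *toy_slot_dequeue_head(struct toy_slot *s)
{
	struct toy_node *n = s->head.next;

	assert(n != &s->head);
	s->head.next = n->next;
	n->next->prev = &s->head; /* counterpart of the line the patch adds */
	n->next = n->prev = NULL;
	return n;
}

/* Remove the last node; the slot must not be empty. */
static struct toy_node *toy_slot_dequeue_tail(struct toy_slot *s)
{
	struct toy_node *n = s->head.prev;

	assert(n != &s->head);
	s->head.prev = n->prev;
	n->prev->next = &s->head;
	n->next = n->prev = NULL;
	return n;
}
```

Without the marked line, the new first node would keep a prev pointer to the node that was just removed. A later toy_slot_dequeue_tail() that empties the queue would then copy that stale pointer into the head, which is the kind of broken back link the patch description warns about.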


* [PATCH/RFC] Re: Compilation of pktgen fails for ARCH=um
From: Randy Dunlap @ 2010-12-30 17:39 UTC (permalink / raw)
  To: christoph.paasch; +Cc: netdev, linux-arch
In-Reply-To: <201012301133.14524.christoph.paasch@uclouvain.be>

On Thu, 30 Dec 2010 11:33:14 +0100 Christoph Paasch wrote:

> Hi,
> 
> compiling the packet generator (NET_PKTGEN) for ARCH=um does not work.
> 
> Since commit 43d28b6515a6ea580a198df3e253e88236f08978 (pktgen: increasing 
> transmission granularity), function spin(...) uses ndelay(...), which is not 
> implemented for uml.
> 
> Should pktgen be disabled for uml?
> Or should ndelay be an empty macro? (in arch/um/include/asm/delay.h)

or ndelay() in arch/um/include/asm/delay.h can be removed completely
and then the default implementation of ndelay() will be used from
include/linux/delay.h.
This builds cleanly, but I don't know how well it would work.

---
From: Randy Dunlap <randy.dunlap@oracle.com>

Allow uml to use the default implementation of ndelay() from
<linux/delay.h>.  Fixes build error:

net/built-in.o: In function `spin':
pktgen.c:(.text+0x27391): undefined reference to `__unimplemented_ndelay'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
---
 arch/um/include/asm/delay.h |    5 -----
 1 file changed, 5 deletions(-)

--- linux-next-20101214.orig/arch/um/include/asm/delay.h
+++ linux-next-20101214/arch/um/include/asm/delay.h
@@ -12,9 +12,4 @@ extern void __delay(unsigned long loops)
 #define udelay(n) ((__builtin_constant_p(n) && (n) > 20000) ? \
 	__bad_udelay() : __udelay(n))
 
-/* It appears that ndelay is not used at all for UML, and has never been
- * implemented. */
-extern void __unimplemented_ndelay(void);
-#define ndelay(n) __unimplemented_ndelay()
-
 #endif
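
For reference, the generic fallback in <linux/delay.h> is, as far as I understand it, only defined when the architecture has not provided its own ndelay(), and it rounds the nanosecond count up to whole microseconds before calling udelay(). A minimal userspace sketch of that pattern follows; the toy_ names are hypothetical, and toy_udelay() just accumulates the requested time instead of busy-waiting.

```c
#include <assert.h>

/* Same rounding idea as the kernel's DIV_ROUND_UP() macro. */
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Total microseconds requested so far (stands in for real delays). */
static unsigned long toy_usecs_waited;

/* Hypothetical stand-in for the architecture's udelay(). */
static void toy_udelay(unsigned long usecs)
{
	toy_usecs_waited += usecs;
}

/* Fallback used only when no architecture-specific toy_ndelay exists:
 * round nanoseconds up to the next whole microsecond. */
#ifndef toy_ndelay
static inline void toy_ndelay(unsigned long nsecs)
{
	toy_udelay(TOY_DIV_ROUND_UP(nsecs, 1000));
}
#define toy_ndelay(x) toy_ndelay(x)
#endif
```

With this fallback a request such as toy_ndelay(1) waits a full microsecond, so callers that rely on sub-microsecond granularity get a coarser delay than they asked for. That may matter for pktgen's spin(), which is presumably why the patch is marked RFC.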


* Re: linux-next: Tree for December 29 (netfilter build errors)
From: Eric Dumazet @ 2010-12-30 17:07 UTC (permalink / raw)
  To: sedat.dilek
  Cc: Changli Gao, Randy Dunlap, Stephen Rothwell, netfilter-devel,
	linux-next, LKML, netdev
In-Reply-To: <AANLkTikPbKOQseJ8qfoSqv4XWgH5fjbg488jd6_Xv22E@mail.gmail.com>

On Thursday 30 December 2010 at 17:47 +0100, Sedat Dilek wrote:
> On Thu, Dec 30, 2010 at 3:18 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> > On Thu, Dec 30, 2010 at 12:53 AM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
> >> On Wed, 29 Dec 2010 13:07:00 +1100 Stephen Rothwell wrote:
> >>
> >>> Hi all,
> >>>
> >>> Changes since 20101228:
> >>
> >>
> >> In file included from linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:22:
> >> linux-next-20101229/include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
> >> linux-next-20101229/include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
> >> linux-next-20101229/include/net/netfilter/nf_conntrack.h:174: error: 'const struct sk_buff' has no member named 'nfct'
> >> linux-next-20101229/include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
> >> linux-next-20101229/include/net/netfilter/nf_conntrack.h:181: error: implicit declaration of function 'nf_conntrack_put'
> >>  CC      net/ipv6/exthdrs_core.o
> >> In file included from linux-next-20101229/include/net/netfilter/nf_conntrack_core.h:18,
> >>                 from linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:26:
> >> linux-next-20101229/include/net/netfilter/nf_conntrack_ecache.h: In function 'nf_ct_ecache_ext_add':
> >> linux-next-20101229/include/net/netfilter/nf_conntrack_ecache.h:35: error: 'struct net' has no member named 'ct'
> >> In file included from linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:26:
> >> linux-next-20101229/include/net/netfilter/nf_conntrack_core.h: In function 'nf_conntrack_confirm':
> >> linux-next-20101229/include/net/netfilter/nf_conntrack_core.h:60: error: 'struct sk_buff' has no member named 'nfct'
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'nf_ct6_defrag_user':
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:36: error: 'struct sk_buff' has no member named 'nfct'
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:37: error: implicit declaration of function 'nf_ct_zone'
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:37: error: 'struct sk_buff' has no member named 'nfct'
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag':
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:60: error: 'struct sk_buff' has no member named 'nfct'
> >> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:60: error: 'struct sk_buff' has no member named 'nfct'
> >> make[4]: *** [net/ipv6/netfilter/nf_defrag_ipv6_hooks.o] Error 1
> >> make[3]: *** [net/ipv6/netfilter] Error 2
> >>
> >>
> >>
> >
> > This bug has been already fixed in nf-next-2.6:
> > http://git.kernel.org/?p=linux/kernel/git/kaber/nf-next-2.6.git;a=commitdiff;h=ae90bdeaeac6b964b7a1e853a90a19f358a9ac20;hp=f1c722295e029eace7960fc687efd5afd67dc555
> >
> > Thanks.
> >
> > --
> > Regards,
> > Changli Gao(xiaosuo@gmail.com)
> >
> 
> It does not look like nf-next-2.6 is pulled in by linux-next... Does
> this tree pass through net-next-2.6 and then enter linux-next?
> 

Yes it does. Patrick is probably too busy right now to push fixes to
David.


* Re: linux-next: Tree for December 29 (netfilter build errors)
From: Sedat Dilek @ 2010-12-30 16:47 UTC (permalink / raw)
  To: Changli Gao
  Cc: Randy Dunlap, Stephen Rothwell, netfilter-devel, linux-next, LKML,
	netdev
In-Reply-To: <AANLkTin8Zc4wkx9BjD0vJSNdne8LVHx=JyeSC2YdKTKH@mail.gmail.com>

On Thu, Dec 30, 2010 at 3:18 AM, Changli Gao <xiaosuo@gmail.com> wrote:
> On Thu, Dec 30, 2010 at 12:53 AM, Randy Dunlap <randy.dunlap@oracle.com> wrote:
>> On Wed, 29 Dec 2010 13:07:00 +1100 Stephen Rothwell wrote:
>>
>>> Hi all,
>>>
>>> Changes since 20101228:
>>
>>
>> In file included from linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:22:
>> linux-next-20101229/include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
>> linux-next-20101229/include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
>> linux-next-20101229/include/net/netfilter/nf_conntrack.h:174: error: 'const struct sk_buff' has no member named 'nfct'
>> linux-next-20101229/include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
>> linux-next-20101229/include/net/netfilter/nf_conntrack.h:181: error: implicit declaration of function 'nf_conntrack_put'
>>  CC      net/ipv6/exthdrs_core.o
>> In file included from linux-next-20101229/include/net/netfilter/nf_conntrack_core.h:18,
>>                 from linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:26:
>> linux-next-20101229/include/net/netfilter/nf_conntrack_ecache.h: In function 'nf_ct_ecache_ext_add':
>> linux-next-20101229/include/net/netfilter/nf_conntrack_ecache.h:35: error: 'struct net' has no member named 'ct'
>> In file included from linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:26:
>> linux-next-20101229/include/net/netfilter/nf_conntrack_core.h: In function 'nf_conntrack_confirm':
>> linux-next-20101229/include/net/netfilter/nf_conntrack_core.h:60: error: 'struct sk_buff' has no member named 'nfct'
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'nf_ct6_defrag_user':
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:36: error: 'struct sk_buff' has no member named 'nfct'
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:37: error: implicit declaration of function 'nf_ct_zone'
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:37: error: 'struct sk_buff' has no member named 'nfct'
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag':
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:60: error: 'struct sk_buff' has no member named 'nfct'
>> linux-next-20101229/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:60: error: 'struct sk_buff' has no member named 'nfct'
>> make[4]: *** [net/ipv6/netfilter/nf_defrag_ipv6_hooks.o] Error 1
>> make[3]: *** [net/ipv6/netfilter] Error 2
>>
>>
>>
>
> This bug has been already fixed in nf-next-2.6:
> http://git.kernel.org/?p=linux/kernel/git/kaber/nf-next-2.6.git;a=commitdiff;h=ae90bdeaeac6b964b7a1e853a90a19f358a9ac20;hp=f1c722295e029eace7960fc687efd5afd67dc555
>
> Thanks.
>
> --
> Regards,
> Changli Gao(xiaosuo@gmail.com)
>

It does not look like nf-next-2.6 is pulled in by linux-next... Does
this tree pass through net-next-2.6 and then enter linux-next?

- Sedat -

P.S.:

sd@tbox:/mnt/sdb5/linux-kernel/linux-next$ git pull
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6.git
master:nf-next-2.6
remote: Counting objects: 705, done.
remote: Compressing objects: 100% (415/415), done.
remote: Total 545 (delta 475), reused 149 (delta 130)
Receiving objects: 100% (545/545), 88.80 KiB | 47 KiB/s, done.
Resolving deltas: 100% (475/475), completed with 136 local objects.
From git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
 * [new branch]      master     -> nf-next-2.6
warning: too many files (created: 931 deleted: 428), skipping inexact
rename detection
Auto-merging include/linux/netfilter.h
Auto-merging include/linux/skbuff.h
Auto-merging net/core/skbuff.c
Auto-merging net/ipv6/netfilter/nf_conntrack_reasm.c
Auto-merging net/netfilter/core.c
Auto-merging net/netfilter/ipvs/ip_vs_ctl.c
Auto-merging net/netfilter/ipvs/ip_vs_xmit.c
Merge made by recursive.
 include/linux/ip_vs.h                              |    8 +
 include/linux/netfilter.h                          |    6 +-
 include/linux/netfilter/xt_CT.h                    |   10 +-
 include/linux/netfilter/xt_TCPOPTSTRIP.h           |    2 +-
 include/linux/netfilter/xt_TPROXY.h                |    8 +-
 include/linux/netfilter/xt_cluster.h               |    8 +-
 include/linux/netfilter/xt_quota.h                 |    6 +-
 include/linux/netfilter/xt_time.h                  |   14 +-
 include/linux/netfilter/xt_u32.h                   |   16 +-
 include/linux/skbuff.h                             |   15 +
 include/net/ip_vs.h                                |   25 +-
 include/net/netfilter/ipv6/nf_conntrack_ipv6.h     |   10 -
 include/net/netfilter/ipv6/nf_defrag_ipv6.h        |   10 +
 include/net/netfilter/nf_conntrack.h               |   19 +-
 include/net/netfilter/nf_conntrack_ecache.h        |   12 +-
 include/net/netfilter/nf_conntrack_extend.h        |    6 +
 include/net/netfilter/nf_conntrack_l3proto.h       |    2 +-
 include/net/netfilter/nf_nat.h                     |    6 +
 include/net/netfilter/nf_nat_core.h                |    4 +-
 net/core/skbuff.c                                  |    2 +
 net/ipv4/netfilter/ipt_LOG.c                       |    3 +-
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   |   17 +-
 net/ipv4/netfilter/nf_nat_amanda.c                 |    8 +-
 net/ipv4/netfilter/nf_nat_core.c                   |    9 +-
 net/ipv6/netfilter/ip6t_LOG.c                      |    3 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c            |    2 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c          |    8 +-
 net/netfilter/core.c                               |    4 +-
 net/netfilter/ipvs/ip_vs_conn.c                    |   36 +-
 net/netfilter/ipvs/ip_vs_core.c                    |   94 ++-
 net/netfilter/ipvs/ip_vs_ctl.c                     |   36 +-
 net/netfilter/ipvs/ip_vs_ftp.c                     |    5 +-
 net/netfilter/ipvs/ip_vs_pe.c                      |   17 +-
 net/netfilter/ipvs/ip_vs_pe_sip.c                  |    3 +
 net/netfilter/ipvs/ip_vs_proto_sctp.c              |   11 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c               |   10 +-
 net/netfilter/ipvs/ip_vs_proto_udp.c               |   10 +-
 net/netfilter/ipvs/ip_vs_sync.c                    |  962 +++++++++++++++++---
 net/netfilter/ipvs/ip_vs_xmit.c                    |   26 +-
 net/netfilter/nf_conntrack_core.c                  |    3 +-
 net/netfilter/nf_conntrack_expect.c                |   25 +-
 net/netfilter/nf_conntrack_extend.c                |   11 +-
 net/netfilter/nf_conntrack_helper.c                |   10 +-
 net/netfilter/nf_conntrack_netlink.c               |    1 +
 net/netfilter/nf_conntrack_proto.c                 |   24 +-
 net/netfilter/nf_conntrack_proto_dccp.c            |    3 +
 net/netfilter/nf_conntrack_proto_sctp.c            |    1 +
 net/netfilter/nf_conntrack_proto_tcp.c             |   14 +-
 net/netfilter/nf_conntrack_standalone.c            |    9 +-
 net/netfilter/nf_log.c                             |    6 +-
 net/netfilter/nf_queue.c                           |   18 +-
 net/netfilter/nfnetlink_log.c                      |    6 +-
 net/netfilter/xt_CLASSIFY.c                        |   36 +-
 net/netfilter/xt_NFQUEUE.c                         |    6 +-
 54 files changed, 1258 insertions(+), 368 deletions(-)
sd@tbox:/mnt/sdb5/linux-kernel/linux-next$


* [net-next-2.6 PATCH v2 3/4] bnx2x: adding dcbnl support
From: Shmulik Ravid @ 2010-12-30 16:27 UTC (permalink / raw)
  To: davem; +Cc: Eilon Greenstein, John Fastabend, lucy.liu, netdev


Adding dcbnl implementation to bnx2x allowing users to manage the
embedded DCBX engine.

This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x.h      |   23 ++-
 drivers/net/bnx2x/bnx2x_dcb.c  |  663 ++++++++++++++++++++++++++++++++++++++--
 drivers/net/bnx2x/bnx2x_dcb.h  |    9 +-
 drivers/net/bnx2x/bnx2x_main.c |    8 +-
 4 files changed, 677 insertions(+), 26 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index f14c6ed..77d6c8d 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -22,15 +22,17 @@
  * (you will need to reboot afterwards) */
 /* #define BNX2X_STOP_ON_ERROR */
 
-#define DRV_MODULE_VERSION      "1.62.00-2"
-#define DRV_MODULE_RELDATE      "2010/12/13"
+#define DRV_MODULE_VERSION      "1.62.00-3"
+#define DRV_MODULE_RELDATE      "2010/12/21"
 #define BNX2X_BC_VER            0x040200
 
 #define BNX2X_MULTI_QUEUE
 
 #define BNX2X_NEW_NAPI
 
-
+#if defined(CONFIG_DCB)
+#define BCM_DCB
+#endif
 #if defined(CONFIG_CNIC) || defined(CONFIG_CNIC_MODULE)
 #define BCM_CNIC 1
 #include "../cnic_if.h"
@@ -1186,7 +1188,20 @@ struct bnx2x {
 	/* LLDP params */
 	struct bnx2x_config_lldp_params		lldp_config_params;
 
-	/* DCBX params */
+	/* DCB support on/off */
+	u16 dcb_state;
+#define BNX2X_DCB_STATE_OFF			0
+#define BNX2X_DCB_STATE_ON			1
+
+	/* DCBX engine mode */
+	int dcbx_enabled;
+#define BNX2X_DCBX_ENABLED_OFF			0
+#define BNX2X_DCBX_ENABLED_ON_NEG_OFF		1
+#define BNX2X_DCBX_ENABLED_ON_NEG_ON		2
+#define BNX2X_DCBX_ENABLED_INVALID		(-1)
+
+	bool dcbx_mode_uset;
+
 	struct bnx2x_config_dcbx_params		dcbx_config_params;
 
 	struct bnx2x_dcbx_port_params		dcbx_port_params;
diff --git a/drivers/net/bnx2x/bnx2x_dcb.c b/drivers/net/bnx2x/bnx2x_dcb.c
index 0b86480..fb60021 100644
--- a/drivers/net/bnx2x/bnx2x_dcb.c
+++ b/drivers/net/bnx2x/bnx2x_dcb.c
@@ -619,13 +619,10 @@ static void bnx2x_dcbx_admin_mib_updated_params(struct bnx2x *bp,
 	for (i = 0; i < sizeof(struct lldp_admin_mib); i += 4, buff++)
 		*buff = REG_RD(bp, (offset + i));
 
-
-	if (BNX2X_DCBX_CONFIG_INV_VALUE != dp->admin_dcbx_enable) {
-		if (dp->admin_dcbx_enable)
-			SET_FLAGS(admin_mib.ver_cfg_flags, DCBX_DCBX_ENABLED);
-		else
-			RESET_FLAGS(admin_mib.ver_cfg_flags, DCBX_DCBX_ENABLED);
-	}
+	if (bp->dcbx_enabled == BNX2X_DCBX_ENABLED_ON_NEG_ON)
+		SET_FLAGS(admin_mib.ver_cfg_flags, DCBX_DCBX_ENABLED);
+	else
+		RESET_FLAGS(admin_mib.ver_cfg_flags, DCBX_DCBX_ENABLED);
 
 	if ((BNX2X_DCBX_OVERWRITE_SETTINGS_ENABLE ==
 				dp->overwrite_settings)) {
@@ -734,12 +731,26 @@ static void bnx2x_dcbx_admin_mib_updated_params(struct bnx2x *bp,
 		REG_WR(bp, (offset + i), *buff);
 }
 
-/* default */
+void bnx2x_dcbx_set_state(struct bnx2x *bp, bool dcb_on, u32 dcbx_enabled)
+{
+	if (CHIP_IS_E2(bp) && !CHIP_MODE_IS_4_PORT(bp)) {
+		bp->dcb_state = dcb_on;
+		bp->dcbx_enabled = dcbx_enabled;
+	} else {
+		bp->dcb_state = false;
+		bp->dcbx_enabled = BNX2X_DCBX_ENABLED_INVALID;
+	}
+	DP(NETIF_MSG_LINK, "DCB state [%s:%s]\n",
+	   dcb_on ? "ON" : "OFF",
+	   dcbx_enabled == BNX2X_DCBX_ENABLED_OFF ? "user-mode" :
+	   dcbx_enabled == BNX2X_DCBX_ENABLED_ON_NEG_OFF ? "on-chip static" :
+	   dcbx_enabled == BNX2X_DCBX_ENABLED_ON_NEG_ON ?
+	   "on-chip with negotiation" : "invalid");
+}
+
 void bnx2x_dcbx_init_params(struct bnx2x *bp)
 {
 	bp->dcbx_config_params.admin_dcbx_version = 0x0; /* 0 - CEE; 1 - IEEE */
-	bp->dcbx_config_params.dcb_enable = 1;
-	bp->dcbx_config_params.admin_dcbx_enable = 1;
 	bp->dcbx_config_params.admin_ets_willing = 1;
 	bp->dcbx_config_params.admin_pfc_willing = 1;
 	bp->dcbx_config_params.overwrite_settings = 1;
@@ -807,23 +818,27 @@ void bnx2x_dcbx_init_params(struct bnx2x *bp)
 void bnx2x_dcbx_init(struct bnx2x *bp)
 {
 	u32 dcbx_lldp_params_offset = SHMEM_LLDP_DCBX_PARAMS_NONE;
+
+	if (bp->dcbx_enabled <= 0)
+		return;
+
 	/* validate:
 	 * chip of good for dcbx version,
 	 * dcb is wanted
 	 * the function is pmf
 	 * shmem2 contains DCBX support fields
 	 */
-	DP(NETIF_MSG_LINK, "dcb_enable %d bp->port.pmf %d\n",
-	   bp->dcbx_config_params.dcb_enable, bp->port.pmf);
+	DP(NETIF_MSG_LINK, "dcb_state %d bp->port.pmf %d\n",
+	   bp->dcb_state, bp->port.pmf);
 
-	if (CHIP_IS_E2(bp) && !CHIP_MODE_IS_4_PORT(bp) &&
-	    bp->dcbx_config_params.dcb_enable &&
-	    bp->port.pmf &&
+	if (bp->dcb_state ==  BNX2X_DCB_STATE_ON && bp->port.pmf &&
 	    SHMEM2_HAS(bp, dcbx_lldp_params_offset)) {
-		dcbx_lldp_params_offset = SHMEM2_RD(bp,
-						    dcbx_lldp_params_offset);
+		dcbx_lldp_params_offset =
+			SHMEM2_RD(bp, dcbx_lldp_params_offset);
+
 		DP(NETIF_MSG_LINK, "dcbx_lldp_params_offset 0x%x\n",
 		   dcbx_lldp_params_offset);
+
 		if (SHMEM_LLDP_DCBX_PARAMS_NONE != dcbx_lldp_params_offset) {
 			bnx2x_dcbx_lldp_updated_params(bp,
 						       dcbx_lldp_params_offset);
@@ -1452,7 +1467,7 @@ static void bnx2x_dcbx_get_ets_pri_pg_tbl(struct bnx2x *bp,
  ******************************************************************************/
 static void bnx2x_pfc_fw_struct_e2(struct bnx2x *bp)
 {
-	struct flow_control_configuration   *pfc_fw_cfg = 0;
+	struct flow_control_configuration   *pfc_fw_cfg = NULL;
 	u16 pri_bit = 0;
 	u8 cos = 0, pri = 0;
 	struct priority_cos *tt2cos;
@@ -1489,3 +1504,615 @@ static void bnx2x_pfc_fw_struct_e2(struct bnx2x *bp)
 	}
 	bnx2x_dcbx_print_cos_params(bp,	pfc_fw_cfg);
 }
+/* DCB netlink */
+#ifdef BCM_DCB
+#include <linux/dcbnl.h>
+
+#define BNX2X_DCBX_CAPS		(DCB_CAP_DCBX_LLD_MANAGED | \
+				DCB_CAP_DCBX_VER_CEE | DCB_CAP_DCBX_STATIC)
+
+static inline bool bnx2x_dcbnl_set_valid(struct bnx2x *bp)
+{
+	/* validate dcbnl call that may change HW state:
+	 * DCB is on and DCBX mode was SUCCESSFULLY set by the user.
+	 */
+	return bp->dcb_state && bp->dcbx_mode_uset;
+}
+
+static u8 bnx2x_dcbnl_get_state(struct net_device *netdev)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "state = %d\n", bp->dcb_state);
+	return bp->dcb_state;
+}
+
+static u8 bnx2x_dcbnl_set_state(struct net_device *netdev, u8 state)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "state = %s\n", state ? "on" : "off");
+
+	bnx2x_dcbx_set_state(bp, (state ? true : false), bp->dcbx_enabled);
+	return 0;
+}
+
+static void bnx2x_dcbnl_get_perm_hw_addr(struct net_device *netdev,
+					 u8 *perm_addr)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "GET-PERM-ADDR\n");
+
+	/* first the HW mac address */
+	memcpy(perm_addr, netdev->dev_addr, netdev->addr_len);
+
+#ifdef BCM_CNIC
+	/* second SAN address */
+	memcpy(perm_addr+netdev->addr_len, bp->fip_mac, netdev->addr_len);
+#endif
+}
+
+static void bnx2x_dcbnl_set_pg_tccfg_tx(struct net_device *netdev, int prio,
+					u8 prio_type, u8 pgid, u8 bw_pct,
+					u8 up_map)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+
+	DP(NETIF_MSG_LINK, "prio[%d] = %d\n", prio, pgid);
+	if (!bnx2x_dcbnl_set_valid(bp) || prio >= DCBX_MAX_NUM_PRI_PG_ENTRIES)
+		return;
+
+	/**
+	 * bw_pct ignored -	band-width percentage division between user
+	 *			priorities within the same group is not
+	 *			standard and hence not supported
+	 *
+	 * prio_type ignored -	priority levels within the same group are not
+	 *			standard and hence are not supported. According
+	 *			to the standard pgid 15 is dedicated to strict
+	 *			priority traffic (on the port level).
+	 *
+	 * up_map ignored
+	 */
+
+	bp->dcbx_config_params.admin_configuration_ets_pg[prio] = pgid;
+	bp->dcbx_config_params.admin_ets_configuration_tx_enable = 1;
+}
+
+static void bnx2x_dcbnl_set_pg_bwgcfg_tx(struct net_device *netdev,
+					 int pgid, u8 bw_pct)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "pgid[%d] = %d\n", pgid, bw_pct);
+
+	if (!bnx2x_dcbnl_set_valid(bp) || pgid >= DCBX_MAX_NUM_PG_BW_ENTRIES)
+		return;
+
+	bp->dcbx_config_params.admin_configuration_bw_precentage[pgid] = bw_pct;
+	bp->dcbx_config_params.admin_ets_configuration_tx_enable = 1;
+}
+
+static void bnx2x_dcbnl_set_pg_tccfg_rx(struct net_device *netdev, int prio,
+					u8 prio_type, u8 pgid, u8 bw_pct,
+					u8 up_map)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "Nothing to set; No RX support\n");
+}
+
+static void bnx2x_dcbnl_set_pg_bwgcfg_rx(struct net_device *netdev,
+					 int pgid, u8 bw_pct)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "Nothing to set; No RX support\n");
+}
+
+static void bnx2x_dcbnl_get_pg_tccfg_tx(struct net_device *netdev, int prio,
+					u8 *prio_type, u8 *pgid, u8 *bw_pct,
+					u8 *up_map)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "prio = %d\n", prio);
+
+	/**
+	 * bw_pct ignored -	bandwidth percentage division between user
+	 *			priorities within the same group is not
+	 *			standard and hence not supported
+	 *
+	 * prio_type ignored -	priority levels within the same group are not
+	 *			standard and hence are not supported. According
+	 *			to the standard pgid 15 is dedicated to strict
+	 *			priority traffic (on the port level).
+	 *
+	 * up_map ignored
+	 */
+	*up_map = *bw_pct = *prio_type = *pgid = 0;
+
+	if (!bp->dcb_state || prio >= DCBX_MAX_NUM_PRI_PG_ENTRIES)
+		return;
+
+	*pgid = DCBX_PRI_PG_GET(bp->dcbx_local_feat.ets.pri_pg_tbl, prio);
+}
+
+static void bnx2x_dcbnl_get_pg_bwgcfg_tx(struct net_device *netdev,
+					 int pgid, u8 *bw_pct)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "pgid = %d\n", pgid);
+
+	*bw_pct = 0;
+
+	if (!bp->dcb_state || pgid >= DCBX_MAX_NUM_PG_BW_ENTRIES)
+		return;
+
+	*bw_pct = DCBX_PG_BW_GET(bp->dcbx_local_feat.ets.pg_bw_tbl, pgid);
+}
+
+static void bnx2x_dcbnl_get_pg_tccfg_rx(struct net_device *netdev, int prio,
+					u8 *prio_type, u8 *pgid, u8 *bw_pct,
+					u8 *up_map)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "Nothing to get; No RX support\n");
+
+	*prio_type = *pgid = *bw_pct = *up_map = 0;
+}
+
+static void bnx2x_dcbnl_get_pg_bwgcfg_rx(struct net_device *netdev,
+					 int pgid, u8 *bw_pct)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "Nothing to get; No RX support\n");
+
+	*bw_pct = 0;
+}
+
+static void bnx2x_dcbnl_set_pfc_cfg(struct net_device *netdev, int prio,
+				    u8 setting)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "prio[%d] = %d\n", prio, setting);
+
+	if (!bnx2x_dcbnl_set_valid(bp) || prio >= MAX_PFC_PRIORITIES)
+		return;
+
+	bp->dcbx_config_params.admin_pfc_bitmap |= ((setting ? 1 : 0) << prio);
+
+	if (setting)
+		bp->dcbx_config_params.admin_pfc_tx_enable = 1;
+}
+
+static void bnx2x_dcbnl_get_pfc_cfg(struct net_device *netdev, int prio,
+				    u8 *setting)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "prio = %d\n", prio);
+
+	*setting = 0;
+
+	if (!bp->dcb_state || prio >= MAX_PFC_PRIORITIES)
+		return;
+
+	*setting = (bp->dcbx_local_feat.pfc.pri_en_bitmap >> prio) & 0x1;
+}
+
+static u8 bnx2x_dcbnl_set_all(struct net_device *netdev)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	int rc = 0;
+
+	DP(NETIF_MSG_LINK, "SET-ALL\n");
+
+	if (!bnx2x_dcbnl_set_valid(bp))
+		return 1;
+
+	if (bp->recovery_state != BNX2X_RECOVERY_DONE) {
+		netdev_err(bp->dev, "Handling parity error recovery. "
+				"Try again later\n");
+		return 1;
+	}
+	if (netif_running(bp->dev)) {
+		bnx2x_nic_unload(bp, UNLOAD_NORMAL);
+		rc = bnx2x_nic_load(bp, LOAD_NORMAL);
+	}
+	DP(NETIF_MSG_LINK, "set_dcbx_params done (%d)\n", rc);
+	if (rc)
+		return 1;
+
+	return 0;
+}
+
+static u8 bnx2x_dcbnl_get_cap(struct net_device *netdev, int capid, u8 *cap)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	u8 rval = 0;
+
+	if (bp->dcb_state) {
+		switch (capid) {
+		case DCB_CAP_ATTR_PG:
+			*cap = true;
+			break;
+		case DCB_CAP_ATTR_PFC:
+			*cap = true;
+			break;
+		case DCB_CAP_ATTR_UP2TC:
+			*cap = false;
+			break;
+		case DCB_CAP_ATTR_PG_TCS:
+			*cap = 0x80;	/* 8 priorities for PGs */
+			break;
+		case DCB_CAP_ATTR_PFC_TCS:
+			*cap = 0x80;	/* 8 priorities for PFC */
+			break;
+		case DCB_CAP_ATTR_GSP:
+			*cap = true;
+			break;
+		case DCB_CAP_ATTR_BCN:
+			*cap = false;
+			break;
+		case DCB_CAP_ATTR_DCBX:
+			*cap = BNX2X_DCBX_CAPS;
+			break;
+		default:
+			rval = -EINVAL;
+			break;
+		}
+	} else
+		rval = -EINVAL;
+
+	DP(NETIF_MSG_LINK, "capid %d:%x\n", capid, *cap);
+	return rval;
+}
+
+static u8 bnx2x_dcbnl_get_numtcs(struct net_device *netdev, int tcid, u8 *num)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	u8 rval = 0;
+
+	DP(NETIF_MSG_LINK, "tcid %d\n", tcid);
+
+	if (bp->dcb_state) {
+		switch (tcid) {
+		case DCB_NUMTCS_ATTR_PG:
+			*num = E2_NUM_OF_COS;
+			break;
+		case DCB_NUMTCS_ATTR_PFC:
+			*num = E2_NUM_OF_COS;
+			break;
+		default:
+			rval = -EINVAL;
+			break;
+		}
+	} else
+		rval = -EINVAL;
+
+	return rval;
+}
+
+static u8 bnx2x_dcbnl_set_numtcs(struct net_device *netdev, int tcid, u8 num)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "num tcs = %d; Not supported\n", num);
+	return -EINVAL;
+}
+
+static u8 bnx2x_dcbnl_get_pfc_state(struct net_device *netdev)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "state = %d\n", bp->dcbx_local_feat.pfc.enabled);
+
+	if (!bp->dcb_state)
+		return 0;
+
+	return bp->dcbx_local_feat.pfc.enabled;
+}
+
+static void bnx2x_dcbnl_set_pfc_state(struct net_device *netdev, u8 state)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "state = %s\n", state ? "on" : "off");
+
+	if (!bnx2x_dcbnl_set_valid(bp))
+		return;
+
+	bp->dcbx_config_params.admin_pfc_tx_enable =
+	bp->dcbx_config_params.admin_pfc_enable = (state ? 1 : 0);
+}
+
+static bool bnx2x_app_is_equal(struct dcbx_app_priority_entry *app_ent,
+			       u8 idtype, u16 idval)
+{
+	if (!(app_ent->appBitfield & DCBX_APP_ENTRY_VALID))
+		return false;
+
+	switch (idtype) {
+	case DCB_APP_IDTYPE_ETHTYPE:
+		if ((app_ent->appBitfield & DCBX_APP_ENTRY_SF_MASK) !=
+			DCBX_APP_SF_ETH_TYPE)
+			return false;
+		break;
+	case DCB_APP_IDTYPE_PORTNUM:
+		if ((app_ent->appBitfield & DCBX_APP_ENTRY_SF_MASK) !=
+			DCBX_APP_SF_PORT)
+			return false;
+		break;
+	default:
+		return false;
+	}
+	if (app_ent->app_id != idval)
+		return false;
+
+	return true;
+}
+
+static void bnx2x_admin_app_set_ent(
+	struct bnx2x_admin_priority_app_table *app_ent,
+	u8 idtype, u16 idval, u8 up)
+{
+	app_ent->valid = 1;
+
+	switch (idtype) {
+	case DCB_APP_IDTYPE_ETHTYPE:
+		app_ent->traffic_type = TRAFFIC_TYPE_ETH;
+		break;
+	case DCB_APP_IDTYPE_PORTNUM:
+		app_ent->traffic_type = TRAFFIC_TYPE_PORT;
+		break;
+	default:
+		break; /* never gets here */
+	}
+	app_ent->app_id = idval;
+	app_ent->priority = up;
+}
+
+static bool bnx2x_admin_app_is_equal(
+	struct bnx2x_admin_priority_app_table *app_ent,
+	u8 idtype, u16 idval)
+{
+	if (!app_ent->valid)
+		return false;
+
+	switch (idtype) {
+	case DCB_APP_IDTYPE_ETHTYPE:
+		if (app_ent->traffic_type != TRAFFIC_TYPE_ETH)
+			return false;
+		break;
+	case DCB_APP_IDTYPE_PORTNUM:
+		if (app_ent->traffic_type != TRAFFIC_TYPE_PORT)
+			return false;
+		break;
+	default:
+		return false;
+	}
+	if (app_ent->app_id != idval)
+		return false;
+
+	return true;
+}
+
+static int bnx2x_set_admin_app_up(struct bnx2x *bp, u8 idtype, u16 idval, u8 up)
+{
+	int i, ff;
+
+	/* iterate over the app entries looking for idtype and idval */
+	for (i = 0, ff = -1; i < 4; i++) {
+		struct bnx2x_admin_priority_app_table *app_ent =
+			&bp->dcbx_config_params.admin_priority_app_table[i];
+		if (bnx2x_admin_app_is_equal(app_ent, idtype, idval))
+			break;
+
+		if (ff < 0 && !app_ent->valid)
+			ff = i;
+	}
+	if (i < 4)
+		/* if found overwrite up */
+		bp->dcbx_config_params.
+			admin_priority_app_table[i].priority = up;
+	else if (ff >= 0)
+		/* not found use first-free */
+		bnx2x_admin_app_set_ent(
+			&bp->dcbx_config_params.admin_priority_app_table[ff],
+			idtype, idval, up);
+	else
+		/* app table is full */
+		return -EBUSY;
+
+	/* up configured, if not 0 make sure feature is enabled */
+	if (up)
+		bp->dcbx_config_params.admin_application_priority_tx_enable = 1;
+
+	return 0;
+}
+
+static u8 bnx2x_dcbnl_set_app_up(struct net_device *netdev, u8 idtype,
+				 u16 idval, u8 up)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+
+	DP(NETIF_MSG_LINK, "app_type %d, app_id %x, prio bitmap %d\n",
+	   idtype, idval, up);
+
+	if (!bnx2x_dcbnl_set_valid(bp))
+		return -EINVAL;
+
+	/* verify idtype */
+	switch (idtype) {
+	case DCB_APP_IDTYPE_ETHTYPE:
+	case DCB_APP_IDTYPE_PORTNUM:
+		break;
+	default:
+		return -EINVAL;
+	}
+	return bnx2x_set_admin_app_up(bp, idtype, idval, up);
+}
+
+static u8 bnx2x_dcbnl_get_app_up(struct net_device *netdev, u8 idtype,
+				 u16 idval)
+{
+	int i;
+	u8 up = 0;
+
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "app_type %d, app_id 0x%x\n", idtype, idval);
+
+	/* iterate over the app entries looking for idtype and idval */
+	for (i = 0; i < DCBX_MAX_APP_PROTOCOL; i++)
+		if (bnx2x_app_is_equal(&bp->dcbx_local_feat.app.app_pri_tbl[i],
+				       idtype, idval))
+			break;
+
+	if (i < DCBX_MAX_APP_PROTOCOL)
+		/* if found return up */
+		up = bp->dcbx_local_feat.app.app_pri_tbl[i].pri_bitmap;
+	else
+		DP(NETIF_MSG_LINK, "app not found\n");
+
+	return up;
+}
+
+static u8 bnx2x_dcbnl_get_dcbx(struct net_device *netdev)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	u8 state;
+
+	state = DCB_CAP_DCBX_LLD_MANAGED | DCB_CAP_DCBX_VER_CEE;
+
+	if (bp->dcbx_enabled == BNX2X_DCBX_ENABLED_ON_NEG_OFF)
+		state |= DCB_CAP_DCBX_STATIC;
+
+	return state;
+}
+
+static u8 bnx2x_dcbnl_set_dcbx(struct net_device *netdev, u8 state)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	DP(NETIF_MSG_LINK, "state = %02x\n", state);
+
+	/* set dcbx mode */
+
+	if ((state & BNX2X_DCBX_CAPS) != state) {
+		BNX2X_ERR("Requested DCBX mode %x is beyond advertised "
+			  "capabilities\n", state);
+		return 1;
+	}
+
+	if (bp->dcb_state != BNX2X_DCB_STATE_ON) {
+		BNX2X_ERR("DCB turned off, DCBX configuration is invalid\n");
+		return 1;
+	}
+
+	if (state & DCB_CAP_DCBX_STATIC)
+		bp->dcbx_enabled = BNX2X_DCBX_ENABLED_ON_NEG_OFF;
+	else
+		bp->dcbx_enabled = BNX2X_DCBX_ENABLED_ON_NEG_ON;
+
+	bp->dcbx_mode_uset = true;
+	return 0;
+}
+
+
+static u8 bnx2x_dcbnl_get_featcfg(struct net_device *netdev, int featid,
+				  u8 *flags)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	u8 rval = 0;
+
+	DP(NETIF_MSG_LINK, "featid %d\n", featid);
+
+	if (bp->dcb_state) {
+		*flags = 0;
+		switch (featid) {
+		case DCB_FEATCFG_ATTR_PG:
+			if (bp->dcbx_local_feat.ets.enabled)
+				*flags |= DCB_FEATCFG_ENABLE;
+			if (bp->dcbx_error & DCBX_LOCAL_ETS_ERROR)
+				*flags |= DCB_FEATCFG_ERROR;
+			break;
+		case DCB_FEATCFG_ATTR_PFC:
+			if (bp->dcbx_local_feat.pfc.enabled)
+				*flags |= DCB_FEATCFG_ENABLE;
+			if (bp->dcbx_error & (DCBX_LOCAL_PFC_ERROR |
+			    DCBX_LOCAL_PFC_MISMATCH))
+				*flags |= DCB_FEATCFG_ERROR;
+			break;
+		case DCB_FEATCFG_ATTR_APP:
+			if (bp->dcbx_local_feat.app.enabled)
+				*flags |= DCB_FEATCFG_ENABLE;
+			if (bp->dcbx_error & (DCBX_LOCAL_APP_ERROR |
+			    DCBX_LOCAL_APP_MISMATCH))
+				*flags |= DCB_FEATCFG_ERROR;
+			break;
+		default:
+			rval = -EINVAL;
+			break;
+		}
+	} else
+		rval = -EINVAL;
+
+	return rval;
+}
+
+static u8 bnx2x_dcbnl_set_featcfg(struct net_device *netdev, int featid,
+				  u8 flags)
+{
+	struct bnx2x *bp = netdev_priv(netdev);
+	u8 rval = 0;
+
+	DP(NETIF_MSG_LINK, "featid = %d flags = %02x\n", featid, flags);
+
+	/* ignore the 'advertise' flag */
+	if (bnx2x_dcbnl_set_valid(bp)) {
+		switch (featid) {
+		case DCB_FEATCFG_ATTR_PG:
+			bp->dcbx_config_params.admin_ets_enable =
+				flags & DCB_FEATCFG_ENABLE ? 1 : 0;
+			bp->dcbx_config_params.admin_ets_willing =
+				flags & DCB_FEATCFG_WILLING ? 1 : 0;
+			break;
+		case DCB_FEATCFG_ATTR_PFC:
+			bp->dcbx_config_params.admin_pfc_enable =
+				flags & DCB_FEATCFG_ENABLE ? 1 : 0;
+			bp->dcbx_config_params.admin_pfc_willing =
+				flags & DCB_FEATCFG_WILLING ? 1 : 0;
+			break;
+		case DCB_FEATCFG_ATTR_APP:
+			/* ignore enable, always enabled */
+			bp->dcbx_config_params.admin_app_priority_willing =
+				flags & DCB_FEATCFG_WILLING ? 1 : 0;
+			break;
+		default:
+			rval = -EINVAL;
+			break;
+		}
+	} else
+		rval = -EINVAL;
+
+	return rval;
+}
+
+const struct dcbnl_rtnl_ops bnx2x_dcbnl_ops = {
+	.getstate       = bnx2x_dcbnl_get_state,
+	.setstate       = bnx2x_dcbnl_set_state,
+	.getpermhwaddr  = bnx2x_dcbnl_get_perm_hw_addr,
+	.setpgtccfgtx   = bnx2x_dcbnl_set_pg_tccfg_tx,
+	.setpgbwgcfgtx  = bnx2x_dcbnl_set_pg_bwgcfg_tx,
+	.setpgtccfgrx   = bnx2x_dcbnl_set_pg_tccfg_rx,
+	.setpgbwgcfgrx  = bnx2x_dcbnl_set_pg_bwgcfg_rx,
+	.getpgtccfgtx   = bnx2x_dcbnl_get_pg_tccfg_tx,
+	.getpgbwgcfgtx  = bnx2x_dcbnl_get_pg_bwgcfg_tx,
+	.getpgtccfgrx   = bnx2x_dcbnl_get_pg_tccfg_rx,
+	.getpgbwgcfgrx  = bnx2x_dcbnl_get_pg_bwgcfg_rx,
+	.setpfccfg      = bnx2x_dcbnl_set_pfc_cfg,
+	.getpfccfg      = bnx2x_dcbnl_get_pfc_cfg,
+	.setall         = bnx2x_dcbnl_set_all,
+	.getcap         = bnx2x_dcbnl_get_cap,
+	.getnumtcs      = bnx2x_dcbnl_get_numtcs,
+	.setnumtcs      = bnx2x_dcbnl_set_numtcs,
+	.getpfcstate    = bnx2x_dcbnl_get_pfc_state,
+	.setpfcstate    = bnx2x_dcbnl_set_pfc_state,
+	.getapp         = bnx2x_dcbnl_get_app_up,
+	.setapp         = bnx2x_dcbnl_set_app_up,
+	.getdcbx        = bnx2x_dcbnl_get_dcbx,
+	.setdcbx        = bnx2x_dcbnl_set_dcbx,
+	.getfeatcfg     = bnx2x_dcbnl_get_featcfg,
+	.setfeatcfg     = bnx2x_dcbnl_set_featcfg,
+};
+
+#endif /* BCM_DCB */
diff --git a/drivers/net/bnx2x/bnx2x_dcb.h b/drivers/net/bnx2x/bnx2x_dcb.h
index 8dea56b..f650f98 100644
--- a/drivers/net/bnx2x/bnx2x_dcb.h
+++ b/drivers/net/bnx2x/bnx2x_dcb.h
@@ -51,7 +51,6 @@ struct bnx2x_dcbx_pfc_params {
 };
 
 struct bnx2x_dcbx_port_params {
-	u32 dcbx_enabled;
 	struct bnx2x_dcbx_pfc_params pfc;
 	struct bnx2x_dcbx_pg_params  ets;
 	struct bnx2x_dcbx_app_params app;
@@ -88,8 +87,6 @@ struct bnx2x_admin_priority_app_table {
  * DCBX protocol configuration parameters.
  ******************************************************************************/
 struct bnx2x_config_dcbx_params {
-	u32 dcb_enable;
-	u32 admin_dcbx_enable;
 	u32 overwrite_settings;
 	u32 admin_dcbx_version;
 	u32 admin_ets_enable;
@@ -182,6 +179,7 @@ struct bnx2x;
 void bnx2x_dcb_init_intmem_pfc(struct bnx2x *bp);
 void bnx2x_dcbx_update(struct work_struct *work);
 void bnx2x_dcbx_init_params(struct bnx2x *bp);
+void bnx2x_dcbx_set_state(struct bnx2x *bp, bool dcb_on, u32 dcbx_enabled);
 
 enum {
 	BNX2X_DCBX_STATE_NEG_RECEIVED = 0x1,
@@ -190,4 +188,9 @@ enum {
 };
 void bnx2x_dcbx_set_params(struct bnx2x *bp, u32 state);
 
+/* DCB netlink */
+#ifdef BCM_DCB
+extern const struct dcbnl_rtnl_ops bnx2x_dcbnl_ops;
+#endif /* BCM_DCB */
+
 #endif /* BNX2X_DCB_H */
diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
index cf54427..489a551 100644
--- a/drivers/net/bnx2x/bnx2x_main.c
+++ b/drivers/net/bnx2x/bnx2x_main.c
@@ -3107,7 +3107,8 @@ static inline void bnx2x_attn_int_deasserted3(struct bnx2x *bp, u32 attn)
 				bnx2x_pmf_update(bp);
 
 			if (bp->port.pmf &&
-			    (val & DRV_STATUS_DCBX_NEGOTIATION_RESULTS))
+			    (val & DRV_STATUS_DCBX_NEGOTIATION_RESULTS) &&
+				bp->dcbx_enabled > 0)
 				/* start dcbx state machine */
 				bnx2x_dcbx_set_params(bp,
 					BNX2X_DCBX_STATE_NEG_RECEIVED);
@@ -8795,6 +8796,7 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
 	bp->timer.data = (unsigned long) bp;
 	bp->timer.function = bnx2x_timer;
 
+	bnx2x_dcbx_set_state(bp, true, BNX2X_DCBX_ENABLED_ON_NEG_ON);
 	bnx2x_dcbx_init_params(bp);
 
 	return rc;
@@ -9146,6 +9148,10 @@ static int __devinit bnx2x_init_dev(struct pci_dev *pdev,
 	dev->vlan_features |= (NETIF_F_TSO | NETIF_F_TSO_ECN);
 	dev->vlan_features |= NETIF_F_TSO6;
 
+#ifdef BCM_DCB
+	dev->dcbnl_ops = &bnx2x_dcbnl_ops;
+#endif
+
 	/* get_port_hwinfo() will set prtad and mmds properly */
 	bp->mdio.prtad = MDIO_PRTAD_NONE;
 	bp->mdio.mmds = 0;
-- 
1.7.1






* [net-next-2.6 PATCH v2 4/4] dcbnl: cleanup
From: Shmulik Ravid @ 2010-12-30 16:27 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, Eilon Greenstein, lucy.liu, netdev

A couple of small cleanups for patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 net/dcb/dcbnl.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index b387816..ff3c12d 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1626,8 +1626,10 @@ u8 dcb_setapp(struct net_device *dev, struct dcb_app *new)
 	if (new->priority) {
 		struct dcb_app_type *entry;
 		entry = kmalloc(sizeof(struct dcb_app_type), GFP_ATOMIC);
-		if (!entry)
+		if (!entry) {
+			spin_unlock(&dcb_lock);
 			return -ENOMEM;
+		}
 
 		memcpy(&entry->app, new, sizeof(*new));
 		strncpy(entry->name, dev->name, IFNAMSIZ);
@@ -1640,7 +1642,7 @@ out:
 }
 EXPORT_SYMBOL(dcb_setapp);
 
-void dcb_flushapp(void)
+static void dcb_flushapp(void)
 {
 	struct dcb_app_type *app;
 
-- 
1.7.1






* [net-next-2.6 PATCH v2 2/4] dcbnl: adding DCBX feature flags get-set
From: Shmulik Ravid @ 2010-12-30 16:26 UTC (permalink / raw)
  To: davem; +Cc: Eilon Greenstein, John Fastabend, lucy.liu, netdev

Adding a pair of set-get routines to dcbnl for setting the negotiation
flags of the various DCB features. Conforms to the CEE flavor of DCBX.
The user sets these flags (enable, advertise, willing) for each feature
to be used by the DCBX engine. The 'get' routine returns which of the
features is enabled after the negotiation.
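
For illustration only, the per-feature flag byte described above could be
composed and inspected as in the sketch below. The flag values are copied
from the include/linux/dcbnl.h hunk of this patch; the helper names
(featcfg_make, featcfg_enabled, featcfg_failed) are hypothetical and are
not part of the kernel or of any existing user-space tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from this patch's include/linux/dcbnl.h hunk. */
#define DCB_FEATCFG_ERROR	0x01	/* error in feature resolution */
#define DCB_FEATCFG_ENABLE	0x02	/* enable feature */
#define DCB_FEATCFG_WILLING	0x04	/* feature is willing */
#define DCB_FEATCFG_ADVERTISE	0x08	/* advertise feature */

/* Hypothetical helper: compose the flags byte handed to a 'set' call. */
static uint8_t featcfg_make(bool enable, bool willing, bool advertise)
{
	uint8_t flags = 0;

	if (enable)
		flags |= DCB_FEATCFG_ENABLE;
	if (willing)
		flags |= DCB_FEATCFG_WILLING;
	if (advertise)
		flags |= DCB_FEATCFG_ADVERTISE;
	return flags;
}

/* Hypothetical helper: did a 'get' call report the feature as enabled? */
static bool featcfg_enabled(uint8_t flags)
{
	return (flags & DCB_FEATCFG_ENABLE) != 0;
}

/* Hypothetical helper: did a 'get' call report a resolution error? */
static bool featcfg_failed(uint8_t flags)
{
	return (flags & DCB_FEATCFG_ERROR) != 0;
}
```

The error bit is only meaningful in the 'get' direction, so the sketch
keeps it out of featcfg_make().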

This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 include/linux/dcbnl.h |   33 ++++++++++++
 include/net/dcbnl.h   |    3 +
 net/dcb/dcbnl.c       |  133 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 169 insertions(+), 0 deletions(-)

diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
index dd5c84e..18cab06 100644
--- a/include/linux/dcbnl.h
+++ b/include/linux/dcbnl.h
@@ -126,6 +126,8 @@ struct dcbmsg {
  * @DCB_CMD_IEEE_GET: get IEEE 802.1Qaz configuration
  * @DCB_CMD_GDCBX: get DCBX engine configuration
  * @DCB_CMD_SDCBX: set DCBX engine configuration
+ * @DCB_CMD_GFEATCFG: get DCBX features flags
+ * @DCB_CMD_SFEATCFG: set DCBX features negotiation flags
  */
 enum dcbnl_commands {
 	DCB_CMD_UNDEFINED,
@@ -165,6 +167,9 @@ enum dcbnl_commands {
 	DCB_CMD_GDCBX,
 	DCB_CMD_SDCBX,
 
+	DCB_CMD_GFEATCFG,
+	DCB_CMD_SFEATCFG,
+
 	__DCB_CMD_ENUM_MAX,
 	DCB_CMD_MAX = __DCB_CMD_ENUM_MAX - 1,
 };
@@ -186,6 +191,7 @@ enum dcbnl_commands {
  * @DCB_ATTR_BCN: backward congestion notification configuration (NLA_NESTED)
  * @DCB_ATTR_IEEE: IEEE 802.1Qaz supported attributes (NLA_NESTED)
  * @DCB_ATTR_DCBX: DCBX engine configuration in the device (NLA_U8)
+ * @DCB_ATTR_FEATCFG: DCBX features flags (NLA_NESTED)
  */
 enum dcbnl_attrs {
 	DCB_ATTR_UNDEFINED,
@@ -207,6 +213,7 @@ enum dcbnl_attrs {
 	DCB_ATTR_IEEE,
 
 	DCB_ATTR_DCBX,
+	DCB_ATTR_FEATCFG,
 
 	__DCB_ATTR_ENUM_MAX,
 	DCB_ATTR_MAX = __DCB_ATTR_ENUM_MAX - 1,
@@ -495,4 +502,30 @@ enum dcbnl_app_attrs {
 	DCB_APP_ATTR_MAX = __DCB_APP_ATTR_ENUM_MAX - 1,
 };
 
+/**
+ * enum dcbnl_featcfg_attrs - features configuration flags
+ *
+ * @DCB_FEATCFG_ATTR_UNDEFINED: unspecified attribute to catch errors
+ * @DCB_FEATCFG_ATTR_ALL: (NLA_FLAG) all features configuration attributes
+ * @DCB_FEATCFG_ATTR_PG: (NLA_U8) configuration flags for priority groups
+ * @DCB_FEATCFG_ATTR_PFC: (NLA_U8) configuration flags for priority
+ *                                 flow control
+ * @DCB_FEATCFG_ATTR_APP: (NLA_U8) configuration flags for application TLV
+ *
+ */
+#define DCB_FEATCFG_ERROR	0x01	/* error in feature resolution */
+#define DCB_FEATCFG_ENABLE	0x02	/* enable feature */
+#define DCB_FEATCFG_WILLING	0x04	/* feature is willing */
+#define DCB_FEATCFG_ADVERTISE	0x08	/* advertise feature */
+enum dcbnl_featcfg_attrs {
+	DCB_FEATCFG_ATTR_UNDEFINED,
+	DCB_FEATCFG_ATTR_ALL,
+	DCB_FEATCFG_ATTR_PG,
+	DCB_FEATCFG_ATTR_PFC,
+	DCB_FEATCFG_ATTR_APP,
+
+	__DCB_FEATCFG_ATTR_ENUM_MAX,
+	DCB_FEATCFG_ATTR_MAX = __DCB_FEATCFG_ATTR_ENUM_MAX - 1,
+};
+
 #endif /* __LINUX_DCBNL_H__ */
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index c65347b..a8e7852 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -70,11 +70,14 @@ struct dcbnl_rtnl_ops {
 	void (*setbcnrp)(struct net_device *, int, u8);
 	u8   (*setapp)(struct net_device *, u8, u16, u8);
 	u8   (*getapp)(struct net_device *, u8, u16);
+	u8   (*getfeatcfg)(struct net_device *, int, u8 *);
+	u8   (*setfeatcfg)(struct net_device *, int, u8);
 
 	/* DCBX configuration */
 	u8   (*getdcbx)(struct net_device *);
 	u8   (*setdcbx)(struct net_device *, u8);
 
+
 };
 
 #endif /* __NET_DCBNL_H__ */
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 4603691..b387816 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -69,6 +69,7 @@ static const struct nla_policy dcbnl_rtnl_policy[DCB_ATTR_MAX + 1] = {
 	[DCB_ATTR_APP]         = {.type = NLA_NESTED},
 	[DCB_ATTR_IEEE]	       = {.type = NLA_NESTED},
 	[DCB_ATTR_DCBX]        = {.type = NLA_U8},
+	[DCB_ATTR_FEATCFG]     = {.type = NLA_NESTED},
 };
 
 /* DCB priority flow control to User Priority nested attributes */
@@ -182,6 +183,14 @@ static const struct nla_policy dcbnl_ieee_app[DCB_ATTR_IEEE_APP_MAX + 1] = {
 	[DCB_ATTR_IEEE_APP]	    = {.len = sizeof(struct dcb_app)},
 };
 
+/* DCB feature configuration flags nested attributes. */
+static const struct nla_policy dcbnl_featcfg_nest[DCB_FEATCFG_ATTR_MAX + 1] = {
+	[DCB_FEATCFG_ATTR_ALL]      = {.type = NLA_FLAG},
+	[DCB_FEATCFG_ATTR_PG]       = {.type = NLA_U8},
+	[DCB_FEATCFG_ATTR_PFC]      = {.type = NLA_U8},
+	[DCB_FEATCFG_ATTR_APP]      = {.type = NLA_U8},
+};
+
 static LIST_HEAD(dcb_app_list);
 static DEFINE_SPINLOCK(dcb_lock);
 
@@ -1306,6 +1315,122 @@ static int dcbnl_setdcbx(struct net_device *netdev, struct nlattr **tb,
 	return ret;
 }
 
+static int dcbnl_getfeatcfg(struct net_device *netdev, struct nlattr **tb,
+			    u32 pid, u32 seq, u16 flags)
+{
+	struct sk_buff *dcbnl_skb;
+	struct nlmsghdr *nlh;
+	struct dcbmsg *dcb;
+	struct nlattr *data[DCB_FEATCFG_ATTR_MAX + 1], *nest;
+	u8 value;
+	int ret = -EINVAL;
+	int i;
+	int getall = 0;
+
+	if (!tb[DCB_ATTR_FEATCFG] || !netdev->dcbnl_ops->getfeatcfg)
+		return ret;
+
+	ret = nla_parse_nested(data, DCB_FEATCFG_ATTR_MAX, tb[DCB_ATTR_FEATCFG],
+			       dcbnl_featcfg_nest);
+	if (ret) {
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	dcbnl_skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!dcbnl_skb) {
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	nlh = NLMSG_NEW(dcbnl_skb, pid, seq, RTM_GETDCB, sizeof(*dcb), flags);
+
+	dcb = NLMSG_DATA(nlh);
+	dcb->dcb_family = AF_UNSPEC;
+	dcb->cmd = DCB_CMD_GFEATCFG;
+
+	nest = nla_nest_start(dcbnl_skb, DCB_ATTR_FEATCFG);
+	if (!nest) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	if (data[DCB_FEATCFG_ATTR_ALL])
+		getall = 1;
+
+	for (i = DCB_FEATCFG_ATTR_ALL+1; i <= DCB_FEATCFG_ATTR_MAX; i++) {
+		if (!getall && !data[i])
+			continue;
+
+		ret = netdev->dcbnl_ops->getfeatcfg(netdev, i, &value);
+		if (!ret) {
+			ret = nla_put_u8(dcbnl_skb, i, value);
+
+			if (ret) {
+				nla_nest_cancel(dcbnl_skb, nest);
+				ret = -EINVAL;
+				goto err;
+			}
+		} else
+			goto err;
+	}
+	nla_nest_end(dcbnl_skb, nest);
+
+	nlmsg_end(dcbnl_skb, nlh);
+
+	ret = rtnl_unicast(dcbnl_skb, &init_net, pid);
+	if (ret) {
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	return 0;
+nlmsg_failure:
+err:
+	kfree_skb(dcbnl_skb);
+err_out:
+	return ret;
+}
+
+static int dcbnl_setfeatcfg(struct net_device *netdev, struct nlattr **tb,
+			    u32 pid, u32 seq, u16 flags)
+{
+	struct nlattr *data[DCB_FEATCFG_ATTR_MAX + 1];
+	int ret = -EINVAL;
+	u8 value;
+	int i;
+
+	if (!tb[DCB_ATTR_FEATCFG] || !netdev->dcbnl_ops->setfeatcfg)
+		return ret;
+
+	ret = nla_parse_nested(data, DCB_FEATCFG_ATTR_MAX, tb[DCB_ATTR_FEATCFG],
+			       dcbnl_featcfg_nest);
+
+	if (ret) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	for (i = DCB_FEATCFG_ATTR_ALL+1; i <= DCB_FEATCFG_ATTR_MAX; i++) {
+		if (data[i] == NULL)
+			continue;
+
+		value = nla_get_u8(data[i]);
+
+		ret = netdev->dcbnl_ops->setfeatcfg(netdev, i, value);
+
+		if (ret)
+			goto operr;
+	}
+
+operr:
+	ret = dcbnl_reply(!!ret, RTM_SETDCB, DCB_CMD_SFEATCFG,
+			  DCB_ATTR_FEATCFG, pid, seq, flags);
+
+err:
+	return ret;
+}
+
 static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1427,6 +1552,14 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		ret = dcbnl_setdcbx(netdev, tb, pid, nlh->nlmsg_seq,
 				    nlh->nlmsg_flags);
 		goto out;
+	case DCB_CMD_GFEATCFG:
+		ret = dcbnl_getfeatcfg(netdev, tb, pid, nlh->nlmsg_seq,
+				       nlh->nlmsg_flags);
+		goto out;
+	case DCB_CMD_SFEATCFG:
+		ret = dcbnl_setfeatcfg(netdev, tb, pid, nlh->nlmsg_seq,
+				       nlh->nlmsg_flags);
+		goto out;
 	default:
 		goto errout;
 	}
-- 
1.7.1






* [net-next-2.6 PATCH v2 1/4] dcbnl: adding DCBX engine capability
From: Shmulik Ravid @ 2010-12-30 16:26 UTC (permalink / raw)
  To: davem; +Cc: Eilon Greenstein, John Fastabend, lucy.liu, netdev

Adding an optional DCBX capability and a pair of get-set routines for
setting the device DCBX mode. The DCBX capability is a bit field of
supported attributes. The user is expected to set the DCBX mode with a
subset of the advertised attributes.
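
As a rough sketch of the "subset of the advertised attributes" rule, a
driver or tool could validate a requested mode as shown below. The flag
values are the ones introduced by this patch; the function name
dcbx_mode_is_supported and the example capability mask are assumptions
made for this illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* DCBX capability flags as introduced by this patch. */
#define DCB_CAP_DCBX_HOST		0x01
#define DCB_CAP_DCBX_LLD_MANAGED	0x02
#define DCB_CAP_DCBX_VER_CEE		0x04
#define DCB_CAP_DCBX_VER_IEEE		0x08
#define DCB_CAP_DCBX_STATIC		0x10

/*
 * Hypothetical helper: a requested mode is acceptable only if every bit
 * set in 'requested' is also set in the advertised capability mask.
 */
static bool dcbx_mode_is_supported(uint8_t caps, uint8_t requested)
{
	return (requested & caps) == requested;
}
```

With an assumed capability mask of LLD_MANAGED | VER_CEE | STATIC (0x16),
a request for LLD_MANAGED | VER_CEE passes, while a request that includes
VER_IEEE is rejected.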

This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 include/linux/dcbnl.h |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/net/dcbnl.h   |    5 +++++
 net/dcb/dcbnl.c       |   43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 0 deletions(-)

diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
index 9071e1c..dd5c84e 100644
--- a/include/linux/dcbnl.h
+++ b/include/linux/dcbnl.h
@@ -124,6 +124,8 @@ struct dcbmsg {
  * @DCB_CMD_SAPP: set application protocol configuration
  * @DCB_CMD_IEEE_SET: set IEEE 802.1Qaz configuration
  * @DCB_CMD_IEEE_GET: get IEEE 802.1Qaz configuration
+ * @DCB_CMD_GDCBX: get DCBX engine configuration
+ * @DCB_CMD_SDCBX: set DCBX engine configuration
  */
 enum dcbnl_commands {
 	DCB_CMD_UNDEFINED,
@@ -160,6 +162,9 @@ enum dcbnl_commands {
 	DCB_CMD_IEEE_SET,
 	DCB_CMD_IEEE_GET,
 
+	DCB_CMD_GDCBX,
+	DCB_CMD_SDCBX,
+
 	__DCB_CMD_ENUM_MAX,
 	DCB_CMD_MAX = __DCB_CMD_ENUM_MAX - 1,
 };
@@ -180,6 +185,7 @@ enum dcbnl_commands {
  * @DCB_ATTR_NUMTCS: number of traffic classes supported (NLA_NESTED)
  * @DCB_ATTR_BCN: backward congestion notification configuration (NLA_NESTED)
  * @DCB_ATTR_IEEE: IEEE 802.1Qaz supported attributes (NLA_NESTED)
+ * @DCB_ATTR_DCBX: DCBX engine configuration in the device (NLA_U8)
  */
 enum dcbnl_attrs {
 	DCB_ATTR_UNDEFINED,
@@ -200,6 +206,8 @@ enum dcbnl_attrs {
 	/* IEEE std attributes */
 	DCB_ATTR_IEEE,
 
+	DCB_ATTR_DCBX,
+
 	__DCB_ATTR_ENUM_MAX,
 	DCB_ATTR_MAX = __DCB_ATTR_ENUM_MAX - 1,
 };
@@ -359,6 +367,8 @@ enum dcbnl_tc_attrs {
  * @DCB_CAP_ATTR_GSP: (NLA_U8) device supports group strict priority
  * @DCB_CAP_ATTR_BCN: (NLA_U8) device supports Backwards Congestion
  *                             Notification
+ * @DCB_CAP_ATTR_DCBX: (NLA_U8) device supports DCBX engine
+ *
  */
 enum dcbnl_cap_attrs {
 	DCB_CAP_ATTR_UNDEFINED,
@@ -370,12 +380,45 @@ enum dcbnl_cap_attrs {
 	DCB_CAP_ATTR_PFC_TCS,
 	DCB_CAP_ATTR_GSP,
 	DCB_CAP_ATTR_BCN,
+	DCB_CAP_ATTR_DCBX,
 
 	__DCB_CAP_ATTR_ENUM_MAX,
 	DCB_CAP_ATTR_MAX = __DCB_CAP_ATTR_ENUM_MAX - 1,
 };
 
 /**
+ * DCBX capability flags
+ *
+ * @DCB_CAP_DCBX_HOST: DCBX negotiation is performed by the host LLDP agent.
+ *                     'set' routines are used to configure the device with
+ *                     the negotiated parameters
+ *
+ * @DCB_CAP_DCBX_LLD_MANAGED: DCBX negotiation is not performed in the host but
+ *                            by another entity
+ *                            'get' routines are used to retrieve the
+ *                            negotiated parameters
+ *                            'set' routines can be used to set the initial
+ *                            negotiation configuration
+ *
+ * @DCB_CAP_DCBX_VER_CEE: for a non-host DCBX engine, indicates the engine
+ *                        supports the CEE protocol flavor
+ *
+ * @DCB_CAP_DCBX_VER_IEEE: for a non-host DCBX engine, indicates the engine
+ *                         supports the IEEE protocol flavor
+ *
+ * @DCB_CAP_DCBX_STATIC: for a non-host DCBX engine, indicates the engine
+ *                       supports static configuration (i.e. no actual
+ *                       negotiation is performed; negotiated parameters equal
+ *                       the initial configuration)
+ *
+ */
+#define DCB_CAP_DCBX_HOST		0x01
+#define DCB_CAP_DCBX_LLD_MANAGED	0x02
+#define DCB_CAP_DCBX_VER_CEE		0x04
+#define DCB_CAP_DCBX_VER_IEEE		0x08
+#define DCB_CAP_DCBX_STATIC		0x10
+
+/**
  * enum dcbnl_numtcs_attrs - number of traffic classes
  *
  * @DCB_NUMTCS_ATTR_UNDEFINED: unspecified attribute to catch errors
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index ab7d623..c65347b 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -70,6 +70,11 @@ struct dcbnl_rtnl_ops {
 	void (*setbcnrp)(struct net_device *, int, u8);
 	u8   (*setapp)(struct net_device *, u8, u16, u8);
 	u8   (*getapp)(struct net_device *, u8, u16);
+
+	/* DCBX configuration */
+	u8   (*getdcbx)(struct net_device *);
+	u8   (*setdcbx)(struct net_device *, u8);
+
 };
 
 #endif /* __NET_DCBNL_H__ */
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 5f1d0ee..4603691 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -68,6 +68,7 @@ static const struct nla_policy dcbnl_rtnl_policy[DCB_ATTR_MAX + 1] = {
 	[DCB_ATTR_BCN]         = {.type = NLA_NESTED},
 	[DCB_ATTR_APP]         = {.type = NLA_NESTED},
 	[DCB_ATTR_IEEE]	       = {.type = NLA_NESTED},
+	[DCB_ATTR_DCBX]        = {.type = NLA_U8},
 };
 
 /* DCB priority flow control to User Priority nested attributes */
@@ -124,6 +125,7 @@ static const struct nla_policy dcbnl_cap_nest[DCB_CAP_ATTR_MAX + 1] = {
 	[DCB_CAP_ATTR_PFC_TCS] = {.type = NLA_U8},
 	[DCB_CAP_ATTR_GSP]     = {.type = NLA_U8},
 	[DCB_CAP_ATTR_BCN]     = {.type = NLA_U8},
+	[DCB_CAP_ATTR_DCBX]    = {.type = NLA_U8},
 };
 
 /* DCB capabilities nested attributes. */
@@ -1271,6 +1273,39 @@ nlmsg_failure:
 	return -1;
 }
 
+/* DCBX configuration */
+static int dcbnl_getdcbx(struct net_device *netdev, struct nlattr **tb,
+			 u32 pid, u32 seq, u16 flags)
+{
+	int ret = -EINVAL;
+
+	if (!netdev->dcbnl_ops->getdcbx)
+		return ret;
+
+	ret = dcbnl_reply(netdev->dcbnl_ops->getdcbx(netdev), RTM_GETDCB,
+			  DCB_CMD_GDCBX, DCB_ATTR_DCBX, pid, seq, flags);
+
+	return ret;
+}
+
+static int dcbnl_setdcbx(struct net_device *netdev, struct nlattr **tb,
+			 u32 pid, u32 seq, u16 flags)
+{
+	int ret = -EINVAL;
+	u8 value;
+
+	if (!tb[DCB_ATTR_DCBX] || !netdev->dcbnl_ops->setdcbx)
+		return ret;
+
+	value = nla_get_u8(tb[DCB_ATTR_DCBX]);
+
+	ret = dcbnl_reply(netdev->dcbnl_ops->setdcbx(netdev, value),
+			  RTM_SETDCB, DCB_CMD_SDCBX, DCB_ATTR_DCBX,
+			  pid, seq, flags);
+
+	return ret;
+}
+
 static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1384,6 +1419,14 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		ret = dcbnl_ieee_get(netdev, tb, pid, nlh->nlmsg_seq,
 				 nlh->nlmsg_flags);
 		goto out;
+	case DCB_CMD_GDCBX:
+		ret = dcbnl_getdcbx(netdev, tb, pid, nlh->nlmsg_seq,
+				    nlh->nlmsg_flags);
+		goto out;
+	case DCB_CMD_SDCBX:
+		ret = dcbnl_setdcbx(netdev, tb, pid, nlh->nlmsg_seq,
+				    nlh->nlmsg_flags);
+		goto out;
 	default:
 		goto errout;
 	}
-- 
1.7.1





^ permalink raw reply related

* [net-next-2.6 PATCH v2 0/4] dcbnl: Extending dcbnl to support non-host DCBX
From: Shmulik Ravid @ 2010-12-30 16:26 UTC (permalink / raw)
  To: davem; +Cc: Eilon Greenstein, John Fastabend, lucy.liu, netdev

DCBX is the exchange protocol for negotiating DCB parameters between a
host and a switch. In some circumstances the DCBX negotiator cannot
reside in the host directly on top of the networking interface. For
instance many converged network adapters support an embedded DCBX engine
that performs the negotiation and configures the device with the
negotiated parameters. Another example is virtual hosts and SRIOV
virtual functions.

The following patches extend the dcbnl netlink interface so that, in
addition to its current semantics, it offers a standard mechanism for
managing such non-host DCBX engines. In this new mode 'set' operations
are used to set the initial negotiation configuration and 'get'
operations are used to retrieve the negotiated results.
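
To make the proposed semantics concrete, here is a small user-space sketch
in C. It is not the kernel code from this series. It models a device with an
embedded DCBX engine: 'set' stores the configuration the engine starts
negotiating from, and 'get' returns whatever the engine ended up with. Every
name in it (struct fake_dcbx_engine, fake_negotiate() and so on) is
hypothetical, and reducing the negotiation to a bitwise AND is purely an
assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an adapter-embedded DCBX engine. */
struct fake_dcbx_engine {
	uint8_t admin_pfc_en;	/* initial configuration written by 'set' */
	uint8_t oper_pfc_en;	/* negotiated result read back by 'get' */
};

/* Hypothetical ops table; a NULL hook means "not supported". */
struct fake_dcb_ops {
	int (*setpfc)(struct fake_dcbx_engine *e, uint8_t pfc_en);
	int (*getpfc)(const struct fake_dcbx_engine *e, uint8_t *pfc_en);
};

static int fake_setpfc(struct fake_dcbx_engine *e, uint8_t pfc_en)
{
	e->admin_pfc_en = pfc_en;
	return 0;
}

static int fake_getpfc(const struct fake_dcbx_engine *e, uint8_t *pfc_en)
{
	*pfc_en = e->oper_pfc_en;
	return 0;
}

/* Assumed stand-in for negotiation: keep only what the peer also enables. */
static void fake_negotiate(struct fake_dcbx_engine *e, uint8_t peer_pfc_en)
{
	e->oper_pfc_en = e->admin_pfc_en & peer_pfc_en;
}

/* Dispatch helpers: a missing hook yields -EINVAL. */
static int fake_dcb_set(const struct fake_dcb_ops *ops,
			struct fake_dcbx_engine *e, uint8_t pfc_en)
{
	assert(ops != NULL && e != NULL);
	if (!ops->setpfc)
		return -EINVAL;
	return ops->setpfc(e, pfc_en);
}

static int fake_dcb_get(const struct fake_dcb_ops *ops,
			const struct fake_dcbx_engine *e, uint8_t *pfc_en)
{
	assert(ops != NULL && e != NULL && pfc_en != NULL);
	if (!ops->getpfc)
		return -EINVAL;
	return ops->getpfc(e, pfc_en);
}
```

In this model a caller would do fake_dcb_set(), let the engine run
fake_negotiate(), and only then read the operational value with
fake_dcb_get(); the value read back can differ from the value written.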

The current definition of dcbnl assumes a host based stack is performing
the negotiation. As DCBX is encapsulated by LLDP there can be only one
DCBX negotiator per physical link. Thus the current scheme does not
allow the coexistence of CNAs that have embedded DCBX engines and CNAs
that do not, or even just retrieving DCB parameters by VF drivers.

The last patch adds an implementation of the dcbnl operations in their
proposed new semantics to the bnx2x, allowing users to configure and
manage the embedded DCBX engine.

The entire patch series is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers




^ permalink raw reply

* Re: [uClinux-dev] Re: dm9000 patch
From: Baruch Siach @ 2010-12-30 15:12 UTC (permalink / raw)
  To: Angelo Dureghello
  Cc: Greg Ungerer, uClinux development list, Geert Uytterhoeven,
	netdev, linux-kernel, linux-m68k
In-Reply-To: <4D1C9FA5.6040208@gmail.com>

Hi Angelo,

On Thu, Dec 30, 2010 at 04:05:09PM +0100, Angelo Dureghello wrote:
> Hi Greg and All,
> many thanks, I finally solved my issue and the board is fully working.
> 
> I had picked the IRQ7 line by chance, and unfortunately it was the wrong choice.
> IRQ7 is a special autovectored interrupt; in particular, it is an
> EDGE interrupt, not a LEVEL interrupt like IRQ1 to 6.
> 
> I use IRQ1 now, a normal level interrupt, and everything works perfectly.
> 
> So I have the dm9000 and MCF5307 (big-endian CPU) fully working with a 32-bit
> bus wired straight D0:31 to the dm9000, and would like to share
> my patch to dm9000.c, so that in future kernel versions the dm9000
> could perhaps be enabled by default. Please let me know the procedure
> for this.

Just send the patch to the netdev mailing list. For additional instructions 
see Documentation/SubmitChecklist and Documentation/SubmittingPatches. Once 
you have submitted your patch, you can track its status at 
http://patchwork.ozlabs.org/project/netdev/list/.

baruch

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply

* Re: [uClinux-dev] Re: dm9000 patch
From: Angelo Dureghello @ 2010-12-30 15:05 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: uClinux development list, Geert Uytterhoeven, netdev,
	Baruch Siach, linux-kernel, linux-m68k
In-Reply-To: <4D1C6A30.6020706@snapgear.com>

Hi Greg and All,
many thanks, I finally solved my issue and the board is fully working.

I had picked the IRQ7 line by chance, and unfortunately it was the wrong choice.
IRQ7 is a special autovectored interrupt; in particular, it is an EDGE 
interrupt, not a LEVEL interrupt like IRQ1 to 6.

I use IRQ1 now, a normal level interrupt, and everything works perfectly.
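
The edge versus level distinction can be illustrated with a toy model in C.
This is only a sketch of the general idea, not a model of the MCF5307
interrupt controller; the function names and the sampled-line representation
are assumptions made for the example. An edge-triggered input reports a
request only on a deasserted-to-asserted transition, so a line that stays
asserted produces a single request, while a level-sensitive input keeps
reporting for as long as the line is asserted.

```c
#include <assert.h>
#include <stddef.h>

/*
 * line[i] != 0 means the device is asserting its interrupt at sample i.
 */

/* Edge-triggered: only a deasserted -> asserted transition counts. */
static unsigned int count_edge(const int *line, size_t n)
{
	unsigned int hits = 0;
	int prev = 0;
	size_t i;

	assert(line != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (line[i] && !prev)
			hits++;
		prev = line[i] != 0;
	}
	return hits;
}

/* Level-sensitive: a request is seen at every sample where the line is asserted. */
static unsigned int count_level(const int *line, size_t n)
{
	unsigned int hits = 0;
	size_t i;

	assert(line != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (line[i])
			hits++;
	return hits;
}
```

For the trace {0, 1, 1, 1, 0, 1} the edge model sees two requests and the
level model sees four, which is why a device that holds its line asserted
across several events is easier to service on a level-sensitive input.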

So I have the dm9000 and MCF5307 (big-endian CPU) fully working with a 32-bit 
bus wired straight D0:31 to the dm9000, and would like to share my patch 
to dm9000.c, so that in future kernel versions the dm9000 could perhaps be 
enabled by default. Please let me know the procedure for this.

still many thanks,
regards,
angelo




On 30/12/2010 12:17, Greg Ungerer wrote:
>
> Hi Angelo,
>
> On 30/12/10 19:59, Angelo Dureghello wrote:
> [snip]
>> I physically connected the HW interrupt pin of the dm9000 to the MCF5307 IRQ7
>> pin (pin 68). The dm9000 is configured (through a resistor to 3.3V on pin 57)
>> not in its default mode, but to signal with a HIGH-to-LOW interrupt edge, as
>> the MCF5307 expects, and the interrupt line is pulled up to 3.3V to avoid
>> flickering.
>>
>>
>> PULL UP RES to 3.3V
>>
>> dm9000 | |
>> IRQ |---------+-------------------| MCF5307 PIN 68 (IRQ7)
>>
>>
>> IRQ 7 is the "level 7" autovectored interrupt (vect 31 dec).
>>
>> Checking the MCF5307 datasheet carefully, I see that the "level 7" interrupt
>> I happened to choose seems to be a special level:
>
> Yes, IRQ7 is "special" on the 5307. It is non-maskable. So normal
> linux locking will be broken if you use it in the normal way.
>
>
>> /18.7.1 Level 7 Interrupts
>> Level 7 interrupts are nonmaskable and are handled differently than
>> other interrupts.
>> Level 7 interrupts are edge triggered by a transition from a lower
>> priority request to the
>> level 7 request. Interrupts at all other levels are level sensitive.
>> Therefore, if IRQ7 remains
>> asserted, the MCF5307 recognizes only one level 7 interrupt because only
>> one transition
>> from a lower level request to a level 7 request occurred. For the
>> processor to
>> recognize two consecutive level 7 interrupts, one of the following must
>> occur:
>>
>> 1) The interrupt request on the interrupt control pins is raised to
>> level 7 and stays there
>> until an interrupt-acknowledge cycle begins. The level later drops but
>> then returns to
>> level 7, causing a second transition on the interrupt control lines.
>>
>> 2) The interrupt request on the interrupt control pins is raised to
>> level 7 and stays there.
>> If the level 7 interrupt routine lowers the mask level, a second level 7
>> interrupt is
>> recognized without a transition of the interrupt control pins. After the
>> level 7 routine
>> completes, the MCF5307 compares the mask level to the request level on
>> the IRQx
>> signals. Because the mask level is lower than the requested level, the
>> interrupt mask
>> is set back to level 7. To ensure it is recognized, the level 7 request
>> on IRQ7 must be
>> held until the second interrupt-acknowledge bus cycle begins./
>>
>> I guess I can try to use another IRQ line, for example IRQ1, and see what
>> happens. Let me know your thoughts and I can try right now to hw wire up
>> the fix.
>
> Yes, I suggest you try using another IRQ line. Stay away from IRQ7
> for any normal devices.
>
> Regards
> Greg
>
>
>
>> still many thanks,
>>
>> regards,
>> angelo
>>
>>
>> On 30/12/2010 01:37, Greg Ungerer wrote:
>>> Hi Angelo,
>>>
>>> On 30/12/10 06:57, Angelo Dureghello wrote:
>>>> Hi all,
>>>> thanks for the help,
>>>> the kernel is a main line kernel. Then yes, i am still using uclinux
>>>> tree for libc/tools.
>>>
>>> How is the DM9000 hardware connected to the 5307?
>>> I am wondering how you connected the interrupt (and to
>>> which interrupt) and the addressing (direct or a chip select)?
>>>
>>> (For example NETtel based 5307 platform support of the SMC91x code is
>>> in mainline as arch/m68knommu/platform/5307/nettel.c). Can you show
>>> the code you used to setup your dm9000 hardware?
>>> (Specifically I guess I want to know if you use the "auto-vectored"
>>> interrupt mode?)
>>>
>>> Thanks
>>> Greg
>>>
>>>
>>>> I collected another spinlock recursion with a slightly different call
>>>> stack trace, as always, the spinlock recursion issue happen on a high
>>>> tx/rx traffic of the dm9000e, in this case just asking an index.html
>>>> with some images and texts:
>>>>
>>>> [ 1108.930000] BUG: spinlock recursion on CPU#0, httpd/29
>>>> [ 1108.930000] lock: 00c42c06, .magic: dead4ead, .owner: httpd/29,
>>>> .owner_cpu: 0
>>>> [ 1108.930000] Stack from 00d7a688:
>>>> [ 1108.930000] 00d7a6b4 000ad988 001840ca 00c42c06 dead4ead 00d641d4
>>>> 0000001d 00000000
>>>> [ 1108.930000] 00c42c06 000064f0 00c42800 00d7a6e8 000adb5a 00c42c06
>>>> 00184130 00002704
>>>> [ 1108.930000] 00000000 0000001f 0014d17e 00159912 00c42b60 000064f0
>>>> 00c42800 0002cb16
>>>> [ 1108.930000] 00d7a6f8 0014d24e 00c42c06 00000000 00d7a738 000e485c
>>>> 00c42c06 00000000
>>>> [ 1108.930000] 00000000 0000001f 0014d17e 00159912 0000004a 00cfc600
>>>> 000064f0 00009a74
>>>> [ 1108.930000] 0002cb16 00191204 00d7a760 0002b6f2 00d7a760 0002b514
>>>> 0000001f 00c42800
>>>> [ 1108.930000] Call Trace:
>>>> [ 1108.930000] [000ad988] spin_bug+0x86/0x11a
>>>> [ 1108.930000] [000adb5a] do_raw_spin_lock+0x58/0x120
>>>> [ 1108.930000] [0014d24e] _raw_spin_lock_irqsave+0x28/0x32
>>>> [ 1108.930000] [000e485c] dm9000_interrupt+0x1a/0x2e0
>>>> [ 1108.930000] [0002b514] handle_IRQ_event+0x2a/0xec
>>>> [ 1108.930000] [0002b680] __do_IRQ+0xaa/0x128
>>>> [ 1108.930000] [00000bb6] do_IRQ+0x48/0x62
>>>> [ 1108.930000] [000033c6] inthandler+0x6a/0x74
>>>> [ 1108.930000] [000fb626] dev_hard_start_xmit+0x170/0x4c4
>>>> [ 1108.930000] [0010b80e] sch_direct_xmit+0xc0/0x1bc
>>>> [ 1108.930000] [000fe9de] dev_queue_xmit+0x160/0x3e6
>>>> [ 1108.930000] [001195c4] ip_finish_output+0xec/0x320
>>>> [ 1108.930000] [0011a768] ip_output+0x9e/0xa8
>>>> [ 1108.930000] [00119856] ip_local_out+0x26/0x30
>>>> [ 1108.930000] [0011a56e] ip_build_and_send_pkt+0x16e/0x178
>>>> [ 1108.930000] [0012fc96] tcp_v4_send_synack+0x52/0x90
>>>> [ 1108.930000] [00130f86] tcp_v4_conn_request+0x3fa/0x57c
>>>> [ 1108.930000] [0012a1c6] tcp_rcv_state_process+0x25e/0xa66
>>>> [ 1108.930000] [001309a4] tcp_v4_do_rcv+0x7c/0x1c8
>>>> [ 1108.930000] [00132854] tcp_v4_rcv+0x546/0x6d2
>>>> [ 1108.930000] [001153a8] ip_local_deliver+0x9c/0x1b0
>>>> [ 1108.930000] [001158e8] ip_rcv+0x42c/0x5f0
>>>> [ 1108.930000] [000fa74e] __netif_receive_skb+0x196/0x2ec
>>>> [ 1108.930000] [000fe142] process_backlog+0x72/0x11e
>>>> [ 1108.930000] [000fe290] net_rx_action+0xa2/0x150
>>>> [ 1108.930000] [0000e13c] __do_softirq+0x74/0xe4
>>>> [ 1108.930000] [0000e1e2] do_softirq+0x36/0x40
>>>> [ 1108.930000] [0000e6c6] local_bh_enable+0x7a/0xa4
>>>> [ 1108.930000] [000fe972] dev_queue_xmit+0xf4/0x3e6
>>>> [ 1108.930000] [001195c4] ip_finish_output+0xec/0x320
>>>> [ 1108.930000] [0011a768] ip_output+0x9e/0xa8
>>>> [ 1108.930000] [00119856] ip_local_out+0x26/0x30
>>>> [ 1108.930000] [0011a90a] ip_queue_xmit+0x198/0x426
>>>> [ 1108.930000] [0012bcc8] tcp_transmit_skb+0x3f0/0x76c
>>>> [ 1108.930000] [0012cfda] tcp_write_xmit+0x178/0x868
>>>> [ 1108.930000] [0012d6f8] __tcp_push_pending_frames+0x2e/0x9a
>>>> [ 1108.930000] [001222be] tcp_sendmsg+0x82e/0x98c
>>>> [ 1108.930000] [0013d9c0] inet_sendmsg+0x32/0x54
>>>> [ 1108.930000] [000ec25e] sock_aio_write+0xc8/0x138
>>>> [ 1108.930000] [00043e7e] do_sync_write+0x9e/0xfe
>>>> [ 1108.930000] [00043f56] vfs_write+0x78/0x84
>>>> [ 1108.930000] [0004446c] sys_write+0x40/0x7a
>>>> [ 1108.930000] [00003244] system_call+0x84/0xc2
>>>> [ 1108.930000]
>>>>
>>>> It seems that while I transmit a packet, dm9000_interrupt tries to acquire
>>>> the spinlock already owned by the same task.
>>>>
>>>> Compiling the kernel i am getting:
>>>> CC kernel/irq/handle.o
>>>> kernel/irq/handle.c:432:3: warning: #warning __do_IRQ is deprecated.
>>>> Please convert to proper flow handlers
>>>>
>>>> Could the usage of __do_IRQ super-handler be a cause of the issue ?
>>>>
>>>>
>>>> many thanks,
>>>> angelo
>>>>
>>>> On 29/12/2010 19:45, Geert Uytterhoeven wrote:
>>>>> On Wed, Dec 29, 2010 at 19:06, Baruch Siach<baruch@tkos.co.il> wrote:
>>>>>> Hi Angelo,
>>>>>>
>>>>>> On Wed, Dec 29, 2010 at 02:13:22PM +0100, Angelo Dureghello wrote:
>>>>>>> just FYI, I tested kernel 2.6.36.2; unfortunately the issue is still
>>>>>>> there. The call stack trace is below.
>>>>>> Help from the m68k experts seems to be needed. Adding the relevant
>>>>>> list to Cc.
>>>>> This is uClinux? Added Cc...
>>>>>
>>>>>>> [ 4.620000] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
>>>>>>> [ 39.390000] BUG: spinlock recursion on CPU#0, httpd/29
>>>>>>> [ 39.390000] lock: 00189c44, .magic: dead4ead, .owner: httpd/29,
>>>>>>> .owner_cpu: 0
>>>>>>> [ 39.390000] Stack from 00d6a990:
>>>>>>> [ 39.390000] 00d6a9bc 000a9710 0017cac7 00189c44 dead4ead
>>>>>>> 00de48f4 0000001d 00000000
>>>>>>> [ 39.390000] 00189c44 0002a646 00145f70 00d6a9f0 000a98e2
>>>>>>> 00189c44 0017cb2d 00189c44
>>>>>>> [ 39.390000] 00d6aad8 0000001f 00145f5c 001523f6 00189c08
>>>>>>> 0002a646 00145f70 0002bc52
>>>>>>> [ 39.390000] 00d6a9fc 00145f7e 00189c44 00d6aa28 0002a75e
>>>>>>> 00189c44 0000001f 00d6aad8
>>>>>>> [ 39.390000] 0000001f 00145f5c 00189c08 0002a646 00145f70
>>>>>>> 0002bc52 00d6aa3c 00000bb6
>>>>>>> [ 39.390000] 0000001f 00189c44 00cfc780 00d6aa84 0000337a
>>>>>>> 0000001f 00d6aa4c 00000001
>>>>>>> [ 39.390000] Call Trace:
>>>>>>> [ 39.390000] [000a9710] spin_bug+0x86/0x11a
>>>>>>> [ 39.390000] [000a98e2] do_raw_spin_lock+0x58/0x120
>>>>>>> [ 39.390000] [00145f7e] _raw_spin_lock+0xe/0x14
>>>>>>> [ 39.390000] [0002a75e] __do_IRQ+0x2c/0x108
>>>>>>> [ 39.390000] [00000bb6] do_IRQ+0x48/0x62
>>>>>>> [ 39.390000] [0000337a] inthandler+0x6a/0x74
>>>>>>> [ 39.390000] [0002a82e] __do_IRQ+0xfc/0x108
>>>>>>> [ 39.390000] [00000bb6] do_IRQ+0x48/0x62
>>>>>>> [ 39.390000] [0000337a] inthandler+0x6a/0x74
>>>>>>> [ 39.390000] [000ef0ce] skb_release_all+0x10/0x20
>>>>>>> [ 39.390000] [000ee6bc] __kfree_skb+0x10/0x92
>>>>>>> [ 39.390000] [000ee75e] consume_skb+0x20/0x34
>>>>>>> [ 39.390000] [000e004e] dm9000_start_xmit+0xdc/0xec
>>>>>>> [ 39.390000] [000f67a2] dev_hard_start_xmit+0x146/0x472
>>>>>>> [ 39.390000] [00106506] sch_direct_xmit+0xc0/0x1bc
>>>>>>> [ 39.390000] [000f9914] dev_queue_xmit+0x160/0x3e4
>>>>>>> [ 39.390000] [00113b3e] ip_finish_output+0xee/0x318
>>>>>>> [ 39.390000] [001142b4] ip_output+0x7c/0x88
>>>>>>> [ 39.390000] [00113dc6] ip_local_out+0x26/0x30
>>>>>>> [ 39.390000] [00114d9a] ip_queue_xmit+0x152/0x374
>>>>>>> [ 39.390000] [00125c8c] tcp_transmit_skb+0x3f0/0x732
>>>>>>> [ 39.390000] [00126f26] tcp_write_xmit+0x178/0x868
>>>>>>> [ 39.390000] [00127644] __tcp_push_pending_frames+0x2e/0x9a
>>>>>>> [ 39.390000] [0011c3d6] tcp_sendmsg+0x82e/0x98c
>>>>>>> [ 39.390000] [00137544] inet_sendmsg+0x32/0x54
>>>>>>> [ 39.390000] [000e79a6] sock_aio_write+0xc8/0x138
>>>>>>> [ 39.390000] [00042590] do_sync_write+0x9e/0xfe
>>>>>>> [ 39.390000] [00042668] vfs_write+0x78/0x84
>>>>>>> [ 39.390000] [00042a92] sys_write+0x40/0x7a
>>>>>>> [ 39.390000] [00003224] system_call+0x84/0xc2
>>>>>>> [ 39.390000]
>>>>>>>
>>>>>>> dm9000e is by default not visible/selectable in menuconfig for
>>>>>>> Coldfire architectures, so this probably cannot be considered a
>>>>>>> kernel bug.
>>>>>>>
>>>>>>> I am continuing to investigate; any help is appreciated,
>>>>>>>
>>>>>>> regards,
>>>>>>> angelo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 29/12/2010 07:06, Baruch Siach wrote:
>>>>>>>> Hi Angelo,
>>>>>>>>
>>>>>>>> On Tue, Dec 28, 2010 at 10:52:42PM +0100, Angelo Dureghello wrote:
>>>>>>>>> sorry to contact you directly but i couldn't get any help from 
>>>>>>>>> the
>>>>>>>>> kernel.org mailing list, since i am not a developer my mails are
>>>>>>>>> generally skipped.
>>>>>>>> The best way to get the contact info for a piece of kernel 
>>>>>>>> code, is
>>>>>>>> using the
>>>>>>>> get_maintainer.pl script. Running 'scripts/get_maintainer.pl -f
>>>>>>>> drivers/net/dm9000.c' gives the following output:
>>>>>>>>
>>>>>>>> netdev@vger.kernel.org
>>>>>>>> linux-kernel@vger.kernel.org
>>>>>>>>
>>>>>>>> I added both to Cc.
>>>>>>>>
>>>>>>>>> I am very near to have a custom board working with MCF5307 cpu 
>>>>>>>>> and
>>>>>>>>> dm9000.
>>>>>>>>> I am using kernel 2.6.36-rc3 with your last patch about
>>>>>>>>> spinlock-recursion already included.
>>>>>>>> You should try to update to the latest .36 kernel, which is
>>>>>>>> currently
>>>>>>>> 2.6.36.2. The problem that you experience might be unrelated to 
>>>>>>>> the
>>>>>>>> dm9000
>>>>>>>> driver (or to networking at all), so it might have been fixed in
>>>>>>>> this version.
>>>>>>>>
>>>>>>>>> I have "ping" and "telnet" to the embedded board fully working.
>>>>>>>>> If i try to get a sample web page with some images from the board
>>>>>>>>> httpd with a browser, in 80% of cases i get a trap/oops:
>>>>>>>> Try to enable KALLSYMS in your kernel .config to make your stack
>>>>>>>> trace more
>>>>>>>> meaningful. This is under 'General setup -> Configure standard
>>>>>>>> kernel features
>>>>>>>> (for small systems) -> Load all symbols for debugging/ksymoops'.
>>>>>>>>
>>>>>>>> I hope this helps.
>>>>>>>>
>>>>>>>> baruch
>>>>>>>>
>>>>>>>>> [ 4.590000] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
>>>>>>>>> [ 67.630000] BUG: spinlock recursion on CPU#0, httpd/29
>>>>>>>>> [ 67.630000] lock: 00c42c06, .magic: dead4ead, .owner: httpd/29,
>>>>>>>>> .owner_cpu: 0
>>>>>>>>> [ 67.630000] Stack from 00d7b914:
>>>>>>>>> [ 67.630000] 00d7b940 000a8cf0 0015f693 00c42c06 dead4ead
>>>>>>>>> 00dec1d4 0000001d 00000000
>>>>>>>>> [ 67.630000] 00c42c06 00006188 00c42800 00d7b974 000a8ec2
>>>>>>>>> 00c42c06 0015f6f9 00002704
>>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 00c42b60
>>>>>>>>> 00006188 00c42800 0002b312
>>>>>>>>> [ 67.630000] 00d7b984 0014701e 00c42c06 00000000 00d7b9c4
>>>>>>>>> 000df21c 00c42c06 00000000
>>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 000005ea
>>>>>>>>> 00cfc640 00006188 000096e8
>>>>>>>>> [ 67.630000] 0002b312 00146fa4 00c42b60 00002704 00d7b9ec
>>>>>>>>> 00029d3a 0000001f 00c42800
>>>>>>>>> [ 67.630000] Call Trace:
>>>>>>>>> [ 67.630000] [000a8cf0] [000a8ec2] [0014701e] [000df21c] 
>>>>>>>>> [00029d3a]
>>>>>>>>> [ 67.630000] [00029e84] [00000bb6] [0000336e] [000df162] 
>>>>>>>>> [000effd6]
>>>>>>>>> [ 67.630000] [00100482] [000f312e] [000f9ebc] [0010dd2a] 
>>>>>>>>> [0010e4a0]
>>>>>>>>> [ 67.630000] [0010dfb2] [0010ef80] [0011fed6] [00121170] 
>>>>>>>>> [0012188e]
>>>>>>>>> [ 67.630000] [0011ecc6] [001249fe] [000e4084] [0011621c] 
>>>>>>>>> [00131a44]
>>>>>>>>> [ 67.630000] [000e11ee] [00041944] [00041a1c] [00041e46] 
>>>>>>>>> [00003218]
>>>>>>>>> [ 67.630000] BUG: spinlock lockup on CPU#0, httpd/29, 00c42c06
>>>>>>>>> [ 67.630000] Stack from 00d7b934:
>>>>>>>>> [ 67.630000] 00d7b974 000a8f66 0015f703 00000000 00dec1d4
>>>>>>>>> 0000001d 00c42c06 00002704
>>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 00c42b60
>>>>>>>>> 00006188 00c42800 0002b312
>>>>>>>>> [ 67.630000] 00d7b984 0014701e 00c42c06 00000000 00d7b9c4
>>>>>>>>> 000df21c 00c42c06 00000000
>>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 000005ea
>>>>>>>>> 00cfc640 00006188 000096e8
>>>>>>>>> [ 67.630000] 0002b312 00146fa4 00c42b60 00002704 00d7b9ec
>>>>>>>>> 00029d3a 0000001f 00c42800
>>>>>>>>> [ 67.630000] 0016c1b4 00cfc640 0000001f 0016c178 00029d10
>>>>>>>>> 00146fb8 00d7ba20 00029e84
>>>>>>>>> [ 67.630000] Call Trace:
>>>>>>>>> [ 67.630000] [000a8f66] [0014701e] [000df21c] [00029d3a] 
>>>>>>>>> [00029e84]
>>>>>>>>> [ 67.630000] [00000bb6] [0000336e] [000df162] [000effd6] 
>>>>>>>>> [00100482]
>>>>>>>>> [ 67.630000] [000f312e] [000f9ebc] [0010dd2a] [0010e4a0] 
>>>>>>>>> [0010dfb2]
>>>>>>>>> [ 67.630000] [0010ef80] [0011fed6] [00121170] [0012188e] 
>>>>>>>>> [0011ecc6]
>>>>>>>>> [ 67.630000] [001249fe] [000e4084] [0011621c] [00131a44] 
>>>>>>>>> [000e11ee]
>>>>>>>>> [ 67.630000] [00041944] [00041a1c] [00041e46] [00003218]
>>>>>>>>>
>>>>>>>>> As i said, i was hoping in your patch but i sadly discovered 
>>>>>>>>> it is
>>>>>>>>> already included in this kernel version.
>>>>>>>>> Hope you can give me some help or can forward me to an 
>>>>>>>>> appropriate
>>>>>>>>> mailing list.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

^ permalink raw reply

* [PATCH net-next-2.6] sfq: fix slot_dequeue_head()
From: Eric Dumazet @ 2010-12-30 15:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jarek Poplawski
In-Reply-To: <20101222073211.GA7001@ff.dom.local>

Le mercredi 22 décembre 2010 à 07:32 +0000, Jarek Poplawski a écrit :
> > Also, slot_dequeue_tail() should make sure slot skb chain is correctly
> > terminated, or sfq_dump_class_stats() can access freed skbs.
> 
> ...and a good hint for code reusing ;-)

Yes, and of course the same fix is needed in slot_dequeue_head(), as
further testing on my side made pretty clear.

I was adding the possibility to have more packets queued in SFQ (more
packets than the max number of flows) and got unexpected crashes.

Reverting to net-next-2.6, I still got crashes. Oops.

[PATCH net-next-2.6] sfq: fix slot_dequeue_head()

slot_dequeue_head() should make sure slot skb chain is correct in both
ways, or we can crash if all possible flows are in use.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
---
 net/sched/sch_sfq.c |    1 +
 1 files changed, 1 insertion(+)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 6a2f88f..3977e56 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -292,6 +292,7 @@ static inline struct sk_buff *slot_dequeue_head(struct sfq_slot *slot)
 	struct sk_buff *skb = slot->skblist_next;
 
 	slot->skblist_next = skb->next;
+	skb->next->prev = (struct sk_buff *)slot;
 	skb->next = skb->prev = NULL;
 	return skb;
 }
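
The invariant behind this one-line fix can be shown with a user-space
sketch. It is a simplified model, not the sch_sfq code: a plain sentinel
node stands in for the slot, and struct node and the helper names are
hypothetical. In a circular doubly linked list, removing the first element
has to repair the prev pointer of the new first element; otherwise a later
removal from the tail follows a stale pointer to the element that was
already dequeued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb linked into a slot. */
struct node {
	struct node *next;
	struct node *prev;
	int id;
};

/* An empty list is a sentinel that points at itself in both directions. */
static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void enqueue_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static struct node *dequeue_head(struct node *head)
{
	struct node *n = head->next;

	assert(n != head);	/* caller must not dequeue from an empty list */
	head->next = n->next;
	n->next->prev = head;	/* the step corresponding to the added line */
	n->next = n->prev = NULL;
	return n;
}

static struct node *dequeue_tail(struct node *head)
{
	struct node *n = head->prev;

	assert(n != head);
	head->prev = n->prev;
	n->prev->next = head;	/* keep the chain terminated at the sentinel */
	n->next = n->prev = NULL;
	return n;
}
```

With two nodes queued, dequeue_head() followed by dequeue_tail() leaves the
sentinel pointing at itself in both directions. If the marked line is
dropped, the second node's prev still refers to the first one, and
dequeue_tail() would then write that stale pointer back into the head.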



^ permalink raw reply related

* Buffer-bloat (was Re: IPTV buffering)
From: Jesper Dangaard Brouer @ 2010-12-30 14:53 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Netfilter Developer Mailing List, netdev
In-Reply-To: <Pine.LNX.4.64.1012211634330.21593@ask.diku.dk>

On Tue, 21 Dec 2010, Jesper Dangaard Brouer wrote:

>
> On Thu, 16 Dec 2010, Jan Engelhardt wrote:
>>  On Thursday 2010-12-16 10:57, Jesper Dangaard Brouer wrote:
>> 
>> >  [...] NetConf 2010, see:
>> > 
>> >  http://vger.kernel.org/netconf2010.html
>>
>>  I just went over a few slide sets, and noticed Dave's Netfilter summary
>>  about your IPTV talk, enlisting the point
>>
>>  * Ethernet switches buffer too small
>>
>>  ("too small".. "too few"?) Given the recent uproar about bufferbloat in
>>  routing devices (see LWN coverage about Getty's articles), wanting
>>  larger buffers seems to almost contradict what Getty would like.
>
> Always wanting small buffers doesn't make sense.  It seems that he is not 
> considering that network equipment can be used for other things than TCP/IP.

I have created a blogpost:
  http://netoptimizer.blogspot.com/2010/12/buffer-bloat-calculations.html

Where I explain how it makes sense to have small buffers on links with a 
small bandwidth.

  - ISPs need to adjust the buffer size according to the bandwidth of the 
link.
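
The arithmetic behind sizing a buffer to the link is simple: the worst-case
queueing delay of a full buffer is its size in bits divided by the link
rate. A minimal C sketch follows; the function names are made up for this
example and the figures used with it are illustrative, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds needed to drain a full buffer at the given link rate. */
static uint64_t drain_time_ms(uint64_t buffer_bytes, uint64_t link_bps)
{
	assert(link_bps > 0);
	return buffer_bytes * 8u * 1000u / link_bps;
}

/* Buffer size in bytes that keeps the worst-case delay at target_ms. */
static uint64_t buffer_for_delay(uint64_t target_ms, uint64_t link_bps)
{
	return target_ms * link_bps / (8u * 1000u);
}
```

For example, a 262144-byte (256 KiB) buffer takes about 20 ms to drain at
100 Mbit/s but 4096 ms at 512 kbit/s, and holding a 512 kbit/s link to a
50 ms worst case would allow only 3200 bytes of buffer.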


>>  Wanting more buffers vs. wanting less buffering seems to be quite
>>  contradictory. Jesper, what is your take on this?
>
> Skimming through Getty's blog post, I think Getty has actually missed what is 
> happening.  He should read my masters thesis[1]... The real problem is that 
> TCP/IP is clocked by the ACK packets, and on asymmetric links (like ADSL and 
> DOCSIS), the ACK packets are simply coming downstream too fast on the larger 
> downstream link, resulting in bursts and high-latency on the upstream link.

Adjusting my statement: the asymmetric ACK issue might be part of the 
issue, causing the packets to queue in the buffer.

The buffer-bloat issue is very true and a real-life issue.  ISPs need to 
adjust the buffers according to the bandwidth on the link!

Cheers,
   Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

^ permalink raw reply

* Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time
From: Rob Landley @ 2010-12-30 12:52 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Rob Landley, Trond Myklebust, J. Bruce Fields, Neil Brown,
	Pavel Emelyanov, linux-nfs, David S. Miller, netdev, linux-kernel
In-Reply-To: <20101230114514.GA31976@shutemov.name>

On 12/30/2010 05:45 AM, Kirill A. Shutemov wrote:
> On Thu, Dec 30, 2010 at 05:05:22AM -0600, Rob Landley wrote:
>> On Thu, Dec 30, 2010 at 4:44 AM, Kirill A. Shutemov<kas@openvz.org>  wrote:
>>> On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
>>>> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
>>>>>>> If no rpcmount mountoption, no rpc_pipefs was found at
>>>>>>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
>>>>>>> init_rpc_pipefs.
>>>>>>
>>>>>> It's the "we are in init's mount namespace" that I was wondering about.
>>>>>>
>>>>>> So if I naively chroot, nfs mount stops working the way it did before I
>>>>>> chrooted unless I do an extra setup step?
>>>>>
>>>>> No. It will work as before since you are still in init's mount namespace.
>>>>> Creating new mount namespace changes rules.
>>>>
>>>> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs.  Got it.
>>>>
>>>> I'm kind of surprised that the kernel cares about a specific path under
>>>> /var/lib.  (Seems like policy in the kernel somehow.)
>>>
>>> Yep. It's bad, but there is way to overwrite the default.
>>>
>>> Other way is to leave 'rpcmount' mountoption without default.
>>> get_rpc_pipefs(NULL) in init's mount namespace will always return
>>> init_rpc_pipefs, without filesystem lookup.
>>> get_rpc_pipefs(NULL) in non-init's mount namespace will always return
>>> error.
>>>
>>> So you will have to specify 'rpcmount' mountoption for every nfs mount in
>>> container. Hmm, I guess, it may confuse user.
>>>
>>> Or we can try to move the default to userspace. /sbin/mount.nfs?
>>
>> /proc/sys/kernel/hotplug exists to tell the kernel where to find the hotplug
>> binary.  Once upon a time /sys/hotplug was the default value, and that was
>> there to overwrite it.  (They changed the default to blank (disabled) not due
>> to policy reasons, but due to adding the netlink hotplug notification
>> mechanism and making that the default.)
>>
>> I bring that up to point out that the general consensus about policy in the
>> kernel seems to be "when you really really can't avoid having any, make a
>> sane default the user can override".
>>
>> (Of course adding another entry to the crawling horror of /proc may not
>> be an improvement.  But individual overrides at the mount -o level seem
>> like a non-optimal granularity for this...)
>
> Do you propose to implement default as sysctl parameter?

I was pointing out it's been done before.

I'd prefer autodetecting it so new namespaces and the base namespace 
don't have magic policy _or_ require different mount invocations.  An 
ability to change the default for a value is less appealing than not 
needing the value in the first place.

And changing the default would probably have to be per-container anyway 
to be useful.  (Which isn't _quite_ the same as per-namespace since you 
can chroot without CLONE_NEWNS.)

(I keep thinking back to web service providers offering cheap web 
hosting "with root access" via openvz containers and such.  They're 
administering their own boxes, but aren't big iron guys.  This is yet 
another thing for them to understand that didn't apply to the linux box 
they have at home, and I'm just wondering if there's a way they don't 
have to.)

>>>> Can't it just
>>>> check the current process's mount list to see if an instance of
>>>> rpc_pipefs is mounted in the current namespace the way lxc looks for
>>>> cgroups?  Or are there potential performance/scalability issues with that?
>>>
>>> What should we do if we have several rpc_pipefs mounts in the namespace?
>>
>> You mean more than one inside a given process's view of the filesystem, taking
>> into account chroot like /proc/mounts does?
>>
>> Before this patch series, there was one instance systemwide.  The patch changed
>> that to look at a fixed location in the filesystem relative to the current
>> chroot.  Either way, there was one instance available to a given process
>> doing an nfs mount.
>>
>> What's the use case for having more than one visible to a given process?
>> (NUMA scalability?  Some sort of multipath/VPN routing context?)
>
> It's not so obvious to me why we should restrict it. ;)

You can still provide a specific location with "-o rpcmount=/blah", 
correct?  So this isn't restricting it, this is autodetecting the 
default value, using the visible mount point of the appropriate type.

> Currently, there is no association between rpc_pipefs and mount namespace,

There is in that the root context doesn't need to have this mounted, and 
new namespaces do.  So there's an existing association between a LACK of 
a namespace and a different default behavior.

My understanding (correct me if I'm wrong) is that the historical 
behavior is that there's only one, and it doesn't actually live anywhere 
in the filesystem tree.  You're adding a special location.  I'm 
wondering if there's any way for that location not to be special.

> so I don't see simple way to restrict number of rpc_pipefs per mount
> namespace. Associating mount namespace with rpc_pipefs is not a good idea,
> I think.

I'm talking about associating a default rpc_pipefs instance with a 
namespace, which it seems to me you're already doing by emulating the 
legacy behavior.  Before you CLONE_NEWNS you get a magic default mount 
that doesn't exist in the tree.  After you CLONE_NEWNS you get something 
like -EINVAL unless you supply your own default.  (I'm actually not sure 
why new namespaces don't fall back to the magic global one...)

I'm suggesting that if the user doesn't specify -o rpcmount then the 
default could be the first rpc_pipefs mount visible to the current 
process context, rather than a specific path.  Logic to do that exists 
in the proc/self/mounts code (which I'm reading through now...).
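
As a rough user-space analogue of that idea (not the in-kernel lookup, which
would use different interfaces), the first visible mount of a given type can
be found by scanning a mounts table with the glibc <mntent.h> helpers. The
function name first_mount_of_type() is made up for this sketch.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the mount point of the first entry of type fstype listed in
 * mounts_path into dir. Returns 0 on success, -1 if the file could not
 * be opened, no entry matched, or the path did not fit into dir.
 */
static int first_mount_of_type(const char *mounts_path, const char *fstype,
			       char *dir, size_t len)
{
	struct mntent *m;
	FILE *f;
	int ret = -1;

	assert(mounts_path && fstype && dir && len > 0);

	f = setmntent(mounts_path, "r");
	if (!f)
		return -1;

	while ((m = getmntent(f)) != NULL) {
		if (strcmp(m->mnt_type, fstype) != 0)
			continue;
		/* Only the first match is considered, as the default. */
		if (strlen(m->mnt_dir) < len) {
			strcpy(dir, m->mnt_dir);
			ret = 0;
		}
		break;
	}
	endmntent(f);
	return ret;
}
```

Calling first_mount_of_type("/proc/self/mounts", "rpc_pipefs", buf,
sizeof(buf)) from inside a chroot or a new mount namespace reports only what
that process can see, which is the property the suggested default relies on.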

(Your 00/12 post doesn't actually explain what can be _different_ about 
the various instances of rpc_pipefs, and hence why you'd want to mount 
it multiple times.  I'm still coming up to speed on the guts of NFS. 
The use case I'm trying to fix involves containers with different 
network routing than the host, and this looks like potentially part of 
the solution to that, but I'm still putting together enough context to 
work out how....)

Rob

^ permalink raw reply

* Re: [PATCH v4] Gemini: Gigabit ethernet driver
From: Hans Ulli Kroll @ 2010-12-30 11:48 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: Hans Ulli Kroll, gemini-board-dev, netdev, Christoph Biedl
In-Reply-To: <20101230083905.5A8EB13909@rere.qmqm.pl>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 865 bytes --]



On Thu, 30 Dec 2010, Michał Mirosław wrote:

> Driver for SL351x (Gemini) SoC ethernet peripheral. Based in part
> on work by Paulius Zaleckas and GPLd code from Raidsonic and other
> NAS vendors.
> 
> Tested on Raidsonic IcyBox 4220-B (dual SATA NAS).
> 
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
> 

Hi Michał,

I've done a quick compile and run test on my IB4220.
Looks good.

But we must include the file gmac.h from 
arch/arm/mach-gemini/include/mach.
The build breaks if we don't.

The other issue is the __raw_writel and __raw_readl functions; we can switch 
to readl and writel. They are there because of a mistake by the original 
developer (Stormlinksemi), who didn't understand endianness on ARM.
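
To illustrate the difference (a userspace sketch only, operating on plain memory rather than MMIO; the toy_* names are made up): the __raw_ accessors do a native-endian access with no byte swapping, while readl/writel treat the register as little-endian and convert to CPU byte order.

```c
#include <stdint.h>
#include <string.h>

/* Native-endian load: models what a "raw" accessor returns. */
uint32_t toy_raw_read32(const void *addr)
{
	uint32_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

/* Little-endian load: the result does not depend on host byte order. */
uint32_t toy_le_read32(const void *addr)
{
	const uint8_t *p = addr;

	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

On a little-endian ARM kernel both return the same value, which is probably why the raw variants appeared to work; on a big-endian build only the second form keeps giving the value the hardware intends.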

I've only tested this driver in 100 Mbit/s mode on my IB4220; its PHY is broken.

I have another IB4220 at home with a working PHY.

Ulli


^ permalink raw reply

* Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time
From: Kirill A. Shutemov @ 2010-12-30 11:45 UTC (permalink / raw)
  To: Rob Landley
  Cc: Kirill A. Shutemov, Rob Landley, Trond Myklebust, J. Bruce Fields,
	Neil Brown, Pavel Emelyanov, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <AANLkTim2QrkSW0HufD5wp=-8ikwydN5SUS+fdWK6JHqb-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu, Dec 30, 2010 at 05:05:22AM -0600, Rob Landley wrote:
> On Thu, Dec 30, 2010 at 4:44 AM, Kirill A. Shutemov <kas-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> > On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
> >> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
> >> >>> If no rpcmount mountoption, no rpc_pipefs was found at
> >> >>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
> >> >>> init_rpc_pipefs.
> >> >>
> >> >> It's the "we are in init's mount namespace" that I was wondering about.
> >> >>
> >> >> So if I naievely chroot, nfs mount stops working the way it did before I
> >> >> chrooted unless I do an extra setup step?
> >> >
> >> > No. It will work as before since you are still in init's mount namespace.
> >> > Creating new mount namespace changes rules.
> >>
> >> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs.  Got it.
> >>
> >> I'm kind of surprised that the kernel cares about a specific path under
> >> /var/lib.  (Seems like policy in the kernel somehow.)
> >
> > Yep. It's bad, but there is way to overwrite the default.
> >
> > Other way is to leave 'rpcmount' mountoption without default.
> > get_rpc_pipefs(NULL) in init's mount namespace will always return
> > init_rpc_pipefs, without filesystem lookup.
> > get_rpc_pipefs(NULL) in non-init's mount namespace will always return
> > error.
> >
> > So you will have to specify 'rpcmount' mountoption for every nfs mount in
> > container. Hmm, I guess, it may confuse user.
> >
> > Or we can try to move the default to userspace. /sbin/mount.nfs?
> 
> /proc/sys/kernel/hotplug exists to tell the kernel where to find the hotplug
> binary.  Once upon a time /sbin/hotplug was the default value, and that was
> there to overwrite it.  (They changed the default to blank (disabled) not due
> to policy reasons, but due to adding the netlink hotplug notification
> mechanism and making that the default.)
> 
> I bring that up to point out that the general consensus about policy in the
> kernel seems to be "when you really really can't avoid having any, make a
> sane default the user can override".
> 
> (Of course adding another entry to the crawling horror of /proc may not
> be an improvement.  But individual overrides at the mount -o level seem
> like a non-optimal granularity for this...)

Do you propose to implement default as sysctl parameter?

> >> Can't it just
> >> check the current process's mount list to see if an instance of
> >> rpc_pipefs is mounted in the current namespace the way lxc looks for
> >> cgroups?  Or are there potential performance/scalability issues with that?
> >
> > What should we do if we have several rpc_pipefs mounts in the namespace?
> 
> You mean more than one inside a given process's view of the filesystem, taking
> into account chroot like /proc/mounts does?
> 
> Before this patch series, there was one instance systemwide.  The patch changed
> that to look at a fixed location in the filesystem relative to the current
> chroot.  Either way, there was one instance available to a given process doing
> an nfs mount.
> 
> What's the use case for having more than one visible to a given process?
> (NUMA scalability?  Some sort of multipath/VPN routing context?)

It's not so obvious to me why we should restrict it. ;)

Currently, there is no association between rpc_pipefs and mount namespace,
so I don't see simple way to restrict number of rpc_pipefs per mount
namespace. Associating mount namespace with rpc_pipefs is not a good idea,
I think.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-2.6] bridge: fix br_multicast_ipv6_rcv for paged skbs
From: Tomas Winkler @ 2010-12-30 11:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-wireless, Tomas Winkler, Johannes Berg
In-Reply-To: <AANLkTi=XQ2DE==ggJHfokUE1=opDVOFxQNeqwY1xZN_8@mail.gmail.com>

use pskb_may_pull to access the header correctly for paged skbs

the pskb_may_pull idiom is used in ipv6 header parsing
but was omitted in the bridge code

this fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=25202
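
For readers unfamiliar with the idiom, here is a userspace toy model (not the kernel implementation; struct toy_skb and toy_may_pull() are invented for illustration). It assumes a packet whose first bytes sit in a directly addressable head buffer while the rest lives in a separate fragment, and shows why a length check-and-pull has to come before dereferencing a header pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_HEAD_ROOM 64

/*
 * Toy packet: head_len bytes are directly addressable in head[];
 * the remaining frag_len bytes live in a separate fragment.
 */
struct toy_skb {
	unsigned char head[TOY_HEAD_ROOM];
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Make sure the first len bytes are in head[], copying from the fragment
 * if needed. Returns false if the packet is too short or head[] too small.
 */
bool toy_may_pull(struct toy_skb *skb, size_t len)
{
	size_t need;

	if (len <= skb->head_len)
		return true;
	if (len > skb->head_len + skb->frag_len || len > TOY_HEAD_ROOM)
		return false;

	need = len - skb->head_len;
	memcpy(skb->head + skb->head_len, skb->frag, need);
	skb->head_len += need;
	skb->frag += need;
	skb->frag_len -= need;
	return true;
}
```

Code that reads head[n] without first calling toy_may_pull(skb, n + 1) may be looking past the addressable part of the packet, which is the class of bug the BUG at skbuff.h:1178 below is reporting.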

Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: authenticated
Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: associated (aid 2)
Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 RADIUS: starting accounting session 4D0608A3-00000005
Dec 15 14:36:41 User-PC kernel: [175576.120287] ------------[ cut here ]------------
Dec 15 14:36:41 User-PC kernel: [175576.120452] kernel BUG at include/linux/skbuff.h:1178!
Dec 15 14:36:41 User-PC kernel: [175576.120609] invalid opcode: 0000 [#1] SMP
Dec 15 14:36:41 User-PC kernel: [175576.120749] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
Dec 15 14:36:41 User-PC kernel: [175576.121035] Modules linked in: oprofile binfmt_misc bridge stp llc parport_pc ppdev arc4 iwlagn snd_hda_codec_realtek iwlcore i915 snd_hda_intel mac80211 joydev snd_hda_codec snd_hwdep snd_pcm snd_seq_midi drm_kms_helper snd_rawmidi drm snd_seq_midi_event snd_seq snd_timer snd_seq_device cfg80211 eeepc_wmi usbhid psmouse intel_agp i2c_algo_bit intel_gtt uvcvideo agpgart videodev sparse_keymap snd shpchp v4l1_compat lp hid video serio_raw soundcore output snd_page_alloc ahci libahci atl1c
Dec 15 14:36:41 User-PC kernel: [175576.122712]
Dec 15 14:36:41 User-PC kernel: [175576.122769] Pid: 0, comm: kworker/0:0 Tainted: G        W   2.6.37-rc5-wl+ #3 1015PE/1016P
Dec 15 14:36:41 User-PC kernel: [175576.123012] EIP: 0060:[<f83edd65>] EFLAGS: 00010283 CPU: 1
Dec 15 14:36:41 User-PC kernel: [175576.123193] EIP is at br_multicast_rcv+0xc95/0xe1c [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.123362] EAX: 0000001c EBX: f5626318 ECX: 00000000 EDX: 00000000
Dec 15 14:36:41 User-PC kernel: [175576.123550] ESI: ec512262 EDI: f5626180 EBP: f60b5ca0 ESP: f60b5bd8
Dec 15 14:36:41 User-PC kernel: [175576.123737]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 15 14:36:41 User-PC kernel: [175576.123902] Process kworker/0:0 (pid: 0, ti=f60b4000 task=f60a8000 task.ti=f60b0000)
Dec 15 14:36:41 User-PC kernel: [175576.124137] Stack:
Dec 15 14:36:41 User-PC kernel: [175576.124181]  ec556500 f6d06800 f60b5be8 c01087d8 ec512262 00000030 00000024 f5626180
Dec 15 14:36:41 User-PC kernel: [175576.124181]  f572c200 ef463440 f5626300 3affffff f6d06dd0 e60766a4 000000c4 f6d06860
Dec 15 14:36:41 User-PC kernel: [175576.124181]  ffffffff ec55652c 00000001 f6d06844 f60b5c64 c0138264 c016e451 c013e47d
Dec 15 14:36:41 User-PC kernel: [175576.124181] Call Trace:
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01087d8>] ? sched_clock+0x8/0x10
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0138264>] ? enqueue_entity+0x174/0x440
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e451>] ? sched_clock_cpu+0x131/0x190
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c013e47d>] ? select_task_rq_fair+0x2ad/0x730
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0524fc1>] ? nf_iterate+0x71/0x90
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4914>] ? br_handle_frame_finish+0x184/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e46e9>] ? br_handle_frame+0x189/0x230 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4560>] ? br_handle_frame+0x0/0x230 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff026>] ? __netif_receive_skb+0x1b6/0x5b0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04f7a30>] ? skb_copy_bits+0x110/0x210
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0503a7f>] ? netif_receive_skb+0x6f/0x80
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cb74c>] ? ieee80211_deliver_skb+0x8c/0x1a0 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cc836>] ? ieee80211_rx_handlers+0xeb6/0x1aa0 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff1f0>] ? __netif_receive_skb+0x380/0x5b0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e242>] ? sched_clock_local+0xb2/0x190
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c012b688>] ? default_spin_lock_flags+0x8/0x10
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cd621>] ? ieee80211_prepare_and_rx_handle+0x201/0xa90 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82ce154>] ? ieee80211_rx+0x2a4/0x830 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f815a8d6>] ? iwl_update_stats+0xa6/0x2a0 [iwlcore]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8499212>] ? iwlagn_rx_reply_rx+0x292/0x3b0 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8483697>] ? iwl_rx_handle+0xe7/0x350 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8486ab7>] ? iwl_irq_tasklet+0xf7/0x5c0 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01aece1>] ? __rcu_process_callbacks+0x201/0x2d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150d05>] ? tasklet_action+0xc5/0x100
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150a07>] ? __do_softirq+0x97/0x1d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d910c>] ? nmi_stack_correct+0x2f/0x34
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150970>] ? __do_softirq+0x0/0x1d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  <IRQ>
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01508f5>] ? irq_exit+0x65/0x70
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05df062>] ? do_IRQ+0x52/0xc0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01036b0>] ? common_interrupt+0x30/0x38
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c03a1fc2>] ? intel_idle+0xc2/0x160
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04daebb>] ? cpuidle_idle_call+0x6b/0x100
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0101dea>] ? cpu_idle+0x8a/0xf0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d2702>] ? start_secondary+0x1e8/0x1ee

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
 net/bridge/br_multicast.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index f19e347..074c478 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1464,6 +1464,10 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
 	if (offset < 0 || nexthdr != IPPROTO_ICMPV6)
 		return 0;
 
+	if (!pskb_may_pull(skb,
+			   skb_network_header(skb) + offset + 1 - skb->data))
+		return 0;
+
 	/* Okay, we found ICMPv6 header */
 	skb2 = skb_clone(skb, GFP_ATOMIC);
 	if (!skb2)
-- 
1.7.3.4



^ permalink raw reply related

* Re: Re: dm9000 patch
From: Greg Ungerer @ 2010-12-30 11:17 UTC (permalink / raw)
  To: Angelo Dureghello
  Cc: Baruch Siach, uClinux development list, netdev, linux-m68k,
	linux-kernel
In-Reply-To: <4D1C5807.5050702@gmail.com>


Hi Angelo,

On 30/12/10 19:59, Angelo Dureghello wrote:
[snip]
> I physically connected the HW interrupt pin of dm9000 to MCF5307 IRQ7
> pin (pin 68). dm9000 is configured (through a resistor to 3.3V on pin 57)
> not as default, but to act with a HIGH to LOW interrupt edge, as the MCF5307
> understands, and the interrupt line is pulled up to 3.3V to avoid
> flickering.
>
>
> PULL UP RES to 3.3V
>
> dm9000 | |
> IRQ |---------+-------------------| MCF5307 PIN 68 (IRQ7)
>
>
> IRQ 7 is the "level 7" autovectored interrupt (vect 31 dec).
>
> Checking the MCF5307 datasheet carefully, I have seen that the "level 7"
> interrupt I happened to choose seems to be a special level:

Yes, IRQ7 is "special" on the 5307. It is non-maskable. So normal
linux locking will be broken if you use it in the normal way.
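
A toy model of why that breaks locking (userspace sketch, all names invented; it assumes the m68k rule that an interrupt is taken only if its level is above the mask, except level 7 which is always taken):

```c
#include <stdbool.h>

#define TOY_NMI_LEVEL 7

struct toy_cpu {
	int mask_level;   /* interrupts at or below this level are held off */
	bool lock_held;   /* state of the one driver lock in this model */
	int recursions;   /* times a handler found the lock already held */
};

/* Model of spin_lock_irqsave(): raise the mask to the top, take the lock. */
int toy_lock_irqsave(struct toy_cpu *cpu)
{
	int old = cpu->mask_level;

	cpu->mask_level = TOY_NMI_LEVEL;
	cpu->lock_held = true;
	return old;
}

/* Model of spin_unlock_irqrestore(). */
void toy_unlock_irqrestore(struct toy_cpu *cpu, int old)
{
	cpu->lock_held = false;
	cpu->mask_level = old;
}

/* Raise an interrupt at the given level; returns true if the handler ran. */
bool toy_raise_irq(struct toy_cpu *cpu, int level)
{
	/* Level 7 ignores the mask; every other level must exceed it. */
	if (level != TOY_NMI_LEVEL && level <= cpu->mask_level)
		return false;

	/* The handler wants the same lock the interrupted code may hold. */
	if (cpu->lock_held)
		cpu->recursions++; /* real kernel: "BUG: spinlock recursion" */
	return true;
}
```

With the device on any level from 1 to 6, toy_raise_irq() is held off while the lock is taken; on level 7 the handler runs anyway and trips over the held lock, which matches the traces quoted below.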


> /18.7.1 Level 7 Interrupts
> Level 7 interrupts are nonmaskable and are handled differently than
> other interrupts.
> Level 7 interrupts are edge triggered by a transition from a lower
> priority request to the
> level 7 request. Interrupts at all other levels are level sensitive.
> Therefore, if IRQ7 remains
> asserted, the MCF5307 recognizes only one level 7 interrupt because only
> one transition
> from a lower level request to a level 7 request occurred. For the
> processor to
> recognize two consecutive level 7 interrupts, one of the following must
> occur:
>
> 1) The interrupt request on the interrupt control pins is raised to
> level 7 and stays there
> until an interrupt-acknowledge cycle begins. The level later drops but
> then returns to
> level 7, causing a second transition on the interrupt control lines.
>
> 2) The interrupt request on the interrupt control pins is raised to
> level 7 and stays there.
> If the level 7 interrupt routine lowers the mask level, a second level 7
> interrupt is
> recognized without a transition of the interrupt control pins. After the
> level 7 routine
> completes, the MCF5307 compares the mask level to the request level on
> the IRQx
> signals. Because the mask level is lower than the requested level, the
> interrupt mask
> is set back to level 7. To ensure it is recognized, the level 7 request
> on IRQ7 must be
> held until the second interrupt-acknowledge bus cycle begins./
>
> I guess I can try to use another IRQ line, for example IRQ1, and see what
> happens. Let me know your thoughts and I can try right now to wire up
> the fix in hardware.

Yes, I suggest you try using another IRQ line. Stay away from IRQ7
for any normal devices.

Regards
Greg



> still many thanks,
>
> regards,
> angelo
>
>
> On 30/12/2010 01:37, Greg Ungerer wrote:
>> Hi Angelo,
>>
>> On 30/12/10 06:57, Angelo Dureghello wrote:
>>> Hi all,
>>> thanks for the help,
>>> the kernel is a main line kernel. Then yes, i am still using uclinux
>>> tree for libc/tools.
>>
>> How is the DM9000 hardware connected to the 5307?
>> I am wondering how you connected the interrupt (and to
>> which interrupt) and the addressing (direct of a chip select)?
>>
>> (For example NETtel based 5307 platform support of the SMC91x code is
>> in mainline as arch/m68knommu/platform/5307/nettel.c). Can you show
>> the code you used to setup your dm9000 hardware?
>> (Specifically I guess I want to know if you use the "auto-vectored"
>> interrupt mode?)
>>
>> Thanks
>> Greg
>>
>>
>>> I collected another spinlock recursion with a slightly different call
>>> stack trace, as always, the spinlock recursion issue happen on a high
>>> tx/rx traffic of the dm9000e, in this case just asking an index.html
>>> with some images and texts:
>>>
>>> [ 1108.930000] BUG: spinlock recursion on CPU#0, httpd/29
>>> [ 1108.930000] lock: 00c42c06, .magic: dead4ead, .owner: httpd/29,
>>> .owner_cpu: 0
>>> [ 1108.930000] Stack from 00d7a688:
>>> [ 1108.930000] 00d7a6b4 000ad988 001840ca 00c42c06 dead4ead 00d641d4
>>> 0000001d 00000000
>>> [ 1108.930000] 00c42c06 000064f0 00c42800 00d7a6e8 000adb5a 00c42c06
>>> 00184130 00002704
>>> [ 1108.930000] 00000000 0000001f 0014d17e 00159912 00c42b60 000064f0
>>> 00c42800 0002cb16
>>> [ 1108.930000] 00d7a6f8 0014d24e 00c42c06 00000000 00d7a738 000e485c
>>> 00c42c06 00000000
>>> [ 1108.930000] 00000000 0000001f 0014d17e 00159912 0000004a 00cfc600
>>> 000064f0 00009a74
>>> [ 1108.930000] 0002cb16 00191204 00d7a760 0002b6f2 00d7a760 0002b514
>>> 0000001f 00c42800
>>> [ 1108.930000] Call Trace:
>>> [ 1108.930000] [000ad988] spin_bug+0x86/0x11a
>>> [ 1108.930000] [000adb5a] do_raw_spin_lock+0x58/0x120
>>> [ 1108.930000] [0014d24e] _raw_spin_lock_irqsave+0x28/0x32
>>> [ 1108.930000] [000e485c] dm9000_interrupt+0x1a/0x2e0
>>> [ 1108.930000] [0002b514] handle_IRQ_event+0x2a/0xec
>>> [ 1108.930000] [0002b680] __do_IRQ+0xaa/0x128
>>> [ 1108.930000] [00000bb6] do_IRQ+0x48/0x62
>>> [ 1108.930000] [000033c6] inthandler+0x6a/0x74
>>> [ 1108.930000] [000fb626] dev_hard_start_xmit+0x170/0x4c4
>>> [ 1108.930000] [0010b80e] sch_direct_xmit+0xc0/0x1bc
>>> [ 1108.930000] [000fe9de] dev_queue_xmit+0x160/0x3e6
>>> [ 1108.930000] [001195c4] ip_finish_output+0xec/0x320
>>> [ 1108.930000] [0011a768] ip_output+0x9e/0xa8
>>> [ 1108.930000] [00119856] ip_local_out+0x26/0x30
>>> [ 1108.930000] [0011a56e] ip_build_and_send_pkt+0x16e/0x178
>>> [ 1108.930000] [0012fc96] tcp_v4_send_synack+0x52/0x90
>>> [ 1108.930000] [00130f86] tcp_v4_conn_request+0x3fa/0x57c
>>> [ 1108.930000] [0012a1c6] tcp_rcv_state_process+0x25e/0xa66
>>> [ 1108.930000] [001309a4] tcp_v4_do_rcv+0x7c/0x1c8
>>> [ 1108.930000] [00132854] tcp_v4_rcv+0x546/0x6d2
>>> [ 1108.930000] [001153a8] ip_local_deliver+0x9c/0x1b0
>>> [ 1108.930000] [001158e8] ip_rcv+0x42c/0x5f0
>>> [ 1108.930000] [000fa74e] __netif_receive_skb+0x196/0x2ec
>>> [ 1108.930000] [000fe142] process_backlog+0x72/0x11e
>>> [ 1108.930000] [000fe290] net_rx_action+0xa2/0x150
>>> [ 1108.930000] [0000e13c] __do_softirq+0x74/0xe4
>>> [ 1108.930000] [0000e1e2] do_softirq+0x36/0x40
>>> [ 1108.930000] [0000e6c6] local_bh_enable+0x7a/0xa4
>>> [ 1108.930000] [000fe972] dev_queue_xmit+0xf4/0x3e6
>>> [ 1108.930000] [001195c4] ip_finish_output+0xec/0x320
>>> [ 1108.930000] [0011a768] ip_output+0x9e/0xa8
>>> [ 1108.930000] [00119856] ip_local_out+0x26/0x30
>>> [ 1108.930000] [0011a90a] ip_queue_xmit+0x198/0x426
>>> [ 1108.930000] [0012bcc8] tcp_transmit_skb+0x3f0/0x76c
>>> [ 1108.930000] [0012cfda] tcp_write_xmit+0x178/0x868
>>> [ 1108.930000] [0012d6f8] __tcp_push_pending_frames+0x2e/0x9a
>>> [ 1108.930000] [001222be] tcp_sendmsg+0x82e/0x98c
>>> [ 1108.930000] [0013d9c0] inet_sendmsg+0x32/0x54
>>> [ 1108.930000] [000ec25e] sock_aio_write+0xc8/0x138
>>> [ 1108.930000] [00043e7e] do_sync_write+0x9e/0xfe
>>> [ 1108.930000] [00043f56] vfs_write+0x78/0x84
>>> [ 1108.930000] [0004446c] sys_write+0x40/0x7a
>>> [ 1108.930000] [00003244] system_call+0x84/0xc2
>>> [ 1108.930000]
>>>
>>> seems like while i transmit a packet, dm9000_interrupt try to acquire
>>> the spinlock owned from the same task.
>>>
>>> Compiling the kernel i am getting:
>>> CC kernel/irq/handle.o
>>> kernel/irq/handle.c:432:3: warning: #warning __do_IRQ is deprecated.
>>> Please convert to proper flow handlers
>>>
>>> Could the usage of __do_IRQ super-handler be a cause of the issue ?
>>>
>>>
>>> many thanks,
>>> angelo
>>>
>>> On 29/12/2010 19:45, Geert Uytterhoeven wrote:
>>>> On Wed, Dec 29, 2010 at 19:06, Baruch Siach<baruch@tkos.co.il> wrote:
>>>>> Hi Angelo,
>>>>>
>>>>> On Wed, Dec 29, 2010 at 02:13:22PM +0100, Angelo Dureghello wrote:
>>>>>> just FYI, i tested kernel 2.6.36.2, unfortunately the issue is still
>>>>>> there, below the call stack trace.
>>>>> Help from the m68k experts seems to be needed. Adding the relevant
>>>>> list to Cc.
>>>> This is uClinux? Added Cc...
>>>>
>>>>>> [ 4.620000] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
>>>>>> [ 39.390000] BUG: spinlock recursion on CPU#0, httpd/29
>>>>>> [ 39.390000] lock: 00189c44, .magic: dead4ead, .owner: httpd/29,
>>>>>> .owner_cpu: 0
>>>>>> [ 39.390000] Stack from 00d6a990:
>>>>>> [ 39.390000] 00d6a9bc 000a9710 0017cac7 00189c44 dead4ead
>>>>>> 00de48f4 0000001d 00000000
>>>>>> [ 39.390000] 00189c44 0002a646 00145f70 00d6a9f0 000a98e2
>>>>>> 00189c44 0017cb2d 00189c44
>>>>>> [ 39.390000] 00d6aad8 0000001f 00145f5c 001523f6 00189c08
>>>>>> 0002a646 00145f70 0002bc52
>>>>>> [ 39.390000] 00d6a9fc 00145f7e 00189c44 00d6aa28 0002a75e
>>>>>> 00189c44 0000001f 00d6aad8
>>>>>> [ 39.390000] 0000001f 00145f5c 00189c08 0002a646 00145f70
>>>>>> 0002bc52 00d6aa3c 00000bb6
>>>>>> [ 39.390000] 0000001f 00189c44 00cfc780 00d6aa84 0000337a
>>>>>> 0000001f 00d6aa4c 00000001
>>>>>> [ 39.390000] Call Trace:
>>>>>> [ 39.390000] [000a9710] spin_bug+0x86/0x11a
>>>>>> [ 39.390000] [000a98e2] do_raw_spin_lock+0x58/0x120
>>>>>> [ 39.390000] [00145f7e] _raw_spin_lock+0xe/0x14
>>>>>> [ 39.390000] [0002a75e] __do_IRQ+0x2c/0x108
>>>>>> [ 39.390000] [00000bb6] do_IRQ+0x48/0x62
>>>>>> [ 39.390000] [0000337a] inthandler+0x6a/0x74
>>>>>> [ 39.390000] [0002a82e] __do_IRQ+0xfc/0x108
>>>>>> [ 39.390000] [00000bb6] do_IRQ+0x48/0x62
>>>>>> [ 39.390000] [0000337a] inthandler+0x6a/0x74
>>>>>> [ 39.390000] [000ef0ce] skb_release_all+0x10/0x20
>>>>>> [ 39.390000] [000ee6bc] __kfree_skb+0x10/0x92
>>>>>> [ 39.390000] [000ee75e] consume_skb+0x20/0x34
>>>>>> [ 39.390000] [000e004e] dm9000_start_xmit+0xdc/0xec
>>>>>> [ 39.390000] [000f67a2] dev_hard_start_xmit+0x146/0x472
>>>>>> [ 39.390000] [00106506] sch_direct_xmit+0xc0/0x1bc
>>>>>> [ 39.390000] [000f9914] dev_queue_xmit+0x160/0x3e4
>>>>>> [ 39.390000] [00113b3e] ip_finish_output+0xee/0x318
>>>>>> [ 39.390000] [001142b4] ip_output+0x7c/0x88
>>>>>> [ 39.390000] [00113dc6] ip_local_out+0x26/0x30
>>>>>> [ 39.390000] [00114d9a] ip_queue_xmit+0x152/0x374
>>>>>> [ 39.390000] [00125c8c] tcp_transmit_skb+0x3f0/0x732
>>>>>> [ 39.390000] [00126f26] tcp_write_xmit+0x178/0x868
>>>>>> [ 39.390000] [00127644] __tcp_push_pending_frames+0x2e/0x9a
>>>>>> [ 39.390000] [0011c3d6] tcp_sendmsg+0x82e/0x98c
>>>>>> [ 39.390000] [00137544] inet_sendmsg+0x32/0x54
>>>>>> [ 39.390000] [000e79a6] sock_aio_write+0xc8/0x138
>>>>>> [ 39.390000] [00042590] do_sync_write+0x9e/0xfe
>>>>>> [ 39.390000] [00042668] vfs_write+0x78/0x84
>>>>>> [ 39.390000] [00042a92] sys_write+0x40/0x7a
>>>>>> [ 39.390000] [00003224] system_call+0x84/0xc2
>>>>>> [ 39.390000]
>>>>>>
>>>>>> dm9000e is as default not visible/selectable in menuconfig for
>>>>>> Coldfire architectures, so this probably cannot be considered as a
>>>>>> kernel bug.
>>>>>>
>>>>>> I going forward in investigations, every help is appreciated,
>>>>>>
>>>>>> regards,
>>>>>> angelo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 29/12/2010 07:06, Baruch Siach wrote:
>>>>>>> Hi Angelo,
>>>>>>>
>>>>>>> On Tue, Dec 28, 2010 at 10:52:42PM +0100, Angelo Dureghello wrote:
>>>>>>>> sorry to contact you directly but i couldn't get any help from the
>>>>>>>> kernel.org mailing list, since i am not a developer my mails are
>>>>>>>> generally skipped.
>>>>>>> The best way to get the contact info for a piece of kernel code, is
>>>>>>> using the
>>>>>>> get_maintainer.pl script. Running 'scripts/get_maintainer.pl -f
>>>>>>> drivers/net/dm9000.c' gives the following output:
>>>>>>>
>>>>>>> netdev@vger.kernel.org
>>>>>>> linux-kernel@vger.kernel.org
>>>>>>>
>>>>>>> I added both to Cc.
>>>>>>>
>>>>>>>> I am very near to have a custom board working with MCF5307 cpu and
>>>>>>>> dm9000.
>>>>>>>> I am using kernel 2.6.36-rc3 with your last patch about
>>>>>>>> spinlock-recursion already included.
>>>>>>> You should try to update to the latest .36 kernel, which is
>>>>>>> currently
>>>>>>> 2.6.36.2. The problem that you experience might be unrelated to the
>>>>>>> dm9000
>>>>>>> driver (or to networking at all), so it might have been fixed in
>>>>>>> this version.
>>>>>>>
>>>>>>>> I have "ping" and "telnet" to the embedded board fully working.
>>>>>>>> If i try to get a sample web page with some images from the board
>>>>>>>> httpd with a browser, in 80% of cases i get a trap/oops:
>>>>>>> Try to enable KALLSYMS in your kernel .config to make your stack
>>>>>>> trace more
>>>>>>> meaningful. This is under 'General setup -> Configure standard
>>>>>>> kernel features
>>>>>>> (for small systems) -> Load all symbols for debugging/ksymoops'.
>>>>>>>
>>>>>>> I hope this helps.
>>>>>>>
>>>>>>> baruch
>>>>>>>
>>>>>>>> [ 4.590000] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
>>>>>>>> [ 67.630000] BUG: spinlock recursion on CPU#0, httpd/29
>>>>>>>> [ 67.630000] lock: 00c42c06, .magic: dead4ead, .owner: httpd/29,
>>>>>>>> .owner_cpu: 0
>>>>>>>> [ 67.630000] Stack from 00d7b914:
>>>>>>>> [ 67.630000] 00d7b940 000a8cf0 0015f693 00c42c06 dead4ead
>>>>>>>> 00dec1d4 0000001d 00000000
>>>>>>>> [ 67.630000] 00c42c06 00006188 00c42800 00d7b974 000a8ec2
>>>>>>>> 00c42c06 0015f6f9 00002704
>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 00c42b60
>>>>>>>> 00006188 00c42800 0002b312
>>>>>>>> [ 67.630000] 00d7b984 0014701e 00c42c06 00000000 00d7b9c4
>>>>>>>> 000df21c 00c42c06 00000000
>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 000005ea
>>>>>>>> 00cfc640 00006188 000096e8
>>>>>>>> [ 67.630000] 0002b312 00146fa4 00c42b60 00002704 00d7b9ec
>>>>>>>> 00029d3a 0000001f 00c42800
>>>>>>>> [ 67.630000] Call Trace:
>>>>>>>> [ 67.630000] [000a8cf0] [000a8ec2] [0014701e] [000df21c] [00029d3a]
>>>>>>>> [ 67.630000] [00029e84] [00000bb6] [0000336e] [000df162] [000effd6]
>>>>>>>> [ 67.630000] [00100482] [000f312e] [000f9ebc] [0010dd2a] [0010e4a0]
>>>>>>>> [ 67.630000] [0010dfb2] [0010ef80] [0011fed6] [00121170] [0012188e]
>>>>>>>> [ 67.630000] [0011ecc6] [001249fe] [000e4084] [0011621c] [00131a44]
>>>>>>>> [ 67.630000] [000e11ee] [00041944] [00041a1c] [00041e46] [00003218]
>>>>>>>> [ 67.630000] BUG: spinlock lockup on CPU#0, httpd/29, 00c42c06
>>>>>>>> [ 67.630000] Stack from 00d7b934:
>>>>>>>> [ 67.630000] 00d7b974 000a8f66 0015f703 00000000 00dec1d4
>>>>>>>> 0000001d 00c42c06 00002704
>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 00c42b60
>>>>>>>> 00006188 00c42800 0002b312
>>>>>>>> [ 67.630000] 00d7b984 0014701e 00c42c06 00000000 00d7b9c4
>>>>>>>> 000df21c 00c42c06 00000000
>>>>>>>> [ 67.630000] 00000000 0000001f 00146fa4 00152f0c 000005ea
>>>>>>>> 00cfc640 00006188 000096e8
>>>>>>>> [ 67.630000] 0002b312 00146fa4 00c42b60 00002704 00d7b9ec
>>>>>>>> 00029d3a 0000001f 00c42800
>>>>>>>> [ 67.630000] 0016c1b4 00cfc640 0000001f 0016c178 00029d10
>>>>>>>> 00146fb8 00d7ba20 00029e84
>>>>>>>> [ 67.630000] Call Trace:
>>>>>>>> [ 67.630000] [000a8f66] [0014701e] [000df21c] [00029d3a] [00029e84]
>>>>>>>> [ 67.630000] [00000bb6] [0000336e] [000df162] [000effd6] [00100482]
>>>>>>>> [ 67.630000] [000f312e] [000f9ebc] [0010dd2a] [0010e4a0] [0010dfb2]
>>>>>>>> [ 67.630000] [0010ef80] [0011fed6] [00121170] [0012188e] [0011ecc6]
>>>>>>>> [ 67.630000] [001249fe] [000e4084] [0011621c] [00131a44] [000e11ee]
>>>>>>>> [ 67.630000] [00041944] [00041a1c] [00041e46] [00003218]
>>>>>>>>
>>>>>>>> As i said, i was hoping in your patch but i sadly discovered it is
>>>>>>>> already included in this kernel version.
>>>>>>>> Hope you can give me some help or can forward me to an appropriate
>>>>>>>> mailing list.
>>>>> --
>>>>> ~. .~ Tk Open Systems
>>>>> =}------------------------------------------------ooO--U--Ooo------------{=
>>>>>
>>>>>
>>>>> - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-kernel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> Please read the FAQ at http://www.tux.org/lkml/
>>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> uClinux-dev mailing list
>>> uClinux-dev@uclinux.org
>>> http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
>>> This message was resent by uclinux-dev@uclinux.org
>>> To unsubscribe see:
>>> http://mailman.uclinux.org/mailman/options/uclinux-dev
>>>
>>
>>
>
>
>


-- 
------------------------------------------------------------------------
Greg Ungerer  --  Principal Engineer        EMAIL:     gerg@snapgear.com
SnapGear Group, McAfee                      PHONE:       +61 7 3435 2888
8 Gardner Close,                            FAX:         +61 7 3891 3630
Milton, QLD, 4064, Australia                WEB: http://www.SnapGear.com
_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev

^ permalink raw reply

* Re: [PATCH v2 00/12] make rpc_pipefs be mountable multiple time
From: Rob Landley @ 2010-12-30 11:05 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Rob Landley, Trond Myklebust, J. Bruce Fields, Neil Brown,
	Pavel Emelyanov, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101230104416.GA31824-oKw7cIdHH8eLwutG50LtGA@public.gmane.org>

On Thu, Dec 30, 2010 at 4:44 AM, Kirill A. Shutemov <kas-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Dec 30, 2010 at 04:05:07AM -0600, Rob Landley wrote:
>> On 12/30/2010 03:44 AM, Kirill A. Shutemov wrote:
>> >>> If no rpcmount mountoption, no rpc_pipefs was found at
>> >>> '/var/lib/nfs/rpc_pipefs' and we are in init's mount namespace, we use
>> >>> init_rpc_pipefs.
>> >>
>> >> It's the "we are in init's mount namespace" that I was wondering about.
>> >>
>> >> So if I naievely chroot, nfs mount stops working the way it did before I
>> >> chrooted unless I do an extra setup step?
>> >
>> > No. It will work as before since you are still in init's mount namespace.
>> > Creating new mount namespace changes rules.
>>
>> Ah, CLONE_NEWNS and then you need /var/lib/nfs/rpc_pipefs.  Got it.
>>
>> I'm kind of surprised that the kernel cares about a specific path under
>> /var/lib.  (Seems like policy in the kernel somehow.)
>
> Yep. It's bad, but there is way to overwrite the default.
>
> Other way is to leave 'rpcmount' mountoption without default.
> get_rpc_pipefs(NULL) in init's mount namespace will always return
> init_rpc_pipefs, without filesystem lookup.
> get_rpc_pipefs(NULL) in non-init's mount namespace will always return
> error.
>
> So you will have to specify 'rpcmount' mountoption for every nfs mount in
> container. Hmm, I guess, it may confuse user.
>
> Or we can try to move the default to userspace. /sbin/mount.nfs?

/proc/sys/kernel/hotplug exists to tell the kernel where to find the hotplug
binary.  Once upon a time /sbin/hotplug was the default value, and that was
there to overwrite it.  (They changed the default to blank (disabled) not due
to policy reasons, but due to adding the netlink hotplug notification
mechanism and making that the default.)

I bring that up to point out that the general consensus about policy in the
kernel seems to be "when you really really can't avoid having any, make a
sane default the user can override".

(Of course adding another entry to the crawling horror of /proc may not
be an improvement.  But individual overrides at the mount -o level seem
like a non-optimal granularity for this...)
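
As a rough illustration of the "sane default the user can override" pattern (nothing here is from the patch series; toy_pick_default() and its three levels are hypothetical), the lookup order could be: per-mount option first, then a system-wide tunable, then a built-in default.

```c
#include <stddef.h>

/*
 * Hypothetical precedence: per-mount option, then a system-wide tunable,
 * then a built-in default. NULL or "" means "not set" at that level.
 */
const char *toy_pick_default(const char *mount_opt, const char *tunable,
			     const char *builtin)
{
	if (mount_opt && mount_opt[0])
		return mount_opt;
	if (tunable && tunable[0])
		return tunable;
	return builtin;
}
```

A blank tunable falling through to the built-in value mirrors how a blank /proc/sys/kernel/hotplug means "nothing configured at this level".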

>> Can't it just
>> check the current process's mount list to see if an instance of
>> rpc_pipefs is mounted in the current namespace the way lxc looks for
>> cgroups?  Or are there potential performance/scalability issues with that?
>
> What should we do if we have several rpc_pipefs mounts in the namespace?

You mean more than one inside a given process's view of the filesystem, taking
into account chroot like /proc/mounts does?
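
For what it's worth, the "check the mount list" idea is easy to try from userspace. The sketch below is only an illustration, not anything from the patch series: it counts entries of a given filesystem type in a table laid out like /proc/mounts. The function name count_mounts_of_type() is made up for this example, and the type string "rpc_pipefs" is assumed to be what such a mount reports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the entries of the given filesystem type in a mounts table laid out
 * like /proc/mounts: "device mountpoint fstype options dump pass" per line.
 * Returns -1 if the file cannot be opened.  Lines longer than the buffer
 * are not handled specially in this sketch.
 */
static int count_mounts_of_type(const char *mounts_path, const char *fstype)
{
	char line[1024], dev[256], dir[256], type[64];
	FILE *f;
	int count = 0;

	assert(mounts_path != NULL && fstype != NULL);

	f = fopen(mounts_path, "r");
	if (f == NULL)
		return -1;

	while (fgets(line, sizeof(line), f) != NULL) {
		/* Only the first three fields matter for this check. */
		if (sscanf(line, "%255s %255s %63s", dev, dir, type) != 3)
			continue;
		if (strcmp(type, fstype) == 0)
			count++;
	}
	fclose(f);
	return count;
}
```

Calling count_mounts_of_type("/proc/mounts", "rpc_pipefs") from the process doing the mount gives the number of instances it can see; a result greater than one is the "several rpc_pipefs mounts" case asked about above.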

Before this patch series, there was one instance systemwide.  The patch changed
that to look at a fixed location in the filesystem relative to the current
chroot.  Either way, there was one instance available to a given process doing
an nfs mount.

What's the use case for having more than one visible to a given process?
(NUMA scalability?  Some sort of multipath/VPN routing context?)

Rob

^ permalink raw reply

* [*v3 PATCH 13/22] IPVS: netns awareness to ip_vs_est
From: hans @ 2010-12-30 10:50 UTC (permalink / raw)
  To: horms, ja, daniel.lezcano, wensong, lvs-devel, netdev,
	netfilter-devel
  Cc: Hans Schillstrom
In-Reply-To: <1293706266-27152-1-git-send-email-hans@schillstrom.com>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

All variables are moved to struct ipvs, and most external changes are
fixed (i.e. init_net removed).

*v3
 One timer per netns instead of a common timer in the estimator.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/ip_vs.h            |    4 +-
 include/net/netns/ip_vs.h      |    5 ++
 net/netfilter/ipvs/ip_vs_ctl.c |   20 +++++-----
 net/netfilter/ipvs/ip_vs_est.c |   86 ++++++++++++++++++++++-----------------
 4 files changed, 65 insertions(+), 50 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index e8567b0..489c6ea 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1002,8 +1002,8 @@ extern void ip_vs_sync_cleanup(void);
  */
 extern int ip_vs_estimator_init(void);
 extern void ip_vs_estimator_cleanup(void);
-extern void ip_vs_new_estimator(struct ip_vs_stats *stats);
-extern void ip_vs_kill_estimator(struct ip_vs_stats *stats);
+extern void ip_vs_new_estimator(struct net *net, struct ip_vs_stats *stats);
+extern void ip_vs_kill_estimator(struct net *net, struct ip_vs_stats *stats);
 extern void ip_vs_zero_estimator(struct ip_vs_stats *stats);

 /*
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index f077bd3..3da0eca 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -70,6 +70,11 @@ struct netns_ipvs {
 	int 			sysctl_lblcr_expiration;
 	struct ctl_table_header	*lblcr_ctl_header;
 	struct ctl_table	*lblcr_ctl_table;
+	/* ip_vs_est */
+	struct list_head 	est_list;	/* estimator list */
+	spinlock_t		est_lock;
+	struct timer_list	est_timer;	/* Estimation timer */
+
 };

 #endif /* IP_VS_H_ */
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index faaee81..5b4da80 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -816,7 +816,7 @@ __ip_vs_update_dest(struct ip_vs_service *svc, struct ip_vs_dest *dest,
 	spin_unlock(&dest->dst_lock);

 	if (add)
-		ip_vs_new_estimator(&dest->stats);
+		ip_vs_new_estimator(svc->net, &dest->stats);

 	write_lock_bh(&__ip_vs_svc_lock);

@@ -1009,9 +1009,9 @@ ip_vs_edit_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 /*
  *	Delete a destination (must be already unlinked from the service)
  */
-static void __ip_vs_del_dest(struct ip_vs_dest *dest)
+static void __ip_vs_del_dest(struct net *net, struct ip_vs_dest *dest)
 {
-	ip_vs_kill_estimator(&dest->stats);
+	ip_vs_kill_estimator(net, &dest->stats);

 	/*
 	 *  Remove it from the d-linked list with the real services.
@@ -1080,6 +1080,7 @@ static int
 ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 {
 	struct ip_vs_dest *dest;
+	struct net *net = svc->net;
 	__be16 dport = udest->port;

 	EnterFunction(2);
@@ -1108,7 +1109,7 @@ ip_vs_del_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 	/*
 	 *	Delete the destination
 	 */
-	__ip_vs_del_dest(dest);
+	__ip_vs_del_dest(net, dest);

 	LeaveFunction(2);

@@ -1197,7 +1198,7 @@ ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
 	else if (svc->port == 0)
 		atomic_inc(&ip_vs_nullsvc_counter);

-	ip_vs_new_estimator(&svc->stats);
+	ip_vs_new_estimator(net, &svc->stats);

 	/* Count only IPv4 services for old get/setsockopt interface */
 	if (svc->af == AF_INET)
@@ -1345,7 +1346,7 @@ static void __ip_vs_del_service(struct ip_vs_service *svc)
 	if (svc->af == AF_INET)
 		ip_vs_num_services--;

-	ip_vs_kill_estimator(&svc->stats);
+	ip_vs_kill_estimator(svc->net, &svc->stats);

 	/* Unbind scheduler */
 	old_sched = svc->scheduler;
@@ -1368,7 +1369,7 @@ static void __ip_vs_del_service(struct ip_vs_service *svc)
 	 */
 	list_for_each_entry_safe(dest, nxt, &svc->destinations, n_list) {
 		__ip_vs_unlink_dest(svc, dest, 0);
-		__ip_vs_del_dest(dest);
+		__ip_vs_del_dest(svc->net, dest);
 	}

 	/*
@@ -3458,7 +3459,7 @@ int __net_init __ip_vs_control_init(struct net *net)
 	sysctl_header = register_net_sysctl_table(net, net_vs_ctl_path, vs_vars);
 	if (sysctl_header == NULL)
 		goto err_reg;
-	ip_vs_new_estimator(&ip_vs_stats);
+	ip_vs_new_estimator(net, &ip_vs_stats);
 	return 0;

 err_reg:
@@ -3470,7 +3471,7 @@ static void __net_exit __ip_vs_control_cleanup(struct net *net)
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return;

-	ip_vs_kill_estimator(&ip_vs_stats);
+	ip_vs_kill_estimator(net, &ip_vs_stats);
 	unregister_net_sysctl_table(sysctl_header);
 	proc_net_remove(net, "ip_vs_stats");
 	proc_net_remove(net, "ip_vs");
@@ -3534,7 +3535,6 @@ void ip_vs_control_cleanup(void)
 	ip_vs_trash_cleanup();
 	cancel_rearming_delayed_work(&defense_work);
 	cancel_work_sync(&defense_work.work);
-	ip_vs_kill_estimator(&ip_vs_stats);
 	unregister_pernet_subsys(&ipvs_control_ops);
 	ip_vs_genl_unregister();
 	nf_unregister_sockopt(&ip_vs_sockopts);
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index 7417a0c..4a82a8b 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -8,8 +8,12 @@
  *              as published by the Free Software Foundation; either version
  *              2 of the License, or (at your option) any later version.
  *
- * Changes:
- *
+ * Changes:     Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *              Network name space (netns) aware.
+ *              Global data moved to netns i.e struct netns_ipvs
+ *              Affected data: est_list and est_lock.
+ *              estimation_timer() runs with timer per netns.
+ *              get_stats() does the per-CPU summing.
  */

 #define KMSG_COMPONENT "IPVS"
@@ -48,12 +52,6 @@
  */


-static void estimation_timer(unsigned long arg);
-
-static LIST_HEAD(est_list);
-static DEFINE_SPINLOCK(est_lock);
-static DEFINE_TIMER(est_timer, estimation_timer, 0, 0);
-
 static void estimation_timer(unsigned long arg)
 {
 	struct ip_vs_estimator *e;
@@ -62,9 +60,12 @@ static void estimation_timer(unsigned long arg)
 	u32 n_inpkts, n_outpkts;
 	u64 n_inbytes, n_outbytes;
 	u32 rate;
+	struct net *net = (struct net *)arg;
+	struct netns_ipvs *ipvs;

-	spin_lock(&est_lock);
-	list_for_each_entry(e, &est_list, list) {
+	ipvs = net_ipvs(net);
+	spin_lock(&ipvs->est_lock);
+	list_for_each_entry(e, &ipvs->est_list, list) {
 		s = container_of(e, struct ip_vs_stats, est);

 		spin_lock(&s->lock);
@@ -75,38 +76,39 @@ static void estimation_timer(unsigned long arg)
 		n_outbytes = s->ustats.outbytes;

 		/* scaled by 2^10, but divided 2 seconds */
-		rate = (n_conns - e->last_conns)<<9;
+		rate = (n_conns - e->last_conns) << 9;
 		e->last_conns = n_conns;
-		e->cps += ((long)rate - (long)e->cps)>>2;
-		s->ustats.cps = (e->cps+0x1FF)>>10;
+		e->cps += ((long)rate - (long)e->cps) >> 2;
+		s->ustats.cps = (e->cps + 0x1FF) >> 10;

-		rate = (n_inpkts - e->last_inpkts)<<9;
+		rate = (n_inpkts - e->last_inpkts) << 9;
 		e->last_inpkts = n_inpkts;
-		e->inpps += ((long)rate - (long)e->inpps)>>2;
-		s->ustats.inpps = (e->inpps+0x1FF)>>10;
+		e->inpps += ((long)rate - (long)e->inpps) >> 2;
+		s->ustats.inpps = (e->inpps + 0x1FF) >> 10;

-		rate = (n_outpkts - e->last_outpkts)<<9;
+		rate = (n_outpkts - e->last_outpkts) << 9;
 		e->last_outpkts = n_outpkts;
-		e->outpps += ((long)rate - (long)e->outpps)>>2;
-		s->ustats.outpps = (e->outpps+0x1FF)>>10;
+		e->outpps += ((long)rate - (long)e->outpps) >> 2;
+		s->ustats.outpps = (e->outpps + 0x1FF) >> 10;

-		rate = (n_inbytes - e->last_inbytes)<<4;
+		rate = (n_inbytes - e->last_inbytes) << 4;
 		e->last_inbytes = n_inbytes;
-		e->inbps += ((long)rate - (long)e->inbps)>>2;
-		s->ustats.inbps = (e->inbps+0xF)>>5;
+		e->inbps += ((long)rate - (long)e->inbps) >> 2;
+		s->ustats.inbps = (e->inbps + 0xF) >> 5;

-		rate = (n_outbytes - e->last_outbytes)<<4;
+		rate = (n_outbytes - e->last_outbytes) << 4;
 		e->last_outbytes = n_outbytes;
-		e->outbps += ((long)rate - (long)e->outbps)>>2;
-		s->ustats.outbps = (e->outbps+0xF)>>5;
+		e->outbps += ((long)rate - (long)e->outbps) >> 2;
+		s->ustats.outbps = (e->outbps + 0xF) >> 5;
 		spin_unlock(&s->lock);
 	}
-	spin_unlock(&est_lock);
-	mod_timer(&est_timer, jiffies + 2*HZ);
+	spin_unlock(&ipvs->est_lock);
+	mod_timer(&ipvs->est_timer, jiffies + 2*HZ);
 }

-void ip_vs_new_estimator(struct ip_vs_stats *stats)
+void ip_vs_new_estimator(struct net *net, struct ip_vs_stats *stats)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_estimator *est = &stats->est;

 	INIT_LIST_HEAD(&est->list);
@@ -126,18 +128,19 @@ void ip_vs_new_estimator(struct ip_vs_stats *stats)
 	est->last_outbytes = stats->ustats.outbytes;
 	est->outbps = stats->ustats.outbps<<5;

-	spin_lock_bh(&est_lock);
-	list_add(&est->list, &est_list);
-	spin_unlock_bh(&est_lock);
+	spin_lock_bh(&ipvs->est_lock);
+	list_add(&est->list, &ipvs->est_list);
+	spin_unlock_bh(&ipvs->est_lock);
 }

-void ip_vs_kill_estimator(struct ip_vs_stats *stats)
+void ip_vs_kill_estimator(struct net *net, struct ip_vs_stats *stats)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_estimator *est = &stats->est;

-	spin_lock_bh(&est_lock);
+	spin_lock_bh(&ipvs->est_lock);
 	list_del(&est->list);
-	spin_unlock_bh(&est_lock);
+	spin_unlock_bh(&ipvs->est_lock);
 }

 void ip_vs_zero_estimator(struct ip_vs_stats *stats)
@@ -159,14 +162,25 @@ void ip_vs_zero_estimator(struct ip_vs_stats *stats)

 static int __net_init __ip_vs_estimator_init(struct net *net)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return -EPERM;

+	INIT_LIST_HEAD(&ipvs->est_list);
+	spin_lock_init(&ipvs->est_lock);
+	setup_timer(&ipvs->est_timer, estimation_timer, (unsigned long)net);
+	mod_timer(&ipvs->est_timer, jiffies + 2 * HZ);
 	return 0;
 }

+static void __net_exit __ip_vs_estimator_exit(struct net *net)
+{
+	del_timer_sync(&net_ipvs(net)->est_timer);
+}
 static struct pernet_operations ip_vs_app_ops = {
 	.init = __ip_vs_estimator_init,
+	.exit = __ip_vs_estimator_exit,
 };

 int __init ip_vs_estimator_init(void)
@@ -174,14 +188,10 @@ int __init ip_vs_estimator_init(void)
 	int rv;

 	rv = register_pernet_subsys(&ip_vs_app_ops);
-	if (rv < 0)
-		return rv;
-	mod_timer(&est_timer, jiffies + 2 * HZ);
 	return rv;
 }

 void ip_vs_estimator_cleanup(void)
 {
-	del_timer_sync(&est_timer);
 	unregister_pernet_subsys(&ip_vs_app_ops);
 }
--
1.7.2.3


^ permalink raw reply related

* [*v3 PATCH 07/22] IPVS: netns preparation for proto_udp
From: hans @ 2010-12-30 10:50 UTC (permalink / raw)
  To: horms, ja, daniel.lezcano, wensong, lvs-devel, netdev,
	netfilter-devel
  Cc: Hans Schillstrom
In-Reply-To: <1293706266-27152-1-git-send-email-hans@schillstrom.com>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

In this phase (one), all file-local variables are moved to the ipvs struct.

Remaining work: add a struct net *net parameter to a couple of functions
that are common to all protos, and use ip_vs_proto_data.

*v3
Removed the unused function set_state_timeout().

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/netns/ip_vs.h            |    8 +++
 net/netfilter/ipvs/ip_vs_proto.c     |    3 +
 net/netfilter/ipvs/ip_vs_proto_udp.c |   86 +++++++++++++++++-----------------
 3 files changed, 54 insertions(+), 43 deletions(-)

diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 512cdd0..4975026 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -40,6 +40,14 @@ struct netns_ipvs {
 	struct list_head 	tcp_apps[TCP_APP_TAB_SIZE];
 	spinlock_t		tcp_app_lock;
 #endif
+	/* ip_vs_proto_udp */
+#ifdef CONFIG_IP_VS_PROTO_UDP
+	#define	UDP_APP_TAB_BITS	4
+	#define	UDP_APP_TAB_SIZE	(1 << UDP_APP_TAB_BITS)
+	#define	UDP_APP_TAB_MASK	(UDP_APP_TAB_SIZE - 1)
+	struct list_head 	udp_apps[UDP_APP_TAB_SIZE];
+	spinlock_t		udp_app_lock;
+#endif
 
 	/* ip_vs_lblc */
 	int 			sysctl_lblc_expiration;
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 90d69c5..ec71d47 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -310,6 +310,9 @@ static int  __net_init  __ip_vs_protocol_init(struct net *net)
 #ifdef CONFIG_IP_VS_PROTO_TCP
 	register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
 #endif
+#ifdef CONFIG_IP_VS_PROTO_UDP
+	register_ip_vs_proto_netns(net, &ip_vs_protocol_udp);
+#endif
 	return 0;
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 5ab54f6..71a4721 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -9,7 +9,8 @@
  *              as published by the Free Software Foundation; either version
  *              2 of the License, or (at your option) any later version.
  *
- * Changes:
+ * Changes:     Hans Schillstrom <hans.schillstrom@ericsson.com>
+ *              Network name space (netns) aware.
  *
  */
 
@@ -345,19 +346,6 @@ udp_csum_check(int af, struct sk_buff *skb, struct ip_vs_protocol *pp)
 	return 1;
 }
 
-
-/*
- *	Note: the caller guarantees that only one of register_app,
- *	unregister_app or app_conn_bind is called each time.
- */
-
-#define	UDP_APP_TAB_BITS	4
-#define	UDP_APP_TAB_SIZE	(1 << UDP_APP_TAB_BITS)
-#define	UDP_APP_TAB_MASK	(UDP_APP_TAB_SIZE - 1)
-
-static struct list_head udp_apps[UDP_APP_TAB_SIZE];
-static DEFINE_SPINLOCK(udp_app_lock);
-
 static inline __u16 udp_app_hashkey(__be16 port)
 {
 	return (((__force u16)port >> UDP_APP_TAB_BITS) ^ (__force u16)port)
@@ -371,22 +359,24 @@ static int udp_register_app(struct ip_vs_app *inc)
 	__u16 hash;
 	__be16 port = inc->port;
 	int ret = 0;
+	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_UDP);
 
 	hash = udp_app_hashkey(port);
 
 
-	spin_lock_bh(&udp_app_lock);
-	list_for_each_entry(i, &udp_apps[hash], p_list) {
+	spin_lock_bh(&ipvs->udp_app_lock);
+	list_for_each_entry(i, &ipvs->udp_apps[hash], p_list) {
 		if (i->port == port) {
 			ret = -EEXIST;
 			goto out;
 		}
 	}
-	list_add(&inc->p_list, &udp_apps[hash]);
-	atomic_inc(&ip_vs_protocol_udp.appcnt);
+	list_add(&inc->p_list, &ipvs->udp_apps[hash]);
+	atomic_inc(&pd->pp->appcnt);
 
   out:
-	spin_unlock_bh(&udp_app_lock);
+	spin_unlock_bh(&ipvs->udp_app_lock);
 	return ret;
 }
 
@@ -394,15 +384,19 @@ static int udp_register_app(struct ip_vs_app *inc)
 static void
 udp_unregister_app(struct ip_vs_app *inc)
 {
-	spin_lock_bh(&udp_app_lock);
-	atomic_dec(&ip_vs_protocol_udp.appcnt);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_UDP);
+	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+
+	spin_lock_bh(&ipvs->udp_app_lock);
+	atomic_dec(&pd->pp->appcnt);
 	list_del(&inc->p_list);
-	spin_unlock_bh(&udp_app_lock);
+	spin_unlock_bh(&ipvs->udp_app_lock);
 }
 
 
 static int udp_app_conn_bind(struct ip_vs_conn *cp)
 {
+	struct netns_ipvs *ipvs = net_ipvs(&init_net);
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
@@ -414,12 +408,12 @@ static int udp_app_conn_bind(struct ip_vs_conn *cp)
 	/* Lookup application incarnations and bind the right one */
 	hash = udp_app_hashkey(cp->vport);
 
-	spin_lock(&udp_app_lock);
-	list_for_each_entry(inc, &udp_apps[hash], p_list) {
+	spin_lock(&ipvs->udp_app_lock);
+	list_for_each_entry(inc, &ipvs->udp_apps[hash], p_list) {
 		if (inc->port == cp->vport) {
 			if (unlikely(!ip_vs_app_inc_get(inc)))
 				break;
-			spin_unlock(&udp_app_lock);
+			spin_unlock(&ipvs->udp_app_lock);
 
 			IP_VS_DBG_BUF(9, "%s(): Binding conn %s:%u->"
 				      "%s:%u to app %s on port %u\n",
@@ -436,14 +430,14 @@ static int udp_app_conn_bind(struct ip_vs_conn *cp)
 			goto out;
 		}
 	}
-	spin_unlock(&udp_app_lock);
+	spin_unlock(&ipvs->udp_app_lock);
 
   out:
 	return result;
 }
 
 
-static int udp_timeouts[IP_VS_UDP_S_LAST+1] = {
+static const int udp_timeouts[IP_VS_UDP_S_LAST+1] = {
 	[IP_VS_UDP_S_NORMAL]		=	5*60*HZ,
 	[IP_VS_UDP_S_LAST]		=	2*HZ,
 };
@@ -453,14 +447,6 @@ static const char *const udp_state_name_table[IP_VS_UDP_S_LAST+1] = {
 	[IP_VS_UDP_S_LAST]		=	"BUG!",
 };
 
-
-static int
-udp_set_state_timeout(struct ip_vs_protocol *pp, char *sname, int to)
-{
-	return ip_vs_set_state_timeout(pp->timeout_table, IP_VS_UDP_S_LAST,
-				       udp_state_name_table, sname, to);
-}
-
 static const char * udp_state_name(int state)
 {
 	if (state >= IP_VS_UDP_S_LAST)
@@ -473,18 +459,31 @@ udp_state_transition(struct ip_vs_conn *cp, int direction,
 		     const struct sk_buff *skb,
 		     struct ip_vs_protocol *pp)
 {
-	cp->timeout = pp->timeout_table[IP_VS_UDP_S_NORMAL];
+	struct ip_vs_proto_data *pd;   /* Temp fix, pp will be replaced by pd */
+
+	pd = ip_vs_proto_data_get(&init_net, IPPROTO_UDP);
+	if (unlikely(!pd)) {
+		pr_err("UDP no ns data\n");
+		return 0;
+	}
+
+	cp->timeout = pd->timeout_table[IP_VS_UDP_S_NORMAL];
 	return 1;
 }
 
-static void udp_init(struct ip_vs_protocol *pp)
+static void __udp_init(struct net *net, struct ip_vs_proto_data *pd)
 {
-	IP_VS_INIT_HASH_TABLE(udp_apps);
-	pp->timeout_table = udp_timeouts;
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
+	ip_vs_init_hash_table(ipvs->udp_apps, UDP_APP_TAB_SIZE);
+	spin_lock_init(&ipvs->udp_app_lock);
+	pd->timeout_table = ip_vs_create_timeout_table((int *)udp_timeouts,
+							sizeof(udp_timeouts));
 }
 
-static void udp_exit(struct ip_vs_protocol *pp)
+static void __udp_exit(struct net *net, struct ip_vs_proto_data *pd)
 {
+	kfree(pd->timeout_table);
 }
 
 
@@ -493,8 +492,10 @@ struct ip_vs_protocol ip_vs_protocol_udp = {
 	.protocol =		IPPROTO_UDP,
 	.num_states =		IP_VS_UDP_S_LAST,
 	.dont_defrag =		0,
-	.init =			udp_init,
-	.exit =			udp_exit,
+	.init =			NULL,
+	.exit =			NULL,
+	.init_netns =		__udp_init,
+	.exit_netns =		__udp_exit,
 	.conn_schedule =	udp_conn_schedule,
 	.conn_in_get =		ip_vs_conn_in_get_proto,
 	.conn_out_get =		ip_vs_conn_out_get_proto,
@@ -508,5 +509,4 @@ struct ip_vs_protocol ip_vs_protocol_udp = {
 	.app_conn_bind =	udp_app_conn_bind,
 	.debug_packet =		ip_vs_tcpudp_debug_packet,
 	.timeout_change =	NULL,
-	.set_state_timeout =	udp_set_state_timeout,
 };
-- 
1.7.2.3


^ permalink raw reply related

* [*v3 PATCH 15/22] IPVS: netns, ip_vs_stats and its procfs
From: hans @ 2010-12-30 10:50 UTC (permalink / raw)
  To: horms, ja, daniel.lezcano, wensong, lvs-devel, netdev,
	netfilter-devel
  Cc: Hans Schillstrom
In-Reply-To: <1293706266-27152-1-git-send-email-hans@schillstrom.com>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

The per-packet lock on the global statistics counter is now removed;
that statistic is now per CPU, i.e. no lock is needed.
The summing is done in ip_vs_est, into the ip_vs_stats struct,
which is moved to the ipvs struct.

procfs: ip_vs_stats_percpu shows a "per cpu" count and a grand total.
A new function, seq_file_single_net(), is added to ip_vs.h to handle
single_open_net(), since it does not place the net ptr in a struct like the others.

/var/lib/lxc # cat /proc/net/ip_vs_stats_percpu
       Total Incoming Outgoing         Incoming         Outgoing
CPU    Conns  Packets  Packets            Bytes            Bytes
  0        0        3        1               9D               34
  1        0        1        2               49               70
  2        0        1        2               34               76
  3        1        2        2               70               74
  ~        1        7        7              18A              18E

     Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
           0        0        0                0                0

*v3
ip_vs_stats remains as before; ip_vs_stats_percpu is added instead.
u64 seq lock added.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/ip_vs.h             |   25 +++++++++
 include/net/netns/ip_vs.h       |    5 ++
 net/netfilter/ipvs/ip_vs_core.c |   26 ++++-----
 net/netfilter/ipvs/ip_vs_ctl.c  |  110 ++++++++++++++++++++++++++++++++-------
 net/netfilter/ipvs/ip_vs_est.c  |   34 ++++++++++++
 5 files changed, 166 insertions(+), 34 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index d7b1dcd..1076cfb 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -88,6 +88,18 @@ static inline struct net *skb_sknet(const struct sk_buff *skb) {
 	return &init_net;
 #endif
 }
+/*
+ * This one is needed for single_open_net(), since net is stored directly in
+ * private, not in a struct, i.e. seq_file_net() can't be used.
+ */
+static inline struct net *seq_file_single_net(struct seq_file *seq)
+{
+#ifdef CONFIG_NET_NS
+	return (struct net *)seq->private;
+#else
+	return &init_net;
+#endif
+}
 
 /* Connections' size value needed by ip_vs_ctl.c */
 extern int ip_vs_conn_tab_size;
@@ -344,6 +356,19 @@ struct ip_vs_stats {
 
 	spinlock_t              lock;           /* spin lock */
 };
+/*
+ * Helper Macros for per cpu
+ * ipvs->ctl_stats->ustats.count
+ */
+#define IPVS_STAT_INC(ipvs, count)	\
+	__this_cpu_inc((ipvs)->ustats->count)
+
+#define IPVS_STAT_ADD(ipvs, count, value)	\
+	write_seqcount_begin(per_cpu_ptr((ipvs)->ustats_seq, \
+		             raw_smp_processor_id())); \
+	__this_cpu_add((ipvs)->ustats->count, value); \
+	write_seqcount_end(per_cpu_ptr((ipvs)->ustats_seq, \
+			   raw_smp_processor_id()))
 
 struct dst_entry;
 struct iphdr;
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index f6a6114..3b173b4 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -62,6 +62,11 @@ struct netns_ipvs {
 	struct list_head 	sctp_apps[SCTP_APP_TAB_SIZE];
 	spinlock_t		sctp_app_lock;
 #endif
+	/* ip_vs_ctl */
+	struct ip_vs_stats 		*ctl_stats;  /* Statistics & estimator */
+	struct ip_vs_stats_user __percpu *ustats;    /* Statistics */
+	seqcount_t			*ustats_seq; /* u64 read retry */
+
 	/* ip_vs_lblc */
 	int 			sysctl_lblc_expiration;
 	struct ctl_table_header	*lblc_ctl_header;
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 5d6e250..5e278e5 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -115,6 +115,8 @@ static inline void
 ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
 {
 	struct ip_vs_dest *dest = cp->dest;
+	struct netns_ipvs *ipvs = net_ipvs(skb_net(skb));
+
 	if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) {
 		spin_lock(&dest->stats.lock);
 		dest->stats.ustats.inpkts++;
@@ -126,10 +128,8 @@ ip_vs_in_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
 		dest->svc->stats.ustats.inbytes += skb->len;
 		spin_unlock(&dest->svc->stats.lock);
 
-		spin_lock(&ip_vs_stats.lock);
-		ip_vs_stats.ustats.inpkts++;
-		ip_vs_stats.ustats.inbytes += skb->len;
-		spin_unlock(&ip_vs_stats.lock);
+		IPVS_STAT_INC(ipvs, inpkts);
+		IPVS_STAT_ADD(ipvs, inbytes, skb->len);
 	}
 }
 
@@ -138,6 +138,8 @@ static inline void
 ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
 {
 	struct ip_vs_dest *dest = cp->dest;
+	struct netns_ipvs *ipvs = net_ipvs(skb_net(skb));
+
 	if (dest && (dest->flags & IP_VS_DEST_F_AVAILABLE)) {
 		spin_lock(&dest->stats.lock);
 		dest->stats.ustats.outpkts++;
@@ -149,10 +151,8 @@ ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
 		dest->svc->stats.ustats.outbytes += skb->len;
 		spin_unlock(&dest->svc->stats.lock);
 
-		spin_lock(&ip_vs_stats.lock);
-		ip_vs_stats.ustats.outpkts++;
-		ip_vs_stats.ustats.outbytes += skb->len;
-		spin_unlock(&ip_vs_stats.lock);
+		IPVS_STAT_INC(ipvs, outpkts);
+		IPVS_STAT_ADD(ipvs, outbytes, skb->len);
 	}
 }
 
@@ -160,6 +160,8 @@ ip_vs_out_stats(struct ip_vs_conn *cp, struct sk_buff *skb)
 static inline void
 ip_vs_conn_stats(struct ip_vs_conn *cp, struct ip_vs_service *svc)
 {
+	struct netns_ipvs *ipvs = net_ipvs(svc->net);
+
 	spin_lock(&cp->dest->stats.lock);
 	cp->dest->stats.ustats.conns++;
 	spin_unlock(&cp->dest->stats.lock);
@@ -168,9 +170,7 @@ ip_vs_conn_stats(struct ip_vs_conn *cp, struct ip_vs_service *svc)
 	svc->stats.ustats.conns++;
 	spin_unlock(&svc->stats.lock);
 
-	spin_lock(&ip_vs_stats.lock);
-	ip_vs_stats.ustats.conns++;
-	spin_unlock(&ip_vs_stats.lock);
+	IPVS_STAT_INC(ipvs, conns);
 }
 
 
@@ -1471,13 +1471,12 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
 static unsigned int
 ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 {
-	struct net *net = NULL;
+	struct net *net;
 	struct ip_vs_iphdr iph;
 	struct ip_vs_protocol *pp;
 	struct ip_vs_proto_data *pd;
 	struct ip_vs_conn *cp;
 	int ret, restart, pkts;
-	struct net *net;
 	struct netns_ipvs *ipvs;
 
 	/* Already marked as IPVS request or reply? */
@@ -1842,7 +1841,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 	},
 #endif
 };
-
 /*
  *	Initialize IP Virtual Server netns mem.
  */
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 105c05f..173fadc 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -258,8 +258,7 @@ static DECLARE_DELAYED_WORK(defense_work, defense_work_handler);
 
 static void defense_work_handler(struct work_struct *work)
 {
-	struct net *net = &init_net;
-	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct netns_ipvs *ipvs = net_ipvs(&init_net);
 
 	update_defense_level(ipvs);
 	if (atomic_read(&ip_vs_dropentry))
@@ -1499,7 +1498,7 @@ static int ip_vs_zero_all(struct net *net)
 		}
 	}
 
-	ip_vs_zero_stats(&ip_vs_stats);
+	ip_vs_zero_stats(net_ipvs(net)->ctl_stats);
 	return 0;
 }
 
@@ -1989,13 +1988,11 @@ static const struct file_operations ip_vs_info_fops = {
 
 #endif
 
-struct ip_vs_stats ip_vs_stats = {
-	.lock = __SPIN_LOCK_UNLOCKED(ip_vs_stats.lock),
-};
-
 #ifdef CONFIG_PROC_FS
 static int ip_vs_stats_show(struct seq_file *seq, void *v)
 {
+	struct net *net = seq_file_single_net(seq);
+	struct ip_vs_stats *ctl_stats = net_ipvs(net)->ctl_stats;
 
 /*               01234567 01234567 01234567 0123456701234567 0123456701234567 */
 	seq_puts(seq,
@@ -2003,22 +2000,22 @@ static int ip_vs_stats_show(struct seq_file *seq, void *v)
 	seq_printf(seq,
 		   "   Conns  Packets  Packets            Bytes            Bytes\n");
 
-	spin_lock_bh(&ip_vs_stats.lock);
-	seq_printf(seq, "%8X %8X %8X %16LX %16LX\n\n", ip_vs_stats.ustats.conns,
-		   ip_vs_stats.ustats.inpkts, ip_vs_stats.ustats.outpkts,
-		   (unsigned long long) ip_vs_stats.ustats.inbytes,
-		   (unsigned long long) ip_vs_stats.ustats.outbytes);
+	spin_lock_bh(&ctl_stats->lock);
+	seq_printf(seq, "%8X %8X %8X %16LX %16LX\n\n", ctl_stats->ustats.conns,
+		   ctl_stats->ustats.inpkts, ctl_stats->ustats.outpkts,
+		   (unsigned long long) ctl_stats->ustats.inbytes,
+		   (unsigned long long) ctl_stats->ustats.outbytes);
 
 /*                 01234567 01234567 01234567 0123456701234567 0123456701234567 */
 	seq_puts(seq,
 		   " Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s\n");
 	seq_printf(seq,"%8X %8X %8X %16X %16X\n",
-			ip_vs_stats.ustats.cps,
-			ip_vs_stats.ustats.inpps,
-			ip_vs_stats.ustats.outpps,
-			ip_vs_stats.ustats.inbps,
-			ip_vs_stats.ustats.outbps);
-	spin_unlock_bh(&ip_vs_stats.lock);
+			ctl_stats->ustats.cps,
+			ctl_stats->ustats.inpps,
+			ctl_stats->ustats.outpps,
+			ctl_stats->ustats.inbps,
+			ctl_stats->ustats.outbps);
+	spin_unlock_bh(&ctl_stats->lock);
 
 	return 0;
 }
@@ -2036,6 +2033,57 @@ static const struct file_operations ip_vs_stats_fops = {
 	.release = single_release,
 };
 
+static int ip_vs_stats_percpu_show(struct seq_file *seq, void *v)
+{
+	struct net *net = seq_file_single_net(seq);
+	struct ip_vs_stats *ctl_stats = net_ipvs(net)->ctl_stats;
+	int i;
+
+/*               01234567 01234567 01234567 0123456701234567 0123456701234567 */
+	seq_puts(seq,
+		 "       Total Incoming Outgoing         Incoming         Outgoing\n");
+	seq_printf(seq,
+		   "CPU    Conns  Packets  Packets            Bytes            Bytes\n");
+
+	for_each_possible_cpu(i) {
+		struct ip_vs_stats_user *u = per_cpu_ptr(net->ipvs->ustats, i);
+		seq_printf(seq, "%3X %8X %8X %8X %16LX %16LX\n",
+			    i, u->conns, u->inpkts, u->outpkts,
+			    (__u64) u->inbytes, (__u64) u->outbytes);
+	}
+
+	spin_lock_bh(&ctl_stats->lock);
+	seq_printf(seq, "  ~ %8X %8X %8X %16LX %16LX\n\n", ctl_stats->ustats.conns,
+		   ctl_stats->ustats.inpkts, ctl_stats->ustats.outpkts,
+		   (unsigned long long) ctl_stats->ustats.inbytes,
+		   (unsigned long long) ctl_stats->ustats.outbytes);
+
+/*                 01234567 01234567 01234567 0123456701234567 0123456701234567 */
+	seq_puts(seq,
+		   "     Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s\n");
+	seq_printf(seq,"    %8X %8X %8X %16X %16X\n",
+			ctl_stats->ustats.cps,
+			ctl_stats->ustats.inpps,
+			ctl_stats->ustats.outpps,
+			ctl_stats->ustats.inbps,
+			ctl_stats->ustats.outbps);
+	spin_unlock_bh(&ctl_stats->lock);
+
+	return 0;
+}
+
+static int ip_vs_stats_percpu_seq_open(struct inode *inode, struct file *file)
+{
+	return single_open_net(inode, file, ip_vs_stats_percpu_show);
+}
+
+static const struct file_operations ip_vs_stats_percpu_fops = {
+	.owner = THIS_MODULE,
+	.open = ip_vs_stats_percpu_seq_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
 #endif
 
 /*
@@ -3460,6 +3508,18 @@ int __net_init __ip_vs_control_init(struct net *net)
 
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return -EPERM;
+	/* procfs stats */
+	ipvs->ctl_stats = kzalloc(sizeof(struct ip_vs_stats), GFP_KERNEL);
+	if (ipvs->ctl_stats == NULL) {
+		pr_err("%s(): no memory.\n", __func__);
+		return -ENOMEM;
+	}
+	ipvs->ustats = alloc_percpu(struct ip_vs_stats_user);
+	if (!ipvs->ustats) {
+		pr_err("%s() alloc_percpu failed\n",__func__);
+		goto err_alloc;
+	}
+	spin_lock_init(&ipvs->ctl_stats->lock);
 
 	for(idx = 0; idx < IP_VS_RTAB_SIZE; idx++) {
 		INIT_LIST_HEAD(&ipvs->rs_table[idx]);
@@ -3467,25 +3527,35 @@ int __net_init __ip_vs_control_init(struct net *net)
 
 	proc_net_fops_create(net, "ip_vs", 0, &ip_vs_info_fops);
 	proc_net_fops_create(net, "ip_vs_stats", 0, &ip_vs_stats_fops);
+	proc_net_fops_create(net, "ip_vs_stats_percpu", 0,
+			     &ip_vs_stats_percpu_fops);
 	sysctl_header = register_net_sysctl_table(net, net_vs_ctl_path, vs_vars);
 	if (sysctl_header == NULL)
 		goto err_reg;
-	ip_vs_new_estimator(net, &ip_vs_stats);
+	ip_vs_new_estimator(net, ipvs->ctl_stats);
 	return 0;
 
 err_reg:
+	free_percpu(ipvs->ustats);
+err_alloc:
+	kfree(ipvs->ctl_stats);
 	return -ENOMEM;
 }
 
 static void __net_exit __ip_vs_control_cleanup(struct net *net)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return;
 
-	ip_vs_kill_estimator(net, &ip_vs_stats);
+	ip_vs_kill_estimator(net, ipvs->ctl_stats);
 	unregister_net_sysctl_table(sysctl_header);
+	proc_net_remove(net, "ip_vs_stats_percpu");
 	proc_net_remove(net, "ip_vs_stats");
 	proc_net_remove(net, "ip_vs");
+	free_percpu(ipvs->ustats);
+	kfree(ipvs->ctl_stats);
 }
 
 static struct pernet_operations ipvs_control_ops = {
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index 4a82a8b..6d3d06c 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -52,6 +52,39 @@
  */
 
 
+/*
+ * Make a summary from each cpu
+ */
+static inline void get_stats(struct netns_ipvs *ipvs)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct ip_vs_stats_user *u = per_cpu_ptr(ipvs->ustats, i);
+		seqcount_t  *seq_count = per_cpu_ptr(ipvs->ustats_seq, i);
+		unsigned int start;
+		if (i) {
+			ipvs->ctl_stats->ustats.conns += u->conns;
+			ipvs->ctl_stats->ustats.inpkts += u->inpkts;
+			ipvs->ctl_stats->ustats.outpkts += u->outpkts;
+			do {
+				start = read_seqcount_begin(seq_count);
+				ipvs->ctl_stats->ustats.inbytes += u->inbytes;
+				ipvs->ctl_stats->ustats.outbytes += u->outbytes;
+			} while (read_seqcount_retry(seq_count, start));
+		} else {
+			ipvs->ctl_stats->ustats.conns = u->conns;
+			ipvs->ctl_stats->ustats.inpkts = u->inpkts;
+			ipvs->ctl_stats->ustats.outpkts = u->outpkts;
+			do {
+				start = read_seqcount_begin(seq_count);
+				ipvs->ctl_stats->ustats.inbytes = u->inbytes;
+				ipvs->ctl_stats->ustats.outbytes = u->outbytes;
+			} while (read_seqcount_retry(seq_count, start));
+		}
+	}
+}
+
 static void estimation_timer(unsigned long arg)
 {
 	struct ip_vs_estimator *e;
@@ -64,6 +97,7 @@ static void estimation_timer(unsigned long arg)
 	struct netns_ipvs *ipvs;
 
 	ipvs = net_ipvs(net);
+	get_stats(ipvs);
 	spin_lock(&ipvs->est_lock);
 	list_for_each_entry(e, &ipvs->est_list, list) {
 		s = container_of(e, struct ip_vs_stats, est);
-- 
1.7.2.3
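
Note on get_stats() in the patch above: the byte counters are added with
"+=" inside the read_seqcount_begin()/read_seqcount_retry() loop, so a retry
would appear to add the same per-CPU value a second time. The following is a
minimal user-space sketch of the usual pattern, which snapshots into locals
inside the loop and accumulates once after it. All names here (cpu_stats,
totals, seq_begin, seq_retry, sum_stats) are hypothetical stand-ins, not
kernel APIs, and the stand-ins omit the memory barriers a real seqcount needs.

```c
#include <stdint.h>

/* Hypothetical per-CPU slot: a sequence counter plus two 64-bit counters. */
struct cpu_stats {
	unsigned int seq;	/* even: stable, odd: writer in progress */
	uint64_t inbytes;
	uint64_t outbytes;
};

struct totals {
	uint64_t inbytes;
	uint64_t outbytes;
};

/* Stand-in for read_seqcount_begin(); a real one also needs barriers. */
static unsigned int seq_begin(const struct cpu_stats *s)
{
	return s->seq;
}

/* Stand-in for read_seqcount_retry(): retry if a writer was or is active. */
static int seq_retry(const struct cpu_stats *s, unsigned int start)
{
	return (start & 1) || s->seq != start;
}

/* Sum all slots; only a consistent snapshot is ever added to the total. */
static void sum_stats(const struct cpu_stats *cpu, int ncpu, struct totals *t)
{
	int i;

	t->inbytes = 0;
	t->outbytes = 0;
	for (i = 0; i < ncpu; i++) {
		uint64_t in, out;
		unsigned int start;

		do {
			start = seq_begin(&cpu[i]);
			in = cpu[i].inbytes;	/* snapshot into locals */
			out = cpu[i].outbytes;
		} while (seq_retry(&cpu[i], start));

		t->inbytes += in;	/* accumulate once, after the loop */
		t->outbytes += out;
	}
}
```

Zeroing the totals up front also removes the need for the separate "first
CPU" branch, at the cost of a short window in which a concurrent reader of
the totals could see a partial sum.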



* [*v3 PATCH 12/22] IPVS: netns awareness to ip_vs_app
From: hans @ 2010-12-30 10:50 UTC (permalink / raw)
  To: horms, ja, daniel.lezcano, wensong, lvs-devel, netdev,
	netfilter-devel
  Cc: Hans Schillstrom
In-Reply-To: <1293706266-27152-1-git-send-email-hans@schillstrom.com>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

All variables are moved to struct netns_ipvs, and
most external users are fixed up (i.e. init_net is removed).

In struct ip_vs_protocol, a struct net *net param is added to:
 - register_app()
 - unregister_app()
This affects almost all proto_xxx.c files.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
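
The change described above follows a common pattern: file-scope state becomes
a member of a per-namespace struct, and every entry point takes the namespace
as its first argument. A minimal user-space sketch of that pattern follows.
The types and functions (struct app, register_app, unregister_app, app_count)
are simplified, hypothetical stand-ins, and the mutex the patch keeps next to
the list is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct app {
	const char *name;
	struct app *next;
};

struct netns_ipvs {
	struct app *app_list;	/* was a file-scope list head */
};

struct net {
	struct netns_ipvs ipvs;
};

static struct netns_ipvs *net_ipvs(struct net *net)
{
	return &net->ipvs;
}

/* Register in the list of the given namespace only (locking omitted). */
static int register_app(struct net *net, struct app *app)
{
	struct netns_ipvs *ipvs = net_ipvs(net);

	app->next = ipvs->app_list;
	ipvs->app_list = app;
	return 0;
}

/* Unlink the app from its namespace's list, if it is there. */
static void unregister_app(struct net *net, struct app *app)
{
	struct app **pp = &net_ipvs(net)->app_list;

	while (*pp && *pp != app)
		pp = &(*pp)->next;
	if (*pp)
		*pp = app->next;
}

/* Count the apps registered in one namespace. */
static int app_count(struct net *net)
{
	int n = 0;
	struct app *a;

	for (a = net_ipvs(net)->app_list; a; a = a->next)
		n++;
	return n;
}
```

An app registered in one struct net is not visible when walking another,
which is the property the per-netns list head is meant to provide.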
 include/net/ip_vs.h                   |   12 +++---
 include/net/netns/ip_vs.h             |    5 ++
 net/netfilter/ipvs/ip_vs_app.c        |   72 +++++++++++++++++++--------------
 net/netfilter/ipvs/ip_vs_ftp.c        |    8 ++--
 net/netfilter/ipvs/ip_vs_proto_sctp.c |   12 +++---
 net/netfilter/ipvs/ip_vs_proto_tcp.c  |   12 +++---
 net/netfilter/ipvs/ip_vs_proto_udp.c  |   12 +++---
 7 files changed, 75 insertions(+), 58 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index fde0bca..e8567b0 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -400,9 +400,9 @@ struct ip_vs_protocol {
 				const struct sk_buff *skb,
 				struct ip_vs_proto_data *pd);
 
-	int (*register_app)(struct ip_vs_app *inc);
+	int (*register_app)(struct net *net, struct ip_vs_app *inc);
 
-	void (*unregister_app)(struct ip_vs_app *inc);
+	void (*unregister_app)(struct net *net, struct ip_vs_app *inc);
 
 	int (*app_conn_bind)(struct ip_vs_conn *cp);
 
@@ -869,12 +869,12 @@ ip_vs_control_add(struct ip_vs_conn *cp, struct ip_vs_conn *ctl_cp)
  *      (from ip_vs_app.c)
  */
 #define IP_VS_APP_MAX_PORTS  8
-extern int register_ip_vs_app(struct ip_vs_app *app);
-extern void unregister_ip_vs_app(struct ip_vs_app *app);
+extern int register_ip_vs_app(struct net *net, struct ip_vs_app *app);
+extern void unregister_ip_vs_app(struct net *net, struct ip_vs_app *app);
 extern int ip_vs_bind_app(struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
 extern void ip_vs_unbind_app(struct ip_vs_conn *cp);
-extern int
-register_ip_vs_app_inc(struct ip_vs_app *app, __u16 proto, __u16 port);
+extern int register_ip_vs_app_inc(struct net *net, struct ip_vs_app *app,
+				  __u16 proto, __u16 port);
 extern int ip_vs_app_inc_get(struct ip_vs_app *inc);
 extern void ip_vs_app_inc_put(struct ip_vs_app *inc);
 
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index fcb3c7c..f077bd3 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -29,6 +29,11 @@ struct netns_ipvs {
 	#define IP_VS_RTAB_MASK (IP_VS_RTAB_SIZE - 1)
 
 	struct list_head 	rs_table[IP_VS_RTAB_SIZE];
+	/* ip_vs_app */
+	struct list_head 	app_list;
+	struct mutex		app_mutex;
+	struct lock_class_key 	app_key;	/* mutex debugging */
+
 	/* ip_vs_proto */
 	#define IP_VS_PROTO_TAB_SIZE	32	/* must be power of 2 */
 	struct ip_vs_proto_data *proto_data_table[IP_VS_PROTO_TAB_SIZE];
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index 6d10352..af2cd83 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -43,11 +43,6 @@ EXPORT_SYMBOL(register_ip_vs_app);
 EXPORT_SYMBOL(unregister_ip_vs_app);
 EXPORT_SYMBOL(register_ip_vs_app_inc);
 
-/* ipvs application list head */
-static LIST_HEAD(ip_vs_app_list);
-static DEFINE_MUTEX(__ip_vs_app_mutex);
-
-
 /*
  *	Get an ip_vs_app object
  */
@@ -67,7 +62,8 @@ static inline void ip_vs_app_put(struct ip_vs_app *app)
  *	Allocate/initialize app incarnation and register it in proto apps.
  */
 static int
-ip_vs_app_inc_new(struct ip_vs_app *app, __u16 proto, __u16 port)
+ip_vs_app_inc_new(struct net *net, struct ip_vs_app *app, __u16 proto,
+		  __u16 port)
 {
 	struct ip_vs_protocol *pp;
 	struct ip_vs_app *inc;
@@ -98,7 +94,7 @@ ip_vs_app_inc_new(struct ip_vs_app *app, __u16 proto, __u16 port)
 		}
 	}
 
-	ret = pp->register_app(inc);
+	ret = pp->register_app(net, inc);
 	if (ret)
 		goto out;
 
@@ -119,7 +115,7 @@ ip_vs_app_inc_new(struct ip_vs_app *app, __u16 proto, __u16 port)
  *	Release app incarnation
  */
 static void
-ip_vs_app_inc_release(struct ip_vs_app *inc)
+ip_vs_app_inc_release(struct net *net, struct ip_vs_app *inc)
 {
 	struct ip_vs_protocol *pp;
 
@@ -127,7 +123,7 @@ ip_vs_app_inc_release(struct ip_vs_app *inc)
 		return;
 
 	if (pp->unregister_app)
-		pp->unregister_app(inc);
+		pp->unregister_app(net, inc);
 
 	IP_VS_DBG(9, "%s App %s:%u unregistered\n",
 		  pp->name, inc->name, ntohs(inc->port));
@@ -168,15 +164,16 @@ void ip_vs_app_inc_put(struct ip_vs_app *inc)
  *	Register an application incarnation in protocol applications
  */
 int
-register_ip_vs_app_inc(struct ip_vs_app *app, __u16 proto, __u16 port)
+register_ip_vs_app_inc(struct net *net, struct ip_vs_app *app, __u16 proto, __u16 port)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
 	int result;
 
-	mutex_lock(&__ip_vs_app_mutex);
+	mutex_lock(&ipvs->app_mutex);
 
-	result = ip_vs_app_inc_new(app, proto, port);
+	result = ip_vs_app_inc_new(net, app, proto, port);
 
-	mutex_unlock(&__ip_vs_app_mutex);
+	mutex_unlock(&ipvs->app_mutex);
 
 	return result;
 }
@@ -185,16 +182,17 @@ register_ip_vs_app_inc(struct ip_vs_app *app, __u16 proto, __u16 port)
 /*
  *	ip_vs_app registration routine
  */
-int register_ip_vs_app(struct ip_vs_app *app)
+int register_ip_vs_app(struct net *net, struct ip_vs_app *app)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
 	/* increase the module use count */
 	ip_vs_use_count_inc();
 
-	mutex_lock(&__ip_vs_app_mutex);
+	mutex_lock(&ipvs->app_mutex);
 
-	list_add(&app->a_list, &ip_vs_app_list);
+	list_add(&app->a_list, &ipvs->app_list);
 
-	mutex_unlock(&__ip_vs_app_mutex);
+	mutex_unlock(&ipvs->app_mutex);
 
 	return 0;
 }
@@ -204,19 +202,20 @@ int register_ip_vs_app(struct ip_vs_app *app)
  *	ip_vs_app unregistration routine
  *	We are sure there are no app incarnations attached to services
  */
-void unregister_ip_vs_app(struct ip_vs_app *app)
+void unregister_ip_vs_app(struct net *net, struct ip_vs_app *app)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_app *inc, *nxt;
 
-	mutex_lock(&__ip_vs_app_mutex);
+	mutex_lock(&ipvs->app_mutex);
 
 	list_for_each_entry_safe(inc, nxt, &app->incs_list, a_list) {
-		ip_vs_app_inc_release(inc);
+		ip_vs_app_inc_release(net, inc);
 	}
 
 	list_del(&app->a_list);
 
-	mutex_unlock(&__ip_vs_app_mutex);
+	mutex_unlock(&ipvs->app_mutex);
 
 	/* decrease the module use count */
 	ip_vs_use_count_dec();
@@ -226,7 +225,8 @@ void unregister_ip_vs_app(struct ip_vs_app *app)
 /*
  *	Bind ip_vs_conn to its ip_vs_app (called by cp constructor)
  */
-int ip_vs_bind_app(struct ip_vs_conn *cp, struct ip_vs_protocol *pp)
+int ip_vs_bind_app(struct ip_vs_conn *cp,
+		   struct ip_vs_protocol *pp)
 {
 	return pp->app_conn_bind(cp);
 }
@@ -481,11 +481,11 @@ int ip_vs_app_pkt_in(struct ip_vs_conn *cp, struct sk_buff *skb)
  *	/proc/net/ip_vs_app entry function
  */
 
-static struct ip_vs_app *ip_vs_app_idx(loff_t pos)
+static struct ip_vs_app *ip_vs_app_idx(struct netns_ipvs *ipvs, loff_t pos)
 {
 	struct ip_vs_app *app, *inc;
 
-	list_for_each_entry(app, &ip_vs_app_list, a_list) {
+	list_for_each_entry(app, &ipvs->app_list, a_list) {
 		list_for_each_entry(inc, &app->incs_list, a_list) {
 			if (pos-- == 0)
 				return inc;
@@ -497,19 +497,24 @@ static struct ip_vs_app *ip_vs_app_idx(loff_t pos)
 
 static void *ip_vs_app_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	mutex_lock(&__ip_vs_app_mutex);
+	struct net *net = seq_file_net(seq);
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
+	mutex_lock(&ipvs->app_mutex);
 
-	return *pos ? ip_vs_app_idx(*pos - 1) : SEQ_START_TOKEN;
+	return *pos ? ip_vs_app_idx(ipvs, *pos - 1) : SEQ_START_TOKEN;
 }
 
 static void *ip_vs_app_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct ip_vs_app *inc, *app;
 	struct list_head *e;
+	struct net *net = seq_file_net(seq);
+	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	++*pos;
 	if (v == SEQ_START_TOKEN)
-		return ip_vs_app_idx(0);
+		return ip_vs_app_idx(ipvs, 0);
 
 	inc = v;
 	app = inc->app;
@@ -518,7 +523,7 @@ static void *ip_vs_app_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		return list_entry(e, struct ip_vs_app, a_list);
 
 	/* go on to next application */
-	for (e = app->a_list.next; e != &ip_vs_app_list; e = e->next) {
+	for (e = app->a_list.next; e != &ipvs->app_list; e = e->next) {
 		app = list_entry(e, struct ip_vs_app, a_list);
 		list_for_each_entry(inc, &app->incs_list, a_list) {
 			return inc;
@@ -529,7 +534,9 @@ static void *ip_vs_app_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 
 static void ip_vs_app_seq_stop(struct seq_file *seq, void *v)
 {
-	mutex_unlock(&__ip_vs_app_mutex);
+	struct netns_ipvs *ipvs = net_ipvs(seq_file_net(seq));
+
+	mutex_unlock(&ipvs->app_mutex);
 }
 
 static int ip_vs_app_seq_show(struct seq_file *seq, void *v)
@@ -557,7 +564,8 @@ static const struct seq_operations ip_vs_app_seq_ops = {
 
 static int ip_vs_app_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &ip_vs_app_seq_ops);
+	return seq_open_net(inode, file, &ip_vs_app_seq_ops,
+			    sizeof(struct seq_net_private));
 }
 
 static const struct file_operations ip_vs_app_fops = {
@@ -571,9 +579,13 @@ static const struct file_operations ip_vs_app_fops = {
 
 static int __net_init __ip_vs_app_init(struct net *net)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return -EPERM;
 
+	INIT_LIST_HEAD(&ipvs->app_list);
+	__mutex_init(&ipvs->app_mutex,"ipvs->app_mutex", &ipvs->app_key);
 	proc_net_fops_create(net, "ip_vs_app", 0, &ip_vs_app_fops);
 	return 0;
 }
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index b38ae94..77b0036 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -414,14 +414,14 @@ static int __net_init __ip_vs_ftp_init(struct net *net)
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return -EPERM;
 
-	ret = register_ip_vs_app(app);
+	ret = register_ip_vs_app(net, app);
 	if (ret)
 		return ret;
 
 	for (i=0; i<IP_VS_APP_MAX_PORTS; i++) {
 		if (!ports[i])
 			continue;
-		ret = register_ip_vs_app_inc(app, app->protocol, ports[i]);
+		ret = register_ip_vs_app_inc(net, app, app->protocol, ports[i]);
 		if (ret)
 			break;
 		pr_info("%s: loaded support on port[%d] = %d\n",
@@ -429,7 +429,7 @@ static int __net_init __ip_vs_ftp_init(struct net *net)
 	}
 
 	if (ret)
-		unregister_ip_vs_app(app);
+		unregister_ip_vs_app(net, app);
 
 	return ret;
 }
@@ -443,7 +443,7 @@ static void __ip_vs_ftp_exit(struct net *net)
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return;
 
-	unregister_ip_vs_app(app);
+	unregister_ip_vs_app(net, app);
 }
 
 static struct pernet_operations ip_vs_ftp_ops = {
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index ad44205..391b856 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -1016,14 +1016,14 @@ static inline __u16 sctp_app_hashkey(__be16 port)
 		& SCTP_APP_TAB_MASK;
 }
 
-static int sctp_register_app(struct ip_vs_app *inc)
+static int sctp_register_app(struct net *net, struct ip_vs_app *inc)
 {
 	struct ip_vs_app *i;
 	__u16 hash;
 	__be16 port = inc->port;
 	int ret = 0;
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_SCTP);
+	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_SCTP);
 
 	hash = sctp_app_hashkey(port);
 
@@ -1042,10 +1042,10 @@ out:
 	return ret;
 }
 
-static void sctp_unregister_app(struct ip_vs_app *inc)
+static void sctp_unregister_app(struct net *net, struct ip_vs_app *inc)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_SCTP);
+	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_SCTP);
 
 	spin_lock_bh(&ipvs->sctp_app_lock);
 	atomic_dec(&pd->appcnt);
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 83bcda3..f27325b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -576,14 +576,14 @@ static inline __u16 tcp_app_hashkey(__be16 port)
 }
 
 
-static int tcp_register_app(struct ip_vs_app *inc)
+static int tcp_register_app(struct net *net, struct ip_vs_app *inc)
 {
 	struct ip_vs_app *i;
 	__u16 hash;
 	__be16 port = inc->port;
 	int ret = 0;
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_TCP);
+	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_TCP);
 
 	hash = tcp_app_hashkey(port);
 
@@ -604,10 +604,10 @@ static int tcp_register_app(struct ip_vs_app *inc)
 
 
 static void
-tcp_unregister_app(struct ip_vs_app *inc)
+tcp_unregister_app(struct net *net, struct ip_vs_app *inc)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_TCP);
+	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_TCP);
 
 	spin_lock_bh(&ipvs->tcp_app_lock);
 	atomic_dec(&pd->appcnt);
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 3719837..1dc3941 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -353,14 +353,14 @@ static inline __u16 udp_app_hashkey(__be16 port)
 }
 
 
-static int udp_register_app(struct ip_vs_app *inc)
+static int udp_register_app(struct net *net, struct ip_vs_app *inc)
 {
 	struct ip_vs_app *i;
 	__u16 hash;
 	__be16 port = inc->port;
 	int ret = 0;
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_UDP);
+	struct netns_ipvs *ipvs = net_ipvs(net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_UDP);
 
 	hash = udp_app_hashkey(port);
 
@@ -382,10 +382,10 @@ static int udp_register_app(struct ip_vs_app *inc)
 
 
 static void
-udp_unregister_app(struct ip_vs_app *inc)
+udp_unregister_app(struct net *net, struct ip_vs_app *inc)
 {
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, IPPROTO_UDP);
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(net, IPPROTO_UDP);
+	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	spin_lock_bh(&ipvs->udp_app_lock);
 	atomic_dec(&pd->appcnt);
-- 
1.7.2.3



* [*v3 PATCH 16/22] IPVS: netns, connection hash got net as param.
From: hans @ 2010-12-30 10:51 UTC (permalink / raw)
  To: horms, ja, daniel.lezcano, wensong, lvs-devel, netdev,
	netfilter-devel
  Cc: Hans Schillstrom
In-Reply-To: <1293706266-27152-1-git-send-email-hans@schillstrom.com>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

The connection hash table is now name space aware,
i.e. net ptr >> 8 is XORed into the hash,
and net is the first param to be compared, except in the "fast path" (see *v3).
The net struct is 0xa40 in size (a little bit smaller on 32-bit archs)
and cache-line aligned, so ptr >> 5 might be a better choice?

All lookups where net is compared use net_eq(), which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.

ip_vs_conn_fill_param() has *net as its first param now.

Three new inlines are added to keep the conn struct smaller
when name space is disabled:
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()

*v3
  Moved the net compare to the end in the "fast path".
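
As a rough illustration of the hashing described above, here is a user-space
sketch that XORs the shifted namespace pointer into a hash and masks it to the
table size. The mixer mix() is only a placeholder (the patch uses
jhash_3words() with a random seed), and TAB_BITS and conn_hashkey() are
hypothetical names chosen for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define TAB_BITS 12			/* hypothetical table size: 4096 buckets */
#define TAB_MASK ((1u << TAB_BITS) - 1)

struct net { int dummy; };

/* Placeholder mixer, standing in for jhash_3words() with a seed. */
static uint32_t mix(uint32_t addr, uint32_t port, uint32_t proto)
{
	uint32_t h = addr * 2654435761u;

	h ^= port * 40503u;
	h ^= proto;
	return h;
}

/* Fold the namespace pointer into the bucket index, as "net ptr >> 8". */
static unsigned int conn_hashkey(const struct net *net, uint32_t addr,
				 uint16_t port, unsigned int proto)
{
	uint32_t nsbits = (uint32_t)((uintptr_t)net >> 8);

	return (mix(addr, port, proto) ^ nsbits) & TAB_MASK;
}
```

With a shift of 8, two pointers less than 256 bytes apart can contribute the
same bits, which is harmless when every struct net is much larger than that;
a smaller shift would keep more low-order pointer bits in the index.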

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
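
The three inline helpers and the v3 compare order can be sketched in user
space as follows. NETNS_ENABLED is a hypothetical macro standing in for
CONFIG_NET_NS, and struct conn, conn_net(), conn_net_set(), conn_net_eq() and
conn_match() are simplified illustrations rather than the kernel definitions.

```c
#include <stddef.h>

/* Define NETNS_ENABLED to mimic CONFIG_NET_NS=y (hypothetical switch). */
struct net { int id; };

static struct net init_net;

struct conn {
#ifdef NETNS_ENABLED
	struct net *net;	/* only present when namespaces are built in */
#endif
	unsigned short cport;
	unsigned short vport;
};

static struct net *conn_net(const struct conn *cp)
{
#ifdef NETNS_ENABLED
	return cp->net;
#else
	(void)cp;
	return &init_net;	/* single global namespace */
#endif
}

static void conn_net_set(struct conn *cp, struct net *net)
{
#ifdef NETNS_ENABLED
	cp->net = net;
#else
	(void)cp;
	(void)net;		/* nothing to store */
#endif
}

static int conn_net_eq(const struct conn *cp, const struct net *net)
{
#ifdef NETNS_ENABLED
	return cp->net == net;
#else
	(void)cp;
	(void)net;
	return 1;		/* constant, so the compare can fold away */
#endif
}

/* Cheap port compares first, namespace compare last, as in the fast path. */
static int conn_match(const struct conn *cp, const struct net *net,
		      unsigned short cport, unsigned short vport)
{
	return cp->cport == cport && cp->vport == vport &&
	       conn_net_eq(cp, net);
}
```

Because conn_net_eq() returns a constant when the macro is undefined, the
last term of conn_match() costs nothing in that configuration, and struct
conn carries no extra pointer.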
 include/net/ip_vs.h                     |   53 ++++++++++++---
 include/net/netns/ip_vs.h               |    2 +
 net/netfilter/ipvs/ip_vs_conn.c         |  112 +++++++++++++++++++------------
 net/netfilter/ipvs/ip_vs_core.c         |   15 +++--
 net/netfilter/ipvs/ip_vs_ftp.c          |   14 ++--
 net/netfilter/ipvs/ip_vs_nfct.c         |    6 +-
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |   15 +++--
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |    2 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |    2 +-
 net/netfilter/ipvs/ip_vs_proto_udp.c    |    2 +-
 net/netfilter/ipvs/ip_vs_sync.c         |   13 ++--
 11 files changed, 153 insertions(+), 83 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1076cfb..fb0470a 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -455,6 +455,7 @@ extern struct ip_vs_proto_data * ip_vs_proto_data_get(struct net *net,
 						      unsigned short proto);

 struct ip_vs_conn_param {
+	struct net *			net;
 	const union nf_inet_addr	*caddr;
 	const union nf_inet_addr	*vaddr;
 	__be16				cport;
@@ -472,17 +473,19 @@ struct ip_vs_conn_param {
  */
 struct ip_vs_conn {
 	struct list_head        c_list;         /* hashed list heads */
-
+#ifdef CONFIG_NET_NS
+	struct net              *net;           /* Name space */
+#endif
 	/* Protocol, addresses and port numbers */
-	u16                      af;		/* address family */
-	union nf_inet_addr       caddr;          /* client address */
-	union nf_inet_addr       vaddr;          /* virtual address */
-	union nf_inet_addr       daddr;          /* destination address */
-	volatile __u32           flags;          /* status flags */
-	__u32                    fwmark;         /* Fire wall mark from skb */
-	__be16                   cport;
-	__be16                   vport;
-	__be16                   dport;
+	u16                     af;             /* address family */
+	__be16                  cport;
+	__be16                  vport;
+	__be16                  dport;
+	__u32                   fwmark;         /* Fire wall mark from skb */
+	union nf_inet_addr      caddr;          /* client address */
+	union nf_inet_addr      vaddr;          /* virtual address */
+	union nf_inet_addr      daddr;          /* destination address */
+	volatile __u32          flags;          /* status flags */
 	__u16                   protocol;       /* Which protocol (TCP/UDP) */

 	/* counter and timer */
@@ -525,6 +528,33 @@ struct ip_vs_conn {
 	__u8			pe_data_len;
 };

+/*
+ *  To save some memory in conn table when name space is disabled.
+ */
+static inline struct net *ip_vs_conn_net(const struct ip_vs_conn *cp)
+{
+#ifdef CONFIG_NET_NS
+	return cp->net;
+#else
+	return &init_net;
+#endif
+}
+static inline void ip_vs_conn_net_set(struct ip_vs_conn *cp, struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	cp->net = net;
+#endif
+}
+
+static inline int ip_vs_conn_net_eq(const struct ip_vs_conn *cp,
+				    struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	return cp->net == net;
+#else
+	return 1;
+#endif
+}

 /*
  *	Extended internal versions of struct ip_vs_service_user and
@@ -774,13 +804,14 @@ enum {
 	IP_VS_DIR_LAST,
 };

-static inline void ip_vs_conn_fill_param(int af, int protocol,
+static inline void ip_vs_conn_fill_param(struct net *net, int af, int protocol,
 					 const union nf_inet_addr *caddr,
 					 __be16 cport,
 					 const union nf_inet_addr *vaddr,
 					 __be16 vport,
 					 struct ip_vs_conn_param *p)
 {
+	p->net = net;
 	p->af = af;
 	p->protocol = protocol;
 	p->caddr = caddr;
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index 3b173b4..d547e03 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -67,6 +67,8 @@ struct netns_ipvs {
 	struct ip_vs_stats_user __percpu *ustats;    /* Statistics */
 	seqcount_t			*ustats_seq; /* u64 read retry */

+	/* ip_vs_conn */
+	atomic_t 		conn_count;         /*  connection counter */
 	/* ip_vs_lblc */
 	int 			sysctl_lblc_expiration;
 	struct ctl_table_header	*lblc_ctl_header;
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index b2024c9..0d5e4fe 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -64,9 +64,6 @@ static struct list_head *ip_vs_conn_tab __read_mostly;
 /*  SLAB cache for IPVS connections */
 static struct kmem_cache *ip_vs_conn_cachep __read_mostly;

-/*  counter for current IPVS connections */
-static atomic_t ip_vs_conn_count = ATOMIC_INIT(0);
-
 /*  counter for no client port connections */
 static atomic_t ip_vs_conn_no_cport_cnt = ATOMIC_INIT(0);

@@ -76,7 +73,7 @@ static unsigned int ip_vs_conn_rnd __read_mostly;
 /*
  *  Fine locking granularity for big connection hash table
  */
-#define CT_LOCKARRAY_BITS  4
+#define CT_LOCKARRAY_BITS  5
 #define CT_LOCKARRAY_SIZE  (1<<CT_LOCKARRAY_BITS)
 #define CT_LOCKARRAY_MASK  (CT_LOCKARRAY_SIZE-1)

@@ -133,19 +130,19 @@ static inline void ct_write_unlock_bh(unsigned key)
 /*
  *	Returns hash value for IPVS connection entry
  */
-static unsigned int ip_vs_conn_hashkey(int af, unsigned proto,
+static unsigned int ip_vs_conn_hashkey(struct net *net, int af, unsigned proto,
 				       const union nf_inet_addr *addr,
 				       __be16 port)
 {
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6)
-		return jhash_3words(jhash(addr, 16, ip_vs_conn_rnd),
-				    (__force u32)port, proto, ip_vs_conn_rnd)
-			& ip_vs_conn_tab_mask;
+		return (jhash_3words(jhash(addr, 16, ip_vs_conn_rnd),
+				    (__force u32)port, proto, ip_vs_conn_rnd) ^
+			((size_t)net>>8)) & ip_vs_conn_tab_mask;
 #endif
-	return jhash_3words((__force u32)addr->ip, (__force u32)port, proto,
-			    ip_vs_conn_rnd)
-		& ip_vs_conn_tab_mask;
+	return (jhash_3words((__force u32)addr->ip, (__force u32)port, proto,
+			    ip_vs_conn_rnd) ^
+		((size_t)net>>8)) & ip_vs_conn_tab_mask;
 }

 static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p,
@@ -166,15 +163,15 @@ static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p,
 		port = p->vport;
 	}

-	return ip_vs_conn_hashkey(p->af, p->protocol, addr, port);
+	return ip_vs_conn_hashkey(p->net, p->af, p->protocol, addr, port);
 }

 static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
 {
 	struct ip_vs_conn_param p;

-	ip_vs_conn_fill_param(cp->af, cp->protocol, &cp->caddr, cp->cport,
-			      NULL, 0, &p);
+	ip_vs_conn_fill_param(ip_vs_conn_net(cp), cp->af, cp->protocol,
+			      &cp->caddr, cp->cport, NULL, 0, &p);

 	if (cp->pe) {
 		p.pe = cp->pe;
@@ -186,7 +183,7 @@ static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
 }

 /*
- *	Hashes ip_vs_conn in ip_vs_conn_tab by proto,addr,port.
+ *	Hashes ip_vs_conn in ip_vs_conn_tab by netns,proto,addr,port.
  *	returns bool success.
  */
 static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
@@ -269,11 +266,12 @@ __ip_vs_conn_in_get(const struct ip_vs_conn_param *p)

 	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
 		if (cp->af == p->af &&
+		    p->cport == cp->cport && p->vport == cp->vport &&
 		    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
 		    ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
-		    p->cport == cp->cport && p->vport == cp->vport &&
 		    ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
-		    p->protocol == cp->protocol) {
+		    p->protocol == cp->protocol &&
+		    ip_vs_conn_net_eq(cp, p->net)) {
 			/* HIT */
 			atomic_inc(&cp->refcnt);
 			ct_read_unlock(hash);
@@ -313,17 +311,18 @@ ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
 			    struct ip_vs_conn_param *p)
 {
 	__be16 _ports[2], *pptr;
+	struct net *net = skb_net(skb);

 	pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
 	if (pptr == NULL)
 		return 1;

 	if (likely(!inverse))
-		ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
-				      &iph->daddr, pptr[1], p);
+		ip_vs_conn_fill_param(net, af, iph->protocol, &iph->saddr,
+				      pptr[0], &iph->daddr, pptr[1], p);
 	else
-		ip_vs_conn_fill_param(af, iph->protocol, &iph->daddr, pptr[1],
-				      &iph->saddr, pptr[0], p);
+		ip_vs_conn_fill_param(net, af, iph->protocol, &iph->daddr,
+				      pptr[1], &iph->saddr, pptr[0], p);
 	return 0;
 }

@@ -352,6 +351,8 @@ struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
 	ct_read_lock(hash);

 	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+		if (!ip_vs_conn_net_eq(cp, p->net))
+			continue;
 		if (p->pe_data && p->pe->ct_match) {
 			if (p->pe == cp->pe && p->pe->ct_match(p, cp))
 				goto out;
@@ -403,10 +404,11 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)

 	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
 		if (cp->af == p->af &&
+		    p->vport == cp->cport && p->cport == cp->dport &&
 		    ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
 		    ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
-		    p->vport == cp->cport && p->cport == cp->dport &&
-		    p->protocol == cp->protocol) {
+		    p->protocol == cp->protocol &&
+		    ip_vs_conn_net_eq(cp, p->net)) {
 			/* HIT */
 			atomic_inc(&cp->refcnt);
 			ret = cp;
@@ -609,8 +611,8 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
 	struct ip_vs_dest *dest;

 	if ((cp) && (!cp->dest)) {
-		dest = ip_vs_find_dest(&init_net, cp->af, &cp->daddr, cp->dport,
-				       &cp->vaddr, cp->vport,
+		dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
+				       cp->dport, &cp->vaddr, cp->vport,
 				       cp->protocol, cp->fwmark);
 		ip_vs_bind_dest(cp, dest);
 		return dest;
@@ -728,6 +730,7 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
 static void ip_vs_conn_expire(unsigned long data)
 {
 	struct ip_vs_conn *cp = (struct ip_vs_conn *)data;
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));

 	cp->timeout = 60*HZ;

@@ -770,7 +773,7 @@ static void ip_vs_conn_expire(unsigned long data)
 		ip_vs_unbind_dest(cp);
 		if (cp->flags & IP_VS_CONN_F_NO_CPORT)
 			atomic_dec(&ip_vs_conn_no_cport_cnt);
-		atomic_dec(&ip_vs_conn_count);
+		atomic_dec(&ipvs->conn_count);

 		kmem_cache_free(ip_vs_conn_cachep, cp);
 		return;
@@ -804,7 +807,9 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	       struct ip_vs_dest *dest, __u32 fwmark)
 {
 	struct ip_vs_conn *cp;
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, p->protocol);
+	struct netns_ipvs *ipvs = net_ipvs(p->net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(p->net,
+							   p->protocol);

 	cp = kmem_cache_zalloc(ip_vs_conn_cachep, GFP_ATOMIC);
 	if (cp == NULL) {
@@ -814,6 +819,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,

 	INIT_LIST_HEAD(&cp->c_list);
 	setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
+	ip_vs_conn_net_set(cp, p->net);
 	cp->af		   = p->af;
 	cp->protocol	   = p->protocol;
 	ip_vs_addr_copy(p->af, &cp->caddr, p->caddr);
@@ -844,7 +850,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	atomic_set(&cp->n_control, 0);
 	atomic_set(&cp->in_pkts, 0);

-	atomic_inc(&ip_vs_conn_count);
+	atomic_inc(&ipvs->conn_count);
 	if (flags & IP_VS_CONN_F_NO_CPORT)
 		atomic_inc(&ip_vs_conn_no_cport_cnt);

@@ -886,17 +892,22 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
  *	/proc/net/ip_vs_conn entries
  */
 #ifdef CONFIG_PROC_FS
+struct ip_vs_iter_state {
+	struct seq_net_private p;
+	struct list_head *l;
+};

 static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
 {
 	int idx;
 	struct ip_vs_conn *cp;
+	struct ip_vs_iter_state *iter = seq->private;

 	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
 		ct_read_lock_bh(idx);
 		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
 			if (pos-- == 0) {
-				seq->private = &ip_vs_conn_tab[idx];
+				iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 			}
 		}
@@ -908,14 +919,17 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)

 static void *ip_vs_conn_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	seq->private = NULL;
+	struct ip_vs_iter_state *iter = seq->private;
+
+	iter->l = NULL;
 	return *pos ? ip_vs_conn_array(seq, *pos - 1) :SEQ_START_TOKEN;
 }

 static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct ip_vs_conn *cp = v;
-	struct list_head *e, *l = seq->private;
+	struct ip_vs_iter_state *iter = seq->private;
+	struct list_head *e, *l = iter->l;
 	int idx;

 	++*pos;
@@ -932,18 +946,19 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	while (++idx < ip_vs_conn_tab_size) {
 		ct_read_lock_bh(idx);
 		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
-			seq->private = &ip_vs_conn_tab[idx];
+			iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 		}
 		ct_read_unlock_bh(idx);
 	}
-	seq->private = NULL;
+	iter->l = NULL;
 	return NULL;
 }

 static void ip_vs_conn_seq_stop(struct seq_file *seq, void *v)
 {
-	struct list_head *l = seq->private;
+	struct ip_vs_iter_state *iter = seq->private;
+	struct list_head *l = iter->l;

 	if (l)
 		ct_read_unlock_bh(l - ip_vs_conn_tab);
@@ -957,9 +972,12 @@ static int ip_vs_conn_seq_show(struct seq_file *seq, void *v)
    "Pro FromIP   FPrt ToIP     TPrt DestIP   DPrt State       Expires PEName PEData\n");
 	else {
 		const struct ip_vs_conn *cp = v;
+		struct net *net = seq_file_net(seq);
 		char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3];
 		size_t len = 0;

+		if (!ip_vs_conn_net_eq(cp, net))
+			return 0;
 		if (cp->pe_data) {
 			pe_data[0] = ' ';
 			len = strlen(cp->pe->name);
@@ -1004,7 +1022,8 @@ static const struct seq_operations ip_vs_conn_seq_ops = {

 static int ip_vs_conn_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &ip_vs_conn_seq_ops);
+	return seq_open_net(inode, file, &ip_vs_conn_seq_ops,
+			    sizeof(struct ip_vs_iter_state));
 }

 static const struct file_operations ip_vs_conn_fops = {
@@ -1031,6 +1050,10 @@ static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v)
    "Pro FromIP   FPrt ToIP     TPrt DestIP   DPrt State       Origin Expires\n");
 	else {
 		const struct ip_vs_conn *cp = v;
+		struct net *net = seq_file_net(seq);
+
+		if (!ip_vs_conn_net_eq(cp, net))
+			return 0;

 #ifdef CONFIG_IP_VS_IPV6
 		if (cp->af == AF_INET6)
@@ -1067,7 +1090,8 @@ static const struct seq_operations ip_vs_conn_sync_seq_ops = {

 static int ip_vs_conn_sync_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &ip_vs_conn_sync_seq_ops);
+	return seq_open_net(inode, file, &ip_vs_conn_sync_seq_ops,
+			    sizeof(struct ip_vs_iter_state));
 }

 static const struct file_operations ip_vs_conn_sync_fops = {
@@ -1168,10 +1192,11 @@ void ip_vs_random_dropentry(void)
 /*
  *      Flush all the connection entries in the ip_vs_conn_tab
  */
-static void ip_vs_conn_flush(void)
+static void ip_vs_conn_flush(struct net *net)
 {
 	int idx;
 	struct ip_vs_conn *cp;
+	struct netns_ipvs *ipvs = net_ipvs(net);

   flush_again:
 	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
@@ -1181,7 +1206,8 @@ static void ip_vs_conn_flush(void)
 		ct_write_lock_bh(idx);

 		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
-
+			if (!ip_vs_conn_net_eq(cp, net))
+				continue;
 			IP_VS_DBG(4, "del connection\n");
 			ip_vs_conn_expire_now(cp);
 			if (cp->control) {
@@ -1194,7 +1220,7 @@ static void ip_vs_conn_flush(void)

 	/* the counter may be not NULL, because maybe some conn entries
 	   are run by slow timer handler or unhashed but still referred */
-	if (atomic_read(&ip_vs_conn_count) != 0) {
+	if (atomic_read(&ipvs->conn_count) != 0) {
 		schedule();
 		goto flush_again;
 	}
@@ -1204,8 +1230,11 @@ static void ip_vs_conn_flush(void)
  */
 int __net_init __ip_vs_conn_init(struct net *net)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return -EPERM;
+	atomic_set(&ipvs->conn_count, 0);

 	proc_net_fops_create(net, "ip_vs_conn", 0, &ip_vs_conn_fops);
 	proc_net_fops_create(net, "ip_vs_conn_sync", 0, &ip_vs_conn_sync_fops);
@@ -1217,6 +1246,8 @@ static void __net_exit __ip_vs_conn_cleanup(struct net *net)
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return;

+	/* flush all the connection entries first */
+	ip_vs_conn_flush(net);
 	proc_net_remove(net, "ip_vs_conn");
 	proc_net_remove(net, "ip_vs_conn_sync");
 }
@@ -1277,9 +1308,6 @@ int __init ip_vs_conn_init(void)
 void ip_vs_conn_cleanup(void)
 {
 	unregister_pernet_subsys(&ipvs_conn_ops);
-	/* flush all the connection entries first */
-	ip_vs_conn_flush();
-
 	/* Release the empty cache */
 	kmem_cache_destroy(ip_vs_conn_cachep);
 	vfree(ip_vs_conn_tab);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 5e278e5..ad0cb7d 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -191,7 +191,8 @@ ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
 			      const union nf_inet_addr *vaddr, __be16 vport,
 			      struct ip_vs_conn_param *p)
 {
-	ip_vs_conn_fill_param(svc->af, protocol, caddr, cport, vaddr, vport, p);
+	ip_vs_conn_fill_param(svc->net, svc->af, protocol, caddr, cport, vaddr,
+			      vport, p);
 	p->pe = svc->pe;
 	if (p->pe && p->pe->fill_param)
 		return p->pe->fill_param(p, skb);
@@ -334,8 +335,8 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 	/*
 	 *    Create a new connection according to the template
 	 */
-	ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, src_port,
-			      &iph.daddr, dst_port, &param);
+	ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol, &iph.saddr,
+			      src_port, &iph.daddr, dst_port, &param);

 	cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest, skb->mark);
 	if (cp == NULL) {
@@ -450,8 +451,10 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	 */
 	{
 		struct ip_vs_conn_param p;
-		ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr,
-				      pptr[0], &iph.daddr, pptr[1], &p);
+
+		ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
+				      &iph.saddr, pptr[0], &iph.daddr, pptr[1],
+				      &p);
 		cp = ip_vs_conn_new(&p, &dest->addr,
 				    dest->port ? dest->port : pptr[1],
 				    flags, dest, skb->mark);
@@ -518,7 +521,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
 		{
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(svc->af, iph.protocol,
+			ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
 					      &iph.saddr, pptr[0],
 					      &iph.daddr, pptr[1], &p);
 			cp = ip_vs_conn_new(&p, &daddr, 0,
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 77b0036..6a04f9a 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -198,13 +198,15 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 		 */
 		{
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(AF_INET, iph->protocol,
-					      &from, port, &cp->caddr, 0, &p);
+			ip_vs_conn_fill_param(ip_vs_conn_net(cp), AF_INET,
+					      iph->protocol, &from, port,
+					      &cp->caddr, 0, &p);
 			n_cp = ip_vs_conn_out_get(&p);
 		}
 		if (!n_cp) {
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(AF_INET, IPPROTO_TCP, &cp->caddr,
+			ip_vs_conn_fill_param(ip_vs_conn_net(cp),
+					      AF_INET, IPPROTO_TCP, &cp->caddr,
 					      0, &cp->vaddr, port, &p);
 			n_cp = ip_vs_conn_new(&p, &from, port,
 					      IP_VS_CONN_F_NO_CPORT |
@@ -361,9 +363,9 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,

 	{
 		struct ip_vs_conn_param p;
-		ip_vs_conn_fill_param(AF_INET, iph->protocol, &to, port,
-				      &cp->vaddr, htons(ntohs(cp->vport)-1),
-				      &p);
+		ip_vs_conn_fill_param(ip_vs_conn_net(cp), AF_INET,
+				      iph->protocol, &to, port, &cp->vaddr,
+				      htons(ntohs(cp->vport)-1), &p);
 		n_cp = ip_vs_conn_in_get(&p);
 		if (!n_cp) {
 			n_cp = ip_vs_conn_new(&p, &cp->daddr,
diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 4680647..f454c80 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -141,6 +141,7 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
 	struct nf_conntrack_tuple *orig, new_reply;
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
+	struct net *net = nf_ct_net(ct);

 	if (exp->tuple.src.l3num != PF_INET)
 		return;
@@ -155,7 +156,7 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,

 	/* RS->CLIENT */
 	orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
-	ip_vs_conn_fill_param(exp->tuple.src.l3num, orig->dst.protonum,
+	ip_vs_conn_fill_param(net, exp->tuple.src.l3num, orig->dst.protonum,
 			      &orig->src.u3, orig->src.u.tcp.port,
 			      &orig->dst.u3, orig->dst.u.tcp.port, &p);
 	cp = ip_vs_conn_out_get(&p);
@@ -268,7 +269,8 @@ void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp)
 		" for conn " FMT_CONN "\n",
 		__func__, ARG_TUPLE(&tuple), ARG_CONN(cp));

-	h = nf_conntrack_find_get(&init_net, NF_CT_DEFAULT_ZONE, &tuple);
+	h = nf_conntrack_find_get(ip_vs_conn_net(cp), NF_CT_DEFAULT_ZONE,
+				  &tuple);
 	if (h) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		/* Show what happens instead of calling nf_ct_kill() */
diff --git a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
index 28039cb..5b8eb8b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
@@ -41,15 +41,16 @@ struct isakmp_hdr {
 #define PORT_ISAKMP	500

 static void
-ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
-			     int inverse, struct ip_vs_conn_param *p)
+ah_esp_conn_fill_param_proto(struct net *net, int af,
+			     const struct ip_vs_iphdr *iph, int inverse,
+			     struct ip_vs_conn_param *p)
 {
 	if (likely(!inverse))
-		ip_vs_conn_fill_param(af, IPPROTO_UDP,
+		ip_vs_conn_fill_param(net, af, IPPROTO_UDP,
 				      &iph->saddr, htons(PORT_ISAKMP),
 				      &iph->daddr, htons(PORT_ISAKMP), p);
 	else
-		ip_vs_conn_fill_param(af, IPPROTO_UDP,
+		ip_vs_conn_fill_param(net, af, IPPROTO_UDP,
 				      &iph->daddr, htons(PORT_ISAKMP),
 				      &iph->saddr, htons(PORT_ISAKMP), p);
 }
@@ -61,8 +62,9 @@ ah_esp_conn_in_get(int af, const struct sk_buff *skb,
 {
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
+	struct net *net = skb_net(skb);

-	ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+	ah_esp_conn_fill_param_proto(net, af, iph, inverse, &p);
 	cp = ip_vs_conn_in_get(&p);
 	if (!cp) {
 		/*
@@ -89,8 +91,9 @@ ah_esp_conn_out_get(int af, const struct sk_buff *skb,
 {
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
+	struct net *net = skb_net(skb);

-	ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+	ah_esp_conn_fill_param_proto(net, af, iph, inverse, &p);
 	cp = ip_vs_conn_out_get(&p);
 	if (!cp) {
 		IP_VS_DBG_BUF(12, "Unknown ISAKMP entry for inout packet "
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 391b856..e92b3d2 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -1055,7 +1055,7 @@ static void sctp_unregister_app(struct net *net, struct ip_vs_app *inc)

 static int sctp_app_conn_bind(struct ip_vs_conn *cp)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index f27325b..e421934 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -619,7 +619,7 @@ tcp_unregister_app(struct net *net, struct ip_vs_app *inc)
 static int
 tcp_app_conn_bind(struct ip_vs_conn *cp)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 1dc3941..581157b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -396,7 +396,7 @@ udp_unregister_app(struct net *net, struct ip_vs_app *inc)

 static int udp_app_conn_bind(struct ip_vs_conn *cp)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 29c6bbb..2887c34 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -657,21 +657,21 @@ control:
  *  fill_param used by version 1
  */
 static inline int
-ip_vs_conn_fill_param_sync(int af, union ip_vs_sync_conn *sc,
+ip_vs_conn_fill_param_sync(struct net *net, int af, union ip_vs_sync_conn *sc,
 			   struct ip_vs_conn_param *p,
 			   __u8 *pe_data, unsigned int pe_data_len,
 			   __u8 *pe_name, unsigned int pe_name_len)
 {
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6)
-		ip_vs_conn_fill_param(af, sc->v6.protocol,
+		ip_vs_conn_fill_param(net, af, sc->v6.protocol,
 				      (const union nf_inet_addr *)&sc->v6.caddr,
 				      sc->v6.cport,
 				      (const union nf_inet_addr *)&sc->v6.vaddr,
 				      sc->v6.vport, p);
 	else
 #endif
-		ip_vs_conn_fill_param(af, sc->v4.protocol,
+		ip_vs_conn_fill_param(net, af, sc->v4.protocol,
 				      (const union nf_inet_addr *)&sc->v4.caddr,
 				      sc->v4.cport,
 				      (const union nf_inet_addr *)&sc->v4.vaddr,
@@ -878,7 +878,7 @@ static void ip_vs_process_message_v0(struct net *net, const char *buffer,
 			}
 		}

-		ip_vs_conn_fill_param(AF_INET, s->protocol,
+		ip_vs_conn_fill_param(net, AF_INET, s->protocol,
 				      (const union nf_inet_addr *)&s->caddr,
 				      s->cport,
 				      (const union nf_inet_addr *)&s->vaddr,
@@ -1040,9 +1040,8 @@ static inline int ip_vs_proc_sync_conn(struct net *net, __u8 *p, __u8 *msg_end)
 			state = 0;
 		}
 	}
-	if (ip_vs_conn_fill_param_sync(af, s, &param,
-					pe_data, pe_data_len,
-					pe_name, pe_name_len)) {
+	if (ip_vs_conn_fill_param_sync(net, af, s, &param, pe_data,
+				       pe_data_len, pe_name, pe_name_len)) {
 		retc = 50;
 		goto out;
 	}
--
1.7.2.3

