From: Arvid Brodin <arvid.brodin@enea.com>
To: <netdev@vger.kernel.org>
Cc: arbr <Arvid.Brodin@enea.com>
Subject: Re: bridge: HSR support - possible recursive locking?
Date: Thu, 12 Jan 2012 19:02:23 +0100
Message-ID: <4F0F202F.8060901@enea.com>
In-Reply-To: <4F073954.7040001@enea.com>
Arvid Brodin wrote:
> Arvid Brodin wrote:
>>> On Tue, 11 Oct 2011 20:25:08 +0200
>>> Arvid Brodin <arvid.brodin@enea.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to add support for HSR ("High-availability Seamless Redundancy",
>>>> IEC-62439-3) to the bridge code. With HSR, all connected units have two network
>>>> ports and are connected in a ring. All new Ethernet packets are sent on both
>>>> ports (or passed through if the current unit is not the originating unit). The
>>>> same packet is never passed twice. Non-HSR units are not allowed in the ring.
>>>>
>>>> This gives instant, reconfiguration-free failover.
>>>>
> *snip*
>> I need to do two things:
>>
>> 1) Bind two network interfaces into one (say, eth0 & eth1 => hsr0). Frames sent on
>> hsr0 should get an HSR tag (including the correct EtherType) and go out on both
>> eth0 and eth1.
>>
>> 2) Ingress frames on eth0 & eth1, with EtherType 0x88fb, should be captured and
>> handled specially (either received on hsr0 or forwarded to the other bound
>> physical interface).
>>
>
> I'm slowly getting there! :)
>
> But what is net_device->header_ops->rebuild supposed to do?
>
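Requirements 1) and 2), together with the "same packet is never passed twice" rule from the original mail, can be modelled in userspace C. This is a minimal sketch, not kernel code. It assumes an HSR tag that follows the source MAC: the 0x88fb EtherType from item 2, a 16-bit word holding a 4-bit path id and a 12-bit LSDU size (assumed here to count everything after the 14-byte Ethernet header), a 16-bit sequence number, and then the original EtherType. The field layout should be checked against IEC 62439-3. The helper names hsr_tag_frame() and hsr_is_duplicate() and the small fixed node table are hypothetical. A frame is treated as a duplicate when its sequence number is not newer (modulo 2^16) than the last one accepted from the same source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN      6
#define ETH_HLEN      14      /* dst + src + EtherType */
#define HSR_ETHERTYPE 0x88fb  /* value used in this thread */
#define HSR_TAG_LEN   6       /* added bytes: EtherType, path/LSDU size, seq nr (assumed layout) */
#define MAX_NODES     16      /* hypothetical fixed-size node table */

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Tag an untagged frame:  dst | src | type | payload
 * becomes                 dst | src | 0x88fb | path:4,LSDU:12 | seq | type | payload
 * Returns the new length, or 0 if the input is too short or out is too small.
 */
static size_t hsr_tag_frame(const uint8_t *in, size_t len, uint8_t *out,
			    size_t out_size, uint16_t seq, uint8_t path)
{
	size_t new_len = len + HSR_TAG_LEN;
	size_t lsdu;

	if (len < ETH_HLEN || out_size < new_len)
		return 0;
	lsdu = new_len - ETH_HLEN;  /* assumption: everything after the Ethernet header */
	memcpy(out, in, 2 * ETH_ALEN);                       /* dst + src MAC */
	put_be16(out + 12, HSR_ETHERTYPE);
	put_be16(out + 14, (uint16_t)(((path & 0x0f) << 12) | (lsdu & 0x0fff)));
	put_be16(out + 16, seq);
	memcpy(out + 18, in + 12, len - 12);                 /* original EtherType + payload */
	return new_len;
}

struct hsr_node {
	uint8_t mac[ETH_ALEN];
	uint16_t last_seq;
	bool used;
};

static struct hsr_node nodes[MAX_NODES];

/*
 * Return true if this tagged frame was already accepted, i.e. it is the copy
 * that travelled the other way round the ring and must be dropped.
 */
static bool hsr_is_duplicate(const uint8_t *tagged)
{
	const uint8_t *src = tagged + ETH_ALEN;
	uint16_t seq = get_be16(tagged + 16);
	struct hsr_node *free_slot = NULL;

	for (size_t i = 0; i < MAX_NODES; i++) {
		struct hsr_node *n = &nodes[i];

		if (!n->used) {
			if (!free_slot)
				free_slot = n;
			continue;
		}
		if (memcmp(n->mac, src, ETH_ALEN) == 0) {
			uint16_t diff = (uint16_t)(seq - n->last_seq);

			if (diff == 0 || diff >= 0x8000)
				return true;    /* not newer than last accepted: duplicate */
			n->last_seq = seq;
			return false;
		}
	}
	if (free_slot) {                        /* first frame seen from this source */
		memcpy(free_slot->mac, src, ETH_ALEN);
		free_slot->last_seq = seq;
		free_slot->used = true;
	}
	return false;
}
```

On a node, each locally originated frame would be tagged once with a fresh sequence number and sent on both ports; on receive, the first copy passes the duplicate check and the second copy (same source, same sequence number) is dropped.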
I get a "possible recursive locking detected" warning when I send cloned packets, and I can't
figure out why. Here's the stack dump, interleaved with my debug printouts:
hsr_dev_xmit:286: sent on first slave
=============================================
[ INFO: possible recursive locking detected ]
2.6.37 #43
---------------------------------------------
swapper/0 is trying to acquire lock:
(_xmit_ETHER#2){+.-...}, at: [<901b9aae>] sch_direct_xmit+0x24/0x152
but task is already holding lock:
(_xmit_ETHER#2){+.-...}, at: [<901afc4a>] dev_queue_xmit+0x2ce/0x37c
other info that might help us debug this:
4 locks held by swapper/0:
#0: (&n->timer){+.-...}, at: [<9002b2b4>] run_timer_softirq+0x98/0x184
#1: (rcu_read_lock_bh){.+....}, at: [<901af97c>] dev_queue_xmit+0x0/0x37c
#2: (_xmit_ETHER#2){+.-...}, at: [<901afc4a>] dev_queue_xmit+0x2ce/0x37c
#3: (rcu_read_lock_bh){.+....}, at: [<901af97c>] dev_queue_xmit+0x0/0x37c
stack backtrace:
Call trace:
[<9001c264>] dump_stack+0x18/0x20
[<9003fdbc>] validate_chain+0x40c/0x9ac
[<90040968>] __lock_acquire+0x60c/0x670
[<90041cda>] lock_acquire+0x3a/0x48
[<90216c5c>] _raw_spin_lock+0x20/0x44
[<901b9aae>] sch_direct_xmit+0x24/0x152
[<901afb44>] dev_queue_xmit+0x1c8/0x37c
[<90213090>] nf_hook_xmit+0x8/0xc
[<902130a2>] slave_xmit+0xe/0x10
[<902131d6>] hsr_dev_xmit+0xa6/0xcc
[<901af8c2>] dev_hard_start_xmit+0x382/0x43c
[<901afc64>] dev_queue_xmit+0x2e8/0x37c
[<901dc8a0>] arp_xmit+0x8/0xc
[<901dcf86>] arp_send+0x2a/0x2c
[<901dd978>] arp_solicit+0x110/0x130
[<901b54a4>] neigh_timer_handler+0x1c2/0x206
[<9002b31e>] run_timer_softirq+0x102/0x184
[<90027eb8>] __do_softirq+0x64/0xe0
[<9002804a>] do_softirq+0x26/0x48
[<90028146>] irq_exit+0x2e/0x64
[<90019bae>] do_IRQ+0x46/0x5c
[<90018424>] irq_level0+0x18/0x60
[<902136ae>] rest_init+0x72/0x90
[<9000063c>] start_kernel+0x21c/0x258
[<00000000>] 0x0
hsr_dev_xmit:289: sent on second slave
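Lockdep validates lock classes rather than individual lock instances, and the log reports both the lock already held (taken in dev_queue_xmit() for hsr0) and the one being acquired (in sch_direct_xmit() for the slave) as the same class, _xmit_ETHER#2. The kernel assigns each device's TX lock a class by device type, so an Ethernet-type hsr0 and its Ethernet slaves plausibly share one. The userspace model below is a sketch under that assumption. The names lock_class, model_lock, model_acquire() and model_release() are hypothetical, and this is not lockdep's implementation. It shows why taking a second, different lock of the same class is reported.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_HELD 8

/* A lock class: every lock initialised with the same key shares it. */
struct lock_class {
	const char *name;
};

/* One lock instance, e.g. the TX lock of a single device. */
struct model_lock {
	const char *dev;
	const struct lock_class *class;
};

static const struct model_lock *held[MAX_HELD];
static int nr_held;

/*
 * Take a lock. Like lockdep without nesting annotations, complain if any
 * lock of the same class is already held, even a different instance.
 * Returns false when the warning fires; the lock is recorded as taken anyway.
 */
static bool model_acquire(const struct model_lock *l)
{
	bool clean = true;

	for (int i = 0; i < nr_held; i++) {
		if (held[i]->class == l->class) {
			printf("possible recursive locking: %s (%s) while holding %s (%s)\n",
			       l->class->name, l->dev,
			       held[i]->class->name, held[i]->dev);
			clean = false;
		}
	}
	if (nr_held < MAX_HELD)
		held[nr_held++] = l;
	return clean;
}

/* Release the most recently taken lock. */
static void model_release(void)
{
	if (nr_held > 0)
		nr_held--;
}
```

With hsr0 and eth0 both in the _xmit_ETHER class, taking hsr0's lock and then eth0's reproduces the report. In this model the report disappears if hsr0's TX lock belongs to a different class or is not taken at all. The sketch does not establish which of those the bridge code relies on.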
The code looks like this (from my hsr_dev_xmit() function):
	...
	skb2 = skb_clone(skb, GFP_ATOMIC);
	slave_xmit(skb, hsr_priv->slave_data[0].dev);
	printk(KERN_INFO "%s:%d: sent on first slave\n", __func__, __LINE__);
	if (skb2) {
		slave_xmit(skb2, hsr_priv->slave_data[1].dev);
		printk(KERN_INFO "%s:%d: sent on second slave\n", __func__, __LINE__);
	}
	...
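Cloning before the first send matters: slave_xmit() hands the skb to dev_queue_xmit() (or NF_HOOK drops it), and either way the skb is consumed, so cloning afterwards would touch a freed buffer. skb_clone() creates a second sk_buff header that shares the same packet data, and the data is released only when the last header is freed. The userspace model below illustrates that ownership pattern. The types fake_skb and shared_data and all functions in it are hypothetical stand-ins, not the kernel's sk_buff API.

```c
#include <stdlib.h>

struct shared_data {
	int refcnt;
	unsigned char bytes[64];
};

struct fake_skb {
	struct shared_data *data;
	const char *dev;
};

static int sent;        /* frames handed to a "device" */
static int data_frees;  /* times the shared data was actually freed */

static struct fake_skb *fake_alloc(void)
{
	struct fake_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->data = calloc(1, sizeof(*skb->data));
	if (!skb->data) {
		free(skb);
		return NULL;
	}
	skb->data->refcnt = 1;
	skb->dev = NULL;
	return skb;
}

/* Like skb_clone(): new header, same data, one more data reference. */
static struct fake_skb *fake_clone(struct fake_skb *skb)
{
	struct fake_skb *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	*c = *skb;
	c->data->refcnt++;
	return c;
}

/* Like kfree_skb(): free the header and drop one data reference. */
static void fake_free(struct fake_skb *skb)
{
	if (--skb->data->refcnt == 0) {
		free(skb->data);
		data_frees++;
	}
	free(skb);
}

/* "Transmit" always consumes the buffer, on success and on error alike. */
static void fake_xmit(struct fake_skb *skb, const char *dev)
{
	skb->dev = dev;
	sent++;
	fake_free(skb);
}

/* Send one frame on both slaves; clone first because fake_xmit() consumes skb. */
static void fake_hsr_xmit(struct fake_skb *skb)
{
	struct fake_skb *skb2 = fake_clone(skb);

	fake_xmit(skb, "eth0");
	if (skb2)
		fake_xmit(skb2, "eth1");
}
```

After fake_hsr_xmit() both headers are gone and the shared data has been freed exactly once.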
slave_xmit() looks like this:
int nf_hook_xmit(struct sk_buff *skb)
{
	dev_queue_xmit(skb);
	return 0;
}

static int slave_xmit(struct sk_buff *skb, struct net_device *dev)
{
	int res;

	skb->dev = dev;
	skb->priority = 1; // FIXME: what does this mean?
	res = NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev, nf_hook_xmit);
	// res = dev_queue_xmit(skb);
	/* Buffer is consumed on errors too, so nothing to do here, really... */
	return res;
}
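The comment in slave_xmit() matches how NF_HOOK() behaves in kernels of this era, as far as I know. The hooks registered at NF_BR_POST_ROUTING run first. If all of them accept (or none are registered), the okfn, here nf_hook_xmit(), is called and its return value is passed back. A drop frees the skb and returns a negative error, -EPERM by default. Because nf_hook_xmit() always returns 0, dev_queue_xmit()'s own result never reaches the caller. The userspace model below is a sketch only; the verdict enum, model_nf_hook() and the hook functions are hypothetical stand-ins, not netfilter's API.

```c
#include <errno.h>
#include <stddef.h>

enum verdict { V_DROP, V_ACCEPT };  /* stand-ins for NF_DROP / NF_ACCEPT */

typedef enum verdict (*hook_fn)(void *pkt);
typedef int (*okfn_t)(void *pkt);

static int sent;   /* packets that reached the okfn */
static int freed;  /* packets freed by a DROP verdict */

/* Like nf_hook_xmit(): transmit and always report 0. */
static int xmit_okfn(void *pkt) { (void)pkt; sent++; return 0; }
static enum verdict accept_hook(void *pkt) { (void)pkt; return V_ACCEPT; }
static enum verdict drop_hook(void *pkt) { (void)pkt; return V_DROP; }

/*
 * Run the hooks in order. The first DROP frees the packet and returns
 * -EPERM; if every hook accepts (or there are none), the packet goes to
 * okfn and okfn's return value is returned.
 */
static int model_nf_hook(hook_fn *hooks, size_t n, void *pkt, okfn_t okfn)
{
	for (size_t i = 0; i < n; i++) {
		if (hooks[i](pkt) == V_DROP) {
			freed++;    /* stands in for kfree_skb() */
			return -EPERM;
		}
	}
	return okfn(pkt);
}
```

In either path the caller no longer owns the packet, which is why slave_xmit() has nothing to clean up on error.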
I believe I'm doing exactly the same thing as the bridge code (though obviously something
must differ). So what am I doing wrong?
--
Arvid Brodin
Enea Services Stockholm AB