public inbox for b.a.t.m.a.n@lists.open-mesh.org
* [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
@ 2013-11-01  7:55 Bastian Bittorf
  2013-11-01 12:36 ` Antonio Quartulli
  0 siblings, 1 reply; 10+ messages in thread
From: Bastian Bittorf @ 2013-11-01  7:55 UTC
  To: b.a.t.m.a.n

in my small office-network we have _often_ the situation,
that my laptop 8-) has no internet. if this happens, i can
connect via ssh hop by hop and can see, that the network
itself (wireless/wired) is working, but the 'routes' are wrong.

here the transglobal-table in the master, my laptop is '00:21:6a:32:7c:1c'

root@EG-labor-AP:~ batctl tg
Globally announced TT entries received via the mesh bat0
       Client        (TTVN)       Originator      (Curr TTVN) (CRC   )   Flags
 * 02:00:c0:ca:c0:1a  (  8) via 02:00:ca:b1:00:15     (  8)   (0xe285) [...]
 * 06:2f:65:8a:d2:b7  (  1) via 02:00:ca:b1:00:02     (  1)   (0x50f6) [...]
 * 46:0a:75:3c:f2:47  (  1) via 02:00:ca:b1:00:15     (  8)   (0xe285) [...]
 * 5e:27:29:d8:ee:b4  (  1) via 02:00:ca:b1:00:76     (  1)   (0xc9fc) [...]
 * 72:f7:18:80:9d:9d  ( 86) via 02:00:de:ad:00:03     ( 86)   (0x7651) [...]
 * ae:d9:0f:ef:01:c3  (  1) via 02:00:ca:b1:02:22     (  5)   (0x6456) [...]
 * 56:fb:55:27:b2:63  (  1) via 02:00:ca:b1:00:58     (  5)   (0xac7c) [...]
 * 46:13:bf:2a:53:1e  (  1) via 02:00:ca:b1:00:13     (  5)   (0x8133) [...]
 * 0a:c6:fd:60:5d:5f  (  1) via 02:00:de:ad:02:23     (  1)   (0xa1c1) [...]
### interesting part:
 * 00:21:6a:32:7c:1c  (  4) via 02:00:ca:b1:02:22     (  5)   (0x6456) [.W.]
 + 00:21:6a:32:7c:1c  (  5) via 02:00:ca:b1:00:13     (  5)   [.W.]
###
 * e6:ad:ca:24:f6:10  (  1) via 02:00:ca:b1:00:45     (  3)   (0x6182) [...]

root@EG-labor-AP:~ batctl -v
batctl 2013.4.0 [batman-adv: 2013.4.0]

root@EG-labor-AP:~ cat /etc/openwrt_version 
r38568

the interesting thing is, that my laptop seems to be reachable via
*:02:22 and *:00:13 - the 2nd entry has no hash (?), but 'batctl t 00:21:6a:32:7c:1c'
outputs *:00:13 as originator. from the topology, it is impossible to be
near this node, so no roaming can happen AND i can see on my laptop,
that there was no roaming. the situation recovers without interaction after some
minutes. the transglobal table does not change, but 'batctl t 00:21:6a:32:7c:1c' 
outputs the correct *:02:22
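
One way to spot entries like the two marked above is to scan the table for
clients that are listed behind more than one originator. The following is a
minimal sketch, assuming the column layout of the 2013.4.0 'batctl tg' output
quoted in this mail; the function name tg_multi_originator is made up for
this example and is not part of batctl.

```shell
# Sketch: list clients that show up behind more than one originator.
# Reads "batctl tg"-style text on stdin. Assumes each entry line looks like
#   <mark> <client> ( <ttvn>) via <originator> ...
# where <mark> is "*" or "+".
tg_multi_originator() {
    awk '
        ($1 == "*" || $1 == "+") {
            client = $2
            orig = ""
            # the originator is the field right after the word "via"
            for (i = 3; i < NF; i++)
                if ($i == "via") { orig = $(i + 1); break }
            if (orig == "")
                next
            key = client SUBSEP orig
            if (!(key in seen)) {
                seen[key] = 1
                count[client]++
                list[client] = list[client] " " orig
            }
        }
        END {
            # only report clients announced by at least two originators
            for (c in count)
                if (count[c] > 1)
                    print c ":" list[c]
        }
    '
}
```

Typical use on a node would be 'batctl tg | tg_multi_originator'. A client
that appears here is not necessarily broken (roaming and BLA2 backbones can
legitimately produce several entries), so the output is only a starting point.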

what can i do for more debugging or is this bug already solved in trunk?

bye, bastian


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-01  7:55 [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table Bastian Bittorf
@ 2013-11-01 12:36 ` Antonio Quartulli
  2013-11-01 14:33   ` Bastian Bittorf
  0 siblings, 1 reply; 10+ messages in thread
From: Antonio Quartulli @ 2013-11-01 12:36 UTC
  To: b.a.t.m.a.n

Hello Bastian,

On Fri, Nov 01, 2013 at 08:55:58AM +0100, Bastian Bittorf wrote:
> in my small office-network we have _often_ the situation,
> that my laptop 8-) has no internet. if this happens, i can
> connect via ssh hop by hop and can see, that the network
> itself (wireless/wired) is working, but the 'routes' are wrong.
> 
> here the transglobal-table in the master, my laptop is '00:21:6a:32:7c:1c'

what is master?

[..]

> the interesting thing is, that my laptop seems to be reachable via
> *:02:22 and *:00:13 - the 2nd entry has no hash (?), but 'batctl t 00:21:6a:32:7c:1c'
> outputs *:00:13 as originator. from the topology, it is impossible to be
> near this node, so no roaming can happen AND i can see on my laptop,
> that there was no roaming. the situation recovers without interaction after some
> minutes. the transglobal table does not change, but 'batctl t 00:21:6a:32:7c:1c' 
> outputs the correct *:02:22
> 

Here[1] you have an explanation about the translation table output.

My guess is that you have more than one node connected to the same LAN and BLA2
is properly enabled, but some kind of L3 trick on top of batman-adv is creating
confusion in the network. I'd suggest reading [2] (if you have not done so yet).


[1] http://www.open-mesh.org/projects/batman-adv/wiki/Understand-your-batman-adv-network#translation-tables
[2] http://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II


Cheers,

-- 
Antonio Quartulli


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-01 12:36 ` Antonio Quartulli
@ 2013-11-01 14:33   ` Bastian Bittorf
  2013-11-01 14:39     ` Antonio Quartulli
  0 siblings, 1 reply; 10+ messages in thread
From: Bastian Bittorf @ 2013-11-01 14:33 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

* Antonio Quartulli <antonio@meshcoding.com> [01.11.2013 13:53]:
> > here the transglobal-table in the master, my laptop is '00:21:6a:32:7c:1c'
> 
> what is master?

the node which has internet connectivity / default gateway.

> 
> > the interesting thing is, that my laptop seems to be reachable via
> > *:02:22 and *:00:13 - the 2nd entry has no hash (?), but 'batctl t 00:21:6a:32:7c:1c'
> > outputs *:00:13 as originator. from the topology, it is impossible to be
> > near this node, so no roaming can happen AND i can see on my laptop,
> > that there was no roaming. the situation recovers without interaction after some
> > minutes. the transglobal table does not change, but 'batctl t 00:21:6a:32:7c:1c' 
> > outputs the correct *:02:22
> > 
> 
> Here[1] you have an explanation about the translation table output.

thanks, this helps: so [.W.] means:
"this client is connected to the node through a wireless device"

      Client        (TTVN)       Originator      (Curr TTVN) (CRC   )   Flags
* 00:21:6a:32:7c:1c  (  4) via 02:00:ca:b1:02:22     (  5)   (0x6456) * [.W.]
+ 00:21:6a:32:7c:1c  (  5) via 02:00:ca:b1:00:13     (  5)   [.W.]

but i can be sure, that my laptop "00:21:6a:32:7c:1c" was never connected to '02:00:ca:b1:00:13'.
both nodes are not connected via cable and are nodes in hybrid-mode (ap+adhoc).
no special tricks, 'only' macvlan. BLA2 is active on all nodes.

then again: why does batman-adv think, that the client (my laptop) is/was
reachable over 02:00:ca:b1:00:13 - the laptop was never there? a hash-collision?

what i also see now:
a laptop is connected via wifi to NodeA, but when i ask the 'transglobal'
table, batman-adv says it is at another location and 'batctl tr $laptop'
also works. to illustrate:

NodeA = 192.168.99.1/16   ~~~ Laptop with 192.168.222.51/16

(air)

NodeB = 192.168.222.1/16

The Laptop is connected to Node A, but has an IP from Node B.
batman-adv thinks that the Laptop is on NodeB, but in fact it
is on NodeA. Why is this? On Node A 'wlan0' is bridged to bat0.

I can also see via pinging from Laptop 'dups' (2 answers).

bye, bastian


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-01 14:33   ` Bastian Bittorf
@ 2013-11-01 14:39     ` Antonio Quartulli
  2013-11-01 15:16       ` Bastian Bittorf
  0 siblings, 1 reply; 10+ messages in thread
From: Antonio Quartulli @ 2013-11-01 14:39 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Fri, Nov 01, 2013 at 03:33:05PM +0100, Bastian Bittorf wrote:
> * Antonio Quartulli <antonio@meshcoding.com> [01.11.2013 13:53]:
> > > here the transglobal-table in the master, my laptop is '00:21:6a:32:7c:1c'
> > 
> > what is master?
> 
> the node which has internet connectivity / default gateway.
> 
> > 
> > > the interesting thing is, that my laptop seems to be reachable via
> > > *:02:22 and *:00:13 - the 2nd entry has no hash (?), but 'batctl t 00:21:6a:32:7c:1c'
> > > outputs *:00:13 as originator. from the topology, it is impossible to be
> > > near this node, so no roaming can happen AND i can see on my laptop,
> > > that there was no roaming. the situation recovers without interaction after some
> > > minutes. the transglobal table does not change, but 'batctl t 00:21:6a:32:7c:1c' 
> > > outputs the correct *:02:22
> > > 
> > 
> > Here[1] you have an explanation about the translation table output.
> 
> thanks, this helps: so [.W.] means:
> "this client is connected to the node through a wireless device"

It also explains why you can have more than one entry for the same client.

> 
>       Client        (TTVN)       Originator      (Curr TTVN) (CRC   )   Flags
> * 00:21:6a:32:7c:1c  (  4) via 02:00:ca:b1:02:22     (  5)   (0x6456) * [.W.]
> + 00:21:6a:32:7c:1c  (  5) via 02:00:ca:b1:00:13     (  5)   [.W.]
> 
> but i can be sure, that my laptop "00:21:6a:32:7c:1c" was never connected to '02:00:ca:b1:00:13'.
> both nodes are not connected via cable and are nodes in hybrid-mode (ap+adhoc).
> no special tricks, 'only' macvlan. BLA2 is active on all nodes.
> 
> then again: why does batman-adv think, that the client (my laptop) is/was
> reachable over 02:00:ca:b1:00:13 - the laptop was never there? a hash-collision?

No. This happens when bat0 on one node and bat0 on the other are bridged
together. The common scenario for this is that you have the two nodes connected
to an Ethernet switch and you have bat0 bridged into this LAN. At this point the
"two bat0s" will get in touch with each other. Like the first picture in this
page[1].

The "only" macvlan thing is probably something we should try to investigate
further :-)
You are the first reporting strange issues like this and the fact that this
happens quite often means that there is something in the network setup that is
triggering this problem.

Do you mind explaining in a bit more detail how you structured the node? (which
interface is bridged with what, where macvlan is connected).

Can you also provide the output of "batctl bbt" ?

> 
> what i also see now:
> a laptop is connected via wifi to NodeA, but when i ask the 'transglobal'
> table, batman-adv says it is at another location and 'batctl tr $laptop'
> also works. to illustrate:
> 
> NodeA = 192.168.99.1/16   ~~~ Laptop with 192.168.222.51/16
> 
> (air)
> 
> NodeB = 192.168.222.1/16
> 
> The Laptop is connected to Node A, but has an IP from Node B.
> batman-adv thinks that the Laptop is on NodeB, but in fact it
> is on NodeA. Why is this? On Node A 'wlan0' is bridged to bat0.
> 

I guess you roamed from NodeB to NodeA? Is the entry in the global table
followed by an "R" flag?
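
A quick check for that flag could look like the sketch below. It assumes the
flags are the last bracketed field of each 'batctl tg' entry line, as in the
output quoted earlier in this thread; tg_roaming is a hypothetical helper
name, not a batctl command.

```shell
# Sketch: print "batctl tg" entries whose flag field contains "R".
# Input is read from stdin; only lines starting with "*" or "+" are entries.
tg_roaming() {
    awk '($1 == "*" || $1 == "+") && $NF ~ /^\[.*R.*\]$/ { print $2, $NF }'
}
```

Running 'batctl tg | tg_roaming' prints the client MAC and its flags for every
matching entry and prints nothing when no entry carries the flag.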

Cheers,

[1] http://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II

-- 
Antonio Quartulli


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-01 14:39     ` Antonio Quartulli
@ 2013-11-01 15:16       ` Bastian Bittorf
  2013-11-03  7:10         ` Gui Iribarren
  0 siblings, 1 reply; 10+ messages in thread
From: Bastian Bittorf @ 2013-11-01 15:16 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

* Antonio Quartulli <antonio@meshcoding.com> [01.11.2013 16:04]:
> > both nodes are not connected via cable and are nodes in hybrid-mode (ap+adhoc).
> > no special tricks, 'only' macvlan. BLA2 is active on all nodes.
> > 
> > then again: why does batman-adv think, that the client (my laptop) is/was
> > reachable over 02:00:ca:b1:00:13 - the laptop was never there? a hash-collision?
> 
> No. This happens when bat0 on one node and bat0 on the other are bridged
> together. The common scenario for this is that you have the two nodes connected
> to an Ethernet switch and you have bat0 bridged into this LAN. At this point the
> "two bat0s" will get in touch with each other. Like the first picture in this
> page[1].
> 
> The "only" macvlan thing is probably something we should try to investigate
> further :-)
> You are the first reporting strange issues like this and the fact that this
> happens quite often means that there is something in the network setup that is
> triggering this problem.

All nodes are in 'hybrid' mode, so adhoc+ap on 1 or more radios.
Each interface, e.g. LAN/WAN/ADHOC is a batman-adv interface, each
AP-Mode/hostapd interface is bridged to bat0, so it looks like:

root@node15hybrid:~ batctl interface
eth0.1: active		# LAN
eth0.2: active		# WAN
wlan0-1: active		# adhoc-2.4ghz
wlan1-1: active		# adhoc-5ghz

root@node15hybrid:~ brctl show
bridge name     bridge id               STP enabled     interfaces
br-mybridge     7fff.460a753cf247       no              bat0
                                                        wlan0	# AP-2.4ghz
                                                        wlan1	# AP-5ghz

A small number of nodes are coupled via wire (this works).
Each node has an IP of 192.168.x.1/16 where x is a unique number.

Each node has a macvlan called 'gateway0' which has the IP 192.168.0.1/32
This is just an IP which every DHCP-Client gets for "default-gateway".
(so the gateway is the node itself and not the internet-offering-node).
This looks like this:

root@node222hybrid:~ ip address show dev gateway0
15: gateway0@br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 02:00:c0:ca:c0:1a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.1/32 scope global gateway0
       valid_lft forever preferred_lft forever
    inet6 fe80::c0ff:feca:c01a/64 scope link 
       valid_lft forever preferred_lft forever

Each node is a batman-adv gateway, so 'batctl gwl' outputs every node.
(so DHCP requests are not forwarded but answered locally).

The backbone-table seems to be empty on every node.

Does this help? bye, bastian


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-01 15:16       ` Bastian Bittorf
@ 2013-11-03  7:10         ` Gui Iribarren
  2013-11-03  9:18           ` Bastian Bittorf
  0 siblings, 1 reply; 10+ messages in thread
From: Gui Iribarren @ 2013-11-03  7:10 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On 11/01/2013 04:16 PM, Bastian Bittorf wrote:
> * Antonio Quartulli <antonio@meshcoding.com> [01.11.2013 16:04]:
>>> both nodes are not connected via cable and are nodes in hybrid-mode (ap+adhoc).
>>> no special tricks, 'only' macvlan. BLA2 is active on all nodes.
>>>
>>> then again: why does batman-adv think, that the client (my laptop) is/was
>>> reachable over 02:00:ca:b1:00:13 - the laptop was never there? a hash-collision?
>>
>> No. This happens when bat0 on one node and bat0 on the other are bridged
>> together. The common scenario for this is that you have the two nodes connected
>> to an Ethernet switch and you have bat0 bridged into this LAN. At this point the
>> "two bat0s" will get in touch with each other. Like the first picture in this
>> page[1].
>>
>> The "only" macvlan thing is probably something we should try to investigate
>> further :-)
>> You are the first reporting strange issues like this and the fact that this
>> happens quite often means that there is something in the network setup that is
>> triggering this problem.
>
> All nodes are in 'hybrid' mode, so adhoc+ap on 1 or more radios.
> Each interface, e.g. LAN/WAN/ADHOC is a batman-adv interface, each
> AP-Mode/hostapd interface is bridged to bat0, so it looks like:
>
> root@node15hybrid:~ batctl interface
> eth0.1: active		# LAN
> eth0.2: active		# WAN
> wlan0-1: active		# adhoc-2.4ghz
> wlan1-1: active		# adhoc-5ghz
>
> root@node15hybrid:~ brctl show
> bridge name     bridge id               STP enabled     interfaces
> br-mybridge     7fff.460a753cf247       no              bat0
>                                                          wlan0	# AP-2.4ghz
>                                                          wlan1	# AP-5ghz
>
> A small number of nodes are coupled via wire (this works).
> Each node has an IP of 192.168.x.1/16 where x is a unique number.
>
> Each node has a macvlan called 'gateway0' which has the IP 192.168.0.1/32
> This is just an IP which every DHCP-Client gets for "default-gateway".
> (so the gateway is the node itself and not the internet-offering-node).
> This looks like this:
>
> root@node222hybrid:~ ip address show dev gateway0
> 15: gateway0@br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>      link/ether 02:00:c0:ca:c0:1a brd ff:ff:ff:ff:ff:ff

if this mac (02:00:c0:ca:c0:1a) exists on several different nodes that 
are not connected by a real ethernet backbone (and BLA2 enabled)
then batman goes mushroom tripping, since it 'sees' that MAC as a 
non-mesh-client that is everywhere at the same time, and tries to roam 
it around, creating funny symptoms (like DUPs and such)

if all nodes are actually connected to an ethernet backbone, then BLA2 
is supposed to save the day, by properly handling the situation. 
(haven't actually tried it, tho)

what we did is prevent (as best we can) those packets from being sent over
bat0, using ebtables:

# cat /etc/firewall.user
ebtables -A FORWARD -j DROP -d 02:00:c0:ca:c0:1a
ebtables -t nat -A POSTROUTING -o bat0 -j DROP -s 02:00:c0:ca:c0:1a
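
To review the rules before applying them, a small dry-run helper can print
them for a given MAC. This is only a sketch built from the two rules above;
anycast_rules and its optional second argument are names invented for this
example, and nothing is executed by the function itself.

```shell
# Sketch: print the two ebtables rules from this mail for a shared MAC.
# usage: anycast_rules <mac> [mesh-interface]   (mesh interface defaults to bat0)
anycast_rules() {
    mac=$1
    mesh_if=${2:-bat0}
    # refuse anything that does not look like aa:bb:cc:dd:ee:ff
    echo "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || {
        echo "invalid MAC: $mac" >&2
        return 1
    }
    # drop bridged (forwarded) frames addressed to the shared MAC
    echo "ebtables -A FORWARD -j DROP -d $mac"
    # drop frames sourced from the shared MAC that would leave via the mesh interface
    echo "ebtables -t nat -A POSTROUTING -o $mesh_if -j DROP -s $mac"
}
```

For example, 'anycast_rules 02:00:c0:ca:c0:1a' prints both rules, and the
output can be piped to 'sh' as root once it looks right.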

hope that helps!

>      inet 192.168.0.1/32 scope global gateway0
>         valid_lft forever preferred_lft forever
>      inet6 fe80::c0ff:feca:c01a/64 scope link
>         valid_lft forever preferred_lft forever
>
> Each node is a batman-adv gateway, so 'batctl gwl' outputs every node.
> (so DHCP requests are not forwarded but answered locally).

i wouldn't be so sure... AFAIU when a request arrives at a 
gw_mode=master, bat0 passes it upstream (to br-lan) as a broadcast, so 
that it will reach either a local dnsmasq, or another DHCP server 
running on the lan behind (say, connected to eth0 which is part of br-lan)

(i used that setup several times; a batadv gw_mode=master node with no 
local dnsmasq, but another dhcp server connected via ethernet behind)

>
> The backbone-table seems to be empty on every node.
>
> Does this help? bye, bastian
>


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-03  7:10         ` Gui Iribarren
@ 2013-11-03  9:18           ` Bastian Bittorf
  2013-11-03  9:42             ` Gui Iribarren
  0 siblings, 1 reply; 10+ messages in thread
From: Bastian Bittorf @ 2013-11-03  9:18 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

* Gui Iribarren <gui@altermundi.net> [03.11.2013 09:52]:
> >Each node has a macvlan called 'gateway0' which has the IP 192.168.0.1/32
> >This is just an IP which every DHCP-Client gets for "default-gateway".
> >(so the gateway is the node itself and not the internet-offering-node).
> >This looks like this:
> >
> >root@node222hybrid:~ ip address show dev gateway0
> >15: gateway0@br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
> >     link/ether 02:00:c0:ca:c0:1a brd ff:ff:ff:ff:ff:ff
> 
> if this mac (02:00:c0:ca:c0:1a) exists on several different nodes
> that are not connected by a real ethernet backbone (and BLA2 enabled)
> then batman goes mushroom tripping, since it 'sees' that MAC as a
> non-mesh-client that is everywhere at the same time, and tries to
> roam it around, creating funny symptoms (like DUPs and such)

BINGO! thank you Gui - if i read the old mails, i can even see it
in the transglobal table. if i look into the mesh, it pops up on
random nodes with random originators. yes: mushroom tripping 8-)

i will try the ebtables approach, but i dont like it.
IMHO it's more elegant to just 'ignore' this mac by the
daemon itself:

To the devs: is this possible?

bye, bastian


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-03  9:18           ` Bastian Bittorf
@ 2013-11-03  9:42             ` Gui Iribarren
  2013-11-23  9:24               ` Bastian Bittorf
  0 siblings, 1 reply; 10+ messages in thread
From: Gui Iribarren @ 2013-11-03  9:42 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On 11/03/2013 10:18 AM, Bastian Bittorf wrote:
> * Gui Iribarren <gui@altermundi.net> [03.11.2013 09:52]:
>>> Each node has a macvlan called 'gateway0' which has the IP 192.168.0.1/32
>>> This is just an IP which every DHCP-Client gets for "default-gateway".
>>> (so the gateway is the node itself and not the internet-offering-node).
>>> This looks like this:
>>>
>>> root@node222hybrid:~ ip address show dev gateway0
>>> 15: gateway0@br-mybridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>>>      link/ether 02:00:c0:ca:c0:1a brd ff:ff:ff:ff:ff:ff
>>
>> if this mac (02:00:c0:ca:c0:1a) exists on several different nodes
>> that are not connected by a real ethernet backbone (and BLA2 enabled)
>> then batman goes mushroom tripping, since it 'sees' that MAC as a
>> non-mesh-client that is everywhere at the same time, and tries to
>> roam it around, creating funny symptoms (like DUPs and such)
>
> BINGO! thank you Gui - if i read the old mails, i can even see it
> in the transglobal table. if i look into the mesh, it pops up on
> random nodes with random originators. yes: mushroom tripping 8-)
>
> i will try the ebtables approach, but i dont like it.
> IMHO it's more elegant to just 'ignore' this mac by the
> daemon itself:

According to the (misleading, by definition :P) docs

http://www.open-mesh.org/projects/open-mesh/wiki/Connecting-Batman-adv-clouds

it shouldn't quite ignore it, but instead properly support this 
"anycast" MAC, taking advantage of the "Bridge Loop Avoidance II 
component" even when there's no physical backbone between nodes.

meanwhile, at the batcave... (2013/10/14)

     d0tslash: guii: we don't have anycast support
     d0tslash: yet
     guii: ...?
     d0tslash: if you have the same mac address on multiple nodes (without bla), it will roam
     guii: oh!
     d0tslash: hm, it might work if you have bla enabled and the nodes are connected via the same ethernet
     d0tslash: but otherwise it won't work
     d0tslash: because it is supposed to roam :)

     d0tslash: so you won't have that feature for now, i'm afraid
     d0tslash: it is still on our "feature todo" list


so, given there will be MACs that roam (laptops, phones...) and MACs 
that don't (anycast), i can imagine some kind of regexp matching that 
will say "don't roam these kind of MACs, instead, consider them anycast 
macs and use bla2 magic"

  # batctl anycast EE:C4:57:00:00:00/32


...until then, ebtables WORKSFORME :D
and all this doesn't make batman-adv any less awesome than what it was 
already ;)

btw, even with the ebtables rule, we had to turn off DAT in a scenario 
equivalent to yours, because the DAT cache was also acting funny (DUP 
arp replies from each node in the cloud)
haven't got around to properly debug it / report it, but still, be warned :)

>
> To the devs: is this possible?
>
> bye, bastian
>


* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-03  9:42             ` Gui Iribarren
@ 2013-11-23  9:24               ` Bastian Bittorf
  2013-11-23 16:15                 ` Antonio Quartulli
  0 siblings, 1 reply; 10+ messages in thread
From: Bastian Bittorf @ 2013-11-23  9:24 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

* Gui Iribarren <gui@altermundi.net> [03.11.2013 20:37]:
> .....until then, ebtables WORKSFORME :D
> and all this doesn't make batman-adv any less awesome than what it
> was already ;)
> 
> btw, even with the ebtables rule, we had to turn off DAT in a
> scenario equivalent to yours, because the DAT cache was also acting
> funny (DUP arp replies from each node in the cloud)
> haven't got around to properly debug it / report it, but still, be warned :)

for a few days now we have been running these 2 ebtables rules on all nodes:
ebtables -A FORWARD -j DROP -d "$mac_gateway"
ebtables -t nat -A POSTROUTING -o bat0 -j DROP -s "$mac_gateway"

it looks like this:
root@box:~ ebtables -L FORWARD --Lc
Bridge table: filter

Bridge chain: FORWARD, entries: 1, policy: ACCEPT
-d 2:0:c0:ca:c0:1a -j DROP , pcnt = 9581 -- bcnt = 1116077

root@box:~ ebtables -t nat -L POSTROUTING --Lc
Bridge table: nat

Bridge chain: POSTROUTING, entries: 1, policy: ACCEPT
-s 2:0:c0:ca:c0:1a -o bat0 -j DROP , pcnt = 4 -- bcnt = 352

so most of the time it is working fine. but we have seen another
issue, and i'm unsure where it is coming from:

"clients time out in translocal-table"

A laptop connected to always the same router / no roaming involved
times out in 'translocal-table' and so it also times out on the
other nodes in the 'transglobal-table', so it is not reachable anymore.

a bad translocal-table/dat-cache with this client looks like this:
(i have removed other clients, for better readability)

root@box:~ batctl tl
Locally retrieved addresses (from bat0) announced via TT (TTVN: 2 CRC: 0x6023):
       Client        Flags   Last seen
 * 00:21:6a:32:7c:1c [....W]   0.010

root@box:~ batctl dc
Distributed ARP Table (bat0):
          IPv4             MAC           last-seen
 *  192.168.222.61 00:21:6a:32:7c:1c      3:50

after some seconds the client disappears from DAT-cache:

root@box:~ batctl tl
Locally retrieved addresses (from bat0) announced via TT (TTVN: 2 CRC: 0x6023):
       Client        Flags   Last seen 
 * 00:21:6a:32:7c:1c [....W]   0.010

root@box:~ batctl dc
Distributed ARP Table (bat0):
          IPv4             MAC           last-seen

after some time even the 'translocal-table' is empty, although with
'iw dev wlan0 station dump' i can see the active client. i'm
normally connected, can ping/ssh the node itself but not further.
(only hop by hop)

how does batman detect, if a client is active? (can i trigger it somehow?)
what can i do to debug further?

thanks & bye, bastian



* Re: [B.A.T.M.A.N.] lost connection to a client / Q: transglobal-table
  2013-11-23  9:24               ` Bastian Bittorf
@ 2013-11-23 16:15                 ` Antonio Quartulli
  0 siblings, 0 replies; 10+ messages in thread
From: Antonio Quartulli @ 2013-11-23 16:15 UTC
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Sat, Nov 23, 2013 at 10:24:45AM +0100, Bastian Bittorf wrote:
> * Gui Iribarren <gui@altermundi.net> [03.11.2013 20:37]:
> > btw, even with the ebtables rule, we had to turn off DAT in a
> > scenario equivalent to yours, because the DAT cache was also acting
> > funny (DUP arp replies from each node in the cloud)
> > haven't got around to properly debug it / report it, but still, be warned :)

I think all these strange behaviours are coming from the fact that what you guys
are trying to do is not really supported by the underlying layer (batman-adv).

I think a better idea is to start thinking about how to bring anycast support to
batman-adv rather than trying to mess up the rest :) That would surely help the
entire community.

After the last WBM we concentrated our efforts in creating a starting point for
a "more general" solution and we collected the results in this page [*].

This page describes what you probably want to achieve in the end, so working all
together to make it possible would probably be the best option (instead of
trying to work around an unsupported setup and then asking for help to debug
inconsistent behaviours....).

> root@box:~ batctl dc
> Distributed ARP Table (bat0):
>           IPv4             MAC           last-seen
>  *  192.168.222.61 00:21:6a:32:7c:1c      3:50
> 
> after some seconds the client disappears from DAT-cache:

As you can imagine DAT is a cache and if it does not get refreshed
often enough the content will slowly disappear. Right now the timeout is 4
minutes and this is why your entry goes away after a few seconds (it is at 3:50
at that moment). If you have not yet read the documentation, [1] explains the
mechanism behind it.
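
The arithmetic can be made visible with a small filter over the 'batctl dc'
output. This is a rough sketch that assumes the 4-minute timeout mentioned
above and the m:ss "last-seen" column quoted in this thread; dat_remaining and
DAT_TIMEOUT_S are names made up for this example, and the real purge may run
a little later than the computed value.

```shell
# Sketch: show roughly how many seconds each "batctl dc" entry has left.
# Reads "batctl dc"-style text on stdin; entry lines look like
#   * <ipv4> <mac> <m:ss>
dat_remaining() {
    awk -v timeout="${DAT_TIMEOUT_S:-240}" '
        $1 == "*" && $4 ~ /^[0-9]+:[0-9]+$/ {
            split($4, t, ":")
            age = t[1] * 60 + t[2]      # last-seen converted to seconds
            left = timeout - age
            if (left < 0) left = 0
            printf "%s %s %ds left\n", $2, $3, left
        }
    '
}
```

With the entry quoted above (last-seen 3:50) the age is 230 seconds, so
'batctl dc | dat_remaining' would report 10 seconds left under the assumed
240-second timeout.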

> 
> root@box:~ batctl tl
> Locally retrieved addresses (from bat0) announced via TT (TTVN: 2 CRC: 0x6023):
>        Client        Flags   Last seen 
>  * 00:21:6a:32:7c:1c [....W]   0.010
> 
> root@box:~ batctl dc
> Distributed ARP Table (bat0):
>           IPv4             MAC           last-seen
> 
> after some time even the 'translocal-table' is empty, although with
> 'iw dev wlan0 station dump' i can see the active client. i'm
> normally connected, can ping/ssh the node itself but not further.
> (only hop by hop)
> 
> how does batman detect, if a client is active? (can i trigger it somehow?)
> what can i do to debug further?
> 

As written in [2]:
"Every client MAC address that is recognized through the mesh interface will be
stored in a node local table called "local translation table" which will contain
all the clients the node is currently serving."

So if your client is timing out it means that no packet originated by it is
reaching your mesh interface.

If you want to debug further now you have to ask yourself what you are doing to
prevent packets from reaching bat0 :-)
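
One possible first step is to compare the stations the AP knows about with
the clients in the local translation table. The sketch below assumes the
"Station <mac> (on <if>)" lines printed by 'iw dev wlan0 station dump' and the
'batctl tl' entry format quoted in this thread; stations_missing_from_tl is a
hypothetical helper name.

```shell
# Sketch: print stations that are associated but absent from "batctl tl".
# usage: stations_missing_from_tl <iw-station-dump-file> <batctl-tl-file>
stations_missing_from_tl() {
    awk '
        # first operand: saved "batctl tl" output, remember every announced client
        FILENAME == ARGV[1] {
            if ($1 == "*") known[tolower($2)] = 1
            next
        }
        # second operand: saved "iw ... station dump" output
        $1 == "Station" && !(tolower($2) in known) { print tolower($2) }
    ' "$2" "$1"
}
```

On a node this could be fed with 'iw dev wlan0 station dump > /tmp/sta' and
'batctl tl > /tmp/tl'. For any MAC it prints, a capture such as
'tcpdump -e -n -i bat0 ether src <mac>' may then show whether frames from that
client reach bat0 at all.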


Cheers,


[*] http://www.open-mesh.org/projects/open-mesh/wiki/Connecting-Batman-adv-clouds
[1] http://www.open-mesh.org/projects/batman-adv/wiki/DistributedArpTable-technical
[2] http://www.open-mesh.org/projects/batman-adv/wiki/Client-announcement

-- 
Antonio Quartulli
