[B.A.T.M.A.N.] batman goes looping...

public inbox for b.a.t.m.a.n@lists.open-mesh.org
 help / color / mirror / Atom feed

* [B.A.T.M.A.N.] batman goes looping...
@ 2009-07-22 10:56 Andrew Lunn
  2009-07-27 16:20 ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Lunn @ 2009-07-22 10:56 UTC (permalink / raw)
  To: lindner_marek; +Cc: B.A.T.M.A.N

[-- Attachment #1: Type: text/plain, Size: 5164 bytes --]

Hi Marek

I finally had time to dig into our problems with loops in our chain.

Some background for the list. Yang and I have been using User Mode
Linux (UML) to build a test network for batman advanced. We connect a
number of uml machines together using a modified version of
uml_switch. The modifications allow us to change the packet drop
probability between any two nodes. We have been testing using simple
chains as shown in the attached gif. The black lines show the
currently used links. The red lines are other links which are
currently not used by batman. The black links have a packet drop
probablilty of 0% and the red of 20%.

Our test was to remove uml5 from the network and see how long
batman-adv took to re-route around it. We ping from uml4 to uml6 and
from uml1 to uml9.

We found that uml4->uml6 would recover in around 14 seconds. However
uml1->uml9 took much longer, 65 seconds.

Looking at the routing, we found it went into loops. When sending from
uml1 to uml9, uml1 routes to uml2, uml2 routes back to uml1.

Here are the logs from uml2. I've cut out most of the packets, just
showing OGMs from uml9. There is a simple relationship between the MAC
address and the uml number:

fe:fe:00:00:01:01 - uml1
fe:fe:00:00:02:01 - uml2
fe:fe:00:00:03:01 - uml3
etc...

[  42949558] Received BATMAN packet via NB: fe:fe:00:00:03:01, IF: eth1 [fe:fe:00:00:02:01] (from OG: fe:fe:00:00:09:01, via old OG: fe:fe:00:00:04:01, seqno 146, tq 218, TTL 44, V 7, IDF 0) 
[  42949558] bidirectional: orig = fe:fe:00:00:09:01 neigh = fe:fe:00:00:03:01 => own_bcast = 64, real recv = 64, local tq: 255, asym_penalty: 255, total tq: 218 
[  42949558] update_originator(): Searching and updating originator entry of received packet 
[  42949558] Updating existing last-hop neighbour of originator
[  42949558] Drop packet: duplicate packet received

This has been received from uml3 origionally from uml4. The TQ is 218
to uml9 via uml3.

[  42949559] Received BATMAN packet via NB: fe:fe:00:00:01:01, IF: eth1 [fe:fe:00:00:02:01] (from OG: fe:fe:00:00:09:01, via old OG: fe:fe:00:00:03:01, seqno 146, tq 209, TTL 42, V 7, IDF 0) 
[  42949559] bidirectional: orig = fe:fe:00:00:09:01 neigh = fe:fe:00:00:01:01 => own_bcast = 64, real recv = 64, local tq: 255, asym_penalty: 255, total tq: 209 
[  42949559] update_originator(): Searching and updating originator entry of received packet 
[  42949559] Updating existing last-hop neighbour of originator
[  42949559] Drop packet: duplicate packet received

This is where is starts to get interesting. This is from uml1,
origionally from uml3. So it has jumped uml2, it used the 20% packet
drop link which exists between uml1 and uml3. Because this is not an
echo, uml2 processes it, and now knows that with a TQ of 209 it can
get to uml9 via uml1.

[  42949559] Sending own  packet (originator fe:fe:00:00:02:01, seqno 155, TQ 255, TTL 50, IDF off) on interface eth1 [fe:fe:00:00:02:01]
[  42949559] Forwarding aggregated  packet (originator fe:fe:00:00:06:01, seqno 152, TQ 232, TTL 46, IDF off) on interface eth1 [fe:fe:00:00:02:01]
[  42949559] Forwarding aggregated  packet (originator fe:fe:00:00:09:01, seqno 146, TQ 215, TTL 43, IDF off) on interface eth1 [fe:fe:00:00:02:01]
[  42949559] Forwarding  packet (originator fe:fe:00:00:01:01, seqno 156, TQ 250, TTL 49, IDF on) on interface eth1 [fe:fe:00:00:02:01]

[  42949560] Received BATMAN packet via NB: fe:fe:00:00:03:01, IF: eth1 [fe:fe:00:00:02:01] (from OG: fe:fe:00:00:09:01, via old OG: fe:fe:00:00:04:01, seqno 148, tq 150, TTL 45, V 7, IDF 0) 
[  42949560] updating last_seqno: old 146, new 148 
[  42949560] bidirectional: orig = fe:fe:00:00:09:01 neigh = fe:fe:00:00:03:01 => own_bcast = 64, real recv = 64, local tq: 255, asym_penalty: 255, total tq: 150 
[  42949560] update_originator(): Searching and updating originator entry of received packet 
[  42949560] Updating existing last-hop neighbour of originator
[  42949560] Changing route towards: fe:fe:00:00:09:01 (now via fe:fe:00:00:01:01 - was via fe:fe:00:00:03:01)
[  42949560] Forwarding packet: rebroadcast originator packet 
[  42949560] Forwarding packet: tq_orig: 150, tq_avg: 209, tq_forw: 204, ttl_orig: 44, ttl_forw: 255 

Now things go none optimal :-(

This is from uml3, origionally from uml4. The TQ value has dropped to
150. This will be when we have removed uml5, so the TQ naturally does
drop.

The TQ value via uml3 is now less than the TQ value via uml1. So it
changes its route to go via uml1.

Looking at the logs of uml1, uml1 is always routing to uml9 via uml2.
The problem here i think is to do with the asymetric links algorithms.
When sending out an OGM, the node uses the TQ for its best link to the
originator, not the link the OGM came in on. If the OGM from uml1
origionally from UML3 reported the TQ via that route, the TQ would
very likely be lower. uml2 would then not of choosen to swap to
uml1. However, uml1 reports its best route, which is via uml2. uml2
does not know this, decides to use uml1, and we have a loop.

Does this all hang together correctly? I'm i interpreting this all
right...

How would you suggest fix this?

Thanks
        Andrew

[-- Attachment #2: mesh.gif --]
[-- Type: image/gif, Size: 17145 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-07-22 10:56 [B.A.T.M.A.N.] batman goes looping Andrew Lunn
@ 2009-07-27 16:20 ` Marek Lindner
  2009-07-30 14:32   ` Yang Su
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-07-27 16:20 UTC (permalink / raw)
  To: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 1711 bytes --]


Hi,

> Looking at the logs of uml1, uml1 is always routing to uml9 via uml2.
> The problem here i think is to do with the asymetric links algorithms.
> When sending out an OGM, the node uses the TQ for its best link to the
> originator, not the link the OGM came in on. If the OGM from uml1
> origionally from UML3 reported the TQ via that route, the TQ would
> very likely be lower. uml2 would then not of choosen to swap to
> uml1. However, uml1 reports its best route, which is via uml2. uml2
> does not know this, decides to use uml1, and we have a loop.
>
> Does this all hang together correctly? I'm i interpreting this all
> right...
>
> How would you suggest fix this?

I would tend to say it is a bug in the routing selection code. UML2 switches 
the route because it compare thes "negative" TQ message of seqno N with a TQ 
of seqno N - x. I suggest the following fix:

If we receive a TQ value via a neighbor that is smaller than the previous TQ 
that we received via that neighbor we don't change the route to a neighbor 
which did not send us the same or newer seqno.

That way your scenario should not happen because uml2 would not switch. On the 
other hand if uml1 really has a better route uml2 would switch as soon as the 
packet with a new seqno via uml1 arrives. What do you think ?

I attached a patch that should do exactly that. As I'm travelling right now 
I'm not able to test it before next week. If you find the time to do so, please 
let me know about your findings. This patch is not as strict as it could be. It 
might be necessary to rework it as soon as the concept has been proven.

Thanks again for this thorough analysis. Let us know if you find more.  :-)

Regards,
Marek



[-- Attachment #2: premature-route-change.patch --]
[-- Type: text/x-patch, Size: 2290 bytes --]

 batman-adv-kernelland/routing.c |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/batman-adv-kernelland/routing.c b/batman-adv-kernelland/routing.c
index b576f8c..cb9f0ea 100644
--- a/batman-adv-kernelland/routing.c
+++ b/batman-adv-kernelland/routing.c
@@ -271,7 +271,7 @@ static int isBidirectionalNeigh(struct orig_node *orig_node, struct orig_node *o
 static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, struct batman_packet *batman_packet, struct batman_if *if_incoming, unsigned char *hna_buff, int hna_buff_len, char is_duplicate)
 {
 	struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL, *best_neigh_node = NULL;
-	unsigned char max_tq = 0, max_bcast_own = 0;
+	unsigned char max_tq = 0, max_bcast_own = 0, tq_avg_old;
 	int tmp_hna_buff_len;
 
 	debug_log(LOG_TYPE_BATMAN, "update_originator(): Searching and updating originator entry of received packet \n");
@@ -306,6 +306,7 @@ static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, stru
 	neigh_node->last_valid = jiffies;
 
 	ring_buffer_set(neigh_node->tq_recv, &neigh_node->tq_index, batman_packet->tq);
+	tq_avg_old = neigh_node->tq_avg;
 	neigh_node->tq_avg = ring_buffer_avg(neigh_node->tq_recv);
 
 	if (!is_duplicate) {
@@ -323,6 +324,22 @@ static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, stru
 
 	tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ? batman_packet->num_hna * ETH_ALEN : hna_buff_len);
 
+	/**
+	 * if the neighbor that sent us this packet is our current best next
+	 * hop but delivers a TQ that is worse than the previous one we have
+	 * have to make sure that the alternative route already knows about the
+	 * changed TQ otherwise we risk a (temporary) loop
+	 * in case our alternative route does not know about his change we
+	 * stick with our current route
+	 */
+	if ((orig_node->router == neigh_node) &&
+	    (neigh_node != best_neigh_node) &&
+	    (tq_avg_old > neigh_node->tq_avg) &&
+	    (!get_bit_status(best_neigh_node->real_bits,
+	                     orig_node->last_real_seqno,
+	                     batman_packet->seqno)))
+		best_neigh_node = neigh_node;
+
 	update_routes(orig_node, best_neigh_node, hna_buff, tmp_hna_buff_len);
 }
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-07-27 16:20 ` Marek Lindner
@ 2009-07-30 14:32   ` Yang Su
  2009-08-03 10:21     ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-07-30 14:32 UTC (permalink / raw)
  To: Marek Lindner; +Cc: b.a.t.m.a.n, Andrew Lunn

[-- Attachment #1: Type: text/plain, Size: 7642 bytes --]

-  More information about the UML based test setup: a chain consisting of 9
nodes (each node is an uml instatnce, uml1, uml2 ... uml9). Direct
neighbors (distance=1 hop) can talk to each other with 0% link loss rate.
Nodes that are 2 hop away can talk to each other with 20% link loss rate.
Nodes that are more than 2 hops away cannot communicate to each other. We
shut down the node in the middle (uml5) and measure (with ping) the time it
takes for batman to re-setup the end-to-end routes (from uml1 to uml9).

-  We observe extensive tmp routing loops during the rerouting events. It
often takes batman more than 1 minutes to get any packet through after the
rerouting and it takes even longer (several minutes) before batman
stabilizes.

-  The cause for the loops:

Let's study the interaction between uml1 and uml2:

OGMs from uml9 may arrive at uml1 and uml2 at roughly the same time
(because both of them has a shortest path of 6-hop from uml9).  uml1 and
uml2 will forward those OGMs to each other.  Depending on the timing
relations between uml1 and uml2, there are four possible combinations:

Case 1: first uml1 forwards the OGM received from uml3 to uml2,  then  uml
2 forwards the OGM received from uml3 to uml1.

Case 2: first uml1 forwards the OGM received from uml3 to uml2, then uml 2
forwards the OGM received from uml1 back to uml1.

Case 3: first uml2 forwards the OGM received from uml3 to uml1, then uml1
forwards the OGM received from uml3 to uml2.

Case 4: first uml2 forwards the OGM received from uml3 to uml1, then uml1
forwards the OGM received from uml2 back to uml2.

When the quality of the path from uml1 to uml9 drops down dramatically
(e.g., in our case, path loss rate increases from 0% to 20% after we remove
uml5), 3 out of 4 cases (case 1,2,4) may cause looping between uml1 and
uml2. With the interaction of echo cancellation mechanism, case 2 may even
lead batman into very deep looping.

- To fix the problem:

I explored two ideas:

1.  Similar to what Marek proposed. But push it to the extreme: switch to a
neighbor only when it has the best tq AND it has the newest seqno.  This
fix along seems to already solve the looping problem in the test cases. It
reduces the rerouting time from more than 1 minutes to less than 15
seconds.  Marek: I also tried the patch you sent. It didn't help in this
setup.

2. Relaxed echo cancellation. This is based on  the following observation:
the TQ value that a node puts into OGM is completely decoupled with "from
which neighbor this OGM is received".   As a result, the TQ value contained
in the echoed OGM represent the real TQ value at the neighbor which echoed
this OGM. The current echo cancellation implementation just drops all the
echoed OGM. This may prevent the node from updating the information towards
the neighbor that echos the OGM. In the extreme case, the information
towards that neighbor may becomes completely stale (similar to what happens
in case 2).  The change I  made:  Always check the TQ contained in the
echoed OGMs. When it is worse than the avg TQ towards that neighbor, we use
this TQ reading to update the avg TQ towards that neighbor. This change
didn't show any effect during the chain tests. However, I still include
this change in the patch to bring up the discussion.

In the attachment, I append a patch which contains a preliminary
implementation for both ideas.



Cheers,
Yang
(See attached file: routing-loops.patch)



|------------>
| From:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Marek Lindner <lindner_marek@yahoo.de>                                                                                                            |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| To:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |b.a.t.m.a.n@lists.open-mesh.net                                                                                                                   |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Cc:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Andrew Lunn <andrew.lunn@ascom.ch>, Yang Su <yang.su@ascom.ch>                                                                                    |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Date:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |27.07.2009 18:21                                                                                                                                  |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Subject:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Re: [B.A.T.M.A.N.] batman goes looping...                                                                                                         |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|






Hi,

> Looking at the logs of uml1, uml1 is always routing to uml9 via uml2.
> The problem here i think is to do with the asymetric links algorithms.
> When sending out an OGM, the node uses the TQ for its best link to the
> originator, not the link the OGM came in on. If the OGM from uml1
> origionally from UML3 reported the TQ via that route, the TQ would
> very likely be lower. uml2 would then not of choosen to swap to
> uml1. However, uml1 reports its best route, which is via uml2. uml2
> does not know this, decides to use uml1, and we have a loop.
>
> Does this all hang together correctly? I'm i interpreting this all
> right...
>
> How would you suggest fix this?

I would tend to say it is a bug in the routing selection code. UML2
switches
the route because it compare thes "negative" TQ message of seqno N with a
TQ
of seqno N - x. I suggest the following fix:

If we receive a TQ value via a neighbor that is smaller than the previous
TQ
that we received via that neighbor we don't change the route to a neighbor
which did not send us the same or newer seqno.

That way your scenario should not happen because uml2 would not switch. On
the
other hand if uml1 really has a better route uml2 would switch as soon as
the
packet with a new seqno via uml1 arrives. What do you think ?

I attached a patch that should do exactly that. As I'm travelling right now

I'm not able to test it before next week. If you find the time to do so,
please
let me know about your findings. This patch is not as strict as it could
be. It
might be necessary to rework it as soon as the concept has been proven.

Thanks again for this thorough analysis. Let us know if you find more.  :-)

Regards,
Marek

[-- Attachment #2: routing-loops.patch --]
[-- Type: application/octet-stream, Size: 4305 bytes --]

Index: types.h
===================================================================
--- types.h	(revision 1365)
+++ types.h	(working copy)
@@ -73,6 +73,7 @@
 	uint8_t tq_index;
 	uint8_t tq_avg;
 	uint8_t last_ttl;
+        uint16_t seqno;         /* Most recent seqno received from this neighbor */
 	unsigned long last_valid;            /* when last packet via this neighbour was received */
 	TYPE_OF_WORD real_bits[NUM_WORDS];
 	struct orig_node *orig_node;
Index: routing.c
===================================================================
--- routing.c	(revision 1365)
+++ routing.c	(working copy)
@@ -305,6 +305,9 @@
 	orig_node->flags = batman_packet->flags;
 	neigh_node->last_valid = jiffies;
 
+        if(batman_packet->seqno > neigh_node->seqno)
+            neigh_node->seqno = batman_packet->seqno;
+        
 	ring_buffer_set(neigh_node->tq_recv, &neigh_node->tq_index, batman_packet->tq);
 	neigh_node->tq_avg = ring_buffer_avg(neigh_node->tq_recv);
 
@@ -320,10 +323,16 @@
 		best_neigh_node = neigh_node;
 
 	}
-
-	tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ? batman_packet->num_hna * ETH_ALEN : hna_buff_len);
-
-	update_routes(orig_node, best_neigh_node, hna_buff, tmp_hna_buff_len);
+        
+        if(best_neigh_node == NULL || orig_node->router == NULL){
+                tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ? batman_packet->num_hna * ETH_ALEN : hna_buff_len);
+                update_routes(orig_node, best_neigh_node, hna_buff, tmp_hna_buff_len);         
+                
+                /* Resolving the tmpt looping: only when best_neigh_node has up-to-date seqno, switch to it */
+        }else if(best_neigh_node->seqno > orig_node->router->seqno){                
+                tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ? batman_packet->num_hna * ETH_ALEN : hna_buff_len);
+                update_routes(orig_node, best_neigh_node, hna_buff, tmp_hna_buff_len);
+        }
 }
 
 static char count_real_packets(struct ethhdr *ethhdr, struct batman_packet *batman_packet, struct batman_if *if_incoming)
@@ -367,7 +376,10 @@
 	char has_directlink_flag;
 	char is_my_addr = 0, is_my_orig = 0, is_my_oldorig = 0, is_broadcast = 0, is_bidirectional, is_single_hop_neigh, is_duplicate;
 	unsigned short if_incoming_seqno;
-
+        
+        struct neigh_node *tmp_neigh_node = NULL;
+        
+        
 	/* could be changed by schedule_own_packet() */
 	if_incoming_seqno = atomic_read(&if_incoming->seqno);
 
@@ -435,22 +447,35 @@
 		return;
 	}
 
+        orig_node = get_orig_node(batman_packet->orig);
+	if (orig_node == NULL)
+		return;
+
+        /* When the echo contains a tq from neighbor that is worse
+         * than the current average via that neighbor, update the avg
+         * tq to circumvent the stale tq reading */
 	if (is_my_oldorig) {
+                list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {                        
+                        if (compare_orig(tmp_neigh_node->addr, ethhdr->h_source) && (tmp_neigh_node->if_incoming == if_incoming)) {
+                                if(batman_packet->tq < tmp_neigh_node->tq_avg){
+                                        ring_buffer_set(tmp_neigh_node->tq_recv, &tmp_neigh_node->tq_index, batman_packet->tq);
+                                        tmp_neigh_node->tq_avg = ring_buffer_avg(tmp_neigh_node->tq_recv);
+                                }
+                        }
+                }
+                
 		debug_log(LOG_TYPE_BATMAN, "Drop packet: ignoring all rebroadcast echos (sender: %s) \n", neigh_str);
 		return;
 	}
-
+        
 	is_duplicate = count_real_packets(ethhdr, batman_packet, if_incoming);
-
-	orig_node = get_orig_node(batman_packet->orig);
-	if (orig_node == NULL)
-		return;
-
+        
+        
 	/* if sender is a direct neighbor the sender mac equals originator mac */
 	orig_neigh_node = (is_single_hop_neigh ? orig_node : get_orig_node(ethhdr->h_source));
 	if (orig_neigh_node == NULL)
 		return;
-
+        
 	/* drop packet if sender is not a direct neighbor and if we don't route towards it */
 	if (!is_single_hop_neigh && (orig_neigh_node->router == NULL)) {
 		debug_log(LOG_TYPE_BATMAN, "Drop packet: OGM via unknown neighbor! \n");

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-07-30 14:32   ` Yang Su
@ 2009-08-03 10:21     ` Marek Lindner
  2009-08-04 15:59       ` [B.A.T.M.A.N.] Routing help? nick farrow
  2009-08-06  8:44       ` [B.A.T.M.A.N.] batman goes looping Yang Su
  0 siblings, 2 replies; 18+ messages in thread
From: Marek Lindner @ 2009-08-03 10:21 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

[-- Attachment #1: Type: text/plain, Size: 2749 bytes --]

On Thursday 30 July 2009 22:32:04 Yang Su wrote:
> 1.  Similar to what Marek proposed. But push it to the extreme: switch to a
> neighbor only when it has the best tq AND it has the newest seqno.  This
> fix along seems to already solve the looping problem in the test cases. It
> reduces the rerouting time from more than 1 minutes to less than 15
> seconds.  Marek: I also tried the patch you sent. It didn't help in this
> setup.

Changing the route based on the (fastest) sequence number has some drawbacks 
which we experienced before (BATMAN III). It tends to discriminate longer but 
better routes and favors short paths. In some asymetric link (worst case) 
scenarios it would route against all odds, simply because receiving a fast 
packet (newest seqno) does not mean we can send the same way back. If possible 
I'd like to avoid this strict seqno check.

I attached another patch which will conduct a route switch only if the TQ of 
the sending neighbor is better than our current best route (no negative 
switching anymore). I tested the patch here and it works so far but my 
environment is less controlled than yours. Could you perform the same test on 
your setup ?

If you intend to experiment with other ideas watch out for the sequence number 
as it will overflow. A simple "greater than" check might lead to strange 
results. Also, updates_routes() checks for changed HNA messages even if the 
next hop does not change.


> 2. Relaxed echo cancellation. This is based on  the following observation:
> the TQ value that a node puts into OGM is completely decoupled with "from
> which neighbor this OGM is received".   As a result, the TQ value contained
> in the echoed OGM represent the real TQ value at the neighbor which echoed
> this OGM. The current echo cancellation implementation just drops all the
> echoed OGM. This may prevent the node from updating the information towards
> the neighbor that echos the OGM. In the extreme case, the information
> towards that neighbor may becomes completely stale (similar to what happens
> in case 2).  The change I  made:  Always check the TQ contained in the
> echoed OGMs. When it is worse than the avg TQ towards that neighbor, we use
> this TQ reading to update the avg TQ towards that neighbor. This change
> didn't show any effect during the chain tests. However, I still include
> this change in the patch to bring up the discussion.

Right, every node will emit his currently best TQ value but I did not 
understand how we can use that. If we send him a better TQ he will send back 
that number. If we send a bad TQ he will send his good number. Furthermore, 
each hop will apply some asymetric / hop / wifi penalty that we pull into our 
routing database ?

Regards,
Marek


[-- Attachment #2: switch-route-with-better-tq.patch --]
[-- Type: text/x-patch, Size: 3675 bytes --]

 batman-adv-kernelland/routing.c |   44 ++++++++++++++++----------------------
 1 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/batman-adv-kernelland/routing.c b/batman-adv-kernelland/routing.c
index b576f8c..204f6a3 100644
--- a/batman-adv-kernelland/routing.c
+++ b/batman-adv-kernelland/routing.c
@@ -270,31 +270,22 @@ static int isBidirectionalNeigh(struct orig_node *orig_node, struct orig_node *o
 
 static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, struct batman_packet *batman_packet, struct batman_if *if_incoming, unsigned char *hna_buff, int hna_buff_len, char is_duplicate)
 {
-	struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL, *best_neigh_node = NULL;
-	unsigned char max_tq = 0, max_bcast_own = 0;
+	struct neigh_node *neigh_node = NULL, *tmp_neigh_node = NULL;
 	int tmp_hna_buff_len;
 
 	debug_log(LOG_TYPE_BATMAN, "update_originator(): Searching and updating originator entry of received packet \n");
 
 	list_for_each_entry(tmp_neigh_node, &orig_node->neigh_list, list) {
-
 		if (compare_orig(tmp_neigh_node->addr, ethhdr->h_source) && (tmp_neigh_node->if_incoming == if_incoming)) {
 			neigh_node = tmp_neigh_node;
-		} else {
-
-			if (!is_duplicate) {
-				ring_buffer_set(tmp_neigh_node->tq_recv, &tmp_neigh_node->tq_index, 0);
-				tmp_neigh_node->tq_avg = ring_buffer_avg(tmp_neigh_node->tq_recv);
-			}
+			continue;
+		}
 
-			/* if we got have a better tq value via this neighbour or same tq value if it is currently our best neighbour (to avoid route flipping) */
-			if ((tmp_neigh_node->tq_avg > max_tq) || ((tmp_neigh_node->tq_avg == max_tq) && (tmp_neigh_node->orig_node->bcast_own_sum[if_incoming->if_num] > max_bcast_own)) || ((orig_node->router == tmp_neigh_node) && (tmp_neigh_node->tq_avg == max_tq))) {
+		if (is_duplicate)
+			continue;
 
-				max_tq = tmp_neigh_node->tq_avg;
-				max_bcast_own = tmp_neigh_node->orig_node->bcast_own_sum[if_incoming->if_num];
-				best_neigh_node = tmp_neigh_node;
-			}
-		}
+		ring_buffer_set(tmp_neigh_node->tq_recv, &tmp_neigh_node->tq_index, 0);
+		tmp_neigh_node->tq_avg = ring_buffer_avg(tmp_neigh_node->tq_recv);
 	}
 
 	if (neigh_node == NULL)
@@ -313,17 +304,20 @@ static void update_orig(struct orig_node *orig_node, struct ethhdr *ethhdr, stru
 		neigh_node->last_ttl = batman_packet->ttl;
 	}
 
-	if ((neigh_node->tq_avg > max_tq) || ((neigh_node->tq_avg == max_tq) && (neigh_node->orig_node->bcast_own_sum[if_incoming->if_num] > max_bcast_own)) || ((orig_node->router == neigh_node) && (neigh_node->tq_avg == max_tq))) {
-
-		max_tq = neigh_node->tq_avg;
-		max_bcast_own = neigh_node->orig_node->bcast_own_sum[if_incoming->if_num];
-		best_neigh_node = neigh_node;
-
-	}
-
 	tmp_hna_buff_len = (hna_buff_len > batman_packet->num_hna * ETH_ALEN ? batman_packet->num_hna * ETH_ALEN : hna_buff_len);
 
-	update_routes(orig_node, best_neigh_node, hna_buff, tmp_hna_buff_len);
+	/**
+	 * if we got have a better tq value via this neighbour or
+	 * same tq value but the link is more symetric change the next hop
+	 * router
+	 */
+	if ((orig_node->router != neigh_node) && ((!orig_node->router) ||
+	    (neigh_node->tq_avg > orig_node->router->tq_avg) ||
+	    ((neigh_node->tq_avg == orig_node->router->tq_avg) &&
+	     (neigh_node->orig_node->bcast_own_sum[if_incoming->if_num] > orig_node->router->orig_node->bcast_own_sum[if_incoming->if_num]))))
+		update_routes(orig_node, neigh_node, hna_buff, tmp_hna_buff_len);
+	else
+		update_routes(orig_node, orig_node->router, hna_buff, tmp_hna_buff_len);
 }
 
 static char count_real_packets(struct ethhdr *ethhdr, struct batman_packet *batman_packet, struct batman_if *if_incoming)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [B.A.T.M.A.N.] Routing help?
  2009-08-03 10:21     ` Marek Lindner
@ 2009-08-04 15:59       ` nick farrow
  2009-08-06  8:44       ` [B.A.T.M.A.N.] batman goes looping Yang Su
  1 sibling, 0 replies; 18+ messages in thread
From: nick farrow @ 2009-08-04 15:59 UTC (permalink / raw)
  To: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 1664 bytes --]

Hi,

I have 3 linksys wrt54 flashed with openwrt 7.09, webif'ed and installed BATMAN 0.2-rv478.1, all OK I think.

I have arranged the 3 nodes such that node 1 can only see 2 , and  2 only  3, so no direct contact is available from 1 to 3 (except via the mesh)

I have not broken the br-lan configuration of these devices and assigned 172.16.1.1/24 to node 1 172.16.1.2/24 to node 2 172.16.1.3/24 to node 3, so I think that means all packets recvd on wl0 or eh0 are bridged.

I have a pc hooked into the lan port of node 3 From here I can ssh into node3 (via the ethernet) and into node2 via the wifi (but not .1)  Netstat -rn on node  3 indicates that 172.16.1.1 (node1) is via 172.16.1.2 and this is confirmed by the batman web if that shows .1 is reached via .2 - all great.

From the ssh prompt on .3 I can ping .2 with no loss and ping .1 with  ~ 6% loss. However, if I try to ping .1 from the pc connected on the ethernet port of .3 I get no route to host. Putting wireshark on the ethernet ifc shows the arp going out but nothing coming back. Interestingly enough if I ssh into .3 and ping .1, with wireshark running on the ethernet of .3 Is see the outgoing ICMP but not the reply (which is there because the ping succeeds) . It seems as if the local host traffic is not truly bridged.

This is the first attempt so I might have some incorrect/missing config or done something stupid, any ideas would be welcome!

Thanks very much

nick

_________________________________________________________________
Windows Live Messenger: Celebrate 10 amazing years with free winks and emoticons.
http://clk.atdmt.com/UKM/go/157562755/direct/01/

[-- Attachment #2: Type: text/html, Size: 1897 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-03 10:21     ` Marek Lindner
  2009-08-04 15:59       ` [B.A.T.M.A.N.] Routing help? nick farrow
@ 2009-08-06  8:44       ` Yang Su
  2009-08-06 14:15         ` Marek Lindner
  1 sibling, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-08-06  8:44 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

|------------>
| From:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Marek Lindner <lindner_marek@yahoo.de>                                                                                                            |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| To:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net>                                                    |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Date:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |03.08.2009 12:23                                                                                                                                  |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Subject:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Re: [B.A.T.M.A.N.] batman goes looping...                                                                                                         |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Sent by:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |b.a.t.m.a.n-bounces@lists.open-mesh.net                                                                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|








On Thursday 30 July 2009 22:32:04 Yang Su wrote:
> 1.  Similar to what Marek proposed. But push it to the extreme: switch to
a
> neighbor only when it has the best tq AND it has the newest seqno.  This
> fix along seems to already solve the looping problem in the test cases.
It
> reduces the rerouting time from more than 1 minutes to less than 15
> seconds.  Marek: I also tried the patch you sent. It didn't help in this
> setup.

Changing the route based on the (fastest) sequence number has some
drawbacks
which we experienced before (BATMAN III). It tends to discriminate longer
but
better routes and favors short paths. In some asymetric link (worst case)
scenarios it would route against all odds, simply because receiving a fast
packet (newest seqno) does not mean we can send the same way back. If
possible
I'd like to avoid this strict seqno check.

- You are right. Under certain circumstance, this approach may lead to less
optimal route selection. Just to understand batman algorithm better, I try
to make an example (please correct me if something's wrong here): There are
two direct neighbors via which a sender communicates with a receiver. Via
the good neighbor, their is a single good route with long delay. Between
the bad neighbor and the receiver there are many parallel bad routes. Each
bad route has high packet loss rate but very short delay. The receiver's
OGMs arrive at the sender first via the bad neighbor. Although there is
substantial OGM loss on each individual bad routes, by aggregation over
those parallel bad routes, the bad neighbor is still able to relay most of
sender's OGM to the receiver. As a result, if the sender already chooses
the bad neighbor as the next hop, it will stick to it for quite long time
and does not switch to the good neighbor.

I attached another patch which will conduct a route switch only if the TQ
of the sending neighbor is better than our current best route (no negative
switching anymore). I tested the patch here and it works so far but my
environment is less controlled than yours. Could you perform the same test
on your setup ?

- This approach achieves the similar goal as the seqno-based one: switch to
a neighbor only when this neighbor is better and has more up-to-date
information than the current best route. But this approach is less
constrained and seems not to suffer from the above problem for the
seqno-based approach. I have tested the patch on our test environment. It
works pretty well on the test cases.

> 2. Relaxed echo cancellation. This is based on  the following
observation:
> the TQ value that a node puts into OGM is completely decoupled with "from
> which neighbor this OGM is received".   As a result, the TQ value
contained
> in the echoed OGM represent the real TQ value at the neighbor which
echoed
> this OGM. The current echo cancellation implementation just drops all the
> echoed OGM. This may prevent the node from updating the information
towards
> the neighbor that echos the OGM. In the extreme case, the information
> towards that neighbor may becomes completely stale (similar to what
happens
> in case 2).  The change I  made:  Always check the TQ contained in the
> echoed OGMs. When it is worse than the avg TQ towards that neighbor, we
use
> this TQ reading to update the avg TQ towards that neighbor. This change
> didn't show any effect during the chain tests. However, I still include
> this change in the patch to bring up the discussion.

Right, every node will emit his currently best TQ value but I did not
understand how we can use that. If we send him a better TQ he will send
back
that number. If we send a bad TQ he will send his good number. Furthermore,

each hop will apply some asymetric / hop / wifi penalty that we pull into
our
routing database ?

- The problem might happen when the neighbor thinks that I am his best
neighbor (his good number is actually from me). In this case, my estimation
of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop
penalty). When my TQ drops down, the correct behavior should be that my
estimation of that neighbor's TQ also drops down accordingly. However, when
echo cancellation steps in, my estimation of that neighbor's TQ will not be
updated and remains to be the stale value. This might cause problems (e.g.
looping) in corner cases even with the new patch.  Does this analysis make
sense or I miss some details of the echo cancellation?

Regards,
Marek

[attachment "switch-route-with-better-tq.patch" deleted by Yang
Su/NSUYAN/CH/Ascom] _______________________________________________
B.A.T.M.A.N mailing list
B.A.T.M.A.N@lists.open-mesh.net
https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-06  8:44       ` [B.A.T.M.A.N.] batman goes looping Yang Su
@ 2009-08-06 14:15         ` Marek Lindner
  2009-08-07 15:13           ` Yang Su
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-08-06 14:15 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

[-- Attachment #1: Type: text/plain, Size: 3386 bytes --]

On Thursday 06 August 2009 16:44:28 Yang Su wrote:
> - You are right. Under certain circumstance, this approach may lead to less
> optimal route selection. Just to understand batman algorithm better, I try
> to make an example (please correct me if something's wrong here): There are
> two direct neighbors via which a sender communicates with a receiver. Via
> the good neighbor, their is a single good route with long delay. Between
> the bad neighbor and the receiver there are many parallel bad routes. Each
> bad route has high packet loss rate but very short delay. The receiver's
> OGMs arrive at the sender first via the bad neighbor. Although there is
> substantial OGM loss on each individual bad routes, by aggregation over
> those parallel bad routes, the bad neighbor is still able to relay most of
> sender's OGM to the receiver. As a result, if the sender already chooses
> the bad neighbor as the next hop, it will stick to it for quite long time
> and does not switch to the good neighbor.

It is possble to simplify your example more to clearly illustrate the problem. 
Scenario: Host A wants to send data to host B. Consider the beautiful graphic 
I attached. ;-)
The link between A and B is quite asymetric. No packet loss from B to A but 
heavy packet loss the other way round. Now, B periodically sends its OGMs and 
all OGMs will arrive at A directly and via the neighbor whereas the OGMs via 
the neighbor always be slower than the directly received ones. A should choose 
the longer path via the neighbor because it has no packet loss but the 
"fastest sequence number wins" algorithm won't allow that. 

> - This approach achieves the similar goal as the seqno-based one: switch to
> a neighbor only when this neighbor is better and has more up-to-date
> information than the current best route. But this approach is less
> constrained and seems not to suffer from the above problem for the
> seqno-based approach. I have tested the patch on our test environment. It
> works pretty well on the test cases.

Great! I comitted the patch immediately. Thanks again for your thorough 
analysis and ideas. :-)

> - The problem might happen when the neighbor thinks that I am his best
> neighbor (his good number is actually from me). In this case, my estimation
> of that neighbor's TQ is actually decided by my own TQ (my TQ minus one hop
> penalty). When my TQ drops down, the correct behavior should be that my
> estimation of that neighbor's TQ also drops down accordingly. However, when
> echo cancellation steps in, my estimation of that neighbor's TQ will not be
> updated and remains to be the stale value. This might cause problems (e.g.
> looping) in corner cases even with the new patch.  Does this analysis make
> sense or I miss some details of the echo cancellation?

Unless you found another bug the echo cancellation should not hinder the 
propagation of our own TQ value. Once we (uml2) received a packet we will 
rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got 
lost) and update its routing tables accordingly. Now, he will rebroadcast the 
packet as well which will be killed by the echo cancellation on uml1 as this 
packet does not contain new information. 
But even if the TQ received via uml2 is worse uml1 might still have a good TQ 
from uml3 in its routing table which might get rebroadcasted.

Regards,
Marek

[-- Attachment #2: fast_seqno_problem.png --]
[-- Type: image/png, Size: 7600 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-06 14:15         ` Marek Lindner
@ 2009-08-07 15:13           ` Yang Su
  2009-08-07 15:58             ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-08-07 15:13 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

[-- Attachment #1: Type: text/plain, Size: 2677 bytes --]






> - This approach achieves the similar goal as the seqno-based one: switch
to
> a neighbor only when this neighbor is better and has more up-to-date
> information than the current best route. But this approach is less
> constrained and seems not to suffer from the above problem for the
> seqno-based approach. I have tested the patch on our test environment. It
> works pretty well on the test cases.

Great! I comitted the patch immediately. Thanks again for your thorough
analysis and ideas. :-)

- Some updates: I did more thorough testing. It turns out that this patch
does really solve the problem.  This method seems not able to completely
eliminate all possible forms of looping which happen in the test cases.
This result in an special pattern: the rerouting is fast (~15 seconds),
then (after 3~ 5 seconds) the network enters the looping state and it
normally takes long time to recover.  Last time when I did tests, I stop
the tests right after I observed the successful rerouting. I didn't look
into the cause of this problem yet. Any inputs are welcome.

> - The problem might happen when the neighbor thinks that I am his best
> neighbor (his good number is actually from me). In this case, my
estimation
> of that neighbor's TQ is actually decided by my own TQ (my TQ minus one
hop
> penalty). When my TQ drops down, the correct behavior should be that my
> estimation of that neighbor's TQ also drops down accordingly. However,
when
> echo cancellation steps in, my estimation of that neighbor's TQ will not
be
> updated and remains to be the stale value. This might cause problems
(e.g.
> looping) in corner cases even with the new patch.  Does this analysis
make
> sense or I miss some details of the echo cancellation?

Unless you found another bug the echo cancellation should not hinder the
propagation of our own TQ value. Once we (uml2) received a packet we will
rebroadcast it. Our neighbor (uml1) will receive it (unless the packet got
lost) and update its routing tables accordingly. Now, he will rebroadcast
the
packet as well which will be killed by the echo cancellation on uml1 as
this
packet does not contain new information.
But even if the TQ received via uml2 is worse uml1 might still have a good
TQ
from uml3 in its routing table which might get rebroadcasted.

- In the following example (also appended in the attachment :) ), if I
understand the current echo cancellation implementation correctly, batman
will enter permanent looping between A and B. In this example, A send to F,
all the links are perfect and have the same delay. Only exception is link
A-E. It is an asymmetric link.

Regards,
Yang
(See attached file: looping.pdf)

[-- Attachment #2: looping.pdf --]
[-- Type: application/pdf, Size: 4387 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-07 15:13           ` Yang Su
@ 2009-08-07 15:58             ` Marek Lindner
  2009-08-10  7:20               ` Yang Su
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-08-07 15:58 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Friday 07 August 2009 23:13:46 Yang Su wrote:
> - Some updates: I did more thorough testing. It turns out that this patch
> does really solve the problem.  This method seems not able to completely
> eliminate all possible forms of looping which happen in the test cases.
> This result in an special pattern: the rerouting is fast (~15 seconds),
> then (after 3~ 5 seconds) the network enters the looping state and it
> normally takes long time to recover.  Last time when I did tests, I stop
> the tests right after I observed the successful rerouting. I didn't look
> into the cause of this problem yet. Any inputs are welcome.

Can you provide logs from the nodes involved ? You also can send them off-list 
to avoid sending too big attachments.


> - In the following example (also appended in the attachment :) ), if I
> understand the current echo cancellation implementation correctly, batman
> will enter permanent looping between A and B. In this example, A send to F,
> all the links are perfect and have the same delay. Only exception is link
> A-E. It is an asymmetric link.

Why do you think it would loop permanently and why do you think it is the 
fault of the echo cancellation ?
Actually, this example should[tm] be pretty easy. Every time the OGM from E or 
F arrive at A via the asymetric link A will apply a severe asymetric link 
penalty. All nodes will route via the longer path. Did you try to run this 
scenario in your testbed ?

Regards,
Marek


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-07 15:58             ` Marek Lindner
@ 2009-08-10  7:20               ` Yang Su
  2009-08-10  7:48                 ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-08-10  7:20 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

For the echo cancellation example: the problem happens with the interaction
between A and B. Because of the asym link, A chooses B as its next neighbor
towards F. Because the asym path is short, B always receives F's OGM from A
first  (before B receives the OGM with the same seqno from C). As a result,
from A's point of view, F's OGMs rebroadcasted by B are always echoes and A
will never update the TQ information towards F via B.

When the quality of any links on the path B to F drops down, B might choose
A as its next neighbor towards F and a loop forms between A and B.

Regards,
Yang

On Friday 07 August 2009 23:13:46 Yang Su wrote:
> - Some updates: I did more thorough testing. It turns out that this patch
> does really solve the problem.  This method seems not able to completely
> eliminate all possible forms of looping which happen in the test cases.
> This result in an special pattern: the rerouting is fast (~15 seconds),
> then (after 3~ 5 seconds) the network enters the looping state and it
> normally takes long time to recover.  Last time when I did tests, I stop
> the tests right after I observed the successful rerouting. I didn't look
> into the cause of this problem yet. Any inputs are welcome.

Can you provide logs from the nodes involved ? You also can send them
off-list
to avoid sending too big attachments.

> - In the following example (also appended in the attachment :) ), if I
> understand the current echo cancellation implementation correctly, batman
> will enter permanent looping between A and B. In this example, A send to
F,
> all the links are perfect and have the same delay. Only exception is link
> A-E. It is an asymmetric link.

Why do you think it would loop permanently and why do you think it is the
fault of the echo cancellation ?
Actually, this example should[tm] be pretty easy. Every time the OGM from E
or
F arrive at A via the asymetric link A will apply a severe asymetric link
penalty. All nodes will route via the longer path. Did you try to run this
scenario in your testbed ?

Regards,
Marek

_______________________________________________
B.A.T.M.A.N mailing list
B.A.T.M.A.N@lists.open-mesh.net
https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-10  7:20               ` Yang Su
@ 2009-08-10  7:48                 ` Marek Lindner
  2009-08-10  7:57                   ` Yang Su
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-08-10  7:48 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Monday 10 August 2009 15:20:28 Yang Su wrote:
> For the echo cancellation example: the problem happens with the interaction
> between A and B. Because of the asym link, A chooses B as its next neighbor
> towards F. Because the asym path is short, B always receives F's OGM from A
> first  (before B receives the OGM with the same seqno from C). As a result,
> from A's point of view, F's OGMs rebroadcasted by B are always echoes and A
> will never update the TQ information towards F via B.
>
> When the quality of any links on the path B to F drops down, B might choose
> A as its next neighbor towards F and a loop forms between A and B.

Ah, here we have the misunderstanding. Let me briefly outline how the echo 
cancellation works (I refer to your PDF-example):

1. Node C emits a packet.
2. Node B receives it and writes the address of Node C in the previous sender 
field because it received that packet via C. Now, B rebroadcasts the OGM.
3. Node C receives the packet it sent before and discards it as the packet 
contains its own address in the previous sender field.
4. Node A happily receives the packet (and goes back to step 2).

The echo cancellation will not kill the packets coming via the longer path. 

Does this help ? 

Regards,
Marek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-10  7:48                 ` Marek Lindner
@ 2009-08-10  7:57                   ` Yang Su
  2009-08-10  8:16                     ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-08-10  7:57 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

|------------>
| From:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Marek Lindner <lindner_marek@yahoo.de>                                                                                                            |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| To:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net>                                                    |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Date:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |10.08.2009 09:50                                                                                                                                  |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Subject:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Re: [B.A.T.M.A.N.] batman goes looping...                                                                                                         |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Sent by:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |b.a.t.m.a.n-bounces@lists.open-mesh.net                                                                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|








On Monday 10 August 2009 15:20:28 Yang Su wrote:
> For the echo cancellation example: the problem happens with the
interaction
> between A and B. Because of the asym link, A chooses B as its next
neighbor
> towards F. Because the asym path is short, B always receives F's OGM from
A
> first  (before B receives the OGM with the same seqno from C). As a
result,
> from A's point of view, F's OGMs rebroadcasted by B are always echoes and
A
> will never update the TQ information towards F via B.
>
> When the quality of any links on the path B to F drops down, B might
choose
> A as its next neighbor towards F and a loop forms between A and B.

Ah, here we have the misunderstanding. Let me briefly outline how the echo
cancellation works (I refer to your PDF-example):

1. Node C emits a packet.
2. Node B receives it and writes the address of Node C in the previous
sender
field because it received that packet via C. Now, B rebroadcasts the OGM.

-- B has already rebroadcasted an earlier OGM with the same sequence number
(received from A). Will B rebroadcast the OGM with the same sequence number
(received from C) again?

3. Node C receives the packet it sent before and discards it as the packet
contains its own address in the previous sender field.
4. Node A happily receives the packet (and goes back to step 2).


Cheers,
Yang


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-10  7:57                   ` Yang Su
@ 2009-08-10  8:16                     ` Marek Lindner
  2009-08-11 16:42                       ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-08-10  8:16 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Monday 10 August 2009 15:57:29 Yang Su wrote:
> -- B has already rebroadcasted an earlier OGM with the same sequence number
> (received from A). Will B rebroadcast the OGM with the same sequence number
> (received from C) again?

Excellent question - it should. I will check the code later to verify that 
this is the case. 
However, this is not related to the echo cancellation anymore.

Regards,
Marek


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-10  8:16                     ` Marek Lindner
@ 2009-08-11 16:42                       ` Marek Lindner
  2009-08-12 14:34                         ` Yang Su
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-08-11 16:42 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Monday 10 August 2009 16:16:49 Marek Lindner wrote:
> Excellent question - it should. I will check the code later to verify that
> this is the case.
> However, this is not related to the echo cancellation anymore.

I inspected the code for behaviour in the case you outlined. I think our 
dropping policy might be a bit too strict. I suggest the following behaviour:

* New OGMs are processed and rebroadcasted as usual.
* Known OGMs (the sequence number is not new but we did not hear this sequence 
number via that neighbor) are processed and rebroadcasted as well.
* Duplicates (known sequence number via a neighbor that sent us this sequence 
number before are dropped.

Objections / ideas ?

Regards,
Marek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-11 16:42                       ` Marek Lindner
@ 2009-08-12 14:34                         ` Yang Su
  2009-08-13  9:56                           ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-08-12 14:34 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

Given the DV nature of Batman, current strict dropping policy seems to make
sense, because sending known OGMs (as suggested) does introduce significant
overhead but does not propagate so much new information.

In addition, loosing the dropping policy may introduce new problems which
might not be easily predictable at this moment, e.g., stability, looping
(propagating stale information within the network might cause even more
nasty looping problems).

Another way (and maybe simpler ) to fix the problem: instead of loosing the
OGM dropping policy,  loosing the echo cancellation policy.

It is a very simple fix: in addition to the usual echo cancellation
process, I also check  the TQ value contained in the echoed OGM.  This TQ
is announced by the neighbor that broadcasts this echo. If this TQ is worse
than my current TQ estimation towards that neighbor, I update the
estimation with this TQ.

Does that make sense?

Regards,
Yang



|------------>
| From:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Marek Lindner <lindner_marek@yahoo.de>                                                                                                            |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| To:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net>                                                    |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Date:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |11.08.2009 18:44                                                                                                                                  |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Subject:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Re: [B.A.T.M.A.N.] batman goes looping...                                                                                                         |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Sent by:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |b.a.t.m.a.n-bounces@lists.open-mesh.net                                                                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|





On Monday 10 August 2009 16:16:49 Marek Lindner wrote:
> Excellent question - it should. I will check the code later to verify
that
> this is the case.
> However, this is not related to the echo cancellation anymore.

I inspected the code for behaviour in the case you outlined. I think our
dropping policy might be a bit too strict. I suggest the following
behaviour:

* New OGMs are processed and rebroadcasted as usual.
* Known OGMs (the sequence number is not new but we did not hear this
sequence
number via that neighbor) are processed and rebroadcasted as well.
* Duplicates (known sequence number via a neighbor that sent us this
sequence
number before are dropped.

Objections / ideas ?

Regards,
Marek

_______________________________________________
B.A.T.M.A.N mailing list
B.A.T.M.A.N@lists.open-mesh.net
https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-12 14:34                         ` Yang Su
@ 2009-08-13  9:56                           ` Marek Lindner
  2009-08-13 15:07                             ` Yang Su
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Lindner @ 2009-08-13  9:56 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Wednesday 12 August 2009 22:34:17 Yang Su wrote:
> Given the DV nature of Batman, current strict dropping policy seems to make
> sense, because sending known OGMs (as suggested) does introduce significant
> overhead but does not propagate so much new information.
>
> In addition, loosing the dropping policy may introduce new problems which
> might not be easily predictable at this moment, e.g., stability, looping
> (propagating stale information within the network might cause even more
> nasty looping problems).

I agree that this might introduce new problems which is the reason for posting 
it here. I definitely see the overhead created by that methog. Every better 
idea will be accepted immediately.  :-)


> Another way (and maybe simpler ) to fix the problem: instead of loosing the
> OGM dropping policy,  loosing the echo cancellation policy.
>
> It is a very simple fix: in addition to the usual echo cancellation
> process, I also check  the TQ value contained in the echoed OGM.  This TQ
> is announced by the neighbor that broadcasts this echo. If this TQ is worse
> than my current TQ estimation towards that neighbor, I update the
> estimation with this TQ.

In your testbed it fixed the issue ?
Looking at the example you gave I would expect that the problem is only solved 
partially. You argued the OGMs from E via A would always arrive faster at B 
compared to the longer path. Using your method would probably avoid the loop 
between A and B in the event of a packet loss on the longer path but it would 
not let route A via the longer path. Am I mistaken ?

Regards,
Marek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-13  9:56                           ` Marek Lindner
@ 2009-08-13 15:07                             ` Yang Su
  2009-08-13 16:00                               ` Marek Lindner
  0 siblings, 1 reply; 18+ messages in thread
From: Yang Su @ 2009-08-13 15:07 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

Maybe I miss some thing here. What do you think would prevent A from
choosing via the long route (via B) ?

Cheers,
Yang


|------------>
| From:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Marek Lindner <lindner_marek@yahoo.de>                                                                                                            |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| To:        |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |"The list for a Better Approach To Mobile Ad-hoc Networking" <b.a.t.m.a.n@lists.open-mesh.net>                                                    |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Date:      |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |13.08.2009 11:59                                                                                                                                  |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Subject:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |Re: [B.A.T.M.A.N.] batman goes looping...                                                                                                         |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
|------------>
| Sent by:   |
|------------>
  >--------------------------------------------------------------------------------------------------------------------------------------------------|
  |b.a.t.m.a.n-bounces@lists.open-mesh.net                                                                                                           |
  >--------------------------------------------------------------------------------------------------------------------------------------------------|





On Wednesday 12 August 2009 22:34:17 Yang Su wrote:
> Given the DV nature of Batman, current strict dropping policy seems to
make
> sense, because sending known OGMs (as suggested) does introduce
significant
> overhead but does not propagate so much new information.
>
> In addition, loosing the dropping policy may introduce new problems which
> might not be easily predictable at this moment, e.g., stability, looping
> (propagating stale information within the network might cause even more
> nasty looping problems).

I agree that this might introduce new problems which is the reason for
posting
it here. I definitely see the overhead created by that methog. Every better

idea will be accepted immediately.  :-)


> Another way (and maybe simpler ) to fix the problem: instead of loosing
the
> OGM dropping policy,  loosing the echo cancellation policy.
>
> It is a very simple fix: in addition to the usual echo cancellation
> process, I also check  the TQ value contained in the echoed OGM.  This TQ
> is announced by the neighbor that broadcasts this echo. If this TQ is
worse
> than my current TQ estimation towards that neighbor, I update the
> estimation with this TQ.

In your testbed it fixed the issue ?
Looking at the example you gave I would expect that the problem is only
solved
partially. You argued the OGMs from E via A would always arrive faster at B

compared to the longer path. Using your method would probably avoid the
loop
between A and B in the event of a packet loss on the longer path but it
would
not let route A via the longer path. Am I mistaken ?

Regards,
Marek
_______________________________________________
B.A.T.M.A.N mailing list
B.A.T.M.A.N@lists.open-mesh.net
https://lists.open-mesh.net/mm/listinfo/b.a.t.m.a.n




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [B.A.T.M.A.N.] batman goes looping...
  2009-08-13 15:07                             ` Yang Su
@ 2009-08-13 16:00                               ` Marek Lindner
  0 siblings, 0 replies; 18+ messages in thread
From: Marek Lindner @ 2009-08-13 16:00 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking

On Thursday 13 August 2009 23:07:06 Yang Su wrote:
> Maybe I miss some thing here. What do you think would prevent A from
> choosing via the long route (via B) ?

The starting point (still using your example):
* A points to E as it never learned about the long path.
* B points to C because it learned about the long path but does not 
rebroadcast the packets from C because they are slower than the 
packets via A.
* C uses the longer path.

Your idea: Use the echo cancellation's answer to retrieve more information, 
especially when the TQ which is coming back is worse than my own I update my 
routing information.

The situation now:
* A still points to E because the echos coming from B do not contain TQ values 
that are worse than its own.
* B will probably never route via A because the TQ values are much worse.
* C is not affected.

Did I overlook something ?

Regards,
Marek

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2009-08-13 16:00 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-22 10:56 [B.A.T.M.A.N.] batman goes looping Andrew Lunn
2009-07-27 16:20 ` Marek Lindner
2009-07-30 14:32   ` Yang Su
2009-08-03 10:21     ` Marek Lindner
2009-08-04 15:59       ` [B.A.T.M.A.N.] Routing help? nick farrow
2009-08-06  8:44       ` [B.A.T.M.A.N.] batman goes looping Yang Su
2009-08-06 14:15         ` Marek Lindner
2009-08-07 15:13           ` Yang Su
2009-08-07 15:58             ` Marek Lindner
2009-08-10  7:20               ` Yang Su
2009-08-10  7:48                 ` Marek Lindner
2009-08-10  7:57                   ` Yang Su
2009-08-10  8:16                     ` Marek Lindner
2009-08-11 16:42                       ` Marek Lindner
2009-08-12 14:34                         ` Yang Su
2009-08-13  9:56                           ` Marek Lindner
2009-08-13 15:07                             ` Yang Su
2009-08-13 16:00                               ` Marek Lindner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox