[B.A.T.M.A.N.] A few more comments on the BATMAN routing protocol

public inbox for b.a.t.m.a.n@lists.open-mesh.org
 help / color / mirror / Atom feed

* [B.A.T.M.A.N.] A few more comments on the BATMAN routing protocol
@ 2008-08-17 17:54 Juliusz Chroboczek
  2008-08-21  5:42 ` [B.A.T.M.A.N.] " Axel Neumann
  0 siblings, 1 reply; 4+ messages in thread
From: Juliusz Chroboczek @ 2008-08-17 17:54 UTC (permalink / raw)
  To: axel, elektra, lindner_marek, siwu
  Cc: olsr-users, Roman Steiner, b.a.t.m.a.n, babel-users

Hello to all,

As some of you may remember, I made a few comments about the BATMAN
routing protocol, as described in the draft of 7 April 2008 (the
so-called ``mark 3'' version).  These comments can be found on

  http://mid.gmane.org/7itzdzzero.fsf@lanthane.pps.jussieu.fr

I have received a few replies, some of which were public but many of
which were private.  Unfortunately, no one of these replies appears to
be an authoritative reply of the BATMAN developers, so there is no
easy-to-quote document I can reply to.

Before I engage in a point-by-point reply, I'd like to mention that
there appears to be a ``mark 4'' protocol and a ``BMX'' protocol in
development, which apparently solve some of the issues I mentioned.
This is good to hear, and I'd like to see a specification of these
mythical protocols.

1. Exponential convergence
**************************

My claim that there exist topologies in which average-case convergence
of BATMAN is exponential in the number of hops has been confirmed by
two of the BATMAN developers.  I still believe this to be a very
significant flaw of BATMAN.

Elektra claimed that this is the desired ``fish-eye'' behaviour.
I disagree with that -- exponential convergence is exponential
convergence, whatever name you give to it.

Axel Neumann spoke about TCP inefficiencies in the presence of packet
loss.  I do not understand how that relates to the issue at hand.

I'd like to remind everyone that all of OLSR, AODV and Babel exhibit
linear-time convergence in all cases.

2. Lack of loop avoidance
*************************

A few of my correspondents have pointed out that BATMAN does in fact
have a loop avoidance mechanism.  I therefore retract my claim that
BATMAN causes persistent routing loops.

Unfortunately, none of the mails I received described the loop
avoidance mechanism, and the few hints that were given do not appear
to correspond to anything that's described in the draft.  Hence, I am
unable to evaluate BATMAN's loop avoidance mechanism, and in
particular I cannot determine whether it causes starvation or leads to
sub-optimal routing.

I am looking forward to a detailed description of BATMAN's loop
avoidance mechanism.

3. Unrealistic metric
*********************

This was confirmed by a few people, and is apparently worked around in
the next version of the protocol.  I am eagerly looking forward to
a complete description of the ``mark 4'' version of the BATMAN
protocol.

4. Lack of aggregation
**********************

This is apparently fixed in the next version of the protocol.  I am
eagerly looking forward to a complete description of the ``mark 4''
version of the BATMAN protocol.

5. Jitter is not compulsory
***************************

This was confirmed.  It is still unclear to me whether the BATMAN
implementation does apply jitter.

                                        Juliusz

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [B.A.T.M.A.N.] Re: A few more comments on the BATMAN routing protocol
  2008-08-17 17:54 [B.A.T.M.A.N.] A few more comments on the BATMAN routing protocol Juliusz Chroboczek
@ 2008-08-21  5:42 ` Axel Neumann
  2008-08-27 16:26   ` [B.A.T.M.A.N.] Re: [Babel-users] " Juliusz Chroboczek
  0 siblings, 1 reply; 4+ messages in thread
From: Axel Neumann @ 2008-08-21  5:42 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking
  Cc: Juliusz Chroboczek, olsr-users, babel-users

Juliusz,
now after being the first white-listed person on the batman-mailing list you 
are also the first person who receives an authorative reply from the 
batman-developer team (sorry for the coordination delay). However, we would 
like to mention that batman is a community project which can not afford a 
president in charge for authorative replies. Therefore, you have received a 
number of profound replies authored by a number of individuals but core 
batman developer. These mails have addressed all your points and either 
corrected false assumptions or indicated how newer batman versions handle the 
addressed problems. For example see:

https://list.open-mesh.net/pipermail/b.a.t.m.a.n/2008-August/000877.html
for disprove of point 2

https://list.open-mesh.net/pipermail/b.a.t.m.a.n/2008-August/000878.html
for clarifications on point 1, disprove of point 2, links to solutions for 
point 3, 4 and real to life experience 

https://list.open-mesh.net/pipermail/b.a.t.m.a.n/2008-August/000880.html
for link to performance studies of protocol-traffic overhead, etc (point 4)

By the way, the mythical protocols can be downloaded, tried and further 
examined here (batman-0.3):
http://downloads.open-mesh.net/batman/stable/sources/batman-0.3.tar.gz
and (BMX) here:
http://downloads.open-mesh.net/batman/development/sources/batmand-exp_0.3-alpha-current_sources.tgz

Nevertheless we are still eager to continue the discussion:
> 1. Exponential convergence
> **************************
>
> My claim that there exist topologies in which average-case convergence
> of BATMAN is exponential in the number of hops has been confirmed by
> two of the BATMAN developers.  I still believe this to be a very
> significant flaw of BATMAN.
>  
Packet loss increases also the count of OGMs that trigger a route switch 
decreases.

The convergence time depends to a great degree on the difference between 
the OGM count between the best and the second best route, if the routes 
are absolutely non-overlapping and the best route expires. Of course, if 
we have lots of broadcast packet loss from our destination we don't get 
updates often, but every protocol using broadcasts for messages suffers 
(or benefits in terms of overhead!) from this.

Assume a node has two almost equally good paths towards a destination 
that are non-overlapping. One has 90% broadcast packet loss, and the 
other one 91%. Now the route with 90% packet loss breaks. Given a 
sliding window size of 100 OGMs the route will likely switch as soon as 
1-2 OGMs arrive via the 91% loss path. So it mostly takes a single 
received OGM to switch, in the worst case two. Given the lossy route, 
this will take about 11 OGM intervals in total, on average.

Now let's come to the worst case. Again two paths that are 
non-overlapping. One is terrible, 99% loss. The other is perfect, no 
loss. If the perfect route breaks, it will take one or two full sliding 
window sizes on average to switch. We can now wonder how important it is 
to switch quickly from a route with no packet loss to a route with 99% 
packet loss - more so because we can't know that it is not just a burst 
of interference preventing new OGMs coming in via the best path. Even in 
this case the convergence time is likely to be as good or as bad as any 
other protocol, simply because other protocols would have to update 
local topology information (link metrics) as well and deal with the fact 
that their topology updates are communicated very infrequently as well.

All versions of Batman benefit from the fact that they don't produce 
much protocol overhead (small amount of data that needs to be 
communicated, OGM flooding is only flooded exclusively via the best 
routing path to a destination). Compared to other proactive protocols, 
protocol messages can be send more frequently to improve convergence 
speed. Batman-Exp is running in a 150 node mesh in Leipzig with an OGM 
interval of 1 second.

> 2. Lack of loop avoidance
> *************************
>   
The idea behind the design of Batman was primarily to invent something 
which doesn't loop. Elektra didn't see the protocol looping when testing 
it in the Meraka grid. (Sending ICMP requests every 0.05 seconds for 
hours on the weakest and longest route possible, while multiple traffic 
streams were colliding in the center of the grid.)  So we don't see a 
lack of a loop avoidance mechanism. If you can think of situations where 
B.A.T.M.A.N. loops let us know.

> 4. Lack of aggregation
> **********************
>
> This is apparently fixed in the next version of the protocol.  I am
> eagerly looking forward to a complete description of the ``mark 4''
> version of the BATMAN protocol.
>   
It is implemented in 0.3 and Exp.
>
> 5. Jitter is not compulsory
> ***************************
>
> This was confirmed.  It is still unclear to me whether the BATMAN
> implementation does apply jitter.
>
>   
The implementation of batmand-0.2, 0.3 and EXP do. But if the CTS/RTS 
mechanism of 802.11 would actually work, it wouldn't be necessary to 
have it in the protocol. 

The B.A.T.M.A.N. team

On Sonntag 17 August 2008, Juliusz Chroboczek wrote:
> Hello to all,
>
> As some of you may remember, I made a few comments about the BATMAN
> routing protocol, as described in the draft of 7 April 2008 (the
> so-called ``mark 3'' version).  These comments can be found on
>
>   http://mid.gmane.org/7itzdzzero.fsf@lanthane.pps.jussieu.fr
>
> I have received a few replies, some of which were public but many of
> which were private.  Unfortunately, no one of these replies appears to
> be an authoritative reply of the BATMAN developers, so there is no
> easy-to-quote document I can reply to.
>
> Before I engage in a point-by-point reply, I'd like to mention that
> there appears to be a ``mark 4'' protocol and a ``BMX'' protocol in
> development, which apparently solve some of the issues I mentioned.
> This is good to hear, and I'd like to see a specification of these
> mythical protocols.
>
>
> 1. Exponential convergence
> **************************
>
> My claim that there exist topologies in which average-case convergence
> of BATMAN is exponential in the number of hops has been confirmed by
> two of the BATMAN developers.  I still believe this to be a very
> significant flaw of BATMAN.
>
> Elektra claimed that this is the desired ``fish-eye'' behaviour.
> I disagree with that -- exponential convergence is exponential
> convergence, whatever name you give to it.
>
> Axel Neumann spoke about TCP inefficiencies in the presence of packet
> loss.  I do not understand how that relates to the issue at hand.
>
> I'd like to remind everyone that all of OLSR, AODV and Babel exhibit
> linear-time convergence in all cases.
>
>
> 2. Lack of loop avoidance
> *************************
>
> A few of my correspondents have pointed out that BATMAN does in fact
> have a loop avoidance mechanism.  I therefore retract my claim that
> BATMAN causes persistent routing loops.
>
> Unfortunately, none of the mails I received described the loop
> avoidance mechanism, and the few hints that were given do not appear
> to correspond to anything that's described in the draft.  Hence, I am
> unable to evaluate BATMAN's loop avoidance mechanism, and in
> particular I cannot determine whether it causes starvation or leads to
> sub-optimal routing.
>
> I am looking forward to a detailed description of BATMAN's loop
> avoidance mechanism.
>
>
> 3. Unrealistic metric
> *********************
>
> This was confirmed by a few people, and is apparently worked around in
> the next version of the protocol.  I am eagerly looking forward to
> a complete description of the ``mark 4'' version of the BATMAN
> protocol.
>
>
> 4. Lack of aggregation
> **********************
>
> This is apparently fixed in the next version of the protocol.  I am
> eagerly looking forward to a complete description of the ``mark 4''
> version of the BATMAN protocol.
>
>
> 5. Jitter is not compulsory
> ***************************
>
> This was confirmed.  It is still unclear to me whether the BATMAN
> implementation does apply jitter.
>
>
>                                         Juliusz

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [B.A.T.M.A.N.] Re: [Babel-users] A few more comments on the BATMAN routing protocol
  2008-08-21  5:42 ` [B.A.T.M.A.N.] " Axel Neumann
@ 2008-08-27 16:26   ` Juliusz Chroboczek
  2008-09-01 12:02     ` Axel Neumann
  0 siblings, 1 reply; 4+ messages in thread
From: Juliusz Chroboczek @ 2008-08-27 16:26 UTC (permalink / raw)
  To: Axel Neumann
  Cc: The list for a Better Approach To Mobile Ad-hoc Networking,
	olsr-users, babel-users

>> My claim that there exist topologies in which average-case convergence
>> of BATMAN is exponential in the number of hops has been confirmed by
>> two of the BATMAN developers.  I still believe this to be a very
>> significant flaw of BATMAN.

> Packet loss increases also the count of OGMs that trigger a route switch 
> decreases.

I realise that.  The trouble is that the time needed to reconverge is
proportional to the product of the link qualities, and not to the sum
of the link qualities, as is the case in Babel and in OLSR.

> Now let's come to the worst case. Again two paths that are 
> non-overlapping. One is terrible, 99% loss. The other is perfect, no 
> loss.

As I mentioned in my point 3, the packet loss, as measured by BATMAN,
is not realistic in the presence of link-layer AQM.  As you know,
IEEE 802.11 does perform link-layer AQM for unicast packets.

A 10-hop route with each link having 20% loss will be measured by
BATMAN as a route with 90% loss.  However, with 7-persistent per-hop
AQM (the IEEE 802.11 default), such a route has an effective loss rate
of roughly 10^-4, and is therefore quite usable by standard transport-layer
protocols such as TCP.  (In case this is not clear to you -- this is
brilliantly exposed in the original ETX paper.)

Hence, my claim that BATMAN converges in exponential time stands, even
if you restrict yourself to routes usable by TCP.

> All versions of Batman benefit from the fact that they don't produce 
> much protocol overhead (small amount of data that needs to be 
> communicated, OGM flooding is only flooded exclusively via the best 
> routing path to a destination).

Given a network with n nodes and e edges, BATMAN sends O(n^2) packets
per unit of time, Babel sends O(n^2) messages, and OLSR sends between
O(e) and O(e.n) depending on the MPR redundancy.

I fail to see how that relates to the fact that BATMAN converges in
exponential time in the network diameter.

                                        Juliusz

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [B.A.T.M.A.N.] Re: [Babel-users] A few more comments on the BATMAN routing protocol
  2008-08-27 16:26   ` [B.A.T.M.A.N.] Re: [Babel-users] " Juliusz Chroboczek
@ 2008-09-01 12:02     ` Axel Neumann
  0 siblings, 0 replies; 4+ messages in thread
From: Axel Neumann @ 2008-09-01 12:02 UTC (permalink / raw)
  To: The list for a Better Approach To Mobile Ad-hoc Networking
  Cc: Juliusz Chroboczek, olsr-users, babel-users

Hi,

On Mittwoch 27 August 2008, you wrote:
> >> My claim that there exist topologies in which average-case convergence
> >> of BATMAN is exponential in the number of hops has been confirmed by
> >> two of the BATMAN developers.  I still believe this to be a very
> >> significant flaw of BATMAN.
> >
> > Packet loss increases also the count of OGMs that trigger a route switch
> > decreases.
>
> I realise that.  The trouble is that the time needed to reconverge is
> proportional to the product of the link qualities, and not to the sum
> of the link qualities, as is the case in Babel and in OLSR.
>
> > Now let's come to the worst case. Again two paths that are
> > non-overlapping. One is terrible, 99% loss. The other is perfect, no
> > loss.
>
> As I mentioned in my point 3, the packet loss, as measured by BATMAN,
> is not realistic in the presence of link-layer AQM.  As you know,
> IEEE 802.11 does perform link-layer AQM for unicast packets.
AQM == Active Queueing Management ? Sorry, I did not know this. I know that 
802.11 retransmits (usually up to 7 times) in case of absent ACKs and most 
implementation performs some non-standardized Ack-based rate control. 

All mesh protocols I am aware of use broadcast-based link probing for 
estimating the link quality. I agree that this is not very precise and also 
does not indicate much about the achievable bandwidth. But the alternatives 
are also difficult. Performing unicast measurements introduces lots of 
overhead especially in areas of high node density and relying on estimated 
802.11 transmit rates requires interaction with the MAC layer and still does 
not relieve from the need for casual unicast traffic between all neighboring 
nodes. 
I am currently experiencing with unicast probes, hoping to find an acceptable 
mixture of selective unicast probing and reusing probetraffic for flooding of 
topology information. 

>
> A 10-hop route with each link having 20% loss will be measured by
> BATMAN as a route with 90% loss.  However, with 7-persistent per-hop
> AQM (the IEEE 802.11 default), such a route has an effective loss rate
> of roughly 10^-4, and is therefore quite usable by standard transport-layer
> protocols such as TCP.  (In case this is not clear to you -- this is
> brilliantly exposed in the original ETX paper.)
>
> Hence, my claim that BATMAN converges in exponential time stands, even
> if you restrict yourself to routes usable by TCP.

And we have not denied your claim. But the evolution of BATMAN has continued 
after batmand-0.2 and what is described in the draft - so did the OLSR which 
is used in todays Freifunk networks. For example, the BMX version by default 
sends each OGM twice (or more often if wanted, my first observations was that 
twice seems sufficient and more than twice does not pay off). Taken your 
example of 10 hops with a packet loss of 20% and a resulting Path Error Rate 
of 90 % 
transferred to the scenario of sending OGMs twice: 
(Link-OGM-Error-Rate) LOER=(LPER)²=0.04	=> 
(Path-OGM-Error-Rate) POER = 1-( (1-LOER)^hops ) = 0.34 = 34%

So statistically even after 10 hops 66 of 100 OGMs will be received. 
Even after 20 hops more than 55 of 100 OGMs could be received. 

But the generally cool thing is: The more weak hops exist between two nodes 
(and therefore the less important a good estimation of the path quality 
becomes) the less OGMs are eating up the available bandwidth and processing 
resources.

To sum up: Yes, it is exponential. Its not a bug, its a feature. This is 
wanted as long as sufficient OGMs remain to maintain usable routes.

>
> > All versions of Batman benefit from the fact that they don't produce
> > much protocol overhead (small amount of data that needs to be
> > communicated, OGM flooding is only flooded exclusively via the best
> > routing path to a destination).
>
> Given a network with n nodes and e edges, BATMAN sends O(n^2) packets
> per unit of time, Babel sends O(n^2) messages, and OLSR sends between
> O(e) and O(e.n) depending on the MPR redundancy.
>

I think the packets BATMAN sends per node (without packet loss) should be 
O(n).
Even though, this calculation ignores the packet loss which has been 
emphasized so much above. Without being an expert. If you want to express the 
theoretical effort of large BATMAN networks WITH PACKETLOSS with the 
O-notation, I guess that it rather ends up in something like O(log n) per 
node. Simply because the larger a network becomes, the greater the average 
distance between nodes in terms of hops becomes and therefore the greater the 
average end-to-end OGM loss. ( I think Roman Steiner also mentioned this 
effect in his simulations which he has described here 
http://open-mesh.net/Members/roman/report.pdf )

With kind of provoking example: Adding 500 nodes in the distance of 500 hops 
(with a Link OGM Error Rate of 4% and an originator interval of 1 second) 
causes the nodes at the other side to send 21 OGMs more  per year.

ciao,
axel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-09-01 12:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-17 17:54 [B.A.T.M.A.N.] A few more comments on the BATMAN routing protocol Juliusz Chroboczek
2008-08-21  5:42 ` [B.A.T.M.A.N.] " Axel Neumann
2008-08-27 16:26   ` [B.A.T.M.A.N.] Re: [Babel-users] " Juliusz Chroboczek
2008-09-01 12:02     ` Axel Neumann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox