public inbox for b.a.t.m.a.n@lists.open-mesh.org
* [B.A.T.M.A.N.] Still looping packets in bla setup
@ 2016-02-19 10:58 Andreas Pape
  2016-02-19 15:06 ` Simon Wunderlich
  0 siblings, 1 reply; 3+ messages in thread
From: Andreas Pape @ 2016-02-19 10:58 UTC (permalink / raw)
  To: Simon Wunderlich; +Cc: b.a.t.m.a.n

Hello Simon,

I'm still working with my IT department to be able to send the patches in
a way compliant with the documentation provided by Sven. In the meantime I
have reworked the patches, but I am still struggling with the e-mail
client issue.
Nevertheless, if I apply all the patches I sent earlier (except patch 4/4,
as it was meaningless), I still have three problems:
1. I sometimes see looping unicast packets in the direction backbone ->
mesh -> backbone. Dropping all unicast traffic that is received from
another backbone gw and destined to be forwarded into the backbone again,
as you suggested in an earlier mail, solves this issue. Shall I provide a
corresponding patch? Shall I add a "Suggested-by" referring to you? I feel
a little bit uncomfortable with this patch, as it seems to be more of a
workaround. The question I cannot answer yet is why the other backbone gws
send traffic via the mesh that could be sent via the backbone.
2. Even with the patch for 1. applied, the backbone gateways in rare
cases still send claim frames for the devices of their own backbone from
time to time. I could send a patch for this, as it is rather easy to
check with the help of the local tt table (batadv_is_my_client) whether
it is reasonable to send a claim frame for these devices. Again, this
patch looks more like a workaround to me, as I also cannot explain what
really triggers the generation of these claim frames.
3. In rare cases I again see looping multicasts for traffic
mesh->backbone->mesh. If I look at the bla debug messages in these cases,
I see that a backbone gw holding the claim for the source of the
multicast frame thinks that the client belonging to the source address
has "roamed" from another mesh node into the backbone network, although
it didn't. From this I conclude that another backbone gw has forwarded
the multicast into the backbone although it shouldn't have done so
(having found no claim for the client, or erroneously also holding a
claim). In this case the backbone gateways seem to be out of sync about
the actual claim status for that client. The effect only lasts a very
short time, as the gateway which found the "roaming" client unclaims it,
and within a few milliseconds (depending on the traffic generated by the
client) another backbone gw (or the same one) claims the client again. Of
course the looping of the multicast traffic from the client then stops.
In my case the sender of the multicast was the bridge interface br0 of a
remote mesh node itself; the bat0 soft interface was added to that
bridge. The looping multicast then gave me a "bat0: received packet with
own address as source address" message. Furthermore, that bat0 interface
sent a claim frame for the mac of its own bridge (which is obvious, as
bat0 received a message from the mesh with a mac address not claimed
yet....). This claim frame then produced another "bat0: received
packet ..." message.
I currently have no workaround for this 3rd issue, as everything I can
imagine to prevent it would break the "roaming client" scenario for bla.
I could even live with this problem, as it happens quite seldom and is
"self-healing", but it tells me that there might be a sync issue. Do you
think that my 1st and 2nd points could relate to the same problem?
In the meantime I have looked through the code for hours, but I am not
able to find anything that could explain the observed behaviour.

Kind regards,
Andreas



..................................................................
PHOENIX CONTACT ELECTRONICS GmbH

Sitz der Gesellschaft / registered office of the company: 31812 Bad Pyrmont
USt-Id-Nr.: DE811742156
Amtsgericht Hannover HRB 100528 / district court Hannover HRB 100528
Geschäftsführer / Executive Board: Roland Bent, Dr. Martin Heubeck
___________________________________________________________________
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure, distribution or other use of the material or parts thereof is strictly forbidden.
___________________________________________________________________

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [B.A.T.M.A.N.] Still looping packets in bla setup
  2016-02-19 10:58 [B.A.T.M.A.N.] Still looping packets in bla setup Andreas Pape
@ 2016-02-19 15:06 ` Simon Wunderlich
  2016-02-22  7:36   ` [B.A.T.M.A.N.] Antwort: " Andreas Pape
  0 siblings, 1 reply; 3+ messages in thread
From: Simon Wunderlich @ 2016-02-19 15:06 UTC (permalink / raw)
  To: Andreas Pape; +Cc: b.a.t.m.a.n

[-- Attachment #1: Type: text/plain, Size: 5363 bytes --]

Hi Andreas,

On Friday 19 February 2016 11:58:47 Andreas Pape wrote:
> Hello Simon,
> 
> I'm still working with my IT department to be able to send the patches in
> a way compliant to the documentation provided by Sven. In the meantime I
> reworked the patches, but I am still struggling with the e-mail client
> issue.
> Nevertheless if I apply all the patches I sent earlier (except Patch 4/4
> as it was meaningless) I still have 3 problems :
> 1. I have sometimes looping unicast packets in the direction backbone ->
> mesh -> backbone. Dropping all unicast traffic received from another
> backbone gw and destined to be forwarded to the backbone again as you
> suggested this in an earlier mail solves this issue. Shall I provide an
> according patch?

I think that is a good idea.

> Shall I add a "Suggested-by" referring to you?

If you want, you can add a "Reported-by", although I don't need any credit 
here. I don't know if there is something like Suggested-by in Linux.

> I feel a
> little bit uncomfortable with this patch as it seems to be something more
> like a workaround.

Well, we also drop the broadcast traffic when it comes from another backbone. So 
I don't think this is a workaround. BLA consists of a lot of rules after all 
...

> The question I cannot answer yet is why the other
> backbone gws send traffic via the mesh which could be sent via the
> backbone?

That is a good question indeed. I think this can only be answered if you 
inspect the unicast packet, its source and destination, as well as the 
translation table state and bridge port state (brctl showmacs). This might 
give a clue as to what is going on ... if it's not a DAT packet, that is.

> 2. Although having the patch for 1. applied, the backbone gateways send
> claim frames for the devices of their own backbone in rare cases from time
> to time. I could send a patch for this as it is rather easy to check with
> the help of the local tt table (batadv_is_my_client) if it is reasonable
> to send a claim frame for these devices. Again, this patch looks more like
> a workaround to me as I also cannot explain what really triggers the
> generation of these claim frames.

I don't think this is the right way to solve it - if a client has roamed to 
another device in the mesh, a gateway MUST send a claim. However 
batadv_is_my_client would probably return true, suggesting that the client is 
local although it is not local anymore.

The problem probably needs to be fixed somewhere else.

> 3. I see again in rare cases looping multicasts for traffic
> mesh->backbone->mesh. If I look at the bla debug messages in these cases I
> see, that a backbone gw holding the claim for the source of the multicast
> frame thinks that the client belonging to the source address has "roamed"
> from another mesh node into the backbone network although it didn't. From
> this I conclude that another backbone gw has forwarded the multicast into
> the backbone although it shouldn't have done this (having found no claim
> for the client or erroneously also holding a claim). In this case the
> backbone gateways seem to be out-of-sync about the actual claim status for
> that client. This effect only lasts a very short time, as the gateway
> which found the "roaming" client unclaims it and within a few milliseconds
> (depending on the traffic generated by the client) another backbone gw (or
> the same) claims the client again. Of course then the looping of the
> multicast traffic from the client stops. In my case the sender of the
> multicast was the bridge interface br0 of a remote mesh node itself. The
> bat0 softinterface was added to that bridge. The looping multicast then
> gave me a "bat0: received packet with own address as source address"
> message. Furthermore that bat0 interface sent a claim frame for the mac of
> the own bridge (which is obvious as bat0 received a message from the mesh
> with a mac address not claimed yet....). This claim frame then produces
> another "bat0: received packet ..." message.
> I currently have no workaround for this 3rd issue as all I can imagine to
> prevent this will break the "roaming client" scenario for bla. I could
> even live with this problem as it happens quite seldom and as it is
> "self-healing", but it tells me that there might be a sync issue. Do you
> think that my 1st and 2nd point could also relate to the same problem?
> In the meantime I looked through the code for hours but I am not able to
> find something that could explain the observed problem.

Hmm ... that sounds strange. I don't know if this is related to your first 
two points, since we are talking about multicast here and the other points 
were about unicast.

I think the main question here is - if the packet came from the mesh, why 
wasn't there a claim frame? 

Maybe two questions could help: 
 * Does this happen in the first minutes after starting/restarting the mesh? 
There is some initial time for bla gateway nodes to detect each other, 
although this should happen quite fast.
 * Do you have some unusually high amount of broadcast/multicast (e.g. 
streaming, fieldbus protocol, etc.)?

What might help is to get dumps from the hard interface as well as the bat0 
soft interface and check the corresponding packets when this problem happens. 
Not sure if this helps and how easy it is to capture dumps ...

Cheers,
     Simon

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [B.A.T.M.A.N.] Antwort: Re: Still looping packets in bla setup
  2016-02-19 15:06 ` Simon Wunderlich
@ 2016-02-22  7:36   ` Andreas Pape
  0 siblings, 0 replies; 3+ messages in thread
From: Andreas Pape @ 2016-02-22  7:36 UTC (permalink / raw)
  To: Simon Wunderlich; +Cc: b.a.t.m.a.n

Hello Simon,

Simon Wunderlich <sw@simonwunderlich.de> schrieb am 19.02.2016 16:06:03:


> > 2. Although having the patch for 1. applied, the backbone gateways send
> > claim frames for the devices of their own backbone in rare cases from
> > time to time. I could send a patch for this as it is rather easy to
> > check with the help of the local tt table (batadv_is_my_client) if it
> > is reasonable to send a claim frame for these devices. Again, this
> > patch looks more like a workaround to me as I also cannot explain what
> > really triggers the generation of these claim frames.
>
> I don't think this is the right way to solve it - if a client has roamed
> to another device in the mesh, a gateway MUST send a claim. However
> batadv_is_my_client would probably return true, suggesting that the
> client is local although it is not local anymore.
>
> The problem probably needs to be fixed somewhere else.
>
> > 3. I see again in rare cases looping multicasts for traffic
> > mesh->backbone->mesh. If I look at the bla debug messages in these
> > cases I see that a backbone gw holding the claim for the source of the
> > multicast frame thinks that the client belonging to the source address
> > has "roamed" from another mesh node into the backbone network although
> > it didn't. From this I conclude that another backbone gw has forwarded
> > the multicast into the backbone although it shouldn't have done this
> > (having found no claim for the client or erroneously also holding a
> > claim). In this case the backbone gateways seem to be out-of-sync
> > about the actual claim status for that client. This effect only lasts
> > a very short time, as the gateway which found the "roaming" client
> > unclaims it and within a few milliseconds (depending on the traffic
> > generated by the client) another backbone gw (or the same) claims the
> > client again. Of course then the looping of the multicast traffic from
> > the client stops. In my case the sender of the multicast was the
> > bridge interface br0 of a remote mesh node itself. The bat0
> > softinterface was added to that bridge. The looping multicast then
> > gave me a "bat0: received packet with own address as source address"
> > message. Furthermore that bat0 interface sent a claim frame for the
> > mac of the own bridge (which is obvious as bat0 received a message
> > from the mesh with a mac address not claimed yet....). This claim
> > frame then produces another "bat0: received packet ..." message.
> > I currently have no workaround for this 3rd issue as all I can imagine
> > to prevent this will break the "roaming client" scenario for bla. I
> > could even live with this problem as it happens quite seldom and as it
> > is "self-healing", but it tells me that there might be a sync issue.
> > Do you think that my 1st and 2nd point could also relate to the same
> > problem? In the meantime I looked through the code for hours but I am
> > not able to find something that could explain the observed problem.
>
> Hmm ... that sounds strange. I don't know if this is related to your
> first two points since we are talking about multicast here and the
> other points were about unicast.
>

In the meantime, after further attempts to debug this, I can say that 2.
and 3. are somehow related to each other, as both seem to happen at the
same time. I cannot synchronize the debug messages, as they are recorded
by two different nodes (a backbone gw A and a normal node B), but it
looks as if I first get a packet sent by B which is forwarded into the
mesh by gw A again, leading to an unclaim of B (although B hasn't roamed
to the backbone). After that, A has added B to its local TT. Then A
receives a unicast packet from B via the mesh. After this packet I
erroneously see various other unicast packets, sent by devices from the
local backbone network of A, coming in via the bat0 interface. These
packets trigger the generation of claim frames (as described in 2.).

> I think the main question here is - if the packet came from the mesh,
> why wasn't there a claim frame?
>
> Maybe two questions could help:
>  * does this happen in the first minutes after starting/restarting the
> mesh? There is some initial time for bla gateway nodes to detect each
> other, although this should happen quite fast.

I am aware that this might be a little bit unstable right after
establishing the mesh. But this normally happens after some minutes of
successful operation of the system (on the order of 10 minutes).

>  * Do you have some unusually high amount of broadcast/multicast (e.g.
> streaming, fieldbus protocol, etc.)?
>

No, only normal IPv4 stuff and sometimes some ARPs. I use normal IPv4
communication to the web interfaces of my nodes and some normal pings
(one packet per second). The nodes themselves have IPv6 enabled, although
I don't use IPv6 but only IPv4 addressing. The multicasts in question are
for destination mac 33:33:00:00:00:01 (IPv6) and are sent every 10
seconds by the IPv6 Linux stack.

> What might help is to get dumps from the hard interface as well as the
> bat0 soft interface and check the corresponding packets when this
> problem happens. Not sure if this helps and how easy it is to capture
> dumps ...
>
> Cheers,
>      Simon
[Attachment "signature.asc" deleted by Andreas Pape/Phoenix Contact]



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-02-22  7:36 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-19 10:58 [B.A.T.M.A.N.] Still looping packets in bla setup Andreas Pape
2016-02-19 15:06 ` Simon Wunderlich
2016-02-22  7:36   ` [B.A.T.M.A.N.] Antwort: " Andreas Pape

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox