netdev.vger.kernel.org archive mirror
* [ISSUE: mv88e6xxx]: Down/Up link and not forwarding
@ 2016-10-04 15:13 Jose Antonio Delgado Alfonso
  0 siblings, 0 replies; 4+ messages in thread
From: Jose Antonio Delgado Alfonso @ 2016-10-04 15:13 UTC (permalink / raw)
  To: netdev

We are working in an ARMv7 embedded system running kernel 4.1 but
including patches to upgrade dsa/mv88e6xxx to kernel version 4.3
(5acf4d0, Wed, 27 May 2015 15:32:15 -0700) "[PATCH] blk: rq_data_dir()
should not return a boolean."

This is the schema of the system.

 +-------------------+ eth0
 |                   +--+
 |                   |  |
 | Embedded system   +--+
 |                   |
 |      ARMv7        |
 |                   | Marvell 88E8057(sky2)     +-------------+
 |                   +--+                     +--+             +--+ eth1
 |                   |  +---------------------+  |             |  +------+
 |                   +--+      CPU port       +--+  mv88e6176  +--+
 +------+--+---------+                           |             |
emulated|  |                                     |             |
GPIO    +--+                                  +--+             +--+ eth2
MDIO      +-----------------------------------+  |             |  +------+
                              MDIO            +--+             +--+
                                                 +-------------+

There is a bridge (br-lan) which includes eth0/eth1/eth2

From time to time we see the link go down and come back up within
about 1 s. The kernel logs the following messages:

[  312.769399] dsa dsa@0 eth2: Link is Down
[  312.773372] br-lan: port 3(eth2) entered disabled state
[  312.947274] dsa dsa@0 eth2: link up, 100 Mb/s, full duplex, flow control disabled
[  312.963807] br-lan: port 3(eth2) entered forwarding state
[  312.969276] br-lan: port 3(eth2) entered forwarding state
[  313.777815] dsa dsa@0 eth2: Link is Up - 100Mbps/Full - flow control rx/tx
[  314.966277] br-lan: port 3(eth2) entered forwarding state

Moreover, we ran a reboot loop test that boots the system, pings the
unit and, if it responds, reboots again. After many reboots the bridge
stops forwarding packets.
Looking at the 88E6176 registers at that point, we saw the following:

    GLOBAL GLOBAL2   0    1    2    3    4    5    6 
 0:  c820       0  de0f 5d0f 500f 500f 500f 4e07 4007
 1:     3       0    3e    3    3    3    3    3    3
 2:     0    ffff     0    0    0    0    0    0    0
 3:     0    ffff  1761 1761 1761 1761 1761 1761 1761
 4:  6000     258  373f  433  430  433  433  433  433
 5:  1000    c12f     0    0    0    0    0    0    0
 6:  c000    1f0f  101e 3005 3003 4001 5001 6001 7001
 7:     0    707f     0    0    0    0    0    0    0
 8:     0    7800  2480 2480 2480 2480 2480 2480 2480
 9:     0    1600     1    1    1    1    1    1    1
 a:   148       0     0    0    0    0    0    0    0
 b:  6000    1000     1    2    4    8   10   20   40
 c:     0      22     0    0    0    0    0    0    0
 d:  ffff     507     0    0    0    0    0    0    0
 e:  ffff      36     0    0    0    0    0    0    0
 f:  ffff     f00  dada dada dada dada dada dada dada
10:     0       0     0    0    0    0    0    0    0
11:     0       0     0    0    0    0    0    0    0
12:  5555       0     0    0    0    0    0    0    0
13:  5555       0   34d 8b18  54d    0    0    0    0
14:  aaaa     400     0    0    0    0    0    0    0
15:  aaaa       0     0    0    0    0    0    0    0
16:  ffff       0    33   33   33   33   33   33    0
17:  ffff       0     0    0    0    0    0    0    0
18:  fa41    1884  3210 3210 3210 3210 3210 3210 3210
19:     0     5e1  7654 7654 7654 7654 7654 7654 7654
1a:     0       0     0    0    0    0    0    0    0
1b:   1fc    f869  8000 8000 8000 8000 8000 8000 8000
1c:     0    4c00     0    0    0    0    0    0    0
1d:  5ce0       0     0    0    0    0    0    0    0
1e:     0       0     0    0    0    0    0    0    0
1f:     0       0     0    0    0    0    0    0    0

The main difference is in Global2 register 5. Right after
initialization the driver sets this register to 00ff; when the issue
happens, its value is c12f.
We have a patch that lets us set register values. If we change c12f
back to 00ff, ping works; otherwise it does not. We do not know what is
changing the register value. As far as we can tell, the driver does not.
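
To see exactly which bits differ between the two values, the expected
and observed register contents can be XORed. The following is a minimal
sketch; the helper name changed_bits is made up for illustration, and no
meaning is assigned to individual bits because that requires the switch
datasheet.

```python
def changed_bits(expected, observed, width=16):
    """Return the bit positions (highest first) that differ between two values."""
    diff = (expected ^ observed) & ((1 << width) - 1)
    return [bit for bit in range(width - 1, -1, -1) if diff & (1 << bit)]


EXPECTED = 0x00FF  # value the driver writes at init, according to the report
OBSERVED = 0xC12F  # value read back when forwarding stops

if __name__ == "__main__":
    print(f"xor  = {EXPECTED ^ OBSERVED:#06x}")
    print("bits =", changed_bits(EXPECTED, OBSERVED))
```

The XOR is 0xc1d0: bits 15, 14 and 8 are set in c12f but clear in 00ff,
and bits 7, 6 and 4 are set in 00ff but clear in c12f.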

Weirder still, sometimes Global2 register 5 holds 00ff and the bridge
does not forward packets either. We have not worked out which other
register is involved.

Finally, the weirdest behaviour we are seeing is that the unit does not
detect a link change: register 0 of ports 1 and 2 does not update its
status.
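
To check whether port register 0 really stops updating, it helps to
decode the raw values before and after a cable is unplugged. The sketch
below ASSUMES a bit layout for the port status register (link in bit 11,
duplex in bit 10, speed in bits 9:8); this layout is not stated anywhere
in this thread and should be verified against the datasheet or the
driver header before relying on it. The function name is illustrative.

```python
# ASSUMED layout of the per-port status register (register 0); verify
# against the switch datasheet before trusting the decoded output.
LINK = 1 << 11
DUPLEX = 1 << 10
SPEED_MASK = 0x0300
SPEEDS = {0x0000: 10, 0x0100: 100, 0x0200: 1000}


def decode_port_status(value):
    """Decode link, duplex and speed from a raw port status value."""
    return {
        "link": bool(value & LINK),
        "full_duplex": bool(value & DUPLEX),
        # None if the speed field holds an encoding not listed above
        "speed_mbps": SPEEDS.get(value & SPEED_MASK),
    }


# Row 0 of the register dump, ports 0 to 6
ROW0 = [0xDE0F, 0x5D0F, 0x500F, 0x500F, 0x500F, 0x4E07, 0x4007]

if __name__ == "__main__":
    for port, value in enumerate(ROW0):
        print(port, f"{value:#06x}", decode_port_status(value))
```

Under that assumed layout, 5d0f (port 1) decodes as link up, full
duplex, 100 Mb/s, and 500f (port 2) as link down. Comparing the decoded
values across a cable pull shows whether the status field changes at all.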

Have you experienced a similar issue on your side?

Is it possible that these micro-outages are the cause of the bad value
in Global2 register 5?

Has this issue been fixed in a newer Linux kernel version?

Thanks in advance.


* [ISSUE: mv88e6xxx]: Down/Up link and not forwarding
       [not found] <8e5e36d7-7618-2a4e-6aba-e65e41662d47@aoifes.com>
@ 2016-10-04 15:37 ` Jose Antonio Delgado Alfonso
  2016-10-04 18:58   ` Florian Fainelli
  0 siblings, 1 reply; 4+ messages in thread
From: Jose Antonio Delgado Alfonso @ 2016-10-04 15:37 UTC (permalink / raw)
  To: netdev

We are working in an ARMv7 embedded system running kernel 4.1 but
including patches to upgrade dsa/mv88e6xxx to kernel version 4.3
(5acf4d0, Wed, 27 May 2015 15:32:15 -0700) "[PATCH] blk: rq_data_dir()
should not return a boolean."

This is the schema of the system.

 +-------------------+ eth0
 |                   +--+
 |                   |  |
 | Embedded system   +--+
 |                   |
 |      ARMv7        |
 |                   | Marvell 88E8057(sky2)     +-------------+
 |                   +--+                     +--+             +--+ eth1
 |                   |  +---------------------+  |             |  +------+
 |                   +--+      CPU port       +--+  mv88e6176  +--+
 +------+--+---------+                           |             |
emulated|  |                                     |             |
GPIO    +--+                                  +--+             +--+ eth2
MDIO      +-----------------------------------+  |             |  +------+
                              MDIO            +--+             +--+
                                                 +-------------+

There is a bridge (br-lan) which includes eth0/eth1/eth2

From time to time we see the link go down and come back up within
about 1 s. The kernel logs the following messages:

[  312.769399] dsa dsa@0 eth2: Link is Down
[  312.773372] br-lan: port 3(eth2) entered disabled state
[  312.947274] dsa dsa@0 eth2: link up, 100 Mb/s, full duplex, flow control disabled
[  312.963807] br-lan: port 3(eth2) entered forwarding state
[  312.969276] br-lan: port 3(eth2) entered forwarding state
[  313.777815] dsa dsa@0 eth2: Link is Up - 100Mbps/Full - flow control rx/tx
[  314.966277] br-lan: port 3(eth2) entered forwarding state

Moreover, we ran a reboot loop test that boots the system, pings the
unit and, if it responds, reboots again. After many reboots the bridge
stops forwarding packets.
Looking at the 88E6176 registers at that point, we saw the following:

    GLOBAL GLOBAL2   0    1    2    3    4    5    6 
 0:  c820       0  de0f 5d0f 500f 500f 500f 4e07 4007
 1:     3       0    3e    3    3    3    3    3    3
 2:     0    ffff     0    0    0    0    0    0    0
 3:     0    ffff  1761 1761 1761 1761 1761 1761 1761
 4:  6000     258  373f  433  430  433  433  433  433
 5:  1000    c12f     0    0    0    0    0    0    0
 6:  c000    1f0f  101e 3005 3003 4001 5001 6001 7001
 7:     0    707f     0    0    0    0    0    0    0
 8:     0    7800  2480 2480 2480 2480 2480 2480 2480
 9:     0    1600     1    1    1    1    1    1    1
 a:   148       0     0    0    0    0    0    0    0
 b:  6000    1000     1    2    4    8   10   20   40
 c:     0      22     0    0    0    0    0    0    0
 d:  ffff     507     0    0    0    0    0    0    0
 e:  ffff      36     0    0    0    0    0    0    0
 f:  ffff     f00  dada dada dada dada dada dada dada
10:     0       0     0    0    0    0    0    0    0
11:     0       0     0    0    0    0    0    0    0
12:  5555       0     0    0    0    0    0    0    0
13:  5555       0   34d 8b18  54d    0    0    0    0
14:  aaaa     400     0    0    0    0    0    0    0
15:  aaaa       0     0    0    0    0    0    0    0
16:  ffff       0    33   33   33   33   33   33    0
17:  ffff       0     0    0    0    0    0    0    0
18:  fa41    1884  3210 3210 3210 3210 3210 3210 3210
19:     0     5e1  7654 7654 7654 7654 7654 7654 7654
1a:     0       0     0    0    0    0    0    0    0
1b:   1fc    f869  8000 8000 8000 8000 8000 8000 8000
1c:     0    4c00     0    0    0    0    0    0    0
1d:  5ce0       0     0    0    0    0    0    0    0
1e:     0       0     0    0    0    0    0    0    0
1f:     0       0     0    0    0    0    0    0    0

The main difference is in Global2 register 5. Right after
initialization the driver sets this register to 00ff; when the issue
happens, its value is c12f.
We have a patch that lets us set register values. If we change c12f
back to 00ff, ping works; otherwise it does not. We do not know what is
changing the register value. As far as we can tell, the driver does not.

Weirder still, sometimes Global2 register 5 holds 00ff and the bridge
does not forward packets either. We have not worked out which other
register is involved.

Finally, the weirdest behaviour we are seeing is that the unit does not
detect a link change: register 0 of ports 1 and 2 does not update its
status.

Have you experienced a similar issue on your side?

Is it possible that these micro-outages are the cause of the bad value
in Global2 register 5?

Has this issue been fixed in a newer Linux kernel version?

Thanks in advance.


* Re: [ISSUE: mv88e6xxx]: Down/Up link and not forwarding
  2016-10-04 15:37 ` [ISSUE: mv88e6xxx]: Down/Up link and not forwarding Jose Antonio Delgado Alfonso
@ 2016-10-04 18:58   ` Florian Fainelli
  2016-10-04 20:28     ` Andrew Lunn
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Fainelli @ 2016-10-04 18:58 UTC (permalink / raw)
  To: Jose Antonio Delgado Alfonso, netdev, Andrew Lunn, Vivien Didelot

On October 4, 2016 8:37:13 AM PDT, Jose Antonio Delgado Alfonso <jose.delgado@aoifes.com> wrote:
>We are working in an ARMv7 embedded system running kernel 4.1 but
>including patches to upgrade dsa/mv88e6xxx to kernel version 4.3
>(5acf4d0, Wed, 27 May 2015 15:32:15 -0700) "[PATCH] blk: rq_data_dir()
>should not return a boolean."
>
>This is the schema of the system.
>
> +-------------------+ eth0
> |                   +--+
> |                   |  |
> | Embedded system   +--+
> |                   |
> |      ARMv7        |
> |                   | Marvell 88E8057(sky2)     +-------------+
> |                   +--+                     +--+             +--+ eth1
> |                   |  +---------------------+  |             |  +------+
> |                   +--+      CPU port       +--+  mv88e6176  +--+
> +------+--+---------+                           |             |
>emulated|  |                                     |             |
>GPIO    +--+                                  +--+             +--+ eth2
>MDIO      +-----------------------------------+  |             |  +------+
>                              MDIO            +--+             +--+
>                                                 +-------------+
>
>There is a bridge (br-lan) which includes eth0/eth1/eth2

Can you detail what eth0 and eth1 actually correspond to? The bridge layer denies adding DSA master network interfaces as bridge members as soon as they have tags enabled.

>
>>From time to time, We are seeing a link down and up of about 1s.
>Following the message that kernel sends.
>
>[  312.769399] dsa dsa@0 eth2: Link is Down
>[  312.773372] br-lan: port 3(eth2) entered disabled state
>[  312.947274] dsa dsa@0 eth2: link up, 100 Mb/s, full duplex, flow
>control disabled
>[  312.963807] br-lan: port 3(eth2) entered forwarding state
>[  312.969276] br-lan: port 3(eth2) entered forwarding state
>[  313.777815] dsa dsa@0 eth2: Link is Up - 100Mbps/Full - flow control
>rx/tx
>[  314.966277] br-lan: port 3(eth2) entered forwarding state
>
>Moreover, under a reboot loop test which consists in booting the
>system,
>ping the unit and, if it responds, reboot again, we found that the
>bridge does not forward packages after many reboots.
>Looking into 88e6176 registers we saw the following
>
>    GLOBAL GLOBAL2   0    1    2    3    4    5    6 
> 0:  c820       0  de0f 5d0f 500f 500f 500f 4e07 4007
> 1:     3       0    3e    3    3    3    3    3    3
> 2:     0    ffff     0    0    0    0    0    0    0
> 3:     0    ffff  1761 1761 1761 1761 1761 1761 1761
> 4:  6000     258  373f  433  430  433  433  433  433
> 5:  1000    c12f     0    0    0    0    0    0    0
> 6:  c000    1f0f  101e 3005 3003 4001 5001 6001 7001
> 7:     0    707f     0    0    0    0    0    0    0
> 8:     0    7800  2480 2480 2480 2480 2480 2480 2480
> 9:     0    1600     1    1    1    1    1    1    1
> a:   148       0     0    0    0    0    0    0    0
> b:  6000    1000     1    2    4    8   10   20   40
> c:     0      22     0    0    0    0    0    0    0
> d:  ffff     507     0    0    0    0    0    0    0
> e:  ffff      36     0    0    0    0    0    0    0
> f:  ffff     f00  dada dada dada dada dada dada dada
>10:     0       0     0    0    0    0    0    0    0
>11:     0       0     0    0    0    0    0    0    0
>12:  5555       0     0    0    0    0    0    0    0
>13:  5555       0   34d 8b18  54d    0    0    0    0
>14:  aaaa     400     0    0    0    0    0    0    0
>15:  aaaa       0     0    0    0    0    0    0    0
>16:  ffff       0    33   33   33   33   33   33    0
>17:  ffff       0     0    0    0    0    0    0    0
>18:  fa41    1884  3210 3210 3210 3210 3210 3210 3210
>19:     0     5e1  7654 7654 7654 7654 7654 7654 7654
>1a:     0       0     0    0    0    0    0    0    0
>1b:   1fc    f869  8000 8000 8000 8000 8000 8000 8000
>1c:     0    4c00     0    0    0    0    0    0    0
>1d:  5ce0       0     0    0    0    0    0    0    0
>1e:     0       0     0    0    0    0    0    0    0
>1f:     0       0     0    0    0    0    0    0    0
>
>The main difference is GLOBAL2 5th register. When the unit is just
>initialized, the driver sets this register to 00ff, however, when the
>issue happens, its value is c12f.
>We got a patch which allows us to set registers values. If we change
>c12f to 00ff the ping works, otherwise, ping does not work. We do not
>know who is changing the register value. Apparently, driver does not.
>
>Weirderif possible, sometimes even global2 5th register is set to 00ff
>and bridge does not forward packages either. We have not sorted out
>which other register is affecting.
>
>Finally, The weirdest behaviour we are seeing is that the unit does not
>detect a link change, register 0 of ports 1 and 2 do not update their
>status.
>
>Have you experienced a similar issue in your side?
>
>Is it possible that those micro-outage could be the reason of bad
>settings in Global2 5th register?
>
>Have you fixed this issues in a newer Linux kernel version?

Can you try reproducing this with the latest net-next tree?

-- 
Florian


* Re: [ISSUE: mv88e6xxx]: Down/Up link and not forwarding
  2016-10-04 18:58   ` Florian Fainelli
@ 2016-10-04 20:28     ` Andrew Lunn
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Lunn @ 2016-10-04 20:28 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Jose Antonio Delgado Alfonso, netdev, Vivien Didelot

> >The main difference is GLOBAL2 5th register. When the unit is just
> >initialized, the driver sets this register to 00ff, however, when
> >the issue happens, its value is c12f.

You might want to hack the MDIO driver and get it to trap writes to
this register and give you a call stack.

     Andrew
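
The trapping idea can be modelled outside the kernel. The sketch below
is a userspace Python illustration only, not driver code: FakeMdioBus,
trap_writes, driver_init and the device address 0x1c for the Global2
block are all assumptions made for the example. In a kernel build the
same check would sit in the MDIO bus write routine and call something
like dump_stack() when the device address and register number match.

```python
import traceback


class FakeMdioBus:
    """Stand-in for an MDIO bus: 16-bit registers keyed by (device, register)."""

    def __init__(self):
        self.regs = {}

    def write(self, dev, reg, value):
        self.regs[(dev, reg)] = value & 0xFFFF

    def read(self, dev, reg):
        return self.regs.get((dev, reg), 0)


def trap_writes(bus, dev, reg, log):
    """Wrap bus.write so each write to (dev, reg) records value and call stack."""
    original_write = bus.write

    def trapped_write(d, r, value):
        if (d, r) == (dev, reg):
            # Drop the last frame (this wrapper) so the stack ends at the caller.
            stack = "".join(traceback.format_stack()[:-1])
            log.append((value, stack))
        original_write(d, r, value)

    bus.write = trapped_write


GLOBAL2 = 0x1C      # ASSUMED device address of the Global2 register block
WATCHED_REG = 0x05  # register the report sees changing


def driver_init(bus):
    """Hypothetical init path that writes the value the report expects."""
    bus.write(GLOBAL2, WATCHED_REG, 0x00FF)


if __name__ == "__main__":
    bus = FakeMdioBus()
    hits = []
    trap_writes(bus, GLOBAL2, WATCHED_REG, hits)
    driver_init(bus)
    bus.write(GLOBAL2, 0x06, 0x1F0F)  # different register, not trapped
    for value, stack in hits:
        print(f"write {value:#06x} from:\n{stack}")
```

If the register changes while a trap like this never fires, the new
value is not arriving through that write path.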

