netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oliver Freyermuth <o.freyermuth@googlemail.com>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: netdev@vger.kernel.org
Subject: Re: Memory corruption with r8169 across several device revisions and kernels
Date: Mon, 22 Jan 2018 23:55:58 +0100	[thread overview]
Message-ID: <3ceebc43-6baf-2b4c-af10-70522e97385e@googlemail.com> (raw)
In-Reply-To: <20180122000922.GA3020@electric-eye.fr.zoreil.com>

Dear Francois, other r8169 experts, 

Am 22.01.2018 um 01:09 schrieb Francois Romieu:
> Are you able to retrieve the layout ? That is, does it appear to match:
> 
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
> 
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
> 
> or something else ?
> 

It took me a while, somehow it seems the bug does not always occur - potentially there's also some race involved. 
Reproducing on a Ubuntu 17.10 system I found the following:

Address in virtual memory || value
0x7f87bb4c6000            || 0x00000217
0x7f87bb4c6008            || 0x000003ab
0x7f87bb4c6018            || 0x00000000
0x7f87bb4c6028            || 0x00000279
0x7f87bb4c6030            || 0x000000e1
0x7f87bb4c6038            || 0x00000051

At almost the same time, I find the following numbers in /proc/self/net/dev for the device:

             decimal || hex
RX bytes:    870820  || 0x000d49a4
   packets:     945  || 0x000003b1
   errs           0  || 
   drop           0  || 
   fifo           0  ||  
   frame          0  || 
   compressed     0  || 
   multicast     83  || 0x00000053
TX bytes:     58505  || 0x0000e489
   packets:     535  || 0x00000217
   errs           0  || 
   drop           0  ||
   fifo           0  || 
   frame          0  || 
   compressed     0  || 
   multicast      0  || 

Since there was a small delay in time (reading from /proc/self/net/dev happened a few seconds later),
these values are by a few packets off from the memory dump. 

So I deduce the layout:
0x7f87bb4c6000   TX Packets
0x7f87bb4c6008   RX Packets
0x7f87bb4c6010    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6018   ???
0x7f87bb4c6020    * corruption not seen by memtester for whatever reason *
0x7f87bb4c6028   ???
0x7f87bb4c6030   ???
0x7f87bb4c6038   RX multicast (?)

So the only thing which is fully clear is that there are TX Packets and after that RX Packets. 

Checking through the driver sources, I find rtnl_link_stats64 can not be the culprit, since it has rx_packets and only after tx_packets. 
However, struct rtl8169_counters looks like:
struct rtl8169_counters {
	__le64	tx_packets;
	__le64	rx_packets;
	__le64	tx_errors;
	__le32	rx_errors;
	__le16	rx_missed;
	__le16	align_errors;
	__le32	tx_one_collision;
	__le32	tx_multi_collision;
	__le64	rx_unicast;
	__le64	rx_broadcast;
	__le32	rx_multicast;
	__le16	tx_aborted;
	__le16	tx_underun;
};
This looks like it could very well match the structure found in memory, so something would be broken related to rtl8169_do_counters, in the DMA transfer. 

Does this help - can I provide more info? I get the feeling this affects many tens of thousands of systems and just has been hidden due to 
network stats being read rarely... 

Cheers,
Oliver

  parent reply	other threads:[~2018-01-22 22:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-20 20:18 Memory corruption with r8169 across several device revisions and kernels Oliver Freyermuth
2018-01-21 20:48 ` Francois Romieu
2018-01-21 20:50   ` Oliver Freyermuth
2018-01-22  0:09     ` Francois Romieu
2018-01-22  0:44       ` Oliver Freyermuth
2018-01-22 22:55       ` Oliver Freyermuth [this message]
2018-01-23 15:28         ` David Miller
2018-01-23 15:47           ` Oliver Freyermuth
2018-01-24  5:31             ` Andreas Hartmann
2018-01-24  7:05               ` Andreas Hartmann
2018-01-23 22:13         ` Francois Romieu
2018-01-24  1:21           ` Oliver Freyermuth
2018-01-24  1:33             ` David Miller
2018-01-26  0:53               ` [PATCH net 1/1] r8169: fix memory corruption on retrieval of hardware statistics Francois Romieu
2018-01-26  2:34                 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ceebc43-6baf-2b4c-af10-70522e97385e@googlemail.com \
    --to=o.freyermuth@googlemail.com \
    --cc=netdev@vger.kernel.org \
    --cc=romieu@fr.zoreil.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).