From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oliver Freyermuth
Subject: Re: Memory corruption with r8169 across several device revisions and kernels
Date: Mon, 22 Jan 2018 01:44:33 +0100
Message-ID: <3fb413aa-997f-42f5-2a43-c29d8de51d3d@googlemail.com>
References: <20180121204809.GA1398@electric-eye.fr.zoreil.com> <8ac81034-008b-7ad0-619c-b80bb0843c14@googlemail.com> <20180122000922.GA3020@electric-eye.fr.zoreil.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org
To: Francois Romieu
Return-path:
Received: from mail-wm0-f51.google.com ([74.125.82.51]:38778 "EHLO mail-wm0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750994AbeAVApV (ORCPT ); Sun, 21 Jan 2018 19:45:21 -0500
Received: by mail-wm0-f51.google.com with SMTP id 141so13621857wme.3 for ; Sun, 21 Jan 2018 16:45:20 -0800 (PST)
In-Reply-To: <20180122000922.GA3020@electric-eye.fr.zoreil.com>
Content-Language: en-GB
Sender: netdev-owner@vger.kernel.org
List-ID:

On 22.01.2018 at 01:09, Francois Romieu wrote:
> You said:
>
> Oliver Freyermuth :
> [...]
>> The values found in overwritten memory match those contained in
>> /proc/self/net/dev for the realtek ethernet device.
>
> Are you able to retrieve the layout ? That is, does it appear to match:
>
> - r8169 hardware stats DMA buffer ?
>   TxOk, RxOk, TxErr, RxErr, ...
>
> - rtnl_link_stats ?
>   rx_packets, tx_packets, rx_bytes, tx_bytes, ...
>
> or something else ?

Not cleanly. Since I'm no expert in kernel module development, I can
only go by what shows up in mapped memory, e.g. with memtester.
The values I found there also appeared in /proc/self/net/dev. I no
longer remember whether it was the RX or TX bytes / packets counter
(but it was none of the error counters).
I can try to reproduce this to clarify, but it's a somewhat dangerous
undertaking.
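For reference, the comparison described above (a stray 32-bit word from
corrupted memory versus the counters in /proc/self/net/dev) could be
sketched roughly as follows. This is only an illustration, not the
procedure that was actually used: the helper names parse_net_dev and
matching_counters and the slack parameter are made up here, and the
16-column field order is the usual /proc/net/dev layout, which is
assumed rather than checked.

```python
# Assumed column order of /proc/net/dev after the "iface:" prefix:
# 8 receive counters followed by 8 transmit counters.
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]
FIELDS = ["rx_" + f for f in RX_FIELDS] + ["tx_" + f for f in TX_FIELDS]


def parse_net_dev(text):
    """Parse the text of /proc/net/dev into {iface: {counter: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        iface, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        if len(values) != len(FIELDS):
            continue  # unexpected layout, skip instead of guessing
        stats[iface.strip()] = dict(zip(FIELDS, values))
    return stats


def matching_counters(stats, iface, observed, slack=0):
    """Return the counters of iface whose low 32 bits are at most
    `slack` ahead of `observed` (hypothetical helper).

    The counters keep running after the corrupted word was written, so
    a live reading is expected to be equal or slightly larger.
    Wrap-around at 2**32 is ignored in this sketch.
    """
    hits = []
    for name, value in stats[iface].items():
        low32 = value & 0xFFFFFFFF
        if 0 <= low32 - observed <= slack:
            hits.append(name)
    return hits
```

To use it, read /proc/self/net/dev into a string, pass it to
parse_net_dev, and call matching_counters with the interface name and
the word reported by memtester or syslog. A larger slack makes a
coincidental match more likely, so a hit is a hint, not a proof.
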
Also, from a time when the physical offset was in low memory, I got the
following in syslog:

Oct 12 10:05:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065b8ea
Oct 12 10:10:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065be39
Oct 12 10:11:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065be8c
Oct 12 10:12:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065bef8
Oct 12 10:13:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065bfbe
Oct 12 10:18:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065c37a
Oct 12 10:19:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065c3db
Oct 12 10:31:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065cc48
Oct 12 10:35:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065d402
Oct 12 10:47:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065dcbb
Oct 12 10:53:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 0065e0a3
Oct 12 11:39:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 006602f2
Oct 12 11:44:02 desktop1 kernel: Corrupted low memory at ffff880000009000 (9000 phys) = 00661ef0

Also, I'm not sure whether the low memory scanner continues after it
has found a single corruption; if it does not, it would only ever see
the first corrupted region.
memtester in userspace stops on the first corruption and then tries
another pass.
At least I only ever saw one corrupted region with the tools I used.

The same was true for the corrupted btrfs filesystem: as far as I could
tell, there was a single corrupted region, no series of counters, i.e.
not a full structure.

Cheers,
Oliver