From: Oliver Freyermuth
Subject: Memory corruption with r8169 across several device revisions and kernels
Date: Sat, 20 Jan 2018 21:18:54 +0100
To: netdev@vger.kernel.org

Dear network experts,

please redirect me if this is the wrong place.

I have reproduced the following issue on three devices with different Realtek
card revisions, different distributions (Debian 9, Ubuntu 17.04, Gentoo) and
different kernels (4.9, 4.11.3, 4.14.12). It is reliably reproducible with at
least:

  Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
  Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

Reading from /proc/self/net/dev corrupts memory at physical addresses in low
memory, in kernel memory or in user-space memory. The corrupted physical
addresses change with each boot of the system, and they also appear to change
with each reload of the kernel module (I have only one data point on that).
To reproduce, run:

  $ while true; do cat /proc/self/net/dev > /dev/null; done

and, in parallel, scan memory for corruption, e.g.:

  $ memtester 15G

Ideally, memtester should be given (nearly) all of the system's memory. The
corruption usually shows up in the first memtester loop if the "while" loop is
already running in parallel. Depending on which memory gets corrupted, it may
also show up in the kernel log as

  Corrupted low memory at ffff88000000b000 (b000 phys) = 0016e109

if low-memory corruption scanning is enabled.

The values found in the overwritten memory match those shown in
/proc/self/net/dev for the Realtek Ethernet device. Unloading r8169 or
disabling the card in the BIOS "fixes" the issue.

I have already ended up with two corrupted btrfs filesystems and many
segfaults in user space because of this issue.

Please include me directly in replies; I may not stay subscribed to the list.

Cheers,
Oliver