From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 15 Oct 2017 17:17:36 -0700 (PDT)
From: Christian Kujau
To: linux-kernel@vger.kernel.org
Cc: Hugh Dickins, Minchan Kim, Nitin Gupta, Robert Schelander
Subject: swap_info_get: Bad swap offset entry 0200f8a7

Hi,

every now and then (and more frequently now) I receive the following
message on this Atom N270 netbook:

  swap_info_get: Bad swap offset entry 0200f8a7

This started to show up a few months ago but appears to happen more
frequently now:

    4 May < Linux version 4.11.2-1-ARCH
    4 Jun < Linux version 4.11.3-1-ARCH
    7 Jul < Linux version 4.11.9-1-ARCH
    4 Aug < Linux version 4.12.8-2-ARCH
   24 Sep < Linux version 4.12.13-1-ARCH
  158 Oct < Linux version 4.13.5-1-ARCH

I've only found (very) old reports for this[0][2] with either no
solution[1] or some hinting that this may be caused by hardware errors.

In my case, however, no kernel BUG messages or oopses are involved and no
PTE errors are logged. The machine appears to be very stable, although
memory usage is quite high on that machine (but no OOM situations so
far either).
As the machine is only equipped with 1GB of RAM, I'm
using ZRAM on this system, which usually looks something like this:

  $ zramctl
  NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
  /dev/zram0 lz4         248.7M 195.7M   74M 78.7M       2 [SWAP]

I suspect that, when memory pressure is high, zram may not be quick enough
to decompress a page leading to these messages, but then I'd have expected
a zram error message too.

Can anybody comment on these messages? If they're really indicating a
hardware error, shouldn't there be other messages too? So far, rasdaemon
has not logged any errors.

Thanks,
Christian.

[0] http://lkml.iu.edu/hypermail/linux/kernel/0204.3/0165.html
[1] https://bugzilla.redhat.com/show_bug.cgi?id=432337
[2] https://access.redhat.com/solutions/218733
--
BOFH excuse #323:

Your processor has processed too many instructions. Turn it off immediately, do not type any commands!!

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 20 Oct 2017 09:33:46 +0900
From: Minchan Kim
To: Christian Kujau
Cc: linux-kernel@vger.kernel.org, Hugh Dickins, Nitin Gupta, Robert Schelander, Andrew Morton, linux-mm@kvack.org, "Huang, Ying"
Subject: Re: swap_info_get: Bad swap offset entry 0200f8a7
Message-ID: <20171020003346.GA855@bbox>

Hello,

On Sun, Oct 15, 2017 at 05:17:36PM -0700, Christian Kujau wrote:
> Hi,
>
> every now and then (and more frequently now) I receive the following
> message on this Atom N270 netbook:
>
>   swap_info_get: Bad swap offset entry 0200f8a7
>
> This started to show up a few months ago but appears to happen more
> frequently now:
>
>     4 May < Linux version 4.11.2-1-ARCH
>     4 Jun < Linux version 4.11.3-1-ARCH
>     7 Jul < Linux version 4.11.9-1-ARCH
>     4 Aug < Linux version 4.12.8-2-ARCH
>    24 Sep < Linux version 4.12.13-1-ARCH
>   158 Oct < Linux version 4.13.5-1-ARCH
>
> I've only found (very) old reports for this[0][2] with either no
> solution[1] or some hinting that this may be caused by hardware errors.

Since 4.11, there have been many optimization changes in the swap
subsystem, so this might be related to one of them, but I'm not sure.
It's worth Cc'ing Huang, who may know something about what has changed
since then.

Thanks.

>
> In my case, however, no kernel BUG messages or oopses are involved and no
> PTE errors are logged. The machine appears to be very stable, although
> memory usage is quite high on that machine (but no OOM situations so
> far either). As the machine is only equipped with 1GB of RAM, I'm
> using ZRAM on this system, which usually looks something like this:
>
>   $ zramctl
>   NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
>   /dev/zram0 lz4         248.7M 195.7M   74M 78.7M       2 [SWAP]
>
> I suspect that, when memory pressure is high, zram may not be quick enough
> to decompress a page leading to these messages, but then I'd have expected
> a zram error message too.
>
> Can anybody comment on these messages? If they're really indicating a
> hardware error, shouldn't there be other messages too? So far, rasdaemon
> has not logged any errors.
>
> Thanks,
> Christian.
>
> [0] http://lkml.iu.edu/hypermail/linux/kernel/0204.3/0165.html
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=432337
> [2] https://access.redhat.com/solutions/218733
> --
> BOFH excuse #323:
>
> Your processor has processed too many instructions. Turn it off immediately, do not type any commands!!

From mboxrd@z Thu Jan 1 00:00:00 1970
From: huang ying
Date: Fri, 20 Oct 2017 19:31:14 +0800
Subject: Re: swap_info_get: Bad swap offset entry 0200f8a7
To: Christian Kujau
Cc: LKML, Hugh Dickins, Minchan Kim, Nitin Gupta, Robert Schelander

Hi, Christian,

On Mon, Oct 16, 2017 at 8:17 AM, Christian Kujau wrote:
> Hi,
>
> every now and then (and more frequently now) I receive the following
> message on this Atom N270 netbook:
>
>   swap_info_get: Bad swap offset entry 0200f8a7
>
> This started to show up a few months ago but appears to happen more
> frequently now:
>
>     4 May < Linux version 4.11.2-1-ARCH
>     4 Jun < Linux version 4.11.3-1-ARCH
>     7 Jul < Linux version 4.11.9-1-ARCH
>     4 Aug < Linux version 4.12.8-2-ARCH
>    24 Sep < Linux version 4.12.13-1-ARCH
>   158 Oct < Linux version 4.13.5-1-ARCH

So you have never seen this before 4.11, e.g. on 4.10?

Which operations trigger these error messages? Is it possible for you to
check whether the error also occurs with a normal swap device (not ZRAM)?

Do you use a 32-bit or 64-bit kernel?

Best Regards,
Huang, Ying

> I've only found (very) old reports for this[0][2] with either no
> solution[1] or some hinting that this may be caused by hardware errors.
>
> In my case, however, no kernel BUG messages or oopses are involved and no
> PTE errors are logged. The machine appears to be very stable, although
> memory usage is quite high on that machine (but no OOM situations so
> far either). As the machine is only equipped with 1GB of RAM, I'm
> using ZRAM on this system, which usually looks something like this:
>
>   $ zramctl
>   NAME       ALGORITHM DISKSIZE   DATA COMPR TOTAL STREAMS MOUNTPOINT
>   /dev/zram0 lz4         248.7M 195.7M   74M 78.7M       2 [SWAP]
>
> I suspect that, when memory pressure is high, zram may not be quick enough
> to decompress a page leading to these messages, but then I'd have expected
> a zram error message too.
>
> Can anybody comment on these messages? If they're really indicating a
> hardware error, shouldn't there be other messages too? So far, rasdaemon
> has not logged any errors.
>
> Thanks,
> Christian.
>
> [0] http://lkml.iu.edu/hypermail/linux/kernel/0204.3/0165.html
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=432337
> [2] https://access.redhat.com/solutions/218733
> --
> BOFH excuse #323:
>
> Your processor has processed too many instructions. Turn it off immediately, do not type any commands!!

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 20 Oct 2017 18:07:43 -0700 (PDT)
From: Christian Kujau
To: huang ying
Cc: LKML, Hugh Dickins, Minchan Kim, Nitin Gupta, Robert Schelander
Subject: Re: swap_info_get: Bad swap offset entry 0200f8a7

On Fri, 20 Oct 2017, huang ying wrote:
> >     4 May < Linux version 4.11.2-1-ARCH
> >     4 Jun < Linux version 4.11.3-1-ARCH
> >     7 Jul < Linux version 4.11.9-1-ARCH
> >     4 Aug < Linux version 4.12.8-2-ARCH
> >    24 Sep < Linux version 4.12.13-1-ARCH
> >   158 Oct < Linux version 4.13.5-1-ARCH
>
> So you have never seen this before 4.11, e.g. on 4.10?

Unfortunately the kernel logs for that machine only go back to May
2017, so I cannot tell whether this happened before. I've seen these
messages appear since then but didn't bother much. But as it now happens
more frequently, I thought I should mention this to the list.

> Which operations trigger these error messages?

I'm not able to reproduce it at will, but I suspect that memory pressure
triggers these messages. The machine in question is a Lenovo Ideapad S10
notebook running 24x7 and is equipped with 1 GB of RAM. Two Java processes
are basically using up all the memory, so usually it looks like this:

========================================
$ free -m
        total   used   free   shared   buff/cache   available
Mem:      994    866     67        1           60          20
Swap:     760    437    322

$ zramctl
NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4         248.7M 247M 92.3M 97.4M       2 [SWAP]
========================================

I just assumed the message is triggered when the system is really low on
memory and maybe zram is too slow to provide the memory requested. But
that's just my layman's assumption :-) For example, today's message was
emitted during the night:

  Oct 20 01:26:18 len kernel: [638973.207849] \
          swap_info_get: Bad swap offset entry 0200f8a7

And here are the sysstat numbers for that time frame:

$ sar -r -s 00:00 -e 02:00
Linux 4.13.5-1-ARCH (len)  10/20/2017  _i686_  (2 CPU)

12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact  kbdirty
12:10:06 AM      70076     948404     93.12          4     19004   1556176    86.58    376608   379408      220
12:20:02 AM      80488     937992     92.10          4    180404   1563952    87.01    380184   327736     5568
12:30:03 AM      83296     935184     91.82          4    137260   1569776    87.34    329512   330000      280
12:40:03 AM      65188     953292     93.60          4     21156   1571048    87.41    386644   389820     1144
12:50:03 AM      67512     950968     93.37          4     33452   1570628    87.38    378936   381580     1304
01:00:07 AM      65520     952960     93.57          4     24996   1573180    87.53    385396   386152      904
01:10:03 AM      66956     951524     93.43          4     35520   1572696    87.50    379548   379364      172
01:20:02 AM      67440     951040     93.38          4     88736   1569864    87.34    381764   370472     7080
01:30:03 AM      70048     948432     93.12          4     29212   1572504    87.49    383516   381900     1832
01:40:04 AM      71532     946948     92.98          4     29220   1570096    87.35    380120   380284     1000
01:50:03 AM      65828     952652     93.54          4     34408   1570604    87.38    381040   381028     1604
Average:         70353     948127     93.09          4     57579   1569139    87.30    376661   371613     1919

== If that is unreadable, here it is again: https://paste.debian.net/991927/

> Is it possible for you to check whether the error also occurs with a
> normal swap device (not ZRAM)?

I have "normal" (but encrypted) swap configured but with a lower priority:

  cat /proc/swaps
  Filename     Type        Size     Used     Priority
  /dev/dm-0    partition   524284   194348   0
  /dev/zram0   partition   254616   253536   32767

I shall disable the zram device and disable encryption too and will report
back if the message appears again.

> Do you use a 32-bit or 64-bit kernel?

I'm using an i686 kernel for this Atom N270 processor (with HT enabled).

Thanks for your response,
Christian.
--
BOFH excuse #403:

Sysadmin didn't hear pager go off due to loud music from bar-room speakers.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: huang ying
Date: Sun, 22 Oct 2017 00:07:58 +0800
Subject: Re: swap_info_get: Bad swap offset entry 0200f8a7
To: Christian Kujau
Cc: LKML, Hugh Dickins, Minchan Kim, Nitin Gupta, Robert Schelander

On Sat, Oct 21, 2017 at 9:07 AM, Christian Kujau wrote:
> On Fri, 20 Oct 2017, huang ying wrote:
>> >     4 May < Linux version 4.11.2-1-ARCH
>> >     4 Jun < Linux version 4.11.3-1-ARCH
>> >     7 Jul < Linux version 4.11.9-1-ARCH
>> >     4 Aug < Linux version 4.12.8-2-ARCH
>> >    24 Sep < Linux version 4.12.13-1-ARCH
>> >   158 Oct < Linux version 4.13.5-1-ARCH
>>
>> So you have never seen this before 4.11, e.g. on 4.10?
>
> Unfortunately the kernel logs for that machine only go back to May
> 2017, so I cannot tell whether this happened before. I've seen these
> messages appear since then but didn't bother much. But as it now happens
> more frequently, I thought I should mention this to the list.
>
>> Which operations trigger these error messages?
>
> I'm not able to reproduce it at will, but I suspect that memory pressure
> triggers these messages. The machine in question is a Lenovo Ideapad S10
> notebook running 24x7 and is equipped with 1 GB of RAM. Two Java processes
> are basically using up all the memory, so usually it looks like this:
>
> ========================================
> $ free -m
>         total   used   free   shared   buff/cache   available
> Mem:      994    866     67        1           60          20
> Swap:     760    437    322
>
> $ zramctl
> NAME       ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
> /dev/zram0 lz4         248.7M 247M 92.3M 97.4M       2 [SWAP]
> ========================================
>
> I just assumed the message is triggered when the system is really low on
> memory and maybe zram is too slow to provide the memory requested. But
> that's just my layman's assumption :-) For example, today's message was
> emitted during the night:
>
>   Oct 20 01:26:18 len kernel: [638973.207849] \
>           swap_info_get: Bad swap offset entry 0200f8a7
>
> And here are the sysstat numbers for that time frame:
>
> $ sar -r -s 00:00 -e 02:00
> Linux 4.13.5-1-ARCH (len)  10/20/2017  _i686_  (2 CPU)
>
> 12:00:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact  kbdirty
> 12:10:06 AM      70076     948404     93.12          4     19004   1556176    86.58    376608   379408      220
> 12:20:02 AM      80488     937992     92.10          4    180404   1563952    87.01    380184   327736     5568
> 12:30:03 AM      83296     935184     91.82          4    137260   1569776    87.34    329512   330000      280
> 12:40:03 AM      65188     953292     93.60          4     21156   1571048    87.41    386644   389820     1144
> 12:50:03 AM      67512     950968     93.37          4     33452   1570628    87.38    378936   381580     1304
> 01:00:07 AM      65520     952960     93.57          4     24996   1573180    87.53    385396   386152      904
> 01:10:03 AM      66956     951524     93.43          4     35520   1572696    87.50    379548   379364      172
> 01:20:02 AM      67440     951040     93.38          4     88736   1569864    87.34    381764   370472     7080
> 01:30:03 AM      70048     948432     93.12          4     29212   1572504    87.49    383516   381900     1832
> 01:40:04 AM      71532     946948     92.98          4     29220   1570096    87.35    380120   380284     1000
> 01:50:03 AM      65828     952652     93.54          4     34408   1570604    87.38    381040   381028     1604
> Average:         70353     948127     93.09          4     57579   1569139    87.30    376661   371613     1919
>
> == If that is unreadable, here it is again: https://paste.debian.net/991927/
>
>> Is it possible for you to check whether the error also occurs with a
>> normal swap device (not ZRAM)?
>
> I have "normal" (but encrypted) swap configured but with a lower priority:
>
>   cat /proc/swaps
>   Filename     Type        Size     Used     Priority
>   /dev/dm-0    partition   524284   194348   0
>   /dev/zram0   partition   254616   253536   32767
>
> I shall disable the zram device and disable encryption too and will report
> back if the message appears again.
>
>> Do you use a 32-bit or 64-bit kernel?
>
> I'm using an i686 kernel for this Atom N270 processor (with HT enabled).

Thanks for your information!
I have reproduced your problem. These are just false error messages. I
will fix that and send out the patch soon.

Best Regards,
Huang, Ying
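
For anyone who wants to track how often the message shows up, per-month
counts like the ones in the original report can be produced with a few
lines of Python. This is only a sketch: it assumes syslog-style lines
shaped like the one quoted in this thread ("Oct 20 01:26:18 len kernel:
..."), and the function name count_by_month as well as the second sample
line are made up for illustration, not taken from the posts.

```python
import re
from collections import Counter

# Assumption: classic syslog prefix "Mon DD HH:MM:SS host kernel: ...".
# Journals with other timestamp formats would need a different pattern.
PATTERN = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:.*"
    r"swap_info_get: Bad swap offset entry (?P<entry>[0-9a-f]+)"
)


def count_by_month(lines):
    """Count matching kernel log lines per month abbreviation."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("month")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # Log line as quoted in the thread, joined onto one line.
        "Oct 20 01:26:18 len kernel: [638973.207849] "
        "swap_info_get: Bad swap offset entry 0200f8a7",
        # Hypothetical unrelated line, only here to show it is skipped.
        "Oct 20 01:26:19 len kernel: [638974.000000] unrelated message",
    ]
    for month, count in count_by_month(sample).items():
        print(count, month)
```

Feeding it a real log file would look like count_by_month(open(path)),
where path is wherever the kernel log is stored on the system in question.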