From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S265452AbUASRZA (ORCPT ); Mon, 19 Jan 2004 12:25:00 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S265478AbUASRZA (ORCPT ); Mon, 19 Jan 2004 12:25:00 -0500 Received: from mail33.messagelabs.com ([195.245.230.51]:21736 "HELO mail33.messagelabs.com") by vger.kernel.org with SMTP id S265452AbUASRY4 (ORCPT ); Mon, 19 Jan 2004 12:24:56 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-23.tower-33.messagelabs.com!1074533094!775317 X-StarScan-Version: 5.1.15; banners=-,-,- X-VirusChecked: Checked X-StarScan-Version: 5.0.7; banners=.,-,- In-reply-to: <20040119145430.GI1748@srv-lnx2600.matchmail.com> From: Oliver Kiddle References: <7641.1074512162@gmcs3.local> <20040119145430.GI1748@srv-lnx2600.matchmail.com> To: linux-kernel@vger.kernel.org Subject: Re: page allocation failure Date: Mon, 19 Jan 2004 18:29:03 +0100 Message-ID: <9759.1074533343@gmcs3.local> Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Mike Fedyk wrote: > > Try running "vmstat 1" and output that to a file, and post your /proc/meminfo. > > Do you start getting the error before a couple of days, or you just can't > login after that amount of time? I can't log in immediately following the first occurence of the error. I can type in a username at the login prompt but nothing happens after pressing enter. Two days was just a rough idea of how long the system could be up before going down. It has gone down twice since I posted earlier so it wasn't even vaguely an accurate figure. On both occasions, there has not been a "page allocation failure" error though. These last two times, I was running xfsdump along with a nfsd activity. I had the following, possibly unrelated messages on the console. st0: Block limits 1 - 16777215 bytes. spurious 8259A interrupt IRQ7 I've put /proc/meminfo below though that is from the beginning while everything is still fine. The vmstat output is more interesting and I have it captured for the period when it went down. vmstat output starts off like this: r b swpd free buff cache si so bi bo in cs us sy id wa 2 0 0 947908 5792 37128 0 0 54 49 1072 121 0 2 96 2 The free column then slowly drops. Shortly before the end, is this sequence: 2 1 0 57036 2412 62044 0 0 2224 512 1950 188 1 70 20 9 0 0 0 55104 1284 64096 0 0 2204 320 1663 154 0 51 42 7 2 1 0 53048 44 67168 0 0 3080 0 1939 32 0 59 38 3 2 0 1388 49748 56 69592 0 1388 2796 1393 1909 161 1 64 15 19 3 2 1928 45828 60 72376 64 1208 3056 1208 2146 184 3 70 2 25 1 4 1464 94700 60 22088 0 808 3428 828 1873 213 1 58 0 41 0 1 1176 93716 60 23060 356 316 1596 429 2079 342 0 56 4 40 3 3 1176 94116 64 22368 144 0 1124 311 6419 1369 0 6 1 93 1 2 1176 109176 36 7360 0 0 828 159 29189 7978 0 1 0 99 This is the first time the swpd column is non-zero. The figures don't change a vast amount after that and only 25 samples later, the very last sample I got looked like this: 0 1 1176 109248 40 7364 0 0 0 0 1009 25 0 1 0 99 I can send you the full output if you want (70kb compressed). /proc/meminfo: MemTotal: 1034796 kB MemFree: 884620 kB Buffers: 14768 kB Cached: 61192 kB SwapCached: 0 kB Active: 51972 kB Inactive: 35992 kB HighTotal: 131008 kB HighFree: 57148 kB LowTotal: 903788 kB LowFree: 827472 kB SwapTotal: 996020 kB SwapFree: 996020 kB Dirty: 24 kB Writeback: 0 kB Mapped: 16772 kB Slab: 31064 kB Committed_AS: 24876 kB PageTables: 536 kB VmallocTotal: 114680 kB VmallocUsed: 692 kB VmallocChunk: 113988 kB I have /tmp mounted using tmpfs if that is in any way significant. Thanks Oliver