From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757754AbXKTEN4 (ORCPT );
	Mon, 19 Nov 2007 23:13:56 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1754113AbXKTENs (ORCPT );
	Mon, 19 Nov 2007 23:13:48 -0500
Received: from smtp105.mail.mud.yahoo.com ([209.191.85.215]:46613 "HELO
	smtp105.mail.mud.yahoo.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with SMTP id S1752557AbXKTENr (ORCPT );
	Mon, 19 Nov 2007 23:13:47 -0500
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
	s=s1024; d=yahoo.com.au;
	h=Received:X-YMail-OSG:From:To:Subject:Date:User-Agent:Cc:References:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding:Content-Disposition:Message-Id;
	b=mK7vGczx5XghptAjZaTaYqv036IWNA0BqB5oQu/REyjhi+XkcoeyFLdFM2A2SPaCrUk04c+O/qN40ka6+2PufeQ5M2rdcZ6ZH3wnQ+HGffkjDOZRRCZGS9eJsRG+TvFA/QmTjHfkhov/mE+Ql6zwrFvcyDF+2059Jmyq/Lo58FI= ;
X-YMail-OSG: .Z3wggMVM1lw5xXy.B.oE9Myjpg7rYUVDsnCz478CxUasCJNg.iSs6dKEUIe9oQiJCDFxw7T3w--
From: Nick Piggin
To: pomac@vapor.com
Subject: Re: [BUG?] OOM with large cache....(x86_64, 2.6.24-rc3-git1, nohz)
Date: Tue, 20 Nov 2007 15:13:36 +1100
User-Agent: KMail/1.9.5
Cc: Linux-kernel@vger.kernel.org
References: <1195520355.8601.14.camel@localhost>
In-Reply-To: <1195520355.8601.14.camel@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200711201513.36711.nickpiggin@yahoo.com.au>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tuesday 20 November 2007 11:59, Ian Kumlien wrote:
> Hi,
>
> I have had this before and sent a mail about it.
>
> It seems like the diskcache is still in use and is never shrunk. This
> happened with a odd load though, trackerd started indexing a bit late
> and the other workload which is a large bittorrent seed/download.
>
> The bittorrent app is the one that drives up the diskcache.
>
> I don't think that trackerd was triggering it, i actually upgraded
> kernel since it kept happening on 2.6.23...
>
> I really don't know what other information i can provide.
>
> free from now (some hours later)
> vmstat from now ^
>
> and the dmesg log.
>
> Ideas? Comments?
>
> free:
>              total       used       free     shared    buffers     cached
> Mem:       2056484    2039736      16748          0      20776    1585408
> -/+ buffers/cache:     433552    1622932
> Swap:      2530180     426020    2104160
> ---
>
> vmstat:
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0 426020  16612  20580 1585848   26   21   684    56   34   51  5  3 88  4
> ---
>
> --- 8<--- 8<---
> ntpd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
>
> Call Trace:
> [] oom_kill_process+0xf6/0x110
> [] out_of_memory+0x1b6/0x200
> [] __alloc_pages+0x387/0x3c0
> [] __do_page_cache_readahead+0x103/0x260
> [] filemap_fault+0x2f1/0x420
> [] __do_fault+0x6b/0x410
> [] recalc_sigpending+0xe/0x40
> [] handle_mm_fault+0x1bd/0x7a0
> [] save_i387+0x9a/0xe0
> [] do_page_fault+0x176/0x790
> [] sys_rt_sigreturn+0x35f/0x400
> [] error_exit+0x0/0x51
>
> Mem-info:
> DMA per-cpu:
> CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> DMA32 per-cpu:
> CPU 0: Hot: hi: 186, btch: 31 usd: 148 Cold: hi: 62, btch: 15 usd: 60
> CPU 1: Hot: hi: 186, btch: 31 usd: 116 Cold: hi: 62, btch: 15 usd: 18
> Active:241172 inactive:241825 dirty:0 writeback:0 unstable:0
> free:3388 slab:8095 mapped:149 pagetables:6263 bounce:0
> DMA free:7908kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:7436kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 2003 2003 2003
> DMA32 free:5644kB min:5716kB low:7144kB high:8572kB active:964688kB inactive:967188kB present:2052008kB pages_scanned:5519125 all_unreclaimable? yes
> lowmem_reserve[]: 0 0 0 0
> DMA: 5*4kB 4*8kB 3*16kB 4*32kB 6*64kB 5*128kB 4*256kB 3*512kB 0*1024kB 0*2048kB 1*4096kB = 7908kB
> DMA32: 95*4kB 2*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 2*512kB 0*1024kB 0*2048kB 1*4096kB = 5644kB
> Swap cache: add 1979600, delete 1979592, find 144656/307405, race 1+17
> Free swap = 0kB
> Total swap = 2530180kB
> Free swap: 0kB
> 524208 pages of RAM
> 10149 reserved pages
> 5059 pages shared
> 8 pages swap cached
> Out of memory: kill process 8421 (trackerd) score 1016524 or a child
> Killed process 8421 (trackerd)

It's also used up all your 2.5GB of swap. The output of your `free` shows
a fair bit of disk cache there, but it also shows a lot of swap free, which
isn't the case at oom-time.

Unfortunately, we don't show NR_ANON_PAGES in these stats, but at a guess,
I'd say that the file cache is mostly shrunk and you still don't have
enough memory.

trackerd probably has a memory leak in it, or else is just trying to
allocate more memory than you have.

Is this a regression?
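
Since the OOM report above does not print the anonymous page count, one way to check the guess is to sample /proc/meminfo while the workload runs and compare anonymous memory against file cache and swap use. The following is only a rough sketch: the helper names (parse_meminfo, summarize, read_meminfo) are made up for illustration, and it assumes the kernel exposes the AnonPages field in /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. Lines that do not look like "Name: value" are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Pick out the figures relevant to the cache-vs-anonymous question (kB)."""
    return {
        # None if the kernel does not report AnonPages (assumption: it does).
        "anon_kb": fields.get("AnonPages"),
        "file_cache_kb": fields.get("Cached", 0) + fields.get("Buffers", 0),
        "free_kb": fields.get("MemFree", 0),
        "swap_used_kb": fields.get("SwapTotal", 0) - fields.get("SwapFree", 0),
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and summarize the live values (Linux only)."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

Logging the result of read_meminfo() every minute or so up to the point of the OOM kill should show which number is actually growing. If anon_kb and swap_used_kb climb while file_cache_kb falls, that would point at a process allocating too much rather than at the cache failing to shrink.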