OOM-killer too aggressive?

public inbox for linux-kernel@vger.kernel.org

* OOM-killer too aggressive?
@ 2006-02-26 14:35 Chuck Ebbert
  2006-02-26 18:21 ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Chuck Ebbert @ 2006-02-26 14:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Largret, Jens Axboe, Andrew Morton

Chris Largret is getting repeated OOM kills because of DMA memory
exhaustion:

oom-killer: gfp_mask=0xd1, order=3

Call Trace: <ffffffff8104ed46>{out_of_memory+58} <ffffffff8104ff30>{__alloc_pages+534}
       <ffffffff8104ffee>{__get_free_pages+48} <ffffffff8117d8e9>{dma_mem_alloc+31}
       <ffffffff81183e70>{floppy_open+348} <ffffffff81072125>{do_open+172}
       <ffffffff810724b4>{blkdev_open+0} <ffffffff810724dc>{blkdev_open+40}
       <ffffffff81069fea>{__dentry_open+230} <ffffffff8106a10e>{nameidata_to_filp+40}
       <ffffffff8106a153>{do_filp_open+51} <ffffffff8106a2cb>{get_unused_fd+116}
       <ffffffff8106a477>{do_sys_open+73} <ffffffff8106a4d3>{sys_open+27}
       <ffffffff8100aa3a>{system_call+126}
Mem-info:
DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:184
cpu 0 cold: high 62, batch 15 used:4
cpu 1 hot: high 186, batch 31 used:160
cpu 1 cold: high 62, batch 15 used:4
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages:     2843384kB (0kB HighMem)
Active:10367 inactive:38871 dirty:42 writeback:0 unstable:0 free:710846
slab:4726 mapped:2155 pagetables:147
DMA free:44kB min:32kB low:40kB high:48kB active:0kB inactive:0kB present:15728kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3014 3014 3014
DMA32 free:2843340kB min:7008kB low:8760kB high:10512kB active:41468kB inactive:155484kB present:3086500kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44kB
DMA32: 933*4kB 573*8kB 229*16kB 74*32kB 21*64kB 5*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 690*4096kB = 2843340kB
Normal: empty
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap  = 1477960kB
Total swap = 1477960kB
Free swap:       1477960kB
786416 pages of RAM
17590 reserved pages
10033 pages shared
0 pages swap cached
Out of Memory: Killed process 4886 (dbus-daemon).
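
For readers decoding the first line of the report, here is a small
userspace sketch that splits gfp_mask=0xd1 into flag names and converts
order=3 into bytes. The bit values in the table are an assumption,
recalled from the 2.6.16-era include/linux/gfp.h; check them against the
tree actually being debugged. decode_gfp() and order_to_bytes() are
hypothetical helper names, not kernel functions.

```c
#include <stddef.h>
#include <string.h>

/* Flag bits as recalled from 2.6.16-era include/linux/gfp.h (assumption). */
struct gfp_name {
    unsigned int bit;
    const char *name;
};

static const struct gfp_name gfp_names[] = {
    { 0x01u, "__GFP_DMA" },
    { 0x02u, "__GFP_HIGHMEM" },
    { 0x10u, "__GFP_WAIT" },
    { 0x20u, "__GFP_HIGH" },
    { 0x40u, "__GFP_IO" },
    { 0x80u, "__GFP_FS" },
};

/*
 * Write the names of the set flags into buf, separated by '|'.
 * Returns whatever bits were not recognised (0 if all were decoded).
 * buf must have room for at least one byte.
 */
unsigned int decode_gfp(unsigned int mask, char *buf, size_t len)
{
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(gfp_names) / sizeof(gfp_names[0]); i++) {
        if (mask & gfp_names[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, gfp_names[i].name, len - strlen(buf) - 1);
            mask &= ~gfp_names[i].bit;
        }
    }
    return mask;
}

/* Bytes covered by one allocation of the given order. */
unsigned long order_to_bytes(unsigned int order, unsigned long page_size)
{
    return page_size << order;
}
```

Under those assumed values, 0xd1 decodes to
__GFP_DMA|__GFP_WAIT|__GFP_IO|__GFP_FS, and order 3 with 4kB pages is a
32kB contiguous request, which matches the 1024 * 32 byte first attempt
in floppy_open for a non-ED drive.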


Looking at floppy_open, we have:

        if (!floppy_track_buffer) {
                /* if opening an ED drive, reserve a big buffer,
                 * else reserve a small one */
                if ((UDP->cmos == 6) || (UDP->cmos == 5))
                        try = 64;       /* Only 48 actually useful */
                else
                        try = 32;       /* Only 24 actually useful */

                tmp = (char *)fd_dma_mem_alloc(1024 * try);
                if (!tmp && !floppy_track_buffer) {
                        try >>= 1;      /* buffer only one side */
                        INFBOUND(try, 16);
                        tmp = (char *)fd_dma_mem_alloc(1024 * try);
                }
                if (!tmp && !floppy_track_buffer) {
                        fallback_on_nodma_alloc(&tmp, 2048 * try);
                }

So it will try to allocate half its first request if that fails, then
fall back to non-DMA memory as a last resort, but doesn't get a chance
because the OOM killer gets invoked.  Maybe we need a new flag that says
"fail me immediately if no memory available"?

Or should floppy.c be fixed so it doesn't ask for so much?
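
The sizes involved in that fallback chain can be worked out with a short
userspace sketch. It only models the arithmetic of the snippet above; it
does not allocate anything. It assumes 4kB pages, and it assumes
INFBOUND(try, 16) clamps try to a lower bound of 16. model_get_order()
and plan_track_buffer() are hypothetical names used for illustration.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4kB pages */

/* Smallest order such that (page size << order) covers size bytes. */
unsigned int model_get_order(unsigned long size)
{
    unsigned int order = 0;

    while ((MODEL_PAGE_SIZE << order) < size)
        order++;
    return order;
}

struct track_buffer_plan {
    unsigned long first_bytes;    /* first DMA attempt */
    unsigned long second_bytes;   /* halved DMA attempt */
    unsigned long fallback_bytes; /* non-DMA fallback request */
};

/* Reproduce the size arithmetic of the floppy_open snippet. */
struct track_buffer_plan plan_track_buffer(int is_ed_drive)
{
    struct track_buffer_plan p;
    int try_kb = is_ed_drive ? 64 : 32;

    p.first_bytes = 1024UL * try_kb;

    try_kb >>= 1;               /* buffer only one side */
    if (try_kb < 16)            /* assumed meaning of INFBOUND(try, 16) */
        try_kb = 16;
    p.second_bytes = 1024UL * try_kb;

    p.fallback_bytes = 2048UL * try_kb;
    return p;
}
```

For a non-ED drive this gives a 32kB (order 3) first attempt, a 16kB
(order 2) second attempt, and a 32kB non-DMA fallback. The order-3 first
attempt is the one that shows up in the OOM report.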


I found a diagnostic patch but only this part applies to 2.6.16-rc4:

> From: Jens Axboe <axboe@suse.de>

--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -637,6 +637,8 @@ void blk_queue_bounce_limit(request_queu
 {
 	unsigned long bounce_pfn = dma_addr >> PAGE_SHIFT;
 
+	printk("q=%p, dma_addr=%llx, bounce pfn %lu\n", q, dma_addr, bounce_pfn);
+
 	/*
 	 * set appropriate bounce gfp mask -- unfortunately we don't have a
 	 * full 4GB zone, so we have to resort to low memory for any bounces.
 
-- 
Chuck
"Equations are the Devil's sentences."  --Stephen Colbert

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: OOM-killer too aggressive?
  2006-02-26 14:35 Chuck Ebbert
@ 2006-02-26 18:21 ` Andrew Morton
  2006-02-26 20:39   ` Andi Kleen
  2006-02-26 21:06   ` Chris Largret
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Morton @ 2006-02-26 18:21 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, largret, axboe, Andi Kleen

Chuck Ebbert <76306.1226@compuserve.com> wrote:
>
> Chris Largret is getting repeated OOM kills because of DMA memory
> exhaustion:
> 
> oom-killer: gfp_mask=0xd1, order=3
> 

This could be related to the known GFP_DMA oom on some x86_64 machines.

> Looking at floppy_open, we have:
> 
>         if (!floppy_track_buffer) {
>                 /* if opening an ED drive, reserve a big buffer,
>                  * else reserve a small one */
>                 if ((UDP->cmos == 6) || (UDP->cmos == 5))
>                         try = 64;       /* Only 48 actually useful */
>                 else
>                         try = 32;       /* Only 24 actually useful */
> 
>                 tmp = (char *)fd_dma_mem_alloc(1024 * try);
>                 if (!tmp && !floppy_track_buffer) {
>                         try >>= 1;      /* buffer only one side */
>                         INFBOUND(try, 16);
>                         tmp = (char *)fd_dma_mem_alloc(1024 * try);
>                 }
>                 if (!tmp && !floppy_track_buffer) {
>                         fallback_on_nodma_alloc(&tmp, 2048 * try);
>                 }
> 
> So it will try to allocate half its first request if that fails, then
> fall back to non-DMA memory as a last resort, but doesn't get a chance
> because the OOM killer gets invoked.  Maybe we need a new flag that says
> "fail me immediately if no memory available"?

That's __GFP_NORETRY.

> Or should floppy.c be fixed so it doesn't ask for so much?
> 

The page allocator uses 32k as the threshold for when-to-try-like-crazy.

x86_64 should probably be defining its own fd_dma_mem_alloc() which doesn't
use GFP_DMA.

--- devel/drivers/block/floppy.c~floppy-false-oom-fix	2006-02-26 10:14:38.000000000 -0800
+++ devel-akpm/drivers/block/floppy.c	2006-02-26 10:15:04.000000000 -0800
@@ -278,7 +278,8 @@ static void do_fd_request(request_queue_
 #endif
 
 #ifndef fd_dma_mem_alloc
-#define fd_dma_mem_alloc(size) __get_dma_pages(GFP_KERNEL,get_order(size))
+#define fd_dma_mem_alloc(size)	\
+		__get_dma_pages(GFP_KERNEL|__GFP_NORETRY,get_order(size))
 #endif
 
 static inline void fallback_on_nodma_alloc(char **addr, size_t l)
_



* Re: OOM-killer too aggressive?
       [not found] <5KvnZ-4uN-27@gated-at.bofh.it>
@ 2006-02-26 18:39 ` Robert Hancock
  2006-02-26 21:56   ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Robert Hancock @ 2006-02-26 18:39 UTC (permalink / raw)
  To: Chuck Ebbert, linux-kernel

Chuck Ebbert wrote:
> DMA free:44kB min:32kB low:40kB high:48kB active:0kB inactive:0kB
> present:15728kB pages_scanned:0 all_unreclaimable? yes

I think the big question is who used up all the DMA zone.. Surely not 
the floppy driver..
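
To see why even a single free 32kB block is not enough, the numbers from
the report (44kB free, min 32kB, free list 1*4kB 1*8kB 0*16kB 1*32kB) can
be fed into a simplified userspace model of the watermark check. This is a
sketch written in the spirit of zone_watermark_ok() in mm/page_alloc.c as I
recall it from that era; it leaves out the ALLOC_HIGH/ALLOC_HARDER
adjustments, and the struct and function names are hypothetical.

```c
#define MODEL_MAX_ORDER 11

struct zone_model {
    long free_pages;                 /* total free pages in the zone */
    long pages_min;                  /* "min" watermark, in pages */
    long lowmem_reserve;             /* reserve for the requesting class */
    long nr_free[MODEL_MAX_ORDER];   /* free blocks per order */
};

/* Return 1 if an allocation of this order passes the watermark check. */
int watermark_ok(const struct zone_model *z, int order)
{
    long min = z->pages_min;
    long free_pages = z->free_pages - (1L << order) + 1;
    int o;

    if (free_pages <= min + z->lowmem_reserve)
        return 0;
    for (o = 0; o < order; o++) {
        /* blocks smaller than the request cannot satisfy it */
        free_pages -= z->nr_free[o] << o;
        /* require fewer free pages at each higher order */
        min >>= 1;
        if (free_pages <= min)
            return 0;
    }
    return 1;
}

/* DMA zone as printed in the report, with 4kB pages. */
struct zone_model dma_zone_from_report(void)
{
    struct zone_model z = { 0 };
    int o;

    z.nr_free[0] = 1;   /* 1*4kB  */
    z.nr_free[1] = 1;   /* 1*8kB  */
    z.nr_free[3] = 1;   /* 1*32kB */
    for (o = 0; o < MODEL_MAX_ORDER; o++)
        z.free_pages += z.nr_free[o] << o;   /* 11 pages = 44kB */
    z.pages_min = 32 / 4;                    /* min:32kB */
    z.lowmem_reserve = 0;                    /* request targets ZONE_DMA itself */
    return z;
}
```

With these inputs an order-3 request leaves 11 - 8 + 1 = 4 pages against
a minimum of 8, so the check fails, while an order-0 request passes. The
report also shows active:0kB and inactive:0kB for the zone, so there is
nothing for reclaim to free there.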

> So it will try to allocate half its first request if that fails, then
> fall back to non-DMA memory as a last resort, but doesn't get a chance
> because the OOM killer gets invoked.  Maybe we need a new flag that says
> "fail me immediately if no memory available"?

I think __GFP_NORETRY already does this.. There is also __GFP_NOWARN 
which suppresses the allocation failure warning, not sure if we want 
that or not..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: OOM-killer too aggressive?
  2006-02-26 18:21 ` Andrew Morton
@ 2006-02-26 20:39   ` Andi Kleen
  2006-02-26 21:04     ` Andrew Morton
  2006-02-26 21:06   ` Chris Largret
  1 sibling, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-02-26 20:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chuck Ebbert, linux-kernel, largret, axboe

On Sun, Feb 26, 2006 at 10:21:52AM -0800, Andrew Morton wrote:
> Chuck Ebbert <76306.1226@compuserve.com> wrote:
> >
> > Chris Largret is getting repeated OOM kills because of DMA memory
> > exhaustion:
> > 
> > oom-killer: gfp_mask=0xd1, order=3
> > 
> 
> This could be related to the known GFP_DMA oom on some x86_64 machines.

What known GFP_DMA oom? GFP_DMA allocation should work.

-Andi


* Re: OOM-killer too aggressive?
  2006-02-26 21:56   ` Marcelo Tosatti
@ 2006-02-26 20:56     ` Chris Largret
  2006-02-27  0:22       ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Largret @ 2006-02-26 20:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Robert Hancock, Chuck Ebbert, linux-kernel

On Sun, 2006-02-26 at 15:56 -0600, Marcelo Tosatti wrote:
> On Sun, Feb 26, 2006 at 12:39:31PM -0600, Robert Hancock wrote:
> > I think the big question is who used up all the DMA zone.. Surely not 
> > the floppy driver..
> 
> The kernel text and data? "readelf -S vmlinux" output would be useful.

$ readelf -S vmlinux
There are 52 section headers, starting at offset 0x2548488:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         ffffffff81000000  00100000
       000000000026102f  0000000000000000  AX       0     0     16
  [ 2] __ex_table        PROGBITS         ffffffff81261030  00361030
       0000000000004420  0000000000000000   A       0     0     8
  [ 3] .rodata           PROGBITS         ffffffff81266000  00366000
       000000000004ba6f  0000000000000000   A       0     0     32
  [ 4] .pci_fixup        PROGBITS         ffffffff812b1a70  003b1a70
       00000000000008a0  0000000000000000   A       0     0     16
  [ 5] .rio_route        PROGBITS         ffffffff812b2310  0066997c
       0000000000000000  0000000000000000   W       0     0     1
  [ 6] __ksymtab         PROGBITS         ffffffff812b2310  003b2310
       0000000000009ac0  0000000000000000   A       0     0     16
  [ 7] __ksymtab_gpl     PROGBITS         ffffffff812bbdd0  003bbdd0
       0000000000001ea0  0000000000000000   A       0     0     16
  [ 8] __kcrctab         PROGBITS         ffffffff812bdc70  003bdc70
       0000000000004d60  0000000000000000   A       0     0     8
  [ 9] __kcrctab_gpl     PROGBITS         ffffffff812c29d0  003c29d0
       0000000000000f50  0000000000000000   A       0     0     8
  [10] __ksymtab_strings PROGBITS         ffffffff812c3920  003c3920
       0000000000010622  0000000000000000   A       0     0     32
  [11] __param           PROGBITS         ffffffff812d4000  003d4000
       0000000000000d20  0000000000000000   A       0     0     8
  [12] .data             PROGBITS         ffffffff812d5000  003d5000
       00000000000cc5d0  0000000000000000  WA       0     0     4096
  [13] .bss              NOBITS           ffffffff813a1600  004a15d0
       000000000008210c  0000000000000000  WA       0     0     64
  [14] .data.cacheline_a PROGBITS         ffffffff81424000  00524000
       0000000000004c00  0000000000000000  WA       0     0     64
  [15] .data.read_mostly PROGBITS         ffffffff81428c00  00528c00
       00000000000009b0  0000000000000000  WA       0     0     64
  [16] .vsyscall_0       PROGBITS         ffffffffff600000  00600000
       0000000000000108  0000000000000000  AX       0     0     1
  [17] .xtime_lock       PROGBITS         ffffffffff600140  00600140
       0000000000000008  0000000000000000  WA       0     0     16
  [18] .vxtime           PROGBITS         ffffffffff600150  00600150
       0000000000000030  0000000000000000  WA       0     0     16
  [19] .wall_jiffies     PROGBITS         ffffffffff600180  00600180
       0000000000000008  0000000000000000  WA       0     0     16
  [20] .sys_tz           PROGBITS         ffffffffff600190  00600190
       0000000000000008  0000000000000000  WA       0     0     16
  [21] .sysctl_vsyscall  PROGBITS         ffffffffff6001a0  006001a0
       0000000000000004  0000000000000000  WA       0     0     16
  [22] .xtime            PROGBITS         ffffffffff6001b0  006001b0
       0000000000000010  0000000000000000  WA       0     0     16
  [23] .jiffies          PROGBITS         ffffffffff6001c0  006001c0
       0000000000000008  0000000000000000  WA       0     0     16
  [24] .vsyscall_1       PROGBITS         ffffffffff600400  00600400
       000000000000002e  0000000000000000  AX       0     0     1
  [25] .vsyscall_2       PROGBITS         ffffffffff600800  00600800
       000000000000000d  0000000000000000  AX       0     0     1
  [26] .vsyscall_3       PROGBITS         ffffffffff600c00  00600c00
       000000000000000d  0000000000000000  AX       0     0     1
  [27] .data.init_task   PROGBITS         ffffffff8142c000  0062c000
       0000000000002000  0000000000000000  WA       0     0     32
  [28] .init.text        PROGBITS         ffffffff8142e000  0062e000
       00000000000238de  0000000000000000  AX       0     0     1
  [29] .init.data        PROGBITS         ffffffff81452000  00652000
       000000000000c560  0000000000000000  WA       0     0     4096
  [30] .init.setup       PROGBITS         ffffffff8145e560  0065e560
       0000000000000af8  0000000000000000  WA       0     0     8
  [31] .initcall.init    PROGBITS         ffffffff8145f058  0065f058
       0000000000000730  0000000000000000  WA       0     0     8
  [32] .con_initcall.ini PROGBITS         ffffffff8145f788  0065f788
       0000000000000018  0000000000000000  WA       0     0     8
  [33] .security_initcal PROGBITS         ffffffff8145f7a0  0066997c
       0000000000000000  0000000000000000   W       0     0     1
  [34] .altinstructions  PROGBITS         ffffffff8145f7a0  0065f7a0
       0000000000000283  0000000000000000   A       0     0     8
  [35] .altinstr_replace PROGBITS         ffffffff8145fa23  0065fa23
       0000000000000095  0000000000000000  AX       0     0     1
  [36] .exit.text        PROGBITS         ffffffff8145fab8  0065fab8
       0000000000000d5d  0000000000000000  AX       0     0     1
  [37] .init.ramfs       PROGBITS         ffffffff81461000  00661000
       0000000000000086  0000000000000000   A       0     0     1
  [38] .data.percpu      PROGBITS         ffffffff81462000  00662000
       000000000000797c  0000000000000000  WA       0     0     64
  [39] .comment          PROGBITS         0000000000000000  0066997c
       0000000000003d74  0000000000000000           0     0     1
  [40] .debug_aranges    PROGBITS         0000000000000000  0066d6f0
       000000000000d4f0  0000000000000000           0     0     1
  [41] .debug_pubnames   PROGBITS         0000000000000000  0067abe0
       0000000000026a6e  0000000000000000           0     0     1
  [42] .debug_info       PROGBITS         0000000000000000  006a164e
       0000000001ab55e4  0000000000000000           0     0     1
  [43] .debug_abbrev     PROGBITS         0000000000000000  02156c32
       00000000000ca03b  0000000000000000           0     0     1
  [44] .debug_line       PROGBITS         0000000000000000  02220c6d
       0000000000190ccd  0000000000000000           0     0     1
  [45] .debug_frame      PROGBITS         0000000000000000  023b1940
       000000000009ad88  0000000000000000           0     0     8
  [46] .debug_str        PROGBITS         0000000000000000  0244c6c8
       00000000000be96a  0000000000000001  MS       0     0     1
  [47] .debug_ranges     PROGBITS         0000000000000000  0250b032
       000000000003d1e0  0000000000000000           0     0     1
  [48] .note.GNU-stack   PROGBITS         0000000000000000  02548212
       0000000000000000  0000000000000000   X       0     0     1
  [49] .shstrtab         STRTAB           0000000000000000  02548212
       0000000000000273  0000000000000000           0     0     1
  [50] .symtab           SYMTAB           0000000000000000  02549188
       00000000000b3898  0000000000000018          51   20791     8
  [51] .strtab           STRTAB           0000000000000000  025fca20
       0000000000096692  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor
specific)
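
As a rough size check on the listing above, the virtual-address span from
the start of .text to the end of .data.percpu can be computed directly.
This is only arithmetic on the printed section headers; it says nothing
about where the image lands in physical memory, which the listing does not
show. image_span_bytes() is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Span in bytes from the start of the first section to the end of the
 * last one, given the last section's address and size.
 */
uint64_t image_span_bytes(uint64_t first_addr, uint64_t last_addr,
                          uint64_t last_size)
{
    return last_addr + last_size - first_addr;
}

/* Values copied from the readelf listing:
 *   .text        starts at 0xffffffff81000000
 *   .data.percpu starts at 0xffffffff81462000, size 0x797c
 */
uint64_t vmlinux_span_from_listing(void)
{
    return image_span_bytes(0xffffffff81000000ULL,
                            0xffffffff81462000ULL,
                            0x797cULL);
}
```

That comes to 0x46997c bytes, about 4518kB, compared with the 15728kB
reported as present in the DMA zone.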

--
Chris Largret <http://daga.dyndns.org>



* Re: OOM-killer too aggressive?
  2006-02-26 20:39   ` Andi Kleen
@ 2006-02-26 21:04     ` Andrew Morton
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Morton @ 2006-02-26 21:04 UTC (permalink / raw)
  To: Andi Kleen; +Cc: 76306.1226, linux-kernel, largret, axboe

Andi Kleen <ak@muc.de> wrote:
>
> On Sun, Feb 26, 2006 at 10:21:52AM -0800, Andrew Morton wrote:
> > Chuck Ebbert <76306.1226@compuserve.com> wrote:
> > >
> > > Chris Largret is getting repeated OOM kills because of DMA memory
> > > exhaustion:
> > > 
> > > oom-killer: gfp_mask=0xd1, order=3
> > > 
> > 
> > This could be related to the known GFP_DMA oom on some x86_64 machines.
> 
> What known GFP_DMA oom? GFP_DMA allocation should work.
> 

There's a problem on some x86_64 machines which confuses the BIO layer. 
BIO makes simple decisions about bounce pfns and some x86_64 memory layouts
cause them to go wrong.  Net effect: lots of GFP_DMA allocations in the BIO
layer.

http://readlist.com/lists/vger.kernel.org/linux-kernel/36/182357.html
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175173


* Re: OOM-killer too aggressive?
  2006-02-26 18:21 ` Andrew Morton
  2006-02-26 20:39   ` Andi Kleen
@ 2006-02-26 21:06   ` Chris Largret
  2006-02-26 21:31     ` Andrew Morton
  1 sibling, 1 reply; 27+ messages in thread
From: Chris Largret @ 2006-02-26 21:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chuck Ebbert, linux-kernel, axboe, Andi Kleen

On Sun, 2006-02-26 at 10:21 -0800, Andrew Morton wrote: 
> Chuck Ebbert <76306.1226@compuserve.com> wrote:
> >
> > Chris Largret is getting repeated OOM kills because of DMA memory
> > exhaustion:
> > 
> > oom-killer: gfp_mask=0xd1, order=3
> > 
> 
> This could be related to the known GFP_DMA oom on some x86_64 machines.

I'm not sure if this has any bearing on it, but the OOM Killer only does
this when I compile the kernel with SMP support.

> > Or should floppy.c be fixed so it doesn't ask for so much?
> 
> The page allocator uses 32k as the threshold for when-to-try-like-crazy.
> 
> x86_64 should probably be defining its own fd_dma_mem_alloc() which doesn't
> use GFP_DMA.
> 
> --- devel/drivers/block/floppy.c~floppy-false-oom-fix	2006-02-26 10:14:38.000000000 -0800
> +++ devel-akpm/drivers/block/floppy.c	2006-02-26 10:15:04.000000000 -0800
> @@ -278,7 +278,8 @@ static void do_fd_request(request_queue_

Sorry, this didn't help on my machine. I am running the latest kernel
pre-patch (2.6.16-rc4) for testing right now and had to modify the
offsets a little. If there's any output that would help, please let me
know.

--
Chris Largret <http://daga.dyndns.org>



* Re: OOM-killer too aggressive?
  2006-02-26 21:06   ` Chris Largret
@ 2006-02-26 21:31     ` Andrew Morton
  2006-02-26 23:00       ` Chris Largret
                         ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Andrew Morton @ 2006-02-26 21:31 UTC (permalink / raw)
  To: largret; +Cc: 76306.1226, linux-kernel, axboe, ak

Chris Largret <largret@gmail.com> wrote:
>
> On Sun, 2006-02-26 at 10:21 -0800, Andrew Morton wrote: 
> > Chuck Ebbert <76306.1226@compuserve.com> wrote:
> > >
> > > Chris Largret is getting repeated OOM kills because of DMA memory
> > > exhaustion:
> > > 
> > > oom-killer: gfp_mask=0xd1, order=3
> > > 
> > 
> > This could be related to the known GFP_DMA oom on some x86_64 machines.
> 
> I'm not sure if this has any bearing on it, but the OOM Killer only does
> this when I compile the kernel with SMP support.

I doubt if that's related.

> > > Or should floppy.c be fixed so it doesn't ask for so much?
> > 
> > The page allocator uses 32k as the threshold for when-to-try-like-crazy.
> > 
> > x86_64 should probably be defining its own fd_dma_mem_alloc() which doesn't
> > use GFP_DMA.
> > 
> > --- devel/drivers/block/floppy.c~floppy-false-oom-fix	2006-02-26 10:14:38.000000000 -0800
> > +++ devel-akpm/drivers/block/floppy.c	2006-02-26 10:15:04.000000000 -0800
> > @@ -278,7 +278,8 @@ static void do_fd_request(request_queue_
> 
> Sorry, this didn't help on my machine. I am running the latest kernel
> pre-patch (2.6.16-rc4) for testing right now and had to modify the
> offsets a little. If there's any output that would help, please let me
> know.
> 

hm, OK.  I suppose we can hit it with the big hammer, but I'd be reluctant
to merge this patch because it has the potential to hide problems, such as
the as-yet-unfixed bio-uses-ZONE_DMA one.

--- devel/mm/page_alloc.c~a	2006-02-26 13:26:56.000000000 -0800
+++ devel-akpm/mm/page_alloc.c	2006-02-26 13:28:58.000000000 -0800
@@ -1003,7 +1003,8 @@ rebalance:
 						zonelist, alloc_flags);
 		if (page)
 			goto got_pg;
-	} else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+	} else if ((gfp_mask & __GFP_FS) &&
+			!(gfp_mask & (__GFP_NORETRY|__GFP_DMA))) {
 		/*
 		 * Go through the zonelist yet one more time, keep
 		 * very high watermark here, this is only to catch
@@ -1027,7 +1028,7 @@ rebalance:
 	 * <= 3, but that may not be true in other implementations.
 	 */
 	do_retry = 0;
-	if (!(gfp_mask & __GFP_NORETRY)) {
+	if (!(gfp_mask & (__GFP_NORETRY|__GFP_DMA))) {
 		if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
 			do_retry = 1;
 		if (gfp_mask & __GFP_NOFAIL)
_
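
The effect of the two hunks can be modelled in userspace. The sketch below
mirrors only the two conditions touched by the patch: whether the
allocator may go on to the OOM-killer branch, and whether it sets
do_retry. The flag values are an assumption recalled from the 2.6.16-era
gfp.h, and the function names are hypothetical.

```c
/* Assumed 2.6.16-era flag values (verify against the actual tree). */
#define M_GFP_DMA      0x0001u
#define M_GFP_FS       0x0080u
#define M_GFP_REPEAT   0x0400u
#define M_GFP_NOFAIL   0x0800u
#define M_GFP_NORETRY  0x1000u

/* First hunk: may the allocator reach the OOM-killer branch? */
int may_oom_before(unsigned int gfp_mask)
{
    return (gfp_mask & M_GFP_FS) && !(gfp_mask & M_GFP_NORETRY);
}

int may_oom_after(unsigned int gfp_mask)
{
    return (gfp_mask & M_GFP_FS) &&
           !(gfp_mask & (M_GFP_NORETRY | M_GFP_DMA));
}

/* Second hunk: the do_retry decision, parameterised on the "never retry" bits. */
static int do_retry_common(unsigned int gfp_mask, unsigned int order,
                           unsigned int no_retry_bits)
{
    int do_retry = 0;

    if (!(gfp_mask & no_retry_bits)) {
        if (order <= 3 || (gfp_mask & M_GFP_REPEAT))
            do_retry = 1;
        if (gfp_mask & M_GFP_NOFAIL)
            do_retry = 1;
    }
    return do_retry;
}

int do_retry_before(unsigned int gfp_mask, unsigned int order)
{
    return do_retry_common(gfp_mask, order, M_GFP_NORETRY);
}

int do_retry_after(unsigned int gfp_mask, unsigned int order)
{
    return do_retry_common(gfp_mask, order, M_GFP_NORETRY | M_GFP_DMA);
}
```

For the reported request (gfp_mask=0xd1, order=3) the unpatched
conditions allow both the OOM killer and a retry; with the patch, the
__GFP_DMA bit alone disables both, which is also why it could hide other
ZONE_DMA problems.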



* Re: OOM-killer too aggressive?
  2006-02-26 18:39 ` OOM-killer too aggressive? Robert Hancock
@ 2006-02-26 21:56   ` Marcelo Tosatti
  2006-02-26 20:56     ` Chris Largret
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2006-02-26 21:56 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Chuck Ebbert, linux-kernel

On Sun, Feb 26, 2006 at 12:39:31PM -0600, Robert Hancock wrote:
> Chuck Ebbert wrote:
> >DMA free:44kB min:32kB low:40kB high:48kB active:0kB inactive:0kB
> >present:15728kB pages_scanned:0 all_unreclaimable? yes
> 
> I think the big question is who used up all the DMA zone.. Surely not 
> the floppy driver..

The kernel text and data? "readelf -S vmlinux" output would be useful.



* Re: OOM-killer too aggressive?
  2006-02-26 21:31     ` Andrew Morton
@ 2006-02-26 23:00       ` Chris Largret
  2006-02-27  0:20         ` Andrew Morton
  2006-02-26 23:47       ` Andi Kleen
  2006-02-26 23:51       ` Andi Kleen
  2 siblings, 1 reply; 27+ messages in thread
From: Chris Largret @ 2006-02-26 23:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: 76306.1226, linux-kernel, axboe, ak

On Sun, 2006-02-26 at 13:31 -0800, Andrew Morton wrote:
> > Sorry, this didn't help on my machine. I am running the latest kernel
> > pre-patch (2.6.16-rc4) for testing right now and had to modify the
> > offsets a little. If there's any output that would help, please let me
> > know.
>
> hm, OK.  I suppose we can hit it with the big hammer, but I'd be reluctant
> to merge this patch because it has the potential to hide problems, such as
> the as-yet-unfixed bio-uses-ZONE_DMA one.
> 
> --- devel/mm/page_alloc.c~a	2006-02-26 13:26:56.000000000 -0800
> +++ devel-akpm/mm/page_alloc.c	2006-02-26 13:28:58.000000000 -0800
> @@ -1003,7 +1003,8 @@ rebalance:

I reversed the previous patch before applying this one. If they were
supposed to be used together, let me know.

From the initial results it looks like the OOM Killer is not being used
now. Unfortunately I can't check with dmesg because right after login is
initiated (but before I get a chance to type anything) there is a
"Kernel BUG" message. This is all that is printed when a serial
console is in use. If you need the rest of the information, let me know
and I'll see about typing it up.

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/vmalloc.c:352
invalid opcode: 0000 [1] SMP 
CPU 1 
Modules linked in: snd_pcm_oss snd_mixer_oss md5 ipv6 ipt_recent
ipt_REJECT xt_state xt_tcpudp iptable_filter ip_tables x_tables nfs
lockd nfs_acl sunrpc uhci_hcd r8169 ohci1394 ieee1394 emu10k1_gp
gameport snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_pcm
snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd
tda9887 tuner cx8800 cx88xx video_buf ir_common tveeprom compat_ioctl32
v4l1_compat v4l2_common btcx_risc videodev forcedeth usblp ohci_hcd
i2c_nforce2 ehci_hc

--
Chris Largret <http://daga.dyndns.org>



* Re: OOM-killer too aggressive?
@ 2006-02-26 23:32 Chuck Ebbert
  0 siblings, 0 replies; 27+ messages in thread
From: Chuck Ebbert @ 2006-02-26 23:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: axboe, largret, linux-kernel

In-Reply-To: <20060226203917.GA76858@muc.de>

On Sun, 26 Feb 2006 at 21:39:17 +0100, Andi Kleen wrote:
> On Sun, Feb 26, 2006 at 10:21:52AM -0800, Andrew Morton wrote:
> > Chuck Ebbert <76306.1226@compuserve.com> wrote:
> > >
> > > Chris Largret is getting repeated OOM kills because of DMA memory
> > > exhaustion:
> > > 
> > > oom-killer: gfp_mask=0xd1, order=3
> > > 
> > 
> > This could be related to the known GFP_DMA oom on some x86_64 machines.
> 
> What known GFP_DMA oom? GFP_DMA allocation should work.

        http://marc.theaimsgroup.com/?t=113895864600001&r=1&w=2
        http://marc.theaimsgroup.com/?t=113766047000002&r=1&w=2

-- 
Chuck
"Equations are the Devil's sentences."  --Stephen Colbert



* Re: OOM-killer too aggressive?
  2006-02-26 21:31     ` Andrew Morton
  2006-02-26 23:00       ` Chris Largret
@ 2006-02-26 23:47       ` Andi Kleen
  2006-02-26 23:51       ` Andi Kleen
  2 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2006-02-26 23:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: largret, 76306.1226, linux-kernel, axboe

> hm, OK.  I suppose we can hit it with the big hammer, but I'd be reluctant
> to merge this patch because it has the potential to hide problems, such as
> the as-yet-unfixed bio-uses-ZONE_DMA one.

Better would be to fix the block layer. I think something like the patch
below would be better (only lightly tested: it booted on a 6GB x86-64 box).

It is overly pessimistic on systems with a real IOMMU that can even remap
to DMA addresses < 4GB; in the future those might want to define
some ARCH_HAS macro so it can be checked here.

Does that patch fix the problem?

That said, adding __GFP_NORETRY to the floppy allocation is probably still a
good idea. I will do that change here.

-Andi


Disable block layer bouncing for most memory on 64bit systems

The low level PCI DMA mapping functions should handle it in most cases.

This should fix problems with depleting the DMA zone early. The old
code used precious GFP_DMA memory in many cases where it was not needed.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/block/ll_rw_blk.c
===================================================================
--- linux.orig/block/ll_rw_blk.c
+++ linux/block/ll_rw_blk.c
@@ -625,26 +625,32 @@ static inline int ordered_bio_endio(stru
  *    Different hardware can have different requirements as to what pages
  *    it can do I/O directly to. A low level driver can call
  *    blk_queue_bounce_limit to have lower memory pages allocated as bounce
- *    buffers for doing I/O to pages residing above @page. By default
- *    the block layer sets this to the highest numbered "low" memory page.
+ *    buffers for doing I/O to pages residing above @page. 
  **/
 void blk_queue_bounce_limit(request_queue_t *q, u64 dma_addr)
 {
 	unsigned long bounce_pfn = dma_addr >> PAGE_SHIFT;
+	int dma = 0;
 
-	/*
-	 * set appropriate bounce gfp mask -- unfortunately we don't have a
-	 * full 4GB zone, so we have to resort to low memory for any bounces.
-	 * ISA has its own < 16MB zone.
-	 */
-	if (bounce_pfn < blk_max_low_pfn) {
+	q->bounce_gfp = GFP_NOIO;
+#if BITS_PER_LONG == 64
+	/* Assume anything >= 4GB can be handled by IOMMU. 
+	   Actually some IOMMUs can handle everything, but I don't
+	   know of a way to test this here. */
+	if (bounce_pfn < (0xffffffff>>PAGE_SHIFT))
+		dma = 1;
+	q->bounce_pfn = max_low_pfn;
+#else
+	if (bounce_pfn < blk_max_low_pfn)
+		dma = 1;
+	q->bounce_pfn = bounce_pfn;
+#endif
+	if (dma) {	
 		BUG_ON(dma_addr < BLK_BOUNCE_ISA);
 		init_emergency_isa_pool();
 		q->bounce_gfp = GFP_NOIO | GFP_DMA;
-	} else
-		q->bounce_gfp = GFP_NOIO;
-
-	q->bounce_pfn = bounce_pfn;
+		q->bounce_pfn = bounce_pfn;
+	}
 }
 
 EXPORT_SYMBOL(blk_queue_bounce_limit);
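
The difference between the old and new decision can be sketched in
userspace. The model below follows only the branch structure of the diff
(the BITS_PER_LONG == 64 side for the new code) and ignores the BUG_ON and
init_emergency_isa_pool() calls. It assumes 4kB pages and treats
blk_max_low_pfn and max_low_pfn as the same value; the struct and function
names are hypothetical.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4kB pages */

struct bounce_cfg {
    int use_gfp_dma;      /* 1 if bounce_gfp would include GFP_DMA */
    uint64_t bounce_pfn;  /* value stored in q->bounce_pfn */
};

/* Old logic: any limit below the top of low memory bounces via GFP_DMA. */
struct bounce_cfg bounce_limit_old(uint64_t dma_addr, uint64_t max_low_pfn)
{
    struct bounce_cfg c;

    c.bounce_pfn = dma_addr >> MODEL_PAGE_SHIFT;
    c.use_gfp_dma = c.bounce_pfn < max_low_pfn;
    return c;
}

/* New 64-bit logic: only limits below 4GB bounce via GFP_DMA. */
struct bounce_cfg bounce_limit_new64(uint64_t dma_addr, uint64_t max_low_pfn)
{
    struct bounce_cfg c;
    uint64_t pfn = dma_addr >> MODEL_PAGE_SHIFT;

    c.use_gfp_dma = pfn < (0xffffffffULL >> MODEL_PAGE_SHIFT);
    c.bounce_pfn = c.use_gfp_dma ? pfn : max_low_pfn;
    return c;
}
```

On a hypothetical 6GB box (max_low_pfn = 0x180000), a device with a
32-bit limit (dma_addr = 0xffffffff) gets GFP_DMA bounce buffers under the
old logic but none under the new one, where bounce_pfn is raised to
max_low_pfn and the remapping is left to the PCI DMA mapping layer. An ISA
limit of 0xffffff still selects GFP_DMA in both.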


* Re: OOM-killer too aggressive?
  2006-02-26 21:31     ` Andrew Morton
  2006-02-26 23:00       ` Chris Largret
  2006-02-26 23:47       ` Andi Kleen
@ 2006-02-26 23:51       ` Andi Kleen
  2006-02-27 22:30         ` Christoph Lameter
  2 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-02-26 23:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: largret, 76306.1226, linux-kernel, axboe


Thinking about this more I think we need a __GFP_NOOOM for other
purposes too. e.g. the x86-64 IOMMU code tries to do similar
fallbacks and I suspect it will be hit by the OOM killer too.

-Andi



* Re: OOM-killer too aggressive?
  2006-02-26 23:00       ` Chris Largret
@ 2006-02-27  0:20         ` Andrew Morton
  2006-02-27  1:01           ` Chris Largret
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2006-02-27  0:20 UTC (permalink / raw)
  To: largret; +Cc: 76306.1226, linux-kernel, axboe, ak

Chris Largret <largret@gmail.com> wrote:
>
> On Sun, 2006-02-26 at 13:31 -0800, Andrew Morton wrote:
> > > Sorry, this didn't help on my machine. I am running the latest kernel
> > > pre-patch (2.6.16-rc4) for testing right now and had to modify the
> > > offsets a little. If there's any output that would help, please let me
> > > know.
> >
> > hm, OK.  I suppose we can hit it with the big hammer, but I'd be reluctant
> > to merge this patch because it has the potential to hide problems, such as
> > the as-yet-unfixed bio-uses-ZONE_DMA one.
> > 
> > --- devel/mm/page_alloc.c~a	2006-02-26 13:26:56.000000000 -0800
> > +++ devel-akpm/mm/page_alloc.c	2006-02-26 13:28:58.000000000 -0800
> > @@ -1003,7 +1003,8 @@ rebalance:
> 
> I reversed the previous patch before applying this one. If they were
> supposed to be used together, let me know.

No, that's right.

> From the initial results it looks like the OOM Killer is not being used
> now. Unfortunately I can't check with dmesg because right after login is
> initiated (but before I get a chance to type anything) there is a
> "Kernel BUG" message. This is all that is printed when a serial
> console is in use. If you need the rest of the information, let me know
> and I'll see about typing it up.
> 
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mm/vmalloc.c:352
> invalid opcode: 0000 [1] SMP 
> CPU 1 
> Modules linked in: snd_pcm_oss snd_mixer_oss md5 ipv6 ipt_recent
> ipt_REJECT xt_state xt_tcpudp iptable_filter ip_tables x_tables nfs
> lockd nfs_acl sunrpc uhci_hcd r8169 ohci1394 ieee1394 emu10k1_gp
> gameport snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_pcm
> snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd
> tda9887 tuner cx8800 cx88xx video_buf ir_common tveeprom compat_ioctl32
> v4l1_compat v4l2_common btcx_risc videodev forcedeth usblp ohci_hcd
> i2c_nforce2 ehci_hc

Sigh.  The floppy driver's just a joke.  Looks like the failed allocation
fell back to vmalloc then screwed it up.

I rather doubt whether x86_64 needs to be constraining itself to the ISA
DMA region anyway - something for Andi to look at please?

You could try this one instead, although I guess I'll need to fire up the
test box for this bug.


--- devel/include/asm-x86_64/floppy.h~b	2006-02-26 16:15:44.000000000 -0800
+++ devel-akpm/include/asm-x86_64/floppy.h	2006-02-26 16:16:21.000000000 -0800
@@ -40,7 +40,7 @@
 #define fd_disable_irq()        disable_irq(FLOPPY_IRQ)
 #define fd_free_irq()		free_irq(FLOPPY_IRQ, NULL)
 #define fd_get_dma_residue()    SW._get_dma_residue(FLOPPY_DMA)
-#define fd_dma_mem_alloc(size)	SW._dma_mem_alloc(size)
+#define fd_dma_mem_alloc(size)	__alloc_pages(GFP_KERNEL|__GFP_DMA32, get_order(size))
 #define fd_dma_setup(addr, size, mode, io) SW._dma_setup(addr, size, mode, io)
 
 #define FLOPPY_CAN_FALLBACK_ON_NODMA
_



* Re: OOM-killer too aggressive?
  2006-02-26 20:56     ` Chris Largret
@ 2006-02-27  0:22       ` Marcelo Tosatti
  2006-02-27  1:48         ` Chris Largret
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2006-02-27  0:22 UTC (permalink / raw)
  To: Chris Largret; +Cc: Robert Hancock, Chuck Ebbert, linux-kernel

On Sun, Feb 26, 2006 at 12:56:10PM -0800, Chris Largret wrote:
> On Sun, 2006-02-26 at 15:56 -0600, Marcelo Tosatti wrote:
> > On Sun, Feb 26, 2006 at 12:39:31PM -0600, Robert Hancock wrote:
> > > I think the big question is who used up all the DMA zone.. Surely not 
> > > the floppy driver..
> > 
> > The kernel text and data? "readelf -S vmlinux" output would be useful.
> 
> $ readelf -S vmlinux
> There are 52 section headers, starting at offset 0x2548488:

<snip>

>   [49] .shstrtab         STRTAB           0000000000000000  02548212
>        0000000000000273  0000000000000000           0     0     1
>   [50] .symtab           SYMTAB           0000000000000000  02549188
>        00000000000b3898  0000000000000018          51   20791     8
>   [51] .strtab           STRTAB           0000000000000000  025fca20
>        0000000000096692  0000000000000000           0     0     1

More than 40MB, that should partially explain it...
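
As a rough check of that figure, the on-disk size of vmlinux can be
estimated from the readelf excerpt above. This is only a sketch: it
assumes .strtab (offset 0x25fca20, size 0x96692) is the last thing in
the file, which the excerpt suggests but does not prove, and the helper
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* File offset just past a section: its start offset plus its size in bytes. */
uint64_t section_end(uint64_t offset, uint64_t size)
{
    assert(offset + size >= offset);    /* guard against wrap-around */
    return offset + size;
}

/* Convert a byte count to decimal megabytes (10^6 bytes). */
double bytes_to_mb(uint64_t bytes)
{
    return (double)bytes / 1e6;
}

/* Print the estimate for the .strtab values quoted from readelf above. */
void print_vmlinux_estimate(void)
{
    uint64_t end = section_end(0x25fca20ULL, 0x96692ULL);

    printf("vmlinux ends near byte %llu (about %.1f MB)\n",
           (unsigned long long)end, bytes_to_mb(end));
}
```

Under that assumption the file ends at byte 40448178, a little over
40 MB in decimal units.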


* Re: OOM-killer too aggressive?
  2006-02-27  0:20         ` Andrew Morton
@ 2006-02-27  1:01           ` Chris Largret
  2006-02-27  1:57             ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Largret @ 2006-02-27  1:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: 76306.1226, linux-kernel, axboe, ak

On Sun, 2006-02-26 at 16:20 -0800, Andrew Morton wrote:

> Sigh.  The floppy driver's just a joke.  Looks like the failed allocation
> fell back to vmalloc then screwed it up.

> You could try this one instead, although I guess I'll need to fire up the
> test box for this bug.

> --- devel/include/asm-x86_64/floppy.h~b	2006-02-26 16:15:44.000000000 -0800
> +++ devel-akpm/include/asm-x86_64/floppy.h	2006-02-26 16:16:21.000000000 -0800
> @@ -40,7 +40,7 @@
>  #define fd_disable_irq()        disable_irq(FLOPPY_IRQ)
>  #define fd_free_irq()		free_irq(FLOPPY_IRQ, NULL)
>  #define fd_get_dma_residue()    SW._get_dma_residue(FLOPPY_DMA)
> -#define fd_dma_mem_alloc(size)	SW._dma_mem_alloc(size)
> +#define fd_dma_mem_alloc(size)	__alloc_pages(GFP_KERNEL|__GFP_DMA32, get_order(size))
>  #define fd_dma_setup(addr, size, mode, io) SW._dma_setup(addr, size, mode, io)
>  
>  #define FLOPPY_CAN_FALLBACK_ON_NODMA

CC      drivers/block/floppy.o
drivers/block/floppy.c: In function `raw_cmd_copyin':
drivers/block/floppy.c:3245: error: too few arguments to function
`__alloc_pages'
drivers/block/floppy.c: In function `floppy_open':
drivers/block/floppy.c:3738: error: too few arguments to function
`__alloc_pages'
drivers/block/floppy.c:3742: error: too few arguments to function
`__alloc_pages'
make[2]: *** [drivers/block/floppy.o] Error 1
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2


I'm sorry, but I'm not sure where to start for looking up the definition
for __alloc_pages().

--
Chris Largret <http://daga.dyndns.org>



* Re: OOM-killer too aggressive?
  2006-02-27  0:22       ` Marcelo Tosatti
@ 2006-02-27  1:48         ` Chris Largret
  2006-02-27 15:47           ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Largret @ 2006-02-27  1:48 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Robert Hancock, Chuck Ebbert, linux-kernel, Andrew Morton

On Sun, 2006-02-26 at 18:22 -0600, Marcelo Tosatti wrote:
> On Sun, Feb 26, 2006 at 12:56:10PM -0800, Chris Largret wrote:
> > $ readelf -S vmlinux
> > There are 52 section headers, starting at offset 0x2548488:
> 
> <snip>
> 
> >   [49] .shstrtab         STRTAB           0000000000000000  02548212
> >        0000000000000273  0000000000000000           0     0     1
> >   [50] .symtab           SYMTAB           0000000000000000  02549188
> >        00000000000b3898  0000000000000018          51   20791     8
> >   [51] .strtab           STRTAB           0000000000000000  025fca20
> >        0000000000096692  0000000000000000           0     0     1
> 
> More than 40MB, that should partially explain it...

Ouch. I hadn't noticed that and will have to see about bringing that
down a little. It's the same size when compiling without SMP, and the
OOM Killer doesn't cause problems then. There is something else that is
causing these problems.

From using ls on the *.o files, it appears (as expected) that most of
this is the built-in drivers. The pruning should be fun. :)

--
Chris Largret <http://daga.dyndns.org>



* Re: OOM-killer too aggressive?
  2006-02-27  1:01           ` Chris Largret
@ 2006-02-27  1:57             ` Andrew Morton
  2006-02-27  6:34               ` Chris Largret
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2006-02-27  1:57 UTC (permalink / raw)
  To: largret; +Cc: 76306.1226, linux-kernel, axboe, ak

Chris Largret <largret@gmail.com> wrote:
>
>  drivers/block/floppy.c:3245: error: too few arguments to function
>  `__alloc_pages'

doh.

--- devel/include/asm-x86_64/floppy.h~b	2006-02-26 16:15:44.000000000 -0800
+++ devel-akpm/include/asm-x86_64/floppy.h	2006-02-26 17:57:02.000000000 -0800
@@ -40,7 +40,7 @@
 #define fd_disable_irq()        disable_irq(FLOPPY_IRQ)
 #define fd_free_irq()		free_irq(FLOPPY_IRQ, NULL)
 #define fd_get_dma_residue()    SW._get_dma_residue(FLOPPY_DMA)
-#define fd_dma_mem_alloc(size)	SW._dma_mem_alloc(size)
+#define fd_dma_mem_alloc(size)	alloc_pages(GFP_KERNEL|__GFP_DMA32, get_order(size))
 #define fd_dma_setup(addr, size, mode, io) SW._dma_setup(addr, size, mode, io)
 
 #define FLOPPY_CAN_FALLBACK_ON_NODMA
_



* Re: OOM-killer too aggressive?
  2006-02-27  1:57             ` Andrew Morton
@ 2006-02-27  6:34               ` Chris Largret
  0 siblings, 0 replies; 27+ messages in thread
From: Chris Largret @ 2006-02-27  6:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: 76306.1226, linux-kernel, axboe, ak

On Sun, 2006-02-26 at 17:57 -0800, Andrew Morton wrote:
> Chris Largret <largret@gmail.com> wrote:
> >
> >  drivers/block/floppy.c:3245: error: too few arguments to function
> >  `__alloc_pages'
> 
> doh.
> 
> --- devel/include/asm-x86_64/floppy.h~b	2006-02-26 16:15:44.000000000 -0800
> +++ devel-akpm/include/asm-x86_64/floppy.h	2006-02-26 17:57:02.000000000 -0800

Earlier I said that there was a "Kernel BUG" and all processing stopped
right after the login prompt was displayed (but before I could type
anything). Now the kernel continues to work, but the messages are a
little disconcerting. Here is the version with a backtrace (from dmesg):


Bad page state in process 'swapper'
page:ffff810001539168 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015391a0 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015391d8 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff810001539210 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff810001539248 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff810001539280 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015392b8 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015392f0 flags:0x4000000000000400 mapping:0000000000000000 mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85} <ffffffff8104f419>{__free_pages_ok+179}
       <ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
       <ffffffff8117db99>{_fd_dma_mem_free+43} <ffffffff81182ca0>{floppy_release_irq_and_dma+243}
       <ffffffff811809fc>{set_dor+323} <ffffffff81183d50>{motor_off_callback+0}
       <ffffffff81183d74>{motor_off_callback+36} <ffffffff81034e97>{run_timer_softirq+366}
       <ffffffff810314e5>{__do_softirq+85} <ffffffff8100bec6>{call_softirq+30}
       <ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
       <ffffffff810160b8>{smp_apic_timer_interrupt+75} <ffffffff81008d0f>{default_idle+0}
       <ffffffff8100b820>{apic_timer_interrupt+132} <EOI> <ffffffff81008d0f>{default_idle+0}
       <ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
       <ffffffff814390a1>{start_secondary+1189}


--
Chris Largret <http://daga.dyndns.org>



* Re: OOM-killer too aggressive?
  2006-02-27  1:48         ` Chris Largret
@ 2006-02-27 15:47           ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2006-02-27 15:47 UTC (permalink / raw)
  To: Chris Largret; +Cc: Robert Hancock, Chuck Ebbert, linux-kernel, Andrew Morton

On Sun, Feb 26, 2006 at 05:48:15PM -0800, Chris Largret wrote:
> On Sun, 2006-02-26 at 18:22 -0600, Marcelo Tosatti wrote:
> > On Sun, Feb 26, 2006 at 12:56:10PM -0800, Chris Largret wrote:
> > > $ readelf -S vmlinux
> > > There are 52 section headers, starting at offset 0x2548488:
> > 
> > <snip>
> > 
> > >   [49] .shstrtab         STRTAB           0000000000000000  02548212
> > >        0000000000000273  0000000000000000           0     0     1
> > >   [50] .symtab           SYMTAB           0000000000000000  02549188
> > >        00000000000b3898  0000000000000018          51   20791     8
> > >   [51] .strtab           STRTAB           0000000000000000  025fca20
> > >        0000000000096692  0000000000000000           0     0     1
> > 
> > More than 40MB, that should partially explain it...
> 
> Ouch. I hadn't noticed that and will have to see about bringing that
> down a little. It's the same size when compiling without SMP, and the
> OOM Killer doesn't cause problems then. There is something else that is
> causing these problems.

Indeed, this only explains why the DMA zone is full.

The floppy driver is asking for a large contiguous chunk of memory
in the DMA zone, which the allocator tries to satisfy by killing
applications.

Andrew's patch makes the allocator give up more easily, which allows the
driver to fall back to non-contiguous memory (that is the real problem).

> From using ls on the *.o files, it appears (as expected) that most of
> this is the built-in drivers. The pruning should be fun. :)

There should be no need to prune it to fix the OOM issue; it only explains
why the DMA memory is full.
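
To put a number on "large contiguous chunk": the OOM report at the top
of the thread shows order=3, i.e. 2^3 = 8 physically contiguous pages.
The sketch below is a userspace stand-in for the kernel's get_order()
calculation, not the kernel code itself. It assumes 4 kB pages (the
normal x86-64 page size), and order_for_size() and pages_for_order()
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch: 4 kB, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * Hypothetical userspace analogue of the kernel's get_order().
 */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;

    assert(size > 0);
    while (span < size) {
        span <<= 1;     /* each order doubles the contiguous span */
        order++;
    }
    return order;
}

/* Number of contiguous pages an allocation of the given order needs. */
size_t pages_for_order(unsigned int order)
{
    return (size_t)1 << order;
}
```

With these assumptions a 32 kB request maps to order 3, so all 8 pages
have to be free and adjacent inside the 16MB DMA zone at once.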



* Re: OOM-killer too aggressive?
  2006-02-26 23:51       ` Andi Kleen
@ 2006-02-27 22:30         ` Christoph Lameter
  2006-02-28  0:41           ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Lameter @ 2006-02-27 22:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, largret, 76306.1226, linux-kernel, axboe

On Sun, 27 Feb 2006, Andi Kleen wrote:

> Thinking about this more I think we need a __GFP_NOOOM for other
> purposes too. e.g. the x86-64 IOMMU code tries to do similar
> fallbacks and I suspect it will be hit by the OOM killer too.

Isn't this also a constrained allocation? We could expand the check to also 
catch these types of restrictions and fail.



* Re: OOM-killer too aggressive?
  2006-02-27 22:30         ` Christoph Lameter
@ 2006-02-28  0:41           ` Andi Kleen
  2006-02-28  0:59             ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-02-28  0:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, largret, 76306.1226, linux-kernel, axboe

On Mon, Feb 27, 2006 at 02:30:02PM -0800, Christoph Lameter wrote:
> On Sun, 27 Feb 2006, Andi Kleen wrote:
> 
> > Thinking about this more I think we need a __GFP_NOOOM for other
> > purposes too. e.g. the x86-64 IOMMU code tries to do similar
> > fallbacks and I suspect it will be hit by the OOM killer too.
> 
> Isn't this also a constrained allocation? We could expand the check to also 
> catch these types of restrictions and fail.

No, it uses the full fallback zone list of the target node, not a custom
one. Would be hard to detect without a flag.

Maybe __GFP_NORETRY is actually good enough for this purpose. Opinions?

-Andi


* Re: OOM-killer too aggressive?
  2006-02-28  0:41           ` Andi Kleen
@ 2006-02-28  0:59             ` Andrew Morton
  2006-02-28  1:03               ` Christoph Lameter
  2006-02-28  1:25               ` Andi Kleen
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Morton @ 2006-02-28  0:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: clameter, largret, 76306.1226, linux-kernel, axboe

Andi Kleen <ak@muc.de> wrote:
>
> On Mon, Feb 27, 2006 at 02:30:02PM -0800, Christoph Lameter wrote:
> > On Sun, 27 Feb 2006, Andi Kleen wrote:
> > 
> > > Thinking about this more I think we need a __GFP_NOOOM for other
> > > purposes too. e.g. the x86-64 IOMMU code tries to do similar
> > > fallbacks and I suspect it will be hit by the OOM killer too.
> > 
> > Isn't this also a constrained allocation? We could expand the check to also 
> > catch these types of restrictions and fail.
> 
> No, it uses the full fallback zone list of the target node, not a custom
> one. Would be hard to detect without a flag.
> 
> Maybe __GFP_NORETRY is actually good enough for this purpose. Opinions?
> 

I was thinking that your __GFP_NOOOM was a thinko.  How would it differ
from __GFP_NORETRY?


* Re: OOM-killer too aggressive?
  2006-02-28  0:59             ` Andrew Morton
@ 2006-02-28  1:03               ` Christoph Lameter
  2006-02-28  1:25               ` Andi Kleen
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Lameter @ 2006-02-28  1:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, largret, 76306.1226, linux-kernel, axboe

On Mon, 27 Feb 2006, Andrew Morton wrote:

> > On Mon, Feb 27, 2006 at 02:30:02PM -0800, Christoph Lameter wrote:
> > > Isn't this also a constrained allocation? We could expand the check to also 
> > > catch these types of restrictions and fail.
> > 
> > No, it uses the full fallback zone list of the target node, not a custom
> > one. Would be hard to detect without a flag.

Right but it specifies in its flags that not all system memory can satisfy 
this particular memory request. That fact may be detected by the 
out_of_memory() function. We could do something special there instead of 
OOMing.





* Re: OOM-killer too aggressive?
  2006-02-28  0:59             ` Andrew Morton
  2006-02-28  1:03               ` Christoph Lameter
@ 2006-02-28  1:25               ` Andi Kleen
  2006-02-28  1:38                 ` Andrew Morton
  1 sibling, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-02-28  1:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, largret, 76306.1226, linux-kernel, axboe

> I was thinking that your __GFP_NOOOM was a thinko.  How would it differ
> from __GFP_NORETRY?

__GFP_NORETRY seems to skip at least one retry pass as far as I can see.
__GFP_NOOOM wouldn't. But perhaps the additional pass only makes sense
with oom killing? I'm not sure - that is why I was asking.

-Andi


* Re: OOM-killer too aggressive?
  2006-02-28  1:25               ` Andi Kleen
@ 2006-02-28  1:38                 ` Andrew Morton
  2006-02-28 12:09                   ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2006-02-28  1:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: clameter, largret, 76306.1226, linux-kernel, axboe

Andi Kleen <ak@muc.de> wrote:
>
> > I was thinking that your __GFP_NOOOM was a thinko.  How would it differ
> > from __GFP_NORETRY?
> 
> __GFP_NORETRY seems to skip at least one retry pass as far as I can see.
> __GFP_NOOOM wouldn't. But perhaps the additional pass only makes sense
> with oom killing? I'm not sure - that is why I was asking.
> 

Oh, OK.  That final get_page_from_freelist() is allegedly to see if a
parallel oom-killing freed some pages - we already know that
try_to_free_pages() didn't work.

I rather doubt that it'll make any difference.
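
The difference being discussed can be sketched as a toy decision
function. This is only a model of the behaviour described in this
thread, not the real __alloc_pages(): the struct, the enum, the flag
value and sketch_alloc() are all hypothetical, and the three booleans
stand in for the first freelist attempt, try_to_free_pages(), and the
final get_page_from_freelist() respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum alloc_result { ALLOC_OK, ALLOC_FAILED, ALLOC_OOM_KILL };

/* Hypothetical flag bit standing in for __GFP_NORETRY. */
#define SKETCH_NORETRY 0x1u

struct sketch_state {
    bool freelist_has_page;   /* first freelist attempt succeeds */
    bool reclaim_frees_page;  /* reclaim (try_to_free_pages) succeeds */
    bool parallel_oom_freed;  /* a concurrent OOM kill freed memory */
};

enum alloc_result sketch_alloc(unsigned int flags, const struct sketch_state *s)
{
    assert(s != NULL);

    if (s->freelist_has_page)
        return ALLOC_OK;
    if (s->reclaim_frees_page)
        return ALLOC_OK;

    /* NORETRY callers give up here: no final pass, no OOM kill. */
    if (flags & SKETCH_NORETRY)
        return ALLOC_FAILED;

    /* Final pass: only helps if someone else's OOM kill freed pages. */
    if (s->parallel_oom_freed)
        return ALLOC_OK;

    return ALLOC_OOM_KILL;
}
```

In this model the only case where a no-OOM flag that kept the final
pass would behave differently from the NORETRY-style flag is when a
parallel OOM kill has just freed memory.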


* Re: OOM-killer too aggressive?
  2006-02-28  1:38                 ` Andrew Morton
@ 2006-02-28 12:09                   ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2006-02-28 12:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, largret, 76306.1226, linux-kernel, axboe

On Mon, Feb 27, 2006 at 05:38:30PM -0800, Andrew Morton wrote:
> Andi Kleen <ak@muc.de> wrote:
> >
> > > I was thinking that your __GFP_NOOOM was a thinko.  How would it differ
> > > from __GFP_NORETRY?
> > 
> > __GFP_NORETRY seems to skip at least one retry pass as far as I can see.
> > __GFP_NOOOM wouldn't. But perhaps the additional pass only makes sense
> > with oom killing? I'm not sure - that is why I was asking.
> > 
> 
> Oh, OK.  That final get_page_from_freelist() is allegedly to see if a
> parallel oom-killing freed some pages - we already know that
> try_to_free_pages() didn't work.
> 
> I rather doubt that it'll make any difference.

I switched over the x86-64 IOMMU code and floppy code to use
__GFP_NORETRY now.

But perhaps it would be better to rename it to __GFP_NOOOM
because I think that would express its meaning better.

-Andi

