* 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
@ 2008-05-21 11:30 Miquel van Smoorenburg
2008-05-21 12:49 ` Glauber Costa
0 siblings, 1 reply; 11+ messages in thread
From: Miquel van Smoorenburg @ 2008-05-21 11:30 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, Glauber Costa
I've recently switched some of my boxes from a 32 to a
64 bit kernel. These are usenet server boxes that do
a lot of I/O. They are running 2.6.24 / 2.6.25
Every 15 minutes a cronjob calls a management utility, tw_cli,
to read the raid status of the 3ware disk arrays. That
often fails with a segmentation violation ..
tw_cli: page allocation failure. order:0, mode:0x10d0
Pid: 9296, comm: tw_cli Not tainted 2.6.25.4 #2
Call Trace:
[<ffffffff802604b6>] __alloc_pages+0x336/0x390
[<ffffffff80210ff4>] dma_alloc_pages+0x24/0xa0
[<ffffffff80211113>] dma_alloc_coherent+0xa3/0x2e0
[<ffffffff8804a58f>] :3w_9xxx:twa_chrdev_ioctl+0x11f/0x810
[<ffffffff802826c0>] chrdev_open+0x0/0x1c0
[<ffffffff8027d997>] __dentry_open+0x197/0x210
[<ffffffff8028c4ed>] vfs_ioctl+0x7d/0xa0
[<ffffffff8028c584>] do_vfs_ioctl+0x74/0x2d0
[<ffffffff8028c829>] sys_ioctl+0x49/0x80
[<ffffffff8020b29b>] system_call_after_swapgs+0x7b/0x80
Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
CPU 2: hi: 0, btch: 1 usd: 0
CPU 3: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 60
CPU 1: hi: 186, btch: 31 usd: 185
CPU 2: hi: 186, btch: 31 usd: 176
CPU 3: hi: 186, btch: 31 usd: 165
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 120
CPU 1: hi: 186, btch: 31 usd: 164
CPU 2: hi: 186, btch: 31 usd: 177
CPU 3: hi: 186, btch: 31 usd: 182
Active:265929 inactive:1657355 dirty:663189 writeback:62890 unstable:0
free:49079 slab:65923 mapped:1238 pagetables:927 bounce:0
DMA free:12308kB min:184kB low:228kB high:276kB active:0kB inactive:0kB present:11816kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3255 8053 8053
DMA32 free:94200kB min:52912kB low:66140kB high:79368kB active:440616kB inactive:2505772kB present:3333792kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4797 4797
Normal free:86792kB min:77968kB low:97460kB high:116952kB active:623100kB inactive:4126872kB present:4912640kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 5*8kB 2*16kB 6*32kB 4*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12308kB
DMA32: 150*4kB 5*8kB 2299*16kB 120*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 13*4096kB = 94512kB
Normal: 462*4kB 3803*8kB 123*16kB 24*32kB 2*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 18*4096kB = 109760kB
1653409 total pagecache pages
Swap cache: add 5748, delete 5411, find 4317/4852
Free swap = 4588488kB
Total swap = 4594580kB
Free swap: 4588488kB
2293760 pages of RAM
249225 reserved pages
1658761 pages shared
337 pages swap cached
(this is easily reproducible by pinning a lot of memory with
mmap/mlock, say 6 GB on an 8 GB box, while running
cat /dev/zero > filename, then invoking tw_cli)
Now this appears to happen because dma_alloc_coherent() in
pci-dma_64.c does this:
/* Don't invoke OOM killer */
gfp |= __GFP_NORETRY;
However, if you read mm/page_alloc.c you can see that this not only
prevents invoking the OOM killer, it also does what it says:
no retries when allocating memory.
That means that dma_alloc_coherent(..., GFP_KERNEL) can become
unreliable. Bad news.
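For readers decoding the trace, the `mode:0x10d0` value in the allocation failure can be split into its gfp bits. The following is a minimal userspace sketch, not kernel code: the `GFP_BIT_*` names and `gfp_decode()` are hypothetical local stand-ins, and the bit values for `__GFP_WAIT`, `__GFP_HIGH`, `__GFP_IO` and `__GFP_FS` are assumed from the 2.6.25-era `include/linux/gfp.h` (only `__GFP_NORETRY = 0x1000` appears in this thread).

```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for kernel gfp bits; values assumed from 2.6.25-era gfp.h. */
#define GFP_BIT_WAIT    0x10u   /* __GFP_WAIT: caller may sleep */
#define GFP_BIT_HIGH    0x20u   /* __GFP_HIGH: may use emergency pools */
#define GFP_BIT_IO      0x40u   /* __GFP_IO: may start physical I/O */
#define GFP_BIT_FS      0x80u   /* __GFP_FS: may call into the filesystem */
#define GFP_BIT_NORETRY 0x1000u /* __GFP_NORETRY, as quoted in the thread */
#define GFP_KERNEL_BITS (GFP_BIT_WAIT | GFP_BIT_IO | GFP_BIT_FS)

/*
 * Write the names of the known flags set in mode into buf, separated
 * by '|'. Bits not listed in the table are silently ignored.
 */
static const char *gfp_decode(unsigned int mode, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } flags[] = {
        { GFP_BIT_WAIT,    "__GFP_WAIT" },
        { GFP_BIT_HIGH,    "__GFP_HIGH" },
        { GFP_BIT_IO,      "__GFP_IO" },
        { GFP_BIT_FS,      "__GFP_FS" },
        { GFP_BIT_NORETRY, "__GFP_NORETRY" },
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(mode & flags[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Under those assumed values, decoding 0x10d0 yields the three GFP_KERNEL bits plus __GFP_NORETRY, which matches a GFP_KERNEL request that had __GFP_NORETRY OR-ed in by dma_alloc_coherent().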
pci-dma_32 does not do this.
And in 2.6.26-rc1, pci-dma_32.c and pci-dma_64.c were merged,
so now the 32 bit kernel has the same problem.
Does anyone know why this was added on x86_64 ?
If not I think this patch should go into 2.6.26:
diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c 2008-05-21 13:15:54.000000000 +0200
@@ -397,9 +397,6 @@
if (dev->dma_mask == NULL)
return NULL;
- /* Don't invoke OOM killer */
- gfp |= __GFP_NORETRY;
-
#ifdef CONFIG_X86_64
/* Why <=? Even when the mask is smaller than 4GB it is often
larger than 16MB and in this case we have a chance of
Ideas ? Maybe a __GFP_NO_OOMKILLER ?
Mike.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-21 11:30 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ? Miquel van Smoorenburg
@ 2008-05-21 12:49 ` Glauber Costa
2008-05-22 8:47 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Glauber Costa @ 2008-05-21 12:49 UTC (permalink / raw)
To: Miquel van Smoorenburg; +Cc: linux-kernel, linux-mm, andi-suse
Miquel van Smoorenburg wrote:
> I've recently switched some of my boxes from a 32 to a
> 64 bit kernel. These are usenet server boxes that do
> a lot of I/O. They are running 2.6.24 / 2.6.25
>
> Every 15 minutes a cronjob calls a management utility, tw_cli,
> to read the raid status of the 3ware disk arrays. That
> often fails with a segmentation violation ..
>
> tw_cli: page allocation failure. order:0, mode:0x10d0
> Pid: 9296, comm: tw_cli Not tainted 2.6.25.4 #2
>
> Call Trace:
> [<ffffffff802604b6>] __alloc_pages+0x336/0x390
> [<ffffffff80210ff4>] dma_alloc_pages+0x24/0xa0
> [<ffffffff80211113>] dma_alloc_coherent+0xa3/0x2e0
> [<ffffffff8804a58f>] :3w_9xxx:twa_chrdev_ioctl+0x11f/0x810
> [<ffffffff802826c0>] chrdev_open+0x0/0x1c0
> [<ffffffff8027d997>] __dentry_open+0x197/0x210
> [<ffffffff8028c4ed>] vfs_ioctl+0x7d/0xa0
> [<ffffffff8028c584>] do_vfs_ioctl+0x74/0x2d0
> [<ffffffff8028c829>] sys_ioctl+0x49/0x80
> [<ffffffff8020b29b>] system_call_after_swapgs+0x7b/0x80
>
> Mem-info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> CPU 1: hi: 0, btch: 1 usd: 0
> CPU 2: hi: 0, btch: 1 usd: 0
> CPU 3: hi: 0, btch: 1 usd: 0
> DMA32 per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 60
> CPU 1: hi: 186, btch: 31 usd: 185
> CPU 2: hi: 186, btch: 31 usd: 176
> CPU 3: hi: 186, btch: 31 usd: 165
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 120
> CPU 1: hi: 186, btch: 31 usd: 164
> CPU 2: hi: 186, btch: 31 usd: 177
> CPU 3: hi: 186, btch: 31 usd: 182
> Active:265929 inactive:1657355 dirty:663189 writeback:62890 unstable:0
> free:49079 slab:65923 mapped:1238 pagetables:927 bounce:0
> DMA free:12308kB min:184kB low:228kB high:276kB active:0kB inactive:0kB present:11816kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 3255 8053 8053
> DMA32 free:94200kB min:52912kB low:66140kB high:79368kB active:440616kB inactive:2505772kB present:3333792kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 4797 4797
> Normal free:86792kB min:77968kB low:97460kB high:116952kB active:623100kB inactive:4126872kB present:4912640kB pages_scanned:32 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> DMA: 3*4kB 5*8kB 2*16kB 6*32kB 4*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12308kB
> DMA32: 150*4kB 5*8kB 2299*16kB 120*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 13*4096kB = 94512kB
> Normal: 462*4kB 3803*8kB 123*16kB 24*32kB 2*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 18*4096kB = 109760kB
> 1653409 total pagecache pages
> Swap cache: add 5748, delete 5411, find 4317/4852
> Free swap = 4588488kB
> Total swap = 4594580kB
> Free swap: 4588488kB
> 2293760 pages of RAM
> 249225 reserved pages
> 1658761 pages shared
> 337 pages swap cached
>
> (this is easily reproducible by pinning a lot of memory with
> mmap/mlock, say 6 GB on an 8 GB box, while running
> cat /dev/zero > filename, then invoking tw_cli)
>
> Now this appears to happen because dma_alloc_coherent() in
> pci-dma_64.c does this:
>
> /* Don't invoke OOM killer */
> gfp |= __GFP_NORETRY;
>
> However, if you read mm/page_alloc.c you can see that this not only
> prevents invoking the OOM killer, it also does what it says:
> no retries when allocating memory.
>
> That means that dma_alloc_coherent(..., GFP_KERNEL) can become
> unreliable. Bad news.
>
> pci-dma_32 does not do this.
>
> And in 2.6.26-rc1, pci-dma_32.c and pci-dma_64.c were merged,
> so now the 32 bit kernel has the same problem.
>
> Does anyone know why this was added on x86_64 ?
>
> If not I think this patch should go into 2.6.26:
>
> diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
> --- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c 2008-05-18 23:36:41.000000000 +0200
> +++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c 2008-05-21 13:15:54.000000000 +0200
> @@ -397,9 +397,6 @@
> if (dev->dma_mask == NULL)
> return NULL;
>
> - /* Don't invoke OOM killer */
> - gfp |= __GFP_NORETRY;
> -
> #ifdef CONFIG_X86_64
> /* Why <=? Even when the mask is smaller than 4GB it is often
> larger than 16MB and in this case we have a chance of
>
>
> Ideas ? Maybe a __GFP_NO_OOMKILLER ?
probably andi has a better idea on why it was added, since it used to
live in his tree?
> Mike.
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-21 12:49 ` Glauber Costa
@ 2008-05-22 8:47 ` Andi Kleen
2008-05-22 19:25 ` Miquel van Smoorenburg
2008-05-22 19:58 ` Thomas Gleixner
0 siblings, 2 replies; 11+ messages in thread
From: Andi Kleen @ 2008-05-22 8:47 UTC (permalink / raw)
To: Glauber Costa; +Cc: Miquel van Smoorenburg, linux-kernel, linux-mm, andi-suse
On Wed, May 21, 2008 at 09:49:27AM -0300, Glauber Costa wrote:
> probably andi has a better idea on why it was added, since it used to
> live in his tree?
d_a_c() tries a couple of zones, and running the oom killer for each
is inconvenient. Especially for the 16MB DMA zone which is unlikely
to be cleared by the OOM killer anyways because normal user applications
don't put pages in there. There was a real report with some problems
in this area. Also for the earlier tries you don't want to really
bring the system into swap.
Mask allocator would clean most of that up.
-Andi
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-22 8:47 ` Andi Kleen
@ 2008-05-22 19:25 ` Miquel van Smoorenburg
2008-05-24 19:38 ` Miquel van Smoorenburg
2008-05-22 19:58 ` Thomas Gleixner
1 sibling, 1 reply; 11+ messages in thread
From: Miquel van Smoorenburg @ 2008-05-22 19:25 UTC (permalink / raw)
To: Andi Kleen; +Cc: Glauber Costa, linux-kernel, linux-mm, andi-suse, miquels
On Thu, 2008-05-22 at 10:47 +0200, Andi Kleen wrote:
> On Wed, May 21, 2008 at 09:49:27AM -0300, Glauber Costa wrote:
> > probably andi has a better idea on why it was added, since it used to
> > live in his tree?
>
> d_a_c() tries a couple of zones, and running the oom killer for each
> is inconvenient. Especially for the 16MB DMA zone which is unlikely
> to be cleared by the OOM killer anyways because normal user applications
> don't put pages in there. There was a real report with some problems
> in this area. Also for the earlier tries you don't want to really
> bring the system into swap.
I understand, but I do think using __GFP_NORETRY causes problems.
Most drivers call pci_alloc_consistent() which calls
dma_alloc_coherent(.... GFP_ATOMIC) which can dip deep into reserves so
it won't fail so easily. Just a handful use dma_alloc_coherent()
directly.
However, in 2.6.26-rc1, dpt_i2o.c was updated for 64 bit support, and
all its kmalloc(.... GFP_KERNEL) + virt_to_bus() calls have been
replaced by dma_alloc_coherent(.... GFP_KERNEL).
In that case, it's not a very good idea to add __GFP_NORETRY. It will
cause problems. It certainly does in 3w-xxxx.c and it probably will
cause worse problems in dpt_i2o.c.
I think we should do something. How about one of these two patches.
# -----
linux-2.6.26-d_a_c-fix-noretry.patch
diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c 2008-05-22 21:21:37.000000000 +0200
@@ -398,7 +398,8 @@
return NULL;
/* Don't invoke OOM killer */
- gfp |= __GFP_NORETRY;
+ if (!(gfp & __GFP_WAIT))
+ gfp |= __GFP_NORETRY;
#ifdef CONFIG_X86_64
/* Why <=? Even when the mask is smaller than 4GB it is often
# -----
linux-2.6.26-gfp-no-oom.patch
diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c 2008-05-22 20:42:10.000000000 +0200
@@ -398,7 +398,7 @@
return NULL;
/* Don't invoke OOM killer */
- gfp |= __GFP_NORETRY;
+ gfp |= __GFP_NO_OOM;
#ifdef CONFIG_X86_64
/* Why <=? Even when the mask is smaller than 4GB it is often
diff -ruN linux-2.6.26-rc3.orig/include/linux/gfp.h linux-2.6.26-rc3/include/linux/gfp.h
--- linux-2.6.26-rc3.orig/include/linux/gfp.h 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/include/linux/gfp.h 2008-05-22 21:17:36.000000000 +0200
@@ -43,6 +43,7 @@
#define __GFP_REPEAT ((__force gfp_t)0x400u) /* See above */
#define __GFP_NOFAIL ((__force gfp_t)0x800u) /* See above */
#define __GFP_NORETRY ((__force gfp_t)0x1000u)/* See above */
+#define __GFP_NO_OOM ((__force gfp_t)0x2000u)/* Don't invoke oomkiller */
#define __GFP_COMP ((__force gfp_t)0x4000u)/* Add compound page metadata */
#define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
diff -ruN linux-2.6.26-rc3.orig/mm/page_alloc.c linux-2.6.26-rc3/mm/page_alloc.c
--- linux-2.6.26-rc3.orig/mm/page_alloc.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/mm/page_alloc.c 2008-05-22 17:39:12.000000000 +0200
@@ -1583,7 +1583,8 @@
zonelist, high_zoneidx, alloc_flags);
if (page)
goto got_pg;
- } else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+ } else if ((gfp_mask & __GFP_FS) &&
+ !(gfp_mask & (__GFP_NORETRY|__GFP_NO_OOM))) {
if (!try_set_zone_oom(zonelist, gfp_mask)) {
schedule_timeout_uninterruptible(1);
goto restart;
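
The difference between the two proposed patches can be modeled in userspace as a pair of flag transformations plus the OOM-branch condition from the page_alloc.c hunk above. This is only a sketch under stated assumptions: the `GFP_BIT_*` names and the three helper functions are hypothetical, `GFP_BIT_NO_OOM` exists only in the proposed patch, the values of the WAIT/HIGH/IO/FS bits and the definition of GFP_ATOMIC as `__GFP_HIGH` are assumed from the 2.6.25-era gfp.h, and the rest of the retry logic in __alloc_pages() is not modeled.

```c
#include <stdbool.h>

/* Local stand-ins for kernel gfp bits; values assumed from 2.6.25-era gfp.h. */
#define GFP_BIT_WAIT    0x10u
#define GFP_BIT_HIGH    0x20u
#define GFP_BIT_IO      0x40u
#define GFP_BIT_FS      0x80u
#define GFP_BIT_NORETRY 0x1000u
#define GFP_BIT_NO_OOM  0x2000u /* proposed in gfp-no-oom.patch only */
#define GFP_KERNEL_BITS (GFP_BIT_WAIT | GFP_BIT_IO | GFP_BIT_FS)
#define GFP_ATOMIC_BITS (GFP_BIT_HIGH)

/* d_a_c-fix-noretry.patch: add NORETRY only when the caller cannot sleep. */
static unsigned int fix_noretry_flags(unsigned int gfp)
{
    if (!(gfp & GFP_BIT_WAIT))
        gfp |= GFP_BIT_NORETRY;
    return gfp;
}

/* gfp-no-oom.patch: never add NORETRY, always add the new NO_OOM bit. */
static unsigned int no_oom_flags(unsigned int gfp)
{
    return gfp | GFP_BIT_NO_OOM;
}

/* Condition guarding the OOM-killer branch after gfp-no-oom.patch. */
static bool oom_branch_eligible(unsigned int gfp)
{
    return (gfp & GFP_BIT_FS) &&
           !(gfp & (GFP_BIT_NORETRY | GFP_BIT_NO_OOM));
}
```

In this model, the first patch leaves a GFP_KERNEL request eligible for the OOM-killer branch again, while the second keeps it out of that branch without setting __GFP_NORETRY.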
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-22 8:47 ` Andi Kleen
2008-05-22 19:25 ` Miquel van Smoorenburg
@ 2008-05-22 19:58 ` Thomas Gleixner
2008-05-22 22:59 ` Andi Kleen
1 sibling, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2008-05-22 19:58 UTC (permalink / raw)
To: Andi Kleen
Cc: Glauber Costa, Miquel van Smoorenburg, linux-kernel, linux-mm,
andi-suse
On Thu, 22 May 2008, Andi Kleen wrote:
> On Wed, May 21, 2008 at 09:49:27AM -0300, Glauber Costa wrote:
> > probably andi has a better idea on why it was added, since it used to
> > live in his tree?
>
> d_a_c() tries a couple of zones, and running the oom killer for each
> is inconvenient. Especially for the 16MB DMA zone which is unlikely
> to be cleared by the OOM killer anyways because normal user applications
> don't put pages in there. There was a real report with some problems
> in this area.
Can you give some pointers please ?
Thanks,
tglx
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-22 19:58 ` Thomas Gleixner
@ 2008-05-22 22:59 ` Andi Kleen
0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2008-05-22 22:59 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andi Kleen, Glauber Costa, Miquel van Smoorenburg, linux-kernel,
linux-mm, andi-suse
On Thu, May 22, 2008 at 09:58:11PM +0200, Thomas Gleixner wrote:
> On Thu, 22 May 2008, Andi Kleen wrote:
> > On Wed, May 21, 2008 at 09:49:27AM -0300, Glauber Costa wrote:
> > > probably andi has a better idea on why it was added, since it used to
> > > live in his tree?
> >
> > d_a_c() tries a couple of zones, and running the oom killer for each
> > is inconvenient. Especially for the 16MB DMA zone which is unlikely
> > to be cleared by the OOM killer anyways because normal user applications
> > don't put pages in there. There was a real report with some problems
> > in this area.
>
> Can you give some pointers please ?
To the bug report? Memory is fuzzy, but I think it was some SUSE bugzilla
report, might have been for SLES.
Anyways the reasoning is still valid. Longer term the mask allocator
would be the right fix, shorter term a new GFP flag as proposed
sounds reasonable.
The trick is just that you need different __GFP_ flags for the different
allocations. E.g. the first one, the "higher zone" quick try, should
continue to use __GFP_NORETRY. And the 16MB one should too. The new flag
would only make sense for the main request.
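
One way to read the per-allocation suggestion above is as a small selection function. This is a hedged sketch, not the actual dma_alloc_coherent() code: the `dac_attempt` enum, its member names and `gfp_for_attempt()` are hypothetical, and `GFP_BIT_NO_OOM` stands for the flag proposed earlier in the thread.

```c
/* Local stand-ins; NO_OOM is the flag proposed in this thread, not mainline. */
#define GFP_BIT_NORETRY 0x1000u
#define GFP_BIT_NO_OOM  0x2000u

/* Hypothetical labels for the allocation attempts described above. */
enum dac_attempt {
    DAC_HIGHER_ZONE_TRY,   /* first, quick try in a higher zone */
    DAC_MAIN_REQUEST,      /* the main request */
    DAC_DMA_16MB_FALLBACK  /* last resort: the 16MB DMA zone */
};

/* Pick the extra gfp bits for one attempt, given the caller's base flags. */
static unsigned int gfp_for_attempt(enum dac_attempt attempt, unsigned int base)
{
    switch (attempt) {
    case DAC_MAIN_REQUEST:
        /* Allowed to retry, but must not trigger the OOM killer. */
        return base | GFP_BIT_NO_OOM;
    case DAC_HIGHER_ZONE_TRY:
    case DAC_DMA_16MB_FALLBACK:
    default:
        /* Cheap attempts: fail fast, as the existing code does. */
        return base | GFP_BIT_NORETRY;
    }
}
```

Only the main request gets the new flag here; the quick try and the 16MB fallback keep failing fast.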
In the mask allocator patchkit kernel it should be also ok already.
-Andi
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-22 19:25 ` Miquel van Smoorenburg
@ 2008-05-24 19:38 ` Miquel van Smoorenburg
2008-05-25 16:35 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Miquel van Smoorenburg @ 2008-05-24 19:38 UTC (permalink / raw)
To: Andi Kleen; +Cc: Glauber Costa, linux-kernel, linux-mm, andi-suse
On Thu, 2008-05-22 at 21:25 +0200, Miquel van Smoorenburg wrote:
> Most drivers call pci_alloc_consistent() which calls
> dma_alloc_coherent(.... GFP_ATOMIC) which can dip deep into reserves so
> it won't fail so easily. Just a handful use dma_alloc_coherent()
> directly.
>
> However, in 2.6.26-rc1, dpt_i2o.c was updated for 64 bit support, and
> all its kmalloc(.... GFP_KERNEL) + virt_to_bus() calls have been
> replaced by dma_alloc_coherent(.... GFP_KERNEL).
>
> In that case, it's not a very good idea to add __GFP_NORETRY.
>
> I think we should do something. How about one of these two patches.
And Andi wrote:
On Fri, 2008-05-23 at 00:59 +0200, Andi Kleen wrote:
> Anyways the reasoning is still valid. Longer term the mask allocator
> would be the right fix, shorter term a new GFP flag as proposed
> sounds reasonable.
So how about linux-2.6.26-gfp-no-oom.patch (see previous mail) for
2.6.26 ?
Mike.
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-24 19:38 ` Miquel van Smoorenburg
@ 2008-05-25 16:35 ` Andi Kleen
2008-05-25 19:55 ` Alan Cox
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2008-05-25 16:35 UTC (permalink / raw)
To: Miquel van Smoorenburg
Cc: Andi Kleen, Glauber Costa, linux-kernel, linux-mm, andi-suse
> So how about linux-2.6.26-gfp-no-oom.patch (see previous mail) for
> 2.6.26
Changing the gfp once globally like you did is not right, because
the different fallback cases have to be handled differently
(see the different cases I discussed in my earlier mail)
Especially the 16MB zone allocation should never trigger the OOM killer.
That could be special cased, but __GFP_NO_OOM_KILLER is likely better
as a short term fix although I'm still not 100% sure what implications
it will have to do more VM retries in the early fallbacks.
-Andi
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-25 16:35 ` Andi Kleen
@ 2008-05-25 19:55 ` Alan Cox
2008-05-25 21:23 ` Andi Kleen
0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2008-05-25 19:55 UTC (permalink / raw)
To: Andi Kleen
Cc: Miquel van Smoorenburg, Glauber Costa, linux-kernel, linux-mm,
andi-suse
On Sun, 25 May 2008 18:35:39 +0200
Andi Kleen <andi@firstfloor.org> wrote:
> > So how about linux-2.6.26-gfp-no-oom.patch (see previous mail) for
> > 2.6.26
>
> Changing the gfp once globally like you did is not right, because
> the different fallback cases have to be handled differently
> (see the different cases I discussed in my earlier mail)
>
> Especially the 16MB zone allocation should never trigger the OOM killer.
That depends how much memory you have.
Alan
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-25 19:55 ` Alan Cox
@ 2008-05-25 21:23 ` Andi Kleen
2008-05-25 22:02 ` Alan Cox
0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2008-05-25 21:23 UTC (permalink / raw)
To: Alan Cox
Cc: Andi Kleen, Miquel van Smoorenburg, Glauber Costa, linux-kernel,
linux-mm, andi-suse
On Sun, May 25, 2008 at 08:55:32PM +0100, Alan Cox wrote:
> On Sun, 25 May 2008 18:35:39 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > > So how about linux-2.6.26-gfp-no-oom.patch (see previous mail) for
> > > 2.6.26
> >
> > Changing the gfp once globally like you did is not right, because
> > the different fallback cases have to be handled differently
> > (see the different cases I discussed in my earlier mail)
> >
> > Especially the 16MB zone allocation should never trigger the OOM killer.
>
> That depends how much memory you have.
No it doesn't because the lower zone protection basically never puts
anything that is not GFP_DMA into the 16MB zone.
Just check yourself on your machine using sysrq.
That was one of the motivations behind the mask allocator design.
-Andi
* Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
2008-05-25 21:23 ` Andi Kleen
@ 2008-05-25 22:02 ` Alan Cox
0 siblings, 0 replies; 11+ messages in thread
From: Alan Cox @ 2008-05-25 22:02 UTC (permalink / raw)
To: Andi Kleen
Cc: Miquel van Smoorenburg, Glauber Costa, linux-kernel, linux-mm,
andi-suse
> No it doesn't because the lower zone protection basically never puts
> anything that is not GFP_DMA into the 16MB zone.
>
> Just check yourself on your machine using sysrq.
>
> That was one of the motivations behind the mask allocator design.
Try a 16MB embedded PC
Alan
Thread overview: 11+ messages
2008-05-21 11:30 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ? Miquel van Smoorenburg
2008-05-21 12:49 ` Glauber Costa
2008-05-22 8:47 ` Andi Kleen
2008-05-22 19:25 ` Miquel van Smoorenburg
2008-05-24 19:38 ` Miquel van Smoorenburg
2008-05-25 16:35 ` Andi Kleen
2008-05-25 19:55 ` Alan Cox
2008-05-25 21:23 ` Andi Kleen
2008-05-25 22:02 ` Alan Cox
2008-05-22 19:58 ` Thomas Gleixner
2008-05-22 22:59 ` Andi Kleen