From: Miquel van Smoorenburg <miquels@cistron.nl>
To: Andi Kleen <andi@firstfloor.org>
Cc: Glauber Costa <gcosta@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
andi-suse@firstfloor.org, miquels@cistron.nl
Subject: Re: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
Date: Thu, 22 May 2008 21:25:43 +0200
Message-ID: <1211484343.30678.15.camel@localhost.localdomain>
In-Reply-To: <20080522084736.GC31727@one.firstfloor.org>
On Thu, 2008-05-22 at 10:47 +0200, Andi Kleen wrote:
> On Wed, May 21, 2008 at 09:49:27AM -0300, Glauber Costa wrote:
> > probably andi has a better idea on why it was added, since it used to
> > live in his tree?
>
> d_a_c() tries a couple of zones, and running the oom killer for each
> is inconvenient. Especially for the 16MB DMA zone which is unlikely
> to be cleared by the OOM killer anyways because normal user applications
> don't put pages in there. There was a real report with some problems
> in this area. Also for the earlier tries you don't want to really
> bring the system into swap.
I understand, but I do think using __GFP_NORETRY causes problems.
Most drivers call pci_alloc_consistent(), which calls
dma_alloc_coherent(..., GFP_ATOMIC), which can dip deep into reserves so
it won't fail so easily. Just a handful use dma_alloc_coherent()
directly.
However, in 2.6.26-rc1, dpt_i2o.c was updated for 64-bit support, and
all its kmalloc(..., GFP_KERNEL) + virt_to_bus() calls have been
replaced by dma_alloc_coherent(..., GFP_KERNEL).
In that case, it's not a very good idea to add __GFP_NORETRY. It will
cause problems. It certainly does in 3w-xxxx.c and it probably will
cause worse problems in dpt_i2o.c.
I think we should do something. How about one of these two patches?
# -----
linux-2.6.26-d_a_c-fix-noretry.patch
diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c 2008-05-22 21:21:37.000000000 +0200
@@ -398,7 +398,8 @@
return NULL;
/* Don't invoke OOM killer */
- gfp |= __GFP_NORETRY;
+ if (!(gfp & __GFP_WAIT))
+ gfp |= __GFP_NORETRY;
#ifdef CONFIG_X86_64
/* Why <=? Even when the mask is smaller than 4GB it is often
# -----
linux-2.6.26-gfp-no-oom.patch
diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c 2008-05-22 20:42:10.000000000 +0200
@@ -398,7 +398,7 @@
return NULL;
/* Don't invoke OOM killer */
- gfp |= __GFP_NORETRY;
+ gfp |= __GFP_NO_OOM;
#ifdef CONFIG_X86_64
/* Why <=? Even when the mask is smaller than 4GB it is often
diff -ruN linux-2.6.26-rc3.orig/include/linux/gfp.h linux-2.6.26-rc3/include/linux/gfp.h
--- linux-2.6.26-rc3.orig/include/linux/gfp.h 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/include/linux/gfp.h 2008-05-22 21:17:36.000000000 +0200
@@ -43,6 +43,7 @@
#define __GFP_REPEAT ((__force gfp_t)0x400u) /* See above */
#define __GFP_NOFAIL ((__force gfp_t)0x800u) /* See above */
#define __GFP_NORETRY ((__force gfp_t)0x1000u)/* See above */
+#define __GFP_NO_OOM ((__force gfp_t)0x2000u)/* Don't invoke oomkiller */
#define __GFP_COMP ((__force gfp_t)0x4000u)/* Add compound page metadata */
#define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
#define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
diff -ruN linux-2.6.26-rc3.orig/mm/page_alloc.c linux-2.6.26-rc3/mm/page_alloc.c
--- linux-2.6.26-rc3.orig/mm/page_alloc.c 2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/mm/page_alloc.c 2008-05-22 17:39:12.000000000 +0200
@@ -1583,7 +1583,8 @@
zonelist, high_zoneidx, alloc_flags);
if (page)
goto got_pg;
- } else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+ } else if ((gfp_mask & __GFP_FS) &&
+ !(gfp_mask & (__GFP_NORETRY|__GFP_NO_OOM))) {
if (!try_set_zone_oom(zonelist, gfp_mask)) {
schedule_timeout_uninterruptible(1);
goto restart;