linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Miquel van Smoorenburg <miquels@cistron.nl>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Glauber Costa <gcosta@redhat.com>
Subject: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
Date: Wed, 21 May 2008 13:30:32 +0200	[thread overview]
Message-ID: <20080521113028.GA24632@xs4all.net> (raw)

I've recently switched some of my boxes from a 32 to a
64 bit kernel. These are usenet server boxes that do
a lot of I/O. They are running 2.6.24 / 2.6.25

Every 15 minutes a cronjob calls a management utility, tw_cli,
 to read the raid status of the 3ware disk arrays. That
often fails with a segmentation violation .. 

tw_cli: page allocation failure. order:0, mode:0x10d0
Pid: 9296, comm: tw_cli Not tainted 2.6.25.4 #2

Call Trace:
 [<ffffffff802604b6>] __alloc_pages+0x336/0x390
 [<ffffffff80210ff4>] dma_alloc_pages+0x24/0xa0
 [<ffffffff80211113>] dma_alloc_coherent+0xa3/0x2e0
 [<ffffffff8804a58f>] :3w_9xxx:twa_chrdev_ioctl+0x11f/0x810
 [<ffffffff802826c0>] chrdev_open+0x0/0x1c0
 [<ffffffff8027d997>] __dentry_open+0x197/0x210
 [<ffffffff8028c4ed>] vfs_ioctl+0x7d/0xa0
 [<ffffffff8028c584>] do_vfs_ioctl+0x74/0x2d0
 [<ffffffff8028c829>] sys_ioctl+0x49/0x80
 [<ffffffff8020b29b>] system_call_after_swapgs+0x7b/0x80

Mem-info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  60
CPU    1: hi:  186, btch:  31 usd: 185
CPU    2: hi:  186, btch:  31 usd: 176
CPU    3: hi:  186, btch:  31 usd: 165
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 120
CPU    1: hi:  186, btch:  31 usd: 164
CPU    2: hi:  186, btch:  31 usd: 177
CPU    3: hi:  186, btch:  31 usd: 182
Active:265929 inactive:1657355 dirty:663189 writeback:62890 unstable:0
 free:49079 slab:65923 mapped:1238 pagetables:927 bounce:0
DMA free:12308kB min:184kB low:228kB high:276kB active:0kB inactive:0kB present:11816kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3255 8053 8053
DMA32 free:94200kB min:52912kB low:66140kB high:79368kB active:440616kB inactive:2505772kB present:3333792kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4797 4797
Normal free:86792kB min:77968kB low:97460kB high:116952kB active:623100kB inactive:4126872kB present:4912640kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 5*8kB 2*16kB 6*32kB 4*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12308kB
DMA32: 150*4kB 5*8kB 2299*16kB 120*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 13*4096kB = 94512kB
Normal: 462*4kB 3803*8kB 123*16kB 24*32kB 2*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 18*4096kB = 109760kB
1653409 total pagecache pages
Swap cache: add 5748, delete 5411, find 4317/4852
Free swap  = 4588488kB
Total swap = 4594580kB
Free swap:       4588488kB
2293760 pages of RAM
249225 reserved pages
1658761 pages shared
337 pages swap cached

(this is easily reproducible by pinning a lot of memory with
 mmap/mlock, say 6 GB on an 8 GB box, while running
 cat /dev/zero > filename, then invoking tw_cli)

Now this appears to happen because dma_alloc_coherent() in
pci-dma_64.c does this:

        /* Don't invoke OOM killer */
        gfp |= __GFP_NORETRY;

However, if you read mm/page_alloc.c you can see that this not only
prevents invoking the OOM killer, it also does what it says:
no retries when allocating memory.

That means that dma_alloc_coherent(..., GFP_KERNEL) can become
unreliable. Bad news.

pci-dma_32 does not do this.

And in 2.6.26-rc1, pci-dma_32.c and pci-dma_64.c were merged,
so now the 32 bit kernel has the same problem.

Does anyone know why this was added on x86_64 ?

If not I think this patch should go into 2.6.26:

diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c	2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c	2008-05-21 13:15:54.000000000 +0200
@@ -397,9 +397,6 @@
 	if (dev->dma_mask == NULL)
 		return NULL;
 
-	/* Don't invoke OOM killer */
-	gfp |= __GFP_NORETRY;
-
 #ifdef CONFIG_X86_64
 	/* Why <=? Even when the mask is smaller than 4GB it is often
 	   larger than 16MB and in this case we have a chance of


Ideas ? Maybe a __GFP_NO_OOMKILLER ? 

Mike.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2008-05-21 11:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-21 11:30 Miquel van Smoorenburg [this message]
2008-05-21 12:49 ` 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ? Glauber Costa
2008-05-22  8:47   ` Andi Kleen
2008-05-22 19:25     ` Miquel van Smoorenburg
2008-05-24 19:38       ` Miquel van Smoorenburg
2008-05-25 16:35         ` Andi Kleen
2008-05-25 19:55           ` Alan Cox
2008-05-25 21:23             ` Andi Kleen
2008-05-25 22:02               ` Alan Cox
2008-05-22 19:58     ` Thomas Gleixner
2008-05-22 22:59       ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080521113028.GA24632@xs4all.net \
    --to=miquels@cistron.nl \
    --cc=gcosta@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).