2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?


From: Miquel van Smoorenburg <miquels@cistron.nl>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Glauber Costa <gcosta@redhat.com>
Subject: 2.6.26: x86/kernel/pci_dma.c: gfp |= __GFP_NORETRY ?
Date: Wed, 21 May 2008 13:30:32 +0200
Message-ID: <20080521113028.GA24632@xs4all.net>

I've recently switched some of my boxes from a 32-bit to a
64-bit kernel. These are Usenet server boxes that do
a lot of I/O, running 2.6.24 / 2.6.25.

Every 15 minutes a cronjob calls a management utility, tw_cli,
to read the RAID status of the 3ware disk arrays. That
often fails with a segmentation violation, and the kernel logs:

tw_cli: page allocation failure. order:0, mode:0x10d0
Pid: 9296, comm: tw_cli Not tainted 2.6.25.4 #2

Call Trace:
 [<ffffffff802604b6>] __alloc_pages+0x336/0x390
 [<ffffffff80210ff4>] dma_alloc_pages+0x24/0xa0
 [<ffffffff80211113>] dma_alloc_coherent+0xa3/0x2e0
 [<ffffffff8804a58f>] :3w_9xxx:twa_chrdev_ioctl+0x11f/0x810
 [<ffffffff802826c0>] chrdev_open+0x0/0x1c0
 [<ffffffff8027d997>] __dentry_open+0x197/0x210
 [<ffffffff8028c4ed>] vfs_ioctl+0x7d/0xa0
 [<ffffffff8028c584>] do_vfs_ioctl+0x74/0x2d0
 [<ffffffff8028c829>] sys_ioctl+0x49/0x80
 [<ffffffff8020b29b>] system_call_after_swapgs+0x7b/0x80

Mem-info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  60
CPU    1: hi:  186, btch:  31 usd: 185
CPU    2: hi:  186, btch:  31 usd: 176
CPU    3: hi:  186, btch:  31 usd: 165
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 120
CPU    1: hi:  186, btch:  31 usd: 164
CPU    2: hi:  186, btch:  31 usd: 177
CPU    3: hi:  186, btch:  31 usd: 182
Active:265929 inactive:1657355 dirty:663189 writeback:62890 unstable:0
 free:49079 slab:65923 mapped:1238 pagetables:927 bounce:0
DMA free:12308kB min:184kB low:228kB high:276kB active:0kB inactive:0kB present:11816kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3255 8053 8053
DMA32 free:94200kB min:52912kB low:66140kB high:79368kB active:440616kB inactive:2505772kB present:3333792kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4797 4797
Normal free:86792kB min:77968kB low:97460kB high:116952kB active:623100kB inactive:4126872kB present:4912640kB pages_scanned:32 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 5*8kB 2*16kB 6*32kB 4*64kB 4*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12308kB
DMA32: 150*4kB 5*8kB 2299*16kB 120*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 13*4096kB = 94512kB
Normal: 462*4kB 3803*8kB 123*16kB 24*32kB 2*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 18*4096kB = 109760kB
1653409 total pagecache pages
Swap cache: add 5748, delete 5411, find 4317/4852
Free swap  = 4588488kB
Total swap = 4594580kB
Free swap:       4588488kB
2293760 pages of RAM
249225 reserved pages
1658761 pages shared
337 pages swap cached
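The mode:0x10d0 in the failure message is the gfp mask that reached __alloc_pages(). The small decoder below splits it into named bits. It assumes the 2.6.25-era values from include/linux/gfp.h (__GFP_WAIT 0x10, __GFP_IO 0x40, __GFP_FS 0x80, __GFP_NORETRY 0x1000; worth double-checking against the tree in question), and the macro names without leading underscores are local stand-ins, not kernel symbols. Under those assumptions 0x10d0 is exactly GFP_KERNEL (0xd0) plus __GFP_NORETRY.

```c
#include <stdio.h>
#include <string.h>

/* Local copies of the gfp bits, ASSUMING the 2.6.25-era values from
 * include/linux/gfp.h; check them against the tree you are debugging. */
#define GFP_WAIT    0x10u
#define GFP_HIGH    0x20u
#define GFP_IO      0x40u
#define GFP_FS      0x80u
#define GFP_NOWARN  0x200u
#define GFP_REPEAT  0x400u
#define GFP_NOFAIL  0x800u
#define GFP_NORETRY 0x1000u

/* GFP_KERNEL is __GFP_WAIT | __GFP_IO | __GFP_FS, i.e. 0xd0. */
#define GFP_KERNEL_MASK (GFP_WAIT | GFP_IO | GFP_FS)

static const struct {
    unsigned int bit;
    const char *name;
} gfp_names[] = {
    { GFP_WAIT,    "__GFP_WAIT" },
    { GFP_HIGH,    "__GFP_HIGH" },
    { GFP_IO,      "__GFP_IO" },
    { GFP_FS,      "__GFP_FS" },
    { GFP_NOWARN,  "__GFP_NOWARN" },
    { GFP_REPEAT,  "__GFP_REPEAT" },
    { GFP_NOFAIL,  "__GFP_NOFAIL" },
    { GFP_NORETRY, "__GFP_NORETRY" },
};

/* Render mask as "FLAG|FLAG|..." into buf (len > 0 bytes); bits not
 * in the table are appended as a single hex value. Returns buf. */
static char *decode_gfp(unsigned int mask, char *buf, size_t len)
{
    size_t i, used;

    buf[0] = '\0';
    for (i = 0; i < sizeof(gfp_names) / sizeof(gfp_names[0]); i++) {
        if (!(mask & gfp_names[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, gfp_names[i].name, len - strlen(buf) - 1);
        mask &= ~gfp_names[i].bit;   /* mark this bit as explained */
    }
    if (mask != 0) {
        used = strlen(buf);
        snprintf(buf + used, len - used, "%s0x%x", used ? "|" : "", mask);
    }
    return buf;
}
```

With a 128-byte buffer, decode_gfp(0x10d0, buf, sizeof buf) yields "__GFP_WAIT|__GFP_IO|__GFP_FS|__GFP_NORETRY", which is consistent with a GFP_KERNEL request that had __GFP_NORETRY added on its way through dma_alloc_coherent().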

(This is easily reproducible by pinning a lot of memory with
mmap/mlock, say 6 GB on an 8 GB box, while running
cat /dev/zero > filename, and then invoking tw_cli.)
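The pinning half of that recipe can be sketched as below: map anonymous memory and mlock() it so it can be neither reclaimed nor swapped. The helper names pin_memory() and unpin_memory() are hypothetical, chosen for illustration; locking gigabytes needs root (CAP_IPC_LOCK) or a raised RLIMIT_MEMLOCK (e.g. ulimit -l unlimited).

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `bytes` of anonymous memory and lock it into
 * RAM. Returns NULL on failure (zero size, mmap failure, or mlock
 * refused, e.g. RLIMIT_MEMLOCK too low without CAP_IPC_LOCK). */
static void *pin_memory(size_t bytes)
{
    void *p;

    if (bytes == 0)
        return NULL;
    p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* On success mlock() guarantees every page in the range is resident
     * and stays so until it is unlocked. */
    if (mlock(p, bytes) != 0) {
        munmap(p, bytes);
        return NULL;
    }
    return p;
}

/* Hypothetical helper: release memory obtained from pin_memory(). */
static void unpin_memory(void *p, size_t bytes)
{
    munlock(p, bytes);
    munmap(p, bytes);
}
```

A small main() that calls pin_memory(6UL << 30) and then sleeps in pause(), while cat /dev/zero > filename runs in another shell, approximates the setup described above before tw_cli is invoked.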

Now this appears to happen because dma_alloc_coherent() in
pci-dma_64.c does this:

        /* Don't invoke OOM killer */
        gfp |= __GFP_NORETRY;

However, if you read mm/page_alloc.c you can see that this not only
prevents invoking the OOM killer, it also does what its name says:
once a direct-reclaim pass fails to produce a page, the allocator
gives up instead of retrying, even for order-0 requests.
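Both effects of __GFP_NORETRY show up in a simplified userspace model of the decision __alloc_pages() makes once direct reclaim has run and the allocator still has no page. This is a paraphrase of the 2.6.25-era logic under stated assumptions, not kernel code: the flag values and PAGE_ALLOC_COSTLY_ORDER (3) are assumed from that era's headers, the enum and function names are hypothetical, and watermarks, cpusets and the zonelist OOM lock are left out.

```c
#include <stdbool.h>

/* ASSUMED 2.6.25-era flag values (include/linux/gfp.h). */
#define GFP_FS      0x80u
#define GFP_REPEAT  0x400u
#define GFP_NOFAIL  0x800u
#define GFP_NORETRY 0x1000u
#define GFP_KERNEL_MASK 0xd0u /* __GFP_WAIT | __GFP_IO | __GFP_FS */

#define PAGE_ALLOC_COSTLY_ORDER 3

enum next_step { GIVE_UP, INVOKE_OOM_KILLER, RETRY };

/* Hypothetical model: what happens after reclaim ran and the follow-up
 * freelist attempt still found no page. */
static enum next_step after_failed_reclaim(unsigned int gfp,
                                           unsigned int order,
                                           bool reclaim_made_progress,
                                           unsigned long pages_reclaimed)
{
    /* Nothing reclaimable: the OOM killer is used only for __GFP_FS
     * requests that do NOT carry __GFP_NORETRY. */
    if (!reclaim_made_progress && (gfp & GFP_FS) && !(gfp & GFP_NORETRY))
        return INVOKE_OOM_KILLER;

    /* Retry logic: __GFP_NORETRY switches all of it off. */
    if (gfp & GFP_NORETRY)
        return GIVE_UP;
    if (gfp & GFP_NOFAIL)
        return RETRY;
    if (order <= PAGE_ALLOC_COSTLY_ORDER)
        return RETRY;           /* small requests keep looping */
    if ((gfp & GFP_REPEAT) && pages_reclaimed < (1UL << order))
        return RETRY;
    return GIVE_UP;             /* "page allocation failure" path */
}
```

In this model, plain GFP_KERNEL at order 0 either retries or reaches the OOM killer, while the mode from the log (0xd0 | 0x1000) gives up on the first failed pass whichever way reclaim went.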

That means that dma_alloc_coherent(..., GFP_KERNEL) can fail
under memory pressure, even for a single page. Bad news.

pci-dma_32.c does not do this.

And in 2.6.26-rc1, pci-dma_32.c and pci-dma_64.c were merged,
so now the 32-bit kernel has the same problem.

Does anyone know why this was added on x86_64?

If not, I think this patch should go into 2.6.26:

diff -ruN linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c
--- linux-2.6.26-rc3.orig/arch/x86/kernel/pci-dma.c	2008-05-18 23:36:41.000000000 +0200
+++ linux-2.6.26-rc3/arch/x86/kernel/pci-dma.c	2008-05-21 13:15:54.000000000 +0200
@@ -397,9 +397,6 @@
 	if (dev->dma_mask == NULL)
 		return NULL;
 
-	/* Don't invoke OOM killer */
-	gfp |= __GFP_NORETRY;
-
 #ifdef CONFIG_X86_64
 	/* Why <=? Even when the mask is smaller than 4GB it is often
 	   larger than 16MB and in this case we have a chance of


Ideas? Maybe a __GFP_NO_OOMKILLER flag?
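One way to read the __GFP_NO_OOMKILLER suggestion is to split the two things __GFP_NORETRY currently implies into separate bits, so that dma_alloc_coherent() could suppress the OOM killer while order-0 GFP_KERNEL requests still retry. The sketch is purely illustrative: GFP_NO_OOMKILLER is the hypothetical flag from this mail and its bit value is invented, the other values are assumed from 2.6.25-era gfp.h, and both predicates are simplifications of page_alloc.c.

```c
#include <stdbool.h>

/* ASSUMED 2.6.25-era values, plus one HYPOTHETICAL flag. */
#define GFP_FS           0x80u
#define GFP_NORETRY      0x1000u
#define GFP_NO_OOMKILLER 0x80000000u /* hypothetical; not in any kernel */
#define GFP_KERNEL_MASK  0xd0u       /* __GFP_WAIT | __GFP_IO | __GFP_FS */

/* May the allocator invoke the OOM killer for this request?
 * Either flag suppresses it. */
static bool may_oom_kill(unsigned int gfp)
{
    return (gfp & GFP_FS) && !(gfp & (GFP_NORETRY | GFP_NO_OOMKILLER));
}

/* May an order-0 request keep looping after a failed reclaim pass?
 * Only __GFP_NORETRY stops it; the proposed flag leaves retries alone. */
static bool may_retry_order0(unsigned int gfp)
{
    return !(gfp & GFP_NORETRY);
}
```

With this split, dma_alloc_coherent() would OR in GFP_NO_OOMKILLER instead of __GFP_NORETRY, keeping its "don't invoke OOM killer" intent without making order-0 GFP_KERNEL allocations give up early.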

Mike.



