From: Stefan Schmidt <zaphodb@zaphods.net>
To: Andrew Morton <akpm@osdl.org>
Cc: marcelo.tosatti@cyclades.com, linux-kernel@vger.kernel.org,
	piggin@cyberone.com.au
Subject: 2.6.10-rc1-mm4 -1 EAGAIN after allocation failure was: Re: Kernel 2.6.9 Multiple Page Allocation Failures
Date: Wed, 10 Nov 2004 02:27:34 +0100
Message-ID: <20041110012733.GD20754@zaphods.net>
In-Reply-To: <20041109235201.GC20754@zaphods.net>

Alright, I have a funny thing here that I suspect is an(other?) VM issue:

We are running third-party closed-source software which handles many TCP
sessions and reads and writes their data to/from several disks/partitions.
With 2.6.10-rc1-mm4 we notice for the first time that, right after the kernel
throws a "swapper: page allocation failure" trace (just like the ones you
already know), the interrupt rate, connection count and traffic subsequently
decrease.

Here is part of a vmstat 10:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0  11312  19404    268 1896896    0    0  1091 17578 25463  1225  7 38 37 18
 0  0  11312  26372    180 1892836    0    0  1182 21433 25576  1216  7 38 31 24
 1  2  11308  23784    608 1890168    0    0  1252 20667 25532  1243  7 40 24 29
 0  2  11308  23304    428 1890552    0    0  1174 20363 25948  1332  7 40 32 21
 1  1  11304  18496    444 1893328    0    0  1630 20506 25840  1322  7 38 30 26
 1  1  11304   8712    232 1905508    0    0  1528 19662 26245  1305  7 40 25 28
 1  0  11304  18952    180 1894000    0    0  1595 19680 26275  1215  7 38 27 28
 1  0  11304  22404    132 1896632    0    0   369 17724 24072  1045  7 37 49  7
 1  0  11304  23956    492 1899876    0    0   504 19609 20829  1151  9 34 49  7
 1  0  11304  25380    460 1908340    0    0   424 17983 16964   927  9 28 55  8
 1  0  11304  18244    464 1922140    0    0   309 14431 13417   836 10 27 60  3
 0  0  11304  17720    472 1928388    0    0   224 11868 9933   607 11 23 63  3
 1  0  11304  25720    476 1924440    0    0   133  7663 6780   504 10 20 68  2
 1  0  11304  24156    488 1928168    0    0   107  6244 5011   315  8 18 73  1
 0  0  11304  19544    712 1934268    0    0    76  3191 4464   299  8 18 73  1
 0  0  11304  19248    728 1936564    0    0    23  1802 4002   249  7 17 76  0
 1  0  11304  27092    736 1929892    0    0    16  1336 3655   284  6 16 78  0
 0  0  11304  26472    752 1931984    0    0    19  1508 3408   248  5 16 78  1
 0  0  11304  19000    768 1940944    0    0    20  1398 3195   226  5 14 81  0
 1  0  11304  21460    776 1938896    0    0    14  1084 3057   241  5 14 82  0
 0  0  11304  26268    848 1934608    0    0    12   927 2906   218  5 13 82  0
 1  1  11304  22076    900 1939860    0    0    18   679 2897   215  5 11 84  1
 0  0  11304  25880    952 1936748    0    0    17   653 2713   251  4 13 82  1
 0  0  11304  20436    976 1942368    0    0     8  1117 2703   229  5 11 83  1
...
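
To put a number on the decay, here is a minimal sketch that parses vmstat
rows and compares the first and last sample. The column names come from the
header quoted above; the function names are made up for illustration, and
only the first and last rows of the excerpt are included as sample input.

```python
# Column order taken from the vmstat header quoted above.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat(text):
    """Return one dict per numeric vmstat row; header and '...' lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def drop_ratio(rows, column):
    """Last sample divided by first sample for one column."""
    return rows[-1][column] / rows[0][column]


# First and last rows of the excerpt above (hypothetical helper input).
SAMPLE = """\
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0  11312  19404    268 1896896    0    0  1091 17578 25463  1225  7 38 37 18
 0  0  11304  20436    976 1942368    0    0     8  1117 2703   229  5 11 83  1
"""

rows = parse_vmstat(SAMPLE)
for col in ("in", "cs", "bo"):
    ratio = drop_ratio(rows, col)
    print(f"{col}: {rows[0][col]} -> {rows[-1][col]} ({ratio:.1%} of start)")
```

Over the excerpt, the interrupt rate ("in") falls to roughly a tenth of its
starting value, and block output ("bo") falls even further.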

strace shows:
01:38:50.316041 gettimeofday({1100047130, 316054}, NULL) = 0
01:38:50.316188 poll([{fd=5671, events=POLLIN}, {fd=2727, events=POLLIN}, {fd=6663, events=POLLIN}, {fd=197, events=POLLIN}, {fd=3978, events=POLLIN}, {fd=779, events=POLLIN}, ...{line continues like this}...
...
01:38:50.328056 accept(5, 0xbffd4ab8, [16]) = -1 EAGAIN (Resource temporarily unavailable) ...{an awful lot of these}...
...
01:38:50.329585 futex(0xaf1a698, FUTEX_WAIT, 92828, {0, 9964000}) = -1 ETIMEDOUT (Connection timed out) ...{some of these}...
...
Application says:
"n.n.n.n:p Client closed connection in body read"

To me it seems like all those open sockets suddenly become 'temporarily
unavailable' to the application, and so the connections time out.
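
For reference, a small sketch of what EAGAIN from accept() normally means.
This assumes the listening socket (fd 5 in the strace) is non-blocking, which
the trace alone does not prove. On such a socket, EAGAIN only says that no
completed connection is waiting in the accept queue at that moment; it says
nothing about the already established sockets. The helper name try_accept and
the loopback setup are illustrative only.

```python
import errno
import select
import socket


def try_accept(listener):
    """Return a connected socket, or None if no connection is pending."""
    try:
        conn, _addr = listener.accept()
        return conn
    except BlockingIOError as exc:
        # EAGAIN / EWOULDBLOCK: the accept queue is empty right now.
        if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
            raise
        return None


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(16)
listener.setblocking(False)

# Nothing has connected yet, so accept() reports EAGAIN.
first = try_accept(listener)

# Once a client connects, the listener becomes readable and accept() succeeds.
client = socket.create_connection(listener.getsockname())
select.select([listener], [], [], 5.0)
second = try_accept(listener)

print("before connect:", first)
print("after connect: ", "accepted" if second is not None else None)

for sock in (client, second, listener):
    if sock is not None:
        sock.close()
```

If that assumption holds, a flood of EAGAIN results would mean the
application keeps calling accept() while no new connections are arriving.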
I have not (yet?) seen this behaviour on 2.6.9, 2.6.9-mm1, 2.6.10-rc1-bk12 or
2.6.10-rc1-mm3.
I can reproduce the behaviour by starting iptraf or tethereal under the same
load. (At first I thought it might be because of promiscuous mode.)

This seems to be what _might_ have triggered it, although it was logged
5 minutes before the traffic decay:

 printk: 36 messages suppressed.
 swapper: page allocation failure. order:0, mode:0x20
  [__alloc_pages+525/912] __alloc_pages+0x20d/0x390
  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
  [kmem_getpages+24/192] kmem_getpages+0x18/0xc0
  [cache_grow+157/304] cache_grow+0x9d/0x130
  [cache_alloc_refill+380/576] cache_alloc_refill+0x17c/0x240
  [__kmalloc+122/144] __kmalloc+0x7a/0x90
  [alloc_skb+50/208] alloc_skb+0x32/0xd0
  [tg3_alloc_rx_skb+112/304] tg3_alloc_rx_skb+0x70/0x130
  [tg3_rx+518/944] tg3_rx+0x206/0x3b0
  [tg3_poll+139/336] tg3_poll+0x8b/0x150
  [net_rx_action+125/288] net_rx_action+0x7d/0x120
  [__do_softirq+184/208] __do_softirq+0xb8/0xd0
  [do_softirq+45/48] do_softirq+0x2d/0x30
  [do_IRQ+30/48] do_IRQ+0x1e/0x30
  [common_interrupt+26/32] common_interrupt+0x1a/0x20
  [default_idle+0/64] default_idle+0x0/0x40
  [default_idle+44/64] default_idle+0x2c/0x40
  [cpu_idle+51/64] cpu_idle+0x33/0x40
  [start_kernel+331/368] start_kernel+0x14b/0x170
  [unknown_bootoption+0/432] unknown_bootoption+0x0/0x1b0
 DMA per-cpu:
 cpu 0 hot: low 2, high 6, batch 1
 cpu 0 cold: low 0, high 2, batch 1
 cpu 1 hot: low 2, high 6, batch 1
 cpu 1 cold: low 0, high 2, batch 1
 Normal per-cpu:
 cpu 0 hot: low 32, high 96, batch 16
 cpu 0 cold: low 0, high 32, batch 16
 cpu 1 hot: low 32, high 96, batch 16
 cpu 1 cold: low 0, high 32, batch 16
 HighMem per-cpu:
 cpu 0 hot: low 32, high 96, batch 16
 cpu 0 cold: low 0, high 32, batch 16
 cpu 1 hot: low 32, high 96, batch 16
 cpu 1 cold: low 0, high 32, batch 16
 
 Free pages:        4616kB (1600kB HighMem)
 Active:504159 inactive:454759 dirty:20020 writeback:115 unstable:0 free:1154 slab:50758 mapped:489095 pagetables:1222
 DMA free:56kB min:144kB low:288kB high:432kB active:1936kB inactive:4932kB present:16384kB pages_scanned:32 all_unreclaimable? no
 protections[]: 0 0 0
 Normal free:2960kB min:8044kB low:16088kB high:24132kB active:492320kB inactive:166992kB present:901120kB pages_scanned:62 all_unreclaimable? no
 protections[]: 0 0 0
 HighMem free:1600kB min:512kB low:1024kB high:1536kB active:1522380kB inactive:1647112kB present:3178432kB pages_scanned:0 all_unreclaimable? no
 protections[]: 0 0 0
 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56kB
 Normal: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2960kB
 HighMem: 6*4kB 3*8kB 41*16kB 0*32kB 6*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
 Swap cache: add 154147, delete 151810, find 29532/39794, race 0+0
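
As a sanity check on the dump above, a minimal sketch that re-adds the buddy
free lists and compares each zone's free memory against its "min" watermark.
All figures are copied from the dump; the function and variable names are
hypothetical. The failing allocation comes from the tg3 receive path in
softirq context, so it presumably could not wait for reclaim.

```python
import re


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one buddy free-list line."""
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


# Free-list lines copied from the allocation-failure dump.
BUDDY = {
    "DMA": "0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB "
           "0*1024kB 0*2048kB 0*4096kB = 56kB",
    "Normal": "0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 0*128kB 1*256kB 1*512kB "
              "0*1024kB 1*2048kB 0*4096kB = 2960kB",
    "HighMem": "6*4kB 3*8kB 41*16kB 0*32kB 6*64kB 0*128kB 0*256kB 1*512kB "
               "0*1024kB 0*2048kB 0*4096kB = 1600kB",
}

# (free kB, min watermark kB) per zone, from the same dump.
ZONES = {"DMA": (56, 144), "Normal": (2960, 8044), "HighMem": (1600, 512)}

totals = {zone: buddy_total_kb(line) for zone, line in BUDDY.items()}
below_min = {zone: free < wmark for zone, (free, wmark) in ZONES.items()}

for zone, (free, wmark) in ZONES.items():
    status = "below min" if below_min[zone] else "above min"
    print(f"{zone}: lists sum to {totals[zone]}kB, free {free}kB, min {wmark}kB ({status})")
```

The lists add up to the reported totals, and both DMA and Normal are well
below their min watermark, while HighMem is not.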

	Stefan

