public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Stefan Schmidt <zaphodb@zaphods.net>
To: Andrew Morton <akpm@osdl.org>
Cc: marcelo.tosatti@cyclades.com, linux-kernel@vger.kernel.org,
	piggin@cyberone.com.au
Subject: 2.6.10-rc1-mm4 -1 EAGAIN after allocation failure was: Re: Kernel 2.6.9 Multiple Page Allocation Failures
Date: Wed, 10 Nov 2004 02:27:34 +0100	[thread overview]
Message-ID: <20041110012733.GD20754@zaphods.net> (raw)
In-Reply-To: <20041109235201.GC20754@zaphods.net>

Alright, i got a funny thing here that i suspect to be an(other?) vm issue:

We are running a third-party closed source software which handles many tcp
sessions and reads and writes that to/from several disks/partitions.
With 2.6.10-rc1-mm4 it is the first time we notice that, right after the kernel
throws a swapper: page allocation error thread (just like the ones you already
know), the interrupt rate, connection count and traffic decreases subsequently.

Here is part of a vmstat 10:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0  11312  19404    268 1896896    0    0  1091 17578 25463  1225  7 38 37 18
 0  0  11312  26372    180 1892836    0    0  1182 21433 25576  1216  7 38 31 24
 1  2  11308  23784    608 1890168    0    0  1252 20667 25532  1243  7 40 24 29
 0  2  11308  23304    428 1890552    0    0  1174 20363 25948  1332  7 40 32 21
 1  1  11304  18496    444 1893328    0    0  1630 20506 25840  1322  7 38 30 26
 1  1  11304   8712    232 1905508    0    0  1528 19662 26245  1305  7 40 25 28
 1  0  11304  18952    180 1894000    0    0  1595 19680 26275  1215  7 38 27 28
 1  0  11304  22404    132 1896632    0    0   369 17724 24072  1045  7 37 49  7
 1  0  11304  23956    492 1899876    0    0   504 19609 20829  1151  9 34 49  7
 1  0  11304  25380    460 1908340    0    0   424 17983 16964   927  9 28 55  8
 1  0  11304  18244    464 1922140    0    0   309 14431 13417   836 10 27 60  3
 0  0  11304  17720    472 1928388    0    0   224 11868 9933   607 11 23 63  3
 1  0  11304  25720    476 1924440    0    0   133  7663 6780   504 10 20 68  2
 1  0  11304  24156    488 1928168    0    0   107  6244 5011   315  8 18 73  1
 0  0  11304  19544    712 1934268    0    0    76  3191 4464   299  8 18 73  1
 0  0  11304  19248    728 1936564    0    0    23  1802 4002   249  7 17 76  0
 1  0  11304  27092    736 1929892    0    0    16  1336 3655   284  6 16 78  0
 0  0  11304  26472    752 1931984    0    0    19  1508 3408   248  5 16 78  1
 0  0  11304  19000    768 1940944    0    0    20  1398 3195   226  5 14 81  0
 1  0  11304  21460    776 1938896    0    0    14  1084 3057   241  5 14 82  0
 0  0  11304  26268    848 1934608    0    0    12   927 2906   218  5 13 82  0
 1  1  11304  22076    900 1939860    0    0    18   679 2897   215  5 11 84  1
 0  0  11304  25880    952 1936748    0    0    17   653 2713   251  4 13 82  1
 0  0  11304  20436    976 1942368    0    0     8  1117 2703   229  5 11 83  1
...

strace shows:
01:38:50.316041 gettimeofday({1100047130, 316054}, NULL) = 0
01:38:50.316188 poll([{fd=5671, events=POLLIN}, {fd=2727, events=POLLIN}, {fd=6663, events=POLLIN}, {fd=197, events=POLLIN}, {fd=3978, events=POLLIN}, {fd=779, events=POLLIN}, ...{line continues like this}...
...
01:38:50.328056 accept(5, 0xbffd4ab8, [16]) = -1 EAGAIN (Resource temporarily unavailable) ...{an awful lot of these}...
...
01:38:50.329585 futex(0xaf1a698, FUTEX_WAIT, 92828, {0, 9964000}) = -1 ETIMEDOUT (Connection timed out) ...{some of these}...
...
Application says:
"n.n.n.n:p Client closed connection in body read"

To me it seems like suddently all those open sockets are suddenly 'temporarily
unavailable' to the application and so the connections time out.
I have not (yet?) seen this behaviour on 2.6.9, 2.6.9-mm1, 2.6.10-rc1-bk12 or
2.6.10-rc1-mm3.
I am able to reproduce the behaviour if under the same load iptraf or
tethereal are started. (First thought it might be because of the promisc mode.)

This seems to be what _might_ have triggered this although it was logged
happened 5m earlier than the traffic decay:

 printk: 36 messages suppressed.
 swapper: page allocation failure. order:0, mode:0x20
  [__alloc_pages+525/912] __alloc_pages+0x20d/0x390
  [__get_free_pages+24/48] __get_free_pages+0x18/0x30
  [kmem_getpages+24/192] kmem_getpages+0x18/0xc0
  [cache_grow+157/304] cache_grow+0x9d/0x130
  [cache_alloc_refill+380/576] cache_alloc_refill+0x17c/0x240
  [__kmalloc+122/144] __kmalloc+0x7a/0x90
  [alloc_skb+50/208] alloc_skb+0x32/0xd0
  [tg3_alloc_rx_skb+112/304] tg3_alloc_rx_skb+0x70/0x130
  [tg3_rx+518/944] tg3_rx+0x206/0x3b0
  [tg3_poll+139/336] tg3_poll+0x8b/0x150
  [net_rx_action+125/288] net_rx_action+0x7d/0x120
  [__do_softirq+184/208] __do_softirq+0xb8/0xd0
  [do_softirq+45/48] do_softirq+0x2d/0x30
  [do_IRQ+30/48] do_IRQ+0x1e/0x30
  [common_interrupt+26/32] common_interrupt+0x1a/0x20
  [default_idle+0/64] default_idle+0x0/0x40
  [default_idle+44/64] default_idle+0x2c/0x40
  [cpu_idle+51/64] cpu_idle+0x33/0x40
  [start_kernel+331/368] start_kernel+0x14b/0x170
  [unknown_bootoption+0/432] unknown_bootoption+0x0/0x1b0
 DMA per-cpu:
 cpu 0 hot: low 2, high 6, batch 1
 cpu 0 cold: low 0, high 2, batch 1
 cpu 1 hot: low 2, high 6, batch 1
 cpu 1 cold: low 0, high 2, batch 1
 Normal per-cpu:
 cpu 0 hot: low 32, high 96, batch 16
 cpu 0 cold: low 0, high 32, batch 16
 cpu 1 hot: low 32, high 96, batch 16
 cpu 1 cold: low 0, high 32, batch 16
 HighMem per-cpu:
 cpu 0 hot: low 32, high 96, batch 16
 cpu 0 cold: low 0, high 32, batch 16
 cpu 1 hot: low 32, high 96, batch 16
 cpu 1 cold: low 0, high 32, batch 16
 
 Free pages:        4616kB (1600kB HighMem)
 Active:504159 inactive:454759 dirty:20020 writeback:115 unstable:0 free:1154 slab:50758 mapped:489095 pagetables:1222
 DMA free:56kB min:144kB low:288kB high:432kB active:1936kB inactive:4932kB present:16384kB pages_scanned:32 all_unreclaimable? no
 protections[]: 0 0 0
 Normal free:2960kB min:8044kB low:16088kB high:24132kB active:492320kB inactive:166992kB present:901120kB pages_scanned:62 all_unreclaimable? no
 protections[]: 0 0 0
 HighMem free:1600kB min:512kB low:1024kB high:1536kB active:1522380kB inactive:1647112kB present:3178432kB pages_scanned:0 all_unreclaimable? no
 protections[]: 0 0 0
 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 56kB
 Normal: 0*4kB 0*8kB 1*16kB 0*32kB 2*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2960kB
 HighMem: 6*4kB 3*8kB 41*16kB 0*32kB 6*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1600kB
 Swap cache: add 154147, delete 151810, find 29532/39794, race 0+0

	Stefan

  reply	other threads:[~2004-11-10  1:27 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-03 22:24 Kernel 2.6.9 Multiple Page Allocation Failures (Part 2) Stefan Schmidt
2004-11-04 12:17 ` Marcelo Tosatti
2004-11-04 18:18   ` Stefan Schmidt
2004-11-09 16:41     ` Kernel 2.6.9 Multiple Page Allocation Failures Marcelo Tosatti
2004-11-09 22:35       ` Lukas Hejtmanek
2004-11-09 22:46         ` Andrew Morton
2004-11-09 22:44           ` Lukas Hejtmanek
2004-11-09 20:33             ` Marcelo Tosatti
2004-11-10 20:35               ` Lukas Hejtmanek
2004-11-10 21:09                 ` Andrew Morton
2004-11-10 21:24                   ` Lukas Hejtmanek
2004-11-10 21:47                     ` Andrew Morton
2004-11-10 21:28               ` Lukas Hejtmanek
2004-11-10 18:11                 ` Marcelo Tosatti
2004-11-11  1:04                   ` Lukas Hejtmanek
2004-11-11 21:44                   ` Lukas Hejtmanek
2004-11-12 12:09                     ` Nick Piggin
2004-11-13 14:47                       ` Stefan Schmidt
2004-11-16  9:33                         ` Marcelo Tosatti
2004-11-16 17:05                           ` Lukas Hejtmanek
2004-11-21  1:43                             ` Stefan Schmidt
2004-11-21  2:42                               ` Stefan Schmidt
2004-12-02 19:54                                 ` Lukas Hejtmanek
2004-12-02 20:25                                   ` Andrew Morton
2004-12-02 21:03                                     ` Lukas Hejtmanek
2004-12-02 22:31                                       ` Stefan Schmidt
2004-12-02 22:48                                         ` Lukas Hejtmanek
2004-12-02 22:56                                         ` Andrew Morton
2004-12-02 23:18                                           ` Lukas Hejtmanek
2004-12-03  0:18                                             ` Andrew Morton
2004-12-03 12:11                                               ` Lukas Hejtmanek
2004-12-03 12:17                                                 ` Lukas Hejtmanek
2004-12-07 22:52                                                 ` Nick Piggin
2004-12-07 22:59                                                   ` Lukas Hejtmanek
2004-12-07 23:05                                                     ` Nick Piggin
2004-12-08 11:18                                                       ` Lukas Hejtmanek
2004-12-08 11:23                                                         ` Nick Piggin
2004-12-08 11:46                                                           ` Lukas Hejtmanek
2004-12-08 13:14                                                           ` Lukas Hejtmanek
2004-12-09  8:52                                                             ` Nick Piggin
2004-12-09  9:02                                                               ` Lukas Hejtmanek
2004-12-09 10:29                                                                 ` Nick Piggin
2004-12-09 10:37                                                                   ` Lukas Hejtmanek
2004-12-03  6:18                                           ` Nathan Scott
2004-12-03  7:06                                             ` Andrew Morton
2004-12-07 11:17                                             ` Lukas Hejtmanek
2004-12-08  0:15                                               ` Nathan Scott
2004-12-08  0:36                                                 ` Lukas Hejtmanek
2004-12-03 10:35                                           ` Christoph Hellwig
2004-12-03 10:58                                     ` P
2004-12-03 17:11                                       ` Andrew Morton
2004-11-09 23:52           ` Stefan Schmidt
2004-11-10  1:27             ` Stefan Schmidt [this message]
2004-11-10  1:39               ` 2.6.10-rc1-mm4 -1 EAGAIN after allocation failure was: " Andrew Morton
2004-11-10  2:03                 ` Stefan Schmidt
2004-11-10  2:21                   ` Andrew Morton
2004-11-10  4:24                   ` Nick Piggin
2004-11-10 10:28                     ` Stefan Schmidt
2004-11-10 12:06                       ` Stefan Schmidt
2004-11-10  8:58                         ` Marcelo Tosatti
2004-11-10 12:48                           ` Stefan Schmidt
2004-11-10 10:56                             ` Marcelo Tosatti
2004-11-11  1:23                             ` Nick Piggin
2004-11-11 18:31                               ` jhigdon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041110012733.GD20754@zaphods.net \
    --to=zaphodb@zaphods.net \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marcelo.tosatti@cyclades.com \
    --cc=piggin@cyberone.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox