From: Andrew Morton <akpm@zip.com.au>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: lkml <linux-kernel@vger.kernel.org>
Subject: [patch] don't allocate ratnodes under PF_MEMALLOC
Date: Fri, 12 Apr 2002 23:59:43 -0700
Message-ID: <3CB7D75F.FEE95D28@zip.com.au>
On the swap_out() path, the radix-tree pagecache is allocating its
nodes with PF_MEMALLOC set, which allows it to completely exhaust the
free page lists(*). This is fairly easy to trigger with swap-intensive
loads.
It would be better to make those node allocations fail at an earlier
time. When this happens, the radix-tree can still obtain nodes from its
mempool, and we leave some memory available for the I/O layer.
(Assuming that the I/O is being performed under PF_MEMALLOC, which it
is).
So the patch simply drops PF_MEMALLOC while adding nodes to the
swapcache's tree.
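To make the effect concrete, here is a toy userspace model of the behaviour described above. It is only a sketch, not kernel code: the names (toy_zone, toy_pool, toy_swap_out), the one-page-per-node cost and the floor value are all assumptions chosen for illustration. With the "memalloc" flag set the node allocations drain the free list to zero before touching the reserve pool; with it cleared they stop at the floor and fall back to the pool, leaving pages for I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name and number here is illustrative. */
struct toy_zone {
	int free_pages;   /* pages on the free lists */
	int atomic_floor; /* assumed floor for allocations without "memalloc" */
};

struct toy_pool {
	int reserved;     /* preallocated nodes held back, mempool-style */
};

/* Take one page. With memalloc set there is no floor at all. */
static bool toy_page_alloc(struct toy_zone *z, bool memalloc)
{
	int floor = memalloc ? 0 : z->atomic_floor;

	if (z->free_pages <= floor)
		return false;
	z->free_pages--;
	return true;
}

/* One node costs one page in this model; fall back to the pool on failure. */
static bool toy_node_alloc(struct toy_zone *z, struct toy_pool *p, bool memalloc)
{
	if (toy_page_alloc(z, memalloc))
		return true;
	if (p->reserved > 0) {
		p->reserved--;
		return true;
	}
	return false;
}

/* Try to allocate `want` nodes; return how many allocations succeeded. */
static int toy_swap_out(struct toy_zone *z, struct toy_pool *p, int want,
			bool memalloc)
{
	int got = 0;

	assert(want >= 0);
	while (got < want && toy_node_alloc(z, p, memalloc))
		got++;
	return got;
}
```

Starting from 10 free pages, a floor of 4 and 8 pooled nodes, asking for 12 nodes succeeds either way. The difference is where they come from: with memalloc the free list ends at 0, without it the free list ends at 4 and the pool supplies the rest.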
We're still performing atomic allocations, so the rat is still biting
pretty deeply into the page reserves - under heavy load the amount of
free memory is less than half of what it was pre-rat.
It is unfortunate that the page allocator overloads !__GFP_WAIT to also
mean "try harder". It would be better to separate these concepts, and
to allow the radix-tree code (at least) to perform atomic allocations,
but to not go below pages_min. It seems that __GFP_TRY_HARDER will be
pretty straightforward to implement. Later.
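As a rough illustration of what separating the two concepts could look like, the sketch below compares the watermark choice in both schemes. Everything in it is hypothetical: the TOY_* flag values, the function names and the "quarter of pages_min" floor are assumptions for the example, not the page allocator's actual code.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; the values match no kernel. */
#define TOY_GFP_WAIT       0x1u /* caller may sleep */
#define TOY_GFP_TRY_HARDER 0x2u /* caller may dip below pages_min */

/* Overloaded scheme: an atomic (!WAIT) caller is also allowed to dig deeper. */
static int toy_floor_overloaded(unsigned int gfp, int pages_min)
{
	assert(pages_min >= 0);
	return (gfp & TOY_GFP_WAIT) ? pages_min : pages_min / 4;
}

/* Split scheme: only an explicit TRY_HARDER lowers the floor. */
static int toy_floor_split(unsigned int gfp, int pages_min)
{
	assert(pages_min >= 0);
	return (gfp & TOY_GFP_TRY_HARDER) ? pages_min / 4 : pages_min;
}
```

Under the split scheme an atomic caller such as the radix-tree code would pass neither flag and so stop at pages_min, while callers that really need the reserves would ask for them explicitly.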
The patch also implements a workaround for the mempool list_head
problem, until that is sorted out.
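For readers who have not hit the list_head problem, here is a small userspace sketch of the failure mode and the workaround. It assumes a pool that threads its free list through the first bytes of each element, and a consumer that expects elements to come back zeroed. The toy_* names and the two-pointer toy_link header (standing in for struct list_head) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_node { void *slots[8]; };       /* consumer expects all slots NULL */
struct toy_link { void *next; void *prev; }; /* stand-in for struct list_head */
struct toy_pool { struct toy_node *top; };

/* Return an element to the pool, overwriting its first bytes with the link. */
static void toy_pool_put(struct toy_pool *pool, struct toy_node *node)
{
	struct toy_link link = { .next = pool->top, .prev = pool };

	assert(sizeof(link) <= sizeof(*node));
	memcpy(node, &link, sizeof(link));
	pool->top = node;
}

/* Take an element from the pool; its first bytes still hold stale pointers. */
static struct toy_node *toy_pool_get(struct toy_pool *pool)
{
	struct toy_node *node = pool->top;
	struct toy_link link;

	if (!node)
		return NULL;
	memcpy(&link, node, sizeof(link));
	pool->top = link.next;
	return node;
}

/* The workaround: clear the bytes the pool scribbled on before handing out. */
static struct toy_node *toy_node_alloc(struct toy_pool *pool)
{
	struct toy_node *node = toy_pool_get(pool);

	if (node)
		memset(node, 0, sizeof(struct toy_link));
	return node;
}

/* Returns 1 if every slot is NULL, 0 otherwise. */
static int toy_node_is_clean(const struct toy_node *node)
{
	for (size_t i = 0; i < 8; i++)
		if (node->slots[i])
			return 0;
	return 1;
}
```

An element taken with toy_pool_get() fails toy_node_is_clean(), while one taken through toy_node_alloc() passes. That is the same shape as the memset() in radix_tree_node_alloc() in the patch.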
(*) The usual result is that the SCSI layer dies at scsi_merge.c:82.
It would be nice to have a fix for that - it's going BUG if 1-order
allocations fail at interrupt time. That happens pretty easily.
--- 2.5.8-pre3/mm/vmscan.c~ratcache-pf_memalloc Fri Apr 12 23:40:35 2002
+++ 2.5.8-pre3-akpm/mm/vmscan.c Fri Apr 12 23:40:35 2002
@@ -36,6 +36,29 @@
#define DEF_PRIORITY (6)
/*
+ * On the swap_out path, the radix-tree node allocations are performing
+ * GFP_ATOMIC allocations under PF_MEMALLOC. They can completely
+ * exhaust the page allocator. This is bad; some pages should be left
+ * available for the I/O system to start sending the swapcache contents
+ * to disk.
+ *
+ * So PF_MEMALLOC is dropped here. This causes the slab allocations to fail
+ * earlier, so radix-tree nodes will then be allocated from the mempool
+ * reserves.
+ */
+static inline int
+swap_out_add_to_swap_cache(struct page *page, swp_entry_t entry)
+{
+	int flags = current->flags;
+	int ret;
+
+	current->flags &= ~PF_MEMALLOC;
+	ret = add_to_swap_cache(page, entry);
+	current->flags = flags;
+	return ret;
+}
+
+/*
* The swap-out function returns 1 if it successfully
* scanned all the pages it was asked to (`count').
* It returns zero if it couldn't do anything,
@@ -140,7 +163,7 @@ drop_pte:
* (adding to the page cache will clear the dirty
* and uptodate bits, so we need to do it again)
*/
- switch (add_to_swap_cache(page, entry)) {
+ switch (swap_out_add_to_swap_cache(page, entry)) {
case 0: /* Success */
SetPageUptodate(page);
set_page_dirty(page);
--- 2.5.8-pre3/lib/radix-tree.c~ratcache-pf_memalloc Fri Apr 12 23:40:35 2002
+++ 2.5.8-pre3-akpm/lib/radix-tree.c Fri Apr 12 23:54:49 2002
@@ -49,11 +49,27 @@ struct radix_tree_path {
static kmem_cache_t *radix_tree_node_cachep;
static mempool_t *radix_tree_node_pool;
-#define radix_tree_node_alloc(root) \
- mempool_alloc(radix_tree_node_pool, (root)->gfp_mask)
-#define radix_tree_node_free(node) \
- mempool_free((node), radix_tree_node_pool);
+/*
+ * mempool scribbles on the first eight bytes of the managed
+ * memory. Here we implement a temp workaround for that.
+ */
+#include <linux/list.h>
+static inline struct radix_tree_node *
+radix_tree_node_alloc(struct radix_tree_root *root)
+{
+	struct radix_tree_node *ret;
+
+	ret = mempool_alloc(radix_tree_node_pool, root->gfp_mask);
+	if (ret)
+		memset(ret, 0, sizeof(struct list_head));
+	return ret;
+}
+static inline void
+radix_tree_node_free(struct radix_tree_node *node)
+{
+	mempool_free(node, radix_tree_node_pool);
+}
/*
* Return the maximum key which can be store into a
-