* [patch] don't allocate ratnodes under PF_MEMALLOC
From: Andrew Morton @ 2002-04-13 6:59 UTC
To: Linus Torvalds; +Cc: lkml
On the swap_out() path, the radix-tree pagecache is allocating its
nodes with PF_MEMALLOC set, which allows it to completely exhaust the
free page lists(*). This is fairly easy to trigger with swap-intensive
loads.
It would be better to make those node allocations fail at an earlier
time. When this happens, the radix-tree can still obtain nodes from its
mempool, and we leave some memory available for the I/O layer.
(Assuming that the I/O is being performed under PF_MEMALLOC, which it
is).
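The fallback described above can be sketched in userspace. This is a minimal model, not the kernel's mempool code: every name here (toy_pool, toy_backing_alloc, the memory_tight argument) is hypothetical, and malloc() stands in for the slab allocator. It only illustrates the ordering: try the normal allocator first, and touch the preallocated reserve only when that fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define TOY_POOL_MIN 4

struct toy_pool {
	void *reserve[TOY_POOL_MIN];	/* preallocated elements */
	int nr_reserved;		/* elements currently held in reserve */
	size_t elem_size;
};

/* Stand-in for the slab allocator: fails when memory is "tight". */
static void *toy_backing_alloc(size_t size, int memory_tight)
{
	return memory_tight ? NULL : malloc(size);
}

/* Fill the reserve up front, while memory is still plentiful. */
static int toy_pool_init(struct toy_pool *p, size_t elem_size)
{
	p->elem_size = elem_size;
	p->nr_reserved = 0;
	while (p->nr_reserved < TOY_POOL_MIN) {
		void *e = malloc(elem_size);
		if (!e) {
			while (p->nr_reserved > 0)
				free(p->reserve[--p->nr_reserved]);
			return -1;
		}
		p->reserve[p->nr_reserved++] = e;
	}
	return 0;
}

/* Try the backing allocator first; fall back to the reserve on failure. */
static void *toy_pool_alloc(struct toy_pool *p, int memory_tight)
{
	void *e = toy_backing_alloc(p->elem_size, memory_tight);

	if (e)
		return e;
	if (p->nr_reserved > 0)
		return p->reserve[--p->nr_reserved];
	return NULL;
}

/* Refill the reserve before handing memory back to the system. */
static void toy_pool_free(struct toy_pool *p, void *e)
{
	assert(e != NULL);
	if (p->nr_reserved < TOY_POOL_MIN)
		p->reserve[p->nr_reserved++] = e;
	else
		free(e);
}
```

In this model, making the backing allocation fail earlier costs nothing as long as the reserve still holds elements, which is the property the patch relies on.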
So the patch simply drops PF_MEMALLOC while adding nodes to the
swapcache's tree.
We're still performing atomic allocations, so the rat is still biting
pretty deeply into the page reserves - under heavy load the amount of
free memory is less than half of what it was pre-rat.
It is unfortunate that the page allocator overloads !__GFP_WAIT to also
mean "try harder". It would be better to separate these concepts, and
to allow the radix-tree code (at least) to perform atomic allocations,
but to not go below pages_min. It seems that __GFP_TRY_HARDER will be
pretty straightforward to implement. Later.
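To make the distinction concrete, here is a toy model of the two policies. It is a sketch under stated assumptions, not allocator code: the flag values, struct toy_zone, and the "divide the watermark by four" rule are all illustrative, and TOY_GFP_TRY_HARDER only mirrors the flag name proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags and types; the values are arbitrary. */
#define TOY_GFP_WAIT		0x1u	/* caller may sleep */
#define TOY_GFP_TRY_HARDER	0x2u	/* caller may dip below pages_min */

struct toy_zone {
	unsigned long free_pages;
	unsigned long pages_min;
};

/* Coupled policy: "cannot sleep" also means "may dig into the reserves". */
static unsigned long toy_min_coupled(const struct toy_zone *z, unsigned int gfp)
{
	unsigned long min;

	assert(z != NULL);
	min = z->pages_min;
	if (!(gfp & TOY_GFP_WAIT))
		min /= 4;	/* illustrative reduction, an assumption */
	return min;
}

/* Split policy: only an explicit flag lowers the watermark. */
static unsigned long toy_min_split(const struct toy_zone *z, unsigned int gfp)
{
	unsigned long min;

	assert(z != NULL);
	min = z->pages_min;
	if (gfp & TOY_GFP_TRY_HARDER)
		min /= 4;	/* same illustrative reduction */
	return min;
}

/* An allocation is allowed only while free pages stay above the watermark. */
static bool toy_may_allocate(const struct toy_zone *z, unsigned long min)
{
	return z->free_pages > min;
}
```

Under the split policy an atomic caller such as the radix-tree code would pass neither flag: it never sleeps, and it fails as soon as free memory reaches pages_min instead of eating further into the reserves.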
The patch also implements a workaround for the mempool list_head
problem, until that is sorted out.
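The list_head problem can be reproduced in a few lines of userspace C. This is a sketch with hypothetical names (toy_node, toy_free_list and friends), assuming a pool that keeps idle elements on a list by storing the link pointers inside the elements themselves. An element handed back out then starts with stale pointers instead of zeroes, and the workaround is to clear exactly that region, as the memset() in the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as a list_head: two pointers. All names are hypothetical. */
struct toy_list_head {
	void *next, *prev;
};

#define TOY_SLOTS 4

struct toy_node {
	unsigned int count;
	void *slots[TOY_SLOTS];
};

struct toy_free_list {
	struct toy_node *top;
};

/* Park a node on the free list, overwriting its first bytes with links. */
static void toy_put(struct toy_free_list *fl, struct toy_node *node)
{
	struct toy_list_head link;

	assert(node != NULL);
	assert(sizeof(*node) >= sizeof(link));
	link.next = fl->top;
	link.prev = fl;			/* non-NULL, so the scribble is visible */
	memcpy(node, &link, sizeof(link));
	fl->top = node;
}

/* Take a node back without cleaning it: the stale links are still there. */
static struct toy_node *toy_get_raw(struct toy_free_list *fl)
{
	struct toy_node *node = fl->top;

	if (node) {
		struct toy_list_head link;

		memcpy(&link, node, sizeof(link));
		fl->top = link.next;
	}
	return node;
}

/* The workaround: zero the region the list links were written to. */
static struct toy_node *toy_get_fixed(struct toy_free_list *fl)
{
	struct toy_node *node = toy_get_raw(fl);

	if (node)
		memset(node, 0, sizeof(struct toy_list_head));
	return node;
}

/* Returns 1 if the first sizeof(struct toy_list_head) bytes are all zero. */
static int toy_head_is_clean(const struct toy_node *node)
{
	static const struct toy_list_head zero;

	return memcmp(node, &zero, sizeof(zero)) == 0;
}
```

Only the first sizeof(struct toy_list_head) bytes need clearing, because the rest of a node returned to the pool is assumed to be zeroed already.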
(*) The usual result is that the SCSI layer dies at scsi_merge.c:82.
It would be nice to have a fix for that - it goes BUG if order-1
allocations fail at interrupt time. That happens pretty easily.
--- 2.5.8-pre3/mm/vmscan.c~ratcache-pf_memalloc Fri Apr 12 23:40:35 2002
+++ 2.5.8-pre3-akpm/mm/vmscan.c Fri Apr 12 23:40:35 2002
@@ -36,6 +36,29 @@
#define DEF_PRIORITY (6)
/*
+ * On the swap_out path, the radix-tree node allocations are performing
+ * GFP_ATOMIC allocations under PF_MEMALLOC. They can completely
+ * exhaust the page allocator. This is bad; some pages should be left
+ * available for the I/O system to start sending the swapcache contents
+ * to disk.
+ *
+ * So PF_MEMALLOC is dropped here. This causes the slab allocations to fail
+ * earlier, so radix-tree nodes will then be allocated from the mempool
+ * reserves.
+ */
+static inline int
+swap_out_add_to_swap_cache(struct page *page, swp_entry_t entry)
+{
+ int flags = current->flags;
+ int ret;
+
+ current->flags &= ~PF_MEMALLOC;
+ ret = add_to_swap_cache(page, entry);
+ current->flags = flags;
+ return ret;
+}
+
+/*
* The swap-out function returns 1 if it successfully
* scanned all the pages it was asked to (`count').
* It returns zero if it couldn't do anything,
@@ -140,7 +163,7 @@ drop_pte:
* (adding to the page cache will clear the dirty
* and uptodate bits, so we need to do it again)
*/
- switch (add_to_swap_cache(page, entry)) {
+ switch (swap_out_add_to_swap_cache(page, entry)) {
case 0: /* Success */
SetPageUptodate(page);
set_page_dirty(page);
--- 2.5.8-pre3/lib/radix-tree.c~ratcache-pf_memalloc Fri Apr 12 23:40:35 2002
+++ 2.5.8-pre3-akpm/lib/radix-tree.c Fri Apr 12 23:54:49 2002
@@ -49,11 +49,27 @@ struct radix_tree_path {
static kmem_cache_t *radix_tree_node_cachep;
static mempool_t *radix_tree_node_pool;
-#define radix_tree_node_alloc(root) \
- mempool_alloc(radix_tree_node_pool, (root)->gfp_mask)
-#define radix_tree_node_free(node) \
- mempool_free((node), radix_tree_node_pool);
+/*
+ * mempool scribbles on the first eight bytes of the managed
+ * memory. Here we implement a temp workaround for that.
+ */
+#include <linux/list.h>
+static inline struct radix_tree_node *
+radix_tree_node_alloc(struct radix_tree_root *root)
+{
+ struct radix_tree_node *ret;
+
+ ret = mempool_alloc(radix_tree_node_pool, root->gfp_mask);
+ if (ret)
+ memset(ret, 0, sizeof(struct list_head));
+ return ret;
+}
+static inline void
+radix_tree_node_free(struct radix_tree_node *node)
+{
+ mempool_free(node, radix_tree_node_pool);
+}
/*
* Return the maximum key which can be store into a
* Re: [patch] don't allocate ratnodes under PF_MEMALLOC
From: Christoph Hellwig @ 2002-04-13 11:32 UTC
To: Andrew Morton; +Cc: Linus Torvalds, lkml
>
> /*
> + * On the swap_out path, the radix-tree node allocations are performing
> + * GFP_ATOMIC allocations under PF_MEMALLOC. They can completely
> + * exhaust the page allocator. This is bad; some pages should be left
> + * available for the I/O system to start sending the swapcache contents
> + * to disk.
> + *
> + * So PF_MEMALLOC is dropped here. This causes the slab allocations to fail
> + * earlier, so radix-tree nodes will then be allocated from the mempool
> + * reserves.
> + */
> +static inline int
> +swap_out_add_to_swap_cache(struct page *page, swp_entry_t entry)
> +{
> + int flags = current->flags;
> + int ret;
> +
> + current->flags &= ~PF_MEMALLOC;
> + ret = add_to_swap_cache(page, entry);
> + current->flags = flags;
> + return ret;
> +}
I don't like this solution very much - I think porting the add_to_swap()
logic from -rmap and implementing the flags fiddling in that function
makes more sense. I will do so once a -pre4 is out to resync.
Christoph
* Re: [patch] don't allocate ratnodes under PF_MEMALLOC
From: Christoph Hellwig @ 2002-04-13 17:13 UTC
To: Andrew Morton, Linus Torvalds, lkml
On Sat, Apr 13, 2002 at 12:32:44PM +0100, Christoph Hellwig wrote:
>
> I don't like this solution very much - I think porting the add_to_swap()
> logic from -rmap and implementing the flags fiddling in that function
> makes more sense. I will do so once a -pre4 is out to resync.
Umm, after actually taking a look at the mainline swapout path, an
rmap-like add_to_swap() logic doesn't fit very well. Go ahead with your
current fix, please.