From: Mel Gorman <mgorman@suse.de>
To: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>, Neil Brown <neilb@suse.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH 02/15] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
Date: Wed, 8 Feb 2012 16:34:21 +0000
Message-ID: <20120208163421.GL5938@suse.de>
In-Reply-To: <alpine.DEB.2.00.1202080907320.30248@router.home>

On Wed, Feb 08, 2012 at 09:14:32AM -0600, Christoph Lameter wrote:
> On Wed, 8 Feb 2012, Mel Gorman wrote:
> 
> > o struct kmem_cache_cpu could be left alone even though it's a small saving
> 
> It's multiplied by the number of caches and by the number of
> processors.
> 
> > o struct slab could also be left alone
> > o struct array_cache could be left alone although I would point out that
> >   it would make no difference in size as touched is changed to a bool to
> >   fit pfmemalloc in
> 
> Both of these are performance critical structures in slab.
> 

Ok, I looked into what is necessary to replace these with a page flag
check, and the cost shifts quite a bit, ending up higher overall.

Right now, I use array_cache to record whether there are any pfmemalloc
objects in the free list at all. If there are not, no expensive checks
are made. For example, in __ac_put_obj(), I check ac->pfmemalloc to see
if an expensive check is required. With a page flag, the same check
requires a lookup via virt_to_page(). This in turn uses pfn_to_page(),
which, depending on the memory model, can be very expensive. No matter
what, it is more expensive than a simple flag test, and this is in the
slab free path.
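
To make that cost concrete, this is roughly what the lookup expands to
under the common memory models (paraphrased from the era's
asm-generic/memory_model.h; the exact expansion is arch- and
config-dependent):

	/* virt_to_page(addr) is typically pfn_to_page(__pa(addr) >> PAGE_SHIFT) */

	/* FLATMEM: simple arithmetic on a single mem_map array */
	#define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))

	/* SPARSEMEM: extra dependent loads through the section tables */
	#define __pfn_to_page(pfn)				\
	({	unsigned long __pfn = (pfn);			\
		struct mem_section *__sec = __pfn_to_section(__pfn);	\
		__section_mem_map_addr(__sec) + __pfn;		\
	})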

It is more complicated in check_ac_pfmemalloc() too, although the
performance impact is smaller because it is a slow path. If
ac->pfmemalloc is false, the check of each slabp can be avoided
entirely. Without it, every slabp must be checked unconditionally, and
each check requires a virt_to_page() lookup.
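
For reference, the current form of that slow path looks like this
(reconstructed from the lines the patch below removes; the early return
is exactly what the page-flag approach loses):

	static void check_ac_pfmemalloc(struct kmem_cache *cachep,
							struct array_cache *ac)
	{
		struct kmem_list3 *l3 = cachep->nodelists[numa_mem_id()];
		struct slab *slabp;

		/* Common case: no pfmemalloc slabs anywhere, skip the walk */
		if (!ac->pfmemalloc)
			return;

		/* A single pfmemalloc slab keeps the flag set */
		list_for_each_entry(slabp, &l3->slabs_full, list)
			if (slabp->pfmemalloc)
				return;

		list_for_each_entry(slabp, &l3->slabs_partial, list)
			if (slabp->pfmemalloc)
				return;

		list_for_each_entry(slabp, &l3->slabs_free, list)
			if (slabp->pfmemalloc)
				return;

		ac->pfmemalloc = false;
	}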

Overall, the memory savings of moving to a page flag are minuscule, but
the performance cost is far higher because of the use of virt_to_page().

> > o It would still be necessary to do the object pointer tricks in slab.c
> 
> These tricks are not done for slub. It seems that they are not necessary?
> 

In slub, it's sufficient to check kmem_cache_cpu to know whether the
objects in the list are pfmemalloc or not.
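
Something along these lines (a hypothetical sketch with a made-up
helper name, assuming the __GFP_MEMALLOC flag from this series; not the
patch's actual code):

	/*
	 * slub can decide from kmem_cache_cpu alone, with no
	 * virt_to_page() in the fast path.
	 */
	static inline bool cpu_slab_pfmemalloc_match(struct kmem_cache_cpu *c,
						     gfp_t gfpflags)
	{
		/*
		 * Objects from a pfmemalloc slab are reserved for
		 * allocations entitled to the emergency reserves.
		 */
		return !c->pfmemalloc || (gfpflags & __GFP_MEMALLOC);
	}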

> > remain. However, the downside of requiring a page flag is very high. In
> > the event we increase the number of page flags - great, I'll use one but
> > right now I do not think the use of page flag is justified.
> 
> On 64 bit I think there is not much of an issue with another page flag.
> 

There isn't, but on 32 bit there is.

> Also consider that the slab allocators do not make full use of the other
> page flags. We could overload one of the existing flags. I removed
> slubs use of them last year. PG_active could be overloaded I think.
> 

Yeah, you're right on the button there. I did my checking assuming that
PG_active+PG_slab were safe to use. The following is an untested patch;
I have probably got some details wrong, but it illustrates where
virt_to_page() starts cropping up.

It was a good idea, and thanks for suggesting it, but unfortunately the
implementation would be more expensive than what I currently have.


diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e90a673..108f3ce 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -432,6 +432,39 @@ static inline int PageTransCompound(struct page *page)
 }
 #endif
 
+#ifdef CONFIG_NFS_SWAP
+static inline int PageSlabPfmemalloc(struct page *page)
+{
+	VM_BUG_ON(!PageSlab(page));
+	return PageActive(page);
+}
+
+static inline void SetPageSlabPfmemalloc(struct page *page)
+{
+	VM_BUG_ON(!PageSlab(page));
+	SetPageActive(page);
+}
+
+static inline void ClearPageSlabPfmemalloc(struct page *page)
+{
+	VM_BUG_ON(!PageSlab(page));
+	ClearPageActive(page);
+}
+#else
+static inline int PageSlabPfmemalloc(struct page *page)
+{
+	return 0;
+}
+
+static inline void SetPageSlabPfmemalloc(struct page *page)
+{
+}
+
+static inline void ClearPageSlabPfmemalloc(struct page *page)
+{
+}
+#endif
+
 #ifdef CONFIG_MMU
 #define __PG_MLOCKED		(1 << PG_mlocked)
 #else
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 1d9ae40..a32bcfd 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -46,7 +46,6 @@ struct kmem_cache_cpu {
 	struct page *page;	/* The slab from which we are allocating */
 	struct page *partial;	/* Partially allocated frozen slabs */
 	int node;		/* The node of the page (or -1 for debug) */
-	bool pfmemalloc;	/* Slab page had pfmemalloc set */
 #ifdef CONFIG_SLUB_STATS
 	unsigned stat[NR_SLUB_STAT_ITEMS];
 #endif
diff --git a/mm/slab.c b/mm/slab.c
index 268cd96..3012186 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -233,7 +233,6 @@ struct slab {
 			unsigned int inuse;	/* num of objs active in slab */
 			kmem_bufctl_t free;
 			unsigned short nodeid;
-			bool pfmemalloc;	/* Slab had pfmemalloc set */
 		};
 		struct slab_rcu __slab_cover_slab_rcu;
 	};
@@ -255,8 +254,7 @@ struct array_cache {
 	unsigned int avail;
 	unsigned int limit;
 	unsigned int batchcount;
-	bool touched;
-	bool pfmemalloc;
+	unsigned int touched;
 	spinlock_t lock;
 	void *entry[];	/*
 			 * Must have this definition in here for the proper
@@ -978,6 +976,13 @@ static struct array_cache *alloc_arraycache(int node, int entries,
 	return nc;
 }
 
+static inline bool is_slab_pfmemalloc(struct slab *slabp)
+{
+	struct page *page = virt_to_page(slabp->s_mem);
+
+	return PageSlabPfmemalloc(page);
+}
+
 /* Clears ac->pfmemalloc if no slabs have pfmalloc set */
 static void check_ac_pfmemalloc(struct kmem_cache *cachep,
 						struct array_cache *ac)
@@ -985,22 +990,18 @@ static void check_ac_pfmemalloc(struct kmem_cache *cachep,
 	struct kmem_list3 *l3 = cachep->nodelists[numa_mem_id()];
 	struct slab *slabp;
 
-	if (!ac->pfmemalloc)
-		return;
-
 	list_for_each_entry(slabp, &l3->slabs_full, list)
-		if (slabp->pfmemalloc)
+		if (is_slab_pfmemalloc(slabp))
 			return;
 
 	list_for_each_entry(slabp, &l3->slabs_partial, list)
-		if (slabp->pfmemalloc)
+		if (is_slab_pfmemalloc(slabp))
 			return;
 
 	list_for_each_entry(slabp, &l3->slabs_free, list)
-		if (slabp->pfmemalloc)
+		if (is_slab_pfmemalloc(slabp))
 			return;
 
-	ac->pfmemalloc = false;
 }
 
 static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
@@ -1036,7 +1037,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		l3 = cachep->nodelists[numa_mem_id()];
 		if (!list_empty(&l3->slabs_free) && force_refill) {
 			struct slab *slabp = virt_to_slab(objp);
-			slabp->pfmemalloc = false;
+			ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
 			clear_obj_pfmemalloc(&objp);
 			check_ac_pfmemalloc(cachep, ac);
 			return objp;
@@ -1066,15 +1067,11 @@ static inline void *ac_get_obj(struct kmem_cache *cachep,
 static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 								void *objp)
 {
-	struct slab *slabp;
+	struct page *page = virt_to_page(objp);
 
 	/* If there are pfmemalloc slabs, check if the object is part of one */
-	if (unlikely(ac->pfmemalloc)) {
-		slabp = virt_to_slab(objp);
-
-		if (slabp->pfmemalloc)
-			set_obj_pfmemalloc(&objp);
-	}
+	if (PageSlabPfmemalloc(page))
+		set_obj_pfmemalloc(&objp);
 
 	return objp;
 }
@@ -1906,9 +1903,13 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	else
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++)
+	for (i = 0; i < nr_pages; i++) {
 		__SetPageSlab(page + i);
 
+		if (*pfmemalloc)
+			SetPageSlabPfmemalloc(page + i);
+	}
+
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
 		kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
 
@@ -2888,7 +2889,6 @@ static struct slab *alloc_slabmgmt(struct kmem_cache *cachep, void *objp,
 	slabp->s_mem = objp + colour_off;
 	slabp->nodeid = nodeid;
 	slabp->free = 0;
-	slabp->pfmemalloc = false;
 	return slabp;
 }
 
@@ -3074,13 +3074,6 @@ static int cache_grow(struct kmem_cache *cachep,
 	if (!slabp)
 		goto opps1;
 
-	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
-	if (pfmemalloc) {
-		struct array_cache *ac = cpu_cache_get(cachep);
-		slabp->pfmemalloc = true;
-		ac->pfmemalloc = true;
-	}
-
 	slab_map_pages(cachep, slabp, objp);
 
 	cache_init_objs(cachep, slabp);
