From: NeilBrown <neilb@suse.de>
To: Mel Gorman <mgorman@suse.de>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-Netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 02/13] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
Date: Thu, 28 Apr 2011 09:21:10 +1000
Message-ID: <20110428092110.608eb354@notabene.brown>
In-Reply-To: <20110426135940.GE4658@suse.de>

On Tue, 26 Apr 2011 14:59:40 +0100 Mel Gorman <mgorman@suse.de> wrote:

> On Tue, Apr 26, 2011 at 09:37:58PM +1000, NeilBrown wrote:
> > On Tue, 26 Apr 2011 08:36:43 +0100 Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > +		/*
> > > +		 * If there are full empty slabs and we were not forced to
> > > +		 * allocate a slab, mark this one !pfmemalloc
> > > +		 */
> > > +		l3 = cachep->nodelists[numa_mem_id()];
> > > +		if (!list_empty(&l3->slabs_free) && force_refill) {
> > > +			struct slab *slabp = virt_to_slab(objp);
> > > +			slabp->pfmemalloc = false;
> > > +			clear_obj_pfmemalloc(&objp);
> > > +			check_ac_pfmemalloc(cachep, ac);
> > > +			return objp;
> > > +		}
> > 
> > The comment doesn't match the code.  I think you need to remove the words
> > "full" and "not" assuming the code is correct which it probably is...
> > 
> 
> I'll fix up the comment, you're right, it's confusing.
> 
> > But the code seems to be much more complex than Peter's original, and I don't
> > see the gain.
> > 
> 
> You're right, it is more complex.
> 
> > Peter's code had only one 'reserved' flag for each kmem_cache. 
> 
> The reserve was set in a per-cpu structure so there was a "lag" time
> before that information was available to other CPUs. Fine on smaller
> machines but a bit more of a problem today. 
> 
> > You seem to
> > have one for every slab.  I don't see the point.
> > It is true that yours is in some sense more fair - but I'm not sure the
> > complexity is worth it.
> > 
> 
> More fairness was one of the objects.
> 
> > Was there some particular reason you made the change?
> > 
> 
> This version survives under considerably more stress than Peter's
> original version did without requiring the additional complexity of
> memory reserves.
> 

That is certainly a very compelling argument .... but I still don't get why.
I'm sorry if I'm being dense, but I still don't see why the complexity buys
us better stability and I really would like to understand.

You don't seem to need the same complexity for SLUB with the justification
of "SLUB generally maintaining smaller lists than SLAB".

Presumably these are per-CPU lists of free objects or slabs?  If the things
on those lists could be used by anyone, long lists couldn't hurt.
So the problem must be that the lists get long while the array_cache is still
marked as 'pfmemalloc' (or 'reserve' in Peter's patches).

Is that the problem?  That reserve memory gets locked up in SLAB freelists?
If so - would that be more easily addressed by effectively reducing the
'batching' when the array_cache had dipped into reserves, so slabs are
returned to the VM more promptly?
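
To make the question concrete, here is a minimal userspace sketch of what
"reducing the batching" could mean.  It is only an illustration under
assumptions: toy_array_cache and toy_free_batch are hypothetical names, not
anything in the patch or in mm/slab.c, and the real array_cache has more
state than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-CPU array_cache; not the kernel struct. */
struct toy_array_cache {
	unsigned int avail;      /* objects currently cached on this CPU */
	unsigned int batchcount; /* objects normally flushed per batch */
	bool pfmemalloc;         /* cache has dipped into the reserves */
};

/*
 * Decide how many cached objects to hand back on a flush.
 * Assumption for this sketch: once the cache has used reserve pages,
 * flush everything so that slabs can return to the VM promptly;
 * otherwise keep the usual batching.
 */
static unsigned int toy_free_batch(const struct toy_array_cache *ac)
{
	if (ac->pfmemalloc)
		return ac->avail;

	return ac->batchcount < ac->avail ? ac->batchcount : ac->avail;
}
```

With avail = 8 and batchcount = 4 this flushes 4 objects in the normal case
and all 8 once pfmemalloc is set.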

Probably related, now that you've fixed the comment here (thanks):

+		/*
+		 * If there are empty slabs on the slabs_free list and we are
+		 * being forced to refill the cache, mark this one !pfmemalloc.
+		 */
+		l3 = cachep->nodelists[numa_mem_id()];
+		if (!list_empty(&l3->slabs_free) && force_refill) {
+			struct slab *slabp = virt_to_slab(objp);
+			slabp->pfmemalloc = false;
+			clear_obj_pfmemalloc(&objp);
+			check_ac_pfmemalloc(cachep, ac);
+			return objp;
+		}

I'm trying to understand it...
The context is that a non-MEMALLOC allocation is happening and everything on
the free lists is reserved for MEMALLOC allocations.
So if in that case there is a completely free slab on the free list we decide
that it is OK to mark the current slab as non-MEMALLOC.
The logic seems to be that we could just release that free slab to the VM,
then an alloc_page would be able to get it back.  But if we are still well
below the reserve watermark, then there might be some other allocation that
is more deserving of the page, and we shouldn't just assume we can take
it without actually calling into alloc_pages to check that we are no longer
running on reserves.

So this looks like an optimisation that is wrong.
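
As a rough userspace model of the check I would expect (a sketch only: the
toy_* names are hypothetical, and the watermark comparison stands in for
whatever alloc_pages would really decide):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a zone; not the kernel's struct zone. */
struct toy_zone {
	unsigned long free_pages;
	unsigned long min_watermark;
};

/* Hypothetical model of a slab carrying the per-slab pfmemalloc flag. */
struct toy_slab {
	bool pfmemalloc;
};

/* Stand-in for asking the page allocator whether we are off the reserves. */
static bool toy_above_watermark(const struct toy_zone *z)
{
	return z->free_pages > z->min_watermark;
}

/*
 * Clear the pfmemalloc mark only if a completely free slab exists AND the
 * zone is back above its watermark.  A free slab on its own is not taken
 * as proof that the reserves are no longer in use.
 * Returns true if the mark was cleared.
 */
static bool toy_try_clear_pfmemalloc(struct toy_slab *slabp,
				     const struct toy_zone *z,
				     size_t nr_free_slabs)
{
	if (!slabp->pfmemalloc)
		return false;
	if (nr_free_slabs == 0 || !toy_above_watermark(z))
		return false;

	slabp->pfmemalloc = false;
	return true;
}
```

The hunk quoted above corresponds to dropping the toy_above_watermark() test,
which is the part I am questioning.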



BTW, 


+	/* Record if ALLOC_PFMEMALLOC was set when allocating the slab */
+	if (pfmemalloc) {
+		struct array_cache *ac = cpu_cache_get(cachep);
+		slabp->pfmemalloc = true;
+		ac->pfmemalloc = 1;
+	}
+

I think that "= 1"  should be "= true".  :-)

NeilBrown

