All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: hannes@cmpxchg.org, riel@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, kernel-team@fb.com,
	Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation
Date: Tue, 20 Jun 2017 11:46:45 +0900	[thread overview]
Message-ID: <20170620024645.GA27702@bbox> (raw)
In-Reply-To: <20170619151120.GA11245@destiny>

Hello Josef,

On Mon, Jun 19, 2017 at 11:11:21AM -0400, Josef Bacik wrote:
> On Wed, Jun 14, 2017 at 03:40:45PM +0900, Minchan Kim wrote:
> > On Tue, Jun 13, 2017 at 08:01:57AM -0400, Josef Bacik wrote:
> > > On Tue, Jun 13, 2017 at 02:28:02PM +0900, Minchan Kim wrote:
> > > > Hello,
> > > > 
> > > > On Thu, Jun 08, 2017 at 03:19:05PM -0400, josef@toxicpanda.com wrote:
> > > > > From: Josef Bacik <jbacik@fb.com>
> > > > > 
> > > > > When testing a slab heavy workload I noticed that we often would barely
> > > > > reclaim anything at all from slab when kswapd started doing reclaim.
> > > > > This is because we use the ratio of nr_scanned / nr_lru to determine how
> > > > > much of slab we should reclaim.  But in a slab only/mostly workload we
> > > > > will not have much page cache to reclaim, and thus our ratio will be
> > > > > really low and not at all related to where the memory on the system is.
> > > > 
> > > > I want to understand this clearly.
> > > > Why nr_scanned / nr_lru is low if system doesnt' have much page cache?
> > > > Could you elaborate it a bit?
> > > > 
> > > 
> > > Yeah so for example on my freshly booted test box I have this
> > > 
> > > Active:            58840 kB
> > > Inactive:          46860 kB
> > > 
> > > Every time we do a get_scan_count() we do this
> > > 
> > > scan = size >> sc->priority
> > > 
> > > where sc->priority starts at DEF_PRIORITY, which is 12.  The first loop through
> > > reclaim would result in a scan target of 2 pages to 11715 total inactive pages,
> > > and 3 pages to 14710 total active pages.  This is a really really small target
> > > for a system that is entirely slab pages.  And this is super optimistic, this
> > > assumes we even get to scan these pages.  We don't increment sc->nr_scanned
> > > unless we 1) isolate the page, which assumes it's not in use, and 2) can lock
> > > the page.  Under pressure these numbers could probably go down, I'm sure there's
> > > some random pages from daemons that aren't actually in use, so the targets get
> > > even smaller.
> > > 
> > > We have to get sc->priority down a lot before we start to get to the 1:1 ratio
> > > that would even start to be useful for reclaim in this scenario.  Add to this
> > > that most shrinkable slabs have this idea that their objects have to loop
> > > through the LRU twice (no longer icache/dcache as Al took my patch to fix that
> > > thankfully) and you end up spending a lot of time looping and reclaiming
> > > nothing.  Basing it on actual slab usage makes more sense logically and avoids
> > > this kind of problem.  Thanks,
> > 
> > Thanks. I got understood now.
> > 
> > As I see your change, it seems to be rather aggressive to me.
> > 
> >         node_slab = lruvec_page_state(lruvec, NR_SLAB_RECLAIMABLE);
> >         shrink_slab(,,, node_slab >> sc->priority, node_slab);
> > 
> > The point is when we finish reclaiming from direct/background(ie, kswapd),
> > it makes sure that VM scanned slab object up to twice of the size which
> > is consistent with LRU pages.
> > 
> > What do you think about this?
> 
> Sorry for the delay, I was on a short vacation.  At first I thought this was a
> decent idea so I went to put it in there.  But there were some problems with it,
> and with sc->priority itself I beleive.  First the results were not great, we
> still end up not doing a lot of reclaim until we get down to the lower priority
> numbers.
> 
> The thing that's different with slab vs everybody else is that these numbers are
> a ratio, not a specific scan target amount.  With the other LRU's we do

Hmm, I don't get it why the ratio model is a problem.
My suggestion is to aim for scanning entire available objects list twice
in a reclaim cycle(priority from 12 to 0) which is consistent with LRU
reclaim. IOW, (1/4096 + 1/2048 + ... 1/1) * available object.
If it cannot reclaim pages with low priority(ie, 12), it try to reclaim
more objects in higher priority and finally, it try to reclaim every objects
at priority 1 and one more chance with priority 0.

> 
> scan = total >> sc->priority
> 
> and then we look through 'scan' number of pages, which means we're usually
> reclaiming enough stuff to make progress at each priority level.  Slab is
> different, pages != slab objects.  Plus we have this common pattern of putting

Aha, I see your concern. The problem is although we can reclaim a object from
slab, it doesn't mean to reclaim a page so VM should go with next priority
cycle. If so, I think we can adjust the the cost model more agressive with
ratio approach. (1/12 + 2/12 + 3/12 ...) With this, in a reclaim cycle(12..0),
we guarantees that scanning of entire objects list four times while LRU is
two times. As well, it might be a helpful if we can have slab's reclaim tunable
knob to tune reclaim agressiveness of slab like swappiness for anonymous pages.

> every object onto our lru list, and letting the scanning mechanism figure out
> which objects are actually not in use any more, which means each scan is likely
> to not make progress until we've gone through the entire lru.

Sorry, I didn't understand. Could you elaborate a bit if it's important point
in this discussion?

> 
> You are worried that we are just going to empty the slab every time, and that is
> totally a valid concern.  But we have checks in place to make sure that our
> total_scan (the number of objects we scan) doesn't end up hugely bonkers so we
> don't waste time scanning through objects.  If we wanted to be even more careful
> we could add some checks in do_shrink_slab/shrink_slab to bail as soon as we hit
> our reclaim targets, instead of having just the one check in shrink_node.

Acutually, my main worry is the expression(gslab/greclaimable).
What's the rationale for such modeling in you mind?
Without understanding that, it's hard to say whether it's proper.

For exmaple, with your expression, if nr_slab == nr_lru, it scans all objects
of slab. Why?  At that time, VM even doesn't scan full LRU.
I really want to make them consistent so when a reclaim cycle is done,
we guarantees to happen some amount of scanning.
In my idea, LRU scanning is x2 of LRU pages and slab scanning is x4 of
slab object.

> 
> As for sc->priority, I think it doesn't make much sense in general.  It makes
> total sense to limit the number of pages scanned per LRU, but we can accomplish
> this with ratios of each lru to the overall state of the system.  The fact is we
> want to keep scanning and reclaiming until we hit our reclaim target, so using
> the sc->priority thing is just kind of clunky and sometimes results in us
> looping needlessly out to get the priority lowered, when we could just apply
> ratio based pressure to the LRU's/slab until we hit our targets, and then bail
> out.  I could be wrong and that seems like a big can of worms I don't want to
> open right now, but for sure I don't think it's a good fit for slab shrinking
> because of the disconnect of nr_slab_pages to actual slab objects.  Thanks,
> 
> Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-06-20  2:46 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-08 19:19 [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation josef
2017-06-08 19:19 ` [PATCH 2/2] mm: make kswapd try harder to keep active pages in cache josef
2017-06-13  5:28 ` [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation Minchan Kim
2017-06-13 12:01   ` Josef Bacik
2017-06-14  6:40     ` Minchan Kim
2017-06-19 15:11       ` Josef Bacik
2017-06-20  2:46         ` Minchan Kim [this message]
2017-06-27 13:59           ` Josef Bacik
2017-06-30  2:17             ` Minchan Kim
2017-06-30 15:03               ` Josef Bacik
2017-07-02  1:58                 ` Dave Chinner
2017-07-03 13:52                   ` Josef Bacik
2017-07-03  1:33                 ` Minchan Kim
2017-07-03 13:50                   ` Josef Bacik
2017-07-04  3:01                     ` Minchan Kim
2017-07-04 13:21                       ` Josef Bacik
2017-07-04 22:57                         ` Dave Chinner
2017-07-05  4:59                           ` Minchan Kim
2017-07-05 23:58                             ` Dave Chinner
2017-07-06  3:56                               ` Minchan Kim
2017-07-05 13:33                           ` Josef Bacik
2017-07-05 23:30                             ` Dave Chinner
2017-07-05  4:43                         ` Minchan Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170620024645.GA27702@bbox \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jbacik@fb.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.