linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, dm-devel@redhat.com,
	Mikulas Patocka <mpatocka@redhat.com>,
	Mel Gorman <mgorman@suse.de>,
	David Rientjes <rientjes@google.com>,
	Ondrej Kozina <okozina@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [dm-devel] [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks
Date: Wed, 27 Jul 2016 13:43:35 +1000	[thread overview]
Message-ID: <87lh0n4ufs.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20160725083247.GD9401@dhcp22.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 4267 bytes --]

On Mon, Jul 25 2016, Michal Hocko wrote:

> On Sat 23-07-16 10:12:24, NeilBrown wrote:

>> Maybe that is impractical, but having firm rules like that would go a
>> long way to make it possible to actually understand and reason about how
>> MM works.  As it is, there seems to be a tendency to put bandaids over
>> bandaids.
>
> Ohh, I would definitely wish for this to be more clear but as it turned
> out over time there are quite some interdependencies between MM/FS/IO
> layers which make the picture really blur. If there is a brave soul to
> make that more clear without breaking any of that it would be really
> cool ;)

Just need that comprehensive regression-test-suite and off we go....


>> > My thinking was that throttle_vm_writeout is there to prevent from
>> > dirtying too many pages from the reclaim the context.  PF_LESS_THROTTLE
>> > is part of the writeout so throttling it on too many dirty pages is
>> > questionable (well we get some bias but that is not really reliable). It
>> > still makes sense to throttle when the backing device is congested
>> > because the writeout path wouldn't make much progress anyway and we also
>> > do not want to cycle through LRU lists too quickly in that case.
>> 
>> "dirtying ... from the reclaim context" ??? What does that mean?
>
> Say you would cause a swapout from the reclaim context. You would
> effectively dirty that anon page until it gets written down to the
> storage.

I should probably figure out how swap really works.  I have vague ideas
which are probably missing important details...
Isn't the first step that the page gets moved into the swap-cache - and
marked dirty I guess.  Then it gets written out and the page is marked
'clean'.
Then further memory pressure might push it out of the cache, or an early
re-use would pull it back from the cache.
If so, then "dirtying in reclaim context" could also be described as
"moving into the swap cache" - yes?  So should there be a limit on dirty
pages in the swap cache just like there is for dirty pages in any
filesystem (the max_dirty_ratio thing) ??
Maybe there is?

>> The use of PF_LESS_THROTTLE in current_may_throttle() in vmscan.c is to
>> avoid a live-lock.  A key premise is that nfsd only allocates unbounded
>> memory when it is writing to the page cache.  So it only needs to be
>> throttled when the backing device it is writing to is congested.  It is
>> particularly important that it *doesn't* get throttled just because an
>> NFS backing device is congested, because nfsd might be trying to clear
>> that congestion.
>
> Thanks for the clarification. IIUC then removing throttle_vm_writeout
> for the nfsd writeout should be harmless as well, right?

Certainly shouldn't hurt from the perspective of nfsd.

>> >> The purpose of that flag is to allow a thread to dirty a page-cache page
>> >> as part of cleaning another page-cache page.
>> >> So it makes sense for loop and sometimes for nfsd.  It would make sense
>> >> for dm-crypt if it was putting the encrypted version in the page cache.
>> >> But if dm-crypt is just allocating a transient page (which I think it
>> >> is), then a mempool should be sufficient (and we should make sure it is
>> >> sufficient) and access to an extra 10% (or whatever) of the page cache
>> >> isn't justified.
>> >
>> > If you think that PF_LESS_THROTTLE (ab)use in mempool_alloc is not
>> > appropriate then would a PF_MEMPOOL be any better?
>> 
>> Why a PF rather than a GFP flag?
>
> Well, short answer is that gfp masks are almost depleted.

Really?  We have 26.

pagemap has a cute hack to store both GFP flags and other flag bits in
the one 32 it number per address_space.  'struct address_space' could
afford an extra 32 number I think.

radix_tree_root adds 3 'tag' flags to the gfp_mask.
There is 16bits of free space in radix_tree_node (between 'offset' and
'count').  That space on the root node could store a record of which tags
are set anywhere.  Or would that extra memory de-ref be a killer?

I think we'd end up with cleaner code if we removed the cute-hacks.  And
we'd be able to use 6 more GFP flags!!  (though I do wonder if we really
need all those 26).

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

  parent reply	other threads:[~2016-07-27  3:43 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-18  8:39 [RFC PATCH 0/2] mempool vs. page allocator interaction Michal Hocko
2016-07-18  8:41 ` [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path Michal Hocko
2016-07-18  8:41   ` [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE tasks Michal Hocko
2016-07-19 21:50     ` Mikulas Patocka
2016-07-22  8:46     ` NeilBrown
2016-07-22  9:04       ` NeilBrown
2016-07-22  9:15       ` Michal Hocko
2016-07-23  0:12         ` NeilBrown
2016-07-25  8:32           ` Michal Hocko
2016-07-25 19:23             ` Michal Hocko
2016-07-26  7:07               ` Michal Hocko
2016-07-27  3:43             ` NeilBrown [this message]
2016-07-27 18:24               ` [dm-devel] " Michal Hocko
2016-07-27 21:33                 ` NeilBrown
2016-07-28  7:17                   ` Michal Hocko
2016-08-03 12:53                     ` Mikulas Patocka
2016-08-03 14:34                       ` Michal Hocko
2016-08-04 18:49                         ` Mikulas Patocka
2016-08-12 12:32                           ` Michal Hocko
2016-08-13 17:34                             ` Mikulas Patocka
2016-08-14 10:34                               ` Michal Hocko
2016-08-15 16:15                                 ` Mikulas Patocka
2016-11-23 21:11                                 ` Mikulas Patocka
2016-11-24 13:29                                   ` Michal Hocko
2016-11-24 17:10                                     ` Mikulas Patocka
2016-11-28 14:06                                       ` Michal Hocko
2016-07-25 21:52           ` Mikulas Patocka
2016-07-26  7:25             ` Michal Hocko
2016-07-27  4:02             ` [dm-devel] " NeilBrown
2016-07-27 14:28               ` Mikulas Patocka
2016-07-27 18:40                 ` Michal Hocko
2016-08-03 13:59                   ` Mikulas Patocka
2016-08-03 14:42                     ` Michal Hocko
2016-08-04 18:46                       ` Mikulas Patocka
2016-07-27 21:36                 ` NeilBrown
2016-07-19  2:00   ` [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path David Rientjes
2016-07-19  7:49     ` Michal Hocko
2016-07-19 13:54   ` Johannes Weiner
2016-07-19 14:19     ` Michal Hocko
2016-07-19 22:01       ` Mikulas Patocka
2016-07-19 20:45     ` David Rientjes
2016-07-20  8:15       ` Michal Hocko
2016-07-20 21:06         ` David Rientjes
2016-07-21  8:52           ` Michal Hocko
2016-07-21 12:13             ` Johannes Weiner
2016-07-21 14:53               ` Michal Hocko
2016-07-21 15:26                 ` Johannes Weiner
2016-07-22  1:41                 ` NeilBrown
2016-07-22  6:37                 ` Michal Hocko
2016-07-22 12:26                   ` Vlastimil Babka
2016-07-22 19:44                     ` Andrew Morton
2016-07-23 18:52                       ` Vlastimil Babka
2016-07-19 21:50   ` Mikulas Patocka
2016-07-20  6:44     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lh0n4ufs.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=dm-devel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=okozina@redhat.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).