linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Yang Shi <yang.shi@linaro.org>, Ning Qu <quning@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov@virtuozzo.com>
Subject: Re: [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size
Date: Thu, 14 Apr 2016 13:56:41 +0200	[thread overview]
Message-ID: <570F8579.2070201@suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.1604051337450.5965@eggly.anvils>

On 04/05/2016 10:40 PM, Hugh Dickins wrote:
> Though debug kernels have a VM_BUG_ON to help protect from misaccounting
> lru_size, non-debug kernels are liable to wrap it around: and then the
> vast unsigned long size draws page reclaim into a loop of repeatedly
> doing nothing on an empty list, without even a cond_resched().
>
> That soft lockup looks confusingly like an over-busy reclaim scenario,
> with lots of contention on the lru_lock in shrink_inactive_list():
> yet has a totally different origin.
>
> Help differentiate with a custom warning in mem_cgroup_update_lru_size(),
> even in non-debug kernels; and reset the size to avoid the lockup.  But
> the particular bug which suggested this change was mine alone, and since
> fixed.

In my opinion, the code now looks quite complicated, not sure it's a good 
tradeoff for a rare (?) development bug. But I guess it's up to memcg 
maintainers which I note are not explicitly CC'd, so adding them now.

Maybe more generally, we can discuss in LSF/MM's mm debugging session, what it 
means that DEBUG_VM check has to become unconditional. Does it mean insufficient 
testing with DEBUG_VM during development/integration phase? Or are some bugs so 
rare we can't depend on that phase to catch them? IIRC Fedora kernels are built 
with DEBUG_VM, unless that changed...

> Make it a WARN_ONCE: the first occurrence is the most informative, a
> flurry may follow, yet even when rate-limited little more is learnt.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   include/linux/mm_inline.h |    2 +-
>   mm/memcontrol.c           |   24 ++++++++++++++++++++----
>   2 files changed, 21 insertions(+), 5 deletions(-)
>
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -35,8 +35,8 @@ static __always_inline void del_page_fro
>   				struct lruvec *lruvec, enum lru_list lru)
>   {
>   	int nr_pages = hpage_nr_pages(page);
> -	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>   	list_del(&page->lru);
> +	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>   	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages);
>   }
>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1022,22 +1022,38 @@ out:
>    * @lru: index of lru list the page is sitting on
>    * @nr_pages: positive when adding or negative when removing
>    *
> - * This function must be called when a page is added to or removed from an
> - * lru list.
> + * This function must be called under lru_lock, just before a page is added
> + * to or just after a page is removed from an lru list (that ordering being
> + * so as to allow it to check that lru_size 0 is consistent with list_empty).
>    */
>   void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>   				int nr_pages)
>   {
>   	struct mem_cgroup_per_zone *mz;
>   	unsigned long *lru_size;
> +	long size;
> +	bool empty;

Could there be more descriptive names? lru_size vs size looks confusing.

>
>   	if (mem_cgroup_disabled())
>   		return;
>
>   	mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec);
>   	lru_size = mz->lru_size + lru;
> -	*lru_size += nr_pages;
> -	VM_BUG_ON((long)(*lru_size) < 0);
> +	empty = list_empty(lruvec->lists + lru);
> +
> +	if (nr_pages < 0)
> +		*lru_size += nr_pages;
> +
> +	size = *lru_size;
> +	if (WARN_ONCE(size < 0 || empty != !size,

Maybe I'm just not used enough to constructs like "empty != !size", but it 
really takes me longer than I'd like to get the meaning :(

> +		"%s(%p, %d, %d): lru_size %ld but %sempty\n",
> +		__func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) {
> +		VM_BUG_ON(1);
> +		*lru_size = 0;
> +	}
> +
> +	if (nr_pages > 0)
> +		*lru_size += nr_pages;
>   }
>
>   bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

  reply	other threads:[~2016-04-14 11:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05 20:37 [PATCH 00/10] mm: easy preliminaries to THPagecache Hugh Dickins
2016-04-05 20:40 ` [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size Hugh Dickins
2016-04-14 11:56   ` Vlastimil Babka [this message]
2016-04-05 20:42 ` [PATCH 02/10] mm: update_lru_size do the __mod_zone_page_state Hugh Dickins
2016-04-05 20:44 ` [PATCH 03/10] mm: use __SetPageSwapBacked and dont ClearPageSwapBacked Hugh Dickins
2016-04-06  9:52   ` Mel Gorman
2016-04-05 20:45 ` [PATCH 04/10] tmpfs: preliminary minor tidyups Hugh Dickins
2016-04-05 20:47 ` [PATCH 05/10] tmpfs: mem_cgroup charge fault to vm_mm not current mm Hugh Dickins
2016-04-05 20:49 ` [PATCH 06/10] mm: /proc/sys/vm/stat_refresh to force vmstat update Hugh Dickins
2016-04-05 20:51 ` [PATCH 07/10] huge mm: move_huge_pmd does not need new_vma Hugh Dickins
2016-04-11 10:28   ` Kirill A. Shutemov
2016-04-05 20:52 ` [PATCH 08/10] huge pagecache: extend mremap pmd rmap lockout to files Hugh Dickins
2016-04-11 10:32   ` Kirill A. Shutemov
2016-04-05 20:55 ` [PATCH 09/10] huge pagecache: mmap_sem is unlocked when truncation splits pmd Hugh Dickins
2016-04-11 10:35   ` Kirill A. Shutemov
2016-04-14 17:39   ` Matthew Wilcox
2016-04-16  4:55     ` Andrew Morton
2016-04-05 21:02 ` [PATCH 10/10] arch: fix has_transparent_hugepage() Hugh Dickins
2016-04-05 23:25   ` David Miller
2016-04-06  3:22   ` Vineet Gupta
2016-04-06  6:58   ` Ingo Molnar
2016-04-06 12:29     ` Chris Metcalf
2016-04-06 21:58       ` Ingo Molnar
2016-04-06 11:56   ` Gerald Schaefer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=570F8579.2070201@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=quning@gmail.com \
    --cc=vdavydov@virtuozzo.com \
    --cc=yang.shi@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).