All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Yang Shi <yang.shi@linaro.org>, Ning Qu <quning@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov@virtuozzo.com>
Subject: Re: [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size
Date: Thu, 14 Apr 2016 13:56:41 +0200	[thread overview]
Message-ID: <570F8579.2070201@suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.1604051337450.5965@eggly.anvils>

On 04/05/2016 10:40 PM, Hugh Dickins wrote:
> Though debug kernels have a VM_BUG_ON to help protect from misaccounting
> lru_size, non-debug kernels are liable to wrap it around: and then the
> vast unsigned long size draws page reclaim into a loop of repeatedly
> doing nothing on an empty list, without even a cond_resched().
>
> That soft lockup looks confusingly like an over-busy reclaim scenario,
> with lots of contention on the lru_lock in shrink_inactive_list():
> yet has a totally different origin.
>
> Help differentiate with a custom warning in mem_cgroup_update_lru_size(),
> even in non-debug kernels; and reset the size to avoid the lockup.  But
> the particular bug which suggested this change was mine alone, and since
> fixed.

In my opinion, the code now looks quite complicated, not sure it's a good 
tradeoff for a rare (?) development bug. But I guess it's up to memcg 
maintainers which I note are not explicitly CC'd, so adding them now.

Maybe more generally, we can discuss in LSF/MM's mm debugging session, what it 
means that DEBUG_VM check has to become unconditional. Does it mean insufficient 
testing with DEBUG_VM during development/integration phase? Or are some bugs so 
rare we can't depend on that phase to catch them? IIRC Fedora kernels are built 
with DEBUG_VM, unless that changed...

> Make it a WARN_ONCE: the first occurrence is the most informative, a
> flurry may follow, yet even when rate-limited little more is learnt.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   include/linux/mm_inline.h |    2 +-
>   mm/memcontrol.c           |   24 ++++++++++++++++++++----
>   2 files changed, 21 insertions(+), 5 deletions(-)
>
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -35,8 +35,8 @@ static __always_inline void del_page_fro
>   				struct lruvec *lruvec, enum lru_list lru)
>   {
>   	int nr_pages = hpage_nr_pages(page);
> -	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>   	list_del(&page->lru);
> +	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>   	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages);
>   }
>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1022,22 +1022,38 @@ out:
>    * @lru: index of lru list the page is sitting on
>    * @nr_pages: positive when adding or negative when removing
>    *
> - * This function must be called when a page is added to or removed from an
> - * lru list.
> + * This function must be called under lru_lock, just before a page is added
> + * to or just after a page is removed from an lru list (that ordering being
> + * so as to allow it to check that lru_size 0 is consistent with list_empty).
>    */
>   void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>   				int nr_pages)
>   {
>   	struct mem_cgroup_per_zone *mz;
>   	unsigned long *lru_size;
> +	long size;
> +	bool empty;

Could there be more descriptive names? lru_size vs size looks confusing.

>
>   	if (mem_cgroup_disabled())
>   		return;
>
>   	mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec);
>   	lru_size = mz->lru_size + lru;
> -	*lru_size += nr_pages;
> -	VM_BUG_ON((long)(*lru_size) < 0);
> +	empty = list_empty(lruvec->lists + lru);
> +
> +	if (nr_pages < 0)
> +		*lru_size += nr_pages;
> +
> +	size = *lru_size;
> +	if (WARN_ONCE(size < 0 || empty != !size,

Maybe I'm just not used enough to constructs like "empty != !size", but it 
really takes me longer than I'd like to get the meaning :(

> +		"%s(%p, %d, %d): lru_size %ld but %sempty\n",
> +		__func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) {
> +		VM_BUG_ON(1);
> +		*lru_size = 0;
> +	}
> +
> +	if (nr_pages > 0)
> +		*lru_size += nr_pages;
>   }
>
>   bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

  reply	other threads:[~2016-04-14 11:56 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05 20:37 [PATCH 00/10] mm: easy preliminaries to THPagecache Hugh Dickins
2016-04-05 20:37 ` Hugh Dickins
2016-04-05 20:40 ` [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size Hugh Dickins
2016-04-05 20:40   ` Hugh Dickins
2016-04-14 11:56   ` Vlastimil Babka [this message]
2016-04-05 20:42 ` [PATCH 02/10] mm: update_lru_size do the __mod_zone_page_state Hugh Dickins
2016-04-05 20:42   ` Hugh Dickins
2016-04-05 20:44 ` [PATCH 03/10] mm: use __SetPageSwapBacked and dont ClearPageSwapBacked Hugh Dickins
2016-04-05 20:44   ` Hugh Dickins
2016-04-06  9:52   ` Mel Gorman
2016-04-06  9:52     ` Mel Gorman
2016-04-05 20:45 ` [PATCH 04/10] tmpfs: preliminary minor tidyups Hugh Dickins
2016-04-05 20:45   ` Hugh Dickins
2016-04-05 20:47 ` [PATCH 05/10] tmpfs: mem_cgroup charge fault to vm_mm not current mm Hugh Dickins
2016-04-05 20:47   ` Hugh Dickins
2016-04-05 20:49 ` [PATCH 06/10] mm: /proc/sys/vm/stat_refresh to force vmstat update Hugh Dickins
2016-04-05 20:49   ` Hugh Dickins
2016-04-05 20:51 ` [PATCH 07/10] huge mm: move_huge_pmd does not need new_vma Hugh Dickins
2016-04-05 20:51   ` Hugh Dickins
2016-04-11 10:28   ` Kirill A. Shutemov
2016-04-11 10:28     ` Kirill A. Shutemov
2016-04-05 20:52 ` [PATCH 08/10] huge pagecache: extend mremap pmd rmap lockout to files Hugh Dickins
2016-04-05 20:52   ` Hugh Dickins
2016-04-11 10:32   ` Kirill A. Shutemov
2016-04-11 10:32     ` Kirill A. Shutemov
2016-04-05 20:55 ` [PATCH 09/10] huge pagecache: mmap_sem is unlocked when truncation splits pmd Hugh Dickins
2016-04-05 20:55   ` Hugh Dickins
2016-04-11 10:35   ` Kirill A. Shutemov
2016-04-11 10:35     ` Kirill A. Shutemov
2016-04-14 17:39   ` Matthew Wilcox
2016-04-14 17:39     ` Matthew Wilcox
2016-04-16  4:55     ` Andrew Morton
2016-04-16  4:55       ` Andrew Morton
2016-04-05 21:02 ` [PATCH 10/10] arch: fix has_transparent_hugepage() Hugh Dickins
2016-04-05 21:02   ` Hugh Dickins
2016-04-05 23:25   ` David Miller
2016-04-05 23:25     ` David Miller
2016-04-06  3:22   ` Vineet Gupta
2016-04-06  6:58   ` Ingo Molnar
2016-04-06  6:58     ` Ingo Molnar
2016-04-06 12:29     ` Chris Metcalf
2016-04-06 21:58       ` Ingo Molnar
2016-04-06 21:58         ` Ingo Molnar
2016-04-06 11:56   ` Gerald Schaefer
2016-04-06 11:56     ` Gerald Schaefer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=570F8579.2070201@suse.cz \
    --to=vbabka@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=quning@gmail.com \
    --cc=vdavydov@virtuozzo.com \
    --cc=yang.shi@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.