All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ning Qu <quning@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 01/24] mm: update_lru_size warn and reset bad lru_size
Date: Mon, 23 Feb 2015 11:30:55 +0200	[thread overview]
Message-ID: <20150223093055.GA7322@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.LSU.2.11.1502201949350.14414@eggly.anvils>

On Fri, Feb 20, 2015 at 07:51:16PM -0800, Hugh Dickins wrote:
> Though debug kernels have a VM_BUG_ON to help protect from misaccounting
> lru_size, non-debug kernels are liable to wrap it around: and then the
> vast unsigned long size draws page reclaim into a loop of repeatedly
> doing nothing on an empty list, without even a cond_resched().
> 
> That soft lockup looks confusingly like an over-busy reclaim scenario,
> with lots of contention on the lruvec lock in shrink_inactive_list():
> yet has a totally different origin.
> 
> Help differentiate with a custom warning in mem_cgroup_update_lru_size(),
> even in non-debug kernels; and reset the size to avoid the lockup.  But
> the particular bug which suggested this change was mine alone, and since
> fixed.

Do we need this kind of check for !MEMCG kernels?

> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  include/linux/mm_inline.h |    2 +-
>  mm/memcontrol.c           |   24 ++++++++++++++++++++----
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> --- thpfs.orig/include/linux/mm_inline.h	2013-11-03 15:41:51.000000000 -0800
> +++ thpfs/include/linux/mm_inline.h	2015-02-20 19:33:25.928096883 -0800
> @@ -35,8 +35,8 @@ static __always_inline void del_page_fro
>  				struct lruvec *lruvec, enum lru_list lru)
>  {
>  	int nr_pages = hpage_nr_pages(page);
> -	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>  	list_del(&page->lru);
> +	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>  	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages);
>  }
>  
> --- thpfs.orig/mm/memcontrol.c	2015-02-08 18:54:22.000000000 -0800
> +++ thpfs/mm/memcontrol.c	2015-02-20 19:33:25.928096883 -0800
> @@ -1296,22 +1296,38 @@ out:
>   * @lru: index of lru list the page is sitting on
>   * @nr_pages: positive when adding or negative when removing
>   *
> - * This function must be called when a page is added to or removed from an
> - * lru list.
> + * This function must be called under lruvec lock, just before a page is added
> + * to or just after a page is removed from an lru list (that ordering being so
> + * as to allow it to check that lru_size 0 is consistent with list_empty).
>   */
>  void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>  				int nr_pages)
>  {
>  	struct mem_cgroup_per_zone *mz;
>  	unsigned long *lru_size;
> +	long size;
> +	bool empty;
>  
>  	if (mem_cgroup_disabled())
>  		return;
>  
>  	mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec);
>  	lru_size = mz->lru_size + lru;
> -	*lru_size += nr_pages;
> -	VM_BUG_ON((long)(*lru_size) < 0);
> +	empty = list_empty(lruvec->lists + lru);
> +
> +	if (nr_pages < 0)
> +		*lru_size += nr_pages;
> +
> +	size = *lru_size;
> +	if (WARN(size < 0 || empty != !size,
> +	"mem_cgroup_update_lru_size(%p, %d, %d): lru_size %ld but %sempty\n",
> +			lruvec, lru, nr_pages, size, empty ? "" : "not ")) {

Formatting can be unscrewed this way:

	if (WARN(size < 0 || empty != !size,
		"%s(%p, %d, %d): lru_size %ld but %sempty\n",
		__func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) {

> +		VM_BUG_ON(1);
> +		*lru_size = 0;
> +	}
> +
> +	if (nr_pages > 0)
> +		*lru_size += nr_pages;
>  }
>  
>  bool mem_cgroup_is_descendant(struct mem_cgroup *memcg, struct mem_cgroup *root)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ning Qu <quning@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 01/24] mm: update_lru_size warn and reset bad lru_size
Date: Mon, 23 Feb 2015 11:30:55 +0200	[thread overview]
Message-ID: <20150223093055.GA7322@node.dhcp.inet.fi> (raw)
In-Reply-To: <alpine.LSU.2.11.1502201949350.14414@eggly.anvils>

On Fri, Feb 20, 2015 at 07:51:16PM -0800, Hugh Dickins wrote:
> Though debug kernels have a VM_BUG_ON to help protect from misaccounting
> lru_size, non-debug kernels are liable to wrap it around: and then the
> vast unsigned long size draws page reclaim into a loop of repeatedly
> doing nothing on an empty list, without even a cond_resched().
> 
> That soft lockup looks confusingly like an over-busy reclaim scenario,
> with lots of contention on the lruvec lock in shrink_inactive_list():
> yet has a totally different origin.
> 
> Help differentiate with a custom warning in mem_cgroup_update_lru_size(),
> even in non-debug kernels; and reset the size to avoid the lockup.  But
> the particular bug which suggested this change was mine alone, and since
> fixed.

Do we need this kind of check for !MEMCG kernels?

> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  include/linux/mm_inline.h |    2 +-
>  mm/memcontrol.c           |   24 ++++++++++++++++++++----
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> --- thpfs.orig/include/linux/mm_inline.h	2013-11-03 15:41:51.000000000 -0800
> +++ thpfs/include/linux/mm_inline.h	2015-02-20 19:33:25.928096883 -0800
> @@ -35,8 +35,8 @@ static __always_inline void del_page_fro
>  				struct lruvec *lruvec, enum lru_list lru)
>  {
>  	int nr_pages = hpage_nr_pages(page);
> -	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>  	list_del(&page->lru);
> +	mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
>  	__mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages);
>  }
>  
> --- thpfs.orig/mm/memcontrol.c	2015-02-08 18:54:22.000000000 -0800
> +++ thpfs/mm/memcontrol.c	2015-02-20 19:33:25.928096883 -0800
> @@ -1296,22 +1296,38 @@ out:
>   * @lru: index of lru list the page is sitting on
>   * @nr_pages: positive when adding or negative when removing
>   *
> - * This function must be called when a page is added to or removed from an
> - * lru list.
> + * This function must be called under lruvec lock, just before a page is added
> + * to or just after a page is removed from an lru list (that ordering being so
> + * as to allow it to check that lru_size 0 is consistent with list_empty).
>   */
>  void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
>  				int nr_pages)
>  {
>  	struct mem_cgroup_per_zone *mz;
>  	unsigned long *lru_size;
> +	long size;
> +	bool empty;
>  
>  	if (mem_cgroup_disabled())
>  		return;
>  
>  	mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec);
>  	lru_size = mz->lru_size + lru;
> -	*lru_size += nr_pages;
> -	VM_BUG_ON((long)(*lru_size) < 0);
> +	empty = list_empty(lruvec->lists + lru);
> +
> +	if (nr_pages < 0)
> +		*lru_size += nr_pages;
> +
> +	size = *lru_size;
> +	if (WARN(size < 0 || empty != !size,
> +	"mem_cgroup_update_lru_size(%p, %d, %d): lru_size %ld but %sempty\n",
> +			lruvec, lru, nr_pages, size, empty ? "" : "not ")) {

Formatting can be unscrewed this way:

	if (WARN(size < 0 || empty != !size,
		"%s(%p, %d, %d): lru_size %ld but %sempty\n",
		__func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) {

> +		VM_BUG_ON(1);
> +		*lru_size = 0;
> +	}
> +
> +	if (nr_pages > 0)
> +		*lru_size += nr_pages;
>  }
>  
>  bool mem_cgroup_is_descendant(struct mem_cgroup *memcg, struct mem_cgroup *root)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
 Kirill A. Shutemov

  reply	other threads:[~2015-02-23  9:31 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-21  3:49 [PATCH 00/24] huge tmpfs: an alternative approach to THPageCache Hugh Dickins
2015-02-21  3:49 ` Hugh Dickins
2015-02-21  3:51 ` [PATCH 01/24] mm: update_lru_size warn and reset bad lru_size Hugh Dickins
2015-02-21  3:51   ` Hugh Dickins
2015-02-23  9:30   ` Kirill A. Shutemov [this message]
2015-02-23  9:30     ` Kirill A. Shutemov
2015-03-23  2:44     ` Hugh Dickins
2015-03-23  2:44       ` Hugh Dickins
2015-02-21  3:54 ` [PATCH 02/24] mm: update_lru_size do the __mod_zone_page_state Hugh Dickins
2015-02-21  3:54   ` Hugh Dickins
2015-02-21  3:56 ` [PATCH 03/24] mm: use __SetPageSwapBacked and don't ClearPageSwapBacked Hugh Dickins
2015-02-21  3:56   ` Hugh Dickins
2015-02-25 10:53   ` Mel Gorman
2015-02-25 10:53     ` Mel Gorman
2015-03-23  3:01     ` Hugh Dickins
2015-03-23  3:01       ` Hugh Dickins
2015-02-21  3:58 ` [PATCH 04/24] mm: make page migration's newpage handling more robust Hugh Dickins
2015-02-21  3:58   ` Hugh Dickins
2015-02-21  4:00 ` [PATCH 05/24] tmpfs: preliminary minor tidyups Hugh Dickins
2015-02-21  4:00   ` Hugh Dickins
2015-02-21  4:01 ` [PATCH 06/24] huge tmpfs: prepare counts in meminfo, vmstat and SysRq-m Hugh Dickins
2015-02-21  4:01   ` Hugh Dickins
2015-02-21  4:03 ` [PATCH 07/24] huge tmpfs: include shmem freeholes in available memory counts Hugh Dickins
2015-02-21  4:03   ` Hugh Dickins
2015-02-21  4:05 ` [PATCH 08/24] huge tmpfs: prepare huge=N mount option and /proc/sys/vm/shmem_huge Hugh Dickins
2015-02-21  4:05   ` Hugh Dickins
2015-02-21  4:06 ` [PATCH 09/24] huge tmpfs: try to allocate huge pages, split into a team Hugh Dickins
2015-02-21  4:06   ` Hugh Dickins
2015-02-21  4:07 ` [PATCH 10/24] huge tmpfs: avoid team pages in a few places Hugh Dickins
2015-02-21  4:07   ` Hugh Dickins
2015-02-21  4:09 ` [PATCH 11/24] huge tmpfs: shrinker to migrate and free underused holes Hugh Dickins
2015-02-21  4:09   ` Hugh Dickins
2015-03-19 16:56   ` Konstantin Khlebnikov
2015-03-19 16:56     ` Konstantin Khlebnikov
2015-03-23  4:40     ` Hugh Dickins
2015-03-23  4:40       ` Hugh Dickins
2015-03-23 12:50       ` Kirill A. Shutemov
2015-03-23 12:50         ` Kirill A. Shutemov
2015-03-23 13:50         ` Kirill A. Shutemov
2015-03-23 13:50           ` Kirill A. Shutemov
2015-03-24 12:57       ` Kirill A. Shutemov
2015-03-24 12:57         ` Kirill A. Shutemov
2015-03-25  0:41         ` Hugh Dickins
2015-03-25  0:41           ` Hugh Dickins
2015-02-21  4:11 ` [PATCH 12/24] huge tmpfs: get_unmapped_area align and fault supply huge page Hugh Dickins
2015-02-21  4:11   ` Hugh Dickins
2015-02-21  4:12 ` [PATCH 13/24] huge tmpfs: extend get_user_pages_fast to shmem pmd Hugh Dickins
2015-02-21  4:12   ` Hugh Dickins
2015-02-21  4:13 ` [PATCH 14/24] huge tmpfs: extend vma_adjust_trans_huge " Hugh Dickins
2015-02-21  4:13   ` Hugh Dickins
2015-02-21  4:15 ` [PATCH 15/24] huge tmpfs: rework page_referenced_one and try_to_unmap_one Hugh Dickins
2015-02-21  4:15   ` Hugh Dickins
2015-02-21  4:16 ` [PATCH 16/24] huge tmpfs: fix problems from premature exposure of pagetable Hugh Dickins
2015-02-21  4:16   ` Hugh Dickins
2015-07-01 10:53   ` Kirill A. Shutemov
2015-07-01 10:53     ` Kirill A. Shutemov
2015-02-21  4:18 ` [PATCH 17/24] huge tmpfs: map shmem by huge page pmd or by page team ptes Hugh Dickins
2015-02-21  4:18   ` Hugh Dickins
2015-02-21  4:20 ` [PATCH 18/24] huge tmpfs: mmap_sem is unlocked when truncation splits huge pmd Hugh Dickins
2015-02-21  4:20   ` Hugh Dickins
2015-02-21  4:22 ` [PATCH 19/24] huge tmpfs: disband split huge pmds on race or memory failure Hugh Dickins
2015-02-21  4:22   ` Hugh Dickins
2015-02-21  4:23 ` [PATCH 20/24] huge tmpfs: use Unevictable lru with variable hpage_nr_pages() Hugh Dickins
2015-02-21  4:23   ` Hugh Dickins
2015-02-21  4:25 ` [PATCH 21/24] huge tmpfs: fix Mlocked meminfo, tracking huge and unhuge mlocks Hugh Dickins
2015-02-21  4:25   ` Hugh Dickins
2015-02-21  4:27 ` [PATCH 22/24] huge tmpfs: fix Mapped meminfo, tracking huge and unhuge mappings Hugh Dickins
2015-02-21  4:27   ` Hugh Dickins
2015-02-21  4:29 ` [PATCH 23/24] kvm: plumb return of hva when resolving page fault Hugh Dickins
2015-02-21  4:29   ` Hugh Dickins
2015-02-21  4:31 ` [PATCH 24/24] kvm: teach kvm to map page teams as huge pages Hugh Dickins
2015-02-21  4:31   ` Hugh Dickins
2015-02-23 13:48 ` [PATCH 00/24] huge tmpfs: an alternative approach to THPageCache Kirill A. Shutemov
2015-02-23 13:48   ` Kirill A. Shutemov
2015-03-23  2:25   ` Hugh Dickins
2015-03-23  2:25     ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150223093055.GA7322@node.dhcp.inet.fi \
    --to=kirill@shutemov.name \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=quning@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.