All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: mm: memcontrol: rewrite uncharge API: problems
Date: Thu, 3 Jul 2014 20:41:04 -0400	[thread overview]
Message-ID: <20140704004104.GG1369@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.11.1407031219500.1370@eggly.anvils>

On Thu, Jul 03, 2014 at 12:54:36PM -0700, Hugh Dickins wrote:
> On Wed, 2 Jul 2014, Hugh Dickins wrote:
> > On Wed, 2 Jul 2014, Johannes Weiner wrote:
> > > 
> > > Could you give the following patch a spin?  I put it in the mmots
> > > stack on top of mm-memcontrol-rewrite-charge-api-fix-shmem_unuse-fix.
> > 
> > I'm just with the laptop until this evening.  I slapped it on top of
> > my 3.16-rc2-mm1 plus fixes (but obviously minus my memcg_batch one
> > - which incidentally continues to run without crashing on the G5),
> > and it quickly gave me this lockdep splat, which doesn't look very
> > different from the one before.
> > 
> > I see there's now an -rc3-mm1, I'll try it out on that in half an
> > hour... but unless I send word otherwise, assume that's the same.
> 
> Yes, I get that lockdep report each time on -rc3-mm1 + your patch.

There are two instances where I missed to make &rtpz->lock IRQ-safe:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91b621846e10..bbaa3f4cf4db 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3919,7 +3919,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						    gfp_mask, &nr_scanned);
 		nr_reclaimed += reclaimed;
 		*total_scanned += nr_scanned;
-		spin_lock(&mctz->lock);
+		spin_lock_irq(&mctz->lock);
 
 		/*
 		 * If we failed to reclaim anything from this memory cgroup
@@ -3959,7 +3959,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 		 */
 		/* If excess == 0, no tree ops */
 		__mem_cgroup_insert_exceeded(mz, mctz, excess);
-		spin_unlock(&mctz->lock);
+		spin_unlock_irq(&mctz->lock);
 		css_put(&mz->memcg->css);
 		loop++;
 		/*

That should make it complete - but the IRQ toggling costs are fairly
high so I'm rewriting the batching code to use the page lists that
most uncharges have anyway, and then batch the no-IRQ sections.

> I also twice got a flurry of res_counter.c:28 underflow warnings.
> Hmm, 62 of them each time (I was checking for a number near 512,
> which would suggest a THP/4k confusion, but no).  The majority
> of them coming from mem_cgroup_reparent_charges.

I haven't seen these yet.  But the location makes sense: if there are
any imbalances they'll be noticed during a group's final uncharges.

> But the laptop stayed up fine (for two hours before I had to stop
> it), and the G5 has run fine with that load for 16 hours now, no
> problems with release_pages, and not even a res_counter.c:28 (but
> I don't use lockdep on it).

Great!

> The x86 workstation ran fine for 4.5 hours, then hit some deadlock
> which I doubt had any connection to your changes: looked more like
> a jbd2 transaction was failing to complete (which, with me trying
> ext4 on loop on tmpfs, might be more my problem than anyone else's).
> 
> Oh, but nearly forgot, I did an earlier run on the laptop last night,
> which crashed within minutes on
> 
> VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> mm/memcontrol.c:6680!
> page had count 1 mapcount 0 mapping anon index 0x196
> flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> handle_mm_fault < __do_page_fault

Haven't seen that one yet, either.  The only way I can see this happen
is when the same page gets selected for migration twice in a row.
Maybe a race with putback, where it gets added to the LRU but isolated
by compaction before putback drops the refcount - will verify that.

Thanks for your reports!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: mm: memcontrol: rewrite uncharge API: problems
Date: Thu, 3 Jul 2014 20:41:04 -0400	[thread overview]
Message-ID: <20140704004104.GG1369@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.11.1407031219500.1370@eggly.anvils>

On Thu, Jul 03, 2014 at 12:54:36PM -0700, Hugh Dickins wrote:
> On Wed, 2 Jul 2014, Hugh Dickins wrote:
> > On Wed, 2 Jul 2014, Johannes Weiner wrote:
> > > 
> > > Could you give the following patch a spin?  I put it in the mmots
> > > stack on top of mm-memcontrol-rewrite-charge-api-fix-shmem_unuse-fix.
> > 
> > I'm just with the laptop until this evening.  I slapped it on top of
> > my 3.16-rc2-mm1 plus fixes (but obviously minus my memcg_batch one
> > - which incidentally continues to run without crashing on the G5),
> > and it quickly gave me this lockdep splat, which doesn't look very
> > different from the one before.
> > 
> > I see there's now an -rc3-mm1, I'll try it out on that in half an
> > hour... but unless I send word otherwise, assume that's the same.
> 
> Yes, I get that lockdep report each time on -rc3-mm1 + your patch.

There are two instances where I missed to make &rtpz->lock IRQ-safe:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91b621846e10..bbaa3f4cf4db 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3919,7 +3919,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 						    gfp_mask, &nr_scanned);
 		nr_reclaimed += reclaimed;
 		*total_scanned += nr_scanned;
-		spin_lock(&mctz->lock);
+		spin_lock_irq(&mctz->lock);
 
 		/*
 		 * If we failed to reclaim anything from this memory cgroup
@@ -3959,7 +3959,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone *zone, int order,
 		 */
 		/* If excess == 0, no tree ops */
 		__mem_cgroup_insert_exceeded(mz, mctz, excess);
-		spin_unlock(&mctz->lock);
+		spin_unlock_irq(&mctz->lock);
 		css_put(&mz->memcg->css);
 		loop++;
 		/*

That should make it complete - but the IRQ toggling costs are fairly
high so I'm rewriting the batching code to use the page lists that
most uncharges have anyway, and then batch the no-IRQ sections.

> I also twice got a flurry of res_counter.c:28 underflow warnings.
> Hmm, 62 of them each time (I was checking for a number near 512,
> which would suggest a THP/4k confusion, but no).  The majority
> of them coming from mem_cgroup_reparent_charges.

I haven't seen these yet.  But the location makes sense: if there are
any imbalances they'll be noticed during a group's final uncharges.

> But the laptop stayed up fine (for two hours before I had to stop
> it), and the G5 has run fine with that load for 16 hours now, no
> problems with release_pages, and not even a res_counter.c:28 (but
> I don't use lockdep on it).

Great!

> The x86 workstation ran fine for 4.5 hours, then hit some deadlock
> which I doubt had any connection to your changes: looked more like
> a jbd2 transaction was failing to complete (which, with me trying
> ext4 on loop on tmpfs, might be more my problem than anyone else's).
> 
> Oh, but nearly forgot, I did an earlier run on the laptop last night,
> which crashed within minutes on
> 
> VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM))
> mm/memcontrol.c:6680!
> page had count 1 mapcount 0 mapping anon index 0x196
> flags locked uptodate reclaim swapbacked, pcflags 1, memcg not root
> mem_cgroup_migrate < move_to_new_page < migrate_pages < compact_zone <
> compact_zone_order < try_to_compact_pages < __alloc_pages_direct_compact <
> __alloc_pages_nodemask < alloc_pages_vma < do_huge_pmd_anonymous_page <
> handle_mm_fault < __do_page_fault

Haven't seen that one yet, either.  The only way I can see this happen
is when the same page gets selected for migration twice in a row.
Maybe a race with putback, where it gets added to the LRU but isolated
by compaction before putback drops the refcount - will verify that.

Thanks for your reports!

  reply	other threads:[~2014-07-04  0:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-30 23:55 mm: memcontrol: rewrite uncharge API: problems Hugh Dickins
2014-06-30 23:55 ` Hugh Dickins
2014-07-01 17:46 ` Johannes Weiner
2014-07-01 17:46   ` Johannes Weiner
2014-07-02 21:20   ` Johannes Weiner
2014-07-02 21:20     ` Johannes Weiner
2014-07-02 22:28     ` Hugh Dickins
2014-07-02 22:28       ` Hugh Dickins
2014-07-03 19:54       ` Hugh Dickins
2014-07-03 19:54         ` Hugh Dickins
2014-07-04  0:41         ` Johannes Weiner [this message]
2014-07-04  0:41           ` Johannes Weiner
2014-07-05  2:12           ` Hugh Dickins
2014-07-05  2:12             ` Hugh Dickins
2014-07-07 13:44             ` Johannes Weiner
2014-07-07 13:44               ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140704004104.GG1369@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.