From: Sasha Levin <sasha.levin@oracle.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>, Hugh Dickins <hughd@google.com>,
Tejun Heo <tj@kernel.org>,
Vladimir Davydov <vdavydov@parallels.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 13/13] mm: memcontrol: rewrite uncharge API
Date: Fri, 20 Jun 2014 20:34:43 -0400 [thread overview]
Message-ID: <53A4D323.5080808@oracle.com> (raw)
In-Reply-To: <1403124045-24361-14-git-send-email-hannes@cmpxchg.org>
On 06/18/2014 04:40 PM, Johannes Weiner wrote:
> The memcg uncharging code that is involved towards the end of a page's
> lifetime - truncation, reclaim, swapout, migration - is impressively
> complicated and fragile.
>
> Because anonymous and file pages were always charged before they had
> their page->mapping established, uncharges had to happen when the page
> type could still be known from the context; as in unmap for anonymous,
> page cache removal for file and shmem pages, and swap cache truncation
> for swap pages. However, these operations happen well before the page
> is actually freed, and so a lot of synchronization is necessary:
>
> - Charging, uncharging, page migration, and charge migration all need
> to take a per-page bit spinlock as they could race with uncharging.
>
> - Swap cache truncation happens during both swap-in and swap-out, and
> possibly repeatedly before the page is actually freed. This means
> that the memcg swapout code is called from many contexts that make
> no sense and it has to figure out the direction from page state to
> make sure memory and memory+swap are always correctly charged.
>
> - On page migration, the old page might be unmapped but then reused,
> so memcg code has to prevent untimely uncharging in that case.
> Because this code - which should be a simple charge transfer - is so
> special-cased, it is not reusable for replace_page_cache().
>
> But now that charged pages always have a page->mapping, introduce
> mem_cgroup_uncharge(), which is called after the final put_page(),
> when we know for sure that nobody is looking at the page anymore.
>
> For page migration, introduce mem_cgroup_migrate(), which is called
> after the migration is successful and the new page is fully rmapped.
> Because the old page is no longer uncharged after migration, prevent
> double charges by decoupling the page's memcg association (PCG_USED
> and pc->mem_cgroup) from the page holding an actual charge. The new
> bits PCG_MEM and PCG_MEMSW represent the respective charges and are
> transferred to the new page during migration.
>
> mem_cgroup_migrate() is suitable for replace_page_cache() as well,
> which gets rid of mem_cgroup_replace_page_cache().
>
> Swap accounting is massively simplified: because the page is no longer
> uncharged as early as swap cache deletion, a new mem_cgroup_swapout()
> can transfer the page's memory+swap charge (PCG_MEMSW) to the swap
> entry before the final put_page() in page reclaim.
>
> Finally, page_cgroup changes are now protected by whatever protection
> the page itself offers: anonymous pages are charged under the page
> table lock, whereas page cache insertions, swapin, and migration hold
> the page lock. Uncharging happens under full exclusion with no
> outstanding references. Charging and uncharging also ensure that the
> page is off-LRU, which serializes against charge migration. Remove
> the very costly page_cgroup lock and set pc->flags non-atomically.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Hi Johannes,
I'm seeing the following when booting a VM, bisection pointed me to this
patch.
[ 32.830823] BUG: using __this_cpu_add() in preemptible [00000000] code: mkdir/8677
[ 32.831522] caller is __this_cpu_preempt_check+0x13/0x20
[ 32.832079] CPU: 35 PID: 8677 Comm: mkdir Not tainted 3.16.0-rc1-next-20140620-sasha-00023-g8fc12ed #700
[ 32.832898] ffffffffb27ea69d ffff8800cb91b618 ffffffffb151820b 0000000000000002
[ 32.833607] 0000000000000023 ffff8800cb91b648 ffffffffaeb4c799 ffff88006efa5b60
[ 32.834318] ffffea0007cff9c0 0000000000000001 0000000000000001 ffff8800cb91b658
[ 32.835030] Call Trace:
[ 32.835257] dump_stack (lib/dump_stack.c:52)
[ 32.835755] check_preemption_disabled (./arch/x86/include/asm/preempt.h:80 lib/smp_processor_id.c:49)
[ 32.836336] __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 32.836991] mem_cgroup_charge_statistics.isra.23 (mm/memcontrol.c:930)
[ 32.837682] commit_charge (mm/memcontrol.c:2761)
[ 32.838187] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199)
[ 32.838735] ? get_parent_ip (kernel/sched/core.c:2546)
[ 32.839230] mem_cgroup_commit_charge (mm/memcontrol.c:6519)
[ 32.839807] __add_to_page_cache_locked (mm/filemap.c:588 include/linux/jump_label.h:115 include/trace/events/filemap.h:50 mm/filemap.c:589)
[ 32.840479] add_to_page_cache_lru (mm/filemap.c:627)
[ 32.841048] read_cache_pages (mm/readahead.c:92)
[ 32.841560] ? v9fs_cache_session_get_key (fs/9p/cache.c:306)
[ 32.842145] ? v9fs_write_begin (fs/9p/vfs_addr.c:99)
[ 32.842694] v9fs_vfs_readpages (fs/9p/vfs_addr.c:127)
[ 32.843251] __do_page_cache_readahead (mm/readahead.c:123 mm/readahead.c:200)
[ 32.843848] ? __do_page_cache_readahead (include/linux/rcupdate.h:877 mm/readahead.c:178)
[ 32.844435] ? __const_udelay (arch/x86/lib/delay.c:126)
[ 32.844944] filemap_fault (include/linux/memcontrol.h:141 include/linux/memcontrol.h:198 mm/filemap.c:1869)
[ 32.845465] ? __rcu_read_unlock (kernel/rcu/update.c:97)
[ 32.845999] __do_fault (mm/memory.c:2705)
[ 32.846472] ? mem_cgroup_try_charge (include/linux/cgroup.h:158 mm/memcontrol.c:6467)
[ 32.847048] do_cow_fault (mm/memory.c:2936)
[ 32.847561] __handle_mm_fault (mm/memory.c:3078 mm/memory.c:3205 mm/memory.c:3322)
[ 32.848092] ? __const_udelay (arch/x86/lib/delay.c:126)
[ 32.848596] ? __rcu_read_unlock (kernel/rcu/update.c:97)
[ 32.849157] handle_mm_fault (mm/memory.c:3345)
[ 32.849665] __do_page_fault (arch/x86/mm/fault.c:1230)
[ 32.850239] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
[ 32.850963] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 32.851442] ? sched_clock_local (kernel/sched/clock.c:214)
[ 32.852034] ? context_tracking_user_exit (kernel/context_tracking.c:184)
[ 32.852669] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 32.853243] ? trace_hardirqs_off_caller (kernel/locking/lockdep.c:2638 (discriminator 2))
[ 32.853854] trace_do_page_fault (arch/x86/mm/fault.c:1313 include/linux/jump_label.h:115 include/linux/context_tracking_state.h:27 include/linux/context_tracking.h:45 arch/x86/mm/fault.c:1314)
[ 32.854393] do_async_page_fault (arch/x86/kernel/kvm.c:264)
[ 32.854924] async_page_fault (arch/x86/kernel/entry_64.S:1322)
[ 32.855507] ? __clear_user (arch/x86/lib/usercopy_64.c:22)
[ 32.855999] ? __clear_user (arch/x86/lib/usercopy_64.c:18 arch/x86/lib/usercopy_64.c:21)
[ 32.856488] clear_user (arch/x86/lib/usercopy_64.c:54)
[ 32.856997] padzero (fs/binfmt_elf.c:122)
[ 32.857440] load_elf_binary (fs/binfmt_elf.c:909 (discriminator 1))
[ 32.857949] ? search_binary_handler (fs/exec.c:1374)
[ 32.858550] ? preempt_count_sub (kernel/sched/core.c:2602)
[ 32.859089] search_binary_handler (fs/exec.c:1375)
[ 32.859654] do_execve_common.isra.19 (fs/exec.c:1412 fs/exec.c:1508)
[ 32.860319] ? do_execve_common.isra.19 (./arch/x86/include/asm/current.h:14 fs/exec.c:1406 fs/exec.c:1508)
[ 32.860949] do_execve (fs/exec.c:1551)
[ 32.861390] SyS_execve (fs/exec.c:1602)
[ 32.861848] stub_execve (arch/x86/kernel/entry_64.S:662)
Thanks,
Sasha
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-06-21 0:35 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-18 20:40 [patch 00/13] mm: memcontrol: naturalize charge lifetime v4 Johannes Weiner
2014-06-18 20:40 ` [patch 01/13] mm: memcontrol: fold mem_cgroup_do_charge() Johannes Weiner
2014-06-18 20:40 ` [patch 02/13] mm: memcontrol: rearrange charging fast path Johannes Weiner
2014-06-18 20:40 ` [patch 03/13] mm: memcontrol: reclaim at least once for __GFP_NORETRY Johannes Weiner
2014-06-18 20:40 ` [patch 04/13] mm: huge_memory: use GFP_TRANSHUGE when charging huge pages Johannes Weiner
2014-06-18 20:40 ` [patch 05/13] mm: memcontrol: retry reclaim for oom-disabled and __GFP_NOFAIL charges Johannes Weiner
2014-06-18 20:40 ` [patch 06/13] mm: memcontrol: remove explicit OOM parameter in charge path Johannes Weiner
2014-06-18 20:40 ` [patch 07/13] mm: memcontrol: simplify move precharge function Johannes Weiner
2014-06-18 20:40 ` [patch 08/13] mm: memcontrol: catch root bypass in move precharge Johannes Weiner
2014-06-18 20:40 ` [patch 09/13] mm: memcontrol: use root_mem_cgroup res_counter Johannes Weiner
2014-06-18 20:40 ` [patch 10/13] mm: memcontrol: remove ordering between pc->mem_cgroup and PageCgroupUsed Johannes Weiner
2014-06-18 20:40 ` [patch 11/13] mm: memcontrol: do not acquire page_cgroup lock for kmem pages Johannes Weiner
2014-06-18 20:40 ` [patch 12/13] mm: memcontrol: rewrite charge API Johannes Weiner
2014-06-23 6:15 ` Uwe Kleine-König
2014-06-23 9:30 ` Michal Hocko
2014-06-23 9:42 ` Uwe Kleine-König
2014-07-14 15:04 ` Michal Hocko
2014-07-14 17:13 ` Johannes Weiner
2014-07-14 18:43 ` Michal Hocko
2014-06-18 20:40 ` [patch 13/13] mm: memcontrol: rewrite uncharge API Johannes Weiner
2014-06-20 16:36 ` [PATCH -mm] memcg: mem_cgroup_charge_statistics needs preempt_disable Michal Hocko
2014-06-23 4:16 ` Johannes Weiner
2014-06-21 0:34 ` Sasha Levin [this message]
2014-06-21 0:56 ` [patch 13/13] mm: memcontrol: rewrite uncharge API Andrew Morton
2014-06-21 1:03 ` Sasha Levin
2014-07-15 8:25 ` Michal Hocko
2014-07-15 12:19 ` Michal Hocko
2014-07-18 7:12 ` Michal Hocko
2014-07-18 14:45 ` Johannes Weiner
2014-07-18 15:12 ` Miklos Szeredi
2014-07-19 17:39 ` Johannes Weiner
2014-07-22 15:08 ` Michal Hocko
2014-07-22 15:44 ` Miklos Szeredi
2014-07-23 14:38 ` Michal Hocko
2014-07-23 15:06 ` Johannes Weiner
2014-07-23 15:19 ` Michal Hocko
2014-07-23 15:36 ` Johannes Weiner
2014-07-23 18:08 ` Miklos Szeredi
2014-07-23 21:02 ` Johannes Weiner
2014-07-24 8:46 ` Michal Hocko
2014-07-24 9:02 ` Michal Hocko
2014-07-25 15:26 ` Johannes Weiner
2014-07-25 15:43 ` Michal Hocko
2014-07-25 17:34 ` Johannes Weiner
2014-07-15 14:23 ` Michal Hocko
2014-07-15 15:09 ` Johannes Weiner
2014-07-15 15:18 ` Michal Hocko
2014-07-15 15:46 ` Johannes Weiner
2014-07-15 15:56 ` Michal Hocko
2014-07-15 15:55 ` Naoya Horiguchi
2014-07-15 16:07 ` Michal Hocko
2014-07-15 17:34 ` Johannes Weiner
2014-07-15 18:21 ` Michal Hocko
2014-07-15 18:43 ` Naoya Horiguchi
2014-07-15 19:04 ` Johannes Weiner
2014-07-15 20:49 ` Naoya Horiguchi
2014-07-15 21:48 ` Johannes Weiner
2014-07-16 7:55 ` Michal Hocko
2014-07-16 13:30 ` Naoya Horiguchi
2014-07-16 14:14 ` Johannes Weiner
2014-07-16 14:57 ` Naoya Horiguchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53A4D323.5080808@oracle.com \
--to=sasha.levin@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=tj@kernel.org \
--cc=vdavydov@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).