From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Nazarewicz
Subject: Re: [PATCH 2/2] mm/page_ref: add tracepoint to track down page reference manipulation
Date: Tue, 10 Nov 2015 17:02:43 +0100
In-Reply-To: <1447053784-27811-2-git-send-email-iamjoonsoo.kim@lge.com>
References: <1447053784-27811-1-git-send-email-iamjoonsoo.kim@lge.com> <1447053784-27811-2-git-send-email-iamjoonsoo.kim@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
To: Joonsoo Kim, Andrew Morton
Cc: Minchan Kim, Mel Gorman, Vlastimil Babka, "Kirill A. Shutemov", linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, Joonsoo Kim
List-Id: linux-api@vger.kernel.org

On Mon, Nov 09 2015, Joonsoo Kim wrote:
> CMA allocation should be guaranteed to succeed by definition,

Uh? That’s a peculiar statement. Which is to say that it’s not true.

> but,
> unfortunately, it would be failed sometimes. It is hard to track down
> the problem, because it is related to page reference manipulation and
> we don't have any facility to analyze it.
>
> This patch adds tracepoints to track down page reference manipulation.
> With it, we can find exact reason of failure and can fix the problem.
> Following is an example of tracepoint output.
>
> <...>-9018  [004]    92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
> <...>-9018  [004]    92.678378: kernel_stack:
>  => get_page_from_freelist (ffffffff81176659)
>  => __alloc_pages_nodemask (ffffffff81176d22)
>  => alloc_pages_vma (ffffffff811bf675)
>  => handle_mm_fault (ffffffff8119e693)
>  => __do_page_fault (ffffffff810631ea)
>  => trace_do_page_fault (ffffffff81063543)
>  => do_async_page_fault (ffffffff8105c40a)
>  => async_page_fault (ffffffff817581d8)
> [snip]
> <...>-9018  [004]    92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
> [snip]
> ...
> ...
> <...>-9131  [001]    93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
> [snip]
> <...>-9018  [004]    93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
>  => release_pages (ffffffff8117c9e4)
>  => free_pages_and_swap_cache (ffffffff811b0697)
>  => tlb_flush_mmu_free (ffffffff81199616)
>  => tlb_finish_mmu (ffffffff8119a62c)
>  => exit_mmap (ffffffff811a53f7)
>  => mmput (ffffffff81073f47)
>  => do_exit (ffffffff810794e9)
>  => do_group_exit (ffffffff81079def)
>  => SyS_exit_group (ffffffff81079e74)
>  => entry_SYSCALL_64_fastpath (ffffffff817560b6)
>
> This output shows that problem comes from exit path. In exit path,
> to improve performance, pages are not freed immediately. They are gathered
> and processed by batch. During this process, migration cannot be possible
> and CMA allocation is failed. This problem is hard to find without this
> page reference tracepoint facility.
>
> Enabling this feature bloat kernel text 20 KB in my configuration.
>
>     text    data     bss      dec    hex filename
> 12041272 2223424 1507328 15772024 f0a978 vmlinux_disabled
> 12064844 2225920 1507328 15798092 f10f4c vmlinux_enabled
>
> Signed-off-by: Joonsoo Kim

Acked-by: Michal Nazarewicz

> ---
>  include/trace/events/page_ref.h | 128 ++++++++++++++++++++++++++++++++++++++++

I haven’t really looked at the above file though.

-- 
Best regards,                                          _     _
.o. | Liege of Serenely Enlightened Majesty of       o' \,=./ `o
..o | Computer Science,  ミハウ “mina86” ナザレヴイツ   (o o)
ooo +---------ooO--(_)--Ooo--