Date: Tue, 13 Apr 2021 19:30:18 +0800
From: Wang Yugui
To: Yang Shi
Subject: Re: kernel BUG at mm/huge_memory.c:2736(linux 5.10.29)
Cc: Linux MM, wangyugui@e16-tech.com
References: <20210412180659.B9E3.409509F4@e16-tech.com>
Message-Id: <20210413193015.77E7.409509F4@e16-tech.com>

Hi,

> On Mon, Apr 12, 2021 at 3:07 AM Wang Yugui wrote:
> >
> > Hi,
> >
> > kernel BUG at mm/huge_memory.c:2736(linux 5.10.29) is triggered
> > by some files write test.
> >
> > mm/huge_memory.c:
> >         if (IS_ENABLED(CONFIG_DEBUG_VM) && mapcount) {
> >                 pr_alert("total_mapcount: %u, page_count(): %u\n",
> >                          mapcount, count);
> >                 if (PageTail(page))
> >                         dump_page(head, NULL);
> >                 dump_page(page, "total_mapcount(head) > 0");
> > L2736:          BUG();
> >         }
>
> We just can tell the mapcount of the page is not zero from the current
> log, it might mean the unmap_page() call is failed.
> It seems you have CONFIG_DEBUG_VM enabled, could you please paste more
> log? There is "VM_BUG_ON_PAGE(!unmap_success, page)" in unmap_page(). It
> should be able to tell us if unmap_page() is failed or not, or something
> else happened.

The kernel config:

$ grep CONFIG_DEBUG_VM /boot/config-5.10.29-3.el7.x86_64
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_VMACACHE is not set
# CONFIG_DEBUG_VM_RB is not set
# CONFIG_DEBUG_VM_PGFLAGS is not set
# CONFIG_DEBUG_VM_PGTABLE is not set

$ grep HUGE /boot/config-5.10.29-3.el7.x86_64
CONFIG_CGROUP_HUGETLB=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

This problem reproduces frequently on one workstation. We need a new
RS232 cable to capture more of the log, and that will take about a week.

Server: Dell Precision T7610
CPU: E5-2680v2 *2
Memory: 192G

The use case of our user-space application:
1) Write files with a total size > 3 * memory size.
   The memory size is > 128G.
2) Some CPU load, and some memory load.

The output of 'free -h' while our user-space application is running:

              total        used        free      shared  buff/cache   available
Mem:          188Gi        75Gi       7.9Gi        17Mi       104Gi       107Gi
Swap:            0B          0B          0B

Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2021/04/13

>
> >
> > kernel version:5.10.29
> >
> > kernel BUG at mm/huge_memory.c:2736
> > invalid opcode: 0000 [#1] SMP NOPTI
> > CPU:9 pid:351 Comm: kswapd0 Tainted: G S
> > RIP: 0010:split_huge_page_to_list.cold.86+0x19/0x1b
> > ...
> > Call Trace:
> > ? shrink_inactive_list+0x241/0x3d0
> > deferred_split_scan+0x1ca/0x320
> > do_shrink_slab+0x20f/0x2c0
> > shrink_node+0x24b/0x6d0
> > balance_pgdat+0x2db/0x550
> > kswapd+0x201/0x390
> > ? finish_wait+0x80/0x80
> > ? balance_pgdat+0x550/0x550
> > kthread+0x116/0x130
> > ? kthread_park+0x80/0x80
> > ret_from_fork+0x1f/0x30
> >
> > see OOPS.jpg for more info.
> >
> > Best Regards
> > Wang Yugui (wangyugui@e16-tech.com)
> > 2021/04/12
> >
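
The write workload described in use case 1) can be approximated by a sketch like the one below. It is not the application from this report, and it is not known to trigger the BUG. It only illustrates "write files with a total size > 3 * memory size" using buffered writes. The function names, file naming scheme, and chunk size are assumptions made for illustration.

```python
import os


def read_mem_total_bytes(meminfo_path="/proc/meminfo"):
    """Return MemTotal in bytes, parsed from /proc/meminfo (Linux only)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                # The value is reported in kB, e.g. "MemTotal:  197000000 kB".
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def write_files(directory, total_bytes, file_size, chunk_size=1 << 20):
    """Write zero-filled files of at most file_size bytes into directory
    until total_bytes have been written. Returns the paths written.

    Hypothetical helper: the real application's file sizes and data
    are not given in the report.
    """
    os.makedirs(directory, exist_ok=True)
    chunk = b"\0" * chunk_size
    paths = []
    written = 0
    index = 0
    while written < total_bytes:
        path = os.path.join(directory, f"test-{index:06d}.dat")
        remaining_in_file = min(file_size, total_bytes - written)
        with open(path, "wb") as f:
            while remaining_in_file > 0:
                n = min(chunk_size, remaining_in_file)
                f.write(chunk[:n])
                remaining_in_file -= n
                written += n
        paths.append(path)
        index += 1
    return paths
```

A run matching the description would be something like write_files("/path/to/testdir", 3 * read_mem_total_bytes() + (1 << 30), file_size=1 << 30), where the directory is a placeholder and the 1 GiB file size is an arbitrary choice. The CPU and memory load from use case 2) would have to be generated separately.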