Message-ID: <464cdb19-e3d1-b0d6-15aa-3b291f90d61b@linux.dev>
Date: Tue, 10 Dec 2024 12:20:16 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
To: Suren Baghdasaryan, Ben Greear
Cc: LKML
References: <1ba0cc57-e2ed-caa2-1241-aa5615bee01f@candelatech.com> <48b36c0e-86bb-b181-4d9b-7ed50d70426f@linux.dev>
From: Hao Ge

Hi Suren and Ben

On 12/10/24 06:33, Suren Baghdasaryan wrote:
> On Mon, Dec 9, 2024 at 4:48 AM Hao Ge wrote:
>> Hi Suren
>>
>>
>> On 12/7/24 09:27, Suren Baghdasaryan wrote:
>>> On Fri, Dec 6, 2024 at 4:50 PM Ben Greear wrote:
>>>> On 12/6/24 16:15, Suren Baghdasaryan wrote:
>>>>> On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan wrote:
>>>>>> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear wrote:
>>>>>>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
>>>>>>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear wrote:
>>>>>>>>> Hello Suren,
>>>>>>>>>
>>>>>>>>> My system crashes on bootup, and I bisected to this commit.
>>>>>>>>>
>>>>>>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
>>>>>>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
>>>>>>>>> Author: Suren Baghdasaryan
>>>>>>>>> Date: Wed Oct 23 10:07:57 2024 -0700
>>>>>>>>>
>>>>>>>>> alloc_tag: populate memory for module tags as needed
>>>>>>>>>
>>>>>>>>> The memory reserved for module tags does not need to be backed by physical
>>>>>>>>> pages until there are tags to store there. Change the way we reserve this
>>>>>>>>> memory to allocate only virtual area for the tags and populate it with
>>>>>>>>> physical pages as needed when we load a module.
>>>>>>>>>
>>>>>>>>> The crash looks like this:
>>>>>>>>>
>>>>>>>>> BUG: unable to handle page fault for address: fffffbfff4041000
>>>>>>>>> #PF: supervisor read access in kernel mode
>>>>>>>>> #PF: error_code(0x0000) - not-present page
>>>>>>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
>>>>>>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
>>>>>>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
>>>>>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
>>>>>>>>> RIP: 0010:kasan_check_range+0xa5/0x190
>>>>>>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
>>>>>>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
>>>>>>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
>>>>>>>>> [ OK DX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
>>>>>>>>> 0m] Finished BP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
>>>>>>>>> ;1;39msystemd-udR10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
>>>>>>>>> ev-trig…e R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
>>>>>>>>> - Coldplug All uFS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
>>>>>>>>> dev Devices.
>>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> Call Trace:
>>>>>>>>>
>>>>>>>>> [ OK ? __die+0x1f/0x60
>>>>>>>>> 0m] Reached targ ? page_fault_oops+0x258/0x910
>>>>>>>>> et sysi ? dump_pagetable+0x690/0x690
>>>>>>>>> nit.target - ? search_bpf_extables+0x22/0x250
>>>>>>>>> System Initiali ? trace_page_fault_kernel+0x120/0x120
>>>>>>>>> zation.
>>>>>>>>> ? search_bpf_extables+0x164/0x250
>>>>>>>>> ? kasan_check_range+0xa5/0x190
>>>>>>>>> ? fixup_exception+0x4d/0xc70
>>>>>>>>> ? exc_page_fault+0xe1/0xf0
>>>>>>>>> [ OK ? asm_exc_page_fault+0x22/0x30
>>>>>>>>> 0m] Reached targ ? load_module+0x3d7b/0x7560
>>>>>>>>> et netw ? kasan_check_range+0xa5/0x190
>>>>>>>>> ork.target - __asan_memcpy+0x38/0x60
>>>>>>>>> Network.
>>>>>>>>> load_module+0x3d7b/0x7560
>>>>>>>>> ? module_frob_arch_sections+0x30/0x30
>>>>>>>>> ? lockdep_lock+0xbe/0x1b0
>>>>>>>>> ? rw_verify_area+0x18d/0x5e0
>>>>>>>>> ? kernel_read_file+0x246/0x870
>>>>>>>>> ? __x64_sys_fspick+0x290/0x290
>>>>>>>>> ? init_module_from_file+0xd1/0x130
>>>>>>>>> init_module_from_file+0xd1/0x130
>>>>>>>>> ? __ia32_sys_init_module+0xa0/0xa0
>>>>>>>>> ? lock_acquire+0x2d/0xb0
>>>>>>>>> ? idempotent_init_module+0x116/0x790
>>>>>>>>> ? do_raw_spin_unlock+0x54/0x220
>>>>>>>>> idempotent_init_module+0x226/0x790
>>>>>>>>> ? init_module_from_file+0x130/0x130
>>>>>>>>> ? vm_mmap_pgoff+0x203/0x2e0
>>>>>>>>> __x64_sys_finit_module+0xba/0x130
>>>>>>>>> do_syscall_64+0x69/0x160
>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>>>>>> RIP: 0033:0x7fe869de327d
>>>>>>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
>>>>>>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>>>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
>>>>>>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
>>>>>>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
>>>>>>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
>>>>>>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
>>>>>>>>>
>>>>>>>>> Modules linked in:
>>>>>>>>> CR2: fffffbfff4041000
>>>>>>>>> ---[ end trace 0000000000000000 ]---
>>>>>>>>>
>>>>>>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
>>>>>>>>> is found here:
>>>>>>>>>
>>>>>>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
>>>>>>>>>
>>>>>>>>> I will be happy to test fixes.
>>>>>>>> Hi Ben,
>>>>>>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
>>>>>>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
>>>>>>>>
>>>>>>>> If not, could you please apply them and see if the issue is still happening?
>>>>>>>> Thanks,
>>>>>>>> Suren.
>>>>>>> Hello Suren,
>>>>>>>
>>>>>>> Thanks for the quick response. The first patch is already in latest Linus tree,
>>>>> Hmm. Could you please double-check which tree you are using? I don't
>>>>> see the first patch
>>>>> (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
>>>>> in Linus' tree. Maybe you are using linux-next?
>>>> Sorry, you are correct.
>>>> I must have mangled something when trying to apply
>>>> the patch and I didn't look hard enough when patch said changes were already applied.
>>>>
>>>> I can re-test this next week...and for reference, kernel boots fine when you disable
>>>> KASAN and other debugging.
>>> Thanks! Please retest with this patch and let me know if you are still
>>> having issues.
>>> Suren.
>> Indeed, this is a bug that still exists in another context, namely when
>> CONFIG_KASAN_VMALLOC is not enabled.
> Hmm. Are you able to reproduce this issue with all the fixes we had?

Yes. I set up an x86 virtual machine, applied both of our patches, and
was still able to reproduce the issue.

I have also submitted a patch to fix it:

https://lore.kernel.org/all/20241210041515.765569-1-hao.ge@linux.dev/

I verified it locally.

Hi Ben

Can you test with this patch?

Thanks

Best Regards

Hao

>
>> We may need to look into this scenario next.
>>
>> Thanks
>>
>> Best Regards
>>
>> Hao
>>
>>>> Thanks,
>>>> Ben
>>>>
>>>>
>>>> --
>>>> Ben Greear
>>>> Candela Technologies Inc http://www.candelatech.com
>>>>
>>>>
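
A side note on the quoted oops: the faulting address (CR2) appears to be the KASAN shadow address of the pointer in RDI, which would fit a missing shadow mapping for that address rather than a bad pointer. Below is a minimal sketch of the arithmetic. It assumes generic KASAN on x86_64 with a shadow scale shift of 3 and a shadow offset of 0xdffffc0000000000 (the value seen in R15); neither is stated in the thread, and the helper name is made up for illustration.

```python
# Generic KASAN tracks each 8 bytes of kernel memory with one shadow byte.
# Assumptions (not confirmed in the thread): x86_64 generic KASAN,
# shadow scale shift 3, shadow offset 0xdffffc0000000000.
KASAN_SHADOW_SCALE_SHIFT = 3
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound


def kasan_shadow_addr(addr: int) -> int:
    """Hypothetical helper: shadow address for a 64-bit kernel address."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


rdi = 0xFFFFFFFFA0208000  # RDI from the oops (start of the checked range)
r10 = 0xFFFFFFFFA02080EF  # R10 from the oops (last byte of the checked range)

print(hex(kasan_shadow_addr(rdi)))  # 0xfffffbfff4041000, same as CR2
print(hex(kasan_shadow_addr(r10)))  # 0xfffffbfff404101d, same as R09
```

Under those assumptions, kasan_check_range() faulted on the very first shadow byte for the range it was asked to check.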