Date: Fri, 28 Jun 2024 00:46:04 +1000
Subject: Re: m68k 54418 fails to execute user space
To: Jean-Michel Hautbois, Michael Schmitz, linux-m68k@lists.linux-m68k.org, linux-mm@kvack.org, linux-mtd@lists.infradead.org
Cc: Geert Uytterhoeven, Christoph Hellwig, wbx@openadk.org
References: <735e19b6-3747-417f-ba5b-1a7da137a3a3@yoseli.org> <7fb2988d-ab89-405f-8cf1-edcdd2196376@gmail.com> <57879ac8-eaf5-48f1-b4ef-6619d9108440@yoseli.org>
From: Greg Ungerer
In-Reply-To: <57879ac8-eaf5-48f1-b4ef-6619d9108440@yoseli.org>

Hi JM,

On 27/6/24 22:36, Jean-Michel Hautbois wrote:
> Michael,
>
> On 26/06/2024 21:36, Michael Schmitz wrote:
>> Jean-Michel,
>>
>> On 27/06/24 01:28, Jean-Michel Hautbois wrote:
>>> Hi Michael,
>>>
>>> On 26/06/2024 03:56, Michael Schmitz wrote:
>>>> Jean-Michel,
>>>>
>>>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>>>
>>>>> When I printk the do_page_fault first debug, I get for the first call to ls:
>>>>> bash-5.2# ls
>>>>> [   14.700000] do page fault:
>>>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, address=0x70069ee6, 0, (ptrval)
>>>>
>>>> Page not present, read fault. Please disable obfuscation of kernel pointer addresses by printk. Maybe also disable address space randomization while debugging this.
>>>>
>>>>> This call works almost fine (I still have the assert failed: folio->private != NULL issue).
>>>>>
>>>>> And when I call it a second time, I get:
>>>>> bash-5.2# ls
>>>>> [   19.820000] do page fault:
>>>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, address=0x700e2004, 2, (ptrval)
>>>>
>>>> Page not present, write fault.
>>>>
>>>> It would be helpful if you could get a dump of /proc/1/maps before the execve() syscall in your helloworld init replacement. That might confirm all these addresses are legit (assuming mappings survive across execve(), that is), and what they correspond to.
>>>>
>>>>>
>>>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set it to 0x70000000.
>>>>>
>>>>> regs->pc is not the same as the address. It might be unrelevant, but any help is appreciated to understand the process behind :-).
>>>>>
>>>>> I keep digging, and I am in the asm part which fears me a bit !
>>>>
>>>> I don't see that you'd need to look at any asm code here.
>>>
>>> I add a small test in do_page_fault, and in case of an error, it panics. The result follows:
>>
>> Please take a look at the comments at the start of arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in error_code are explained there.
>>
>> error_code != 0 is just one possible case out of the four that are handled by do_page_fault(). It does not signify 'no error' - if there hadn't been a page fault, do_page_fault() would not have been called.
>>
>> You just forced a panic each time a write fault and/or a protection fault happens.
>> Write faults are absolutely expected to happen when loading a library - ld.so needs to perform relocation after loading a dynamic library, and that means writes to the GOT in the library's data segment (PIC assumed).
>>
>>
>>>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
>>> [    3.857000] Run /bin/bash as init process
>>> [    3.858000]   with arguments:
>>> [    3.861000]     /bin/bash
>>> [    3.862000]   with environment:
>>> [    3.863000]     HOME=/
>>> [    3.864000]     TERM=linux
>>> [    4.242000] do page fault:
>>> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, address=0x700b3364, 2, 41fb0000
>>> [    4.242000] Kernel panic - not syncing: page fault error
>>> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 6.10.0-rc5-g927da6cf01fe-dirty #25
>>> [    4.242000] Stack from 4186dda8:
>>> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 00000000 4136ee10 41423aa4
>>> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 4186de60 41fb0000 41d51a60
>>> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 700b3364 00000002 41fb0000
>>> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 41d51a00 4186de60 41d51a60
>>> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 700b3364 00000002 00000000
>>> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 4186de60 00000000 00000000
>>> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
>>> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
>>> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 arch/m68k/kernel/traps.c:775)
>>> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 arch/m68k/kernel/traps.c:775)
>>> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
>>> [    4.242000] ma_slots (lib/maple_tree.c:759)
>>> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
>>> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) fs/binfmt_elf.c:421 (discriminator 1))
>>> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
>>> [    4.242000] memset (arch/m68k/lib/memset.c:11)
>>> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 fs/exec.c:1891 fs/exec.c:1867)
>>> [    4.242000] copy_strings_kernel (fs/exec.c:669)
>>> [    4.242000] count_strings_kernel (fs/exec.c:473)
>>> [    4.242000] kernel_execve (fs/exec.c:2058)
>>> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
>>> [    4.242000] run_init_process (init/main.c:1389)
>>> [    4.242000] _printk (kernel/printk/printk.c:2365)
>>> [    4.242000] kernel_init (init/main.c:1508)
>>> [    4.242000] kernel_init (init/main.c:1459)
>>> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
>>> [    4.242000]
>>> [    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
>>>
>>> Looks like a memory mapping failure, but why ?
>>> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM starts at 0x40000000 and ends at 0x50000000 (256MB).
>> 0x41fb0000 seems to be init's page directory. The fault address is in the range where I'd expect dynamic libraries to reside.
>>>
>>> It looks like a TLB write miss which is obscure to me :-).
>>>
>>> I tried to use the /proc but as expected it is not alive after mounting it.
>>
>> The memory map ought to be accessible through sysrq - an alternative would be to modify the ELF binfmt handler and dump the map once ld.so has finished with relocations.
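
As a side note for anyone following the thread: the error_code encoding Michael refers to (bit 0 distinguishes "no page found" from a protection fault, bit 1 distinguishes read from write, per the comment above do_page_fault() in arch/m68k/mm/fault.c) can be sketched in plain userspace C as below. This is only an illustration; the macro and function names are made up for this sketch and are not kernel identifiers.

```c
#include <stdio.h>

/*
 * Bit meanings as described in the comment above do_page_fault() in
 * arch/m68k/mm/fault.c. FAULT_PROT, FAULT_WRITE and the helpers below
 * are hypothetical names used only for this illustration.
 */
#define FAULT_PROT  0x1UL	/* bit 0: 0 = no page found, 1 = protection fault */
#define FAULT_WRITE 0x2UL	/* bit 1: 0 = read access,   1 = write access     */

static int is_protection_fault(unsigned long error_code)
{
	return (error_code & FAULT_PROT) != 0;
}

static int is_write_fault(unsigned long error_code)
{
	return (error_code & FAULT_WRITE) != 0;
}

/* Map the two low bits onto the four cases mentioned in the thread. */
static const char *fault_kind(unsigned long error_code)
{
	switch (error_code & (FAULT_PROT | FAULT_WRITE)) {
	case 0:
		return "page not present, read fault";
	case FAULT_PROT:
		return "protection fault, read access";
	case FAULT_WRITE:
		return "page not present, write fault";
	default:
		return "protection fault, write access";
	}
}

/* Print one decoded value in a form that is easy to compare with the logs. */
static void print_fault(unsigned long error_code)
{
	printf("error_code=%lu: %s\n", error_code, fault_kind(error_code));
}
```

With the values logged in this thread, fault_kind(0) corresponds to the first "ls" fault and fault_kind(2) to the second one and to the fault that triggered the panic. All four combinations are ordinary page faults, so a non-zero error_code on its own says nothing about whether the fault can be handled.
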
>
> I added a dump in the binfmt_elf file:
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index a43897b03ce9..395f556f3a90 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -816,6 +816,63 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
>         return ret == -ENOENT ? 0 : ret;
>  }
>
> +static int dump_memory_map(struct task_struct *task)
> +{
> +    struct mm_struct *mm = task->mm;
> +    struct vm_area_struct *vma;
> +       MA_STATE(mas, &mm->mm_mt, 0, -1);
> +    struct file *file;
> +    struct path *path;
> +    char *buf;
> +    char *pathname;
> +
> +    // Acquire the read lock for mmap_lock
> +    down_read(&mm->mmap_lock);
> +       mas_lock(&mas);
> +    for (vma = mas_find(&mas, ULONG_MAX); vma; vma = mas_find(&mas, ULONG_MAX)) {
> +        if (vma->vm_file) {
> +            buf = (char *)__get_free_page(GFP_KERNEL);
> +            if (!buf) {
> +                continue; // Handle memory allocation failure
> +            }
> +
> +            file = vma->vm_file;
> +            path = &file->f_path;
> +            pathname = d_path(path, buf, PAGE_SIZE);
> +            if (IS_ERR(pathname)) {
> +                pathname = NULL;
> +            }
> +
> +            pr_info("%lx-%lx %c%c%c%c %08lx %02x:%02x %lu %s\n",
> +                vma->vm_start, vma->vm_end,
> +                vma->vm_flags & VM_READ ? 'r' : '-',
> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> +                vma->vm_pgoff << PAGE_SHIFT,
> +                MAJOR(file->f_inode->i_rdev),
> +                MINOR(file->f_inode->i_rdev),
> +                file->f_inode->i_ino,
> +                pathname ? pathname : "");
> +
> +            free_page((unsigned long)buf);
> +        } else {
> +            pr_info("%lx-%lx %c%c%c%c %08lx 00:00 0\n",
> +                vma->vm_start, vma->vm_end,
> +                vma->vm_flags & VM_READ ? 'r' : '-',
> +                vma->vm_flags & VM_WRITE ? 'w' : '-',
> +                vma->vm_flags & VM_EXEC ? 'x' : '-',
> +                vma->vm_flags & VM_MAYSHARE ? 's' : 'p',
> +                vma->vm_pgoff << PAGE_SHIFT);
> +        }
> +    }
> +       mas_unlock(&mas);
> +    // Release the read lock for mmap_lock
> +    up_read(&mm->mmap_lock);
> +
> +    return 0;
> +}
> +
>  static int load_elf_binary(struct linux_binprm *bprm)
>  {
>         struct file *interpreter = NULL; /* to shut gcc up */
> @@ -1299,6 +1356,9 @@ static int load_elf_binary(struct linux_binprm *bprm)
>
>         finalize_exec(bprm);
>         START_THREAD(elf_ex, regs, elf_entry, bprm->p);
> +       if (current->pid == 1) {  // Check if this is the init process
> +            dump_memory_map(current);
> +    }
>         retval = 0;
>  out:
>         return retval;
>
> I think it is quick and dirty, but seems to do the trick.
> I then get in my console:
> [    4.265000] 60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1
> [    4.266000] 6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1
> [    4.267000] 70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash
> [    4.268000] 700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash
> [    4.269000] 700b4000-700be000 rwxp 700b4000 00:00 0
> [    4.270000] bfe7a000-bfe9c000 rw-p bffde000 00:00 0
>
> But nothing rings a bell at this level for me...
> Thanks !

Here is the same dump trace generated on my newly resurrected M5475EVB for comparison:

[snip]
Freeing unused kernel image (initmem) memory: 80K
This architecture does not have kernel memory protection.
Run /sbin/init as init process
Run /etc/init as init process
Run /bin/init as init process
process '/bin/init' started with executable stack
60000000-60008000 r-xp 00000000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
60008000-6000c000 rw-p 00006000 00:00 550544 /lib/ld-uClibc-0.9.33.2.so
80000000-80004000 r-xp 00000000 00:00 1882624 /bin/init
80004000-80008000 rw-p 00002000 00:00 1882624 /bin/init
bfc9a000-bfcbc000 rwxp bffde000 00:00 0
Welcome to ...

Execution otherwise continues as normal to a shell after this.

Regards
Greg
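
P.S. To cross-check a fault address against a dump like the ones above, a few lines of userspace C are enough. The sketch below is only an illustration: the table is Jean-Michel's console output with the printk timestamps stripped, and jm_maps and find_mapping are hypothetical names invented for it.

```c
#include <stddef.h>
#include <stdio.h>

/* Mappings copied from the 54418 console output quoted above. */
static const char *const jm_maps[] = {
	"60000000-6001e000 r-xp 00000000 00:00 178 /lib/ld.so.1",
	"6001e000-60022000 rw-p 0001c000 00:00 178 /lib/ld.so.1",
	"70000000-700ac000 r-xp 00000000 00:00 27 /bin/bash",
	"700ac000-700b4000 rw-p 000ac000 00:00 27 /bin/bash",
	"700b4000-700be000 rwxp 700b4000 00:00 0",
	"bfe7a000-bfe9c000 rw-p bffde000 00:00 0",
};

/*
 * Return the index of the first line whose [start, end) range contains
 * addr, or -1 if no line covers it. Lines that do not begin with a
 * "start-end" pair in hex are skipped.
 */
static int find_mapping(const char *const *lines, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++) {
		unsigned long start, end;

		if (sscanf(lines[i], "%lx-%lx", &start, &end) != 2)
			continue;
		if (addr >= start && addr < end)
			return (int)i;
	}
	return -1;
}
```

Going by this dump alone, the panic address 0x700b3364 lands in the rw-p mapping of /bin/bash (700ac000-700b4000), and 0x70069ee6 from the first "ls" fault lands in its r-xp mapping. The address 0x700e2004 from the second "ls" is not covered by any line, which is not surprising given that the dump is taken at the end of load_elf_binary() and so cannot show anything mapped later.
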