* riscv: boot failure for 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
@ 2023-05-21 2:05 Drew Fustini
2023-05-21 3:22 ` Samuel Holland
0 siblings, 1 reply; 3+ messages in thread
From: Drew Fustini @ 2023-05-21 2:05 UTC (permalink / raw)
To: Alexandre Ghiti
Cc: linux-riscv, linux-kernel, Conor Dooley, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Andrew Jones, Anup Patel
Hello, I tested 6.4-rc1 on an internal RISC-V SoC and observed a boot
failure on a Store/AMO access fault (exception code 7) in __memset().
stval (i.e. badaddr) was set to 0xffffaf8000000000. This SoC is RV64GC
with Sv48 so it seems that address is the start of the "direct mapping
of all physical memory" [1].
The 6.3 release boots okay and the system is able to operate correctly
with an Ubuntu 23.04 rootfs on eMMC. Therefore, I decided to bisect and
I found the failure begins with 3335068f8721 ("riscv: Use PUD/P4D/PGD
pages for the linear mapping"). The system boots okay with the prior
commit 8589e346bbb6 ("riscv: Move the linear mapping creation in its
own function").
The boot log [2] shows that the fault happens right after buildroot's
init script [3] uses switch_root to execute init from the Ubuntu rootfs
on the eMMC.
DWARF4 is enabled in .config [4] and the decoded stack trace [5] shows:
epc : __memset (/eng/dfustini/gitlab/linux/arch/riscv/lib/memset.S:67)
From memset.S:
Line 67: REG_S a1, 0(t0)
From the oops:
epc : ffffffff81122d6c ra : ffffffff80218504 sp : ffffaf8002e47500
gp : ffffffff82695010 tp : ffffaf8002e2ec00 t0 : ffffaf8000000000
t1 : 0000000000000080 t2 : 0000000000000001 s0 : ffffaf8002e47550
s1 : ffff8d8200000040 a0 : ffffaf8000000000 a1 : 0000000000000000
Thus I think it is trying to store 0x0 to 0xffffaf8000000000 which is
the start of the direct map. From the boot log [2], OpenSBI shows:
Domain0 Region00 : 0x0000000002080000-0x00000000020bffff M: (I,R,W) S/U: ()
Domain0 Region01 : 0x0000008000000000-0x000000800003ffff M: (R,W,X) S/U: ()
Domain0 Region02 : 0x0000000002000000-0x000000000207ffff M: (I,R,W) S/U: ()
Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
The DDR memory on this SoC starts at 0x8000000000 with size 2GB. The
memory node from the device tree [6]:
memory@8000000000 {
	device_type = "memory";
	reg = <0x80 0 0x00000000 0x80000000>;
};
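For reference, a minimal sketch of how the reg property above decodes to a base of 0x8000000000 and a size of 2 GiB. It assumes #address-cells = <2> and #size-cells = <2> in the parent node, which is not shown in this message but is consistent with the four cells given. The helper names are hypothetical.

```python
# Sketch only: assumes #address-cells = 2 and #size-cells = 2 (not shown in
# the message). Each device tree cell is a 32-bit big-endian value.

def cells_to_int(cells):
    """Combine 32-bit cells, most significant first, into one integer."""
    value = 0
    for cell in cells:
        value = (value << 32) | (cell & 0xFFFFFFFF)
    return value


def decode_reg(reg, address_cells=2, size_cells=2):
    """Split one reg entry into (base, size)."""
    base = cells_to_int(reg[:address_cells])
    size = cells_to_int(reg[address_cells:address_cells + size_cells])
    return base, size


# reg = <0x80 0 0x00000000 0x80000000>
base, size = decode_reg([0x80, 0x0, 0x00000000, 0x80000000])
print(hex(base), size // 2**30, "GiB")
```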
I think the direct map address 0xffffaf8000000000 would map to physical
address 0x8000000000. Thus I think the attempted store in S-mode to that
address would violate the PMP settings for Region01.
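The reasoning above can be checked with a short sketch. It assumes the linear mapping translates as pa = va - PAGE_OFFSET + (start of RAM), with PAGE_OFFSET taken from the Sv48 layout in [1], and that the first listed region containing an address decides its permissions. Both are simplifying assumptions for illustration, and the function names are hypothetical. The region table is copied from the OpenSBI output above.

```python
# Sketch only: simplified model of the direct map and of region matching.
PAGE_OFFSET_SV48 = 0xFFFFAF8000000000  # start of the direct map per [1]
PHYS_RAM_BASE = 0x8000000000           # start of DDR on this SoC

# (start, end inclusive, S/U permissions) in the order OpenSBI printed them.
REGIONS = [
    (0x0000000002080000, 0x00000000020BFFFF, ""),
    (0x0000008000000000, 0x000000800003FFFF, ""),
    (0x0000000002000000, 0x000000000207FFFF, ""),
    (0x0000000000000000, 0xFFFFFFFFFFFFFFFF, "RWX"),
]


def linear_va_to_pa(va):
    """Translate a direct-map virtual address to a physical address."""
    return va - PAGE_OFFSET_SV48 + PHYS_RAM_BASE


def first_matching_region(pa):
    """Return (index, S/U permissions) of the first region containing pa."""
    for index, (start, end, su_perms) in enumerate(REGIONS):
        if start <= pa <= end:
            return index, su_perms
    return None


pa = linear_va_to_pa(0xFFFFAF8000000000)  # stval from the oops
index, perms = first_matching_region(pa)
print(hex(pa), "-> Region%02d" % index, "S/U: (%s)" % perms)
```

Under those assumptions the faulting address lands in Region01, which grants S-mode no write permission.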
I do not yet understand why this happens with 3335068f8721 ("riscv: Use
PUD/P4D/PGD pages for the linear mapping") but not for the prior commit
8589e346bbb6 ("riscv: Move the linear mapping creation in its own
function").
One important caveat: I do have a small diff from mainline to add
support for the eMMC controller in this SoC to sdhci-of-dwcmshc.c. The
output of 'git diff' when 3335068f8721 is checked out [7] shows that
this just adds a new compatible and corresponding sdhci_ops struct.
Everything works ok with this change in both the 6.3 release and the
commit prior to 3335068f8721.
I know it is a bit awkward for me to report a boot failure for an
internal SoC but I am hoping to find a better solution than just
reverting this change in the downstream kernel.
The reason that so few changes are needed to run Linux on this SoC is
that there is a service processor that handles all the low-level tasks
like setting up clocks and configuring various peripheral controllers.
Everything is already set up and ready to go by the time the hart meant
to run OpenSBI+Linux (fw_payload.bin) comes out of reset.
Note: normally Linux runs on all four harts but I reduced to running on
a single hart to simplify diagnosing this boot failure.
Thanks,
Drew
[1] https://docs.kernel.org/riscv/vm-layout.html#risc-v-linux-kernel-sv48
[2] boot log: https://gist.github.com/pdp7/afe78604f477c9e3a3cf0241bcdffcdb
[3] init script: https://gist.github.com/pdp7/8d61bafbca55e987b790433c0353831d
[4] linux .config: https://gist.github.com/pdp7/a4df66f1359a34194bddd32f74ab38a3
[5] stacktrace: https://gist.github.com/pdp7/0524892ea319775ea70e43a54cc842a9
[6] mysoc.dts: https://gist.github.com/pdp7/cd1b2e8e8d3f6047efd53e4ef65664da
[7] git diff: https://gist.github.com/pdp7/581c9e8415da94a29d34ae6d7cc14669
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: riscv: boot failure for 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
2023-05-21 2:05 riscv: boot failure for 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") Drew Fustini
@ 2023-05-21 3:22 ` Samuel Holland
2023-05-21 22:01 ` Drew Fustini
0 siblings, 1 reply; 3+ messages in thread
From: Samuel Holland @ 2023-05-21 3:22 UTC (permalink / raw)
To: Drew Fustini
Cc: linux-riscv, linux-kernel, Conor Dooley, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Andrew Jones, Anup Patel, Alexandre Ghiti

Hi Drew,

On 5/20/23 21:05, Drew Fustini wrote:
> Hello, I tested 6.4-rc1 on an internal RISC-V SoC and observed a boot
> failure on a Store/AMO access fault (exception code 7) in __memset().
> stval (e.g. badaddr) was set to 0xffffaf8000000000. This SoC is RV64GC
> with Sv48 so it seems that address is the start of the "direct mapping
> of all physical memory" [1].
>
> The 6.3 release boots okay and the system is able to operate correctly
> with an Ubuntu 23.04 rootfs on eMMC. Therefore, I decided to bisect and
> I found the failure begins with 3335068f8721 ("riscv: Use PUD/P4D/PGD
> pages for the linear mapping"). The system boots okay with the prior
> commit 8589e346bbb6 ("riscv: Move the linear mapping creation in its
> own function").
>
> The boot log [2] shows that the fault happens right after buildroot's
> init script [3] uses switch_root to execute init from the Ubuntu rootfs
> on the eMMC.
>
> DWARF4 is enabled in .config [4] and the decoded stack trace [5] shows:
>
> epc : __memset (/eng/dfustini/gitlab/linux/arch/riscv/lib/memset.S:67)
>
> From memset.S:
>
> Line 67: REG_S a1, 0(t0)
>
> From the oops:
>
> epc : ffffffff81122d6c ra : ffffffff80218504 sp : ffffaf8002e47500
> gp : ffffffff82695010 tp : ffffaf8002e2ec00 t0 : ffffaf8000000000
> t1 : 0000000000000080 t2 : 0000000000000001 s0 : ffffaf8002e47550
> s1 : ffff8d8200000040 a0 : ffffaf8000000000 a1 : 0000000000000000
>
> Thus I think it is trying to store 0x0 to 0xffffaf8000000000 which is
> the start of the direct map. From the boot log [2], OpenSBI shows:
>
> Domain0 Region00 : 0x0000000002080000-0x00000000020bffff M: (I,R,W) S/U: ()
> Domain0 Region01 : 0x0000008000000000-0x000000800003ffff M: (R,W,X) S/U: ()
> Domain0 Region02 : 0x0000000002000000-0x000000000207ffff M: (I,R,W) S/U: ()
> Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
>
> The DDR memory on this SoC starts at 0x8000000000 with size 2GB. The
> memory node from the device tree [6]:
>
> memory@8000000000 {
> 	device_type = "memory";
> 	reg = <0x80 0 0x00000000 0x80000000>;
> };
>
> I think the direct map address 0xffffaf8000000000 would map to physical
> address 0x8000000000. Thus I think the attempted store in S-mode to that
> address would violate the PMP settings for Region01.
>
> I do not yet understand why this happens with 3335068f8721 ("riscv: Use
> PUD/P4D/PGD pages for the linear mapping") but not for the prior commit
> 8589e346bbb6 ("riscv: Move the linear mapping creation in its own
> function").

Where does Linux's DTB come from? It should be the one that was modified
by OpenSBI to add a reserved-memory node matching PMP Region01
(fdt_reserved_memory_fixup()).

Before this commit, Linux ignored the first 2 MiB of physical RAM. So if
OpenSBI was loaded in this region, you could get away with ignoring the
firmware-provided DTB; now you actually need to use it, as intended.

Regards,
Samuel
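To illustrate the point about the first 2 MiB, here is a minimal sketch that checks whether the firmware region from the OpenSBI output (Region01) sits entirely inside the part of RAM that older kernels skipped. The 2 MiB figure comes from the reply above; the function name and the model itself are simplified assumptions, not kernel code.

```python
# Sketch only: models "Linux ignored the first 2 MiB of physical RAM".
RAM_BASE = 0x8000000000
FW_START, FW_END = 0x8000000000, 0x800003FFFF  # Region01, end inclusive
SKIPPED_BYTES = 2 * 1024 * 1024                # 2 MiB


def hidden_by_skip(start, end, ram_base=RAM_BASE, skipped=SKIPPED_BYTES):
    """True if [start, end] lies wholly inside the skipped start of RAM."""
    return ram_base <= start and end < ram_base + skipped


fw_size_kib = (FW_END - FW_START + 1) // 1024
print(fw_size_kib, "KiB firmware region, hidden:",
      hidden_by_skip(FW_START, FW_END))
```

With the skip gone, the same region is only kept out of the linear mapping's use if the DTB carries a matching reserved-memory node.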
* Re: riscv: boot failure for 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
2023-05-21 3:22 ` Samuel Holland
@ 2023-05-21 22:01 ` Drew Fustini
0 siblings, 0 replies; 3+ messages in thread
From: Drew Fustini @ 2023-05-21 22:01 UTC (permalink / raw)
To: Samuel Holland
Cc: linux-riscv, linux-kernel, Conor Dooley, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Andrew Jones, Anup Patel, Alexandre Ghiti

On Sat, May 20, 2023 at 10:22:36PM -0500, Samuel Holland wrote:
> Hi Drew,
>
> On 5/20/23 21:05, Drew Fustini wrote:
> > Hello, I tested 6.4-rc1 on an internal RISC-V SoC and observed a boot
> > failure on a Store/AMO access fault (exception code 7) in __memset().
> > stval (e.g. badaddr) was set to 0xffffaf8000000000. This SoC is RV64GC
> > with Sv48 so it seems that address is the start of the "direct mapping
> > of all physical memory" [1].
> >
> > The 6.3 release boots okay and the system is able to operate correctly
> > with an Ubuntu 23.04 rootfs on eMMC. Therefore, I decided to bisect and
> > I found the failure begins with 3335068f8721 ("riscv: Use PUD/P4D/PGD
> > pages for the linear mapping"). The system boots okay with the prior
> > commit 8589e346bbb6 ("riscv: Move the linear mapping creation in its
> > own function").
> >
> > The boot log [2] shows that the fault happens right after buildroot's
> > init script [3] uses switch_root to execute init from the Ubuntu rootfs
> > on the eMMC.
> >
> > DWARF4 is enabled in .config [4] and the decoded stack trace [5] shows:
> >
> > epc : __memset (/eng/dfustini/gitlab/linux/arch/riscv/lib/memset.S:67)
> >
> > From memset.S:
> >
> > Line 67: REG_S a1, 0(t0)
> >
> > From the oops:
> >
> > epc : ffffffff81122d6c ra : ffffffff80218504 sp : ffffaf8002e47500
> > gp : ffffffff82695010 tp : ffffaf8002e2ec00 t0 : ffffaf8000000000
> > t1 : 0000000000000080 t2 : 0000000000000001 s0 : ffffaf8002e47550
> > s1 : ffff8d8200000040 a0 : ffffaf8000000000 a1 : 0000000000000000
> >
> > Thus I think it is trying to store 0x0 to 0xffffaf8000000000 which is
> > the start of the direct map. From the boot log [2], OpenSBI shows:
> >
> > Domain0 Region00 : 0x0000000002080000-0x00000000020bffff M: (I,R,W) S/U: ()
> > Domain0 Region01 : 0x0000008000000000-0x000000800003ffff M: (R,W,X) S/U: ()
> > Domain0 Region02 : 0x0000000002000000-0x000000000207ffff M: (I,R,W) S/U: ()
> > Domain0 Region03 : 0x0000000000000000-0xffffffffffffffff M: (R,W,X) S/U: (R,W,X)
> >
> > The DDR memory on this SoC starts at 0x8000000000 with size 2GB. The
> > memory node from the device tree [6]:
> >
> > memory@8000000000 {
> > 	device_type = "memory";
> > 	reg = <0x80 0 0x00000000 0x80000000>;
> > };
> >
> > I think the direct map address 0xffffaf8000000000 would map to physical
> > address 0x8000000000. Thus I think the attempted store in S-mode to that
> > address would violate the PMP settings for Region01.
> >
> > I do not yet understand why this happens with 3335068f8721 ("riscv: Use
> > PUD/P4D/PGD pages for the linear mapping") but not for the prior commit
> > 8589e346bbb6 ("riscv: Move the linear mapping creation in its own
> > function").
>
> Where does Linux's DTB come from? It should be the one that was modified
> by OpenSBI to add a reserved-memory node matching PMP Region01
> (fdt_reserved_memory_fixup()).
>
> Before this commit, Linux ignored the first 2 MiB of physical RAM. So if
> OpenSBI was loaded in this region, you could get away with ignoring the
> firmware-provided DTB; now you actually need to use it, as intended.

The address of the dtb is passed by the boot code to OpenSBI. I had been
using OpenSBI master from Jan 9: 001106d ("docs: Update domain's region
permissions and requirements"). The kernel receives the device tree from
OpenSBI but I had never actually dumped it from sysfs.

I checked out the prior kernel commit 8589e346bbb6 ("riscv: Move the
linear mapping creation in its own function") and ran
"dtc -I fs /sys/firmware/devicetree/base/" to dump the device tree [1].
This showed that the reserved-memory node was blank.

Jessica pointed out to me on #riscv irc that this was fixed in OpenSBI
on Jan 21 with: a990309 ("lib: utils: Fix reserved memory node for
firmware memory").

Therefore, I updated to the current OpenSBI master: 33f1722 ("lib: sbi:
Document sbi_ecall_extension members") from May 15. The device tree that
OpenSBI passes to the kernel now has "mmode_resv0@80,0" and
"mmode_resv1@80,20000".

Furthermore, my system now boots okay with 3335068f8721 ("riscv: Use
PUD/P4D/PGD pages for the linear mapping") so the problem was just that
I had been using an OpenSBI that was slightly too old.

Thanks,
Drew

[1] https://gist.github.com/pdp7/71ca465997274e11953b26861e36144f
[2] https://gist.github.com/pdp7/90b4632146fc55625735fa288d80532b
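As a quick cross-check of the two node names above, a minimal sketch that turns them into physical addresses and compares them with Region01 from the OpenSBI output in the first message. It assumes the unit address after "@" is written as "high,low" 32-bit halves in hex; that reading is an assumption made here for illustration, and the function name is hypothetical. The node sizes are not shown in this thread, so only the start addresses are checked.

```python
# Sketch only: assumes the unit address is "<high 32 bits>,<low 32 bits>"
# in hex, e.g. "80,20000" -> 0x80_00020000.
REGION01 = (0x8000000000, 0x800003FFFF)  # end inclusive, from the boot log


def unit_address(node_name):
    """Extract the physical address encoded in a node name's unit address."""
    _, _, unit = node_name.partition("@")
    high, low = unit.split(",")
    return (int(high, 16) << 32) | int(low, 16)


for name in ("mmode_resv0@80,0", "mmode_resv1@80,20000"):
    addr = unit_address(name)
    inside = REGION01[0] <= addr <= REGION01[1]
    print(name, hex(addr), "starts inside Region01:", inside)
```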