All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Kexec Mailing List <kexec@lists.infradead.org>,
	AKASHI Takahiro <takahiro.akashi@linaro.org>,
	James Morse <james.morse@arm.com>,
	Bhupesh SHARMA <bhupesh.linux@gmail.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of linear region
Date: Wed, 13 Jun 2018 11:11:25 +0100	[thread overview]
Message-ID: <20180613101124.GB26280@arm.com> (raw)
In-Reply-To: <CACi5LpOJ7Q7nbo9m_HGqr44+n_iEtreZt+Uxtf2PzcHSzTdKuA@mail.gmail.com>

On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote:
> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote:
> > On 12/06/18 09:25, Bhupesh Sharma wrote:
> >> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >>>> The start of the linear region map on a KASLR enabled ARM64 machine -
> >>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
> >>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
> >>>> since it is defined as:
> >>>>
> >>>>     (UL(1) << (VA_BITS - 1)) + 1)
> >
> >>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
> >>> can be sparsely populated with actual memory, regardless of whether
> >>> KASLR is in effect or not. The only difference in the presence of
> >>> KASLR is that there may be such a hole at the beginning, but that does
> >>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
> >>> now wrong.
> >
> >>>> So taking an example of a platform with VA_BITS=48, this gives a static
> >>>> value of:
> >>>> PAGE_OFFSET = 0xffff800000000000
> >>>>
> >>>> However, for the KASLR case, we use the 'memstart_offset_seed'
> >>>> to randomize the linear region - since 'memstart_addr' indicates the
> >>>> start of physical RAM, we randomize the same on basis
> >>>> of 'memstart_offset_seed' value.
> >>>>
> >>>> As the PAGE_OFFSET value is used presently by several user space
> >>>> tools (for e.g. makedumpfile and crash tools) to determine the start
> >>>> of linear region and hence to read addresses (like PT_NOTE fields) from
> >>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
> >>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
> >>>> the start of linear region for the KASLR cases and default to
> >>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
> >>>> linear region.
> >
> >>> Userland code that assumes that the linear map cannot have a hole at
> >>> the beginning should be fixed.
> >
> >> That is a separate case (although that needs fixing as well via a
> >> kernel patch probably as the user-space tools rely on '/proc/iomem'
> >> contents to determine the first System RAM/reserved range).
> >
> > This is for kexec-tools generating the kdump vmcore ELF headers in user-space?
> 
> Yes, but again, I would like to reiterate that the case where I see a
> hole at the start of the System RAM range (as I listed above) is just
> a specific case, which probably deserves a separate patch. The current
> patch though is for a generic issue (please see more details below).
> 
> >> 1. In that particular case (see [1]) the EFI firmware sets the first
> >> EFI block as EfiReservedMemType:
> >>
> >> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
> >> Region2: 0x000000200000-0x00000021fffff [EfiRuntimeServiceData]
> >>
> >> Since EFI firmware won't return the "EfiReservedMemType" memory to
> >> Linux kernel,
> >
> > (Its linux that makes this choice in
> > drivers/firmware/efi/arm-init.c::is_usable_memory())
> >
> >
> >> so the kernel can't get any info about the first mem
> >> block, and kernel can only see region2 as below:
> >>
> >> efi: Processing EFI memory map:
> >> efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |
> >> |  |  |  |   |WB|WT|WC|UC]
> >>
> >> # head -1 /proc/iomem
> >> 00200000-0021ffff : reserved
> >>
> >> 2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
> >> kernel Virtual map we can see that the memory node is set to:
> >>
> >> # dmesg | grep memory
> >> ..........
> >> memory  : 0xffff800000200000 - 0xffff801800000000
> >>
> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
> >> if we use 'readelf' to get the last program Header from vmcore (logs
> >> below are for the non-kaslr case):
> >>
> >> # readelf -l vmcore
> >>
> >> ELF Header:
> >> ........................
> >>
> >> Program Headers:
> >>   Type           Offset             VirtAddr           PhysAddr
> >>          FileSiz            MemSiz              Flags  Align
> >> ..............................................................................................................................................................
> >>   LOAD        0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
> >>                 0x0000001680000000 0x0000001680000000  RWE    0
> >>
> >> 3. So if we do a simple calculation:
> >>
> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
> >> 0xFFFF8017FFE00000 != 0xffff801800000000.
> >>
> >> which indicates that the end virtual memory nodes are not the same
> >> between vmlinux and vmcore.
> >
> > If I've followed this properly: the problem is that to generate the ELF headers
> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the
> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem.
> >
> > The problem you are hitting is an invisible hole at the beginning of RAM,
> > meaning user-space's guess_phys_to_virt() is off by the size of this hole.
> >
> > Isn't KASLR a special case for this? You must have to correct for that after
> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this?
> 
> No, I hit this issue both for the KASLR and non-KASLR boot cases. We
> can fix this either in kernel or user-space.
> 
> Fixing this in kernel space seems better to me as the definition of
> 'memstart_addr' is that it indicates the start of the physical ram,
> but since in this case there is a hole at the start of the system ram
> visible in Linux (and thus to user-space), but 'memstart_addr' is
> still 0 which seems contradictory at the least. This causes PHY_OFFSET
> to be 0 as well, which is again contradictory.

Contradictory to who? Userspace has no business messing around with this
stuff and I'm reluctant to make this an ABI by adding a symbol with a
special name. Why can't the various constants needed by these tools be
exported in the ELF headers for kcore/vmcore, or as a NOTE as James
suggests? That sounds a lot less fragile to me.

Will

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64/mm: Introduce a variable to hold base address of linear region
Date: Wed, 13 Jun 2018 11:11:25 +0100	[thread overview]
Message-ID: <20180613101124.GB26280@arm.com> (raw)
In-Reply-To: <CACi5LpOJ7Q7nbo9m_HGqr44+n_iEtreZt+Uxtf2PzcHSzTdKuA@mail.gmail.com>

On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote:
> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote:
> > On 12/06/18 09:25, Bhupesh Sharma wrote:
> >> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >>>> The start of the linear region map on a KASLR enabled ARM64 machine -
> >>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
> >>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
> >>>> since it is defined as:
> >>>>
> >>>>     (UL(1) << (VA_BITS - 1)) + 1)
> >
> >>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
> >>> can be sparsely populated with actual memory, regardless of whether
> >>> KASLR is in effect or not. The only difference in the presence of
> >>> KASLR is that there may be such a hole at the beginning, but that does
> >>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
> >>> now wrong.
> >
> >>>> So taking an example of a platform with VA_BITS=48, this gives a static
> >>>> value of:
> >>>> PAGE_OFFSET = 0xffff800000000000
> >>>>
> >>>> However, for the KASLR case, we use the 'memstart_offset_seed'
> >>>> to randomize the linear region - since 'memstart_addr' indicates the
> >>>> start of physical RAM, we randomize the same on basis
> >>>> of 'memstart_offset_seed' value.
> >>>>
> >>>> As the PAGE_OFFSET value is used presently by several user space
> >>>> tools (for e.g. makedumpfile and crash tools) to determine the start
> >>>> of linear region and hence to read addresses (like PT_NOTE fields) from
> >>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
> >>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
> >>>> the start of linear region for the KASLR cases and default to
> >>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
> >>>> linear region.
> >
> >>> Userland code that assumes that the linear map cannot have a hole at
> >>> the beginning should be fixed.
> >
> >> That is a separate case (although that needs fixing as well via a
> >> kernel patch probably as the user-space tools rely on '/proc/iomem'
> >> contents to determine the first System RAM/reserved range).
> >
> > This is for kexec-tools generating the kdump vmcore ELF headers in user-space?
> 
> Yes, but again, I would like to reiterate that the case where I see a
> hole at the start of the System RAM range (as I listed above) is just
> a specific case, which probably deserves a separate patch. The current
> patch though is for a generic issue (please see more details below).
> 
> >> 1. In that particular case (see [1]) the EFI firmware sets the first
> >> EFI block as EfiReservedMemType:
> >>
> >> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
> >> Region2: 0x000000200000-0x00000021fffff [EfiRuntimeServiceData]
> >>
> >> Since EFI firmware won't return the "EfiReservedMemType" memory to
> >> Linux kernel,
> >
> > (Its linux that makes this choice in
> > drivers/firmware/efi/arm-init.c::is_usable_memory())
> >
> >
> >> so the kernel can't get any info about the first mem
> >> block, and kernel can only see region2 as below:
> >>
> >> efi: Processing EFI memory map:
> >> efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |
> >> |  |  |  |   |WB|WT|WC|UC]
> >>
> >> # head -1 /proc/iomem
> >> 00200000-0021ffff : reserved
> >>
> >> 2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
> >> kernel Virtual map we can see that the memory node is set to:
> >>
> >> # dmesg | grep memory
> >> ..........
> >> memory  : 0xffff800000200000 - 0xffff801800000000
> >>
> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
> >> if we use 'readelf' to get the last program Header from vmcore (logs
> >> below are for the non-kaslr case):
> >>
> >> # readelf -l vmcore
> >>
> >> ELF Header:
> >> ........................
> >>
> >> Program Headers:
> >>   Type           Offset             VirtAddr           PhysAddr
> >>          FileSiz            MemSiz              Flags  Align
> >> ..............................................................................................................................................................
> >>   LOAD        0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
> >>                 0x0000001680000000 0x0000001680000000  RWE    0
> >>
> >> 3. So if we do a simple calculation:
> >>
> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
> >> 0xFFFF8017FFE00000 != 0xffff801800000000.
> >>
> >> which indicates that the end virtual memory nodes are not the same
> >> between vmlinux and vmcore.
> >
> > If I've followed this properly: the problem is that to generate the ELF headers
> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the
> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem.
> >
> > The problem you are hitting is an invisible hole at the beginning of RAM,
> > meaning user-space's guess_phys_to_virt() is off by the size of this hole.
> >
> > Isn't KASLR a special case for this? You must have to correct for that after
> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this?
> 
> No, I hit this issue both for the KASLR and non-KASLR boot cases. We
> can fix this either in kernel or user-space.
> 
> Fixing this in kernel space seems better to me as the definition of
> 'memstart_addr' is that it indicates the start of the physical ram,
> but since in this case there is a hole at the start of the system ram
> visible in Linux (and thus to user-space), but 'memstart_addr' is
> still 0 which seems contradictory at the least. This causes PHY_OFFSET
> to be 0 as well, which is again contradictory.

Contradictory to who? Userspace has no business messing around with this
stuff and I'm reluctant to make this an ABI by adding a symbol with a
special name. Why can't the various constants needed by these tools be
exported in the ELF headers for kcore/vmcore, or as a NOTE as James
suggests? That sounds a lot less fragile to me.

Will

  reply	other threads:[~2018-06-13 10:11 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-12  6:36 [PATCH] arm64/mm: Introduce a variable to hold base address of linear region Bhupesh Sharma
2018-06-12  6:36 ` Bhupesh Sharma
2018-06-12  6:53 ` Ard Biesheuvel
2018-06-12  6:53   ` Ard Biesheuvel
2018-06-12  8:25   ` Bhupesh Sharma
2018-06-12  8:25     ` Bhupesh Sharma
2018-06-12 10:12     ` James Morse
2018-06-12 10:12       ` James Morse
2018-06-13  5:16       ` Bhupesh Sharma
2018-06-13  5:16         ` Bhupesh Sharma
2018-06-13 10:11         ` Will Deacon [this message]
2018-06-13 10:11           ` Will Deacon
2018-06-14  6:23           ` Bhupesh Sharma
2018-06-14  6:23             ` Bhupesh Sharma
2018-06-15 16:52             ` Will Deacon
2018-06-15 16:52               ` Will Deacon
2018-06-15 20:02               ` Bhupesh Sharma
2018-06-15 20:02                 ` Bhupesh Sharma
2018-06-13 10:29         ` James Morse
2018-06-13 10:29           ` James Morse
2018-06-14  7:53           ` Bhupesh Sharma
2018-06-14  7:53             ` Bhupesh Sharma
2018-06-14 16:17             ` James Morse
2018-06-14 16:17               ` James Morse
2018-06-19  3:02               ` Jin, Yanjiang
2018-06-19  3:02                 ` Jin, Yanjiang
2018-06-19  8:55                 ` Will Deacon
2018-06-19  8:55                   ` Will Deacon
2018-06-19  9:34                   ` Jin, Yanjiang
2018-06-19  9:34                     ` Jin, Yanjiang
2018-06-19  9:40                     ` Will Deacon
2018-06-19  9:40                       ` Will Deacon
2018-06-19  9:57                       ` Jin, Yanjiang
2018-06-19  9:57                         ` Jin, Yanjiang
2018-06-19 10:16                         ` James Morse
2018-06-19 10:16                           ` James Morse
2018-06-19 10:37                           ` Bhupesh Sharma
2018-06-19 10:37                             ` Bhupesh Sharma
2018-06-19 11:26                             ` James Morse
2018-06-19 11:26                               ` James Morse
2018-06-19 11:58                               ` Bhupesh Sharma
2018-06-19 11:58                                 ` Bhupesh Sharma
2018-06-20  2:16                                 ` Jin, Yanjiang
2018-06-20  2:16                                   ` Jin, Yanjiang
2018-06-20  7:26                                   ` Bhupesh Sharma
2018-06-20  7:26                                     ` Bhupesh Sharma
2018-06-20 10:06                                     ` James Morse
2018-06-20 10:06                                       ` James Morse
2018-07-11 13:24                                     ` James Morse
2018-07-11 13:24                                       ` James Morse
2018-07-11 15:36                                       ` Bhupesh Sharma
2018-07-11 15:36                                         ` Bhupesh Sharma
2018-07-11 16:24                                         ` Omar Sandoval
2018-07-11 16:24                                           ` Omar Sandoval

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180613101124.GB26280@arm.com \
    --to=will.deacon@arm.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=bhsharma@redhat.com \
    --cc=bhupesh.linux@gmail.com \
    --cc=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mark.rutland@arm.com \
    --cc=takahiro.akashi@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.