linux-arm-kernel.lists.infradead.org archive mirror
From: takahiro.akashi@linaro.org (AKASHI Takahiro)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC] arm64: extra entries in /proc/iomem for kexec
Date: Mon, 2 Apr 2018 10:53:32 +0900
Message-ID: <20180402015330.GA27603@linaro.org>
In-Reply-To: <da2989ec-7fa5-b044-3237-4e75f2fe6d98@arm.com>

James,

My apologies for the slow response. I had a long weekend.

On Tue, Mar 27, 2018 at 02:32:49PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 27/03/18 11:16, AKASHI Takahiro wrote:
> > On Tue, Mar 20, 2018 at 01:18:34AM +0530, Bhupesh Sharma wrote:
> >> On 03/14/2018 01:59 PM, AKASHI Takahiro wrote:
> >>> Currently, there is an inconsistent view between (A) and the mainline's:
> >>> see (A-1) and (B-1). If this really matters, I can fix it.
> >>> Kexec-tools can be easily modified to accept both formats, though.
> 
> Ooer, what needs changing in kexec-tools? What happens if someone doesn't update
> userspace at the same time?

Basically, the changes that I made to /proc/iomem in my new format D were:
1. to move NOMAP region entries, formerly named "reserved" and now named
   "reserved (no map)", under System RAM
2. to add new entries for firmware-reserved regions as "reserved" also
   under System RAM

On the other hand, the current kexec-tools, in particular the kexec command,
only scans top-level "System RAM" entries as well as "reserved" entries.
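
To illustrate, here is a minimal sketch of such a top-level-only scan. It is
written in Python for brevity and is not the actual kexec-tools code (which is
C); the function name, the regular expression, and the sample text are
assumptions made up for this example. It only shows that entries nested under
"System RAM" are never seen by a scanner of this kind.

```python
import re

# One /proc/iomem line: optional indentation, then "start-end : name".
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+?)\s*$")


def top_level_entries(text, names=("System RAM", "reserved")):
    """Return (start, end, name) for unindented entries whose name is in names.

    Hypothetical helper for illustration only, not a kexec-tools function.
    """
    found = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        if indent:
            # Nested entry: invisible to a top-level-only scan.
            continue
        if name in names:
            found.append((int(start, 16), int(end, 16), name))
    return found


# Illustrative input with nested "reserved" entries.
SAMPLE = """\
40000000-5871ffff : System RAM
  40080000-40f1ffff : Kernel code
  58590000-585effff : reserved
58720000-58b5ffff : reserved (no map)
58b60000-5be3ffff : System RAM
  58b61000-58b61fff : reserved
"""

entries = top_level_entries(SAMPLE)
for start, end, name in entries:
    print(f"{start:x}-{end:x} : {name}")
```

With this input, only the two "System RAM" ranges are returned; both nested
"reserved" entries and the "reserved (no map)" entry are skipped.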

So if someone doesn't update kexec-tools, the secondary kernel may
crash at boot time because either
a. the new kernel (or initrd/dtb) may have been allocated in a NOMAP region,
   which is not suitable for use as normal memory, or
b. the new kernel (or initrd/dtb) may have been allocated in a reserved region
   whose contents can be overwritten.

While we see (b) even today, (a) is a backward compatibility issue.

Note: kdump is a different story (alignment error), and I will
take a different approach to fixing the kdump case.

> Is there a format which doesn't require a user-space change, (and shouldn't we
> pick that one?)

The only solution that I can imagine for now to prevent (a) and (b)
at the same time without any user-space change is
2+. to add new entries for firmware-reserved regions as "reserved",
    in addition to the current NOMAP regions, at top level

(format E)
40000000-5858ffff : System RAM
  40080000-40f1ffff : Kernel code
  41040000-411e9fff : Kernel data
  54400000-583fffff : Crash kernel
58590000-585effff : reserved
585f0000-586fffff : System RAM
58700000-5871ffff : reserved
58720000-58b5ffff : reserved (no map)
58b60000-58b60fff : System RAM
58b61000-58b61fff : reserved
58b62000-59a7b117 : System RAM
59a7b118-59a7b667 : reserved
59a7b668-5be3ffff : System RAM
5be40000-5becffff : reserved (no map)
5bed0000-5bedffff : System RAM
5bee0000-5bffffff : reserved (no map)
5c000000-5ebfffff : System RAM
5ec00000-5edfffff : reserved
5ee00000-5fffffff : System RAM
8000000000-ffffffffff : PCI Bus 0000:00

This not only looks quite noisy but also ignores the fact that
reserved regions are part of System RAM (or memblock.memory).

Or, to maximize compatibility, we may adopt format B:

(format B)
40000000-5871ffff : System RAM
  40080000-40f1ffff : Kernel code
  41040000-411e9fff : Kernel data
  54400000-583fffff : Crash kernel
  58590000-585effff : reserved
  58700000-5871ffff : reserved
58720000-58b5ffff : reserved (no map)
58b60000-5be3ffff : System RAM
  58b61000-58b61fff : reserved
  59a7b118-59a7b667 : reserved
5be40000-5becffff : reserved (no map)
5bed0000-5bedffff : System RAM
5bee0000-5bffffff : reserved (no map)
5c000000-5fffffff : System RAM
  5ec00000-5edfffff : reserved
8000000000-ffffffffff : PCI Bus 0000:00

but, in this case, we need some change to kexec-tools to fix (b).
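
The kind of change needed can be sketched as follows: for each top-level
"System RAM" range, subtract the "reserved" entries nested under it and treat
only the remainder as usable. This is a rough Python illustration under my own
assumptions, not a patch for kexec-tools; usable_ranges() is a hypothetical
helper, and the ranges are taken from the second System RAM block of format B.

```python
def usable_ranges(ram, holes):
    """Subtract inclusive (start, end) holes from one inclusive RAM range.

    Hypothetical helper for illustration; all ranges are inclusive,
    as in /proc/iomem.
    """
    start, end = ram
    out = []
    cursor = start
    for h_start, h_end in sorted(holes):
        if h_end < cursor or h_start > end:
            continue  # hole lies outside what is left of the range
        if h_start > cursor:
            out.append((cursor, h_start - 1))
        cursor = max(cursor, h_end + 1)
    if cursor <= end:
        out.append((cursor, end))
    return out


# 58b60000-5be3ffff : System RAM, with its two nested "reserved" entries.
ram = (0x58B60000, 0x5BE3FFFF)
holes = [(0x58B61000, 0x58B61FFF), (0x59A7B118, 0x59A7B667)]

free = usable_ranges(ram, holes)
for start, end in free:
    print(f"{start:x}-{end:x}")
```

A real implementation would also have to respect the alignment requirements of
the images being loaded, which this sketch ignores.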

> >>> 2. How should we determine which regions be exported in /proc/iomem?
> >>>
> >>>  a. Trust all the memblock_reserve'd regions as my previous patch [3] does.
> >>>
> >>>     As I said, it's a kind of "overkill." Some of regions, say fdt, are
> >>>     not required to be preserved across kexec.
> >>
> >>
> >> I think we should preserve all the memblock_reserve'd regions. So +1 on this
> >> approach from my side. I believe it might help avoid issues we have seen in
> >> the past with 'kexec-tools' _incorrectly_ determining which regions to pick
> >> from the '/proc/iomem'.
> > 
> > As I said in my reply to Ard's comment, I now know *overkill* is not a big
> > issue and I will go for this approach.
> 
> /sys/kernel/debug/memblock/reserved has all kinds of weird stuff in it,
> including some smaller-than-a-page reservations that appear to come from the
> percpu allocator.
> 
> I agree it will make the implementation simpler, and reserving 'too much' isn't
> an issue.

Are you suggesting that we should use /sys/kernel/debug/memblock/reserved
without modifying current /proc/iomem?
(Note that, even with this approach, we need a user-space change.)

Hmm, overall, this approach will be preferable to format B/E.
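
As a rough idea of what the user-space side might look like, the sketch below
parses text in the style of /sys/kernel/debug/memblock/reserved, rounds each
reservation out to page boundaries (to cope with the smaller-than-a-page
entries mentioned above), and merges the results. Everything here is an
assumption for illustration: the "index: 0xstart..0xend" line layout is what I
expect debugfs to print, the 4 KiB page size is only one of the sizes arm64
supports, the third sample entry is made up, and the helper name is
hypothetical.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages; arm64 can also use 16 KiB or 64 KiB

# Assumed line layout: "   0: 0x0000000040080000..0x0000000040f1ffff"
ENTRY_RE = re.compile(r"^\s*\d+:\s*0x([0-9a-fA-F]+)\.\.0x([0-9a-fA-F]+)\s*$")


def page_aligned_reservations(text, page_size=PAGE_SIZE):
    """Return merged, page-aligned inclusive (start, end) reserved ranges."""
    ranges = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue
        # Round the start down and the (inclusive) end up to page boundaries.
        start = int(m.group(1), 16) // page_size * page_size
        end = (int(m.group(2), 16) // page_size + 1) * page_size - 1
        ranges.append((start, end))

    ranges.sort()
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent to the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Illustrative input; the last entry is invented to show merging.
SAMPLE = """\
   0: 0x0000000040080000..0x0000000040f1ffff
   1: 0x0000000059a7b118..0x0000000059a7b667
   2: 0x0000000059a7b700..0x0000000059a7b7ff
"""

reserved = page_aligned_reservations(SAMPLE)
for start, end in reserved:
    print(f"{start:x}-{end:x}")
```

Note that debugfs may not be mounted or readable on every system, so a tool
relying on this file would need a fallback.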

Thanks,
-Takahiro AKASHI

> 
> Thanks,
> 
> James
