From: Stefan Berger <stefanb@linux.ibm.com>
To: The development of GNU GRUB <grub-devel@gnu.org>,
	Daniel Kiper <dkiper@net-space.pl>
Cc: leif@nuviainc.com, ps@pks.im, Daniel Axtens <dja@axtens.net>
Subject: Re: [PATCH v3 11/15] ieee1275: request memory with ibm,client-architecture-support
Date: Tue, 19 Jul 2022 16:49:51 -0400
Message-ID: <a548c2e1-caa9-45f2-e96c-5cf8ec309d8a@linux.ibm.com>
In-Reply-To: <20220421052427.1389987-12-dja@axtens.net>

Daniel K.,

   were you going to push the last 4 patches of this series into the 
repo as well now that the first 10 are checked in?

Regards,
    Stefan

On 4/21/22 01:24, Daniel Axtens wrote:
> On PowerVM, the first time we boot a Linux partition, we may only get
> 256MB of real memory area, even if the partition has more memory.
> 
> This isn't enough to reliably verify a kernel. Fortunately, the Power
> Architecture Platform Reference (PAPR) defines a method we can call to ask
> for more memory: the broad and powerful ibm,client-architecture-support
> (CAS) method.
> 
> CAS can do an enormous number of things on a PAPR platform: as well as
> asking for memory, you can set the supported processor level, the interrupt
> controller, hash vs radix mmu, and so on.
> 
> If:
> 
>   - we are running under what we think is PowerVM (compatible property of /
>     begins with "IBM"), and
> 
>   - the full amount of RMA is less than 512MB (as determined by the reg
>     property of /memory)
> 
> then call CAS as follows: (refer to the Linux on Power Architecture
> Reference, LoPAR, which is public, at B.5.2.3):
> 
>   - Use the "any" PVR value and supply 4 option vectors.
> 
>   - Set option vector 1 (PowerPC Server Processor Architecture Level)
>     to "ignore".
> 
>   - Set option vector 2 with default or Linux-like options, including a
>     min-rma-size of 512MB.
> 
>   - Set option vector 3 to request Floating Point, VMX and Decimal Floating
>     point, but don't abort the boot if we can't get them.
> 
>   - Set option vector 4 to request a minimum VP percentage of 1%, which is
>     what Linux requests, and is below the default of 10%. Without this,
>     some systems with very large or very small configurations fail to boot.
> 
> This will cause a CAS reboot and the partition will restart with 512MB
> of RMA. Importantly, grub will notice the 512MB and not call CAS again.
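
For reference, a minimal sketch of the byte layout the steps above describe. It mirrors the structures in the patch below, but uses plain `stdint.h` types in place of the `grub_uint*_t` types and relies on the GCC/Clang `packed` attribute; `cas_len_byte` is a hypothetical helper, not something in the patch. Each length byte holds the vector's size in bytes minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option vector 2, with the same field order as in the patch. */
struct option_vector2 {
  uint8_t byte1;
  uint16_t reserved;
  uint32_t real_base;
  uint32_t real_size;
  uint32_t virt_base;
  uint32_t virt_size;
  uint32_t load_base;
  uint32_t min_rma;          /* in MB */
  uint32_t min_load;
  uint8_t min_rma_percent;
  uint8_t max_pft_size;
} __attribute__((packed));

struct pvr_entry {
  uint32_t mask;
  uint32_t entry;
};

/* One terminating PVR entry followed by four option vectors. */
struct cas_vector {
  struct {
    struct pvr_entry terminal;
  } pvr_list;
  uint8_t num_vecs;          /* number of vectors - 1 */
  uint8_t vec1_size;
  uint8_t vec1;
  uint8_t vec2_size;
  struct option_vector2 vec2;
  uint8_t vec3_size;
  uint16_t vec3;
  uint8_t vec4_size;
  uint16_t vec4;
} __attribute__((packed));

/* Compile-time checks that packing gives the expected byte counts. */
static_assert (sizeof (struct option_vector2) == 33, "vector 2 is 33 bytes");
static_assert (offsetof (struct cas_vector, vec2) == 12, "vector 2 offset");
static_assert (sizeof (struct cas_vector) == 51, "whole argument is 51 bytes");

/* Hypothetical helper: a CAS length byte stores (bytes - 1). */
uint8_t
cas_len_byte (size_t nbytes)
{
  return (uint8_t) (nbytes - 1);
}
```

With this layout, the patch's `1 + sizeof(struct option_vector2) - 2` works out to the same value as `cas_len_byte (sizeof (struct option_vector2))`, i.e. 32. The 16-bit `vec3` and `vec4` values only produce the intended byte order on a big-endian build.
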
> 
> Notes about the choices of parameters:
> 
>   - A partition can be configured with only 256MB of memory, which would
>     mean this request couldn't be satisfied, but PFW refuses to load with
>     only 256MB of memory, so it's a bit moot. (SLOF will run fine with 256MB,
>     but we will never call CAS under qemu/SLOF because /compatible won't
>     begin with "IBM".)
> 
>   - Unspecified CAS vectors take on default values. Some of these values
>     might restrict the ability of certain hardware configurations to boot.
>     This is why we need to specify the VP percentage in vector 4, which is
>     in turn why we need to specify vector 3.
> 
> Finally, we should have enough memory to verify a kernel, and we will
> reach Linux. One of the first things Linux does while still running under
> OpenFirmware is to call CAS with a much fuller set of options (including
> asking for 512MB of memory). Linux includes a much more restrictive set of
> PVR values and processor support levels, and this CAS invocation will likely
> induce another reboot. On this reboot grub will again notice the higher RMA,
> and not call CAS. We will get to Linux again, Linux will call CAS again, but
> because the values are now set for Linux this will not induce another CAS
> reboot and we will finally boot all the way to userspace.
> 
> On all subsequent boots, everything will be configured with 512MB of RMA,
> so there will be no further CAS reboots from grub. (phyp is super sticky
> with the RMA size - it persists even on cold boots. So if you've ever booted
> Linux in a partition, you'll probably never have grub call CAS. It'll only
> ever fire the first time a partition loads grub, or if you deliberately lower
> the amount of memory your partition has below 512MB.)
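
The reboot sequence described above can be walked through with a toy model. This is only a sketch of the reasoning in the commit message, not firmware behaviour: `struct partition` and `boot_until_userspace` are hypothetical names, and the model assumes the Linux CAS call always triggers a reboot the first time (the text only says it "will likely" do so).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state that persists across CAS reboots. */
struct partition {
  uint32_t rma_mb;            /* RMA size the firmware currently grants */
  bool linux_cas_applied;     /* have Linux's CAS options been applied? */
};

/* Return how many CAS reboots happen before userspace is reached. */
int
boot_until_userspace (struct partition *p)
{
  int reboots = 0;

  assert (p != 0);
  for (;;)
    {
      /* grub stage: ask for more memory only if the RMA is below 512MB. */
      if (p->rma_mb < 512)
        {
          p->rma_mb = 512;
          reboots++;
          continue;
        }

      /* Linux stage: the first CAS call with Linux's options reboots. */
      if (!p->linux_cas_applied)
        {
          p->linux_cas_applied = true;
          reboots++;
          continue;
        }

      /* Nothing left to renegotiate: boot proceeds to userspace. */
      return reboots;
    }
}
```

Starting from a fresh partition with 256MB of RMA, the model counts two reboots (one from grub, one from Linux); booting the same partition again counts none, which matches the "sticky" RMA behaviour described above.
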
> 
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> 
> ---
> 
> v2: reformat
> v3: extend to option vectors 3 & 4
> 
> I wrongly assumed that the most compatible way to perform CAS
> negotiation was to only set the minimum number of vectors required
> to ask for more memory. It turns out that this messes up booting
> if the minimum VP capacity would be less than the default 10% in
> vector 4.
> ---
>   grub-core/kern/ieee1275/cmain.c  |   3 +
>   grub-core/kern/ieee1275/init.c   | 152 +++++++++++++++++++++++++++++++
>   include/grub/ieee1275/ieee1275.h |   8 ++
>   3 files changed, 163 insertions(+)
> 
> diff --git a/grub-core/kern/ieee1275/cmain.c b/grub-core/kern/ieee1275/cmain.c
> index 4442b6a83193..b707798ec3fb 100644
> --- a/grub-core/kern/ieee1275/cmain.c
> +++ b/grub-core/kern/ieee1275/cmain.c
> @@ -123,6 +123,9 @@ grub_ieee1275_find_options (void)
>   	      break;
>   	    }
>   	}
> +
> +      if (grub_strncmp (tmp, "IBM,", 4) == 0)
> +	grub_ieee1275_set_flag (GRUB_IEEE1275_FLAG_CAN_TRY_CAS_FOR_MORE_MEMORY);
>       }
>   
>     if (is_smartfirmware)
> diff --git a/grub-core/kern/ieee1275/init.c b/grub-core/kern/ieee1275/init.c
> index 2adf4fdfc0e7..cf4bcf2cfbf5 100644
> --- a/grub-core/kern/ieee1275/init.c
> +++ b/grub-core/kern/ieee1275/init.c
> @@ -200,11 +200,163 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
>     return 0;
>   }
>   
> +/*
> + * How much memory does OF believe it has? (regardless of whether
> + * it's accessible or not)
> + */
> +static grub_err_t
> +grub_ieee1275_total_mem (grub_uint64_t *total)
> +{
> +  grub_ieee1275_phandle_t root;
> +  grub_ieee1275_phandle_t memory;
> +  grub_uint32_t reg[4];
> +  grub_ssize_t reg_size;
> +  grub_uint32_t address_cells = 1;
> +  grub_uint32_t size_cells = 1;
> +  grub_uint64_t size;
> +
> +  /* If we fail to get to the end, report 0. */
> +  *total = 0;
> +
> +  /* Determine the format of each entry in `reg'.  */
> +  grub_ieee1275_finddevice ("/", &root);
> +  grub_ieee1275_get_integer_property (root, "#address-cells", &address_cells,
> +				      sizeof address_cells, 0);
> +  grub_ieee1275_get_integer_property (root, "#size-cells", &size_cells,
> +				      sizeof size_cells, 0);
> +
> +  if (size_cells > address_cells)
> +    address_cells = size_cells;
> +
> +  /* Load `/memory/reg'.  */
> +  if (grub_ieee1275_finddevice ("/memory", &memory))
> +    return grub_error (GRUB_ERR_UNKNOWN_DEVICE,
> +		       "couldn't find /memory node");
> +  if (grub_ieee1275_get_integer_property (memory, "reg", reg,
> +					  sizeof reg, &reg_size))
> +    return grub_error (GRUB_ERR_UNKNOWN_DEVICE,
> +		       "couldn't examine /memory/reg property");
> +  if (reg_size < 0 || (grub_size_t) reg_size > sizeof (reg))
> +    return grub_error (GRUB_ERR_UNKNOWN_DEVICE,
> +                       "/memory response buffer exceeded");
> +
> +  if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_BROKEN_ADDRESS_CELLS))
> +    {
> +      address_cells = 1;
> +      size_cells = 1;
> +    }
> +
> +  /* Decode only the size */
> +  size = reg[address_cells];
> +  if (size_cells == 2)
> +    size = (size << 32) | reg[address_cells + 1];
> +
> +  *total = size;
> +
> +  return grub_errno;
> +}
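
The size decoding at the end of that function can be tried in isolation. A minimal sketch, assuming the `reg` cells are already in host byte order and that the first (address, size) entry is the one of interest; `reg_first_size` is a hypothetical name, and the sketch leaves out the patch's adjustment that raises `address_cells` to `size_cells`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode the size part of the first (address, size) entry of a `reg'
 * property. The size starts right after the address cells and is either
 * one or two 32-bit cells wide.
 */
uint64_t
reg_first_size (const uint32_t *reg, uint32_t address_cells,
                uint32_t size_cells)
{
  uint64_t size;

  assert (size_cells == 1 || size_cells == 2);

  size = reg[address_cells];
  if (size_cells == 2)
    size = (size << 32) | reg[address_cells + 1];

  return size;
}
```

For example, with two address cells and two size cells, the cells `{ 0, 0, 0, 0x10000000 }` decode to a size of 0x10000000 bytes, i.e. 256MB.
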
> +
> +/* See PAPR or arch/powerpc/kernel/prom_init.c */
> +struct option_vector2 {
> +  grub_uint8_t byte1;
> +  grub_uint16_t reserved;
> +  grub_uint32_t real_base;
> +  grub_uint32_t real_size;
> +  grub_uint32_t virt_base;
> +  grub_uint32_t virt_size;
> +  grub_uint32_t load_base;
> +  grub_uint32_t min_rma;
> +  grub_uint32_t min_load;
> +  grub_uint8_t min_rma_percent;
> +  grub_uint8_t max_pft_size;
> +} __attribute__((packed));
> +
> +struct pvr_entry {
> +  grub_uint32_t mask;
> +  grub_uint32_t entry;
> +};
> +
> +struct cas_vector {
> +  struct {
> +    struct pvr_entry terminal;
> +  } pvr_list;
> +  grub_uint8_t num_vecs;
> +  grub_uint8_t vec1_size;
> +  grub_uint8_t vec1;
> +  grub_uint8_t vec2_size;
> +  struct option_vector2 vec2;
> +  grub_uint8_t vec3_size;
> +  grub_uint16_t vec3;
> +  grub_uint8_t vec4_size;
> +  grub_uint16_t vec4;
> +} __attribute__((packed));
> +
> +/*
> + * Call ibm,client-architecture-support to try to get more RMA.
> + * We ask for 512MB which should be enough to verify a distro kernel.
> + * We ignore most errors: if we don't succeed we'll proceed with whatever
> + * memory we have.
> + */
> +static void
> +grub_ieee1275_ibm_cas (void)
> +{
> +  int rc;
> +  grub_ieee1275_ihandle_t root;
> +  struct cas_args {
> +    struct grub_ieee1275_common_hdr common;
> +    grub_ieee1275_cell_t method;
> +    grub_ieee1275_ihandle_t ihandle;
> +    grub_ieee1275_cell_t cas_addr;
> +    grub_ieee1275_cell_t result;
> +  } args;
> +  struct cas_vector vector = {
> +    .pvr_list = { { 0x00000000, 0xffffffff } }, /* any processor */
> +    .num_vecs = 4 - 1,
> +    .vec1_size = 0,
> +    .vec1 = 0x80, /* ignore */
> +    .vec2_size = 1 + sizeof(struct option_vector2) - 2,
> +    .vec2 = {
> +      0, 0, -1, -1, -1, -1, -1, 512, -1, 0, 48
> +    },
> +    .vec3_size = 2 - 1,
> +    .vec3 = 0x00e0, /* ask for FP + VMX + DFP but don't halt if unsatisfied */
> +    .vec4_size = 2 - 1,
> +    .vec4 = 0x0001, /* set required minimum capacity % to the lowest value */
> +  };
> +
> +  INIT_IEEE1275_COMMON (&args.common, "call-method", 3, 2);
> +  args.method = (grub_ieee1275_cell_t)"ibm,client-architecture-support";
> +  rc = grub_ieee1275_open("/", &root);
> +  if (rc) {
> +	  grub_error (GRUB_ERR_IO, "could not open root when trying to call CAS");
> +	  return;
> +  }
> +  args.ihandle = root;
> +  args.cas_addr = (grub_ieee1275_cell_t)&vector;
> +
> +  grub_printf("Calling ibm,client-architecture-support from grub...");
> +  IEEE1275_CALL_ENTRY_FN (&args);
> +  grub_printf("done\n");
> +
> +  grub_ieee1275_close(root);
> +}
> +
>   static void
>   grub_claim_heap (void)
>   {
>     unsigned long total = 0;
>   
> +  if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_CAN_TRY_CAS_FOR_MORE_MEMORY))
> +    {
> +      grub_uint64_t rma_size;
> +      grub_err_t err;
> +
> +      err = grub_ieee1275_total_mem (&rma_size);
> +      /* if we have an error, don't call CAS, just hope for the best */
> +      if (!err && rma_size < (512 * 1024 * 1024))
> +	grub_ieee1275_ibm_cas();
> +    }
> +
>     grub_machine_mmap_iterate (heap_init, &total);
>   }
>   #endif
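
Taken together, the flag set in cmain.c and the check in grub_claim_heap amount to the following decision. This is a standalone sketch with a hypothetical `should_try_cas` helper; the patch itself splits the logic between a flag and the RMA size test, and the model strings used in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MIN_RMA_BYTES (512ULL * 1024 * 1024)

/*
 * Try CAS only if the root node's compatible property starts with "IBM,"
 * (taken as a sign of PowerVM) and the RMA is smaller than 512MB.
 */
bool
should_try_cas (const char *root_compatible, uint64_t rma_bytes)
{
  assert (root_compatible != NULL);

  if (strncmp (root_compatible, "IBM,", 4) != 0)
    return false;

  return rma_bytes < MIN_RMA_BYTES;
}
```

With a made-up compatible string such as "IBM,example", a 256MB RMA yields true and a 512MB RMA yields false; any string that does not begin with "IBM," yields false regardless of the RMA size.
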
> diff --git a/include/grub/ieee1275/ieee1275.h b/include/grub/ieee1275/ieee1275.h
> index f53228703bdb..b5c916d1de58 100644
> --- a/include/grub/ieee1275/ieee1275.h
> +++ b/include/grub/ieee1275/ieee1275.h
> @@ -128,6 +128,14 @@ enum grub_ieee1275_flag
>     GRUB_IEEE1275_FLAG_CURSORONOFF_ANSI_BROKEN,
>   
>     GRUB_IEEE1275_FLAG_RAW_DEVNAMES,
> +
> +  /*
> +   * On PFW, the first time we boot a Linux partition, we may only get 256MB of
> +   * real memory area, even if the partition has more memory. Set this flag if
> +   * we think we're running under PFW. Then, if this flag is set, and the RMA is
> +   * only 256MB in size, try asking for more with CAS.
> +   */
> +  GRUB_IEEE1275_FLAG_CAN_TRY_CAS_FOR_MORE_MEMORY,
>   };
>   
>   extern int EXPORT_FUNC(grub_ieee1275_test_flag) (enum grub_ieee1275_flag flag);

