All of lore.kernel.org
 help / color / mirror / Atom feed
* Handling large allocations (bypassing mm?)
@ 2022-12-14 13:21 Julian Andres Klode
  2022-12-14 15:11 ` Daniel Kiper
  0 siblings, 1 reply; 6+ messages in thread
From: Julian Andres Klode @ 2022-12-14 13:21 UTC (permalink / raw)
  To: grub-devel; +Cc: Jeremy Szu

[-- Attachment #1: Type: text/plain, Size: 4710 bytes --]

Hi,

so I want to bring this discussion here that I had mostly with myself
in the past days on IRC.

As some of you know, we had a couple issues with large initrds in
Ubuntu, Jeremy posted a patch series earlier about mmunlimited.

I wanted to propose a more fine-grained approach, as well as a
more generic approach to handling large allocations.

The first issue one experiences when opening large initrds is
that grub_file_open() calls grub_verifier_open() which simply
grub_malloc()s a buffer for the size of the file.

Later, for initrd, we have to allocate it a second time, in
the upstream tree that happens via relocator, in the rhboot
tree it allocates directly from EFI.

Now my basic proposal is quite simple: We make grub_malloc()
and that relocator allocation code bypass the grub memory
management altogether and just do raw EFI page allocations
(provide two function pointers grub_mm_allocate_pages and
grub_mm_free_pages, and just call them if allocation size
is large[1]). e.g. at the start of grub_malloc:

    if (len > @100 pages@ && grub_mm_allocate_pages != NULL) {
        ret = grub_mm_allocate_pages_below(@4GB@, ..., ROUND_TO_PAGES(size));
        if (ret == NULL)
            ret = grub_mm_allocate_pages_below(@infinity@, ..., ROUND_TO_PAGES(size));
        return ret;
    }

Allocating those below 4GB and only falling back to >4GB when we
run out of space allows us to avoid most issues where DMA fails
above 4GB.

But then we also patch grub_file_read() to check if the target buffer
is located above 4GB and if so, use bounce buffers to copy
data so that we avoid even more of those issues, so we add to
the start of it something like:

    if ((grub_addr_t) buf > @4GB@) {
        return read_bufferedfile, buf. len)
    }

where grub_file_read_with_buffer is like in rhboot's EFI loader:

    #define BOUNCE_BUFFER_MAX 0x1000000ull

    static grub_ssize_t
    read_buffered(grub_file_t file, grub_uint8_t *bufp, grub_size_t len)
    {
      grub_ssize_t bufpos = 0;
      static grub_size_t bbufsz = 0;
      static char *bbuf = NULL;

      if (bbufsz == 0)
        bbufsz = MIN(BOUNCE_BUFFER_MAX, len);

      while (!bbuf && bbufsz)
        {
          bbuf = grub_malloc(bbufsz);
          if (!bbuf)
        bbufsz >>= 1;
        }
      if (!bbuf)
        grub_error (GRUB_ERR_OUT_OF_MEMORY, N_("cannot allocate bounce buffer"));

      while (bufpos < (long long)len)
        {
          grub_ssize_t sz;

          sz = grub_file_read (file, bbuf, MIN(bbufsz, len - bufpos));
          if (sz < 0)
        return sz;
          if (sz == 0)
        break;

          grub_memcpy(bufp + bufpos, bbuf, sz);
          bufpos += sz;
        }

      return bufpos;
    }


Now we still end up allocating each file twice, but we allocate
and release the verifier copy to the EFI system. This means that
we allocate a lot less regions and have outsourced the problem
of releasing the memory after it's been used to the firmware :)

Of course ultimately we would want to avoid the double
allocation altogether, so it might make sense to provide
a way to directly allocate the buffer we need, such as:

void * (*grub_allocator)(size_t bytes);

grub_file_t grub_file_open_alloc(const char *name, 
                                 enum grub_file_type type,
                                 grub_allocator allocator);

or a function that simply reads a file at a path into a buffer:

void *grub_file_open_read_close(const char *name, 
                                enum grub_file_type type,
                                grub_allocator allocator);

The latter simply allocates the buffer by calling allocator,
reads into it, then verifies the content using verifier
framework before returning it.

So if we want to load an initrd, we write a function that
allocates an initrd using whatever policies the kernel needs
there, and then do

initrd_buf = grub_file_open_read_close(path, 
                GRUB_FILE_TYPE_LINUX_INITRD | GRUB_FILE_TYPE_NO_DECOMPRESS,
                initrd_alloc);

and then we're done and don't need to allocate and read
each file twice.

But that seems like a 2nd step that's a bit more complex
than bypassing the MM for large allocations and using bounce
buffers for >4GB targets in grub_file_read().

[1] What is large? Perhaps it's just 100 pages, perhaps it's
    4 MB. It depends on how different the performance is for
    the EFI call round trip vs doing it in our mm.

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handling large allocations (bypassing mm?)
  2022-12-14 13:21 Handling large allocations (bypassing mm?) Julian Andres Klode
@ 2022-12-14 15:11 ` Daniel Kiper
  2022-12-15 14:28   ` Julian Andres Klode
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Kiper @ 2022-12-14 15:11 UTC (permalink / raw)
  To: Julian Andres Klode
  Cc: grub-devel, Jeremy Szu, ardb, arei, heinrich.schuchardt,
	ilias.apalodimas, quic_llindhol, tobias.powalowski

Adding a few folks who may be interested in this discussion too...

On Wed, Dec 14, 2022 at 02:21:49PM +0100, Julian Andres Klode wrote:
> Hi,
>
> so I want to bring this discussion here that I had mostly with myself
> in the past days on IRC.
>
> As some of you know, we had a couple issues with large initrds in
> Ubuntu, Jeremy posted a patch series earlier about mmunlimited.

I skimmed through it but at first sight I do not like it. I will comment
the patches more extensively in the following days.

> I wanted to propose a more fine-grained approach, as well as a
> more generic approach to handling large allocations.
>
> The first issue one experiences when opening large initrds is
> that grub_file_open() calls grub_verifier_open() which simply
> grub_malloc()s a buffer for the size of the file.
>
> Later, for initrd, we have to allocate it a second time, in
> the upstream tree that happens via relocator, in the rhboot
> tree it allocates directly from EFI.
>
> Now my basic proposal is quite simple: We make grub_malloc()
> and that relocator allocation code bypass the grub memory
> management altogether and just do raw EFI page allocations
> (provide two function pointers grub_mm_allocate_pages and
> grub_mm_free_pages, and just call them if allocation size
> is large[1]). e.g. at the start of grub_malloc:

In general I think we should stop using legacy Linux loader on UEFI
platforms and migrate to LoadFile2 protocol. Relevant patches for ARM64
are in master now. We are waiting for patches from Atish which will
merge ARM64 implementation into RISC-V (you can find earlier version
of the patch set here [1]). Then it could be extended into other
architectures including AMD64. Of course this will not solve problem
with extra allocation in verifiers. However, at least we should not
have problems with large initrds and we will not deal with relocator
on EFI platforms.

WRT verifiers extra allocation, as you said, this is a problem but it can
be solved later. Now at least we should be sure it is possible to load
big initrds with LoadFile2 protocol and there are no obstacles in the
GRUB code for this thing.

Daniel

[1] https://lists.gnu.org/archive/html/grub-devel/2022-11/msg00018.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handling large allocations (bypassing mm?)
  2022-12-14 15:11 ` Daniel Kiper
@ 2022-12-15 14:28   ` Julian Andres Klode
  2022-12-15 17:35     ` Ard Biesheuvel
  0 siblings, 1 reply; 6+ messages in thread
From: Julian Andres Klode @ 2022-12-15 14:28 UTC (permalink / raw)
  To: Daniel Kiper
  Cc: grub-devel, Jeremy Szu, ardb, arei, heinrich.schuchardt,
	ilias.apalodimas, quic_llindhol, tobias.powalowski

On Wed, Dec 14, 2022 at 04:11:18PM +0100, Daniel Kiper wrote:
> Adding a few folks who may be interested in this discussion too...
> 
> On Wed, Dec 14, 2022 at 02:21:49PM +0100, Julian Andres Klode wrote:
> > Hi,
> >
> > so I want to bring this discussion here that I had mostly with myself
> > in the past days on IRC.
> >
> > As some of you know, we had a couple issues with large initrds in
> > Ubuntu, Jeremy posted a patch series earlier about mmunlimited.
> 
> I skimmed through it but at first sight I do not like it. I will comment
> the patches more extensively in the following days.
> 
> > I wanted to propose a more fine-grained approach, as well as a
> > more generic approach to handling large allocations.
> >
> > The first issue one experiences when opening large initrds is
> > that grub_file_open() calls grub_verifier_open() which simply
> > grub_malloc()s a buffer for the size of the file.
> >
> > Later, for initrd, we have to allocate it a second time, in
> > the upstream tree that happens via relocator, in the rhboot
> > tree it allocates directly from EFI.
> >
> > Now my basic proposal is quite simple: We make grub_malloc()
> > and that relocator allocation code bypass the grub memory
> > management altogether and just do raw EFI page allocations
> > (provide two function pointers grub_mm_allocate_pages and
> > grub_mm_free_pages, and just call them if allocation size
> > is large[1]). e.g. at the start of grub_malloc:
> 
> In general I think we should stop using legacy Linux loader on UEFI
> platforms and migrate to LoadFile2 protocol. Relevant patches for ARM64
> are in master now. We are waiting for patches from Atish which will
> merge ARM64 implementation into RISC-V (you can find earlier version
> of the patch set here [1]). Then it could be extended into other
> architectures including AMD64. Of course this will not solve problem
> with extra allocation in verifiers. However, at least we should not
> have problems with large initrds and we will not deal with relocator
> on EFI platforms.
> 
> WRT verifiers extra allocation, as you said, this is a problem but it can
> be solved later. Now at least we should be sure it is possible to load
> big initrds with LoadFile2 protocol and there are no obstacles in the
> GRUB code for this thing.

To my knowledge, that is incorrect and LoadFile2 suppoort does nothing
to address the issue of large initrds:

- LoadFile2 support for the kernel has only been added very recently
  (2020) and people need to support 10 year old kernels (2012).

  How large initrds do they need? I don't know. Do I want them
  stranded? Not necessarily.

- We do not need to read the initrd and keep it in memory ahead of
  time presumably but at the moment we reuse the allocation paths
  and just copy them over afaik.

I do not know how LoadFile2 is implemented on the kernel side, but
it may reinforce the very issue because any allocations on the grub
side can't be reused, and grub may be unable to reuse the initrd
allocation by the verifier for the loader by relocator, so you
end up needing to allocate 3*initrd (2 in grub, and 1 in the
kernel).

By directly allocating pages for the verifier and releasing
them after it's done we can effectively go down to a memory
usage of 2*initrd size, and *both* can be allocated > 4GB,
so they can be large.

-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handling large allocations (bypassing mm?)
  2022-12-15 14:28   ` Julian Andres Klode
@ 2022-12-15 17:35     ` Ard Biesheuvel
  2022-12-16 17:55       ` Robbie Harwood
  0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel @ 2022-12-15 17:35 UTC (permalink / raw)
  To: Julian Andres Klode
  Cc: Daniel Kiper, grub-devel, Jeremy Szu, arei, heinrich.schuchardt,
	ilias.apalodimas, quic_llindhol, tobias.powalowski

On Thu, 15 Dec 2022 at 15:29, Julian Andres Klode
<julian.klode@canonical.com> wrote:
>
> On Wed, Dec 14, 2022 at 04:11:18PM +0100, Daniel Kiper wrote:
> > Adding a few folks who may be interested in this discussion too...
> >
> > On Wed, Dec 14, 2022 at 02:21:49PM +0100, Julian Andres Klode wrote:
> > > Hi,
> > >
> > > so I want to bring this discussion here that I had mostly with myself
> > > in the past days on IRC.
> > >
> > > As some of you know, we had a couple issues with large initrds in
> > > Ubuntu, Jeremy posted a patch series earlier about mmunlimited.
> >
> > I skimmed through it but at first sight I do not like it. I will comment
> > the patches more extensively in the following days.
> >
> > > I wanted to propose a more fine-grained approach, as well as a
> > > more generic approach to handling large allocations.
> > >
> > > The first issue one experiences when opening large initrds is
> > > that grub_file_open() calls grub_verifier_open() which simply
> > > grub_malloc()s a buffer for the size of the file.
> > >
> > > Later, for initrd, we have to allocate it a second time, in
> > > the upstream tree that happens via relocator, in the rhboot
> > > tree it allocates directly from EFI.
> > >
> > > Now my basic proposal is quite simple: We make grub_malloc()
> > > and that relocator allocation code bypass the grub memory
> > > management altogether and just do raw EFI page allocations
> > > (provide two function pointers grub_mm_allocate_pages and
> > > grub_mm_free_pages, and just call them if allocation size
> > > is large[1]). e.g. at the start of grub_malloc:
> >
> > In general I think we should stop using legacy Linux loader on UEFI
> > platforms and migrate to LoadFile2 protocol. Relevant patches for ARM64
> > are in master now. We are waiting for patches from Atish which will
> > merge ARM64 implementation into RISC-V (you can find earlier version
> > of the patch set here [1]). Then it could be extended into other
> > architectures including AMD64. Of course this will not solve problem
> > with extra allocation in verifiers. However, at least we should not
> > have problems with large initrds and we will not deal with relocator
> > on EFI platforms.
> >
> > WRT verifiers extra allocation, as you said, this is a problem but it can
> > be solved later. Now at least we should be sure it is possible to load
> > big initrds with LoadFile2 protocol and there are no obstacles in the
> > GRUB code for this thing.
>
> To my knowledge, that is incorrect and LoadFile2 suppoort does nothing
> to address the issue of large initrds:
>

It does.

You should only need 1x the space for the initrd with LoadFile2(), and
given that it is the EFI stub itself that performs the allocation,
whether or not it is above 4 GB or meets some other arch-specific
requirement is completely immaterial to GRUB.

The initrd data will only be loaded once, when the EFI stub asks for
it, and if it doesn't, GRUB will never access the file (or more
accurately, never call grub_file_read() on it).

For verification and other purposes, GRUB is free to authenticate,
decompress or do other things to the payload after loading it into the
buffer but before returning from the callback, and if needed, it can
wipe the buffer again and return an error code to the EFI stub, which
will cause it to give up and return back to GRUB (provided that you
are not using the terminally broken EFI handover protocol)

As for supporing kernels from 2012: I don't see why upstream GRUB
should care about that. If your distro fork supports those today, you
will simply need to carry those patches out of tree a bit longer.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handling large allocations (bypassing mm?)
  2022-12-15 17:35     ` Ard Biesheuvel
@ 2022-12-16 17:55       ` Robbie Harwood
  2022-12-22 13:14         ` Ard Biesheuvel
  0 siblings, 1 reply; 6+ messages in thread
From: Robbie Harwood @ 2022-12-16 17:55 UTC (permalink / raw)
  To: Ard Biesheuvel, Julian Andres Klode
  Cc: Daniel Kiper, grub-devel, Jeremy Szu, arei, heinrich.schuchardt,
	ilias.apalodimas, quic_llindhol, tobias.powalowski

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

Ard Biesheuvel <ardb@kernel.org> writes:

> As for supporing kernels from 2012: I don't see why upstream GRUB
> should care about that. If your distro fork supports those today, you
> will simply need to carry those patches out of tree a bit longer.

No, it's not a question of distros supporting themselves like this.  For
better or worse, people expect to be able to install many OSs on their
drive and have any grub be able to boot any of them.

One such use case I've seen is hardware testing: someone will install
all the operating systems they care about on a drive, then plug it in to
the machine with the hardware and try all of them in sequence.  And of
course there're always hobbyists who just think it's fun to do things
like that - they file bugs too :)

In any case, it's not a little bit of time we're talking about here -
even though RHEL 6 is only guaranteed into 2024 right now, RHEL 8 is
slated to be here until at least 2031.  (Source:
https://access.redhat.com/support/policy/updates/errata/ )

Be well,
--Robbie

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handling large allocations (bypassing mm?)
  2022-12-16 17:55       ` Robbie Harwood
@ 2022-12-22 13:14         ` Ard Biesheuvel
  0 siblings, 0 replies; 6+ messages in thread
From: Ard Biesheuvel @ 2022-12-22 13:14 UTC (permalink / raw)
  To: Robbie Harwood
  Cc: Julian Andres Klode, Daniel Kiper, grub-devel, Jeremy Szu, arei,
	heinrich.schuchardt, ilias.apalodimas, quic_llindhol,
	tobias.powalowski

On Fri, 16 Dec 2022 at 18:55, Robbie Harwood <rharwood@redhat.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> writes:
>
> > As for supporing kernels from 2012: I don't see why upstream GRUB
> > should care about that. If your distro fork supports those today, you
> > will simply need to carry those patches out of tree a bit longer.
>
> No, it's not a question of distros supporting themselves like this.  For
> better or worse, people expect to be able to install many OSs on their
> drive and have any grub be able to boot any of them.
>
> One such use case I've seen is hardware testing: someone will install
> all the operating systems they care about on a drive, then plug it in to
> the machine with the hardware and try all of them in sequence.  And of
> course there're always hobbyists who just think it's fun to do things
> like that - they file bugs too :)
>
> In any case, it's not a little bit of time we're talking about here -
> even though RHEL 6 is only guaranteed into 2024 right now, RHEL 8 is
> slated to be here until at least 2031.  (Source:
> https://access.redhat.com/support/policy/updates/errata/ )
>

Fair enough.

So are you saying that all downstream GRUBs are mutually compatible,
and can boot other distro kernel images, and it is simply mainline
GRUB that needs to be modified to incorporate those existing
out-of-tree changes?

My stake in this is mainly from the opposite side, i.e., avoiding
pointless per-arch deviations when doing EFI boot from any bootloader.
Upstreaming the EFI handover protocol and other x86 quirks into GRUB
while we already have all the pieces we need to make EFI boot almost
entirely arch-agnostic seems like a mistake to me. And given that the
x86 EFI handover protocol is not in mainline GRUB at the moment, I
wonder how this all fits with your (plural) assertion that older
kernels must be supported by mainline GRUB, as it doesn't support any
of those today.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-12-22 13:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-14 13:21 Handling large allocations (bypassing mm?) Julian Andres Klode
2022-12-14 15:11 ` Daniel Kiper
2022-12-15 14:28   ` Julian Andres Klode
2022-12-15 17:35     ` Ard Biesheuvel
2022-12-16 17:55       ` Robbie Harwood
2022-12-22 13:14         ` Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.