From: Charlie Jenkins <charlie@rivosinc.com>
To: Maxim Kochetkov <fido_max@inbox.ru>
Cc: linux-riscv@lists.infradead.org, bigunclemax@gmail.com,
Amma Lee <lixiaoyun@binary-semi.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Conor Dooley <conor.dooley@microchip.com>,
Andrew Jones <ajones@ventanamicro.com>,
Jisheng Zhang <jszhang@kernel.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/1] riscv: optimize ELF relocation function in riscv
Date: Mon, 11 Dec 2023 17:51:06 -0800 [thread overview]
Message-ID: <ZXe8imHSg20nKagf@ghost> (raw)
In-Reply-To: <ZXJrGLwxtjf6cK42@ghost>
On Thu, Dec 07, 2023 at 05:02:16PM -0800, Charlie Jenkins wrote:
> On Wed, Sep 13, 2023 at 04:05:00PM +0300, Maxim Kochetkov wrote:
> > The patch can optimize the running times of insmod command by modify ELF
> > relocation function.
> > In the 5.10 and latest kernel, when install the riscv ELF drivers which
> > contains multiple symbol table items to be relocated, kernel takes a lot
> > of time to execute the relocation. For example, we install a 3+MB driver
> > need 180+s.
> > We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
> > type items relocation function in the arch\riscv\kernel\module.c and
> > find that there are two-loops in the function. If we modify the begin
> > number in the second for-loops iteration, we could save significant time
> > for installation. We install the same 3+MB driver could just need 2s.
> >
> > Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> > Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
> > ---
> > Changes in v4:
> > - use 'while' loop instead of 'for' loop to avoid code duplicate
> > ---
> > arch/riscv/kernel/module.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index 7c651d55fcbd..8c9b644ebfdb 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -346,6 +346,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > Elf_Sym *sym;
> > u32 *location;
> > unsigned int i, type;
> > + unsigned int j_idx = 0;
> > Elf_Addr v;
> > int res;
> >
> > @@ -384,9 +385,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > v = sym->st_value + rel[i].r_addend;
> >
> > if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> > - unsigned int j;
> > + unsigned int j = j_idx;
> > + bool found = false;
> >
> > - for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> > + do {
> > unsigned long hi20_loc =
> > sechdrs[sechdrs[relsec].sh_info].sh_addr
> > + rel[j].r_offset;
> > @@ -415,16 +417,26 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > hi20 = (offset + 0x800) & 0xfffff000;
> > lo12 = offset - hi20;
> > v = lo12;
> > + found = true;
> >
> > break;
> > }
> > - }
> > - if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> > +
> > + j++;
> > + if (j > sechdrs[relsec].sh_size / sizeof(*rel))
> > + j = 0;
> Very interesting algorithm here. Assuming the hi relocation is after the
> previous one seems to be a good heuristic. However I think we can do
> better. In GNU ld, a hashmap of all of the hi relocations is stored and
> a list of all of the lo relocations. After all of the other relocations
> have been parsed, it iterates through all of the lo relocations and
> looks up the associated hi relocation in the hashmap.
>
> There is more memory overhead here but I suspect it will be faster. I
> had started to mock up a hashmap implementation to see if it was faster
> but decided I should mention it here first in case somebody had some
> additional insight.
Turns out this is a fantastic heuristic. Using a hashmap is
significantly faster than the default implementation but this algorithm
above is significantly faster than the hashmap. Using the amdgpu driver
(which is actually a collection of drivers) and is a size of about 469M
I found that the hashmap implementation is about 30% faster than the
current implementation, but this patch is 50% faster than the current
implementation. It is probably possible to write an ELF header with the
relocations sufficiently scrambled to make the hashmap faster, but I
suspect that for all "normal" programs this algorithm is faster.
I also tried a couple other smaller modules and it was faster or around
the same as the hashmap in all of them.
A lot of code has changed in this file since this patch was submitted,
can you rebase onto 6.7-rc1? Otherwise this patch is great.
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
>
> - Charlie
>
> > +
> > + } while (j_idx != j);
> > +
> > + if (!found) {
> > pr_err(
> > "%s: Can not find HI20 relocation information\n",
> > me->name);
> > return -EINVAL;
> > }
> > +
> > + /* Record the previous j-loop end index */
> > + j_idx = j;
> > }
> >
> > res = handler(me, location, v);
> > --
> > 2.40.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
prev parent reply other threads:[~2023-12-12 1:51 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-13 13:05 [PATCH v4 1/1] riscv: optimize ELF relocation function in riscv Maxim Kochetkov
2023-12-08 1:02 ` Charlie Jenkins
2023-12-12 1:51 ` Charlie Jenkins [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXe8imHSg20nKagf@ghost \
--to=charlie@rivosinc.com \
--cc=ajones@ventanamicro.com \
--cc=aou@eecs.berkeley.edu \
--cc=bigunclemax@gmail.com \
--cc=conor.dooley@microchip.com \
--cc=fido_max@inbox.ru \
--cc=jszhang@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=lixiaoyun@binary-semi.com \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).