linux-riscv.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Charlie Jenkins <charlie@rivosinc.com>
To: Maxim Kochetkov <fido_max@inbox.ru>
Cc: linux-riscv@lists.infradead.org, bigunclemax@gmail.com,
	Amma Lee <lixiaoyun@binary-semi.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Conor Dooley <conor.dooley@microchip.com>,
	Andrew Jones <ajones@ventanamicro.com>,
	Jisheng Zhang <jszhang@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/1] riscv: optimize ELF relocation function in riscv
Date: Mon, 11 Dec 2023 17:51:06 -0800	[thread overview]
Message-ID: <ZXe8imHSg20nKagf@ghost> (raw)
In-Reply-To: <ZXJrGLwxtjf6cK42@ghost>

On Thu, Dec 07, 2023 at 05:02:16PM -0800, Charlie Jenkins wrote:
> On Wed, Sep 13, 2023 at 04:05:00PM +0300, Maxim Kochetkov wrote:
> > The patch can optimize the running times of insmod command by modify ELF
> > relocation function.
> > In the 5.10 and latest kernel, when install the riscv ELF drivers which
> > contains multiple symbol table items to be relocated, kernel takes a lot
> > of time to execute the relocation. For example, we install a 3+MB driver
> > need 180+s.
> > We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
> > type items relocation function in the arch\riscv\kernel\module.c and
> > find that there are two-loops in the function. If we modify the begin
> > number in the second for-loops iteration, we could save significant time
> > for installation. We install the same 3+MB driver could just need 2s.
> > 
> > Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> > Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
> > ---
> > Changes in v4:
> > - use 'while' loop instead of 'for' loop to avoid code duplicate
> > ---
> >  arch/riscv/kernel/module.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index 7c651d55fcbd..8c9b644ebfdb 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -346,6 +346,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  	Elf_Sym *sym;
> >  	u32 *location;
> >  	unsigned int i, type;
> > +	unsigned int j_idx = 0;
> >  	Elf_Addr v;
> >  	int res;
> >  
> > @@ -384,9 +385,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  		v = sym->st_value + rel[i].r_addend;
> >  
> >  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> > -			unsigned int j;
> > +			unsigned int j = j_idx;
> > +			bool found = false;
> >  
> > -			for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> > +			do {
> >  				unsigned long hi20_loc =
> >  					sechdrs[sechdrs[relsec].sh_info].sh_addr
> >  					+ rel[j].r_offset;
> > @@ -415,16 +417,26 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  					hi20 = (offset + 0x800) & 0xfffff000;
> >  					lo12 = offset - hi20;
> >  					v = lo12;
> > +					found = true;
> >  
> >  					break;
> >  				}
> > -			}
> > -			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> > +
> > +				j++;
> > +				if (j > sechdrs[relsec].sh_size / sizeof(*rel))
> > +					j = 0;
> Very interesting algorithm here. Assuming the hi relocation is after the
> previous one seems to be a good heuristic. However I think we can do
> better. In GNU ld, a hashmap of all of the hi relocations is stored and
> a list of all of the lo relocations. After all of the other relocations
> have been parsed, it iterates through all of the lo relocations and
> looks up the associated hi relocation in the hashmap.
> 
> There is more memory overhead here but I suspect it will be faster. I
> had started to mock up a hashmap implementation to see if it was faster
> but decided I should mention it here first in case somebody had some
> additional insight. 

Turns out this is a fantastic heuristic. Using a hashmap is
significantly faster than the default implementation but this algorithm
above is significantly faster than the hashmap. Using the amdgpu driver
(which is actually a collection of drivers) and is a size of about 469M
I found that the hashmap implementation is about 30% faster than the
current implementation, but this patch is 50% faster than the current
implementation. It is probably possible to write an ELF header with the
relocations sufficiently scrambled to make the hashmap faster, but I
suspect that for all "normal" programs this algorithm is faster.

I also tried a couple other smaller modules and it was faster or around
the same as the hashmap in all of them.

A lot of code has changed in this file since this patch was submitted,
can you rebase onto 6.7-rc1? Otherwise this patch is great.

Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>

> 
> - Charlie
> 
> > +
> > +			} while (j_idx != j);
> > +
> > +			if (!found) {
> >  				pr_err(
> >  				  "%s: Can not find HI20 relocation information\n",
> >  				  me->name);
> >  				return -EINVAL;
> >  			}
> > +
> > +			/* Record the previous j-loop end index */
> > +			j_idx = j;
> >  		}
> >  
> >  		res = handler(me, location, v);
> > -- 
> > 2.40.1
> > 
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

      reply	other threads:[~2023-12-12  1:51 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-13 13:05 [PATCH v4 1/1] riscv: optimize ELF relocation function in riscv Maxim Kochetkov
2023-12-08  1:02 ` Charlie Jenkins
2023-12-12  1:51   ` Charlie Jenkins [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZXe8imHSg20nKagf@ghost \
    --to=charlie@rivosinc.com \
    --cc=ajones@ventanamicro.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=bigunclemax@gmail.com \
    --cc=conor.dooley@microchip.com \
    --cc=fido_max@inbox.ru \
    --cc=jszhang@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=lixiaoyun@binary-semi.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).