* [PATCH] x86: Intel microcode loader performance improvement
@ 2010-03-05 17:42 Dimitri Sivanich
2010-03-08 10:33 ` Dmitry Adamushko
2010-03-11 14:39 ` [tip:x86/microcode] x86: Improve Intel microcode loader performance tip-bot for Dimitri Sivanich
0 siblings, 2 replies; 5+ messages in thread
From: Dimitri Sivanich @ 2010-03-05 17:42 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Dmitry Adamushko
We've noticed that on large SGI UV system configurations, running
microcode.ctl can take very long periods of time. This is due to
the large number of vmalloc/vfree calls made by the Intel
generic_load_microcode() logic.

By reusing allocated space, the following patch reduces the time
to run microcode.ctl on a 1024 cpu system from approximately 80
seconds down to 1 or 2 seconds.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
arch/x86/kernel/microcode_intel.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
Index: linux/arch/x86/kernel/microcode_intel.c
===================================================================
--- linux.orig/arch/x86/kernel/microcode_intel.c
+++ linux/arch/x86/kernel/microcode_intel.c
@@ -343,10 +343,11 @@ static enum ucode_state generic_load_mic
 				int (*get_ucode_data)(void *, const void *, size_t))
 {
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-	u8 *ucode_ptr = data, *new_mc = NULL, *mc;
+	u8 *ucode_ptr = data, *new_mc = NULL, *mc = NULL;
 	int new_rev = uci->cpu_sig.rev;
 	unsigned int leftover = size;
 	enum ucode_state state = UCODE_OK;
+	unsigned int curr_mc_size = 0;
 
 	while (leftover) {
 		struct microcode_header_intel mc_header;
@@ -361,9 +362,15 @@ static enum ucode_state generic_load_mic
 			break;
 		}
 
-		mc = vmalloc(mc_size);
-		if (!mc)
-			break;
+		/* For performance reasons, reuse mc area when possible */
+		if (!mc || mc_size > curr_mc_size) {
+			if (mc)
+				vfree(mc);
+			mc = vmalloc(mc_size);
+			if (!mc)
+				break;
+			curr_mc_size = mc_size;
+		}
 
 		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
 		    microcode_sanity_check(mc) < 0) {
@@ -376,13 +383,16 @@ static enum ucode_state generic_load_mic
 				vfree(new_mc);
 			new_rev = mc_header.rev;
 			new_mc = mc;
-		} else
-			vfree(mc);
+			mc = NULL;	/* trigger new vmalloc */
+		}
 
 		ucode_ptr += mc_size;
 		leftover -= mc_size;
 	}
 
+	if (mc)
+		vfree(mc);
+
 	if (leftover) {
 		if (new_mc)
 			vfree(new_mc);
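
The reuse pattern in the patch above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code: it assumes malloc()/free() in place of vmalloc()/vfree(), and the names struct scratch, scratch_get(), scratch_take() and scratch_release() are hypothetical helpers invented for this illustration. The allocs counter exists only to make the saved allocations visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical scratch buffer that is reallocated only when it must grow. */
struct scratch {
	void *buf;		/* current allocation, or NULL */
	size_t cap;		/* capacity of buf in bytes */
	unsigned int allocs;	/* number of malloc() calls, for illustration */
};

/* Return a buffer of at least `size` bytes, reusing the old one if it fits. */
static void *scratch_get(struct scratch *s, size_t size)
{
	assert(s != NULL);

	if (!s->buf || size > s->cap) {
		free(s->buf);		/* free(NULL) is a no-op */
		s->buf = malloc(size);
		if (!s->buf) {
			s->cap = 0;
			return NULL;
		}
		s->cap = size;
		s->allocs++;
	}
	return s->buf;
}

/*
 * Hand ownership of the buffer to the caller. This mirrors the
 * "new_mc = mc; mc = NULL" step: the next scratch_get() must allocate.
 */
static void *scratch_take(struct scratch *s)
{
	void *p = s->buf;

	s->buf = NULL;
	s->cap = 0;
	return p;
}

/* Free whatever is still held, like the final "if (mc) vfree(mc);". */
static void scratch_release(struct scratch *s)
{
	free(s->buf);
	s->buf = NULL;
	s->cap = 0;
}
```

In this sketch, a request for 100 bytes followed by a request for 50 bytes performs a single allocation. A new allocation happens only when a request exceeds the current capacity or after scratch_take() has handed the buffer away.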
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] x86: Intel microcode loader performance improvement
2010-03-05 17:42 [PATCH] x86: Intel microcode loader performance improvement Dimitri Sivanich
@ 2010-03-08 10:33 ` Dmitry Adamushko
2010-03-08 11:23 ` Avi Kivity
2010-03-08 20:37 ` Bill Davidsen
2010-03-11 14:39 ` [tip:x86/microcode] x86: Improve Intel microcode loader performance tip-bot for Dimitri Sivanich
1 sibling, 2 replies; 5+ messages in thread
From: Dmitry Adamushko @ 2010-03-08 10:33 UTC (permalink / raw)
To: Dimitri Sivanich; +Cc: linux-kernel, Ingo Molnar

On 5 March 2010 18:42, Dimitri Sivanich <sivanich@sgi.com> wrote:
> We've noticed that on large SGI UV system configurations, running
> microcode.ctl can take very long periods of time. This is due to
> the large number of vmalloc/vfree calls made by the Intel
> generic_load_microcode() logic.
>
> By reusing allocated space, the following patch reduces the time
> to run microcode.ctl on a 1024 cpu system from approximately 80
> seconds down to 1 or 2 seconds.
>
> Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>

This approach seems reasonable in the scope of the current framework.

Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

However, I think a better approach would be to have some kind of
shared storage for loaded microcode updates. Given that for the
majority of SMP systems all the cpus are normally updated to the very
same new instance of microcode, it should be enough to do a search for
the first cpu, cache the instance of microcode and then reuse it for
others.

-- Dmitry

^ permalink raw reply [flat|nested] 5+ messages in thread
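
The shared-storage idea suggested above might look roughly like the following. This is only a sketch under stated assumptions, not an existing kernel interface: struct cpu_sig_key, struct ucode_cache and the ucode_cache_*() functions are hypothetical names, the signature match is simplified to exact equality on two fields, and the cache does not own or free the images it points to. Keying the cache on the CPU signature is one way to accommodate the mixed-stepping systems raised later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UCODE_CACHE_SLOTS 4	/* assumption: only a few distinct steppings */

/* Hypothetical, simplified stand-in for a CPU signature. */
struct cpu_sig_key {
	unsigned int sig;
	unsigned int pf;
};

struct ucode_cache_entry {
	struct cpu_sig_key key;
	int rev;		/* revision of the cached image */
	const void *mc;		/* cached image, not owned by the cache here */
	int used;
};

struct ucode_cache {
	struct ucode_cache_entry slot[UCODE_CACHE_SLOTS];
};

static void ucode_cache_init(struct ucode_cache *c)
{
	assert(c != NULL);
	memset(c, 0, sizeof(*c));
}

static int key_equal(struct cpu_sig_key a, struct cpu_sig_key b)
{
	return a.sig == b.sig && a.pf == b.pf;
}

/* Return a cached image newer than cur_rev for this signature, or NULL. */
static const struct ucode_cache_entry *
ucode_cache_lookup(const struct ucode_cache *c, struct cpu_sig_key key,
		   int cur_rev)
{
	for (size_t i = 0; i < UCODE_CACHE_SLOTS; i++) {
		const struct ucode_cache_entry *e = &c->slot[i];

		if (e->used && key_equal(e->key, key) && e->rev > cur_rev)
			return e;
	}
	return NULL;
}

/* Remember the image found for the first CPU. Returns 0, or -1 if full. */
static int ucode_cache_insert(struct ucode_cache *c, struct cpu_sig_key key,
			      int rev, const void *mc)
{
	struct ucode_cache_entry *free_slot = NULL;

	for (size_t i = 0; i < UCODE_CACHE_SLOTS; i++) {
		struct ucode_cache_entry *e = &c->slot[i];

		if (!e->used) {
			if (!free_slot)
				free_slot = e;
			continue;
		}
		if (key_equal(e->key, key)) {
			if (rev > e->rev) {	/* keep only the newest image */
				e->rev = rev;
				e->mc = mc;
			}
			return 0;
		}
	}
	if (!free_slot)
		return -1;

	free_slot->key = key;
	free_slot->rev = rev;
	free_slot->mc = mc;
	free_slot->used = 1;
	return 0;
}
```

With such a cache, the full search through the microcode data would run for the first CPU of each signature only; later CPUs with the same signature and an older revision would get a hit from ucode_cache_lookup() and skip the scan.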
* Re: [PATCH] x86: Intel microcode loader performance improvement
2010-03-08 10:33 ` Dmitry Adamushko
@ 2010-03-08 11:23 ` Avi Kivity
2010-03-08 20:37 ` Bill Davidsen
1 sibling, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2010-03-08 11:23 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Dimitri Sivanich, linux-kernel, Ingo Molnar

On 03/08/2010 12:33 PM, Dmitry Adamushko wrote:
> On 5 March 2010 18:42, Dimitri Sivanich<sivanich@sgi.com> wrote:
>
>> We've noticed that on large SGI UV system configurations, running
>> microcode.ctl can take very long periods of time. This is due to
>> the large number of vmalloc/vfree calls made by the Intel
>> generic_load_microcode() logic.
>>
>> By reusing allocated space, the following patch reduces the time
>> to run microcode.ctl on a 1024 cpu system from approximately 80
>> seconds down to 1 or 2 seconds.
>>
>> Signed-off-by: Dimitri Sivanich<sivanich@sgi.com>
>>
> This approach seems reasonable in the scope of the current framework.
>
> Acked-by: Dmitry Adamushko<dmitry.adamushko@gmail.com>
>
> However, I think a better approach would be to have some kind of
> shared storage for loaded microcode updates. Given that for the
> majority of SMP systems all the cpus are normally updated to the very
> same new instance of microcode, it should be enough to do a search for
> the first cpu, cache the instance of microcode and then reuse it for
> others.
>
>

And/or update processors in parallel.

--
error compiling committee.c: too many arguments to function

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] x86: Intel microcode loader performance improvement
2010-03-08 10:33 ` Dmitry Adamushko
2010-03-08 11:23 ` Avi Kivity
@ 2010-03-08 20:37 ` Bill Davidsen
1 sibling, 0 replies; 5+ messages in thread
From: Bill Davidsen @ 2010-03-08 20:37 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Dimitri Sivanich, linux-kernel, Ingo Molnar

Dmitry Adamushko wrote:
> On 5 March 2010 18:42, Dimitri Sivanich <sivanich@sgi.com> wrote:
>> We've noticed that on large SGI UV system configurations, running
>> microcode.ctl can take very long periods of time. This is due to
>> the large number of vmalloc/vfree calls made by the Intel
>> generic_load_microcode() logic.
>>
>> By reusing allocated space, the following patch reduces the time
>> to run microcode.ctl on a 1024 cpu system from approximately 80
>> seconds down to 1 or 2 seconds.
>>
>> Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
>
> This approach seems reasonable in the scope of the current framework.
>
> Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
>
> However, I think a better approach would be to have some kind of
> shared storage for loaded microcode updates. Given that for the
> majority of SMP systems all the cpus are normally updated to the very
> same new instance of microcode, it should be enough to do a search for
> the first cpu, cache the instance of microcode and then reuse it for
> others.
>

The assumption that all CPUs are the same is not always true in
practice, people buy a system and don't always fully populate
initially, and when they add processors, they have a more recent
stepping.

So reusing microcode or updating in parallel would add complexity, and
2 sec for 1024 CPUs puts a pretty low upper bound on possible
improvement. Does more improvement to a one time small delay justify
additional complexity? Systems that size are probably not booted all
that often. Something to consider before putting a lot of effort into
it, I think.

--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

^ permalink raw reply [flat|nested] 5+ messages in thread
* [tip:x86/microcode] x86: Improve Intel microcode loader performance
2010-03-05 17:42 [PATCH] x86: Intel microcode loader performance improvement Dimitri Sivanich
2010-03-08 10:33 ` Dmitry Adamushko
@ 2010-03-11 14:39 ` tip-bot for Dimitri Sivanich
1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Dimitri Sivanich @ 2010-03-11 14:39 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, sivanich, davidsen, dmitry.adamushko, tglx, mingo, avi

Commit-ID: 938179b4f8cf8a4f11234ebf2dff2eb48400acfe
Gitweb: http://git.kernel.org/tip/938179b4f8cf8a4f11234ebf2dff2eb48400acfe
Author: Dimitri Sivanich <sivanich@sgi.com>
AuthorDate: Fri, 5 Mar 2010 11:42:03 -0600
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 13:49:06 +0100

x86: Improve Intel microcode loader performance

We've noticed that on large SGI UV system configurations, running
microcode.ctl can take very long periods of time. This is due to
the large number of vmalloc/vfree calls made by the Intel
generic_load_microcode() logic.

By reusing allocated space, the following patch reduces the time
to run microcode.ctl on a 1024 cpu system from approximately 80
seconds down to 1 or 2 seconds.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Bill Davidsen <davidsen@tmr.com>
LKML-Reference: <20100305174203.GA19638@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/microcode_intel.c | 22 ++++++++++++++++------
1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index 85a343e..3561702 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -343,10 +343,11 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 				int (*get_ucode_data)(void *, const void *, size_t))
 {
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-	u8 *ucode_ptr = data, *new_mc = NULL, *mc;
+	u8 *ucode_ptr = data, *new_mc = NULL, *mc = NULL;
 	int new_rev = uci->cpu_sig.rev;
 	unsigned int leftover = size;
 	enum ucode_state state = UCODE_OK;
+	unsigned int curr_mc_size = 0;
 
 	while (leftover) {
 		struct microcode_header_intel mc_header;
@@ -361,9 +362,15 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 			break;
 		}
 
-		mc = vmalloc(mc_size);
-		if (!mc)
-			break;
+		/* For performance reasons, reuse mc area when possible */
+		if (!mc || mc_size > curr_mc_size) {
+			if (mc)
+				vfree(mc);
+			mc = vmalloc(mc_size);
+			if (!mc)
+				break;
+			curr_mc_size = mc_size;
+		}
 
 		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
 		    microcode_sanity_check(mc) < 0) {
@@ -376,13 +383,16 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 				vfree(new_mc);
 			new_rev = mc_header.rev;
 			new_mc = mc;
-		} else
-			vfree(mc);
+			mc = NULL;	/* trigger new vmalloc */
+		}
 
 		ucode_ptr += mc_size;
 		leftover -= mc_size;
 	}
 
+	if (mc)
+		vfree(mc);
+
 	if (leftover) {
 		if (new_mc)
 			vfree(new_mc);

^ permalink raw reply related [flat|nested] 5+ messages in thread