* [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure @ 2025-04-14 9:21 shouyeliu 2025-04-14 10:41 ` Ilpo Järvinen 0 siblings, 1 reply; 7+ messages in thread From: shouyeliu @ 2025-04-14 9:21 UTC (permalink / raw) To: srinivas.pandruvada, hdegoede, ilpo.jarvinen Cc: platform-driver-x86, linux-kernel, shouyeliu When uncore_event_cpu_online() fails to initialize a control CPU (e.g., due to memory allocation failure or uncore_freq_add_entry() errors), the code leaves stale entries in uncore_cpu_mask after that online CPU will not try to call uncore_freq_add_entry, resulting in no sys interface. Signed-off-by: shouyeliu <shouyeliu@gmail.com> --- .../x86/intel/uncore-frequency/uncore-frequency.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c index 40bbf8e45fa4..1de0a4a9d6cd 100644 --- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c +++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c @@ -146,15 +146,13 @@ static int uncore_event_cpu_online(unsigned int cpu) { struct uncore_data *data; int target; + int ret; /* Check if there is an online cpu in the package for uncore MSR */ target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu)); if (target < nr_cpu_ids) return 0; - /* Use this CPU on this die as a control CPU */ - cpumask_set_cpu(cpu, &uncore_cpu_mask); - data = uncore_get_instance(cpu); if (!data) return 0; @@ -163,7 +161,13 @@ static int uncore_event_cpu_online(unsigned int cpu) data->die_id = topology_die_id(cpu); data->domain_id = UNCORE_DOMAIN_ID_INVALID; - return uncore_freq_add_entry(data, cpu); + ret = uncore_freq_add_entry(data, cpu); + if (!ret) { + /* Use this CPU on this die as a control CPU */ + cpumask_set_cpu(cpu, &uncore_cpu_mask); + } + + return ret; } static int uncore_event_cpu_offline(unsigned int cpu) -- 2.19.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure 2025-04-14 9:21 [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure shouyeliu @ 2025-04-14 10:41 ` Ilpo Järvinen 2025-04-14 16:06 ` srinivas pandruvada 0 siblings, 1 reply; 7+ messages in thread From: Ilpo Järvinen @ 2025-04-14 10:41 UTC (permalink / raw) To: shouyeliu; +Cc: srinivas.pandruvada, Hans de Goede, platform-driver-x86, LKML On Mon, 14 Apr 2025, shouyeliu wrote: > When uncore_event_cpu_online() fails to initialize a control CPU (e.g., > due to memory allocation failure or uncore_freq_add_entry() errors), > the code leaves stale entries in uncore_cpu_mask after that online CPU > will not try to call uncore_freq_add_entry, resulting in no sys interface. Please add () after any name that refers to a C function (you're not even being consistent here as you had it in some cases but not here). Please try to split the very long sentence a bit and make it more obvious what causes what as the current wording is a bit vague, did you mean: uncore_event_cpu_online() will not call uncore_freq_add_entry() for another CPU that is being onlined or something along those lines? Will this change work/matter? Documentation/core-api/cpu_hotplug.rst says about cpuhp_setup_state(): "If a callback fails for CPU N then the teardown callback for CPU 0 .. N-1 is invoked to rollback the operation. The state setup fails, the callbacks for the state are not installed and in case of dynamic allocation the allocated state is freed." > Fixes tag? > Signed-off-by: shouyeliu <shouyeliu@gmail.com> The correct format for tags is documented in Documentation/process/5.Posting.rst: tag: Full Name <email address> > --- > .../x86/intel/uncore-frequency/uncore-frequency.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c > index 40bbf8e45fa4..1de0a4a9d6cd 100644 > --- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c > +++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c > @@ -146,15 +146,13 @@ static int uncore_event_cpu_online(unsigned int cpu) > { > struct uncore_data *data; > int target; > + int ret; > > /* Check if there is an online cpu in the package for uncore MSR */ > target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu)); > if (target < nr_cpu_ids) > return 0; > > - /* Use this CPU on this die as a control CPU */ > - cpumask_set_cpu(cpu, &uncore_cpu_mask); > - > data = uncore_get_instance(cpu); > if (!data) > return 0; > @@ -163,7 +161,13 @@ static int uncore_event_cpu_online(unsigned int cpu) > data->die_id = topology_die_id(cpu); > data->domain_id = UNCORE_DOMAIN_ID_INVALID; > > - return uncore_freq_add_entry(data, cpu); > + ret = uncore_freq_add_entry(data, cpu); > + if (!ret) { > + /* Use this CPU on this die as a control CPU */ > + cpumask_set_cpu(cpu, &uncore_cpu_mask); > + } > + > + return ret; Please reverse to logic such that you return early on error, which is the usual error handling pattern. > } > > static int uncore_event_cpu_offline(unsigned int cpu) > -- i. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure 2025-04-14 10:41 ` Ilpo Järvinen @ 2025-04-14 16:06 ` srinivas pandruvada 2025-04-15 6:54 ` liu shouye [not found] ` <CAAscG3VHVdNDQGfsdBs_ht5H-WUtCBksMYPXLKW2D6Uqu3yeAA@mail.gmail.com> 0 siblings, 2 replies; 7+ messages in thread From: srinivas pandruvada @ 2025-04-14 16:06 UTC (permalink / raw) To: Ilpo Järvinen, shouyeliu; +Cc: Hans de Goede, platform-driver-x86, LKML On Mon, 2025-04-14 at 13:41 +0300, Ilpo Järvinen wrote: > On Mon, 14 Apr 2025, shouyeliu wrote: > > > When uncore_event_cpu_online() fails to initialize a control CPU > > (e.g., > > due to memory allocation failure or uncore_freq_add_entry() > > errors), > > the code leaves stale entries in uncore_cpu_mask after that online > > CPU > > will not try to call uncore_freq_add_entry, resulting in no sys > > interface. > > Please add () after any name that refers to a C function (you're not > even > being consistent here as you had it in some cases but not here). > > Please try to split the very long sentence a bit and make it more > obvious > what causes what as the current wording is a bit vague, did you mean: > uncore_event_cpu_online() will not call uncore_freq_add_entry() for > another CPU that is being onlined or something along those lines? > > Will this change work/matter? Documentation/core-api/cpu_hotplug.rst > says > about cpuhp_setup_state(): > > "If a callback fails for CPU N then the teardown callback for CPU > 0 .. N-1 is invoked to rollback the operation. The state setup > fails, > the callbacks for the state are not installed and in case of dynamic > allocation the allocated state is freed." > Yes, cpuhp_setup_state() will fail and which will result in clean up. So any fail of any fail uncore_event_cpu_online() will result in no sys entries. I think here the intention is to keep sys entries, which will not happen with this patch. For confirmation on 6.14 kernel, I forced failure on CPU 10: [595799.696873] intel_uncore_init [595799.700102] uncore_event_cpu_online cpu:0 [595799.704240] uncore_event_cpu_online cpu:1 [595799.708360] uncore_event_cpu_online cpu:2 [595799.712505] uncore_event_cpu_online cpu:3 [595799.716633] uncore_event_cpu_online cpu:4 [595799.720755] uncore_event_cpu_online cpu:5 [595799.724953] uncore_event_cpu_online cpu:6 [595799.729158] uncore_event_cpu_online cpu:7 [595799.733409] uncore_event_cpu_online cpu:8 [595799.737674] uncore_event_cpu_online cpu:9 [595799.741954] uncore_event_cpu_online cpu:10 [595799.746134] Force CPU 10 to fail online [595799.750182] uncore_event_cpu_offline cpu:0 [595799.754508] uncore_event_cpu_offline cpu:1 [595799.758834] uncore_event_cpu_offline cpu:2 [595799.763238] uncore_event_cpu_offline cpu:3 [595799.767558] uncore_event_cpu_offline cpu:4 [595799.771832] uncore_event_cpu_offline cpu:5 [595799.776178] uncore_event_cpu_offline cpu:6 [595799.780506] uncore_event_cpu_offline cpu:7 [595799.784862] uncore_event_cpu_offline cpu:8 [595799.789247] uncore_event_cpu_offline cpu:9 [595799.793540] intel_uncore_init cpuhp_setup_state failed [595799.798776] intel_uncore_init failed Thanks, Srinivas > > > > Fixes tag? > > > Signed-off-by: shouyeliu <shouyeliu@gmail.com> > > The correct format for tags is documented in > Documentation/process/5.Posting.rst: > > tag: Full Name <email address> > > > --- > > .../x86/intel/uncore-frequency/uncore-frequency.c | 12 > > ++++++++---- > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore- > > frequency.c b/drivers/platform/x86/intel/uncore-frequency/uncore- > > frequency.c > > index 40bbf8e45fa4..1de0a4a9d6cd 100644 > > --- a/drivers/platform/x86/intel/uncore-frequency/uncore- > > frequency.c > > +++ b/drivers/platform/x86/intel/uncore-frequency/uncore- > > frequency.c > > @@ -146,15 +146,13 @@ static int uncore_event_cpu_online(unsigned > > int cpu) > > { > > struct uncore_data *data; > > int target; > > + int ret; > > > > /* Check if there is an online cpu in the package for > > uncore MSR */ > > target = cpumask_any_and(&uncore_cpu_mask, > > topology_die_cpumask(cpu)); > > if (target < nr_cpu_ids) > > return 0; > > > > - /* Use this CPU on this die as a control CPU */ > > - cpumask_set_cpu(cpu, &uncore_cpu_mask); > > - > > data = uncore_get_instance(cpu); > > if (!data) > > return 0; > > @@ -163,7 +161,13 @@ static int uncore_event_cpu_online(unsigned > > int cpu) > > data->die_id = topology_die_id(cpu); > > data->domain_id = UNCORE_DOMAIN_ID_INVALID; > > > > - return uncore_freq_add_entry(data, cpu); > > + ret = uncore_freq_add_entry(data, cpu); > > + if (!ret) { > > + /* Use this CPU on this die as a control CPU */ > > + cpumask_set_cpu(cpu, &uncore_cpu_mask); > > + } > > + > > + return ret; > > Please reverse to logic such that you return early on error, which is > the > usual error handling pattern. > > > } > > > > static int uncore_event_cpu_offline(unsigned int cpu) > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure 2025-04-14 16:06 ` srinivas pandruvada @ 2025-04-15 6:54 ` liu shouye [not found] ` <CAAscG3VHVdNDQGfsdBs_ht5H-WUtCBksMYPXLKW2D6Uqu3yeAA@mail.gmail.com> 1 sibling, 0 replies; 7+ messages in thread From: liu shouye @ 2025-04-15 6:54 UTC (permalink / raw) To: srinivas pandruvada Cc: Ilpo Järvinen, Hans de Goede, platform-driver-x86, LKML srinivas pandruvada <srinivas.pandruvada@linux.intel.com> 于2025年4月15日周二 00:08写道: > > On Mon, 2025-04-14 at 13:41 +0300, Ilpo Järvinen wrote: > > On Mon, 14 Apr 2025, shouyeliu wrote: > > > > > When uncore_event_cpu_online() fails to initialize a control CPU > > > (e.g., > > > due to memory allocation failure or uncore_freq_add_entry() > > > errors), > > > the code leaves stale entries in uncore_cpu_mask after that online > > > CPU > > > will not try to call uncore_freq_add_entry, resulting in no sys > > > interface. > > > > Please add () after any name that refers to a C function (you're not > > even > > being consistent here as you had it in some cases but not here). ok,I will modify it in the next version > > > > Please try to split the very long sentence a bit and make it more > > obvious > > what causes what as the current wording is a bit vague, did you mean: > > uncore_event_cpu_online() will not call uncore_freq_add_entry() for > > another CPU that is being onlined or something along those lines? > > > > Will this change work/matter? Documentation/core-api/cpu_hotplug.rst > > says > > about cpuhp_setup_state(): > > > > "If a callback fails for CPU N then the teardown callback for CPU > > 0 .. N-1 is invoked to rollback the operation. The state setup > > fails, > > the callbacks for the state are not installed and in case of dynamic > > allocation the allocated state is freed." > > > > Yes, cpuhp_setup_state() will fail and which will result in clean up. > So any fail of any fail uncore_event_cpu_online() will result in no sys > entries. > > I think here the intention is to keep sys entries, which will not > happen with this patch. > > For confirmation on 6.14 kernel, I forced failure on CPU 10: > > [595799.696873] intel_uncore_init > [595799.700102] uncore_event_cpu_online cpu:0 > [595799.704240] uncore_event_cpu_online cpu:1 > [595799.708360] uncore_event_cpu_online cpu:2 > [595799.712505] uncore_event_cpu_online cpu:3 > [595799.716633] uncore_event_cpu_online cpu:4 > [595799.720755] uncore_event_cpu_online cpu:5 > [595799.724953] uncore_event_cpu_online cpu:6 > [595799.729158] uncore_event_cpu_online cpu:7 > [595799.733409] uncore_event_cpu_online cpu:8 > [595799.737674] uncore_event_cpu_online cpu:9 > [595799.741954] uncore_event_cpu_online cpu:10 > [595799.746134] Force CPU 10 to fail online > [595799.750182] uncore_event_cpu_offline cpu:0 > [595799.754508] uncore_event_cpu_offline cpu:1 > [595799.758834] uncore_event_cpu_offline cpu:2 > [595799.763238] uncore_event_cpu_offline cpu:3 > [595799.767558] uncore_event_cpu_offline cpu:4 > [595799.771832] uncore_event_cpu_offline cpu:5 > [595799.776178] uncore_event_cpu_offline cpu:6 > [595799.780506] uncore_event_cpu_offline cpu:7 > [595799.784862] uncore_event_cpu_offline cpu:8 > [595799.789247] uncore_event_cpu_offline cpu:9 > [595799.793540] intel_uncore_init cpuhp_setup_state failed > [595799.798776] intel_uncore_init failed > > > Thanks, > Srinivas Registering the CPU hot-plug callback function during booting can be handled correctly. I think the problem occurs during runtime. The original code may have problems when the CPU hot-plug modifies the management CPU during runtime: Assume that the CPUs of package 1 are 8-15, and the uncore driver has been registered at boot time; 1. Offline all CPU No.8-15 2. Try online CPU No. 8, the code executes cpumask_set_cpu() successfully, but fails in the uncore_freq_add_entry() process. At this time, the mark of CPU No. 8 is added to uncore_cpu_mask, but no sys interface is generated,cpu No.8 online fails; 3. Try online CPU No. 8 again, cpumask_any_and() judges success, and the CPU No.8 online is successful at this time; 4. Assume that the attempt to online CPU No. 9-15 is successful at this time, but there is no sys interface ————unexpected behavior 1. 5. Offline CPU No. 9-15, and offline No.8, will eventually call uncore_freq_remove_die_entry()————unexpected behavior 2 is generated, which may cause a crash. > > > > > > > > > > Fixes tag? > > > > > Signed-off-by: shouyeliu <shouyeliu@gmail.com> > > > > The correct format for tags is documented in > > Documentation/process/5.Posting.rst: > > > > tag: Full Name <email address> ok,I will modify it in the next version > > > > > --- > > > .../x86/intel/uncore-frequency/uncore-frequency.c | 12 > > > ++++++++---- > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c b/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > > > index 40bbf8e45fa4..1de0a4a9d6cd 100644 > > > --- a/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > > > +++ b/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > > > @@ -146,15 +146,13 @@ static int uncore_event_cpu_online(unsigned > > > int cpu) > > > { > > > struct uncore_data *data; > > > int target; > > > + int ret; > > > > > > /* Check if there is an online cpu in the package for > > > uncore MSR */ > > > target = cpumask_any_and(&uncore_cpu_mask, > > > topology_die_cpumask(cpu)); > > > if (target < nr_cpu_ids) > > > return 0; > > > > > > - /* Use this CPU on this die as a control CPU */ > > > - cpumask_set_cpu(cpu, &uncore_cpu_mask); > > > - > > > data = uncore_get_instance(cpu); > > > if (!data) > > > return 0; > > > @@ -163,7 +161,13 @@ static int uncore_event_cpu_online(unsigned > > > int cpu) > > > data->die_id = topology_die_id(cpu); > > > data->domain_id = UNCORE_DOMAIN_ID_INVALID; > > > > > > - return uncore_freq_add_entry(data, cpu); > > > + ret = uncore_freq_add_entry(data, cpu); > > > + if (!ret) { > > > + /* Use this CPU on this die as a control CPU */ > > > + cpumask_set_cpu(cpu, &uncore_cpu_mask); > > > + } > > > + > > > + return ret; > > > > Please reverse to logic such that you return early on error, which is > > the > > usual error handling pattern. ok,I will modify it in the next version > > > > > } > > > > > > static int uncore_event_cpu_offline(unsigned int cpu) > > > > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <CAAscG3VHVdNDQGfsdBs_ht5H-WUtCBksMYPXLKW2D6Uqu3yeAA@mail.gmail.com>]
[parent not found: <331a55fe7334cf425ddb8826160b64a5af37c805.camel@linux.intel.com>]
* Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure [not found] ` <331a55fe7334cf425ddb8826160b64a5af37c805.camel@linux.intel.com> @ 2025-04-15 15:10 ` Ilpo Järvinen 2025-04-15 22:11 ` srinivas pandruvada 0 siblings, 1 reply; 7+ messages in thread From: Ilpo Järvinen @ 2025-04-15 15:10 UTC (permalink / raw) To: srinivas pandruvada; +Cc: liu shouye, Hans de Goede, platform-driver-x86, LKML [-- Attachment #1: Type: text/plain, Size: 7864 bytes --] On Tue, 15 Apr 2025, srinivas pandruvada wrote: > On Tue, 2025-04-15 at 14:39 +0800, liu shouye wrote: > > > srinivas pandruvada <srinivas.pandruvada@linux.intel.com> 于2025年4月15日周二 > 00:08写道: > On Mon, 2025-04-14 at 13:41 +0300, Ilpo Järvinen wrote: > > On Mon, 14 Apr 2025, shouyeliu wrote: > > > > > When uncore_event_cpu_online() fails to initialize a control > CPU > > > (e.g., > > > due to memory allocation failure or uncore_freq_add_entry() > > > errors), > > > the code leaves stale entries in uncore_cpu_mask after that > online > > > CPU > > > will not try to call uncore_freq_add_entry, resulting in no sys > > > interface. > > > > Please add () after any name that refers to a C function (you're > not > > even > > being consistent here as you had it in some cases but not here). > > ok,I will modify it in the next version > > > > > Please try to split the very long sentence a bit and make it more > > obvious > > what causes what as the current wording is a bit vague, did you > mean: > > uncore_event_cpu_online() will not call uncore_freq_add_entry() > for > > another CPU that is being onlined or something along those lines? > > > > Will this change work/matter? > Documentation/core-api/cpu_hotplug.rst > > says > > about cpuhp_setup_state(): > > > > "If a callback fails for CPU N then the teardown callback for CPU > > 0 .. N-1 is invoked to rollback the operation. The state setup > > fails, > > the callbacks for the state are not installed and in case of > dynamic > > allocation the allocated state is freed." > > > > Yes, cpuhp_setup_state() will fail and which will result in clean up. > So any fail of any fail uncore_event_cpu_online() will result in no sys > entries. > > I think here the intention is to keep sys entries, which will not > happen with this patch. > > For confirmation on 6.14 kernel, I forced failure on CPU 10: > > [595799.696873] intel_uncore_init > [595799.700102] uncore_event_cpu_online cpu:0 > [595799.704240] uncore_event_cpu_online cpu:1 > [595799.708360] uncore_event_cpu_online cpu:2 > [595799.712505] uncore_event_cpu_online cpu:3 > [595799.716633] uncore_event_cpu_online cpu:4 > [595799.720755] uncore_event_cpu_online cpu:5 > [595799.724953] uncore_event_cpu_online cpu:6 > [595799.729158] uncore_event_cpu_online cpu:7 > [595799.733409] uncore_event_cpu_online cpu:8 > [595799.737674] uncore_event_cpu_online cpu:9 > [595799.741954] uncore_event_cpu_online cpu:10 > [595799.746134] Force CPU 10 to fail online > [595799.750182] uncore_event_cpu_offline cpu:0 > [595799.754508] uncore_event_cpu_offline cpu:1 > [595799.758834] uncore_event_cpu_offline cpu:2 > [595799.763238] uncore_event_cpu_offline cpu:3 > [595799.767558] uncore_event_cpu_offline cpu:4 > [595799.771832] uncore_event_cpu_offline cpu:5 > [595799.776178] uncore_event_cpu_offline cpu:6 > [595799.780506] uncore_event_cpu_offline cpu:7 > [595799.784862] uncore_event_cpu_offline cpu:8 > [595799.789247] uncore_event_cpu_offline cpu:9 > [595799.793540] intel_uncore_init cpuhp_setup_state failed > [595799.798776] intel_uncore_init failed > > > Thanks, > Srinivas > > > > Registering the CPU hot-plug callback function during booting can be handled > correctly. I think the problem occurs during runtime. > The original code may have problems when the CPU hot-plug modifies the > management CPU during runtime: > Assume that the CPUs of package 1 are 8-15, and the uncore driver has been > registered at boot time; > 1. Offline all CPU No.8-15 > 2. Try online CPU No. 8, the code executes cpumask_set_cpu() successfully, but > fails in the uncore_freq_add_entry() process. At this time, the mark of CPU No. > 8 is added to uncore_cpu_mask, but no sys interface is generated,cpu No.8 > online fails; > 3. Try online CPU No. 8 again, cpumask_any_and() judges success, and the CPU > No.8 online is successful at this time; > 4. Assume that the attempt to online CPU No. 9-15 is successful at this time, > but there is no sys interface ————unexpected behavior 1. > 5. Offline CPU No. 9-15, and offline No.8, will eventually call > uncore_freq_remove_die_entry()————unexpected behavior 2 is generated, which may > cause a crash. > > > > I see the case in 2 socket server. So good to fix. Thanks for this. All this extra information that this discussion has brought into light should be included into the changelog (including the fact that this can only occur after they were first setup without errors as the it would have rolled back otherwise). -- i. > > Also I suggest to look why the addition of one entry failed on your server system. > > Thanks, > Srinivas > > > > > > > > > > Fixes tag? > > > > > Signed-off-by: shouyeliu <shouyeliu@gmail.com> > > > > The correct format for tags is documented in > > Documentation/process/5.Posting.rst: > > > > tag: Full Name <email address> > > ok,I will modify it in the next version > > > > > --- > > > .../x86/intel/uncore-frequency/uncore-frequency.c | 12 > > > ++++++++---- > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git > a/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > b/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > > > index 40bbf8e45fa4..1de0a4a9d6cd 100644 > > > --- a/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > > > +++ b/drivers/platform/x86/intel/uncore-frequency/uncore- > > > frequency.c > > > @@ -146,15 +146,13 @@ static int > uncore_event_cpu_online(unsigned > > > int cpu) > > > { > > > struct uncore_data *data; > > > int target; > > > + int ret; > > > > > > /* Check if there is an online cpu in the package for > > > uncore MSR */ > > > target = cpumask_any_and(&uncore_cpu_mask, > > > topology_die_cpumask(cpu)); > > > if (target < nr_cpu_ids) > > > return 0; > > > > > > - /* Use this CPU on this die as a control CPU */ > > > - cpumask_set_cpu(cpu, &uncore_cpu_mask); > > > - > > > data = uncore_get_instance(cpu); > > > if (!data) > > > return 0; > > > @@ -163,7 +161,13 @@ static int > uncore_event_cpu_online(unsigned > > > int cpu) > > > data->die_id = topology_die_id(cpu); > > > data->domain_id = UNCORE_DOMAIN_ID_INVALID; > > > > > > - return uncore_freq_add_entry(data, cpu); > > > + ret = uncore_freq_add_entry(data, cpu); > > > + if (!ret) { > > > + /* Use this CPU on this die as a control CPU */ > > > + cpumask_set_cpu(cpu, &uncore_cpu_mask); > > > + } > > > + > > > + return ret; > > > > Please reverse to logic such that you return early on error, > which is > > the > > usual error handling pattern. > > ok,I will modify it in the next version > > > > > } > > > > > > static int uncore_event_cpu_offline(unsigned int cpu) > > > > > > > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure 2025-04-15 15:10 ` Ilpo Järvinen @ 2025-04-15 22:11 ` srinivas pandruvada 2025-04-16 2:34 ` liu shouye 0 siblings, 1 reply; 7+ messages in thread From: srinivas pandruvada @ 2025-04-15 22:11 UTC (permalink / raw) To: Ilpo Järvinen; +Cc: liu shouye, Hans de Goede, platform-driver-x86, LKML On Tue, 2025-04-15 at 18:10 +0300, Ilpo Järvinen wrote: > On Tue, 15 Apr 2025, srinivas pandruvada wrote: > > > On Tue, 2025-04-15 at 14:39 +0800, liu shouye wrote: > > > > > > srinivas pandruvada <srinivas.pandruvada@linux.intel.com> > > 于2025年4月15日周二 > > 00:08写道: > > On Mon, 2025-04-14 at 13:41 +0300, Ilpo Järvinen wrote: > > > On Mon, 14 Apr 2025, shouyeliu wrote: > > > > > > > When uncore_event_cpu_online() fails to initialize a > > control > > CPU > > > > (e.g., > > > > due to memory allocation failure or > > uncore_freq_add_entry() > > > > errors), > > > > the code leaves stale entries in uncore_cpu_mask after > > that > > online > > > > CPU > > > > will not try to call uncore_freq_add_entry, resulting in > > no sys > > > > interface. > > > > > > Please add () after any name that refers to a C function > > (you're > > not > > > even > > > being consistent here as you had it in some cases but not > > here). > > > > ok,I will modify it in the next version > > > > > > > > Please try to split the very long sentence a bit and make > > it more > > > obvious > > > what causes what as the current wording is a bit vague, did > > you > > mean: > > > uncore_event_cpu_online() will not call > > uncore_freq_add_entry() > > for > > > another CPU that is being onlined or something along those > > lines? > > > > > > Will this change work/matter? > > Documentation/core-api/cpu_hotplug.rst > > > says > > > about cpuhp_setup_state(): > > > > > > "If a callback fails for CPU N then the teardown callback > > for CPU > > > 0 .. N-1 is invoked to rollback the operation. The state > > setup > > > fails, > > > the callbacks for the state are not installed and in case > > of > > dynamic > > > allocation the allocated state is freed." > > > > > > > Yes, cpuhp_setup_state() will fail and which will result in clean > > up. > > So any fail of any fail uncore_event_cpu_online() will result in no > > sys > > entries. > > > > I think here the intention is to keep sys entries, which will not > > happen with this patch. > > > > For confirmation on 6.14 kernel, I forced failure on CPU 10: > > > > [595799.696873] intel_uncore_init > > [595799.700102] uncore_event_cpu_online cpu:0 > > [595799.704240] uncore_event_cpu_online cpu:1 > > [595799.708360] uncore_event_cpu_online cpu:2 > > [595799.712505] uncore_event_cpu_online cpu:3 > > [595799.716633] uncore_event_cpu_online cpu:4 > > [595799.720755] uncore_event_cpu_online cpu:5 > > [595799.724953] uncore_event_cpu_online cpu:6 > > [595799.729158] uncore_event_cpu_online cpu:7 > > [595799.733409] uncore_event_cpu_online cpu:8 > > [595799.737674] uncore_event_cpu_online cpu:9 > > [595799.741954] uncore_event_cpu_online cpu:10 > > [595799.746134] Force CPU 10 to fail online > > [595799.750182] uncore_event_cpu_offline cpu:0 > > [595799.754508] uncore_event_cpu_offline cpu:1 > > [595799.758834] uncore_event_cpu_offline cpu:2 > > [595799.763238] uncore_event_cpu_offline cpu:3 > > [595799.767558] uncore_event_cpu_offline cpu:4 > > [595799.771832] uncore_event_cpu_offline cpu:5 > > [595799.776178] uncore_event_cpu_offline cpu:6 > > [595799.780506] uncore_event_cpu_offline cpu:7 > > [595799.784862] uncore_event_cpu_offline cpu:8 > > [595799.789247] uncore_event_cpu_offline cpu:9 > > [595799.793540] intel_uncore_init cpuhp_setup_state failed > > [595799.798776] intel_uncore_init failed > > > > > > Thanks, > > Srinivas > > > > > > > > Registering the CPU hot-plug callback function during booting can > > be handled > > correctly. I think the problem occurs during runtime. > > The original code may have problems when the CPU hot-plug modifies > > the > > management CPU during runtime: > > Assume that the CPUs of package 1 are 8-15, and the uncore driver > > has been > > registered at boot time; > > 1. Offline all CPU No.8-15 > > 2. Try online CPU No. 8, the code executes cpumask_set_cpu() > > successfully, but > > fails in the uncore_freq_add_entry() process. At this time, the > > mark of CPU No. > > 8 is added to uncore_cpu_mask, but no sys interface is > > generated,cpu No.8 > > online fails; > > 3. Try online CPU No. 8 again, cpumask_any_and() judges success, > > and the CPU > > No.8 online is successful at this time; > > 4. Assume that the attempt to online CPU No. 9-15 is successful at > > this time, > > but there is no sys interface ————unexpected behavior 1. > > 5. Offline CPU No. 9-15, and offline No.8, will eventually call > > uncore_freq_remove_die_entry()————unexpected behavior 2 is > > generated, which may > > cause a crash. > > > > > > > > I see the case in 2 socket server. So good to fix. Thanks for this. > > All this extra information that this discussion has brought into > light > should be included into the changelog (including the fact that this > can > only occur after they were first setup without errors as the it would > have rolled back otherwise). > May be something like this: " Fix missing uncore sysfs during CPU hotplug In certain situations, the sysfs for uncore may not be present when all CPUs in a package are offlined and then brought back online after boot. This issue can occur if there is an error in adding the sysfs entry due to a memory allocation failure. Retrying to bring the CPUs online will not resolve the issue, as the uncore_cpu_mask is already set for the package before the failure condition occurs. This issue does not occur if the failure happens during module initialization, as the module will fail to load in the event of any error. To address this, ensure that the uncore_cpu_mask is not set until the successful return of uncore_freq_add_entry(). " Thanks, Srinivas ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure 2025-04-15 22:11 ` srinivas pandruvada @ 2025-04-16 2:34 ` liu shouye 0 siblings, 0 replies; 7+ messages in thread From: liu shouye @ 2025-04-16 2:34 UTC (permalink / raw) To: srinivas pandruvada Cc: Ilpo Järvinen, Hans de Goede, platform-driver-x86, LKML srinivas pandruvada <srinivas.pandruvada@linux.intel.com> 于2025年4月16日周三 06:11写道: > > On Tue, 2025-04-15 at 18:10 +0300, Ilpo Järvinen wrote: > > On Tue, 15 Apr 2025, srinivas pandruvada wrote: > > > > > On Tue, 2025-04-15 at 14:39 +0800, liu shouye wrote: > > > > > > > > > srinivas pandruvada <srinivas.pandruvada@linux.intel.com> > > > 于2025年4月15日周二 > > > 00:08写道: > > > On Mon, 2025-04-14 at 13:41 +0300, Ilpo Järvinen wrote: > > > > On Mon, 14 Apr 2025, shouyeliu wrote: > > > > > > > > > When uncore_event_cpu_online() fails to initialize a > > > control > > > CPU > > > > > (e.g., > > > > > due to memory allocation failure or > > > uncore_freq_add_entry() > > > > > errors), > > > > > the code leaves stale entries in uncore_cpu_mask after > > > that > > > online > > > > > CPU > > > > > will not try to call uncore_freq_add_entry, resulting in > > > no sys > > > > > interface. > > > > > > > > Please add () after any name that refers to a C function > > > (you're > > > not > > > > even > > > > being consistent here as you had it in some cases but not > > > here). > > > > > > ok,I will modify it in the next version > > > > > > > > > > > Please try to split the very long sentence a bit and make > > > it more > > > > obvious > > > > what causes what as the current wording is a bit vague, did > > > you > > > mean: > > > > uncore_event_cpu_online() will not call > > > uncore_freq_add_entry() > > > for > > > > another CPU that is being onlined or something along those > > > lines? > > > > > > > > Will this change work/matter? > > > Documentation/core-api/cpu_hotplug.rst > > > > says > > > > about cpuhp_setup_state(): > > > > > > > > "If a callback fails for CPU N then the teardown callback > > > for CPU > > > > 0 .. N-1 is invoked to rollback the operation. The state > > > setup > > > > fails, > > > > the callbacks for the state are not installed and in case > > > of > > > dynamic > > > > allocation the allocated state is freed." > > > > > > > > > > Yes, cpuhp_setup_state() will fail and which will result in clean > > > up. > > > So any fail of any fail uncore_event_cpu_online() will result in no > > > sys > > > entries. > > > > > > I think here the intention is to keep sys entries, which will not > > > happen with this patch. > > > > > > For confirmation on 6.14 kernel, I forced failure on CPU 10: > > > > > > [595799.696873] intel_uncore_init > > > [595799.700102] uncore_event_cpu_online cpu:0 > > > [595799.704240] uncore_event_cpu_online cpu:1 > > > [595799.708360] uncore_event_cpu_online cpu:2 > > > [595799.712505] uncore_event_cpu_online cpu:3 > > > [595799.716633] uncore_event_cpu_online cpu:4 > > > [595799.720755] uncore_event_cpu_online cpu:5 > > > [595799.724953] uncore_event_cpu_online cpu:6 > > > [595799.729158] uncore_event_cpu_online cpu:7 > > > [595799.733409] uncore_event_cpu_online cpu:8 > > > [595799.737674] uncore_event_cpu_online cpu:9 > > > [595799.741954] uncore_event_cpu_online cpu:10 > > > [595799.746134] Force CPU 10 to fail online > > > [595799.750182] uncore_event_cpu_offline cpu:0 > > > [595799.754508] uncore_event_cpu_offline cpu:1 > > > [595799.758834] uncore_event_cpu_offline cpu:2 > > > [595799.763238] uncore_event_cpu_offline cpu:3 > > > [595799.767558] uncore_event_cpu_offline cpu:4 > > > [595799.771832] uncore_event_cpu_offline cpu:5 > > > [595799.776178] uncore_event_cpu_offline cpu:6 > > > [595799.780506] uncore_event_cpu_offline cpu:7 > > > [595799.784862] uncore_event_cpu_offline cpu:8 > > > [595799.789247] uncore_event_cpu_offline cpu:9 > > > [595799.793540] intel_uncore_init cpuhp_setup_state failed > > > [595799.798776] intel_uncore_init failed > > > > > > > > > Thanks, > > > Srinivas > > > > > > > > > > > > Registering the CPU hot-plug callback function during booting can > > > be handled > > > correctly. I think the problem occurs during runtime. > > > The original code may have problems when the CPU hot-plug modifies > > > the > > > management CPU during runtime: > > > Assume that the CPUs of package 1 are 8-15, and the uncore driver > > > has been > > > registered at boot time; > > > 1. Offline all CPU No.8-15 > > > 2. Try online CPU No. 8, the code executes cpumask_set_cpu() > > > successfully, but > > > fails in the uncore_freq_add_entry() process. At this time, the > > > mark of CPU No. > > > 8 is added to uncore_cpu_mask, but no sys interface is > > > generated,cpu No.8 > > > online fails; > > > 3. Try online CPU No. 8 again, cpumask_any_and() judges success, > > > and the CPU > > > No.8 online is successful at this time; > > > 4. Assume that the attempt to online CPU No. 9-15 is successful at > > > this time, > > > but there is no sys interface ————unexpected behavior 1. > > > 5. Offline CPU No. 9-15, and offline No.8, will eventually call > > > uncore_freq_remove_die_entry()————unexpected behavior 2 is > > > generated, which may > > > cause a crash. > > > > > > > > > > > > I see the case in 2 socket server. So good to fix. Thanks for this. > > > > > > Also I suggest to look why the addition of one entry failed on your server system. Sorry for my description problem. My machine also has two sockets. After all, a one-socket machine cannot offline all CPUs. > > > > All this extra information that this discussion has brought into > > light > > should be included into the changelog (including the fact that this > > can > > only occur after they were first setup without errors as the it would > > have rolled back otherwise). Got it. My previous commit was indeed not described clearly. Thanks, Shouye > > > > May be something like this: > > " > Fix missing uncore sysfs during CPU hotplug > > In certain situations, the sysfs for uncore may not be present when all > CPUs in a package are offlined and then brought back online after boot. > > This issue can occur if there is an error in adding the sysfs entry due > to a memory allocation failure. Retrying to bring the CPUs online will > not resolve the issue, as the uncore_cpu_mask is already set for the > package before the failure condition occurs. > > This issue does not occur if the failure happens during module > initialization, as the module will fail to load in the event of any > error. > > To address this, ensure that the uncore_cpu_mask is not set until the > successful return of uncore_freq_add_entry(). > " > > Thanks, > Srinivas > Very clear and comprehensive commit description - I will apply it in the next version. Thanks, Shouye ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-04-16 2:34 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-14 9:21 [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure shouyeliu
2025-04-14 10:41 ` Ilpo Järvinen
2025-04-14 16:06 ` srinivas pandruvada
2025-04-15 6:54 ` liu shouye
[not found] ` <CAAscG3VHVdNDQGfsdBs_ht5H-WUtCBksMYPXLKW2D6Uqu3yeAA@mail.gmail.com>
[not found] ` <331a55fe7334cf425ddb8826160b64a5af37c805.camel@linux.intel.com>
2025-04-15 15:10 ` Ilpo Järvinen
2025-04-15 22:11 ` srinivas pandruvada
2025-04-16 2:34 ` liu shouye
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox