From mboxrd@z Thu Jan 1 00:00:00 1970 From: jszhang@marvell.com (Jisheng Zhang) Date: Fri, 8 Jan 2016 21:21:09 +0800 Subject: Armada XP (MV78460): BUG in netdevice.h with maxcpus=2 In-Reply-To: <20160108210309.5a728bd7@xhacker> References: <568F6A47.6010905@gmail.com> <20160108182537.0a68630c@xhacker> <20160108105721.GG19062@n2100.arm.linux.org.uk> <20160108204523.43b4d473@xhacker> <20160108210309.5a728bd7@xhacker> Message-ID: <20160108212109.521962d0@xhacker> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Fri, 8 Jan 2016 21:03:09 +0800 Jisheng Zhang wrote: > On Fri, 8 Jan 2016 20:45:23 +0800 > Jisheng Zhang wrote: > > > Dear Russell, > > > > On Fri, 8 Jan 2016 10:57:21 +0000 Russell King - ARM Linux wrote: > > > > > On Fri, Jan 08, 2016 at 06:25:37PM +0800, Jisheng Zhang wrote: > > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c > > > > index ed622fa..e1242f4 100644 > > > > --- a/drivers/net/ethernet/marvell/mvneta.c > > > > +++ b/drivers/net/ethernet/marvell/mvneta.c > > > > @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp) > > > > mvneta_port_enable(pp); > > > > > > > > /* Enable polling on the port */ > > > > - for_each_present_cpu(cpu) { > > > > + for_each_online_cpu(cpu) { > > > > struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu); > > > > > > > > napi_enable(&port->napi); > > > > @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp) > > > > > > > > phy_stop(pp->phy_dev); > > > > > > > > - for_each_present_cpu(cpu) { > > > > + for_each_online_cpu(cpu) { > > > > struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu); > > > > > > > > napi_disable(&port->napi); > > > > @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev) > > > > mvneta_stop_dev(pp); > > > > mvneta_mdio_remove(pp); > > > > unregister_cpu_notifier(&pp->cpu_notifier); > > > > - for_each_present_cpu(cpu) > > > > + for_each_online_cpu(cpu) > > > > smp_call_function_single(cpu, mvneta_percpu_disable, pp, true); > > > > free_percpu_irq(dev->irq, pp->ports); > > > > mvneta_cleanup_rxqs(pp); > > > > > > I'm not convinced that this isn't racy - what happens if a CPU is > > > brought online between unregister_cpu_notifier(&pp->cpu_notifier) > > > and the following loop? We'll end up calling mvneta_percpu_disable() > > > for the new CPU - is that harmless? > > > > > > Similarly, what if the online CPUs change between mvneta_stop_dev() > > > and mvneta_stop(), and also what about hotplug CPU changes during > > > the startup path? > > > > > > Secondly, is there a reason for: > > > > > > for_each_online_cpu(cpu) > > > smp_call_function_single(cpu, ...) > > > > > > as opposed to: > > > > > > smp_call_function(mvneta_percpu_disable, pp, true); > > > > > > > Yep, thanks for pointing out the race. Before sending out the patch, I prepared > > another *wrong*(IMHO) patch to patch smp_prepare_cpus() in arch/arm/kernel/smp.c. > > Here is the patch. I think it could also fix the issue. oops, the version can't be compiled, sorry for noise. Here is the updated one: diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index b263613..26c164b 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -432,6 +432,7 @@ void __init smp_prepare_boot_cpu(void) void __init smp_prepare_cpus(unsigned int max_cpus) { + unsigned int cpu, max; unsigned int ncores = num_possible_cpus(); init_cpu_topology(); @@ -443,15 +444,29 @@ void __init smp_prepare_cpus(unsigned int max_cpus) */ if (max_cpus > ncores) max_cpus = ncores; - if (ncores > 1 && max_cpus) { - /* - * Initialise the present map, which describes the set of CPUs - * actually populated at the present time. A platform should - * re-initialize the map in the platforms smp_prepare_cpus() - * if present != possible (e.g. physical hotplug). - */ - init_cpu_present(cpu_possible_mask); + /* Don't bother if we're effectively UP */ + if (max_cpus <= 1) + return; + + /* + * Initialise the present map (which describes the set of CPUs + * actually populated@the present time). + */ + max = max_cpus; + max--; + for_each_possible_cpu(cpu) { + if (max == 0) + break; + + if (cpu == smp_processor_id()) + continue; + + set_cpu_present(cpu, true); + max--; + } + + if (ncores > 1 && max_cpus) { /* * Initialise the SCU if there are more than one CPU * and let them know where to start. > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index b263613..f94c755 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -443,15 +443,31 @@ void __init smp_prepare_cpus(unsigned int max_cpus) > */ > if (max_cpus > ncores) > max_cpus = ncores; > - if (ncores > 1 && max_cpus) { > - /* > - * Initialise the present map, which describes the set of CPUs > - * actually populated at the present time. A platform should > - * re-initialize the map in the platforms smp_prepare_cpus() > - * if present != possible (e.g. physical hotplug). > - */ > - init_cpu_present(cpu_possible_mask); > > + /* Don't bother if we're effectively UP */ > + if (max_cpus <= 1) > + return; > + > + /* > + * Initialise the present map (which describes the set of CPUs > + * actually populated at the present time) and release the > + * secondaries from the bootloader. > + * > + * Make sure we online at most (max_cpus - 1) additional CPUs. > + */ > + max_cpus--; > + for_each_possible_cpu(cpu) { > + if (max_cpus == 0) > + break; > + > + if (cpu == smp_processor_id()) > + continue; > + > + set_cpu_present(cpu, true); > + max_cpus--; > + } > + > + if (ncores > 1 && max_cpus) { > /* > * Initialise the SCU if there are more than one CPU > * and let them know where to start. > > > > > > Here is the background: > > I can reproduce this issue on arm but failed to reproduce it on arm64. The key > > is what's present cpu. > > > > let's assume a quad core system, boot with maxcpus=2, after booting. > > > > on arm64, present cpus is cpu0, cpu1 > > > > on arm, present cpus is cpu0, cpu1, cpu2 and cpu3. > > > > the arm core code requires every platform to update the present map in > > platforms' smp_prepare_cpus(), but only two or three platforms do so. > > > > Then get back to mvneta issue, during startup, mvneta_start_dev() calls > > napi_enable() for each present cpu's port. If userspace ask for online > > cpu2, mvneta_percpu_notifier() will call napi_enable() for cpu2 again, > > so BUG_ON() is triggered. > > > > I have two solutions: > > > > 1. as the above patch did, then prevent the race as pointed out by > > get_online_cpus(). > > > > 2. make arm platforms smp_prepare_cpus to update the present map or even > > patch arm core smp_prepare_cpus(). > > > > What's the better solution? Could you please guide me? > > > > Thanks in advance, > > Jisheng > > > > _______________________________________________ > > linux-arm-kernel mailing list > > linux-arm-kernel at lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel