LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 0/1] Forced-wakeup for stop lite states on Powernv
From: Nicholas Piggin @ 2019-05-08  4:59 UTC (permalink / raw)
  To: Abhishek Goel, linux-kernel, linux-pm, linuxppc-dev
  Cc: ego, daniel.lezcano, rjw, dja
In-Reply-To: <20190422063231.51043-1-huntbag@linux.vnet.ibm.com>

Abhishek Goel's on April 22, 2019 4:32 pm:
> Currently, the cpuidle governors determine what idle state a idling CPU
> should enter into based on heuristics that depend on the idle history on
> that CPU. Given that no predictive heuristic is perfect, there are cases
> where the governor predicts a shallow idle state, hoping that the CPU will
> be busy soon. However, if no new workload is scheduled on that CPU in the
> near future, the CPU will end up in the shallow state.
> 
> Motivation
> ----------
> In case of POWER, this is problematic, when the predicted state in the
> aforementioned scenario is a lite stop state, as such lite states will
> inhibit SMT folding, thereby depriving the other threads in the core from
> using the core resources.
> 
> So we do not want to get stucked in such states for longer duration. To
> address this, the cpuidle-core can queue timer to correspond with the
> residency value of the next available state. This timer will forcefully
> wakeup the cpu. Few such iterations will essentially train the governor to
> select a deeper state for that cpu, as the timer here corresponds to the
> next available cpuidle state residency. Cpu will be kicked out of the lite
> state and end up in a non-lite state.
> 
> Experiment
> ----------
> I performed experiments for three scenarios to collect some data.
> 
> case 1 :
> Without this patch and without tick retained, i.e. in a upstream kernel,
> It would spend more than even a second to get out of stop0_lite.
> 
> case 2 : With tick retained in a upstream kernel -
> 
> Generally, we have a sched tick at 4ms(CONF_HZ = 250). Ideally I expected
> it to take 8 sched tick to get out of stop0_lite. Experimentally,
> observation was
> 
> =========================================================
> sample          min            max           99percentile
> 20              4ms            12ms          4ms
> =========================================================
> 
> It would take atleast one sched tick to get out of stop0_lite.
> 
> case 2 :  With this patch (not stopping tick, but explicitly queuing a
>           timer)
> 
> ============================================================
> sample          min             max             99percentile
> ============================================================
> 20              144us           192us           144us
> ============================================================
> 
> In this patch, we queue a timer just before entering into a stop0_lite
> state. The timer fires at (residency of next available state + exit latency
> of next available state * 2). Let's say if next state(stop0) is available
> which has residency of 20us, it should get out in as low as (20+2*2)*8
> [Based on the forumla (residency + 2xlatency)*history length] microseconds
> = 192us. Ideally we would expect 8 iterations, it was observed to get out
> in 6-7 iterations. Even if let's say stop2 is next available state(stop0
> and stop1 both are unavailable), it would take (100+2*10)*8 = 960us to get
> into stop2.
> 
> So, We are able to get out of stop0_lite generally in 150us(with this
> patch) as compared to 4ms(with tick retained). As stated earlier, we do not
> want to get stuck into stop0_lite as it inhibits SMT folding for other
> sibling threads, depriving them of core resources. Current patch is using
> forced-wakeup only for stop0_lite, as it gives performance benefit(primary
> reason) along with lowering down power consumption. We may extend this
> model for other states in future.

I still have to wonder, between our snooze loop and stop0, what does
stop0_lite buy us.

That said, the problem you're solving here is a generic one that all
stop states have, I think. Doesn't the same thing apply going from
stop0 to stop5? You might under estimate the sleep time and lose power
savings and therefore performance there too. Shouldn't we make it
generic for all stop states?

Thanks,
Nick


^ permalink raw reply

* Re: [PATCH 1/1] cpuidle-powernv : forced wakeup for stop lite states
From: Gautham R Shenoy @ 2019-05-08  3:55 UTC (permalink / raw)
  To: Abhishek Goel
  Cc: ego, linux-pm, daniel.lezcano, rjw, linux-kernel, linuxppc-dev,
	dja
In-Reply-To: <20190422063231.51043-2-huntbag@linux.vnet.ibm.com>

Hi Abhishek,

Apologies for this delayed review.

On Mon, Apr 22, 2019 at 01:32:31AM -0500, Abhishek Goel wrote:
> Currently, the cpuidle governors determine what idle state a idling CPU
> should enter into based on heuristics that depend on the idle history on
> that CPU. Given that no predictive heuristic is perfect, there are cases
> where the governor predicts a shallow idle state, hoping that the CPU will
> be busy soon. However, if no new workload is scheduled on that CPU in the
> near future, the CPU will end up in the shallow state.
> 
> In case of POWER, this is problematic, when the predicted state in the
> aforementioned scenario is a lite stop state, as such lite states will
> inhibit SMT folding, thereby depriving the other threads in the core from
> using the core resources.
> 
> So we do not want to get stucked in such states for longer duration. To

s/stucked/stuck

> address this, the cpuidle-core can queue timer to correspond with the

Actually we are queuing the timer in the driver and not the core!

> residency value of the next available state. This timer will forcefully
> wakeup the cpu. Few such iterations will essentially train the governor to
> select a deeper state for that cpu, as the timer here corresponds to the
> next available cpuidle state residency. Cpu will be kicked out of the lite
> state and end up in a non-lite state.


Coming to think of it, this is also the probelm that we have solved
for the snooze state. So, perhaps we can reuse that code to determine
what the timeout value should be for these idle states in which the
CPU shouldn't be remaining for a long time.


> 
> Signed-off-by: Abhishek Goel <huntbag@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/opal-api.h |  1 +
>  drivers/cpuidle/cpuidle-powernv.c   | 71 ++++++++++++++++++++++++++++-
>  2 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index 870fb7b23..735dec731 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -226,6 +226,7 @@
>   */
> 
>  #define OPAL_PM_TIMEBASE_STOP		0x00000002
> +#define OPAL_PM_LOSE_USER_CONTEXT	0x00001000
>  #define OPAL_PM_LOSE_HYP_CONTEXT	0x00002000
>  #define OPAL_PM_LOSE_FULL_CONTEXT	0x00004000
>  #define OPAL_PM_NAP_ENABLED		0x00010000
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index 84b1ebe21..30b877962 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -15,6 +15,7 @@
>  #include <linux/clockchips.h>
>  #include <linux/of.h>
>  #include <linux/slab.h>
> +#include <linux/hrtimer.h>
> 
>  #include <asm/machdep.h>
>  #include <asm/firmware.h>
> @@ -43,6 +44,40 @@ struct stop_psscr_table {
> 
>  static struct stop_psscr_table stop_psscr_table[CPUIDLE_STATE_MAX] __read_mostly;
> 
> +DEFINE_PER_CPU(struct hrtimer, forced_wakeup_timer);
> +
> +static int forced_wakeup_time_compute(struct cpuidle_device *dev,
> +				      struct cpuidle_driver *drv,
> +				      int index)
> +{
> +	int i, timeout_us = 0;
> +
> +	for (i = index + 1; i < drv->state_count; i++) {
> +		if (drv->states[i].disabled || dev->states_usage[i].disable)
> +			continue;
> +		timeout_us = drv->states[i].target_residency +
> +				2 * drv->states[i].exit_latency;
> +		break;
> +	}
> +

This code is similar to the one in get_snooze_timeout(), except for
the inclusion of exit_latency in the timeout. What will we miss if we
won't consider exit_latency?

Could you try to see if you can club the two ?


> +	return timeout_us;
> +}
> +
> +enum hrtimer_restart forced_wakeup_hrtimer_callback(struct hrtimer *hrtimer)
> +{
> +	return HRTIMER_NORESTART;
> +}
> +
> +static void forced_wakeup_timer_init(int cpu, struct cpuidle_driver *drv)
> +{
> +	struct hrtimer *cpu_forced_wakeup_timer = &per_cpu(forced_wakeup_timer,
> +								cpu);
> +
> +	hrtimer_init(cpu_forced_wakeup_timer, CLOCK_MONOTONIC,
> +			HRTIMER_MODE_REL);
> +	cpu_forced_wakeup_timer->function = forced_wakeup_hrtimer_callback;
> +}
> +
>  static u64 default_snooze_timeout __read_mostly;
>  static bool snooze_timeout_en __read_mostly;
> 
> @@ -103,6 +138,28 @@ static int snooze_loop(struct cpuidle_device *dev,
>  	return index;
>  }
> 
> +static int stop_lite_loop(struct cpuidle_device *dev,
> +			  struct cpuidle_driver *drv,
> +			  int index)
> +{
> +	int timeout_us;
> +	struct hrtimer *this_timer = &per_cpu(forced_wakeup_timer, dev->cpu);
> +
> +	timeout_us = forced_wakeup_time_compute(dev, drv, index);
> +
> +	if (timeout_us > 0)
> +		hrtimer_start(this_timer, ns_to_ktime(timeout_us * 1000),
> +						HRTIMER_MODE_REL_PINNED);
> +
> +	power9_idle_type(stop_psscr_table[index].val,
> +			 stop_psscr_table[index].mask);
> +
> +	if (unlikely(hrtimer_is_queued(this_timer)))
> +		hrtimer_cancel(this_timer);
> +
> +	return index;
> +}
> +
>  static int nap_loop(struct cpuidle_device *dev,
>  			struct cpuidle_driver *drv,
>  			int index)
> @@ -190,7 +247,7 @@ static int powernv_cpuidle_cpu_dead(unsigned int cpu)
>   */
>  static int powernv_cpuidle_driver_init(void)
>  {
> -	int idle_state;
> +	int idle_state, cpu;
>  	struct cpuidle_driver *drv = &powernv_idle_driver;
> 
>  	drv->state_count = 0;
> @@ -224,6 +281,9 @@ static int powernv_cpuidle_driver_init(void)
> 
>  	drv->cpumask = (struct cpumask *)cpu_present_mask;
> 
> +	for_each_cpu(cpu, drv->cpumask)
> +		forced_wakeup_timer_init(cpu, drv);
> +
>  	return 0;
>  }
> 
> @@ -299,6 +359,7 @@ static int powernv_add_idle_states(void)
>  	for (i = 0; i < dt_idle_states; i++) {
>  		unsigned int exit_latency, target_residency;
>  		bool stops_timebase = false;
> +		bool lose_user_context = false;
>  		struct pnv_idle_states_t *state = &pnv_idle_states[i];
> 
>  		/*
> @@ -324,6 +385,9 @@ static int powernv_add_idle_states(void)
>  		if (has_stop_states && !(state->valid))
>  				continue;
> 
> +		if (state->flags & OPAL_PM_LOSE_USER_CONTEXT)
> +			lose_user_context = true;
> +
>  		if (state->flags & OPAL_PM_TIMEBASE_STOP)
>  			stops_timebase = true;
> 
> @@ -332,6 +396,11 @@ static int powernv_add_idle_states(void)
>  			add_powernv_state(nr_idle_states, "Nap",
>  					  CPUIDLE_FLAG_NONE, nap_loop,
>  					  target_residency, exit_latency, 0, 0);
> +		} else if (has_stop_states && !lose_user_context) {
> +			add_powernv_state(nr_idle_states, state->name,
> +					  CPUIDLE_FLAG_NONE, stop_lite_loop,
> +					  target_residency, exit_latency,
> +					  state->psscr_val, state->psscr_mask);
>  		} else if (has_stop_states && !stops_timebase) {
>  			add_powernv_state(nr_idle_states, state->name,
>  					  CPUIDLE_FLAG_NONE, stop_loop,
> -- 
> 2.17.1
> 


^ permalink raw reply

* RE: [EXT] [PATCH net-next v2 1/4] net: ethernet: support of_get_mac_address new ERR_PTR error
From: Andy Duan @ 2019-05-08  3:07 UTC (permalink / raw)
  To: Petr Štetiar, netdev@vger.kernel.org, David S. Miller,
	Andreas Larsson, Maxime Ripard, Chen-Yu Tsai, Thor Thayer,
	Florian Fainelli, Doug Berger, Sunil Goutham, Robert Richter,
	Madalin-cristian Bucur, Pantelis Antoniou, Claudiu Manoil, Leo Li,
	Yisen Zhuang, Salil Mehta, Hauke Mehrtens, Sebastian Hesselbarth,
	Thomas Petazzoni, Mirko Lindner, Stephen Hemminger, Felix Fietkau,
	John Crispin, Sean Wang, Nelson Chang, Matthias Brugger,
	Vladimir Zapolskiy, Sylvain Lemieux, Sergei Shtylyov, Byungho An,
	Girish K S, Vipul Pandya, Kunihiko Hayashi, Giuseppe Cavallaro,
	Alexandre Torgue, Jose Abreu, Maxime Coquelin, Grygorii Strashko,
	Wingman Kwok, Murali Karicheri, Michal Simek, Anirudha Sarangi,
	John Linn
  Cc: devel@driverdev.osuosl.org, Andrew Lunn, Greg Kroah-Hartman,
	linuxppc-dev@lists.ozlabs.org, linux-mediatek@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com,
	nios2-dev@lists.rocketboards.org, linux-omap@vger.kernel.org,
	Frank Rowand, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Heiner Kallweit
In-Reply-To: <1557177887-30446-2-git-send-email-ynezz@true.cz>

From: Petr Štetiar <ynezz@true.cz> Sent: Tuesday, May 7, 2019 5:25 AM
> There was NVMEM support added to of_get_mac_address, so it could now
> return ERR_PTR encoded error values, so we need to adjust all current users
> of of_get_mac_address to this new fact.
> 
> While at it, remove superfluous is_valid_ether_addr as the MAC address
> returned from of_get_mac_address is always valid and checked by
> is_valid_ether_addr anyway.
> 
> Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address")
> Signed-off-by: Petr Štetiar <ynezz@true.cz>

Test the patch on i.MX8MQ platform that fix the mem abort issue that is introduced in next-190507.

Tested-by: Fugang Duan <fugang.duan@nxp.com>
> ---
> 
>  This is defacto v5 of the previous 05/10 patch in the series, but since the
>  v4 of this 05/10 patch wasn't picked up by the patchwork for some unknown
> reason, this patch wasn't applied with the other 9 patches in the series, so
> I'm resending it as a separate patch of this fixup series.
> 
>  Changes since v1:
> 
>   * added Fixes: tag
> 
>  Previous changelog (Patch 05/10):
> 
>   Changes since v3:
> 
>    * IS_ERR_OR_NULL -> IS_ERR
> 
>   Changes since v4:
> 
>    * I've just blindly replaced IS_ERR_OR_NULL with IS_ERR, but I've just
>      realized, that in some cases we still need to check for NULL, so I've
>      corrected it in following drivers/files:
> 
>       - broadcom/bgmac-bcma.c
>       - marvell/pxa168_eth.c
>       - samsung/sxgbe/sxgbe_platform.c
>       - stmicro/stmmac/stmmac_main.c
>       - wiznet/w5100.c
>       - ethernet/eth.c
> 
>  drivers/net/ethernet/aeroflex/greth.c                 | 2 +-
>  drivers/net/ethernet/allwinner/sun4i-emac.c           | 2 +-
>  drivers/net/ethernet/altera/altera_tse_main.c         | 2 +-
>  drivers/net/ethernet/arc/emac_main.c                  | 2 +-
>  drivers/net/ethernet/aurora/nb8800.c                  | 2 +-
>  drivers/net/ethernet/broadcom/bcmsysport.c            | 2 +-
>  drivers/net/ethernet/broadcom/bgmac-bcma.c            | 2 +-
>  drivers/net/ethernet/broadcom/bgmac-platform.c        | 2 +-
>  drivers/net/ethernet/broadcom/genet/bcmgenet.c        | 2 +-
>  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c      | 2 +-
>  drivers/net/ethernet/cavium/thunder/thunder_bgx.c     | 2 +-
>  drivers/net/ethernet/davicom/dm9000.c                 | 2 +-
>  drivers/net/ethernet/ethoc.c                          | 2 +-
>  drivers/net/ethernet/ezchip/nps_enet.c                | 2 +-
>  drivers/net/ethernet/freescale/fec_main.c             | 2 +-
>  drivers/net/ethernet/freescale/fec_mpc52xx.c          | 2 +-
>  drivers/net/ethernet/freescale/fman/mac.c             | 2 +-
>  drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c | 2 +-
>  drivers/net/ethernet/freescale/gianfar.c              | 2 +-
>  drivers/net/ethernet/freescale/ucc_geth.c             | 2 +-
>  drivers/net/ethernet/hisilicon/hisi_femac.c           | 2 +-
>  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c         | 2 +-
>  drivers/net/ethernet/lantiq_xrx200.c                  | 2 +-
>  drivers/net/ethernet/marvell/mv643xx_eth.c            | 2 +-
>  drivers/net/ethernet/marvell/mvneta.c                 | 2 +-
>  drivers/net/ethernet/marvell/pxa168_eth.c             | 2 +-
>  drivers/net/ethernet/marvell/sky2.c                   | 2 +-
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c           | 2 +-
>  drivers/net/ethernet/micrel/ks8851.c                  | 2 +-
>  drivers/net/ethernet/micrel/ks8851_mll.c              | 2 +-
>  drivers/net/ethernet/nxp/lpc_eth.c                    | 2 +-
>  drivers/net/ethernet/qualcomm/qca_spi.c               | 2 +-
>  drivers/net/ethernet/qualcomm/qca_uart.c              | 2 +-
>  drivers/net/ethernet/renesas/ravb_main.c              | 2 +-
>  drivers/net/ethernet/renesas/sh_eth.c                 | 2 +-
>  drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c   | 2 +-
>  drivers/net/ethernet/socionext/sni_ave.c              | 2 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c     | 2 +-
>  drivers/net/ethernet/ti/cpsw.c                        | 2 +-
>  drivers/net/ethernet/ti/netcp_core.c                  | 2 +-
>  drivers/net/ethernet/wiznet/w5100.c                   | 2 +-
>  drivers/net/ethernet/xilinx/ll_temac_main.c           | 2 +-
>  drivers/net/ethernet/xilinx/xilinx_axienet_main.c     | 2 +-
>  drivers/net/ethernet/xilinx/xilinx_emaclite.c         | 2 +-
>  net/ethernet/eth.c                                    | 2 +-
>  45 files changed, 45 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/net/ethernet/aeroflex/greth.c
> b/drivers/net/ethernet/aeroflex/greth.c
> index 47e5984..7c5cf02 100644
> --- a/drivers/net/ethernet/aeroflex/greth.c
> +++ b/drivers/net/ethernet/aeroflex/greth.c
> @@ -1459,7 +1459,7 @@ static int greth_of_probe(struct platform_device
> *ofdev)
>                 const u8 *addr;
> 
>                 addr = of_get_mac_address(ofdev->dev.of_node);
> -               if (addr) {
> +               if (!IS_ERR(addr)) {
>                         for (i = 0; i < 6; i++)
>                                 macaddr[i] = (unsigned int) addr[i];
>                 } else {
> diff --git a/drivers/net/ethernet/allwinner/sun4i-emac.c
> b/drivers/net/ethernet/allwinner/sun4i-emac.c
> index e1acafa..37ebd89 100644
> --- a/drivers/net/ethernet/allwinner/sun4i-emac.c
> +++ b/drivers/net/ethernet/allwinner/sun4i-emac.c
> @@ -870,7 +870,7 @@ static int emac_probe(struct platform_device *pdev)
> 
>         /* Read MAC-address from DT */
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
> 
>         /* Check if the MAC address is valid, if not get a random one */ diff
> --git a/drivers/net/ethernet/altera/altera_tse_main.c
> b/drivers/net/ethernet/altera/altera_tse_main.c
> index aa1d1f5..877e67f 100644
> --- a/drivers/net/ethernet/altera/altera_tse_main.c
> +++ b/drivers/net/ethernet/altera/altera_tse_main.c
> @@ -1537,7 +1537,7 @@ static int altera_tse_probe(struct platform_device
> *pdev)
> 
>         /* get default MAC address from device tree */
>         macaddr = of_get_mac_address(pdev->dev.of_node);
> -       if (macaddr)
> +       if (!IS_ERR(macaddr))
>                 ether_addr_copy(ndev->dev_addr, macaddr);
>         else
>                 eth_hw_addr_random(ndev); diff --git
> a/drivers/net/ethernet/arc/emac_main.c
> b/drivers/net/ethernet/arc/emac_main.c
> index ff3d685..7f89ad5 100644
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -960,7 +960,7 @@ int arc_emac_probe(struct net_device *ndev, int
> interface)
>         /* Get MAC address from device tree */
>         mac_addr = of_get_mac_address(dev->of_node);
> 
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
>         else
>                 eth_hw_addr_random(ndev); diff --git
> a/drivers/net/ethernet/aurora/nb8800.c
> b/drivers/net/ethernet/aurora/nb8800.c
> index f62deeb..3c4967e 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -1463,7 +1463,7 @@ static int nb8800_probe(struct platform_device
> *pdev)
>         dev->irq = irq;
> 
>         mac = of_get_mac_address(pdev->dev.of_node);
> -       if (mac)
> +       if (!IS_ERR(mac))
>                 ether_addr_copy(dev->dev_addr, mac);
> 
>         if (!is_valid_ether_addr(dev->dev_addr))
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c
> b/drivers/net/ethernet/broadcom/bcmsysport.c
> index 4e87a30..c623896 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -2505,7 +2505,7 @@ static int bcm_sysport_probe(struct
> platform_device *pdev)
> 
>         /* Initialize netdevice members */
>         macaddr = of_get_mac_address(dn);
> -       if (!macaddr || !is_valid_ether_addr(macaddr)) {
> +       if (IS_ERR(macaddr)) {
>                 dev_warn(&pdev->dev, "using random Ethernet MAC\n");
>                 eth_hw_addr_random(dev);
>         } else {
> diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c
> b/drivers/net/ethernet/broadcom/bgmac-bcma.c
> index 6fe074c..34d1830 100644
> --- a/drivers/net/ethernet/broadcom/bgmac-bcma.c
> +++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
> @@ -132,7 +132,7 @@ static int bgmac_probe(struct bcma_device *core)
>                 mac = of_get_mac_address(bgmac->dev->of_node);
> 
>         /* If no MAC address assigned via device tree, check SPROM */
> -       if (!mac) {
> +       if (IS_ERR_OR_NULL(mac)) {
>                 switch (core->core_unit) {
>                 case 0:
>                         mac = sprom->et0mac; diff --git
> a/drivers/net/ethernet/broadcom/bgmac-platform.c
> b/drivers/net/ethernet/broadcom/bgmac-platform.c
> index 894eda5..6dc0dd9 100644
> --- a/drivers/net/ethernet/broadcom/bgmac-platform.c
> +++ b/drivers/net/ethernet/broadcom/bgmac-platform.c
> @@ -193,7 +193,7 @@ static int bgmac_probe(struct platform_device *pdev)
>         bgmac->dma_dev = &pdev->dev;
> 
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 ether_addr_copy(bgmac->net_dev->dev_addr,
> mac_addr);
>         else
>                 dev_warn(&pdev->dev, "MAC address not present in
> device tree\n"); diff --git
> a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 4fd9735..374b9ff 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -3476,7 +3476,7 @@ static int bcmgenet_probe(struct platform_device
> *pdev)
> 
>         if (dn) {
>                 macaddr = of_get_mac_address(dn);
> -               if (!macaddr) {
> +               if (IS_ERR(macaddr)) {
>                         dev_err(&pdev->dev, "can't find MAC
> address\n");
>                         err = -EINVAL;
>                         goto err;
> diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> index 5359c10..15b1130 100644
> --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> @@ -1503,7 +1503,7 @@ static int octeon_mgmt_probe(struct
> platform_device *pdev)
> 
>         mac = of_get_mac_address(pdev->dev.of_node);
> 
> -       if (mac)
> +       if (!IS_ERR(mac))
>                 memcpy(netdev->dev_addr, mac, ETH_ALEN);
>         else
>                 eth_hw_addr_random(netdev); diff --git
> a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> index 81c281a..a65be85 100644
> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> @@ -1484,7 +1484,7 @@ static int bgx_init_of_phy(struct bgx *bgx)
>                         break;
> 
>                 mac = of_get_mac_address(node);
> -               if (mac)
> +               if (!IS_ERR(mac))
>                         ether_addr_copy(bgx->lmac[lmac].mac, mac);
> 
>                 SET_NETDEV_DEV(&bgx->lmac[lmac].netdev,
> &bgx->pdev->dev); diff --git a/drivers/net/ethernet/davicom/dm9000.c
> b/drivers/net/ethernet/davicom/dm9000.c
> index c2586f4..953ee56 100644
> --- a/drivers/net/ethernet/davicom/dm9000.c
> +++ b/drivers/net/ethernet/davicom/dm9000.c
> @@ -1412,7 +1412,7 @@ static struct dm9000_plat_data
> *dm9000_parse_dt(struct device *dev)
>                 pdata->flags |= DM9000_PLATF_NO_EEPROM;
> 
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(pdata->dev_addr, mac_addr,
> sizeof(pdata->dev_addr));
> 
>         return pdata;
> diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
> index 0f3e7f2..71da049 100644
> --- a/drivers/net/ethernet/ethoc.c
> +++ b/drivers/net/ethernet/ethoc.c
> @@ -1153,7 +1153,7 @@ static int ethoc_probe(struct platform_device
> *pdev)
>                 const void *mac;
> 
>                 mac = of_get_mac_address(pdev->dev.of_node);
> -               if (mac)
> +               if (!IS_ERR(mac))
>                         ether_addr_copy(netdev->dev_addr, mac);
>                 priv->phy_id = -1;
>         }
> diff --git a/drivers/net/ethernet/ezchip/nps_enet.c
> b/drivers/net/ethernet/ezchip/nps_enet.c
> index 659f1ad..b4ce261 100644
> --- a/drivers/net/ethernet/ezchip/nps_enet.c
> +++ b/drivers/net/ethernet/ezchip/nps_enet.c
> @@ -616,7 +616,7 @@ static s32 nps_enet_probe(struct platform_device
> *pdev)
> 
>         /* set kernel MAC address to dev */
>         mac_addr = of_get_mac_address(dev->of_node);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 ether_addr_copy(ndev->dev_addr, mac_addr);
>         else
>                 eth_hw_addr_random(ndev); diff --git
> a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index a96ad20..aa7d4e2 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1655,7 +1655,7 @@ static void fec_get_mac(struct net_device *ndev)
>                 struct device_node *np = fep->pdev->dev.of_node;
>                 if (np) {
>                         const char *mac = of_get_mac_address(np);
> -                       if (mac)
> +                       if (!IS_ERR(mac))
>                                 iap = (unsigned char *) mac;
>                 }
>         }
> diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c
> b/drivers/net/ethernet/freescale/fec_mpc52xx.c
> index c1968b3..7b7e526 100644
> --- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
> +++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
> @@ -902,7 +902,7 @@ static int mpc52xx_fec_probe(struct platform_device
> *op)
>          * First try to read MAC address from DT
>          */
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr) {
> +       if (!IS_ERR(mac_addr)) {
>                 memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
>         } else {
>                 struct mpc52xx_fec __iomem *fec = priv->fec; diff --git
> a/drivers/net/ethernet/freescale/fman/mac.c
> b/drivers/net/ethernet/freescale/fman/mac.c
> index 3c21486..9cd2c28 100644
> --- a/drivers/net/ethernet/freescale/fman/mac.c
> +++ b/drivers/net/ethernet/freescale/fman/mac.c
> @@ -724,7 +724,7 @@ static int mac_probe(struct platform_device
> *_of_dev)
> 
>         /* Get the MAC address */
>         mac_addr = of_get_mac_address(mac_node);
> -       if (!mac_addr) {
> +       if (IS_ERR(mac_addr)) {
>                 dev_err(dev, "of_get_mac_address(%pOF) failed\n",
> mac_node);
>                 err = -EINVAL;
>                 goto _return_of_get_parent; diff --git
> a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
> b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
> index 7c548ed..90ea7a1 100644
> --- a/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
> +++ b/drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
> @@ -1014,7 +1014,7 @@ static int fs_enet_probe(struct platform_device
> *ofdev)
>         spin_lock_init(&fep->tx_lock);
> 
>         mac_addr = of_get_mac_address(ofdev->dev.of_node);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
> 
>         ret = fep->ops->allocate_bd(ndev); diff --git
> a/drivers/net/ethernet/freescale/gianfar.c
> b/drivers/net/ethernet/freescale/gianfar.c
> index 45fcc96..df13c69 100644
> --- a/drivers/net/ethernet/freescale/gianfar.c
> +++ b/drivers/net/ethernet/freescale/gianfar.c
> @@ -872,7 +872,7 @@ static int gfar_of_init(struct platform_device *ofdev,
> struct net_device **pdev)
> 
>         mac_addr = of_get_mac_address(np);
> 
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
> 
>         if (model && !strcasecmp(model, "TSEC")) diff --git
> a/drivers/net/ethernet/freescale/ucc_geth.c
> b/drivers/net/ethernet/freescale/ucc_geth.c
> index eb3e65e..216e99a 100644
> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> @@ -3910,7 +3910,7 @@ static int ucc_geth_probe(struct platform_device*
> ofdev)
>         }
> 
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
> 
>         ugeth->ug_info = ug_info;
> diff --git a/drivers/net/ethernet/hisilicon/hisi_femac.c
> b/drivers/net/ethernet/hisilicon/hisi_femac.c
> index 2c28088..96c32ae 100644
> --- a/drivers/net/ethernet/hisilicon/hisi_femac.c
> +++ b/drivers/net/ethernet/hisilicon/hisi_femac.c
> @@ -870,7 +870,7 @@ static int hisi_femac_drv_probe(struct
> platform_device *pdev)
>                            phy_modes(phy->interface));
> 
>         mac_addr = of_get_mac_address(node);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 ether_addr_copy(ndev->dev_addr, mac_addr);
>         if (!is_valid_ether_addr(ndev->dev_addr)) {
>                 eth_hw_addr_random(ndev); diff --git
> a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
> b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
> index e5d853b..b1cb58f 100644
> --- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
> +++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
> @@ -1229,7 +1229,7 @@ static int hix5hd2_dev_probe(struct
> platform_device *pdev)
>         }
> 
>         mac_addr = of_get_mac_address(node);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 ether_addr_copy(ndev->dev_addr, mac_addr);
>         if (!is_valid_ether_addr(ndev->dev_addr)) {
>                 eth_hw_addr_random(ndev); diff --git
> a/drivers/net/ethernet/lantiq_xrx200.c
> b/drivers/net/ethernet/lantiq_xrx200.c
> index d29104d..cda641e 100644
> --- a/drivers/net/ethernet/lantiq_xrx200.c
> +++ b/drivers/net/ethernet/lantiq_xrx200.c
> @@ -478,7 +478,7 @@ static int xrx200_probe(struct platform_device
> *pdev)
>         }
> 
>         mac = of_get_mac_address(np);
> -       if (mac && is_valid_ether_addr(mac))
> +       if (!IS_ERR(mac))
>                 ether_addr_copy(net_dev->dev_addr, mac);
>         else
>                 eth_hw_addr_random(net_dev); diff --git
> a/drivers/net/ethernet/marvell/mv643xx_eth.c
> b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index 292a668..07e254f 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -2749,7 +2749,7 @@ static int mv643xx_eth_shared_of_add_port(struct
> platform_device *pdev,
>         }
> 
>         mac_addr = of_get_mac_address(pnp);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(ppd.mac_addr, mac_addr, ETH_ALEN);
> 
>         mv643xx_eth_property(pnp, "tx-queue-size", ppd.tx_queue_size);
> diff --git a/drivers/net/ethernet/marvell/mvneta.c
> b/drivers/net/ethernet/marvell/mvneta.c
> index a715277..8186135 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4563,7 +4563,7 @@ static int mvneta_probe(struct platform_device
> *pdev)
>         }
> 
>         dt_mac_addr = of_get_mac_address(dn);
> -       if (dt_mac_addr) {
> +       if (!IS_ERR(dt_mac_addr)) {
>                 mac_from = "device tree";
>                 memcpy(dev->dev_addr, dt_mac_addr, ETH_ALEN);
>         } else {
> diff --git a/drivers/net/ethernet/marvell/pxa168_eth.c
> b/drivers/net/ethernet/marvell/pxa168_eth.c
> index 35f2142..ce037e8 100644
> --- a/drivers/net/ethernet/marvell/pxa168_eth.c
> +++ b/drivers/net/ethernet/marvell/pxa168_eth.c
> @@ -1461,7 +1461,7 @@ static int pxa168_eth_probe(struct
> platform_device *pdev)
>         if (pdev->dev.of_node)
>                 mac_addr = of_get_mac_address(pdev->dev.of_node);
> 
> -       if (mac_addr && is_valid_ether_addr(mac_addr)) {
> +       if (!IS_ERR_OR_NULL(mac_addr)) {
>                 ether_addr_copy(dev->dev_addr, mac_addr);
>         } else {
>                 /* try reading the mac address, if set by the bootloader */
> diff --git a/drivers/net/ethernet/marvell/sky2.c
> b/drivers/net/ethernet/marvell/sky2.c
> index 8b3495e..c4050ec 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -4808,7 +4808,7 @@ static struct net_device *sky2_init_netdev(struct
> sky2_hw *hw, unsigned port,
>          * 2) from internal registers set by bootloader
>          */
>         iap = of_get_mac_address(hw->pdev->dev.of_node);
> -       if (iap)
> +       if (!IS_ERR(iap))
>                 memcpy(dev->dev_addr, iap, ETH_ALEN);
>         else
>                 memcpy_fromio(dev->dev_addr, hw->regs + B2_MAC_1 +
> port * 8, diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 53abe92..f9fbb3f 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -2028,7 +2028,7 @@ static int __init mtk_init(struct net_device *dev)
>         const char *mac_addr;
> 
>         mac_addr = of_get_mac_address(mac->of_node);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 ether_addr_copy(dev->dev_addr, mac_addr);
> 
>         /* If the mac address is invalid, use random mac address  */ diff
> --git a/drivers/net/ethernet/micrel/ks8851.c
> b/drivers/net/ethernet/micrel/ks8851.c
> index 7849119..b44172a 100644
> --- a/drivers/net/ethernet/micrel/ks8851.c
> +++ b/drivers/net/ethernet/micrel/ks8851.c
> @@ -425,7 +425,7 @@ static void ks8851_init_mac(struct ks8851_net *ks)
>         const u8 *mac_addr;
> 
>         mac_addr = of_get_mac_address(ks->spidev->dev.of_node);
> -       if (mac_addr) {
> +       if (!IS_ERR(mac_addr)) {
>                 memcpy(dev->dev_addr, mac_addr, ETH_ALEN);
>                 ks8851_write_mac_addr(dev);
>                 return;
> diff --git a/drivers/net/ethernet/micrel/ks8851_mll.c
> b/drivers/net/ethernet/micrel/ks8851_mll.c
> index c946841..dc76b0d 100644
> --- a/drivers/net/ethernet/micrel/ks8851_mll.c
> +++ b/drivers/net/ethernet/micrel/ks8851_mll.c
> @@ -1327,7 +1327,7 @@ static int ks8851_probe(struct platform_device
> *pdev)
>         /* overwriting the default MAC address */
>         if (pdev->dev.of_node) {
>                 mac = of_get_mac_address(pdev->dev.of_node);
> -               if (mac)
> +               if (!IS_ERR(mac))
>                         memcpy(ks->mac_addr, mac, ETH_ALEN);
>         } else {
>                 struct ks8851_mll_platform_data *pdata; diff --git
> a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
> index 89d1739..da138ed 100644
> --- a/drivers/net/ethernet/nxp/lpc_eth.c
> +++ b/drivers/net/ethernet/nxp/lpc_eth.c
> @@ -1368,7 +1368,7 @@ static int lpc_eth_drv_probe(struct
> platform_device *pdev)
> 
>         if (!is_valid_ether_addr(ndev->dev_addr)) {
>                 const char *macaddr = of_get_mac_address(np);
> -               if (macaddr)
> +               if (!IS_ERR(macaddr))
>                         memcpy(ndev->dev_addr, macaddr,
> ETH_ALEN);
>         }
>         if (!is_valid_ether_addr(ndev->dev_addr))
> diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c
> b/drivers/net/ethernet/qualcomm/qca_spi.c
> index 97f9295..b28360b 100644
> --- a/drivers/net/ethernet/qualcomm/qca_spi.c
> +++ b/drivers/net/ethernet/qualcomm/qca_spi.c
> @@ -966,7 +966,7 @@
> 
>         mac = of_get_mac_address(spi->dev.of_node);
> 
> -       if (mac)
> +       if (!IS_ERR(mac))
>                 ether_addr_copy(qca->net_dev->dev_addr, mac);
> 
>         if (!is_valid_ether_addr(qca->net_dev->dev_addr)) { diff --git
> a/drivers/net/ethernet/qualcomm/qca_uart.c
> b/drivers/net/ethernet/qualcomm/qca_uart.c
> index db6068c..5906168 100644
> --- a/drivers/net/ethernet/qualcomm/qca_uart.c
> +++ b/drivers/net/ethernet/qualcomm/qca_uart.c
> @@ -351,7 +351,7 @@ static int qca_uart_probe(struct serdev_device
> *serdev)
> 
>         mac = of_get_mac_address(serdev->dev.of_node);
> 
> -       if (mac)
> +       if (!IS_ERR(mac))
>                 ether_addr_copy(qca->net_dev->dev_addr, mac);
> 
>         if (!is_valid_ether_addr(qca->net_dev->dev_addr)) { diff --git
> a/drivers/net/ethernet/renesas/ravb_main.c
> b/drivers/net/ethernet/renesas/ravb_main.c
> index 9618c48..d3ffcf5 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -111,7 +111,7 @@ static void ravb_set_buffer_align(struct sk_buff *skb)
>   */
>  static void ravb_read_mac_address(struct net_device *ndev, const u8 *mac)
> {
> -       if (mac) {
> +       if (!IS_ERR(mac)) {
>                 ether_addr_copy(ndev->dev_addr, mac);
>         } else {
>                 u32 mahr = ravb_read(ndev, MAHR); diff --git
> a/drivers/net/ethernet/renesas/sh_eth.c
> b/drivers/net/ethernet/renesas/sh_eth.c
> index e33af37..4d4be66 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -3193,7 +3193,7 @@ static struct sh_eth_plat_data
> *sh_eth_parse_dt(struct device *dev)
>         pdata->phy_interface = ret;
> 
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 memcpy(pdata->mac_addr, mac_addr, ETH_ALEN);
> 
>         pdata->no_ether_link =
> diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
> b/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
> index fbd00cb..d2bc941 100644
> --- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
> +++ b/drivers/net/ethernet/samsung/sxgbe/sxgbe_platform.c
> @@ -124,7 +124,7 @@ static int sxgbe_platform_probe(struct
> platform_device *pdev)
>         }
> 
>         /* Get MAC address if available (DT) */
> -       if (mac)
> +       if (!IS_ERR_OR_NULL(mac))
>                 ether_addr_copy(priv->dev->dev_addr, mac);
> 
>         /* Get the TX/RX IRQ numbers */
> diff --git a/drivers/net/ethernet/socionext/sni_ave.c
> b/drivers/net/ethernet/socionext/sni_ave.c
> index bb6d5fb..51a7b48 100644
> --- a/drivers/net/ethernet/socionext/sni_ave.c
> +++ b/drivers/net/ethernet/socionext/sni_ave.c
> @@ -1599,7 +1599,7 @@ static int ave_probe(struct platform_device *pdev)
>         ndev->max_mtu = AVE_MAX_ETHFRAME - (ETH_HLEN +
> ETH_FCS_LEN);
> 
>         mac_addr = of_get_mac_address(np);
> -       if (mac_addr)
> +       if (!IS_ERR(mac_addr))
>                 ether_addr_copy(ndev->dev_addr, mac_addr);
> 
>         /* if the mac address is invalid, use random mac address */ diff --git
> a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 5ab2733..5678b86 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4262,7 +4262,7 @@ int stmmac_dvr_probe(struct device *device,
>         priv->wol_irq = res->wol_irq;
>         priv->lpi_irq = res->lpi_irq;
> 
> -       if (res->mac)
> +       if (!IS_ERR_OR_NULL(res->mac))
>                 memcpy(priv->dev->dev_addr, res->mac, ETH_ALEN);
> 
>         dev_set_drvdata(device, priv->dev); diff --git
> a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index
> e376806..b18eeb0 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -2232,7 +2232,7 @@ static int cpsw_probe_dt(struct
> cpsw_platform_data *data,
> 
>  no_phy_slave:
>                 mac_addr = of_get_mac_address(slave_node);
> -               if (mac_addr) {
> +               if (!IS_ERR(mac_addr)) {
>                         memcpy(slave_data->mac_addr, mac_addr,
> ETH_ALEN);
>                 } else {
>                         ret = ti_cm_get_macid(&pdev->dev, i, diff --git
> a/drivers/net/ethernet/ti/netcp_core.c
> b/drivers/net/ethernet/ti/netcp_core.c
> index 01d4ca3..6428439 100644
> --- a/drivers/net/ethernet/ti/netcp_core.c
> +++ b/drivers/net/ethernet/ti/netcp_core.c
> @@ -2037,7 +2037,7 @@ static int netcp_create_interface(struct
> netcp_device *netcp_device,
>                 devm_release_mem_region(dev, res.start, size);
>         } else {
>                 mac_addr = of_get_mac_address(node_interface);
> -               if (mac_addr)
> +               if (!IS_ERR(mac_addr))
>                         ether_addr_copy(ndev->dev_addr, mac_addr);
>                 else
>                         eth_random_addr(ndev->dev_addr); diff --git
> a/drivers/net/ethernet/wiznet/w5100.c
> b/drivers/net/ethernet/wiznet/w5100.c
> index d8ba512..b0052933 100644
> --- a/drivers/net/ethernet/wiznet/w5100.c
> +++ b/drivers/net/ethernet/wiznet/w5100.c
> @@ -1164,7 +1164,7 @@ int w5100_probe(struct device *dev, const struct
> w5100_ops *ops,
>         INIT_WORK(&priv->setrx_work, w5100_setrx_work);
>         INIT_WORK(&priv->restart_work, w5100_restart_work);
> 
> -       if (mac_addr)
> +       if (!IS_ERR_OR_NULL(mac_addr))
>                 memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
>         else
>                 eth_hw_addr_random(ndev); diff --git
> a/drivers/net/ethernet/xilinx/ll_temac_main.c
> b/drivers/net/ethernet/xilinx/ll_temac_main.c
> index 9851991..f389a81 100644
> --- a/drivers/net/ethernet/xilinx/ll_temac_main.c
> +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c
> @@ -1252,7 +1252,7 @@ static int temac_probe(struct platform_device
> *pdev)
>         if (temac_np) {
>                 /* Retrieve the MAC address */
>                 addr = of_get_mac_address(temac_np);
> -               if (!addr) {
> +               if (IS_ERR(addr)) {
>                         dev_err(&pdev->dev, "could not find MAC
> address\n");
>                         return -ENODEV;
>                 }
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> index 4041c75..108fbc7 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> @@ -1596,7 +1596,7 @@ static int axienet_probe(struct platform_device
> *pdev)
> 
>         /* Retrieve the MAC address */
>         mac_addr = of_get_mac_address(pdev->dev.of_node);
> -       if (!mac_addr) {
> +       if (IS_ERR(mac_addr)) {
>                 dev_err(&pdev->dev, "could not find MAC address\n");
>                 goto free_netdev;
>         }
> diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> index fc38692..6911707 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
> @@ -1165,7 +1165,7 @@ static int xemaclite_of_probe(struct
> platform_device *ofdev)
>         lp->rx_ping_pong = get_bool(ofdev, "xlnx,rx-ping-pong");
>         mac_address = of_get_mac_address(ofdev->dev.of_node);
> 
> -       if (mac_address) {
> +       if (!IS_ERR(mac_address)) {
>                 /* Set the MAC address. */
>                 memcpy(ndev->dev_addr, mac_address, ETH_ALEN);
>         } else {
> diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index fddcee3..4b2b222
> 100644
> --- a/net/ethernet/eth.c
> +++ b/net/ethernet/eth.c
> @@ -560,7 +560,7 @@ int eth_platform_get_mac_address(struct device
> *dev, u8 *mac_addr)
>         addr = NULL;
>         if (dp)
>                 addr = of_get_mac_address(dp);
> -       if (!addr)
> +       if (IS_ERR_OR_NULL(addr))
>                 addr = arch_get_platform_mac_address();
> 
>         if (!addr)
> --
> 1.9.1


^ permalink raw reply

* RE: [EXT] Re: [PATCH v1] timer:clock:ptp: add support the dynamic posix clock alarm set for ptp
From: Po Liu @ 2019-05-08  3:30 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Roy Zang, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Leo Li, Claudiu Manoil, Mingkai Hu, Y.b. Lu,
	linuxppc-dev@lists.ozlabs.org, davem@davemloft.net,
	linux-arm-kernel@lists.infradead.org, deepa.kernel@gmail.com
In-Reply-To: <20190507134952.uqqxmhinv75actbh@localhost>

Hi Richard,

Thank you for your reply.


Br,
Po Liu

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: 2019年5月7日 21:50
> To: Po Liu <po.liu@nxp.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org; linux-arm-kernel@lists.infradead.org; Y.b. Lu
> <yangbo.lu@nxp.com>; Claudiu Manoil <claudiu.manoil@nxp.com>;
> davem@davemloft.net; Leo Li <leoyang.li@nxp.com>; Roy Zang
> <roy.zang@nxp.com>; Mingkai Hu <mingkai.hu@nxp.com>;
> deepa.kernel@gmail.com
> Subject: [EXT] Re: [PATCH v1] timer:clock:ptp: add support the dynamic posix
> clock alarm set for ptp
> 
> Caution: EXT Email
> 
> On Sun, May 05, 2019 at 05:02:05AM +0000, Po Liu wrote:
> > Current kernel code do not support the dynamic posix clock alarm set.
> > This code would support it by the posix timer structure.
> >
> > 319  const struct k_clock clock_posix_dynamic = {
> >
> > 320         .clock_getres   = pc_clock_getres,
> > 321         .clock_set      = pc_clock_settime,
> > 322         .clock_get      = pc_clock_gettime,
> > 323         .clock_adj      = pc_clock_adjtime,
> > 324 +       .timer_create   = pc_timer_create,
> > 325 +       .timer_del      = pc_timer_delete,
> > 326 +       .timer_set      = pc_timer_set,
> > 327 +       .timer_arm      = pc_timer_arm,
> > }
> >
> 
> Sorry, NAK, since we decided some time ago not to support timer_* operations
> on dynamic clocks.  You get much better application level timer performance
> by synchronizing CLOCK_REALTIME to your PHC and using clock_nanosleep()
> with CLOCK_REALTIME or CLOCK_MONOTONIC.

The code intend to get alarm by interrupt of ptp hardware. The code to fix ptp not support to application layer to get the alarm interrupt. 
Do you mean the synchronizing at application layer by PHC (using clock_nanosleep()) to the CLOCK_REALTIME source? Then the kernel could using the hrtimer with CLOCK_REALTIME?

> 
> > This won't change the user space system call code. Normally the user
> > space set alarm by timer_create() and timer_settime(). Reference code
> > are tools/testing/selftests/ptp/testptp.c.
> 
> That program still has misleading examples.  Sorry about that.  I'll submit a
> patch to remove them.

Is there any replace method for an application code to get alarm interrupt by the ptp source?

> 
> > +static int pc_timer_create(struct k_itimer *new_timer) {
> > +     return 0;
> > +}
> > +
> 
> This of course would never work.  Consider what happens when two or more
> timers are created and armed.
> 
> Thanks,
> Richard

^ permalink raw reply

* [PATCH] powerpc: Fix compile issue with force DAWR
From: Michael Neuling @ 2019-05-08  1:30 UTC (permalink / raw)
  To: mpe; +Cc: mikey, linuxppc-dev

If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
  arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable'

This was caused by this recent patch:
   commit c1fe190c06723322f2dfac31d3b982c581e434ef
   Author: Michael Neuling <mikey@neuling.org>
   powerpc: Add force enable of DAWR on P9 option

This builds dawr_force_enable in always via a new file.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/kernel/Makefile        |  2 +-
 arch/powerpc/kernel/dawr.c          | 11 +++++++++++
 arch/powerpc/kernel/hw_breakpoint.c |  3 ---
 3 files changed, 12 insertions(+), 4 deletions(-)
 create mode 100644 arch/powerpc/kernel/dawr.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0ea6c4aa3a..48a20ef5be 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -49,7 +49,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
 				   signal.o sysfs.o cacheinfo.o time.o \
 				   prom.o traps.o setup-common.o \
 				   udbg.o misc.o io.o misc_$(BITS).o \
-				   of_platform.o prom_parse.o
+				   of_platform.o prom_parse.o dawr.o
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
 				   paca.o nvram_64.o firmware.o
diff --git a/arch/powerpc/kernel/dawr.c b/arch/powerpc/kernel/dawr.c
new file mode 100644
index 0000000000..ca343efd23
--- /dev/null
+++ b/arch/powerpc/kernel/dawr.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// DAWR global variables
+//
+// Copyright 2019, Michael Neuling, IBM Corporation.
+
+#include <linux/types.h>
+#include <linux/export.h>
+
+bool dawr_force_enable;
+EXPORT_SYMBOL_GPL(dawr_force_enable);
diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index da307dd93e..78a17454f4 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -381,9 +381,6 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
 	/* TODO */
 }
 
-bool dawr_force_enable;
-EXPORT_SYMBOL_GPL(dawr_force_enable);
-
 static ssize_t dawr_write_file_bool(struct file *file,
 				    const char __user *user_buf,
 				    size_t count, loff_t *ppos)
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] powerpc/ftrace: Enable C Version of recordmcount
From: Michael Ellerman @ 2019-05-08  0:45 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4464516c0b6835b42acc65e088b6d7f88fe886f2.1557235811.git.christophe.leroy@c-s.fr>

Christophe Leroy <christophe.leroy@c-s.fr> writes:
> Selects HAVE_C_RECORDMCOUNT to use the C version of the recordmcount
> intead of the old Perl Version of recordmcount.
>
> This should improve build time. It also seems like the old Perl Version
> misses some calls to _mcount that the C version finds.

That would make this a bug fix possibly for stable even.

Any more details on what the difference is, is it just some random
subset of functions or some pattern?

cheers

> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2711aac24621..d87de4f9da61 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -180,6 +180,7 @@ config PPC
>  	select HAVE_ARCH_NVRAM_OPS
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_TRACEHOOK
> +	select HAVE_C_RECORDMCOUNT
>  	select HAVE_CBPF_JIT			if !PPC64
>  	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
>  	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
> -- 
> 2.13.3

^ permalink raw reply

* Re: [PATCH] powerpc: restore current_thread_info()
From: Michael Ellerman @ 2019-05-08  0:42 UTC (permalink / raw)
  To: Yury Norov, Al Viro
  Cc: Yury Norov, linux-kernel, Nicholas Piggin, Paul Mackerras,
	Breno Leitao, linuxppc-dev
In-Reply-To: <20190507230746.GA19259@yury-thinkpad>

Yury Norov <yury.norov@gmail.com> writes:
> On Tue, May 07, 2019 at 11:58:56PM +0100, Al Viro wrote:
>> On Tue, May 07, 2019 at 03:51:21PM -0700, Yury Norov wrote:
>> > Commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
>> > removes the function current_thread_info(). It's wrong because the
>> > function is used in non-arch code and is part of API.
>> 
>> In include/linux/thread_info.h:
>> 
>> #ifdef CONFIG_THREAD_INFO_IN_TASK
>> /*
>>  * For CONFIG_THREAD_INFO_IN_TASK kernels we need <asm/current.h> for the
>>  * definition of current, but for !CONFIG_THREAD_INFO_IN_TASK kernels,
>>  * including <asm/current.h> can cause a circular dependency on some platforms.
>>  */
>> #include <asm/current.h>
>> #define current_thread_info() ((struct thread_info *)current)
>> #endif
>
> Ah, sorry. Then it might be my rebase issue. I was confused because Christophe
> didn't remove the comment to current_thread_info(), so I decided he
> removed it erroneously.

Yeah you're right, that comment should have been removed too.

cheers

^ permalink raw reply

* Re: [PATCH v2 8/8] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section
From: Dan Williams @ 2019-05-08  0:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-s390, linux-ia64, Linux-sh, Linux Kernel Mailing List,
	Linux MM, Andrew Morton, linuxppc-dev
In-Reply-To: <20190507183804.5512-9-david@redhat.com>

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> Unused, and memory unplug path should never care about zones. This is
> the job of memory offlining. ZONE_DEVICE might require special care -
> the caller of arch_remove_memory() should handle this.

The ZONE_DEVICE usage does not require special care so you can drop
that comment. The only place it's used in the subsection patches is to
lookup the node-id, but it turns out that the resulting node-id is
then never used.

With the ZONE_DEVICE mention dropped out of changelog you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply

* Re: [PATCH v2 12/16] powerpc/powernv: export /proc/opalcore for analysing opal crashes
From: Nicholas Piggin @ 2019-05-08  0:25 UTC (permalink / raw)
  To: Hari Bathini, linuxppc-dev
  Cc: Ananth N Mavinakayanahalli, Mahesh J Salgaonkar, Vasant Hegde,
	Stewart Smith, Daniel Axtens
In-Reply-To: <155541094306.812.15406965699701118702.stgit@hbathini.in.ibm.com>

Hari Bathini's on April 16, 2019 8:35 pm:
> From: Hari Bathini <hbathini@linux.vnet.ibm.com>
> 
> Export /proc/opalcore file to analyze opal crashes. Since opalcore can
> be generated independent of CONFIG_FA_DUMP support in kernel, add this
> support under a new kernel config option CONFIG_OPAL_CORE. Also, avoid
> code duplication by moving common code used for processing the register
> state data to export /proc/vmcore and/or /proc/opalcore file(s).

Can this go in /sys/firmware/opal/core or similar?

Thanks,
Nick


^ permalink raw reply

* Re: [PATCH v2 7/8] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail
From: Dan Williams @ 2019-05-08  0:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-s390, linux-ia64, Linux-sh, Greg Kroah-Hartman,
	Rafael J. Wysocki, Linux Kernel Mailing List, Chris Wilson,
	Linux MM, Mark Brown, Jonathan Cameron, Alex Deucher,
	Andrew Morton, linuxppc-dev, David S. Miller, Oscar Salvador
In-Reply-To: <20190507183804.5512-8-david@redhat.com>

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> We really don't want anything during memory hotunplug to fail.
> We always pass a valid memory block device, that check can go. Avoid
> allocating memory and eventually failing. As we are always called under
> lock, we can use a static piece of memory. This avoids having to put
> the structure onto the stack, having to guess about the stack size
> of callers.
>
> Patch inspired by a patch from Oscar Salvador.
>
> In the future, there might be no need to iterate over nodes at all.
> mem->nid should tell us exactly what to remove. Memory block devices
> with mixed nodes (added during boot) should properly fenced off and never
> removed.
>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/node.c  | 18 +++++-------------
>  include/linux/node.h |  5 ++---
>  2 files changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 04fdfa99b8bc..9be88fd05147 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>
>  /*
>   * Unregister memory block device under all nodes that it spans.
> + * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).

Given this comment can bitrot relative to the implementation lets
instead add an explicit:

    lockdep_assert_held(&mem_sysfs_mutex);

With that you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply

* IPIC_SERMR vs IPIC_SERMR
From: Rodriguez Quesada, Pablo @ 2019-05-07 21:50 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev@lists.ozlabs.org
  Cc: Ramirez Rojas, Luis Daniel (Daniel)

[-- Attachment #1: Type: text/plain, Size: 3809 bytes --]

Hi, Christophe and everyone from the LinuxPPC community,

Hope you are doing well!
I was working with Daniel, and we were reviewing the ipic source code and came to these questions:


1.       The first one came up while reviewing the document "Understanding IPIC" from NXP (*). The document states that:

System Error Status Register (SERSR) - 0x40: Each bit in the SERSR register corresponds to a non-maskable error source that generates a machine check exception (MCP). These bits are cleared by writing the bit location with a logic 1.

The interrupting source must be cleared before clearing the interrupts error status bit.

System Error Mask Register (SERMR) - 0x44 When a machine check exception is signaled from one of the peripheral modules, its corresponding flag bit in this register is set. Any of the interrupting sources can be individually masked.

We noticed that you changed the ipic_clear_mcp_status  function by swapping SERMR to SERSR, but we don't fully understand why. Could you please elaborate on why this change is made? This because we had a code that was working before the patch and we would fully understand why it stopped working. From our perspective, both SERSR and SERMR are related to MCEs and honestly don't know what's the difference at the hardware level, and the role of every register on the machine check flow. It is also weird for us that before your patch the register SERSR wasn't used in any part of the Linux kernel.


2.       In commit 8acb88682cc00a41a677c2455a7c992d you removed ipic_set_highest_priority(), ipic_enable_mcp() and ipic_disable_mcp()because it wasn't used but here we are confused because ipic_clear_mcp_status() sets all bits with the mask and in the old disable function the bits from the mask are cleared with a bitwise AND,changing the other bits from the SERMR register but the mcp_irq.

ipic_clear:
ipic_write<https://elixir.bootlin.com/linux/v4.1.13/ident/ipic_write>(primary_ipic<https://elixir.bootlin.com/linux/v4.1.13/ident/primary_ipic>->regs, IPIC_SERMR<https://elixir.bootlin.com/linux/v4.1.13/ident/IPIC_SERMR>, mask);
ipic_disable:
                                temp<https://elixir.bootlin.com/linux/v4.1.13/ident/temp> = ipic_read<https://elixir.bootlin.com/linux/v4.1.13/ident/ipic_read>(ipic<https://elixir.bootlin.com/linux/v4.1.13/ident/ipic>->regs, IPIC_SERMR<https://elixir.bootlin.com/linux/v4.1.13/ident/IPIC_SERMR>);
                                temp<https://elixir.bootlin.com/linux/v4.1.13/ident/temp> &= (1 << (31 - mcp_irq));
                                ipic_write<https://elixir.bootlin.com/linux/v4.1.13/ident/ipic_write>(ipic<https://elixir.bootlin.com/linux/v4.1.13/ident/ipic>->regs, IPIC_SERMR<https://elixir.bootlin.com/linux/v4.1.13/ident/IPIC_SERMR>, temp<https://elixir.bootlin.com/linux/v4.1.13/ident/temp>);

                Isn't ipic_clear a set function instead of a clear function? This bring us confusion
Another question is why these functions were created in the first place?


3.       In the NXP document it states that: These bits are cleared by writing the bit location with a logic 1. The interrupting source must be cleared before clearing the interrupts error status bit.
Does this mean that these registers work with negative logic? How is this managed in the kernel?
The interrupting source is the SERMR register, the SERSR or any other?


4.       What is the real difference between MCP and MCE? What are their uses?


I know it is a lot of information but we are very confused about the use of these registers and we noticed that you are very involved on the PPC development. So we would like to hear from an expert.

Thank you very much for your time,
Pablo






(*) https://www.nxp.com/docs/en/application-note/AN3797.pdf

[-- Attachment #2: Type: text/html, Size: 13255 bytes --]

^ permalink raw reply

* Re: [PATCH] powerpc: restore current_thread_info()
From: Yury Norov @ 2019-05-07 23:07 UTC (permalink / raw)
  To: Al Viro
  Cc: Yury Norov, linux-kernel, Nicholas Piggin, Paul Mackerras,
	Breno Leitao, linuxppc-dev
In-Reply-To: <20190507225856.GP23075@ZenIV.linux.org.uk>

On Tue, May 07, 2019 at 11:58:56PM +0100, Al Viro wrote:
> On Tue, May 07, 2019 at 03:51:21PM -0700, Yury Norov wrote:
> > Commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
> > removes the function current_thread_info(). It's wrong because the
> > function is used in non-arch code and is part of API.
> 
> In include/linux/thread_info.h:
> 
> #ifdef CONFIG_THREAD_INFO_IN_TASK
> /*
>  * For CONFIG_THREAD_INFO_IN_TASK kernels we need <asm/current.h> for the
>  * definition of current, but for !CONFIG_THREAD_INFO_IN_TASK kernels,
>  * including <asm/current.h> can cause a circular dependency on some platforms.
>  */
> #include <asm/current.h>
> #define current_thread_info() ((struct thread_info *)current)
> #endif

Ah, sorry. Then it might be my rebase issue. I was confused because Christophe
didn't remove the comment to current_thread_info(), so I decided he
removed it erroneously.

^ permalink raw reply

* Re: [PATCH] powerpc: restore current_thread_info()
From: Al Viro @ 2019-05-07 22:58 UTC (permalink / raw)
  To: Yury Norov
  Cc: Yury Norov, linux-kernel, Nicholas Piggin, Paul Mackerras,
	Breno Leitao, linuxppc-dev
In-Reply-To: <20190507225121.18676-1-ynorov@marvell.com>

On Tue, May 07, 2019 at 03:51:21PM -0700, Yury Norov wrote:
> Commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
> removes the function current_thread_info(). It's wrong because the
> function is used in non-arch code and is part of API.

In include/linux/thread_info.h:

#ifdef CONFIG_THREAD_INFO_IN_TASK
/*
 * For CONFIG_THREAD_INFO_IN_TASK kernels we need <asm/current.h> for the
 * definition of current, but for !CONFIG_THREAD_INFO_IN_TASK kernels,
 * including <asm/current.h> can cause a circular dependency on some platforms.
 */
#include <asm/current.h>
#define current_thread_info() ((struct thread_info *)current)
#endif


^ permalink raw reply

* [PATCH] powerpc: restore current_thread_info()
From: Yury Norov @ 2019-05-07 22:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Breno Leitao, linuxppc-dev,
	linux-kernel
  Cc: Yury Norov, Yury Norov

Commit ed1cd6deb013 ("powerpc: Activate CONFIG_THREAD_INFO_IN_TASK")
removes the function current_thread_info(). It's wrong because the
function is used in non-arch code and is part of API.

For my series of arm64/ilp32, after applying the patch
https://github.com/norov/linux/commit/b269e51eee66ffec3008a3effb12363b91754e49
it causes build break.

This patch restores current_thread_info().

Signed-off-by: Yury Norov <ynorov@marvell.com>
---
 arch/powerpc/include/asm/thread_info.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 8e1d0195ac36..f700bc80a607 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -19,6 +19,7 @@
 
 #ifndef __ASSEMBLY__
 #include <linux/cache.h>
+#include <asm/current.h>
 #include <asm/processor.h>
 #include <asm/page.h>
 #include <asm/accounting.h>
@@ -57,6 +58,11 @@ struct thread_info {
 #define THREAD_SIZE_ORDER	(THREAD_SHIFT - PAGE_SHIFT)
 
 /* how to get the thread information struct from C */
+static inline struct thread_info *current_thread_info(void)
+{
+	return (struct thread_info *)current;
+}
+
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
 
 #ifdef CONFIG_PPC_BOOK3S_64
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
From: David Hildenbrand @ 2019-05-07 21:27 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, Linux-sh,
	mike.travis@hpe.com, Greg Kroah-Hartman, Rafael J. Wysocki,
	Mathieu Malaterre, Linux Kernel Mailing List, Wei Yang, Linux MM,
	Andrew Banman, Qian Cai, Arun KS, Andrew Morton, linuxppc-dev,
	Ingo Molnar, Oscar Salvador
In-Reply-To: <CAPcyv4jiVyaPbUrQwSiy65xk=EegJwuGSDKkVYWkGiTJz847gg@mail.gmail.com>

>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +       BUG_ON(memory->dev.bus != &memory_subsys);
> 
> Given this should never happen and only a future kernel developer
> might trip over it, do we really need to kill that developer's
> machine? I.e. s/BUG/WARN/? I guess an argument can be made to move
> such a change that to a follow-on patch since you're just preserving
> existing behavior, but I figure might as well address these as the
> code is refactored.

I assume only

if (WARN ...)
	return;

makes sense then, right?

> 
>> +
>> +       /* drop the ref. we got via find_memory_block() */
>> +       put_device(&memory->dev);
>> +       device_unregister(&memory->dev);
>> +}
>> +
>>  /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>   */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int hotplug_memory_register(unsigned long start, unsigned long size)
>>  {
>> -       int ret = 0;
>> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
>> +       unsigned long start_pfn = PFN_DOWN(start);
>> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>> +       unsigned long pfn;
>>         struct memory_block *mem;
>> +       int ret = 0;
>>
>> -       mutex_lock(&mem_sysfs_mutex);
>> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
>> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));
> 
> Perhaps:
> 
>     if (WARN_ON(...))
>         return -EINVAL;
> 

Yes, guess this souldn't hurt.

>>
>> -       mem = find_memory_block(section);
>> -       if (mem) {
>> -               mem->section_count++;
>> -               put_device(&mem->dev);
>> -       } else {
>> -               ret = init_memory_block(&mem, section, MEM_OFFLINE);
>> +       mutex_lock(&mem_sysfs_mutex);
>> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
>> +               mem = find_memory_block(__pfn_to_section(pfn));
>> +               if (mem) {
>> +                       WARN_ON_ONCE(false);
> 
> ?? Isn't that a nop?

Yes, that makes no sense :)

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v2 6/8] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()
From: Dan Williams @ 2019-05-07 21:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, linux-ia64, Linux-sh, Chris Wilson, Linux MM,
	Arun KS, Ingo Molnar, linux-s390, Rafael J. Wysocki,
	Pavel Tatashin, mike.travis@hpe.com, Mark Brown, Jonathan Cameron,
	Oscar Salvador, Andrew Banman, Mathieu Malaterre,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Alex Deucher,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20190507183804.5512-7-david@redhat.com>

On Tue, May 7, 2019 at 11:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's factor out removing of memory block devices, which is only
> necessary for memory added via add_memory() and friends that created
> memory block devices. Remove the devices before calling
> arch_remove_memory().
>
> This finishes factoring out memory block device handling from
> arch_add_memory() and arch_remove_memory().

Also nice! makes it easier in the future for the "device-memory" use
case to not avoid messing up the typical memory hotplug flow.

>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 39 +++++++++++++++++++--------------------
>  drivers/base/node.c    | 11 ++++++-----
>  include/linux/memory.h |  2 +-
>  include/linux/node.h   |  6 ++----
>  mm/memory_hotplug.c    |  5 +++--
>  5 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 862c202a18ca..47ff49058d1f 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -756,32 +756,31 @@ int hotplug_memory_register(unsigned long start, unsigned long size)
>         return ret;
>  }
>
> -static int remove_memory_section(struct mem_section *section)
> +/*
> + * Remove memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * have to be offline.
> + */
> +void hotplug_memory_unregister(unsigned long start, unsigned long size)
>  {
> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +       unsigned long start_pfn = PFN_DOWN(start);
> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
>         struct memory_block *mem;
> +       unsigned long pfn;
>
> -       if (WARN_ON_ONCE(!present_section(section)))
> -               return;
> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

Similar BUG_ON vs comments WARN_ON comments as the previous patch.

>
>         mutex_lock(&mem_sysfs_mutex);
> -
> -       /*
> -        * Some users of the memory hotplug do not want/need memblock to
> -        * track all sections. Skip over those.
> -        */
> -       mem = find_memory_block(section);
> -       if (!mem)
> -               goto out_unlock;
> -
> -       unregister_mem_sect_under_nodes(mem, __section_nr(section));
> -
> -       mem->section_count--;
> -       if (mem->section_count == 0)
> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +               mem = find_memory_block(__pfn_to_section(pfn));
> +               if (!mem)
> +                       continue;
> +               mem->section_count = 0;
> +               unregister_memory_block_under_nodes(mem);
>                 unregister_memory(mem);
> -       else
> -               put_device(&mem->dev);
> -
> -out_unlock:
> +       }
>         mutex_unlock(&mem_sysfs_mutex);
>  }
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 8598fcbd2a17..04fdfa99b8bc 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -801,9 +801,10 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, void *arg)
>         return 0;
>  }
>
> -/* unregister memory section under all nodes that it spans */
> -int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                   unsigned long phys_index)
> +/*
> + * Unregister memory block device under all nodes that it spans.
> + */
> +int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>         NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>         unsigned long pfn, sect_start_pfn, sect_end_pfn;
> @@ -816,8 +817,8 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
>                 return -ENOMEM;
>         nodes_clear(*unlinked_nodes);
>
> -       sect_start_pfn = section_nr_to_pfn(phys_index);
> -       sect_end_pfn = sect_start_pfn + PAGES_PER_SECTION - 1;
> +       sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> +       sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>         for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>                 int nid;
>
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 95505fbb5f85..aa236c2a0466 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,7 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(unsigned long start, unsigned long size);
> -extern void unregister_memory_section(struct mem_section *);
> +void hotplug_memory_unregister(unsigned long start, unsigned long size);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 1a557c589ecb..02a29e71b175 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -139,8 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
>  extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>                                                 void *arg);
> -extern int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                          unsigned long phys_index);
> +extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>
>  extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>                                                    unsigned int cpu_nid,
> @@ -176,8 +175,7 @@ static inline int register_mem_sect_under_node(struct memory_block *mem_blk,
>  {
>         return 0;
>  }
> -static inline int unregister_mem_sect_under_nodes(struct memory_block *mem_blk,
> -                                                 unsigned long phys_index)
> +static inline int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>  {
>         return 0;
>  }
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 107f72952347..527fe4f9c620 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -519,8 +519,6 @@ static void __remove_section(struct zone *zone, struct mem_section *ms,
>         if (WARN_ON_ONCE(!valid_section(ms)))
>                 return;
>
> -       unregister_memory_section(ms);
> -
>         scn_nr = __section_nr(ms);
>         start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>         __remove_zone(zone, start_pfn);
> @@ -1844,6 +1842,9 @@ void __ref __remove_memory(int nid, u64 start, u64 size)
>         memblock_free(start, size);
>         memblock_remove(start, size);
>
> +       /* remove memory block devices before removing memory */
> +       hotplug_memory_unregister(start, size);
> +
>         arch_remove_memory(nid, start, size, NULL);
>         __release_memory_resource(start, size);

Other than the BUG_ON concern you can add

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
From: Dan Williams @ 2019-05-07 21:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
	Pavel Tatashin, Linux-sh, Mathieu Malaterre,
	Linux Kernel Mailing List, Wei Yang, Linux MM, Qian Cai, Arun KS,
	Andrew Morton, linuxppc-dev, Joonsoo Kim
In-Reply-To: <d83fec16-ceff-2f6f-72e1-48996187d5ba@redhat.com>

On Tue, May 7, 2019 at 2:24 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 23:19, Dan Williams wrote:
> > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> No longer needed, the callers of arch_add_memory() can handle this
> >> manually.
> >>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Oscar Salvador <osalvador@suse.com>
> >> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> >> Cc: Wei Yang <richard.weiyang@gmail.com>
> >> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >> Cc: Qian Cai <cai@lca.pw>
> >> Cc: Arun KS <arunks@codeaurora.org>
> >> Cc: Mathieu Malaterre <malat@debian.org>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  include/linux/memory_hotplug.h | 8 --------
> >>  mm/memory_hotplug.c            | 9 +++------
> >>  2 files changed, 3 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> >> index 2d4de313926d..2f1f87e13baa 100644
> >> --- a/include/linux/memory_hotplug.h
> >> +++ b/include/linux/memory_hotplug.h
> >> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
> >>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
> >>                            unsigned long nr_pages, struct vmem_altmap *altmap);
> >>
> >> -/*
> >> - * Do we want sysfs memblock files created. This will allow userspace to online
> >> - * and offline memory explicitly. Lack of this bit means that the caller has to
> >> - * call move_pfn_range_to_zone to finish the initialization.
> >> - */
> >> -
> >> -#define MHP_MEMBLOCK_API               (1<<0)
> >> -
> >>  /* reasonably generic interface to expand the physical pages */
> >>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> >>                        struct mhp_restrictions *restrictions);
> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >> index e1637c8a0723..107f72952347 100644
> >> --- a/mm/memory_hotplug.c
> >> +++ b/mm/memory_hotplug.c
> >> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> >>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> >>
> >>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> >> -               struct vmem_altmap *altmap, bool want_memblock)
> >> +                                  struct vmem_altmap *altmap)
> >>  {
> >>         int ret;
> >>
> >> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> >>         }
> >>
> >>         for (i = start_sec; i <= end_sec; i++) {
> >> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> >> -                               restrictions->flags & MHP_MEMBLOCK_API);
> >> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
> >>
> >>                 /*
> >>                  * EEXIST is finally dealt with by ioresource collision
> >> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
> >>   */
> >>  int __ref add_memory_resource(int nid, struct resource *res)
> >>  {
> >> -       struct mhp_restrictions restrictions = {
> >> -               .flags = MHP_MEMBLOCK_API,
> >> -       };
> >> +       struct mhp_restrictions restrictions = {};
> >
> > With mhp_restrictions.flags no longer needed, can we drop
> > mhp_restrictions and just go back to a plain @altmap argument?
> >
>
> Oscar wants to use it to configure from where to allocate vmemmaps. That
> was the original driver behind it.
>

Ah, ok, makes sense.

^ permalink raw reply

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
From: David Hildenbrand @ 2019-05-07 21:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
	Pavel Tatashin, Linux-sh, Mathieu Malaterre,
	Linux Kernel Mailing List, Wei Yang, Linux MM, Qian Cai, Arun KS,
	Andrew Morton, linuxppc-dev, Joonsoo Kim
In-Reply-To: <CAPcyv4ge1pSOopfHof4USn=7Skc-UV4Xhd_s=h+M9VXSp_p1XQ@mail.gmail.com>

On 07.05.19 23:19, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> No longer needed, the callers of arch_add_memory() can handle this
>> manually.
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Wei Yang <richard.weiyang@gmail.com>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Arun KS <arunks@codeaurora.org>
>> Cc: Mathieu Malaterre <malat@debian.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  include/linux/memory_hotplug.h | 8 --------
>>  mm/memory_hotplug.c            | 9 +++------
>>  2 files changed, 3 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>> index 2d4de313926d..2f1f87e13baa 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>>
>> -/*
>> - * Do we want sysfs memblock files created. This will allow userspace to online
>> - * and offline memory explicitly. Lack of this bit means that the caller has to
>> - * call move_pfn_range_to_zone to finish the initialization.
>> - */
>> -
>> -#define MHP_MEMBLOCK_API               (1<<0)
>> -
>>  /* reasonably generic interface to expand the physical pages */
>>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>>                        struct mhp_restrictions *restrictions);
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index e1637c8a0723..107f72952347 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>
>>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>> -               struct vmem_altmap *altmap, bool want_memblock)
>> +                                  struct vmem_altmap *altmap)
>>  {
>>         int ret;
>>
>> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>>         }
>>
>>         for (i = start_sec; i <= end_sec; i++) {
>> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
>> -                               restrictions->flags & MHP_MEMBLOCK_API);
>> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>>
>>                 /*
>>                  * EEXIST is finally dealt with by ioresource collision
>> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>>   */
>>  int __ref add_memory_resource(int nid, struct resource *res)
>>  {
>> -       struct mhp_restrictions restrictions = {
>> -               .flags = MHP_MEMBLOCK_API,
>> -       };
>> +       struct mhp_restrictions restrictions = {};
> 
> With mhp_restrictions.flags no longer needed, can we drop
> mhp_restrictions and just go back to a plain @altmap argument?
> 

Oscar wants to use it to configure from where to allocate vmemmaps. That
was the original driver behind it.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v2 5/8] mm/memory_hotplug: Drop MHP_MEMBLOCK_API
From: Dan Williams @ 2019-05-07 21:19 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
	Pavel Tatashin, Linux-sh, Mathieu Malaterre,
	Linux Kernel Mailing List, Wei Yang, Linux MM, Qian Cai, Arun KS,
	Andrew Morton, linuxppc-dev, Joonsoo Kim
In-Reply-To: <20190507183804.5512-6-david@redhat.com>

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> No longer needed, the callers of arch_add_memory() can handle this
> manually.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/memory_hotplug.h | 8 --------
>  mm/memory_hotplug.c            | 9 +++------
>  2 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 2d4de313926d..2f1f87e13baa 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -128,14 +128,6 @@ extern void arch_remove_memory(int nid, u64 start, u64 size,
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>                            unsigned long nr_pages, struct vmem_altmap *altmap);
>
> -/*
> - * Do we want sysfs memblock files created. This will allow userspace to online
> - * and offline memory explicitly. Lack of this bit means that the caller has to
> - * call move_pfn_range_to_zone to finish the initialization.
> - */
> -
> -#define MHP_MEMBLOCK_API               (1<<0)
> -
>  /* reasonably generic interface to expand the physical pages */
>  extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>                        struct mhp_restrictions *restrictions);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e1637c8a0723..107f72952347 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -250,7 +250,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
>  #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>
>  static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> -               struct vmem_altmap *altmap, bool want_memblock)
> +                                  struct vmem_altmap *altmap)
>  {
>         int ret;
>
> @@ -293,8 +293,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>         }
>
>         for (i = start_sec; i <= end_sec; i++) {
> -               err = __add_section(nid, section_nr_to_pfn(i), altmap,
> -                               restrictions->flags & MHP_MEMBLOCK_API);
> +               err = __add_section(nid, section_nr_to_pfn(i), altmap);
>
>                 /*
>                  * EEXIST is finally dealt with by ioresource collision
> @@ -1066,9 +1065,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   */
>  int __ref add_memory_resource(int nid, struct resource *res)
>  {
> -       struct mhp_restrictions restrictions = {
> -               .flags = MHP_MEMBLOCK_API,
> -       };
> +       struct mhp_restrictions restrictions = {};

With mhp_restrictions.flags no longer needed, can we drop
mhp_restrictions and just go back to a plain @altmap argument?

^ permalink raw reply

* Re: [PATCH v2 4/8] mm/memory_hotplug: Create memory block devices after arch_add_memory()
From: Dan Williams @ 2019-05-07 21:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-s390, Michal Hocko, linux-ia64, Pavel Tatashin, Linux-sh,
	mike.travis@hpe.com, Greg Kroah-Hartman, Rafael J. Wysocki,
	Mathieu Malaterre, Linux Kernel Mailing List, Wei Yang, Linux MM,
	Andrew Banman, Qian Cai, Arun KS, Andrew Morton, linuxppc-dev,
	Ingo Molnar, Oscar Salvador
In-Reply-To: <20190507183804.5512-5-david@redhat.com>

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Only memory to be added to the buddy and to be onlined/offlined by
> user space using memory block devices needs (and should have!) memory
> block devices.
>
> Factor out creation of memory block devices Create all devices after
> arch_add_memory() succeeded. We can later drop the want_memblock parameter,
> because it is now effectively stale.
>
> Only after memory block devices have been added, memory can be onlined
> by user space. This implies, that memory is not visible to user space at
> all before arch_add_memory() succeeded.

Nice!

>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Mathieu Malaterre <malat@debian.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/memory.c  | 70 ++++++++++++++++++++++++++----------------
>  include/linux/memory.h |  2 +-
>  mm/memory_hotplug.c    | 15 ++++-----
>  3 files changed, 53 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 6e0cb4fda179..862c202a18ca 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -701,44 +701,62 @@ static int add_memory_block(int base_section_nr)
>         return 0;
>  }
>
> +static void unregister_memory(struct memory_block *memory)
> +{
> +       BUG_ON(memory->dev.bus != &memory_subsys);

Given this should never happen and only a future kernel developer
might trip over it, do we really need to kill that developer's
machine? I.e. s/BUG/WARN/? I guess an argument can be made to move
such a change that to a follow-on patch since you're just preserving
existing behavior, but I figure might as well address these as the
code is refactored.

> +
> +       /* drop the ref. we got via find_memory_block() */
> +       put_device(&memory->dev);
> +       device_unregister(&memory->dev);
> +}
> +
>  /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>   */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int hotplug_memory_register(unsigned long start, unsigned long size)
>  {
> -       int ret = 0;
> +       unsigned long block_nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
> +       unsigned long start_pfn = PFN_DOWN(start);
> +       unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
> +       unsigned long pfn;
>         struct memory_block *mem;
> +       int ret = 0;
>
> -       mutex_lock(&mem_sysfs_mutex);
> +       BUG_ON(!IS_ALIGNED(start, memory_block_size_bytes()));
> +       BUG_ON(!IS_ALIGNED(size, memory_block_size_bytes()));

Perhaps:

    if (WARN_ON(...))
        return -EINVAL;

>
> -       mem = find_memory_block(section);
> -       if (mem) {
> -               mem->section_count++;
> -               put_device(&mem->dev);
> -       } else {
> -               ret = init_memory_block(&mem, section, MEM_OFFLINE);
> +       mutex_lock(&mem_sysfs_mutex);
> +       for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +               mem = find_memory_block(__pfn_to_section(pfn));
> +               if (mem) {
> +                       WARN_ON_ONCE(false);

?? Isn't that a nop?

> +                       put_device(&mem->dev);
> +                       continue;
> +               }
> +               ret = init_memory_block(&mem, __pfn_to_section(pfn),
> +                                       MEM_OFFLINE);
>                 if (ret)
> -                       goto out;
> -               mem->section_count++;
> +                       break;
> +               mem->section_count = memory_block_size_bytes() /
> +                                    MIN_MEMORY_BLOCK_SIZE;
> +       }
> +       if (ret) {
> +               end_pfn = pfn;
> +               for (pfn = start_pfn; pfn != end_pfn; pfn += block_nr_pages) {
> +                       mem = find_memory_block(__pfn_to_section(pfn));
> +                       if (!mem)
> +                               continue;
> +                       mem->section_count = 0;
> +                       unregister_memory(mem);
> +               }
>         }
> -
> -out:
>         mutex_unlock(&mem_sysfs_mutex);
>         return ret;
>  }
>
> -static void
> -unregister_memory(struct memory_block *memory)
> -{
> -       BUG_ON(memory->dev.bus != &memory_subsys);
> -
> -       /* drop the ref. we got via find_memory_block() */
> -       put_device(&memory->dev);
> -       device_unregister(&memory->dev);
> -}
> -
> -void unregister_memory_section(struct mem_section *section)
> +static int remove_memory_section(struct mem_section *section)
>  {
>         struct memory_block *mem;
>
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 474c7c60c8f2..95505fbb5f85 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
>  extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
> -int hotplug_memory_register(int nid, struct mem_section *section);
> +int hotplug_memory_register(unsigned long start, unsigned long size);
>  extern void unregister_memory_section(struct mem_section *);
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7b5439839d67..e1637c8a0723 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -258,13 +258,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>                 return -EEXIST;
>
>         ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
> -       if (ret < 0)
> -               return ret;
> -
> -       if (!want_memblock)
> -               return 0;
> -
> -       return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
> +       return ret < 0 ? ret : 0;
>  }
>
>  /*
> @@ -1106,6 +1100,13 @@ int __ref add_memory_resource(int nid, struct resource *res)
>         if (ret < 0)
>                 goto error;
>
> +       /* create memory block devices after memory was added */
> +       ret = hotplug_memory_register(start, size);
> +       if (ret) {
> +               arch_remove_memory(nid, start, size, NULL);
> +               goto error;
> +       }
> +
>         if (new_node) {
>                 /* If sysfs file of new node can't be created, cpu on the node
>                  * can't be hot-added. There is no rollback way now.
> --
> 2.20.1
>

^ permalink raw reply

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
From: David Hildenbrand @ 2019-05-07 21:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
	Vasily Gorbik, Linux-sh, Heiko Carstens,
	Linux Kernel Mailing List, Linux MM, Mike Rapoport,
	Martin Schwidefsky, Andrew Morton, linuxppc-dev
In-Reply-To: <CAPcyv4hvpBo=6c6pFCoGiEf3xiPsjc8w2p4Y6_bW4PrzcN=Few@mail.gmail.com>

On 07.05.19 22:57, Dan Williams wrote:
> On Tue, May 7, 2019 at 1:47 PM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.05.19 22:46, Dan Williams wrote:
>>> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> Will come in handy when wanting to handle errors after
>>>> arch_add_memory().
>>>>
>>>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>>>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>>>> Cc: Oscar Salvador <osalvador@suse.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  arch/s390/mm/init.c | 13 +++++++------
>>>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>>>> index 31b1071315d7..1e0cbae69f12 100644
>>>> --- a/arch/s390/mm/init.c
>>>> +++ b/arch/s390/mm/init.c
>>>> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>>>                         struct vmem_altmap *altmap)
>>>>  {
>>>> -       /*
>>>> -        * There is no hardware or firmware interface which could trigger a
>>>> -        * hot memory remove on s390. So there is nothing that needs to be
>>>> -        * implemented.
>>>> -        */
>>>> -       BUG();
>>>> +       unsigned long start_pfn = start >> PAGE_SHIFT;
>>>> +       unsigned long nr_pages = size >> PAGE_SHIFT;
>>>> +       struct zone *zone;
>>>> +
>>>> +       zone = page_zone(pfn_to_page(start_pfn));
>>>
>>> Does s390 actually support passing in an altmap? If 'yes', I think it
>>> also needs the vmem_altmap_offset() fixup like x86-64:
>>>
>>>         /* With altmap the first mapped page is offset from @start */
>>>         if (altmap)
>>>                 page += vmem_altmap_offset(altmap);
>>>
>>> ...but I suspect it does not support altmap since
>>> arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
>>> page' capacity to be allocated out of an altmap defined page pool.
>>>
>>> I think it would be enough to disallow any arch_add_memory() on s390
>>> where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
>>> support and can enable the pmem use case.
>>>
>>
>> As far as I know, it doesn't yet, however I guess this could change once
>> virtio-pmem is supported?
> 
> I would expect and request virtio-pmem remain a non-starter on s390
> until s390 gains ZONE_DEVICE support. As it stands virtio-pmem is just
> another flavor of the general pmem driver and the pmem driver
> currently only exports ZONE_DEVICE pfns tagged by the PTE_DEVMAP
> pte-flag and PFN_DEV+PFN_MAP pfn_t-flags.

Yes, I think ZONE_DEVICE will be the way to go. On real HW, there will
never be anything mapped into the physical address space besides system
ram. However with virtio-pmem in virtual environments, we have the
option to change that.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
From: David Hildenbrand @ 2019-05-07 21:06 UTC (permalink / raw)
  To: Dan Williams
  Cc: Oscar Salvador, Rich Felker, linux-ia64, Linux-sh, Peter Zijlstra,
	Dave Hansen, Heiko Carstens, Chris Wilson, Linux MM, Michal Hocko,
	Paul Mackerras, H. Peter Anvin, Qian Cai, linux-s390,
	Yoshinori Sato, Rafael J. Wysocki, Mike Rapoport, Ingo Molnar,
	Fenghua Yu, Pavel Tatashin, Vasily Gorbik, Rob Herring,
	mike.travis@hpe.com, Nicholas Piggin, Alex Deucher, Mark Brown,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Tony Luck,
	Baoquan He, Masahiro Yamada, Mathieu Malaterre,
	Greg Kroah-Hartman, Andrew Banman, Linux Kernel Mailing List,
	Logan Gunthorpe, Wei Yang, Martin Schwidefsky, Arun KS,
	Andrew Morton, linuxppc-dev, David S. Miller, Kirill A. Shutemov
In-Reply-To: <CAPcyv4jpnKjeP3QEvF3_9CzdZhtFXN2nMU7P-Ee7y06J3bGZ0A@mail.gmail.com>

On 07.05.19 23:02, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Let's prepare for better error handling while adding memory by allowing
>> to use arch_remove_memory() and __remove_pages() even if
>> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
>> covers
>> - Offlining of system ram (memory block devices) - offline_pages()
>> - Unplug of system ram - remove_memory()
>> - Unplug/remap of device memory - devm_memremap()
>>
>> This allows e.g. for handling like
>>
>> arch_add_memory()
>> rc = do_something();
>> if (rc) {
>>         arch_remove_memory();
>> }
>>
>> Whereby do_something() will for example be memory block device creation
>> after it has been factored out.
> 
> What's left after this?

I tried to describe this above here:

- Offlining of system ram (memory block devices) - offline_pages()
- Unplug of system ram - remove_memory()
- Unplug/remap of device memory - devm_memremap()

So administrators cannot trigger offlining/removal of memory.

> Can we just get rid of CONFIG_MEMORY_HOTREMOVE
> option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
> clear to me why there was ever the option to compile out the remove
> code when the add code is included.

I guess it was a configure option because initially, offlining/unplug
was extremely unstable. Now it's only slightly unstable :D

I would actually favor getting rid of CONFIG_MEMORY_HOTREMOVE
completely. After all, unplug always has to be triggered by an admin (HW
or software).

Opinions?

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* Re: [PATCH v2 3/8] mm/memory_hotplug: arch_remove_memory() and __remove_pages() with CONFIG_MEMORY_HOTPLUG
From: Dan Williams @ 2019-05-07 21:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rich Felker, linux-ia64, Linux-sh, Peter Zijlstra,
	Dave Hansen, Heiko Carstens, Chris Wilson, Linux MM, Michal Hocko,
	Paul Mackerras, H. Peter Anvin, Qian Cai, linux-s390,
	Yoshinori Sato, Rafael J. Wysocki, Mike Rapoport, Ingo Molnar,
	Fenghua Yu, Pavel Tatashin, Vasily Gorbik, Rob Herring,
	mike.travis@hpe.com, Nicholas Piggin, Alex Deucher, Mark Brown,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Tony Luck,
	Baoquan He, Masahiro Yamada, Mathieu Malaterre,
	Greg Kroah-Hartman, Andrew Banman, Linux Kernel Mailing List,
	Logan Gunthorpe, Wei Yang, Martin Schwidefsky, Arun KS,
	Andrew Morton, linuxppc-dev, David S. Miller, Kirill A. Shutemov
In-Reply-To: <20190507183804.5512-4-david@redhat.com>

On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's prepare for better error handling while adding memory by allowing
> to use arch_remove_memory() and __remove_pages() even if
> CONFIG_MEMORY_HOTREMOVE is not set. CONFIG_MEMORY_HOTREMOVE effectively
> covers
> - Offlining of system ram (memory block devices) - offline_pages()
> - Unplug of system ram - remove_memory()
> - Unplug/remap of device memory - devm_memremap()
>
> This allows e.g. for handling like
>
> arch_add_memory()
> rc = do_something();
> if (rc) {
>         arch_remove_memory();
> }
>
> Whereby do_something() will for example be memory block device creation
> after it has been factored out.

What's left after this? Can we just get rid of CONFIG_MEMORY_HOTREMOVE
option completely when CONFIG_MEMORY_HOTPLUG is enabled? It's not
clear to me why there was ever the option to compile out the remove
code when the add code is included.

> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Oscar Salvador <osalvador@suse.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Rob Herring <robh@kernel.org>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
> Cc: Andrew Banman <andrew.banman@hpe.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Arun KS <arunks@codeaurora.org>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Mathieu Malaterre <malat@debian.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/ia64/mm/init.c            | 2 --
>  arch/powerpc/mm/mem.c          | 2 --
>  arch/s390/mm/init.c            | 2 --
>  arch/sh/mm/init.c              | 2 --
>  arch/x86/mm/init_32.c          | 2 --
>  arch/x86/mm/init_64.c          | 2 --
>  drivers/base/memory.c          | 2 --
>  include/linux/memory.h         | 2 --
>  include/linux/memory_hotplug.h | 2 --
>  mm/memory_hotplug.c            | 2 --
>  mm/sparse.c                    | 6 ------
>  11 files changed, 26 deletions(-)
>
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index d28e29103bdb..aae75fd7b810 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -681,7 +681,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>         return ret;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -693,4 +692,3 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index a2b78e72452f..ddc69b59575c 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -131,7 +131,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
>         return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void __ref arch_remove_memory(int nid, u64 start, u64 size,
>                              struct vmem_altmap *altmap)
>  {
> @@ -164,7 +163,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>         resize_hpt_for_hotplug(memblock_phys_mem_size());
>  }
>  #endif
> -#endif /* CONFIG_MEMORY_HOTPLUG */
>
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  void __init mem_topology_setup(void)
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index 1e0cbae69f12..eafa3c750efc 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -233,7 +233,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>         return rc;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -245,5 +244,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>         vmem_remove_mapping(start, size);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
> index 5aeb4d7099a1..59c5fe511f25 100644
> --- a/arch/sh/mm/init.c
> +++ b/arch/sh/mm/init.c
> @@ -428,7 +428,6 @@ int memory_add_physaddr_to_nid(u64 addr)
>  EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
>  #endif
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -439,5 +438,4 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         zone = page_zone(pfn_to_page(start_pfn));
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index 075e568098f2..8d4bf2d97d50 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -859,7 +859,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>         return __add_pages(nid, start_pfn, nr_pages, restrictions);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  void arch_remove_memory(int nid, u64 start, u64 size,
>                         struct vmem_altmap *altmap)
>  {
> @@ -871,7 +870,6 @@ void arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>  }
>  #endif
> -#endif
>
>  int kernel_set_to_readonly __read_mostly;
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 20d14254b686..f1b55ddea23f 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1131,7 +1131,6 @@ void __ref vmemmap_free(unsigned long start, unsigned long end,
>         remove_pagetable(start, end, false, altmap);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void __meminit
>  kernel_physical_mapping_remove(unsigned long start, unsigned long end)
>  {
> @@ -1156,7 +1155,6 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size,
>         __remove_pages(zone, start_pfn, nr_pages, altmap);
>         kernel_physical_mapping_remove(start, start + size);
>  }
> -#endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
>  static struct kcore_list kcore_vsyscall;
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f180427e48f4..6e0cb4fda179 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -728,7 +728,6 @@ int hotplug_memory_register(int nid, struct mem_section *section)
>         return ret;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void
>  unregister_memory(struct memory_block *memory)
>  {
> @@ -767,7 +766,6 @@ void unregister_memory_section(struct mem_section *section)
>  out_unlock:
>         mutex_unlock(&mem_sysfs_mutex);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  /* return true if the memory block is offlined, otherwise, return false */
>  bool is_memblock_offlined(struct memory_block *mem)
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index e1dc1bb2b787..474c7c60c8f2 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -112,9 +112,7 @@ extern void unregister_memory_notifier(struct notifier_block *nb);
>  extern int register_memory_isolate_notifier(struct notifier_block *nb);
>  extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>  int hotplug_memory_register(int nid, struct mem_section *section);
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern void unregister_memory_section(struct mem_section *);
> -#endif
>  extern int memory_dev_init(void);
>  extern int memory_notify(unsigned long val, void *v);
>  extern int memory_isolate_notify(unsigned long val, void *v);
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ae892eef8b82..2d4de313926d 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -123,12 +123,10 @@ static inline bool movable_node_is_enabled(void)
>         return movable_node_enabled;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  extern void arch_remove_memory(int nid, u64 start, u64 size,
>                                struct vmem_altmap *altmap);
>  extern void __remove_pages(struct zone *zone, unsigned long start_pfn,
>                            unsigned long nr_pages, struct vmem_altmap *altmap);
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  /*
>   * Do we want sysfs memblock files created. This will allow userspace to online
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 202febe88b58..7b5439839d67 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -317,7 +317,6 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>         return err;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
>  static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
>                                      unsigned long start_pfn,
> @@ -581,7 +580,6 @@ void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>
>         set_zone_contiguous(zone);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>
>  int set_online_page_callback(online_page_callback_t callback)
>  {
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fd13166949b5..d1d5e05f5b8d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -604,7 +604,6 @@ static void __kfree_section_memmap(struct page *memmap,
>
>         vmemmap_free(start, end, altmap);
>  }
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap)
>  {
>         unsigned long start = (unsigned long)memmap;
> @@ -612,7 +611,6 @@ static void free_map_bootmem(struct page *memmap)
>
>         vmemmap_free(start, end, NULL);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #else
>  static struct page *__kmalloc_section_memmap(void)
>  {
> @@ -651,7 +649,6 @@ static void __kfree_section_memmap(struct page *memmap,
>                            get_order(sizeof(struct page) * PAGES_PER_SECTION));
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  static void free_map_bootmem(struct page *memmap)
>  {
>         unsigned long maps_section_nr, removing_section_nr, i;
> @@ -681,7 +678,6 @@ static void free_map_bootmem(struct page *memmap)
>                         put_page_bootmem(page);
>         }
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>
>  /**
> @@ -746,7 +742,6 @@ int __meminit sparse_add_one_section(int nid, unsigned long start_pfn,
>         return ret;
>  }
>
> -#ifdef CONFIG_MEMORY_HOTREMOVE
>  #ifdef CONFIG_MEMORY_FAILURE
>  static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages)
>  {
> @@ -823,5 +818,4 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>                         PAGES_PER_SECTION - map_offset);
>         free_section_usemap(memmap, usemap, altmap);
>  }
> -#endif /* CONFIG_MEMORY_HOTREMOVE */
>  #endif /* CONFIG_MEMORY_HOTPLUG */
> --
> 2.20.1
>

^ permalink raw reply

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
From: Dan Williams @ 2019-05-07 20:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
	Vasily Gorbik, Linux-sh, Heiko Carstens,
	Linux Kernel Mailing List, Linux MM, Mike Rapoport,
	Martin Schwidefsky, Andrew Morton, linuxppc-dev
In-Reply-To: <97a6a2ab-0e8b-d403-ca39-ffa4425e15a5@redhat.com>

On Tue, May 7, 2019 at 1:47 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.05.19 22:46, Dan Williams wrote:
> > On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Will come in handy when wanting to handle errors after
> >> arch_add_memory().
> >>
> >> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> >> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@suse.com>
> >> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Vasily Gorbik <gor@linux.ibm.com>
> >> Cc: Oscar Salvador <osalvador@suse.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  arch/s390/mm/init.c | 13 +++++++------
> >>  1 file changed, 7 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> >> index 31b1071315d7..1e0cbae69f12 100644
> >> --- a/arch/s390/mm/init.c
> >> +++ b/arch/s390/mm/init.c
> >> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
> >>  void arch_remove_memory(int nid, u64 start, u64 size,
> >>                         struct vmem_altmap *altmap)
> >>  {
> >> -       /*
> >> -        * There is no hardware or firmware interface which could trigger a
> >> -        * hot memory remove on s390. So there is nothing that needs to be
> >> -        * implemented.
> >> -        */
> >> -       BUG();
> >> +       unsigned long start_pfn = start >> PAGE_SHIFT;
> >> +       unsigned long nr_pages = size >> PAGE_SHIFT;
> >> +       struct zone *zone;
> >> +
> >> +       zone = page_zone(pfn_to_page(start_pfn));
> >
> > Does s390 actually support passing in an altmap? If 'yes', I think it
> > also needs the vmem_altmap_offset() fixup like x86-64:
> >
> >         /* With altmap the first mapped page is offset from @start */
> >         if (altmap)
> >                 page += vmem_altmap_offset(altmap);
> >
> > ...but I suspect it does not support altmap since
> > arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
> > page' capacity to be allocated out of an altmap defined page pool.
> >
> > I think it would be enough to disallow any arch_add_memory() on s390
> > where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
> > support and can enable the pmem use case.
> >
>
> As far as I know, it doesn't yet, however I guess this could change once
> virtio-pmem is supported?

I would expect and request virtio-pmem remain a non-starter on s390
until s390 gains ZONE_DEVICE support. As it stands virtio-pmem is just
another flavor of the general pmem driver and the pmem driver
currently only exports ZONE_DEVICE pfns tagged by the PTE_DEVMAP
pte-flag and PFN_DEV+PFN_MAP pfn_t-flags.

A hamstrung version of DAX (CONFIG_FS_DAX_LIMITED) is enabled for the
s390/dcssblk driver, but that requires that the driver indicate this
limited use case via the PTE_SPECIAL pte-flag and PFN_SPECIAL
pfn_t-flag. I otherwise do not want to see CONFIG_FS_DAX_LIMITED
spread outside of the s390/dcssblk use case.

^ permalink raw reply

* Re: [PATCH v2 2/8] s390x/mm: Implement arch_remove_memory()
From: David Hildenbrand @ 2019-05-07 20:47 UTC (permalink / raw)
  To: Dan Williams
  Cc: Oscar Salvador, linux-s390, Michal Hocko, linux-ia64,
	Vasily Gorbik, Linux-sh, Heiko Carstens,
	Linux Kernel Mailing List, Linux MM, Mike Rapoport,
	Martin Schwidefsky, Andrew Morton, linuxppc-dev
In-Reply-To: <CAPcyv4gtAMn2mDz0s1GRTJ52MeTK3jJYLQne6MiEx_ipPFUsmA@mail.gmail.com>

On 07.05.19 22:46, Dan Williams wrote:
> On Tue, May 7, 2019 at 11:38 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> Will come in handy when wanting to handle errors after
>> arch_add_memory().
>>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  arch/s390/mm/init.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>> index 31b1071315d7..1e0cbae69f12 100644
>> --- a/arch/s390/mm/init.c
>> +++ b/arch/s390/mm/init.c
>> @@ -237,12 +237,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
>>  void arch_remove_memory(int nid, u64 start, u64 size,
>>                         struct vmem_altmap *altmap)
>>  {
>> -       /*
>> -        * There is no hardware or firmware interface which could trigger a
>> -        * hot memory remove on s390. So there is nothing that needs to be
>> -        * implemented.
>> -        */
>> -       BUG();
>> +       unsigned long start_pfn = start >> PAGE_SHIFT;
>> +       unsigned long nr_pages = size >> PAGE_SHIFT;
>> +       struct zone *zone;
>> +
>> +       zone = page_zone(pfn_to_page(start_pfn));
> 
> Does s390 actually support passing in an altmap? If 'yes', I think it
> also needs the vmem_altmap_offset() fixup like x86-64:
> 
>         /* With altmap the first mapped page is offset from @start */
>         if (altmap)
>                 page += vmem_altmap_offset(altmap);
> 
> ...but I suspect it does not support altmap since
> arch/s390/mm/vmem.c::vmemmap_populate() does not arrange for 'struct
> page' capacity to be allocated out of an altmap defined page pool.
> 
> I think it would be enough to disallow any arch_add_memory() on s390
> where @altmap is non-NULL. At least until s390 gains ZONE_DEVICE
> support and can enable the pmem use case.
> 

As far as I know, it doesn't yet, however I guess this could change once
virtio-pmem is supported?

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox