LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
From: Ricardo Neri @ 2018-06-20  0:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu
In-Reply-To: <alpine.DEB.2.21.1806161517050.1582@nanos.tec.linutronix.de>

On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> On Fri, 15 Jun 2018, Ricardo Neri wrote:
> > On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
> > > On Thu, 14 Jun 2018, Ricardo Neri wrote:
> > > > Alternatively, there could be a counter that skips reading the HPET status
> > > > register (and the detection of hardlockups) for every X NMIs. This would
> > > > reduce the overall frequency of HPET register reads.
> > > 
> > > Great plan. So if the watchdog is the only NMI (because perf is off) then
> > > you delay the watchdog detection by that count.
> > 
> > OK. This was a bad idea. Then, is it acceptable to have an read to an HPET
> > register per NMI just to check in the status register if the HPET timer
> > caused the NMI?
> 
> The status register is useless in case of MSI. MSI is edge triggered ....
> 
> The only register which gives you proper information is the counter
> register itself. That adds an massive overhead to each NMI, because the
> counter register access is synchronized to the HPET clock with hardware
> magic. Plus on larger systems, the HPET access is cross node and even
> slower.

It starts to sound that the HPET is too slow to drive the hardlockup detector.

Would it be possible to envision a variant of this implementation? In this
variant, the HPET only targets a single CPU. The actual hardlockup detector
is implemented by this single CPU sending interprocessor interrupts to the
rest of the CPUs.

In this manner only one CPU has to deal with the slowness of the HPET; the
rest of the CPUs don't have to read or write any HPET registers. A sysfs
entry could be added to configure which CPU will have to deal with the HPET
timer. However, profiling could not be done accurately on such CPU.

Thanks and BR,
Ricardo

^ permalink raw reply

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
From: Randy Dunlap @ 2018-06-20  0:25 UTC (permalink / raw)
  To: Ricardo Neri, Thomas Gleixner
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Davidlohr Bueso, Christoffer Dall, Marc Zyngier,
	Kai-Heng Feng, Konrad Rzeszutek Wilk, David Rientjes, iommu
In-Reply-To: <20180620001535.GA27531@voyager>

On 06/19/2018 05:15 PM, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
>> On Fri, 15 Jun 2018, Ricardo Neri wrote:
>>> On Fri, Jun 15, 2018 at 11:19:09AM +0200, Thomas Gleixner wrote:
>>>> On Thu, 14 Jun 2018, Ricardo Neri wrote:
>>>>> Alternatively, there could be a counter that skips reading the HPET status
>>>>> register (and the detection of hardlockups) for every X NMIs. This would
>>>>> reduce the overall frequency of HPET register reads.
>>>>
>>>> Great plan. So if the watchdog is the only NMI (because perf is off) then
>>>> you delay the watchdog detection by that count.
>>>
>>> OK. This was a bad idea. Then, is it acceptable to have an read to an HPET
>>> register per NMI just to check in the status register if the HPET timer
>>> caused the NMI?
>>
>> The status register is useless in case of MSI. MSI is edge triggered ....
>>
>> The only register which gives you proper information is the counter
>> register itself. That adds an massive overhead to each NMI, because the
>> counter register access is synchronized to the HPET clock with hardware
>> magic. Plus on larger systems, the HPET access is cross node and even
>> slower.
> 
> It starts to sound that the HPET is too slow to drive the hardlockup detector.
> 
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU. The actual hardlockup detector
> is implemented by this single CPU sending interprocessor interrupts to the
> rest of the CPUs.
> 
> In this manner only one CPU has to deal with the slowness of the HPET; the
> rest of the CPUs don't have to read or write any HPET registers. A sysfs
> entry could be added to configure which CPU will have to deal with the HPET
> timer. However, profiling could not be done accurately on such CPU.

Please forgive my simple question:

What happens when this one CPU is the one that locks up?

thnx,
-- 
~Randy

^ permalink raw reply

* Re: [PATCH kernel 1/2] vfio/spapr: Use IOMMU pageshift rather than pagesize
From: David Gibson @ 2018-06-20  0:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, kvm-ppc, kvm, Alex Williamson, Paul Mackerras
In-Reply-To: <20180619084414.808-2-aik@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 2164 bytes --]

On Tue, Jun 19, 2018 at 06:44:13PM +1000, Alexey Kardashevskiy wrote:
> The size is always equal to 1 page so let's use this. Later on this will
> be used for other checks which use page shifts to check the granularity
> of access.
> 
> This should cause no behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 759a5bd..2da5f05 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -457,13 +457,13 @@ static void tce_iommu_unuse_page(struct tce_container *container,
>  }
>  
>  static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
> -		unsigned long tce, unsigned long size,
> +		unsigned long tce, unsigned long shift,
>  		unsigned long *phpa, struct mm_iommu_table_group_mem_t **pmem)
>  {
>  	long ret = 0;
>  	struct mm_iommu_table_group_mem_t *mem;
>  
> -	mem = mm_iommu_lookup(container->mm, tce, size);
> +	mem = mm_iommu_lookup(container->mm, tce, 1ULL << shift);
>  	if (!mem)
>  		return -EINVAL;
>  
> @@ -487,7 +487,7 @@ static void tce_iommu_unuse_page_v2(struct tce_container *container,
>  	if (!pua)
>  		return;
>  
> -	ret = tce_iommu_prereg_ua_to_hpa(container, *pua, IOMMU_PAGE_SIZE(tbl),
> +	ret = tce_iommu_prereg_ua_to_hpa(container, *pua, tbl->it_page_shift,
>  			&hpa, &mem);
>  	if (ret)
>  		pr_debug("%s: tce %lx at #%lx was not cached, ret=%d\n",
> @@ -611,7 +611,7 @@ static long tce_iommu_build_v2(struct tce_container *container,
>  				entry + i);
>  
>  		ret = tce_iommu_prereg_ua_to_hpa(container,
> -				tce, IOMMU_PAGE_SIZE(tbl), &hpa, &mem);
> +				tce, tbl->it_page_shift, &hpa, &mem);
>  		if (ret)
>  			break;
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] ocxl: Fix page fault handler in case of fault on dying process
From: Alastair D'Silva @ 2018-06-20  1:39 UTC (permalink / raw)
  To: Frederic Barrat, andrew.donnellan, linuxppc-dev; +Cc: clombard, vaibhav
In-Reply-To: <20180618121436.20479-1-fbarrat@linux.ibm.com>

On Mon, 2018-06-18 at 14:14 +0200, Frederic Barrat wrote:
> If a process exits without doing proper cleanup, there's a window
> where an opencapi device can try to access the memory of the dying
> process and may trigger a page fault. That's an expected scenario and
> the ocxl driver holds a reference on the mm_struct of the process
> until the opencapi device is notified of the process exiting.
> However, if mm_users is already at 0, i.e. the address space of the
> process has already been destroyed, the driver shouldn't try
> resolving
> the page fault, as it will fail, but it can also try accessing
> already
> freed data.
> 
> It is fixed by only calling the bottom half of the page fault handler
> if mm_users is greater than 0 and get a reference on mm_users instead
> of mm_count. Otherwise, we can safely return a translation fault to
> the device, as its associated memory context is being removed. The
> opencapi device will be properly cleaned up shortly after when
> closing
> the file descriptors.
> 
> Fixes: 5ef3166e8a32 ("ocxl: Driver code for 'generic' opencapi
> devices")
> Cc: stable@vger.kernel.org # v4.16+
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
>  drivers/misc/ocxl/link.c | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index f30790582dc0..2aeaf34e7eda 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -136,7 +136,7 @@ static void xsl_fault_handler_bh(struct
> work_struct *fault_work)
>  	int rc;
>  
>  	/*
> -	 * We need to release a reference on the mm whenever exiting
> this
> +	 * We must release a reference on mm_users whenever exiting
> this
>  	 * function (taken in the memory fault interrupt handler)
>  	 */
>  	rc = copro_handle_mm_fault(fault->pe_data.mm, fault->dar,
> fault->dsisr,
> @@ -172,7 +172,7 @@ static void xsl_fault_handler_bh(struct
> work_struct *fault_work)
>  	}
>  	r = RESTART;
>  ack:
> -	mmdrop(fault->pe_data.mm);
> +	mmput(fault->pe_data.mm);
>  	ack_irq(spa, r);
>  }
>  
> @@ -184,6 +184,7 @@ static irqreturn_t xsl_fault_handler(int irq,
> void *data)
>  	struct pe_data *pe_data;
>  	struct ocxl_process_element *pe;
>  	int lpid, pid, tid;
> +	bool schedule = false;
>  
>  	read_irq(spa, &dsisr, &dar, &pe_handle);
>  	trace_ocxl_fault(spa->spa_mem, pe_handle, dsisr, dar, -1);
> @@ -226,14 +227,19 @@ static irqreturn_t xsl_fault_handler(int irq,
> void *data)
>  	}
>  	WARN_ON(pe_data->mm->context.id != pid);
>  
> -	spa->xsl_fault.pe = pe_handle;
> -	spa->xsl_fault.dar = dar;
> -	spa->xsl_fault.dsisr = dsisr;
> -	spa->xsl_fault.pe_data = *pe_data;
> -	mmgrab(pe_data->mm); /* mm count is released by bottom half
> */
> -
> +	if (mmget_not_zero(pe_data->mm)) {
> +			spa->xsl_fault.pe = pe_handle;
> +			spa->xsl_fault.dar = dar;
> +			spa->xsl_fault.dsisr = dsisr;
> +			spa->xsl_fault.pe_data = *pe_data;
> +			schedule = true;
> +			/* mm_users count released by bottom half */
> +	}
>  	rcu_read_unlock();
> -	schedule_work(&spa->xsl_fault.fault_work);
> +	if (schedule)
> +		schedule_work(&spa->xsl_fault.fault_work);
> +	else
> +		ack_irq(spa, ADDRESS_ERROR);
>  	return IRQ_HANDLED;
>  }
>  

Reviewed-By: Alastair D'Silva <alastair@d-silva.org>

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819

^ permalink raw reply

* [PATCH] powerpc/mm: Initialize kernel pagetable memory for PTE fragments
From: Anshuman Khandual @ 2018-06-20  3:40 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, aneesh.kumar, benh

Kernel pagetable pages for PTE fragments never go through the standard init
sequence which can cause inaccuracies in utilization statistics reported at
places like /proc and /sysfs interfaces etc. Also the allocated page misses
out on pagetable lock and page flag init as well. Fix it by making sure all
pages allocated for either user process or kernel PTE fragments go through
same initialization.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable-book3s64.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index c1f4ca4..1e8f67c 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -335,23 +335,21 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 
 static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 {
+	gfp_t gfp_mask = PGALLOC_GFP;
 	void *ret = NULL;
 	struct page *page;
 
-	if (!kernel) {
-		page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-		if (!page)
-			return NULL;
-		if (!pgtable_page_ctor(page)) {
-			__free_page(page);
-			return NULL;
-		}
-	} else {
-		page = alloc_page(PGALLOC_GFP);
-		if (!page)
-			return NULL;
-	}
+	if (!kernel)
+		gfp_mask |= __GFP_ACCOUNT;
 
+	page = alloc_page(gfp_mask);
+	if (!page)
+		return NULL;
+
+	if (!pgtable_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
 
 	ret = page_address(page);
 	/*
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 1/3] powernv/cpuidle: Parse dt idle properties into global structure
From: Gautham R Shenoy @ 2018-06-20  4:41 UTC (permalink / raw)
  To: Akshay Adiga
  Cc: linux-kernel, linuxppc-dev, linux-pm, rjw, svaidy, ego, npiggin,
	mpe
In-Reply-To: <1529384668-27548-2-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Hi Akshay,

On Tue, Jun 19, 2018 at 10:34:26AM +0530, Akshay Adiga wrote:
> Device-tree parsing happens in twice, once while deciding idle state to
> be used for hotplug and once during cpuidle init. Hence, parsing the
> device tree and caching it will reduce code duplication. Parsing code
> has been moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states().
> 
> Setting up things so that number of available idle states can be
> accessible to cpuidle-powernv driver. Hence adding nr_pnv_idle_states to
> track number of idle states.
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/cpuidle.h    |  14 +++
>  arch/powerpc/platforms/powernv/idle.c | 197 ++++++++++++++++++++++++----------
>  2 files changed, 152 insertions(+), 59 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
> index e210a83..55ee7e3 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -79,6 +79,20 @@ struct stop_sprs {
>  	u64 mmcra;
>  };
> 
> +#define PNV_IDLE_NAME_LEN    16
> +struct pnv_idle_states_t {
> +	char name[PNV_IDLE_NAME_LEN];
> +	u32 latency_ns;
> +	u32 residency_ns;
> +	/*
> +	 * Register value/mask used to select different idle states.
> +	 * PMICR in POWER8 and PSSCR in POWER9
> +	 */
> +	u64 pm_ctrl_reg_val;
> +	u64 pm_ctrl_reg_mask;

We don't use pmicr on POWER8. So I don't mind if you rename this to
psscr_val and psscr_mask.

Or atleast have
   	   union {
	   	 u64 psscr; /* P9 onwards */
		 u64 pmicr  /* P7 and P8 */
	    } val;

	   union {
	   	 u64 psscr; /* P9 onwards */
		 u64 pmicr; /* P7 and P8 */
	   } mask;


	


> +	u32 flags;
> +};
> +
>  extern u32 pnv_fastsleep_workaround_at_entry[];
>  extern u32 pnv_fastsleep_workaround_at_exit[];
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 1c5d067..07be984 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -36,6 +36,8 @@
>  #define P9_STOP_SPR_PSSCR      855
> 
>  static u32 supported_cpuidle_states;
> +struct pnv_idle_states_t *pnv_idle_states;
> +int nr_pnv_idle_states;
> 
>  /*
>   * The default stop state that will be used by ppc_md.power_save
> @@ -625,45 +627,8 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags)
>  static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
>  					int dt_idle_states)

We are not using np, flags in this function right? Also dt_idle_states
can be obtained from the global variable nr_pnv_idle_states.


>  {
> -	u64 *psscr_val = NULL;
> -	u64 *psscr_mask = NULL;
> -	u32 *residency_ns = NULL;
>  	u64 max_residency_ns = 0;
> -	int rc = 0, i;
> -
> -	psscr_val = kcalloc(dt_idle_states, sizeof(*psscr_val), GFP_KERNEL);
> -	psscr_mask = kcalloc(dt_idle_states, sizeof(*psscr_mask), GFP_KERNEL);
> -	residency_ns = kcalloc(dt_idle_states, sizeof(*residency_ns),
> -			       GFP_KERNEL);
> -
> -	if (!psscr_val || !psscr_mask || !residency_ns) {
> -		rc = -1;
> -		goto out;
> -	}
> -
> -	if (of_property_read_u64_array(np,
> -		"ibm,cpu-idle-state-psscr",
> -		psscr_val, dt_idle_states)) {
> -		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
> -		rc = -1;
> -		goto out;
> -	}
> -
> -	if (of_property_read_u64_array(np,
> -				       "ibm,cpu-idle-state-psscr-mask",
> -				       psscr_mask, dt_idle_states)) {
> -		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask in DT\n");
> -		rc = -1;
> -		goto out;
> -	}
> -
> -	if (of_property_read_u32_array(np,
> -				       "ibm,cpu-idle-state-residency-ns",
> -					residency_ns, dt_idle_states)) {
> -		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-residency-ns in DT\n");
> -		rc = -1;
> -		goto out;
> -	}
> +	int i;
> 
>  	/*
>  	 * Set pnv_first_deep_stop_state, pnv_deepest_stop_psscr_{val,mask},
> @@ -681,31 +646,33 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
>  	pnv_first_deep_stop_state = MAX_STOP_STATE;
>  	for (i = 0; i < dt_idle_states; i++) {
>  		int err;
> -		u64 psscr_rl = psscr_val[i] & PSSCR_RL_MASK;
> +		struct pnv_idle_states_t *state = &pnv_idle_states[i];
> +		u64 psscr_rl = state->pm_ctrl_reg_val & PSSCR_RL_MASK;
> 
> -		if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
> -		     (pnv_first_deep_stop_state > psscr_rl))
> +		if ((state->flags & OPAL_PM_LOSE_FULL_CONTEXT) &&
> +		    (pnv_first_deep_stop_state > psscr_rl))
>  			pnv_first_deep_stop_state = psscr_rl;
> 
> -		err = validate_psscr_val_mask(&psscr_val[i], &psscr_mask[i],
> -					      flags[i]);
> +		err = validate_psscr_val_mask(&state->pm_ctrl_reg_val,
> +					      &state->pm_ctrl_reg_mask,
> +					      state->flags);


  We could have a "bool valid" field in the pnv_idle_states_t struct
  for explicitly recording any invalid states in order to prevent any
  other subsystems from using it. We are doing the validation of the
  psscr_val and mask twice today - once in this code and once again in
  the cpuidle code.
  
  
>  		if (err) {
> -			report_invalid_psscr_val(psscr_val[i], err);
> +			report_invalid_psscr_val(state->pm_ctrl_reg_val, err);
>  			continue;
>  		}
> 
> -		if (max_residency_ns < residency_ns[i]) {
> -			max_residency_ns = residency_ns[i];
> -			pnv_deepest_stop_psscr_val = psscr_val[i];
> -			pnv_deepest_stop_psscr_mask = psscr_mask[i];
> -			pnv_deepest_stop_flag = flags[i];
> +		if (max_residency_ns < state->residency_ns) {
> +			max_residency_ns = state->residency_ns;
> +			pnv_deepest_stop_psscr_val = state->pm_ctrl_reg_val;
> +			pnv_deepest_stop_psscr_mask = state->pm_ctrl_reg_mask;
> +			pnv_deepest_stop_flag = state->flags;
>  			deepest_stop_found = true;
>  		}
> 
>  		if (!default_stop_found &&
> -		    (flags[i] & OPAL_PM_STOP_INST_FAST)) {
> -			pnv_default_stop_val = psscr_val[i];
> -			pnv_default_stop_mask = psscr_mask[i];
> +		    (state->flags & OPAL_PM_STOP_INST_FAST)) {
> +			pnv_default_stop_val = state->pm_ctrl_reg_val;
> +			pnv_default_stop_mask = state->pm_ctrl_reg_mask;
>  			default_stop_found = true;
>  		}
>  	}
> @@ -728,11 +695,8 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
> 
>  	pr_info("cpuidle-powernv: Requested Level (RL) value of first deep stop = 0x%llx\n",
>  		pnv_first_deep_stop_state);
> -out:
> -	kfree(psscr_val);
> -	kfree(psscr_mask);
> -	kfree(residency_ns);
> -	return rc;
> +
> +	return 0;
>  }
> 
>  /*
> @@ -776,14 +740,129 @@ static void __init pnv_probe_idle_states(void)
>  out:
>  	kfree(flags);
>  }
> -static int __init pnv_init_idle_states(void)
> +
> +/*
> + * This function is use to parse device-tree and populates all the information
> + * into pnv_idle_states structure. Also it sets up nr_pnv_idle_states and
> + * the number of cpuidle states discovered through device-tree.

  "Also it sets up nr_pnv_idle_states *which is* the number of cpuidle
  states discovered through the device-tree."
  
> + */
> +
> +static int pnv_parse_cpuidle_dt(void)
>  {
> +	struct device_node *np;
> +	int nr_idle_states, i;
> +	u32 *temp_u32;
> +	u64 *temp_u64;
> +	const char **temp_string;
> +
> +	np = of_find_node_by_path("/ibm,opal/power-mgt");
> +	if (!np) {
> +		pr_warn("opal: PowerMgmt Node not found\n");
> +		return -ENODEV;
> +	}
> +	nr_idle_states = of_property_count_u32_elems(np,
> +				"ibm,cpu-idle-state-flags");
> +
> +	pnv_idle_states = kcalloc(nr_idle_states, sizeof(*pnv_idle_states),
> +				  GFP_KERNEL);
> +	temp_u32 = kcalloc(nr_idle_states, sizeof(u32),  GFP_KERNEL);
> +	temp_u64 = kcalloc(nr_idle_states, sizeof(u64),  GFP_KERNEL);
> +	temp_string = kcalloc(nr_idle_states, sizeof(char *), GFP_KERNEL);

We should ensure that the allocations has succeeded here, else
bail out and disable the idle states.
	
> +
> +	/* Read flags */
> +	if (of_property_read_u32_array(np,
> +			"ibm,cpu-idle-state-flags", temp_u32, nr_idle_states)) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-flags in DT\n");
> +		goto out;
> +	}

There was some device-tree hardening in the cpuidle code which I think
can be moved to this place.

> +	for (i = 0; i < nr_idle_states; i++)
> +		pnv_idle_states[i].flags = temp_u32[i];
> 
> +	/* Read latencies */
> +	if (of_property_read_u32_array(np,
> +			"ibm,cpu-idle-state-latencies-ns", temp_u32, nr_idle_states)) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-latencies-ns in DT\n");
> +		goto out;
> +	}
> +	for (i = 0; i < nr_idle_states; i++)
> +		pnv_idle_states[i].latency_ns = temp_u32[i];
> +
> +	/* Read residencies */
> +	if (of_property_read_u32_array(np,
> +			"ibm,cpu-idle-state-residency-ns", temp_u32, nr_idle_states)) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-latencies-ns in DT\n");
> +		goto out;
> +	}
> +	for (i = 0; i < nr_idle_states; i++)
> +		pnv_idle_states[i].residency_ns = temp_u32[i];
> +
> +	/* For power9 */
> +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +		/* Read pm_crtl_val */
> +		if (of_property_read_u64_array(np,
> +				"ibm,cpu-idle-state-psscr", temp_u64, nr_idle_states)) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
> +			goto out;
> +		}
> +		for (i = 0; i < nr_idle_states; i++)
> +			pnv_idle_states[i].pm_ctrl_reg_val = temp_u64[i];
> +
> +		/* Read pm_crtl_mask */
> +		if (of_property_read_u64_array(np,
> +				"ibm,cpu-idle-state-psscr-mask", temp_u64, nr_idle_states)) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask in DT\n");
> +			goto out;
> +		}
> +		for (i = 0; i < nr_idle_states; i++)
> +			pnv_idle_states[i].pm_ctrl_reg_mask = temp_u64[i];
> +	} else { /* Power8 and Power7 */
> +		/* Read pm_crtl_val */
> +		if (of_property_read_u64_array(np,
> +				"ibm,cpu-idle-state-pmicr", temp_u64, nr_idle_states)) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr in DT\n");
> +			goto out;
> +		}
> +		for (i = 0; i < nr_idle_states; i++)
> +			pnv_idle_states[i].pm_ctrl_reg_val = temp_u64[i];
> +
> +		/* Read pm_crtl_mask */
> +		if (of_property_read_u64_array(np,
> +				"ibm,cpu-idle-state-pmicr-mask", temp_u64, nr_idle_states)) {
> +			pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-psscr-mask in DT\n");
> +			goto out;
> +		}
> +		for (i = 0; i < nr_idle_states; i++)
> +			pnv_idle_states[i].pm_ctrl_reg_mask = temp_u64[i];

We don't use the pmicr val and the mask in the kernel for either
POWER7 or POWER8. So, is this "else" block required ?

> +
> +	}
> +	if (of_property_read_string_array(np,
> +			"ibm,cpu-idle-state-names", temp_string, nr_idle_states) < 0) {
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in DT\n");
> +		goto out;
> +	}
> +	for (i = 0; i < nr_idle_states; i++)
> +		strncpy(pnv_idle_states[i].name, temp_string[i],
> +			       PNV_IDLE_NAME_LEN);
> +	nr_pnv_idle_states = nr_idle_states;
> +out:
> +	kfree(temp_u32);
> +	kfree(temp_u64);
> +	kfree(temp_string);
> +	return 0;

We shouldn't be returning 0 we have come to "out" due to any failure
in the device-tree parsing ?

> +}
> +
> +static int __init pnv_init_idle_states(void)
> +{
> +	int rc = 0;
>  	supported_cpuidle_states = 0;
> 
> +	/* In case we error out nr_pnv_idle_states will be zero */
> +	nr_pnv_idle_states = 0;
>  	if (cpuidle_disable != IDLE_NO_OVERRIDE)
>  		goto out;
> -
> +	rc = pnv_parse_cpuidle_dt();
> +	if (rc)
> +		return rc;
>  	pnv_probe_idle_states();
>
>  	if (!(supported_cpuidle_states & OPAL_PM_SLEEP_ENABLED_ER1)) {
> -- 
> 2.5.5
> 

^ permalink raw reply

* Re: [PATCH 2/3] cpuidle/powernv: Change platform init to avoid reparsing dt
From: Gautham R Shenoy @ 2018-06-20  4:52 UTC (permalink / raw)
  To: Akshay Adiga
  Cc: linux-kernel, linuxppc-dev, linux-pm, rjw, svaidy, ego, npiggin,
	mpe
In-Reply-To: <1529384668-27548-3-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Hi Akshay,


On Tue, Jun 19, 2018 at 10:34:27AM +0530, Akshay Adiga wrote:
> The required data is accessible from cpuidle_states structure and
> nr_cpu_idle_states. This patch makes changes to avoid reparsing and use
> data from these structures.
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/idle.c | 37 ++++++++---------------------------
>  1 file changed, 8 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 07be984..0a62611 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -624,8 +624,7 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags)
>   * @dt_idle_states: Number of idle state entries
>   * Returns 0 on success
>   */
> -static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
> -					int dt_idle_states)
> +static int __init pnv_power9_idle_init(void)
>  {
>  	u64 max_residency_ns = 0;
>  	int i;
> @@ -644,7 +643,7 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
>  	 * the shallowest (OPAL_PM_STOP_INST_FAST) loss-less stop state.
>  	 */
>  	pnv_first_deep_stop_state = MAX_STOP_STATE;
> -	for (i = 0; i < dt_idle_states; i++) {
> +	for (i = 0; i < nr_pnv_idle_states; i++) {
>  		int err;
>  		struct pnv_idle_states_t *state = &pnv_idle_states[i];
>  		u64 psscr_rl = state->pm_ctrl_reg_val & PSSCR_RL_MASK;

These above two hunks could be folded into the first patch.

> @@ -704,41 +703,21 @@ static int __init pnv_power9_idle_init(struct device_node *np, u32 *flags,
>   */
>  static void __init pnv_probe_idle_states(void)

Can we move the remaining content of this function into the calling
function pnv_init_idle_states() ? That way...

>  {
> -	struct device_node *np;
> -	int dt_idle_states;
> -	u32 *flags = NULL;
>  	int i;
> 
> -	np = of_find_node_by_path("/ibm,opal/power-mgt");
> -	if (!np) {
> -		pr_warn("opal: PowerMgmt Node not found\n");
> -		goto out;
> -	}
> -	dt_idle_states = of_property_count_u32_elems(np,
> -			"ibm,cpu-idle-state-flags");
> -	if (dt_idle_states < 0) {
> +	if (nr_pnv_idle_states < 0) {

... This reduntant check can be removed. Because nr_pnv_idle_states is initialized to
zero) and if there is an error in parsing the device tree we don't set
up this value. If there is no error, then nr_pnv_idle_states is a
positive value. 



>  		pr_warn("cpuidle-powernv: no idle states found in the DT\n");
> -		goto out;
> -	}
> -
> -	flags = kcalloc(dt_idle_states, sizeof(*flags),  GFP_KERNEL);
> -
> -	if (of_property_read_u32_array(np,
> -			"ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
> -		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-flags in DT\n");
> -		goto out;
> +		return;
>  	}
> 
>  	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> -		if (pnv_power9_idle_init(np, flags, dt_idle_states))
> -			goto out;
> +		if (pnv_power9_idle_init())

There cannot any longer be a failure in pnv_power9_idle_init() after
the first patch, since we always return a 0. So  this if condition can be removed.

> +			return;
>  	}
> 
> -	for (i = 0; i < dt_idle_states; i++)
> -		supported_cpuidle_states |= flags[i];
> +	for (i = 0; i < nr_pnv_idle_states; i++)
> +		supported_cpuidle_states |= pnv_idle_states[i].flags;

Ideally we should be setting the supported_cpuidle_states flags after
all the validatations have been done, including the ones for stop-api
initialization.

> 
> -out:
> -	kfree(flags);
>  }
> 
>  /*
> -- 
> 2.5.5
> 

^ permalink raw reply

* Re: [PATCH 3/3] powernv/cpuidle: Use parsed device tree values for cpuidle_init
From: Gautham R Shenoy @ 2018-06-20  5:01 UTC (permalink / raw)
  To: Akshay Adiga
  Cc: linux-kernel, linuxppc-dev, linux-pm, rjw, svaidy, ego, npiggin,
	mpe
In-Reply-To: <1529384668-27548-4-git-send-email-akshay.adiga@linux.vnet.ibm.com>

Hi Akshay,

On Tue, Jun 19, 2018 at 10:34:28AM +0530, Akshay Adiga wrote:
> Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
> cpuidle driver. Use properties from pnv_idle_states structure for powernv
> cpuidle_init.
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/cpuidle.h |  2 ++
>  drivers/cpuidle/cpuidle-powernv.c  | 49 +++++++++++++++++++++++---------------
>  2 files changed, 32 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
> index 55ee7e3..1446747 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -93,6 +93,8 @@ struct pnv_idle_states_t {
>  	u32 flags;
>  };
> 
> +extern struct pnv_idle_states_t *pnv_idle_states;
> +extern int nr_pnv_idle_states;
>  extern u32 pnv_fastsleep_workaround_at_entry[];
>  extern u32 pnv_fastsleep_workaround_at_exit[];
> 
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index d29e4f0..de8ba26 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -285,6 +285,11 @@ static int powernv_add_idle_states(void)
>  		goto out;
>  	}
> 
> +	if (nr_pnv_idle_states <= 0) {
> +		pr_warn("opal: No idle states found\n");

We would have a warning of this type in the powernv/idle.c right ? We
don't need to warn again. Or at least have a different warning
something like: 

      	         pr_warn("cpuidle: Only snooze state is available\n");
> +		goto out;
> +	}

> +
>  	/* Read values of any property to determine the num of idle states */
>  	dt_idle_states = of_property_count_u32_elems(power_mgt, "ibm,cpu-idle-state-flags");


We don't need to get dt_idle_states from the device tree now that we
have nr_pnv_idle_states. 


>  	if (dt_idle_states < 0) {
> @@ -338,7 +343,7 @@ static int powernv_add_idle_states(void)
>  	 * If the idle states use stop instruction, probe for psscr values
>  	 * and psscr mask which are necessary to specify required stop level.
>  	 */
> -	has_stop_states = (flags[0] &
> +	has_stop_states = (pnv_idle_states[0].flags &
>  			   (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP));
>  	if (has_stop_states) {
>  		count = of_property_count_u64_elems(power_mgt,
> @@ -387,51 +392,55 @@ static int powernv_add_idle_states(void)
>  						residency_ns, dt_idle_states);
>  	}
> 
> -	for (i = 0; i < dt_idle_states; i++) {
> +	for (i = 0; i < nr_pnv_idle_states; i++) {
>  		unsigned int exit_latency, target_residency;
>  		bool stops_timebase = false;
> +		struct pnv_idle_states_t *state = &pnv_idle_states[i];
> 
>  		/*
>  		 * Skip the platform idle state whose flag isn't in
> -		 * the supported_cpuidle_states flag mask.
> +		 * the supported_pnv_idle_states flag mask.
>  		 */
> -		if ((flags[i] & supported_flags) != flags[i])
> +		if ((state->flags & supported_flags) !=
> +				state->flags)
>  			continue;
>  		/*
>  		 * If an idle state has exit latency beyond
>  		 * POWERNV_THRESHOLD_LATENCY_NS then don't use it
>  		 * in cpu-idle.
>  		 */
> -		if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
> +		if (state->latency_ns > POWERNV_THRESHOLD_LATENCY_NS)
>  			continue;
>  		/*
>  		 * Firmware passes residency and latency values in ns.
>  		 * cpuidle expects it in us.
>  		 */
> -		exit_latency = DIV_ROUND_UP(latency_ns[i], 1000);
> +		exit_latency = DIV_ROUND_UP(state->latency_ns, 1000);
>  		if (!rc)
> -			target_residency = DIV_ROUND_UP(residency_ns[i], 1000);
> +			target_residency = DIV_ROUND_UP(state->residency_ns, 1000);
>  		else
>  			target_residency = 0;
> 
>  		if (has_stop_states) {
> -			int err = validate_psscr_val_mask(&psscr_val[i],
> -							  &psscr_mask[i],
> -							  flags[i]);

> +			int err;
> +			err = validate_psscr_val_mask(&state->pm_ctrl_reg_val,
> +						      &state->pm_ctrl_reg_mask,
> +						      state->flags);
>  			if (err) {
> -				report_invalid_psscr_val(psscr_val[i], err);
> +				report_invalid_psscr_val(state->pm_ctrl_reg_val,
> +							 err);
>  				continue;
>  			}

We have already validated the states once. We should simply use the
result here.


>  		}
> 
> -		if (flags[i] & OPAL_PM_TIMEBASE_STOP)
> +		if (state->flags & OPAL_PM_TIMEBASE_STOP)
>  			stops_timebase = true;
> 
>  		/*
>  		 * For nap and fastsleep, use default target_residency
>  		 * values if f/w does not expose it.
>  		 */
> -		if (flags[i] & OPAL_PM_NAP_ENABLED) {
> +		if (state->flags & OPAL_PM_NAP_ENABLED) {
>  			if (!rc)
>  				target_residency = 100;
>  			/* Add NAP state */
> @@ -439,10 +448,11 @@ static int powernv_add_idle_states(void)
>  					  CPUIDLE_FLAG_NONE, nap_loop,
>  					  target_residency, exit_latency, 0, 0);
>  		} else if (has_stop_states && !stops_timebase) {
> -			add_powernv_state(nr_idle_states, names[i],
> +			add_powernv_state(nr_idle_states, state->name,
>  					  CPUIDLE_FLAG_NONE, stop_loop,
>  					  target_residency, exit_latency,
> -					  psscr_val[i], psscr_mask[i]);
> +					  state->pm_ctrl_reg_val,
> +					  state->pm_ctrl_reg_mask);
>  		}
> 
>  		/*
> @@ -450,8 +460,8 @@ static int powernv_add_idle_states(void)
>  		 * within this config dependency check.
>  		 */
>  #ifdef CONFIG_TICK_ONESHOT
> -		else if (flags[i] & OPAL_PM_SLEEP_ENABLED ||
> -			 flags[i] & OPAL_PM_SLEEP_ENABLED_ER1) {
> +		else if (state->flags & OPAL_PM_SLEEP_ENABLED ||
> +			 state->flags & OPAL_PM_SLEEP_ENABLED_ER1) {
>  			if (!rc)
>  				target_residency = 300000;
>  			/* Add FASTSLEEP state */
> @@ -460,10 +470,11 @@ static int powernv_add_idle_states(void)
>  					  fastsleep_loop,
>  					  target_residency, exit_latency, 0, 0);
>  		} else if (has_stop_states && stops_timebase) {
> -			add_powernv_state(nr_idle_states, names[i],
> +			add_powernv_state(nr_idle_states, state->name,
>  					  CPUIDLE_FLAG_TIMER_STOP, stop_loop,
>  					  target_residency, exit_latency,
> -					  psscr_val[i], psscr_mask[i]);
> +					  state->pm_ctrl_reg_val,
> +					  state->pm_ctrl_reg_mask);
>  		}
>  #endif
>  		else
> -- 
> 2.5.5
> 

^ permalink raw reply

* [PATCH] powerpc/pkeys: preallocate execute_only key only if the key is available.
From: Ram Pai @ 2018-06-20  5:34 UTC (permalink / raw)
  To: mpe
  Cc: linuxppc-dev, hbabu, mhocko, bauerman, linuxram, Ulrich.Weigand,
	fweimer, msuchanek

Key 2 is preallocated and reserved for execute-only key. In rare
cases if key-2 is unavailable, mprotect(PROT_EXEC) will behave
incorrectly. NOTE: mprotect(PROT_EXEC) uses execute-only key.

Ensure key 2 is available for preallocation before reserving it for
execute_only purpose.

CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/mm/pkeys.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index cec990c..2013ef0 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -19,6 +19,7 @@
 u64  pkey_amr_mask;		/* Bits in AMR not to be touched */
 u64  pkey_iamr_mask;		/* Bits in AMR not to be touched */
 u64  pkey_uamor_mask;		/* Bits in UMOR not to be touched */
+u32  execute_only_key = 2;
 
 #define AMR_BITS_PER_PKEY 2
 #define AMR_RD_BIT 0x1UL
@@ -26,7 +27,6 @@
 #define IAMR_EX_BIT 0x1UL
 #define PKEY_REG_BITS (sizeof(u64)*8)
 #define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
-#define EXECUTE_ONLY_KEY 2
 
 static void scan_pkey_feature(void)
 {
@@ -122,8 +122,12 @@ int pkey_initialize(void)
 #else
 	os_reserved = 0;
 #endif
+
+	if ((pkeys_total - os_reserved) <= execute_only_key)
+		execute_only_key = -1;
+
 	/* Bits are in LE format. */
-	reserved_allocation_mask = (0x1 << 1) | (0x1 << EXECUTE_ONLY_KEY);
+	reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);
 	initial_allocation_mask  = reserved_allocation_mask | (0x1 << PKEY_0);
 
 	/* register mask is in BE format */
@@ -132,11 +136,11 @@ int pkey_initialize(void)
 
 	pkey_iamr_mask = ~0x0ul;
 	pkey_iamr_mask &= ~(0x3ul << pkeyshift(PKEY_0));
-	pkey_iamr_mask &= ~(0x3ul << pkeyshift(EXECUTE_ONLY_KEY));
+	pkey_iamr_mask &= ~(0x3ul << pkeyshift(execute_only_key));
 
 	pkey_uamor_mask = ~0x0ul;
 	pkey_uamor_mask &= ~(0x3ul << pkeyshift(PKEY_0));
-	pkey_uamor_mask &= ~(0x3ul << pkeyshift(EXECUTE_ONLY_KEY));
+	pkey_uamor_mask &= ~(0x3ul << pkeyshift(execute_only_key));
 
 	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++)
 		pkey_uamor_mask &= ~(0x3ul << pkeyshift(i));
@@ -151,7 +155,7 @@ void pkey_mm_init(struct mm_struct *mm)
 	if (static_branch_likely(&pkey_disabled))
 		return;
 	mm_pkey_allocation_map(mm) = initial_allocation_mask;
-	mm->context.execute_only_pkey = EXECUTE_ONLY_KEY;
+	mm->context.execute_only_pkey = execute_only_key;
 }
 
 static inline u64 read_amr(void)
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH 1/3] [v2] powerpc: mac: fix rtc read/write functions
From: Mathieu Malaterre @ 2018-06-20  7:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Paul Mackerras, Michael Ellerman, Geert Uytterhoeven, funaho,
	Benjamin Herrenschmidt, gerg, linux-m68k, linuxppc-dev, LKML,
	y2038, Meelis Roos, schwab
In-Reply-To: <20180619140229.3615110-1-arnd@arndb.de>

On Tue, Jun 19, 2018 at 4:04 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> As Mathieu pointed out, my conversion to time64_t was incorrect and resulted
> in negative times to be read from the RTC. The problem is that during the
> conversion from a byte array to a time64_t, the 'unsigned char' variable
> holding the top byte gets turned into a negative signed 32-bit integer
> before being assigned to the 64-bit variable for any times after 1972.
>
> This changes the logic to cast to an unsigned 32-bit number first for
> the Macintosh time and then convert that to the Unix time, which then gives
> us a time in the documented 1904..2040 year range. I decided not to use
> the longer 1970..2106 range that other drivers use, for consistency with
> the literal interpretation of the register, but that could be easily
> changed if we decide we want to support any Mac after 2040.
>
> Just to be on the safe side, I'm also adding a WARN_ON that will trigger
> if either the year 2040 has come and is observed by this driver, or we
> run into an RTC that got set back to a pre-1970 date for some reason
> (the two are indistinguishable).
>
> For the RTC write functions, Andreas found another problem: both
> pmu_request() and cuda_request() are varargs functions, so changing
> the type of the arguments passed into them from 32 bit to 64 bit
> breaks the API for the set_rtc_time functions. This changes it
> back to 32 bits.
>
> The same code exists in arch/m68k/ and is patched in an identical way now
> in a separate patch.
>
> Fixes: 5bfd643583b2 ("powerpc: use time64_t in read_persistent_clock")
> Reported-by: Mathieu Malaterre <malat@debian.org>
> Reported-by: Andreas Schwab <schwab@linux-m68k.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Doing a reboot on Debian sets the hardware clock from system clock and
upon reboot everything is back in shape:

[    5.645082] rtc-generic rtc-generic: setting system clock to
2018-06-20 07:12:04 UTC (1529478724)


Tested-by: Mathieu Malaterre <malat@debian.org>

Thanks!
> ---
>  arch/powerpc/platforms/powermac/time.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powermac/time.c b/arch/powerpc/platforms/powermac/time.c
> index 7c968e46736f..12e6e4d30602 100644
> --- a/arch/powerpc/platforms/powermac/time.c
> +++ b/arch/powerpc/platforms/powermac/time.c
> @@ -42,7 +42,11 @@
>  #define DBG(x...)
>  #endif
>
> -/* Apparently the RTC stores seconds since 1 Jan 1904 */
> +/*
> + * Offset between Unix time (1970-based) and Mac time (1904-based). Cuda and PMU
> + * times wrap in 2040. If we need to handle later times, the read_time functions
> + * need to be changed to interpret wrapped times as post-2040.
> + */
>  #define RTC_OFFSET     2082844800
>
>  /*
> @@ -97,8 +101,11 @@ static time64_t cuda_get_time(void)
>         if (req.reply_len != 7)
>                 printk(KERN_ERR "cuda_get_time: got %d byte reply\n",
>                        req.reply_len);
> -       now = (req.reply[3] << 24) + (req.reply[4] << 16)
> -               + (req.reply[5] << 8) + req.reply[6];
> +       now = (u32)((req.reply[3] << 24) + (req.reply[4] << 16) +
> +                   (req.reply[5] << 8) + req.reply[6]);
> +       /* it's either after year 2040, or the RTC has gone backwards */
> +       WARN_ON(now < RTC_OFFSET);
> +
>         return now - RTC_OFFSET;
>  }
>
> @@ -106,10 +113,10 @@ static time64_t cuda_get_time(void)
>
>  static int cuda_set_rtc_time(struct rtc_time *tm)
>  {
> -       time64_t nowtime;
> +       u32 nowtime;
>         struct adb_request req;
>
> -       nowtime = rtc_tm_to_time64(tm) + RTC_OFFSET;
> +       nowtime = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET);
>         if (cuda_request(&req, NULL, 6, CUDA_PACKET, CUDA_SET_TIME,
>                          nowtime >> 24, nowtime >> 16, nowtime >> 8,
>                          nowtime) < 0)
> @@ -140,8 +147,12 @@ static time64_t pmu_get_time(void)
>         if (req.reply_len != 4)
>                 printk(KERN_ERR "pmu_get_time: got %d byte reply from PMU\n",
>                        req.reply_len);
> -       now = (req.reply[0] << 24) + (req.reply[1] << 16)
> -               + (req.reply[2] << 8) + req.reply[3];
> +       now = (u32)((req.reply[0] << 24) + (req.reply[1] << 16) +
> +                   (req.reply[2] << 8) + req.reply[3]);
> +
> +       /* it's either after year 2040, or the RTC has gone backwards */
> +       WARN_ON(now < RTC_OFFSET);
> +
>         return now - RTC_OFFSET;
>  }
>
> @@ -149,10 +160,10 @@ static time64_t pmu_get_time(void)
>
>  static int pmu_set_rtc_time(struct rtc_time *tm)
>  {
> -       time64_t nowtime;
> +       u32 nowtime;
>         struct adb_request req;
>
> -       nowtime = rtc_tm_to_time64(tm) + RTC_OFFSET;
> +       nowtime = lower_32_bits(rtc_tm_to_time64(tm) + RTC_OFFSET);
>         if (pmu_request(&req, NULL, 5, PMU_SET_RTC, nowtime >> 24,
>                         nowtime >> 16, nowtime >> 8, nowtime) < 0)
>                 return -ENXIO;
> --
> 2.9.0
>

^ permalink raw reply

* Re: [RFC PATCH 17/23] watchdog/hardlockup/hpet: Convert the timer's interrupt to NMI
From: Thomas Gleixner @ 2018-06-20  7:47 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ingo Molnar, H. Peter Anvin, Andi Kleen, Ashok Raj,
	Borislav Petkov, Tony Luck, Ravi V. Shankar, x86, sparclinux,
	linuxppc-dev, linux-kernel, Jacob Pan, Rafael J. Wysocki,
	Don Zickus, Nicholas Piggin, Michael Ellerman,
	Frederic Weisbecker, Alexei Starovoitov, Babu Moger,
	Mathieu Desnoyers, Masami Hiramatsu, Peter Zijlstra,
	Andrew Morton, Philippe Ombredanne, Colin Ian King,
	Byungchul Park, Paul E. McKenney, Luis R. Rodriguez, Waiman Long,
	Josh Poimboeuf, Randy Dunlap, Davidlohr Bueso, Christoffer Dall,
	Marc Zyngier, Kai-Heng Feng, Konrad Rzeszutek Wilk,
	David Rientjes, iommu
In-Reply-To: <20180620001535.GA27531@voyager>

On Tue, 19 Jun 2018, Ricardo Neri wrote:
> On Sat, Jun 16, 2018 at 03:24:49PM +0200, Thomas Gleixner wrote:
> > The status register is useless in case of MSI. MSI is edge triggered ....
> > 
> > The only register which gives you proper information is the counter
> > register itself. That adds an massive overhead to each NMI, because the
> > counter register access is synchronized to the HPET clock with hardware
> > magic. Plus on larger systems, the HPET access is cross node and even
> > slower.
> 
> It starts to sound that the HPET is too slow to drive the hardlockup detector.
> 
> Would it be possible to envision a variant of this implementation? In this
> variant, the HPET only targets a single CPU. The actual hardlockup detector
> is implemented by this single CPU sending interprocessor interrupts to the
> rest of the CPUs.

And these IPIs must be NMIs which need to have a software based indicator
that the watchdog needs to be checked, which is going to create yet another
can of race conditions and in the worst case 'unknown NMI' splats. Not
pretty either.

Thanks,

	tglx

^ permalink raw reply

* [PATCH kernel] KVM: PPC: Fix hardware and emulated TCE tables matching
From: Alexey Kardashevskiy @ 2018-06-20  8:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc, Paul Mackerras

When attaching a hardware table to LIOBN in KVM, we match table parameters
such as page size, table offset and table size. However the tables are
created via very different paths - VFIO and KVM - and the VFIO path goes
through the platform code which has minimum TCE page size requirement
(which is 4K but since we allocate memory by pages and cannot avoid
alignment anyway, we align to 64k pages for powernv_defconfig).

So when we match the tables, one might be bigger that the other which
means the hardware table cannot get attached to LIOBN and DMA mapping
fails.

This removes the table size alignment from the guest visible table.
This does not affect the memory allocation which is still aligned -
kvmppc_tce_pages() takes care of this.

This relaxes the check we do when attaching tables to allow the hardware
table be bigger than the guest visible table.

Ideally we want the KVM table to cover the same space as the hardware
table does but since the hardware table may use multiple levels, and
all levels must use the same table size (IODA2 design), the area it can
actually cover might get very different from the window size which
the guest requested, even though the guest won't map it all.

Fixes: ca1fc489cf "KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/book3s_64_vio.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8c456fa..8167ce8 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -180,7 +180,7 @@ extern long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
 		if ((tbltmp->it_page_shift <= stt->page_shift) &&
 				(tbltmp->it_offset << tbltmp->it_page_shift ==
 				 stt->offset << stt->page_shift) &&
-				(tbltmp->it_size << tbltmp->it_page_shift ==
+				(tbltmp->it_size << tbltmp->it_page_shift >=
 				 stt->size << stt->page_shift)) {
 			/*
 			 * Reference the table to avoid races with
@@ -296,7 +296,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 {
 	struct kvmppc_spapr_tce_table *stt = NULL;
 	struct kvmppc_spapr_tce_table *siter;
-	unsigned long npages, size;
+	unsigned long npages, size = args->size;
 	int ret = -ENOMEM;
 	int i;
 
@@ -304,7 +304,6 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 		(args->offset + args->size > (ULLONG_MAX >> args->page_shift)))
 		return -EINVAL;
 
-	size = _ALIGN_UP(args->size, PAGE_SIZE >> 3);
 	npages = kvmppc_tce_pages(size);
 	ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true);
 	if (ret)
-- 
2.11.0

^ permalink raw reply related

* [powerpc:merge 116/158] ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined!
From: kbuild test robot @ 2018-06-20  9:12 UTC (permalink / raw)
  Cc: kbuild-all, linuxppc-dev, Michael Ellerman

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
head:   6f4c6f50edb82c36e84df8528760998b1ed6b6cb
commit: 076674a524241b30bbff9e8d45e5dbdcea39cada [116/158] Automatic merge of branches 'master', 'next' and 'fixes' into merge
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 076674a524241b30bbff9e8d45e5dbdcea39cada
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   ERROR: "ia64_delay_loop" [drivers/spi/spi-thunderx.ko] undefined!
>> ERROR: "__sw_hweight8" [drivers/net/wireless/mediatek/mt76/mt76.ko] undefined!
   ERROR: "ia64_delay_loop" [drivers/net/phy/mdio-cavium.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49916 bytes --]

^ permalink raw reply

* [PATCH V2] powerpc/mm: Initialize kernel pagetable memory for PTE fragments
From: Anshuman Khandual @ 2018-06-20  9:22 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, aneesh.kumar, benh

Kernel pagetable pages for PTE fragments never go through the standard init
sequence which can cause inaccuracies in utilization statistics reported at
places like /proc and /sysfs interfaces etc. Also the allocated page misses
out on pagetable lock and page flag init as well. Fix it by making sure all
pages allocated for either user process or kernel PTE fragments go through
same initialization.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V2:

- Call the destructor function during free for all cases

 arch/powerpc/mm/pgtable-book3s64.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index c1f4ca4..a820ee6 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -335,23 +335,21 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 
 static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 {
+	gfp_t gfp_mask = PGALLOC_GFP;
 	void *ret = NULL;
 	struct page *page;
 
-	if (!kernel) {
-		page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-		if (!page)
-			return NULL;
-		if (!pgtable_page_ctor(page)) {
-			__free_page(page);
-			return NULL;
-		}
-	} else {
-		page = alloc_page(PGALLOC_GFP);
-		if (!page)
-			return NULL;
-	}
+	if (!kernel)
+		gfp_mask |= __GFP_ACCOUNT;
 
+	page = alloc_page(gfp_mask);
+	if (!page)
+		return NULL;
+
+	if (!pgtable_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
 
 	ret = page_address(page);
 	/*
@@ -391,8 +389,7 @@ void pte_fragment_free(unsigned long *table, int kernel)
 	struct page *page = virt_to_page(table);
 
 	if (put_page_testzero(page)) {
-		if (!kernel)
-			pgtable_page_dtor(page);
+		pgtable_page_dtor(page);
 		free_unref_page(page);
 	}
 }
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH V2] powerpc/mm: Initialize kernel pagetable memory for PTE fragments
From: Aneesh Kumar K.V @ 2018-06-20  9:30 UTC (permalink / raw)
  To: Anshuman Khandual, linuxppc-dev
In-Reply-To: <20180620092255.19393-1-khandual@linux.vnet.ibm.com>

On 06/20/2018 02:52 PM, Anshuman Khandual wrote:
> Kernel pagetable pages for PTE fragments never go through the standard init
> sequence which can cause inaccuracies in utilization statistics reported at
> places like /proc and /sysfs interfaces etc. Also the allocated page misses
> out on pagetable lock and page flag init as well. Fix it by making sure all
> pages allocated for either user process or kernel PTE fragments go through
> same initialization.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
> Changes in V2:
> 
> - Call the destructor function during free for all cases
> 
>   arch/powerpc/mm/pgtable-book3s64.c | 27 ++++++++++++---------------
>   1 file changed, 12 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index c1f4ca4..a820ee6 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -335,23 +335,21 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
>   
>   static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
>   {
> +	gfp_t gfp_mask = PGALLOC_GFP;
>   	void *ret = NULL;
>   	struct page *page;
>   
> -	if (!kernel) {
> -		page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
> -		if (!page)
> -			return NULL;
> -		if (!pgtable_page_ctor(page)) {
> -			__free_page(page);
> -			return NULL;
> -		}
> -	} else {
> -		page = alloc_page(PGALLOC_GFP);
> -		if (!page)
> -			return NULL;
> -	}
> +	if (!kernel)
> +		gfp_mask |= __GFP_ACCOUNT;
>   
> +	page = alloc_page(gfp_mask);
> +	if (!page)
> +		return NULL;
> +
> +	if (!pgtable_page_ctor(page)) {
> +		__free_page(page);
> +		return NULL;
> +	}
>   
>   	ret = page_address(page);
>   	/*
> @@ -391,8 +389,7 @@ void pte_fragment_free(unsigned long *table, int kernel)
>   	struct page *page = virt_to_page(table);
>   
>   	if (put_page_testzero(page)) {
> -		if (!kernel)
> -			pgtable_page_dtor(page);
> +		pgtable_page_dtor(page);
>   		free_unref_page(page);
>   	}
>   }
> 

If we really are going to do this, may be we can  do it as 
alloc_for_pmdcache, kill the kernel arg and compare against init_mm?

-aneesh

^ permalink raw reply

* Re: [PATCH] spi/fsl-espi: Add missing cell-index OF property
From: Mark Brown @ 2018-06-20 10:36 UTC (permalink / raw)
  To: York Sun; +Cc: jocke@infinera.com, linuxppc-dev
In-Reply-To: <VI1PR04MB20782A7BE6789A3AE441DB359A700@VI1PR04MB2078.eurprd04.prod.outlook.com>

[-- Attachment #1: Type: text/plain, Size: 630 bytes --]

On Tue, Jun 19, 2018 at 04:29:35PM +0000, York Sun wrote:
> Joakim,
> 
> I am not the maintainer for this driver. Adding Mark Brown.
> 
> York
> 
> On 06/19/2018 09:23 AM, Joakim Tjernlund wrote:
> > espi does not look for a OF cell-index property which
> > makes the bus numbering dynamic only. This add an

As documented in SubmittingPatches please send patches to the 
maintainers for the code you would like to change.  The normal kernel
workflow is that people apply patches from their inboxes, if they aren't
copied they are likely to not see the patch at all and it is much more
difficult to apply patches.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [git pull] Please pull powerpc.git next branch
From: Jiri Kosina @ 2018-06-20 10:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, linuxppc-dev list, Andrew Morton,
	Linux Kernel Mailing List, Mauro Carvalho Chehab
In-Reply-To: <d86fa9cdb4c160350268c70bd8fead138ce5a8e2.camel@kernel.crashing.org>

On Wed, 20 Jun 2018, Benjamin Herrenschmidt wrote:

> > I love the alleged line removal, but there's nothing in that 'merge'
> > branch. It points to an ancient powerpc commit from 2016.
> > 
> > And please, signed tags. I know you do them, because you used to send me them.
> 
> I didn't send this. It's an ancient (8 years old) email... I have no
> idea how it got resent while I was asleep...

I got a few mails from 2010 this morning as well. From the headers it 
looks like Mauro (CCed) re-bounced them for some reason.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply

* arch/powerpc/boot/addnote.c:75:47: warning: right shift count >= width of type [-Wshift-count-overflow]
From: Mathieu Malaterre @ 2018-06-20 11:29 UTC (permalink / raw)
  To: linuxppc-dev

Hi there,

For some reason I am seeing some new warning (W=3D0) on some old code.
Anyone else seeing them ?

  HOSTCC  arch/powerpc/boot/addnote
arch/powerpc/boot/addnote.c: In function =E2=80=98main=E2=80=99:
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >=3D width
of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
                                               ^
arch/powerpc/boot/addnote.c:72:39: note: in definition of macro =E2=80=98PU=
T_16BE=E2=80=99
 #define PUT_16BE(off, v)(buf[off] =3D ((v) >> 8) & 0xff, \
                                       ^
arch/powerpc/boot/addnote.c:75:27: note: in expansion of macro =E2=80=98PUT=
_32BE=E2=80=99
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
                           ^~~~~~~~
arch/powerpc/boot/addnote.c:94:50: note: in expansion of macro =E2=80=98PUT=
_64BE=E2=80=99
 #define PUT_64(off, v)  (e_data =3D=3D ELFDATA2MSB ? PUT_64BE(off, v) : \
                                                  ^~~~~~~~
arch/powerpc/boot/addnote.c:183:3: note: in expansion of macro =E2=80=98PUT=
_64=E2=80=99
   PUT_64(ph + PH_OFFSET, ns);
   ^~~~~~
arch/powerpc/boot/addnote.c:75:47: warning: right shift count >=3D width
of type [-Wshift-count-overflow]
 #define PUT_64BE(off, v)((PUT_32BE((off), (v) >> 32L), \
                                               ^
arch/powerpc/boot/addnote.c:73:23: note: in definition of macro =E2=80=98PU=
T_16BE=E2=80=99
     buf[(off) + 1] =3D (v) & 0xff)

^ permalink raw reply

* [PATCH V3] powerpc/mm: Initialize kernel pagetable memory for PTE fragments
From: Anshuman Khandual @ 2018-06-20 13:05 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, aneesh.kumar, benh

Kernel pagetable pages for PTE fragments never go through the standard init
sequence which can cause inaccuracies in utilization statistics reported at
places like /proc and /sysfs interfaces etc. Also the allocated page misses
out on pagetable lock and page flag init as well. Fix it by making sure all
pages allocated for either user process or kernel PTE fragments go through
same initialization.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Changes in V3:

- Replaced 'kernel' argument with direct check on init_mm as per Aneesh

Changes in V2:

- Call the destructor function during free for all cases

 arch/powerpc/include/asm/book3s/64/pgalloc.h | 12 ++++-----
 arch/powerpc/mm/pgtable-book3s64.c           | 37 +++++++++++++---------------
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 01ee40f..ccb351c 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -41,9 +41,9 @@ struct vmemmap_backing {
 			pgtable_cache[(shift) - 1];	\
 		})
 
-extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
+extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long);
 extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
-extern void pte_fragment_free(unsigned long *, int);
+extern void pte_fragment_free(unsigned long *);
 extern void pmd_fragment_free(unsigned long *);
 extern void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift);
 #ifdef CONFIG_SMP
@@ -176,23 +176,23 @@ static inline pgtable_t pmd_pgtable(pmd_t pmd)
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return (pte_t *)pte_fragment_alloc(mm, address, 1);
+	return (pte_t *)pte_fragment_alloc(mm, address);
 }
 
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
 				      unsigned long address)
 {
-	return (pgtable_t)pte_fragment_alloc(mm, address, 0);
+	return (pgtable_t)pte_fragment_alloc(mm, address);
 }
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	pte_fragment_free((unsigned long *)pte, 1);
+	pte_fragment_free((unsigned long *)pte);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 {
-	pte_fragment_free((unsigned long *)ptepage, 0);
+	pte_fragment_free((unsigned long *)ptepage);
 }
 
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index c1f4ca4..b792f8a 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -333,25 +333,23 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 	return (pte_t *)ret;
 }
 
-static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
+static pte_t *__alloc_for_ptecache(struct mm_struct *mm)
 {
+	gfp_t gfp_mask = PGALLOC_GFP;
 	void *ret = NULL;
 	struct page *page;
 
-	if (!kernel) {
-		page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-		if (!page)
-			return NULL;
-		if (!pgtable_page_ctor(page)) {
-			__free_page(page);
-			return NULL;
-		}
-	} else {
-		page = alloc_page(PGALLOC_GFP);
-		if (!page)
-			return NULL;
-	}
+	if (mm != &init_mm)
+		gfp_mask |= __GFP_ACCOUNT;
 
+	page = alloc_page(gfp_mask);
+	if (!page)
+		return NULL;
+
+	if (!pgtable_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
 
 	ret = page_address(page);
 	/*
@@ -375,7 +373,7 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 	return (pte_t *)ret;
 }
 
-pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel)
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr)
 {
 	pte_t *pte;
 
@@ -383,16 +381,15 @@ pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int kernel
 	if (pte)
 		return pte;
 
-	return __alloc_for_ptecache(mm, kernel);
+	return __alloc_for_ptecache(mm);
 }
 
-void pte_fragment_free(unsigned long *table, int kernel)
+void pte_fragment_free(unsigned long *table)
 {
 	struct page *page = virt_to_page(table);
 
 	if (put_page_testzero(page)) {
-		if (!kernel)
-			pgtable_page_dtor(page);
+		pgtable_page_dtor(page);
 		free_unref_page(page);
 	}
 }
@@ -401,7 +398,7 @@ static inline void pgtable_free(void *table, int index)
 {
 	switch (index) {
 	case PTE_INDEX:
-		pte_fragment_free(table, 0);
+		pte_fragment_free(table);
 		break;
 	case PMD_INDEX:
 		pmd_fragment_free(table);
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v2 3/3] powerpc: Remove -Wattribute-alias pragmas
From: Christophe LEROY @ 2018-06-20 13:40 UTC (permalink / raw)
  To: Paul Burton, linux-kbuild
  Cc: Mauro Carvalho Chehab, linux-mips, Arnd Bergmann, Ingo Molnar,
	Matthew Wilcox, Thomas Gleixner, Douglas Anderson, Josh Poimboeuf,
	Andrew Morton, Matthias Kaehlcke, He Zhe, Benjamin Herrenschmidt,
	Michal Marek, Khem Raj, Al Viro, Stafford Horne,
	Gideon Israel Dsouza, Masahiro Yamada, Kees Cook,
	Michael Ellerman, Heiko Carstens, linux-kernel, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <20180619201458.4559-4-paul.burton@mips.com>



Le 19/06/2018 à 22:14, Paul Burton a écrit :
> With SYSCALL_DEFINEx() disabling -Wattribute-alias generically, there's
> no need to duplicate that for PowerPC syscalls.
> 
> This reverts commit 415520373975 ("powerpc: fix build failure by
> disabling attribute-alias warning in pci_32") and commit 2479bfc9bc60
> ("powerpc: Fix build by disabling attribute-alias warning for
> SYSCALL_DEFINEx").
> 
> Signed-off-by: Paul Burton <paul.burton@mips.com>
> Cc: Michal Marek <michal.lkml@markovi.net>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Matthew Wilcox <matthew@wil.cx>
> Cc: Matthias Kaehlcke <mka@chromium.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Gideon Israel Dsouza <gidisrael@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: Khem Raj <raj.khem@gmail.com>
> Cc: He Zhe <zhe.he@windriver.com>
> Cc: linux-kbuild@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-mips@linux-mips.org
> Cc: linuxppc-dev@lists.ozlabs.org
> 
> ---
> Michael & Christophe, I didn't add your acks here yet since it changed
> to include the second revert that Christophe pointed out I'd missed & I
> didn't want to presume your acks extended to that.

Looks ok to me

Acked-by: Christophe Leroy <christophe.leroy@c-s.fr>

> 
> Changes in v2:
> - Also revert 2479bfc9bc60 ("powerpc: Fix build by disabling
>    attribute-alias warning for SYSCALL_DEFINEx").
> - Change subject now that it's not just a simple one-commit revert.
> 
>   arch/powerpc/kernel/pci_32.c    | 4 ----
>   arch/powerpc/kernel/pci_64.c    | 4 ----
>   arch/powerpc/kernel/rtas.c      | 4 ----
>   arch/powerpc/kernel/signal_32.c | 8 --------
>   arch/powerpc/kernel/signal_64.c | 4 ----
>   arch/powerpc/kernel/syscalls.c  | 4 ----
>   arch/powerpc/mm/subpage-prot.c  | 4 ----
>   7 files changed, 32 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
> index 4f861055a852..d63b488d34d7 100644
> --- a/arch/powerpc/kernel/pci_32.c
> +++ b/arch/powerpc/kernel/pci_32.c
> @@ -285,9 +285,6 @@ pci_bus_to_hose(int bus)
>    * Note that the returned IO or memory base is a physical address
>    */
>   
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   SYSCALL_DEFINE3(pciconfig_iobase, long, which,
>   		unsigned long, bus, unsigned long, devfn)
>   {
> @@ -313,4 +310,3 @@ SYSCALL_DEFINE3(pciconfig_iobase, long, which,
>   
>   	return result;
>   }
> -#pragma GCC diagnostic pop
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 812171c09f42..dff28f903512 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -203,9 +203,6 @@ void pcibios_setup_phb_io_space(struct pci_controller *hose)
>   #define IOBASE_ISA_IO		3
>   #define IOBASE_ISA_MEM		4
>   
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   SYSCALL_DEFINE3(pciconfig_iobase, long, which, unsigned long, in_bus,
>   			  unsigned long, in_devfn)
>   {
> @@ -259,7 +256,6 @@ SYSCALL_DEFINE3(pciconfig_iobase, long, which, unsigned long, in_bus,
>   
>   	return -EOPNOTSUPP;
>   }
> -#pragma GCC diagnostic pop
>   
>   #ifdef CONFIG_NUMA
>   int pcibus_to_node(struct pci_bus *bus)
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 7fb9f83dcde8..8afd146bc9c7 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1051,9 +1051,6 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
>   }
>   
>   /* We assume to be passed big endian arguments */
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   {
>   	struct rtas_args args;
> @@ -1140,7 +1137,6 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   
>   	return 0;
>   }
> -#pragma GCC diagnostic pop
>   
>   /*
>    * Call early during boot, before mem init, to retrieve the RTAS
> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> index 5eedbb282d42..e6474a45cef5 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -1038,9 +1038,6 @@ static int do_setcontext_tm(struct ucontext __user *ucp,
>   }
>   #endif
>   
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   #ifdef CONFIG_PPC64
>   COMPAT_SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>   		       struct ucontext __user *, new_ctx, int, ctx_size)
> @@ -1134,7 +1131,6 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>   	set_thread_flag(TIF_RESTOREALL);
>   	return 0;
>   }
> -#pragma GCC diagnostic pop
>   
>   #ifdef CONFIG_PPC64
>   COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
> @@ -1231,9 +1227,6 @@ SYSCALL_DEFINE0(rt_sigreturn)
>   	return 0;
>   }
>   
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   #ifdef CONFIG_PPC32
>   SYSCALL_DEFINE3(debug_setcontext, struct ucontext __user *, ctx,
>   			 int, ndbg, struct sig_dbg_op __user *, dbg)
> @@ -1337,7 +1330,6 @@ SYSCALL_DEFINE3(debug_setcontext, struct ucontext __user *, ctx,
>   	return 0;
>   }
>   #endif
> -#pragma GCC diagnostic pop
>   
>   /*
>    * OK, we're invoking a handler
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index d42b60020389..83d51bf586c7 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -625,9 +625,6 @@ static long setup_trampoline(unsigned int syscall, unsigned int __user *tramp)
>   /*
>    * Handle {get,set,swap}_context operations
>    */
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>   		struct ucontext __user *, new_ctx, long, ctx_size)
>   {
> @@ -693,7 +690,6 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>   	set_thread_flag(TIF_RESTOREALL);
>   	return 0;
>   }
> -#pragma GCC diagnostic pop
>   
>   
>   /*
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 083fa06962fd..466216506eb2 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -62,9 +62,6 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>   	return ret;
>   }
>   
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
>   		unsigned long, prot, unsigned long, flags,
>   		unsigned long, fd, unsigned long, pgoff)
> @@ -78,7 +75,6 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
>   {
>   	return do_mmap2(addr, len, prot, flags, fd, offset, PAGE_SHIFT);
>   }
> -#pragma GCC diagnostic pop
>   
>   #ifdef CONFIG_PPC32
>   /*
> diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c
> index 75cb646a79c3..9d16ee251fc0 100644
> --- a/arch/powerpc/mm/subpage-prot.c
> +++ b/arch/powerpc/mm/subpage-prot.c
> @@ -186,9 +186,6 @@ static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
>    * in a 2-bit field won't allow writes to a page that is otherwise
>    * write-protected.
>    */
> -#pragma GCC diagnostic push
> -#pragma GCC diagnostic ignored "-Wpragmas"
> -#pragma GCC diagnostic ignored "-Wattribute-alias"
>   SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
>   		unsigned long, len, u32 __user *, map)
>   {
> @@ -272,4 +269,3 @@ SYSCALL_DEFINE3(subpage_prot, unsigned long, addr,
>   	up_write(&mm->mmap_sem);
>   	return err;
>   }
> -#pragma GCC diagnostic pop
> 

^ permalink raw reply

* Re: [PATCH v13 08/24] selftests/vm: fix the wrong assert in pkey_disable_set()
From: Dave Hansen @ 2018-06-20 14:47 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-9-git-send-email-linuxram@us.ibm.com>

On 06/13/2018 05:44 PM, Ram Pai wrote:
> If the flag is 0, no bits will be set. Hence we cant expect
> the resulting bitmap to have a higher value than what it
> was earlier
...
>  	if (flags)
> -		pkey_assert(read_pkey_reg() > orig_pkey_reg);
> +		pkey_assert(read_pkey_reg() >= orig_pkey_reg);
>  	dprintf1("END<---%s(%d, 0x%x)\n", __func__,
>  		pkey, flags);
>  }

This is the kind of thing where I'd love to hear the motivation and
background.  This "disable a key that was already disabled" operation
obviously doesn't happen today.  What motivated you to change it now?

^ permalink raw reply

* Re: [PATCH v13 10/24] selftests/vm: clear the bits in shadow reg when a pkey is freed.
From: Dave Hansen @ 2018-06-20 14:49 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-11-git-send-email-linuxram@us.ibm.com>

On 06/13/2018 05:45 PM, Ram Pai wrote:
> When a key is freed, the  key  is  no  more  effective.
> Clear the bits corresponding to the pkey in the shadow
> register. Otherwise  it  will carry some spurious bits
> which can trigger false-positive asserts.
...--- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -556,6 +556,9 @@ int alloc_pkey(void)
>  int sys_pkey_free(unsigned long pkey)
>  {
>  	int ret = syscall(SYS_pkey_free, pkey);
> +
> +	if (!ret)
> +		shadow_pkey_reg &= clear_pkey_flags(pkey, PKEY_DISABLE_ACCESS);
>  	dprintf1("%s(pkey=%ld) syscall ret: %d\n", __func__, pkey, ret);
>  	return ret;
>  }

This would be great code for an actual application.  But, I'm not
immediately convinced we want sane, kind behavior in our selftest.  x86
doesn't clear the hardware register at pkey_free, so wouldn't this cause
the shadow and the hardware register to diverge?

^ permalink raw reply

* Re: [PATCH v13 13/24] selftests/vm: pkey register should match shadow pkey
From: Dave Hansen @ 2018-06-20 14:53 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-14-git-send-email-linuxram@us.ibm.com>

On 06/13/2018 05:45 PM, Ram Pai wrote:
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -916,10 +916,10 @@ void expected_pkey_fault(int pkey)
>  		pkey_assert(last_si_pkey == pkey);
>  
>  	/*
> -	 * The signal handler shold have cleared out PKEY register to let the
> +	 * The signal handler should have cleared out pkey-register to let the
>  	 * test program continue.  We now have to restore it.
>  	 */
> -	if (__read_pkey_reg() != 0)
> +	if (__read_pkey_reg() != shadow_pkey_reg)
>  		pkey_assert(0);
>  
>  	__write_pkey_reg(shadow_pkey_reg);

I think this is wrong on x86.

When we leave the signal handler, we zero out PKRU so that the faulting
instruction can continue, that's why we have the check against zero.
I'm actually kinda surprised this works.

Logically, this patch does:

	if (hardware != shadow)
		error();
	hardware = shadow;

That does not look right to me.  What we want is:

	if (hardware != signal_return_pkey_reg)
		error();
	hardware = shadow;

^ permalink raw reply

* Re: [PATCH v13 14/24] selftests/vm: generic cleanup
From: Dave Hansen @ 2018-06-20 14:57 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-15-git-send-email-linuxram@us.ibm.com>

On 06/13/2018 05:45 PM, Ram Pai wrote:
> cleanup the code to satisfy coding styles.

A lot of this makes the code look worse and more unreadable than before.
 I think someone just ran it through lindent or something.

I also took a few CodingStyle liberties in here because it's _not_ main
kernel code.  I think the occasional 85-column line is probably OK in here.

^ permalink raw reply

* Re: [PATCH v13 15/24] selftests/vm: powerpc implementation for generic abstraction
From: Dave Hansen @ 2018-06-20 15:06 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1528937115-10132-16-git-send-email-linuxram@us.ibm.com>

> +static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
> +{
> +#ifdef si_pkey
> +	return &si->si_pkey;
> +#else
> +	return (u32 *)(((u8 *)si) + si_pkey_offset);
> +#endif
>  }

FWIW, this isn't ppc-specific.


> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index f43a319..88dfa40 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -197,17 +197,18 @@ void dump_mem(void *dumpme, int len_bytes)
>  
>  int pkey_faults;
>  int last_si_pkey = -1;
> +void pkey_access_allow(int pkey);
>  void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  {
>  	ucontext_t *uctxt = vucontext;
>  	int trapno;
>  	unsigned long ip;
>  	char *fpregs;
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
>  	pkey_reg_t *pkey_reg_ptr;
> -	u64 siginfo_pkey;
> +#endif /* defined(__i386__) || defined(__x86_64__) */
> +	u32 siginfo_pkey;
>  	u32 *si_pkey_ptr;
> -	int pkey_reg_offset;
> -	fpregset_t fpregset;
>  
>  	dprint_in_signal = 1;
>  	dprintf1(">>>>===============SIGSEGV============================\n");
> @@ -217,12 +218,14 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  
>  	trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
>  	ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
> -	fpregset = uctxt->uc_mcontext.fpregs;
> -	fpregs = (void *)fpregset;
> +	fpregs = (char *) uctxt->uc_mcontext.fpregs;
>  
>  	dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
>  			__func__, trapno, ip, si_code_str(si->si_code),
>  			si->si_code);
> +
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
> +
>  #ifdef __i386__
>  	/*
>  	 * 32-bit has some extra padding so that userspace can tell whether
> @@ -230,20 +233,28 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  	 * state.  We just assume that it is here.
>  	 */
>  	fpregs += 0x70;
> -#endif
> -	pkey_reg_offset = pkey_reg_xstate_offset();
> -	pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]);
> +#endif /* __i386__ */
>  
> -	dprintf1("siginfo: %p\n", si);
> -	dprintf1(" fpregs: %p\n", fpregs);
> +	pkey_reg_ptr = (void *)(&fpregs[pkey_reg_xstate_offset()]);
>  	/*
> -	 * If we got a PKEY fault, we *HAVE* to have at least one bit set in
> +	 * If we got a key fault, we *HAVE* to have at least one bit set in
>  	 * here.
>  	 */
>  	dprintf1("pkey_reg_xstate_offset: %d\n", pkey_reg_xstate_offset());
>  	if (DEBUG_LEVEL > 4)
>  		dump_mem(pkey_reg_ptr - 128, 256);
>  	pkey_assert(*pkey_reg_ptr);
> +#endif /* defined(__i386__) || defined(__x86_64__) */

The series up to this point has been looking pretty nice and broken out
and easy to read.  It goes off the rails a bit here.  Adding #ifdefs and..

> +	dprintf1("siginfo: %p\n", si);
> +	dprintf1(" fpregs: %p\n", fpregs);
> +
> +	si_pkey_ptr = siginfo_get_pkey_ptr(si);
> +	dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
> +	dump_mem(si_pkey_ptr - 8, 24);
> +	siginfo_pkey = *si_pkey_ptr;
> +	pkey_assert(siginfo_pkey < NR_PKEYS);
> +	last_si_pkey = siginfo_pkey;
>  
>  	if ((si->si_code == SEGV_MAPERR) ||
>  	    (si->si_code == SEGV_ACCERR) ||
> @@ -252,22 +263,21 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  		exit(4);
>  	}
>  
> -	si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
> -	dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
> -	dump_mem((u8 *)si_pkey_ptr - 8, 24);
> -	siginfo_pkey = *si_pkey_ptr;
> -	pkey_assert(siginfo_pkey < NR_PKEYS);
> -	last_si_pkey = siginfo_pkey;

Moving random code around with no explanation.

> -	dprintf1("signal pkey_reg from xsave: "PKEY_REG_FMT"\n", *pkey_reg_ptr);
>  	/*
>  	 * need __read_pkey_reg() version so we do not do shadow_pkey_reg
>  	 * checking
>  	 */
>  	dprintf1("signal pkey_reg from  pkey_reg: "PKEY_REG_FMT"\n",
>  			__read_pkey_reg());
> -	dprintf1("pkey from siginfo: %jx\n", siginfo_pkey);
> -	*(u64 *)pkey_reg_ptr = 0x00000000;
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
> +	dprintf1("signal pkey_reg from xsave: "PKEY_REG_FMT"\n", *pkey_reg_ptr);
> +	*(u64 *)pkey_reg_ptr &= clear_pkey_flags(siginfo_pkey,
> +			PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
> +#elif __powerpc64__
> +	pkey_access_allow(siginfo_pkey);
> +#endif
> +	shadow_pkey_reg &= clear_pkey_flags(siginfo_pkey,
> +			PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
>  	dprintf1("WARNING: set PKEY_REG=0 to allow faulting instruction "
>  			"to continue\n");
>  	pkey_faults++;
> @@ -1331,9 +1341,8 @@ void test_executing_on_unreadable_memory(int *ptr, u16 pkey)
>  	madvise(p1, PAGE_SIZE, MADV_DONTNEED);
>  	lots_o_noops_around_write(&scratch);
>  	do_not_expect_pkey_fault("executing on PROT_EXEC memory");
> -	ptr_contents = read_ptr(p1);
> -	dprintf2("ptr (%p) contents@%d: %x\n", p1, __LINE__, ptr_contents);
> -	expected_pkey_fault(pkey);
> +
> +	expect_fault_on_read_execonly_key(p1, pkey);
>  }

While none of this is a deal-breaker (as I said, I feel like the
selftests/ rules are a bit more lax) this does kinda break the illusion
of a nice, broken out series.

Could you address this a bit in the changelog at least, please?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox