LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 7/9] macintosh/via-macii: Use unsigned type for autopoll_devs variable
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index e143ddb81de34..447273967e1e8 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -125,7 +125,7 @@ static bool srq_asserted;    /* have to poll for the device that asserted it */
 static u8 last_cmd;              /* the most recent command byte transmitted */
 static u8 last_talk_cmd;    /* the most recent Talk command byte transmitted */
 static u8 last_poll_cmd; /* the most recent Talk R0 command byte transmitted */
-static int autopoll_devs;      /* bits set are device addresses to be polled */
+static unsigned int autopoll_devs;  /* bits set are device addresses to poll */
 
 /* Check for MacII style ADB */
 static int macii_probe(void)
@@ -291,7 +291,7 @@ static int macii_autopoll(int devs)
 	local_irq_save(flags);
 
 	/* bit 1 == device 1, and so on. */
-	autopoll_devs = devs & 0xFFFE;
+	autopoll_devs = (unsigned int)devs & 0xFFFE;
 
 	if (!current_req) {
 		macii_queue_poll();
-- 
2.26.2


^ permalink raw reply related

* [PATCH 8/9] macintosh/via-macii: Use the stack for reset request storage
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The adb_request struct can be stored on the stack because the request
is synchronous and is completed before the function returns.

Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 447273967e1e8..2f9be4ec7d345 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -313,7 +313,7 @@ static void macii_poll(void)
 /* Reset the bus */
 static int macii_reset_bus(void)
 {
-	static struct adb_request req;
+	struct adb_request req;
 
 	/* Command = 0, Address = ignored */
 	adb_request(&req, NULL, ADBREQ_NOSEND, 1, ADB_BUSRESET);
-- 
2.26.2


^ permalink raw reply related

* [PATCH 0/9] Macintosh II ADB driver fixes
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson

Various issues with the via-macii driver have become apparent over the
years. Some examples:

 - A Talk command response can be lost. This can result in phantom devices
being probed or an incorrect device handler ID being retrieved.

 - A reply packet containing a null byte can get truncated. Such packets
are sometimes generated by ADB keyboards.

 - A Talk Register 3 reply from device 15 (that is, command byte 0xFF)
can be mistaken for a bus timeout (empty packet).

This patch series contains fixes for all known bugs in the via-macii
driver, plus a few code style improvements. It has been successfully
tested on an Apple Centris 650 and qemu-system-m68k.

The patched kernel does regress on past QEMU releases, due to ADB
transceiver emulation bugs. Those bugs have been fixed in mainline QEMU.
My thanks go to Mark Cave-Ayland for that effort and for figuring out
the improvements to the signalling between the VIA and the transceiver.

Note to -stable maintainers: these fixes can be cherry-picked without
difficulty, if you have the 5 commits that appeared in v5.0:

b52dce8738938 macintosh/via-macii: Synchronous bus reset
5f93d7081a47e macintosh/via-macii: Remove BUG_ON assertions
5ce6185c2ef4e macintosh/via-macii: Simplify locking
351e5ad327d07 macintosh/via-macii, macintosh/adb-iop: Modernize printk calls
47fd2060660e6 macintosh/via-macii, macintosh/adb-iop: Clean up whitespace

Just for the sake of simplicity, the 'fixes' tags in this series limit
backporting to 'v5.0+'.

Finn Thain (9):
  macintosh/via-macii: Access autopoll_devs when inside lock
  macintosh/via-macii: Poll the device most likely to respond
  macintosh/via-macii: Handle /CTLR_IRQ signal correctly
  macintosh/via-macii: Remove read_done state
  macintosh/via-macii: Handle poll replies correctly
  macintosh/via-macii: Use bool type for reading_reply variable
  macintosh/via-macii: Use unsigned type for autopoll_devs variable
  macintosh/via-macii: Use the stack for reset request storage
  macintosh/via-macii: Clarify definition of macii_init()

 drivers/macintosh/via-macii.c | 324 +++++++++++++++++++---------------
 1 file changed, 179 insertions(+), 145 deletions(-)

-- 
2.26.2

^ permalink raw reply

* [PATCH 9/9] macintosh/via-macii: Clarify definition of macii_init()
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The function prototype correctly specifies the 'static' storage class.
Let the function definition match the declaration for better readability.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 2f9be4ec7d345..060e03f2264bc 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -140,7 +140,7 @@ static int macii_probe(void)
 }
 
 /* Initialize the driver */
-int macii_init(void)
+static int macii_init(void)
 {
 	unsigned long flags;
 	int err;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 4/9] macintosh/via-macii: Remove read_done state
From: Finn Thain @ 2020-06-28  4:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Laurent Vivier, Mark Cave-Ayland, linux-kernel, linux-m68k,
	Geert Uytterhoeven, linuxppc-dev, Joshua Thompson
In-Reply-To: <cover.1593318192.git.fthain@telegraphics.com.au>

The driver state machine may enter the 'read_done' state when leaving the
'idle' or 'reading' state. This transition is pointless, as is the extra
interrupt it requires. The interrupt is produced by the transceiver
(even when it has no data to send) because an extra EVEN/ODD toggle
was signalled by the driver. Drop the extra state to simplify the code.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2") # v5.0+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/macintosh/via-macii.c | 70 ++++++++++++++---------------------
 1 file changed, 28 insertions(+), 42 deletions(-)

diff --git a/drivers/macintosh/via-macii.c b/drivers/macintosh/via-macii.c
index 6a5cd7de05baf..d29c87943ca46 100644
--- a/drivers/macintosh/via-macii.c
+++ b/drivers/macintosh/via-macii.c
@@ -110,7 +110,6 @@ static enum macii_state {
 	idle,
 	sending,
 	reading,
-	read_done,
 } macii_state;
 
 static struct adb_request *current_req; /* first request struct in the queue */
@@ -411,8 +410,8 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			reply_len = 1;
 		} else {
 			/* bus timeout */
-			macii_state = read_done;
 			reply_len = 0;
+			break;
 		}
 
 		/* set ADB state = even for first data byte */
@@ -471,20 +470,6 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 				current_req = req->next;
 				if (req->done)
 					(*req->done)(req);
-
-				if (!current_req)
-					macii_queue_poll();
-
-				if (current_req && macii_state == idle)
-					macii_start();
-
-				if (macii_state == idle) {
-					/* reset to shift in */
-					via[ACR] &= ~SR_OUT;
-					x = via[SR];
-					/* set ADB state idle - might get SRQ */
-					via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
-				}
 				break;
 			}
 		} else {
@@ -511,12 +496,28 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 			} else if (status == ST_ODD && reply_len == 2) {
 				srq_asserted = true;
 			} else {
-				macii_state = read_done;
+				macii_state = idle;
+
+				if (bus_timeout)
+					reply_len = 0;
+
+				if (reading_reply) {
+					struct adb_request *req = current_req;
+
+					req->reply_len = reply_len;
+
+					req->complete = 1;
+					current_req = req->next;
+					if (req->done)
+						(*req->done)(req);
+				} else if (reply_len && autopoll_devs) {
+					adb_input(reply_buf, reply_len, 0);
+				}
+				break;
 			}
 		}
 
-		if (macii_state == reading &&
-		    reply_len < ARRAY_SIZE(reply_buf)) {
+		if (reply_len < ARRAY_SIZE(reply_buf)) {
 			reply_ptr++;
 			*reply_ptr = x;
 			reply_len++;
@@ -526,37 +527,22 @@ static irqreturn_t macii_interrupt(int irq, void *arg)
 		via[B] ^= ST_MASK;
 		break;
 
-	case read_done:
-		x = via[SR];
-
-		if (bus_timeout)
-			reply_len = 0;
-
-		if (reading_reply) {
-			reading_reply = 0;
-			req = current_req;
-			req->reply_len = reply_len;
-			req->complete = 1;
-			current_req = req->next;
-			if (req->done)
-				(*req->done)(req);
-		} else if (reply_len && autopoll_devs)
-			adb_input(reply_buf, reply_len, 0);
-
-		macii_state = idle;
+	default:
+		break;
+	}
 
+	if (macii_state == idle) {
 		if (!current_req)
 			macii_queue_poll();
 
 		if (current_req)
 			macii_start();
 
-		if (macii_state == idle)
+		if (macii_state == idle) {
+			via[ACR] &= ~SR_OUT;
+			x = via[SR];
 			via[B] = (via[B] & ~ST_MASK) | ST_IDLE;
-		break;
-
-	default:
-		break;
+		}
 	}
 
 	local_irq_restore(flags);
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH 9/8] mm: Account PMD tables like PTE tables
From: Mike Rapoport @ 2020-06-28  6:59 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-ia64, linux-sh, Peter Zijlstra, linux-mips, Max Filippov,
	Satheesh Rajendran, linux-csky, sparclinux, linux-riscv,
	linux-arch, Stephen Rothwell, linux-hexagon, Joerg Roedel,
	Mike Rapoport, Abdul Haleem, linux-snps-arc, linux-xtensa,
	Arnd Bergmann, linux-s390, linux-um, Steven Rostedt, linux-m68k,
	openrisc, Andy Lutomirski, Stafford Horne, linux-arm-kernel,
	linux-parisc, linux-mm, linux-kernel, linux-alpha, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20200627184642.GF25039@casper.infradead.org>

On Sat, Jun 27, 2020 at 07:46:42PM +0100, Matthew Wilcox wrote:
> We account the PTE level of the page tables to the process in order to
> make smarter OOM decisions and help diagnose why memory is fragmented.
> For these same reasons, we should account pages allocated for PMDs.
> With larger process address spaces and ASLR, the number of PMDs in use
> is higher than it used to be so the inaccuracy is starting to matter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  include/linux/mm.h | 24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index dc7b87310c10..b283e25fcffa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2271,7 +2271,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  	return ptlock_ptr(pmd_to_page(pmd));
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page)
> +static inline bool pmd_ptlock_init(struct page *page)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	page->pmd_huge_pte = NULL;
> @@ -2279,7 +2279,7 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
>  	return ptlock_init(page);
>  }
>  
> -static inline void pgtable_pmd_page_dtor(struct page *page)
> +static inline void pmd_ptlock_free(struct page *page)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> @@ -2296,8 +2296,8 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  	return &mm->page_table_lock;
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page) { return true; }
> -static inline void pgtable_pmd_page_dtor(struct page *page) {}
> +static inline bool pmd_ptlock_init(struct page *page) { return true; }
> +static inline void pmd_ptlock_free(struct page *page) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>  
> @@ -2310,6 +2310,22 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
>  	return ptl;
>  }
>  
> +static inline bool pgtable_pmd_page_ctor(struct page *page)
> +{
> +	if (!pmd_ptlock_init(page))
> +		return false;
> +	__SetPageTable(page);
> +	inc_zone_page_state(page, NR_PAGETABLE);
> +	return true;
> +}
> +
> +static inline void pgtable_pmd_page_dtor(struct page *page)
> +{
> +	pmd_ptlock_free(page);
> +	__ClearPageTable(page);
> +	dec_zone_page_state(page, NR_PAGETABLE);
> +}
> +
>  /*
>   * No scalability reason to split PUD locks yet, but follow the same pattern
>   * as the PMD locks to make it easier if we decide to.  The VM should not be
> -- 
> 2.27.0
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH 4/8] asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()
From: Mike Rapoport @ 2020-06-28  7:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-ia64, linux-sh, Peter Zijlstra, linux-mips, Max Filippov,
	Satheesh Rajendran, linux-csky, sparclinux, linux-riscv,
	linux-arch, Stephen Rothwell, linux-hexagon, Joerg Roedel,
	Mike Rapoport, Abdul Haleem, linux-snps-arc, linux-xtensa,
	Arnd Bergmann, linux-s390, linux-um, Steven Rostedt, linux-m68k,
	openrisc, Andy Lutomirski, Stafford Horne, linux-arm-kernel,
	linux-parisc, linux-mm, linux-kernel, linux-alpha, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20200627190304.GG25039@casper.infradead.org>

On Sat, Jun 27, 2020 at 08:03:04PM +0100, Matthew Wilcox wrote:
> On Sat, Jun 27, 2020 at 05:34:49PM +0300, Mike Rapoport wrote:
> > More elaborate versions on arm64 and x86 account memory for the user page
> > tables and call to pgtable_pmd_page_ctor() as the part of PMD page
> > initialization.
> > 
> > Move the arm64 version to include/asm-generic/pgalloc.h and use the generic
> > version on several architectures.
> > 
> > The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
> > not enabled, so there is no functional change for most architectures except
> > of the addition of __GFP_ACCOUNT for allocation of user page tables.
> 
> Thanks for including this line; it reminded me that we're not setting
> the PageTable flag on the page, nor accounting it to the zone page stats.
> Hope you don't mind me tagging a patch to do that on as 9/8.
> 
> We could also do with a pud_page_[cd]tor and maybe even p4d/pgd versions.
> But that brings me to the next question -- could/should some of this
> be moved over to asm-generic/pgalloc.h?  The ctor/dtor aren't called
> from anywhere else, and there's value to reducing the total amount of
> code in mm.h, but then there's also value to keeping all the ifdef
> ARCH_ENABLE_SPLIT_PMD_PTLOCK code together too.  So I'm a bit torn.
> What do you think?

There are arhcitectures that don't use asm-generic/pgalloc.h but rather
have their own, sometimes completely different, versoins of these
funcitons.

I've tried adding linux/pgalloc.h, but I've ended up with contradicting
need to include asm/pgalloc.h before the generic code for some
architecures or after the generic code for others :)

I think let's leave it in mm.h for now, maybe after several more cleaups
we could do better.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.
From: Bharata B Rao @ 2020-06-28 16:11 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <1592606622-29884-1-git-send-email-linuxram@us.ibm.com>

On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> The time taken to switch a VM to Secure-VM, increases by the size of the VM.  A
> 100GB VM takes about 7minutes. This is unacceptable.  This linear increase is
> caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
> secure-memory. It has to just migrate the necessary and sufficient GFNs.
> 
> However when the optimization is incorporated in the Ultravisor, the Hypervisor
> starts misbehaving. The Hypervisor has a inbuilt assumption that the Ultravisor
> will explicitly request to migrate, each and every GFN of the VM. If only
> necessary and sufficient GFNs are requested for migration, the Hypervisor
> continues to manage the remaining GFNs as normal GFNs. This leads of memory
> corruption, manifested consistently when the SVM reboots.
> 
> The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
> expects the ultravisor to request migration of all GFNs to secure-GFN.  But at
> the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
> from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  This
> problem manifests as random errors in the SVM, when a memory-slot is
> hotplugged.
> 
> This patch series automatically migrates the non-migrated pages of a SVM,
>      and thus solves the problem.

So this is what I understand as the objective of this patchset:

1. Getting all the pages into the secure memory right when the guest
   transitions into secure mode is expensive. Ultravisor wants to just get
   the necessary and sufficient pages in and put the onus on the Hypervisor
   to mark the remaining pages (w/o actual page-in) as secure during
   H_SVM_INIT_DONE.
2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
   are already secure from the pages that are shared and that are paged-out.
   For this you are introducing all these new states in HV.

UV knows about the shared GFNs and maintains the state of the same. Hence
let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
and if UV finds any shared pages in them, let it fail the uv-page-in call.
Then HV can fail the migration for it  and the page continues to remain
shared. With this, you don't need to maintain a state for secured GFN in HV.

In the unlikely case of sending a paged-out page to UV during
H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
if required. With this, you don't need a state in HV to identify a
paged-out-but-encrypted state.

Doesn't the above work? If so, we can avoid all those extra states
in HV. That way HV can continue to differentiate only between two types
of pages: secure and not-secure. The rest of the states (shared,
paged-out-encrypted) actually belong to SVM/UV and let UV take care of
them.

Or did I miss something?

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH v3 3/4] KVM: PPC: Book3S HV: migrate remaining normal-GFNs to secure-GFNs in H_SVM_INIT_DONE
From: Bharata B Rao @ 2020-06-28 16:20 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <1592606622-29884-4-git-send-email-linuxram@us.ibm.com>

On Fri, Jun 19, 2020 at 03:43:41PM -0700, Ram Pai wrote:
> H_SVM_INIT_DONE incorrectly assumes that the Ultravisor has explicitly

As noted in the last iteration, can you reword the above please?
I don't see it as an incorrect assumption, but see it as extension of
scope now :-)

> called H_SVM_PAGE_IN for all secure pages. These GFNs continue to be
> normal GFNs associated with normal PFNs; when infact, these GFNs should
> have been secure GFNs, associated with device PFNs.
> 
> Move all the PFNs associated with the SVM's GFNs, to secure-PFNs, in
> H_SVM_INIT_DONE. Skip the GFNs that are already Paged-in or Shared
> through H_SVM_PAGE_IN, or Paged-in followed by a Paged-out through
> UV_PAGE_OUT.
> 
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  Documentation/powerpc/ultravisor.rst        |   2 +
>  arch/powerpc/include/asm/kvm_book3s_uvmem.h |   2 +
>  arch/powerpc/kvm/book3s_hv_uvmem.c          | 154 +++++++++++++++++++++++-----
>  3 files changed, 132 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/powerpc/ultravisor.rst b/Documentation/powerpc/ultravisor.rst
> index 363736d..3bc8957 100644
> --- a/Documentation/powerpc/ultravisor.rst
> +++ b/Documentation/powerpc/ultravisor.rst
> @@ -933,6 +933,8 @@ Return values
>  	* H_UNSUPPORTED		if called from the wrong context (e.g.
>  				from an SVM or before an H_SVM_INIT_START
>  				hypercall).
> +	* H_STATE		if the hypervisor could not successfully
> +                                transition the VM to Secure VM.
>  
>  Description
>  ~~~~~~~~~~~
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index 5a9834e..b9cd7eb 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -22,6 +22,8 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>  unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm);
>  void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>  			     struct kvm *kvm, bool skip_page_out);
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> +			const struct kvm_memory_slot *memslot);
>  #else
>  static inline int kvmppc_uvmem_init(void)
>  {
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index c8c0290..449e8a7 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -93,6 +93,7 @@
>  #include <asm/ultravisor.h>
>  #include <asm/mman.h>
>  #include <asm/kvm_ppc.h>
> +#include <asm/kvm_book3s_uvmem.h>
>  
>  static struct dev_pagemap kvmppc_uvmem_pgmap;
>  static unsigned long *kvmppc_uvmem_bitmap;
> @@ -339,6 +340,21 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsigned long gfn, struct kvm *kvm,
>  	return false;
>  }
>  
> +/* return true, if the GFN is a shared-GFN, or a secure-GFN */
> +bool kvmppc_gfn_has_transitioned(unsigned long gfn, struct kvm *kvm)
> +{
> +	struct kvmppc_uvmem_slot *p;
> +
> +	list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) {
> +		if (gfn >= p->base_pfn && gfn < p->base_pfn + p->nr_pfns) {
> +			unsigned long index = gfn - p->base_pfn;
> +
> +			return (p->pfns[index] & KVMPPC_GFN_FLAG_MASK);
> +		}
> +	}
> +	return false;
> +}
> +
>  unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  {
>  	struct kvm_memslots *slots;
> @@ -379,12 +395,31 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
>  
>  unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
>  {
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +	int srcu_idx;
> +	long ret = H_SUCCESS;
> +
>  	if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
>  		return H_UNSUPPORTED;
>  
> +	/* migrate any unmoved normal pfn to device pfns*/
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +	slots = kvm_memslots(kvm);
> +	kvm_for_each_memslot(memslot, slots) {
> +		ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
> +		if (ret) {
> +			ret = H_STATE;
> +			goto out;
> +		}
> +	}
> +
>  	kvm->arch.secure_guest |= KVMPPC_SECURE_INIT_DONE;
>  	pr_info("LPID %d went secure\n", kvm->arch.lpid);
> -	return H_SUCCESS;
> +
> +out:
> +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> +	return ret;
>  }
>  
>  /*
> @@ -505,12 +540,14 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
>  }
>  
>  /*
> - * Alloc a PFN from private device memory pool and copy page from normal
> - * memory to secure memory using UV_PAGE_IN uvcall.
> + * Alloc a PFN from private device memory pool. If @pagein is true,
> + * copy page from normal memory to secure memory using UV_PAGE_IN uvcall.
>   */
> -static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
> -		   unsigned long end, unsigned long gpa, struct kvm *kvm,
> -		   unsigned long page_shift, bool *downgrade)
> +static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
> +		unsigned long start,
> +		unsigned long end, unsigned long gpa, struct kvm *kvm,
> +		unsigned long page_shift,
> +		bool pagein)
>  {
>  	unsigned long src_pfn, dst_pfn = 0;
>  	struct migrate_vma mig;
> @@ -526,18 +563,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  	mig.src = &src_pfn;
>  	mig.dst = &dst_pfn;
>  
> -	/*
> -	 * We come here with mmap_sem write lock held just for
> -	 * ksm_madvise(), otherwise we only need read mmap_sem.
> -	 * Hence downgrade to read lock once ksm_madvise() is done.
> -	 */

Can you please retain this comment? This explains why we take write lock
and then downgrade.

> -	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> -			  MADV_UNMERGEABLE, &vma->vm_flags);
> -	downgrade_write(&kvm->mm->mmap_sem);
> -	*downgrade = true;

When I introduced this variable, there was a suggestion to rename it
to "downgraded", but we were a bit late then. When you are touching
this now, can you rename it appropriately?

> -	if (ret)
> -		return ret;
> -
>  	ret = migrate_vma_setup(&mig);
>  	if (ret)
>  		return ret;
> @@ -553,11 +578,16 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  		goto out_finalize;
>  	}
>  
> -	pfn = *mig.src >> MIGRATE_PFN_SHIFT;
> -	spage = migrate_pfn_to_page(*mig.src);
> -	if (spage)
> -		uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0,
> -			   page_shift);
> +	if (pagein) {
> +		pfn = *mig.src >> MIGRATE_PFN_SHIFT;
> +		spage = migrate_pfn_to_page(*mig.src);
> +		if (spage) {
> +			ret = uv_page_in(kvm->arch.lpid, pfn << page_shift,
> +					gpa, 0, page_shift);
> +			if (ret)
> +				goto out_finalize;
> +		}
> +	}
>  
>  	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
>  	migrate_vma_pages(&mig);
> @@ -566,6 +596,66 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  	return ret;
>  }
>  
> +int kvmppc_uv_migrate_mem_slot(struct kvm *kvm,
> +		const struct kvm_memory_slot *memslot)
> +{
> +	unsigned long gfn = memslot->base_gfn;
> +	unsigned long end;
> +	bool downgrade = false;
> +	struct vm_area_struct *vma;
> +	int i, ret = 0;
> +	unsigned long start = gfn_to_hva(kvm, gfn);
> +
> +	if (kvm_is_error_hva(start))
> +		return H_STATE;
> +
> +	end = start + (memslot->npages << PAGE_SHIFT);
> +
> +	down_write(&kvm->mm->mmap_sem);
> +
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	vma = find_vma_intersection(kvm->mm, start, end);
> +	if (!vma || vma->vm_start > start || vma->vm_end < end) {
> +		ret = H_STATE;
> +		goto out_unlock;
> +	}
> +
> +	ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +			  MADV_UNMERGEABLE, &vma->vm_flags);
> +	downgrade_write(&kvm->mm->mmap_sem);
> +	downgrade = true;
> +	if (ret) {
> +		ret = H_STATE;
> +		goto out_unlock;
> +	}
> +
> +	for (i = 0; i < memslot->npages; i++, ++gfn) {
> +		/*
> +		 * skip GFNs that have already tranistioned.
> +		 * paged-in GFNs, shared GFNs, paged-in GFNs
> +		 * that were later paged-out.
> +		 */
> +		if (kvmppc_gfn_has_transitioned(gfn, kvm))
> +			continue;
> +
> +		start = gfn_to_hva(kvm, gfn);
> +		end = start + (1UL << PAGE_SHIFT);
> +		ret = kvmppc_svm_migrate_page(vma, start, end,
> +			(gfn << PAGE_SHIFT), kvm, PAGE_SHIFT, false);
> +

As I said last time, you are assuming that the vma that you obtained
in the beginning actually spans the entire memslot range. This might
be true as you haven't found any issues during testing, but I feel it
is better if there is no such implicit assumption in the code here.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
From: Steven Rostedt @ 2020-06-28 19:07 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Michal Marek, Yoshinori Sato, linux-kbuild, linux-sh,
	linux-kernel, Russell King, linuxppc-dev, Ingo Molnar,
	Paul Mackerras, Sami Tolvanen, Rich Felker, linux-arm-kernel
In-Reply-To: <20200628015041.1000002-1-masahiroy@kernel.org>

On Sun, 28 Jun 2020 10:50:41 +0900
Masahiro Yamada <masahiroy@kernel.org> wrote:

> CFLAGS_REMOVE_<file>.o works per object, that is, there is no
> convenient way to filter out flags for every object in a directory.
> 
> Add ccflags-remove-y and asflags-remove-y to make it easily.
> 
> Use ccflags-remove-y to clean up some Makefiles.
> 
> Suggested-by: Sami Tolvanen <samitolvanen@google.com>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---

Nice feature!

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply

* Re: [PATCH v3 0/4] Migrate non-migrated pages of a SVM.
From: Bharata B Rao @ 2020-06-29  1:53 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david
In-Reply-To: <20200628161149.GA27215@in.ibm.com>

On Sun, Jun 28, 2020 at 09:41:53PM +0530, Bharata B Rao wrote:
> On Fri, Jun 19, 2020 at 03:43:38PM -0700, Ram Pai wrote:
> > The time taken to switch a VM to Secure-VM, increases by the size of the VM.  A
> > 100GB VM takes about 7minutes. This is unacceptable.  This linear increase is
> > caused by a suboptimal behavior by the Ultravisor and the Hypervisor.  The
> > Ultravisor unnecessarily migrates all the GFN of the VM from normal-memory to
> > secure-memory. It has to just migrate the necessary and sufficient GFNs.
> > 
> > However when the optimization is incorporated in the Ultravisor, the Hypervisor
> > starts misbehaving. The Hypervisor has a inbuilt assumption that the Ultravisor
> > will explicitly request to migrate, each and every GFN of the VM. If only
> > necessary and sufficient GFNs are requested for migration, the Hypervisor
> > continues to manage the remaining GFNs as normal GFNs. This leads of memory
> > corruption, manifested consistently when the SVM reboots.
> > 
> > The same is true, when a memory slot is hotplugged into a SVM. The Hypervisor
> > expects the ultravisor to request migration of all GFNs to secure-GFN.  But at
> > the same time, the hypervisor is unable to handle any H_SVM_PAGE_IN requests
> > from the Ultravisor, done in the context of UV_REGISTER_MEM_SLOT ucall.  This
> > problem manifests as random errors in the SVM, when a memory-slot is
> > hotplugged.
> > 
> > This patch series automatically migrates the non-migrated pages of a SVM,
> >      and thus solves the problem.
> 
> So this is what I understand as the objective of this patchset:
> 
> 1. Getting all the pages into the secure memory right when the guest
>    transitions into secure mode is expensive. Ultravisor wants to just get
>    the necessary and sufficient pages in and put the onus on the Hypervisor
>    to mark the remaining pages (w/o actual page-in) as secure during
>    H_SVM_INIT_DONE.
> 2. During H_SVM_INIT_DONE, you want a way to differentiate the pages that
>    are already secure from the pages that are shared and that are paged-out.
>    For this you are introducing all these new states in HV.
> 
> UV knows about the shared GFNs and maintains the state of the same. Hence
> let HV send all the pages (minus already secured pages) via H_SVM_PAGE_IN
> and if UV finds any shared pages in them, let it fail the uv-page-in call.
> Then HV can fail the migration for it  and the page continues to remain
> shared. With this, you don't need to maintain a state for secured GFN in HV.
> 
> In the unlikely case of sending a paged-out page to UV during
> H_SVM_INIT_DONE, let the page-in succeed and HV will fault on it again
> if required. With this, you don't need a state in HV to identify a
> paged-out-but-encrypted state.
> 
> Doesn't the above work?

I see that you want to infact skip the uv-page-in calls from H_SVM_INIT_DONE.
So that would need the extra states in HV which you are proposing here.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH v2 1/3] powerpc/mm: Enable radix GTSE only if supported.
From: Bharata B Rao @ 2020-06-29  4:31 UTC (permalink / raw)
  To: Murilo Opsfelder Araújo; +Cc: aneesh.kumar, linuxppc-dev, npiggin
In-Reply-To: <20200626205530.GA23269@kermit.br.ibm.com>

On Fri, Jun 26, 2020 at 05:55:30PM -0300, Murilo Opsfelder Araújo wrote:
> > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> > index bc73abf0bc25..152aa0200cef 100644
> > --- a/arch/powerpc/mm/init_64.c
> > +++ b/arch/powerpc/mm/init_64.c
> > @@ -407,12 +407,15 @@ static void __init early_check_vec5(void)
> >  		if (!(vec5[OV5_INDX(OV5_RADIX_GTSE)] &
> >  						OV5_FEAT(OV5_RADIX_GTSE))) {
> >  			pr_warn("WARNING: Hypervisor doesn't support RADIX with GTSE\n");
> > -		}
> > +			cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> > +		} else
> > +			cur_cpu_spec->mmu_features |= MMU_FTR_GTSE;
> >  		/* Do radix anyway - the hypervisor said we had to */
> >  		cur_cpu_spec->mmu_features |= MMU_FTR_TYPE_RADIX;
> >  	} else if (mmu_supported == OV5_FEAT(OV5_MMU_HASH)) {
> >  		/* Hypervisor only supports hash - disable radix */
> >  		cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX;
> > +		cur_cpu_spec->mmu_features &= ~MMU_FTR_GTSE;
> >  	}
> >  }
> 
> Is this a part of the code where mmu_clear_feature() cannot be used?

Yes, it appears so. Jump label initialization isn't done yet.

Regards,
Bharata.

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support
From: Nicholas Piggin @ 2020-06-29  4:50 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman
In-Reply-To: <87d064g692.fsf@mpe.ellerman.id.au>

Excerpts from Michael Ellerman's message of June 12, 2020 4:14 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> ISA v3.1 does not support the SAO storage control attribute required to
>> implement PROT_SAO. PROT_SAO was used by specialised system software
>> (Lx86) that has been discontinued for about 7 years, and is not thought
>> to be used elsewhere, so removal should not cause problems.
>>
>> We rather remove it than keep support for older processors, because
>> live migrating guest partitions to newer processors may not be possible
>> if SAO is in use.
> 

Thakns for the review, sorry got distracted...

> They key details being:
>  - you don't remove PROT_SAO from the uapi header, so code using the
>    definition will still build.
>  - you change arch_validate_prot() to reject PROT_SAO, which means code
>    using it will see a failure from mmap() at runtime.

Yes.

> This obviously risks breaking userspace, even if we think it won't in
> practice. I guess we don't really have any option given the hardware
> support is being dropped.
> 
> Can you repost with a wider Cc list, including linux-mm and linux-arch?

Will do.

> I wonder if we should add a comment to the uapi header, eg?
> 
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index c0c737215b00..d4fdbe768997 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -11,7 +11,7 @@
>  #include <asm-generic/mman-common.h>
>  
>  
> -#define PROT_SAO	0x10		/* Strong Access Ordering */
> +#define PROT_SAO	0x10		/* Unsupported since v5.9 */
>  
>  #define MAP_RENAME      MAP_ANONYMOUS   /* In SunOS terminology */
>  #define MAP_NORESERVE   0x40            /* don't reserve swap pages */

Yeah that makes sense.

>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index f17442c3a092..d9e92586f8dc 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -20,9 +20,13 @@
>>  #define _PAGE_RW		(_PAGE_READ | _PAGE_WRITE)
>>  #define _PAGE_RWX		(_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
>>  #define _PAGE_PRIVILEGED	0x00008 /* kernel access only */
>> -#define _PAGE_SAO		0x00010 /* Strong access order */
>> +
>> +#define _PAGE_CACHE_CTL		0x00030 /* Bits for the folowing cache modes */
>> +			/*	No bits set is normal cacheable memory */
>> +			/*	0x00010 unused, is SAO bit on radix POWER9 */
>>  #define _PAGE_NON_IDEMPOTENT	0x00020 /* non idempotent memory */
>>  #define _PAGE_TOLERANT		0x00030 /* tolerant memory, cache inhibited */
>> +
> 
> Why'd you do it that way vs just dropping _PAGE_SAO from the or below?

Just didn't like _PAGE_CACHE_CTL depending on values of the variants 
that we use.

>> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
>> index bac2252c839e..c7e923b0000a 100644
>> --- a/arch/powerpc/include/asm/cputable.h
>> +++ b/arch/powerpc/include/asm/cputable.h
>> @@ -191,7 +191,6 @@ static inline void cpu_feature_keys_init(void) { }
>>  #define CPU_FTR_SPURR			LONG_ASM_CONST(0x0000000001000000)
>>  #define CPU_FTR_DSCR			LONG_ASM_CONST(0x0000000002000000)
>>  #define CPU_FTR_VSX			LONG_ASM_CONST(0x0000000004000000)
>> -#define CPU_FTR_SAO			LONG_ASM_CONST(0x0000000008000000)
> 
> Can you do:
> 
> +// Free				LONG_ASM_CONST(0x0000000008000000)

Yes.

> 
>> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
>> index 9bb9bb370b53..579c9229124b 100644
>> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
>> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
>> @@ -400,7 +400,8 @@ static inline bool hpte_cache_flags_ok(unsigned long hptel, bool is_ci)
>>  
>>  	/* Handle SAO */
>>  	if (wimg == (HPTE_R_W | HPTE_R_I | HPTE_R_M) &&
>> -	    cpu_has_feature(CPU_FTR_ARCH_206))
>> +	    cpu_has_feature(CPU_FTR_ARCH_206) &&
>> +	    !cpu_has_feature(CPU_FTR_ARCH_31))
>>  		wimg = HPTE_R_M;
> 
> Shouldn't it reject that combination if the host can't support it?
> 
> Or I guess it does, but yikes that code is not clear.

Yeah, took me a bit to work that out.

>> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
>> index d610c2e07b28..43a62f3e21a0 100644
>> --- a/arch/powerpc/include/asm/mman.h
>> +++ b/arch/powerpc/include/asm/mman.h
>> @@ -13,38 +13,24 @@
>>  #include <linux/pkeys.h>
>>  #include <asm/cpu_has_feature.h>
>>  
>> -/*
>> - * This file is included by linux/mman.h, so we can't use cacl_vm_prot_bits()
>> - * here.  How important is the optimization?
>> - */
> 
> This comment seems confused, but also unrelated to this patch?

Yeah.
 
>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> index 3a409517c031..8d2e4043702f 100644
>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>> @@ -622,7 +622,7 @@ static struct dt_cpu_feature_match __initdata
>>  	{"processor-control-facility-v3", feat_enable_dbell, CPU_FTR_DBELL},
>>  	{"processor-utilization-of-resources-register", feat_enable_purr, 0},
>>  	{"no-execute", feat_enable, 0},
>> -	{"strong-access-ordering", feat_enable, CPU_FTR_SAO},
>> +	{"strong-access-ordering", feat_enable, 0},
> 
> Would it make more sense to drop it entirely? Or leave it commented out.

Probably would, yes.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH] kbuild: introduce ccflags-remove-y and asflags-remove-y
From: Michael Ellerman @ 2020-06-29  5:57 UTC (permalink / raw)
  To: Masahiro Yamada, linux-kbuild
  Cc: Michal Marek, Yoshinori Sato, linux-sh, Masahiro Yamada,
	linuxppc-dev, linux-kernel, Steven Rostedt, Russell King,
	Ingo Molnar, Paul Mackerras, Sami Tolvanen, Rich Felker,
	linux-arm-kernel
In-Reply-To: <20200628015041.1000002-1-masahiroy@kernel.org>

Masahiro Yamada <masahiroy@kernel.org> writes:
> CFLAGS_REMOVE_<file>.o works per object, that is, there is no
> convenient way to filter out flags for every object in a directory.
>
> Add ccflags-remove-y and asflags-remove-y to make it easily.
>
> Use ccflags-remove-y to clean up some Makefiles.
>
> Suggested-by: Sami Tolvanen <samitolvanen@google.com>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  arch/arm/boot/compressed/Makefile | 6 +-----
>  arch/powerpc/xmon/Makefile        | 3 +--
>  arch/sh/boot/compressed/Makefile  | 5 +----
>  kernel/trace/Makefile             | 4 ++--
>  lib/Makefile                      | 5 +----
>  scripts/Makefile.lib              | 4 ++--
>  6 files changed, 8 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> index 89c76ca35640..55cbcdd88ac0 100644
> --- a/arch/powerpc/xmon/Makefile
> +++ b/arch/powerpc/xmon/Makefile
> @@ -7,8 +7,7 @@ UBSAN_SANITIZE := n
>  KASAN_SANITIZE := n
>  
>  # Disable ftrace for the entire directory
> -ORIG_CFLAGS := $(KBUILD_CFLAGS)
> -KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
> +ccflags-remove-y += $(CC_FLAGS_FTRACE)

This could be:

ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)

Similar to kernel/trace/Makefile below.

I don't mind though.

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)

cheers

> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 6575bb0a0434..7492844a8b1b 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -2,9 +2,9 @@
>  
>  # Do not instrument the tracer itself:
>  
> +ccflags-remove-$(CONFIG_FUNCTION_TRACER) += $(CC_FLAGS_FTRACE)
> +
>  ifdef CONFIG_FUNCTION_TRACER
> -ORIG_CFLAGS := $(KBUILD_CFLAGS)
> -KBUILD_CFLAGS = $(subst $(CC_FLAGS_FTRACE),,$(ORIG_CFLAGS))
>  
>  # Avoid recursion due to instrumentation.
>  KCSAN_SANITIZE := n

^ permalink raw reply

* Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
From: Hari Bathini @ 2020-06-29  5:55 UTC (permalink / raw)
  To: piliu, Michael Ellerman, Andrew Morton
  Cc: Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar, Sourabh Jain, lkml,
	linuxppc-dev, Mimi Zohar, Thiago Jung Bauermann, Dave Young,
	Vivek Goyal, Eric Biederman
In-Reply-To: <9cfda789-0747-a67a-b825-5ea6f15099b8@redhat.com>



On 28/06/20 7:44 am, piliu wrote:
> Hi Hari,

Hi Pingfan,

> 
> After a quick through for this series, I have a few question/comment on
> this patch for the time being. Pls see comment inline.
> 
> On 06/27/2020 03:05 AM, Hari Bathini wrote:
>> crashkernel region could have an overlap with special memory regions
>> like  opal, rtas, tce-table & such. These regions are referred to as
>> exclude memory ranges. Setup this ranges during image probe in order
>> to avoid them while finding the buffer for different kdump segments.

[...]

>> +	/*
>> +	 * Use the locate_mem_hole logic in kexec_add_buffer() for regular
>> +	 * kexec_file_load syscall
>> +	 */
>> +	if (kbuf->image->type != KEXEC_TYPE_CRASH)
>> +		return 0;
> Can the ranges overlap [crashk_res.start, crashk_res.end]?  Otherwise
> there is no requirement for @exclude_ranges.

The ranges like rtas, opal are loaded by f/w. They almost always overlap with
crashkernel region. So, @exclude_ranges is required to support kdump.

> I guess you have a design for future. If not true, then it is better to
> fold the condition "if (kbuf->image->type != KEXEC_TYPE_CRASH)" into the
> caller and rename this function to better distinguish use cases between
> kexec and kdump

Yeah, this condition will be folded. I have a follow-up patch for that explaining
why kexec case should also be folded. Will try to add that to this series for v2.

Thanks
Hari

^ permalink raw reply

* Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole
From: Hari Bathini @ 2020-06-29  6:00 UTC (permalink / raw)
  To: piliu, Michael Ellerman, Andrew Morton
  Cc: Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar, Sourabh Jain, lkml,
	linuxppc-dev, Mimi Zohar, Vivek Goyal, Dave Young,
	Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <ed94a357-16aa-9f17-aa5f-5aab6617ed68@redhat.com>



On 28/06/20 7:58 am, piliu wrote:
> Hi Hari,
> 
> If in [4/11],  get_exclude_memory_ranges() turns out to be unnecessary
> ,then this patch is abundant either. As my understanding, memblock has
> already helped to achieved the purpose that get_exclude_memory_ranges()
> wants.

As mentioned in the other patch, there is a need for @exclude_ranges as crashkernel
region is likely to have an overlap with regions like opal, rtas..

But yeah.. the weak function should have been kexec_locate_mem_hole() instead
of kexec_add_buffer(). Will take care of that in v2.

> On 06/27/2020 03:04 AM, Hari Bathini wrote:
>> Some archs can have special memory regions, within the given memory
>> range, which can't be used for the buffer in a kexec segment. As
>> kexec_add_buffer() function is being called from generic code as well,
>> add weak arch_kexec_add_buffer definition for archs to override & take
>> care of special regions before trying to locate a memory hole.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>

Thanks
Hari

^ permalink raw reply

* Re: [PATCH v2] powerpc: Drop CONFIG_MTD_M25P80 in 85xx-hw.config
From: Michael Ellerman @ 2020-06-29  6:06 UTC (permalink / raw)
  To: Bin Meng, linuxppc-dev, linux-kernel; +Cc: Bin Meng
In-Reply-To: <CAEUhbmUj4iC1+4Y=93zpj+aCBqU1ySOHXvQgJHmxNx__UWduCQ@mail.gmail.com>

Bin Meng <bmeng.cn@gmail.com> writes:
> On Sat, May 2, 2020 at 12:45 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>
>> From: Bin Meng <bin.meng@windriver.com>
>>
>> Drop CONFIG_MTD_M25P80 that was removed in
>> commit b35b9a10362d ("mtd: spi-nor: Move m25p80 code in spi-nor.c")
>>
>> Signed-off-by: Bin Meng <bin.meng@windriver.com>
>>
>> ---
>>
>> Changes in v2:
>> - correct the typo (5xx => 85xx) in the commit title
>>
>>  arch/powerpc/configs/85xx-hw.config | 1 -
>>  1 file changed, 1 deletion(-)
>>
>
> It seems this patch isn't applied anywhere. Ping?

I'll grab it.

cheers

^ permalink raw reply

* Re: [PATCH 02/11] powerpc/kexec_file: mark PPC64 specific code
From: Hari Bathini @ 2020-06-29  6:23 UTC (permalink / raw)
  To: Christophe Leroy, Michael Ellerman, Andrew Morton
  Cc: Pingfan Liu, Kexec-ml, Petr Tesarik, Mahesh J Salgaonkar,
	Sourabh Jain, lkml, linuxppc-dev, Mimi Zohar, Vivek Goyal,
	Dave Young, Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <68d59c00-da02-b362-7bd9-a9631eca0fdd@csgroup.eu>

Hi Christophe

Thanks for the review...

On 27/06/20 12:12 pm, Christophe Leroy wrote:
> 
> 
> Le 26/06/2020 à 21:04, Hari Bathini a écrit :
>> Some of the kexec_file_load code isn't PPC64 specific. Move PPC64
>> specific code from kexec/file_load.c to kexec/file_load_64.c. Also,
>> rename purgatory/trampoline.S to purgatory/trampoline_64.S in the
>> same spirit.
> 
> At the time being, CONFIG_KEXEC_FILE depends on PPC64.

Right.

> Are you planning to make it work on PPC32 as well ?

No.

> Otherwise I don't understand the purpose of this patch.

But I want to make sure the changes I am adding in this series do not
get in the way of adding PPC32 changes whenever they are submitted as there
is common code currently and some more of it in the changes I am adding
in this series...

> Also, what is being done in this patch seems to go far beyond what you describe above.> It is propably worth splitting in several patches with proper explanation.

Hmmm.. I don't see any other reason beyond what I mentioned above.
Will try to split the patch but the changelog would still be the same, afaics.

> Christophe
> 
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/kexec.h       |   11 +++
>>   arch/powerpc/kexec/Makefile            |    2 -
>>   arch/powerpc/kexec/elf_64.c            |    7 +-
>>   arch/powerpc/kexec/file_load.c         |   37 ++--------
>>   arch/powerpc/kexec/file_load_64.c      |  108 ++++++++++++++++++++++++++++++
>>   arch/powerpc/purgatory/Makefile        |    4 +
>>   arch/powerpc/purgatory/trampoline.S    |  117 --------------------------------
>>   arch/powerpc/purgatory/trampoline_64.S |  117 ++++++++++++++++++++++++++++++++
>>   8 files changed, 248 insertions(+), 155 deletions(-)
>>   create mode 100644 arch/powerpc/kexec/file_load_64.c
>>   delete mode 100644 arch/powerpc/purgatory/trampoline.S
>>   create mode 100644 arch/powerpc/purgatory/trampoline_64.S

Thanks
Hari

^ permalink raw reply

* Re: [PATCH v2] powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()
From: Christophe Leroy @ 2020-06-29  6:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin,
	segher
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <c2addbd9d76212242d3d8554a2f7ff849fb08b85.1587040754.git.christophe.leroy@c-s.fr>

Hi Michael,

I see this patch is marked as "defered" in patchwork, but I can't see 
any related discussion. Is it normal ?

Christophe

Le 16/04/2020 à 14:39, Christophe Leroy a écrit :
> At the time being, __put_user()/__get_user() and friends only use
> D-form addressing, with 0 offset. Ex:
> 
> 	lwz	reg1, 0(reg2)
> 
> Give the compiler the opportunity to use other adressing modes
> whenever possible, to get more optimised code.
> 
> Hereunder is a small exemple:
> 
> struct test {
> 	u32 item1;
> 	u16 item2;
> 	u8 item3;
> 	u64 item4;
> };
> 
> int set_test_user(struct test __user *from, struct test __user *to)
> {
> 	int err;
> 	u32 item1;
> 	u16 item2;
> 	u8 item3;
> 	u64 item4;
> 
> 	err = __get_user(item1, &from->item1);
> 	err |= __get_user(item2, &from->item2);
> 	err |= __get_user(item3, &from->item3);
> 	err |= __get_user(item4, &from->item4);
> 
> 	err |= __put_user(item1, &to->item1);
> 	err |= __put_user(item2, &to->item2);
> 	err |= __put_user(item3, &to->item3);
> 	err |= __put_user(item4, &to->item4);
> 
> 	return err;
> }
> 
> Before the patch:
> 
> 00000df0 <set_test_user>:
>   df0:	94 21 ff f0 	stwu    r1,-16(r1)
>   df4:	39 40 00 00 	li      r10,0
>   df8:	93 c1 00 08 	stw     r30,8(r1)
>   dfc:	93 e1 00 0c 	stw     r31,12(r1)
>   e00:	7d 49 53 78 	mr      r9,r10
>   e04:	80 a3 00 00 	lwz     r5,0(r3)
>   e08:	38 e3 00 04 	addi    r7,r3,4
>   e0c:	7d 46 53 78 	mr      r6,r10
>   e10:	a0 e7 00 00 	lhz     r7,0(r7)
>   e14:	7d 29 33 78 	or      r9,r9,r6
>   e18:	39 03 00 06 	addi    r8,r3,6
>   e1c:	7d 46 53 78 	mr      r6,r10
>   e20:	89 08 00 00 	lbz     r8,0(r8)
>   e24:	7d 29 33 78 	or      r9,r9,r6
>   e28:	38 63 00 08 	addi    r3,r3,8
>   e2c:	7d 46 53 78 	mr      r6,r10
>   e30:	83 c3 00 00 	lwz     r30,0(r3)
>   e34:	83 e3 00 04 	lwz     r31,4(r3)
>   e38:	7d 29 33 78 	or      r9,r9,r6
>   e3c:	7d 43 53 78 	mr      r3,r10
>   e40:	90 a4 00 00 	stw     r5,0(r4)
>   e44:	7d 29 1b 78 	or      r9,r9,r3
>   e48:	38 c4 00 04 	addi    r6,r4,4
>   e4c:	7d 43 53 78 	mr      r3,r10
>   e50:	b0 e6 00 00 	sth     r7,0(r6)
>   e54:	7d 29 1b 78 	or      r9,r9,r3
>   e58:	38 e4 00 06 	addi    r7,r4,6
>   e5c:	7d 43 53 78 	mr      r3,r10
>   e60:	99 07 00 00 	stb     r8,0(r7)
>   e64:	7d 23 1b 78 	or      r3,r9,r3
>   e68:	38 84 00 08 	addi    r4,r4,8
>   e6c:	93 c4 00 00 	stw     r30,0(r4)
>   e70:	93 e4 00 04 	stw     r31,4(r4)
>   e74:	7c 63 53 78 	or      r3,r3,r10
>   e78:	83 c1 00 08 	lwz     r30,8(r1)
>   e7c:	83 e1 00 0c 	lwz     r31,12(r1)
>   e80:	38 21 00 10 	addi    r1,r1,16
>   e84:	4e 80 00 20 	blr
> 
> After the patch:
> 
> 00000dbc <set_test_user>:
>   dbc:	39 40 00 00 	li      r10,0
>   dc0:	7d 49 53 78 	mr      r9,r10
>   dc4:	80 03 00 00 	lwz     r0,0(r3)
>   dc8:	7d 48 53 78 	mr      r8,r10
>   dcc:	a1 63 00 04 	lhz     r11,4(r3)
>   dd0:	7d 29 43 78 	or      r9,r9,r8
>   dd4:	7d 48 53 78 	mr      r8,r10
>   dd8:	88 a3 00 06 	lbz     r5,6(r3)
>   ddc:	7d 29 43 78 	or      r9,r9,r8
>   de0:	7d 48 53 78 	mr      r8,r10
>   de4:	80 c3 00 08 	lwz     r6,8(r3)
>   de8:	80 e3 00 0c 	lwz     r7,12(r3)
>   dec:	7d 29 43 78 	or      r9,r9,r8
>   df0:	7d 43 53 78 	mr      r3,r10
>   df4:	90 04 00 00 	stw     r0,0(r4)
>   df8:	7d 29 1b 78 	or      r9,r9,r3
>   dfc:	7d 43 53 78 	mr      r3,r10
>   e00:	b1 64 00 04 	sth     r11,4(r4)
>   e04:	7d 29 1b 78 	or      r9,r9,r3
>   e08:	7d 43 53 78 	mr      r3,r10
>   e0c:	98 a4 00 06 	stb     r5,6(r4)
>   e10:	7d 23 1b 78 	or      r3,r9,r3
>   e14:	90 c4 00 08 	stw     r6,8(r4)
>   e18:	90 e4 00 0c 	stw     r7,12(r4)
>   e1c:	7c 63 53 78 	or      r3,r3,r10
>   e20:	4e 80 00 20 	blr
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
> ---
> v2:
> - Added <> modifier in __put_user_asm() and __get_user_asm()
> - Removed %U2 in __put_user_asm2() and __get_user_asm2()
> - Reworded the commit log
> ---
>   arch/powerpc/include/asm/uaccess.h | 28 ++++++++++++++--------------
>   1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 7c811442b607..9365b59495a2 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -114,7 +114,7 @@ extern long __put_user_bad(void);
>    */
>   #define __put_user_asm(x, addr, err, op)			\
>   	__asm__ __volatile__(					\
> -		"1:	" op " %1,0(%2)	# put_user\n"		\
> +		"1:	" op "%U2%X2 %1,%2	# put_user\n"	\
>   		"2:\n"						\
>   		".section .fixup,\"ax\"\n"			\
>   		"3:	li %0,%3\n"				\
> @@ -122,7 +122,7 @@ extern long __put_user_bad(void);
>   		".previous\n"					\
>   		EX_TABLE(1b, 3b)				\
>   		: "=r" (err)					\
> -		: "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> +		: "r" (x), "m<>" (*addr), "i" (-EFAULT), "0" (err))
>   
>   #ifdef __powerpc64__
>   #define __put_user_asm2(x, ptr, retval)				\
> @@ -130,8 +130,8 @@ extern long __put_user_bad(void);
>   #else /* __powerpc64__ */
>   #define __put_user_asm2(x, addr, err)				\
>   	__asm__ __volatile__(					\
> -		"1:	stw %1,0(%2)\n"				\
> -		"2:	stw %1+1,4(%2)\n"			\
> +		"1:	stw%X2 %1,%2\n"			\
> +		"2:	stw%X2 %L1,%L2\n"			\
>   		"3:\n"						\
>   		".section .fixup,\"ax\"\n"			\
>   		"4:	li %0,%3\n"				\
> @@ -140,7 +140,7 @@ extern long __put_user_bad(void);
>   		EX_TABLE(1b, 4b)				\
>   		EX_TABLE(2b, 4b)				\
>   		: "=r" (err)					\
> -		: "r" (x), "b" (addr), "i" (-EFAULT), "0" (err))
> +		: "r" (x), "m" (*addr), "i" (-EFAULT), "0" (err))
>   #endif /* __powerpc64__ */
>   
>   #define __put_user_size_allowed(x, ptr, size, retval)		\
> @@ -260,7 +260,7 @@ extern long __get_user_bad(void);
>   
>   #define __get_user_asm(x, addr, err, op)		\
>   	__asm__ __volatile__(				\
> -		"1:	"op" %1,0(%2)	# get_user\n"	\
> +		"1:	"op"%U2%X2 %1, %2	# get_user\n"	\
>   		"2:\n"					\
>   		".section .fixup,\"ax\"\n"		\
>   		"3:	li %0,%3\n"			\
> @@ -269,7 +269,7 @@ extern long __get_user_bad(void);
>   		".previous\n"				\
>   		EX_TABLE(1b, 3b)			\
>   		: "=r" (err), "=r" (x)			\
> -		: "b" (addr), "i" (-EFAULT), "0" (err))
> +		: "m<>" (*addr), "i" (-EFAULT), "0" (err))
>   
>   #ifdef __powerpc64__
>   #define __get_user_asm2(x, addr, err)			\
> @@ -277,8 +277,8 @@ extern long __get_user_bad(void);
>   #else /* __powerpc64__ */
>   #define __get_user_asm2(x, addr, err)			\
>   	__asm__ __volatile__(				\
> -		"1:	lwz %1,0(%2)\n"			\
> -		"2:	lwz %1+1,4(%2)\n"		\
> +		"1:	lwz%X2 %1, %2\n"			\
> +		"2:	lwz%X2 %L1, %L2\n"		\
>   		"3:\n"					\
>   		".section .fixup,\"ax\"\n"		\
>   		"4:	li %0,%3\n"			\
> @@ -289,7 +289,7 @@ extern long __get_user_bad(void);
>   		EX_TABLE(1b, 4b)			\
>   		EX_TABLE(2b, 4b)			\
>   		: "=r" (err), "=&r" (x)			\
> -		: "b" (addr), "i" (-EFAULT), "0" (err))
> +		: "m" (*addr), "i" (-EFAULT), "0" (err))
>   #endif /* __powerpc64__ */
>   
>   #define __get_user_size_allowed(x, ptr, size, retval)		\
> @@ -299,10 +299,10 @@ do {								\
>   	if (size > sizeof(x))					\
>   		(x) = __get_user_bad();				\
>   	switch (size) {						\
> -	case 1: __get_user_asm(x, ptr, retval, "lbz"); break;	\
> -	case 2: __get_user_asm(x, ptr, retval, "lhz"); break;	\
> -	case 4: __get_user_asm(x, ptr, retval, "lwz"); break;	\
> -	case 8: __get_user_asm2(x, ptr, retval);  break;	\
> +	case 1: __get_user_asm(x, (u8 __user *)ptr, retval, "lbz"); break;	\
> +	case 2: __get_user_asm(x, (u16 __user *)ptr, retval, "lhz"); break;	\
> +	case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break;	\
> +	case 8: __get_user_asm2(x, (u64 __user *)ptr, retval);  break;	\
>   	default: (x) = __get_user_bad();			\
>   	}							\
>   } while (0)
> 

^ permalink raw reply

* [PATCH] ASoC: fsl_sai: Refine regcache usage with pm runtime
From: Shengjiu Wang @ 2020-06-29  6:42 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, broonie, perex, tiwai,
	alsa-devel
  Cc: linuxppc-dev, linux-kernel

When there is dedicated power domain bound with device, after probing
the power will be disabled, then registers are not accessible in
fsl_sai_dai_probe(), so regcache only need to be enabled in end of
probe() and regcache_mark_dirty should be moved to pm runtime resume
callback function.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 sound/soc/fsl/fsl_sai.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 9d436b0c5718..a22562f2df47 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -1016,6 +1016,7 @@ static int fsl_sai_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, sai);
 
 	pm_runtime_enable(&pdev->dev);
+	regcache_cache_only(sai->regmap, true);
 
 	ret = devm_snd_soc_register_component(&pdev->dev, &fsl_component,
 			&fsl_sai_dai, 1);
@@ -1107,7 +1108,6 @@ static int fsl_sai_runtime_suspend(struct device *dev)
 	clk_disable_unprepare(sai->bus_clk);
 
 	regcache_cache_only(sai->regmap, true);
-	regcache_mark_dirty(sai->regmap);
 
 	return 0;
 }
@@ -1137,6 +1137,7 @@ static int fsl_sai_runtime_resume(struct device *dev)
 	}
 
 	regcache_cache_only(sai->regmap, false);
+	regcache_mark_dirty(sai->regmap);
 	regmap_write(sai->regmap, FSL_SAI_TCSR(ofs), FSL_SAI_CSR_SR);
 	regmap_write(sai->regmap, FSL_SAI_RCSR(ofs), FSL_SAI_CSR_SR);
 	usleep_range(1000, 2000);
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH 01/13] iommu/exynos: Use dev_iommu_priv_get/set()
From: Marek Szyprowski @ 2020-06-29  7:51 UTC (permalink / raw)
  To: Joerg Roedel, iommu
  Cc: linux-ia64, Heiko Stuebner, David Airlie, Joonas Lahtinen,
	Thierry Reding, Paul Mackerras, Will Deacon, x86, Russell King,
	Catalin Marinas, Fenghua Yu, Joerg Roedel, intel-gfx, Jani Nikula,
	Rodrigo Vivi, Matthias Brugger, linux-arm-kernel, Tony Luck,
	linuxppc-dev, linux-kernel, Daniel Vetter, David Woodhouse,
	Lu Baolu
In-Reply-To: <20200625130836.1916-2-joro@8bytes.org>

On 25.06.2020 15:08, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> Remove the use of dev->archdata.iommu and use the private per-device
> pointer provided by IOMMU core code instead.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>   drivers/iommu/exynos-iommu.c                  | 20 +++++++++----------
>   .../media/platform/s5p-mfc/s5p_mfc_iommu.h    |  4 +++-
>   2 files changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
> index 60c8a56e4a3f..6a9b67302369 100644
> --- a/drivers/iommu/exynos-iommu.c
> +++ b/drivers/iommu/exynos-iommu.c
> @@ -173,7 +173,7 @@ static u32 lv2ent_offset(sysmmu_iova_t iova)
>   #define REG_V5_FAULT_AR_VA	0x070
>   #define REG_V5_FAULT_AW_VA	0x080
>   
> -#define has_sysmmu(dev)		(dev->archdata.iommu != NULL)
> +#define has_sysmmu(dev)		(dev_iommu_priv_get(dev) != NULL)
>   
>   static struct device *dma_dev;
>   static struct kmem_cache *lv2table_kmem_cache;
> @@ -226,7 +226,7 @@ static const struct sysmmu_fault_info sysmmu_v5_faults[] = {
>   };
>   
>   /*
> - * This structure is attached to dev.archdata.iommu of the master device
> + * This structure is attached to dev->iommu->priv of the master device
>    * on device add, contains a list of SYSMMU controllers defined by device tree,
>    * which are bound to given master device. It is usually referenced by 'owner'
>    * pointer.
> @@ -670,7 +670,7 @@ static int __maybe_unused exynos_sysmmu_suspend(struct device *dev)
>   	struct device *master = data->master;
>   
>   	if (master) {
> -		struct exynos_iommu_owner *owner = master->archdata.iommu;
> +		struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
>   
>   		mutex_lock(&owner->rpm_lock);
>   		if (data->domain) {
> @@ -688,7 +688,7 @@ static int __maybe_unused exynos_sysmmu_resume(struct device *dev)
>   	struct device *master = data->master;
>   
>   	if (master) {
> -		struct exynos_iommu_owner *owner = master->archdata.iommu;
> +		struct exynos_iommu_owner *owner = dev_iommu_priv_get(master);
>   
>   		mutex_lock(&owner->rpm_lock);
>   		if (data->domain) {
> @@ -837,8 +837,8 @@ static void exynos_iommu_domain_free(struct iommu_domain *iommu_domain)
>   static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain,
>   				    struct device *dev)
>   {
> -	struct exynos_iommu_owner *owner = dev->archdata.iommu;
>   	struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
> +	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
>   	phys_addr_t pagetable = virt_to_phys(domain->pgtable);
>   	struct sysmmu_drvdata *data, *next;
>   	unsigned long flags;
> @@ -875,8 +875,8 @@ static void exynos_iommu_detach_device(struct iommu_domain *iommu_domain,
>   static int exynos_iommu_attach_device(struct iommu_domain *iommu_domain,
>   				   struct device *dev)
>   {
> -	struct exynos_iommu_owner *owner = dev->archdata.iommu;
>   	struct exynos_iommu_domain *domain = to_exynos_domain(iommu_domain);
> +	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
>   	struct sysmmu_drvdata *data;
>   	phys_addr_t pagetable = virt_to_phys(domain->pgtable);
>   	unsigned long flags;
> @@ -1237,7 +1237,7 @@ static phys_addr_t exynos_iommu_iova_to_phys(struct iommu_domain *iommu_domain,
>   
>   static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
>   {
> -	struct exynos_iommu_owner *owner = dev->archdata.iommu;
> +	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
>   	struct sysmmu_drvdata *data;
>   
>   	if (!has_sysmmu(dev))
> @@ -1263,7 +1263,7 @@ static struct iommu_device *exynos_iommu_probe_device(struct device *dev)
>   
>   static void exynos_iommu_release_device(struct device *dev)
>   {
> -	struct exynos_iommu_owner *owner = dev->archdata.iommu;
> +	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
>   	struct sysmmu_drvdata *data;
>   
>   	if (!has_sysmmu(dev))
> @@ -1287,8 +1287,8 @@ static void exynos_iommu_release_device(struct device *dev)
>   static int exynos_iommu_of_xlate(struct device *dev,
>   				 struct of_phandle_args *spec)
>   {
> -	struct exynos_iommu_owner *owner = dev->archdata.iommu;
>   	struct platform_device *sysmmu = of_find_device_by_node(spec->np);
> +	struct exynos_iommu_owner *owner = dev_iommu_priv_get(dev);
>   	struct sysmmu_drvdata *data, *entry;
>   
>   	if (!sysmmu)
> @@ -1305,7 +1305,7 @@ static int exynos_iommu_of_xlate(struct device *dev,
>   
>   		INIT_LIST_HEAD(&owner->controllers);
>   		mutex_init(&owner->rpm_lock);
> -		dev->archdata.iommu = owner;
> +		dev_iommu_priv_set(dev, owner);
>   	}
>   
>   	list_for_each_entry(entry, &owner->controllers, owner_node)
> diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h b/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h
> index 152a713fff78..1a32266b7ddc 100644
> --- a/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h
> +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_iommu.h
> @@ -9,9 +9,11 @@
>   
>   #if defined(CONFIG_EXYNOS_IOMMU)
>   
> +#include <linux/iommu.h>
> +
>   static inline bool exynos_is_iommu_available(struct device *dev)
>   {
> -	return dev->archdata.iommu != NULL;
> +	return dev_iommu_priv_get(dev) != NULL;
>   }
>   
>   #else

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply

* Re: [PATCH V3 2/4] mm/debug_vm_pgtable: Add tests validating advanced arch page table helpers
From: Anshuman Khandual @ 2020-06-29  8:09 UTC (permalink / raw)
  To: Christophe Leroy, linux-mm
  Cc: Heiko Carstens, Paul Mackerras, H. Peter Anvin, linux-riscv,
	Will Deacon, linux-arch, linux-s390, x86, Mike Rapoport,
	Christian Borntraeger, Ingo Molnar, linux-arm-kernel, ziy,
	Catalin Marinas, linux-snps-arc, Vasily Gorbik, Borislav Petkov,
	Paul Walmsley, Kirill A . Shutemov, Thomas Gleixner,
	gerald.schaefer, christophe.leroy, Vineet Gupta, linux-kernel,
	Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <6da177e6-9219-9ccf-a402-f4293c7564f7@csgroup.eu>


On 06/27/2020 12:48 PM, Christophe Leroy wrote:
> Le 15/06/2020 à 05:37, Anshuman Khandual a écrit :
>> This adds new tests validating for these following arch advanced page table
>> helpers. These tests create and test specific mapping types at various page
>> table levels.
>>
>> 1. pxxp_set_wrprotect()
>> 2. pxxp_get_and_clear()
>> 3. pxxp_set_access_flags()
>> 4. pxxp_get_and_clear_full()
>> 5. pxxp_test_and_clear_young()
>> 6. pxx_leaf()
>> 7. pxx_set_huge()
>> 8. pxx_(clear|mk)_savedwrite()
>> 9. huge_pxxp_xxx()
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Vineet Gupta <vgupta@synopsys.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Cc: Paul Walmsley <paul.walmsley@sifive.com>
>> Cc: Palmer Dabbelt <palmer@dabbelt.com>
>> Cc: linux-snps-arc@lists.infradead.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s390@vger.kernel.org
>> Cc: linux-riscv@lists.infradead.org
>> Cc: x86@kernel.org
>> Cc: linux-mm@kvack.org
>> Cc: linux-arch@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   mm/debug_vm_pgtable.c | 306 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 306 insertions(+)
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index ffa163d4c63c..e3f9f8317a98 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -21,6 +21,7 @@
>>   #include <linux/module.h>
>>   #include <linux/pfn_t.h>
>>   #include <linux/printk.h>
>> +#include <linux/pgtable.h>
>>   #include <linux/random.h>
>>   #include <linux/spinlock.h>
>>   #include <linux/swap.h>
>> @@ -28,6 +29,7 @@
>>   #include <linux/start_kernel.h>
>>   #include <linux/sched/mm.h>
>>   #include <asm/pgalloc.h>
>> +#include <asm/tlbflush.h>
>>     #define VMFLAGS    (VM_READ|VM_WRITE|VM_EXEC)
>>   @@ -55,6 +57,54 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>       WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>   }
>>   +static void __init pte_advanced_tests(struct mm_struct *mm,
>> +            struct vm_area_struct *vma, pte_t *ptep,
>> +            unsigned long pfn, unsigned long vaddr, pgprot_t prot)
>> +{
>> +    pte_t pte = pfn_pte(pfn, prot);
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_set_wrprotect(mm, vaddr, ptep);
>> +    pte = READ_ONCE(*ptep);
> 
> same
> 
>> +    WARN_ON(pte_write(pte));
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_get_and_clear(mm, vaddr, ptep);
>> +    pte = READ_ONCE(*ptep);
> 
> same
> 
>> +    WARN_ON(!pte_none(pte));
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    pte = pte_wrprotect(pte);
>> +    pte = pte_mkclean(pte);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    pte = pte_mkwrite(pte);
>> +    pte = pte_mkdirty(pte);
>> +    ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
>> +    pte = READ_ONCE(*ptep);
> 
> same
> 
>> +    WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_get_and_clear_full(mm, vaddr, ptep, 1);
>> +    pte = READ_ONCE(*ptep);
> 
> same
> 
>> +    WARN_ON(!pte_none(pte));
>> +
>> +    pte = pte_mkyoung(pte);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_test_and_clear_young(vma, vaddr, ptep);
>> +    pte = READ_ONCE(*ptep);
> 
> same
> 
>> +    WARN_ON(pte_young(pte));
>> +}
>> +
>> +static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pte_t pte = pfn_pte(pfn, prot);
>> +
>> +    WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte))));
>> +    WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte))));
>> +}
>>   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>   static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>>   {
>> @@ -77,6 +127,89 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>>       WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>>   }
>>   +static void __init pmd_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pmd_t *pmdp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
>> +{
>> +    pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +    if (!has_transparent_hugepage())
>> +        return;
>> +
>> +    /* Align the address wrt HPAGE_PMD_SIZE */
>> +    vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
>> +
>> +    pmd = pfn_pmd(pfn, prot);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_set_wrprotect(mm, vaddr, pmdp);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(pmd_write(pmd));
>> +
>> +    pmd = pfn_pmd(pfn, prot);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_huge_get_and_clear(mm, vaddr, pmdp);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!pmd_none(pmd));
>> +
>> +    pmd = pfn_pmd(pfn, prot);
>> +    pmd = pmd_wrprotect(pmd);
>> +    pmd = pmd_mkclean(pmd);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmd = pmd_mkwrite(pmd);
>> +    pmd = pmd_mkdirty(pmd);
>> +    pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
>> +
>> +    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!pmd_none(pmd));
>> +
>> +    pmd = pmd_mkyoung(pmd);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_test_and_clear_young(vma, vaddr, pmdp);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(pmd_young(pmd));
>> +}
>> +
>> +static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +    /*
>> +     * PMD based THP is a leaf entry.
>> +     */
>> +    pmd = pmd_mkhuge(pmd);
>> +    WARN_ON(!pmd_leaf(pmd));
>> +}
>> +
>> +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
>> +{
>> +    pmd_t pmd;
>> +
>> +    if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
>> +        return;
>> +    /*
>> +     * X86 defined pmd_set_huge() verifies that the given
>> +     * PMD is not a populated non-leaf entry.
>> +     */
>> +    WRITE_ONCE(*pmdp, __pmd(0));
>> +    WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
>> +    WARN_ON(!pmd_clear_huge(pmdp));
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!pmd_none(pmd));
>> +}
>> +
>> +static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +    WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd))));
>> +    WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd))));
>> +}
>> +
>>   #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>   static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>>   {
>> @@ -100,12 +233,115 @@ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>>        */
>>       WARN_ON(!pud_bad(pud_mkhuge(pud)));
>>   }
>> +
>> +static void pud_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pud_t *pudp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
>> +{
>> +    pud_t pud = pfn_pud(pfn, prot);
>> +
>> +    if (!has_transparent_hugepage())
>> +        return;
>> +
>> +    /* Align the address wrt HPAGE_PUD_SIZE */
>> +    vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
>> +
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_set_wrprotect(mm, vaddr, pudp);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(pud_write(pud));
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> +    pud = pfn_pud(pfn, prot);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_huge_get_and_clear(mm, vaddr, pudp);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!pud_none(pud));
>> +
>> +    pud = pfn_pud(pfn, prot);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!pud_none(pud));
>> +#endif /* __PAGETABLE_PMD_FOLDED */
>> +    pud = pfn_pud(pfn, prot);
>> +    pud = pud_wrprotect(pud);
>> +    pud = pud_mkclean(pud);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pud = pud_mkwrite(pud);
>> +    pud = pud_mkdirty(pud);
>> +    pudp_set_access_flags(vma, vaddr, pudp, pud, 1);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
>> +
>> +    pud = pud_mkyoung(pud);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_test_and_clear_young(vma, vaddr, pudp);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(pud_young(pud));
>> +}
>> +
>> +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pud_t pud = pfn_pud(pfn, prot);
>> +
>> +    /*
>> +     * PUD based THP is a leaf entry.
>> +     */
>> +    pud = pud_mkhuge(pud);
>> +    WARN_ON(!pud_leaf(pud));
>> +}
>> +
>> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>> +{
>> +    pud_t pud;
>> +
>> +    if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
>> +        return;
>> +    /*
>> +     * X86 defined pud_set_huge() verifies that the given
>> +     * PUD is not a populated non-leaf entry.
>> +     */
>> +    WRITE_ONCE(*pudp, __pud(0));
>> +    WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
>> +    WARN_ON(!pud_clear_huge(pudp));
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!pud_none(pud));
>> +}
>>   #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>   static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void pud_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pud_t *pudp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
>> +{
>> +}
>> +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>> +{
>> +}
>>   #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>   #else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
>>   static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
>>   static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pmd_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pmd_t *pmdp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
>> +{
>> +}
>> +static void __init pud_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pud_t *pudp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
>> +{
>> +}
>> +static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
>> +{
>> +}
>> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>> +{
>> +}
>> +static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { }
>>   #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>     static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
>> @@ -495,8 +731,56 @@ static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot)
>>       WARN_ON(!pte_huge(pte_mkhuge(pte)));
>>   #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
>>   }
>> +
>> +static void __init hugetlb_advanced_tests(struct mm_struct *mm,
>> +                      struct vm_area_struct *vma,
>> +                      pte_t *ptep, unsigned long pfn,
>> +                      unsigned long vaddr, pgprot_t prot)
>> +{
>> +    struct page *page = pfn_to_page(pfn);
>> +    pte_t pte = READ_ONCE(*ptep);
> 
> Remplace with ptep_get() to avoid build failure on powerpc 8xx.

Sure, will replace all open PTE pointer accesses with ptep_get().

^ permalink raw reply

* Re: [PATCH RFC 1/1] powerpc/eeh: Synchronization for safety
From: Oliver O'Halloran @ 2020-06-29  8:11 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev
In-Reply-To: <049418257caa232ca3e27ee8cbab9b4ff38b11aa.1585550388.git.sbobroff@linux.ibm.com>

On Mon, 2020-03-30 at 17:39 +1100, Sam Bobroff wrote:
> There is currently little synchronization between EEH error detection
> (eeh_dev_check_failure()), EEH error recovery
> (eeh_handle_{normal,special}_event()) and the PCI subsystem (device
> addition and removal), and so there are race conditions that lead to
> crashes (often access to free'd memory or LIST_POISON).
> 
> However, a solution must consider:
> - EEH error detection can occur in interrupt context, which prevents
> the use of a mutex.
> - EEH recovery may need to sleep, which prevents the use of a spinlock.
> - EEH recovery uses PCI operations that may require the PCI
> rescan/remove lock and/or device lock to be held
> - PCI operations may hold the rescan/remove and/or device lock when
> calling into EEH functions.
> - Device driver callbacks may perform arbitrary PCI operations
> during recovery, including device removal.
> 
> In this patch the existing mutex and spinlock are combined with the
> EEH_PE_RECOVERING flag to provide some assurances that are then used
> to reduce the race conditions.

> The fields to be protected are the ones that provide the structure
> of the trees of struct eeh_pe that are held for each PHB: the parent
> pointer and child lists and the list of struct eeh_dev, as well as
> the pe and pdev pointers within struct eeh_dev.

This needs to be explicitly documented in the source tree. Preferably
next to the fields they protect in eeh_pe and eeh_dev.


> - I'm not aware of any outstanding problems with the set, but I've kept it RFC
>   for now becuase I'm looking for comments on the general strategy and
>   direction -- is this a good way to achieve some safety?

The two-locks idea seems like the sort of thing that's easy to screw up
and it's not really obvious what it's doing or why. Half the problem
with EEH is that it doesn't really make sense unless you spend far too
much time gazing into that abyss and this isn't really improving
matters IMO.

The main driver seems to be the desire to dereference edev->pe from
eeh_dev_check_failure(), but... why? The way it works is technically
broken as-is since it relies on eeh_ops->get_state() correctly
reporting the PE state. That's the same function that we have that 300s
wait hack for in the recovery thread (makes you wonder what platforms
actually need that...)

We've spoken in the past about having eeh_dev_check_failure() punt the
work of checking PE states into the event handler thread. I think we'd
get more milage out of that idea rather than doing the two-locks thing.
Is there some other situation besides eeh_dev_check_failure() being
called from interrupt context that makes the spinlock necessary?


> - Good places to review carefully would be eeh_pe_report_pdev() and
>   eeh_reset_device().

> - The mutex and spinlock need better names. Suggestions welcome.

eeh_dev_mutex should just be eeh_pe_tree_mutex, or similar. If you
really think the two locks idea is the way to go then
eeh_pe_tree_spinlock would make sense too. If we're going to have two
locks covering the same things then their names should be similar.

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 12c248a16527..30acae0d10e5 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -108,11 +108,26 @@ bool eeh_debugfs_no_recover;
>  /* Platform dependent EEH operations */
>  struct eeh_ops *eeh_ops = NULL;
>  
> -/* Lock to avoid races due to multiple reports of an error */
> +/*
> + * confirm_error_lock and eeh_dev_mutex are used together to provide
> + * safety during EEH operations.
> + *
> + * Generally, the spinlock is used in error detection where it's not possible
> + * to use a mutex or where there is potential to deadlock with the mutex, and
> + * the mutex is used during recovery and other PCI related operations. One must
> + * be held when reading and both must be held when making changes to the
> + * protected fields: eeh_pe.parent, child_list, child, edevs and eeh_dev.pe,
> + * .pdev.
> + *
> + * Lock ordering:
> + * - the PCI rescan/remove mutex (see pci_lock_rescan_remove())
> + * - the struct device lock (see device_lock())
> + * - the eeh_dev_mutex mutex (see eeh_recovery_lock())
> + * - the confirm_error_lock spinlock (see eeh_serialize_lock())
> + * - the eeh_eventlist_lock spinlock
> + */

I can't make sense of this list. I probably don't need to take the pci
rescan lock before taking the eeh event list lock, or vis-a-vis, but
that's what it's suggesting.

I'm guessing there's two seperate orderings here?


> @@ -1323,7 +1368,11 @@ void eeh_remove_device(struct pci_dev *dev)
>  	 * for the VF EEH device.
>  	 */
>  	edev->in_error = false;
> +	/* Both locks are required to make changes */
> +	eeh_recovery_must_be_locked();
> +	eeh_serialize_lock(&flags);
>  	dev->dev.archdata.edev = NULL;
> +	eeh_serialize_unlock(flags);

So pci_dev->dev.archdata.edev is also covered by eeh_serialize_lock()?


> +struct pci_dev **pdev_cache_list_create(struct eeh_pe *root)
>  {
>  	struct eeh_pe *pe;
>  	struct eeh_dev *edev, *tmp;
> +	struct pci_dev **pdevs;
> +	int i, n;
> +
> +	n = 0;
> +	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp) {
> +		if (edev->pdev)
> +			n++;
> +	}
> +	pdevs = kmalloc(sizeof(*pdevs) * (n + 1), GFP_KERNEL);
> +	if (WARN_ON_ONCE(!pdevs))
> +		return NULL;
> +	i = 0;
> +	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp) {
> +		if (i < n) {
> +			get_device(&edev->pdev->dev);
> +			pdevs[i++] = edev->pdev;
> +		}
> +	}
> +	if (WARN_ON_ONCE(i < n))
> +		n = i;
> +	pdevs[n] = NULL; /* terminator */
> +	return pdevs;
> +}
> +
> +static void pdev_cache_list_destroy(struct pci_dev **pdevs)
> +{
> +	struct pci_dev **pdevp;
> +
> +	for (pdevp = pdevs; pdevp && *pdevp; pdevp++)
> +		put_device(&(*pdevp)->dev);
> +	kfree(pdevs);
> +}
> +
> +static void eeh_pe_report(const char *name, struct eeh_pe *root,
> +			  eeh_report_fn fn, enum pci_ers_result *result)
> +{
> +	struct pci_dev **pdevs, **pdevp;
>  
>  	pr_info("EEH: Beginning: '%s'\n", name);
> -	eeh_for_each_pe(root, pe) eeh_pe_for_each_dev(pe, edev, tmp)
> -		eeh_pe_report_edev(edev, fn, result);
> +	/*
> +	 * It would be convenient to continue to hold the recovery lock here
> +	 * but driver callbacks can take a very long time or never return at
> +	 * all.
> +	 */
> +	pdevs = pdev_cache_list_create(root);
> +	for (pdevp = pdevs; pdevp && *pdevp; pdevp++) {
> +		/*
> +		 * NOTE! eeh_recovery_lock() is released briefly
> +		 * in eeh_pe_report_pdev()
> +		 */
> +		eeh_pe_report_pdev(*pdevp, fn, result);
> +	}
> +	pdev_cache_list_destroy(pdevs);

So the cache stuff is taking a ref to all the pci_devs under that PE. I
assume that's to keep the eeh_dev->pdev pointer valid for the whole
time we don't have the recovery_lock held? I don't really see why we
need to care though since we're not holding a reference to the pdev
itself and the removal should happen while we're in the driver
callback. Is this a hold over from when the eeh_dev would be removed
with the pci_dev was released?

> @@ -477,17 +554,44 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
>  	}
>  
>  #ifdef CONFIG_PCI_IOV
> -	pci_iov_add_virtfn(edev->physfn, eeh_dev_to_pdn(edev)->vf_index);
> +	{
> +		struct pci_dev *physfn = edev->physfn;
> +		int vf_index = eeh_dev_to_pdn(edev)->vf_index;
> +
> +		get_device(&physfn->dev);
> +		eeh_recovery_unlock();
> +		/*
> +		 * This PCI operation will call back into EEH code where the
> +		 * recovery lock will be acquired, so it must be released here,
> +		 * first:
> +		 */
> +		pci_iov_add_virtfn(physfn, vf_index);
> +		put_device(&physfn->dev);
> +		eeh_recovery_lock();
> +	}
>  #endif
>  	return NULL;
>  }
>  
> -static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
> +/*
> + * It's important that this function take a pdev as a parameter, because they
> + * are protected by a reference count. An edev could be lost when the recovery
> + * lock is dropped (which it must be in order to take the PCI rescan/remove
> + * lock without risking a deadlock).

Same comments as above. Even if an edev goes away does it matter? The
remove will be handled by code that properly handles the locking. Also
with the change that moved eeh_remove_device() from the
pcibios_release_device() into the bus removal notifier this won't stop
the eeh_dev from being removed.


> @@ -1118,6 +1273,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>   * Called when an EEH event is detected but can't be narrowed down to a
>   * specific PE.  Iterates through possible failures and handles them as
>   * necessary.
> + * XXX TODO Needs to be checked sync work

stale comment or is it fine?


> @@ -445,6 +474,9 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>  	}
>  	pe->parent = parent;
>  
> +	if (parent->state & EEH_PE_RECOVERING)
> +		pe->state |= EEH_PE_RECOVERING;
> +

What's actually required to hit this case? All I can think of is
discovering a new bridge during a post-reset rescan. A new brige would
mean a new PE would be allocated for the downstream bus.

> @@ -141,13 +144,21 @@ static ssize_t pnv_eeh_ei_write(struct file *filp,
>  	if (ret != 5)
>  		return -EINVAL;
>  
> +	/*
> +	 * Use the spinlock rather than the mutex so that errors can be
> +	 * injected during slow recovery operations (for testing).
> +	 */
> +	eeh_serialize_lock(&flags);
>  	/* Retrieve PE */
>  	pe = eeh_pe_get(hose, pe_no, 0);
> -	if (!pe)
> +	if (!pe) {
> +		eeh_serialize_unlock(flags);
>  		return -ENODEV;
> +	}
>  
>  	/* Do error injection */
>  	ret = eeh_ops->err_inject(pe, type, func, addr, mask);
> +	eeh_serialize_unlock(flags);
>  	return ret < 0 ? ret : count;
>  }

We could probably make this just look at the powernv phb state directly
and avoid touching the eeh_pe at all. The only bit of the eeh_pe that
we care about is the pe number which the user already provided.

I suppose we care about validating that the PE is in use, but it might
be useful to allow unused PEs to be injected into. That would help with
testing errors during device probe, etc.


^ permalink raw reply

* Re: [PATCH V3 2/4] mm/debug_vm_pgtable: Add tests validating advanced arch page table helpers
From: Anshuman Khandual @ 2020-06-29  8:15 UTC (permalink / raw)
  To: Christophe Leroy, linux-mm
  Cc: Heiko Carstens, Paul Mackerras, H. Peter Anvin, linux-riscv,
	Will Deacon, linux-arch, linux-s390, x86, Mike Rapoport,
	Christian Borntraeger, Ingo Molnar, gerald.schaefer, ziy,
	Catalin Marinas, linux-snps-arc, Vasily Gorbik, Borislav Petkov,
	Paul Walmsley, Kirill A . Shutemov, Thomas Gleixner,
	linux-arm-kernel, Vineet Gupta, linux-kernel, Palmer Dabbelt,
	Andrew Morton, linuxppc-dev
In-Reply-To: <4da41eee-5ce0-2a5e-40eb-4424655b3489@csgroup.eu>



On 06/27/2020 12:56 PM, Christophe Leroy wrote:
> 
> 
> Le 15/06/2020 à 05:37, Anshuman Khandual a écrit :
>> This adds new tests validating for these following arch advanced page table
>> helpers. These tests create and test specific mapping types at various page
>> table levels.
>>
>> 1. pxxp_set_wrprotect()
>> 2. pxxp_get_and_clear()
>> 3. pxxp_set_access_flags()
>> 4. pxxp_get_and_clear_full()
>> 5. pxxp_test_and_clear_young()
>> 6. pxx_leaf()
>> 7. pxx_set_huge()
>> 8. pxx_(clear|mk)_savedwrite()
>> 9. huge_pxxp_xxx()
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mike Rapoport <rppt@linux.ibm.com>
>> Cc: Vineet Gupta <vgupta@synopsys.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Cc: Vasily Gorbik <gor@linux.ibm.com>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Cc: Paul Walmsley <paul.walmsley@sifive.com>
>> Cc: Palmer Dabbelt <palmer@dabbelt.com>
>> Cc: linux-snps-arc@lists.infradead.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-s390@vger.kernel.org
>> Cc: linux-riscv@lists.infradead.org
>> Cc: x86@kernel.org
>> Cc: linux-mm@kvack.org
>> Cc: linux-arch@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   mm/debug_vm_pgtable.c | 306 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 306 insertions(+)
>>
>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>> index ffa163d4c63c..e3f9f8317a98 100644
>> --- a/mm/debug_vm_pgtable.c
>> +++ b/mm/debug_vm_pgtable.c
>> @@ -21,6 +21,7 @@
>>   #include <linux/module.h>
>>   #include <linux/pfn_t.h>
>>   #include <linux/printk.h>
>> +#include <linux/pgtable.h>
>>   #include <linux/random.h>
>>   #include <linux/spinlock.h>
>>   #include <linux/swap.h>
>> @@ -28,6 +29,7 @@
>>   #include <linux/start_kernel.h>
>>   #include <linux/sched/mm.h>
>>   #include <asm/pgalloc.h>
>> +#include <asm/tlbflush.h>
>>     #define VMFLAGS    (VM_READ|VM_WRITE|VM_EXEC)
>>   @@ -55,6 +57,54 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>       WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>   }
>>   +static void __init pte_advanced_tests(struct mm_struct *mm,
>> +            struct vm_area_struct *vma, pte_t *ptep,
>> +            unsigned long pfn, unsigned long vaddr, pgprot_t prot)
> 
> Align args properly.
> 
>> +{
>> +    pte_t pte = pfn_pte(pfn, prot);
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_set_wrprotect(mm, vaddr, ptep);
>> +    pte = READ_ONCE(*ptep);
>> +    WARN_ON(pte_write(pte));
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_get_and_clear(mm, vaddr, ptep);
>> +    pte = READ_ONCE(*ptep);
>> +    WARN_ON(!pte_none(pte));
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    pte = pte_wrprotect(pte);
>> +    pte = pte_mkclean(pte);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    pte = pte_mkwrite(pte);
>> +    pte = pte_mkdirty(pte);
>> +    ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
>> +    pte = READ_ONCE(*ptep);
>> +    WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
>> +
>> +    pte = pfn_pte(pfn, prot);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_get_and_clear_full(mm, vaddr, ptep, 1);
>> +    pte = READ_ONCE(*ptep);
>> +    WARN_ON(!pte_none(pte));
>> +
>> +    pte = pte_mkyoung(pte);
>> +    set_pte_at(mm, vaddr, ptep, pte);
>> +    ptep_test_and_clear_young(vma, vaddr, ptep);
>> +    pte = READ_ONCE(*ptep);
>> +    WARN_ON(pte_young(pte));
>> +}
>> +
>> +static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pte_t pte = pfn_pte(pfn, prot);
>> +
>> +    WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte))));
>> +    WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte))));
>> +}
>>   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>   static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>>   {
>> @@ -77,6 +127,89 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>>       WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
>>   }
>>   +static void __init pmd_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pmd_t *pmdp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
> 
> Align args properly
> 
>> +{
>> +    pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +    if (!has_transparent_hugepage())
>> +        return;
>> +
>> +    /* Align the address wrt HPAGE_PMD_SIZE */
>> +    vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
>> +
>> +    pmd = pfn_pmd(pfn, prot);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_set_wrprotect(mm, vaddr, pmdp);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(pmd_write(pmd));
>> +
>> +    pmd = pfn_pmd(pfn, prot);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_huge_get_and_clear(mm, vaddr, pmdp);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!pmd_none(pmd));
>> +
>> +    pmd = pfn_pmd(pfn, prot);
>> +    pmd = pmd_wrprotect(pmd);
>> +    pmd = pmd_mkclean(pmd);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmd = pmd_mkwrite(pmd);
>> +    pmd = pmd_mkdirty(pmd);
>> +    pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
>> +
>> +    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!pmd_none(pmd));
>> +
>> +    pmd = pmd_mkyoung(pmd);
>> +    set_pmd_at(mm, vaddr, pmdp, pmd);
>> +    pmdp_test_and_clear_young(vma, vaddr, pmdp);
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(pmd_young(pmd));
>> +}
>> +
>> +static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +    /*
>> +     * PMD based THP is a leaf entry.
>> +     */
>> +    pmd = pmd_mkhuge(pmd);
>> +    WARN_ON(!pmd_leaf(pmd));
>> +}
>> +
>> +static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
>> +{
>> +    pmd_t pmd;
>> +
>> +    if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
>> +        return;
>> +    /*
>> +     * X86 defined pmd_set_huge() verifies that the given
>> +     * PMD is not a populated non-leaf entry.
>> +     */
>> +    WRITE_ONCE(*pmdp, __pmd(0));
>> +    WARN_ON(!pmd_set_huge(pmdp, __pfn_to_phys(pfn), prot));
>> +    WARN_ON(!pmd_clear_huge(pmdp));
>> +    pmd = READ_ONCE(*pmdp);
>> +    WARN_ON(!pmd_none(pmd));
>> +}
>> +
>> +static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pmd_t pmd = pfn_pmd(pfn, prot);
>> +
>> +    WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd))));
>> +    WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd))));
>> +}
>> +
>>   #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>   static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>>   {
>> @@ -100,12 +233,115 @@ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>>        */
>>       WARN_ON(!pud_bad(pud_mkhuge(pud)));
>>   }
>> +
>> +static void pud_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pud_t *pudp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
> 
> Align args properly
> 
>> +{
>> +    pud_t pud = pfn_pud(pfn, prot);
>> +
>> +    if (!has_transparent_hugepage())
>> +        return;
>> +
>> +    /* Align the address wrt HPAGE_PUD_SIZE */
>> +    vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
>> +
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_set_wrprotect(mm, vaddr, pudp);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(pud_write(pud));
>> +
>> +#ifndef __PAGETABLE_PMD_FOLDED
>> +    pud = pfn_pud(pfn, prot);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_huge_get_and_clear(mm, vaddr, pudp);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!pud_none(pud));
>> +
>> +    pud = pfn_pud(pfn, prot);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!pud_none(pud));
>> +#endif /* __PAGETABLE_PMD_FOLDED */
>> +    pud = pfn_pud(pfn, prot);
>> +    pud = pud_wrprotect(pud);
>> +    pud = pud_mkclean(pud);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pud = pud_mkwrite(pud);
>> +    pud = pud_mkdirty(pud);
>> +    pudp_set_access_flags(vma, vaddr, pudp, pud, 1);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
>> +
>> +    pud = pud_mkyoung(pud);
>> +    set_pud_at(mm, vaddr, pudp, pud);
>> +    pudp_test_and_clear_young(vma, vaddr, pudp);
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(pud_young(pud));
>> +}
>> +
>> +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
>> +{
>> +    pud_t pud = pfn_pud(pfn, prot);
>> +
>> +    /*
>> +     * PUD based THP is a leaf entry.
>> +     */
>> +    pud = pud_mkhuge(pud);
>> +    WARN_ON(!pud_leaf(pud));
>> +}
>> +
>> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>> +{
>> +    pud_t pud;
>> +
>> +    if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
>> +        return;
>> +    /*
>> +     * X86 defined pud_set_huge() verifies that the given
>> +     * PUD is not a populated non-leaf entry.
>> +     */
>> +    WRITE_ONCE(*pudp, __pud(0));
>> +    WARN_ON(!pud_set_huge(pudp, __pfn_to_phys(pfn), prot));
>> +    WARN_ON(!pud_clear_huge(pudp));
>> +    pud = READ_ONCE(*pudp);
>> +    WARN_ON(!pud_none(pud));
>> +}
>>   #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>   static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void pud_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pud_t *pudp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
> 
> Align args properly
> 
>> +{
>> +}
>> +static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
>> +{
>> +}
>>   #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>   #else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
>>   static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
>>   static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
>> +static void __init pmd_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pmd_t *pmdp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
> 
> Align args properly
> 
>> +{
>> +}
>> +static void __init pud_advanced_tests(struct mm_struct *mm,
>> +        struct vm_area_struct *vma, pud_t *pudp,
>> +        unsigned long pfn, unsigned long vaddr, pgprot_t prot)
> 
> Align args properly
> 

Sure, will fix the arguments alignment in the above mentioned places.

^ permalink raw reply

* [PATCH V4 0/3] cpufreq: Allow default governor on cmdline and fix locking issues
From: Viresh Kumar @ 2020-06-29  8:24 UTC (permalink / raw)
  To: Rafael Wysocki, Arnd Bergmann, Ben Segall, Dietmar Eggemann,
	Ingo Molnar, Jonathan Corbet, Juri Lelli, Mel Gorman,
	Peter Zijlstra, Steven Rostedt, Vincent Guittot, Viresh Kumar
  Cc: linux-doc, linuxppc-dev, adharmap, linux-pm, linux-kernel,
	Quentin Perret, kernel-team, tkjos

Hi,

I have picked Quentin's series over my patch, modified both and tested.

V3->V4:
- Do __module_get() for cpufreq_default_governor() case as well and get
  rid of an extra variable.
- Use a single character array, default_governor, instead of two of them.

V2->V3:
- default_governor is a string now and we don't set it on governor
  registration or unregistration anymore.
- Fixed locking issues in cpufreq_init_policy().

--
Viresh

Original cover letter fro Quentin:

This series enables users of prebuilt kernels (e.g. distro kernels) to
specify their CPUfreq governor of choice using the kernel command line,
instead of having to wait for the system to fully boot to userspace to
switch using the sysfs interface. This is helpful for 2 reasons:
  1. users get to choose the governor that runs during the actual boot;
  2. it simplifies the userspace boot procedure a bit (one less thing to
     worry about).

To enable this, the first patch moves all governor init calls to
core_initcall, to make sure they are registered by the time the drivers
probe. This should be relatively low impact as registering a governor
is a simple procedure (it gets added to a llist), and all governors
already load at core_initcall anyway when they're set as the default
in Kconfig. This also allows to clean-up the governors' init/exit code,
and reduces boilerplate.

The second patch introduces the new command line parameter, inspired by
its cpuidle counterpart. More details can be found in the respective
patch headers.

Changes in v2:
 - added Viresh's ack to patch 01
 - moved the assignment of 'default_governor' in patch 02 to the governor
   registration path instead of the driver registration (Viresh)

Quentin Perret (2):
  cpufreq: Register governors at core_initcall
  cpufreq: Specify default governor on command line

Viresh Kumar (1):
  cpufreq: Fix locking issues with governors

 .../admin-guide/kernel-parameters.txt         |  5 ++
 Documentation/admin-guide/pm/cpufreq.rst      |  6 +-
 .../platforms/cell/cpufreq_spudemand.c        | 26 +-----
 drivers/cpufreq/cpufreq.c                     | 87 ++++++++++++-------
 drivers/cpufreq/cpufreq_conservative.c        | 22 ++---
 drivers/cpufreq/cpufreq_ondemand.c            | 24 ++---
 drivers/cpufreq/cpufreq_performance.c         | 14 +--
 drivers/cpufreq/cpufreq_powersave.c           | 18 +---
 drivers/cpufreq/cpufreq_userspace.c           | 18 +---
 include/linux/cpufreq.h                       | 14 +++
 kernel/sched/cpufreq_schedutil.c              |  6 +-
 11 files changed, 100 insertions(+), 140 deletions(-)

-- 
2.25.0.rc1.19.g042ed3e048af


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox