LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] Linux: Define struct termios2 in <termios.h> under _GNU_SOURCE [BZ #10339]
From: Florian Weimer @ 2019-04-12  7:50 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: linux-api, libc-alpha, linuxppc-dev, hpa
In-Reply-To: <db468a0f-4a7b-d251-601f-428885275d08@linaro.org>

* Adhemerval Zanella:

> On 11/04/2019 08:07, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> This allows us to adjust the baud rates to non-standard values using termios
>>> interfaces without to resorting to add new headers and use a different API
>>> (ioctl).
>> 
>> How much symbol versioning will be required for this change?
>
> I think all interfaces that have termios as input for sparc and mips 
> (tcgetattr, tcsetattr, cfmakeraw, cfgetispeed, cfgetospeed, cfsetispeed,
> cfsetospeed, cfsetspeed).
>
> Alpha will also need to use termios1 for pre-4.20 kernels.

So only new symbol versions there?  Hmm.

>>> As Peter Anvin has indicated, he create a POC [1] with the aforementioned
>>> new interfaces.  It has not been rebased against master, more specially against
>>> my termios refactor to simplify the multiple architecture header definitions,
>>> but I intend to use as a base.
>> 
>> Reference [1] is still missing. 8-(
>
> Oops... it is https://git.zytor.com/users/hpa/glibc/termbaud.git/log/?h=wip.termbaud

This doesn't really illuminate things.  “Drop explicit baud setting
interfaces in favor of cfenc|decspeed()” removes the new symbol version
for the cf* functions.

My gut feeling is that it's safer to add new interfaces, based on the
actual kernel/userspace interface, rather than trying to fix up existing
interfaces with symbol versioning.  The main reason is that code
involving serial interfaces is difficult to test, so it will take years
until we find the last application broken by the glibc interface bump.

I don't feel strongly about this.  This came out of a request for
enabling TCGETS2 support downstream.  If I can't fix this upstream, I
will just reject that request.

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH kernel RFC 0/2] powerpc/ioda2: An attempt to allow DMA masks between 32 and 59
From: Russell Currey @ 2019-04-12  7:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Alistair Popple, Oliver O'Halloran, David Gibson
In-Reply-To: <20190412064408.85399-1-aik@ozlabs.ru>

I'm gonna try and benchmark this on a few different devices for performance, with 64k TCEs (as is), with larger TCE sizes, and against sketchy bypass.  Hopefully if performance isn't too far off, we can get rid of sketchy bypass entirely and have a more robust solution.

-- 
  Russell Currey
  ruscur@russell.cc

On Fri, Apr 12, 2019, at 4:44 PM, Alexey Kardashevskiy wrote:
> 
> This is an attempt to allow DMA mask 40 or similar which are not large
> enough to use either a PHB3 bypass mode or a sketchy bypass.
> 
> This is based on sha1
> 582549e3fbe1 Linus Torvalds Merge tag 'for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma 
> 
> Please comment. Thanks.
> 
> 
> 
> Alexey Kardashevskiy (2):
>   powerpc/powernv/ioda: Allocate TCE table levels on demand for default
>     DMA window
>   powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU
>     pages
> 
>  arch/powerpc/include/asm/iommu.h              |  8 ++-
>  arch/powerpc/platforms/powernv/pci.h          |  2 +-
>  arch/powerpc/kernel/iommu.c                   | 58 +++++++++++++------
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c | 19 +++---
>  arch/powerpc/platforms/powernv/pci-ioda.c     | 14 ++++-
>  5 files changed, 66 insertions(+), 35 deletions(-)
> 
> -- 
> 2.17.1
> 
>

^ permalink raw reply

* [PATCH kernel RFC 2/2] powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pages
From: Alexey Kardashevskiy @ 2019-04-12  6:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	David Gibson
In-Reply-To: <20190412064408.85399-1-aik@ozlabs.ru>

At the moment we create a small window only for 32bit devices, the window
maps 0..2GB of the PCI space only. For other devices we either use
a sketchy bypass or hardware bypass but the former can only work if
the amount of RAM is no bigger than the device's DMA mask and the latter
requires devices to support at least 59bit DMA.

This extends the default DMA window to the maximum size possible to allow
more flexibility. The default window size has a limitation as
the allocation bitmap in iommu_table is a contiguous array with a bit per
an IOMMU page. Also, this increases the default IOMMU page size to
the system page size to allow higher supported DMA masks.

As the extended window now overlaps the 32bit MMIO region, this adds
reservation capability to iommu_init_table().

After this change the default window size is 0x80000000000==1<<43 so
devices limited to DMA mask smaller than the amount of system RAM can
still use more than just 2GB of memory for DMA.

With the on-demand allocation of indirect TCE table levels enabled and
2 levels, the first TCE level size is just
1<<ceil((log2(0x7ffffffffff+1)-16)/2)=16384 TCEs or 2 system pages.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/iommu.h          |  8 +++-
 arch/powerpc/kernel/iommu.c               | 58 +++++++++++++++--------
 arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++++--
 3 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 0ac52392ed99..5ea782e04803 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -124,6 +124,8 @@ struct iommu_table {
 	struct iommu_table_ops *it_ops;
 	struct kref    it_kref;
 	int it_nid;
+	unsigned long it_reserved_start; /* Start of not-DMA-able (MMIO) area */
+	unsigned long it_reserved_end;
 };
 
 #define IOMMU_TABLE_USERSPACE_ENTRY_RO(tbl, entry) \
@@ -162,8 +164,10 @@ extern int iommu_tce_table_put(struct iommu_table *tbl);
 /* Initializes an iommu_table based in values set in the passed-in
  * structure
  */
-extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
-					    int nid);
+extern struct iommu_table *iommu_init_table_res(struct iommu_table *tbl,
+		int nid, unsigned long res_start, unsigned long res_end);
+#define iommu_init_table(tbl, nid) iommu_init_table_res((tbl), (nid), 0, 0)
+
 #define IOMMU_TABLE_GROUP_MAX_TABLES	2
 
 struct iommu_table_group;
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 33bbd59cff79..209306ce7f4b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -646,11 +646,43 @@ static void iommu_table_clear(struct iommu_table *tbl)
 #endif
 }
 
+static void iommu_table_reserve_pages(struct iommu_table *tbl)
+{
+	int i;
+
+	/*
+	 * Reserve page 0 so it will not be used for any mappings.
+	 * This avoids buggy drivers that consider page 0 to be invalid
+	 * to crash the machine or even lose data.
+	 */
+	if (tbl->it_offset == 0)
+		set_bit(0, tbl->it_map);
+
+	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
+		set_bit(i, tbl->it_map);
+}
+
+static void iommu_table_release_pages(struct iommu_table *tbl)
+{
+	int i;
+
+	/*
+	 * In case we have reserved the first bit, we should not emit
+	 * the warning below.
+	 */
+	if (tbl->it_offset == 0)
+		clear_bit(0, tbl->it_map);
+
+	for (i = tbl->it_reserved_start; i < tbl->it_reserved_end; ++i)
+		clear_bit(i, tbl->it_map);
+}
+
 /*
  * Build a iommu_table structure.  This contains a bit map which
  * is used to manage allocation of the tce space.
  */
-struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
+struct iommu_table *iommu_init_table_res(struct iommu_table *tbl, int nid,
+		unsigned long res_start, unsigned long res_end)
 {
 	unsigned long sz;
 	static int welcomed = 0;
@@ -669,13 +701,9 @@ struct iommu_table *iommu_init_table(struct iommu_table *tbl, int nid)
 	tbl->it_map = page_address(page);
 	memset(tbl->it_map, 0, sz);
 
-	/*
-	 * Reserve page 0 so it will not be used for any mappings.
-	 * This avoids buggy drivers that consider page 0 to be invalid
-	 * to crash the machine or even lose data.
-	 */
-	if (tbl->it_offset == 0)
-		set_bit(0, tbl->it_map);
+	tbl->it_reserved_start = res_start;
+	tbl->it_reserved_end = res_end;
+	iommu_table_reserve_pages(tbl);
 
 	/* We only split the IOMMU table if we have 1GB or more of space */
 	if ((tbl->it_size << tbl->it_page_shift) >= (1UL * 1024 * 1024 * 1024))
@@ -727,12 +755,7 @@ static void iommu_table_free(struct kref *kref)
 		return;
 	}
 
-	/*
-	 * In case we have reserved the first bit, we should not emit
-	 * the warning below.
-	 */
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
+	iommu_table_release_pages(tbl);
 
 	/* verify that table contains no entries */
 	if (!bitmap_empty(tbl->it_map, tbl->it_size))
@@ -1037,8 +1060,7 @@ int iommu_take_ownership(struct iommu_table *tbl)
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_lock(&tbl->pools[i].lock);
 
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
+	iommu_table_reserve_pages(tbl);
 
 	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
 		pr_err("iommu_tce: it_map is not empty");
@@ -1068,9 +1090,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
 
 	memset(tbl->it_map, 0, sz);
 
-	/* Restore bit#0 set by iommu_init_table() */
-	if (tbl->it_offset == 0)
-		set_bit(0, tbl->it_map);
+	iommu_table_release_pages(tbl);
 
 	for (i = 0; i < tbl->nr_pools; i++)
 		spin_unlock(&tbl->pools[i].lock);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index dcfb5469678e..3eb755cf13d7 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2434,10 +2434,13 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 	 * DMA window can be larger than available memory, which will
 	 * cause errors later.
 	 */
-	const u64 window_size = min((u64)pe->table_group.tce32_size, max_memory);
+	const u64 maxblock = 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
+	/* iommu_table::it_map uses 1 bit per IOMMU page, hence 8 */
+	const u64 window_size = min((maxblock * 8) << PAGE_SHIFT, max_memory);
+	unsigned long res_start = 0, res_end = 0;
 
 	rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
-			IOMMU_PAGE_SHIFT_4K,
+			PAGE_SHIFT,
 			window_size,
 			POWERNV_IOMMU_DEFAULT_LEVELS, false, &tbl);
 	if (rc) {
@@ -2446,7 +2449,12 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
 		return rc;
 	}
 
-	iommu_init_table(tbl, pe->phb->hose->node);
+	/* We use top part of 32bit space for MMIO so exclude it from DMA */
+	if (window_size > pe->phb->ioda.m32_pci_base) {
+		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
+		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
+	}
+	iommu_init_table_res(tbl, pe->phb->hose->node, res_start, res_end);
 
 	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
 	if (rc) {
-- 
2.17.1


^ permalink raw reply related

* [PATCH kernel RFC 1/2] powerpc/powernv/ioda: Allocate TCE table levels on demand for default DMA window
From: Alexey Kardashevskiy @ 2019-04-12  6:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	David Gibson
In-Reply-To: <20190412064408.85399-1-aik@ozlabs.ru>

We allocate only the first level of multilevel TCE tables for KVM
already (alloc_userspace_copy==true), and the rest is allocated on demand.
This is not enabled though for baremetal.

This removes the KVM limitation (implicit, via the alloc_userspace_copy
parameter) and always allocates just the first level. The on-demand
allocation of missing levels is already implemented.

As from now on DMA map might happen with disabled interrupts, this
allocates TCEs with GFP_ATOMIC.

To save time when creating a new clean table, this skips non-allocated
indirect TCE entries in pnv_tce_free just like we already do in
the VFIO IOMMU TCE driver.

This changes the default level number from 1 to 2 to reduce the amount
of memory required for the default 32bit DMA window at the boot time.
The default window size is up to 2GB which requires 4MB of TCEs which is
unlikely to be used entirely or at all as most devices these days are
64bit capable so by switching to 2 levels by default we save 4032KB of
RAM per a device.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/platforms/powernv/pci.h          |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 19 +++++++++----------
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index 07464d3bad4a..1b8aba83d1b2 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -224,7 +224,7 @@ extern struct iommu_table_group *pnv_npu_compound_attach(
 		struct pnv_ioda_pe *pe);
 
 /* pci-ioda-tce.c */
-#define POWERNV_IOMMU_DEFAULT_LEVELS	1
+#define POWERNV_IOMMU_DEFAULT_LEVELS	2
 #define POWERNV_IOMMU_MAX_LEVELS	5
 
 extern int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index 7fc3d951f421..2e827546b983 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -36,7 +36,7 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
 	struct page *tce_mem = NULL;
 	__be64 *addr;
 
-	tce_mem = alloc_pages_node(nid, GFP_KERNEL | __GFP_NOWARN,
+	tce_mem = alloc_pages_node(nid, GFP_ATOMIC | __GFP_NOWARN,
 			shift - PAGE_SHIFT);
 	if (!tce_mem) {
 		pr_err("Failed to allocate a TCE memory, level shift=%d\n",
@@ -162,6 +162,9 @@ void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
 
 		if (ptce)
 			*ptce = cpu_to_be64(0);
+		else
+			/* align to level_size which is power of two */
+			i |= tbl->it_level_size - 1;
 	}
 }
 
@@ -261,7 +264,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	unsigned int table_shift = max_t(unsigned int, entries_shift + 3,
 			PAGE_SHIFT);
 	const unsigned long tce_table_size = 1UL << table_shift;
-	unsigned int tmplevels = levels;
 
 	if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
 		return -EINVAL;
@@ -269,9 +271,6 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	if (!is_power_of_2(window_size))
 		return -EINVAL;
 
-	if (alloc_userspace_copy && (window_size > (1ULL << 32)))
-		tmplevels = 1;
-
 	/* Adjust direct table size from window_size and levels */
 	entries_shift = (entries_shift + levels - 1) / levels;
 	level_shift = entries_shift + 3;
@@ -282,7 +281,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	/* Allocate TCE table */
 	addr = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-			tmplevels, tce_table_size, &offset, &total_allocated);
+			1, tce_table_size, &offset, &total_allocated);
 
 	/* addr==NULL means that the first level allocation failed */
 	if (!addr)
@@ -293,18 +292,18 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 	 * we did not allocate as much as we wanted,
 	 * release partially allocated table.
 	 */
-	if (tmplevels == levels && offset < tce_table_size)
+	if (levels == 1 && offset < tce_table_size)
 		goto free_tces_exit;
 
 	/* Allocate userspace view of the TCE table */
 	if (alloc_userspace_copy) {
 		offset = 0;
 		uas = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
-				tmplevels, tce_table_size, &offset,
+				1, tce_table_size, &offset,
 				&total_allocated_uas);
 		if (!uas)
 			goto free_tces_exit;
-		if (tmplevels == levels && (offset < tce_table_size ||
+		if (levels == 1 && (offset < tce_table_size ||
 				total_allocated_uas != total_allocated))
 			goto free_uas_exit;
 	}
@@ -319,7 +318,7 @@ long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
 
 	pr_debug("Created TCE table: ws=%08llx ts=%lx @%08llx base=%lx uas=%p levels=%d/%d\n",
 			window_size, tce_table_size, bus_offset, tbl->it_base,
-			tbl->it_userspace, tmplevels, levels);
+			tbl->it_userspace, 1, levels);
 
 	return 0;
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH kernel RFC 0/2] powerpc/ioda2: An attempt to allow DMA masks between 32 and 59
From: Alexey Kardashevskiy @ 2019-04-12  6:44 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Alistair Popple, Oliver O'Halloran,
	David Gibson


This is an attempt to allow DMA mask 40 or similar which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass.

This is based on sha1
582549e3fbe1 Linus Torvalds Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma 

Please comment. Thanks.



Alexey Kardashevskiy (2):
  powerpc/powernv/ioda: Allocate TCE table levels on demand for default
    DMA window
  powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU
    pages

 arch/powerpc/include/asm/iommu.h              |  8 ++-
 arch/powerpc/platforms/powernv/pci.h          |  2 +-
 arch/powerpc/kernel/iommu.c                   | 58 +++++++++++++------
 arch/powerpc/platforms/powernv/pci-ioda-tce.c | 19 +++---
 arch/powerpc/platforms/powernv/pci-ioda.c     | 14 ++++-
 5 files changed, 66 insertions(+), 35 deletions(-)

-- 
2.17.1


^ permalink raw reply

* Re: [PATCH v2] powerpc/watchdog: Use hrtimers for per-CPU heartbeat
From: Gautham R Shenoy @ 2019-04-12  6:04 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Gautham R . Shenoy, Ravikumar Bangoria, linuxppc-dev
In-Reply-To: <20190409044005.26446-1-npiggin@gmail.com>

Hello Nicholas,

On Tue, Apr 09, 2019 at 02:40:05PM +1000, Nicholas Piggin wrote:
> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
> incrementing jiffies. If that CPU has locked up and jiffies is not
> incrementing, the watchdog heartbeat timer for all CPUs stops and
> creates false positives and confusing warnings on local CPUs, and
> also causes the SMP detector to stop, so the root cause is never
> detected.
> 
> Fix this by using hrtimer based timers for the watchdog heartbeat,
> like the generic kernel hardlockup detector.
> 
> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Looks good to me.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

--
Thanks and Regards
gautham.

^ permalink raw reply

* Re: [PATCH v5 06/16] KVM: PPC: Book3S HV: XIVE: add controls for the EQ configuration
From: David Gibson @ 2019-04-12  5:19 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: linuxppc-dev, Paul Mackerras, kvm, kvm-ppc
In-Reply-To: <20190410170448.3923-7-clg@kaod.org>

[-- Attachment #1: Type: text/plain, Size: 16576 bytes --]

On Wed, Apr 10, 2019 at 07:04:38PM +0200, Cédric Le Goater wrote:
> These controls will be used by the H_INT_SET_QUEUE_CONFIG and
> H_INT_GET_QUEUE_CONFIG hcalls from QEMU to configure the underlying
> Event Queue in the XIVE IC. They will also be used to restore the
> configuration of the XIVE EQs and to capture the internal run-time
> state of the EQs. Both 'get' and 'set' rely on an OPAL call to access
> the EQ toggle bit and EQ index which are updated by the XIVE IC when
> event notifications are enqueued in the EQ.
> 
> The value of the guest physical address of the event queue is saved in
> the XIVE internal xive_q structure for later use. That is when
> migration needs to mark the EQ pages dirty to capture a consistent
> memory state of the VM.
> 
> To be noted that H_INT_SET_QUEUE_CONFIG does not require the extra
> OPAL call setting the EQ toggle bit and EQ index to configure the EQ,
> but restoring the EQ state will.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
> 
>  Changes since v4 :
> 
>  - add check on EQ page alignment
>  - add requirement on KVM_XIVE_EQ_ALWAYS_NOTIFY
> 
>  arch/powerpc/include/asm/xive.h            |   2 +
>  arch/powerpc/include/uapi/asm/kvm.h        |  19 ++
>  arch/powerpc/kvm/book3s_xive.h             |   2 +
>  arch/powerpc/kvm/book3s_xive.c             |  15 +-
>  arch/powerpc/kvm/book3s_xive_native.c      | 249 +++++++++++++++++++++
>  Documentation/virtual/kvm/devices/xive.txt |  34 +++
>  6 files changed, 315 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> index b579a943407b..c4e88abd3b67 100644
> --- a/arch/powerpc/include/asm/xive.h
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -73,6 +73,8 @@ struct xive_q {
>  	u32			esc_irq;
>  	atomic_t		count;
>  	atomic_t		pending_count;
> +	u64			guest_qaddr;
> +	u32			guest_qshift;
>  };
>  
>  /* Global enable flags for the XIVE support */
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index e8161e21629b..85005400fd86 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -681,6 +681,7 @@ struct kvm_ppc_cpu_char {
>  #define KVM_DEV_XIVE_GRP_CTRL		1
>  #define KVM_DEV_XIVE_GRP_SOURCE		2	/* 64-bit source identifier */
>  #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG	3	/* 64-bit source identifier */
> +#define KVM_DEV_XIVE_GRP_EQ_CONFIG	4	/* 64-bit EQ identifier */
>  
>  /* Layout of 64-bit XIVE source attribute values */
>  #define KVM_XIVE_LEVEL_SENSITIVE	(1ULL << 0)
> @@ -696,4 +697,22 @@ struct kvm_ppc_cpu_char {
>  #define KVM_XIVE_SOURCE_EISN_SHIFT	33
>  #define KVM_XIVE_SOURCE_EISN_MASK	0xfffffffe00000000ULL
>  
> +/* Layout of 64-bit EQ identifier */
> +#define KVM_XIVE_EQ_PRIORITY_SHIFT	0
> +#define KVM_XIVE_EQ_PRIORITY_MASK	0x7
> +#define KVM_XIVE_EQ_SERVER_SHIFT	3
> +#define KVM_XIVE_EQ_SERVER_MASK		0xfffffff8ULL
> +
> +/* Layout of EQ configuration values (64 bytes) */
> +struct kvm_ppc_xive_eq {
> +	__u32 flags;
> +	__u32 qshift;
> +	__u64 qaddr;
> +	__u32 qtoggle;
> +	__u32 qindex;
> +	__u8  pad[40];
> +};
> +
> +#define KVM_XIVE_EQ_ALWAYS_NOTIFY	0x00000001
> +
>  #endif /* __LINUX_KVM_POWERPC_H */
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index ae26fe653d98..622f594d93e1 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -272,6 +272,8 @@ struct kvmppc_xive_src_block *kvmppc_xive_create_src_block(
>  	struct kvmppc_xive *xive, int irq);
>  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb);
>  int kvmppc_xive_select_target(struct kvm *kvm, u32 *server, u8 prio);
> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
> +				  bool single_escalation);
>  
>  #endif /* CONFIG_KVM_XICS */
>  #endif /* _KVM_PPC_BOOK3S_XICS_H */
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index e09f3addffe5..c1b7aa7dbc28 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -166,7 +166,8 @@ static irqreturn_t xive_esc_irq(int irq, void *data)
>  	return IRQ_HANDLED;
>  }
>  
> -static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
> +int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
> +				  bool single_escalation)
>  {
>  	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>  	struct xive_q *q = &xc->queues[prio];
> @@ -185,7 +186,7 @@ static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
>  		return -EIO;
>  	}
>  
> -	if (xc->xive->single_escalation)
> +	if (single_escalation)
>  		name = kasprintf(GFP_KERNEL, "kvm-%d-%d",
>  				 vcpu->kvm->arch.lpid, xc->server_num);
>  	else
> @@ -217,7 +218,7 @@ static int xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio)
>  	 * interrupt, thus leaving it effectively masked after
>  	 * it fires once.
>  	 */
> -	if (xc->xive->single_escalation) {
> +	if (single_escalation) {
>  		struct irq_data *d = irq_get_irq_data(xc->esc_virq[prio]);
>  		struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
>  
> @@ -291,7 +292,8 @@ static int xive_check_provisioning(struct kvm *kvm, u8 prio)
>  			continue;
>  		rc = xive_provision_queue(vcpu, prio);
>  		if (rc == 0 && !xive->single_escalation)
> -			xive_attach_escalation(vcpu, prio);
> +			kvmppc_xive_attach_escalation(vcpu, prio,
> +						      xive->single_escalation);
>  		if (rc)
>  			return rc;
>  	}
> @@ -1214,7 +1216,8 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  		if (xive->qmap & (1 << i)) {
>  			r = xive_provision_queue(vcpu, i);
>  			if (r == 0 && !xive->single_escalation)
> -				xive_attach_escalation(vcpu, i);
> +				kvmppc_xive_attach_escalation(
> +					vcpu, i, xive->single_escalation);
>  			if (r)
>  				goto bail;
>  		} else {
> @@ -1229,7 +1232,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  	}
>  
>  	/* If not done above, attach priority 0 escalation */
> -	r = xive_attach_escalation(vcpu, 0);
> +	r = kvmppc_xive_attach_escalation(vcpu, 0, xive->single_escalation);
>  	if (r)
>  		goto bail;
>  
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 492825a35958..3e7cdcacc932 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -335,6 +335,243 @@ static int kvmppc_xive_native_set_source_config(struct kvmppc_xive *xive,
>  						       priority, masked, eisn);
>  }
>  
> +static int xive_native_validate_queue_size(u32 qshift)
> +{
> +	/*
> +	 * We only support 64K pages for the moment. This is also
> +	 * advertised in the DT property "ibm,xive-eq-sizes"
> +	 */
> +	switch (qshift) {
> +	case 0: /* EQ reset */
> +	case 16:
> +		return 0;
> +	case 12:
> +	case 21:
> +	case 24:
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> +					       long eq_idx, u64 addr)
> +{
> +	struct kvm *kvm = xive->kvm;
> +	struct kvm_vcpu *vcpu;
> +	struct kvmppc_xive_vcpu *xc;
> +	void __user *ubufp = (void __user *) addr;
> +	u32 server;
> +	u8 priority;
> +	struct kvm_ppc_xive_eq kvm_eq;
> +	int rc;
> +	__be32 *qaddr = 0;
> +	struct page *page;
> +	struct xive_q *q;
> +	gfn_t gfn;
> +	unsigned long page_size;
> +
> +	/*
> +	 * Demangle priority/server tuple from the EQ identifier
> +	 */
> +	priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
> +		KVM_XIVE_EQ_PRIORITY_SHIFT;
> +	server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
> +		KVM_XIVE_EQ_SERVER_SHIFT;
> +
> +	if (copy_from_user(&kvm_eq, ubufp, sizeof(kvm_eq)))
> +		return -EFAULT;
> +
> +	vcpu = kvmppc_xive_find_server(kvm, server);
> +	if (!vcpu) {
> +		pr_err("Can't find server %d\n", server);
> +		return -ENOENT;
> +	}
> +	xc = vcpu->arch.xive_vcpu;
> +
> +	if (priority != xive_prio_from_guest(priority)) {
> +		pr_err("Trying to restore invalid queue %d for VCPU %d\n",
> +		       priority, server);
> +		return -EINVAL;
> +	}
> +	q = &xc->queues[priority];
> +
> +	pr_devel("%s VCPU %d priority %d fl:%x shift:%d addr:%llx g:%d idx:%d\n",
> +		 __func__, server, priority, kvm_eq.flags,
> +		 kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex);
> +
> +	/*
> +	 * sPAPR specifies a "Unconditional Notify (n) flag" for the
> +	 * H_INT_SET_QUEUE_CONFIG hcall which forces notification
> +	 * without using the coalescing mechanisms provided by the
> +	 * XIVE END ESBs. This is required on KVM as notification
> +	 * using the END ESBs is not supported.
> +	 */
> +	if (kvm_eq.flags != KVM_XIVE_EQ_ALWAYS_NOTIFY) {
> +		pr_err("invalid flags %d\n", kvm_eq.flags);
> +		return -EINVAL;
> +	}
> +
> +	rc = xive_native_validate_queue_size(kvm_eq.qshift);
> +	if (rc) {
> +		pr_err("invalid queue size %d\n", kvm_eq.qshift);
> +		return rc;
> +	}
> +
> +	/* reset queue and disable queueing */
> +	if (!kvm_eq.qshift) {
> +		q->guest_qaddr  = 0;
> +		q->guest_qshift = 0;
> +
> +		rc = xive_native_configure_queue(xc->vp_id, q, priority,
> +						 NULL, 0, true);
> +		if (rc) {
> +			pr_err("Failed to reset queue %d for VCPU %d: %d\n",
> +			       priority, xc->server_num, rc);
> +			return rc;
> +		}
> +
> +		if (q->qpage) {
> +			put_page(virt_to_page(q->qpage));
> +			q->qpage = NULL;
> +		}
> +
> +		return 0;
> +	}
> +
> +	if (kvm_eq.qaddr & ((1ull << kvm_eq.qshift) - 1)) {
> +		pr_err("queue page is not aligned %llx/%llx\n", kvm_eq.qaddr,
> +		       1ull << kvm_eq.qshift);
> +		return -EINVAL;
> +	}
> +
> +	gfn = gpa_to_gfn(kvm_eq.qaddr);
> +	page = gfn_to_page(kvm, gfn);
> +	if (is_error_page(page)) {
> +		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
> +		return -EINVAL;
> +	}
> +	page_size = kvm_host_page_size(kvm, gfn);
> +	if (1ull << kvm_eq.qshift > page_size) {
> +		pr_warn("Incompatible host page size %lx!\n", page_size);
> +		return -EINVAL;
> +	}
> +
> +	qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
> +
> +	/*
> +	 * Backup the queue page guest address to the mark EQ page
> +	 * dirty for migration.
> +	 */
> +	q->guest_qaddr  = kvm_eq.qaddr;
> +	q->guest_qshift = kvm_eq.qshift;
> +
> +	 /*
> +	  * Unconditional Notification is forced by default at the
> +	  * OPAL level because the use of END ESBs is not supported by
> +	  * Linux.
> +	  */
> +	rc = xive_native_configure_queue(xc->vp_id, q, priority,
> +					 (__be32 *) qaddr, kvm_eq.qshift, true);
> +	if (rc) {
> +		pr_err("Failed to configure queue %d for VCPU %d: %d\n",
> +		       priority, xc->server_num, rc);
> +		put_page(page);
> +		return rc;
> +	}
> +
> +	/*
> +	 * Only restore the queue state when needed. When doing the
> +	 * H_INT_SET_SOURCE_CONFIG hcall, it should not.
> +	 */
> +	if (kvm_eq.qtoggle != 1 || kvm_eq.qindex != 0) {
> +		rc = xive_native_set_queue_state(xc->vp_id, priority,
> +						 kvm_eq.qtoggle,
> +						 kvm_eq.qindex);
> +		if (rc)
> +			goto error;
> +	}
> +
> +	rc = kvmppc_xive_attach_escalation(vcpu, priority,
> +					   xive->single_escalation);
> +error:
> +	if (rc)
> +		kvmppc_xive_native_cleanup_queue(vcpu, priority);
> +	return rc;
> +}
> +
> +static int kvmppc_xive_native_get_queue_config(struct kvmppc_xive *xive,
> +					       long eq_idx, u64 addr)
> +{
> +	struct kvm *kvm = xive->kvm;
> +	struct kvm_vcpu *vcpu;
> +	struct kvmppc_xive_vcpu *xc;
> +	struct xive_q *q;
> +	void __user *ubufp = (u64 __user *) addr;
> +	u32 server;
> +	u8 priority;
> +	struct kvm_ppc_xive_eq kvm_eq;
> +	u64 qaddr;
> +	u64 qshift;
> +	u64 qeoi_page;
> +	u32 escalate_irq;
> +	u64 qflags;
> +	int rc;
> +
> +	/*
> +	 * Demangle priority/server tuple from the EQ identifier
> +	 */
> +	priority = (eq_idx & KVM_XIVE_EQ_PRIORITY_MASK) >>
> +		KVM_XIVE_EQ_PRIORITY_SHIFT;
> +	server = (eq_idx & KVM_XIVE_EQ_SERVER_MASK) >>
> +		KVM_XIVE_EQ_SERVER_SHIFT;
> +
> +	vcpu = kvmppc_xive_find_server(kvm, server);
> +	if (!vcpu) {
> +		pr_err("Can't find server %d\n", server);
> +		return -ENOENT;
> +	}
> +	xc = vcpu->arch.xive_vcpu;
> +
> +	if (priority != xive_prio_from_guest(priority)) {
> +		pr_err("invalid priority for queue %d for VCPU %d\n",
> +		       priority, server);
> +		return -EINVAL;
> +	}
> +	q = &xc->queues[priority];
> +
> +	memset(&kvm_eq, 0, sizeof(kvm_eq));
> +
> +	if (!q->qpage)
> +		return 0;
> +
> +	rc = xive_native_get_queue_info(xc->vp_id, priority, &qaddr, &qshift,
> +					&qeoi_page, &escalate_irq, &qflags);
> +	if (rc)
> +		return rc;
> +
> +	kvm_eq.flags = 0;
> +	if (qflags & OPAL_XIVE_EQ_ALWAYS_NOTIFY)
> +		kvm_eq.flags |= KVM_XIVE_EQ_ALWAYS_NOTIFY;
> +
> +	kvm_eq.qshift = q->guest_qshift;
> +	kvm_eq.qaddr  = q->guest_qaddr;
> +
> +	rc = xive_native_get_queue_state(xc->vp_id, priority, &kvm_eq.qtoggle,
> +					 &kvm_eq.qindex);
> +	if (rc)
> +		return rc;
> +
> +	pr_devel("%s VCPU %d priority %d fl:%x shift:%d addr:%llx g:%d idx:%d\n",
> +		 __func__, server, priority, kvm_eq.flags,
> +		 kvm_eq.qshift, kvm_eq.qaddr, kvm_eq.qtoggle, kvm_eq.qindex);
> +
> +	if (copy_to_user(ubufp, &kvm_eq, sizeof(kvm_eq)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
>  				       struct kvm_device_attr *attr)
>  {
> @@ -349,6 +586,9 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
>  	case KVM_DEV_XIVE_GRP_SOURCE_CONFIG:
>  		return kvmppc_xive_native_set_source_config(xive, attr->attr,
>  							    attr->addr);
> +	case KVM_DEV_XIVE_GRP_EQ_CONFIG:
> +		return kvmppc_xive_native_set_queue_config(xive, attr->attr,
> +							   attr->addr);
>  	}
>  	return -ENXIO;
>  }
> @@ -356,6 +596,13 @@ static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
>  static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
>  				       struct kvm_device_attr *attr)
>  {
> +	struct kvmppc_xive *xive = dev->private;
> +
> +	switch (attr->group) {
> +	case KVM_DEV_XIVE_GRP_EQ_CONFIG:
> +		return kvmppc_xive_native_get_queue_config(xive, attr->attr,
> +							   attr->addr);
> +	}
>  	return -ENXIO;
>  }
>  
> @@ -371,6 +618,8 @@ static int kvmppc_xive_native_has_attr(struct kvm_device *dev,
>  		    attr->attr < KVMPPC_XIVE_NR_IRQS)
>  			return 0;
>  		break;
> +	case KVM_DEV_XIVE_GRP_EQ_CONFIG:
> +		return 0;
>  	}
>  	return -ENXIO;
>  }
> diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
> index 33c64b2cdbe8..cc13bfd5cf53 100644
> --- a/Documentation/virtual/kvm/devices/xive.txt
> +++ b/Documentation/virtual/kvm/devices/xive.txt
> @@ -53,3 +53,37 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
>      -ENXIO:  CPU event queues not configured or configuration of the
>               underlying HW interrupt failed
>      -EBUSY:  No CPU available to serve interrupt
> +
> +  4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
> +  Configures an event queue of a CPU
> +  Attributes:
> +    EQ descriptor identifier (64-bit)
> +  The EQ descriptor identifier is a tuple (server, priority) :
> +  bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
> +  values:   |    unused     |  server | priority
> +  The kvm_device_attr.addr points to :
> +    struct kvm_ppc_xive_eq {
> +	__u32 flags;
> +	__u32 qshift;
> +	__u64 qaddr;
> +	__u32 qtoggle;
> +	__u32 qindex;
> +	__u8  pad[40];
> +    };
> +  - flags: queue flags
> +    KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
> +	forces notification without using the coalescing mechanism
> +	provided by the XIVE END ESBs.
> +  - qshift: queue size (power of 2)
> +  - qaddr: real address of queue
> +  - qtoggle: current queue toggle bit
> +  - qindex: current queue index
> +  - pad: reserved for future use
> +  Errors:
> +    -ENOENT: Invalid CPU number
> +    -EINVAL: Invalid priority
> +    -EINVAL: Invalid flags
> +    -EINVAL: Invalid queue size
> +    -EINVAL: Invalid queue address
> +    -EFAULT: Invalid user pointer for attr->addr.
> +    -EIO:    Configuration of the underlying HW failed

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
From: Alexey Kardashevskiy @ 2019-04-12  3:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jose Ricardo Ziviani, kvm, Sam Bobroff, linuxppc-dev,
	Daniel Henrique Barboza, kvm-ppc, Piotr Jaroszynski,
	Leonardo Augusto Guimarães Garcia, Reza Arbab, David Gibson
In-Reply-To: <20190411105251.20165f1d@x1.home>



On 12/04/2019 02:52, Alex Williamson wrote:
> On Thu, 11 Apr 2019 16:48:44 +1000
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> The NVIDIA V100 SXM2 GPUs are connected to the CPU via PCIe links and
>> (on POWER9) NVLinks. In addition to that, GPUs themselves have direct
>> peer-to-peer NVLinks in groups of 2 to 4 GPUs with no buffers/latches
>> between GPUs.
>>
>> Because of these interconnected NVLinks, the POWERNV platform puts such
>> interconnected GPUs to the same IOMMU group. However users want to pass
>> GPUs through individually which requires separate IOMMU groups.
>>
>> Thankfully V100 GPUs implement an interface to disable arbitrary links
>> by programming link disabling mask via the GPU's BAR0. Once a link is
>> disabled, it only can be enabled after performing the secondary bus reset
>> (SBR) on the GPU. Since these GPUs do not advertise any other type of
>> reset, it is reset by the platform's SBR handler.
>>
>> This adds an extra step to the POWERNV's SBR handler to block NVLinks to
>> GPUs which do not belong to the same group as the GPU being reset.
>>
>> This adds a new "isolate_nvlink" kernel parameter to force GPU isolation;
>> when enabled, every GPU gets placed in its own IOMMU group. The new
>> parameter is off by default to preserve the existing behaviour.
>>
>> Before isolating:
>> [nvdbg ~]$ nvidia-smi topo -m
>>         GPU0    GPU1    GPU2    CPU Affinity
>> GPU0     X      NV2     NV2     0-0
>> GPU1    NV2      X      NV2     0-0
>> GPU2    NV2     NV2      X      0-0
>>
>> After isolating:
>> [nvdbg ~]$ nvidia-smi topo -m
>>         GPU0    GPU1    GPU2    CPU Affinity
>> GPU0     X      PHB     PHB     0-0
>> GPU1    PHB      X      PHB     0-0
>> GPU2    PHB     PHB      X      0-0
>>
>> Where:
>>   X    = Self
>>   PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
>>   NV#  = Connection traversing a bonded set of # NVLinks
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v3:
>> * added pci_err() for failed ioremap
>> * reworked commit log
>>
>> v2:
>> * this is rework of [PATCH kernel RFC 0/2] vfio, powerpc/powernv: Isolate GV100GL
>> but this time it is contained in the powernv platform
>> ---
>>  arch/powerpc/platforms/powernv/Makefile      |   2 +-
>>  arch/powerpc/platforms/powernv/pci.h         |   1 +
>>  arch/powerpc/platforms/powernv/eeh-powernv.c |   1 +
>>  arch/powerpc/platforms/powernv/npu-dma.c     |  24 +++-
>>  arch/powerpc/platforms/powernv/nvlinkgpu.c   | 137 +++++++++++++++++++
>>  5 files changed, 162 insertions(+), 3 deletions(-)
>>  create mode 100644 arch/powerpc/platforms/powernv/nvlinkgpu.c
>>
>> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
>> index da2e99efbd04..60a10d3b36eb 100644
>> --- a/arch/powerpc/platforms/powernv/Makefile
>> +++ b/arch/powerpc/platforms/powernv/Makefile
>> @@ -6,7 +6,7 @@ obj-y			+= opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
>>  obj-y			+= opal-kmsg.o opal-powercap.o opal-psr.o opal-sensor-groups.o
>>  
>>  obj-$(CONFIG_SMP)	+= smp.o subcore.o subcore-asm.o
>> -obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o
>> +obj-$(CONFIG_PCI)	+= pci.o pci-ioda.o npu-dma.o pci-ioda-tce.o nvlinkgpu.o
>>  obj-$(CONFIG_CXL_BASE)	+= pci-cxl.o
>>  obj-$(CONFIG_EEH)	+= eeh-powernv.o
>>  obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
>> diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
>> index 8e36da379252..9fd3f391482c 100644
>> --- a/arch/powerpc/platforms/powernv/pci.h
>> +++ b/arch/powerpc/platforms/powernv/pci.h
>> @@ -250,5 +250,6 @@ extern void pnv_pci_unlink_table_and_group(struct iommu_table *tbl,
>>  extern void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
>>  		void *tce_mem, u64 tce_size,
>>  		u64 dma_offset, unsigned int page_shift);
>> +extern void pnv_try_isolate_nvidia_v100(struct pci_dev *gpdev);
>>  
>>  #endif /* __POWERNV_PCI_H */
>> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> index f38078976c5d..464b097d9635 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>> @@ -937,6 +937,7 @@ void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>  		pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>>  		pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>>  	}
>> +	pnv_try_isolate_nvidia_v100(dev);
>>  }
>>  
>>  static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, const char *type,
>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
>> index dc23d9d2a7d9..d4f9ee6222b5 100644
>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
>> @@ -22,6 +22,23 @@
>>  
>>  #include "pci.h"
>>  
>> +static bool isolate_nvlink;
>> +
>> +static int __init parse_isolate_nvlink(char *p)
>> +{
>> +	bool val;
>> +
>> +	if (!p)
>> +		val = true;
>> +	else if (kstrtobool(p, &val))
>> +		return -EINVAL;
>> +
>> +	isolate_nvlink = val;
>> +
>> +	return 0;
>> +}
>> +early_param("isolate_nvlink", parse_isolate_nvlink);
>> +
>>  /*
>>   * spinlock to protect initialisation of an npu_context for a particular
>>   * mm_struct.
>> @@ -549,7 +566,7 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>>  
>>  	hose = pci_bus_to_host(npdev->bus);
>>  
>> -	if (hose->npu) {
>> +	if (hose->npu && !isolate_nvlink) {
>>  		table_group = &hose->npu->npucomp.table_group;
>>  
>>  		if (!table_group->group) {
>> @@ -559,7 +576,10 @@ struct iommu_table_group *pnv_try_setup_npu_table_group(struct pnv_ioda_pe *pe)
>>  					pe->pe_number);
>>  		}
>>  	} else {
>> -		/* Create a group for 1 GPU and attached NPUs for POWER8 */
>> +		/*
>> +		 * Create a group for 1 GPU and attached NPUs for
>> +		 * POWER8 (always) or POWER9 (when isolate_nvlink).
>> +		 */
>>  		pe->npucomp = kzalloc(sizeof(*pe->npucomp), GFP_KERNEL);
>>  		table_group = &pe->npucomp->table_group;
>>  		table_group->ops = &pnv_npu_peers_ops;
>> diff --git a/arch/powerpc/platforms/powernv/nvlinkgpu.c b/arch/powerpc/platforms/powernv/nvlinkgpu.c
>> new file mode 100644
>> index 000000000000..2a97cb15b6d0
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/powernv/nvlinkgpu.c
>> @@ -0,0 +1,137 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * A helper to disable NVLinks between GPUs on IBM Withersponn platform.
>> + *
>> + * Copyright (C) 2019 IBM Corp.  All rights reserved.
>> + *     Author: Alexey Kardashevskiy <aik@ozlabs.ru>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/device.h>
>> +#include <linux/of.h>
>> +#include <linux/iommu.h>
>> +#include <linux/pci.h>
>> +
>> +static int nvlinkgpu_is_ph_in_group(struct device *dev, void *data)
>> +{
>> +	return dev->of_node->phandle == *(phandle *) data;
>> +}
>> +
>> +static u32 nvlinkgpu_get_disable_mask(struct device *dev)
>> +{
>> +	int npu, peer;
>> +	u32 mask;
>> +	struct device_node *dn;
>> +	struct iommu_group *group;
>> +
>> +	dn = dev->of_node;
>> +	if (!of_find_property(dn, "ibm,nvlink-peers", NULL))
>> +		return 0;
>> +
>> +	group = iommu_group_get(dev);
>> +	if (!group)
>> +		return 0;
>> +
>> +	/*
>> +	 * Collect links to keep which includes links to NPU and links to
>> +	 * other GPUs in the same IOMMU group.
>> +	 */
>> +	for (npu = 0, mask = 0; ; ++npu) {
>> +		u32 npuph = 0;
>> +
>> +		if (of_property_read_u32_index(dn, "ibm,npu", npu, &npuph))
>> +			break;
>> +
>> +		for (peer = 0; ; ++peer) {
>> +			u32 peerph = 0;
>> +
>> +			if (of_property_read_u32_index(dn, "ibm,nvlink-peers",
>> +					peer, &peerph))
>> +				break;
>> +
>> +			if (peerph != npuph &&
>> +				!iommu_group_for_each_dev(group, &peerph,
>> +					nvlinkgpu_is_ph_in_group))
>> +				continue;
>> +
>> +			mask |= 1 << (peer + 16);
>> +		}
>> +	}
>> +	iommu_group_put(group);
>> +
>> +	/* Disabling mechanism takes links to disable so invert it here */
>> +	mask = ~mask & 0x3F0000;
>> +
>> +	return mask;
>> +}
>> +
>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
>> +{
>> +	u32 mask, val;
>> +	void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
>> +	struct pci_dev *pdev;
>> +	u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
>> +
>> +	if (!bridge->subordinate)
>> +		return;
>> +
>> +	pdev = list_first_entry_or_null(&bridge->subordinate->devices,
>> +			struct pci_dev, bus_list);
>> +	if (!pdev)
>> +		return;
>> +
>> +	if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
>> +		return;
>> +
>> +	mask = nvlinkgpu_get_disable_mask(&pdev->dev);
>> +	if (!mask)
>> +		return;
>> +
>> +	bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
>> +	if (!bar0_0) {
>> +		pci_err(pdev, "Error mapping BAR0 @0\n");
>> +		return;
>> +	}
>> +	bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
>> +	if (!bar0_120000) {
>> +		pci_err(pdev, "Error mapping BAR0 @120000\n");
>> +		goto bar0_0_unmap;
>> +	}
>> +	bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
>> +	if (!bar0_a00000) {
>> +		pci_err(pdev, "Error mapping BAR0 @A00000\n");
>> +		goto bar0_120000_unmap;
>> +	}
> 
> Is it really necessary to do three separate ioremaps vs one that would
> cover them all here?  I suspect you're just sneaking in PAGE_SIZE with
> the 0x10000 size mappings anyway.  Seems like it would simplify setup,
> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> of the highest register accessed. Thanks,


Sure I can map it once, I just do not see the point in mapping/unmapping
all 0xa10000>>16=161 system pages for a very short period of time while
we know precisely that we need just 3 pages.

Repost?



> 
> Alex
> 
>> +
>> +	pci_restore_state(pdev);
>> +	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
>> +	if ((cmd & cmdmask) != cmdmask)
>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
>> +
>> +	/*
>> +	 * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
>> +	 * Multi-Tenant Systems".
>> +	 * The register names are not provided there either, hence raw values.
>> +	 */
>> +	iowrite32(0x4, bar0_120000 + 0x4C);
>> +	iowrite32(0x2, bar0_120000 + 0x2204);
>> +	val = ioread32(bar0_0 + 0x200);
>> +	val |= 0x02000000;
>> +	iowrite32(val, bar0_0 + 0x200);
>> +	val = ioread32(bar0_a00000 + 0x148);
>> +	val |= mask;
>> +	iowrite32(val, bar0_a00000 + 0x148);
>> +
>> +	if ((cmd | cmdmask) != cmd)
>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd);
>> +
>> +	pci_iounmap(pdev, bar0_a00000);
>> +bar0_120000_unmap:
>> +	pci_iounmap(pdev, bar0_120000);
>> +bar0_0_unmap:
>> +	pci_iounmap(pdev, bar0_0);
>> +}
> 

-- 
Alexey

^ permalink raw reply

* Re: powerpc/64s/radix: Fix radix segment exception handling
From: Nicholas Piggin @ 2019-04-12  3:35 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman; +Cc: Aneesh Kumar K . V, Anton Blanchard
In-Reply-To: <44fRrx4BVsz9s70@ozlabs.org>

Michael Ellerman's on April 11, 2019 12:49 am:
> On Fri, 2019-03-29 at 07:42:57 UTC, Nicholas Piggin wrote:
>> Commit 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
>> broke the radix-mode segment exception handler. In radix mode, this is
>> exception is not an SLB miss, rather it signals that the EA is outside
>> the range translated by any page table.
>> 
>> The commit lost the radix feature alternate code patch, which can
>> cause faults to some EAs to kernel BUG at arch/powerpc/mm/slb.c:639!
>> 
>> The original radix code would send faults to slb_miss_large_addr,
>> which would end up faulting due to slb_addr_limit being 0. This patch
>> sends radix directly to do_bad_slb_fault, which is a bit clearer.
>> 
>> Fixes: 48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C")
>> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Reported-by: Anton Blanchard <anton@samba.org>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> 
> Applied to powerpc fixes, thanks.
> 
> https://git.kernel.org/powerpc/c/7100e8704b61247649c50551b965e71d

I sent a v2 with a selftests that triggers the crash if you want it.
Code was unchanged to no big deal there.

Thanks,
Nick


^ permalink raw reply

* [PATCH][v2] powerpc/lib: remove memcpy_flushcache redundant return
From: Li RongQing @ 2019-04-12  2:51 UTC (permalink / raw)
  To: linuxppc-dev

Align it with other architectures and none of the callers has
been interested its return

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
v1->v2: change memcpy_flushcache declaration in arch/powerpc/include/asm/string.h

 arch/powerpc/include/asm/string.h | 2 +-
 arch/powerpc/lib/pmem.c           | 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index 1647de15a31e..15e2e16272ea 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -25,7 +25,7 @@ extern void * memcpy(void *,const void *,__kernel_size_t);
 extern void * memmove(void *,const void *,__kernel_size_t);
 extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
-extern void * memcpy_flushcache(void *,const void *,__kernel_size_t);
+extern void memcpy_flushcache(void *, const void *, __kernel_size_t);
 
 #ifdef CONFIG_PPC64
 #define __HAVE_ARCH_MEMSET32
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 53c018762e1c..a7a1b3fc6720 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -48,14 +48,12 @@ long __copy_from_user_flushcache(void *dest, const void __user *src,
 	return copied;
 }
 
-void *memcpy_flushcache(void *dest, const void *src, size_t size)
+void memcpy_flushcache(void *dest, const void *src, size_t size)
 {
 	unsigned long start = (unsigned long) dest;
 
 	memcpy(dest, src, size);
 	flush_inval_dcache_range(start, start + size);
-
-	return dest;
 }
 EXPORT_SYMBOL(memcpy_flushcache);
 
-- 
2.16.2


^ permalink raw reply related

* Re: [PATCH RFC 1/5] cpu/speculation: Add 'cpu_spec_mitigations=' cmdline options
From: Michael Ellerman @ 2019-04-12  2:41 UTC (permalink / raw)
  To: Josh Poimboeuf, Thomas Gleixner
  Cc: Peter Zijlstra, Heiko Carstens, Paul Mackerras, H . Peter Anvin,
	Ingo Molnar, Andrea Arcangeli, linux-s390, x86, Will Deacon,
	Linus Torvalds, Catalin Marinas, Waiman Long, linux-arch,
	Jon Masters, Jiri Kosina, Borislav Petkov, Andy Lutomirski,
	linux-arm-kernel, Greg Kroah-Hartman, linux-kernel, Tyler Hicks,
	Martin Schwidefsky, linuxppc-dev
In-Reply-To: <20190411131540.754t5t4tp55i6vjq@treble>

Josh Poimboeuf <jpoimboe@redhat.com> writes:
> On Wed, Apr 10, 2019 at 02:10:01PM +0200, Thomas Gleixner wrote:
>> On Wed, 10 Apr 2019, Michael Ellerman wrote:
>> > Josh Poimboeuf <jpoimboe@redhat.com> writes:
>> > 
>> > > On Fri, Apr 05, 2019 at 06:01:36PM +0200, Borislav Petkov wrote:
>> > >> Thinking about this more, we can shave off the first 4 chars and have it
>> > >> be:
>> > >> 
>> > >> spec_mitigations=
>> > >> 
>> > >> I think it is painfully clear which speculation mitigations we mean. And
>> > >> the other switches don't have "cpu_" prefixes too so...
>> > >
>> > > Sure, I'm ok with renaming it to that, if there are no objections.
>> > 
>> > What about when we have a mitigation for a non-speculation related bug :)
>> 
>> Those kind of silicon bugs are usually mitigated unconditionally.
>
> Right.
>
> But at least "mitigations=" is nice and short.  We could clarify in the
> documentation that it doesn't apply to *all* mitigations, only the ones
> which are optional and which can affect performance.
>
> And it would give us the freedom to include any future "optional"
> mitigations, spec or not.
>
> I kind of like it.  But I could go either way.

Some of the published SMT attacks are not speculation based.

And arguably we already have an optional mitigation for those, ie. nosmt.

cheers

^ permalink raw reply

* Re: [PATCH RFC 1/5] cpu/speculation: Add 'cpu_spec_mitigations=' cmdline options
From: Michael Ellerman @ 2019-04-12  2:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Heiko Carstens, Paul Mackerras, H . Peter Anvin,
	Ingo Molnar, Andrea Arcangeli, linux-s390, x86, Will Deacon,
	Linus Torvalds, Catalin Marinas, Waiman Long, linux-arch,
	Jon Masters, Jiri Kosina, Borislav Petkov, Andy Lutomirski,
	Josh Poimboeuf, linux-arm-kernel, Greg Kroah-Hartman,
	linux-kernel, Tyler Hicks, Martin Schwidefsky, linuxppc-dev
In-Reply-To: <alpine.DEB.2.21.1904101408580.3479@nanos.tec.linutronix.de>

Thomas Gleixner <tglx@linutronix.de> writes:
> On Wed, 10 Apr 2019, Michael Ellerman wrote:
>> Josh Poimboeuf <jpoimboe@redhat.com> writes:
>> 
>> > On Fri, Apr 05, 2019 at 06:01:36PM +0200, Borislav Petkov wrote:
>> >> Thinking about this more, we can shave off the first 4 chars and have it
>> >> be:
>> >> 
>> >> spec_mitigations=
>> >> 
>> >> I think it is painfully clear which speculation mitigations we mean. And
>> >> the other switches don't have "cpu_" prefixes too so...
>> >
>> > Sure, I'm ok with renaming it to that, if there are no objections.
>> 
>> What about when we have a mitigation for a non-speculation related bug :)
>
> Those kind of silicon bugs are usually mitigated unconditionally.

I guess that's true, usually :)

cheers

^ permalink raw reply

* Re: [PATCH stable v4.9 00/35] powerpc spectre backports for 4.9
From: Michael Ellerman @ 2019-04-12  2:28 UTC (permalink / raw)
  To: Sasha Levin; +Cc: gregkh, stable, diana.craciun, linuxppc-dev, msuchanek
In-Reply-To: <20190411152543.GN11568@sasha-vm>

Sasha Levin <sashal@kernel.org> writes:
> On Thu, Apr 11, 2019 at 09:45:55PM +1000, Michael Ellerman wrote:
>>-----BEGIN PGP SIGNED MESSAGE-----
>>Hash: SHA1
>>
>>Hi Greg,
>>
>>Please queue up these powerpc patches for 4.9 if you have no objections.
>>
>>There's one build fix for newer toolchains, and the rest are spectre related.
>
> I've queued it up, thank you.

Thanks. I'll fix my script to generate "Hi Sasha" for v4.9 mails :)

cheers

^ permalink raw reply

* [PATCH v4 2/5] powerpc: Replace CONFIG_DEBUG_KERNEL with CONFIG_DEBUG_MISC
From: Sinan Kaya @ 2019-04-12  1:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: keescook, Sinan Kaya, josh, Paul Mackerras,
	open list:LINUX FOR POWERPC 32-BIT AND 64-BIT
In-Reply-To: <20190412014355.25307-1-okaya@kernel.org>

CONFIG_DEBUG_KERNEL should not impact code generation. Use the newly
defined CONFIG_DEBUG_MISC instead to keep the current code.

Signed-off-by: Sinan Kaya <okaya@kernel.org>
---
 arch/powerpc/kernel/sysfs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index e8e93c2c7d03..7a1708875d27 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -610,7 +610,7 @@ SYSFS_PMCSETUP(pa6t_pmc2, SPRN_PA6T_PMC2);
 SYSFS_PMCSETUP(pa6t_pmc3, SPRN_PA6T_PMC3);
 SYSFS_PMCSETUP(pa6t_pmc4, SPRN_PA6T_PMC4);
 SYSFS_PMCSETUP(pa6t_pmc5, SPRN_PA6T_PMC5);
-#ifdef CONFIG_DEBUG_KERNEL
+#ifdef CONFIG_DEBUG_MISC
 SYSFS_SPRSETUP(hid0, SPRN_HID0);
 SYSFS_SPRSETUP(hid1, SPRN_HID1);
 SYSFS_SPRSETUP(hid4, SPRN_HID4);
@@ -639,7 +639,7 @@ SYSFS_SPRSETUP(tsr0, SPRN_PA6T_TSR0);
 SYSFS_SPRSETUP(tsr1, SPRN_PA6T_TSR1);
 SYSFS_SPRSETUP(tsr2, SPRN_PA6T_TSR2);
 SYSFS_SPRSETUP(tsr3, SPRN_PA6T_TSR3);
-#endif /* CONFIG_DEBUG_KERNEL */
+#endif /* CONFIG_DEBUG_MISC */
 #endif /* HAS_PPC_PMC_PA6T */
 
 #ifdef HAS_PPC_PMC_IBM
@@ -680,7 +680,7 @@ static struct device_attribute pa6t_attrs[] = {
 	__ATTR(pmc3, 0600, show_pa6t_pmc3, store_pa6t_pmc3),
 	__ATTR(pmc4, 0600, show_pa6t_pmc4, store_pa6t_pmc4),
 	__ATTR(pmc5, 0600, show_pa6t_pmc5, store_pa6t_pmc5),
-#ifdef CONFIG_DEBUG_KERNEL
+#ifdef CONFIG_DEBUG_MISC
 	__ATTR(hid0, 0600, show_hid0, store_hid0),
 	__ATTR(hid1, 0600, show_hid1, store_hid1),
 	__ATTR(hid4, 0600, show_hid4, store_hid4),
@@ -709,7 +709,7 @@ static struct device_attribute pa6t_attrs[] = {
 	__ATTR(tsr1, 0600, show_tsr1, store_tsr1),
 	__ATTR(tsr2, 0600, show_tsr2, store_tsr2),
 	__ATTR(tsr3, 0600, show_tsr3, store_tsr3),
-#endif /* CONFIG_DEBUG_KERNEL */
+#endif /* CONFIG_DEBUG_MISC */
 };
 #endif /* HAS_PPC_PMC_PA6T */
 #endif /* HAS_PPC_PMC_CLASSIC */
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH v2] powerpc/xmon: add read-only mode
From: Christopher M Riedl @ 2019-04-12  1:42 UTC (permalink / raw)
  To: Michael Ellerman, Oliver; +Cc: linuxppc-dev, Andrew Donnellan
In-Reply-To: <87h8b4vhti.fsf@concordia.ellerman.id.au>


> On April 11, 2019 at 8:37 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> 
> Christopher M Riedl <cmr@informatik.wtf> writes:
> >> On April 8, 2019 at 1:34 AM Oliver <oohall@gmail.com> wrote:
> >> On Mon, Apr 8, 2019 at 1:06 PM Christopher M. Riedl <cmr@informatik.wtf> wrote:
> ...
> >> >
> >> > diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> >> > index 4e00cb0a5464..0c7f21476018 100644
> >> > --- a/arch/powerpc/Kconfig.debug
> >> > +++ b/arch/powerpc/Kconfig.debug
> >> > @@ -117,6 +117,16 @@ config XMON_DISASSEMBLY
> >> >           to say Y here, unless you're building for a memory-constrained
> >> >           system.
> >> >
> >> 
> >> > +config XMON_RW
> >> > +       bool "Allow xmon read and write operations"
> >> > +       depends on XMON
> >> > +       default !STRICT_KERNEL_RWX
> >> > +       help
> >> > +         Allow xmon to read and write to memory and special-purpose registers.
> >> > +          Conversely, prevent xmon write access when set to N. Read and write
> >> > +          access can also be explicitly controlled with 'xmon=rw' or 'xmon=ro'
> >> > +          (read-only) cmdline options. Default is !STRICT_KERNEL_RWX.
> >> 
> >> Maybe I am a dumb, but I found this *extremely* confusing.
> >> Conventionally Kconfig options will control what code is and is not
> >> included in the kernel (see XMON_DISASSEMBLY) rather than changing the
> >> default behaviour of code. It's not wrong to do so and I'm going to
> >> assume that you were following the pattern of XMON_DEFAULT, but I
> >> think you need to be a little more clear about what option actually
> >> does. Renaming it to XMON_DEFAULT_RO_MODE and re-wording the
> >> description to indicate it's a only a mode change would help a lot.
> >> 
> >> Sorry if this comes across as pointless bikeshedding since it's the
> >> opposite of what Christophe said in the last patch, but this was a bit
> >> of a head scratcher.
> >
> > If anyone is dumb here it's me for making this confusing :)
> > I chatted with Michael Ellerman about this, so let me try to explain this more clearly.
> 
> Yeah it's my fault :)
>

"Signed-off-by: Christopher M. Riedl" -- I take full responsibility hah.

> 
> > There are two things I am trying to address with XMON_RW:
> > 1) provide a default access mode for xmon based on system "security"
> 
> I think I've gone off this idea. Tying them together is just enforcing a
> linkage that people may not want.
> 
> I think XMON_RW should just be an option that stands on its own. It
> should probably be default n, to give people a safe default.
> 

Next version includes this along with making it clear that this option
provides the default mode for XMON.

>
> > 2) replace XMON in the decision to write-protect kernel text at compile-time
> 
> We should do that as a separate patch. That's actually a bug in the
> current STRICT_KERNEL_RWX support.
> 
> ie. STRICT_KERNEL_RWX should always give you PAGE_KERNEL_ROX, regardless
> of XMON or anything else.
> 
> > I think a single Kconfig for both of those things is sensible as ultimately the
> > point is to allow xmon to operate in read-only mode on "secure" systems -- without
> > violating any integrity/security guarantees (such as write-protected kernel text).
> >
> > Christophe suggested looking at STRICT_KERNEL_RWX and I think that option makes the
> > most sense to base XMON_RW on since the description for STRICT_KERNEL_RWX states:
> 
> Once we fix the bugs in STRICT_KERNEL_RWX people are going to enable
> that by default, so it will essentially be always on in future.
> 
> 
> > With that said, I will remove the 'xmon=rw' cmdline option as it really doesn't work
> > since kernel text is write-protected at compile time.
> 
> I think 'xmon=rw' still makes sense. Only some of the RW functionality
> relies on being able to patch kernel text.
> 
> And once you have proccall() you can just call a function to make it
> read/write anyway, or use memex to manually frob the page tables.
> 
> cheers

Great, adding this back in the next version.

^ permalink raw reply

* Re: [PATCH 0/8]
From: Tyrel Datwyler @ 2019-04-12  0:55 UTC (permalink / raw)
  To: Sam Bobroff, linuxppc-dev
In-Reply-To: <cover.1553050609.git.sbobroff@linux.ibm.com>

On 03/19/2019 07:58 PM, Sam Bobroff wrote:
> Hi all,
> 
> This patch set adds support for EEH recovery of hot plugged devices on pSeries
> machines. Specifically, devices discovered by PCI rescanning using
> /sys/bus/pci/rescan, which includes devices hotplugged by QEMU's device_add
> command. (pSeries doesn't currently use slot power control for hotplugging.)

Slight nit that its not that pSeries doesn't support slot power control
hotplugging, its that QEMU pSeries guests don't support it. We most definitely
use the slot power control for hotplugging in PowerVM pSeries Linux guests. More
specifically we had to work around short comings in the rpaphp driver when
dealing with QEMU. This being that while at initial glance the design implies
that it had multiple devices per PHB in mind, it didn't, and only actually
supported a single slot per PHB. Further, when we developed the QEMU pci hotplug
feature we had to deal with only having a single PHB per QEMU guest and as a
result needed a way to plug multiple pci devices into a single PHB. Hence, came
the pci rescan work around in drmgr.

Mike Roth and I have had discussions over the years to get the slot power
control hotplugging working properly with QEMU, and while I did get the RPA
hotplug driver fixed to register all available slots associated with a PHB, EEH
remained an issue. So, I'm very happy to see this patchset get that working with
the rescan work around.

> 
> As a side effect this also provides EEH support for devices removed by
> /sys/bus/pci/devices/*/remove and re-discovered by writing to /sys/bus/pci/rescan,
> on all platforms.

+1, this seems like icing on the cake. ;)

> 
> The approach I've taken is to use the fact that the existing
> pcibios_bus_add_device() platform hooks (which are used to set up EEH on
> Virtual Function devices (VFs)) are actually called for all devices, so I've
> widened their scope and made other adjustments necessary to allow them to work
> for hotplugged and boot-time devices as well.
> 
> Because some of the changes are in generic PowerPC code, it's
> possible that I've disturbed something for another PowerPC platform. I've tried
> to minimize this by leaving that code alone as much as possible and so there
> are a few cases where eeh_add_device_{early,late}() or eeh_add_sysfs_files() is
> called more than once. I think these can be looked at later, as duplicate calls
> are not harmful.
> 
> The patch "Convert PNV_PHB_FLAG_EEH" isn't strictly necessary and I'm not sure
> if it's better to keep it, because it simplifies the code or drop it, because
> we may need a separate flag per PHB later on. Thoughts anyone?
> 
> The first patch is a rework of the pcibios_init reordering patch I posted
> earlier, which I've included here because it's necessary for this set.
> 
> I have done some testing for PowerNV on Power9 using a modified pnv_php module
> and some testing on pSeries with slot power control using a modified rpaphp
> module, and the EEH-related parts seem to work.

I'm interested in what modifications with rpaphp. Its unclear if you are saying
rpaphp modified so that slot power hotplug works with a QEMU pSeries guest? If
thats the case it would be optimal to get that upstream and remove the work
rescan workaround for guests that don't need it.

-Tyrel

> 
> Cheers,
> Sam.
> 
> Sam Bobroff (8):
>   powerpc/64: Adjust order in pcibios_init()
>   powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
>   powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag
>   powerpc/eeh: Improve debug messages around device addition
>   powerpc/eeh: Add eeh_show_enabled()
>   powerpc/eeh: Initialize EEH address cache earlier
>   powerpc/eeh: EEH for pSeries hot plug
>   powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()
> 
>  arch/powerpc/include/asm/eeh.h               | 19 +++--
>  arch/powerpc/kernel/eeh.c                    | 33 ++++-----
>  arch/powerpc/kernel/eeh_cache.c              | 29 +-------
>  arch/powerpc/kernel/eeh_driver.c             |  4 ++
>  arch/powerpc/kernel/of_platform.c            |  3 +-
>  arch/powerpc/kernel/pci-common.c             |  4 --
>  arch/powerpc/kernel/pci_32.c                 |  4 ++
>  arch/powerpc/kernel/pci_64.c                 | 12 +++-
>  arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++------
>  arch/powerpc/platforms/powernv/pci.c         |  7 +-
>  arch/powerpc/platforms/powernv/pci.h         |  2 -
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 75 +++++++++++---------
>  arch/powerpc/platforms/pseries/pci.c         |  7 +-
>  13 files changed, 122 insertions(+), 118 deletions(-)
> 


^ permalink raw reply

* Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
From: huang ying @ 2019-04-12  0:49 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-ia64, Linux-sh list, Peter Zijlstra, Will Deacon,
	H. Peter Anvin, sparclinux, linux-arch, Davidlohr Bueso,
	Chen Rong, linux-hexagon, the arch/x86 maintainers,
	IngoMolnar@shao2-debian, linux-xtensa, Arnd Bergmann,
	linuxppc-dev, Borislav Petkov, Thomas Gleixner,
	linux-alpha@vger.kernel.org, Linus Torvalds,
	Linux List Kernel Mailing, Andrew Morton, Tim Chen
In-Reply-To: <cc8af095-50e0-cd3e-e498-8c3350d314b7@redhat.com>

On Thu, Apr 11, 2019 at 12:08 AM Waiman Long <longman@redhat.com> wrote:
>
> On 04/10/2019 04:15 AM, huang ying wrote:
> > Hi, Waiman,
> >
> > What's the status of this patchset?  And its merging plan?
> >
> > Best Regards,
> > Huang, Ying
>
> I have broken the patch into 3 parts (0/1/2) and rewritten some of them.
> Part 0 has been merged into tip. Parts 1 and 2 are still under testing.

Thanks!  Please keep me updated!

Best Regards,
Huang, Ying

> Cheers,
> Longman
>

^ permalink raw reply

* Re: [PATCH v2 00/21] Convert hwmon documentation to ReST
From: Mauro Carvalho Chehab @ 2019-04-11 23:54 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-hwmon, Jean Delvare, linux-aspeed, Linux Doc Mailing List,
	Andrew Jeffery, Jonathan Corbet, Liviu Dudau, linux-kernel,
	Mauro Carvalho Chehab, Lorenzo Pieralisi, Paul Mackerras,
	Joel Stanley, linuxppc-dev, Sudeep Holla, linux-arm-kernel
In-Reply-To: <20190411210731.GA29378@roeck-us.net>

Em Thu, 11 Apr 2019 14:07:31 -0700
Guenter Roeck <linux@roeck-us.net> escreveu:

> On Thu, Apr 11, 2019 at 05:43:57PM -0300, Mauro Carvalho Chehab wrote:
> > Em Thu, 11 Apr 2019 12:43:24 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >   
> > > On Wed, 10 Apr 2019 16:22:37 -0300
> > > Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> > >   
> > > > This series converts the contents of Documentation/hwmon to ReST
> > > > format.
> > > > 
> > > > PS.: I opted to group the conversion files per groups of maintainer
> > > > set, as, if I were to generate one patch per file, it would give around
> > > > 160 patches.
> > > > 
> > > > I also added those patches to my development tree at:
> > > > 	https://git.linuxtv.org/mchehab/experimental.git/log/?h=hwmon
> > > > 
> > > > If you want to see the results, they're at:
> > > > 	https://www.infradead.org/~mchehab/hwmon/    
> > > 
> > > This set seems generally good and could probably be applied as-is.  But I
> > > have to ask...is there a reason to not take the last step and actually
> > > bring this stuff into the Sphinx doc tree?
> > > 
> > > We seem to be mostly documenting sysfs files and such.  I am *guessing*
> > > that perhaps the set should move to Documentation/admin-guide/hwmon?  Or
> > > have I misunderstood the intended audience here?  
> > 
> > :-)
> > 
> > Yeah, I'd say that 80% of the contents there are user-faced.
> > 
> > Yet, the main issue with this (and other driver subsystems) is that there's
> > a mix of userspace and Kernelspace stuff. One somewhat simple case is
> > the abituguru: it has a "datasheet" file:
> > 
> > 	abituguru-datasheet
> > 
> > This contains programming information for the corresponding drivers,
> > while abituguru and abituguru3 contains mostly userspace
> > stuff (still, it also contains the I2C address, with shouldn't mean
> > anything for the user).
> > 
> > However, if you take a look at w83781d, you'll see a mix of both
> > userspace and driver developer info there... it has a chapter called
> > "Data sheet updates", for example, with is probably meaningless for
> > anyone but the hwmon driver developers.
> > 
> > That's, btw, a pattern that happens a lot inside device driver
> > documents on almost all subsystems I checked: driver-specific
> > documentation is usually not split into user-facing/kernel-facing.
> > 
> > While nobody does such split, IMHO, the best would be to keep the
> > information outside Documentation/admin-guide. But hey! You're
> > the Doc maintainer. If you prefer to move, I'm perfectly fine
> > with that.
> >   
> 
> Same here, but please don't move the files which are kernel facing only.
> 
> How do you want to handle this series ? Do you expect it to be pushed
> through hwmon, or through Documentation, or do you plan to push yourself ?
> 
> If the series isn't pushed through hwmon, we'll likely have a couple of
> conflicts against hwmon-next.

Guenter,

I won't be pushing it myself. IMO, it makes more sense to apply it at
hwmon-next, except if it would cause some conflicts against docs-next.

Regards,
Mauro

^ permalink raw reply

* Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode
From: Nayna @ 2019-04-11 17:40 UTC (permalink / raw)
  To: Daniel Axtens, Eric Biggers
  Cc: leo.barbosa, Herbert Xu, Stephan Mueller, nayna, omosnacek,
	leitao, pfsmorigo, linux-crypto, marcelo.cerri, George Wilson,
	linuxppc-dev
In-Reply-To: <87imvkwqdh.fsf@dja-thinkpad.axtens.net>


On 04/11/2019 10:47 AM, Daniel Axtens wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
>
>> Are you still planning to fix the remaining bug?  I booted a ppc64le VM, and I
>> see the same test failure (I think) you were referring to:
>>
>> alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 3, cfg="uneven misaligned splits, may sleep"
>>
> Yes, that's the one I saw. I don't have time to follow it up at the
> moment, but Nayna is aware of it.
>

Yes Eric, we identified this as a separate issue of misalignment and 
plan to post a separate patch to address it.

Thanks & Regards,
       - Nayna


^ permalink raw reply

* Re: [PATCH v2 00/21] Convert hwmon documentation to ReST
From: Guenter Roeck @ 2019-04-11 21:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-hwmon, Jean Delvare, linux-aspeed, Linux Doc Mailing List,
	Andrew Jeffery, Jonathan Corbet, Liviu Dudau, linux-kernel,
	Mauro Carvalho Chehab, Lorenzo Pieralisi, Paul Mackerras,
	Joel Stanley, linuxppc-dev, Sudeep Holla, linux-arm-kernel
In-Reply-To: <20190411174357.251904f5@coco.lan>

On Thu, Apr 11, 2019 at 05:43:57PM -0300, Mauro Carvalho Chehab wrote:
> Em Thu, 11 Apr 2019 12:43:24 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
> 
> > On Wed, 10 Apr 2019 16:22:37 -0300
> > Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> > 
> > > This series converts the contents of Documentation/hwmon to ReST
> > > format.
> > > 
> > > PS.: I opted to group the conversion files per groups of maintainer
> > > set, as, if I were to generate one patch per file, it would give around
> > > 160 patches.
> > > 
> > > I also added those patches to my development tree at:
> > > 	https://git.linuxtv.org/mchehab/experimental.git/log/?h=hwmon
> > > 
> > > If you want to see the results, they're at:
> > > 	https://www.infradead.org/~mchehab/hwmon/  
> > 
> > This set seems generally good and could probably be applied as-is.  But I
> > have to ask...is there a reason to not take the last step and actually
> > bring this stuff into the Sphinx doc tree?
> > 
> > We seem to be mostly documenting sysfs files and such.  I am *guessing*
> > that perhaps the set should move to Documentation/admin-guide/hwmon?  Or
> > have I misunderstood the intended audience here?
> 
> :-)
> 
> Yeah, I'd say that 80% of the contents there are user-faced.
> 
> Yet, the main issue with this (and other driver subsystems) is that there's
> a mix of userspace and Kernelspace stuff. One somewhat simple case is
> the abituguru: it has a "datasheet" file:
> 
> 	abituguru-datasheet
> 
> This contains programming information for the corresponding drivers,
> while abituguru and abituguru3 contains mostly userspace
> stuff (still, it also contains the I2C address, with shouldn't mean
> anything for the user).
> 
> However, if you take a look at w83781d, you'll see a mix of both
> userspace and driver developer info there... it has a chapter called
> "Data sheet updates", for example, with is probably meaningless for
> anyone but the hwmon driver developers.
> 
> That's, btw, a pattern that happens a lot inside device driver
> documents on almost all subsystems I checked: driver-specific
> documentation is usually not split into user-facing/kernel-facing.
> 
> While nobody does such split, IMHO, the best would be to keep the
> information outside Documentation/admin-guide. But hey! You're
> the Doc maintainer. If you prefer to move, I'm perfectly fine
> with that.
> 

Same here, but please don't move the files which are kernel facing only.

How do you want to handle this series ? Do you expect it to be pushed
through hwmon, or through Documentation, or do you plan to push yourself ?

If the series isn't pushed through hwmon, we'll likely have a couple of
conflicts against hwmon-next.

Thanks,
Guenter

^ permalink raw reply

* Re: [PATCH v2 00/21] Convert hwmon documentation to ReST
From: Mauro Carvalho Chehab @ 2019-04-11 20:43 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: linux-arm-kernel, linux-hwmon, Jean Delvare, linux-aspeed,
	Linux Doc Mailing List, Andrew Jeffery, Sudeep Holla, Liviu Dudau,
	linux-kernel, Mauro Carvalho Chehab, Lorenzo Pieralisi,
	Paul Mackerras, Joel Stanley, linuxppc-dev, Guenter Roeck
In-Reply-To: <20190411124324.3ed62fad@lwn.net>

Em Thu, 11 Apr 2019 12:43:24 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> On Wed, 10 Apr 2019 16:22:37 -0300
> Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> 
> > This series converts the contents of Documentation/hwmon to ReST
> > format.
> > 
> > PS.: I opted to group the conversion files per groups of maintainer
> > set, as, if I were to generate one patch per file, it would give around
> > 160 patches.
> > 
> > I also added those patches to my development tree at:
> > 	https://git.linuxtv.org/mchehab/experimental.git/log/?h=hwmon
> > 
> > If you want to see the results, they're at:
> > 	https://www.infradead.org/~mchehab/hwmon/  
> 
> This set seems generally good and could probably be applied as-is.  But I
> have to ask...is there a reason to not take the last step and actually
> bring this stuff into the Sphinx doc tree?
> 
> We seem to be mostly documenting sysfs files and such.  I am *guessing*
> that perhaps the set should move to Documentation/admin-guide/hwmon?  Or
> have I misunderstood the intended audience here?

:-)

Yeah, I'd say that 80% of the contents there are user-faced.

Yet, the main issue with this (and other driver subsystems) is that there's
a mix of userspace and Kernelspace stuff. One somewhat simple case is
the abituguru: it has a "datasheet" file:

	abituguru-datasheet

This contains programming information for the corresponding drivers,
while abituguru and abituguru3 contains mostly userspace
stuff (still, it also contains the I2C address, with shouldn't mean
anything for the user).

However, if you take a look at w83781d, you'll see a mix of both
userspace and driver developer info there... it has a chapter called
"Data sheet updates", for example, with is probably meaningless for
anyone but the hwmon driver developers.

That's, btw, a pattern that happens a lot inside device driver
documents on almost all subsystems I checked: driver-specific
documentation is usually not split into user-facing/kernel-facing.

While nobody does such split, IMHO, the best would be to keep the
information outside Documentation/admin-guide. But hey! You're
the Doc maintainer. If you prefer to move, I'm perfectly fine
with that.

Thanks,
Mauro

^ permalink raw reply

* Re: [PATCH 1/6] mm: change locked_vm's type from unsigned long to atomic64_t
From: Daniel Jordan @ 2019-04-11 20:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Davidlohr Bueso, kvm, Alan Tull, Alexey Kardashevskiy, linux-fpga,
	linux-kernel, kvm-ppc, Daniel Jordan, linux-mm, Alex Williamson,
	Moritz Fischer, akpm, linuxppc-dev, Christoph Lameter, Wu Hao
In-Reply-To: <20190411095543.GA55197@lakrids.cambridge.arm.com>

On Thu, Apr 11, 2019 at 10:55:43AM +0100, Mark Rutland wrote:
> On Thu, Apr 11, 2019 at 02:22:23PM +1000, Alexey Kardashevskiy wrote:
> > On 03/04/2019 07:41, Daniel Jordan wrote:
> 
> > > -	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %ld/%ld%s\n", current->pid,
> > > +	dev_dbg(dev, "[%d] RLIMIT_MEMLOCK %c%ld %lld/%lu%s\n", current->pid,
> > >  		incr ? '+' : '-', npages << PAGE_SHIFT,
> > > -		current->mm->locked_vm << PAGE_SHIFT, rlimit(RLIMIT_MEMLOCK),
> > > -		ret ? "- exceeded" : "");
> > > +		(s64)atomic64_read(&current->mm->locked_vm) << PAGE_SHIFT,
> > > +		rlimit(RLIMIT_MEMLOCK), ret ? "- exceeded" : "");
> > 
> > 
> > 
> > atomic64_read() returns "long" which matches "%ld", why this change (and
> > similar below)? You did not do this in the two pr_debug()s above anyway.
> 
> Unfortunately, architectures return inconsistent types for atomic64 ops.
> 
> Some return long (e..g. powerpc), some return long long (e.g. arc), and
> some return s64 (e.g. x86).

Yes, Mark said it all, I'm just chiming in to confirm that's why I added the
cast.

Btw, thanks for doing this, Mark.

^ permalink raw reply

* Re: [PATCH v4 4/4] ASoC: fsl_audmix: cache pdev->dev pointer
From: Nicolin Chen @ 2019-04-11 20:21 UTC (permalink / raw)
  To: Viorel Suman
  Cc: Mark Rutland, devicetree@vger.kernel.org,
	alsa-devel@alsa-project.org, linuxppc-dev@lists.ozlabs.org,
	Timur Tabi, Xiubo Li, Shawn Guo, Sascha Hauer, Takashi Iwai,
	Liam Girdwood, Rob Herring, Jaroslav Kysela, Viorel Suman,
	Julia Lawall, Mark Brown, dl-linux-imx, Pengutronix Kernel Team,
	Fabio Estevam, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <1554894380-25153-5-git-send-email-viorel.suman@nxp.com>

On Wed, Apr 10, 2019 at 11:06:39AM +0000, Viorel Suman wrote:
> There should be no trouble to understand dev = pdev->dev.
> This can save some space to have more print info or save
> some wrapped lines.

Change looks fine. Ack.

> Signed-off-by: Viorel Suman <viorel.suman@nxp.com>
> Suggested-by: Nicolin Chen <nicoleotsuka@gmail.com>

I think usually Suggested-by comes above Signed-off-by.

This was originally in your change already, so you can replace:
-Suggested-by: Nicolin Chen <nicoleotsuka@gmail.com>
-Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>

Just a chance for it to get resent without being a noise :)

^ permalink raw reply

* Re: [PATCH V2 2/2] ASoC: fsl_asrc: Unify the supported input and output rate
From: Nicolin Chen @ 2019-04-11 20:11 UTC (permalink / raw)
  To: S.j. Wang
  Cc: alsa-devel@alsa-project.org, timur@kernel.org,
	Xiubo.Lee@gmail.com, festevam@gmail.com,
	linux-kernel@vger.kernel.org, broonie@kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <b7f4c8dd00743ddd09c33def00c1759757b8e743.1554975348.git.shengjiu.wang@nxp.com>

On Thu, Apr 11, 2019 at 09:39:09AM +0000, S.j. Wang wrote:
> Unify the supported input and output rate, add the

We previously didn't support 5KHz->5KHz, but now we do? That'd be
great if so.

>  static int fsl_asrc_dai_hw_params(struct snd_pcm_substream *substream,
> @@ -626,14 +629,18 @@ static int fsl_asrc_dai_probe(struct snd_soc_dai *dai)
>  		.stream_name = "ASRC-Playback",
>  		.channels_min = 1,
>  		.channels_max = 10,
> -		.rates = FSL_ASRC_RATES,
> +		.rate_min = 5512,
> +		.rate_max = 192000,
> +		.rates = SNDRV_PCM_RATE_KNOT,
>  		.formats = FSL_ASRC_FORMATS,
>  	},
>  	.capture = {
>  		.stream_name = "ASRC-Capture",
>  		.channels_min = 1,
>  		.channels_max = 10,
> -		.rates = FSL_ASRC_RATES,

Probably could remove FSL_ASRC_RATES define since it's not used.

Thanks

^ permalink raw reply

* Re: [PATCH V2 1/2] ASoC: fsl_asrc: replace the process_option table with function
From: Nicolin Chen @ 2019-04-11 20:06 UTC (permalink / raw)
  To: S.j. Wang
  Cc: alsa-devel@alsa-project.org, timur@kernel.org,
	Xiubo.Lee@gmail.com, festevam@gmail.com,
	linux-kernel@vger.kernel.org, broonie@kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <a04859bff36ba26cb4cb51ff092a3f2e2eca455d.1554975348.git.shengjiu.wang@nxp.com>

On Thu, Apr 11, 2019 at 09:39:06AM +0000, S.j. Wang wrote:
  
> +/*
> + * Select the pre-processing and post-processing options

By aligning with other function comments:
/**
 * Select the pre-processing and post-processing options

> + *
> + * Fsin: input sample rate
> + * Fsout: output sample rate

I would suggest to use inrate and outrate to keep naming consistent.

> + * pre_proc: return value for pre-processing option
> + * post_proc: return value for post-processing option
> + */
> +static int fsl_asrc_sel_proc(int Fsin, int Fsout, int *pre_proc, int *post_proc)
> +{
> +	bool det_out_op2_cond;
> +	bool det_out_op0_cond;

By looking at the comments below. Probably better to rename them
	bool post_proc_cond2, post_proc_cond0;

> +	/* Codition for selection of post-processing */

"Codition" -> "Conditions"

> +	det_out_op2_cond = (((Fsin * 15 > Fsout * 16) & (Fsout < 56000)) |
> +					((Fsin > 56000) & (Fsout < 56000)));

Combining Daniel's comments + indentation alignment:
	det_out_op2_cond = (Fsin * 15 > Fsout * 16 && Fsout < 56000) ||
			   (Fsin > 56000 && Fsout < 56000);

> +	det_out_op0_cond = (Fsin * 23 < Fsout * 8);
> +
> +	/*
> +	 * unsupported case: Tsout>16.125*Tsin, and Tsout>8.125*Tsin.

Funny thing is that there'd be no point in checking the 16.125, if
Tsout is bigger than 8.125 times of Tsin, i.e. only one condition:
	Tsout > 8.125 * Tsin

> +	 * Tsout>16.125*Tsin -> Fsin * 8 > 129 * Fsout
> +	 * Tsout>8.125*Tsin  -> Fsin * 8 > 65 * Fsout
> +	 * Tsout>4.125*Tsin  -> Fsin * 8 > 33 * Fsout
> +	 * Tsout>1.875*Tsin  -> Fsin * 8 > 15 * Fsout

Took me a while to understand what it is saying....

Better to write like this:
	/* Does not support cases: Tsout > 8.125 * Tsin */
	if (Fsin * 8 > 65 * Fsout) {
		pair_err("Does not support %d > 8.125 * %d\n", Fsout, Fsin);
		return -EINVAL;
	}

	/* Otherwise, select pre_proc between [0, 2] */
	if (Fsin * 8 > 33 * Fsout)
> +		*pre_proc = 2;
> +	else if (Fsin * 8 > 15 * Fsout) {
> +		if (Fsin > 152000)
> +			*pre_proc = 2;
> +		else
> +			*pre_proc = 1;
> +	} else if (Fsin < 76000)
> +		*pre_proc = 0;
> +	else if (Fsin > 152000)
> +		*pre_proc = 2;
> +	else
> +		*pre_proc = 1;

<== Would look better by moving post_cond calculations here.

> +	if (det_out_op2_cond)
> +		*post_proc = 2;
> +	else if (det_out_op0_cond)
> +		*post_proc = 0;
> +	else
> +		*post_proc = 1;

And we could remove this check below:
> +	/* unsupported options */
> +	if (*pre_proc == 4 || *pre_proc == 5)
> +		return -EINVAL;

So basically we are doing:
	1) Error out unsupported cases
	2) Select pre_proc
	3) Calculate conditions for post_proc
	4) Select post_proc

Thanks

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox